Errors in patient identification make it difficult to reconstruct patients' trajectories in public health information systems. A crucial issue is avoiding duplicate entries in distributed web databases. We explored the Needleman and Wunsch (N&W) algorithm in order to optimize string matching. Five variants of the N&W algorithm were developed and implemented for a web-based Multi-Source Information System dedicated to tracking patients with End-Stage Renal Disease at both regional and national levels. A simulated study database of 73,210 records was created, in which an insertion or deletion of each character of the original string was simulated. The rate of duplicate entries was 2% given an acceptable distance set to 5 modifications. The search was sensitive and specific, with an acceptable detection time. It detected up to 10% of modifications, which is above the estimated error rate. A variant of the N&W algorithm, designated the "cut-off heuristic", proved efficient for detecting duplicate entries in nominative distributed databases.
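The abstract does not specify how the five variants or the "cut-off heuristic" were built, so the following is only a minimal sketch of the general idea, under stated assumptions: a global-alignment dynamic program in the N&W style with unit costs (match 0, mismatch 1, gap 1), abandoned as soon as the best score in a row exceeds the acceptable distance. The function name `nw_distance_with_cutoff`, the parameter `max_dist`, and the sample names are hypothetical; only the threshold of 5 modifications comes from the text.

```python
from typing import Optional


def nw_distance_with_cutoff(a: str, b: str, max_dist: int = 5) -> Optional[int]:
    """Return the unit-cost alignment distance between a and b,
    or None if it exceeds max_dist.

    Assumption: unit costs and this particular cut-off rule are
    illustrative choices, not the variant described in the paper.
    """
    # A length difference larger than max_dist already forces too many gaps.
    if abs(len(a) - len(b)) > max_dist:
        return None

    # prev[j] holds the distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # aligning a[:i] against the empty prefix costs i gaps
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # gap in b (character of a deleted)
                curr[j - 1] + 1,     # gap in a (character of b inserted)
                prev[j - 1] + cost,  # match or substitution
            ))
        # Cut-off: costs never decrease along an alignment path, so if every
        # cell in this row is above the threshold, the final score will be too.
        if min(curr) > max_dist:
            return None
        prev = curr

    return prev[-1] if prev[-1] <= max_dist else None


if __name__ == "__main__":
    # Hypothetical patient surnames, used only to exercise the function.
    print(nw_distance_with_cutoff("DUPONT", "DUPOND"))
    print(nw_distance_with_cutoff("MARTIN", "BERNARDINI"))
```

A candidate pair is treated as a possible duplicate when the function returns a number rather than `None`. The early exit is what keeps detection time manageable on a large table, since most non-matching pairs are rejected after a few rows.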