Patient medical records are often fragmented across disparate healthcare databases, which can produce duplicate records that are detrimental to health care services. Such duplicates can be found through a process called record linkage. This paper describes a set of duplicate records in a medical data warehouse that were found by linking the warehouse to an external resource containing family history and vital records. Our objective was to investigate how database characteristics and linkage methods affect the identification of duplicate records when an external resource is used. Frequency counts of demographic field values were computed and compared across the set of duplicate records, the data warehouse, and the external resource. From these comparisons we identified considerations for understanding how records labeled as duplicates relate to dataset characteristics and linkage methods. Several noticeable patterns emerged in which frequency counts deviated from what was expected, including how the growth of a minority population affected which records were identified as duplicates. Record linkage is a complex process whose results can be affected by subtleties in data characteristics, changes in data trends, and reliance on external data sources. These factors should be taken into account to ensure that any anomalies in results describe real effects and are not artifacts of the datasets or linkage methods. This paper shows how frequency count analysis can be an effective way to detect and resolve anomalies in linkage results, and how external resources that provide additional contextual information can prove useful in discovering duplicate records.
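The frequency count comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `sex` field, and the toy records are hypothetical, and it assumes each record is a dictionary of demographic fields. It computes each value's share within a set and reports how far the duplicate set deviates from a reference set.

```python
from collections import Counter


def relative_frequencies(records, field):
    """Return each value's share of the non-missing values for `field`."""
    values = [r[field] for r in records if r.get(field) is not None]
    total = len(values)
    if total == 0:
        return {}
    return {value: count / total for value, count in Counter(values).items()}


def frequency_deviation(subset, reference, field):
    """Share in `subset` minus share in `reference`, for every value seen in either set."""
    sub = relative_frequencies(subset, field)
    ref = relative_frequencies(reference, field)
    return {v: sub.get(v, 0.0) - ref.get(v, 0.0) for v in set(sub) | set(ref)}


# Hypothetical toy data: the reference set and the records flagged as duplicates.
warehouse = [{"sex": "F"}, {"sex": "F"}, {"sex": "M"}, {"sex": "M"}]
duplicates = [{"sex": "F"}, {"sex": "F"}, {"sex": "F"}, {"sex": "M"}]

deviation = frequency_deviation(duplicates, warehouse, "sex")
for value, delta in sorted(deviation.items()):
    print(f"{value}: {delta:+.2f}")
```

A value whose share in the duplicate set differs markedly from its share in the reference set is a candidate anomaly; whether it reflects a real effect or an artifact of the data or the linkage method still has to be judged separately.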
IOS Press, Inc.