Alphons Eggerth, Dieter Hayn, Karl Kreiner, Sai Veeranki, Heimo Traninger, Robert Modre-Osprian, Günter Schreier
Abstract
Background: Healthcare providers and other institutions collect huge amounts of data. However, data protection regulations limit their secondary use, e.g. for research. In scenarios where data are obtained from several sources without universal identifiers, record linkage methods need to be applied to obtain a comprehensive dataset.
Objectives: Our objective was to link two datasets comprising data from ergometric performance tests in order to obtain reference values for free-text annotations, against which the data quality of these annotations could be assessed.
Methods: We applied an iterative, distance-based time series record linkage algorithm to find corresponding entries in the two datasets and subsequently assessed the resulting matching rate. The algorithm was implemented in Matlab.
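The abstract does not specify which features, distance measure or stopping rule the algorithm uses. The following is only a minimal Python sketch of one possible iterative, distance-based linkage, assuming that each record has already been reduced to an equal-length numeric series (e.g. resampled workload or heart-rate values), that Euclidean distance is used, and that the closest remaining pair is accepted in each iteration until no pair falls below a threshold. The names `link_records`, `matching_rate` and `max_distance`, as well as the toy values, are hypothetical; this is not the authors' Matlab implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

Link = Tuple[str, str, float]  # (left id, right id, distance)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length series (assumed pre-aligned)."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def link_records(
    left: Dict[str, Sequence[float]],
    right: Dict[str, Sequence[float]],
    max_distance: float,  # hypothetical acceptance threshold
) -> List[Link]:
    """Greedy one-to-one linkage: repeatedly accept the closest remaining pair,
    remove both records from further consideration, and stop once no remaining
    pair is closer than max_distance."""
    # All pairwise distances, sorted ascending (ties broken by record ids).
    candidates = sorted(
        (euclidean(ls, rs), lid, rid)
        for lid, ls in left.items()
        for rid, rs in right.items()
    )
    used_left: set = set()
    used_right: set = set()
    links: List[Link] = []
    for dist, lid, rid in candidates:
        if dist > max_distance:
            break  # every remaining pair is at least this far apart
        if lid in used_left or rid in used_right:
            continue  # one side is already linked
        links.append((lid, rid, dist))
        used_left.add(lid)
        used_right.add(rid)
    return links


def matching_rate(links: List[Link], n_left: int) -> float:
    """Share of left-hand records (e.g. patients) that received a link."""
    return len(links) / n_left if n_left else 0.0


# Toy example with invented values (illustration only).
patients = {"p1": [100, 125, 150], "p2": [50, 75, 100], "p3": [200, 250, 300]}
ergometry = {"e1": [51, 74, 100], "e2": [100, 126, 149], "e3": [10, 10, 10]}

links = link_records(patients, ergometry, max_distance=5.0)
rate = matching_rate(links, len(patients))
print([(l, r) for l, r, _ in links], f"{rate:.1%}")
```

With these toy values, p1 is linked to e2 and p2 to e1, while p3 stays unmatched, giving a matching rate of 66.7%. The greedy closest-pair-first loop is chosen here for brevity; a global assignment method (minimising the total distance over all pairs) would be an equally plausible alternative.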
Results: Our record linkage algorithm matched 74.5% of the patients' records with their ergometry records. The highest rate of appropriate free-text annotations was 87.9%.
Conclusion: In the given scenario, our algorithm matched 74.5% of the patients. However, no gold standard was available to validate these results. Most of the free-text annotations contained the expected values.