A record linkage algorithm tries to identify records that belong to the same individual. We analyze the matching behavior of an approach used in the E-PIX matching tool on the very limited attribute set of first name, last name, date of birth and sex. Our benchmark set contains almost 37,000 records from the Popgen biobank. We develop a model that allows us to predict the clerical review workload for data sets growing by a factor of 10 or more, without the need for a data set of that size. Based on this model we present two parameter sets with comparable detection rates for true duplicates, of which only one scales well on growing data sets. Our model provides realistic example records for each predicted match in an upscaled data set. This makes it possible to identify the parameters that need to be adjusted in order to improve the quality of the matching candidates. We also show that, with the limited attribute set above, unreviewed merging of records is prone to homonym errors on data sets with 200,000 records, even though the merged record pairs are obviously different under clerical review.
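To make the setting concrete, the following is a minimal sketch of threshold-based matching on the four attributes named above. It is not the E-PIX algorithm or the authors' scaling model: the weights, both thresholds, the field names and the use of `difflib` string similarity are all assumptions chosen for illustration. The helper `candidate_pairs` counts naive all-pairs comparisons, an upper bound that blocking would reduce, and shows why review workload can grow much faster than the data set itself.

```python
from difflib import SequenceMatcher

# Hypothetical weights and thresholds, for illustration only.
WEIGHTS = {"first_name": 0.3, "last_name": 0.3, "birth_date": 0.3, "sex": 0.1}
MATCH_THRESHOLD = 0.95   # at or above: merge automatically
REVIEW_THRESHOLD = 0.80  # at or above (but below match): send to clerical review


def similarity(a, b):
    """Weighted similarity of two records, in the range 0..1."""
    score = 0.0
    for field, weight in WEIGHTS.items():
        if field in ("first_name", "last_name"):
            # Fuzzy comparison for names, tolerant of small typos.
            ratio = SequenceMatcher(None, a[field].lower(), b[field].lower()).ratio()
            score += weight * ratio
        else:
            # Exact comparison for date of birth and sex.
            score += weight * (1.0 if a[field] == b[field] else 0.0)
    return score


def classify(a, b):
    """Sort a record pair into match, review or non-match."""
    s = similarity(a, b)
    if s >= MATCH_THRESHOLD:
        return "match"
    if s >= REVIEW_THRESHOLD:
        return "review"
    return "non-match"


def candidate_pairs(n):
    """Number of unordered record pairs among n records (no blocking)."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Round figure based on the benchmark size; the factor 10 is the upscaling case.
    small, large = 37_000, 370_000
    print(candidate_pairs(large) / candidate_pairs(small))
```

Because the number of pairs grows quadratically, a tenfold increase in records yields roughly a hundredfold increase in pairs to compare, so even a small per-pair rate of review candidates can become a large absolute workload.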