Record linkage across diverse domains is a challenging task, with industrial applications ranging from medical records to social media identity linkage. In this study, we present a complex case from the music industry: linking incomplete metadata across domains, from Sound Recordings (SRs) to their corresponding Musical Works (WKs). We define the problem and highlight its key aspects: comparing record fields beyond conventional string similarity; matching lists of names that only partially align; applying attribute rules, as some attribute values may reflect the quality of the information; and applying contextual rules, since the match between an SR and a WK should be evaluated in the context of related WKs. We present a synthetic benchmark that replicates the complexities of real-world industry problems. Although the model is not the focus of the paper, we also report preliminary results of a Transformer-based model that leverages pre-trained embeddings of entity attribute values along with information from the aforementioned key aspects.
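One of the aspects named above, matching lists of names that only partially align, can be illustrated with a small sketch. This is not the paper's method, because the abstract does not describe its matching procedure. The helpers `normalize`, `name_similarity` and `list_match_score` are hypothetical. They strip accents, lowercase and sort name tokens so that "Lennon, John" and "John Lennon" compare equal, score name pairs with Python's standard `difflib.SequenceMatcher`, and greedily pair names one-to-one above an assumed threshold of 0.85. The score divides by the shorter list, an overlap-coefficient-style choice that lets an SR crediting only some of a WK's writers still score highly.

```python
from difflib import SequenceMatcher
import unicodedata


def normalize(name: str) -> str:
    """Hypothetical normalizer: drop accents, lowercase, sort tokens."""
    # NFKD splits accented letters into base letter + combining mark;
    # encoding to ASCII with "ignore" then drops the combining marks.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    # Treat commas as separators so "Lennon, John" and "John Lennon" align.
    tokens = ascii_name.lower().replace(",", " ").split()
    return " ".join(sorted(tokens))


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def list_match_score(sr_names, wk_names, threshold=0.85):
    """Share of names paired one-to-one across two partially aligned lists.

    Greedy pairing: each SR name takes its most similar remaining WK name,
    and the pair counts only if its similarity reaches the (assumed) threshold.
    Dividing by the shorter list rewards one list being a subset of the other.
    """
    if not sr_names or not wk_names:
        return 0.0
    remaining = list(wk_names)
    matched = 0
    for name in sr_names:
        if not remaining:
            break
        best = max(remaining, key=lambda cand: name_similarity(name, cand))
        if name_similarity(name, best) >= threshold:
            matched += 1
            remaining.remove(best)  # enforce one-to-one pairing
    return matched / min(len(sr_names), len(wk_names))


if __name__ == "__main__":
    sr = ["John Lennon", "Paul McCartney"]
    wk = ["Lennon, John", "McCartney, Paul", "George Martin"]
    print(list_match_score(sr, wk))
```

Greedy pairing is a simplification: when several names are similar, the result can depend on list order, and an optimal one-to-one assignment (for example with `scipy.optimize.linear_sum_assignment`) would avoid that.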
This website uses cookies
We use cookies to provide you with the best possible experience. They also allow us to analyze user behavior in order to constantly improve the website for you. Info about the privacy policy of IOS Press.