Registries of clinical studies such as ClinicalTrials.gov are an important source of information. However, manual entry of metadata is prone to errors, which impede the use of the metadata and thereby reduce the overall usefulness of the registry. In this work, we propose a generic approach to detecting errors in the metadata, using the Shapes Constraint Language (SHACL) to define rule templates that cover constraints on value type and cardinality. We developed a Python 3 algorithm for the automatic validation of 15 rule instances and applied it to the whole ClinicalTrials.gov database (355,862 studies as of 27 October 2020), resulting in more than 5 million metadata verifications. Our results show that a large number of errors in different metadata fields can be detected using this approach, such as i) missing values, ii) values not coming from a predefined set, or iii) wrong cardinalities. Since 2015, approximately 5% of all studies contain one or more errors. In the future, we will apply this technique to other registries and develop more complex rules that focus on the semantics of the metadata. This could make it possible to correct entries automatically, increasing the value of registries of clinical studies.
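To illustrate the three error classes named above, the following is a minimal plain-Python sketch of value-type, value-set, and cardinality checks. It is not the authors' implementation and does not use SHACL itself; the `Rule` class, the `validate` function, the field names, and the allowed values are all hypothetical, chosen only to loosely mirror SHACL constraints such as `sh:minCount`, `sh:maxCount`, `sh:in`, and `sh:datatype`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule instance, loosely mirroring a SHACL property shape."""
    field: str
    min_count: int = 0                     # cf. sh:minCount
    max_count: Optional[int] = None        # cf. sh:maxCount
    allowed: Optional[frozenset] = None    # cf. sh:in
    value_type: Optional[type] = None      # cf. sh:datatype


def validate(study: dict, rules: list) -> list:
    """Return a list of (field, message) pairs, one per detected error."""
    errors = []
    for rule in rules:
        raw = study.get(rule.field)
        # Normalise the entry to a list so cardinality can be counted.
        if raw is None:
            values = []
        elif isinstance(raw, list):
            values = raw
        else:
            values = [raw]
        # Assumption: empty strings and None count as missing values.
        values = [v for v in values if v not in ("", None)]

        if len(values) < rule.min_count:
            message = "missing value" if not values else "too few values"
            errors.append((rule.field, message))
        if rule.max_count is not None and len(values) > rule.max_count:
            errors.append((rule.field, "too many values"))

        for v in values:
            if rule.value_type is not None and not isinstance(v, rule.value_type):
                errors.append((rule.field, f"wrong type: {v!r}"))
            elif rule.allowed is not None and v not in rule.allowed:
                errors.append((rule.field, f"value not in allowed set: {v!r}"))
    return errors


# Illustrative rules and study record; field names and values are assumptions.
rules = [
    Rule("study_type", min_count=1, max_count=1, value_type=str,
         allowed=frozenset({"Interventional", "Observational"})),
    Rule("enrollment", min_count=1, max_count=1, value_type=int),
    Rule("condition", min_count=1, value_type=str),
]
study = {"study_type": "Interventionl", "enrollment": [100, 120], "condition": ""}

for field, message in validate(study, rules):
    print(f"{field}: {message}")
```

In this sketch each rule is a template instantiated for one metadata field, so the same checking logic can be reused across fields and, in principle, across registries. A production setup would more plausibly express the rules as SHACL shapes and run them with a SHACL validator.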