Data quality (DQ) is an important prerequisite for the secondary use of electronic health record (EHR) data in clinical research, particularly with regard to progressing towards a learning health system, one of the MIRACUM consortium’s goals. Following the successful integration of the i2b2 research data repository in MIRACUM, we present a standardized and generic DQ framework.
State of the art:
Established DQ evaluation methods do not cover all of MIRACUM’s requirements.
A data quality analysis plan was developed to assess common data quality dimensions for demographic-, condition-, procedure- and department-related variables of MIRACUM’s research data repository.
A data quality analysis (DQA) tool was developed using R scripts packaged in a Docker image with all the necessary dependencies and R libraries for easy distribution. It integrates with the i2b2 data repository at each MIRACUM site, executes an analysis on the data and generates a DQ report.
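The kind of check such a tool runs can be illustrated with a minimal sketch. It is not the MIRACUM implementation, which is written in R and reads from i2b2; Python is used here only for illustration, the rows are in-memory stand-ins for repository data, and all names (`check_variable`, `render_report`, `birth_year`, `gender`) and plausibility rules are hypothetical. It counts missing and implausible values per variable and renders a small text report.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class CheckResult:
    """Counts for one variable; all field names are illustrative."""
    variable: str
    n_total: int
    n_missing: int
    n_implausible: int

    @property
    def completeness(self) -> float:
        # Share of rows with a non-missing value (0.0 for an empty input).
        return (1.0 - self.n_missing / self.n_total) if self.n_total else 0.0


def check_variable(
    rows: list[dict[str, Any]],
    variable: str,
    is_plausible: Optional[Callable[[Any], bool]] = None,
) -> CheckResult:
    """Count missing values and, if a rule is given, implausible values."""
    n_missing = 0
    n_implausible = 0
    for row in rows:
        value = row.get(variable)
        if value is None or value == "":
            n_missing += 1
        elif is_plausible is not None and not is_plausible(value):
            n_implausible += 1
    return CheckResult(variable, len(rows), n_missing, n_implausible)


def render_report(results: list[CheckResult]) -> str:
    """Render the results as a plain-text table."""
    lines = ["variable | n | missing | implausible | completeness"]
    for r in results:
        lines.append(
            f"{r.variable} | {r.n_total} | {r.n_missing} | "
            f"{r.n_implausible} | {r.completeness:.2f}"
        )
    return "\n".join(lines)


# Hypothetical demographic rows standing in for data read from a repository.
rows = [
    {"birth_year": 1954, "gender": "f"},
    {"birth_year": 2150, "gender": "m"},  # implausible year
    {"birth_year": None, "gender": "x"},  # missing year, unknown gender code
    {"birth_year": 1987, "gender": ""},   # missing gender
]
results = [
    check_variable(rows, "birth_year", lambda y: 1900 <= y <= 2025),
    check_variable(rows, "gender", lambda g: g in {"f", "m", "d"}),
]
print(render_report(results))
```

Because the checks run where the data resides and only aggregate counts enter the report, a sketch like this mirrors the idea of bringing the analysis to the data rather than exporting patient-level records.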
Our DQA tool brings the analysis to the data and thus meets the MIRACUM data protection requirements. It evaluates established DQ dimensions of data repositories in a standardized and easily distributable way. The analysis allowed us to reveal and correct inconsistencies in earlier versions of the ETL jobs. The framework is portable, easy to deploy across different sites and can be adapted to other database schemas.
The presented framework is a first step towards a unified, standardized and harmonized EHR DQ assessment in MIRACUM. Individual hospitals can now systematically identify DQ issues and subsequently implement site- or consortium-wide feedback loops to improve data quality.