As a guest user you are not logged in or recognized by your IP address. You have
access to the Front Matter, Abstracts, Author Index, Subject Index and the full
text of Open Access publications.
Background: A challenge of using electronic health records for secondary analyses is data quality. Body mass index (BMI) is an important predictor for various diseases but often not documented properly.
Objectives: The aim of our study is to perform data cleansing on BMI values and to find the best method for an imputation of missing values in order to increase data quality. Further, we want to assess the effect of changes in data quality on the performance of a prediction model based on machine learning.
Methods: After data cleansing on BMI data, we compared machine learning methods and statistical methods in their accuracy of imputed values using the root mean square error. In a second step, we used three variations of BMI data as a training set for a model predicting the occurrence of delirium.
Results: Neural network and linear regression models performed best for imputation. There were no changes in model performance for different BMI input data.
Conclusion: Although data quality issues may lead to biases, it does not always affect performance of secondary analyses.
This website uses cookies
We use cookies to provide you with the best possible experience. They also allow us to analyze user behavior in order to constantly improve the website for you. Info about the privacy policy of IOS Press.
This website uses cookies
We use cookies to provide you with the best possible experience. They also allow us to analyze user behavior in order to constantly improve the website for you. Info about the privacy policy of IOS Press.