Publicly available datasets, for example those accessible via cBioPortal for Cancer Genomics, could be a valuable source for benchmarks and comparisons with local patient records. However, such an approach is only valid if the patient cohorts are comparable and the documentation is complete and sufficient. In this paper, records of patients with exocrine pancreatic cancer documented in a local cancer registry are compared with two public datasets with respect to overall survival. Several data preprocessing steps were necessary to ensure comparability of the datasets, and a common database schema was created. Our assumption that the public datasets could be used to augment the data of the local cancer registry could not be validated, since the analysis of overall survival showed a significant difference. We discuss several possible reasons for this finding. For now, comparisons between different datasets, and medical conclusions drawn from them, should be treated with great caution.