Automatic document classification is a common problem that has been addressed successfully with machine learning methods. However, these methods require extensive training data, which is not always readily available. Moreover, in privacy-sensitive settings, trained models cannot be transferred or reused, because sensitive information could potentially be reconstructed from them. We therefore propose a transfer learning method that uses ontologies to normalize the feature space of text classifiers into a controlled vocabulary. This ensures that the trained models contain no personal data and can be widely reused without violating the GDPR. Furthermore, the ontologies can be enriched so that the classifiers can be transferred to contexts with different terminology without additional training. Applying classifiers trained on medical documents to medical texts written in colloquial language shows promising results and highlights the potential of the approach. This GDPR compliance by design opens up many further application domains for transfer-learning-based solutions.
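The idea of normalizing the feature space through an ontology can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy ontology, the concept identifiers, the training sentences, and the simple count-based scoring that stands in for a real classifier. It shows only the mechanism: text is reduced to ontology concepts, terms outside the ontology (such as names) never reach the model, and adding colloquial synonyms to the ontology lets the unchanged model handle new wording.

```python
import re
from collections import Counter

# Hypothetical toy ontology: concept ID -> surface terms (labels and synonyms).
ONTOLOGY = {
    "C_MYOCARDIAL_INFARCTION": {"myocardial infarction"},
    "C_HYPERTENSION": {"hypertension"},
    "C_FRACTURE": {"fracture"},
    "C_ANALGESIC": {"analgesic"},
}

def build_lookup(ontology):
    """Invert the ontology into a term -> concept ID table."""
    return {term: cid for cid, terms in ontology.items() for term in terms}

def to_concepts(text, lookup, max_ngram=3):
    """Map text to a bag of concept IDs using greedy longest-match lookup."""
    tokens = re.findall(r"[a-z]+", text.lower())
    concepts = Counter()
    i = 0
    while i < len(tokens):
        for n in range(min(max_ngram, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lookup:
                concepts[lookup[phrase]] += 1
                i += n
                break
        else:
            i += 1  # token is not in the ontology, so it is dropped
    return concepts

def train(examples, lookup):
    """Sum concept counts per label; a stand-in for a real classifier."""
    model = {}
    for text, label in examples:
        model.setdefault(label, Counter()).update(to_concepts(text, lookup))
    return model

def predict(text, lookup, model):
    """Return the label with the highest concept overlap, or None if no match."""
    concepts = to_concepts(text, lookup)
    scores = {label: sum(count * weights[cid] for cid, count in concepts.items())
              for label, weights in model.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

# Invented training data containing personal names.
examples = [
    ("Patient John Smith admitted with myocardial infarction and hypertension.",
     "cardiology"),
    ("Jane Doe: fracture of the radius, analgesic prescribed.", "orthopedics"),
]
model = train(examples, build_lookup(ONTOLOGY))

# Enrich the ontology with colloquial synonyms; the model is NOT retrained.
enriched = {cid: set(terms) for cid, terms in ONTOLOGY.items()}
enriched["C_MYOCARDIAL_INFARCTION"].add("heart attack")
enriched["C_HYPERTENSION"].add("high blood pressure")

colloquial = "I had a heart attack last year and my high blood pressure is bad"
before = predict(colloquial, build_lookup(ONTOLOGY), model)
after = predict(colloquial, build_lookup(enriched), model)
```

In this sketch the model's only features are concept IDs, so the names in the training sentences are not stored in it. This illustrates the mechanism and is not by itself a demonstration of GDPR compliance. The colloquial sentence matches nothing with the original ontology and is classified only after the synonyms are added.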