Biomedical research has become data-driven. To create the large datasets this requires, health data needs to be shared or reused outside the context of its initial purpose, which leads to significant privacy challenges. Data anonymization is an important protection method in which data is transformed so that privacy guarantees can be provided according to formal models. For practical applications, anonymization methods need to be integrated into scalable and robust tools. In this work, we focus on the problem of scalability.
Protecting biomedical data from inference attacks is challenging, in particular for numeric data. An important privacy model in this context is t-closeness, which has also been defined for attribute values that are totally ordered. However, directly translating the mathematical definition of the model into a scalable algorithm proves difficult. In this paper, we therefore present a series of optimizations that can be used to achieve efficiency in production use. An experimental evaluation shows that our approach reduces execution times of anonymization processes involving t-closeness by up to a factor of two.
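For orientation, the following is a minimal sketch of what a direct implementation of t-closeness for totally ordered attribute values might look like. It assumes the commonly used ordered-distance form of the Earth Mover's Distance, in which the distance between the distribution of a group of records and the overall distribution is the sum of the absolute cumulative differences divided by m - 1, with m the number of distinct values. The function names `ordered_emd` and `satisfies_t_closeness` are hypothetical, and the code illustrates only the unoptimized baseline, not the optimizations presented in this work.

```python
from collections import Counter
from itertools import accumulate


def ordered_emd(class_values, all_values):
    """Ordered-distance EMD between one group and the whole dataset.

    Assumes both lists are non-empty and that the values are mutually
    comparable, so that sorting yields the total order of the domain.
    """
    domain = sorted(set(all_values))
    m = len(domain)
    if m < 2:
        # With a single distinct value, both distributions are identical.
        return 0.0

    p_counts = Counter(class_values)
    q_counts = Counter(all_values)
    n_p = len(class_values)
    n_q = len(all_values)

    # Per-value difference of relative frequencies, in domain order.
    diffs = [p_counts[v] / n_p - q_counts[v] / n_q for v in domain]

    # Sum of absolute cumulative differences, normalized by m - 1.
    return sum(abs(c) for c in accumulate(diffs)) / (m - 1)


def satisfies_t_closeness(classes, t):
    """Check every group of sensitive values against the threshold t."""
    all_values = [v for group in classes for v in group]
    return all(ordered_emd(group, all_values) <= t for group in classes)
```

As a usage example, for the nine sensitive values 3 to 11 and a group holding the values 3, 4 and 5, `ordered_emd([3, 4, 5], list(range(3, 12)))` evaluates to 0.375, so that group would violate t-closeness for any t below 0.375. Because this sketch recomputes the full distribution for every group, it shows the kind of cost that a naive implementation incurs when many candidate transformations have to be checked.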