Improving Data Utilization of K-anonymity through Clustering Optimization
Hewen Wang (a), Jingsha He (a), Nafei Zhu (a, *)
Transactions on Data Privacy 15:3 (2022) 177 - 192
(a) Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China.
e-mail: hewenwang@emails.bjut.edu.cn; jhe@bjut.edu.cn; znf@bjut.edu.cn
Abstract
The k-anonymity privacy protection model performs well in protecting privacy and has been widely applied in scenarios such as data publishing, location-based services, and social networks. To make k-anonymity meet privacy protection requirements while improving data utilization, this study proposes a k-anonymity algorithm based on central point clustering. The algorithm improves clustering quality by optimizing the selection of cluster centroids, which in turn improves the effectiveness and efficiency of k-anonymity. After clustering, the quasi-identifier attributes are aligned for classification and generalization, and the result is evaluated with appropriate information loss metrics. To measure the distance between two records and between a record and a cluster, this study also defines a distance that combines the depth and width of the generalization hierarchy and is positively correlated with the amount of information lost, with the aim of improving the utility of the algorithm. Experimental results show that the proposed algorithm not only meets the basic anonymity requirements but also improves data utilization compared with several prevailing algorithms.
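The abstract does not give the algorithm's exact definitions, so the following is only a minimal Python sketch of the general idea of clustering-based k-anonymity, not the authors' method. It assumes two quasi-identifiers (a numeric age and a categorical job with a small hypothetical taxonomy `PARENT`). The categorical loss blends a depth term (how high the lowest common ancestor sits) and a width term (how many leaves it covers) using a hypothetical weight `alpha`. The greedy procedure (farthest-first choice of each new center, growth by least information loss, and the medoid as the cluster's central point) is a swapped-in illustrative heuristic. All names (`cat_loss`, `cluster_loss`, `cluster_k_anonymous`, `generalize`) are hypothetical.

```python
# Hypothetical taxonomy (child -> parent) for one categorical quasi-identifier.
PARENT = {
    "Engineer": "Technical", "Scientist": "Technical",
    "Lawyer": "Professional", "Doctor": "Professional",
    "Technical": "Any", "Professional": "Any",
}
LEAVES = [v for v in PARENT if v not in PARENT.values()]


def ancestors(value):
    """Path from a value up to the root, including the value itself."""
    path = [value]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


HEIGHT = max(len(ancestors(leaf)) - 1 for leaf in LEAVES)


def lca(values):
    """Lowest common ancestor of the given taxonomy values."""
    common = set(ancestors(values[0]))
    for v in values[1:]:
        common &= set(ancestors(v))
    # The first common node on the way up from values[0] is the lowest one.
    return next(a for a in ancestors(values[0]) if a in common)


def cat_loss(values, alpha=0.5):
    """Assumed loss of generalizing `values` to their LCA: a blend of a depth
    term (LCA height above the leaves) and a width term (leaves it covers)."""
    node = lca(values)
    depth_term = ancestors(values[0]).index(node) / HEIGHT
    covered = sum(1 for leaf in LEAVES if node in ancestors(leaf))
    width_term = (covered - 1) / (len(LEAVES) - 1)
    return alpha * depth_term + (1 - alpha) * width_term


def cluster_loss(records, age_range):
    """Loss of generalizing a group of (age, job) records to one shared tuple."""
    ages = [age for age, _ in records]
    return (max(ages) - min(ages)) / age_range + cat_loss([job for _, job in records])


def medoid(cluster, age_range):
    """Central point: the member with the smallest summed pairwise distance."""
    return min(cluster, key=lambda r: sum(cluster_loss([r, o], age_range) for o in cluster))


def cluster_k_anonymous(records, k):
    """Greedily partition records into clusters of at least k members."""
    if len(records) < k:
        raise ValueError("need at least k records")
    ages = [age for age, _ in records]
    age_range = (max(ages) - min(ages)) or 1
    remaining, clusters = list(records), []
    prev_center = remaining[0]
    while len(remaining) >= k:
        # New center: the record farthest from the previous cluster's medoid.
        center = max(remaining, key=lambda r: cluster_loss([prev_center, r], age_range))
        remaining.remove(center)
        cluster = [center]
        while len(cluster) < k:
            # Grow with the record whose addition loses the least information.
            best = min(remaining, key=lambda r: cluster_loss(cluster + [r], age_range))
            remaining.remove(best)
            cluster.append(best)
        clusters.append(cluster)
        prev_center = medoid(cluster, age_range)
    # Fewer than k records left: attach each where the loss grows least.
    for r in remaining:
        target = min(clusters, key=lambda c: cluster_loss(c + [r], age_range) - cluster_loss(c, age_range))
        target.append(r)
    return clusters, age_range


def generalize(cluster):
    """Replace every record in a cluster by the same (age range, LCA) tuple."""
    ages = [age for age, _ in cluster]
    label = lca([job for _, job in cluster])
    return [(f"{min(ages)}-{max(ages)}", label) for _ in cluster]


if __name__ == "__main__":
    data = [(25, "Engineer"), (27, "Scientist"), (45, "Lawyer"),
            (47, "Doctor"), (30, "Engineer"), (50, "Doctor")]
    clusters, age_range = cluster_k_anonymous(data, k=3)
    for c in clusters:
        print(generalize(c))
    print(sum(len(c) * cluster_loss(c, age_range) for c in clusters))
```

With `k=3`, the six sample records split into one group generalized to `("45-50", "Professional")` and another generalized to `("25-30", "Technical")`. Because the distance is defined as the information loss of merging, minimizing it while growing a cluster directly favors groups whose generalization loses the least. The paper's own centroid optimization and loss metric may differ from this greedy heuristic.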