ABSTRACT
Clustering is a data mining technique that has been used effectively for knowledge extraction over the last few decades. Privacy is a major concern when releasing data for clustering, and privacy-preserving data mining (PPDM) algorithms have therefore been developed. Aggregation is a popular PPDM technique. In the last few years, however, certain applications have required that data mining be performed on high-dimensional data. Existing privacy-preservation techniques perform aggregation in a univariate manner along each dimension, which degrades utility measures, information measures, and especially the retention of the original clusters. This paper proposes a new technique called subspace-based aggregation (SBA). SBA categorizes the dimensions into dense and non-dense subspaces based on the density of points, and aggregation is performed separately for the dense and non-dense subspaces. This approach helps to maximize utility measures, information measures, and the retention of clusters. SBA is run on high-dimensional continuous datasets from the UCI Machine Learning Repository and is compared with related methods such as SINGLE, SIMPLE, MDAV, and PPPCA. SBA provides an improvement of 66% in utility, 400% in cluster identification, and 5% in covariance and standard deviation.
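The two-stage idea described above (split the dimensions by density, then aggregate each subspace separately) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not state how density is measured or how records are grouped, so the histogram-based density proxy, the `threshold` and `bins` parameters, the sum-based ordering of records, and all function names below are assumptions chosen only for illustration.

```python
from statistics import mean


def dimension_density(column, bins=10):
    """Assumed density proxy: share of points in the fullest equal-width bin."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return 1.0  # a constant column is treated as maximally dense
    counts = [0] * bins
    for v in column:
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    return max(counts) / len(column)


def split_subspaces(data, threshold=0.5, bins=10):
    """Partition dimension indices into dense and non-dense lists."""
    dense, non_dense = [], []
    for d in range(len(data[0])):
        column = [row[d] for row in data]
        target = dense if dimension_density(column, bins) >= threshold else non_dense
        target.append(d)
    return dense, non_dense


def aggregate_subspace(data, dims, k):
    """Microaggregate the given dimensions jointly in groups of at least k records.

    Records are ordered by the sum of their values over `dims` (an assumed,
    simplistic ordering) and each group is replaced by its centroid.
    """
    out = [list(row) for row in data]
    if not dims:
        return out
    order = sorted(range(len(data)), key=lambda i: sum(data[i][d] for d in dims))
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()       # fold an undersized last group
        groups[-1].extend(tail)   # into the preceding one
    for group in groups:
        for d in dims:
            centroid = mean(data[i][d] for i in group)
            for i in group:
                out[i][d] = centroid
    return out


def sba(data, k=3, threshold=0.5, bins=10):
    """Aggregate the dense and non-dense subspaces separately."""
    dense, non_dense = split_subspaces(data, threshold, bins)
    out = aggregate_subspace(data, dense, k)
    out = aggregate_subspace(out, non_dense, k)
    return out, dense, non_dense


# Usage on a toy dataset: column 0 is concentrated, column 1 is spread out.
records = [[1.0, 0.0], [1.1, 2.0], [1.2, 4.0], [1.05, 6.0], [1.15, 8.0], [9.0, 10.0]]
masked, dense_dims, non_dense_dims = sba(records, k=3)
```

Because every group is replaced by its centroid, the per-dimension mean of the released data equals that of the original data in this sketch; how well cluster structure is retained depends on the density criterion and grouping rule, which the full paper defines and this sketch only approximates.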
Disclosure statement
No potential conflict of interest was reported by the authors.
Additional information
Notes on contributors
Shashidhar Virupaksha
Shashidhar Virupaksha received his BTech degree in Computer Science Engineering from SASTRA University and his Masters in Computer Science Engineering from BIT Mesra. He has worked at WIPRO Technologies on information security and data privacy projects, where he was involved in finding critical flaws in ESSO Software. He was Head of the Department of CSE at VLITS Guntur and is presently working at Presidency University, Bengaluru. He is pursuing his PhD at VFSTR Deemed to be University, Guntur, Andhra Pradesh. He has publications in Springer and IEEE conferences.
D. Venkatesulu
D. Venkatesulu received his MTech degree (1988) from Andhra University, Visakhapatnam, and his PhD (1999) from IIT Madras. He worked in the IT industry for 18 years and is currently Professor and Head of the Department of Computer Science and Engineering, VFSTR Deemed to be University, Guntur, Andhra Pradesh. His areas of interest include distributed systems, data mining, and wireless sensor networks.