Subspace-based aggregation for enhancing utility, information measures, and cluster identification in privacy preserved data mining on high-dimensional continuous data

Shashidhar Virupaksha & D. Venkatesulu
Pages 1130-1139 | Received 07 Aug 2019, Accepted 24 Oct 2019, Published online: 18 Nov 2019
 

ABSTRACT

Clustering is a data mining technique that has been used effectively for knowledge extraction over the last few decades. Privacy is a major concern when data are released for clustering, and privacy-preserving data mining (PPDM) algorithms have been developed to address it. Aggregation is a popular PPDM technique. In recent years, however, some applications have required data mining on high-dimensional data. Existing privacy preservation techniques perform aggregation in a univariate manner along each dimension, which degrades utility measures, information measures, and especially the retention of the original clusters. This paper proposes a new technique called subspace-based aggregation (SBA). SBA categorizes the dimensions into dense and non-dense subspaces based on the density of points, and performs aggregation separately for the dense and non-dense subspaces. This approach helps to maximize utility measures, information measures, and the retention of clusters. SBA is run on high-dimensional continuous datasets from the UCI Machine Learning Repository and compared with related methods such as SINGLE, SIMPLE, MDAV, and PPPCA. SBA provides an improvement of 66% in utility, 400% in cluster identification, and 5% in covariance and standard deviation.
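
The abstract does not specify how density is measured or how records are grouped, so the following Python sketch only illustrates the general idea of splitting dimensions into dense and non-dense subspaces and aggregating each subspace separately. It is not the authors' algorithm. The histogram-based density rule, the `bins` and `threshold` parameters, the centroid-distance grouping, and all function names are assumptions made for illustration.

```python
import numpy as np

def split_dimensions(X, bins=10, threshold=0.3):
    """Hypothetical density rule (an assumption, not from the paper):
    a dimension is 'dense' when its fullest histogram bin holds at
    least `threshold` of all points."""
    dense, non_dense = [], []
    for j in range(X.shape[1]):
        counts, _ = np.histogram(X[:, j], bins=bins)
        if counts.max() / X.shape[0] >= threshold:
            dense.append(j)
        else:
            non_dense.append(j)
    return dense, non_dense

def aggregate_subspace(X, dims, k):
    """Replace each record's values on `dims` with the mean of its group.
    Groups of at least k records are formed by sorting records on their
    distance to the subspace centroid (a simplified grouping, not MDAV)."""
    out = X.copy()
    if not dims:
        return out
    sub = X[:, dims]
    dist = np.linalg.norm(sub - sub.mean(axis=0), axis=1)
    order = np.argsort(dist, kind="stable")
    n_groups = max(1, len(order) // k)
    for group in np.array_split(order, n_groups):
        # Every record in the group receives the group's mean vector.
        out[np.ix_(group, dims)] = sub[group].mean(axis=0)
    return out

def sba_sketch(X, k=3, bins=10, threshold=0.3):
    """Aggregate the dense and non-dense subspaces independently."""
    X = np.asarray(X, dtype=float)
    dense, non_dense = split_dimensions(X, bins, threshold)
    out = aggregate_subspace(X, dense, k)
    out = aggregate_subspace(out, non_dense, k)
    return out, dense, non_dense
```

Because every group is replaced by its own mean, this sketch preserves each column's mean, and every released value is shared by at least `k` records within a subspace. How well variance, covariance, and cluster structure survive depends on the grouping rule, which is exactly the part the abstract leaves open.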

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Notes on contributors

Shashidhar Virupaksha

Shashidhar Virupaksha received his BTech degree in Computer Science Engineering from SASTRA University and his Master's in Computer Science Engineering from BIT Mesra. He worked at WIPRO Technologies on information security and data privacy projects. He was involved in finding critical flaws in ESSO Software. He was Head of the Department of CSE at VLITS, Guntur. He is presently working at Presidency University, Bengaluru, and is pursuing his PhD at VFSTR Deemed to be University, Guntur, Andhra Pradesh. He has publications in Springer and IEEE conferences.

D. Venkatesulu

D. Venkatesulu received his MTech degree (1988) from Andhra University, Visakhapatnam, and his PhD (1999) from IIT Madras. He worked in the IT industry for 18 years and is currently Professor and Head of the Department of Computer Science and Engineering, VFSTR Deemed to be University, Guntur, Andhra Pradesh. His areas of interest include distributed systems, data mining, and wireless sensor networks.
