Privacy-Preserving Hierarchical-k-means Clustering on Horizontally Partitioned Data

Anrong Xue School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang, China. Correspondence: [email protected]

Dongjie Jiang School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang, China

Shiguang Ju School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang, China

Weihe Chen School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang, China

Handa Ma School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang, China

Abstract

Privacy-preserving mining of distributed data is an important direction in data mining, and privacy-preserving clustering is one of its main research topics. Privacy-preserving data mining techniques enable knowledge discovery without requiring disclosure of private data. Existing privacy-preserving algorithms concentrate mainly on association rules and classification; only a few address clustering, and those focus mainly on centralized and vertically partitioned data. We therefore propose a privacy-preserving hierarchical k-means clustering algorithm on horizontally partitioned data, denoted HPPHKC.

Because the complexity of k-means clustering is only O(n) in the number of objects n, most existing privacy-preserving clustering algorithms are built on k-means and assume two parties plus a trusted third party. These algorithms have three drawbacks: their results are inaccurate because the initial cluster centers are chosen randomly, they are difficult to extend to multiple parties, and they risk revealing private data because they depend too heavily on the third party. By introducing three secure multi-party computation protocols (distance computation, cluster center computation, and standardization) and combining the merits of hierarchical and k-means clustering, we present a privacy-preserving hierarchical-k-means clustering algorithm on horizontally partitioned data for semi-honest parties. The algorithm uses these protocols to protect the private data, uses hierarchical clustering to obtain k cluster centers, and then uses k-means clustering to obtain the final k clusters. We introduce the clustering feature and the clustering feature tree, which are used to summarize the cluster representations. A clustering feature (CF) is a three-component vector summarizing information about a cluster of objects. The i-th clustering feature is CF_i = (cn_i, cc_i, cp_i), where cn_i is the number of objects in the i-th cluster (its size), cc_i is the center of those cn_i objects, and cp_i is a pointer to the list of the cn_i objects. The algorithm has two phases. In the first phase, every object starts as its own cluster; a secure computation protocol is used to compute the dissimilarity matrix, and the most similar clusters are merged. This process is repeated until the specified number of clusters k is reached, which yields k cluster centers. In the second phase, the semi-honest third party and all data-holding parties use the k-means algorithm to refine the results of the first phase and obtain the final clustering. Finally, we give a security proof and a communication-cost analysis of the algorithm, and show that our scheme is secure and complete with good efficiency.
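The two-phase structure described in the abstract can be illustrated with a minimal plaintext sketch. It runs on a single machine and deliberately omits the secure multi-party protocols, so it shows only the clustering logic, not the privacy mechanism. The names CF, merge, hierarchical_phase, and kmeans_phase are hypothetical, and the use of squared Euclidean distance between cluster centers as the dissimilarity measure is an assumption; the abstract does not specify either.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, ...]

@dataclass
class CF:
    """Clustering feature (cn, cc, cp), following the abstract's definition."""
    cn: int                                        # number of objects in the cluster
    cc: Point                                      # center (mean) of those objects
    cp: List[Point] = field(default_factory=list)  # list of the member objects

def sq_dist(a: Point, b: Point) -> float:
    # Assumed dissimilarity: squared Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def merge(a: CF, b: CF) -> CF:
    # The merged center is the size-weighted mean of the two centers.
    n = a.cn + b.cn
    cc = tuple((a.cn * x + b.cn * y) / n for x, y in zip(a.cc, b.cc))
    return CF(n, cc, a.cp + b.cp)

def hierarchical_phase(points: List[Point], k: int) -> List[CF]:
    """Phase 1: start from singletons and merge the closest pair until k remain."""
    cfs = [CF(1, p, [p]) for p in points]
    while len(cfs) > k:
        i, j = min(
            ((i, j) for i in range(len(cfs)) for j in range(i + 1, len(cfs))),
            key=lambda ij: sq_dist(cfs[ij[0]].cc, cfs[ij[1]].cc),
        )
        merged = merge(cfs[i], cfs[j])
        cfs = [c for t, c in enumerate(cfs) if t not in (i, j)] + [merged]
    return cfs

def assign(points: List[Point], centers: List[Point]) -> List[List[Point]]:
    # Put each point in the group of its nearest center.
    groups: List[List[Point]] = [[] for _ in centers]
    for p in points:
        idx = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
        groups[idx].append(p)
    return groups

def kmeans_phase(points: List[Point], centers: List[Point], max_iter: int = 100) -> List[CF]:
    """Phase 2: refine the phase-1 centers with standard k-means iterations."""
    centers = list(centers)
    for _ in range(max_iter):
        groups = assign(points, centers)
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    groups = assign(points, centers)
    return [CF(len(g), c, g) for g, c in zip(groups, centers)]

if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    seeds = hierarchical_phase(pts, k=2)
    for cf in kmeans_phase(pts, [s.cc for s in seeds]):
        print(cf.cn, cf.cc)
```

In the setting the abstract describes, the distance and center computations would presumably be carried out through the secure protocols across the data-holding parties rather than on pooled data as they are here.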


This work was supported by the National Natural Science Foundation of China (Nos. 60603041 and 60773049), the Science Foundation of Jiangsu Education Council (No. 05KJB520017), and the Science Foundation of Jiangsu (No. BK2006073).
