ABSTRACT
Cluster analysis is one of the most popular, and often the first, tasks in big data analytics because it helps uncover hidden patterns and trends in data. Traditional single-objective clustering techniques often suffer from fluctuating accuracy, especially when applied to groups of varying density and imbalanced distribution, or in the presence of outliers. This paper presents a multi-phase clustering solution that achieves good accuracy even on noisy and not-well-separated (linearly non-separable) data. The proposed design combines a two-stage Particle Swarm Optimisation (PSO) clustering with K-means logic and a state-of-the-art outlier removal technique. Using a different optimisation criterion in each of the two PSO stages enables the model to escape local minima during convergence. Extensive experiments on a wide variety of data have been carried out; the system achieved accuracy as high as 99.9%, and an average of 87.4% on not-well-separated data. The model has also been shown to be robust on eight of the ten datasets of the Fundamental Clustering Problems Suite (FCPS), a benchmark for clustering algorithms.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Data references
[1] The Fundamental Clustering Problems Suite (FCPS), https://www.uni-marburg.de/fb12/arbeitsgruppen/datenbionik/data?language_sync=1
[2] Dua, D. & Graff, C. (2019). UCI Machine Learning Repository, http://archive.ics.uci.edu/ml. Irvine, CA: University of California, School of Information and Computer Science.