Abstract
Large datasets pose a difficult challenge for clustering algorithms due to memory limitations and execution speed. Clustering is typically addressed with current popular techniques: K-Means and DBScan, which are inherently tightly coupled to all points in the data set. K-Means clustering is based on cluster centres and requires prior knowledge of the number of classes present in the dataset. DBScan relaxes this constraint but retains the need for a complete dataset during computation. In this paper, a novel ‘self’-learning primitive unsupervised technique is presented that addresses the tight coupling and is readily distributable. The technique follows the comparison to class averages similar to K-Means yet relaxes the constraint of prior knowledge of the number of classes, similar to DBScan. The algorithm competes well with the standardised K-Means and DBScan variants in the context of physically based observations where Gaussian noise can be presumed. An application of usage of the unsupervised technique is presented; the classification of unknown whale species in the cook strait of New Zealand is shown to perform well.
GRAPHICAL ABSTRACT
Disclosure statement
No potential conflict of interest was reported by the author(s).
Acknowledgments
The authors would like to thank Yvette Perrot for her tireless support, advice, and knowledge of Astrophysical observation and connecting me with presenting to the Square Kilometer Array Organisation (SKAO).