Research Articles

How many clusters exist? Answer via maximum clustering similarity implemented in R

Ahmed N. Albatineh, Meredith L. Wilcox, Bashar Zogheib & Magdalena Niewiadomska-Bugaj
Pages 62-79 | Received 07 Jul 2018, Accepted 20 Apr 2019, Published online: 22 May 2019
 

Abstract

Finding the number of clusters in a data set is considered one of the fundamental problems in cluster analysis. This paper integrates maximum clustering similarity (MCS), a method for finding the optimal number of clusters, into the R© statistical software through the package MCSim. The similarity between two clustering methods is calculated at the same number of clusters, using the Rand [Objective criteria for the evaluation of clustering methods. J Am Stat Assoc. 1971;66:846–850.] and Jaccard [The distribution of the flora of the alpine zone. New Phytologist. 1912;11:37–50.] indices, corrected for chance agreement. The number of clusters at which the index most frequently attains its maximum is a candidate for the optimal number of clusters. Unlike other criteria, MCS can be used with circular data. Seven clustering algorithms available in R© are implemented in MCSim. The package produces a graph of the number of clusters vs. cluster similarity using the corrected similarity indices, along with the values of the similarity indices and a clustering tree (dendrogram). Several examples, including simulated, real, and circular data sets, are presented to show how MCSim works successfully in practice.
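The selection rule described in the abstract can be illustrated with a small sketch. This is not the MCSim package or its API: it is a plain Python illustration that assumes the chance correction for the Rand index takes the familiar Hubert-Arabie (adjusted Rand) form, and that "most frequently" means a vote over pairs of clustering methods. The function names `adjusted_rand` and `most_frequent_best_k`, the method names, and the hand-written labelings are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand(a, b):
    """Adjusted Rand index (Hubert-Arabie form, assumed here) for two labelings.

    Requires at least two observations.
    """
    n = len(a)
    # Pairs of points placed together by both partitions, by a, and by b.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = pairs_a * pairs_b / comb(n, 2)  # agreement expected by chance
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (pairs_both - expected) / (max_index - expected)


def most_frequent_best_k(labelings):
    """labelings maps method name -> {k: labels}. Returns (best k, vote counts).

    For every pair of methods, find the k with the highest corrected
    similarity, then report the k that wins for the most pairs.
    Ties go to the smallest k (an arbitrary choice for this sketch).
    """
    votes = Counter()
    for m1, m2 in combinations(sorted(labelings), 2):
        shared = sorted(set(labelings[m1]) & set(labelings[m2]))
        scores = {k: adjusted_rand(labelings[m1][k], labelings[m2][k])
                  for k in shared}
        votes[max(scores, key=scores.get)] += 1
    return votes.most_common(1)[0][0], dict(votes)


# Hypothetical labelings of six points by three methods at k = 2, 3, 4.
labelings = {
    "method_a": {2: [0, 0, 0, 0, 1, 1], 3: [0, 0, 1, 1, 2, 2], 4: [0, 0, 1, 1, 2, 3]},
    "method_b": {2: [0, 0, 1, 1, 1, 1], 3: [1, 1, 0, 0, 2, 2], 4: [0, 1, 2, 2, 3, 3]},
    "method_c": {2: [0, 0, 0, 1, 1, 1], 3: [2, 2, 1, 1, 0, 0], 4: [0, 0, 1, 2, 3, 3]},
}
best_k, votes = most_frequent_best_k(labelings)
print(best_k, votes)
```

In this toy input the three methods agree perfectly only at three clusters, so every pair votes for k = 3. In practice the labelings would come from actual clustering algorithms run on the same data.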

Disclosure statement

No potential conflict of interest was reported by the authors.

ORCID

Ahmed N. Albatineh http://orcid.org/0000-0001-5646-4945

Additional information

Notes on contributors

Ahmed N. Albatineh

Ahmed N. Albatineh is currently an associate professor of Biostatistics in the Department of Community Medicine and Behavioral Sciences within the Faculty of Medicine at Kuwait University. He received a Bachelor of Science in Mathematics from Yarmouk University in Jordan, and a Master of Science in Operations Research, a Master of Science in Applied Statistics, and a PhD in Statistics, all from Western Michigan University in Kalamazoo, Michigan, USA. He taught at Nova Southeastern University and Florida International University. His research interests are in cluster analysis, statistical computation, and applications of statistics in the health sciences.

Meredith L. Wilcox

Meredith L. Wilcox is the Director of Project and Quality Management at Midwest Biomedical Research. In this role, she manages clinical trials from the start-up phase to study completion. She also oversees the conduct and quality of nutrition and pharmaceutical trials at the site level at MB Clinical Research. Meredith is currently transitioning to a statistician role at Midwest Biomedical Research. Meredith holds a Bachelor of Science in Statistics and a Master of Public Health (MPH) with a specialization in Biostatistics.

Bashar Zogheib

Bashar Zogheib received his PhD in Mathematics from the University of Windsor, Ontario, Canada in 2006, after receiving two Master's degrees, in Statistics and in Mathematics, from the same university. He also received a third Master's degree, in Mathematics Education, from Wayne State University, Michigan, USA. His research and numerous peer-reviewed publications focus primarily on numerical solutions for partial differential equations, computational fluid dynamics, applied statistics, and mathematics education. He previously taught at the University of Windsor in Canada, Millersville University of Pennsylvania, and Nova Southeastern University in Florida. Currently, he is the Associate Dean for Administration for the College of Arts and Sciences and a Professor of Mathematics at the American University of Kuwait.

Magdalena Niewiadomska-Bugaj

Magdalena Niewiadomska-Bugaj is professor and chair of the Department of Statistics at Western Michigan University in Kalamazoo, Michigan, USA. Her research interests include classification, categorical data, methodology for zero-inflated data, and association modeling.
