Research Articles

How many clusters exist? Answer via maximum clustering similarity implemented in R

Ahmed N. Albatineh, Meredith L. Wilcox, Bashar Zogheib & Magdalena Niewiadomska-Bugaj
Pages 62-79 | Received 07 Jul 2018, Accepted 20 Apr 2019, Published online: 22 May 2019
 

Abstract

Finding the number of clusters in a data set is considered one of the fundamental problems in cluster analysis. This paper integrates maximum clustering similarity (MCS), a method for finding the optimal number of clusters, into the R© statistical software through the package MCSim. The similarity between two clustering methods is calculated at the same number of clusters, using the Rand [Objective criteria for the evaluation of clustering methods. J Am Stat Assoc. 1971;66:846–850.] and Jaccard [The distribution of the flora of the alpine zone. New Phytologist. 1912;11:37–50.] indices, corrected for chance agreement. The number of clusters at which the index most frequently attains its maximum is a candidate for the optimal number of clusters. Unlike other criteria, MCS can be used with circular data. Seven clustering algorithms available in R© are implemented in MCSim. The package produces a graph of the number of clusters vs. clustering similarity based on the corrected similarity indices, together with the values of the similarity indices and a clustering tree (dendrogram). Several examples, including simulated, real, and circular data sets, are presented to show how MCSim works in practice.
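
The selection rule described in the abstract can be illustrated with a minimal Python sketch. It is not MCSim's code or API: the function names `adjusted_rand` and `most_similar_k` are hypothetical, the label vectors are made up, and the Hubert-Arabie adjusted Rand index is used as a stand-in because the abstract does not specify the exact chance correction. The sketch compares two methods at each number of clusters and returns the number at which their agreement is highest.

```python
from collections import Counter
from math import comb


def adjusted_rand(labels_a, labels_b):
    """Hubert-Arabie adjusted Rand index between two labelings (assumed stand-in)."""
    n = len(labels_a)
    if n != len(labels_b) or n < 2:
        raise ValueError("labelings must have equal length of at least 2")
    # Pairs of points placed together by both labelings, by A alone, by B alone.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)  # agreement expected by chance
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings use one cluster
        return 1.0
    return (pairs_both - expected) / (max_index - expected)


def most_similar_k(method_a, method_b):
    """Return (k, scores) where k maximizes the similarity of the two methods.

    Each argument maps a number of clusters k to that method's label vector.
    """
    shared = sorted(set(method_a) & set(method_b))
    scores = {k: adjusted_rand(method_a[k], method_b[k]) for k in shared}
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Hypothetical labelings of six points from two clustering methods.
method_a = {2: [0, 0, 0, 1, 1, 1], 3: [0, 0, 1, 1, 2, 2]}
method_b = {2: [1, 1, 1, 0, 0, 0], 3: [0, 1, 1, 2, 2, 0]}

best_k, scores = most_similar_k(method_a, method_b)
print(best_k, scores)
```

In this toy example the two methods produce the same partition at two clusters (only the label names differ), so two clusters is the candidate. The frequency tally across several method pairs and indices that the abstract mentions is not reproduced here.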

Disclosure statement

No potential conflict of interest was reported by the authors.

ORCID

Ahmed N. Albatineh http://orcid.org/0000-0001-5646-4945

Additional information

Notes on contributors

Ahmed N. Albatineh

Ahmed N. Albatineh is currently an associate professor of Biostatistics in the Department of Community Medicine and Behavioral Sciences within the Faculty of Medicine at Kuwait University. He received a Bachelor of Science in Mathematics from Yarmouk University in Jordan, and a Master of Science in Operations Research, a Master of Science in Applied Statistics, and a PhD in Statistics, all from Western Michigan University in Kalamazoo, Michigan, USA. He taught at Nova Southeastern University and Florida International University. His research interests are in cluster analysis, statistical computation, and the application of statistics in the health sciences.

Meredith L. Wilcox

Meredith L. Wilcox is the Director of Project and Quality Management at Midwest Biomedical Research. In this role, she manages clinical trials from the start-up phase to study completion. She also oversees the conduct and quality of nutrition and pharmaceutical trials at the site level at MB Clinical Research. Meredith is currently transitioning to a statistician role at Midwest Biomedical Research. Meredith holds a Bachelor of Science in Statistics and a Master of Public Health (MPH) with a specialization in Biostatistics.

Bashar Zogheib

Bashar Zogheib received his PhD in Mathematics from the University of Windsor, Ontario, Canada, in 2006, after receiving two master's degrees, in Statistics and in Mathematics, from the same university. He also received a third master's degree, in Mathematics Education, from Wayne State University, Michigan, USA. His research and numerous peer-reviewed publications focus primarily on numerical solutions for partial differential equations, computational fluid dynamics, applied statistics, and mathematics education. He previously taught at the University of Windsor in Canada, Millersville University of Pennsylvania, and Nova Southeastern University in Florida. Currently, he is the Associate Dean for Administration for the College of Arts and Sciences and a Professor of Mathematics at the American University of Kuwait.

Magdalena Niewiadomska-Bugaj

Magdalena Niewiadomska-Bugaj is professor and chair of the Department of Statistics at Western Michigan University in Kalamazoo, Michigan, USA. Her research interests include classification, categorical data, methodology for zero-inflated data, and association modeling.
