Biostatistics & Epidemiology Volume 3, 2019 - Issue 1

Research Articles

How many clusters exist? Answer via maximum clustering similarity implemented in R

Ahmed N. Albatineh (corresponding author), Department of Community Medicine and Behavioral Sciences, Kuwait University, Kuwait, Kuwait

ORCID: https://orcid.org/0000-0001-5646-4945

Meredith L. Wilcox, Florida International University, Miami, FL, USA; present address: MB Clinical Research, Boca Raton, FL, USA

Bashar Zogheib, Department of Mathematics and Natural Sciences, American University of Kuwait, Kuwait, Kuwait

Magdalena Niewiadomska-Bugaj, Department of Statistics, Western Michigan University, Kalamazoo, MI, USA

Pages 62-79 | Received 07 Jul 2018, Accepted 20 Apr 2019, Published online: 22 May 2019

https://doi.org/10.1080/24709360.2019.1615770

References

Everitt BS, Landau S, Leese M. Cluster analysis. New York: Oxford University Press; 2001.
Marriott FHC. Practical problems in a method of cluster analysis. Biometrics. 1971;27:501–514. doi: 10.2307/2528592
Hartigan JA. Clustering algorithms. New York: Wiley; 1975.
Bock HH. On some significance tests in cluster analysis. J Classif. 1985;2:77–108. doi: 10.1007/BF01908065
Hardy A. On the number of clusters. Comput Stat Data Anal. 1996;23:83–96. doi: 10.1016/S0167-9473(96)00022-9
Gordon AD. Classification. 2nd ed. St Andrews: Chapman and Hall/CRC; 1999.
Milligan G, Cooper M. An examination of procedures for determining the number of clusters in a data set. Psychometrika. 1985;50:159–179. doi: 10.1007/BF02294245
Milligan G, Cooper M. A study of the comparability of external criteria for hierarchical cluster analysis. Multivariate Behav Res. 1986;21:441–458. doi: 10.1207/s15327906mbr2104_5
Koziol JA. Cluster analysis of antigenic profiles of tumors: selection of number of clusters using Akaike's information criterion. Methods Inf Med. 1990;29:200–204. doi: 10.1055/s-0038-1634783
Sugar CA, James GM. Finding the number of clusters in a data set: an information theoretic approach. J Am Stat Assoc. 2003;98:750–763. doi: 10.1198/016214503000000666
Banfield JD, Raftery AE. Model-based Gaussian and non-Gaussian clustering. Biometrics. 1993;49:803–821. doi: 10.2307/2532201
Fraley C, Raftery AE. How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J. 1998;41:578–588. doi: 10.1093/comjnl/41.8.578
Krolak-Schwerdt S, Eckes T. A graph theoretic criterion for determining the number of clusters in a data set. Multivariate Behav Res. 1992;27:541–565. doi: 10.1207/s15327906mbr2704_3
Vassilliou A, Tambouratzis DG, Koutras MV, et al. A new similarity measure and its use in determining the number of clusters in a multivariate data set. Commun Stat Theory Method. 2004;33:1643–1666. doi: 10.1081/STA-120037266
Breckenridge JN. Replicating cluster analysis: method, consistency, and validity. Multivariate Behav Res. 1989;24:147–161. doi: 10.1207/s15327906mbr2402_1
Dudoit S, Fridlyand J. A prediction-based resampling method for estimating the number of clusters in a dataset. Genome Biol. 2002;3:1–21. doi: 10.1186/gb-2002-3-7-research0036
R Development Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2011. ISBN 3-900051-07-0. Available from: http://www.R-project.org/.
Calinski T, Harabasz J. A dendrite method for cluster analysis. Commun Stat. 1974;3:1–27.
Krzanowski WJ, Lai YT. A criterion for determining the number of groups in a data set using sum of squares clustering. Biometrics. 1988;44:23–34. doi: 10.2307/2531893
Kaufman L, Rousseeuw PJ. Finding groups in data: an introduction to cluster analysis. New York: John Wiley & Sons; 1990.
Davies DL, Bouldin DW. A cluster separation measure. IEEE Trans Pattern Anal Mach Intell. 1979;1:224–227. doi: 10.1109/TPAMI.1979.4766909
Tibshirani R, Walther G, Hastie T. Estimating the number of clusters in a data set via the gap statistic. J Roy Stat Soc B. 2001;63:411–423. doi: 10.1111/1467-9868.00293
Sarle WS. Cubic clustering criterion. SAS Institute Inc, Cary, NC; 1983 (SAS Technical Report; A-108).
Ben-Hur A, Elisseeff A, Guyon I. A stability based method for discovering structure in clustered data. Pac Symp Biocomput. 2002;7:6–17.
Albatineh AN, Niewiadomska-Bugaj M. MCS: a method for finding the number of clusters. J Class. 2011a;28:184–209. doi: 10.1007/s00357-010-9069-1
Albatineh AN, Niewiadomska-Bugaj M. Correcting Jaccard and other similarity indices for chance agreement in cluster analysis. Adv Data Anal Class. 2011b;5:179–200. doi: 10.1007/s11634-011-0090-y
Jain AK, Dubes RC. Algorithms for clustering data. Englewood Cliffs (NJ): Prentice Hall; 1988.
Albatineh AN, Niewiadomska-Bugaj M, Mihalko DP. On similarity indices and correction for chance agreement. J Class. 2006;23:301–313. doi: 10.1007/s00357-006-0017-z
Rand WM. Objective criteria for the evaluation of clustering methods. J Am Stat Assoc. 1971;66:846–850. doi: 10.1080/01621459.1971.10482356
Jaccard P. The distribution of the flora of the alpine zone. New Phytol. 1912;11:37–50. doi: 10.1111/j.1469-8137.1912.tb05611.x
Albatineh AN. Means and variances for a class of similarity indices in cluster analysis. J Stat Plan Inference. 2010;140:2828–2838. doi: 10.1016/j.jspi.2010.03.005
Hubert L, Arabie P. Comparing partitions. J Class. 1985;2:193–218. doi: 10.1007/BF01908075
Morey L, Agresti A. The measurement of classification agreement: an adjustment to the Rand statistic for chance agreement. Educ Psychol Meas. 1984;44:33–37. doi: 10.1177/0013164484441003
Rogers DJ, Tanimoto TT. A computer program for classifying plants. Science. 1960;132:1115–1118. doi: 10.1126/science.132.3434.1115
Sokal RR, Sneath PHA. Principles of numerical taxonomy. San Francisco: W H Freeman; 1963.
Gower JC, Legendre P. Metric and Euclidean properties of dissimilarity coefficients. J Class. 1986;3:5–48. doi: 10.1007/BF01896809
Azzalini A, Bowman AW. A look at some data on the Old Faithful geyser. Appl Stat. 1990;39:357–365. doi: 10.2307/2347385
Batschelet E. Circular statistics in biology. London: Academic Press; 1981.
Fisher NI. Statistical analysis of circular data. Cambridge: Cambridge University Press; 1993.
Mardia KV, Jupp PE. Directional statistics. Chichester: John Wiley & Sons; 2000.
Lund U. Cluster analysis for directional data. Commun Stat Simul Comput. 1999;28:1001–1009. doi: 10.1080/03610919908813589
Yang MS, Pan JA. On fuzzy clustering of directional data. Fuzzy Sets Syst. 1997;91:319–326. doi: 10.1016/S0165-0114(96)00157-1
