Journal of Computational and Graphical Statistics Volume 29, 2020 - Issue 1

Original Articles

Estimating the Number of Clusters Using Cross-Validation

Wei Fu, Stern School of Business, New York University, New York, NY

Patrick O. Perry, Stern School of Business, New York University, New York, NY (corresponding author)

Pages 162-173 | Received 09 Feb 2017, Accepted 17 Jul 2019, Published online: 30 Sep 2019

https://doi.org/10.1080/10618600.2019.1647846

References

Ben-Hur, A., Elisseeff, A., and Guyon, I. (2001), “A Stability Based Method for Discovering Structure in Clustered Data,” in Pacific Symposium on Biocomputing (Vol. 7), pp. 6–17.
Caliński, T., and Harabasz, J. (1974), “A Dendrite Method for Cluster Analysis,” Communications in Statistics—Theory and Methods, 3, 1–27. DOI: 10.1080/03610927408827101.
Charrad, M., Ghazzali, N., Boiteau, V., and Niknafs, A. (2014), “NbClust: An R Package for Determining the Relevant Number of Clusters in a Data Set,” Journal of Statistical Software, 61, 1–36. DOI: 10.18637/jss.v061.i06.
Chiang, M. M.-T., and Mirkin, B. (2010), “Intelligent Choice of the Number of Clusters in k-Means Clustering: An Experimental Study With Different Cluster Spreads,” Journal of Classification, 27, 3–40. DOI: 10.1007/s00357-010-9049-5.
Cho, R. J., Campbell, M. J., Winzeler, E. A., Steinmetz, L., Conway, A., Wodicka, L., Wolfsberg, T. G., Gabrielian, A. E., Landsman, D., Lockhart, D. J., and Davis, R. W. (1998), “A Genome-Wide Transcriptional Analysis of the Mitotic Cell Cycle,” Molecular Cell, 2, 65–73. DOI: 10.1016/S1097-2765(00)80114-8.
Dortet-Bernadet, J.-L., and Wicker, N. (2008), “Model-Based Clustering on the Unit Sphere With an Illustration Using Gene Expression Profiles,” Biostatistics, 9, 66–80. DOI: 10.1093/biostatistics/kxm012.
Fang, Y., and Wang, J. (2012), “Selection of the Number of Clusters via the Bootstrap Method,” Computational Statistics & Data Analysis, 56, 468–477. DOI: 10.1016/j.csda.2011.09.003.
Fraley, C., and Raftery, A. E. (2002), “Model-Based Clustering, Discriminant Analysis, and Density Estimation,” Journal of the American Statistical Association, 97, 611–631. DOI: 10.1198/016214502760047131.
Fujita, A., Takahashi, D. Y., and Patriota, A. G. (2014), “A Non-Parametric Method to Estimate the Number of Clusters,” Computational Statistics & Data Analysis, 73, 27–39. DOI: 10.1016/j.csda.2013.11.012.
Gabriel, K. R. (2002), “Le Biplot–Outil d’Exploration de Données Multidimensionelles,” Journal de la Société Française de Statistique, 143, 5–55.
Hartigan, J. A. (1975), Clustering Algorithms, New York: Wiley.
Hartigan, J. A., and Wong, M. A. (1979), “Algorithm AS 136: A k-Means Clustering Algorithm,” Journal of the Royal Statistical Society, Series C, 28, 100–108. DOI: 10.2307/2346830.
Haslbeck, J. M. B., and Wulff, D. U. (2018), “cluster: Cluster Analysis Basics and Extensions,” R Package Version 0.2-2.
Hastie, T., Tibshirani, R., and Friedman, J. (2009), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer Series in Statistics (2nd ed.), New York: Springer.
Hennig, C. (2015), “fpc: Flexible Procedures for Clustering,” R Package Version 2.1-10.
Jain, A. K. (2010), “Data Clustering: 50 Years Beyond k-Means,” Pattern Recognition Letters, 31, 651–666. DOI: 10.1016/j.patrec.2009.09.011.
Jain, A. K., Murty, M. N., and Flynn, P. J. (1999), “Data Clustering: A Review,” ACM Computing Surveys, 31, 264–323. DOI: 10.1145/331499.331504.
Maechler, M., Rousseeuw, P., Struyf, A., Hubert, M., and Hornik, K. (2018), “cluster: Cluster Analysis Basics and Extensions,” R Package Version 2.0.7-1.
Mangasarian, O. L., Setiono, R., and Wolberg, W. (1990), “Pattern Recognition via Linear Programming: Theory and Application to Medical Diagnosis,” in Large-Scale Numerical Optimization, Philadelphia, PA: SIAM, pp. 22–31.
Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., and Leisch, F. (2017), “e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien,” R Package Version 1.6-8.
Owen, A. B., and Perry, P. O. (2009), “Bi-Cross-Validation of the SVD and the Nonnegative Matrix Factorization,” The Annals of Applied Statistics, 3, 564–594. DOI: 10.1214/08-AOAS227.
Pollard, D. (1981), “Strong Consistency of k-Means Clustering,” The Annals of Statistics, 9, 135–140. DOI: 10.1214/aos/1176345339.
Pomeroy, S. L., Tamayo, P., Gaasenbeek, M., Sturla, L. M., Angelo, M., McLaughlin, M. E., Kim, J. Y., Goumnerova, L. C., Black, P. M., Lau, C., and Allen, J. C. (2002), “Prediction of Central Nervous System Embryonal Tumour Outcome Based on Gene Expression,” Nature, 415, 436–442. DOI: 10.1038/415436a.
R Core Team (2018), R: A Language and Environment for Statistical Computing, Vienna, Austria: R Foundation for Statistical Computing.
Schlimmer, J. C. (1987), “Concept Acquisition Through Representational Adjustment,” PhD thesis, Department of Information and Computer Science, University of California, Irvine.
Scrucca, L., Fop, M., Murphy, T. B., and Raftery, A. E. (2016), “mclust 5: Clustering, Classification and Density Estimation Using Gaussian Finite Mixture Models,” The R Journal, 8, 205–233. DOI: 10.32614/RJ-2016-021.
Sugar, C. A., and James, G. M. (2003), “Finding the Number of Clusters in a Dataset,” Journal of the American Statistical Association, 98, 750–763. DOI: 10.1198/016214503000000666.
Tavazoie, S., Hughes, J. D., Campbell, M. J., Cho, R. J., and Church, G. M. (1999), “Systematic Determination of Genetic Network Architecture,” Nature Genetics, 22, 281–285. DOI: 10.1038/10343.
Tibshirani, R., and Walther, G. (2005), “Cluster Validation by Prediction Strength,” Journal of Computational and Graphical Statistics, 14, 511–528. DOI: 10.1198/106186005X59243.
Tibshirani, R., Walther, G., and Hastie, T. (2001), “Estimating the Number of Clusters in a Data Set via the Gap Statistic,” Journal of the Royal Statistical Society, Series B, 63, 411–423. DOI: 10.1111/1467-9868.00293.
Venables, W. N., and Ripley, B. D. (2002), Modern Applied Statistics With S (4th ed.), New York: Springer, ISBN 0-387-95457-0.
Wang, J. (2010), “Consistent Selection of the Number of Clusters via Crossvalidation,” Biometrika, 97, 893–904. DOI: 10.1093/biomet/asq061.
Wickham, H. (2016), ggplot2: Elegant Graphics for Data Analysis, New York: Springer-Verlag.
Wilson, E. B. (1927), “Probable Inference, the Law of Succession, and Statistical Inference,” Journal of the American Statistical Association, 22, 209–212. DOI: 10.1080/01621459.1927.10502953.
Wold, S. (1978), “Cross-Validatory Estimation of the Number of Components in Factor and Principal Components Models,” Technometrics, 20, 397–405. DOI: 10.1080/00401706.1978.10489693.