Theory and Method

Approximate Confidence Intervals for the Number of Clusters

Pages 184-191 | Received 01 Jan 1984, Published online: 12 Mar 2012
 

Abstract

We consider clustering for the purpose of data reduction. Similar objects are grouped together in clusters so that one can then work with the few cluster descriptors instead of the many data points. The quality of any given clustering is measured by a loss function that takes into account both the parsimony of the clustering and the loss of information due to clustering. An optimal clustering can be obtained by minimizing the theoretical loss function. It is shown that a sample version of the loss function and optimal clustering converge strongly to their theoretical counterparts as the sample size tends to infinity. We then develop a bootstrap-based procedure for obtaining approximate confidence bounds on the number of clusters in the “best” clustering. The effectiveness of this procedure is evaluated in a simulation study. An application is presented.
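The abstract does not state the form of the loss function or the details of the bootstrap procedure, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes one-dimensional data, a loss equal to the mean within-cluster squared error (information lost by clustering) plus a hypothetical per-cluster charge `penalty` (parsimony), and simple percentile bounds taken over bootstrap resamples. All function names and parameters are invented for illustration.

```python
import numpy as np

def kmeans_1d(x, k, n_iter=25):
    """Lloyd's algorithm on 1-D data; returns the within-cluster sum of squares."""
    # Deterministic start (an assumption): centers at evenly spaced quantiles.
    centers = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center unchanged
                centers[j] = x[labels == j].mean()
    labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
    return float(np.sum((x - centers[labels]) ** 2))

def best_k(x, penalty, k_max=6):
    """Number of clusters minimizing (mean squared error) + penalty * k."""
    losses = [kmeans_1d(x, k) / len(x) + penalty * k for k in range(1, k_max + 1)]
    return int(np.argmin(losses)) + 1

def bootstrap_k_bounds(x, penalty, n_boot=200, alpha=0.10, seed=0):
    """Percentile bounds on the loss-minimizing k over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    ks = np.array([
        best_k(rng.choice(x, size=len(x), replace=True), penalty)
        for _ in range(n_boot)
    ])
    lo, hi = np.quantile(ks, [alpha / 2, 1 - alpha / 2])
    return int(np.floor(lo)), int(np.ceil(hi)), ks

# Usage on synthetic data: three well-separated groups (made-up values).
demo_rng = np.random.default_rng(1)
x_demo = np.concatenate([demo_rng.normal(m, 0.5, 50) for m in (0.0, 10.0, 20.0)])
k_hat = best_k(x_demo, penalty=1.0)
k_lo, k_hi, _ = bootstrap_k_bounds(x_demo, penalty=1.0, n_boot=100)
print(f"estimated k = {k_hat}, approximate 90% bounds = [{k_lo}, {k_hi}]")
```

The choice of `penalty` governs the trade-off between parsimony and fidelity: a larger value favors fewer clusters. The percentile interval used here is the simplest bootstrap interval and may differ from the construction in the article.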
