Abstract
Determining the number of components in a mixture of distributions is an important but difficult problem. This article introduces a procedure called minimum information ratio estimation and validation (MIREV), which is based on a ratio of Fisher information matrices. The smallest eigenvalue of the information ratio matrix is used to determine the number of components. A measure of uncertainty may be obtained using a bootstrap technique. Simulations illustrate the effectiveness of the procedure. For mixtures of exponential families, an expression for the observed information ratio matrix provides insight into the success of the procedure.

Cluster analysis attempts to identify and characterize subpopulations believed to be present in a population. A wide variety of methods are available, including criterion optimization, hierarchical methods, and various heuristic methods. Criterion optimization techniques, such as mixture analysis, fuzzy clustering, and partitioning methods, are popular because they allow a great deal of flexibility in defining when objects are similar. However, they typically assume models with a known number of subpopulations. When the number is unknown, the investigator usually obtains several solutions and must choose among them. That choice is difficult to justify without an objective procedure for comparing clustering results. Although numerous measures have been proposed to evaluate the quality of clustering results in general and the number of clusters in particular, these measures are difficult to interpret and often unreliable.

The MIREV procedure works extremely well for some examples. Further research is required to establish the conditions under which the procedure can be expected to produce reliable results.