Abstract
When finite mixture models are used to cluster a data set, the crucial question of how many clusters the data contain can be addressed by testing for the smallest number of components in the mixture model that is compatible with the data. We investigate the performance of a resampling approach to this testing problem for high-dimensional data, where the number of variables p is extremely large relative to the number of observations n. To fit normal mixture models to such data, some form of dimension reduction has to be performed. This raises the question of whether a practically significant bias results if the bootstrapping is undertaken solely on the basis of the reduced-dimensional form of the data, rather than drawing the bootstrap sample replications on the basis of the full data.
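To make the resampling idea concrete, the following is a minimal sketch of a parametric bootstrap test of g0 versus g0 + 1 components. It rests on assumptions not stated in the abstract: the test statistic is taken to be the likelihood ratio statistic, the components are univariate normal for brevity, and EM is started from a simple deterministic partition of the sorted data. All function names (`fit_mixture`, `lrt_statistic`, `draw_sample`, `bootstrap_pvalue`) are hypothetical and chosen for illustration only. The sketch does not address the high-dimensional setting or the dimension reduction step.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def e_step(data, weights, means, variances):
    """Return (log-likelihood, responsibilities) for the current parameters."""
    loglik, resp = 0.0, []
    for x in data:
        logs = [math.log(w) + log_normal_pdf(x, m, s2)
                for w, m, s2 in zip(weights, means, variances)]
        top = max(logs)
        log_sum = top + math.log(sum(math.exp(t - top) for t in logs))
        loglik += log_sum
        resp.append([math.exp(t - log_sum) for t in logs])
    return loglik, resp


def fit_mixture(data, g, n_iter=100):
    """Fit a g-component univariate normal mixture by EM (illustrative only).

    Returns (log-likelihood, (weights, means, variances)).
    """
    n = len(data)
    ordered = sorted(data)
    grand_mean = sum(data) / n
    grand_var = max(sum((x - grand_mean) ** 2 for x in data) / n, 1e-6)
    # Assumed starting values: means of g contiguous chunks of the sorted data.
    chunks = [ordered[k * n // g:(k + 1) * n // g] for k in range(g)]
    means = [sum(c) / len(c) for c in chunks]
    weights = [1.0 / g] * g
    variances = [grand_var] * g
    for _ in range(n_iter):
        _, resp = e_step(data, weights, means, variances)
        for k in range(g):
            nk = max(sum(r[k] for r in resp), 1e-10)  # guard against an empty component
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(spread, 1e-6 * grand_var)  # floor to avoid variance collapse
    loglik, _ = e_step(data, weights, means, variances)
    return loglik, (weights, means, variances)


def lrt_statistic(data, g0):
    """-2 log(lambda) for H0: g0 components versus H1: g0 + 1 components."""
    loglik0, _ = fit_mixture(data, g0)
    loglik1, _ = fit_mixture(data, g0 + 1)
    return max(0.0, 2.0 * (loglik1 - loglik0))  # clip small negatives from imperfect EM fits


def draw_sample(params, n, rng):
    """Draw n observations from a fitted mixture."""
    weights, means, variances = params
    labels = rng.choices(range(len(weights)), weights=weights, k=n)
    return [rng.gauss(means[k], math.sqrt(variances[k])) for k in labels]


def bootstrap_pvalue(data, g0, n_boot=19, seed=0):
    """Parametric bootstrap P-value for the likelihood ratio statistic."""
    rng = random.Random(seed)
    observed = lrt_statistic(data, g0)
    _, null_params = fit_mixture(data, g0)  # fit under the null hypothesis
    exceed = 0
    for _ in range(n_boot):
        replicate = draw_sample(null_params, len(data), rng)
        if lrt_statistic(replicate, g0) >= observed:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    sample = ([rng.gauss(0.0, 1.0) for _ in range(50)]
              + [rng.gauss(6.0, 1.0) for _ in range(50)])
    print(bootstrap_pvalue(sample, g0=1))
```

In this sketch the bootstrap replications are generated from the mixture fitted under the null hypothesis, and the statistic is recomputed on each replication. The question raised above corresponds to whether the replications should be generated in the reduced-dimensional space or in the original space and then reduced; the sketch works in a single low-dimensional space and so does not distinguish the two.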
Acknowledgments
The computational resources used in this work were provided by the Queensland Cyber Infrastructure Foundation. This work was supported by a grant from the Australian Research Council.