Abstract
Medical images and genetic assays typically generate data with more variables than subjects. Scientists may use a two-step approach for testing hypotheses about Gaussian mean vectors. In the first step, principal components analysis (PCA) selects a set of sample components fewer in number than the sample size. In the second step, applying classical multivariate analysis of variance (MANOVA) methods to the reduced set of variables provides the desired hypothesis tests. Simulation results presented here indicate that success of the PCA in the first step requires nearly all variation to occur in population components far fewer in number than the number of subjects. In the second step, multivariate tests fail to attain reasonable power except in restrictive, favorable cases. The results encourage using other approaches discussed in the article to provide dependable hypothesis testing with high dimension, low sample size (HDLSS) data.
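As an illustration only, the two-step approach described above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the procedure or simulation design used in the article: the function names `pca_scores` and `wilks_lambda`, the choice of `k` retained components, the one-way (group comparison) design, and the use of Wilks' lambda as the MANOVA statistic are all hypothetical choices made for the example.

```python
import numpy as np


def pca_scores(Y, k):
    """Step 1: project the centered data onto the first k sample components.

    Y is an (N subjects) x (p variables) array, possibly with p > N.
    Returns the N x k score matrix and the fraction of total sample
    variance carried by each retained component.
    """
    Yc = Y - Y.mean(axis=0)
    # SVD of the centered data; rows of Vt are the sample component directions.
    _, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return Yc @ Vt[:k].T, explained[:k]


def wilks_lambda(Z, groups):
    """Step 2: one-way MANOVA statistic, Wilks' lambda = det(E) / det(E + H).

    Z is the N x k matrix of retained scores; groups holds one label per row.
    E is the within-group (error) SSCP matrix, H the between-group
    (hypothesis) SSCP matrix. Requires N minus the number of groups >= k
    so that E is nonsingular.
    """
    groups = np.asarray(groups)
    grand_mean = Z.mean(axis=0)
    k = Z.shape[1]
    E = np.zeros((k, k))
    H = np.zeros((k, k))
    for g in np.unique(groups):
        Zg = Z[groups == g]
        group_mean = Zg.mean(axis=0)
        resid = Zg - group_mean
        E += resid.T @ resid
        d = (group_mean - grand_mean)[:, None]
        H += Zg.shape[0] * (d @ d.T)
    # Log-determinants avoid overflow or underflow in the ratio.
    _, logdet_E = np.linalg.slogdet(E)
    _, logdet_T = np.linalg.slogdet(E + H)
    return float(np.exp(logdet_E - logdet_T))


if __name__ == "__main__":
    # Hypothetical HDLSS example: 20 subjects, 100 variables, two groups.
    rng = np.random.default_rng(0)
    Y = rng.normal(size=(20, 100))
    groups = np.repeat([0, 1], 10)
    Z, explained = pca_scores(Y, k=3)
    print("variance fraction retained:", explained.sum())
    print("Wilks' lambda:", wilks_lambda(Z, groups))
```

Values of Wilks' lambda near 1 indicate little evidence of group differences in the retained components, and values near 0 indicate strong separation. The sketch stops at the statistic and does not compute a p-value. It makes visible the restriction noted above: the second step is only defined when the number of retained components is small relative to the sample size.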
Mathematics Subject Classification:
Acknowledgments
Joint support for Chi and Muller came in part from a UF CTSI core grant via NCRR K30-RR022258, as well as NIDDK R01-DK072398, and NIDCR grant 1R01DE020832-01A1. Chi’s support included NINDS R21-NS065098. Muller’s support included NIDCR U54-DE019261, NCRR K30-RR022258, NHLBI R01-HL091005, and NIAAA R01-AA016549.