Abstract
In large-scale data analysis, such as a microarray study to identify the most differentially expressed genes, diagnostic tests are frequently used to classify subjects into categories. Often these categories have no intrinsic order even though the quantitative test results do. Because identifying the correct order is essential to defining accuracy measures properly in a high-dimensional receiver operating characteristic (ROC) analysis, we propose rigorous, automated approaches that order the multiple categories using simple summary statistics such as means and relative effects. We discuss the hypervolume under the ROC manifold (HUM), its dependence on the order of the test results, and the minimum acceptable HUM values in a general multi-category classification problem. Using a leukemia data set and a liver cancer data set, we show how our approaches provide accurate screening results when the number of tests is large.
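As a rough illustration of the idea, the sketch below orders three categories by their sample means and then computes an empirical HUM as the fraction of cross-category tuples whose test values increase in that order. The function names, the toy data, and the decision to ignore ties are assumptions made for this example and are not taken from the paper. For reference, a test whose results follow the same continuous distribution in all M categories has a HUM of 1/M!, that is, 1/6 for three categories.

```python
from itertools import product

import numpy as np


def order_by_mean(samples):
    """Return category labels sorted by ascending sample mean (hypothetical helper)."""
    return sorted(samples, key=lambda label: np.mean(samples[label]))


def empirical_hum(samples, order):
    """Fraction of tuples (one value per category) that are strictly increasing
    when the categories are arranged in `order`. Ties count as failures here,
    which is a simplification made for this sketch."""
    groups = [samples[label] for label in order]
    hits = 0
    total = 0
    for combo in product(*groups):
        total += 1
        hits += all(a < b for a, b in zip(combo, combo[1:]))
    return hits / total


# Toy test results for three unordered categories (invented for illustration).
samples = {
    "A": [2.1, 2.5, 3.0],
    "B": [0.2, 0.9, 1.1],
    "C": [1.0, 1.8, 2.2],
}

order = order_by_mean(samples)
hum = empirical_hum(samples, order)
print(order, round(hum, 4))
```

The brute-force enumeration visits every tuple, so it is only practical for small samples; it is meant to make the dependence of the HUM on the chosen order concrete, not to serve as an efficient estimator.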
Acknowledgements
We are grateful to Jason Fine and Michael Kosorok for helpful comments on the inference for .