Abstract
The integrative analysis of multiple sequences of multiple tests has become increasingly popular in many applications, especially in large-scale genomics. In the context of large-scale multiple testing, the concept of signal classification has recently been developed for cases in which the same features are involved in several independent studies, with the goal of classifying each feature into one of several classes. This article considers the problem of signal classification in a generalized compound decision-making framework, where the observed data are assumed to be generated from an underlying four-state Cartesian hidden Markov model. Two oracle procedures are proposed for the total and set-specific control of misclassification rates, respectively, while maximizing the number of correct classifications. Optimal data-driven procedures are also proposed, and their asymptotic properties are derived. It is shown that signal classification can be improved significantly by taking into account the dependence structure among features, and that the proposed procedures can outperform competitors that ignore the dependence structure. The proposed methods are applied to a psychiatric genetics study to detect genetic variants that affect either or both of bipolar disorder and schizophrenia.
Supplementary Materials
Supplementary.pdf: The supplementary file contains the proofs of the theoretical results presented in this article.
CodeAndData.zip: Code for implementing the proposed methods, together with the real data used in Section 5.
Acknowledgments
The authors thank the Editor, the Associate Editor, and the anonymous referees for their constructive comments and suggestions, which significantly improved the quality of the article.
Disclosure Statement
The authors report there are no competing interests to declare.