Abstract
Boosting is a successful method for high-dimensional classification of independent data. However, existing variants do not account for the correlation among responses that arises in longitudinal or clustered study designs, where measurements are collected at two or more time points or within clusters. This article presents two new boosting variants for high-dimensional classification problems with matched-pair binary responses or, more generally, any correlated binary responses. The first method is based on the generic functional gradient descent algorithm; the second is based on direct likelihood optimization. The performance and computational requirements of both algorithms were evaluated in simulations. Although the two methods performed similarly, the algorithm based on generic functional gradient descent was far more computationally efficient than the one based on direct likelihood optimization. The former method is illustrated using data on gene expression changes between de novo and relapsed childhood acute lymphoblastic leukemia. Computer code implementing the algorithms, together with the dataset, is available online as supplemental material.
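The abstract names the generic functional gradient descent (FGD) algorithm as the starting point for the first method but gives no details. The following is a minimal sketch of generic FGD boosting for *independent* binary responses. It assumes two things: the loss is the binomial negative log-likelihood on the log-odds scale, and the base learners are componentwise linear least-squares fits, a common choice in high-dimensional boosting. The sketch does not model the within-pair or within-cluster correlation that the article's variants address, so it should not be read as the authors' method. The function names `fgd_boost` and `predict_logodds`, the step size `nu`, the iteration count, and the toy data are hypothetical choices made for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fgd_boost(X, y, n_iter=100, nu=0.1):
    """Generic FGD boosting for independent 0/1 responses (illustrative sketch).

    Loss: binomial negative log-likelihood on the log-odds scale.
    Base learner: componentwise linear least squares on centred covariates.
    Returns (intercept, coefficients) of the fitted linear log-odds model.
    """
    n, p = len(y), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    ss = [sum(r[j] ** 2 for r in Xc) for j in range(p)]  # per-column sum of squares

    ybar = sum(y) / n  # assumed to lie strictly between 0 and 1
    f0 = math.log(ybar / (1.0 - ybar))  # offset: marginal log-odds
    f = [f0] * n
    coef = [0.0] * p

    for _ in range(n_iter):
        # Negative gradient of the binomial loss with respect to f: y - P(y = 1)
        u = [y[i] - sigmoid(f[i]) for i in range(n)]
        best_j, best_b, best_rss = None, 0.0, float("inf")
        for j in range(p):
            if ss[j] == 0.0:
                continue  # a constant covariate cannot be fitted
            b = sum(Xc[i][j] * u[i] for i in range(n)) / ss[j]
            rss = sum((u[i] - b * Xc[i][j]) ** 2 for i in range(n))
            if rss < best_rss:
                best_j, best_b, best_rss = j, b, rss
        if best_j is None:
            break
        # Shrunken update of the single best-fitting component
        coef[best_j] += nu * best_b
        f = [f[i] + nu * best_b * Xc[i][best_j] for i in range(n)]

    # Map coefficients from the centred to the original covariate scale
    intercept = f0 - sum(coef[j] * means[j] for j in range(p))
    return intercept, coef


def predict_logodds(intercept, coef, x):
    """Linear predictor (log-odds) for a single covariate vector x."""
    return intercept + sum(c * xj for c, xj in zip(coef, x))


# Toy usage: column 0 separates the classes, column 1 is uninformative.
X_demo = [[1.0, 0.5], [2.0, -0.3], [0.5, 0.8],
          [-1.0, 0.5], [-2.0, -0.3], [-0.5, 0.8]]
y_demo = [1, 1, 1, 0, 0, 0]
b0, beta = fgd_boost(X_demo, y_demo, n_iter=50, nu=0.1)
print(b0, beta)
```

Because only one covariate is updated per iteration and each step is shrunk by `nu`, covariates that never fit the negative gradient best keep a coefficient of exactly zero. This built-in variable selection is what makes componentwise boosting attractive when the number of covariates greatly exceeds the number of observations.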