ABSTRACT
An important step in developing individualized treatment strategies is the correct identification of subgroups in a heterogeneous population, so that each subgroup can receive a specific treatment. This article considers the problem for samples drawn from a population that consists of subgroups with different mean values, observed together with certain covariates. We propose a penalized approach to subgroup analysis based on a regression model in which heterogeneity is driven by unobserved latent factors and can therefore be represented by subject-specific intercepts. We apply concave penalty functions to the pairwise differences of the intercepts, and this procedure automatically divides the observations into subgroups. To implement the proposed approach, we develop an alternating direction method of multipliers (ADMM) algorithm with concave penalties and demonstrate its convergence. We also establish the theoretical properties of the proposed estimator and determine the order of the minimal difference of signals between groups that is required to recover them. These results provide a sound basis for statistical inference in subgroup analysis. The proposed method is further illustrated by simulation studies and an analysis of a Cleveland heart disease dataset. Supplementary materials for this article are available online.
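The pairwise-fusion idea can be made concrete with a small sketch. Assume the model y_i = mu_i + x_i^T beta + eps_i with subject-specific intercepts mu_i, and take the minimax concave penalty (MCP) as one example of a concave penalty. The criterion is then 1/2 sum_i (y_i - mu_i - x_i^T beta)^2 + sum_{i<j} p_gamma(|mu_i - mu_j|, lambda). The Python code below is a minimal ADMM sketch under these assumptions. It introduces eta_ij = mu_i - mu_j and alternates three steps: a linear solve for mu with beta profiled out, a closed-form MCP thresholding step for eta, and a dual update. The function names `concave_fusion_admm` and `mcp_threshold`, the initialization, the defaults gamma = 3 and theta = 1, the stopping rule, and the rule that observations i and j share a subgroup when the fitted eta_ij is exactly zero are all illustrative assumptions, not necessarily the authors' implementation.

```python
import numpy as np


def mcp_threshold(delta, lam, gamma, theta):
    """Elementwise minimizer of (theta/2)(delta - eta)^2 + MCP(|eta|; lam, gamma).

    This closed form is valid when gamma * theta > 1.
    """
    soft = np.sign(delta) * np.maximum(np.abs(delta) - lam / theta, 0.0)
    shrunk = soft / (1.0 - 1.0 / (gamma * theta))
    # Beyond gamma * lam the MCP is flat, so large differences are left unpenalized.
    return np.where(np.abs(delta) <= gamma * lam, shrunk, delta)


def concave_fusion_admm(y, X, lam, gamma=3.0, theta=1.0, lam0=1e-3,
                        tol=1e-6, max_iter=2000):
    """Hypothetical ADMM sketch for
    min 1/2 ||y - mu - X beta||^2 + sum_{i<j} MCP(|mu_i - mu_j|; lam, gamma)."""
    n = len(y)
    ii, jj = np.triu_indices(n, k=1)
    D = np.zeros((len(ii), n))           # row (i, j) of D is e_i - e_j
    D[np.arange(len(ii)), ii] = 1.0
    D[np.arange(len(ii)), jj] = -1.0
    DtD = D.T @ D
    H = np.linalg.solve(X.T @ X, X.T)    # (X'X)^{-1} X'
    Qx = np.eye(n) - X @ H               # projection onto the complement of col(X)

    # Initial mu: least squares plus a tiny ridge-type fusion term (illustrative choice).
    mu = np.linalg.solve(Qx + lam0 * DtD, Qx @ y)
    eta = D @ mu
    v = np.zeros_like(eta)
    A = Qx + theta * DtD
    for it in range(1, max_iter + 1):
        # mu-step with beta profiled out: (Qx + theta D'D) mu = Qx y + D'(theta eta - v)
        mu = np.linalg.solve(A, Qx @ y + D.T @ (theta * eta - v))
        eta_old = eta
        eta = mcp_threshold(D @ mu + v / theta, lam, gamma, theta)  # eta-step
        r = D @ mu - eta                                             # primal residual
        v = v + theta * r                                            # dual update
        s = theta * D.T @ (eta - eta_old)                            # dual residual
        if np.linalg.norm(r) < tol and np.linalg.norm(s) < tol:
            break
    beta = H @ (y - mu)

    # Union-find: i and j share a subgroup when their fused difference is exactly 0.
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a, b, e in zip(ii, jj, eta):
        if e == 0.0:
            parent[find(a)] = find(b)
    _, labels = np.unique([find(a) for a in range(n)], return_inverse=True)
    return mu, beta, labels, it


# Usage on simulated data: two subgroups with means -3 and 3 and one covariate
# whose values are balanced across the subgroups (a convenient test design).
rng = np.random.default_rng(0)
n = 20
true_group = np.repeat([0, 1], n // 2)
X = np.tile(np.linspace(-2.0, 2.0, n // 2), 2).reshape(-1, 1)
y = np.where(true_group == 0, -3.0, 3.0) + 1.5 * X[:, 0] + rng.normal(scale=0.1, size=n)
mu_hat, beta_hat, labels, n_iter = concave_fusion_admm(y, X, lam=1.0)
print("groups:", np.unique(labels).size, "beta:", beta_hat, "iterations:", n_iter)
```

Because the MCP is flat beyond gamma * lambda, large between-group differences pass through the eta-step unshrunk, while small within-group differences are set exactly to zero. This is what lets the fused differences yield subgroup labels directly. The sketch builds the dense pairwise difference matrix, which needs O(n^2) memory, so it is only suitable for small n.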
Supplementary Materials
In the supplementary materials, we give the technical proofs for Proposition 1 and Theorems 1–3. We also provide a detailed estimation procedure for model (2) based on the ADMM algorithm.
Acknowledgments
The authors are grateful to the editor, the associate editor, and two anonymous reviewers for their constructive comments that helped us improve the article substantially.
Funding
The research of Ma is supported in part by the U.S. NSF grant DMS-13-06972 and a Hellman Fellowship. The research of Huang is supported in part by the U.S. NSF grant DMS-12-08225.