Abstract
We propose a procedure associated with the idea of the E-M algorithm for model selection in the presence of missing data. The idea extends the concept of parameters to include both the model and the parameters under the model, and thus allows the model to be part of the E-M iterations. We develop the procedure, known as the E-MS algorithm, under the assumption that the class of candidate models is finite. Some special cases of the procedure are considered, including E-MS with the generalized information criteria (GIC), and E-MS with the adaptive fence (AF; Jiang et al.). We prove numerical convergence of the E-MS algorithm as well as consistency in model selection of the limiting model of the E-MS convergence, for E-MS with GIC and E-MS with AF. We study the impact on model selection of different missing data mechanisms. Furthermore, we carry out extensive simulation studies on the finite-sample performance of the E-MS with comparisons to other procedures. The methodology is also illustrated on a real data analysis involving QTL mapping for an agricultural study on barley grains. Supplementary materials for this article are available online.
Additional information
Notes on contributors
Jiming Jiang
Jiming Jiang, Department of Statistics, University of California, Davis, One Shields Ave., Davis, CA 95616. Thuan Nguyen, Department of Public Health and Preventive Medicine, Oregon Health and Science University, Portland, OR 97239. J. Sunil Rao, Division of Biostatistics, School of Medicine, University of Miami, Miami, FL 33136. The research works of Jiming Jiang, Thuan Nguyen, and J. Sunil Rao were partially supported by the NSF grants SES-1121794, SES-1118469, and SES-1122399, respectively. The research works of all three authors were partially supported by the NIH grant R01-GM085205A1. The authors are grateful to a co-editor, an associate editor, and two referees for their valuable comments.