Abstract
Semiparametric regression models with multiple covariates are commonly encountered. When some covariates are not associated with the response variable, variable selection may lead to sparser models, more lucid interpretations and more accurate estimation. In this study, we adopt a sieve approach for the estimation of nonparametric covariate effects in semiparametric regression models, together with a two-step iterated penalisation approach for variable selection. In the first step, a mixture of Lasso and group Lasso penalties is employed to conduct the first-round variable selection and obtain the initial estimate. In the second step, a mixture of weighted Lasso and weighted group Lasso penalties, with weights constructed using the initial estimate, is employed for variable selection. We show that the proposed iterated approach has the variable selection consistency property, even when the number of unknown parameters diverges with sample size. Numerical studies, including simulation and analysis of a diabetes data set, show satisfactory performance of the proposed approach.
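For orientation, the two-step procedure summarised above can be written schematically as follows. This is only an illustrative sketch under assumptions not stated in the abstract: a partially linear additive model with least-squares loss, a generic sieve basis, and inverse-magnitude weights in the style of the adaptive Lasso. The symbols $x_i$, $z_{ik}$, $\beta$, $\gamma_k$, $B$, $\lambda_1$, $\lambda_2$, $w_j$ and $v_k$ are notation introduced here for illustration and need not match the notation, loss or weights used in the article itself.

```latex
% Assumed model: linear effects for x, nonparametric effects f_k for z,
% with each f_k approximated by a sieve (basis) expansion.
y_i = x_i^{\top}\beta + \sum_{k=1}^{q} f_k(z_{ik}) + \varepsilon_i,
\qquad
f_k(z) \approx B(z)^{\top}\gamma_k .

% Step 1 (assumed form): Lasso on beta plus group Lasso on each gamma_k
% gives the initial estimate.
(\tilde\beta, \tilde\gamma)
  = \arg\min_{\beta,\gamma}\;
    \frac{1}{2n}\sum_{i=1}^{n}
      \Bigl( y_i - x_i^{\top}\beta - \sum_{k=1}^{q} B(z_{ik})^{\top}\gamma_k \Bigr)^{2}
    + \lambda_1 \sum_{j=1}^{p} |\beta_j|
    + \lambda_2 \sum_{k=1}^{q} \|\gamma_k\|_2 .

% Step 2 (assumed form): the same loss with weighted penalties, where the
% weights are built from the step-1 estimate.
(\hat\beta, \hat\gamma)
  = \arg\min_{\beta,\gamma}\;
    \frac{1}{2n}\sum_{i=1}^{n}
      \Bigl( y_i - x_i^{\top}\beta - \sum_{k=1}^{q} B(z_{ik})^{\top}\gamma_k \Bigr)^{2}
    + \lambda_1 \sum_{j=1}^{p} w_j |\beta_j|
    + \lambda_2 \sum_{k=1}^{q} v_k \|\gamma_k\|_2 ,
\qquad
w_j = \frac{1}{|\tilde\beta_j|}, \quad
v_k = \frac{1}{\|\tilde\gamma_k\|_2} .
```

Under this sketch, a coefficient or group estimated as exactly zero in the first step receives an infinite weight and so remains excluded in the second step, while components with large initial estimates are penalised only lightly.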
Acknowledgements
The authors thank the associate editor and two referees for their careful review and insightful comments, which have led to a significant improvement of this article. This study was supported by awards DMS-0904181 from NSF and CA142774 from NIH, USA.