ABSTRACT
Variable selection in ultra-high-dimensional data analysis is of critical importance in diverse scientific fields. Many methods have been proposed for this problem over the past few decades. However, no attention has been paid to the model size produced by screening procedures or to the difficulty of determining the screening threshold, both of which are key to real applications. Specifically, the pre-defined model size in a screening procedure lacks theoretical support, and users have no information about the true positives and false positives in the final results. To solve this problem, we developed a new consistent independence screening procedure called HHG-CIS. The screening threshold of HHG-CIS can be set to explicitly control the family-wise error rate, which makes the procedure easy to use in practice. Furthermore, HHG-CIS is model-free, robust to outliers, and outperforms other methods in models with interactions. We demonstrate these advantages of HHG-CIS through extensive simulations and a real data application.
Acknowledgments
The authors would like to thank the Editor, the Associate Editor and the reviewer for their constructive and insightful comments and suggestions that greatly improved the paper.
Disclosure statement
No potential conflict of interest was reported by the author(s).
ORCID
Xiaodan Fan http://orcid.org/0000-0002-2744-9030