Abstract
As a tool for producing meaningful and interpretable results, subset or variable selection has been well studied in modern statistics. However, most existing methods focus on independent data and cannot be directly extended to network-linked data, in which samples are connected to one another. To this end, we propose a subset selection method for the linear regression model that incorporates the network information into the intercept term. The method simultaneously achieves automatic subset selection and offers good interpretability with respect to the network structure. Based on this model, we develop an efficient algorithm that recovers the true subset and determines subgroups. Simulation studies demonstrate that the proposal outperforms state-of-the-art methods in both estimation and selection accuracy. We also apply the proposed method to data from the National Longitudinal Study of Adolescent Health and show the advantage of selecting variables along a network, which yields a smaller model size and more accurate prediction.
Acknowledgments
The authors thank the Editor, the Associate Editor and the referees for helpful suggestions that greatly improved the presentation of the paper.
Disclosure statement
No potential conflict of interest was reported by the author(s).