Original Articles

Using Lasso for Predictor Selection and to Assuage Overfitting: A Method Long Overlooked in Behavioral Sciences

REFERENCES

  • Ambartsumian, V.A. (1929). On a problem of the theory of eigenvalues. Zeitschrift für Physik, 53, 690–695. doi:10.1007/bf01330827
  • Andersen, C.M., & Bro, R. (2010). Variable selection in regression—a tutorial. Journal of Chemometrics, 24, 728–737. doi:10.1002/cem.1360
  • Babyak, M.A. (2004). What you see may not be what you get: A brief, nontechnical introduction to overfitting in regression-type models. Psychosomatic Medicine, 66, 411–421. doi:10.1097/01.psy.0000127692.23278.a9
  • Belloni, A., & Chernozhukov, V. (2013). Least squares after model selection in high dimensional sparse models. Bernoulli, 19, 521–547. doi:10.3150/11-bej410
  • Candès, E., & Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n. The Annals of Statistics, 35, 2313–2351. doi:10.1214/009053606000001523
  • Cohen, J., Cohen, P., West, S.G., & Aiken, L.S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. Hoboken, NJ: Routledge.
  • Derksen, S., & Keselman, H.J. (1992). Backward, forward and stepwise automated subset selection algorithms: Frequency of obtaining authentic and noise variables. British Journal of Mathematical and Statistical Psychology, 45, 265–282. doi:10.1111/j.2044-8317.1990.tb00940.x
  • Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least angle regression. The Annals of Statistics, 32, 407–499. doi:10.1214/009053604000000067
  • Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33, 1–22. doi:10.18637/jss.v033.i01
  • Gelman, A., & Shalizi, C.R. (2013). Philosophy and the practice of Bayesian statistics. British Journal of Mathematical and Statistical Psychology, 66, 8–38. doi:10.1111/j.2044-8317.2011.02037.x
  • Groll, A., & Tutz, G. (2014). Variable selection for generalized linear mixed models by L1 penalized estimation. Statistics and Computing, 24, 137–154. doi:10.1007/s11222-012-9359-z
  • Gui, J., & Li, H. (2005). Penalized Cox regression analysis in the high-dimensional and low sample size settings, with applications to microarray gene expression data. Bioinformatics, 21, 3001–3008. doi:10.1093/bioinformatics/bti422
  • Hadamard, J. (1902). Sur les problèmes aux dérivées partielles et leur signification physique. Princeton University Bulletin, 13, 49–52.
  • Harrell, F.E. (2001). Regression modeling strategies: With applications to linear models, logistic regression, and survival analysis. New York, NY: Springer.
  • Harrell, F.E., Lee, K.L., Matchar, D.B., & Reichert, T.A. (1985). Regression models for prognostic prediction: Advantages, problems, and suggested solutions. Cancer Treatment Reports, 69, 1071–1077.
  • Hartmann, A., Van Der Kooij, A.J., & Zeeck, A. (2009). Exploring nonlinear relations: Models of clinical decision making by regression with optimal scaling. Psychotherapy Research, 19, 482–492. doi:10.1080/10503300902905939
  • Hartmann, A., Zeeck, A., & Barrett, M.S. (2010). Interpersonal problems in eating disorders. International Journal of Eating Disorders, 43, 619–627. doi:10.1002/eat.20747
  • Hawkins, D.M. (2004). The problem of overfitting. Journal of Chemical Information and Computer Sciences, 44, 1–12. doi:10.1021/ci0342472
  • Hesterberg, T., Choi, N.H., Meier, L., & Fraley, C. (2008). Least angle and ℓ1 penalized regression: A review. Statistics Surveys, 2, 61–93. doi:10.1214/08-ss035
  • Hoerl, A.E., & Kennard, R.W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12, 55–67. doi:10.2307/1267351
  • Hurvich, C.M., & Tsai, C.L. (1990). The impact of model selection on inference in linear regression. The American Statistician, 44, 214–217. doi:10.2307/2685338
  • Johnson, J.B., & Omland, K.S. (2004). Model selection in ecology and evolution. Trends in Ecology & Evolution, 19, 101–108. doi:10.1016/j.tree.2003.10.013
  • Johnson, M., & Sinharay, S. (2011). Remarks from the new editors. Journal of Educational and Behavioral Statistics, 36, 3–5. doi:10.3102/1076998610387267
  • Khalili, A., & Chen, J. (2007). Variable selection in finite mixture of regression models. Journal of the American Statistical Association, 102, 1025–1038. doi:10.1198/016214507000000590
  • Knapp, T.R., & Sawilowsky, S.S. (2001). Constructive criticisms of methodological and editorial practices. The Journal of Experimental Education, 70, 65–79. doi:10.1080/00220970109599498
  • Lockhart, R., Taylor, J., Tibshirani, R.J., & Tibshirani, R. (2014). A significance test for the lasso. The Annals of Statistics, 42, 413–468. doi:10.1214/13-aos1175
  • Lomax, R.G., & Hahs-Vaughn, D.L. (2013). Statistical concepts: A second course. New York, NY: Routledge.
  • Meier, L., Van De Geer, S., & Bühlmann, P. (2008). The group lasso for logistic regression. Journal of the Royal Statistical Society: Series B, 70, 53–71. doi:10.1111/j.1467-9868.2007.00627.x
  • Meier, L. (2009). grplasso: Fitting user specified models with Group Lasso penalty. R package version 0.4-2.
  • Meinshausen, N., & Bühlmann, P. (2010). Stability selection. Journal of the Royal Statistical Society: Series B, 72, 417–473. doi:10.1111/j.1467-9868.2010.00740.x
  • Park, M.Y., & Hastie, T. (2007). L1‐regularization path algorithm for generalized linear models. Journal of the Royal Statistical Society: Series B, 69, 659–677. doi:10.1111/j.1467-9868.2007.00607.x
  • Pope, P.T., & Webster, J.T. (1972). The use of an F-statistic in stepwise regression procedures. Technometrics, 14, 327–340. doi:10.1080/00401706.1972.10488919
  • Schelldorfer, J., Meier, L., & Bühlmann, P. (2014). GLMMLasso: An algorithm for high dimensional generalized linear mixed models using ℓ1-penalization. Journal of Computational and Graphical Statistics, 23, 460–477. doi:10.1080/10618600.2013.773239
  • Scheidt, C.E., Hasenburg, A., Kunze, M., Waller, E., Pfeifer, R., Zimmermann, P., … Waller, N. (2012). Are individual differences of attachment predicting bereavement outcome after perinatal loss? A prospective cohort study. Journal of Psychosomatic Research, 73, 375–382. doi:10.1016/j.jpsychores.2012.08.017
  • Schmid, N.S., Taylor, K.I., Foldi, N.S., Berres, M., & Monsch, A.U. (2013). Neuropsychological signs of Alzheimer's disease 8 years prior to diagnosis. Journal of Alzheimer's Disease, 34, 537–546.
  • Sharpe, D. (2013). Why the resistance to statistical innovations? Bridging the communication gap. Psychological Methods, 18, 572–582. doi:10.1037/a0034177
  • Städler, N., Bühlmann, P., & Van De Geer, S. (2010). ℓ1-penalization for mixture regression models. Test, 19, 209–256. doi:10.1007/s11749-010-0197-z
  • Stephens, P.A., Buskirk, S.W., Hayward, G.D., & Martinez del Rio, C. (2005). Information theory and hypothesis testing: A call for pluralism. Journal of Applied Ecology, 42, 4–12. doi:10.1111/j.1365-2664.2005.01002.x
  • Steyerberg, E.W., Eijkemans, M.J., & Habbema, J.D.F. (1999). Stepwise selection in small data sets: A simulation study of bias in logistic regression analysis. Journal of Clinical Epidemiology, 52, 935–942.
  • Stock, J.H., & Watson, M.W. (2003). Introduction to econometrics. Boston, MA: Addison-Wesley.
  • Subramanian, J., & Simon, R. (2013). Overfitting in prediction models—Is it a problem only in high dimensions? Contemporary Clinical Trials, 36, 636–641. doi:10.1016/j.cct.2013.06.011
  • Thompson, B. (1989). Why won't stepwise methods die? Measurement and Evaluation in Counseling and Development, 21, 146–148.
  • Thompson, B. (1995). Stepwise regression and stepwise discriminant analysis need not apply here: A guidelines editorial. Educational and Psychological Measurement, 55, 525–534. doi:10.1177/0013164495055004001
  • Thompson, B. (2001). Significance, effect sizes, stepwise methods, and other issues: Strong arguments move the field. The Journal of Experimental Education, 70, 80–93. doi:10.1080/00220970109599499
  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B, 58, 267–288. doi:10.1111/j.2517-6161.1996.tb02080.x
  • Tibshirani, R. (1997). The lasso method for variable selection in the Cox model. Statistics in Medicine, 16, 385–395. doi:10.1002/(SICI)1097-0258(19970228)16:4<385::AID-SIM380>3.0.CO;2-3
  • Tikhonov, A.N. (1943). On the stability of inverse problems. Doklady Akademii Nauk SSSR, 39, 195–198.
  • Vinod, H.D. (1978). A survey of ridge regression and related techniques for improvements over ordinary least squares. The Review of Economics and Statistics, 60, 121–131. doi:10.2307/1924340
  • Waldmann, P., Mészáros, G., Gredler, B., Fuerst, C., & Sölkner, J. (2013). Evaluation of the lasso and the elastic net in genome-wide association studies. Frontiers in Genetics, 4(270), 1–11. doi:10.3389/fgene.2013.00270
  • Wasserman, L., & Roeder, K. (2009). High dimensional variable selection. The Annals of Statistics, 37, 2178–2201. doi:10.1214/08-aos646
  • Whittingham, M.J., Stephens, P.A., Bradbury, R.B., & Freckleton, R.P. (2006). Why do we still use stepwise modelling in ecology and behaviour? Journal of Animal Ecology, 75, 1182–1189. doi:10.1111/j.1365-2656.2006.01141.x
  • Wilkinson, L. (1979). Tests of significance in stepwise regression. Psychological Bulletin, 86, 168–174. doi:10.1037/0033-2909.86.1.168
  • Wintle, B.A., McCarthy, M.A., Volinsky, C.T., & Kavanagh, R.P. (2003). The use of Bayesian model averaging to better represent uncertainty in ecological models. Conservation Biology, 17, 1579–1590. doi:10.1111/j.1523-1739.2003.00614.x
  • Yuan, M., & Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B, 68, 49–67. doi:10.1111/j.1467-9868.2005.00532.x
  • Zou, H., Hastie, T., & Tibshirani, R. (2007). On the “degrees of freedom” of the lasso. The Annals of Statistics, 35, 2173–2192. doi:10.1214/009053607000000127
  • Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101, 1418–1429. doi:10.1198/016214506000000735
  • Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B, 67, 301–320. doi:10.1111/j.1467-9868.2005.00503.x
