Regularization

Fast Cross-validation for Multi-penalty High-dimensional Ridge Regression

Pages 835-847 | Received 20 May 2020, Accepted 12 Mar 2021, Published online: 19 May 2021
 

Abstract

High-dimensional prediction with multiple data types needs to account for potentially strong differences in predictive signal. Ridge regression is a simple model for high-dimensional data that has challenged the predictive performance of many more complex models and learners, and that allows inclusion of data type-specific penalties. The largest challenge for multi-penalty ridge is to optimize these penalties efficiently in a cross-validation (CV) setting, in particular for GLM and Cox ridge regression, which require an additional estimation loop by iterative weighted least squares (IWLS). Our main contribution is a computationally very efficient formula for the multi-penalty, sample-weighted hat-matrix, as used in the IWLS algorithm. As a result, nearly all computations are in low-dimensional space, rendering a speed-up of several orders of magnitude. We developed a flexible framework that facilitates multiple types of response, unpenalized covariates, several performance criteria and repeated CV. Extensions to paired and preferential data types are included and illustrated on several cancer genomics survival prediction problems. Moreover, we present similar computational shortcuts for maximum marginal likelihood and Bayesian probit regression. The corresponding R-package, multiridge, serves as a versatile standalone tool, but also as a fast benchmark for other more complex models and multi-view learners. Supplementary materials for this article are available online.
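To illustrate why nearly all computations can stay in low-dimensional space, here is a minimal NumPy sketch. It is not the multiridge implementation, and the function names are hypothetical. It assumes there are no unpenalized covariates and relies on the standard Woodbury-type identity: with data blocks X_b (n samples by p_b features), penalties λ_b, and a diagonal IWLS weight matrix W, the hat matrix X (XᵀWX + Λ)⁻¹ XᵀW equals Σ (W⁻¹ + Σ)⁻¹, where Σ = Σ_b X_b X_bᵀ / λ_b is only n × n.

```python
import numpy as np


def block_gram_matrices(blocks):
    """Precompute the n x n Gram matrix X_b X_b^T for each data block (done once)."""
    return [Xb @ Xb.T for Xb in blocks]


def hat_matrix_low_dim(grams, lambdas, weights):
    """Weighted multi-penalty ridge hat matrix using only n x n algebra.

    Computes H = Sigma (W^{-1} + Sigma)^{-1} with
    Sigma = sum_b X_b X_b^T / lambda_b. Assumes strictly positive weights.
    """
    sigma = sum(G / lam for G, lam in zip(grams, lambdas))
    w_inv = np.diag(1.0 / weights)
    # Both matrices are symmetric, so H^T = (W^{-1} + Sigma)^{-1} Sigma.
    return np.linalg.solve(w_inv + sigma, sigma).T


def hat_matrix_naive(blocks, lambdas, weights):
    """Reference version in p-dimensional space: X (X^T W X + Lambda)^{-1} X^T W."""
    X = np.hstack(blocks)
    penalty = np.concatenate(
        [np.full(Xb.shape[1], lam) for Xb, lam in zip(blocks, lambdas)]
    )
    W = np.diag(weights)
    return X @ np.linalg.solve(X.T @ W @ X + np.diag(penalty), X.T @ W)


# Toy data: two data types with different dimensions (illustrative values only).
rng = np.random.default_rng(0)
n = 20
blocks = [rng.standard_normal((n, 300)), rng.standard_normal((n, 150))]
lambdas = [5.0, 50.0]
weights = rng.uniform(0.1, 0.25, size=n)  # e.g. logistic IWLS weights p(1 - p)

grams = block_gram_matrices(blocks)  # reused for every penalty and weight update
H_fast = hat_matrix_low_dim(grams, lambdas, weights)
H_slow = hat_matrix_naive(blocks, lambdas, weights)
max_abs_diff = np.max(np.abs(H_fast - H_slow))
```

In this sketch the block Gram matrices are computed once. Changing the penalties during CV or the weights during an IWLS iteration then only requires rescaling and solving n × n systems, which is where the savings come from when p is much larger than n.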

Supplementary Material

Table with computing times CV single-penalty ridge

Details on equivalence IWLS with Newton updating

Proof of Proposition 1

Details on prediction for new samples and estimation of coefficients

Details on Cox ridge regression

Details on ML derivation and comparison with CV

Summary of functionality of the multiridge package

Details about the data

Table comparing coefficients of multiridge and multiridge_pref

Multi-penalty Bayesian probit regression

Profile plots for CVL and AUC

ROC curves for cervical cancer example

Acknowledgments

The authors acknowledge NWO-ZonMw TOP grant COMPUTE CANCER (40-00812-98-16012).

