Bayesian Dynamic Feature Partitioning in High-Dimensional Regression With Big Data: Technometrics: Vol 64, No 2

Abstract

Bayesian computation for high-dimensional linear regression models using Markov chain Monte Carlo (MCMC) or its variants can be extremely slow or entirely prohibitive, since these methods perform costly computations at each iteration of the sampling chain. Furthermore, this computational cost usually cannot be divided efficiently across a parallel architecture. These problems are aggravated when the data size is large or the data arrive sequentially over time (streaming or online settings). This article proposes a novel dynamic feature partitioned regression (DFP) for efficient online inference in high-dimensional linear regressions with large or streaming data. DFP constructs a pseudo posterior density of the parameters at every time point and quickly updates it when a new block of data (a data shard) arrives. At each update, DFP also partitions the set of parameters so that posterior computation can exploit parallelization. The proposed approach is applied to high-dimensional linear regression models with Gaussian scale mixture priors and spike-and-slab priors on large parameter spaces, along with large data, and is found to yield state-of-the-art inferential performance. The algorithm enjoys theoretical support: as shown in the supplementary material, the pseudo posterior densities over time become arbitrarily close to the full posterior as the data size grows. The supplementary material also contains details of the DFP algorithm applied to different priors. A package implementing DFP is available at https://github.com/Rene-Gutierrez/DynParRegReg. The dataset is available at https://github.com/Rene-Gutierrez/DynParRegReg_Implementation.
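The two ideas named in the abstract, folding each new data shard into a running summary and updating blocks of coefficients separately, can be illustrated with a small sketch. This is not the DFP algorithm or the API of the DynParRegReg package: it assumes a plain Gaussian (ridge-type) prior with known noise variance `sigma2` and prior variance `tau2`, a fixed partition of the features, and plug-in conditioning on the current values of the other blocks. The function names `update_stats` and `block_update` are hypothetical.

```python
import numpy as np

def update_stats(stats, X, y):
    """Fold one data shard into the running sufficient statistics."""
    XtX, Xty, n = stats
    return XtX + X.T @ X, Xty + X.T @ y, n + len(y)

def block_update(XtX, Xty, beta, block, sigma2=1.0, tau2=10.0):
    """Conditional Gaussian posterior mean of beta[block], holding the
    other coefficients fixed at their current values (an assumption of
    this sketch, not a statement about DFP)."""
    others = np.setdiff1d(np.arange(len(beta)), block)
    # X_b'(y - X_o beta_o), computed from sufficient statistics only
    rhs = Xty[block] - XtX[np.ix_(block, others)] @ beta[others]
    prec = XtX[np.ix_(block, block)] / sigma2 + np.eye(len(block)) / tau2
    return np.linalg.solve(prec, rhs / sigma2)

rng = np.random.default_rng(0)
p = 6
true_beta = np.array([2.0, 0.0, -1.0, 0.0, 0.5, 0.0])
blocks = [np.array([0, 1, 2]), np.array([3, 4, 5])]  # fixed partition

stats = (np.zeros((p, p)), np.zeros(p), 0)
beta = np.zeros(p)

for t in range(50):  # 50 shards of 100 observations each
    X = rng.normal(size=(100, p))
    y = X @ true_beta + rng.normal(size=100)
    stats = update_stats(stats, X, y)
    XtX, Xty, n = stats
    # Each block reads only the previous beta, so these updates are
    # independent of one another and could be dispatched in parallel.
    new_values = [block_update(XtX, Xty, beta, b) for b in blocks]
    for b, v in zip(blocks, new_values):
        beta[b] = v

print(np.round(beta, 2))
```

Because only the cross-product summaries are stored, the cost of processing a shard does not grow with the number of observations already seen, and each block update solves a system whose size is the block size rather than the full number of features.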

Supplementary Material

Supplementary Material 1: This document contains a proof of the convergence behavior of the DFP algorithm. It also contains details of the DFP algorithm when applied to high-dimensional linear regression with the Bayesian Lasso prior, the Horseshoe prior, and the Spike & Lasso prior on the coefficients.

TestScript.R: The package implementing DFP is available at https://github.com/Rene-Gutierrez/DynParRegReg. We also provide the file TestScript.R, which uses functions from the package to run the simulations.

Additional information

Funding

The research of Rajarshi Guhaniyogi is partially supported by grants from the Office of Naval Research (BAA N000141812741) and the National Science Foundation (DMS-1854662).
