305
Views
9
CrossRef citations to date
0
Altmetric
`Big Data and Information Theory' in celebrating the 95th birthday of Professor Lotfi A. Zadeh

Big data analytics: integrating penalty strategies

&
Pages 105-115 | Received 15 Jan 2016, Accepted 06 Feb 2016, Published online: 16 Mar 2016
 

Abstract

We present efficient estimation and prediction strategies for the classical multiple regression model when the dimensions of the parameters are larger than the number of observations. These strategies are motivated by penalty estimation and Stein-type estimation procedures. More specifically, we consider the estimation of regression parameters in sparse linear models when some of the predictors may have a very weak influence on the response of interest. In a high-dimensional situation, a number of existing variable selection techniques exists. However, they yield different subset models and may have different numbers of predictors. Generally speaking, the least absolute shrinkage and selection operator (Lasso) approach produces an over-fitted model compared with its competitors, namely the smoothly clipped absolute deviation (SCAD) method and adaptive Lasso (aLasso). Thus, prediction based only on a submodel selected by such methods will be subject to selection bias. In order to minimize the inherited bias, we suggest combining two models to improve the estimation and prediction performance. In the context of two competing models where one model includes more predictors than the other based on relatively aggressive variable selection strategies, we plan to investigate the relative performance of Stein-type shrinkage and penalty estimators. The shrinkage estimator improves the prediction performance of submodels significantly selected from existing Lasso-type variable selection methods. A Monte Carlo simulation study is carried out using the relative mean squared error (RMSE) criterion to appraise the performance of the listed estimators. The proposed strategy is applied to the analysis of several real high-dimensional data sets.

AMS Subject Classifications:

Acknowledgements

We would like to thank Editor in Chief Professor Jiuping Xu and Managing Editor Dr Zongmin Li for their suggestions and encouragement regarding this paper.

Notes

No potential conflict of interest was reported by the authors.

Additional information

Funding

The research of the first author, S. Ejaz Ahmed, is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).

Log in via your institution

Log in to Taylor & Francis Online

PDF download + Online access

  • 48 hours access to article PDF & online version
  • Article PDF can be downloaded
  • Article PDF can be printed
USD 61.00 Add to cart

Issue Purchase

  • 30 days online access to complete issue
  • Article PDFs can be downloaded
  • Article PDFs can be printed
USD 289.00 Add to cart

* Local tax will be added as applicable

Related Research

People also read lists articles that other readers of this article have read.

Recommended articles lists articles that we recommend and is powered by our AI driven recommendation engine.

Cited by lists all citing articles based on Crossref citations.
Articles with the Crossref icon will open in a new tab.