Scalable and Efficient Computation

Compressed and Penalized Linear Regression

Pages 309-322 | Received 23 May 2018, Accepted 14 Aug 2019, Published online: 30 Sep 2019
