Theory and Methods

Optimal Subsampling for Large Sample Logistic Regression

Pages 829-844 | Received 01 Mar 2016, Published online: 06 Jun 2018

