Journal of the American Statistical Association Volume 113, 2018 - Issue 522

Theory and Methods

Optimal Subsampling for Large Sample Logistic Regression

HaiYing Wang, Department of Mathematics and Statistics, University of New Hampshire, Durham, NH; Department of Statistics, University of Connecticut, Storrs, CT

Rong Zhu, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China

Ping Ma, Department of Statistics, University of Georgia, Athens, GA

Pages 829-844 | Received 01 Mar 2016, Published online: 06 Jun 2018

https://doi.org/10.1080/01621459.2017.1292914

References

Atkinson, A., Donev, A., and Tobias, R. (2007), Optimum Experimental Designs, With SAS (Vol. 34), Oxford: Oxford University Press.
Baldi, P., Sadowski, P., and Whiteson, D. (2014), “Searching for Exotic Particles in High-Energy Physics With Deep Learning,” Nature Communications, 5, 1–9, available at https://doi.org/10.1038/ncomms5308.
Buldygin, V., and Kozachenko, Y. V. (1980), “Sub-Gaussian Random Variables,” Ukrainian Mathematical Journal, 32, 483–489.
Clarkson, K. L., and Woodruff, D. P. (2013), “Low Rank Approximation and Regression in Input Sparsity Time,” in Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, ACM, pp. 81–90.
Dhillon, P., Lu, Y., Foster, D. P., and Ungar, L. (2013), “New Subsampling Algorithms for Fast Least Squares Regression,” in Advances in Neural Information Processing Systems, pp. 360–368.
Dines, L. L. (1926), “Note on Certain Associated Systems of Linear Equalities and Inequalities,” Annals of Mathematics, 28, 41–42.
Drineas, P., Magdon-Ismail, M., Mahoney, M., and Woodruff, D. (2012), “Faster Approximation of Matrix Coherence and Statistical Leverage,” Journal of Machine Learning Research, 13, 3475–3506.
Drineas, P., Mahoney, M., Muthukrishnan, S., and Sarlos, T. (2011), “Faster Least Squares Approximation,” Numerische Mathematik, 117, 219–249.
Drineas, P., Mahoney, M. W., and Muthukrishnan, S. (2006), “Sampling Algorithms for ℓ2 Regression and Applications,” in Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, Philadelphia, PA: Society for Industrial and Applied Mathematics, pp. 1127–1136.
Efron, B. (1979), “Bootstrap Methods: Another Look at the Jackknife,” The Annals of Statistics, 7, 1–26.
Efron, B., and Tibshirani, R. J. (1994), An Introduction to the Bootstrap, Boca Raton, FL: CRC Press.
Fithian, W., and Hastie, T. (2014), “Local Case-Control Sampling: Efficient Subsampling in Imbalanced Data Sets,” The Annals of Statistics, 42, 1693–1724.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2014), Bayesian Data Analysis (3rd ed.), Boca Raton, FL: Chapman and Hall/CRC.
Hosmer Jr, D. W., Lemeshow, S., and Sturdivant, R. X. (2013), Applied Logistic Regression (Vol. 398), New York: Wiley.
Kiefer, J. (1959), “Optimum Experimental Designs,” Journal of the Royal Statistical Society, Series B, 21, 272–319.
King, G., and Zeng, L. (2001), “Logistic Regression in Rare Events Data,” Political Analysis, 9, 137–163.
Kohavi, R. (1996), “Scaling up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid,” in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pp. 202–207.
Lichman, M. (2013), UCI Machine Learning Repository, Irvine, CA: University of California, School of Information and Computer Science, available at http://archive.ics.uci.edu/ml.
Ma, P., Mahoney, M., and Yu, B. (2014), “A Statistical Perspective on Algorithmic Leveraging,” in Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp. 91–99.
——— (2015), “A Statistical Perspective on Algorithmic Leveraging,” Journal of Machine Learning Research, 16, 861–911.
Ma, P., and Sun, X. (2015), “Leveraging for Big Data Regression,” Wiley Interdisciplinary Reviews: Computational Statistics, 7, 70–76.
Mahoney, M. W., and Drineas, P. (2009), “CUR Matrix Decompositions for Improved Data Analysis,” Proceedings of the National Academy of Sciences, 106, 697–702.
McWilliams, B., Krummenacher, G., Lucic, M., and Buhmann, J. M. (2014), “Fast and Robust Least Squares Estimation in Corrupted Linear Models,” in Advances in Neural Information Processing Systems, pp. 415–423.
Owen, A. B. (2007), “Infinitely Imbalanced Logistic Regression,” The Journal of Machine Learning Research, 8, 761–773.
R Core Team (2015), R: A Language and Environment for Statistical Computing, Vienna, Austria: R Foundation for Statistical Computing, available at https://www.R-project.org/.
Rokhlin, V., and Tygert, M. (2008), “A Fast Randomized Algorithm for Overdetermined Linear Least-Squares Regression,” Proceedings of the National Academy of Sciences, 105, 13212–13217.
Scott, A. J., and Wild, C. J. (1986), “Fitting Logistic Models Under Case-Control or Choice Based Sampling,” Journal of the Royal Statistical Society, Series B, 48, 170–182.
Silvapulle, M. (1981), “On the Existence of Maximum Likelihood Estimators for the Binomial Response Models,” Journal of the Royal Statistical Society, Series B, 43, 310–313.
