References
- Berger, Y. G., and De La Riva Torres, O. (2016), “Empirical Likelihood Confidence Intervals for Complex Sampling Designs,” Journal of the Royal Statistical Society, Series B, 78, 319–341. DOI: https://doi.org/10.1111/rssb.12115.
- Breidt, F. J., and Opsomer, J. D. (2000), “Local Polynomial Regression Estimators in Survey Sampling,” The Annals of Statistics, 28, 1026–1053. DOI: https://doi.org/10.1214/aos/1015956706.
- Chen, K., Hu, I., and Ying, Z. (1999), “Strong Consistency of Maximum Quasi-Likelihood Estimators in Generalized Linear Models With Fixed and Adaptive Designs,” The Annals of Statistics, 27, 1155–1163. DOI: https://doi.org/10.1214/aos/1017938919.
- Chen, X. (2011), Quasi Likelihood Method for Generalized Linear Model (in Chinese), Hefei: Press of University of Science and Technology of China.
- Dhillon, P. S., Lu, Y., Foster, D., and Ungar, L. (2013), “New Subsampling Algorithms for Fast Least Squares Regression,” in International Conference on Neural Information Processing Systems, pp. 360–368.
- Drineas, P., Mahoney, M. W., Muthukrishnan, S., and Sarlós, T. (2011), “Faster Least Squares Approximation,” Numerische Mathematik, 117, 219–249. DOI: https://doi.org/10.1007/s00211-010-0331-6.
- Duchi, J. C., Agarwal, A., and Wainwright, M. J. (2012), “Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling,” IEEE Transactions on Automatic Control, 57, 592–606. DOI: https://doi.org/10.1109/TAC.2011.2161027.
- Fahrmeir, L., and Tutz, G. (2001), Multivariate Statistical Modelling Based on Generalized Linear Models, New York: Springer-Verlag.
- Jordan, M. I., Lee, J. D., and Yang, Y. (2019), “Communication-Efficient Distributed Statistical Inference,” Journal of the American Statistical Association, 114, 668–681. DOI: https://doi.org/10.1080/01621459.2018.1429274.
- Kleiner, A., Talwalkar, A., Sarkar, P., and Jordan, M. I. (2014), “A Scalable Bootstrap for Massive Data,” Journal of the Royal Statistical Society, Series B, 76, 795–816. DOI: https://doi.org/10.1111/rssb.12050.
- Li, R., Lin, D. K., and Li, B. (2013), “Statistical Inference in Massive Data Sets,” Applied Stochastic Models in Business and Industry, 29, 399–409. DOI: https://doi.org/10.1002/asmb.1927.
- Lin, N., and Xi, R. (2011), “Aggregated Estimating Equation Estimation,” Statistics and Its Interface, 4, 73–83. DOI: https://doi.org/10.4310/SII.2011.v4.n1.a8.
- Ma, P., Mahoney, M. W., and Yu, B. (2015), “A Statistical Perspective on Algorithmic Leveraging,” Journal of Machine Learning Research, 16, 861–919.
- Mahoney, M. W. (2012), “Randomized Algorithms for Matrices and Data,” Foundations and Trends® in Machine Learning, 3, 647–672.
- McCullagh, P., and Nelder, J. A. (1989), Generalized Linear Models, Monographs on Statistics and Applied Probability (Vol. 37), London: Chapman & Hall.
- Neath, A. A., and Cavanaugh, J. E. (2012), “The Bayesian Information Criterion: Background, Derivation, and Applications,” Wiley Interdisciplinary Reviews: Computational Statistics, 4, 199–203. DOI: https://doi.org/10.1002/wics.199.
- Newey, W. K., and McFadden, D. (1994), “Large Sample Estimation and Hypothesis Testing,” in Handbook of Econometrics (Vol. 4), eds. R. F. Engle and D. L. McFadden, Amsterdam: Elsevier, pp. 2111–2245.
- Pukelsheim, F. (2006), Optimal Design of Experiments, Philadelphia, PA: Society for Industrial and Applied Mathematics.
- Quiroz, M., Kohn, R., Villani, M., and Tran, M.-N. (2019), “Speeding Up MCMC by Efficient Data Subsampling,” Journal of the American Statistical Association, 114, 831–843. DOI: https://doi.org/10.1080/01621459.2018.1448827.
- R Core Team (2018), R: A Language and Environment for Statistical Computing, Vienna, Austria: R Foundation for Statistical Computing, available at https://www.R-project.org/.
- Rao, C. R., Toutenburg, H., Shalabh, and Heumann, C. (2007), Linear Models and Generalizations: Least Squares and Alternatives (3rd ed.), Berlin, Heidelberg: Springer.
- Särndal, C. E., Swensson, B., and Wretman, J. (1992), Model Assisted Survey Sampling, New York: Springer.
- Schifano, E. D., Wu, J., Wang, C., Yan, J., and Chen, M.-H. (2016), “Online Updating of Statistical Inference in the Big Data Setting,” Technometrics, 58, 393–403. DOI: https://doi.org/10.1080/00401706.2016.1142900.
- Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., and Su, Z. (2008), “Arnetminer: Extraction and Mining of Academic Social Networks,” in KDD’08, pp. 990–998.
- Tzavelas, G. (1998), “A Note on the Uniqueness of the Quasi-Likelihood Estimator,” Statistics & Probability Letters, 38, 125–130. DOI: https://doi.org/10.1016/S0167-7152(97)00162-4.
- van der Vaart, A. (1998), Asymptotic Statistics, New York: Cambridge University Press.
- Wang, H. Y., Yang, M., and Stufken, J. (2019), “Information-Based Optimal Subdata Selection for Big Data Linear Regression,” Journal of the American Statistical Association, 114, 393–405. DOI: https://doi.org/10.1080/01621459.2017.1408468.
- Wang, H. Y., Zhu, R., and Ma, P. (2018), “Optimal Subsampling for Large Sample Logistic Regression,” Journal of the American Statistical Association, 113, 829–844. DOI: https://doi.org/10.1080/01621459.2017.1292914.