Statistics
A Journal of Theoretical and Applied Statistics
Volume 57, 2023 - Issue 4
Research Article

Optimal subsampling algorithms for composite quantile regression in massive data

Pages 811-843 | Received 18 Nov 2020, Accepted 17 Jul 2023, Published online: 24 Jul 2023

