Research Article

Independence-Encouraging Subsampling for Nonparametric Additive Models

Received 23 May 2023, Accepted 24 Feb 2024, Published online: 15 Apr 2024

References

  • Ai, M., Yu, J., Zhang, H., and Wang, H. (2021), “Optimal Subsampling Algorithms for Big Data Regressions,” Statistica Sinica, 31, 749–772. DOI: 10.5705/ss.202018.0439.
  • Breiman, L., and Friedman, J. H. (1985), “Estimating Optimal Transformations for Multiple Regression and Correlation,” Journal of the American Statistical Association, 80, 580–598. DOI: 10.1080/01621459.1985.10478157.
  • Buja, A., Hastie, T., and Tibshirani, R. (1989), “Linear Smoothers and Additive Models,” The Annals of Statistics, 17, 453–510. DOI: 10.1214/aos/1176347115.
  • Cheng, C.-S. (1980), “Orthogonal Arrays with Variable Numbers of Symbols,” The Annals of Statistics, 8, 447–453. DOI: 10.1214/aos/1176344964.
  • Cleveland, W. S. (1979), “Robust Locally Weighted Regression and Smoothing Scatterplots,” Journal of the American Statistical Association, 74, 829–836. DOI: 10.1080/01621459.1979.10481038.
  • Fan, J. (1992), “Design-Adaptive Nonparametric Regression,” Journal of the American Statistical Association, 87, 998–1004. DOI: 10.1080/01621459.1992.10476255.
  • Fan, J., and Gijbels, I. (1996), Local Polynomial Modelling and Its Applications, London: Chapman and Hall.
  • Han, L., Tan, K. M., Yang, T., and Zhang, T. (2020), “Local Uncertainty Sampling for Large-Scale Multiclass Logistic Regression,” The Annals of Statistics, 48, 1770–1788. DOI: 10.1214/19-AOS1867.
  • Hastie, T. (2015), Generalized Additive Models, R package version 1.20.1.
  • Hastie, T., and Tibshirani, R. (1986), “Generalized Additive Models,” Statistical Science, 1, 297–310. DOI: 10.1214/ss/1177013604.
  • Hastie, T., and Tibshirani, R. (1990), Generalized Additive Models, New York: Chapman and Hall.
  • He, L., and Hung, Y. (2022), “Gaussian Process Prediction Using Design-based Subsampling,” Statistica Sinica, 32, 1165–1186. DOI: 10.5705/ss.202019.0376.
  • Hedayat, A., Sloane, N., and Stufken, J. (1999), Orthogonal Arrays: Theory and Applications. Springer Series in Statistics, New York: Springer.
  • Joseph, V. R., Gul, E., and Ba, S. (2015), “Maximum Projection Designs for Computer Experiments,” Biometrika, 102, 371–380. DOI: 10.1093/biomet/asv002.
  • Joseph, V. R., and Mak, S. (2021), “Supervised Compression of Big Data,” Statistical Analysis and Data Mining: The ASA Data Science Journal, 14, 217–229. DOI: 10.1002/sam.11508.
  • Lin, C. D., and Tang, B. (2015), “Latin Hypercubes and Space-Filling Designs,” in Handbook of Design and Analysis of Experiments, pp. 593–625.
  • Ma, P., Mahoney, M., and Yu, B. (2014), “A Statistical Perspective on Algorithmic Leveraging,” in International Conference on Machine Learning, pp. 91–99, PMLR.
  • Ma, P., and Sun, X. (2015), “Leveraging for Big Data Regression,” Wiley Interdisciplinary Reviews: Computational Statistics, 7, 70–76. DOI: 10.1002/wics.1324.
  • Ma, P., Zhang, X., Xing, X., Ma, J., and Mahoney, M. (2020), “Asymptotic Analysis of Sampling Estimators for Randomized Linear Algebra Algorithms,” PMLR, 108, 1026–1035.
  • Mak, S., and Joseph, V. R. (2018), “Support Points,” The Annals of Statistics, 46, 2562–2592. DOI: 10.1214/17-AOS1629.
  • Meng, C., Xie, R., Mandal, A., Zhang, X., Zhong, W., and Ma, P. (2021), “LowCon: A Design-based Subsampling Approach in a Misspecified Linear Model,” Journal of Computational and Graphical Statistics, 30, 694–708. DOI: 10.1080/10618600.2020.1844215.
  • Meng, C., Zhang, X., Zhang, J., Zhong, W., and Ma, P. (2020), “More Efficient Approximation of Smoothing Splines via Space-Filling Basis Selection,” Biometrika, 107, 723–735. DOI: 10.1093/biomet/asaa019.
  • Mukerjee, R., and Wu, C.-F. (2006), A Modern Theory of Factorial Design, New York: Springer.
  • Opsomer, J. D. (2000), “Asymptotic Properties of Backfitting Estimators,” Journal of Multivariate Analysis, 73, 166–179. DOI: 10.1006/jmva.1999.1868.
  • Opsomer, J. D., and Ruppert, D. (1997), “Fitting a Bivariate Additive Model by Local Polynomial Regression,” The Annals of Statistics, 25, 186–211. DOI: 10.1214/aos/1034276626.
  • Shi, C., and Tang, B. (2021), “Model-Robust Subdata Selection for Big Data,” Journal of Statistical Theory and Practice, 15, 1–17. DOI: 10.1007/s42519-021-00217-9.
  • Wang, H., and Ma, Y. (2021), “Optimal Subsampling for Quantile Regression in Big Data,” Biometrika, 108, 99–112. DOI: 10.1093/biomet/asaa043.
  • Wang, H., Yang, M., and Stufken, J. (2019), “Information-based Optimal Subdata Selection for Big Data Linear Regression,” Journal of the American Statistical Association, 114, 393–405. DOI: 10.1080/01621459.2017.1408468.
  • Wang, H., Zhu, R., and Ma, P. (2018), “Optimal Subsampling for Large Sample Logistic Regression,” Journal of the American Statistical Association, 113, 829–844. DOI: 10.1080/01621459.2017.1292914.
  • Wang, L. (2022), “Balanced Subsampling for Big Data with Categorical Covariates,” arXiv preprint arXiv:2212.12595.
  • Wang, L., Elmstedt, J., Wong, W. K., and Xu, H. (2021), “Orthogonal Subsampling for Big Data Linear Regression,” The Annals of Applied Statistics, 15, 1273–1290. DOI: 10.1214/21-AOAS1462.
  • Wang, L., and Xu, H. (2022), “A Class of Multilevel Nonregular Designs for Studying Quantitative Factors,” Statistica Sinica, 32, 825–845. DOI: 10.5705/ss.202020.0223.
  • Wasserman, L. (2006), All of Nonparametric Statistics, New York: Springer.
  • Wu, C. J., and Hamada, M. S. (2011), Experiments: Planning, Analysis, and Optimization, Hoboken, NJ: Wiley.
  • Yang, Y., Pilanci, M., and Wainwright, M. J. (2017), “Randomized Sketches for Kernels: Fast and Optimal Nonparametric Regression,” The Annals of Statistics, 45, 991–1023. DOI: 10.1214/16-AOS1472.
  • Yi, S.-Y., and Zhou, Y.-D. (2023), “Model-Free Global Likelihood Subsampling for Massive Data,” Statistics and Computing, 33, 9. DOI: 10.1007/s11222-022-10185-0.
  • Zhang, J., Meng, C., Yu, J., Zhang, M., Zhong, W., and Ma, P. (2023), “An Optimal Transport Approach for Selecting a Representative Subsample with Application in Efficient Kernel Density Estimation,” Journal of Computational and Graphical Statistics, 32, 329–339. DOI: 10.1080/10618600.2022.2084404.
  • Zhang, X., Park, B. U., and Wang, J.-L. (2013), “Time-Varying Additive Models for Longitudinal Data,” Journal of the American Statistical Association, 108, 983–998. DOI: 10.1080/01621459.2013.778776.
  • Zhao, Y., Amemiya, Y., and Hung, Y. (2018), “Efficient Gaussian Process Modeling Using Experimental Design-based Subagging,” Statistica Sinica, 28, 1459–1479.
  • Zhu, J., Wang, L., and Sun, F. (2024), “Group-Orthogonal Subsampling for Hierarchical Data based on Linear Mixed Models,” Journal of Computational and Graphical Statistics. DOI: 10.1080/10618600.2023.2301093.
