Research Article

A Projection Approach to Local Regression with Variable-Dimension Covariates

Received 13 Feb 2023, Accepted 16 May 2024, Published online: 17 Jun 2024
