Research Article

smashGP: Large-Scale Spatial Modeling via Matrix-Free Gaussian Processes

Received 24 Jan 2023, Accepted 03 May 2024, Published online: 13 Jun 2024

References

  • Abdulah, S., Ltaief, H., Sun, Y., Genton, M. G., and Keyes, D. E. (2018), “ExaGeoStat: A High Performance Unified Software for Geostatistics on Manycore Systems,” IEEE Transactions on Parallel and Distributed Systems, 29, 2771–2784. DOI: 10.1109/TPDS.2018.2850749.
  • Anderson, C., Lee, D., and Dean, N. (2014), “Identifying Clusters in Bayesian Disease Mapping,” Biostatistics, 15, 457–469. DOI: 10.1093/biostatistics/kxu005.
  • Andugula, P., Durbha, S. S., Lokhande, A., and Suradhaniwar, S. (2017), “Gaussian Process based Spatial Modeling of Soil Moisture for Dense Soil Moisture Sensing Network,” in 2017 6th International Conference on Agro-Geoinformatics, pp. 1–5. DOI: 10.1109/Agro-Geoinformatics.2017.8047014.
  • Anitescu, M., Chen, J., and Wang, L. (2012), “A Matrix-free Approach for Solving the Parametric Gaussian Process Maximum Likelihood Problem,” SIAM Journal on Scientific Computing, 34, A240–A262. DOI: 10.1137/110831143.
  • Banerjee, S., Carlin, B. P., and Gelfand, A. E. (2014), Hierarchical Modeling and Analysis for Spatial Data, Boca Raton, FL: CRC Press.
  • Börm, S., and Garcke, J. (2007), “Approximating Gaussian Processes with H2 Matrices,” in European Conference on Machine Learning, pp. 42–53, Springer.
  • Cai, D., Chow, E., Erlandson, L., Saad, Y., and Xi, Y. (2018), “SMASH: Structured Matrix Approximation by Separation and Hierarchy,” Numerical Linear Algebra with Applications, 25, e2204. DOI: 10.1002/nla.2204.
  • Cressie, N. (2015), Statistics for Spatial Data, New York: Wiley.
  • Cressie, N., and Johannesson, G. (2008), “Fixed Rank Kriging for Very Large Spatial Data Sets,” Journal of the Royal Statistical Society, Series B, 70, 209–226. DOI: 10.1111/j.1467-9868.2007.00633.x.
  • Cutajar, K., Osborne, M., Cunningham, J., and Filippone, M. (2016), “Preconditioning Kernel Matrices,” in International Conference on Machine Learning, pp. 2529–2538.
  • Datta, A., Banerjee, S., Finley, A. O., and Gelfand, A. E. (2016), “Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets,” Journal of the American Statistical Association, 111, 800–812. DOI: 10.1080/01621459.2015.1044091.
  • Dongarra, J., Du Croz, J., Hammarling, S., and Duff, I. (1990), “A Set of Level 3 Basic Linear Algebra Subprograms,” ACM Transactions on Mathematical Software, 16, 1–17. DOI: 10.1145/77626.79170.
  • Erlandson, L., Cai, D., Xi, Y., and Chow, E. (2020), “Accelerating Parallel Hierarchical Matrix-Vector Products via Data-Driven Sampling,” in 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 749–758.
  • Fang, D., Zhang, X., Yu, Q., Jin, T. C., and Tian, L. (2018), “A Novel Method for Carbon Dioxide Emission Forecasting Based on Improved Gaussian Processes Regression,” Journal of Cleaner Production, 173, 143–150. DOI: 10.1016/j.jclepro.2017.05.102.
  • Finley, A. O., Datta, A., Cook, B. D., Morton, D. C., Andersen, H. E., and Banerjee, S. (2019), “Efficient Algorithms for Bayesian Nearest Neighbor Gaussian Processes,” Journal of Computational and Graphical Statistics, 28, 401–414. DOI: 10.1080/10618600.2018.1537924.
  • Finley, A. O., Sang, H., Banerjee, S., and Gelfand, A. E. (2009), “Improving the Performance of Predictive Process Modeling for Large Datasets,” Computational Statistics & Data Analysis, 53, 2873–2884. DOI: 10.1016/j.csda.2008.09.008.
  • Furrer, R., Genton, M. G., and Nychka, D. (2006), “Covariance Tapering for Interpolation of Large Spatial Datasets,” Journal of Computational and Graphical Statistics, 15, 502–523. DOI: 10.1198/106186006X132178.
  • Furrer, R., and Sain, S. (2010), “spam: A Sparse Matrix R Package with Emphasis on MCMC Methods for Gaussian Markov Random Fields,” Journal of Statistical Software, 36, 1–25. DOI: 10.18637/jss.v036.i10.
  • Gardner, J., Pleiss, G., Weinberger, K. Q., Bindel, D., and Wilson, A. G. (2018), “GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration,” in Advances in Neural Information Processing Systems (Vol. 31), eds. S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, Curran Associates, Inc.
  • Geoga, C. J., Anitescu, M., and Stein, M. L. (2020), “Scalable Gaussian Process Computations Using Hierarchical Matrices,” Journal of Computational and Graphical Statistics, 29, 227–237. DOI: 10.1080/10618600.2019.1652616.
  • Gerber, F., de Jong, R., Schaepman, M. E., Schaepman-Strub, G., and Furrer, R. (2018), “Predicting Missing Values in Spatio-Temporal Remote Sensing Data,” IEEE Transactions on Geoscience and Remote Sensing, 56, 2841–2853. DOI: 10.1109/TGRS.2017.2785240.
  • Gneiting, T., and Raftery, A. E. (2007), “Strictly Proper Scoring Rules, Prediction, and Estimation,” Journal of the American Statistical Association, 102, 359–378. DOI: 10.1198/016214506000001437.
  • Gramacy, R. (2016), “laGP: Large-Scale Spatial Modeling via Local Approximate Gaussian Processes in R,” Journal of Statistical Software, 72, 1–46. DOI: 10.18637/jss.v072.i01.
  • Guhaniyogi, R., and Banerjee, S. (2018), “Meta-Kriging: Scalable Bayesian Modeling and Inference for Massive Spatial Datasets,” Technometrics, 60, 430–444. DOI: 10.1080/00401706.2018.1437474.
  • Guinness, J. (2019), “Spectral Density Estimation for Random Fields via Periodic Embeddings,” Biometrika, 106, 267–286. DOI: 10.1093/biomet/asz004.
  • Hackbusch, W. (2015), Hierarchical Matrices: Algorithms and Analysis (Vol. 49), Berlin: Springer.
  • Heaton, M. J., Christensen, W. F., and Terres, M. A. (2017), “Nonstationary Gaussian Process Models Using Spatial Hierarchical Clustering from Finite Differences,” Technometrics, 59, 93–101. DOI: 10.1080/00401706.2015.1102763.
  • Heaton, M. J., Datta, A., Finley, A. O., Furrer, R., Guinness, J., Guhaniyogi, R., Gerber, F., Gramacy, R. B., Hammerling, D., Katzfuss, M., Lindgren, F., Nychka, D. W., Sun, F., and Zammit-Mangion, A. (2019), “A Case Study Competition Among Methods for Analyzing Large Spatial Data,” Journal of Agricultural, Biological and Environmental Statistics, 24, 398–425. DOI: 10.1007/s13253-018-00348-w.
  • Hensman, J., Fusi, N., and Lawrence, N. D. (2013), “Gaussian Processes for Big Data,” in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, Arlington, Virginia, USA: AUAI Press, UAI’13, pp. 282–290.
  • Hutchinson, M. (1989), “A Stochastic Estimator of the Trace of the Influence Matrix for Laplacian Smoothing Splines,” Communications in Statistics - Simulation and Computation, 18, 1059–1076.
  • Jurek, M., and Katzfuss, M. (2021), “Multi-Resolution Filters for Massive Spatio-Temporal Data,” Journal of Computational and Graphical Statistics, 30, 1095–1110. DOI: 10.1080/10618600.2021.1886938.
  • Katzfuss, M. (2017), “A Multi-Resolution Approximation for Massive Spatial Datasets,” Journal of the American Statistical Association, 112, 201–214. DOI: 10.1080/01621459.2015.1123632.
  • Kaul, M., Yang, B., and Jensen, C. S. (2013), “Building Accurate 3D Spatial Networks to Enable Next Generation Intelligent Transportation Systems,” in 2013 IEEE 14th International Conference on Mobile Data Management (Vol. 1), pp. 137–146.
  • Keshavarzzadeh, V., Zhe, S., Kirby, R. M., and Narayan, A. (2021), “GP-HMAT: Scalable, $O(n\log(n))$ Gaussian Process Regression with Hierarchical Low-Rank Matrices,” arXiv:2201.00888 [cs, math].
  • Kim, H.-M., Mallick, B. K., and Holmes, C. C. (2005), “Analyzing Nonstationary Spatial Data Using Piecewise Gaussian Processes,” Journal of the American Statistical Association, 100, 653–668. DOI: 10.1198/016214504000002014.
  • Knorr-Held, L., and Rasser, G. (2000), “Bayesian Detection of Clusters and Discontinuities in Disease Maps,” Biometrics, 56, 13–21. DOI: 10.1111/j.0006-341x.2000.00013.x.
  • Lindgren, F., Rue, H., and Lindström, J. (2011), “An Explicit Link Between Gaussian Fields and Gaussian Markov Random Fields: The Stochastic Partial Differential Equation Approach,” Journal of the Royal Statistical Society, Series B, 73, 423–498. DOI: 10.1111/j.1467-9868.2011.00777.x.
  • Majumder, S., Guan, Y., Reich, B. J., and Saibaba, A. K. (2022), “Kryging: Geostatistical Analysis of Large-Scale Datasets Using Krylov Subspace Methods,” Statistics and Computing, 32, 74. DOI: 10.1007/s11222-022-10104-3.
  • Minden, V., Damle, A., Ho, K. L., and Ying, L. (2017), “Fast Spatial Gaussian Process Maximum Likelihood Estimation via Skeletonization Factorizations,” Multiscale Modeling & Simulation, 15, 1584–1611. DOI: 10.1137/17M1116477.
  • Nychka, D., Bandyopadhyay, S., Hammerling, D., Lindgren, F., and Sain, S. (2015), “A Multiresolution Gaussian Process Model for the Analysis of Large Spatial Datasets,” Journal of Computational and Graphical Statistics, 24, 579–599. DOI: 10.1080/10618600.2014.914946.
  • Perlin, K. (1985), “An Image Synthesizer,” ACM SIGGRAPH Computer Graphics, 19, 287–296. DOI: 10.1145/325165.325247.
  • Rasmussen, C. E., and Williams, C. K. I. (2005), Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning), Cambridge, MA: The MIT Press.
  • Roustant, O., Ginsbourger, D., and Deville, Y. (2012), “DiceKriging, DiceOptim: Two R Packages for the Analysis of Computer Experiments by Kriging-Based Metamodeling and Optimization,” Journal of Statistical Software, 51, 1–55. DOI: 10.18637/jss.v051.i01.
  • Saad, Y. (2003), Iterative Methods for Sparse Linear Systems, Philadelphia: SIAM.
  • Salvaña, M. L. O., Abdulah, S., Huang, H., Ltaief, H., Sun, Y., Genton, M. G., and Keyes, D. E. (2021), “High Performance Multivariate Geospatial Statistics on Manycore Systems,” IEEE Transactions on Parallel and Distributed Systems, 32, 2719–2733. DOI: 10.1109/TPDS.2021.3071423.
  • Sang, H., Jun, M., and Huang, J. Z. (2011), “Covariance Approximation for Large Multivariate Spatial Data Sets with an Application to Multiple Climate Model Errors,” Annals of Applied Statistics, 5, 2519–2548.
  • Stein, M. L. (2012), Interpolation of Spatial Data: Some Theory for Kriging, New York: Springer.
  • Ubaru, S., Chen, J., and Saad, Y. (2017), “Fast Estimation of Tr(f(A)) via Stochastic Lanczos Quadrature,” SIAM Journal on Matrix Analysis and Applications, 38, 1075–1099. DOI: 10.1137/16M1104974.
  • Williams, C., and Seeger, M. (2000), “Using the Nyström Method to Speed Up Kernel Machines,” in Advances in Neural Information Processing Systems (Vol. 13), eds. T. Leen, T. Dietterich, and V. Tresp, Cambridge, MA: MIT Press.
  • Woodbury, M. A. (1950), Inverting Modified Matrices, Statistical Research Group.
  • Zammit-Mangion, A., Cressie, N., and Shumack, C. (2018), “On Statistical Approaches to Generate Level 3 Products from Satellite Remote Sensing Retrievals,” Remote Sensing, 10, 155. DOI: 10.3390/rs10010155.
  • Zhu, C., Byrd, R. H., Lu, P., and Nocedal, J. (1997), “Algorithm 778: L-BFGS-B: Fortran Subroutines for Large-Scale Bound-Constrained Optimization,” ACM Transactions on Mathematical Software, 23, 550–560. DOI: 10.1145/279232.279236.
