
Optimize to generalize in Gaussian processes: An alternative objective based on the Rényi divergence

Pages 600-610 | Received 31 Mar 2023, Accepted 20 Apr 2023, Published online: 13 Jul 2023

References

  • Alshraideh, H. and Khatatbeh, E. (2014) A Gaussian process control chart for monitoring autocorrelated process data. Journal of Quality Technology, 46(4), 317–322.
  • Alvarez, M. and Lawrence, N. (2008) Sparse convolved Gaussian processes for multi-output regression. Advances in Neural Information Processing Systems, 21, 57–64.
  • Asuncion, A. and Newman, D.J. (2007) UCI machine learning repository, University of California, School of Information and Computer Science, Irvine, CA. Available at http://www.ics.uci.edu/~mlearn/MLRepository.html
  • Bhattacharya, A., Pati, D. and Yang, Y. (2019) Bayesian fractional posteriors. Annals of Statistics, 47(1), 39–66.
  • Bishop, C.M. (2006) Pattern Recognition and Machine Learning, Springer, New York, NY.
  • Blei, D.M., Kucukelbir, A. and McAuliffe, J.D. (2017) Variational inference: A review for statisticians. Journal of the American Statistical Association, 112(518), 859–877.
  • Bui, T., Hernández-Lobato, D., Hernández-Lobato, J.M., Li, Y. and Turner, R. (2016) Deep Gaussian processes for regression using approximate expectation propagation, in International Conference on Machine Learning, New York, NY, pp. 1472–1481.
  • Bui, T.D., Yan, J. and Turner, R.E. (2017) A unifying framework for Gaussian process pseudo-point approximations using power expectation propagation. The Journal of Machine Learning Research, 18(1), 3649–3720.
  • Burt, D.R., Rasmussen, C.E. and van der Wilk, M. (2019) Rates of convergence for sparse variational Gaussian process regression. arXiv preprint arXiv:1903.03571.
  • Chen, H., Zheng, L., Al Kontar, R. and Raskutti, G. (2020) Stochastic gradient descent in correlated settings: A study on Gaussian processes. Advances in Neural Information Processing Systems, 33, 2722–2733.
  • Currin, C., Mitchell, T., Morris, M. and Ylvisaker, D. (1991) Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments. Journal of the American Statistical Association, 86(416), 953–963.
  • Daley, R. (1993) Atmospheric Data Analysis, Number 2. Cambridge University Press, Cambridge, UK.
  • Damianou, A. and Lawrence, N. (2013) Deep Gaussian processes, in Artificial Intelligence and Statistics, Scottsdale, AZ, pp. 207–215.
  • Deisenroth, M. and Mohamed, S. (2012) Expectation propagation in Gaussian process dynamical systems, in Advances in Neural Information Processing Systems, Curran Associates Inc., Red Hook, NY, pp. 2609–2617.
  • Dowsland, K.A. and Thompson, J. (2012) Simulated annealing, in Handbook of Natural Computing, Springer, New York, NY, pp. 1623–1655.
  • Foret, P., Kleiner, A., Mobahi, H. and Neyshabur, B. (2020) Sharpness-aware minimization for efficiently improving generalization. arXiv preprint arXiv:2010.01412.
  • Frigola, R., Lindsten, F., Schön, T.B. and Rasmussen, C.E. (2013) Bayesian inference and learning in Gaussian process state-space models with particle MCMC, in Advances in Neural Information Processing Systems, Curran Associates Inc., Red Hook, NY, pp. 3156–3164.
  • Furrer, R., Genton, M.G. and Nychka, D. (2006) Covariance tapering for interpolation of large spatial datasets. Journal of Computational and Graphical Statistics, 15(3), 502–523.
  • Gardner, J., Pleiss, G., Weinberger, K.Q., Bindel, D. and Wilson, A.G. (2018) GPyTorch: Blackbox matrix-matrix Gaussian process inference with GPU acceleration, in Advances in Neural Information Processing Systems, Curran Associates Inc., Red Hook, NY, pp. 7576–7586.
  • Gramacy, R.B. (2020) Surrogates: Gaussian Process Modeling, Design, and Optimization for the Applied Sciences, CRC Press, Boca Raton, FL.
  • Gramacy, R.B. and Apley, D.W. (2015) Local Gaussian process approximation for large computer experiments. Journal of Computational and Graphical Statistics, 24(2), 561–578.
  • Gramacy, R.B. and Haaland, B. (2016) Speeding up neighborhood search in local Gaussian process prediction. Technometrics, 58(3), 294–303.
  • Gramacy, R.B. and Lian, H. (2012) Gaussian process single-index models as emulators for computer experiments. Technometrics, 54(1), 30–41.
  • Grünwald, P. (2012) The safe Bayesian, in International Conference on Algorithmic Learning Theory, Springer, New York, NY, pp. 169–183.
  • Guinness, J. (2018) Permutation and grouping methods for sharpening Gaussian process approximations. Technometrics, 60(4), 415–429.
  • Havasi, M., Hernández-Lobato, J.M. and Murillo-Fuentes, J.J. (2018) Inference in deep Gaussian processes using stochastic gradient Hamiltonian Monte Carlo, in Advances in Neural Information Processing Systems, Curran Associates Inc., Red Hook, NY, pp. 7506–7516.
  • Henderson, D., Jacobson, S.H. and Johnson, A.W. (2003) The theory and practice of simulated annealing, in Handbook of Metaheuristics, Springer, New York, NY, pp. 287–319.
  • Hensman, J., Fusi, N. and Lawrence, N.D. (2013) Gaussian processes for big data. arXiv preprint arXiv:1309.6835.
  • Hensman, J., Matthews, A.G., Filippone, M. and Ghahramani, Z. (2015) MCMC for variationally sparse Gaussian processes, in Advances in Neural Information Processing Systems, Curran Associates, Inc., Red Hook, NY, pp. 1648–1656.
  • Hoang, T.N., Hoang, Q.M. and Low, B.K.H. (2015) A unifying framework of anytime sparse Gaussian process regression models with stochastic variational inference for big data, in International Conference on Machine Learning, PMLR, Lille, France, pp. 569–578.
  • Hoffman, M.D., Blei, D.M., Wang, C. and Paisley, J. (2013) Stochastic variational inference. The Journal of Machine Learning Research, 14(1), 1303–1347.
  • Jacot, A., Gabriel, F. and Hongler, C. (2018) Neural tangent kernel: Convergence and generalization in neural networks. arXiv preprint arXiv:1806.07572.
  • Jones, B. and Johnson, R.T. (2009) Design and analysis for the Gaussian process model. Quality and Reliability Engineering International, 25(5), 515–524.
  • Joseph, V.R., Gu, L., Ba, S. and Myers, W.R. (2019) Space-filling designs for robustness experiments. Technometrics, 61(1), 24–37.
  • Journel, A.G. and Huijbregts, C.J. (1978) Mining Geostatistics, Volume 600. Academic Press, London.
  • Kaufman, C.G., Schervish, M.J. and Nychka, D.W. (2008) Covariance tapering for likelihood-based estimation in large spatial data sets. Journal of the American Statistical Association, 103(484), 1545–1555.
  • Kawaguchi, K., Kaelbling, L.P. and Bengio, Y. (2017) Generalization in deep learning. arXiv preprint arXiv:1710.05468.
  • Kennedy, M.C. and O’Hagan, A. (2001) Bayesian calibration of computer models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(3), 425–464.
  • Krishna, A., Joseph, V.R., Ba, S., Brenneman, W.A. and Myers, W.R. (2020) Robust experimental designs for model calibration. arXiv preprint arXiv:2008.00547.
  • Lalchand, V. and Rasmussen, C.E. (2019) Approximate inference for fully Bayesian Gaussian process regression. arXiv preprint arXiv:1912.13440.
  • Lindgren, F., Rue, H. and Lindström, J. (2011) An explicit link between Gaussian fields and Gaussian Markov random fields: The stochastic partial differential equation approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(4), 423–498.
  • Liu, H., Ong, Y.-S., Shen, X. and Cai, J. (2018) When Gaussian process meets big data: A review of scalable GPs. arXiv preprint arXiv:1807.01065.
  • Martinez-Cantin, R. (2014) BayesOpt: A Bayesian optimization library for nonlinear optimization, experimental design and bandits. Journal of Machine Learning Research, 15(1), 3735–3739.
  • Matheron, G. (1973) The intrinsic random functions and their applications. Advances in Applied Probability, 5(3), 439–468.
  • Matthews, A.G.d.G., Rowland, M., Hron, J., Turner, R.E. and Ghahramani, Z. (2018) Gaussian process behaviour in wide deep neural networks. arXiv preprint arXiv:1804.11271.
  • Miller, J.W. and Dunson, D.B. (2019) Robust Bayesian inference via coarsening. Journal of the American Statistical Association, 114(527), 1113–1125.
  • Plumlee, M. (2019) Computer model calibration with confidence and consistency. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 81(3), 519–545.
  • Plumlee, M., Erickson, C., Ankenman, B. and Lawrence, E. (2020) Composite grid designs for adaptive computer experiments with fast inference. Biometrika, 108(3), 749–755.
  • Rana, S., Li, C., Gupta, S., Nguyen, V. and Venkatesh, S. (2017) High dimensional Bayesian optimization with elastic Gaussian process, in Proceedings of the 34th International Conference on Machine Learning, Volume 70, PMLR, Sydney, Australia, pp. 2883–2891.
  • Rényi, A. (1961) On measures of entropy and information, in Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, The Regents of the University of California, Berkeley, CA.
  • Ripley, B.D. (1981) Spatial Statistics, Volume 575. John Wiley & Sons, Hoboken, NJ.
  • Rose, K., Gurewitz, E. and Fox, G. (1990) A deterministic annealing approach to clustering. Pattern Recognition Letters, 11(9), 589–594.
  • Sacks, J., Welch, W.J., Mitchell, T.J. and Wynn, H.P. (1989) Design and analysis of computer experiments. Statistical Science, 4, 409–423.
  • Snelson, E. and Ghahramani, Z. (2006) Sparse Gaussian processes using pseudo-inputs, in Advances in Neural Information Processing Systems, MIT Press, Cambridge, MA, pp. 1257–1264.
  • Snoek, J., Larochelle, H. and Adams, R.P. (2012) Practical Bayesian optimization of machine learning algorithms, in Advances in Neural Information Processing Systems, Curran Associates Inc., Red Hook, NY, pp. 2951–2959.
  • Srinivas, N., Krause, A., Kakade, S.M. and Seeger, M. (2009) Gaussian process optimization in the bandit setting: No regret and experimental design. arXiv preprint arXiv:0912.3995.
  • Stein, M.L. (2014) Limitations on low rank approximations for covariance matrices of spatial data. Spatial Statistics, 8, 1–19.
  • Sung, C.-L., Hung, Y., Rittase, W., Zhu, C. and Wu, C.F.J. (2020) A generalized Gaussian process model for computer experiments with binary time series. Journal of the American Statistical Association, 115(530), 945–956.
  • Takapoui, R. and Javadi, H. (2016) Preconditioning via diagonal scaling. arXiv preprint arXiv:1610.03871.
  • Thompson, P.D. (1956) Optimum smoothing of two-dimensional fields 1. Tellus, 8(3), 384–393.
  • Titsias, M. (2009) Variational learning of inducing variables in sparse Gaussian processes, in Artificial Intelligence and Statistics, Clearwater Beach, FL, pp. 567–574.
  • Titsias, M. and Lawrence, N.D. (2010) Bayesian Gaussian process latent variable model, in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Sardinia, Italy, pp. 844–851.
  • Tran, D., Ranganath, R. and Blei, D.M. (2015) The variational Gaussian process. arXiv preprint arXiv:1511.06499.
  • Tuo, R. and Wang, W. (2020) Kriging prediction with isotropic Matérn correlations: Robustness and experimental designs. Journal of Machine Learning Research, 21(187), 1–38.
  • Wang, K.A., Pleiss, G., Gardner, J.R., Tyree, S., Weinberger, K.Q. and Wilson, A.G. (2019) Exact Gaussian processes on a million data points, in Advances in Neural Information Processing Systems, Curran Associates Inc., Red Hook, NY.
  • Wang, W., Tuo, R. and Wu, C.F.J. (2019) On prediction properties of kriging: Uniform error bounds and robustness. Journal of the American Statistical Association, 115(530), 920–930.
  • Wei, P., Liu, F. and Tang, C. (2018) Reliability and reliability-based importance analysis of structural systems using multiple response Gaussian process model. Reliability Engineering & System Safety, 175, 183–195.
  • Yang, G. (2019) Scaling limits of wide neural networks with weight sharing: Gaussian process behavior, gradient independence, and neural tangent kernel derivation. arXiv preprint arXiv:1902.04760.
  • Yue, X. and Kontar, R.A. (2020a) Why non-myopic Bayesian optimization is promising and how far should we look-ahead? A study via rollout, in International Conference on Artificial Intelligence and Statistics, Sicily, Italy, pp. 2808–2818.
  • Yue, X. and Kontar, R.A. (2020b) Joint models for event prediction from time series and survival data. Technometrics, 63(4), 477–486.
  • Zhang, Q., Chien, P., Liu, Q., Xu, L. and Hong, Y. (2021) Mixed-input Gaussian process emulators for computer experiments with a large number of categorical levels. Journal of Quality Technology, 53(4), 410–420.
  • Zhu, H., Williams, C.K., Rohwer, R. and Morciniec, M. (1997) Gaussian regression and optimal finite dimensional linear models, Aston University, Birmingham, UK.
  • Zomaya, A.Y. and Kazman, R. (2010) Simulated annealing techniques, in Algorithms and Theory of Computation Handbook: General Concepts and Techniques, second edition, Chapman and Hall/CRC, Boca Raton, FL, Chapter 33.
