Abstract
Computer simulations often involve both qualitative and numerical inputs. Existing Gaussian process (GP) methods for handling such mixed inputs mainly assume a different response surface for each combination of levels of the qualitative factors and relate them via a multiresponse cross-covariance matrix. We introduce a substantially different approach that maps each qualitative factor to underlying numerical latent variables (LVs), with the mapped values estimated similarly to the other correlation parameters, and then uses any standard GP covariance function for numerical variables. The result is a parsimonious GP parameterization that treats qualitative factors the same as numerical variables and views them as affecting the response via similar physical mechanisms. The approach has strong physical justification, as the effects of a qualitative factor in any physics-based simulation model must always be due to some underlying numerical variables. Even when the underlying variables are many, sufficient dimension reduction arguments imply that their effects can be represented by a low-dimensional LV. This conjecture is supported by the superior predictive performance observed across a variety of examples. Moreover, the mapped LVs provide substantial insight into the nature and effects of the qualitative factors. Supplementary materials for the article are available online.
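To illustrate the mapping described above, the following is a minimal sketch rather than the implementation in the LVGP package. It assumes a single qualitative factor, a two-dimensional latent space, and a Gaussian correlation function as one possible choice of standard GP covariance. The names `lv_correlation`, `LATENT`, and `PHI` are hypothetical, and the latent coordinates are made-up values; in the approach described above they would be estimated along with the other correlation parameters.

```python
import math

# Hypothetical latent coordinates for the three levels of one qualitative factor.
# These values are invented for illustration; in practice they are estimated.
LATENT = {"A": (0.0, 0.0), "B": (0.1, 0.0), "C": (1.5, 0.8)}

# Hypothetical scale parameters, one per numerical input.
PHI = [1.0]


def lv_correlation(x1, t1, x2, t2, latent=LATENT, phi=PHI):
    """Gaussian correlation between two mixed inputs (x, t).

    x1, x2: lists of numerical inputs.
    t1, t2: levels of the qualitative factor.
    Each level is replaced by its latent coordinates, after which the
    qualitative factor is handled exactly like a numerical variable.
    """
    z1, z2 = latent[t1], latent[t2]
    # Scaled squared distance over the numerical inputs.
    sq_num = sum(p * (a - b) ** 2 for p, a, b in zip(phi, x1, x2))
    # Squared distance between the latent representations of the levels.
    sq_lat = sum((a - b) ** 2 for a, b in zip(z1, z2))
    return math.exp(-sq_num - sq_lat)


# Same numerical input, different levels: nearby latent points correlate strongly.
r_ab = lv_correlation([0.5], "A", [0.5], "B")
r_ac = lv_correlation([0.5], "A", [0.5], "C")
print(r_ab > r_ac)
```

In this sketch, levels whose latent points lie close together (here A and B) yield highly correlated responses, while a distant level (C) is only weakly correlated with them. This is the sense in which the mapped LVs can indicate which levels of a qualitative factor behave similarly.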
Supplementary materials
The online supplementary materials for this article contain numerical performance results for several additional examples, as well as further details on the examples in Section 4. The R package “LVGP”, available from the Comprehensive R Archive Network (CRAN) at http://CRAN.R-project.org/package=LVGP, contains the code for fitting LVGP models to general mixed-variable datasets.
Acknowledgments
The authors thank the Editor (Roshan Joseph) who handled this article and the anonymous Associate Editor and two Referees for their many helpful suggestions that have improved the article and its presentation.