Abstract
This work proposes a nonparametric method to compare the underlying mean functions given two noisy datasets. The motivation for the work stems from an application of comparing wind turbine power curves. Comparing wind turbine data presents new problems, namely the need to identify the regions of difference in the input space and to quantify the extent of difference that is statistically significant. Our proposed method, referred to as funGP, estimates the underlying functions for different data samples using Gaussian process models. We build a confidence band using the probability law of the estimated function differences under the null hypothesis. Then, the confidence band is used for the hypothesis test as well as for identifying the regions of difference. This identification of difference regions is a distinct feature, as existing methods tend to conduct an overall hypothesis test stating whether two functions are different. Understanding the difference regions can lead to further practical insights and help devise better control and maintenance strategies for wind turbines. The merit of funGP is demonstrated by using three simulation studies and four real wind turbine datasets.
Supplementary Material
Supplementary Material: The PDF file contains (i) a derivation of the covariance function, (ii) a brief description of the Karhunen-Loève expansion of a Gaussian process, and (iii) details on the borehole and piston functions.
Computer Code: The computer code to reproduce all the results in this article is available on GitHub at https://github.com/TAMU-AML/funGP-Paper. A generic R function for applying the funGP algorithm to any dataset is provided in the DSWE R package, available on CRAN at https://CRAN.R-project.org/package=DSWE.
Acknowledgments
The authors thank the Editor, the Associate Editor, and the Reviewers for providing valuable feedback. Their comments have led to a significant improvement in the article. The authors would also like to acknowledge the role of Texas A&M’s high-performance research computing (HPRC), which enabled the authors to efficiently run their experiments.