ABSTRACT
We investigate the potential of applying a state-of-the-art, local derivative-free solver, Py-BOBYQA, to global optimization problems. In particular, we demonstrate the potential of a restarts procedure – as distinct from multistart methods – to allow Py-BOBYQA to escape local minima (where ordinarily it would terminate at the first local minimum found). We also introduce an adaptive variant of restarts which yields improved performance on global optimization problems. As Py-BOBYQA is a model-based trust-region method, we compare largely with other global optimization methods for which (global) models are important, such as Bayesian optimization and response surface methods; we also consider state-of-the-art representative deterministic and stochastic codes, such as DIRECT and CMA-ES. We find numerically that the restarts procedures in Py-BOBYQA are effective at helping it to escape local minima, compared to running Py-BOBYQA without restarts. Additionally, we find that Py-BOBYQA with adaptive restarts performs comparably to global optimization solvers across all accuracy/budget regimes, in both smooth and noisy settings. In particular, Py-BOBYQA variants perform best on smooth and multiplicative-noise problems in high-accuracy regimes. As a by-product, some preliminary conclusions can be drawn on the relative performance of the global solvers we have tested with default settings.
Acknowledgments
We acknowledge the use of the University of Oxford Advanced Research Computing (ARC) facility (http://dx.doi.org/10.5281/zenodo.22558) in carrying out this work.
Notes
1 Note that NEWUOA and BOBYQA are very similar codes, with the latter allowing bound constraints.
2 Due to our use of Python for the DFO code, for the sake of like-for-like comparisons, we considered GO solvers written in or interfacing to Python.
3 We used Python 2.7 with NumPy 1.12.1 and SciPy 1.0.1.
4 Available from https://github.com/numericalalgorithmsgroup/pybobyqa
5 The constant factor 1.1 was simply chosen as a sensible default, rather than selected by testing or tuning.
6 Another possibility is to use performance profiles [Citation13], where the ‘difficulty’ of the problem is also taken into account. We have generated these as well, and compared them with data profiles. We found that they did not bring new/different information in our tests compared to data profiles and so we have not included them here for space reasons.
7 These are: Ackley, Bohachevsky 1, Bohachevsky 2, Camel 3, Cosine Mixture, Exponential, Griewank, Periodic, Powell Quadratic, Rastrigin, Salomon, Schaffer 1, and Schaffer 2.
8 The multi-class classifier was constructed by training 10 one-versus-all classifiers, with prediction made by selecting the class with the highest probability. The training set contained 60,000 images and the test set 10,000 images.
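The one-versus-all prediction rule described above can be sketched as follows. This is a minimal illustration (not the paper's actual classifier): we assume 10 binary classifiers have already been trained, and simulate their per-class probabilities with random numbers; prediction then reduces to an argmax over classes.

```python
import numpy as np

# Simulate probabilities produced by 10 one-versus-all binary classifiers
# for a batch of 5 samples (rows: samples, columns: classes 0-9).
# In practice these would come from the trained classifiers.
rng = np.random.default_rng(0)
probs = rng.random((5, 10))

# One-versus-all prediction: pick the class whose classifier assigns
# the highest probability to each sample.
predictions = np.argmax(probs, axis=1)
print(predictions)  # one label in {0, ..., 9} per sample
```

The same rule extends to any number of classes; only the number of binary classifiers (columns) changes.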
9 The case p = n + 1, when the interpolation model is linear (and hence contains no curvature information), does not yield competitive variants for general objectives [Citation1,Citation73].