Original Article

Population subset selection for the use of a validation dataset for overfitting control in genetic programming

Pages 243-271 | Received 23 Nov 2018, Accepted 11 Jul 2019, Published online: 31 Jul 2019
 

ABSTRACT

Genetic Programming (GP) is a technique that solves a variety of problems by evolving mathematical expressions. However, one of the main obstacles to its application is its tendency to overfit the data. Using a validation dataset is a common way to prevent overfitting in many Machine Learning (ML) techniques, including GP. There is, however, one key point that differentiates GP from other ML techniques: instead of training a single model, GP evolves a population of models. The validation dataset can therefore be used in several ways, because any of the evolved models could be evaluated on it. This work explores the possibility of using the validation dataset not only on the training-best individual but also on a subset of the population made up of its training-best individuals. The study was conducted on 5 well-known databases for regression or classification tasks. In most cases, the results point to an improvement when the validation dataset is used on a subset of the population instead of only on the training-best individual; this also reduces the number of nodes and, consequently, the complexity of the expressions.
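
The selection idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`mse`, `select_by_validation`), the `subset_size` parameter, the use of mean squared error, and the toy models and data are all assumptions made for illustration. The abstract does not say how often validation is applied or how the subset size is chosen, so the sketch covers only a single selection step.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical types: a model maps one input to one output,
# and a dataset is a sequence of (input, target) pairs.
Model = Callable[[float], float]
Dataset = Sequence[Tuple[float, float]]


def mse(model: Model, data: Dataset) -> float:
    """Mean squared error of a model on a dataset (assumed error measure)."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


def select_by_validation(
    population: Sequence[Model],
    train: Dataset,
    valid: Dataset,
    subset_size: int,
) -> Model:
    """Return the validation-best model among the training-best subset.

    With subset_size == 1 this reduces to picking the training-best
    individual, so validation cannot change the choice.
    """
    # Rank the whole population by training error, best first.
    ranked = sorted(population, key=lambda m: mse(m, train))
    # Keep only the training-best individuals.
    candidates = ranked[: max(1, subset_size)]
    # Among those, pick the one with the lowest validation error.
    return min(candidates, key=lambda m: mse(m, valid))


# Toy population: three fixed expressions standing in for evolved individuals.
def linear(x: float) -> float:
    return x


def quadratic(x: float) -> float:
    return x * x


def constant(x: float) -> float:
    return 1.0


population = [linear, quadratic, constant]

# The toy training data happens to be fitted exactly by `quadratic`,
# while the validation data follows `linear`.
train = [(0.0, 0.0), (1.0, 1.0), (2.0, 4.0)]
valid = [(3.0, 3.0), (4.0, 4.0)]

best_of_one = select_by_validation(population, train, valid, subset_size=1)
best_of_two = select_by_validation(population, train, valid, subset_size=2)
```

In this toy setting, `best_of_one` is `quadratic` (the training-best individual, which fits the training data exactly but generalizes poorly), whereas `best_of_two` is `linear`, because the validation data can now choose between the two training-best candidates.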

Acknowledgments

The experiments described in this work were performed on computers in the Supercomputing Center of Galicia (CESGA). Daniel Rivero and Enrique Fernández-Blanco would also like to thank the NVIDIA Research Grants Program for its support.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

This work is supported by the Collaborative Project in Genomic Data Integration (CICLOGEN) PI17/01826, funded by the Carlos III Health Institute from the Spanish National plan for Scientific and Technical Research and Innovation 2013–2016 and the European Regional Development Funds (FEDER), "A way to build Europe". This project was also supported by the General Directorate of Culture, Education and University Management of Xunta de Galicia (Ref. ED431G/01, ED431D 2017/16), the Galician Network for Colorectal Cancer Research (Ref. ED431D 2017/23), Competitive Reference Groups (Ref. ED431C 2018/49) and the European Regional Development Funds (FEDER), "A way to build Europe". Funders: Consellería de Cultura, Educación e Ordenación Universitaria, Xunta de Galicia [ED431C 2018/49, ED431D 2017/16, ED431D 2017/23, ED431G/01]; Instituto de Salud Carlos III [PI17/01826].
