Abstract
Data Envelopment Analysis (DEA) is a typical data-driven approach whose specific objective is to determine technical efficiency and production frontiers in Engineering and Microeconomics. However, by construction, the frontier estimator generated by DEA suffers from overfitting, in contrast with currently accepted models in machine learning. In this regard, DEA can be seen as a preliminary stage of a more complex approach whose aim is to avoid overfitting in order to properly describe the underlying Data Generating Process behind the observations of a production process. In this paper, we introduce a possible solution, based on cross-validation, to the overfitting problem associated with DEA. The procedure “peels” the standard DEA frontier (removing certain supporting hyperplanes) until it determines a new convex technology that also satisfies free disposability in inputs and outputs but not the principle of minimal extrapolation. We test the approach through a computational experiment. Additionally, we illustrate how the new method could be used as a complement to the standard DEA technique through an empirical application based on a PISA (Programme for International Student Assessment) dataset.
Acknowledgements
We thank two anonymous reviewers for their constructive comments and for helping to improve the content and presentation of this paper.
Disclosure statement
We hereby certify that, to the best of our knowledge, no aspect of our current personal or professional circumstances places us in the position of having a conflict of interest with the content of this manuscript.
Notes
1 Letters in bold face denote vectors.
2 We resort to the radial model to generate the supporting hyperplanes of a DEA polyhedral technology. However, this type of model may not yield all the supporting hyperplanes of the technology, only a subset of them. Alternatives that could be implemented include the algorithms introduced in Jahanshahloo et al. (2007, 2010), to name a few. Nevertheless, using algorithms of this type would considerably increase the computing time of the new method.