ABSTRACT
Applications of machine learning techniques to economic problems are increasing. These are powerful techniques with great potential to extract insights from economic data. However, care must be taken to apply them correctly, or the wrong conclusions may be drawn. In the technology clubs literature, after applying a clustering algorithm, some authors train a supervised machine learning model, such as a decision tree or a neural network, to predict the cluster labels. They then use a performance metric of that prediction (typically accuracy) as a measure of the quality of the clustering configuration they have found. This is an error with potentially negative implications for policy, because a high accuracy in such a prediction does not mean that the clustering configuration found is correct. This paper explains in detail why this modus operandi is not sound from a theoretical point of view and uses computer simulations to demonstrate it. We caution policymakers and indicate directions for future investigation.
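The argument can be illustrated with a minimal sketch, which is not the simulation design used in this paper. It assumes uniformly random two-dimensional data (so no true cluster structure exists), a plain k-means implementation, and a 1-nearest-neighbour classifier standing in for the decision tree or neural network. The sample sizes, seeds, and function names are all hypothetical choices made for illustration.

```python
import random

def sq_dist(p, q):
    # Squared Euclidean distance between two points.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(p, centers):
    # Index of the center closest to point p.
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))

def kmeans(points, k, iters=100, seed=0):
    # Plain Lloyd's algorithm; returns one cluster label per point.
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster's center
        if updated == centroids:
            break
        centroids = updated
    return [nearest(p, centroids) for p in points]

# Assumption: uniform noise on the unit square, i.e. no real clusters.
rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(400)]
labels = kmeans(points, k=3)

# Hold out the last 100 points to evaluate the supervised step.
train_x, train_y = points[:300], labels[:300]
test_x, test_y = points[300:], labels[300:]

def predict_1nn(p):
    # 1-nearest-neighbour: copy the label of the closest training point.
    i = min(range(len(train_x)), key=lambda i: sq_dist(p, train_x[i]))
    return train_y[i]

hits = sum(predict_1nn(p) == y for p, y in zip(test_x, test_y))
accuracy = hits / len(test_x)
print(f"held-out accuracy on arbitrary cluster labels: {accuracy:.2f}")
```

Under these assumptions the held-out accuracy should be high even though the data contain no clusters at all. The classifier only has to relearn the partition boundaries that k-means itself drew, so its accuracy measures how learnable the partition is, not whether the partition reflects real structure.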
Acknowledgments
This paper is dedicated to the memory of both my parents, Francisco Rodríguez Gude and Esperanza Andrés González, who died from COVID-19 in March 2020 in Madrid, for always loving and supporting me during my academic career. The usual caveat applies.
Disclosure of potential conflicts of interest
No potential conflict of interest was reported by the author(s).
Notes
1 For reference, Professor Susan Athey once served as a consultant chief economist for the Microsoft Corporation, Hal Varian is Google's Chief Economist, and Pat Bajari is a chief economist and vice-president at Amazon.com.