Abstract
Data mining occurs because most economic hypotheses do not have a unique empirical interpretation but allow the econometrician much leeway in selecting conditioning variables, lags, functional forms, and sometimes the sample. The resulting problems are of interest not only to methodologists and philosophers concerned with how hypotheses are validated in the presence of some inevitable ad hocery, but also to readers of economics journals who have no interest in methodology but need to know whether to believe what they read. Since I focus on such mundane problems, I make no claim of contributing to the deeper epistemological problems of relating empirical evidence to theory, and to the meaning of confirmation and disconfirmation when, say, five of the eight specifications tested are consistent with the hypothesis and three are not. Instead, I deal with a practical problem confronting a researcher who wants to persuade his readers but does not want to deceive them. He has fitted many regressions with varying results. How should he decide how many and which ones to report? This paper is therefore more about a problem in communicating results than about a problem in the philosophy of science. Hence, I use some common-sense notions, even though I cannot provide rigorous definitions for them.
Keywords: