ABSTRACT
Exploratory data analysis (EDA) is sometimes suggested as an approach to hypothesis identification. It is often used as such in problem solving and consists of the analysis of observational data, often collected without well-defined hypotheses, with the purpose of finding clues that could inspire ideas and hypotheses. This article seeks to uncover some of the main principles of EDA in problem solving. The article discusses and explains EDA's main steps: (1) display the data; (2) identify salient features; (3) interpret salient features. The empiricist notion of EDA, which pervades many textbook accounts of EDA, is criticized and contrasted with an account that emphasizes the role of mental models in hypothesis generation. The framework has some implications for the limitations of EDA. It also sheds light on the role of the statistician compared to the role of the context expert. The article argues that in teaching EDA the emphasis on statistical data analysis should be balanced with teaching students to theorize and be inquisitive. Throughout the article, ideas are illustrated by the well-known case of John Snow's studies of the transmission mechanism of cholera.