Articles: International Year of Statistics Featured Discussion: InfoVis

Infovis and Statistical Graphics: Different Goals, Different Looks

Pages 2-28 | Received 01 Dec 2011, Published online: 27 Mar 2013
 

Abstract

The importance of graphical displays in statistical practice has been recognized sporadically in the statistical literature over the past century, with wider awareness following Tukey's Exploratory Data Analysis and Tufte's books in the succeeding decades. But statistical graphics still occupy an awkward in-between position: within statistics, exploratory and graphical methods represent a minor subfield and are not well integrated with larger themes of modeling and inference. Outside of statistics, infographics (also called information visualization or Infovis) are huge, but their purveyors and enthusiasts appear largely to be uninterested in statistical principles.

We present here a set of goals for graphical displays discussed primarily from the statistical point of view and discuss some inherent contradictions in these goals that may be impeding communication between the fields of statistics and Infovis. One of our constructive suggestions, to Infovis practitioners and statisticians alike, is to try not to cram into a single graph what can be better displayed in two or more. We recognize that we offer only one perspective and intend this article to be a starting point for a wide-ranging discussion among graphic designers, statisticians, and users of statistical methods. The purpose of this article is not to criticize but to explore the different goals that lead researchers in different fields to value different aspects of data visualization.

ACKNOWLEDGMENTS

We thank Nathan Yau for posting the infographics and commentary that motivated this work; Jessica Hullman, Hadley Wickham, Lee Wilkinson, Chris Volinsky, Kaiser Fung, Alfred Inselberg, Martin Wattenberg, and conference participants at the University of Kentucky, Iowa State University, and the University of California for helpful comments; the Institute of Education Sciences for grants R305D090006-09A and ED-GRANTS-032309-005; and the National Science Foundation for grants SES-1023189 and SES-1023176.

Notes

It may be unusual for a journal article to be reacting to a blog—but the blog in question has approximately 15,000 subscribers, about three times more than the most prominent academic statistics blogs and more readers per day than many scientific journals get per year. And the issue is not just circulation. Thanks to Yau and his commenters, Flowing Data is a thoughtful forum on the interface between statistics and graphic design.

When exposed to novel graphics, readers have to make an effort to understand them. Having made the effort, they have an emotional commitment to the graphic (nobody wants to admit they wasted their time). Whether they have actually learned anything useful is difficult to tell and would make for an interesting research study. This effect is related to the positive emotional buzz we can get from working out how to do something in R. Up to a certain point, the longer it takes us to work out how to do what we want to do, the more satisfaction we get from finding the solution, even if, in retrospect, it was something we should have known. And all that time has been “wasted”: it could have been spent on the real statistical problem and not on the computing problem.

The data used in Nightingale's graph are available at http://understandinguncertainty.org/node/214.

We looked carefully at the graph and could only find one node that is an orphan (i.e., with no arrows pointing toward it). This node is “Media Sensationalism Bias.” Perhaps there could be another node leading to it, labeled “Pay $$ to friendly journalists.”

Throughout, we are talking only about graphs made in good faith. Just about any graphical technique can be abused, and just about any graphical technique can become a mess, if placed in the hands of a suitably dishonest or incompetent person. These concerns are important but are outside the scope of the present article.

Additional information

Notes on contributors

Andrew Gelman

Andrew Gelman, Department of Statistics and Department of Political Science, Columbia University, New York, NY 10027 (E-mail: [email protected]).

Antony Unwin

Antony Unwin, Department of Computer-Oriented Statistics and Data Analysis, University of Augsburg, Augsburg, Germany (E-mail: [email protected]).
