Abstract

This article develops a generalization of the scatterplot matrix based on the recognition that most datasets include both categorical and quantitative information. Traditional grids of scatterplots often obscure important features of the data when one or more variables are categorical but coded as numerical. The generalized pairs plot offers a range of displays for paired combinations of categorical and quantitative variables. A mosaic plot, fluctuation diagram, or faceted bar chart may be used to display two categorical variables. A side-by-side boxplot, stripplot, faceted histogram, or density plot helps visualize a categorical and a quantitative variable. A traditional scatterplot is suitable for displaying a pair of numerical variables, but options also support density contours or annotations of summary statistics such as the correlation and the number of missing values. By combining these displays, the generalized pairs plot may help to reveal structure in multivariate data that might otherwise go unnoticed during exploratory data analysis. Two R packages, gpairs and GGally, provide implementations of the generalized pairs plot. Supplementary materials for this article are available online on the journal web site.
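The pairing of variable types with displays described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and does not reproduce the interface of gpairs or GGally; the names `PANEL_FOR`, `kind`, and `panel_grid` are hypothetical, and the sketch picks a single display per case where the packages offer several.

```python
from itertools import product

# Hypothetical default display for each pair of variable types.
# The abstract lists several alternatives per case; one is chosen here.
PANEL_FOR = {
    ("categorical", "categorical"): "mosaic plot",
    ("categorical", "quantitative"): "side-by-side boxplot",
    ("quantitative", "categorical"): "side-by-side boxplot",
    ("quantitative", "quantitative"): "scatterplot",
}


def kind(values):
    """Classify a column as quantitative if all non-missing values are numbers."""
    present = [v for v in values if v is not None]
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    return "quantitative" if present and numeric else "categorical"


def panel_grid(columns):
    """Map every off-diagonal (row, column) cell to a display type."""
    kinds = {name: kind(vals) for name, vals in columns.items()}
    return {
        (r, c): PANEL_FOR[(kinds[r], kinds[c])]
        for r, c in product(columns, repeat=2)
        if r != c
    }


# Toy data, invented for this example only.
data = {
    "height": [1.6, 1.8, None],
    "weight": [55, 80, 72],
    "sex": ["F", "M", "F"],
}
grid = panel_grid(data)
for cell, display in grid.items():
    print(cell, "->", display)
```

Note that a rule this simple would classify a categorical variable coded as numbers as quantitative, which is exactly the situation the abstract warns about; in practice such a variable would have to be declared categorical explicitly.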

ACKNOWLEDGMENTS

The authors thank John Hartigan, Antony Unwin, and many students for advice and testing of these graphical displays. This work was partially supported by an unrestricted fellowship from Novartis, and by National Science Research grant DMS0706949.

Additional information

Notes on contributors

John W. Emerson

John W. Emerson is at the Department of Statistics, Yale University, New Haven, CT 06520. Walton A. Green is at the Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138. Barret Schloerke is at the Department of Statistics, Iowa State University, Ames, IA 50011. Jason Crowley is at the Department of Statistics, Iowa State University, Ames, IA 50011. Dianne Cook is at the Department of Statistics, Iowa State University, Ames, IA 50011. Heike Hofmann is at the Department of Statistics, Iowa State University, Ames, IA 50011. Hadley Wickham is at the Department of Statistics, Rice University, Houston, TX 77251.

