Statistical Computing and Graphics

Visualizing Count Data Regressions Using Rootograms

Pages 296-303 | Received 01 Jul 2014, Published online: 10 Aug 2016
 

ABSTRACT

The rootogram is a graphical tool associated with the work of J. W. Tukey that was originally used for assessing goodness of fit of univariate distributions. Here, we extend the rootogram to regression models and show that this is particularly useful for diagnosing and treating issues such as overdispersion and/or excess zeros in count data models. We also introduce a weighted version of the rootogram that can be applied out of sample or to (weighted) subsets of the data, for example, in finite mixture models. An empirical illustration revisiting a well-known dataset from ethology is included, for which a negative binomial hurdle model is employed. Supplementary materials providing two further illustrations are available online: the first, using data from public health, employs a two-component finite mixture of negative binomial models; the second, using data from finance, involves underdispersion. An R implementation of our tools is available in the R package countreg. It also contains the data and replication code.

Appendix A. R Implementation

For an overview of count data regression models in R we refer to Zeileis, Kleiber, and Jackman (2008), where R implementations of hurdle and zero-inflation models are described in some detail. The corresponding fitting functions have now been moved to the countreg package, a new package that is currently under development by the authors of the present article. First versions are already available from http://R-Forge.R-project.org/projects/countreg/.

The current implementation of rootograms in countreg provides a generic function rootogram(object, ...) along with several methods for different types of models/data. The methods all proceed in the same way: they first compute the observed and expected frequencies, obs_j and exp_j respectively (see Section 2), and then call the default method that computes all required coordinates for drawing the rootograms. The latter has the following arguments:

rootogram(object, fitted, breaks = NULL,
  style = c("hanging", "standing", "suspended"),
  scale = c("sqrt", "raw"), plot = TRUE,
  width = NULL, xlab = NULL, ylab = NULL,
  main = NULL, ...)

The arguments object and fitted need to provide the tables/vectors of observed and fitted frequencies. (The first argument is called object rather than observed for consistency with the generic function that only takes one required object argument and ....) The breaks need to be specified if a continuous distribution is employed, while for a discrete distribution one may want to set the width of the bars to leave small gaps between the bars (as in our examples). Additionally, one of three styles can be specified: "hanging" (default), "standing", or "suspended". The object returned is then a ‘data.frame’ with all the coordinates needed for plotting, and this is also drawn directly by default (plot = TRUE) along with the specified graphical arguments (xlab, ylab, main, ...). By default, the base graphics plot() method is used for drawing rootograms. In addition, there is also an autoplot() method for drawing rootograms using the ggplot2 package (Wickham 2009).
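To make the three styles concrete, the following base R sketch computes bar coordinates from given observed and fitted frequencies. The helper name rootogram_coords, its column names, and the example frequencies are assumptions made for illustration; it is not the countreg implementation. The bar placement follows the usual description of the styles: hanging bars are attached to the fitted curve, standing bars rest on the zero line, and suspended bars show only the difference between fitted and observed values.

```r
# Hypothetical helper (not part of countreg): bar coordinates for a rootogram.
rootogram_coords <- function(obs, expd, counts = seq_along(obs) - 1,
                             style = c("hanging", "standing", "suspended"),
                             scale = c("sqrt", "raw"), width = 0.9) {
  style <- match.arg(style)
  scale <- match.arg(scale)
  # Transformation applied to both observed and fitted frequencies.
  f <- if (scale == "sqrt") sqrt else identity
  # y is the lower edge of each bar, height its (signed) extent.
  y <- if (style == "hanging") f(expd) - f(obs) else rep(0, length(obs))
  height <- if (style == "suspended") f(expd) - f(obs) else f(obs)
  data.frame(observed = obs, expected = expd, x = counts,
             y = y, width = width, height = height, line = f(expd))
}

# Made-up observed frequencies for the counts 0, 1, ..., 4.
obs  <- c(12, 18, 13, 5, 2)
# Fitted frequencies from a Poisson distribution with an assumed mean of 1.5.
expd <- sum(obs) * dpois(0:4, lambda = 1.5)

hang  <- rootogram_coords(obs, expd, style = "hanging")
stand <- rootogram_coords(obs, expd, style = "standing")
susp  <- rootogram_coords(obs, expd, style = "suspended")
```

In the hanging style, y + height equals the transformed fitted frequency, so misfit shows up as bars that fail to reach, or cross, the zero line.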

Above we used methods for objects of classes ‘glm’ and ‘hurdle’. There are further methods available, currently for univariate distributions fitted via fitdistr() (to objects of class ‘numeric’, Venables and Ripley 2002), zero-inflated models (objects of class ‘zeroinfl’, Zeileis, Kleiber, and Jackman 2008), zero-truncated models (objects of class ‘zerotrunc’, as fitted by the zerotrunc() function in countreg), generalized additive models (objects of class ‘gam’, Wood 2006), and for selected count distributions falling within the framework of generalized additive models for location, scale, and shape (objects of class ‘gamlss’, Rigby and Stasinopoulos 2005; Stasinopoulos and Rigby 2007).
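The pattern of a generic with class-specific methods that delegate to a default method can be sketched in base R as follows. All names prefixed with toy_ are hypothetical and chosen to avoid confusion with countreg; the data are simulated. The sketch assumes that the expected frequency of count j is the sum, over observations, of the fitted probability of j; the article's Section 2 gives the authors' own definition.

```r
# Hypothetical generic that dispatches on the class of its first argument.
toy_rootogram <- function(object, ...) UseMethod("toy_rootogram")

# Default method: takes observed and fitted frequencies directly and
# returns coordinates for a hanging rootogram on the square-root scale.
toy_rootogram.default <- function(object, fitted, ...) {
  data.frame(observed = as.vector(object), expected = as.vector(fitted),
             y = sqrt(fitted) - sqrt(object), height = sqrt(object))
}

# Method for Poisson GLMs: compute the frequencies, then delegate.
toy_rootogram.glm <- function(object, ...) {
  stopifnot(family(object)$family == "poisson")
  y <- model.response(model.frame(object))
  mu <- fitted(object)
  counts <- 0:max(y)
  # Observed frequencies: how often each count occurs in the sample.
  obs <- as.vector(table(factor(y, levels = counts)))
  # Expected frequencies: fitted probabilities summed over observations.
  expd <- sapply(counts, function(j) sum(dpois(j, lambda = mu)))
  toy_rootogram.default(obs, expd, ...)
}

# Simulated Poisson data (illustrative only).
set.seed(1)
d <- data.frame(x = rnorm(200))
d$y <- rpois(200, lambda = exp(0.5 + 0.8 * d$x))
fit <- glm(y ~ x, data = d, family = poisson)
res <- toy_rootogram(fit)
```

Supporting a further model class under this design only requires a new method that computes the two frequency vectors for that class.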

Supplementary Materials

 

rootograms-furtherexamples.pdf: Two further empirical examples for visualizing count data regressions using rootograms. (PDF file)

 

countreg_0.1-5.tar.gz: R package with an implementation of several count data regression models along with the corresponding rootograms. Replication code is available in the manual pages. See help("CrabSatellites", package = "countreg") for the example from this article, and help("FLXMRnegbin", package = "countreg") as well as help("TakeoverBids", package = "countreg") for the supplementary examples. The package is developed on R-Forge and current versions can be obtained from http://R-Forge.R-project.org/projects/countreg/. (GNU zipped tar file)

Acknowledgement

The authors thank the editor, the associate editor, and several anonymous reviewers for many valuable suggestions on earlier versions of this article.
