STATISTICAL DEVELOPMENTS AND APPLICATIONS

The Analysis of Count Data: A Gentle Introduction to Poisson Regression and Its Alternatives

Pages 121-136 | Received 02 Jan 2008, Published online: 10 Feb 2009
 

Abstract

Count data reflect the number of occurrences of a behavior in a fixed period of time (e.g., number of aggressive acts by children during a playground period). In cases in which the outcome variable is a count with a low arithmetic mean (typically < 10), standard ordinary least squares regression may produce biased results. We provide an introduction to regression models that yield appropriate analyses for count data. We introduce standard Poisson regression with an example and discuss its interpretation. We then introduce two variants of Poisson regression, overdispersed Poisson regression and negative binomial regression, that may provide better results when a key assumption of standard Poisson regression is violated. We also discuss the problems of excess zeros, in which a subgroup of respondents who would never display the behavior are included in the sample, and truncated zeros, in which respondents who have a zero count are excluded by the sampling plan. We provide computer syntax for our illustrations in SAS and SPSS. The Poisson family of regression models provides improved and now easy-to-implement analyses of count data.

[Supplementary materials are available for this article. Go to the publisher's online edition of Journal of Personality Assessment for the following free supplemental resources: the data set used to illustrate Poisson regression in this article, which is available in three formats—a text file, an SPSS database, or a SAS database.]

Acknowledgments

We thank Howard Tennen and Stephen Armeli for providing us with a portion of the data from DeHart, Tennen, Armeli, Todd, & Affleck (2008). This data set provided the basis on which we developed our simulated data set.

Notes

1 STATA is another commercial program that is widely used by economists and sociologists for the analysis of count data. It is typically not available in many psychology departments.

3 Skewness is the extent of asymmetry of the distribution and is indexed in the population by $\gamma_1 = \mu_3 / \sigma^3$, where $\mu_3$ is estimated by $\frac{1}{N} \sum (X_i - \bar{X})^3$, and $\sigma$ is the population standard deviation. Excess positive kurtosis refers to distributions having sharper peaks and longer tails than the normal distribution. Excess kurtosis relative to the value 0 for the normal distribution is indexed in the population by $\mu_4 / \sigma^4 - 3$. Positive kurtosis (long tails) is associated with bias in estimates of standard errors.
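The two indices in this note can be computed directly from a sample. The following is a minimal sketch, not the authors' code: it assumes the 1/N (population-style) moment estimators throughout, and the function names and the example counts are hypothetical.

```python
def central_moment(xs, k):
    """k-th central moment using the 1/N estimator (an assumption of this sketch)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** k for x in xs) / n


def skewness(xs):
    """Third central moment divided by the cubed standard deviation."""
    sigma = central_moment(xs, 2) ** 0.5  # requires non-constant data
    return central_moment(xs, 3) / sigma ** 3


def excess_kurtosis(xs):
    """Fourth central moment over the squared variance, minus 3 (0 for a normal)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0


if __name__ == "__main__":
    # Hypothetical low-mean count outcome with a long right tail.
    counts = [0, 0, 0, 1, 1, 2, 5]
    print("skewness:", skewness(counts))
    print("excess kurtosis:", excess_kurtosis(counts))
```

A positive skewness for data like these reflects the long right tail that is typical of counts with a low mean.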

a $R^2$.

b Pseudo $R^2$.

4 The intercept is the predicted value of the outcome when each of the predictors is equal to zero. In this illustration, if the measure of sensation seeking were rescaled so that zero had a meaningful value (e.g., if it were rescored as a 0 to 6 scale or it were mean centered, sensationC = sensation – mean[sensation]), then the intercept would be interpretable. The lack of attention to the scaling of the predictors unfortunately makes the intercept a parameter of no interest in many behavioral science applications (see Cohen et al., 2003; Wainer, 2000).

5 Three problems arise with calculating $R^2$ for Poisson regression in the same manner as $R^2$ for OLS regression (Cameron & Trivedi, 1998). First, the convenient partitioning of the total sum of squares, $\sum (Y_i - \bar{Y})^2 = \sum (Y_i - \hat{\mu}_i)^2 + \sum (\hat{\mu}_i - \bar{Y})^2$, breaks down. Because of the nonlinear relationship between $\hat{\mu}_i$ and $Y_i$, the cross-product term $2 \sum (Y_i - \hat{\mu}_i)(\hat{\mu}_i - \bar{Y})$ will not equal zero. Second, because maximum likelihood estimation does not minimize the residual sum of squares, $R^2$ can be less than 0 or greater than 1. Indeed, adding a predictor variable can potentially decrease $R^2$. Third, the interpretation of $R^2$ is problematic when the variance around the regression line changes as a function of the predicted value.

6 The deviance residual is the square root of the individual contribution of case $i$ to the deviance, $d_i = \operatorname{sign}(Y_i - \hat{\mu}_i) \sqrt{2\{l(Y_i) - l(\hat{\mu}_i)\}}$, where $\operatorname{sign}(Y_i - \hat{\mu}_i)$ preserves the sign of the residual, $l(Y_i)$ is the log of the density of $Y_i$ when $\mu = Y_i$, and $l(\hat{\mu}_i)$ is the log of the density of $Y_i$ when $\mu = \hat{\mu}_i$.
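For the Poisson case, the deviance residual described in this note can be sketched as follows. This is an illustration under stated assumptions rather than the SAS or SPSS implementation: the density is taken to be the standard Poisson probability function, the predicted mean is assumed to be strictly positive, and the function names are hypothetical.

```python
import math


def poisson_log_density(y, mu):
    """Log of the Poisson probability of count y given mean mu.

    The term y * log(mu) is taken to be 0 when y == 0, so the saturated
    case (mu == y == 0) is well defined.
    """
    term = y * math.log(mu) if y > 0 else 0.0
    return term - mu - math.lgamma(y + 1)


def deviance_residual(y, mu_hat):
    """Signed square root of case i's contribution to the Poisson deviance.

    Assumes mu_hat > 0 (hypothetical helper, not a library function).
    """
    saturated = poisson_log_density(y, y)    # log density when mu = y
    fitted = poisson_log_density(y, mu_hat)  # log density when mu = mu_hat
    contribution = max(0.0, 2.0 * (saturated - fitted))  # guard rounding error
    if y > mu_hat:
        sign = 1.0
    elif y < mu_hat:
        sign = -1.0
    else:
        sign = 0.0
    return sign * math.sqrt(contribution)


if __name__ == "__main__":
    # Hypothetical observed counts paired with hypothetical predicted means.
    for y, mu_hat in [(0, 2.0), (3, 3.0), (5, 2.0)]:
        print(y, mu_hat, deviance_residual(y, mu_hat))
```

Observations below their predicted mean receive negative residuals and observations above it receive positive ones, which is the role of the sign term in the formula.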

7 GENLIN is included in SPSS Versions 15 and 16 but only with the Advanced Models module or Graduate Pack.

8 Case diagnostics (leverage, Cook's D, and DFBETAS) for PROC GENMOD are available only in SAS Version 9.2 or later. Note that SAS and SPSS will produce slightly different values for deviance residuals and influence because SAS produces leverage values ranging from 1/N to 1 and SPSS “centers” the leverage values to range from 0 to (N – 1)/N.

9 Overdispersion also commonly occurs in longitudinal panel studies that result in clustering of the data. We do not discuss alternative analysis models for longitudinal designs such as generalized estimating equation models in this article (see Zeger, Liang, & Albert, 1988).

10 Some authors (e.g., Allison, 1999) have suggested that another test statistic, the deviance chi-square goodness-of-fit statistic, can also be used to determine dispersion of the model. The Pearson chi-square and the deviance chi-square are typically very close in value.

11 This is the negative binomial 2 or NB2 model described by McCullagh and Nelder (1989). The negative binomial 1 or NB1, which has a slightly different variance function, is also discussed by McCullagh and Nelder.

12 The equation for the AIC is $\mathrm{AIC} = -2 \ln f(y \mid \hat{\theta}) + 2k$, where $y = (y_1, \ldots, y_n)$ is a random sample of size $n$, $\hat{\theta}$ is a vector of the maximum likelihood parameter estimates, $\ln f(y \mid \hat{\theta})$ is the log-likelihood of the current model, and $k$ is the number of estimated parameters in the model. As shown in the equation, the first term is a measure of lack of fit; the second term is a function that penalizes models with a greater number of estimated parameters. The equation for the BIC is $\mathrm{BIC} = -2 \ln f(y \mid \hat{\theta}) + k \ln(n)$, where $n$ is sample size. The relative magnitude of the penalty for having more estimated parameters is smaller for larger sample sizes.
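As a small worked illustration of the two criteria in this note, the sketch below computes both from a model's maximized log-likelihood. The function names and the numeric inputs are hypothetical; in practice the log-likelihood, the number of estimated parameters, and the sample size would be read from the software output.

```python
import math


def aic(log_likelihood, k):
    """-2 times the log-likelihood plus 2 per estimated parameter."""
    return -2.0 * log_likelihood + 2.0 * k


def bic(log_likelihood, k, n):
    """-2 times the log-likelihood plus ln(n) per estimated parameter."""
    return -2.0 * log_likelihood + k * math.log(n)


if __name__ == "__main__":
    # Hypothetical values: log-likelihood of -100 with 3 parameters and n = 100.
    print("AIC:", aic(-100.0, 3))
    print("BIC:", bic(-100.0, 3, 100))
```

Because ln(n) exceeds 2 once n is 8 or more, the BIC charges more per parameter than the AIC in all but very small samples. For either criterion, smaller values indicate the preferred model among those fit to the same data.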

13 In SAS, zero-inflated models can be estimated using a far more complicated procedure, PROC NLMIXED.
