The Analysis of Count Data: A Gentle Introduction to Poisson Regression and Its Alternatives. Journal of Personality Assessment, Vol. 91, No. 2

Abstract

Count data reflect the number of occurrences of a behavior in a fixed period of time (e.g., number of aggressive acts by children during a playground period). When the outcome variable is a count with a low arithmetic mean (typically < 10), standard ordinary least squares regression may produce biased results. We provide an introduction to regression models that are appropriate for count data. We introduce standard Poisson regression with an example and discuss its interpretation. We then introduce two variants of Poisson regression, overdispersed Poisson regression and negative binomial regression, which may provide better results when a key assumption of standard Poisson regression is violated. We also discuss the problem of excess zeros, in which a subgroup of respondents who would never display the behavior is included in the sample, and the problem of truncated zeros, in which respondents who have a zero count are excluded by the sampling plan. We provide computer syntax for our illustrations in SAS and SPSS. The Poisson family of regression models provides improved and now easy-to-implement analyses of count data.

[Supplementary materials are available for this article. Go to the publisher's online edition of Journal of Personality Assessment for the following free supplemental resources: the data set used to illustrate Poisson regression in this article, which is available in three formats—a text file, an SPSS database, or a SAS database.]

Acknowledgments

We thank Howard Tennen and Stephen Armeli for providing us with a portion of the data from DeHart, Tennen, Armeli, Todd, & Affleck (2008). This data set provided the basis on which we developed our simulated data set.

Notes

¹ STATA is another commercial program that is widely used by economists and sociologists for the analysis of count data. It is not available in many psychology departments.

² Retrieved December 27, 2007 from http://www.nationmaster.com/graph/peo_tot_fer_rat-people-total-fertility-rate.

³ Skewness is the extent of asymmetry of the distribution and is indexed in the population by $\gamma_1 = \mu_3/\sigma^3$, where the third central moment $\mu_3$ is estimated by $\frac{1}{N}\sum (X_i - \bar{X})^3$ and $\sigma$ is the population standard deviation. Excess positive kurtosis refers to distributions having sharper peaks and longer tails than the normal distribution. Excess kurtosis relative to the value 0 for the normal distribution is indexed in the population by $\mu_4/\sigma^4 - 3$. Positive kurtosis (long tails) is associated with bias in estimates of standard errors.
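As a rough illustration of the two indices in this note, the sketch below computes sample skewness and excess kurtosis from the 1/N central moments. The function name `moments` and the data values are hypothetical and are not taken from the article's data set.

```python
import math

def moments(xs):
    """Return (skewness, excess kurtosis) based on 1/N central moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in xs) / n  # fourth central moment
    sd = math.sqrt(m2)
    skewness = m3 / sd ** 3
    excess_kurtosis = m4 / m2 ** 2 - 3.0       # 0 for a normal distribution
    return skewness, excess_kurtosis

# Hypothetical low-mean counts: many small values and a long right tail.
counts = [0, 0, 0, 1, 1, 1, 2, 2, 3, 4, 6, 10]
skew, ex_kurt = moments(counts)
print(f"skewness = {skew:.3f}, excess kurtosis = {ex_kurt:.3f}")
```

For counts like these, both indices come out positive, which is the pattern the note associates with low-mean count outcomes.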

ᵃ R².

ᵇ Pseudo R².

⁴ The intercept is the predicted value of the outcome when each of the predictors is equal to zero. In this illustration, if the measure of sensation seeking were rescaled so that zero had a meaningful value (e.g., if it were rescored as a 0 to 6 scale or it were mean centered, sensation_C = sensation – mean[sensation]), then the intercept would be interpretable. The lack of attention to the scaling of the predictors unfortunately makes the intercept a parameter of no interest in many behavioral science applications (see Cohen et al., 2003; Wainer, 2000).

⁵ Three problems arise with calculating R² for Poisson regression in the same manner as R² for OLS regression (Cameron & Trivedi, 1998). First, the convenient partitioning of the total sum of squares, $\sum (Y_i - \bar{Y})^2 = \sum (Y_i - \hat{\mu}_i)^2 + \sum (\hat{\mu}_i - \bar{Y})^2$, breaks down. Because of the nonlinear relationship between $\hat{\mu}_i$ and $Y_i$, the cross-product term $2\sum (Y_i - \hat{\mu}_i)(\hat{\mu}_i - \bar{Y})$ will not equal zero. Second, because maximum likelihood estimation does not minimize the residual sum of squares, R² can be less than 0 or greater than 1. Indeed, adding a predictor variable can potentially decrease R². Third, the interpretation of R² is problematic when the variance around the regression line changes as a function of the predicted value.
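The first problem in this note can be checked numerically. The sketch below fits a one-predictor Poisson regression by Newton-Raphson and then evaluates each sum of squares and the cross-product term. The fitting routine `fit_poisson` and the data are hypothetical stand-ins written for this illustration; they are not the article's SAS or SPSS syntax or its data set.

```python
import math

def fit_poisson(x, y, iters=50):
    """Fit ln(mu) = b0 + b1*x by Newton-Raphson on the Poisson log-likelihood."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only model
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix (negative Hessian), 2 x 2.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + inverse(information) * score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical predictor and count outcome.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 1, 0, 2, 1, 4, 3, 9]
b0, b1 = fit_poisson(x, y)
mu_hat = [math.exp(b0 + b1 * xi) for xi in x]
y_bar = sum(y) / len(y)

ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_resid = sum((yi - mi) ** 2 for yi, mi in zip(y, mu_hat))
ss_model = sum((mi - y_bar) ** 2 for mi in mu_hat)
cross = sum((yi - mi) * (mi - y_bar) for yi, mi in zip(y, mu_hat))

print(f"SS_total           = {ss_total:.4f}")
print(f"SS_resid + SS_model = {ss_resid + ss_model:.4f}")
print(f"2 * cross-product   = {2 * cross:.4f}")
```

The total sum of squares always equals the residual plus model sums of squares plus twice the cross-product. In OLS the last term vanishes, whereas here it does not, so the two-part partition no longer holds.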


⁶ The deviance residual is the signed square root of the individual contribution of case i to the deviance, $d_i = \operatorname{sign}(Y_i - \hat{\mu}_i)\sqrt{2\{l(Y_i) - l(\hat{\mu}_i)\}}$, where $\operatorname{sign}(Y_i - \hat{\mu}_i)$ preserves the sign of the residual, $l(Y_i)$ is the log of the density of $Y_i$ when $\mu = Y_i$, and $l(\hat{\mu}_i)$ is the log of the density of $Y_i$ when $\mu = \hat{\mu}_i$.
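A minimal sketch of this definition for the Poisson case follows, assuming the standard Poisson log density. The helper names `poisson_logpmf` and `deviance_residual` are hypothetical and are not part of any statistical package.

```python
import math

def poisson_logpmf(y, mu):
    """Log of the Poisson density at count y with mean mu."""
    if mu == 0:
        # Limiting case: a Poisson with mean 0 puts all its mass on y = 0.
        return 0.0 if y == 0 else float("-inf")
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def deviance_residual(y, mu_hat):
    """Signed square root of case i's contribution to the deviance."""
    # l(Y_i): log density with the mean set to the observed count.
    # l(mu_hat_i): log density with the mean set to the fitted value.
    diff = poisson_logpmf(y, y) - poisson_logpmf(y, mu_hat)
    sign = (y > mu_hat) - (y < mu_hat)
    return sign * math.sqrt(2.0 * max(diff, 0.0))  # guard against tiny negative rounding

# Hypothetical observed counts paired with fitted means.
for y_i, mu_i in [(0, 2.0), (3, 3.0), (5, 2.0)]:
    print(y_i, mu_i, round(deviance_residual(y_i, mu_i), 4))
```

An observed count below its fitted mean gives a negative residual, and a count equal to its fitted mean gives a residual of zero.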

⁷ GENLIN is included in SPSS Version 15 and 16 but only with the Advanced Models module or Graduate Pack.

⁸ Case diagnostics (leverage, Cook's D, and DFBETAS) for PROC GENMOD are available only in SAS Version 9.2 or later. Note that SAS and SPSS will produce slightly different values for deviance residuals and influence because SAS produces leverage values ranging from 1/N to 1 and SPSS “centers” the leverage values to range from 0 to (N – 1)/N.

⁹ Overdispersion also commonly occurs in longitudinal panel studies that result in clustering of the data. We do not discuss alternative analysis models for longitudinal designs such as generalized estimating equation models in this article (see Zeger, Liang, & Albert, 1988).

¹⁰ Some authors (e.g., Allison, 1999) have suggested that another test statistic, the deviance chi-square goodness-of-fit statistic, can also be used to determine dispersion of the model. The Pearson chi-square and the deviance chi-square are typically very close in value.
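To make the two statistics in this note concrete, the sketch below computes the Pearson chi-square and the Poisson deviance from observed counts and fitted means, then divides each by the residual degrees of freedom. The function names are hypothetical, and the fitted means are made-up values chosen for illustration rather than output from a fitted model.

```python
import math

def pearson_chi2(y, mu):
    """Pearson chi-square: sum of squared residuals scaled by the Poisson variance."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))

def poisson_deviance(y, mu):
    """Poisson deviance: 2 * sum[y*ln(y/mu) - (y - mu)], with y*ln(y) taken as 0 at y = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total

# Hypothetical observed counts and fitted means (assumed values, not a real fit).
y = [0, 1, 0, 2, 1, 4, 3, 9]
mu_hat = [0.3, 0.4, 0.7, 1.1, 1.8, 3.0, 4.9, 7.8]
df = len(y) - 2  # n minus the number of estimated coefficients (intercept and one slope)

print("Pearson chi-square / df:", round(pearson_chi2(y, mu_hat) / df, 3))
print("Deviance / df:          ", round(poisson_deviance(y, mu_hat) / df, 3))
```

Ratios well above 1 are commonly read as a sign of overdispersion.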

¹¹ This is the negative binomial 2 or NB2 model described by McCullagh and Nelder (1989). The negative binomial 1 or NB1, which has a slightly different variance function, is also discussed by McCullagh and Nelder.

¹² The equation for the AIC is $\mathrm{AIC} = -2\ln f(y \mid \hat{\theta}) + 2k$, where $y = (y_1, \ldots, y_n)$ is a random sample of size n, $\hat{\theta}$ is a vector of the maximum likelihood parameter estimates, $\ln f(y \mid \hat{\theta})$ is the log-likelihood of the current model, and k is the number of estimated parameters in the model. As shown in the equation, the first term is a measure of lack of fit; the second term is a function that penalizes models with a greater number of estimated parameters. The equation for the BIC is $\mathrm{BIC} = -2\ln f(y \mid \hat{\theta}) + k\ln(n)$, where n is sample size. The relative magnitude of the penalty for having more estimated parameters is smaller for larger sample sizes.
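The two criteria in this note translate directly into code. The sketch below is a minimal illustration; the function names and the log-likelihood values are hypothetical and do not come from the article's analyses.

```python
import math

def aic(loglik, k):
    """AIC = -2 * log-likelihood + 2 * (number of estimated parameters)."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """BIC = -2 * log-likelihood + (number of estimated parameters) * ln(sample size)."""
    return -2.0 * loglik + k * math.log(n)

# Hypothetical comparison: a 2-parameter Poisson model versus a 3-parameter
# negative binomial model fitted to the same n = 400 cases (made-up log-likelihoods).
n = 400
models = {"Poisson": (-1012.4, 2), "Negative binomial": (-841.7, 3)}
for name, (loglik, k) in models.items():
    print(f"{name}: AIC = {aic(loglik, k):.1f}, BIC = {bic(loglik, k, n):.1f}")
```

Lower values indicate the preferred model under either criterion. The two differ only in the penalty term, 2k versus k ln(n).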

¹³ In SAS, zero-inflated models can be estimated using a far more complicated procedure, PROC NLMIXED.
