Abstract
Social scientists often habitually apply ANOVA methods to experimental data even when other analytic approaches are required. This paper illustrates how traditional analytic approaches can lead to incorrect research conclusions by reanalyzing data from a recent study by Williams, Block, and Fitzsimons (2006a). Because the non‐negative dependent variable (illegal drug use) was highly skewed and had a large majority of zero values, the use of improper statistical tests and the presence of just a few extreme, outlying observations produced the illusion that asking people to predict their likelihood of drug use increased that behavior significantly, when in fact it did not. The effect of behavior prediction questions on frequency of exercise also turns out to be non‐significant when analyzed properly. As this example illustrates, experimental researchers should choose and implement appropriate analytic approaches carefully.
Jon Krosnick is University Fellow at Resources for the Future. We thank Patti Williams, Lauren Block, and Gavan Fitzsimons for providing us with their data sets for the analyses reported here.
Notes
1. Because of the presence of zeros in the distribution, the dependent variable must be transformed by adding a constant, as in ln(y+c), where c is a positive constant that shifts the distribution to the right to eliminate the zero values. Selection of c is arbitrary. In the results reported here, we set c = 1, as is routinely done (Clarke & Green, 1988; Field et al., 1982; Kirk, 1995). We also tried setting c = .0001 and c = .05 and found that the associated p‐values were even farther from statistical significance than those we report.
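The role of c in Note 1 can be illustrated with a minimal Python sketch. The counts below are made-up values chosen only for illustration (they are not the study's data), and log_shift is a hypothetical helper name. The sketch shows that the smaller c is, the farther apart the zeros and the non‐zero values sit on the log scale, which is one reason the choice of c can affect test results.

```python
import math

def log_shift(values, c):
    """Return ln(y + c) for each non-negative value y; c must be positive."""
    if c <= 0:
        raise ValueError("c must be positive so that ln(0 + c) is defined")
    return [math.log(y + c) for y in values]

# Hypothetical counts (NOT the study's data): mostly zeros, one large value.
counts = [0, 0, 0, 0, 1, 2, 30]

for c in (1, 0.05, 0.0001):
    transformed = log_shift(counts, c)
    # Distance on the log scale between a zero and a single reported use.
    gap = transformed[4] - transformed[0]
    print(f"c={c}: ln(0+c)={transformed[0]:.3f}, gap between y=0 and y=1: {gap:.3f}")
```

Rerunning a test on each transformed version is a simple way to check whether a conclusion depends on the arbitrary choice of c.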
2. Count data are sometimes analyzed via regression based on a Poisson distribution as well, but we restricted our investigation to the more general negative binomial distribution, following recommendations by Gardner et al. (1995) and Long (1995).
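A rough way to see why the negative binomial is the more general choice is to compare the variance of the counts with their mean: a Poisson distribution forces the two to be equal, whereas the negative binomial allows extra variance. The sketch below is an illustration only. It assumes the common parameterization in which the variance equals mean + alpha * mean^2 (the source does not state which parameterization was used), it uses made-up counts rather than the study's data, and dispersion_summary is a hypothetical helper name.

```python
from statistics import mean, variance

def dispersion_summary(counts):
    """Return (mean, sample variance, variance/mean ratio, moment estimate of alpha)."""
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 in the denominator)
    ratio = v / m         # roughly 1 under a Poisson distribution
    # Assumed parameterization: variance = m + alpha * m**2,
    # so a method-of-moments estimate is alpha = (v - m) / m**2.
    # Clamp at 0, since alpha = 0 corresponds to the Poisson case.
    alpha = max(0.0, (v - m) / m**2)
    return m, v, ratio, alpha

# Hypothetical counts (NOT the study's data): many zeros, one extreme value.
counts = [0, 0, 0, 0, 0, 0, 1, 2, 3, 24]
m, v, ratio, alpha = dispersion_summary(counts)
print(f"mean={m:.2f}, variance={v:.2f}, variance/mean={ratio:.2f}, alpha={alpha:.2f}")
```

A variance‐to‐mean ratio well above 1 signals overdispersion, in which case a Poisson model would tend to understate standard errors.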
3. The very large proportion of zeros in reported drug use means that an estimator for a zero‐inflated distribution might seem appropriate (Long, 1995). Such an estimator assumes that the observed distribution results from two different processes, one that determines whether the dependent variable is zero or non‐zero, and a second that determines, among the non‐zeros, what the observed value will be. We did not implement this approach, because we saw no basis for assuming that different processes governed use versus non‐use of drugs and amount of drug use.
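For readers unfamiliar with the two‐process idea in Note 3, the sketch below writes out the probability function of a zero‐inflated Poisson distribution, chosen here only because it is the simplest zero‐inflated case; it is not an analysis the authors ran. In this standard formulation, an observation is an "always zero" case with probability pi_zero and otherwise follows an ordinary Poisson count process, which can itself produce zeros. The function names and parameter values are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """Probability of observing count k under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def zip_pmf(k, pi_zero, lam):
    """Zero-inflated Poisson probability of count k.

    pi_zero: probability of belonging to the 'always zero' group.
    lam:     Poisson mean for everyone else.
    """
    base = (1 - pi_zero) * poisson_pmf(k, lam)
    # Zeros arise from both processes; positive counts only from the Poisson part.
    return pi_zero + base if k == 0 else base

# Hypothetical parameter values, for illustration only.
print("P(0) plain Poisson: ", poisson_pmf(0, 2.0))
print("P(0) zero-inflated: ", zip_pmf(0, 0.6, 2.0))
```

Setting pi_zero to 0 recovers the plain Poisson distribution, which makes explicit that the zero‐inflated model adds an assumption about a separate zero‐generating process.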
4. The log‐transformed distribution was generated by using ln(y) instead of ln(y+1), because the distribution did not include any zero values, so addition of a constant was not necessary.
5. Removing these two outliers also caused the non‐significant correlation that Williams et al. (2006a) reported between the estimated likelihood of using drugs and the actual frequency of drug use within the treatment group (reported: r = .034; p = .761; N = 85) to become stronger and marginally significant (r = .204; p = .065; N = 83), showing again how sensitive the presented conclusions are to outliers.
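The kind of sensitivity described in Note 5 can be reproduced in miniature. The sketch below computes a Pearson correlation with and without two extreme observations; the data are made up for illustration and do not reproduce the values reported above, and pearson_r is a hypothetical helper written out so that the example needs only the standard library.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data (NOT the study's data). The last two cases pair a low
# predicted likelihood with an extremely high reported frequency.
likelihood = [1, 2, 3, 4, 5, 6, 1, 2]
frequency = [0, 1, 1, 2, 2, 3, 40, 35]

r_all = pearson_r(likelihood, frequency)
r_trimmed = pearson_r(likelihood[:-2], frequency[:-2])
print(f"with outliers:    r = {r_all:.3f}")
print(f"without outliers: r = {r_trimmed:.3f}")
```

In this toy example two observations are enough to mask a clear positive association among the remaining cases.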
6. The log‐transformation must be carried out before the test is run. In Stata, this is done with the command generate lndrugs = log(drugs), where drugs is the original variable and lndrugs is the transformed variable.