Research Article

The regression trap: why regression analyses are not suitable for selecting determinants to target in behavior change interventions

Article: 2268684 | Received 24 Feb 2022, Accepted 02 Oct 2023, Published online: 25 Oct 2023

ABSTRACT

Objective

Regression analyses are commonly used to select determinants to target in behavior change interventions. The aim of this article is to explain why regression analyses are not suitable for this purpose (i.e., the regression trap).

Methods

This aim is achieved by providing (1) a theoretical rationale based on overlap among determinants; (2) a mathematical rationale based on the formulas that are used to calculate regression coefficients; and (3) examples based on real-world data.

Results

First, the meaning of regression coefficients is commonly explained as expressing the association between a determinant and a target behavior ‘holding all other predictors constant.’ We explain that this often boils down to ‘neglecting a part of the psyche.’ Second, we demonstrate that the interpretation of regression coefficients is distorted by correlations between determinants. Third, the examples provided demonstrate the impact of this distortion in practice: selecting determinants based on regression coefficients results in interventions targeting determinants that are less relevant and, thereby, have less impact on behavior change.
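To make the mathematical point concrete, here is a minimal sketch in Python. With two standardized predictors, the standard ordinary least squares formula gives each determinant's coefficient from correlations alone, β₁ = (r_y1 − r_y2·r_12) / (1 − r_12²). The correlations below are hypothetical, chosen only for illustration (they are not the article's real-world examples), and the function name is ours.

```python
def standardized_betas(r_y1: float, r_y2: float, r_12: float) -> tuple[float, float]:
    """Standardized OLS coefficients for two predictors, computed from correlations only.

    r_y1, r_y2: bivariate correlations of each determinant with the behavior.
    r_12: correlation between the two determinants.
    """
    denom = 1.0 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2


# Hypothetical correlations: both determinants correlate substantially with the
# behavior, and strongly with each other.
b1, b2 = standardized_betas(r_y1=0.50, r_y2=0.40, r_12=0.80)
print(f"beta_1 = {b1:.2f}, beta_2 = {b2:.2f}")
# beta_1 = 0.50, beta_2 = 0.00
```

Although the second determinant correlates .40 with the behavior, its coefficient is zero once the first determinant is "held constant"; with uncorrelated determinants (r_12 = 0), each coefficient would simply equal its bivariate correlation.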

Conclusion

There are theoretical, mathematical, and practical reasons why regression analyses, and by extension multivariate analyses relying on correlations, are not suitable for selecting determinants to target in behavior change interventions. Instead, intervention developers should consider univariate distributions and bivariate association estimates simultaneously; freely accessible tools are available to do so.

Open Scholarship

This article has earned the Center for Open Science badges for Open Materials. The materials are openly accessible at https://osf.io/q9scj/.

Acknowledgements

The authors thank Rosa Thielmann and Rob Ruiter for providing insightful feedback on an earlier draft of this article.

Data availability statement

Data sharing not applicable – no new data generated.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Ethics statement

This paper describes theoretical and mathematical aspects of a certain type of analysis and does not involve actual data or work with participants, so an ethics statement is not applicable.

Notes

1 Note, however, that the proportion of variance explained has been described as a pessimistic measure (Sutton, 1998).

2 While the examples used in this article concern linear regression models, the same reasoning applies to other regression models (e.g., logistic regression models).

3 A theory is a set of concepts and/or statements with specification of how phenomena relate to each other. Theory provides an organizing description of a system that accounts for what is known, and explains and predicts phenomena (Davis et al., 2015).

4 For example, somebody who likes running may enjoy experiencing a runner’s high (an affective determinant of behavior), but may also consider the beneficial health effects of running to be optimal when running for at least the period of time that also yields that runner’s high (a cognitive determinant of behavior). In this case, the person’s causal representations around the runner’s high are an intrinsic part of both constructs.

5 A system for modeling theories of behavior using a formal representation of the constructs within theories and the ways in which constructs relate to or interact with each other (Hale et al., 2020).

6 Multicollinearity refers to the phenomenon that, if predictors correlate, each predictor supplies less information than it would if the predictors had been orthogonal. The degree of multicollinearity is expressed by a predictor’s tolerance, i.e., 1 minus the R² obtained when regressing that predictor on all other predictors. For example, if the R² of attitude, predicted by perceived norm and behavioral control, is .67, then attitude’s tolerance is .33. Multicollinearity inflates the sampling variance of the predictor’s coefficient by a factor equal to the variance inflation factor (VIF), which is the reciprocal of the tolerance: if attitude’s tolerance is .33, its VIF is approximately 3 (i.e., 1/.33), so its sampling variance is about three times larger, and its standard error about √3 ≈ 1.7 times larger, than if its correlations with perceived norm and behavioral control had been 0. Multicollinearity is a problem because it results in very unstable coefficient estimates (as manifested in, for example, wide confidence intervals). However, it does not influence the degree to which the coefficients are unbiased estimates of the population values. This is in contrast to the problem we outline in this article, which does constitute bias in regression coefficients.
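The arithmetic in this note can be checked with a short Python sketch. It uses the note's R² of .67 and, for the second part, the standard formula for the R² of a variable regressed on two other variables, computed from their correlations. The helper names and the correlations in the last line are hypothetical and serve only as an illustration.

```python
import math


def tolerance_and_vif(r_squared: float) -> tuple[float, float]:
    """Tolerance = 1 - R^2 (predictor regressed on the other predictors); VIF = 1 / tolerance."""
    tolerance = 1.0 - r_squared
    return tolerance, 1.0 / tolerance


def r_squared_two_predictors(r_x1: float, r_x2: float, r_12: float) -> float:
    """R^2 of a predictor x regressed on two other predictors, from correlations only.

    r_x1, r_x2: correlations of x with the other two predictors; r_12: their mutual correlation.
    """
    return (r_x1**2 + r_x2**2 - 2 * r_x1 * r_x2 * r_12) / (1 - r_12**2)


# The note's numbers: attitude regressed on perceived norm and behavioral control gives R^2 = .67.
tol, vif = tolerance_and_vif(0.67)
se_factor = math.sqrt(vif)  # the standard error scales with the square root of the VIF
print(f"tolerance={tol:.2f}  VIF={vif:.2f}  SE inflation={se_factor:.2f}")
# tolerance=0.33  VIF=3.03  SE inflation=1.74

# Hypothetical correlations: r(attitude, norm)=.6, r(attitude, control)=.5, r(norm, control)=.4
print(tolerance_and_vif(r_squared_two_predictors(0.6, 0.5, 0.4)))
```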

7 In principle, using a frequentist approach, decisions following from a comparison of correlations should take the confidence intervals of these correlation coefficients into account, because they indicate the accuracy with which these correlations can be estimated. The wider these confidence intervals are, the more likely it is that the confidence intervals of various determinants overlap, which makes it harder to differentiate among determinants. This stresses the importance of taking the desired accuracy of parameter estimates into account during study planning (Peters & Crutzen, 2021).
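One common frequentist way to obtain such intervals is the Fisher z-transformation; this note does not prescribe a specific method, so treat the choice as an assumption. The Python sketch below computes approximate 95% intervals for two hypothetical determinants at two sample sizes, showing how the intervals narrow as the sample grows. The function name and the values are illustrative only.

```python
import math
from statistics import NormalDist


def correlation_ci(r: float, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Approximate confidence interval for a Pearson correlation (Fisher z-transformation)."""
    z = math.atanh(r)                                  # r -> approximately normal z scale
    se = 1.0 / math.sqrt(n - 3)                        # standard error of z
    crit = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return math.tanh(z - crit * se), math.tanh(z + crit * se)  # back to the r scale


# Hypothetical determinants correlating .40 and .30 with behavior, at two sample sizes.
for n in (100, 400):
    for label, r in (("determinant A", 0.40), ("determinant B", 0.30)):
        lo, hi = correlation_ci(r, n)
        print(f"n={n:3d} {label}: r={r:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```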

8 The term multivariate, as opposed to multivariable, is used in contrast to bivariate, to distinguish analyses including more than two variables from those including only two variables (Denis, 2020).

Additional information

Funding

The author(s) reported there is no funding associated with the work featured in this article.