Original Articles

Statistical analysis in Small-N Designs: using linear mixed-effects modeling for evaluating intervention effectiveness

Pages 1-30 | Received 17 Aug 2017, Accepted 15 Mar 2018, Published online: 21 Mar 2018
 

ABSTRACT

Background: Advances in statistical methods and computing power have led to a renewed interest in addressing the statistical analysis challenges posed by Small-N Designs (SND). Linear mixed-effects modeling (LMEM) is a multiple regression technique that is flexible and suitable for SND and can provide standardized effect sizes and measures of statistical significance.

Aims: Our primary goals are to: 1) explain LMEM at the conceptual level, situating it in the context of treatment studies, and 2) provide practical guidance for implementing LMEM in repeated measures SND.

Methods & procedures: We illustrate an LMEM analysis, presenting data from a longitudinal training study of five individuals with acquired dysgraphia, analyzing both binomial (accuracy) and continuous (reaction time) repeated measurements.

Outcomes & results: The LMEM analysis reveals that both spelling accuracy and reaction time improved and, for accuracy, improved significantly more quickly under a training schedule with distributed, compared to clustered, practice. We present guidance on obtaining and interpreting various effect sizes and measures of statistical significance from LMEM, and include a simulation study comparing two p-value methods for generalized LMEM.

Conclusion: We provide a strong case for the application of LMEM to the analysis of training studies as a preferable alternative to visual analysis or other statistical techniques. Applied to a treatment dataset, the approach holds up even under the extreme conditions of small numbers of individuals, with repeated-measures training data for both continuous (reaction time) and binomially distributed (accuracy) dependent measures. The approach provides standardized measures of effect sizes that are obtained through readily available and well-supported statistical packages, and provides statistically rigorous estimates of the expected average effect size of training effects, taking into account variability across both items and individuals.
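As an illustration of the kind of model the abstract describes, the sketch below fits a linear mixed-effects model to simulated training data in Python with statsmodels. The data, variable names (Individual, Session, RT), and effect sizes are all invented for illustration; the paper's own analysis presumably uses R/lme4 (see note 8) and its real dysgraphia dataset.

```python
# Minimal LMEM sketch on simulated training data: reaction time declines
# across sessions, with by-individual random intercepts and slopes.
# All names and numbers here are illustrative, not the paper's.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_ind, n_sessions, n_items = 5, 10, 20

rows = []
for ind in range(n_ind):
    ind_intercept = rng.normal(0, 100)   # by-individual random intercept
    ind_slope = rng.normal(-20, 5)       # by-individual training slope
    for sess in range(n_sessions):
        for item in range(n_items):
            rt = 1500 + ind_intercept + ind_slope * sess + rng.normal(0, 50)
            rows.append({"Individual": ind, "Session": sess,
                         "Item": item, "RT": rt})
df = pd.DataFrame(rows)

# Random intercept and Session slope by individual. (Fully crossed item
# random effects need extra variance components in statsmodels; lme4 in R
# expresses them more directly.)
model = smf.mixedlm("RT ~ Session", df, groups=df["Individual"],
                    re_formula="~Session").fit()
print(model.params["Session"])   # fixed-effect estimate of the training slope
```

The fixed effect for Session estimates the average per-session improvement across individuals, while the random slope lets each individual deviate from that average.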

Acknowledgments

We are very grateful to Jennifer Shea for her many valuable contributions to this project, from individual testing through data analysis. We also thank Colin Wilson for guidance on statistical theory. This work was supported by the multi-site NIH-supported grant DC006740 examining the neurobiology of language recovery in aphasia.

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes

1. The distinction between a repeated-measures design and a time series is that while the former provides multiple observations of the same variable at the same time point, the latter provides only a single observation per time point.

2. Nesting the effect of items within individuals would create separate effects for each word in each individual, e.g., allowing “house” to show a steeper training effect overall in one individual but a shallower effect in another individual. In the analysis we present, by crossing items with individuals, we assume instead that any differences in the effect of training the word “house” not accounted for by other variables should be common across all individuals included in the study.

3. Type I error is the rejection of the null hypothesis when it is actually true, i.e., finding an effect significantly different from zero at some α threshold (typically 0.05) when no such effect exists. In a treatment study, this corresponds to a false positive: reporting a treatment effect when there is none.
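The meaning of the α threshold in note 3 can be checked directly by simulation: when the null hypothesis is true, about α of all tests reject it. This is a generic illustration, not tied to the paper's data.

```python
# Under a true null (population mean 0), roughly alpha of one-sample
# t-tests reject at threshold alpha. Pure simulation for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, n_sims, n = 0.05, 5000, 30

false_positives = 0
for _ in range(n_sims):
    sample = rng.normal(0, 1, n)          # null is true: true mean is 0
    _, p = stats.ttest_1samp(sample, 0)
    if p < alpha:
        false_positives += 1

rate = false_positives / n_sims
print(rate)   # close to 0.05, the nominal Type I error rate
```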

4. To be clear, it is statistical independence of the errors that is a basic assumption of linear regression, not of the dependent variable itself. It is natural for data points observed from the same individual across time to show a relationship, due to any number of underlying factors. The problem of autocorrelation arises if this relationship is not accounted for by the independent variables in the model and/or a statistical method of control, and thus remains in the residual errors.
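Note 4's point, that autocorrelation lives in the residuals rather than in the raw data, can be made concrete with a small simulation: the same data show strongly autocorrelated residuals when the time trend is omitted from the model, and essentially independent residuals once it is included. Variable names are illustrative.

```python
# Autocorrelation appears in the residuals only when the time-related
# structure is not captured by the model's predictors. Simulated data.
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(200)
y = 0.5 * t + rng.normal(0, 1, 200)   # linear time trend + independent noise

def lag1_autocorr(x):
    x = x - x.mean()
    return (x[:-1] * x[1:]).sum() / (x * x).sum()

# Model 1: intercept only -- the unmodeled trend remains in the residuals.
resid_bad = y - y.mean()

# Model 2: regress out the time trend (ordinary least squares).
slope, intercept = np.polyfit(t, y, 1)
resid_good = y - (slope * t + intercept)

print(lag1_autocorr(resid_bad))    # near 1: strongly autocorrelated
print(lag1_autocorr(resid_good))   # near 0: independence assumption holds
```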

5. Barr et al. (Citation2013) clarify that “if a factor [i.e., a fixed-effect] is a between-unit factor, then a random intercept is usually sufficient” (p. 275). The point here is that if any given unit (e.g., any specific item) only takes on one value for some predictor, then a random slope for that predictor should not be included (by definition, a slope cannot be estimated based on only one value). Thus, e.g., a random slope for word frequency does not make sense by-items, as word frequency only varies between words, not within.

6. Random slopes were included for all variables that related to time (i.e., Session and DaysSince) both by-items and by-individuals, and for Schedule by-individuals, on the basis that they are the critical variables of interest. A random slope for Schedule by-items was not included because no item appeared in both schedules (i.e., each word was either on a Clustered or a Distributed schedule but never both).

7. The choice of modeling a small number of individuals as a group or as separate individuals depends on both theoretical and practical considerations. Theoretically, if the interest is in estimating an effect for an average individual, then the participants should be modeled as a group. If instead the interest is in measuring the effects specific to each individual, then each can be modeled separately. There may, however, be insufficient data for individual LMEM models (e.g., a single observation per individual per time point), but sufficient data when combining the individuals into a group.

8. The random-effects structure in model1 crosses random effects for participants and items. The alternative, nested random effects for participants and items, would be modeled as (1 + Schedule + Session + DaysSince | Target:Individual).
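The crossed structure in note 8 can also be expressed in Python, where statsmodels handles crossed random intercepts via variance components under a single dummy grouping (lme4 in R writes this more directly as separate (… | Individual) and (… | Target) terms). The sketch below uses simulated data and illustrative names, and fits random intercepts only.

```python
# Crossed random intercepts for individuals and items via variance
# components in statsmodels. Simulated/illustrative data and names.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_individuals, n_targets = 5, 12
ind_eff = rng.normal(0, 2, n_individuals)    # by-individual intercepts
item_eff = rng.normal(0, 1, n_targets)       # by-item intercepts

rows = [{"Individual": i, "Target": j,
         "y": 10 + ind_eff[i] + item_eff[j] + rng.normal(0, 0.5)}
        for i in range(n_individuals) for j in range(n_targets)]
df = pd.DataFrame(rows)

# One dummy group spanning all rows; each factor becomes a variance
# component, so individuals and items are crossed rather than nested.
model = smf.mixedlm(
    "y ~ 1", df, groups=np.ones(len(df)),
    vc_formula={"Individual": "0 + C(Individual)",
                "Target": "0 + C(Target)"},
).fit()
print(model.params["Intercept"])   # fixed intercept near the grand mean
```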

9. We recommend in this situation using sum-coding (i.e., coding “post” as −1 and “follow-up” as +1) as opposed to treatment or dummy-coding (coding as 0 and 1).

Additional information

Funding

This work was supported by the National Institutes of Health [DC006740].
