
Statistical analysis in Small-N Designs: using linear mixed-effects modeling for evaluating intervention effectiveness

Pages 1-30 | Received 17 Aug 2017, Accepted 15 Mar 2018, Published online: 21 Mar 2018
 

ABSTRACT

Background: Advances in statistical methods and computing power have led to renewed interest in addressing the statistical challenges posed by Small-N Designs (SND). Linear mixed-effects modeling (LMEM) is a flexible multiple-regression technique that is well suited to SND and can provide standardized effect sizes and measures of statistical significance.

Aims: Our primary goals are to: 1) explain LMEM at the conceptual level, situating it in the context of treatment studies, and 2) provide practical guidance for implementing LMEM in repeated measures SND.

Methods & procedures: We illustrate an LMEM analysis, presenting data from a longitudinal training study of five individuals with acquired dysgraphia, analyzing both binomial (accuracy) and continuous (reaction time) repeated measurements.

Outcomes & results: The LMEM analysis reveals that both spelling accuracy and reaction time improved with training and that accuracy, in particular, improved significantly faster under a distributed than under a clustered practice schedule. We present guidance on obtaining and interpreting various effect sizes and measures of statistical significance from LMEM, and include a simulation study comparing two p-value methods for generalized LMEM.

Conclusion: We make a strong case for applying LMEM to the analysis of training studies as a preferable alternative to visual analysis or other statistical techniques. When applied to a treatment dataset, the approach holds up under the extreme conditions of small numbers of individuals, with repeated-measures training data for both continuous (reaction time) and binomially distributed (accuracy) dependent measures. The approach yields standardized effect sizes through readily available, well-supported statistical packages, and provides statistically rigorous estimates of the expected average training effect, taking into account variability across both items and individuals.

Acknowledgments

We are very grateful to Jennifer Shea for her many valuable contributions to this project, from individual testing through data analysis. We also thank Colin Wilson for guidance on statistical theory. This work was supported by the multi-site NIH-supported grant DC006740 examining the neurobiology of language recovery in aphasia.

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes

1. The distinction between a repeated-measures design and a time series is that while the former provides multiple observations of the same variable at the same time point, the latter provides only a single observation per time point.

2. Nesting the effect of items within individuals would create separate effects for each word in each individual, e.g., allowing “house” to show a steeper training effect in one individual but a shallower effect in another. In the analysis we present, by crossing items with individuals, we instead assume that any differences in the effect of training the word “house” that are not accounted for by other variables should be common across all individuals in the study.

3. Type I error is the rejection of the null hypothesis when it is actually true, i.e., incorrectly finding an effect as significantly different from zero at some α threshold (typically 0.05) when it is not. In a treatment study this would correspond to a false positive—reporting an effect as significant when it is not.

4. To be clear, it is statistical independence of the errors that is a basic assumption of linear regression, not of the dependent variable itself. It is natural for data points observed from the same individual across time to show a relationship, due to any number of underlying factors. The problem of autocorrelation arises if this relationship is not accounted for by the independent variables in the model and/or a statistical method of control, and thus remains in the residual errors.
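
One simple residual diagnostic, sketched here under stated assumptions (a fitted model named model1, a data frame named spelling, and a hypothetical individual ID "P1"), is to plot the autocorrelation function of one individual's time-ordered residuals and look for spikes beyond the confidence bounds:

    # Check for leftover autocorrelation in one individual's residuals
    spelling$res <- residuals(model1)              # deviance residuals by default for glmer fits
    p1 <- spelling[spelling$Individual == "P1", ]  # "P1" is a hypothetical ID
    p1 <- p1[order(p1$Session), ]                  # put observations in time order
    acf(p1$res, main = "Residual ACF, individual P1")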

5. Barr et al. (Citation2013) clarify that “if a factor [i.e., a fixed-effect] is a between-unit factor, then a random intercept is usually sufficient” (p. 275). The point here is that if any given unit (e.g., any specific item) only takes on one value for some predictor, then a random slope for that predictor should not be included (by definition, a slope cannot be estimated based on only one value). Thus, e.g., a random slope for word frequency does not make sense by-items, as word frequency only varies between words, not within.

6. Random slopes were included for all variables that related to time (i.e., Session and DaysSince) both by-items and by-individuals, and for Schedule by-individuals, on the basis that they are the critical variables of interest. A random slope for Schedule by-items was not included because no item appeared in both schedules (i.e., each word was either on a Clustered or a Distributed schedule but never both).
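
For concreteness, here is a minimal lme4-style sketch of a model with the random-effects structure just described. This is an illustration, not the authors' exact code: the dependent-variable name Accuracy and the data-frame name spelling are assumptions, while Schedule, Session, DaysSince, Target, and Individual follow the variable names used in these notes.

    library(lme4)

    # Accuracy model with random slopes for Session and DaysSince both
    # by-items (Target) and by-individuals, plus a Schedule slope
    # by-individuals only (no item appeared in both schedules).
    model1 <- glmer(
      Accuracy ~ Schedule + Session + DaysSince +
        (1 + Session + DaysSince | Target) +
        (1 + Schedule + Session + DaysSince | Individual),
      data = spelling,
      family = binomial
    )
    summary(model1)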

7. The choice between modeling a small number of individuals as a group or as separate individuals depends on both theoretical and practical questions. Theoretically, if the interest is in estimating an effect for an average individual, then the participants should be modeled as a group. If instead the interest is in measuring the effects specific to each individual, then each can be modeled separately. There may, however, be insufficient data for individual LMEM models (e.g., a single observation per individual per time point) but sufficient data when the individuals are combined into a group.

8. The random-effects structure in model1 uses crossed random effects for participants and items. The nested alternative, with item effects nested within participants, would be modeled as (1 + Schedule + Session + DaysSince | Target:Individual).
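
In lme4 syntax, the nested alternative differs from the crossed model only in its random-effects term; a sketch, under the same naming assumptions as in note 6:

    # Nested random effects: item effects are estimated separately within
    # each individual (contrast with the crossed model1 sketched in note 6).
    model1_nested <- glmer(
      Accuracy ~ Schedule + Session + DaysSince +
        (1 + Schedule + Session + DaysSince | Target:Individual),
      data = spelling,
      family = binomial
    )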

9. In this situation we recommend sum-coding (i.e., coding “post” as −1 and “follow-up” as +1) rather than treatment or dummy-coding (coding as 0 and 1).
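
A minimal sketch of this coding in R (the factor name Phase and the data-frame name spelling are illustrative assumptions):

    # Sum-code a two-level phase factor: "post" = -1, "follow-up" = +1
    spelling$Phase <- factor(spelling$Phase, levels = c("post", "follow-up"))
    contrasts(spelling$Phase) <- matrix(c(-1, 1))
    contrasts(spelling$Phase)  # verify: post = -1, follow-up = +1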

Additional information

Funding

This work was supported by the National Institutes of Health [DC006740].
