Reviews of Books and Teaching Materials

ANOVA and Mixed Models: A Short Introduction Using R

I found this to be a well-written, practical, concise, and very clear handbook on the appropriate analysis of data from randomized experiments using the freely available R software. The book comes in at 187 pages including the references and the index, so it gets right to the point without a lot of extra words. This book will be easy to understand for practitioners from many different fields who are collecting and analyzing data from randomized experiments, and the consistent focus on data management and data analysis using R is excellent.

The book is very much written in the spirit of Faraway (2014), with plenty of clearly commented R code and output supported by concise text that describes the code and the output in a clear and no-nonsense manner. Readers will find it very easy to replicate all of the analyses and match the output provided. The writing style in general is extremely clear and practical, with a heavy focus on worked examples. I was happy to see that the book included code for implementing various randomized experimental designs in addition to the code for data analysis. Code for basic data visualization is provided throughout as well, often using ggplot2 functions.

Each of the chapters (summarized below) ends with appendices/FAQs/“Outlook” sections that either go into more detail about some of the concepts introduced in the chapter, including code when relevant, or discuss extensions of the approaches introduced and more advanced applications. There are no end-of-chapter exercises, so the book works best as a handbook or supplement rather than as a stand-alone course text. There is also a very nice supporting website, where all of the code can be easily executed and replicated, and the author is quite open to any questions.

Chapter 1 provides a nice overview of key concepts in randomized experiments, including blocking, randomization, and internal and external validity. All descriptions of these important concepts are clear, brief, and directly to the point.

While Chapter 2 is ostensibly about completely randomized designs, it actually turns out to be a pleasant mini-course in applied statistics. The author provides lots of excellent advice for regression modeling with categorical predictors in general. The reader will find good general introductions to one-way ANOVA and to different ways of getting to the same hypothesis test result in R. I really liked the description of the benefits of the ANCOVA approach in this chapter. I also liked the focus on informal visualization, as opposed to statistical testing, for assessing residual diagnostics. The author consistently formulates analyses in terms of regression models, where each parameter is carefully interpreted. The reader will also find a nice explanation of alternative ways to introduce model constraints (e.g., effect coding versus dummy coding) and their impact on interpretation. I also liked the careful derivation of all pieces of an overall estimator, with an emphasis on clear notation. I very much appreciated the author’s comments on how to critically review the descriptions of research designs or the interpretations of results in research papers (e.g., causal inference versus association, whether residual diagnostics were assessed, etc.).
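To make the effect-coding versus dummy-coding point concrete, here is a minimal Python sketch. It is not taken from the book, which works in R; the data, group labels, and variable names are hypothetical. It relies on the fact that in a one-way model with a single categorical predictor, the least-squares estimates are simple functions of the group means, so the two codings can be compared without fitting a regression.

```python
from statistics import mean

# Hypothetical balanced one-way layout: three treatments, three responses each.
data = {
    "A": [10.0, 12.0, 11.0],
    "B": [14.0, 15.0, 16.0],
    "C": [20.0, 19.0, 21.0],
}

group_means = {g: mean(y) for g, y in data.items()}
# With equal group sizes, the mean of the group means is the overall mean.
grand_mean = mean(group_means.values())

# Dummy (treatment) coding with "A" as the reference level:
# the intercept is the reference mean, other coefficients are differences from it.
dummy = {"intercept": group_means["A"]}
for g in ("B", "C"):
    dummy[g] = group_means[g] - group_means["A"]

# Effect (sum-to-zero) coding:
# the intercept is the grand mean, coefficients are deviations from it.
effect = {"intercept": grand_mean}
for g in data:
    effect[g] = group_means[g] - grand_mean

print("dummy coding :", dummy)
print("effect coding:", effect)
```

Both parameterizations reproduce the same fitted group means; only the meaning of the intercept and of the individual coefficients changes.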

Chapter 2 also provides good examples of performing power calculations in R, including code for simulation-based approaches. Unfortunately, comparable power analysis code does not accompany the designs and tests covered in the later chapters, which is really my only quibble about the book as a whole. It is also important to note that the vast majority of the book focuses on continuous dependent variables.
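The idea behind a simulation-based power calculation can be sketched in a few lines. The Python example below is an illustration only and is not the book's R code. To stay within the standard library it assumes a deliberately simplified setting: two groups, normally distributed responses, a known common standard deviation, and a two-sided z-test. The function name and its parameters are hypothetical.

```python
import random
from statistics import NormalDist

def simulated_power(delta, sigma, n_per_group, alpha=0.05, n_sim=2000, seed=1):
    """Estimate power as the share of simulated experiments that reject H0.

    Assumptions (for illustration): two groups, normal errors, known sigma,
    two-sided z-test for a difference in means.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma * (2 / n_per_group) ** 0.5  # standard error of the mean difference
    rejections = 0
    for _ in range(n_sim):
        control = [rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        treated = [rng.gauss(delta, sigma) for _ in range(n_per_group)]
        diff = sum(treated) / n_per_group - sum(control) / n_per_group
        if abs(diff / se) > z_crit:
            rejections += 1
    return rejections / n_sim

# Estimated power for a one-standard-deviation effect with 20 units per group.
print(simulated_power(delta=1.0, sigma=1.0, n_per_group=20))
# With no true effect, the rejection rate should be close to alpha.
print(simulated_power(delta=0.0, sigma=1.0, n_per_group=20))
```

The same recipe (simulate data under the assumed design, run the planned test, count rejections) carries over to more complex designs by swapping in the appropriate data-generating model and test.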

Chapter 3 reviews multiple comparisons and setting up contrasts to make sense of significant effects. The author clearly introduces the concept of controlling the Type I error rate. Readers will find clear code for applying corrections to adjust for multiple testing and create related visualizations. I liked the consistent approach of showing how multiple functions in R can lead to the same result. The chapter places a nice focus on using the right procedure for the right types of comparisons. The FAQ section at the end of the chapter has a nice discussion of why failing to look at multiple comparisons given a nonsignificant F-test could be erroneous.
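As a small illustration of what a multiple-testing correction does, the following Python sketch computes Bonferroni- and Holm-adjusted p-values by hand. It is not the book's code (the book uses R), and the p-values are made up for the example.

```python
def bonferroni(pvals):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def holm(pvals):
    """Holm step-down adjustment, returned in the original order of pvals."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted

# Hypothetical raw p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
print("Bonferroni:", bonferroni(raw))
print("Holm      :", holm(raw))
```

Holm-adjusted p-values are never larger than the Bonferroni ones, which is why the Holm procedure is at least as powerful while still controlling the family-wise Type I error rate.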

Chapter 4 provides discussion of factorial designs with crossed experimental factors. There is a heavy focus on interactions and specifying/plotting/interpreting them, which is great. This chapter also provides detailed coverage of two-way ANOVA. The author pays careful attention to the implications of balanced and unbalanced designs and the modeling approaches that one should follow in each case. Readers will also find a nice discussion of the alternative Type I, Type II, and Type III tests that can be used when testing hypotheses in the unbalanced case. There is also discussion of three-way interactions at the end of the chapter.

Chapter 5 then turns to complete block designs, including randomized complete block designs. I really liked the point that blocking should not be done arbitrarily in analysis and is an important design strategy (and one that should be reflected in the analysis). This chapter also provides good discussion of Latin Square designs, including code for creating and analyzing them in R.
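For readers unfamiliar with Latin Squares, the Python sketch below shows one simple way to construct and verify one. It is an illustration under stated assumptions, not the book's R code: it starts from a cyclic square and randomly permutes rows, columns, and treatment labels, which yields a valid Latin Square but does not sample uniformly from all possible squares. The function names are hypothetical.

```python
import random

def random_latin_square(treatments, seed=None):
    """Build a k x k Latin Square by shuffling a cyclic square."""
    rng = random.Random(seed)
    k = len(treatments)
    # Cyclic starting square: entry (r, c) holds treatment index (r + c) mod k.
    square = [[(r + c) % k for c in range(k)] for r in range(k)]
    rng.shuffle(square)                 # permute rows
    cols = list(range(k))
    rng.shuffle(cols)                   # permute columns
    labels = list(treatments)
    rng.shuffle(labels)                 # randomly assign treatment labels
    return [[labels[row[c]] for c in cols] for row in square]

def is_latin_square(square):
    """Check that every treatment appears exactly once per row and per column."""
    k = len(square)
    symbols = set(square[0])
    if len(symbols) != k:
        return False
    if not all(len(row) == k and set(row) == symbols for row in square):
        return False
    return all({row[c] for row in square} == symbols for c in range(k))

design = random_latin_square(["A", "B", "C", "D"], seed=42)
for row in design:
    print(" ".join(row))
print("valid:", is_latin_square(design))
```

Rows and columns here stand for the two blocking factors, and the cell entry is the treatment assigned to that row-column combination.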

Chapter 6 dives into mixed-effects models. The author provides a great introduction to random effects and how they affect models. The chapter includes a great visual linking intra-class correlation (ICC) values to the idea of between-cluster variance; I will likely steal it for my classes, as I always draw this from scratch. The chapter provides a discussion of REML estimation and includes the use of lme4 functions to fit the models. There is a good comparison of fixed versus random effect approaches to an analysis and a corresponding discussion of what changes with the interpretation. There is also a discussion of models with crossed and nested random effects, along with examples of visualizing and decomposing variance. The author does a nice job of showing how to specify nesting in the lmer() function. The chapter does not really delve into likelihood ratio tests for variance components based on mixtures of Chi-square distributions, which in my view was fine given the smaller datasets that often arise from designed experiments. There is a continued excellent focus on model diagnostics, again primarily for continuous outcomes. I really liked the consistent focus on initial data visualization prior to analysis. The chapter includes a sufficiently complex example of the fitting and interpretation of a mixed-effects model. The chapter also shows how one could fit a mixed-effects model using the aov() function, which is a nice addition that reinforces the underlying variance decomposition when using these models.
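The link between the ICC and between-cluster variance can be illustrated numerically. The Python sketch below is not from the book and does not use REML or lme4; it assumes a balanced one-way layout and uses the classic ANOVA (method-of-moments) estimators of the variance components. The function name and the toy data are hypothetical.

```python
from statistics import mean

def icc_oneway(clusters):
    """ICC = var_between / (var_between + var_within) for balanced clusters.

    Uses ANOVA estimators: var_within = MSW, var_between = (MSB - MSW) / n,
    truncated at zero if negative.
    """
    k = len(clusters)
    n = len(clusters[0])
    if any(len(c) != n for c in clusters):
        raise ValueError("this sketch assumes equal cluster sizes")
    cluster_means = [mean(c) for c in clusters]
    grand = mean(cluster_means)
    msb = n * sum((m - grand) ** 2 for m in cluster_means) / (k - 1)
    msw = sum(
        (y - m) ** 2 for c, m in zip(clusters, cluster_means) for y in c
    ) / (k * (n - 1))
    var_between = max(0.0, (msb - msw) / n)
    return var_between / (var_between + msw)

# Clusters whose means are far apart relative to the within-cluster spread.
far_apart = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
# Clusters with identical means: no between-cluster variance.
identical = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]

print(icc_oneway(far_apart))
print(icc_oneway(identical))
```

When cluster means differ a lot relative to the within-cluster noise, the ICC approaches 1; when clusters are indistinguishable, it drops to 0.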

Chapter 7 focuses on split-plot designs, which, as the author notes, are more common in practice than people think, and ties everything back to what is introduced in earlier chapters. There is a good emphasis on the differences between whole-plot and split-plot factors. As is the case in all of the chapters, there is a clear discussion of how degrees of freedom for F-tests are computed based on the specified model. The author includes two solid, thorough examples of mixed-effects models for split-plot designs.

Finally, Chapter 8 provides coverage of incomplete block designs. As in prior chapters, there is ample consideration of what happens when applying different analytic approaches. The author discusses the advantages of balanced incomplete block designs (BIBDs) and talks about building BIBDs in R, including code for how to assess whether you actually have a BIBD (using functions that I was previously not aware of!).
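What it means to check whether a design "actually is a BIBD" can be sketched directly from the definition. The Python example below is not the book's approach (the book uses R functions that are not reproduced here); the function name is hypothetical. It assumes each treatment appears at most once per block and checks three conditions: all blocks have the same size k smaller than the number of treatments v, every treatment appears in the same number r of blocks, and every pair of treatments appears together in the same number lambda of blocks.

```python
from collections import Counter
from itertools import combinations

def bibd_parameters(blocks):
    """Return {'v', 'b', 'r', 'k', 'lambda'} if blocks form a BIBD, else None."""
    blocks = [frozenset(b) for b in blocks]
    treatments = set().union(*blocks)
    sizes = {len(b) for b in blocks}
    if len(sizes) != 1:
        return None                      # unequal block sizes
    k = sizes.pop()
    if k >= len(treatments):
        return None                      # complete blocks, not incomplete
    replications = Counter(t for b in blocks for t in b)
    pair_counts = Counter(
        frozenset(p) for b in blocks for p in combinations(sorted(b), 2)
    )
    reps = {replications[t] for t in treatments}
    lambdas = {
        pair_counts[frozenset(p)] for p in combinations(sorted(treatments), 2)
    }
    if len(reps) != 1 or len(lambdas) != 1:
        return None                      # not equireplicate or not pair-balanced
    lam = lambdas.pop()
    if lam == 0:
        return None
    return {"v": len(treatments), "b": len(blocks), "r": reps.pop(), "k": k, "lambda": lam}

# All three-element subsets of four treatments form a BIBD.
balanced = [{"A", "B", "C"}, {"A", "B", "D"}, {"A", "C", "D"}, {"B", "C", "D"}]
# Here some pairs (e.g. A and C) never share a block, so the design is not balanced.
unbalanced = [{"A", "B"}, {"C", "D"}]

print(bibd_parameters(balanced))
print(bibd_parameters(unbalanced))
```

For any BIBD the parameters satisfy r(k - 1) = lambda(v - 1), which provides a quick sanity check on the returned values.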

This is a wonderful little handbook that should be on a lot of desks. Who is it best for? First, I see this as a supplemental textbook/handbook for an undergraduate or graduate course on the design and analysis of randomized experiments. Some background in probability and statistics is needed, so this could be used for a year-3 or year-4 class in an undergraduate sequence in statistics or a graduate course on experimental design taught in a non-statistics department to students with some prior coursework in statistics. Second, professional researchers and data analysts who are comfortable with R and frequently find themselves analyzing the results of randomized experiments will also find this book very useful. I enjoyed reading it and will definitely make use of it in my work as an applied statistician and survey methodologist.

Brady T. West
Institute for Social Research, University of Michigan-Ann Arbor
Ann Arbor, MI

Reference

  • Faraway, J. J. (2014), Linear Models with R (Second Edition), Boca Raton, FL: CRC Press.
