Reviews of Books and Teaching Materials

Beyond Multiple Linear Regression: Applied Generalized Linear Models and Multilevel Models in R

by Paul Roback and Julie Legler. Boca Raton, FL: Chapman & Hall/CRC Press, 2020(H)/2021(e-book), xvii + 418 pp., $99.95(H/e-book), ISBN: 9781439885383(H), 9780429066665(e-book).

This book is designed for undergraduate students who have completed a first course in regression and are ready to tackle regression with more realistic and complex data. The book consists of 11 chapters spanning generalized linear models and multilevel models, including models for longitudinal data. The first three chapters review preliminary material, such as multiple linear regression, likelihood-based methods, and distribution theory, each with comprehensive examples. The authors then build incrementally on the statistical methods and software implementations presented, focusing on regression analyses of nonnormal, possibly dependent data. Many of the datasets used in the textbook are also available in a GitHub repository. The authors have taught a course at St. Olaf College with this book and covered all the chapters within a 24-week semester.

The core content of this textbook, beyond the preliminary review chapters (Chapters 1, 2, 3, and 7), is divided into two parts: Chapters 4–6 introduce generalized linear models, including Poisson regression and logistic regression, while Chapters 8–11 introduce multilevel models for correlated data. Each chapter builds up its regression models in increasing complexity, from the unconditional means model to models with multiple covariates. More specifically, Chapters 4–6 discuss regression methods for count and binary outcomes and introduce a class of models that generalizes to these types of outcomes. Chapter 7, which describes correlated data, serves as a helpful bridge to the following chapters on multilevel data analysis. Chapters 8–9 illustrate how to work with multilevel (i.e., nested) data and how random- and fixed-effects models can account for the resulting correlation. Chapter 9 also introduces longitudinal data as a special kind of multilevel data, while Chapter 10 addresses more complex multilevel models with more than two levels. Finally, Chapter 11 concludes with multilevel generalized linear models, the culminating synthesis of the two primary model categories (nonnormal and multilevel) discussed throughout the book.

This textbook provides realistic case studies and exercises for hands-on experience. Each chapter ends with a number of open-ended problem sets, designed to give students a chance to work with real data and build up their experience in data analysis; the accompanying datasets are available online. Students using R will also find this book useful: the authors provide optional sections with notes on R implementation, and the middle of each chapter offers detailed descriptions of how to read R output (e.g., how to interpret variance components and regression coefficients). Ultimately, the book's exercises and explanations should enable students not only to understand the regression models they fit but also to explain their results to others.

While the book covers a wide range of model types, it is an especially good resource for introducing Poisson regression, in Chapter 4. It clearly states the assumptions required for making inferences from Poisson regression models and contrasts them with those of the linear regression model. The graphical representations and point-by-point comparisons that illustrate the differences between the two approaches are particularly informative. The same chapter also introduces useful advanced variations, such as zero-inflated Poisson regression, with a specific real-data example. Outside this well-presented chapter, however, the chapters vary considerably in structure, so instructors might wish to deviate from the book's order of contents. For example, Chapter 5 is relatively brief and limited in content, and it seems to sit somewhat apart from the flow of the other chapters. Since Chapter 5 describes a broader class of regression models, it might be more natural to teach it before Poisson regression, even though the book presents Poisson regression first, in Chapter 4. Similarly, the material within chapters is often arranged in different orders: some chapters (e.g., Chapters 4–6) begin by laying out the regression model, while others (e.g., Chapters 8–11) introduce their statistical problems by digging deeply into the case studies.
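To make the contrast with linear regression concrete, here is a minimal sketch, written in Python rather than the book's R; the function name fit_poisson and the toy data are hypothetical. It assumes the standard Poisson regression setup: each count Y given x is Poisson with log(E[Y | x]) = b0 + b1 * x, so the mean and the variance are equal, unlike the constant-variance normal errors assumed by linear regression. The coefficients are found by Newton-Raphson on the log-likelihood.

```python
import math


def fit_poisson(x, y, iters=25):
    """Fit log(E[Y | x]) = b0 + b1 * x by Newton-Raphson on the Poisson log-likelihood.

    With the canonical log link, Newton-Raphson coincides with Fisher scoring
    (the iteratively reweighted least squares used by GLM software).
    Assumes mean(y) > 0 so the starting intercept is defined.
    """
    # Start from the intercept-only fit: b0 = log(mean(y)), b1 = 0.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]  # fitted means (= variances)
        # Score vector X^T (y - mu).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix X^T diag(mu) X, a 2 x 2 symmetric matrix.
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + (information)^{-1} score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


if __name__ == "__main__":
    # Hypothetical counts for two groups (x = 0 and x = 1).
    xs = [0, 0, 1, 1]
    ys = [1, 3, 4, 6]
    b0, b1 = fit_poisson(xs, ys)
    print(math.exp(b0), math.exp(b0 + b1))  # close to 2.0 and 5.0
```

Because x here is a single binary indicator, the fitted means simply reproduce the two group means. In R, the same model would be fit with glm(y ~ x, family = poisson).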

This book might be easier for students to follow if it focused less on case studies. Opening each chapter with a list of learning objectives is helpful, but front-loading the chapters with case studies may make it harder for learners, who are still unfamiliar with the mathematical groundwork, to follow how each chapter's statistical concepts are applied. Examples might be better placed after the models are explained. For instance, the complex data structure at the beginning of Chapter 9 might have been more straightforward if the longitudinal data (possibly with missingness) had been represented using notation or long-format tables instead. Some other chapters (e.g., Chapters 8, 10, and 11) also begin with extensive exploratory analyses, including univariate summaries, which often stray from the chapter's main concepts. As a result, each case study's contextual background may bury the key concepts (i.e., the data structure and the formulas that represent the core statistical problem). Lastly, the exercises at the end of each section are mostly descriptive: they ask students to interpret the results of the examples given rather than to set up a statistical analysis on their own. Students will therefore find more opportunities in this book for interpretation and conceptual problem solving than for practical application. Of course, understanding how the techniques apply in real-world contexts and how they are implemented in software is a crucial part of statistics; but while the book helps students apply the techniques to the specific case studies and examples presented, it is less clear how students might extrapolate from those examples to other problems. For students and instructors seeking more advanced and technical material with a similar focus, Dunn and Smyth (2018) can be useful.
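As a hypothetical illustration of the long-format representation suggested above, the following Python sketch turns one row per subject into one row per observed measurement. The column names (id, y0, y1, y2) and the function name wide_to_long are assumptions for illustration, not taken from the book. Missing measurements are dropped, so a subject with a missed occasion simply contributes fewer rows rather than being discarded.

```python
def wide_to_long(records, time_cols, id_key="id"):
    """Reshape one-row-per-subject records into one row per observed measurement.

    Missing measurements (None) are dropped, so subjects with missing
    occasions simply contribute fewer rows instead of being discarded.
    """
    long_rows = []
    for rec in records:
        for t, col in enumerate(time_cols):
            value = rec.get(col)
            if value is not None:
                long_rows.append({id_key: rec[id_key], "time": t, "y": value})
    return long_rows


if __name__ == "__main__":
    # Hypothetical wide table: three planned occasions y0, y1, y2 per subject.
    wide = [
        {"id": 1, "y0": 5.0, "y1": 6.0, "y2": None},
        {"id": 2, "y0": 4.0, "y1": None, "y2": 7.5},
    ]
    for row in wide_to_long(wide, ["y0", "y1", "y2"]):
        print(row)
    # Four rows: subject 1 at times 0 and 1, subject 2 at times 0 and 2.
```

The long layout makes the two levels of the data explicit (measurements nested within subjects), which is the structure a multilevel model for longitudinal data works with directly.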

Even though this book is designed for undergraduate students, it still touches on advanced statistical methods, such as missing data methods and the parametric bootstrap. In fact, each chapter contains optional sections that introduce readers to related advanced concepts, such as technical details regarding the covariance structure of maximum likelihood estimators. Some chapters also provide detailed illustrations of model building in the context of each case study. However, the inferential objective of many of the case studies is unclear, as is how, in practice, one's primary research interest should determine how uncertainty is measured. In addition, it would be helpful to see estimation and prediction problems clearly separated, along with their respective implications for statistical modeling, since the model-building approach can depend on whether we are interested in prediction or in inference about the parameters.
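The parametric bootstrap mentioned above can be illustrated in its simplest form. The Python sketch below fits a Poisson mean, simulates new datasets from the fitted model, refits each one, and uses the spread of the refitted estimates as a standard error. The function names are hypothetical, and this is not the specific setting in which the book applies the method. For this model the result can be checked against the analytic standard error sqrt(lambda_hat / n).

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def parametric_bootstrap_se(y, n_boot=2000, seed=1):
    """Standard error of the Poisson MLE (the sample mean) via a parametric bootstrap."""
    rng = random.Random(seed)
    n = len(y)
    lam_hat = sum(y) / n  # MLE of the Poisson mean
    estimates = []
    for _ in range(n_boot):
        # Simulate a new dataset from the *fitted* model, then refit it.
        sim = [poisson_draw(lam_hat, rng) for _ in range(n)]
        estimates.append(sum(sim) / n)
    mean_est = sum(estimates) / n_boot
    var = sum((e - mean_est) ** 2 for e in estimates) / (n_boot - 1)
    return lam_hat, math.sqrt(var)


if __name__ == "__main__":
    counts = [2, 3, 1, 4, 2, 5, 3, 2, 1, 3]  # hypothetical counts
    lam_hat, se = parametric_bootstrap_se(counts)
    # lam_hat is 2.6; the bootstrap SE should land near sqrt(2.6 / 10), about 0.51.
    print(lam_hat, se, math.sqrt(lam_hat / len(counts)))
```

The same recipe (fit, simulate from the fit, refit, summarize) carries over to more complex models where no closed-form standard error is available, which is where the method earns its keep.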

Overall, this book provides an overview of generalized linear models, including those for correlated outcomes. It will be especially useful for instructors seeking real-world data to apply to regression problems. The exercises in each chapter can serve as classroom discussion topics or as assigned problem sets. The range of topics and case studies covered is extensive, and some concepts go beyond the undergraduate level, so careful attention to scope is needed if the book is taught in a single semester. However, if an instructor selectively discusses the key statistical concepts and supplements them with mathematical representations, this book can give undergraduate students in various fields, including the social and biomedical sciences, education, economics, public health, and engineering, a comprehensive understanding of regression using down-to-earth data.

Youjin Lee
Brown University

Reference

  • Dunn, P. K., and Smyth, G. K. (2018), Generalized Linear Models with Examples in R, New York: Springer.
