Reviews of Books and Teaching Materials: The American Statistician: Vol 71, No 1

IBM SPSS for Introductory Statistics: Use and Interpretation (5th ed.). George A. Morgan, Nancy L. Leech, Gene W. Gloeckner, and Karen C. Barrett. Taylor & Francis, 2013, xiv + 238 pp., $49.95(P), ISBN: 978-1-84-872982-7.

IBM SPSS for Introductory Statistics is intended as a supplemental text for introductory statistics or research methods courses in the behavioral or social sciences—it would not work well as a stand-alone text for someone trying to learn introductory statistical methods. According to the authors, the goals of the book are to help readers develop skills in three main areas: choosing appropriate statistical methods based on their research design, interpreting the IBM SPSS output from those methods, and writing about statistical results. The text uses IBM SPSS 20, though the authors note that they have successfully used the text with older versions of the software. For the purposes of this review, I used IBM SPSS 24 and encountered no real differences between its menus and output and those described in the text.

The most important change to this edition is the inclusion of a new chapter on methods for providing evidence for the reliability and validity of data. Other changes to this edition include (1) an expansion of Chapter 5 (Data File Management and Writing About Descriptive Statistics) to include using bar charts to help describe data and guidance on how to write about bar charts, (2) an expanded Appendix about getting started with IBM SPSS that discusses some procedures not otherwise covered in the text, and (3) updated windows and text to match the output of IBM SPSS 20.

The website for the text, http://www.routledge.com/9781848729827, includes links to student/instructor resources and the text's datasets. The modified High School and Beyond (HSB) dataset is used extensively to generate sample research questions throughout the book, with other datasets used for extra problems at the end of most chapters.

This text provides a nice guide to using IBM SPSS to implement statistical methods that would be covered in traditional courses in introductory statistics or research methods, such as descriptive statistics, chi-square tests, correlation and regression, comparing means, and nonparametric tests. While the text does not—and is not intended to—provide great detail on the statistical methods themselves, it does a nice job summarizing the assumptions for each method and providing guidance on how to write about the results.

Methods and Applications of Longitudinal Data Analysis. Xian Liu. Academic Press, 2015, xviii + 511 pp., $150.00(H), ISBN: 978-0-12-801342-7.

In Methods and Applications of Longitudinal Data Analysis (MALDA), Liu provides a comprehensive look at methods for analyzing longitudinal data, with illustrations of the methods on two datasets across different chapters: a randomized controlled clinical trial dataset on the effectiveness of acupuncture treatment on post-traumatic stress disorder (PTSD) and a dataset from the Asset and Health Dynamics Among the Oldest Old (AHEAD) study.

In the preface to MALDA, the author describes the focus of the book as “application and practice”; intended for “professionals, academics and graduate students,” this book is clearly written with that focus in mind. Liu's approach to presenting methods in longitudinal data analysis is engaging, with numerical illustrations following the methods and techniques described in each chapter. The SAS programs, outputs, and interpretation not only help readers understand the methods described in the chapter, but also make it easy for readers to apply the methods to their own datasets.

The book consists of 14 chapters. Chapter 1 covers the basics of longitudinal data analysis and introduces the two datasets used in the book. Chapter 2 reviews some traditional methods for presenting longitudinal data, and paired t-tests for analyzing simple correlated data, which naturally leads to repeated-measures analysis of variance (ANOVA) and repeated-measures multivariate analysis of variance (MANOVA). Chapters 3–6 are devoted to linear mixed models, covering the specification, inference, restricted maximum likelihood estimation (REML), variance–covariance structures, and residual diagnostics of linear mixed models. Chapter 7 covers some special topics for cases in which linear mixed models are not directly applicable, such as non-normal random effect distributions and pattern mixture models. Chapters 8–12 review models for non-normal longitudinal data, including generalized linear mixed models and generalized estimating equations; separate chapters describe mixed-effect models for binary outcomes (Chapter 10), nominal outcomes (Chapter 11), and transition models for categorical data (Chapter 12). Chapter 13 introduces the latent growth model and the latent growth mixture model in the structural equation modeling framework. The book concludes with Chapter 14, where methods for handling incomplete longitudinal data are discussed.

To fully understand the methods and techniques, some background in calculus, matrix algebra, and generalized linear models is needed. I agree with the author that researchers without these prerequisites can skip the detailed mathematical formulations and still benefit from the detailed account of the rationales for the methods, the empirical illustrations, and the SAS programming. The book can also be read selectively if readers want to learn a specific method described in the book, for example, generalized estimating equations, multinomial transition models, or latent growth models.

I do have suggestions for improving the book. First, the resolution of the SAS code, outputs, and plots is too low; e.g., the low resolution of Figure 2.2 on p. 33 makes it difficult to distinguish the trajectories between groups. Second, there are many typos that might mislead readers (e.g., on p. 30, “WHERE = 1” should be “WHERE treat = 1”). Third, including a chapter on study planning (e.g., sample size calculation) could be helpful for the target readers who may need to design their own experiments. Finally, some of the author's discussion on how to deal with missing data is not accurate, and I would refer readers to the book by Daniels and Hogan (2008) or the classical book by Little and Rubin (2014). I understand it is impossible to cover every aspect of longitudinal data analysis in a single book, and the author has referred readers to other books for some topics that are not delineated further, for instance, models for longitudinal count data and time-to-event models.

Overall, this book offers a comprehensive overview of models and methods for longitudinal data analysis. The book may be used as a reference for researchers who are not familiar with longitudinal data, or as a textbook for a graduate-level course in longitudinal data analysis for nonstatistics majors. The numerical illustrations using SAS programs would be useful references for practitioners in biomedicine, demography, psychology, sociology, and epidemiology.

Parallel Computing for Data Science: With Examples in R, C++ and CUDA. Norman Matloff. Boca Raton, FL: Chapman & Hall/CRC Press, 2015, xxiii + 324 pp., $62.95(H), ISBN: 978-1-46-658701-4.

Parallel computing in data science and, in particular, statistics is growing ever more important as datasets and the models used to analyze them grow in size and complexity. The use of ever more advanced computing methods by data analysts is an area experiencing a rapid growth that is likely to continue for some time. It is within this context that Dr. Matloff's Parallel Computing for Data Science arrives as a reference “to provide a general treatment of parallel processing in data science.” Matloff's timing could not be better with many parallel interfaces implemented as R packages having now been around for some time. Because the field is evolving so quickly, however, a text like this would ideally be general enough to convey important concepts that are applicable more broadly than the particular technology used to demonstrate them. Matloff's first edition is a helpful contribution, especially considering the difficult balancing act the author faced!

At the start of Chapter 1, the book wastes no time before diving into an example to demonstrate that even “embarrassingly parallel” problems may not exhibit good scaling. The example is revisited later in Chapter 3, where a solution to the scaling problem is presented by introducing the concept of “data chunking” and the tradeoff between the granularity of how a problem is divided up among parallel computing resources versus the communication overhead incurred. Inserted between these two chapters is Chapter 2, a basic discussion of memory hierarchy, latency, bandwidth, and networking in modern parallel computing architectures as well as “Big O” notation.
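The granularity tradeoff described above can be sketched in a few lines. The following Python fragment is an illustration only and is not taken from Matloff's book, whose examples are in R; the function names (`split_into_chunks`, `chunked_sum_of_squares`) and the toy workload are hypothetical. It splits the same workload into many small tasks or a few large ones and reports the task count, which serves here as a rough stand-in for the communication overhead a real cluster would incur. Threads are used only to keep the sketch self-contained; in CPython they would not speed up this CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into n_chunks contiguous pieces of near-equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Toy per-chunk workload: one task submitted per chunk."""
    return sum(x * x for x in chunk)


def chunked_sum_of_squares(items, n_chunks, n_workers=4):
    """Return the total and the number of tasks that were dispatched."""
    chunks = split_into_chunks(items, n_chunks)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials), len(chunks)


data = list(range(1000))
# Fine granularity: one task per item, so 1000 dispatches.
total_fine, tasks_fine = chunked_sum_of_squares(data, n_chunks=1000)
# Coarse granularity: one task per worker, so 4 dispatches.
total_coarse, tasks_coarse = chunked_sum_of_squares(data, n_chunks=4)
print(total_fine == total_coarse, tasks_fine, tasks_coarse)
```

Both runs compute the same total; what changes is how many times work must be handed to a worker and a result collected. Coarser chunks reduce that per-task cost but leave less room to balance load when chunks take unequal time.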

Much of the text is centered around using the R statistical computing language, although both C and CUDA are covered in later chapters. Chapter 4 provides a closer look at shared-memory computing in R, while Chapter 5 covers the topic from the perspective of the C language, accessed from R via the .C interface or Rcpp. The use of graphics processing units (GPUs) via CUDA is introduced in Chapter 6 with a particular focus on writing CUDA kernels. Using such computing resources from R is mostly left as an exercise for the reader, although basic operations implemented in the R library Rth are briefly discussed in Chapter 7. The Message Passing Interface (MPI) approach to parallel computing is introduced in Chapter 8 and demonstrated by an example implementing a parallel algorithm to search for prime numbers. Chapter 9 then discusses MapReduce, while Chapters 10 and 11 discuss parallel sorting and prefix scanning, topics perhaps of less interest to statisticians than to other data scientists. Chapter 12 discusses parallel matrix methods and libraries, and the book finishes with Chapter 13, which discusses what the author refers to as statistical approaches, or approximations, such as “chunk averaging.”

I am a statistician with a computer science background and aim to review Matloff's book from the perspective of statisticians with a wide variety of computing abilities and knowledge. Overall, the book covers a wide variety of topics and in so doing sacrifices depth and detail for breadth of coverage. There are of course tradeoffs between depth and breadth, meaning that the book may be more useful for some readers than for others. For those fairly experienced in programming and possibly to some degree in parallel programming, the text may be a useful reference for concise summaries of particular topics. For instance, I am fairly experienced in parallel programming but have had only a brief exposure to CUDA. As such, I found Chapter 7 introducing Rth quite useful, as I was unaware of this package and its ability to use the Thrust interface to CUDA. Yet I feel I would need further documentation to be confident in using Rth in my own project. At the same time, the advanced user would likely desire a deeper exploration of the topics and tools introduced in the book. For instance, the “chunking” example introduced in Chapter 1 would be fairly simplistic for an experienced reader. On the other hand, a novice statistician with minimal programming skills (e.g., a new graduate student with only a basic ability to implement analyses in R) would likely be overwhelmed by the text. For example, in its 284 pages (excluding appendices), the book includes only one descriptive diagram.

The breadth of topics presented, in particular the R packages discussed for parallel computing in data science, makes the book a truly useful, all-inclusive reference. At the same time, covering such a wide variety of techniques in one place necessarily means some detail and/or conceptual discussion is sacrificed to keep the text to a reasonable length. The book contains no exercises and does not appear to be designed as a learning tool for use in a course. Nonetheless, in this age of rapidly expanding data and quickly changing computing technologies, statisticians need good reference materials to aid them in performing modern analyses, and Dr. Matloff's Parallel Computing for Data Science is a timely and welcome development for our field.

Sharpening Your Advanced SAS® Skills is written as a reference guide for experienced SAS® users. Although the book is based upon the author's extensive experience in the pharmaceutical industry, the content is applicable to SAS® users in any industry who are interested in refining their programming techniques. The organization of the book enables it to serve as both a quick syntax reference guide and a primer on potentially new programming techniques. Some key layout features include: examples that drill down to line-by-line detailed syntax explanations and additional options that may be available; comparisons of similar statements that allow the reader to discover new and potentially better ways to perform a task; pictographs and tables to help convey points; and review questions to reinforce understanding of the subject matter. As the title suggests, the book is not intended for a SAS® novice, but even advanced SAS® programmers can expect to learn something novel.

In Chapter 1, the author details the subject of accessing and manipulating data using Proc SQL. It begins with an overview and a case for using Proc SQL by highlighting the benefits. The section on table access and retrieval includes a wonderful, detailed description of table joins, which is sure to be an often-referenced guide; the same is true of the succinct section on creating macro variables with Proc SQL. Concepts such as creating columns, grouping, sorting, modifying table structure and content, and connecting to relational databases are well documented with an abundance of well-described examples.

Chapter 2 explores the topic of SAS® macro processing. While the chapter starts with a basic description of what the SAS® macro language is and some more basic information such as the role of the ampersand (&) symbol and the percent sign (%) symbol, it moves quickly on to more advanced techniques. A reader who is new to the macro language could get up and running by a careful study of the first few sections of this chapter. But readers with more experience would be wise to skim over these introductory sections as well since the author has embedded details on some nice options throughout, some of which are likely to be new. A particularly useful section in this chapter is a detailed description of the art of testing SAS® macros.

Chapter 3 digs into more advanced programming techniques. This chapter will be a great place to explore for readers interested in improving the efficiency of SAS® code in terms of space, processing time, and programming time. Topics include a section on table lookup techniques, which describes and compares techniques such as arrays, indices, hash tables, data set merges, IF-THEN/ELSE, and SQL joins. There is an extended exploration of arrays and hash tables that describes their premise and gives detailed examples of both syntax and occasions for appropriate use. The chapter is rounded out with multiple tips on improving performance, making it a great resource as a reference or primer in these areas.

Chapter 4 highlights features that are new in SAS® version 9.3. At the time of this review, SAS® version 9.4 is available. However, this chapter does include several useful examples and some options that the reader may not yet know, such as an option in Proc SORT that makes sorts case insensitive.

The organization and detailed examples in this book are sure to make it a go-to resource in day-to-day programming.
