Reviews of Books and Teaching Materials


Pages 307-315 | Published online: 19 Nov 2014
 

Analyzing Wimbledon: The Power of Statistics.

Franc Klaassen and Jan R. Magnus. New York: Oxford University Press, 2014, xvi + 252 pp., $29.95(P), ISBN: 978-0-19-935596-9.

As an avid tennis fan for more than 20 years, I have always been intrigued by the role statistics can play in analyzing the outcome of a particular game, set, or even match. For example, when I watch Serena Williams, one of the best returners in the women’s game, receive serve and take the first two points of a game, I have often wondered how probable it is that she wins that game. Similarly, when Roger Federer, arguably the greatest men’s tennis player of all time, commits to serving and volleying behind a first serve, how likely is he to win that point? Does Milos Raonic, one of the young rising stars in men’s tennis who possesses a cannon for a serve, win the majority of his matches if his first-serve percentage is above 60%?

The game of tennis is rich with descriptive statistics to help answer such questions, but it also abounds with commonly accepted views concerning a variety of aspects of the game—dare I say, the “finer points” of the game. I must admit that it is difficult to watch a tennis match on television without one of the commentators stating that serving first in a set is an advantage, that the top-ranked players in the world play the so-called “big points” better, or that it is more difficult for a player to hold serve when he/she is trying to consolidate a break. Such statements are uttered time and time again during telecasts, and I am sure that others besides me have wondered about their validity. More to the point, is there statistical evidence to back up these widely held beliefs?

Fortunately for die-hard tennis aficionados such as myself, this comprehensive new book by Professors Franc Klaassen and Jan Magnus investigates such well-known theories and hypotheses surrounding the game of tennis (22 of them in fact, about many of which tennis pundits hold very strong views) through statistical analysis of point-by-point data from approximately 500 Wimbledon matches played between 1992 and 1995. At its core, this book is a culmination of various academic articles the two Dutch authors have written since 1999 on the game of tennis. The statistical methods used throughout the book range from straightforward to advanced, requiring little more than a background in probability and statistics at the level of a third- or fourth-year undergraduate student in the North American college/university system.

Following Chapter 1, which provides a stimulating introduction and underscores the fact that scoring in tennis is almost objective (much more so than in other sports such as football or basketball, for instance), Chapters 2 through 4 describe the use of the authors’ computer program Richard (aptly named after former Dutch Wimbledon champion Richard Krajicek) in forecasting the outcome of tennis matches as well as assessing the importance of certain points in a match. Central to their initial approach is the assumption that winning a point on serve follows an independent and identically distributed (iid) process. Few readers will believe this assumption actually holds true, and the authors do show later in the book that it is indeed false. What is remarkable, though, is not so much this conclusion (which is expected), but rather the fact that the deviation from iid is not as large as one would think. As a consequence, the statistical work the authors carry out here under the iid assumption serves as a credible approximation.
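To illustrate the kind of calculation the iid model makes possible (my own sketch, not code from the book): if the server wins each point independently with probability p, the probability of holding serve from any score follows a short recursion, with the familiar closed form taking over at deuce.

```python
def p_hold(p, server_pts=0, returner_pts=0):
    """Probability that the server wins the game from a given score,
    assuming each point is won independently with probability p (iid)."""
    if server_pts >= 3 and returner_pts >= 3:
        if server_pts == returner_pts:
            # Deuce: server must win two points in a row before losing two.
            return p * p / (p * p + (1 - p) * (1 - p))
        if server_pts > returner_pts:
            # Advantage server: win the point, or fall back to deuce.
            return p + (1 - p) * p_hold(p, 3, 3)
        # Advantage returner: server must win the point to reach deuce.
        return p * p_hold(p, 3, 3)
    if server_pts == 4:
        return 1.0
    if returner_pts == 4:
        return 0.0
    return (p * p_hold(p, server_pts + 1, returner_pts)
            + (1 - p) * p_hold(p, server_pts, returner_pts + 1))
```

For instance, `1 - p_hold(0.6, 0, 2)` gives the probability that a receiver who has taken the first two points goes on to break a server who wins 60% of points on serve—exactly the kind of question the opening paragraph of this review poses.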

Realizing that a simple analysis based on summary statistics like averages is not sufficient to draw accurate conclusions about a number of often-espoused hypotheses, the authors make a strong case for more sophisticated statistical techniques that call upon the theory of estimation, testing, and inference. As a result, Chapters 5 through 9 are much more methodological in nature, as several statistical models are introduced to study more properly a player’s quality as well as the strategy and efficiency of the service component (having two serves raises the interesting question, so often heard in tennis circles, of whether a player is really only as good as his/her second serve). The authors employ a generalized method of moments estimation scheme, taking into account critical factors such as differences in the quality of the players (cleverly handled through a “pyramid-based” design connected to a player’s world ranking), form of the day, and momentum in a match. These chapters, like much of the book, are full of interesting tables and figures to complement the statistical analysis and ensuing discussion.

Armed now with a more reliable statistical model, the authors in Chapters 10 through 12 revisit their earlier iid assumption and show that it is indeed the top-ranked players in the world who tend to play service points in an iid fashion, whereas weaker players deviate from iid in two fundamental ways: (i) their performance at the current point depends on what happened at the previous point, and (ii) they are also affected by the importance of the point (such as a break point). As the authors do throughout the book, they distinguish between men’s tennis and women’s tennis and, in some cases, discover that certain conclusions only apply to a particular gender. For example, there is statistical evidence to support a larger “discouragement effect” of missed break point opportunities in the women’s game than in the men’s game, meaning that the impact of a missed break point chance in the previous game tends to significantly affect the current game where the discouraged female receiver is now serving.

The authors sum things up nicely in Chapter 13, which provides an excellent account of earlier statistical findings by revisiting each of the 22 hypotheses one final time. The book also features three useful appendices. Appendix A gives a succinct summary of tennis rules and terminology that serves (no pun intended) to educate readers who are less familiar with the game of tennis. Appendix B contains a nice summary of the mathematical notation used for the numerous parameters and variables introduced in the book. For readers wanting a little more, Appendix C provides further details pertaining to the Wimbledon dataset used as well as the construction of the authors’ computer program Richard (which can be downloaded from the authors’ websites). The list of references on the game of tennis is also extensive, which makes the book a very useful reference for researchers interested in this area.

All in all, this book is an important contribution for statistical researchers who want to pursue their work in sports such as tennis. Even the mathematical statistician less familiar with the game of tennis will be fascinated by the wealth of ideas and techniques described in this book. Andre Agassi, American retired professional tennis player and former world number one, once described the game of tennis as boxing without the gloves. Analyzing Wimbledon: The Power of Statistics certainly pulls no punches and is a thoroughly enjoyable read that delivers on all counts.

Steve DREKIC

University of Waterloo

The A-Z Of Error-Free Research.

Phillip I. Good. Boca Raton, FL: CRC Press, 2013, xx + 249 pp., $52.95(P), ISBN: 978-1-4398-9737-9.

Presenting statistical concepts to a novice audience in a way that balances usefulness with comprehension while conveying the importance of the proper use of statistics is an ambitious and inherently difficult task. The A-Z of Error-Free Research is the latest effort by author Phillip Good to provide some degree of statistical independence to the rookie researcher. In the same vein as his previous how-to guide, Common Errors in Statistics (and How to Avoid Them) (with co-author James Hardin), this text aims to provide a step-by-step guide to the design and analysis of clinical research studies, touching on topics as fundamental as variation and as complex as bootstrapping.

Though the title would suggest a thorough development of the topic, this text serves more as a primer on the fundamentals of research by presenting the pragmatic “10,000-foot view” of experimental design and analysis that an inexperienced researcher with some minimal training in statistics may find useful. Good has filled his primer with useful examples that include code in the R programming language, and has laced it with just enough humor to keep the reader engaged. However, at times the usefulness of the text is offset by an inappropriate level of mathematical detail that could potentially leave the reader confused.

The A-Z of Error-Free Research is divided into four functional parts that span the planning, data collection, analysis, and reporting of clinical research experiments, and provides additional material on areas such as model building and observational studies. In the introductory chapters, Good emphasizes fundamental design concepts such as variation, representative samples, hypotheses, and the costs associated with decisions. In Chapter 4, he gives brief attention to designs ranging in complexity from a simple, completely randomized two-group design to an incomplete block design, along with the sage advice to keep it simple. Chapters 5 and 6 address outcome measures—what is a “good” response variable?—and data quality control. Data analysis is the focus of Chapters 7–11, which cover descriptive statistics, common inferential tests, multiplicity, and sample size estimation. The author also provides exposure to topics not addressed in introductory-level statistical methods courses, such as the bootstrap, errors-in-variables regression, and principal component analysis. The latter chapters on reporting results, oral presentations, and graphics may be considered the most useful to the early-stage researcher, along with the “prescriptions,” or recipes for the task at hand, placed at the beginning of every chapter. Good closes each chapter with specific references to more in-depth development of topics for the motivated reader.

To maximize the impact of the text, a reader should have a background in research methods equivalent to an introductory graduate-level course in statistics. For those individuals, this text will help put theory into practice. Less-experienced researchers may find themselves confused and wanting more detail. Though the author provides an oversimplified view of topics such as survey development and elicitation of effect sizes and variances, at the very least he brings these and other important issues to researchers’ attention and prepares them for eventual discussions with the statistician they should consult. An added note of caution for the statistical novice: topics are not self-contained within the text. As a consequence, it should not be used as a cookbook for research, but should be applied in practice only after it is read from cover to cover. For instance, an example analysis using Student’s t-test is given early in the chapter on hypothesis testing, but the reader is not warned about assumption checking and issues with heteroscedasticity until the later chapter on miscellaneous hypothesis tests.
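The heteroscedasticity warning is worth taking seriously. As a sketch of the issue (my own example, not one from the book; the data below are invented for illustration), compare the pooled two-sample t statistic, which assumes equal variances, with Welch’s unequal-variance version:

```python
from statistics import mean, variance

def pooled_t(x, y):
    """Classic two-sample t statistic assuming equal variances."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5
    return t, n1 + n2 - 2                      # statistic, degrees of freedom

def welch_t(x, y):
    """Welch's t: no equal-variance assumption; Satterthwaite df."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x) / n1, variance(y) / n2
    t = (mean(x) - mean(y)) / (v1 + v2) ** 0.5
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Small, unbalanced samples with very different spreads: the two
# procedures can disagree noticeably.
a = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.9, 5.1]   # low variance
b = [3.0, 7.5, 6.8, 2.2]                        # high variance
```

With these data the Welch degrees of freedom drop from 10 to about 3, which can change the conclusion of a test; checking spread before pooling variances is exactly the kind of step the text postpones.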

In summary, this primer on research design helps shed light on some of the fundamental issues in experimental design, analysis, and dissemination of results. Some of the most effective chapters are those regarding logistics and planning, how one should think about data and variation, how best to present results, and even how to respond to rejection. The author assumes some basic knowledge of mathematical statistics and the R programming language, and for the reader with a solid research methods background this primer will be a useful addition to their toolbox.

Jo A. WICK

University of Kansas Medical Center

Bayesian Methods in Epidemiology.

Lyle Broemeling. Boca Raton, FL: Chapman and Hall/CRC, 2014, x + 454 pp., $89.95(H), ISBN: 978-1-466-56497-8.

In this book, Lyle Broemeling introduces Bayesian inference as it might be used in epidemiology. The aim is straightforward: by reading the book and working through the author’s examples, and the examples at the end of each chapter, analysts working in epidemiology should gain an understanding of how Bayesian methods might be applied to their own work. The book has seven main chapters, each covering topics that should be familiar to epidemiologists with a statistical background at the level of, say, Clayton and Hills (1993).

The topics are describing associations, adjusting for covariates (directly and indirectly), life tables, survival analysis, and disease screening. The final chapter dips more briefly into advanced topics, including spatial modeling. There is also an extensive introductory chapter describing the seven main chapters in some detail, and short appendices on the Bayesian calculus (i.e., updating priors with data to get posteriors) and on how to run Bayesian analyses in WinBUGS. The WinBUGS software is used throughout the book, with code and models available on the author’s website. Given that Bayesian approaches are often an appealing and rational choice for epidemiologic work, educating practicing epidemiologists about Bayesian methods in this way seems a sensible goal. However, the book appears out of date in several ways, and so does not really achieve its goal.

Given its intended audience, my biggest concern is the book’s portrayal of what epidemiologists do. For example, following the widespread adoption of the graphical formulation of causal inference (Greenland, Pearl, and Robins 1999), most epidemiologists today define a “confounder” as something more specific than the book’s simplistic definition: a variable related to disease and risk factor. The same generation of epidemiologists would reject the book’s repeated suggestion that they are interested only in association, rather than in the thoughtful consideration of what those associations mean. And as today’s epidemiologists—like their statistical colleagues—are eager to embrace “Big Data” (Salathé et al. 2012; Khoury et al. 2013), it seems particularly jarring that every example dataset in the book can be and is presented within a few pages of typewriter-font text, as if in some pre-Internet computing magazine.

Statistically, the book is also worryingly behind the times; for example, it has been known for several years that Gamma(0.001, 0.001) priors for precision parameters are a highly questionable default (Lambert et al. 2005; Browne et al. 2006; Gelman et al. 2006), yet they are ubiquitous here. The book’s advice on MCMC seems at best quaint; there is no use of multiple chains, nor even an examination of the behavior of a single chain. Finally, the examples themselves appear in serious need of updating; the book’s presentations consist of already well-worn examples from Clayton and Hills (1993), Kahn and Sempos (1989), or the WinBUGS documentation (Lunn et al. 2000), without any obvious utility over the versions in the source material. In addition to these serious problems, the book is poorly edited, does not provide a serious justification for using Bayesian methods instead of the more standard non-Bayesian ones, and includes some heavily mathematical exercises in an otherwise math-lite text.
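To see why a Gamma(0.001, 0.001) prior on a precision is such a questionable “noninformative” default, a quick simulation helps (my own illustration, not taken from the book or the cited papers):

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw precisions tau ~ Gamma(shape=0.001, rate=0.001) -- the default
# criticized above -- and look at the implied standard deviations.
shape, rate = 0.001, 0.001
tau = rng.gamma(shape, 1.0 / rate, size=100_000)  # numpy uses scale = 1/rate
sd = 1.0 / np.sqrt(tau[tau > 0])                  # drop underflowed zeros

# Far from "flat": the prior piles its mass at extreme values.
median_tau = np.median(tau)      # vanishingly close to zero
frac_huge_sd = np.mean(sd > 1e6) # most implied sds are enormous
```

Essentially all of the prior mass on the standard-deviation scale is astronomically large or small, which is one reason the cited authors recommend placing priors directly on the standard deviation instead.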

It is difficult to recommend this book to those who want an introduction to Bayesian methods in epidemiology. For readers seeking such an introduction, the texts by Woodworth (2004) and Lesaffre and Lawson (2012) are better places to start. For a more complete introduction to the WinBUGS software, see the BUGS book (Lunn et al. 2012), and for further examples of WinBUGS use, see the books by Congdon (2003, 2005, 2006, 2010) and/or their extensive archives of accompanying code. If an introduction to Bayesian methods is needed that avoids calculus and MCMC, highly readable explanations of Bayesian and approximately Bayesian approaches to many of the simpler situations in this book can be found in Rothman, Greenland, and Lash (2008, chap. 18) and the references therein.

Kenneth RICE

University of Washington

Doing Statistical Mediation & Moderation.

Paul E. Jose. New York: The Guilford Press, 2013, xv + 336 pp., $50.00(P), ISBN: 978-1-4625-0815-0.

Doing Statistical Mediation & Moderation is a successful introductory textbook on mediation and moderation analysis using linear models. It is particularly appropriate for students and applied researchers in psychology, epidemiology, and other social sciences. The real psychological examples in this book might also be of great interest to statisticians interested in important applications in substantive subjects.

This book is well organized and clearly written. Chapter 1 first relates the author’s personal experience with mediation and moderation analysis, and then clarifies common confusion about mediation and moderation from an intuitive perspective. Chapter 2 is a wonderful historical review of mediation and moderation analysis using linear models, with a particularly detailed and insightful discussion of Baron and Kenny’s landmark article. Chapter 3 gives a friendly introduction to the basics of mediation analysis: model fitting, parameter estimation, hypothesis testing, and result interpretation. Chapter 3 assumes minimal statistical background on the part of the reader, and illustrates all procedures with real examples from psychology. Chapter 4 targets higher-level readers and discusses more advanced topics, including multiple mediators, bootstrapped standard errors, longitudinal and multilevel mediators, categorical mediators and/or outcomes, and nonlinear mediation. Chapter 4, building heavily on Chapter 3, contains concrete examples with online computer code for SAS and SPSS to help readers fully understand the material. Chapter 5 introduces the basics of moderation analysis: data preparation and dummy coding, software implementation, graphical display, and statistical interpretation. Chapter 5 is statistically simpler than the previous two chapters, but the author supplements it with an interesting conceptual discussion of buffers, exacerbators, enhancers, and dampers. Chapter 6 is a survey of several special topics in moderation, with suggested further reading so interested readers can dive deeper. Finally, Chapter 7 discusses “hybrids” such as moderated mediation, mediated moderation, and even moderated mediated moderation, and shows their applications in psychology.
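The product-of-coefficients estimate and the bootstrapped standard errors that Chapter 4 covers can be sketched in a few lines (my own simulation, not an example from the book; the variable names and effect sizes are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated data with a known mediation structure:
# X -> M (slope a = 0.5), M -> Y controlling for X (slope b = 0.8),
# so the true indirect effect is a*b = 0.4.
n = 500
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(scale=0.5, size=n)
y = 0.8 * m + 0.2 * x + rng.normal(scale=0.5, size=n)

def indirect_effect(x, m, y):
    """Product-of-coefficients estimate a*b from two OLS fits."""
    Xa = np.column_stack([np.ones_like(x), x])
    a = np.linalg.lstsq(Xa, m, rcond=None)[0][1]    # slope of M ~ X
    Xb = np.column_stack([np.ones_like(x), m, x])
    b = np.linalg.lstsq(Xb, y, rcond=None)[0][1]    # slope of M in Y ~ M + X
    return a * b

# Percentile bootstrap interval for the indirect effect.
boot = []
for _ in range(2000):
    idx = rng.integers(0, n, n)
    boot.append(indirect_effect(x[idx], m[idx], y[idx]))
ci = np.percentile(boot, [2.5, 97.5])
```

Here the percentile interval brackets the true indirect effect of 0.4; the same resampling logic underlies the bootstrapped standard errors the chapter discusses.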

A really nice aspect of the book is that the author provides all the datasets analyzed, allowing readers to use them directly to learn and practice. In the appendices, the book provides suggested answers to the exercises, statistical software implementations, and Internet resources for mediation and moderation analysis. Given these comprehensive extra materials, the book can be used not only as a classroom textbook but also for self-study.

The textbook targets those with little background in mediation and moderation analysis, and its basic chapters (Chapters 1, 2, 3, and 5) are clear and helpful for understanding the essence of mediation and moderation analysis. However, there is a big gap between the basic chapters and the advanced chapters (Chapters 4, 6, and 7): the book moves too quickly through the more statistically sophisticated material, which may discourage some readers. Recognizing this potential problem, the author suggests additional reading for these advanced chapters, which may alleviate this drawback.

Historically, the literature on mediation and moderation analysis grew out of applied research in psychology and other related social sciences. According to the author’s personal experience, when asked about problems of mediation and moderation, a traditionally trained statistician might ask applied researchers to “define, explain, and describe moderation and mediation” (p. 20 of this book) without providing any constructive suggestions. Currently, however, methodological researchers in statistics, biostatistics, epidemiology, and computer science are quite interested in mediation, moderation, and other related causal mechanisms. Recent years have witnessed a growing literature in these areas based on causal inference using the potential outcomes framework due to J. Neyman and D. B. Rubin or causal diagrams due to J. Pearl, allowing for more general treatment of the topics targeted by this book. Unfortunately, this book does not connect with these avenues of research.

This book is an excellent introductory level mediation and moderation analysis textbook that focuses on linear models to analyze data. It clearly explains the basic research questions and the corresponding regression-based statistical tools dealing with them. Applied researchers interested in applying linear regression-based mediation and moderation techniques to their own data could find an excellent starting point within its pages.

Peng DING

Harvard University

Epidemiology: Study Design and Data Analysis (3rd ed.).

Mark Woodward. Boca Raton, FL: Chapman and Hall/CRC, 2014, xxii + 832 pp., $99.95(H), ISBN: 978-1-4398-3970-6.

Epidemiology: Study Design and Data Analysis has two primary audiences: researchers wishing to understand statistical issues and methods, and applied statisticians wishing to apply their knowledge to the field of epidemiology. This text, like its predecessors, hits the mark. The text begins with fundamental issues of epidemiology and proceeds to discuss important topics such as measuring disease, causality, and “studies using routine data,” meaning ecological, national, and international data. These are concepts with which applied statisticians may not be immediately familiar. A chapter on descriptive and inferential statistics follows, as does a chapter on risk factors, confounding, and interactions—essential concepts in epidemiology. Chapters on study design and related concepts are also covered before the author discusses different methods of modeling data. The text concludes with chapters on meta-analysis, risk scores, and computer-intensive methods.

All chapters are accompanied by exercises, making the text useful for coursework, and a solutions manual is available for those adopting it. These exercises only enhance what is already an excellent text on epidemiology. There is also substantial material on a publisher-sponsored website, including SAS and Stata programs, a Microsoft Excel spreadsheet for calculating sample size, SAS macros for calculating absolute risk, and SAS and Stata programs for integrated discrimination improvement and net reclassification improvement. There is also a SAS macro for calculating the c-statistic from survival data. The only negative this reviewer finds is that there is no companion of some sort demonstrating R code and R output parallel to that for SAS and Stata. Finally, all datasets (over 20) are available on the website.

New content in the third edition includes chapters on risk scores and clinical decision rules and on computer-intensive methods, with sections on the bootstrap, permutation tests, and a comprehensive discussion of missing-value imputation. These topics are all still evolving, and it is hard to find such a well-written discussion of them in textbooks; yet it is essential to teach them to today’s graduate students. The author has also added new material on splines, information criteria, propensity scoring, binomial regression models, and competing risks. These topics are all worth treatment and are seldom included in a text that also covers descriptive and inferential statistics.

Although the author identifies two audiences for Epidemiology: Study Design and Data Analysis, this reviewer believes there is a third—the graduate student. The author writes extremely well, and the text is resplendent with exercises. It would be a crime if Epidemiology: Study Design and Data Analysis were never used as a text! This reviewer recommends the text be used in a two-semester epidemiology course, because the information it contains is so important that its presentation should not be rushed. I wish a text like this had been available for my coursework. Enhancing its value as a text, it will also be extremely useful as a reference book to its intended audiences—researchers and applied statisticians. At $99.95 it is a bargain, and the only excuse for an epidemiologist or applied statistician not to have it on his or her bookshelf is not having seen or heard of it. Make this book your next purchase!

Gregory E. GILBERT

Institute for Research and Clinical Strategy, DeVry Education Group

Introduction to Statistical Process Control.

Peihua Qiu. Boca Raton, FL: CRC Press, 2014, xxxvii + 482 pp., $89.95(H), ISBN: 978-1-4398-4799-2.

There are dozens of books on the market under the title of “Statistical Process Control,” or SPC. While many are written for nonstatisticians, some, like Qiu’s book, are designed for statisticians—for example, Montgomery (2013), Koronacki and Thompson (2001), and Ryan (2011). Like Qiu, these authors generally start with a review of statistical methods, followed by several chapters on Shewhart-type control charts, change-point detection by CUSUM, EWMA, multivariate control charts, and sampling inspection schemes. Some also include quality improvement by design (Taguchi’s designs). The question, then, is what is new in Qiu’s book. The main difference is in the last four chapters, which focus on multivariate and nonparametric methods. Chapter 7 describes multivariate Shewhart, CUSUM, and EWMA charts. Chapter 8 discusses univariate nonparametric methods based on ranks and methods based on log-linear analysis of categorical data. Chapter 9 covers multivariate nonparametric process control, and Chapter 10 presents material on profile monitoring of processes. The material in all chapters is presented in a concise manner, without proofs, but with many relevant references. I find the last four chapters of the book valuable and worth having. Graduate students in statistics departments and experienced researchers would benefit from the quick exposure to the ideas and methodologies, and in particular from the extensive bibliography.
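For readers new to the area, the change-point machinery at the heart of such books is compact enough to sketch in a few lines (my own illustration, not code from Qiu’s book): the tabular CUSUM accumulates deviations from target beyond an allowance k and signals when either cumulative sum exceeds a decision limit h.

```python
def cusum(x, target, k, h):
    """Two-sided tabular CUSUM. Returns the index of the first alarm
    (or None) plus the upper and lower CUSUM paths."""
    hi = lo = 0.0
    hi_path, lo_path, alarm = [], [], None
    for i, xi in enumerate(x):
        hi = max(0.0, hi + (xi - target) - k)   # accumulate upward drift
        lo = max(0.0, lo + (target - xi) - k)   # accumulate downward drift
        hi_path.append(hi)
        lo_path.append(lo)
        if alarm is None and (hi > h or lo > h):
            alarm = i
    return alarm, hi_path, lo_path

# An on-target process followed by a small upward shift of one unit --
# the kind of shift an individual-point Shewhart chart is slow to catch.
x = [0.0] * 50 + [1.0] * 50
alarm, hi_path, lo_path = cusum(x, target=0.0, k=0.5, h=5.0)
```

On this idealized series the chart signals 11 observations after the shift begins; EWMA charts are built from a similarly short recursion.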

Although the author remarks that the book can be used by students who have only studied elementary statistics, I believe at least two semesters of probability and statistics are important for understanding the first six chapters. For the last four chapters, students should have courses in linear models, multivariate analysis, and nonparametric statistics, at the beginning graduate level. In short, the book contains material not presented in other books along with many relevant references. It would be a useful addition to the libraries of graduate students and researchers.

Shelemyahu ZACKS

Binghamton University

JMP Start Statistics: A Guide to Statistics and Data Analysis Using JMP (5th ed.).

John Sall, Ann Lehman, Mia Stephens, and Lee Creighton. Cary, NC: SAS Institute, 2012, xvi + 625 pp., $89.95(P), ISBN: 978-1-61290-204-3.

The fifth edition of JMP Start Statistics has been updated to feature JMP 10, though JMP 11 was used for the purposes of this review with only minor obstacles (all of which were easily hurdled by using the Help menu). The Design of Experiments (DOE) chapter has been revamped to “reflect the popularity and utility of optimal designs” (p. xv) and includes a new section on split plot designs. In response to feedback on previous editions, “chapters have been rearranged to streamline their pedagogy, and new sections and chapters have been added where needed” (p. xv). I compared the table of contents for the fifth edition to an online table of contents for the fourth edition. An example of the streamlined presentation is the combination of the chapter on bivariate and multivariate relationships with the chapter on discriminant and cluster analysis. Examples of the more substantial new sections include additional topics in the chapter on control charts and a new section in the Simulations chapter describing some of the examples in the Sample Scripts folder. The chapter on time series analysis has been removed from the fifth edition.

Throughout the text, the detailed, step-by-step instructions on implementing analyses in JMP are very clear and easy to follow. Conveniently, all datasets used in the book are either available in JMP’s Sample Data folder or easily entered by hand. The first four chapters provide a crash course in JMP. Chapter 2, titled “JMP Right In,” shows off many cool features of JMP, such as the dynamic linking of plots and data tables and the emphasis on “context building,” and generally gets readers excited about JMP’s capabilities. Chapter 5 provides a “big picture” overview of statistics, and Chapter 6 is about simulations. The latter chapter has some really great examples, some of which may be beyond the needs of most introductory statistics students. The remaining 13 chapters address a variety of statistical topics, beginning with topics seen in typical introductory courses (e.g., one-sample means, comparing two means, matched pairs, analysis of variance (ANOVA), and chi-square tests) and covering more advanced statistical topics (e.g., logistic regression, correspondence analysis, principal components, random and mixed effects models, and density estimation). It should be noted that with the “contextual” approach, descriptive statistics and specific plots of data are discussed as the initial stage of the larger data analysis process (including inference) for that type of data.

Any qualms that I have with the text are generally minor. The authors have a tendency to mention topics before they have been introduced; for example, on pp. 22–23, when describing JMP’s emphasis on building context for data analysis, they mention histograms, t-tests, scatterplots, and nonparametric methods before any statistical topics have been introduced. Additionally, I do not care for the fact that the book (and JMP) uses the term “histogram” as the name of the displays for both a single numerical and a single categorical variable (e.g., pp. 293 and 303).

My biggest qualm (and the basis for my reservation to suggest that this be used as a standalone teaching text) is that the authors tend to skimp on details in some places. (I realize that, in some instances, this is by necessity, otherwise the text would be gigantic.) The most troubling offense is the discussion of the plots generated as part of a logistic regression analysis (p. 318). These plots consisted of the fitted logit curve as well as the “data points.” The values plotted on the x-axis made sense for the example—they were in the dataset. I could not, however, make heads or tails of what was being plotted on the y-axis—they were values between 0 and 1, but were nowhere to be found in the data table nor could they be constructed from information in the data table or output. What was more troubling was that when I repeated the analysis for the example dataset a second time, different “data points” were plotted. To be fair, neither the text nor the JMP Help menu gives a clear explanation of what is being plotted. (It appears that the y-coordinates are randomly generated.)
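For what it is worth, one common display trick consistent with this behavior (my guess only; neither the text nor JMP’s Help confirms it) is to jitter the 0/1 responses vertically so that overplotted points separate—which would produce y-values that appear in no data table and change on every redraw. A sketch of the idea, with an invented function name and parameters:

```python
import random

def jitter_binary(responses, spread=0.05, seed=None):
    """Map 0/1 responses to randomly jittered display positions in
    [0, spread] and [1 - spread, 1] so overplotted points separate.
    The y-coordinates are display artifacts, not data values, and
    differ from one call (one redraw) to the next."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, spread) if y == 0
            else rng.uniform(1.0 - spread, 1.0)
            for y in responses]
```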

On the whole, the organization, layout, and clear step-by-step instructions make the fifth edition of JMP Start Statistics a great reference text. I would recommend it as a supplement to a more standard statistical text (possibly in a graduate course that uses JMP, since more material is likely to be covered) or for self-learners interested in using JMP (while it would not be necessary to have a background in statistics, some prior exposure would be helpful). I would also say it is a must-have for instructors who want to start using JMP in their courses; it has some great examples and scripts to use for teaching! An e-book version (ISBN 978-1-61290-307-1) is also available from the publisher.

Jessica L. CHAPMAN

St. Lawrence University

R Statistical Application Development by Example: Beginner’s Guide.

Prabhanjan Narayanachar Tattar. Birmingham, UK: Packt Publishing Ltd., 2013, vi + 324 pp., $44.99(P), ISBN: 978-1-84951-944-1.

R Statistical Application Development by Example, authored by Prabhanjan Narayanachar Tattar, is a nice, compact book that integrates R programming with popular statistical methods. For each topic introduced, an example is provided, along with R code implementing it, which is followed by discussion. The cover of the book states: “Learn by doing: less theory, more results,” and this book accomplishes exactly that.

The book serves well for both undergraduate and graduate students who are inexperienced with R. It does not devote much space to developing the mathematical background for each topic but instead briefly introduces what the reader needs to know, keeping notation to a level that is easily understandable. Each chapter begins with a description of the overall objective, the topics that will be covered, and what the reader can expect to learn. After a topic has been introduced, there is a section called “Time for action.” Here, the author carefully walks the reader through an example and provides detailed R code. Snapshots of output and graphs are provided to further enhance the reader’s comprehension. Following the example, there is a “What just happened” section, where the author summarizes what the reader learned and accomplished. The book also contains “Have a go hero” sections that challenge readers to carry out exercises on their own.

Chapter 1 begins with an introduction to different types of variables (categorical and continuous), R installation, and its various packages. It then gives a crash course on some of the more popular discrete and continuous distributions, with R code to familiarize the reader with these distributions in an R environment. Chapter 2 goes over the fundamentals of objects, vectors, matrices, and lists. Next, the chapter details how to store variables in data frames, followed by a section on importing data from various types of external files and the R functions available to accomplish this task. There is then a short discussion of the query language SQL, and the chapter concludes with a section on exporting data and graphs and on managing an R session. Chapter 3 discusses graphical techniques for categorical and continuous variables. Chapter 4 focuses on exploratory analysis, with special attention to methods that are robust to outliers, such as quantiles, the median, and the interquartile range. It proceeds with stem-and-leaf plots, letter values, and bagplots (bivariate boxplots), and then discusses resistant lines and smoothing techniques. The focus of Chapter 5 is statistical inference. The chapter opens with the definition of the likelihood function, presents likelihood functions for some well-known distributions, and provides code to plot them. It then discusses how to find maximum likelihood estimators, both by hand and in R. The chapter wraps up with interval estimation and hypothesis testing. Chapter 6 covers linear regression analysis and provides an example complete with code, output, and interpretation. The chapter also discusses ANOVA within the regression context, multiple regression, model building, confidence intervals, diagnostics, and model selection.
Chapter 7 is concerned with methods for a binary outcome variable, including probit and logistic regression, model validation, diagnostics, and ROC curves. Chapter 8 discusses alternative regression methods, such as polynomial regression, splines, piecewise linear regression, and ridge regression, as well as associated issues such as overfitting and model assessment. Chapters 9 and 10 introduce the reader to classification and regression trees (CART) and related topics such as pruning, bagging, bootstrapping, and random forests.
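To give a flavor of the Chapter 5 material on likelihood, here is a short sketch in the chapter’s spirit — my own example and data, not code from the book — that locates a maximum likelihood estimate numerically and checks it against the closed form:

```r
# My own example, not the book's code: MLE of an exponential rate.
x <- c(1.2, 0.7, 2.3, 1.9, 0.4)                  # toy data
loglik <- function(rate) sum(dexp(x, rate, log = TRUE))
mle <- optimize(loglik, interval = c(0.01, 10), maximum = TRUE)$maximum
# For the exponential distribution, the closed-form MLE of the rate
# is 1 / mean(x), so the numerical optimum should agree with it.
```

The same grid-and-optimize pattern carries over to distributions without a closed-form estimator.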

Overall, the book is a great resource. It clearly spans a wide variety of topics, all supplemented with detailed examples. It does have a few problems, however. There are a number of typographical errors, including notational errors, which caused me some confusion. (Presumably, these will be corrected in later printings or editions.) Some of the snapshots of R output and graphs were blurred and/or used print that was too small, making them difficult to read. The “Pop quiz” sections in Chapters 8 and 9, which the preface describes as multiple-choice checks of the reader’s understanding, were not in fact multiple choice.

In summary, this book should nicely fit the needs of the beginning R programmer who wants to develop a working knowledge of the software. Although other quality options such as the cookbooks of Crawley (Citation2013) and Teetor (Citation2011) cover a greater range of statistical topics, Tattar’s book has a friendly format, numerous examples, detailed code, and commentary that make it a useful addition and a valuable learning resource.

Roberto C. CRACKEL

University of California, Riverside

The R Student Companion.

Brian Dennis. Boca Raton, FL: CRC Press, 2013, xvii + 339 pp., $41.95(P), ISBN: 978-1-439-87540-7.

R is a widely used, “free software environment for statistical computing and graphics” (R Core Team Citation2013). However, as the author of this book points out, “...R goes way beyond statistics. It is a comprehensive software package for scientific computations of all sorts, with many high-level mathematical, graphical, and simulation tools built in” (p. xiv). As this statement may suggest, this book does not focus on the statistical capabilities of R. Rather, it provides an introduction to basic R skills for novice programmers with the idea that by mastering these skills, students will be well prepared to use R in upper-level high school and college mathematics, statistics, and other science courses.

This book is written for readers with only a moderate knowledge of high school algebra, and focuses on using R for the mathematical and scientific calculations that precede calculus and statistics courses. The author states, “Anything in science, mathematics, and other quantitative courses for which a calculator is used is better performed in R” (p. xiv). The instruction in most chapters is twofold: a review of a mathematical, scientific, or statistical concept paired with a hands-on demonstration of R using data that have been published in a scientific journal article. The reader is encouraged to practice coding in R using the examples given in the text and the Computational Challenges listed at the end of each chapter. Both the mathematical and the coding instruction are written in a conversational tone, which is well suited for the intended high school and undergraduate audience.

Chapters 1–5 introduce fundamental concepts of using R. Chapter 1 shows how basic mathematical operations are performed in the console window and demonstrates how vectors are used in R. During this introduction, the reader is reminded of the algebraic order of operations. This chapter concludes with simple graphing commands and a real-life ecology example. Chapter 2 demonstrates some “best-practice” coding techniques while introducing the use of R scripts. The example for this chapter is a compound interest equation, which is derived and then coded in an R script. Built-in and user-defined functions are introduced in Chapter 3, and an example is given of both types of functions to calculate the molar mass of a compound. The graphical tools described in Chapter 1 are enhanced in Chapter 4, where various one- and two-variable graphical techniques are demonstrated using an economic/political dataset. Data input and output is the topic of Chapter 5, where a “large” dataset (n = 90) relating university grade point averages to other variables is employed to demonstrate the use and manipulation of data frames in R.
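For readers curious what the Chapter 2 example might look like in practice, here is a minimal sketch along the same lines — my own code, not the book’s actual script, and the function name is invented:

```r
# Compound interest A = P * (1 + r/n)^(n*t); my own sketch of the
# Chapter 2 example, not the book's script.
compound <- function(P, r, n, t) P * (1 + r / n)^(n * t)
compound(1000, 0.05, 12, 10)   # $1000 at 5%, compounded monthly, for 10 years
```

Writing the formula as a function, rather than retyping it on a calculator, is exactly the habit the book’s script-based approach is meant to build.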

Chapters 6 and 7 focus on programming tools such as loops, logical and Boolean operators, and conditional statements. Loops are demonstrated using a population growth example, and the other tools are used to simulate baseball at-bats and to construct a profile plot for a blood pressure dataset. Chapters 8–10 mathematically describe quadratic, trigonometric, and exponential functions, and then R code is used in conjunction with real-life data to explore these functions further. Data relating fishing effort to fishing yield are modeled in Chapter 8, distances to nearby stars are calculated in Chapter 9, and examples such as radioactive decay and population growth are examined in Chapter 10. Chapters 11 and 12 cover simple matrix algebra concepts, with Chapter 11 focusing on vector and matrix operations and Chapter 12 focusing on solving systems of linear equations using matrices. Population growth is again studied, this time using matrices, and the time to the next Old Faithful eruption is found by solving a system of linear equations. Advanced tools used to customize graphs are described in Chapter 13, as are three-dimensional plots. The latter are used to show sums of squares for the Old Faithful eruption data using a range of possible slope and intercept values. To prepare the reader for the simulation capabilities of R, probability is discussed in Chapter 14. This discussion is followed by a simulation of randomly fluctuating stock prices. Linear and nonlinear statistical models are discussed in Chapter 15, and the methods are demonstrated using the fishing effort to fishing yield data and population growth data of earlier chapters. In Chapter 16, all of the R and mathematical tools are combined in a large example that models the Earth’s trajectory around the Sun.

The book concludes with three appendices. Appendix A informs the reader how to install R. Appendix B details the various methods for obtaining help in R. Appendix C is a short (14 page) mini-manual detailing R commands. Many of these commands appear in the main text of the book, while other commands allow the user to begin to explore additional R capabilities.

With the Computational Challenge problems given at the end of each chapter, this book is formatted like a textbook. In fact, the book itself mentions presenting what is learned in front of the class (see, e.g., p. 1), and in the Preface the author gives ideas for how the book could be used in a classroom setting. However, while the wide variety of mathematical and scientific topics makes the book interesting, it also makes the book difficult to integrate into a standard high school or college course. That said, this is a good book for high school or college students wanting to learn R on their own. Complete mathematical explanations paired with computational examples in R provide an excellent tool for these students to obtain a solid foundation in R.

There are many other books and online resources that provide an introduction to R. However, unlike many of them, this book does not target an upper-level statistical audience. Instead, it attempts to engage the mathematically less mature reader at a level appropriate for him/her to form a solid understanding of R. The author acknowledges that, “... R is huge, and this book is not meant as a comprehensive guide to R” (p. xiii). The premise is that, by learning to handle the quantitative problems that arise in the high school and college curriculum, students will be well prepared to handle more difficult material in the future. With a solid understanding of R tools for the simpler material, they will be able to extend their R knowledge to more complicated scientific computations with minimal difficulty.

Erin R. LEATHERMAN

West Virginia University

Reproducible Research With R and RStudio.

Christopher Gandrud. Boca Raton, FL: CRC Press, 2014, xxv + 288 pp., $69.95(P), ISBN: 978-1-466-57284-3.

This book fills a niche in the sense that it provides broader context for the overall data analysis process than Yihui Xie’s Dynamic Documents with R and knitr. Xie’s (Citation2013) book is technical and gives many specific examples and explanations, with particular focus on the R package knitr. Gandrud’s book, on the other hand, is more general and gives a basic overview of the entire research process. Gandrud has written a great outline of how a fully reproducible research project should look from start to finish, with brief explanations of each tool that he uses along the way. Something Gandrud does very well is to provide sources that expand upon his skeletal introductions to topics. The book is a good place to start when you need to know where to go next, much as people tend to use Wikipedia.

The book has four parts. The first tells you how to get set up to begin conducting reproducible research, including an explanation of the topic, getting started with R, RStudio, and knitr, and file management. The second part focuses on data: collecting, storing, sharing, and cleaning it. This section even introduces version control with GitHub (https://github.com/) for collaborative projects. The third part focuses on analysis and results. The fourth part wraps up with producing PDF and web reports, and presentations.

This book is very readable. Readers with little-to-no experience with R and RStudio may find it difficult at times to get through the code, but usually there is an explanation of the purpose or function after each code chunk.

There are several typographical errors (e.g., an “e” or “c” is omitted from “cache” in the arguments to a chunk of knitr code), which would cause problems if the reader were to blindly type in the code and expect it to run. But wait: is not the purpose of reproducible research that code chunks embedded in the document will actually run? Here lies a small difficulty in using knitr versus explaining knitr. To explain knitr, one needs to write the chunk delimiters “<<...>>=” and “@” into the document:

<<example, cache=TRUE>>=
# Create data
sample <- rnorm(n=1000, mean=5, sd=2)

# Save sample
save(sample, file="sample.Rdata")
@

but when the document is processed, these lines become control commands, and only the lines corresponding to R code are printed in the resulting document. So explaining how to use knitr requires either writing lines that look like code but are never executed, or using some very clever tricks. Gandrud has clearly used the former approach, which has led to the typos. However, the author provides links to the GitHub pages where errors/typos are addressed. If the reader finds a mistake, she can easily check this site to see if it has been reported, and if not, she can submit an error report. As for the quality of the book, it covers a lot of good material, but could be improved and made more relevant with additional examples and figures.

If this book goes into a second edition, my recommendation is to include more figures! Good graphics are an important part of data analysis, and plots made with the ggplot2 package (Wickham Citation2009) in particular assist with reproducibility. Graphics produced with this package have the advantage of being described by a grammar, and are formed in a manner that lets them serve as templates to be used again and again for different purposes. One of the difficulties in creating live documents has been getting plots produced easily. The knitr package solved this quite effectively, and with the latest version of RStudio and the shiny and ggvis packages, it is also possible to include interactive graphics in a markdown document. On the other hand, the book contains many useful tables collecting various commands (chunk options, git commands, etc.) that make great quick references. The book achieves the author’s goal: “To bring together the skills I have picked up for actually doing and presenting computational research” (p. vii).

There is no assumption that the reader knows any statistics or mathematics. This is a very introductory book, and it even assumes nothing about the reader’s skills in R, giving an introduction to R in Chapter 3. However, the readers who will get the most use from this book are those who are already working in R and just need a way to organize their work. That said, advanced undergraduate students in mathematics, statistics, and similar fields, as well as students just beginning their graduate studies, would benefit the most from reading this book. Many more experienced R users or second-year graduate students might find themselves thinking “I wish I’d read this book at the start of my studies, when I was first learning R!”

This book is not intended as a course text. In fact, Gandrud says that his goal for the book is for it to be “self-sufficient. [So that] a reader without a detailed understanding of these programs will be able to understand and use the commands and procedures [covered] in this book. [...] It can and should be read linearly as it takes you through an empirical research processes [sic] from an empty folder to a completed set of documents that reproducibly showcase your findings.” (p. 15). It seems to be intended for people doing (fairly) independent research who want to share their work along the way. However, this book would actually be a good text for beginning graduate students or advanced undergraduate students who are just starting to do technical research. Teaching from it directly may not be a good idea, but since it is very readable, assigning its chapters as weekly readings could be very beneficial, especially in classes that give an introduction to statistical computing.

The title of this book makes it appear to be very similar to Xie’s book, but it is best considered as an introduction to a lot of the tools that Xie discusses in his book. This book could be used as the main text for a class on reproducible research, with Xie’s used as a reference or as a recommended supplemental text.

Samantha TYNER

Dianne COOK

Iowa State University

Statistics Play-by-Play: Laboratory Experiments for Elementary Statistics.

Maureen Petkewich and Don Edwards. Dubuque, IA: Kendall Hunt Publishing Company, 2013, iv + 95 pp., $14.20(P), ISBN: 978-1-4652-1849-0.

This book is a collection of 12 laboratory activities that are appropriate for an introductory statistics course. The activities can be used to illustrate or reinforce a wide variety of concepts that include sampling bias, graphical representations of data, measurement error, regression for prediction, applications of probability distributions, and the sampling distribution of the sample mean. In addition, several experiments are described, which enable students to collect their own data and carry out inferential procedures that are commonly covered in the introductory statistics class. This book is not intended to be used as a stand-alone text; instead, these activities could serve as a supplement to a standard introductory book.

When I first started paging through this book, I was not particularly impressed with a few of the activities. The lab on graphical representations of data, for example, simply asks students to answer some survey questions and then has them create graphics based on the results, without really thinking about how those graphs may be useful. To me, the activity was commonplace, and it failed to promote statistical thinking. An activity I would recommend instead is described in Activity-Based Statistics by Scheaffer et al. (Citation2004). Their activity asks students first to form conjectures and then to create graphics that allow them to investigate those conjectures; in addition, the questions it poses promote a much richer discussion than does the activity presented in this book. Another activity that I feel pales in comparison to an alternative presented by Scheaffer et al. is the one used to illustrate the sampling distribution of the sample mean. This activity involves using the software of your choice to simulate various populations and sampling scenarios, and the instructions provided are long and tedious. I prefer the activity “Cents and the Central Limit Theorem” from Activity-Based Statistics because it involves a tactile simulation study that I believe helps students better understand the simulation process. Furthermore, when I have used computer-based simulations to introduce the central limit theorem, I have had more success with web-based applets than I expect I would have with the lab presented in this text. Finally, another general shortcoming is that the materials presented are not necessarily classroom-ready (at least not for me—other instructors may disagree). The authors shy away from using any one software package; instead, they mention repeatedly throughout the book that specific software instructions will be provided by the instructor. I understand why they made this choice, so as not to preclude anyone from adopting the text. To implement these labs effectively in my courses, however, I would have to spend considerable time preparing specific software instructions with appropriately tailored wording and notation for some of the labs.
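For comparison, the core of the computer-based simulation the lab describes can be written in a few lines of R — my own sketch, not the lab’s instructions or the book’s materials:

```r
# Sampling distribution of the sample mean from a skewed population;
# my own sketch, not the lab's instructions.
set.seed(1)
means <- replicate(2000, mean(rexp(30, rate = 1)))
# The 2000 sample means center near the population mean (1), with
# spread near 1/sqrt(30), as the central limit theorem predicts.
hist(means)
```

A histogram of `means` shows the familiar bell shape even though the exponential population is strongly skewed.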

All that said, there is plenty to like about this book. While a few of the labs left me unimpressed, many of the activities were quite engaging. The ASA-endorsed Guidelines for Assessment and Instruction in Statistics Education (GAISE; Aliaga et al. Citation2012) encourage instructors to foster active learning in the classroom, use real data, and use assessments (such as activities) to improve student learning. Fortunately, many of the activities in this book provide such opportunities. Several of the labs require students to carry out experiments and collect their own data to investigate a research question, which definitely fosters active learning.

One of my favorites has students suppose that they have been hired by the producers of the game show Minute to Win It to test a potential new game. The activity requires students to actually play this game to collect data that will guide their decision-making. I envision this lab generating quite a bit of excitement in the classroom while still remaining a valuable learning experience. Another activity I feel students will find engaging involves an investigation into whether referees favor competitors in combat sports (such as tae kwon do) based on the color of their gear. This activity is based on real data from an actual study, and it also draws students in with a compelling research question.

Though the activities are not perfectly aligned with GAISE (e.g., some of the lab instructions provide a detailed “recipe” for conducting the experiments versus involving the students in designing the study), overall they do present valuable learning opportunities that the instructor can use to assess and improve student learning. All activities include discussion questions, which ask students to draw conclusions and make recommendations based on what they have learned, and students’ answers to such questions will provide the instructor with valuable insight.

While I would not use every lab in this book, I have definitely found a few new activities that I hope to implement in future introductory statistics courses (though I would admittedly make some modifications before doing so). If you are looking for more hands-on activities to engage your students or for more opportunities for your students to collect and analyze their own data, then this may be a good resource for you.

Tisha L. HOOKS

Winona State University

Understanding and Interpreting Educational Research.

Ronald C. Martella, J. Ron Nelson, Robert L. Morgan, and Nancy E. Marchand-Martella. New York: The Guilford Press, 2013, xxii + 666 pp., $80.00(P), ISBN: 978-1-46-250962-1.

In Understanding and Interpreting Educational Research, Martella, Nelson, Morgan, and Marchand-Martella have written an upper-level undergraduate or master’s level textbook to broadly introduce readers to the critical consumption of research across a variety of research methods.

Each chapter begins with a list of objectives and useful figures presenting the organization of the material and concludes with a summary, discussion questions, and exercises (solutions are provided in supplementary electronic materials for instructors). Most chapters also conclude with a reprinted, published article that connects to the research method discussed in the chapter, with questions tailored toward the article’s content.

The text is broken down into seven parts containing 16 chapters. “Understanding Research” (Part I) comprises the introductory chapter, which briefly explores different approaches to gaining knowledge and being a critical research consumer, along with a preview of each of the coming research paradigms.

“Critical Issues in Research” (Part II) extends this introduction into two chapters for the critical research consumer before delving into specific types of research methods. It investigates issues in research design and measurement through discussion of internal and external validity and their threats, statistical and social validity, reliability and validity in quantitative and qualitative designs, and interobserver agreement. A real strength of the text is that the discussion of internal and external validity is extended into the appropriate chapters that follow, explaining how each of these threats to validity interacts with particular research designs.

“Quantitative Research Methods” (Part III) spans five chapters and receives the most attention in this text. “Basic Statistical Concepts and Sampling Procedures” (Chapter 4) concisely summarizes much of a graduate introductory statistics course in education. Scales of measurement, central tendency, spread, directional and non-directional null and alternative hypotheses in inferential statistics, Type I and Type II errors, the factors that govern power, explanations of individual parametric and nonparametric tests of statistical significance, and methods for sampling and sampling error are all covered. This section might be challenging for students without previous exposure to the material, since topics such as the logic of hypothesis testing, Type I error rate, and power are often difficult for students even when presented at a more reserved pace. This becomes a benefit, however, for those using the book as a reference text.

“Experimental Designs” (Chapter 5) and “Causal-Comparative Research” (Chapter 6) review several types of experimental, quasi-experimental, and pre-experimental designs, supporting many of them with research examples and corresponding statistical methods. Similar to previous chapters, “Correlational Research” (Chapter 7) covers not just interpretation and design but also the concept of correlation itself, including discussion of different correlation coefficients for different measurement scales (e.g., Pearson product-moment, Kendall’s W) and advanced methods including multiple regression, factor analysis, path analysis, and structural equation modeling. Given the previous coverage of reliability, validity, and sampling, “Survey Research Methods” (Chapter 8) is an understandably brief discussion of the process of conducting survey research, focusing in particular on considerations of design and the development of the instrument.

“Qualitative Research Methods” (Part IV) contains two chapters that offer an overview of qualitative and mixed methods research methods. Relative to the hefty coverage of quantitative research methods, this section speaks more generally across the qualitative research methods while limited attention is paid to particular qualitative methods. It is somewhat surprising that the growing field of mixed methods research is given only very brief coverage at the end of this section.

Relative to the previous two parts, the depth of the three chapters of “Single-Case Research Methods” (Part V) is more akin to the deeper coverage of “Quantitative Research Methods.” The section introduces single-case designs, including AB, ABA, ABAB, and ABCB designs; multiple baseline designs across behaviors, participants, or settings; and multiple probe, multitreatment, alternating treatment, and combination designs. Ample attention is paid to the purpose of these different designs and how each uniquely attempts to demonstrate experimenter control over the dependent variable. In this way, students are not simply memorizing designs without context, but are learning to be critical of design decisions.

“Evaluation Research” (Part VI) begins with a chapter containing a brief overview of the purpose of program evaluation, followed by a discussion of the types of program evaluations, how they are done, and when they should be used. A second chapter (“Evaluating the Literature,” Chapter 15) then discusses the process of research synthesis (or meta-analysis): the purpose of these methods, systematic and unsystematic approaches, how such syntheses are conducted, and when they should be undertaken.

“Action Research” (Part VII) comprises a single chapter, “Action Research: Moving from Critical Consumer to Researcher” (Chapter 16). The section title is a bit deceiving, as the chapter only briefly lays out some foundational components of action research before beginning a very useful discussion, spanning research methodologies, of research ethics and the dissemination of research results through academic writing.

Where this textbook demonstrates real strength is in the inclusion of actual published research articles with a series of trailing questions connecting each chapter’s topic to the article. Additionally, the attention to each design’s strengths and weaknesses in given settings is beneficial to budding consumers of research. Conversely, the limited coverage of qualitative and mixed methods research designs is a shortcoming of the text. While the coverage of statistics and quantitative research methods in Part III is thorough enough to serve both as a basis for immediate learning and as a reference for students as they move forward in their understanding, the coverage of qualitative and mixed methods research would need to be supplemented to serve the same goal. Altogether, this textbook is a solid option in a crowded field of books for consumers of educational research, particularly for those who want broad overviews of different research types with an emphasis on statistical methods.

Christopher M. SWOBODA

University of Cincinnati
