Book Reviews

Statistical Inference via Data Science: A Modern Dive Into R and the Tidyverse

by Chester Ismay and Albert Y. Kim. Boca Raton, FL: Chapman and Hall/CRC, Taylor & Francis Group, 2020, xxx + 430 pp., $79.95, ISBN: 978-0-367-40982-1.

The monograph belongs to The R Series and offers a convenient way to learn data science and statistics together with the R language. The textbook consists of four parts and eleven chapters, each divided into sections and subsections. In the Preface, the authors describe the structure of the book and illustrate it with a pipeline that runs from importing data to producing its tidy version, which then passes through a loop of transforming, modeling, and visualizing, and is finally used for communication, that is, for interpreting and reporting the modeling results. Chapter 1, “Getting Started With Data in R,” gives a gentle introduction to the R language and the RStudio interface (compared to a car’s engine and dashboard, respectively), describes how to install them, and discusses basic terminology and programming concepts, including R packages and their loading, and writing code. Using 2013 data on 336,776 domestic flights at the New York and Newark airports, together with hourly weather conditions, it shows how to organize data frames and how to inspect a dataset with various R operators and functions.

Part I, “Data Science with tidyverse,” opens with Chapter 2, “Data Visualization,” which presents the grammar of graphics as implemented in the ggplot2 package. Graphs of various kinds and features, including scatterplots and line graphs, boxplots and barplots, pie charts, and histograms with multiple facets, are demonstrated on different datasets, in particular on the airports data mentioned above, comparing the number of flights by carrier and origin, weather conditions at the airports, and arrival versus departure delays. Chapter 3, “Data Wrangling,” describes functions from the dplyr package for transforming data to suit a given purpose: it discusses the pipe operator %>% for chaining successive actions and shows how to filter rows and select columns, summarize variables, group observations, sort rows, join data frames, match key variable names, use multiple key variables, and more. Chapter 4, “Data Importing & ‘Tidy’ Data,” considers how to import spreadsheet data from the R console and from RStudio with the help of the readr package, and how to obtain a tidy version of a data frame (with one observation in each row and a different variable in each column, with all data available) using the tidyr package.

Part II, “Data Modeling With moderndive,” starts with Chapter 5, “Basic Regression,” which describes modeling for explanation and for prediction with the help of the moderndive package, together with the purrr, tibble, stringr, forcats, and skimr packages. Exploratory data analysis (EDA) is discussed along with correlations and simple linear regression on a numerical or a categorical predictor. Chapter 6, “Multiple Regression,” uses the same packages to build models with two or more predictors and covers related topics ranging from model selection to Simpson’s paradox.

Part III, “Statistical Inference With infer,” starts with Chapter 7, “Sampling,” which describes the terminology and main concepts, including random, unbiased, and representative sampling, generalizability, and the central limit theorem, with numerous examples and histogram illustrations. Chapter 8, “Bootstrapping & Confidence Intervals,” describes resampling with replacement by computer simulation and the bootstrap for summary statistics, including estimation of the standard error for constructing confidence intervals, carried out on different datasets with the help of the infer package for statistical inference. Chapter 9, “Hypothesis Testing,” continues with resampling and permutation tests for checking a null against an alternative hypothesis, again with the help of the infer package. Type I and Type II errors are discussed, together with the choice of the alpha and beta probabilities, z- and t-statistics for one and two samples, and the interpretation of results via p-values and their visualization in histograms. Examples from multiple datasets are used, including the same airports data for comparing the air time of departing Hawaiian Airlines and Alaska Airlines flights. Chapter 10, “Inference for Regression,” deals with the standard error of models, test statistics for regression parameters and their p-values, linearity of relationships, and simulation-based inference.

Part IV, “Conclusion,” completes the journey through the book in Chapter 11, “Tell Your Story With Data,” by mapping a flowchart of the data science pipeline with all the stages of the modeling considered; it also adds more examples of multivariate visualization, regression analysis, and prediction. The book closes with Appendix A, “Statistical Background,” which gives the main statistical formulae for several characteristics and the normal distribution, and Appendix B, “Versions of R Packages Used,” which lists the packages used. There are also a Bibliography of the most recent sources and an Index.

The monograph supplies multiple links to the websites of the R packages and related statistical methods, and the online version of the book, with all the code and output, is available at moderndive.com. The textbook offers students and researchers a very useful introduction to data science and contemporary R programming, with numerous examples of R implementations for solving various problems of statistical estimation and inference.

Stan Lipovetsky
Minneapolis, MN
