Book Reviews

Handbook of Regression Modeling in People Analytics: With Examples in R and Python,

by Keith McNulty. CRC Press, Taylor & Francis Group, Boca Raton, FL, 2021, ISBN 9781032041742, xvi + 255 pp., 48 color illustrations, $63.96 (hbk).

The book starts with a Foreword by Dr. Alexis Fink and an Introduction by the author, both renowned experts in people analytics, a field concerned with statistical modeling of people in organizations: their relations, decisions, strategies, efficient fulfilment of work tasks under regulatory requirements, and other problems of management and human resources. Human behavior and organizations are complex objects described by multiple factors, and the aim of regression modeling of such systems is mostly not prediction but rather understanding how people and groups can work successfully, or why some events could be critical for reaching the goals of businesses and institutions. The book encompasses a wide range of regression models for inferential statistical analysis and interpretation, illustrated with various examples and implemented in the R and Python programming languages. The material is structured in twelve chapters, each divided into multiple sections and subsections.

Chapter 1, “The Importance of Regression in People Analytics,” states that “The aim of this book is to encourage inexperienced analytics practitioners to ‘dip their toes’ further into the wide and varied world of regression in order to deliver more targeted and precise insights to their organizations and stakeholders on the problems they are most interested in” (p. 1). Since Francis Galton coined the term “regression,” this kind of modeling of a response variable by independent covariates has mostly been used for predicting and forecasting the outcome from given values of the input variables. The main aim of people analytics modeling lies instead in measuring the influence of each input variable on the outcome, in interpreting the meaning and importance of the individual covariates, and in quantifying their impact in explaining the outcome. Models built for understanding an outcome are called inferential models, and they better serve the goals of people analytics, where the datasets are of moderate size and the decisions often have a real impact on individuals.

Chapter 2, “The Basics of the R Programming Language,” introduces this language for statistical modeling, from downloading to coding numerous examples, describing tidying of data and data frames, working with functions, packages, and libraries, the pipe operator and error messages, plotting and graphing, and documenting with the R Markdown package.

Chapter 3, “Statistics Foundations,” reviews topics of descriptive statistics, distributions, and hypothesis testing, illustrated with numerical examples run in R for the sample mean, variance and standard deviation, covariance and correlation, random variables and histograms, the t-distribution and confidence intervals, testing for a difference in means and for a nonzero correlation, and the chi-square test for a difference in frequency distribution. Foundational statistics in Python is also considered.

Chapter 4, “Linear Regression for Continuous Outcomes,” explores ordinary least squares (OLS) regression, simple and multiple models, parameters and measures of fit, coefficient confidence and predictions from a model, the relevance of input variables, and the transformation of categorical inputs into dummy variables. Model testing is discussed, including the assumptions of linearity, additivity, constant variance, and normality of the error distribution. Models with interactions, higher-order polynomial terms, and multicollinearity between input variables are described in numerical examples.
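As a small illustration of the dummy-variable idea mentioned above, the following sketch fits a simple OLS line to a binary input using only the Python standard library. It is not taken from the book: the function name `simple_ols` and the data (`is_manager`, `salary`) are hypothetical and chosen only to show that the slope on a 0/1 dummy equals the difference between the two group means.

```python
from statistics import mean


def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares (one input)."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squares around the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: a categorical input coded as a 0/1 dummy variable
is_manager = [0, 0, 0, 1, 1, 1]
salary = [50.0, 52.0, 54.0, 70.0, 72.0, 74.0]

intercept, slope = simple_ols(is_manager, salary)

# With a single dummy input, the intercept is the mean of the reference
# group (dummy = 0) and the slope is the difference between group means.
mean_reference = mean(s for s, m in zip(salary, is_manager) if m == 0)
mean_other = mean(s for s, m in zip(salary, is_manager) if m == 1)
print(intercept, slope, mean_reference, mean_other - mean_reference)
```

In this inferential reading, the coefficient is interpreted rather than used for forecasting: it quantifies how much the outcome differs between the two groups.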

Chapter 5, “Binomial Logistic Regression for Binary Outcomes,” presents the main approach to modeling a dichotomous outcome, needed in estimations for two groups, for instance, “promoted – not promoted.” Logistic regression belongs to the class of generalized linear models (GLM), and examples in R with the corresponding function are given on various data, with plots of the results. Models with one and with many input variables are considered, the coefficients are interpreted via the log-odds linear link function, and goodness-of-fit and model parsimony are described.
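The interpretation of coefficients through the log-odds link can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not code from the book: the helper names `odds_ratio` and `predicted_probability` and the coefficient values are hypothetical.

```python
import math


def odds_ratio(coefficient):
    """Convert a logistic regression coefficient (a change in log odds)
    into the multiplicative change in odds per unit increase of the input."""
    return math.exp(coefficient)


def predicted_probability(intercept, coefficients, inputs):
    """Apply the inverse logit to the linear predictor to get a probability."""
    log_odds = intercept + sum(b * x for b, x in zip(coefficients, inputs))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical fitted model: log odds of promotion = -2.0 + 0.7 * rating
b0, b_rating = -2.0, 0.7
print(odds_ratio(b_rating))                       # odds multiplier per rating point
print(predicted_probability(b0, [b_rating], [3]))  # probability at rating = 3
```

A coefficient of zero gives an odds ratio of one, meaning the input leaves the odds of the outcome unchanged.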

Chapter 6, “Multinomial Logistic Regression for Nominal Category Outcomes,” continues with outcomes falling into several groups, modeled by the so-called multinomial-logit (MNL) function. Examples of R code are given, together with explanations of the model parameters, reference level, goodness-of-fit, and model simplification, and illustrations on different datasets, in particular for choice modeling of one product versus the others.

Chapter 7, “Ordinal Logistic Regression for Ordered Category Outcomes,” is devoted to a dependent variable of the ordinal kind, for instance, one given by a Likert scale of several levels measuring job preferences. The modeling is commonly performed with the so-called proportional odds approach, also known as the constrained cumulative logistic model. The model is described and R code is presented in several examples, with model diagnostics, explanation of the parameters, and the Brant-Wald test for checking the proportional odds assumption. Some other approaches include the baseline logistic, adjacent-category logistic, and continuation-ratio logistic models.

Chapter 8, “Modeling Explicit and Latent Hierarchical Structure in Data,” focuses on structural equation modeling (SEM), used for data with a latent construct, and on models for individuals who are members of a few different hierarchical groups. Mixed models for explicit hierarchy, models with fixed and random effects, and SEM for latent hierarchy are described, with implementation in R scripts for various datasets, including political party data.

Chapter 9, “Survival Analysis for Modeling the Occurrence of Singular Events Over Time,” deals with Cox proportional hazards regression, checking of its assumption, and frailty models, which are run with R packages and functions on a walkthrough example of employee attrition data.

Chapter 10, “Alternative Technical Approaches in R and Python,” recommends the tidymodels (https://www.tidymodels.org) meta-package for running models in R, with convenient output presentation with the help of the broom package and a unified interface for running models through the parsnip package, illustrated on multiple examples. For Python users, the scikit-learn and statsmodels (https://www.statsmodels.org/stable/index.html) packages are recommended, and scripts for the described models are given on numerous datasets.

Chapter 11, “Power Analysis to Estimate Required Sample Sizes for Inferential Modeling,” discusses errors, effect sizes, and statistical power, with power analysis for simple hypothesis tests and for OLS, log-likelihood, and hierarchical regression models, and with numerical examples run in R and Python.
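To give a flavor of what such a calculation involves, the sketch below estimates the sample size per group for a two-sided comparison of two means. It uses the standard normal approximation n = 2(z_{1-α/2} + z_{power})² / d², where d is the standardized effect size, rather than any method specific to the book; the function name `sample_size_per_group` and the default values are assumptions made for illustration, and an exact t-based calculation would give a slightly larger number.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided, two-sample
    comparison of means, using the normal approximation."""
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the Type I error
    z_power = z.inv_cdf(power)          # quantile for the desired power
    n = 2 * (z_alpha + z_power) ** 2 / effect_size ** 2
    return math.ceil(n)


# A medium standardized effect (d = 0.5) at 5% significance and 80% power
print(sample_size_per_group(0.5))
```

Smaller effect sizes demand markedly larger samples, which matters in people analytics, where datasets are often of moderate size.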

Chapter 12, “Further Exercises for Practice,” suggests datasets for modeling and analysis of such problems as graduate salaries, a recruiting process, the drivers of performance ratings, promotion differences between groups, and feedback on learning programs, and suggests documenting the results in R Markdown or in a Jupyter Notebook. The book concludes with the References, Glossary, and Index.

Each chapter concludes with learning exercises given as dozens of questions for discussion and data drills. The material in the book can be applied outside of the people analytics area in other disciplines, and used as a practical introduction to regression methods for students and practitioners in statistical modeling and analysis. The book website, Handbook of Regression Modeling in People Analytics: With Examples in R, Python and Julia (peopleanalytics-regression-book.org), gives instructions for downloading packages with the datasets for R and Python users, together with an essential part of the book and additional materials. Other resources on modeling in R and Python and on meaningful regressions can be found in the given references.

Stan Lipovetsky
Minneapolis, MN

References

  • Lipovetsky S. (2020), Introduction to Data Science: Data Analysis and Prediction Algorithms with R, by Rafael A. Irizarry, Technometrics, 62, 280–282.
  • Lipovetsky S. (2020), Statistical Inference via Data Science: A ModernDive into R and the Tidyverse, by Chester Ismay and Albert Y. Kim, Technometrics, 62, 283.
  • Lipovetsky S. (2021), Advanced Statistics with Applications in R, by Eugene Demidenko, Technometrics, 63, 273–275.
  • Lipovetsky S. (2021), Linear Models with Python, by Julian J. Faraway, Technometrics, 63, 426–427.
  • Lipovetsky S. (2021), “Game Theory in Regression Modeling: A Brief Review on Shapley Value Regression,” Model Assisted Statistics and Applications, 16, 165–168.
  • Lipovetsky S. (2021), “Modified Ridge and Other Regularization Criteria: A Brief Review on Meaningful Regression Models,” Model Assisted Statistics and Applications, 16, 225–227.
