Book Reviews
Computational Systems Biology of Cancer, by Emmanuel Barillot, Laurence Calzone, Philippe Hupe, Jean-Philippe Vert, and Andrei Zinovyev. Boca Raton, FL: Chapman and Hall/CRC, 2016, xxix + 423 pp., $80.00, ISBN: 978-1-4398-3144-1.

The preface of this book reminds us that “cancer is a complex and heterogeneous disease… It is a constellation of diverse and evolving disorders that are manifested by the uncontrolled proliferation of cells that may eventually lead to fatal dysfunction of the host system.” A basic question is “…how to achieve an in-depth yet realistic understanding of cancer dynamics. Computational systems biology is one prong of the research attack on this question.” This book “…explains how to apply Computational Systems Biology approaches to cancer research.” The chapters are:

1. Introduction: Why Systems Biology of Cancer
2. Basic Principles of the Molecular Biology of Cancer
3. Experimental High Throughput Technologies for Cancer Research
4. Bioinformatics Tools and Standards
5. Exploring the Diversity of Cancers
6. Prognosis and Prediction: Towards Individualized Treatments
7. Mathematical Modeling Applied to Cancer Cell Biology
8. Mathematical Modeling of Cancer Hallmarks
9. Cancer Robustness: Facts and Hypotheses
10. Cancer Robustness: Mathematical Foundations
11. Finding New Cancer Targets
12. Conclusion

As Terry Speed, a pioneer in the biological statistics of cancer, says: “…think of this as your guide book to the field, as well as a way to get started in it… As the field of cancer research progresses and evolves, new mathematical models must be developed and applied, to aid in the understanding of human biological systems,” especially in the biology of cancer. This book is an excellent introduction to the field.

David E. Booth

Kent State University

Excel 2013 for Physical Sciences Statistics: A Guide to Solving Practical Problems, by Thomas J. Quirk, Meghan H. Quirk, and Howard F. Horton. Switzerland: Springer, 2016, xviii + 242 pp., $54.99 (eBook), $69.99 (softcover), ISBN: 978-3-319-28963-2 (softcover), ISBN: 978-3-319-28964-9 (eBook).

This book is one of a series of short books on Excel written by the authors for different disciplines. Like the rest of the series, it presents important features of this powerful software along with several statistical topics, illustrated with hands-on examples related to the physical sciences. Excel offers many features useful to statisticians and to other disciplines. Although the book can stand alone, prior exposure to introductory statistics is very helpful for understanding the topics covered, and it is a good supplement to an introductory course in statistics for nonmajors. The book is divided into eight chapters and five appendices, and each chapter ends with a good number of exercises that are very helpful to students.

Chapter 1 covers a number of descriptive statistics, such as the sample size, mean, and standard deviation, along with the Excel functions used to compute them, such as AVERAGE, COUNT, and STDEV. The authors show in detail, step by step, how to prepare the worksheets, perform the statistical calculations, and then save the worksheets.

Chapter 2 is a short chapter focusing on creating frame numbers for generating and sorting random numbers; it also shows how to fit the entire output on one page.

The topics covered in Chapter 3 include the population mean, upper and lower confidence bounds for the population mean, and hypothesis testing. The authors clearly explain how and where the components of a confidence interval come from, and Excel provides an easy-to-use built-in function (TINV) for computing it. Hypothesis testing about the population mean, including the steps needed to conduct such a test, is also covered, along with alternative ways of summarizing the results and different ways of stating a rejection of the null hypothesis. As I mentioned in my review of another book from the same first author, my first comment is that the authors use the phrase “accepting the null hypothesis”; it is important to present nonmajors with correct statistical terminology.
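For reference, and in standard textbook notation rather than as written in the book, the interval built in this chapter is the usual t-based confidence interval for a population mean, with TINV supplying the two-tailed critical value: x̄ ± t(α/2, n−1) · s/√n, where x̄ is the sample mean, s the sample standard deviation, and n the sample size.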

Chapter 4 focuses on the seven steps for hypothesis testing using the one-group t-test and explains in detail how to conduct the test and draw conclusions from it or from confidence intervals. My second comment about the book is the lack of discussion of one-sided tests in this chapter and in Chapter 3. My third concern is that the book does not offer any coverage of one or two population proportions.
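In the same standard notation (not quoted from the book), the statistic behind the one-group test is t = (x̄ − μ0)/(s/√n), referred to the t distribution with n − 1 degrees of freedom.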

Chapter 5 extends the previous chapter to the two-group t-test for a difference of means, presenting the nine steps for hypothesis testing with the two-group test. Again, as in the other books in the series, only two-sided tests are presented.

Chapter 6 covers the nine steps for computing the correlation coefficient and drawing the regression line, and shows how to print the data and the chart on one page. The correlation coefficient is computed either from the formula or with the Excel built-in function CORREL. The authors also show how to plot the data points, fit a regression line to the data, and make predictions. Furthermore, in this chapter they show how to add the Data Analysis ToolPak, which is not installed by default; this add-in is needed for most of Excel's statistical features. One comment should be added here to remind readers that the regression line can only be used to predict values of the response variable within the range of the independent variable(s).
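In standard notation (again not quoted from the book), the fitted least-squares line can be written ŷ = ȳ + r (sy/sx)(x − x̄), where r is the sample correlation and sx, sy are the sample standard deviations of x and y; this form makes clear how the correlation and the line are tied together, and why predictions should stay within the observed range of x.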

Chapter 7 extends the previous chapter to problems with several predictors by introducing multiple correlation and multiple regression. Because the book is written for students majoring in areas other than statistics, the authors rely on Excel's capabilities, namely the Analysis ToolPak, to perform these tasks.

The book concludes with a topic of interest to nonmajors, ANOVA, showing how to put the pieces together to form the ANOVA table and how to test the difference between two group means using the t-table.

Appendix A contains answers to the exercises appearing at the end of each chapter. Appendix B consists of test problems dealing with the topics covered in the book, and Appendix C provides answers to these questions. Appendix D presents some statistical formulas used in the text, and finally, Appendix E includes a short form of the t-table.

In sum, my closing comments on this book are similar to those made in my review of another book in this series. This is a well-written short book on the statistical features of Excel 2013, with many examples and exercises related to the target audience. It is a good reference for those who need Excel for introductory statistical work, or a supplement to an introductory course in statistics for nonmajors. I raised a number of concerns above, and to make this a complete textbook on the subject, a number of topics should be corrected or added. First, the phrase “failing to reject the null hypothesis” should replace “accepting the null hypothesis.” Second, testing hypotheses on one or two population proportion(s) should be added, either as independent chapters or as additions to the appropriate chapters. Third, one-sided tests should be included in the appropriate chapters. With these additions, the book would be an excellent reference on the subject.

Morteza Marzjarani

Saginaw Valley State University (retired)

Python for Probability, Statistics, and Machine Learning, by José Unpingco. Switzerland: Springer, 2016, viii + 276 pp., $119.00, ISBN: 978-3-319-3075-2.

The purpose of this book is to introduce scientific Python to those who have a prior knowledge of probability and statistics as well as basic Python. As the author says in the Preface, “This is not a good first book in any of these topics…” The list of chapters is the following:

1. Getting Started with Scientific Python
2. Probability
3. Statistics
4. Machine Learning

Topics covered include Hypothesis Testing and p-values, ROC Curves, Testing Multiple Hypotheses, Regression, Univariate Robust Statistics, Bootstrapping, Nonparametric Methods, Cross-Validation, Random Forests, Logistic Regression, Lasso, Support Vector Machines, and much more. All code and datasets are available for download.
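To give a flavor of the kind of procedure the book implements in scientific Python, here is a minimal nonparametric bootstrap of a sample mean using NumPy; it is an illustrative sketch of my own (the function name and data are hypothetical), not code taken from the book:

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=100)  # a made-up sample for illustration

def bootstrap_mean_ci(x, n_boot=10000, alpha=0.05):
    # Percentile bootstrap confidence interval for the mean.
    boots = np.empty(n_boot)
    for i in range(n_boot):
        boots[i] = rng.choice(x, size=len(x), replace=True).mean()
    return np.quantile(boots, [alpha / 2, 1 - alpha / 2])

print(bootstrap_mean_ci(data))  # approximate 95% interval for the mean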

In my opinion, this is a very valuable reference for those wishing to use these methods in a Python environment. It is quite helpful to have executable code for many of the basic but important procedures. I would strongly recommend this book for the intended audience or as a reference work. Further references are included, so a start toward a deeper look at these topics is provided. I am very enthusiastic about the book for those who want to use Python for these purposes.

I am a bit less enthusiastic about the book as a text for a course because the text does not avoid topics like convergence in probability, matrix algebra, Lebesgue integrals, and so on. I would suggest that the audience have the usual two-semester course in mathematical statistics, as well as some linear algebra, before using the book. Alternatively, the book could profitably be used for a lab in conjunction with the mathematical statistics course. The book does not contain exercises in the usual sense, so the instructor must supply those that he or she wishes to use. All in all, I strongly recommend this book for those who want to use Python in this area.

David E. Booth

Kent State University

Modern Statistical Methods for HCI, by Judy Robertson and Maurits Kaptein (Eds.). Switzerland: Springer, 2016, xx + 348 pp., $129.00, ISBN: 978-3-319-26631-2.

The monograph belongs to a series on the multidisciplinary field of human-computer interaction (HCI). It contains a collection of papers by internationally recognized experts in modern statistical analysis who discuss how to interpret the p-value correctly in order to improve understanding and proper application of statistical hypothesis testing. The editors aim to demonstrate the advances of modern R statistical packages, which can support “fair statistical communication” and “provide scholars and practitioners with the methods and tools to do so” (p. 13). The book is structured in five parts containing 14 chapters/papers. Each chapter presents R code and explains the results obtained.

In the introductory Chapter 1, the editors share their main thoughts on HCI developments, particularly on null hypothesis significance testing (NHST) and the corresponding p-value. A review of the modern literature indicates that in psychology, cognitive science, bioinformatics, and economics the Bayesian alternative to NHST is often suggested, but it should not be a blind substitution of one for the other. Many authors consider sample size and overestimation of effects, meaningful choice of the null hypothesis, use of inappropriate tests for the distributions observed in the data, and erroneous testing with p-values misinterpreted and abused. Actually, the p-value does not determine whether a hypothesis is true or false in a particular experiment; it only provides “a description of the long term Type I error rate for a class of hypothetical experiments—most of which the researcher has not conducted” (p. 7). The editors give another clarifying example of p-value interpretation in relation to the fallacy of the transposed conditional probabilities. For a big t-value the corresponding p-value is small, so it is often misunderstood as a low probability that the null hypothesis is true. Such an understanding corresponds to the conditional probability P(H0|D) of the null hypothesis H0 given the data D. However, the p-value rather quantifies P(D|H0)—the probability of the data given that H0 is true. Bayes' formula connecting the conditional probabilities through the prior and posterior yields the well-known result that these can differ. Multiple tests and degrees of freedom are also discussed.
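Written out in the notation above, Bayes' formula makes the distinction explicit: P(H0 | D) = P(D | H0) P(H0) / P(D), so a small P(D | H0) does not by itself imply a small P(H0 | D); the prior P(H0) and the marginal probability of the data P(D) also matter.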

In Part I, “Getting Started with Data Analysis,” Chapter 2 introduces the R language used throughout the rest of the book, from basic definitions to functions in packages. Chapter 3 continues with descriptive statistics and exploratory data analysis, presenting the science and art of data visualization via the ggplot2 package and its grammar of graphics for effective communication of information. Chapter 4 focuses on handling missing data of different types by multiple imputation performed with the help of the mice package.

Part II is devoted to classical NHST and its proper application. Chapter 5 explores the quantification of effect size, which should be accounted for when using p-values in different tests, and describes power analysis for interpreting results. Chapter 6 discusses repeated-measures and within-subjects analysis of variance (ANOVA) calculations, and time series with event-history analysis performed with the survival R package via the Kaplan–Meier product-limit estimate, the log-rank test, and Cox hazard modeling. Chapter 7 deals with various nonparametric statistics and their parametric NHST analogs, employing such libraries as XNomial, coin, reshape2, ez, ARTool, lsmeans, nnet, MASS, car, multcomp, lme4, geepack, and the better-known plyr.
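As a reminder of the estimator referred to here (the standard definition, not a quotation from the book), the Kaplan–Meier product-limit estimate of the survival function is Ŝ(t) = Π (1 − di/ni), with the product taken over event times ti ≤ t, where di is the number of events and ni the number at risk at time ti.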

Part III focuses on Bayesian inference. Chapter 8 introduces its general principles and provides examples of quantifying the posterior probability in comparisons of two binary or two numeric variables and in Bayesian regression, using R code from the BEST package and especially from Zelig, which builds Bayesian linear, logit, multinomial-logit, and probit models. Chapter 9 continues with Bayesian testing of multiple constrained hypotheses for ordered groups. Another software package, for Bayesian inequality and equality constrained model selection (BIEMS) in multivariate tests, is used; it can be freely downloaded from the chapter author's site https://jorismulder.com.

Part IV presents advanced modeling for HCI problems. Chapter 10 considers latent variables in path and structural equation models using the lavaan package in R. Chapter 11 describes generalized linear models (GLM) and generalized linear mixed models (GLMM, also known as random-effects or hierarchical models) used for complex data with nested structure; regularization in ridge regression and other specifications in R are described as well. Chapter 12 deals with latent profile and latent class analyses (LPA and LCA, for continuous and categorical variables, respectively), which help to reduce the variables into groups and perform data clustering using the R packages mclust and poLCA.

The final Part V of the book extends current analysis practice to novel approaches. Chapter 13 discusses how to improve standard NHST toward “fair statistical communication” based on estimation rather than testing. Various misinterpretations of the p-value and the alpha cutoff are considered through clear examples, and informative charts and confidence-interval estimates of effect sizes are suggested for reporting. The author also presents these ideas at his site http://www.aviz.fr/badstat. In the last chapter, Chapter 14, the editors close with a series of recommendations for improving statistical methodology and practice in HCI. Reviewing the HCI literature, the authors notice that the p-value is misused in two kinds of conclusions: first, rejection of the null hypothesis H0 is taken as evidence in favor of the alternative H1, although the alternative was not tested; second, a failure to reject H0 is interpreted as evidence in favor of H0, which need not be the case. Also, many reviewed papers do not perform power analysis for Type II errors, state effect-size predictions in their hypotheses, or adjust for multiple comparisons to control Type I errors. The editors give numerous recommendations on improving the methodology of statistical hypothesis testing, which can help ensure “that most of our research findings are not false, but also that they are actually useful” (p. 347).

In each part of the monograph the editors recommend several additional books on the topic. Each chapter presents multiple references and numerical illustrations as a practical guide to writing code in R. More supplements on data and R code are available at http://extras.springer.com. The book can serve students and practitioners in various fields where applied statistics is used and an understanding of hypothesis testing is needed for analysis and meaningful decision making.

Stan Lipovetsky

GfK North America, Minneapolis

Seasonal Adjustment Methods and Real Time Trend-Cycle Estimation, by E. B. Dagum and S. Bianconcini. Switzerland: Springer, 2016, xvi + 283 pp., $129.00, ISBN: 978-3-319-31820-2.

The monograph belongs to the series of Statistics for Social and Behavioral Sciences. It is written for readers with a good knowledge of matrix algebra, multiple regression, and time series modeling. It consists of two opening chapters presenting a general introduction and time series components, and the remaining nine chapters are grouped in two parts presenting seasonal adjustment methods and real time trend-cycle methods, respectively.

Chapter 1 gives a summary of three categories of seasonal adjustment methods: smoothing linear models, ARIMA (autoregressive integrated moving average) models, and structural time series models. Three main software packages correspond to them, respectively: X12ARIMA, TRAMO-SEATS (TRAMO stands for time series regression with ARIMA noise, missing observations and outliers; SEATS for signal extraction in ARIMA time series), and STAMP (structural time series analyzer, modeler, and predictor). The first two are officially adopted by statistical agencies, and the last is mainly used in econometrics and academic studies. ARIMA models of the multiplicative Box-Jenkins type applied to a seasonal series yt can include a nonseasonal autoregressive (AR) polynomial in the backshift operator of order p, a seasonal autoregressive (SAR) operator of order P, a nonseasonal moving average (MA) operator of order q, and a seasonal moving average operator of order Q, applied to iid white noise at. Additionally, there can be differencing operators of orders d and D for nonseasonal and seasonal differencing, respectively, so a general model can be defined by its orders (p, d, q)(P, D, Q).

A seasonal adjustment method based on ARIMA model decomposition was developed in the SIGEX (signal extraction) approach and later in the TRAMO-SEATS software, where the first part (TRAMO) estimates the deterministic component via regression and removes the trend from the data, and the second part (SEATS) estimates the seasonal and trend-cycle components by ARIMA modeling. The interface software for X12ARIMA and TRAMO-SEATS is called Demetra+. Various characteristics and options of these packages are described, for instance, optimal forecasts by the Kalman filter and canonical decomposition of the series into the trend, seasonal, and cyclical components plus the remaining stationary dynamic features. Structural time series presented by quarterly or monthly observations usually contain trend, seasonal, and irregular components, all assumed uncorrelated. The trend is commonly specified as a random walk with a stochastic drift, and the seasonal component is provided by a set of dummy variables for the time periods in a year, with random disturbances; alternatively, the seasonal pattern can be modeled by a set of trigonometric terms. The models can be presented in state space form, with possible inclusion of additional covariates for capturing specific dynamic variations due to holidays, trading days, and other effects.

Recent developments in real time trend-cycle estimation include recession and recovery analyses based on socioeconomic indicators of leading, coincident, and lagging percentage changes and turning points, which are taken from the current business cycle and compared with past time periods. The Henderson linear filter based on the reproducing kernel Hilbert space (RKHS) methodology and the nonlinear Dagum filter (NLDF) based on the 13-term asymmetric Henderson filter (H13) are described, together with other related techniques.
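In the compact notation usually used for such models (a standard textbook form, not a quotation from the book), the multiplicative seasonal ARIMA (p, d, q)(P, D, Q) model described above can be written φp(B) ΦP(B^s) (1 − B)^d (1 − B^s)^D yt = θq(B) ΘQ(B^s) at, where B is the backshift operator, s is the seasonal period, and at is the white noise.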

Chapter 2 introduces additive and multiplicative models for time series decomposition into latent components associated with a long-term or secular trend superimposed with cyclical movements, which together constitute the business cycle, plus seasonal variations and the remaining irregular component. Series can also be affected by other variations, such as trading day, moving holiday, or festival activity. Herman Wold's theorem demonstrates that a second-order (weakly) stationary process with constant mean and variance can be represented as an infinite MA of uncorrelated white noise, which justifies the applicability of autoregressive moving average (ARMA) models and their extensions to ARIMA and regression-ARIMA (regARIMA) models. Trend estimation can be done by deterministic or stochastic models, where the former use approximations by polynomials and other mathematical functions, and the latter employ finite differences of low order with autoregressive components and moving average errors. The business cycle, a quasi-periodic oscillation lasting about 3–5 years or more, can be modeled similarly to the trend by deterministic models of the Fourier series kind and by stochastic models of ARIMA type, such as autoregressive models of order two with complex roots. The seasonal variations are related to the impact of climate seasons on agriculture, trade, and energy consumption, and to holidays and academic-year retail of goods and services. Seasonal adjustment methods include moving averages or linear filters, and explicit models with a small number of parameters for each component, such as sines and cosines, local polynomials or splines, and the locally weighted regression (LOESS) smoother. Models for moving holiday and trading day components are described as well. The irregular component represents unpredictable events of any kind; for instance, floods and snowstorms can cause power blackouts and have longer-lasting effects that are described by intervention models.
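In the notation commonly used for these decompositions (a standard form, not a quotation from the book), the additive model is yt = Tt + Ct + St + It and the multiplicative model is yt = Tt · Ct · St · It, where Tt is the trend, Ct the cycle, St the seasonal component, and It the irregular component.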

Part I of the book presents seasonal adjustment methods in detail, with examples of application. Chapter 3 deals with the causes and characteristics of seasonality and discusses the meaning, purpose, and methods of seasonal adjustment. Models are divided into two categories, depending on whether the process generating seasonality is assumed to vary only in amplitude or in both amplitude and phase. The main reason for seasonal adjustment is to remove this component from socioeconomic series in the interest of policy and decision making. Various models, including the two methods officially adopted by statistical agencies, X12ARIMA and TRAMO-SEATS, are presented. Chapter 4 describes symmetric and asymmetric smoothing filters, particularly the Census II X11, X11ARIMA and its enhancement X12ARIMA, and regARIMA methods for modeling, forecasting, and testing seasonality and residual seasonality. Examples of seasonal adjustment of the U.S. New Orders for Durable Goods (NODG) series are presented in multiple tables and graphs. Similarly, Chapter 5 describes TRAMO-SEATS modeling with application to the NODG data, including decomposition performed by the symmetric bi-infinite filter of Wiener–Kolmogorov (WK) analysis. Chapter 6 considers structural time series models applied in the STAMP software and illustrates them on US unemployment rate data.

The last part of the book, Part II, describes trend-cycle estimation methods and problems for short-term time series. In contrast to a long cycle, such as the Kondratieff economic cycle of 47–60 years, the short-term trend usually includes cyclical fluctuations and is referred to as the trend-cycle. Chapter 7 describes global trend estimation by deterministic polynomial and exponential regressions and by stochastic ARIMA, TRAMO-SEATS, and STAMP modeling, and local trend estimation by X11ARIMA, X12ARIMA, LOESS, Henderson and Gaussian smoothing filters, and cubic splines. Chapter 8 continues with the Henderson trend-cycle filter in its NLDF modification, its approximation by the cascade linear filter (CLF), RKHS developments, and symmetric and asymmetric smoothers in kernel representations. Chapter 9 generalizes nonparametric fitting and smoothing techniques based on density functions, local polynomials, graduation theory, and spline regression into the unified approach provided by the RKHS methodology, with empirical evaluation on a set of 50 time series taken from Hyndman's time series library. Chapter 10 considers real time trend-cycle predictions derived from the asymmetric MA filters incorporated in X12ARIMA, used by the majority of official statistical agencies in recession and recovery analysis for finding turning points in the main socioeconomic indicators. Properties of the RKHS and Musgrave filters and optimal bandwidth selection are described, and an empirical application is performed for various US economic indicators using data from the Federal Reserve Bank, the Bureau of Labor Statistics, and the National Bureau of Economic Research, for multiple indices. Chapter 11 concludes with estimation of the effects of the ARIMA-based and TRAMO-SEATS seasonal adjustment methods when the real time trend-cycle is predicted with nonparametric kernel RKHS filters; results for US leading, coincident, and lagging indicators are compared and discussed.

Each chapter ends with a list of the most recent references, and the book contains a list of acronyms and a glossary, which facilitates reading through the many terms conventional in this field. For professionals and students dealing with time series data, the monograph can be very useful as a guide to the wide-ranging area of modern modeling and forecasting methods and software.

Stan Lipovetsky

GfK North America, Minneapolis

Editor Reports on New Editions, Proceedings, Collections, and Other Books

Big Data Analytics for Genomics, by Ka-Chun Wong (Ed.). New York: Springer, 2016, viii + 397 pp., $129.00, ISBN: 978-3-319-41278-8.

This edited volume is intended to showcase current research on big data analytics for genomics, a highly specialized but important topic of continued research and interest.

The refereed volume consists of 13 chapters and is logically divided into three parts.

Part I – Statistical Analytics (4 chapters, 167 pages)

Part II – Computational Analytics (4 chapters, 146 pages)

Part III – Cancer Analytics (5 chapters, 81 pages)

It is evident that relatively more space is devoted to statistical analytics and computational analytics. The editor has done a good job in preparing the Preface, which also gives a useful introduction to all the chapters in the form of a brief survey.

Below is a biased selection of topics of the respective papers in the book:

Introduction to Statistical Methods for Integrative Data Analysis in Genome-Wide Association Studies

Causal Inference and Structure Learning of Genotype-Phenotype Networks Using Genetic Variation

Genomic Applications of the Neyman-Pearson Classification Paradigm

Improving Re-annotation of Annotated Eukaryotic Genomes

A Survey of Computational Methods for Protein Function Prediction

Perspectives of Machine Learning Techniques in Big Data Mining of Cancer

NGS Analysis of Somatic Mutations in Cancer Genomes

A Bioinformatics Approach for Understanding Genotype-Phenotype Correlation in Breast Cancer

The edited volume is well organized and structured, and the topics appear in a logical sequence. Most of the chapters are self-contained. The book includes many useful topics and methodologies for researchers and practitioners alike for the analysis of genomic data. Generally speaking, the chapters are written in a way that makes them accessible to readers with moderate to strong knowledge of probability and statistics. However, the volume is oriented more toward applications than toward developing new theory. One of the main strengths of the volume is its use of existing methodology and concepts that are clearly presented and generously illustrated with data examples. The mathematical treatment and formulation are moderate, and the narrative is reasonably modern and clear.

In summary, this is a good collection of work in one place; I think this volume will attract a broader audience. I enjoyed reading a few chapters of the book and found them interesting and useful.

Big Data and Social Science, by Ian Foster, Rayid Ghani, Ron S. Jarmin, Frauke Kreuter, and Julia Lane (Eds.). Boca Raton, FL: CRC Press, 2016, xix + 356 pp., $62.63, ISBN: 978-1-4987-5140-7.

This contributed volume attempts to summarize a host of strategies from data science, a new discipline in its own right, for solving real problems arising mostly in social science. The book provides tools, techniques, and tips on how to identify and gather data and how to use data science strategies to get meaningful answers. Some of the prominent features of the book are:

Hands-on approach to handling so-called big data in social science

Provides tools to deal with big data that are equally useful for both social and data scientists

Showcases real-world applications for a smooth understanding of concepts and methodologies

Links computer science concepts to social science research

Provides data, code, and programming exercises, an important contribution

The book is divided into three parts comprising 12 chapters, including an introductory chapter.

Part I – Capture and Curation (4 chapters)

Part II – Modeling and Analysis (3 chapters)

Part III – Inference and Ethics (remaining chapters)

This is a well-written book that showcases a good number of examples and applications to demonstrate how the methods are actually used in real-life situations with real datasets. Further, the topics at hand are motivated by social science data. The format is standard; notably, most of the figures are in color for accessibility. The chapters are nicely structured, well presented, and motivated by data examples. The main strength of the book is that it offers a good number of applications based on real datasets arising from social science perspectives. The book will be useful to students, practitioners, and data analysts in the respective fields. The editors did a very good job introducing the book, its aims and goals, and its intended audience, and clarifying the underlying concepts and phrases; the introduction is a must-read before moving on to the other chapters.

Below is a biased selection of topics of the respective chapters in the book:

Record Linkage

Databases

Programming with Big Data

Machine Learning

Text Analysis

Networks: The Basics

Information Visualization

Errors and Inference

Workbooks

Again, the chapters of the book are clearly written and accessible. The content is at the right level in most of the chapters. The book offers readers a number of worked examples and datasets and explores a host of applications.

Advanced Statistical Methods in Data Science, by Ding-Geng (Din) Chen, Jiahua Chen, Xuewen Lu, Grace Y. Yi, and Hao Yu (Eds.). New York: Springer, 2016, xvi + 222 pp., $142.86, ISBN: 978-981-10-2593-8.

Data science is an emerging field that was arguably recognized as a science only around 15 years ago.

The aim of the edited volume is to present statistical methodological developments and emerging strategies in the arena of data science. The volume consists of 12 contributed chapters grouped equally into three parts:

Data Analysis Based on Latent or Dependent Variable Models

Life Time Data Analysis

Applied Data Analysis

Almost all chapters of the book are reasonably well written and clearly presented. The handbook is well organized and structured, and the chapters are self-contained. The editors did a good job of introducing all the chapters of the book in the Preface. The mathematical/technical level is right for the targeted audience in most of the chapters. Some chapters are relatively more applied in nature than others, and overall there is a good blend of theory and application.

Below is a biased selection of topics of the respective chapters in the book:

Regularization in Regime-Switching Gaussian Autoregressive Models

Modeling Zero Inflation and Overdispersion in the Length of Hospital Stay for Patients with Ischaemic Heart Disease

Group Selection in Semiparametric Accelerated Failure Time Model

A Proportional Odds Model for Regression Analysis of Case I Interval-Censored Data

Empirical Likelihood Inference Under Density Ratio Models Based on Type I Censored Samples: Hypothesis Testing and Quantile Estimation

Maximum Smoothed Likelihood Estimation of the Centre of a Symmetric Distribution

Modelling the Common Risk Among Equities: A Multivariate Time Series Model with an Additive GARCH Structure

In conclusion, this handbook is a good collection of material on useful and interesting topics in data science. The book will be useful to graduate students and researchers interested in gaining perspective and knowledge on this useful topic. It comprises a wealth of information, offers one-stop shopping of a sort, and can serve as a research reference book.

Readers can also benefit from the volumes and a special issue on data science and related subjects edited by Ahmed (2014, 2017) and Xu, Hajiyev, and Ahmed (2016).
