REVIEWS OF BOOKS AND TEACHING MATERIALS

Pages 424-433 | Published online: 21 Nov 2016
 

Hongjian Zhu

The University of Texas Health Science Center at Houston School of Public Health

Models for Dependent Time Series. Granville Tunnicliffe Wilson, Marco Reale, and John Haywood. Boca Raton, FL: Chapman & Hall/CRC Press, 2015, xv + 323 pp., $89.95 (H), ISBN: 978-1-58-488650-1.

This book is concerned with the statistical analysis of multiple dependent time series. According to the preface, Models for Dependent Time Series is written with the objective of communicating an appreciation and understanding of the issues that arise, and the methodology that can be applied, when dependence between time series is being modeled. Standard texts on time series analysis are concerned with modeling and theory in the time or frequency domain. The graphical modeling approach to time series has recently become very popular in statistics, and the theory and methods for its use have been greatly developed and extended. This book covers three important pillars of multiple time series analysis: vector autoregressive modeling, spectral analysis, and graphical models. This is a useful characteristic for a modern book on time series, since each approach brings new insights to an analysis and each can complement the others. The book is well written and should be accessible to anyone with a good understanding of multiple linear regression.

The first chapter gives an introduction to time series using examples and provides a flavor of the topics covered in the rest of the book. Basic statistical terminology essential to understanding the contribution of each chapter is introduced here in a concise manner. Chapter 2 deals with vector autoregressive modeling. It begins with a brief description of second-order stationarity, followed by fundamental concepts like autoregressive (AR) approximation of time series, multistep AR prediction, vector autoregressive moving average models, and their state space representation, moving on to other topics and then finally setting the ground for spectral analysis, which is the subject of Chapter 3. The focus here is on motivating the idea of AR modeling in the context of minimum mean square error prediction.

Chapter 3 briefly introduces spectral analysis as a preliminary to understanding the spectral properties of the AR model. The exposition of concepts is clear, with limited use of formal definitions and theorems. References to standard texts on spectral analysis of multiple time series are provided for the reader interested in checking theoretical details. Some of the terminology differs from that used in most standard texts on this topic. For example, “Fourier coefficients” are termed “harmonic contrasts,” where the term “contrasts” is borrowed from the language of analysis of variance. This is clearly stated early in the chapter, however, and should cause little difficulty for someone consulting several books. Given the emphasis on the practical aspects of time series analysis, a few things could have been better. First, the periodogram is referred to on page 65 of the book, but its easy computation via the fast Fourier transform (FFT) is not mentioned until the complex exponential notation is introduced in Chapter 4. Second, in the last section of this chapter the representation of time series as linear functions of the spectral coefficients is motivated and discussed without a formal reference to the spectral representation theorem, a very important theorem in spectral analysis. The main strength of this chapter is the section on practical examples of spectral analysis using six datasets. In this section, the authors address topics such as the choice of sampling interval, effects of aliasing, correction for seasonal effects, and other issues pertaining to practical spectral analysis.
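To make the periodogram referred to above concrete, here is a minimal sketch that is not taken from the book. It evaluates the periodogram at the Fourier frequencies with a direct discrete Fourier transform, using only the Python standard library. The function name `periodogram` and the scaling by 1/n are assumptions (conventions differ between texts), and in practice an FFT routine would replace the O(n^2) loop.

```python
import cmath
import math


def periodogram(x):
    """Return I(k) = |sum_t (x_t - mean) * exp(-2*pi*i*k*t/n)|^2 / n for k = 0..n//2."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    values = []
    for k in range(n // 2 + 1):
        # Direct DFT at Fourier frequency k/n; an FFT gives the same numbers faster.
        coef = sum(c * cmath.exp(-2j * math.pi * k * t / n) for t, c in enumerate(centered))
        values.append(abs(coef) ** 2 / n)
    return values


# A pure cosine completing 5 cycles over 64 points should peak at index 5.
series = [math.cos(2 * math.pi * 5 * t / 64) for t in range(64)]
spec = periodogram(series)
peak = max(range(len(spec)), key=spec.__getitem__)
```

For this artificial series all of the power sits at the fifth Fourier frequency, which is the behavior one would hope to see before applying the same calculation to real data.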

Chapter 4 deals with the fitting of a vector autoregressive (VAR) model to an observed time series. It outlines three methods for estimating model parameters: (i) solving the Yule–Walker equations, (ii) least-squares regression on lagged values, and (iii) Gaussian maximum likelihood estimation (MLE). The methods are illustrated with an example, including a discussion of how the spectrum of a VAR(p) model can be used to check whether the fitted model provides an adequate representation of the structure. A clear comparison of the three estimation techniques, as well as a detailed discussion of concerns when using the AIC for order selection, makes this chapter particularly useful. VAR models with exogenous variables and the Whittle likelihood of a time series are also discussed in the last two sections of the chapter.
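The first two estimation routes can be illustrated in the simplest possible setting. The sketch below is not from the book: it uses a univariate AR(1) model as a stand-in for the vector case, and the function names, the simulated series, and the true coefficient of 0.6 are all assumptions made for illustration. Gaussian MLE is omitted because it needs a numerical optimizer.

```python
import random


def simulate_ar1(phi, n, seed=0):
    """Simulate x_t = phi * x_{t-1} + e_t with standard normal innovations."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x


def yule_walker_ar1(x):
    """Yule-Walker estimate: lag-1 sample autocovariance over lag-0 autocovariance."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x) / n
    c1 = sum((x[t] - m) * (x[t - 1] - m) for t in range(1, n)) / n
    return c1 / c0


def least_squares_ar1(x):
    """Regress x_t on x_{t-1} (with intercept) and return the slope."""
    y, z = x[1:], x[:-1]
    my, mz = sum(y) / len(y), sum(z) / len(z)
    sxy = sum((a - my) * (b - mz) for a, b in zip(y, z))
    sxx = sum((b - mz) ** 2 for b in z)
    return sxy / sxx


x = simulate_ar1(phi=0.6, n=2000)
phi_yw = yule_walker_ar1(x)
phi_ls = least_squares_ar1(x)
```

With a long series the two estimates should nearly coincide; they differ only in how the end points of the series enter the sums, which is the kind of distinction that matters more for short series.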

Chapter 5 is on graphical modeling of structural VARs. Unlike standard or canonical VAR models, where the dependence is only on past values, a structural VAR allows one to capture dependence between innovations by including explicit terms for dependence between current variables. The authors clarify the usage of the term structural in a structural VAR as opposed to its original use for structural models in econometrics. They introduce the conditional independence graph (CIG) and discuss how a CIG can be constructed using partial correlations between normally distributed variables. Relevant tests for significance of partial correlations used in the graphical modeling literature are also provided. The construction of a CIG is illustrated using the example of the flour price series innovations. Subsequent sections are devoted to understanding how one can draw conclusions about possible directed acyclic graph representations of the variables, to properties of partial correlation graphs, and to simultaneous equation modeling. The last two sections give a detailed construction of a structural VAR model for the pig market series.
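The link between partial correlations and missing edges in a CIG can be seen with three variables. The following sketch is an illustration of the general idea rather than the authors' procedure; the function name `partial_corr` and the correlation values are hypothetical. For jointly normal variables, a zero partial correlation between X and Z given Y corresponds to no edge between X and Z.

```python
import math


def partial_corr(r_xz, r_xy, r_yz):
    """Partial correlation of X and Z given Y from the three pairwise correlations."""
    return (r_xz - r_xy * r_yz) / math.sqrt((1 - r_xy ** 2) * (1 - r_yz ** 2))


# Chain X -> Y -> Z: the X-Z correlation is entirely explained by Y,
# so corr(X, Z) = corr(X, Y) * corr(Y, Z) and the partial correlation vanishes.
r_xy, r_yz = 0.8, 0.6
chain = partial_corr(r_xy * r_yz, r_xy, r_yz)

# If X and Z are more correlated than the chain implies, an X-Z edge remains.
direct = partial_corr(0.7, r_xy, r_yz)
```

In practice the partial correlations are estimated, so the decision to drop an edge rests on a significance test of the kind the chapter describes.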

Chapter 6 introduces an extension of the VAR model, the so-called VZAR model, and its key properties. It discusses how this model can be used to approximate a stationary multivariate process with specified covariance function or spectrum to the same degree of accuracy as a canonical VAR but using a lower order model. Extensions of Yule–Walker fitting, lagged regression, MLE, and order selection discussed in Chapter 4 for VAR models, are now presented for the VZAR model. The VZAR model is illustrated using the infant monitoring series. Chapter 7 on continuous time VZAR models follows a similar set-up where the continuous time versions of AR are introduced followed by the continuous time VZAR (VCZAR). The three estimation routes, that is, Yule–Walker fitting, lagged regression, and MLE are discussed and an illustration using the infant monitoring series is provided.

Chapter 8 is a practical guide to modeling irregularly sampled time series. Time series from Kirchner and Weil (2000) giving the rates of extinctions and originations of species from the fossil record accumulated over stratigraphic intervals of varying lengths are used to demonstrate the application of the VZAR models discussed in Chapter 7. In doing so, issues of practical concern such as the choice of the rate parameter for the VZAR model and the possibility of observation noise are carefully addressed. The authors move on to discuss and further demonstrate how one can deal with issues that arise in selecting suitable models for this time series, namely, the fairly small number of observations, the presence of observation noise, the effect of trend estimation on removing low-frequency variation, and the nonlinearity of the models. To gain further insight, the authors examine the sample spectra of the series to obtain guidance on how a good model may be found. Modeling for regularly sampled bivariate series using VCZAR is illustrated with the gas furnace series introduced early in the book. Its extension to irregularly sampled bivariate series, with application to the joint modeling of the extinction and origination time series, is presented in the last section of the chapter.

Links between graphical, spectral, and VZAR approaches are presented in Chapter 9. For example, the chapter shows how partial coherency graphs can be applied in the frequency domain to gain further insight into the graphical modeling approach introduced earlier, and how partial correlation graphs can be extended to identify structural forms of the VZAR model. The final section of this last chapter also outlines some possible extensions that the reader is encouraged to investigate.

As discussed in the paragraphs above, the authors are successful in communicating concepts central to modeling time series in the time and frequency domain as well as using the graphical modeling approach. The numerous examples used to illustrate techniques covered in the chapters are easy to follow and this makes the book very useful. It does not include any exercises and therefore may not be suitable for someone new to time series who is trying to learn without guidance. However, with an experienced teacher the book has great pedagogical strength. Given the range of topics covered in this book, I would consider it useful not only for students but also for others doing research in this area. The choice of content for the chapters as well as the references for topics covered in the book is excellent. The book may be used as a standalone text, or to supplement standard texts on time series. In my opinion it is a valuable addition to the literature on time series analysis.

Swati Chandna

University College London

Modern Adaptive Randomized Clinical Trials: Statistical and Practical Aspects. Oleksandr Sverdlov, ed. Boca Raton, FL: Chapman & Hall/CRC Press, 2015, xvii + 515 pp., $109.95 (H), ISBN: 978-1-48-223988-1.

Consisting of 23 chapters written by different authors, the book covers many interesting topics about adaptive randomization and provides insights from different perspectives. The book answers questions such as: “Is adaptive randomization always better than traditional fixed-schedule randomization?,” “Which procedures should be used and under which circumstances?,” “What special considerations are required for adaptive randomized trials?,” and “What kind of statistical inference should be used to achieve valid and unbiased treatment comparisons following adaptive randomization designs?” The book assumes a basic understanding of clinical trials as well as adaptive designs. Experienced statisticians who have a focused interest in adaptive randomization will find the book useful.

The 23 chapters are grouped into seven parts. In the first part, the editor gives an introduction to adaptive randomization, explains terminology, outlines the four main categories of adaptive randomization (restricted randomization, covariate-adaptive randomization, response-adaptive randomization, and covariate-adjusted response-adaptive randomization), and lays out the structure of the book. General discussion on when to use each type of adaptive randomization is also included. Throughout, there is a common theme regarding the trade-off between balancing the treatment arms to maximize efficiency (especially with small samples) and randomization to control bias.

Part II focuses on restricted randomization and includes Chapters 2, 3, and 4. These chapters cover, respectively, Efron’s biased coin design, adaptive biased coin designs, and relatively new adaptive randomization methods called Brick Tunnel and Wide Brick Tunnel randomization.
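For readers unfamiliar with the first of these procedures, Efron’s biased coin design can be sketched in a few lines. This is an illustration written for this summary, not code from the book; the function names are hypothetical, and the bias parameter p = 2/3 is the value Efron suggested. When the arms are balanced the next patient is assigned by a fair coin; otherwise the smaller arm is favored with probability p.

```python
import random


def efron_prob_a(n_a, n_b, p=2 / 3):
    """Probability of assigning the next patient to arm A under Efron's biased coin."""
    if n_a == n_b:
        return 0.5
    return p if n_a < n_b else 1 - p


def simulate_trial(n_patients, p=2 / 3, seed=1):
    """Return final arm sizes (n_a, n_b) for one simulated allocation sequence."""
    rng = random.Random(seed)
    n_a = n_b = 0
    for _ in range(n_patients):
        if rng.random() < efron_prob_a(n_a, n_b, p):
            n_a += 1
        else:
            n_b += 1
    return n_a, n_b


n_a, n_b = simulate_trial(100)
```

The design keeps each assignment random, which limits predictability, while pulling the arm sizes back toward balance. This is the trade-off between balance and randomness that runs through the book.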

Part III, including Chapters 5–8, presents covariate-adaptive randomization. In Chapter 5, a novel covariate-adaptive randomization design named “the minimal sufficient balance method” is reviewed, covering the concept, the implementation, and an example. Chapter 6 discusses optimal model-based covariate-adaptive randomization and uses loss and bias to assess the allocation rules. Chapter 7 addresses the issue of inference following covariate-adaptive randomization. Chapter 8, titled “Covariate-adaptive randomization with unequal allocation,” reviews the methods to use when randomization is not intended to give equal allocation to two treatment arms.

Part IV covers response-adaptive randomization. Chapter 9 discusses the dilemma that future patients and current patients in a clinical trial do not have the same interests: for the former the focus is on inference, while for the latter the emphasis is on the treatment outcomes in the current trial. The author proposes an optimal allocation design that considers both inference and treatment. Chapter 10 gives a nice overview of response-adaptive randomization, covering the basic framework as well as issues related to efficiency and variability. Chapter 11 addresses the problem of making inferences, including estimation and hypothesis testing, with a response-adaptive randomization trial. Chapter 12 proposes a special method that combines conditional power and adaptive randomization to reestimate the sample size when missing data are present. Chapter 13 focuses on outcome-adaptive randomization, where not only the treatment assignments but also the observed outcomes are used to compute the allocation probabilities for new patients.

Part V presents a novel type of adaptive randomization named covariate-adjusted response-adaptive (CARA) randomization, intended for settings in which treatment-covariate interaction exists. Part V includes Chapters 14–17 and covers special topics such as efficient and ethical adaptive clinical trial designs to detect treatment-covariate interaction, the impact of missing data on the statistical properties of longitudinal covariate-adjusted response-adaptive randomization, a new group-sequential CARA design using LASSO, and Bayesian adaptive randomization. Part VI of the book, consisting of Chapters 18–20, covers randomization with treatment selection, where some arms of a multi-arm trial will be eliminated at interim analyses. The three chapters in Part VII each present an interesting real-life case study and cover many practical aspects of adaptive design trials, such as the use of simulation, the role of the Data Monitoring Committee (DMC), and reverse engineering.

Overall, the book covers many aspects of adaptive randomization studies and reports many new developments in the field with detailed statistical considerations. As regulatory reviewers, we can see ourselves delving into this book when we receive a submission with an adaptive randomization trial. These trials are more commonly used where smaller sample sizes, and potentially early stopping, can lead to relatively large deviations from the desired randomization ratios or large imbalances in covariates. The book would also prove a valuable resource for researchers in the field of adaptive randomization. One area for improvement could be more discussion of the practical issues of trial conduct. Although the book is subtitled “Statistical and Practical Aspects,” the discussion of practical issues is concentrated in the last three chapters, where case studies are reported. The rest of the book focuses more on methodologies; a thorough evaluation of the possible pitfalls and difficulties in logistics and trial conduct for each of those methods would inspire more efforts toward application of these methods.

Xiting Yang and Gerry Gray

Food and Drug Administration

Power Analysis of Trials With Multilevel Data. Mirjam Moerbeek and Steven Teerenstra. Boca Raton, FL: Chapman & Hall/CRC Press, 2015, xix + 268 pp., $89.95 (H), ISBN: 978-1-49-872989-5.

Statisticians who interact with the broader community of researchers will no doubt be familiar with requests for assistance with power analyses. I suspect the frequency of these requests has increased in recent years as more and more organizations that fund research (e.g., government agencies, large foundations) require formal power analyses to justify the sample sizes used in the studies that they fund. In recent years the requirement for a power analysis has even begun to be regarded as an ethical issue. Institutional Review Boards are increasingly asking researchers to justify the sample size used in a study via power calculations.

Given this environment, the appearance of the book, Power Analysis of Trials with Multilevel Data, is well timed. Recent years have seen an increased understanding of the need to account for correlated data structures at both the design and analysis phases of a study. Multilevel models (and corresponding software implementations) provide a general framework for achieving this goal. As such, many more nonstatisticians are familiar with these models than was the case 15 years ago. As noted by the authors, there has been much recent research on power and optimal design for nested data. The objective of the current book is mainly to synthesize that research, rather than to present novel results.

For the most part mathematical details are omitted; the appropriate audience for the book seems to be the applied researcher with a moderate background in statistical methods. Some prior exposure to standard formulations of multilevel models for continuous and dichotomous outcomes would be beneficial, as the explanation in the book is brief. Most examples are drawn from behavioral health settings, and the level of the book (and some of the content) is similar to what can be found in Donner and Klar (2000).

There are three major approaches to statistical power analysis: the computation of power for fixed specifications of effect size and sample sizes; the computation of sample size(s) for a prespecified power level and effect size (and possibly prespecified sample sizes at other levels of a multilevel design); and the computation of a minimum detectable effect size (MDES; Bloom 1995) for a fixed power level and fixed sample sizes. The book focuses almost exclusively on the second of these approaches. While this is a logical choice if one had to choose a single approach, I would have liked to see more attention given to the MDES approach. It plays a major role in the Optimal Design (Spybrook et al. 2011) software for multilevel power analysis and can be quite useful when there is little flexibility with respect to sample size and little consensus regarding what might constitute a practically meaningful (or a reasonably expected) effect size.
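The three approaches are rearrangements of one relationship, which a small example makes plain. The sketch below is an illustration for independent data only, not a formula quoted from the book. It assumes a two-sided two-sample z-test with equal group sizes and a standardized effect size d, uses normal critical values, and ignores the negligible rejection probability in the opposite tail. The function names are hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal


def power(d, n, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test, n per group, effect size d."""
    return Z.cdf(d * math.sqrt(n / 2) - Z.inv_cdf(1 - alpha / 2))


def sample_size(d, target_power=0.80, alpha=0.05):
    """Smallest n per group reaching the target power for standardized effect d."""
    z_sum = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(target_power)
    return math.ceil(2 * (z_sum / d) ** 2)


def mdes(n, target_power=0.80, alpha=0.05):
    """Minimum detectable standardized effect for n per group."""
    z_sum = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(target_power)
    return z_sum * math.sqrt(2 / n)


n_needed = sample_size(0.5)      # approach 2: solve for n
achieved = power(0.5, n_needed)  # approach 1: solve for power
detectable = mdes(n_needed)      # approach 3: solve for the effect size
```

When the sample size is fixed by circumstances, only the first and third functions are of use, which is the situation in which the MDES view is most helpful.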

The book has an uneven beginning. Chapter 1 is dedicated to an overview of the logic of randomized experiments, research design, and power analysis. The main research designs covered in the book are listed and described in one or two paragraphs each. The design types are cluster randomized, multisite, pseudo cluster randomized, individually randomized with group treatment, and longitudinal. The chapter feels a bit haphazard, with too many facts relating to weighty issues in research design crammed into too small a space. For instance, nonrandomized designs are both introduced and dismissed within the space of a few sentences. I was, however, happy to see a discussion of the connection between sampling and randomness in multilevel models and of the need for experimental subjects to be randomly sampled from a population of interest in order for results to be generalized to such a population. These issues are too frequently ignored in writings about experimental design.

Chapter 1 also demonstrates a tendency toward digression on tangential topics. For instance, we find two paragraphs on page 3 dedicated to the technical issue of drawing a distinction between blocks and strata and a paragraph on page 17 about meta-analysis with little connection to the discussion before or after. This tendency toward digression improves over the course of the book.

Chapter 2 introduces the reader to the basics of multilevel models for continuous and dichotomous outcomes. It is a nice introduction, although with too much focus on details of estimation and interpretation that play little role in the methods discussed in subsequent chapters.

Throughout the book the authors (perhaps strategically) avoid many of the difficult issues that arise when modeling binary outcomes. For example, in Chapter 2, attention is limited to models with a logit link function. Additionally, ambiguity regarding how intracluster correlation coefficients should be defined in binary models is acknowledged, but the discussion is terse, leaving the reader with little guidance regarding the possible implications of making one choice or the other.

Chapter 3 describes the basics of power analysis for independent data. It contains a nice (albeit brief) discussion of the practice of post hoc power analysis. It points out that one need not determine sample size on the basis of achieving a particular level of statistical power. For instance, one could instead determine the sample size needed to achieve a particular expected width of a confidence interval. Of course, like grant making agencies everywhere, the authors proceed to ignore this approach and subsequently focus exclusively on power analysis. Chapter 3 also introduces one of the best features of the book, which is its accessible presentation of methods for designing studies that take cost into account. This is typically done by maximizing power subject to a cost constraint or by minimizing cost for a given power requirement. Surprisingly, these techniques are rarely given much attention in general purpose books on applied experimental design such as this one. Instead one usually needs to turn to specialized books on optimal design, such as Berger and Wong (2009). I was happy to see that this book is an exception to that rule.
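A small example shows what taking cost into account means for independent data. It is a sketch under stated assumptions, not a reproduction of the book's presentation: two arms with a common outcome variance, a fixed total budget, and per-subject costs that differ between arms. Minimizing the variance of the difference in means under the budget gives group sizes in the ratio n1/n2 = sqrt(cost_2/cost_1). The function names and cost figures are hypothetical, and the group sizes are left unrounded.

```python
import math


def optimal_allocation(budget, cost_1, cost_2):
    """Group sizes minimizing 1/n1 + 1/n2 subject to cost_1*n1 + cost_2*n2 = budget."""
    root = math.sqrt(cost_1 * cost_2)
    n1 = budget / (cost_1 + root)
    n2 = budget / (cost_2 + root)
    return n1, n2


def variance_factor(n1, n2):
    """Variance of the difference in means, in units of the common variance."""
    return 1 / n1 + 1 / n2


# Hypothetical costs: treated subjects cost 4 units each, controls 1 unit.
n1, n2 = optimal_allocation(budget=600, cost_1=4, cost_2=1)
equal = 600 / (4 + 1)  # equal group sizes under the same budget
```

Here the budget buys 100 treated and 200 control subjects, which gives a smaller variance (and hence more power) than 120 subjects in each arm at the same total cost.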

The following chapters step the reader through power computations for each of the designs listed above (cluster randomized, multi-site, etc.). The exception is Chapter 5, which discusses methods for improving statistical power in cluster randomized trials. This discussion is limited to two level models. Chapter 10 extends results to three level models. Chapter 11 focuses on intracluster correlation coefficients (sources for information about them, what to do if they are not known). Chapter 12 is about computer software for power calculations.

Generally, the authors do a nice job introducing ideas using multilevel modeling notation (which I find a more intuitive way to understand how variance components should be interpreted) and then drawing a connection to mixed effects ANOVA models. Unfortunately, they violate this rule when discussing more complex designs: Crossover designs and stepped wedge designs (in Chapter 5) and pseudo cluster randomized designs (Chapter 7) are all presented exclusively in mixed effects notation, making the logic for the model specification less obvious.

Another nice feature of the book is the example power analyses that conclude most chapters (and sometimes appear earlier in chapters as well). The authors have done a very good job finding articles in the literature that use a particular design, extracting relevant parameters from those articles, and then illustrating how to use those parameters to plan a replication study. My only minor quibble would be that the examples are focused on only one particular design that has been decided on ahead of time. It would have been nice to see an example that walked the reader through the decision-making process of choosing between, for example, a cluster randomized cross-sectional design, a longitudinal design, and a pseudo cluster randomized design.

One major limitation of the book is its focus on presenting sample size equations that rely on critical values computed from the normal distribution (rather than, for instance, critical values from a t distribution). This, of course, allows for a simple presentation of sample size formulas. However, if computer software had been introduced earlier in the book (rather than being relegated to a very short chapter at the end of the book), it could have been used to illustrate computations using a t distribution. More focus on using software for power computations would also have been more in line with the preferred approach of most researchers, who are more likely to seek out software to compute sample sizes than to plug numbers into an equation and compute sample sizes by hand.

The authors justify the large-sample approach by claiming that multilevel models should not be used for studies with fewer than 20 clusters. However, an alternative approach is not suggested, and it is certainly not uncommon to see cluster randomized trials with fewer than 20 clusters. Community intervention trials are a prime example. As it is, important issues regarding the computation of degrees of freedom in mixed effects models are ignored entirely.

The second important limitation of the book is its treatment of dichotomous outcomes. One problem is that the authors never settle on a consistent procedure for estimating the standard errors needed to compute power, and they do not clearly connect the test statistics used to derive sample size formulas to particular statistical models. In the chapter on cluster randomized designs, sample size formulas are provided for both the risk difference and the odds ratio. The first seems to be based on a literature that corrects tests for independent data using design effects, whereas the second is based on asymptotic arguments associated with generalized linear mixed models. The standard error in each case depends on an ICC, the definition of which is potentially ambiguous. However, the reader is given no guidance regarding when a particular sample size formula or ICC definition should be chosen. In the chapter on multisite designs only the formula based on logistic modeling of the odds ratio is provided. In Chapter 7, an entirely different model (a hierarchical binomial model) is used to determine standard errors.
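The design-effect route for the risk difference mentioned above can be sketched as follows. This is an illustration of the general approach under stated assumptions, not the book's own formula: clusters of equal size, a normal-approximation sample size for two independent proportions, and inflation by the design effect 1 + (m - 1) * ICC. The function names and input values are hypothetical, and which definition of the ICC is plugged in is exactly the ambiguity noted in the text.

```python
import math
from statistics import NormalDist


def n_per_arm_independent(p1, p2, power=0.80, alpha=0.05):
    """Normal-approximation sample size per arm for a risk difference, independent data."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return z_sum ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2


def design_effect(cluster_size, icc):
    """Variance inflation from randomizing clusters of equal size."""
    return 1 + (cluster_size - 1) * icc


def clusters_per_arm(p1, p2, cluster_size, icc):
    """Inflate the independent-data sample size, then convert subjects to whole clusters."""
    n = n_per_arm_independent(p1, p2) * design_effect(cluster_size, icc)
    return math.ceil(n / cluster_size)


# Hypothetical inputs: risks of 0.30 vs 0.20, 20 subjects per cluster, ICC of 0.05.
n_flat = n_per_arm_independent(0.30, 0.20)
k = clusters_per_arm(0.30, 0.20, cluster_size=20, icc=0.05)
```

Even a modest ICC nearly doubles the required number of subjects in this example, so the choice of ICC definition is not a cosmetic matter.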

Another problem is that the authors never address whether and how researchers should use covariates in conjunction with binary outcomes. The literature is confusing on this point, with some articles showing that covariate adjusting treatment effect estimates harms power and others showing that it helps. So I grant that this is a difficult issue to discuss in a book at this level. However, I do not agree that the book should ignore the issue entirely.

Schochet (2013) showed that a generalized estimating equations approach to modeling binary data ensures that covariates function the same way in models for binary outcomes as they do in linear models for continuous outcomes. However, the generalized estimating equations approach is ignored in the book.

The book has its share of typographical errors, and occasionally these contribute to difficulties in following the argument. For instance, the discussion on page 104 seems mistaken until one realizes that the wrong variance components are in the numerator of the two equations defining intracluster correlation coefficients.

Despite the above critiques, there is more good than bad in this book. Simple sample size formulas are provided for standard designs and for a few more advanced designs as well. Optimal designs using cost constraints are defined for a wider variety of designs than in other books I am aware of. Despite its flaws, I think this book deserves a place on the bookshelf of both researchers who plan experimental studies and statisticians who advise them.

Christopher H. Rhoads

University of Connecticut

Spatio-Temporal Methods in Environmental Epidemiology. Gavin Shaddick and James V. Zidek. Boca Raton, FL: Chapman & Hall/CRC Press, 2015, xxxi + 365 pp., $89.95 (H), ISBN: 978-1-48-223703-0.

The past two decades have seen a rapid expansion in the development of spatial and spatio-temporal statistical methods and a corresponding expansion in the number of books and textbooks outlining the theoretical and applied sides of this growing analytic toolbox. Within the field of environmental epidemiology, such methods have grown to address spatial, temporal, and spatio-temporal prediction of exposures for given locations in space and/or time, and the association to health outcomes observed in individuals living, working, or moving within the same area and experiencing the predicted exposures. The specific area of air pollution epidemiology provides multiple challenges in this regard: first, air pollution levels for multiple pollutants are monitored at fixed locations providing temporal and spatial snapshots of a complex, multidimensional, dynamic environmental process; and, second, health outcome data often occur as aggregate outcomes of individuals living in enumeration districts often collected for nonepidemiological purposes (e.g., billing). The epidemiologic, statistical, and geographical challenges in linking these data and providing epidemiologic insight are numerous, compounded in the space-time setting, and require thoughtful application of challenging methods to provide insight into underlying disease processes. The authors of this text, both accomplished researchers in the area, provide a much-needed consolidation of spatio-temporal modeling methods with an overall goal “to promote the interface (of environmental epidemiology and spatio-temporal modelling) between statisticians and practitioners to allow rapid advances in the field of spatio-temporal statistics to be fully exploited in assessing risk to health” (from the preface).

The textbook condenses many complex topics into accessible and manageable chapters addressing key elements of modern spatio-temporal analyses of environmental epidemiologic data. The book sets the stage by giving an overview of a very general hierarchical framework for the analysis of environmental data, a framework popularized in climate science by Berliner (1996) and in general environmental statistics by Wikle (2003). This broad hierarchy consists of the observation process (also referred to in the literature as the data process or the measurement process), the underlying environmental process driving space-time dynamics, and the (prior) distributions of parameters for both of the processes. The framework provides a solid conceptual setting for the methods in the book, as well as the Bayesian framework for analysis described throughout. The authors build on this framework to outline a very helpful list of elements of a strong spatio-temporal model, nicely preparing the reader for the details that follow. Next, the authors provide a whirlwind tour of epidemiologic study designs (e.g., cohort, case–control), generalized linear models (including smoothers and splines), and detailed definitions of Poisson and logistic models common in standard, nonspatial epidemiology. The authors provide helpful R examples throughout.
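The three-level hierarchy described above is often written as a factorization of the joint posterior distribution. The notation below is an assumption made for illustration and is not taken from the book: Y denotes the observed data, Z the underlying environmental process, and θ_Y and θ_Z the parameters of the observation and process models.

```latex
\[
  p(Z, \theta \mid Y) \;\propto\;
  \underbrace{p(Y \mid Z, \theta_Y)}_{\text{observation model}}\,
  \underbrace{p(Z \mid \theta_Z)}_{\text{process model}}\,
  \underbrace{p(\theta_Y, \theta_Z)}_{\text{parameter model}},
  \qquad \theta = (\theta_Y, \theta_Z).
\]
```

Written this way, each level can be specified separately, which is what makes the framework a convenient organizing device for the Bayesian methods that follow.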

Chapters 3 and 4 provide a readable but thought-provoking overview of theoretical (and philosophical) concepts of uncertainty and Bayesian statistics. The two chapters together provide a valuable primer on concepts and terms often overlooked by readers seeking to jump immediately into complex modeling algorithms, and I plan to recommend them to students and colleagues just entering the field as a great source for the underlying concepts, described with examples relevant to the environmental epidemiology setting. Chapter 5 quickly follows, illustrating the use of Markov chain Monte Carlo and integrated nested Laplace approximation for implementing Bayesian modeling concepts, demonstrated with WinBUGS and R-INLA examples.

Chapters 6 and 7 detail strategies and challenges for modeling large environmental datasets. Topics include variable selection methods with a thoughtful discussion of the role of p-values (mirroring recent discussions in the American Statistical Association's consensus statement on p-values, Wasserstein and Lazar 2016), model averaging, and model comparisons (e.g., through Bayes factors). Analytic challenges such as missing data, measurement error, and preferential sampling often arise in environmental epidemiology and are each described in detail along with focused data examples and accompanying code.

Chapters 8–10 separately examine methods for spatio-temporal estimation of local disease risk, methods for spatio-temporal prediction of continuous exposure fields, and methods for modeling temporal variation in exposures, respectively. Chapter 8 introduces spatio-temporal models for the small area estimation of health outcome risks and rates (i.e., disease mapping). The authors provide accessible introductions and examples for concepts ranging from “borrowing strength” and shrinkage estimation through Markov random fields and the use of conditional autoregressive prior distributions, again with multiple examples and implementation in both WinBUGS and R-INLA. Chapter 9 follows with a focus on prediction of the exposure field from observations from a network of monitors. Methods include estimation of spatial correlation and both standard and model-based kriging, and, again, the authors provide relevant examples including a detailed assessment of NO2 exposures across all of Europe. Chapter 10 provides an overview of time series methods applied to environmental pollution data including forecasting, filtering, and the spectral domain.

Chapters 11 and 12 bring exposure and health outcome data together, with specific focus on studies assessing health impacts of air pollution in data measured over space and time. Chapter 11 provides the methodological framework, again with many relevant and helpful examples and accompanying code, while Chapter 12 provides an important review of the many things that can (and often do) go wrong in large-scale epidemiologic analyses including aggregation (ecologic) bias and hidden exposure pathways. The authors outline mechanisms for acknowledging ecological bias and models for estimating personal exposures for individuals moving through various micro-environments.

Chapter 13 provides an in-depth look at design criteria for spatial and spatio-temporal monitoring of exposure fields to improve prediction of continuous exposures and aid in estimation of resulting health effects. The authors describe several existing large-scale monitoring networks and include elements of sampling and optimal design theory as applied to these networks. Examples include air pollution, temperature, acid deposition, and sampling in stream networks.

Chapter 14 concludes by pushing the envelope into emerging applications of spatial and spatio-temporal environmental statistics. The authors review methods for assessing nonattainment of regulatory limits, methods for modeling infectious disease dynamics in space and time, spatial deformation to improve statistical analysis, and extending methods to address networks of multivariate outcomes where individual monitors may or may not measure every pollutant.

The text covers a remarkable number of topics in its 318 pages (including many full color graphics and examples of code and output). The structure outlined above provides excellent coverage of many areas of recent development, held together with compelling examples and illustrations. Not all topics are easy and some may not be immediately accessible to all epidemiologic practitioners (e.g., spectral methods and Bochner's lemma in Chapter 10), but advanced topics are clearly marked and available in context for readers interested in digging deeper, without being overly distracting for those seeking to move along to the next topic. Most (but not all) examples directly relate to air pollution epidemiology, so other topics in environmental epidemiology (e.g., toxicology) are not stressed in detail. That said, the focus of the text is on spatio-temporal methods of analysis, and air pollution epidemiology's combination of complex, multivariate, and dynamic space-time fields of exposure and space-time-specified health outcomes provides the quintessential example for space-time studies in the field.

Overall, I found the book a comprehensive overview placing many different topics into a logical perspective with focused, helpful examples. I enjoyed reading the book, am already recommending it to colleagues, and anticipate referring to it often in my future work.

Lance A. Waller

Emory University

http://orcid.org/0000-0001-5002-8886

Statistical Data Analytics: Foundations for Data Mining, Informatics, and Knowledge Discovery. Walter W. Piegorsch. New York: Wiley, 2015, xv + 470 pp., $115.00 (H), ISBN: 978-1-11-861965-0.

In Statistical Data Analytics, Piegorsch aims to teach the statistical analysis skills that form the foundations for data mining, informatics, and knowledge discovery. However, it should be emphasized that the goal of the book is not to explore computational methods for data mining or informatics, but rather to provide a broad base of statistical knowledge for data analysis, with a threefold focus on data summarization and statistical inference, supervised learning, and unsupervised learning. The book is written for an audience familiar with multivariable calculus and linear algebra, though a concise review of the latter is given in an appendix. Statistical Data Analytics is well-written and could easily serve as a graduate course textbook, an instructional resource, or a statistical reference.

The material in Statistical Data Analytics is presented similarly in each chapter. For every statistical method that is discussed, the author introduces and develops the underlying mathematical model as well as important details associated with a data analysis. An illustrative example is given for each method, with R code and output provided when applicable. The author usually uses the examples effectively to elucidate the purpose and ideas behind the statistical techniques. For most of the chapters, multiple statistical methods are discussed and practical examples are supplied from a variety of fields (finance, economics, medicine, biology, genetics, astronomy, etc.). While most of the R code is based on built-in functions from a variety of packages, some user-defined functions are supplied. For those unfamiliar with R, an appendix is given that teaches the basics of R programming. At the end of each chapter, a number of applied and theoretical exercises are provided. An online repository contains the datasets needed for the chapter exercises, and solutions to the exercises are available in a separate manual.

In the first chapter, Piegorsch makes a brief case that the focus of the book—statistical learning and analytics—is a necessary starting point for understanding how to effectively handle “big data.” Thus, while the remaining chapters of the book do not describe strategies specific to data mining and informatics, the author makes it clear that mastering the statistical techniques for analyzing smaller datasets is essential for the overarching process of knowledge discovery for any size data. In addition to establishing the philosophical approach of the book, Chapter 1 is used to discuss problems that occur with data collection and to define the differences in statistical description and modeling.

Chapters 2 through 5 of the book provide a review of probability, data manipulation, data visualization, and statistical inference. These chapters are mainly included so that, theoretically, someone with minimal prior statistical knowledge could understand the statistical learning methods presented in the latter two-thirds of the book. Specifically, Chapter 2 reviews the most important concepts of probability, random variables, and statistical distributions. Chapters 3 and 4 explain a variety of classical and modern data summarization and graphical procedures, including measures of location and variability, histograms, boxplots, and a variety of other techniques. Chapter 5 tackles statistical inference, with an emphasis on the use of likelihoods, confidence intervals, and hypothesis tests. Also included in this chapter is a discussion of multiple testing and the false discovery rate. While Chapters 3 and 4 are very accessible even to a statistically naive reader, the material in Chapters 2 and 5 is fairly dense and would be challenging for most readers not previously familiar with the content. To the author’s credit, the majority of important concepts are clearly and efficiently presented, and when details are lacking, other sources are referenced.

Chapters 6 through 9 of the text discuss what are arguably the most important methods of supervised learning: linear regression, generalized linear models, and discriminant analysis. Given that the reader has a solid understanding of matrix algebra and the material in Chapters 2–5, these chapters are relatively straightforward and provide a nice balance between theory and application. Chapters 6 and 7 offer a wide-ranging coverage of simple linear regression and multiple linear regression, respectively. In Chapter 7, there is a strong emphasis on model building and related procedures such as ridge regression, LASSO, and cross-validation. Disappointingly, here and throughout the remainder of the book, only quantitative predictors are considered, with a few exceptions. A brief discussion of analysis of variance (ANOVA) models is given at the end of Chapter 7, but strategies for handling categorical predictors using methods such as dummy variables are only peripherally mentioned in the context of regression. Chapter 8 introduces generalized linear models for potentially nonnormal responses. Special attention is given to logistic regression, log-linear models for contingency tables, and gamma regression, with real-world examples provided for each procedure. Chapter 9 completes the section of the book dedicated to supervised learning with a discussion of classification methods. Logistic regression, linear discriminant analysis, Bayesian classification, nearest neighbor algorithms, classification trees, regression trees, and support vector machines are all presented, with examples and R code supplied in each case.

Chapters 10 and 11, comprising the final section of the book, examine techniques for unsupervised learning, where no formal “output” or response variable is present. Chapter 10 focuses on dimension reduction and discusses the methods of principal component analysis, exploratory factor analysis, and canonical correlation analysis. Chapter 11 provides an overview of classical and modern methods of cluster analysis, including association rules (market basket analysis).

The focus of Statistical Data Analytics is somewhat different from that of similar books on statistical learning. An Introduction to Statistical Learning: With Applications in R by James et al. (2014) does not attempt to explicate the essentials of statistical probability and inference but instead assumes a basic level of statistical sophistication. At the same time, An Introduction to Statistical Learning has a more applied focus than Statistical Data Analytics, spending significantly more time discussing ideas from a conceptual viewpoint. While An Introduction to Statistical Learning contains fewer data examples than Piegorsch’s book, the examples are usually extended throughout an entire chapter, and R code is reviewed in greater detail at the end of each chapter. Another comparable text, The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman (2009), also covers less on statistical probability and inference, but includes more depth and breadth of statistical learning topics (albeit in many more pages). However, The Elements of Statistical Learning contains fewer data examples and does not demonstrate programming code.

While Statistical Data Analytics is largely successful in accomplishing its goal, it is admittedly somewhat ambitious in the amount of material it covers on a per page basis. In a classroom context, it may be best suited for students at a graduate level who are already familiar with the statistical content in Chapters 2–5. Chapter 2, especially, contains very few examples compared to the amount of mathematics presented and would be challenging for most students without significant elaboration by an instructor. For this reason, while Chapters 2–5 are nice for self-study purposes, they may not be practically useful in a one-semester class where the students come in with minimal background in statistics. However, if an instructor wished to hold a two-semester course, the statistics presented in Chapters 2–5 could serve as a first semester introduction to a follow-up course on statistical learning methods, though some additional examples and more detailed explanations would likely be needed from the instructor.

Additional downsides of Statistical Data Analytics are relatively minor. Those interested in using the book as a resource may wish that it contained even more topics, such as random forests, additive models, techniques for multivariate responses, computational strategies for data mining, and additional R code for performing analyses. Also, while the author initially emphasizes a need for data cleaning with messy datasets, little guidance is provided on good practice for this process. Finally, as mentioned earlier, almost no attention is given to datasets with categorical inputs (predictors).

In summary, Statistical Data Analytics provides a well-written and succinct overview of essential topics in statistical learning. The fact that the book is reasonably self-contained by including several chapters introducing concepts of statistical probability and inference, as well as data summarization and visualization, makes it attractive for use as a reference or as a text for a two-semester course sequence. Perhaps the biggest strength of the book is its comprehensive approach balancing mathematical details, application, and coding in R. The abundance of real-data examples provided for illustrative purposes and for practice exercises makes this text especially appealing for an instructor.

Paul W. Bernhardt

Villanova University

Time Series Modeling With Unobserved Components. Matteo M. Pelagatti. Boca Raton, FL: Chapman & Hall/CRC Press, 2015, xvii + 257 pp., $99.95 (H), ISBN: 978-1-48-222500-6.

This book discusses unobserved component time series models. In these models, one or more time series are decomposed into a number of latent series, which are often referred to as unobserved components. This class of models is also commonly referred to as state space models. The main contribution of the book relative to existing books on this topic is that it emphasizes the actual model class, rather than methods for this kind of model. The author points out that despite the many advantages of this rich class, its use is still limited among practitioners. He hopes that his new angle will further popularize unobserved component models.

Indeed, the book puts much emphasis on various unobserved component models. It consists of three parts. Part I deals with statistical prediction (Chapter 1) and contains a refresher on time series concepts (Chapter 2). Part II discusses various unobserved component models. Chapter 3 provides the first unobserved component models, where an observable time series is decomposed into a trend, cycle, seasonal, and noise component. Chapter 4 discusses regression as an unobserved component model. With these first models in place, Chapter 5 turns to the methods and discusses estimation. Chapter 6 provides further details for the modeling, and Chapter 7 extends the previous models to a multivariate setting. Finally, Part III discusses some applications, including business cycle analysis (Chapter 8), road injuries, benchmarking, and electricity demand (all Chapter 9). Part III closes in Chapter 10 with a discussion of various software packages that can be used for unobserved component modeling, including EViews, Ox, R, SAS, and Stata.

Much space is devoted to a wide variety of unobserved component models. As such, the book really achieves its purpose and differentiates itself from alternatives, and is therefore a valuable addition and worth buying. The discussion of software in Chapter 10 is extremely timely and a great plus for practitioners and researchers who are ready to sit down and start implementing. For each software package, clear examples are given on how to run a sample unobserved component model.

The book’s structure could be improved to better focus on the models. Although Part I does include necessary notation and terminology, it is rather basic and includes material that is not essential. For example, it ends with a brief discussion on cointegration. Moreover, the various kinds of models are fairly transparent in the second part, but are somewhat harder to find in the third part. In between, the methods are discussed. It might have been better to move Part I to an appendix, and start with a description of the actual models. Then Part I could have discussed each of the model types in its own section, with Part II focusing on the methods and Part III on applications of the models introduced in Part I. I would also like to have seen coverage of an extended set of models, for example, models for mixed-frequency observations, which are naturally incorporated in unobserved component models, and the popular class of dynamic factor models.

The book does achieve its aims and adds to the existing books, such as Commandeur and Koopman (2007), Durbin and Koopman (2012), and Harvey (1989). In terms of accessibility I would place it near that of Commandeur and Koopman (2007), since both are aimed at first-time users of unobserved component models; a big plus for this new book is its great emphasis on models and code. Advanced users with a need for in-depth analysis are probably still better off using the books of Harvey (1989) and Durbin and Koopman (2012), which present the material in more detail and at a higher level. The former, for example, clearly discusses the link between ARIMA modeling and continuous time considerations, and the latter details non-Gaussian and nonlinear state space models, particle filtering, and Bayesian estimation. None of these topics appear in the new book.

In closing, the book reads well and really provides the reader with a broad understanding of the unobserved component approach. This includes models, methods, and the discussion of software packages. I can imagine that besides being relevant and interesting for practitioners, the book will benefit students. I personally would be more than happy to suggest it to Master's and advanced Bachelor's students in Econometrics working on the topic in a course or for their thesis.

Michel van der Wel

Erasmus University Rotterdam
