Book Reviews

 

Luai Al-Labadi

University of Toronto Mississauga

Causal Inference in Statistics: A Primer. Judea Pearl, Madelyn Glymour, and Nicholas P. Jewell. New York: John Wiley & Sons, Inc., 2016, xvii + 136 pp., $45.00 (P), ISBN: 978-1-11-918684-7.

In statistics, health sciences, and social sciences, interest in causal inference has been growing exponentially since the introduction of the Rubin causal model in the late seventies and eighties. At least during the last two decades, structural causal models and corresponding graphical models, as put forward by Spirtes, Glymour, and Scheines (Citation2000) or Pearl (Citation2009), have strongly influenced causal thinking and analyses in statistics and the applied sciences. However, an accessible introductory textbook to graphical models for (under)graduate students and researchers interested in causal inference has not been available until recently. The primer on Causal Inference in Statistics is the first short introductory textbook for students and researchers interested in graphical models for causal inference, written by authoritative experts: Judea Pearl, Madelyn Glymour, and Nicholas P. Jewell.

The primer is short, about 130 pages, with only four chapters. The first chapter, Preliminaries: Statistical and Causal Models, is predominantly an introduction to probability theory and statistics (which can be skipped by readers familiar with basic probability theory), but it also highlights the importance of causal reasoning and introduces graph terminology and structural causal models. The second chapter, Graphical Models and Their Applications, carefully discusses the basic mechanisms responsible for causal and noncausal associations between two variables (chains, forks, colliders), introduces the important concept of d-separation, and briefly glances at model testing and causal search. The third chapter then looks at The Effects of Interventions. This chapter introduces the do operator for separating causal from noncausal effects, emphasizes the difference between intervening and conditioning, and then introduces two of the most important criteria for identifying a causal effect: the backdoor and front-door criteria. This chapter also touches on mediation analysis and concludes with a discussion of causal inference in linear systems, where special attention is paid to the distinction between structural parameters and regression coefficients. Finally, the fourth chapter discusses Counterfactuals and Their Applications using several examples, including mediation analysis and the average treatment effect for the treated. Each chapter ends with helpful bibliographic notes. Overall, the main focus of the primer is on causal identification rather than on estimating causal effects from real data. In particular, potential estimators and their properties are not covered, though nonparametric estimators could be derived directly from the nonparametric identification results.
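For readers who like to experiment, the backdoor idea can be made concrete with a small simulation (our own sketch, not an example from the primer; the variable names and coefficients are invented): a confounder Z opens a noncausal path between X and Y, and adjusting for Z recovers the true causal effect while the naive regression does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Structural causal model with a backdoor path X <- Z -> Y.
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
y = 2.0 * x + 1.5 * z + rng.normal(size=n)   # true causal effect of X on Y is 2.0

def ols(design, response):
    """Least-squares coefficients for a linear model."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef

# Naive regression of Y on X alone: biased upward by the backdoor path.
naive = ols(np.column_stack([x, np.ones(n)]), y)[0]

# Z satisfies the backdoor criterion; adjusting for it recovers the causal effect.
adjusted = ols(np.column_stack([x, z, np.ones(n)]), y)[0]

print(f"naive slope:    {naive:.2f}")     # well above 2.0
print(f"adjusted slope: {adjusted:.2f}")  # close to the structural coefficient 2.0
```

With these coefficients the naive slope lands near 2.7, while the adjusted slope converges to the structural coefficient of 2.0.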

The primer also contains many useful study questions interspersed throughout the text. The solutions to the questions are available as a manual for instructors only. The solutions manual (75 pages) is excellent: it not only provides the solutions but carefully explains them. Moreover, the manual provides direct links to (interactive) demonstrations and algorithmic solutions in DAGitty. DAGitty is a browser-based environment for creating, editing, and analyzing causal models, which is also available as an R package (developed and maintained by Johannes Textor). Demonstrations and solutions in DAGitty are accessible to all readers (http://dagitty.net/primer/). Though the primer contains several typos, regularly updated errata and references can be downloaded from the textbook's companion webpage (http://bayes.cs.ucla.edu/PRIMER/).

The primer is an excellent self-contained introduction to graphical models for causal inference that addresses the most important issues and applications without getting lost in overly technical aspects. For instance, the faithfulness, autonomy, causal Markov, or positivity assumptions are not explicitly discussed. Neither is the do-calculus. Instead, the primer carefully discusses crucial concepts and issues that have been causing confusion among many statisticians and applied researchers. We particularly liked the discussion of Simpson's paradox because it demonstrates the importance of causal thinking in statistics. The Monty Hall problem is also nicely featured using Bayes’ rule and a corresponding structural causal model.
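The force of the Simpson's paradox discussion is easy to appreciate with the classic kidney-stone numbers often used to illustrate the paradox (our illustration, not an example from the primer): treatment A is better within every stratum, yet worse in the aggregate.

```python
# (treatment, stone size) -> (successes, patients); the classic kidney-stone data
data = {
    ("A", "small"): (81, 87),   ("B", "small"): (234, 270),
    ("A", "large"): (192, 263), ("B", "large"): (55, 80),
}

def rate(successes, patients):
    return successes / patients

# Within each stratum, treatment A has the higher success rate.
for size in ("small", "large"):
    a = rate(*data[("A", size)])
    b = rate(*data[("B", size)])
    print(f"{size} stones: A {a:.0%} vs B {b:.0%}")

# Aggregating over stone size reverses the comparison.
a_all = rate(81 + 192, 87 + 263)
b_all = rate(234 + 55, 270 + 80)
print(f"overall:      A {a_all:.0%} vs B {b_all:.0%}")
```

The reversal arises because severity (stone size) influences both the choice of treatment and the outcome, which is exactly the causal structure the primer uses to resolve the paradox.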

The primer also contains excellent subsections on issues hard to grasp for novices. For instance, the discussion of the difference between intervening and conditioning, or of the difference between structural parameters and regression coefficients, is very helpful not only for novices but also for instructors. Another highlight is the introduction to counterfactuals and their practical use in Chapter 4 (including mediation analysis). Considerable care is taken to explain why counterfactual questions cannot, in general, be addressed by the do operator. We really liked Chapter 4 because these ideas are not covered with such clarity in Pearl's book on Causality (Citation2009). However, this chapter is far more advanced than the other three, so novices to the topic might have trouble fully understanding counterfactual analyses on a first reading. Nonetheless, this chapter may be one of the most accessible descriptions of counterfactual queries available.

The primer does not have many weaknesses, but there are a few areas where further discussion would have been very helpful for more applied researchers. For instance, how do we actually draw a causal graph from subject-matter theory and the observed data? Researchers familiar with structural equation modeling might be tempted to draw graphs based on the observed data rather than the underlying data-generating process. The primer makes no effort to dissuade this practice. Moreover, the examples in the book implicitly assume that all the constructs involved in the data-generating process are measured without error. But with fallible measures, the backdoor criterion typically fails to identify a causal effect. Similar issues arise when the observed data suffer from attrition or nonresponse problems. The primer would have benefited from some more realistic examples involving unobserved or unreliably measured confounders, attrition, or nonresponse. Readers, especially those interested in identifying and estimating causal effects from real data, would then immediately learn that valid causal inference crucially depends on both reliable subject-matter theory and high-quality data.

Overall, the primer is a very good introduction to graphs and structural causal models for causal inference. It is appropriate for novices but also researchers and instructors who are already familiar with the topic because it provides a fresh and far more applied perspective on graphical models than Pearl's Causality book, for instance. The textbook, together with the study questions, solutions manual, and the DAGitty demos and applications, would certainly be a very helpful companion for teaching (applied) introductory courses on graphical models for causal inference.

Peter M. Steiner, Yongnam Kim, and Stanley A. Lubanski

University of Wisconsin-Madison

Handbook of Item Response Theory, Volume One: Models. Wim J. Van Der Linden, ed. Boca Raton, FL: Chapman & Hall/CRC Press, 2016, xxviii + 595 pp., $139.95 (H), ISBN: 978-1-46-651431-7.

Edited by Wim van der Linden, this book is the first of a three-volume treatise on a special topic in psychometrics: item response theory (IRT). Psychologist Bengt Muthén once characterized psychometrics as latent variables and statistics. Although psychometricians use plenty of statistical methods and techniques, psychometric methods do not often appear in the statistics literature. For example, JASA has published very few IRT-related articles. This is a bit surprising given that IRT is all about modeling multivariate discrete responses. I use two examples published in JASA to illustrate the connection between statistics and IRT. In a study of birth defects, Legler and Ryan (Citation1997) used 10 indicators of abnormality (e.g., the presence or absence of a depressed nasal bridge) for a latent variable that measures the severity of teratogenic exposure. Scott and Ip (Citation2002) used responses to clustered, multiple cognitive items (correct or incorrect) from a large-scale assessment in reading. From a statistical perspective, IRT is a generalized mixed effects model that contains a person-specific random effect and fixed-effect parameters related to each outcome (e.g., parameters for the indicator of a depressed nasal bridge). Outcome-specific parameters may also come in the form of a slope for the random effect, something that is not commonly prescribed in statistical modeling. Also, unlike many applications in statistics, the person-specific random effect (teratogenic severity in the former example, and reading ability in the latter) is often the quantity of primary interest in IRT, whereas in statistics the random effect may represent a nuisance factor designed to account for the correlation between responses of the same person at different time points.

Following the tradition of its previous version (van der Linden and Hambleton Citation1996), this book, hereafter called Handbook I, catalogs a broad collection of statistical models that can be likened to a tree with many branches—the trunk being the basic item response model or the generalized mixed model for multiple binary outcomes, and the branches being extensions of the basic model as motivated by different IRT applications in educational, behavioral, and psychological sciences. Handbook I covers model presentation, while the other two volumes in the set cover statistical tools and applications.

Besides an introduction written by the editor, Handbook I is organized into eight sections. The introductory chapter summarizes early work on IRT, including the work of Alfred Binet, Louis Thurstone, Fred Lord, and George Rasch. The introduction is in itself an interesting read as the editor tells stories about early innovations these pioneers made in psychometrics (and statistics), highlights the influence of psychophysics on their thinking, and points out some misguided efforts made by these early scientists, which of course can only become clear in retrospect.

Each of the eight sections after the introduction contains between two and seven chapters, each written by experts on the selected topic. Every chapter follows the same format: introduction, presentation of the model, parameter estimation, model fit, empirical example, and discussion. Handbook I is written as a desktop reference, with a uniform format that conveys a degree of consistency and facilitates searching and consumption of content. Notation-wise, Handbook I generally maintains a good level of consistency across chapters. Section I is about the basics and contains two chapters, dedicated respectively to the unidimensional logistic model and the Rasch model. For statisticians, perhaps the Rasch model can be best understood as a random effect model with a logit link. The correlation between multiple binary responses coming from the same person is captured through the person-specific random effect, whereas the item effect is captured through an item-specific fixed-effect intercept. While Section I is largely a conventional treatment of the basic dichotomous models, it represents quite an improvement over the previous single-volume version of this Handbook. The volume-set approach allows a clean presentation of the various models (in the previous version, dichotomous models are buried in an introductory chapter), so information becomes readily searchable and more refined details, useful to new users of these models, can be included. One example of the latter is the treatment of model identifiability in IRT in the first chapter, a topic that has often been ignored and can easily trip up new users of IRT.
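The random-effect reading of the Rasch model sketched above can be made concrete with a short simulation (our own sketch, not from Handbook I; the abilities, difficulties, and sample sizes are invented for illustration): each response probability is a logit of person ability minus item difficulty, and harder items collect fewer correct answers.

```python
import numpy as np

rng = np.random.default_rng(1)
n_persons, n_items = 5000, 6

theta = rng.normal(0.0, 1.0, size=n_persons)   # person-specific random effects (ability)
b = np.linspace(-1.5, 1.5, n_items)            # item-specific fixed effects (difficulty)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Rasch model: P(person p answers item i correctly) = logit^{-1}(theta_p - b_i).
prob = sigmoid(theta[:, None] - b[None, :])
responses = rng.random((n_persons, n_items)) < prob

# The observed proportion correct per item decreases with item difficulty.
p_correct = responses.mean(axis=0)
print(np.round(p_correct, 2))
```

The person-specific theta is what induces the within-person correlation across items that the chapter emphasizes.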

Section II, which consists of seven chapters, describes models for nominal and ordinal responses. Most of these models are immediate extensions of the dichotomous unidimensional logistic model. The seven chapters describe the nominal model, the Rasch rating-scale model, the graded response model (the cumulative model for ordinal responses), the partial credit model (the conditional model for ordinal responses), the generalized partial credit model, the sequential model for ordered responses, and, finally, models for continuous responses. The last chapter, on continuous outcomes, could almost be viewed as an outlier within Handbook I, as IRT typically deals with discrete data, while continuous data fall in the realm of other psychometric subfields such as factor analysis. To be sure, a statistician may appreciate the general statistical idea that underlies the psychometric models and see the connection between factor analysis and IRT.

Consisting of four chapters, Section III describes the multidimensional and multicomponent models. The four chapters address normal-ogive multidimensional models, logistic multidimensional models, linear logistic models, and multicomponent models. For readers not familiar with IRT terminology, the titles could look confusing. Multidimensional IRT models generally refer to models that contain more than one person-specific random effect for explaining variance in the data. The first two chapters describe different link functions. The third chapter presents a regression-type model for cognitive diagnosis and subtask analysis; for historical reasons, the model is often referred to as the linear logistic test model (LLTM). The multicomponent model is an extension of the LLTM.

Section IV represents a temporary digression from modeling response data. Three chapters present models for response time: the Poisson and gamma models, the lognormal response-time model, and a diffusion-based response-time model. Of note, the first chapter also contains models for count data in the response, the second chapter offers a brief summary of several different approaches for modeling response time, and the third chapter contains a model fusing IRT with the diffusion model, which was developed in mathematical psychology. This is a short section. Readers interested in more advanced survival-type models may need to consult other sources.

Section V returns to modeling response data and attempts to break away from the parametric forms of item response functions through the use of nonparametric models. The three chapters in this section address, respectively, the Mokken models, Bayesian nonparametric response models, and functional approaches to modeling response data. The Mokken model represents an interesting nonparametric model motivated by applications in social and political surveys; it imposes stochastic ordering on the set of items and is particularly useful when large numbers of items cannot be administered, such as in clinical studies. Using an example from the first chapter, in assessing religiosity, a statement like “I attend mass on Christmas Eve” can be assumed to represent a lower level of religiousness than “I pray before every meal.” Such an assumption leads to constraints on the form of the item response function, and nonparametric models, partly because of their flexibility, become the method of choice for this kind of data.

Section VI examines IRT-related models for nonmonotone items. An example of a nonmonotone item is the statement “I think capital punishment is necessary but I wish it were not.” A person would agree with the statement if her attitude is close to the position of the statement, while someone with more extreme views on either side of the latent attitude would tend to disagree with it. The IRT model for this kind of data is often referred to as an unfolding model. This section consists of two chapters, covering the hyperbolic cosine unfolding model and a multinomial logistic unfolding model, respectively.

Section VII, hierarchical response models, further expands the scope of IRT to cover models for more complex testing situations. This section consists of seven chapters, more or less reflecting recent extensions of the IRT paradigm to practical and “messier” data. The chapters include the logistic mixture-distribution response model, the multilevel response models with covariates and multiple groups, the two-tier factor analysis model, the item-family models, the hierarchical rater models, the randomized response models for sensitive measurements, and joint hierarchical modeling of responses and response times.

Finally, Section VIII is largely a collection of eclectic models that represent generalizations of the basic IRT models in different directions. This section contains four chapters: generalized linear latent and mixed modeling; multidimensional, multilevel, and multi-timepoint item response modeling; mixed-coefficient multinomial logit models; and explanatory response models. As an elaboration of the generalized linear latent and mixed model (GLLAMM, Skrondal and Rabe-Hesketh Citation2004), the first chapter is perhaps the most accessible to a general statistical readership familiar with generalized linear mixed models. The GLLAMM extends the generalized mixed model to include latent variables and multilevel structures. By encompassing different types of response data, including counts and continuous responses, it can also be seen as an extension of structural equation models to such data. The second chapter, written by Muthén and Asparouhov, the authors of the popular program Mplus, more or less describes another family of factor-analytic models that can be implemented via Mplus. The third chapter elaborates on extending the basic IRT models to accommodate covariates (person and item). This type of explanatory model (Wilson and De Boeck Citation2004), which can be implemented in statistical programs such as SAS PROC NLMIXED, can be useful for statisticians looking for applications of predictive modeling to multivariate discrete data.

Handbook I represents a concerted effort to compile a compendium of models that can be readily deployed in educational and psychological testing. This volume focuses mostly on potential applications to educational assessments and thus has a different emphasis than other recent and excellent IRT/latent variable books such as Reise and Revicki (Citation2015), which focuses more on other applications such as psychopathology and health. I like the breadth of the topics covered in Handbook I. I also like the organization of the topics into sections, although I think a summary or roadmap, in the form of a table showing which model would be useful under which scenario, could help readers navigate the materials. While Handbook I has a fixed format for presentation, there is still a degree of heterogeneity that could be reduced to improve readability and usability. For example, the multidimensional IRT models with the probit and the logit links are presented in two different chapters. While there are differences between the two link functions, the chapter on the probit link requires a high level of mathematics to understand, and the other does not: asymptotic distributions, for instance, are presented in the former with complex matrix notation, while the latter tends to be more intuitive and visual.

Handbook I is likely to be useful for undergraduate or graduate students who have an interest in pursuing quantitative research in educational and psychological testing, especially with datasets that contain multiple discrete outcomes. Master- and doctoral-level students seeking dissertation topics and doing literature reviews will find Handbook I a valuable resource. On the other hand, applied statisticians/biostatisticians may also benefit from Handbook I. Many ideas presented here are relevant to other fields of applications beyond education and psychology. Mathematical statisticians or theorists, however, may not find the book to be extremely useful, although many of the models presented in the book could certainly make use of advanced statistical theories for model elaboration.

As observed by the editor, the growth in the number of IRT models and their applications has been rather dramatic in the two decades since the last Handbook was published. Although IRT is now commonly used as a research tool, it is still not routinely used in practical performance assessments because of its computational requirements. “Modern test theory is now in its eighth grade,” quipped Jim Ramsay in his chapter on the functional approach to IRT, about the penetration of IRT into the assessment market (p. 349). Two trends may help modern test theory move a grade or two up in the coming decade. First, the advancement of technology has made it possible to conduct IRT analysis online (e.g., the Patient-Reported Outcomes Measurement Information System, PROMIS, in the health sciences; see http://www.nihpromis.com). Second, careful documentation of the models, theories, and methods for estimation, available software, and illustrative examples will make IRT more accessible and will increase its use in practice. Handbook of Item Response Theory, Volume One: Models serves as an important and timely step in this direction.

Edward H. Ip

Wake Forest School of Medicine

Improving Survey Methods: Lessons from Recent Research. Uwe Engel, Ben Jann, Peter Lynn, Annette Scherpenzeel, and Patrick Sturgis, eds. New York: Routledge, 2014, xxii + 429 pp., $59.95 (P), ISBN: 978-0-41-581762-2.

This book provides a comprehensive overview of recent developments in survey design, and I strongly recommend it. There is a strong European emphasis, but the findings apply equally well elsewhere, including North America. The book is well organized, with both author and subject indices, and is great value. There were very few typos, and I was not aware of any obvious substantive errors. The book is an essential addition to your personal library if you are professionally involved in survey design as a practitioner or researcher. This is not a “how to” manual, but rather a guide to current research on survey design. Should you wish to pursue the topics further, each article includes a thorough bibliography. Although I do not consider this to be suitable as an undergraduate text (no exercises or toy examples), it would provide excellent supplementary material for those teaching survey design; I successfully incorporated several of the recent findings into my introductory (second-year undergraduate) survey design class this fall. It would also be suitable for a graduate seminar class; I discussed several articles with a group of senior students (two graduate students and one final-year undergraduate honors student), and they were very enthusiastic. The book does assume the reader is familiar with the terminology of survey design.

The earlier chapters are more accessible and of more general interest; the later chapters become more technical and require a somewhat greater background in the specific areas under discussion. The book comprises eight parts: Survey Modes and Response Effects, Interviewers and Survey Error, Asking Sensitive Questions, Conducting Web Surveys, Conducting Access Panels, Surveys–Expanding the Horizon, Coping with Nonresponse, and Handling Missing Data. Each part begins with an overview and introduction written by one of the editors. This introduces the topic, provides a context, and highlights some of the results. There is a wealth of information in every chapter; here are some examples to give the flavor, and inspire you to read the book.

The first part includes various discussions of differences in responses due to the form of a questionnaire. For example, labeling just the endpoints of a scale versus labeling every point, wording items positively versus negatively, or offering choices in ascending versus descending order can all affect the response. The methods for identifying false reporting (Chapter 8) are fascinating. There are two types of characteristics for identifying falsified data: those concerning the content of the responses, and those based on statistical properties of the responses (e.g., variance). These can be used with a clustering algorithm to separate responses into “likely to be falsified” and “unlikely to be falsified” groups. The former can then be investigated further. (You have to read the book!) My senior students were particularly interested in the chapters on nonresponse, especially the idea of applying different designs to subgroups, for example, offering different types of incentives or using different types of wording in a notification letter.

There is an associated PPSM (Priority Program on Survey Methodology) website http://www.survey-methodology.de/en/index_en.html, which includes a link to the 25-page online appendix, http://www.surveymethodology.de/en/improvingsurveymethods_en.html. I found it a little irritating that I was unable to find these links in the book, since there are numerous references to the appendix, but it was well worth tracking down. The comments above represent a small fraction of the research represented in this book. I really enjoyed reading all of it, and encourage you to do so too.

Anne Michèle Millar

Mount Saint Vincent University

Joint Modeling of Longitudinal and Time-to-Event Data. Robert M. Elashoff, Gang Li, and Ning Li. Boca Raton, FL: Chapman & Hall/CRC Press, 2016, xix + 241 pp., $89.95 (H), ISBN: 978-1-43-980782-8.

This book provides an extensive survey of research on joint models for longitudinal and time-to-event data. Much of the research concerns the use of these joint models in analyzing longitudinal data in which one or more observations are missing due to death or other circumstances, with the missingness modeled using time-to-event techniques. Another situation addressed is when the primary response is time-to-event data that may depend on time-varying covariates, which are modeled as the longitudinal portion of the joint model. The authors' expertise in this area shines through their careful attention to detail in presenting the wide variety of settings in which these models can be applied. Overall, I consider the book to be a valuable and rich resource for introducing and promoting this relatively new area of research.

The first chapter of the monograph provides a brief introduction to the research area before presenting a plethora of real motivating examples taken from the literature and used throughout the book to help explain the different methods. The majority of these examples come from biostatistics applications, but the authors make it clear that the subject is very general. The next two chapters provide a review of current methods used in longitudinal and time-to-event data analysis. Chapter 2 focuses on longitudinal analysis and starts by discussing three different types of missing data: missing completely at random, missing at random, and missing not at random. The authors then present the most common models used in longitudinal analysis, primarily focusing on mixed effects models. In Chapter 3, after a brief introduction to the notation and tools of event time analysis (including a brief introduction to the theory of counting processes), the authors focus their discussion on the two most popular models used in event time analysis: the Cox proportional hazards model and the accelerated lifetime model. A significant portion of this chapter is also spent on the modeling of competing risks data.

The majority of the book's content is contained in Chapter 4, and there certainly is quite a lot of material covered here; this one chapter is nearly three times the size of any one of the others. The authors start by introducing two primary models that serve as the basis of the joint modeling process. First is the selection model, in which the event time is modeled conditional on the longitudinal data and shares the random effects from the marginal longitudinal model. The other is the mixture model, in which the longitudinal data are instead modeled conditional on the event time data, which have their own specified marginal distribution. The authors then proceed to address a variety of settings in which these joint models can be applied, including discrete observation times and situations involving monotone, intermittent, and informative missing data. When discussing each setting, the authors tend to follow a standard pattern: (1) present the special conditions of the setting, (2) introduce the model that has been developed to address them, (3) discuss the estimation procedures needed for the model, and (4) illustrate the model using an example. Many of the estimation procedures use a frequentist approach, but in some cases Bayesian and nonparametric methods are also presented.
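The shared-random-effects mechanism behind the selection model can be sketched in a few lines (a hypothetical illustration of the mechanism, not the authors' notation or data; the effect u, the coefficient gamma, and all distributions are invented): a common random effect shifts both the longitudinal trajectory and the event-time hazard, so dropout is informative.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

# Shared person-specific random effect linking the two submodels.
u = rng.normal(0.0, 1.0, size=n)

# Longitudinal submodel: repeated measures drift upward, shifted by u.
times = np.array([0.0, 0.5, 1.0])
y = 1.0 + 0.5 * times[None, :] + u[:, None] + rng.normal(0.0, 0.3, size=(n, len(times)))

# Survival submodel (selection-model flavor): the hazard increases with u,
# so event times are exponential with rate exp(gamma * u).
gamma = 0.7
event_time = rng.exponential(1.0 / np.exp(gamma * u))

# Subjects with higher longitudinal trajectories experience the event earlier.
mean_y = y.mean(axis=1)
corr = np.corrcoef(mean_y, event_time)[0, 1]
print(f"corr(mean longitudinal level, event time) = {corr:.2f}")  # negative
```

Ignoring this dependence and analyzing the longitudinal data alone would bias the trajectory estimates, which is precisely the problem the joint models in Chapter 4 are built to address.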

After this tour de force of joint models, the next two chapters address applications of these models to situations involving multivariate data. Chapter 5 specifically deals with modeling competing risks in the time-to-event data, primarily in the context of longitudinal data that can be missing due to one of multiple reasons. Chapter 6 deals with the more general case of multivariate data in either the longitudinal or event time portions of the models, with some special consideration given to the case of recurrent event time data. Finally, Chapter 7 explores topics primarily related to model fitting, including validation of and sensitivity to model assumptions and several topics related to variable selection. There is a small appendix that presents a table of available software for implementation of the joint modeling methods.

Where this book primarily succeeds is in the great care the authors take in walking through the necessary details of these joint models and in the breadth of topics they cover. When topics are left out, the authors point to a large body of literature where the interested reader can further their understanding. There are some minor drawbacks to this approach, however. First, there is a significant amount of theory presented in this book, which may make it more difficult to grasp for those without the requisite background knowledge. This is especially true for those of a more applied persuasion, though it is certainly not impossible for the determined reader to fully grasp the concepts, so long as they are willing to make the necessary time investment. Additionally, it is easy to become overwhelmed by the large amount of notation, though the authors make a concerted effort to reintroduce notation as much as possible. To cover so much material, there are some sections where the authors were seemingly forced into a more fast-paced writing style, though, as stated above, they did their best to refer the reader to other sources to fill in the gaps. One primary concern is the significant amount of biological terminology throughout the examples, which may make them difficult to follow for readers unfamiliar with biostatistics applications. It is to be expected that the examples would come from subject areas with which the authors are more familiar, and much of the data in both longitudinal and time-to-event analyses do come from biological applications, but if the methods presented are truly general in application, it would have been nice to see more examples from other areas.

Due to the highly technical nature of this book, I would recommend it either as a handy reference for researchers or as a graduate level reference text in a specialized course. This is not to say that those with a more applied background in statistics should dismiss it out of hand as it is truly rich with useful content that can be extracted and applied with due diligence. As an industrial statistician who is well-versed in lifetime data analysis techniques, I personally found this book to be a great resource at introducing and thoroughly presenting the subject matter, despite most of the examples lying outside my realm of expertise. I certainly consider it a valuable addition to my bookshelf for personal reference and, should the need arise, I would be happy to refer it to others who might encounter such data in their work.

Caleb B. King

Sandia National Laboratories

Multiregional Clinical Trials for Simultaneous Global New Drug Development. Joshua Chen and Hui Quan, eds. Boca Raton, FL: Chapman & Hall/CRC Press, 2016, xxii + 353 pp., $99.95 (H), ISBN: 978-1-49-870146-4.

This is a timely book addressing an important subject—multiregional clinical trials (MRCT). Since 1962, clinical development of new medicinal products (drugs, biologics, and vaccines) has improved quality of life and extended life expectancy tremendously. Clinical trial practices have been shaped by the 1962 Kefauver–Harris Amendment to the Federal Food, Drug, and Cosmetic Act, which introduced a requirement for drug manufacturers to provide proof of the effectiveness and safety of their drugs before approval, required drug advertising to disclose accurate information about side effects, and stopped cheap generic drugs from being marketed under new trade names as expensive “breakthrough” medications.

In the past half-century, American and European patients have benefitted most from the new medicinal products discovered, developed, and approved under Kefauver–Harris. Meanwhile, many countries outside the USA and Europe experienced “drug lag,” with patients waiting for the best available drugs because of delays in approval. Global drug development has attempted to address this lag and has been an important scientific subject since the 1980s.

By the early 1990s, regulators, pharmaceutical companies, and academics began to realize the importance of this global development challenge. Some of these experts organized the International Conference on Harmonization (ICH). By the mid-1990s, ICH started publishing guidance documents, including one in 1998 that deals with MRCT: ICH E5, “Ethnic Factors in the Acceptability of Foreign Clinical Data.” About 20 years have passed since the publication of ICH E5 (1998). To address a number of new developments over the past two decades, a new ICH guidance dealing with MRCT, ICH E17, has been proposed. The timing of this book is just right, providing a resource for experts running clinical trials as well as a reference for those drafting the E17 document.

This book is organized as 26 chapters in five sections: Section 1 is an introduction, Section 2 covers design considerations, Section 3 discusses DMC (Data Monitoring Committee), Section 4 talks about analysis and reporting, and finally Section 5 is a group of miscellaneous topics, with eight chapters. The organization of the book is reasonable and the topical coverage is comprehensive.

Contributors to this book are among the best experts in their respective fields. For example, the chapter on the history of clinical development of new drugs and its evolution toward MRCT was written by Christy Chuang-Stein, a globally known statistical scientist whose research spans a broad range of statistical applications in clinical drug development. Mingxiu Hu, a well-established statistical researcher in oncology, contributed the chapter on considerations specific to oncology drug development. As another example, the overview chapter on DMC activities is authored by Janet Wittes, who has extensive experience and publications in DMC activities and is one of the best choices to discuss this important topic.

Most chapters are short, about 20 pages including references, which makes the book easy to read and easy to follow. Concepts involving the design, conduct, and analysis of MRCT are not simple, but this book successfully presents all of these topics—setting the stage and drilling down to a reasonable depth, without unnecessary detail. It also introduces many real-world examples to help readers understand some of the difficult concepts.

One more nice feature of this book is that although many of the contributors have similar industry and research background, there is not much overlap between chapters. Each chapter introduces a stand-alone, self-contained topic. Cross-referencing of chapters is rarely needed, but is provided where appropriate.

I learned a lot from this book, especially that issues arising in MRCT are much more difficult than those in multi-center trials, from which much of my experience with clinical trials comes. For example, in multi-center trials within the same country, the selected active control is the same across centers. In MRCT, however, the availability of active control agents can differ across regions. Another example is that in multi-center trials within the same country, the target audience is only one regulatory agency (at least at the study design stage). In designing an MRCT, on the other hand, the target audiences are different regulatory agencies, depending on the development strategy. Further, when dealing with concerns within each region or country, the challenges go beyond simply performing subgroup analysis. Subgroup analyses are very common in clinical trials, and most of them are post hoc. The objectives of subgroup analysis include

1. to address particular concerns in some specific subgroups;

2. to explore whether the test drug is more efficacious or more harmful to a subset of patients;

3. to demonstrate consistent efficacy across all subgroups;

4. to generate new hypotheses about drug effects; and

5. to address regulatory queries.

However, in MRCT, when a country or region performs a subgroup analysis, the emphasis tends to be specifically on the treatment effect within that country or region. In this case, the considerations are more focused on objectives 1, 2, and 3 above. Of course, since these questions originate from regulatory agencies, objective 5 applies naturally as well. One of the most important concerns from the local regulatory agency’s point of view is whether the treatment effect within its jurisdiction is consistent with the treatment effect from the rest of the world, a “we versus they” comparison.

While I learned many new things from this book, I also developed a number of questions about MRCT, and I hope that a reader would similarly regard this book as a useful starting point as opposed to a complete set of “final answers.” For example, if the expected treatment effect in one region is different from another region, should we plan to recruit more subjects from the region with smaller treatment effect into the MRCT? Or should we recruit fewer?

A related question comes from Chapter 14 (p. 183), which describes a strategy used after seeing results from an interim analysis: “For each individual region, we drop the region at interim if the conditional success rate is less than a given threshold….” In fact, this strategy could potentially introduce operational bias into the entire trial. The statistical hypothesis testing framework starts with a null hypothesis, and statistical procedures are developed under the null. Hence, the region that delivers a small conditional success rate actually reflects the null. If the purpose is to protect α against inflation, then this region should be considered an important region, and its recruitment should not be discontinued. My question is, will the strategy proposed in Chapter 14 introduce operational bias?

In many chapters of this book, a common concern is consistency: is the treatment effect observed in each region consistent with the effect from other regions? From a local regulatory agency’s point of view, if the question is “Is the treatment effect in my region consistent with the treatment effect from the rest of the world?,” then the question can be addressed by evaluating an interaction: (μ_test,we − μ_placebo,we) − (μ_test,other − μ_placebo,other), which is a “we against they” type of interaction comparing test product and placebo. The proposal in Chapter 4 of the book is to evaluate this interaction using a hypothesis test, leading to a series of unanswered questions. In this setting, should the null hypothesis be “consistent” versus the alternative hypothesis of “not consistent”? Or should the null hypothesis be “not consistent” and the alternative “consistent”? In the former case, how can the conclusion be achieved? Furthermore, if the decision is to perform a hypothesis test, then where does the α come from? At the time the MRCT was designed, were α values allocated for interaction tests? Will multiplicity adjustment be necessary? My recommendation would be that the local agency treat this as an estimation problem instead of a hypothesis test, obtaining a point estimate of the specified interaction and a corresponding confidence interval.
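The estimation approach recommended above can be sketched in a few lines. The following is a minimal, hypothetical illustration — all summary statistics are invented — computing the point estimate of the “we versus they” interaction together with a Wald-type 95% confidence interval under an assumption of independent arms:

```python
import math

# Hypothetical summary statistics: (mean response, SD, sample size)
# per treatment arm, within the local region ("we") and the rest of
# the world ("other"). All numbers are invented for illustration.
arms = {
    "test_we":       (4.2, 2.0, 150),
    "placebo_we":    (3.1, 2.0, 150),
    "test_other":    (4.0, 2.1, 600),
    "placebo_other": (2.6, 2.1, 600),
}

def mean_var(arm):
    """Variance of the sample mean for one arm."""
    _, sd, n = arms[arm]
    return sd ** 2 / n

# Interaction: (mu_test,we - mu_placebo,we) - (mu_test,other - mu_placebo,other)
estimate = ((arms["test_we"][0] - arms["placebo_we"][0])
            - (arms["test_other"][0] - arms["placebo_other"][0]))

# Wald-type 95% confidence interval, treating the four arms as independent
se = math.sqrt(sum(mean_var(a) for a in arms))
ci = (estimate - 1.96 * se, estimate + 1.96 * se)
```

With these invented numbers the interaction estimate is negative but its confidence interval straddles zero, which is exactly the kind of nuance an estimation framing surfaces and a binary consistency test hides.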

As a final example, I wonder whether the region effect should be considered fixed or random. In a single-region trial with multiple centers, the treatment of center effects was once widely discussed. From the late 1980s to the early 1990s, the common understanding was that, from the sponsor’s (the drug development company’s) point of view, the responsibility is to report everything that was observed to regulatory agencies. Hence centers are thought of as fixed effects: the sponsor reports the data to the agency as observed, without generalization to centers not included in the clinical trials. From a regulatory agency’s point of view, if the agency decides to generalize the results submitted by sponsors, it can consider the center effect as random; that is, the results reported by the sponsor are thought of as a random sample from all possible centers. Now, in MRCT, the sponsor reports region as a fixed effect. How about the regulatory agency? Because the generalization in this case applies only within the region, does the regulatory agency have to consider region as a random effect?

The book brings up many more key issues, and I hope that the ICH E17 guidance currently being drafted will help address some of them. This is a very good book—well organized, with good experts contributing, and it addresses a timely, important topic. I consider it a very useful reference.

Naitee Ting

Boehringer Ingelheim Pharmaceuticals, Inc.

http://orcid.org/0000-0002-9168-297X

Theory and Methods of Statistics. P. K. Bhattacharya and Prabir Burman. San Diego, CA: Academic Press, 2016, xiv + 515 pp., $150.00 (P), ISBN: 978-0-12-802440-9.

Aimed at advanced graduate students and researchers in statistics, this book contains a wide range of topics in mathematical statistics. The authors make a point of avoiding the use of measure theory, thus making the subject accessible to a broader mathematical audience. Intuitive explanations as well as plenty of theoretical examples and exercises accompany the numerous mathematical theorems and results included in the book. Altogether, these provide the reader with a heightened understanding of the mathematical aspects of fundamental concepts and principles of statistics, and advanced comprehension of the theory underlying prevalent statistical models and methods.

The first 10 chapters of the book focus mainly on general, classical results in mathematical statistics. The book begins with a review of some basic probability theory in Chapter 1, followed by an overview of properties of several of the most commonly occurring probability distributions in statistics in Chapter 2. Chapter 3 deals with the convergence of sequences of random variables, including the law of large numbers and the central limit theorem. This chapter also contains a collection of useful probability inequalities.

Chapter 4 offers a short introduction to statistical inference, thereafter focusing on sufficient statistics and optimal decision rules. Chapter 5 concerns point estimation, focusing on uniformly minimum variance unbiased estimators and minimum risk equivariant estimators. Various methods of estimation are introduced, and this chapter includes results such as the Rao–Blackwell and Lehmann–Scheffé theorems, Basu's theorem, and the Cramér–Rao inequality. Hypothesis testing is the overall topic of Chapter 6. Uniformly most powerful unbiased tests, the Neyman–Pearson lemma, and sequential probability ratio tests are discussed in this chapter, which also touches upon the topics of p-values and confidence sets. The focus of Chapter 7 is on asymptotic theory for maximum likelihood estimators, including (strong) consistency, asymptotic normality, asymptotic efficiency, as well as the likelihood ratio test and Pearson's chi-squared test.

Chapter 8 discusses nonparametric tests, and starts by introducing permutation tests, the Wilcoxon and Mann-Whitney tests, Spearman's rank correlation, and Kendall's tau. This chapter goes on to discuss asymptotic theory for U-statistics, locally most powerful rank tests, as well as asymptotic theory for the Kolmogorov–Smirnov and Cramér–von Mises statistics. Chapter 9 is about curve estimation, featuring, among other things, kernel density and regression estimation. Some asymptotic properties of these estimators are considered, and there is a description of how cross-validation may be used for bandwidth selection. Finally, Chapter 10 discusses statistical functionals, and L-estimators and M-estimators. Moreover, this chapter briefly touches upon the use of jackknife and bootstrap resampling methods for estimating asymptotic bias and variance.

Taking into account the book's many theoretical examples and exercises, its first 10 chapters would serve well as the central text of an advanced level course in theoretical statistics, aimed at students with a good mathematical background and some prior courses in statistics and probability theory. The authors have focused on communicating core topics in theoretical statistics with a high level of mathematical accuracy, yet purposely steering clear of the technicalities of measure theory. Thus, they have succeeded in writing a book that should allow advanced students to gain a comprehensive knowledge and understanding of the fundamentals of theoretical statistics, within a relatively short period of time. The book could, with advantage, be supplemented by a book like Principles of Statistical Inference (Cox Citation2006). The latter focuses on giving a broad overview of the discipline to a reader with some familiarity with statistics, while keeping the mathematical detail to a minimum.

Chapter 11 of the book under review deals with estimation and inference for Gauss–Markov models, including some discussion of simultaneous inference. This chapter proceeds to consider model selection, and thereafter random- and mixed-effects models. Chapter 12 is devoted to multivariate analysis, and considers multivariate linear models, principal component analysis, factor analysis, classification, and canonical correlation analysis. Time series models are the focus of Chapter 13. This chapter deals with issues like stationarity, estimation of mean and autocorrelation, forecasting, and spectral analysis.

Like the rest of the book, the chapters on linear models, multivariate analysis, and time series are rich in content, but also very theoretical in nature. As implied in the preface, they call for a solid knowledge of advanced linear algebra and calculus. From a teaching perspective, in order for students to fully appreciate the content of these chapters, they should also have some prior practical experience with the statistical methods in question. If not, the application of these methods, including, for example, methods for validating model assumptions, should be integrated into the course, along with some suitable additional reading.

The lack of illustrations in the book is a pity. For example, the book devotes some space to the problem of bandwidth selection in connection with kernel density estimation, mentioning also that kernel estimators are very sensitive to the choice of bandwidth. Especially for a reader with little or no prior knowledge of the topic, a visual example of some kernel density estimates with different bandwidths would likely have been quite beneficial. The absence of illustrations is even more curious, perhaps, in the pages that discuss the verification of multivariate normality assumptions and validation of time-series models. Various types of diagnostic plots and what to look for in these plots are described in words. However, some examples of these plots would presumably have been useful to a reader with little experience in model assessment.
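The bandwidth sensitivity noted above is easy to demonstrate numerically even without a plot. The following is a small sketch, not taken from the book — the bimodal sample and bandwidth values are invented for illustration — that evaluates a Gaussian kernel density estimate on a grid under three bandwidths:

```python
import numpy as np

rng = np.random.default_rng(0)
# Bimodal sample: the kind of shape that oversmoothing hides
x = np.concatenate([rng.normal(-2, 0.5, 200), rng.normal(2, 0.5, 200)])

def kde(grid, data, h):
    """Gaussian kernel density estimate with bandwidth h."""
    u = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

grid = np.linspace(-4, 4, 201)
for h in (0.1, 0.5, 2.0):
    f = kde(grid, x, h)
    # With small h the estimate is wiggly and clearly bimodal near +/-2;
    # with large h the valley at 0 fills in and the two modes blur into one.
    print(f"h={h}: estimated density at 0 is {f[100]:.4f}")
```

Running this shows the estimated density at the valley point 0 growing by orders of magnitude as the bandwidth increases, which is precisely the sensitivity the reviewers wish the book had illustrated visually.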

Arguably, the validation of model assumptions does not necessarily need to play a large role in a book focusing purely on mathematical statistics, even though the topic is of vital importance in applied statistics. Nonetheless, one cannot help but feel that the chapter on Gauss–Markov models lacks mention of model validation, especially since the chapters on multivariate analysis and time series models both include comments on the issue. Some remarks related to the assessment of model adequacy, at least for linear normal models, would have been a useful addition to the section on criteria for model selection.

In summary, the book contains a wealth of information on central topics in mathematical statistics, all made accessible to a reader with no knowledge of measure theory. Yet, to fully grasp and appreciate the extensive content, the reader should still be well versed in mathematics, and have some working knowledge of applied statistics. As a textbook for an advanced level course in theoretical statistics, the book would likely be well supplemented with some less mathematical, more illustrative material, depending on the students’ backgrounds.

Nina M. Jakobsen

University of Copenhagen

http://orcid.org/0000-0002-8623-7492
