Book Review

Bayesian Thinking in Biostatistics.

Gary L. Rosner, Purushottam W. Laud, and Wesley O. Johnson. Boca Raton, FL: Chapman & Hall/CRC Press, 2021, xix + 607 pp., $120.00(H), $96.00(e-book), ISBN: 978-1-43-980008-9(H), 978-1-43-980010-2(e-book).

Bayesian statistics is becoming more and more popular in biostatistics partly because (a) there is almost always prior information available, (b) uncertainty quantification is crucial in biological and biomedical applications, (c) adaptiveness is desired for modern clinical trial design, and (d) the increasing complexity of data and scientific questions calls for more flexible statistical models. Despite its fast-growing popularity, Bayesian statistics is not yet widely taught at the introductory level, and hence scientists from other disciplines often do not have the right tools in their toolboxes to apply Bayesian methods to their research. This textbook aims to bring Bayesian statistics to introductory biostatistics courses, which could benefit students majoring in biostatistics as well as nonstatisticians who are interested in Bayesian biostatistics. Overall, I enjoyed reading the book very much.

The book can be roughly divided into three major parts: (a) Chapters 1–6 laying the foundation of Bayesian statistics, (b) Chapters 7–10 covering Bayesian regression models and model assessment, and (c) Chapters 11–15 specializing in various Bayesian models for biostatistics applications (survival analysis, clinical trial design, longitudinal data analysis, and diagnostic tests). While no book of finite length can cover every aspect of each topic, the authors list references to books and papers in each chapter under the section “Recap and Readings” so that readers can expand their study of the topic through further reading. The book has exercises in each chapter and is accompanied by a dedicated website, https://github.com/BTB-RLJ, with data and code in BUGS, JAGS, and Stan. Together these make it an excellent textbook for students as well as a great reference book for scientists who are interested in applying Bayesian methods to their research problems. Below, I briefly review each chapter of the book.

Chapter 1 introduces the background of Bayesian inference in analyzing scientific data. It points out that Bayesian inference is based on hypothetical data generating probability models combined with existing knowledge of the models represented by prior distributions. Therefore, Bayesian inference is by definition model-based (data generating probability models and prior models). Moreover, all modeling uncertainty can be and should only be quantified by the posterior probability distribution. The chapter also briefly describes several datasets considered in later chapters of the book.

Chapter 2 starts by introducing Bayes’ theorem for discrete distributions, which is later generalized to general distributions. Bayes’ theorem is at the heart of Bayesian inference, as it gracefully turns the conditional probability of data given model parameters (i.e., the data generating probability model) into the conditional probability of model parameters given data (i.e., the posterior distribution) with the help of the marginal distribution of model parameters (i.e., the prior distribution). The authors then go on to discuss how to use the posterior distribution to make statistical inference and prediction. At the end of the chapter, they also provide an overview of Monte Carlo approximation of intractable posterior distributions, which is discussed in detail in Chapter 4.
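
In symbols, the verbal statement of Bayes’ theorem above can be written as follows. The notation is assumed here for illustration and is not taken from the book: θ denotes the model parameters, y the data, p(y | θ) the data generating model, p(θ) the prior, and p(θ | y) the posterior. For discrete parameters the integral becomes a sum.

```latex
p(\theta \mid y) \;=\; \frac{p(y \mid \theta)\, p(\theta)}{p(y)},
\qquad
p(y) \;=\; \int p(y \mid \theta)\, p(\theta)\, d\theta .
```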

Chapter 3 applies the principles of Bayesian inference outlined in Chapters 1 and 2 to well-known parametric models such as binomial, normal, Poisson, and exponential for exchangeable (i.e., conditionally iid) observations. Conjugate priors are used so that the posterior distributions can be conveniently derived. The chapter also touches on Bayesian nonparametric models—most notably the Dirichlet process mixture models.
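
As a minimal sketch of why conjugacy is convenient, consider the standard beta-binomial pair: with a Beta(a, b) prior on a success probability and s successes in n binomial trials, the posterior is Beta(a + s, b + n - s). The function names and the numbers below are hypothetical and chosen only for illustration; they do not come from the book or its code repository.

```python
def beta_binomial_update(a, b, successes, trials):
    """Posterior Beta parameters for a binomial likelihood and a Beta(a, b) prior."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    # Conjugacy: add successes to a and failures to b.
    return a + successes, b + (trials - successes)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical data: uniform Beta(1, 1) prior, 7 responders among 20 patients.
a_post, b_post = beta_binomial_update(1, 1, successes=7, trials=20)
posterior_mean = beta_mean(a_post, b_post)
print(a_post, b_post, round(posterior_mean, 4))
```

No sampling or numerical integration is needed here; the posterior is available in closed form, which is the convenience the chapter exploits.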

Chapter 4 delves into various computational algorithms for modern Bayesian inference where posterior distributions are most likely intractable analytically. The algorithms include normal and Laplace approximation, Monte Carlo sampling (e.g., importance sampling and rejection sampling), and Markov chain Monte Carlo (e.g., Gibbs sampler, Metropolis–Hastings, slice sampling, and Hamiltonian Monte Carlo). For Markov chain Monte Carlo algorithms, it is important to check convergence or lack thereof in practice; several convergence diagnostic tools are covered in the chapter.
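
To give a flavor of the Markov chain Monte Carlo methods listed above, here is a minimal random-walk Metropolis sketch (a special case of Metropolis–Hastings with a symmetric proposal). It is not the book's code: the target (a standard normal, known only up to a constant), the step size, the seed, and the burn-in length are all assumptions made for illustration.

```python
import math
import random


def log_target(x):
    """Unnormalized log density of a standard normal (illustrative target)."""
    return -0.5 * x * x


def random_walk_metropolis(log_density, start, n_samples, step=1.0, seed=0):
    """Draw n_samples states from a random-walk Metropolis chain."""
    rng = random.Random(seed)
    x = start
    log_p = log_density(x)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        log_p_new = log_density(proposal)
        # Accept with probability min(1, target(proposal) / target(x)).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        samples.append(x)  # on rejection the current state is repeated
    return samples


draws = random_walk_metropolis(log_target, start=0.0, n_samples=20000, step=2.4)
kept = draws[2000:]  # discard an assumed burn-in of 2000 iterations
sample_mean = sum(kept) / len(kept)
sample_var = sum((d - sample_mean) ** 2 for d in kept) / len(kept)
print(sample_mean, sample_var)
```

The sample mean and variance should land near 0 and 1. In practice one would also run several chains and inspect convergence diagnostics, as the chapter emphasizes, rather than rely on a single run.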

Chapter 5 extends the models in Chapter 3 to two-sample data, which is motivated by real-world scenarios where comparing two populations instead of understanding just one population is desired.

Chapter 6 formally introduces the prior specification process. While conjugacy is convenient, it is not the only consideration when specifying a prior. When relevant information about the data generating models is available, it would be almost silly not to incorporate it; an informative prior can be used to do so. When there is no reliable prior information, a reference prior may be adopted so that Bayesian inference can still be carried out. The authors also caution readers against using what they call disinformative priors, which could bias the inference in a harmful way.

Chapters 7–9 cover Bayesian inference of various regression models including linear, binary, Poisson (and its variants), and nonlinear regression.

Chapter 10 is devoted to an important topic—model assessment, which includes model comparison/selection and checking of model fit. For model selection, they discuss techniques based on Bayes factors as well as those based on information criteria (AIC, BIC, WAIC, and DIC). For model checking, they introduce Box and Johnson checks for goodness of fit and methods for checking outliers. Linear regression is used as an example for both model selection and checking.

Chapters 11 and 12 introduce models for survival data, one feature of which is that survival times are often censored. Chapter 11 discusses modeling exchangeable survival times via exponential, Weibull, and competing risks models, among others, whereas Chapter 12 incorporates covariates into survival analysis via time-to-event regression models, including accelerated failure-time models, proportional hazards models, and frailty models. Both parametric and nonparametric models are considered, with the former being the main focus.

Chapter 13 discusses the design of clinical trials. Phase 1 trials (also known as dose-finding trials or dose-escalation trials) in oncology aim to identify the maximally tolerated dose (MTD) with respect to some predefined dose-limiting toxicity. After the MTD has been found, Phase 2 trials are carried out to demonstrate the efficacy of a new treatment, often via a randomized design. Phase 3 trials are typically much larger randomized studies aiming to collect further evidence to support the results found in Phase 2. The chapter demonstrates several salient features of Bayesian approaches to designing clinical trials. For example, in Phase 2 and 3 trials, frequentist approaches would violate the likelihood principle in interim analyses, whereas Bayesian approaches are naturally free of such violation; as a result, Bayesian design is often more flexible in terms of when to peek at the data.

Chapter 14 introduces Bayesian hierarchical models, which generalize the models discussed in earlier chapters by introducing latent variables. Such latent variables may be useful to account for dependencies among observations across groups or across spatial-temporal domains. Bayesian hierarchical models have wide applicability, including models for (partially) exchangeable observations, longitudinal data, and spatial data.

Chapter 15 pays specific attention to modeling medical diagnostic tests. It covers binary tests as well as tests based on continuous biomarkers. When additional covariates are available, ROC regression is introduced to study the dependence between the diagnostic tests and covariates.

Yang Ni
Department of Statistics, Texas A&M University
College Station, TX
[email protected]
