
Book Review

This is a fantastic book focused intensively on the mathematical underpinnings of modern genome-wide association studies (GWAS). It is well suited to senior graduate students in applied mathematics, computer science, and statistics who want to build a solid mathematical understanding of GWAS. A background in advanced mathematics and genetics is expected. It can also serve as a handbook for professionals who need to quickly check the mathematical context of GWAS approaches and tools. This book is especially helpful for the latest generation of statistical geneticists pursuing academic career paths.

Chapter 1 lays out the mathematical foundations for the rest of the book. This chapter reviews the prerequisite knowledge for solving optimization problems in GWAS. It provides a systematic introduction to the metrics of similarity and distance used in sparsity-penalized optimization problems. This introduction is followed by proximal methods that use these metrics and by matrix calculus for calculating partial derivatives, both of which are essential in GWAS optimization approaches. Auxiliary tools for dimension reduction and correlation analysis are then surveyed. This chapter is crucial to the whole book in two ways: first, readers need to fully understand its contents before moving on to the following chapters; second, readers can use it as a litmus test of whether their mathematical background is sufficient for this book.

Chapter 2 introduces the concept of linkage disequilibrium (LD), which is a cornerstone of GWAS. This chapter covers the statistical definition of LD, measures of LD in various scenarios, optimization problems for reconstructing haplotypes, and correlation analysis across genomic regions. Readers are expected to be familiar with basic genetics concepts such as haplotypes, alleles, and evolutionary mechanisms, since such concepts are not defined or reviewed in this book.

Chapters 3 through 7 explore the main association problems. The journey starts with association studies for qualitative traits (Chapter 3), preparing readers gradually, from the Hardy–Weinberg equilibrium and how it provides the basis for the statistical tests used in association studies, to associating single or multiple markers with a phenotype, to multivariate association at the population level. Importantly, Chapter 3 emphasizes the challenges that the high-dimensional omics data generated by next-generation DNA sequencing (DNA-Seq) technologies pose for association analysis, along with state-of-the-art solutions for such big data. In Chapter 4, the discussion advances to association studies for quantitative traits and the linear regression models used for such analysis. Advanced algorithms used to address the challenges in DNA-Seq data analysis, namely kernel algorithms and non-linear mapping, are discussed in detail. A highlight of this chapter is the carefully designed strategy for explaining the reproducing kernel Hilbert space (RKHS), which is the foundation of kernel algorithms. Chapter 5 extends the discussion from a single phenotype to multiple phenotypes, covering multivariate linear regression, canonical correlation analysis (CCA), dimension-reduction approaches such as principal component analysis (PCA), the kernel algorithms used in CCA and PCA, and regularization methods. Chapter 6 continues the discussion of association analysis in a different population: individuals with known family relations. Methods covered in Chapters 3 through 5 are revisited in the context of genetic covariance between family members. Chapter 7 discusses an interesting new topic in association analysis, gene-gene and gene-environment interactions, and applies the previously discussed approaches to it. These chapters are well organized, progressing gradually through problems and their solutions, and readers are advised to read them in order. Kernel algorithms specialized for DNA-Seq omics data are a major highlight.

Chapter 8 briefly reviews a wide range of classical machine learning approaches that translate the knowledge discovered in association analysis into predictive models for disease risk and precision medicine. Overall, this book helps readers build a solid mathematical and theoretical foundation for state-of-the-art GWAS research and paves the way for academic and methodological research. It is well organized and lists software and library resources at the end of each chapter. It is also worth noting that, as one of the two books in the “Big Data in Omics and Imaging” series, this book focuses on genomics data and GWAS approaches. The other book in the series by the same author, Big Data in Omics and Imaging: Integrated Analysis and Causal Inference, covers multiomics data analysis approaches over a wide range of omics data, including genomics, transcriptomics, epigenomics, proteomics, and radiomics (generated from biomedical images), as well as modern biomedical big data generated from wearable health devices. Readers are encouraged to read the series as a whole.

This book has a few limitations. The introduction to mathematical foundations is narrowly focused on the needs of GWAS and therefore should not replace prerequisite mathematics courses. Additionally, this book is not meant to be used as a practical bioinformatics handbook on workflows and pipelines for processing and analyzing specific types of genomics data, such as those generated from SNP arrays, whole genome sequencing, or whole exome sequencing. Finally, for big data analysis, this book does not cover theories, models, algorithms, or tools beyond kernel algorithms.

Jing Su
Wake Forest School of Medicine
