Journal of the American Statistical Association Volume 115, 2020 - Issue 531

Applications and Case Studies

Genetic Variant Set-Based Tests Using the Generalized Berk–Jones Statistic With Application to a Genome-Wide Association Study of Breast Cancer

Ryan Sun, Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA (corresponding author); ORCID: http://orcid.org/0000-0003-1176-1561

Xihong Lin, Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA; Department of Statistics, Harvard University, Cambridge, MA

Pages 1079-1091 | Received 01 Oct 2017, Accepted 20 Aug 2019, Published online: 16 Oct 2019

https://doi.org/10.1080/01621459.2019.1660170
Abstract

Studying the effects of groups of single nucleotide polymorphisms (SNPs), as in a gene, genetic pathway, or network, can provide novel insight into complex diseases such as breast cancer, uncovering new genetic associations and augmenting the information that can be gleaned from studying SNPs individually. Common challenges in set-based genetic association testing include weak effect sizes, correlation between SNPs in a SNP-set, and scarcity of signals, with individual SNP effects often ranging from extremely sparse to moderately sparse in number. Motivated by these challenges, we propose the Generalized Berk–Jones (GBJ) test for the association between a SNP-set and outcome. The GBJ extends the Berk–Jones statistic by accounting for correlation among SNPs, and it provides advantages over the Generalized Higher Criticism test when signals in a SNP-set are moderately sparse. We also provide an analytic p-value calculation for SNP-sets of any finite size, and we develop an omnibus statistic that is robust to the degree of signal sparsity. An additional advantage of our work is the ability to conduct inference using individual SNP summary statistics from a genome-wide association study (GWAS). We evaluate the finite sample performance of the GBJ through simulation and apply the method to identify breast cancer risk genes in a GWAS conducted by the Cancer Genetic Markers of Susceptibility Consortium. Our results suggest evidence of association between FGFR2 and breast cancer and also identify other potential susceptibility genes, complementing conventional SNP-level analysis. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.

KEYWORDS:

Breast cancer
FGFR2 gene
Gene-level test
Generalized higher criticism
Sparse alternative

Supplementary Material

The supplementary materials provide the proof of Theorem 1 from Section 3.3, offer further details on how to calculate the exact p-value from Equation (5), $\Pr(G_{d} \geq g) = 1 - \Pr\{\forall j = 1, 2, \dots, d : |Z|_{(j)} \leq b_{j} \mid Z \sim \mathrm{MVN}(0, \Sigma)\}$, in Section 3.4, demonstrate the accuracy of the p-value calculation from Section 3.4, give an alternative visualization of the rejection region plots from Section 4, list the exact simulation parameters and provide further power results from Section 5, show diagnostic QQ-plots from the analysis of Section 6, and evaluate the accuracy of the summary statistic correlation approximation using data from Section 6.
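The boundary-crossing probability in Equation (5) can be illustrated with a simple Monte Carlo approximation. The sketch below is not the analytic calculation developed in the article; it only simulates draws from MVN(0, Σ) and counts how often every ordered absolute value stays under its bound. The function name `crossing_pvalue_mc` and its parameters are hypothetical, the bounds b_j are assumed to be supplied by the user, and |Z|_(j) is assumed to denote the j-th smallest absolute value, a convention this page does not state.

```python
import numpy as np


def crossing_pvalue_mc(bounds, sigma, n_draws=100_000, seed=0):
    """Monte Carlo estimate of 1 - Pr{|Z|_(j) <= b_j for all j}, Z ~ MVN(0, sigma).

    bounds : length-d sequence of b_j (assumed given; hypothetical input)
    sigma  : d x d correlation matrix of the test statistics
    """
    bounds = np.asarray(bounds, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    d = bounds.shape[0]
    if sigma.shape != (d, d):
        raise ValueError("sigma must be a d x d matrix matching len(bounds)")

    rng = np.random.default_rng(seed)
    # Draw n_draws vectors of correlated test statistics under the null.
    z = rng.multivariate_normal(np.zeros(d), sigma, size=n_draws)
    # Order statistics of |Z| in ascending order (assumed convention).
    abs_sorted = np.sort(np.abs(z), axis=1)
    # A draw stays "inside" if every ordered magnitude is under its bound.
    inside = np.all(abs_sorted <= bounds, axis=1)
    return 1.0 - inside.mean()


if __name__ == "__main__":
    # Two SNPs with correlation 0.3 and illustrative (made-up) bounds.
    sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
    print(crossing_pvalue_mc([1.5, 2.5], sigma))
```

A simulation like this is mainly useful as a sanity check on small SNP-sets; its precision is limited by the number of draws, so it cannot resolve the very small p-values needed at genome-wide significance thresholds.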

Additional information

Funding

This work was supported by the National Institutes of Health grants R35-CA197449, P01-CA134294, U19-CA203654, and R01-HL113338. The authors would like to thank the editor, associate editor, and referees for helpful comments that have improved the article.
