Applications and Case Studies

Genetic Variant Set-Based Tests Using the Generalized Berk–Jones Statistic With Application to a Genome-Wide Association Study of Breast Cancer

Pages 1079-1091 | Received 01 Oct 2017, Accepted 20 Aug 2019, Published online: 16 Oct 2019
 

Abstract

Studying the effects of groups of single nucleotide polymorphisms (SNPs), as in a gene, genetic pathway, or network, can provide novel insight into complex diseases such as breast cancer, uncovering new genetic associations and augmenting the information that can be gleaned from studying SNPs individually. Common challenges in set-based genetic association testing include weak effect sizes, correlation between SNPs in a SNP-set, and scarcity of signals, with individual SNP effects often ranging from extremely sparse to moderately sparse in number. Motivated by these challenges, we propose the Generalized Berk–Jones (GBJ) test for the association between a SNP-set and outcome. The GBJ extends the Berk–Jones statistic by accounting for correlation among SNPs, and it provides advantages over the Generalized Higher Criticism test when signals in a SNP-set are moderately sparse. We also provide an analytic p-value calculation for SNP-sets of any finite size, and we develop an omnibus statistic that is robust to the degree of signal sparsity. An additional advantage of our work is the ability to conduct inference using individual SNP summary statistics from a genome-wide association study (GWAS). We evaluate the finite sample performance of the GBJ through simulation and apply the method to identify breast cancer risk genes in a GWAS conducted by the Cancer Genetic Markers of Susceptibility Consortium. Our results suggest evidence of association between FGFR2 and breast cancer and also identify other potential susceptibility genes, complementing conventional SNP-level analysis. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
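For orientation, the classical Berk–Jones statistic that the GBJ extends can be sketched as follows. This is a minimal illustration that assumes independent, two-sided marginal Z-scores; it is not the GBJ itself, because it ignores the correlation among SNPs that the GBJ is designed to handle. The function name `berk_jones` and the choice to maximize over all order statistics are assumptions made for this example (some formulations restrict the maximum to the smaller half of the ordered p-values).

```python
import math


def berk_jones(z_scores):
    """Classical Berk-Jones statistic for independent two-sided Z-scores.

    Illustrative sketch only: it assumes independence between the test
    statistics and is NOT the correlation-adjusted GBJ from the article.
    """
    d = len(z_scores)
    # Two-sided p-values: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    # Clamp at a tiny positive value so the logarithm below stays finite.
    pvals = sorted(
        max(math.erfc(abs(z) / math.sqrt(2.0)), 1e-300) for z in z_scores
    )

    best = 0.0
    for i, p in enumerate(pvals, start=1):
        frac = i / d  # observed fraction of p-values at or below p
        if p >= frac:
            # Only count thresholds where more p-values are small
            # than expected under the null.
            continue
        # Binomial log-likelihood ratio: observed fraction vs. null rate p.
        llr = i * math.log(frac / p)
        if i < d:
            llr += (d - i) * math.log((1.0 - frac) / (1.0 - p))
        best = max(best, llr)
    return best


if __name__ == "__main__":
    # Hypothetical Z-scores for a five-SNP set with one strong signal.
    print(berk_jones([4.2, 0.3, -0.8, 1.1, 0.1]))
```

The statistic is zero when no ordered p-value falls below its expected rank fraction, and it grows as a subset of the marginal statistics becomes more extreme than the null would predict.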

Supplementary Material

The supplementary materials provide the proof of Theorem 1 from Section 3.3, offer further details on how to calculate the exact p-value from Equation (5) in Section 3.4, demonstrate the accuracy of the p-value calculation from Section 3.4, give an alternative visualization of the rejection region plots from Section 4, list the exact simulation parameters and provide further power results from Section 5, show diagnostic QQ-plots from the analysis of Section 6, and evaluate the accuracy of the summary statistic correlation approximation using data from Section 6.

Additional information

Funding

This work was supported by the National Institutes of Health grants R35-CA197449, P01-CA134294, U19-CA203654, and R01-HL113338. The authors would like to thank the editor, associate editor, and referees for helpful comments that have improved the article.
