Original Research

A simple method for assessing the strength of evidence for association at the level of the whole gene

Pages 115-120 | Published online: 17 Nov 2008

Abstract

Introduction

It is expected that different markers may show different patterns of association with different pathogenic variants within a given gene. It would be helpful to combine the evidence implicating association at the level of the whole gene rather than just for individual markers or haplotypes. Doing this is complicated by the fact that different markers do not represent independent sources of information.

Method

We propose combining the p values from all single locus and/or multilocus analyses of different markers according to the formula of Fisher, $X = -2\sum_{i} \ln p_i$, and then assessing the empirical significance of this statistic using permutation testing. We present an example application to 19 markers around the HTRA2 gene in a case-control study of Parkinson’s disease.

Results

Applying our approach shows that, although some individual tests produce low p values, overall association at the level of the gene is not supported.

Discussion

Approaches such as this should be more widely used in assimilating the overall evidence supporting involvement of a gene in a particular disease. Information can be combined from biallelic and multiallelic markers and from single markers along with multimarker analyses. Single genes can be tested, or results from groups of genes involved in the same pathway could be combined in order to test biologically relevant hypotheses. The approach has been implemented in a computer program called COMBASSOC, which is available for download.

Introduction

A commonplace issue that arises when carrying out case-control studies to detect genetic association is that more than one marker within the same gene may support association. From a genetic point of view, it may be expected that different markers may be in linkage disequilibrium (LD) with a pathogenic variant, and from a biological point of view, it may be expected that different variants within the same gene may have a role in influencing risk of disease. Often the hypothesis of interest is whether variants in a given gene influence risk, rather than whether one particular marker demonstrates association. Hence it would be desirable to combine information from multiple markers in order to obtain an overall measure of the evidence implicating a gene. As has been discussed (Neale and Sham 2004), this issue is perhaps especially pertinent in the context of genome-wide association (GWA) studies. If a number of markers within a single gene achieve modest levels of significance, then most people would agree that this finding would be of more interest than if the same number of markers achieved the same results but were randomly positioned with respect to each other.

Typically, an association study claiming to find evidence to support the involvement of a gene will present results obtained from several or many markers in the vicinity. A few single markers may individually produce small p values, and results from some multimarker methods, using logistic regression or inferred haplotypes, may be presented as offering additional support. Varied numbers of markers in different combinations will have been studied, and results from the analyses yielding the most positive results will be presented. There may be an attempt to deal with multiple testing issues by carrying out simulations in order to obtain the empirical significance of the most highly significant result. However, we argue here that the main point of interest is not the true statistical significance of only the most strongly positive analysis but rather the inference to be derived from the overall combination of results obtained from different markers and methods. It is this combination of results, in the form of p values from different single marker and multimarker tests, which is usually presented by the authors with the tacit invitation that readers use their own judgement and intuition to decide on the strength of the evidence implicating the gene in question. It would be helpful to have a formal method to support this process.

A number of complexities need to be dealt with. Firstly, markers within a gene do not represent independent sources of information, since some will be in LD with each other. Also, there may be different variants influencing risk, perhaps to different extents. If so, alleles of some markers may show association through their proximity to one variant while other markers may detect the effect of another variant. Alternatively, different haplotypes of the same marker set may be associated with different variants. Some markers within the same gene may demonstrate little or no LD with each other and hence be relatively independent. Markers some distance from the coding region may nevertheless detect association. There may be a relatively large number of markers to deal with, and methods which involve combining all of them into a conventional multimarker analysis (Chapman et al 2003, 2007; Clayton et al 2004) may be impractical, because of the large number of parameters involved, and/or inappropriate, because different variants may produce different patterns of association with different subsets of markers.

One early approach to tackling this issue was to combine results from groups of neighboring markers which were close enough to each other to be in LD (Zaykin et al 2002). This resulted in a series of p values produced from overlapping marker sets, forming a sliding window analysis, but did not produce an overall statistic at the level of the whole gene. A subsequent development (Chen et al 2006) considered combining results from analysis of single SNPs with one overall haplotype analysis. Other approaches combined results either from single marker analyses (Hoh et al 2001; Potter 2006) or from different multimarker analyses using sliding windows, incorporating a weighting scheme for markers flanking the central marker of each window (Yang et al 2006). The evaluation described in one of these studies (Potter 2006) showed that combining p values according to the method of Fisher (Fisher 1925) produced good power compared with other approaches. A method has been proposed which uses extreme-value distributions to evaluate the significance of results over blocks of markers (Dudbridge and Koeleman 2004), but it is not clear that this could readily be applied to the variety of different methods which are used to evaluate the evidence implicating a particular candidate gene. Here we present a natural development of these ideas which allows the assessment of a whole gene. It differs from previous methods in that it uses information from both multiple markers and multiple methods of analysis. Information can be combined from single marker analyses along with multimarker analyses using different numbers of markers, which may be biallelic or multiallelic, and based on haplotypes or locus-scoring methods.
No matter how many different methods are applied, one can still arrive at an overall p value which provides a measure of the strength of evidence supporting the hypothesis that one or more variants in the gene influence susceptibility to the phenotype being studied.

Method

The approach consists of two stages. The first is to combine the evidence for association and the second is to assess the strength of the evidence.

The method we use for combining p values is that due to Fisher (1925). This is based on the observation that, if n independent tests are made of the same null hypothesis and that hypothesis is true, then $X = -2\sum_{i=1}^{n} \ln p_i$ is distributed as $\chi^2$ with 2n degrees of freedom. The p values to be combined could be obtained from a set of single marker analyses or could come from both single marker and multimarker analyses. The summary measure obtained, X, could be taken to provide a combined measure of the strength of evidence in favor of association for a group of markers, except that we do not expect the contributions to be independent.
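A minimal Python sketch of this first stage follows, using only the standard library. The function names are hypothetical illustrations, not the COMBASSOC interface. Because 2n is always even, the $\chi^2$ tail probability has a closed form (it equals the probability that a Poisson variable with mean X/2 is at most n − 1), so no statistics package is needed. The value returned is only the nominal p value: it assumes independent tests, which is exactly the assumption the second stage removes.

```python
import math

def fisher_statistic(pvalues):
    """Fisher's combined statistic X = -2 * sum(ln p_i)."""
    if not pvalues:
        raise ValueError("at least one p value is required")
    for p in pvalues:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"p value outside (0, 1]: {p}")
    return -2.0 * sum(math.log(p) for p in pvalues)

def chi2_sf_even_df(x, n):
    """P(chi-square with 2n degrees of freedom > x), exact for even df.

    Uses P(chi2_{2n} > x) = P(Poisson(x / 2) <= n - 1), summed in log space
    so that large x or large n does not underflow.
    """
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    log_terms = [-half + k * math.log(half) - math.lgamma(k + 1) for k in range(n)]
    top = max(log_terms)
    return min(1.0, math.exp(top) * sum(math.exp(t - top) for t in log_terms))

def fisher_nominal_pvalue(pvalues):
    """Nominal combined p value; valid only if the tests are independent."""
    return chi2_sf_even_df(fisher_statistic(pvalues), len(pvalues))
```

For example, `chi2_sf_even_df(149.4, 54)` evaluates the nominal tail probability for the statistic reported in the Results (54 p values, 108 degrees of freedom) and gives roughly 0.005.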

This is dealt with by the second stage of our procedure, which is simply to use permutation testing to assess the empirical significance of X. If we keep the multimarker genotypes intact and permute case-control labels, then this fully deals with all the interdependencies of the markers due to LD between them and with the interdependencies between the methods of analysis. Among other things, it may also mitigate the effects of over-correcting for multiple markers (as could occur if a Bonferroni correction were applied), some of which may be scarcely informative. The procedure we propose for obtaining an empirical significance is sequential Monte Carlo testing (Besag and Clifford 1991). When carrying out permutation testing, rather than setting the number of permuted replicates, n, to a fixed number, one instead sets a target for r, the number of times that a permuted replicate should equal or exceed the test statistic obtained from the real dataset. Typically the target for r might be set to a value of 10 or 20. One would also set some maximum value of n to ensure that the procedure does eventually finish. If the target value for r is reached, then the empirical significance is given by p = r/n, while if the target is not reached before n reaches its maximum value, the empirical significance is given by p = (r + 1)/(n + 1), as used in conventional Monte Carlo testing (North et al 2003a). The sequential approach produces a very valuable increase in the speed of permutation testing when the p value to be estimated turns out to be non-significant. If there is no association present, then the expected number of permutations performed before stopping is approximated by $r + r\ln\frac{n + 1/2}{r + 1/2}$ (Besag and Clifford 1991), where n is the maximum number of permutations. For example, with a target of r = 10 and a maximum of n = 9999, one may expect to perform about 79 permutations, roughly a 127-fold speed increase compared with the conventional method.
By permuting the multimarker genotypes against phenotype, this approach can be trusted to yield the correct Type 1 error rate when the null hypothesis is true.
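The stopping rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the COMBASSOC code: `draw_null_statistic` is a hypothetical callable that permutes the case-control labels once and returns the recomputed combined statistic. The last two functions evaluate the expected cost under the null hypothesis, with `math.log` as the natural logarithm; the approximation comes from the exact value r + r(1/(r + 1) + 1/(r + 2) + ... + 1/n), which holds when there are no ties.

```python
import math

def sequential_mc_pvalue(observed, draw_null_statistic, r_target=10, n_max=9999):
    """Sequential Monte Carlo p value with the stopping rule described above.

    Draws permuted-data statistics until r_target of them equal or exceed
    `observed`, or until n_max have been drawn.
    Returns (p_value, replicates_used).
    """
    r = 0
    for n in range(1, n_max + 1):
        if draw_null_statistic() >= observed:
            r += 1
            if r == r_target:
                return r / n, n                  # target reached: p = r / n
    return (r + 1) / (n_max + 1), n_max          # cap reached: p = (r + 1) / (n + 1)

def expected_replicates_under_null(r_target, n_max):
    """Approximate expected number of replicates drawn when H0 is true."""
    return r_target + r_target * math.log((n_max + 0.5) / (r_target + 0.5))

def exact_expected_replicates_under_null(r_target, n_max):
    """Exact value r + r * (1/(r+1) + ... + 1/n), assuming no ties."""
    return r_target + r_target * sum(1.0 / k for k in range(r_target + 1, n_max + 1))
```

With r = 10 and n = 9999, `expected_replicates_under_null` gives about 78.6, and the exact sum agrees to within 0.01.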

To summarize, we propose that, to obtain an overall measure of the strength of evidence supporting involvement of a gene which has been typed with a number of markers subjected to different single locus and multilocus methods of analysis, one first derives $X = -2\sum_{i} \ln p_i$ and then assesses the empirical significance of X using permutation testing.
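Putting the two stages together, a minimal self-contained Python sketch might look like the following. It assumes biallelic SNPs coded as allele counts (0, 1, 2), cases labelled 1 and controls 0, and it uses only a 1-df allelic $\chi^2$ test per marker; haplotype-based or logistic-regression analyses of marker subsets would simply be further functions in the `analyses` list. All names are hypothetical and do not describe how COMBASSOC is implemented. Only the labels are shuffled, so each subject's multimarker genotype stays intact.

```python
import math
import random

def allelic_pvalue(genotypes, labels):
    """1-df allelic chi-square p value for one biallelic SNP.

    genotypes: copies of the counted allele per subject (0, 1 or 2)
    labels: 1 for a case, 0 for a control
    """
    counts = [[0, 0], [0, 0]]  # rows: control/case; columns: allele 1/allele 2
    for g, y in zip(genotypes, labels):
        counts[y][0] += g
        counts[y][1] += 2 - g
    total = sum(map(sum, counts))
    row_sums = [sum(row) for row in counts]
    col_sums = [counts[0][j] + counts[1][j] for j in range(2)]
    x2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            if expected > 0:
                x2 += (counts[i][j] - expected) ** 2 / expected
    return math.erfc(math.sqrt(x2 / 2.0))  # P(chi-square with 1 df > x2)

def single_marker_analyses(num_markers):
    """One analysis per marker; each maps (genotype rows, labels) to a p value."""
    return [
        lambda rows, labels, j=j: allelic_pvalue([row[j] for row in rows], labels)
        for j in range(num_markers)
    ]

def combined_statistic(rows, labels, analyses):
    """Fisher's X over every pre-specified analysis."""
    # The floor avoids log(0) if a p value underflows for an extreme table.
    return -2.0 * sum(math.log(max(f(rows, labels), 1e-300)) for f in analyses)

def gene_level_pvalue(rows, labels, analyses, r_target=10, n_max=9999, seed=None):
    """Empirical significance of X by sequentially permuting case-control labels.

    Each subject's row of genotypes is never split up, so LD between
    markers (and overlap between analyses) is preserved in every replicate.
    Returns (empirical_p, observed_X).
    """
    rng = random.Random(seed)
    observed = combined_statistic(rows, labels, analyses)
    permuted = list(labels)
    r = 0
    for n in range(1, n_max + 1):
        rng.shuffle(permuted)
        if combined_statistic(rows, permuted, analyses) >= observed:
            r += 1
            if r == r_target:
                return r / n, observed
    return (r + 1) / (n_max + 1), observed
```

Because the same list of analyses is re-run on every permuted dataset, any dependence between overlapping analyses is automatically reflected in the null distribution of X.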

In order to provide a demonstration of the approach in practice, we applied it to a publicly available case-control dataset. This consisted of 270 subjects with Parkinson’s disease and 271 controls genotyped for a GWA study using the Illumina Infinium I and Infinium II assays (Fung et al 2006). These genotypes were downloaded from the Coriell Institute (http://ccr.coriell.org). There has been a previous report that two different mutations within the HTRA2 gene may be associated with Parkinson’s disease (Strauss et al 2005). According to the UCSC browser (http://genome.ucsc.edu/cgi-bin/hgGateway), HTRA2 is located on chromosome 2 at 74610040-74614191. We selected 19 SNPs spanning this region, ranging from rs6718621 at 74512208 to rs10170219 at 74715172, and calculated individual p values testing for association with each marker using the SCANASSOC program (Curtis et al 2006). In addition to single marker analyses, we carried out haplotype-based tests for association using consecutive sets of two or three markers. We then applied the new approach to assess the overall evidence for association obtained from this group of 19 markers. We set a target of 10 for r, the number of permuted datasets required to equal or exceed the value obtained from the real one, and a maximum number of permutations, n, of 9999.
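The set of analyses in this example is fully determined before any results are seen. A short Python sketch (the function name is hypothetical) enumerates the consecutive windows of one, two and three markers along the 19 SNPs and shows where the count of 54 p values, and hence the 108 degrees of freedom of the nominal $\chi^2$ reference, comes from.

```python
def consecutive_windows(num_markers, sizes=(1, 2, 3)):
    """Index tuples for every run of adjacent markers of each requested size."""
    windows = []
    for size in sizes:
        for start in range(num_markers - size + 1):
            windows.append(tuple(range(start, start + size)))
    return windows

analyses = consecutive_windows(19)
print(len(analyses))        # 54  (19 single + 18 two-marker + 17 three-marker)
print(2 * len(analyses))    # 108 degrees of freedom for Fisher's nominal test
print(analyses[19])         # (0, 1): the first two-marker window
```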

Results

The results from the tests of the individual markers are shown in Table 1. It can be seen that one marker, rs2241027, is significant at p = 0.04 and that two others yield p values below 0.1. One three-marker analysis has a test-wise significance of 0.03. We combined all 54 p values according to the formula $X = -2\sum_{i} \ln p_i$ and obtained a value of 149.4. Taking this as a $\chi^2$ statistic with 108 degrees of freedom would produce a nominal p value of 0.005. However, when we carried out permutation testing, the target number of 10 permuted datasets producing this value or higher was reached after only 62 permutations, corresponding to an empirical significance of 10/62 = 0.16.

Table 1 Markers spanning HTRA2 showing individual p values obtained for tests for association with Parkinson’s disease

Discussion

The approach we propose seems simple and appears to have face validity. It adequately deals with the issues of non-independence between markers and methods of analysis while allowing information to be combined from many markers in different regions of the same gene. There may be some benefit in considering it in relation to other approaches for combining evidence from diverse sources. The philosophy underlying the Bonferroni correction and related procedures such as the estimation of the false positive report probability (Wacholder et al 2004) is that one is carrying out a number of unrelated experiments and one wishes to test whether, for at least one of them, the alternative hypothesis may be true. The philosophy of Fisher’s approach is that one is carrying out multiple independent experiments to test a single hypothesis. Notionally, one may then expect that the same effect will be present in all experiments, although stochastic factors will affect the results one obtains in practice. Thus one may expect that some studies will yield significant results while others may, through chance or small sample size, be formally non-significant. Nevertheless, one will tend to see that the p values obtained over all experiments are smaller than would be expected by chance. When interpreting data from markers around a single gene, one faces a hybrid situation. One expects that some markers may provide information regarding the main hypothesis, that the gene concerned affects the phenotype studied, while other markers will not be in LD with functional variants and hence will behave as unrelated sources of essentially random effects.
One way to model this situation would be to carry out logistic regression analysis with each marker treated as an independent variable contributing to risk (Chapman et al 2003), although it is not clear to what extent significance testing based on asymptotic distributions would be appropriate if more than a few markers were included in such an analysis. Certainly, what we notice in practice is that authors report the best results they have obtained from single marker and multimarker analyses, generally without any formal attempt to consolidate the overall evidence implicating a gene. Our method of combining all results from all sources and carrying out permutation testing does provide a means to obtain such a summary p value.
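The difference in philosophy can be seen with a toy calculation in Python. The p values below are invented for illustration (they are not from the HTRA2 data) and the tests are assumed to be independent. Ten results of p = 0.1 give no evidence under a Bonferroni-style "at least one" question, yet Fisher's combination treats them as repeated evidence about one hypothesis. With correlated markers, the nominal Fisher value would overstate the evidence, which is why the permutation step is still needed.

```python
import math

def bonferroni_min_p(pvalues):
    """Smallest p value multiplied by the number of tests, capped at 1."""
    return min(1.0, len(pvalues) * min(pvalues))

def fisher_combined_p(pvalues):
    """Fisher's combined p value, assuming independent tests."""
    x = -2.0 * sum(math.log(p) for p in pvalues)
    if x <= 0.0:
        return 1.0
    half, n = x / 2.0, len(pvalues)
    # P(chi-square with 2n df > x) = P(Poisson(x / 2) <= n - 1)
    log_terms = [-half + k * math.log(half) - math.lgamma(k + 1) for k in range(n)]
    top = max(log_terms)
    return min(1.0, math.exp(top) * sum(math.exp(t - top) for t in log_terms))

pvals = [0.1] * 10                  # illustrative values only
print(bonferroni_min_p(pvals))      # 1.0
print(fisher_combined_p(pvals))     # about 0.0008
```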

Although we have been unable to find any published account, it appears that a somewhat similar method to ours may be implemented in Shaun Purcell’s PLINK program (Purcell et al 2007), as described in the online documentation (http://pngu.mgh.harvard.edu/~purcell/plink/anal.shtml#set). According to the documentation accompanying version 0.99q (3 March 2007), this can carry out analyses on subsets of markers selected from a set designated by the user. The subset size is varied between a minimum and maximum size, also specified by the user, and the best result obtained is defined in terms of the sum of the largest chi-squared statistics from a subset of each size. The overall significance is then evaluated using a permutation procedure. We suspect that our method would be similar to setting both the minimum and maximum subset size equal to the size of the whole set; that is, one would simply sum the chi-squared results for all markers. However, the documentation implies that one should avoid doing this by setting the maximum subset size to a “reasonable number” in order to avoid performing an “unnecessary number of tests”. If the minimum and maximum sizes differ, then in fact additional tests are performed for the different sizes. Our approach explicitly addresses the possibility that different kinds of analysis might be used. The software we have implemented allows incorporation of locus-based logistic regression as well as haplotype-based analyses. In principle, other methods of analysis, for example neural network analysis (North et al 2003b) or haplotype clustering methods (Knight et al 2008), could be accommodated. Our approach defines in advance the markers of interest and takes a summary statistic derived from all their p values simultaneously. By contrast, previous work suggests that, at least in some situations, more powerful tests will result if only p values below a certain threshold are combined (Zaykin et al 2002).
Once again, the choice of threshold is arbitrary and it is not clear that using this truncation will always be of benefit. The exploration of the advantages and disadvantages of each approach could be the subject of further investigation.
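For comparison, the alternative summary statistics mentioned in this paragraph can be written in a few lines of Python. This is a sketch only: `top_k_chi2_sum` is a simplified reading of the set-based description above, not PLINK's implementation, and the threshold `tau` in the truncated product (Zaykin et al 2002) is an arbitrary choice, which is exactly the difficulty noted above. Whichever statistic is chosen, its significance would still be obtained by recomputing it on each permuted dataset.

```python
import math

def fisher_all(pvalues):
    """Our approach: every pre-specified p value contributes."""
    return -2.0 * sum(math.log(p) for p in pvalues)

def truncated_product(pvalues, tau=0.05):
    """Truncated product statistic on the -2 ln scale: only p <= tau contribute."""
    return -2.0 * sum(math.log(p) for p in pvalues if p <= tau)

def top_k_chi2_sum(chi2_stats, k):
    """Sum of the k largest chi-square statistics in a marker set."""
    return sum(sorted(chi2_stats, reverse=True)[:k])

pvals = [0.04, 0.08, 0.3, 0.6, 0.9]                  # illustrative values only
print(fisher_all(pvals) > truncated_product(pvals))  # True: larger p values still add to X
```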

Our example application does not provide support for association between Parkinson’s disease and HTRA2. Different conclusions might have been drawn had there been a stronger prior hypothesis, for example if the three markers with p < 0.1 had been specifically implicated in other studies. At the level of the gene, however, our overall result is negative. There appear to be more small p values than would be expected by chance (as is clear from the value of $X = -2\sum_{i} \ln p_i$ relative to its nominal $\chi^2$ distribution), so a naïve interpretation might have been that these markers did support association. However, once we apply our permutation procedure, we can see that, because of the non-independence of the p values, the results are in fact well within chance expectation. This demonstrates the value of our approach in being able to summarize the available evidence.

A number of extensions to this basic approach could be developed. We should begin by pointing out that even as it stands the method can combine information from biallelic and multiallelic markers. It can also combine information from both single marker analyses and multimarker analyses. Thus one might wish to treat some sets of markers as being suitable for haplotype analysis and combine the information from these with results from other markers or groups of markers. As demonstrated in the example above, one can also combine results from single marker and multimarker analysis of the same markers. That is, if one had 4 markers one could combine the 4 single marker p values along with a p value obtained from haplotype analysis of all of them. Using multiple methods of analysis may risk reducing power somewhat but the overall significance level obtained remains valid.

Other ways could be considered to combine the individual p values. For example, more weight could be given to markers within coding regions or those closer to rather than further from the gene or those having been implicated in previous studies. Again, the permutation testing will ensure that whatever method is used to combine them the empirical significance level will still be valid. Results from functionally related groups of genes could be combined. This would provide evidence to implicate a particular pathway or system rather than an individual gene.
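A weighted variant is a small change. In this Python sketch the weights (for example, double weight for a coding-region marker) and the gene names are hypothetical; the weights must be fixed before the data are examined, and the same weighted statistic would then be recomputed on every permuted dataset, so the empirical significance remains valid. Pooling the analyses of several genes in a pathway is handled the same way, by concatenating their p values into one statistic.

```python
import math

def weighted_fisher(pvalues, weights):
    """Weighted combination X_w = -2 * sum(w_i * ln p_i); weights fixed in advance."""
    if len(pvalues) != len(weights):
        raise ValueError("one weight per p value is required")
    return -2.0 * sum(w * math.log(p) for p, w in zip(pvalues, weights))

def pathway_statistic(pvalues_by_gene, weights_by_gene=None):
    """Combine the pre-specified analyses of all genes in a pathway into one X."""
    genes = sorted(pvalues_by_gene)
    pvalues = [p for g in genes for p in pvalues_by_gene[g]]
    if weights_by_gene is None:
        weights = [1.0] * len(pvalues)
    else:
        weights = [w for g in genes for w in weights_by_gene[g]]
    return weighted_fisher(pvalues, weights)

# Hypothetical example: the coding-region marker of GENE_A gets double weight.
x = pathway_statistic(
    {"GENE_A": [0.02, 0.4], "GENE_B": [0.7]},
    {"GENE_A": [2.0, 1.0], "GENE_B": [1.0]},
)
```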

We should note some situations in which the empirical significance level would not be valid. The main principle is that the p values to be combined must not be selected on a post hoc basis. For example, one must not notice that a particular intron contains a number of interesting results and then combine the results just from that intron. One cannot elect to include markers from some distance away after seeing that some appear to support association. One cannot perform a number of different multimarker analyses and then include the results from only the most significant ones. One can apply this approach to a gene which appears interesting based on the fact that a number of markers within it appear to show some evidence for association but only if one then proceeds to make a standard multiple-testing correction for all the other genes for which genotypes were obtained.
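A small simulation in Python illustrates why post hoc selection is ruled out. It makes simplifying assumptions (independent null p values drawn uniformly, and Fisher's nominal $\chi^2$ reference rather than permutation), and the numbers of tests and selected results are arbitrary. Combining three pre-specified results rejects at close to the nominal 5% rate, whereas combining the three smallest of twenty rejects far more often. The same inflation would affect a permutation p value if the subset were chosen by looking at the real data only.

```python
import math
import random

def chi2_sf_even_df(x, n):
    """P(chi-square with 2n df > x) via P(Poisson(x / 2) <= n - 1)."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    log_terms = [-half + k * math.log(half) - math.lgamma(k + 1) for k in range(n)]
    top = max(log_terms)
    return min(1.0, math.exp(top) * sum(math.exp(t - top) for t in log_terms))

def rejection_rate(post_hoc, num_tests=20, keep=3, trials=4000, alpha=0.05, seed=0):
    """Fraction of null datasets in which the combined test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        pvals = [1.0 - rng.random() for _ in range(num_tests)]  # uniform on (0, 1]
        if post_hoc:
            chosen = sorted(pvals)[:keep]   # keep only the most "interesting" results
        else:
            chosen = pvals[:keep]           # subset fixed before seeing any data
        x = -2.0 * sum(math.log(p) for p in chosen)
        if chi2_sf_even_df(x, keep) <= alpha:
            rejections += 1
    return rejections / trials

print(rejection_rate(post_hoc=False))   # close to 0.05
print(rejection_rate(post_hoc=True))    # far above 0.05
```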

We acknowledge that although our approach may appear theoretically attractive, we are not currently able to present clear evidence regarding its power compared with other methods. This is because it is intended to deal with a situation which is biologically plausible – that different mutations in the same gene might each have an effect on a given phenotype – but for which real data are lacking and for which plausible computer simulations would be technically difficult. One would need to model datasets in which multiple mutations occurred within the same gene along with the complex and realistic LD relationships for markers around each mutation. We have previously studied such models in the context of a single mutation (North et al 2006) but have not as yet produced a procedure to carry out systematic studies of the simulated effects of multiple mutations. We do not expect that the approach would be any more powerful than pre-existing methods in the simple situation of a single mutation. Although we cannot claim to have demonstrated that the approach is necessarily more powerful than other methods, we are confident at least that the permutation procedure means that the overall result is valid, that is, that the Type 1 error rate is correct. This means that our approach does at least provide a way of summarizing the available evidence implicating a particular region rather than having to rely upon the reader’s subjective judgment based on a number of non-independent p values obtained from different analyses.

The method for combining results from different analyses has been implemented in the COMBASSOC program, which is available along with the other programs to support GENECOUNTING (Zhao et al 2002) from our website at: www.mds.qmul.ac.uk/statgen. Analyses can consist of any number of single marker tests, multimarker haplotype analyses and multimarker locus-wise analyses using logistic regression. If desired, different subsets of markers can be selected for different analyses. No matter how many different tests are performed, results from all are combined to produce one overall measure of the strength of evidence in favor of association, and the empirical significance of this is derived using permutation testing, providing a single overall p value.

We hope that the approach outlined will prove attractive and practical. It provides a simple and intuitive way to provide some objective assessment of the overall evidence for association produced by a group of markers. We consider such an approach to be preferred to the widespread practice of quoting the individual significance of a number of different single marker and/or multimarker analyses and leaving it to the reader to form some kind of judgement as to the implication of the results.

Acknowledgments

AEV was supported by Wellcome Trust Project Grant, Grant No. 076392. JK was supported by an MRC Bioinformatics Training Fellowship, Grant No. G0501329.

Disclosures

The authors disclose no conflicts of interest.

References

  • Besag J, Clifford P. 1991. Sequential Monte Carlo p-values. Biometrika, 78:301-4.
  • Chapman J, Clayton D. 2007. One degree of freedom for dominance in indirect association studies. Genet Epidemiol, 31:261-71. PMID: 17266117
  • Chapman JM, Cooper JD, Todd JA. 2003. Detecting disease associations due to linkage disequilibrium using haplotype tags: a class of tests and the determinants of statistical power. Hum Hered, 56:18-31. PMID: 14614235
  • Chen BE, Sakoda LC, Hsing AW. 2006. Resampling-based multiple hypothesis testing procedures for genetic case-control association studies. Genet Epidemiol, 30:495-507. PMID: 16755536
  • Clayton D, Chapman J, Cooper J. 2004. Use of unphased multilocus genotype data in indirect association studies. Genet Epidemiol, 27:415-28. PMID: 15481099
  • Curtis D, Knight J, Sham PC. 2006. Program report: GENECOUNTING support programs. Ann Hum Genet, 70:277-9. PMID: 16626337
  • Dudbridge F, Koeleman BP. 2004. Efficient computation of significance levels for multiple associations in large studies of correlated data, including genomewide association studies. Am J Hum Genet, 75:424-35. PMID: 15266393
  • Fisher RA. 1925. Statistical methods for research workers. 13th ed. London: Oliver and Boyd.
  • Fung HC, Scholz S, Matarin M. 2006. Genome-wide genotyping in Parkinson’s disease and neurologically normal controls: first stage analysis and public release of data. Lancet Neurol, 5:911-6. PMID: 17052657
  • Hoh J, Wille A, Ott J. 2001. Trimming, weighting, and grouping SNPs in human case-control association studies. Genome Res, 11:2115-9. PMID: 11731502
  • Knight J, Curtis D, Sham PC. 2008. CLUMPHAP: a simple tool for performing haplotype-based association analysis. Genet Epidemiol, 32:539-45. PMID: 18395815
  • Neale BM, Sham PC. 2004. The future of association studies: gene-based analysis and replication. Am J Hum Genet, 75:353-62. PMID: 15272419
  • North B, Sham PC, Knight J. 2006. Investigation of the ability of haplotype association and logistic regression to identify associated susceptibility loci. Ann Hum Genet [online], 70:893-906. PMID: 17044864
  • North BV, Curtis D, Sham PC. 2003a. A note on calculation of empirical P values from Monte Carlo procedure. Am J Hum Genet, 72:498-9. PMID: 12596795
  • North BV, Curtis D, Cassell PG. 2003b. Assessing optimal neural network architecture for identifying disease-associated multi-marker genotypes using a permutation test, and application to calpain 10 polymorphisms associated with diabetes. Ann Hum Genet, 67:348-56. PMID: 12914569
  • Potter DM. 2006. Omnibus permutation tests of the association of an ensemble of genetic markers with disease in case-control studies. Genet Epidemiol, 30:438-46. PMID: 16671109
  • Purcell S, Neale B, Todd-Brown K. 2007. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet, 81:559-75. PMID: 17701901
  • Strauss KM, Martins LM, Plun-Favreau H. 2005. Loss of function mutations in the gene encoding Omi/HtrA2 in Parkinson’s disease. Hum Mol Genet, 14:2099-111. PMID: 15961413
  • Wacholder S, Chanock S, Garcia-Closas M. 2004. Assessing the probability that a positive report is false: an approach for molecular epidemiology studies. J Natl Cancer Inst, 96:434-42. PMID: 15026468
  • Yang HC, Lin CY, Fann CS. 2006. A sliding-window weighted linkage disequilibrium test. Genet Epidemiol, 30:531-45. PMID: 16830340
  • Zaykin DV, Zhivotovsky LA, Westfall PH. 2002. Truncated product method for combining P-values. Genet Epidemiol, 22:170-85. PMID: 11788962
  • Zhao JH, Lissarrague S, Essioux L. 2002. GENECOUNTING: haplotype analysis with missing genotypes. Bioinformatics, 18:1694-5. PMID: 12490459