Short Report

Lack of correlation between in silico projection of function and quantitative real-time PCR-determined gene expression levels in colon tissue

Pages 99-103 | Published online: 09 Sep 2013

Abstract

There are a number of in silico programs that use algorithms and external web sources to predict the effect of single nucleotide polymorphisms (SNPs). Although many of these programs have been shown to predict accurately the effect of SNPs in functional regions of a gene, such as the 5′ upstream or coding regions, empiric research may be warranted to confirm the functional consequences of SNPs that are predicted to have little to no effect. We compared predictions from FASTSNP (Function Analysis and Selection Tool for Single Nucleotide Polymorphism) and F-SNP (Functional Single Nucleotide Polymorphism) with experimentally derived genotype–phenotype correlations to determine how accurately these programs predict SNP functionality. We used normal colon tissue to evaluate 24 TagSNPs within six genes. Two of the 16 SNPs that FASTSNP predicted to have no functional effect were significantly associated with gene expression. Only one of the eight SNPs predicted to have a low to high effect was significantly associated with gene expression. The two in silico programs gave similar results for the SNPs that FASTSNP predicted to have no effect; however, of the SNPs with FASTSNP scores from low to high, three received an F-SNP score below the threshold considered functionally significant. In silico programs can fail to identify functional SNPs, supporting a continuing role for empiric analysis of SNP function. Laboratory analysis is necessary to identify causal SNPs accurately, establish the biological plausibility of their effects, and ultimately inform cancer prevention strategies.

Introduction

The ability to link functional genetic variants with disease risk can lead to advances in diagnostics and therapeutics.Citation1 Over 10 million single nucleotide polymorphisms (SNPs) have been reported,Citation2 of which an estimated 100,000–300,000 alter an amino acid.Citation3 In silico prediction programs have been developed to identify SNPs with possible functional effects.Citation4 Several such programs are available, each with its own algorithm for assessing the potential effect of an amino acid sequence substitution.Citation5 For instance, FASTSNP (Function Analysis and Selection Tool for Single Nucleotide Polymorphism) uses web wrapper agents to gather information from 11 different web servers and offer real-time information on phenotypic risk and functional effects.Citation6 F-SNP (Functional Single Nucleotide Polymorphism) integrates 16 different tools and databases to predict functionality at the levels of splicing, transcription, translation, and post-translational modification.Citation4

These programs are useful for prioritizing SNPs for genotyping and for more detailed functional analyses. A large survey of many of these programs showed a high level of consistency among them in identifying high-risk/high-priority SNPs for colon cancer research.Citation7 However, a growing body of research supports a functional role for intronic SNPs. For example, an intronic SNP associated with acute lung injury and asthma regulates promoter activity of smMLCK,Citation8 another in PRRX2 has been shown to interact with the conditioning region in KLK2-KLK3,Citation9 and yet another in the GH1 gene, which is associated with reduced colorectal cancer risk, was shown to decrease GH1 expression.Citation10 Each of these intronic SNPs is predicted to have no to low risk of effect by either of the in silico prediction programs FASTSNP and F-SNP.

To explore how accurately predictive models capture SNP functionality, we correlated the identified tagSNPs with gene expression in normal colon tissue. These empiric results were then compared with the risk predictions of two in silico programs, FASTSNP and F-SNP.

Materials and methods

Tissue samples

Deidentified normal frozen colon tissues (n = 82) were obtained from the Cooperative Human Tissue Network, funded by the National Cancer Institute, and stored at −80°C. Of the sample population, 54% were male and 46% were female. The tissue donors were aged 17–92 (mean 60.48) years and were of Caucasian (n = 51), African American (n = 23), Asian (n = 1), and unknown (n = 7) origin.

Reverse transcription and quantitative real-time polymerase chain reaction

Total DNA was isolated from normal colon tissue samples using the AllPrep DNA/RNA/Protein Mini Kit (Qiagen, Valencia, CA, USA). Total RNA was isolated using Trizol (Invitrogen, Grand Island, NY, USA) for homogenization and the RNeasy Mini Kit (Qiagen) for purification, following a protocol developed by Mauricio Rodriquez-Lanetty (unpublished) with minor alterations. Briefly, tissues (about 25 mg) were homogenized in 150 μL Trizol using a Bullet Blender and stainless steel beads. The homogenate was transferred to a new vial with 450 μL of Trizol. After 100 μL of chloroform was added, the vials were shaken well, incubated for 2 minutes at room temperature, and centrifuged, and the supernatant was transferred to a new vial. An equal volume of 100% ethanol was added, and the mixture was loaded onto an RNeasy spin column. RNA was washed and eluted according to the RNeasy protocol.

First-strand cDNA synthesis was performed with the High Capacity RNA-to-cDNA Kit (ABI, Carlsbad, CA, USA) on 500 ng of total RNA, as quantified with an RNA 6000 Nano kit (Agilent, Santa Clara, CA, USA). Quantitative real-time reverse transcription polymerase chain reaction (PCR) was performed on the ABI 7900HT Fast Real-Time PCR System using TaqMan primer/probe sets and TaqMan Fast Universal PCR Master Mix, no AmpErase® UNG (ABI). Reactions were run in triplicate according to the manufacturer's protocol on cDNA diluted 1:10 for 50 PCR cycles, and only samples whose replicate cycle threshold (Ct) values had a standard deviation <1 were retained (exclusions: IFNGR2 [1], IL1B [1]). Samples were normalized to β-actin, and those with a β-actin Ct >30 were discarded (exclusions: IFNGR1 [4], IFNGR2 [4], IL1B [5], LEPR [1], RPS6KB1 [1], TSC2 [4]). For genes of interest, Ct values ≥40 or undetermined were set to 40. β-actin was chosen as the housekeeping gene because, in normal colon tissue, structural housekeeping genes such as β-actin have been shown to vary less than metabolic housekeeping genes such as glyceraldehyde 3-phosphate dehydrogenase.Citation11
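The sample-level quality-control and normalization rules above can be sketched as follows. This is a minimal Python illustration, not the authors' pipeline, and the function names are hypothetical. Three details the text leaves open are assumptions here: undetermined wells and Ct values ≥40 are capped at 40 before the replicate standard deviation is checked, the SD <1 rule is applied to the target-gene replicates, and ΔCt is taken as Ct(β-actin) minus Ct(target) so that 2^ΔCt rises with expression.

```python
from statistics import mean, stdev

# Thresholds stated in the methods text.
MAX_REPLICATE_SD = 1.0   # keep replicates whose Ct standard deviation is < 1
MAX_ACTIN_CT = 30.0      # discard samples whose beta-actin Ct is > 30
CT_CEILING = 40.0        # target Ct >= 40 or undetermined is set to 40


def target_ct(replicates):
    """Mean target-gene Ct from replicate wells, or None if the run fails QC.

    Undetermined wells are passed as None. ASSUMPTION: capping at 40 is
    applied to each well before the replicate SD is checked.
    """
    capped = [CT_CEILING if ct is None or ct >= CT_CEILING else ct
              for ct in replicates]
    if stdev(capped) >= MAX_REPLICATE_SD:
        return None
    return mean(capped)


def delta_ct(target_replicates, actin_replicates):
    """Delta Ct for one sample, or None if the sample is excluded.

    ASSUMPTION: delta Ct = Ct(beta-actin) - Ct(target), so 2 ** delta_ct
    rises as target expression rises; the text does not give the order.
    """
    actin = mean(actin_replicates)
    if actin > MAX_ACTIN_CT:
        return None          # reference gene amplified too late
    target = target_ct(target_replicates)
    if target is None:
        return None          # replicate wells disagree too much
    return actin - target


def relative_expression(target_replicates, actin_replicates):
    """2 ** delta Ct, the expression measure later summarized by genotype."""
    d = delta_ct(target_replicates, actin_replicates)
    return None if d is None else 2 ** d
```

Returning `None` rather than raising keeps excluded samples easy to drop when building the per-genotype lists used in the statistical analysis.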

TagSNP selection and genotyping

TagSNPs were selected using the following parameters: an r² threshold of 0.8 to define linkage disequilibrium (LD) blocks, based on a Caucasian LD map; a minor allele frequency >0.1; a region extending from 1,500 base pairs upstream of the initiation codon to 1,500 base pairs downstream of the termination codon; and one SNP per LD bin. All markers were genotyped using a multiplexed bead array assay based on GoldenGate chemistry (Illumina, San Diego, CA, USA). A genotyping call rate of 99.93% was attained. Blinded internal replicates made up 1.6% of the sample set, and the duplicate concordance rate was 99.996%.
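The selection criteria above can be illustrated with a greedy binning sketch in Python. The text does not name the selection software, so this is not the authors' procedure. The greedy rule used here, which picks the SNP that tags the most remaining SNPs, is one common way to obtain one SNP per bin. For this sketch, r² is approximated as the squared correlation of genotype dosages in a user-supplied panel rather than read from a Caucasian LD map, and the gene is assumed to lie on the plus strand so that the initiation codon precedes the termination codon. All function names are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation of two genotype dosage vectors (0/1/2).

    An approximation: published LD maps usually report haplotype-based r^2.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0                      # monomorphic SNP: no LD information
    return sxy * sxy / (sxx * syy)


def minor_allele_frequency(dosages):
    p = sum(dosages) / (2 * len(dosages))
    return min(p, 1 - p)


def select_tag_snps(snps, start_codon, stop_codon,
                    r2_min=0.8, maf_min=0.1, flank=1500):
    """Greedy one-tag-per-bin selection.

    snps: dict rsid -> (position, dosage list); positions are hypothetical
    plus-strand coordinates.
    """
    lo, hi = start_codon - flank, stop_codon + flank
    eligible = {
        rsid: dosages
        for rsid, (pos, dosages) in snps.items()
        if lo <= pos <= hi and minor_allele_frequency(dosages) > maf_min
    }
    untagged = set(eligible)
    tags = []
    while untagged:
        # Choose the candidate whose bin (SNPs at r^2 >= r2_min) is largest.
        best, best_bin = None, set()
        for cand in sorted(untagged):
            bin_members = {
                s for s in untagged
                if s == cand or r_squared(eligible[cand], eligible[s]) >= r2_min
            }
            if len(bin_members) > len(best_bin):
                best, best_bin = cand, bin_members
        tags.append(best)
        untagged -= best_bin            # every SNP in the bin is now tagged
    return tags
```

Sorting the candidates makes ties resolve deterministically. Other tagging tools may instead weigh coverage differently or force particular SNPs into the tag set.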

In silico prediction programs

Two in silico programs were used. FASTSNP is a web-based tool that assesses the phenotypic effects of SNPs using external web servers and a prediction algorithm. It ranks risk from 0 (no known effect) to 5 (very high risk) according to the location of the SNP (eg, 5′ upstream, 3′ untranslated region, intronic) and its possible functional effects, such as amino acid changes, alterations in splicing sites, and "premature translation termination."Citation6 F-SNP also draws on bioinformatic tools and websites to predict the functional effects of SNPs. Its process has several steps, and the outcome of each step determines the next. For instance, if Ensembl places a mutation in the coding region, the information is then submitted to an outside bioinformatics website, such as PolyPhen, to test for a functional effect.Citation4
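The idea of a staged pipeline, in which the result of one step decides which step runs next, can be shown with a toy router. This is a HYPOTHETICAL illustration and not F-SNP's actual decision rules. The region labels and all follow-up steps are assumptions, except the routing of an amino acid-changing coding SNP to a protein-function predictor such as PolyPhen, which comes from the description above.

```python
# HYPOTHETICAL toy model of a staged annotation pipeline. Region labels and
# follow-up steps are illustrative assumptions, not F-SNP's actual rules;
# only "coding region -> PolyPhen" comes from the description in the text.

def plan_checks(region, amino_acid_change=False):
    """Return the ordered checks a staged pipeline might run for one SNP.

    Step 1 locates the SNP; the region found there decides step 2.
    """
    steps = [f"locate SNP (region: {region})"]
    if region == "coding" and amino_acid_change:
        steps.append("protein-function prediction (e.g., PolyPhen)")  # from text
    elif region in ("coding", "intronic"):
        steps.append("splicing-regulation check")                # assumption
    elif region == "5' upstream":
        steps.append("transcription-factor-binding check")       # assumption
    elif region == "3' UTR":
        steps.append("post-transcriptional-regulation check")    # assumption
    else:
        steps.append("no follow-up rule in this sketch")
    return steps
```

A real tool would query external databases at each step. The point here is only the control flow, in which each step's output selects the next.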

Statistical analysis

TagSNPs identified for 34 genes (CYP19A1, IFNG, IFNGR1, IFNGR2, IKBKB, IL10, IL15, IL17A, IL1A, IL1B, IL1RN, IL2, IL23R, IL2RA, IL4, IL6, IL6R, IL8, LEPR, MTOR, NFKB1, PDGFB, PDK1, PIK3CA, PRKAG2, PTEN, RPS6KB1, RPS6KB2, STAT3, STAT5B, TGFB1, TNF, TSC2, VEGFA) were entered into the FASTSNP website, and their predicted risk values were recorded. Six genes (IFNGR1, IFNGR2, IL1B, LEPR, RPS6KB1, and TSC2) were identified as having SNPs with a predicted score of either 2–3 or 3–4 (low to medium or medium to high risk of effect, respectively). From these six genes, tagSNPs with a score of 0–0 (no or unknown risk, n = 16) or a score of 2–3 or 3–4 (low to high risk, n = 8) were chosen for comparison with phenotype data. F-SNP results were based on transcriptional regulation and were marked either "changed"/"not changed" or "exist"/"not exist." F-SNP also assigns a functional significance score, and a score ≥0.5 is considered likely to lead to functional changes.Citation12 The tagSNPs chosen on the basis of FASTSNP predictions were also entered into F-SNP, and the F-SNP results were compared both with the phenotype data and with the FASTSNP predictions to assess agreement between the two programs.
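The selection and scoring rules in this paragraph can be expressed as a short filter. This is a Python sketch under stated assumptions. FASTSNP ranges are written as ASCII strings such as "2-3". The input rows, gene names, SNP IDs, and the out-of-scope range "1-2" used in the check are hypothetical. Only the category boundaries (0–0 versus 2–3 or 3–4) and the F-SNP cutoff of 0.5 come from the text.

```python
NO_EFFECT = {"0-0"}               # no or unknown risk
LOW_TO_HIGH = {"2-3", "3-4"}      # low-to-medium or medium-to-high risk
FSNP_CUTOFF = 0.5                 # F-SNP score >= 0.5: likely functional


def fastsnp_group(score_range):
    """Map a FASTSNP risk range to the study's two comparison groups."""
    if score_range in NO_EFFECT:
        return "no effect"
    if score_range in LOW_TO_HIGH:
        return "low to high"
    return None                   # other ranges were not carried forward


def select_for_comparison(rows):
    """rows: (gene, rsid, fastsnp_range, fsnp_score) tuples.

    Keep genes with at least one 2-3 or 3-4 SNP, then keep that gene's
    0-0 and 2-3/3-4 SNPs, flagging whether F-SNP predicts function.
    """
    genes_with_risk = {gene for gene, _, rng, _ in rows if rng in LOW_TO_HIGH}
    selected = []
    for gene, rsid, rng, fsnp in rows:
        group = fastsnp_group(rng)
        if gene in genes_with_risk and group is not None:
            selected.append((rsid, group, fsnp >= FSNP_CUTOFF))
    return selected
```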

Statistical analyses were performed using SAS version 9.3 (SAS Institute, Cary, NC, USA). Expression of each candidate gene was normalized to that of the housekeeping gene to obtain the change in Ct (ΔCt). Relative expression was calculated as 2^ΔCt, and the median of these values was compared across genotypes. A codominant model was assumed initially; if a dominant or recessive model fitted the data better, that model was evaluated and is presented. P-values comparing median expression levels across genotypes are based on Wilcoxon rank-sum and Kruskal–Wallis rank-sum tests. Statistical significance was set at P < 0.05. SNP associations were also evaluated separately among Caucasians and African Americans; for the three leptin receptor (LEPR) SNPs reported as significant (rs8179183, rs9436301, rs4655537), the direction of association was the same in both groups. Race was not associated with gene expression, and expression did not differ significantly by age or gender.
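The genotype comparison described above can be sketched without SAS. The Python below is an illustration, not the authors' SAS code. It groups relative expression (2^ΔCt) values under a codominant, dominant, or recessive model and computes a tie-corrected Kruskal–Wallis statistic with its large-sample chi-square P-value. With two groups, this is the chi-square form of the two-sided Wilcoxon rank-sum test without continuity correction, so small-sample P-values may differ slightly from exact or continuity-corrected versions of the test. The genotype labels and function names are hypothetical.

```python
import math
from statistics import median


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Tie-corrected H and its chi-square P-value for 2 or 3 non-empty groups."""
    groups = [g for g in groups if g]          # drop genotypes with no samples
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        h += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    ties = {}
    for x in pooled:
        ties[x] = ties.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in ties.values()) / (n ** 3 - n)
    if correction == 0:
        return 0.0, 1.0                        # every value tied
    h = max(h / correction, 0.0)               # guard against -0.0 round-off
    df = len(groups) - 1
    if df == 1:
        return h, math.erfc(math.sqrt(h / 2))  # chi-square upper tail, 1 df
    if df == 2:
        return h, math.exp(-h / 2)             # chi-square upper tail, 2 df
    raise ValueError("this sketch handles 2 or 3 genotype groups only")


def group_by_model(samples, common, het, variant, model="codominant"):
    """samples: (genotype, relative_expression) pairs -> list of groups."""
    layouts = {
        "codominant": [{common}, {het}, {variant}],
        "dominant": [{common}, {het, variant}],    # e.g. GG vs GA/AA
        "recessive": [{common, het}, {variant}],
    }
    return [[e for g, e in samples if g in gset] for gset in layouts[model]]


def medians(groups):
    """Median relative expression per genotype group."""
    return [median(g) for g in groups]
```

Only the 1- and 2-degree-of-freedom chi-square tails are needed for these genetic models, and the standard library's `math.erfc` and `math.exp` give them in closed form.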

Results

Predicted and actual effects in normal colon samples

The FASTSNP and F-SNP predictions and the gene expression association P-values for the 24 TagSNPs are presented in Table 1. Of the 16 SNPs predicted to have no/unknown (0–0) effect, two (LEPR rs4655537 and rs9436301) were significantly associated with gene expression (Table 2). The common homozygous LEPR rs4655537 genotype (GG) was associated with a 1.7-fold increase in LEPR expression (P = 0.01) compared with the heterozygous or homozygous variant (GA/AA) genotypes. The CC variant LEPR rs9436301 genotype was associated with a 1.52-fold increase in gene expression (P = 0.04) compared with the CT/TT genotypes.
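The fold changes reported here compare genotype groups. The text does not state the calculation explicitly, so this sketch assumes that each fold change is the ratio of the group medians of 2^ΔCt. It also shows the convention of reporting ratios below 1 as an "n-fold decrease." The function names and the input values in the check are hypothetical.

```python
from statistics import median


def fold_change(reference_group, comparison_group):
    """Median relative expression of the reference genotype group divided by
    that of the comparison group (ASSUMPTION: a ratio of medians)."""
    return median(reference_group) / median(comparison_group)


def describe(fc):
    """Report ratios >= 1 as increases and ratios < 1 as decreases."""
    if fc >= 1:
        return f"{fc:.2f}-fold increase"
    return f"{1 / fc:.2f}-fold decrease"
```

Under this convention, a 1.6-fold decrease corresponds to a ratio of 1/1.6 = 0.625.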

Table 1 Prediction scores and association with gene expression

Table 2 SNPs with significant association with gene expression

Of the eight tagSNPs that FASTSNP predicted to have a low to high effect (2–3 or 3–4), only LEPR rs8179183 was significantly associated with gene expression. Its common homozygous genotype (GG) was associated with a 1.6-fold decrease in expression (P = 0.048) compared with the heterozygous and homozygous variant genotypes (GC/CC).

FASTSNP and F-SNP scores were similar, although not entirely consistent (Table 1). All TagSNPs that FASTSNP predicted to have no (0–0) effect received an F-SNP score below 0.5, the threshold at which a SNP is considered likely to lead to functional changes. Of the eight SNPs that FASTSNP predicted to have a low to medium (2–3) or medium to high (3–4) effect, five received a functional significance score ≥0.5. The other three had scores ranging from 0.176 to 0.330, so for these SNPs the F-SNP prediction agreed better with the genotype/phenotype results. Four of the five tagSNPs with a functional significance score ≥0.5 scored near the threshold (0.5–0.633), whereas one (RPS6KB1 rs180523) had a score of 1. RPS6KB1 rs180523 also had a FASTSNP score of 3–4, yet its expression did not differ significantly across genotypes (P = 0.63).
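The agreement between the two programs described in this paragraph can be summarized with percent agreement and Cohen's kappa. The Python below is illustrative arithmetic on the counts reported above, not an analysis performed by the authors. Treating FASTSNP scores of 2–3 or 3–4 and F-SNP scores ≥0.5 as "predicted functional" is an assumption made for the example, and the function name is hypothetical.

```python
from collections import Counter


def agreement(calls_a, calls_b):
    """Percent agreement and Cohen's kappa for two lists of boolean calls."""
    n = len(calls_a)
    table = Counter(zip(calls_a, calls_b))
    observed = (table[(True, True)] + table[(False, False)]) / n
    p_a = sum(calls_a) / n            # share flagged by program A
    p_b = sum(calls_b) / n            # share flagged by program B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)   # agreement by chance
    if expected == 1:
        return observed, float("nan")              # kappa undefined
    return observed, (observed - expected) / (1 - expected)


# Counts from the text: 8 SNPs scored 2-3/3-4 by FASTSNP, of which 5 had an
# F-SNP score >= 0.5; all 16 SNPs scored 0-0 had an F-SNP score < 0.5.
fastsnp = [True] * 8 + [False] * 16
fsnp = [True] * 5 + [False] * 3 + [False] * 16
observed, kappa = agreement(fastsnp, fsnp)   # observed = 0.875, kappa ~ 0.69
```

With only 24 SNPs, either summary is imprecise, and neither says anything about agreement with the expression data.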

Discussion

Differentiating between SNPs that may be deleterious and those that are "benign" is critical to risk assessment and to the design of cancer prevention strategies.Citation5 Because the human genome harbors potentially millions of SNPs, characterizing individual SNPs in the laboratory is a daunting task. For this reason, in silico programs have emerged to help identify functional SNPs. These programs use readily available scientific data and bioinformatic tools to predict the functional effects of SNPs. This study determined genotype–phenotype relationships empirically and found that a prediction of zero risk by an in silico program does not guarantee that a SNP lacks an effect in human colon tissue.

To explore this in relation to gene expression, 82 colon samples were genotyped for the 24 TagSNPs that FASTSNP predicted to have either no effect (0–0) or a low to medium or medium to high effect (2–3 or 3–4, respectively), and expression of the corresponding genes was measured. Two of the 16 SNPs predicted to have no effect were significantly associated with gene expression, whereas only one of the eight SNPs with a predicted low to high effect showed such an association.

Prediction programs do not always generate similar results. Each program draws on a different (although partly overlapping) set of databases and external websites, and their distinct algorithms are likely to produce divergent predictions. For this reason, FASTSNP results were compared with those of F-SNP. F-SNP combines its accumulated results into a single "functional significance score"; a score ≥0.5 is considered likely to lead to functional changes because 0.5 is the median score for known disease-related SNPs.Citation4 In these data, FASTSNP and F-SNP predictions agreed for the SNPs predicted to have no known effect, but not for all of the SNPs predicted to have a low to high effect.

The lack of correlation may be due in part to the small sample size. In addition, SNP functionality is not limited to effects on RNA expression, and prediction programs are designed to capture other dimensions of functionality, such as amino acid changes and alterations in splicing sites. This may explain why some high-priority SNPs showed no change in mRNA expression. Further functional experiments would be needed to explore other mechanisms of action, such as post-translational modification, protein expression, and protein function, particularly for the leptin receptor protein. Organ-specific differences in gene expression may also have influenced the results shown here, which further underscores the need for laboratory functional studies and for case-by-case examination of low-priority SNPs. Finally, the SNPs chosen for analysis may not be functional themselves but may instead be in tight linkage with the causative SNP; this is another reason that biochemical studies are needed to define the mechanistic basis of the observed associations.

A few studies have compared FASTSNP predictions with other evidence of SNP function, but these focus only on high-priority SNPs. For example, a study in the Chinese Han population found that two cystathionine gamma-lyase SNPs (rs482843 and rs1021737), identified by FASTSNP as high-priority SNPs, made no significant contribution to the risk of essential hypertension in that population.Citation13 By contrast, one in silico study modeled the p16INK4A protein (encoded by CDKN2A) carrying SNPs identified as high priority by FASTSNP and other in silico programs, and found that CDKN2A rs11552822 may decrease its binding affinity for CDK6 and may be involved in the development of malignant melanoma.Citation14

In silico programs have been shown to predict functional effects accurately for SNPs that rank very high on their prediction lists, and these higher-risk SNPs may reasonably be prioritized in laboratory-based research. However, such SNPs are unlikely to act alone in the progression of complex disease.Citation15 SNPs ranked as "no risk" by in silico programs may therefore still affect gene expression, which may in turn affect protein abundance and subsequent protein function. For example, the no to low priority GH1 SNP rs2665802 has been associated with decreases in both human growth hormone gene expression and growth hormone secretion. Although the authors noted that this SNP may act in conjunction with other SNPs that were not studied, its contribution was found to be direct.Citation10

Even low to medium effects on enzymatic activity may play an important role in the development of disease. Therefore, functional analyses of these low risk SNPs are necessary to capture fully the genotypic contributions to phenotype. This information is critical in determining the biological basis of variability, and can potentially aid in the design of rational intervention/prevention strategies.

Acknowledgments

This work was supported by R01 CA48998 (MLS).

Disclosure

The authors report no conflicts of interest in this work.

References

  • Andersen MC, Engstrom PG, Lithwick S. In silico detection of sequence variations modifying transcriptional regulation. PLoS Comput Biol. 2008;4(1):e5.
  • Lee JE. High-throughput genotyping. Forum Nutr. 2007;60:97–101. PMID: 17684405.
  • Brunham LR, Singaraja RR, Pape TD, Kejariwal A, Thomas PD, Hayden MR. Accurate prediction of the functional significance of single nucleotide polymorphisms and mutations in the ABCA1 gene. PLoS Genet. 2005;1(6):e83. PMID: 16429166.
  • Lee PH, Shatkay H. F-SNP: computationally predicted functional SNPs for disease association studies. Nucleic Acids Res. 2008;36:D820–D824. PMID: 17986460.
  • Wang LL, Li Y, Zhou SF. A bioinformatics approach for the phenotype prediction of nonsynonymous single nucleotide polymorphisms in human cytochromes P450. Drug Metab Dispos. 2009;37(5):977–991. PMID: 19204079.
  • Yuan HY, Chiou JJ, Tseng WH. FASTSNP: an always up-to-date and extendable service for SNP function analysis and prioritization. Nucleic Acids Res. 2006;34:W635–W641. PMID: 16845089.
  • George Priya Doss C, Rajasekaran R, Arjun P, Sethumadhavan R. Prioritization of candidate SNPs in colon cancer using bioinformatics tools: an alternative approach for a cancer biologist. Interdiscip Sci. 2010;2(4):320–346. PMID: 21153778.
  • Han YJ, Ma SF, Wade MS, Flores C, Garcia JG. An intronic MYLK variant associated with inflammatory lung disease regulates promoter activity of the smooth muscle myosin light chain kinase isoform. J Mol Med (Berl). 2012;90(3):299–308. PMID: 22015949.
  • Ciampa J, Yeager M, Amundadottir L. Large-scale exploration of gene-gene interactions in prostate cancer using a multistage genome-wide association study. Cancer Res. 2011;71(9):3287–3295. PMID: 21372204.
  • Millar DS, Horan M, Chuzhanova NA, Cooper DN. Characterisation of a functional intronic polymorphism in the human growth hormone (GH1) gene. Hum Genomics. 2010;4(5):289–301. PMID: 20650818.
  • Rubie C, Kempf K, Hans J. Housekeeping gene variability in normal and cancerous colorectal, pancreatic, esophageal, gastric and hepatic tissues. Mol Cell Probes. 2005;19(2):101–109. PMID: 15680211.
  • Lee PH, Shatkay H. An integrative scoring system for ranking SNPs by their potential deleterious effects. Bioinformatics. 2009;25(8):1048–1055. PMID: 19228803.
  • Li Y, Zhao Q, Liu XL. Relationship between cystathionine gamma-lyase gene polymorphism and essential hypertension in Northern Chinese Han population. Chin Med J (Engl). 2008;121(8):716–720. PMID: 18701025.
  • Rajasekaran R, Priya Doss CG, Sudandiradoss C, Ramanathan K, Sethumadhavan R. In silico analysis of structural and functional consequences in p16INK4A by deleterious nsSNPs associated CDKN2A gene in malignant melanoma. Biochimie. 2008;90(10):1523–1529. PMID: 18573309.
  • Prokunina L, Alarcon-Riquelme ME. Regulatory SNPs in complex diseases: their identification and functional validation. Expert Rev Mol Med. 2004;6(10):1–15. PMID: 15122975.