Applications and Case Studies

Fully Bayesian Analysis of RNA-seq Counts for the Detection of Gene Expression Heterosis

Pages 610-621 | Received 01 May 2017, Published online: 13 Nov 2018
 

ABSTRACT

Heterosis, or hybrid vigor, is the enhancement of the phenotype of hybrid progeny relative to their inbred parents. Heterosis is extensively used in agriculture, yet its underlying mechanisms remain unclear. To investigate the molecular basis of phenotypic heterosis, researchers search tens of thousands of genes for heterosis with respect to expression in the transcriptome. Assessing heterosis is difficult because the null hypotheses are composite and p-values are nonuniformly distributed under them. We therefore develop a general hierarchical model for count data and a fully Bayesian analysis in which an efficient parallelized Markov chain Monte Carlo algorithm ameliorates the computational burden. We use our method to detect gene expression heterosis in a two-hybrid plant-breeding scenario, both in a real RNA-seq maize dataset and in simulation studies. In the simulation studies, we show that our method has well-calibrated posterior probabilities and credible intervals when the model assumed in the analysis matches the model used to simulate the data. Although model misspecification can adversely affect calibration, the methodology is still able to rank genes accurately. Finally, we show that hyperparameter posteriors are extremely narrow and that an empirical Bayes (eBayes) approach based on posterior means from the fully Bayesian analysis provides virtually equivalent posterior probabilities, credible intervals, and gene rankings relative to the fully Bayesian solution. This evidence of equivalence supports the use of eBayes procedures in RNA-seq data analysis if accurate hyperparameter estimates can be obtained. Supplementary materials for this article are available online.
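
As an illustration of why a posterior probability sidesteps the composite null hypothesis, the following minimal Python sketch estimates the probability of high-parent heterosis for one gene as the fraction of posterior draws in which the hybrid mean exceeds both parental means. It is not the fbseq implementation or the article's hierarchical model: the function name `heterosis_probability`, the variable names, and the synthetic normal draws standing in for MCMC output are all assumptions made for illustration.

```python
import random


def heterosis_probability(parent1, parent2, hybrid):
    """Estimate P(hybrid mean > max of parental means | data).

    Each argument is a sequence of posterior draws (e.g., from MCMC) of one
    gene's mean expression on a common scale; draws are paired by iteration.
    This is a hypothetical helper, not part of any published package.
    """
    if not hybrid or not (len(parent1) == len(parent2) == len(hybrid)):
        raise ValueError("need equal-length, non-empty sequences of draws")
    # Count the draws in which the hybrid exceeds BOTH parents.
    hits = sum(h > max(a, b) for a, b, h in zip(parent1, parent2, hybrid))
    return hits / len(hybrid)


# Synthetic stand-ins for posterior draws (assumed values, not real data).
rng = random.Random(0)
n_draws = 4000
parent1 = [rng.gauss(2.0, 0.1) for _ in range(n_draws)]
parent2 = [rng.gauss(2.2, 0.1) for _ in range(n_draws)]
hybrid = [rng.gauss(2.5, 0.1) for _ in range(n_draws)]

prob = heterosis_probability(parent1, parent2, hybrid)
print(f"Estimated posterior probability of high-parent heterosis: {prob:.3f}")
```

Because the event "hybrid exceeds both parents" is evaluated draw by draw, no reference distribution for a test statistic under the composite null is needed; genes can then be ranked by the resulting probabilities.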

Supplementary Materials

The supplementary figures (Figure S1, Figure S2, etc.) are in supplement.pdf. The file TabelS1.csv contains the Paschold et al. (2012) data, as well as fully Bayesian posterior estimates of the gene-specific heterosis probabilities, gene-specific parameter means and standard deviations, estimated effect sizes, and gene-specific parameter estimates from the edgeR method by McCarthy et al. (2012) from Section 4.1. The packages directory contains four R packages: fbseq, which is the user interface for the computation; the back-ends fbseqCUDA and fbseqOpenMP, which are suitable for use on computers with and without a CUDA-capable GPU, respectively; and fbseqStudies, which reproduces the analyses in this article.

Funding

This research was supported by the National Institute of General Medical Sciences (NIGMS) of the National Institutes of Health and the joint National Science Foundation/NIGMS Mathematical Biology Program under award number R01GM109458. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or the National Science Foundation.
