Abstract
We propose small-variance asymptotic approximations for inference on tumor heterogeneity (TH) using next-generation sequencing data. Understanding TH is an important and open research problem in biology. The lack of appropriate statistical inference is a critical gap in existing methods that the proposed approach aims to fill. We build on a hierarchical model with an exponential family likelihood and a feature allocation prior. The proposed implementation of posterior inference generalizes similar small-variance approximations proposed by Kulis and Jordan, and by Broderick, Kulis, and Jordan, for inference with Dirichlet process mixture and Indian buffet process prior models under normal sampling. We show that the new algorithm can successfully recover latent structures of different haplotypes and subclones and is orders of magnitude faster than available Markov chain Monte Carlo samplers. The latter are practically infeasible for high-dimensional genomics data. The proposed approach is scalable, easy to implement, and benefits from the flexibility of Bayesian nonparametric models. More importantly, it provides a useful tool for applied scientists to estimate cell subtypes in tumor samples. R code is available at http://www.ma.utexas.edu/users/yxu/. Supplementary materials for this article are available online.
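For readers unfamiliar with small-variance asymptotics, the following is a minimal Python sketch of the DP-means algorithm of Kulis and Jordan. This is the Dirichlet process mixture, normal-sampling case that the proposed method generalizes. It is not the authors' algorithm, which uses an exponential family likelihood and a feature allocation prior. The function name `dp_means` and the penalty parameter `lam` are illustrative assumptions. As the component variance goes to zero, Gibbs sampling for a DP Gaussian mixture reduces to a k-means-like procedure. That procedure opens a new cluster whenever a point's squared distance to every existing center exceeds `lam`.

```python
import numpy as np


def dp_means(X, lam, max_iter=100):
    """Sketch of DP-means (Kulis and Jordan): the small-variance limit of
    Gibbs sampling for a Dirichlet process mixture of Gaussians.

    X   : (n, d) array-like of observations
    lam : penalty for opening a new cluster, compared with squared distance
    Returns (labels, centers).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    centers = [X.mean(axis=0)]  # start with a single cluster at the global mean
    labels = np.zeros(n, dtype=int)
    for _ in range(max_iter):
        changed = False
        for i in range(n):
            # squared Euclidean distance from x_i to every current center
            d2 = np.sum((np.asarray(centers) - X[i]) ** 2, axis=1)
            k = int(np.argmin(d2))
            if d2[k] > lam:
                # no center is close enough: open a new cluster at x_i
                centers.append(X[i].copy())
                k = len(centers) - 1
            if labels[i] != k:
                labels[i] = k
                changed = True
        # drop empty clusters, relabel as 0..K-1, recompute centers as means
        used = np.unique(labels)
        labels = np.searchsorted(used, labels)
        centers = [X[labels == k].mean(axis=0) for k in range(len(used))]
        if not changed:
            break
    return labels, np.array(centers)


X = [[0, 0], [0.1, 0], [0, 0.1], [10, 10], [10.1, 10], [10, 10.1]]
labels, centers = dp_means(X, lam=1.0)
print(len(centers))  # 2
```

Larger values of `lam` penalize new clusters more heavily and yield fewer clusters. Kulis and Jordan show that this procedure monotonically decreases the k-means objective plus `lam` times the number of clusters.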
Additional information
Notes on contributors
Yanxun Xu
Yanxun Xu, Division of Statistics and Scientific Computing, The University of Texas at Austin, 1 University Station G2500, Austin, TX 78712.
Peter Müller
Peter Müller, Department of Mathematics, University of Texas at Austin, 1 University Station, C1200, Austin, TX 78712 USA.
Yuan Yuan
Yuan Yuan, SCBMB, Baylor College of Medicine, Houston, TX 77030.
Kamalakar Gulukota
Kamalakar Gulukota, Center for Biomedical Informatics, NorthShore University HealthSystem, Evanston, IL 60201.
Yuan Ji
Yuan Ji, Center for Biomedical Informatics, NorthShore University HealthSystem, Evanston, IL, and Department of Health Studies, The University of Chicago, Chicago, IL 60637. The research of Yanxun Xu, Peter Müller, and Yuan Ji is partially supported by NIH R01 CA132897.