Visualization

Uncertainty in Phylogenetic Tree Estimates

Pages 542-552 | Received 01 Dec 2016, Published online: 06 Jun 2018
 

ABSTRACT

Estimating phylogenetic trees is an important problem in evolutionary biology, environmental policy, and medicine. Although trees are estimated, their uncertainties are generally discarded in statistical models for tree-valued data. Here, we explicitly model the multivariate uncertainty of tree estimates. We consider both the cases where uncertainty information arises extrinsically (through covariate information) and intrinsically (through the tree estimates themselves). The latter case is applicable to any procedure for tree estimation, and thus has broad relevance to the entire field of phylogenetics. The importance of accounting for tree uncertainty in tree space is demonstrated in two case studies. In the first instance, differences between gene trees are small relative to their uncertainties, while in the second, the differences are relatively large. Our main goal is visualization of tree uncertainty, and we demonstrate advantages of our method with respect to reproducibility, speed, and preservation of topological differences compared to visualization based on multidimensional scaling. The proposal highlights that phylogenetic trees are estimated in an extremely high-dimensional space, resulting in uncertainty information that cannot be discarded. Most importantly, it is a method that allows biologists to diagnose whether differences between gene trees are biologically meaningful or due to uncertainty in estimation.
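The abstract's central diagnostic asks whether the difference between two gene trees exceeds the uncertainty in their estimation. The minimal sketch below illustrates that question under stated assumptions, and it is not the authors' procedure. It assumes, hypothetically, that each tree is encoded as a vector of split (edge) lengths over a shared list of splits, with 0 for absent splits. It also assumes that bootstrap replicates of one gene tree are available, and it uses an empirical Mahalanobis cutoff. The function names `bootstrap_summary`, `mahalanobis_sq` and `within_uncertainty` are illustrative only.

```python
import numpy as np


def bootstrap_summary(replicates):
    """Mean vector and covariance matrix of bootstrap tree vectors.

    replicates: (B, d) array; row b holds the split lengths of the b-th
    bootstrap tree over a fixed, shared list of d splits (a hypothetical
    vectorization, with 0 for splits absent from that tree).
    """
    X = np.asarray(replicates, dtype=float)
    return X.mean(axis=0), np.atleast_2d(np.cov(X, rowvar=False))


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of x from mean under cov.

    The pseudo-inverse is used because, with few replicates in many split
    dimensions, the sample covariance is often singular.
    """
    diff = np.asarray(x, dtype=float) - mean
    return float(diff @ np.linalg.pinv(cov) @ diff)


def within_uncertainty(other_tree, replicates, level=0.95):
    """True if other_tree lies inside the empirical `level` region of the replicates."""
    X = np.asarray(replicates, dtype=float)
    mean, cov = bootstrap_summary(X)
    # Empirical cutoff: the `level` quantile of the replicates' own distances,
    # so no chi-square approximation is assumed.
    reference = [mahalanobis_sq(row, mean, cov) for row in X]
    cutoff = np.quantile(reference, level)
    return mahalanobis_sq(other_tree, mean, cov) <= cutoff
```

Because the pseudo-inverse ignores directions in which the replicates never vary, a difference confined to such directions would go undetected. For that reason this serves only as a rough exploratory check, in the spirit of the exploratory focus the abstract describes.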

Acknowledgments

The authors are very grateful to an anonymous referee and two editors, whose helpful and constructive suggestions substantially improved both the text and figures of the manuscript. The authors are also grateful to the R Core Team (2017) and authors of the packages phangorn (Schliep 2011), ape (Paradis, Claude, and Strimmer 2004), distory (Chakerian and Holmes 2017), treespace (Jombart et al. 2017), MASS (Venables and Ripley 2002), scatterplot3d (Ligges and Mächler 2003), XML (Lang and the CRAN Team 2017), gridExtra (Auguie 2017), lattice (Sarkar 2008), and ggplot2 (Wickham 2009), which were used for constructing the figures and running the analyses in this article.

Notes

1 https://github.com/adw96/TreeUncertainty.

2 Note that constructing the sets and then projecting them is computationally wasteful. For this reason, we determine only the algebraic form of the sets in $\mathbb{R}^{40}$, and derive the form of their projections into $\mathbb{R}^{2}$ algebraically before constructing them. Details are given in the supplementary materials.
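The shortcut in note 2 can be made concrete under an assumption that the note does not state. Suppose the sets in $\mathbb{R}^{40}$ are ellipsoids of the form $\{x : (x-\mu)^\top \Sigma^{-1} (x-\mu) \le c\}$. Then the image under any linear map $P$ from $\mathbb{R}^{40}$ to $\mathbb{R}^{2}$ with full row rank is the ellipse $\{y : (y-P\mu)^\top (P\Sigma P^\top)^{-1} (y-P\mu) \le c\}$, with the same cutoff $c$. This holds because both sets have the support function $h(v) = v^\top P\mu + \sqrt{c\, v^\top P\Sigma P^\top v}$. The projected set can therefore be written down without constructing the 40-dimensional one. The minimal NumPy sketch below computes the projected centre and shape matrix and traces the ellipse boundary. The function names, the use of principal axes for $P$, and the numbers in the usage lines are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def project_ellipsoid(mean, cov, P):
    """Centre and 2x2 shape matrix of the image of the ellipsoid
    {x : (x - mean)^T cov^{-1} (x - mean) <= c} under the linear map P (2 x d).

    The image is {y : (y - P mean)^T (P cov P^T)^{-1} (y - P mean) <= c},
    with the same cutoff c, so only these two small quantities are needed.
    """
    P = np.asarray(P, dtype=float)
    center = P @ np.asarray(mean, dtype=float)
    shape = P @ np.asarray(cov, dtype=float) @ P.T
    return center, shape


def ellipse_boundary(center, shape, c, n=200):
    """n points on the boundary {y : (y - center)^T shape^{-1} (y - center) = c}."""
    L = np.linalg.cholesky(shape)               # shape = L @ L.T
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    circle = np.vstack([np.cos(t), np.sin(t)])  # 2 x n points on the unit circle
    # y = center + sqrt(c) * L u maps the unit circle onto the ellipse boundary.
    return (np.asarray(center)[:, None] + np.sqrt(c) * (L @ circle)).T


# Hypothetical usage: a 40-dimensional region projected onto its two
# leading principal axes.
rng = np.random.default_rng(0)
A = rng.normal(size=(40, 40))
cov = A @ A.T / 40 + np.eye(40)
mean = np.zeros(40)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
P = eigvecs[:, ::-1][:, :2].T           # rows: the two leading eigenvectors
center, shape = project_ellipsoid(mean, cov, P)
outline = ellipse_boundary(center, shape, c=55.76)  # approx. 95% quantile of chi-square(40)
```

The 40-by-40 matrix enters only through one matrix product. The plotted outline therefore costs essentially nothing beyond a 2-by-2 Cholesky factorization, which is consistent with the note's point that constructing the full sets first is wasteful.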

3 Substituting posterior standard deviations for frequentist standard errors overstates the precision in the estimates when priors are chosen to be uninformative (Efron, Halloran, and Holmes 1996; Efron 2015), as is often the case in phylogenetic inference. The recent proposal of Efron (2015) could be applied to tree space to correct for this. However, because we are focusing only on an exploratory method, we defer the generalization of Efron's proposal to tree space to a separate investigation.
