Visualization

Uncertainty in Phylogenetic Tree Estimates

Pages 542-552 | Received 01 Dec 2016, Published online: 06 Jun 2018
 

ABSTRACT

Estimating phylogenetic trees is an important problem in evolutionary biology, environmental policy, and medicine. Although trees are estimated, their uncertainties are generally discarded in statistical models for tree-valued data. Here, we explicitly model the multivariate uncertainty of tree estimates. We consider both the cases where uncertainty information arises extrinsically (through covariate information) and intrinsically (through the tree estimates themselves). The latter case is applicable to any procedure for tree estimation, and thus has broad relevance to the entire field of phylogenetics. The importance of accounting for tree uncertainty in tree space is demonstrated in two case studies. In the first instance, differences between gene trees are small relative to their uncertainties, while in the second, the differences are relatively large. Our main goal is visualization of tree uncertainty, and we demonstrate advantages of our method with respect to reproducibility, speed, and preservation of topological differences compared to visualization based on multidimensional scaling. The proposal highlights that phylogenetic trees are estimated in an extremely high-dimensional space, resulting in uncertainty information that cannot be discarded. Most importantly, it is a method that allows biologists to diagnose whether differences between gene trees are biologically meaningful or due to uncertainty in estimation.

Acknowledgments

The authors are very grateful to an anonymous referee and two editors, whose helpful and constructive suggestions substantially improved both the text and figures of the manuscript. The authors are also grateful to the R Core Team (2017) and authors of the packages phangorn (Schliep 2011), ape (Paradis, Claude, and Strimmer 2004), distory (Chakerian and Holmes 2017), treespace (Jombart et al. 2017), MASS (Venables and Ripley 2002), scatterplot3d (Ligges and Mächler 2003), XML (Lang and the CRAN Team 2017), gridExtra (Auguie 2017), lattice (Sarkar 2008), and ggplot2 (Wickham 2009), which were used for constructing the figures and running the analyses in this article.

Notes

1 https://github.com/adw96/TreeUncertainty.

2 Note that constructing the sets and then projecting them is computationally wasteful. For this reason we only determine the algebraic form of the ℝ⁴⁰ sets, and determine the form of the ℝ² sets algebraically before constructing them. Details are given in the supplementary materials.

3 Substituting posterior standard deviations for frequentist standard errors overstates the precision in the estimates when priors are chosen to be uninformative (Efron, Halloran, and Holmes 1996; Efron 2015), as is often the case in phylogenetic inference. The recent proposal of Efron (2015) could be applied to tree space to correct for this. However, because we are focusing only on an exploratory method, we defer the generalization of Efron’s proposal to tree space to a separate investigation.
