ABSTRACT
Inferring evolutionary histories (phylogenetic trees) has important applications in biology, criminology, and public health. However, phylogenetic trees are complex mathematical objects that reside in a non-Euclidean space, which complicates their analysis. While our mathematical, algorithmic, and probabilistic understanding of phylogenies in their metric space is mature, rigorous inferential infrastructure is as yet undeveloped. In this manuscript, we unify recent computational and probabilistic advances to construct tree-valued confidence sets. The procedure accounts for both the center and multiple directions of tree-valued variability. We draw on block replicates to improve testing, identifying the best-supported most recent ancestor of the Zika virus and formally testing the hypothesis that a Floridian dentist with AIDS infected two of his patients with HIV. The method illustrates connections between variability in Euclidean space and tree space, opening phylogenetic tree analysis to techniques available in the multivariate Euclidean setting. Supplementary materials for this article are available online.
Acknowledgments
The author is indebted to Professor Tom Nye of Newcastle University, who most kindly made available the code that underpins his PCA methods (Nye 2011, 2014), which assisted immensely in the implementation of the log map function. Many thanks to John Bunge for suggestions that significantly clarified the exposition; Phil Spinks for the turtle trees of Section 6.3; Megan Owen for many careful and constructive comments on an earlier draft; Giles Hooker and Marty Wells for funding that enabled completion of the project; and Sidney Resnick and Louis Billera for their support of the investigation and helpful suggestions at a formative stage. Two referees' thoughtful comments substantially improved every component of this article, and the time and care they spent reviewing are very much appreciated.
Notes
1 For generality we consider trees to be unrooted, though by designating a particular leaf as the root the restriction to rooted trees is trivial. Note that Barden, Le, and Owen (2016) considered rooted trees with m internal edges; this difference accounts for our use of , in contrast to their use of .
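The remark in Note 1 that restricting to rooted trees is trivial can be made concrete with a small sketch. Assuming fully resolved (binary) trees, an unrooted tree with n leaves has n - 3 internal edges; designating one leaf as the root yields a rooted tree on n - 1 leaves whose internal-edge count, (n - 1) - 2, is the same number, because rooting only reinterprets the chosen leaf's pendant edge as the root edge. The Python below is illustrative only: the function names (`caterpillar`, `root_at_leaf`, `rooted_summary`) and the letter n are hypothetical, not the symbols missing from the note, and it checks the count on caterpillar trees rather than on all binary topologies.

```python
from collections import deque


def caterpillar(n_leaves):
    """Hypothetical helper: an unrooted binary caterpillar tree on leaves 0..n_leaves-1.

    Internal nodes are labelled "v0", "v1", ...; returns an adjacency dict
    mapping each node to the set of its neighbours.
    """
    if n_leaves < 3:
        raise ValueError("need at least 3 leaves")
    adj = {}

    def link(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    k = n_leaves - 2                 # an unrooted binary tree has n - 2 internal nodes
    for i in range(k - 1):           # spine joining the internal nodes
        link(f"v{i}", f"v{i + 1}")
    link("v0", 0)                    # two leaves hang off the first spine node
    link("v0", 1)
    for i in range(1, k):            # one leaf on each remaining spine node
        link(f"v{i}", i + 1)
    link(f"v{k - 1}", n_leaves - 1)  # extra leaf so the last spine node has degree 3
    return adj


def is_leaf(adj, node):
    return len(adj[node]) == 1


def internal_edges_unrooted(adj):
    """Edges whose endpoints are both internal (non-leaf) nodes."""
    return {frozenset((a, b)) for a in adj for b in adj[a]
            if not is_leaf(adj, a) and not is_leaf(adj, b)}


def root_at_leaf(adj, root):
    """Orient every edge away from the designated leaf `root` (breadth-first)."""
    children = {node: [] for node in adj}
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                children[node].append(nb)
                queue.append(nb)
    return children


def rooted_summary(children, root):
    """Return (leaf count, internal-edge count) of the rooted tree.

    The edge leaving the root is the root edge, so it is excluded; an internal
    edge joins a non-root parent to a child that itself has children.
    """
    leaves = [v for v, ch in children.items() if not ch and v != root]
    internal = [(p, c) for p, ch in children.items() for c in ch
                if p != root and children[c]]
    return len(leaves), len(internal)


if __name__ == "__main__":
    for n in range(3, 8):
        adj = caterpillar(n)
        m, e = rooted_summary(root_at_leaf(adj, 0), 0)
        print(n, len(internal_edges_unrooted(adj)), m, e)
```

Each printed row lists the unrooted leaf count, its internal-edge count, and the leaf and internal-edge counts after rooting at leaf 0; the two edge counts agree, which is the sense in which the rooted and unrooted conventions differ only by relabelling.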