Zig-Zag Sampling for Discrete Structures and Nonreversible Phylogenetic MCMC

Abstract

We construct a zig-zag process targeting a posterior distribution defined on a hybrid state space consisting of both discrete and continuous variables. The construction does not require any assumptions on the structure among discrete variables. We demonstrate our method on two examples in genetics based on the Kingman coalescent, showing that the zig-zag process can lead to efficiency gains of up to several orders of magnitude over classical Metropolis–Hastings algorithms, and that it is well suited to parallel computation. Our construction resembles existing techniques for Hamiltonian Monte Carlo on a hybrid state space, which suffers from implementationally and analytically complex boundary crossings when applied to the coalescent. We demonstrate that the continuous-time zig-zag process avoids these complications. Supplementary materials for this article are available online.

1 Introduction

The zig-zag process is a nonreversible, piecewise deterministic Markov process introduced by Bierkens and Roberts (2017) and Bierkens, Fearnhead, and Roberts (2019b) for continuous-time MCMC. It has several advantages over reversible methods such as Metropolis–Hastings (Hastings 1970) and Gibbs sampling (Gelfand and Smith 1990): it avoids the diffusive backtracking that slows their mixing, and it is rejection-free, so no computation is wasted on rejected moves.

In brief, the generator of the zig-zag process ${(x_{t}, v_{t})}_{t \geq 0}$ targeting the probability density (with respect to the product of the Lebesgue and counting measures) $\tilde{π}(x, v) := π(x) / 2^{d}$ on a state space $X \times \{-1, 1\}^{d} \subseteq \mathbb{R}^{d} \times \{-1, 1\}^{d}$ is

$$L f(x, v) = \sum_{i = 1}^{d} v_{i} \partial_{i} f(x, v) + \sum_{i = 1}^{d} λ_{i}(x, v) (f(x, F_{i} v) - f(x, v)),$$

where $\partial_{i}$ is the derivative with respect to $x_{i}$, and $F_{i}$ flips the sign of $v_{i}$. The flip rates

$$λ_{i}(x, v) := {(- v_{i} \partial_{i} \log(\tilde{π}(x, v)))}^{+}, \tag{1}$$

with ${(x)}^{+} := \max\{x, 0\}$, ensure that ${(x_{t}, v_{t})}_{t \geq 0}$ leaves $\tilde{π}(x, v)$ invariant (Bierkens, Fearnhead, and Roberts 2019b, theorem 2.2). In words, the coordinates of $x_{t}$ move at constant velocities $v_{t}$ until a flip at coordinate $i$, when the corresponding velocity changes sign.
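As a concrete instance of (1), consider a standard Gaussian target, $π(x) \propto \exp(-|x|^{2}/2)$, for which $-v_{i} \partial_{i} \log \tilde{π}(x, v) = v_{i} x_{i}$. The sketch below is illustrative only: the function names are hypothetical and the Gaussian target is an assumption chosen because the integrated rate along the path $x + s v$ can be inverted in closed form.

```python
import math

def flip_rates(x, v):
    """Flip rates (1) for a standard Gaussian target: lambda_i = (v_i * x_i)^+."""
    return [max(vi * xi, 0.0) for xi, vi in zip(x, v)]

def flip_time(xi, vi, u):
    """Solve int_0^rho (vi * (xi + s * vi))^+ ds = u for rho, with vi in {-1, 1}."""
    a = vi * xi  # rate at s = 0; along the path the rate is (a + s)^+
    if a >= 0.0:
        # a * rho + rho^2 / 2 = u
        return -a + math.sqrt(a * a + 2.0 * u)
    # The rate is zero until s = -a, then grows linearly.
    return -a + math.sqrt(2.0 * u)
```

Drawing `u` from an Exp(1) distribution and calling `flip_time` gives the next flip time of coordinate $i$ by inversion; for targets without a closed-form inverse, Poisson thinning is the usual substitute.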

To date, zig-zag processes have been applied to targets such as the Curie–Weiss model (Bierkens and Roberts 2017) and logistic regression (Bierkens, Fearnhead, and Roberts 2019b), whose state spaces have simple geometric structures with natural notions of direction and velocity. Discrete variables (other than the velocities) have been restricted to cases with simple partial orders, such as model selection (Chevalier, Fearnhead, and Sutton 2020; Gagnon and Doucet 2021). We construct a zig-zag process on a general hybrid state space with both continuous and discrete coordinates, without imposing any structure on the discrete coordinates. This is done by introducing a separate space of continuous variables for each value of the discrete variable, introducing boundaries into the continuous spaces, and updating the discrete variable when boundaries are hit. The strategy takes advantage of the full generality of piecewise-deterministic Markov processes (Davis 1993, sec. 24), and resembles similar work for Hamiltonian Monte Carlo (HMC) (Dinh et al. 2017; Nishimura, Dunson, and Lu 2020). Our method can also be seen as a generalization of the zig-zag process on a restricted domain (Bierkens et al. 2018) to a union of many restricted sub-domains, with jumps between sub-domains at boundary hitting events. We illustrate our method with an application to the coalescent (Kingman 1982): a tree-valued target with continuous branch lengths, discrete tree topologies with no natural partial order, and no canonical geometric structure.

The coalescent examples illustrate the need for methods which are implementable on complex state spaces. They are also of interest because existing MCMC algorithms for coalescents tend to mix slowly. The key difficulty lies in designing Metropolis–Hastings proposal distributions which combine efficient exploration with a high acceptance rate (Mossel and Vigoda 2005; Höhna, Defoin-Platel, and Drummond 2008; Lakner et al. 2008). Workarounds consist of empirical searches for efficient proposals (Höhna and Drummond 2012; Aberer, Stamatakis, and Ronquist 2016) or Metropolis-coupled MCMC (Geyer 1992). The former does not scale to problems for which empirical optimization is infeasible. The latter helps mixing between modes, but does not overcome low acceptance rates or the backtracking behavior of reversible MCMC.

The zig-zag process has some similarities with HMC (Neal 2010), which augments the state space with momentum and uses Hamiltonian dynamics to propose large steps which are accepted with high probability, though they are not rejection-free. Like the zig-zag process, HMC requires gradient information and a suitable geometric embedding of the target. Dinh et al. (2017) provided those for the coalescent using an orthant complex construction of phylogenetic tree space (Billera, Holmes, and Vogtmann 2001). Our examples differ from the method of Dinh et al. (2017) in three ways. First, we replace the embedding of Billera, Holmes, and Vogtmann (2001) with τ-space (Gavryushkin and Drummond 2016), which is better suited to ultrametric trees. Second, the zig-zag process is readily implementable on τ-space via Poisson thinning without a numerical integrator such as leap-prog (Dinh et al. 2017, algorithm 1). Finally, the zig-zag process has simple boundary behavior between orthants and does not require boundary smoothing (Dinh et al. 2017, sec. 3.3), chiefly because discontinuous gradients are easier to handle on continuous rather than discretized paths.

The rest of the paper is structured as follows. Section 2 defines the zig-zag algorithm with discrete and continuous variables, and proves that it has the desired invariant distribution. Section 3 recalls the coalescent and the τ-space embedding. In Sections 4 and 5 we recall the popular infinite and finite sites models of mutation, derive zig-zag processes for each, and demonstrate their performance via simulation studies. Section 6 concludes with a discussion. The algorithms and datasets used in the simulation studies are available at https://github.com/JereKoskela/tree-zig-zag.

2 Zig-zag on Hybrid Spaces

The definition of our zig-zag process follows Davis (1993, sec. 24). Let $F$ be a countable set. For each $m \in F$, let $Ω_{m}^{o}$ be an open subset of $\mathbb{R}^{d}$, $\overline{Ω_{m}^{o}}$ be its closure, and $\partial Ω_{m}^{*} := \overline{Ω_{m}^{o}} \setminus Ω_{m}^{o}$ be its boundary. We assume that $\{\partial Ω_{m}^{*}\}_{m \in F}$ are piecewise Lipschitz and denote by $\partial Ω_{m}$ the restriction of $\partial Ω_{m}^{*}$ to noncorner points. Let $Ω^{o} := \cup_{m \in F} Ω_{m}^{o} = \{(m, x) : m \in F, x \in Ω_{m}^{o}\}$ and $\partial Ω := \cup_{m \in F} \partial Ω_{m}$. For a point $(m, x) \in \partial Ω_{m}$, let $n(m, x)$ be the outward unit normal, and let $Γ^{\pm}(m, x) := \{v \in \{-1, 1\}^{d} : \pm (v \cdot n(m, x)) > 0\}$ be the sets of velocities with which a zig-zag process can exit (+) or enter (–) $Ω_{m}^{o}$ at $x$. Zig-zag dynamics imply $v \in Γ^{+}(m, x) \Leftrightarrow -v \in Γ^{-}(m, x)$. We also define $Γ^{\pm}(\partial Ω) := \cup_{(m, x) \in \partial Ω} (\{(m, x)\} \times Γ^{\pm}(m, x))$ and $Ω^{*} := (Ω^{o} \times \{-1, 1\}^{d}) \cup Γ^{-}(\partial Ω)$. Integrals over $Ω^{o}$ and $\partial Ω$, or subsets thereof, are taken to incorporate discrete sums in the $m \in F$ coordinate.

The zig-zag process ${(m_{t}, x_{t}, v_{t})}_{t \geq 0}$ is defined on $Ω^{*}$, with target $\tilde{π}(m, x, v) := π(m, x) / 2^{d}$ on $Ω^{*} \cup Γ^{+}(\partial Ω)$ for a given density $π(m, x)$. At $(m, x, v) \in Ω^{*}$, the process moves with velocity $v$ and each coordinate $v_{i}$ flips at rate $λ_{i}(m, x, v)$, defined as in (1) since $m$ is fixed between boundary hitting events. When $(m, x, v) \in Γ^{+}(\partial Ω)$, the process jumps according to a Markov kernel $Q : Γ^{+}(\partial Ω) \mapsto P(Γ^{-}(\partial Ω))$, where $P(A)$ denotes the set of probability measures on $(A, B(A))$. We assume that $Q$ and $\tilde{π}$ satisfy the skew-detailed balance condition

$$\tilde{π}(m, x, v) Q(m, x, v; j, dy, w) \, dx = \tilde{π}(j, y, -w) Q(j, y, -w; m, dx, -v) \, dy \tag{2}$$

for any $(m, x, v) \in Γ^{+}(\partial Ω)$ and $(j, y, w) \in Γ^{-}(\partial Ω)$, as well as

$$\int_{(j, y) \in \partial Ω} \sum_{w \in Γ^{-}(j, y)} (w \cdot n(j, y)) Q(m, x, v; j, dy, w) = -v \cdot n(m, x), \tag{3}$$

for any $(m, x, v) \in Γ^{+}(\partial Ω)$, and exclude jumps to paths pointing into corners by assuming

$$\int_{(m, x) \in \partial Ω} \sum_{v \in Γ^{+}(m, x)} \int_{(j, y) \in \partial Ω} \sum_{w \in Γ^{-}(j, y)} 1_{(\cup_{m \in F} \partial Ω_{m}^{*}) \setminus \partial Ω}(j, y + T_{Γ^{+}(\partial Ω)}(j, y, w) w) \, Q(m, x, v; j, dy, w) \, dx = 0, \tag{4}$$

where $T_{Γ^{+}(\partial Ω)}(j, y, w)$ is the time at which the line ${(j, y + t w)}_{t \geq 0}$ hits $Γ^{+}(\partial Ω)$. We will abuse notation and use $\tilde{π}(m, x, v) \, dx$ as the target density and Lebesgue measure on $Ω^{o}$, and as their restrictions to the surface $\partial Ω$, on which $\tilde{π}$ is not a probability density.

By Davis (1993, theorem 26.14), the zig-zag process with dynamics defined above is a piecewise-deterministic Markov process with extended generator

$$L f(m, x, v) = \sum_{i = 1}^{d} v_{i} \partial_{i} f(m, x, v) + \sum_{i = 1}^{d} λ_{i}(m, x, v) (f(m, x, F_{i} v) - f(m, x, v)), \tag{5}$$

whose domain $D(L)$ consists of measurable functions $f(m, x, v)$ on $Ω^{*}$ satisfying:

For each $(m, x, v) \in Γ^{+} (\partial Ω)$ , the limit $\lim_{t \to 0} f (m, x - t v, v) = : f (m, x, v)$ exists.
For each $(m, x, v) \in Ω^{*}$ , the function $t \mapsto f (m, x + t v, v)$ is absolutely continuous on $t \in [0, T_{Γ^{+} (\partial Ω)} (m, x, v))$ .
For $(m, x, v) \in Γ^{+}(\partial Ω)$, $$f(m, x, v) = \int_{(j, y) \in \partial Ω} \sum_{w \in Γ^{-}(j, y)} f(j, y, w) Q(m, x, v; j, dy, w). \tag{6}$$
The random variable $\sum_{k = 1}^{n} f(m_{T_{k}}, x_{T_{k}}, v_{T_{k}}) - f(m_{T_{k}}, x_{T_{k} -}, v_{T_{k} -})$ is integrable for each $n \in \mathbb{N}$, where $\{T_{k}\}_{k \geq 1}$ are the successive jump times (both velocity flips and jumps due to hitting a boundary) of ${(m_{t}, x_{t}, v_{t})}_{t \geq 0}$.

For a set $A$, let $B(A)$ and $C^{1}(A)$ be the respective spaces of bounded and continuously differentiable functions on $A$. Let $C_{b}^{1}(A) := B(A) \cap C^{1}(A)$. For $t > 0$, we define $N_{t} := \# \{\text{velocity flips and boundary jumps in } {(m_{s}, x_{s}, v_{s})}_{s = 0}^{t}\}$.

Theorem 1.

Suppose $F$ is finite, $\tilde{π}(m, \cdot, v) \in C^{1}(Ω_{m}^{o})$ for each $v \in \{-1, 1\}^{d}$ and $m \in F$, $\tilde{π} > 0$ on $Ω^{o} \times \{-1, 1\}^{d}$, that $Q(m, x, v; \cdot)$ has compact support for each $(m, x, v) \in Γ^{+}(\partial Ω)$, and for each $t > 0$ and $(m, x, v) \in Ω^{*}$,

$$E[N_{t} \mid (m_{0}, x_{0}, v_{0}) = (m, x, v)] < \infty. \tag{7}$$

Suppose the initial distribution of $(m, x)$ has a density on $Ω^{*}$ and that (2)–(4) hold. Then the zig-zag process with generator (5) and domain D(L) as described above has stationary distribution $\tilde{π}$ .

Proof.

Provided in the supplementary materials. □

Remark 1.

In addition to having the right invariant distribution, a practical algorithm needs to be ergodic. To discuss ergodicity of the zig-zag process ${(m_{t}, x_{t}, v_{t})}_{t \geq 0}$ from Theorem 1, let $\{{(m_{t}^{j}, x_{t}^{j}, v_{t}^{j})}_{t \geq 0}\}_{j \in F}$ be zig-zag processes restricted to respective spaces $Ω_{j}^{o}$ by boundary jump kernels $Q^{j} : \cup_{x \in \partial Ω_{j}} [(j, x) \times Γ^{+}(j, x)] \mapsto P(\cup_{x \in \partial Ω_{j}} [(j, x) \times Γ^{-}(j, x)])$, each with target proportional to $\tilde{π}(j, \cdot, \cdot)$. When $F$ is finite, a sufficient condition for ergodicity of the global process ${(m_{t}, x_{t}, v_{t})}_{t \geq 0}$ is that $\{{(m_{t}^{j}, x_{t}^{j}, v_{t}^{j})}_{t \geq 0}\}_{j \in F}$ are all ergodic, and that

$$\int_{x \in \partial Ω_{m}} \sum_{v \in Γ^{+}(m, x)} \int_{y \in \partial Ω_{j}} \sum_{w \in Γ^{-}(j, y)} Q(m, x, v; j, dy, w) \, \tilde{π}(m, x, v) \, dx > 0, \tag{8}$$

for ordered pairs $(m, j) \in F^{2}$ which form a cycle spanning the support of $\tilde{π}$. Bierkens, Roberts, and Zitt (2019) provide conditions for ergodicity of single-domain zig-zag processes.

We conclude this section with a pseudocode specification of our zig-zag algorithm.

Algorithm 1 Simulating the zig-zag process targeting $\tilde{π}$

Require: Initial condition $(m, x_{0}, v_{0})$, target $\tilde{π}$, jump kernel $Q$, terminal time $t_{end}$

Set $t \leftarrow 0, m_{0} \leftarrow m, x \leftarrow x_{0}, v \leftarrow v_{0}$
while $t < t_{end}$ do
    Set $τ \leftarrow T_{Γ^{+}(\partial Ω)}(m, x, v)$ and $I \leftarrow 0$. ⊳ $I = 0 \Rightarrow$ boundary hit.
    for $i \in \{1, 2, \dots, d - 1, d\}$ do
        Sample $U \sim Exp(1)$.
        Set $ρ$ to be the solution of $\int_{0}^{ρ} λ_{i}(m, x + s v, v) \, ds = U$.
        if $ρ < τ$ then
            Set $τ \leftarrow ρ$ and $I \leftarrow i$
    Set $t \leftarrow t + τ, x \leftarrow x + τ v$
    if $I > 0$ then
        Set $v_{I} \leftarrow - v_{I}$
    else
        Set $(m, x, v) \leftarrow (j, y, w) \sim Q(m, x, v; \cdot, \cdot, \cdot)$
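The loop of Algorithm 1 can be sketched in the simplest possible setting. The example below is not one of the paper's examples: it assumes $d = 1$, a single discrete state, the target proportional to $\exp(-x^{2}/2)$ on $(0, \infty)$, and a boundary kernel that reflects the velocity at $x = 0$. All function names are hypothetical. The flip time is obtained by closed-form inversion, which is specific to this Gaussian-type target.

```python
import math
import random

def zigzag_half_line(x0, v0, t_end, rng):
    """Algorithm 1 with d = 1 and one discrete state on (0, inf).

    Returns the skeleton [(t, x, v)] of event times, positions and
    post-event velocities.
    """
    t, x, v = 0.0, x0, v0
    skeleton = [(t, x, v)]
    while t < t_end:
        # Time to hit the boundary x = 0 (infinite when moving away from it).
        tau = x if v < 0 else math.inf
        # Flip time: solve int_0^rho (v * (x + s * v))^+ ds = U.
        u = rng.expovariate(1.0)
        a = v * x
        rho = -a + math.sqrt(max(a, 0.0) ** 2 + 2.0 * u)
        tau = min(tau, rho)
        t, x = t + tau, x + tau * v
        # With a single coordinate, a flip and a reflecting boundary
        # jump both negate the velocity.
        v = -v
        skeleton.append((t, x, v))
    return skeleton

def time_average(skeleton, t_end):
    """Time average of x over [0, t_end] along the piecewise linear path."""
    total = 0.0
    for (t0, x0, v0), (t1, _, _) in zip(skeleton, skeleton[1:]):
        t1c = min(t1, t_end)
        xc = x0 + (t1c - t0) * v0
        total += 0.5 * (x0 + xc) * (t1c - t0)  # trapezoid on a linear segment
    return total / t_end
```

Because the output is a continuous path, expectations are estimated by time averages along the skeleton rather than by averaging the skeleton points themselves; here the time average of $x$ should be close to the half-normal mean $\sqrt{2/π}$.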

3 The Coalescent and a Geometric Embedding

An ultrametric binary tree with $n$ labeled leaves is a rooted binary tree in which each leaf is an equal graph distance away from the root. We follow Gavryushkin and Drummond (2016) and encode such a tree with leaf labels $\{1, \dots, n\}$ as the pair $(E_{n}, t_{n})$, where $E_{n}$ is a ranked topology and $t_{n} \in {(0, \infty)}^{n - 1}$. The continuous variables $t_{n}$ encode times between mergers. The time from the leaves to the first merger is $t_{1}$, and subsequent $t_{i}$ variables are times between successive mergers. The ranked topology $E_{n}$ is an $(n - 1)$-tuple of pairs of labels, where the $i$th pair specifies the two child nodes of the $i$th merger. Nonleaf nodes are labeled by the leaves they subtend. For example, the ranked topology $E_{4} = (E_{4, 1}, E_{4, 2}, E_{4, 3}) := (\{1, 2\}, \{\{1, 2\}, 3\}, \{\{1, 2, 3\}, 4\})$ encodes the four-leaf caterpillar tree depicted in Figures 3 and 7 with nodes labeled left to right. We order the two entries of $E_{n, i}$ by their least element for definiteness.

The coalescent (Kingman 1982) is a seminal model for the genetic ancestry of samples from large populations. Under the coalescent, a tree $(E_{n}, t_{n})$ has probability density

$$π(E_{n}, t_{n}) \, dt_{n} := \exp\left(- \sum_{i = 1}^{n - 1} \binom{n + 1 - i}{2} t_{i}\right) dt_{n}, \tag{9}$$

which arises as the law of a tree constructed by starting a lineage from each leaf, and merging each pair of lineages at rate 1 until the most recent common ancestor (MRCA) is reached. The success of the coalescent is due to robustness: distributions of ancestries of a large class of individual-based models converge to the coalescent in the infinite population limit under suitable rescalings of time. For details, see, for example, Wakeley (2009).
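Under (9) the inter-merger times are independent, with $t_{i}$ exponentially distributed at rate $\binom{n + 1 - i}{2}$, since each of the $\binom{k}{2}$ pairs among $k$ remaining lineages merges at rate 1. A minimal sketch of the rates, the log-density, and a sampler for the times follows; the function names are illustrative and not taken from the paper's repository.

```python
import math
import random

def merger_rates(n):
    """Rates binom(n + 1 - i, 2) for i = 1, ..., n - 1, as in (9)."""
    return [math.comb(n + 1 - i, 2) for i in range(1, n)]

def log_density(t):
    """Log of (9) for inter-merger times t = (t_1, ..., t_{n-1})."""
    n = len(t) + 1
    return -sum(r * ti for r, ti in zip(merger_rates(n), t))

def sample_times(n, rng):
    """Draw independent exponential inter-merger times."""
    return [rng.expovariate(r) for r in merger_rates(n)]
```

For $n = 4$ the rates are 6, 3, and 1, so the first merger is typically much faster than the last.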

We define swap operators $s_{i}$ for $i \in \{2, \dots, n - 2\} : E_{n, i - 1} \notin E_{n, i}$ via

$$s_{i}(E_{n}) := (E_{n, 1}^{'}, \dots, E_{n, n - 1}^{'}) \quad \text{where} \quad E_{n, k}^{'} := \begin{cases} E_{n, k} & \text{for } k < i - 1 \text{ or } k > i, \\ E_{n, i} & \text{for } k = i - 1, \\ E_{n, i - 1} & \text{for } k = i. \end{cases}$$

In words, $s_{i}$ swaps the order of the $(i - 1)$th and $i$th mergers. We also define pivot operators $p_{i}^{↓}$ and $p_{i}^{↑}$ for $i \in \{2, \dots, n - 1\} : E_{n, i - 1} \in E_{n, i}$ as

$$p_{i}^{↓}(E_{n}) := (E_{n, 1}^{'}, \dots, E_{n, n - 1}^{'}) \quad \text{with} \quad E_{n, k}^{'} := \begin{cases} E_{n, k} & \text{for } k \notin \{i - 1, i\}, \\ \{E_{n, i - 1}^{↓}, E_{n, i}^{s}\} & \text{for } k = i - 1, \\ \{E_{n, i - 1}^{'}, E_{n, i - 1}^{↑}\} & \text{for } k = i, \end{cases} \tag{10}$$

where $E_{n, i}^{↑}$ (resp. $E_{n, i}^{↓}$) is the entry of $E_{n, i}$ with the higher (resp. lower) least element, and $E_{n, i}^{s}$ is the sibling: the entry of $E_{n, i}$ that is not $E_{n, i - 1}$. Pivot $p_{i}^{↑}$ is defined by interchanging ↓ and ↑ in (10). The pivots are the two nearest neighbor interchanges between the $i$th merger and the merger involving its nearest child. Figure 1 illustrates all three operators.
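The swap and pivot operators can be made concrete with a small data structure. The sketch below is one possible encoding, not the paper's implementation: each node is a `frozenset` of the leaves it subtends, each merger is a pair of child nodes ordered by least element, and indices $i$ are 1-based as in the text. All names are hypothetical.

```python
def leaves(*xs):
    return frozenset(xs)

def node(pair):
    """Label of the parent node: the leaves subtended by both children."""
    return pair[0] | pair[1]

def ordered(a, b):
    """Order a pair of child labels by least element."""
    return (a, b) if min(a) < min(b) else (b, a)

def is_child(E, i):
    """True if merger i - 1 creates one of the children of merger i."""
    return node(E[i - 2]) in E[i - 1]

def swap(E, i):
    """s_i: exchange the order of mergers i - 1 and i."""
    assert not is_child(E, i)
    E = list(E)
    E[i - 2], E[i - 1] = E[i - 1], E[i - 2]
    return tuple(E)

def pivot(E, i, up):
    """p_i^up if up is True, otherwise p_i^down, following (10)."""
    assert is_child(E, i)
    low, high = E[i - 2]  # already ordered by least element
    child = node(E[i - 2])
    sibling = E[i - 1][1] if E[i - 1][0] == child else E[i - 1][0]
    keep, move = (high, low) if up else (low, high)
    E = list(E)
    E[i - 2] = ordered(keep, sibling)         # new merger i - 1
    E[i - 1] = ordered(keep | sibling, move)  # new merger i
    return tuple(E)

# The four-leaf caterpillar tree from the text, and a balanced tree.
E4 = ((leaves(1), leaves(2)), (leaves(1, 2), leaves(3)), (leaves(1, 2, 3), leaves(4)))
B4 = ((leaves(1), leaves(2)), (leaves(3), leaves(4)), (leaves(1, 2), leaves(3, 4)))
```

For instance, `pivot(E4, 2, up=False)` merges leaves 1 and 3 first and then attaches leaf 2, while `swap(B4, 2)` makes the merger of leaves 3 and 4 the earlier one.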

Fig. 1 Operators $p_{2}^{↑}, p_{2}^{↓}$ , and s₂. The horizontal arrangement of leaves is arbitrary throughout this article; only vertical distance is meaningful.

Next we describe τ-space, which gives a geometric structure to the set of pairs $(E_{n}, t_{n})$ . For fixed E_n , the space of $t_{n}$ -vectors is the orthant ${[0, \infty)}^{n - 1}$ . Each boundary point with t_i = 0 corresponds to a degenerate tree in which one of three things happens:

1. The two leaves of $E_{n, 1}$ merge at time 0 if $i = 1$.
2. There are two simultaneous mergers if $E_{n, i - 1} \notin E_{n, i}$.
3. There is a simultaneous merger of three lineages if $E_{n, i - 1} \in E_{n, i}$.

Type 1 boundaries are boundaries of the whole τ-space. Type 2 boundaries separate orthants corresponding to two ranked topologies separated by an $s_{i}$-step. Trajectories crossing the boundary move from one ranked topology to the other. Type 3 boundaries separate the three orthants which resolve the triple merger into two binary mergers, which differ by a $p_{i}^{↓}$- or $p_{i}^{↑}$-step. A trajectory that crosses the boundary visits a tree with the triple merger. Figure 2 depicts the τ-space with three leaves, and a type 2 boundary in a general τ-space. An example with five leaves is depicted in Gavryushkin and Drummond (2016), and the $t_{n}$ variables are illustrated in Figures 3 and 7 in Sections 4 and 5.

Fig. 2 (Left) τ-space with n = 3 embedded into $R^{3}$ . Each square is a copy of ${[0, \infty)}^{2}$ associated with the given topology. The coordinates $(t_{1}, t_{2})$ are the respective time of the first merger, and the time between the first and second merger. The dot is the origin, and the line on which all three orthants intersect is a type 3 boundary consisting of trees in which all three leaves merge simultaneously at time t₁. The dashed lines are boundaries at $\infty$ . (Right) A segment of τ-space depicting a type 2 boundary, in which each square represents ${[0, \infty)}^{n - 1}$ . Only the two orthants adjacent to the boundary are shown.

Fig. 3 A realization of the infinite sites model with $n = 4$, two mutations, three types, and $D_{n} = (\{0.7\}, \{0.7\}, \{\}, \{0.2\})$. The holding times $t_{3}$ are shown on the left.

We will use τ-space to construct zig-zag processes whose state spaces consist of tree topologies, branch lengths, and a scalar parameter introduced in the next section. The discrete variables $F$ will be the ranked topologies, with the boundary crossings described above defining the boundary jump kernel $Q$. For more on τ-space, for example, existence and uniqueness of geodesics and Fréchet means, we refer to Gavryushkin and Drummond (2016).

4 The Infinite Sites Model

The infinite sites model (Watterson 1975) connects the coalescent tree to DNA sequence data by associating the MRCA with the unit interval $(0, 1)$. Mutations with independent, $U(0, 1)$-distributed locations accrue along branches of the tree (with branch lengths as specified by $t_{n}$) at rate $θ / 2$. The type of a leaf consists of the mutations along branches separating it from the MRCA. We denote the resulting list of types of the leaves by $D_{n}$. A realization of a coalescent tree and associated $D_{n}$ is shown in Figure 3. The typical task is to sample from the conditional law $(E_{n}, t_{n}, θ) | D_{n}$ corresponding to observing $D_{n}$, but not the tree which gave rise to it.

To handle mutations, we define F_n as the rooted graphical tree with $2 n - 1$ nodes, the first n of which are leaves labeled $1, \dots, n$ , while the remaining $n - 1$ are labeled as in E_n . Edges connect children to their parents, and edge lengths are determined by $t_{n}$ . For an edge $γ \in F_{n}$ , we denote by $c_{γ}$ and $p_{γ}$ the respective labels of the child and parent nodes of γ, by $m_{γ}$ the number of mutations on γ, and by $l_{γ} : = \sum_{t_{j} \in γ} t_{j}$ the edge length, where we write $t_{i} \in γ$ if t_i contributes to the length of γ in which case we say γ spans t_i .

Given a prior density $π_{0}(θ)$ for θ, the posterior distribution of $(E_{n}, t_{n}, θ) | D_{n}$ follows from (9), the fact that the number of mutations on branch $γ \in F_{n}$ is Poisson($θ l_{γ} / 2$)-distributed given $l_{γ}$, and that mutations on distinct branches are independent. In particular,

$$π(E_{n}, t_{n}, θ | D_{n}) \propto \left\{ \prod_{γ \in F_{n}} \left(\frac{θ l_{γ}}{2}\right)^{m_{γ}} \right\} \exp\left(- \sum_{i = 1}^{n - 1} \frac{(n + 1 - i)(n + θ - i)}{2} t_{i}\right) π_{0}(θ) \tag{11}$$

provided $E_{n}$ is consistent with $D_{n}$, and $π = 0$ otherwise. This distribution can be sampled using a zig-zag algorithm by taking $F$ to be the set of ranked topologies on $n$ leaves which are consistent with $D_{n}$, as well as $Ω_{E_{n}}^{o} := \{(t_{n}, θ) \in {(0, \infty)}^{n}\}$ and $\partial Ω_{E_{n}} := \{(t_{n}, θ) \in {[0, \infty)}^{n} : t_{i} = 0 \text{ for one } i \in \{1, \dots, n - 1\} \text{ or } θ = 0\}$ for each $E_{n} \in F$. In the boundary classification of Section 3, $θ = 0$ is another type 1 boundary. For $(t_{n}, θ) \in \partial Ω$, we define $k(t_{n}, θ)$ as the index of the zero variable, taken to be $n$ in the case of θ.

We augment the state space with $n$ zig-zag velocities $v_{n}$, of which $(v_{1}, \dots, v_{n - 1})$ drive $t_{n}$ and $v_{n}$ drives θ. For $γ \in F_{n}$, we also define $v_{γ} := \sum_{j : t_{j} \in γ} v_{j}$ as the rate of change of $l_{γ}$. The boundary kernel $Q$ is defined separately on each boundary type:

$$Q(E_{n}, (t_{n}, θ), v_{n}; \cdot, \cdot, \cdot) := \begin{cases} δ_{E_{n}}(\cdot) \otimes δ_{(t_{n}, θ)}(\cdot) \otimes δ_{F_{k(t_{n}, θ)}(v_{n})}(\cdot) & \text{for type 1}, \\ δ_{s_{k(t_{n}, θ)}(E_{n})}(\cdot) \otimes δ_{(t_{n}, θ)}(\cdot) \otimes δ_{F_{k(t_{n}, θ)}(v_{n})}(\cdot) & \text{for type 2}, \\ \left(\frac{δ_{p_{k(t_{n}, θ)}^{↑}(E_{n})}(\cdot)}{2} + \frac{δ_{p_{k(t_{n}, θ)}^{↓}(E_{n})}(\cdot)}{2}\right) \otimes δ_{(t_{n}, θ)}(\cdot) \otimes δ_{F_{k(t_{n}, θ)}(v_{n})}(\cdot) & \text{for type 3}. \end{cases} \tag{12}$$

At a type 1 boundary the process reflects back into $Ω_{E_{n}}^{o}$ via a velocity flip. At a type 2 boundary it undergoes an s_i -step and a velocity flip to pass through the boundary. For type 3 boundaries it chooses an adjacent orthant uniformly at random.

In the interiors of orthants, velocity flip rates are

$$λ_{i}(E_{n}, t_{n}, θ; v_{n}) := \left[v_{i} \left(\frac{(n + 1 - i)(n + θ - i)}{2} - \sum_{γ \in F_{n} : t_{i} \in γ} \frac{m_{γ}}{l_{γ}}\right)\right]^{+}, \tag{13}$$

$$λ_{θ}(E_{n}, t_{n}, θ; v_{n}) := \left[v_{n} \left(\sum_{i = 1}^{n - 1} \frac{n + 1 - i}{2} t_{i} - \frac{1}{θ} \sum_{γ \in F_{n}} m_{γ} - \partial_{θ} \log(π_{0}(θ))\right)\right]^{+}. \tag{14}$$
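Rates (13) and (14) are straightforward to evaluate once each edge records its mutation count and the $t_{i}$ it spans. The sketch below assumes a hypothetical representation in which `edges` is a list of `(m_gamma, indices)` pairs, with `indices` the 1-based $i$ such that $t_{i} \in γ$; the default flat prior (zero log-derivative) is an assumption made only to keep the example short.

```python
def flip_rates_infinite_sites(t, theta, v, edges, dlog_prior=lambda th: 0.0):
    """Evaluate (13) for i = 1, ..., n - 1 and (14) for theta.

    t: [t_1, ..., t_{n-1}]; v: [v_1, ..., v_n], where v_n drives theta.
    """
    n = len(t) + 1
    lengths = [sum(t[j - 1] for j in idx) for _, idx in edges]
    rates = []
    for i in range(1, n):
        # Sum of m_gamma / l_gamma over edges spanning t_i.
        pull = sum(m / l for (m, idx), l in zip(edges, lengths) if i in idx)
        drift = (n + 1 - i) * (n + theta - i) / 2.0 - pull
        rates.append(max(v[i - 1] * drift, 0.0))
    total_mutations = sum(m for m, _ in edges)
    drift_theta = (sum((n + 1 - i) / 2.0 * t[i - 1] for i in range(1, n))
                   - total_mutations / theta - dlog_prior(theta))
    return rates, max(v[n - 1] * drift_theta, 0.0)
```

With $n = 2$, $t_{1} = 2$, $θ = 1$, and one mutation on one of the two edges, the rates are 1.5 for $t_{1}$ and 1 for θ when both velocities are positive, and both vanish when the velocities are negative.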

Simulating holding times with these rates is difficult due to the time intervals during which they vanish. One strategy is Poisson thinning via dominating rates consisting of only those terms in the round brackets in (13) and (14) whose sign matches that of the corresponding velocity $v_{i}$, but these can result in loose bounds and inefficient algorithms. Instead, we define $t_{n} := θ$ for brevity, and for $i \in \{1, \dots, n - 1\}$ define $γ(E_{n}, i) := \operatorname{argmin}\{l_{γ} : p_{γ} = E_{n, i}\}$ as the shorter child branch from parent node $E_{n, i}$. For $i \in \{1, n\}$ we adopt the shorthands

$$m_{γ(E_{n}, n)} := \sum_{γ \in F_{n}} m_{γ} \quad \text{and} \quad m_{γ(E_{n}, 1)} := \sum_{γ \in F_{n} : p_{γ} = E_{n, 1}} m_{γ},$$

and finally define the time localization $T \equiv T(E_{n}, t_{n}, θ; v_{n})$ as

$$T := \min\left\{ \min_{i \in \{1, \dots, n\} : v_{i} < 0} \left\{ \frac{- t_{i}}{\{1 + c [1_{E_{n, i}}(E_{n, i - 1}) + 1_{\{1, n\}}(i)] 1_{\mathbb{Z}_{+}}(m_{γ(E_{n}, i)})\} v_{i}} \right\}, K \right\}, \tag{15}$$

where $\min_{\emptyset} = \infty$, and $K \gg 0$ is a maximum increment for the case when all velocities are positive. The indicator functions in the denominator pick out boundaries where (11) vanishes: type 1 or 3 boundaries corresponding to length 0 branches which carry at least one mutation, and the $θ = 0$ boundary if there is at least one mutation in total. The parameter $c > 0$ ensures that, when the current process time is $t$, such boundaries cannot be hit on $[t, t + T]$, and at most one other boundary can be reached. We found $c = 4$ gave good performance across our tests in Sections 4 and 5. A larger value results in tighter bounds on (13) and (14), but wastes more computation as $t + T$ is hit more often before an accepted velocity flip.
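The time localization (15) reduces to a short computation once the indicator product in the denominator has been evaluated for each coordinate. In the sketch below that product is passed in as a precomputed flag `guarded[i]`, which is an assumed interface; the default `K = 10.0` is an arbitrary placeholder, while `c = 4.0` follows the value reported in the text.

```python
def time_localization(t, v, guarded, c=4.0, K=10.0):
    """T from (15).

    t, v: length-n lists with t[-1] = theta and v[-1] its velocity.
    guarded[i]: True when the boundary t_i = 0 is one where (11) vanishes.
    """
    candidates = [-ti / ((1.0 + c * g) * vi)
                  for ti, vi, g in zip(t, v, guarded) if vi < 0]
    # An empty candidate list corresponds to min over the empty set = infinity.
    return min(candidates + [K])
```

A guarded coordinate moving toward zero can cover at most a fraction $1/(1 + c)$ of its current value within the window, so its boundary is never reached on $[t, t + T]$.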

On the time interval $[t, t + T]$, flip rates (13) and (14) are bounded above by the constant rates

$$\begin{aligned} λ_{i}^{*} &:= \left[v_{i} \left(\frac{(n + 1 - i)(n + θ + {(v_{n} T)}^{\pm} - i)}{2} - \sum_{γ \in F_{n} : t_{i} \in γ} \frac{m_{γ}}{l_{γ} + {(v_{γ} T)}^{\pm}}\right)\right]^{+}, \\ λ_{θ}^{*} &:= \left[v_{θ} \left(\sum_{i = 1}^{n - 1} \frac{n + 1 - i}{2} [t_{i} + {(v_{i} T)}^{\pm}] - \frac{1}{θ + v_{n} T} \sum_{γ \in F_{n}} m_{γ} - \inf_{s \in [0, T]} \{\partial_{θ} \log(π_{0}(θ + v_{θ} s))\}\right)\right]^{+}, \end{aligned}$$

where, for each $λ_{i}^{*}$, $i \in \{1, \dots, n - 1, θ\}$, ${(x)}^{\pm} := {(x)}^{+}$ if $v_{i} > 0$ and ${(x)}^{\pm} := {(x)}^{-} := \min\{x, 0\}$ if $v_{i} < 0$. Algorithms S1 and S2 in the supplementary materials give pseudocode for simulating holding times, velocity flips, and boundary crossings as outlined above.
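A constant upper bound on a flip rate over a window of length $T$ is exactly what Poisson thinning needs. The sketch below is a generic thinning routine rather than a transcription of Algorithms S1 and S2, which are not reproduced here; `rate` and `bound` are hypothetical arguments standing in for a rate such as (13) and its bound $λ_{i}^{*}$.

```python
import random

def first_event_by_thinning(rate, bound, T, rng):
    """First event time in [0, T] of a Poisson process with intensity
    rate(s) <= bound, or None if no event is accepted before T."""
    if bound <= 0.0:
        return None
    s = 0.0
    while True:
        s += rng.expovariate(bound)  # propose from the dominating process
        if s > T:
            return None
        if rng.random() * bound < rate(s):  # accept w.p. rate(s) / bound
            return s
```

When the routine returns `None`, the process has reached the end of the localization window without a flip, and a new window with fresh bounds is started from the current state.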

Proposition 1.

Suppose that the initial condition $(E_{n}, t_{n}, θ)$ has a positive density on $F \times {(0, \infty)}^{n}$, that $π_{0} > 0$, and that $\partial_{θ} \log(π_{0}(θ))$ is bounded on compact subsets of $(0, \infty)$. Then (11) is stationary under the dynamics simulated by Algorithms S1 and S2, given in the supplementary materials.

Proof.

Provided in the supplementary materials. □

We compared the zig-zag process to a Metropolis–Hastings algorithm by reanalyzing the data of Ward et al. (1991) with $n = 55$, 14 distinct types, and 18 mutations. We used an improper prior $π_{0}(θ)$, set $v_{i} = \pm 2 / [(n + 1 - i)(n - i)]$, and set $v_{n}$ from trial runs to cross the θ-mode in unit time. The supplementary materials detail the Metropolis–Hastings algorithm and other tuning parameters. We compared both methods to a hybrid combining zig-zag dynamics with continuous-time Metropolis–Hastings moves at rate $κ = 10$. Performance was insensitive to κ provided it was not extreme: small values resemble a zig-zag process, while large values resemble Metropolis–Hastings.

Table 1 Effective sample sizes and run times for all three methods and datasets for the infinite sites model.

Figure 4 shows that the zig-zag and hybrid methods mix visibly better than Metropolis–Hastings over the latent tree, as measured by the tree height $H_{n} := t_{1} + \dots + t_{n - 1}$. However, they are not as effective at exploring the upper tail of the θ-marginal, likely because they do not stay in regions of short trees for long enough for θ to increase into the tail.

Fig. 4 Trace plots under the infinite sites model and the dataset of Ward et al. (1991).

To assess scaling, we simulated two datasets: one of size $n = 550$ with mutation rate $θ = 5.5$ (the approximate posterior mean from the analysis above) and one with $n = 55$ and $θ = 55$, which models a segment of DNA 10 times longer. Figures 5 and 6 demonstrate that the zig-zag and hybrid processes scale far better than Metropolis–Hastings, particularly when $θ = 55$. Estimates in Table 1 quantify the improvement at one to three orders of magnitude.

Fig. 5 Trace plots for the infinite sites model and the dataset with n = 550, $θ = 5.5$ , 30 distinct types, and 38 mutations.

Fig. 6 Trace plots for the infinite sites model and the dataset with n = 55, θ = 55, 40 distinct types, and 252 mutations.

5 The Finite Sites Model

The finite sites model (Jukes and Cantor 1969) is more detailed than the infinite sites model, but has greater computational cost. Consider a finite set of sites $S$ with a finite number of possible types $H$ per site; for example $H = \{0, 1\}$ or $H = \{A, T, C, G\}$. Mutations occur along branches of the coalescent tree at each site with rate $θ / (2 |S|)$, and the type of a mutant child is drawn from a stochastic matrix $P$ with unique stationary distribution ν. We denote the transition matrix of the $H$-valued compound Poisson process with jump rate $θ / (2 |S|)$ and jump transition matrix $P$ by ${(Q_{h g}^{θ}(t))}_{h, g \in H; t \geq 0}$. Figure 7 depicts a realization of the finite sites coalescent.

Fig. 7 A realization of the finite sites model with $n = 4$, $S = H = \{0, 1\}$, three mutations, three types, and $D_{n} = (\# 00, \# 10, \# 01, \# 11) = (2, 1, 1, 0)$.

As in Section 4, we denote the configuration of types at the leaves by $D_{n}$, and seek to sample from the posterior $π(E_{n}, t_{n}, θ | D_{n})$, which can be written as a sum over the types of internal nodes:

$$π(E_{n}, t_{n}, θ | D_{n}) \propto \left\{ \prod_{s \in S} \sum_{\substack{h(s; c_{γ}) \in H \\ \text{for } γ \in F_{n} : |c_{γ}| > 1}} \prod_{γ \in F_{n}} Q_{h(s; p_{γ}) h(s; c_{γ})}^{θ}(l_{γ}) \right\} \exp\left(- \sum_{i = 1}^{n - 1} \binom{n + 1 - i}{2} t_{i}\right) π_{0}(θ), \tag{16}$$

where $h(s; η) \in H$ is the type at site $s \in S$ on the node with label $η \in E_{n}$, and $γ \in F_{n} : |c_{γ}| > 1$ denotes edges which do not end in a leaf. The target (16) can be evaluated efficiently using the pruning algorithm of Felsenstein (1981).

The posterior (16) can be sampled using zig-zag dynamics with the same construction as in Section 4. Velocity flip rates can be written in terms of branch-specific gradients as

$$λ_{i}(E_{n}, t_{n}, θ; v_{n}) = \left[v_{i} \left(\binom{n + 1 - i}{2} - \sum_{s \in S} \sum_{\substack{h(s; c_{γ}) \in H \\ \text{for } γ \in F_{n} : |c_{γ}| > 1}} \sum_{δ \in F_{n} : t_{i} \in δ} \frac{[\partial_{i} Q_{h(s; p_{δ}) h(s; c_{δ})}^{θ}(l_{δ})] \prod_{γ \in F_{n} : γ \neq δ} Q_{h(s; p_{γ}) h(s; c_{γ})}^{θ}(l_{γ})}{\sum_{\substack{h(s; c_{γ}) \in H \\ \text{for } γ \in F_{n} : |c_{γ}| > 1}} \prod_{γ \in F_{n}} Q_{h(s; p_{γ}) h(s; c_{γ})}^{θ}(l_{γ})}\right)\right]^{+}, \tag{17}$$

$$λ_{θ}(E_{n}, t_{n}, θ; v_{n}) = \left[- v_{n} \left(\partial_{θ} \log(π_{0}(θ)) + \sum_{s \in S} \sum_{\substack{h(s; c_{γ}) \in H \\ \text{for } γ \in F_{n} : |c_{γ}| > 1}} \sum_{δ \in F_{n}} \frac{[\partial_{θ} Q_{h(s; p_{δ}) h(s; c_{δ})}^{θ}(l_{δ})] \prod_{γ \in F_{n} : γ \neq δ} Q_{h(s; p_{γ}) h(s; c_{γ})}^{θ}(l_{γ})}{\sum_{\substack{h(s; c_{γ}) \in H \\ \text{for } γ \in F_{n} : |c_{γ}| > 1}} \prod_{γ \in F_{n}} Q_{h(s; p_{γ}) h(s; c_{γ})}^{θ}(l_{γ})}\right)\right]^{+}, \tag{18}$$

which can be evaluated using the linear-cost method of Ji et al. (2020).

We show that events with rates (17) and (18) can be simulated using the example of Griffiths and Tavaré (1994, sec. 7.4), in which $S = \{1, \dots, 20\}$, $H = \{0, 1\}$, and P is the 2 × 2 matrix which always changes state, corresponding to $ν = (1 / 2, 1 / 2)$ and $Q_{h h}^{θ} (l) = \frac{1}{2} \{1 + \exp (- θ l)\}, \quad Q_{h g}^{θ} (l) = \frac{1}{2} \{1 - \exp (- θ l)\} \text{ for } h \neq g.$ (19)
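As a quick sanity check on the two-state transition probabilities, the sketch below assumes the form $Q_{h h}^{θ} (l) = \{1 + \exp (- θ l)\} / 2$ and $Q_{h g}^{θ} (l) = \{1 - \exp (- θ l)\} / 2$ for $h \neq g$, and verifies numerically that it is a stochastic matrix, satisfies the semigroup property, and leaves $ν = (1 / 2, 1 / 2)$ invariant. The helper names are illustrative.

```python
import math


def transition_matrix(theta, length):
    """2 x 2 transition matrix of the always-switching two-state model."""
    decay = math.exp(-theta * length)
    same, diff = 0.5 * (1.0 + decay), 0.5 * (1.0 - decay)
    return [[same, diff], [diff, same]]


def matmul(a, b):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


theta = 0.8
composed = matmul(transition_matrix(theta, 0.4), transition_matrix(theta, 1.1))
direct = transition_matrix(theta, 1.5)
# Largest entrywise gap between Q(0.4) Q(1.1) and Q(1.5).
print(max(abs(composed[i][j] - direct[i][j]) for i in range(2) for j in range(2)))
```

The off-diagonal entries vanish as $θ l \to 0$, which is the source of the unbounded log-derivatives handled by time localization.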

As (19) is not bounded away from 0 when $h \neq g$, (17) and (18) lack simple bounds for Poisson thinning. As in (15), bounds can be obtained by time localization using $T \equiv T (E_{n}, t_{n}, θ; v_{n}) := \min \Big\{ \min_{i \in \{1, \dots, n\} : v_{i} < 0} \Big\{ \frac{- t_{i}}{[1 + c 1_{\{1, θ\}} (i) 1_{Z_{+}} (m_{γ (E_{n}, i)})] v_{i}} \Big\}, K \Big\},$ where $K ≫ 0$ is a default increment in case all velocities are positive. The variable T localizes the next zig-zag time step beginning at time t so that at most one branch can shrink to length zero on $[t, t + T]$, θ can fall by at most $1 / (1 + c)$ of its present value, and $t_{1}$ can fall by at most $1 / (1 + c)$ of its present value if the first two leaves to merge are of distinct types. This treatment of θ and $t_{1}$ is needed as (17) and (18) diverge in these cases, rendering the $θ = 0$ and $t_{1} = 0$ boundaries inaccessible.
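The localization time can be sketched in a few lines. In the hypothetical function below, `values` holds the current coordinates (holding times and θ), `velocities` their signs, and the Boolean list `guarded` stands in for the product of indicators in the definition of T, marking the coordinates that must stay away from zero; `default_increment` plays the role of K. None of these names come from the article.

```python
def localization_time(values, velocities, guarded, c, default_increment):
    """Largest window over which every decreasing coordinate stays in range.

    An unguarded decreasing coordinate may just reach zero at the end of the
    window; a guarded one loses at most a fraction 1 / (1 + c) of its value.
    """
    horizon = default_increment
    for x, v, g in zip(values, velocities, guarded):
        if v < 0:
            factor = 1.0 + c if g else 1.0
            horizon = min(horizon, -x / (factor * v))
    return horizon


T = localization_time(
    values=[0.5, 2.0, 1.0],
    velocities=[-1.0, 1.0, -1.0],
    guarded=[True, False, True],
    c=1.0,
    default_increment=10.0,
)
print(T)
```

With c = 1 the first coordinate can lose at most half of its value, giving a window of 0.5 / 2 = 0.25, which is shorter than the 0.5 allowed by the third coordinate and the default of 10.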

Given $T \in (0, \infty)$, we have the following bounds on (19) on the time interval $[t, t + T]$: $\begin{matrix} Q_{h h}^{θ} (l_{γ}) & \leq \frac{1}{2} \{1 + \exp (- [θ + {(v_{n} T)}^{-}] [l_{γ} + {(v_{γ} T)}^{-}])\}, \\ Q_{h g}^{θ} (l_{γ}) & \leq \frac{1}{2} \{1 - \exp (- [θ + {(v_{n} T)}^{+}] [l_{γ} + {(v_{γ} T)}^{+}])\}, \\ Q_{h h}^{θ} (l_{γ}) & \geq \frac{1}{2} \{1 + \exp (- [θ + {(v_{n} T)}^{+}] [l_{γ} + {(v_{γ} T)}^{+}])\}, \\ Q_{h g}^{θ} (l_{γ}) & \geq \frac{1}{2} \{1 - \exp (- [θ + {(v_{n} T)}^{-}] [l_{γ} + {(v_{γ} T)}^{-}])\}, \end{matrix}$ where $h \neq g$. Substituting these bounds into Ji et al. (2020, eq. (9)) provides bounds on flip rates that can be evaluated with $O (| S | n)$ cost.
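The four bounds follow from bracketing the product of θ and $l_{γ}$ over the window, which the sketch below checks numerically along one trajectory. It assumes ${(x)}^{-} := \min \{x, 0\}$, that $v_{γ}$ denotes the rate of change of the branch length $l_{γ}$, and the two-state form of Q; the function names are hypothetical.

```python
import math


def pos(x):
    return max(x, 0.0)


def neg(x):
    # Assumed convention: the negative part is min(x, 0), so it is <= 0.
    return min(x, 0.0)


def q_bounds(theta, length, v_theta, v_branch, horizon):
    """(lower, upper) bounds on Q_hh and Q_hg, h != g, over [0, horizon]."""
    lo_prod = (theta + neg(v_theta * horizon)) * (length + neg(v_branch * horizon))
    hi_prod = (theta + pos(v_theta * horizon)) * (length + pos(v_branch * horizon))
    return {
        "same": (0.5 * (1.0 + math.exp(-hi_prod)), 0.5 * (1.0 + math.exp(-lo_prod))),
        "diff": (0.5 * (1.0 - math.exp(-lo_prod)), 0.5 * (1.0 - math.exp(-hi_prod))),
    }


theta, length, v_theta, v_branch, horizon = 1.0, 0.5, -1.0, 2.0, 0.2
bounds = q_bounds(theta, length, v_theta, v_branch, horizon)

# Evaluate Q along the deterministic path and confirm it stays inside the bounds.
ok = True
for k in range(101):
    s = horizon * k / 100
    decay = math.exp(-(theta + v_theta * s) * (length + v_branch * s))
    same, diff = 0.5 * (1.0 + decay), 0.5 * (1.0 - decay)
    ok = ok and bounds["same"][0] <= same <= bounds["same"][1]
    ok = ok and bounds["diff"][0] <= diff <= bounds["diff"][1]
print(bounds, ok)
```

The bounds are deliberately loose: they treat θ and $l_{γ}$ as if each could sit at its extreme value simultaneously, which costs tightness but needs only the endpoints of the window.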

Figure 8 demonstrates that the zig-zag process again mixes over latent trees faster than Metropolis–Hastings, but struggles to explore the upper tail of the θ-marginal. The hybrid method was run with κ = 100 to compensate for shorter run lengths than in Section 4, and thus resembles Metropolis–Hastings rather than the zig-zag process.

Fig. 8 Trace plots for the finite sites model and data from Griffiths and Tavaré (1994).

Figures 9 and 10 show results for two further simulated datasets: one with n = 500 and S = 20, and one with n = 50 and S = 200. The superior mixing of the zig-zag process over the latent tree is clear, as quantified by effective sample sizes in Table 2. The lack of mixing in the upper tail of the θ-marginal is also stark, particularly where zig-zag significantly underestimates posterior variance. The estimated posterior means of all three methods coincide in all cases (results not shown).

Fig. 9 Trace plots for the finite sites model and dataset with n = 500 and S = 20 consisting of five distinct sequences.

Fig. 10 Trace plots for the finite sites model and dataset with n = 50 and S = 200 consisting of 18 distinct sequences.

Table 2 Effective sample sizes and run times for all three methods and datasets.

6 Discussion

We have presented a general method for using zig-zag processes to sample targets defined on hybrid spaces consisting of discrete and continuous variables. This was done by introducing boundaries into the state space of continuous variables and updating discrete components via a Markov jump kernel Q whenever a boundary was hit. The resulting algorithm remains a piecewise-deterministic Markov process in the sense of Davis (1993, sec. 24), and generalizes existing zig-zag processes for restricted domains (Bierkens et al. 2018). Crucially, no assumptions of structure among the discrete variables are required. The key conditions on Q are the skew-detailed balance (2), which is local, and (3), which involves an integral with respect to Q but not the target $\tilde{π}$. Both are verifiable in applications, and do not require $\tilde{π}$ to be normalized. Our method is reminiscent of discrete Hamiltonian Monte Carlo (Dinh et al. 2017; Nishimura, Dunson, and Lu 2020), but the lack of time discretization simplifies boundary crossings (though see Nishimura, Dunson, and Lu 2020, sec. S6.4).

We have demonstrated the method on two examples involving the coalescent, which is a gold-standard model in phylogenetics. It is defined on the space of binary trees consisting of discrete tree topologies and continuous branch lengths, which lacks a simple geometric structure such as a partial order or a tractable norm. We have also shown that the zig-zag process can improve mixing over trees relative to Metropolis–Hastings, particularly under the infinite sites model. This model is widely used to analyze ever larger datasets, and the zig-zag process shows promise for expanding the scope of feasible MCMC computations.

The zig-zag process was more efficient than Metropolis–Hastings under the infinite sites model in terms of effective sample size, but struggled to explore the tails of the θ-marginal. A likely reason is correlation in the target: high mutation rates are only attainable when branch lengths are short. A Metropolis–Hastings algorithm can jump to a high mutation rate as soon as the latent tree has short branches, while the zig-zag process must traverse all intervening mutation rates before branch lengths grow. The speed up zig-zag method of Vasdekis and Roberts (2021) has state-dependent velocities, and could provide further improvement. The hybrid method with both zig-zag motion and Metropolis–Hastings updates interpolated between the two algorithms.

All three algorithms exhibited much longer run times under the finite sites model than under infinite sites. For the zig-zag and hybrid methods, this is due to the $O (| S | n)$ cost per evaluation of (17) and (18), of which there are O(n). However, flip times for different velocities are conditionally independent given the current state and can be generated in parallel, unlike steps of the Metropolis–Hastings algorithm. Hence, the zig-zag process is well suited to parallel architectures.
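The parallel structure can be illustrated with a small sketch: each coordinate draws its own candidate flip time by Poisson thinning within a localized window, and the earliest candidate determines the next event. Everything here is an assumption made for illustration, including the function names, the per-coordinate seeds, the toy rate functions, and the use of a thread pool. In CPython a thread pool does not speed up pure-Python arithmetic, so a real implementation would need a process pool or compiled rate evaluations to benefit.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def first_flip(rate, bound, horizon, seed):
    """First event on [0, horizon] of a Poisson process with rate(s) <= bound.

    Uses thinning: propose at constant rate `bound`, accept with probability
    rate(s) / bound. Returns None if no event occurs in the window.
    """
    if bound <= 0.0:
        return None
    rng = random.Random(seed)
    s = 0.0
    while True:
        s += rng.expovariate(bound)
        if s > horizon:
            return None
        if rng.random() * bound < rate(s):
            return s


def next_event(rates, bounds, horizon, seeds, workers=4):
    """Earliest candidate flip as (time, coordinate), or None if none occurs."""
    def task(i):
        return first_flip(rates[i], bounds[i], horizon, seeds[i])

    # Candidates depend only on the current state, so they can be drawn concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        times = list(pool.map(task, range(len(rates))))
    candidates = [(t, i) for i, t in enumerate(times) if t is not None]
    return min(candidates) if candidates else None


rates = [lambda s: 0.5 + s, lambda s: 2.0, lambda s: 0.0]
bounds = [1.5, 2.0, 1.0]
print(next_event(rates, bounds, horizon=1.0, seeds=[1, 2, 3]))
```

If no candidate falls inside the window, the process simply moves deterministically to the end of the window and a new window is computed.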

Acknowledgments

The author is grateful to Jure Vogrinc and Andi Wang for productive conversations on MCMC for coalescent processes, and nonreversible MCMC in general.

Supplementary Materials

The supplementary materials contain the proofs of Theorem 1 and Proposition 1, as well as details of the zig-zag and Metropolis–Hastings algorithms used in Sections 4 and 5.

Additional information

Funding

The author was supported by EPSRC grant EP/R044732/1.

References

Aberer, A. J., Stamatakis, A., and Ronquist, F. (2016), “An Efficient Independence Sampler for Updating Branches in Bayesian Markov Chain Monte Carlo Sampling of Phylogenetic Trees,” Systematic Biology, 65, 161–176. DOI: 10.1093/sysbio/syv051.
Bierkens, J., Bouchard-Côté, A., Doucet, A., Duncan, A. B., Fearnhead, P., Lienart, T., Roberts, G., and Vollmer, S. J. (2018), “Piecewise Deterministic Markov Processes for Scalable Monte Carlo on Restricted Domains,” Statistics & Probability Letters, 136, 148–154.
Bierkens, J., Fearnhead, P., and Roberts, G. (2019a), “Supplement to ‘The Zig-zag Process and Super-Efficient Sampling for Bayesian Analysis of Big Data’,” Annals of Statistics, 47.
Bierkens, J., Fearnhead, P., and Roberts, G. (2019b), “The Zig-zag Process and Super-Efficient Sampling for Bayesian Analysis of Big Data,” Annals of Statistics, 47, 1288–1320.
Bierkens, J., and Roberts, G. (2017), “A Piecewise Deterministic Scaling Limit of Lifted Metropolis-Hastings for the Curie-Weiss Model,” Annals of Applied Probability, 27, 846–882.
Bierkens, J., Roberts, G., and Zitt, P.-A. (2019), “Ergodicity of the Zigzag Process,” Annals of Applied Probability, 29, 2266–2301.
Billera, L. J., Holmes, S. P., and Vogtmann, K. (2001), “Geometry of the Space of Phylogenetic Trees,” Advances in Applied Mathematics, 27, 733–767. DOI: 10.1006/aama.2001.0759.
Chevalier, A., Fearnhead, P., and Sutton, M. (2020), “Reversible Jump PDMP Samplers for Variable Selection,” preprint, arXiv:2010.11771.
Davis, M. (1993), Markov Models and Optimization, Boca Raton, FL: Chapman Hall.
Dinh, V., Bilge, A., Zhang, C., and Matsen, F. A., IV. (2017), “Probabilistic Path Hamiltonian Monte Carlo,” in Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia.
Felsenstein, J. (1981), “Evolutionary Trees from DNA Sequences: A Maximum Likelihood Approach,” Journal of Molecular Evolution, 17, 368–376. DOI: 10.1007/BF01734359.
Flegal, J. M., Hughes, J., Vats, D., and Dai, N. (2017), mcmcse: Monte Carlo Standard Errors for MCMC.
Gagnon, P., and Doucet, A. (2021), “Nonreversible Jump Algorithms for Bayesian Nested Model Selection,” Journal of Computational and Graphical Statistics, 30, 312–323. DOI: 10.1080/10618600.2020.1826955.
Gavryushkin, A., and Drummond, A. J. (2016), “The Space of Ultrametric Phylogenetic Trees,” Journal of Theoretical Biology, 403, 197–208. DOI: 10.1016/j.jtbi.2016.05.001.
Gelfand, A. E., and Smith, A. F. M. (1990), “Sampling-Based Approaches to Calculating Marginal Densities,” Journal of the American Statistical Association, 85, 398–409. DOI: 10.1080/01621459.1990.10476213.
Geyer, C. J. (1992), “Practical Markov Chain Monte Carlo,” Statistical Science, 7, 473–483. DOI: 10.1214/ss/1177011137.
Griffiths, R. C., and Tavaré, S. (1994), “Simulating Probability Distributions in the Coalescent,” Theoretical Population Biology, 46, 131–159. DOI: 10.1006/tpbi.1994.1023.
Hastings, W. K. (1970), “Monte Carlo Sampling Methods Using Markov Chains and their Applications,” Biometrika, 57, 97–109. DOI: 10.1093/biomet/57.1.97.
Höhna, S., Defoin-Platel, M., and Drummond, A. J. (2008), “Clock-Constrained Tree Proposal Operators in Bayesian Phylogenetic Inference,” in 8th IEEE International Conference on BioInformatics and BioEngineering, pp. 1–7.
Höhna, S., and Drummond, A. J. (2012), “Guided Tree Topology Proposals for Bayesian Phylogenetic Inference,” Systematic Biology, 61, 1–11. DOI: 10.1093/sysbio/syr074.
Ji, X., Zhang, Z., Holbrook, A., Nishimura, A., Baele, G., Rambaut, A., Lemey, P., and Suchard, M. A. (2020), “Gradients Do Grow on Trees: A Linear-time O(N)-Dimensional Gradient for Statistical Phylogenetics,” Molecular Biology and Evolution, 37, 3047–3060. DOI: 10.1093/molbev/msaa130.
Jukes, T. H., and Cantor, C. R. (1969), “Evolution of Protein Molecules,” in Mammalian Protein Metabolism, ed. H. N. Munro, pp. 21–132, New York: Academic Press.
Kingman, J. F. C. (1982), “The Coalescent,” Stochastic Processes and Their Applications, 13, 235–248. DOI: 10.1016/0304-4149(82)90011-4.
Lakner, C., Van Der Mark, P., Huelsenbeck, J. P., Larget, B., and Ronquist, F. (2008), “Efficiency of Markov Chain Monte Carlo Tree Proposals in Bayesian Phylogenetics,” Systematic Biology, 57, 86–103. DOI: 10.1080/10635150801886156.
Mossel, E., and Vigoda, E. (2005), “Phylogenetic MCMC Algorithms are Misleading on Mixtures of Trees,” Science, 309, 2207–2209. DOI: 10.1126/science.1115493.
Neal, R. M. (2010), “MCMC Using Hamiltonian Dynamics,” in Handbook of Markov Chain Monte Carlo, eds. S. Brooks, A. Gelman, G. Jones, and X.-L. Meng, Boca Raton, FL: CRC Press.
Nishimura, A., Dunson, D. B., and Lu, J. (2020), “Discontinuous Hamiltonian Monte Carlo for Discrete Parameters and Discontinuous Likelihoods,” Biometrika, 107, 365–380. DOI: 10.1093/biomet/asz083.
Vasdekis, G., and Roberts, G. O. (2021), “Speed Up Zig-zag,” preprint, arXiv:2103.16620.
Wakeley, J. (2009), Coalescent Theory: An Introduction, Greenwood Village, CO: Roberts & Co.
Ward, R. H., Frazier, B. L., Dew, K., and Pääbo, S. (1991), “Extensive Mitochondrial Diversity Within a Single Amerindian Tribe,” Proceedings of the National Academy of Sciences of the United States of America, 88, 8720–8724. DOI: 10.1073/pnas.88.19.8720.
Watterson, G. A. (1975), “On the Number of Segregating Sites in Genetical Models Without Recombination,” Theoretical Population Biology, 7, 256–276. DOI: 10.1016/0040-5809(75)90020-9.
