Sampling contingency tables


Abstract

Let $Ω$ be the set of all $m \times n$ non-negative integer matrices in which the entries of row $i$ sum to $r_i$ and the entries of column $j$ sum to $c_j$. Sampling elements of $Ω$ uniformly at random, and doing so efficiently, is a problem with interesting applications in Combinatorics and Statistics. To calibrate the $χ^2$ statistic for testing independence, Diaconis and Gangolli proposed a Markov chain on $Ω$ that samples contingency tables of fixed row and column sums uniformly at random. Although the scheme works well for practical purposes, no formal proof of its rate of convergence has been available. By using a canonical path argument, we prove that this Markov chain is fast mixing and that the mixing time $τ_x(ϵ)$ satisfies $τ_x(ϵ) \leq 2 c_{\max} (mn)^4 \ln(c_{\max} ϵ^{-1})$, where $c_{\max} - 1$ is the maximal value in a cell.


1 Introduction

An $m \times n$ contingency table is an $m \times n$ array where the sum of entries in row $i$ is $r_i$ and the sum of entries in column $j$ is $c_j$. Let the size of a contingency table be the multiset of its row and column sums $r_i$ and $c_j$. The combinatorial problems involving contingency tables are varied and interesting. They include counting magic squares, enumerating permutations by descent patterns, and counting double cosets of symmetric groups. Contingency tables are also useful in the study of induced representations, tensor products and symmetric functions. See [1] for a comprehensive introduction to these topics. Moreover, random sampling of contingency tables is a problem that has statistical applications, as illustrated in [2,3]. For graph theoretic concepts, we refer to [4].

While entries in contingency tables may be any non-negative integers, the case where entries are $0$ or $1$ has received special attention in the literature [1–3,5–12]. Indeed, $(0, 1)$ matrices are widely studied for their relevance in various fields of science, such as network modeling in ecology, social sciences, chemical compounds and biochemical networks in the cell, where they serve to model bipartite graphs with fixed degree sequences. Although the problems of randomly sampling contingency tables and $(0, 1)$ matrices seem similar, the present paper focuses exclusively on the former.

Let $Ω$ be the set of $m \times n$ contingency tables. One way to sample a table at random is to construct a random walk that starts at a given element of $Ω$ , then moves to another element according to some simple rule which changes one element to another. After $i$ steps, one outputs the element reached by the random walk and considers this element, the $i$ th state reached, as a random sample. Such a random walk is a Markov chain.

Although this description seems straightforward, there are two issues at stake. First of all, the rule of transition describing the local perturbation from one element to another has to be simple enough so that it can be implemented. Moreover, the local perturbation must be such that, starting from any element, any other element has a chance to be visited. If the Markov chain is viewed as a random walk on a digraph $G$ whose vertices are the elements of $Ω$, the last condition is equivalent to requiring that $G$ be strongly connected. In [1,2], the following Markov chain is proposed. Take an $m \times n$ contingency table $T$ where entries in row $i$ add up to $r_i$ and entries in column $j$ add up to $c_j$. Choose at random two rows $i$ and $i'$ and two columns $j$ and $j'$. Then make the following local perturbation on table $T$, changing it into the array $T'$ by the rule $T'_{ij} = T_{ij} + 1$, $T'_{i'j} = T_{i'j} - 1$, $T'_{ij'} = T_{ij'} - 1$, and $T'_{i'j'} = T_{i'j'} + 1$. Move from $T$ to $T'$ if $T'$ has no negative entry, and stay on $T$ otherwise. It can be easily proved that this local perturbation defines a random walk which visits all the contingency tables having row sums $r_i$ and column sums $c_j$.
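The local perturbation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `flip_step`, the list-of-lists table representation and the use of `random.Random` are assumptions made here, not part of [1,2]. Drawing the ordered pairs $(i, i')$ and $(j, j')$ covers both directions of the perturbation.

```python
import random

def flip_step(table, rng):
    """One move of the chain: propose a +1/-1 perturbation on a random
    site and reject it if an entry would become negative."""
    m, n = len(table), len(table[0])
    i, i2 = rng.sample(range(m), 2)   # two distinct rows i, i'
    j, j2 = rng.sample(range(n), 2)   # two distinct columns j, j'
    # Only the two decremented cells can become negative.
    if table[i2][j] == 0 or table[i][j2] == 0:
        return table                  # stay on T
    proposal = [row[:] for row in table]
    proposal[i][j] += 1
    proposal[i2][j] -= 1
    proposal[i][j2] -= 1
    proposal[i2][j2] += 1
    return proposal                   # move to T'

# Hypothetical usage: a 2 x 3 table with row sums (3, 4).
rng = random.Random(0)
table = [[2, 1, 0], [0, 1, 3]]
for _ in range(100):
    table = flip_step(table, rng)
```

Every table produced this way has the same row and column sums as the starting table, because each move adds and subtracts 1 once in each affected row and column.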

There is another issue to be addressed. Since the successive states of a Markov chain are not independent, an unbiased sample can only be obtained if the chain reaches stationarity, the stage when the probability of sampling a particular configuration is fixed in time. The mixing time of a Markov chain is the number of steps required to reach that stationary state to within a prescribed tolerance $ϵ$. Knowing the mixing time of a particular Markov chain is important both to avoid obtaining a biased sample (if stationarity is not reached) and to avoid the computational cost of running the chain longer than necessary. The present paper solves this problem by showing that the Markov chain proposed in [1,2,13] mixes fast. That is, the stationary state is reached after a number of steps which is polynomial in $mn$, the size of the contingency table having fixed row and column sums $r_i$ and $c_j$, respectively. For the closely related problem of sampling $(0, 1)$ matrices, some partial results are available. Indeed, while the mixing time for the general problem is still open, fast mixing has been proven by Erdős et al. [14] for semi-regular degree sequences only.

2 Basics on Markov chains

Let $M$ be a Markov chain on a set of states $Ω$. Let $P$ be the matrix of transitions from one state to another. One may visualize the Markov chain $M$ as a weighted directed graph $G$, where the states are the vertices of $G$ and there is an edge of weight $P(x, y)$ if the transition from state $x$ to state $y$ has probability $P(x, y)$. A Markov chain is irreducible if for all pairs of states $x$ and $y$, there is an integer $t$, depending on the pair $(x, y)$, such that $P^t(x, y) > 0$. In terms of the digraph $G$, the Markov chain is irreducible if there is a directed path between every pair of vertices of $G$. A Markov chain is aperiodic if for all states $x$, there is an integer $t$ such that for all $t' \geq t$, $P^{t'}(x, x) > 0$. That is, after a sufficient number of iterations, the chain has a positive probability of being at $x$ at every subsequent step. This ensures that the return to state $x$ is not periodic. In terms of the graph $G$, this can be achieved by having a loop at every vertex. A Markov chain is ergodic if it is irreducible and aperiodic. If $P$ is the matrix of transitions of an ergodic Markov chain, it can be shown that there is an integer $t$ such that for all pairs of states $(x, y)$, $P^t(x, y) > 0$. (Notice that $t$ does not depend on the pair.) Also, it can be proven that for every ergodic Markov chain with transition matrix $P$, the largest eigenvalue of $P$ occurs only once and is equal to 1 (Perron–Frobenius Theorem). Using this, it can be proved that there is a unique probability vector $π$ such that $π P = π$. The vector $π$ is the stationary distribution of $M$. A chain is reversible if $P(x, y) π(x) = P(y, x) π(y)$ for every pair of states $x$ and $y$. If a chain is reversible and the matrix $P$ is symmetric, then $π(x) = π(y)$ for every pair $x$ and $y$. The stationary distribution is then said to be uniform. We say a matrix $P$ is irreducible, ergodic, aperiodic, etc. if the Markov chain ruled by $P$ is irreducible, ergodic, aperiodic, etc.

Let $M$ be an ergodic Markov chain defined on a finite set of states $Ω$, with a transition matrix $P$ and stationary distribution $π$. Starting the chain from an initial state $x$, we would like to measure $Δ_x(t)$, the distance between the distribution at time $t$ and the stationary distribution. More formally, if $P^t(y \mid x)$ represents the probability that, at time $t$, the chain is at state $y$ given initial state $x$, and $π(y)$ represents the probability that the chain is at state $y$ at stationarity, then the variation distance, denoted by $Δ_x(t)$, is defined as $Δ_x(t) = \frac{1}{2} \sum_{y \in Ω} | P^t(y \mid x) - π(y) |.$

A converging chain is one such that $Δ_x(t) \to 0$ as $t \to \infty$ for all initial states $x$. The rate of convergence is measured by $τ_x(ϵ)$, the time required to reduce the variation distance to $ϵ$ given an initial state $x$: $τ_x(ϵ) = \min \{ t : Δ_x(t') \leq ϵ \text{ for all } t' \geq t \}.$

The mixing time of the chain, denoted by $τ(ϵ)$, is defined as $τ(ϵ) = \max_{x \in Ω} τ_x(ϵ)$, the maximum being over all the initial states $x$. A Markov chain is said to be rapidly mixing if its mixing time is bounded above by a polynomial in the size of the input and $\frac{1}{ϵ}$.
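For a chain small enough to store its transition matrix, the definitions of $Δ_x(t)$, $τ_x(ϵ)$ and $τ(ϵ)$ can be evaluated directly. The sketch below is an illustration only: the function names and the two-state example chain are assumptions introduced here. It relies on the standard fact that the variation distance to stationarity never increases with $t$, so the first time $Δ_x(t) \leq ϵ$ is also the time after which it stays below $ϵ$.

```python
def step(dist, P):
    """Advance a row distribution by one step: dist -> dist P."""
    size = len(P)
    return [sum(dist[x] * P[x][y] for x in range(size)) for y in range(size)]

def variation_distance(dist, pi):
    """Half the L1 distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(dist, pi))

def mixing_time_from(x, P, pi, eps, t_max=100_000):
    """Smallest t with Delta_x(t) <= eps, found by direct iteration."""
    dist = [1.0 if y == x else 0.0 for y in range(len(P))]
    for t in range(t_max + 1):
        if variation_distance(dist, pi) <= eps:
            return t
        dist = step(dist, P)
    raise RuntimeError("variation distance did not reach eps within t_max steps")

def mixing_time(P, pi, eps):
    """tau(eps): the worst case of tau_x(eps) over all initial states x."""
    return max(mixing_time_from(x, P, pi, eps) for x in range(len(P)))

# Hypothetical example: a symmetric two-state chain with uniform pi.
P = [[0.75, 0.25], [0.25, 0.75]]
pi = [0.5, 0.5]
tau = mixing_time(P, pi, 0.1)   # here Delta_x(t) = 0.5 ** (t + 1)
```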

Let $1 = λ_1 \geq λ_2 \geq \dots \geq λ_n$ be the eigenvalues of the transition matrix $P$. The spectral gap of the matrix $P$ is defined as $\min \{ 1 - λ_2, 1 - |λ_n| \}$. If $P(x, x) \geq \frac{1}{2}$ for all $x$, then $λ_n \geq 0$, and therefore $|λ_n| \leq λ_2$. The spectral gap of $P$ is then the real number $λ_1 - λ_2$, that is, $1 - λ_2$. It can be shown that the larger the gap, the faster the chain mixes. The analysis of the mixing time of a Markov chain is based on the intuition that a random walk on the graph $G$ mixes fast (i.e., reaches all the states quickly) if $G$ has no bottleneck. That is, there is no cut between a set of vertices $S$ and its complement which blocks the flow of the Markov chain and thus prevents the chain from easily reaching some states. See [10,12,15–17] for a more complete exposition of the topic. To make this more formal, we need some preliminary definitions, which conform to [12].

Denoting the probability of $x$ at stationarity by $π(x)$ and the probability of moving from $x$ to $y$ by $P(x, y)$, the capacity of the arc $e = (x, y)$, denoted by $c(e)$, is given by $c(e) = π(x) P(x, y).$

Let $\mathcal{P}_{x, y}$ denote the set of all simple paths $p$ from $x$ to $y$ (paths that contain every vertex at most once). A flow in $G$ is a function $ϕ$, from the set of simple paths to the reals, such that $\sum_{p \in \mathcal{P}_{x, y}} ϕ(p) = π(x) π(y)$ for all vertices $x, y$ of $G$ with $x \neq y$. The arc-flow along an arc $e$, denoted by $ϕ'(e)$, is then defined as $ϕ'(e) = \sum_{p \ni e} ϕ(p).$

For a flow $ϕ$, a measure of the overload along an arc is given by the quantity $ρ(e)$, where $ρ(e) = \frac{ϕ'(e)}{c(e)},$ and the cost of the flow $ϕ$, denoted by $ρ(ϕ)$, is given by $ρ(ϕ) = \max_e ρ(e).$
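The definitions of capacity, arc-flow and cost can be checked on a toy chain. The sketch below assumes a flow that routes all $π(x) π(y)$ units for each ordered pair along a single path (a general flow may split them over several paths). The function name `flow_cost` and the three-state example are illustrative assumptions, not taken from [12].

```python
from fractions import Fraction

def flow_cost(P, pi, paths):
    """Cost rho(phi) of the flow that sends pi(x) * pi(y) units along the
    single path paths[(x, y)] for every ordered pair x != y."""
    load = {}                                   # arc-flow phi'(e) per arc
    for (x, y), path in paths.items():
        for arc in zip(path, path[1:]):
            load[arc] = load.get(arc, 0) + pi[x] * pi[y]
    # rho(e) = phi'(e) / c(e), with capacity c(e) = pi(u) * P(u, v)
    return max(flow / (pi[u] * P[u][v]) for (u, v), flow in load.items())

# Hypothetical example: lazy walk on the path 0 - 1 - 2, uniform pi.
q = Fraction(1, 4)
P = [[1 - q, q, 0],
     [q, 1 - 2 * q, q],
     [0, q, 1 - q]]
pi = [Fraction(1, 3)] * 3
paths = {
    (0, 1): [0, 1], (1, 0): [1, 0],
    (1, 2): [1, 2], (2, 1): [2, 1],
    (0, 2): [0, 1, 2], (2, 0): [2, 1, 0],
}
cost = flow_cost(P, pi, paths)
```

In this example every arc carries two paths, so $ϕ'(e) = 2/9$, while $c(e) = 1/12$, giving $ρ(ϕ) = 8/3$.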

If a network $G$ representing a Markov chain can support a flow of low cost, then it cannot have any bottlenecks, and hence its mixing time should be small. This intuition is confirmed by the following theorem [12].

Theorem 1

[12]

Let $M$ be an ergodic reversible Markov chain with holding probabilities $P(x, x) \geq \frac{1}{2}$ at all states $x$. The mixing time of $M$ satisfies $τ_x(ϵ) \leq ρ(ϕ) \, |p| \left( \ln \frac{1}{π(x)} + \ln \frac{1}{ϵ} \right),$ where $|p|$ is the length of a longest path carrying non-zero flow in $ϕ$.

Thus, one way to prove that a Markov chain mixes fast is to produce a flow along some paths such that both the lengths of the paths and the maximal overload on the arcs are polynomial in the size of the problem.

3 Markov chains on the set of contingency tables

Diaconis et al. [1,2] define local perturbations that transform a contingency table into another with the same row and column sums. We now describe them in more detail while redefining some terms for completeness. Let $X$ be an $m \times n$ contingency table of row and column sums $r_i$ and $c_j$, respectively. Let $X_{ij}$ be the entry in the $i$th row and $j$th column. That is, $X_{ij}$ is the entry in the cell $(i, j)$. We totally order the cells of $X$ lexicographically. That is, $(i, j) > (k, l)$ if $i > k$, or if $i = k$ and $j > l$. A site, denoted by $[i, i', j, j']$, is the set of four cells $\{ (i, j), (i', j), (i, j'), (i', j') \}$, with $i \neq i'$ and $j \neq j'$. A flip on $[i, i', j, j']$ is a transformation that changes the contingency table $X$ into $X'$ by the operations $X'_{i, j} = X_{i, j} + 1$, $X'_{i', j} = X_{i', j} - 1$, $X'_{i', j'} = X_{i', j'} + 1$, $X'_{i, j'} = X_{i, j'} - 1,$ while preserving all the other entries. See Fig. 1 for an illustration.

Fig. 1 A flip on the site $[i, i^{'}, j, j^{'}]$ .

The following result, given in [1,2], shows that the Markov chain on $Ω$ is irreducible.

Theorem 2

[1,2]

Given two contingency tables $X$ and $Y$ in $Ω$ , there is a sequence of flips which transforms $X$ to $Y$ .

Let $G$ denote the graph whose vertices are all $m \times n$ contingency tables of row and column sums $r_i$ and $c_j$, respectively, and in which there is an edge from a vertex $X$ to $Y$ if and only if there is a site $[i, i', j, j']$ such that a flip on $[i, i', j, j']$ changes $X$ to $Y$. If $X$ is any vertex of $G$, it is routine to check that $X$ has at most $(mn)^2$ adjacent neighbors. Indeed, the vertex $Y$ is an adjacent neighbor of $X$ if there is a flip changing $X$ to $Y$. Now, given a cell $(i, j)$ of the contingency table $X$, there are at most $(m - 1)(n - 1)$ sites involving $(i, j)$. Considering all $mn$ cells of the array $X$ gives a total of at most $(mn)^2$ possible sites.
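For small margins, the graph $G$ can be built explicitly, which gives a direct check of the connectivity asserted in Theorem 2 and of the $(mn)^2$ bound on the number of neighbors. The sketch below is a brute-force illustration. The helper names (`bounded_compositions`, `enumerate_tables`, `neighbours`, `is_connected`) and the example margins are assumptions introduced here.

```python
from itertools import product

def bounded_compositions(total, caps):
    """Tuples v with sum(v) == total and 0 <= v[k] <= caps[k]."""
    if not caps:
        if total == 0:
            yield ()
        return
    for first in range(min(total, caps[0]) + 1):
        for rest in bounded_compositions(total - first, caps[1:]):
            yield (first,) + rest

def enumerate_tables(rows, cols):
    """All non-negative integer tables with the given row and column sums."""
    tables = []

    def fill(i, partial, remaining):
        if i == len(rows):
            if all(c == 0 for c in remaining):
                tables.append(tuple(partial))
            return
        for row in bounded_compositions(rows[i], remaining):
            fill(i + 1, partial + [row],
                 [c - v for c, v in zip(remaining, row)])

    fill(0, [], list(cols))
    return tables

def neighbours(table):
    """Tables reachable from `table` by a single flip on some site."""
    m, n = len(table), len(table[0])
    result = set()
    for (i, i2), (j, j2) in product(product(range(m), repeat=2),
                                    product(range(n), repeat=2)):
        if i == i2 or j == j2:
            continue
        if table[i2][j] > 0 and table[i][j2] > 0:   # flip stays non-negative
            t = [list(r) for r in table]
            t[i][j] += 1; t[i2][j] -= 1; t[i][j2] -= 1; t[i2][j2] += 1
            result.add(tuple(map(tuple, t)))
    return result

def is_connected(tables):
    """Depth-first search over flips, starting from the first table."""
    seen, stack = {tables[0]}, [tables[0]]
    while stack:
        for nb in neighbours(stack.pop()):
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) == len(tables)

# Hypothetical example: 2 x 3 tables with row sums (2, 2), column sums (1, 1, 2).
tables = enumerate_tables((2, 2), (1, 1, 2))
```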

For the sake of convenience, in what follows, we use $X$ and $x$ interchangeably. We write $X$ when we need to stress that we are talking about a contingency table and we write $x$ when we need to stress that we are talking about the vertex that represents $X$ in the graph $G$. Now, we formally define a random walk on $G$ as follows. Start at any vertex $z$. If the walk is at $x$, choose at random an adjacent neighbor $y$ of $x$, and move from $x$ to $y$ with probability $P_{x, y} = \frac{1}{2 (mn)^2}$, or stay at $x$ otherwise. If $P$ denotes the transition matrix and $P(x)$ is the row of $P$ corresponding to the state $x$, it is routine to check that the entries of $P(x)$ add up to 1. That is, $P(x)$ is a probability vector. Moreover, by Theorem 2, the matrix $P$ is irreducible. Also, since there is a loop at every vertex $x$, the matrix $P$ is aperiodic. Hence the random walk is an ergodic Markov chain. Finally, $P$ is a symmetric matrix. Thus, the Markov chain has a uniform stationary distribution $π$. We call this Markov chain the Diaconis Markov chain.
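The properties claimed for $P$ (rows summing to 1, symmetry, and holding probability at least $\frac{1}{2}$) can be verified exactly on a tiny instance. The sketch below assumes that the neighbor structure is supplied as a dictionary. The function name `lazy_transition_matrix` and the hard-coded $2 \times 2$ example with all margins equal to 2 are illustrative assumptions.

```python
from fractions import Fraction

def lazy_transition_matrix(adjacency, m, n):
    """P(x, y) = 1 / (2 (mn)^2) for each neighbour y of x; the remaining
    probability mass is the holding probability P(x, x)."""
    states = sorted(adjacency)
    move = Fraction(1, 2 * (m * n) ** 2)
    P = {x: {y: Fraction(0) for y in states} for x in states}
    for x in states:
        for y in adjacency[x]:
            P[x][y] = move
        P[x][x] = 1 - move * len(adjacency[x])
    return P

# The three 2 x 2 tables with all row and column sums equal to 2.
A = ((0, 2), (2, 0))
B = ((1, 1), (1, 1))
C = ((2, 0), (0, 2))
adjacency = {A: {B}, B: {A, C}, C: {B}}   # one flip apart
P = lazy_transition_matrix(adjacency, 2, 2)
```

Here $mn = 4$, so each move has probability $\frac{1}{32}$ and the middle table $B$ holds with probability $\frac{15}{16}$.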

Theorem 3

If $Ω$ is the set of $m \times n$ contingency tables, the Diaconis Markov chain on $Ω$ mixes fast and the mixing time $τ_x(ϵ)$ satisfies $τ_x(ϵ) \leq 2 c_{\max} (mn)^4 \ln(c_{\max} ϵ^{-1}),$ where $c_{\max} - 1$ is the maximal value in a cell.

We prove this result by showing that, between any two vertices of $G$ , there is a path, the canonical path, that is not ‘too long’ and where no edge is overloaded. To put this in a more formal setting, we need the following definitions and lemmas.

Let $X$ and $Y$ be two different $m \times n$ contingency tables. If $X_{ij} = Y_{ij}$, we say that the cell $(i, j)$ is matched, and unmatched otherwise. Accordingly, to match a cell $(i, j)$ means to make $X_{ij}$ equal to $Y_{ij}$. A path from $X$ to $Y$ is a sequence of contingency tables $(X = X^{(0)}, X^{(1)}, X^{(2)}, \dots, X^{(t)}, \dots, Y)$, where each contingency table is obtained from the preceding one by a single flip on some site. The canonical path from $X$ to $Y$ is a path that matches the cells in increasing lexicographic order. Since there are many such paths from $X$ to $Y$, we take the path that uses the flips with the least indices, lexicographically. This makes the canonical path unique. Suppose that at time $t$, the canonical path has reached the contingency table $X^{(t)}$, corresponding to matching the cell $(i, j)$, and $X_{ij}^{(t)} \neq Y_{ij}$. The matched zone is the set of cells $(k, l)$ such that $(k, l) < (i, j)$ (so that $X_{kl}^{(t)} = Y_{kl}$ for $(k, l) < (i, j)$). The cell $(i, j)$ is then called the smallest unmatched cell. A key observation, given in the lemma that follows, is that once a cell is matched, the canonical path never unmatches it. This entails that it is possible to match all the $mn$ cells in lexicographic order, without backtracking.

Lemma 4

Let $(i, j)$ be the smallest unmatched cell at time $t$ along the canonical path from $X$ to $Y$ . There is a sequence of flips where all the other cells involved in matching $(i, j)$ are greater than $(i, j)$ .

Proof

Let $(i, j)$ be the least unmatched cell at time $t$, and suppose that $X_{ij}^{(t)} = a$ and $Y_{ij} = a'$, with $|a' - a| = r$, as illustrated in Fig. 2.

If there is no row $i^{'}$ with $i^{'} > i$ (if $i$ is the last row), then $X^{(t)}$ and $Y$ have different column sums on the column $j$ , which is a contradiction. A similar argument holds if there is no column $j^{'}$ such that $j^{'} > j$ . Therefore there is a row $i^{'}$ and column $j^{'}$ such that $i^{'} > i$ and $j^{'} > j$ . Thus making $r$ successive flips on $[i, i^{'}, j, j^{'}]$ matches the cell $(i, j)$ . We only have to care about the case where some entries on the site $[i, i^{'}, j, j^{'}]$ become negative before performing all the $r$ flips.

Case $a < a'$. In this case, either $(i', j)$ or $(i, j')$, or both, may become negative. Suppose that the entry in the cell $(i', j)$ becomes negative at the $s$th flip, with $s < r$. Then there must be other rows $k$, with $k > i$, whose entries in column $j$ together contribute at least $r - s + 1$, such that successive flips on the sites $[i, k, j, j']$ can complete the matching. If there is no such row, then $X_{kj}^{(t + s - 1)} = 0$ for all $k > i$. But since all the cells $(l, j)$ with $l < i$ are matched, the sum of column $j$ in $X^{(t + s - 1)}$ is less than $c_j$, the sum of column $j$ in $Y$, a contradiction. A similar argument holds if the entry in the cell $(i, j')$ becomes negative.

Case $a > a'$. In this case, $(i', j')$ may become negative. Suppose that the entry in the cell $(i', j')$ becomes negative at the $s$th flip, with $s < r$. Then there are rows $k$ and columns $l$, with $k > i$ and $l > j$, whose entries together contribute at least $r - s + 1$, such that successive flips on the sites $[i, k, j, l]$ complete the matching of the cell $(i, j)$. Suppose there is no such row $k$, or column $l$, or both, because all the relevant entries are zero. Then either the row sum $r_i$ in $X^{(t + s - 1)}$ is less than $r_i$ in $Y$, or the column sum $c_j$ in $X^{(t + s - 1)}$ is less than $c_j$ in $Y$, or both. Again a contradiction.

Fig. 2 The hatched zone is the matched zone at time $t$ , where we assume that $i^{'} > i$ and $j^{'} > j$ . The picture also assumes that there may be some other contingency tables between $X$ and $X^{(t)}$ , and between $X^{(t)}$ and $Y$ .

Corollary 5

For every pair of contingency tables $X$ and $Y$ , there is a sequence of flips changing $X$ to $Y$ such that the points in successive matched zones are not affected.
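The matching procedure behind Lemma 4 and Corollary 5 can be sketched as a greedy routine. The tie-breaking rule used below (take the first site $[i, i', j, j']$ with $i' > i$ and $j' > j$, in lexicographic order of $(i', j')$, on which the required flip keeps all entries non-negative) is an assumption made for illustration and may differ from the exact canonical path of the paper. The function name `matching_path` is likewise hypothetical.

```python
def matching_path(X, Y):
    """Sequence of tables from X to Y that matches the cells in
    lexicographic order and never touches an already matched cell.
    X and Y must have the same row and column sums."""
    m, n = len(X), len(X[0])
    cur = [list(row) for row in X]
    path = [tuple(map(tuple, cur))]
    for i in range(m):
        for j in range(n):
            while cur[i][j] != Y[i][j]:
                # sign = +1 raises cell (i, j); sign = -1 lowers it.
                sign = 1 if cur[i][j] < Y[i][j] else -1
                i2, j2 = next(
                    (a, b)
                    for a in range(i + 1, m)
                    for b in range(j + 1, n)
                    if (sign == 1 and cur[a][j] > 0 and cur[i][b] > 0)
                    or (sign == -1 and cur[a][b] > 0)
                )
                cur[i][j] += sign
                cur[i2][j2] += sign
                cur[i2][j] -= sign
                cur[i][j2] -= sign
                path.append(tuple(map(tuple, cur)))
    return path

# Hypothetical example with all margins equal to 3.
X = ((3, 0, 0), (0, 3, 0), (0, 0, 3))
Y = ((0, 3, 0), (0, 0, 3), (3, 0, 0))
path = matching_path(X, Y)
```

In this example the cell $(1, 1)$ is matched by three flips and the first cell of the second row by three more, so the path uses six flips in total.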

Corollary 6

For all contingency tables $X$ and $Y$, the length of the canonical path from $X$ to $Y$ is at most $(c_{\max} - 1) mn$, where $c_{\max} - 1$ is the maximal value in a cell.

Proof

Matching the cell $(i, j)$ requires at most $c_{\max} - 1$ successive flips. Since there are at most $mn$ cells to match and there is no backtracking by Corollary 5, the result follows.

A second key observation, given in Lemma 9, is that the number of canonical paths which pass through an edge $e$ is not too great. We first need the following definition and a lemma of far reaching importance.

Let $Ω$ denote the set of $m \times n$ contingency tables, where $r_i$ and $c_j$ are the sums of row $i$ and column $j$, respectively, for $1 \leq i \leq m$, $1 \leq j \leq n$, and let $X \in Ω$. A margin is either a column sum or a row sum. Let $X_j$ denote the $j$th column of $X$. Since the column $X_j$ contains $m$ entries and the sum of these $m$ entries is $c_j$, the number of possible columns is the number of non-negative integer solutions of the equation $x_1 + x_2 + \dots + x_m = c_j$, which is equal to $\binom{c_j + m - 1}{m - 1}$. We denote by $X_{j_l}$ the $l$th non-negative integer solution of the equation $x_1 + x_2 + \dots + x_m = c_j$. (Not to be confused with $X_{j, l}$, the $(j, l)$th entry of the contingency table $X$.) Let $Y^{(l)}$ be a contingency table whose $j$th column is the solution $X_{j_l}$. (By Theorem 2, starting from $X$, it is possible to reach any $Y^{(l)}$. We assume that $X = Y^{(1)}$.) We denote by $Y^{(l)} \setminus X_{j_l}$ the contingency table obtained by deleting column $X_{j_l}$ from $Y^{(l)}$. Obviously, $Y^{(l)} \setminus X_{j_l}$ is an $m \times (n - 1)$ contingency table whose row sums are $r_i - a_i$, where $a_i$ is the $i$th entry of column $j$ of $Y^{(l)}$. We now state and prove the following instrumental lemma.

Lemma 7

Let $Ω$ denote the set of $m \times n$ contingency tables where $r_i$ and $c_j$ are the sums of row $i$ and column $j$, respectively, for $1 \leq i \leq m$, $1 \leq j \leq n$. Let $N$ be the cardinality of $Ω$. Choose any contingency table $X \in Ω$ and let $X_j$ be the column or row of $X$ with the smallest margin amongst all the columns and rows of $X$. Then $N = N_1' + N_2' + \dots + N_σ', \qquad (1)$ where $N_l'$ is the number of $m \times (n - 1)$ contingency tables that can be reached from $Y^{(l)} \setminus X_{j_l}$ through a sequence of flips, $σ = \binom{c_j + m - 1}{m - 1}$ is the number of non-negative integer solutions of the equation $x_1 + x_2 + \dots + x_m = c_j$, and $c_j$ is the margin of $X_j$.

Proof

(We insist that $X_j$ may be a row or a column, as long as it has the least margin. But even when it is a row, we still consider it as a column, by just transposing the table.) Consider any contingency table $Y^{(l)} \in Ω$ and consider its column $j$. The set $Ω$ is the disjoint union of $A$ and $B$, where $A$ is the set of contingency tables obtained from $Y^{(l)}$ through a sequence of flips on sites not containing entries of column $j$, and $B$ is the set of contingency tables obtained from $Y^{(l)}$ through a sequence of flips where at least one site containing entries of column $j$ is involved. The set $A$ is isomorphic to the set of $m \times (n - 1)$ contingency tables that can be reached from $Y^{(l)} \setminus X_{j_l}$ after a sequence of flips. Thus, $|A| = N_l'$.

Elements of the set $B$ can be obtained from $Y^{(l)}$ by performing a single flip on a site containing entries of column $j$ of $Y^{(l)}$ to obtain a contingency table $Y^{(l + 1)}$ and then, from $Y^{(l + 1)}$, performing all the possible flips on sites not containing entries of column $j$ of $Y^{(l + 1)}$. As above, the resulting set is isomorphic to the set of $m \times (n - 1)$ contingency tables reachable from $Y^{(l + 1)} \setminus X_{j_{l + 1}}$. There are $N_{l + 1}'$ such contingency tables. Recursively, one may thus obtain all of $N_1', N_2', \dots, N_σ'$, starting from $l = 1$, where $σ = \binom{c_j + m - 1}{m - 1}$ is the number of non-negative integer solutions of the equation $x_1 + x_2 + \dots + x_m = c_j$.

It only remains to show that each of these solutions yields a valid contingency table in $Ω$. For a contradiction, let $Z_j$ be a non-negative integer solution of the equation $x_1 + x_2 + \dots + x_m = c_j$ such that there is no contingency table $Z$ having $Z_j$ as its $j$th column. Then there is no contingency table in $Ω$ whose $j$th column can be transformed into $Z_j$ through a sequence of flips. Let $a_r$ and $a_s$ be two entries in $Z_j$. We may assume that $a_s$ is in the $s$th position (row) and $a_r$ is in the $r$th position. There is no contingency table $Z$ in $Ω$ such that $Z_j$ is its $j$th column only if there is no contingency table $Z^{(1)}$ in $Ω$ whose $(r, j)$th entry is $a_r + 1$ and whose $(s, j)$th entry is $a_s - 1$, and such that a flip on the site $[r, s, j, j']$ of $Z^{(1)}$ produces $Z$. This is possible only if, in $Z^{(1)}$, all the entries in row $s$ (apart from $a_s$) are zeros (so that none can be decreased by 1). Thus either (1) $Z_{s, j}^{(1)} = r_s$, where $r_s$ is the sum of row $s$, or (2) there is another entry that is not zero in row $s$.

(1) $Z_{s, j}^{(1)} = r_s$. Either $r_s > c_j$ or not. If $r_s > c_j$, there will be a negative entry in $Z^{(1)}$, which then is not a contingency table. To avoid this, we chose $X_j$ to be the column or row of the least sum amongst all the column and row sums. If not, then

(2) Suppose there is another entry that is not zero in row $s$. Let this non-zero entry be in column $j'$. Then a flip on the site $[r, s, j, j']$ may not be possible only if the $(r, j')$ entry of $Z^{(1)}$ is equal to the sum of row $r$, so that it cannot be increased by 1. Thus $Z_{r, j}^{(1)} = 0$.

Therefore, for all contingency tables $Z^{(1)}$ in $Ω$ whose $(r, j)$th entry is $a_r + 1$ and whose $(s, j)$th entry is $a_s - 1$, and all of whose other entries in column $j$ are equal to the corresponding entries of $Z_j$, we have either $Z_{r, j}^{(1)} = 0$ or $Z_{s, j}^{(1)} = r_s$. Thus $a_s = r_s + 1$ or $a_r = 0 - 1$. In either case, $Z_j$ is not a non-negative integer solution of the equation $x_1 + x_2 + \dots + x_m = c_j$. This is a contradiction.
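The decomposition (1) can be checked numerically on small margins by peeling off one column at a time. The sketch below counts tables by brute force. It assumes, in line with Theorem 2, that the tables reachable by flips are exactly the tables with the reduced margins. The function names `compositions` and `count_tables` and the example margins are illustrative assumptions.

```python
from math import comb

def compositions(total, parts):
    """All tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def count_tables(rows, cols):
    """Number of non-negative integer tables with the given margins,
    obtained by removing the first column in every possible way."""
    if sum(rows) != sum(cols) or min(rows) < 0:
        return 0
    if len(cols) == 1:
        return 1          # the single column must equal the row sums
    return sum(
        count_tables(tuple(r - a for r, a in zip(rows, column)), cols[1:])
        for column in compositions(cols[0], len(rows))
    )

# Hypothetical example: row sums (2, 2), column sums (1, 1, 2).
rows, cols = (2, 2), (1, 1, 2)
N = count_tables(rows, cols)
# One term N'_l per solution of x_1 + x_2 = c_1, with c_1 = 1 the least margin.
terms = [
    count_tables(tuple(r - a for r, a in zip(rows, column)), cols[1:])
    for column in compositions(cols[0], len(rows))
]
sigma = comb(cols[0] + len(rows) - 1, len(rows) - 1)
```

For these margins there are $σ = 2$ possible first columns, each contributing two tables, so $N = 2 + 2 = 4$.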

Corollary 8

Let $N$ be the number of all $m \times n$ contingency tables of fixed row and column sums. The number of contingency tables having $k$ fixed cells (in lexicographic ordering) is at most $N^{\frac{m n - k}{m n}}$ .

Proof

We prove the result by induction on $k$ , the number of cells that are fixed in the lexicographical ordering. Indeed, if $k = 0$ , the number of such contingency tables is $N$ . In case $k = m n$ , there is only a single contingency table. Hence the formula holds for these two extreme cases.

For the induction, assume that the result holds for $k$. Let $k + 1$ cells be fixed and let the $(k + 1)$th cell be in column $j$. Let $Ω$ denote the set of contingency tables having $k + 1$ cells fixed (in lexicographic ordering). We have to show that $|Ω| \leq N^{\frac{mn - k - 1}{mn}}$.

By Lemma 7, and by induction, we have that $|Ω| \leq {N_1'}^{\frac{m(n - 1) - k}{m(n - 1)}} + {N_2'}^{\frac{m(n - 1) - k}{m(n - 1)}} + \dots + {N_σ'}^{\frac{m(n - 1) - k}{m(n - 1)}} \leq (N_1' + N_2' + \dots + N_σ')^{\frac{m(n - 1) - k}{m(n - 1)}} \leq N^{\frac{m(n - 1) - k}{m(n - 1)}} \leq N^{\frac{mn - k - 1}{mn}}.$ □

Lemma 9

If $e = (z, y)$ is an arc of $G$ , then the number of different canonical paths passing through $e$ is at most $N$ , where $N$ is the number of vertices of $G$ .

Proof

Let $σ = (v, \dots, z, y, \dots, w)$ be a canonical path passing through the edge $e = (z, y)$, as illustrated in Fig. 3. In the graph $G$, the edge $e$ represents a flip on a fixed site of the contingency table $z$, such that making the flip $e$ moves the chain from $z$ to $y$. Suppose that $z$ is reached at the $k$th flip along the canonical path $σ$, and let $\bar{Z}$ be the matched zone at $z$. That is, $\bar{Z}$ is the set containing the first $k$ cells in the lexicographic ordering, i.e., $\bar{Z} = (1, 2, \dots, k)$. Now, by Lemma 4, the matched zone is preserved along the canonical path. Thus $w$ is the end point of a canonical path passing through $e$ only if $w$ agrees with $z$ on $\bar{Z}$. Hence the possible end points $w$ are contingency tables where the sequence of cells $(1, 2, \dots, k)$ is fixed, and the sequence of cells $(k + 1, k + 2, \dots, mn)$ may take any entries, subject to being a contingency table of the correct size. The number of such vertices (contingency tables) is thus at most $N^{\frac{mn - k}{mn}}$, by Corollary 8.

Moreover, let ${\bar{Z}}^{c}$ denote the complement of $\bar{Z}$ . That is ${\bar{Z}}^{c} = (k + 1, k + 2, \dots, m n)$ . The contingency table $v$ is the start point of the canonical path $σ$ only if $v$ is such that matching the $k$ first cells simultaneously turns the sequence of entries in $(k + 1, k + 2, \dots, m n)$ equal to the corresponding entries in $z$ . Hence, since the canonical path is unique, the sequence of entries in $(1, 2, \dots, k)$ uniquely determines the sequence of entries in $(k + 1, k + 2, \dots, m n)$ . Therefore $v$ are contingency tables where the sequence of cells $(k + 1, k + 2, \dots, m n)$ is uniquely determined by the sequence of cells $(1, 2, \dots, k)$ . The latter may take any entry, subject to being a contingency table of the right size. The number of such vertices (contingency tables) is thus at most $N^{\frac{k}{m n}}$ .

Hence, the number of canonical paths using $e$ is at most $N^{\frac{k}{m n}} N^{\frac{m n - k}{m n}} = N$ .

Fig. 3 The set of canonical paths passing through the edge $e$ . Notice that $e$ represents the flip that changes the contingency table $Z$ to the contingency table $Y$ .

Now, we are able to prove Theorem 3.

Proof of Theorem 3

In order to prove Theorem 3 by using Theorem 1, we show that there is a flow $ϕ$ such that $ρ(ϕ)$ is a polynomial in $mn$, the size of the contingency tables in $Ω$. Indeed, if $x$ and $y$ are two vertices of $G$, let $\hat{p}_{xy}$ denote the canonical path from $x$ to $y$ and let $ϕ$ be the flow that injects $π(x) π(y)$ units of flow along $\hat{p}_{xy}$. Then, for all arcs $e$, we have $ϕ'(e) = \sum π(x) π(y),$ where the sum is over all the pairs $\{x, y\}$ such that $e \in \hat{p}_{xy}$. Since by Lemma 9 there are at most $N$ canonical paths through $e$, and $π(x) = π(y) = \frac{1}{N}$, as the distribution $π$ is uniform, we have $ϕ'(e) \leq N π(x) π(y) = \frac{1}{N}.$

Moreover, since $P_{x, y} = \frac{1}{2 (mn)^2}$ if there is an edge from $x$ to $y$, the capacity $c ≔ c(e) = π(x) P_{x, y} = \frac{1}{2 N m^2 n^2}$ is constant for all $e$. Thus $ρ(ϕ) = \frac{\max_e ϕ'(e)}{c} \leq \frac{1}{N} \cdot 2 N m^2 n^2 = 2 m^2 n^2$.

Now, using Theorem 1 and Corollary 6, we have $τ_x(ϵ) \leq (2 m^2 n^2) (c_{\max} mn) \left( \ln \frac{1}{π(x)} + \ln \frac{1}{ϵ} \right).$

Since an entry may take any value from $0$ to $c_{\max} - 1$, we have $N \leq (c_{\max})^{mn}$, and hence $\ln \frac{1}{π(x)} = \ln N \leq mn \ln c_{\max}$. Thus, $τ_x(ϵ) \leq 2 c_{\max} (mn)^3 \left( mn \ln c_{\max} + \ln \frac{1}{ϵ} \right) \leq 2 c_{\max} (mn)^4 \ln(c_{\max} ϵ^{-1}).$ □
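To get a feeling for the size of the bound, both expressions in the last display can be evaluated numerically. The sketch below is only an illustration. The function names and the sample parameters are assumptions, and the result is an upper bound on the number of steps, not an estimate of the actual mixing time.

```python
import math

def intermediate_bound(m, n, c_max, eps):
    """2 c_max (mn)^3 (mn ln c_max + ln(1/eps))."""
    mn = m * n
    return 2 * c_max * mn ** 3 * (mn * math.log(c_max) + math.log(1 / eps))

def mixing_time_bound(m, n, c_max, eps):
    """2 c_max (mn)^4 ln(c_max / eps), the bound stated in Theorem 3."""
    mn = m * n
    return 2 * c_max * mn ** 4 * math.log(c_max / eps)

# Hypothetical parameters: 2 x 2 tables, entries at most 2, eps = 0.01.
loose = mixing_time_bound(2, 2, 3, 0.01)
tight = intermediate_bound(2, 2, 3, 0.01)
```

The second inequality holds because $mn \ln c_{\max} + \ln \frac{1}{ϵ} \leq mn \left( \ln c_{\max} + \ln \frac{1}{ϵ} \right)$ whenever $mn \geq 1$ and $0 < ϵ \leq 1$.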

Notes

Peer review under responsibility of Kalasalingam University.

References

[1] Diaconis P., Gangolli A., Rectangular arrays with fixed margins, in: Proceedings of the Workshop on Markov Chains, Institute of Mathematics and its Applications, University of Minnesota, 1994.
[2] Diaconis P., Efron B., Testing for independence in a two-way table, Ann. Statist. 13 (1985) 845–913.
[3] Gotelli N.J., Ellison A.M., A Primer of Ecological Statistics, second ed., Sinauer Associates, Inc., 2013.
[4] Pirzada S., An Introduction To Graph Theory, Universities Press, OrientBlackswan, Hyderabad, India, 2012.
[5] Brualdi R.A., Matrices of zeroes and ones with fixed row and column sum vectors, Linear Algebra Appl. 33 (1980) 159–231.
[6] Cobb G.W., Chen Y., An application of Markov Chains Monte Carlo to community ecology, Amer. Math. Monthly 110 (2003) 265–288.
[7] Cooper C., Dyer M., Greenhill C., Sampling regular graphs and Peer-to-Peer network, Combin. Probab. Comput. 16 (2007) 557–594.
[8] Erdős P., Gallai T.G., Graphs with prescribed degrees of vertices (Hungarian), Mat. Lapok 11 (1960) 264–274.
[9] Gail M., Mantel N., Counting the number of rxc contingency tables with fixed margins, J. Amer. Statist. Assoc. 72 (1977) 859–862.
[10] Jerrum M., Sinclair A., Approximating the permanent, SIAM J. Comput. 18 (6) (1989) 1149–1178.
[11] Kannan R., Tetali P., Vempala S., Simple Markov-chain algorithms for generating bipartite graphs and tournaments, Random Struct. Algorithms 14 (4) (1999) 293–308.
[12] Sinclair A., Improved bounds for mixing rates of Markov chains and multicommodity flow, Lect. Notes Comput. Sci. 583 (1992) 474–487.
[13] Kannan R., Markov Chains and polynomial time Algorithm, in: Plenary Talk At Proc. of 35th Annual IEEE Symp. on the Foundations of Computer Science, 1994, pp. 656–671.
[14] Erdős P.L., Miklós I., Toroczkai Z., A simple Havel-Hakimi type algorithm to realize graphical degree sequences of directed graphs, Electron. J. Combin. 17 (1) (2010) P66.
[15] Chen Y., Diaconis P., Holmes S., Liu J.S., Sequential Monte Carlo methods for statistical analysis of tables, J. Amer. Statist. Assoc. 100 (2005) 109–120.
[16] Chung F.R.K., Graham R.L., Yau S.T., On sampling with Markov chains, Random Struct. Algorithms 9 (1996) 55–77.
[17] Levin D.A., Peres Y., Wilmer E.L., Markov Chains and Mixing Times, Amer. Math. Soc., 2006.
