Abstract
This paper establishes a posterior convergence rate theorem for general Markov chains. Our approach is based on the Hausdorff α-entropy introduced by Xing (Electronic Journal of Statistics 2:848–62, 2008) and Xing and Ranneby (Journal of Statistical Planning and Inference 139 (7):2479–89, 2009). As an application we illustrate our results on a nonlinear autoregressive model.
1. Introduction
The aim of this paper is to study the asymptotic behavior of posterior distributions based on observations which arise from Markov chains. Let $X_0, X_1, \ldots, X_n, \ldots$ be a Markov chain with transition density $p_\theta(x, y)$ and initial density $q_\theta(x)$ with respect to some σ-finite measure μ on a measurable space $(\mathbb{X}, \mathcal{A})$. We assume that the function $x \mapsto q_\theta(x)$ and the two-variable function $(x, y) \mapsto p_\theta(x, y)$ are measurable for all parameters θ in the parameter set Θ. So the joint distribution $P_\theta^{(n)}$ of $(X_0, X_1, \ldots, X_n)$ has a density given by

$$p_\theta^{(n)}(x_0, x_1, \ldots, x_n) = q_\theta(x_0) \prod_{i=1}^{n} p_\theta(x_{i-1}, x_i)$$

relative to the product measure $\mu^{n+1}$, where the parameter θ does not depend on the size of the sample. Denote by θ0 the true parameter generating the observations $X_0, X_1, \ldots, X_n$.
Note that any semimetric $d$ on the product space of the initial densities and the transition densities naturally induces a semimetric on Θ when the mapping $\theta \mapsto (q_\theta, p_\theta)$ is one-to-one, which is assumed in this paper. Given a prior Π on Θ, the posterior distribution $\Pi_n(\cdot \mid X_0, X_1, \ldots, X_n)$ is a random probability measure given by

$$\Pi_n(B \mid X_0, X_1, \ldots, X_n) = \frac{\int_B \Lambda_n(\theta)\, \Pi(d\theta)}{\int_\Theta \Lambda_n(\theta)\, \Pi(d\theta)}$$

for each measurable subset B in Θ, where $\Lambda_n(\theta) = p_\theta^{(n)}(X_0, \ldots, X_n) / p_{\theta_0}^{(n)}(X_0, \ldots, X_n)$ stands for the likelihood ratio.
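This ratio of prior-weighted likelihoods can be sketched numerically. The following is a minimal illustration, not the paper's construction: a hypothetical two-state Markov chain whose transition matrix depends on a scalar parameter θ, a uniform prior on a grid, and the posterior obtained by normalizing likelihoods over the grid.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: a two-state chain on {0, 1} with transition matrix
#   P_theta = [[theta, 1 - theta], [1 - theta, theta]],  theta in (0, 1).
def transition_matrix(theta):
    return np.array([[theta, 1.0 - theta], [1.0 - theta, theta]])

def log_likelihood(theta, x):
    """Log-likelihood of a path x_0, ..., x_n (uniform initial density)."""
    P = transition_matrix(theta)
    ll = np.log(0.5)  # initial density q_theta, taken uniform here
    for prev, cur in zip(x[:-1], x[1:]):
        ll += np.log(P[prev, cur])
    return ll

# Simulate a path from the "true" parameter theta0.
theta0, n = 0.7, 500
x = [0]
for _ in range(n):
    x.append(x[-1] if rng.random() < theta0 else 1 - x[-1])
x = np.array(x)

# Posterior over a grid under a uniform prior: normalized likelihoods.
grid = np.linspace(0.05, 0.95, 181)
logL = np.array([log_likelihood(t, x) for t in grid])
post = np.exp(logL - logL.max())
post /= post.sum()

estimate = grid[np.argmax(post)]
print("posterior mode:", estimate)
```

As n grows the posterior mass concentrates around θ0; the theorems below quantify the rate of this concentration.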
Recall that the posterior distribution is said to be convergent almost surely at a rate at least $\varepsilon_n$ if there exists r > 0 such that $\Pi_n(\{\theta \in \Theta : d(\theta, \theta_0) \geq r \varepsilon_n\} \mid X_0, X_1, \ldots, X_n) \to 0$ almost surely as $n \to \infty$.
Posterior consistency is an important issue in Bayesian analysis. Much work has been concerned with the asymptotic behavior of posterior distributions for independent and identically distributed observations; see, for instance, Barron, Schervish, and Wasserman (1999), Shen and Wasserman (2001), Ghosal and van der Vaart (2007), Walker, Lijoi, and Prunster (2007), Walker (2003), Walker (2004), Xing and Ranneby (2009), Xing (2011a), and Xing (2011b). An old and well-known approach is based on the existence of uniformly consistent tests. In this paper we use an integration condition together with the Hausdorff α-entropy to study convergence rates of posteriors when the observations are not independent and identically distributed. The integration condition and the Hausdorff α-entropy have an advantage in applications because they are both prior-dependent. The Hausdorff α-entropy condition was introduced in Xing (2008) and Xing and Ranneby (2009), and it is weaker than the metric entropy condition. By means of the integration condition and the Hausdorff α-entropy, we establish a posterior convergence rate theorem for general Markov chains. As an application we discuss the posterior rate of convergence for the nonlinear autoregressive model.
The layout of this paper is as follows. In Sec. 2 we present a prior-dependent integration inequality and prove a general posterior convergence rate theorem for Markov chains. In Sec. 3 we illustrate our result by finding a posterior convergence rate for a nonlinear autoregressive model. The technical proofs are collected in the Appendix.
2. A convergence rate theorem for Markov chains
In this section we introduce a prior-dependent integration condition for the study of consistency of posterior distributions. Together with the Hausdorff α-entropy, the integration condition plays a central role in the study of Bayesian convergence rates.
Recall that the Hausdorff α-entropy for a subset
is the logarithm of the minimal sum of αth powers of prior masses of balls of d-radius
needed to cover
see Xing (2008) and Xing and Ranneby (2009) for the details of the Hausdorff α-entropy. For simplicity of notation, we define the Hausdorff α-constant
of any subset
of Θ. Observe that
depends on the prior Π. It was proved in Xing and Ranneby (2009) that the inequality
holds for any
where
denotes the minimal number of balls of d-radius
needed to cover
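This comparison between the two entropies can be seen in a toy computation. The following is a minimal sketch under assumptions not taken from the paper: the parameter set is the unit interval with the Euclidean metric and a uniform prior, so each covering ball of radius ε has prior mass at most 2ε, and the sum of αth powers of prior masses over a minimal cover is bounded by the covering number itself.

```python
import math

# Covering of [0, 1] by balls (intervals) of radius eps:
# N(eps, [0,1], |.|) = ceil(1 / (2 * eps)) balls suffice and are necessary.
def covering_number(eps):
    return math.ceil(1.0 / (2.0 * eps))

# Hausdorff-type sum for a uniform prior on [0, 1]: each covering ball has
# prior mass at most 2*eps, so the sum of alpha-th powers of the masses is
# at most N(eps) * (2*eps)**alpha, never larger than N(eps) itself.
def hausdorff_sum_bound(eps, alpha):
    return covering_number(eps) * (2.0 * eps) ** alpha

eps, alpha = 0.01, 0.5
print(covering_number(eps))                                      # 50
print(hausdorff_sum_bound(eps, alpha) <= covering_number(eps))   # True
```

Taking logarithms of the two quantities above recovers, in this toy case, the inequality between the Hausdorff α-entropy and the metric entropy.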
We shall adopt the following Hellinger-type semimetrics.
Denote
By means of the metric
Ghosal and van der Vaart (2007, Theorem 5) gave an in-probability posterior convergence rate theorem for stationary α-mixing Markov chains. Since calculation of the α-mixing coefficients is generally not easy and many processes are neither mixing nor stationary, it seems worthwhile to develop a posterior convergence rate theorem for Markov chains which may be neither stationary nor α-mixing. We now present an almost sure assertion in this direction. Our result is based on the following prior-dependent integration condition.
Throughout this paper the notation $a \lesssim b$ means $a \leq Cb$ for some positive constant C which is universal or fixed in the proof. Write $a \asymp b$ if $a \lesssim b$ and $b \lesssim a$.
Denote
which is the integral of the αth power of the nonnegative function f relative to the measure P on
Proposition 1.
Suppose that there exist a μ-integrable function r(y) and constants with
such that
and
for all
and
. Let
and
. Then the inequality
holds for all n,
and
, where
Therefore we have
Theorem 1.
Suppose that all assumptions of Proposition 1 hold and suppose that for all large n and some fixed constant
. Suppose that there exist
and a sequence of subsets Θn of Θ such that
for all large j, n, and
Then there exists b > 0 such that for each large r and all large n,
By choosing and
we can easily get
Corollary 1.
Suppose that there exist a μ-integrable function r(y) and constants such that
and
for all
and
. Suppose that
for all large n and some fixed constant
. Suppose that there exist
with
and
and a sequence of subsets Θn of Θ such that for all large j and n,
3. Nonlinear autoregression
In this section we discuss an application of our theorems. By means of Corollary 1, we improve on the posterior rate of convergence for the nonlinear autoregressive model in Ghosal and van der Vaart (2007).
We observe a time series $X_0, X_1, \ldots, X_n$ given by

$$X_i = f(X_{i-1}) + \varepsilon_i,$$

where $\varepsilon_1, \varepsilon_2, \ldots$ are i.i.d. random variables with the standard normal distribution and the unknown regression function f is in the space
which consists of all functions f with
for some fixed positive constant M. Let
be the density of X0 relative to the Lebesgue measure
on
So
can be considered as a Markov chain generated by the transition density
with
and the initial density
Since
is a strictly positive continuous function tending to zero as
there exist two constants
depending only on M such that
for all
and
Assume that there exists a constant N > 0 such that the set of initial densities of the Markov chain satisfies
for all initial densities
and
For instance, all of the initial densities with
satisfy
and hence form a set satisfying the requirement. Define a measure
in
and a norm
on
Assume that the true regression function
belongs to the Lipschitz continuous space LipM, which consists of all functions f on
satisfying
for all
where L is a fixed positive constant. When the Markov chain is stationary, Ghosal and van der Vaart (2007, Section 7.4) constructed a prior on the regression functions and obtained the in-probability posterior convergence rate
which is the minimax rate times the logarithmic factor
In the following we shall apply Corollary 1 to get the posterior convergence rate
in the almost sure sense for a general Markov chain defined as above.
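Before the calculations, it may help to see the model concretely. The following simulation uses one hypothetical choice of regression function (f = tanh, which is bounded and Lipschitz with constant 1); nothing in the paper depends on this particular f.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate the nonlinear autoregression X_i = f(X_{i-1}) + eps_i with
# i.i.d. standard normal innovations; tanh is one bounded Lipschitz choice.
def f(x):
    return np.tanh(x)

n = 2000
x = np.empty(n + 1)
x[0] = 0.0
for i in range(1, n + 1):
    x[i] = f(x[i - 1]) + rng.standard_normal()

# Since |f| <= 1 and the noise is standard normal, the marginal of the
# chain stays concentrated: almost all observations lie within a few
# standard deviations of zero.
print(np.mean(np.abs(x) < 5.0))
```

The chain is well behaved for such bounded Lipschitz f, which is what makes the verification of the conditions of Corollary 1 below tractable.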
First, we note that for any
where the last inequality follows from the elementary inequality
Hence for some small constant
we have that
for all large n. Similarly,
hold for all
with
Hence Corollary 1 works well for the metric
We also need some basic facts on approximation of Lipschitz continuous functions by means of step functions. Given a finite interval and a positive integer Kn, we make the partition
with
for
Write
The space of step functions relative to the partition is the set of functions
such that h is identically equal to some constant on each Ik for
more precisely,
for some
where
denotes the indicator function of Ik. Denote by
the function on
which is equal to
on
and vanishes outside
Hence
and
where
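The approximation quality of such step functions can be checked numerically. A minimal sketch with hypothetical choices not taken from the paper: a uniform partition of [0, 1] into K cells with midpoint values $\beta_k$, for which a Lipschitz function with constant L is approximated in sup-norm within L/(2K).

```python
import numpy as np

# Approximate a Lipschitz function on [0, 1] by a step function that is
# constant on each cell of a uniform partition into K pieces, taking the
# value of f at the cell midpoint; the sup-norm error is at most L/(2K)
# for a Lipschitz constant L.
def step_approx(f, K):
    mids = (np.arange(K) + 0.5) / K           # midpoints of I_1, ..., I_K
    vals = f(mids)                            # beta_k = f(midpoint of I_k)
    def h(x):
        k = np.clip((np.asarray(x) * K).astype(int), 0, K - 1)
        return vals[k]
    return h

f = np.sin            # Lipschitz on [0, 1] with constant L = 1
K = 100
h = step_approx(f, K)
grid = np.linspace(0.0, 1.0, 10001)
err = np.max(np.abs(f(grid) - h(grid)))
print(err <= 1.0 / (2 * K) + 1e-12)           # True
```

The 1/K decay of this error is what drives the choice of the number of cells Kn as a function of n in the rate calculation.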
Let Π be the prior on
which is induced by the map
such that all the coordinates βk of β are chosen to be i.i.d. random variables with the uniform distribution on
Hence the support
of Π consists of all such functions
Take
and
with
Then
Write
for
Since
we have that
and
From the triangle inequality and the inequality
for all x > 0, it follows that for all
and for all large n,
Thus for all large j and n, we have
Note that the Euclidean volume of the Kn-dimensional ellipsoid
is equal to
times the Euclidean volume of the “unit” Kn-dimensional ellipsoid
So the last quotient does not exceed
which is less than
for any given
and all large j. Hence we have obtained condition (ii) of Corollary 1. Similarly, for all large j and n, we have
which, by Lemma 4.1 in Pollard (1990), is less than
for some constant
and therefore condition (i) of Corollary 1 holds for any given
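The ellipsoid-volume comparison used in verifying condition (ii) above rests on the standard scaling identity, stated here with hypothetical semi-axes $a_1, \ldots, a_{K_n}$:

```latex
\[
  \operatorname{vol}\Bigl\{x \in \mathbb{R}^{K_n} : \sum_{k=1}^{K_n} \frac{x_k^2}{a_k^2} \le 1\Bigr\}
  \;=\; \Bigl(\prod_{k=1}^{K_n} a_k\Bigr)\,
  \operatorname{vol}\Bigl\{y \in \mathbb{R}^{K_n} : \sum_{k=1}^{K_n} y_k^2 \le 1\Bigr\},
\]
```

which follows from the change of variables $x_k = a_k y_k$, whose Jacobian determinant is $\prod_{k=1}^{K_n} a_k$.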
References
- Barron, A., M. Schervish, and L. Wasserman. 1999. The consistency of posterior distributions in nonparametric problems. The Annals of Statistics 27 (2):536–61. doi:10.1214/aos/1018031206.
- Ghosal, S., and A. W. van der Vaart. 2007. Convergence rates of posterior distributions for noniid observations. The Annals of Statistics 35 (1):192–223. doi:10.1214/009053606000001172.
- Pollard, D. 1990. Empirical processes: Theory and applications. Hayward, CA: IMS.
- Shen, X., and L. Wasserman. 2001. Rates of convergence of posterior distributions. The Annals of Statistics 29 (3):687–714. doi:10.1214/aos/1009210686.
- Walker, S. 2003. On sufficient conditions for Bayesian consistency. Biometrika 90 (2):482–88. doi:10.1093/biomet/90.2.482.
- Walker, S. 2004. New approaches to Bayesian consistency. The Annals of Statistics 32 (5):2028–43. doi:10.1214/009053604000000409.
- Walker, S., A. Lijoi, and I. Prunster. 2007. On rates of convergence for posterior distributions in infinite-dimensional models. The Annals of Statistics 35 (2):738–46. doi:10.1214/009053606000001361.
- Xing, Y. 2008. On adaptive Bayesian inference. Electronic Journal of Statistics 2:848–62.
- Xing, Y. 2011a. Rates of posterior convergence for iid observations. Communications in Statistics - Theory and Methods 39 (19):3389–98. doi:10.1080/03610920903177389.
- Xing, Y. 2011b. Convergence rates of nonparametric posterior distributions. Journal of Statistical Planning and Inference 141 (11):3382–90. doi:10.1016/j.jspi.2010.10.009.
- Xing, Y., and B. Ranneby. 2009. Sufficient conditions for Bayesian consistency. Journal of Statistical Planning and Inference 139 (7):2479–89. doi:10.1016/j.jspi.2008.11.008.
Appendix
Proof of Proposition 1.
Our proof is mainly based on some elementary inequalities such as Jensen's inequality and Hölder's inequality. It is no restriction to assume that n is an even number. Take nonempty disjoint subsets
of Θ such that
and d-diameters of all Bj do not exceed
Then by the inequality
for all
we get
We shall use the notations
and
for
For simplicity we also let
stand for the parameter of the corresponding integral means. Then the last maximum is equal to
which, by Hölder’s inequality, is less than
Take
for each j. From Jensen’s inequality and the assumption
it turns out that
Thus,
and
Write
Take a nonnegative integer m with
From Hölder’s inequality it turns out that for each j and k,
which, by repeating the above procedure m more times, does not exceed
Thus we get
Hence we have
Repeating the same argument k − 1 times one can get that
Similarly, we have
Hence we have proved the required inequality and the proof of Proposition 1 is complete. □
To prove Theorem 1 we need the following inequality.
Lemma 1.
If there exists a constant such that
for all
and
, then the inequality
holds for all n,
and c > 0.
Proof of Lemma 1.
Without loss of generality, we may assume that From Jensen’s inequality and Chebyshev’s inequality it follows that
So it suffices to prove that
for all
We assume without loss of generality that n is an even number, say
Write
From Hölder’s inequality it then turns out that
Hence by Fubini’s theorem we get that
is equal to
where, by the proof of Lemma 1 in Xing (2011b), we have
Thus, we have obtained that
Repeating the same argument k − 1 times and using
one can get
Similarly, we can get that
Therefore
for all
and the proof of Lemma 1 is complete. □
Proof of Theorem 1.
Take a constant c such that Hence
and hence
By Lemma 1 and the first Borel-Cantelli lemma, we get that for almost all
the inequality
holds for all large n. But
Hence, for large n we have
which, by the assumption of Theorem 1, implies that
if the constant b is small enough.
On the other hand, let and let
be the largest integer less than or equal to the constant r. Then, by the inequality
for all
and
we get
which, by Proposition 1 and the inequality assumption of Theorem 1, does not exceed
where the second-to-last inequality holds for all large r and the last inequality holds for all large n. Since the last exponent of n is strictly less than −1 if r is large enough, we obtain that
if the constant r is large enough. Hence, by the first Borel-Cantelli lemma we obtain that for almost all
if n is large enough. The proof of Theorem 1 is complete. □