Abstract
We study the problem of detecting a drift change of a Brownian motion under various extensions of the classical case. Specifically, we consider the case of a random post-change drift and examine monotonicity properties of the solution with respect to different model parameters. Moreover, robustness properties – effects of misspecification of the underlying model – are explored.
1. Introduction
In the classical version of the quickest disorder detection (QDD) problem, see Shiryaev (1967), one observes a one-dimensional process Y which satisfies
$$dY_t = b\,\mathbf{1}_{\{t\geq\Theta\}}\,dt + \sigma\,dW_t,$$
where b and σ are non-zero constants, W is a standard Brownian motion and the disorder time Θ is an exponentially distributed random variable (with intensity λ > 0) such that W and Θ are independent. The associated Bayes' risk (expected cost) corresponding to a stopping rule τ is defined as
$$R(\tau) = \mathbb{P}(\tau < \Theta) + c\,\mathbb{E}\big[(\tau-\Theta)^+\big], \tag{1.1}$$
where c > 0 is the cost of one unit of detection delay. It is well known (see Shiryaev, 1978, Chapter 4) that to minimise the Bayes risk one should stop the first time the conditional probability process $\Pi_t = \mathbb{P}(\Theta\leq t\mid\mathcal{F}^Y_t)$ reaches a certain level $a\in(0,1)$. Moreover, the level a is characterised as the unique solution of a transcendental equation.
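To make the classical setup concrete, the following sketch simulates the model and the threshold rule (our own illustrative parameter values and threshold, not values from any particular application). The posterior Π solves the standard Shiryaev filtering equation $d\Pi_t = \lambda(1-\Pi_t)\,dt + (b/\sigma^2)\Pi_t(1-\Pi_t)(dY_t - b\Pi_t\,dt)$, discretised here by an Euler scheme.

```python
import numpy as np

def simulate_disorder_detection(b=1.0, sigma=1.0, lam=0.5, a=0.8,
                                T=50.0, dt=1e-3, seed=0):
    """Simulate Y with a drift change at Theta ~ Exp(lam) and run
    Shiryaev's posterior process Pi with the threshold rule
    tau = inf{t : Pi_t >= a}.  Parameters are illustrative."""
    rng = np.random.default_rng(seed)
    theta = rng.exponential(1.0 / lam)          # disorder time
    pi = 0.0                                    # posterior P(Theta <= t | data)
    for k in range(int(T / dt)):
        t = k * dt
        drift = b if t >= theta else 0.0        # drift switches on at theta
        dY = drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        # Euler step for dPi = lam(1-Pi)dt + (b/sigma^2) Pi(1-Pi)(dY - b Pi dt)
        pi += lam * (1 - pi) * dt \
              + (b / sigma**2) * pi * (1 - pi) * (dY - b * pi * dt)
        pi = min(max(pi, 0.0), 1.0)             # guard against Euler overshoot
        if pi >= a:
            return t, theta, pi                 # alarm raised at time t
    return None, theta, pi                      # no alarm before the horizon

tau, theta, pi_final = simulate_disorder_detection()
```

Note that the posterior increases even before the disorder (the prior probability that the disorder has occurred grows over time), so the alarm is eventually raised on essentially every path with these parameters.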
In many situations, however, it is natural not to know the exact value of the disorder magnitude b, but merely its distribution. This is the case for example when a specific machine is monitored continuously, and the machine can break down in several possible ways. To study such a situation, we allow for the new drift to be a random variable B with distribution μ such that B is independent of the other sources of randomness. In this setting we study monotonicity properties of the QDD problem, i.e. whether the (minimal) expected cost is monotone with respect to various model parameters. In particular, we study the dependence of the expected cost on the volatility σ, the distribution μ, and the disorder intensity λ. We also study robustness in the QDD problem, i.e. what happens if one misspecifies various model parameters. More specifically, we aim at estimates for the increased cost associated with the use of suboptimal strategies. Clearly, such estimates are helpful in situations where the model is badly calibrated, but also in situations where one chooses to use a simpler suboptimal strategy rather than a computationally more demanding optimal strategy.
As mentioned above, the classical version of the QDD problem was studied in Shiryaev (1967); see also Shiryaev (1978, Chapter 4) and Peskir and Shiryaev (2006, Section 22). For extensions to the case of detecting a change in the intensity of a Poisson process, see Peskir and Shiryaev (2002), Bayraktar et al. (2005) and Bayraktar et al. (2006). For the case of a random disorder magnitude, Beibel (1997) obtains asymptotic results for a problem with normally distributed drift. Concavity of the value function in a related hypothesis testing problem with two possible post-change drift values in a time-homogeneous case was obtained in Muravlev and Shiryaev (2014). Finally, the practical significance of the disorder detection problem in modern engineering applications is explained in Zucca et al. (2016).
2. General model formulation
We model a signal-processing activity on a stochastic basis $(\Omega,\mathcal{F},(\mathcal{F}_t)_{t\geq0},\mathbb{P})$, where the filtration satisfies the usual conditions. We are interested in the signal process X, which is not directly observable, but we can continuously observe the noisy process
$$dY_t = X_t\,dt + \sigma_t\,dW_t. \tag{2.1}$$
Here W is a Brownian motion independent of X, the dispersion σ is deterministic and strictly positive, and the signal process follows
$$X_t = \big(B_0\mathbf{1}_{\{\Theta=0\}} + B_1\mathbf{1}_{\{\Theta>0\}}\big)\mathbf{1}_{\{t\geq\Theta\}}, \tag{2.2}$$
where Θ is a $[0,\infty)$-valued random variable representing the disorder occurrence time. Moreover, $B_0$ and $B_1$ are real-valued random variables corresponding to the disorder magnitudes in the cases 'disorder occurs before we start observing Y' and 'disorder occurs while we observe Y', respectively. Also, Θ, $B_0$ and $B_1$ are independent. Let Θ have the distribution $p\,\delta_0 + (1-p)\,\nu$, where $p\in[0,1)$ and ν is a probability measure on $(0,\infty)$ with a continuously differentiable distribution function $F_\nu$. In addition, denote the distributions of $B_0$ and $B_1$ by $\mu_0$ and $\mu_1$, respectively. When referring to $\mu_0$ and $\mu_1$ collectively, we will simply say that the prior is μ. We assume that $\mu_0$ and $\mu_1$ are supported on a common finite set $\{b_1,\dots,b_n\}$ of non-zero points.
The model studied in this paper is a generalisation of the classical disorder occurrence model of Shiryaev (1967). Firstly, the exponential disorder distribution used in the classical problem is replaced by an arbitrary distribution with time-dependent intensity. This generalisation is advantageous in situations where the intensity of the disorder occurrence changes with time. For example, if the disorder corresponds to a component failure in a system, then for many physical systems the failure intensity is known to increase with age. Also, if occurrence of the disorder depends on external factors such as the weather, then such dependence can be incorporated into the time-dependent disorder intensity from an accurate weather forecast. Moreover, in contrast to the classical problem, in which the disorder magnitude is known in advance, in this generalisation the magnitude takes a value from a range of possible values. Returning to the component failure example, the different possible disorder magnitudes would represent different types of component failure. In the problem of detecting malfunctioning atomic clocks, see Zucca et al. (2016), the disorder corresponds to a systematic drift of a clock. The sign of the disorder magnitude reflects whether a clock starts to go too slow or too fast, while the absolute value represents the severity of the drift. In addition, the different distributions of $B_0$ and $B_1$ and the weight p reflect the prior knowledge about how likely different disorder magnitudes are if the disorder happened before or while observing Y. For instance, such model flexibility is relevant when we start observing the system after a particular incident (e.g. a storm if the system is affected by the weather) and we know that the distribution of possible disorder magnitudes after the incident is different than under normal operating conditions.
From a mathematical point of view, the weight p and the random variable $B_0$ allow us to give a statistical interpretation to an arbitrary starting point of the Markovian embedding (2.7) of the original optimal stopping problem studied later.
Remark 2.1.
We point out that the finite support assumption on μ is made for notational convenience. As any distribution can be approximated arbitrarily well by finitely supported ones, our monotonicity results below extend to general disorder magnitude distributions.
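As a small illustration of this remark (with a hypothetical standard normal magnitude prior, chosen only for concreteness), a continuous distribution can be replaced by a finitely supported approximation on a grid:

```python
import numpy as np
from math import erf, sqrt

def discretize_normal(n=41, lo=-4.0, hi=4.0):
    """Approximate N(0,1) by n atoms: each grid midpoint receives the
    probability mass of its cell; truncated tails are renormalised."""
    edges = np.linspace(lo, hi, n + 1)
    cdf = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
    mass = np.array([cdf(edges[i + 1]) - cdf(edges[i]) for i in range(n)])
    mass /= mass.sum()                       # renormalise after truncation
    points = 0.5 * (edges[:-1] + edges[1:])  # cell midpoints as atoms
    return points, mass

pts, w = discretize_normal()
mean = float(np.dot(pts, w))            # close to 0 by symmetry
second_moment = float(np.dot(pts**2, w))  # close to 1
```

Refining the grid makes the first two moments (and, more generally, weak convergence) as accurate as desired, which is the sense of approximation used in the remark.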
We are interested in a disorder detection strategy τ incorporating two objectives: short detection delay and a small proportion of false alarms. As noted in the introduction, a classical choice of Bayes' risk for a detection strategy to minimise is given by (1.1). In the present paper, we consider a slightly more flexible risk structure by allowing a time-dependent cost for the detection delay. More precisely, we consider the Bayes' risk
$$R(\tau) = \mathbb{P}(\tau < \Theta) + \mathbb{E}\Big[\mathbf{1}_{\{\tau\geq\Theta\}}\int_\Theta^\tau c(s)\,ds\Big],$$
where $\mathbb{P}(\tau<\Theta)$ is a fixed penalty for a false alarm and the integral term is a penalty for detection delay. Here $c:[0,\infty)\to(0,\infty)$ is a deterministic function with c(t) > 0 for all $t\geq0$. Writing $\mathbb{F}^Y=(\mathcal{F}^Y_t)_{t\geq0}$ for the filtration generated by Y (which is our observation filtration), let us introduce $\Pi_t = \mathbb{P}(\Theta\leq t\mid\mathcal{F}^Y_t)$. Then
$$R(\tau) = \mathbb{E}\Big[1-\Pi_\tau + \int_0^\tau c(s)\Pi_s\,ds\Big].$$
Hence the optimal stopping problem to solve is
$$V_* = \inf_{\tau\in\mathcal{T}}\mathbb{E}\Big[1-\Pi_\tau + \int_0^\tau c(s)\Pi_s\,ds\Big], \tag{2.3}$$
where $\mathcal{T}$ denotes the set of $\mathbb{F}^Y$-stopping times.
2.1. Filtering equations
Let us define $\Pi^i_t = \mathbb{P}(\Theta\leq t,\,B=b_i\mid\mathcal{F}^Y_t)$, where $B = B_0\mathbf{1}_{\{\Theta=0\}} + B_1\mathbf{1}_{\{\Theta>0\}}$. By the Kallianpur–Striebel formula, see Crisan and Rozovskii (2011, Theorem 2.9 on p. 39),
$$\Pi^i_t = \frac{p\,\mu_0(\{b_i\})L_t(0,b_i) + (1-p)\,\mu_1(\{b_i\})\int_0^t L_t(s,b_i)\,\nu(ds)}{\sum_{j=1}^n\Big(p\,\mu_0(\{b_j\})L_t(0,b_j) + (1-p)\,\mu_1(\{b_j\})\int_0^t L_t(s,b_j)\,\nu(ds)\Big) + (1-p)\,\nu((t,\infty))} \tag{2.4}$$
for $i=1,\dots,n$, where $L_t(s,b) = \exp\big(\int_s^t \frac{b}{\sigma_u^2}\,dY_u - \frac12\int_s^t \frac{b^2}{\sigma_u^2}\,du\big)$ is the likelihood ratio corresponding to a disorder of size b at time s. Moreover, from the Kushner–Stratonovich equation, see Crisan and Rozovskii (2011, Theorem 3.1 on p. 58), we know that $(\Pi^1,\dots,\Pi^n)$ satisfies
$$d\Pi^i_t = \lambda_t\,\mu_1(\{b_i\})\Big(1-\sum_{j=1}^n\Pi^j_t\Big)dt + \frac{\Pi^i_t}{\sigma_t}\Big(b_i-\sum_{j=1}^n b_j\Pi^j_t\Big)d\hat W_t. \tag{2.5}$$
Here $\lambda_t = F_\nu'(t)/(1-F_\nu(t))$ is the intensity of the disorder occurring at time t > 0 (conditional on it not having occurred yet), and
$$\hat W_t = \int_0^t \frac{1}{\sigma_s}\big(dY_s - \widehat X_s\,ds\big)$$
is a standard Brownian motion with respect to $\mathbb{F}^Y$, see Bain and Crisan (2009) (the process $\hat W$ is referred to as the innovation process). Note that summing (2.5) over i and writing $\Pi_t = \sum_{i=1}^n\Pi^i_t$ yields
$$d\Pi_t = \lambda_t(1-\Pi_t)\,dt + \frac{\widehat X_t(1-\Pi_t)}{\sigma_t}\,d\hat W_t, \tag{2.6}$$
where $\widehat X_t = \sum_{j=1}^n b_j\Pi^j_t$.
The posterior distribution of the signal at time t is determined by the n-tuple $(\Pi^1_t,\dots,\Pi^n_t)$, so the n-tuple fully describes the posterior. As a result, (2.4) and (2.5) provide two different representations of the posterior distribution.
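The Bayes-formula representation of the posterior can be mimicked in discrete time. The sketch below is our own illustrative implementation (all parameter values are assumptions): it maintains joint weights over hypotheses "disorder in step j, magnitude $b_i$", multiplies them by the Gaussian likelihood-ratio factor of each observed increment, and normalises against the "no disorder before the horizon" alternative.

```python
import numpy as np

def run_filter(b_vals, mu1, lam=1.0, sigma=1.0, T=5.0, dt=1e-2,
               theta=None, B=None, seed=1):
    """Discrete-time Bayes filter on a grid of hypotheses (disorder at
    step j, magnitude b_i); the disorder-time prior discretises Exp(lam).
    Returns P(Theta <= T, B = b_i | observed increments) for each i."""
    rng = np.random.default_rng(seed)
    n_steps = int(T / dt)
    b_vals = np.asarray(b_vals, float)
    mu1 = np.asarray(mu1, float)
    if theta is None:
        theta = rng.exponential(1.0 / lam)      # true disorder time
    if B is None:
        B = rng.choice(b_vals, p=mu1)           # true magnitude
    # prior: P(disorder during step j) and P(no disorder before T)
    pj = np.exp(-lam * dt * np.arange(n_steps)) * (1.0 - np.exp(-lam * dt))
    p_none = np.exp(-lam * dt * n_steps)
    w = np.outer(pj, mu1)                       # joint prior weights
    steps = np.arange(n_steps)[:, None]
    for k in range(n_steps):
        drift = B if k * dt >= theta else 0.0
        dY = drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        # likelihood-ratio factor vs the no-drift law (Girsanov form);
        # hypothesis (j, i) has drift b_i in step k iff j <= k
        m = np.where(steps <= k, b_vals[None, :], 0.0)
        w = w * np.exp((m * dY - 0.5 * m**2 * dt) / sigma**2)
    Z = w.sum() + p_none                        # the null keeps factor 1
    return w.sum(axis=0) / Z

Pi_i = run_filter([3.0, -3.0], [0.5, 0.5], theta=0.5, B=3.0)
Pi_total = float(Pi_i.sum())
```

With a clear signal the posterior concentrates on the correct magnitude, and the total disorder probability approaches one, in line with the continuous-time filter.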
2.2. Markovian embedding
Following standard lines in optimal stopping theory, we embed our optimal stopping problem into a Markovian framework. To do that, define a Markovian value function V by
$$V(t,\pi) = \inf_{\tau\in\mathcal{T}_{t,\pi}}\mathbb{E}_{t,\pi}\Big[1-\Pi_\tau + \int_t^\tau c(s)\Pi_s\,ds\Big], \tag{2.7}$$
where $\mathcal{T}_{t,\pi}$ denotes the stopping times with respect to the n-dimensional process $(\Pi^1,\dots,\Pi^n)$ starting from π at time t and satisfying (2.5), and $\Pi = \sum_{i=1}^n\Pi^i$. It is worth noting that $V(t,\pi)$ corresponds to the value of the problem in which the initial time is t and the initial posterior is π.
Remark 2.2.
The value function $\pi\mapsto V(t,\pi)$ in (2.7) is concave for any fixed t. Indeed, the concavity proof in Muravlev and Shiryaev (2014) extends to the current setting. Since concavity is not used in the monotonicity results below, however, we omit the details.
2.2.1. The classical Shiryaev solution
In this subsection we recall the solution in the classical case where the cost c, the intensity λ and the post-change drift b are constants. In that case, we have the optimal stopping problem
$$U(\pi) = \inf_\tau\mathbb{E}_\pi\Big[1-\Pi_\tau + c\int_0^\tau\Pi_s\,ds\Big], \tag{2.8}$$
with an underlying diffusion process
$$d\Pi_t = \lambda(1-\Pi_t)\,dt + \frac{b}{\sigma}\,\Pi_t(1-\Pi_t)\,d\hat W_t.$$
It is well known (see Shiryaev (1978, Chapter 4) or Peskir and Shiryaev (2006, Section 22)) that U solves the free-boundary problem
$$\begin{cases}\lambda(1-\pi)U'(\pi) + \dfrac{b^2}{2\sigma^2}\pi^2(1-\pi)^2U''(\pi) = -c\pi, & \pi\in(0,a),\\[2pt] U(\pi) = 1-\pi, & \pi\in[a,1],\\[2pt] U'(a) = -1.\end{cases} \tag{2.9}$$
Here $a\in(0,1)$ is the free boundary, and it can be determined as the solution of a certain transcendental equation. Moreover, the stopping time $\tau_a = \inf\{t\geq0 : \Pi_t\geq a\}$ is optimal in (2.8), and one can check that the value function U is decreasing and concave.
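The free-boundary characterisation can be checked numerically. The sketch below is our own illustrative scheme (parameter values b = σ = 1, λ = 0.5, c = 1 are assumptions, not values from the paper): it computes U by explicit value iteration for the variational inequality min(LU + cπ, (1−π) − U) = 0 and reads off the boundary a as the first grid point where U touches the obstacle 1 − π.

```python
import numpy as np

def solve_shiryaev(b=1.0, sigma=1.0, lam=0.5, c=1.0, n=201,
                   tol=1e-8, max_iter=300000):
    """Explicit, obstacle-clamped value iteration for the classical
    problem: L U = lam(1-pi)U' + (b/sigma)^2/2 pi^2(1-pi)^2 U''."""
    pi = np.linspace(0.0, 1.0, n)
    h = pi[1] - pi[0]
    drift = lam * (1.0 - pi)
    diff = 0.5 * (b / sigma) ** 2 * pi**2 * (1.0 - pi) ** 2
    dt = 0.4 / (drift.max() / h + 2.0 * diff.max() / h**2)  # CFL-stable
    U = 1.0 - pi                       # start from the obstacle
    for _ in range(max_iter):
        LU = np.zeros(n)
        LU[1:-1] = (drift[1:-1] * (U[2:] - U[1:-1]) / h      # upwind U'
                    + diff[1:-1] * (U[2:] - 2*U[1:-1] + U[:-2]) / h**2)
        LU[0] = drift[0] * (U[1] - U[0]) / h                 # diff(0) = 0
        Un = np.minimum(U + dt * (LU + c * pi), 1.0 - pi)    # clamp at obstacle
        Un[-1] = 0.0                                         # U(1) = 0
        if np.max(np.abs(Un - U)) < tol:
            U = Un
            break
        U = Un
    stop = (U == 1.0 - pi)             # clamped points = stopping region
    a = pi[np.argmax(stop)]            # first point of the stopping region
    return pi, U, a

pi_grid, U, a = solve_shiryaev()
```

As a sanity check, the computed U is decreasing, dominated by the obstacle, and the boundary a lies above λ/(λ + c), the level below which immediate stopping cannot be optimal.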
3. Value dependencies and robustness
3.1. Monotonicity properties of the value function
In this section, we study parameter dependence of the optimal stopping problem (2.7). In particular, we investigate how the value function changes when we alter parameters of the probabilistic model, which include the prior for the drift magnitude and the prior for the disorder time.
The effects of adding more noise, stretching out the prior by scaling, and increasing the observation cost are explained by the following theorem.
Theorem 3.1
(General monotonicity properties of the value function V).
1. V is increasing in the volatility σ.
2. Given a prior μ for the drift magnitude, let $V_k$ denote the Markovian value function (2.7) in the case when the drift prior is $\mu_k$, the law of $kB$ for $B\sim\mu$. Then the map $k\mapsto V_k(t,\pi)$ is decreasing on $(0,\infty)$ for any fixed $(t,\pi)$.
3. V is increasing in the cost function $c(\cdot)$.
Proof.
For simplicity of notation, and without loss of generality, we consider the case t = 0 in the proofs below.
For the volatility, let $\sigma^1$ and $\sigma^2$ be two time-dependent volatility functions satisfying $\sigma^1_t\leq\sigma^2_t$ for all $t\geq0$. Also, let
$$dY^i_t = X_t\,dt + \sigma^i_t\,dW_t, \qquad i=1,2,$$
and let $V^i$, i = 1, 2, be the corresponding value functions. In addition, let $\tilde W$ be a standard Brownian motion independent of W and X. Then, clearly, the process
$$\tilde Y_t := Y^1_t + \int_0^t\sqrt{(\sigma^2_s)^2-(\sigma^1_s)^2}\,d\tilde W_s$$
coincides in law with $Y^2$, and every stopping time of the filtration generated by $\tilde Y$ is also a stopping time of the filtration generated by $(Y^1,\tilde W)$. Hence it follows that $V^1\leq V^2$, which finishes the proof of the claim.
Note that for k > 0, the process $Y^k_t := Y_t/k$ satisfies
$$dY^k_t = \tilde X_t\,dt + \frac{\sigma_t}{k}\,dW_t,$$
where $\tilde X := X/k$ is a signal process whose disorder magnitude prior is μ whenever the prior of X is $\mu_k$. Moreover, the set of $\mathbb{F}^Y$-stopping times coincides with the set of $\mathbb{F}^{Y^k}$-stopping times, so monotonicity in k is implied by monotonicity in the volatility. Thus claim 2 follows from claim 1.
The fact that the value is increasing in c is obvious from the definition (2.7) of the value function.
□
The monotonicity of the minimal Bayes’ risk with respect to volatility σ is of course not surprising: more noise in the observation process gives a smaller signal-to-noise ratio, which slows down the speed of learning. It is less clear how a change in the disorder intensity λ should affect the value function under a general disorder magnitude distribution. However, we have the following comparison result for the case of constant parameters.
Theorem 3.2
(Monotonicity in the intensity for constant parameters). Assume that the disorder magnitude can only take one value $b\neq0$. Let the cost c, the volatility σ and the intensity λ be constants, and assume that $\tilde\lambda\leq\lambda$. Let U be the value function for Shiryaev's problem with parameters $(c,\sigma,b,\lambda)$, and let V denote the value function for the problem specification $(c,\sigma,b,\tilde\lambda)$. Then $U(\pi)\leq V(t,\pi)$ for all $t\geq0$ and $\pi\in[0,1]$.
Proof.
Without loss of generality, we only consider the case t = 0. Let $\pi\in[0,1]$, denote by $\tilde Y$ the observation process corresponding to the model specification $(c,\sigma,b,\tilde\lambda)$, and let $\tilde\Pi$ denote the corresponding process Π started from π at time 0. Let τ be a bounded stopping time. Then, applying (a generalised version of) Itô's formula and taking expectations at the stopping time τ, we get
$$\mathbb{E}\Big[1-\tilde\Pi_\tau + c\int_0^\tau\tilde\Pi_s\,ds\Big] \geq \mathbb{E}\Big[U(\tilde\Pi_\tau) + c\int_0^\tau\tilde\Pi_s\,ds\Big] \geq U(\pi),$$
where we used the monotonicity of U and the fact that
$$\lambda(1-\pi)U'(\pi) + \frac{b^2}{2\sigma^2}\pi^2(1-\pi)^2U''(\pi) + c\pi \geq 0 \tag{3.1}$$
at all points away from the optimal stopping boundary of Shiryaev's classical problem, compare (2.9). Taking the infimum over bounded stopping times τ, we get $U(\pi)\leq V(0,\pi)$, which finishes the proof. □
Remark 3.1.
The monotonicity in intensity does not easily extend to cases with unknown post-change drift by the same argument. In fact, one can check that in higher dimensions the partial derivatives are not necessarily all negative, which implies difficulties with extending the above proof to a more general setting. However, the robustness result in Theorem 3.3 below provides a partial extension in which models with general support for the drift magnitude and general intensities are compared with a fixed parameter model.
Though the authors expect the inequality in Theorem 3.2 to hold also when one time-dependent intensity dominates another, the comparison with the constant intensity case was chosen to avoid additional mathematical complications that need to be resolved in order to apply Ito’s formula to the value function of a time-dependent disorder detection problem.
3.2. Robustness
Robustness concerns how a possible misspecification of the model parameters affects the performance of the detection strategy when evaluated under the real physical measure. In this section, we use coupling arguments to study robustness properties with respect to the disorder magnitude and disorder time. For simplicity, we assume that the parameters λ, c and σ are constant so that we have a time-independent case; generalizations to the time-dependent case are straightforward but notationally more involved.
Thus we assume that the signal process follows
$$X_t = \big(B_0\mathbf{1}_{\{\Theta=0\}} + B_1\mathbf{1}_{\{\Theta>0\}}\big)\mathbf{1}_{\{t\geq\Theta\}}, \tag{3.2}$$
where $B_0$, $B_1$ are random variables with distributions $\mu_0$, $\mu_1$, respectively, and Θ has the distribution $p\,\delta_0 + (1-p)\,\nu$, where ν is an exponential distribution with intensity λ. Let us simply write $\mu = (\mu_0,\mu_1)$.
For a given $l\neq0$, let $\Theta_l$ be a disorder time with distribution $p\,\delta_0 + (1-p)\,\nu_l$, where $\nu_l$ is an exponential distribution with intensity $\lambda_l$. Let $\Pi^l$ denote the conditional probability process in the model with deterministic post-change drift l and disorder time $\Theta_l$, compare (2.4). Also, we introduce the observation processes $Y^\mu$ and $Y^l$ and the process $\tilde\Pi^l$, described as follows. Here $Y^\mu$ is the observation process for a setting in which the post-change drift has distribution μ and the disorder happens at Θ. The process $Y^l$ is the observation process and $\Pi^l$ is the corresponding conditional probability process in the situation of a post-change drift l that occurs at $\Theta_l$. Moreover, the process $\tilde\Pi^l$ represents the conditional probability process calculated as if the drift change is described by $(l,\lambda_l)$ in the scenario where the true drift change is given by $(\mu,\lambda)$.
Now, let $a_l$ denote the optimal stopping boundary for the classical Shiryaev one-dimensional problem in the model $(c,\sigma,l,\lambda_l)$, and define
$$\tau_l = \inf\{t\geq0 : \Pi^l_t\geq a_l\} \quad\text{and}\quad \tilde\tau_l = \inf\{t\geq0 : \tilde\Pi^l_t\geq a_l\}.$$
Here $\tau_l$ is the optimal stopping time in the model $(c,\sigma,l,\lambda_l)$, while $\tilde\tau_l$ is the (sub-optimal) stopping time, with corresponding cost $R(\tilde\tau_l)$, for someone who believes in $(c,\sigma,l,\lambda_l)$ whereas the true model is $(c,\sigma,\mu,\lambda)$.
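The cost of such a misspecified threshold strategy can be estimated by simulation. Below is a Monte Carlo sketch with our own illustrative parameters; in particular, the threshold a = 0.8 is an assumption rather than the solution of Shiryaev's transcendental equation. The tester filters with a constant post-change drift `drift_used`, although the true drift is random.

```python
import numpy as np

def mc_risk(drift_used, a, true_drifts, probs, lam=0.5, c=1.0, sigma=1.0,
            T=40.0, dt=2e-2, n_paths=200, seed=2):
    """Estimate P(tau < Theta) + c E[(tau - Theta)^+] for the threshold
    rule tau = inf{t : Pi_t >= a}, where Pi is filtered under a
    (possibly misspecified) constant post-change drift `drift_used`."""
    rng = np.random.default_rng(seed)
    l = drift_used
    total = 0.0
    for _ in range(n_paths):
        theta = rng.exponential(1.0 / lam)          # true disorder time
        b_true = rng.choice(true_drifts, p=probs)   # true random magnitude
        pi, tau = 0.0, T
        for k in range(int(T / dt)):
            t = k * dt
            drift = b_true if t >= theta else 0.0
            dY = drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
            # Shiryaev filter run with the believed drift l
            pi += lam * (1 - pi) * dt \
                  + (l / sigma**2) * pi * (1 - pi) * (dY - l * pi * dt)
            pi = min(max(pi, 0.0), 1.0)
            if pi >= a:
                tau = t
                break
        total += (tau < theta) + c * max(tau - theta, 0.0)
    return total / n_paths

risk_mis = mc_risk(drift_used=1.0, a=0.8, true_drifts=[1.0, 2.0],
                   probs=[0.5, 0.5])
```

Comparing such estimates for different values of `drift_used` gives a numerical feel for the robustness bounds discussed in this section.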
Finally, let $\Pi_t = \mathbb{P}(\Theta\leq t\mid\mathcal{F}^{Y^\mu}_t)$ as in Section 2, and define $\hat\tau_l = \inf\{t\geq0 : \Pi_t\geq a_l\}$.
Theorem 3.3
(Robustness with respect to disorder magnitude and intensity).
Suppose that or , and let .
Then (3.3)
where $V^\mu$ and $V^l$ denote the minimal associated Bayes' risks for the models $(c,\sigma,\mu,\lambda)$ and $(c,\sigma,l,\lambda_l)$, respectively.
Also, (3.4)
Suppose , and define the analogous quantities for l = r. If , then (3.5)
Remark 3.2.
Note that (3.3) and (3.5) correspond to situations in which the tester uses a misspecified model. More precisely, filtering and stopping are performed as if the underlying model had a one-point distribution as the disorder magnitude prior (the classical Shiryaev model). Such a situation may appear due to model miscalibration but is also relevant in situations with limited computational resources as the tester can deliberately choose to under/overestimate the actual parameters in order to use a simpler detection strategy. Equation (3.3) thus gives an upper bound for the expected loss when the classical Shiryaev model is employed. In (3.4), on the other hand, filtering is performed according to the correct model but the simple Shiryaev threshold strategy (suboptimal) is used for stopping.
Proof.
1. (a) For definiteness, we consider the case so that l > 0; the other case is completely analogous. First note that the suboptimality of $\tilde\tau_l$ yields the first inequality. Next, observe that the required comparisons hold by the filtering equation. Consequently, we obtain (3.6)
Moreover, since on the time interval , we have
which together with (3.6) yields
(b) The first inequality is immediate by suboptimality of $\hat\tau_l$. For the second one, let U be the value function of the classical Shiryaev problem. Then U is $C^2$ away from the boundary and $C^1$ everywhere, so applying Itô's formula to $U(\Pi)$ and taking expectations at the bounded stopping time $\hat\tau_l\wedge n$, we get an inequality in which monotonicity and concavity of U were used. Letting $n\to\infty$ gives the claim, which finishes the proof of the claim.
2. Recall that
Let τ be a bounded stopping time. Since U is $C^1$ on $[0,1]$ and $C^2$ on $[0,1]\setminus\{a\}$, where $a=a_r$ is the boundary in Shiryaev's problem with drift r and intensity $\lambda_r$, applying Itô's formula to $U(\Pi)$ and taking expectations at τ yields (3.7) and (3.8).
Here concavity was used for the first inequality; (3.7) and (3.8) follow from the corresponding properties of U. Hence, since the same value is obtained if the infimum in (2.3) is restricted to bounded stopping times, the claim follows.
2. Lastly, since $\hat\tau_l$ is a suboptimal strategy, we also have the remaining inequality, which finishes the claim. □
Corollary 3.1.
In the notation above, assume that $\lambda_l = \lambda$ so that there is no misspecification of the intensity. Moreover, assume that $\mathrm{supp}\,\mu\subseteq[l,r]$, where $0 < l \leq r$. Then
$$U_r \leq V^\mu \leq U_l,$$
so monotonicity in the disorder magnitude holds when comparing with deterministic magnitudes. Furthermore,
$$R(\hat\tau_l) - V^\mu \leq U_l - U_r,$$
so the increase in the Bayes' risk due to underestimation (with a constant) of the disorder magnitude is bounded by the difference of two value functions of the classical Shiryaev problem.
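This monotonicity in the disorder magnitude can be checked numerically. The sketch below is an independent illustration with assumed parameter values (σ = 1, λ = 0.5, c = 1): a standard explicit obstacle-clamped value iteration, not code from the paper, computing the classical value functions for two deterministic magnitudes.

```python
import numpy as np

def value_fn(b, sigma=1.0, lam=0.5, c=1.0, n=161, tol=1e-7, max_iter=250000):
    """Explicit obstacle-clamped value iteration for the classical
    problem with constant drift magnitude b; returns (grid, U)."""
    pi = np.linspace(0.0, 1.0, n)
    h = pi[1] - pi[0]
    drift = lam * (1.0 - pi)
    diff = 0.5 * (b / sigma) ** 2 * pi**2 * (1.0 - pi) ** 2
    dt = 0.4 / (drift.max() / h + 2.0 * diff.max() / h**2)  # CFL-stable
    U = 1.0 - pi
    for _ in range(max_iter):
        LU = np.zeros(n)
        LU[1:-1] = (drift[1:-1] * (U[2:] - U[1:-1]) / h
                    + diff[1:-1] * (U[2:] - 2*U[1:-1] + U[:-2]) / h**2)
        LU[0] = drift[0] * (U[1] - U[0]) / h
        Un = np.minimum(U + dt * (LU + c * pi), 1.0 - pi)
        Un[-1] = 0.0
        if np.max(np.abs(Un - U)) < tol:
            return pi, Un
        U = Un
    return pi, U

pi_grid, U_small = value_fn(b=1.0)   # smaller disorder magnitude
_, U_large = value_fn(b=2.0)         # larger disorder magnitude
```

A larger magnitude means a larger signal-to-noise ratio, so the computed value function for b = 2 lies below the one for b = 1 pointwise, as the corollary predicts for deterministic magnitudes.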
We finish with some implications concerning the stopping strategy $\tau_{\mathcal D} = \inf\{t\geq0 : \Pi_t\in\mathcal D\}$, where $\mathcal D$ is the standard abstractly defined optimal stopping set, see Peskir and Shiryaev (2006) (we now assume that we are in the case of time-independent coefficients so that the value function is merely a function of π). The concavity of V, compare Remark 2.2, yields the existence of a boundary γ separating the continuation set from its complement $\mathcal D$. The following result provides a more accurate location of the boundary γ.
Corollary 3.2
(Confined stopping boundary). Assume that the coefficients c, σ and λ are constant and that $\mathrm{supp}\,\mu\subseteq[l,r]$, where $0<l\leq r$. Let $a_l$ and $a_r$ denote the boundaries in the classical Shiryaev problem with disorder magnitude l and r, respectively. Then
$$a_l \leq \gamma \leq a_r,$$
i.e. the stopping boundary is contained in a strip. Moreover, the optimal strategy satisfies
$$\inf\{t\geq0 : \Pi_t\geq a_l\} \;\leq\; \tau_{\mathcal D} \;\leq\; \inf\{t\geq0 : \Pi_t\geq a_r\}.$$
Acknowledgements
We thank the Associate Editor and an anonymous referee for their suggestions to improve the paper.
References
- Bain, A. and Crisan, D. (2009). Fundamentals of Stochastic Filtering. Stochastic Modelling and Applied Probability, 60, New York: Springer.
- Bayraktar, E., Dayanik, S., and Karatzas, I. (2005). The Standard Poisson Disorder Problem Revisited, Stochastic Processes and Their Applications 115: 1437–1450.
- Bayraktar, E., Dayanik, S., and Karatzas, I. (2006). Adaptive Poisson Disorder Problem, Annals of Applied Probability 16: 1190–1261.
- Beibel, M. (1997). Sequential Change-Point Detection in Continuous Time When the Post-Change Drift is Unknown, Bernoulli 3: 457–478.
- Crisan, D. and Rozovskii, B. (2011). The Oxford Handbook of Nonlinear Filtering, Oxford: Oxford University Press.
- Muravlev, A. and Shiryaev, A. (2014). Two-Sided Disorder Problem for a Brownian Motion in a Bayesian Setting, Proceedings of Steklov Institute of Mathematics 287: 202–224.
- Peskir, G. and Shiryaev, A. (2002). Solving the Poisson Disorder Problem, in Advances in Finance and Stochastics, pp. 295–312, Berlin: Springer.
- Peskir, G. and Shiryaev, A. (2006). Optimal Stopping and Free-Boundary Problems, Lectures in Mathematics, ETH Zurich, Basel: Birkhäuser.
- Shiryaev, A. N. (1967). Two Problems of Sequential Analysis, Cybernetics 3: 63–69.
- Shiryaev, A. N. (1978). Optimal Stopping Rules, New York: Springer.
- Zucca, C., Tavella, P., and Peskir, G. (2016). Detecting Atomic Clock Frequency Trends Using an Optimal Stopping Method, Metrologia 53: 89–95.