Abstract
We study the problem of detecting a drift change of a Brownian motion under various extensions of the classical case. Specifically, we consider the case of a random post-change drift and examine monotonicity properties of the solution with respect to different model parameters. Moreover, robustness properties – effects of misspecification of the underlying model – are explored.
1. Introduction
In the classical version of the quickest disorder detection (QDD) problem, see Shiryaev (1967), one observes a one-dimensional process Y which satisfies
$$dY_t = b\,\mathbf{1}_{\{t\geq\Theta\}}\,dt + \sigma\,dW_t,$$
where b and σ are non-zero constants, W is a standard Brownian motion and the disorder time Θ is an exponentially distributed random variable (with intensity λ > 0) such that W and Θ are independent. The associated Bayes' risk (expected cost) corresponding to a stopping rule τ is defined as
$$R(\tau) = \mathbb{P}(\tau < \Theta) + c\,\mathbb{E}\big[(\tau-\Theta)^+\big], \tag{1.1}$$
where c > 0 is the cost of one unit of detection delay. It is well known (see Shiryaev, 1978, Chapter 4) that to minimise the Bayes risk one should stop the first time the conditional probability process $\Pi_t = \mathbb{P}(\Theta\leq t\mid\mathcal{F}^Y_t)$ reaches a certain level $a\in(0,1)$. Moreover, the level a is characterised as the unique solution of a transcendental equation.
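To make the classical setup concrete, the following sketch simulates the model and the threshold rule (our own illustrative parameter values and threshold, not values from any particular application). The posterior Π solves the standard Shiryaev filtering equation $d\Pi_t = \lambda(1-\Pi_t)\,dt + (b/\sigma^2)\Pi_t(1-\Pi_t)(dY_t - b\Pi_t\,dt)$, discretised here by an Euler scheme.

```python
import numpy as np

def simulate_disorder_detection(b=1.0, sigma=1.0, lam=0.5, a=0.8,
                                T=50.0, dt=1e-3, seed=0):
    """Simulate Y with a drift change at Theta ~ Exp(lam) and run
    Shiryaev's posterior process Pi with the threshold rule
    tau = inf{t : Pi_t >= a}.  Parameters are illustrative."""
    rng = np.random.default_rng(seed)
    theta = rng.exponential(1.0 / lam)          # disorder time
    pi = 0.0                                    # posterior P(Theta <= t | data)
    for k in range(int(T / dt)):
        t = k * dt
        drift = b if t >= theta else 0.0        # drift switches on at theta
        dY = drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        # Euler step for dPi = lam(1-Pi)dt + (b/sigma^2) Pi(1-Pi)(dY - b Pi dt)
        pi += lam * (1 - pi) * dt \
              + (b / sigma**2) * pi * (1 - pi) * (dY - b * pi * dt)
        pi = min(max(pi, 0.0), 1.0)             # guard against Euler overshoot
        if pi >= a:
            return t, theta, pi                 # alarm raised at time t
    return None, theta, pi                      # no alarm before the horizon

tau, theta, pi_final = simulate_disorder_detection()
```

Note that the posterior increases even before the disorder (the prior probability that the disorder has occurred grows over time), so the alarm is eventually raised on essentially every path with these parameters.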
In many situations, however, it is natural not to know the exact value of the disorder magnitude b, but merely its distribution. This is the case for example when a specific machine is monitored continuously, and the machine can break down in several possible ways. To study such a situation, we allow for the new drift to be a random variable B with distribution μ such that B is independent of the other sources of randomness. In this setting we study monotonicity properties of the QDD problem, i.e. whether the (minimal) expected cost is monotone with respect to various model parameters. In particular, we study the dependence of the expected cost on the volatility σ, the distribution μ, and the disorder intensity λ. We also study robustness in the QDD problem, i.e. what happens if one misspecifies various model parameters. More specifically, we aim at estimates for the increased cost associated with the use of suboptimal strategies. Clearly, such estimates are helpful in situations where the model is badly calibrated, but also in situations where one chooses to use a simpler suboptimal strategy rather than a computationally more demanding optimal strategy.
As mentioned above, the classical version of the QDD problem was studied in Shiryaev (1967); see also Shiryaev (1978, Chapter 4) and Peskir and Shiryaev (2006, Section 22). For extensions to the case of detecting a change in the intensity of a Poisson process, see Peskir and Shiryaev (2002), Bayraktar et al. (2005) and Bayraktar et al. (2006). For the case of a random disorder magnitude, Beibel (1997) obtains asymptotic results for a problem with normally distributed drift. Concavity of the value function in a related hypothesis testing problem with two possible post-change drift values in a time-homogeneous case was obtained in Muravlev and Shiryaev (2014). Finally, the practical significance of the disorder detection problem in modern engineering applications is explained in Zucca et al. (2016).
2. General model formulation
We model a signal-processing activity on a stochastic basis $(\Omega,\mathcal{F},(\mathcal{F}_t)_{t\geq0},\mathbb{P})$, where the filtration satisfies the usual conditions. We are interested in the signal process X, which is not directly observable, but we can continuously observe the noisy process
$$dY_t = X_t\,dt + \sigma_t\,dW_t. \tag{2.1}$$
Here W is a Brownian motion independent of X, the dispersion σ is deterministic and strictly positive, and the signal process follows
$$X_t = \big(B_0\mathbf{1}_{\{\Theta=0\}} + B_1\mathbf{1}_{\{\Theta>0\}}\big)\mathbf{1}_{\{t\geq\Theta\}}, \tag{2.2}$$
where Θ is a $[0,\infty)$-valued random variable representing the disorder occurrence time. Moreover, $B_0$ and $B_1$ are real-valued random variables corresponding to the disorder magnitudes in the cases 'disorder occurs before we start observing Y' and 'disorder occurs while we observe Y', respectively. Also, Θ, $B_0$ and $B_1$ are independent. Let Θ have the distribution $p\,\delta_0 + (1-p)\,\nu$, where $p\in[0,1)$ and ν is a probability measure on $(0,\infty)$ with a continuously differentiable distribution function $F_\nu$. In addition, denote the distributions of $B_0$ and $B_1$ by $\mu_0$ and $\mu_1$, respectively. When referring to $\mu_0$ and $\mu_1$ collectively, we will simply say that the prior is μ. We assume that $\mu_0$ and $\mu_1$ are supported on a common finite set $\{b_1,\dots,b_n\}$ of non-zero points.
The model studied in this paper is a generalisation of the classical disorder occurrence model of Shiryaev (1967). Firstly, the exponential disorder distribution used in the classical problem is replaced by an arbitrary distribution with time-dependent intensity. This generalisation is advantageous in situations where the intensity of the disorder occurrence changes with time. For example, if the disorder corresponds to a component failure in a system, then for many physical systems the failure intensity is known to increase with age. Also, if occurrence of the disorder depends on external factors such as the weather, then such dependence can be incorporated into the time-dependent disorder intensity from an accurate weather forecast. Moreover, in contrast to the classical problem, in which the disorder magnitude is known in advance, in this generalisation the magnitude takes a value from a range of possible values. Returning to the component failure example, the different possible disorder magnitudes would represent different types of component failure. In the problem of detecting malfunctioning atomic clocks, see Zucca et al. (2016), the disorder corresponds to a systematic drift of a clock. The sign of the disorder magnitude reflects whether a clock starts to go too slow or too fast, while the absolute value represents the severity of the drift. In addition, the different distributions of $B_0$ and $B_1$ and the weight p reflect the prior knowledge about how likely different disorder magnitudes are if the disorder happened before or while observing Y. For instance, such model flexibility is relevant when we start observing the system after a particular incident (e.g. a storm if the system is affected by the weather) and we know that the distribution of possible disorder magnitudes after the incident is different than under normal operating conditions.
From a mathematical point of view, the weight p and the random variable $B_0$ allow us to give a statistical interpretation to an arbitrary starting point of the Markovian embedding (2.7) of the original optimal stopping problem studied later.
Remark 2.1.
We point out that the finite support assumption on μ is made for notational convenience. As any distribution can be approximated arbitrarily well by finitely supported ones, our monotonicity results below extend to general disorder magnitude distributions.
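As a small illustration of this remark (with a hypothetical standard normal magnitude prior, chosen only for concreteness), a continuous distribution can be replaced by a finitely supported approximation on a grid:

```python
import numpy as np
from math import erf, sqrt

def discretize_normal(n=41, lo=-4.0, hi=4.0):
    """Approximate N(0,1) by n atoms: each grid midpoint receives the
    probability mass of its cell; truncated tails are renormalised."""
    edges = np.linspace(lo, hi, n + 1)
    cdf = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
    mass = np.array([cdf(edges[i + 1]) - cdf(edges[i]) for i in range(n)])
    mass /= mass.sum()                       # renormalise after truncation
    points = 0.5 * (edges[:-1] + edges[1:])  # cell midpoints as atoms
    return points, mass

pts, w = discretize_normal()
mean = float(np.dot(pts, w))            # close to 0 by symmetry
second_moment = float(np.dot(pts**2, w))  # close to 1
```

Refining the grid makes the first two moments (and, more generally, weak convergence) as accurate as desired, which is the sense of approximation used in the remark.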
We are interested in a disorder detection strategy τ incorporating two objectives: short detection delay and a small proportion of false alarms. As noted in the introduction, a classical choice of Bayes' risk for a detection strategy to minimise is given by (1.1). In the present paper, we consider a slightly more flexible risk structure by allowing a time-dependent cost for the detection delay. More precisely, we consider the Bayes' risk
$$R(\tau) = \mathbb{P}(\tau < \Theta) + \mathbb{E}\Big[\mathbf{1}_{\{\tau\geq\Theta\}}\int_\Theta^\tau c(s)\,ds\Big],$$
where $\mathbb{P}(\tau<\Theta)$ is a fixed penalty for a false alarm and the integral term is a penalty for detection delay. Here $c:[0,\infty)\to(0,\infty)$ is a deterministic function with c(t) > 0 for all $t\geq0$. Writing $\mathbb{F}^Y=(\mathcal{F}^Y_t)_{t\geq0}$ for the filtration generated by Y (which is our observation filtration), let us introduce $\Pi_t = \mathbb{P}(\Theta\leq t\mid\mathcal{F}^Y_t)$. Then
$$R(\tau) = \mathbb{E}\Big[1-\Pi_\tau + \int_0^\tau c(s)\Pi_s\,ds\Big].$$
Hence the optimal stopping problem to solve is
$$V_* = \inf_{\tau\in\mathcal{T}}\mathbb{E}\Big[1-\Pi_\tau + \int_0^\tau c(s)\Pi_s\,ds\Big], \tag{2.3}$$
where $\mathcal{T}$ denotes the set of $\mathbb{F}^Y$-stopping times.
2.1. Filtering equations
Let us define $\Pi^i_t = \mathbb{P}(\Theta\leq t,\,B=b_i\mid\mathcal{F}^Y_t)$, where $B = B_0\mathbf{1}_{\{\Theta=0\}} + B_1\mathbf{1}_{\{\Theta>0\}}$. By the Kallianpur–Striebel formula, see Crisan and Rozovskii (2011, Theorem 2.9 on p. 39),
$$\Pi^i_t = \frac{p\,\mu_0(\{b_i\})L_t(0,b_i) + (1-p)\,\mu_1(\{b_i\})\int_0^t L_t(s,b_i)\,\nu(ds)}{\sum_{j=1}^n\Big(p\,\mu_0(\{b_j\})L_t(0,b_j) + (1-p)\,\mu_1(\{b_j\})\int_0^t L_t(s,b_j)\,\nu(ds)\Big) + (1-p)\,\nu((t,\infty))} \tag{2.4}$$
for $i=1,\dots,n$, where $L_t(s,b) = \exp\big(\int_s^t \frac{b}{\sigma_u^2}\,dY_u - \frac12\int_s^t \frac{b^2}{\sigma_u^2}\,du\big)$ is the likelihood ratio corresponding to a disorder of size b at time s. Moreover, from the Kushner–Stratonovich equation, see Crisan and Rozovskii (2011, Theorem 3.1 on p. 58), we know that $(\Pi^1,\dots,\Pi^n)$ satisfies
$$d\Pi^i_t = \lambda_t\,\mu_1(\{b_i\})\Big(1-\sum_{j=1}^n\Pi^j_t\Big)dt + \frac{\Pi^i_t}{\sigma_t}\Big(b_i-\sum_{j=1}^n b_j\Pi^j_t\Big)d\hat W_t. \tag{2.5}$$
Here $\lambda_t = F_\nu'(t)/(1-F_\nu(t))$ is the intensity of the disorder occurring at time t > 0 (conditional on it not having occurred yet), and
$$\hat W_t = \int_0^t \frac{1}{\sigma_s}\big(dY_s - \widehat X_s\,ds\big)$$
is a standard Brownian motion with respect to $\mathbb{F}^Y$, see Bain and Crisan (2009) (the process $\hat W$ is referred to as the innovation process). Note that summing (2.5) over i and writing $\Pi_t = \sum_{i=1}^n\Pi^i_t$ yields
$$d\Pi_t = \lambda_t(1-\Pi_t)\,dt + \frac{\widehat X_t(1-\Pi_t)}{\sigma_t}\,d\hat W_t, \tag{2.6}$$
where $\widehat X_t = \sum_{j=1}^n b_j\Pi^j_t$.
The posterior distribution of the signal at time t is determined by the n-tuple $(\Pi^1_t,\dots,\Pi^n_t)$, so the n-tuple fully describes the posterior. As a result, (2.4) and (2.5) provide two different representations of the posterior distribution.
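The Bayes-formula representation of the posterior can be mimicked in discrete time. The sketch below is our own illustrative implementation (all parameter values are assumptions): it maintains joint weights over hypotheses "disorder in step j, magnitude $b_i$", multiplies them by the Gaussian likelihood-ratio factor of each observed increment, and normalises against the "no disorder before the horizon" alternative.

```python
import numpy as np

def run_filter(b_vals, mu1, lam=1.0, sigma=1.0, T=5.0, dt=1e-2,
               theta=None, B=None, seed=1):
    """Discrete-time Bayes filter on a grid of hypotheses (disorder at
    step j, magnitude b_i); the disorder-time prior discretises Exp(lam).
    Returns P(Theta <= T, B = b_i | observed increments) for each i."""
    rng = np.random.default_rng(seed)
    n_steps = int(T / dt)
    b_vals = np.asarray(b_vals, float)
    mu1 = np.asarray(mu1, float)
    if theta is None:
        theta = rng.exponential(1.0 / lam)      # true disorder time
    if B is None:
        B = rng.choice(b_vals, p=mu1)           # true magnitude
    # prior: P(disorder during step j) and P(no disorder before T)
    pj = np.exp(-lam * dt * np.arange(n_steps)) * (1.0 - np.exp(-lam * dt))
    p_none = np.exp(-lam * dt * n_steps)
    w = np.outer(pj, mu1)                       # joint prior weights
    steps = np.arange(n_steps)[:, None]
    for k in range(n_steps):
        drift = B if k * dt >= theta else 0.0
        dY = drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        # likelihood-ratio factor vs the no-drift law (Girsanov form);
        # hypothesis (j, i) has drift b_i in step k iff j <= k
        m = np.where(steps <= k, b_vals[None, :], 0.0)
        w = w * np.exp((m * dY - 0.5 * m**2 * dt) / sigma**2)
    Z = w.sum() + p_none                        # the null keeps factor 1
    return w.sum(axis=0) / Z

Pi_i = run_filter([3.0, -3.0], [0.5, 0.5], theta=0.5, B=3.0)
Pi_total = float(Pi_i.sum())
```

With a clear signal the posterior concentrates on the correct magnitude, and the total disorder probability approaches one, in line with the continuous-time filter.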
2.2. Markovian embedding
Following standard lines in optimal stopping theory, we embed our optimal stopping problem into a Markovian framework. To do that, define a Markovian value function V by
$$V(t,\pi) = \inf_{\tau\in\mathcal{T}_{t,\pi}}\mathbb{E}_{t,\pi}\Big[1-\Pi_\tau + \int_t^\tau c(s)\Pi_s\,ds\Big], \tag{2.7}$$
where $\mathcal{T}_{t,\pi}$ denotes the stopping times with respect to the n-dimensional process $(\Pi^1,\dots,\Pi^n)$ starting from π at time t and satisfying (2.5), and $\Pi = \sum_{i=1}^n\Pi^i$. It is worth noting that $V(t,\pi)$ corresponds to the value of the problem in which the initial time is t and the initial posterior is π.
Remark 2.2.
The value function $\pi\mapsto V(t,\pi)$ in (2.7) is concave for any fixed t. Indeed, the concavity proof in Muravlev and Shiryaev (2014) extends to the current setting. Since concavity is not used in the monotonicity results below, however, we omit the details.
2.2.1. The classical Shiryaev solution
In this subsection we recall the solution in the classical case where the cost c, the intensity λ and the post-change drift b are constants. In that case, we have the optimal stopping problem
$$U(\pi) = \inf_\tau\mathbb{E}_\pi\Big[1-\Pi_\tau + c\int_0^\tau\Pi_s\,ds\Big], \tag{2.8}$$
with an underlying diffusion process
$$d\Pi_t = \lambda(1-\Pi_t)\,dt + \frac{b}{\sigma}\,\Pi_t(1-\Pi_t)\,d\hat W_t.$$
It is well known (see Shiryaev (1978, Chapter 4) or Peskir and Shiryaev (2006, Section 22)) that U solves the free-boundary problem
$$\begin{cases}\lambda(1-\pi)U'(\pi) + \dfrac{b^2}{2\sigma^2}\pi^2(1-\pi)^2U''(\pi) = -c\pi, & \pi\in(0,a),\\[2pt] U(\pi) = 1-\pi, & \pi\in[a,1],\\[2pt] U'(a) = -1.\end{cases} \tag{2.9}$$
Here $a\in(0,1)$ is the free boundary, and it can be determined as the solution of a certain transcendental equation. Moreover, the stopping time $\tau_a = \inf\{t\geq0 : \Pi_t\geq a\}$ is optimal in (2.8), and one can check that the value function U is decreasing and concave.
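The free-boundary characterisation can be checked numerically. The sketch below is our own illustrative scheme (parameter values b = σ = 1, λ = 0.5, c = 1 are assumptions, not values from the paper): it computes U by explicit value iteration for the variational inequality min(LU + cπ, (1−π) − U) = 0 and reads off the boundary a as the first grid point where U touches the obstacle 1 − π.

```python
import numpy as np

def solve_shiryaev(b=1.0, sigma=1.0, lam=0.5, c=1.0, n=201,
                   tol=1e-8, max_iter=300000):
    """Explicit, obstacle-clamped value iteration for the classical
    problem: L U = lam(1-pi)U' + (b/sigma)^2/2 pi^2(1-pi)^2 U''."""
    pi = np.linspace(0.0, 1.0, n)
    h = pi[1] - pi[0]
    drift = lam * (1.0 - pi)
    diff = 0.5 * (b / sigma) ** 2 * pi**2 * (1.0 - pi) ** 2
    dt = 0.4 / (drift.max() / h + 2.0 * diff.max() / h**2)  # CFL-stable
    U = 1.0 - pi                       # start from the obstacle
    for _ in range(max_iter):
        LU = np.zeros(n)
        LU[1:-1] = (drift[1:-1] * (U[2:] - U[1:-1]) / h      # upwind U'
                    + diff[1:-1] * (U[2:] - 2*U[1:-1] + U[:-2]) / h**2)
        LU[0] = drift[0] * (U[1] - U[0]) / h                 # diff(0) = 0
        Un = np.minimum(U + dt * (LU + c * pi), 1.0 - pi)    # clamp at obstacle
        Un[-1] = 0.0                                         # U(1) = 0
        if np.max(np.abs(Un - U)) < tol:
            U = Un
            break
        U = Un
    stop = (U == 1.0 - pi)             # clamped points = stopping region
    a = pi[np.argmax(stop)]            # first point of the stopping region
    return pi, U, a

pi_grid, U, a = solve_shiryaev()
```

As a sanity check, the computed U is decreasing, dominated by the obstacle, and the boundary a lies above λ/(λ + c), the level below which immediate stopping cannot be optimal.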
3. Value dependencies and robustness
3.1. Monotonicity properties of the value function
In this section, we study parameter dependence of the optimal stopping problem (2.7). In particular, we investigate how the value function changes when we alter parameters of the probabilistic model, which include the prior for the drift magnitude and the prior for the disorder time.
The effects of adding more noise, stretching out the prior by scaling, and increasing the observation cost are explained by the following theorem.
Theorem 3.1
(General monotonicity properties of the value function V).
1. V is increasing in the volatility σ.
2. Given a prior μ for the drift magnitude, let $V_k$ denote the Markovian value function (2.7) in the case when the drift prior is $\mu_k$, the law of $kB$ for $B\sim\mu$. Then the map $k\mapsto V_k(t,\pi)$ is decreasing on $(0,\infty)$ for any fixed $(t,\pi)$.
3. V is increasing in the cost function $c(\cdot)$.
Proof.
For simplicity of notation, and without loss of generality, we consider the case t = 0 in the proofs below.
For the volatility, let $\sigma^1$ and $\sigma^2$ be two time-dependent volatility functions satisfying $\sigma^1_t\leq\sigma^2_t$ for all $t\geq0$. Also, let
$$dY^i_t = X_t\,dt + \sigma^i_t\,dW_t, \qquad i=1,2,$$
and let $V^i$, i = 1, 2, be the corresponding value functions. In addition, let $\tilde W$ be a standard Brownian motion independent of W and X. Then, clearly, the process
$$\tilde Y_t := Y^1_t + \int_0^t\sqrt{(\sigma^2_s)^2-(\sigma^1_s)^2}\,d\tilde W_s$$
coincides in law with $Y^2$, and every stopping time of the filtration generated by $\tilde Y$ is also a stopping time of the filtration generated by $(Y^1,\tilde W)$. Hence it follows that $V^1\leq V^2$, which finishes the proof of the claim.
Note that for k > 0, the process $Y^k_t := Y_t/k$ satisfies
$$dY^k_t = \tilde X_t\,dt + \frac{\sigma_t}{k}\,dW_t,$$
where $\tilde X := X/k$ is a signal process whose disorder magnitude prior is μ whenever the prior of X is $\mu_k$. Moreover, the set of $\mathbb{F}^Y$-stopping times coincides with the set of $\mathbb{F}^{Y^k}$-stopping times, so monotonicity in k is implied by monotonicity in the volatility. Thus claim 2 follows from claim 1.
The fact that the value is increasing in c is obvious from the definition (2.7) of the value function.
□
The monotonicity of the minimal Bayes’ risk with respect to volatility σ is of course not surprising: more noise in the observation process gives a smaller signal-to-noise ratio, which slows down the speed of learning. It is less clear how a change in the disorder intensity λ should affect the value function under a general disorder magnitude distribution. However, we have the following comparison result for the case of constant parameters.
Theorem 3.2
(Monotonicity in the intensity for constant parameters). Assume that the disorder magnitude can only take one value $b\neq0$. Let the cost c, the volatility σ and the intensity λ be constants, and assume that $\tilde\lambda\leq\lambda$. Let U be the value function for Shiryaev's problem with parameters $(c,\sigma,b,\lambda)$, and let V denote the value function for the problem specification $(c,\sigma,b,\tilde\lambda)$. Then $U(\pi)\leq V(t,\pi)$ for all $t\geq0$ and $\pi\in[0,1]$.
Proof.
Without loss of generality, we only consider the case t = 0. Let $\pi\in[0,1]$, denote by $\tilde Y$ the observation process corresponding to the model specification $(c,\sigma,b,\tilde\lambda)$, and let $\tilde\Pi$ denote the corresponding process Π started from π at time 0. Let τ be a bounded stopping time. Then, applying (a generalised version of) Itô's formula and taking expectations at the stopping time τ, we get
$$\mathbb{E}\Big[1-\tilde\Pi_\tau + c\int_0^\tau\tilde\Pi_s\,ds\Big] \geq \mathbb{E}\Big[U(\tilde\Pi_\tau) + c\int_0^\tau\tilde\Pi_s\,ds\Big] \geq U(\pi),$$
where we used the monotonicity of U and the fact that
$$\lambda(1-\pi)U'(\pi) + \frac{b^2}{2\sigma^2}\pi^2(1-\pi)^2U''(\pi) + c\pi \geq 0 \tag{3.1}$$
at all points away from the optimal stopping boundary of Shiryaev's classical problem, compare (2.9). Taking the infimum over bounded stopping times τ, we get $U(\pi)\leq V(0,\pi)$, which finishes the proof. □
Remark 3.1.
The monotonicity in intensity does not easily extend to cases with unknown post-change drift by the same argument. In fact, one can check that in higher dimensions the partial derivatives are not necessarily all negative, which implies difficulties with extending the above proof to a more general setting. However, the robustness result in Theorem 3.3 below provides a partial extension in which models with general support for the drift magnitude and general intensities are compared with a fixed parameter model.
Though the authors expect the inequality in Theorem 3.2 to hold also when one time-dependent intensity dominates another, the comparison with the constant intensity case was chosen to avoid additional mathematical complications that need to be resolved in order to apply Ito’s formula to the value function of a time-dependent disorder detection problem.
3.2. Robustness
Robustness concerns how a possible misspecification of the model parameters affects the performance of the detection strategy when evaluated under the real physical measure. In this section, we use coupling arguments to study robustness properties with respect to the disorder magnitude and disorder time. For simplicity, we assume that the parameters λ, c and σ are constant so that we have a time-independent case; generalizations to the time-dependent case are straightforward but notationally more involved.
Thus we assume that the signal process follows
$$X_t = \big(B_0\mathbf{1}_{\{\Theta=0\}} + B_1\mathbf{1}_{\{\Theta>0\}}\big)\mathbf{1}_{\{t\geq\Theta\}}, \tag{3.2}$$
where $B_0$, $B_1$ are random variables with distributions $\mu_0$, $\mu_1$, respectively, and Θ has the distribution $p\,\delta_0 + (1-p)\,\nu$, where ν is an exponential distribution with intensity λ. Let us simply write $\mu = (\mu_0,\mu_1)$.
For a given $l\neq0$, let $\Theta_l$ be a disorder time with distribution $p\,\delta_0 + (1-p)\,\nu_l$, where $\nu_l$ is an exponential distribution with intensity $\lambda_l$. Let $\Pi^l$ denote the conditional probability process in the model with deterministic post-change drift l and disorder time $\Theta_l$, compare (2.4). Also, we introduce the observation processes $Y^\mu$ and $Y^l$ and the process $\tilde\Pi^l$, described as follows. Here $Y^\mu$ is the observation process for a setting in which the post-change drift has distribution μ and the disorder happens at Θ. The process $Y^l$ is the observation process and $\Pi^l$ is the corresponding conditional probability process in the situation of a post-change drift l that occurs at $\Theta_l$. Moreover, the process $\tilde\Pi^l$ represents the conditional probability process calculated as if the drift change is described by $(l,\lambda_l)$ in the scenario where the true drift change is given by $(\mu,\lambda)$.
Now, let $a_l$ denote the optimal stopping boundary for the classical Shiryaev one-dimensional problem in the model $(c,\sigma,l,\lambda_l)$, and define
$$\tau_l = \inf\{t\geq0 : \Pi^l_t\geq a_l\} \quad\text{and}\quad \tilde\tau_l = \inf\{t\geq0 : \tilde\Pi^l_t\geq a_l\}.$$
Here $\tau_l$ is the optimal stopping time in the model $(c,\sigma,l,\lambda_l)$, while $\tilde\tau_l$ is the (sub-optimal) stopping time, with corresponding cost $R(\tilde\tau_l)$, for someone who believes in $(c,\sigma,l,\lambda_l)$ whereas the true model is $(c,\sigma,\mu,\lambda)$.
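The cost of such a misspecified threshold strategy can be estimated by simulation. Below is a Monte Carlo sketch with our own illustrative parameters; in particular, the threshold a = 0.8 is an assumption rather than the solution of Shiryaev's transcendental equation. The tester filters with a constant post-change drift `drift_used`, although the true drift is random.

```python
import numpy as np

def mc_risk(drift_used, a, true_drifts, probs, lam=0.5, c=1.0, sigma=1.0,
            T=40.0, dt=2e-2, n_paths=200, seed=2):
    """Estimate P(tau < Theta) + c E[(tau - Theta)^+] for the threshold
    rule tau = inf{t : Pi_t >= a}, where Pi is filtered under a
    (possibly misspecified) constant post-change drift `drift_used`."""
    rng = np.random.default_rng(seed)
    l = drift_used
    total = 0.0
    for _ in range(n_paths):
        theta = rng.exponential(1.0 / lam)          # true disorder time
        b_true = rng.choice(true_drifts, p=probs)   # true random magnitude
        pi, tau = 0.0, T
        for k in range(int(T / dt)):
            t = k * dt
            drift = b_true if t >= theta else 0.0
            dY = drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
            # Shiryaev filter run with the believed drift l
            pi += lam * (1 - pi) * dt \
                  + (l / sigma**2) * pi * (1 - pi) * (dY - l * pi * dt)
            pi = min(max(pi, 0.0), 1.0)
            if pi >= a:
                tau = t
                break
        total += (tau < theta) + c * max(tau - theta, 0.0)
    return total / n_paths

risk_mis = mc_risk(drift_used=1.0, a=0.8, true_drifts=[1.0, 2.0],
                   probs=[0.5, 0.5])
```

Comparing such estimates for different values of `drift_used` gives a numerical feel for the robustness bounds discussed in this section.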
Finally, let $\Pi_t = \mathbb{P}(\Theta\leq t\mid\mathcal{F}^{Y^\mu}_t)$ as in Section 2, and define $\hat\tau_l = \inf\{t\geq0 : \Pi_t\geq a_l\}$.
Theorem 3.3
(Robustness with respect to disorder magnitude and intensity).
Suppose that or , and let .
Then (3.3)
where $V^\mu$ and $V^l$ denote the minimal associated Bayes' risks for the models $(c,\sigma,\mu,\lambda)$ and $(c,\sigma,l,\lambda_l)$, respectively.
Also, (3.4)
Suppose , and define the analogous quantities for l = r. If , then (3.5)
Remark 3.2.
Note that (3.3) and (3.5) correspond to situations in which the tester uses a misspecified model. More precisely, filtering and stopping are performed as if the underlying model had a one-point distribution as the disorder magnitude prior (the classical Shiryaev model). Such a situation may appear due to model miscalibration but is also relevant in situations with limited computational resources as the tester can deliberately choose to under/overestimate the actual parameters in order to use a simpler detection strategy. Equation (3.3) thus gives an upper bound for the expected loss when the classical Shiryaev model is employed. In (3.4), on the other hand, filtering is performed according to the correct model but the simple Shiryaev threshold strategy (suboptimal) is used for stopping.
Proof.
1. (a) For definiteness, we consider the case so that l > 0; the other case is completely analogous. First note that the suboptimality of $\tilde\tau_l$ yields the first inequality. Next, observe that the required comparisons hold by the filtering equation. Consequently, we obtain (3.6)
Moreover, since on the time interval , we have
which together with (3.6) yields
(b) The first inequality is immediate by suboptimality of $\hat\tau_l$. For the second one, let U be the value function of the classical Shiryaev problem. Then U is $C^2$ away from the boundary and $C^1$ everywhere, so applying Itô's formula to $U(\Pi)$ and taking expectations at the bounded stopping time $\hat\tau_l\wedge n$, we get an inequality in which monotonicity and concavity of U were used. Letting $n\to\infty$ gives the claim, which finishes the proof of the claim.
2. Recall that
Let τ be a bounded stopping time. Since U is $C^1$ on $[0,1]$ and $C^2$ on $[0,1]\setminus\{a\}$, where $a=a_r$ is the boundary in Shiryaev's problem with drift r and intensity $\lambda_r$, applying Itô's formula to $U(\Pi)$ and taking expectations at τ yields (3.7) and (3.8).
Here concavity was used for the first inequality; (3.7) and (3.8) follow from the corresponding properties of U. Hence, since the same value is obtained if the infimum in (2.3) is restricted to bounded stopping times, the claim follows.
2. Lastly, since $\hat\tau_l$ is a suboptimal strategy, we also have the remaining inequality, which finishes the claim. □
Corollary 3.1.
In the notation above, assume that $\lambda_l = \lambda$ so that there is no misspecification of the intensity. Moreover, assume that $\mathrm{supp}\,\mu\subseteq[l,r]$, where $0 < l \leq r$. Then
$$U_r \leq V^\mu \leq U_l,$$
so monotonicity in the disorder magnitude holds when comparing with deterministic magnitudes. Furthermore,
$$R(\hat\tau_l) - V^\mu \leq U_l - U_r,$$
so the increase in the Bayes' risk due to underestimation (with a constant) of the disorder magnitude is bounded by the difference of two value functions of the classical Shiryaev problem.
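This monotonicity in the disorder magnitude can be checked numerically. The sketch below is an independent illustration with assumed parameter values (σ = 1, λ = 0.5, c = 1): a standard explicit obstacle-clamped value iteration, not code from the paper, computing the classical value functions for two deterministic magnitudes.

```python
import numpy as np

def value_fn(b, sigma=1.0, lam=0.5, c=1.0, n=161, tol=1e-7, max_iter=250000):
    """Explicit obstacle-clamped value iteration for the classical
    problem with constant drift magnitude b; returns (grid, U)."""
    pi = np.linspace(0.0, 1.0, n)
    h = pi[1] - pi[0]
    drift = lam * (1.0 - pi)
    diff = 0.5 * (b / sigma) ** 2 * pi**2 * (1.0 - pi) ** 2
    dt = 0.4 / (drift.max() / h + 2.0 * diff.max() / h**2)  # CFL-stable
    U = 1.0 - pi
    for _ in range(max_iter):
        LU = np.zeros(n)
        LU[1:-1] = (drift[1:-1] * (U[2:] - U[1:-1]) / h
                    + diff[1:-1] * (U[2:] - 2*U[1:-1] + U[:-2]) / h**2)
        LU[0] = drift[0] * (U[1] - U[0]) / h
        Un = np.minimum(U + dt * (LU + c * pi), 1.0 - pi)
        Un[-1] = 0.0
        if np.max(np.abs(Un - U)) < tol:
            return pi, Un
        U = Un
    return pi, U

pi_grid, U_small = value_fn(b=1.0)   # smaller disorder magnitude
_, U_large = value_fn(b=2.0)         # larger disorder magnitude
```

A larger magnitude means a larger signal-to-noise ratio, so the computed value function for b = 2 lies below the one for b = 1 pointwise, as the corollary predicts for deterministic magnitudes.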
We finish with some implications concerning the stopping strategy $\tau_{\mathcal D} = \inf\{t\geq0 : \Pi_t\in\mathcal D\}$, where $\mathcal D$ is the standard abstractly defined optimal stopping set, see Peskir and Shiryaev (2006) (we now assume that we are in the case of time-independent coefficients so that the value function is merely a function of π). The concavity of V, compare Remark 2.2, yields the existence of a boundary γ separating the continuation set from its complement $\mathcal D$. The following result provides a more accurate location of the boundary γ.
Corollary 3.2
(Confined stopping boundary). Assume that the coefficients c, σ and λ are constant and that $\mathrm{supp}\,\mu\subseteq[l,r]$, where $0<l\leq r$. Let $a_l$ and $a_r$ denote the boundaries in the classical Shiryaev problem with disorder magnitude l and r, respectively. Then
$$a_l \leq \gamma \leq a_r,$$
i.e. the stopping boundary is contained in a strip. Moreover, the optimal strategy satisfies
$$\inf\{t\geq0 : \Pi_t\geq a_l\} \;\leq\; \tau_{\mathcal D} \;\leq\; \inf\{t\geq0 : \Pi_t\geq a_r\}.$$
Acknowledgements
We thank the Associate Editor and an anonymous referee for their suggestions to improve the paper.
References
- Bain, A. and Crisan, D. (2009). Fundamentals of Stochastic Filtering. Stochastic Modelling and Applied Probability, 60, New York: Springer.
- Bayraktar, E., Dayanik, S., and Karatzas, I. (2005). The Standard Poisson Disorder Problem Revisited, Stochastic Processes and Their Applications 115: 1437–1450.
- Bayraktar, E., Dayanik, S., and Karatzas, I. (2006). Adaptive Poisson Disorder Problem, Annals of Applied Probability 16: 1190–1261.
- Beibel, M. (1997). Sequential Change-Point Detection in Continuous Time When the Post-Change Drift is Unknown, Bernoulli 3: 457–478.
- Crisan, D. and Rozovskii, B. (2011). The Oxford Handbook of Nonlinear Filtering, Oxford: Oxford University Press.
- Muravlev, A. and Shiryaev, A. (2014). Two-Sided Disorder Problem for a Brownian Motion in a Bayesian Setting, Proceedings of Steklov Institute of Mathematics 287: 202–224.
- Peskir, G. and Shiryaev, A. (2002). Solving the Poisson Disorder Problem, in Advances in Finance and Stochastics, pp. 295–312, Berlin: Springer.
- Peskir, G. and Shiryaev, A. (2006). Optimal Stopping and Free-Boundary Problems, Lectures in Mathematics, ETH Zurich, Basel: Birkhäuser.
- Shiryaev, A. N. (1967). Two Problems of Sequential Analysis, Cybernetics 3: 63–69.
- Shiryaev, A. N. (1978). Optimal Stopping Rules, New York: Springer.
- Zucca, C., Tavella, P., and Peskir, G. (2016). Detecting Atomic Clock Frequency Trends Using an Optimal Stopping Method, Metrologia 53: 89–95.