Research Article

Sequential Monte Carlo methods for filtering of unobservable components of multidimensional diffusion Markov processes

Article: 1134031 | Received 21 Sep 2015, Accepted 12 Dec 2015, Published online: 15 Feb 2016

Abstract

The problem of filtering of unobservable components x(t) of a multidimensional continuous diffusion Markov process z(t) = (x(t), y(t)), given observations of the (multidimensional) process y(t) taken at discrete consecutive times with small time steps, is investigated analytically. On the basis of that investigation, new algorithms for the simulation of the unobservable components x(t) and new algorithms of nonlinear filtering based on sequential Monte Carlo methods, or particle filters, are developed and suggested. Observed quadratic variations are also investigated analytically. New closed-form analytical formulae are obtained that characterize the dispersions of the deviations of the observed quadratic variations and the accuracy of some estimates of x(t). As an illustrative example, the estimation of volatility (a problem of financial mathematics) is considered. The new algorithms extend the range of applications of sequential Monte Carlo methods, or particle filters, beyond hidden Markov models and improve their performance.

Public Interest Statement

The problems of filtering an unobservable process of interest, x(t), or of estimating the value of some function f(x(t)) (or functional) from observations of another random process y(t) related to x(t) arise in many areas: signal detection and filtering in radio-physical devices; design of control and intelligent systems; financial mathematics; physical chemistry. In the last two decades, a great deal of work has been devoted to the development and investigation of Monte Carlo filtering methods, or particle filtering, for hidden Markov models, where x(t) is a Markov process in itself. In the present paper, new algorithms for the solution of nonlinear filtering problems (with the use of Monte Carlo calculations) are derived in explicit and closed analytical form for the general case where (x(t), y(t)) is a multidimensional continuous diffusion Markov process. New, effective methods for the solution of difficult nonlinear filtering problems are thereby obtained.

1. Introduction

In the last two decades, a great deal of work has been devoted to the development and investigation of particle filters, or sequential Monte Carlo algorithms, for filtering an unobservable process x(t) := (x_1(t), …, x_m(t)) given observations of another process y(t) := (y_{m+1}(t), …, y_p(t)), taken at discrete times t_k, t_0 < t_1 < t_2 < … < t_k < …, with small time steps Δt_k := t_k − t_{k−1} (see, e.g. the survey (Crisan & Doucet, 2002), the collection (Doucet, Freitas, & Gordon, 2001), the works (Carvalho, Del Moral, Monin, & Salut, 1997; Del Moral, 1998; Doucet, Godsill, & Andrieu, 2000), the comprehensive recent survey (Doucet & Johansen, 2011), the survey (Del Moral & Doucet, 2014), and references therein). The observations y(t_k) are obtained consecutively, and the estimate of x(t_k) should be updated at each time t_k. It is assumed that the whole process z(t) := (x(t), y(t)) := (z_1(t), …, z_p(t)) is a Markov process.

In order to solve the problem of filtering of x(t_n), given the sequence of observations y(t)|_0^n := (y(t_0), y(t_1), …, y(t_n)), with the use of Monte Carlo methods, samples of the random sequences x_i(t)|_0^n := (x_i(t_0), x_i(t_1), …, x_i(t_n)) (from the distribution of x(t_k) given y(t)|_0^k and x(t)|_0^{k−1}) have to be simulated numerically for i = 1, …, N.

In the general case, the first problem that arises is how to obtain an explicit and compact analytical expression (exact or approximate) for the conditional probability density P(x(t_k) | x(t)|_0^{k−1}, y(t)|_0^k), in order to simulate samples of x(t_k) when y(t)|_0^k and x(t)|_0^{k−1} are given.

Suppose that such sample sequences x_i(t)|_0^k are being simulated for i = 1, …, N. The joint probability density of the given observations y(t)|_0^n and the sequence x_i(t)|_0^n can be computed:
\[
P_i(n) := P\big(x_i(t)|_0^n,\, y(t)|_0^n\big) = P_0\big(x_i(t_0), y(t_0)\big)\prod_{k=1}^{n} P\big(x_i(t_k), y(t_k)\,\big|\, x_i(t_{k-1}), y(t_{k-1})\big),
\tag{1}
\]

where P(x(t_k), y(t_k) | x(t_{k−1}), y(t_{k−1})) is the transition probability density of the Markov process (x(t), y(t)), and P_0(x_0, y_0) is the probability density of the joint distribution of the initial value x(t_0) and the initial observation y(t_0).

In a large number of works devoted to particle filters, various resampling algorithms were introduced in order to retain the most a posteriori probable samples. For the problem of searching for the maximum a posteriori probable sample, the following resampling algorithm can be suggested: introduce the values W_i(n) := P_i(n) / Σ_{j=1}^{N} P_j(n), for i = 1, …, N. Then W_i(n) can be considered as the weight of the sample sequence x_i(t)|_0^n (as well as of the sample point x_i(t_n)), which characterizes the a posteriori probability (or the importance) of the sample sequence x_i(t)|_0^n in comparison with the other samples x_j(t)|_0^n given y(t)|_0^n. Due to the Markov property of the process z(t), recursive formulae for P_i(n) and W_i(n) can be written in general form; we have

\[
P_i(n+1) = P_i(n)\, P\big(x_i(t_{n+1}), y(t_{n+1})\,\big|\, x_i(t_n), y(t_n)\big).
\tag{2}
\]

Then it is possible to calculate recursively the new values P_i(n+1) and W_i(n+1) when the new measurement y(t_{n+1}) is obtained and the new sample point x_i(t_{n+1}) is appended to the sequence x_i(t)|_0^n, so that x_i(t)|_0^{n+1} = (x_i(t_0), …, x_i(t_n), x_i(t_{n+1})). It is easy to see that it is not necessary to keep the whole sequence x_i(t)|_0^n in memory, but only the last point x_i(t_n). The samples x_i(t)|_0^n with small weights W_i(n) can then be deleted, while the "more important" samples x_j(t)|_0^n should be continued with a few "offspring", so that the number of all sample sequences under consideration remains equal to N (exactly or approximately). The point x_j(t_n) and the sequence x_j(t)|_0^n that have the maximum weight W_j(n) correspond to the sample point and the sample sequence that are maximally a posteriori probable, given the observations y(t)|_0^n, among all the considered sample points x_j(t_n) and sample sequences x_j(t)|_0^n. The value of x_j(t_n) can then be taken as the sought estimate of x(t_n). On the basis of the Laws of Large Numbers, it can be proved that this Monte Carlo estimate converges to the maximum a posteriori probability estimate of x(t_n) as N increases (see also Miguez, Crisan, & Djuric, 2013).
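As a minimal illustration of this recursive bookkeeping (a sketch only, assuming a user-supplied log transition density of the joint Markov process; the function names are hypothetical), the update (2) and the weights W_i(n) can be accumulated in the log domain to avoid numerical underflow:

```python
import numpy as np

def update_weights(log_P, x_prev, x_new, y_prev, y_new, log_trans_density):
    """One recursive step of (2): P_i(n+1) = P_i(n) * P(x_i(t_{n+1}), y(t_{n+1}) | x_i(t_n), y(t_n)).

    log_P             : array (N,) of accumulated log P_i(n)
    x_prev, x_new     : arrays (N, m) of sample points x_i(t_n) and x_i(t_{n+1})
    y_prev, y_new     : observations y(t_n) and y(t_{n+1})
    log_trans_density : user-supplied log transition density of the joint Markov process
    """
    for i in range(len(log_P)):
        log_P[i] += log_trans_density(x_new[i], y_new, x_prev[i], y_prev)
    w = np.exp(log_P - log_P.max())      # normalized weights W_i(n+1), computed stably
    w /= w.sum()
    return log_P, w

# np.argmax(w) then indexes the maximally a posteriori probable sample point x_j(t_{n+1})
```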

In most works, the additional assumption is accepted that the process x(t) is a Markov process in itself and that the conditional probability density P(y(t_k) | y(t_{k−1}), x(t_k), x(t_{k−1})) (for the observable process y(t)) can be presented in explicit and simple analytical form. Such cases are often referred to as hidden Markov models. Then, in many works, the sample sequences x_j(t)|_0^n are simulated as trajectories of the Markov process x(t) itself, since such a simulation can easily be done. The joint probability density corresponding to the constructed sample x_j(t)|_0^n and the observations y(t)|_0^n is equal to
\[
P\big(x_j(t)|_0^n,\, y(t)|_0^n\big) = P_0\big(x_j(t_0), y(t_0)\big)\prod_{k=1}^{n} P\big(x_j(t_k)\,\big|\, x_j(t_{k-1})\big)\, P\big(y(t_k)\,\big|\, y(t_{k-1}), x_j(t_k), x_j(t_{k-1})\big) = P_j(n).
\tag{3}
\]

Then the weights W_j(n) (or other similar weights) can easily be calculated, and the above procedure of sampling and resampling can be carried out in order to obtain the estimate of x(t_n). In some cases it would be better to simulate the samples x_j(t_k) with the use of the conditional probability density P(x(t_k) | x(t_{k−1}), y(t_k), y(t_{k−1})), which includes the observed value y(t_k). But this could require a large amount of computation if an explicit and compact analytical expression for that conditional probability density is not available. Meanwhile, if the process x(t) is simulated simply as a Markov process in itself, the sample sequences x_j(t)|_0^n are samples from the a priori distribution of x(t_k), which can be far from the a posteriori distribution of x(t_k) given y(t)|_0^k. Then resampling with the use of weights and a significant increase in the number N are needed in order to find the most a posteriori probable values of x(t_n) with the algorithm described above. Nevertheless, for hidden Markov models, particle filters with such a simple simulation of the sample sequences x_j(t)|_0^n, with resampling based on weights but with large N, have been implemented and have proved useful in some applications (Carvalho et al., 1997; Thrun, Fox, Burgard, & Dellaert, 2001).

In the general case, when x(t) represents only some components of a multidimensional Markov process (x(t), y(t)) (so that x(t) is not a Markov process in itself), it has not been shown in the literature how to simulate the sample sequences x_i(t)|_0^n when the values of the other components, y(t)|_0^n, are given (i.e. how to simulate the sample sequences x_i(t)|_0^n without a formidable amount of computation at each time t_k).

Note that the difficulties in obtaining samples x_i(t)|_0^n corresponding to the distributions P(x(t_k) | x(t)|_0^{k−1}, y(t)|_0^k) have led to the introduction and use of various "proposal sampling distributions" (which can be simulated more easily) and "auxiliary particle filters" in many works (see, e.g. the collection (Doucet, De Freitas, & Gordon, 2001) and the survey (Doucet & Johansen, 2011)). But the "optimal proposal sampling distribution" would be exactly the true distribution P(x(t_k) | x(t)|_0^{k−1}, y(t)|_0^k), if the latter could be found in closed form. In the present work, precise, explicit and compact analytical formulae and algorithms for the simulation of the sample sequences x_i(t)|_0^n, when y(t)|_0^n is given, and for the recursive calculation of P_i(n), the weights and the estimates, are obtained for the general case of a multidimensional continuous diffusion Markov process (x(t), y(t)).

For the problem of estimating a function f(x(t_n)) or φ(x(t)|_0^n), new estimates in explicit and closed analytical form are obtained in Section 2.2 below.

Moreover, in the quotients P_i(n)/Σ_{j=1}^{N} P_j(n) the common "scale" of all the P_j(n) cancels, and the information that could indicate that the a posteriori probability of all the generated samples x_j(t)|_0^n is small is lost. In the present work, new sequential Monte Carlo algorithms are derived that include tests (developed in Section 2.4) designed to discard samples of low a posteriori probability before the calculation of all the weights is done. The implementation of the suggested tests guarantees that the samples that remain under consideration belong to the domain where the a posteriori probability density is localized.

In some important cases of hidden Markov models, in the particle filters suggested and studied in the literature, where the samples x_i(t)|_0^n (with i = 1, …, N) are simulated as trajectories of the Markov process x(t) itself, a large part of those N generated samples x_i(t)|_0^n may fail to be localized in the "vicinity" of the "true" realization of the trajectory, x^tr(t)|_0^n, because they are simulated with the use of the a priori transition probability density of x(t), and they may have low a posteriori probability given y(t)|_0^n. For example, if the "level of intensity of noise" in the observations y(t) is small, the chance of randomly generating a sample x(t)|_0^n that belongs to the domain where the a posteriori probability density (given y(t)|_0^n) is concentrated may be as small as the chance of randomly catching a needle in a haystack. Thus, although theoretically in those cases the Monte Carlo estimates converge as N → ∞, the required number N increases as the "level of intensity of noise" decreases. Meanwhile, a large amount of computation is wasted on processing those samples x_i(t)|_0^n of low a posteriori probability. In the present paper, in Section 2.6 below, we shall show (for the problem of nonlinear filtering of a signal) that, with the use of the new algorithm (17) for simulating x_i(t)|_0^n given y(t)|_0^n and with an appropriate change of the mathematical model describing the process of observations (which can be justified in applications), the new algorithm (17) generates at once samples x_i(t)|_0^n that are localized in the a posteriori probable domain given y(t)|_0^n, for all i = 1, …, N.

The special case in which the diffusion coefficients of the observable process y(t) depend on the unobservable components x(t) is also considered. In that case, "observed quadratic variations" of the process y(t) can be introduced (when the observations y(t_k) are taken in discrete time), and they contain a lot of information about x(t). In the present paper, analytical formulae that characterize the observed quadratic variations are obtained in explicit and closed form. On the basis of these results, algorithms of filtering and estimation with the use of the observed quadratic variations are developed.

Theoretically, the derivative of the process of quadratic variation could be incorporated into the set of observed, known processes. If that derivative process were known, then with the use of the new simulation algorithm (17), the new estimates (26) and the algorithms developed in the present paper, the filtering problem would be solved effectively, as shown in Section 2.5. But in practice we cannot assume that this derivative process is observed directly. However, if additional observations of y(t) are available, namely y(t_0 + s·δt) with δt ≪ Δt, s = 1, 2, …, then estimates û(t_k) of the derivative u(t_k) can be obtained and used for the filtering, as shown in Section 2.5.

Similar systems arise if some additional precisely known observations are available, for example, measurements of some function H(x(t), t). In that case, the solution of the filtering problem described in Section 2.5 can be implemented.

The obtained new algorithms improve performance of particle filters, or sequential Monte Carlo methods, and extend the range of their applications beyond the hidden Markov models.

The implementation of particle filters, as is usual for Monte Carlo methods, requires many repeated computations, but it has become accessible and useful in some applications, since the speed of computation and the capacity of computers have increased dramatically in the last two decades. Note that the speed can be increased further with the use of parallel computing. The computational cost of implementing a general particle filtering scheme on existing processing units that allow parallel programming was considered, e.g., in the paper by Hendeby, Karlsson, and Gustafsson (2010).

2. Derivation of new recursive algorithms for Monte Carlo simulation and filtering of unobservable components of multidimensional diffusion Markov processes

2.1. Simulation of trajectories of unobservable components. Analytical investigation of a multidimensional diffusion Markov process observed at discrete times

Consider the multidimensional diffusion Markov process z(t) := (x(t), y(t)) := (z_1(t), …, z_p(t)). The components (z_1(t), …, z_m(t)) := x(t) are unobservable, but the other components (z_{m+1}(t), …, z_p(t)) := y(t) are available for observation, the observations being taken at discrete times t_k.

A diffusion Markov process z(t) with continuous trajectories can be characterized by its drift and diffusion coefficients:
\[
E\big\{z_j(t+\Delta t) - z_j(t) \mid z(t) = z\big\} = A_j(z,t)\,\Delta t + o(\Delta t), \qquad i, j = 1,\dots,p,
\tag{4}
\]
\[
E\big\{(z_i(t+\Delta t) - z_i(t))(z_j(t+\Delta t) - z_j(t)) \mid z(t) = z\big\} = b_{ij}(z,t)\,\Delta t + o(\Delta t),
\tag{5}
\]

where Δt is a small time step. Denote Δz(t) := z(t+Δt) − z(t) := (Δz_1(t), …, Δz_p(t)).

From the assumption that the trajectories of the process z(t) are continuous functions of t it follows (Kolmogorov, 1931/1986, 1933/1986) that
\[
\lim_{\Delta t \to 0}\frac{1}{\Delta t}\, E\big\{\Delta z_{i_1}(t)\cdots\Delta z_{i_r}(t) \mid z(t) = z\big\} = 0, \qquad r > 2.
\tag{6}
\]

The matrix ‖b_ij‖ is symmetric and nonnegative definite. Therefore, in the general case, that matrix can be represented with the use of its "square root" in the form ‖b_ij‖ = ‖a_ij‖ ‖a_ij‖^T, where ‖a_ij‖^T stands for the transpose of the matrix ‖a_ij‖.

Then the process z(t) can be constructed as a solution of the system of stochastic differential equations:
\[
dz_i(t) = A_i(z,t)\,dt + \sum_{j=1}^{p} a_{ij}(z,t)\,dw_j(t), \qquad i = 1,\dots,p, \quad t > t_0,
\tag{7}
\]

where the w_j(t) are independent Wiener processes. The initial condition at t = t_0 is given as z(t_0) = z_0, where z_0 is a random variable independent of all the w_j(t), with given probability density P_0(z_0). The system (7) should be interpreted as the system of stochastic integral equations:
\[
z_i(t) = z_i(t_0) + \int_{t_0}^{t} A_i(z(s),s)\,ds + \sum_{j=1}^{p}\int_{t_0}^{t} a_{ij}(z(s),s)\,dw_j(s), \qquad i = 1,\dots,p,
\tag{8}
\]

with stochastic integrals in the sense of Itô (Doob, 1953). It is assumed that the drift coefficients A_i(z, t) satisfy the Lipschitz condition
\[
\big|A_i(z^{(1)},t) - A_i(z^{(2)},t)\big| \le K\,\big\|z^{(1)} - z^{(2)}\big\|, \qquad K = \text{const}.
\tag{9}
\]

It is also assumed that the diffusion coefficients b_ij(z, t) are continuous and differentiable functions of z and t. The Lipschitz condition (9) guarantees that the trajectories z(t) have no finite escape, i.e. z(t) does not tend to infinity as t tends to some finite moment of time.

The system of integral equations (8) with any given continuous trajectory w(t) := (w_1(t), …, w_p(t)) can be solved by successive approximations, which converge and define the continuous trajectory z(t). Thus, the trajectories of the process z(t) are continuous with probability 1.

The diffusion Markov process z(t) can also be constructed as the limit of solutions of the system of finite difference equations:
\[
z_i(t_k) - z_i(t_{k-1}) = A_i\big(z(t_{k-1}), t_{k-1}\big)\,\Delta t_k + \sum_{j=1}^{p} a_{ij}\big(z(t_{k-1}), t_{k-1}\big)\,\eta_j(t_k)\sqrt{\Delta t_k},
\tag{10}
\]

where the random impacts η_j(t_k) are independent random variables (for all j = 1, …, p, k = 1, 2, …), with E{η_j(t_k)} = 0 and E{η_j(t_k)²} = 1; Δt_k := t_k − t_{k−1}; i = 1, …, p. In particular, the increments of the Wiener processes, Δw_j(t_k) := w_j(t_k) − w_j(t_{k−1}), can be used in the finite difference equations (10) instead of η_j(t_k)√Δt_k:
\[
z_i(t_k) - z_i(t_{k-1}) = A_i\big(z(t_{k-1}), t_{k-1}\big)\,\Delta t_k + \sum_{j=1}^{p} a_{ij}\big(z(t_{k-1}), t_{k-1}\big)\,\Delta w_j(t_k).
\tag{11}
\]

Here again it is assumed that the Lipschitz conditions (9) are satisfied and that the functions A_i(z, t) and b_ij(z, t) are differentiable with respect to z. The construction of diffusion Markov processes by passage to the limit in the scheme of finite difference equations, as the time steps Δt_k tend to zero, was first introduced and investigated by Academician S.N. Bernstein in 1934 and 1938. The works (Bernstein, 1934/1964, 1938/1964) were first published in French; they are republished in the Collected Works of S.N. Bernstein, vol. 4 (in Russian), Nauka (Academy of Sciences of the USSR, Moscow, 1964). The results obtained in these works (Bernstein, 1934/1964, 1938/1964) define analytically all the joint probability distributions of the limit process z(t), i.e. of all the random variables z(t_{i_1}), z(t_{i_2}), …, z(t_{i_r}), for any r and t_{i_r}. By the Theorem of A.N. Kolmogorov on the extension (or continuation) of measures in function space (Kolmogorov, 1933/1950), those multidimensional distributions can be continued to the measure on the σ-algebra that corresponds to the process z(t) with continuous time t. The convergence of the piecewise linear interpolations of the solutions of the finite difference stochastic equations (10) or (11) to the continuous trajectories of diffusion Markov processes was also proved in (Kushner, 1971). Thus, the diffusion Markov process z(t) is well defined by the system of finite difference equations (which could be more general, but similar to (10)) and by the passage to the limit from those Markov processes with discrete time (Bernstein, 1934/1964, 1938/1964). Analytical methods and solutions for some problems of filtering and estimation, based on recursive finite difference stochastic equations with passage to the limit (as the time steps tend to zero), similarly to the scheme of S.N. Bernstein, were developed and investigated in (Khazen, 1968, 1971, 1977, Chapter 3; Khazen, 2009).
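For concreteness, the finite difference scheme (11) can be simulated directly. The sketch below (with illustrative drift and diffusion coefficients that are assumptions of this example, not taken from the paper) generates one trajectory of a two-dimensional process z(t):

```python
import numpy as np

def simulate_z(A, a, z0, t0, dt, n_steps, rng):
    """Euler scheme (11): z(t_k) = z(t_{k-1}) + A(z, t) dt + a(z, t) dW(t_k).

    A : callable(z, t) -> drift vector of length p
    a : callable(z, t) -> (p, p) 'square root' of the diffusion matrix ||b_ij||
    """
    p = len(z0)
    z = np.empty((n_steps + 1, p))
    z[0] = z0
    t = t0
    for k in range(1, n_steps + 1):
        dW = rng.normal(scale=np.sqrt(dt), size=p)   # increments of independent Wiener processes
        z[k] = z[k - 1] + A(z[k - 1], t) * dt + a(z[k - 1], t) @ dW
        t += dt
    return z

# illustrative (hypothetical) coefficients: a simple Ornstein-Uhlenbeck-type pair
rng = np.random.default_rng(0)
A = lambda z, t: np.array([-z[0], z[0]])             # drift A_i(z, t)
a = lambda z, t: np.diag([0.5, 0.3])                 # diffusion 'square root' a_ij(z, t)
traj = simulate_z(A, a, z0=np.array([1.0, 0.0]), t0=0.0, dt=0.01, n_steps=1000, rng=rng)
```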

In the following Section 2.2, the goal is to obtain estimates of x(t_n) or f(x(t_n)) with the use of Monte Carlo calculation of some integrals that can be interpreted as mathematical expectations. We should note the following properties, which are important for achieving that goal. Denote by P_Δ(z(t_{i_1}), …, z(t_{i_r})) the joint probability density of the random process z_Δ(t) that represents the solution of the finite difference equations (10) or (11) with small time steps Δt_k = Δ (and with piecewise constant or piecewise linear interpolation on the small time intervals (t_0 + kΔ, t_0 + (k+1)Δ), k = 0, 1, 2, …). Denote by P(z(t_{i_1}), …, z(t_{i_r})) the joint probability density of the limit diffusion Markov continuous process z(t) obtained as Δ → 0. It was established in the works by Bernstein (1934/1964, 1938/1964) that P_Δ(z(t_{i_1}), …, z(t_{i_r})) → P(z(t_{i_1}), …, z(t_{i_r})) as Δ → 0. Hence, the contribution to the error of the Monte Carlo estimates of the integrals ∫ f(x(t_n)) P(x(t)|_0^n, y(t)|_0^n) dx(t)|_0^n and ∫ P(x(t)|_0^n, y(t)|_0^n) dx(t)|_0^n (which are considered below in (23), (24)), caused by the approximation of the limit diffusion process z(t) by its pre-limit finite difference model (11), tends to zero as Δ → 0. For the scheme (11), it was also proved in (Kushner, 1971, Chapter 10) that the trajectories z_Δ(t) converge in mean square to the trajectories of the limit diffusion process z(t) as Δ → 0, so that E{|z(t) − z_Δ(t)|²} → 0. Consequently, |E{f(z(t))} − E{f(z_Δ(t))}| → 0 if the function f(z) satisfies the Lipschitz condition. Meanwhile, in problems of filtering, the best estimate of f(x(t_n)) (by the criterion of minimum mean square error) is f(x(t_n))^ = E{f(x(t_n)) | y(t)|_0^n}, and, in the general case, the mean square error of filtering, E{(f(x(t_n)) − f(x(t_n))^)²}, remains finite, positive and bounded from below even if the error of calculation of that conditional expectation tends to zero. Thus, the accuracy of the estimate of x(t_n) or f(x(t_n)) could not be noticeably improved even if the accuracy of the approximation of the limit diffusion process z(t) by its finite difference model (11) were increased by decreasing Δ. Therefore, it is justified to use the finite difference model (11) for the description and simulation of the considered process z(t), in order to obtain estimates of x(t_n) or f(x(t_n)) that solve the considered filtering problems (with feasible precision) when the observations y(t_k) are taken at discrete consecutive times t_k with small time steps.

In the present work, we are interested in describing analytically the conditional probability density P(x(t_k) | x(t)|_0^{k−1}, y(t)|_0^k). Consider the local increment Δz of the Markov process z(t) when the value z(t) = z is given, with a small time step Δt; denote Δz := z(t+Δt) − z(t) := (Δx, Δy). We have the relations (4)–(6) for the moments of Δz given z(t) = z. Then the characteristic function of the random value Δz can be presented in the form:

\[
F(u \mid z,t) := E\Big\{\exp\Big(i\sum_{k=1}^{p} u_k\,\Delta z_k\Big)\,\Big|\, z(t)=z\Big\} := \int \exp\Big(i\sum_{k=1}^{p} u_k\,\Delta z_k\Big)\, P(\Delta z \mid z,t)\, d\Delta z
= \exp\Big\{\Delta t\sum_{k=1}^{p} A_k(z,t)\, i u_k + \tfrac{1}{2}\Delta t\sum_{k,l=1}^{p} b_{kl}(z,t)\,(i u_k)(i u_l) + o(\Delta t)\Big\},
\tag{12}
\]

where u = (u_1, …, u_p) is a real vector. The last equality follows from the known expression for the characteristic function of a random variable in terms of its moments (see, for example, the course in probability theory (Gnedenko, 1976)). The inverse transformation provides the representation
\[
P(\Delta z \mid z,t) := (2\pi)^{-p}\int \exp\Big(-i\sum_{k=1}^{p} u_k\,\Delta z_k\Big)\, F(u \mid z,t)\, du
= (2\pi\Delta t)^{-p/2}\,\big(\det\|b_{ij}\|\big)^{-1/2}\exp\Big\{-\frac{1}{2\Delta t}\sum_{i,j=1}^{p} r_{ij}(z,t)\,\big(\Delta z_i - A_i(z,t)\Delta t\big)\big(\Delta z_j - A_j(z,t)\Delta t\big)\Big\},
\tag{13}
\]
where ‖r_ij‖ := ‖b_ij‖^{−1} is the inverse of the matrix ‖b_ij‖ (with 1 ≤ i, j ≤ p). The expressions (12), (13) show that the local increment Δz (over the small but finite time interval Δt) of the multidimensional diffusion Markov process z(t) with continuous trajectories, given z(t) = z, can be considered as a multidimensional Gaussian random variable.

Using the Theorem on Normal Correlation, we find the following expressions for the first and second moments of the conditional probability distribution of the increments Δx_α (with α = 1, …, m), provided that the increments Δy_ρ (with ρ = m+1, …, p) and the value z(t) = z are given (Khazen, 2009, Chapter 3, Sections 3.1.2 and 3.3.1, pages 79–81, 101–106):
\[
E\{\Delta x_\alpha \mid \Delta y, z\} = A_\alpha(z,t)\Delta t + b_{\alpha\sigma}(z,t)\, c_{\sigma\rho}(z,t)\,\big(\Delta y_\rho - A_\rho(z,t)\Delta t\big) + o(\Delta t),
\tag{14}
\]

\[
E\{\Delta x_\alpha \Delta x_\beta \mid \Delta y, z\} = \big(b_{\alpha\beta}(z,t) - b_{\alpha\sigma}(z,t)\, c_{\sigma\rho}(z,t)\, b_{\rho\beta}(z,t)\big)\Delta t + o(\Delta t).
\tag{15}
\]

Hereinafter, summation over repeated indices is assumed, with 1 ≤ α, β ≤ m and m+1 ≤ σ, ρ ≤ p; the matrix ‖c_σρ‖ is the inverse or pseudo-inverse (Moore–Penrose) matrix of the matrix of diffusion coefficients of the observable components y_ρ(t), so that ‖c_σρ‖ := ‖b_σρ‖^+.

For the probability density of the increments Δy, we obtain the following expression:
\[
P\big(\Delta y \mid z(t) = z\big) = C(z,t)\exp\Big\{-\frac{1}{2\Delta t}\, c_{\sigma\rho}(z,t)\,\big(\Delta y_\rho - A_\rho(z,t)\Delta t\big)\big(\Delta y_\sigma - A_\sigma(z,t)\Delta t\big)\Big\},
\tag{16}
\]
where C(z, t) is the normalization factor.

Note that in the theory of filtering of diffusion Markov processes with observations made in continuous time t, i.e. when y(s) is supposed to be known exactly on the time interval t_0 ≤ s ≤ t, it is assumed that the diffusion coefficients b_σρ, with m+1 ≤ σ, ρ ≤ p, do not depend on the unobservable components x(t). In the contrary case, as was pointed out by this author (Khazen, 1977, Chapter 3; Khazen, 2009, Chapter 3), since the diffusion coefficients b_σρ(x(t), y(t), t) could (at least theoretically) be restored precisely on the basis of a single realization of the observed trajectory y(s) on the small time interval t − δ ≤ s ≤ t + δ (no matter how small the value of δ is), the functions b_σρ(x(t), y(t), t) could be incorporated into the set of observable functions. In the problem at hand, when the filtering of the process x(t) is to be carried out on the basis of observations y(t_k) taken at discrete times t_k with small but finite time steps Δt_k, that restriction may be dropped. We shall consider further (in Sections 2.4, 2.5) how to incorporate estimates of the values of b_σρ(x(t_k), y(t_k), t_k) given y(t)|_0^k in order to improve the filtering of x(t_n).

The matrix ‖b_αβ − b_ασ c_σρ b_ρβ‖ is symmetric and nonnegative definite, and it can be represented in the following form:
\[
\|b_{\alpha\beta} - b_{\alpha\sigma}c_{\sigma\rho}b_{\rho\beta}\| := \|g_{\alpha\beta}\|\,\|g_{\gamma\delta}\|^{T}, \qquad 1 \le \alpha, \beta, \gamma, \delta \le m.
\]

Then the samples x_j(t_k), when x_j(t)|_0^{k−1} and y(t)|_0^k are given, can be simulated as
\[
\begin{aligned}
x_{\alpha j}(t_k) = {}& x_{\alpha j}(t_{k-1}) + A_\alpha\big(x_j(t_{k-1}), y(t_{k-1})\big)\Delta t_k + g_{\alpha\beta}\big(x_j(t_{k-1}), y(t_{k-1})\big)\,\xi_{\beta j}(k)\sqrt{\Delta t_k} \\
& + b_{\alpha\sigma}\big(x_j(t_{k-1}), y(t_{k-1})\big)\, c_{\sigma\rho}\big(x_j(t_{k-1}), y(t_{k-1})\big)\,\big[y_\rho(t_k) - y_\rho(t_{k-1}) - A_\rho\big(x_j(t_{k-1}), y(t_{k-1})\big)\Delta t_k\big],
\end{aligned}
\tag{17}
\]
where the ξ_βj(k) are independent samples from the standard Gaussian distribution, with E{ξ_βj(k)} = 0, E{ξ_βj(k)²} = 1 (for all β = 1, …, m, j = 1, …, N, k = 1, 2, …); Δt_k = t_k − t_{k−1}.

We shall write for brevity b_ασ(x_j(t_{k−1}), y(t_{k−1})) := b^j_ασ(t_{k−1}), c_σρ(x_j(t_{k−1}), y(t_{k−1})) := c^j_σρ(t_{k−1}), A_σ(x_j(t_{k−1}), y(t_{k−1})) := A^j_σ(t_{k−1}), c_σρ(x(t_{k−1}), y(t_{k−1})) := c_σρ(t_{k−1}).
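A sketch of one simulation step (17), written in matrix form for an arbitrary partition of z into m unobservable and p − m observable components, is given below; the callables A and b, returning the drift vector and the full diffusion matrix, are assumptions of this sketch.

```python
import numpy as np

def simulate_x_step(x_prev, y_prev, y_new, t_prev, dt, A, b, m, rng):
    """One step of the simulation algorithm (17) for the unobservable components x.

    A : callable(z, t) -> drift vector of length p
    b : callable(z, t) -> (p, p) diffusion matrix ||b_ij||
    m : number of unobservable components, so x = z[:m] and y = z[m:]
    """
    z = np.concatenate([x_prev, y_prev])
    Az = A(z, t_prev)
    B = b(z, t_prev)
    b_xx, b_xy, b_yy = B[:m, :m], B[:m, m:], B[m:, m:]
    c = np.linalg.pinv(b_yy)                          # ||c_sigma_rho|| = pseudo-inverse of ||b_sigma_rho||
    cond_cov = b_xx - b_xy @ c @ b_xy.T               # conditional covariance rate, cf. (15)
    eigval, eigvec = np.linalg.eigh(cond_cov)         # symmetric square root ||g_alpha_beta||
    g = eigvec @ np.diag(np.sqrt(np.clip(eigval, 0.0, None))) @ eigvec.T
    xi = rng.standard_normal(m)                       # independent standard Gaussian samples
    dy = y_new - y_prev
    return (x_prev + Az[:m] * dt + g @ xi * np.sqrt(dt)
            + b_xy @ c @ (dy - Az[m:] * dt))
```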

Note that the above expressions (14)–(17) show that if all the diffusion coefficients b_αρ vanish (with 1 ≤ α ≤ m, m+1 ≤ ρ ≤ p), and if A_α(z, t), b_αβ(z, t) do not depend on y(t), then x(t)|_1^n will be simulated without the use of y(t)|_1^n. Only the initial measurement y(t_0) will be taken into account: the initial sample point x_j(t_0) should be simulated as a random variable with the probability density
\[
P_0\big(x(t_0) \mid y(t_0) = y_0\big) = P_0\big(x(t_0), y_0\big)\Big/\int P_0\big(x(t_0), y_0\big)\, dx(t_0).
\tag{18}
\]

This particular case does not cover all hidden Markov models: the process x(t) may be a Markov process in itself, and yet some coefficients b_αρ may be nonzero, for example, if some "white noises" enter simultaneously into the equations for x(t) and into the "random disturbances or random errors" of the observable process y(t).

In the general case of a multidimensional continuous diffusion Markov process (x(t), y(t)), the influence of the observed data y(t)|_1^n manifests itself in the expressions (17), which describe the simulation of the random samples x_j(t_k).

The values Pi(n) and Wi(n) can be easily calculated:

\[
P_i(n) = P_0\big(x_i(t_0), y(t_0)\big)\prod_{k=1}^{n} P\big(\Delta x_i(t_k)\,\big|\, \Delta y(t_k), x_i(t_{k-1}), y(t_{k-1})\big)\, P\big(\Delta y(t_k)\,\big|\, x_i(t_{k-1}), y(t_{k-1})\big),
\tag{19}
\]

where the conditional probability densities of the increments, Δx_i(t_k) := x_i(t_k) − x_i(t_{k−1}), Δy(t_k) := y(t_k) − y(t_{k−1}), are Gaussian densities with moments determined by (14), (15) and (16). The above procedure of sampling and resampling, which corresponds to a search for the point x(t_n) that maximizes the a posteriori probability density, can then be implemented. In the next Sections 2.2–2.4, the results of further analytical investigation and development of the algorithms are presented.

2.2. Estimating a function f(x(t_n))

Consider the problem of estimating some function f(x(t_n)) or φ(x(t)|_0^n) given y(t)|_0^n. The general expression for the estimate can be written in the following form:
\[
\widehat{f(x(t_n))} = E\big\{f(x(t_n))\,\big|\, y(t)|_0^n\big\} = \frac{\int f(x(t_n))\, P\big(x(t)|_0^n, y(t)|_0^n\big)\, dx(t)|_0^n}{\int P\big(x(t)|_0^n, y(t)|_0^n\big)\, dx(t)|_0^n}.
\tag{20}
\]

It is assumed that for the considered functions f(x) or φ(x(t)|_0^n) this conditional expectation exists. In the case of a Markov process (x(t_n), y(t_n)), we can write:
\[
P\big(x(t)|_0^n, y(t)|_0^n\big) = P_0\big(x(t_0) \mid y(t_0)\big)\, P_0\big(y(t_0)\big)\prod_{k=1}^{n} P\big(x(t_k)\,\big|\, y(t_k), y(t_{k-1}), x(t_{k-1})\big)\times\prod_{s=1}^{n} P\big(y(t_s)\,\big|\, y(t_{s-1}), x(t_{s-1})\big).
\tag{21}
\]

In our case of a multidimensional continuous diffusion Markov process, we have derived the explicit analytical expressions (19), (16). Hence, in this case, we obtain
\[
\prod_{s=1}^{n} P\big(y(t_s)\,\big|\, y(t_{s-1}), x(t_{s-1})\big) = C\big(x(t)|_0^{n-1}, y(t)|_0^{n-1}\big)\exp\Big\{-\frac{1}{2}\sum_{s=1}^{n}\frac{1}{\Delta t_s}\, c_{\sigma\rho}(t_{s-1})\big[\Delta y_\sigma(t_s)\Delta y_\rho(t_s) - 2\Delta y_\sigma(t_s) A_\rho(t_{s-1})\Delta t_s + A_\sigma(t_{s-1}) A_\rho(t_{s-1})(\Delta t_s)^2\big]\Big\},
\tag{22}
\]
where C(x(t)|_0^{n−1}, y(t)|_0^{n−1}) is the known normalization factor. Denote for brevity C_i(n−1) := C(x_i(t)|_0^{n−1}, y(t)|_0^{n−1}).

If N samples x_i(t)|_0^n, with i = 1, …, N and N ≫ 1, have been taken independently from the distribution P_0(x(t_0) | y(t_0)) ∏_{k=1}^{n} P(x(t_k) | y(t_k), y(t_{k−1}), x(t_{k−1})),

then, in accordance with the basics of Monte Carlo methods, the above integrals can be interpreted as mathematical expectations, and we obtain
\[
\int f(x(t_n))\, P\big(x(t)|_0^n, y(t)|_0^n\big)\, dx(t)|_0^n \approx P_0\big(y(t_0)\big)\,\frac{1}{N}\sum_{i=1}^{N} f\big(x_i(t_n)\big)\, C_i(n-1)\times\exp\Big\{-\frac{1}{2}\sum_{s=1}^{n}\frac{1}{\Delta t_s}\, c^i_{\sigma\rho}(t_{s-1})\big[\Delta y_\sigma(t_s)\Delta y_\rho(t_s) - 2\Delta y_\sigma(t_s) A^i_\rho(t_{s-1})\Delta t_s + A^i_\sigma(t_{s-1}) A^i_\rho(t_{s-1})(\Delta t_s)^2\big]\Big\},
\tag{23}
\]
and similarly
\[
\int P\big(x(t)|_0^n, y(t)|_0^n\big)\, dx(t)|_0^n \approx P_0\big(y(t_0)\big)\,\frac{1}{N}\sum_{i=1}^{N} C_i(n-1)\times\exp\Big\{-\frac{1}{2}\sum_{s=1}^{n}\frac{1}{\Delta t_s}\, c^i_{\sigma\rho}(t_{s-1})\big[\Delta y_\sigma(t_s)\Delta y_\rho(t_s) - 2\Delta y_\sigma(t_s) A^i_\rho(t_{s-1})\Delta t_s + A^i_\sigma(t_{s-1}) A^i_\rho(t_{s-1})(\Delta t_s)^2\big]\Big\}.
\tag{24}
\]

Due to the Laws of Large Numbers, the accuracy of those approximate estimates increases as N increases.

Denote for brevity
\[
\Upsilon_i(n) = C_i(n-1)\exp\Big\{-\frac{1}{2}\sum_{s=1}^{n}\frac{1}{\Delta t_s}\, c^i_{\sigma\rho}(t_{s-1})\big[\Delta y_\sigma(t_s)\Delta y_\rho(t_s) - 2\Delta y_\sigma(t_s) A^i_\rho(t_{s-1})\Delta t_s + A^i_\sigma(t_{s-1}) A^i_\rho(t_{s-1})(\Delta t_s)^2\big]\Big\}.
\tag{25}
\]

Then the sought estimate \widehat{f(x(t_n))} can be written in the following form:
\[
\widehat{f(x(t_n))} = \sum_{i=1}^{N} f\big(x_i(t_n)\big)\,\widetilde{W}_i(n), \qquad \widetilde{W}_i(n) = \frac{\Upsilon_i(n)}{\sum_{j=1}^{N}\Upsilon_j(n)}.
\tag{26}
\]

In most applications, we may assume that the mathematical expectation of the random value f(x_i(t_n))Υ_i(n) exists, where y(t)|_0^n is fixed and the x_i(t)|_0^n are the results of independent random simulations with the use of (17); i.e. E{|f(x_i(t_n))Υ_i(n)|} < ∞. This assumption is equivalent to the following: E{|f(x(t_n))| | y(t)|_0^n} < ∞. Denote ζ_i = f(x_i(t_n))Υ_i(n). The random values ζ_i are independent, they have one and the same probability distribution, and E{|ζ_i|} < ∞. Then the Strong Law of Large Numbers (the Theorem of Kolmogorov) guarantees that the sums (i.e. the arithmetic means (1/N)Σ_{i=1}^{N} ζ_i) on the right-hand side of the formulae (23), (24) converge with probability one (i.e. almost surely) to the integrals on the left-hand side as N → ∞.

Note that in the case when the diffusion coefficients b_σρ (for m < σ, ρ ≤ p) of the observable process y(t) do not depend on the unobservable process x(t), the factors C_i(n−1) and c^i_σρ(t_{s−1}) do not depend on i, and the cofactors C_i(n−1) exp{−(1/2)Σ_{s=1}^{n}(1/Δt_s) c_σρ(t_{s−1}) Δy_σ(t_s)Δy_ρ(t_s)} cancel in the numerator and denominator of the formula (26). In that case we can put:
\[
\Upsilon_i(n) = \exp\Big\{-\frac{1}{2}\sum_{s=1}^{n} c_{\sigma\rho}(t_{s-1})\big[-2 A^i_\rho(t_{s-1})\Delta y_\sigma(t_s) + A^i_\sigma(t_{s-1}) A^i_\rho(t_{s-1})\Delta t_s\big]\Big\}.
\tag{27}
\]

Note that the values Υ_i(n) in (25) or (27) can be computed recursively, since they are determined by recursively accumulated sums.
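A minimal sketch of that recursive accumulation (for a one-dimensional observable process, so that c reduces to 1/b_yy; the variable names are illustrative), together with the weighted estimate (26), is:

```python
import numpy as np

def update_log_upsilon(log_ups, A_y, c, dy, dt):
    """Recursive accumulation of the exponent in (27) for all N samples.

    log_ups : array (N,) of accumulated log Upsilon_i(n-1)
    A_y     : array (N,) of the drift A_rho evaluated at each (x_i(t_{n-1}), y(t_{n-1}))
    c       : scalar 1 / b_yy(t_{n-1}) (one-dimensional y assumed here)
    dy, dt  : observed increment y(t_n) - y(t_{n-1}) and the time step
    """
    return log_ups - 0.5 * c * (-2.0 * A_y * dy + A_y ** 2 * dt)

def estimate_f(f_values, log_ups):
    """Weighted estimate (26): sum_i f(x_i(t_n)) W~_i(n), with stable normalization."""
    w = np.exp(log_ups - log_ups.max())
    w /= w.sum()
    return np.sum(f_values * w)
```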

In the case of hidden Markov models, where the unobservable process x(t) is a Markov process in itself and the observable process y(t) is described by a stochastic differential equation (SDE):
\[
dy(t) = h\big(x(t), t\big)\,dt + \sigma\, dw(t),
\tag{28}
\]

where h(x, t) is a given nonlinear function of x, σ = const, and w(t) is a standard Wiener process, the above estimate \widehat{f(x(t_n))} takes a form similar to the estimates of f(x(t_n)) constructed and studied earlier in the literature for this particular case. The novelty of the estimates (23)–(27) and the simulation (17), obtained in the present paper, is that the simulation of the unobservable components and the estimates \widehat{f(x(t_n))} are obtained in explicit and closed analytical form for the general case of partially observable multidimensional continuous diffusion Markov processes, and the obtained Monte Carlo estimate (26) converges with probability one to the sought posterior expectation of f(x(t_n)) given y(t)|_0^n as N → ∞.

We shall now demonstrate that the Monte Carlo estimates (23)–(26), (27) for f(x(t_n)) (obtained in the present paper) hold true also in the case when the sample sequences x_i(t)|_0^n are generated with the use of branching sampling procedures.

Consider the following branching resampling. As was already pointed out in the Introduction, it is purposeful to discard the samples x_i(t)|_0^n that have negligibly small weights W_i(n) or W̃_i(n), in order to decrease the total amount of computation.

Suppose that at the end of each time interval (T_k, T_{k+1}] (with k = 0, 1, 2, …; T_0 = t_0; T_{k+1} = T_k + m_k Δt, where m_k and N_k are given numbers) each sample sequence x_i(t)|_0^n that still exists at the time T_{k+1} is continued with N_{k+1} "offspring". At the initial moment of time, T_0, there are N_0 independently taken initial sample points x_i(t_0), i = 1, …, N_0. When t_n = T_{k+1} and t_{n+1} = T_{k+1} + Δt, sample points x_{i,j}(t_{n+1}) are appended to the sequence x_i(t)|_0^n, with j = 1, …, N_{k+1}, in order to construct the "offspring". The sample points x_{i,j}(t_{n+1}) are taken independently from the distribution P(x(t_{n+1}) | x_i(t)|_0^n, y(t)|_0^{n+1}). Those sample sequences (with "offspring") are then continued until the next time of branching, T_{k+2}, except for the sample sequences that are discarded before T_{k+2} because their weights have become "too small" (for example, smaller than some chosen threshold W_cr); and so on. For simplicity of discussion, we can renumber all the current sample sequences (that still exist at the time t_n) again as x_i(t)|_0^n. Consider first the case when all the sample sequences are continued without discarding. Then their number grows, and at the time T_{k+1} it is equal to N_0 × N_1 × ⋯ × N_k (with k > 0).

Consider the estimate
\[
\overline{f(x(t_n))} = E\big\{f(x(t_n))\,\big|\, y(t)|_0^n\big\},
\]
under the condition that the above branching sampling procedure is implemented. Denote by (t_{r(n)}, t_{(r+1)(n)}] the time interval (between moments of branching) that contains t_n, so that t_{r(n)} = T_{k(n)} < t_n ≤ T_{k(n)+1} = t_{(r+1)(n)}.

From the Markov property of the random process (x(t), y(t)), the factorization (21), (22), and the "tower property" of conditional expectations,
\[
E\big\{f(x(t_n))\,\big|\, y(t)|_0^n\big\} = E\Big\{E\big\{f(x(t_n))\,\big|\, y(t)|_0^n, x(t)|_0^{r(n)}\big\}\Big\} = E\Big\{E\Big\{E\big\{f(x(t_n))\,\big|\, y(t)|_0^n, x(t)|_0^{r(n)}\big\}\,\Big|\, x(t)|_0^{(r-1)(n)}\Big\}\Big\} = \cdots,
\]
it follows that the formulae (23)–(26), (27) hold true for the Monte Carlo estimate of the integral \overline{f(x(t_n))} in the case when the sample sequences x_i(t)|_0^n are generated with the use of the above branching procedure.

We can consider the Monte Carlo estimates of the integrals (23), (24) for each i-th "tree" of branching sample sequences, which begins at the point x_i(t_0), with i = 1, …, N_0. Denote these random values by ξ_i(n), with i = 1, …, N_0. The random values ξ_i(n) are independent of each other, they have one and the same probability distribution, and E{|ξ_i|} < ∞. Hence they obey the Strong Law of Large Numbers, so that (1/N_0)Σ_{i=1}^{N_0} ξ_i converges with probability one to the sought integral as N_0 → ∞.

But many of the exponential weights W̃_i(n) decrease rapidly as the number n of time steps increases. Therefore, it is possible to discard some highly a posteriori improbable sample sequences x_i(t)|_0^n, which do not provide a noticeable contribution to the estimates \widehat{f(x(t_n))}. The numbers m_k, N_k and the threshold W_cr can be adjusted (in a practical implementation for a particular class of applications) in order to decrease the total amount of calculation and make it feasible, and at the same time to keep the current sample points x_i(t_n) in the domain where the a posteriori probability density is localized. In a practical implementation, it can also be purposeful to begin branching not at the fixed moment of time T_k, but at the current moment of time t_n at which the number of all existing sample sequences (or sample points) becomes less than some given fraction of N.

It is worth noting that in the following Section 2.4 the tests (47), (48) are obtained, which allow the detection and rejection of samples x_i(t_n) of low a posteriori probability to be done without the use of the weights W_i(n) or W̃_i(n) (for i = 1, …, N), i.e. independently for each sample sequence x_i(t)|_0^n. Thus, in the above algorithm we can use such tests instead of the comparison of the weight W̃_i(n) with the threshold W_cr. Then the algorithm described above can be implemented effectively with the use of parallel computing, since the simulation and continuation (the branching resampling) of each sample sequence x_i(t)|_0^m can be done independently of the other samples x_k(t)|_0^m with k ≠ i. The values Υ_i(n) determined by (25) or (27) will also be calculated recursively and independently for each sample sequence x_i(t)|_0^n. Thus, that large amount of calculation can be performed effectively with the use of parallel computing. Only at the other stage will all the accumulated values Υ_i(n) (with i = 1, …, N) be used for the calculation of the weights W̃_i(n), which are needed for the calculation of the sought estimate \widehat{f(x(t_n))}.
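A skeleton of this branching bookkeeping is sketched below; the callables advance_sample and is_improbable stand, respectively, for one step of (17) with the accompanying update of Υ_i and for a rejection test of the type (47)/(48), and are assumptions of this sketch. Because each sample is advanced and tested independently of the others, the inner loop parallelizes naturally.

```python
def run_branching_filter(init_samples, observations, advance_sample, is_improbable,
                         branch_every=10, n_offspring=3):
    """Skeleton of the branching procedure described above.

    init_samples   : list of sample states at t_0 (e.g. (x_point, log_upsilon) pairs)
    observations   : sequence of (y_prev, y_new, dt) triples arriving at t_1, t_2, ...
    advance_sample : callable doing one simulation step (17) plus the update of log Upsilon
    is_improbable  : callable implementing a rejection test such as (47) or (48)
    """
    samples = list(init_samples)
    for n, (y_prev, y_new, dt) in enumerate(observations, start=1):
        # each sample is advanced and tested independently of the others (parallelizable)
        samples = [advance_sample(s, y_prev, y_new, dt) for s in samples]
        samples = [s for s in samples if not is_improbable(s)]
        # at branching times, every surviving sequence is continued with several offspring;
        # the copies diverge at the next step because (17) draws fresh Gaussian samples
        if n % branch_every == 0:
            samples = [s for s in samples for _ in range(n_offspring)]
    return samples
```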

The new Monte Carlo algorithm for the estimation of f(x(t_n)) or φ(x(t)|_0^n) presented above is derived directly from the Bayes formula (20). The estimates are constructed with the use of the samples x_i(t_n) or x_i(t)|_0^n simulated by (17), and their weights W̃_i(n) are defined by (23)–(26), (27).

For the particular case of hidden Markov models, another algorithm with random branching resampling for "particle filtering" of an unobservable Markov "signal" x(t_n) was developed, namely the Sequential Importance Resampling (SIR) algorithm; it is studied in the works (Crisan & Doucet, 2002; Del Moral, 1998) for processes with discrete time. In that SIR algorithm with random branching resampling, the a posteriori probability distribution of x(t_n) is approximated by a "cloud of random particles" x_i(t_n) with weights w_i(n) = 1/N, i = 1, …, N. With the use of our new algorithm for simulating the unobservable components x(t_n) given y(t)|_0^n (see Section 2.1, (17)) and the new closed-form analytical expressions (14)–(17), (23)–(27), that SIR algorithm with random branching can be generalized to the general case of filtering of unobservable components x(t) of a multidimensional diffusion Markov process (x(t), y(t)), in the following way.

In the generalized algorithm that we are proposing, the sample sequence x_i(t)|_0^m is chosen for continuation at the time of branching T_m = t_{r(m)} (and kept up to the next time of branching, T_{m+1}) with probability p_i(m) (determined below) in each of N independent attempts to continue the ensemble, so that the expectation of the random number K_i(m) of its "offspring" is equal to E{K_i(m) | x_1(t_{r(m)}), …, x_{K(m)}(t_{r(m)}), y(t)|_0^{r(m)}} = p_i(m)·N, with i = 1, …, K(m), where K(m) denotes the number of all sample points x_i(t_{r(m)}) that still exist at the time t_{r(m)}. We take the times of branching to be T_{m+1} = T_m + MΔt, with M ≥ 1, m = 0, 1, 2, …. (Note that M = 1 in the standard SIR algorithms.) The probability p_i(m) depends on all the sample points x_1(t_m), …, x_{K(m)}(t_m), or sample sequences x_i(t)|_0^m. Such a procedure of random resampling is similar to the standard SIR algorithm, but we have to determine the probability p_i(m) for the general case of multidimensional diffusion Markov processes (x(t), y(t)).

For the general case, with the use of the analytical expressions (16)–(27), we derive the following expression:
\[
p_i(m) = \frac{Q_i(m)\exp\{-R_i(m)\}}{\sum_{j=1}^{K(m)} Q_j(m)\exp\{-R_j(m)\}},
\]

where
\[
R_i(m) = \frac{1}{2}\sum_{s=0}^{M-1}\frac{1}{\Delta t}\, c^i_{\sigma\rho}(t_{r(m-1)+s})\big[\Delta y_\sigma(t_{r(m-1)+s+1})\Delta y_\rho(t_{r(m-1)+s+1}) - 2\Delta y_\sigma(t_{r(m-1)+s+1}) A^i_\rho(t_{r(m-1)+s})\Delta t + A^i_\sigma(t_{r(m-1)+s}) A^i_\rho(t_{r(m-1)+s})(\Delta t)^2\big],
\]
and Q_i(m) denotes the normalization factor of the probability density Q_i(m) exp{−R_i(m)}. Here the notation is the same as earlier in Sections 2.1, 2.2.

In the case when the diffusion coefficients b_σρ of the observed process y(t) do not depend on the unobservable process x(t), the above expression can be written in the following more concise form:
\[
p_i(m) = \frac{\exp\{-V_i(m)\}}{\sum_{j=1}^{K(m)}\exp\{-V_j(m)\}},
\]

where
\[
V_i(m) = \frac{1}{2}\sum_{s=0}^{M-1} c_{\sigma\rho}(t_{r(m-1)+s})\big[-2\Delta y_\sigma(t_{r(m-1)+s+1}) A^i_\rho(t_{r(m-1)+s}) + A^i_\sigma(t_{r(m-1)+s}) A^i_\rho(t_{r(m-1)+s})\Delta t\big].
\]

In the case of hidden Markov models (28), which can be considered as a particular case of a multidimensional diffusion Markov process (x(t), y(t)), the above expression yields
\[
p_i(m) = \frac{\exp\{-\Gamma_i(m)\}}{\sum_{j=1}^{K(m)}\exp\{-\Gamma_j(m)\}},
\]

where
\[
\Gamma_i(m) = \frac{1}{2\sigma^2}\sum_{s=0}^{M-1}\big[-2 h\big(x_i(t_{r(m-1)+s})\big)\Delta y(t_{r(m-1)+s+1}) + h^2\big(x_i(t_{r(m-1)+s})\big)\Delta t\big].
\]

For simplicity of notation, the above formula is written for the case of one-dimensional processes x(t) and y(t) satisfying (28). The last expression for p_i(m) (for the case M = 1) is in agreement with the determination of the probabilities of continuation of the samples x_i(t_n) presented in (Crisan & Doucet, 2002; Del Moral, 1998) for the case of hidden Markov models.
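For the one-dimensional model (28), the branching probabilities p_i(m) can be computed from the accumulated quantities Γ_i(m) as in the sketch below (a direct transcription of the formula above, with a numerically stable normalization; the array layout is an assumption of this sketch):

```python
import numpy as np

def branching_probabilities(h_vals, dy, dt, sigma):
    """p_i(m) for model (28) over one inter-branching interval of M steps.

    h_vals : array (K, M) with h(x_i(t_{r(m-1)+s})) for each surviving sample i and step s
    dy     : array (M,) of observed increments y(t_{r(m-1)+s+1}) - y(t_{r(m-1)+s})
    dt     : time step
    sigma  : observation noise intensity in (28)
    """
    gamma = (1.0 / (2.0 * sigma ** 2)) * np.sum(-2.0 * h_vals * dy + h_vals ** 2 * dt, axis=1)
    w = np.exp(-(gamma - gamma.min()))        # stable normalization of exp(-Gamma_i)
    return w / w.sum()

# the expected number of offspring of sample i is p_i(m) * N; for example,
# offspring_counts = rng.multinomial(N, branching_probabilities(h_vals, dy, dt, sigma))
```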

With the use of the above procedure of random branching and of our algorithm (17) for simulating x(t_n) given y(t)|_0^n, the SIR algorithm of particle filtering with random branching, which was developed for hidden Markov models, is thus generalized to the general case of multidimensional diffusion Markov processes (x(t), y(t)).

Thus, in the present paper several different possible versions of particle filter algorithms are provided for the filtering of the unobservable components x(t_n) given y(t)|_0^n, and they are justified theoretically. The new algorithm developed above, with branching sampling and recursive calculation of the values (25) or (27), appears to be preferable for implementation with the use of parallel computing. Further practical implementation and comparison for various applications could be pursued in future work.

2.3. Analytical investigation of observed quadratic variations

Consider the case when some diffusion coefficients b_σρ of the observable process y(t) depend on the unobservable process x(t). We assume throughout this paper that the diffusion coefficients b_ij(z, t) are continuous and differentiable functions of z. For simplicity of notation, consider first the case when y(t) is one-dimensional. For example, consider the following model, which plays an important role in financial mathematics (Hull, 2000):
\[
dx = A(x,t)\,dt + \sigma_1(x,t)\,dw_1(t), \qquad dy = \Big(\mu - \frac{x^2}{2}\Big)dt + \sigma_2\, x\, dw_2(t).
\tag{29}
\]

Here w_1(t), w_2(t) are independent Wiener processes, μ and σ_2 are known constants, and the functions A(x, t) and σ_1(x, t) are continuous and differentiable with respect to x. In the famous Black–Scholes models (Hull, 2000), the observable process y(t) in (29) represents the natural logarithm of a stock price S(t), so that y(t) := log S(t), while the process x(t) corresponds to the volatility, which is to be estimated given the path y(s), t_0 ≤ s ≤ t. If the process y(s) were available for observation continuously over the time interval t_0 ≤ s ≤ t, the value x(t) could be restored precisely, at least theoretically.

It is well known that a diffusion Markov process z(t) is nowhere differentiable in t. It can be proved that its quadratic variations satisfy
\[
\sum_{k=1}^{M}\big[z_i(t_k) - z_i(t_{k-1})\big]\big[z_j(t_k) - z_j(t_{k-1})\big] \xrightarrow[\max\Delta t_k \to 0]{} \int_{t-\delta}^{t+\delta} b_{ij}\big(z(s), s\big)\,ds,
\tag{30}
\]
as the time steps Δt_k tend to zero while M → ∞; here t − δ = t_0 < t_1 < ⋯ < t_k < ⋯ < t_M = t + δ, with given δ > 0, and the value of δ can be arbitrarily small. Note that the relation (30) will also be obtained as a consequence of the new analytical estimates developed below.

For the system (29), b_22(x(t), y(t), t) = σ_2² x²(t), and from (30) it follows that the value of x²(t) could be restored with any desired accuracy. Then x(t) = +√(x²(t)), and the filtering problem would be solved exactly.

But the observations y(t_k) are taken at discrete times t_k with small but finite time steps Δt_k, so that it is impossible to observe the values of b_σρ(x(t_k), y(t_k), t_k) precisely. In the general case, we can calculate the "observed quadratic variations",
\[
V^{\mathrm{obs}}_{\sigma\rho}(t_k) := \sum_{s=0}^{M}\big[y_\sigma(t_{k-s}) - y_\sigma(t_{k-s-1})\big]\big[y_\rho(t_{k-s}) - y_\rho(t_{k-s-1})\big],
\tag{31}
\]

and consider them as estimates of b_σρ(x(t_k), y(t_k), t_k). Similar estimates for volatilities were introduced and considered in (Hull, 2000, Chapter 15). It is more convenient to use recursive averaging instead of the moving averaging (31). Then
\[
V^{\mathrm{obs}}_{\sigma\rho}(t_k) := e^{-\lambda\Delta t_k}\, V^{\mathrm{obs}}_{\sigma\rho}(t_{k-1}) + \big[y_\sigma(t_k) - y_\sigma(t_{k-1})\big]\big[y_\rho(t_k) - y_\rho(t_{k-1})\big],
\tag{32}
\]
with λ > 0. In the limit, if all the Δt_k decreased and tended to zero, we would obtain
\[
V^{\mathrm{obs}}_{\sigma\rho}(t) \xrightarrow[\max\Delta t_k \to 0]{} \int_{t_0}^{t} e^{-\lambda(t-s)}\, b_{\sigma\rho}\big(x(s), y(s), s\big)\,ds.
\tag{33}
\]
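The recursion (32) is a one-line exponentially weighted update; a sketch for a one-dimensional observable process is:

```python
import numpy as np

def observed_quadratic_variation(y, dt, lam):
    """Recursive observed quadratic variation (32) for a one-dimensional observation sequence y(t_k)."""
    V = np.zeros(len(y))
    for k in range(1, len(y)):
        dy = y[k] - y[k - 1]
        V[k] = np.exp(-lam * dt) * V[k - 1] + dy * dy
    return V
```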

We can incorporate the observed quadratic variations V^obs_σρ(t_k) into the set of observed data. We shall show further, in Section 2.4, how to use these "observed quadratic variations" in order to reject at once the samples x_i(t)|_0^k that are highly improbable given y(t)|_0^k.

The recursive formula (32) implies that
\[
V^{\mathrm{obs}}_{\sigma\rho}(t_k) = \sum_{s=1}^{k} e^{-\lambda(t_k - t_s)}\big[y_\sigma(t_s) - y_\sigma(t_{s-1})\big]\big[y_\rho(t_s) - y_\rho(t_{s-1})\big].
\tag{34}
\]

We may assume that the considered realization of the process z(t), (x(t_0), y(t_0)), (x(t_1), y(t_1)), …, (x(t_s), y(t_s)), …, is obtained consecutively in accordance with the finite difference stochastic equations (11). The value of the observed quadratic variation V^obs_σρ(t_k) is accumulated in parallel, along with that realization of z(t), so that the random increment Δy_ρ(t_s) := y_ρ(t_s) − y_ρ(t_{s−1}) is produced after (x(t_{s−1}), y(t_{s−1})) is realized. We are interested in describing the properties of the observed quadratic variations V^obs_σρ(t_k) under the condition that the realization of the unobservable components, x(t)|_0^k, is fixed, although unknown, and that the next measurement y(t_s) arrives after the previous measurement y(t_{s−1}) is already given.

The conditional expectation is
\[
E\big\{\Delta y_\sigma(t_s)\Delta y_\rho(t_s)\,\big|\, x(t_{s-1}), y(t_{s-1})\big\} = b_{\sigma\rho}\big(x(t_{s-1}), y(t_{s-1}), t_{s-1}\big)\Delta t_s.
\]

In the general case, as demonstrated above, the increment Δy = (Δy_{m+1}, …, Δy_p) over a small time step Δt, given z(t) = z, can be described as a multidimensional Gaussian random variable with the probability density (16). The increments Δy(t_k) are independent of the past history before t_{k−1} if z(t_{k−1}) is given. The following properties of Gaussian distributions will be useful in the sequel: if ξ is a Gaussian random variable with Eξ = 0, then Eξ⁴ = 3(Eξ²)². If (ξ_1, ξ_2, ξ_3, ξ_4) is a four-dimensional Gaussian random variable with Eξ_i = 0 (for i = 1, …, 4), then
\[
E\{\xi_1\xi_2\xi_3\xi_4\} = E\{\xi_1\xi_2\}E\{\xi_3\xi_4\} + E\{\xi_1\xi_3\}E\{\xi_2\xi_4\} + E\{\xi_1\xi_4\}E\{\xi_2\xi_3\}.
\]

In the general case of a multidimensional process y(t), we obtain
\[
E\big\{\big(\Delta y_\sigma(t_k)\Delta y_\rho(t_k) - b_{\sigma\rho}(t_k)\Delta t_k\big)^2\,\big|\, x(t_{k-1}), y(t_{k-1})\big\} = \big[b_{\sigma\sigma}\big(x(t_{k-1}), y(t_{k-1}), t_{k-1}\big)\Delta t_k\big]\big[b_{\rho\rho}\big(x(t_{k-1}), y(t_{k-1}), t_{k-1}\big)\Delta t_k\big] + \big[b_{\sigma\rho}\big(x(t_{k-1}), y(t_{k-1}), t_{k-1}\big)\Delta t_k\big]^2.
\]

Denote for brevity b_σρ(x(t_k), y(t_k), t_k) := b_σρ(t_k). Introduce the following value:
\[
\sum_{s=1}^{k} e^{-\lambda(t_k - t_s)}\, b_{\sigma\rho}(t_{s-1})(t_s - t_{s-1}) := \tilde{b}_{\sigma\rho}(t_k).
\tag{35}
\]

Formula (35) can be written in recursive form, similarly to (32):
\[
\tilde{b}_{\sigma\rho}(t_k) = e^{-\lambda(t_k - t_{k-1})}\,\tilde{b}_{\sigma\rho}(t_{k-1}) + b_{\sigma\rho}(t_{k-1})(t_k - t_{k-1}).
\tag{36}
\]

Consider the deviation
\[
\psi_{\sigma\rho}(t_k) := V^{\mathrm{obs}}_{\sigma\rho}(t_k) - \tilde{b}_{\sigma\rho}(t_k).
\tag{37}
\]

For simplicity of discussion, suppose that all the time steps are equal, Δt_s = Δt. Then we obtain E{ψ_σρ(t_k)} = o(Δt). This notation means that this small value decreases faster than Δt as Δt decreases.

The increments Δy_ρ(t_s) and Δy_ρ(t_q) with t_q < t_s are independent when (x(t_{s−1}), y(t_{s−1})) is given. We obtain that the dispersion of the deviation ψ_σρ(t_k) (37) (under the condition that x(t)|_0^k is fixed) is equal to
\[
\mathrm{Var}\{\psi_{\sigma\rho}(t_k)\} := E\big\{\psi_{\sigma\rho}(t_k)^2\,\big|\, x(t)|_0^k\big\} = \sum_{s=1}^{k} e^{-2\lambda(t_k - t_s)}\, E\big\{b_{\sigma\sigma}(t_{s-1})\, b_{\rho\rho}(t_{s-1}) + b_{\sigma\rho}(t_{s-1})^2\,\big|\, x(t)|_0^{s-1}\big\}\,(t_s - t_{s-1})^2.
\tag{38}
\]

In the case when the diffusion coefficients of the observable process, b_σρ(z, t), are functions of x and t only, so that b_σρ(z(t), t) = b_σρ(x(t), t) (with m+1 ≤ σ, ρ ≤ p), we obtain
\[
\mathrm{Var}\{\psi_{\sigma\rho}(t_k)\} = \sum_{s=1}^{k} e^{-2\lambda(t_k - t_s)}\big[b_{\sigma\sigma}(t_{s-1})\, b_{\rho\rho}(t_{s-1}) + b_{\sigma\rho}(t_{s-1})^2\big](t_s - t_{s-1})^2 := \tilde{d}_{\sigma\rho}\big(x(t)|_0^k, t_k\big) := \tilde{d}_{\sigma\rho}(t_k).
\tag{39}
\]

The value (39) can also be calculated recursively, similarly to (36). Note that the value d̃_σρ(t_k) is proportional to Δt. If σ = ρ, the above expression takes the more concise form:
\[
\tilde{d}_{\rho\rho}(t_k) = 2\sum_{s=1}^{k} e^{-2\lambda(t_k - t_s)}\big[b_{\rho\rho}(t_{s-1})\big]^2 (t_s - t_{s-1})^2.
\tag{40}
\]

In the general case, if b_σρ(z(t), t) depends on y(t), we can find bounds E{b_σρ(t_s) | x(t)|_0^{s−1}} < K_σρ < ∞, where the K_σρ may be determined either as functions of x(t)|_0^{s−1} or as constants K_σρ ≤ K_1. Then for the variance (38) we obtain the following estimates:
\[
\mathrm{Var}\{\psi_{\sigma\rho}(t_k)\} < 2\sum_{s=1}^{k} e^{-2\lambda(t_k - t_s)}\big(K_{\sigma\rho}(x(t)|_0^{s-1})\big)^2(\Delta t_s)^2, \qquad \mathrm{Var}\{\psi_{\sigma\rho}(t_k)\} < (K_1)^2\Big(\frac{1}{\lambda}\Big)\Delta t.
\tag{41}
\]

The process z(t) is considered on the finite time interval t_0 ≤ s ≤ T, so that the values of K_σρ(x(t)|_0^{s−1}) are bounded. Thus, the variance of the deviation (37) is proportional to Δt, and if Δt → 0 the observed quadratic variation tends to the limit (33). We shall demonstrate below that the probability distribution of the deviation ψ_ρρ or ψ_σρ tends to the Gaussian one as Δt → 0. It follows that the deviations converge to zero if Δt → 0.

Besides, we have obtained the new analytical formulae (38)–(41), which describe the dispersion of the deviation (37) of the observed quadratic variation when the observations are taken at discrete times with small time steps.

For the system (29), the general analytical formulae (38)–(40) obtained in the present paper characterize the accuracy of the estimate of the volatility x(t_k) constructed with the use of the observed quadratic variation. If this accuracy satisfies the requirements, the problem of estimation is solved. In that case the given time step Δt can be considered "small enough".
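As an illustration, the quantities (32), (36) and (40) can be accumulated recursively along a simulated path of the model (29); the parameter values and the volatility path below are purely illustrative assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
dt, lam, sigma2, mu, n = 0.001, 5.0, 0.4, 0.05, 5000
# illustrative (hypothetical) volatility path x(t); any slowly varying path will do
x = 0.2 + 0.05 * np.cumsum(rng.normal(scale=np.sqrt(dt), size=n))
V_obs = b_tilde = d_tilde = 0.0
decay = np.exp(-lam * dt)
for k in range(1, n):
    b22 = (sigma2 * x[k - 1]) ** 2                              # b_22 = sigma_2^2 x^2 in (29)
    dy = (mu - x[k - 1] ** 2 / 2) * dt + sigma2 * x[k - 1] * rng.normal(scale=np.sqrt(dt))
    V_obs = decay * V_obs + dy * dy                             # recursion (32)
    b_tilde = decay * b_tilde + b22 * dt                        # recursion (36)
    d_tilde = decay ** 2 * d_tilde + 2.0 * b22 ** 2 * dt ** 2   # recursion implied by (40)
print("deviation psi =", V_obs - b_tilde, " standard deviation =", np.sqrt(d_tilde))
```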

In the general case, if the value of x(t_k) cannot be uniquely recovered on the basis of the observed quadratic variations alone, further filtering may be needed.

Finally, we are going to demonstrate that the probability density of the random value ψ_ρρ(t_k) can be approximately described as a Gaussian one. The deviation ψ_ρρ(t_k) contains the sum of squares of the increments Δy_ρ(t_s), which are independent of the past history before t_{s−1} given y(t_{s−1}), x(t_{s−1}). The increments Δy_ρ(t_s) (given y(t_{s−1}), x(t_{s−1})) can be described as Gaussian random variables with probability density (16). A sum of squares of Gaussian random variables is not a Gaussian random variable. However, the probability density of the random deviation ψ_ρρ(t_k) can be approximately described by a Gaussian probability density. Consider the case when the b_σρ(z(t), t) (with m+1 ≤ ρ, σ ≤ p) are functions of x(t) and t only. Then the variance of the deviation ψ_ρρ(t_k) is equal to d̃_ρρ(t_k) (40), which is a value proportional to Δt (here the time step Δt can be small, but Δt is finite).

Note that for a Gaussian random variable ξ with Eξ = 0 the following relations hold: E{ξ^{2r}} = Q(r)(E{ξ²})^r, where Q(r) is a constant, and the odd moments E{ξ^{2r+1}} = 0, with r = 1, 2, 3, …. These relations are useful in estimating the moments of Δy_ρ(t_s) when (y(t_{s−1}), x(t_{s−1})) is given. Consideration of the higher moments of the deviation then shows that E{ψ_ρρ(t_k)^q} = o(Δt) for q > 2. Hence the characteristic function of the random deviation ψ_ρρ(t_k) takes a form similar to (12), which implies a Gaussian expression for its inverse transformation, similar to (13). Consequently, the probability density of that deviation can be approximately described as a Gaussian probability density. The smaller Δt, the higher the precision of that approximation, although the deviation itself tends to zero as Δt → 0. But in practice the time step should not be chosen too small, since in our mathematical model y(t) is considered as a diffusion random process, which is nowhere differentiable in t. That is not the case in practice, where we have a smooth trajectory ỹ(t), and the quadratic variation of ỹ(t) is equal to zero. The diffusion approximation can be accepted when the time steps Δt_s ≥ τ_cor, where τ_cor is the characteristic time span of the decay of correlation, such that the correlation between the random values Δỹ(t_s) and Δỹ(t_{s−1}) (given x(t_{s−1}), ỹ(t_{s−1})) is negligibly small (Khazen, 1968, 1971, 1977, Chapters 2, 3; Khazen, 2009).

2.4. Detection and rejection of highly a posteriori improbable samples

In the process of resampling, the a posteriori improbable samples x(t)|_0^n are rejected when their weights become small. In addition, it is possible to discard some highly a posteriori improbable samples before all the weights W_i(n) or W̃_i(n) are calculated. This can be done with the use of the following tests, which are based on analysis of the obtained analytical expressions (19), (21), (22).

The measurements y(t)|_0^k are obtained under the condition that there is an unobservable realization of the process x(t), x^tr(t)|_0^n, which is unknown. (Here the superscript "tr" stands for "true".) The first and second moments of the random variables Δy_ρ(t_s) given x^tr(t)|_0^k are equal to
\[
E\big\{\Delta y_\rho(t_s)\,\big|\, x^{\mathrm{tr}}(t_{s-1}), y(t_{s-1})\big\} = A_\rho\big(x^{\mathrm{tr}}(t_{s-1}), y(t_{s-1}), t_{s-1}\big)\Delta t_s := A^{\mathrm{tr}}_\rho(t_{s-1})\Delta t_s,
\tag{42}
\]
\[
E\big\{\Delta y_\sigma(t_s)\Delta y_\rho(t_s)\,\big|\, x^{\mathrm{tr}}(t_{s-1}), y(t_{s-1})\big\} = b_{\sigma\rho}\big(x^{\mathrm{tr}}(t_{s-1}), y(t_{s-1}), t_{s-1}\big)\Delta t_s + o(\Delta t_s) := b^{\mathrm{tr}}_{\sigma\rho}(t_{s-1})\Delta t_s + o(\Delta t_s).
\]

For simplicity of discussion we can assume that all the time steps are equal to Δt.

Consider first the case when the diffusion coefficients b_σρ of the observable process y(t) do not depend on x(t); then b^i_σρ = b^tr_σρ = b_σρ.

The value Pi(k) (19) contains the following cofactor:

\[
C\exp\Big\{-\frac{1}{2}\sum_{s=1}^{k}\frac{1}{\Delta t_s}\, c_{\sigma\rho}(t_{s-1})\big[\Delta y_\sigma(t_s)\Delta y_\rho(t_s) - 2\Delta y_\sigma(t_s) A^i_\rho(t_{s-1})\Delta t_s + A^i_\sigma(t_{s-1}) A^i_\rho(t_{s-1})(\Delta t_s)^2\big]\Big\},
\tag{43}
\]
where the normalization factor C does not depend on x_i(t)|_0^k in that case.

Consider the hypothesis H_i that the sample x_i(t)|_0^k is situated in the vicinity of the true realization x^tr(t)|_0^k, where "closeness" means that |A^i_ρ(t_{s−1}) − A^tr_ρ(t_{s−1})| ≤ D_ρ(x_i(t_{s−1}), y(t_{s−1}), t_{s−1}) := D^i_ρ(t_{s−1}). Here the function D_ρ(x, y, t) can be determined on the basis of some preliminary analysis of the considered dynamical system; for example, we can put D_ρ(x, y, t) = |∂A_ρ(x, y, t)/∂x_α| R_α, where the R_α are constants, 1 ≤ α ≤ m. In order to distinguish between the two hypotheses, H_i and its negation H̄_i, consider the following value:
\[
Q_i(t_k) := \sum_{s=1}^{k} c_{\sigma\rho}(t_{s-1})\, A^i_\rho(t_{s-1})\,\Delta y_\sigma(t_s).
\tag{44}
\]

Note that Q_i(t_k) can be calculated recursively. Denote
\[
G_i(t_k) := \sum_{s=1}^{k} c_{\sigma\rho}(t_{s-1})\, A^i_\sigma(t_{s-1})\, A^i_\rho(t_{s-1})\,\Delta t_s.
\tag{45}
\]

The larger the difference |Q_i(t_k) − G_i(t_k)|, the farther the sample x_i(t)|_0^k can be from x^tr(t)|_0^k. Denote Δỹ_σ(t_s) = Δy_σ(t_s) − A^tr_σ(t_{s−1})Δt_s and U_i(t_k) := Σ_{s=1}^{k} c_σρ(t_{s−1}) A^i_ρ(t_{s−1}) Δỹ_σ(t_s).

Consider the variance of the random value U_i(t_k) when the realization x^tr(t)|_0^k is given. We obtain
\[
\mathrm{Var}\{U_i(t_k)\} = \sum_{s=1}^{k} E\big\{c_{\sigma\rho}(t_{s-1})\, b_{\sigma\eta}(t_{s-1})\, c_{\eta\tau}(t_{s-1})\, A^i_\rho(t_{s-1})\, A^i_\tau(t_{s-1})\,\big|\, x^{\mathrm{tr}}(t)|_0^{s}\big\}\,\Delta t_s.
\]

If the matrix ‖b_σρ‖ is not degenerate, then b_ση c_ητ = δ_στ, where δ_στ is the Kronecker symbol: δ_στ = 1 if σ = τ and δ_στ = 0 if σ ≠ τ. In that case, the above expression for Var{U_i(t_k)} can be written in the more concise form:
\[
\mathrm{Var}\{U_i(t_k)\} = \sum_{s=1}^{k} E\big\{c_{\sigma\rho}(t_{s-1})\, A^i_\rho(t_{s-1})\, A^i_\sigma(t_{s-1})\,\big|\, x^{\mathrm{tr}}(t)|_0^{s}\big\}\,\Delta t_s.
\tag{46}
\]

In order to reject the samples x(t)|_0^k that are highly improbable given y(t)|_0^k, we can use, approximately, the following test. If the considered dynamical system is stable, the following value can be used as an approximate estimate of the expression (46), provided the hypothesis H_i is true and the measurements y(t)|_0^k are obtained:
\[
\widetilde{\mathrm{Var}}\{U_i(t_k)\} = \sum_{s=1}^{k} c_{\sigma\rho}(t_{s-1})\, A^i_\rho(t_{s-1})\, A^i_\sigma(t_{s-1})\,\Delta t_s.
\]

The probability of the event
\[
\big|Q_i(t_k) - G_i(t_k)\big| > \sum_{s=1}^{k} c_{\sigma\rho}(t_{s-1})\, A^i_\rho(t_{s-1})\, D^i_\sigma(t_{s-1})\,\Delta t_s + 3\sqrt{\widetilde{\mathrm{Var}}\{U_i(t_k)\}}
\tag{47}
\]
is small under the condition that H_i is true. Hence, if the inequality (47) is satisfied for the considered sample x_i(t)|_0^k, the hypothesis H_i should be rejected, and that sample x_i(t)|_0^k should be discarded. If all the considered sample sequences are discarded, new sample points x(t_k) should be generated as initial points, i.e. as samples from the initial distribution P_0(x(t_k) | y(t_k)) (18). For the samples x_i(t)|_0^k that remain under consideration, the values P_i(k) and W_i(k) or W̃_i(k) will be computed recursively. The samples that were not discarded can be continued with a few "offspring". Owing to the test (47), the samples x_i(t)|_0^k that are highly a posteriori improbable (when the sequence of observations y(t)|_0^k is given) will be rejected at once, before the weights W_j(k) or W̃_j(k) are computed.
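A minimal sketch of the test (47), written for a one-dimensional observable process (so that c_σρ reduces to 1/b_yy) with recursive accumulation of Q_i, G_i, the D-term and the approximate variance, could look as follows (the interface is an assumption of this sketch):

```python
import numpy as np

class RejectionTest47:
    """Recursively accumulated quantities of the test (47) for one sample sequence (1-D y)."""
    def __init__(self):
        self.Q = self.G = self.D_sum = self.var_est = 0.0

    def update(self, A_i, dA_bound, c, dy, dt):
        # A_i      : drift A_rho at (x_i(t_{s-1}), y(t_{s-1}))
        # dA_bound : D_rho^i(t_{s-1}), the allowed drift mismatch under hypothesis H_i
        # c        : 1 / b_yy(t_{s-1}); dy, dt : observed increment and time step
        self.Q += c * A_i * dy                   # (44)
        self.G += c * A_i * A_i * dt             # (45)
        self.D_sum += c * A_i * dA_bound * dt
        self.var_est += c * A_i * A_i * dt       # approximate Var{U_i(t_k)}, cf. (46)
        return self

    def reject(self):
        # event (47): the sample is highly a posteriori improbable and should be discarded
        return abs(self.Q - self.G) > self.D_sum + 3.0 * np.sqrt(self.var_est)
```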

Consider the case when some diffusion coefficients depend on x(t). In some cases, as shown above, the unobservable value of x(t) can be restored with the use of the observed quadratic variations. In other cases, when x(t) cannot be uniquely recovered, it is purposeful to use the observed quadratic variations V^obs_ρρ(t_k) in order to reject the samples x_i(t)|_0^k that are highly a posteriori improbable. The corresponding test is similar to the above test (47). Assume that if the hypothesis H_i is true, the sample x_i(t)|_0^k is "close" to the true realization x^tr(t)|_0^k, where "closeness" means that
\[
\big|b_{\rho\rho}\big(x_i(t_s), y(t_s), t_s\big) - b_{\rho\rho}\big(x^{\mathrm{tr}}(t_s), y(t_s), t_s\big)\big| \le B_\rho\big(x_i(t_s), y(t_s), t_s\big) := B^i_\rho(t_s).
\]

Here the function B_ρ(x, y, t) can be determined as B_ρ(x, y, t) = |∂b_ρρ(x, y, t)/∂x_α| R_α, where the R_α are constants. Denote for brevity (similarly to (35), (36), (38), (39)):
\[
\tilde{b}^i_{\sigma\rho}(t_k) := \tilde{b}_{\sigma\rho}\big(x_i(t)|_0^k, y(t)|_0^k, t_k\big), \qquad \tilde{d}^i_{\sigma\rho}(t_k) := \tilde{d}_{\sigma\rho}\big(x_i(t)|_0^k, y(t)|_0^k, t_k\big),
\]
\[
\tilde{B}^i_\rho(t_k) := \tilde{B}_\rho\big(x_i(t)|_0^k, y(t)|_0^k, t_k\big) := \sum_{s=1}^{k} e^{-\lambda(t_k - t_s)} B_\rho\big(x_i(t_s), y(t_s), t_s\big)(t_s - t_{s-1}),
\]
\[
\tilde{B}^i_\rho(t_k) = e^{-\lambda(t_k - t_{k-1})}\,\tilde{B}^i_\rho(t_{k-1}) + B^i_\rho(t_k)(t_k - t_{k-1}).
\]

Consider the case when b_σρ(z(t), t) = b_σρ(x(t), t) (with m+1 ≤ σ, ρ ≤ p). Then the probability of the event
\[
\big|V^{\mathrm{obs}}_{\rho\rho}(t_k) - \tilde{b}^i_{\rho\rho}(t_k)\big| > \tilde{B}^i_\rho(t_k) + 3\sqrt{\tilde{d}^i_{\rho\rho}(t_k)}
\tag{48}
\]
is small if the hypothesis H_i is true. Hence, if the inequality (48) is satisfied for at least one index ρ, with (m+1) ≤ ρ ≤ p, then the hypothesis H_i should be rejected, and the sample x_i(t)|_0^k should be discarded at once, before the calculation of all the weights W_j(k) or W̃_j(k) is done.

In the case when b_σρ(z(t), t) = b_σρ(x(t), y(t), t), with E{b_σρ² | x^tr(t)|_0^k} ≤ K_1² = const, the following test can be used instead of (48):
\[
\big|V^{\mathrm{obs}}_{\rho\rho}(t_k) - \tilde{b}^i_{\rho\rho}(t_k)\big| > \tilde{B}^i_\rho(t_k) + 3 K_1\sqrt{\frac{1}{\lambda}\Delta t}.
\tag{49}
\]
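The tests (48)/(49) admit the same recursive organization; a sketch for one observable component (names illustrative) is:

```python
import numpy as np

def update_profiles(b_tilde_i, B_tilde_i, d_tilde_i, b_i, B_i, lam, dt):
    """Recursive updates of b~_i, B~_i and d~_i along a sample path (cf. (36), (40))."""
    decay = np.exp(-lam * dt)
    return (decay * b_tilde_i + b_i * dt,
            decay * B_tilde_i + B_i * dt,
            decay ** 2 * d_tilde_i + 2.0 * b_i ** 2 * dt ** 2)

def reject_by_quadratic_variation(V_obs, b_tilde_i, B_tilde_i, d_tilde_i,
                                  K1=None, lam=None, dt=None):
    """Tests (48)/(49): discard the sample if V_obs deviates too far from b~_i(t_k)."""
    if K1 is None:                               # test (48): b depends on x(t), t only
        threshold = B_tilde_i + 3.0 * np.sqrt(d_tilde_i)
    else:                                        # test (49): b bounded via the constant K_1
        threshold = B_tilde_i + 3.0 * K1 * np.sqrt(dt / lam)
    return abs(V_obs - b_tilde_i) > threshold
```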

2.5. Incorporation of the derivative of the process of quadratic variation into the set of observed data. Systems with some additional precise observations

In the case of continuous-time observation, the process of quadratic variation can be included in the set of observed processes, as was demonstrated in Section 2.3.

For simplicity of notation, consider the system of one-dimensional processes x(t), y(t):

(50) $dx=m(x,t)\,dt+\sigma_1(x,t)\,dw_1(t),$

(51) $dy=h(x,t)\,dt+\sigma_2(x,t)\,dw_2(t),$

where $w_1(t)$, $w_2(t)$ are independent Wiener processes, the functions $m(x,t)$, $h(x,t)$ satisfy the Lipschitz condition, and the functions $h(x,t)$, $\sigma_1(x,t)$, $\sigma_2(x,t)$ are twice differentiable with respect to $x$. The process

(52) $V(t)=\int_{t_0}^{t}e^{-\alpha(t-s)}\,\sigma_2^2(x(s),s)\,ds$

can theoretically be considered as an observed process, given the trajectory $y(s)$ for $t_0\le s\le t$. Here $\alpha=\mathrm{Const}\ge 0$. The process $V(t)$ obeys the ordinary differential equation
$\frac{dV}{dt}=-\alpha V+\sigma_2^2(x(t),t).$
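For observations taken at the discrete times $t_k$ with a small step $\Delta t$, one convenient approximation of (52) (a sketch, under the assumption that $\sigma_2^2$ varies little over one step) is the recursion
$$V(t_k)\approx e^{-\alpha\,\Delta t}\,V(t_{k-1})+\sigma_2^2\big(x(t_{k-1}),t_{k-1}\big)\,\Delta t,$$
which mirrors the ordinary differential equation displayed above.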

For continuous-time observation and multidimensional diffusion processes $(x(t),y(t))$, in some cases the possible values of $x(t)$ are determined precisely by equalities similar to (52) (and, consequently, $x(t)$ can be recovered precisely, at least in theory), whereas in other cases the trajectory $x(s)$ must be localized in some “layer” where those equalities are satisfied, given the observed path of the process of quadratic variation, $V(s)$, for $t_0\le s\le t$.

Theoretically, it is possible to consider the derivative $\frac{dV(t)}{dt}\stackrel{\mathrm{def}}{=}u(t)$ as an observed process and to incorporate it into the set of observed components: $(y(t),u(t))$. For the system (50), (51), the process $u(t)$ satisfies the following stochastic differential equation (in the sense of Ito):
(53) $du=-\alpha u\,dt+2\sigma_2(x,t)\,\frac{\partial\sigma_2}{\partial x}\big[m(x,t)\,dt+\sigma_1(x,t)\,dw_1(t)\big]+\Big[\sigma_2\frac{\partial^2\sigma_2}{\partial x^2}+\Big(\frac{\partial\sigma_2}{\partial x}\Big)^2\Big]\,\sigma_1^2(x,t)\,dt.$
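Equation (53) can be read as an application of Ito's formula. A brief sketch, assuming for simplicity that $\sigma_2$ does not depend explicitly on $t$: since $u(t)=dV/dt=-\alpha V(t)+\sigma_2^2(x(t))$, we have
$$du=-\alpha\,dV+d\big[\sigma_2^2(x(t))\big]=-\alpha u\,dt+2\sigma_2\frac{\partial\sigma_2}{\partial x}\,dx+\Big[\Big(\frac{\partial\sigma_2}{\partial x}\Big)^2+\sigma_2\frac{\partial^2\sigma_2}{\partial x^2}\Big]\,\sigma_1^2\,dt,$$
and substituting $dx$ from (50) reproduces (53).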

The above SDE (53) could be adjoined to the system of SDEs (50)–(51) for x(t), y(t).

In practice, however, we cannot assume that the process $u(t)$ (or, equivalently, the process $\sigma_2^2(x(t),t)$) is directly observed and known precisely. In practice, the process $V(t_k)$ is not given precisely. Even if the deviations of the “observed quadratic variation” $V^{(obs)}(t_k)$ from $V(t_k)$ are small, the estimates of the derivative, $\frac{dV(t)}{dt}\big|_{t=t_k}\stackrel{\mathrm{def}}{=}u(t_k)$, are inevitably corrupted by errors.

However, if additional observations of $y(t)$ are available, namely $y(t_0+s\,\delta t)$ with $\delta t\ll\Delta t$, $s=1,2,\ldots$, then the estimates $\hat u(t_k)$ of the values $u(t_k)$ can be improved. Suppose that there is a special digital device that takes measurements of $y(t_s)$ at the times $t_s=t_0+s\,\delta t$, with $t_{k-1}<t_s\le t_k$, and uses them to calculate the estimate $\hat u(t_k)$. For the moment, consider the case where $\hat u(t_k)=u(t_k)$. Then the problem of filtering (based on the observations $y(t_k)$, $u(t_k)$ taken at the discrete times $t_k$, with small time steps $\Delta t_k$) could be solved effectively with the use of the new algorithm of simulation (17), the estimates (26), and the algorithms suggested in the present paper. For the system (50), (51), (53), we obtain in (17): $b_{11}-b_{1\sigma}c_{\sigma\rho}b_{\rho 1}\equiv 0$. That means that the sample paths $x^i(t)|_1^n$ would be continued deterministically, given the observations $u(t)|_1^n$ and the initial values $x^i(t_1)$, with $i=1,\ldots,N$. It is expedient to choose the probability distribution of the initial random samples $x^i(t_1)$ in such a way that $E\{\sigma_2^2(x^i(t_1),t_1)\}=u(t_1)$, with some random scattering. Meanwhile, the weights $\widetilde W_i(n)$ are defined by (25), (26) with the use of the observations $y(t)|_1^n$. Thus, the a posteriori distribution is determined not only by $u(t)|_1^n$ but also by $y(t)|_1^n$. Then, as was shown in Section 2.2, the estimate (26) converges with probability one (by the Strong Law of Large Numbers, as $N\to\infty$) to the sought a posteriori expectation that provides the solution to the filtering problem. That algorithm could be implemented with the use of parallel computing.
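As an illustration of how such a device-level estimate could be formed, the following Python sketch builds $\hat u(t_k)$ from the high-frequency increments of $y$ on $(t_{k-1},t_k]$. The estimator itself (a realized-variance-type estimate of $\sigma_2^2$ combined with the relation $u=-\alpha V+\sigma_2^2$) is only one possible choice and is not prescribed by the paper; the function name and interface are hypothetical.

import numpy as np

def estimate_u(y_fine, alpha, V_prev, Dt):
    # y_fine : samples y(t_{k-1}), y(t_{k-1}+delta_t), ..., y(t_k), with delta_t << Dt
    # alpha  : decay constant from (52); V_prev : running value V(t_{k-1}); Dt : t_k - t_{k-1}
    increments = np.diff(y_fine)
    sigma2_sq_hat = np.sum(increments ** 2) / Dt                 # realized-variance estimate of sigma_2^2
    V_new = np.exp(-alpha * Dt) * V_prev + sigma2_sq_hat * Dt    # discretized version of (52)
    u_hat = -alpha * V_new + sigma2_sq_hat                       # u = dV/dt = -alpha*V + sigma_2^2
    return u_hat, V_new

The errors of such an estimate are exactly the deviations $\varepsilon(t_k)$ discussed below.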

The random errors of the obtained estimates $\hat u(t_k)$ can be taken into account in the following way. The random deviations $\hat u(t_k)-u(t_k)\stackrel{\mathrm{def}}{=}\varepsilon(t_k)$ can be approximately described as Gaussian random values that are independent of each other (for different $k$). Then the $\hat u(t_k)$ can be considered as measurements (taken at the discrete times $t_k$) of the process $\hat u(t)$ that satisfies the following SDE:
(54) $d\hat u=-\alpha\hat u\,dt+2\sigma_2(x,t)\,\frac{\partial\sigma_2}{\partial x}\big[m(x,t)\,dt+\sigma_1(x,t)\,dw_1(t)\big]+\Big[\sigma_2\frac{\partial^2\sigma_2}{\partial x^2}+\Big(\frac{\partial\sigma_2}{\partial x}\Big)^2\Big]\,\sigma_1^2(x,t)\,dt+\sigma_3\,dw_3(t),$

where $w_3(t)$ is an independent Wiener process, and $\sigma_3^2(t_k)$ is proportional to the variance of $\varepsilon(t_k)$, i.e. to $E(\varepsilon^2(t_k))$. Thus, $\sigma_3$ can be determined as some function of $x$, $t$. Then the solution of the filtering problem for the system (50), (51), (54) can be obtained in a similar way as above.

Note that systems similar to (50), (51), (53) arise if measurements of some given function $H(x(t),t)$ are known without errors. Here the function $H(x,t)$ is supposed to be twice differentiable with respect to $x$. Then the solution of the filtering problem (given observations of $y(t)$ and $H(x(t),t)$ at the discrete times $t_k=t_0+k\Delta t$, with small time step $\Delta t$) can be obtained with the use of the algorithm of simulation (17) and the estimates (26), similarly to the solution described above for the system (50), (51), (53).

2.6. The problem of nonlinear filtering of a signal. Change of the mathematical model that describes the process of observations

For simplicity of notation, consider one-dimensional processes x(t), y(t). In most works devoted to nonlinear filtering problems, the following hidden Markov model is accepted for the description of a signal process x(t) and an observed process y(t):
(55) $dx=m(x,t)\,dt+\sigma_1(x,t)\,dw_1(t),$
(56) $dy=h(x(t),t)\,dt+\sigma_2\,dw_2(t),$

where $w_1(t)$, $w_2(t)$ are independent Wiener processes, $\sigma_2\equiv\mathrm{Const}$, the functions $m(x,t)$, $h(x,t)$, $\sigma_1(x,t)$ satisfy the Lipschitz condition, and the functions $h(x,t)$, $\sigma_1(x,t)$ are twice differentiable with respect to $x$.

In many applications, it would be possible to describe an observed process Y(t) as follows:
(57) $Y(t)=h(x(t),t)+u(t),$

where u(t) represents the errors of measurements. The “noise process” u(t) can be considered as a Gaussian Markov process, which can be described by the following stochastic differential equation:
(58) $du=-\beta u\,dt+\sigma_3\,dw_3(t),$

where $\beta$ and $\sigma_3$ are constants, and $w_3(t)$ is an independent Wiener process. Then $E\{u\}=0$, the variance is $E\{u^2\}=\sigma_3^2/(2\beta)$, and the correlation function is $E\{u(t)u(t+\tau)\}=\big(\sigma_3^2/(2\beta)\big)\,e^{-\beta\tau}$.
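For reference, the noise process (58) can be simulated exactly on the observation grid; the following relation is a standard property of this Gaussian Markov (Ornstein–Uhlenbeck) process and is supplied here only as an illustration:
$$u(t_k)=e^{-\beta\,\Delta t}\,u(t_{k-1})+\xi_k,\qquad \xi_k\sim\mathcal N\!\Big(0,\;\frac{\sigma_3^2}{2\beta}\big(1-e^{-2\beta\,\Delta t}\big)\Big),$$
with the $\xi_k$ independent of each other and of $u(t_{k-1})$.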

Denote $z_1(t)\stackrel{\mathrm{def}}{=}x(t)$, $z_2(t)\stackrel{\mathrm{def}}{=}u(t)$, $z_3(t)\stackrel{\mathrm{def}}{=}Y(t)$.

Then the model (55), (57), (58) of the signal process x(t) and the observed process Y(t) can be described as follows:
(59) $dz_1=m(z_1,t)\,dt+\sigma_1(z_1,t)\,dw_1(t),$
$dz_2=-\beta z_2\,dt+\sigma_3\,dw_3(t),$
$dz_3=\frac{\partial h(z_1,t)}{\partial t}\,dt+\frac{\partial h}{\partial z_1}\big[m(z_1,t)\,dt+\sigma_1(z_1,t)\,dw_1\big]+\frac{1}{2}\frac{\partial^2 h}{\partial z_1^2}\,\sigma_1^2(z_1,t)\,dt-\beta z_2\,dt+\sigma_3\,dw_3.$

Here $z_3(t)$ represents the observed process, and $z_1(t)$, $z_2(t)$ are unobservable components of the process $z(t)\stackrel{\mathrm{def}}{=}(z_1(t),z_2(t),z_3(t))$. It is easy to see that the diffusion coefficients are $b_{11}=\sigma_1^2(z_1,t)$, $b_{13}=\frac{\partial h}{\partial z_1}\sigma_1^2(z_1,t)$, $b_{12}=0$, $b_{22}=\sigma_3^2$, $b_{23}=\sigma_3^2$, $b_{33}=\big(\frac{\partial h}{\partial z_1}\sigma_1(z_1,t)\big)^2+\sigma_3^2$. Consider the matrix $\|r_{\alpha\beta}\|=\|b_{\alpha\beta}-b_{\alpha\sigma}c_{\sigma\rho}b_{\rho\beta}\|$, which was introduced in (15): this is the variance–covariance matrix of the conditional probability density of the increments $\Delta z_1=z_1(t+\Delta t)-z_1(t)$, $\Delta z_2=z_2(t+\Delta t)-z_2(t)$, given $z(t)$ and the increment $\Delta z_3=z_3(t+\Delta t)-z_3(t)$. For the system (59) we find:
$r_{11}=\sigma_1^2\Big[1-\dfrac{1}{1+\sigma_3^2/\big(\frac{\partial h}{\partial z_1}\sigma_1\big)^2}\Big],\quad r_{12}=-\sigma_3^2\,\sigma_1^2\,\frac{\partial h}{\partial z_1}\Big[\big(\tfrac{\partial h}{\partial z_1}\sigma_1\big)^2+\sigma_3^2\Big]^{-1},\quad r_{22}=\sigma_3^2\Big[1-\sigma_3^2\Big[\big(\tfrac{\partial h}{\partial z_1}\sigma_1\big)^2+\sigma_3^2\Big]^{-1}\Big],\quad\text{and}\quad \mathrm{Det}\,\|r_{\alpha\beta}\|=0.$
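The closed-form entries above can be checked numerically; the short Python sketch below does so with hypothetical placeholder values of $\sigma_1$, $\sigma_3$ and $\partial h/\partial z_1$ (these numbers are illustrative and do not come from the paper).

import numpy as np

sigma1, sigma3, h_prime = 0.7, 0.1, 1.3   # hypothetical placeholder values

# Diffusion matrix b of the system (59) for the components (z1, z2, z3)
b = np.array([
    [sigma1**2,           0.0,       h_prime * sigma1**2],
    [0.0,                 sigma3**2, sigma3**2],
    [h_prime * sigma1**2, sigma3**2, (h_prime * sigma1)**2 + sigma3**2],
])

# r_{alpha beta} = b_{alpha beta} - b_{alpha 3} (b_33)^{-1} b_{3 beta}, for alpha, beta in {1, 2}
r = b[:2, :2] - np.outer(b[:2, 2], b[2, :2]) / b[2, 2]

D = (h_prime * sigma1)**2 + sigma3**2
r_closed_form = np.array([
    [sigma1**2 * sigma3**2 / D,            -h_prime * sigma1**2 * sigma3**2 / D],
    [-h_prime * sigma1**2 * sigma3**2 / D,  sigma3**2 * (h_prime * sigma1)**2 / D],
])

print(np.allclose(r, r_closed_form))   # True: agrees with the closed-form entries
print(np.linalg.det(r))                # numerically ~0: the conditional covariance is degenerate

The vanishing determinant reflects the fact that, given $z(t)$, the increment $\Delta z_3$ is (to leading order) determined by $\Delta z_1$ and $\Delta z_2$, so conditioning on $\Delta z_3$ leaves a degenerate two-dimensional distribution.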

The case when $\sigma_3$ is small corresponds to small errors of measurements. All the entries of the matrix $\|r_{\alpha\beta}\|$ (shown above) are small if $\sigma_3$ is small. That means that most of the samples $(z_1(t),z_2(t))|_0^n$ (given $z_3(t)|_0^n$) simulated with the use of the algorithm (17) continue to be localized in the domain where the a posteriori probability density is concentrated. Besides, the test (48) can be implemented (unless the function $h(x,t)$ is linear with respect to $x$ and $\sigma_1\equiv\mathrm{Const}$) in order to discard a posteriori improbable samples with the use of the “observed quadratic variation” (for observations made in discrete time) of the observed process $z_3(t)$.

Thus, although the dimensionality of the unobservable process is increased in the model (59) in comparison with the model (55), (56), the problem of nonlinear filtering (with observations made in discrete time) can be solved effectively with the use of the new Monte Carlo estimates (23)–(27) and the new algorithm of simulation (17) derived in the present paper.

Note that the model (59) is a hidden Markov model in which the unobservable process $(z_1(t),z_2(t))$ is a Markov process in itself. But if the samples $(z_1(t),z_2(t))|_0^n$ were simulated as trajectories of the Markov process $(z_1(t),z_2(t))$ in itself (as is suggested in the particle filters developed and studied in the literature), then, in the case of a low “level of intensity of noise” (i.e. if $\sigma_3$ is small), a large part of the generated samples would be localized in the domain of low a posteriori probability (and that part grows as $\sigma_3$ decreases). Then, in order to obtain some feasible accuracy of filtering, the number $N$ would have to be increased significantly as the value of $\sigma_3$ decreased. Meanwhile, a large amount of computation would be wasted on processing samples of low a posteriori probability. Thus, this example (the system (59)) also demonstrates the advantage of the new algorithms developed in the present paper in comparison with the known algorithms for hidden Markov models.

3. Conclusion

In this Conclusion, the new results obtained and presented by the author are briefly summarized.

(1)

It was first proved analytically by the author, in the book (Khazen, 2009), that the increment of a multidimensional continuous diffusion Markov process z(t), $\Delta z=z(t+\Delta t)-z(t)$, over a small time interval $\Delta t$, given z(t), asymptotically obeys a multidimensional Gaussian probability distribution (with the first and second moments determined by its drift and diffusion coefficients; see expression (13)). As a corollary, the analytical expressions (14)–(16) are obtained, which describe the Gaussian conditional probability density of the increment $\Delta x$ of the unobservable components given the observation of $\Delta y$ and given the value z(t); here $\Delta z=(\Delta x,\Delta y)$. The Gaussian probability density for $\Delta y$ is also obtained (see (16)). In the present paper, a new, precise algorithm (in closed analytical form) is obtained for simulation of samples $x(t_k)$ when $x(t)|_0^{k-1}$ and $y(t)|_0^k$ are given (see (17)). It is important that the influence of the current “next” observation $y(t_k)$ is taken into account when the samples $x(t_k)$ are being simulated.

(2)

The new Monte Carlo estimates of functions $f(x(t_n))$ given $y(t)|_0^n$ are obtained and presented in explicit, precise, closed analytical form, under the condition that the samples $x(t)|_0^n$ are simulated with the use of the new proposed algorithm (17) (see (26), (25), (27)). The convergence of the Monte Carlo estimates (as the number N of samples increases, N → ∞) is guaranteed by the Strong Law of Large Numbers (Kolmogorov's theorem). For the first time, the estimates (26), (25), (27) are obtained for the general case of a multidimensional diffusion Markov process $(x(t),y(t))$. In the particular case when x(t) is a Markov process in itself and the process $(x(t),y(t))$ represents the simple hidden Markov model described by (28), the estimate (26) takes a form that agrees with the estimate obtained earlier for this particular case.

(3)

For the first time, tests are developed in order to discard at once the sample sequences $x(t)|_0^n$ that are highly improbable given the observations $y(t)|_0^n$ (see Section 2.4). This is important in order to prevent the “degeneration” of the set of considered samples. These tests are performed independently for each sample $x(t)|_0^n$, which is also important for implementation with the use of parallel computing.

(4)

Some branching sampling procedures are developed. The standard Sequential Importance Resampling (SIR) procedure is generalized to the general case of a partially observed multidimensional diffusion Markov process, where the new algorithm of simulating samples (17) should be used. Some new versions of branching resampling procedures are proposed, which can be implemented more easily with the use of parallel computing.

(5)

An analytical investigation of observed quadratic variations is developed (for the case when the diffusion coefficients of the observed components, y(t), depend on the unobservable components x(t)) (see Section 2.5). For the first time, analytical formulae (in explicit and closed form) that determine the dispersions of the deviations of observed quadratic variations are obtained (see (38), (39)), and it is also proved that the deviations asymptotically obey (as Δt → 0) the Gaussian distribution.

(6)

The important particular case of nonlinear filtering of a signal is considered in Section 2.6. A significant advantage of the new algorithms and estimates obtained in the present paper, in comparison with the particle filters suggested earlier in the literature for hidden Markov models, is demonstrated.

(7)

In Section 2.5 it is demonstrated that the new filtering algorithms developed in the present paper provide the opportunity to incorporate the observed quadratic variations into the set of observed data. They also provide the opportunity to use additional precise observations in case such observations are available. These results are new, and they also confirm the advantage of the proposed new solution of the nonlinear filtering problem.

The obtained new algorithms and estimates extend the range of applications of particle filtering beyond hidden Markov models and improve their performance.

Additional information

Funding

The author received no direct funding for this research.

Notes on contributors

Ellida M. Khazen

Ellida M. Khazen graduated from the Department of Mathematics of Lomonosov Moscow State University (Moscow, Russia, USSR) with an Honors Diploma in 1959 (Master's degree). She received her PhD in Mathematics from the Department of Mathematics of Lomonosov Moscow State University in 1962, and her Doctor's degree in Applied Mathematics in 1994. She worked as a Senior Research Scientist (Mathematician) at the Moscow Scientific Research Institute of Device Automation from 1962 to 1996, and also as a Visiting Lecturer at the Department of Mathematics of Lomonosov Moscow State University and at the Moscow Institute of Radio Engineering, Electronics and Automation (MIREA). She has published two scientific monographs (in Russian and in English), chapters in books (in Russian), and more than 50 scientific research papers in the areas of the theory of turbulence onset, the theory of random processes, filtering and signal detection, optimal statistical decisions and statistical sequential analysis, informational estimation of risks, and the theory of optimal control.

References

  • Bernstein, S. N. (1934/1964). Principes de la theorie des equations differentielles stochastiques. Proceedings of the Phys.-Mat. Steklov Institute, 5, 95–124. Republished in Collected works of S. N. Bernstein (Vol. 4). Moscow: Nauka (in Russian).
  • Bernstein, S. N. (1938/1964). Equations differentielles stochastiques. Actualités Scientifiques et Industrielles, 738, 5–31. Conference International Sci. Math. Univ. Geneve. Theorie des probabilites, V. Les fonctions aleatoires. Republished in Collected works of S. N. Bernstein (Vol. 4). Moscow: Nauka (in Russian).
  • Carvalho, H., Del Moral, P., Monin, A., & Salut, G. (1997). Optimal nonlinear filtering in GPS/INS integration. IEEE Transactions on Aerospace and Electronic Systems, 33, 835–850. doi:10.1109/7.599254
  • Crisan, D., & Doucet, A. (2002). A survey of convergence results on particle filtering methods for practitioners. IEEE Transactions on Signal Processing, 50, 736–746. doi:10.1109/78.984773
  • Del Moral, P. (1998). Measure-valued processes and interacting particle systems. Application to nonlinear filtering problems. Annals of Applied Probability, 8, 438–495.
  • Del Moral, P., & Doucet, A. (2014). Particle methods: An introduction with applications. ESAIM: Proceedings, 44, 1–46. doi:10.1051/proc/201444001
  • Doob, J. L. (1953). Stochastic processes. New York, NY: John Wiley & Sons; London: Chapman & Hall.
  • Doucet, A., & Johansen, A. (2011). A tutorial on particle filtering and smoothing: Fifteen years later. In D. Crisan & B. Rozovsky (Eds.), The Oxford handbook of nonlinear filtering (39 pp.). Oxford: Oxford University Press. Retrieved from http://www.stats.ox.ac.uk/~doucet/doucet_johansen_tutorialPF2011.pdf
  • Doucet, A., Godsill, S., & Andrieu, C. (2000). On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, 10, 197–208. doi:10.1023/A:1008935410038
  • Doucet, A., Freitas, N., & Gordon, N. (Eds.). (2001). Sequential Monte Carlo methods in practice. Information science and statistics. New York, NY: Springer Verlag.
  • Gnedenko, B. V. (1976). The theory of probability. Moscow: Mir.
  • Hendeby, G., Karlsson, R., & Gustafsson, F. (2010). Particle filtering: The need for speed. EURASIP Journal on Advances in Signal Processing, 2010, Article ID 181403. doi:10.1155/2010/181403
  • Hull, J. C. (2000). Options, Futures, & Other derivatives (4th ed.). Upper Saddle River, NJ: Prentice Hall.
  • Khazen, E. M. (1968). Methods of optimal statistical decisions and optimal control problems (in Russian). Moscow: Soviet Radio.
  • Khazen, E. M. (1971). On stochastic differential equations for the a posteriori probability distribution in problems of adaptive filtering and signal detection. Automation and Remote Control, 32, 1776–1782.
  • Khazen, E. M. (1977). Chapters 2, 3, 5 (pp. 47–66, 67–102, 193–243). In B. N. Petrov, G. M. Ulanov, S. V. Ulyanov, & E. M. Khazen, Information theory in processes of optimal control and organization. Moscow: Nauka (in Russian).
  • Khazen, E. M. (2009). Methods of optimal statistical decisions, optimal control, and stochastic differential equations. Bloomington, IN: Xlibris.
  • Kolmogorov, A. N. (1931/1986). Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung. Math. Ann., 104, 413–458. Russian translation in Uspekhi Mat. Nauk, 1938, issue 5, pp. 5–41. Republished in Collected works of A. N. Kolmogorov, The theory of probability and mathematical statistics. Moscow: Nauka (in Russian).
  • Kolmogorov, A. N. (1933/1950). Foundations of the theory of probability. First published in German as “Grundbegriffe der Wahrscheinlichkeitsrechnung” (1933); Russian editions in 1936, 1974. English edition: New York, NY: Chelsea.
  • Kolmogorov, A. N. (1933/1986). Zur Theorie der stetigen zufälligen Prozesse. Math. Ann., 108, 149–160. Republished in Collected works of A. N. Kolmogorov, The theory of probability and mathematical statistics. Moscow: Nauka (in Russian).
  • Kushner, H. (1971). Introduction to stochastic control. New York, NY: Holt, Rinehart and Winston.
  • Míguez, J., Crisan, D., & Djurić, P. (2013). On the convergence of two sequential Monte Carlo methods for maximum a posteriori sequence estimation and stochastic global optimization. Statistics and Computing, 23, 91–107. doi:10.1007/s11222-011-9294-4
  • Thrun, S., Fox, D., Burgard, W., & Dellaert, F. (2001). Robust Monte Carlo localization for mobile robots. Artificial Intelligence, 128, 99–141. doi:10.1016/S0004-3702(01)00069-8