A numerical integration-based Kalman filter for moderately nonlinear systems


Abstract

This paper introduces a computationally efficient data assimilation scheme based on Gaussian quadrature filtering that potentially outperforms current methods in data assimilation for moderately nonlinear systems. Moderately nonlinear systems, in this case, are systems whose numerical models have small fourth and higher derivative terms. Gaussian quadrature filters are a family of filters that make simplifying Gaussian assumptions about filtering pdfs in order to numerically evaluate the integrals found in Bayesian data assimilation. These filters are distinguished by the quadrature rules used to evaluate the resulting integrals. The approach we present, denoted the Assumed Gaussian Reduced (AGR) filter, uses a reduced order version of the polynomial quadrature first proposed in Ito and Xiong [2000. Gaussian filters for nonlinear filtering problems. IEEE Trans. Automat. Control. 45, 910–927]. This quadrature uses the properties of Gaussian distributions to form an effectively higher order method, increasing its efficiency. To construct the AGR filter, this quadrature is used to form a reduced order square-root filter, which reduces computational costs and improves numerical robustness. For cases of sufficiently small fourth derivatives of the nonlinear model, we demonstrate that the AGR filter outperforms ensemble Kalman filters (EnKFs) for a Korteweg-de Vries model and a Boussinesq model.


1. Introduction

Data assimilation is the process of estimating the system state given previous information and current observational information. The methods typically used for data assimilation in the ocean and atmosphere are variational schemes (Talagrand and Courtier, 1987; Daley, 1991; Courtier et al., 1998) and ensemble Kalman filters (EnKFs) (Evensen, 2003; Bishop et al., 2001). Both methods are built on linear hypotheses and have led to useful results in quasi-nonlinear situations. These methods are optimal for the case where the observations and model error are Gaussian. They have been successful in numerical weather prediction (NWP) (Buehner et al., 2010; Kuhl et al., 2013) but are suboptimal for nonlinear model dynamics, as the Fokker-Planck equations that govern the evolution of a pdf may only be solved exactly for certain cases. This sub-optimality has led to a proliferation of data assimilation methods, each with its own advantages (see, for example, Daley, 1991; Anderson, 2001; Bishop et al., 2001; Sondergaard and Lermusiaux, 2013; Poterjoy, 2016).

While it is well known that atmospheric and oceanic models may have non-Gaussian statistics (Morzfeld and Hodyss, 2019), computational resources limit our ability to fully resolve the data assimilation problem. It was shown in Miyoshi et al. (2014) that ensembles need to have on the order of a thousand members to represent non-Gaussian prior pdfs in an EnKF for a general circulation model; however, typical ensemble sizes are on the order of a hundred (Houtekamer et al., 2014). Additionally, computational constraints lead data assimilation systems to use lower resolutions than the forecasting model, and these lower-resolution systems are therefore more linear. In targeting this specific application, algorithmic efficiencies may be found.

Gaussian quadrature filters explicitly assume conditional pdfs are Gaussian in the Bayesian filtering equations. Powerful numerical integration techniques, e.g. Gaussian quadrature and cubature, are then used to evaluate the resulting integral equations. The first of these types of filters appeared in the early 2000s with Ito and Xiong (2000) and Wu et al. (2006), but it was not until the cubature Kalman filter (Arasaratnam and Haykin, 2009) that Gaussian quadrature filters became popular. Since then, they have seen extensive use in radar tracking (Haykin et al., 2011), traffic flow (Liu et al., 2017), power systems (Sharma et al., 2017), etc.; however, they have not enjoyed the same popularity in atmospheric and oceanographic sciences. This is likely due to their expense, as quadrature rules require many evaluations of the nonlinear model. The central difference filter (CDF) (Ito and Xiong, 2000) uses low-order polynomial quadrature requiring twice as many model evaluations as the dimension of the state space. Higher order quadrature methods require even more model evaluations. The CDF has successfully outperformed the extended Kalman filter (EKF) (Ito and Xiong, 2000), the unscented Kalman filter (UKF) (Ito and Xiong, 2000), and 4D-Var (King et al., 2016) for low dimensional problems. The nonlinear filter presented here, the Assumed Gaussian Reduced (AGR) filter, is essentially a square root version of the CDF with dynamical sampling.

The AGR filter uses low-order polynomial quadrature that takes advantage of the properties of Gaussian distributions to achieve an effective higher order of accuracy. To further reduce the computational costs of the filter, singular value sampling is used. These two techniques make the AGR filter efficient in terms of nonlinear model evaluations giving it potential for atmospheric and oceanic applications. The algorithm for the AGR filter is similar to that of a square-root EnKF but with a different prediction step. This prediction step will cost more computationally to perform than a typical EnKF prediction step in terms of matrix and vector operations. However, the AGR filter formulation of this prediction step will be more accurate for numerical models with small fourth order derivatives, i.e., moderately nonlinear systems.

This manuscript is organized as follows: Section 2 begins with a brief review of Bayesian filtering followed by details regarding assumptions about the associated pdfs to arrive at a discrete filter in terms of Gaussian integrals. The evaluation of these Gaussian integrals is discussed in Section 3 in terms of low-rank polynomial quadrature for scalar and multi-dimensional problems. Results are presented relating to the performance of this quadrature to help to define the scenarios in which this filter should be used. The algorithm for the full AGR filter is presented in Section 4. Section 5 uses a one-dimensional Korteweg-de Vries model and a two-dimensional Boussinesq model to compare the performance of the AGR filter versus a square root EnKF filter. Final remarks are in Section 6. The appendix contains the formulas used in Sections 2 and 3.

2. Linking Bayesian filtering to Gaussian quadrature filters

We begin our discussion with a review of Bayesian filtering in order to highlight the differences between common types of nonlinear filters. The aim of Bayesian filtering is to estimate the pdf $p(x_{t} | Y_{T})$, where $x_{t}$ is the current state at time $t$ and $Y_{T} = \{y_{1}, \dots, y_{t}\}$ contains the previous observations up to time $t$. The Bayesian filter is most commonly developed as a recursive filter formed by first applying Bayes' rule to $p(x_{t} | y_{t})$ and then applying the Markovian properties of the observations, i.e. the property that observations depend only on the current state. The filter was first described in Ho and Lee (1964) and is discussed in detail in Särkkä (2013) and Chen (2003). This filter is typically divided into two steps. The first step, which we will refer to as the prediction step, computes the prior distribution using preliminary information given by the Chapman–Kolmogorov equation

$$p(x_{t} | Y_{T-1}) = \int p(x_{t} | x_{t-1})\, p(x_{t-1} | Y_{T-1})\, d x_{t-1}. \tag{2.1}$$

The second step, which we will refer to as the correction step, computes the posterior distribution

$$p(x_{t} | Y_{T}) = \frac{1}{Z_{t}}\, p(y_{t} | x_{t})\, p(x_{t} | Y_{T-1}) \tag{2.2}$$

where $Z_{t} = \int p(y_{t} | x_{t})\, p(x_{t} | Y_{T-1})\, d x_{t}$ is the normalization constant. The exact solutions of Equations (2.1) and (2.2) are unknown except in special cases. In particular, for linear state dynamics where the prior pdf $p(x_{t} | x_{t-1})$ is Gaussian and the measurement likelihood $p(y_{t} | x_{t})$ is Gaussian, the filter Equation (2.2) has an exact solution given by the Kalman filter (Kalman, 1960). Otherwise, Equation (2.2) may be approximated using a particle filter (Särkkä, 2013; Poterjoy, 2016). In practice, the full pdf $p(x_{t} | Y_{T})$ is not used; instead only its first two moments, the mean and covariance, are used. Under Gaussian assumptions this leads to what are referred to as Kalman-type filters.
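For a scalar state, the recursion in Equations (2.1) and (2.2) can be evaluated by brute force on a grid, which makes the two steps concrete. The following is a minimal sketch rather than part of the method described here: the function names, the uniform grid, the Riemann-sum integration, and the Gaussian transition and likelihood with variances `q_var` and `r_var` are all illustrative assumptions.

```python
import numpy as np


def gaussian_pdf(x, mean, var):
    """Density of a scalar Gaussian with the given mean and variance."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def bayes_filter_step(grid, posterior_prev, f, q_var, y, h, r_var):
    """One prediction/correction cycle on a uniform grid (illustrative only).

    posterior_prev holds p(x_{t-1} | Y_{T-1}) sampled on `grid`.
    Returns the prior p(x_t | Y_{T-1}) and posterior p(x_t | Y_T) on `grid`.
    """
    dx = grid[1] - grid[0]
    # transition[i, j] = p(x_t = grid[i] | x_{t-1} = grid[j]), assumed Gaussian
    transition = gaussian_pdf(grid[:, None], f(grid)[None, :], q_var)
    # Prediction step, Equation (2.1), as a Riemann sum over x_{t-1}
    prior = transition @ posterior_prev * dx
    # Correction step, Equation (2.2), with a Gaussian likelihood for y = h x + v
    unnormalized = gaussian_pdf(y, h * grid, r_var) * prior
    z = np.sum(unnormalized) * dx  # normalization constant Z_t
    return prior, unnormalized / z


if __name__ == "__main__":
    grid = np.linspace(-10.0, 10.0, 1001)
    posterior0 = gaussian_pdf(grid, 0.0, 1.0)
    prior, posterior = bayes_filter_step(
        grid, posterior0, lambda x: 0.9 * x, 0.5, 1.0, 1.0, 0.5
    )
    dx = grid[1] - grid[0]
    print("posterior mean:", np.sum(grid * posterior) * dx)
```

The cost of the transition matrix grows with the square of the number of grid points per dimension, which is one way to see why this direct approach does not scale beyond very small state spaces.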

To summarize the relationship between the Bayesian filter in Equations (2.1) and (2.2) and Kalman-type filters, we begin by considering the system given by

$$x_{t} = f(x_{t-1}) + w_{t} \tag{2.3}$$

with the observation process

$$y_{t} = H x_{t} + v_{t} \tag{2.4}$$

where $x \in R^{n}$, $y \in R^{d}$, $f$ is the model, $H$ is the linear map between the state space and the observation space, $w$ is the Gaussian model error with covariance $Q$, and $v$ is the Gaussian observation error with covariance $R$. At time $t$, the mean of the predictive distribution Equation (2.1) is given by

$$x_{t}^{b} = E[x_{t} | Y_{T-1}] \tag{2.5}$$

$$= \int_{R^{n}} x_{t}\, p(x_{t} | Y_{T-1})\, d x_{t} \tag{2.6}$$

$$= \int_{R^{n}} f(x_{t-1})\, p(x_{t-1} | Y_{T-1})\, d x_{t-1} \tag{2.7}$$

where $b$ in $x_{t}^{b}$ indicates that $x$ is the background estimate of the mean at time $t$. Equation (2.7) is computed using Equation (A.3) from Appendix A, where $E[\cdot]$ is the expectation.

Similarly, the covariance of Equation (2.1) is given by

$$P_{t}^{b} = E[(x_{t} - x_{t}^{b})(x_{t} - x_{t}^{b})^{T}] \tag{2.8}$$

$$= \int_{R^{n}} (x_{t} - x_{t}^{b})(x_{t} - x_{t}^{b})^{T}\, p(x_{t} | Y_{T-1})\, d x_{t} \tag{2.9}$$

$$= \int_{R^{n}} (f(x_{t-1}) - x_{t}^{b})(f(x_{t-1}) - x_{t}^{b})^{T}\, p(x_{t-1} | Y_{T-1})\, d x_{t-1} + Q \tag{2.10}$$

using Equation (A.6). The equations for the prediction step, Equations (2.7) and (2.10), are both a consequence of the model error $w$ being Gaussian. To approximate the correction step, it is first assumed that the joint distribution of $(x_{t}, \hat{y}_{t})$ is Gaussian; more specifically,

$$p(x_{t}, y_{t} | Y_{T-1}) = p(y_{t} | x_{t})\, p(x_{t} | Y_{T-1}) \tag{2.11}$$

$$= N\left(\begin{pmatrix} x_{t}^{b} \\ \hat{y}_{t}^{b} \end{pmatrix}, \begin{pmatrix} P_{t}^{b} & P_{t}^{xy} \\ (P_{t}^{xy})^{T} & P_{t}^{y} \end{pmatrix}\right) \tag{2.12}$$

$$= N\left(\begin{pmatrix} x_{t}^{b} \\ \hat{y}_{t}^{b} \end{pmatrix}, \begin{pmatrix} P_{t}^{b} & P_{t}^{b} H^{T} \\ H P_{t}^{b} & H P_{t}^{b} H^{T} + R \end{pmatrix}\right) \tag{2.13}$$

where $\hat{y}_{t}^{b}$ is the estimated observation computed via Equation (2.4) using $x_{t}^{b}$, $P_{t}^{xy}$ is the cross-covariance between $x_{t}^{b}$ and $\hat{y}_{t}^{b}$, and $P_{t}^{y}$ is the covariance of $\hat{y}_{t}^{b}$. The estimated observation $\hat{y}_{t}^{b}$ and its covariance $P_{t}^{y}$ are computed similarly to Equations (2.7) and (2.10). The computation of the cross covariance $P_{t}^{xy}$ in Equation (2.13) may be found in the appendix (Equation (A.12)). It then follows from Equation (2.13) that the conditional distribution of $x_{t}$ given $y_{t}$ in Equation (2.2) is approximated in terms of the mean $x_{t}^{a}$ and covariance $P_{t}^{a}$,

$$p(x_{t} | y_{t}, Y_{T-1}) = p(x_{t} | Y_{T}) = N(x_{t} | Y_{T}),$$

where the mean and covariance are given by the Kalman equations

$$x_{t}^{a} = x_{t}^{b} + K_{t}(y_{t} - H x_{t}^{b}) \tag{2.14}$$

$$P_{t}^{a} = (I - K_{t} H) P_{t}^{b} \tag{2.15}$$

$$K_{t} = P_{t}^{b} H^{T} (H P_{t}^{b} H^{T} + R)^{-1} \tag{2.16}$$

where $x_{t}^{a}$ is the mean at the analysis (denoted by the $a$) at time $t$, $y_{t}$ are the observations, and $K_{t}$ is the Kalman gain. Note that in the above Kalman-type filter framework we have assumed the observation operator $H$ is linear; however, this need not be the case (Ito and Xiong, 2000; Särkkä, 2013). In general, solving Equations (2.7) and (2.10) explicitly is intractable for large problems, including the large problems found in geosciences.
One strategy for approximating Equations (2.7) and (2.10) is to use sampling, which leads to the expressions for the sample mean and covariance used in EnKFs. Another strategy is to make the further simplifying assumption that $p(x_{t-1} | Y_{T-1})$ is Gaussian, arriving at a particular type of assumed density filter referred to as a Gaussian filter in the literature. Since EnKFs also contain Gaussian assumptions, to differentiate these filters we will refer to Gaussian filters as Gaussian quadrature filters.

To form the basis for Gaussian quadrature filters, we will make the additional simplifying assumption

$$p(x_{t-1} | Y_{T-1}) = N(x_{t-1} | Y_{T-1}), \tag{2.17}$$

i.e., that our prior distribution is Gaussian. With this additional assumption, Equations (2.7) and (2.10) simplify and we arrive at the algorithm

(1) Prediction step:

$$x_{t}^{b} = \int_{R^{n}} f(x_{t-1})\, N(x_{t-1} | Y_{T-1})\, d x_{t-1} \tag{2.18}$$

$$P_{t}^{b} = \int_{R^{n}} (f(x_{t-1}) - x_{t}^{b})(f(x_{t-1}) - x_{t}^{b})^{T}\, N(x_{t-1} | Y_{T-1})\, d x_{t-1} + Q. \tag{2.19}$$

(2) Correction step:

$$\begin{aligned} K_{t} &= P_{t}^{b} H^{T} (R + H P_{t}^{b} H^{T})^{-1} \\ x_{t}^{a} &= x_{t}^{b} + K_{t}(y_{t} - H x_{t}^{b}) \\ P_{t}^{a} &= (I - K_{t} H) P_{t}^{b}. \end{aligned}$$

With this formulation it is easily verified that for a linear $f(x)$ in Equation (2.3), we arrive at the Kalman filter equations exactly. In this regard, the Gaussian quadrature filters can be seen as a nonlinear extension of the Kalman filter. Other nonlinear filters such as the extended Kalman filter or UKF (Julier et al., 2000) may also be formulated using this framework (Särkkä, 2013).
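The correction step of the algorithm above is the standard Kalman update and is short enough to write out. The sketch below is illustrative only: the function name `kalman_correction` and the dense NumPy formulation are assumptions made here, and a large-scale system would not form these matrices explicitly.

```python
import numpy as np


def kalman_correction(xb, Pb, y, H, R):
    """Kalman correction step for a linear observation operator H.

    xb, Pb : background mean (n,) and covariance (n, n)
    y      : observation vector (d,)
    H, R   : observation operator (d, n) and observation error covariance (d, d)
    """
    innovation_cov = H @ Pb @ H.T + R
    # K = Pb H^T (H Pb H^T + R)^{-1}; solve a linear system instead of inverting.
    # The transpose trick relies on Pb and the innovation covariance being symmetric.
    K = np.linalg.solve(innovation_cov, H @ Pb).T
    xa = xb + K @ (y - H @ xb)
    Pa = (np.eye(xb.size) - K @ H) @ Pb
    return xa, Pa, K


if __name__ == "__main__":
    xb = np.zeros(2)
    Pb = np.eye(2)
    H = np.array([[1.0, 0.0]])  # observe the first state component only
    R = np.array([[1.0]])
    xa, Pa, K = kalman_correction(xb, Pb, np.array([2.0]), H, R)
    print(xa, Pa, K, sep="\n")
```

Only the observed component is updated in this small usage case, because the background covariance carries no correlation between the two components.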

3. Gaussian integration

The distinct feature of Gaussian quadrature filters is the evaluation of the Gaussian integrals in Equations (2.18) and (2.19), which are multidimensional integrals of the form

$$I = \int_{R^{n}} F(x_{t-1})\, N(x_{t-1} | x_{t-1}^{a}, P_{t-1}^{a})\, d x_{t-1} \tag{3.1}$$

where $F(\cdot)$ is a general function and $N(x_{t-1} | x_{t-1}^{a}, P_{t-1}^{a})$ is equivalent to $N(x_{t-1} | Y_{T-1})$. These types of filters are differentiated by the type of quadrature they use, for example, the Gauss-Hermite Kalman filter (Ito and Xiong, 2000; Wu et al., 2006), the cubature Kalman filter (Arasaratnam and Haykin, 2009), and the central difference filter (Ito and Xiong, 2000). The quadrature rules in these methods entail model evaluations and the computation of weights, requiring a trade-off between cost and performance. Higher order methods provide greater numerical accuracy but require substantially more model evaluations, which may be cost prohibitive. We will use low-order polynomial quadrature to balance computational cost and performance.

3.1. Gaussian pdf integration: scalar case

To discuss the evaluation of the Gaussian integrals of the form Equation (3.1), we begin with the scalar case given by

$$I = \int_{R} F(x_{t-1})\, N(x_{t-1} | x_{t-1}^{a}, P_{t-1}^{a})\, d x_{t-1}.$$

Using the change of variables $x_{t-1} = \sqrt{P}\, η + x_{t-1}^{a}$, where $\sqrt{P}$ is the square root of $P_{t-1}^{a}$, we arrive at the integral in standard form given by

$$I = \int_{R} \tilde{F}(η)\, N(η | 0, 1)\, d η$$

where $\tilde{F}(η) = F(\sqrt{P}\, η + x_{t-1}^{a})$. This form of the Gaussian integral allows for the development of explicit formulas to evaluate it. We approximate $\tilde{F}(\cdot)$ by a second-degree polynomial $γ(s)$ given by

$$γ(s) = \tilde{F}(0) + a_{1} s + \frac{1}{2} a_{2} s^{2} \tag{3.2}$$

where

$$a_{1} = \frac{\tilde{F}(d) - \tilde{F}(-d)}{2 d} \quad \text{and} \quad a_{2} = \frac{\tilde{F}(d) - 2 \tilde{F}(0) + \tilde{F}(-d)}{d^{2}} \tag{3.3}$$

and $d > 0$ is the step size. Note that because of the change in variables the first and second derivatives, $a_{1}$ and $a_{2}$, are in the direction of $\sqrt{P}$. Then using Equation (3.2) in Equation (2.18), the prior mean estimate is given by

$$x_{t}^{b} = \int_{R} f(\sqrt{P}\, η + x_{t-1}^{a})\, N(η | 0, 1)\, d η \tag{3.4}$$

$$= \int_{R} \left( f(\sqrt{P} \cdot 0 + x_{t-1}^{a}) + a_{1} η + \frac{1}{2} a_{2} η^{2} \right) \frac{1}{\sqrt{2 π}} e^{-\frac{1}{2} η^{2}}\, d η \tag{3.5}$$

$$= f(x_{t-1}^{a}) + \frac{1}{2} a_{2}. \tag{3.6}$$

The odd term in Equation (3.5) zeros out and the mean estimate is now the previous mean propagated forward with a second-order correction term. Similarly, using Equations (3.2) and (3.6), we may compute the prior covariance prediction Equation (2.19) as

$$P_{t}^{b} = \int_{R} \left( f(\sqrt{P}\, η + x_{t-1}^{a}) - x_{t}^{b} \right)^{2} N(η | 0, 1)\, d η + Q \tag{3.7}$$

$$= \int_{R} \left( a_{1} η + \frac{1}{2} a_{2} η^{2} - \frac{1}{2} a_{2} \right)^{2} \frac{1}{\sqrt{2 π}} e^{-\frac{1}{2} η^{2}}\, d η + Q \tag{3.8}$$

$$= a_{1}^{2} + \frac{1}{2} a_{2}^{2} + Q. \tag{3.9}$$
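The step from (3.8) to (3.9) uses only the moments of the standard normal distribution, $E[η] = E[η^{3}] = 0$, $E[η^{2}] = 1$, and $E[η^{4}] = 3$. As a short worked check, added here for convenience and not taken from the appendix:

$$\begin{aligned} E\left[\left(a_{1} η + \tfrac{1}{2} a_{2} (η^{2} - 1)\right)^{2}\right] &= a_{1}^{2} E[η^{2}] + a_{1} a_{2} E[η^{3} - η] + \tfrac{1}{4} a_{2}^{2} E[η^{4} - 2 η^{2} + 1] \\ &= a_{1}^{2} + 0 + \tfrac{1}{4} a_{2}^{2} (3 - 2 + 1) \\ &= a_{1}^{2} + \tfrac{1}{2} a_{2}^{2}. \end{aligned}$$

Adding $Q$ recovers (3.9).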

The variance is now in terms of the first and second derivatives of the model. The primary cost of evaluating Equations (3.6) and (3.9) comes from computing $a_{1}$ and $a_{2}$ via Equation (3.3), which requires three evaluations of the model Equation (2.3): $f(x_{t-1}^{a})$, $f(x_{t-1}^{a} - d \sqrt{P})$, and $f(x_{t-1}^{a} + d \sqrt{P})$.
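The scalar prediction step described by Equations (3.3), (3.6) and (3.9) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `agr_scalar_prediction` is hypothetical, and the default step size `d = 1.0` is chosen here only for the example since no value is prescribed in this section.

```python
import math


def agr_scalar_prediction(f, xa, P, Q=0.0, d=1.0):
    """Scalar prediction step from Equations (3.3), (3.6) and (3.9).

    f  : scalar model, x_t = f(x_{t-1})
    xa : analysis mean at the previous time
    P  : analysis variance at the previous time
    Q  : model error variance
    d  : finite-difference step size (assumed value, d > 0)
    """
    sqrt_p = math.sqrt(P)
    # The three model evaluations
    f_0 = f(xa)
    f_plus = f(xa + d * sqrt_p)
    f_minus = f(xa - d * sqrt_p)
    # Central-difference coefficients, Equation (3.3)
    a1 = (f_plus - f_minus) / (2.0 * d)
    a2 = (f_plus - 2.0 * f_0 + f_minus) / d**2
    xb = f_0 + 0.5 * a2            # Equation (3.6)
    Pb = a1**2 + 0.5 * a2**2 + Q   # Equation (3.9)
    return xb, Pb


if __name__ == "__main__":
    # A quadratic model is reproduced exactly by the second-degree polynomial:
    # for x ~ N(0, 1), E[x^2] = 1 and Var[x^2] = 2.
    print(agr_scalar_prediction(lambda x: x**2, xa=0.0, P=1.0))
```

For a linear or quadratic model the interpolating polynomial coincides with the model, so the predicted mean and variance match the exact Gaussian moments for any choice of `d`.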

One of the reasons this method is effective is that the quadrature error of the mean estimation in (3.6) is based on the fourth derivative of the model $f$ even though we are using a second-order polynomial approximation; see Equation (B.3) in Appendix B. This is due to the fact that odd terms drop out in Gaussian polynomial integration. Meanwhile, the quadrature error in the estimation of the covariance, see Equation (B.6), is related to the size of the third derivative of $f$.

3.2. Non-Gaussian pdf integration

For comparison, we now consider the case of Equations (2.7) and (2.10) without making a Gaussian assumption. To simplify our notation, we will denote the prior pdf $p(x_{t-1} | x_{t-1}^{a}, P_{t-1}^{a})$ by $p(x_{t-1})$. Assume that at time $t$ we have $x_{t-1}$ sampled from $p(x_{t-1})$. We may then determine the expected error in the mean and variance at time $t$ by propagating samples drawn from $p(x_{t-1})$ forward and determining their error (see Section 3.3). As in the previous case where $p(x_{t-1})$ is Gaussian, we will relate the error to the moments of $p(x_{t-1})$. This is most conveniently done through a Taylor-series expansion of Equation (2.3). To this end, note that

$$x_{t} = f(μ_{t-1}) + \frac{d f}{d x_{t-1}} (x_{t-1} - μ_{t-1}) + \frac{1}{2} \frac{d^{2} f}{d x_{t-1}^{2}} (x_{t-1} - μ_{t-1})^{2} + \dots \tag{3.10}$$

where $μ_{t-1}$ is the true mean of $p(x_{t-1})$. Applying Equation (3.10) to the expectation of $x_{t}$ gives

$$μ_{t} = E[x_{t}] \tag{3.11}$$

$$= \int_{R} f(x_{t-1})\, p(x_{t-1})\, d x_{t-1} \tag{3.12}$$

$$= f(μ_{t-1}) + \frac{1}{2} \frac{d^{2} f}{d x_{t-1}^{2}} σ_{t-1}^{2} + \dots \tag{3.13}$$

where $σ_{t-1}^{2}$ is the variance derived from $p(x_{t-1})$. Similarly, subtracting Equation (3.13) from Equation (3.10), squaring the result, and applying the expectation, one obtains

$$σ_{t}^{2} = \frac{d f}{d x_{t-1}} σ_{t-1}^{2} \frac{d f}{d x_{t-1}} + \frac{1}{2} \frac{d f}{d x_{t-1}} T_{t-1} \frac{d^{2} f}{d x_{t-1}^{2}} + \frac{1}{4} \frac{d^{2} f}{d x_{t-1}^{2}} (F_{t-1} - σ_{t-1}^{4}) \frac{d^{2} f}{d x_{t-1}^{2}} + \dots \tag{3.14}$$

where $T_{t-1}$ and $F_{t-1}$ are the third and fourth moments of $p(x_{t-1})$, respectively. Without the simplifying assumptions used in the Gaussian pdf case, we arrive at these infinite sums for the mean and covariance.

3.3. EnKF framework

To evaluate integrals of the form Equations (2.7) and (2.10), or the expressions in Equations (3.13) and (3.14), in an EnKF framework, statistical sampling is used. The sample mean and variance at time $t$ are

$$\bar{x}_{t} = \frac{1}{k} \sum_{i=1}^{k} x_{t}^{(i)} \tag{3.15}$$

$$s_{t}^{2} = \frac{1}{k-1} \sum_{i=1}^{k} (x_{t}^{(i)} - \bar{x}_{t})^{2} \tag{3.16}$$

where $k$ is the number of samples. The error in these estimates is well known from central limit theorem-type arguments (for example, see Hodyss et al., 2016). The error may be quantified by calculating the squared deviation about the true mean and variance:

$$E\left((\bar{x}_{t} - μ_{t})^{2}\right) = \frac{σ_{t}^{2}}{k} \tag{3.17}$$

$$E\left((s_{t}^{2} - σ_{t}^{2})^{2}\right) = \frac{1}{k} \left( F_{t} - \frac{k-3}{k-1} σ_{t}^{4} \right). \tag{3.18}$$
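The sample statistics in Equations (3.15) and (3.16) and the expected squared errors in Equations (3.17) and (3.18) translate directly into code. The sketch below is illustrative; the function names are hypothetical, and the Gaussian usage case (where the fourth central moment is $3 σ^{4}$) is chosen here only as an example.

```python
import numpy as np


def sample_mean_and_variance(ensemble):
    """Sample mean (3.15) and unbiased sample variance (3.16) of a 1-D ensemble."""
    k = ensemble.size
    x_bar = ensemble.sum() / k
    s2 = ((ensemble - x_bar) ** 2).sum() / (k - 1)
    return x_bar, s2


def expected_squared_errors(sigma2, fourth_moment, k):
    """Expected squared sampling errors of the mean (3.17) and variance (3.18).

    sigma2        : true variance
    fourth_moment : true fourth central moment F_t
    k             : ensemble size
    """
    mean_error = sigma2 / k
    variance_error = (fourth_moment - (k - 3) / (k - 1) * sigma2**2) / k
    return mean_error, variance_error


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for k in (5, 10, 100):
        x_bar, s2 = sample_mean_and_variance(rng.standard_normal(k))
        # For a standard normal, sigma^2 = 1 and F = 3
        print(k, x_bar, s2, expected_squared_errors(1.0, 3.0, k))
```

Both expected errors decay like $1/k$, so halving the error in the standard-deviation sense requires roughly four times as many ensemble members, each of which costs one model evaluation.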

The AGR filter update Equations (3.6) and (3.9) approximate only the first few terms in Equations (3.13) and (3.14), assuming the pdf $p(x_{t-1})$ is Gaussian. In contrast, the sample mean Equation (3.15) and sample covariance Equation (3.16) attempt to approximate the full sums in Equations (3.13) and (3.14) without knowledge of $p(x_{t-1})$, which is a more difficult task.

3.4. Scalar example

In this example, we explore the differences in the predicted mean and covariance estimates used by the AGR filter and EnKFs. In the scalar case, the AGR filter is full rank, allowing for comparison between the error caused by the low-order polynomial approximation Equation (3.2) and the sampling error in an EnKF estimate. Consider the scalar model given by

$$f(x) = c_{1} x + c_{2} x^{2} + c_{3} x^{3} + c_{4} x^{4} \tag{3.19}$$

with $p(x_{0})$ Gaussian and $μ_{0} = 0$. This implies from Equations (3.13) and (3.14) that the true mean and variance are given by

$$μ_{1} = \langle x_{1} \rangle = c_{2} σ_{0}^{2} + \dots \tag{3.20}$$

$$σ_{1}^{2} = c_{1} σ_{0}^{2} c_{1} + 2 c_{2} σ_{0}^{4} c_{2} + \dots \tag{3.21}$$

In this example, and the following examples, we are not considering model error. For the EnKF case, where we approximate Equations (3.20) and (3.21), the mean and covariance depend on c₁ and c₂. We set the variance $P = 1$ and $c_{1} = c_{3} = 0$, and let $0 \leq c_{2} \leq 0.6$ and $0 \leq c_{4} \leq 0.05$. We define the true solution to this problem to be given by Equation (3.15) with k = 50,000. In this case, we form the ensembles by random draws from the Gaussian with variance P. We propagate the mean estimate for the AGR filter and the ensemble for the EnKF using Equation (3.19) and compute the error in the predicted means and covariances. The error maps of the mean estimates of Equations (3.6) and (3.15) for the different values of c₂ and c₄, with ensemble sizes k = 5, 10, 100 for the EnKF, are shown in Fig. 1. Note that for this example the AGR filter only requires 3 model evaluations, as described in Section 3.1, whereas the EnKF requires the same number of model evaluations as the ensemble size.

Fig. 1. The L² error in the estimated (a) EnKF mean (3.15) for k = 5, (b) EnKF mean for k = 10, (c) EnKF mean for k = 100, and (d) AGR filter mean (3.6) for increasing values of c₂ (horizontal axis) and c₄ (vertical axis). Note that the color scales are different between the first two plots (a) & (b) and the second two (c) & (d).


In Fig. 1(a) and (b), for a similar number of model evaluations to the AGR filter, the sampling error in the EnKF estimates is quite large. Note that the color bars in (a) and (b) are the same and are of a different order than the color bars used in (c) and (d). In panels (c) and (d), the amount of error in the EnKF estimate with k = 100 and in the AGR filter is comparable. The AGR filter quadrature error is invariant with respect to changes in c₂, whereas the EnKF estimation error depends on both c₂ and c₄, as expected given Equation (3.20). If c₄, on which the fourth derivative depends, is sufficiently small, we expect better performance from the AGR filter estimated mean (3.6) regardless of the size of c₂.

In the prior covariance estimates in Fig. 2, we see in (a) that the error in the EnKF covariance estimate with k = 100 grows with increases in c₂ and c₄. By comparison, the error in the AGR filter covariance in (b) is small when c₄ is small and grows as the fourth-order derivative grows, as expected since the error depends on $c_{4}^{2}$. The AGR filter covariance estimate is equal to or better than the EnKF estimate for small c₄. For larger c₄, the EnKF covariance estimate performs better. Note that this example has no c₃ term, on which the errors in the AGR filter and the EnKF also depend.

Fig. 2. The L² error in (a) EnKF prior covariance estimate for k = 100 and (b) the error in the AGR filter prior covariance estimate for increasing values of c₂ (horizontal axis) and c₄ (vertical axis).


This example demonstrates the types of scenarios where one might choose one type of filter over another. For small ensemble sizes, the AGR filter may be the preferable choice as well as for the case where the model is moderately nonlinear, i.e. small magnitude higher order terms. For a large ensemble with large model fourth derivatives, the EnKF may provide a better estimate of the predicted mean.
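A rough version of the mean comparison in this scalar example can be reproduced with a few lines of code. This sketch departs from the experiment described above in ways that should be kept in mind: it uses the analytic Gaussian moments ($E[x^{2}] = 1$, $E[x^{4}] = 3$ for unit variance) as the reference instead of a 50,000-member sample, it fixes the step size at `d = 1.0` as an assumption, and all function names are hypothetical.

```python
import numpy as np


def model(x, c2, c4):
    """Equation (3.19) with c1 = c3 = 0."""
    return c2 * x**2 + c4 * x**4


def exact_mean(c2, c4):
    """E[f(x)] for x ~ N(0, 1), using E[x^2] = 1 and E[x^4] = 3."""
    return c2 + 3.0 * c4


def agr_mean(c2, c4, d=1.0):
    """Mean estimate (3.6) with mean 0 and P = 1; three model evaluations."""
    f_0 = model(0.0, c2, c4)
    a2 = (model(d, c2, c4) - 2.0 * f_0 + model(-d, c2, c4)) / d**2
    return f_0 + 0.5 * a2


def enkf_mean(c2, c4, k, rng):
    """Sample mean (3.15) of k propagated draws from N(0, 1)."""
    return model(rng.standard_normal(k), c2, c4).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    for c2, c4 in [(0.1, 0.0), (0.6, 0.0), (0.1, 0.05), (0.6, 0.05)]:
        truth = exact_mean(c2, c4)
        agr_err = abs(agr_mean(c2, c4) - truth)
        enkf_errs = [abs(enkf_mean(c2, c4, k, rng) - truth) for k in (5, 10, 100)]
        print(f"c2={c2}, c4={c4}: AGR {agr_err:.4f}, EnKF (k=5,10,100) {enkf_errs}")
```

In this setup the quadrature error of the mean works out to $c_{4} |3 - d^{2}|$, which does not involve c₂ and vanishes when c₄ = 0, in line with the behaviour described for the AGR panel.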

3.5. Gaussian pdf integration: multi-dimensional case

We will now extend the results in Section 3.1 to higher dimensions. To evaluate integrals of the form (3.1), we begin by applying the coordinate transform $x_{t - 1} = S^{T} η + x_{t - 1}^{a}$, where S is the square root of the covariance $P_{t - 1}^{a}$ such that $P_{t - 1}^{a} = S^{T} S$. Using this change of coordinates, we can convert (3.1) to the standard form with N(0, I), where I is the identity matrix. Then $I = \int_{R^{n}} \tilde{F} (η) \frac{1}{{(2 π)}^{n / 2}} e^{- \frac{1}{2} | η |^{2}} d η$ (3.22) where $\tilde{F} (η) = F (S^{T} η + x_{t - 1}^{a})$. (3.23)
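As a small illustration of this change of coordinates, the sketch below builds a factor S with P = SᵀS and maps a standard-normal vector η to the state space. Using a Cholesky factor as the square root, the 2×2 covariance, and the function names are assumptions for the example; any factor satisfying P = SᵀS would serve.

```python
import numpy as np

def sqrt_factor(P):
    # Cholesky gives P = L L^T; taking S = L^T matches the convention P = S^T S.
    return np.linalg.cholesky(P).T

def to_state(eta, S, x_a):
    # x = S^T eta + x_a maps eta ~ N(0, I) to x ~ N(x_a, S^T S).
    return S.T @ eta + x_a

P = np.array([[2.0, 0.6],
              [0.6, 1.0]])      # illustrative analysis covariance
x_a = np.array([1.0, -1.0])     # illustrative analysis mean
S = sqrt_factor(P)
x_center = to_state(np.zeros(2), S, x_a)   # eta = 0 maps to the mean
```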

Using Equation (3.22), we can develop formulas to evaluate Equations (2.18) and (2.19) explicitly based on polynomial quadrature. In Ito and Xiong (Citation2000), $\tilde{F} (η)$ is approximated by the function $γ (η)$ such that $\tilde{F} (z_{i}) = γ (z_{i})$ for points $\{z_{i}\}$ in $R^{n}$. The multivariate polynomial $γ (η)$ is given by $γ (η) = \tilde{F} (0) + \sum_{i = 1}^{n} a_{i} s_{i} + \frac{1}{2} \sum_{i = 1}^{n} b_{i} s_{i}^{2}$ (3.24) where $a_{i} \in R^{n}$ is the ith column of a, the first-order variation or Jacobian, s_i is the ith column of S, and b_i is the ith column approximation of the second-order variation, or Hessian. The coefficients a and b may be determined using centered differencing, similar to the scalar case, via $a_{i} = \frac{f (d e_{i}) - f (- d e_{i})}{2 d}, 1 \leq i \leq n$ (3.25) where $\{e_{i}\} \in R^{n}$ are unit vectors and d > 0. We approximate b_i via $b_{i} = \frac{f (d e_{i}) - 2 f (0) + f (- d e_{i})}{d^{2}}, 1 \leq i \leq n$. (3.26)

Evaluating a and b requires $2 n + 1$ model evaluations. Note that we do not use cross derivative terms in the Hessian which would require an additional $\frac{1}{2} n (n - 1)$ model evaluations to compute.
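The evaluation count can be made concrete with a minimal sketch of the centered differences (3.25) and (3.26). The wrapper class `CountingModel`, the function name `central_differences`, the toy quadratic map, and the step d = 1 are hypothetical choices for illustration, with the model assumed to be already expressed in the transformed coordinate η.

```python
import numpy as np

class CountingModel:
    """Wraps a model so the number of evaluations can be inspected."""
    def __init__(self, func):
        self.func = func
        self.calls = 0

    def __call__(self, eta):
        self.calls += 1
        return self.func(eta)

def central_differences(F_tilde, n, d=1.0):
    """Return F_tilde(0) and matrices whose columns are a_i (3.25) and b_i (3.26)."""
    F0 = np.asarray(F_tilde(np.zeros(n)), dtype=float)    # one evaluation at the center
    a = np.empty((F0.size, n))
    b = np.empty((F0.size, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = d
        F_plus = F_tilde(e)       # evaluation at +d e_i
        F_minus = F_tilde(-e)     # evaluation at -d e_i
        a[:, i] = (F_plus - F_minus) / (2.0 * d)
        b[:, i] = (F_plus - 2.0 * F0 + F_minus) / d**2
    return F0, a, b

# Toy quadratic map in two dimensions (an assumption, not a model from the paper).
toy = CountingModel(lambda eta: np.array([eta[0]**2 + eta[1],
                                          3.0 * eta[1]**2 - eta[0]]))
F0, a, b = central_differences(toy, n=2)
```

For n = 2 the wrapper records 2n + 1 = 5 evaluations, and because the toy map is quadratic the differences reproduce its first and pure second derivatives at the origin.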

Using the polynomial in Equation (3.24), we can write the integral in Equation (3.22) as $I = \int_{R^{n}} γ (η) \frac{1}{{(2 π)}^{n / 2}} e^{- 1 / 2 | η |^{2}} d η$ (3.27) and create explicit formulas for the mean and covariance: $x_{t}^{b} = \int_{R^{n}} γ (η) \frac{1}{{(2 π)}^{n / 2}} e^{- 1 / 2 | η |^{2}} d η$ (3.28) $= \frac{1}{{(2 π)}^{n / 2}} \int_{R^{n}} (\tilde{F} (0) + \sum_{i = 1}^{n} a_{i} η_{i} + \frac{1}{2} \sum_{i = 1}^{n} b_{i} η_{i}^{2}) e^{- 1 / 2 | η |^{2}} d η$ (3.29) $= f (x_{t - 1}^{a}) + \frac{1}{2} \sum_{i = 1}^{n} b_{i}$ (3.30) and $P_{t}^{b} = Q + \int_{R^{n}} (γ (η) - x_{t}^{b}) {(γ (η) - x_{t}^{b})}^{T} \frac{1}{{(2 π)}^{n / 2}} e^{- 1 / 2 | η |^{2}} d η$ (3.31) $= Q + \frac{1}{{(2 π)}^{n / 2}} \int_{R^{n}} (\sum_{i = 1}^{n} a_{i} η_{i} + \frac{1}{2} \sum_{i = 1}^{n} b_{i} η_{i}^{2} - \frac{1}{2} \sum_{i = 1}^{n} b_{i})$ (3.32) $\cdot {(\sum_{i = 1}^{n} a_{i} η_{i} + \frac{1}{2} \sum_{i = 1}^{n} b_{i} η_{i}^{2} - \frac{1}{2} \sum_{i = 1}^{n} b_{i})}^{T} e^{- 1 / 2 | η |^{2}} d η$ (3.33) $= Q + \sum_{i = 1}^{n} a_{i} a_{i}^{T} + \frac{1}{2} \sum_{i = 1}^{n} b_{i} b_{i}^{T}$. (3.34)
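The closed forms (3.30) and (3.34) translate directly into a few lines of linear algebra. The sketch below is illustrative only: the function name `agr_prior` and the scalar test case f(x) = x² with x ~ N(1, 0.5²) are assumptions. For that case F̃(η) = (ση + μ)², so F̃(0) = μ², a = 2μσ and b = 2σ², and since f is exactly quadratic the formulas recover the exact mean μ² + σ² and variance 4μ²σ² + 2σ⁴.

```python
import numpy as np

def agr_prior(F0, a, b, Q):
    """Prior mean (3.30) and covariance (3.34) from the difference coefficients.

    F0 : model evaluated at the analysis mean, shape (n,)
    a  : first-order coefficients, columns a_i, shape (n, m)
    b  : second-order coefficients, columns b_i, shape (n, m)
    Q  : model error covariance, shape (n, n)
    """
    x_b = F0 + 0.5 * b.sum(axis=1)              # f(x_a) + 1/2 sum_i b_i
    P_b = Q + a @ a.T + 0.5 * (b @ b.T)         # Q + sum a_i a_i^T + 1/2 sum b_i b_i^T
    return x_b, P_b

# Scalar check with f(x) = x^2, mu = 1, sigma = 0.5 (illustrative values).
mu, sigma = 1.0, 0.5
F0 = np.array([mu**2])
a = np.array([[2.0 * mu * sigma]])
b = np.array([[2.0 * sigma**2]])
x_b, P_b = agr_prior(F0, a, b, Q=np.zeros((1, 1)))
```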

To summarize, a change of coordinates is used to transform the Gaussian integrals into standard form. We then approximate $\tilde{F} (s)$ by a quadratic polynomial. Using this approximation, we create self-contained formulas for the predicted mean and covariance.

Similar to the scalar case, odd polynomial terms drop out in the polynomial quadrature. As a result, the quadrature error in estimating the mean (3.30) is on the order of the fourth derivative of the nonlinear model (see Equation (B.8) in the appendix) even though our polynomial approximation (3.24) is only second order. We do not see as much benefit in the computation of the covariance, as the error given by Equation (B.11) is related to the cross terms in the Hessian approximation that were dropped in Equation (3.24). Overall, the contribution to the filter error from the low-order polynomial quadrature is minimized for moderately nonlinear systems.

3.5.1 Multidimensional example

For this example, we will again look at the effects of nonlinearity versus sampling in the AGR filter and the EnKF. We consider a variable coefficient Korteweg-de Vries (KdV) model that governs the evolution of Rossby waves in a jet flow (Hodyss and Nathan, Citation2002). This may be written as $A_{t} - A_{xxxx} + m_{p} (x) A_{x} + m_{g} (x) - A A_{x} = 0$ (3.35) where $m_{p} (x) = 1 - \exp (- a x^{2})$ and $m_{g} (x) = - 2 a x \exp (- a x^{2})$, with a = 0.0005. The derivatives vanish on the boundary, the initial condition is given by a solitary Rossby wave, and we use 512 model computational nodes. A contour plot of the true solution in time is shown in Fig. 3.

Fig. 3. Contour plot of the wave amplitude over the domain (vertical axis) of the KdV equation over time (horizontal axis).

We begin by creating a 35,000 member ensemble that will be used as the true solution in our experiments. This ensemble was created by drawing the members from climatology and then using an EnKF to perform three system cycles using observations created from an ensemble member. This was done to improve the quality of the ensemble. The eigenvalues of the resulting covariance $P_{0}^{b}$ of this ensemble are plotted in Fig. 4.

Fig. 4. The 512 sorted eigenvalues of the initial background error covariance $P_{0}^{b}$ created from a 35000 member climatological ensemble. The horizontal axis is the eigenvalue number and the vertical axis is the magnitude of each eigenvalue.

The eigenvalues of $P_{0}^{b}$ and their corresponding eigenvectors will be used to form $S = \sqrt{P}$ needed by the AGR filter. Additionally, members for smaller ensemble sizes will be drawn randomly from the 35,000-member ensemble. Since $P_{0}^{b}$ has near-zero eigenvalues, we will consider only the first 250 eigen-directions, thus $P_{0}^{b} \approx U_{m} Σ_{m} U_{m}^{T}$, where Σ_m is a truncated matrix with the first 250 eigenvalues of $P_{0}^{b}$ along the diagonal and U_m is composed of the corresponding eigenvectors. The square root of $P_{0}^{b}$ is then given by $S_{m} = U_{m} \sqrt{Σ_{m}}$, which is used in the coordinate transform in Equation (3.23). In this example, 501 model evaluations are used to compute Equations (3.30) and (3.34), thus the solution will be compared to an ensemble with 500 members for fairness. Similar to the one-dimensional example, the error in the prior mean estimates of the AGR filter and the EnKF is examined as nonlinearity is increased. The nonlinearity is increased by lengthening the time (t₀) over which the model is integrated forward.
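The truncated square root described here can be sketched with a symmetric eigendecomposition. The function name `truncated_sqrt`, the random rank-3 test covariance, and the clipping of tiny negative eigenvalues are assumptions for illustration. The factor is returned with shape (n, m), so that S_m S_mᵀ approximates the covariance; the transpose corresponds to the convention P = SᵀS used in the text.

```python
import numpy as np

def truncated_sqrt(P, m):
    """Return S_m = U_m sqrt(Sigma_m) built from the m leading eigenpairs of P."""
    w, U = np.linalg.eigh(P)                 # ascending eigenvalues of symmetric P
    order = np.argsort(w)[::-1][:m]          # indices of the m largest eigenvalues
    w_m = np.clip(w[order], 0.0, None)       # guard against round-off negatives
    U_m = U[:, order]
    return U_m * np.sqrt(w_m)                # scales column j by sqrt(w_m[j])

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))
P = A @ A.T                                  # 6x6 covariance of rank 3
m = 3
S_m = truncated_sqrt(P, m)
n_evals = 2 * m + 1                          # model runs needed for (3.30) and (3.34)
```

With m = 250, the same count gives the 501 model evaluations quoted above.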

One way to observe the impact of the increased nonlinearity is to look at the influence of b in (3.24). For comparison, we denote the filter without the second-order correction term, which uses the first-order polynomial quadrature, as AGR1, and the filter with b, which uses the second-order polynomial quadrature, as AGR2.

Both filters are initialized using the mean of the 35,000 member ensemble, which we consider to be the true mean. The perturbations for the 500 member ensemble are drawn from the 35,000 member ensemble and then re-centered on the true mean. The S for the AGR filters is described above. All methods are integrated forward to t₀ and the prior means and covariances are computed. Fig. 5(a) compares the L₂ error in the EnKF prior mean solution with K = 500 and the AGR filter solutions with m = 250. The AGR2 filter significantly outperforms the AGR1 filter, demonstrating the importance of the second-order correction term. The AGR2 filter outperforms the EnKF until about $t_{0} = 0.55$, or 5501 model time steps. The AGR2 filter performs well prior to this point, having half the error of the EnKF at $t_{0} = 0.25$, or 2501 model time steps. Fig. 5(b) compares the covariances of the EnKF and AGR filter using the Frobenius norm given by $\| A \|_{FRO} = \sqrt{\mathrm{trace} (A^{T} A)}$.
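For reference, the two error measures can be written as the short sketch below. It assumes that both are taken as norms of the difference between an estimate and the reference value, and that the L₂ error is the plain Euclidean norm without grid weighting; the text does not spell out these details, and the function names and sample matrices are illustrative.

```python
import numpy as np

def l2_error(x_est, x_true):
    # Euclidean norm of the difference between estimated and reference means.
    return np.linalg.norm(x_est - x_true)

def frobenius_error(P_est, P_true):
    # ||A||_FRO = sqrt(trace(A^T A)) applied to the covariance difference.
    D = P_est - P_true
    return np.sqrt(np.trace(D.T @ D))

P_true = np.array([[1.0, 0.2], [0.2, 2.0]])
P_est = np.array([[1.1, 0.1], [0.1, 1.8]])
cov_err = frobenius_error(P_est, P_true)
mean_err = l2_error(np.array([1.0, 2.0]), np.array([1.0, 0.0]))
```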

Fig. 5. (a) The L₂ error (vertical axis) in the estimate of the prior mean for the EnKF with k = 500, the AGR1 with m = 250, and AGR2 with m = 250 for time step length t₀ (horizontal axis). (b) The error in the Frobenius norm of the corresponding covariance estimates.


The AGR1 and AGR2 filters have about the same error in their covariances and outperform the EnKF until about $t_{0} = 1 .$ For model regimes which do not have overly large higher order terms, the AGR2 may provide better estimation.

For large n, evaluating Equations (3.30) and (3.34) is prohibitively expensive since it requires $2 n_{e} + 1$ model evaluations, where n_e is the number of nonzero eigenvalues. To reduce the computational cost, we consider the case where only the leading m eigenvalues are kept. Ideally, m would be chosen so that the singular values capture the essential dynamics; however, in atmospheric applications this may not be possible due to computational constraints. The truncation error in the estimation of the square root S_m of $P_{t - 1}^{a}$ is given by $| S - S_{m} | \leq \sum_{i > m} \sqrt{σ_{i}}$. (3.36)

If $P_{t - 1}^{a}$ has n−m eigenvalues approaching zero, this estimation is very accurate. In other words, the extent of the correlations in $P_{t - 1}^{a}$ determines the accuracy of this truncation. The error in evaluating Equations (3.30) and (3.34) now comes from both quadrature and this truncation.

We repeat the previous experiment with K = 40, introducing undersampling for the EnKF estimate, and m = 20 for the AGR estimates. Again we see the importance of the second-order correction term when comparing AGR1 and AGR2 in Fig. 6. In (a) the AGR2 filter again has half the error of the EnKF at $t_{0} = 0.25$. However, due to the presence of sampling error in both of the prior mean estimates, the AGR2 continues to outperform the EnKF until about $t_{0} = 1.55$, or 15,501 model time steps, after which the EnKF has a slight edge in performance. In (b) both the AGR1 and AGR2 estimates outperform the EnKF covariance estimates for various values of t₀.

Fig. 6. (a) The L₂ error (vertical axis) in the estimate of the prior mean for the EnKF with k = 40, the AGR1 with m = 20, and AGR2 with m = 20 for time step length t₀ (horizontal axis). (b) The corresponding error in the Frobenius norm for the prior covariance estimates.


In both the cases with undersampling and without undersampling the AGR2 consistently outperformed AGR1 due to the inclusion of the second-order correction term b. Additionally, in both cases, there was a moderately nonlinear regime in which the AGR2 filter outperformed the EnKF. Similar to the scalar case, the AGR2 filter was found to be more sensitive to increased nonlinearity than the EnKF; however, the EnKF proved to be more sensitive to undersampling. This broadened the regime in which the AGR2 filter outperformed the EnKF.

3.5.2 A note on $P_{0}^{b}$

For this example, S_m was computed from $P_{0}^{b}$ for the AGR filters. This $P_{0}^{b}$ was created using a 35,000 member climatological ensemble. Using fewer ensemble members to create $P_{0}^{b}$ introduces another source of error at the starting time. For example, if $P_{0}^{b}$ is constructed with $k_{e} = 40, 80, 160, 35000$ ensemble members, then the accuracy of the AGR2 filter for m = 20 decreases accordingly for computing the prior covariance estimates, as in Fig. 7. For convenience, we have included the error estimate for the EnKF in this plot. Note that the ensemble of the 40 member EnKF is drawn from the 35,000 member climatological ensemble. There are numerous strategies to develop a more accurate and higher rank $P_{0}^{b}$ (Clayton et al., 2013; Derber and Bouttier, Citation1999) which are beyond the scope of this paper.

Fig. 7. The error in the Frobenius norm (vertical axis) of the prior covariance estimates in the AGR2 filter for m = 20 with $P_{0}^{b}$ computed using $k_{e} = 40, 80, 160, 35000$ ensemble members for time step length t₀ (horizontal axis).

4. AGR filters

In order to utilize the mean (3.30) and covariance (3.34) updates, we develop an algorithm in the same vein as the ensemble square root filters (Whitaker and Hamill, Citation2002), i.e. we will update S_m keeping P^b in factored form. To begin with, we note that after some algebraic manipulation and dropping Q, we may rewrite Equation (3.34) as $P_{t}^{b} = a (I + {(a^{T} {(\frac{1}{2} b b^{T})}^{- 1} a)}^{- 1}) a^{T}$ (4.1) where $a = [a_{1}, \dots, a_{m}]$ and $b = [b_{1}, \dots, b_{m}]$ are computed using the centered differencing scheme $a_{i} = \frac{f (S^{T} (d e_{i}) + x_{t - 1}^{a}) - f (S^{T} (- d e_{i}) + x_{t - 1}^{a})}{2 d}, 1 \leq i \leq m$ (4.2) and we approximate b_i via $b_{i} = \frac{f (S^{T} (d e_{i}) + x_{t - 1}^{a}) - 2 f (x_{t - 1}^{a}) + f (S^{T} (- d e_{i}) + x_{t - 1}^{a})}{d^{2}}, 1 \leq i \leq m$. (4.3)

The above equations are the same as Equations (3.25) and (3.26) but with the truncated S = S_m. Letting $ξ = \sqrt{2} b^{†} a$, where $b^{†}$ is the Moore-Penrose pseudo inverse, and using Equation (4.1), we have $P = \tilde{a} {\tilde{a}}^{T}$ where $\tilde{a} = a \sqrt{I + {(ξ^{T} ξ)}^{- 1}}$. (4.4)

Note that $ξ \in R^{m}$ so the expression in Equation (4.4) may not be overly expensive to compute. To form the filter, we use the Potter method (Potter, Citation1963) for the Kalman square root update in reduced order form. This improves numerical robustness by ensuring that $P = S^{T} S$ is symmetric, and it reduces the storage required by the AGR filter since only the square root S is stored. Let $β = H \tilde{a} \in R^{p \times m}$ (4.5) then $Z = R + H P^{b} H^{T} = R + β β^{T}, \quad K_{t} = \tilde{a} β^{T} Z^{- 1}$. (4.6)

Thus, $P_{t}^{a} = \tilde{a} (I - β^{T} Z^{- 1} β) {\tilde{a}}^{T} .$

Letting $η = I - β^{T} Z^{- 1} β = V D V^{T}$ then we update S by $S_{t} = (\sqrt{D} + ϵ I) V^{T} \tilde{a}$ where $ϵ > 0$ is a tunable parameter. We have chosen to form a regularized S which will help with the conditioning of the matrix and decrease dispersion. Other inflation methods such as multiplicative covariance inflation may also be used. To summarize, the algorithm for the AGR2 filter is as follows:

Given $S_{t - 1} = [s_{1}, \dots, s_{m}]$ compute $x_{t}^{b}$ and a_i, b_i for $1 \leq i \leq m .$
Compute $\tilde{a}$ as in (4.4).
Let $β = H \tilde{a}$ then $x_{t}^{a} = x_{t}^{b} + K_{t} (y_{t} - H x_{t}^{b})$, where $K_{t} = \tilde{a} β^{T} Z^{- 1}$ and $Z = R + β β^{T}$.

Decompose η such that $η = V D V^{T}$ where D is diagonal and V is unitary. Then $S_{t} = (\sqrt{D} + ϵ I) V^{T} \tilde{a} .$
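The steps above can be sketched as a single analysis update. This is a minimal sketch under stated assumptions, not a reference implementation: the function names `sym_sqrt` and `agr2_analysis`, the small 3×3 matrices, and the observation operator are invented for illustration; the prior mean and the coefficient matrices a and b are taken as given; ξᵀξ is assumed invertible (which fails if b vanishes, i.e. for a linear model); and the updated factor is returned with shape (m, n) so that S_tᵀS_t reproduces the analysis covariance when ε = 0, which places a transpose on ã relative to the expression in the text.

```python
import numpy as np

def sym_sqrt(M):
    # Symmetric square root of a symmetric positive semi-definite matrix.
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def agr2_analysis(x_b, a, b, H, R, y, eps=0.0):
    m = a.shape[1]
    xi = np.sqrt(2.0) * np.linalg.pinv(b) @ a                   # xi = sqrt(2) b^+ a
    a_t = a @ sym_sqrt(np.eye(m) + np.linalg.inv(xi.T @ xi))    # a-tilde, Equation (4.4)
    beta = H @ a_t                                              # Equation (4.5)
    Z = R + beta @ beta.T                                       # Equation (4.6)
    Zinv_beta = np.linalg.solve(Z, beta)                        # Z^{-1} beta
    K = a_t @ Zinv_beta.T                                       # K = a-tilde beta^T Z^{-1}
    x_a = x_b + K @ (y - H @ x_b)
    eta = np.eye(m) - beta.T @ Zinv_beta                        # eta = V D V^T
    D, V = np.linalg.eigh(eta)
    root = np.diag(np.sqrt(np.clip(D, 0.0, None))) + eps * np.eye(m)
    S_t = root @ V.T @ a_t.T                                    # shape (m, n)
    return x_a, S_t, a_t, K

# Illustrative inputs (assumed values, n = m = 3, p = 2 observations).
a = np.array([[2.0, 0.5, 0.0], [0.0, 1.5, 0.3], [0.1, 0.0, 1.0]])
b = np.array([[0.4, 0.0, 0.1], [0.0, 0.3, 0.0], [0.2, 0.0, 0.5]])
H = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
R = 0.01 * np.eye(2)
x_b = np.array([0.9, 0.2, 0.4])
y = np.array([1.0, 0.5])
x_a, S_t, a_t, K = agr2_analysis(x_b, a, b, H, R, y)
P_b = a_t @ a_t.T
```

In this square, invertible test case ã ãᵀ equals a aᵀ + ½ b bᵀ, matching (3.34) with Q dropped, and with ε = 0 the factor satisfies S_tᵀS_t = (I − K H) P^b. Setting ε > 0 adds the regularization described above.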

The algorithm itself is readily implemented and requires minimal tuning of the parameter d from Equations (4.2) and (4.3). For quasi-linear systems, the second-order correction term b may be dropped, giving the AGR1 filter. In this case, we may further reduce computational cost by using one-sided finite differencing instead of centered differencing. Then to evaluate Equations (3.30) and (3.34), we use the finite differencing scheme to approximate the $a_{i}, i = 1, \dots, m$, i.e., $a_{i} = \frac{f (S^{T} (d e_{i}) + x_{t - 1}^{a}) - f (x_{t - 1}^{a})}{d}, 1 \leq i \leq m$ (4.7) after the coordinate change, where d > 0 is the step size. The benefit of computing a_i in this manner is that this only requires m + 1 model evaluations. Note that the expression (4.7) amounts to a directional derivative determined by the truncated S. In using S the derivative is computed in the direction of the largest change in the dynamics. Meanwhile, the parameter d restricts the search direction to a constrained set. This is a generalization of the standard derivative; in fact, it is a numerical approximation of the Jacobian under a coordinate transformation. In this way, the AGR1 may be viewed as a form of an extended Kalman filter.
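The connection to the extended Kalman filter can be seen in a short sketch of the one-sided differences (4.7). The function name `forward_differences`, the linear test model M, the factor S, and the step d = 10⁻³ are assumptions for illustration. With the convention P = SᵀS, the factor S has shape (m, n), so Sᵀ(d eᵢ) is d times the ith row of S.

```python
import numpy as np

def forward_differences(f, S, x_a, d=1e-3):
    """Columns a_i of Equation (4.7) using m + 1 model evaluations."""
    m = S.shape[0]
    f0 = np.asarray(f(x_a), dtype=float)          # one shared base evaluation
    a = np.empty((f0.size, m))
    for i in range(m):
        # S^T (d e_i) equals d times the i-th row of S.
        a[:, i] = (f(d * S[i] + x_a) - f0) / d    # one evaluation per direction
    return f0, a

# Linear test model (an assumption): the differences then recover M S^T exactly,
# so a a^T = M P M^T, the extended Kalman filter covariance propagation.
M = np.array([[1.0, 0.2], [0.0, 0.9]])
S = np.array([[0.5, 0.1], [0.0, 0.3]])
x_a = np.array([1.0, 2.0])
f0, a = forward_differences(lambda x: M @ x, S, x_a)
P_b = a @ a.T
```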

5. Data assimilation

In this section, we present data assimilation comparisons between the AGR2 filter, described in the previous section, and the ensemble square root filter (Tippett et al., Citation2003) as the example EnKF method. We use this particular filter as the correction step as it is most similar to the AGR filter while having an ensemble estimate for the mean and covariance.

5.1. 1 D Example

We return to the KdV model given by Equation (3.35). As before, we will use k = 40 ensemble members drawn from the 35,000 member ensemble for the EnKF and the AGR2 filter with m = 20. This time the initial $x_{0}^{a}$ for the AGR2 filter will be the mean of the k = 40 EnKF ensemble. Both the EnKF and AGR2 filter will use the same 32 observations at assimilation time. We use localization and multiplicative inflation, wherein the correlation length scale used in the localization and the inflation factor were tuned so that the ensemble variance corresponds to the true error variance. We again consider different values of t₀, the time the model is integrated forward before assimilation, to see how increasing the nonlinearity affects these two filtering algorithms. To reduce the influence of the initial conditions, we will only consider assimilation cycles 200–450.

Fig. 8 is a plot of the average error across a data assimilation window for various t₀. For smaller t₀ the model integration is less nonlinear and we can see that the AGR2 filter has about 30% less error than the EnKF. As t₀ gets larger, the model integration is more nonlinear and the error in the AGR2 solution grows more rapidly than that of the EnKF; by $t_{0} = 2.05$, or 20001 model time steps, the AGR2 error is about 24% less than that of the EnKF.

Fig. 8. The L² error (vertical axis) averaged over the assimilation window for increasing t₀ (horizontal axis), the time between cycles, for the EnKF and the AGR2 filter.


This cycling experiment demonstrates that the improvement in the predicted mean and covariance estimates seen in Section 3.5.1 leads to an improvement in the data assimilation state estimate, or analysis. It also demonstrates that increasing the nonlinearity has more of an impact on the quality of the AGR2 solution than on the EnKF solution.

5.2. 2 D Example

We will now investigate the performance of the proposed AGR filter using a two-dimensional Boussinesq model that develops Kelvin-Helmholtz waves; specifically, we use the model developed in Hodyss et al. (Citation2013). The governing equations are given by $\begin{matrix} \frac{\partial ζ}{\partial t} & = & - (u \frac{\partial ζ}{\partial x} + w \frac{\partial ζ}{\partial z} + \frac{g}{θ_{0}} \frac{\partial θ}{\partial x}) + F, \\ \frac{\partial θ}{\partial t} & = & - (u \frac{\partial θ}{\partial x} + w \frac{\partial θ}{\partial z} + w \frac{\partial θ_{0}}{\partial z}) + H, \end{matrix}$ (5.1) where $u = \frac{\partial ψ}{\partial z}$, $w = - \frac{\partial ψ}{\partial x}$, and $ζ = \nabla^{2} ψ$.

Here $\nabla^{2}$ is the Laplacian operator, u and w are the zonal and vertical winds, respectively, θ is the potential temperature, and ζ is the vorticity. The vorticity source F and the heat source H both have sub-grid scale parameterizations; more details may be found in Hodyss et al. (Citation2013). The buoyancy frequency of the reference state $θ_{0}$ is given by the background potential temperature: $N_{0}^{2} = \frac{g}{θ_{0}} \frac{d θ_{0}}{d z} = 10^{- 4}\ \mathrm{s}^{- 2}$. The reference state for the zonal wind is $U_{0} = \frac{V}{2} [1 + \tanh (μ \frac{z - z_{0}}{L})]$ with $V = 10\ \mathrm{m\,s}^{- 1}$, μ = 8, L = 1 km, and $z_{0} = 0.5$ km. The z boundary conditions are mirrored, forcing the vertical velocity to vanish. Additionally, there are sponge boundaries along the left and right sides of the channel. At time t = 0, the flow is perturbed, leading to waves that amplify as they travel and then break. For this experiment, the model was run with 128 computational nodes in the x direction and 33 nodes (unmirrored) in the z direction. All told, the state vector has 8448 elements. The true solution at the end of the assimilation window may be seen in Fig. 9 for (a) the vorticity and (b) the temperature. As the waves move across the atmospheric slice, they grow and eventually shear.

Fig. 9. The true solution of (5.1) at time t = 15000, or 200 cycles of 75 seconds, for (a) vorticity and (b) temperature. The vertical axis is the height and the horizontal axis is the distance.

During the assimilation window, the model is advanced, then the filtering is performed with 112 temperature and 112 wind observations. The observations are created by perturbing the truth via $y_{t} = y_{t}^{true} + R^{1 / 2} ξ_{t}$ (5.2) where R is the instrument error covariance and ξ is white noise. For this experiment, $R = 10^{- 2}$ for both the temperature and wind. A 24,000 member ensemble was created by cycling random perturbations through the model. The smaller k = 20 and k = 40 ensembles were drawn from this 24,000 member ensemble, which was also used to create $P_{0}^{b}$ for the AGR filter. We initialize the $x_{0}^{a}$ used in the AGR filter with the mean of the k = 20 ensemble for the EnKF. Both the EnKF with k = 20, 40 and the AGR2 filter will use the same observations for a particular t₀. Again, both types of filters use localization and inflation tuned so that the ensemble variance matches the true error variance. We will compute the error averaged over assimilation cycles 100–400 to reduce the influence of the initial conditions.
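A minimal sketch of how synthetic observations of the form (5.2) could be generated is given below. The function name `make_observations`, the zero-valued placeholder truth, the seed, and the use of a Cholesky factor as R^{1/2} are assumptions for illustration; only the observation count (112 + 112) and the variance 10⁻² come from the text.

```python
import numpy as np

def make_observations(y_true, R, rng):
    # y_t = y_true + R^{1/2} xi_t with xi_t ~ N(0, I); the Cholesky factor of R
    # is used as the square root (any factor with L L^T = R gives the same law).
    L = np.linalg.cholesky(R)
    return y_true + L @ rng.standard_normal(y_true.shape[0])

p = 112 + 112                       # temperature plus wind observations
R = 1e-2 * np.eye(p)                # instrument error covariance
y_true = np.zeros(p)                # placeholder for the truth at the observation locations
rng = np.random.default_rng(42)
y_obs = make_observations(y_true, R, rng)
noise_std = np.std(y_obs - y_true)  # should be near sqrt(1e-2) = 0.1
```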

The errors in the mean estimates plotted in Fig. 10 demonstrate similar results to the one-dimensional KdV example. For the more linear case $t_{0} = 75$, the AGR2 filter significantly outperforms the EnKF. As t₀ is increased, the nonlinearity increases and the AGR2 filter gradually loses its performance advantage over the EnKF, which disappears around $t_{0} = 300$. As before, the increased nonlinearity has a greater impact on the performance of the AGR2 filter than on that of the EnKF.

Fig. 10. The averaged L² error (vertical axis) across the data assimilation window in the mean estimates for particular t₀ (horizontal axis) for (a) vorticity and (b) temperature.


We have presented two example problems comparing the AGR filter and the EnKF. The first example was a one-dimensional KdV model in which the AGR filter outperformed the EnKF but was more influenced by nonlinearity. In the second example, a two-dimensional Boussinesq model was considered. In this case, starting with $t_{0} = 75$, the AGR filter outperformed the EnKF. When $t_{0} = 300$, the error in the mean estimation has more than doubled and the performance of the AGR filter and the EnKF is comparable. Again we see that the AGR filter is more affected by the nonlinearity in the model than the EnKF.

6. Final remarks

We have presented a quadrature Kalman filter, the AGR filter, for moderately nonlinear systems. The filter uses numerical quadrature to evaluate the Bayesian formulas for optimal filtering under Gaussian assumptions. The AGR filter has the Gaussian noise assumptions and Gaussian joint distribution assumption from Kalman filtering with the added assumption that the prior distribution is Gaussian. This leads to Gaussian integrals which are evaluated using the second-order polynomial quadrature. Due to the properties of Gaussian distributions, using this polynomial achieves the same precision as a third-order polynomial quadrature. This effective higher order quadrature is key to the success of this filter.

In numerical tests, the AGR filter was found to outperform a comparable square-root EnKF in regions of low-to-moderate nonlinearity for a KdV model and a Boussinesq model. We expect these results to extend to more realistic atmospheric models, given that fourth and higher order terms of the model are sufficiently small. For highly nonlinear dynamical systems, the AGR filter is affected more than the square-root EnKF but may still provide performance benefit if the system is severely under-sampled as demonstrated in the scalar example in Section 3.4. It is also possible to use higher order quadrature to reduce the effect of nonlinearity but this would, of course, increase the computational costs of the filter.

While the Gaussian assumption made in this filter may seem restrictive, this assumption is commonly made, or effectively made, in data assimilation. For example, recent results indicate that it may require an ensemble with on the order of one thousand members to capture the non-Gaussian pdfs present in an EnKF for a simplified general circulation model (Miyoshi et al., 2014). This is already significantly more than the O(100) ensemble members typically used in EnKFs for full complexity atmospheric models. Effectively, a Gaussian assumption is being made due to the sample size. The computational efficiency of the AGR filter means that there is greater opportunity to pursue non-Gaussian pdfs via Gaussian mixture models (GMMs). In GMMs a non-Gaussian distribution is approximated by a series of Gaussian distributions which, in this case, would lead to an optimally weighted ensemble of AGR filters.

A. Formulas

A.1. Expectation formulas

Consider the expectation $E [x_{t} | Y_{T - 1}] = \int_{R^{n}} x_{t} p (x_{t} | Y_{T - 1}) d x_{t}$ (A.1) $= \int_{R^{n}} [\int_{R^{n}} x_{t} p (x_{t} | x_{t - 1}) d x_{t}] p (x_{t - 1} | Y_{T - 1}) d x_{t - 1}$ (A.2) $= \int_{R^{n}} f (x_{t - 1}) p (x_{t - 1} | Y_{T - 1}) d x_{t - 1}$ (A.3)

where $E [\cdot]$ is the expectation. Note that Equation (A.2) follows from Equation (2.1) and Fubini’s theorem, and Equation (2.7) follows from Equation (2.3) and w_t being Gaussian. Similarly, the predicted covariance is given by $E [x_{t} x_{t}^{T}] = \int_{R^{n}} x_{t} x_{t}^{T} p (x_{t} | Y_{T - 1}) d x_{t}$ (A.4) $= \int_{R^{n}} [\int_{R^{n}} x_{t} x_{t}^{T} p (x_{t} | x_{t - 1}) d x_{t}] p (x_{t - 1} | Y_{T - 1}) d x_{t - 1}$ (A.5) $= \int_{R^{n}} f (x_{t - 1}) f {(x_{t - 1})}^{T} p (x_{t - 1} | Y_{T - 1}) d x_{t - 1} + Q$. (A.6)

A.2. Covariance

The cross covariance in Equation (2.13) is computed via $P_{t}^{x y} = E [(x_{t} - x_{t}^{b}) {({\hat{y}}_{t} - {\hat{y}}_{t}^{b})}^{T}]$ (A.7) $= \int (x_{t} - x_{t}^{b}) {({\hat{y}}_{t} - {\hat{y}}_{t}^{b})}^{T} p (x_{t}, {\hat{y}}_{t} | Y_{t - 1}) d x_{t} d {\hat{y}}_{t}$ (A.8) $= \int_{R^{n}} (x_{t} - x_{t}^{b}) [\int_{R^{d}} {({\hat{y}}_{t} - {\hat{y}}_{t}^{b})}^{T} p ({\hat{y}}_{t} | x_{t}) d {\hat{y}}_{t}] p (x_{t} | Y_{T - 1}) d x_{t}$ (A.9) $= \int_{R^{n}} (x_{t} - x_{t}^{b}) {(H x_{t} - H x_{t}^{b})}^{T} p (x_{t} | Y_{t - 1}) d x_{t}$ (A.10) $= \int_{R^{n}} (x_{t} - x_{t}^{b}) {(x_{t} - x_{t}^{b})}^{T} H^{T} p (x_{t} | Y_{t - 1}) d x_{t}$ (A.11) $= P_{t}^{b} H^{T}$. (A.12)

B. Quadrature error

B.1. Scalar quadrature error

The quadrature error in evaluating the integrals in Equations (2.18) and (2.19) comes from the low-order polynomial approximation (3.2). Consider the estimation error of Equation (3.6) given by $e_{mean} = \int_{R} (f (\sqrt{P} η + x_{t - 1}^{a}) - γ (η)) N (η | 0, 1) d η$ (B.1) $= \int_{R} (\frac{1}{6} a_{3} η^{3} + \frac{1}{24} a_{4} η^{4} + \dots) \frac{1}{\sqrt{2 π}} e^{- \frac{1}{2} η^{2}} d η$ (B.2) $= \frac{3}{24} a_{4} + \dots$ (B.3) where $a_{3}, a_{4}$ are the third and fourth derivatives, respectively, of f. Similarly, the quadrature error for Equation (3.9) is given by $e_{covariance} = \int_{R} (f (\sqrt{P} η + x_{t - 1}^{a}) - x_{t - 1}^{a})^{2} N (η | 0, 1) d η$ (B.4) $= \int_{R} {(\frac{1}{6} a_{3} η^{3} + \frac{1}{24} a_{4} η^{4} + \dots - (\frac{3}{24} a_{4} + \dots))}^{2} \frac{1}{\sqrt{2 π}} e^{- \frac{1}{2} η^{2}} d η$ (B.5) $= \frac{15}{36} a_{3}^{2} + \frac{105}{576} a_{4}^{2} + \dots$. (B.6)
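The numerical coefficients in (B.3) and (B.6) come from the moments of the standard normal distribution, E[η³] = 0, E[η⁴] = 3, E[η⁶] = 15 and E[η⁸] = 105. The sketch below checks these with a probabilists' Gauss-Hermite rule from NumPy; the helper name `gaussian_expectation`, the rule order of 20, and the test function f(x) = x⁴ with P = 1 and zero analysis mean are assumptions for illustration.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def gaussian_expectation(g, order=20):
    # Gauss-Hermite rule for the weight exp(-eta^2 / 2); the weights sum to
    # sqrt(2 pi), so dividing by that gives an expectation under N(0, 1).
    nodes, weights = hermegauss(order)
    return np.sum(weights * g(nodes)) / np.sqrt(2.0 * np.pi)

# Moments that generate the coefficients 3/24, 15/36 and 105/576.
moments = {p: gaussian_expectation(lambda eta, p=p: eta**p) for p in (3, 4, 6, 8)}

# Mean error for f(x) = x^4 with P = 1 and analysis mean 0: here a_2 = 0 and
# a_4 = 24, so the second-order estimate f(0) + a_2 / 2 is zero.
exact_mean = gaussian_expectation(lambda eta: eta**4)
agr_estimate = 0.0
mean_error = exact_mean - agr_estimate      # (3/24) * a_4 = 3
```

The odd moment vanishes, which is why the a₃ term does not appear in the mean error (B.3).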

B.2. N-d quadrature error

The estimation errors of Equations (3.30) and (3.34), the latter having right-hand side $Q + \sum_{i = 1}^{n} a_{i} a_{i}^{T} + \frac{1}{2} \sum_{i = 1}^{n} b_{i} b_{i}^{T}$, are given by

$e_{mean} = \frac{1}{(2 \pi)^{n / 2}} \int_{\mathbb{R}^{n}} \left( \frac{1}{2} \sum_{i \neq j}^{n} b_{i, j} \eta_{i} \eta_{j} + \frac{1}{6} \sum_{i, j, k = 1}^{n} c_{i, j, k} \eta_{i} \eta_{j} \eta_{k} + \frac{1}{24} \sum_{i, j, k, \ell = 1}^{n} d_{i, j, k, \ell} \eta_{i} \eta_{j} \eta_{k} \eta_{\ell} + \dots \right) e^{- \frac{1}{2} | \eta |^{2}} \, d \eta$ (B.7)

$= \frac{1}{8} \sum_{i = 1}^{n} d_{i, i, i, i} + \frac{1}{24} \sum_{i = j, k = \ell, i \neq k} d_{i, i, k, k} + \frac{1}{24} \sum_{i = k, j = \ell, i \neq j} d_{i, j, i, j} + \frac{1}{24} \sum_{i = \ell, j = k, i \neq k} d_{i, j, j, i} + \dots$ (B.8)

and

$e_{covariance} = \frac{1}{(2 \pi)^{n / 2}} \int_{\mathbb{R}^{n}} \left( \frac{1}{2} \sum_{i \neq j}^{n} b_{i, j} \eta_{i} \eta_{j} + \frac{1}{6} \sum_{i, j, k = 1}^{n} c_{i, j, k} \eta_{i} \eta_{j} \eta_{k} + \dots - \left( \frac{1}{8} \sum_{i = 1}^{n} d_{i, i, i, i} + \dots \right) \right)$ (B.9)

$\cdot \left( \frac{1}{2} \sum_{i \neq j}^{n} b_{i, j} \eta_{i} \eta_{j} + \frac{1}{6} \sum_{i, j, k = 1}^{n} c_{i, j, k} \eta_{i} \eta_{j} \eta_{k} + \dots - \left( \frac{1}{8} \sum_{i = 1}^{n} d_{i, i, i, i} + \dots \right) \right)^{T} e^{- \frac{1}{2} | \eta |^{2}} \, d \eta$ (B.10)

$= \frac{1}{4} \sum_{i \neq j}^{n} b_{ij} b_{ij}^{T} + \frac{15}{36} \sum_{i = 1}^{n} c_{iii} c_{iii}^{T} + \frac{1}{36} \sum_{i \neq j \neq k}^{n} c_{ijk} c_{ijk}^{T} + \dots$ (B.11)

where $c_{ijk} = \sum_{i, j, k = 1}^{n} \frac{\partial^{3} f}{\partial x_{i} \partial x_{j} \partial x_{k}}$ and $d_{ijk\ell} = \sum_{i, j, k, \ell = 1}^{n} \frac{\partial^{4} f}{\partial x_{i} \partial x_{j} \partial x_{k} \partial x_{\ell}}$.

Note that the error on the mean does not depend on the cross terms in the Jacobian.
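The step from (B.7) to (B.8), and the remark that the cross terms drop out of the mean error, can be illustrated numerically. The sketch below is a check under stated assumptions, not the paper's algorithm: it takes n = 2, fills hypothetical coefficient arrays `b` and `d` with seeded random numbers (they are not derivatives of any real model), integrates the second-order cross terms and the fourth-order terms against a standard Gaussian with a tensor-product Gauss-Hermite rule, and compares the result with the closed form in (B.8).

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

n = 2
nodes, weights = hermegauss(6)                   # exact to degree 11 per dimension
weights = weights / np.sqrt(2.0 * np.pi)         # normalise to N(0, 1)

# Tensor-product grid: every row of eta is one 2-D quadrature point.
E1, E2 = np.meshgrid(nodes, nodes, indexing="ij")
eta = np.stack([E1.ravel(), E2.ravel()], axis=1)
w = np.outer(weights, weights).ravel()

# Hypothetical coefficients, random but reproducible.
rng = np.random.default_rng(0)
b = rng.normal(size=(n, n))
d = rng.normal(size=(n, n, n, n))

# Integrand of (B.7) restricted to the cross (i != j) and fourth-order terms.
cross = 0.5 * sum(b[i, j] * eta[:, i] * eta[:, j]
                  for i in range(n) for j in range(n) if i != j)
quartic = np.einsum("ijkl,pi,pj,pk,pl->p", d, eta, eta, eta, eta) / 24.0
e_mean_quadrature = np.sum(w * (cross + quartic))

# Closed form (B.8): only paired indices survive the Gaussian integral.
e_mean_b8 = sum(d[i, i, i, i] for i in range(n)) / 8.0
e_mean_b8 += sum(d[i, i, k, k] + d[i, k, i, k] + d[i, k, k, i]
                 for i in range(n) for k in range(n) if i != k) / 24.0

cross_only = np.sum(w * cross)                   # contribution of the b cross terms
print(e_mean_quadrature, e_mean_b8, cross_only)
```

The cross-term contribution vanishes to rounding error because $E [\eta_{i} \eta_{j}] = 0$ for $i \neq j$, which is why the coefficients $b_{i, j}$ do not appear in (B.8).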

Acknowledgments

The authors would like to express their gratitude to Dr Nancy Baker from the U.S. Naval Research Laboratory for her insights and discussions that have greatly improved this manuscript. This research is supported by the Office of Naval Research (ONR) through the NRL Base Program PE 0601153N.

References

Anderson, J. L. 2001. An ensemble adjustment Kalman filter for data assimilation. Mon. Wea. Rev. 129, 2884–2903. doi:10.1175/1520-0493(2001)129<2884:AEAKFF>2.0.CO;2
Arasaratnam, I. and Haykin, S. 2009. Cubature Kalman filters. IEEE Trans. Automat. Contr. 54, 1254–1269. doi:10.1109/TAC.2009.2019800
Bishop, C. H., Etherton, B. and Majumdar, S. J. 2001. Adaptive sampling with the ensemble transform Kalman filter, Part I: theoretical aspects. Mon. Wea. Rev. 129, 420–436. doi:10.1175/1520-0493(2001)129<0420:ASWTET>2.0.CO;2
Buehner, M., Houtekamer, P. L., Charette, C., Mitchell, H. L. and He, B. 2010. Intercomparison of variational data assimilation and the ensemble Kalman filter for global deterministic NWP. Part II: one-month experiments with real observations. Mon. Wea. Rev. 138, 1567–1586. doi:10.1175/2009MWR3158.1
Chen, Z. 2003. Bayesian Filtering: From Kalman Filters to Particle Filters, and Beyond. Tech. Rep. Hamilton, Canada: McMaster University.
Clayton, A. M., Lorenc, A. C. and Barker, D. M. 2013. Operational implementation of a hybrid ensemble/4D-Var global data assimilation system at the Met Office. Q. J. R. Meteorol. Soc. 139, 1445–1461. doi:10.1002/qj.2054
Courtier, P., Andersson, E., Heckley, W., Pailleux, J., Vasiljević, D. and co-authors. 1998. The ECMWF implementation of three-dimensional variational assimilation (3D-Var). I: formulation. Q. J. R. Meteorol. Soc. 124, 1783–1807.
Daley, R. 1991. Atmospheric Data Analysis. Cambridge University Press, New York.
Derber, J. and Bouttier, F. 1999. A reformulation of the background error covariance in the ECMWF global data assimilation system. Tellus A 51, 195–221. doi:10.3402/tellusa.v51i2.12316
Evensen, G. 2003. The ensemble Kalman filter: theoretical formulation and practical implementation. Ocean Dynamics 53, 343–367. doi:10.1007/s10236-003-0036-9
Haykin, S., Zia, A., Xue, Y. and Arasaratnam, I. 2011. Control theoretic approach to tracking radar: first step towards cognition. Digit. Signal Process. 21, 576–585. doi:10.1016/j.dsp.2011.01.004
Ho, Y. C. and Lee, R. C. K. 1964. A Bayesian approach to problems in stochastic estimation and control. IEEE Trans. Automat. Control 9, 333–339. doi:10.1109/TAC.1964.1105763
Hodyss, D., Campbell, W. F. and Whitaker, J. S. 2016. Observation-dependent posterior inflation for the ensemble Kalman Filter. Mon. Wea. Rev. 144, 2667–2684. doi:10.1175/MWR-D-15-0329.1
Hodyss, D. and Nathan, T. 2002. Solitary Rossby waves in zonally varying jet flows. Physica D 96, 239–262.
Hodyss, D., Viner, K., Reinecke, A. and Hansen, J. 2013. The impact of noisy physics on the stability and accuracy of physics-dynamics coupling. Mon. Wea. Rev. 141, 4470–4486.
Houtekamer, P. L., Deng, X., Mitchell, H. L., Baek, S. J. and Gagnon, N. 2014. Higher resolution in operational ensemble Kalman Filter. Mon. Wea. Rev. 142, 1143–1162. doi:10.1175/MWR-D-13-00138.1
Ito, K. and Xiong, K. 2000. Gaussian filters for nonlinear filtering problems. IEEE Trans. Automat. Control 45, 910–927. doi:10.1109/9.855552
Julier, S., Uhlmann, J. and Durrant-Whyte, H. F. 2000. A new method for the nonlinear transformation of means and covariances in filters and estimators. IEEE Trans. Automat. Contr. 45, 477–482. doi:10.1109/9.847726
King, S., Kang, W. and Ito, K. 2016. Reduced order Gaussian smoothing for nonlinear data assimilation. IFAC-PapersOnLine 49, 199–204. doi:10.1016/j.ifacol.2016.10.163
Kalman, R. E. 1960. A new approach to linear filtering and prediction problems. Trans. ASME, J. Basic Eng. 82, 35–45. doi:10.1115/1.3662552
Kuhl, D. D., Rosmond, T. E., Bishop, C. H., McLay, J. and Baker, N. L. 2013. Comparison of hybrid ensemble/4DVar and 4DVar within the NAVDAS-AR data assimilation framework. Mon. Wea. Rev. 141, 2740–2758. doi:10.1175/MWR-D-12-00182.1
Liu, J., Cai, B. and Wang, J. 2017. Cooperative localization of connected vehicles: integrating GNSS with DSRC using a robust cubature Kalman filter. IEEE Trans. Intell. Transport. Syst. 18, 2111–2125. doi:10.1109/TITS.2016.2633999
Morzfeld, M. and Hodyss, D. 2019. Gaussian approximations in filters and smoothers for data assimilation. Tellus A 71, 1–27.
Miyoshi, T., Kondo, K. and Imamura, T. 2014. The 10,240-member ensemble Kalman filtering with an intermediate AGCM. Geophys. Res. Lett. 41, 5264–5271. doi:10.1002/2014GL060863
Poterjoy, J. 2016. A localized particle filter for high-dimensional nonlinear systems. Mon. Wea. Rev. 144, 59–76. doi:10.1175/MWR-D-15-0163.1
Potter, J. E. 1963. New Statistical Formulas. Space Guidance Analysis Memo 40, Instrumentation Laboratory, MIT, Cambridge, MA.
Särkkä, S. 2013. Bayesian Filtering and Smoothing. Cambridge University Press, Cambridge.
Sharma, A., Srivastava, S. C. and Chakrabarti, S. 2017. A cubature Kalman filter based power system dynamic state estimator. IEEE Trans. Instrum. Meas. 66, 2036–2045. doi:10.1109/TIM.2017.2677698
Sondergaard, T. and Lermusiaux, P. F. J. 2013. Data assimilation with Gaussian mixture models using the dynamically orthogonal field equations. Part I: theory and scheme. Mon. Wea. Rev. 141, 1737–1760. doi:10.1175/MWR-D-11-00295.1
Talagrand, O. and Courtier, P. 1987. Variational assimilation of meteorological observations with the adjoint vorticity equation I: theory. Q. J. R. Meteorol. Soc. 113, 1311–1328.
Tippett, M. K., Anderson, J. L., Bishop, C. H., Hamill, T. M. and Whitaker, J. S. 2003. Ensemble square root filters. Mon. Wea. Rev. 131, 1485–1490. doi:10.1175/1520-0493(2003)131<1485:ESRF>2.0.CO;2
Whitaker, J. S. and Hamill, T. M. 2002. Ensemble data assimilation without perturbed observations. Mon. Wea. Rev. 130, 1913–1924. doi:10.1175/1520-0493(2002)130<1913:EDAWPO>2.0.CO;2
Wu, Y., Hu, D., Wu, M. and Hu, X. 2006. A numerical-integration perspective on Gaussian filters. IEEE Trans. Signal Process. 54, 2910–2921. doi:10.1109/TSP.2006.875389
