Full article: The Rational SPDE Approach for Gaussian Random Fields With General Smoothness

Formulae display: $MathJax Logo$ ?Mathematical formulae have been encoded as MathML and are displayed in this HTML version using MathJax in order to improve their display. Uncheck the box to turn MathJax off. This feature requires Javascript. Click on a formula to zoom.

Abstract

A popular approach for modeling and inference in spatial statistics is to represent Gaussian random fields as solutions to stochastic partial differential equations (SPDEs) of the form $L^{β} u = W$ , where $W$ is Gaussian white noise, L is a second-order differential operator, and $β > 0$ is a parameter that determines the smoothness of u. However, this approach has been limited to the case $2 β \in N$ , which excludes several important models and makes it necessary to keep β fixed during inference. We propose a new method, the rational SPDE approach, which in spatial dimension $d \in N$ is applicable for any $β > d / 4$ , and thus remedies the mentioned limitation. The presented scheme combines a finite element discretization with a rational approximation of the function $x^{- β}$ to approximate u. For the resulting approximation, an explicit rate of convergence to u in mean-square sense is derived. Furthermore, we show that our method has the same computational benefits as in the restricted case $2 β \in N$ . Several numerical experiments and a statistical application are used to illustrate the accuracy of the method, and to show that it facilitates likelihood-based inference for all model parameters including β.

Supplementary materials for this article are available online.

Keywords:

1 Introduction

One of the main challenges in spatial statistics is to handle large datasets. A reason for this is that the computational cost for likelihood evaluation and spatial prediction is in general cubic in the number N of observations of a Gaussian random field. A tremendous amount of research has been devoted to coping with this problem and various methods have been suggested (see Heaton et al. Citation2019, for a recent review).

A common approach is to define an approximation u_h of a Gaussian random field u on a spatial domain $D$ via a basis expansion, (1.1) $u_{h} (s) = \sum_{j = 1}^{n} u_{j} φ_{j} (s), s \in D,$ (1.1) where $φ_{j} : D \to R$ are fixed basis functions and the vector $u = {(u_{1}, \dots, u_{n})}^{⊤} \sim N (0, Σ_{u})$ are stochastic weights. The computational effort can then be reduced by choosing $n ≪ N$ . However, methods based on such low-rank approximations tend to remove fine-scale variations of the process. For this reason, methods which instead exploit sparsity for reducing the computational cost have gained popularity in recent years. One can construct sparse approximations either of the covariance matrix of the measurements (Furrer, Genton, and Nychka Citation2006), or of the inverse of the covariance matrix (Datta et al. Citation2016). Alternatively, one can let the precision matrix $Σ_{u}^{- 1}$ of the weights in (1.1) be sparse, as in the stochastic partial differential equation (SPDE) approach by Lindgren, Rue, and Lindström (Citation2011), where usually $n \approx N$ . To increase the accuracy further, several combinations of the methods mentioned above have been considered (e.g., Sang and Huang Citation2012) and multiresolution approximations of the process have been exploited (Nychka et al. Citation2015; Katzfuss Citation2017). However, theoretical error bounds have not been derived for most of these methods, which necessitates tuning these approximations for each specific model.

In this work we propose a new class of approximations, whose members we refer to as rational SPDE approximations or rational approximations for short. Our approach is similar to some of the above methods in the sense that an expansion (1.1) with compactly supported basis functions is exploited. The main novelty is that neither the covariance matrix $Σ_{u}$ nor the precision matrix $Σ_{u}^{- 1}$ of the weights u is assumed to be sparse. The covariance matrix is instead a product $Σ_{u} = P Q^{- 1} P^{⊤}$ , where P and Q are sparse matrices and the sparsity pattern of P is a subset of that of Q. We show that the resulting approximation facilitates inference and prediction at the same computational cost as a Markov approximation with $Σ_{u}^{- 1} = Q$ , and at a higher accuracy.

For the theoretical framework of our approach, we consider a Gaussian random field on a bounded domain $D \subset R^{d}$ which can be represented as the solution u to the SPDE (1.2) $L^{β} u = W in D,$ (1.2) where $W$ is Gaussian white noise on $D$ , and $L^{β}$ is a fractional power of a second-order differential operator L which determines the covariance structure of u. Our rational approximations are based on two components: (i) a finite element method (FEM) with continuous and piecewise polynomial basis functions ${φ_{j}}_{j = 1}^{n}$ , and (ii) a rational approximation of the function $x^{- β}$ . We explain how to perform these two steps in practice to explicitly compute the matrices P and Q. Furthermore, we derive an upper bound for the strong mean-square error of the rational approximation. This bound provides an explicit rate of convergence in terms of the mesh size of the finite element discretization, which facilitates tuning the approximation without empirical tests for each specific model.Examples of random fields which can be expressed as solutions to SPDEs of the form (1.2) include approximations of Gaussian Matérn fields (Matérn Citation1960). Specifically, if $D : = R^{d}$ a zero-mean Gaussian Matérn field can be viewed as a solution u to (1.3) ${(κ^{2} - Δ)}^{β} (τ u) = W,$ (1.3) where Δ denotes the Laplacian (Whittle Citation1963). The constant parameters $κ, τ > 0$ determine the practical correlation range and the variance of u. The exponent β defines the smoothness parameter ν of the Matérn covariance function via the relation $2 β = ν + d / 2$ and, thus, the differentiability of the field. For applications, variance, range and differentiability typically are the most important properties of a Gaussian field. For this reason, the Matérn model is highly popular in spatial statistics and has become the standard choice for Gaussian process priors in machine learning (Rasmussen and Williams Citation2006). Since (1.3) is a special case of (1.2) we believe that the outcomes of this work will be of great relevance for many statistical applications, see also Section 7.

In contrast to covariance-based models, the SPDE approach additionally has the advantage that it allows for a number of generalizations of stationary Matérn fields including (i) nonstationary fields generated by nonstationary differential operators (e.g., Fuglstad et al. Citation2015), (ii) fields on more general domains such as the sphere (e.g., Lindgren, Rue, and Lindström Citation2011), and (iii) non-Gaussian models (e.g., Wallin and Bolin Citation2015).

Lindgren, Rue, and Lindström (Citation2011) showed that, in the case that $2 β \in N$ , one can construct accurate approximations of the form (1.1) for Gaussian Matérn fields on bounded domains $D ⊊ R^{d}$ , such that $Σ_{u}^{- 1}$ is sparse. To this end, (1.3) is considered on $D$ and the differential operator $κ^{2} - Δ$ is augmented with appropriate boundary conditions. The resulting SPDE is then solved approximately by means of a FEM. Due to the implementation in the R-INLA software, this approach has become widely used, see Bakka et al. (Citation2018) for a comprehensive list of recent applications.

However, the restriction $2 β \in N$ implies a significant limitation for the flexibility of the method. In particular, it is therefore not directly applicable to the important special case of exponential covariance ( $ν = 1 / 2$ ) on $R^{2}$ , where $β = 3 / 4$ . In addition, restricting the value of β complicates estimating the smoothness of the process from data. In fact, β typically is fixed when the method is used in practice, since identifying the value of $2 β \in N$ with the highest likelihood requires a separate estimation of all the other parameters in the model for each value of β. A common justification for fixing β is to argue that it is not practicable to estimate the smoothness of a random field from data. However, there are certainly applications for which it is feasible to estimate the smoothness. We provide an example of this in Section 7. Furthermore, having the correct smoothness of the model is particularly important for interpolation, and the fact that the Matérn model allows for estimating the smoothness from data was the main reason for why Stein (Citation1999) recommended the model.

The rational SPDE approach presented in this work facilitates an estimation of β from data by providing an approximation of u which is computable for all fractional powers $β > d / 4$ (i.e., $ν > 0$ ), where $d \in N$ is the dimension of the spatial domain $D \subset R^{d}$ . It thus enables to include this parameter in likelihood-based (or Bayesian) parameter estimation for both stationary and nonstationary models. Although the SPDE approach has been considered in the nonfractional case also for nonstationary models, Lindgren, Rue, and Lindström (Citation2011) showed convergence of the approximation only for the stationary model (1.3). Our analysis in Section 3 closes this gap since we consider the general model (1.2) which covers the nonstationary case and several other previously proposed generalizations of the Matérn model.

The structure of this article is as follows: We briefly review existing methods for the SPDE approach in the fractional case in Section 2. In Section 3, the rational SPDE approximation is introduced and a result on its strong convergence is stated. The procedure of applying the rational SPDE approach to statistical inference is addressed in Section 4. Section 5 contains numerical experiments which illustrate the accuracy of the proposed method. The identifiability of the parameters in the Matérn SPDE model (1.3) is discussed in Section 6, where we derive necessary and sufficient conditions for equivalence of the induced Gaussian measures. In Section 7, we present an application to climate data, where we consider fractional and nonfractional models for both stationary and nonstationary covariances. We conclude with a discussion in Section 8. Finally, the supplementary materials contain four appendices providing details about (A) the finite element discretization, (B) the convergence analysis, (C) a comparison with the quadrature method by Bolin, Kirchner, and Kovács (Citation2018a), and (D) the equivalence of Gaussian measures. The method developed in this work has been implemented in the R (R Core Team Citation2017) package rSPDE (Bolin Citation2019).

2 The SPDE Approach in the Fractional Case Until Now

A reason for why the approach by Lindgren, Rue, and Lindström (Citation2011) only works for integer values of $2 β$ is given by Rozanov (Citation1977), who showed that a Gaussian random field on $R^{d}$ is Markov if and only if its spectral density can be written as the reciprocal of a polynomial, $\tilde{S} (k) = {(2 π)}^{- d} {(\sum_{j = 0}^{m} b_{j} | | k | |^{2 j})}^{- 1}$ . Since the spectrum of a Gaussian Matérn field is (2.1) $S (k) = \frac{1}{{(2 π)}^{d}} \frac{1}{{(κ^{2} + | | k | |^{2})}^{2 β}}, k \in R^{d},$ (2.1) the precision matrix Q will not be sparse unless $2 β \in N$ . For $2 β \notin N$ , Lindgren, Rue, and Lindström (Citation2011) suggested to compute a Markov approximation by choosing $m = ⌈ 2 β ⌉$ and selecting the coefficients $b = {(b_{1}, \dots, b_{m})}^{⊤}$ so that the deviation between the spectral densities $\int_{R^{d}} w (k) {(S (k) - \tilde{S} (k))}^{2} d k$ is minimized. For this measure of deviation, w is some suitable weight function which should be chosen to get a good approximation of the covariance function. For the method to be useful in practice, the coefficients b_j should be given explicitly in terms of the parameters κ and ν. Because of this, Lindgren, Rue, and Lindström (Citation2011) proposed a weight function that enables an analytical evaluation of the integral, $\int_{κ^{2}}^{\infty} [z^{2 β} - \sum_{j = 0}^{m} b_{j} {(z - κ^{2})}^{j}]^{2} z^{- 2 m - 1 - θ} d z,$ where $θ > 0$ is a tuning parameter. By differentiating this integral with respect to the parameters and setting the differentials equal to zero, a system of linear equations is obtained, which can be solved to find the coefficients b. The resulting approximation depends strongly on θ, and one could use numerical optimization to find a good value of θ for a specific value of β, or use the choice $θ = 2 β - ⌊ 2 β ⌋$ , which approximately minimizes the maximal distance between the covariance functions (Lindgren, Rue, and Lindström Citation2011). This method was used for the comparison in Heaton et al. (Citation2019), and we will use it as a baseline method when analyzing the accuracy of the rational SPDE approximations in later sections.

Another Markov approximation based on the spectral density was proposed by Roininen et al. (Citation2018). These Markov approximations may be sufficient in certain applications; however, any approach based on the spectral density or the covariance function is difficult to generalize to models on more general domains than $R^{d}$ , nonstationary models, or non-Gaussian models. Thus, such methods cannot be used if the full potential of the SPDE approach should be kept for fractional values of β.

There is a rich literature on methods for solving deterministic fractional PDEs (e.g., Gavrilyuk, Hackbusch, and Khoromskij Citation2004; Bonito and Pasciak Citation2015; Jin et al. Citation2015; Nochetto, Otárola, and Salgado Citation2015), and some of the methods that have been proposed could be used to compute approximations of the solution to the SPDE (1.3). However, any deterministic problem becomes more sophisticated when randomness is included. Even methods developed specifically for sampling solutions to SPDEs like (1.3) may be difficult to use directly for statistical applications, when likelihood evaluations, spatial predictions or posterior sampling are needed. For instance, it has been unclear if the sampling approach by Bolin, Kirchner, and Kovács (Citation2018a), which is based on a quadrature approximation for an integral representation of the fractional inverse $L^{- β}$ , could be used for statistical inference. In Appendix C, we show that it can be viewed as a (computationally less efficient) version of the rational SPDE approximations developed in this work. Consequently, the results in Section 4 on how to use the rational SPDE approach for inference apply also to that method. In Section 5.1, we compare the performance of the two methods in practice within the scope of a numerical experiment.

3 Rational Approximations for Fractional SPDEs

In this section, we propose an explicit scheme for approximating solutions to a class of SPDEs including (1.3). Specifically, in Sections 3.1 and 3.2, we introduce the fractional order equation of interest as well as its finite element discretization. In Section 3.3, we propose a nonfractional equation, whose solution after specification of certain coefficients approximates the random field of interest. For this approximation, we provide a rigorous error bound in Section 3.4. Finally, in Section 3.5, we address the computation of the coefficients in the rational approximation.

3.1 The Fractional Order Equation

With the objective of allowing for more general Gaussian random fields than the Matérn class, we consider the fractional order EquationEquation (1(1.1) $u_{h} (s) = \sum_{j = 1}^{n} u_{j} φ_{j} (s), s \in D,$ (1.1) .2), where $D \subset R^{d}$ , $d \in {1, 2, 3}$ , is an open, bounded, convex polytope, with closure $\bar{D}$ , and $W$ is Gaussian white noise in $L_{2} (D)$ . Here and below, $L_{2} (D)$ is the Lebesgue space of square-integrable real-valued functions, which is equipped with the inner product ${(w, v)}_{L_{2} (D)} : = \int_{D} w (s) v (s) d s$ . The Sobolev space of order $k \in N$ is denoted by $H^{k} (D) : = {w \in L_{2} (D) : D^{γ} w \in L_{2} (D) \forall | γ | \leq k}$ and $H_{0}^{1} (D)$ is the subspace of $H^{1} (D)$ containing functions with vanishing trace.

We assume that the operator $L : D (L) \subset L_{2} (D) \to L_{2} (D)$ is a linear second-order differential operator in divergence form, (3.1) $L u = - \nabla \cdot (H \nabla u) + κ^{2} u,$ (3.1) whose domain of definition $D (L)$ depends on the choice of boundary conditions on $\partial D$ . Specifically, we impose homogeneous Dirichlet or Neumann boundary conditions and set $V = H_{0}^{1} (D)$ or $V = H^{1} (D)$ , respectively. Furthermore, we let the functions H and κ in (3.1) satisfy the following assumptions:

$H : \bar{D} \to R^{d \times d}$ is symmetric, Lipschitz continuous on the closure $\bar{D}$ , that is, $\exists C_{Lip} > 0 : | H_{i j} (s) - H_{i j} (s') | \leq C_{Lip} | | s - s' | |$ $\forall s, s' \in \bar{D}, \forall i, j \in {1, \dots, d},$

and uniformly positive definite, that is,

\exists C_{0} > 0 : ess \inf_{s \in D} ξ^{⊤} H (s) ξ \geq C_{0} | | ξ | |^{2} \forall ξ \in R^{d};

$κ : D \to R$ is essentially bounded, $κ \in L_{\infty} (D)$ . Furthermore, if Neumann boundary conditions are imposed, then $ess \inf_{s \in D} κ (s) \geq κ_{0} > 0$ holds.

If I.–II. are satisfied, the differential operator L in (3.1) induces a symmetric, continuous and coercive bilinear form a_L on V, (3.2) $a_{L} (u, v) : = {(H \nabla u, \nabla v)}_{L_{2} (D)} + {(κ^{2} u, v)}_{L_{2} (D)}, u, v \in V,$ (3.2) and its domain is given by $D (L) = H^{2} (D) \cap V$ . Furthermore, Weyl’s law (see, e.g., Davies Citation1995, Theorem 6.3.1) shows that the eigenvalues ${λ_{j}}_{j \in N}$ of the elliptic differential operator L in (3.1), in nondecreasing order, satisfy the spectral asymptotics (3.3) $λ_{j} ≂ j^{2 / d} as j \to \infty .$ (3.3)

Thus, existence and uniqueness of the solution u to (1.2) readily follow from Lemma 2.1 and Proposition 2.3 of Bolin, Kirchner, and Kovács (Citation2018a). We formulate this as a proposition.

Proposition 3.1

. Let L be given by (3.1) where H and κ satisfy the assumptions I.–II. above and assume $β > d / 4$ . Then (1.2) has a unique solution u in $L_{2} (Ω; L_{2} (D))$ .

The assumptions I.–II. on the differential operator L are satisfied, for example, by the Matérn operator $L = κ^{2} - Δ$ , in which case the condition $β > d / 4$ on the fractional exponent in (1.2) corresponds to a positive smoothness parameter ν, that is, to a nondegenerate field. Moreover, EquationEquation (1(1.1) $u_{h} (s) = \sum_{j = 1}^{n} u_{j} φ_{j} (s), s \in D,$ (1.1) .2) as considered in our work includes several previously proposed nonfractional nonstationary models as special cases, such as the nonstationary Matérn models by Lindgren, Rue, and Lindström (Citation2011), the models with locally varying anisotropy by Fuglstad et al. (Citation2015), and the barrier models by Bakka et al. (Citation2019). Thus, Proposition 3.1 shows existence and uniqueness of the fractional versions of all these models, which can be treated in practice by using the results of the following sections.

3.2 The Discrete Model

To discretize the problem, we assume that $V_{h} \subset V$ is a finite element space with continuous piecewise linear basis functions ${φ_{j}}_{j = 1}^{n_{h}}$ defined with respect to a triangulation $T_{h}$ of the domain $\bar{D}$ indexed by the mesh width $h : = \max_{T \in T_{h}} h_{T}$ , where $h_{T} : = diam (T)$ is the diameter of the element $T \in T_{h}$ . Furthermore, the family ${(T_{h})}_{h \in (0, 1)}$ of triangulations inducing the finite-dimensional subspaces ${(V_{h})}_{h \in (0, 1)}$ of V is supposed to be quasi-uniform, that is, there exist constants $C_{1}, C_{2} > 0$ such that $ρ_{T} \geq C_{1} h_{T}$ and $h_{T} \geq C_{2} h$ for all $T \in T_{h}$ and $h \in (0, 1)$ . Here, $ρ_{T} > 0$ is radius of largest ball inscribed in $T \in T_{h}$ .

The discrete operator $L_{h} : V_{h} \to V_{h}$ is defined in terms of the bilinear form a_L in (3.2) via the relation ${(L_{h} ϕ_{h}, ψ_{h})}_{L_{2} (D)} = a_{L} (ϕ_{h}, ψ_{h})$ which holds for all $ϕ_{h}, ψ_{h} \in V_{h}$ . We then consider the following SPDE on the finite-dimensional state space V_h, (3.4) $L_{h}^{β} u_{h} = W_{h} in D,$ (3.4) where $W_{h}$ is Gaussian white noise in V_h, that is, $W_{h} = \sum_{j = 1}^{n_{h}} ξ_{j} e_{j, h}$ for a basis ${e_{j, h}}_{j = 1}^{n_{h}}$ of V_h which is orthonormal in $L_{2} (D)$ and $ξ_{j} \sim N (0, 1)$ iid for $j = 1, \dots, n_{h}$ .

We note that the assumptions I.–II. from Section 3.1 on the functions H and κ combined with the convexity of $D$ imply that the operator L in (3.1) is $H^{2} (D)$ -regular, that is, for a right-hand side $f \in L_{2} (D)$ , the weak solution $u \in V$ to Lu = f satisfies $u \in H^{2} (D) \cap V$ , see, for example, Grisvard (Citation2011, Theorem 3.2.1.2) for the case of Dirichlet boundary conditions. By combining this observation with the spectral asymptotics (3.3) we see that the assumptions in Lemmas 3.1 and 3.2 of Bolin, Kirchner, and Kovács (Citation2018a) are satisfied (since then, in their notation, $r = s = q = 2$ and $α = 2 / d$ ) and we obtain an error estimate for the finite element approximation $u_{h} = L_{h}^{- β} W_{h}$ in (3.4) for all $β \in (d / 4, 1)$ . Furthermore, since their derivation requires only that $β > d / 4$ , we can formulate this result for all such values of β in the following proposition.

Proposition 3.2

. Suppose that $β > d / 4$ and that L is given by (3.1) where H and κ satisfy the assumptions I.–II. from Section 3.1. Let u, u_h be the solutions to (1.2) and (3.4), respectively. Then, there exists a constant C > 0 such that, for sufficiently small h, $| | u - u_{h} | |_{L_{2} (Ω; L_{2} (D))} \leq C h^{min {2 β - d / 2, 2}} .$

3.3 The Rational Approximation

Proposition 3.2 shows that the mean-square error between u and u_h in $L_{2} (D)$ converges to zero as $h \to 0$ . It remains to describe how an approximation of the random field u_h with values in the finite-dimensional state space V_h can be constructed.

For $β \in N$ one can use, for example, the iterated finite element method presented in Appendix A to compute u_h in (3.4) directly. In the following, we construct approximations of u_h if $β \notin N$ is a fractional exponent. For this purpose, we aim at finding a nonfractional equation (3.5) $P_{ℓ, h} u_{h, m}^{R} = P_{r, h} W_{h} in D,$ (3.5) such that $u_{h, m}^{R}$ is a good approximation of u_h, and where the operator $P_{j, h} : = p_{j} (L_{h})$ is defined in terms of a polynomial p_j of degree $m_{j} \in N_{0}$ , for $j \in {ℓ, r}$ . Since the so-defined operators $P_{ℓ, h}$ , $P_{r, h}$ commute, this will lead to a nested SPDE model of the form (3.6) $\begin{matrix} P_{ℓ, h} x_{h, m} = W_{h} in D, \\ u_{h, m}^{R} = P_{r, h} x_{h, m} in D, \end{matrix}$ (3.6) which facilitates efficient computations, see Section 4 and Appendix A.

Comparing the initial EquationEquation (1(1.1) $u_{h} (s) = \sum_{j = 1}^{n} u_{j} φ_{j} (s), s \in D,$ (1.1) .2) with (3.7) $P_{ℓ} u_{m}^{R} = P_{r} W in D,$ (3.7) where $P_{j} : = p_{j} (L), j \in {ℓ, r}$ , motivates the choice $m_{ℓ} - m_{r} \approx β$ to obtain a similar smoothness of $u_{m}^{R} = {(P_{r}^{- 1} P_{ℓ})}^{- 1} W$ and $u = L^{- β} W$ in (1.2). In practice, we first choose a degree $m \in N$ and then set (3.8) $\begin{matrix} m_{r} : = m and m_{ℓ} : = m + m_{β}, where \\ m_{β} : = \max {1, ⌊ β ⌋} . \end{matrix}$ (3.8)

In this case, the solution $u_{m}^{R}$ of (3.7) has the same smoothness as the solution v of the nonfractional equation $L^{⌊ β ⌋} v = W$ , if $β \geq 1$ , and as v in $L v = W$ , if $β < 1$ . Furthermore, for fixed h, the degree m controls the accuracy of the approximation $u_{h, m}^{R}$ .

We now turn to the problem of defining the nonfractional operators $P_{ℓ, h}$ and $P_{r, h}$ in (3.5). To compute u_h in (3.4) directly, one would have to apply the discrete fractional inverse $L_{h}^{- β}$ to the noise term $W_{h}$ on the right-hand side. Therefore, a first idea would be to approximate the function $x^{- β}$ on the spectrum of L_h by a rational function $\tilde{r}$ and to use $\tilde{r} (L_{h}) W_{h}$ as an approximation of u_h. This is, in essence, the approach proposed by Harizanov et al. (Citation2018) to find optimal solvers for the problem $L^{β} x = f$ , where L is a sparse symmetric positive definite matrix. However, the spectra of L and of L_h as $h \to 0$ (considered as operators on $L_{2} (D)$ ) are unbounded and, thus, it would be necessary to normalize the spectrum of L_h for every h, since it is not feasible to construct the rational approximation $\tilde{r}$ on an unbounded interval. We aim at an approximation $L_{h}^{- β} \approx p_{ℓ} {(L_{h})}^{- 1} p_{r} (L_{h})$ , where in practice the choice of $p_{ℓ}$ and p_r can be made independent of L_h and h. Thus, we pursue another idea.

In contrast to the differential operator L in (3.1), its inverse $L^{- 1} : L_{2} (D) \to L_{2} (D)$ is compact and, thus, the spectra of $L^{- 1}$ and of $L_{h}^{- 1}$ are bounded subsets of the intervals $J : = [0, λ_{1}^{- 1}]$ and $J_{h} : = [λ_{n_{h}, h}^{- 1}, λ_{1, h}^{- 1}] \subset J$ , respectively, where $λ_{1, h}, λ_{n_{h}, h} > 0$ are the smallest and the largest eigenvalue of L_h. This motivates a rational approximation r of the function $f (x) : = x^{β}$ on J and to deduce the nonfractional EquationEquation (3(3.1) $L u = - \nabla \cdot (H \nabla u) + κ^{2} u,$ (3.1) .5) from $u_{h, m}^{R} = r (L_{h}^{- 1}) W_{h}$ .

To achieve our envisaged choice (3.8) of different polynomial degrees $m_{ℓ}$ and m_r, we decompose f via $f (x) = \hat{f} (x) x^{m_{β}}$ , where $\hat{f} (x) : = x^{β - m_{β}}$ . We approximate $\hat{f} \approx \hat{r} : = \frac{q_{1}}{q_{2}}$ on J_h, where $q_{1} (x) : = \sum_{i = 0}^{m} c_{i} x^{i}$ and $q_{2} (x) : = \sum_{j = 0}^{m + 1} b_{j} x^{j}$ are polynomials of degree m and m + 1, respectively, and use $r (x) : = \hat{r} (x) x^{m_{β}}$ as an approximation for f. This construction leads (after expanding the fraction with x^m) to a rational approximation $\frac{p_{r}}{p_{ℓ}}$ of $x^{- β}$ , (3.9) $\begin{matrix} x^{- β} = f (x^{- 1}) \approx \hat{r} (x^{- 1}) x^{- m_{β}} = \frac{q_{1} (x^{- 1})}{q_{2} (x^{- 1}) x^{m_{β}}} \\ = \frac{\sum_{i = 0}^{m} c_{i} x^{m - i}}{\sum_{j = 0}^{m + 1} b_{j} x^{m + m_{β} - j}}, \end{matrix}$ (3.9) where the polynomials $p_{r} (x) : = \sum_{i = 0}^{m} c_{i} x^{m - i}$ and $p_{ℓ} (x) : = \sum_{j = 0}^{m + 1} b_{j} x^{m + m_{β} - j}$ are of degree m and $m + m_{β}$ , respectively, that is, (3.8) is satisfied.

The operators $P_{ℓ, h}, P_{r, h}$ in (3.5) are defined accordingly, (3.10) $\begin{matrix} P_{ℓ, h} : = p_{ℓ} (L_{h}) = \sum_{j = 0}^{m + 1} b_{j} L_{h}^{m + m_{β} - j}, \\ P_{r, h} : = p_{r} (L_{h}) = \sum_{i = 0}^{m} c_{i} L_{h}^{m - i} . \end{matrix}$ (3.10)

Their continuous counterparts in (3.7) are $P_{ℓ} : = p_{ℓ} (L)$ and $P_{r} : = p_{r} (L)$ . We note that, for (3.8) to hold, any choice $m_{2} \in {0, 1, \dots, m + m_{β}}$ would have been permissible for the polynomial degree of q₂, if m is the degree of q₁. The reason for setting $m_{2} = m + 1$ is that this is the maximal choice which is universally applicable for all values of $β > d / 4$ .

In the following we refer to $u_{h, m}^{R}$ in (3.5) with $P_{ℓ, h}, P_{r, h}$ defined by (3.10) as the rational SPDE approximation of degree m. We emphasize that this approximation relies (besides the finite element discretization) only on the rational approximation of the function $\hat{f}$ . In particular, no information about the operator L except for a lower bound of the eigenvalues is needed. In the Matérn case, we have $L = κ^{2} - Δ$ (with certain boundary conditions) and an obvious lower bound of the eigenvalues is therefore given by $κ^{2}$ .

3.4 An Error Bound for the Rational Approximation

In this subsection, we justify the approach proposed in Sections 3.2 and 3.3 by providing an upper bound for the strong mean-square error $| | u - u_{h, m}^{R} | |_{L_{2} (Ω; L_{2} (D))}$ . Here u and $u_{h, m}^{R}$ are the solutions of (1.2) and (3.5) and the rational approximation $u_{h, m}^{R}$ is constructed as described in Section 3.3, assuming that $\hat{r} = {\hat{r}}_{h}$ is the $L_{\infty}$ -best rational approximation of $\hat{f} (x) = x^{β - m_{β}}$ on the interval J_h for each h. This means that ${\hat{r}}_{h}$ minimizes the error in the supremum norm on J_h among all rational approximations of the chosen degrees in numerator and denominator. How such approximations can be computed is discussed in Section 3.5.

The theoretical analysis presented in Appendix B results in the following theorem, showing strong convergence of the rational approximation $u_{h, m}^{R}$ to the exact solution u.

Theorem 3.1.

Suppose that $β > d / 4$ and that L is given by (3.1) where H and κ satisfy the assumptions I.–II. from Section 3.1. Let u, $u_{h, m}^{R}$ be the solutions to (1.2) and (3.5), respectively. Then, there is a constant C > 0, independent of h, m, such that, for sufficiently small h, $\begin{matrix} | | u - u_{h, m}^{R} | |_{L_{2} (Ω; L_{2} (D))} \\ \leq C (h^{min {2 β - d / 2, 2}} + 1_{β \notin N} h^{min {2 (β - 1), 0 - d / 2}} e^{- 2 π \sqrt{| β - m_{β} | m}}) . \end{matrix}$

Remark 3.1.

To calibrate the accuracy of the rational approximation with the finite element error, one can choose $m \in N$ such that $e^{- 2 π \sqrt{| β - m_{β} | m}} \propto h^{2 \max {β, 1}}$ . The strong rate of mean-square convergence is then $\min {2 β - d / 2, 2}$ .

Remark 3.2

. If the functions H and κ in (3.1) are smooth, $H \in C^{\infty} {(\bar{D})}^{d \times d}$ and $κ \in C^{\infty} (\bar{D})$ (as, e.g., in the Matérn case) and if the domain $D$ has a smooth boundary, the higher-order strong mean-square convergence rate $\min {2 β - d / 2, p + 1}$ can be proven for a finite element method with continuous basis functions which are piecewise polynomial of degree at most $p \in N$ . Thus, for $β > 1$ , finite elements with p > 1 may be meaningful.

3.5 Computing the Coefficients of the Rational Approximation

As explained in Section 3.3, the coefficients ${c_{i}}_{i = 0}^{m}$ and ${b_{j}}_{j = 0}^{m + 1}$ needed for defining the operators $P_{ℓ, h}, P_{r, h}$ in (3.10) are obtained from a rational approximation $\hat{r} = {\hat{r}}_{h}$ of $\hat{f} (x) = x^{β - m_{β}}$ on J_h. For each h, this approximation can, for example, be computed with the second Remez algorithm (Remez Citation1934), which generates the coefficients of the $L_{\infty}$ -best approximation. The error analysis for the resulting approximation $u_{h, m}^{R}$ in (3.5) was performed in Section 3.4. Despite the theoretical benefit of generating the $L_{\infty}$ -best approximation, the Remez algorithm is often unstable in computations and, therefore, we use a different method in our simulations. However, versions of the Remez scheme were used, for example, by Harizanov et al. (Citation2018).

A simpler and computationally more stable way of choosing the rational approximation is, for instance, the Clenshaw–Lord Chebyshev–Padé algorithm (Baker and Graves-Morris Citation1996). To further improve the stability of the method, we will rescale the operator L so that its eigenvalues are bounded from below by one, which for the Matérn case corresponds to reformulating the SPDE (1.3) as ${(Id - κ^{- 2} Δ)}^{β} (\tilde{τ} u) = W$ and using $L = Id - κ^{- 2} Δ$ , where $Id$ denotes the identity on $L_{2} (D)$ and $\tilde{τ} : = κ^{2 β} τ$ .

To avoid computing a different rational approximation $\hat{r}$ for each finite element mesh width h, in practice we compute the approximation $\hat{r}$ only once on the interval $J_{*} : = [δ, 1]$ , where $δ \in (0, 1)$ should ideally be chosen such that $J_{h} \subset J_{*}$ for all considered mesh sizes h. For the numerical experiments later, we will use $δ = 10^{- (5 + m) / 2}$ when computing rational approximations of order m, which gives acceptable results for all values of β. As an example, the coefficients computed with the Clenshaw–Lord Chebyshev–Padé algorithm on $J_{*}$ for the case of exponential covariance on $R^{2}$ are shown in .

Table 1 Coefficients of the rational approximation for $β = 3 / 4$ (exponential covariance on $R^{2}$ ) for m = 1, 2, 3, normalized so that c_m = 1.

Display Table

4 Computational Aspects of the Rational Approximation

In the nonfractional case, the sparsity of the precision matrix for the weights u in (1.1) facilitates fast computation of samples, likelihoods, and other quantities of interest for statistical inference. The purpose of this section is to show that the rational SPDE approximation proposed in Section 3 preserves these good computational properties.

The representation (3.6) shows that $u_{h, m}^{R}$ can be seen as a Markov random field $x_{h, m}$ , transformed by the operator $P_{r, h}$ . Solving this latent model as explained in Appendix A, yields an approximation of the form (1.1), where $Σ_{u} = P_{r} Q^{- 1} P_{r}^{⊤}$ . Here $P_{ℓ}, P_{r} \in R^{n_{h} \times n_{h}}$ correspond to the discrete operators $P_{ℓ, h}$ and $P_{r, h}$ in (3.10), respectively. The matrix $Q : = P_{ℓ}^{⊤} C^{- 1} P_{ℓ}$ is sparse if the mass matrix C with respect to the finite element basis ${φ_{j}}_{j = 1}^{n_{h}}$ is replaced by the diagonal lumped mass matrix $\tilde{C}$ , see Appendix A. By defining $x \sim N (0, Q^{- 1})$ , we have $u = P_{r} x$ , which is a transformed Gaussian Markov random field (GMRF). Choosing x as a latent variable instead of u thus enables us to use all computational methods, which are available for GMRFs (see Rue and Held Citation2005), also for the rational SPDE approximation.

As an illustration, we consider the following hierarchical model, with a latent field u which is a rational approximation of (1.2), (4.1) $\begin{matrix} y_{i} = u (s_{i}) + ε_{i}, i = 1, \dots, N, \\ P_{ℓ} u = P_{r} W in D, \end{matrix}$ (4.1) where u is observed under iid Gaussian measurement noise $ε_{i} \sim N (0, σ^{2})$ . Given that one can treat this case, one can easily adapt the method to be used for inference in combination with MCMC or INLA (Rue, Martino, and Chopin Citation2009) for models with more sophisticated likelihoods.

Defining the matrix A with elements $A_{i j} = φ_{j} (s_{i})$ and the vector $y = {(y_{1}, \dots, y_{N})}^{⊤}$ gives us the discretized model (4.2) $\begin{matrix} y | x \sim N (A P_{r} x, σ^{2} I), \\ x \sim N (0, Q^{- 1}) . \end{matrix}$ (4.2)

In this way, the problem has been reduced to a standard latent GMRF model and a sparse Cholesky factorization of Q can be used for sampling x from $N (0, Q^{- 1})$ as well as to evaluate its log-density $log π_{x} (x)$ . Samples of u can then be obtained from samples of x via $u = P_{r} x$ . For evaluating the log-density of u, $log π_{u} (u)$ , the relation $log π_{u} (u) = log π_{x} (P_{r}^{- 1} u)$ can be exploited. Furthermore, the posterior distribution of x is given by $x | y \sim N (μ_{x | y}, Q_{x | y}^{- 1})$ , where $μ_{x | y} = σ^{- 2} Q_{x | y}^{- 1} P_{r}^{⊤} A^{⊤} y$ and $Q_{x | y} = Q + σ^{- 2} P_{r}^{⊤} A^{⊤} A P_{r}$ is a sparse matrix. Thus, simulations from the distribution of $x | y$ , and evaluations of the corresponding log-density $log π_{x | y} (x)$ , can be performed efficiently via a sparse Cholesky factorization of $Q_{x | y}$ . Finally, the marginal data log-likelihood is proportional to $\begin{matrix} log | P_{ℓ} | - \frac{1}{2} log | Q_{x | y} | - N log σ \\ - \frac{1}{2} (μ_{x | y}^{⊤} Q μ_{x | y} + σ^{- 2} {‖ y - A P_{r} μ_{x | y} ‖}^{2}) . \end{matrix}$

We therefore conclude that all computations needed for statistical inference can be facilitated by sparse Cholesky factorizations of $P_{ℓ}$ and $Q_{x | y}$ .

Remark 4.1

. From the specific form of the matrices $P_{ℓ}$ and $P_{r}$ addressed in Appendix A, we can infer that the number of nonzero elements in $Q_{x | y}$ for a rational SPDE approximation of degree m will be the same as the number of nonzero elements in $Q_{x | y}$ for the standard (nonfractional) SPDE approach with $β = m + m_{β}$ . Thus, also the computational cost will be comparable for these two cases.

Remark 4.2

. The matrix $Q_{x | y}$ can be ill-conditioned for m > 1 if a FEM approximation with piecewise linear basis functions is used. The numerical stability for large values of m can likely be improved by increasing the polynomial degree of the FEM basis functions, see also Remark 3.2.

5 Numerical Experiments

5.1 The Matérn Covariance on $R^{2}$

As a first test, we investigate the performance of the rational SPDE approach for Gaussian Matérn fields, without including the finite element discretization in space.

The spectral density S of the solution to (1.3) on $R^{2}$ is given by (2.1), whereas the spectral density for the non-discretized rational SPDE approximation $u_{m}^{R}$ in (3.7) is (5.1) $S_{R} (k) \propto κ^{4 β} {(\frac{\sum_{i = 1}^{m} c_{i} {(1 + κ^{- 2} | | k | |^{2})}^{m - i}}{\sum_{j = 1}^{m + 1} b_{j} {(1 + κ^{- 2} | | k | |^{2})}^{m + m_{β} - j}})}^{2} .$ (5.1)

We compute the coefficients as described in Section 3.5. To this end, we apply an implementation of the Clenshaw–Lord Chebyshev–Padé algorithm provided by the Matlab package Chebfun (Driscoll, Hale, and Trefethen Citation2014). By performing a partial fraction decomposition of (5.1), expanding the square, transforming to polar coordinates, and using the equality $\int_{0}^{\infty} \frac{ω J_{0} (ω h)}{(ω^{2} + a^{2}) (ω^{2} + b^{2})} d ω = \frac{1}{(b^{2} - a^{2})} (K_{0} (a h) - K_{0} (b h)),$ we are able to compute the corresponding covariance function $C_{R} (h)$ analytically. Here, J₀ is a Bessel function of the first kind and K₀ is a modified Bessel function of the second kind. To measure the accuracy of the approximation, we compare $C_{R} (h)$ to the true Matérn covariance function C(h) for different values of ν, where $κ = \sqrt{8 ν}$ is chosen such that the practical correlation range $r = \sqrt{8 ν} / κ$ equals one in all cases.

To put the accuracy of the rational approximation in context, the Markov approximation by Lindgren, Rue, and Lindström (Citation2011) and the quadrature method by Bolin, Kirchner, and Kovács (Citation2018a) are also shown. For the quadrature method, K = 12 quadrature nodes are used, which results in an approximation with the same computational cost as a rational approximation of degree m = 11, see Appendix C. shows the normalized error in the L₂-norm and the error with respect to $L_{\infty}$ -norm for different values of ν, both with respect to the interval $[0, 2]$ of length twice the practical correlation range, that is, ${(\frac{\int_{0}^{2} (C (h) - C_{a} (h))^{2} d h}{\int_{0}^{2} C {(h)}^{2} d h})}^{1 / 2} and \sup_{h \in [0, 2]} | C (h) - C_{a} (h) | .$

Fig. 1 The L₂- and $L_{\infty}$ -errors of the covariance functions for different values of ν for the different approximation methods. When ν = 1, all methods are exact.

Fig. 1 The L2- and L∞-errors of the covariance functions for different values of ν for the different approximation methods. When ν = 1, all methods are exact.

Here, C_a is the covariance function obtained by the respective approximation method.

Already for m = 3, the rational approximation performs better than both the Markov approximation and the quadrature approximation for all values of ν. It also decreases the error for the case of an exponential covariance by several orders of magnitude.

All methods are exact when ν = 1, since this is the nonfractional case. The Markov and rational methods show errors decreasing to zero as ν = 1, whereas the error of the quadrature method has a singularity at ν = 1. The performance of the quadrature method can be improved (although not the behavior near ν = 1) by increasing the number of quadrature nodes, see Appendix C. This is reasonable if the method is needed only for sampling from the model, but implementing this method for statistical applications, which require kriging or likelihood evaluations, is not feasible since the computational costs then are comparable to the standard SPDE approach with $β = K$ .

Finally, it should be noted that the Markov method also is exact at ν = 2 ( $β = 1.5$ ) since the spectrum of the process then is the reciprocal of a polynomial. The rational and quadrature methods cannot exploit this fact, since these approximations are based on the corresponding differential operator instead of the spectral density. This is the prize that has to be paid to formulate a method which works not only for the stationary Matérn fields but also for nonstationary and non-Gaussian models.

5.2 Computational Cost and the Finite Element Error

From the study in the previous subsection, we infer that the rational SPDE approach performs well for Matérn fields with arbitrary smoothness. However, as for the standard SPDE approach, we need to discretize the problem to be able to use the method in practice, for example, for inference. This induces an additional error source, which means that one should balance the two errors by choosing the degree m of the rational approximation appropriately with respect to the FEM error. A calibration based on the theoretical results has been suggested in Remark 3.1. In this section, we address this issue in practice and investigate the computational cost of the rational SPDE approximation.

As a test case, we compute approximations of a Gaussian Matérn field with unit variance and practical correlation range r = 0.1 on the unit square in $R^{2}$ . We assume homogeneous Neumann boundary conditions for the Matérn operator $κ^{2} - Δ$ in (1.3). For the discretization, we use a FEM with a nodal basis of continuous piecewise linear functions with respect to a mesh induced by a Delaunay triangulation of a regular lattice on the domain, with a total of n_h nodes. We consider three different meshes with $n_{h} = 57^{2}, 85^{2}, 115^{2}$ , which corresponds to the mesh sizes $h \approx r / 4, r / 6, r / 8$ .

To measure the accuracy, we compute the covariances between the midpoint of the domain ${\tilde{s}}_{*}$ and all other nodes in the lattice ${{\tilde{s}}_{j}}_{j = 1}^{n_{h}}$ for the Matérn field and the rational SPDE approximations and calculate the error similarly to the L₂-error in Section 5.1, ${(\frac{\sum_{j = 1}^{n_{h}} (C (| | {\tilde{s}}_{*} - {\tilde{s}}_{j} | |) - Σ_{j, *}^{u})^{2}}{\sum_{j = 1}^{n_{h}} C {(| | {\tilde{s}}_{*} - {\tilde{s}}_{j} | |)}^{2}})}^{1 / 2},$ where $Σ^{u} = P_{r} P_{ℓ}^{- 1} C P_{ℓ}^{- ⊤} P_{r}^{⊤}$ is the covariance matrix of u, see Appendix A. As a consequence of imposing boundary conditions, the error of the covariance is larger close to the boundary of the domain. However, we compare this error to the error of the nonfractional SPDE approach, which has the same boundary effects. As measures of the computational cost, we consider the time it takes to sample u and to evaluate $log | Q_{x | y} |$ for the model (4.2) with σ = 1, when y is a vector of noisy observations of the latent field at 1000 locations, drawn at random in the domain (a similar computation time is needed to evaluate $μ_{x | y}$ ).

The results for rational SPDE approximations of different degrees for the case $β = 3 / 4$ (exponential covariance) are shown in . Furthermore, we perform the same experiment when the standard (nonfractional) SPDE approach is used for $β = 2, 3, 4$ . As previously mentioned in Remark 4.1, the computational cost of the rational SPDE approximation of degree m should be comparable to the standard SPDE approach with $β = m + 1$ . validates this claim. One can also note that the errors of the rational SPDE approximations are similar to those of the standard SPDE approach, and that the reduction in error when increasing from m = 2 to m = 3 is small for all cases, indicating that the error induced by the rational approximation is small compared to the FEM error, even for a low degree m. This is also the reason for why, in particular in the pre-asymptotic region, one can in practice choose the degree m smaller than the value suggested in Remark 3.1, which gives $m \approx 6, 7, 8$ for $β = 3 / 4$ and the three considered finite element meshes.

Table 2 Covariance errors ( $\times 100$ ) and computing times in seconds ( $\times 100$ ) for sampling from the rational SPDE approximation u (with $β = 3 / 4$ ) and, in parentheses, for evaluating $log | Q_{x | y} |$ .

Display Table

6 Likelihood-Based Inference of Matérn Parameters

The computationally efficient evaluation of the likelihood of the rational SPDE approximation facilitates likelihood-based inference for all parameters of the Matérn model, including ν which until now had to be fixed when using the SPDE approach. In this section, we first discuss the identifiability of the model parameters and then investigate the accuracy of this approach within the scope of a simulation study.

6.1 Parameter Identifiability

A common reason for fixing the smoothness in Gaussian Matérn models is the result by Zhang (Citation2004) which shows that all three Matérn parameters cannot be estimated consistently under infill asymptotics. More precisely, for a fixed smoothness parameter ν, one cannot estimate both the variance of the field, $ϕ^{2}$ , and the scale parameter, κ, consistently. However, the quantity $ϕ^{2} κ^{2 ν}$ can be estimated consistently. The derivation of this result relies on the equivalence of Gaussian measures corresponding to Matérn fields (Zhang Citation2004, Theorem 2). The following theorem provides the analogous result for the Gaussian measures induced by the class of random fields specified via (1.3) on a bounded domain.

Theorem 6.1

. Let $D \subset R^{d}, d \in {1, 2, 3}$ , be bounded, open, and connected. For $i \in {1, 2}$ , let $β_{i} > d / 4$ , $κ_{i}, τ_{i} > 0$ , and consider the Gaussian measure $μ_{i} : = N (m_{i}, Q_{i}^{- 1})$ on $L_{2} (D)$ with mean $m_{i} : = 0$ and precision operator $Q_{i} : = τ_{i}^{2} L_{i}^{2 β_{i}}$ , where, for $i \in {1, 2}$ , the operators $L_{i} : = κ_{i}^{2} - Δ$ are augmented with the same homogeneous Neumann or Dirichlet boundary conditions. Then, μ₁ and μ₂ are equivalent if and only if $β_{1} = β_{2}$ and $τ_{1} = τ_{2}$ .

The proof can be found in Appendix D. Note that, for $D : = R^{d}$ , the parameter τ is related to the variance of the Gaussian random field via $ϕ^{2} = Γ (ν) {(τ^{2} Γ (2 β) {(4 π)}^{d / 2} κ^{2 ν})}^{- 1}$ . Thus, $τ^{- 2} \propto ϕ^{2} κ^{2 ν}$ , which means that Theorem 6.1 is in accordance with the result by Zhang (Citation2004). Since the Gaussian measures induced by the operators $L_{1} = τ {(κ_{1} + Δ)}^{β}$ and $L_{2} = τ {(κ_{2} + Δ)}^{β}$ are equivalent, we will not be able to consistently estimate κ under infill asymptotics. Yet, Theorem 6.1 suggests that it is possible to estimate τ and β consistently. In fact, with Theorem 6.1 available, it is straightforward to show that τ can be estimated consistently for a fixed ν by exploiting the same arguments as in the proof of (Zhang Citation2004, Theorem 3). However, it is beyond the scope of this article to show that both ν and τ can be estimated consistently which would also extend the results by Zhang (Citation2004).

6.2 Simulation Study

To numerically investigate the accuracy of likelihood-based parameter estimation using the rational SPDE approach, we again assume homogeneous Neumann boundary conditions for the Matérn operator in (1.3) and consider the standard latent model (4.1) from Section 4. We take the unit square as the domain of interest, set $σ^{2} = 0.1, ν = 0.5$ and choose κ and τ so that the latent field has variance $ϕ^{2} = 1$ and practical correlation range r = 0.2. For the FEM, we take a mesh based on a regular lattice on the domain, extended by twice the correlation range in each direction to reduce boundary effects, yielding a mesh with approximately 3500 nodes.

As a first test case, we use simulated data from the discretized model. We simulate 50 replicates of the latent field, each with corresponding noisy observations at 1000 measurement locations drawn at random in the domain. This results in a total of 50,000 observations, which we use to estimate the parameters of the model. We draw initial values for the parameters at random and then numerically optimize the likelihood of the model with the function fminunc in Matlab. This procedure is repeated 100 times, each time with a new simulated dataset.

As a second test case, we repeat the simulation study, but this time we simulate the data from a Gaussian Matérn field with an exponential covariance function instead of from the discretized model. For the estimation, we compute the rational SPDE approximation for the same finite element mesh as in the first test case. To investigate the effect of the mesh resolution on the parameter estimates, we also estimate the parameters using a uniformly refined mesh with twice as many nodes. The average computation time for evaluating the likelihood is approximately 0.16 sec for the coarse mesh and 0.4 sec for the fine mesh. This computation time is affine with respect to the number of replicates, and with only one replicate it is 0.09 sec for the coarse mesh and 0.2 sec for the fine mesh.

The results of the parameter estimation can be seen in , where the true parameter values are shown together with the mean and standard deviations of the 100 estimates for each case. Notably, we are able to estimate all parameters accurately in the first case. For the second case, the finite element discretization seems to induce a small bias, especially for the nugget estimate ( $σ^{2}$ ) that depends on the resolution of the mesh. The bias in the nugget estimate is not surprising since the increased nugget compensates for the FEM error. The bias could be decreased by choosing the mesh more carefully, also taking the measurement locations into account. In practice, however, this bias will not be of great importance, since the optimal nugget for the discretized model should be used.

Table 3 Results of the parameter estimation.

Display Table

It should be noted that there are several other methods for decreasing the computational cost of likelihood-based inference for stationary Matérn models. The major advantage of the rational SPDE approach is that it is directly applicable to more complicated nonstationary models, which we will use in the next section when analyzing real data.

7 Application

In this section, we illustrate for the example of a climate reanalysis dataset how the rational SPDE approach can be used for spatial modeling.

Climate reanalysis data are generated by combining a climate model with observations to obtain a description of the recent climate. We use reanalysis data generated with the Experimental Climate Prediction Center Regional Spectral Model (ECPC-RSM) which was originally prepared for the North American Regional Climate Change Assessment Program (NARCCAP) by means of NCEP/DOE Reanalysis (Mearns et al. Citation2009, 2014). As variable we consider average summer precipitation over the conterminous United States for a 26-year period from 1979 to 2004. The average value for each grid cell and year is computed as the average of the corresponding daily values for the days in June, July, and August. To obtain data which can be modeled by a Gaussian distribution, we follow Genton and Kleiber (Citation2015) and transform the data by taking the cube root. We then subtract the mean over the 26 years from each grid cell so that we can assume that the data has zero mean and focus on the correlation structure of the residuals. The resulting residuals for the year 1979 are shown in .

Fig. 2 Average summer precipitation residuals (in cm) for 1979 and the FEM mesh.

The 4112 observed residuals for each year are modeled as independent realizations of a zero-mean Gaussian random field with a nugget effect. That is, the measurement Y_ij at spatial location $s_{i}$ for year j is modeled as $Y_{i j} = u_{j} (s_{i}) + ε_{i j}$ , where $ε_{i j} \sim N (0, σ^{2})$ are independent, and ${u_{j} (s)}_{j}$ are independent realizations of a zero-mean Gaussian random field $u (s)$ . The analysis of Genton and Kleiber (Citation2015) revealed that an exponential covariance model is suitable for a subset of this dataset. Because of this, a natural first choice is to use a stationary Matérn model (1.3), either with $β = 0.75$ (exponential covariance) or with a general β which we estimate from the data. However, since we have data for a larger spatial region than Genton and Kleiber (Citation2015), one would suspect that a nonstationary model for $u (s)$ might be needed. The standard nonstationary model for the SPDE approach, as first suggested by Lindgren, Rue, and Lindström (Citation2011) and used in many applications since then, is (7.1) ${(κ {(s)}^{2} - Δ)}^{β} (τ (s) u (s)) = W (s), s \in D,$ (7.1) where β = 1 is fixed. Until now, it has not been possible to use the model (7.1) with fractional smoothness. Therefore, our main question is now: What is more important for this data—the fractional smoothness β or the nonstationary parameters? We thus consider four different SPDE models for $u (s)$ . Two of them are nonfractional models, where β = 1 is fixed, and for the other two (fractional) models, we estimate the fractional order β jointly with the other parameters from the data. For both cases, we consider stationary Matérn and nonstationary models, where the latter are formulated via (7.1) with $log κ (s) = κ_{0} + κ_{a} ψ_{a} (s) + \sum_{i, j = 1}^{2} \sum_{k, ℓ = 1}^{2} κ_{i j}^{k ℓ} ψ_{i}^{k} ({\tilde{s}}_{1}) ψ_{j}^{ℓ} ({\tilde{s}}_{2}),$ and the same model is used for $τ (s)$ . Here, $ψ_{j}^{1} (\tilde{s}) : = sin (j π \tilde{s})$ , $ψ_{j}^{2} (\tilde{s}) : = cos (j π \tilde{s}), ψ_{a} (s)$ is the altitude at location s, and $\tilde{s} = ({\tilde{s}}_{1}, {\tilde{s}}_{2})$ denotes the spatial coordinate after rescaling so that the observational domain is mapped to the unit square. Thus, $log κ (s)$ and $log τ (s)$ are modeled by the altitude covariate and 16 additional Fourier basis functions to capture large-scale trends in the parameters. The altitude covariate and the eight Fourier basis functions ${ψ_{1}^{k} ({\tilde{s}}_{1}) ψ_{j}^{ℓ} ({\tilde{s}}_{2}) : j, k, ℓ = 1, 2}$ are shown in .

Fig. 3 Nine basis functions modeling the parameters for the nonstationary models.

We discretize each model with respect to the finite element mesh shown in , assuming homogeneous Neumann boundary conditions. The mesh has 5021 nodes and was computed using R-INLA (Lindgren and Rue Citation2015). For the fractional models, we set m = 1 in the rational approximation and, for each model, the model parameters are estimated by numerical optimization of the log-likelihood as described in Section 4.

The log-likelihood values for the four models can be seen in . The parameter estimates for the stationary nonfractional ( $β = ν = 1$ ) model are $κ = 0.67, τ = 5.44$ , and $σ = 0.014$ , which implies a standard deviation $ϕ = 0.077$ and a practical range $ρ = 4.21$ . The estimates for the fractional model are $κ = 0.20, τ = 10.58, σ = 0.012$ , and $β = 0.72$ , corresponding to $ϕ = 0.081, ρ = 9.21$ , and a smoothness parameter $ν = 0.44$ . We note that the fractional model has a longer correlation range. This is likely to be caused by the nonfractional model underestimating the range ρ to compensate for the wrong local behavior of the covariance function induced by the smoothness parameter ν = 1.

Table 4 Model-dependent results for (i) the log-likelihood, (ii) the pseudo cross-validation scores (RMSE, CRPS, LS, each $\times 100$ ) averaged over ten replicates, and (iii) the computational time for one evaluation of the likelihood averaged over 100 computations.

Display Table

shows the estimated marginal standard deviation $ϕ (s)$ for the two nonstationary models (computed using the estimates of the parameters for $κ (s)$ and $τ (s)$ ) and 0.7 contours of the correlation function for selected locations in the domain. The estimate of β for the nonstationary fractional model is 0.723. Also for the nonstationary models, we observe a slightly longer practical correlation range $ρ (s)$ for the fractional model.

Fig. 4 Estimated marginal standard deviations (top row) and contours of 0.7 correlation of the correlation function for selected locations (bottom row), for the fractional (left column) and β = 1 (right column) models.

$Fig. 4 Estimated marginal standard deviations (top row) and contours of 0.7 correlation of the correlation function for selected locations (bottom row), for the fractional (left column) and β = 1 (right column) models.$

To investigate the predictive accuracy of the models, a pseudo cross-validation study is performed. We choose 10% of the spatial observation locations at random, and use the corresponding observations for each year to predict the values at the remaining locations. The accuracy of the four models is measured by the root mean square error (RMSE), the average continuous ranked probability score (CRPS), and the average log-score (LS). This procedure is repeated ten times, where in each iteration new locations are chosen at random to base the predictions on. The average scores for the ten iterations are shown in . Recall that low RMSE and CRPS values, respectively, high LS values correspond to good scores.

We observe that the predictive performance of the nonstationary nonfractional (β = 1) model is similar to the stationary fractional model in terms of CRPS, and actually worse in terms of RMSE. This clearly indicates that the data should be analyzed by a fractional model. Although the nonstationary fractional model has a better performance in terms of CRPS and LS than the stationary fractional model, the difference is quite small given that the nonstationary model has 38 parameters, compared to 4 for the stationary model. Thus, the fractional smoothness seems to be the most important aspect for this data. The fact that the rational SPDE approach allows us to make these comparisons and to verify the smoothness parameter, for stationary and nonstationary models, is one of its most important features.

8 Discussion

We have introduced the rational SPDE approach providing a new type of computationally efficient approximations for a class of Gaussian random fields. These are based on an extension of the SPDE approach by Lindgren, Rue, and Lindström (Citation2011) to models with general second-order differential operators of arbitrary order $β > d / 4$ . For these approximations, explicit rates of strong convergence have been derived and we have shown how to calibrate the degree of the rational approximation with the mesh size of the FEM to achieve these rates. The results can also be combined with the results in Bolin et al. (Citation2018b) to obtain explicit rates of weak convergence (convergence of functionals of the random field).

Our approach can, for example, be used to approximate stationary Matérn fields with general smoothness, and it is also directly applicable to more complicated nonstationary models, where the covariance function may be unknown. A general fractional order of the differential operator opens up for new applications of the SPDE approach, such as to Gaussian fields with exponential covariances on $R^{2}$ . For the Matérn model and its extensions, it furthermore facilitates likelihood-based (or Bayesian) inference of all model parameters. The specific structure of the approximation then in turn enables a combination with INLA or MCMC in situations where the Gaussian model is a part of a more complicated non-Gaussian hierarchical model.

We have illustrated the rational SPDE approach for stationary and nonstationary Matérn models. A topic for future research is to apply the approach to other random field models in statistics which are difficult to approximate by GMRFs, such as to models with long-range dependence (Lilly et al. Citation2017) based on the fractional Brownian motion. Another topic for future research is to modify the fractional SPDE approach by replacing the FEM basis by a multiresolution basis and to compare this approach to other multiresolution approaches such as (Katzfuss Citation2017). Finally, it is also of interest to extend the method to non-Gaussian versions of the SPDE-based Matérn models (Wallin and Bolin Citation2015), since the Markov approximation considered by Wallin and Bolin (Citation2015) is only computable under the restrictive requirement $β \in N$ .

Supplementary Materials

Appendices: The four appendices of the manuscript. (supplementary.pdf)

R code: Code for replicating the results of the application considered in Section 7. (code.zip)

Supplemental material

Supplemental Material

Download Zip (1.4 MB)

Acknowledgments

The authors thank Stig Larsson, Finn Lindgren, the editor, and the anonymous reviewer for valuable comments on the article, which led to an improvement of the results and the presentation.

Additional information

Funding

This work has been supported in part by the Swedish Research Council under grant no. 2016-04187 and the Knut and Alice Wallenberg Foundation (KAW 20012.0067).

References

Baker, G. A., Jr., and Graves-Morris, P. (1996), Padé Approximants, Encyclopedia of Mathematics and Its Applications (Vol. 59, 2nd ed.), Cambridge: Cambridge University Press.
Google Scholar
Bakka, H., Rue, H., Fuglstad, G.-A., Riebler, A., Bolin, D., Illian, J., Krainski, E., Simpson, D., and Lindgren, F. (2018), “Spatial Modelling With R-INLA: A Review,” Wiley Interdisciplinary Reviews: Computational Statistics 10, e1443. DOI: 10.1002/wics.1443.
Google Scholar
Bakka, H., Vanhatalo, J., Illian, J. B., Simpson, D., and Rue, H. (2019), “Non-Stationary Gaussian Models With Physical Barriers,” Spatial Statistics, 29, 268–288. DOI: 10.1016/j.spasta.2019.01.002.
Web of Science ®Google Scholar
Bolin, D. (2019), “rSPDE: Rational Approximations of Fractional SPDEs,” R Package Version 0.4.6, available at https://CRAN.R-project.org/package=rSPDE.
Google Scholar
Bolin, D., Kirchner, K., and Kovács, M. (2018a), “Numerical Solution of Fractional Elliptic Stochastic PDEs With Spatial White Noise,” IMA Journal of Numerical Analysis. DOI: 10.1093/imanum/dry091.
Web of Science ®Google Scholar
Bolin, D., Kirchner, K., and Kovács, M. (2018b), “Weak Convergence of Galerkin Approximations for Fractional Elliptic Stochastic PDEs With Spatial White Noise,” BIT Numerical Mathematics, 58, 881–906. DOI: 10.1007/s10543-018-0719-8.
Web of Science ®Google Scholar
Bonito, A., and Pasciak, J. E. (2015), “Numerical Approximation of Fractional Powers of Elliptic Operators,” Mathematics of Computation, 84, 2083–2110. DOI: 10.1090/S0025-5718-2015-02937-8.
Web of Science ®Google Scholar
Datta, A., Banerjee, S., Finley, A. O., and Gelfand, A. E. (2016), “Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets,” Journal of the American Statistical Association, 111, 800–812. DOI: 10.1080/01621459.2015.1044091.
PubMed Web of Science ®Google Scholar
Davies, E. B. (1995), Spectral Theory and Differential Operators, Cambridge Studies in Advanced Mathematics (Vol. 42), Cambridge: Cambridge University Press.
Google Scholar
Driscoll, T. A., Hale, N., and Trefethen, L. N., eds. (2014), Chebfun Guide, Oxford: Pafnuty Publications.
Google Scholar
Fuglstad, G.-A., Simpson, D., Lindgren, F., and Rue, H. (2015), “Does Non-Stationary Spatial Data Always Require Non-Stationary Random Fields?,” Spatial Statistics, 14, 505–531. DOI: 10.1016/j.spasta.2015.10.001.
Web of Science ®Google Scholar
Furrer, R., Genton, M. G., and Nychka, D. (2006), “Covariance Tapering for Interpolation of Large Spatial Datasets,” Journal of Computational and Graphical Statistics, 15, 502–523. DOI: 10.1198/106186006X132178.
Web of Science ®Google Scholar
Gavrilyuk, I. P., Hackbusch, W., and Khoromskij, B. N. (2004), “Data-Sparse Approximation to the Operator-Valued Functions of Elliptic Operator,” Mathematics of Computation, 73, 1297–1324. DOI: 10.1090/S0025-5718-03-01590-4.
Web of Science ®Google Scholar
Genton, M. G., and Kleiber, W. (2015), “Cross-Covariance Functions for Multivariate Geostatistics,” Statistical Science, 30, 147–163. DOI: 10.1214/14-STS487.
Web of Science ®Google Scholar
Grisvard, P. (2011), Elliptic Problems in Nonsmooth Domains, Classics in Applied Mathematics (Vol. 69), Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM).
Google Scholar
Harizanov, S., Lazarov, R., Margenov, S., Marinov, P., and Vutov, Y. (2018), “Optimal Solvers for Linear Systems With Fractional Powers of Sparse SPD Matrices,” Numerical Linear Algebra With Applications, 25, e2167. DOI: 10.1002/nla.2167.
Web of Science ®Google Scholar
Heaton, M. J., Datta, A., Finley, A. O., Furrer, R., Guinness, J., Guhaniyogi, R., Gerber, F., Gramacy, R. B., Hammerling, D., Katzfuss, M., Lindgren, F., Nychka, D. W., Sun, F., and Zammit-Mangion, A. (2019), “A Case Study Competition Among Methods for Analyzing Large Spatial Data,” Journal of Agricultural, Biological and Environmental Statistics, 24, 398–425. DOI: 10.1007/s13253-018-00348-w.
PubMed Web of Science ®Google Scholar
Jin, B., Lazarov, R., Pasciak, J., and Rundell, W. (2015), “Variational Formulation of Problems Involving Fractional Order Differential Operators,” Mathematics of Computation, 84, 2665–2700. DOI: 10.1090/mcom/2960.
Web of Science ®Google Scholar
Katzfuss, M. (2017), “A Multi-Resolution Approximation for Massive Spatial Datasets,” Journal of the American Statistical Association, 112, 201–214. DOI: 10.1080/01621459.2015.1123632.
Web of Science ®Google Scholar
Lilly, J. M., Sykulski, A. M., Early, J. J., and Olhede, S. C. (2017), “Fractional Brownian Motion, the Matérn Process, and Stochastic Modeling of Turbulent Dispersion,” Nonlinear Processes in Geophysics, 24, 481–514. DOI: 10.5194/npg-24-481-2017.
Web of Science ®Google Scholar
Lindgren, F., and Rue, H. (2015), “Bayesian Spatial Modelling With R-INLA,” Journal of Statistical Software, 63, 1–25. DOI: 10.18637/jss.v063.i19.
Web of Science ®Google Scholar
Lindgren, F., Rue, H., and Lindström, J. (2011), “An Explicit Link Between Gaussian Fields and Gaussian Markov Random Fields: The Stochastic Partial Differential Equation Approach” (with discussion), Journal of the Royal Statistical Society, Series B, 73, 423–498. DOI: 10.1111/j.1467-9868.2011.00777.x.
Google Scholar
Matérn, B. (1960), “Spatial Variation: Stochastic Models and Their Application to Some Problems in Forest Surveys and Other Sampling Investigations,” Meddelanden Från Statens Skogsforskningsinstitut, 49, 144.
Google Scholar
Mearns, L. O., Gutowski, W., Jones, R., Leung, R., McGinnis, S., Nunes, A., and Qian, Y. (2009), “A Regional Climate Change Assessment Program for North America,” Eos, Transactions American Geophysical Union, 90, 311–311. DOI: 10.1029/2009EO360002.
Google Scholar
Mearns, L. O., McGinnis, S., Arritt, R., Biner, S., Duffy, P., Gutowski, W., Held, I., Jones, R., Leung, R., Nunes, A., Snyder, M., Caya, D., Correia, J., Flory, D., Herzmann, D., Laprise, R., Moufouma-Okia, W., Takle, G., Teng, H., Thompson, J., Tucker, S., Wyman, B., Anitha, A., Buja, L., Macintosh, C., McDaniel, L., O’Brien, T., Qian, Y., Sloan, L., Strand, G., and Zoellick, C. (2007, updated 2014), The North American Regional Climate Change Assessment Program Dataset, Boulder, CO: National Center for Atmospheric Research Climate Data Gateway Data Portal.
Google Scholar
Nochetto, R. H., Otárola, E., and Salgado, A. J. (2015), “A PDE Approach to Fractional Diffusion in General Domains: A Priori Error Analysis,” Foundations of Computational Mathematics, 15, 733–791. DOI: 10.1007/s10208-014-9208-x.
Web of Science ®Google Scholar
Nychka, D., Bandyopadhyay, S., Hammerling, D., Lindgren, F., and Sain, S. (2015), “A Multiresolution Gaussian Process Model for the Analysis of Large Spatial Datasets,” Journal of Computational and Graphical Statistics, 24, 579–599. DOI: 10.1080/10618600.2014.914946.
Web of Science ®Google Scholar
R Core Team (2017), R: A Language and Environment for Statistical Computing, Vienna, Austria: R Foundation for Statistical Computing.
Google Scholar
Rasmussen, C. E., and Williams, C. K. I. (2006), Gaussian Processes for Machine Learning, Adaptive Computation and Machine Learning, Cambridge, MA: MIT Press.
Google Scholar
Remez, E. Y. (1934), “Sur la Détermination des Polynômes d’Approximation de Degré Donnée,” Communications de la Societé Mathématique de Kharkov, 10, 41–63.
Google Scholar
Roininen, L., Lasanen, S., Orispää, M., and Särkkä, S. (2018), “Sparse Approximations of Fractional Matérn Fields,” Scandinavian Journal of Statistics, 45, 194–216. DOI: 10.1111/sjos.12297.
Web of Science ®Google Scholar
Rozanov, J. A. (1977), “Markov Random Fields and Stochastic Partial Differential Equations,” Sbornik: Mathematics 32, 515–534. DOI: 10.1070/SM1977v032n04ABEH002404.
Web of Science ®Google Scholar
Rue, H., and Held, L. (2005), Gaussian Markov Random Fields: Theory and Applications, Monographs on Statistics and Applied Probability (Vol. 104), Boca Raton, FL: Chapman & Hall/CRC.
Google Scholar
Rue, H., Martino, S., and Chopin, N. (2009), “Approximate Bayesian Inference for Latent Gaussian Models by Using Integrated Nested Laplace Approximations” (with discussion), Journal of the Royal Statistical Society, Series B, 71, 319–392. DOI: 10.1111/j.1467-9868.2008.00700.x.
Web of Science ®Google Scholar
Sang, H., and Huang, J. Z. (2012), “A Full Scale Approximation of Covariance Functions for Large Spatial Data Sets,” Journal of the Royal Statistical Society, Series B, 74, 111–132. DOI: 10.1111/j.1467-9868.2011.01007.x.
Google Scholar
Stein, M. L. (1999), Interpolation of spatial data: Some Theory for Kriging, Springer Series in Statistics, New York: Springer-Verlag.
Google Scholar
Wallin, J., and Bolin, D. (2015), “Geostatistical Modelling Using Non-Gaussian Matérn Fields,” Scandinavian Journal of Statistics, 42, 872–890. DOI: 10.1111/sjos.12141.
Web of Science ®Google Scholar
Whittle, P. (1963), “Stochastic Processes in Several Dimensions,” Bulletin of the International Statistical Institute, 40, 974–994.
Google Scholar
Zhang, H. (2004), “Inconsistent Estimation and Asymptotically Equal Interpolations in Model-Based Geostatistics,” Journal of the American Statistical Association, 99, 250–261. DOI: 10.1198/016214504000000241.
Web of Science ®Google Scholar

The Rational SPDE Approach for Gaussian Random Fields With General Smoothness

Abstract

1 Introduction

2 The SPDE Approach in the Fractional Case Until Now