ABSTRACT
General prediction formulas involving Hermite polynomials are developed for time series expressed as a transformation of a Gaussian process. The prediction gains over linear predictors are examined numerically, demonstrating the improvement of nonlinear prediction.
1. Introduction
The general prediction problem is to compute the best predictor of a random variable $Y$ given a data vector $\mathbf{X}$, where a joint distribution is presumed to exist for $Y$ and $\mathbf{X}$. If we define ‘best’ according to mean squared error (MSE) loss,Footnote1 the best predictor (when the random variables are square integrable) is the conditional expectation $E[Y \mid \mathbf{X}]$, which in the case of Gaussian variables is a linear function of $\mathbf{X}$. This linear function is completely computable in terms of first and second moments of the joint vector $(Y, \mathbf{X}')'$, as discussed in Brockwell and Davis (Citation2013, Chapter 2). The problem can also be generalised to projection on infinite data sets, which arise in forecasting and signal extraction problems.
The theory for linear predictors is very well understood and is commonly applied to non-Gaussian data because it is simple to compute. Nevertheless, there can be a substantial predictive loss when non-Gaussian features are present in the data, such as asymmetry and excess kurtosis (Brockett et al., Citation1988; Maravall, Citation1983). A common technique for handling such raw time series data is to apply a transformation that reduces asymmetry and kurtosis, thereby generating cumulants in the transformed space that more closely resemble Gaussian cumulants. Box-Cox transforms are an example of such functions, and are typically identified through exploratory analysis or via metadata; see discussion in McElroy (Citation2016).
This paper provides exact formulas for non-linear prediction in scenarios where the non-Gaussian data process can be expressed as a univariate transformation of some Gaussian process. For some special cases, such as the log-normal distribution, exact formulas are already available for nonlinear predictors; here the general case is developed. The main result of the paper (Section 2) is an expansion of the conditional expectation in terms of Hermite coefficients of the transformation function – an idea that was utilised in Janicki and McElroy (Citation2016) to model marginal quantiles. Here this technique is used to derive analytical expressions for predictors, along with the MSE; the solution is given as an explicit function of the Hermite coefficients of the transformation function, the various Hermite polynomials evaluated at the linear predictor, and further weights explicitly determined from the mean squared error of the linear predictor.
These results are general, in the sense that they can be applied to diverse contexts in statistics, such as linear models, spatial statistics, and multivariate analysis. But our applications are focussed on time series, and in particular on forecasting problems. In forecasting, the data vector $\mathbf{X} = (Y_1, \ldots, Y_T)'$ is a sample of size $T$ from a time series $\{Y_t\}$, and $Y = Y_{T+1}$ represents the next unobserved value of the process. Backcasting involves setting $Y = Y_0$, and missing value problems can similarly be addressed by letting $Y = Y_t$ for some $1 \le t \le T$ and $\mathbf{X} = (Y_1, \ldots, Y_{t-1}, Y_{t+1}, \ldots, Y_T)'$; see McElroy and McCracken (Citation2017) for a recent treatment. These facets of the general methodology are developed in Section 2, and numerical comparisons are given in Section 3. The proofs are in an Appendix.
2. Nonlinear prediction
Our goal is to compute the minimal mean squared error (MSE) estimate of $Y$ given data $Y_1, \ldots, Y_T$ (finite-sample case) or $\{Y_s : s \le T\}$ (semi-infinite sample case). This estimate is the conditional expectation $E[Y \mid \text{data}]$, denoted $\widehat{Y}$ for short. We presume that there exists an invertible function $f$ such that $X_t = f(Y_t)$ yields a Gaussian process $\{X_t\}$. Moreover, $X = f(Y)$ is a Gaussian random variable whose joint distribution with the process $\{X_t\}$ is known. In the case of forecasting, backcasting, or missing value imputation, $Y = Y_t$ for some $t$.
In the following development, it is important to impose that $X$ be standard normal, although this would rarely be the case if $f$ is obtained by exploratory analysis. (For example, if $Y$ is positive and $f = \log$ by exploratory analysis, it would rarely be the case that $\log Y$ would have unit variance.) Let $Z$ denote the standardisation of $X$, i.e., $Z = (X - \mu)/\sigma$, where $\mu$ and $\sigma^2$ are the mean and variance of $X$. We will define $g$ as the inverse map from $Z$ to $Y$, so that
$Y = g(Z) = f^{-1}(\mu + \sigma Z).$
The mapping $g$ allows us to obtain a Hermite expansion; if the marginal distribution of $Z$ were non-Gaussian, we could instead have recourse to the Appell polynomials (Varma, Citation1951).
The main result of the paper is an expression for $E[Y \mid \text{data}]$ in terms of $\widehat{Z} = E[Z \mid \text{data}]$, with the data expressed on the transformed scale via $Z_t = (X_t - \mu)/\sigma$. This is of interest because such a Gaussian conditional expectation has a well-known linear formula; see Chapter 4 of McElroy and Politis (Citation2020). In applications, one might transform the data by applying $f$, then model the Gaussian process, and finally compute $\widehat{Z}$ by plugging into the linear formulas. Then the main formula of this paper can be used to obtain $\widehat{Y}$ by inserting $\widehat{Z}$ and its MSE, denoted by $V$, which would also be available in applications. In particular,
(1) $V = E[(Z - \widehat{Z})^2],$
and $\widehat{Z}$ is given by formulas in McElroy and Politis (Citation2020); also, see (11) below. We define the space $L^2(\Phi)$ (where $\Phi$ is the standard normal cumulative distribution function) as all functions that are square integrable with respect to the measure $\Phi$. An inner product on this space is defined via $\langle g, h \rangle = \int g(x)\, h(x)\, \phi(x)\, dx$, where $\phi$ is the standard normal probability density function. Hence, we can also write $\langle g, h \rangle = E[g(Z)\, h(Z)]$ for $Z \sim \mathcal{N}(0,1)$. It follows that we can do a Hermite expansion on $g$ (see Janicki & McElroy, Citation2016 for background):
(2) $g(x) = \sum_{k \ge 0} g_k\, H_k(x),$
with $H_k$ the normalised Hermite polynomials (Roman, Citation1984), and $g_k = \langle g, H_k \rangle$ the Hermite coefficients. The Hermite polynomials are defined via $H_k(x) = (-1)^k\, (k!)^{-1/2}\, \phi(x)^{-1}\, \frac{d^k}{dx^k}\phi(x)$ for $k \ge 0$, and form a complete orthonormal system. (Hence, the coefficients $g_k$ tend to zero as $k \to \infty$.) Plugging $Z$ into (2) yields $Y = g(Z) = \sum_{k \ge 0} g_k H_k(Z)$, and applying the conditional expectation operator (which is linear) yields
(3) $\widehat{Y} = \sum_{k \ge 0} g_k\, E[H_k(Z) \mid \text{data}].$
Evidently, the nonlinear predictor can be computed in terms of conditional expectations of $H_k(Z)$, and the formula is summarised in our main theorem below. The proof relies upon the Hermite generating function
(4) $\sum_{k \ge 0} H_k(x)\, \frac{t^k}{\sqrt{k!}} = \sum_{k \ge 0} \mathrm{He}_k(x)\, \frac{t^k}{k!} = \exp\{tx - t^2/2\},$
where $\mathrm{He}_k = \sqrt{k!}\, H_k$ denotes the unnormalised Hermite polynomial. (The second equality follows from the definition of the Hermite polynomials, and is discussed in Roman (Citation1984).) From (4) we see that
(5) $\mathrm{He}_k(x + y) = \sum_{\ell=0}^{k} \binom{k}{\ell}\, \mathrm{He}_\ell(x)\, y^{k-\ell},$
because $\exp\{t(x+y) - t^2/2\} = \exp\{tx - t^2/2\}\, \exp\{ty\}$. Therefore, writing $Z = \widehat{Z} + \delta$ with $\delta$ independent of the data, $E[\mathrm{He}_k(Z) \mid \text{data}] = \sum_{\ell=0}^{k} \binom{k}{\ell}\, \mathrm{He}_\ell(\widehat{Z})\, E[\delta^{k-\ell}]$. We next discuss the optimal predictor.
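The expansion (5) and the conditional moment identity above can be checked numerically. Below is a minimal Python sketch using only the standard library; the helper names (`hermite_e`, `gaussian_moment`) and the particular values of $k$, $V$, $\widehat{Z}$ are our own illustrative choices, not from the paper.

```python
import math
import random

def hermite_e(k, x):
    """Unnormalised (probabilists') Hermite polynomial He_k(x),
    via the recursion He_{k+1}(x) = x He_k(x) - k He_{k-1}(x)."""
    h_prev, h_curr = 1.0, x
    if k == 0:
        return h_prev
    for j in range(1, k):
        h_prev, h_curr = h_curr, x * h_curr - j * h_prev
    return h_curr

def gaussian_moment(ell, V):
    """ell-th moment of a N(0, V) variable: zero for odd ell,
    V^{ell/2} (ell - 1)!! for even ell (and 1 for ell = 0)."""
    if ell % 2 == 1:
        return 0.0
    return V ** (ell // 2) * math.prod(range(ell - 1, 0, -2))

random.seed(0)
k, V, zhat = 3, 0.4, 0.5

# right-hand side: sum_l C(k,l) He_l(zhat) E[delta^{k-l}]
formula = sum(math.comb(k, l) * hermite_e(l, zhat) * gaussian_moment(k - l, V)
              for l in range(k + 1))

# left-hand side: Monte Carlo average of He_k(zhat + delta), delta ~ N(0, V)
n = 200_000
mc = sum(hermite_e(k, zhat + random.gauss(0.0, math.sqrt(V)))
         for _ in range(n)) / n
```

For $k = 3$ the binomial formula collapses to $\mathrm{He}_3(\widehat{Z}) + 3V\widehat{Z}$, which the Monte Carlo average reproduces to sampling error.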
Theorem 2.1
Suppose that $Z$ is standard normal, $Y = g(Z)$ with $g$ given by (2), and $\widehat{Z}$ is the linear prediction in the transformed space, with MSE given by $V$ in (1). Then the MSE optimal nonlinear predictor $\widehat{Y}$ of $Y$ given the data is
(6) $\widehat{Y} = \sum_{k \ge 0} g_k\, (k!)^{-1/2} \sum_{\ell=0}^{k} \binom{k}{\ell}\, \mathrm{He}_\ell(\widehat{Z})\, m_{k-\ell}(V),$
where $m_\ell(V)$ is the $\ell$th moment of a Gaussian variable of variance $V$, i.e., $m_\ell(V)$ equals 1 if $\ell = 0$, equals zero if $\ell$ is odd, and equals $V^{\ell/2}(\ell-1)!!$ if $\ell$ is even. With $W = 1 - V$, the optimal prediction MSE is
(7) $E[(Y - \widehat{Y})^2] = \sum_{k \ge 1} g_k^2\, (1 - W^k).$
The optimal predictor (6) can be explicitly computed to any desired level of accuracy, by truncating the first summation over $k$ at a desired level. Note that this result applies to finite samples, semi-infinite samples, and bi-infinite samples, allowing us to apply (6) to forecasting (from an infinite past) and missing value problems.
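As a sketch of how (6) and (7) might be coded (standard-library Python; function names are ours): in the affine case $g(z) = \mu + \sigma z$ of Example 2.1 below, the routine reduces exactly to the linear predictor $\mu + \sigma\widehat{Z}$ with MSE $\sigma^2 V$, which provides a convenient check.

```python
import math

def hermite_e(k, x):
    """Unnormalised Hermite polynomial He_k(x) by recursion."""
    h_prev, h_curr = 1.0, x
    if k == 0:
        return h_prev
    for j in range(1, k):
        h_prev, h_curr = h_curr, x * h_curr - j * h_prev
    return h_curr

def gaussian_moment(ell, V):
    """m_ell(V): ell-th moment of N(0, V)."""
    if ell % 2 == 1:
        return 0.0
    return V ** (ell // 2) * math.prod(range(ell - 1, 0, -2))

def predictor(coeffs, zhat, V):
    """Truncated version of (6); coeffs[k] is the Hermite coefficient g_k."""
    total = 0.0
    for k, gk in enumerate(coeffs):
        inner = sum(math.comb(k, l) * hermite_e(l, zhat) * gaussian_moment(k - l, V)
                    for l in range(k + 1))
        total += gk / math.sqrt(math.factorial(k)) * inner
    return total

def prediction_mse(coeffs, V):
    """Truncated version of (7), with W = 1 - V."""
    W = 1.0 - V
    return sum(gk ** 2 * (1.0 - W ** k) for k, gk in enumerate(coeffs) if k >= 1)

# affine sanity check: g(z) = mu + sigma z should give the linear predictor back
mu, sigma, V, zhat = 2.0, 1.5, 0.3, 0.7
yhat = predictor([mu, sigma], zhat, V)
mse = prediction_mse([mu, sigma], V)
```

The truncation level is simply the length of `coeffs`; for non-polynomial transforms one would pass a longer coefficient list.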
Remark 2.1
Since $\langle g, g \rangle = \sum_{k \ge 0} g_k^2$, we see that the Hermite coefficients are square summable if $g \in L^2(\Phi)$, and the MSE is well-defined, being bounded above by $\sum_{k \ge 1} g_k^2 = \mathrm{Var}(Y)$. (Note that $0 \le V \le 1$ because $Z$ has variance one.) Because $g \in L^2(\Phi)$, we see that $g(Z)$ is square integrable, and hence the formula (6) converges.
Remark 2.2
In applications, we typically will know $f$ rather than $g$, but the latter can be obtained from the former by applying the normalising transform described above, i.e., $g(z) = f^{-1}(\mu + \sigma z)$.
Remark 2.3
To apply Theorem 2.1 we must obtain the Hermite coefficients $g_k$. Either $g$ is known (obtained by exploratory analysis) or estimated (see Janicki & McElroy, Citation2016), and then the Hermite coefficients can be obtained via Monte Carlo simulation:
$g_k \approx n^{-1} \sum_{i=1}^{n} g(Z_i)\, H_k(Z_i)$
for $Z_1, \ldots, Z_n$ i.i.d. standard normal. Another approach to computation involves the generating function:
$g_k = \langle g, H_k \rangle = (k!)^{-1/2}\, E[g^{(k)}(Z)].$
The last expression denotes the $k$-fold derivative of the function $g$. In cases where $g$ is explicitly known, this can be an easier approach to getting the Hermite coefficients.
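The two computations in Remark 2.3 can be compared directly. A standard-library Python sketch (helper names are ours), using $g = \exp$, for which the derivative formula gives $g_k = e^{1/2}/\sqrt{k!}$ exactly:

```python
import math
import random

def hermite_e(k, x):
    """Unnormalised Hermite polynomial He_k(x) by recursion."""
    h_prev, h_curr = 1.0, x
    if k == 0:
        return h_prev
    for j in range(1, k):
        h_prev, h_curr = h_curr, x * h_curr - j * h_prev
    return h_curr

def coeff_monte_carlo(g, k, draws):
    """g_k = E[g(Z) H_k(Z)] with H_k = He_k / sqrt(k!), estimated by averaging."""
    s = sum(g(z) * hermite_e(k, z) for z in draws)
    return s / (len(draws) * math.sqrt(math.factorial(k)))

random.seed(1)
draws = [random.gauss(0.0, 1.0) for _ in range(300_000)]

mc = [coeff_monte_carlo(math.exp, k, draws) for k in range(3)]
exact = [math.exp(0.5) / math.sqrt(math.factorial(k)) for k in range(3)]
```

The Monte Carlo estimates converge at the usual $n^{-1/2}$ rate, with variance that grows with $k$, so the derivative formula is preferable when $g$ has tractable derivatives.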
In the following examples we suppose that $\widehat{Z}$ and $V$ are known and available to the practitioner; we present various cases of transforms $f$, and apply the results of Theorem 2.1.
Example 2.1
Gaussian
A simple affine transformation ensures that $Y$ is still Gaussian, with mean $\mu$ and variance $\sigma^2$. In this case $g(z) = \mu + \sigma z$, so $g_0 = \mu$, $g_1 = \sigma$, and $g_k = 0$ for $k \ge 2$, and we more simply have $\widehat{Y} = \mu + \sigma\widehat{Z}$.
Example 2.2
Lognormal
Suppose $Y = \exp\{Z\}$, i.e., $g = \exp$; applying the generating function method of Remark 2.3, we obtain $g_k = e^{1/2}\,(k!)^{-1/2}$ for all $k \ge 0$. This can be utilised in (3), together with (5), to yield
$\widehat{Y} = \exp\{\widehat{Z} + V/2\},$
which corresponds to the result of McElroy (Citation2010). Applying Theorem 2.1 to compute the optimal MSE, we see that
$E[(Y - \widehat{Y})^2] = \sum_{k \ge 1} \frac{e}{k!}\,(1 - W^k) = e^2 - e^{2 - V}.$
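The lognormal predictor $\exp\{\widehat{Z} + V/2\}$ and MSE $e^2 - e^{2-V}$ admit closed forms, which provide a check on the truncated series (6) and (7). A Python sketch (standard library; helper names and the values of $\widehat{Z}$, $V$, and the truncation level are our own choices):

```python
import math

def hermite_e(k, x):
    """Unnormalised Hermite polynomial He_k(x) by recursion."""
    h_prev, h_curr = 1.0, x
    if k == 0:
        return h_prev
    for j in range(1, k):
        h_prev, h_curr = h_curr, x * h_curr - j * h_prev
    return h_curr

def gaussian_moment(ell, V):
    """m_ell(V): ell-th moment of N(0, V)."""
    if ell % 2 == 1:
        return 0.0
    return V ** (ell // 2) * math.prod(range(ell - 1, 0, -2))

zhat, V, K = 0.3, 0.25, 30            # K is the truncation level
g = [math.exp(0.5) / math.sqrt(math.factorial(k)) for k in range(K)]

# truncated (6)
yhat = sum(g[k] / math.sqrt(math.factorial(k))
           * sum(math.comb(k, l) * hermite_e(l, zhat) * gaussian_moment(k - l, V)
                 for l in range(k + 1))
           for k in range(K))

# truncated (7), with W = 1 - V
W = 1.0 - V
mse = sum(g[k] ** 2 * (1.0 - W ** k) for k in range(1, K))
```

Both truncated sums agree with the closed forms to high accuracy, since the terms decay factorially in $k$.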
Example 2.3
Uniform
For $\{Z_t\}$ with standard normal marginal, set $Y_t = \Phi(Z_t)$, so that $Y_t$ has a marginal distribution that is uniform on $[0,1]$. Then by Remark 2.3, we find that $g_k = (k!)^{-1/2}\, E[\Phi^{(k)}(Z)] = (k!)^{-1/2}\, E[\phi^{(k-1)}(Z)]$ for $k \ge 1$, and hence
$\widehat{Y} = E[\Phi(\widehat{Z} + \delta) \mid \text{data}] = \Phi\big(\widehat{Z}/\sqrt{1 + V}\big),$
where $\delta = Z - \widehat{Z} \sim \mathcal{N}(0, V)$ is independent of the data.
Example 2.4
Logistic
Consider a logistic transform given by $g(z) = (1 + e^{-z})^{-1}$. The first few derivatives of $g$ are
$g' = g(1 - g), \quad g'' = g(1 - g)(1 - 2g), \quad g''' = g(1 - g)(1 - 6g + 6g^2).$
By Monte Carlo, we obtain the leading Hermite coefficients; the symmetry $g(-z) = 1 - g(z)$ forces $g_0 = 1/2$ and $g_2 = 0$, while $g_1$ and $g_3$ must be computed numerically.
Example 2.5
Square
With $Y = Z^2$, we have $g_0 = 1$, $g_1 = 0$, $g_2 = \sqrt{2}$, and $g_k = 0$ for $k > 2$. Then the optimal nonlinear predictor is $\widehat{Y} = \widehat{Z}^2 + V$. It is simple to check that the error is
$Y - \widehat{Y} = 2\widehat{Z}\delta + (\delta^2 - V),$
which has mean zero, since $\delta$ has mean zero and is independent of all functions of the data. The prediction MSE is $2(1 - W^2) = 4V - 2V^2$.
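The error decomposition above can be verified by simulation. The following standard-library Python sketch uses our own illustrative setup, $\widehat{Z} \sim \mathcal{N}(0, 1-V)$ and $\delta \sim \mathcal{N}(0, V)$, and checks that the error has mean zero and second moment $4V - 2V^2$.

```python
import math
import random

random.seed(2)
V = 0.3
n = 400_000

mean_err = 0.0
mean_sq = 0.0
for _ in range(n):
    zhat = random.gauss(0.0, math.sqrt(1.0 - V))   # linear predictor, variance 1 - V
    delta = random.gauss(0.0, math.sqrt(V))        # prediction error, variance V
    err = (zhat + delta) ** 2 - (zhat ** 2 + V)    # Y - Yhat for the square transform
    mean_err += err / n
    mean_sq += err ** 2 / n
```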
3. Comparing linear and nonlinear prediction
It is of interest to understand how much benefit nonlinear prediction provides. Clearly, if $g$ is affine (Example 2.1) then the minimal MSE is equal to $\sigma^2 V$, the same as for linear prediction, but we can expect gains to the degree that $g$ differs from the affine case.
Remark 3.1
A related nonlinear predictor that is sometimes used in applications is defined via $\widetilde{Y} = g(\widehat{Z})$, but unfortunately this estimator can be biased. Following the same arguments used in the proof of Theorem 2.1,
$Y - \widetilde{Y} = \sum_{k \ge 0} g_k\, \big( H_k(Z) - H_k(\widehat{Z}) \big),$
so that the expectation of the quantity in parentheses is
$E[H_k(Z)] - E[H_k(\widehat{Z})],$
which is nonzero for even $k \ge 2$, because $\widehat{Z}$ has variance $1 - V$ rather than unit variance. Hence there is no guarantee that the bias is zero.
More properly, a comparison can be made to the best linear predictor. When $\{Z_t\}$ is a strictly stationary time series, then $\{Y_t\}$ is as well, and we can in some cases determine the best linear estimator's MSE for comparison. Note that in this special case, the mean and variance of each variable $Y_t$ is the same, and hence without any loss of generality we may assume that $\{Z_t\}$ is standardised, i.e., each $Z_t$ is standard normal. Because $Y_t = g(Z_t)$, it follows from Taniguchi and Kakizawa (Citation2000, p. 319) that $E[Y_t] = g_0$ and
(8) $\gamma_Y(h) = \sum_{k \ge 1} g_k^2\, \gamma_Z(h)^k,$
where $\gamma_Y$ and $\gamma_Z$ are the autocovariance sequences of $\{Y_t\}$ and $\{Z_t\}$, respectively. So in principle we can understand the second order structure of $\{Y_t\}$ in terms of the Hermite coefficients and the original autocovariances.
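For the square transform of Example 2.5, relation (8) reduces to $\gamma_Y(h) = 2\gamma_Z(h)^2$, which is easy to confirm by simulating correlated Gaussian pairs (standard-library Python sketch; the correlation value is our choice):

```python
import math
import random

random.seed(3)
rho = 0.6            # assumed value of gamma_Z(h) for some lag h
n = 500_000

acc = 0.0
for _ in range(n):
    x = random.gauss(0.0, 1.0)
    y = rho * x + math.sqrt(1.0 - rho ** 2) * random.gauss(0.0, 1.0)
    acc += (x ** 2 - 1.0) * (y ** 2 - 1.0)   # centred, since E[Z^2] = 1

cov = acc / n        # Monte Carlo estimate of Cov(Z_s^2, Z_t^2)
```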
Suppose further that we are interested in one-step ahead forecasting from a sample of size $T$. The best linear predictor is obtained by solving the Yule-Walker equations in $\gamma_Y$, and the MSE of such is given by
(9) $\gamma_Y(0) - \underline{\gamma}_Y'\, \Gamma_Y^{-1}\, \underline{\gamma}_Y,$
where $\Gamma_Y$ is the $T$-dimensional Toeplitz covariance matrix with $jk$th entry $\gamma_Y(j - k)$, and $\underline{\gamma}_Y = (\gamma_Y(1), \ldots, \gamma_Y(T))'$. We know that such an MSE must be greater than the minimal MSE provided in Theorem 2.1, with equality occurring only in the case that a linear estimator is globally optimal (e.g., the time series is linear, or is Gaussian). If instead we are forecasting from an infinite past, then the MSE of the linear predictor is the innovation variance $\sigma_Y^2$ given by Kolmogorov's formula:
(10) $\sigma_Y^2 = \exp\Big\{ (2\pi)^{-1} \int_{-\pi}^{\pi} \log f_Y(\lambda)\, d\lambda \Big\}.$
Here, $f_Y$ is the spectral density of $\{Y_t\}$. Therefore, for either a finite sample or for an infinite past, we can determine the linear predictor MSE for a stationary process by first computing $\gamma_Y$ from $\gamma_Z$ via (8), followed by application of (9) or (10) as each case requires. As for the best (nonlinear) predictor, its MSE is given by (7) of Theorem 2.1, where
(11) $V = \gamma_Z(0) - \underline{\gamma}_Z'\, \Gamma_Z^{-1}\, \underline{\gamma}_Z$
is the analogue of (9) for the $\{Z_t\}$ process.
We provide an illustration in the case of an MA(1) process with various values of $\theta$, and sample size $T = 100$. The innovation variance is set equal to $(1 + \theta^2)^{-1}$ so that $\gamma_Z(0) = 1$, as required by the above discussion. For the MA(1) process with $T = 100$, the value of $V$ given by (11) is the same (up to the fourth decimal place) as the innovation variance $(1 + \theta^2)^{-1}$. For transformations, we study the square $g(z) = z^2$, the exponential $g(z) = e^z$, and the logistic. Observe that from (8) the process $\{Y_t\}$ will be $m$-dependent if $\{Z_t\}$ is (although the converse need not be true). Hence, if we obtained a sample from $\{Y_t\}$ it would likely be identified with an MA($q$) model, and the parameter estimates (e.g., obtained using a Whittle likelihood, which is valid for non-Gaussian processes so long as the cumulants are summable) would likely converge to those corresponding to the spectral factorisation of $f_Y$. Thus, our illustration provides an accurate rendition of the prediction MSE one would obtain in the case of linear or nonlinear predictors, only with the impact of parameter estimation error completely removed (Tables 1–3).
Table 1. MSE for linear and non-linear predictors applied to a squared MA(1) process of parameter θ.
Table 2. MSE for linear and non-linear predictors applied to an exponential MA(1) process of parameter θ.
Table 3. MSE for linear and non-linear predictors applied to a logistic MA(1) process of parameter θ.
In each case, the degree of benefit to nonlinear prediction increases with $\theta$, as is to be expected; however, there are large discrepancies between the three functions. Even at the largest values of $\theta$, the logistic transformation offers only a marginal improvement with nonlinear prediction, indicating that linear prediction is almost just as good as the conditional expectation. The analogous improvements for the exponential and square transformations (Tables 1–3) are considerably larger, indicating some real benefit to nonlinear prediction.
We end with a remark on how a confidence interval can be constructed using simulations. Let the formula given by (6) be denoted $\widehat{Y} = h(\widehat{Z})$. Then the optimal prediction error can be written
$Y - \widehat{Y} = g(\widehat{Z} + \delta) - h(\widehat{Z}),$
where $\delta$ is the (linear) prediction error for the Gaussian variable, and is uncorrelated with (and hence independent of) $\widehat{Z}$. Also $\delta \sim \mathcal{N}(0, V)$. Therefore, in cases where it is easy to simulate $\delta$ (e.g., suppose we are forecasting from a Gaussian ARMA process) we can independently draw $\delta$ and compute $\varepsilon = g(\widehat{Z} + \delta) - h(\widehat{Z})$ for repeated Monte Carlo draws, thereby obtaining a confidence interval for $Y$.
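A minimal simulation of this interval construction, for the square transform of Example 2.5 (standard-library Python; the values of $\widehat{Z}$ and $V$ are our own illustrative choices):

```python
import math
import random

random.seed(4)
V, zhat = 0.3, 0.8

def g(z):                 # square transform (Example 2.5)
    return z * z

def h(z):                 # optimal predictor for the square case, from (6)
    return z * z + V

# simulate the prediction error eps = g(zhat + delta) - h(zhat), delta ~ N(0, V)
n = 100_000
eps = sorted(g(zhat + random.gauss(0.0, math.sqrt(V))) - h(zhat) for _ in range(n))
lo, hi = eps[int(0.025 * n)], eps[int(0.975 * n)]
interval = (h(zhat) + lo, h(zhat) + hi)

# check coverage on fresh draws of Y = g(zhat + delta)
m = 20_000
cover = sum(interval[0] <= g(zhat + random.gauss(0.0, math.sqrt(V))) <= interval[1]
            for _ in range(m)) / m
```

By construction the interval has approximately 95% coverage, which the fresh draws confirm.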
Disclaimer
This report is released to inform interested parties of research and to encourage discussion. The views expressed on statistical issues are those of the authors and not those of the U.S. Census Bureau.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Additional information
Notes on contributors
Tucker McElroy
Tucker McElroy is Senior Time Series Mathematical Statistician at the U.S. Census Bureau.
Srinjoy Das
Srinjoy Das is a Postdoctoral Scholar in the Mathematics department at the University of California, San Diego.
Notes
1 Other loss functions could of course be envisioned (mean absolute loss yields the conditional median, for example).
References
- Brockett, P. L., Hinich, M. J., & Patterson, D. (1988). Bispectral-based tests for the detection of Gaussianity and linearity in time series. Journal of the American Statistical Association, 83(403), 657–664. https://doi.org/10.1080/01621459.1988.10478645
- Brockwell, P. J., & Davis, R. A. (2013). Time series: Theory and methods. Springer Science & Business Media.
- Janicki, R., & McElroy, T. (2016). Hermite expansion and estimation of monotonic transformations of Gaussian data. Journal of Nonparametric Statistics, 28(1), 207–234. https://doi.org/10.1080/10485252.2016.1139880
- Maravall, A. (1983). An application of nonlinear time series forecasting. Journal of Business & Economic Statistics, 1(1), 66–74. https://doi.org/10.1080/07350015.1983.10509325
- McElroy, T. (2010). A nonlinear algorithm for seasonal adjustment in multiplicative component decompositions. Studies in Nonlinear Dynamics and Econometrics, 14(4). Article 6. https://doi.org/10.2202/1558-3708.1756
- McElroy, T. (2016). On the measurement and treatment of extremes in time series. Extremes, 19(3), 467–490. https://doi.org/10.1007/s10687-016-0254-4
- McElroy, T., & McCracken, M. (2017). Multi-step ahead forecasting of vector time series. Econometric Reviews, 36(5), 495–513. https://doi.org/10.1080/07474938.2014.977088
- McElroy, T., & Politis, D. (2020). Time series: A first course with bootstrap starter. Chapman Hall.
- Roman, S. (1984). The umbral calculus. Academic Press.
- Taniguchi, M., & Kakizawa, Y. (2000). Asymptotic theory of statistical inference for time series. Springer.
- Varma, R. S. (1951). On Appell polynomials. Proceedings of the American Mathematical Society, 2(4), 593–596. https://doi.org/10.1090/S0002-9939-1951-0042547-5
Appendix
Proof of Theorem 2.1
From (3) and (5) we obtain
$\widehat{Y} = \sum_{k \ge 0} g_k\, (k!)^{-1/2}\, E[\mathrm{He}_k(Z) \mid \text{data}].$
We can write $Z = \widehat{Z} + \delta$, and using the property that $\delta$ is independent of all functions of the data, we obtain
$E[\mathrm{He}_k(Z) \mid \text{data}] = \sum_{\ell=0}^{k} \binom{k}{\ell}\, \mathrm{He}_\ell(\widehat{Z})\, E[\delta^{k-\ell}] = \sum_{\ell=0}^{k} \binom{k}{\ell}\, \mathrm{He}_\ell(\widehat{Z})\, m_{k-\ell}(V).$
Hence (6) follows. The prediction error is
$Y - \widehat{Y} = \sum_{k \ge 0} g_k\, \big( H_k(Z) - E[H_k(Z) \mid \text{data}] \big).$
Note that $\delta$ is orthogonal to all linear functions of the data; because this error is Gaussian, it is moreover independent of all functions of the data. It follows that $E[(Y - \widehat{Y})\, q(\text{data})] = 0$ for any square integrable function $q$ of the data, because
$E[(Y - \widehat{Y})\, q(\text{data})] = E\big[ q(\text{data})\, E[Y - \widehat{Y} \mid \text{data}] \big] = 0.$
Moreover, for any function $q$ of the data,
$E[(Y - q)^2] = E[(Y - \widehat{Y})^2] + E[(\widehat{Y} - q)^2] \ge E[(Y - \widehat{Y})^2].$
This verifies optimality. To compute the MSE, first observe that
$E[Y^2] = \sum_{k \ge 0} g_k^2,$
by the orthonormality of the $H_k$. Note that $E[\widehat{Z}^2] = 1 - V$, because $Z$ is standard normal and $Z = \widehat{Z} + \delta$ with the two summands orthogonal; hence $\widehat{Z}/\sqrt{W}$ is standard normal, where $W = 1 - V$. Using these facts and again using the independence property, we take the conditional expectation of the generating function (4) and obtain
$E\big[ \exp\{tZ - t^2/2\} \mid \text{data} \big] = \exp\{t\widehat{Z} - t^2 W/2\} = \exp\big\{ (t\sqrt{W})\,(\widehat{Z}/\sqrt{W}) - (t\sqrt{W})^2/2 \big\},$
so that matching coefficients of $t^k$ yields $E[H_k(Z) \mid \text{data}] = W^{k/2}\, H_k(\widehat{Z}/\sqrt{W})$. Since $\widehat{Z}/\sqrt{W}$ is standard normal and the $H_k$ are orthonormal,
$E[\widehat{Y}^2] = \sum_{k \ge 0} g_k^2\, W^k,$
and hence
$E[(Y - \widehat{Y})^2] = E[Y^2] - E[\widehat{Y}^2] = \sum_{k \ge 1} g_k^2\, (1 - W^k),$
which is (7).