Abstract
In this paper, we are interested in nonparametric kernel estimation of a generalised regression function based on incomplete sample copies of a continuous-time stationary and ergodic process
. The predictor X is valued in some infinite-dimensional space, whereas the real-valued process Y is observed when the Bernoulli process
and missing whenever
A uniform almost sure consistency rate, as well as the evaluation of the conditional bias and the asymptotic mean square error, is established. The asymptotic distribution of the estimator is provided, together with a discussion on its use in building asymptotic confidence intervals. To illustrate the performance of the proposed estimator, a first simulation is performed to compare the efficiency of discrete-time and continuous-time estimators. A second simulation is conducted to discuss the selection of the optimal sampling mesh in the continuous-time case. Then, a third simulation is conducted to build asymptotic confidence intervals. An application to financial time series is used to study the performance of the proposed estimator in terms of point and interval prediction of the IBM asset price log-returns. Finally, a second application is introduced to discuss the use of the initial estimator to impute missing household-level peak electricity demand.
1. Introduction
Let be an infinite-dimensional space equipped with a semi-metric
. Consider
a stationary and ergodic continuous-time process valued in the space
, such that each triplet
has the same probability distribution as the random variable (r.v.)
defined on probability space
. Let S be a compact interval in
and ψ be a measurable function defined on the space
,
, where y is a real variable such that
. Consider the following regression model:
(1)
where
is the conditional expectation of
given the r.v. X. That is, for any
and a fixed
,
. The error term ε is independent of X such that
almost surely (a.s.).
Usually, when no data are missing, a sample of a stationary and ergodic process is observed. Here, we allow the response variable
to be Missing At Random (MAR) at any time t. To check whether an observation is complete or missing, a new variable ζ is introduced into the model as an indicator of the missing observations. Thus, for any
,
if
is observed and 0 if
is missing. We suppose that the Bernoulli random variable ζ satisfies
. Here,
is the conditional probability of observing the response variable and is usually unknown. This assumption allows one to conclude that ζ and Y are conditionally independent given X. Note that the above assumption says that the response variable does not provide additional information, on top of that given by the explanatory variable, to predict whether an individual will present a missing response.
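The mechanism described above can be mimicked numerically. Below is a minimal sketch in which a scalar covariate stands in for the functional predictor and the observation probability takes a hypothetical logistic form (both are illustrative assumptions, not the paper's specification); the indicator ζ depends on X only, so ζ and Y are conditionally independent given X:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_obs(x):
    # Hypothetical observation probability depending on the covariate only
    return 1.0 / (1.0 + np.exp(-0.5 * x))

n = 1000
X = rng.normal(size=n)                  # covariate (scalar stand-in for a curve)
Y = np.sin(X) + 0.1 * rng.normal(size=n)
zeta = rng.binomial(1, p_obs(X))        # zeta = 1: Y observed, zeta = 0: Y missing
Y_obs = np.where(zeta == 1, Y, np.nan)  # missing responses stored as NaN
```

Because ζ is drawn from p(X) alone, conditioning on X makes the missingness uninformative about Y, which is exactly the MAR structure assumed in the text.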
In this paper, we are interested in the estimation of the regression function based on the observed data
. Note that, for any
,
is an element in the space
, which means that, for any fixed time
,
is a curve. Specifically, if
is the space of square integrable functions defined on
, then the predictor
describes a trajectory in the functional space
observed at the fixed time
.
In real life, there are several situations where the response variable might be missing at random. For instance, in survey sampling studies, non-response is an increasingly common problem, where the missing response reaches rates of –
or even higher (see, e.g. Sikov Citation2018). In such cases, the missing data become a real source of bias in survey sampling estimation. Another case where the response may be subject to the MAR phenomenon is household electricity consumption monitoring. Indeed, the real-time collection of intra-day electricity consumption is now possible after the deployment of smart meters at the household level. The transmission of the information from the smart meter to the information system usually goes through WIFI or optical fibre networks, which are significantly dependent on the weather conditions, among other factors. Therefore, a response variable such as the daily total electricity consumption might be subject to a missing at random mechanism due to bad weather conditions (for more details, see Section 5.2). In financial markets, despite the modern technology which allows data to be collected at a very fine time scale, financial data can still be missing. For instance, there are some regular holidays, such as Thanksgiving Day and Christmas, for which stock price data are missing. There are many other technical reasons (such as breakdowns in devices recording data, computers' sudden shutdowns, …) that make stretches of data missing (see Section 5.1 for more details about this application). For further examples and details about missing at random data, the reader is referred to Chapter 1 in Little and Rubin (Citation2002).
Whenever is an independent and identically distributed (i.i.d.) random sample, several authors have investigated nonparametric and semiparametric estimation of the regression function. In the framework where X is finite-dimensional, one can quote (Cheng Citation1994; Little and Rubin Citation2002; Nittner Citation2003; Tsiatis Citation2006; Liang et al. Citation2007; Efromovich Citation2011). See also, Ferraty et al. (Citation2013) when the predictor is infinite-dimensional. However, less attention has been given to the case of dependent data including an infinite-dimensional covariate, except Ling et al. (Citation2015), where a local constant estimation of the regression operator with discrete-time ergodic processes was considered.
In the case of continuous-time finite dimensional processes (d and
) satisfying a strong mixing condition, the estimation of the regression function based on completely observed data was considered by several authors, see for instance, the monograph by Bosq (Citation1998) and the references therein.
Some of these results were extended by Didi and Louani (Citation2014) and Bouzebda and Didi (Citation2017) for stationary and ergodic processes. Chaouch and Laïb (Citation2019) studied the asymptotic mean square error of the kernel regression estimator for MAR stationary and ergodic process and obtained an explicit upper bound of it.
It is worth noting that, even though a continuous-time functional processes framework is considered in this paper, in practice data are often collected according to some sampling scheme and the continuous-time process is discretised. Our results are then valid when considering a discrete-time ergodic stationary process sampled from a continuous-time process
with a regular sampling mesh
A discussion on the optimal choice of the sampling mesh in the continuous-time case will be then of great practical interest (see Section 3.5 and Simulation 2).
In the setting where is an α-mixing continuous-time process with Y completely observed, Maillot (Citation2008) established convergence rates for the regression operator, and a super-optimal mean square convergence rate was obtained in Chesneau and Maillot (Citation2014).
This paper aims to complete and extend the work of Maillot (Citation2008), Chesneau and Maillot (Citation2014) and Ling et al. (Citation2015) at several levels. First, we suppose that the continuous-time process satisfies an ergodic assumption rather than an α-mixing one. Therefore, the dependence structure considered here is more general and covers several processes which do not satisfy the mixing property. Indeed, our results are valid for α-mixing and non-α-mixing as well as long memory and Bernoulli shift processes (for more details, see the examples used to discuss Assumption A3 below). Moreover, our results are stated and proved without assuming any mixing condition or imposing a particular covariance structure on the process. This is due to the fact that the main technical tools used here are martingale difference devices and sequences of projections on appropriate σ-fields. Second, we complete and extend the results established in Ling et al. (Citation2015) for discrete-time functional data processes to the continuous-time functional framework.
We estimate a general operator which includes the conditional mean, the conditional distribution function and conditional quantiles. It is worth noting that such an extension is not obvious, since it requires an appropriate definition of σ-fields adapted to the continuous-time context. Such an adaptation is crucial when using martingale difference tools to establish asymptotic properties of the estimator. Third, the response variable considered here is affected by the MAR mechanism and is therefore not completely observed as in Maillot (Citation2008). Moreover, in contrast to Maillot (Citation2008) and Chesneau and Maillot (Citation2014), we do not limit our study to mean square convergence: we provide a more exhaustive inference on the regression operator estimator, including pointwise and uniform almost sure convergence rates, identification of the limiting distribution of our estimator, and a method to build confidence intervals. Fourth, a simulation study is carried out to investigate the selection of the ‘optimal’ sampling mesh, which is one of the most important topics in nonparametric estimation with continuous-time processes.
The rest of this paper is organised as follows. In Section 2, we present the framework adapted to continuous-time ergodic processes and introduce assumptions needed for establishing asymptotic results. The main asymptotic properties of the estimator are discussed in Section 3. An illustration of the performance of the proposed estimator is discussed through simulated data in Section 4. Section 5 is devoted to an application of the proposed estimator to financial time series. Section 6 discusses the application of our theoretical results to continuous-time conditional quantiles estimation. Finally, technical proofs are given in the Appendix.
2. Framework and assumptions
To define the framework of our study, we need to introduce some definitions. Let be a continuous-time process defined on a probability space
and observed at any time
. For more details about the definition of continuous-time ergodic processes, the reader is referred to Didi and Louani (Citation2014). From now on, we consider
the filtration defined on
, that is
is an increasing sequence of sub-σ-algebras of
.
For a positive real number δ such that and
, consider the δ-partition
of the interval
. Furthermore, for t>0 and
, we define the following σ-fields:
(2)
If t<0 we take
the trivial σ-field. Note that, for any
and t>0, we have
. Moreover, for any
, such that
we have
.
Let be a ball centred at
with radius
Denote
a nonnegative real-valued continuous-time process and let
and
be the distribution function and conditional distribution function of
given the σ-field
, respectively.
To define an estimator of the regression function adapted to the MAR, multiply Equation (1) by ζ to get
Taking conditional expectations with respect to X = x, one gets
Thus we have
Given a random sample
one can therefore define a kernel-type estimator of
, say
, adapted to the MAR response framework. Note that if there are missing observations in the response variable, a simple way to estimate
is to consider a kernel smoothing-type estimator which only considers observed data, in other words, those for which
. Therefore, one gets
(3)
where
,
is a kernel density function,
is the smoothing parameter tending to zero as T goes to infinity.
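A minimal numerical sketch of this simplified estimator is given below. For illustration only, the functional predictor is replaced by a scalar one, the semi-metric by the absolute difference, and K is taken to be a quadratic kernel supported on [0, 1]; all of these are assumptions of the sketch, not the paper's prescriptions:

```python
import numpy as np

def quad_kernel(u):
    # Quadratic kernel supported on [0, 1] (the argument is a nonnegative distance)
    return np.where((u >= 0) & (u <= 1), 1.5 * (1.0 - u ** 2), 0.0)

def simplified_estimator(x0, X, Y, zeta, h, dist=lambda a, b: np.abs(a - b)):
    # Complete-case kernel regression: the weight vanishes whenever zeta = 0,
    # so missing responses never enter the sums
    w = zeta * quad_kernel(dist(X, x0) / h)
    denom = w.sum()
    return np.nan if denom == 0 else (w * np.where(zeta == 1, Y, 0.0)).sum() / denom
```

Only pairs with an observed response contribute, which is exactly the complete-case logic of Equation (3).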
Remark 2.1
When the sample has missing observations in the response variable, two strategies can be followed to estimate . The first one,
given in Equation (3), is called the simplified estimator and only uses complete observations. The second approach consists in using the simplified estimator
to impute the missing values of the response variable
according to the following expression:
, (see, e.g. Chu and Cheng Citation2003 or González-Manteiga and Pérez-González Citation2004). Consequently, an estimator, say
, based on imputed data may be defined as follows:
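The imputation strategy of the remark can be sketched as follows, again with a scalar stand-in for the predictor and a quadratic kernel (both illustrative assumptions): each missing response is first replaced by the simplified estimate at its own covariate, then an ordinary kernel regression is run on the completed sample:

```python
import numpy as np

def quad_kernel(u):
    return np.where((u >= 0) & (u <= 1), 1.5 * (1.0 - u ** 2), 0.0)

def simplified_estimator(x0, X, Y, zeta, h):
    # Complete-case kernel regression (cf. the simplified estimator above)
    w = zeta * quad_kernel(np.abs(X - x0) / h)
    return (w * np.where(zeta == 1, Y, 0.0)).sum() / w.sum()

def imputed_estimator(x0, X, Y, zeta, h):
    # Step 1: impute each missing response by the simplified estimate at its covariate
    Y_imp = np.array([y if z == 1 else simplified_estimator(xi, X, Y, zeta, h)
                      for xi, y, z in zip(X, Y, zeta)])
    # Step 2: ordinary kernel regression on the completed sample
    w = quad_kernel(np.abs(X - x0) / h)
    return (w * Y_imp).sum() / w.sum()
```

The imputation step reuses all covariates, so the second-stage smoother runs on a sample of full size rather than on complete cases only.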
From now on, we set and define the conditional bias as
(4)
where
and for
(5)
Before introducing the assumptions under which we establish our asymptotic results, we add the following notations. Let
denote a real random function ℓ such that
converges to zero almost surely (a.s.) as
and denote
a real random function ℓ such that
is almost surely bounded.
(Assumptions on the kernel function). Let K be a nonnegative bounded kernel of class
over its support
such that
. The derivative
exists on
and satisfies the condition
for all
and
for
(Assumptions related to the continuous-time functional ergodic processes)
Let
be a nonnegative real number and
. Suppose, for any
such that
, there exists a nonnegative continuous random function
a.s. bounded by a deterministic function
.
Moreover, let
be a random function defined on
,
is a deterministic nonnegative bounded function and
a nonnegative real function tending to zero (as its argument tends to 0), and assume that:
as
For any
,
with
as
a.s. bounded and
as
and
For any
:
, a.s.
There exists a nondecreasing bounded function
such that, uniformly in
,
as
(Local smoothness and continuity conditions)
Suppose for any
and r>0 such that
:
a.s.
a.s. and for
,
a.s.
and a constant c>0 such that, for any
,
For any
,
a.s.
The functions
and
are continuous in the neighbourhood of x and
a.s.
a.s. and for any
,
a.s. as
For j = 1, 2, define the following moments, which are independent of
(6)
where
denotes the jth derivative of the kernel
and
the first derivative of K raised to the power j.
2.1. Comments on the assumptions
Condition (A1) is related to the choice of the kernel K, which is very usual in nonparametric functional estimation. Note that a Parzen symmetric kernel is not adequate in this context since the random process is positive; therefore, we consider K with support
. This is a natural generalisation of the assumption usually made on the kernel in the multivariate case where K is supposed to be a spherically symmetric density function. The assumptions
and
guarantee that
for all limit functions
In the case of non-smooth processes,
may be equal to the Dirac δ-function at 1, the condition
is needed to define the moments
which are, in this case, determined by the value
.
Conditions (A2)(i)–(ii) reflect the ergodicity property assumed on the continuous-time functional process. They play an important role in studying the asymptotic properties of the estimator. The functions and f play the same role as the conditional and unconditional densities in the finite-dimensional case, whereas
characterises the impact of the radius u on the small ball probability as u goes to 0. Several examples to satisfy these conditions are given in Laïb and Louani (Citation2010) for discrete-time functional data process. Some other examples satisfying this condition are also given in Didi and Louani (Citation2014) when observations
are sampled from an ergodic continuous-time process taking values in
space.
Condition (A2)-(iii) involves the ergodic nature of the process where the random function belongs to the space of continuous functions. Note that approximating the integral
by its Riemann sum:
allows one to prove easily that the sequence
is stationary and ergodic (see Didi and Louani Citation2014). (A2)-(iv) is a usual condition when dealing with functional data, whereas (A2)-(v) is a consequence of the ergodic assumption.
(A.3)(ii) is a Hölder-type assumption that requires a certain smoothness of the regression operator. Such an assumption is commonly used in nonparametric estimation. (A.3)(iii) is a smoothness condition on the κth centred conditional moments of
(A3)(iv) assumes the continuity of the conditional probability of observing a missing response. Finally, note that the moments
are linked to the small probability function through
. One can refer to Ferraty et al. (Citation2007) for a discussion on the choice of
, the kernel K and the positivity of
.
Discussion on the assumptions (A3)(i) –(i). These hypotheses are Markov-type conditions that characterise the conditional moments of
. They are satisfied for a general class of processes including the α-mixing and non-α-mixing as well as long memory and Bernoulli shift processes. As pointed out in Doukhan and Louhichi (Citation1999), the main attraction of Bernoulli shift processes is that they provide examples of processes that are weakly dependent, but not mixing. According to the discussion made in the introduction, we consider below some examples in both contexts (continuous and discretised processes) where the predictor X is a stationary and ergodic Markovian process that might be α-mixing or not and satisfies the conditions (A.3)(i) –(
).
First of all, let us recall the following definitions.
Definition 2.1
see Doukhan and Louhichi (Citation1999)
Let be a sequence of independent real-valued r.v.s and F be a measurable function defined on
A Bernoulli shift is a sequence
defined by
Definition 2.2
see Doukhan (Citation2018, p. 60)
A centred second-order stationary process is called long-range dependent (LRD) if
and
, where
Definition 2.3
A process is called a fractional Brownian motion (fBm) with Hurst exponent
if it is an almost surely continuous, centred Gaussian process with covariance
Definition 2.4
see Lemma 4.2 in Maslowski and Pospíšil (Citation2008)
A strictly stationary centred Gaussian process is ergodic if
Example 2.5
Continuous-time long memory processes
Let and consider the Langevin equation with fBM noise
and initial condition
:
(7)
Then, for each
, the following Gaussian stationary Markovian fractional Ornstein–Uhlenbeck process
defined as
is the unique (a.s.) solution of (7) with initial condition
(for more details, see Section 2, p. 5, in Cheridito et al. Citation2003).
Note that for , the auto-covariance function of
is similar to that of the increments of
. Therefore,
is ergodic (by Definition 2.4) and exhibits long-range dependence, as detailed in Theorem 2.3 and the discussion at the end of page 8 in Cheridito et al. (Citation2003).
Now, to check the condition (A.3)(i), consider the model:
Let
be the σ-field generated by
.
It follows that, for any ,
.
Since are Markovian, we get
almost surely. Thus condition (A.3)(i) is satisfied.
Example 2.6
Discrete-time processes
As discussed above, in real life we do not observe the process continuously at any time . We rather observe a discretised version of it based on some sampling scheme. The following examples show that Assumption (A3)(i) is satisfied for discretised processes as well.
(i) Long-memory discrete-time processes. Let be a white noise process with variance
, and let I and B be the identity operator and the backshift operator, respectively. Giraitis and Leipus (Citation1995) have proved (see Theorem 1 p. 55) that the k-factor Gegenbauer process
where
if
or
if
, for
, is long memory, stationary, causal and invertible, and has a moving average representation. That is
with
On the other hand, Guégan and Ladoucette (Citation2001) have shown that, if is a Gaussian process, then the above process is not strong mixing whereas the moving average representation of
confirms that it is a stationary Gaussian and ergodic process.
(ii) The stationary solution of the linear Markov AR(1) process: , where
are independent symmetric Bernoulli random variables taking values
and 1, is not α-mixing (see Andrews Citation1984). However,
is a Markovian stationary and ergodic process.
(iii) Let be an i.i.d. sequence uniformly distributed on
, and set
, where the sequence
represents the decimals of
. The process
is stationary and admits the following AR(1) representation:
where
is a strong white noise. This process is not α-mixing (see Francq and Zakoïan Citation2010, Example A.3, p. 349), but it is ergodic.
To check the hypothesis (A3)(i) for Examples (i), (ii) and (iii), consider the regression model where
is a white noise process independent of
and define the σ-field:
. It is then easy to see that condition (A3)(i) is fulfilled. The discrete-time processes in examples (i)–(iii) are still valid for the regression model developed in Section 3.5 under the context of sampling schemes.
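Example (ii) is easy to simulate. The sketch below uses the normalisation X_t = (X_{t−1} + ε_t)/2 with ε_t uniform on {0, 1}, an illustrative choice consistent with Andrews' construction:

```python
import numpy as np

def andrews_ar1(n, burn=200, seed=1):
    # X_t = (X_{t-1} + eps_t) / 2 with i.i.d. eps_t on {0, 1}:
    # stationary, ergodic, Markovian, but not alpha-mixing (Andrews, 1984)
    rng = np.random.default_rng(seed)
    x = rng.uniform()            # start inside the invariant support [0, 1]
    out = np.empty(n)
    for t in range(burn + n):
        x = 0.5 * (x + rng.integers(0, 2))
        if t >= burn:
            out[t - burn] = x
    return out
```

Iterating the recursion shows that X_t is a binary expansion with i.i.d. digits, so the stationary law is uniform on [0, 1]; the deterministic dependence of the digits on the past is what destroys the mixing property.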
3. Main results
In this section, we investigate several asymptotic properties of the continuous-time generalised regression estimator. Some particular cases, related to specific choices of the function , including the conditional cumulative distribution function and the conditional quantiles will also be discussed.
3.1. Almost sure consistency rates
3.1.1. Pointwise consistency
The following theorem establishes an almost sure pointwise consistency rate of
Theorem 3.1
Pointwise consistency
Assume that (A1)–(A3) hold true and the following conditions are satisfied:
(8)
Then, for T sufficiently large, we have
(9)
The proof of Theorem 3.1 is detailed in the supplementary material in Chaouch and Laïb (Citation2023).
Remark 3.1
Theorem 3.1 generalises Theorem 1 of Laïb and Louani (Citation2011) established in the context of discrete-time stationary and ergodic processes, and Theorem 3.4 of Ferraty et al. (Citation2005) stated under a strong mixing assumption with completely observed response where the support of y is reduced to one point. Moreover, the function can decrease to zero at an exponential rate, whenever
goes to zero, therefore
should be chosen to decrease to zero at a logarithmic rate.
3.1.2. Uniform consistency
To establish the uniform consistency with rate of the regression operator, we need some additional definitions and assumptions that allow to express the uniform convergence rate as a function of the entropy number. Let and S be compact sets in
and
, respectively. Consider, for any
, the ϵ-covering number of the compact set
, say
, defined by
The number
measures how full is the class
. The finite set of points
is called an ϵ-net of
if
, where
is the ball, centred at
and of radius ϵ, with respect to the topology induced by the semi-metric
. The quantity
is called the Kolmogorov's ϵ-entropy of the set
that may be seen as a tool to measure the complexity of the subset
, in the sense that high entropy means that a large amount of information is needed to describe an element of
with an accuracy ϵ. Several examples of
covering special cases of functional processes are given in Ferraty et al. (Citation2010) and Laïb and Louani (Citation2011).
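For a finite collection of observed curves, an upper bound on the ε-covering number (and hence on the Kolmogorov ε-entropy) can be computed by the standard greedy construction sketched below; the supremum semi-metric used here is only one possible choice of d:

```python
import numpy as np

def sup_dist(u, v):
    # Supremum distance between two discretised curves
    return float(np.max(np.abs(u - v)))

def greedy_eps_net(curves, eps, dist=sup_dist):
    # Greedy eps-net: every curve ends up within eps of a retained centre,
    # and retained centres are pairwise more than eps apart, so
    # log(len(centres)) bounds the Kolmogorov eps-entropy of the sample
    centres = []
    for c in curves:
        if all(dist(c, g) > eps for g in centres):
            centres.append(c)
    return centres
```

A maximal ε-separated set is itself an ε-net, which is why the size of the greedy net dominates the covering number.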
(U0) Assume that (A2) holds uniformly in the following sense:
(U1) The kernel function K satisfies the following conditions:
(U2) For
(U3) There exist
(U4) Let
Conditions in (U0) are standard in this context to get uniform consistency rate. Condition (U1) is usually used when we deal with nonparametric estimation for functional data, (U2) requires the existence of the moments up to order 2 of . (U3) is a regularity condition upon the function
which is necessary to obtain the uniform consistency result over the compact S. (U4) allows one to cover the subset
with a finite number of balls and to express the convergence rate in terms of the Kolmogorov entropy of this subset. A similar condition has been used in Ferraty et al. (Citation2010), where the authors pointed out that, for a radius not too large, one requires the quantity
to be neither too small nor too large. This condition satisfies this requirement, since it implies that
goes to 0 for sufficiently large T. Examples given in Ferraty et al. (Citation2010) and Laïb and Louani (Citation2011) satisfy (U4).
Theorem 3.2 states the uniform consistency rate of the kernel regression estimator. It generalises Theorem 2 in Ferraty et al. (Citation2010), established in the i.i.d. case, and that in Laïb and Louani (Citation2011), established in the context of discrete-time stationary and ergodic processes with completely observed response.
Theorem 3.2
Uniform consistency
Assume (A1), (U0)–(U4) and (A3) hold true. Moreover, suppose the conditions in (8) are satisfied and
(10)
Then we have
(11)
The proof of Theorem 3.2 is detailed in the supplementary material in Chaouch and Laïb (Citation2023).
3.2. Asymptotic conditional bias and risk evaluation
Before evaluating the conditional bias, let us introduce some additional notation. Consider, for i = 1, 2, the following assumption:
(BC1) Recall that and, for any
, denote
Assume that the function
is differentiable at 0 and satisfies
and
for any
. This condition was introduced in Ferraty et al. (Citation2007) and used by Laïb and Louani (Citation2010) to evaluate the conditional bias. The introduction of
allows one to integrate with respect to the real random variable
rather than the couple of random variables
, where
being functional continuous random variable.
The following proposition gives an asymptotic expression of the conditional bias term, which generalises Proposition 1 in Laïb and Louani (Citation2010), stated for the discrete-time estimator, to our setting. Its proof is similar to the one in the discrete-time framework and is therefore omitted.
Proposition 3.3
Conditional Bias
Under assumptions (A1)–(A3), (BC1) and the conditions in (8), we have
The next result gives an explicit expression of the asymptotic quadratic risk associated with the estimator .
Theorem 3.4
Quadratic risk
Suppose that Assumptions (A1)–(A3) hold true. Then, whenever and
, we have, for a fixed
, that
where
Remark 3.2
Note that, for sufficiently large T, the expression of MSE becomes
. This result generalises the one in Chaouch and Laïb (Citation2019) established in the framework of real-valued continuous-time processes. Note, however, that, for finite-dimensional continuous-time processes with MAR response, the bias term obtained in Chaouch and Laïb (Citation2019) is of order
which is smaller than
given in Proposition 3.3. The increase in the bias term is due to the infinite-dimensional nature of the functional space.
The mean squared error can be used as theoretical guidance to select the ‘optimal’ bandwidth by minimising the quantity
with respect to
. However,
and
depend on some unknown quantities which should be replaced by their empirical consistent estimators, namely
and
. Note that
may be viewed as a real regression function with response variable
and predictor
It may then be estimated by a kernel regression estimate
by replacing
by its estimator
.
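Since p(x) = P(ζ = 1 | X = x) is itself a regression function with the binary indicator ζ as response, the same kernel smoother applies. A minimal sketch, with a scalar covariate and a quadratic kernel as illustrative assumptions:

```python
import numpy as np

def quad_kernel(u):
    return np.where((u >= 0) & (u <= 1), 1.5 * (1.0 - u ** 2), 0.0)

def p_hat(x0, X, zeta, h):
    # Kernel regression of the observation indicator zeta on the covariate:
    # a plug-in estimate of the conditional observation probability p(x0)
    w = quad_kernel(np.abs(X - x0) / h)
    return (w * zeta).sum() / w.sum()
```

This plug-in estimate can then replace the unknown p(x) in the empirical counterparts of the bias and variance terms.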
3.3. Asymptotic normality
The following theorem establishes the asymptotic distribution of the estimator.
Theorem 3.5
Assume that conditions (A1)–(A3) are fulfilled. Suppose that, for β as defined in (A3)(ii), the following conditions hold true:
(12)
Then, for any
such that
we have
where
(13)
and
Note that the statement (13) gives only an upper bound of the asymptotic variance
. The following proposition gives an estimate of
that will be needed to construct confidence intervals for the unknown operator
.
Proposition 3.6
Suppose conditions of Theorem 3.5 hold and , then
(14)
is a consistent estimator for
. The quantities
,
,
,
and
are empirical versions of
,
,
,
and
respectively.
and
are calculated by replacing
, given in (A2)(iv), by its empirical version
On the other hand
and
are given by
3.4. Continuous-time confidence intervals
Using the non-decreasing property of the standard Gaussian cumulative distribution function, together with the estimator , defined in (14), Proposition 3.6 and Theorem 3.5, the following corollary provides estimated confidence intervals for
at any fixed x.
Corollary 3.7
Assume the conditions of Theorem 3.5 are fulfilled and the conditions in (12) are replaced by
(15)
Then, for any
, the
confidence intervals for
are
(16)
where
is the αth quantile of the standard normal distribution.
These intervals are similar to those given in Remark 2 in Laïb and Louani (Citation2010) for discrete-time ergodic context with complete data, and those obtained in Ling et al. (Citation2015) for discrete-time stationary ergodic data with missing at random response.
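Schematically, once a point estimate, a consistent variance estimate and the effective local sample size are available, the interval follows directly from the Gaussian limit. In the sketch below all three inputs are user-supplied placeholders; the paper's specific variance expression is not reproduced here:

```python
from statistics import NormalDist

def asymptotic_ci(m_hat, sigma2_hat, nu_hat, alpha=0.05):
    # Schematic (1 - alpha) interval m_hat +/- z_{1-alpha/2} * sqrt(sigma2_hat / nu_hat),
    # where nu_hat plays the role of the effective local sample size
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half = z * (sigma2_hat / nu_hat) ** 0.5
    return m_hat - half, m_hat + half
```

In practice sigma2_hat would be the consistent estimate of Proposition 3.6 and nu_hat the normalisation appearing in Theorem 3.5.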
3.5. Sampling schemes and computation of the confidence intervals
In the previous section, the process was supposed to be observable over . However, in practice the data are often collected according to a sampling scheme since it is difficult to observe a path continuously at any time t over the interval
. Hereafter, we briefly discuss the effect of a sampling scheme on the construction of confidence intervals for the regression function
. Assume that the data are sampled, either regularly, irregularly or even randomly, from an underlying continuous-time process at instants
. For the sake of simplicity, we consider here the case where the instants
are irregularly spaced, that is
Now, for
, we define the following increasing families of σ-algebra:
and
The purpose then consists in estimating
given the discrete-time ergodic stationary process
sampled from the underlying continuous-time process
. In the case of a regular sampling scheme, that is
, the regression function is
and its estimator
defined in (3) becomes
(17)
Note that Theorem 3.5 holds for the estimate
when replacing T by
. The limiting law is a Gaussian random variable with mean zero and variance function
Making use of Corollary 3.7 and following steps similar to those in Laïb and Louani (Citation2010), it follows that, for any
, the
asymptotic confidence intervals of
are
(18)
where
is the quantile of the standard normal distribution.
4. Simulation study
This section aims to discuss numerically some aspects related to continuous-time processes that might affect the quality of estimation of the operator . Here we consider
therefore
, where
is the conditional expectation of
given
The first simulation aims to compare the quality of estimation of
based on the continuous-time and discrete-time processes. In the second simulation, we discuss the choice of the ‘optimal’ sampling mesh δ in the case of continuous-time processes and assess its sensitivity to the missing at random mechanism. Finally, the third simulation discusses the effect of the MAR rate on the coverage rate and length of the estimated confidence intervals.
4.1. Simulation 1: continuous-time versus discrete-time estimators
In this first simulation, we compare the estimation of the regression operator when discrete- and continuous-time processes are considered. We want to know whether considering continuous-time processes improves the quality of the predictions or not. We suppose that the functional space is endowed with its natural norm. The generation of continuous-time processes
is obtained by considering the following steps:
First, we simulate an Ornstein–Uhlenbeck (OU) process
solution of the following stochastic differential equation:
(19) where
denotes a Wiener process. Here, we take
Let
be the operator mapping
into
defined, for any
, as follows:
where
is the Legendre polynomial of degree j and
and
denotes the floor function.
We consider that curves are sampled at 400 equispaced values in
and defined, for any
, as
To generate the real-valued process
, the following nonlinear functional regression model is considered:
(20) where
where
is a Wiener process independent of
.
Observe that the OU process is a real-valued continuous-time process (since dt tends to zero). The operator
transforms each observation of the process
into a curve through the Legendre polynomials. In this way, the functional variable X is generated continuously, as is the process
. Moreover, note that steps 1, 2 and 3 are devoted to simulating the continuous-time functional process
, whereas in step 4 the real-valued continuous-time process
is generated. A sample of 20 simulated curves is displayed in Figure (left) and an example of the real-valued process
is given in Figure (right).
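The simulation steps above can be sketched as follows. The drift and volatility values and the exact weighting of the Legendre coefficients are illustrative assumptions, since the paper's operator is only partially reproduced in the extracted text:

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(2)

def ou_path(n, dt, theta=1.0, mu=0.0, sigma=0.5, z0=0.0):
    # Euler-Maruyama discretisation of dZ_t = theta*(mu - Z_t) dt + sigma dW_t
    z = np.empty(n)
    z[0] = z0
    for k in range(1, n):
        z[k] = z[k - 1] + theta * (mu - z[k - 1]) * dt \
               + sigma * np.sqrt(dt) * rng.normal()
    return z

def to_curve(z, grid, degree=5):
    # Map one scalar OU value into a curve via Legendre polynomials;
    # the coefficient weighting z**j / (j + 1) is an illustrative choice
    coef = np.array([z ** j / (j + 1.0) for j in range(degree + 1)])
    return legendre.legval(2.0 * grid - 1.0, coef)  # map [0, 1] onto [-1, 1]

grid = np.linspace(0.0, 1.0, 400)   # curves sampled at 400 equispaced points
Z = ou_path(n=200, dt=0.05)
curves = np.stack([to_curve(z, grid) for z in Z])
```

Each row of `curves` is one functional observation, so the dependence of the OU driver carries over to the curve-valued process, as in steps 1–3.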
Now, our purpose is to compare, in terms of estimation accuracy, the continuous-time estimator with the discrete-time one for different values of T = 50, 200, 1000 and several missing at random rates. It is worth noting that the continuous-time process is observed at every instant
, where
and
. However, the discrete-time process is observed only at the instants
As in Ferraty et al. (Citation2013) and Ling et al. (Citation2015), we consider that the missing at random mechanism is governed by the following probability distribution:
(21)
where
, for
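A minimal sketch of such a response-missingness mechanism is given below. The logistic form of the observation probability is a hypothetical stand-in for the distribution in (21), which is not reproduced here; it depends on the predictor only, so the response is missing at random by construction.

```python
import numpy as np

rng = np.random.default_rng(1)

def mar_indicator(curves, a, b, rng):
    """Draw Bernoulli indicators delta_i with P(delta = 1 | X = x) depending
    on the curve only (MAR). The logistic form is a hypothetical stand-in
    for the paper's probability (21); a and b tune the missingness rate."""
    norms = np.sqrt((curves ** 2).mean(axis=1))   # size of each curve
    p = 1.0 / (1.0 + np.exp(-(a + b * norms)))    # P(observed | X)
    return rng.binomial(1, p)

curves = rng.normal(size=(500, 100))
delta = mar_indicator(curves, a=2.0, b=-1.0, rng=rng)
rate = 1.0 - delta.mean()                          # empirical MAR rate
```

Tuning a and b moves the empirical MAR rate, which is how different missingness levels (e.g. 10%, 50%) could be produced in the simulations.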
Now, we specify the tuning parameters on which our estimator given in (3) depends. We choose the quadratic kernel defined as
and, because the curves are smooth enough, we choose as semi-metric the
-norm of the second derivatives of the curves, that is for
,
(22)
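The semi-metric in (22) can be approximated on discretised curves by finite differences, as in this sketch; in practice the derivatives are usually computed from a smooth (e.g. B-spline) representation of the curves.

```python
import numpy as np

def semimetric_d2(u, v, grid):
    """L2 distance between second derivatives, approximated by second-order
    finite differences on a common equispaced grid (a sketch; smoother
    derivative estimates are normally preferred)."""
    h = grid[1] - grid[0]
    d2u = np.diff(u, 2) / h ** 2
    d2v = np.diff(v, 2) / h ** 2
    return np.sqrt(np.sum((d2u - d2v) ** 2) * h)

grid = np.linspace(0.0, 1.0, 400)
x1, x2 = np.sin(2 * np.pi * grid), np.cos(2 * np.pi * grid)
d_same = semimetric_d2(x1, x1, grid)   # identical curves: distance 0
d_diff = semimetric_d2(x1, x2, grid)   # distinct curves: positive distance
```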
We use the local cross-validation method based on the κ-nearest neighbours, introduced in Ferraty and Vieu (Citation2006, p. 116), to select the optimal bandwidth for both the discrete- and continuous-time regression estimators. The accuracy of the two estimators is evaluated over M = 500 replications and measured, at each replication
, by using the squared errors
and
for the continuous-time and discrete-time estimators, respectively. Observe that the discrete-time estimator of the regression operator is defined as
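A hedged sketch of such a kernel regression estimator: a Nadaraya–Watson-type ratio restricted to the observed responses (those with δ_i = 1), using the quadratic kernel. The exact definition in the text is not reproduced here, and the toy check uses scalar "curves" with the absolute-value semi-metric purely for illustration.

```python
import numpy as np

def quad_kernel(u):
    # Quadratic kernel supported on [0, 1], the usual choice in
    # functional kernel regression (normalising constant 3/2).
    return np.where((u >= 0) & (u <= 1), 1.5 * (1.0 - u ** 2), 0.0)

def kernel_regression_mar(x0, X, Y, delta, h, dist):
    """Nadaraya-Watson-type ratio over the observed responses only
    (delta_i = 1); `dist` is the chosen semi-metric between predictors."""
    d = np.array([dist(x0, xi) for xi in X])
    w = delta * quad_kernel(d / h)
    s = w.sum()
    return np.nan if s == 0 else np.dot(w, Y) / s

# toy check with scalar predictors and the absolute-value semi-metric
rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, 300)
Y = X ** 2 + 0.05 * rng.normal(size=300)
delta = rng.binomial(1, 0.8, size=300)          # about 20% missing responses
est = kernel_regression_mar(0.5, X, Y, delta, h=0.1,
                            dist=lambda a, b: abs(a - b))
```

At x0 = 0.5 the estimate should be close to the true regression value 0.25 despite the missing responses, since missingness does not depend on Y.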
To get a better idea about the variability of the errors, Table summarises the distribution of the squared errors (multiplied by )
and
. It shows that the continuous-time regression estimator is more accurate than the discrete-time one. Moreover, as T increases, the squared errors decrease faster for the continuous-time process.
Table 1. Summary statistics of for discrete- and continuous-time estimators of the regression function.
4.2. Simulation 2: optimal sampling mesh selection
The purpose of this simulation is to investigate another aspect related to continuous-time processes. The selection of the ‘optimal’ sampling mesh is one of the most important topics in continuous-time processes.
First of all, we generate a continuous-time functional data process according to the following equation:
where
is an OU process, solution of the stochastic differential equation (19), observed in practice at the instants
with n = 200 fixed. Here, we take different values of sampling mesh δ, calculate the corresponding empirical version of the Mean Integrated Square Error (
) and identify the optimal mesh, say
, that minimises
. Note that each curve observed at the instant t is discretised at 100 equidistant points over the interval
The response variable is obtained following the nonlinear functional regression model (20), where the operator
is defined as
Moreover, the missing at random mechanism in this simulation is supposed to be the same as in the first simulation, given by Equation (21). For the tuning parameters used to build the estimator, we consider the quadratic kernel and, given the shape of the true regression operator, which depends on the first derivative of the functional predictor, the Euclidean distance between the first-order derivatives of the curves is adopted as a semi-metric. Finally, the bandwidth is selected according to the local cross-validation method based on the κ-nearest neighbours, as detailed in Ferraty and Vieu (Citation2006, p. 116).
For each value of sampling mesh δ, the regression operator is estimated over a grid of 50 different fixed curves and the whole procedure is repeated over M = 500 replications. Finally, the empirical MISE is calculated, for each sampling mesh δ, according to the following equation:
Observe that
is the estimator of
, obtained at the kth replication; it depends on the sampling mesh δ, and so does the MISE.
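The empirical MISE described above can be computed as follows; `estimates` stacks the estimator evaluated on the grid of fixed curves across the M replications.

```python
import numpy as np

def empirical_mise(estimates, truth):
    """Empirical MISE: `estimates` has shape (M, n_grid), the k-th row being
    the estimator evaluated on the grid of fixed curves at replication k;
    `truth` holds the true regression values on the same grid."""
    return np.mean((estimates - truth[None, :]) ** 2)

# toy illustration: unbiased estimates with noise standard deviation 0.1,
# so the empirical MISE should be close to 0.01
rng = np.random.default_rng(3)
truth = np.linspace(0.0, 1.0, 50)
estimates = truth[None, :] + 0.1 * rng.normal(size=(500, 50))
mise = empirical_mise(estimates, truth)
```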
Figure displays the values of obtained for different values of the sampling mesh δ and missing at random rates of 10%, 50% and 0% (complete data), respectively. One can observe that the higher the missing at random rate, the larger the errors in estimating the regression operator.
Table reports the optimal sampling mesh , which minimises
, for different missing at random rates. It also provides some summary statistics to have an idea about the distribution of
for several values of δ. One can observe from Table that the higher the missing at random rate, the longer we need to observe the underlying process to collect the n = 200 observations required to reasonably estimate the regression operator. Indeed, when the data are complete, the optimal time interval
. However, when
(resp. 50%) the optimal time interval is equal to
(resp.
). Consequently, it can be concluded that when the missing at random mechanism heavily affects the response variable, we need to collect data over a longer period of time. This allows us to gather sufficient information about the dynamics of the underlying continuous-time process and therefore to obtain a better estimate of the regression operator.
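The mesh-selection rule reduces to an argmin over the candidate meshes, with the observation horizon given by T = nδ; the MISE values in this sketch are hypothetical, only the T = nδ bookkeeping comes from the text.

```python
import numpy as np

def optimal_mesh(meshes, mise_values, n=200):
    """Pick the sampling mesh minimising the empirical MISE; the horizon
    T = n * delta is the time needed to collect the n observations."""
    i = int(np.argmin(mise_values))
    return meshes[i], n * meshes[i]

meshes = np.array([0.05, 0.1, 0.2, 0.5, 1.0])
mise_values = np.array([0.031, 0.024, 0.019, 0.027, 0.040])  # hypothetical
delta_opt, T_opt = optimal_mesh(meshes, mise_values)
```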
Table 2. The optimal sampling mesh () obtained for different MAR rates and some summary statistics of the MISE(δ).
4.3. Simulation 3: asymptotic confidence intervals
In this section, we are interested in evaluating the coverage rate, as well as the length, of the asymptotic confidence intervals given in (16). The effect of the sampling mesh on the coverage rate will also be discussed numerically. Since this paper aims to extend the results of Delsol (Citation2009) on confidence intervals to continuous-time functional data, we consider the same simulation framework.
Let where
is an OU process, solution of the stochastic differential equation (19), observed at the instants
with n = 100, 200 fixed. Here, for comparison purposes, we consider two sampling meshes
The regression operator is defined as
, while the errors
are independent centred normal random variables with variance
, where
is the empirical variance of
Because the regression operator is defined as a function of the derivative of the functional random variable, the appropriate semi-metric in this case is based on the first derivative of the curves (see (22)). Moreover, the quadratic kernel is used to perform this simulation. The optimal bandwidth is selected by the local cross-validation method based on the κ-nearest neighbours. The missing at random mechanism is simulated according to the conditional probability distribution given in (21).
For a fixed , the asymptotic
-confidence intervals for
with
are computed and compared for several values of sample size n and sampling mesh δ. Here
is a grid of
independently simulated curves where the regression operator is estimated. For every fixed curve
a number of M = 500 replications is considered to approximate the coverage rate. In this simulation
were considered.
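The coverage-rate approximation over the M replications can be sketched as follows; the toy numbers use a standard normal estimator and the 1.96 half-width of a nominal 95% interval rather than the paper's estimator.

```python
import numpy as np

def coverage_rate(lower, upper, truth):
    """Proportion of replications whose interval [lower_k, upper_k]
    contains the true regression value."""
    return np.mean((lower <= truth) & (truth <= upper))

# toy illustration: nominal 95% normal intervals around a standard normal
# estimator, so the empirical coverage should be close to 0.95
rng = np.random.default_rng(4)
M, truth = 500, 0.0
est = truth + rng.normal(size=M)     # estimator across M replications
half = 1.96                          # half-width of the asymptotic interval
cov = coverage_rate(est - half, est + half, truth)
```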
As expected, Table shows that the average coverage rate varies with the sample size n, the sampling mesh and the MAR rate. The larger the sample size and the sampling mesh, and the smaller the MAR rate, the closer the average coverage rate is to the nominal level. Moreover, one can also observe that the length of the asymptotic confidence intervals decreases when the sample size increases and the MAR rate decreases.
Table 3. Average coverage over the grid Ξ and average confidence interval length appears in brackets.
Figure (resp. Figure ) displays an example of asymptotic confidence intervals obtained for the 50 curves in the testing sample when n = 100, the MAR rate = 0%, 25%, 45%, and
(resp.
). One can observe that the coverage rate decreases with an increase in the MAR rate. Similar results are also obtained when
.
5. Applications to real data
5.1. Application 1: prediction of financial asset returns
In financial markets, despite modern technology that allows data to be collected at a very fine time scale, financial data can still be missing. For instance, stock price data are missing on regular holidays such as Thanksgiving Day and Christmas. Many other technical reasons (such as breakdowns in the devices recording data, sudden computer shutdowns, …) can make stretches of data missing.
This section aims to assess the performance of the estimator proposed in this paper on financial functional time series with responses missing at random. The International Business Machines Corporation (IBM) asset price is considered as the response variable and the Standard & Poor's 500 (SP500) stock market index as the predictor. While the IBM asset price is observed at a daily frequency from 24 March 2016 to 28 September 2016, the SP500 is observed every minute during the same period. Note that the daily trading activity lasts 7 hours, excluding weekends. Since in this paper we are interested in stationary processes, a first-order differencing of the IBM daily asset price and the SP500 stock market index was applied to make the original time series stationary.
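The first-order differencing step can be sketched as follows; the random walk below is a toy stand-in for the IBM price series, which is not distributed with the paper.

```python
import numpy as np

rng = np.random.default_rng(7)

# A random walk as a toy stand-in for the (non-stationary) IBM price level.
price = 100.0 + np.cumsum(rng.normal(size=500))

# First-order differencing removes the stochastic trend; the differenced
# series plays the role of the stationary response in the application.
returns = np.diff(price)
```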
Our sample here can be denoted as follows: , where the sample size n = 129 is the total number of trading days from 24 March 2016 to 28 September 2016 after first differentiation of the original time series,
,
Originally, the data are completely observed. Therefore, to validate our estimator, we artificially create missing observations. We assume here that the missing data are generated according to the conditional probability distribution given in (21). We split the original sample into training and testing subsets. Our purpose then is to predict the IBM asset price in the testing subset using the regression operator. Three MAR rates
(complete data),
and
were considered to test the performance of the estimator in terms of prediction. As in the simulation section, we consider the quadratic kernel, and the bandwidth is selected using the cross-validation method based on the κ-nearest neighbours. For the semi-metric, because the curves are not smooth (as can be seen in Figure , right panel), we use the PCA semi-metric, say
, based on the projection on the four eigenfunctions,
, associated with the four largest eigenvalues of the empirical covariance operator of the functional predictor X:
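A sketch of the PCA semi-metric on discretised curves: distances are computed between projections on the q = 4 leading eigenvectors of the empirical covariance matrix, the discrete counterpart of the covariance operator's eigenfunctions.

```python
import numpy as np

def pca_semimetric(X, q=4):
    """PCA-based semi-metric: distance between projections of the curves on
    the q leading eigenvectors of the empirical covariance matrix."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / X.shape[0]
    _, vecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    basis = vecs[:, -q:]                 # q largest-eigenvalue directions
    scores = X @ basis                   # projection scores of each curve
    def dist(i, j):
        return float(np.sqrt(np.sum((scores[i] - scores[j]) ** 2)))
    return dist

rng = np.random.default_rng(5)
X = rng.normal(size=(50, 100))           # 50 discretised curves
d = pca_semimetric(X, q=4)
```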
As a criterion to measure the accuracy of the estimator in predicting the 30 observations in the testing subset, we consider the absolute error:
for
. Figure displays the distribution of the absolute errors obtained when MAR rate is
and
, respectively. One can clearly observe the effect of the MAR rate on the quality of the prediction: the higher the MAR rate, the lower the prediction quality.
Figure 5. Left: First-order differentiated IBM asset price. Right: First-order differentiated SP500 intraday (minute frequency) stock market index curves.
![Figure 5. Left: First-order differentiated IBM asset price. Right: First-order differentiated SP500 intraday (minute frequency) stock market index curves.](/cms/asset/03f03fcd-9a15-4c94-a31d-0949d1487731/gnst_a_2332686_f0005_ob.jpg)
Moreover, we build a 95% prediction interval for the IBM asset price in the testing subset. Figure shows that the coverage rate is sensitive to the percentage of MAR data in the training subset.
5.2. Application 2: Daily peak electricity demand imputation
By accurately predicting household peak load, utility companies can better balance overall electricity demand and supply. This information helps optimise power generation and distribution, ensuring a stable and reliable electricity grid. It also makes it possible to plan for peak demand periods and to avoid potential blackouts or overloading of the grid. Moreover, predicting peak loads empowers consumers with information about their electricity consumption patterns. With this knowledge, households can make informed decisions to manage their energy usage more effectively, reduce electricity bills and contribute to energy conservation efforts. Furthermore, peak load predictions enable demand response programmes, where utility companies offer incentives for consumers to adjust their energy consumption during peak periods, thereby reducing strain on the grid.
For these reasons, electricity companies have deployed smart meters to replace the mechanical ones. This new generation of smart meters can record the electricity demand of any household at a very fine time scale and send it to the information system. The transmission of the information from the smart meter to the information system usually goes through WiFi or optical fibre networks, which depend significantly on weather conditions, among several other factors. Therefore, the calculation of the daily peak electricity demand might be subject to a missing at random mechanism due to bad weather conditions.
Figure 10. Daily peak electricity demand process. Red dots represent values of imputed missing data.
![Figure 10. Daily peak electricity demand process. Red dots represent values of imputed missing data.](/cms/asset/fb432e16-eea1-4ecf-ad74-5ecd45b481ec/gnst_a_2332686_f0010_oc.jpg)
Figure displays the daily peak load obtained from a household smart meter from 24 September 1996 to 29 June 1999 (leading to a total of n = 1009 days). The original data contains 10% of missing observations. Here, we assume that the intraday (3-hour frequency) temperature curve
explains the missingness mechanism in the daily peak demand. Figure displays the intraday, 3-hour frequency, temperature curves. Our purpose in this application is to impute the missing data in the peak demand process using the initial estimator of the regression operator
defined in (17) with
Figure displays the imputed peak electricity demand process obtained according to the following formula:
If
is observed (that is
), then
, otherwise
is missing (i.e.
) and will be imputed by
The red dots in Figure represent the imputed values of the missing observations in the peak electricity demand process. Figure shows the 95% confidence intervals around the missing values of the peak load.
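The imputation rule reads, in one line: keep Y_t when δ_t = 1 and plug in the regression estimate otherwise. The estimates in this sketch are hypothetical values.

```python
import numpy as np

def impute(y, delta, y_hat):
    """Keep Y_t when observed (delta_t = 1); otherwise plug in the kernel
    regression estimate r_hat(X_t)."""
    return np.where(delta == 1, y, y_hat)

y     = np.array([3.0, np.nan, 5.0, np.nan])   # nan marks missing responses
delta = np.array([1, 0, 1, 0])
y_hat = np.array([2.9, 4.1, 5.2, 4.8])         # hypothetical kernel estimates
completed = impute(y, delta, y_hat)
```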
6. Discussion of a special case: conditional quantiles
Let be fixed and
, then if
the operator
is the conditional cumulative distribution function (df) of Y given X = x, namely
which may be estimated by
For a given
the
-order conditional quantile of the distribution of Y given X = x is defined as
Notice that, whenever is strictly increasing and continuous in a neighbourhood of
, the function
has a unique quantile of order α at a point
, that is
In such case
which may be estimated uniquely by
. Conditional quantiles have been widely studied in the literature when the predictor X is finite-dimensional; see, for instance, Gannoun et al. (Citation2003), and Ferraty et al. (Citation2005) for dependent functional data.
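A sketch of the plug-in conditional quantile described above: take ψ(Y, y) = 1{Y ≤ y} in the kernel estimator to obtain the conditional cdf, then invert it at level α. The toy check uses scalar predictors with the absolute-value semi-metric; the paper's exact estimator is not reproduced here.

```python
import numpy as np

def cond_quantile_mar(x0, X, Y, delta, h, alpha, dist):
    """Estimate F(y | x) by taking psi(Y, y) = 1{Y <= y} in the kernel
    estimator (observed responses only, quadratic kernel), then return
    the smallest y with F_hat(y | x) >= alpha."""
    d = np.array([dist(x0, xi) for xi in X])
    u = d / h
    w = delta * np.where((u >= 0) & (u <= 1), 1.5 * (1.0 - u ** 2), 0.0)
    ys = np.sort(Y[w > 0])
    cdf = np.array([np.dot(w, Y <= y) / w.sum() for y in ys])
    return ys[np.searchsorted(cdf, alpha)]

# toy check: under the model below, the conditional median of Y
# given X = 0.5 is 0.5
rng = np.random.default_rng(6)
X = rng.uniform(0.0, 1.0, 2000)
Y = X + 0.1 * rng.normal(size=2000)
delta = rng.binomial(1, 0.9, size=2000)
q = cond_quantile_mar(0.5, X, Y, delta, h=0.05, alpha=0.5,
                      dist=lambda a, b: abs(a - b))
```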
(a) Almost sure pointwise and uniform convergence
Under the same conditions as Theorem 3.1, the statement (9) still holds for the estimator of the conditional cumulative distribution function
. That is
converges, almost surely, towards
with a rate
Consequently, since and
is continuous and strictly increasing, then we have
which implies that,
(23)
Therefore, the statement (9) still holds for the conditional quantile estimator
whenever the conditions of Theorem 3.1 are satisfied. Ferraty et al. (Citation2005) derived a similar pointwise convergence rate by inverting the estimator of the conditional cumulative distribution function. Their result was obtained under a mixing condition, additional assumptions on the joint distribution, and a Lipschitz condition on
and its derivatives with respect to y.
Regarding the almost sure uniform convergence, observe that under the conditions of Theorem 3.2, the statement (11) still holds true for the
, when
is replaced by
. Moreover, assume that, for fixed
,
is differentiable at
with
, where ν is a real number, and
is uniformly continuous for all
. Knowing that
and making use of a Taylor expansion of the function
around
, we can write
(24)
where
lies between
and
. It then follows from (24) that the inequality (23) still holds true uniformly in x and y. Moreover, the fact that
converges a.s. towards
as T goes to infinity, combined with the uniform continuity of
, allows us to write that
(25)
Since
is uniformly bounded from below, we can then claim that the estimator
converges uniformly towards
with the same convergence rate given in (11), as T goes to infinity.
(b) Continuous-time confidence intervals
Confidence intervals for the conditional quantiles may be obtained according to the following steps. First, considering a Taylor expansion of
around
and making use of the fact that
converges a.s. towards
as T goes to infinity, one gets
(26)
where
is a consistent estimator of
. Then, replacing
by the indicator function, we get, under the conditions of Corollary 3.7, the following
confidence intervals for
(27)
Supplemental Material
Acknowledgments
Open Access funding provided by the Qatar National Library.
We thank the editor, associate editor and the two referees for their valuable and constructive comments which helped improve the manuscript substantially.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Notes
1 For any such that
, there exists a non-negative continuous random function
such that
where
is a deterministic function.
References
- Andrews, D.W.K. (1984), ‘Non-strong Mixing Autoregressive Processes’, Journal of Applied Probability, 21, 930–934.
- Bosq, D. (1998), Nonparametric Statistics for Stochastic Processes: Estimation and Prediction, Lecture Notes in Statistics, Vol. 110 (2nd ed.), New York: Springer-Verlag.
- Bouzebda, S., and Didi, S. (2017), ‘Asymptotic Results in Additive Regression Model for Strictly and Ergodic Continuous Times Processes’, Communications in Statistics-Theory and Methods, 46(5), 2454–2493.
- Chaouch, M., and Laïb, N. (2019), ‘Optimal Asymptotic MSE of Kernel Regression Estimate for Continuous Time Processes with Missing At Random Response’, Statistics and Probability Letters, 154, 108532.
- Chaouch, M., and Laïb, N. (2023), ‘Supplement to “Regression estimation for continuous time functional data processes with missing at random response”’.
- Cheng, P.E. (1994), ‘Nonparametric Estimation of Mean Functionals with Data Missing at Random’, Journal of the American Statistical Association, 89, 81–87.
- Cheridito, P., Kawaguchi, H., and Maejima, M. (2003), ‘Fractional Ornstein–Uhlenbeck Processes’, Electronic Journal of Probability, 8(3), 1–14.
- Chesneau, C., and Maillot, B. (2014), ‘Superoptimal Rate of Convergence in Nonparametric Estimation for Functional Valued Processes’, International Scholarly Research Notices, 2014, 264217.
- Chu, C.K., and Cheng, P.E. (2003), ‘Nonparametric Regression Estimation with Missing Data’, Journal of Statistical Planning and Inference, 48, 85–99.
- de la Peña, V.H., and Giné, E. (1999), Decoupling: From Dependence to Independence, Probability and Its Applications, New York: Springer-Verlag.
- Delsol, L. (2009), ‘Advances on Asymptotic Normality in Non-parametric Functional Time Series Analysis’, Statistics, 43, 13–33.
- Didi, S., and Louani, D. (2014), ‘Asymptotic Results for the Regression Function Estimate on Continuous Time Stationary Ergodic Data’, Journal Statistics & Risk Modeling, 31(2), 129–150.
- Doukhan, P. (2018), Stochastic Models for Time Series, New York: Springer.
- Doukhan, P., and Louhichi, S. (1999), ‘A New Weak Dependence Condition and Applications to Moment Inequalities’, Stochastic Processes and Their Applications, 84, 313–342.
- Efromovich, S. (2011), ‘Nonparametric Regression with Responses Missing at Random’, Journal of Statistical Planning and Inference, 141, 3744–3752.
- Ferraty, F., Laksaci, A., Tadj, A., and Vieu, P. (2010), ‘Rate of Uniform Consistency for Nonparametric Estimates with Functional Variables’, Journal of Statistical Planning and Inference, 140, 335–352.
- Ferraty, F., Mas, A., and Vieu, P. (2007), ‘Nonparametric Regression on Functional Data: Inference and Practical Aspects’, Australian & New Zealand Journal of Statistics, 49(3), 267–286.
- Ferraty, F., Rabhi, A., and Vieu, P. (2005), ‘Special Issue on Quantile Regression and Related Methods’, Sankhyà : The Indian Journal of Statistics, 67(2), 378–398.
- Ferraty, F., Sued, M., and Vieu, P. (2013), ‘Mean Estimation with Data Missing at Random for Functional Covariables’, Statistics, 47(4), 688–706.
- Ferraty, F., and Vieu, P. (2006), Nonparametric Modelling for Functional Data, Methods, Theory, Applications and Implementations, London: Springer-Verlag.
- Francq, C., and Zakoïan, J.M. (2010), GARCH Models: Structure, Statistical Inference and Financial Applications, John Wiley and Sons Ltd.
- Gannoun, A., Saracco, J., and Yu, K. (2003), ‘Nonparametric Prediction by Conditional Median and Quantiles’, Journal of Statistical Planning and Inference, 117, 207–223.
- Giraitis, L., and Leipus, R. (1995), ‘A Generalized Fractionally Differencing Approach in Long-memory Modeling’, Lithuanian Mathematical Journal, 35(1), 53–65.
- González-Manteiga, W., and Pérez-González, A. (2004), ‘Nonparametric Mean Estimation with Missing Data’, Communications in Statistics-Theory and Methods, 33(2), 277–303.
- Guégan, D., and Ladoucette, S. (2001), ‘Non-mixing Properties of Long Memory Processes’, Comptes Rendus De L'Académie Des Sciences – Series I – Mathematics, 333(1), 373–376.
- Hall, P., and Heyde, C. (1980), Martingale Limit Theory and Its Application, New York: Academic Press.
- Laïb, N., and Louani, D. (2010), ‘Nonparametric Kernel Regression Estimation for Functional Stationary Ergodic Data: Asymptotic Properties’, Journal of Multivariate Analysis, 101(10), 2266–2281.
- Laïb, N., and Louani, D. (2011), ‘Rates of Strong Consistencies of the Regression Function Estimator for Functional Stationary Ergodic Data’, Journal of Statistical Planning and Inference, 141(1), 359–372.
- Liang, H., Wang, S., and Carroll, R.J. (2007), ‘Partially Linear Models with Missing Response Variables and Error-prone Covariates’, Biometrika, 94(1), 185–198.
- Ling, N., Liang, L., and Vieu, P. (2015), ‘Nonparametric Regression Estimation for Functional Stationary Ergodic Data with Missing At Random’, Journal of Statistical Planning and Inference, 162, 75–87.
- Little, R.J.A., and Rubin, D.B. (2002), Statistical Analysis with Missing Data (2nd ed.), New York: John Wiley.
- Maillot, B. (2008), ‘Propriétés Asymptotiques de Quelques Estimateurs Non-paramétriques Pour des Variables Vectorielles et Fonctionnelles’, Thèse de Doctorat de l'Université Paris 6.
- Maslowski, B., and Pospíšil, P. (2008), ‘Ergodicity and Parameter Estimates for Infinite-dimensional Fractional Ornstein–Uhlenbeck Process’, Applied Mathematics and Optimization, 57, 401–429.
- Nittner, T. (2003), ‘Missing At Random (MAR) in Nonparametric Regression, a Simulation Experiment’, Statistical Methods and Applications, 12, 195–210.
- Sikov, A. (2018), ‘A Brief Review of Approaches to Non-ignorable Non-response’, International Statistical Review, 86, 415–441.
- Tsiatis, A. (2006), Semiparametric Theory and Missing Data, New York: Springer.
Appendix. Proofs of main results
In this section, and for the sake of simplification, will denote and
for
and
, respectively, and
for
. Consider now the following quantities:
(A1)
(A2)
We have then
(A3)
We start first by stating some technical lemmas that will be used later.
Lemma A.1
Assume that assumptions (A1)–(A2) are satisfied, then we have for any and
.
Proof.
The proof is similar to the proof of Lemma 1 of Laïb and Louani (Citation2010).
Lemma A.2
Let be a sequence of real martingale differences with respect to the sequence of σ-fields
where
is the sigma-field generated by the random variables
. Set
For any
and any
, assume that there exist some nonnegative constants C and
such that
almost surely. Then, for any
we have
where
Proof.
See Theorem 8.2.2 of de la Peña and Giné (Citation1999).
Proof
Proof of Theorem 3.4
From (A3) and Lemma 1.2 (in Chaouch and Laïb Citation2023), we have, for T large enough,
(A4)
where the products
and
have been ignored because, by the Cauchy–Schwarz inequality,
We have the same inequality for the second product. The proof of Theorem 3.4 results from Proposition 3.3 and Lemma A.3 below, which gives an upper bound of the expectation of
and
, respectively.
Lemma A.3
Assume that (A1)–(A3) hold true, then we have
(A5)
Proof.
Ignoring the product term as above, one may write
The terms
and
can be handled similarly. Let us now evaluate the first one. Since
is a δ-partition of
, we have
(A6)
Since
is a sequence of martingale differences with respect to the family
, then
for every
such that
Therefore (by ignoring the product term), we have
(A7)
Using Jensen's inequality and a double conditioning with respect to
combined with (A3)(iii)–(iv),
may be bounded as
Similarly, we have
Therefore
Moreover, using the decomposition (A2), Theorem 3.3 and Lemma 1.1 (in Chaouch and Laïb Citation2023), one can see that
is negligible with respect to
This completes the proof.
Proof
Proof of Theorem 3.5
The proof of Theorem 3.5 is based essentially on Lemma A.4, established below, which gives the asymptotic normality of the principal term in (A3). Indeed, we have from (A3) that
(A8)
Under (A1)–(A3), Lemma 1.2 (in Chaouch and Laïb Citation2023) implies that
converges, almost surely, to
as
Moreover, using Lemma 1.3 (in Chaouch and Laïb Citation2023), we get under (A3)(i)–(ii) combined with conditions (12) that
and
The proof is then achieved by Lemma A.4 and Slutsky's theorem.
Lemma A.4
Under conditions (A1)–(A3), we have
Proof
Proof of Lemma A.4
We have
(A9)
Since for any
and
,
, then
is
-measurable,
provided
and
. Moreover, we have for any
,
a.s. Hence
is a sequence of martingale differences with respect to the σ-fields
. To prove the asymptotic normality, it suffices to check the following two conditions (see Corollary 3.1, p. 56, Hall and Heyde Citation1980):
(a) and (b)
holds for any
.
Proof of (a) Observe now that
Using (A1), (A3)(i),(i
), (ii) and (iv) with Lemma A.1, and a double conditioning with respect to the σ-field
and the fact that
, we have
It follows by (A2)-(iii) and the Cauchy–Schwarz inequality that
Thus we have only to show that
. Using again the Cauchy–Schwarz inequality, one may write
(A10)
Now, let us evaluate the term
. Conditioning three times with respect to
and
, and making use of Conditions (A3)(i
), (iii), (iv) and the fact that
, we get from Lemma A.1 that
(A11)
The Riemann sum combined with condition (A2)(iii) gives that
Moreover, by (A2)(ii) one gets
. Therefore, we have
(A12)
On the other hand, by the same arguments as above combined with the fact that
, we get
.
Proof of part (b) Using successively Hölder, Markov, Jensen and Minkowski inequalities combined with conditions (A3)(iii), (A3)(iv) and Lemma A.1, we get, for any and fixed real numbers p>1 and q>1 such that
,
by taking
(
), since
tends to infinity as T goes to infinity.
Proof
Proof of Corollary 3.7
Observe that
(A13)
It follows from the consistency of
and (A2)(i) that
goes to 1 a.s. as T goes to infinity. By Theorem 3.5, the quantity
converges to
as
. Then using the non-decreasing property of the cumulative standard normal distribution function Ψ, we get, for a given risk
, the
- pseudo-confidence interval
(A14)
Considering now the statement (13) combined with Proposition 3.6, it holds that
(A15)
since
is a consistent estimator of
. The proof then follows from the statements (A13), (A14) and (A15).