Research Article

On estimation of covariance function for functional data with detection limits

Received 19 Jul 2022, Accepted 07 Sep 2023, Published online: 19 Sep 2023

Abstract

In many studies on disease progression, biomarker measurements are restricted by detection limits and are hence informatively missing. Current approaches ignore this problem and simply fill in the value of the detection limit for the missing observations when estimating the mean and covariance function, which yields inaccurate estimates. Inspired by our recent work [Liu and Houwing-Duistermaat (2022), 'Fast Estimators for the Mean Function for Functional Data with Detection Limits', Stat, 11(1), e467], in which novel estimators for the mean function of data subject to a detection limit were proposed, in this paper we propose a novel estimator for the covariance function for sparse and dense data subject to a detection limit. We derive the asymptotic properties of the estimator. We compare our method, via simulations, to the standard method, which ignores the detection limit. We illustrate the new approach by analysing biomarker data subject to a detection limit. In contrast to the standard method, our method provides more accurate estimates of the covariance. Moreover, its computation time is small.


1. Introduction

Technological advances have resulted in a growing number of datasets containing temporal observations, either dense or sparse. For the analysis of such data, functional data analysis (FDA) methods have been developed, see, for example, Ramsay and Silverman (Citation2005), Ferraty and Vieu (Citation2006), Horváth and Kokoszka (Citation2012) and Kokoszka and Reimherr (Citation2017) for dense data and Yao, Müller, and Wang (Citation2005), Peng and Paul (Citation2009), Li and Hsing (Citation2010), Wang, Chiou, and Müller (Citation2016) and Zhang and Wang (Citation2016) for sparse data. These methods assume that there are no missing data. However, in practice, we may have data subject to detection limits. Recently, several methods for the estimation of the mean function have been proposed and investigated for data subject to detection limits, namely the global method by Shi, Dong, Wang, and Cao (Citation2021) and several local methods by Liu and Houwing-Duistermaat (Citation2022). However, an estimator for the covariance has not yet been developed, which is the topic of this paper.

When levels of a specific marker in a sample have to be determined in a laboratory, we often deal with detection limits. The amount of the marker might be too low to be detected. This results in too many zeros in the dataset, and an observed 'zero' might be a true zero or just a very small value. Detection limits might also occur at the other extreme of the distribution, since measurement techniques are often optimised for a certain range of values, and values outside this range cannot be accurately measured. Detection limits are not restricted to laboratory measurements. Devices which measure certain characteristics (the number of steps, for example) might run out of charge, yielding an underestimation of the characteristic per day (e.g. the true number of steps for a day is higher than the measured number if the device was out of charge). For simplicity, in this paper we only consider detection limits at the lower extreme of the distribution, i.e. we do not observe values lower than a specific value; instead we observe this specific value, which is called the detection limit (DL).

For observations subject to a DL, Shi et al. (Citation2021) proposed a global method. However, since observations close to the target point t contain more information about the mean function at t than observations far away from t, local methods as proposed by Liu and Houwing-Duistermaat (Citation2022) might be more appropriate. The estimator of Liu and Houwing-Duistermaat (Citation2022) is based on approximations of the likelihood function. For data subject to a DL, the likelihood function is a product of probability density functions for the observed values and of probability distribution functions for the observations subject to the DL, since for the latter observations we know that the unobserved value is below a known threshold. To estimate the mean function around observed time points, the authors proposed to use the local polynomial kernel method (Fan and Gijbels Citation1995, Citation2018; Beran and Liu Citation2014, Citation2016). Further, two weighting schemes for subjects have been considered (Zhang and Wang Citation2016; Liu and Houwing-Duistermaat Citation2022), namely the SUBJ scheme, which assigns the same weight to each subject, and the OBS scheme, which assigns the same weight to each observation. The latter scheme assigns more weight to subjects with more observations. To reduce the computation time, Liu and Houwing-Duistermaat (Citation2022) proposed linear and constant approximations for the probability distribution functions in the likelihood function. The constant approximation is computationally fast, especially for dense data, while it performs only slightly worse than the exact and linear approximation methods. Note that the global method of Shi et al. (Citation2021) is even more computationally inefficient than the linear approximation for dense data. Therefore, in this paper we use constant approximations to obtain an estimator for the covariance function.

We propose a local constant estimator based on this approximation and derive its asymptotic behaviour. Via simulations, we evaluate its performance in a sparse and a dense setting under both the SUBJ and OBS weighting schemes, and compare it with the standard method where the detection limit is filled in for the missing values. We also investigate the asymptotic behaviour of the estimator via simulations. To illustrate the proposed method, we apply it to temporal data from a biomarker study. We finish with a conclusion.

2. Methodology

2.1. Functional principal component analysis (FPCA)

We first define the model for functional data subject to a detection limit. Let $\{X(t): t \in I\}$ be an $L^2$ stochastic process on an interval $I$. Let $\mu(t) = E[X(t)]$ and $C(s,t) = E[(X(s)-\mu(s))(X(t)-\mu(t))]$ be the mean and covariance function of $X(t)$, respectively. Then $X(t)$ can be decomposed as
$$X(t) = \mu(t) + U(t),$$
where $U(t)$ is the stochastic part of $X(t)$, which has mean zero, i.e. $E[U(t)] = 0$ for $t \in I$, and covariance $C(s,t) = E[U(s)U(t)]$ for all $s,t \in I$. By the Karhunen–Loève expansion and Mercer's theorem, we have
$$C(s,t) = \sum_{l=1}^{\infty} \lambda_l \psi_l(s)\psi_l(t) \quad \text{and} \quad U(t) = \sum_{l=1}^{\infty} \xi_l \psi_l(t),$$
where the $\psi_l(t)$ are eigenfunctions of the covariance operator corresponding to $C(s,t)$, the positive real numbers $\lambda_1 > \lambda_2 > \cdots$ are the corresponding eigenvalues, and $\mathrm{var}(\xi_l) = \lambda_l$. Note that the functional principal components $\{\psi_l(t)\}$ (FPCs) form an orthonormal basis of $L^2(I)$.

Let $X_1(t),\ldots,X_n(t)$ be $n$ iid copies of $X(t)$ with $t \in I$. We have observations of $X_1(t),\ldots,X_n(t)$ at discrete time points $t_{i1},\ldots,t_{iN_i}$, perturbed by an independent random error. Here, $N_i$ is the number of measurements for subject $i$. Specifically, let $Y_{ij}$ denote the random variable for the $j$th time point of subject $i$, with $j=1,\ldots,N_i$ and $i=1,\ldots,n$. We model $Y_{ij}$ as follows:
$$Y_{ij} = X_i(t_{ij}) + \epsilon_{ij} = \mu(t_{ij}) + U_i(t_{ij}) + \epsilon_{ij} = \mu(t_{ij}) + \sum_{l=1}^{\infty} \xi_{il}\psi_l(t_{ij}) + \epsilon_{ij}, \tag{1}$$
where $\epsilon_{ij}$ is an independent random measurement error term following a distribution in the exponential family with mean zero and variance $\sigma^2$; that is, the $\epsilon_{ij}$ are independent for all $i$ and $j$. We further assume that $\epsilon_{ij}$ is independent of $U_i(t)$ (or, equivalently, of $\xi_{il}$). Often a Gaussian distribution is assumed, i.e. $\epsilon_{ij} \sim N(0,\sigma^2)$ and $\xi_{il} \sim N(0,\lambda_l)$.

Now, not all $Y_{ij}$ are observed due to the presence of a DL. Let $\delta_{ij}$ be the missingness indicator, i.e. $\delta_{ij}=0$ if $Y_{ij}$ is observed, and $\delta_{ij}=1$ if $Y_{ij}$ is unobserved. If $\delta_{ij}=1$, we assume that the unobserved $Y_{ij}$ has a value smaller than (or equal to) a specific threshold $c_{ij}$. For simplicity of notation, we assume the threshold is fixed, i.e. $c_{ij}=c$ for all $i,j$. Therefore, the observations are
$$\{(t_{ij}, y_{ij}, \delta_{ij})\}, \quad i=1,\ldots,n, \; j=1,\ldots,N_i,$$
where $y_{ij}$ is missing if $\delta_{ij}=1$.
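To make the data structure concrete, the following sketch (ours, not the authors' code; values and seed are hypothetical) generates one subject's raw observations and applies the censoring rule: values at or below the fixed threshold $c$ are flagged by $\delta_{ij}=1$ and recorded as the DL itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical raw values y_ij for one subject at sorted time points t_ij
t = np.sort(rng.uniform(0.0, 1.0, size=8))
y = np.sin(2 * np.pi * t) + rng.normal(0.0, 0.5, size=8)

c = 0.0  # fixed detection limit: c_ij = c for all i, j

# delta_ij = 1 marks a value below the DL; the recorded value is then c itself
delta = (y <= c).astype(int)
y_obs = np.where(delta == 1, c, y)
```

The triplets `(t, y_obs, delta)` then correspond to the observations $(t_{ij}, y_{ij}, \delta_{ij})$ above.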

The consequence of the presence of a DL is that in the likelihood function the contributions of the observations subject to the DL are represented by the probability distribution function instead of the density function. As a consequence the likelihood function is hard to maximise. Liu and Houwing-Duistermaat (Citation2022) proposed to locally approximate the probability distributions by a linear function or by a constant resulting in time efficient estimators for the mean function. They showed via simulations that the local-linear estimator performed only slightly better than the local-constant estimator, but was less time efficient. Therefore, in this paper we will only consider the local-constant estimator.

2.2. Locally Kernel weighted log-likelihood estimator for the mean function

In this section, we briefly summarise the estimation procedure for the mean function developed by Liu and Houwing-Duistermaat (Citation2022). For this section, without loss of generality, we assume that $U(t)=0$ in formula (1), i.e. we ignore the covariance between $X(t)$ at different time points. The loglikelihood function, approximated locally by a constant, is as follows (see Liu and Houwing-Duistermaat Citation2022):
$$L(\beta; h, t) = \sum_{i=1}^{n} w_i \sum_{j=1}^{N_i}\left[-0.251\,\delta_{ij}\left(\frac{c-\beta_0}{\sigma}\right)^{2} + 0.8194\,\delta_{ij}\,\frac{c-\beta_0}{\sigma} - (0.5-0.5\,\delta_{ij})\left(\frac{y_{ij}-\beta_0}{\sigma}\right)^{2}\right] K_h(t_{ij}-t), \tag{2}$$
where $K_h(\cdot) = \frac{1}{h}K(\frac{\cdot}{h})$, $K(\cdot)$ is a kernel function (see details in Assumption (A1)) and the $w_i$ are weights. Two types of weights $w_i$ are considered, namely
$$w_i^{SUBJ} = \frac{1}{nN_i} \quad \text{and} \quad w_i^{OBS} = \frac{1}{\sum_{i=1}^{n} N_i}.$$
Using loglikelihood function (2), Liu and Houwing-Duistermaat (Citation2022) obtained the following local constant estimator of the mean function:
$$\hat{\mu}_{LC}(t) = \hat{\beta}_0 = \frac{R_0}{S_0}, \tag{3}$$
where
$$S_0 = \sum_{i=1}^{n} w_i \sum_{j=1}^{N_i} (1 - 0.498\,\delta_{ij})\, K_h(t_{ij}-t)$$
and
$$R_0 = \sum_{i=1}^{n} w_i \sum_{j=1}^{N_i} \left[-0.8194\,\delta_{ij}\,\sigma + 0.502\,\delta_{ij}\,c + (1-\delta_{ij})\,y_{ij}\right] K_h(t_{ij}-t).$$
Note that Liu and Houwing-Duistermaat (Citation2022) only derived the asymptotic distribution of the local linear estimator (see Theorem 2.1 of their paper). The asymptotic distribution of $\hat{\mu}_{LC}(t)$ given in (3) can be obtained in a similar way. In this paper, we derive estimators for the covariance function $C(s,t)$ using similar ideas.
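As a sketch (ours, not the authors' implementation), formula (3) can be computed directly. We assume a Gaussian kernel, and we take the $0.8194\,\delta_{ij}\sigma$ term in $R_0$ with a negative sign, an assumption chosen so that setting the derivative of (2) to zero yields $\hat\beta_0 = R_0/S_0$ and censored observations pull the estimate below the DL.

```python
import numpy as np

def gauss_kernel(u):
    # Gaussian kernel; the paper only requires a symmetric density (A1)
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def mu_lc(t0, times, ys, deltas, weights, c, sigma, h):
    """Local constant estimator mu_hat_LC(t0) = R0 / S0 (formula (3)).

    times, ys, deltas are lists of per-subject arrays; weights are the w_i.
    """
    R0 = S0 = 0.0
    for t_i, y_i, d_i, w_i in zip(times, ys, deltas, weights):
        K = gauss_kernel((t_i - t0) / h) / h          # K_h(t_ij - t0)
        S0 += w_i * np.sum((1 - 0.498 * d_i) * K)
        R0 += w_i * np.sum((-0.8194 * d_i * sigma + 0.502 * d_i * c
                            + (1 - d_i) * y_i) * K)
    return R0 / S0
```

When no observation is censored ($\delta_{ij}=0$ for all $i,j$), this reduces to the usual Nadaraya–Watson smoother.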

2.3. Local Kernel weighted estimation of covariance function

In this section, we assume $\mu(t)=0$ and $\epsilon \sim N(0,\sigma^2)$ (for simplicity of notation). We propose the following fast local constant kernel weighted estimator of $C(s,t)$:
$$\hat{C}(s,t) = \frac{R_{00}}{S_{00}}, \tag{4}$$
where
$$R_{00} = \sum_{i=1}^{n} v_i \sum_{1 \le j \ne l \le N_i} K_h(t_{ij}-s)\, K_h(t_{il}-t)\, C_{ijl}$$
and
$$S_{00} = \sum_{i=1}^{n} v_i \sum_{1 \le j \ne l \le N_i} K_h(t_{ij}-s)\, K_h(t_{il}-t)\, D_{ijl},$$
with
$$C_{ijl} = \left[-0.8194\,\delta_{ij}\,\sigma + 0.502\,\delta_{ij}\,c + (1-\delta_{ij})\,y_{ij}\right]\left[-0.8194\,\delta_{il}\,\sigma + 0.502\,\delta_{il}\,c + (1-\delta_{il})\,y_{il}\right],$$
$$D_{ijl} = (1 - 0.498\,\delta_{ij})(1 - 0.498\,\delta_{il}),$$
and $v_i$ given by
$$v_i^{OBS} = \frac{1}{\sum_{i=1}^{n} N_i(N_i-1)} \quad \text{or} \quad v_i^{SUBJ} = \frac{1}{n N_i(N_i-1)}.$$
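A direct implementation of formula (4) can be sketched as follows (ours, under two assumptions since the extraction drops signs and sums: $v_i^{OBS} = 1/\sum_i N_i(N_i-1)$, analogous to $w_i^{OBS}$, and the $0.8194\,\delta\sigma$ term taken with a negative sign). The double sum over $j \ne l$ is vectorised as a product of single sums minus the diagonal terms.

```python
import numpy as np

def gauss_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def cov_lc(s, t, times, ys, deltas, c, sigma, h, scheme="OBS"):
    """Local constant covariance estimator C_hat(s,t) = R00 / S00 (formula (4)).

    Assumes mu(t) = 0, i.e. the ys are already centred (Remark 2.2).
    """
    n = len(times)
    total = sum(len(ti) * (len(ti) - 1) for ti in times)
    R00 = S00 = 0.0
    for t_i, y_i, d_i in zip(times, ys, deltas):
        Ni = len(t_i)
        v_i = 1.0 / total if scheme == "OBS" else 1.0 / (n * Ni * (Ni - 1))
        # a_ij: the observed value, or the DL-based contribution if censored
        a = -0.8194 * d_i * sigma + 0.502 * d_i * c + (1 - d_i) * y_i
        b = 1 - 0.498 * d_i
        Ks = gauss_kernel((t_i - s) / h) / h   # K_h(t_ij - s)
        Kt = gauss_kernel((t_i - t) / h) / h   # K_h(t_il - t)
        # sum over j != l written as (sum_j)(sum_l) minus the diagonal j = l
        R00 += v_i * ((Ks * a).sum() * (Kt * a).sum() - (Ks * Kt * a * a).sum())
        S00 += v_i * ((Ks * b).sum() * (Kt * b).sum() - (Ks * Kt * b * b).sum())
    return R00 / S00
```

Factoring the off-diagonal double sum this way costs $O(N_i)$ per subject instead of $O(N_i^2)$, which is what makes the estimator fast in the dense setting.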

Remark 2.1

Note that, without a DL, the local constant smoother (or Nadaraya–Watson estimator) for the mean function is
$$\hat{\mu}(t) = \hat{\beta}_0 = \arg\min_{\beta_0} \sum_{i=1}^{n} w_i \sum_{j=1}^{N_i} (y_{ij}-\beta_0)^2 K_h(t_{ij}-t) = \frac{R_0}{S_0},$$
with $\delta_{ij}=0$ for all $i$ and $j$ in $R_0$ and $S_0$ (in formula (3)), and the local constant smoother for the covariance is
$$\hat{C}(s,t) = \hat{\beta}_0 = \arg\min_{\beta_0} \sum_{i=1}^{n} v_i \sum_{1 \le j \ne l \le N_i} (C_{ijl}-\beta_0)^2 K_h(t_{ij}-s) K_h(t_{il}-t) = \frac{R_{00}}{S_{00}},$$
with $\delta_{ij}=0$ for all $i$ and $j$ in $R_{00}$ and $S_{00}$ (in formula (4)).

Remark 2.2

If $\mu(t)$ is unknown, it can be estimated by formula (3). We can then replace $y_{ij}$ by $y_{ij} - \hat{\mu}(t_{ij})$ in formula (4) to obtain the estimate of $C(s,t)$.

The observations $y_{ij}$ are observed values of a perturbed underlying continuous function $X(t)$. For the $\delta_{ij}$, we define indicator functions $\{\delta_i(t), i=1,\ldots,n\}$ on the interval $I$ with range $\{0,1\}$ and $\delta_i(t_{ij}) = \delta_{ij}$. For the covariance estimator defined in Equation (4), Theorem 2.1 holds.

Theorem 2.1

Under Assumptions B.1–B.3 given in the Appendix, for a fixed interior point $(s,t) \in I \times I$,
$$(\Gamma_{n,N_i})^{-1/2}\left[\hat{C}(s,t) - C(s,t) - B(s,t)\sigma^2 - \frac{h^2}{2}\sigma_K^2 D(s,t) + o(h^2)\right] \to N(0,1),$$
where
$$\Gamma_{n,N_i} = \frac{1}{\left(\sum_i N_i(N_i-1)\, w_{i2}(s,t) f(s)f(t)\right)^{2}} \times \sum_i \Big( \left(w_{i1}(s,t) - B(s,t)\, w_{i2}(s,t)\right)^{2} \sigma^{4}\, V_i^{I}(s,t) + w_{i4}^{2}(s,t)\, \sigma^{2}\, V_i^{II}(s,t) + 2\, w_{i4}(s,t)\, w_{i4}(t,s)\, \sigma^{2}\, V_i^{III}(s,t) + 2\, w_{i2}(s,t)\, w_{i4}(s,t)\, \sigma\, V_i^{IV}(s,t) + w_{i4}^{2}(t,s)\, \sigma^{2}\, V_i^{II}(t,s) + 2\, w_{i2}(t,s)\, w_{i4}(t,s)\, \sigma\, V_i^{IV}(t,s) + w_{i2}^{2}(s,t)\, V_i^{V}(s,t) \Big),$$
$$B(s,t) = \frac{\sum_i N_i(N_i-1)\, w_{i1}(s,t)}{\sum_i N_i(N_i-1)\, w_{i2}(s,t)},$$
$$D(s,t) = 2\,\frac{\partial C(s,t)}{\partial s}\frac{f'(s)}{f(s)} + 2\,\frac{\partial C(s,t)}{\partial t}\frac{f'(t)}{f(t)} + \frac{\partial^2 C(s,t)}{\partial s^2} + \frac{\partial^2 C(s,t)}{\partial t^2},$$
$w_{i1}(s,t)$, $w_{i2}(s,t)$, $w_{i3}(s,t)$ and $w_{i4}(s,t)$ are coefficient functions defined in the Appendix, $V_i^{I}(s,t)$, $V_i^{II}(s,t)$, $V_i^{III}(s,t)$, $V_i^{IV}(s,t)$ and $V_i^{V}(s,t)$ are covariance functions also defined in the Appendix, and the notations $\sigma_K^2$ and $f(t)$ are also defined in the Appendix.

Proof.

The proof comprises showing that the asymptotic bias and the asymptotic variance of $\hat{C}(s,t)$ are equal to $B(s,t)\sigma^2 + \frac{h^2}{2}\sigma_K^2 D(s,t) + o(h^2)$ and $\Gamma_{n,N_i}$, respectively. Here, the assumptions on the kernel (A1) and on the local polynomial smoothing ((B1), (B2) and (B3)) are used. Details are given in the Appendix. The additional assumption (C1) ensures that the asymptotic bias is bounded, and assumption (C2) guarantees that the variance of the estimator goes to zero. The final step is to prove the asymptotic normality of $\hat{C}(s,t)$. This follows from the asymptotic normality of $(R_{00} - E[R_{00}],\, S_{00} - E[S_{00}])$ by application of the delta method, see Theorem 1.12 in Shao (Citation2003). The asymptotic normality of $(R_{00} - E[R_{00}],\, S_{00} - E[S_{00}])$ in turn follows from the Lyapunov condition and the Cramér–Wold device. Specifically, the Lyapunov CLT for $S_{00}$ and $R_{00}$ holds under the Lyapunov condition given in Assumption (C3), where the power is 3 (i.e. $2+\delta$ with $\delta = 1$), using the notation of Theorem 27.3 in Billingsley (Citation2008). The asymptotic joint normality of $(R_{00} - E[R_{00}],\, S_{00} - E[S_{00}])$ then follows from the Cramér–Wold device for the two-dimensional case, see Theorem 29.4 in Billingsley (Citation2008).

This completes the proof.

Remark 2.3

If there are no observations subject to the DL, i.e. $\delta_{ij}=0$ for all $i,j$, then $B(s,t)=0$. Moreover,
$$\Gamma_{n,N_i} = \frac{1}{f^2(s)f^2(t)} \sum_i v_i^2\, V_i^{V}(s,t),$$
which corresponds to the results for the classic local constant covariance estimator (see Zhang and Wang Citation2016).

3. Simulation study

We evaluate the performance of our proposed estimator of C(s,t) via simulations. We compare its performance with a standard method where the missing observations are replaced with the DL value (Yao et al. Citation2005). We compare the methods in terms of bias, efficiency, asymptotic behaviour and computation time.

We assume that $\mu(t)=0$ for simplicity and define the true zero-mean random function $X(t)$ as
$$X(t) = \xi\,\psi(t), \quad t \in [0,1],$$
where $\psi(t) = \sqrt{2}\cos(4\pi t)$ and $\xi$ is a normal random variable with mean zero and variance 2, i.e. $\xi \sim N(0,\lambda)$ with $\lambda = 2$. Therefore the covariance function is
$$C(s,t) = \lambda\,\psi(s)\psi(t) = 4\cos(4\pi s)\cos(4\pi t), \quad s,t \in [0,1].$$
The observed time points $t_{ij} \sim U[0,1]$ are iid samples from the continuous uniform distribution on the interval $[0,1]$. Additive errors are sampled from $\epsilon_{ij} \sim N(0,1)$. The response is then generated by
$$Y_{ij} = X_i(t_{ij}) + \epsilon_{ij} = \xi_i\,\psi(t_{ij}) + \epsilon_{ij}, \quad i=1,\ldots,n, \; j=1,\ldots,N_i.$$
Finally, the missing data are created by replacing the observations less than the DL by the DL, with $DL \in \{-1, 0\}$.
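The data generation for the sparse setting can be sketched as follows (seed and variable names are ours; the eigenfunction is normalised as $\sqrt{2}\cos(4\pi t)$ so that $C(s,t) = 4\cos(4\pi s)\cos(4\pi t)$):

```python
import numpy as np

rng = np.random.default_rng(1)

def psi(t):
    # single eigenfunction of the simulation design, unit L2 norm on [0,1]
    return np.sqrt(2) * np.cos(4 * np.pi * t)

lam, n, DL = 2.0, 100, -1.0

times, ys, deltas = [], [], []
for _ in range(n):
    Ni = int(rng.integers(3, 11))              # sparse setting: Ni ~ U{3,...,10}
    t_i = rng.uniform(0.0, 1.0, Ni)            # t_ij ~ U[0,1]
    xi = rng.normal(0.0, np.sqrt(lam))         # xi ~ N(0, lambda)
    y_i = xi * psi(t_i) + rng.normal(0.0, 1.0, Ni)   # eps_ij ~ N(0,1)
    d_i = (y_i < DL).astype(int)               # censor below the detection limit
    times.append(t_i)
    ys.append(np.where(d_i == 1, DL, y_i))
    deltas.append(d_i)
```

The dense setting differs only in drawing `Ni = rng.integers(75, 101)`.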

We consider two settings, namely a sparse and a dense grid for the observations for each subject i. We, specifically, sample the number of time points Ni for each trajectory i as follows:

  • Sparse setting: $N_i \sim U\{3,4,\ldots,10\}$, i.e. the $N_i$ are iid from a discrete uniform distribution on $\{3,4,\ldots,10\}$.

  • Dense setting: $N_i \sim U\{75,76,\ldots,100\}$, i.e. the $N_i$ are iid from a discrete uniform distribution on $\{75,76,\ldots,100\}$.

For each setting, we simulate Q = 100 replicates. Each replicate contains information of n = 100 subjects.

To estimate the covariance functions in the replicates, we consider the following methods:

  • Our estimator based on local constant approximations using either the OBS or the SUBJ weighting schemes.

  • PACE, which does not adjust for the detection limit (Yao et al. Citation2005).

For each replicate, the covariance functions are estimated at 20 equidistant time points in $[0,1]\times[0,1]$. The variance of $\epsilon_{ij}$ is estimated as the mean squared error of the least-squares fit using all the data (including the values subject to the DL). We use the Gaussian kernel for the estimation procedure. To select the bandwidth $h$, the integrated squared error (ISE) is computed for a dense grid of values, namely $h = (2:12)/200$. The ISE is defined as
$$ISE(\hat{C}(s,t), h) = \int_0^1\!\!\int_0^1 \left(\hat{C}_h(s,t) - C(s,t)\right)^2 ds\, dt,$$
where $\hat{C}_h(s,t)$ is the estimate of $C$ with bandwidth $h$. The bandwidth which minimises $ISE(\hat{C}(s,t), h)$ is selected as the optimal bandwidth, and the corresponding ISE is denoted by $ISE_{opt}(\hat{C}(s,t))$ (see Fan and Gijbels Citation2018).

We then calculate the mean integrated squared error (MISE) and the standard deviation of the ISE over the $Q\,(=100)$ replicates:
$$MISE(\hat{C}(s,t)) = \frac{1}{Q}\sum_{i=1}^{Q} ISE_{opt}(\hat{C}^{([i])}(s,t)), \tag{5}$$
$$SD(\hat{C}(s,t)) = \sqrt{\frac{1}{Q-1}\sum_{i=1}^{Q}\left(ISE_{opt}(\hat{C}^{([i])}(s,t)) - MISE(\hat{C}(s,t))\right)^{2}}, \tag{6}$$
where $\hat{C}^{([i])}(s,t)$ is the covariance estimate based on the $i$th replicate.
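On a grid of equidistant points with spacing $1/20$, the ISE integral can be approximated by a Riemann sum, and the MISE and SD then follow directly; the `ise_opt` values below are hypothetical stand-ins for the per-replicate optima.

```python
import numpy as np

def ise(C_hat, C_true, step):
    """Riemann-sum approximation of the ISE of an estimated covariance
    surface over [0,1]x[0,1], evaluated on a grid with the given spacing."""
    return ((C_hat - C_true) ** 2).sum() * step * step

# MISE (formula (5)) and SD (formula (6)) over Q replicates
ise_opt = np.array([0.12, 0.10, 0.15])  # hypothetical ISE_opt per replicate
mise = ise_opt.mean()
sd = ise_opt.std(ddof=1)                # divisor Q - 1, then square root
```

In practice `ise` would be evaluated over the bandwidth grid $h = (2:12)/200$ and the minimiser kept as $ISE_{opt}$ for each replicate.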

For the sparse setting, Figure 1 depicts the first 20 of the 100 trajectories in the first replicate, without a DL and subject to a DL of −1 and of 0. The proportion of observations subject to the DL is 26.67% and 48.73%, respectively. The corresponding estimates of the covariance function by the local constant approximation method proposed in this paper, under the OBS and SUBJ weighting schemes, and by the PACE method are given in Figure 2.

Figure 1. The first 20 of the 100 trajectories in the first replicate of the sparse setting. Left: data without a detection limit; middle: with a detection limit of 0; right: with a detection limit of −1.


Figure 2. For replicate 1 of the sparse setting, contour plots with lines of the true and estimated covariance function for a DL of 0 (top layer) and of −1 (bottom layer) using the constant approximation methods with the OBS and SUBJ weighting schemes (second and third columns) and using PACE (fourth column). The covariance is estimated at 20 equidistant points in [0,1]×[0,1]. The bandwidth for the constant approximation method is 0.015.


For these two censored datasets, the proposed local constant approximation method performs much better than PACE. For DL=0, PACE estimates the covariance at all considered time points to be larger than zero, while the two local constant estimates also take negative values, representing the true situation. For both values of the DL, the OBS scheme seems to capture the true covariance function slightly better in this replicate. Further, because of fewer missing values, the results for DL=−1 (bottom layer of Figure 2) are better than those for DL=0 (top layer). The time needed for calculating the estimate of the covariance function was 4.8 seconds for DL=−1 and 5.1 seconds for DL=0 for the proposed estimators, while the computation time for the PACE method was 7.6 seconds for DL=−1 and 7.9 seconds for DL=0. Thus our proposed local approximation method is more time efficient than the PACE method.

For the dense setting, Figure 3 depicts the first 20 of the 100 trajectories in the first replicate, without a DL and subject to a DL of 0 and of −1. The proportions of observations subject to the DL are 50.21% for DL=0 and 25.61% for DL=−1. The estimates of the covariance functions by the various methods using the data from these replicates are given in Figure 4.

Figure 3. The first 20 of the 100 trajectories in the first replicate of the dense setting. Left: data without a detection limit; middle: with a detection limit of 0; right: with a detection limit of −1.


Figure 4. For replicate 1 of the dense setting, contour plots with lines of the true and estimated covariance function for a DL of 0 (top layer) and of −1 (bottom layer) using the constant approximation methods with the OBS and SUBJ weighting schemes (second and third columns) and using PACE (fourth column). The covariance is estimated at 20 equidistant points in [0,1]×[0,1]. The bandwidth for the constant approximation method is 0.03 for a DL of 0 and 0.035 for a DL of −1.


As in the sparse setting, the proposed local constant approximation method performs better than the existing PACE method for these two censored datasets. The two weighting schemes in the local constant approximation method give similar estimates. The time needed for calculating the estimated covariance function is 0.03 seconds for DL=−1 and 0.057 seconds for DL=0 using the local constant approximation, while for PACE it is 758 seconds for DL=−1 and 773 seconds for DL=0. Thus our proposed local approximation method is considerably more time efficient than the PACE method in the dense setting.

Table 1 shows the results of the simulation study based on all replicates. It provides the MISE and the corresponding standard deviation (SD) for the local constant approximation under the two weighting schemes (SUBJ and OBS) for increasing sample size n. The mode of the optimal bandwidth selected per replicate in the local constant estimation is also provided. We do not show the results of the PACE method, as this method appeared to give biased and inaccurate estimates, see Figures 2 and 4.

Table 1. Results of the simulation study for the scenarios DL=0 and DL=−1 and for the sparse and dense settings.

Clearly, as n increases from 100 to 1000, the MISE and corresponding SD decrease. The dense case has a smaller MISE than the sparse case. Comparing DL=−1 with DL=0, the MISE is smaller for DL=−1. This can be explained by the fact that there is more information for DL=−1 and in the dense setting. The optimal bandwidth is very stable across all settings for both weighting schemes. For the sparse setting, the OBS scheme performs better than the SUBJ scheme. For the dense setting, the two weighting schemes perform similarly.

4. Data application

We illustrate our method using data from a longitudinal biomarker study of scleroderma patients. Scleroderma is a heterogeneous disease whose severity over time varies among patients. The study comprises 217 patients with hospital visits from 2010 to 2015. Typically, scleroderma patients visit the hospital every 6 months to check whether the disease has progressed. However, patients missed appointments or their data were not recorded, resulting in a sparse, unbalanced dataset. The data were collected according to the ethically approved protocol for observational study HRA number 15/NE/0211.

In Liu and Houwing-Duistermaat (Citation2022), the mean functions of two biomarkers subject to detection limits were estimated, namely aldose reductase (AR) and alpha fetoprotein (AF). The percentage of missing data for the AF marker is high, namely 75%, which led to uncertainty in the estimation of the mean function. The percentage of missing data due to the DL for AR is much lower, namely 7.8%, resulting in a more stable estimate of the mean function. For the estimation of the covariance function, we therefore use the data on AR.

For data cleaning, we remove observations at time points with no outcomes or no biomarker values, as well as some outliers (AR values larger than 3 times the standard deviation). Finally, patients with only one observation are dropped. The final dataset comprises 90 patients with in total 268 observations. The mean function of AR is estimated using the local constant approximation method under the OBS scheme. The bandwidth was selected using cross-validation over a fine grid. The observed profiles minus the estimated mean function are shown in Figure 5.

Figure 5. The AR observations with the estimated mean subtracted.


The estimate of the covariance function using the local constant method is shown in Figure 6. Note that the number of patients observed over a large time period is small; hence the covariance estimates for s or t larger than 0.6 have large uncertainty. For the region [0,0.5]×[0,0.5], we observe that the covariance decreases with increasing distance. Along the line s = t in the region [0,0.5]×[0,0.5], the estimated covariance function appears to decrease, probably due to a smaller variance (see Figure 5). Finally, for the regions [0.5,1]×[0,0.5] and [0,0.5]×[0.5,1], we observe that the covariance increases when s and t, respectively, increase to 1. This may reflect the fact that patients observed over a longer time range are a specific subset of the patients: they are more stable and hence show a larger correlation and a larger covariance over time.

Figure 6. The covariance estimate for AR using the local constant estimation method.


5. Discussion

We have proposed a novel estimator for the covariance function for sparse and dense temporal data subject to a DL. Our method is based on local smoothing of the covariance function using kernel functions. We derived the asymptotic properties of the estimator and evaluated these properties via simulations. We compared our method to the method which ignores the presence of a DL in the data. We showed that our method performed better in terms of bias and computation time. We also considered two weighting schemes, one based on single observations and one based on subjects. For sparse data, weighting per observation appeared to perform better.

We illustrated the method using data from a biomarker study. The estimated covariance for the biomarker first decreased over time and then started increasing again. The latter might be explained by an increase in variance and/or in correlation. If the biomarker represents disease severity, the patients who have a longer follow-up are likely to be patients with a less severe disease course. Thus these results might be explained by non-random dropout of patients. The development of estimators for the mean and covariance function under dropout will be future research.

Liu and Houwing-Duistermaat (Citation2022) also proposed a linear approximation instead of a constant approximation. We did not consider this approach here since for dense data there is no difference in performance and the linear approximation requires more computation time. For sparse data, a linear approximation might perform slightly better. Another approach is to impute the missing observations and then use PACE for the estimation of the covariance function. For cross-sectional data, Uh, Hartgers, Yazdanbakhsh, and Houwing-Duistermaat (Citation2008) studied the performance of imputation methods. They concluded that these methods may give biased estimators or underestimated variances. Given the results of Uh et al. (Citation2008) and the facts that multiple imputations would increase the computation time and that the computation time of PACE is higher than of our methods, we did not consider this approach for the estimation of covariance function.

For the selection of the bandwidth, we used cross-validation in the data application, while in the simulation we used the ISE, in which we plug in the true value of the covariance function C(s,t). We could have used cross-validation in the simulation study as well; however, this would have increased the computation time, while we expect that cross-validation would only slightly change individual results. Our overall conclusions with regard to the effect of the DL on the estimation of the covariance and the difference between the OBS and SUBJ weighting schemes would not change.

With the availability of estimators of the mean and the covariance function, models for temporal data subject to a DL can be built. Functional principal component analysis (FPCA) can be used to reduce the infinite-dimensional problem to a finite-dimensional one. For sparse datasets, FPCA can be used to obtain smooth individual curves. Finally, functional regression models can be developed to investigate the influence of covariates with a DL on outcomes which might also be subject to a DL.

Acknowledgments

We would like to thank a referee for very useful constructive remarks.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work is supported by a fellowship of the Alan Turing Institute, by the Yujie Talent Project of North China University of Technology, no. 107051360023XN075-04, and by the EU funded COST action DYNALIFE, no. CA21169.

References

  • Beran, J., and Liu, H. (2014), ‘On Estimation of Mean and Covariance Functions in Repeated Time Series with Long-memory Errors’, Lithuanian Mathematical Journal, 54(1), 8–34.
  • Beran, J., and Liu, H. (2016), ‘Estimation of Eigenvalues, Eigenvectors and Scores in FDA Models with Strongly Dependent Errors’, Journal of Multivariate Analysis, 147, 218–233.
  • Billingsley, P. (2008), Probability and Measure, Hoboken, New Jersey: John Wiley & Sons.
  • Fan, J., and Gijbels, I. (1995), 'Data-driven Bandwidth Selection in Local Polynomial Fitting: Variable Bandwidth and Spatial Adaptation', Journal of the Royal Statistical Society: Series B (Methodological), 57(2), 371–394.
  • Fan, J., and Gijbels, I. (2018), Local Polynomial Modelling and Its Applications: Monographs on Statistics and Applied Probability 66, London: Routledge.
  • Ferraty, F., and Vieu, P. (2006), Nonparametric Functional Data Analysis: Theory and Practice, New York: Springer.
  • Horváth, L., and Kokoszka, P. (2012), Inference for Functional Data with Applications, New York: Springer.
  • Kokoszka, P., and Reimherr, M. (2017), Introduction to Functional Data Analysis, London: Chapman and Hall/CRC.
  • Li, Y., and Hsing, T. (2010), ‘Uniform Convergence Rates for Nonparametric Regression and Principal Component Analysis in Functional/longitudinal Data’, The Annals of Statistics, 38(6), 3321–3351.
  • Liu, H., and Houwing-Duistermaat, J. (2022), ‘Fast Estimators for the Mean Function for Functional Data with Detection Limits’, Stat, 11(1), e467.
  • Peng, J., and Paul, D. (2009), ‘A Geometric Approach to Maximum Likelihood Estimation of the Functional Principal Components From Sparse Longitudinal Data’, Journal of Computational and Graphical Statistics, 18(4), 995–1015.
  • Ramsay, J.O., and Silverman, B.W. (2005), Functional Data Analysis (2nd ed.), New York: Springer.
  • Shao, J. (2003), Mathematical Statistics, New York: Springer Science & Business Media.
  • Shi, H., Dong, J., Wang, L., and Cao, J. (2021), ‘Functional Principal Component Analysis for Longitudinal Data with Informative Dropout’, Statistics in Medicine, 40(3), 712–724.
  • Uh, H.W., Hartgers, F.C., Yazdanbakhsh, M., and Houwing-Duistermaat, J.J. (2008), ‘Evaluation of Regression Methods When Immunological Measurements are Constrained by Detection Limits’, BMC Immunology, 9(1), 1–10.
  • Wang, J.L., Chiou, J.M., and Müller, H.G. (2016), ‘Functional Data Analysis’, Annual Review of Statistics and Its Application, 3, 257–295.
  • Yao, F., Müller, H.G., and Wang, J.L. (2005), ‘Functional Data Analysis for Sparse Longitudinal Data’, Journal of the American Statistical Association, 100(470), 577–590.
  • Zhang, X., and Wang, J.L. (2016), ‘From Sparse to Dense Functional Data and Beyond’, The Annals of Statistics, 44(5), 2281–2321.

Appendix

Notations

Notation B.1

Define the following notations:

  • Coefficient functions:
$$w_{i1}(s,t)=\big[(0.8194+0.502[1,2])\,\delta_i(s)\big]\big[(0.8194+0.502[1,2])\,\delta_i(t)\big]v_i,$$
$$w_{i2}(s,t)=\big[1-0.498\,\delta_i(s)\big]\big[1-0.498\,\delta_i(t)\big]v_i,$$
$$w_{i3}(s,t)=\big[1-\delta_i(s)\big]\big[1-\delta_i(t)\big]v_i,$$
$$w_{i4}(s,t)=\big[(0.8194+0.502[1,2])\,\delta_i(s)\big]\big[1-0.498\,\delta_i(t)\big]v_i,$$
where $0.502[1,2]$ denotes the interval $[0.502,1.004]$; hence, for fixed $s$ and $t$, $w_{i1}(s,t)$ takes values in a closed interval.

  • Conditional expectations:
$$E_0(s,t)=E[Y_1Y_2\mid T_1=s,T_2=t],\qquad E_1(s,t)=E[Y_1Y_2Y_3\mid T_1=t,T_2=s,T_3=t],$$
$$E_2(s,t)=E[Y_1^2Y_2\mid T_1=t,T_2=s],\qquad E_3(s,t)=E[Y_1Y_2Y_3Y_4\mid T_1=s,T_2=t,T_3=s,T_4=t],$$
$$E_4(s,t)=E[Y_1^2Y_2Y_3\mid T_1=s,T_2=t,T_3=t],\qquad E_5(s,t)=E[Y_1^2Y_2^2\mid T_1=s,T_2=t].$$
Note that since we assume $\mu(t)=0$ in this section, we write, for independent random variables $T_1$ and $T_2$, $E_0(s,t)=E[Y_1Y_2\mid T_1=s,T_2=t]$; otherwise, $E_0(s,t)=E[(Y_1-\mu(T_1))(Y_2-\mu(T_2))\mid T_1=s,T_2=t]$. The other notations $E_i(s,t)$, $i=1,\dots,5$, are defined analogously.

  • Covariance functions:
$$V_i^{I}(s,t)=N_i(N_i-1)(1+I_{s=t})\frac{\|K\|^4}{h^2}\big(f(s)f(t)+o(1)\big)+N_i(N_i-1)(N_i-2)(1+I_{s=t})\frac{\|K\|^2}{h}\big(f(s)f^2(t)+f^2(s)f(t)+o(1)\big)+\big(N_i(N_i-1)(N_i-2)(N_i-3)-(N_i(N_i-1))^2\big)\big(f^2(s)f^2(t)+o(1)\big),$$
$$V_i^{II}(s,t)=N_i(N_i-1)I_{s=t}\frac{\|K\|^4}{h^2}\big(f(s)f(t)E_0(t,t)+o(1)\big)+N_i(N_i-1)(N_i-2)\frac{\|K\|^2}{h}\big(f^2(s)f(t)E_0(t,t)+o(1)\big),$$
$$V_i^{III}(s,t)=N_i(N_i-1)I_{s=t}\frac{\|K\|^4}{h^2}\big(f^2(s)E_0(s,t)+o(1)\big)+N_i(N_i-1)(N_i-2)I_{s=t}\frac{\|K\|^2}{h}\big(f^3(s)E_0(s,t)+o(1)\big),$$
$$V_i^{IV}(s,t)=N_i(N_i-1)(1+I_{s=t})\frac{\|K\|^4}{h^2}\big(f(s)f(t)E_2(s,t)+o(1)\big)+N_i(N_i-1)(N_i-2)(1+I_{s=t})\frac{\|K\|^2}{h}\big(f(s)f^2(t)E_1(s,t)+f^2(s)f(t)E_2(s,t)+o(1)\big)+N_i(N_i-1)(N_i-2)(N_i-3)\big(f^2(s)f^2(t)+o(1)\big),$$
$$V_i^{V}(s,t)=N_i(N_i-1)(1+I_{s=t})\frac{\|K\|^4}{h^2}\big(f(s)f(t)E_5(s,t)+o(1)\big)+N_i(N_i-1)(N_i-2)(1+I_{s=t})\frac{\|K\|^2}{h}\big(f(s)f^2(t)E_4(s,t)+f^2(s)f(t)E_4(t,s)+o(1)\big)+N_i(N_i-1)(N_i-2)(N_i-3)\big(f^2(s)f^2(t)E_3(s,t)+o(1)\big)-(N_i(N_i-1))^2C^2(s,t)\big(f^2(s)f^2(t)+o(1)\big).$$
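As a quick arithmetic check of the set-valued coefficient in $w_{i1}$, the sketch below evaluates the interval endpoints by plain endpoint arithmetic (variable names are illustrative, not from the paper):

```python
# The bracketed coefficient in w_i1 is set-valued: 0.502*[1, 2] denotes the
# interval [0.502, 1.004], so 0.8194 + 0.502*[1, 2] = [1.3214, 1.8234].
lo = 0.8194 + 0.502 * 1      # lower endpoint: 1.3214
hi = 0.8194 + 0.502 * 2      # upper endpoint: 1.8234

# For delta_i(s) = delta_i(t) = 1, the product of two such positive
# intervals is again a closed interval with these endpoint bounds:
prod_lo, prod_hi = lo * lo, hi * hi
```

This is why, for fixed $s$ and $t$, $w_{i1}(s,t)$ lies in a closed interval.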
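Under an auxiliary zero-mean Gaussian assumption (made here only for this sketch; the appendix does not require it), the conditional expectations above reduce to polynomials in $C$, e.g. $E_0(s,t)=C(s,t)$ and, by Isserlis' theorem, $E_5(s,t)=C(s,s)C(t,t)+2C^2(s,t)$. A minimal Monte Carlo check with illustrative values $C(s,s)=C(t,t)=1$, $C(s,t)=0.5$:

```python
import numpy as np

# Monte Carlo check of E0 and E5 for a zero-mean Gaussian pair with
# var(Y1) = var(Y2) = 1 and cov(Y1, Y2) = 0.5 (illustrative values):
#   E0(s,t) = E[Y1*Y2]      = C(s,t)                     = 0.5
#   E5(s,t) = E[Y1^2*Y2^2]  = C(s,s)*C(t,t) + 2*C(s,t)^2 = 1.5
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.5],
                [0.5, 1.0]])
y = rng.multivariate_normal([0.0, 0.0], cov, size=1_000_000)

e0_mc = np.mean(y[:, 0] * y[:, 1])            # close to 0.5
e5_mc = np.mean(y[:, 0] ** 2 * y[:, 1] ** 2)  # close to 1.5
```

In the general (non-Gaussian) setting these quantities stay as unspecified moment functions, which is why they appear as named terms in $V_i^{II}$–$V_i^{V}$.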

Assumptions

Assumption B.1

Assumptions for the Kernel function:

(A1)

Kernel function $K(\cdot)$ is a symmetric probability density function on $[-1,1]$, with $\sigma_K^2=\int u^2K(u)\,du<\infty$ and $\|K\|^2=\int K^2(u)\,du<\infty$.
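The constants in (A1) are easy to check numerically for a concrete kernel; the sketch below uses the Epanechnikov kernel $K(u)=\tfrac34(1-u^2)$ on $[-1,1]$, one common choice satisfying (A1) (the appendix does not fix a particular kernel):

```python
import numpy as np

# Numerical check of the (A1) constants for the Epanechnikov kernel
# K(u) = 0.75*(1 - u^2) on [-1, 1]. Since K(-1) = K(1) = 0, a simple
# Riemann sum on a fine grid coincides with the trapezoid rule.
u = np.linspace(-1.0, 1.0, 200_001)
du = u[1] - u[0]
K = 0.75 * (1.0 - u ** 2)

mass = np.sum(K) * du                # integral of K     = 1   (K is a density)
sigma_K2 = np.sum(u ** 2 * K) * du   # sigma_K^2         = 1/5
norm_K2 = np.sum(K ** 2) * du        # ||K||^2           = 3/5
```

Both integrals are finite, so (A1) holds for this kernel.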

Assumption B.2

Assumptions for time points and the true functions:

(B1)

Time points $\{t_{ij},\ i=1,\dots,n,\ j=1,\dots,N_i\}$ are iid copies of a random variable $T$ defined on the interval $I$ with density $f(\cdot)$ satisfying $0<m_f\le\min_t f(t)\le\max_t f(t)\le M_f<\infty$, and $f'(t)$ is bounded.

(B2)

$X(t)$ is independent of $T$, and $\epsilon$ is independent of $T$.

(B3)

$\dfrac{\partial^2 C(s,t)}{\partial s^2}$, $\dfrac{\partial^2 C(s,t)}{\partial s\,\partial t}$ and $\dfrac{\partial^2 C(s,t)}{\partial t^2}$ are bounded on $I\times I$.

Assumption B.3

Assumptions for deriving the asymptotic distribution of the estimated covariance function:

(C1)

For $k_1,k_2=1,2,3,4$, as $n\to\infty$, $h:=h_n\to0$ and
$$\frac{\sum_i N_i(N_i-1)\,w_{ik_1}(s,t)w_{ik_2}(s,t)}{h^2}\to0,\qquad \frac{\sum_i N_i(N_i-1)(N_i-2)\,w_{ik_1}(s,t)w_{ik_2}(s,t)}{h}\to0,\qquad \sum_i N_i(N_i-1)(N_i-2)(N_i-3)\,w_{ik_1}(s,t)w_{ik_2}(s,t)\to0.$$

(C2)

For $k_1,k_2=1,2,3,4$, as $n\to\infty$,
$$\min_{k_1,k_2=1,2,3,4}\left\{\frac{h^2}{\sum_i N_i(N_i-1)w_{ik_1}(s,t)w_{ik_2}(s,t)},\ \frac{h}{\sum_i N_i(N_i-1)(N_i-2)w_{ik_1}(s,t)w_{ik_2}(s,t)},\ \frac{1}{\sum_i N_i(N_i-1)(N_i-2)(N_i-3)w_{ik_1}(s,t)w_{ik_2}(s,t)}\right\}h^{6}\to0.$$

(C3)

For $k_1,k_2,k_3=1,2,3,4$, as $n\to\infty$,
$$\max_{k_1,k_2,k_3=1,2,3,4}\left\{\frac{\sum_i N_i(N_i-1)w_{ik_1}(s,t)w_{ik_2}(s,t)w_{ik_3}(s,t)}{h^4},\ \frac{\sum_i N_i(N_i-1)(N_i-2)w_{ik_1}(s,t)w_{ik_2}(s,t)w_{ik_3}(s,t)}{h^3},\ \frac{\sum_i N_i(N_i-1)(N_i-2)(N_i-3)w_{ik_1}(s,t)w_{ik_2}(s,t)w_{ik_3}(s,t)}{h^2},\ \frac{\sum_i N_i(N_i-1)(N_i-2)(N_i-3)(N_i-4)w_{ik_1}(s,t)w_{ik_2}(s,t)w_{ik_3}(s,t)}{h},\ \sum_i N_i(N_i-1)(N_i-2)(N_i-3)(N_i-4)(N_i-5)w_{ik_1}(s,t)w_{ik_2}(s,t)w_{ik_3}(s,t)\right\}\Big/\left(\frac{\sum_i N_i(N_i-1)w_{ik_1}(s,t)w_{ik_2}(s,t)}{h^2}+\frac{\sum_i N_i(N_i-1)(N_i-2)w_{ik_1}(s,t)w_{ik_2}(s,t)}{h}+\sum_i N_i(N_i-1)(N_i-2)(N_i-3)w_{ik_1}(s,t)w_{ik_2}(s,t)\right)^{3/2}\to0.$$

Proof of Theorem 2.1

Proof.

For the asymptotic bias of $\widehat C(s,t)$, we calculate
$$E[S_{00}]=\Big[f(s)f(t)+\frac{h^2}{2}\sigma_K^2B_1(s,t)+o(h^2)\Big]\sum_i N_i(N_i-1)w_{i2}(s,t),$$
$$E[R_{00}]=\Big[f(s)f(t)+\frac{h^2}{2}\sigma_K^2B_1(s,t)+o(h^2)\Big]\sum_i N_i(N_i-1)w_{i1}(s,t)\,\sigma^2+\Big[C(s,t)f(s)f(t)+\frac{h^2}{2}\sigma_K^2B_2(s,t)+o(h^2)\Big]\sum_i N_i(N_i-1)w_{i2}(s,t),$$
where
$$B_1(s,t)=f''(s)f(t)+f(s)f''(t),$$
$$B_2(s,t)=C(s,t)f''(s)f(t)+C(s,t)f(s)f''(t)+2\frac{\partial C}{\partial s}f'(s)f(t)+2\frac{\partial C}{\partial t}f(s)f'(t)+\frac{\partial^2 C}{\partial s^2}f(s)f(t)+\frac{\partial^2 C}{\partial t^2}f(s)f(t)=C(s,t)B_1(s,t)+2\frac{\partial C}{\partial s}f'(s)f(t)+2\frac{\partial C}{\partial t}f(s)f'(t)+\frac{\partial^2 C}{\partial s^2}f(s)f(t)+\frac{\partial^2 C}{\partial t^2}f(s)f(t),$$
and $\sigma_K^2=\int u^2K(u)\,du$. Therefore, by the delta method, the asymptotic bias is
$$E[\widehat C(s,t)]-C(s,t)=\frac{E[R_{00}]}{E[S_{00}]}-C(s,t)=B(s,t)\sigma^2+\frac{h^2}{2}D(s,t)+o(h^2).$$
For the asymptotic variance, we need to calculate $\mathrm{var}(S_{00})$, $\mathrm{var}(R_{00})$ and $\mathrm{cov}(R_{00},S_{00})$. This involves the calculation of $E[K_h(T-s)K_h(T-t)]$, where $K_h(\cdot)=K(\cdot/h)/h$, which, as $h\to0$, equals $\|K\|^2f(t)/h+o(1/h)$ if $s=t$ and $0$ if $s\neq t$, because the support of $K(\cdot)$ is $[-1,1]$. Thus
$$\mathrm{var}(S_{00})=\sum_i w_{i2}^2(s,t)V_i^{I}(s,t),$$
$$\mathrm{var}(R_{00})=\sum_i\Big[w_{i1}^2(s,t)\sigma^4V_i^{I}(s,t)+2w_{i1}(s,t)w_{i2}(s,t)C(s,t)\sigma^2V_i^{I}(s,t)+w_{i4}^2(s,t)\sigma^2V_i^{II}(s,t)+2w_{i4}(s,t)w_{i4}(t,s)\sigma^2V_i^{III}(s,t)+2w_{i2}(s,t)w_{i4}(s,t)\sigma V_i^{IV}(s,t)+w_{i4}^2(t,s)\sigma^2V_i^{II}(t,s)+2w_{i2}(t,s)w_{i4}(t,s)\sigma V_i^{IV}(t,s)+w_{i2}^2(s,t)V_i^{V}(s,t)\Big],$$
$$\mathrm{cov}(R_{00},S_{00})=\sum_i\big[w_{i2}^2(s,t)C(s,t)+w_{i1}(s,t)w_{i2}(s,t)\sigma^2\big]V_i^{I}(s,t).$$
Therefore, by the delta method, the asymptotic variance is
$$\mathrm{var}(\widehat C(s,t))=\left(\begin{array}{c}\dfrac{1}{E[S_{00}]}\\[6pt] -\dfrac{E[R_{00}]}{E[S_{00}]^2}\end{array}\right)^{T}\left(\begin{array}{cc}\mathrm{var}(R_{00})&\mathrm{cov}(R_{00},S_{00})\\ \mathrm{cov}(R_{00},S_{00})&\mathrm{var}(S_{00})\end{array}\right)\left(\begin{array}{c}\dfrac{1}{E[S_{00}]}\\[6pt] -\dfrac{E[R_{00}]}{E[S_{00}]^2}\end{array}\right)$$
$$=\Big(\sum_i N_i(N_i-1)w_{i2}(s,t)f(s)f(t)\Big)^{-2}\times\sum_i\Big[\big(w_{i1}(s,t)-B(s,t)w_{i2}(s,t)\big)^2\sigma^4V_i^{I}(s,t)+w_{i4}^2(s,t)\sigma^2V_i^{II}(s,t)+2w_{i4}(s,t)w_{i4}(t,s)\sigma^2V_i^{III}(s,t)+2w_{i2}(s,t)w_{i4}(s,t)\sigma V_i^{IV}(s,t)+w_{i4}^2(t,s)\sigma^2V_i^{II}(t,s)+2w_{i2}(t,s)w_{i4}(t,s)\sigma V_i^{IV}(t,s)+w_{i2}^2(s,t)V_i^{V}(s,t)\Big]=\Gamma_{n,N_i}.$$
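The delta-method step used in the proof, i.e. approximating the variance of a ratio $R/S$ by $g^{T}\Sigma g$ with gradient $g=(1/E[S],\,-E[R]/E[S]^2)$, can be checked numerically. A minimal Monte Carlo sketch with an illustrative bivariate normal $(R,S)$ (the means and covariance below are arbitrary choices, not quantities from the paper):

```python
import numpy as np

# Delta-method variance of a ratio R/S: with g = (1/E[S], -E[R]/E[S]^2),
# var(R/S) is approximately g' Sigma g. All numbers are illustrative.
rng = np.random.default_rng(1)
mu = np.array([2.0, 4.0])                       # (E[R], E[S]), hypothetical
Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.02]])                # small spread keeps S away from 0

g = np.array([1.0 / mu[1], -mu[0] / mu[1] ** 2])
var_delta = g @ Sigma @ g                       # first-order approximation

draws = rng.multivariate_normal(mu, Sigma, size=1_000_000)
var_mc = np.var(draws[:, 0] / draws[:, 1])      # empirical variance of R/S
```

For this well-conditioned example the approximation agrees with the simulated variance to within a few percent; the proof applies the same expansion with $(R_{00}, S_{00})$ in place of $(R,S)$.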