
Convergence rate of principal component analysis with local-linear smoother for functional data under a unified weighing scheme

Pages 55-65 | Received 04 Aug 2018, Accepted 10 Aug 2019, Published online: 20 Aug 2019

ABSTRACT

The unified weighing scheme for the local-linear smoother in functional data analysis can handle data that are dense, sparse or of neither type. In this paper, we focus on the convergence rate of functional principal component analysis under this scheme. We establish almost sure asymptotic consistency and rates of convergence for the estimators of the eigenvalues and eigenfunctions, and we also provide the convergence rate of the estimator of the measurement-error variance. Under these results, the number of observations within each curve can be of any rate relative to the sample size, which is consistent with earlier conclusions about the asymptotic properties of the mean and covariance estimators.

1. Introduction

In this article, we consider the typical functional data setting, where a sample of n curves is observed over the time range T, the ith curve at $m_i$ discrete points, $i=1,\ldots,n$. When analysing such data, the sparsity of the time grid at which the measurements are observed should be taken into account, and a proper estimation procedure adopted accordingly. Conventionally, pre-smoothing the observations from each subject before subsequent analysis is viable for dense data, whereas for sparse data the subjects are pooled to borrow information; furthermore, the two types of estimation procedures exhibit different asymptotic properties (Zhang & Wang, 2016). For the essential problem of estimating the mean and covariance functions, we refer to, for example, Ferraty and Vieu (2006), Ramsay and Silverman (2005), Rice and Silverman (1991), Staniswalis and Lee (1998), Zhou, Lin, and Liang (2017) and references therein.

Here we focus on the local-linear smoother, which is popular due to its conceptual simplicity, attractive local features and automatic boundary correction (Fan & Gijbels, 1996). To ensure that each curve's effect on the optimisers is not overly affected by the denseness of its observations, different weighing schemes have been proposed. Yao, Müller, and Wang (2005) employed a scatter-plot smoother for sparse functional data that assigns the same weight to each observation, referred to as the 'OBS' scheme. Alternatively, Li and Hsing (2010) suggested a unified framework in which the number of observations within each curve can be of any rate relative to the sample size and each subject receives the same weight, referred to as the 'SUBJ' scheme. More recently, Zhang and Wang (2016) proposed a more general weighing scheme that includes the two commonly used schemes as special cases, and provided a comprehensive, unifying analysis of the asymptotic properties of the resulting estimators.

In addition to the essential estimation problem, functional principal component analysis (FPCA) has become a standard part of functional data analysis, used, for example, to achieve dimension reduction by summarising the data in a few functional principal component (FPC) scores, or to interpret the varying trends of individual trajectories through the eigenfunctions. For comprehensive discussions of FPCA, one may refer to Greven, Crainiceanu, Caffo, and Reich (2010), Hall, Müller, and Wang (2006), James, Hastie, and Sugar (2000), Jiang and Wang (2010), Yao and Lee (2006), and the references therein. Although the literature on FPCA is large, only a few theoretical studies have been carried out, such as Hall and Hosseini-Nasab (2006), Hall et al. (2006) and Li and Hsing (2010), and they are all based on the 'OBS' or 'SUBJ' scheme. The convergence rates of the eigenvalues and eigenfunctions of FPCA have not been studied under the general weighing scheme. The theoretical results in this paper not only provide upper bounds on the convergence rates of the eigenvalues and eigenfunctions under a unified weighing scheme but also provide a basis for further theoretical studies of functional clustering and classification based on FPCA.

Zhang and Wang (2016) established the asymptotic normality, $L_2$ convergence and uniform convergence of the mean and covariance estimators, but not the asymptotic properties of the FPCs. Under the general weighing scheme, we provide in this article the almost sure convergence rates of the eigenvalues and eigenfunctions, and further the convergence rate of the estimator of the measurement-error variance. The rest of the article is organised as follows. Notation, the model and the methodology, including FPCA, are given in Section 2. The main results on the convergence rates of the eigenvalues, the eigenfunctions and the measurement-error variance estimator are established in Section 3, with all technical proofs deferred to the Appendix. Simulation studies verifying the theoretical results are presented in Section 4. Concluding remarks are given in Section 5.

2. Model and methodology

Consider a random process X(t) defined on a fixed interval T=[0,1] with mean function $\mu(t)=E\{X(t)\}$ and covariance function $\gamma(s,t)=\mathrm{cov}\{X(s),X(t)\}$. Denote by $Y_{ij}$ the error-prone observations of the random process at random points $T_{ij}$, that is, \[ Y_{ij}=X_i(T_{ij})+\varepsilon_{ij},\quad i=1,\ldots,n,\ j=1,\ldots,m_i, \] where the $X_i$s are realisations of X, the $\varepsilon_{ij}=\varepsilon_i(T_{ij})$ are identically distributed measurement errors with mean zero and variance $\sigma^2$, and all the $X_i$s, $T_{ij}$s and $\varepsilon_{ij}$s are assumed to be independent.

2.1. Local-linear smoother

A local-linear estimator of the mean function, $\hat\mu(t)=\hat\beta_0$, is obtained by minimising the weighted least squares \[ \sum_{i=1}^n \omega_i \sum_{j=1}^{m_i} K_{h_1}(T_{ij}-t)\,\{Y_{ij}-\beta_0-\beta_1(T_{ij}-t)\}^2 \] with respect to $(\beta_0,\beta_1)$, where $K_h(\cdot)=h^{-1}K(\cdot/h)$ is a kernel with bandwidth h. Zhang and Wang (2016) proposed to assign weight $\omega_i$ to each observation of the ith subject such that $\sum_{i=1}^n m_i\omega_i=1$, which is the general weighing scheme. Specifically, the assignment $\omega_i=1/\sum_{i=1}^n m_i$, along with $\nu_i=1/\sum_{i=1}^n m_i(m_i-1)$, gives the OBS scheme; the assignment $\omega_i=1/(nm_i)$, along with $\nu_i=1/\{nm_i(m_i-1)\}$, gives the SUBJ scheme.
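As an illustration, the weighted least squares above has a closed-form minimiser, so a direct implementation only needs kernel-weighted sums. The following sketch (the function names, the Epanechnikov kernel and the toy data are our choices, not from the paper) evaluates $\hat\mu(t)$ under a supplied weight scheme:

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel, a common choice satisfying condition (A1)."""
    return 0.75 * np.maximum(1.0 - u**2, 0.0)

def local_linear_mean(t0, T, Y, w, h):
    """Closed-form beta0 minimising
    sum_i w_i sum_j K_h(T_ij - t0) {Y_ij - b0 - b1 (T_ij - t0)}^2."""
    S0 = S1 = S2 = R0 = R1 = 0.0
    for Ti, Yi, wi in zip(T, Y, w):
        d = Ti - t0
        k = wi * epanechnikov(d / h) / h   # subject weight times K_h(T_ij - t0)
        S0 += k.sum(); S1 += (k * d).sum(); S2 += (k * d * d).sum()
        R0 += (k * Yi).sum(); R1 += (k * d * Yi).sum()
    return (S2 * R0 - S1 * R1) / (S0 * S2 - S1**2)

# toy data: 300 subjects, 5 points each, mu(t) = sin(2*pi*t), OBS weights
rng = np.random.default_rng(1)
T = [np.sort(rng.random(5)) for _ in range(300)]
Y = [np.sin(2 * np.pi * Ti) + rng.normal(0, 0.1, Ti.size) for Ti in T]
w = [1.0 / sum(Ti.size for Ti in T)] * len(T)   # OBS: omega_i = 1 / sum_i m_i
mu_hat = local_linear_mean(0.5, T, Y, w, h=0.1)
```

Switching to the SUBJ scheme only changes the weight vector to `1 / (n * m_i)` per subject; the smoother itself is unchanged.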

To estimate the covariance function $\gamma(s,t)$, we first estimate $G(s,t)=E\{X(s)X(t)\}$, and then set \[ \hat\gamma(s,t)=\hat G(s,t)-\hat\mu(s)\hat\mu(t). \tag{1} \] Similarly as before, a local-linear estimator $\hat G(s,t)=\hat\beta_0$ is obtained by minimising the weighted least squares \[ \sum_{i=1}^n \nu_i \sum_{1\le j\neq l\le m_i} K_{h_2}(T_{ij}-s)K_{h_2}(T_{il}-t)\,\{Y_{ij}Y_{il}-\beta_0-\beta_{11}(T_{ij}-s)-\beta_{12}(T_{il}-t)\}^2, \] where the weight $\nu_i$ is attached to each product $Y_{ij}Y_{il}$ of the ith subject such that $\sum_{i=1}^n m_i(m_i-1)\nu_i=1$.

Finally, to estimate the variance of the measurement errors, we start from a local-linear estimator $\hat V(t)=\hat\beta_0$ of $V(t):=G(t,t)+\sigma^2$, obtained by minimising \[ \sum_{i=1}^n \omega_i \sum_{j=1}^{m_i} K_{h_3}(T_{ij}-t)\,\{Y_{ij}^2-\beta_0-\beta_1(T_{ij}-t)\}^2. \] We then estimate $\sigma^2$ by $\hat\sigma^2=\int_0^1\{\hat V(t)-\hat G(t,t)\}\,dt$. Throughout this article, we select the bandwidths $h_1$, $h_2$ and $h_3$ by leave-one-out cross-validation.
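On a grid, the final integral is just a numerical quadrature of the gap between $\hat V(t)$ and the diagonal of $\hat G$. A minimal sketch, where the two smoothed curves are stand-ins rather than real local-linear estimates:

```python
import numpy as np

grid = np.linspace(0.0, 1.0, 101)
G_diag = np.sin(np.pi * grid)   # stand-in for Ghat(t, t) on the grid
V_hat = G_diag + 0.2            # stand-in for Vhat(t) = G(t, t) + sigma^2

# Riemann approximation of sigma^2_hat = int_0^1 {Vhat(t) - Ghat(t, t)} dt;
# on a uniform grid over [0, 1] this is simply the mean of the gap
sigma2_hat = float(np.mean(V_hat - G_diag))
```

With these stand-ins the gap is the constant 0.2, so the quadrature recovers $\sigma^2$ exactly; with real smoothed estimates it recovers it up to smoothing error.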

2.2. Functional principal component analysis

We consider a spectral decomposition of $\gamma(s,t)$ and its approximation. According to Mercer's theorem, the covariance function has the spectral decomposition $\gamma(s,t)=\sum_{j=1}^\infty \lambda_j\varphi_j(s)\varphi_j(t)$, where $\lambda_1\ge\lambda_2\ge\cdots\ge 0$ are the eigenvalues of $\gamma(\cdot,\cdot)$ and the $\varphi_j$s are the corresponding eigenfunctions, i.e., the principal components, which form an orthonormal system on the space of square-integrable functions on [0,1]. By the Karhunen–Loève expansion, $X_i(t)$ is represented as $X_i(t)=\mu(t)+\sum_{j=1}^\infty \xi_{ij}\varphi_j(t)$, where $\xi_{ij}=\int_0^1\{X_i(t)-\mu(t)\}\varphi_j(t)\,dt$ is the jth FPC score of the ith subject. For each i, the $\xi_{ij}$s are uncorrelated random variables with $E(\xi_{ij})=0$ and $E(\xi_{ij}^2)=\lambda_j$.

With the local-linear estimate $\hat\gamma(s,t)$, we approximate the decomposition by $\hat\gamma(s,t)\approx\sum_{j=1}^K \hat\lambda_j\hat\varphi_j(s)\hat\varphi_j(t)$, where $\hat\lambda_1\ge\hat\lambda_2\ge\cdots\ge 0$ are the estimated eigenvalues, the $\hat\varphi_j$s are the corresponding estimated eigenfunctions, and K is the number of principal components retained. We refer to Hall et al. (2006) and Yao et al. (2005) for comprehensive discussions of computing the eigenvalues and eigenfunctions of an integral operator with a symmetric kernel, and to Li, Wang, and Carroll (2013) for intensive discussions of the choice of K and the underlying theory of functional principal component analysis.
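In practice the decomposition is computed from the covariance surface evaluated on a grid: eigenvalues of the grid matrix, scaled by the grid spacing, approximate the eigenvalues of the integral operator, and eigenvectors rescaled by the square root of the spacing approximate the $L_2$-normalised eigenfunctions. A sketch with a known rank-2 surface (the 90% fraction-of-variance rule for K mirrors the simulation in Section 4; all names here are ours):

```python
import numpy as np

grid = np.linspace(0.0, 1.0, 201)
dt = grid[1] - grid[0]

# a known covariance surface: lambda = (0.6, 0.3), phi1, phi2 orthonormal on [0, 1]
phi1 = np.sqrt(2.0) * np.cos(np.pi * grid)
phi2 = np.sqrt(2.0) * np.sin(np.pi * grid)
gamma = 0.6 * np.outer(phi1, phi1) + 0.3 * np.outer(phi2, phi2)

vals, vecs = np.linalg.eigh(gamma)       # eigenvalues in ascending order
vals, vecs = vals[::-1], vecs[:, ::-1]   # reorder to descending
lam_hat = vals * dt                      # integral-operator eigenvalues
phi_hat = vecs / np.sqrt(dt)             # L2-normalised eigenfunctions

fve = np.cumsum(lam_hat) / np.sum(lam_hat)   # fraction of variance explained
K = int(np.searchsorted(fve, 0.90)) + 1      # smallest K with FVE >= 90%
```

Up to discretisation error the two leading eigenvalues recover 0.6 and 0.3, and the 90% rule selects K = 2.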

3. Main results about convergence rates

A few more notations are introduced first. Given a positive integer k, denote $M_{S_k}=\sum_{i=1}^n m_i^k$, $\bar M=\sum_{i=1}^n m_i/n$, $\bar M_{S_k}=M_{S_k}/n$ and $\bar M_{H_k}=(n^{-1}\sum_{i=1}^n m_i^{-k})^{-1}$, where the subscript 'H' in $\bar M_{H_k}$ suggests a harmonic mean; we abbreviate $\bar M_{H_1}$ as $\bar M_H$. The asymptotic behaviour of the estimated eigenvalues and eigenfunctions is given in Theorem 3.1.
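These sample moments are simple to compute; as a worked example (ours, not from the paper), with $m=(2,4,8)$ one gets $\bar M=14/3$, $\bar M_{S_2}=28$ and $\bar M_{H_1}=24/7$:

```python
import numpy as np

m = np.array([2.0, 4.0, 8.0])   # m_i: number of observations per curve (toy example)
n = m.size

M_S2 = np.sum(m**2)                 # M_{S_2} = sum_i m_i^2
M_bar = np.mean(m)                  # \bar M = sum_i m_i / n
M_bar_S2 = M_S2 / n                 # \bar M_{S_2} = M_{S_2} / n
M_bar_H1 = 1.0 / np.mean(1.0 / m)   # \bar M_{H_1}: harmonic mean of the m_i
```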

Theorem 3.1

Suppose that the regularity conditions (A1)–(A2), (B1)–(B4), (C1)–(C2) and (D1)–(D2) in the Appendix hold. For any fixed j, we have the following results:

  1. Convergence rate of the estimated eigenvalue: \[ \hat\lambda_j-\lambda_j=O\bigg(\Big(\frac{\log n}{n}\Big)^{1/2}+h_1^2+h_2^2+\log n\Big[\frac{\sum_{i=1}^n m_i\omega_i^2}{h_1}+\sum_{i=1}^n m_i(m_i-1)\omega_i^2\Big]+\log n\Big[\frac{\sum_{i=1}^n m_i(m_i-1)\nu_i^2}{h_2^2}+\frac{\sum_{i=1}^n m_i(m_i-1)(m_i-2)\nu_i^2}{h_2}+\sum_{i=1}^n m_i(m_i-1)(m_i-2)(m_i-3)\nu_i^2\Big]\bigg)\quad\text{a.s.} \]

  2. Convergence rate of the estimated eigenfunction: \[ \sup_{t\in[0,1]}|\hat\varphi_j(t)-\varphi_j(t)|=O\bigg(h_1^2+\Big\{\log n\Big[\frac{\sum_{i=1}^n m_i\omega_i^2}{h_1}+\sum_{i=1}^n m_i(m_i-1)\omega_i^2\Big]\Big\}^{1/2}+h_2^2+\Big\{\log n\Big[\frac{\sum_{i=1}^n m_i\omega_i^2}{h_2}+\sum_{i=1}^n m_i(m_i-1)\omega_i^2\Big]\Big\}^{1/2}+\log n\Big[\frac{\sum_{i=1}^n m_i(m_i-1)\nu_i^2}{h_2^2}+\frac{\sum_{i=1}^n m_i(m_i-1)(m_i-2)\nu_i^2}{h_2}+\sum_{i=1}^n m_i(m_i-1)(m_i-2)(m_i-3)\nu_i^2\Big]\bigg)\quad\text{a.s.} \]

In the following, we state in Corollaries 3.2 and 3.3 the specialised results for the OBS and SUBJ schemes, respectively. For either scheme, the convergence rate depends on the orders of $\bar M_{S_k}$ and $\bar M_{H_k}$ relative to n and on the orders of the bandwidths.

Corollary 3.2

Suppose that the conditions in Theorem 3.1 hold, along with two additional assumptions (C3) and (D3). Then under the OBS scheme, for any fixed j,

  1. \[ \hat\lambda_j-\lambda_j=O\bigg(\Big(\frac{\log n}{n}\Big)^{1/2}+h_1^2+h_2^2+\Big(\frac{1}{\bar M h_1}+\frac{\bar M_{S_2}}{\bar M^2}\Big)\frac{\log n}{n}+\Big(\frac{1}{\bar M_{S_2}h_2^2}+\frac{\bar M_{S_3}}{\bar M_{S_2}^2 h_2}+\frac{\bar M_{S_4}}{\bar M_{S_2}^2}\Big)\frac{\log n}{n}\bigg)\quad\text{a.s.} \]

  2. \[ \sup_{t\in[0,1]}|\hat\varphi_j(t)-\varphi_j(t)|=O\bigg(h_1^2+\Big\{\Big(\frac{1}{\bar M h_1}+\frac{\bar M_{S_2}}{\bar M^2}\Big)\frac{\log n}{n}\Big\}^{1/2}+h_2^2+\Big\{\Big(\frac{1}{\bar M h_2}+\frac{\bar M_{S_2}}{\bar M^2}\Big)\frac{\log n}{n}\Big\}^{1/2}+\Big(\frac{1}{\bar M_{S_2}h_2^2}+\frac{\bar M_{S_3}}{\bar M_{S_2}^2 h_2}+\frac{\bar M_{S_4}}{\bar M_{S_2}^2}\Big)\frac{\log n}{n}\bigg)\quad\text{a.s.} \]

Corollary 3.3

Suppose that the conditions in Theorem 3.1 hold. Then under the SUBJ scheme, for any fixed j,

  1. \[ \hat\lambda_j-\lambda_j=O\bigg(\Big(\frac{\log n}{n}\Big)^{1/2}+h_1^2+h_2^2+\Big(\frac{1}{\bar M_H h_1}+1\Big)\frac{\log n}{n}+\Big(\frac{1}{\bar M_{H_2}h_2^2}+\frac{1}{\bar M_H h_2}+1\Big)\frac{\log n}{n}\bigg)\quad\text{a.s.} \]

  2. \[ \sup_{t\in[0,1]}|\hat\varphi_j(t)-\varphi_j(t)|=O\bigg(h_1^2+\Big\{\Big(\frac{1}{\bar M_H h_1}+1\Big)\frac{\log n}{n}\Big\}^{1/2}+h_2^2+\Big\{\Big(\frac{1}{\bar M_H h_2}+1\Big)\frac{\log n}{n}\Big\}^{1/2}+\Big(\frac{1}{\bar M_{H_2}h_2^2}+\frac{1}{\bar M_H h_2}+1\Big)\frac{\log n}{n}\bigg)\quad\text{a.s.} \]

In addition, we provide below the convergence rate of the estimated variance of the measurement error under the general weighing scheme, as well as the special cases of the OBS and SUBJ schemes.

Theorem 3.4

Assume that the conditions in Theorem 3.1 and conditions (C1') and (C2') hold. Then under the general weighing framework, \[ \hat\sigma^2-\sigma^2=O\bigg(h_2^2+\Big\{\log n\Big[\frac{\sum_{i=1}^n m_i\omega_i^2}{h_1}+\sum_{i=1}^n m_i(m_i-1)\omega_i^2\Big]\Big\}^{1/2}+\log n\Big[\frac{\sum_{i=1}^n m_i(m_i-1)\nu_i^2}{h_2^2}+\frac{\sum_{i=1}^n m_i(m_i-1)(m_i-2)\nu_i^2}{h_2}+\sum_{i=1}^n m_i(m_i-1)(m_i-2)(m_i-3)\nu_i^2\Big]+h_3^2+\log n\Big[\frac{\sum_{i=1}^n m_i\omega_i^2}{h_3}+\sum_{i=1}^n m_i(m_i-1)\omega_i^2\Big]\bigg)\quad\text{a.s.} \]

Corollary 3.5

Suppose that the conditions in Theorem 3.4 hold.

  1. OBS: With the additional assumption (C3), \[ \hat\sigma^2-\sigma^2=O\bigg(h_2^2+\Big\{\Big(\frac{1}{\bar M h_2}+\frac{\bar M_{S_2}}{\bar M^2}\Big)\frac{\log n}{n}\Big\}^{1/2}+\Big(\frac{1}{\bar M_{S_2}h_2^2}+\frac{\bar M_{S_3}}{\bar M_{S_2}^2 h_2}+\frac{\bar M_{S_4}}{\bar M_{S_2}^2}\Big)\frac{\log n}{n}+h_3^2+\Big(\frac{1}{\bar M h_3}+\frac{\bar M_{S_2}}{\bar M^2}\Big)\frac{\log n}{n}\bigg)\quad\text{a.s.} \]

  2. SUBJ: \[ \hat\sigma^2-\sigma^2=O\bigg(h_2^2+\Big\{\Big(\frac{1}{\bar M_H h_2}+1\Big)\frac{\log n}{n}\Big\}^{1/2}+\Big(\frac{1}{\bar M_{H_2}h_2^2}+\frac{1}{\bar M_H h_2}+1\Big)\frac{\log n}{n}+h_3^2+\Big(\frac{1}{\bar M_H h_3}+1\Big)\frac{\log n}{n}\bigg). \]

4. Simulation

To illustrate the theoretical results of Section 3, we now examine the numerical performance of the estimators as the sample size increases. Choices of $\omega_i$ and $\nu_i$ satisfying the conditions of the general weighing scheme must be made. Since the 'OBS' and 'SUBJ' schemes are the two most commonly used choices, and our corollaries give the specific results for these two cases, we use them to illustrate the theoretical results. The data are generated from the model $Y_{ij}=X_i(T_{ij})+\varepsilon_{ij}$, where $X_i(t)=\mu(t)+\sum_{k=1}^2\xi_{ik}\varphi_k(t)$, $\xi_{ik}\sim N(0,\lambda_k)$ for k = 1, 2 and $\varepsilon_{ij}\sim N(0,\sigma^2)$. We let $\mu(t)=t+\sin(t)+\cos(t)$, $\varphi_1(t)=\sqrt{2}\cos(\pi t)$, $\varphi_2(t)=\sqrt{2}\sin(\pi t)$, and set $(\lambda_1,\lambda_2,\sigma^2)=(0.6,0.3,0.2)$.

The observation times are generated in the following way. Each individual has a set of ‘scheduled’ time points, {1,2,,20}, and each scheduled time has a 20% probability of being skipped. The actual observation time is a random perturbation of a scheduled time: a uniform [0,1] random variable is added to a nonskipped scheduled time. This results in different observed time points Tij per subject.
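The generating scheme above can be sketched as follows; the rescaling of the jittered times into [0,1] is our assumption, as the paper leaves the mapping of the scheduled grid onto the unit interval implicit:

```python
import numpy as np

def simulate_sample(n, rng):
    """One simulated dataset: Y_ij = X_i(T_ij) + eps_ij under the design above."""
    lam, sigma2 = np.array([0.6, 0.3]), 0.2
    data = []
    for _ in range(n):
        sched = np.arange(1, 21, dtype=float)   # scheduled times 1..20
        kept = sched[rng.random(20) > 0.2]      # each skipped with probability 0.2
        Tij = kept + rng.random(kept.size)      # add U[0,1] jitter to kept times
        t = Tij / 21.0                          # rescale into [0,1] (our assumption)
        xi = rng.normal(0.0, np.sqrt(lam))      # FPC scores with variances 0.6, 0.3
        mu = t + np.sin(t) + np.cos(t)
        X = (mu + xi[0] * np.sqrt(2) * np.cos(np.pi * t)
                + xi[1] * np.sqrt(2) * np.sin(np.pi * t))
        Y = X + rng.normal(0.0, np.sqrt(sigma2), size=t.size)
        data.append((Tij, Y))
    return data

sample = simulate_sample(50, np.random.default_rng(0))
```

Each subject ends up with roughly 16 observation times on average (20 scheduled, 20% skipped), and the jitter keeps the per-subject times distinct and increasing.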

To illustrate that the convergence rates of the estimators $\hat\lambda_k$, $\hat\varphi_k$ and $\hat\sigma^2$ have the orders of magnitude derived in Section 3, let $\delta_{\lambda_k}=(\hat\lambda_k-\lambda_k)/a_n$, where $a_n$ is the derived convergence rate; for example, under the OBS scheme, \[ a_n=\Big(\frac{\log n}{n}\Big)^{1/2}+h_1^2+h_2^2+\Big(\frac{1}{\bar M h_1}+\frac{\bar M_{S_2}}{\bar M^2}\Big)\frac{\log n}{n}+\Big(\frac{1}{\bar M_{S_2}h_2^2}+\frac{\bar M_{S_3}}{\bar M_{S_2}^2 h_2}+\frac{\bar M_{S_4}}{\bar M_{S_2}^2}\Big)\frac{\log n}{n}. \] If $\hat\lambda_k$ is indeed consistent for $\lambda_k$ at this order of magnitude, the range of $\delta_{\lambda_k}$ should decrease or remain constant as n increases. Here we set n = 50, 75, 100, 125, 150, 175, 200 for both the 'OBS' and 'SUBJ' cases, with 200 replications for each sample size.
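Computing $\delta_{\lambda_k}$ only requires plugging the sample quantities into the displayed $a_n$; a sketch of that bookkeeping under the OBS scheme (the function name and inputs are ours):

```python
import numpy as np

def a_n_obs(m, h1, h2):
    """OBS-scheme rate a_n from Corollary 3.2(1), in \bar M notation;
    m is the vector of per-curve observation counts m_i."""
    n = m.size
    M_bar, M_S2 = np.mean(m), np.mean(m**2.0)   # \bar M, \bar M_{S_2}
    M_S3, M_S4 = np.mean(m**3.0), np.mean(m**4.0)
    ln_n = np.log(n) / n
    mean_part = (1.0 / (M_bar * h1) + M_S2 / M_bar**2) * ln_n
    cov_part = (1.0 / (M_S2 * h2**2) + M_S3 / (M_S2**2 * h2)
                + M_S4 / M_S2**2) * ln_n
    return np.sqrt(ln_n) + h1**2 + h2**2 + mean_part + cov_part

# normalised deviation delta_lambda_k = (lambda_hat_k - lambda_k) / a_n;
# for fixed bandwidths the rate shrinks as n grows
a50 = a_n_obs(np.full(50, 16.0), 0.1, 0.1)
a200 = a_n_obs(np.full(200, 16.0), 0.1, 0.1)
```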

The procedure is visualised in Figures 1–3. The number K of principal components in each replication was chosen by the 90% fraction-of-variance-explained criterion. As shown in Figure 1, the ranges (under the 'OBS' scheme) of $\delta_{\lambda_k}$ (top two panels) and $\delta_{\varphi_k(t)}$ (bottom two panels) for k = 1, 2 tend to be stable or decrease as the sample size n increases. This demonstrates that the order of magnitude of the convergence rate derived in Corollary 3.2 is reasonable.

Figure 1. This is for the ‘OBS’ case. The top two panels present the values of δλ1 and δλ2 for 200 replications as n increases, respectively. The bottom two panels present the values of δφ1(t) and δφ2(t) for 200 replications as n increases, respectively.


Figure 2. This is for the ‘SUBJ’ case. The top two panels present the values of δλ1 and δλ2 for 200 replications as n increases, respectively. The bottom two panels present the values of δφ1(t) and δφ2(t) for 200 replications as n increases, respectively.


Figure 3. Plots of δσ2 for ‘OBS’ (the left panel) and ‘SUBJ’ (the right panel), respectively.


A similar phenomenon under the 'SUBJ' scheme can be observed in Figure 2, which shows the convergence of the estimators $\hat\lambda_k$ and $\hat\varphi_k$ for k = 1, 2. Although there are some abnormal values in the estimates of the eigenfunctions (bottom-left and bottom-right panels), the fluctuation range generally tends to be stable and, as expected, the abnormal values disappear as the sample size increases. This shows that the convergence rates derived in Corollary 3.3 are reasonable.

Furthermore, $\delta_{\sigma^2}$ under the 'OBS' and 'SUBJ' schemes is plotted in Figure 3. Observe that as the sample size increases, the range of variation over the 200 replications tends to be stable, which demonstrates the results of Corollary 3.5.

5. Discussion

It is common in the functional data analysis literature for a method to focus on either dense or sparse data, while data of neither type are discussed much less often. Because the two types of methods behave differently, one must choose the analysis method carefully when dealing with real data, which is not necessarily as easy as it looks. For example, we may face a mixture of densely and sparsely observed curves, or it may even be difficult to determine the sampling density. In this sense, methods that can handle any type of data are appreciated, and the method we consider in this article belongs to this category. Specifically, we investigate the almost sure convergence of functional principal component analysis following Zhang and Wang (2016) and complement the unified theoretical framework they set up. We also note that the special case of Corollary 3.3 under the SUBJ scheme is consistent with Theorem 3.6 of Li and Hsing (2010). The convergence rate of Corollary 3.2 under the OBS scheme is better than that of Corollary 1 in Yao et al. (2005), owing to different proof techniques: we work with asymptotic expansions of the eigenvalues and eigenfunctions of the estimated covariance function (Hall & Hosseini-Nasab, 2006; Hall et al., 2006) and the strong uniform convergence rate of $\hat\gamma(s,t)$ established in Lemma A.1 of this article, which together lead to a better convergence rate. It is also of great interest to establish the asymptotic distribution and the optimal convergence rate of $\hat\varphi_j(t)$ under the general weighing framework, which we leave for future work. Furthermore, the general weighing framework may be used in functional regression, classification, clustering, etc., and the theoretical results here could be extended to those cases as well. This will also be pursued as future work.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

The work was supported by National Natural Science Foundation of China (project number: 11771146, 11831008, 81530086, 11771145), the National Social Science Foundation Key Program (17ZDA091), the 111 Project (B14019) and Program of Shanghai Subject Chief Scientist (14XD1401600). This work was also supported by the China Postdoctoral Science Foundation (2018M630393). We thank Professors Yanyuan Ma and Yukun Liu for very helpful discussions.

Notes on contributors

Xingyu Yan

Xingyu Yan is currently a Ph.D. candidate in the School of Statistics at East China Normal University. He is interested in functional data analysis.

Xiaolong Pu

Xiaolong Pu is currently Professor of statistics in the School of Statistics at East China Normal University. He is interested in applied statistics, particularly in statistical testing via sampling, statistical process control, sequential analysis and reliability. He received a Ph.D. in Statistics from East China Normal University.

Yingchun Zhou

Yingchun Zhou is currently Professor of statistics in the School of Statistics at East China Normal University. She is interested in functional data analysis and biostatistics. She received a Ph.D. in Statistics from Boston University and worked as a postdoc at National Institute of Statistical Sciences before moving to East China Normal University.

Xiaolei Xun

Xiaolei Xun is currently holding a visiting position in the School of Data Science at Fudan University. She is interested in functional data analysis, Bayesian methods, modeling of high-dimensional data and biostatistics. She received a Ph.D. in Statistics from Texas A&M University and worked in Novartis for five years as biometrician and statistical methodologist before moving to Fudan University.

References

  • Fan, J. Q., & Gijbels, I. (1996). Local polynomial modelling and its applications. London: Chapman and Hall.
  • Fan, J. Q., & Zhang, W. Y. (2000). Simultaneous confidence bands and hypothesis testing in varying-coefficient models. Scandinavian Journal of Statistics, 27, 715–731. doi: 10.1111/1467-9469.00218
  • Ferraty, F., & Vieu, P. (2006). Nonparametric functional data analysis: Theory and practice. Berlin: Springer.
  • Greven, S., Crainiceanu, C. M., Caffo, B. S., & Reich, D. (2010). Longitudinal functional principal component analysis. Electronic Journal of Statistics, 4, 1022–1054. doi: 10.1214/10-EJS575
  • Hall, P., & Hosseini-Nasab, M. (2006). On properties of functional principal components analysis. Journal of the Royal Statistical Society, Series B, 68, 109–126. doi: 10.1111/j.1467-9868.2005.00535.x
  • Hall, P., Müller, H.-G., & Wang, J.-L. (2006). Properties of principal component methods for functional and longitudinal data analysis. Annals of Statistics, 34, 1493–1517. doi: 10.1214/009053606000000272
  • James, G. M., Hastie, T. J., & Sugar, C. A. (2000). Principal component models for sparse functional data. Biometrika, 87, 587–602. doi: 10.1093/biomet/87.3.587
  • Jiang, C. R., & Wang, J. -L. (2010). Covariate adjusted functional principal components analysis for longitudinal data. Annals of Statistics, 38, 1194–1226. doi: 10.1214/09-AOS742
  • Li, Y. H., & Hsing, T. (2010). Uniform convergence rates for nonparametric regression and principal component analysis in functional/longitudinal data. Annals of Statistics, 38, 3321–3351. doi: 10.1214/10-AOS813
  • Li, Y., Wang, N., & Carroll, R. J. (2013). Selecting the number of principal components in functional data. Journal of the American Statistical Association, 108, 1284–1294. doi: 10.1080/01621459.2013.788980
  • Ramsay, J. O., & Silverman, B. W. (2005). Functional data analysis. New York: Springer.
  • Rice, J. A., & Silverman, B. W. (1991). Estimating the mean and covariance structure nonparametrically when the data are curves. Journal of the Royal Statistical Society, Series B, 53, 233–243.
  • Staniswalis, J. G., & Lee, J. J. (1998). Nonparametric regression analysis of longitudinal data. Journal of the American Statistical Association, 93, 1403–1418. doi: 10.1080/01621459.1998.10473801
  • Yao, F., & Lee, T. C. M. (2006). Penalized spline models for functional principal component analysis. Journal of the Royal Statistical Society, Series B, 68, 3–25. doi: 10.1111/j.1467-9868.2005.00530.x
  • Yao, F., Müller, H.-G., & Wang, J.-L. (2005). Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association, 100, 577–590. doi: 10.1198/016214504000001745
  • Zhang, X. K., & Wang, J. -L. (2016). From sparse to dense functional data and beyond. Annals of Statistics, 44, 2281–2321. doi: 10.1214/16-AOS1446
  • Zhou, L., Lin, H. Z., & Liang, H (2017). Efficient estimation of the nonparametric mean and covariance functions for longitudinal and sparse functional data. Journal of the American Statistical Association. doi: 10.1080/01621459.2017.1356317.
  • Zhu, H. T., Li, R. Z., & Kong, L. L. (2012). Multivariate varying coefficient model for functional responses. Annals of Statistics, 40, 2634–2666. doi: 10.1214/12-AOS1045

Appendix

The following regularity conditions, which are used to establish the asymptotic properties of the proposed estimators, are imposed mainly for mathematical simplicity and may be modified as necessary. Throughout, $h_1$, $h_2$ and $h_3$ are the bandwidths used in estimating $\mu(t)$, $\gamma(s,t)$ and $V(t)$, respectively.

A.1. Regularity conditions

Kernel function

(A1) $K(\cdot)$ is a symmetric probability density function on $[-1,1]$ with $\sigma_K^2=\int u^2K(u)\,du<\infty$ and $\|K\|^2=\int K(u)^2\,du<\infty$.

(A2) $K(\cdot)$ is Lipschitz continuous: there exists $0<L<\infty$ such that $|K(u)-K(v)|\le L|u-v|$ for any $u,v\in[-1,1]$. This implies $K(\cdot)\le M_K$ for some constant $M_K$.

Time points and true functions

(B1) $\{T_{ij}: i=1,\ldots,n,\ j=1,\ldots,m_i\}$ are i.i.d. copies of a random variable T defined on [0,1]. The density $f(\cdot)$ of T is bounded from below and above: \[ 0<m_f\le\min_{t\in[0,1]}f(t)\le\max_{t\in[0,1]}f(t)\le M_f<\infty. \] Furthermore, $f^{(2)}(\cdot)$, the second derivative of $f(\cdot)$, is bounded.

(B2) X is independent of T and ϵ is independent of T and U.

(B3) $\mu^{(2)}(t)$, the second derivative of $\mu(t)$, is bounded on [0,1].

(B4) $\partial^2\gamma(s,t)/\partial s^2$, $\partial^2\gamma(s,t)/\partial s\,\partial t$ and $\partial^2\gamma(s,t)/\partial t^2$ are bounded on $[0,1]^2$.

Conditions for mean estimation

(C1) $h_1\to0$, $\log n\sum_{i=1}^n m_i\omega_i^2/h_1\to0$ and $\log n\sum_{i=1}^n m_i(m_i-1)\omega_i^2\to0$.

(C1') $h_3\to0$, $\log n\sum_{i=1}^n m_i\omega_i^2/h_3\to0$ and $\log n\sum_{i=1}^n m_i(m_i-1)\omega_i^2\to0$.

(C2) For some $\alpha>2$, $E\sup_{t\in[0,1]}|X(t)|^\alpha<\infty$, $E|\varepsilon|^\alpha<\infty$ and \[ n\Big[\sum_{i=1}^n m_i\omega_i^2h_1+\sum_{i=1}^n m_i(m_i-1)\omega_i^2h_1^2\Big]\Big(\frac{\log n}{n}\Big)^{2/\alpha-1}\to0. \]

(C2') For some $\alpha>2$, $E\sup_{t\in[0,1]}|X(t)|^\alpha<\infty$, $E|\varepsilon|^\alpha<\infty$ and \[ n\Big[\sum_{i=1}^n m_i\omega_i^2h_3+\sum_{i=1}^n m_i(m_i-1)\omega_i^2h_3^2\Big]\Big(\frac{\log n}{n}\Big)^{2/\alpha-1}\to0. \]

(C3) $\sup_n\,(n\max_i m_i\omega_i)\le B<\infty$ for some constant B.

Conditions for covariance estimation

(D1) $h_2\to0$, $\log n\sum_{i=1}^n m_i(m_i-1)\nu_i^2/h_2^2\to0$, $\log n\sum_{i=1}^n m_i(m_i-1)(m_i-2)\nu_i^2/h_2\to0$ and $\log n\sum_{i=1}^n m_i(m_i-1)(m_i-2)(m_i-3)\nu_i^2\to0$.

(D2) For some $\beta>2$, $E\sup_{t\in[0,1]}|X(t)|^{2\beta}<\infty$, $E|\varepsilon|^{2\beta}<\infty$ and \[ n\Big[\sum_{i=1}^n m_i(m_i-1)\nu_i^2h_2^2+\sum_{i=1}^n m_i(m_i-1)(m_i-2)\nu_i^2h_2^3+\sum_{i=1}^n m_i(m_i-1)(m_i-2)(m_i-3)\nu_i^2h_2^4\Big]\Big(\frac{\log n}{n}\Big)^{2/\beta-1}\to0. \]

(D3) $\sup_n\,(n\max_i m_i(m_i-1)\nu_i)\le B<\infty$ for some constant B.

The above conditions (A1)–(A2) and (B1)–(B4) are commonly used in the functional and longitudinal data literature (see, e.g., Fan & Zhang, 2000; Zhu, Li, & Kong, 2012). Conditions (C1)–(C3) guarantee the consistency of the mean estimators, and conditions (D1)–(D3) guarantee the consistency of the covariance estimators. For the SUBJ estimators, (C3) and (D3) are automatically satisfied; similar versions of (C2) and (D2) were adopted by Li and Hsing (2010). In addition, (C1') and (C2') guarantee the consistency of the measurement-error variance estimator.

A.2. Proof

To begin, we introduce some notation that will be used in the sequel. Denote \[ S_r=\sum_{i=1}^n\omega_i\sum_{j=1}^{m_i}K_{h_1}(T_{ij}-t)\Big(\frac{T_{ij}-t}{h_1}\Big)^r,\quad R_r=\sum_{i=1}^n\omega_i\sum_{j=1}^{m_i}K_{h_1}(T_{ij}-t)\Big(\frac{T_{ij}-t}{h_1}\Big)^rY_{ij},\quad Q_r=\sum_{i=1}^n\omega_i\sum_{j=1}^{m_i}K_{h_3}(T_{ij}-t)\Big(\frac{T_{ij}-t}{h_3}\Big)^rY_{ij}^2 \] for r = 0, 1, 2, and \[ S_{pq}=\sum_{i=1}^n\nu_i\sum_{1\le j\neq l\le m_i}K_{h_2}(T_{ij}-s)K_{h_2}(T_{il}-t)\Big(\frac{T_{ij}-s}{h_2}\Big)^p\Big(\frac{T_{il}-t}{h_2}\Big)^q, \] \[ R_{pq}=\sum_{i=1}^n\nu_i\sum_{1\le j\neq l\le m_i}K_{h_2}(T_{ij}-s)K_{h_2}(T_{il}-t)\Big(\frac{T_{ij}-s}{h_2}\Big)^p\Big(\frac{T_{il}-t}{h_2}\Big)^qY_{ij}Y_{il} \] for p, q = 0, 1, 2. For any univariate function $\phi$ on [0,1] and any bivariate function $\Psi$ on $[0,1]^2$, define the $L_2$ norm $\|\phi\|=[\int\phi(t)^2\,dt]^{1/2}$ and the Hilbert–Schmidt norm $\|\Psi\|=[\iint\Psi(s,t)^2\,ds\,dt]^{1/2}$. The domains of integration, [0,1], are omitted unless otherwise specified.

First, we present the convergence rate of $\hat\gamma$ obtained in (1).

Lemma A.1

Under the conditions in Appendix A.1, we have \[ \sup_{s,t\in[0,1]}|\hat\gamma(s,t)-\gamma(s,t)|=O\bigg(h_1^2+\Big\{\log n\Big[\frac{\sum_{i=1}^n m_i\omega_i^2}{h_1}+\sum_{i=1}^n m_i(m_i-1)\omega_i^2\Big]\Big\}^{1/2}+h_2^2+\Big\{\log n\Big[\frac{\sum_{i=1}^n m_i(m_i-1)\nu_i^2}{h_2^2}+\frac{\sum_{i=1}^n m_i(m_i-1)(m_i-2)\nu_i^2}{h_2}+\sum_{i=1}^n m_i(m_i-1)(m_i-2)(m_i-3)\nu_i^2\Big]\Big\}^{1/2}\bigg)\quad\text{a.s.} \]

Proof.

From (1), we note that \[ \sup_{s,t\in[0,1]}|\hat\gamma(s,t)-\gamma(s,t)|\le\sup_{s,t\in[0,1]}|\hat G(s,t)-G(s,t)|+\sup_{s\in[0,1]}|\mu(s)|\sup_{t\in[0,1]}|\hat\mu(t)-\mu(t)|+\sup_{t\in[0,1]}|\hat\mu(t)|\sup_{s\in[0,1]}|\hat\mu(s)-\mu(s)|. \] By (B3) and Theorem 5.1 in Zhang and Wang (2016), the last two terms are both \[ O\Big(\Big\{\log n\Big[\frac{\sum_{i=1}^n m_i\omega_i^2}{h_1}+\sum_{i=1}^n m_i(m_i-1)\omega_i^2\Big]\Big\}^{1/2}\Big)\quad\text{a.s.} \] Thus the result follows from Theorem 5.2 in Zhang and Wang (2016).

The following lemma, similar to Lemma 6 of Li and Hsing (2010), will be used repeatedly in the proofs below. Let Δ be the integral operator with kernel $\hat\gamma-\gamma$.

Lemma A.2

For any bounded measurable function $\varphi$ on [0,1], \[ \sup_{t\in[0,1]}|(\Delta\varphi)(t)|=O\bigg(h_1^2+\Big\{\log n\Big[\frac{\sum_{i=1}^n m_i\omega_i^2}{h_1}+\sum_{i=1}^n m_i(m_i-1)\omega_i^2\Big]\Big\}^{1/2}+h_2^2+\Big\{\log n\Big[\frac{\sum_{i=1}^n m_i\omega_i^2}{h_2}+\sum_{i=1}^n m_i(m_i-1)\omega_i^2\Big]\Big\}^{1/2}+\log n\Big[\frac{\sum_{i=1}^n m_i(m_i-1)\nu_i^2}{h_2^2}+\frac{\sum_{i=1}^n m_i(m_i-1)(m_i-2)\nu_i^2}{h_2}+\sum_{i=1}^n m_i(m_i-1)(m_i-2)(m_i-3)\nu_i^2\Big]\bigg)\quad\text{a.s.} \]

Proof.

By (1), it follows that $(\Delta\varphi)(t)=A_{n1}+A_{n2}$, where $A_{n1}=\int(\hat G-G)(s,t)\varphi(s)\,ds$ and $A_{n2}=\int\{\mu(s)\mu(t)-\hat\mu(s)\hat\mu(t)\}\varphi(s)\,ds$. With minor derivation, \[ A_{n1}=\int(A_1R_{00}^*-A_2R_{10}^*-A_3R_{01}^*)B^{-1}\varphi(s)\,ds, \] where $A_1=S_{20}S_{02}-S_{11}^2$, $A_2=S_{10}S_{02}-S_{01}S_{11}$, $A_3=S_{01}S_{20}-S_{10}S_{11}$, $B=A_1S_{00}-A_2S_{10}-A_3S_{01}$ and \[ R_{pq}^*=R_{pq}-G(s,t)S_{pq}-h_2\frac{\partial G(s,t)}{\partial s}S_{p+1,q}-h_2\frac{\partial G(s,t)}{\partial t}S_{p,q+1} \] for $p,q=0,1$. Further, by Taylor expansion, \[ R_{pq}^*=\sum_{i=1}^n\nu_i\sum_{1\le j\neq l\le m_i}\Big(\frac{T_{ij}-s}{h_2}\Big)^p\Big(\frac{T_{il}-t}{h_2}\Big)^qK_{h_2}(T_{ij}-s)K_{h_2}(T_{il}-t)\big[Y_{ij}Y_{il}-G(T_{ij},T_{il})\big]+O(h_2^2).\tag{A1} \] Applying the proof of Theorem 5.2 in Zhang and Wang (2016), we obtain, uniformly in s, t, \[ R_{pq}^*=O(h_2^2+b_n/h_2)\quad\text{a.s.}\tag{A2} \] With further calculation, \[ A_1=f^2(s)f^2(t)(\sigma_K^2)^2+O(h_2+b_n/h_2)\quad\text{a.s.}\tag{A3} \] and \[ B=f^3(s)f^3(t)(\sigma_K^2)^2+O(h_2+b_n/h_2)\quad\text{a.s.},\tag{A4} \] where \[ b_n=\Big\{\log n\Big[\sum_{i=1}^n m_i(m_i-1)\nu_i^2h_2^2+\sum_{i=1}^n m_i(m_i-1)(m_i-2)\nu_i^2h_2^3+\sum_{i=1}^n m_i(m_i-1)(m_i-2)(m_i-3)\nu_i^2h_2^4\Big]\Big\}^{1/2}. \]

We focus on $\int A_1R_{00}^*B^{-1}\varphi(s)\,ds$, since the other two terms can be dealt with similarly. Specifically, \[ \int A_1R_{00}^*B^{-1}\varphi(s)\,ds=\frac{1}{f(t)}\sum_{i=1}^n\omega_i\sum_{1\le j\neq l\le m_i}\{Y_{ij}Y_{il}-G(T_{ij},T_{il})\}K_{h_2}(T_{il}-t)\int K_{h_2}(T_{ij}-s)\varphi(s)f(s)^{-1}\,ds+O\bigg(h_2^2+\log n\Big[\frac{\sum_{i=1}^n m_i(m_i-1)\nu_i^2}{h_2^2}+\frac{\sum_{i=1}^n m_i(m_i-1)(m_i-2)\nu_i^2}{h_2}+\sum_{i=1}^n m_i(m_i-1)(m_i-2)(m_i-3)\nu_i^2\Big]\bigg). \]

Note that \[ \Big|\int_0^1K_{h_2}(T_{ij}-s)\varphi(s)f(s)^{-1}\,ds\Big|\le\sup_{s\in[0,1]}\big(|\varphi(s)|f(s)^{-1}\big)\int_{-1}^1K(u)\,du. \] Similarly to the proof of Lemma 5 in Zhang and Wang (2016), we can establish the almost sure uniform rate \[ \frac{1}{f(t)}\sum_{i=1}^n\omega_i\sum_{1\le j\neq l\le m_i}\{Y_{ij}Y_{il}-G(T_{ij},T_{il})\}K_{h_2}(T_{il}-t)\int K_{h_2}(T_{ij}-s)\varphi(s)f(s)^{-1}\,ds=O\Big(\Big\{\log n\Big[\frac{\sum_{i=1}^n m_i\omega_i^2}{h_2}+\sum_{i=1}^n m_i(m_i-1)\omega_i^2\Big]\Big\}^{1/2}\Big)\quad\text{a.s.} \] Thus \[ \int A_1R_{00}^*B^{-1}\varphi(s)\,ds=O\bigg(\Big\{\log n\Big[\frac{\sum_{i=1}^n m_i\omega_i^2}{h_2}+\sum_{i=1}^n m_i(m_i-1)\omega_i^2\Big]\Big\}^{1/2}+h_2^2+\log n\Big[\frac{\sum_{i=1}^n m_i(m_i-1)\nu_i^2}{h_2^2}+\frac{\sum_{i=1}^n m_i(m_i-1)(m_i-2)\nu_i^2}{h_2}+\sum_{i=1}^n m_i(m_i-1)(m_i-2)(m_i-3)\nu_i^2\Big]\bigg)\quad\text{a.s.} \] The term $A_{n2}$ can be written as \[ A_{n2}=\{\mu(t)-\hat\mu(t)\}\int\mu(s)\varphi(s)\,ds+\hat\mu(t)\int\{\mu(s)-\hat\mu(s)\}\varphi(s)\,ds. \] Following Theorem 5.1 in Zhang and Wang (2016), it is easy to see that \[ A_{n2}=O\Big(h_1^2+\Big\{\log n\Big[\frac{\sum_{i=1}^n m_i\omega_i^2}{h_1}+\sum_{i=1}^n m_i(m_i-1)\omega_i^2\Big]\Big\}^{1/2}\Big)\quad\text{a.s.} \] Hence the lemma follows.

Proof of Theorem 3.1

Following Hall and Hosseini-Nasab (2006) and Bessel's inequality, we have $\|\hat\varphi_j-\varphi_j\|\le C(\|\Delta\varphi_j\|+\|\Delta\|^2)$, where $\|\Delta\varphi_j\|=\big(\iint[\{\hat\gamma(s,t)-\gamma(s,t)\}\varphi_j(s)]^2\,ds\,dt\big)^{1/2}$ and $\|\Delta\|=\big(\iint\{\hat\gamma(s,t)-\gamma(s,t)\}^2\,ds\,dt\big)^{1/2}$. Lemma A.1 and Lemma A.2 lead to \[ \|\hat\varphi_j-\varphi_j\|=O\bigg(h_1^2+\Big\{\log n\Big[\frac{\sum_{i=1}^n m_i\omega_i^2}{h_1}+\sum_{i=1}^n m_i(m_i-1)\omega_i^2\Big]\Big\}^{1/2}+h_2^2+\Big\{\log n\Big[\frac{\sum_{i=1}^n m_i\omega_i^2}{h_2}+\sum_{i=1}^n m_i(m_i-1)\omega_i^2\Big]\Big\}^{1/2}+\log n\Big[\frac{\sum_{i=1}^n m_i(m_i-1)\nu_i^2}{h_2^2}+\frac{\sum_{i=1}^n m_i(m_i-1)(m_i-2)\nu_i^2}{h_2}+\sum_{i=1}^n m_i(m_i-1)(m_i-2)(m_i-3)\nu_i^2\Big]\bigg)\quad\text{a.s.} \]

By (4.9) in Hall et al. (2006), we have $\hat\lambda_j-\lambda_j=B_{n1}+B_{n2}+O(\|\Delta\varphi_j\|^2)$, where \[ B_{n1}=\iint(A_1R_{00}^*-A_2R_{10}^*-A_3R_{01}^*)B^{-1}\varphi_j(s)\varphi_j(t)\,ds\,dt \] and \[ B_{n2}=\int\{\mu(s)-\hat\mu(s)\}\varphi_j(s)\,ds\int\hat\mu(t)\varphi_j(t)\,dt+\int\mu(s)\varphi_j(s)\,ds\int\{\mu(t)-\hat\mu(t)\}\varphi_j(t)\,dt. \] For $B_{n1}$, again it suffices to focus on $\iint A_1R_{00}^*B^{-1}\varphi_j(s)\varphi_j(t)\,ds\,dt$. First, we have \[ \iint A_1R_{00}^*B^{-1}\varphi_j(s)\varphi_j(t)\,ds\,dt=\sum_{i=1}^n\nu_i\sum_{1\le j\neq l\le m_i}\{Y_{ij}Y_{il}-G(T_{ij},T_{il})\}\iint K_{h_2}(T_{ij}-s)K_{h_2}(T_{il}-t)\varphi_j(s)\varphi_j(t)\{f(s)f(t)\}^{-1}\,ds\,dt+O\bigg(h_2^2+\log n\Big[\frac{\sum_{i=1}^n m_i(m_i-1)\nu_i^2}{h_2^2}+\frac{\sum_{i=1}^n m_i(m_i-1)(m_i-2)\nu_i^2}{h_2}+\sum_{i=1}^n m_i(m_i-1)(m_i-2)(m_i-3)\nu_i^2\Big]\bigg)\quad\text{a.s.} \] By Lemma 5 in Li and Hsing (2010), we have \[ B_{n1}=O\bigg(\Big(\frac{\log n}{n}\Big)^{1/2}+h_2^2+\log n\Big[\frac{\sum_{i=1}^n m_i(m_i-1)\nu_i^2}{h_2^2}+\frac{\sum_{i=1}^n m_i(m_i-1)(m_i-2)\nu_i^2}{h_2}+\sum_{i=1}^n m_i(m_i-1)(m_i-2)(m_i-3)\nu_i^2\Big]\bigg)\quad\text{a.s.} \] Following Theorem 5.1 in Zhang and Wang (2016), it can be shown similarly that \[ B_{n2}=O\bigg(\Big(\frac{\log n}{n}\Big)^{1/2}+h_1^2+\log n\Big[\frac{\sum_{i=1}^n m_i\omega_i^2}{h_1}+\sum_{i=1}^n m_i(m_i-1)\omega_i^2\Big]\bigg)\quad\text{a.s.} \]

This completes the proof of assertion (a). For assertion (b), we have, for any $t\in[0,1]$, \[ \lambda_j|\hat\varphi_j(t)-\varphi_j(t)|=|\hat\lambda_j\hat\varphi_j(t)-\lambda_j\varphi_j(t)-(\hat\lambda_j-\lambda_j)\hat\varphi_j(t)|\le\Big|\int\{\hat\gamma(s,t)-\gamma(s,t)\}\varphi_j(s)\,ds\Big|+\Big|\int\hat\gamma(s,t)\{\hat\varphi_j(s)-\varphi_j(s)\}\,ds\Big|+|\hat\lambda_j-\lambda_j||\hat\varphi_j(t)|\le\sup_{t\in[0,1]}|(\Delta\varphi_j)(t)|+O(\|\hat\varphi_j-\varphi_j\|)+|\hat\lambda_j-\lambda_j|\sup_{t\in[0,1]}|\hat\varphi_j(t)|, \] where the last inequality uses the Cauchy–Schwarz inequality. Assertion (b) then follows from Lemma A.2 and assertion (a).

Proof of Theorem 3.4

By rearranging terms, we have \[ \hat\sigma^2-\sigma^2=\int\{\hat V(t)-V(t)\}\,dt-\int\{\hat G(t,t)-G(t,t)\}\,dt. \]

First, we consider $\hat V(t)-V(t)$. Similarly to (C.1) in Zhang and Wang (2016), we have \[ \hat V(t)-V(t)=\frac{Q_0^*S_2-Q_1^*S_1}{S_0S_2-S_1^2},\tag{A5} \] where $Q_q^*=Q_q-V(t)S_q-hV^{(1)}(t)S_{q+1}$ for $q=0,1$. By (D.1) in Zhang and Wang (2016), $Q_q^*$ has the uniform rate $O(h_3^2+a_n/h_3)$ a.s., and we can derive that $S_2=f(t)\sigma_K^2+O(h_3+a_n/h_3)$, where \[ a_n=\Big\{\log n\Big[\sum_{i=1}^n m_i\omega_i^2h_3+\sum_{i=1}^n m_i(m_i-1)\omega_i^2h_3^2\Big]\Big\}^{1/2}. \] One can see that \[ \hat V(t)-V(t)=\frac{1}{f(t)}\sum_i\omega_i\sum_{j=1}^{m_i}K_{h_3}(T_{ij}-t)\{Y_{ij}^2-V(T_{ij})\}+O\big(h_3^2+(a_n/h_3)^2\big)\quad\text{a.s.} \] Thus \[ \int_0^1\{\hat V(t)-V(t)\}\,dt=\sum_i\omega_i\sum_{j=1}^{m_i}\{Y_{ij}^2-V(T_{ij})\}\int_0^1K_{h_3}(T_{ij}-t)f^{-1}(t)\,dt+O\big(h_3^2+(a_n/h_3)^2\big)\quad\text{a.s.} \] We apply (A5) but focus on the leading term $Q_0^*S_2/(S_0S_2-S_1^2)$, since the other term is of lower order and can be dealt with similarly. Note that $|\int_0^1K_{h_3}(T_{ij}-t)f^{-1}(t)\,dt|\le\sup_t f^{-1}(t)$, and by Lemma 5 in Li and Hsing (2010), we have \[ \int_0^1\{\hat V(t)-V(t)\}\,dt=O\big((\log n/n)^{1/2}+h_3^2+(a_n/h_3)^2\big).\tag{A6} \] Next, to consider $\hat G(t,t)-G(t,t)$ we follow the analogous expression (C.2) in Zhang and Wang (2016). Again we focus on $R_{00}^*A_1B^{-1}$. Applying (A2), we obtain, uniformly in s, t, $R_{00}^*=O(h_2^2+b_n/h_2)$

and $A_1B^{-1}=[f(s)f(t)]^{-1}+O(h_2+b_n/h_2)$, where \[ b_n=\Big\{\log n\Big[\sum_{i=1}^n m_i(m_i-1)\nu_i^2h_2^2+\sum_{i=1}^n m_i(m_i-1)(m_i-2)\nu_i^2h_2^3+\sum_{i=1}^n m_i(m_i-1)(m_i-2)(m_i-3)\nu_i^2h_2^4\Big]\Big\}^{1/2}. \] Similarly to the proof of Theorem 3.4 in Li and Hsing (2010), it can be shown that \[ \int_0^1\{\hat G(t,t)-G(t,t)\}\,dt=O\big(h_2^2+a_n/h_2+(b_n/h_2)^2\big)\quad\text{a.s.}\tag{A7} \] The theorem follows from (A6) and (A7).
