Theory and Methods

Adaptive Functional Thresholding for Sparse Covariance Function Estimation in High Dimensions

Pages 1473-1485 | Received 05 May 2021, Accepted 31 Mar 2023, Published online: 26 May 2023

Abstract

Covariance function estimation is a fundamental task in multivariate functional data analysis and arises in many applications. In this article, we consider estimating sparse covariance functions for high-dimensional functional data, where the number of random functions p is comparable to, or even larger than, the sample size n. Aided by the Hilbert–Schmidt norm of functions, we introduce a new class of functional thresholding operators that combine functional versions of thresholding and shrinkage, and propose the adaptive functional thresholding estimator by incorporating the variance effects of individual entries of the sample covariance function into functional thresholding. To handle the practical scenario where curves are partially observed with errors, we also develop a nonparametric smoothing approach to obtain the smoothed adaptive functional thresholding estimator and its binned implementation to accelerate the computation. We investigate the theoretical properties of our proposals when p grows exponentially with n under both fully and partially observed functional scenarios. Finally, we demonstrate that the proposed adaptive functional thresholding estimators significantly outperform the competitors through extensive simulations and the functional connectivity analysis of two neuroimaging datasets. Supplementary materials for this article are available online.

1 Introduction

Covariance function estimation plays an important role in functional data analysis, but existing methods are largely restricted to data with a single random function or a small number of them. Recent advances in technology have made multivariate or even high-dimensional functional datasets increasingly common in various applications: for example, time-course gene expression data in genomics (Storey et al. Citation2005), air pollution data in environmental studies (Kong et al. Citation2016) and different types of brain imaging data in neuroscience (Li and Solea Citation2018; Qiao, Guo, and James Citation2019). Under such scenarios, suppose we observe $n$ independent samples $X_i(\cdot)=\{X_{i1}(\cdot),\dots,X_{ip}(\cdot)\}^{\mathrm{T}}$ for $i=1,\dots,n$, defined on a compact interval $\mathcal{U}$, with covariance function
$$\Sigma(u,v)=\{\Sigma_{jk}(u,v)\}_{p\times p}=\operatorname{cov}\{X_i(u),X_i(v)\},\quad u,v\in\mathcal{U}.$$

From a heuristic viewpoint, we can simply treat each curve $X_{ij}(\cdot)$ as an infinitely long vector, so that the $(j,k)$th entry of $\Sigma$, $\Sigma_{jk}(\cdot,\cdot)=\operatorname{cov}\{X_{ij}(\cdot),X_{ik}(\cdot)\}$, is the cross-covariance matrix of two infinitely long vectors. Then $\Sigma$ can be understood as a block matrix of infinite size with $(j,k)$th block $\Sigma_{jk}(\cdot,\cdot)$. Besides being of interest in itself, an estimator of $\Sigma$ is useful for many applications including, for example, multivariate functional principal components analysis (FPCA) (Happ and Greven Citation2018), multivariate functional linear regression (Chiou, Yang, and Chen Citation2016), functional factor models (Guo, Qiao, and Wang Citation2022) and functional classification (Park, Ahn, and Jeon Citation2021). See Section 2.3 for details.

Our article focuses on estimating $\Sigma$ under high-dimensional scaling, where $p$ can be comparable to, or even larger than, $n$. In this setting, the sample covariance function
$$\widehat{\Sigma}(u,v)=\{\widehat{\Sigma}_{jk}(u,v)\}_{p\times p}=\frac{1}{n-1}\sum_{i=1}^{n}\{X_i(u)-\bar{X}(u)\}\{X_i(v)-\bar{X}(v)\}^{\mathrm{T}},\quad u,v\in\mathcal{U},$$
where $\bar{X}(\cdot)=n^{-1}\sum_{i=1}^{n}X_i(\cdot)$, performs poorly, and some lower-dimensional structural assumptions need to be imposed to estimate $\Sigma$ consistently. In contrast to extensive work on estimating high-dimensional sparse covariance matrices (Bickel and Levina Citation2008; Rothman, Levina, and Zhu Citation2009; Cai and Liu Citation2011; Chen and Leng Citation2016; Avella-Medina et al. Citation2018; Wang et al. Citation2021), research on sparse covariance function estimation in high dimensions remains largely unaddressed in the literature.

In this article, we consider estimating sparse covariance functions via adaptive functional thresholding, in the sense of shrinking some blocks $\widehat{\Sigma}_{jk}(\cdot,\cdot)$ in an adaptive way. To achieve this, we introduce a new class of functional thresholding operators that combine functional versions of thresholding and shrinkage based on the Hilbert–Schmidt norm of functions, and develop an adaptive functional thresholding procedure on $\widehat{\Sigma}(\cdot,\cdot)$ using entry-dependent functional thresholds that automatically adapt to the variability of the blocks $\widehat{\Sigma}_{jk}(\cdot,\cdot)$. To provide theoretical guarantees for our method under high-dimensional scaling, it is essential to develop standardized concentration results that take the variability adjustment into account. Compared with adaptive thresholding for nonfunctional data (Cai and Liu Citation2011), the intrinsic infinite-dimensionality of each $X_{ij}(\cdot)$ leads to a substantial rise in the complexity of sparsity modeling and theoretical analysis, as one needs to rely on some functional norm of the standardized $\widehat{\Sigma}_{jk}$'s, for example, the Hilbert–Schmidt norm, to enforce functional sparsity in $\widehat{\Sigma}$, and to tackle additional technical challenges for standardized processes within an abstract Hilbert space. To handle the practical scenario where functions are partially observed with errors, it is desirable to apply nonparametric smoothers in conjunction with adaptive functional thresholding. This poses a computationally intensive task especially when $p$ is large, thus calling for the development of a fast implementation strategy.

There are many applications of the proposed sparse covariance function estimation method in neuroimaging analysis, where brain signals are measured over time at a large number of regions of interest (ROIs) for each individual. Examples include brain-computer interface classification (Lotte et al. Citation2018) and brain functional connectivity identification (Rogers et al. Citation2007). Traditional neuroimaging analysis models brain signals for each subject as multivariate random variables, with each ROI represented by a random variable, and hence the covariance/correlation matrices of interest are estimated by treating the time-course data of each ROI as repeated observations. However, due to the nonstationary and dynamic features of the signals (Chang and Glover Citation2010), the strategy of averaging over time fails to characterize the time-varying structure, leading to a loss of information in the original space. To overcome these drawbacks, we follow recent proposals that model signals directly as multivariate random functions with each ROI represented by a random function (Li and Solea Citation2018; Qiao, Guo, and James Citation2019; Zapata, Oh, and Petersen Citation2022; Lee et al. in press). The identified functional sparsity pattern in our estimate of $\Sigma$ can be used to recover the functional connectivity network among different ROIs, which is illustrated using examples of functional magnetic resonance imaging (fMRI) datasets in Section 6 and Section E.3 of the supplementary material.

Our article makes useful contributions on multiple fronts. On the method side, it generalizes the thresholding/sparsity concept in multivariate statistics to the functional setting and offers a novel adaptive functional thresholding proposal to handle the heteroscedasticity inherent in sparse covariance function estimation, motivated by neuroimaging analysis and many statistical applications, for example, those in Section 2.3 and Section C.2 of the supplementary material. It also provides an alternative way of identifying correlation-based functional connectivity with no need to specify the correlation function, whose estimation poses challenges as the inverses of the $\Sigma_{jj}(u,v)$'s are unbounded. In practice, when functions are observed with errors at either a dense grid of points or a small subset of points, we also develop a unified local linear smoothing approach to obtain the smoothed adaptive functional thresholding estimator, together with a fast implementation via binning (Fan and Marron Citation1994) that speeds up the computation without sacrificing estimation accuracy. On the theory side, we show that the proposed estimators enjoy convergence and support recovery properties under both fully and partially observed functional scenarios when $p$ grows exponentially fast relative to $n$. The proofs rely on tools from empirical process theory, owing to the infinite-dimensional nature of functional data, and on some novel standardized concentration bounds in the Hilbert–Schmidt norm to deal with issues of high-dimensionality and variance adjustment. Our theoretical results and techniques are general and can be applied to other settings in high-dimensional functional data analysis.

The remainder of this article is organized as follows. Section 2 introduces a class of functional thresholding operators, based on which we propose the adaptive functional thresholding of the sample covariance function; we then discuss several applications of sparse covariance function estimation. Section 3 presents the convergence and support recovery analysis of our proposed estimator. In Section 4, we develop a nonparametric smoothing approach and its binned implementation to deal with partially observed functional data, and then investigate its theoretical properties. In Sections 5 and 6, we demonstrate the uniform superiority of the adaptive functional thresholding estimators over their universal counterparts through an extensive set of simulation studies and the functional connectivity analysis of a neuroimaging dataset, respectively. All technical proofs are relegated to the supplementary material. We also provide code to reproduce the results of the simulations and real data analysis in the supplementary materials.

2 Methodology

2.1 Functional Thresholding

We begin by introducing some notation. Let $L_2(\mathcal{U})$ denote the Hilbert space of square integrable functions defined on $\mathcal{U}$, and let $\mathbb{S}=L_2(\mathcal{U})\otimes L_2(\mathcal{U})$, where $\otimes$ denotes the Kronecker product. For any $Q\in\mathbb{S}$, we denote its Hilbert–Schmidt norm by $\|Q\|_{\mathcal{S}}=\{\iint Q(u,v)^2\,du\,dv\}^{1/2}$. With the aid of the Hilbert–Schmidt norm, for any regularization parameter $\lambda\geq 0$, we first define a class of functional thresholding operators $s_\lambda:\mathbb{S}\to\mathbb{S}$ that satisfy the following conditions:

  (i) $\|s_\lambda(Z)\|_{\mathcal{S}}\leq c\|Y\|_{\mathcal{S}}$ for all $Z,Y\in\mathbb{S}$ that satisfy $\|Z-Y\|_{\mathcal{S}}\leq\lambda$ and some $c>0$;

  (ii) $\|s_\lambda(Z)\|_{\mathcal{S}}=0$ for $\|Z\|_{\mathcal{S}}\leq\lambda$;

  (iii) $\|s_\lambda(Z)-Z\|_{\mathcal{S}}\leq\lambda$ for all $Z\in\mathbb{S}$.

Our proposed functional thresholding operators can be viewed as the functional generalization of thresholding operators (Cai and Liu Citation2011). Instead of a simple pointwise extension of such thresholding operators to the functional domain, we advocate a global thresholding rule based on the Hilbert–Schmidt norm of functions that encourages functional sparsity, in the sense that $s_\lambda(Z)(u,v)=0$ for all $u,v\in\mathcal{U}$ whenever $\|Z\|_{\mathcal{S}}\leq\lambda$ under condition (ii). Condition (iii) limits the amount of (global) functional shrinkage in the Hilbert–Schmidt norm to be no more than $\lambda$.

Conditions (i)–(iii) are satisfied by functional versions of some commonly adopted thresholding rules, which arise as solutions to the following penalized quadratic loss problem with various penalties:
$$s_\lambda(Z)=\underset{\theta\in\mathbb{S}}{\arg\min}\left\{\frac{1}{2}\|\theta-Z\|_{\mathcal{S}}^2+p_\lambda(\theta)\right\},\tag{1}$$
with $p_\lambda(\theta)=\tilde{p}_\lambda(\|\theta\|_{\mathcal{S}})$ being a penalty function of $\|\theta\|_{\mathcal{S}}$ that enforces functional sparsity.

The soft functional thresholding rule results from solving (1) with an $l_1/l_2$ type of penalty, $p_\lambda(\theta)=\lambda\|\theta\|_{\mathcal{S}}$, and takes the form $s_\lambda^{\mathrm{S}}(Z)=Z(1-\lambda/\|Z\|_{\mathcal{S}})_+$, where $(x)_+=\max(x,0)$ for $x\in\mathbb{R}$. This rule can be viewed as a functional generalization of the group lasso solution in the multivariate setting (Yuan and Lin Citation2006). Solving (1) with an $l_0/l_2$ type of penalty, $p_\lambda(\theta)=2^{-1}\lambda^2 I(\|\theta\|_{\mathcal{S}}\neq 0)$, yields the hard functional thresholding rule $ZI(\|Z\|_{\mathcal{S}}>\lambda)$, where $I(\cdot)$ is an indicator function. As a comparison, soft functional thresholding corresponds to the maximum amount of functional shrinkage allowed by condition (iii), whereas no shrinkage results from hard functional thresholding. Taking a compromise between soft and hard functional thresholding, we next propose functional versions of the SCAD (Fan and Li Citation2001) and adaptive lasso (Zou Citation2006) thresholding rules. With a SCAD penalty (Fan and Li Citation2001) operating on $\|\cdot\|_{\mathcal{S}}$ instead of $|\cdot|$ as in the univariate scalar case, the SCAD functional thresholding rule $s_\lambda^{\mathrm{SC}}(Z)$ coincides with soft functional thresholding if $\|Z\|_{\mathcal{S}}<2\lambda$, equals $Z\{(a-1)-a\lambda/\|Z\|_{\mathcal{S}}\}/(a-2)$ for $\|Z\|_{\mathcal{S}}\in[2\lambda,a\lambda]$, and equals $Z$ if $\|Z\|_{\mathcal{S}}>a\lambda$, where $a>2$. Analogously, the adaptive lasso functional thresholding rule is $s_\lambda^{\mathrm{AL}}(Z)=Z(1-\lambda^{\eta+1}/\|Z\|_{\mathcal{S}}^{\eta+1})_+$ with $\eta\geq 0$.
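To see why these conditions hold for, say, the soft rule, a short calculation suffices; the sketch below, which is our own derivation from the definitions above, verifies (i)–(iii) with $c=1$:

```latex
% Sketch: verification of conditions (i)--(iii) for the soft functional
% thresholding rule s^S_lambda(Z) = Z (1 - lambda/||Z||_S)_+ .
\begin{aligned}
\text{(i):}\quad & \|s_\lambda^{\mathrm{S}}(Z)\|_{\mathcal{S}}
   =(\|Z\|_{\mathcal{S}}-\lambda)_+
   \le(\|Y\|_{\mathcal{S}}+\|Z-Y\|_{\mathcal{S}}-\lambda)_+
   \le\|Y\|_{\mathcal{S}}
   \quad\text{whenever }\|Z-Y\|_{\mathcal{S}}\le\lambda,\ \text{so }c=1;\\
\text{(ii):}\quad & \|Z\|_{\mathcal{S}}\le\lambda
   \ \Longrightarrow\ (1-\lambda/\|Z\|_{\mathcal{S}})_+=0
   \ \Longrightarrow\ s_\lambda^{\mathrm{S}}(Z)=0;\\
\text{(iii):}\quad & \|s_\lambda^{\mathrm{S}}(Z)-Z\|_{\mathcal{S}}
   =\|Z\|_{\mathcal{S}}\min(\lambda/\|Z\|_{\mathcal{S}},\,1)\le\lambda .
\end{aligned}
```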

Our proposed functional generalizations of the soft, SCAD and adaptive lasso thresholding rules can be checked to satisfy conditions (i)–(iii); see Section B.1 of the supplementary material for details. To present a unified theoretical analysis, we focus on functional thresholding operators $s_\lambda(Z)$ satisfying conditions (i)–(iii). Note that, although hard functional thresholding does not satisfy condition (i), the theoretical results in Section 3 still hold for hard functional thresholding estimators under similar conditions, with the corresponding proofs differing only slightly. For functional data with some local spikes, one may instead consider a supremum-norm-based class of functional thresholding operators; see the detailed discussion in Section C.1 of the supplementary material.
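For concreteness, the following minimal Python sketch implements the four functional thresholding rules for bivariate functions sampled over an equally spaced grid of $\mathcal{U}=[0,1]$, with the Hilbert–Schmidt norm approximated by a Riemann sum; the function names and discretization are our own illustration, not the authors' implementation.

```python
import numpy as np

def hs_norm(Z, du):
    """Hilbert-Schmidt norm of a bivariate function sampled on an R x R grid,
    approximated by a Riemann sum: {integral of Z(u,v)^2 du dv}^(1/2)."""
    return np.sqrt(np.sum(Z ** 2) * du * du)

def soft_ft(Z, lam, du):
    # s^S_lambda(Z) = Z (1 - lambda/||Z||_S)_+
    r = hs_norm(Z, du)
    return Z * max(1.0 - lam / r, 0.0) if r > 0 else np.zeros_like(Z)

def hard_ft(Z, lam, du):
    # Z I(||Z||_S > lambda): keep the whole block or kill it entirely
    return Z if hs_norm(Z, du) > lam else np.zeros_like(Z)

def scad_ft(Z, lam, du, a=3.7):
    # soft below 2*lambda, linear compromise on [2*lambda, a*lambda], identity above
    r = hs_norm(Z, du)
    if r < 2 * lam:
        return soft_ft(Z, lam, du)
    if r <= a * lam:
        return Z * ((a - 1) - a * lam / r) / (a - 2)
    return Z

def adlasso_ft(Z, lam, du, eta=3.0):
    # s^AL_lambda(Z) = Z (1 - lambda^(eta+1)/||Z||_S^(eta+1))_+
    r = hs_norm(Z, du)
    return Z * max(1.0 - (lam / r) ** (eta + 1), 0.0) if r > 0 else np.zeros_like(Z)
```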

2.2 Estimation

We now discuss our estimation procedure based on $s_\lambda(Z)$. Note that the variance of $\widehat{\Sigma}_{jk}(u,v)$ depends on the distribution of $\{X_{ij}(u),X_{ik}(v)\}$ through higher-order moments, which makes this intrinsically a heteroscedastic problem. Hence, it is more desirable to use entry-dependent functional thresholds that automatically take into account the variability of the blocks $\widehat{\Sigma}_{jk}(\cdot,\cdot)$ to shrink some blocks to zero adaptively. To achieve this, define the variance factors $\Theta_{jk}(u,v)=\operatorname{var}\big([X_{ij}(u)-\mathrm{E}\{X_{ij}(u)\}][X_{ik}(v)-\mathrm{E}\{X_{ik}(v)\}]\big)$ with corresponding estimators
$$\widehat{\Theta}_{jk}(u,v)=\frac{1}{n}\sum_{i=1}^{n}\big[\{X_{ij}(u)-\bar{X}_j(u)\}\{X_{ik}(v)-\bar{X}_k(v)\}-\widehat{\Sigma}_{jk}(u,v)\big]^2,\quad j,k=1,\dots,p.$$

Then the adaptive functional thresholding estimator $\widehat{\Sigma}^{\mathcal{A}}=\{\widehat{\Sigma}_{jk}^{\mathcal{A}}(\cdot,\cdot)\}_{p\times p}$ is defined by
$$\widehat{\Sigma}_{jk}^{\mathcal{A}}=\widehat{\Theta}_{jk}^{1/2}\times s_\lambda\big(\widehat{\Sigma}_{jk}/\widehat{\Theta}_{jk}^{1/2}\big),\tag{2}$$
which uses a single threshold level to functionally threshold the standardized entries $\widehat{\Sigma}_{jk}/\widehat{\Theta}_{jk}^{1/2}$ for all $j,k$, resulting in entry-dependent functional thresholds for the $\widehat{\Sigma}_{jk}$'s. The selection of the optimal regularization parameter $\widehat{\lambda}$ is discussed in Section 5.
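As a minimal sketch of how (2) can be computed for fully observed curves on a grid, reusing the thresholding rules sketched in Section 2.1 (the array layout and function names are our own assumptions, not the authors' code):

```python
import numpy as np

def sample_cov_fun(X):
    """Sample covariance function on a grid; X is an (n, p, R) array with
    X[i, j, :] the curve X_ij evaluated at R grid points. Returns (p, p, R, R)."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    return np.einsum('ijr,iks->jkrs', Xc, Xc) / (n - 1)

def adaptive_functional_threshold(X, lam, rule=soft_ft):
    """Adaptive functional thresholding estimator of Eq. (2)."""
    n, p, R = X.shape
    du = 1.0 / (R - 1)                 # grid step, assuming U = [0, 1]
    Xc = X - X.mean(axis=0)            # center each curve
    est = np.empty((p, p, R, R))
    for j in range(p):
        for k in range(p):
            cross = Xc[:, j, :, None] * Xc[:, k, None, :]   # (n, R, R) products
            Sjk = cross.sum(axis=0) / (n - 1)               # entry of sample cov.
            Tjk = ((cross - Sjk) ** 2).mean(axis=0)         # variance factor
            # threshold the standardized entry, then scale back (Eq. (2))
            est[j, k] = np.sqrt(Tjk) * rule(Sjk / np.sqrt(Tjk), lam, du)
    return est
```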

An alternative approach to estimating $\Sigma$ is the universal functional thresholding estimator $\widehat{\Sigma}^{\mathcal{U}}=\{\widehat{\Sigma}_{jk}^{\mathcal{U}}(\cdot,\cdot)\}_{p\times p}$ with $\widehat{\Sigma}_{jk}^{\mathcal{U}}=s_\lambda(\widehat{\Sigma}_{jk})$, where a universal threshold level is used for all entries. In a similar spirit to Rothman, Levina, and Zhu (Citation2009), the consistency of $\widehat{\Sigma}^{\mathcal{U}}$ requires the assumption that the marginal-covariance functions are uniformly bounded in nuclear norm, that is, $\max_j\|\Sigma_{jj}\|_{\mathcal{N}}\leq M$, where $\|\Sigma_{jj}\|_{\mathcal{N}}=\int_{\mathcal{U}}\Sigma_{jj}(u,u)\,du$. Intuitively, however, such a universal method does not perform well when the nuclear norms vary over a wide range, and it can even fail when the uniform boundedness assumption is violated. Section 5 provides empirical evidence to support this intuition.

2.3 Applications

Many statistical problems involving multivariate functional data $\{X_i(\cdot)\}_{i=1}^{n}$ require estimating the covariance function $\Sigma$. Under a high-dimensional regime, the functional sparsity assumption can be imposed on $\Sigma$ to facilitate consistent sparse estimation. Here we outline three applications of our proposals for sparse covariance function estimation.

Our first application is multivariate FPCA, which serves as a natural dimension reduction approach for $X_i(\cdot)$. With the aid of the Karhunen–Loève expansion for multivariate functional data (Happ and Greven Citation2018), $X_i(\cdot)$ admits the expansion
$$X_i(\cdot)=\mathrm{E}\{X_i(\cdot)\}+\sum_{l=1}^{\infty}\xi_{il}\phi_l(\cdot),\quad i=1,\dots,n,\tag{3}$$
where the principal component scores $\xi_{il}=\sum_{j=1}^{p}\int[X_{ij}(u)-\mathrm{E}\{X_{ij}(u)\}]\phi_{lj}(u)\,du$ and the eigenfunctions $\phi_l(\cdot)=\{\phi_{l1}(\cdot),\dots,\phi_{lp}(\cdot)\}^{\mathrm{T}}$ are attainable by the eigenanalysis of $\Sigma$. Under a large $p$ scenario, we can adopt the proposed functional thresholding technique to obtain a sparse estimate of $\Sigma$, which guarantees the consistency of the estimated eigenvalue/eigenfunction pairs. In Section E.1 of the supplementary material, we follow the proposal of a normalized version of multivariate FPCA in Happ and Greven (Citation2018) and use a simulated example to illustrate the superior sample performance of our functional thresholding approaches.
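On a grid, the eigenanalysis of a (thresholded) estimate of $\Sigma$ reduces to an eigendecomposition of a $pR\times pR$ block matrix; a sketch follows, where the flattening and the quadrature weight `du` are our own discretization choices rather than a prescribed implementation.

```python
import numpy as np

def multivariate_fpca(Sigma, du, n_comp=3):
    """Leading eigenvalue/eigenfunction pairs of a covariance function estimate.
    Sigma: (p, p, R, R) array of Sigma_jk(u, v) on an equally spaced grid."""
    p, _, R, _ = Sigma.shape
    # Flatten blocks into a (pR x pR) matrix; du turns integrals into sums.
    G = Sigma.transpose(0, 2, 1, 3).reshape(p * R, p * R)
    vals, vecs = np.linalg.eigh(G * du)
    idx = np.argsort(vals)[::-1][:n_comp]
    # Rescale so each eigenfunction has unit norm: sum_j int phi_lj(u)^2 du = 1.
    phi = vecs[:, idx].T.reshape(n_comp, p, R) / np.sqrt(du)
    return vals[idx], phi
```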

Our second application, multivariate functional linear regression (Chiou, Yang, and Chen Citation2016), takes the form
$$Y_i=\beta_0+\int_{\mathcal{U}}X_i(u)^{\mathrm{T}}\beta(u)\,du+\epsilon_i,\quad i=1,\dots,n,\tag{4}$$
where $\beta(\cdot)=\{\beta_1(\cdot),\dots,\beta_p(\cdot)\}^{\mathrm{T}}$ is a $p$-vector of functional coefficients to be estimated. The standard three-step procedure involves performing (normalized) multivariate FPCA on the $X_i(\cdot)$'s based on $\widehat{\Sigma}$, then estimating the basis coefficient vector of $\beta(\cdot)$ and finally recovering the estimated functional coefficients; details are presented in Section E.1 of the supplementary material and Chiou, Yang, and Chen (Citation2016). When $p$ is large, we can implement our functional thresholding proposals to obtain consistent estimators of $\Sigma$ and hence of $\beta$. In Section E.1 of the supplementary material, we demonstrate via a simulated example the superiority of our adaptive-functional-thresholding-based estimator over its competitors.

Our third application considers another dimension reduction framework via the functional factor model (Guo, Qiao, and Wang Citation2022) of the form $X_i(\cdot)=\mathbf{A}f_i(\cdot)+\varepsilon_i(\cdot)$, where the common components are driven by $r$ functional factors $f_i(\cdot)=\{f_{i1}(\cdot),\dots,f_{ir}(\cdot)\}^{\mathrm{T}}$, the idiosyncratic components are $\varepsilon_i(\cdot)$, and $\mathbf{A}\in\mathbb{R}^{p\times r}$ is the factor loading matrix. Denote the covariance functions of $X_i(\cdot)$, $f_i(\cdot)$ and $\varepsilon_i(\cdot)$ by $\Sigma^X$, $\Sigma^f$ and $\Sigma^\varepsilon$, respectively. Under the orthogonality of $\mathbf{A}$, $\iint\Sigma^X(u,v)\Sigma^X(u,v)^{\mathrm{T}}\,du\,dv$ can be decomposed as the sum of $\mathbf{A}\iint\Sigma^f(u,v)\Sigma^f(u,v)^{\mathrm{T}}\,du\,dv\,\mathbf{A}^{\mathrm{T}}$ and remaining smaller-order terms. Intuitively, under certain identifiability conditions, $\mathbf{A}$ can be recovered by carrying out an eigenanalysis of $\iint\Sigma^X(u,v)\Sigma^X(u,v)^{\mathrm{T}}\,du\,dv$. To provide a parsimonious model and enhance interpretability for near-zero loadings, we can impose subspace sparsity conditions (Vu and Lei Citation2013) on $\mathbf{A}$, which results in a functionally sparse $\Sigma^X$, and hence our functional thresholding estimators become applicable. See Guo, Qiao, and Wang (Citation2022) for an application of our functional thresholding technique to improve the estimation quality when fitting a sparse functional factor model. See also Section C.2 of the supplementary material for other applications, including functional graphical model estimation (Qiao, Guo, and James Citation2019) and multivariate functional classification.

3 Theoretical Properties

We begin with some notation. For a random variable $W$, define $\|W\|_\psi=\inf\{c>0:\mathrm{E}[\psi(|W|/c)]\leq 1\}$, where $\psi:[0,\infty)\to[0,\infty)$ is a nondecreasing, nonzero convex function with $\psi(0)=0$, and the norm takes the value $\infty$ if no finite $c$ exists for which $\mathrm{E}[\psi(|W|/c)]\leq 1$. Denote $\psi_k(x)=\exp(x^k)-1$ for $k\geq 1$. Let the packing number $D(\epsilon,d)$ be the maximal number of points that can fit in the compact interval $\mathcal{U}$ while maintaining a distance greater than $\epsilon$ between all points with respect to the semimetric $d$. We refer to Chapter 8 of Kosorok (Citation2008) for further explanations. For $\{X_{ij}(u):u\in\mathcal{U},\,i=1,\dots,n,\,j=1,\dots,p\}$, define the standardized processes $Y_{ij}(u)=[X_{ij}(u)-\mathrm{E}\{X_{ij}(u)\}]/\sigma_j(u)^{1/2}$, where $\sigma_j(u)=\Sigma_{jj}(u,u)$.

To present the main theorems, we need the following regularity conditions.

Condition 1. (i) For each $i$ and $j$, $Y_{ij}(\cdot)$ is a separable stochastic process with the semimetric $d_j(u,v)=\|Y_{1j}(u)-Y_{1j}(v)\|_{\psi_2}$ for $u,v\in\mathcal{U}$; (ii) for some $u_0\in\mathcal{U}$, $\max_{1\leq j\leq p}\|Y_{1j}(u_0)\|_{\psi_2}$ is bounded.

Condition 2. The packing numbers satisfy $\max_{1\leq j\leq p}D(\epsilon,d_j)\leq C\epsilon^{-r}$ for some constants $C,r>0$ and all $\epsilon\in(0,1]$.

Condition 3. There exists some constant $\tau>0$ such that $\min_{j,k}\inf_{u,v\in\mathcal{U}}\operatorname{var}\{Y_{1j}(u)Y_{1k}(v)\}\geq\tau$.

Condition 4. The pair $(n,p)$ satisfies $\log p/n^{1/4}\to 0$ as $n,p\to\infty$.

Conditions 1 and 2 are standard conditions used to characterize the modulus of continuity of the sub-Gaussian processes $Y_{ij}(\cdot)$; see Chapter 8 of Kosorok (Citation2008). These conditions also imply that there exist some positive constants $C_0$ and $\eta$ such that $\mathrm{E}[\exp(t\|Y_{1j}\|^2)]\leq C_0$ for all $|t|\leq\eta$ and $j$, with $\|Y_{1j}\|=\{\int_{\mathcal{U}}Y_{1j}(u)^2\,du\}^{1/2}$, which plays a crucial role in our proof when applying concentration inequalities within Hilbert space. Condition 3 restricts the variances of the $Y_{ij}(u)Y_{ik}(v)$'s to be uniformly bounded away from zero so that they can be well estimated. It also facilitates the development of some standardized concentration results. This condition precludes the case of a Brownian motion $X_{ij}(\cdot)$ starting at 0 for some $j$. However, replacing $X_{ij}(\cdot)$ with a contaminated process $X_{ij}(\cdot)+\xi_{ij}$, where the $\xi_{ij}$'s are independent draws from a normal distribution with zero mean and a small variance, independent of the $X_{ij}(\cdot)$'s, Condition 3 is fulfilled while the cross-covariance structure in $\Sigma$ remains the same, in the sense that $\operatorname{cov}\{X_{ij}(u)+\xi_{ij},X_{ik}(v)\}=\operatorname{cov}\{X_{ij}(u),X_{ik}(v)\}$ for $k\neq j$ and $u,v\in\mathcal{U}$. Condition 4 allows the high-dimensional case, where $p$ can diverge at some exponential rate as $n$ increases.

We next establish the convergence rate of the adaptive functional thresholding estimator $\widehat{\Sigma}^{\mathcal{A}}$ over a large class of "approximately sparse" covariance functions defined by
$$\mathcal{C}(q,s_0(p),\epsilon_0;\mathcal{U})=\Big\{\Sigma:\Sigma\succeq 0,\ \max_{1\leq j\leq p}\sum_{k=1}^{p}\|\sigma_j\|^{(1-q)/2}\|\sigma_k\|^{(1-q)/2}\|\Sigma_{jk}\|_{\mathcal{S}}^{q}\leq s_0(p),\ \max_j\|\sigma_j^{-1}\|\,\|\sigma_j\|\leq\epsilon_0^{-1}<\infty\Big\}$$
for some $0\leq q<1$, where $\|\sigma_j\|=\sup_{u\in\mathcal{U}}\sigma_j(u)$ and $\Sigma\succeq 0$ means that $\Sigma=\{\Sigma_{jk}(\cdot,\cdot)\}_{p\times p}$ is positive semidefinite, that is, $\sum_{j,k}\iint\Sigma_{jk}(u,v)a_j(u)a_k(v)\,du\,dv\geq 0$ for any $a_j(\cdot)\in L_2(\mathcal{U})$, $j=1,\dots,p$. See Cai and Liu (Citation2011) for a similar class of covariance matrices for nonfunctional data. Compared with the class
$$\mathcal{C}^*(q,s_0(p),M;\mathcal{U})=\Big\{\Sigma:\Sigma\succeq 0,\ \max_j\|\sigma_j\|_{\mathcal{N}}\leq M,\ \max_j\sum_{k=1}^{p}\|\Sigma_{jk}\|_{\mathcal{S}}^{q}\leq s_0(p)\Big\},$$
over which the universal functional thresholding estimator $\widehat{\Sigma}^{\mathcal{U}}$ can be shown to be consistent, the columns of a covariance function in $\mathcal{C}(q,s_0(p),\epsilon_0;\mathcal{U})$ are required to lie within a weighted $l_q/l_2$ ball instead of a standard $l_q/l_2$ ball, where the weights are determined by the $\|\sigma_j\|$'s. Unlike $\mathcal{C}^*(q,s_0(p),M;\mathcal{U})$, the class $\mathcal{C}(q,s_0(p),\epsilon_0;\mathcal{U})$ no longer requires the uniform boundedness assumption on the $\|\sigma_j\|_{\mathcal{N}}$'s and allows $\max_j\|\sigma_j\|_{\mathcal{N}}\to\infty$. In the special case $q=0$, $\mathcal{C}(q,s_0(p),\epsilon_0;\mathcal{U})$ corresponds to a class of truly sparse covariance functions. Notably, $s_0(p)$ can depend on $p$ and be regarded implicitly as the restriction on functional sparsity.

Theorem 1. Suppose that Conditions 1–4 hold. Then there exists some constant $\delta>0$ such that, uniformly on $\mathcal{C}(q,s_0(p),\epsilon_0;\mathcal{U})$, if $\lambda=\delta(\log p/n)^{1/2}$,
$$\|\widehat{\Sigma}^{\mathcal{A}}-\Sigma\|_1=\max_{1\leq k\leq p}\sum_{j=1}^{p}\|\widehat{\Sigma}_{jk}^{\mathcal{A}}-\Sigma_{jk}\|_{\mathcal{S}}=O_P\Big\{s_0(p)\Big(\frac{\log p}{n}\Big)^{\frac{1-q}{2}}\Big\}.\tag{5}$$

Theorem 1 presents the convergence result in the functional version of the matrix $l_1$ norm. The rate in (5) is consistent with those of sparse covariance matrix estimates in Rothman, Levina, and Zhu (Citation2009) and Cai and Liu (Citation2011).

We finally turn to the support recovery consistency of $\widehat{\Sigma}^{\mathcal{A}}$ over the parameter space of truly sparse covariance functions defined by
$$\mathcal{C}_0(s_0(p);\mathcal{U})=\Big\{\Sigma:\Sigma\succeq 0,\ \max_{1\leq j\leq p}\sum_{k=1}^{p}I(\|\Sigma_{jk}\|_{\mathcal{S}}\neq 0)\leq s_0(p)\Big\},$$
which assumes that $(\Sigma_{jk})_{p\times p}$ has at most $s_0(p)$ nonzero functional entries in each row. The following theorem shows that, with the choice of $\lambda=\delta(\log p/n)^{1/2}$ for some constant $\delta>0$, $\widehat{\Sigma}^{\mathcal{A}}$ exactly recovers the support of $\Sigma$, $\operatorname{supp}(\Sigma)=\{(j,k):\|\Sigma_{jk}\|_{\mathcal{S}}\neq 0\}$, with probability approaching one.

Theorem 2. Suppose that Conditions 1–4 hold and $\|\Sigma_{jk}/\Theta_{jk}^{1/2}\|_{\mathcal{S}}>(2\delta+\gamma)(\log p/n)^{1/2}$ for all $(j,k)\in\operatorname{supp}(\Sigma)$ and some $\gamma>0$, where $\delta$ is as stated in Theorem 1. Then we have $\inf_{\Sigma\in\mathcal{C}_0}P\{\operatorname{supp}(\widehat{\Sigma}^{\mathcal{A}})=\operatorname{supp}(\Sigma)\}\to 1$ as $n\to\infty$.

Theorem 2 ensures that $\widehat{\Sigma}^{\mathcal{A}}$ achieves exact recovery of the functional sparsity structure in $\Sigma$, that is, the graph support in functional connectivity analysis, with probability tending to 1. This theorem holds under the condition that the Hilbert–Schmidt norms of the nonzero standardized functional entries exceed a certain threshold, which ensures that nonzero components are correctly retained. See an analogous minimum signal strength condition for sparse covariance matrices in Cai and Liu (Citation2011).

4 Partially Observed Functional Data

In this section we consider a practical scenario where each $X_{ij}(\cdot)$ is partially observed, with errors, at random measurement locations $U_{ij1},\dots,U_{ijL_{ij}}\in\mathcal{U}$. Let $Z_{ijl}$ be the observed value of $X_{ij}(U_{ijl})$. Then
$$Z_{ijl}=X_{ij}(U_{ijl})+\varepsilon_{ijl},\quad l=1,\dots,L_{ij},\tag{6}$$
where the $\varepsilon_{ijl}$'s are iid errors with $\mathrm{E}(\varepsilon_{ijl})=0$ and $\operatorname{var}(\varepsilon_{ijl})=\sigma^2$, independent of $X_{ij}(\cdot)$. For dense measurement designs all $L_{ij}$'s are larger than some order of $n$, while for sparse designs all $L_{ij}$'s are bounded (Zhang and Wang Citation2016; Qiao et al. Citation2020).

4.1 Estimation Procedure

Based on the observed data $\{(U_{ijl},Z_{ijl})\}_{1\leq i\leq n,\,1\leq j\leq p,\,1\leq l\leq L_{ij}}$, we next present a unified estimation procedure that handles both densely and sparsely sampled functional data.

We first develop a nonparametric smoothing approach to estimate the $\Sigma_{jk}(u,v)$'s. Without loss of generality, we assume that $X_i(\cdot)$ has been centered to have mean zero. Denote $K_h(\cdot)=h^{-1}K(\cdot/h)$ for a univariate kernel function $K$ with a bandwidth $h>0$. A local linear surface smoother (LLS) is employed to estimate the cross-covariance functions $\Sigma_{jk}(u,v)$ ($j\neq k$) by minimizing
$$\sum_{i=1}^{n}\sum_{l=1}^{L_{ij}}\sum_{m=1}^{L_{ik}}\big\{Z_{ijl}Z_{ikm}-\alpha_0-\alpha_1(U_{ijl}-u)-\alpha_2(U_{ikm}-v)\big\}^2 K_{h_C}(U_{ijl}-u)K_{h_C}(U_{ikm}-v)\tag{7}$$
with respect to $(\alpha_0,\alpha_1,\alpha_2)$. Let the minimizer of (7) be $(\widehat{\alpha}_0,\widehat{\alpha}_1,\widehat{\alpha}_2)$; the resulting estimator is $\widetilde{\Sigma}_{jk}(u,v)=\widehat{\alpha}_0$. To estimate the marginal-covariance functions $\Sigma_{jj}(u,v)$, we observe that $\operatorname{cov}(Z_{ijl},Z_{ijm})=\Sigma_{jj}(U_{ijl},U_{ijm})+\sigma^2 I(l=m)$, and hence apply an LLS to the off-diagonals of the raw covariances $(Z_{ijl}Z_{ijm})_{1\leq l\neq m\leq L_{ij}}$. We consider minimizing
$$\sum_{i=1}^{n}\sum_{1\leq l\neq m\leq L_{ij}}\big\{Z_{ijl}Z_{ijm}-\beta_0-\beta_1(U_{ijl}-u)-\beta_2(U_{ijm}-v)\big\}^2 K_{h_M}(U_{ijl}-u)K_{h_M}(U_{ijm}-v)$$
with respect to $(\beta_0,\beta_1,\beta_2)$, thus obtaining the estimate $\widetilde{\Sigma}_{jj}(u,v)=\widehat{\beta}_0$. Note that we drop the subscripts $j,k$ of $h_{C,jk}$ and $j$ of $h_{M,j}$ to simplify notation in this section; however, we select different bandwidths $h_{C,jk}$ and $h_{M,j}$ across $j,k=1,\dots,p$ in our empirical studies.
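A hedged sketch of the LLS at a single evaluation point $(u,v)$ follows, solving the weighted least squares problem (7) directly; the Gaussian kernel and the data layout are our own assumptions, and in practice one evaluates this over a grid of $(u,v)$ pairs.

```python
import numpy as np

def k_gauss(x, h):
    # Gaussian kernel K_h(x) = h^{-1} K(x/h)
    return np.exp(-0.5 * (x / h) ** 2) / (h * np.sqrt(2 * np.pi))

def lls_cross_cov(U_j, Z_j, U_k, Z_k, u, v, h):
    """Local linear surface smoother for Sigma_jk(u, v), j != k, via Eq. (7).
    U_j, Z_j (resp. U_k, Z_k): lists over subjects i of location and (centered)
    observation arrays for variable j (resp. k). Returns the intercept alpha_0."""
    design, resp, wts = [], [], []
    for Uij, Zij, Uik, Zik in zip(U_j, Z_j, U_k, Z_k):
        du_ = Uij[:, None] - u                    # (L_ij, 1)
        dv_ = Uik[None, :] - v                    # (1, L_ik)
        w = k_gauss(du_, h) * k_gauss(dv_, h)     # product kernel weights
        y = Zij[:, None] * Zik[None, :]           # raw cross-products Z_ijl Z_ikm
        ones = np.ones_like(w)
        design.append(np.stack([ones, du_ + 0 * dv_, 0 * du_ + dv_],
                               axis=-1).reshape(-1, 3))
        resp.append(y.ravel())
        wts.append(w.ravel())
    Xd = np.vstack(design)
    y, w = np.concatenate(resp), np.concatenate(wts)
    sw = np.sqrt(w)                               # weighted least squares
    beta, *_ = np.linalg.lstsq(Xd * sw[:, None], y * sw, rcond=None)
    return beta[0]                                # estimate of Sigma_jk(u, v)
```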

To construct the corresponding adaptive functional thresholding estimator, a standard approach is to incorporate the variance effect of each $\widetilde{\Sigma}_{jk}(u,v)$ into functional thresholding. However, estimating the $\operatorname{var}\{\widetilde{\Sigma}_{jk}(u,v)\}$'s involves estimating multiple complicated fourth-moment terms (Zhang and Wang Citation2016), which results in a high computational burden especially for large $p$. Since our focus is on characterizing the main variability of $\widetilde{\Sigma}_{jk}(u,v)$ rather than estimating its variance precisely, we next develop a computationally simple yet effective approach to estimate the main terms in the asymptotic variance of $\widetilde{\Sigma}_{jk}(u,v)$. For $a,b=0,1,2$, let
$$T_{ab,ijk}(u,v)=\sum_{l=1}^{L_{ij}}\sum_{m=1}^{L_{ik}}g_{ab}\{h_C,(u,v),(U_{ijl},U_{ikm})\}Z_{ijl}Z_{ikm},\tag{8}$$
where $g_{ab}\{h,(u,v),(U_{ijl},U_{ikm})\}=K_h(U_{ijl}-u)K_h(U_{ikm}-v)(U_{ijl}-u)^a(U_{ikm}-v)^b$. According to Section D.1 of the supplementary material, minimizing (7) yields the estimator
$$\widetilde{\Sigma}_{jk}=\sum_{i=1}^{n}\big(W_{1,jk}T_{00,ijk}+W_{2,jk}T_{10,ijk}+W_{3,jk}T_{01,ijk}\big),\tag{9}$$
where $W_{1,jk},W_{2,jk},W_{3,jk}$ can be represented via (S.12) in terms of
$$S_{ab,jk}(u,v)=\sum_{i=1}^{n}\sum_{l=1}^{L_{ij}}\sum_{m=1}^{L_{ik}}g_{ab}\{h_C,(u,v),(U_{ijl},U_{ikm})\},\quad a,b=0,1,2.\tag{10}$$

It is notable that the estimator $\widetilde{\Sigma}_{jk}$ in (9) is expressed as the sum of $n$ independent terms. Ignoring the cross-covariances among observations within a subject, which are dominated by the corresponding variances, we propose a surrogate estimator for the asymptotic variance of $\widetilde{\Sigma}_{jk}$,
$$\widetilde{\Psi}_{jk}=I_{jk}\sum_{i=1}^{n}\big(W_{1,jk}V_{00,ijk}+W_{2,jk}V_{10,ijk}+W_{3,jk}V_{01,ijk}\big)^2,\tag{11}$$
where
$$I_{jk}=\Big(\sum_{i=1}^{n}L_{ij}L_{ik}\Big)^2\Big\{\sum_{i=1}^{n}\big(L_{ij}L_{ik}h_C^{-2}+L_{ij}^2L_{ik}h_C^{-1}+L_{ij}L_{ik}^2h_C^{-1}+L_{ij}^2L_{ik}^2\big)\Big\}^{-1},\tag{12}$$
$$V_{ab,ijk}(u,v)=\sum_{l=1}^{L_{ij}}\sum_{m=1}^{L_{ik}}g_{ab}\{h_C,(u,v),(U_{ijl},U_{ikm})\}\{Z_{ijl}Z_{ikm}-\widetilde{\Sigma}_{jk}(u,v)\}.\tag{13}$$

The rationale for multiplying by the rate $I_{jk}$ in (11) is to ensure that $\widetilde{\Psi}_{jk}(u,v)$ converges to some finite function as $n\to\infty$ and $h_C\to 0$, as justified in Section D.4 of the supplementary material. In particular, the rate $I_{jk}$ simplifies to $\sum_{i=1}^{n}L_{ij}L_{ik}h_C^{2}$ in the sparse or moderately dense case and to $(\sum_{i=1}^{n}L_{ij}L_{ik})^2(\sum_{i=1}^{n}L_{ij}^2L_{ik}^2)^{-1}$ in the very dense case. Note that $I_{jk}$ is imposed in (11) mainly for theoretical purposes and hence does not place a practical constraint on our method.

Following a similar procedure, the estimated variance factor $\widetilde{\Psi}_{jj}$ of $\widetilde{\Sigma}_{jj}$ for each $j$ can be obtained by operating on $\{Z_{ijl}Z_{ijm}\}_{1\leq i\leq n,\,1\leq l\neq m\leq L_{ij}}$ instead of $\{Z_{ijl}Z_{ikm}\}_{1\leq i\leq n,\,1\leq l\leq L_{ij},\,1\leq m\leq L_{ik}}$ for $j\neq k$. Substituting $\widetilde{\Psi}_{jk}$ for $\widehat{\Theta}_{jk}$ in (2), we obtain the smoothed adaptive functional thresholding estimator
$$\widetilde{\Sigma}^{\mathcal{A}}=(\widetilde{\Sigma}_{jk}^{\mathcal{A}})_{p\times p}\quad\text{with}\quad\widetilde{\Sigma}_{jk}^{\mathcal{A}}=\widetilde{\Psi}_{jk}^{1/2}\times s_\lambda\big(\widetilde{\Sigma}_{jk}/\widetilde{\Psi}_{jk}^{1/2}\big).\tag{14}$$

For comparison, we also define the smoothed universal functional thresholding estimator $\widetilde{\Sigma}^{\mathcal{U}}=(\widetilde{\Sigma}_{jk}^{\mathcal{U}})_{p\times p}$ with $\widetilde{\Sigma}_{jk}^{\mathcal{U}}=s_\lambda(\widetilde{\Sigma}_{jk})$.

A natural alternative to the proposed LLS-based smoothing procedure is to pre-smooth each individual curve. For densely sampled functional data, the observations $Z_{ij1},\dots,Z_{ijL_{ij}}$ for each $i$ and $j$ can be pre-smoothed through the local linear smoother to eliminate the contaminating noise, thus producing reconstructed random curves $\widehat{X}_{ij}(\cdot)$ before subsequent analysis (Zhang and Chen Citation2007); see the detailed implementation of pre-smoothing in Section D.2 of the supplementary material. For sparsely sampled functional data, such a pre-smoothing step is not viable, whereas our smoothing proposal borrows strength across functions by incorporating information from all the observations and hence remains applicable. See also Section 5.3 for a numerical comparison between pre-smoothing and our smoothing approach under different measurement designs.

4.2 Theoretical Properties

In this section, we investigate the theoretical properties of $\widetilde{\Sigma}^{\mathcal{A}}$ for partially observed functional data. We begin by introducing some notation. For two positive sequences $\{a_n\}$ and $\{b_n\}$, we write $a_n\lesssim b_n$ if there exists a positive constant $c_0$ such that $a_n/b_n\leq c_0$, and we write $a_n\asymp b_n$ if and only if $a_n\lesssim b_n$ and $b_n\lesssim a_n$ hold simultaneously. Before presenting the theory, we impose the following regularity conditions.

Condition 5. (i) Let $\{U_{ijl}:i=1,\dots,n,\,j=1,\dots,p,\,l=1,\dots,L_{ij}\}$ be iid copies of a random variable $U$ with density $f_U(\cdot)$ defined on the compact set $\mathcal{U}$, with the $L_{ij}$'s fixed. There exist some constants $m_f$ and $M_f$ such that $0<m_f\leq\inf_{\mathcal{U}}f_U(u)\leq\sup_{\mathcal{U}}f_U(u)\leq M_f<\infty$; (ii) $X_{ij}$, $\varepsilon_{ijl}$ and $U_{ijl}$ are independent for each $i,j,l$.

Condition 6. (i) Under the sparse measurement design, $L_{ij}\leq L_0<\infty$ for all $i,j$, and under the dense design, $L_{ij}=L\to\infty$ as $n\to\infty$ with the $U_{ijl}$'s independent of $i$; (ii) the bandwidth parameters satisfy $h_C\asymp h_M\asymp h\to 0$ as $n\to\infty$.

Condition 5 is standard in the functional data analysis literature (Zhang and Wang Citation2016). Condition 6(i) treats the number of measurement locations $L_{ij}$ as bounded under the sparse measurement design and as diverging under the dense design. To simplify notation, we assume that $L_{ij}=L$ in the dense case and that $h_C$ is of the same order as $h_M$ in Condition 6(ii).

Condition 7. There exists some constant $\gamma_1\in(0,1/2]$ such that, with probability approaching one,
$$\max_{1\leq j,k\leq p}\|\widetilde{\Sigma}_{jk}-\Sigma_{jk}\|_{\mathcal{S}}\lesssim\Big(\frac{\log p}{n^{2\gamma_1}}\Big)^{1/2}+h^2.\tag{15}$$

Condition 8. There exist some positive constants $c_1$, $\gamma_2\in(0,1/2]$ and some deterministic functions $\Psi_{jk}(u,v)$ with $\min_{j,k}\inf_{u,v\in\mathcal{U}}\Psi_{jk}(u,v)\geq c_1$ such that, with probability approaching one,
$$\max_{1\leq j,k\leq p}\sup_{u,v\in\mathcal{U}}|\widetilde{\Psi}_{jk}(u,v)-\Psi_{jk}(u,v)|\lesssim\Big(\frac{\log p}{n^{2\gamma_2}}\Big)^{1/2}+h^2.\tag{16}$$

Condition 9. The pair $(n,p)$ satisfies $\log p/n^{\min(\gamma_1,\gamma_2)}\to 0$ and $\log p\geq c_2 n^{2\gamma_1}h^4$ for some positive constant $c_2$ as $n,p\to\infty$.

We follow Qiao et al. (Citation2020) in imposing Condition 7, in which the parameter $\gamma_1$ depends on $h$ and possibly on $L$ under the dense design. This condition is satisfied if there exist some positive constants $c_3,c_4,c_5$ such that for each $j,k=1,\dots,p$ and $t\in(0,1]$,
$$P\big(\|\widetilde{\Sigma}_{jk}-\Sigma_{jk}\|_{\mathcal{S}}\geq t+c_5h^2\big)\leq c_4\exp(-c_3n^{2\gamma_1}t^2).\tag{17}$$

The presence of $h^2$ comes from the standard results for bias terms under the boundedness condition on the second-order partial derivatives of $\Sigma_{jk}(u,v)$ over $\mathcal{U}^2$ (Yao, Müller, and Wang Citation2005; Zhang and Wang Citation2016). This concentration result holds under different measurement schedules ranging from sparse to dense designs as $\gamma_1$ increases. For sparsely sampled functional data, Lemma 4 of Qiao et al. (Citation2020) established an $L_2$ concentration inequality for $\widetilde{\Sigma}_{jk}$ with $j=k$, which not only results in the same $L_2$ rate as in the sparse case of Zhang and Wang (Citation2016) but also ensures (17) with the choice of $\gamma_1=1/2-a$ and $h\asymp n^{-a}$ for some positive constant $a<1/2$. Following the same proof procedure, the same concentration inequality also applies for $j\neq k$, and hence Condition 7 is satisfied. This condition is also satisfied by densely sampled functional data, since it follows from Lemma 5 of Qiao et al. (Citation2020) that (17) holds for $j=k$ and, with more effort, also for $j\neq k$ by choosing $\gamma_1=\min(1/2,\,1/3+b/6-\epsilon/2-2a/3)$ for some small constant $\epsilon>0$ when $h\asymp n^{-a}$ and $L\asymp n^{b}$ for some constants $a,b>0$. As $L$ grows sufficiently large, $\gamma_1=1/2$, thus leading to the same rate as in the ultra-dense case of Zhang and Wang (Citation2016). Condition 8 gives the uniform convergence rate for $\widetilde{\Psi}_{jk}(u,v)$ in the same form as (15) but with a different parameter $\gamma_2$. A denser measurement design corresponds to a larger value of $\gamma_2$ and a faster rate in (16). See the heuristic verification of Condition 8 in Section D.4 of the supplementary material. Condition 9 indicates that $p$ can grow exponentially fast relative to $n$.

We next present the convergence rate of the smoothed adaptive functional thresholding estimator $\widetilde{\Sigma}^{\mathcal{A}}$ over a class of "approximately sparse" covariance functions defined by
$$\widetilde{\mathcal{C}}(q,\tilde{s}_0(p),\epsilon_0;\mathcal{U})=\Big\{\Sigma:\Sigma\succeq 0,\ \max_{1\leq j\leq p}\sum_{k=1}^{p}\|\Psi_{jk}\|^{(1-q)/2}\|\Sigma_{jk}\|_{\mathcal{S}}^{q}\leq\tilde{s}_0(p),\ \max_{j,k}\|\Psi_{jk}^{-1}\|\,\|\Psi_{jk}\|\leq\epsilon_0^{-1}<\infty\Big\}$$
for some $0\leq q<1$.

Theorem 3. Suppose that Conditions 5–9 hold. Then there exists some constant $\tilde{\delta}>0$ such that, uniformly on $\widetilde{\mathcal{C}}(q,\tilde{s}_0(p),\epsilon_0;\mathcal{U})$, if $\lambda=\tilde{\delta}(\log p/n^{2\gamma_1})^{1/2}$,
$$\|\widetilde{\Sigma}^{\mathcal{A}}-\Sigma\|_1=\max_{1\leq k\leq p}\sum_{j=1}^{p}\|\widetilde{\Sigma}_{jk}^{\mathcal{A}}-\Sigma_{jk}\|_{\mathcal{S}}=O_P\Big\{\tilde{s}_0(p)\Big(\frac{\log p}{n^{2\gamma_1}}\Big)^{\frac{1-q}{2}}\Big\}.\tag{18}$$

The convergence rate of $\widetilde{\Sigma}^{\mathcal{A}}$ in (18) is governed by the internal parameters $(\gamma_1,q)$ and other dimensionality parameters. Larger values of $\gamma_1$ correspond to a more frequent measurement schedule with larger $L$ and result in a faster rate. The convergence result implicitly reveals an interesting phase transition phenomenon depending on the relative order of $L$ to $n$. When $L$ grows fast enough, $\gamma_1=1/2$ and the rate coincides with that for fully observed functional data in (5), showing that the theory for very densely sampled functional data falls in the parametric paradigm. When $L$ grows moderately fast, $\gamma_1<1/2$ and the rate is faster than that for sparsely sampled functional data but slower than the parametric rate.

We finally present Theorem 4, which guarantees the support recovery consistency of $\widetilde{\Sigma}^{\mathcal{A}}$.

Theorem 4. Suppose that Conditions 5–9 hold and $\|\Sigma_{jk}/\Psi_{jk}^{1/2}\|_{\mathcal{S}}>(2\tilde{\delta}+\tilde{\gamma})(\log p/n^{2\gamma_1})^{1/2}$ for all $(j,k)\in\operatorname{supp}(\Sigma)$ and some $\tilde{\gamma}>0$, where $\tilde{\delta}$ is as stated in Theorem 3. Then $\inf_{\Sigma\in\mathcal{C}_0}P\{\operatorname{supp}(\widetilde{\Sigma}^{\mathcal{A}})=\operatorname{supp}(\Sigma)\}\to 1$ as $n\to\infty$.

4.3 Fast Computation

Consider a common situation in practice where, for each $i=1,\dots,n$, we observe the noisy versions of $X_{i1}(\cdot),\dots,X_{ip}(\cdot)$ at the same set of points $U_{i1},\dots,U_{iL_i}\in\mathcal{U}$ across $j=1,\dots,p$. Then the original model in (6) simplifies to
$$Z_{ijl}=X_{ij}(U_{il})+\varepsilon_{ijl},\quad l=1,\dots,L_i,\tag{19}$$
under which the proposed estimation procedure in Section 4.1 can still be applied. Suppose that the estimated covariance function is evaluated at a grid of $R\times R$ locations $\{(u_{r_1},u_{r_2})\in\mathcal{U}^2:r_1,r_2=1,\dots,R\}$. When estimating the $p(p+1)/2$ marginal- and cross-covariance functions and the corresponding variance factors, LLSs under the simplified model in (19) reduce the number of kernel evaluations from $O(\sum_{i=1}^{n}\sum_{j=1}^{p}L_{ij}R)$ to $O(\sum_{i=1}^{n}L_iR)$, which substantially accelerates the computation under a high-dimensional regime.

Such a nonparametric smoothing approach is conceptually simple but suffers from a high computational cost in kernel evaluations. To further reduce the computational burden, we consider fast implementations of LLSs by adopting a simple approximation technique, known as linear binning (Fan and Marron Citation1994), for the covariance function estimation. The key idea of the binning method is to greatly reduce the number of kernel evaluations by exploiting the fact that many of these evaluations are nearly the same. We start by dividing $\mathcal{U}$ into an equally spaced grid of $R$ points, $u_1<\dots<u_R\in\mathcal{U}$, with binwidth $\Delta=u_2-u_1$. Denote by $w_r(U_{il})=\max(1-\Delta^{-1}|U_{il}-u_r|,0)$ the linear weight that $U_{il}$ assigns to the grid point $u_r$ for $r=1,\dots,R$. For the $i$th subject, we define its "binned weighted counts" and "binned weighted averages" as
$$\varpi_{r,i}=\sum_{l=1}^{L_i}w_r(U_{il})\quad\text{and}\quad D_{r,ij}=\sum_{l=1}^{L_i}w_r(U_{il})Z_{ijl},$$
respectively. The binned implementation of smoothed adaptive functional thresholding can then be carried out using this modified dataset $\{(\varpi_{r,i},D_{r,ij})\}_{1\leq i\leq n,\,1\leq j\leq p,\,1\leq r\leq R}$ and the related kernel functions $g_{ab}\{h,(u,v),(u_{r_1},u_{r_2})\}$ for $r_1,r_2=1,\dots,R$. It is notable that, with the help of such a binned implementation, the number of kernel evaluations required in the covariance function estimation is further reduced from $O(\sum_{i=1}^{n}L_iR)$ to $O(R)$, while only $O(\sum_{i=1}^{n}L_i)$ additional operations are involved for each $j$ in the binning step (Fan and Marron Citation1994).
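A minimal sketch of the linear binning step for one subject under Model (19) follows; the array layout is our own choice, and the two grid points bracketing each observation share its unit weight.

```python
import numpy as np

def linear_bin(U_i, Z_i, grid):
    """Binned weighted counts and averages for subject i.
    U_i: (L_i,) locations; Z_i: (L_i, p) observations across j; grid: (R,)
    equally spaced points on U. Returns counts (R,) and averages (p, R)."""
    R, delta = len(grid), grid[1] - grid[0]
    counts = np.zeros(R)
    avgs = np.zeros((Z_i.shape[1], R))
    for l, u in enumerate(U_i):
        # w_r(u) = max(1 - |u - u_r|/Delta, 0) is nonzero only at the two
        # nearest grid points.
        r = min(int((u - grid[0]) / delta), R - 2)
        w_hi = (u - grid[r]) / delta
        counts[r] += 1.0 - w_hi
        counts[r + 1] += w_hi
        avgs[:, r] += (1.0 - w_hi) * Z_i[l]
        avgs[:, r + 1] += w_hi * Z_i[l]
    return counts, avgs
```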

We next illustrate the binned implementation of the LLS, denoted by BinLLS, using the example of the smoothed estimates $\widetilde{\Sigma}_{jk}$ for $j\neq k$ in (9). Under Model (19), we drop the subscripts $j,k$ in $W_{1,jk}$, $W_{2,jk}$, $W_{3,jk}$ and $S_{ab,jk}$ owing to the common set of points $\{U_{i1},\dots,U_{iL_i}\}$ across $j,k$. Denote the binned approximations of $T_{ab,ijk}$ and $S_{ab}$ by $\check{T}_{ab,ijk}$ and $\check{S}_{ab}$, respectively. It follows from (8) and (10) that
$$\check{T}_{ab,ijk}(u,v)=\sum_{r_1=1}^{R}\sum_{r_2=1}^{R}g_{ab}\{h_C,(u,v),(u_{r_1},u_{r_2})\}D_{r_1,ij}D_{r_2,ik},\qquad\check{S}_{ab}(u,v)=\sum_{i=1}^{n}\sum_{r_1=1}^{R}\sum_{r_2=1}^{R}g_{ab}\{h_C,(u,v),(u_{r_1},u_{r_2})\}\varpi_{r_1,i}\varpi_{r_2,i},$$
both of which together with (9) yield the binned approximation of $\widetilde{\Sigma}_{jk}$ as $\check{\Sigma}_{jk}=\sum_{i=1}^{n}(\check{W}_1\check{T}_{00,ijk}+\check{W}_2\check{T}_{10,ijk}+\check{W}_3\check{T}_{01,ijk})$, where $\check{W}_1,\check{W}_2$ and $\check{W}_3$ are the binned approximations of $W_1$, $W_2$ and $W_3$, computed by replacing the related $S_{ab}$'s in (S.12) of the supplementary material with the $\check{S}_{ab}$'s. It is worth noting that, for each pair $(j,k)$, this binned implementation reduces the number of operations (i.e., additions and multiplications) from $O(R^2\sum_{i=1}^{n}L_i^2)$ to $O(nR^2+R^4)$, since the kernel evaluations in $g_{ab}\{h_C,(u,v),(u_{r_1},u_{r_2})\}$ no longer depend on individual observations. Table 1 presents the computational complexity analysis of LLS and BinLLS under Models (6) and (19). It reveals that the binned implementation dramatically improves the computational speed for both densely and sparsely sampled functional data, which is also supported by the empirical evidence in Section 5.3.

Table 1 Computational complexity of LLS and BinLLS under Models (6) and (19) when evaluating the corresponding smoothed covariance function estimates at a grid of R × R points.

To aid the binned implementation of the smoothed adaptive functional thresholding estimator, we then derive the binned approximation of the variance factor $\widetilde{\Psi}_{jk}$, denoted by $\check{\Psi}_{jk}$. It follows from (13) that $V_{ab,ijk}$ can be approximated by
$$\check{V}_{ab,ijk}(u,v)=\sum_{r_1=1}^{R}\sum_{r_2=1}^{R}g_{ab}\{h_C,(u,v),(u_{r_1},u_{r_2})\}\{D_{r_1,ij}D_{r_2,ik}-\check{\Sigma}_{jk}(u,v)\varpi_{r_1,i}\varpi_{r_2,i}\}.$$

Substituting each term in (11) with its binned approximation, we obtain $\check{\Psi}_{jk}=I_{jk}\sum_{i=1}^{n}(\check{W}_1\check{V}_{00,ijk}+\check{W}_2\check{V}_{10,ijk}+\check{W}_3\check{V}_{01,ijk})^2$.

It is worth mentioning that, when $j=k$, the binned approximations of $\widetilde{\Sigma}_{jj}$ and $\widetilde{\Psi}_{jj}$ can be computed in a similar fashion except that the terms corresponding to $r_1=r_2$ should be excluded from all double summations over $\{1,\dots,R\}^2$. Finally, we obtain the binned adaptive functional thresholding estimator $\check{\Sigma}^{\mathcal{A}}=(\check{\Sigma}_{jk}^{\mathcal{A}})_{p\times p}$ with $\check{\Sigma}_{jk}^{\mathcal{A}}=\check{\Psi}_{jk}^{1/2}\times s_\lambda(\check{\Sigma}_{jk}/\check{\Psi}_{jk}^{1/2})$ and the corresponding universal thresholding estimator $\check{\Sigma}^{\mathcal{U}}=(\check{\Sigma}_{jk}^{\mathcal{U}})_{p\times p}$ with $\check{\Sigma}_{jk}^{\mathcal{U}}=s_\lambda(\check{\Sigma}_{jk})$.

5 Simulations

5.1 Setup

We conduct a number of simulations to compare adaptive functional thresholding estimators to universal functional thresholding estimators. Sections 5.2 and 5.3 consider scenarios where random functions are fully and partially observed, respectively.

In each scenario, to mimic the infinite-dimensionality of random curves, we generate functional variables by $X_{ij}(u)=s(u)^{\mathrm{T}}\theta_{ij}$ for $i=1,\dots,n$, $j=1,\dots,p$ and $u\in\mathcal{U}=[0,1]$, where $s(u)$ is a 50-dimensional Fourier basis function and $\theta_i=(\theta_{i1}^{\mathrm{T}},\dots,\theta_{ip}^{\mathrm{T}})^{\mathrm{T}}\in\mathbb{R}^{50p}$ is generated from a mean zero multivariate Gaussian distribution with block covariance matrix $\Omega\in\mathbb{R}^{50p\times 50p}$, whose $(j,k)$th block is $\Omega_{jk}\in\mathbb{R}^{50\times 50}$ for $j,k=1,\dots,p$. The functional sparsity pattern in $\Sigma=\{\Sigma_{jk}(\cdot,\cdot)\}_{p\times p}$, with $(j,k)$th entry $\Sigma_{jk}(u,v)=s(u)^{\mathrm{T}}\Omega_{jk}s(v)$, can be characterized by the block sparsity structure in $\Omega$. Define $\Omega_{jk}=\omega_{jk}D$ with $D=\operatorname{diag}(1^{-2},\dots,50^{-2})$, and hence $\operatorname{cov}(\theta_{ijk},\theta_{ijk'})\asymp k^{-2}I(k=k')$ for $k,k'=1,\dots,50$. Then we generate $\Omega$ with different block sparsity patterns as follows; a code sketch of this data-generating mechanism appears after the model definitions.

  • Model 1 (block banded). For $j,k=1,\dots,p/2$, $\omega_{jk}=(1-|j-k|/10)_+$. For $j,k=p/2+1,\dots,p$, $\omega_{jk}=4I(j=k)$.

  • Model 2 (block sparse without any special structure). For $j,k=p/2+1,\dots,p$, $\omega_{jk}=4I(j=k)$. For $j,k=1,\dots,p/2$, we generate $\omega=(\omega_{jk})_{p/2\times p/2}=B+\delta I_{p/2}$, where the elements of $B$ are sampled independently from Uniform$[0.3,0.8]$ with probability 0.2 or set to 0 with probability 0.8, and $\delta=\max\{-\lambda_{\min}(B),0\}+0.01$ guarantees the positive definiteness of $\Omega$.
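The following sketch generates data under Model 1 (Model 2 differs only in how $\omega$ is filled); the Fourier basis construction and function names are our own assumptions, and the dense multivariate normal draw is only practical for moderate $p$.

```python
import numpy as np

def fourier_basis(u, K=50):
    """First K Fourier basis functions on [0, 1], evaluated at points u."""
    s = [np.ones_like(u)]
    k = 1
    while len(s) < K:
        s.append(np.sqrt(2) * np.sin(2 * np.pi * k * u))
        s.append(np.sqrt(2) * np.cos(2 * np.pi * k * u))
        k += 1
    return np.stack(s[:K])                        # (K, len(u))

def simulate_model1(n, p, grid, K=50, seed=None):
    """X_ij(u) = s(u)^T theta_ij with Omega_jk = omega_jk * D under Model 1."""
    rng = np.random.default_rng(seed)
    D = np.diag(np.arange(1, K + 1) ** -2.0)      # diag(1^-2, ..., 50^-2)
    omega = np.zeros((p, p))
    h = p // 2
    jj, kk = np.meshgrid(np.arange(h), np.arange(h), indexing='ij')
    omega[:h, :h] = np.maximum(1.0 - np.abs(jj - kk) / 10.0, 0.0)  # block banded
    omega[h:, h:] = 4.0 * np.eye(p - h)                            # block diagonal
    Omega = np.kron(omega, D)                     # (Kp, Kp) covariance of theta_i
    theta = rng.multivariate_normal(np.zeros(K * p), Omega, size=n)
    S = fourier_basis(grid, K)                    # (K, R)
    return theta.reshape(n, p, K) @ S             # (n, p, R) curves on the grid
```

For instance, `simulate_model1(100, 50, np.linspace(0, 1, 21))` returns a (100, 50, 21) array of curves evaluated on a 21-point grid.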

We implement a cross-validation approach (Bickel and Levina Citation2008) to choose the optimal thresholding parameter $\widehat{\lambda}$ in $\widehat{\Sigma}^{\mathcal{A}}$. Specifically, we randomly divide the sample $\{X_i:i=1,\dots,n\}$ into two subsamples of sizes $n_1$ and $n_2$, where $n_1=n(1-1/\log n)$ and $n_2=n/\log n$, and repeat this $N$ times. Let $\widehat{\Sigma}^{\mathcal{A},1(\nu)}(\lambda)$ and $\widehat{\Sigma}^{\mathcal{S},2(\nu)}$ be the adaptive functional thresholding estimator as a function of $\lambda$ and the sample covariance function, based on the $n_1$ and $n_2$ observations, respectively, from the $\nu$th split. We select the optimal $\widehat{\lambda}$ by minimizing
$$\widehat{\operatorname{err}}(\lambda)=N^{-1}\sum_{\nu=1}^{N}\big\|\widehat{\Sigma}^{\mathcal{A},1(\nu)}(\lambda)-\widehat{\Sigma}^{\mathcal{S},2(\nu)}\big\|_{\mathrm{F}}^2,$$
where $\|\cdot\|_{\mathrm{F}}$ denotes the functional version of the Frobenius norm, that is, for any $Q=\{Q_{jk}(\cdot,\cdot)\}_{p\times p}$ with each $Q_{jk}\in\mathbb{S}$, $\|Q\|_{\mathrm{F}}=(\sum_{j,k}\|Q_{jk}\|_{\mathcal{S}}^2)^{1/2}$. The optimal thresholding parameters in $\widehat{\Sigma}^{\mathcal{U}}$, $\widetilde{\Sigma}^{\mathcal{A}}$, $\widetilde{\Sigma}^{\mathcal{U}}$, $\check{\Sigma}^{\mathcal{A}}$ and $\check{\Sigma}^{\mathcal{U}}$ can be selected in a similar fashion.
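A sketch of this cross-validation scheme for fully observed curves, reusing `adaptive_functional_threshold` and `sample_cov_fun` from the earlier sketches; the constant grid factor in the Frobenius norm is dropped since it does not affect the argmin.

```python
import numpy as np

def cv_select_lambda(X, lambdas, rule, n_splits=5, seed=None):
    """Cross-validation choice of the thresholding parameter (Bickel and
    Levina, 2008); X is an (n, p, R) array of fully observed curves."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n2 = int(np.ceil(n / np.log(n)))              # validation size n / log(n)
    errs = np.zeros(len(lambdas))
    for _ in range(n_splits):
        perm = rng.permutation(n)
        valid, train = perm[:n2], perm[n2:]
        S2 = sample_cov_fun(X[valid])             # sample covariance on split 2
        for a, lam in enumerate(lambdas):
            est = adaptive_functional_threshold(X[train], lam, rule)
            errs[a] += np.sum((est - S2) ** 2)    # squared functional Frobenius
    return lambdas[int(np.argmin(errs))]
```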

5.2 Fully Observed Functional Data

We compare the adaptive functional thresholding estimator $\widehat{\Sigma}^{\mathcal{A}}$ with the universal functional thresholding estimator $\widehat{\Sigma}^{\mathcal{U}}$ under the hard, soft, SCAD (with $a=3.7$) and adaptive lasso (with $\eta=3$) functional thresholding rules, where the corresponding $\widehat{\lambda}$'s are selected by cross-validation with $N=5$. We generate $n=100$ observations for $p=50,100,150$ and replicate each simulation 100 times. We examine the performance of all competing approaches by estimation and support recovery accuracies. In terms of estimation accuracy, Table 2 reports numerical summaries of losses measured by functional versions of the Frobenius and matrix $l_1$ norms. To assess support recovery consistency, we present in Table 3 the average true positive rates (TPRs) and false positive rates (FPRs), defined as
$$\mathrm{TPR}=\frac{\#\{(j,k):\|\widehat{\Sigma}_{jk}\|_{\mathcal{S}}\neq 0\text{ and }\|\Sigma_{jk}\|_{\mathcal{S}}\neq 0\}}{\#\{(j,k):\|\Sigma_{jk}\|_{\mathcal{S}}\neq 0\}}\quad\text{and}\quad \mathrm{FPR}=\frac{\#\{(j,k):\|\widehat{\Sigma}_{jk}\|_{\mathcal{S}}\neq 0\text{ and }\|\Sigma_{jk}\|_{\mathcal{S}}=0\}}{\#\{(j,k):\|\Sigma_{jk}\|_{\mathcal{S}}=0\}}.$$
Since the results under Models 1 and 2 show similar trends, we only present the numerical results under Model 2 here to save space; see Tables 9 and 10 of the supplementary material for the results under Model 1.

Table 2 The average (standard error) functional matrix losses over 100 simulation runs.

Table 3 The average TPRs/FPRs over 100 simulation runs.

Several conclusions can be drawn from Tables 2–3 and Tables 9–10 of the supplementary material. First, in all scenarios, $\widehat{\Sigma}^{\mathcal{A}}$ provides substantially improved accuracy over $\widehat{\Sigma}^{\mathcal{U}}$ regardless of the thresholding rule or the loss used. We also obtain the sample covariance function $\widehat{\Sigma}^{\mathcal{S}}$, the results of which deteriorate severely compared with $\widehat{\Sigma}^{\mathcal{A}}$ and $\widehat{\Sigma}^{\mathcal{U}}$. Second, for support recovery, $\widehat{\Sigma}^{\mathcal{A}}$ again uniformly outperforms $\widehat{\Sigma}^{\mathcal{U}}$, which fails to recover the functional sparsity pattern especially when $p$ is large. Third, the adaptive functional thresholding approach using the hard and adaptive lasso functional thresholding rules tends to have lower losses and lower TPRs/FPRs than that using the soft and SCAD functional thresholding rules.

5.3 Partially Observed Functional Data

In this section, we assess the finite-sample performance of the LLS and BinLLS methods for handling partially observed functional data. We first generate random functions $X_{ij}(\cdot)$ for $i=1,\dots,n$, $j=1,\dots,p$ by the same procedure as in Section 5.1, with either nonsparse or sparse $\Sigma$ depending on $p$. We then generate the observed values $Z_{ijl}$ from (19), where the measurement locations $U_{il}$ and errors $\varepsilon_{ijl}$ are sampled independently from Uniform$[0,1]$ and $N(0,0.5^2)$, respectively. We consider settings with $n=100$ and $L_i=11,21,51,101$, ranging from sparse to moderately dense to very dense measurement schedules. We use the Gaussian kernel with optimal bandwidths proportional to $n^{-1/6}$, $(nL_i^2)^{-1/6}$ and $n^{-1/4}$, respectively, as suggested in Zhang and Wang (Citation2016); for the empirical work in this article we choose the proportionality constants in the range $(0,1]$, which gives good results in all settings we consider.

To compare BinLLS with LLS in terms of computational speed and estimation accuracy, we first consider a low-dimensional example with $p=6$ and nonsparse $\Sigma$, generated by modifying Model 1 with $\omega_{jk}=(1-|j-k|/10)_+$ for $j,k=1,\dots,6$. In addition to our proposed smoothing methods, we also implement local-linear-smoother-based pre-smoothing and its binned implementation, denoted by LLS-P and BinLLS-P, respectively. Table 4 reports numerical summaries of estimation errors evaluated at $R=21$ equally spaced points in $[0,1]$ and the corresponding CPU times on an Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz processor. The results for the sample covariance function $\widehat{\Sigma}^{\mathcal{S}}$ based on fully observed $X_1(\cdot),\dots,X_n(\cdot)$ are also provided as a baseline for comparison. Note that LLS is too slow to implement for the case $L_i=101$, so we do not report its results.

Table 4 The average (standard error) functional matrix losses and average CPU time for p = 6 over 100 simulation runs.

A few trends are observable from Table 4. First, the binned implementations (BinLLS and BinLLS-P) attain similar or even lower estimation errors compared with their direct implementations (LLS and LLS-P) under all scenarios, while achieving considerably faster computational speeds, especially under dense designs; for example, BinLLS runs over 400 times faster than LLS when $L_i=51$. Second, all methods provide higher estimation accuracy as $L_i$ increases, and enjoy performance similar to the fully observed functional case when functions are very densely observed, for example, $L_i=51$ and 101. However, the performance of LLS-P and BinLLS-P deteriorates severely under sparse designs, for example, $L_i=11$ and 21, since limited information is available from a small number of observations per subject. Among all competitors, we conclude that BinLLS is overall a unified approach that handles both sparsely and densely sampled functional data well, with increased computational efficiency and guaranteed estimation accuracy.

We next examine the performance of the BinLLS-based adaptive and universal functional thresholding estimators in terms of estimation accuracy and support recovery consistency, using the same performance measures as in Section 5.2. Tables 5–6 and Tables 11–14 of the supplementary material report numerical results for settings with $p=50$ and 100 satisfying Models 1 and 2 under different measurement schedules. We observe a few apparent patterns from Tables 5–6 and 11–14. First, $\check{\Sigma}^{\mathcal{A}}$ substantially outperforms $\check{\Sigma}^{\mathcal{U}}$ with significantly lower estimation errors in all settings. Second, $\check{\Sigma}^{\mathcal{A}}$ works consistently well in recovering the functional sparsity structures, especially under the soft and SCAD functional thresholding rules, while $\check{\Sigma}^{\mathcal{U}}$ fails to identify such patterns. Third, the estimation and support recovery consistency of $\check{\Sigma}^{\mathcal{A}}$ and $\check{\Sigma}^{\mathcal{U}}$ improve as $L_i$ increases. When curves are very densely observed, for example, $L_i=101$, both estimators enjoy performance similar to $\widehat{\Sigma}^{\mathcal{A}}$ and $\widehat{\Sigma}^{\mathcal{U}}$ in Tables 2–3 and 9–10 of the supplementary material. This observation provides empirical evidence supporting our remark after Theorem 3 about the matching convergence rates for very densely observed and fully observed functional scenarios.

Table 5 The average (standard error) functional matrix losses for partially observed functional scenarios and p = 50 over 100 simulation runs.

Table 6 The average TPRs/FPRs for partially observed functional scenarios and p = 50 over 100 simulation runs.

6 Real Data

In this section, we investigate the association between brain functional connectivity and fluid intelligence (gF), the capacity to solve problems independently of acquired knowledge (Cattell Citation1987). The dataset contains subjects' resting-state fMRI scans and the corresponding gF scores, measured by the 24-item Raven's Progressive Matrices, from the Human Connectome Project (HCP). We follow many recent proposals based on HCP by modeling signals as multivariate random functions with each region of interest (ROI) representing one random function (Lee et al. in press; Miao, Zhang, and Wong in press; Zapata, Oh, and Petersen Citation2022). We focus our analysis on $n_{\mathrm{low}}=73$ subjects with intelligence scores $\mathrm{gF}\leq 8$ and $n_{\mathrm{high}}=85$ subjects with $\mathrm{gF}\geq 23$, and consider $p=83$ ROIs from three generally acknowledged modules in neuroscience (Finn et al. Citation2015): the medial frontal (29 ROIs), frontoparietal (34 ROIs) and default mode (20 ROIs) modules. For each subject, the BOLD signals at each ROI are collected every 0.72 sec for a total of $L=1200$ measurement locations (14.4 min). We first implement the ICA-FIX preprocessing pipeline (Glasser et al. Citation2013) and a standard band-pass filter at $[0.01,0.08]$ Hz to exclude frequency bands not implicated in resting-state functional connectivity (Biswal et al. Citation1995). Figure 12 of the supplementary material displays exemplified trajectories of the pre-smoothed data. The adaptive functional thresholding method is then adopted to estimate the sparse covariance function and therefore the brain networks.

The sparsity structures in $\widehat{\Sigma}^{\mathcal{A}}$ for both groups are displayed in Figure 1. With $\widehat{\lambda}$ selected by cross-validation, the network associated with $\widehat{\Sigma}^{\mathcal{A}}$ for subjects with $\mathrm{gF}\geq 23$ is more densely connected than that with $\mathrm{gF}\leq 8$, as evident from Figure 1(a)–(b). We further set the sparsity level to 70% and 85%, and present the corresponding sparsity patterns in Figure 1(c)–(f). The results clearly indicate the existence of three diagonal blocks under all sparsity levels, complying with the identification of the medial frontal, frontoparietal and default mode modules in Finn et al. (Citation2015). We also implement the universal functional thresholding method. However, compared with $\widehat{\Sigma}^{\mathcal{A}}$, the results of $\widehat{\Sigma}^{\mathcal{U}}$ suffer from heteroscedasticity, as demonstrated in Section 5 and Section E.3 of the supplementary material, and fail to detect any noticeable block structure; hence, we choose not to report them here. To explore the impact of gF on functional connectivity, we compute the connectivity strength using the standardized form $\|\widehat{\Sigma}_{jk}^{\mathcal{A}}\|_{\mathcal{S}}/(\|\widehat{\Sigma}_{jj}^{\mathcal{A}}\|_{\mathcal{S}}\|\widehat{\Sigma}_{kk}^{\mathcal{A}}\|_{\mathcal{S}})^{1/2}$ for $j,k=1,\dots,p$. Interestingly, we observe from Figure 2 that subjects with $\mathrm{gF}\geq 23$ tend to have enhanced brain connectivity in the medial frontal and frontoparietal modules, while the connectivity strength in the default mode module declines. This agrees with existing neuroscience literature reporting a strong positive association between intelligence scores and medial frontal/frontoparietal functional connectivity in the resting state (Van Den Heuvel et al. Citation2009; Finn et al. Citation2015), and lends support to the conclusion that lower default mode module activity is associated with better cognitive performance (Anticevic et al. Citation2012). See also Section E.3 of the supplementary material, where we illustrate our adaptive functional thresholding estimation using another dataset (ADHD).


Fig. 1 Estimated sparsity structures in Σ̂A using soft functional thresholding rule at fluid intelligence gF≤8 and gF≥23: (a)–(b) with the corresponding λ̂ selected by 5-fold cross-validation; (c)–(f) with the estimated functional sparsity levels set at 70% and 85%.


Fig. 2 The connectivity strengths in Figure 1(e)–(f) at fluid intelligence gF≤8 and gF≥23. Salmon, orange and yellow nodes represent the ROIs in the medial frontal, frontoparietal and default mode modules, respectively. The edge color from cyan to blue corresponds to the value of ||Σ̂jkA||S/(||Σ̂jjA||S||Σ̂kkA||S)1/2 from small to large.

Supplementary Materials

The supplementary materials contain all the technical proofs, further methodological derivations and additional discussion and empirical results. We also provide the code and datasets used in Sections 5 and 6 in the supplementary materials.


Acknowledgments

We are grateful to the editor, the associate editor and two referees for their insightful comments and suggestions, which have led to significant improvement of our article.

Disclosure Statement

The authors report there are no competing interests to declare.

Additional information

Funding

Shaojun Guo was partially supported by the National Natural Science Foundation of China (No. 11771447).

References

  • Anticevic, A., Cole, M. W., Murray, J. D., Corlett, P. R., Wang, X.-J., and Krystal, J. H. (2012), “The Role of Default Network Deactivation in Cognition and Disease,” Trends in Cognitive Sciences, 16, 584–592. DOI: 10.1016/j.tics.2012.10.008.
  • Avella-Medina, M., Battey, H. S., Fan, J., and Li, Q. (2018), “Robust estimation of High-Dimensional Covariance and Precision Matrices,” Biometrika, 105, 271–284. DOI: 10.1093/biomet/asy011.
  • Bickel, P. J., and Levina, E. (2008), “Covariance Regularization by Thresholding,” The Annals of Statistics, 36, 2577–2604. DOI: 10.1214/08-AOS600.
  • Biswal, B., Zerrin Yetkin, F., Haughton, V. M., and Hyde, J. S. (1995), “Functional Connectivity in the Motor Cortex of Resting Human Brain Using Echo-Planar MRI,” Magnetic Resonance in Medicine, 34, 537–541. DOI: 10.1002/mrm.1910340409.
  • Cai, T., and Liu, W. (2011), “Adaptive Thresholding for Sparse Covariance Matrix Estimation,” Journal of the American Statistical Association, 106, 672–684. DOI: 10.1198/jasa.2011.tm10560.
  • Cattell, R. B. (1987), Intelligence: Its Structure, Growth and Action, Amsterdam: Elsevier.
  • Chang, C., and Glover, G. H. (2010), “Time–Frequency Dynamics of Resting-State Brain Connectivity Measured with fMRI,” Neuroimage, 50, 81–98. DOI: 10.1016/j.neuroimage.2009.12.011.
  • Chen, Z., and Leng, C. (2016), “Dynamic Covariance Models,” Journal of the American Statistical Association, 111, 1196–1207. DOI: 10.1080/01621459.2015.1077712.
  • Chiou, J.-M., Yang, Y.-F., and Chen, Y.-T. (2016), “Multivariate Functional Linear Regression and Prediction,” Journal of Multivariate Analysis, 146, 301–312. DOI: 10.1016/j.jmva.2015.10.003.
  • Fan, J., and Li, R. (2001), “Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties,” Journal of the American Statistical Association, 96, 1348–1360. DOI: 10.1198/016214501753382273.
  • Fan, J., and Marron, J. S. (1994), “Fast Implementations of Nonparametric Curve Estimators,” Journal of Computational and Graphical Statistics, 3, 35–56. DOI: 10.2307/1390794.
  • Finn, E. S., Shen, X., Scheinost, D., Rosenberg, M. D., Huang, J., Chun, M. M., Papademetris, X., and Constable, R. T. (2015), “Functional Connectome Fingerprinting: Identifying Individuals Using Patterns of Brain Connectivity,” Nature Neuroscience, 18, 1664–1671. DOI: 10.1038/nn.4135.
  • Glasser, M. F., Sotiropoulos, S. N., Wilson, J. A., Coalson, T. S., Fischl, B., Andersson, J. L., Xu, J., Jbabdi, S., Webster, M., Polimeni, J. R. et al. (2013), “The Minimal Preprocessing Pipelines for the Human Connectome Project,” Neuroimage, 80, 105–124. DOI: 10.1016/j.neuroimage.2013.04.127.
  • Guo, S., Qiao, X., and Wang, Q. (2022), “Factor Modelling for High-Dimensional Functional Time Series,” arXiv:2112.13651v2.
  • Happ, C., and Greven, S. (2018), “Multivariate Functional Principal Component Analysis for Data Observed on Different (dimensional) Domains,” Journal of the American Statistical Association, 113, 649–659. DOI: 10.1080/01621459.2016.1273115.
  • Kong, D., Xue, K., Yao, F., and Zhang, H. H. (2016), “Partially Functional Linear Regression in High Dimensions,” Biometrika, 103, 147–159. DOI: 10.1093/biomet/asv062.
  • Kosorok, M. R. (2008), Introduction to Empirical Processes and Semiparametric Inference, Springer Series in Statistics, New York: Springer.
  • Lee, K.-Y., Ji, D., Li, L., Constable, T., and Zhao, H. (in press), “Conditional Functional Graphical Models,” Journal of the American Statistical Association, DOI: 10.1080/01621459.2021.1924178.
  • Li, B., and Solea, E. (2018), “A Nonparametric Graphical Model for Functional Data with Application to Brain Networks based on fMRI,” Journal of the American Statistical Association, 113, 1637–1655. DOI: 10.1080/01621459.2017.1356726.
  • Lotte, F., Bougrain, L., Cichocki, A., Clerc, M., Congedo, M., Rakotomamonjy, A., and Yger, F. (2018), “A Review of Classification Algorithms for EEG-based Brain–Computer Interfaces: A 10 Year Update,” Journal of Neural Engineering, 15, 031005. DOI: 10.1088/1741-2552/aab2f2.
  • Miao, R., Zhang, X., and Wong, R. K. (in press), “A Wavelet-based Independence Test for Functional Data with an Application to MEG Functional Connectivity,” Journal of the American Statistical Association, DOI: 10.1080/01621459.2021.2020126.
  • Park, J., Ahn, J., and Jeon, Y. (2021), “Sparse Functional Linear Discriminant Analysis,” Biometrika, 109, 209–226. DOI: 10.1093/biomet/asaa107.
  • Qiao, X., Guo, S., and James, G. (2019), “Functional Graphical Models,” Journal of the American Statistical Association, 114, 211–222. DOI: 10.1080/01621459.2017.1390466.
  • Qiao, X., Qian, C., James, G. M., and Guo, S. (2020), “Doubly Functional Graphical Models in High Dimensions,” Biometrika, 107, 415–431. DOI: 10.1093/biomet/asz072.
  • Rogers, B. P., Morgan, V. L., Newton, A. T., and Gore, J. C. (2007), “Assessing Functional Connectivity in the Human Brain by fMRI,” Magnetic Resonance Imaging, 25, 1347–1357. DOI: 10.1016/j.mri.2007.03.007.
  • Rothman, A. J., Levina, E., and Zhu, J. (2009), “Generalized Thresholding of Large Covariance Matrices,” Journal of the American Statistical Association, 104, 177–186. DOI: 10.1198/jasa.2009.0101.
  • Storey, J. D., Xiao, W., Leek, J. T., Tompkins, R. G., and Davis, R. W. (2005), “Significance Analysis of Time Course Microarray Experiments,” Proceedings of the National Academy of Sciences, 102, 12837–12842. DOI: 10.1073/pnas.0504609102.
  • Van Den Heuvel, M. P., Stam, C. J., Kahn, R. S., and Pol, H. E. H. (2009), “Efficiency of Functional Brain Networks and Intellectual Performance,” Journal of Neuroscience, 29, 7619–7624. DOI: 10.1523/JNEUROSCI.1443-09.2009.
  • Vu, V. Q., and Lei, J. (2013), “Minimax Sparse Principal Subspace Estimation in High Dimensions,” The Annals of Statistics, 41, 2905–2947. DOI: 10.1214/13-AOS1151.
  • Wang, H., Peng, B., Li, D., and Leng, C. (2021), “Nonparametric Estimation of Large Covariance Matrices with Conditional Sparsity,” Journal of Econometrics, 223, 53–72. DOI: 10.1016/j.jeconom.2020.09.002.
  • Yao, F., Müller, H.-G., and Wang, J.-L. (2005), “Functional Data Analysis for Sparse Longitudinal Data,” Journal of the American Statistical Association, 100, 577–590. DOI: 10.1198/016214504000001745.
  • Yuan, M., and Lin, Y. (2006), “Model Selection and Estimation in Regression with Grouped Variables,” Journal of the Royal Statistical Society, Series B, 68, 49–67. DOI: 10.1111/j.1467-9868.2005.00532.x.
  • Zapata, J., Oh, S. Y., and Petersen, A. (2022), “Partial Separability and Functional Graphical Models for Multivariate Gaussian Processes,” Biometrika, 109, 665–681. DOI: 10.1093/biomet/asab046.
  • Zhang, J.-T., and Chen, J. (2007), “Statistical Inferences for Functional Data,” The Annals of Statistics, 35, 1052–1079. DOI: 10.1214/009053606000001505.
  • Zhang, X., and Wang, J.-L. (2016), “From Sparse to Dense Functional Data and Beyond,” The Annals of Statistics, 44, 2281–2321. DOI: 10.1214/16-AOS1446.
  • Zou, H. (2006), “The Adaptive Lasso and Its Oracle Properties,” Journal of the American Statistical Association, 101, 1418–1429. DOI: 10.1198/016214506000000735.