Theory and Methods

Adaptive Functional Thresholding for Sparse Covariance Function Estimation in High Dimensions

Pages 1473-1485 | Received 05 May 2021, Accepted 31 Mar 2023, Published online: 26 May 2023

Abstract

Covariance function estimation is a fundamental task in multivariate functional data analysis and arises in many applications. In this article, we consider estimating sparse covariance functions for high-dimensional functional data, where the number of random functions p is comparable to, or even larger than, the sample size n. Aided by the Hilbert–Schmidt norm of functions, we introduce a new class of functional thresholding operators that combine functional versions of thresholding and shrinkage, and propose the adaptive functional thresholding estimator by incorporating the variance effects of individual entries of the sample covariance function into functional thresholding. To handle the practical scenario where curves are partially observed with errors, we also develop a nonparametric smoothing approach to obtain the smoothed adaptive functional thresholding estimator and its binned implementation to accelerate the computation. We investigate the theoretical properties of our proposals when p grows exponentially with n under both fully and partially observed functional scenarios. Finally, we demonstrate that the proposed adaptive functional thresholding estimators significantly outperform the competitors through extensive simulations and the functional connectivity analysis of two neuroimaging datasets. Supplementary materials for this article are available online.

1 Introduction

Covariance function estimation plays an important role in functional data analysis, but existing methods are largely restricted to data with a single random function or a small number of them. Recent advances in technology have made multivariate or even high-dimensional functional datasets increasingly common in various applications: for example, time-course gene expression data in genomics (Storey et al. Citation2005), air pollution data in environmental studies (Kong et al. Citation2016) and different types of brain imaging data in neuroscience (Li and Solea Citation2018; Qiao, Guo, and James Citation2019). Under such scenarios, suppose we observe $n$ independent samples $X_i(\cdot)=\{X_{i1}(\cdot),\dots,X_{ip}(\cdot)\}^{\mathrm{T}}$ for $i=1,\dots,n$, defined on a compact interval $\mathcal{U}$, with covariance function
$$\Sigma(u,v)=\{\Sigma_{jk}(u,v)\}_{p\times p}=\operatorname{cov}\{X_i(u),X_i(v)\},\quad u,v\in\mathcal{U}.$$

From a heuristic viewpoint, we can simply treat each curve $X_{ij}(\cdot)$ as an infinitely long vector, so that the $(j,k)$th entry of $\Sigma$, $\Sigma_{jk}(\cdot,\cdot)=\operatorname{cov}\{X_{ij}(\cdot),X_{ik}(\cdot)\}$, is the cross-covariance matrix of two infinitely long vectors. Then $\Sigma$ can be understood as a block matrix of infinite size with $(j,k)$th block $\Sigma_{jk}(\cdot,\cdot)$. Besides being of interest in itself, an estimator of $\Sigma$ is useful for many applications including, for example, multivariate functional principal components analysis (FPCA) (Happ and Greven Citation2018), multivariate functional linear regression (Chiou, Yang, and Chen Citation2016), functional factor models (Guo, Qiao, and Wang Citation2022) and functional classification (Park, Ahn, and Jeon Citation2021). See Section 2.3 for details.

Our article focuses on estimating $\Sigma$ under high-dimensional scaling, where $p$ can be comparable to, or even larger than, $n$. In this setting, the sample covariance function
$$\widehat{\Sigma}(u,v)=\{\widehat{\Sigma}_{jk}(u,v)\}_{p\times p}=\frac{1}{n-1}\sum_{i=1}^{n}\{X_i(u)-\bar{X}(u)\}\{X_i(v)-\bar{X}(v)\}^{\mathrm{T}},\quad u,v\in\mathcal{U},$$
where $\bar{X}(\cdot)=n^{-1}\sum_{i=1}^{n}X_i(\cdot)$, performs poorly, and some lower-dimensional structural assumptions need to be imposed to estimate $\Sigma$ consistently. In contrast to extensive work on estimating high-dimensional sparse covariance matrices (Bickel and Levina Citation2008; Rothman, Levina, and Zhu Citation2009; Cai and Liu Citation2011; Chen and Leng Citation2016; Avella-Medina et al. Citation2018; Wang et al. Citation2021), research on sparse covariance function estimation in high dimensions remains largely unaddressed in the literature.

In this article, we consider estimating sparse covariance functions via adaptive functional thresholding, in the sense of shrinking some blocks $\widehat{\Sigma}_{jk}(\cdot,\cdot)$ in an adaptive way. To achieve this, we introduce a new class of functional thresholding operators that combine functional versions of thresholding and shrinkage based on the Hilbert–Schmidt norm of functions, and develop an adaptive functional thresholding procedure on $\widehat{\Sigma}(\cdot,\cdot)$ using entry-dependent functional thresholds that automatically adapt to the variability of the blocks $\widehat{\Sigma}_{jk}(\cdot,\cdot)$. To provide theoretical guarantees for our method under high-dimensional scaling, it is essential to develop standardized concentration results that take the variability adjustment into account. Compared with adaptive thresholding for nonfunctional data (Cai and Liu Citation2011), the intrinsic infinite-dimensionality of each $X_{ij}(\cdot)$ leads to a substantial rise in the complexity of sparsity modeling and theoretical analysis, as one needs to rely on some functional norm of the standardized $\widehat{\Sigma}_{jk}$'s, for example, the Hilbert–Schmidt norm, to enforce functional sparsity in $\widehat{\Sigma}$, and to tackle additional technical challenges for standardized processes within an abstract Hilbert space. To handle the practical scenario where functions are partially observed with errors, it is desirable to apply nonparametric smoothers in conjunction with adaptive functional thresholding. This poses a computationally intensive task especially when $p$ is large, thus calling for the development of a fast implementation strategy.

There are many applications of the proposed sparse covariance function estimation method in neuroimaging analysis, where brain signals are measured over time at a large number of regions of interest (ROIs) for each individual. Examples include brain-computer interface classification (Lotte et al. Citation2018) and brain functional connectivity identification (Rogers et al. Citation2007). Traditional neuroimaging analysis models brain signals for each subject as multivariate random variables, with each ROI represented by a random variable, and hence the covariance/correlation matrices of interest are estimated by treating the time-course data of each ROI as repeated observations. However, due to the nonstationary and dynamic features of the signals (Chang and Glover Citation2010), the strategy of averaging over time fails to characterize the time-varying structure, leading to a loss of information in the original space. To overcome these drawbacks, we follow recent proposals that model signals directly as multivariate random functions with each ROI represented by a random function (Li and Solea Citation2018; Qiao, Guo, and James Citation2019; Zapata, Oh, and Petersen Citation2022; Lee et al. in press). The identified functional sparsity pattern in our estimate of $\Sigma$ can be used to recover the functional connectivity network among different ROIs, which is illustrated using examples of functional magnetic resonance imaging (fMRI) datasets in Section 6 and Section E.3 of the supplementary material.

Our article makes useful contributions on multiple fronts. On the method side, it generalizes the thresholding/sparsity concept in multivariate statistics to the functional setting and offers a novel adaptive functional thresholding proposal to handle the heteroscedasticity inherent in sparse covariance function estimation, motivated by neuroimaging analysis and many statistical applications, for example, those in Section 2.3 and Section C.2 of the supplementary material. It also provides an alternative way of identifying correlation-based functional connectivity with no need to specify the correlation function, whose estimation poses challenges as the inverses of the $\Sigma_{jj}(u,v)$'s are unbounded. In practice, when functions are observed with errors at either a dense grid of points or a small subset of points, we also develop a unified local linear smoothing approach to obtain the smoothed adaptive functional thresholding estimator, together with a fast implementation via binning (Fan and Marron Citation1994) that speeds up the computation without sacrificing estimation accuracy. On the theory side, we show that the proposed estimators enjoy convergence and support recovery properties under both fully and partially observed functional scenarios when $p$ grows exponentially fast relative to $n$. The proofs rely on tools from empirical process theory, owing to the infinite-dimensional nature of functional data, and on some novel standardized concentration bounds in the Hilbert–Schmidt norm to deal with issues of high-dimensionality and variance adjustment. Our theoretical results and techniques are general and can be applied to other settings in high-dimensional functional data analysis.

The remainder of this article is organized as follows. Section 2 introduces a class of functional thresholding operators, based on which we propose the adaptive functional thresholding of the sample covariance function; we then discuss several applications of sparse covariance function estimation. Section 3 presents the convergence and support recovery analysis of our proposed estimator. In Section 4, we develop a nonparametric smoothing approach and its binned implementation to deal with partially observed functional data, and then investigate its theoretical properties. In Sections 5 and 6, we demonstrate the uniform superiority of the adaptive functional thresholding estimators over their universal counterparts through an extensive set of simulation studies and the functional connectivity analysis of a neuroimaging dataset, respectively. All technical proofs are relegated to the supplementary material. We also provide code to reproduce the results of the simulations and real data analysis in the supplementary materials.

2 Methodology

2.1 Functional Thresholding

We begin by introducing some notation. Let $L_2(\mathcal{U})$ denote the Hilbert space of square integrable functions defined on $\mathcal{U}$, and let $\mathbb{S}=L_2(\mathcal{U})\otimes L_2(\mathcal{U})$, where $\otimes$ denotes the Kronecker product. For any $Q\in\mathbb{S}$, we denote its Hilbert–Schmidt norm by $\|Q\|_{\mathcal{S}}=\{\iint Q(u,v)^2\,du\,dv\}^{1/2}$. With the aid of the Hilbert–Schmidt norm, for any regularization parameter $\lambda\geq 0$, we first define a class of functional thresholding operators $s_\lambda:\mathbb{S}\to\mathbb{S}$ that satisfy the following conditions:

  (i) $\|s_\lambda(Z)\|_{\mathcal{S}}\leq c\|Y\|_{\mathcal{S}}$ for all $Z,Y\in\mathbb{S}$ that satisfy $\|Z-Y\|_{\mathcal{S}}\leq\lambda$ and some $c>0$;

  (ii) $\|s_\lambda(Z)\|_{\mathcal{S}}=0$ for $\|Z\|_{\mathcal{S}}\leq\lambda$;

  (iii) $\|s_\lambda(Z)-Z\|_{\mathcal{S}}\leq\lambda$ for all $Z\in\mathbb{S}$.

Our proposed functional thresholding operators can be viewed as the functional generalization of thresholding operators (Cai and Liu Citation2011). Instead of a simple pointwise extension of such thresholding operators to the functional domain, we advocate a global thresholding rule based on the Hilbert–Schmidt norm of functions that encourages functional sparsity, in the sense that $s_\lambda(Z)(u,v)=0$ for all $u,v\in\mathcal{U}$ whenever $\|Z\|_{\mathcal{S}}\leq\lambda$ under condition (ii). Condition (iii) limits the amount of (global) functional shrinkage in the Hilbert–Schmidt norm to be no more than $\lambda$.

Conditions (i)–(iii) are satisfied by functional versions of some commonly adopted thresholding rules, which arise as solutions to the following penalized quadratic loss problem with various penalties:
$$s_\lambda(Z)=\underset{\theta\in\mathbb{S}}{\arg\min}\left\{\frac{1}{2}\|\theta-Z\|_{\mathcal{S}}^2+p_\lambda(\theta)\right\},\tag{1}$$
with $p_\lambda(\theta)=\tilde{p}_\lambda(\|\theta\|_{\mathcal{S}})$ being a penalty function of $\|\theta\|_{\mathcal{S}}$ that enforces functional sparsity.

The soft functional thresholding rule results from solving (1) with an $l_1/l_2$ type of penalty, $p_\lambda(\theta)=\lambda\|\theta\|_{\mathcal{S}}$, and takes the form $s_\lambda^{\mathrm{S}}(Z)=Z(1-\lambda/\|Z\|_{\mathcal{S}})_+$, where $(x)_+=\max(x,0)$ for $x\in\mathbb{R}$. This rule can be viewed as a functional generalization of the group lasso solution in the multivariate setting (Yuan and Lin Citation2006). Solving (1) with an $l_0/l_2$ type of penalty, $p_\lambda(\theta)=2^{-1}\lambda^2 I(\|\theta\|_{\mathcal{S}}\neq 0)$, yields the hard functional thresholding rule $ZI(\|Z\|_{\mathcal{S}}>\lambda)$, where $I(\cdot)$ is an indicator function. As a comparison, soft functional thresholding corresponds to the maximum amount of functional shrinkage allowed by condition (iii), whereas no shrinkage results from hard functional thresholding. Taking a compromise between soft and hard functional thresholding, we next propose functional versions of the SCAD (Fan and Li Citation2001) and adaptive lasso (Zou Citation2006) thresholding rules. With a SCAD penalty (Fan and Li Citation2001) operating on $\|\cdot\|_{\mathcal{S}}$ instead of $|\cdot|$ as in the univariate scalar case, the SCAD functional thresholding rule $s_\lambda^{\mathrm{SC}}(Z)$ coincides with soft functional thresholding if $\|Z\|_{\mathcal{S}}<2\lambda$, equals $Z\{(a-1)-a\lambda/\|Z\|_{\mathcal{S}}\}/(a-2)$ for $\|Z\|_{\mathcal{S}}\in[2\lambda,a\lambda]$, and equals $Z$ if $\|Z\|_{\mathcal{S}}>a\lambda$, where $a>2$. Analogously, the adaptive lasso functional thresholding rule is $s_\lambda^{\mathrm{AL}}(Z)=Z(1-\lambda^{\eta+1}/\|Z\|_{\mathcal{S}}^{\eta+1})_+$ with $\eta\geq 0$.
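To see why these conditions hold for, say, the soft rule, a short calculation suffices; the sketch below, which is our own derivation from the definitions above, verifies (i)–(iii) with $c=1$:

```latex
% Sketch: verification of conditions (i)--(iii) for the soft functional
% thresholding rule s^S_lambda(Z) = Z (1 - lambda/||Z||_S)_+ .
\begin{aligned}
\text{(i):}\quad & \|s_\lambda^{\mathrm{S}}(Z)\|_{\mathcal{S}}
   =(\|Z\|_{\mathcal{S}}-\lambda)_+
   \le(\|Y\|_{\mathcal{S}}+\|Z-Y\|_{\mathcal{S}}-\lambda)_+
   \le\|Y\|_{\mathcal{S}}
   \quad\text{whenever }\|Z-Y\|_{\mathcal{S}}\le\lambda,\ \text{so }c=1;\\
\text{(ii):}\quad & \|Z\|_{\mathcal{S}}\le\lambda
   \ \Longrightarrow\ (1-\lambda/\|Z\|_{\mathcal{S}})_+=0
   \ \Longrightarrow\ s_\lambda^{\mathrm{S}}(Z)=0;\\
\text{(iii):}\quad & \|s_\lambda^{\mathrm{S}}(Z)-Z\|_{\mathcal{S}}
   =\|Z\|_{\mathcal{S}}\min(\lambda/\|Z\|_{\mathcal{S}},\,1)\le\lambda .
\end{aligned}
```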

Our proposed functional generalizations of the soft, SCAD and adaptive lasso thresholding rules can be checked to satisfy conditions (i)–(iii); see Section B.1 of the supplementary material for details. To present a unified theoretical analysis, we focus on functional thresholding operators $s_\lambda(Z)$ satisfying conditions (i)–(iii). Note that, although hard functional thresholding does not satisfy condition (i), the theoretical results in Section 3 still hold for hard functional thresholding estimators under similar conditions, with the corresponding proofs differing only slightly. For functional data with some local spikes, one may instead consider a supremum-norm-based class of functional thresholding operators; see the detailed discussion in Section C.1 of the supplementary material.
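For concreteness, the following minimal Python sketch implements the four functional thresholding rules for bivariate functions sampled over an equally spaced grid of $\mathcal{U}=[0,1]$, with the Hilbert–Schmidt norm approximated by a Riemann sum; the function names and discretization are our own illustration, not the authors' implementation.

```python
import numpy as np

def hs_norm(Z, du):
    """Hilbert-Schmidt norm of a bivariate function sampled on an R x R grid,
    approximated by a Riemann sum: {integral of Z(u,v)^2 du dv}^(1/2)."""
    return np.sqrt(np.sum(Z ** 2) * du * du)

def soft_ft(Z, lam, du):
    # s^S_lambda(Z) = Z (1 - lambda/||Z||_S)_+
    r = hs_norm(Z, du)
    return Z * max(1.0 - lam / r, 0.0) if r > 0 else np.zeros_like(Z)

def hard_ft(Z, lam, du):
    # Z I(||Z||_S > lambda): keep the whole block or kill it entirely
    return Z if hs_norm(Z, du) > lam else np.zeros_like(Z)

def scad_ft(Z, lam, du, a=3.7):
    # soft below 2*lambda, linear compromise on [2*lambda, a*lambda], identity above
    r = hs_norm(Z, du)
    if r < 2 * lam:
        return soft_ft(Z, lam, du)
    if r <= a * lam:
        return Z * ((a - 1) - a * lam / r) / (a - 2)
    return Z

def adlasso_ft(Z, lam, du, eta=3.0):
    # s^AL_lambda(Z) = Z (1 - lambda^(eta+1)/||Z||_S^(eta+1))_+
    r = hs_norm(Z, du)
    return Z * max(1.0 - (lam / r) ** (eta + 1), 0.0) if r > 0 else np.zeros_like(Z)
```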

2.2 Estimation

We now discuss our estimation procedure based on $s_\lambda(Z)$. Note that the variance of $\widehat{\Sigma}_{jk}(u,v)$ depends on the distribution of $\{X_{ij}(u),X_{ik}(v)\}$ through higher-order moments, which makes this intrinsically a heteroscedastic problem. Hence, it is more desirable to use entry-dependent functional thresholds that automatically take into account the variability of the blocks $\widehat{\Sigma}_{jk}(\cdot,\cdot)$ to shrink some blocks to zero adaptively. To achieve this, define the variance factors $\Theta_{jk}(u,v)=\operatorname{var}\big([X_{ij}(u)-\mathrm{E}\{X_{ij}(u)\}][X_{ik}(v)-\mathrm{E}\{X_{ik}(v)\}]\big)$ with corresponding estimators
$$\widehat{\Theta}_{jk}(u,v)=\frac{1}{n}\sum_{i=1}^{n}\big[\{X_{ij}(u)-\bar{X}_j(u)\}\{X_{ik}(v)-\bar{X}_k(v)\}-\widehat{\Sigma}_{jk}(u,v)\big]^2,\quad j,k=1,\dots,p.$$

Then the adaptive functional thresholding estimator $\widehat{\Sigma}^{\mathcal{A}}=\{\widehat{\Sigma}_{jk}^{\mathcal{A}}(\cdot,\cdot)\}_{p\times p}$ is defined by
$$\widehat{\Sigma}_{jk}^{\mathcal{A}}=\widehat{\Theta}_{jk}^{1/2}\times s_\lambda\big(\widehat{\Sigma}_{jk}/\widehat{\Theta}_{jk}^{1/2}\big),\tag{2}$$
which uses a single threshold level to functionally threshold the standardized entries $\widehat{\Sigma}_{jk}/\widehat{\Theta}_{jk}^{1/2}$ for all $j,k$, resulting in entry-dependent functional thresholds for the $\widehat{\Sigma}_{jk}$'s. The selection of the optimal regularization parameter $\widehat{\lambda}$ is discussed in Section 5.
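As a minimal sketch of how (2) can be computed for fully observed curves on a grid, reusing the thresholding rules sketched in Section 2.1 (the array layout and function names are our own assumptions, not the authors' code):

```python
import numpy as np

def sample_cov_fun(X):
    """Sample covariance function on a grid; X is an (n, p, R) array with
    X[i, j, :] the curve X_ij evaluated at R grid points. Returns (p, p, R, R)."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    return np.einsum('ijr,iks->jkrs', Xc, Xc) / (n - 1)

def adaptive_functional_threshold(X, lam, rule=soft_ft):
    """Adaptive functional thresholding estimator of Eq. (2)."""
    n, p, R = X.shape
    du = 1.0 / (R - 1)                 # grid step, assuming U = [0, 1]
    Xc = X - X.mean(axis=0)            # center each curve
    est = np.empty((p, p, R, R))
    for j in range(p):
        for k in range(p):
            cross = Xc[:, j, :, None] * Xc[:, k, None, :]   # (n, R, R) products
            Sjk = cross.sum(axis=0) / (n - 1)               # entry of sample cov.
            Tjk = ((cross - Sjk) ** 2).mean(axis=0)         # variance factor
            # threshold the standardized entry, then scale back (Eq. (2))
            est[j, k] = np.sqrt(Tjk) * rule(Sjk / np.sqrt(Tjk), lam, du)
    return est
```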

An alternative approach to estimating $\Sigma$ is the universal functional thresholding estimator $\widehat{\Sigma}^{\mathcal{U}}=\{\widehat{\Sigma}_{jk}^{\mathcal{U}}(\cdot,\cdot)\}_{p\times p}$ with $\widehat{\Sigma}_{jk}^{\mathcal{U}}=s_\lambda(\widehat{\Sigma}_{jk})$, where a universal threshold level is used for all entries. In a similar spirit to Rothman, Levina, and Zhu (Citation2009), the consistency of $\widehat{\Sigma}^{\mathcal{U}}$ requires the assumption that the marginal-covariance functions are uniformly bounded in nuclear norm, that is, $\max_j\|\Sigma_{jj}\|_{\mathcal{N}}\leq M$, where $\|\Sigma_{jj}\|_{\mathcal{N}}=\int_{\mathcal{U}}\Sigma_{jj}(u,u)\,du$. Intuitively, however, such a universal method does not perform well when the nuclear norms vary over a wide range, and it can even fail when the uniform boundedness assumption is violated. Section 5 provides empirical evidence to support this intuition.

2.3 Applications

Many statistical problems involving multivariate functional data $\{X_i(\cdot)\}_{i=1}^{n}$ require estimating the covariance function $\Sigma$. Under a high-dimensional regime, the functional sparsity assumption can be imposed on $\Sigma$ to facilitate consistent sparse estimation. Here we outline three applications of our proposals for sparse covariance function estimation.

Our first application is multivariate FPCA, which serves as a natural dimension reduction approach for $X_i(\cdot)$. With the aid of the Karhunen–Loève expansion for multivariate functional data (Happ and Greven Citation2018), $X_i(\cdot)$ admits the expansion
$$X_i(\cdot)=\mathrm{E}\{X_i(\cdot)\}+\sum_{l=1}^{\infty}\xi_{il}\phi_l(\cdot),\quad i=1,\dots,n,\tag{3}$$
where the principal component scores $\xi_{il}=\sum_{j=1}^{p}\int[X_{ij}(u)-\mathrm{E}\{X_{ij}(u)\}]\phi_{lj}(u)\,du$ and the eigenfunctions $\phi_l(\cdot)=\{\phi_{l1}(\cdot),\dots,\phi_{lp}(\cdot)\}^{\mathrm{T}}$ are attainable by the eigenanalysis of $\Sigma$. Under a large $p$ scenario, we can adopt the proposed functional thresholding technique to obtain a sparse estimate of $\Sigma$, which guarantees the consistency of the estimated eigenvalue/eigenfunction pairs. In Section E.1 of the supplementary material, we follow the proposal of a normalized version of multivariate FPCA in Happ and Greven (Citation2018) and use a simulated example to illustrate the superior sample performance of our functional thresholding approaches.
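On a grid, the eigenanalysis of a (thresholded) estimate of $\Sigma$ reduces to an eigendecomposition of a $pR\times pR$ block matrix; a sketch follows, where the flattening and the quadrature weight `du` are our own discretization choices rather than a prescribed implementation.

```python
import numpy as np

def multivariate_fpca(Sigma, du, n_comp=3):
    """Leading eigenvalue/eigenfunction pairs of a covariance function estimate.
    Sigma: (p, p, R, R) array of Sigma_jk(u, v) on an equally spaced grid."""
    p, _, R, _ = Sigma.shape
    # Flatten blocks into a (pR x pR) matrix; du turns integrals into sums.
    G = Sigma.transpose(0, 2, 1, 3).reshape(p * R, p * R)
    vals, vecs = np.linalg.eigh(G * du)
    idx = np.argsort(vals)[::-1][:n_comp]
    # Rescale so each eigenfunction has unit norm: sum_j int phi_lj(u)^2 du = 1.
    phi = vecs[:, idx].T.reshape(n_comp, p, R) / np.sqrt(du)
    return vals[idx], phi
```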

Our second application, multivariate functional linear regression (Chiou, Yang, and Chen Citation2016), takes the form
$$Y_i=\beta_0+\int_{\mathcal{U}}X_i(u)^{\mathrm{T}}\beta(u)\,du+\epsilon_i,\quad i=1,\dots,n,\tag{4}$$
where $\beta(\cdot)=\{\beta_1(\cdot),\dots,\beta_p(\cdot)\}^{\mathrm{T}}$ is a $p$-vector of functional coefficients to be estimated. The standard three-step procedure involves performing (normalized) multivariate FPCA on the $X_i(\cdot)$'s based on $\widehat{\Sigma}$, then estimating the basis coefficient vector of $\beta(\cdot)$ and finally recovering the estimated functional coefficients; details are presented in Section E.1 of the supplementary material and Chiou, Yang, and Chen (Citation2016). When $p$ is large, we can implement our functional thresholding proposals to obtain consistent estimators of $\Sigma$ and hence of $\beta$. In Section E.1 of the supplementary material, we demonstrate via a simulated example the superiority of our adaptive-functional-thresholding-based estimator over its competitors.

Our third application considers another dimension reduction framework via the functional factor model (Guo, Qiao, and Wang Citation2022) of the form $X_i(\cdot)=\mathbf{A}f_i(\cdot)+\varepsilon_i(\cdot)$, where the common components are driven by $r$ functional factors $f_i(\cdot)=\{f_{i1}(\cdot),\dots,f_{ir}(\cdot)\}^{\mathrm{T}}$, the idiosyncratic components are $\varepsilon_i(\cdot)$, and $\mathbf{A}\in\mathbb{R}^{p\times r}$ is the factor loading matrix. Denote the covariance functions of $X_i(\cdot)$, $f_i(\cdot)$ and $\varepsilon_i(\cdot)$ by $\Sigma^X$, $\Sigma^f$ and $\Sigma^\varepsilon$, respectively. Under the orthogonality of $\mathbf{A}$, $\iint\Sigma^X(u,v)\Sigma^X(u,v)^{\mathrm{T}}\,du\,dv$ can be decomposed as the sum of $\mathbf{A}\iint\Sigma^f(u,v)\Sigma^f(u,v)^{\mathrm{T}}\,du\,dv\,\mathbf{A}^{\mathrm{T}}$ and remaining smaller-order terms. Intuitively, under certain identifiability conditions, $\mathbf{A}$ can be recovered by carrying out an eigenanalysis of $\iint\Sigma^X(u,v)\Sigma^X(u,v)^{\mathrm{T}}\,du\,dv$. To provide a parsimonious model and enhance interpretability for near-zero loadings, we can impose subspace sparsity conditions (Vu and Lei Citation2013) on $\mathbf{A}$, which results in a functionally sparse $\Sigma^X$, and hence our functional thresholding estimators become applicable. See Guo, Qiao, and Wang (Citation2022) for an application of our functional thresholding technique to improve the estimation quality when fitting a sparse functional factor model. See also Section C.2 of the supplementary material for other applications, including functional graphical model estimation (Qiao, Guo, and James Citation2019) and multivariate functional classification.

3 Theoretical Properties

We begin with some notation. For a random variable $W$, define $\|W\|_\psi=\inf\{c>0:\mathrm{E}[\psi(|W|/c)]\leq 1\}$, where $\psi:[0,\infty)\to[0,\infty)$ is a nondecreasing, nonzero convex function with $\psi(0)=0$, and the norm takes the value $\infty$ if no finite $c$ exists for which $\mathrm{E}[\psi(|W|/c)]\leq 1$. Denote $\psi_k(x)=\exp(x^k)-1$ for $k\geq 1$. Let the packing number $D(\epsilon,d)$ be the maximal number of points that can fit in the compact interval $\mathcal{U}$ while maintaining a distance greater than $\epsilon$ between all points with respect to the semimetric $d$. We refer to Chapter 8 of Kosorok (Citation2008) for further explanations. For $\{X_{ij}(u):u\in\mathcal{U},\,i=1,\dots,n,\,j=1,\dots,p\}$, define the standardized processes $Y_{ij}(u)=[X_{ij}(u)-\mathrm{E}\{X_{ij}(u)\}]/\sigma_j(u)^{1/2}$, where $\sigma_j(u)=\Sigma_{jj}(u,u)$.

To present the main theorems, we need the following regularity conditions.

Condition 1. (i) For each $i$ and $j$, $Y_{ij}(\cdot)$ is a separable stochastic process with the semimetric $d_j(u,v)=\|Y_{1j}(u)-Y_{1j}(v)\|_{\psi_2}$ for $u,v\in\mathcal{U}$; (ii) for some $u_0\in\mathcal{U}$, $\max_{1\leq j\leq p}\|Y_{1j}(u_0)\|_{\psi_2}$ is bounded.

Condition 2. The packing numbers satisfy $\max_{1\leq j\leq p}D(\epsilon,d_j)\leq C\epsilon^{-r}$ for some constants $C,r>0$ and all $\epsilon\in(0,1]$.

Condition 3. There exists some constant $\tau>0$ such that $\min_{j,k}\inf_{u,v\in\mathcal{U}}\operatorname{var}\{Y_{1j}(u)Y_{1k}(v)\}\geq\tau$.

Condition 4. The pair $(n,p)$ satisfies $\log p/n^{1/4}\to 0$ as $n,p\to\infty$.

Conditions 1 and 2 are standard conditions used to characterize the modulus of continuity of the sub-Gaussian processes $Y_{ij}(\cdot)$; see Chapter 8 of Kosorok (Citation2008). These conditions also imply that there exist some positive constants $C_0$ and $\eta$ such that $\mathrm{E}[\exp(t\|Y_{1j}\|^2)]\leq C_0$ for all $|t|\leq\eta$ and $j$, with $\|Y_{1j}\|=\{\int_{\mathcal{U}}Y_{1j}(u)^2\,du\}^{1/2}$, which plays a crucial role in our proof when applying concentration inequalities within Hilbert space. Condition 3 restricts the variances of the $Y_{ij}(u)Y_{ik}(v)$'s to be uniformly bounded away from zero so that they can be well estimated. It also facilitates the development of some standardized concentration results. This condition precludes the case of a Brownian motion $X_{ij}(\cdot)$ starting at 0 for some $j$. However, replacing $X_{ij}(\cdot)$ with a contaminated process $X_{ij}(\cdot)+\xi_{ij}$, where the $\xi_{ij}$'s are independent draws from a normal distribution with zero mean and a small variance, independent of the $X_{ij}(\cdot)$'s, Condition 3 is fulfilled while the cross-covariance structure in $\Sigma$ remains the same, in the sense that $\operatorname{cov}\{X_{ij}(u)+\xi_{ij},X_{ik}(v)\}=\operatorname{cov}\{X_{ij}(u),X_{ik}(v)\}$ for $k\neq j$ and $u,v\in\mathcal{U}$. Condition 4 allows the high-dimensional case, where $p$ can diverge at some exponential rate as $n$ increases.

We next establish the convergence rate of the adaptive functional thresholding estimator $\widehat{\Sigma}^{\mathcal{A}}$ over a large class of "approximately sparse" covariance functions defined by
$$\mathcal{C}(q,s_0(p),\epsilon_0;\mathcal{U})=\Big\{\Sigma:\Sigma\succeq 0,\ \max_{1\leq j\leq p}\sum_{k=1}^{p}\|\sigma_j\|^{(1-q)/2}\|\sigma_k\|^{(1-q)/2}\|\Sigma_{jk}\|_{\mathcal{S}}^{q}\leq s_0(p),\ \max_j\|\sigma_j^{-1}\|\,\|\sigma_j\|\leq\epsilon_0^{-1}<\infty\Big\}$$
for some $0\leq q<1$, where $\|\sigma_j\|=\sup_{u\in\mathcal{U}}\sigma_j(u)$ and $\Sigma\succeq 0$ means that $\Sigma=\{\Sigma_{jk}(\cdot,\cdot)\}_{p\times p}$ is positive semidefinite, that is, $\sum_{j,k}\iint\Sigma_{jk}(u,v)a_j(u)a_k(v)\,du\,dv\geq 0$ for any $a_j(\cdot)\in L_2(\mathcal{U})$, $j=1,\dots,p$. See Cai and Liu (Citation2011) for a similar class of covariance matrices for nonfunctional data. Compared with the class
$$\mathcal{C}^*(q,s_0(p),M;\mathcal{U})=\Big\{\Sigma:\Sigma\succeq 0,\ \max_j\|\sigma_j\|_{\mathcal{N}}\leq M,\ \max_j\sum_{k=1}^{p}\|\Sigma_{jk}\|_{\mathcal{S}}^{q}\leq s_0(p)\Big\},$$
over which the universal functional thresholding estimator $\widehat{\Sigma}^{\mathcal{U}}$ can be shown to be consistent, the columns of a covariance function in $\mathcal{C}(q,s_0(p),\epsilon_0;\mathcal{U})$ are required to lie within a weighted $l_q/l_2$ ball instead of a standard $l_q/l_2$ ball, where the weights are determined by the $\|\sigma_j\|$'s. Unlike $\mathcal{C}^*(q,s_0(p),M;\mathcal{U})$, the class $\mathcal{C}(q,s_0(p),\epsilon_0;\mathcal{U})$ no longer requires the uniform boundedness assumption on the $\|\sigma_j\|_{\mathcal{N}}$'s and allows $\max_j\|\sigma_j\|_{\mathcal{N}}\to\infty$. In the special case $q=0$, $\mathcal{C}(q,s_0(p),\epsilon_0;\mathcal{U})$ corresponds to a class of truly sparse covariance functions. Notably, $s_0(p)$ can depend on $p$ and be regarded implicitly as the restriction on functional sparsity.

Theorem 1. Suppose that Conditions 1–4 hold. Then there exists some constant $\delta>0$ such that, uniformly on $\mathcal{C}(q,s_0(p),\epsilon_0;\mathcal{U})$, if $\lambda=\delta(\log p/n)^{1/2}$,
$$\|\widehat{\Sigma}^{\mathcal{A}}-\Sigma\|_1=\max_{1\leq k\leq p}\sum_{j=1}^{p}\|\widehat{\Sigma}_{jk}^{\mathcal{A}}-\Sigma_{jk}\|_{\mathcal{S}}=O_P\Big\{s_0(p)\Big(\frac{\log p}{n}\Big)^{\frac{1-q}{2}}\Big\}.\tag{5}$$

Theorem 1 presents the convergence result in the functional version of the matrix $l_1$ norm. The rate in (5) is consistent with those of sparse covariance matrix estimates in Rothman, Levina, and Zhu (Citation2009) and Cai and Liu (Citation2011).

We finally turn to the support recovery consistency of $\widehat{\Sigma}^{\mathcal{A}}$ over the parameter space of truly sparse covariance functions defined by
$$\mathcal{C}_0(s_0(p);\mathcal{U})=\Big\{\Sigma:\Sigma\succeq 0,\ \max_{1\leq j\leq p}\sum_{k=1}^{p}I(\|\Sigma_{jk}\|_{\mathcal{S}}\neq 0)\leq s_0(p)\Big\},$$
which assumes that $(\Sigma_{jk})_{p\times p}$ has at most $s_0(p)$ nonzero functional entries in each row. The following theorem shows that, with the choice of $\lambda=\delta(\log p/n)^{1/2}$ for some constant $\delta>0$, $\widehat{\Sigma}^{\mathcal{A}}$ exactly recovers the support of $\Sigma$, $\operatorname{supp}(\Sigma)=\{(j,k):\|\Sigma_{jk}\|_{\mathcal{S}}\neq 0\}$, with probability approaching one.

Theorem 2. Suppose that Conditions 1–4 hold and $\|\Sigma_{jk}/\Theta_{jk}^{1/2}\|_{\mathcal{S}}>(2\delta+\gamma)(\log p/n)^{1/2}$ for all $(j,k)\in\operatorname{supp}(\Sigma)$ and some $\gamma>0$, where $\delta$ is as stated in Theorem 1. Then we have $\inf_{\Sigma\in\mathcal{C}_0}P\{\operatorname{supp}(\widehat{\Sigma}^{\mathcal{A}})=\operatorname{supp}(\Sigma)\}\to 1$ as $n\to\infty$.

Theorem 2 ensures that $\widehat{\Sigma}^{\mathcal{A}}$ achieves exact recovery of the functional sparsity structure in $\Sigma$, that is, the graph support in functional connectivity analysis, with probability tending to 1. This theorem holds under the condition that the Hilbert–Schmidt norms of the nonzero standardized functional entries exceed a certain threshold, which ensures that nonzero components are correctly retained. See an analogous minimum signal strength condition for sparse covariance matrices in Cai and Liu (Citation2011).

4 Partially Observed Functional Data

In this section we consider a practical scenario where each $X_{ij}(\cdot)$ is partially observed, with errors, at random measurement locations $U_{ij1},\dots,U_{ijL_{ij}}\in\mathcal{U}$. Let $Z_{ijl}$ be the observed value of $X_{ij}(U_{ijl})$. Then
$$Z_{ijl}=X_{ij}(U_{ijl})+\varepsilon_{ijl},\quad l=1,\dots,L_{ij},\tag{6}$$
where the $\varepsilon_{ijl}$'s are iid errors with $\mathrm{E}(\varepsilon_{ijl})=0$ and $\operatorname{var}(\varepsilon_{ijl})=\sigma^2$, independent of $X_{ij}(\cdot)$. For dense measurement designs all $L_{ij}$'s are larger than some order of $n$, while for sparse designs all $L_{ij}$'s are bounded (Zhang and Wang Citation2016; Qiao et al. Citation2020).

4.1 Estimation Procedure

Based on the observed data $\{(U_{ijl},Z_{ijl})\}_{1\leq i\leq n,\,1\leq j\leq p,\,1\leq l\leq L_{ij}}$, we next present a unified estimation procedure that handles both densely and sparsely sampled functional data.

We first develop a nonparametric smoothing approach to estimate the $\Sigma_{jk}(u,v)$'s. Without loss of generality, we assume that $X_i(\cdot)$ has been centered to have mean zero. Denote $K_h(\cdot)=h^{-1}K(\cdot/h)$ for a univariate kernel function $K$ with a bandwidth $h>0$. A local linear surface smoother (LLS) is employed to estimate the cross-covariance functions $\Sigma_{jk}(u,v)$ ($j\neq k$) by minimizing
$$\sum_{i=1}^{n}\sum_{l=1}^{L_{ij}}\sum_{m=1}^{L_{ik}}\big\{Z_{ijl}Z_{ikm}-\alpha_0-\alpha_1(U_{ijl}-u)-\alpha_2(U_{ikm}-v)\big\}^2 K_{h_C}(U_{ijl}-u)K_{h_C}(U_{ikm}-v)\tag{7}$$
with respect to $(\alpha_0,\alpha_1,\alpha_2)$. Let the minimizer of (7) be $(\widehat{\alpha}_0,\widehat{\alpha}_1,\widehat{\alpha}_2)$; the resulting estimator is $\widetilde{\Sigma}_{jk}(u,v)=\widehat{\alpha}_0$. To estimate the marginal-covariance functions $\Sigma_{jj}(u,v)$, we observe that $\operatorname{cov}(Z_{ijl},Z_{ijm})=\Sigma_{jj}(U_{ijl},U_{ijm})+\sigma^2 I(l=m)$, and hence apply an LLS to the off-diagonals of the raw covariances $(Z_{ijl}Z_{ijm})_{1\leq l\neq m\leq L_{ij}}$. We consider minimizing
$$\sum_{i=1}^{n}\sum_{1\leq l\neq m\leq L_{ij}}\big\{Z_{ijl}Z_{ijm}-\beta_0-\beta_1(U_{ijl}-u)-\beta_2(U_{ijm}-v)\big\}^2 K_{h_M}(U_{ijl}-u)K_{h_M}(U_{ijm}-v)$$
with respect to $(\beta_0,\beta_1,\beta_2)$, thus obtaining the estimate $\widetilde{\Sigma}_{jj}(u,v)=\widehat{\beta}_0$. Note that we drop the subscripts $j,k$ of $h_{C,jk}$ and $j$ of $h_{M,j}$ to simplify notation in this section; however, we select different bandwidths $h_{C,jk}$ and $h_{M,j}$ across $j,k=1,\dots,p$ in our empirical studies.
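A hedged sketch of the LLS at a single evaluation point $(u,v)$ follows, solving the weighted least squares problem (7) directly; the Gaussian kernel and the data layout are our own assumptions, and in practice one evaluates this over a grid of $(u,v)$ pairs.

```python
import numpy as np

def k_gauss(x, h):
    # Gaussian kernel K_h(x) = h^{-1} K(x/h)
    return np.exp(-0.5 * (x / h) ** 2) / (h * np.sqrt(2 * np.pi))

def lls_cross_cov(U_j, Z_j, U_k, Z_k, u, v, h):
    """Local linear surface smoother for Sigma_jk(u, v), j != k, via Eq. (7).
    U_j, Z_j (resp. U_k, Z_k): lists over subjects i of location and (centered)
    observation arrays for variable j (resp. k). Returns the intercept alpha_0."""
    design, resp, wts = [], [], []
    for Uij, Zij, Uik, Zik in zip(U_j, Z_j, U_k, Z_k):
        du_ = Uij[:, None] - u                    # (L_ij, 1)
        dv_ = Uik[None, :] - v                    # (1, L_ik)
        w = k_gauss(du_, h) * k_gauss(dv_, h)     # product kernel weights
        y = Zij[:, None] * Zik[None, :]           # raw cross-products Z_ijl Z_ikm
        ones = np.ones_like(w)
        design.append(np.stack([ones, du_ + 0 * dv_, 0 * du_ + dv_],
                               axis=-1).reshape(-1, 3))
        resp.append(y.ravel())
        wts.append(w.ravel())
    Xd = np.vstack(design)
    y, w = np.concatenate(resp), np.concatenate(wts)
    sw = np.sqrt(w)                               # weighted least squares
    beta, *_ = np.linalg.lstsq(Xd * sw[:, None], y * sw, rcond=None)
    return beta[0]                                # estimate of Sigma_jk(u, v)
```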

To construct the corresponding adaptive functional thresholding estimator, a standard approach is to incorporate the variance effect of each $\widetilde{\Sigma}_{jk}(u,v)$ into functional thresholding. However, estimating the $\operatorname{var}\{\widetilde{\Sigma}_{jk}(u,v)\}$'s involves estimating multiple complicated fourth-moment terms (Zhang and Wang Citation2016), which results in a high computational burden especially for large $p$. Since our focus is on characterizing the main variability of $\widetilde{\Sigma}_{jk}(u,v)$ rather than estimating its variance precisely, we next develop a computationally simple yet effective approach to estimate the main terms in the asymptotic variance of $\widetilde{\Sigma}_{jk}(u,v)$. For $a,b=0,1,2$, let
$$T_{ab,ijk}(u,v)=\sum_{l=1}^{L_{ij}}\sum_{m=1}^{L_{ik}}g_{ab}\{h_C,(u,v),(U_{ijl},U_{ikm})\}Z_{ijl}Z_{ikm},\tag{8}$$
where $g_{ab}\{h,(u,v),(U_{ijl},U_{ikm})\}=K_h(U_{ijl}-u)K_h(U_{ikm}-v)(U_{ijl}-u)^a(U_{ikm}-v)^b$. According to Section D.1 of the supplementary material, minimizing (7) yields the estimator
$$\widetilde{\Sigma}_{jk}=\sum_{i=1}^{n}\big(W_{1,jk}T_{00,ijk}+W_{2,jk}T_{10,ijk}+W_{3,jk}T_{01,ijk}\big),\tag{9}$$
where $W_{1,jk},W_{2,jk},W_{3,jk}$ can be represented via (S.12) in terms of
$$S_{ab,jk}(u,v)=\sum_{i=1}^{n}\sum_{l=1}^{L_{ij}}\sum_{m=1}^{L_{ik}}g_{ab}\{h_C,(u,v),(U_{ijl},U_{ikm})\},\quad a,b=0,1,2.\tag{10}$$

It is notable that the estimator $\widetilde{\Sigma}_{jk}$ in (9) is expressed as the sum of $n$ independent terms. Ignoring the cross-covariances among observations within a subject, which are dominated by the corresponding variances, we propose a surrogate estimator for the asymptotic variance of $\widetilde{\Sigma}_{jk}$,
$$\widetilde{\Psi}_{jk}=I_{jk}\sum_{i=1}^{n}\big(W_{1,jk}V_{00,ijk}+W_{2,jk}V_{10,ijk}+W_{3,jk}V_{01,ijk}\big)^2,\tag{11}$$
where
$$I_{jk}=\Big(\sum_{i=1}^{n}L_{ij}L_{ik}\Big)^2\Big\{\sum_{i=1}^{n}\big(L_{ij}L_{ik}h_C^{-2}+L_{ij}^2L_{ik}h_C^{-1}+L_{ij}L_{ik}^2h_C^{-1}+L_{ij}^2L_{ik}^2\big)\Big\}^{-1},\tag{12}$$
$$V_{ab,ijk}(u,v)=\sum_{l=1}^{L_{ij}}\sum_{m=1}^{L_{ik}}g_{ab}\{h_C,(u,v),(U_{ijl},U_{ikm})\}\{Z_{ijl}Z_{ikm}-\widetilde{\Sigma}_{jk}(u,v)\}.\tag{13}$$

The rationale for multiplying by the rate $I_{jk}$ in (11) is to ensure that $\widetilde{\Psi}_{jk}(u,v)$ converges to some finite function as $n\to\infty$ and $h_C\to 0$, as justified in Section D.4 of the supplementary material. In particular, the rate $I_{jk}$ simplifies to $\sum_{i=1}^{n}L_{ij}L_{ik}h_C^{2}$ in the sparse or moderately dense case and to $(\sum_{i=1}^{n}L_{ij}L_{ik})^2(\sum_{i=1}^{n}L_{ij}^2L_{ik}^2)^{-1}$ in the very dense case. Note that $I_{jk}$ is imposed in (11) mainly for theoretical purposes and hence does not place a practical constraint on our method.

Following a similar procedure, the estimated variance factor $\widetilde{\Psi}_{jj}$ of $\widetilde{\Sigma}_{jj}$ for each $j$ can be obtained by operating on $\{Z_{ijl}Z_{ijm}\}_{1\leq i\leq n,\,1\leq l\neq m\leq L_{ij}}$ instead of $\{Z_{ijl}Z_{ikm}\}_{1\leq i\leq n,\,1\leq l\leq L_{ij},\,1\leq m\leq L_{ik}}$ for $j\neq k$. Substituting $\widetilde{\Psi}_{jk}$ for $\widehat{\Theta}_{jk}$ in (2), we obtain the smoothed adaptive functional thresholding estimator
$$\widetilde{\Sigma}^{\mathcal{A}}=(\widetilde{\Sigma}_{jk}^{\mathcal{A}})_{p\times p}\quad\text{with}\quad\widetilde{\Sigma}_{jk}^{\mathcal{A}}=\widetilde{\Psi}_{jk}^{1/2}\times s_\lambda\big(\widetilde{\Sigma}_{jk}/\widetilde{\Psi}_{jk}^{1/2}\big).\tag{14}$$

For comparison, we also define the smoothed universal functional thresholding estimator $\widetilde{\Sigma}^{\mathcal{U}}=(\widetilde{\Sigma}_{jk}^{\mathcal{U}})_{p\times p}$ with $\widetilde{\Sigma}_{jk}^{\mathcal{U}}=s_\lambda(\widetilde{\Sigma}_{jk})$.

A natural alternative to the proposed LLS-based smoothing procedure is to pre-smooth each individual curve. For densely sampled functional data, the observations $Z_{ij1},\dots,Z_{ijL_{ij}}$ for each $i$ and $j$ can be pre-smoothed through the local linear smoother to eliminate the contaminating noise, thus producing reconstructed random curves $\widehat{X}_{ij}(\cdot)$ before subsequent analysis (Zhang and Chen Citation2007); see the detailed implementation of pre-smoothing in Section D.2 of the supplementary material. For sparsely sampled functional data, such a pre-smoothing step is not viable, whereas our smoothing proposal borrows strength across functions by incorporating information from all the observations and hence remains applicable. See also Section 5.3 for a numerical comparison between pre-smoothing and our smoothing approach under different measurement designs.

4.2 Theoretical Properties

In this section, we investigate the theoretical properties of $\widetilde{\Sigma}^{\mathcal{A}}$ for partially observed functional data. We begin by introducing some notation. For two positive sequences $\{a_n\}$ and $\{b_n\}$, we write $a_n\lesssim b_n$ if there exists a positive constant $c_0$ such that $a_n/b_n\leq c_0$, and we write $a_n\asymp b_n$ if and only if $a_n\lesssim b_n$ and $b_n\lesssim a_n$ hold simultaneously. Before presenting the theory, we impose the following regularity conditions.

Condition 5. (i) Let $\{U_{ijl}:i=1,\dots,n,\,j=1,\dots,p,\,l=1,\dots,L_{ij}\}$ be iid copies of a random variable $U$ with density $f_U(\cdot)$ defined on the compact set $\mathcal{U}$, with the $L_{ij}$'s fixed. There exist some constants $m_f$ and $M_f$ such that $0<m_f\leq\inf_{\mathcal{U}}f_U(u)\leq\sup_{\mathcal{U}}f_U(u)\leq M_f<\infty$; (ii) $X_{ij}$, $\varepsilon_{ijl}$ and $U_{ijl}$ are independent for each $i,j,l$.

Condition 6. (i) Under the sparse measurement design, $L_{ij}\leq L_0<\infty$ for all $i,j$, and under the dense design, $L_{ij}=L\to\infty$ as $n\to\infty$ with the $U_{ijl}$'s independent of $i$; (ii) the bandwidth parameters satisfy $h_C\asymp h_M\asymp h\to 0$ as $n\to\infty$.

Condition 5 is standard in the functional data analysis literature (Zhang and Wang Citation2016). Condition 6(i) treats the number of measurement locations $L_{ij}$ as bounded under the sparse measurement design and as diverging under the dense design. To simplify notation, we assume that $L_{ij}=L$ in the dense case and that $h_C$ is of the same order as $h_M$ in Condition 6(ii).

Condition 7. There exists some constant $\gamma_1\in(0,1/2]$ such that, with probability approaching one,
$$\max_{1\leq j,k\leq p}\|\widetilde{\Sigma}_{jk}-\Sigma_{jk}\|_{\mathcal{S}}\lesssim\Big(\frac{\log p}{n^{2\gamma_1}}\Big)^{1/2}+h^2.\tag{15}$$

Condition 8. There exist some positive constants $c_1$, $\gamma_2\in(0,1/2]$ and some deterministic functions $\Psi_{jk}(u,v)$ with $\min_{j,k}\inf_{u,v\in\mathcal{U}}\Psi_{jk}(u,v)\geq c_1$ such that, with probability approaching one,
$$\max_{1\leq j,k\leq p}\sup_{u,v\in\mathcal{U}}|\widetilde{\Psi}_{jk}(u,v)-\Psi_{jk}(u,v)|\lesssim\Big(\frac{\log p}{n^{2\gamma_2}}\Big)^{1/2}+h^2.\tag{16}$$

Condition 9. The pair $(n,p)$ satisfies $\log p/n^{\min(\gamma_1,\gamma_2)}\to 0$ and $\log p\geq c_2 n^{2\gamma_1}h^4$ for some positive constant $c_2$ as $n,p\to\infty$.

We follow Qiao et al. (Citation2020) in imposing Condition 7, in which the parameter $\gamma_1$ depends on $h$ and possibly on $L$ under the dense design. This condition is satisfied if there exist some positive constants $c_3,c_4,c_5$ such that for each $j,k=1,\dots,p$ and $t\in(0,1]$,
$$P\big(\|\widetilde{\Sigma}_{jk}-\Sigma_{jk}\|_{\mathcal{S}}\geq t+c_5h^2\big)\leq c_4\exp(-c_3n^{2\gamma_1}t^2).\tag{17}$$

The presence of $h^2$ comes from the standard results for bias terms under the boundedness condition on the second-order partial derivatives of $\Sigma_{jk}(u,v)$ over $\mathcal{U}^2$ (Yao, Müller, and Wang Citation2005; Zhang and Wang Citation2016). This concentration result holds under different measurement schedules ranging from sparse to dense designs as $\gamma_1$ increases. For sparsely sampled functional data, Lemma 4 of Qiao et al. (Citation2020) established an $L_2$ concentration inequality for $\widetilde{\Sigma}_{jk}$ with $j=k$, which not only results in the same $L_2$ rate as in the sparse case of Zhang and Wang (Citation2016) but also ensures (17) with the choice of $\gamma_1=1/2-a$ and $h\asymp n^{-a}$ for some positive constant $a<1/2$. Following the same proof procedure, the same concentration inequality also applies for $j\neq k$, and hence Condition 7 is satisfied. This condition is also satisfied by densely sampled functional data, since it follows from Lemma 5 of Qiao et al. (Citation2020) that (17) holds for $j=k$ and, with more effort, also for $j\neq k$ by choosing $\gamma_1=\min(1/2,\,1/3+b/6-\epsilon/2-2a/3)$ for some small constant $\epsilon>0$ when $h\asymp n^{-a}$ and $L\asymp n^{b}$ for some constants $a,b>0$. As $L$ grows sufficiently large, $\gamma_1=1/2$, thus leading to the same rate as in the ultra-dense case of Zhang and Wang (Citation2016). Condition 8 gives the uniform convergence rate for $\widetilde{\Psi}_{jk}(u,v)$ in the same form as (15) but with a different parameter $\gamma_2$. A denser measurement design corresponds to a larger value of $\gamma_2$ and a faster rate in (16). See the heuristic verification of Condition 8 in Section D.4 of the supplementary material. Condition 9 indicates that $p$ can grow exponentially fast relative to $n$.

We next present the convergence rate of the smoothed adaptive functional thresholding estimator $\widetilde{\Sigma}^{\mathcal{A}}$ over a class of "approximately sparse" covariance functions defined by
$$\widetilde{\mathcal{C}}(q,\tilde{s}_0(p),\epsilon_0;\mathcal{U})=\Big\{\Sigma:\Sigma\succeq 0,\ \max_{1\leq j\leq p}\sum_{k=1}^{p}\|\Psi_{jk}\|^{(1-q)/2}\|\Sigma_{jk}\|_{\mathcal{S}}^{q}\leq\tilde{s}_0(p),\ \max_{j,k}\|\Psi_{jk}^{-1}\|\,\|\Psi_{jk}\|\leq\epsilon_0^{-1}<\infty\Big\}$$
for some $0\leq q<1$.

Theorem 3. Suppose that Conditions 5–9 hold. Then there exists some constant $\tilde{\delta}>0$ such that, uniformly on $\widetilde{\mathcal{C}}(q,\tilde{s}_0(p),\epsilon_0;\mathcal{U})$, if $\lambda=\tilde{\delta}(\log p/n^{2\gamma_1})^{1/2}$,
$$\|\widetilde{\Sigma}^{\mathcal{A}}-\Sigma\|_1=\max_{1\leq k\leq p}\sum_{j=1}^{p}\|\widetilde{\Sigma}_{jk}^{\mathcal{A}}-\Sigma_{jk}\|_{\mathcal{S}}=O_P\Big\{\tilde{s}_0(p)\Big(\frac{\log p}{n^{2\gamma_1}}\Big)^{\frac{1-q}{2}}\Big\}.\tag{18}$$

The convergence rate of $\widetilde{\Sigma}^{\mathcal{A}}$ in (18) is governed by the internal parameters $(\gamma_1,q)$ and other dimensionality parameters. Larger values of $\gamma_1$ correspond to a more frequent measurement schedule with larger $L$ and result in a faster rate. The convergence result implicitly reveals an interesting phase transition phenomenon depending on the relative order of $L$ to $n$. When $L$ grows fast enough, $\gamma_1=1/2$ and the rate coincides with that for fully observed functional data in (5), showing that the theory for very densely sampled functional data falls in the parametric paradigm. When $L$ grows moderately fast, $\gamma_1<1/2$ and the rate is faster than that for sparsely sampled functional data but slower than the parametric rate.

We finally present Theorem 4, which guarantees the support recovery consistency of $\widetilde{\Sigma}^{\mathcal{A}}$.

Theorem 4. Suppose that Conditions 5–9 hold and $\|\Sigma_{jk}/\Psi_{jk}^{1/2}\|_{\mathcal{S}}>(2\tilde{\delta}+\tilde{\gamma})(\log p/n^{2\gamma_1})^{1/2}$ for all $(j,k)\in\operatorname{supp}(\Sigma)$ and some $\tilde{\gamma}>0$, where $\tilde{\delta}$ is as stated in Theorem 3. Then $\inf_{\Sigma\in\mathcal{C}_0}P\{\operatorname{supp}(\widetilde{\Sigma}^{\mathcal{A}})=\operatorname{supp}(\Sigma)\}\to 1$ as $n\to\infty$.

4.3 Fast Computation

Consider a common situation in practice where, for each $i=1,\dots,n$, we observe the noisy versions of $X_{i1}(\cdot),\dots,X_{ip}(\cdot)$ at the same set of points $U_{i1},\dots,U_{iL_i}\in\mathcal{U}$ across $j=1,\dots,p$. Then the original model in (6) simplifies to
$$Z_{ijl}=X_{ij}(U_{il})+\varepsilon_{ijl},\quad l=1,\dots,L_i,\tag{19}$$
under which the proposed estimation procedure in Section 4.1 can still be applied. Suppose that the estimated covariance function is evaluated at a grid of $R\times R$ locations $\{(u_{r_1},u_{r_2})\in\mathcal{U}^2:r_1,r_2=1,\dots,R\}$. When estimating the $p(p+1)/2$ marginal- and cross-covariance functions and the corresponding variance factors, LLSs under the simplified model in (19) reduce the number of kernel evaluations from $O(\sum_{i=1}^{n}\sum_{j=1}^{p}L_{ij}R)$ to $O(\sum_{i=1}^{n}L_iR)$, which substantially accelerates the computation under a high-dimensional regime.

Such a nonparametric smoothing approach is conceptually simple but suffers from a high computational cost in kernel evaluations. To further reduce the computational burden, we consider fast implementations of LLSs by adopting a simple approximation technique, known as linear binning (Fan and Marron Citation1994), for the covariance function estimation. The key idea of the binning method is to greatly reduce the number of kernel evaluations by exploiting the fact that many of these evaluations are nearly the same. We start by dividing $\mathcal{U}$ into an equally spaced grid of $R$ points, $u_1<\dots<u_R\in\mathcal{U}$, with binwidth $\Delta=u_2-u_1$. Denote by $w_r(U_{il})=\max(1-\Delta^{-1}|U_{il}-u_r|,0)$ the linear weight that $U_{il}$ assigns to the grid point $u_r$ for $r=1,\dots,R$. For the $i$th subject, we define its "binned weighted counts" and "binned weighted averages" as
$$\varpi_{r,i}=\sum_{l=1}^{L_i}w_r(U_{il})\quad\text{and}\quad D_{r,ij}=\sum_{l=1}^{L_i}w_r(U_{il})Z_{ijl},$$
respectively. The binned implementation of smoothed adaptive functional thresholding can then be carried out using this modified dataset $\{(\varpi_{r,i},D_{r,ij})\}_{1\leq i\leq n,\,1\leq j\leq p,\,1\leq r\leq R}$ and the related kernel functions $g_{ab}\{h,(u,v),(u_{r_1},u_{r_2})\}$ for $r_1,r_2=1,\dots,R$. It is notable that, with the help of such a binned implementation, the number of kernel evaluations required in the covariance function estimation is further reduced from $O(\sum_{i=1}^{n}L_iR)$ to $O(R)$, while only $O(\sum_{i=1}^{n}L_i)$ additional operations are involved for each $j$ in the binning step (Fan and Marron Citation1994).
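A minimal sketch of the linear binning step for one subject under Model (19) follows; the array layout is our own choice, and the two grid points bracketing each observation share its unit weight.

```python
import numpy as np

def linear_bin(U_i, Z_i, grid):
    """Binned weighted counts and averages for subject i.
    U_i: (L_i,) locations; Z_i: (L_i, p) observations across j; grid: (R,)
    equally spaced points on U. Returns counts (R,) and averages (p, R)."""
    R, delta = len(grid), grid[1] - grid[0]
    counts = np.zeros(R)
    avgs = np.zeros((Z_i.shape[1], R))
    for l, u in enumerate(U_i):
        # w_r(u) = max(1 - |u - u_r|/Delta, 0) is nonzero only at the two
        # nearest grid points.
        r = min(int((u - grid[0]) / delta), R - 2)
        w_hi = (u - grid[r]) / delta
        counts[r] += 1.0 - w_hi
        counts[r + 1] += w_hi
        avgs[:, r] += (1.0 - w_hi) * Z_i[l]
        avgs[:, r + 1] += w_hi * Z_i[l]
    return counts, avgs
```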

We next illustrate the binned implementation of the LLS, denoted by BinLLS, using the example of the smoothed estimates $\widetilde{\Sigma}_{jk}$ for $j\neq k$ in (9). Under Model (19), we drop the subscripts $j,k$ in $W_{1,jk}$, $W_{2,jk}$, $W_{3,jk}$ and $S_{ab,jk}$ owing to the common set of points $\{U_{i1},\dots,U_{iL_i}\}$ across $j,k$. Denote the binned approximations of $T_{ab,ijk}$ and $S_{ab}$ by $\check{T}_{ab,ijk}$ and $\check{S}_{ab}$, respectively. It follows from (8) and (10) that
$$\check{T}_{ab,ijk}(u,v)=\sum_{r_1=1}^{R}\sum_{r_2=1}^{R}g_{ab}\{h_C,(u,v),(u_{r_1},u_{r_2})\}D_{r_1,ij}D_{r_2,ik},\qquad\check{S}_{ab}(u,v)=\sum_{i=1}^{n}\sum_{r_1=1}^{R}\sum_{r_2=1}^{R}g_{ab}\{h_C,(u,v),(u_{r_1},u_{r_2})\}\varpi_{r_1,i}\varpi_{r_2,i},$$
both of which together with (9) yield the binned approximation of $\widetilde{\Sigma}_{jk}$ as $\check{\Sigma}_{jk}=\sum_{i=1}^{n}(\check{W}_1\check{T}_{00,ijk}+\check{W}_2\check{T}_{10,ijk}+\check{W}_3\check{T}_{01,ijk})$, where $\check{W}_1,\check{W}_2$ and $\check{W}_3$ are the binned approximations of $W_1$, $W_2$ and $W_3$, computed by replacing the related $S_{ab}$'s in (S.12) of the supplementary material with the $\check{S}_{ab}$'s. It is worth noting that, for each pair $(j,k)$, this binned implementation reduces the number of operations (i.e., additions and multiplications) from $O(R^2\sum_{i=1}^{n}L_i^2)$ to $O(nR^2+R^4)$, since the kernel evaluations in $g_{ab}\{h_C,(u,v),(u_{r_1},u_{r_2})\}$ no longer depend on individual observations. Table 1 presents the computational complexity analysis of LLS and BinLLS under Models (6) and (19). It reveals that the binned implementation dramatically improves the computational speed for both densely and sparsely sampled functional data, which is also supported by the empirical evidence in Section 5.3.

Table 1 Computational complexity of LLS and BinLLS under Models (6) and (19) when evaluating the corresponding smoothed covariance function estimates at a grid of R × R points.

To aid the binned implementation of the smoothed adaptive functional thresholding estimator, we then derive the binned approximation of the variance factor $\widetilde{\Psi}_{jk}$, denoted by $\check{\Psi}_{jk}$. It follows from (13) that $V_{ab,ijk}$ can be approximated by
$$\check{V}_{ab,ijk}(u,v)=\sum_{r_1=1}^{R}\sum_{r_2=1}^{R}g_{ab}\{h_C,(u,v),(u_{r_1},u_{r_2})\}\{D_{r_1,ij}D_{r_2,ik}-\check{\Sigma}_{jk}(u,v)\varpi_{r_1,i}\varpi_{r_2,i}\}.$$

Substituting each term in (11) with its binned approximation, we obtain $\check{\Psi}_{jk}=I_{jk}\sum_{i=1}^{n}(\check{W}_1\check{V}_{00,ijk}+\check{W}_2\check{V}_{10,ijk}+\check{W}_3\check{V}_{01,ijk})^2$.

It is worth mentioning that, when $j=k$, the binned approximations of $\widetilde{\Sigma}_{jj}$ and $\widetilde{\Psi}_{jj}$ can be computed in a similar fashion except that the terms corresponding to $r_1=r_2$ should be excluded from all double summations over $\{1,\dots,R\}^2$. Finally, we obtain the binned adaptive functional thresholding estimator $\check{\Sigma}^{\mathcal{A}}=(\check{\Sigma}_{jk}^{\mathcal{A}})_{p\times p}$ with $\check{\Sigma}_{jk}^{\mathcal{A}}=\check{\Psi}_{jk}^{1/2}\times s_\lambda(\check{\Sigma}_{jk}/\check{\Psi}_{jk}^{1/2})$ and the corresponding universal thresholding estimator $\check{\Sigma}^{\mathcal{U}}=(\check{\Sigma}_{jk}^{\mathcal{U}})_{p\times p}$ with $\check{\Sigma}_{jk}^{\mathcal{U}}=s_\lambda(\check{\Sigma}_{jk})$.

5 Simulations

5.1 Setup

We conduct a number of simulations to compare adaptive functional thresholding estimators to universal functional thresholding estimators. Sections 5.2 and 5.3 consider scenarios where random functions are fully and partially observed, respectively.

In each scenario, to mimic the infinite-dimensionality of random curves, we generate functional variables by $X_{ij}(u)=s(u)^{\mathrm{T}}\theta_{ij}$ for $i=1,\dots,n$, $j=1,\dots,p$ and $u\in\mathcal{U}=[0,1]$, where $s(u)$ is a 50-dimensional Fourier basis function and $\theta_i=(\theta_{i1}^{\mathrm{T}},\dots,\theta_{ip}^{\mathrm{T}})^{\mathrm{T}}\in\mathbb{R}^{50p}$ is generated from a mean zero multivariate Gaussian distribution with block covariance matrix $\Omega\in\mathbb{R}^{50p\times 50p}$, whose $(j,k)$th block is $\Omega_{jk}\in\mathbb{R}^{50\times 50}$ for $j,k=1,\dots,p$. The functional sparsity pattern in $\Sigma=\{\Sigma_{jk}(\cdot,\cdot)\}_{p\times p}$, with $(j,k)$th entry $\Sigma_{jk}(u,v)=s(u)^{\mathrm{T}}\Omega_{jk}s(v)$, can be characterized by the block sparsity structure in $\Omega$. Define $\Omega_{jk}=\omega_{jk}D$ with $D=\operatorname{diag}(1^{-2},\dots,50^{-2})$, and hence $\operatorname{cov}(\theta_{ijk},\theta_{ijk'})\asymp k^{-2}I(k=k')$ for $k,k'=1,\dots,50$. Then we generate $\Omega$ with different block sparsity patterns as follows; a code sketch of this data-generating mechanism appears after the model definitions.

  • Model 1 (block banded). For $j,k=1,\dots,p/2$, $\omega_{jk}=(1-|j-k|/10)_+$. For $j,k=p/2+1,\dots,p$, $\omega_{jk}=4I(j=k)$.

  • Model 2 (block sparse without any special structure). For $j,k=p/2+1,\dots,p$, $\omega_{jk}=4I(j=k)$. For $j,k=1,\dots,p/2$, we generate $\omega=(\omega_{jk})_{p/2\times p/2}=B+\delta I_{p/2}$, where the elements of $B$ are sampled independently from Uniform$[0.3,0.8]$ with probability 0.2 or set to 0 with probability 0.8, and $\delta=\max\{-\lambda_{\min}(B),0\}+0.01$ guarantees the positive definiteness of $\Omega$.
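The following sketch generates data under Model 1 (Model 2 differs only in how $\omega$ is filled); the Fourier basis construction and function names are our own assumptions, and the dense multivariate normal draw is only practical for moderate $p$.

```python
import numpy as np

def fourier_basis(u, K=50):
    """First K Fourier basis functions on [0, 1], evaluated at points u."""
    s = [np.ones_like(u)]
    k = 1
    while len(s) < K:
        s.append(np.sqrt(2) * np.sin(2 * np.pi * k * u))
        s.append(np.sqrt(2) * np.cos(2 * np.pi * k * u))
        k += 1
    return np.stack(s[:K])                        # (K, len(u))

def simulate_model1(n, p, grid, K=50, seed=None):
    """X_ij(u) = s(u)^T theta_ij with Omega_jk = omega_jk * D under Model 1."""
    rng = np.random.default_rng(seed)
    D = np.diag(np.arange(1, K + 1) ** -2.0)      # diag(1^-2, ..., 50^-2)
    omega = np.zeros((p, p))
    h = p // 2
    jj, kk = np.meshgrid(np.arange(h), np.arange(h), indexing='ij')
    omega[:h, :h] = np.maximum(1.0 - np.abs(jj - kk) / 10.0, 0.0)  # block banded
    omega[h:, h:] = 4.0 * np.eye(p - h)                            # block diagonal
    Omega = np.kron(omega, D)                     # (Kp, Kp) covariance of theta_i
    theta = rng.multivariate_normal(np.zeros(K * p), Omega, size=n)
    S = fourier_basis(grid, K)                    # (K, R)
    return theta.reshape(n, p, K) @ S             # (n, p, R) curves on the grid
```

For instance, `simulate_model1(100, 50, np.linspace(0, 1, 21))` returns a (100, 50, 21) array of curves evaluated on a 21-point grid.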

We implement a cross-validation approach (Bickel and Levina Citation2008) to choose the optimal thresholding parameter $\widehat{\lambda}$ in $\widehat{\Sigma}^{\mathcal{A}}$. Specifically, we randomly divide the sample $\{X_i:i=1,\dots,n\}$ into two subsamples of sizes $n_1$ and $n_2$, where $n_1=n(1-1/\log n)$ and $n_2=n/\log n$, and repeat this $N$ times. Let $\widehat{\Sigma}^{\mathcal{A},1(\nu)}(\lambda)$ and $\widehat{\Sigma}^{\mathcal{S},2(\nu)}$ be the adaptive functional thresholding estimator as a function of $\lambda$ and the sample covariance function, based on the $n_1$ and $n_2$ observations, respectively, from the $\nu$th split. We select the optimal $\widehat{\lambda}$ by minimizing
$$\widehat{\operatorname{err}}(\lambda)=N^{-1}\sum_{\nu=1}^{N}\big\|\widehat{\Sigma}^{\mathcal{A},1(\nu)}(\lambda)-\widehat{\Sigma}^{\mathcal{S},2(\nu)}\big\|_{\mathrm{F}}^2,$$
where $\|\cdot\|_{\mathrm{F}}$ denotes the functional version of the Frobenius norm, that is, for any $Q=\{Q_{jk}(\cdot,\cdot)\}_{p\times p}$ with each $Q_{jk}\in\mathbb{S}$, $\|Q\|_{\mathrm{F}}=(\sum_{j,k}\|Q_{jk}\|_{\mathcal{S}}^2)^{1/2}$. The optimal thresholding parameters in $\widehat{\Sigma}^{\mathcal{U}}$, $\widetilde{\Sigma}^{\mathcal{A}}$, $\widetilde{\Sigma}^{\mathcal{U}}$, $\check{\Sigma}^{\mathcal{A}}$ and $\check{\Sigma}^{\mathcal{U}}$ can be selected in a similar fashion.
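A sketch of this cross-validation scheme for fully observed curves, reusing `adaptive_functional_threshold` and `sample_cov_fun` from the earlier sketches; the constant grid factor in the Frobenius norm is dropped since it does not affect the argmin.

```python
import numpy as np

def cv_select_lambda(X, lambdas, rule, n_splits=5, seed=None):
    """Cross-validation choice of the thresholding parameter (Bickel and
    Levina, 2008); X is an (n, p, R) array of fully observed curves."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n2 = int(np.ceil(n / np.log(n)))              # validation size n / log(n)
    errs = np.zeros(len(lambdas))
    for _ in range(n_splits):
        perm = rng.permutation(n)
        valid, train = perm[:n2], perm[n2:]
        S2 = sample_cov_fun(X[valid])             # sample covariance on split 2
        for a, lam in enumerate(lambdas):
            est = adaptive_functional_threshold(X[train], lam, rule)
            errs[a] += np.sum((est - S2) ** 2)    # squared functional Frobenius
    return lambdas[int(np.argmin(errs))]
```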

5.2 Fully Observed Functional Data

We compare the adaptive functional thresholding estimator $\widehat{\Sigma}^{\mathcal{A}}$ with the universal functional thresholding estimator $\widehat{\Sigma}^{\mathcal{U}}$ under the hard, soft, SCAD (with $a=3.7$) and adaptive lasso (with $\eta=3$) functional thresholding rules, where the corresponding $\widehat{\lambda}$'s are selected by cross-validation with $N=5$. We generate $n=100$ observations for $p=50,100,150$ and replicate each simulation 100 times. We examine the performance of all competing approaches by estimation and support recovery accuracies. In terms of estimation accuracy, Table 2 reports numerical summaries of losses measured by functional versions of the Frobenius and matrix $l_1$ norms. To assess support recovery consistency, we present in Table 3 the average true positive rates (TPRs) and false positive rates (FPRs), defined as
$$\mathrm{TPR}=\frac{\#\{(j,k):\|\widehat{\Sigma}_{jk}\|_{\mathcal{S}}\neq 0\text{ and }\|\Sigma_{jk}\|_{\mathcal{S}}\neq 0\}}{\#\{(j,k):\|\Sigma_{jk}\|_{\mathcal{S}}\neq 0\}}\quad\text{and}\quad \mathrm{FPR}=\frac{\#\{(j,k):\|\widehat{\Sigma}_{jk}\|_{\mathcal{S}}\neq 0\text{ and }\|\Sigma_{jk}\|_{\mathcal{S}}=0\}}{\#\{(j,k):\|\Sigma_{jk}\|_{\mathcal{S}}=0\}}.$$
Since the results under Models 1 and 2 show similar trends, we only present the numerical results under Model 2 here to save space; see Tables 9 and 10 of the supplementary material for the results under Model 1.

Table 2 The average (standard error) functional matrix losses over 100 simulation runs.

Table 3 The average TPRs/FPRs over 100 simulation runs.

Several conclusions can be drawn from Tables 2–3 and Tables 9–10 of the supplementary material. First, in all scenarios, $\widehat{\Sigma}^{\mathcal{A}}$ provides substantially improved accuracy over $\widehat{\Sigma}^{\mathcal{U}}$ regardless of the thresholding rule or the loss used. We also obtain the sample covariance function $\widehat{\Sigma}^{\mathcal{S}}$, the results of which deteriorate severely compared with $\widehat{\Sigma}^{\mathcal{A}}$ and $\widehat{\Sigma}^{\mathcal{U}}$. Second, for support recovery, $\widehat{\Sigma}^{\mathcal{A}}$ again uniformly outperforms $\widehat{\Sigma}^{\mathcal{U}}$, which fails to recover the functional sparsity pattern especially when $p$ is large. Third, the adaptive functional thresholding approach using the hard and adaptive lasso functional thresholding rules tends to have lower losses and lower TPRs/FPRs than that using the soft and SCAD functional thresholding rules.

5.3 Partially Observed Functional Data

In this section, we assess the finite-sample performance of the LLS and BinLLS methods for handling partially observed functional data. We first generate random functions $X_{ij}(\cdot)$ for $i=1,\dots,n$, $j=1,\dots,p$ by the same procedure as in Section 5.1, with either nonsparse or sparse $\Sigma$ depending on $p$. We then generate the observed values $Z_{ijl}$ from (19), where the measurement locations $U_{il}$ and errors $\varepsilon_{ijl}$ are sampled independently from Uniform$[0,1]$ and $N(0,0.5^2)$, respectively. We consider settings with $n=100$ and $L_i=11,21,51,101$, ranging from sparse to moderately dense to very dense measurement schedules. We use the Gaussian kernel with optimal bandwidths proportional to $n^{-1/6}$, $(nL_i^2)^{-1/6}$ and $n^{-1/4}$, respectively, as suggested in Zhang and Wang (Citation2016); for the empirical work in this article we choose the proportionality constants in the range $(0,1]$, which gives good results in all settings we consider.

To compare BinLLS with LLS in terms of computational speed and estimation accuracy, we first consider a low-dimensional example with $p=6$ and nonsparse $\Sigma$, generated by modifying Model 1 with $\omega_{jk}=(1-|j-k|/10)_+$ for $j,k=1,\dots,6$. In addition to our proposed smoothing methods, we also implement local-linear-smoother-based pre-smoothing and its binned implementation, denoted by LLS-P and BinLLS-P, respectively. Table 4 reports numerical summaries of estimation errors evaluated at $R=21$ equally spaced points in $[0,1]$ and the corresponding CPU times on an Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz processor. The results for the sample covariance function $\widehat{\Sigma}^{\mathcal{S}}$ based on fully observed $X_1(\cdot),\dots,X_n(\cdot)$ are also provided as a baseline for comparison. Note that LLS is too slow to implement for the case $L_i=101$, so we do not report its results.

Table 4 The average (standard error) functional matrix losses and average CPU time for p = 6 over 100 simulation runs.

A few trends are observable from Table 4. First, the binned implementations (BinLLS and BinLLS-P) attain similar or even lower estimation errors compared with their direct implementations (LLS and LLS-P) under all scenarios, while achieving considerably faster computational speeds, especially under dense designs; for example, BinLLS runs over 400 times faster than LLS when $L_i=51$. Second, all methods provide higher estimation accuracy as $L_i$ increases, and enjoy performance similar to the fully observed functional case when functions are very densely observed, for example, $L_i=51$ and 101. However, the performance of LLS-P and BinLLS-P deteriorates severely under sparse designs, for example, $L_i=11$ and 21, since limited information is available from a small number of observations per subject. Among all competitors, we conclude that BinLLS is overall a unified approach that handles both sparsely and densely sampled functional data well, with increased computational efficiency and guaranteed estimation accuracy.

We next examine the performance of the BinLLS-based adaptive and universal functional thresholding estimators in terms of estimation accuracy and support recovery consistency, using the same performance measures as in Section 5.2. Tables 5–6 and Tables 11–14 of the supplementary material report numerical results for settings with $p=50$ and 100 satisfying Models 1 and 2 under different measurement schedules. We observe a few apparent patterns from Tables 5–6 and 11–14. First, $\check{\Sigma}^{\mathcal{A}}$ substantially outperforms $\check{\Sigma}^{\mathcal{U}}$ with significantly lower estimation errors in all settings. Second, $\check{\Sigma}^{\mathcal{A}}$ works consistently well in recovering the functional sparsity structures, especially under the soft and SCAD functional thresholding rules, while $\check{\Sigma}^{\mathcal{U}}$ fails to identify such patterns. Third, the estimation and support recovery consistency of $\check{\Sigma}^{\mathcal{A}}$ and $\check{\Sigma}^{\mathcal{U}}$ improve as $L_i$ increases. When curves are very densely observed, for example, $L_i=101$, both estimators enjoy performance similar to $\widehat{\Sigma}^{\mathcal{A}}$ and $\widehat{\Sigma}^{\mathcal{U}}$ in Tables 2–3 and 9–10 of the supplementary material. This observation provides empirical evidence supporting our remark after Theorem 3 about the matching convergence rates for very densely observed and fully observed functional scenarios.

Table 5 The average (standard error) functional matrix losses for partially observed functional scenarios and p = 50 over 100 simulation runs.

Table 6 The average TPRs/FPRs for partially observed functional scenarios and p = 50 over 100 simulation runs.

6 Real Data

In this section, we investigate the association between brain functional connectivity and fluid intelligence (gF), the capacity to solve problems independently of acquired knowledge (Cattell Citation1987). The dataset contains subjects' resting-state fMRI scans and the corresponding gF scores, measured by the 24-item Raven's Progressive Matrices, from the Human Connectome Project (HCP). We follow many recent proposals based on HCP by modeling signals as multivariate random functions with each region of interest (ROI) representing one random function (Lee et al. in press; Miao, Zhang, and Wong in press; Zapata, Oh, and Petersen Citation2022). We focus our analysis on $n_{\mathrm{low}}=73$ subjects with intelligence scores $\mathrm{gF}\leq 8$ and $n_{\mathrm{high}}=85$ subjects with $\mathrm{gF}\geq 23$, and consider $p=83$ ROIs from three generally acknowledged modules in neuroscience (Finn et al. Citation2015): the medial frontal (29 ROIs), frontoparietal (34 ROIs) and default mode (20 ROIs) modules. For each subject, the BOLD signals at each ROI are collected every 0.72 sec for a total of $L=1200$ measurement locations (14.4 min). We first implement the ICA-FIX preprocessing pipeline (Glasser et al. Citation2013) and a standard band-pass filter at $[0.01,0.08]$ Hz to exclude frequency bands not implicated in resting-state functional connectivity (Biswal et al. Citation1995). Figure 12 of the supplementary material displays exemplified trajectories of the pre-smoothed data. The adaptive functional thresholding method is then adopted to estimate the sparse covariance function and therefore the brain networks.

The sparsity structures in $\widehat{\Sigma}^{\mathcal{A}}$ for both groups are displayed in Figure 1. With $\widehat{\lambda}$ selected by cross-validation, the network associated with $\widehat{\Sigma}^{\mathcal{A}}$ for subjects with $\mathrm{gF}\geq 23$ is more densely connected than that with $\mathrm{gF}\leq 8$, as evident from Figure 1(a)–(b). We further set the sparsity level to 70% and 85%, and present the corresponding sparsity patterns in Figure 1(c)–(f). The results clearly indicate the existence of three diagonal blocks under all sparsity levels, complying with the identification of the medial frontal, frontoparietal and default mode modules in Finn et al. (Citation2015). We also implement the universal functional thresholding method. However, compared with $\widehat{\Sigma}^{\mathcal{A}}$, the results of $\widehat{\Sigma}^{\mathcal{U}}$ suffer from heteroscedasticity, as demonstrated in Section 5 and Section E.3 of the supplementary material, and fail to detect any noticeable block structure; hence, we choose not to report them here. To explore the impact of gF on functional connectivity, we compute the connectivity strength using the standardized form $\|\widehat{\Sigma}_{jk}^{\mathcal{A}}\|_{\mathcal{S}}/(\|\widehat{\Sigma}_{jj}^{\mathcal{A}}\|_{\mathcal{S}}\|\widehat{\Sigma}_{kk}^{\mathcal{A}}\|_{\mathcal{S}})^{1/2}$ for $j,k=1,\dots,p$. Interestingly, we observe from Figure 2 that subjects with $\mathrm{gF}\geq 23$ tend to have enhanced brain connectivity in the medial frontal and frontoparietal modules, while the connectivity strength in the default mode module declines. This agrees with existing neuroscience literature reporting a strong positive association between intelligence scores and medial frontal/frontoparietal functional connectivity in the resting state (Van Den Heuvel et al. Citation2009; Finn et al. Citation2015), and lends support to the conclusion that lower default mode module activity is associated with better cognitive performance (Anticevic et al. Citation2012). See also Section E.3 of the supplementary material, where we illustrate our adaptive functional thresholding estimation using another dataset (ADHD).


Fig. 1 Estimated sparsity structures in Σ̂A using soft functional thresholding rule at fluid intelligence gF≤8 and gF≥23: (a)–(b) with the corresponding λ̂ selected by 5-fold cross-validation; (c)–(f) with the estimated functional sparsity levels set at 70% and 85%.


Fig. 2 The connectivity strengths in Figure 1(e)–(f) at fluid intelligence gF≤8 and gF≥23. Salmon, orange and yellow nodes represent the ROIs in the medial frontal, frontoparietal and default mode modules, respectively. The edge color from cyan to blue corresponds to the value of ||Σ̂jkA||S/(||Σ̂jjA||S||Σ̂kkA||S)1/2 from small to large.

Supplementary Materials

The supplementary materials contain all the technical proofs, further methodological derivations and additional discussion and empirical results. We also provide the code and datasets used in Sections 5 and 6 in the supplementary materials.


Acknowledgments

We are grateful to the editor, the associate editor and two referees for their insightful comments and suggestions, which have led to significant improvement of our article.

Disclosure Statement

The authors report there are no competing interests to declare.

Additional information

Funding

Shaojun Guo was partially supported by the National Natural Science Foundation of China (No. 11771447).

References

  • Anticevic, A., Cole, M. W., Murray, J. D., Corlett, P. R., Wang, X.-J., and Krystal, J. H. (2012), “The Role of Default Network Deactivation in Cognition and Disease,” Trends in Cognitive Sciences, 16, 584–592. DOI: 10.1016/j.tics.2012.10.008.
  • Avella-Medina, M., Battey, H. S., Fan, J., and Li, Q. (2018), “Robust estimation of High-Dimensional Covariance and Precision Matrices,” Biometrika, 105, 271–284. DOI: 10.1093/biomet/asy011.
  • Bickel, P. J., and Levina, E. (2008), “Covariance Regularization by Thresholding,” The Annals of Statistics, 36, 2577–2604. DOI: 10.1214/08-AOS600.
  • Biswal, B., Zerrin Yetkin, F., Haughton, V. M., and Hyde, J. S. (1995), “Functional Connectivity in the Motor Cortex of Resting Human Brain Using Echo-Planar MRI,” Magnetic Resonance in Medicine, 34, 537–541. DOI: 10.1002/mrm.1910340409.
  • Cai, T., and Liu, W. (2011), “Adaptive Thresholding for Sparse Covariance Matrix Estimation,” Journal of the American Statistical Association, 106, 672–684. DOI: 10.1198/jasa.2011.tm10560.
  • Cattell, R. B. (1987), Intelligence: Its Structure, Growth and Action, Amsterdam: Elsevier.
  • Chang, C., and Glover, G. H. (2010), “Time–Frequency Dynamics of Resting-State Brain Connectivity Measured with fMRI,” Neuroimage, 50, 81–98. DOI: 10.1016/j.neuroimage.2009.12.011.
  • Chen, Z., and Leng, C. (2016), “Dynamic Covariance Models,” Journal of the American Statistical Association, 111, 1196–1207. DOI: 10.1080/01621459.2015.1077712.
  • Chiou, J.-M., Yang, Y.-F., and Chen, Y.-T. (2016), “Multivariate Functional Linear Regression and Prediction,” Journal of Multivariate Analysis, 146, 301–312. DOI: 10.1016/j.jmva.2015.10.003.
  • Fan, J., and Li, R. (2001), “Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties,” Journal of the American Statistical Association, 96, 1348–1360. DOI: 10.1198/016214501753382273.
  • Fan, J., and Marron, J. S. (1994), “Fast Implementations of Nonparametric Curve Estimators,” Journal of Computational and Graphical Statistics, 3, 35–56. DOI: 10.2307/1390794.
  • Finn, E. S., Shen, X., Scheinost, D., Rosenberg, M. D., Huang, J., Chun, M. M., Papademetris, X., and Constable, R. T. (2015), “Functional Connectome Fingerprinting: Identifying Individuals Using Patterns of Brain Connectivity,” Nature Neuroscience, 18, 1664–1671. DOI: 10.1038/nn.4135.
  • Glasser, M. F., Sotiropoulos, S. N., Wilson, J. A., Coalson, T. S., Fischl, B., Andersson, J. L., Xu, J., Jbabdi, S., Webster, M., Polimeni, J. R. et al. (2013), “The Minimal Preprocessing Pipelines for the Human Connectome Project,” Neuroimage, 80, 105–124. DOI: 10.1016/j.neuroimage.2013.04.127.
  • Guo, S., Qiao, X., and Wang, Q. (2022), “Factor Modelling for High-Dimensional Functional Time Series,” arXiv:2112.13651v2.
  • Happ, C., and Greven, S. (2018), “Multivariate Functional Principal Component Analysis for Data Observed on Different (dimensional) Domains,” Journal of the American Statistical Association, 113, 649–659. DOI: 10.1080/01621459.2016.1273115.
  • Kong, D., Xue, K., Yao, F., and Zhang, H. H. (2016), “Partially Functional Linear Regression in High Dimensions,” Biometrika, 103, 147–159. DOI: 10.1093/biomet/asv062.
  • Kosorok, M. R. (2008), Introduction to Empirical Processes and Semiparametric Inference, Springer Series in Statistics, New York: Springer.
  • Lee, K.-Y., Ji, D., Li, L., Constable, T., and Zhao, H. (in press), “Conditional Functional Graphical Models,” Journal of the American Statistical Association, DOI: 10.1080/01621459.2021.1924178.
  • Li, B., and Solea, E. (2018), “A Nonparametric Graphical Model for Functional Data with Application to Brain Networks based on fMRI,” Journal of the American Statistical Association, 113, 1637–1655. DOI: 10.1080/01621459.2017.1356726.
  • Lotte, F., Bougrain, L., Cichocki, A., Clerc, M., Congedo, M., Rakotomamonjy, A., and Yger, F. (2018), “A Review of Classification Algorithms for EEG-based Brain–Computer Interfaces: A 10 Year Update,” Journal of Neural Engineering, 15, 031005. DOI: 10.1088/1741-2552/aab2f2.
  • Miao, R., Zhang, X., and Wong, R. K. (in press), “A Wavelet-based Independence Test for Functional Data with an Application to MEG Functional Connectivity,” Journal of the American Statistical Association, DOI: 10.1080/01621459.2021.2020126.
  • Park, J., Ahn, J., and Jeon, Y. (2021), “Sparse Functional Linear Discriminant Analysis,” Biometrika, 109, 209–226. DOI: 10.1093/biomet/asaa107.
  • Qiao, X., Guo, S., and James, G. (2019), “Functional Graphical Models,” Journal of the American Statistical Association, 114, 211–222. DOI: 10.1080/01621459.2017.1390466.
  • Qiao, X., Qian, C., James, G. M., and Guo, S. (2020), “Doubly Functional Graphical Models in High Dimensions,” Biometrika, 107, 415–431. DOI: 10.1093/biomet/asz072.
  • Rogers, B. P., Morgan, V. L., Newton, A. T., and Gore, J. C. (2007), “Assessing Functional Connectivity in the Human Brain by fMRI,” Magnetic Resonance Imaging, 25, 1347–1357. DOI: 10.1016/j.mri.2007.03.007.
  • Rothman, A. J., Levina, E., and Zhu, J. (2009), “Generalized Thresholding of Large Covariance Matrices,” Journal of the American Statistical Association, 104, 177–186. DOI: 10.1198/jasa.2009.0101.
  • Storey, J. D., Xiao, W., Leek, J. T., Tompkins, R. G., and Davis, R. W. (2005), “Significance Analysis of Time Course Microarray Experiments,” Proceedings of the National Academy of Sciences, 102, 12837–12842. DOI: 10.1073/pnas.0504609102.
  • Van Den Heuvel, M. P., Stam, C. J., Kahn, R. S., and Pol, H. E. H. (2009), “Efficiency of Functional Brain Networks and Intellectual Performance,” Journal of Neuroscience, 29, 7619–7624. DOI: 10.1523/JNEUROSCI.1443-09.2009.
  • Vu, V. Q., and Lei, J. (2013), “Minimax Sparse Principal Subspace Estimation in High Dimensions,” The Annals of Statistics, 41, 2905–2947. DOI: 10.1214/13-AOS1151.
  • Wang, H., Peng, B., Li, D., and Leng, C. (2021), “Nonparametric Estimation of Large Covariance Matrices with Conditional Sparsity,” Journal of Econometrics, 223, 53–72. DOI: 10.1016/j.jeconom.2020.09.002.
  • Yao, F., Müller, H.-G., and Wang, J.-L. (2005), “Functional Data Analysis for Sparse Longitudinal Data,” Journal of the American Statistical Association, 100, 577–590. DOI: 10.1198/016214504000001745.
  • Yuan, M., and Lin, Y. (2006), “Model Selection and Estimation in Regression with Grouped Variables,” Journal of the Royal Statistical Society, Series B, 68, 49–67. DOI: 10.1111/j.1467-9868.2005.00532.x.
  • Zapata, J., Oh, S. Y., and Petersen, A. (2022), “Partial Separability and Functional Graphical Models for Multivariate Gaussian Processes,” Biometrika, 109, 665–681. DOI: 10.1093/biomet/asab046.
  • Zhang, J.-T., and Chen, J. (2007), “Statistical Inferences for Functional Data,” The Annals of Statistics, 35, 1052–1079. DOI: 10.1214/009053606000001505.
  • Zhang, X., and Wang, J.-L. (2016), “From Sparse to Dense Functional Data and Beyond,” The Annals of Statistics, 44, 2281–2321. DOI: 10.1214/16-AOS1446.
  • Zou, H. (2006), “The Adaptive Lasso and Its Oracle Properties,” Journal of the American Statistical Association, 101, 1418–1429. DOI: 10.1198/016214506000000735.