
Covariance estimation via fiducial inference

Pages 316-331 | Received 23 May 2020, Accepted 10 Jan 2021, Published online: 15 Feb 2021

Abstract

As a classical problem, covariance estimation has drawn much attention from the statistical community for decades. Much work has been done under the frequentist and Bayesian frameworks. Aiming to quantify the uncertainty of the estimators without having to choose a prior, we have developed a fiducial approach to the estimation of the covariance matrix. Built upon the fiducial Bernstein–von Mises Theorem, we show that the fiducial distribution of the covariate matrix is consistent under our framework. Consequently, the samples generated from this fiducial distribution are good estimators of the true covariance matrix, which enables us to define a meaningful confidence region for the covariance matrix. Lastly, we also show that the fiducial approach can be a powerful tool for identifying clique structures in covariance matrices.


1. Introduction

Estimating covariance matrices has historically been a challenging problem. Many regression-based methods have emerged in the last few decades, especially in the context of 'large p, small n'. Among the notable methods are the graphical LASSO algorithms (Friedman et al., 2008, 2010; Rothman, 2012). Pourahmadi (2011) provides a detailed overview of the progress of covariance estimation. The Positive Definite Sparse Covariance Estimators (PDSCE) method (Rothman, 2012) has gained great popularity due to its performance compared with other current methods, although it only produces a point estimator.

Aiming to have a distribution of good covariance estimators, we propose a generalised fiducial approach. The ideas underpinning fiducial inference were introduced by Fisher (1922, 1930, 1933, 1935), whose intention was to overcome the need for priors and other issues with Bayesian methods perceived at the time. The procedure of fiducial inference allows one to obtain a measure on the parameter space without requiring priors and defines approximate pivots for parameters of interest. It is ideal when a priori information about the parameters is unavailable. The key ingredient of the fiducial argument is the data generating equation. Roughly, the generalised fiducial likelihood is defined as the distribution of the functional inverse of the data generating mechanism.

One great advantage of the fiducial approach to covariance matrix estimation is that, without specifying a prior, it produces a family of matrices that are close to the true covariance with a probabilistic characterisation using the fiducial likelihood function. This attractive property enables a meaningful definition for matrix confidence regions.

We are particularly interested in a high-dimensional multivariate linear model setting with possibly an atypical sparsity constraint. Instead of classical sparsity assumptions on the covariance matrix, we consider a type of experimental design that enforces sparsity on the covariate matrix. This phenomenon often arises in the studies of metabolomics and proteomics. One example of this setup is modelling the relationship between a set of gene expression levels and a list of metabolomic data. The expression levels of the genes serve as the predictor variables while the response variables are a variety of metabolite levels, such as sugar and triglycerides. It is known that only a small subset of genes contribute to each metabolite level, and each gene can be responsible for just a few metabolite levels.

Under the sparse covariate setting, we derive the generalised fiducial likelihood of the covariate matrix based on the given observations and prove its asymptotic consistency as the sample size increases. For covariances with community structures (cliques), we prove necessary conditions for achieving accurate clique structure estimation. Samples from the fiducial distribution of a covariate matrix can be generated using Monte Carlo methods. In the general case, a reversible jump Markov chain Monte Carlo (RJMCMC) algorithm may be needed. Similar to classical likelihood functions, fiducial distributions favour models with more parameters. Therefore, in the case where the exact sparsity structure of the covariate is unclear, a penalty term needs to be added. To obtain a family of covariance estimators in the general case, we adapt a zeroth-order method and develop an efficient RJMCMC algorithm that samples from the penalised fiducial distribution.

The rest of the paper is arranged as follows. In Section 2, we provide a brief background on fiducial inference and its development. In Section 3, we introduce the fiducial model for covariance estimation, derive the Generalised Fiducial Distribution (GFD) for the covariate and covariance matrices, and examine the asymptotic properties of the GFD of the covariance matrix under mild assumptions; some toy examples on sampling from the GFD are also shown. Section 4 focuses on the clique model, where we present some theoretical results and show how the fiducial approach can be applied to uncover clique structures. Finally, Section 5 concludes the paper with a summary and a short discussion of the relationship of our approach to Bayesian methods.

2. Generalised fiducial inference

2.1. Brief background

Fiducial inference was first proposed by Fisher (1930) when he introduced the concept of a fiducial distribution of a parameter. In the case of a single-parameter family of distributions, Fisher gave the following definition for the fiducial density f(θ|x) of the parameter based on a single observation x, for the case where the cumulative distribution function F(x|θ) is a monotonic decreasing function of θ:

$$f(\theta \mid x) = -\frac{\partial F(x \mid \theta)}{\partial \theta}. \tag{1}$$

For example, if X ~ N(θ, 1), then F(x|θ) = Φ(x − θ) is decreasing in θ and (1) gives f(θ|x) = φ(x − θ), i.e., the fiducial distribution of θ is N(x, 1). A fiducial distribution can be viewed as a Bayesian posterior distribution obtained without hand-picking a prior. In many single-parameter distribution families, Fisher's fiducial intervals coincide with classical confidence intervals. For families of distributions with multiple parameters, the fiducial approach leads to confidence sets. The definition of fiducial inference has been generalised over the past decades. Hannig et al. (2016) provide a detailed review of the philosophy and current development of the subject.

The generalised fiducial approach has been applied to a variety of models, both parametric and nonparametric, both continuous and discrete. These applications include bioequivalence (Hannig et al., 2006), variance components (Cisewski & Hannig, 2012; Lidong et al., 2008; Li et al., 2018), problems of metrology (Hannig et al., 2003, 2007; Wang et al., 2012; Wang & Iyer, 2005, 2006a, 2006b), interlaboratory experiments and international key comparison experiments (Hannig et al., 2018; Iyer et al., 2004), the maximum mean of a multivariate normal distribution (Wandler & Hannig, 2011), multiple comparisons (Wandler & Hannig, 2012), extreme value estimation (Wandler & Hannig, 2012), mixtures of normal and Cauchy distributions (Glagovskiy, 2006), wavelet regression (Hannig & Lee, 2009), high-dimensional regression (Lai et al., 2015; Williams & Hannig, 2018), item response models (Liu & Hannig, 2016, 2017), and non-parametric survival function estimation with censoring (Cui & Hannig, 2019). Other related approaches include Martin and Liu (2015), Schweder and Hjort (2016), and Xie and Singh (2013).

2.2. Generalised fiducial distribution

The idea underlying generalised fiducial inference is built upon a data generating algorithm G(·, ·) expressing the relationship between the data X and the parameters θ:

$$X = G(U, \theta), \tag{2}$$

where U is the random component of this data generating algorithm whose distribution is known. The data X are assumed to be created by generating a random variable U and plugging it into the data generating algorithm above.

The GFD inverts Equation (2). Assume that x ∈ R^n is continuous and the parameter θ ∈ Θ ⊆ R^p. Under the conditions provided in Hannig et al. (2016), the fiducial distribution is shown to have density

$$r(\theta \mid x) = \frac{f(x, \theta)\, J(x, \theta)}{\int_\Theta f(x, \theta')\, J(x, \theta')\, d\theta'}, \tag{3}$$

where f(x, θ) is the likelihood and

$$J(x, \theta) = D\left( \nabla_\theta G(u, \theta) \big|_{u = G^{-1}(x, \theta)} \right). \tag{4}$$

Here ∇_θ G(u, θ) is the n × p Jacobian matrix. The exact form of D(·) depends on the choices made in the process of inverting (2). In this manuscript, we concentrate on what Hannig et al. (2016) call the 2-norm choice:

$$D(M) = \sqrt{\det(M^T M / n)}, \tag{5}$$

where M^T denotes the matrix transpose of M. Other choices, in particular the ∞-norm that was often used in the past, lead to similar results and are studied in detail in Shi (2015).

3. A fiducial approach to covariance estimation

In this section, we will derive the GFD for the covariance matrix of a multivariate normal random variable. For this problem, various regularised estimators have been proposed under the assumption that the true covariance matrix is sparse (Avella-Medina et al., 2018; Bickel & Levina, 2008a, 2008b; Cai & Liu, 2011; Furrer & Bengtsson, 2007; Huang & Lee, 2016; Huang et al., 2006; Lam & Fan, 2009; Levina et al., 2008; Rothman et al., 2009, 2010; Wu & Pourahmadi, 2003). While many of these estimators have been shown to enjoy excellent rates of convergence, so far little work has been done to quantify the uncertainty of the corresponding estimates.

Let Q^T denote the transpose of a matrix/vector Q. Denote a collection of n observed p-dimensional objects Y = {Y_i : i = 1, …, n}. For the rest of the paper, we assume p is fixed, unless stated otherwise. Consider the following data generating equation:

$$Y_i = A Z_i, \quad i = 1, \ldots, n, \tag{6}$$

where A is a p × p matrix of full rank and Z = {Z_i = (z_{i1}, …, z_{ip})^T, i = 1, …, n} are independent and identically distributed (i.i.d.) p × 1 random vectors following the multivariate normal distribution N(0, I). Hence, the Y_i's are i.i.d. random vectors centred at 0 with covariance matrix AA^T, i.e.,

$$Y_i \overset{\text{i.i.d.}}{\sim} N(0, \Sigma), \quad \text{where } \Sigma = AA^T. \tag{7}$$

Consequently, we have the likelihood for observations y:

$$f(y, A) = (2\pi)^{-np/2}\, |\det(A)|^{-n} \exp\left[ -\tfrac{1}{2}\, \mathrm{tr}\{ n S_n (AA^T)^{-1} \} \right], \tag{8}$$

where S_n = n^{-1} \sum_{i=1}^n y_i y_i^T is the corresponding sample covariance matrix and tr{·} is the trace operator.

We propose to estimate the covariance matrix Σ through the GFD of the covariate matrix A:

$$r(A \mid y) \propto J(y, A)\, f(y, A). \tag{9}$$

Define the stacked observation vector w = (y_1^T, …, y_n^T)^T = (w_1, …, w_{np})^T. Denote u = (u_1, …, u_n), such that y_i = G(u_i, A) for all i. Let a_{ij} be the (i,j)th entry of the matrix A, i.e., A = [a_{ij}]_{1≤i,j≤p}. The corresponding Jacobian J(y, A) derived from (4) is then

$$J(y, A) = D\left( \nabla_A w \big|_{u = G^{-1}(y, A)} \right), \tag{10}$$

where ∇_A w is the np × p² matrix

$$\nabla_A w = \begin{pmatrix} \frac{\partial w_1}{\partial a_{11}} & \frac{\partial w_1}{\partial a_{12}} & \cdots & \frac{\partial w_1}{\partial a_{pp}} \\ \frac{\partial w_2}{\partial a_{11}} & \frac{\partial w_2}{\partial a_{12}} & \cdots & \frac{\partial w_2}{\partial a_{pp}} \\ \vdots & \vdots & & \vdots \\ \frac{\partial w_{np}}{\partial a_{11}} & \frac{\partial w_{np}}{\partial a_{12}} & \cdots & \frac{\partial w_{np}}{\partial a_{pp}} \end{pmatrix}$$

and D(·) is given by (5).

Often some a_{kl} are known to be zero; a common example is the lower triangular matrix A, for which a_{kl} = 0 for l > k. Additionally, sparsity of the covariate model can be introduced by having most of the a_{kl} known to be zero as a part of the model. Note that if a_{kl} is known to be zero, as implied by the model, then the corresponding (k,l)th column is dropped. Therefore, depending on the sparsity model, the dimension of ∇_A w varies.

Recall that there is a one-to-one mapping between positive definite matrices Σ and lower triangular matrices A with positive entries on the main diagonal. While we are not assuming A is lower triangular, in order to alleviate some identifiability issues we will assume that all diagonal entries of A are positive, i.e., a_{kk} > 0, k = 1, …, p.

3.1. Jacobian for full models

Suppose that none of the entries of A is fixed at zero, namely, the parameter space Θ for A is R^{p×p}. We will refer to this as the full model. Under the full model, ∇_A w consists of p blocks, each of dimension np × p. Every row of ∇_A w has non-zero entries in only one block.

By swapping rows in the matrix ∇_A w and plugging in u = G^{-1}(y, A), we obtain the np × p² block diagonal matrix P:

$$P = \begin{pmatrix} U & & \\ & \ddots & \\ & & U \end{pmatrix}, \tag{11}$$

where U = (A^{-1} y_1, …, A^{-1} y_n)^T = V (A^{-1})^T and V = (y_1, …, y_n)^T. Notice that P breaks into p blocks, B_1, …, B_p, where

$$B_i = \begin{pmatrix} O_{n(i-1) \times p} \\ U \\ O_{(np - ni) \times p} \end{pmatrix}$$

and O_{a×b} denotes a zero matrix of dimension a × b.

As a consequence of the Cauchy–Binet formula (see also Hannig et al., 2016), swapping rows does not change the value of the Jacobian function (10). Therefore J(y, A) can be expressed using the matrix P:

$$J(y, A) = D(P) = \det(S_n)^{p/2}\, |\det(A)|^{-p}, \tag{12}$$

where S_n = n^{-1} \sum_{i=1}^n y_i y_i^T is the MLE of the covariance matrix.

By (9), the GFD is proportional to

$$r(A \mid y) \propto \det(S_n)^{p/2} (2\pi)^{-np/2}\, |\det(A)|^{-(n+p)} \exp\left[ -\tfrac{1}{2}\, \mathrm{tr}\{ n S_n (AA^T)^{-1} \} \right]. \tag{13}$$

By transforming the GFD of A, we conclude that the GFD of Σ = AA^T is the inverse Wishart distribution with n degrees of freedom and scale parameter n S_n.
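Since the full-model GFD is a standard inverse Wishart, it can be sampled directly. The following is a minimal sketch in Python; the variable names and the use of scipy.stats.invwishart are our own illustration (under scipy's parameterisation, df = n and scale = n S_n match the GFD above):

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(0)
n, p = 50, 4

# Simulate data from the data generating equation (6): Y_i = A Z_i.
A_true = np.tril(rng.normal(size=(p, p))) + 2 * np.eye(p)  # a hypothetical true A
Y = rng.normal(size=(n, p)) @ A_true.T

S_n = Y.T @ Y / n                              # MLE of the covariance matrix

# Under the full model, the GFD of Sigma is inverse Wishart(n, n * S_n).
gfd = invwishart(df=n, scale=n * S_n)
draws = gfd.rvs(size=1000, random_state=rng)   # (1000, p, p) fiducial samples
Sigma_bar = draws.mean(axis=0)                 # one possible point summary
```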

3.2. Jacobian for the general case

While the GFD of Σ has a closed form under the full model, covariance estimation requires a sufficient number of observations (roughly at least n > 15(p+1)) to maintain reasonable power. In cases where n is small, we reduce the parameter space by introducing a sparse structure M, which determines which entries of A are known to be zero. Recall that we only consider A with positive diagonal entries.

Now assume the general case with a sparse model M, where some entries of A are known to be zero. Denote the (i,j)th entry of A as A_{ij}. Define the zero index set for the ith row as

$$S_i = \{ j : A_{ij} = 0,\ j = 1, \ldots, p \}, \quad i = 1, \ldots, p. \tag{14}$$

The set S_i indicates which entries of A in the ith row are fixed at zero.

Then Equation (10) becomes

$$J(y, A) = D(\tilde{P}), \tag{15}$$

where \tilde{P} = (\tilde{B}_1, …, \tilde{B}_p) is the matrix P with the corresponding columns dropped, i.e., block \tilde{B}_i is obtained from block B_i by removing the columns indexed by S_i.

Let p_i be the number of nonzero entries in the ith row of A, and let U_i be the sub-matrix of U obtained by excluding the columns indexed by S_i. Consequently, Equation (15) becomes

$$J(y, A) = \prod_{i=1}^{p} \sqrt{\det(U_i^T U_i / n)}. \tag{16}$$
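In code, Equation (16) is a product over the rows of A of determinant terms built from the columns of U that remain after deleting those indexed by S_i. A minimal sketch (the helper name and data layout are our own assumptions):

```python
import numpy as np

def jacobian_sparse(Y, A, zero_sets):
    """2-norm Jacobian (16): prod_i sqrt(det(U_i^T U_i / n)), where
    zero_sets[i] = S_i lists the columns of row i of A fixed at zero."""
    n, p = Y.shape
    U = Y @ np.linalg.inv(A).T        # U = V (A^{-1})^T, rows are A^{-1} y_i
    J = 1.0
    for i in range(p):
        keep = [j for j in range(p) if j not in zero_sets[i]]
        Ui = U[:, keep]               # drop the columns indexed by S_i
        J *= np.sqrt(np.linalg.det(Ui.T @ Ui / n))
    return J
```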

3.3. Consistency of fiducial distribution

In general, there is no one-to-one correspondence between the covariance matrix Σ and the covariate matrix A. However, if A is sparse enough, e.g., a lower triangular matrix with positive diagonal entries, the identifiability problem vanishes. In this section we show that, if there is a one-to-one correspondence between Σ and A, then the GFD of the covariate matrix satisfies a fiducial Bernstein–von Mises Theorem (Theorem 3.1), which provides theoretical guarantees of asymptotic normality and asymptotic efficiency for the GFD (Hannig et al., 2016).

The results here are derived based on the FM-distance (Förstner & Moonen, 1999). For two symmetric positive definite matrices M and N, with eigenvalues λ_i(M, N) obtained from det(λM − N) = 0, the FM-distance between M and N is

$$d(M, N) = \sqrt{ \sum_{i=1}^{p} \log^2 \lambda_i(M, N) }. \tag{17}$$

This distance measure is a metric and is invariant with respect to both affine transformations of the coordinate system and inversion of the matrices (Förstner & Moonen, 1999).
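The FM-distance is straightforward to compute from the generalized eigenvalues of the pair (M, N). A small sketch (helper name ours), using scipy's symmetric generalized eigensolver:

```python
import numpy as np
from scipy.linalg import eigh

def fm_distance(M, N):
    """FM-distance (17) between symmetric positive definite M and N:
    sqrt(sum_i log^2 lambda_i), where det(lambda * M - N) = 0."""
    lam = eigh(N, M, eigvals_only=True)   # generalized eigenvalues of (N, M)
    return np.sqrt(np.sum(np.log(lam) ** 2))
```

Note that inverting both matrices maps each λ to 1/λ, which leaves log²λ, and hence the distance, unchanged.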

The Bernstein–von Mises Theorem provides conditions under which the Bayesian posterior distribution is asymptotically normal (van der Vaart, 1998; Ghosh & Ramamoorthi, 2003). The fiducial Bernstein–von Mises Theorem is an extension that provides a list of conditions under which the GFD is asymptotically normal (Sonderegger & Hannig, 2012). Those conditions can be divided into three parts, each ensuring one of the following:

  (a) the maximum likelihood estimator (MLE) is asymptotically normal;

  (b) the Bayesian posterior distribution becomes close to that of the MLE;

  (c) the fiducial distribution is close to the Bayesian posterior.

It is clear that the MLE of Σ is asymptotically normal. Under our model, the conditions for (b) hold due to Proposition A.1 and the construction of the Jacobian formula; the conditions for (c) are satisfied by Propositions A.2 and 3.1. Statements and proofs of the propositions are included in Appendix A.1. Here we state only Proposition 3.1, which contains notation needed in the statement of the main theorem.

Proposition 3.1

The Jacobian function J(y, A) → π_{Σ_0}(A) a.s., uniformly on compacts in A, where π_{Σ_0}(A) is a function of A, independent of the sample size and observations but depending on the true Σ_0. Moreover, π_{Σ_0}(A) is continuous.

Closely following Sonderegger and Hannig (2012), we arrive at Theorem 3.1.

Theorem 3.1

Asymptotic Normality

Let R_A be a vectorised observation from the fiducial distribution r(A|y) and denote the density of B = √n (R_A − Â_n) by π(B, y), where Â_n is the vectorised version of a maximum likelihood estimator. Let I(A) be the Fisher information matrix of the vectorised version of the matrix A. If the sparsity structure is such that there is a one-to-one correspondence between the true covariance matrix Σ_0 and the covariate matrix A_0, I(A_0) is positive definite, and π_{Σ_0}(A_0) > 0, then

$$\int_{\mathbb{R}^{p^2}} \left| \pi(B, y) - \sqrt{\frac{\det I(A_0)}{(2\pi)^{p^2}}} \exp\{ -B^T I(A_0) B / 2 \} \right| dB \xrightarrow{P_{A_0}} 0. \tag{18}$$

See Appendix A.2 for the proof.

Remark 3.1

Since we assume that the diagonal entries of A are positive, the assumption of a one-to-one correspondence between Σ_0 and A_0 is satisfied if the rows and columns of A can be permuted so that the resulting matrix is lower triangular with positive entries on the diagonal.

There are other highly sparse matrices for which there might be a finite number of different A_{0,r} such that Σ_0 = A_{0,r} A_{0,r}^T. Of course, in this case we cannot distinguish between these A_{0,r} based on the data. However, Theorem 3.1 will still be true if we restrict the domain of A to a small enough Euclidean neighbourhood of any of the A_{0,r}, each of these neighbourhoods being selected with a chance proportional to π_{Σ_0}(A_{0,r}).

3.4. Sampling in the general case

Given the true model M_0, standard Markov chain Monte Carlo (MCMC) methods can be utilised for the estimation of the covariance matrix. Under the full model and the clique model, the GFD of Σ follows either an inverse Wishart distribution or a composite of inverse Wishart distributions (see Section 3). Sampling from the GFD then becomes straightforward and can be done through one of the inverse Wishart random generation functions, e.g., InvWishart (MCMCpack, R) or iwishrnd (Matlab).

When p is small and n is large, the estimation of Σ can always be done through this setting, regardless of whether there are zero entries in A. The purpose of having entries of A fixed at zero is to impose a sparsity structure and allow estimation in a high dimensional setting without requiring a large number of observations. As in practice the true sparse structure is often unobserved, we will focus on the cases where M_0 is not given.

For the general case, if the sparse model is unknown, we propose to utilise a reversible jump MCMC (RJMCMC) method to efficiently sample from Equation (20) and simultaneously update M.

RJMCMC is an extension of standard Markov chain Monte Carlo methods that allows simulation of the target distribution on spaces of varying dimensions (Green, 1995). The 'jumps' refer to moves between models with possibly different parameter spaces. More details on RJMCMC can be found in Shi (2015). Since M is unknown, namely the number and the locations of the fixed zeros in the matrix A are unknown, the ability to jump between parameter spaces of different dimensions is desirable for estimating Σ = AA^T. Because RJMCMC searches both within and between parameter spaces, it is known for slower convergence. To improve the efficiency of the algorithm, we adapt the zeroth-order method (Brooks et al., 2003) and impose additional sparsity constraints.

Assuming that there are fixed zeros in A, for a p × p matrix A the number of parameters to be estimated is less than p². If there are many fixed zeros, this number is much smaller, hence the estimation is feasible even if the number of observations n is less than p. In other words, the sparsity assumption on A allows estimation in a 'large p, small n' setting. Suppose the zero entry locations of A are known. The rest of A can then be obtained via standard MCMC techniques, such as Metropolis–Hastings; a sketch of this step follows.
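The following illustration holds the zero pattern fixed and runs a random-walk Metropolis–Hastings sampler targeting r(A|y) ∝ J(y,A) f(y,A), with J computed as in (16). All function and variable names are our own; this is a sketch, not the authors' implementation:

```python
import numpy as np

def log_gfd(Y, A, zero_sets):
    """log r(A|y) up to an additive constant: log f(y,A) + log J(y,A),
    per Equations (8) and (16)."""
    n, p = Y.shape
    S_n = Y.T @ Y / n
    Sigma_inv = np.linalg.inv(A @ A.T)
    loglik = -n * np.log(abs(np.linalg.det(A))) - 0.5 * n * np.trace(S_n @ Sigma_inv)
    U = Y @ np.linalg.inv(A).T
    logJ = 0.0
    for i in range(p):
        keep = [j for j in range(p) if j not in zero_sets[i]]
        logJ += 0.5 * np.linalg.slogdet(U[:, keep].T @ U[:, keep] / n)[1]
    return loglik + logJ

def mh_sampler(Y, A0, zero_sets, n_iter=5000, step=0.05, seed=1):
    """Random-walk MH over the free entries of A; returns Sigma = A A^T draws."""
    rng = np.random.default_rng(seed)
    p = A0.shape[0]
    free = [(i, j) for i in range(p) for j in range(p) if j not in zero_sets[i]]
    A, logp = A0.copy(), log_gfd(Y, A0, zero_sets)
    draws = []
    for _ in range(n_iter):
        prop = A.copy()
        i, j = free[rng.integers(len(free))]
        prop[i, j] += step * rng.normal()
        if all(prop[k, k] > 0 for k in range(p)):       # keep a_kk > 0
            logp_prop = log_gfd(Y, prop, zero_sets)
            if np.log(rng.uniform()) < logp_prop - logp:
                A, logp = prop, logp_prop
        draws.append(A @ A.T)
    return np.array(draws)
```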

Figure 1 considers a case with p = 15 and n = 30. It shows the confidence curve plot per Markov chain for each statistic of interest. In addition to D2Sig, LogD and Eigvec angle as before, we have GFD (log r_p(A|y) without the normalising constant). The initial states for the four Markov chains are SnPa (S_n restricted to maxC (see Section 3.6), in blue), dcho (the diagonal matrix of the Cholesky decomposition, in cyan), diag (the diagonal matrix of S_n, in yellow) and oracle (the true A, in green). In addition, we include the statistics for Σ, S_n, and the PDSCE estimator in comparison with the confidence curves; they are shown as vertical lines as in the previous example.

Figure 1. All the chains show good estimation of the covariance matrix. The estimators are better than both the sample covariance matrix and the PDSCE estimator.

The fiducial estimators have confidence curves that peak around the truth in the GFD and LogD panels. In the right two panels, the (majority of the) fiducial estimators lie to the left of the dotted-dashed lines, indicating that the estimators are closer to the truth than the sample covariance. The PDSCE estimator falls on the right edge of the D2Sig panel, showing that it is not as close to the truth. As before, the PDSCE estimator overestimates the covariance determinant. Here, burn in = 5000, window = 10,000.

3.5. Model selection for the general case

Often in practice, to obtain enough statistical power or simply for feasibility, sparse covariate/covariance assumptions are imposed. Since the exact sparse structure is usually unknown, model selection is required to determine the appropriate parameter space.

Since the GFD behaves like a likelihood function, a penalty term on the parameter space needs to be included in the model selection process in order to avoid over-fitting (Hannig et al., 2016).

For the general case, we propose the following penalty function, based on the Minimum Description Length (MDL) principle (Rissanen, 1978), for a model M:

$$q_M(n) = \exp\left\{ -\sum_{i=1}^{p} \left( \tfrac{1}{2}\, p_i \log(np) + \log \binom{p}{p_i} \right) \right\}, \tag{19}$$

where M corresponds to a p × p matrix with p_i non-fixed-zero elements in its ith row, and n is the number of observations.

The penalised GFD of A is therefore

$$r_p(A \mid M, y) \propto r(A \mid M, y) \times \exp\left\{ -\sum_{i=1}^{p} \left( \tfrac{1}{2}\, p_i \log(np) + \log \binom{p}{p_i} \right) \right\}. \tag{20}$$
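Computationally the penalty is a simple per-row sum. The sketch below follows our reconstruction of Equation (19), so the constants inherit that reading (helper name ours):

```python
from math import comb, log

def log_penalty(zero_sets, n, p):
    """log q_M(n) per Equation (19): each row i contributes
    -(1/2) * p_i * log(n p) - log C(p, p_i)."""
    total = 0.0
    for i in range(p):
        p_i = p - len(zero_sets[i])      # non-fixed-zero entries in row i
        total -= 0.5 * p_i * log(n * p) + log(comb(p, p_i))
    return total
```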

3.6. Sampling in the general case with sparse locations unknown

In the general case with the sparse locations unknown, we further assume that there is a maximum number of nonzeros allowed per column, denoted maxC. This additional constraint can be viewed as each predictor contributing to only a few components of the multivariate response. This assumption is used to reduce the search space for RJMCMC; a simplified sketch of one RJMCMC move is given below. The starting states include MaxC (S_n^{0.5} restricted to maxC, in blue) along with chol (in cyan), dcho (in artichoke), diag (in yellow) and true (in green) as before. We will revisit the example discussed in Section 3.4.
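For completeness, here is a simplified sketch of a single reversible jump move on the zero pattern: a plain birth/death move with a Gaussian proposal. The paper's actual algorithm adapts the zeroth-order method of Brooks et al. (2003), which we do not reproduce; this sketch reuses log_gfd and log_penalty from the sketches above, and maxC and tau are assumptions of the illustration:

```python
import numpy as np

def rjmcmc_step(Y, A, zero_sets, maxC, tau=0.1, rng=None):
    """One birth/death RJMCMC move targeting the penalised GFD (20)."""
    rng = rng if rng is not None else np.random.default_rng()
    n, p = Y.shape
    cur = log_gfd(Y, A, zero_sets) + log_penalty(zero_sets, n, p)
    i, j = int(rng.integers(p)), int(rng.integers(p))
    if i == j:
        return A, zero_sets                   # diagonal entries stay free
    prop_sets = [set(s) for s in zero_sets]
    prop_A = A.copy()
    if j in zero_sets[i]:                     # birth: free entry (i, j)
        if sum(j not in s for s in zero_sets) >= maxC:
            return A, zero_sets               # would violate the maxC constraint
        a_new = tau * rng.normal()
        prop_A[i, j] = a_new
        prop_sets[i].discard(j)
        # identity dimension-matching (Jacobian 1): divide by the proposal density
        log_q = -0.5 * (a_new / tau) ** 2 - np.log(tau * np.sqrt(2 * np.pi))
        log_alpha = (log_gfd(Y, prop_A, prop_sets)
                     + log_penalty(prop_sets, n, p) - cur - log_q)
    else:                                     # death: fix entry (i, j) at zero
        a_old = A[i, j]
        prop_A[i, j] = 0.0
        prop_sets[i].add(j)
        log_q = -0.5 * (a_old / tau) ** 2 - np.log(tau * np.sqrt(2 * np.pi))
        log_alpha = (log_gfd(Y, prop_A, prop_sets)
                     + log_penalty(prop_sets, n, p) - cur + log_q)
    if np.log(rng.uniform()) < log_alpha:
        return prop_A, prop_sets
    return A, zero_sets
```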

(See Figure 2.) In the left two panels, the fiducial estimators peak at the true fiducial likelihood and covariance determinant. The distance comparison plot (top right) shows that the estimators are closer to the truth than both the sample covariance matrix and the PDSCE estimator. The bottom right panel shows that the leading eigenvectors of the estimators are as close to the truth as those of the sample covariance and the PDSCE estimator, as in Figure 1. Here, burn in = 50,000, window = 10,000.

Figure 2. Similar to Figure 1, the fiducial estimators are better than both the sample covariance matrix and the PDSCE estimator in this case.

Additional simulations are included in the supplementary document.

4. Clique model

4.1. Jacobian for the clique model

Assume that the coordinates of y are broken into cliques, i.e., coordinates i and j are correlated if i and j belong to the same clique and independent otherwise. By simply swapping rows and columns of the covariate matrix, we can arrive at a block diagonal form. Without loss of generality, suppose that A is a block diagonal matrix with block sizes g_1, …, g_k. Then its model M defines the parameter space ⊗_{i=1}^k R^{g_i × g_i}. Given M, as an extension of the full model, the GFD in this case becomes a composite of inverse Wishart distributions:

$$r(\Sigma \mid y, M) = \prod_{i=1}^{k} \frac{|n S_n^i|^{n/2}}{2^{n g_i / 2}\, \Gamma_{g_i}(n/2)}\, |\Sigma^i|^{-(n + g_i + 1)/2} \exp\left[ -\tfrac{1}{2}\, \mathrm{tr}\{ n S_n^i (\Sigma^i)^{-1} \} \right], \tag{21}$$

where S_n^i and Σ^i are the sample covariance and covariance component of the ith clique, and Γ_{g_i}(·) is the g_i-dimensional multivariate gamma function.
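Because (21) factorises over cliques, sampling Σ given M reduces to independent inverse Wishart draws per diagonal block. A minimal sketch (function name and data layout ours; it assumes n is at least the largest clique size):

```python
import numpy as np
from scipy.stats import invwishart

def sample_clique_gfd(Y, cliques, size=1000, seed=0):
    """Draw Sigma from the clique-model GFD (21): an independent inverse
    Wishart(n, n * S_n^i) on each clique's diagonal block, zeros elsewhere."""
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    draws = np.zeros((size, p, p))
    for idx in cliques:                       # idx: coordinates of clique i
        g = len(idx)
        Sni = Y[:, idx].T @ Y[:, idx] / n
        block = invwishart(df=n, scale=n * Sni).rvs(size=size, random_state=rng)
        draws[np.ix_(range(size), idx, idx)] = np.reshape(block, (size, g, g))
    return draws
```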

4.2. Theoretic results for the clique models

Recall that under the full model,

$$r(A \mid y) \propto \det(S_n)^{p/2} (2\pi)^{-np/2}\, |\det(A)|^{-(n+p)} \exp\left[ -\tfrac{1}{2}\, \mathrm{tr}\{ n S_n (AA^T)^{-1} \} \right].$$

For clique model selection, we need to evaluate the normalising constant:

$$\int J(y, A)\, f(y \mid A)\, dA = \pi^{(p^2 - np)/2}\, \det(S_n)^{p/2}\, \frac{\Gamma_p(n/2)}{|\det(n S_n)|^{n/2}\, \Gamma_p(p/2)}. \tag{22}$$

The detailed derivation is provided in Appendix A.3.

Let us denote by M a clique model: a collection of k cliques, i.e., sets of indexes that are related to each other. The coordinates are assumed independent if they are not in the same clique. For any positive-definite symmetric matrix S whose dimension is compatible with M, we denote by S_M the matrix obtained from S by setting to zero the off-diagonal entries that correspond to pairs of indexes not in the same clique within M. Note that S_M is a block diagonal (after possible permutations of rows and columns) positive-definite symmetric matrix.

The classical Fischer–Hadamard inequality (Fischer, 1908) implies that, for any positive definite symmetric matrix S and any clique model, det(S) ≤ det(S_M). Ipsen and Lee (2011) provide a useful lower bound. Let ρ be the spectral radius and λ the smallest eigenvalue of (S_M)^{-1}(S − S_M); then

$$e^{-p\rho^2/(1+\lambda)}\, \det(S_M) \le \det(S) \le \det(S_M). \tag{23}$$

Assume the clique sizes are g_1, …, g_k. Then the GFD of the model is

$$r(M \mid y) \propto \pi^{\sum_{i=1}^{k} g_i^2 / 2}\, |\det S_n^M|^{-n/2} \prod_{i=1}^{k} C_{M,i}(y)\, \frac{\Gamma_{g_i}(n/2)}{\Gamma_{g_i}(g_i/2)}, \tag{24}$$

where C_{M,i}(y) denotes the Jacobian constant term |det(S_{n,i})|^{g_i/2} computed only using the observations in the ith clique.

In the remaining part of this section, we consider the dimension of y as a fixed number p and let the sample size n → ∞. Similar arguments could be extended to p → ∞ with p/n → 0.

Given two clique models M_1 and M_2, we write M_1 ⪯ M_2 if the cliques in M_2 are obtained by merging cliques in M_1. Consequently, M_2 has fewer cliques, and these cliques are larger than those of M_1. Let M_0 and Σ_0 be the 'true' clique model and covariance matrix used to generate the observed data. We will call all clique models M satisfying Σ_0^M = Σ_0 compatible with the true covariance matrix. We assume that M_0 ⪯ M for all clique models M compatible with Σ_0.

The following theorem provides some guidelines for choosing penalty function qM(n). Its proof is included in the appendix. Define the penalised GFD of the model as rp(M|y)=r(M|y)qM(n).

Theorem 4.1

For any clique model M that is not compatible with Σ_0, assume det(Σ_0) < det(Σ_0^M) and that the penalty satisfies e^{-an} q_M(n)/q_{M_0}(n) → 0 for all a > 0 as n → ∞.

For any clique model M compatible with Σ0 assume that qM(n)/qM0(n) is bounded.

Then, as n → ∞ with p held fixed, r_p(M_0 | Y) → 1 in probability.

The exact form of the penalty function depends on the norm choice for the Jacobian. Under the 2-norm, the following penalty function works well:

$$q_M(n) = \exp\left\{ -\sum_{i=1}^{k} \left( \tfrac{1}{4}\, g_i^2 \log(n) + \tfrac{1}{2}\, g_i^2 \log(g_i) \right) \right\}. \tag{25}$$

It is easy to check that Equation (25) satisfies Theorem 4.1.

4.3. Sampling from a clique model

The estimation of cliques is closely related to applications in network analysis, such as communities of people in social networks and gene regulatory networks. Recall the penalised clique model GFD introduced in Section 4.2,

$$r_p(M \mid y) \propto \pi^{\sum_{i=1}^{k} g_i^2 / 2}\, |\det S_n^M|^{-n/2} \prod_{i=1}^{k} C_{M,i}(y)\, \frac{\Gamma_{g_i}(n/2)}{\Gamma_{g_i}(g_i/2)}\, q_M(n).$$

Assuming that both the number of cliques k and the clique sizes g_i are unknown, the clique structure can be estimated via a Gibbs sampler. The first example shows the simulation result for a 200×200 covariance matrix (Figure 3). We consider a covariance matrix with 1's on the diagonal and (i,j)th entry equal to 0.5 if coordinates i and j belong to the same clique. From top down, left to right, Figure 3 shows the trace plot of log r_p(M|y) without the normalising constant, the true covariance Σ, the sample covariance S_n, and the fiducial probability of the estimated cliques based on the 10 Gibbs sampler Markov chains with random initial states. The trace plot helps to monitor convergence. The fiducial probability of cliques panel reveals the clique structure precisely. The last panel is the aggregate result of 4000 iterations with burn in = 1000 from the 10 Markov chains.

Figure 3. Result for k = 10, p = 200, n = 1000. The trace plot (top left) shows that the chains converge quickly. Although n/p is small, the sample covariance (bottom left) roughly captures the shape of the true covariance (top right). The last panel (bottom right) shows that the fiducial estimate captures the true clique structure perfectly.
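Inside such a Gibbs sweep, each candidate partition is scored by log r_p(M|y). A sketch of that score, following the reconstructed Equations (24) and (25), so the penalty line inherits our sign conventions (function name ours):

```python
import numpy as np
from scipy.special import multigammaln

def log_model_gfd(Y, cliques):
    """log r_p(M|y) up to an additive constant, per Equations (24)-(25)."""
    n, p = Y.shape
    score = 0.0
    for idx in cliques:
        g = len(idx)
        Sni = Y[:, idx].T @ Y[:, idx] / n
        logdet = np.linalg.slogdet(Sni)[1]
        score += (g ** 2 / 2) * np.log(np.pi)              # pi^{g_i^2 / 2}
        score += -(n / 2) * logdet                         # |det S_n^M|^{-n/2}
        score += (g / 2) * logdet                          # C_{M,i}(y)
        score += multigammaln(n / 2, g) - multigammaln(g / 2, g)
        score += -(g ** 2 / 4) * np.log(n) - (g ** 2 / 2) * np.log(g)  # penalty (25)
    return score
```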

The covariance estimators can be obtained by sampling from inverse Wishart distributions based on the estimated clique structure. Figure 4 shows the confidence curves of four statistics for the estimated covariance matrix Σ̂: the log-transformed generalised fiducial likelihood (SlogGFD), the distance to Σ (D2Sig), the log-determinant (LogD), and the angle between the leading eigenvectors of Σ̂ and Σ (Eigvec angle). The truth for SlogGFD and LogD is shown as red solid vertical lines. In the D2Sig and Eigvec angle panels, we include comparisons to the sample covariance as red dotted-dashed vertical lines. In addition, we compute the point estimate via the Positive Definite Sparse Covariance Estimators (PDSCE) method introduced in Rothman (2012). Its corresponding statistics are shown as magenta dotted vertical lines. In this example, the fiducial estimates peak near the truth in the SlogGFD and LogD panels. The estimated covariance matrices all appear to be more similar to Σ than S_n, as shown in the D2Sig and Eigvec angle panels. The PDSCE estimator is even closer to Σ in terms of FM-distance; however, it greatly overestimates det Σ.

Figure 4. Confidence curve plots for the estimated covariance matrix. k = 10, p = 200, n = 1000. Compared with the sample covariance, the estimators are closer to Σ. The PDSCE estimator shows an even smaller FM-distance to Σ; however, it greatly overestimates det Σ.

The PDSCE method produces a good point estimator of the covariance matrix. It is worth noting that our method shows similar performance, with the benefit of producing a distribution of estimators.

With the same underlying clique model, we generate 200 data sets; with the same true covariance matrix, a new set of 1000 observations is generated for each simulation. We then apply our method with a random Markov chain starting point and compute the one-sided p-values for the log-determinant of the estimated covariance. Figure 5 shows the quantile–quantile plot of the p-values against the uniform [0,1] distribution. The dotted-dashed envelope is the 95% coverage band. The p-value curve (in green) is well enclosed by the envelope, indicating good calibration of the coverage.

Figure 5. 95% coverage plots for 200 repeated simulations. k = 10, p = 200, n = 1000. The p-values (in green) roughly follow a uniform [0,1] distribution, and they lie inside the envelope.
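One plausible way to compute such one-sided p-values from fiducial draws is to read off the empirical position of the true log-determinant within the GFD sample. This construction is our assumption for illustration, not a recipe stated in the paper:

```python
import numpy as np

def one_sided_pvalue(gfd_draws, Sigma_true):
    """Proportion of fiducial draws whose log-determinant is below the truth;
    under good calibration these p-values are roughly uniform on [0, 1]."""
    logdets = np.linalg.slogdet(gfd_draws)[1]      # batched over (size, p, p)
    truth = np.linalg.slogdet(Sigma_true)[1]
    return np.mean(logdets <= truth)
```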

5. Discussion

Covariance estimation is an important problem in statistics. In this manuscript, we approach this classical problem via a generalised fiducial approach. We demonstrate that, under mild assumptions, the GFD of the covariate matrix is asymptotically normal. In addition, we discuss the clique model and show that the fiducial approach is a powerful tool for identifying clique structures, even when the dimension of the parameter space is large and the ratio n/p is small. To identify the covariance structure for non-clique models, in contrast to typical sparse covariance/precision matrix assumptions, we look at cases where the ratio n/p is small and the covariate matrix is sparse. This 'unusual' sparsity assumption arises in applications where multiple dependent variables contribute to several response variables collaboratively. The fiducial approach allows us to obtain a distribution of covariance estimators that are better than the sample covariance and comparable to the PDSCE estimator. The distances to the true covariance matrix show that, as the dimension increases, the fiducial estimators become closer to the true covariance matrix.

Similar to Bayesian approaches, generalised fiducial inference produces a distribution of estimators, yet the two methods differ fundamentally. Bayesian methods rely on prior distributions on the parameter of interest, while fiducial approaches depend on the data generating equation. In the framework discussed here, the data generating mechanism is more natural to establish than appropriate priors would be, while in other settings priors may be easier to construct.

Estimating a sparse covariance matrix without knowing the locations of the fixed zeros is a hard problem. While our approach shows promising results for the clique model, for the general case it still suffers from a few drawbacks: (1) due to the nature of RJMCMC, the computational burden can be significant if the matrix is not very sparse; (2) to limit the search space, a row/column-wise sparsity upper bound needs to be chosen based on prior knowledge of the data type; (3) the results presented in this manuscript assume a square covariate matrix, which can limit direct application to high-throughput data. Furthermore, a more sophisticated way of choosing initial states and mixing methods could improve the efficiency of our algorithm. It is possible, and well worth pursuing, to extend our current work to more general cases.

Supplemental material

Supplemental material for this article is available online.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

Shi's research was supported in part by the National Library of Medicine Institutional Training Grant T15 LM009451. Hannig's research was supported in part by the National Science Foundation (NSF) under Grant Nos. 1512945, 1633074, and 1916115. Lee's research was supported in part by the NSF under Grant Nos. 1512945 and 1513484.

Notes on contributors

W. Jenny Shi

W. Jenny Shi obtained her PhD in Statistics from the University of North Carolina. From 2015 to 2018, she was a National Institutes of Health postdoctoral fellow at the University of Colorado. She is now a Quantitative Strategist at MassMutual, specializing in financial modeling and strategic initiatives.

Jan Hannig

Jan Hannig received his Mgr (MS equivalent) in mathematics in 1996 from Charles University, Prague, Czech Republic. He received his Ph.D. in statistics and probability in 2000 from Michigan State University under the direction of Professor A.V. Skorokhod. From 2000 to 2008 he was on the faculty of the Department of Statistics at Colorado State University, where he was promoted to Associate Professor. He joined the Department of Statistics and Operations Research at the University of North Carolina at Chapel Hill in 2008 and was promoted to Professor in 2013. He is an elected member of the International Statistical Institute and a fellow of the American Statistical Association and the Institute of Mathematical Statistics.

Randy C. S. Lai

Randy C. S. Lai obtained his Ph.D. in Statistics from the University of California, Davis (UC Davis). In 2015–2019 he was an Assistant Professor at the University of Maine. He is now a Visiting Assistant Professor at UC Davis, and he will join Google as a Data Scientist in Spring 2021. His research interests include fiducial inference and statistical computing.

Thomas C. M. Lee

Thomas C. M. Lee is Professor of Statistics and Associate Dean for the Faculty in Mathematical and Physical Sciences at the University of California, Davis. He is an elected Fellow of the American Association for the Advancement of Science (AAAS), the American Statistical Association (ASA), and the Institute of Mathematical Statistics (IMS). From 2013 to 2015 he served as the editor-in-chief for the Journal of Computational and Graphical Statistics, and from 2015 to 2018 he served as the Chair of the Department of Statistics at UC Davis. His recent research interests include astrostatistics, fiducial inference, machine learning, and statistical image and signal processing.


Appendix

A.1. Regularity conditions and Jacobian formula

Before proving the theorem on consistency of the GFD, we will first define the δ-neighbourhood of A0 and establish some regularity conditions on the likelihood function and Jacobian formula (Propositions A.1, A.2, 3.1).

Definition A.1

For a fixed covariate matrix A_0 and δ ≥ 0, define the δ-neighbourhood of A_0 as the set B(A_0, δ) = {A : d(AA^T, A_0 A_0^T) ≤ δ}. Recall that d is the FM-distance (17).

Proposition A.1

For any δ > 0 there exists ϵ > 0 such that

$$P_{A_0}\left( \sup_{A \notin B(A_0, \delta)} \frac{1}{n} \left( L_n(A) - L_n(A_0) \right) \le -\epsilon \right) \to 1,$$

where L_n(A) = log f(y, A) = ∑_{i=1}^n log f(y_i, A).

Proof.

Let Σ = AA^T and Σ_0 = A_0 A_0^T. Denote by S_n the sample covariance matrix as before, n ∈ N. Since S_n is the maximum likelihood estimator, we have S_n → Σ_0 in P_{A_0}, i.e., for all r > 0,

$$P_{A_0}\left(\{\omega : d(S_n(\omega), \Sigma_0) \ge r\}\right) \to 0.$$

Define L_{δ,n} = {ω : d(S_n(ω), Σ_0) < δ/2}. For an arbitrary ω ∈ L_{δ,n}, let the λ_i's and λ_i''s be the eigenvalues of S_n(ω) Σ^{-1} and S_n(ω) Σ_0^{-1}, respectively. Suppose that A ∉ B(A_0, δ); then

$$\delta < d(\Sigma, \Sigma_0) \le d(\Sigma, S_n(\omega)) + d(S_n(\omega), \Sigma_0) < d(\Sigma, S_n(\omega)) + \delta/2,$$

so that

$$d(\Sigma, S_n(\omega)) = \sqrt{\sum_{i=1}^{p} \log^2 \lambda_i} > \delta/2.$$

So there exists k ∈ {1, 2, …, p} such that log² λ_k > δ²/(4p), and then

$$\log \lambda_k - \lambda_k < \max\left\{ \frac{\delta}{2\sqrt{p}} - e^{\delta/(2\sqrt{p})},\; -\frac{\delta}{2\sqrt{p}} - e^{-\delta/(2\sqrt{p})} \right\} := m_\delta,$$

due to the fact that the function g(λ) = log λ − λ is concave with unique maximum at λ = 1 and g(1) = −1.

Meanwhile,

$$\frac{1}{n}\left(L_n(A) - L_n(A_0)\right)(\omega) = -\log|\det(A)| - \tfrac{1}{2}\mathrm{tr}\{S_n(\omega)\Sigma^{-1}\} + \log|\det(A_0)| + \tfrac{1}{2}\mathrm{tr}\{S_n(\omega)\Sigma_0^{-1}\}$$
$$= \tfrac{1}{2}\log\det(S_n(\omega)\Sigma^{-1}) - \tfrac{1}{2}\mathrm{tr}\{S_n(\omega)\Sigma^{-1}\} - \tfrac{1}{2}\log\det(S_n(\omega)\Sigma_0^{-1}) + \tfrac{1}{2}\mathrm{tr}\{S_n(\omega)\Sigma_0^{-1}\}$$
$$= \tfrac{1}{2}\left[ \sum_{i=1}^{p}(\log\lambda_i - \lambda_i) - \sum_{i=1}^{p}(\log\lambda_i' - \lambda_i') \right] < \tfrac{1}{2}\left[ -(p-1) + m_\delta + p \right] = \tfrac{1}{2}(m_\delta + 1).$$

This implies

$$\sup_{A \notin B(A_0,\delta)} \frac{1}{n}\left(L_n(A) - L_n(A_0)\right)(\omega) \le \tfrac{1}{2}(m_\delta + 1) < 0.$$

Let ϵ = −(m_δ + 1)/2 and U_{δ,n} = {ω : sup_{A ∉ B(A_0,δ)} (1/n)(L_n(A) − L_n(A_0))(ω) ≤ −ϵ}. Then L_{δ,n} ⊆ U_{δ,n}. Notice that

$$1 = \lim_n P_{A_0}(L_{\delta,n}) = \liminf_n P_{A_0}(L_{\delta,n}) \le \liminf_n P_{A_0}(U_{\delta,n}) \le \limsup_n P_{A_0}(U_{\delta,n}) \le 1.$$

Therefore, lim_n P_{A_0}(U_{δ,n}) = 1.

Proposition A.2

Let L_n(·) be as above. Then for any δ > 0,

$$\inf_{A \notin B(A_0,\delta)}\; \min_{\substack{\mathbf{i} = (i_1, \ldots, i_p) \\ 1 \le i_1 < \cdots < i_p \le n}} \frac{-\log f(A, y_{\mathbf{i}})}{\left|L_n(A) - L_n(A_0)\right|} \xrightarrow{P_{A_0}} 0,$$

where f(A, y_{\mathbf{i}}) is the joint likelihood of the p observations y_{i_1}, …, y_{i_p}.

Proof.

Note that

$$\inf_{A \notin B(A_0,\delta)} \min_{\mathbf{i}} \frac{-\log f(A, y_{\mathbf{i}})}{\left|L_n(A) - L_n(A_0)\right|} \le \frac{\inf_{A \notin B(A_0,\delta)} \min_{\mathbf{i}} \left( -\log f(A, y_{\mathbf{i}}) \right)}{\inf_{A \notin B(A_0,\delta)} \left|L_n(A) - L_n(A_0)\right|}.$$

For any A ∉ B(A_0, δ), denote Σ = AA^T, Σ_0 = A_0 A_0^T, and let t > 0. We have

$$P_{A_0}\left( \min_{\mathbf{i}} -\log f(A, y_{\mathbf{i}}) \ge t\log n \right) \le P_{A_0}\left( \min_{i=1,\ldots,n} -\log f(A, Y_i) \ge \frac{t\log n}{p} \right)$$
$$= \left(1 - P_{A_0}\left(-\log f(A, Y_1) < \frac{t\log n}{p}\right)\right)^n \le \left(\frac{p\, E_{A_0}\left(-\log f(A, Y_1)\right)}{t\log n}\right)^n \quad \text{(Markov inequality)}$$
$$= \left(\frac{p\left(p\log(2\pi) + \log\det(\Sigma) + \mathrm{tr}\{\Sigma^{-1}\Sigma_0\}\right)}{2t\log n}\right)^n \to 0, \quad \text{as } n \to \infty.$$

Note that the numerator therefore goes to infinity at most as fast as t log n. Meanwhile, for a fixed n and any ω ∈ L_{δ,n} = {ω : d(S_n(ω), Σ_0) < δ/2},

$$\inf_{A \notin B(A_0,\delta)} \left|L_n(A) - L_n(A_0)\right| = -\sup_{A \notin B(A_0,\delta)} \left( L_n(A) - L_n(A_0) \right) \ge \epsilon n.$$

By Proposition A.1,

$$\lim_n P_{A_0}\left( \inf_{A \notin B(A_0,\delta)} \left|L_n(A) - L_n(A_0)\right| \ge \epsilon n \right) = 1,$$

i.e., the denominator goes to infinity at least as fast as ϵn.

Proof of Proposition 3.1.

Given an ordered index vector r = (r_1, …, r_l), let E_r = (e_{r_1}; …; e_{r_l}), where each e_{r_j} is a 1 × p vector with 1 in the r_j-th position and 0 everywhere else. Denote r̄ = {1, …, p} ∖ r.

Under the 2-norm,

$$J(y, A) = \prod_{i=1}^{p} \sqrt{\det(U_i^T U_i / n)} = \prod_{i=1}^{p} \sqrt{\det\left( E_{\bar{S}_i}\, A^{-1} S_n (A^{-1})^T E_{\bar{S}_i}^T \right)},$$

where S_i is the list of indexes of fixed zeros in the ith row of A.

By the Strong Law of Large Numbers for S_n and the continuity of J(y, A),

$$J(y, A) \to \prod_{i=1}^{p} \sqrt{\det\left( E_{\bar{S}_i}\, A^{-1} \Sigma_0 (A^{-1})^T E_{\bar{S}_i}^T \right)} := \pi_{\Sigma_0}(A) \quad \text{a.s.}$$

Note that both P_n = J(y, A)² and P_0 = π_{Σ_0}(A)² are polynomials in the entries of A^{-1}. If the domain of A is compact, the coefficients of P_n converge to the coefficients of P_0 uniformly. Furthermore, the derivative is bounded, hence P_n is equicontinuous. We conclude that J(y, A) → π_{Σ_0}(A) a.s., uniformly on compacts in A.

A.2. Proof of Theorem 3.1

Proof.

Proposition 3.1 implies sup_{A ∈ B(A_0, δ)} |J(y, A) − π_{Σ_0}(A)| → 0 a.s. P_{A_0}. Write

$$\pi(B, y) = \frac{J\left(y, \hat{A}_n + \tfrac{B}{\sqrt{n}}\right) f\left(y \mid \hat{A}_n + \tfrac{B}{\sqrt{n}}\right)}{\int_{\mathbb{R}^{p^2}} J\left(y, \hat{A}_n + \tfrac{C}{\sqrt{n}}\right) f\left(y \mid \hat{A}_n + \tfrac{C}{\sqrt{n}}\right) dC} = \frac{J\left(y, \hat{A}_n + \tfrac{B}{\sqrt{n}}\right) \exp\left\{ L_n\left(\hat{A}_n + \tfrac{B}{\sqrt{n}}\right) - L_n(\hat{A}_n) \right\}}{\int_{\mathbb{R}^{p^2}} J\left(y, \hat{A}_n + \tfrac{C}{\sqrt{n}}\right) \exp\left\{ L_n\left(\hat{A}_n + \tfrac{C}{\sqrt{n}}\right) - L_n(\hat{A}_n) \right\} dC}.$$

Notice that

$$H = -\frac{1}{n} \nabla^2_{AA} L_n(\hat{A}_n) \to I(A_0) \quad \text{a.s. } P_{A_0}.$$

It suffices to show that

$$\int_{\mathbb{R}^{p^2}} \left| J\left(y, \hat{A}_n + \tfrac{C}{\sqrt{n}}\right) \exp\left\{ L_n\left(\hat{A}_n + \tfrac{C}{\sqrt{n}}\right) - L_n(\hat{A}_n) \right\} - \pi_{\Sigma_0}(A_0) \exp\left\{ -\frac{C^T I(A_0) C}{2} \right\} \right| dC \xrightarrow{P_{A_0}} 0. \tag{A1}$$

Let C_x be the (i,j)th entry of C, where x = i + (j−1)p. By Taylor's Theorem,

$$L_n\left(\hat{A}_n + \tfrac{C}{\sqrt{n}}\right) = L_n(\hat{A}_n) + \sum_{x=1}^{p^2} \frac{C_x}{\sqrt{n}} \frac{\partial}{\partial A_x} L_n(\hat{A}_n) + \frac{1}{2} \sum_{x=1}^{p^2} \sum_{y=1}^{p^2} \frac{C_x C_y}{n} \frac{\partial^2}{\partial A_x \partial A_y} L_n(\hat{A}_n) + \frac{1}{6} \sum_{x=1}^{p^2} \sum_{y=1}^{p^2} \sum_{z=1}^{p^2} \frac{C_x C_y C_z}{n^{3/2}} \frac{\partial^3}{\partial A_x \partial A_y \partial A_z} L_n(A^*) = L_n(\hat{A}_n) - \frac{C^T H C}{2} + R_n$$

for some A^* ∈ [Â_n, Â_n + C/√n]. Notice that R_n = O_p(n^{−1/2} ‖C‖³). Given any 0 < δ < δ_0 and t > 0, the parameter space R^{p²} can be partitioned into three regions:

$$S_1 = \{C : \|C\| < \sqrt{t\log n}\}; \quad S_2 = \{C : \sqrt{t\log n} \le \|C\| < \delta\sqrt{n}\}; \quad S_3 = \{C : \|C\| \ge \delta\sqrt{n}\}.$$

On S_1 ∪ S_2,

$$\int_{S_1 \cup S_2} \left| J\left(y, \hat{A}_n + \tfrac{C}{\sqrt{n}}\right) \exp\left\{ L_n\left(\hat{A}_n + \tfrac{C}{\sqrt{n}}\right) - L_n(\hat{A}_n) \right\} - \pi_{\Sigma_0}(A_0)\, e^{-C^T I(A_0) C/2} \right| dC$$
$$\le \int_{S_1 \cup S_2} \left| J\left(y, \hat{A}_n + \tfrac{C}{\sqrt{n}}\right) - \pi_{\Sigma_0}\left(\hat{A}_n + \tfrac{C}{\sqrt{n}}\right) \right| \exp\left\{ L_n\left(\hat{A}_n + \tfrac{C}{\sqrt{n}}\right) - L_n(\hat{A}_n) \right\} dC + \int_{S_1 \cup S_2} \left| \pi_{\Sigma_0}\left(\hat{A}_n + \tfrac{C}{\sqrt{n}}\right) \exp\left\{ L_n\left(\hat{A}_n + \tfrac{C}{\sqrt{n}}\right) - L_n(\hat{A}_n) \right\} - \pi_{\Sigma_0}(A_0)\, e^{-C^T I(A_0) C/2} \right| dC.$$

Since π_{Σ_0}(·) is a proper prior on the region S_1 ∪ S_2, the second term goes to zero by the Bayesian Bernstein–von Mises Theorem (see the proof of Theorem 1.4.2 in Ghosh and Ramamoorthi (2003)).

Next we notice that the first term is bounded by

$$\sup_{C \in S_1 \cup S_2} \left| J\left(y, \hat{A}_n + \tfrac{C}{\sqrt{n}}\right) - \pi_{\Sigma_0}\left(\hat{A}_n + \tfrac{C}{\sqrt{n}}\right) \right| \times \int_{S_1 \cup S_2} \exp\left\{ L_n\left(\hat{A}_n + \tfrac{C}{\sqrt{n}}\right) - L_n(\hat{A}_n) \right\} dC.$$

Since √n(Â_n − A_0) →^D N(0, I(A_0)^{-1}), we have

$$P_{A_0}\left( \hat{A}_n + \tfrac{C}{\sqrt{n}} \in B(A_0, \delta_0) \text{ for all } C \in S_1 \cup S_2 \right) \to 1.$$

Furthermore, L_n(Â_n + C/√n) − L_n(Â_n) = −C^T H C/2 + R_n, so the integral is bounded in probability. Since sup_{C ∈ S_1 ∪ S_2} ‖C‖/√n ≤ δ and J_n → π_{Σ_0} uniformly, the supremum term goes to 0 in probability, and so does the first term.

Turning our attention to S_3, notice that

$$\int_{S_3} \left| J\left(y, \hat{A}_n + \tfrac{C}{\sqrt{n}}\right) \exp\left\{ L_n\left(\hat{A}_n + \tfrac{C}{\sqrt{n}}\right) - L_n(\hat{A}_n) \right\} - \pi_{\Sigma_0}(A_0)\, e^{-C^T I(A_0) C/2} \right| dC \le \int_{S_3} J\left(y, \hat{A}_n + \tfrac{C}{\sqrt{n}}\right) \exp\left\{ L_n\left(\hat{A}_n + \tfrac{C}{\sqrt{n}}\right) - L_n(\hat{A}_n) \right\} dC + \int_{S_3} \pi_{\Sigma_0}(A_0)\, e^{-C^T I(A_0) C/2}\, dC.$$

The last integral goes to zero in P_{A_0} because min_{S_3} ‖C‖ → ∞.

For each y, let

$$\mathbf{i}^* = \arg\min_{\tilde{\mathbf{i}}} \frac{J\left(y, \hat{A}_n + \tfrac{C}{\sqrt{n}}\right)}{J\left(y_{\tilde{\mathbf{i}}}, \hat{A}_n + \tfrac{C}{\sqrt{n}}\right)} = \arg\min_{\tilde{\mathbf{i}}} h(y, C, \tilde{\mathbf{i}}).$$

Then

$$\int_{S_3} J\left(y, \hat{A}_n + \tfrac{C}{\sqrt{n}}\right) \exp\left\{ L_n\left(\hat{A}_n + \tfrac{C}{\sqrt{n}}\right) - L_n(\hat{A}_n) \right\} dC \le \int_{S_3} h(y, C, \mathbf{i}^*)\, J\left(y_{\mathbf{i}^*}, \hat{A}_n + \tfrac{C}{\sqrt{n}}\right) f\left(y_{\mathbf{i}^*} \mid \hat{A}_n + \tfrac{C}{\sqrt{n}}\right) \times \exp\left\{ L_n\left(\hat{A}_n + \tfrac{C}{\sqrt{n}}\right) - L_n(\hat{A}_n) - \log f\left(y_{\mathbf{i}^*} \mid \hat{A}_n + \tfrac{C}{\sqrt{n}}\right) \right\} dC.$$

Note that as n goes to infinity, the first two product terms, h(·) and J(·)f(·), are both bounded; the exponent goes to −∞ by Proposition A.2, so the integral goes to zero in probability.

Having shown Equation (A1), we now follow Ghosh and Ramamoorthi (2003) and let

$$D_n = \int_{\mathbb{R}^{p^2}} J\left(y, \hat{A}_n + \tfrac{C}{\sqrt{n}}\right) \exp\left\{ L_n\left(\hat{A}_n + \tfrac{C}{\sqrt{n}}\right) - L_n(\hat{A}_n) \right\} dC.$$

Then the main result to be proven, Equation (18), becomes

$$\int_{\mathbb{R}^{p^2}} \left| D_n^{-1} J\left(y, \hat{A}_n + \tfrac{B}{\sqrt{n}}\right) \exp\left\{ L_n\left(\hat{A}_n + \tfrac{B}{\sqrt{n}}\right) - L_n(\hat{A}_n) \right\} - \sqrt{\frac{\det I(A_0)}{(2\pi)^{p^2}}} \exp\left\{ -\frac{B^T I(A_0) B}{2} \right\} \right| dB \xrightarrow{P_{A_0}} 0. \tag{A2}$$

Because

$$\int_{\mathbb{R}^{p^2}} J(y, \hat{A}_n) \exp\left\{ -\frac{B^T I(A_0) B}{2} \right\} dB = J(y, \hat{A}_n) \sqrt{\frac{(2\pi)^{p^2}}{\det I(A_0)}} \xrightarrow{\text{a.s.}} \pi_{\Sigma_0}(A_0) \sqrt{\frac{(2\pi)^{p^2}}{\det I(A_0)}},$$

and (A1) implies that D_n →^P π_{Σ_0}(A_0) √((2π)^{p²}/det I(A_0)), it is sufficient to show that the integral in Equation (A2) goes to 0 in probability. This integral is less than I_1 + I_2, where

$$I_1 = D_n^{-1} \int_{\mathbb{R}^{p^2}} \left| J\left(y, \hat{A}_n + \tfrac{B}{\sqrt{n}}\right) \exp\left\{ L_n\left(\hat{A}_n + \tfrac{B}{\sqrt{n}}\right) - L_n(\hat{A}_n) \right\} - \pi_{\Sigma_0}(A_0) \exp\left\{ -\frac{B^T I(A_0) B}{2} \right\} \right| dB$$

and

$$I_2 = \int_{\mathbb{R}^{p^2}} \left| D_n^{-1} \pi_{\Sigma_0}(A_0) - \sqrt{\frac{\det I(A_0)}{(2\pi)^{p^2}}} \right| \exp\left\{ -\frac{B^T I(A_0) B}{2} \right\} dB.$$

Equation (A1) shows that I_1 → 0 in probability. Since J(y, Â_n) →^P π_{Σ_0}(A_0) and D_n →^P π_{Σ_0}(A_0) √((2π)^{p²}/det I(A_0)), we have

$$I_2 = \left| D_n^{-1} \pi_{\Sigma_0}(A_0) - \sqrt{\frac{\det I(A_0)}{(2\pi)^{p^2}}} \right| \sqrt{\frac{(2\pi)^{p^2}}{\det I(A_0)}} \xrightarrow{P} 0.$$

A.3. Derivation of the normalising constant (22)

Using the substitution Z = A^{-1}(nS_n)^{1/2}, with Jacobian dA = |det Z|^{-2p} |det(nS_n)|^{p/2} dZ, we have

$$\int J(y, A)\, f(y \mid A)\, dA = \int \frac{\det(S_n)^{p/2}}{(2\pi)^{np/2}\, |\det A|^{n+p}} \exp\left[ -\tfrac{1}{2}\, \mathrm{tr}\left\{ \left(A^{-1}(nS_n)^{1/2}\right) \left(A^{-1}(nS_n)^{1/2}\right)^T \right\} \right] dA$$
$$= \frac{\det(S_n)^{p/2}}{(2\pi)^{np/2}\, |\det(nS_n)|^{n/2}} \int |\det Z|^{n-p}\, e^{-\frac{1}{2}\mathrm{tr}\{ZZ^T\}}\, dZ$$
$$= (2\pi)^{(p^2 - np)/2} \det(S_n)^{p/2}\, |\det(nS_n)|^{-n/2}\, E|\det Z|^{n-p}$$
$$= \pi^{(p^2 - np)/2} \det(S_n)^{p/2}\, \frac{\Gamma_p(n/2)}{|\det(nS_n)|^{n/2}\, \Gamma_p(p/2)}.$$

The last equality follows from the fact that for a p × p matrix Z of independent standard normal variables we have

$$E|\det Z|^{n} = 2^{np/2}\, \frac{\Gamma_p\left(\frac{n+p}{2}\right)}{\Gamma_p\left(\frac{p}{2}\right)}.$$
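The moment identity for E|det Z|^n is easy to sanity-check by Monte Carlo; a quick sketch (the sample size and the choice p = 3, n = 4 are arbitrary):

```python
import numpy as np
from scipy.special import multigammaln

# Check E|det Z|^n = 2^{np/2} * Gamma_p((n+p)/2) / Gamma_p(p/2).
rng = np.random.default_rng(0)
p, n = 3, 4
Z = rng.normal(size=(200_000, p, p))
mc = np.mean(np.abs(np.linalg.det(Z)) ** n)
exact = 2 ** (n * p / 2) * np.exp(multigammaln((n + p) / 2, p) - multigammaln(p / 2, p))
print(mc, exact)   # the two values should roughly agree
```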

A.4. Lemmas for the clique model

Lemma A.1

Under the 2-norm, for any clique model M with k cliques of sizes g_i, i = 1, …, k, we have

$$C_{M,i}(y) = |\det(S_n^{M,i})|^{g_i/2} \to |\det(\Sigma_0^{M,i})|^{g_i/2} \quad \text{a.s.},$$

where S_n^{M,i} is the sample covariance computed using only the observations within clique i under the model M, and Σ_0^{M,i} denotes the ith block component of Σ_0^M.

Proof.

The Strong Law of Large Numbers implies S_n^{M,i} → Σ_0^{M,i} a.s. for each i = 1, …, k, and the results follow by continuity.

Lemma A.1 provides the limits of the constants C_{M,i}(y) as the sample size increases. The next lemma shows how the ratio ∏_{i=1}^k Γ_{g_i}(n/2) / ∏_{j=1}^l Γ_{h_j}(n/2) behaves as the sample size increases.

Lemma A.2

Let g_i, i = 1, …, k, and h_j, j = 1, …, l, be integers such that ∑_{i=1}^k g_i = ∑_{j=1}^l h_j. Then as n → ∞,

$$\frac{\prod_{i=1}^{k} \Gamma_{g_i}(n/2)}{\prod_{j=1}^{l} \Gamma_{h_j}(n/2)} \sim \left(\frac{\pi}{n}\right)^{\left(\sum_{i=1}^{k} g_i^2 - \sum_{j=1}^{l} h_j^2\right)/4}.$$

Proof.

It is well known (Abramowitz & Stegun, 1964) that

$$\frac{\Gamma(x + y)}{\Gamma(x)} \sim x^y, \quad \text{as } x \to \infty \text{ with } y \text{ fixed}. \tag{A3}$$

Recall

$$\frac{\prod_{i=1}^{k} \Gamma_{g_i}(n/2)}{\prod_{j=1}^{l} \Gamma_{h_j}(n/2)} = \frac{\pi^{\sum_{i=1}^{k}(g_i^2 - g_i)/4}\, \prod_{i=1}^{k} \prod_{s=1}^{g_i} \Gamma\left(\frac{n+1-s}{2}\right)}{\pi^{\sum_{j=1}^{l}(h_j^2 - h_j)/4}\, \prod_{j=1}^{l} \prod_{t=1}^{h_j} \Gamma\left(\frac{n+1-t}{2}\right)}.$$

Since both the numerator and the denominator include a product of p gamma functions, the result of the lemma follows directly from Equation (A3). Note that Equation (A3) is sufficient when p is fixed. More precise bounds, available in Jameson (2013), could be used when p grows with n.

Lemma A.3

Let M be a clique model.

  1. If det(Σ_0) < det(Σ_0^M), then there is a > 0 such that

$$\left( \frac{\det(S_n^{M_0})}{\det(S_n^M)} \right)^{n/2} \le e^{-an} \quad \text{eventually a.s.}$$

  2. If M ⪰ M_0 is compatible with Σ_0, then as n → ∞,

$$\left( \frac{\det(S_n^{M_0})}{\det(S_n^M)} \right)^{n/2} = O_P(1).$$

Proof.

If det(Σ_0) < det(Σ_0^M), set a = (log det Σ_0^M − log det Σ_0)/4. By the Strong Law of Large Numbers,

$$S_n^{M_0} \to \Sigma_0, \quad S_n^M \to \Sigma_0^M, \quad \text{a.s.}$$

Thus eventually a.s. det S_n^{M_0} / det S_n^M < e^{-2a}, and the statement of the lemma follows.

If M ⪰ M_0 is compatible with Σ_0, by the Central Limit Theorem,

$$\sqrt{n}\left( S_n^M - S_n^{M_0} \right) \xrightarrow{D} R.$$

By Slutsky's theorem, the spectral radius and the minimum eigenvalue of (S_n^{M_0})^{-1}(S_n^M − S_n^{M_0}) satisfy ρ = O_P(n^{-1/2}) and λ = o_P(1), respectively. Consequently, by (23),

$$\left( \frac{\det S_n^{M_0}}{\det S_n^M} \right)^{n/2} \le e^{\frac{np\rho^2}{2(1+\lambda)}} = O_P(1).$$

A.5. Proof of Theorem 4.1

Theorem 4.1

For any clique model M that is not compatible with Σ_0, assume det(Σ_0) < det(Σ_0^M) and that the penalty satisfies e^{-an} q_M(n)/q_{M_0}(n) → 0 for all a > 0 as n → ∞.

For any clique model M compatible with Σ0 assume that qM(n)/qM0(n) is bounded.

Then, as n → ∞ with p held fixed, r_p(M_0 | Y) → 1 in probability.

Proof.

Because for any fixed p there are finitely many clique models, we only need to prove that, for any M ≠ M_0,

$$\frac{r_p(M \mid Y)}{r_p(M_0 \mid Y)} \xrightarrow{P} 0.$$

Denote by gi,i=1,,k, the size of cliques in M and hj,j=1,,l, the size of cliques in M0.

By Lemmas A.1 and A.2 we have, as n → ∞,

$$\frac{r_p(M \mid Y)}{r_p(M_0 \mid Y)} \approx K\, n^{-\left(\sum_{i=1}^{k} g_i^2 - \sum_{j=1}^{l} h_j^2\right)/4}\, \frac{q_M(n)}{q_{M_0}(n)} \left( \frac{\det S_n^{M_0}}{\det S_n^M} \right)^{n/2},$$

where K is a constant independent of n.

If M is not compatible with Σ_0, then by the assumption on the penalty and Lemma A.3(i) we have r_p(M|Y)/r_p(M_0|Y) → 0 a.s.

If M ⪰ M_0 (M ≠ M_0) is compatible with Σ_0, notice that M is obtained by pooling together some cliques of M_0. Therefore ∑_{i=1}^k g_i² − ∑_{j=1}^l h_j² > 0. Consequently, r_p(M|Y)/r_p(M_0|Y) → 0 in probability by the assumption on the penalty and Lemma A.3(ii).
