A class of admissible estimators of multiple regression coefficient with an unknown variance

Chengyuan Song & Dongchu Sun
Pages 190-201 | Received 10 Jun 2019, Accepted 21 Jul 2019, Published online: 20 Aug 2019

Abstract

Suppose that we observe $y \mid \theta, \tau \sim N_p(X\theta, \tau^{-1}I_p)$, where $\theta$ is an unknown vector with unknown precision $\tau$. Estimating the regression coefficient $\theta$ with known $\tau$ has been well studied. However, statistical properties such as admissibility in estimating $\theta$ with unknown $\tau$ are not well studied. Han [(2009). Topics in shrinkage estimation and in causal inference (PhD thesis). Wharton School, University of Pennsylvania] appears to be the first to consider the problem, developing sufficient conditions for the admissibility of estimators of means of multivariate normal distributions with unknown variance. We generalise these sufficient conditions for admissibility and apply the results to the normal linear regression model. 2-level and 3-level hierarchical models with unknown precision $\tau$ are investigated, determining when a standard class of hierarchical priors leads to admissible estimators of $\theta$ under the normalised squared error loss. One reason to consider this problem is the importance of admissibility in hierarchical prior selection, and we expect that our study could be helpful in providing some reference for choosing hierarchical priors.

1. Introduction

Consider a multivariate normal model,
\[ y \mid \theta, \tau \sim N_p(\theta, \tau^{-1}I_p), \quad \text{independently,} \quad w \mid \tau \sim \chi^2_m/\tau, \tag{1} \]
where $y$ is a $p \times 1$ observation vector, $\theta$ is a $p$-dimensional vector of unknown parameters, and $\tau > 0$ is the unknown precision. The study of statistical properties such as admissibility in estimating $\theta$ dates back to James and Stein (1961) when the error variance is known, while the admissibility of generalisations of the James-Stein estimator of $\theta$ with unknown parameter $\tau$ was studied in Judge, Yancey, and Bock (1983), Fraisse, Raoult, Robert, and Roy (1990), Robert (2007) and elsewhere. For estimating $\theta$ with the unknown nuisance parameter $\tau$ in the model (1), some authors, such as Strawderman (1973), Maruyama and Strawderman (2005) and Willing and Zhou (2008), studied the minimaxity of Bayesian estimators of $\theta$ under hierarchical priors. The admissibility of a generalised Bayesian estimator of $\theta$ under a class of noninformative priors was recently studied in Han (2009). With the additional independent observation $w \mid \tau \sim \tau^{-1}\chi^2_m$, Han (2009) found a set of sufficient conditions on the joint priors of $(\theta, \tau)$ so that the generalised Bayesian estimator of $\theta$ is admissible under the squared error loss. In practice, we often need to consider a normal linear regression model,
\[ y \mid \theta, \tau \sim N_n(X\theta, \tau^{-1}I_n), \tag{2} \]
where $X$ is an $n \times p$ design matrix with full column rank $p$, $n > p$. It is of great interest to study admissibility in estimating the unknown regression coefficients $\theta$ with unknown $\tau$ in the normal linear regression model (2).

Several authors have described admissibility as a powerful tool for selecting satisfactory hierarchical generalised Bayesian priors. For example, Berger, Strawderman, and Tang (2005) pointed out that the 'use of objective improper priors in hierarchical modelling is of enormous practical importance, yet little is known about which such priors are good or bad. It is important that the prior distribution not be too diffuse, and study of admissibility is the most powerful tool known for detecting an over-diffuse prior'. For known precision or error variance, Brown (1971) provided a necessary and sufficient condition for admissibility of generalised Bayes estimators under quadratic loss, based on a Markovian representation of the estimation problem. Recent theoretical studies of the admissibility of estimators of $\theta$ include Berger and Strawderman (1996), Berger et al. (2005), and Berger, Sun, and Song (2018).

However, most of the literature has focussed on models whose variances are given, yet in practical problems the precision or variance is often unknown. For admissibility in the model (2), to the best of our knowledge, very few results have been obtained because of the technical difficulty. The fundamental tool for proving admissibility with unknown precision is Blyth's method (Blyth, 1951), which provides a sufficient condition relating the admissibility of an estimator to the existence of a sequence of prior distributions approximating this estimator. Based on Blyth's results, Han (2009) found sufficient conditions on the joint priors of $(\theta, \tau)$ for the model (1). Sometimes those sufficient conditions are strict and difficult to satisfy. We will generalise the sufficient conditions for admissibility and apply these results to the normal linear regression model (2). Using the generalised conditions, 2-level and 3-level hierarchical models with unknown precision $\tau$ are investigated, determining when a standard class of hierarchical priors leads to admissible estimators of $\theta$ under the normalised squared error loss. One motivation for considering this problem is the importance of admissibility in hierarchical prior selection, and we expect that our study could be helpful in providing some reference for choosing hierarchical priors.

The paper is organised as follows. In Section 2, we introduce the sufficient conditions for admissibility of the generalised Bayesian estimators of $\theta$ for the model (1), which were developed by Han (2009). In Section 3, we generalise these sufficient conditions and apply the results to the normal linear regression model (2). A 2-level and a 3-level hierarchical model with unknown precision $\tau$ are investigated in Sections 4 and 5, determining when a standard class of hierarchical priors leads to admissible estimators of $\theta$ under the normalised squared error loss. Finally, some comments are made in Section 6.

2. Han's (2009) results for (1)

Recall the model (1) considered in Han (2009), i.e. $(y \mid \theta, \tau) \sim N_p(\theta, \tau^{-1}I_p)$, $(w \mid \tau) \sim \tau^{-1}\chi^2_m$, where $y = (y_1, \ldots, y_p)'$ and $w$ are independent of each other. Let $\hat\theta \equiv \hat\theta(y, w)$ denote an estimator of $\theta = (\theta_1, \ldots, \theta_p)'$. Correspondingly, the squared error loss function of $\hat\theta$ becomes
\[ L(\theta, \tau; \hat\theta) = \tau(\hat\theta - \theta)'(\hat\theta - \theta). \tag{3} \]
Han (2009) studied a class of prior densities for $(\theta, \tau)$ of the form
\[ \pi(\theta, \tau) = \pi_0(\theta \mid \tau)\,\pi_1(\tau). \tag{4} \]
Consequently, the generalised Bayes estimator of the normal mean $\theta$ is the posterior mean of $\theta$, given by
\[ \hat\theta_B(y, w) = \frac{\int_{\mathbb{R}^p}\int_0^\infty \tau\,\theta\, f_1(y \mid \theta, \tau)\, f_2(w \mid \tau)\,\pi_0(\theta \mid \tau)\pi_1(\tau)\,d\tau\,d\theta}{\int_{\mathbb{R}^p}\int_0^\infty \tau\, f_1(y \mid \theta, \tau)\, f_2(w \mid \tau)\,\pi_0(\theta \mid \tau)\pi_1(\tau)\,d\tau\,d\theta}, \tag{5} \]
where
\[ f_1(y \mid \theta, \tau) \propto \tau^{p/2}\exp\Big\{-\frac{\tau}{2}\|y - \theta\|^2\Big\}, \qquad f_2(w \mid \tau) \propto w^{(m-2)/2}\tau^{m/2}\exp\Big\{-\frac{\tau w}{2}\Big\}. \]
Let $m(y, w, \tau)$ be the marginal likelihood function of $(y, w, \tau)$, with the form
\[ m(y, w, \tau) = \int_{\mathbb{R}^p} f_1(y \mid \theta, \tau)\, f_2(w \mid \tau)\,\pi_0(\theta \mid \tau)\pi_1(\tau)\,d\theta. \]
From Brown (1971), the generalised Bayes estimator in (5) can be expressed as
\[ \hat\theta_B(y, w) = y + \frac{\int_0^\infty \nabla_y\, m(y, w, \tau)\,d\tau}{\int_0^\infty \tau\, m(y, w, \tau)\,d\tau}, \tag{6} \]
where $\nabla_y$ denotes the gradient with respect to $y$. Let $S$ denote the ball of radius 1 at the origin in $\mathbb{R}^p$, let $S^c$ be its complement, and define $a \vee b = \max(a, b)$. For the hierarchical Bayes model (1), Han (2009) established the admissibility of the generalised Bayes estimator $\hat\theta_B(y, w)$ under the following sufficient conditions.

  • Condition 1. $\displaystyle\int_{S^c}\int_0^\infty \frac{1}{\tau}\,\frac{\pi_0(\theta \mid \tau)}{\|\theta\|^2\log^2(\|\theta\| \vee 2)}\,\pi_1(\tau)\,d\tau\,d\theta < \infty$;

  • Condition 2. $\displaystyle\int_{S^c}\int_0^\infty \pi_0(\theta \mid \tau)\pi_1(\tau)\,d\tau\,d\theta < \infty$;

  • Condition 3. $\displaystyle\int_{S^c}\int_0^\infty \tau\|\theta\|^2\,\pi_0(\theta \mid \tau)\pi_1(\tau)\,d\tau\,d\theta < \infty$;

  • Condition 4. $\displaystyle\int_{S^c}\int_0^\infty \frac{1}{\tau}\,\frac{\|\nabla_\theta \pi_0(\theta \mid \tau)\|^2}{\pi_0(\theta \mid \tau)}\,\pi_1(\tau)\,d\tau\,d\theta < \infty$;

  • Condition 5. For any positive constant $B$, $\displaystyle\int_{\|\theta\|^2 < B}\int_{\tau < B} \pi_0(\theta \mid \tau)\pi_1(\tau)\,d\tau\,d\theta < \infty$;

  • Condition 6. Define two sequences of functions
\[ h_j(\theta) = \begin{cases} 1, & \|\theta\| < 1; \\ 1 - \dfrac{\log\|\theta\|}{\log j}, & 1 \le \|\theta\| \le j; \\ 0, & \|\theta\| > j, \end{cases} \qquad l_j(\tau) = \begin{cases} 1, & \tau < 1; \\ 1 - \dfrac{\log\tau}{\log j}, & 1 \le \tau \le j; \\ 0, & \tau > j. \end{cases} \tag{7} \]
Write $H_j(\theta \mid \tau) = h_j(\theta)\pi_0(\theta \mid \tau)$ and $L_j(\tau) = l_j(\tau)\pi_1(\tau)$. There is a constant $C > 0$ such that
\[ \frac{\int_S\int_0^\infty \tau f_1(y \mid \theta, \tau)\, f_2(w \mid \tau)\, H_j(\theta \mid \tau) L_j(\tau)\,d\tau\,d\theta}{\int_{S^c}\int_0^\infty \tau f_1(y \mid \theta, \tau)\, f_2(w \mid \tau)\, H_j(\theta \mid \tau) L_j(\tau)\,d\tau\,d\theta} < C, \quad \forall\, y, w. \]

Theorem 2.1

(Han, 2009)

Consider the model (1) with prior densities $\pi_0(\theta \mid \tau)$ and $\pi_1(\tau)$ satisfying Conditions 1–6. If $\pi_0(\theta \mid \tau)$ is decreasing with respect to $\|\theta\|$, the corresponding generalised Bayes estimator (6) of $\theta$ is admissible under the squared error loss function (3).
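The representation (6) can be checked numerically against the direct formula (5). The sketch below does this for $p = 1$ with a Cauchy-type conditional prior; the prior, $m = 3$, $k = 1/2$ and the truncated integration grids are illustrative assumptions, not quantities from the paper.

```python
import numpy as np

# Numerical check of Brown's identity (6) against the posterior-mean
# formula (5) for p = 1. The Cauchy-type prior, m = 3, k = 0.5 and the
# (truncated) integration grids are illustrative assumptions only.
theta = np.linspace(-30.0, 30.0, 2001)
tau = np.linspace(1e-3, 20.0, 1001)
d_th, d_t = theta[1] - theta[0], tau[1] - tau[0]
m_df, k = 3, 0.5

def f1(y, th, t):                       # N(theta, 1/tau) kernel
    return t**0.5 * np.exp(-0.5 * t * (y - th)**2)

def f2(w, t):                           # kernel of w | tau ~ chi2_m / tau
    return w**((m_df - 2) / 2) * t**(m_df / 2) * np.exp(-t * w / 2)

def pi0(th, t):                         # Cauchy-type conditional prior
    return t**0.5 / (1.0 + t * th**2)

def marginal(y, w):                     # m(y, w, tau) on the tau grid
    TH, T = np.meshgrid(theta, tau, indexing="ij")
    return (f1(y, TH, T) * pi0(TH, T)).sum(axis=0) * d_th * f2(w, tau) * tau**(-k)

def theta_hat_direct(y, w):             # formula (5)
    TH, T = np.meshgrid(theta, tau, indexing="ij")
    joint = T * f1(y, TH, T) * f2(w, T) * pi0(TH, T) * T**(-k)
    return (TH * joint).sum() / joint.sum()

def theta_hat_brown(y, w, eps=1e-4):    # formula (6), gradient by central difference
    dm = (marginal(y + eps, w) - marginal(y - eps, w)) / (2 * eps)
    return y + dm.sum() / (tau * marginal(y, w)).sum()

print(theta_hat_direct(2.0, 1.5))       # the two values agree to grid accuracy
print(theta_hat_brown(2.0, 1.5))
```

The agreement follows because $\nabla_y m = \int \tau(\theta - y) f_1 f_2 \pi_0 \pi_1\,d\theta$, so (6) is an exact rearrangement of (5).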

3. Main results for (2)

We are primarily interested in the normal linear regression model (2). For the model (2), let $\tilde y = (X'X)^{-1}X'y$ denote the least squares estimator of $\theta$, and let $w = y'(I_n - X(X'X)^{-1}X')y$ be the usual residual sum of squares (SSE). Then
\[ \tilde y \mid \theta, \tau \sim N_p(\theta, \tau^{-1}(X'X)^{-1}), \quad \text{and} \quad w \mid \tau \sim \tau^{-1}\chi^2_{n-p}, \tag{8} \]
independently. Here we obtain $w$ automatically, with $m = n - p$. For the model (2), consider the normalised squared error loss function of $\hat\theta$ given by
\[ L(\theta, \tau; \hat\theta) = \tau(\hat\theta - \theta)'X'X(\hat\theta - \theta). \tag{9} \]
The corresponding risk function of $\hat\theta$ is
\[ R(\theta, \tau; \hat\theta) = E_{\theta,\tau}\, L(\theta, \tau; \hat\theta), \quad \theta \in \mathbb{R}^p. \tag{10} \]
An estimator $\hat\theta_1$ is inadmissible if there exists another estimator whose risk function is nowhere bigger and somewhere smaller. If no such better estimator exists, $\hat\theta_1$ is admissible.

For the model (2), to obtain an admissible estimator of $\theta$ under the normalised squared error loss (9), we define
\[ \delta_B(y) = (T')^{-1}\hat\theta_B(T'\tilde y, w), \tag{11} \]
where $T$ is a $p \times p$ matrix such that $T'(X'X)^{-1}T = I_p$ (equivalently, $TT' = X'X$).
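As a concrete illustration of the reduction (11), the sketch below builds $T$ from a Cholesky factorisation (any $T$ with $TT' = X'X$ works), maps the data to the canonical model (8), applies an estimator for model (1), and maps back. The James-Stein-type shrinkage rule and the simulated data are illustrative placeholders, not the paper's $\hat\theta_B$.

```python
import numpy as np

# Sketch of the reduction (11): T T' = X'X gives T'(X'X)^{-1} T = I_p,
# so z = T' y_tilde follows the canonical model (8) with mean T' theta.
# The shrinkage rule below is a placeholder, not the paper's theta_hat_B.
rng = np.random.default_rng(0)
n, p = 50, 4
X = rng.standard_normal((n, p))
theta_true, tau_true = np.arange(1.0, p + 1.0), 2.0
y = X @ theta_true + rng.standard_normal(n) / np.sqrt(tau_true)

XtX = X.T @ X
y_tilde = np.linalg.solve(XtX, X.T @ y)      # least squares estimator
resid = y - X @ y_tilde
w = resid @ resid                            # SSE: w | tau ~ chi2_{n-p} / tau

T = np.linalg.cholesky(XtX)                  # lower triangular, T T' = X'X
z = T.T @ y_tilde                            # z | theta, tau ~ N_p(T'theta, I_p / tau)

def theta_hat_canonical(z, w, m):
    # Placeholder James-Stein-type rule for model (1); the generalised
    # Bayes estimator (6) would be plugged in here instead.
    shrink = max(0.0, 1.0 - (p - 2) / (m + 2) * w / (z @ z))
    return shrink * z

delta_B = np.linalg.solve(T.T, theta_hat_canonical(z, w, n - p))  # (T')^{-1} * ...
print(delta_B, theta_true)
```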

Lemma 3.1

For the model (1), assume the estimator $\hat\theta_B(y, w)$ in (6) is admissible under the loss function (3). Then the estimator $\delta_B(y)$ in (11) is admissible under the normalised squared error loss (9) for the model (2).

Proof.

Note that the model (2) is equivalent to (8). It yields that
\[ (T'\tilde y \mid \theta, \tau) \sim N_p(T'\theta, \tau^{-1}I_p). \tag{12} \]
It follows from the admissibility of $\hat\theta_B(y, w)$ under the model (1) that the estimator $\hat\theta_B(T'\tilde y, w)$ of $T'\theta$ is admissible under the loss function
\begin{align*} L(T'\theta, \tau; \hat\theta_B(T'\tilde y, w)) &= \tau\big(\hat\theta_B(T'\tilde y, w) - T'\theta\big)'\big(\hat\theta_B(T'\tilde y, w) - T'\theta\big) \\ &= \tau\big((T')^{-1}\hat\theta_B(T'\tilde y, w) - \theta\big)'X'X\big((T')^{-1}\hat\theta_B(T'\tilde y, w) - \theta\big) \\ &= \tau(\delta_B - \theta)'X'X(\delta_B - \theta). \end{align*}
The proof of this lemma is completed.

Combining Theorem 2.1 with Lemma 3.1, we reach the following theorem.

Theorem 3.2

For the model (2) with prior densities $\pi_0(\theta \mid \tau)$ and $\pi_1(\tau)$ satisfying Conditions 1–6, suppose that $\pi_0(\theta \mid \tau)$ is decreasing with respect to $\|\theta\|$. Then $\delta_B(y)$ defined in (11) is an admissible estimator of $\theta$ under the normalised squared error loss (9).

Theorem 2.1 applies to the case where $\pi_0(\theta \mid \tau)$ is spherically symmetric in $\theta$ and decreasing in $\|\theta\|$. As discussed in Han (2009), this requirement is not unique and can be replaced by the following condition.

Condition 7. Denote
\[ u_1 = \int_0^\infty\int_S f_1(y \mid \theta, \tau)\, f_2(w \mid \tau)\,\nabla_\theta \pi_0(\theta \mid \tau)\,\pi_1(\tau)\,d\theta\,d\tau, \tag{13} \]
\[ z_1 = \int_0^\infty\int_S \tau f_1(y \mid \theta, \tau)\, f_2(w \mid \tau)\,\pi_0(\theta \mid \tau)\pi_1(\tau)\,d\theta\,d\tau, \tag{14} \]
\[ u_{2j} = \int_0^\infty\int_S f_1(y \mid \theta, \tau)\, f_2(w \mid \tau)\,\nabla_\theta H_j(\theta \mid \tau)\, L_j(\tau)\,d\theta\,d\tau, \tag{15} \]
\[ z_{2j} = \int_0^\infty\int_S \tau f_1(y \mid \theta, \tau)\, f_2(w \mid \tau)\, H_j(\theta \mid \tau) L_j(\tau)\,d\theta\,d\tau. \tag{16} \]
We have
\[ \frac{\|u_1\|}{z_1} \le \|y\|, \tag{17} \]
\[ \frac{\|u_{2j}\|}{z_{2j}} \le \|y\|, \quad \text{for any } j = 1, 2, \ldots. \tag{18} \]
As an immediate corollary, we have the following result.

Theorem 3.3

For the model (2), assume that the prior densities $\pi_0(\theta \mid \tau)$ and $\pi_1(\tau)$ satisfy Conditions 1–7. Then the estimator $\delta_B(y)$ in (11) of $\theta$ is admissible under the normalised squared error loss (9).

It might be difficult to show that $\pi_0(\theta \mid \tau)$ is a decreasing function of $\|\theta\|$. Interestingly, this requirement can be relaxed to the requirement that $\pi_0(\theta \mid \tau)$ is a decreasing function of each component $\theta_i^2$, for $i = 1, \ldots, p$.

Lemma 3.4

For the model (1) with given $\tau$, if $\pi_0(\theta \mid \tau)$ is a decreasing function of $\theta_i^2$ for $i = 1, \ldots, p$, then Condition 7 holds.

Proof.

For any given $y$, there is an orthogonal matrix $Q$ such that $Qy = (\|y\|, 0, \ldots, 0)'$. Without loss of generality, we can transform the coordinate system of $\theta$ so that $y = (\|y\|, 0, \ldots, 0)'$. Then
\[ f_1(y \mid \theta, \tau) = (2\pi)^{-p/2}\tau^{p/2}\exp\Big\{-\tau\,\frac{\|y\|^2 + \|\theta\|^2}{2}\Big\}\exp\{\tau\theta_1\|y\|\}. \]
It is easy to verify that the $i$th coordinate of $u_1$ is
\[ v_{1i} = \int_0^\infty\int_S f_1(y \mid \theta, \tau)\, f_2(w \mid \tau)\,\frac{\partial \pi_0(\theta \mid \tau)}{\partial \theta_i}\,\pi_1(\tau)\,d\theta\,d\tau. \tag{19} \]
Since $\pi_0(\theta \mid \tau)$ is a function of $(\theta_1^2, \ldots, \theta_p^2)$, the integrand $f_1(y \mid \theta, \tau) f_2(w \mid \tau)(\partial \pi_0(\theta \mid \tau)/\partial \theta_i)\pi_1(\tau)$ is an odd function of $\theta_i$ for $i = 2, \ldots, p$. It yields $v_{12} = \cdots = v_{1p} = 0$. Therefore, $\|u_1\| = |v_{11}|$ and $\|u_1\|/z_1 = |v_{11}|/z_1$.

Let $f(y, w, \theta, \tau)$ be the joint density of $(y, w, \theta, \tau)$, i.e. $f(y, w, \theta, \tau) = f_1(y \mid \theta, \tau) f_2(w \mid \tau)\pi_0(\theta \mid \tau)\pi_1(\tau)$. Using (19) and $\partial f_1/\partial\theta_1 = \tau(\|y\| - \theta_1)f_1$, we get
\[ \int_0^\infty\int_S \frac{\partial f(y, w, \theta, \tau)}{\partial \theta_1}\,d\theta\,d\tau = \int_0^\infty\int_S (\|y\| - \theta_1)\,\tau f(y, w, \theta, \tau)\,d\theta\,d\tau + v_{11}. \tag{20} \]
By the Divergence Theorem (Katz, 2005),
\[ \int_0^\infty\int_S \frac{\partial f(y, w, \theta, \tau)}{\partial \theta_1}\,d\theta\,d\tau = \int_0^\infty\oint_{\partial S} f(y, w, \theta, \tau)\,d\theta_2\cdots d\theta_p\,d\tau, \tag{21} \]
where $\partial S$ is the boundary of $S$. Combining (20) and (21), we get
\[ v_{11} = \int_0^\infty\oint_{\partial S} f(y, w, \theta, \tau)\,d\theta_2\cdots d\theta_p\,d\tau - \int_0^\infty\int_S (\|y\| - \theta_1)\,\tau f(y, w, \theta, \tau)\,d\theta\,d\tau. \]
Then we have
\[ -\frac{v_{11}}{z_1} = -\frac{\int_0^\infty\oint_{\partial S} f\,d\theta_2\cdots d\theta_p\,d\tau}{\int_0^\infty\int_S \tau f\,d\theta\,d\tau} + \|y\| - \frac{\int_0^\infty\int_S \theta_1\,\tau f\,d\theta\,d\tau}{\int_0^\infty\int_S \tau f\,d\theta\,d\tau}. \]
Clearly, the boundary term satisfies
\[ \frac{\int_0^\infty\oint_{\partial S} f\,d\theta_2\cdots d\theta_p\,d\tau}{\int_0^\infty\int_S \tau f\,d\theta\,d\tau} \ge 0. \]
Since $f(y, w, \theta, \tau)$ can be written as
\[ f(y, w, \theta, \tau) = (2\pi)^{-p/2}\tau^{p/2}\exp\Big\{-\frac{\|y\|^2 + \|\theta\|^2}{2}\,\tau\Big\}\exp(\tau\theta_1\|y\|)\, f_2(w \mid \tau)\,\pi_0(\theta \mid \tau)\pi_1(\tau), \]
and $\pi_0(\theta \mid \tau)$ is symmetric in $\theta_1$ while the factor $\exp(\tau\theta_1\|y\|)$ puts more weight on positive $\theta_1$, we have
\[ \frac{\int_0^\infty\int_S \theta_1\,\tau f\,d\theta\,d\tau}{\int_0^\infty\int_S \tau f\,d\theta\,d\tau} \ge 0. \]
Therefore, $-v_{11}/z_1 \le \|y\|$. Since $\pi_0(\theta \mid \tau)$ is an even function of $\theta_1$ and decreasing in $\theta_1^2$, we have $v_{11} \le 0$. Therefore, $\|u_1\|/z_1 = |v_{11}|/z_1 \le \|y\|$. With the same argument as above, $\|u_{2j}\|/z_{2j} \le \|y\|$ for any $j = 1, 2, \ldots$.
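A quick numerical sanity check of this inequality is possible for $p = 1$, where $S = [-1, 1]$. In the sketch below the Gaussian-shape prior, $m = 3$, $k = 1/2$ and the truncation of the $\tau$ range are all illustrative assumptions.

```python
import numpy as np
from scipy.integrate import dblquad

# Numerical illustration of Condition 7 / Lemma 3.4 for p = 1: with a prior
# pi0(theta | tau) decreasing in theta^2, the ratio |u1|/z1 from (13)-(14)
# stays below |y|. The prior, m and the tau truncation are illustrative.
m = 3
f1 = lambda th, t, y: t**0.5 * np.exp(-0.5 * t * (y - th)**2)
f2 = lambda w, t: w**((m - 2) / 2) * t**(m / 2) * np.exp(-t * w / 2)
pi0 = lambda th, t: np.exp(-t * th**2)          # decreasing in theta^2
dpi0 = lambda th, t: -2 * t * th * pi0(th, t)   # d pi0 / d theta
pi1 = lambda t: t**(-0.5)

y, w = 1.7, 0.8
# dblquad integrates func(inner, outer): theta is inner over S = [-1, 1],
# tau is outer over a truncated range.
u1, _ = dblquad(lambda th, t: f1(th, t, y) * f2(w, t) * dpi0(th, t) * pi1(t),
                1e-3, 30.0, -1.0, 1.0)
z1, _ = dblquad(lambda th, t: t * f1(th, t, y) * f2(w, t) * pi0(th, t) * pi1(t),
                1e-3, 30.0, -1.0, 1.0)
print(abs(u1) / z1, "<=", abs(y))
```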

Consequently, we obtain the following result.

Theorem 3.5

For the model (2), assume that the prior densities $\pi_0(\theta \mid \tau)$ and $\pi_1(\tau)$ satisfy Conditions 1–6. If, for any given $\tau > 0$, $\pi_0(\theta \mid \tau)$ is decreasing in $\theta_i^2$ for $i = 1, \ldots, p$, then the estimator $\delta_B(y)$ in (11) of $\theta$ is admissible under the normalised squared error loss (9).

Sometimes $\pi_0(\theta \mid \tau)$ is not a decreasing function of each component $\theta_i^2$, $i = 1, \ldots, p$, but it may be a decreasing function of the squared components under some orthogonal transformation. The following lemma shows that such cases also work.

Lemma 3.6

Consider the model (1). Suppose there is an orthogonal matrix $H$ such that, with $u = H\theta = (u_1, \ldots, u_p)'$, $\pi_0(\theta \mid \tau)$ is a decreasing function of $u_i^2$ for $i = 1, \ldots, p$. Then Condition 7 holds.

Proof.

It is easy to verify that the Jacobian of the transformation $u = H\theta = (u_1, \ldots, u_p)'$ is $J = |\partial\theta/\partial u| = 1$, and $\|\theta\| = \|u\|$. Note that
\[ u_1 = \int_0^\infty\int_\Omega f_1(Hy \mid u, \tau)\, f_2(w \mid \tau)\,\nabla_u \pi_0(H'u \mid \tau)\,\pi_1(\tau)\,du\,d\tau, \]
where $f_1(Hy \mid u, \tau)$ is the normal density of $Hy$ with mean $u$ and covariance $\tau^{-1}I_p$, and $\Omega = \{u : \|u\| \le 1\}$. Similarly, we have
\[ z_1 = \int_0^\infty\int_\Omega \tau f_1(Hy \mid u, \tau)\, f_2(w \mid \tau)\,\pi_0(H'u \mid \tau)\pi_1(\tau)\,du\,d\tau. \]
Since $\pi_0(H'u \mid \tau)$ is a decreasing function of $u_i^2$ for $i = 1, \ldots, p$, from Lemma 3.4, for any $y$ and $w$ we have
\[ \Big\|\int_0^\infty\int_\Omega f_1(Hy \mid u, \tau)\, f_2(w \mid \tau)\,\nabla_u \pi_0(H'u \mid \tau)\,\pi_1(\tau)\,du\,d\tau\Big\| \le \|Hy\| \int_0^\infty\int_\Omega \tau f_1(Hy \mid u, \tau)\, f_2(w \mid \tau)\,\pi_0(H'u \mid \tau)\pi_1(\tau)\,du\,d\tau, \]
i.e. $\|u_1\|/z_1 \le \|Hy\| = \|y\|$. With the same argument as above, $\|u_{2j}\|/z_{2j} \le \|y\|$ for any $j = 1, 2, \ldots$.

Accordingly, we get the following result.

Theorem 3.7

For the model (2), assume that the prior densities $\pi_0(\theta \mid \tau)$ and $\pi_1(\tau)$ satisfy Conditions 1–6. If there is an orthogonal matrix $H$ such that, with $u = H\theta = (u_1, \ldots, u_p)'$, $\pi_0(\theta \mid \tau)$ is a decreasing function of $u_i^2$ for $i = 1, \ldots, p$, then the estimator $\delta_B(y)$ in (11) of $\theta$ is admissible under the normalised squared error loss (9).

In the next two sections, we will apply the above results to a 2-level and a 3-level hierarchical model, with unknown variance and a standard class of hierarchical priors.

4. Admissibility for a 2-level hierarchical model

4.1. g-Prior

For the model (2), we consider the following class of hierarchical priors for $(\theta, \tau)$,
\[ (\theta \mid g, \tau) \sim N_p\big(0, g\tau^{-1}(X'X)^{-1}\big), \qquad \pi_1(\tau) \propto \frac{1}{\tau^k}, \tag{22} \]
where $k \ge 0$. Zellner (1986) proposed this form of the conjugate Normal-Gamma family with $k = 1$. Many authors followed his work, for example, Eaton (1989), Berger, Pericchi, and Varshavsky (1998), Liang, Paulo, Molina, Clyde, and Berger (2008) and Bayarri, Berger, Forte, and García-Donato (2012). From the perspective of model selection, $g$ acts as a dimensionality penalty (Liang et al., 2008). For the choice of $g$, we study two cases:

Case 1. g is a known positive constant.

Recommendations for $g$ have included the following: Kass and Wasserman's (1995) unit information prior ($g = n$), Foster and George's (1994) risk inflation criterion ($g = p^2$), and Fernández, Ley, and Steel's (2001) benchmark prior ($g = \max(n, p^2)$), and so on.
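For concreteness, these three fixed choices are easy to compute side by side; the $(n, p)$ values below are arbitrary illustrative inputs.

```python
# The three fixed choices of g quoted above, for an illustrative (n, p).
n, p = 100, 7
g_unit_info = n                    # Kass and Wasserman (1995)
g_risk_inflation = p**2            # Foster and George (1994)
g_benchmark = max(n, p**2)         # Fernandez, Ley and Steel (2001)
print(g_unit_info, g_risk_inflation, g_benchmark)
```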

Case 2. g is an unknown parameter, and the prior of g is π2(g).

By integrating out the latent variable $g$, one can get the conditional prior of $\theta$ given $\tau > 0$,
\[ \pi_0(\theta \mid \tau) = \int_0^\infty \Big(\frac{\tau}{2g\pi}\Big)^{p/2}\exp\Big\{-\frac{\tau\,\theta'X'X\theta}{2g}\Big\}\,\pi_2(g)\,dg, \tag{23} \]
which can be represented as a mixture of $g$-priors.

For Case 2, some priors π2(g) have been previously considered. Here are two examples.

Example 4.1

Inv-Gamma$(v, c)$, i.e.
\[ \pi_2(g) = \frac{c^v}{\Gamma(v)}\,g^{-(v+1)}e^{-c/g}. \tag{24} \]
As discussed by Berger and Strawderman (1996), it results in a multivariate $t$-prior for $\theta$ given $\tau > 0$, namely
\[ \pi_0(\theta \mid \tau) \propto \tau^{p/2}\Big(1 + \frac{\tau}{2c}\,\theta'X'X\theta\Big)^{-(p/2 + v)}. \tag{25} \]
Zellner and Siow (1980) studied the multivariate Cauchy prior for $\theta$, which is a special case of (25) with $v = 1/2$ and $c = n/2$.
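The identity behind (25) is a one-dimensional Gamma integral in $g$ and can be verified numerically. In the sketch below, the matrix standing in for $X'X$ and the values of $p$, $v$, $c$, $\tau$ are arbitrary illustrative choices.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

# Check of Example 4.1: mixing the conditional g-prior (22) over
# g ~ Inv-Gamma(v, c) reproduces the multivariate-t kernel (25).
# A, tau, v, c below are arbitrary illustrative values; A stands in for X'X.
p, v, c, tau = 3, 0.5, 2.0, 1.5
A = np.diag([1.0, 2.0, 3.0])

def pi0_numeric(theta):                 # the mixture integral (23) with (24)
    Q = theta @ A @ theta
    f = lambda g: (tau / (2 * np.pi * g))**(p / 2) * np.exp(-tau * Q / (2 * g)) \
        * c**v / gamma(v) * g**(-(v + 1)) * np.exp(-c / g)
    val, _ = quad(f, 0, np.inf)
    return val

def pi0_kernel(theta):                  # unnormalised kernel from (25)
    Q = theta @ A @ theta
    return tau**(p / 2) * (1 + tau * Q / (2 * c))**(-(p / 2 + v))

for th in (np.array([0.1, 0.2, 0.3]), np.array([1.0, -1.0, 2.0])):
    print(pi0_numeric(th) / pi0_kernel(th))   # same constant for every theta
```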

Example 4.2

Robust prior (Bayarri et al., 2012):
\[ \pi_2(g) = h_1\big[h_3(h_2 + p)\big]^{h_1}(h_2 + g)^{-(h_1 + 1)}\,\mathbf{1}\{g > h_3(h_2 + p) - h_2\}, \tag{26} \]
where $h_1 > 0$, $h_2 > 0$, and $h_3 \ge h_2/(h_2 + p)$. The prior (26) has its origins in the robust priors introduced by Strawderman (1971), Berger (1980) and Berger (1985). As Bayarri et al. (2012) discussed, the priors proposed by Liang et al. (2008) are particular cases with $h_1 = \frac12$, $h_2 = 1$, $h_3 = 1/(1 + p)$ (the hyper-$g$ prior) and $h_1 = \frac12$, $h_2 = p$, $h_3 = \frac12$ (the hyper-$g/n$ prior). The prior in Cui and George (2008) has $h_1 = 1$, $h_2 = 1$, $h_3 = 1/(1 + p)$.
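As read off (26), the density is a truncated shifted Pareto in $g$ that integrates to one over $g > h_3(h_2 + p) - h_2$. A direct transcription, with the special cases listed above, might look as follows; the vectorised helper and the value $p = 5$ are assumptions for illustration.

```python
import numpy as np

# The robust prior density (26); parameter names follow the text, and
# the three special cases match the values quoted from Bayarri et al. (2012).
def robust_prior(g, h1, h2, h3, p):
    """pi2(g) = h1 [h3(h2+p)]^{h1} (h2+g)^{-(h1+1)} 1{g > h3(h2+p) - h2}."""
    g = np.asarray(g, dtype=float)
    dens = h1 * (h3 * (h2 + p))**h1 * (h2 + g)**(-(h1 + 1))
    return np.where(g > h3 * (h2 + p) - h2, dens, 0.0)

p = 5                                            # illustrative dimension
hyper_g    = lambda g: robust_prior(g, 0.5, 1.0, 1.0 / (1 + p), p)
hyper_g_n  = lambda g: robust_prior(g, 0.5, float(p), 0.5, p)
cui_george = lambda g: robust_prior(g, 1.0, 1.0, 1.0 / (1 + p), p)
print(hyper_g(np.array([0.5, 1.0, 10.0])))
```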

For the robust prior (26), it is not straightforward to obtain a closed form for the marginal conditional prior of $\theta$ given $\tau$. Alternatively, we derive upper and lower bounds on the marginal density of $\theta$ given $\tau$.

Lemma 4.3

Define
\[ f(u) \equiv \int_0^\infty v^{-r_1}(v + c)^{-r_2}\exp\Big\{-\frac{u}{2v}\Big\}\,dv, \tag{27} \]
where $r_1 > 1$, $r_2 \ge 0$ and $c > 0$. Then there are two positive constants $C_1$ and $C_2$ such that
\[ \frac{C_1}{u^{r_1 - 1}(1 + u)^{r_2}} \le f(u) \le \frac{C_2}{u^{r_1 - 1}(1 + u)^{r_2}}, \quad \text{for any } u > 0. \]

The proof is given in the Appendix. Applying this lemma to (23), the prior of $\theta$ given $\tau$ resulting from the robust prior (26) with $h_3 = h_2/(h_2 + p)$ satisfies the bounds
\[ \frac{C_1\,\tau}{(\theta'X'X\theta)^{p/2 - 1}(1 + \tau\,\theta'X'X\theta)^{h_1 + 1}} \le \pi_0(\theta \mid \tau) \le \frac{C_2\,\tau}{(\theta'X'X\theta)^{p/2 - 1}(1 + \tau\,\theta'X'X\theta)^{h_1 + 1}}, \tag{28} \]
where $p > 2$.
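Lemma 4.3 can also be probed numerically: the ratio $f(u)\,u^{r_1-1}(1+u)^{r_2}$ should stay between fixed positive constants over many orders of magnitude of $u$. The parameter values below are arbitrary choices satisfying the lemma's assumptions.

```python
import numpy as np
from scipy.integrate import quad

# Numerical illustration of Lemma 4.3: f(u) is bounded above and below by
# constant multiples of u^{-(r1-1)} (1+u)^{-r2}. The values r1 = 2.5,
# r2 = 1.5, c = 1 are arbitrary choices with r1 > 1, r2 >= 0, c > 0.
r1, r2, c = 2.5, 1.5, 1.0

def f(u):
    val, _ = quad(lambda v: v**(-r1) * (v + c)**(-r2) * np.exp(-u / (2 * v)),
                  0, np.inf)
    return val

u_grid = np.logspace(-3, 3, 25)
ratio = [f(u) * u**(r1 - 1) * (1 + u)**r2 for u in u_grid]
print(min(ratio), max(ratio))          # stays within fixed positive bounds
```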

4.2. Admissibility

We apply the results in Section 3 to determine when the hierarchical priors (22) result in admissible estimators of $\theta$ under the normalised squared error loss (9).

Theorem 4.4

(Case 1) For the model (2) under the hierarchical prior (22) with a given $g$, if $0 \le k < 1$, then the estimator $\delta_B(y)$ in (11) of $\theta$ is admissible under the normalised squared error loss (9).

The proof of Theorem 4.4 is similar to that of Theorem 4.5 below, and is thus omitted. As discussed by George and Foster (2000), the choice of $g$ effectively controls model selection, with large $g$ typically concentrating the prior on parsimonious models with a few large coefficients, whereas small $g$ tends to concentrate the prior on saturated models with small coefficients. Herein, we consider Case 1 from the perspective of admissibility, not model selection. From Theorem 4.4, the choice of a fixed $g$ has no effect on the admissibility of the estimator $\delta_B(y)$ of $\theta$.

Next, we consider Case 2. The prior density of g satisfies the following conditions:

  • Condition A1. $\pi_2(g)$ is a continuous function on $(0, \infty)$;

  • Condition A2. There exists $a \in \mathbb{R}$ such that $\pi_2(g) = O(g^a)$ as $g \to 0$;

  • Condition A3. There exists $b \ge 0$ such that $\pi_2(g) \le Cg^{-b}$ as $g \to \infty$, for some constant $C > 0$.

Clearly, the two examples of $\pi_2(g)$ discussed in Section 4.1 satisfy Conditions A1–A3 with appropriate $a$ and $b$.

Theorem 4.5

(Case 2) For the model (2) with the hierarchical prior (22), assume $\pi_2(g)$ satisfies Conditions A1–A3. If $0 \le k < 1$, $a > k - 1$ and $k + b > 3$, the estimator $\delta_B(y)$ in (11) of $\theta$ is admissible under the normalised squared error loss (9).

Proof.

It is convenient to write $X'X = HDH'$, where $H$ is orthogonal with columns the eigenvectors corresponding to $D = \mathrm{diag}(d_1, d_2, \ldots, d_p)$, $d_1 \ge \cdots \ge d_p > 0$. Define $u = H\theta = (u_1, \ldots, u_p)'$. From (23), the conditional prior of $\theta$ given $\tau > 0$ is
\[ \pi_0(\theta \mid \tau) = \int_0^\infty \Big(\frac{\tau}{2g\pi}\Big)^{p/2}\exp\Big\{-\frac{\tau}{2g}\sum_{i=1}^p u_i^2 d_i\Big\}\,\pi_2(g)\,dg, \]
which is a decreasing function of $u_i^2$ for $i = 1, \ldots, p$. By Theorem 3.7, we just need to verify Conditions 1–6.

For Condition 1, since $\sum_{i=1}^p u_i^2 d_i \ge d_p\|\theta\|^2$, up to a positive constant,
\[ \int_{S^c}\int_0^\infty \frac{1}{\tau}\,\frac{\pi_0(\theta \mid \tau)}{\|\theta\|^2\log^2(\|\theta\| \vee 2)}\,\pi_1(\tau)\,d\tau\,d\theta \le \int_{S^c}\int_0^\infty \frac{g^{-p/2}}{\|\theta\|^2\log^2(\|\theta\| \vee 2)}\,\pi_2(g)\int_0^\infty \tau^{p/2 - k - 1}\exp\Big\{-\frac{d_p\tau\|\theta\|^2}{2g}\Big\}\,d\tau\,dg\,d\theta. \]
If $k < p/2$, the inner integral equals $\Gamma(p/2 - k)\big(d_p\|\theta\|^2/(2g)\big)^{-(p/2 - k)}$, so there is a positive constant $C$ such that
\[ \int_{S^c}\int_0^\infty \frac{1}{\tau}\,\frac{\pi_0(\theta \mid \tau)}{\|\theta\|^2\log^2(\|\theta\| \vee 2)}\,\pi_1(\tau)\,d\tau\,d\theta \le C\int_{S^c} \frac{d\theta}{\|\theta\|^{2 + p - 2k}\log^2(\|\theta\| \vee 2)} \times \int_0^\infty g^{-k}\pi_2(g)\,dg. \]
By the polar coordinate transformation $r = \|\theta\|$, the integral over $\theta$ becomes proportional to
\[ \int_1^\infty \frac{dr}{r^{3 - 2k}\log^2(r \vee 2)}, \]
which is finite if $3 - 2k \ge 1$, i.e. $k \le 1$. Since $\pi_2(g)$ satisfies Conditions A1–A3, there are positive constants $N_0 < N_1$ and $C_1$, $C_2$ such that
\[ \int_0^\infty g^{-k}\pi_2(g)\,dg \le C_1\int_0^{N_0} g^{a - k}\,dg + \int_{N_0}^{N_1} g^{-k}\pi_2(g)\,dg + C_2\int_{N_1}^\infty g^{-k - b}\,dg, \]
which is finite if $a > k - 1$ and $k + b > 1$.

For Condition 2,
\[ \int_{S^c}\int_0^\infty \pi_0(\theta \mid \tau)\pi_1(\tau)\,d\tau\,d\theta \le C\int_{S^c}\int_0^\infty g^{-p/2}\pi_2(g)\int_0^\infty \tau^{p/2 - k}\exp\Big\{-\frac{d_p\tau\|\theta\|^2}{2g}\Big\}\,d\tau\,dg\,d\theta = C\,\frac{\Gamma(1 + p/2 - k)}{(d_p/2)^{1 + p/2 - k}}\int_{S^c} \frac{d\theta}{\|\theta\|^{2 + p - 2k}} \times \int_0^\infty g^{1 - k}\pi_2(g)\,dg, \]
which is finite if $0 \le k < 1$, $a > k - 2$ and $k + b > 2$.

For Condition 3,
\[ \int_{S^c}\int_0^\infty \tau\|\theta\|^2\,\pi_0(\theta \mid \tau)\pi_1(\tau)\,d\tau\,d\theta \le C\int_{S^c}\int_0^\infty \|\theta\|^2 g^{-p/2}\pi_2(g)\int_0^\infty \tau^{1 + p/2 - k}\exp\Big\{-\frac{d_p\tau\|\theta\|^2}{2g}\Big\}\,d\tau\,dg\,d\theta = C\,\frac{\Gamma(2 + p/2 - k)}{(d_p/2)^{2 + p/2 - k}}\int_{S^c} \frac{d\theta}{\|\theta\|^{2 + p - 2k}} \times \int_0^\infty g^{2 - k}\pi_2(g)\,dg, \]
which is finite if $0 \le k < 1$, $a > k - 3$ and $k + b > 3$.

For Condition 4, note that
\[ \nabla_\theta \pi_0(\theta \mid \tau) = -X'X\theta\,\frac{\tau^{p/2 + 1}}{(2\pi)^{p/2}}\int_0^\infty g^{-(p/2 + 1)}\exp\Big\{-\frac{\tau\,\theta'X'X\theta}{2g}\Big\}\,\pi_2(g)\,dg. \]
By the Cauchy-Schwarz inequality, it yields
\begin{align*} \|\nabla_\theta \pi_0(\theta \mid \tau)\|^2 &\le d_1^2\|\theta\|^2\,\frac{\tau^{p + 2}}{(2\pi)^p}\int_0^\infty g^{-p/2}\exp\Big\{-\frac{\tau\,\theta'X'X\theta}{2g}\Big\}\pi_2(g)\,dg \times \int_0^\infty g^{-(p/2 + 2)}\exp\Big\{-\frac{\tau\,\theta'X'X\theta}{2g}\Big\}\pi_2(g)\,dg \\ &\le d_1^2\|\theta\|^2\,\frac{\tau^{p/2 + 2}}{(2\pi)^{p/2}}\,\pi_0(\theta \mid \tau)\int_0^\infty g^{-(p/2 + 2)}\exp\Big\{-\frac{d_p\tau\|\theta\|^2}{2g}\Big\}\pi_2(g)\,dg. \end{align*}
Therefore,
\[ \int_{S^c}\int_0^\infty \frac{1}{\tau}\,\frac{\|\nabla_\theta \pi_0(\theta \mid \tau)\|^2}{\pi_0(\theta \mid \tau)}\,\pi_1(\tau)\,d\tau\,d\theta \le C d_1^2\int_{S^c}\int_0^\infty \|\theta\|^2 g^{-(p/2 + 2)}\pi_2(g)\int_0^\infty \tau^{1 + p/2 - k}\exp\Big\{-\frac{d_p\tau\|\theta\|^2}{2g}\Big\}\,d\tau\,dg\,d\theta = C\,\frac{\Gamma(2 + p/2 - k)}{(d_p/2)^{2 + p/2 - k}}\,d_1^2\int_{S^c} \frac{d\theta}{\|\theta\|^{2 + p - 2k}} \times \int_0^\infty g^{-k}\pi_2(g)\,dg, \]
which is finite if $0 \le k < 1$, $a > k - 1$ and $k + b > 1$.

For Condition 5,
\[ \int_{\|\theta\|^2 < B}\int_{\tau < B} \pi_0(\theta \mid \tau)\pi_1(\tau)\,d\tau\,d\theta \le C\int_0^\infty\int_0^B g^{-p/2}\tau^{p/2 - k}\int_{\|\theta\|^2 < B}\exp\Big\{-\frac{d_p\tau\|\theta\|^2}{2g}\Big\}\,d\theta\,\pi_2(g)\,d\tau\,dg. \]
By the polar coordinate transformation $r = \|\theta\|^2$, the integral over $\theta$ becomes
\[ \int_{\|\theta\|^2 < B}\exp\Big\{-\frac{d_p\tau\|\theta\|^2}{2g}\Big\}\,d\theta = C'\int_0^B r^{p/2 - 1}\exp\Big\{-\frac{d_p\tau r}{2g}\Big\}\,dr \le C''\tau^{-p/2}g^{p/2}. \]
Therefore,
\[ \int_{\|\theta\|^2 < B}\int_{\tau < B} \pi_0(\theta \mid \tau)\pi_1(\tau)\,d\tau\,d\theta \le C'''\int_0^B \tau^{-k}\,d\tau\int_0^\infty \pi_2(g)\,dg, \]
which is finite if $0 \le k < 1$, $a > -1$ and $b > 1$.

Combining these restrictions, we find that when $0 \le k < 1$, $a > k - 1$ and $k + b > 3$, Conditions 1–5 hold. As Han (2009) discussed, Condition 6 is very mild; proceeding in a way analogous to page 47 of Han (2009), Condition 6 holds. By Theorem 3.7, the estimator $\delta_B(y)$ in (11) of $\theta$ is admissible.
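Each condition above reduces to the same Gamma integral in $\tau$, namely $\int_0^\infty \tau^{\alpha - 1}e^{-\beta\tau}\,d\tau = \Gamma(\alpha)/\beta^\alpha$ for $\alpha, \beta > 0$. A quick numerical confirmation of this recurring step, with arbitrary toy values standing in for $p$, $k$, $d_p$, $\|\theta\|^2$ and $g$:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

# The step used repeatedly in the proof: for alpha > 0 and beta > 0,
#   integral_0^inf tau^{alpha-1} exp(-beta tau) dtau = Gamma(alpha) / beta^alpha,
# with alpha = p/2 - k and beta = d_p ||theta||^2 / (2g). Toy values only.
p, k, d_p, norm_theta_sq, g = 4, 0.5, 0.7, 2.0, 1.3
alpha = p / 2 - k
beta = d_p * norm_theta_sq / (2 * g)

lhs, _ = quad(lambda t: t**(alpha - 1) * np.exp(-beta * t), 0, np.inf)
print(lhs, gamma(alpha) / beta**alpha)     # the two values agree
```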

We are also interested in admissible estimators under the Inv-Gamma and robust priors for $g$. Using Theorem 4.5, we have the following results.

Theorem 4.6

For the model (2) with the hierarchical prior (22), assume $\pi_2(g)$ is Inv-Gamma$(v, c)$. If $0 \le k < 1$ and $v > 2 - k$, the estimator $\delta_B(y)$ in (11) of $\theta$ is admissible.

Proof.

By Theorem 4.5 with any constant $a > k - 1$ and $b = v + 1$, the result holds.

Theorem 4.7

For the model (2) with the hierarchical prior (22), assume $\pi_2(g)$ is the robust prior (26). If $0 \le k < 1$ and $h_1 > 2 - k$, the estimator $\delta_B(y)$ in (11) of $\theta$ is admissible.

Proof.

From Theorem 4.5 with $a = 0$ and $b = h_1 + 1$, the proof is completed.

5. Admissibility for a 3-level hierarchical model

We also study a 3-level hierarchical model and determine which elements of the hierarchical prior class lead to admissible estimators of $\theta$ under the normalised squared error loss.

5.1. The model and priors

Consider the following 3-level hierarchical model:
\[ \begin{aligned} \text{Level 1:}\quad & (y \mid \theta, \tau) \sim N_n(X\theta, \tau^{-1}I_n); \\ \text{Level 2:}\quad & (\theta \mid \beta, \tau, g) \sim N_p(Z\beta, g\tau^{-1}A); \\ \text{Level 3:}\quad & (\beta \mid \lambda, \tau) \sim N_s(0, \lambda\tau^{-1}B), \end{aligned} \tag{29} \]
where $Z$ is a given $p \times s$ matrix with full rank $s$, $\beta$ is an $s \times 1$ unknown vector, $A$ and $B$ are known $p \times p$ and $s \times s$ covariance matrices, respectively, and $\lambda$ is an unknown hyperparameter. To simplify the computation, without loss of generality, we set $A = I_p$ and $B = I_s$.
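Integrating $\beta$ out of levels 2 and 3 gives $\theta \mid \tau, g, \lambda \sim N_p(0, \tau^{-1}(gI_p + \lambda ZZ'))$, the covariance that drives Lemma 5.1 below. The Monte Carlo sketch here checks that marginal covariance; the dimensions and hyperparameter values are illustrative assumptions.

```python
import numpy as np

# Simulation check of the beta-marginalised level-2/3 covariance in (29)
# with A = I_p, B = I_s: Cov(theta | tau, g, lambda) = (g I_p + lam Z Z') / tau.
# Dimensions and hyperparameter values are illustrative only.
rng = np.random.default_rng(1)
p, s, tau, g, lam = 5, 2, 2.0, 1.5, 3.0
Z = rng.standard_normal((p, s))
N = 200_000

beta = rng.standard_normal((N, s)) * np.sqrt(lam / tau)              # level 3
theta = beta @ Z.T + rng.standard_normal((N, p)) * np.sqrt(g / tau)  # level 2
print(np.round(np.cov(theta, rowvar=False), 2))
print(np.round((g * np.eye(p) + lam * Z @ Z.T) / tau, 2))            # should match
```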

Assume $\pi_1(\tau) \propto \tau^{-k}$ and that the prior of $g$ satisfies Conditions A1–A3. The prior $\pi_3(\lambda)$ satisfies the following conditions.

  • Condition B1. $\pi_3(\lambda)$ is a continuous function on $(0, \infty)$;

  • Condition B2. There exists $c_1 \in \mathbb{R}$ such that $\pi_3(\lambda) = O(\lambda^{c_1})$ as $\lambda \to 0$;

  • Condition B3. There exists $c_2 \ge 0$ such that $\pi_3(\lambda) \le C\lambda^{-c_2}$ as $\lambda \to \infty$, for some constant $C > 0$.

5.2. Admissibility

The following lemma is needed.

Lemma 5.1

For the 3-level hierarchical model (29), assume $\pi_1(\tau) \propto \tau^{-k}$, $\pi_2(g)$ satisfies Conditions A1–A3, and $\pi_3(\lambda)$ satisfies Conditions B1–B3. Then
\[ \pi_0(\theta \mid \tau) \propto \tau^{p/2}\int_0^\infty\int_0^\infty \exp\Big\{-\frac{\tau}{2}\,\theta'(gI_p + \lambda ZZ')^{-1}\theta\Big\}(g + \lambda)^{-s/2}g^{-(p - s)/2}\,\pi_2(g)\pi_3(\lambda)\,dg\,d\lambda. \tag{30} \]

Proof.

Note that
\[ \pi_0(\theta \mid \tau) \propto \int_0^\infty\int_0^\infty\int_{\mathbb{R}^s} \pi_0(\theta \mid \beta, \tau, g)\,\pi(\beta \mid \lambda, \tau)\,\pi_2(g)\pi_3(\lambda)\,d\beta\,dg\,d\lambda \propto \tau^{(p+s)/2}\int_0^\infty\int_0^\infty\int_{\mathbb{R}^s} \exp\Big\{-\frac{\tau\|\theta - Z\beta\|^2}{2g} - \frac{\tau\|\beta\|^2}{2\lambda}\Big\}\,g^{-p/2}\lambda^{-s/2}\,\pi_2(g)\pi_3(\lambda)\,d\beta\,dg\,d\lambda. \]
Define $\beta_0 = g^{-1}(\lambda^{-1}I_s + g^{-1}Z'Z)^{-1}Z'\theta$; completing the square gives
\[ \frac{\tau\|\theta - Z\beta\|^2}{g} + \frac{\tau\|\beta\|^2}{\lambda} = \tau(\beta - \beta_0)'(\lambda^{-1}I_s + g^{-1}Z'Z)(\beta - \beta_0) + \tau\,\theta'(gI_p + \lambda ZZ')^{-1}\theta. \]
Integrating over $\beta$, the marginal distribution of $\theta$ given $\tau$ is
\[ \pi_0(\theta \mid \tau) \propto \tau^{p/2}\int_0^\infty\int_0^\infty |\lambda^{-1}I_s + g^{-1}Z'Z|^{-1/2}\exp\Big\{-\frac{\tau}{2}\,\theta'(gI_p + \lambda ZZ')^{-1}\theta\Big\}\,g^{-p/2}\lambda^{-s/2}\,\pi_2(g)\pi_3(\lambda)\,dg\,d\lambda, \]
which is proportional to (30). The proof is completed.
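The completing-the-square step can be verified numerically for small dimensions: integrating $\beta$ out of the double-exponential kernel should leave the kernel $\exp\{-(\tau/2)\theta'(gI_p + \lambda ZZ')^{-1}\theta\}$ up to a constant. The values of $Z$, $\tau$, $g$ and $\lambda$ below are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import quad

# Check of the completing-the-square step in Lemma 5.1 with s = 1, p = 2:
# integrating beta out of exp{-tau ||theta - Z b||^2 / (2g) - tau b^2 / (2 lam)}
# leaves the kernel exp{-(tau/2) theta' (g I_p + lam Z Z')^{-1} theta}.
tau, g, lam = 1.5, 0.8, 2.5
Z = np.array([[1.0], [2.0]])                 # p x s with s = 1
M = np.linalg.inv(g * np.eye(2) + lam * (Z @ Z.T))

def integrated(theta):
    f = lambda b: np.exp(-tau * np.sum((theta - Z[:, 0] * b)**2) / (2 * g)
                         - tau * b**2 / (2 * lam))
    val, _ = quad(f, -np.inf, np.inf)
    return val

def kernel(theta):
    return np.exp(-0.5 * tau * theta @ M @ theta)

for th in (np.array([0.3, -0.4]), np.array([1.0, 2.0])):
    print(integrated(th) / kernel(th))       # constant ratio in theta
```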

Theorem 5.2

For the 3-level hierarchical model (29), assume $\pi_1(\tau) \propto \tau^{-k}$, $\pi_2(g)$ satisfies Conditions A1–A3, and $\pi_3(\lambda)$ satisfies Conditions B1–B3. Then the estimator $\delta_B(y)$ in (11) of $\theta$ is admissible if $0 \le k < 1$, $b > 3 - k$, $c_2 > 3 + (p - s)/2 - k$, and one of the following conditions holds:

  1. $p > s$, $a > (p - s)/2 + 1$, $c_1 > -1$;

  2. $p = s$, $a > -1$, $c_1 > -1 + k$;

  3. $p = s$, $a > -1 + k$, $c_1 > -1$.

Proof.

It is convenient to write $ZZ' = \Gamma D\Gamma'$, where $\Gamma$ is orthogonal with columns the eigenvectors corresponding to $D = \mathrm{diag}(z_1, z_2, \ldots, z_p)$, $z_1 \ge \cdots \ge z_p \ge 0$. Herein, we denote $u = \Gamma\theta = (u_1, \ldots, u_p)'$. Therefore, from Lemma 5.1,
\[ \pi_0(\theta \mid \tau) \propto \tau^{p/2}\int_0^\infty\int_0^\infty \exp\Big\{-\frac{\tau}{2}\sum_{i=1}^p (g + z_i\lambda)^{-1}u_i^2\Big\}(g + \lambda)^{-s/2}g^{-(p - s)/2}\,\pi_2(g)\pi_3(\lambda)\,dg\,d\lambda, \]
which is a decreasing function of $u_i^2$ for $i = 1, \ldots, p$. In addition, there are two positive constants $C_1$ and $C_2$ such that
\[ \pi_0(\theta \mid \tau) \le C_1\tau^{p/2}\int_0^\infty\int_0^\infty \exp\Big\{-\frac{C_2\tau\|\theta\|^2}{g + \lambda}\Big\}(g + \lambda)^{-s/2}g^{-(p - s)/2}\,\pi_2(g)\pi_3(\lambda)\,dg\,d\lambda. \]
For technical reasons, we first consider Condition 2. Note that
\[ \int_{S^c}\int_0^\infty \pi_0(\theta \mid \tau)\pi_1(\tau)\,d\tau\,d\theta \le C_1\int_{S^c}\int_0^\infty\int_0^\infty \Big[\int_0^\infty \tau^{p/2 - k}\exp\Big\{-\frac{C_2\tau\|\theta\|^2}{g + \lambda}\Big\}\,d\tau\Big](g + \lambda)^{-s/2}g^{-(p - s)/2}\,\pi_2(g)\pi_3(\lambda)\,d\lambda\,dg\,d\theta = \frac{C_1\Gamma(1 + p/2 - k)}{C_2^{1 + p/2 - k}}\int_{S^c} \frac{d\theta}{\|\theta\|^{2 + p - 2k}} \times \int_0^\infty\int_0^\infty (\lambda + g)^{1 + (p - s)/2 - k}g^{-(p - s)/2}\,\pi_2(g)\pi_3(\lambda)\,dg\,d\lambda. \]
The integral over $\theta$ is finite if $k < 1$. For simplicity, denote $l = 1 + (p - s)/2 - k$ and $h = (p - s)/2$. If $0 \le k < 1$, we have $l > 0$.

Note that
\[ \int_0^\infty\int_0^\infty (\lambda + g)^l g^{-h}\,\pi_2(g)\pi_3(\lambda)\,dg\,d\lambda = \Big(\int_0^1\int_0^1 + \int_0^1\int_1^\infty + \int_1^\infty\int_0^1 + \int_1^\infty\int_1^\infty\Big)(\lambda + g)^l g^{-h}\,\pi_2(g)\pi_3(\lambda)\,dg\,d\lambda \equiv I_1 + I_2 + I_3 + I_4. \tag{31} \]
Clearly,
\[ I_1 \le 2^l\int_0^1 g^{-h}\pi_2(g)\,dg \times \int_0^1 \pi_3(\lambda)\,d\lambda, \]
which is finite if $a > h - 1$ and $c_1 > -1$. Clearly,
\[ I_4 \le 2^l\int_1^\infty g^{l - h}\pi_2(g)\,dg \times \int_1^\infty \lambda^l\pi_3(\lambda)\,d\lambda, \]
which is finite if $b > 1 - h + l$ and $c_2 > 1 + l$. Similarly, it is easy to verify that $I_2 + I_3$ is finite if $a > h - 1$, $b > 1 - h + l$, $c_1 > -1$ and $c_2 > 1 + l$. Therefore, (31) is finite if $a > (p - s)/2 - 1$, $b > 2 - k$, $c_1 > -1$ and $c_2 > 2 + (p - s)/2 - k$.

Similarly, for Condition 3,
\[ \int_{S^c}\int_0^\infty \tau\|\theta\|^2\,\pi_0(\theta \mid \tau)\pi_1(\tau)\,d\tau\,d\theta \le C_1'\int_{S^c} \frac{d\theta}{\|\theta\|^{2 + p - 2k}} \times \int_0^\infty\int_0^\infty (\lambda + g)^{2 + (p - s)/2 - k}g^{-(p - s)/2}\,\pi_2(g)\pi_3(\lambda)\,dg\,d\lambda, \tag{32} \]
where $C_1'$ is a positive constant. Clearly, $2 + (p - s)/2 - k > 0$ if $0 \le k < 1$. As in the proof for Condition 2, (32) is finite if $0 \le k < 1$, $a > (p - s)/2 - 1$, $b > 3 - k$, $c_1 > -1$ and $c_2 > 3 + (p - s)/2 - k$.

For Condition 4, from Lemma 5.1, note that
\[ \nabla_\theta \pi_0(\theta \mid \tau) \propto -\tau^{p/2 + 1}\int_0^\infty\int_0^\infty (gI_p + \lambda ZZ')^{-1}\theta\,\exp\Big\{-\frac{\tau}{2}\,\theta'(gI_p + \lambda ZZ')^{-1}\theta\Big\}(g + \lambda)^{-s/2}g^{-(p - s)/2}\,\pi_2(g)\pi_3(\lambda)\,dg\,d\lambda. \]
We consider the two cases $p > s$ and $p = s$ separately. If $p > s$, then $\|(gI_p + \lambda ZZ')^{-1}\theta\| \le g^{-1}\|\theta\|$, and applying the Cauchy-Schwarz inequality as in the proof of Theorem 4.5 yields
\[ \|\nabla_\theta \pi_0(\theta \mid \tau)\|^2 \le C_1\|\theta\|^2\tau^{p/2 + 2}\,\pi_0(\theta \mid \tau)\int_0^\infty\int_0^\infty \exp\Big\{-\frac{C_2\tau\|\theta\|^2}{g + \lambda}\Big\}(g + \lambda)^{-s/2}g^{-(p - s)/2 - 2}\,\pi_2(g)\pi_3(\lambda)\,dg\,d\lambda. \]
Therefore,
\[ \int_{S^c}\int_0^\infty \frac{1}{\tau}\,\frac{\|\nabla_\theta \pi_0(\theta \mid \tau)\|^2}{\pi_0(\theta \mid \tau)}\,\pi_1(\tau)\,d\tau\,d\theta \le C\int_{S^c} \frac{d\theta}{\|\theta\|^{2 + p - 2k}} \times \int_0^\infty\int_0^\infty (\lambda + g)^{2 + (p - s)/2 - k}g^{-(p - s)/2 - 2}\,\pi_2(g)\pi_3(\lambda)\,dg\,d\lambda. \tag{33} \]
As in the proof for Condition 3, (33) is finite if $0 \le k < 1$, $a > (p - s)/2 + 1$, $b > 1 - k$, $c_1 > -1$ and $c_2 > 3 + (p - s)/2 - k$.

If $p = s$, there is a positive constant $C_3$ such that $\|(gI_p + \lambda ZZ')^{-1}\theta\| \le C_3(g + \lambda)^{-1}\|\theta\|$. Therefore, using the Cauchy-Schwarz inequality,
\[ \|\nabla_\theta \pi_0(\theta \mid \tau)\|^2 \le C_4\|\theta\|^2\tau^{p/2 + 2}\,\pi_0(\theta \mid \tau)\int_0^\infty\int_0^\infty \exp\Big\{-\frac{C_2\tau\|\theta\|^2}{g + \lambda}\Big\}(g + \lambda)^{-p/2 - 2}\,\pi_2(g)\pi_3(\lambda)\,dg\,d\lambda, \]
where $C_4 = C_1C_3^2$. Thus,
\[ \int_{S^c}\int_0^\infty \frac{1}{\tau}\,\frac{\|\nabla_\theta \pi_0(\theta \mid \tau)\|^2}{\pi_0(\theta \mid \tau)}\,\pi_1(\tau)\,d\tau\,d\theta \le C_4'\int_{S^c} \frac{d\theta}{\|\theta\|^{2 + p - 2k}} \times \int_0^\infty\int_0^\infty (\lambda + g)^{-k}\,\pi_2(g)\pi_3(\lambda)\,dg\,d\lambda, \tag{34} \]
where $C_4'$ is a positive constant. Note that the integral over $\theta$ is finite if $0 \le k < 1$.

If $k \ge 0$, then
\[ \int_0^\infty\int_0^\infty (\lambda + g)^{-k}\,\pi_2(g)\pi_3(\lambda)\,dg\,d\lambda \le \int_0^\infty \pi_2(g)\,dg\int_0^\infty \lambda^{-k}\pi_3(\lambda)\,d\lambda, \]
which is finite if $a > -1$, $b > 1$, $c_1 > -1 + k$ and $c_2 > 1 - k$. Meanwhile,
\[ \int_0^\infty\int_0^\infty (\lambda + g)^{-k}\,\pi_2(g)\pi_3(\lambda)\,dg\,d\lambda \le \int_0^\infty g^{-k}\pi_2(g)\,dg\int_0^\infty \pi_3(\lambda)\,d\lambda, \]
which is finite if $a > -1 + k$, $b > 1 - k$, $c_1 > -1$ and $c_2 > 1$.

For Condition 5, note that
\[ \int_{\|\theta\|^2 < B}\int_{\tau < B} \pi_0(\theta \mid \tau)\pi_1(\tau)\,d\tau\,d\theta \le C_1\int_0^B \tau^{p/2 - k}\int_0^\infty\int_0^\infty \Big[\int_{\|\theta\|^2 < B}\exp\Big\{-\frac{C_2\tau\|\theta\|^2}{g + \lambda}\Big\}\,d\theta\Big](g + \lambda)^{-s/2}g^{-(p - s)/2}\,\pi_2(g)\pi_3(\lambda)\,d\lambda\,dg\,d\tau. \]
By the polar coordinate transformation $r = \|\theta\|^2$, the integral over $\theta$ becomes
\[ \int_{\|\theta\|^2 < B}\exp\Big\{-\frac{C_2\tau\|\theta\|^2}{g + \lambda}\Big\}\,d\theta = C\int_0^B r^{p/2 - 1}\exp\Big\{-\frac{C_2\tau r}{g + \lambda}\Big\}\,dr \le C_5\tau^{-p/2}(g + \lambda)^{p/2}, \]
where $C_5$ is a positive constant. Therefore,
\[ \int_{\|\theta\|^2 < B}\int_{\tau < B} \pi_0(\theta \mid \tau)\pi_1(\tau)\,d\tau\,d\theta \le C_5'\int_0^B \tau^{-k}\,d\tau \times \int_0^\infty\int_0^\infty (\lambda + g)^{(p - s)/2}g^{-(p - s)/2}\,\pi_2(g)\pi_3(\lambda)\,dg\,d\lambda, \tag{35} \]
which is finite if $0 \le k < 1$, $a > (p - s)/2 - 1$, $b > 1$, $c_1 > -1$ and $c_2 > 1 + (p - s)/2$.

Combining the above results, Conditions 2–5 hold if $(k, a, b, c_1, c_2)$ satisfy the conditions stated in the theorem. For Condition 1,
\[ \int_{S^c}\int_0^\infty \frac{1}{\tau}\,\frac{\pi_0(\theta \mid \tau)}{\|\theta\|^2\log^2(\|\theta\| \vee 2)}\,\pi_1(\tau)\,d\tau\,d\theta \le C_6\int_{S^c} \frac{d\theta}{\|\theta\|^{2 + p - 2k}\log^2(\|\theta\| \vee 2)} \times \int_0^\infty\int_0^\infty (\lambda + g)^{(p - s)/2 - k}g^{-(p - s)/2}\,\pi_2(g)\pi_3(\lambda)\,dg\,d\lambda, \tag{36} \]
where $C_6$ is a positive constant. If $p = s$, (36) can be handled as (34). If $p > s$, it is easy to verify that (36) is finite if $0 \le k < 1$, $a > (p - s)/2 + 1$, $b > 3 - k$, $c_1 > -1$ and $c_2 > 3 + (p - s)/2 - k$. Proceeding in a way analogous to page 47 of Han (2009), Condition 6 also holds. By Theorem 3.7, the estimator (11) of $\theta$ is admissible.

6. Comments

In Section 2, we listed the sufficient conditions for admissibility of estimators of $\theta$ with unknown $\tau$ developed by Han (2009). In Section 3, we generalised these sufficient conditions and applied the results to the normal linear regression model (2). We have to admit that those sufficient conditions are still not optimal; sometimes we cannot obtain satisfactory results by applying them directly. In this paper, we considered $\pi_1(\tau) \propto \tau^{-k}$ as the prior of $\tau$, and the condition on $k$ for admissibility is $0 \le k < 1$. Unfortunately, we cannot prove admissibility for the boundary point $k = 1$, which is of great interest since it is the natural extension of Stein's harmonic prior (Stein, 1981) to the unknown-variance problem. In follow-up work, we will try to explore more powerful sufficient conditions for admissibility of estimators of $\theta$ with unknown $\tau$. One promising approach is Blyth's method (Blyth, 1951), i.e. discovering an appropriate sequence of finite measures.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

The project was supported by the 111 Project of China (No. B14019) and the National Natural Science Foundation of China (Grant No. 11671146).

Notes on contributors

Chengyuan Song

Chengyuan Song is a PhD candidate in the College of Statistics, East China Normal University, Shanghai, China. His research interests include Bayesian statistics.

Dongchu Sun

Dr. Dongchu Sun received his Ph.D. in 1991 from the Department of Statistics, Purdue University, under the guidance of Professor James O. Berger.

References

  • Bayarri, M. J., Berger, J. O., Forte, A., & García-Donato, G. (2012). Criteria for Bayesian model choice with application to variable selection. The Annals of Statistics, 40, 1550–1577. doi: 10.1214/12-AOS1013
  • Berger, J. O. (1980). A robust generalized Bayes estimator and confidence region for a multivariate normal mean. The Annals of Statistics, 8, 716–761. doi: 10.1214/aos/1176345068
  • Berger, J. O. (1985). Statistical decision theory and Bayesian analysis (2nd ed.). New York: Springer-Verlag Inc.
  • Berger, J. O., Pericchi, L. R., & Varshavsky, J. A. (1998). Bayes factors and marginal distributions in invariant situations. Sankhya, Series A, Indian Journal of Statistics, 60, 307–321.
  • Berger, J. O., & Strawderman, W. E. (1996). Choice of hierarchical priors: Admissibility in estimation of normal means. The Annals of Statistics, 24, 931–951. doi: 10.1214/aos/1032526950
  • Berger, J. O., Strawderman, W., & Tang, D. (2005). Posterior propriety and admissibility of hyperpriors in normal hierarchical models. The Annals of Statistics, 33, 606–646. doi: 10.1214/009053605000000075
  • Berger, J. O., Sun, D., & Song, C. (2018). An objective prior for hyperparameters in normal hierarchical models. Submitted.
  • Blyth, C. (1951). On minimax statistical decision procedures and their admissibility. The Annals of Mathematical Statistics, 22, 22–42. doi: 10.1214/aoms/1177729690
  • Brown, L. D. (1971). Admissible estimators, recurrent diffusions, and insoluble boundary value problems. The Annals of Mathematical Statistics, 42, 855–903. doi: 10.1214/aoms/1177693318
  • Cui, W., & George, E. I. (2008). Empirical Bayes vs. fully Bayes variable selection. Journal of Statistical Planning and Inference, 138, 888–900. doi: 10.1016/j.jspi.2007.02.011
  • Eaton, M. L. (1989). Group invariance applications in statistics. Regional Conference Series in Probability and Statistics, Vol. 1. Hayward, CA: Institute of Mathematical Statistics.
  • Fernández, C., Ley, E., & Steel, M. F. J. (2001). Benchmark priors for Bayesian model averaging. Journal of Econometrics, 100, 381–427. doi: 10.1016/S0304-4076(00)00076-2
  • Foster, D. P., & George, E. I. (1994). The risk inflation criterion for multiple regression. The Annals of Statistics, 22, 1947–1975. doi: 10.1214/aos/1176325766
  • Fraisse, A., Raoult, J., Robert, C., & Roy, M. (1990). Une condition nécessaire d'admissibilité et ses conséquences sur les estimateurs à rétrécisseur de la moyenne d'un vecteur normal. Canadian Journal of Statistics, 18, 213–220. doi: 10.2307/3315452
  • George, E. I., & Foster, D. P. (2000). Calibration and empirical Bayes variable selection. Biometrika, 87, 731–747. doi: 10.1093/biomet/87.4.731
  • Han, X. (2009). Topics in shrinkage estimation and in causal inference (PhD thesis). Wharton School, University of Pennsylvania.
  • James, W., & Stein, C. (1961). Estimation with quadratic loss. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability (Vol. 1, pp. 361–379).
  • Judge, G., Yancey, T., & Bock, M. (1983). Pre-test estimation under squared error loss. Economics Letters, 11, 347–352. doi: 10.1016/0165-1765(83)90028-9
  • Kass, R. E., & Wasserman, L. (1995). A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. Journal of the American Statistical Association, 90, 928–934. doi: 10.1080/01621459.1995.10476592
  • Katz, V. J. (2005). The history of Stokes' theorem. Mathematics Magazine, 52, 146–156. doi: 10.1080/0025570X.1979.11976770
  • Liang, F., Paulo, R., Molina, G., Clyde, M., & Berger, J. O. (2008). Mixtures of g priors for Bayesian variable selection. Journal of the American Statistical Association, 103, 410–423. doi: 10.1198/016214507000001337
  • Maruyama, Y., & Strawderman, W. (2005). A new class of generalized Bayes minimax ridge regression estimators. The Annals of Statistics, 33, 1753–1770. doi: 10.1214/009053605000000327
  • Robert, C. (2007). The Bayesian choice: From decision-theoretic foundations to computational implementation. New York: Springer.
  • Stein, C. (1981). Estimation of the mean of a multivariate normal distribution. The Annals of Statistics, 9, 1135–1151. doi: 10.1214/aos/1176345632
  • Strawderman, W. (1971). Proper Bayes minimax estimators of the multivariate normal mean. The Annals of Mathematical Statistics, 42, 385–388. doi: 10.1214/aoms/1177693528
  • Strawderman, W. E. (1973). Proper Bayes minimax estimators of the multivariate normal mean vector for the case of common unknown variances. The Annals of Statistics, 1, 1189–1194. doi: 10.1214/aos/1176342567
  • Willing, R., & Zhou, G. (2008). Generalized Bayes minimax estimators of the mean of multivariate normal distribution with unknown variance. Journal of Multivariate Analysis, 99, 2208–2220. doi: 10.1016/j.jmva.2008.02.016
  • Zellner, A. (1986). On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In P. Goel & A. Zellner (Eds.), Bayesian inference and decision techniques: Essays in honor of Bruno de Finetti (pp. 233–243). New York: Elsevier Science Publishers.
  • Zellner, A., & Siow, A. (1980). Posterior odds ratios for selected regression hypotheses. In J. M. Bernardo, M. H. DeGroot, D. V. Lindley, & A. F. M. Smith (Eds.), Bayesian statistics (pp. 585–603). Valencia: University Press.

Appendix

A.1. Proof of Lemma 4.3

To simplify the computation, without loss of generality, we set $c = 1$. Let $x = u/v$; then
\[ f(u) = \int_0^\infty \Big(\frac{u}{x}\Big)^{-r_1}\Big(\frac{u}{x} + 1\Big)^{-r_2}\frac{u}{x^2}\exp\Big\{-\frac{x}{2}\Big\}\,dx = u^{1 - r_1}\int_0^\infty x^{r_1 + r_2 - 2}(u + x)^{-r_2}\exp\Big\{-\frac{x}{2}\Big\}\,dx. \]
One just needs to show that
\[ \frac{C_1'}{(1 + u)^{r_2}} \le \int_0^\infty x^{r_1 + r_2 - 2}(u + x)^{-r_2}\exp\Big\{-\frac{x}{2}\Big\}\,dx \le \frac{C_2'}{(1 + u)^{r_2}}. \tag{A1} \]

The integral can be written as
\[ \Big(\int_0^1 + \int_1^\infty\Big) x^{r_1 + r_2 - 2}(u + x)^{-r_2}\exp\Big\{-\frac{x}{2}\Big\}\,dx \equiv I_1 + I_2. \]
For the lower bounds of $I_1$ and $I_2$: since $u + x \le u + 1$ for $x \le 1$, and $u + x \le x(u + 1)$ for $x \ge 1$, we have
\[ I_1 \ge \int_0^1 x^{r_1 + r_2 - 2}(u + 1)^{-r_2}e^{-x/2}\,dx = \frac{C_1^*}{(1 + u)^{r_2}}, \qquad I_2 \ge \int_1^\infty x^{r_1 - 2}(u + 1)^{-r_2}e^{-x/2}\,dx = \frac{C_2^*}{(1 + u)^{r_2}}, \]
where $C_1^* = \int_0^1 x^{r_1 + r_2 - 2}e^{-x/2}\,dx$ and $C_2^* = \int_1^\infty x^{r_1 - 2}e^{-x/2}\,dx$. For the upper bounds of $I_1$ and $I_2$: since $u + x \ge x(u + 1)$ for $x \le 1$, and $u + x \ge u + 1$ for $x \ge 1$, we have
\[ I_1 \le \int_0^1 x^{r_1 + r_2 - 2}(u + 1)^{-r_2}x^{-r_2}e^{-x/2}\,dx = \frac{C_3^*}{(1 + u)^{r_2}}, \qquad I_2 \le \int_1^\infty x^{r_1 + r_2 - 2}(u + 1)^{-r_2}e^{-x/2}\,dx = \frac{C_4^*}{(1 + u)^{r_2}}, \]
where $C_3^* = \int_0^1 x^{r_1 - 2}e^{-x/2}\,dx$ and $C_4^* = \int_1^\infty x^{r_1 + r_2 - 2}e^{-x/2}\,dx$; all four constants are finite since $r_1 > 1$. Therefore, taking $C_1' = C_1^* + C_2^*$ and $C_2' = C_3^* + C_4^*$, we get (A1). The lemma is proved.
