Research Article

On the Bayesianity of minimum risk equivariant estimator for location or scale parameters under a general convex and invariant loss function

Article: 1023670 | Received 08 Sep 2014, Accepted 20 Feb 2015, Published online: 17 Mar 2015

Abstract

The Minimum Risk Equivariant (MRE) estimator is a widely used estimator with several well-known theoretical and practical properties. It is well known that, under the squared error and the absolute error loss functions, the MRE estimator is a generalized Bayes estimator. This article investigates the potential Bayesianity (or generalized Bayesianity) of the MRE estimator under a general convex and invariant loss function, ρ(·), for estimating the location and scale parameters of a unimodal density function.

AMS Subject Classifications:

Public Interest Statement

This article investigates the potential Bayesianity (or generalized Bayesianity) of the MRE estimator under a general convex and invariant loss function, ρ(·), for estimating the location and scale parameters of a unimodal density function.

1. Introduction

Compared with the uniform minimum variance unbiased estimator, the Minimum Risk Equivariant (MRE) estimator: (1) typically exists not only for convex loss functions but also for more general loss functions and (2) does not require consideration of randomized estimators (Lehmann & Casella, Citation1998, p. 156). Moreover, the MRE estimator is a widely used estimator with several well-known theoretical properties (such as minimaxity and admissibility under certain conditions) and practical properties. The MRE estimator has a wide range of applications in finite sampling (Chandrasekar & Sajesh, Citation2013; Ledoit & Wolf, Citation2013), reliability (Chandrasekar & Sajesh, Citation2013; Wei, Song, Yan, & Mao, Citation2000), regression and nonlinear models (Grafarend, Citation2006; Hallin & Jurečková, Citation2012), contingency tables (Lehmann & Casella, Citation1998), economic forecasting (Elliott & Timmermann, Citation2013), etc.

It is well known that for the squared error and the absolute error loss functions, the MRE estimator is a generalized Bayes estimator. This article extends this fact to a class of convex and invariant loss functions for unimodal location (or scale) density functions. This fact has proven remarkably useful in solving a variety of problems in statistics; applications range from truncated data (Manrique-Vallier & Reiter, Citation2014) and imputation of data omitted due to undercounting (Rubin, Gelman, & Meng, Citation2004, §12) to capture–recapture estimation (Mitchell, Citation2014).

The problem of finding a prior distribution whose corresponding Bayes estimator, under a given loss function, coincides with a given estimator was initiated by Lehmann (Citation1951). In his seminal paper, he considered a situation in which a Bayes estimator under the squared error loss function is an unbiased estimator. His work has been followed and extended by several authors. For instance, Noorbaloochi and Meeden (Citation1983, Citation2000) extended Lehmann’s (Citation1951) finding to a general class of prior distributions. Kass and Wasserman (Citation1996) reviewed the problem of selecting prior distributions whose corresponding Bayes estimators are invariant under certain transformations. Meng and Zaslavsky (Citation2002) considered a class of single observation unbiased priors (i.e. priors that produce an unbiased Bayes estimator under the squared error loss function). They showed that, under mild regularity conditions, such priors must be “noninformative” for estimating either location or scale parameters. Gelman (Citation2006) constructed a noncentral Student-t family of conditionally conjugate priors for hierarchical standard deviation parameters. For restricted parameter spaces, Kucerovsky, Marchand, Payandeh, and Strawderman (Citation2009) provided a class of prior distributions whose corresponding Bayes estimators under absolute value error loss equal the maximum likelihood estimator. Ma and Leijon (Citation2011) found a conjugate beta mixture prior such that its corresponding Bayes estimator under the variational inference framework retains some given properties.

This paper provides a class of prior distributions whose corresponding Bayes estimators under a general convex and invariant loss function coincide with the MRE estimator for location or scale families of distributions.

Section 2 collects some required elements for the other sections. The problem of finding such a prior distribution for location and scale parameters is studied in Sections 3 and 4, respectively.

2. Preliminaries

The Bayes estimator of an unknown parameter θ under a general loss function ρ is evaluated from the posterior distribution π(θ|x). Therefore, to study specific properties of a Bayes estimator, one has to study the posterior distribution π(θ|x). The following lemma provides a condition under which two prior distributions lead to equivalent Bayes estimators.

Lemma 1

Suppose X is a continuous random variable with density function f. Moreover, suppose that π1 and π2 are two prior distributions which lead to Bayes estimators δπ1 and δπ2, under a general loss function ρ, respectively. Then, the two Bayes estimators δπ1 and δπ2 are equivalent (i.e. δπ1(x) ≡ δπ2(x)) if and only if π1(θ) = cπ2(θ), for all θ ∈ Θ and some constant c > 0.

Proof

Bayes estimators with respect to π1 and π2 are equivalent if and only if the posterior distributions of θ|x under these priors are equivalent, i.e.
$$\frac{\pi_1(\theta)\, f(x,\theta)}{\int_{\Theta} \pi_1(\theta)\, f(x,\theta)\, d\theta} = \frac{\pi_2(\theta)\, f(x,\theta)}{\int_{\Theta} \pi_2(\theta)\, f(x,\theta)\, d\theta}
\iff
\frac{\pi_1(\theta)}{\pi_2(\theta)} = \frac{\int_{\Theta} \pi_1(\theta)\, f(x,\theta)\, d\theta}{\int_{\Theta} \pi_2(\theta)\, f(x,\theta)\, d\theta}.$$

The rest of the proof arrives from the fact that the left-hand side of the above equation is a function of θ while the right-hand side is a function of x; hence both sides must equal a constant c.
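Lemma 1 can be illustrated numerically. The following sketch is not from the paper; the normal likelihood, the double-exponential prior, and the constant c = 7.3 are illustrative choices. It checks that proportional priors yield the same posterior, and hence the same Bayes estimate, here the posterior mean (the Bayes estimator under squared error loss).

```python
import numpy as np

# Proportional priors pi_1 = c * pi_2 give identical posteriors, hence
# identical Bayes estimators; we check with the posterior mean (the Bayes
# estimator under squared error loss) and a N(theta, 1) likelihood.
theta = np.linspace(-10, 10, 100001)
dth = theta[1] - theta[0]

def posterior_mean(x, prior):
    lik = np.exp(-(x - theta) ** 2 / 2)     # N(theta, 1) likelihood at X = x
    post = prior * lik
    post /= post.sum() * dth                # normalize to a density on the grid
    return np.sum(theta * post) * dth

pi2 = np.exp(-np.abs(theta) / 2)            # an arbitrary (unnormalized) prior
pi1 = 7.3 * pi2                             # proportional prior, c = 7.3

for x in (-2.0, 0.5, 4.0):
    assert abs(posterior_mean(x, pi1) - posterior_mean(x, pi2)) < 1e-9
print("proportional priors give identical Bayes estimates")
```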

The following lemma from Marchand and Payandeh (Citation2011) recalls the Bayes estimator under a generalized loss function ρ1 for a location parameter μ.

Lemma 2

(Marchand & Payandeh, Citation2011) Suppose the random variable X is sampled from a location density function g0. Moreover, suppose that δπ(x) stands for the Bayes estimator under a generalized loss function ρ1 and prior distribution π(μ) for the location parameter μ. Then, δπ(x) satisfies
$$\int_{-\infty}^{\infty} \rho_1'\big(\delta_\pi(x)-\mu\big)\, g_0(x-\mu)\, \pi(\mu)\, d\mu = 0, \qquad x \in \mathbb{R}. \tag{1}$$

Now, we extend the above result to the problem of finding a Bayes estimator for a scale parameter θ, under a general loss function ρ2 and prior distribution τ(θ).

Lemma 3

Suppose the random variable Y is sampled from a scale density function f1. Moreover, suppose δπ(y) stands for the Bayes estimator under a generalized loss function ρ2 and prior distribution τ(θ) for a scale parameter θ. Then, δπ(y) satisfies
$$\int_{0}^{\infty} \rho_2'\!\left(\frac{\delta_\pi(y)}{\theta}\right) f_1\!\left(\frac{y}{\theta}\right) \frac{\pi(\ln(\theta))}{\theta^{3}}\, d\theta = 0, \qquad y \geq 0, \tag{2}$$

where τ(θ) = π(ln(θ))/θ.

Proof

Marchand and Strawderman (Citation2005) provided a connection between a scale parameter estimation problem with elements (Y, θ, f1, ρ2) and a location parameter estimation problem with elements (Y, μ, g0, ρ1). They showed that, under the transformations Y = ln(X), μ = ln(θ), g0(z) = e^z f1(e^z), and ρ1(z) = ρ2(e^z), the problem of finding a Bayes estimator for a scale parameter θ under a general loss function ρ2 and prior distribution τ(θ) can be restated as the problem of finding a Bayes estimator for a location parameter μ under the loss function ρ1 and a prior distribution π(μ). Moreover, they showed that the scale Bayes estimator δπ(y) with respect to the prior τ(θ) and the location Bayes estimator δ(·) with respect to the prior π(μ) satisfy δπ(y) = exp{δ(ln(y))} and τ(θ) = π(ln(θ))/θ. The desired proof arrives from these observations.

The Fourier transform is an integral transform defined, for an integrable function f: ℝ → ℂ, by
$$\mathcal{F}(f(t);t;\omega) := \int_{\mathbb{R}} f(t)\, e^{-i\omega t}\, dt, \qquad \omega \in \mathbb{C}. \tag{3}$$

The convolution theorem for the Fourier transform states that
$$\mathcal{F}\!\left(\int_{\mathbb{R}} f(x-t)\, g(t)\, dt;\, x;\, \omega\right) = \mathcal{F}(f(x);x;\omega)\, \mathcal{F}(g(x);x;\omega) \tag{4}$$
(see Dym & McKean, Citation1972 for more details).
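As a quick sanity check of Equation 4, one can compare the Fourier transform of a discretized convolution with the product of the individual transforms. This numerical sketch is not part of the paper; the Gaussian choices of f and g and the grid are illustrative.

```python
import numpy as np

# Numerical check of the convolution theorem (Equation 4) on a grid.
# f and g are standard normal densities; their convolution is the N(0, 2) density.
t = np.linspace(-20, 20, 4001)
dt = t[1] - t[0]
f = np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)
g = np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)

def fourier(h, omega):
    # Riemann-sum approximation of F(h(t); t; omega) = \int h(t) e^{-i omega t} dt
    return np.sum(h * np.exp(-1j * omega * t)) * dt

conv = np.convolve(f, g, mode="same") * dt   # (f * g)(x) sampled on the same grid

for w in (0.0, 0.5, 1.3):
    lhs = fourier(conv, w)
    rhs = fourier(f, w) * fourier(g, w)
    assert abs(lhs - rhs) < 1e-6, (w, lhs, rhs)
print("convolution theorem verified numerically")
```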

Now, we recall the definition and some properties of exponential type functions, which are used later in the proof of Theorem 4.

Definition 1

A function f in L1(ℝ) ∩ L2(ℝ) is said to be of exponential type T on a domain D ⊆ ℂ if there are positive constants M and T such that |f(ω)| ≤ M exp{T|ω|}, for ω ∈ D.

The Paley–Wiener theorem states that the Fourier (or inverse Fourier) transform of an L2(ℝ) function vanishes outside an interval [-T, T] if and only if the function is of exponential type T (Dym & McKean, Citation1972, p. 158). An exponential type function is continuous, infinitely differentiable everywhere, and has a Taylor series expansion over every interval (see Champeney, Citation1987, p. 77; Walnut, Citation2002, p. 81). Exponential type functions are also called band-limited functions (see Bracewell, Citation2000, p. 119 for more details).

3. Bayesianity of the MRE estimator for location family of distributions

Suppose X has distribution Pμ with Lebesgue density function g0(x-μ), where g0(·) is known and the unknown location parameter μ ∈ ℝ is to be estimated by a decision rule a under the invariant loss function L(μ,a) := ρ1(a-μ). An estimator (decision rule) T of μ is location invariant if and only if T(X+c) = T(X)+c, for all c ∈ ℝ. The MRE estimator is the estimator with the smallest risk among all invariant estimators (Shao, Citation2003). It is well known that the MRE estimator of μ based upon one observation is X+r, where r minimizes E0[ρ1(X+r)] or, for smooth ρ1(·), satisfies E0[ρ1′(X+r)] = 0 (Lehmann & Casella, Citation1998).
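The defining equation E0[ρ1′(X+r)] = 0 is easy to solve numerically. The following sketch is illustrative rather than part of the paper: it recovers r for a standard normal g0 under the LINEX loss of Proposition 1, where the closed form r = -a/2 is known; the value a = 1.5 and the grid are arbitrary choices.

```python
import numpy as np

# Numerically find the MRE constant r solving E_0[rho'(X + r)] = 0 for a
# standard normal X and the LINEX loss rho(t) = exp(a*t) - a*t - 1, whose
# derivative is rho'(t) = a*(exp(a*t) - 1). Closed form: r = -a/2.
a = 1.5
x = np.linspace(-12, 12, 200001)
dx = x[1] - x[0]
phi = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # N(0, 1) density

def mean_rho_prime(r):
    # E_0[rho'(X + r)], approximated by a Riemann sum
    return np.sum(a * (np.exp(a * (x + r)) - 1) * phi) * dx

# Bisection on r: mean_rho_prime is strictly increasing in r
lo, hi = -5.0, 5.0
for _ in range(60):
    mid = (lo + hi) / 2
    if mean_rho_prime(mid) < 0:
        lo = mid
    else:
        hi = mid
r = (lo + hi) / 2
assert abs(r - (-a / 2)) < 1e-6, r
print("numerical r =", r, " closed form -a/2 =", -a / 2)
```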

Now, we consider the problem of finding a prior distribution π(μ) whose corresponding Bayes estimator δ_Bayes^{ρ1}(·) under a general convex and invariant loss function ρ1 coincides with the MRE estimator X+r, i.e.
$$\delta_{\text{Bayes}}^{\rho_1}(X) \equiv X + r. \tag{5}$$

The following theorem studies a possible solution of Equation 5 under the absolute value loss function.

Theorem 1

Suppose X is a continuous random variable with location density function g0. Then, the Bayes estimator under the absolute value loss function and prior π(μ) coincides with the MRE estimator X+r whenever π(μ) ≡ constant, for μ ∈ ℝ.

Proof

Under the absolute value loss function and prior π, δπ(x) = x+r is a Bayes estimator if and only if
$$P\big(\mu \leq x+r \mid X = x\big) = \frac{1}{2} \iff \int_{-\infty}^{x+r} \pi(\mu)\, g_0(x-\mu)\, d\mu = \int_{x+r}^{\infty} \pi(\mu)\, g_0(x-\mu)\, d\mu.$$

Let π(μ) = c + π*(μ), k(t) = g0(t) I_{(-∞,r)}(t) - g0(t) I_{(r,∞)}(t), and a = -c∫_{-∞}^{∞} k(t) dt. The above equation can be restated as
$$\int_{-\infty}^{\infty} \pi^{*}(\mu)\, k(x-\mu)\, d\mu = a, \qquad x \in \mathbb{R}.$$

An application of the Fourier transform along with the convolution theorem (Equation 4), followed by taking the inverse Fourier transform, leads to
$$\pi(\mu) = c + \mathcal{F}^{-1}\!\left(\frac{2\pi a\, \delta_{\text{Dirac}}(\omega)}{\mathcal{F}(k(x);x;\omega)};\, \omega;\, \mu\right) = \text{constant},$$

where δ_Dirac, F, and F^{-1} stand for the Dirac delta function and the Fourier and inverse Fourier transforms, respectively. The desired proof arrives from an application of Lemma 1.

The following theorem uses Lemma 2 to extend the result of Theorem 1 to a general convex and invariant loss function ρ1.

Theorem 2

Suppose X is a continuous random variable with location density function g0. Moreover, suppose that ρ1(·) stands for a convex and invariant loss function. Then, under the prior π(μ) ≡ constant, the MRE estimator X+r is a Bayes estimator of the location parameter μ.

Proof

Lemma 2 states that the Bayes estimator δπ(x) satisfies the equation ∫ρ1′(δπ(x)-μ) g0(x-μ) π(μ) dμ = 0, for all x. Setting π(μ) = c + π*(μ), one can reduce solving the equation δπ(x) ≡ x+r, in π*, to
$$\begin{aligned}
&\int_{-\infty}^{\infty} \rho_1'\big(\delta_\pi(x)-\mu\big)\, g_0(x-\mu)\, \pi^{*}(\mu)\, d\mu = -cb\\
&\Longrightarrow\; \mathcal{F}\big(\pi^{*}(\mu);\mu;\omega\big) = \frac{-2\pi cb\, \delta_{\text{Dirac}}(\omega)}{\mathcal{F}\big(\rho_1'(-\mu-r)\,g_0(-\mu);\mu;\omega\big)}\\
&\Longrightarrow\; \pi(\mu) = c - 2\pi cb\, \mathcal{F}^{-1}\!\left(\frac{\delta_{\text{Dirac}}(\omega)}{\mathcal{F}\big(\rho_1'(-\mu-r)\,g_0(-\mu);\mu;\omega\big)};\,\omega;\,\mu\right)\\
&\Longrightarrow\; \pi(\mu) \equiv \text{constant},
\end{aligned}$$

where b = ∫_{-∞}^{∞} ρ1′(t) g0(t) dt. The last equation arrives from the fact that there exists a Dirac delta function in the numerator of the function inside the inverse Fourier transform. An application of Lemma 1 guarantees that if there is another prior distribution π1(μ) whose corresponding Bayes estimator satisfies δπ1(x) ≡ x+r, then π1(μ) = cπ(μ), for all μ ∈ Θ.

The following two propositions verify the findings of Theorem 2 for two classes of loss functions.

The MRE estimator for the normal distribution under the LINEX loss function ρ_LINEX(δ,μ) := exp{a(δ-μ)} - a(δ-μ) - 1 is X - a/2. The following proposition studies the Bayesianity of this MRE estimator.

Proposition 1

Suppose X is a random variable distributed according to a normal distribution with mean μ and variance 1. Then, the Bayes estimator of μ with respect to the prior distribution π(μ) ≡ constant and under the LINEX loss function ρ_LINEX(δ,μ) := exp{a(δ-μ)} - a(δ-μ) - 1 coincides with the MRE estimator X - a/2.

Proof

The Bayes estimator δπ(x) with respect to the prior π and under the loss function ρ_LINEX is given by δπ(x) = -ln(E_π(exp{-aμ}|X=x))/a. Setting π(μ) = c + π*(μ), Theorem 2 reduces solving the equation δπ(x) ≡ x - a/2, in π*, to
$$\begin{aligned}
\pi(\mu) &= c - \frac{c\,e^{a^{2}/2}}{\sqrt{8\pi^{3}}\,a}\, \mathcal{F}^{-1}\!\left(\frac{\delta_{\text{Dirac}}(\omega)}{\mathcal{F}\big((e^{a(\mu-a/2)}-1)\,e^{-\mu^{2}/2};\mu;\omega\big)};\,\omega;\,\mu\right)\\
&= c - \frac{c\,e^{a^{2}/2}}{\sqrt{8\pi^{3}}\,a}\cdot\left(-\frac{\sqrt{8\pi^{3}}}{2}\right)\\
&= c\left(1+\frac{1}{2a}\,e^{a^{2}/2}\right) = \text{constant},
\end{aligned}$$

where the constant c should be chosen such that π(μ) > 0, for all μ ∈ ℝ.
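Proposition 1 can also be verified directly by numerical integration. This sketch is illustrative and not from the paper; the value a = 2 and the grid are arbitrary choices. With a flat prior, the posterior of μ given x is N(x, 1), and the LINEX Bayes estimator -ln(E(e^{-aμ}|x))/a reduces to x - a/2.

```python
import numpy as np

# With a flat prior and X | mu ~ N(mu, 1), the posterior of mu given x is
# N(x, 1); the Bayes estimator under LINEX loss,
# delta(x) = -log E[exp(-a*mu) | x] / a, should equal x - a/2.
a = 2.0
mu = np.linspace(-30, 30, 400001)
dmu = mu[1] - mu[0]

def bayes_linex(x):
    post = np.exp(-(x - mu) ** 2 / 2)   # flat prior => posterior ∝ likelihood
    post /= post.sum() * dmu            # normalize to a density on the grid
    return -np.log(np.sum(np.exp(-a * mu) * post) * dmu) / a

for x in (-1.0, 0.0, 2.5):
    assert abs(bayes_linex(x) - (x - a / 2)) < 1e-6
print("LINEX Bayes estimator under a flat prior equals x - a/2")
```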

The following extends the above result to a convex combination of two LINEX loss functions, say ρCLINEX.

Proposition 2

Suppose X is a random variable distributed according to a normal distribution with mean μ and variance 1. Then, the Bayes estimator of μ with respect to the prior distribution π(μ) ≡ constant and under the convex combination of two LINEX loss functions ρ_CLINEX(δ,μ) := α(exp{a(δ-μ)} - a(δ-μ) - 1) + (1-α)(exp{-a(δ-μ)} + a(δ-μ) - 1) coincides with the MRE estimator X + r.

Proof

Setting π(μ) = c + π*(μ), along with the result of Theorem 2, one may reduce solving the equation δπ(x) ≡ x + r, in π*, to
$$\begin{aligned}
\pi(\mu) &= c - \frac{c\,b}{\sqrt{8\pi^{3}}\,a}\, \mathcal{F}^{-1}\!\left(\frac{\delta_{\text{Dirac}}(\omega)}{\mathcal{F}\big(\big(\alpha(e^{a(\mu+r)}-1)-(1-\alpha)(e^{-a(\mu+r)}+1)\big)e^{-\mu^{2}/2};\mu;\omega\big)};\,\omega;\,\mu\right)\\
&= c - \frac{c\,b}{\sqrt{8\pi^{3}}\,a}\cdot\frac{\sqrt{2\pi}\,a}{2+2e^{a^{2}/2-ra}+4\alpha e^{a^{2}/2}\sinh(ra)-4\alpha}\\
&= c - \frac{c\,b}{4\pi\big(1+e^{a^{2}/2-ra}+2\alpha e^{a^{2}/2}\sinh(ra)-2\alpha\big)} = \text{constant},
\end{aligned}$$

where b := ∫_{-∞}^{∞} (α(e^{at}-1)-(1-α)(e^{-at}+1)) e^{-t²/2}/√(2π) dt and the constant c should be chosen such that π(μ) > 0, for all μ ∈ ℝ. The second equality arrives from the facts that F(e^{-μ²/2};μ;ω) = √(2π) e^{-ω²/2}, F(e^{±a(μ+r)-μ²/2};μ;ω) = √(2π) e^{±ar-(ω±ai)²/2}, and F^{-1}(δ_Dirac(ω) h(ω);ω;μ) = h(0).

4. Bayesianity of the MRE estimator for scale family of distributions

Suppose X has distribution Pθ with Lebesgue density function f1(x/θ)/θ, where f1(·) is known and the unknown scale parameter θ ∈ ℝ+ is to be estimated by a decision rule a under the invariant loss function L(θ,a) := ρ2(a/θ). An estimator (decision rule) T of θ is scale invariant if and only if T(cX) = cT(X), for all c > 0. The MRE estimator is the estimator with the smallest risk among all invariant estimators (Shao, Citation2003). It is well known that the MRE estimator of θ based upon one observation is rX, where r minimizes E1[ρ2(rX)] or, for smooth ρ2(·), satisfies E1[Xρ2′(rX)] = 0 (Lehmann & Casella, Citation1998).

Now, we consider the problem of finding a prior distribution τ(θ) whose corresponding Bayes estimator δ_Bayes^{ρ2}(·) under a general convex and invariant loss function ρ2 coincides with the MRE estimator rX, i.e.
$$\delta_{\text{Bayes}}^{\rho_2}(X) \equiv rX. \tag{6}$$

The following theorem studies a possible solution of Equation 6 under the absolute value loss function for symmetric-scale distribution functions (see Jafarpour & Farnoosh, Citation2005 for more details on symmetric-scale distribution functions).

Theorem 3

Suppose a non-negative and continuous random variable X is distributed according to a scale density function f1. Moreover, suppose that X/θ is symmetric about 1/r. Then, the Bayes estimator under the absolute value loss function and the prior distribution τ(θ) = 1/θ coincides with the MRE estimator rX.

Proof

Under the absolute value loss function, solving the equation δτ(x) ≡ rx, in τ, can be restated as
$$\begin{aligned}
\operatorname{median}(\theta \mid X=x) = rx
&\iff \int_{0}^{rx} \frac{\tau(\theta)}{\theta}\, f_1\!\left(\frac{x}{\theta}\right) d\theta = \int_{rx}^{\infty} \frac{\tau(\theta)}{\theta}\, f_1\!\left(\frac{x}{\theta}\right) d\theta\\
(\text{let } y = x/\theta)\quad&\iff \int_{0}^{1/r} \frac{\tau(x/y)}{y}\, f_1(y)\, dy = \int_{1/r}^{\infty} \frac{\tau(x/y)}{y}\, f_1(y)\, dy\\
&\iff 2\,E_{f_1}\!\left[\frac{\tau(x/Y)}{Y}\, I_{[0,1/r]}(Y)\right] - E_{f_1}\!\left[\frac{\tau(x/Y)}{Y}\right] = 0\\
&\iff \operatorname{Cov}_{f_1}\!\left(I_{[0,1/r]}(Y),\, \frac{\tau(x/Y)}{Y}\right) = 0,
\end{aligned}$$

where the last equivalence uses P(Y ≤ 1/r) = 1/2, which follows from the symmetry of Y about 1/r.

From the above equation, one may conclude that a trivial solution is to let τ(x/y)/y be free of y, which yields τ(θ) ∝ 1/θ. This observation, along with an application of Lemma 1, completes the desired proof.

X ∼ Unif(0, 2θ/r), θ, r > 0, is an obvious example of a symmetric-scale distribution satisfying Theorem 3’s conditions. Since rX/θ ∼ Unif(0, 2), the risk E(ρ2(rX/θ)) does not depend on θ. Therefore, rX is an MRE estimator (see Rohatgi & Saleh, Citation2011, p. 446 for more details).
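Theorem 3 can be checked numerically on this uniform example; the following sketch is illustrative rather than part of the paper, with r and x chosen arbitrarily. Under the prior τ(θ) = 1/θ, the posterior of θ given X = x is proportional to θ^{-2} on (rx/2, ∞), whose median is exactly rx.

```python
import numpy as np

# Numerical check of Theorem 3 for X | theta ~ Uniform(0, 2*theta/r):
# with prior tau(theta) = 1/theta, the posterior median of theta given
# X = x equals r*x, i.e. the Bayes estimator under absolute value loss
# is the MRE estimator rX.
r, x = 1.7, 3.0

# The likelihood f(x | theta) = r/(2*theta) is positive only for
# theta > r*x/2, so we lay a log-spaced grid over that support.
theta = np.logspace(np.log10(r * x / 2), 8, 1_000_001)
lik = r / (2 * theta)            # f(x | theta) on the support
post = lik / theta               # posterior ∝ likelihood × prior 1/theta

mass = post[:-1] * np.diff(theta)          # cell masses (left Riemann sum)
cdf = np.cumsum(mass) / mass.sum()
median = theta[np.searchsorted(cdf, 0.5)]

assert abs(median - r * x) < 1e-3 * r * x
print("posterior median:", median, " MRE estimate r*x:", r * x)
```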

The following theorem uses Lemma 3 to extend the results of Theorem 3 to a general convex and invariant loss function ρ2.

Theorem 4

Suppose a non-negative and continuous random variable X is distributed according to a scale family with density function f1. Then, the Bayes estimator under a convex and invariant loss function ρ2 and the prior distribution
$$\tau(\theta) = \frac{c}{\theta}\left[1 - \mathcal{F}^{-1}\!\left(\frac{b\sqrt{\pi}\, e^{-(\omega-2)^{2}/4}}{\mathcal{F}\big(\rho_2'(re^{-y})\, f_1(e^{-y});\, y;\, \omega-2\big)};\;\omega;\;\ln(\theta)\right)\right]$$

coincides with the MRE estimator rX.

Proof

Using Lemma 3, one may conclude that solving the equation δτ(x) ≡ rx, in π, where τ(θ) = (c + π(ln(θ)))/θ, can be reduced to
$$-\frac{cb}{x^{2}} = \int_{0}^{\infty} \rho_2'\!\left(\frac{rx}{\theta}\right) f_1\!\left(\frac{x}{\theta}\right) \frac{\pi(\ln(\theta))}{\theta^{3}}\, d\theta
\;\Longrightarrow\;
-cb\,e^{-2z} = \int_{-\infty}^{\infty} \rho_2'\big(e^{\,z-\gamma+\ln(r)}\big)\, f_1\big(e^{\,z-\gamma}\big)\, \pi(\gamma)\, e^{-2\gamma}\, d\gamma,$$

where b = ∫_0^∞ t ρ2′(rt) f1(t) dt and, in the second equality, z = ln(x) and γ = ln(θ). Taking the Fourier transform of both sides along with an application of the convolution theorem, the above equation can be restated as
$$\begin{aligned}
\mathcal{F}\big(\pi(\gamma)\,e^{-2\gamma};\gamma;\omega\big) &= \frac{-cb\sqrt{\pi}\, e^{-\omega^{2}/4}}{\mathcal{F}\big(\rho_2'(re^{-y})\, f_1(e^{-y});\, y;\, \omega\big)}\\
\Longrightarrow\; \pi(\gamma) &= -c\,e^{2\gamma}\, \mathcal{F}^{-1}\!\left(\frac{b\sqrt{\pi}\, e^{-\omega^{2}/4}}{\mathcal{F}\big(\rho_2'(re^{-y})\, f_1(e^{-y});\, y;\, \omega\big)};\,\omega;\,\gamma\right)\\
\Longrightarrow\; \pi(\gamma) &= -c\, \mathcal{F}^{-1}\!\left(\frac{b\sqrt{\pi}\, e^{-(\omega-2)^{2}/4}}{\mathcal{F}\big(\rho_2'(re^{-y})\, f_1(e^{-y});\, y;\, \omega-2\big)};\,\omega;\,\gamma\right)\\
\Longrightarrow\; \tau(\theta) &= \frac{c}{\theta}\left[1 - \mathcal{F}^{-1}\!\left(\frac{b\sqrt{\pi}\, e^{-(\omega-2)^{2}/4}}{\mathcal{F}\big(\rho_2'(re^{-y})\, f_1(e^{-y});\, y;\, \omega-2\big)};\,\omega;\,\ln(\theta)\right)\right].
\end{aligned}$$

Positivity of τ(·) arrives from the fact that b√π e^{-(ω-2)²/4}/F(ρ2′(re^{-y}) f1(e^{-y}); y; ω-2) is an exponential type 1 function. The Paley–Wiener theorem then guarantees that its corresponding inverse Fourier transform takes its values inside the interval [-1, 1]. The desired proof arrives from an application of Lemma 1.

It is worthwhile mentioning that the above inverse Fourier transform may not be evaluable analytically. Therefore, one may have to employ a numerical approach, such as the fast Fourier transform, to handle it.
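For instance, one might approximate an inverse Fourier transform on a grid with numpy's FFT. The following is a minimal sketch under illustrative choices, not the paper's computation: the method is checked on G(ω) = √π e^{-ω²/4}, whose inverse transform under the convention f(t) = (1/2π)∫G(ω)e^{iωt}dω is the Gaussian e^{-t²}.

```python
import numpy as np

# Approximate f(t) = (1/(2*pi)) * \int G(w) e^{iwt} dw with the FFT.
# Checked on G(w) = sqrt(pi)*exp(-w**2/4), whose inverse Fourier
# transform is the Gaussian exp(-t**2).
n, dw = 2**14, 0.01
w = (np.arange(n) - n // 2) * dw                  # centred frequency grid
G = np.sqrt(np.pi) * np.exp(-w**2 / 4)

# The Riemann sum f(t_k) ~ (dw/(2*pi)) * sum_j G(w_j) e^{i w_j t_k} becomes
# an inverse DFT once the centred grid is rotated into FFT order.
f = np.fft.ifft(np.fft.ifftshift(G)) * n * dw / (2 * np.pi)
t = 2 * np.pi * np.fft.fftfreq(n, d=dw)           # matching time grid

mask = np.abs(t) < 3                              # compare where f is not tiny
err = np.max(np.abs(np.real(f[mask]) - np.exp(-t[mask]**2)))
assert err < 1e-8
print("max error on |t| < 3:", err)
```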

5. Conclusion and suggestion

This paper provides a class of prior distributions whose corresponding Bayes estimators under a general convex and invariant loss function coincide with the MRE estimator for location or scale families of distributions. This problem can be studied further for location-scale families of distributions under a general convex and invariant loss function.

Acknowledgements

The author would like to thank Professor Xiao-Li Meng, who introduced the problem, and Professor William Strawderman for his constructive comments. The referees’ comments and suggestions are gratefully acknowledged by the author.

Additional information

Funding

The author has received no direct funding for this research.

Notes on contributors

Amir T. Payandeh Najafabadi

Amir T. Payandeh Najafabadi is an Associate Professor in the Department of Mathematical Sciences at Shahid Beheshti University, Tehran, Evin. He was born on September 3, 1973. He received his PhD from the University of New Brunswick, Canada, in 2006. He has published 28 papers and has co-authored two books. His major research interests are: statistical decision theory, Lévy processes, risk theory, the Riemann–Hilbert problem, and integral equations.

References

  • Bracewell, R. N. (2000). The Fourier transform and its applications (3rd ed.). New York, NY: McGraw-Hill.
  • Champeney, D. C. (1987). A handbook of Fourier theorems. New York, NY: Cambridge University Press.
  • Chandrasekar, B., & Sajesh, T. A. (2013). Reliability measures of systems with location-scale ACBVE components. Theory & Applications, 28, 7–15.
  • Dym, H., & Mckean, H. P. (1972). Fourier series and integrals. Probability and mathematical statistics. New York, NY: Academic Press.
  • Elliott, G., & Timmermann, A. (Eds.). (2013). Handbook of economic forecasting (Vol. 2). New York, NY: Newnes.
  • Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Analysis, 1, 515–534.
  • Grafarend, E. W. (2006). Linear and nonlinear models: Fixed effects, random effects, and mixed models. New York, NY: Walter de Gruyter.
  • Hallin, M., & Jurečková, J. (2012). Equivariant estimation. Encyclopedia of environmetrics. New York, NY: Wiley.
  • Jafarpour, H., & Farnoosh, R. (2005). Comparing the kurtosis measures for symmetric-scale distribution functions considering a new kurtosis. In Proceedings of the 8th WSEAS International Conference on Applied Mathematics (pp. 90–94). Tenerife: World Scientific and Engineering Academy and Society (WSEAS).
  • Kass, R. E., & Wasserman, L. (1996). The selection of prior distributions by formal rules. Journal of the American Statistical Association, 91, 1343–1370.
  • Kucerovsky, D., Marchand, É., Payandeh, A. T., & Strawderman, W. E. (2009). On the Bayesianity of maximum likelihood estimators of restricted location parameters under absolute value error loss. Statistics & Risk Modeling, 27, 145–168.
  • Ledoit, O., & Wolf, M. (2013). Optimal estimation of a large-dimensional covariance matrix under Stein’s loss ( Working Paper No. 122). Zurich: University of Zurich Department of Economics.
  • Lehmann, E. L. (1951). A general concept of unbiasedness. The Annals of Mathematical Statistics, 22, 587–592.
  • Lehmann, E. L., & Casella, G. (1998). Theory of point estimation. New York, NY: Springer.
  • Ma, Z., & Leijon, A. (2011). Bayesian estimation of beta mixture models with variational inference. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33, 2160–2173.
  • Manrique-Vallier, D., & Reiter, J. P. (2014). Bayesian estimation of discrete multivariate latent structure models with structural zeros. Journal of Computational and Graphical Statistics, 23, 1061–1079.
  • Marchand, É., & Payandeh, A. T. (2011). Bayesian improvements of a MRE estimator of a bounded location parameter. Electronic Journal of Statistics, 5, 1495–1502.
  • Marchand, É., & Strawderman, W. E. (2005). On improving on the minimum risk equivariant estimator of a scale parameter under a lower-bound constraint. Journal of Statistical Planning and Inference, 134, 90–101.
  • Meng, X., & Zaslavsky, A. M. (2002). Single observation unbiased priors. The Annals of Statistics, 30, 1345–1375.
  • Mitchell, S. A. (2014). Capture–recapture estimation for conflict data and hierarchical models for program impact evaluation (PhD thesis). Harvard University, Cambridge, MA.
  • Noorbaloochi, S., & Meeden, G. (1983). Unbiasedness as the dual of being Bayes. Journal of the American Statistical Association, 78, 619–623.
  • Noorbaloochi, S., & Meeden, G. (2000). Unbiasedness and Bayes estimators (Technical Report No. 9971331). University of Minnesota. Retrieved from http://users.stat.umn.edu/gmeeden/papers/bayunb.pdf
  • Rohatgi, V. K., & Saleh, A. M. E. (2011). An introduction to probability and statistics (Vol. 910). New York, NY: Wiley.
  • Rubin, D. B., & Gelman, A. (Eds.). (2004). Applied Bayesian modeling and causal inference from incomplete-data perspectives (Vol. 561). New York, NY: Wiley.
  • Shao, J. (2003). Mathematical statistics: Springer texts in statistics. New York, NY: Springer.
  • Walnut, D. F. (2002). An introduction to wavelet analysis (2nd ed.). New York, NY: Birkhäuser Publisher.
  • Wei, J., Song, B., Yan, W., & Mao, Z. (2011, June). Reliability estimations of Burr-XII distribution under entropy loss function. In Proceedings of the 9th International Conference on Reliability, Maintainability and Safety (ICRMS) (pp. 244–247). Guiyang: IEEE.