Abstract
This paper proposes a model of prior ignorance about a multivariate variable based on a set of distributions. In particular, we discuss four minimal properties that a model of prior ignorance should satisfy: invariance, near ignorance, learning and convergence. Near ignorance and invariance ensure that our prior model behaves as a vacuous model with respect to some statistical inferences (e.g. mean, credible intervals, etc.) and some transformations of the parameter space. Learning and convergence ensure that our prior model can learn from data and, in particular, that the influence of the prior set on the posterior inferences vanishes as the number of observations increases. We show that these four properties can all be satisfied by a set of conjugate priors in the multivariate exponential families if the set includes finitely additive probabilities obtained as limits of truncated exponential functions. The resulting set is a model of prior ignorance with respect to the functions (queries) that are commonly used for statistical inferences and, because of conjugacy, it is tractable and easy to elicit. Applications of the model to some practical statistical problems show the effectiveness of the approach.
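The interplay of near ignorance, learning and convergence can be illustrated, in the simplest Bernoulli case, by the well-known imprecise Beta model. This is a hedged sketch, not the paper's exponential-family construction with truncated-exponential limits: under the set of Beta(s·t, s·(1−t)) priors with t ranging over (0, 1), the prior mean interval is vacuous, while the posterior mean interval shrinks as observations accumulate.

```python
def posterior_mean_bounds(k: int, n: int, s: float = 2.0):
    """Lower and upper posterior mean of a Bernoulli success probability
    under the set of Beta(s*t, s*(1-t)) priors with t in (0, 1).
    An imprecise-Beta illustration, not the model proposed in the paper."""
    # The posterior mean under Beta(s*t, s*(1-t)) is (k + s*t) / (n + s);
    # letting t -> 0 and t -> 1 gives the lower and upper bounds.
    return k / (n + s), (k + s) / (n + s)

# Near ignorance: with no data the prior mean interval is vacuous, [0, 1].
print(posterior_mean_bounds(0, 0))      # (0.0, 1.0)

# Learning and convergence: the interval width s / (n + s) vanishes as n grows.
print(posterior_mean_bounds(7, 10))     # roughly (0.583, 0.750)
print(posterior_mean_bounds(70, 100))   # roughly (0.686, 0.706)
```

The width of the posterior mean interval, s/(n + s), quantifies the residual influence of the prior set; its decay to zero is exactly the convergence property discussed above.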
Acknowledgements
The authors would like to thank the anonymous referees for comments and constructive criticism that helped us to improve the presentation of the paper.
Notes
1. More precisely, we consider a semigroup of transformations of the parameter space. That is, each transformation maps the space into itself, and the composition f1f2, defined by f1(f2(w)), belongs to the semigroup whenever f1 and f2 do. The semigroup is Abelian if f1f2=f2f1 for all f1 and f2 in it.
2. In this paper we mainly focus on translation invariance. However, for multivariate models, we will impose other invariance properties: invariance to permutations and invariance to representation.
3. Note that I{A} is the indicator function of set A, that is, I{A}(x)=1 if x∈A and zero otherwise.
4. Equivalently, if −g and −g(f) belong to the considered class of functions, the same invariance holds, since the lower expectation of −g equals minus the upper expectation of g for any g.
5. We point the reader to [Citation16, Chapter 20] for a general discussion of dominated priors. When the likelihood belongs to the exponential families (the focus of this paper), as a dominated prior we may consider any proper conjugate prior, the improper uniform prior or other sufficiently regular priors. The posterior becomes asymptotically normal in these cases.
6. Let φ be such that, for every ε&gt;0, all but a finite number of the terms of the sequence are greater than φ−ε, while infinitely many terms are smaller than φ+ε. Then φ is called the lower limit of the sequence.
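As a numerical illustration of this definition (a small standalone sketch, not part of the paper), the oscillating sequence a_m = (−1)^m (1 + 1/m) has limit points +1 and −1, so its lower limit is −1; this can be approximated by taking the minimum over a far tail of the sequence.

```python
def liminf_tail(a, start=10_000, length=10_000):
    # Approximate the lower limit of the sequence a(m) by the smallest
    # term in a far tail: beyond 'start', only terms near the limit
    # points remain, so the minimum approaches the limit inferior.
    return min(a(m) for m in range(start, start + length))

# a_m = (-1)^m * (1 + 1/m): even-indexed terms tend to +1,
# odd-indexed terms tend to -1, so the lower limit is -1.
a = lambda m: (-1) ** m * (1 + 1 / m)
print(liminf_tail(a))  # close to -1
```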
7. The differences are due to the Jacobians of the transformations.
8. By sufficiently smooth RVBFs, we mean functions that are integrable w.r.t. the likelihood kernel for any ℓ∈[−c,c], with support in the parameter space, and continuous on a neighbourhood of the point where the posterior relative to the improper uniform prior concentrates as the number of observations increases.
9. This holds for any ℓ∈(0,c]. All ℓ∈(0,c] are equivalent w.r.t. this property, since the corresponding functions are all increasing for ℓ&gt;0.
10. Notice that this behaviour is in general not monotone and depends on how the set of posteriors converges with the number of observations.
11. This also holds for the exponential distribution for n0<−1.
12. Since the priors in the set are all countably additive, the model also satisfies strong coherence as defined in [Citation6, Chapter 7].
13. In the formulation we need to impose the additional constraint that the argument of the square root is positive. This implies that the parameters ℓ·1 and ℓ·2 cannot vary independently in [−c·1,c·1] and [−c·2,c·2], respectively. However, for suitably large n and non-degenerate distributions, the argument of the square root is usually positive for any ℓ·1 and ℓ·2 in these intervals.
14. For Δ=0 the plot actually reports the Type I error.
15. We have chosen this interval in analogy with that of the IEM test. The distribution of the p-values for a different boundary of the ‘no decision zone’ can easily be deduced from Figure (c) and (d).
16. These lower and upper probabilities are obtained by densities in the set approaching the extreme priors in Equations (42)–(43).