A Generalization of the Savage–Dickey Density Ratio for Testing Equality and Order Constrained Hypotheses

Pages 102-109 | Received 21 Apr 2020, Accepted 20 Jul 2020, Published online: 26 Aug 2020

ABSTRACT

The Savage–Dickey density ratio is a specific expression of the Bayes factor when testing a precise (equality constrained) hypothesis against an unrestricted alternative. The expression greatly simplifies the computation of the Bayes factor at the cost of assuming a specific form of the prior under the precise hypothesis as a function of the unrestricted prior. A generalization was proposed by Verdinelli and Wasserman such that the priors can be freely specified under both hypotheses while keeping the computational advantage. This article presents an extension of this generalization when the hypothesis has equality as well as order constraints on the parameters of interest. The methodology is used for a constrained multivariate t-test using the JZS Bayes factor and a constrained hypothesis test under the multinomial model.

1 Introduction

The Savage–Dickey density ratio (Dickey 1971) is a special expression of the Bayes factor, the Bayesian measure of statistical evidence between two statistical hypotheses in light of the observed data (Jeffreys 1961; Kass and Raftery 1995). The Savage–Dickey density ratio is relatively easy to compute from Markov chain Monte Carlo (MCMC) output without requiring the marginal likelihoods under the hypotheses. Consider a test of a normal mean θ with unknown variance σ2, Hc: θ = 0 versus Hu: θ ∈ ℝ, with independent observations yi ∼ N(θ, σ2), for i = 1, …, n. The indices "c" and "u" refer to a constrained hypothesis and an unconstrained hypothesis.1 Denote the priors for the unknown parameters under Hc and Hu by πc(σ2) and πu(θ, σ2), respectively, which reflect which values for the parameters are likely before observing the data. Under Hu we consider a unit information prior πu(θ|σ2) = N(0, σ2) and a conjugate inverse gamma prior for the nuisance parameter, say, πu(σ2) = IG(1/2, 1/2) (the exact choice of the hyperparameters does not qualitatively affect the argument; see, e.g., Verdinelli and Wasserman 1995). The marginal prior for θ under Hu then follows a Cauchy distribution (equivalent to a Student's t-distribution with 1 degree of freedom) centered at θ = 0 with a scale parameter of 1. The marginal posterior for θ under Hu, πu(θ|y), also has a Student's t-distribution. When the prior for the nuisance parameter σ2 under Hc equals the conditional prior for σ2 under Hu given the restriction under Hc, that is, πc(σ2) = πu(σ2|θ = 0), the Bayes factor for Hc against Hu can be written as the Savage–Dickey density ratio: the ratio of the unconstrained posterior and unconstrained prior density evaluated at the constrained null value under Hc (Dickey 1971), that is,
\[
B_{cu} = \frac{p_c(y)}{p_u(y)} = \frac{\int p(y \mid 0, \sigma^2)\, \pi_c(\sigma^2)\, d\sigma^2}{\iint p(y \mid \theta, \sigma^2)\, \pi_u(\theta, \sigma^2)\, d\theta\, d\sigma^2} = \frac{\pi_u(\theta = 0 \mid y)}{\pi_u(\theta = 0)},
\]
where p(y|θ, σ2) denotes the likelihood of the data given the normal mean θ and variance σ2, and pc(y) and pu(y) denote the marginal likelihoods under Hc and Hu, respectively. For the current problem, we would thus need to divide the posterior t distribution of θ under Hu evaluated at θ = 0 by the prior Cauchy distribution at θ = 0, both of which have analytic expressions. Note, of course, that the same expression would be obtained by deriving the marginal likelihoods, which also have analytic expressions in this scenario. For more complex statistical models with more nuisance parameters, for which the marginal likelihoods do not have analytic expressions, the Savage–Dickey density ratio is particularly useful: we only need to compute the ratio of the unconstrained posterior and the unconstrained prior evaluated at the constrained null value, which are generally easy to obtain, for example, using MCMC output.
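To make this computation concrete, the following minimal R sketch (not part of the original analysis; the data are simulated and the Gibbs sampler is written only for this illustration) estimates the Savage–Dickey density ratio for the normal mean example: the unconstrained posterior density of θ at zero is obtained from a kernel density estimate of the MCMC draws and divided by the Cauchy prior density at zero.

set.seed(1)
n <- 50
y <- rnorm(n, mean = 0.2, sd = 1)   # hypothetical data

# Gibbs sampler under H_u with theta | sigma2 ~ N(0, sigma2) and sigma2 ~ IG(1/2, 1/2)
S <- 2e4
sigma2 <- 1
theta_draws <- numeric(S)
for (s in 1:S) {
  # theta | sigma2, y ~ N(n * ybar / (n + 1), sigma2 / (n + 1))
  theta <- rnorm(1, mean = n * mean(y) / (n + 1), sd = sqrt(sigma2 / (n + 1)))
  # sigma2 | theta, y ~ IG(1/2 + (n + 1)/2, 1/2 + (sum((y - theta)^2) + theta^2)/2)
  sigma2 <- 1 / rgamma(1, shape = 0.5 + (n + 1) / 2,
                       rate = 0.5 + (sum((y - theta)^2) + theta^2) / 2)
  theta_draws[s] <- theta
}

postdens_at_0  <- approxfun(density(theta_draws))(0)   # unconstrained posterior density at theta = 0
priordens_at_0 <- dcauchy(0, location = 0, scale = 1)  # marginal prior of theta is Cauchy(0, 1)
Bcu <- postdens_at_0 / priordens_at_0                  # Savage-Dickey density ratio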

Despite its computational convenience, a limitation of the Savage–Dickey density ratio is that it only holds for a specific form of the prior for the nuisance parameters under the restricted model, which is completely determined by the prior under the unrestricted model. This imposed prior under the restricted model may not always have a desirable interpretation. For example, for the Savage–Dickey ratio to hold in the above example, the prior for the population variance under Hc equals πc(σ2) = πu(σ2|θ = 0) = IG(1, 1/2). This prior under Hc is more concentrated around smaller values for σ2 than under Hu, as can be seen from the prior modes for σ2 under Hc and Hu, which are 1/4 and 1/3, respectively. This is counterintuitive, however, because the sample estimate of σ2 will always be smaller under Hu, where the mean θ is unrestricted. Therefore, the Savage–Dickey density ratio should be used with care. For discussions on the Savage–Dickey density ratio, see Marin and Robert (2010) and Heck (2019). For discussions on priors for the nuisance parameters, see Consonni and Veronese (2008).

To retain the computational convenience of the Savage–Dickey density ratio, while allowing researchers to freely specify the prior for the nuisance parameters under the restricted model, Verdinelli and Wasserman (1995) proposed a generalization. In a multivariate setting when testing a vector of key parameters θ, that is, Hc: θ = r, where r is a vector of constants, against an unconstrained alternative, Hu: θ unconstrained, with nuisance parameters ϕ, where the priors under Hc and Hu are denoted by πc(ϕ) and πu(θ, ϕ), respectively, the multivariate generalized Savage–Dickey density ratio is given by
\[
(1)\qquad B_{cu} = \frac{\pi_u(\theta = r \mid y)}{\pi_u(\theta = r)} \times E\left\{ \frac{\pi_c(\phi)}{\pi_u(\phi \mid \theta = r)} \right\},
\]
where the expectation is taken over the conditional posterior under the unconstrained model, πu(ϕ|θ = r, y). As can be seen, the generalization is equal to the original Savage–Dickey density ratio (the first factor on the right-hand side of (1)) multiplied by a correction factor based on the ratio of the freely chosen prior for the nuisance parameters, πc(ϕ), and the prior for the nuisance parameters imposed by the Savage–Dickey density ratio, πu(ϕ|θ = r). In the above example, one might want to use the same marginal prior for the nuisance parameter under Hc as under Hu, that is, πc(σ2) = IG(1/2, 1/2).
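Continuing the sketch above (again an illustration, not the original code), the correction factor in (1) for the choice πc(σ2) = IG(1/2, 1/2) can be estimated by averaging the prior ratio over draws from the conditional posterior πu(σ2 | θ = 0, y), which under the priors above is IG(1 + n/2, (1 + Σ yi2)/2):

# inverse-gamma density with shape/rate parameterization
dinvgamma <- function(x, shape, rate) dgamma(1 / x, shape, rate = rate) / x^2

# draws from the conditional posterior pi_u(sigma2 | theta = 0, y)
sigma2_cond <- 1 / rgamma(S, shape = 1 + n / 2, rate = (1 + sum(y^2)) / 2)

# E{ pi_c(sigma2) / pi_u(sigma2 | theta = 0) } with pi_c(sigma2) = IG(1/2, 1/2)
# and the imposed prior pi_u(sigma2 | theta = 0) = IG(1, 1/2)
correction <- mean(dinvgamma(sigma2_cond, shape = 0.5, rate = 0.5) /
                   dinvgamma(sigma2_cond, shape = 1,   rate = 0.5))
Bcu_generalized <- postdens_at_0 / priordens_at_0 * correction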

The generalization in (1) was not derived for the case where the constrained hypothesis contains order (or one-sided) constraints in addition to equality constraints, say, Hc: θe = re & θo > ro. Scientific theories, however, are very often formulated as combinations of equality and order constraints (Hoijtink 2011). In repeated measures studies, for instance, theory may suggest a specific ordering of the measurement means (de Jong, Rigotti, and Mulder 2017) or measurement variances (Böing-Messing and Mulder 2020); in a regression model, theory may suggest that a certain set of predictor variables have zero effects while other variables are expected to have positive or negative effects (Mulder and Olsson-Collentine 2019); or order constraints may be formulated on regression effects (Haaf and Rouder 2017) or intraclass correlations (Mulder and Fox 2013, 2019) in multilevel models. The goal of the current article is therefore to present the generalization of the Savage–Dickey density ratio in (1) for a constrained hypothesis with equality and order constraints on certain key parameters. This is done in Section 2, where the generalization is related to existing special cases of the Bayes factor. Section 3 presents two applications of Bayesian constrained hypothesis testing under two statistical models: a multivariate Bayesian t-test for standardized effects under the multivariate normal model using a novel extension of the JZS Bayes factor (Rouder et al. 2009), and a constrained hypothesis test on the cell probabilities under a multinomial model. The article ends with short concluding remarks in Section 4.

2 Extending the Savage–Dickey Density Ratio

Lemma 1 presents our main result.

Lemma 1. Consider a constrained statistical model, Hc, where the parameters θe are fixed with equality constraints, that is, θe = re, and order (or one-sided) constraints are formulated on the parameters θo, that is, θo > ro, with (unconstrained) nuisance parameters ϕ, and an alternative unconstrained model Hu, where (θe, θo, ϕ) are unrestricted. If we denote the priors under Hc and Hu by πc(θo, ϕ) and πu(θe, θo, ϕ), respectively, then the Bayes factor of model Hc against model Hu given a dataset y can be expressed as
\[
(2)\qquad B_{cu} = \frac{\pi_u(\theta_e = r_e \mid y)}{\pi_u(\theta_e = r_e)\, \Pr_{c^*}(\theta_o > r_o)} \times E\left\{ \frac{\pi_{c^*}(\theta_o, \phi)}{\pi_u(\theta_o, \phi \mid \theta_e = r_e)}\, 1_{\{\theta_o > r_o\}}(\theta_o) \right\},
\]
where the expectation is taken over the conditional posterior of (θo, ϕ) given θe = re under Hu, that is, πu(θo, ϕ | y, θe = re), and πc*(θo, ϕ) denotes the "completed" prior under the completed constrained hypothesis in which the one-sided constraints are omitted, that is, Hc*: θe = re, such that πc(θo, ϕ) = Prc*(θo > ro)^{-1} πc*(θo, ϕ) 1{θo > ro}(θo), where 1{θo > ro}(θo) is the indicator function which equals 1 if θo > ro holds and 0 otherwise, and Prc*(·) denotes the prior probability of θo > ro under the completed prior under Hc*.

Proof. Appendix A. □

Remark 1. Note that in the special case where
\[
\pi_c(\theta_o, \phi) = \pi_u(\theta_o, \phi \mid \theta_e = r_e)\, \Pr_u(\theta_o > r_o \mid \theta_e = r_e)^{-1}\, 1_{\{\theta_o > r_o\}}(\theta_o),
\]
so that the completed prior under Hc* is equal to πu(θo, ϕ | θe = re), then (2) results in the known generalization of the Savage–Dickey density ratio of the Bayes factor for an equality and order hypothesis against an unconstrained alternative,
\[
(3)\qquad B_{cu} = \frac{\pi_u(\theta_e = r_e \mid y)}{\pi_u(\theta_e = r_e)} \times \frac{\Pr_u(\theta_o > r_o \mid y, \theta_e = r_e)}{\Pr_u(\theta_o > r_o \mid \theta_e = r_e)}.
\]

This expression has been reported in Mulder and Gelissen (2018), for example.

Remark 2. In the special case with no order constraints, the parameters θo would be part of the nuisance parameters ϕ, and thus (2) becomes equal to (1).

Remark 3. The importance of the "completed" prior in which the one-sided constraints are omitted was also highlighted by Pericchi, Liu, and Torres (2008) for intrinsic Bayes factors.

Lemma 1 shows which four ingredients need to be computed to obtain the Bayes factor of a constrained hypothesis against an unconstrained alternative. The computation of these four ingredients can be done in different ways across different statistical models. To give readers more insight into the computational aspects, the next section shows the application of the result under two different statistical models: the multivariate normal model for multivariate continuous data and the multinomial model for categorical data.
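As a compact summary (a hypothetical helper written for this article overview, mirroring how the four quantities are combined in (2) and in the computations of Appendix C), the ingredients can be combined as follows:

# post_e0:  unconstrained posterior density of theta_e at r_e
# prior_e0: unconstrained prior density of theta_e at r_e
# prior_prob_order: Pr_{c*}(theta_o > r_o) under the completed prior
# exp_ratio: Monte Carlo estimate of the expectation in (2)
bf_constrained_vs_unconstrained <- function(post_e0, prior_e0, prior_prob_order, exp_ratio) {
  post_e0 / (prior_e0 * prior_prob_order) * exp_ratio
}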

3 Applications

3.1 A Multivariate t-Test Using the JZS Bayes Factor

The Cauchy prior for standardized effects is becoming increasingly popular for Bayes factor testing in the social and behavioral sciences (Rouder et al. 2009, 2012; Rouder and Morey 2015). This Bayes factor is based on key contributions by Jeffreys (1961), Zellner and Siow (1980), and Liang et al. (2008), and is therefore also referred to as the JZS Bayes factor. Here, we extend this approach to a Bayesian multivariate t-test under the multivariate normal model, and show how to compute the Bayes factor for testing a hypothesis with equality and order constraints on the standardized effects using Lemma 1. Note that this test differs from multivariate t-tests on multiple coefficients using a multivariate Cauchy prior under univariate linear regression models (Rouder and Morey 2015; Heck 2019), as we consider a model with a multivariate outcome variable.

Let a multivariate dependent variable of p dimensions, yi, follow a multivariate normal distribution, that is, yi ∼ N(μ, Σ), for i = 1, …, n. To explicitly model the standardized effects, we reparameterize the model according to
\[
(4)\qquad y_i \sim N(L_\Sigma \delta,\ \Sigma),
\]
where δ are the unknown standardized effects, and LΣ is the lower triangular Cholesky factor of the unknown covariance matrix Σ, such that LΣ LΣ' = Σ. The model in (4) is a generalization of the univariate model considered by Rouder et al. (2009), yi ∼ N(σδ, σ2).
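For illustration (with hypothetical values of δ and Σ, not the data analyzed below), data can be generated from the reparameterized model (4) as follows:

library(mvtnorm)
set.seed(1)
Sigma <- matrix(c(2, 1, 1, 3), 2, 2)   # hypothetical covariance matrix
L     <- t(chol(Sigma))                # lower triangular Cholesky factor, L %*% t(L) = Sigma
delta <- c(0.4, 0.4)                   # hypothetical standardized effects
Y     <- rmvnorm(50, mean = c(L %*% delta), sigma = Sigma)   # draws from model (4)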

As a motivating example we consider the bivariate dataset (p = 2) presented in Larocque and Labarre (2004), where yi = (yi1, yi2)' contains the cell count differences of CD45RA T and CD45RO T cells of n = 36 HIV-positive newborn infants (Sleasman et al. 1999). We are interested in testing whether the standardized effects of the cell count differences of the two cell types are equal and positive, that is,
Hc: δ1 = δ2 > 0
Hu: (δ1, δ2) ∈ ℝ².

The sample means were ȳ = (86.94, 193.47)' and the estimated covariance matrix equaled
\[
\hat\Sigma = \begin{bmatrix} 20197 & 23515 \\ 23515 & 106350 \end{bmatrix}.
\]

Extending the prior proposed by Rouder et al. (2009) to the multivariate normal model, we set an unconstrained Cauchy prior on δ under Hu and the Jeffreys prior for the covariance matrix:
\[
\pi_u(\delta, \Sigma) = \pi_u(\delta) \times \pi_u(\Sigma) = \text{Cauchy}(\delta \mid S_{u,0}) \times |\Sigma|^{-\frac{p+1}{2}}.
\]

A diagonal prior scale matrix is set for δ, given by S_{u,0} = diag(s1², s2²), with s1² = s2² = 0.25. This prior implies that standardized effects of about 0.5 are considered likely under Hu. Under the constrained hypothesis Hc, the free parameters are the common standardized effect, say, δ = δ1 = δ2, and the error covariance matrix, Σ. We set a univariate Cauchy prior for δ with scale s1, truncated to δ > 0, and the Jeffreys prior for Σ, that is,
\[
\pi_c(\delta, \Sigma) = \pi_c(\delta) \times \pi_c(\Sigma) = 2 \times \text{Cauchy}(\delta \mid s_1) \times 1_{(\delta > 0)} \times |\Sigma|^{-\frac{p+1}{2}},
\]
where πc*(δ) = Cauchy(δ|s1) denotes the completed prior, and 2 serves as the normalizing constant for the completed prior because Prc*(δ > 0)^{-1} = 2. As δ has a similar interpretation as δ1 and δ2 under Hu, the prior scale is also set to s1 = 0.5.

By applying the following linear transformation to the standardized effects,
\[
(5)\qquad \theta = \begin{bmatrix} \theta_e \\ \theta_o \end{bmatrix} = \begin{bmatrix} \delta_1 - \delta_2 \\ \delta_2 \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \delta_1 \\ \delta_2 \end{bmatrix} = T\delta,
\]
the model can equivalently be written as yi ∼ N(LΣ T^{-1} θ, Σ), and the hypotheses can be written as
Hc: θe = 0, θo > 0
Hu: (θe, θo) ∈ ℝ².

Note here that θo corresponds to the common standardized effect δ under Hc. The prior for (θe, θo) under Hu follows a bivariate Cauchy distribution with scale matrix
\[
T S_{u,0} T' = \begin{bmatrix} 0.5 & -0.25 \\ -0.25 & 0.25 \end{bmatrix}.
\]

If one would test these hypotheses with the Savage–Dickey density ratio in (3), it is easy to show that the implied prior for δ under Hc (i.e., the conditional unconstrained prior for θo given θe = 0 under Hu) follows a Student's t-distribution with 2 degrees of freedom and a scale parameter of 0.25, thus assuming that standardized effects of about 0.25 are likely under Hc. As was discussed earlier, there is no logical reason why the common standardized effect under the restricted hypothesis Hc should be expected a priori to be smaller than the standardized effects under Hu.
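This implied conditional prior can be checked with a small simulation sketch (not part of the original code; conditioning on θe = 0 is approximated by a narrow window around zero):

library(mvtnorm)
set.seed(1)
Su0    <- diag(2) * 0.25                    # unconstrained prior scale matrix for delta
Tmat   <- matrix(c(1, 0, -1, 1), 2, 2)      # transformation in (5): theta = T delta
Stheta <- Tmat %*% Su0 %*% t(Tmat)          # prior scale matrix of (theta_e, theta_o)
draws  <- rmvt(1e6, sigma = Stheta, df = 1) # bivariate Cauchy = multivariate t with df = 1
cond   <- draws[abs(draws[, 1]) < 0.01, 2]  # theta_o draws with theta_e approximately 0
quantile(cond, 0.75)                        # approximately 0.25 * qt(0.75, df = 2) = 0.20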

The JZS Bayes factor for this constrained testing problem, using Lemma 1 and based on the actual Cauchy priors for the standardized effects, can be computed using MCMC output from a sampler under Hu, which is described in Appendix B. The R code for the computation is given in Appendix C.1. The four key quantities in (2) are computed as follows:

  • As the unconstrained marginal prior for θe follows a Cauchy distribution with scale √0.5 (Figure 1, left panel, dashed line), the prior density equals πu(θe = 0) = √2/π ≈ 0.45.

  • The estimated marginal posterior for θe under Hu follows from MCMC output. The estimated posterior for θe is plotted in Figure 1 (left panel, solid line). This yields π̂u(θe = 0|Y) = 0.9871618.

  • As the completed prior for δ under Hc* follows a Cauchy(0.5) distribution that is centered at zero, the prior probability equals Prc*(δ>0)=0.5.

  • As the priors for the covariance matrices cancel out in the ratio, the expected value can be written as E{Cauchy(θo|0.5) / Cauchy(θo|0.25) × 1{θo > 0}(θo)}, where the expectation is taken over the conditional posterior for θo given θe = 0 under Hu. Appendix B also shows how to obtain posterior draws of θo under Hu given θe = 0. The estimated posterior is displayed in Figure 1 (right panel). A Monte Carlo estimate can then be used to compute the expectation, which yields 1.098799.

Fig. 1 Estimated probability densities for the multivariate Student's t-test. Left panel: Marginal posterior (solid line) and prior (dashed line) for θe = δ1 − δ2. The dotted lines indicate the estimated density values at θe = 0. Right panel: Estimated conditional posterior for θo given θe = 0 under Hu.

Application of Lemma 1 then yields a Bayes factor for Hc against Hu of Bcu = 0.9871618 / (√2/π × 0.5) × 1.098799 ≈ 4.8. Thus, there is 4.8 times more evidence in the data for equal and positive standardized count differences than for the unconstrained alternative hypothesis. Assuming equal prior probabilities for Hc and Hu, this would yield posterior probabilities of Pr(Hc|Y) = 0.783 and Pr(Hu|Y) = 0.217. Thus, there is mild evidence for Hc relative to Hu. To draw clearer conclusions, more data would need to be collected.

3.2 Constrained Hypothesis Testing Under the Multinomial Model

When analyzing categorical data using a multinomial model, researchers are often interested in testing the relationships between the probabilities of the different cells (Robertson 1978; Klugkist, Laudy, and Hoijtink 2010; Heck and Davis-Stober 2019). As an example, we consider an experiment for testing the Mendelian inheritance theory discussed by Robertson (1978). A total of 556 peas coming from crosses of plants from round yellow seeds and plants from wrinkled green seeds were divided into four categories. The cell probabilities for these categories are contained in the vector γ = (γ1, γ2, γ3, γ4), where γ1 denotes the probability that a pea resulting from such a mating is round and yellow; γ2 denotes the probability that it is wrinkled and yellow; γ3 denotes the probability that it is round and green; and γ4 denotes the probability that it is wrinkled and green. The Mendelian theory states that γ1 is largest, followed by γ2 and γ3, which are assumed to be equal, and γ4 is expected to be smallest. This can be summarized as Hc: γ1 > γ2 = γ3 > γ4. In particular, the theory dictates that the four probabilities are proportional to 9, 3, 3, and 1, respectively. We translate this to a completed prior under Hc* such that its means satisfy E(γ1)/E(γ2) = E(γ2)/E(γ4) = 3. This can be achieved via a Dirichlet prior under an alternative parameterization, (ξ1, ξ2, ξ4) ∼ Dirichlet(αc1, αc2, αc3), with αc = (9, 6, 1). The cell probabilities under Hc* are then defined by (γ1, γ2, γ4) = (ξ1, ξ2/2, ξ4), which then follow a specific scaled Dirichlet distribution, which we denote by SDirichlet(9, 6, 1).2 The prior for the cell probabilities under Hc is then this scaled Dirichlet distribution truncated to the region γ1 > γ2 > γ4. The Mendelian hypothesis can equivalently be formulated on the transformed parameters (θe, θo,1, θo,2, ϕ) = (γ2 − γ3, γ1 − γ2, γ2 − γ4, γ2), so that Hc: θe = 0, (θo,1, θo,2) > 0, as in Lemma 1. It is easier, however, to compute the four quantities in (2) via the untransformed parameters γ, as will be shown below.
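A quick simulation sketch (not part of the original code) confirms that this completed prior has the intended mean ratios of 3:

library(MCMCpack)
set.seed(1)
xi <- rdirichlet(1e5, alpha = c(9, 6, 1))     # (xi_1, xi_2, xi_4) ~ Dirichlet(9, 6, 1)
gamma1 <- xi[, 1]; gamma2 <- xi[, 2] / 2; gamma4 <- xi[, 3]
c(mean(gamma1) / mean(gamma2), mean(gamma2) / mean(gamma4))   # both approximately 3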

The Mendelian hypothesis will be tested against an unconstrained alternative which does not make any assumptions about the relationships between the cell probabilities. A uniform prior on the simplex will be used under the alternative, that is, πu(γ1,γ2,γ3,γ4)=Dirichlet(1,1,1,1). The observed frequencies in the four respective categories were equal to 315, 101, 108, and 32.

The R code for the computation of the Bayes factor of Hc against Hu can be found in Appendix C.2.

  • The unconstrained marginal prior density at θe = 0 can be estimated from a sample of θe = γ2 − γ3, where γ is sampled from the unconstrained Dirichlet(1,1,1,1) prior, resulting in π̂u(θe = 0) = 1.476556.

  • Similarly, the unconstrained marginal posterior density at θe=0 can be obtained by sampling γ from the unconstrained Dirichlet(316,102,109,33) posterior, resulting in π̂u(θe=0|y)=13.71403.

  • The prior probability under Hc can be obtained by first sampling (ξ1, ξ2, ξ4) ∼ Dirichlet(9, 6, 1), then transforming the prior draws according to (γ1, γ2, γ4) = (ξ1, ξ2/2, ξ4), and taking the proportion of draws satisfying the order constraints, Prc*(γ1 > γ2 > γ4) ≈ S^{-1} Σ_{s=1}^{S} I(γ1^{(s)} > γ2^{(s)} > γ4^{(s)}) = 0.8949818, where γ^{(s)} denotes the sth draw, for s = 1, …, S.

  • To obtain draws from the conditional distribution of (γ1, γ2, γ3, γ4) given γ2 = γ3, when (γ1, γ2, γ3, γ4) ∼ Dirichlet(α1, α2, α3, α4) under Hu, we can sample transformed parameters (ξ1, ξ2, ξ4) ∼ Dirichlet(α1, α2 + α3 − 1, α4) and compute (γ1, γ2, γ3, γ4) = (ξ1, ξ2/2, ξ2/2, ξ4). This can be used to obtain draws from the conditional posterior of (γ1, γ2, γ3, γ4) given γ2 = γ3 under Hu by setting α = (316, 102, 109, 33). The expectation in (2) can then be computed as the arithmetic mean of SDirichlet((γ1, γ2, γ4) | α = (9, 6, 1)) / SDirichlet((γ1, γ2, γ4) | α = (1, 1, 1)) × I(γ1 > γ2 > γ4) over a sufficiently large sample. This yields an estimate of 10.50881.

In sum, the Bayes factor of the Mendelian hypothesis against the noninformative unconstrained alternative is equal to Bcu = 13.71403 / (1.476556 × 0.8949818) × 10.50881 = 109.0572. This can be interpreted as relatively strong evidence for the Mendelian hypothesis against an unconstrained alternative based on the observed data.

Finally, note that by using probability calculus it can be shown that the first two ingredients have analytic solutions, as the marginal probability density at θe = γ2 − γ3 = 0 under Hu, when γ ∼ Dirichlet(α), is equal to
\[
\frac{\Gamma(\alpha_2 + \alpha_3)\,(\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4 - 1)}{\Gamma(\alpha_2)\,\Gamma(\alpha_3)\,(\alpha_2 + \alpha_3 - 1)\, 2^{\alpha_2 + \alpha_3 - 1}}.
\]
In the above calculation, numerical estimates were used to give readers more insight into how to obtain these quantities when analytic expressions are unavailable.
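As a check (an illustration written for this purpose, not part of the original code), this analytic expression can be evaluated in R and compared with the kernel density estimates above:

# density of theta_e = gamma2 - gamma3 at 0 when gamma ~ Dirichlet(alpha)
dens_diff_at_zero <- function(alpha) {
  a2 <- alpha[2]; a3 <- alpha[3]; A <- sum(alpha)
  exp(lgamma(a2 + a3) + log(A - 1) - lgamma(a2) - lgamma(a3) -
      log(a2 + a3 - 1) - (a2 + a3 - 1) * log(2))
}
dens_diff_at_zero(c(1, 1, 1, 1))          # prior: 1.5 (estimated above as 1.477)
dens_diff_at_zero(c(316, 102, 109, 33))   # posterior: approximately 13.7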

4 Concluding Remarks

As Bayes factors are becoming increasingly popular for testing hypotheses with equality as well as order constraints on the parameters of interest, flexible and fast methods for computing these Bayes factors are needed. The generalization of the Savage–Dickey density ratio that was presented in this article is a useful contribution for this purpose. The expression allows one to compute Bayes factors in a straightforward manner from MCMC output while being able to freely specify the priors for the free parameters under the competing hypotheses. The applicability of the proposed methodology was illustrated in a constrained multivariate t-test using a novel extension of the JZS Bayes factor to the multivariate normal model, and in a constrained hypothesis test under the multinomial model.

Acknowledgments

The authors would like to thank Florian Böing-Messing for helpful discussions at an early stage of the article, and the editor and three anonymous reviewers for constructive feedback which improved the readability of the article.

References

Appendix A

Proof of Lemma 1

As the constrained model Hc: θe = re & θo > ro is nested in the unconstrained model Hu, the likelihood under Hc can be written as the truncation of the unconstrained likelihood, that is, pc(y | θo, ϕ) = pu(y | θe = re, θo, ϕ) 1{θo > ro}(θo). The result in Lemma 1 then follows via the following steps,
\begin{align*}
B_{cu} &= \frac{p_c(y)}{p_u(y)} = \frac{\iint_{\theta_o > r_o} p_c(y \mid \theta_o, \phi)\, \pi_c(\theta_o, \phi)\, d\theta_o\, d\phi}{\iiint p_u(y \mid \theta_e, \theta_o, \phi)\, \pi_u(\theta_e, \theta_o, \phi)\, d\theta_e\, d\theta_o\, d\phi} \\
&= \iint_{\theta_o > r_o} \frac{p_u(y \mid \theta_e = r_e, \theta_o, \phi)\, 1_{\{\theta_o > r_o\}}(\theta_o)\, \pi_c(\theta_o, \phi)}{p_u(y)\, \pi_u(\theta_e = r_e \mid y)}\, d\theta_o\, d\phi \times \pi_u(\theta_e = r_e \mid y) \\
&= \iint_{\theta_o > r_o} \frac{p_u(y \mid \theta_e = r_e, \theta_o, \phi)\, \pi_c(\theta_o, \phi)}{p_u(y)\, \pi_u(\theta_e = r_e, \theta_o, \phi \mid y)}\, \pi_u(\theta_o, \phi \mid y, \theta_e = r_e)\, d\theta_o\, d\phi \times \pi_u(\theta_e = r_e \mid y) \\
&= \iint_{\theta_o > r_o} \frac{\pi_c(\theta_o, \phi)}{\pi_u(\theta_e = r_e, \theta_o, \phi)}\, \pi_u(\theta_o, \phi \mid y, \theta_e = r_e)\, d\theta_o\, d\phi \times \pi_u(\theta_e = r_e \mid y) \\
&= \iint_{\theta_o > r_o} \frac{\pi_c(\theta_o, \phi)}{\pi_u(\theta_o, \phi \mid \theta_e = r_e)}\, \pi_u(\theta_o, \phi \mid y, \theta_e = r_e)\, d\theta_o\, d\phi \times \frac{\pi_u(\theta_e = r_e \mid y)}{\pi_u(\theta_e = r_e)} \\
&= \iint \frac{\pi_{c^*}(\theta_o, \phi)\, 1_{\{\theta_o > r_o\}}(\theta_o)}{\pi_u(\theta_o, \phi \mid \theta_e = r_e)\, \Pr_{c^*}(\theta_o > r_o)}\, \pi_u(\theta_o, \phi \mid y, \theta_e = r_e)\, d\theta_o\, d\phi \times \frac{\pi_u(\theta_e = r_e \mid y)}{\pi_u(\theta_e = r_e)} \\
&= \iint \frac{\pi_{c^*}(\theta_o, \phi)\, 1_{\{\theta_o > r_o\}}(\theta_o)}{\pi_u(\theta_o, \phi \mid \theta_e = r_e)}\, \pi_u(\theta_o, \phi \mid y, \theta_e = r_e)\, d\theta_o\, d\phi \times \Pr_{c^*}(\theta_o > r_o)^{-1} \times \frac{\pi_u(\theta_e = r_e \mid y)}{\pi_u(\theta_e = r_e)},
\end{align*}
which completes the proof. Note that in the third step the indicator function, 1{θo > ro}(θo), was omitted as the integrand is integrated over the subspace where θo > ro. In the second to last step, the completed version of the constrained hypothesis has the order constraints omitted, that is, Hc*: θe = re, with completed prior πc*(θo, ϕ), such that πc(θo, ϕ) = πc*(θo, ϕ) Prc*(θo > ro)^{-1} 1{θo > ro}(θo).

Appendix B

MCMC Sampler for the Multivariate Student’s t-test

  1. Drawing the standardized effects δ. It is well known that a multivariate Cauchy prior of p dimensions can be written as a multivariate normal distribution with an inverse Wishart mixing distribution on the normal covariance matrix with p degrees of freedom, that is,
\[
\pi_u(\delta) = \text{Cauchy}(\delta \mid S_0) = \int N(\delta \mid 0, \Phi) \times IW(\Phi \mid p, S_0)\, d\Phi.
\]
     (A small simulation check of this representation is sketched at the end of this appendix.)

     Thus, the conditional prior for δ given the auxiliary parameter matrix Φ follows a N(0, Φ) distribution. Consequently, as z_{Σ,i} = L_Σ^{-1} y_i ∼ N(δ, I_p), the conditional posterior of δ follows a multivariate normal distribution,
\[
\delta \mid \Phi, \Sigma, y \sim N\!\left(n(\Phi^{-1} + nI_p)^{-1} \bar{z}_\Sigma,\ (\Phi^{-1} + nI_p)^{-1}\right),
\]
     where z̄_Σ contains the sample means of z_{Σ,i}, for i = 1, …, n.

  2. Drawing the auxiliary covariance matrix Φ. The conditional posterior for Φ only depends on the standardized effects and follows an inverse Wishart distribution,
\[
\Phi \mid \delta \sim IW(p + 1,\ S_0 + \delta\delta').
\]

  3. Drawing the error covariance matrix Σ. The conditional posterior for the covariance matrix does not follow a known distribution. For this reason we use a random walk (e.g., Gelman et al. 2004) for sampling the separate elements of Σ.

The sampler under the unconstrained model while restricting δ1 = δ2 (= δ) is very similar, except that the prior for δ is now a univariate Cauchy(δ|0.25), Φ = [ϕ²] is a scalar, and thus the conditional posterior for δ is the univariate normal N(2n(ϕ^{-2} + 2n)^{-1} z̄_Σ, (ϕ^{-2} + 2n)^{-1}), where z̄_Σ is now the grand mean of the elements of z_{Σ,i}. Also note that the inverse Wishart distribution in Step 2 is now for a 1 × 1 covariance matrix, which is equivalent to an inverse gamma distribution.
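The scale-mixture representation used in Step 1 can be verified with a small simulation sketch (for the univariate case p = 1, with a hypothetical scale s0; not part of the original code):

set.seed(1)
s0 <- 0.5
# IW(df = 1, scale = s0^2) for a 1 x 1 matrix is IG(shape = 1/2, rate = s0^2 / 2)
Phi   <- 1 / rgamma(1e6, shape = 0.5, rate = s0^2 / 2)
delta <- rnorm(1e6, mean = 0, sd = sqrt(Phi))
quantile(delta, c(0.25, 0.75))   # compare with qcauchy(c(0.25, 0.75), scale = s0) = -0.5, 0.5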

Appendix C

R Code for Empirical Analyses

C.1 R Code for Multivariate t-Test in Section 3.1

library(mvtnorm)
library(Matrix)

# computing the unconstrained marginal prior density at theta_e = 0:
priorE <- dcauchy(0, location = 0, scale = sqrt(.5))

# computing the unconstrained marginal posterior density at theta_e = 0:
# read data
Y <- t(matrix(c(242,1708,569,569,270,757,-25,499,309,231,22,338,-42,26,-233,119,
                206,163,-106,-186,55,54,85,48,30,50,194,525,-87,-110,159,148,29,
                102,89,364,-9,36,158,234,76,122,15,24,3,36,93,71,160,44,66,128,
                180,155,237,85,105,76,16,6,167,364,-10,-18,-61,-21,-7,-2,15,32,
                160,188), nrow = 2))
set.seed(123)
# dimension
p <- ncol(Y)
nums <- p*(p + 1)/2
n <- nrow(Y)
# initial parameter values based on burn-in period
delta <- c(.5,.2)
Sigma <- matrix(c(2,2,2,11),2,2) * 10**4
L <- t(chol(Sigma))
Phi <- diag(p)
# selection of unique elements in Sigma
lowerSigma <- lower.tri(Sigma, diag = TRUE)
welklower <- which(lowerSigma)
# transformation matrix
Trans <- matrix(c(1,0,-1,1), ncol = 2)
# prior hyperparameters
S0 <- diag(p) * .5**2
# random walk sd's for the elements of Sigma to get an efficient
# acceptance probability, based on burn-in period
sdstep <- c(9,13,48) * 10**3
# store draws
numdraws <- 1e5
storeDelta <- matrix(0, nrow = numdraws, ncol = p)
storeSigma <- storePhi <- array(0, dim = c(numdraws, p, p))
# draws from stationary distribution
for(s in 1:numdraws){
  # draw delta (Appendix B, Step 1); deltaMean are the sample means of z_i = L^{-1} y_i
  deltaMean <- c(apply(Y %*% t(solve(L)), 2, mean))
  SigmaDelta <- solve(n*diag(p) + solve(Phi))
  muDelta <- c(n * SigmaDelta %*% deltaMean)
  delta <- c(rmvnorm(1, mean = muDelta, sigma = SigmaDelta))
  # draw Phi (Appendix B, Step 2)
  Phi <- solve(rWishart(1, df = p + 1, Sigma = solve(S0 + delta %*% t(delta)))[,,1])
  # draw Sigma using MH (Appendix B, Step 3)
  for(sig in 1:nums){
    welknu <- welklower[sig]
    step1 <- rnorm(1, sd = sdstep[sig])
    Sigma0 <- matrix(0, p, p)
    Sigma0[lowerSigma] <- Sigma[lowerSigma]
    Sigma0[welknu] <- Sigma0[welknu] + step1
    Sigma_can <- Sigma0 + t(Sigma0) - diag(diag(Sigma0))
    if(min(eigen(Sigma_can)$values) > .000001){
      # the candidate is positive definite
      L_can <- t(chol(Sigma_can))
      # acceptance probability
      R_MH <- exp(sum(dmvnorm(Y, mean = c(L_can %*% delta), sigma = Sigma_can, log = TRUE)) -
                  (p + 1)/2*log(det(Sigma_can)) -
                  sum(dmvnorm(Y, mean = c(L %*% delta), sigma = Sigma, log = TRUE)) +
                  (p + 1)/2*log(det(Sigma)))
      if(runif(1) < R_MH){
        # accept draw
        Sigma <- Sigma_can
        L <- t(chol(Sigma))
      }
    }
  }
  storeDelta[s,] <- delta
  storeSigma[s,,] <- Sigma
  storePhi[s,,] <- Phi
}
drawsE <- storeDelta[,1] - storeDelta[,2]
denspost <- density(drawsE)
df <- approxfun(denspost)
postE <- df(0)
# Figure 1 (left panel)
plot(denspost, xlim = c(-3,3), main = "", xlab = "theta_e")
seq1 <- seq(-3, 3, length = 1e3)
lines(seq1, dcauchy(seq1, scale = sqrt(.5)), lty = 2)

# computing the prior probability of theta_o > 0 under H_c:
priorO <- 1 - pcauchy(0, location = 0, scale = .5)

# computing the expectation of the ratio of the priors from a
# posterior sample under H_c given theta_e = 0
# initialization
set.seed(123)
p1 <- 1
p <- ncol(Y)
nums <- p*(p + 1)/2
n <- nrow(Y)
S0 <- diag(1) * .25**2
# initial parameter values based on burn-in period
delta <- .55
Phi <- matrix(1)
Sigma <- matrix(c(23,22,22,89), nrow = 2) * 10**3
L <- t(chol(Sigma))
# random walk sd's for the elements of Sigma to get an efficient
# acceptance probability, based on burn-in period
sdstep1 <- c(10,15,48) * 10**3
lowerSigma <- lower.tri(Sigma, diag = TRUE)
welklower <- which(lowerSigma)
# store draws
numdraws <- 1e5
storeDelta1 <- matrix(0, nrow = numdraws, ncol = 1)
storePhi1 <- array(0, dim = c(numdraws, p1, p1))
storeSigma1 <- array(0, dim = c(numdraws, p, p))
for(s in 1:numdraws){
  # draw delta; deltaMean is the grand mean of the elements of z_i = L^{-1} y_i
  deltaMean <- mean(c(apply(Y %*% t(solve(L)), 2, mean)))
  SigmaDelta <- solve(2*n*diag(p1) + solve(Phi))
  muDelta <- c(2*n * SigmaDelta %*% deltaMean)
  delta <- c(rmvnorm(1, mean = muDelta, sigma = SigmaDelta))
  # draw Phi
  Phi <- solve(rWishart(1, df = p1 + 1, Sigma = solve(S0 + delta %*% t(delta)))[,,1])
  # draw Sigma using MH
  deltavec <- rep(delta, 2)
  for(sig in 1:nums){
    welknu <- welklower[sig]
    step1 <- rnorm(1, sd = sdstep1[sig])
    Sigma0 <- matrix(0, p, p)
    Sigma0[lowerSigma] <- Sigma0[lowerSigma] + Sigma[lowerSigma]
    Sigma0[welknu] <- Sigma0[welknu] + step1
    Sigma_can <- Sigma0 + t(Sigma0) - diag(diag(Sigma0))
    if(min(eigen(Sigma_can)$values) > .000001){
      # the candidate is positive definite
      L_can <- t(chol(Sigma_can))
      # this could be done faster via independent univariate normals
      R_MH <- exp(sum(dmvnorm(Y, mean = c(L_can %*% deltavec), sigma = Sigma_can, log = TRUE)) -
                  (p + 1)/2*log(det(Sigma_can)) -
                  sum(dmvnorm(Y, mean = c(L %*% deltavec), sigma = Sigma, log = TRUE)) +
                  (p + 1)/2*log(det(Sigma)))
      if(runif(1) < R_MH){
        # accept draw
        Sigma <- Sigma_can
        L <- t(chol(Sigma))
      }
    }
  }
  storeDelta1[s,] <- delta
  storePhi1[s,,] <- Phi
  storeSigma1[s,,] <- Sigma
}
expratio <- mean(dcauchy(c(storeDelta1), scale = .5) / dcauchy(c(storeDelta1), scale = .25) *
                 (c(storeDelta1) > 0))
# Figure 1 (right panel)
plot(density(c(storeDelta1)), main = "", xlab = "theta_o")

# computation of the Bayes factor
Bcu <- postE/(priorE * priorO) * expratio

C.2 R Code for Multinomial Model in Section 3.2

library(MCMCpack)
set.seed(123)

# computing the unconstrained marginal prior density at theta_e = 0:
uncpriorsample <- rdirichlet(n = 1e7, alpha = c(1,1,1,1))
densprior <- density(uncpriorsample[,2] - uncpriorsample[,3])
df <- approxfun(densprior)
priorE <- df(0)
remove(uncpriorsample)

# computing the unconstrained marginal posterior density at theta_e = 0:
uncpostsample <- rdirichlet(n = 1e7, alpha = c(1 + 315, 1 + 101, 1 + 108, 1 + 32))
denspost <- density(uncpostsample[,2] - uncpostsample[,3])
df <- approxfun(denspost)
postE <- df(0)
remove(uncpostsample)

# computing the prior probability of theta_o > 0 under H_c:
priorsample1 <- rdirichlet(n = 1e7, alpha = c(9,6,1))
priorsample1[,2] <- priorsample1[,2]/2
priorO <- mean(priorsample1[,1] > priorsample1[,2] & priorsample1[,2] > priorsample1[,3])
remove(priorsample1)

# computing the expectation of the ratio of priors:
# first define the probability density for (gamma1, gamma2)
SDirichlet <- function(gamma1, gamma2, alpha1, alpha2, alpha3){
  alphavec <- c(alpha1, alpha2, alpha3)
  B1 <- exp(sum(lgamma(alphavec)) - lgamma(sum(alphavec)))
  return( 2^alpha2/B1 * gamma1^(alpha1-1) * gamma2^(alpha2-1) * (1-gamma1-2*gamma2)^(alpha3-1) )
}
condpostsample <- rdirichlet(n = 1e7, alpha = c(316, 210, 33))
condpostsample[,2] <- condpostsample[,2]/2
expratio <- mean(SDirichlet(condpostsample[,1], condpostsample[,2], 9, 6, 1) /
                 SDirichlet(condpostsample[,1], condpostsample[,2], 1, 1, 1) *
                 (condpostsample[,1] > condpostsample[,2] & condpostsample[,2] > condpostsample[,3]))
remove(condpostsample)

# computing the Bayes factor of H_c against H_u:
Bcu <- postE/(priorE*priorO)*expratio

Funding

Mulder is supported by an ERC starting grant (758791).

Notes

1 The test can equivalently be formulated as a test of Hc: θ = 0 versus Hu: θ ≠ 0, as θ = 0 has zero probability under Hu when using a continuous prior for θ. The formulation Hu: θ ∈ ℝ is used, however, to make it explicit that the constrained hypothesis Hc is nested in the unconstrained hypothesis Hu.

2 This specific scaled Dirichlet distribution has probability density function
\[
\pi_{c^*}(\gamma_1, \gamma_2, \gamma_4) = \text{SDirichlet}(\alpha_{c1}, \alpha_{c2}, \alpha_{c3}) = \frac{2^{\alpha_{c2}}}{B(\alpha_{c1}, \alpha_{c2}, \alpha_{c3})}\, \gamma_1^{\alpha_{c1} - 1}\, \gamma_2^{\alpha_{c2} - 1}\, (1 - \gamma_1 - 2\gamma_2)^{\alpha_{c3} - 1},
\]
with γ4 = 1 − γ1 − 2γ2, where B(·) is the multivariate beta function.