Posterior propriety of an objective prior for generalized hierarchical normal linear models

Pages 309-326 | Received 23 Jan 2021, Accepted 31 Aug 2021, Published online: 30 Jul 2022

ABSTRACT

Bayesian hierarchical models have been widely used in modern statistical applications. To deal with data having complex structures, we propose a generalized hierarchical normal linear (GHNL) model which accommodates arbitrarily many levels, usual design matrices and 'vanilla' covariance matrices. Objective hyperpriors can be employed for the GHNL model to express ignorance or to match frequentist properties, yet the common objective Bayesian approaches are infeasible or fraught with danger in hierarchical modelling. To tackle this issue, [Berger, J., Sun, D., & Song, C. (2020b). An objective prior for hyperparameters in normal hierarchical models. Journal of Multivariate Analysis, 178, 104606. https://doi.org/10.1016/j.jmva.2020.104606] proposed a particular objective prior and investigated its properties comprehensively. Posterior propriety is important for the choice of priors to guarantee the convergence of MCMC samplers. James Berger conjectured that the resulting posterior is proper for a hierarchical normal model with arbitrarily many levels, but a rigorous proof was not given. In this paper, we complete this story and provide a user-friendly guide. The main contributions of this paper are a new technique for deriving an elaborate upper bound on the integrated likelihood and a unified approach to checking posterior propriety for linear models. An efficient Gibbs sampling method is also introduced and outperforms other sampling approaches considerably.

1. Introduction

Bayesian hierarchical models (or multilevel models) have been extensively used in modern applications, including education (Raudenbush & Bryk, Citation1986), psychology (Lindenberger & Pötter, Citation1998), clinical trials (Xia et al., Citation2011), economics (Shimotsu, Citation2010) and many other applied statistical fields. The fundamental idea of hierarchical modelling is to think of the lowest-level units (smallest and most numerous) as organized into a hierarchy of successively higher-level units. For example, students are in classes, classes are in schools, schools are in districts, and districts are in states. Accordingly, hierarchical models are naturally applicable to survey, observational or experimental data with complicated nesting. However, the most commonly used and fully discussed hierarchical models have merely two levels. Goldstein (Citation2011) and Berger et al. (Citation2020b) defined 3-level hierarchical models and carried out statistical analysis for them. Hierarchical models with more levels have usually been avoided by researchers because of analytical difficulty and intractable computation. To the best of the authors' knowledge, a general hierarchical linear model with arbitrarily many levels seems never to have been defined or studied. In this paper, we introduce the definition of a generalized hierarchical normal linear (GHNL) model and carry out an in-depth theoretical investigation of Bayesian inference for GHNL models.

In order to implement a fully Bayesian analysis, priors are supposed to be specified on the hyperparameters (parameters at higher levels of the hierarchical model). Improper (objective) priors are often used to express ignorance or to match frequentist properties (see the review article, Consonni et al., Citation2018). When using improper priors, an important issue arises: whether the resulting posterior distributions are proper. As Hobert and Casella (Citation1996) stated, without proper precaution, misuse of improper priors, sometimes unknowingly, will result in practical difficulties, such as the non-convergence of the Gibbs sampler. The enormous practical importance of posterior propriety motivates us to explore it in the framework of GHNL modelling. There is also a vast modern literature investigating the posterior propriety of improper priors applied to a large variety of models, such as Sun et al. (Citation2001), Speckman and Sun (Citation2003), Berger et al. (Citation2005) and Michalak and Morris (Citation2016).

A great deal of effort has been devoted to the development of objective hyperpriors in hierarchical modelling, such as Daniels and Kass (Citation1999), Everson and Morris (Citation2000), Gelman (Citation2006), Gustafson et al. (Citation2006), Berger et al. (Citation2005) and Berger et al. (Citation2020b). Formal objective Bayesian approaches, like the Jeffreys-rule prior or the reference prior, are only feasible for simple hierarchical settings. For instance, the exact Jeffreys-rule prior for covariance matrices at higher levels depends on the parameters from lower levels of the model, leading to considerable difficulties in formulation and computation. Therefore, a common way is to use less formal approaches, such as applying formal objective priors from non-hierarchical models to hierarchical modelling. Unfortunately, the non-hierarchical Jeffreys-rule prior and reference prior typically yield improper posteriors in hierarchical settings (cf. Berger et al., Citation2005). Those who recognize this problem often use constant priors instead for higher-level variance components. However, the constant prior is so diffuse that it requires twice as many observations as logically needed to achieve posterior propriety (cf. Berger et al., Citation2005 and Berger et al., Citation2020b). In other words, the extra observations required are wasted on correcting the over-diffuse tail of the constant prior. The most powerful tool known for detecting over-diffuse hyperpriors is to examine the frequentist notion of admissibility of the resulting estimators (see Berger et al., Citation2005 for discussions and references). Sensible choices of objective hyperpriors are on the boundary of admissibility, being as diffuse as possible without leading to inadmissible estimators.

Berger et al. (Citation2005) studied the propriety and admissibility of a number of hyperpriors, but no overall conclusion was reached as to a specific prior to recommend. The reasons are as follows: (a) the admissibility of the leading candidate prior could not be proved; (b) the proposed computation methods were only efficient for relatively low-dimensional covariance matrices and remained quite challenging for the candidate priors; (c) the hierarchical model discussed had merely two levels, and the results do not adapt to a general hierarchical model with many levels. To address this issue, Berger et al. (Citation2020b) recommended a particular objective prior for use in all normal hierarchical models. Consider the following canonical form of the 2-level hierarchical normal model. Suppose that, independently, $y_i \sim N_k(\theta_i, I_k)$ and $\theta_i \sim N_k(\beta, V)$ for $i=1,\ldots,m$, where $N_k(\cdot,\cdot)$ denotes the $k$-dimensional normal distribution, the $y_i$ are $k\times 1$ observation vectors, the $\theta_i$ are the $k\times 1$ unobserved mean vectors, $\beta$ is a $k\times 1$ 'hypermean' vector, and $V\in\mathbb{R}^{k\times k}$ is an unknown 'hypercovariance' matrix. Berger et al. (Citation2020b) proposed a particular combination of independent priors on the hyperparameters $\beta$ and $V$ as
$$\pi(\beta)\propto\frac{1}{(1+\|\beta\|^2)^{(k-1)/2}},\qquad \pi(V)\propto\frac{1}{|V|^{1-1/(2k)}\prod_{1\le s<t\le k}(v_s-v_t)},\tag{1}$$
where $v_1>v_2>\cdots>v_k>0$ are the ordered eigenvalues of $V$. The recommendation (Equation1) for hyperpriors was justified by Berger et al. (Citation2020b) from the aspects of admissibility, ease of computation and performance. Most importantly, prior (Equation1) is adapted to being used at any level in hierarchical modelling, which is not true for the other proposed objective priors mentioned previously.

Since it is hazardous to skip demonstrating propriety at the risk of making inference from an improper posterior, Berger et al. (Citation2020b) showed the posterior propriety of a 3-level hierarchical model using prior (Equation1), while assuming square design matrices for a technical reason. Berger et al. (Citation2020b) also conjectured that the posterior is proper when the recommended prior is utilized at all levels of a hierarchical normal model with arbitrarily many levels; however, a rigorous proof could not be provided. In this paper, we complete this story and prove the posterior propriety for GHNL models in general situations. Besides, as pointed out in Michalak and Morris (Citation2016), researchers have found it daunting and time-consuming to inspect posterior propriety when using improper priors, except in the simplest models. For this reason, we supply user-friendly guidance for checking posterior propriety to practitioners in different practical situations.

In Section 2, we give an explicit definition of the GHNL model, which accommodates arbitrarily many levels and usual design matrices. It is important to note that we consider the 'vanilla' covariance matrix problem herein: we do not assume any special structure or sparsity for the hypercovariance matrices. The association between the GHNL model and a linear mixed-effect model is also discussed. In Section 3, we demonstrate that the recommended prior yields a proper posterior in the framework of GHNL modelling. In addition, we provide guidance for checking posterior propriety. An efficient MCMC algorithm for sampling from the posterior is introduced in Section 4. Section 5 provides some concluding remarks and further generalizations.

2. Generalized hierarchical normal linear model

In this section, we introduce the definition of a GHNL model with $(r+1)$ levels, where $r\ge1$. The association between the GHNL model and a linear mixed-effect model is also demonstrated, which brings insight into the GHNL model. Finally, the recommended prior on the hyperparameters of the GHNL model is presented and discussed. First, we introduce some notation to be used in the main body of this paper.

Notations Let $[k]=\{1,2,\ldots,k\}$ for a positive integer $k$; $1\{\cdot\}$ stands for the indicator function; $N_k(\mu,\Sigma)$ represents the $k$-dimensional normal distribution with mean $\mu$ and covariance $\Sigma$, and also denotes a $k$-dimensional normal random variable with mean $\mu$ and covariance $\Sigma$; for a symmetric matrix $A$, $A>0$ ($A<0$) means that $A$ is a positive (negative) definite matrix, and $A\ge0$ ($A\le0$) denotes that $A$ is a non-negative (non-positive) definite matrix.

2.1. Model structure

Berger et al. (Citation2020b) proposed a 3-level hierarchical model of the form
$$\begin{cases}\text{Level 1: } y_i=\theta_i+N_k(0,I_k), & i\in[m];\\ \text{Level 2: } \theta_i=Z_i\beta+N_k(0,V), & \beta=(\beta_1^\top,\ldots,\beta_s^\top)^\top;\\ \text{Level 3: } \beta_j=\eta+N_p(0,W), & j\in[s],\end{cases}\tag{2}$$
where the $y_i$ are $k\times1$ observation vectors, the $\theta_i$ are the $k\times1$ unobserved mean vectors, $\eta$ is a $p\times1$ 'hypermean' vector, $V\in\mathbb{R}^{k\times k}$ and $W\in\mathbb{R}^{p\times p}$ are unknown 'hypercovariance' matrices, and the $Z_i$ are $k\times sp$ known matrices. All the normal random variables in model (Equation2) are mutually independent. Based on the 3-level hierarchical normal model, a more general hierarchical model with $(r+1)$ levels ($r\ge1$) can be constructed as
$$\begin{cases}\text{Level 1: } y_{i_0}=Z_{0i_0}\theta_1+N_{k_0}(0,I_{k_0}), & i_0\in[m_0],\ \theta_1=(\theta_{11}^\top,\ldots,\theta_{1m_1}^\top)^\top;\\ \text{Level 2: } \theta_{1i_1}=Z_{1i_1}\theta_2+N_{k_1}(0,V_1), & i_1\in[m_1],\ \theta_2=(\theta_{21}^\top,\ldots,\theta_{2m_2}^\top)^\top;\\ \qquad\vdots\\ \text{Level } r\text{: } \theta_{r-1,i_{r-1}}=Z_{r-1,i_{r-1}}\theta_r+N_{k_{r-1}}(0,V_{r-1}), & i_{r-1}\in[m_{r-1}],\ \theta_r=(\theta_{r1}^\top,\ldots,\theta_{rm_r}^\top)^\top;\\ \text{Level } r+1\text{: } \theta_{ri_r}=Z_{ri_r}\eta+N_{k_r}(0,V_r), & i_r\in[m_r].\end{cases}\tag{3}$$
First, all the normal random variables in the above model are mutually independent. Within model (Equation3), the output of level $(j+1)$ consists of $m_j$ units whose values are $k_j\times1$ vectors, $j=0,1,\ldots,r$. By stacking the output units of level $(j+1)$ on top of one another, we obtain the outcome vector of level $(j+1)$, namely $\theta_j$ for $j\in[r]$ and $y=(y_1^\top,\ldots,y_{m_0}^\top)^\top$ for level 1. Then the $\theta_j$ are $(m_jk_j)\times1$ vectors and $y$ is an $(m_0k_0)\times1$ vector. In fact, only the outcome of the lowest level can be observed; the outcomes of higher levels are inaccessible, latent variables. Hence, the outcome variables of interest are always situated at the lowest level of the hierarchy. Different units in the same level share common input effects (an intercept can be included), which are exactly the outcome vectors from the upper level, except that the input effect of level $(r+1)$ is $\eta$, a $d\times1$ vector of fixed effects. In addition, units from the same level have the same variance component. The variance component within level $(j+1)$ is denoted by $V_j\in\mathbb{R}^{k_j\times k_j}$ for $j\in[r]$ and accounts for the magnitude of random variation within the corresponding level. The covariance matrices $V_j$ are unobserved for $j\in[r]$. The matrices $Z_{ji_j}$ are $k_j\times(m_{j+1}k_{j+1})$ matrices of observed covariates for unit $i_j$ in level $j+1$, where $j=0,1,\ldots,r$ and $i_j\in[m_j]$. It is natural to assume that there exist at least two units in each level and that the dimensions of all units and of $\eta$ are no less than 1; mathematically, $m_j\ge2$ and $k_j\ge1$ for $j=0,1,\ldots,r$, and $d\ge1$. Table 1 summarizes several important notations that mainly affect the results for posterior propriety in Section 3.

Table 1. Summary of certain important notations within model (Equation3) and j=0,1,,r.

The extensions from Berger et al. (Citation2020b)'s model (Equation2) to model (Equation3) are two-fold: model (Equation3) accommodates arbitrarily many levels and usual design matrices. Further define $Z_j=(Z_{j1}^\top,\ldots,Z_{jm_j}^\top)^\top$ for $j=0,1,\ldots,r$. Then the $Z_j$ are $(m_jk_j)\times(m_{j+1}k_{j+1})$ matrices for $j=0,1,\ldots,(r-1)$, $Z_r$ is an $(m_rk_r)\times d$ matrix, and an alternative representation of the $(r+1)$-level hierarchical model (Equation3) is thereby given by
$$\begin{cases}\text{Level 1: } (y\mid\theta_1)\sim N_{m_0k_0}(Z_0\theta_1,\ I_{m_0k_0});\\ \text{Level 2: } (\theta_1\mid\theta_2,V_1)\sim N_{m_1k_1}(Z_1\theta_2,\ I_{m_1}\otimes V_1);\\ \qquad\vdots\\ \text{Level } r\text{: } (\theta_{r-1}\mid\theta_r,V_{r-1})\sim N_{m_{r-1}k_{r-1}}(Z_{r-1}\theta_r,\ I_{m_{r-1}}\otimes V_{r-1});\\ \text{Level } r+1\text{: } (\theta_r\mid\eta,V_r)\sim N_{m_rk_r}(Z_r\eta,\ I_{m_r}\otimes V_r).\end{cases}\tag{4}$$
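To make the recursive structure of model (Equation4) concrete, the following sketch simulates one observation vector from a small GHNL model. It is only an illustration of the definitions above; the toy dimensions, the design matrices and the name `simulate_ghnl` are our own choices, not part of the model.

```python
import numpy as np

def simulate_ghnl(Z, V, eta, rng):
    """Draw one observation vector y from the (r+1)-level GHNL model (4).

    Z   : list [Z_0, ..., Z_r] of stacked design matrices; Z_j has shape
          (m_j*k_j, m_{j+1}*k_{j+1}) and Z_r has shape (m_r*k_r, d).
    V   : list [V_1, ..., V_r]; V_j is the k_j x k_j within-level covariance.
    eta : fixed-effect vector of length d.
    """
    r = len(V)
    theta = eta
    # Work from level r+1 down to level 2, drawing theta_r, ..., theta_1.
    for j in range(r, 0, -1):
        mean = Z[j] @ theta
        m_j = Z[j].shape[0] // V[j - 1].shape[0]       # number of units at level j+1
        cov = np.kron(np.eye(m_j), V[j - 1])           # I_{m_j} (x) V_j
        theta = rng.multivariate_normal(mean, cov)
    # Level 1: y = Z_0 theta_1 + N(0, I_{m_0 k_0}).
    return Z[0] @ theta + rng.standard_normal(Z[0].shape[0])

rng = np.random.default_rng(0)
# Toy configuration: r = 2, k_0 = k_1 = k_2 = 1, m_0 = 8, m_1 = 4, m_2 = 2, d = 1.
Z0 = np.kron(np.eye(4), np.ones((2, 1)))   # 8 x 4
Z1 = np.kron(np.eye(2), np.ones((2, 1)))   # 4 x 2
Z2 = np.ones((2, 1))                       # 2 x 1
y = simulate_ghnl([Z0, Z1, Z2], [np.eye(1) * 0.5, np.eye(1) * 2.0], np.array([1.0]), rng)
```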

Remark 2.1

If we assume that the covariance matrix for the units of level 1 in model (Equation3) is a known positive definite matrix $\Sigma_0$ instead of the identity matrix, the two assumptions are actually equivalent via the reparameterization $y_{i_0}\mapsto\Sigma_0^{-1/2}y_{i_0}$ and $Z_{0i_0}\mapsto\Sigma_0^{-1/2}Z_{0i_0}$. Furthermore, for a technical reason, $\Sigma_0$ must be assumed known throughout this paper; this reason is explained in Section 5.

2.2. Connection with the linear mixed-effect model ( LMM )

The two-level hierarchical normal models are often referred to as LMMs in many places. As for the GHNL model (Equation4), let $\Theta=\{\theta_1,\ldots,\theta_r\}$ denote the set of unobserved outcome vectors and $V=\{V_1,\ldots,V_r\}$ represent the set of unknown covariance matrices. If we take the $\theta_j$ as intermediate variables, then marginalizing out over $\Theta$ yields
$$(y\mid\eta,V)\ \sim\ N_{m_0k_0}(X_r\eta,\ \Delta),\tag{5}$$
where
$$\Delta=I_{m_0k_0}+\sum_{t=1}^{r}X_{t-1}(I_{m_t}\otimes V_t)X_{t-1}^\top,\quad\text{and}\quad X_j=\prod_{s=0}^{j}Z_s,\ j=0,1,\ldots,r.\tag{6}$$
$\Delta$ is an $(m_0k_0)\times(m_0k_0)$ matrix and the $X_j$ are $(m_0k_0)\times(m_{j+1}k_{j+1})$ matrices for $j=0,1,\ldots,r$. Suppose that the $Z_j$ are of full column rank. Then by Sylvester's rank inequality the $X_j$ are also of full column rank, $j=0,1,\ldots,r$. In the rest of the paper, the $Z_j$ are assumed to be of full column rank for $j=0,1,\ldots,r$.

Consider a particular LMM of the form
$$y=X_r\eta+X_0\theta_1+\cdots+X_{r-1}\theta_r+\epsilon,\tag{7}$$
where $\eta$ is the fixed effect, the $\theta_j$ are random effects independently distributed as $N_{m_jk_j}(0,I_{m_j}\otimes V_j)$ for $j\in[r]$, and $\epsilon$ denotes the vector of random errors and is distributed as $N_{m_0k_0}(0,I_{m_0k_0})$. By integrating out the random effects, the marginal distribution of $y$ conditioning on $(\eta,V)$ is identical to the distribution (Equation5). In short, the GHNL model is equivalent to an LMM in the sense of the marginal distribution of the observations after integrating out the intermediate outcome vectors or random effects. The equivalence between GHNL models and LMMs can also be illustrated by an example of a mixed-effect ANOVA model.
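The marginal covariance in (Equation5)-(Equation6) can be assembled directly from the $Z_j$ and the $V_j$. The sketch below (helper names are ours) builds $X_j=Z_0\cdots Z_j$ and $\Delta$; under (Equation5), and equally under the LMM (Equation7), $y\sim N_{m_0k_0}(X_r\eta,\Delta)$.

```python
import numpy as np

def marginal_covariance(Z, V):
    """Return (Delta, X) for model (4): Delta as in (6), X = [X_0, ..., X_r]."""
    n = Z[0].shape[0]                        # n = m_0 * k_0
    X = [Z[0]]
    for Zj in Z[1:]:
        X.append(X[-1] @ Zj)                 # X_j = X_{j-1} Z_j = Z_0 Z_1 ... Z_j
    Delta = np.eye(n)
    for t, Vt in enumerate(V, start=1):      # t = 1, ..., r
        m_t = X[t - 1].shape[1] // Vt.shape[0]
        Delta += X[t - 1] @ np.kron(np.eye(m_t), Vt) @ X[t - 1].T
    return Delta, X

# Same toy configuration as in the simulation sketch above.
Z0 = np.kron(np.eye(4), np.ones((2, 1)))
Z1 = np.kron(np.eye(2), np.ones((2, 1)))
Z2 = np.ones((2, 1))
Delta, X = marginal_covariance([Z0, Z1, Z2], [np.eye(1) * 0.5, np.eye(1) * 2.0])
# y ~ N(X[-1] @ eta, Delta), which is also the marginal law of the LMM (7).
```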

Example 2.1

Mixed-effect ANOVA model

Suppose we observe the scores of $p$ courses for student $(ijk)$ as $y_{ijk}$, for $i=1,\ldots,s_1$, $j=1,\ldots,s_2$ and $k=1,\ldots,s_3$. The observed data lie within a hierarchy of three levels: student $(ijk)$ is nested within class $(ij)$, and class $(ij)$ is nested within school $i$. Thus, we have $s_1$ schools in total, each school has $s_2$ classes and each class has $s_3$ students. Consider a mixed-effect ANOVA model
$$y_{ijk}=\eta+\alpha_i^*+\beta_{ij}^*+\epsilon_{ijk},\tag{8}$$
where $y_{ijk}$, $\eta$, $\alpha_i^*$, $\beta_{ij}^*$ and $\epsilon_{ijk}$ are all $p\times1$ vectors for $i=1,\ldots,s_1$, $j=1,\ldots,s_2$ and $k=1,\ldots,s_3$; $\eta$ denotes the overall mean and is a fixed effect; $\alpha_i^*\sim N_p(0,V_\alpha)$ is the effect of school; $\beta_{ij}^*$ is distributed as $N_p(0,V_\beta)$ and represents the effect of class; the student-level independent random error is denoted by $\epsilon_{ijk}$ and has distribution $N_p(0,\Sigma_0)$, where $\Sigma_0$ is a known matrix. The $\alpha_i^*$, $\beta_{ij}^*$ and $\epsilon_{ijk}$ are independently distributed. Consequently, $V_\alpha$, $V_\beta$ and $\Sigma_0$ are the variance components describing the school-level, class-level and student-level variations, respectively. Due to the hierarchical structure of the observations, we can naturally build a hierarchical model as
$$y_{ijk}\sim N_p(\beta_{ij},\Sigma_0),\quad \beta_{ij}\sim N_p(\alpha_i,V_\beta)\quad\text{and}\quad \alpha_i\sim N_p(\eta,V_\alpha),\tag{9}$$
independently, for $i=1,\ldots,s_1$, $j=1,\ldots,s_2$ and $k=1,\ldots,s_3$, where $\alpha_i=\eta+\alpha_i^*$ and $\beta_{ij}=\alpha_i+\beta_{ij}^*$. Denote
$$Y_i=\begin{pmatrix}y_{i11}&\cdots&y_{is_21}\\ \vdots&&\vdots\\ y_{i1s_3}&\cdots&y_{is_2s_3}\end{pmatrix},\quad E_i=\begin{pmatrix}\epsilon_{i11}&\cdots&\epsilon_{is_21}\\ \vdots&&\vdots\\ \epsilon_{i1s_3}&\cdots&\epsilon_{is_2s_3}\end{pmatrix},\quad \beta_i^*=\begin{pmatrix}\beta_{i1}^*\\ \vdots\\ \beta_{is_2}^*\end{pmatrix}\quad\text{and}\quad \beta_i=\begin{pmatrix}\beta_{i1}\\ \vdots\\ \beta_{is_2}\end{pmatrix},$$
where $Y_i$ and $E_i$ are both $(s_3p)\times s_2$ matrices, and $\beta_i^*$ and $\beta_i$ are both $(s_2p)\times1$ vectors. Let $y_i=\mathrm{vec}(Y_i)$ and $\epsilon_i=\mathrm{vec}(E_i)$, where $\mathrm{vec}(A)$ denotes the column vector obtained by stacking the columns of the matrix $A$ on top of one another. Define
$$y=\begin{pmatrix}y_1\\ \vdots\\ y_{s_1}\end{pmatrix},\quad \epsilon=\begin{pmatrix}\epsilon_1\\ \vdots\\ \epsilon_{s_1}\end{pmatrix},\quad \beta^*=\begin{pmatrix}\beta_1^*\\ \vdots\\ \beta_{s_1}^*\end{pmatrix},\quad \beta=\begin{pmatrix}\beta_1\\ \vdots\\ \beta_{s_1}\end{pmatrix},\quad \alpha^*=\begin{pmatrix}\alpha_1^*\\ \vdots\\ \alpha_{s_1}^*\end{pmatrix},\quad \alpha=\begin{pmatrix}\alpha_1\\ \vdots\\ \alpha_{s_1}\end{pmatrix}.$$
Thus, $y$ and $\epsilon$ are $(m_0p)\times1$ vectors, $\beta^*$ and $\beta$ are $(m_1p)\times1$ vectors, and $\alpha^*$ and $\alpha$ are $(m_2p)\times1$ vectors, where $m_0=s_1s_2s_3$, $m_1=s_1s_2$ and $m_2=s_1$. Then the hierarchical normal model (Equation9) can be expressed as a GHNL model of the form
$$\begin{cases}\text{Level 1: } (y\mid\beta,\Sigma_0)\sim N_{m_0p}(Z_0\beta,\ I_{m_0}\otimes\Sigma_0);\\ \text{Level 2: } (\beta\mid\alpha,V_\beta)\sim N_{m_1p}(Z_1\alpha,\ I_{m_1}\otimes V_\beta);\\ \text{Level 3: } (\alpha\mid\eta,V_\alpha)\sim N_{m_2p}(Z_2\eta,\ I_{m_2}\otimes V_\alpha),\end{cases}\tag{10}$$
where
$$Z_0=\mathrm{diag}\{\underbrace{1_{s_3}\otimes I_p,\ldots,1_{s_3}\otimes I_p}_{s_1s_2}\},\quad Z_1=\mathrm{diag}\{\underbrace{1_{s_2}\otimes I_p,\ldots,1_{s_2}\otimes I_p}_{s_1}\},\quad Z_2=1_{s_1}\otimes I_p,$$
$1_q$ denotes the $q\times1$ vector with all elements equal to one, and $Z_0$, $Z_1$, $Z_2$ are $(m_0p)\times(m_1p)$, $(m_1p)\times(m_2p)$, $(m_2p)\times p$ matrices, respectively. Denote
$$X_0\equiv Z_0,\quad X_1\equiv\mathrm{diag}\{\underbrace{1_{s_2s_3}\otimes I_p,\ldots,1_{s_2s_3}\otimes I_p}_{s_1}\}=Z_0Z_1,\quad X_2\equiv 1_{s_1s_2s_3}\otimes I_p=Z_0Z_1Z_2,$$
and $X_0$, $X_1$, $X_2$ are $(m_0p)\times(m_1p)$, $(m_0p)\times(m_2p)$, $(m_0p)\times p$ matrices, respectively. Thus, model (Equation8) can be summarized as
$$y=X_2\eta+X_1\alpha^*+X_0\beta^*+\epsilon,\tag{11}$$
where $\alpha^*\sim N_{m_2p}(0,I_{m_2}\otimes V_\alpha)$, $\beta^*\sim N_{m_1p}(0,I_{m_1}\otimes V_\beta)$ and $\epsilon\sim N_{m_0p}(0,I_{m_0}\otimes\Sigma_0)$, independently. By integrating out $(\alpha^*,\beta^*)$ and $(\alpha,\beta)$, respectively, the marginal distributions of $y$ for model (Equation11) and model (Equation10) are identical and of the form
$$(y\mid\eta,V_\alpha,V_\beta,\Sigma_0)\ \sim\ N_{m_0p}(X_2\eta,\ \Omega),\qquad \Omega=I_{m_0}\otimes\Sigma_0+X_0(I_{m_1}\otimes V_\beta)X_0^\top+X_1(I_{m_2}\otimes V_\alpha)X_1^\top.$$
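Since the design matrices in (Equation10) are Kronecker products, they can be written down in a few lines. The sketch below (variable names are ours) constructs $Z_0$, $Z_1$, $Z_2$ for a small choice of $(s_1,s_2,s_3,p)$ and confirms numerically that $X_1=Z_0Z_1$ and $X_2=Z_0Z_1Z_2$ have the stated closed forms.

```python
import numpy as np

s1, s2, s3, p = 2, 3, 4, 2     # schools, classes per school, students per class, courses

Z0 = np.kron(np.eye(s1 * s2), np.kron(np.ones((s3, 1)), np.eye(p)))   # (m0 p) x (m1 p)
Z1 = np.kron(np.eye(s1),      np.kron(np.ones((s2, 1)), np.eye(p)))   # (m1 p) x (m2 p)
Z2 = np.kron(np.ones((s1, 1)), np.eye(p))                              # (m2 p) x p

X1 = np.kron(np.eye(s1), np.kron(np.ones((s2 * s3, 1)), np.eye(p)))   # stated closed form
X2 = np.kron(np.ones((s1 * s2 * s3, 1)), np.eye(p))

assert np.allclose(Z0 @ Z1, X1)          # X_1 = Z_0 Z_1
assert np.allclose(Z0 @ Z1 @ Z2, X2)     # X_2 = Z_0 Z_1 Z_2
```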

Example 2.1 provides a simple illustration of how the hierarchical model and the mixed-effect model can be constructed from nested data, and the equivalence between the two models is also presented. In Appendix 1, we define a special LMM, which is a special case of model (Equation7) with $m_j=1$ for all $j\in[r]$; the theoretical investigation of this special LMM is distinct from that of the GHNL model. This special LMM could be common in applications, and we provide some theoretical results for interested readers. We still focus on the GHNL model in the following sections.

2.3. Priors on the hyperparameters

In order to implement a fully Bayesian analysis, we should specify hyperpriors on the parameters $(\eta,V)$. Following the recommendation of Berger et al. (Citation2020b), we assume priors on $(\eta,V)$ of the form
$$\pi(\eta)\propto\frac{1}{(1+\|\eta\|^2)^{(d-1)/2}},\quad \eta\in\mathbb{R}^d,\tag{12}$$
$$\pi(V_j)\propto\frac{1}{|V_j|^{1-1/(2k_j)}\prod_{s<t}(\omega_{js}-\omega_{jt})},\quad V_j>0,\ j\in[r],\tag{13}$$
where $\omega_{j1}>\omega_{j2}>\cdots>\omega_{jk_j}>0$ are the decreasingly ordered eigenvalues of $V_j$, $j\in[r]$. Apart from prior (Equation12), common choices of prior on $\eta$ include the constant prior and the conjugate prior. None of the three priors results in improper posteriors or difficulties in computation. However, among the three, prior (Equation12) is preferable from the perspective of admissibility for all dimensions $d$. Besides, as noted in Berger et al. (Citation2005), prior (Equation12) is a mixture-of-normal prior with the hierarchical structure
$$(\eta\mid\lambda)\ \sim\ N_d(0,\lambda I_d)\quad\text{and}\quad [\lambda]\propto\lambda^{-1/2}\exp\Big(-\frac{1}{2\lambda}\Big),\tag{14}$$
and such mixture-of-normal priors have shown great success in shrinkage estimation particularly (cf. Fourdrinier et al., Citation1998) and robust Bayesian estimation generally (cf. Berger, Citation1980). Therefore, prior (Equation12) was actually recommended by Berger et al. (Citation2005) for default use.

As for prior (Equation13) on the unknown covariance matrices $V_j$, $j\in[r]$, consider the transformation from $V_j$ to $\Omega_j=\mathrm{diag}(\omega_{j1},\ldots,\omega_{jk_j})$ and the orthogonal matrix $\Gamma_j$ of corresponding eigenvectors; the Jacobian is
$$\Big|\frac{\partial V_j}{\partial(\Omega_j,\Gamma_j)}\Big|=\prod_{s<t}(\omega_{js}-\omega_{jt}).\tag{15}$$
Consequently, the prior (Equation13) on $V_j$ becomes the prior density of $(\Omega_j,\Gamma_j)$,
$$\pi(\Omega_j,\Gamma_j)\propto\frac{1}{|\Omega_j|^{1-1/(2k_j)}},\tag{16}$$
with respect to Lebesgue measure on $(\omega_{j1},\ldots,\omega_{jk_j})$ and the invariant Haar measure over the space $\{\Gamma:\Gamma^\top\Gamma=I_{k_j}\}$. Note that the prior on $\Omega_j$ is improper and, independently, the prior on $\Gamma_j$ is constant. Use of a uniform prior for $\Gamma_j$, ranging over a compact space, is natural and non-controversial and has no influence on the eigenvalues. The term $\prod_{s<t}(\omega_{js}-\omega_{jt})$ is eliminated after changing variables for prior (Equation13). In contrast, the commonly used priors on a covariance matrix, such as the inverse Wishart, Jeffreys-rule and constant priors, contain the term $\prod_{s<t}(\omega_{js}-\omega_{jt})$ in the transformed space. This term gives low mass to close eigenvalues and hence effectively forces the eigenvalues apart, contrary to the common intuition, which would suggest choosing a prior that pushes the eigenvalues closer together. As a result, prior (Equation13) is essentially neutral as to expansion or shrinkage of the eigenvalues.
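For numerical work it is convenient to evaluate prior (Equation13) on the log scale through the eigenvalues of $V_j$; a minimal sketch (the function name is ours) follows.

```python
import numpy as np

def log_prior_V(V):
    """Log of the recommended prior (13) on a covariance matrix, up to an additive constant:
    pi(V) is proportional to 1 / ( |V|^{1 - 1/(2k)} * prod_{s<t} (omega_s - omega_t) )."""
    k = V.shape[0]
    omega = np.sort(np.linalg.eigvalsh(V))[::-1]       # decreasingly ordered eigenvalues
    log_det = np.sum(np.log(omega))
    log_vandermonde = sum(np.log(omega[s] - omega[t])
                          for s in range(k) for t in range(s + 1, k))
    return -(1.0 - 1.0 / (2 * k)) * log_det - log_vandermonde

print(log_prior_V(np.array([[2.0, 0.3], [0.3, 1.0]])))
```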

In the context of the 2-level hierarchical normal model, Theorem 1 of Berger et al. (Citation2020b) demonstrates that the combination of priors (Equation12) and (Equation13) on $(\eta,V)$ is on the boundary of admissibility, being as diffuse as possible without yielding inadmissible estimators. Furthermore, it is shown that the generalization allowing covariates at all levels of the hierarchical model does not affect the admissibility result (cf. Berger et al., Citation2020b). Nonetheless, the admissibility of the recommended prior for the $(r+1)$-level hierarchical model with $r\ge2$ is not clear. Generally speaking, this is a very difficult question to answer, and we mainly justify the recommended hyperpriors from the angles of posterior propriety and computation in the framework of the GHNL model.

3. Posterior propriety

Berger et al. (Citation2020b) showed that the resulting posterior of the recommended prior is proper for the 3-level hierarchical model (Equation2), but under a narrow set of assumptions. They also conjectured posterior propriety for a hierarchical model with any number of levels, a rigorous proof of which was not given. In this section, we comprehensively investigate the conditions for posterior propriety of the GHNL model (Equation4) using the recommended prior in more general situations. The dimension of $\eta$ affects the investigation of posterior propriety considerably, and the two cases $d\ge2$ and $d=1$ are discussed separately.

Based on (Equation5) and (Equation14), by integrating out $\eta$, we obtain the marginal distribution of $y$ conditioning on $(V,\lambda)$ as
$$(y\mid V,\lambda)\sim N_{m_0k_0}(0,\ \Delta+\lambda X_rX_r^\top).\tag{17}$$
Posterior propriety of the GHNL model (Equation4) employing priors (Equation13) and (Equation14) amounts to
$$m(y)=\int f(y\mid V,\lambda)\,\pi(\lambda)\prod_{s=1}^{r}\pi(V_s)\,d\lambda\prod_{t=1}^{r}dV_t<\infty,\tag{18}$$
where $m(y)$ denotes the marginal density of the observation vector. Next, we display some definitions and additional notations which are frequently used in this section.

More Notations Let $\mathrm{card}(A)$ denote the cardinality of a set $A$. For $0<I_1,I_2$, write $I_1\asymp I_2$ if there exist constants $0<C_1\le C_2$ such that $C_1I_2\le I_1\le C_2I_2$. For a symmetric matrix $A\in\mathbb{R}^{n\times n}$, $\lambda_i(A)$ represents the $i$-th largest eigenvalue of $A$, namely $\lambda_1(A)\ge\cdots\ge\lambda_n(A)$; let $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$ denote the maximum and minimum eigenvalues of an arbitrary symmetric matrix $A$.

Definition

For convenience, let $\omega_{r+1,1}\equiv\lambda$, $m_{r+1}\equiv d$ and $k_{r+1}\equiv1$. Define, for an arbitrary non-empty set $E$, $F(E)\equiv\{D\mid D\subseteq E,\ D\neq\emptyset\}$, so that $F(E)$ denotes the set of all non-empty subsets of $E$. For any $R\in F([r+1])$, let $H(R)\equiv\{(j,l)\mid j\in R,\ l\in[k_j]\}$. Further define the composition of $F$ and $H$ as $S(R)\equiv(F\circ H)(R)=\{D\mid D\subseteq H(R),\ D\neq\emptyset\}$ for any $R\in F([r+1])$. Define
$$c_{jl,s}=1\{m_j(l-1)<s\le m_jl\},\tag{19}$$
for $j\in[r+1]$, $l\in[k_j]$ and $s\in[m_0k_0]$.

3.1. Two key lemmas

Before formally investigating posterior propriety, we first introduce two lemmas which are central to the proofs of the main theorems in this paper.

Lemma 3.1

Assume that the $A_j$ are $p_j\times p_j$ positive definite matrices, $j\in[r]$. Let the $X_j$ be $n\times p_j$ matrices of full column rank, $j\in[r]$. Define
$$H=\sum_{j=1}^{r}X_jA_jX_j^\top.\tag{20}$$
Then

  1. $\lambda_{\max}(H)\le C_1\sum_{j=1}^{r}\lambda_{\max}(A_j)$, where $C_1=\max_{j\in[r]}\lambda_{\max}(X_j^\top X_j)$.

  2. Also,
$$|H|\ge\Big(\frac{C_2}{r}\Big)^{n}\Big|\sum_{j=1}^{r}D_j\Big|,\tag{21}$$
where $C_2=\min_{j\in[r]}\lambda_{\min}(X_j^\top X_j)>0$. For any $j\in[r]$, $D_j=\mathrm{diag}(a_{j1},\ldots,a_{jn})$, where $a_{jk}=\lambda_k(A_j)$ for $k\in[p_j]$ and $a_{jk}=0$ for $p_j<k\le n$.

Lemma 3.1 mainly demonstrates two inequalities with respect to a sum of quadratic forms with matrix arguments. Since the $X_j$ have full column rank and the $A_j$ are positive definite, $n\ge p_j$ and $\mathrm{rank}(X_jA_jX_j^\top)=p_j$. It is worth noting that the non-zero diagonal elements of $D_j$ are the decreasingly ordered eigenvalues of $A_j$ in the lower bound of $|H|$, and this relation deeply influences the final result when we derive the sufficient condition for posterior propriety later. Besides, one can never find a constant $C>0$ such that $|H|_+\ge C\big|\sum_{j=1}^{r}D_j\big|_+$, where $|M|_+$ denotes the product of all non-zero eigenvalues of $M$. The proof of Lemma 3.1 can be found in Appendix A.2.

Lemma 3.2

Let $k$ be a positive integer. Suppose $M$ is a subset of $F([k])$ with cardinality $n$ and let $C_1,\ldots,C_n$ denote the elements of $M$. Define the integral
$$I=\int_{\Omega}\frac{1}{\big[\prod_{j=1}^{k}\lambda_j^{a_j}\big]\big[\prod_{i=1}^{n}\big(1+\sum_{r\in C_i}\lambda_r\big)^{b_i}\big]}\,d\lambda,$$
where the $a_j$, $j=1,2,\ldots,k$, and the $b_i$, $i=1,2,\ldots,n$, are real constants, and $\lambda=(\lambda_1,\ldots,\lambda_k)\in\Omega\equiv[0,\infty)^k$. Then the integral $I$ is finite if and only if (iff) the following two conditions are both satisfied.

  • $a_j<1$, $j\in[k]$;

  • the inequalities
$$\sum_{j\in D}a_j+\sum_{i\in G_D}b_i>\mathrm{card}(D)\tag{22}$$
hold for all $D\in F([k])$, where $G_D=\{i\mid D\cap C_i\neq\emptyset,\ i\in[n]\}$.

Here, the second condition of Lemma 3.2 may not be immediately transparent, so we use the following example to elaborate on how Lemma 3.2 can be employed.

Example 3.1

Consider the integral
$$I_0=\int_{\Omega}\frac{1}{\big[\prod_{j=1}^{3}\lambda_j^{a_j}\big](1+\lambda_1)^{b_1}(1+\lambda_1+\lambda_2)^{b_2}(1+\lambda_2+\lambda_3)^{b_3}}\,d\lambda_1\,d\lambda_2\,d\lambda_3,$$
where $a_1,a_2,a_3,b_1,b_2,b_3$ are all real constants. Following Lemma 3.2, we can take $M=\{\{1\},\{1,2\},\{2,3\}\}$. Then $I_0<\infty$ iff all the following inequalities hold: (a) $a_j<1$, $j\in[3]$; (b)
$$\begin{aligned}
&\text{for } D=\{1\}, && a_1+b_1+b_2>1;\\
&\text{for } D=\{2\}, && a_2+b_2+b_3>1;\\
&\text{for } D=\{3\}, && a_3+b_3>1;\\
&\text{for } D=\{1,2\}, && a_1+a_2+b_1+b_2+b_3>2;\\
&\text{for } D=\{1,3\}, && a_1+a_3+b_1+b_2+b_3>2;\\
&\text{for } D=\{2,3\}, && a_2+a_3+b_2+b_3>2;\\
&\text{for } D=\{1,2,3\}, && a_1+a_2+a_3+b_1+b_2+b_3>3.
\end{aligned}$$

Note that, no matter how $M$ is defined, we always need to check the inequalities corresponding to all $D\in F([k])$. Even though some inequalities turn out to be trivial once written down, for the sake of assurance we had better take all non-empty subsets of $[k]$ into account at the early stage. Lemma 3.2 plays a crucial role in obtaining the follow-up theorems; a detailed proof of Lemma 3.2 is given in Appendix A.2.
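The two conditions of Lemma 3.2 are mechanical to check by enumerating the non-empty subsets $D$; the sketch below (the function name is ours) does exactly that and can be applied to Example 3.1 with numeric exponents.

```python
from itertools import combinations

def lemma_3_2_finite(k, a, M, b):
    """Check the finiteness conditions of Lemma 3.2.

    k : number of lambda variables;  a : exponents a_1, ..., a_k;
    M : list of subsets C_1, ..., C_n of {1, ..., k};  b : exponents b_1, ..., b_n.
    """
    if any(aj >= 1 for aj in a):                        # first condition: a_j < 1
        return False
    for size in range(1, k + 1):                        # second condition: all non-empty D
        for D in combinations(range(1, k + 1), size):
            G_D = [i for i, Ci in enumerate(M) if set(D) & set(Ci)]
            if sum(a[j - 1] for j in D) + sum(b[i] for i in G_D) <= len(D):
                return False
    return True

# Example 3.1 with M = {{1}, {1,2}, {2,3}} and trial exponents:
print(lemma_3_2_finite(3, a=[0.5, 0.5, 0.5], M=[{1}, {1, 2}, {2, 3}], b=[1.0, 1.0, 1.0]))
```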

3.2. Conditions for the posterior to be proper when d2

In this subsection, the case $d\ge2$ is considered.

Theorem 3.1

Consider the GHNL model (Equation4) with priors (Equation12) and (Equation13) on $\eta$ and $V_i\in V$, respectively. When $d\ge2$, a sufficient condition for posterior propriety is given by
$$\sum_{s=1}^{m_0k_0}1\Big\{\sum_{(j,l)\in D}c_{jl,s}>0\Big\}>\sum_{(j,l)\in D}\frac{1}{k_j},\tag{23}$$
for any $D\in S([r+1])$.

Proof.

It follows from (Equation17) that the integrated likelihood of $(V,\lambda)$ after marginalizing out $(\theta_1,\ldots,\theta_r,\eta)$ is given by
$$L(V,\lambda;y)\propto\frac{1}{|\Delta+\lambda X_rX_r^\top|^{1/2}}\exp\Big\{-\frac{1}{2}y^\top(\Delta+\lambda X_rX_r^\top)^{-1}y\Big\}.\tag{24}$$
By dropping the exponential term involving $y$ (since it is less than one), we have
$$L(V,\lambda;y)<\frac{1}{|\Delta+\lambda X_rX_r^\top|^{1/2}}.$$
By applying Lemma 3.1 (b), we can further bound the integrated likelihood as
$$L(V,\lambda;y)\le\frac{C_1}{|M_1|^{1/2}},\tag{25}$$
where $C_1$ is a positive constant that depends only on the $X_j$ and
$$M_1=I_{m_0k_0}+\sum_{j=1}^{r+1}D_j\quad\text{and}\quad D_j=\begin{pmatrix}\Omega_j\otimes I_{m_j}&\\ &O_{q_j}\end{pmatrix}_{(m_0k_0)\times(m_0k_0)},\tag{26}$$
where the $\Omega_j$ are the diagonal matrices of the decreasingly ordered eigenvalues of $V_j$ for $j\in[r]$, $\Omega_{r+1}=(\omega_{r+1,1})$, the $O_{q_j}$ are $q_j\times q_j$ zero matrices and $q_j=m_0k_0-m_jk_j$ for $j\in[r+1]$. Since $\Omega_{r+1}=(\omega_{r+1,1})$ degenerates to the scalar $\lambda$, the prior on $\lambda$ becomes
$$\pi(\Omega_{r+1})\propto\frac{1}{|\Omega_{r+1}|^{1-1/(2k_{r+1})}}\exp\Big\{-\frac{1}{2}\mathrm{tr}(\Omega_{r+1}^{-1})\Big\}.\tag{27}$$
Combining (Equation16), (Equation18), (Equation27) and (Equation25), we have
$$m(y)\le C_1\int\frac{\exp\{-\mathrm{tr}(\Omega_{r+1}^{-1})/2\}\prod_{j=1}^{r+1}1\{\omega_{j1}>\cdots>\omega_{jk_j}>0\}}{|M_1|^{1/2}\prod_{j=1}^{r+1}|\Omega_j|^{1-\frac{1}{2k_j}}}\Big[\prod_{j=1}^{r}d\Omega_j\,d\Gamma_j\Big]d\Omega_{r+1}<C_1\int\frac{1}{|M_1|^{1/2}\prod_{j=1}^{r+1}|\Omega_j|^{1-\frac{1}{2k_j}}}\prod_{j=1}^{r+1}d\Omega_j\equiv C_1I_0.$$
The definition of $c_{jl,s}$ in (Equation19) yields
$$|M_1|=\prod_{s=1}^{m_0k_0}\Big(1+\sum_{j=1}^{r+1}\sum_{l=1}^{k_j}c_{jl,s}\omega_{jl}\Big).$$
Therefore,
$$I_0=\int\frac{\prod_{j=1}^{r+1}d\Omega_j}{\prod_{j=1}^{r+1}|\Omega_j|^{1-\frac{1}{2k_j}}\prod_{s=1}^{m_0k_0}\Big(1+\sum_{j=1}^{r+1}\sum_{l=1}^{k_j}c_{jl,s}\omega_{jl}\Big)^{\frac{1}{2}}},$$
which is finite iff
$$\sum_{(j,l)\in D}\Big(1-\frac{1}{2k_j}\Big)+\frac{1}{2}\sum_{s=1}^{m_0k_0}1\Big\{\sum_{(j,l)\in D}c_{jl,s}>0\Big\}>\mathrm{card}(D)\tag{28}$$
for any $D\in S([r+1])$, by Lemma 3.2. It is obvious that inequality (Equation28) is equivalent to that in (Equation23).

Applying Lemma 3.1 yields the upper bound $C_1|M_1|^{-1/2}$ on the integrated likelihood of the hyperparameters, where $M_1$ is the $(m_0k_0)\times(m_0k_0)$ matrix defined in (Equation26). Therefore, the special notation $c_{jl,s}$ can be understood as the indicator of whether the eigenvalue $\omega_{jl}$ appears in the $s$-th diagonal element of $M_1$, for $j\in[r+1]$, $l\in[k_j]$ and $s\in[m_0k_0]$. At the same time, the left-hand side of inequality (Equation23) is the cardinality of the set $\{s\mid\exists(j,l)\in D\text{ such that }c_{jl,s}>0\}$ for any $D\in S([r+1])$.

The cardinality of $S([r+1])$ is $2^{\sum_{j\in[r+1]}k_j}-1$, which means that the total number of inequalities to be checked in (Equation23) grows exponentially with $r$ and the dimensions $k_j$. Admittedly, this imposes a considerably heavy computational burden if Theorem 3.1 is applied directly. Nevertheless, researchers need not worry about this burden, because most of the inequalities in Theorem 3.1 are trivial. To make this point precise, we have the following corollary.

Corollary 3.1

Recursively define
$$R_1=\{j\mid m_j\le r+1,\ j\in[r+1]\};\quad R_2=\{j\mid m_j\le\mathrm{card}(R_1),\ j\in R_1\};\quad\ldots;\quad R_p=\{j\mid m_j\le\mathrm{card}(R_{p-1}),\ j\in R_{p-1}\},$$
where $p$ is the smallest positive integer $i$ such that $\{j\mid m_j>\mathrm{card}(R_i),\ j\in R_i\}=\emptyset$. We call the levels within $R_p$ kernel levels and denote $R_p$ by $R_{\ker}$. Inequality (Equation23) holds for any $D\in S([r+1])$ iff inequality (Equation23) holds for any $D\in S(R_{\ker})$. Consequently, if $R_{\ker}=\emptyset$, then the posterior is always proper.

Proof.

Let $R_1^c=[r+1]\setminus R_1$; thus $m_j>r+1$ for $j\in R_1^c$ by the definition of $R_1$. For any $D\in S([r+1])$, if there exists $j\in R_1^c$ such that $(j,l)\in D$ for some $l\in[k_j]$, then
$$\sum_{s=1}^{m_0k_0}1\Big\{\sum_{(j,l)\in D}c_{jl,s}>0\Big\}\ge\max_{(j,l)\in D}m_j>r+1=\max_{D\in S([r+1])}\sum_{(j,l)\in D}\frac{1}{k_j},$$
so that the corresponding inequality holds trivially. As a result, inequality (Equation23) holds for any $D\in S([r+1])$ iff inequality (Equation23) holds for any $D\in S(R_1)$. Since
$$\max_{D\in S(R_i)}\sum_{(j,l)\in D}\frac{1}{k_j}=\mathrm{card}(R_i),$$
it can be shown recursively that inequality (Equation23) holds for any $D\in S([r+1])$ iff inequality (Equation23) holds for any $D\in S(R_i)$ for $i\in[p]$, where $p$ is the smallest positive integer $i$ such that $R_i\setminus R_{i+1}=\emptyset$.

By the technique of extracting kernel levels, we dramatically narrow down the checking region for posterior propriety: we only need to check the inequalities for the levels within $R_{\ker}$, which reduces the number of inequalities from $2^{\sum_{j\in[r+1]}k_j}-1$ to $2^{\sum_{j\in R_{\ker}}k_j}-1$ (a code sketch of the recursion is given after the list below). Moreover, Corollary 3.1 also indicates two interesting conclusions, as follows.

  1. First, it reveals how three factors, the number of levels, the number of units in each level and the dimensions of the levels, jointly affect posterior propriety. Roughly speaking, in the context of the GHNL model, the more levels with fewer units of lower dimension, the less likely the posterior is to be proper. For example, if $m_{r-2}=m_{r-1}=m_r=2$ and $k_{r-2}=k_{r-1}=k_r=1$, then for $D=\{(j,1)\mid j=r-2,r-1,r\}$ we have
$$2=\sum_{s=1}^{m_0k_0}1\Big\{\sum_{(j,l)\in D}c_{jl,s}>0\Big\}<\sum_{(j,l)\in D}\frac{1}{k_j}=3.\tag{29}$$
Therefore, posterior propriety can hardly be guaranteed by Theorem 3.1. Conversely, if the units in each level are numerous enough that the set of kernel levels is empty, i.e., $R_{\ker}=\emptyset$, then the posterior is always proper. As a consequence, more attention should be focused on the levels with small numbers of units, namely the kernel levels.

  2. Second, the recommendation of this prior for use at any level in hierarchical modelling is further justified from the aspects of posterior propriety and ease of implementation. For instance, suppose we switch the prior on $V_j$, $j\in[r]$, from (Equation13) to
$$\pi(V_j)\propto\frac{1}{|V_j|^{1-a}\prod_{1\le s<t\le k_j}(\omega_{js}-\omega_{jt})},\quad V_j>0,\ j\in[r],\tag{30}$$
where $0<a\le1$ ($a$ has to be larger than zero by Lemma 3.2). Then the condition in Theorem 3.1 becomes
$$\sum_{s=1}^{m_0k_0}1\Big\{\sum_{(j,l)\in D}c_{jl,s}>0\Big\}>2a\times\mathrm{card}(D),\quad D\in S([r+1]).\tag{31}$$
On the one hand, when $a$ is greater than but close to zero (denoted $a\downarrow0$), all the inequalities in (Equation31) hold, so the posterior is always proper. However, it is impractical to decide how small $a$ should be and to find one fixed value that fits all levels. On the other hand, when $a\uparrow1$, meaning that $a$ is less than but close to 1, then $2a\,\mathrm{card}(D)>\sum_{(j,l)\in D}\frac{1}{k_j}$, so inequality (Equation31) is harder to satisfy than inequality (Equation23), especially for large dimensions $k_j$. Therefore, the posterior using prior (Equation30) is less likely to be proper than that using prior (Equation13). Similar to Corollary 3.1, we can recursively define
$$R_1=\{j\mid m_j\le2a\times\mathrm{card}(E_0),\ j\in[r+1]\};\quad R_2=\{j\mid m_j\le2a\times\mathrm{card}(E_1),\ j\in R_1\};\quad\ldots;\quad R_p=\{j\mid m_j\le2a\times\mathrm{card}(E_{p-1}),\ j\in R_{p-1}\},$$
where $E_0=H([r+1])$, $E_i=H(R_i)$ for $i\in[p]$, and $p$ is the smallest positive integer $l$ such that $\{j\mid m_j>2a\,\mathrm{card}(E_l),\ j\in R_l\}=\emptyset$. Then the posterior using prior (Equation30) is proper if (Equation31) holds for any $D\in S(R_p)$. When $a\uparrow1$, $\mathrm{card}(R_p)$ will be remarkably larger than $\mathrm{card}(R_{\ker})$ for large values of the $k_j$, imposing a dramatically heavier burden of checking inequalities than that for prior (Equation13). Above all, one sensible choice is to let $a$ be inversely proportional to $k_j$ at level $j$, which takes both practical and theoretical considerations into account.
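Returning to Corollary 3.1, the recursion defining $R_{\ker}$ is straightforward to mechanize. The sketch below (the function name is ours) extracts the kernel levels from the unit counts $m_1,\ldots,m_{r+1}$, with $m_{r+1}=d$.

```python
def kernel_levels(m):
    """Kernel levels R_ker of Corollary 3.1; m maps level j in {1, ..., r+1} to m_j."""
    R = {j for j, mj in m.items() if mj <= len(m)}      # R_1 = {j : m_j <= r+1}
    while True:
        R_next = {j for j in R if m[j] <= len(R)}       # R_{i+1} = {j in R_i : m_j <= card(R_i)}
        if R_next == R:                                  # recursion has stabilized
            return R
        R = R_next

# School example of Section 2.2 with s1 = 2 schools, 2 classes, 2 students, p = 2 courses:
# m_1 = s1*s2 = 4, m_2 = s1 = 2, m_3 = d = p = 2.
print(kernel_levels({1: 4, 2: 2, 3: 2}))    # {2, 3}: levels 2 and 3 are the kernel levels
```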

The upper bound $\mathrm{card}(R_i)$ on $\sum_{(j,l)\in D}\frac{1}{k_j}$ for $D\in S(R_i)$ leads to the effective way of extracting kernel levels presented in Corollary 3.1, but this bound is still rather rough. Next, a sharper upper bound on $\sum_{(j,l)\in D}\frac{1}{k_j}$ is demonstrated, and a sufficient condition of clean form for posterior propriety is then derived.

Theorem 3.2

Consider the GHNL model (Equation4) with priors (Equation12) and (Equation13) on $\eta$ and $V_i\in V$, respectively. Denote $m=\min_{j\in[r+1]}m_j=\min_{j\in R_{\ker}}m_j$. When $d\ge2$, the posterior is always proper if
$$\sum_{j\in R_{\ker}}\frac{1}{k_j}<m.\tag{32}$$

Proof.

For $D\in S([r+1])$, define
$$L(D)=\sum_{s=1}^{m_0k_0}1\Big\{\sum_{(j,l)\in D}c_{jl,s}>0\Big\}\quad\text{and}\quad R(D)=\{j\mid(j,l)\in D\}.\tag{33}$$
It follows from Corollary 3.1 that we only need to prove $\sum_{(j,l)\in D}\frac{1}{k_j}<L(D)$ for all $D\in S(R_{\ker})$. For any $D$ belonging to $S(R_{\ker})$, we have
$$\sum_{(j,l)\in D}\frac{1}{k_j}=\sum_{j\in R(D)}\frac{1}{k_j}\sum_{l=1}^{k_j}1\{(j,l)\in D\}\le\Big(\sum_{j\in R(D)}\frac{1}{k_j}\Big)\max_{j\in R(D)}\sum_{l=1}^{k_j}1\{(j,l)\in D\}.\tag{34}$$
Distinct eigenvalues from the same level never occur in the same row of the matrix $M_1$; mathematically, $c_{jl_1,s}c_{jl_2,s}=0$ for $1\le l_1<l_2\le k_j$, $j\in[r]$, $s\in[m_0k_0]$. Thus, for any $j\in[r]$,
$$\sum_{l=1}^{k_j}1\{(j,l)\in D\}\le\frac{1}{m_j}L(D),\tag{35}$$
which is also true for $j=r+1$ since $k_{r+1}=1$. It is obvious that $\min_{j\in[r+1]}m_j=\min_{j\in R_{\ker}}m_j$ by the definition of $R_{\ker}$, and this common value is denoted by $m$. Combining (Equation35) with (Equation34) yields
$$\sum_{(j,l)\in D}\frac{1}{k_j}\le\Big(\sum_{j\in R(D)}\frac{1}{k_j}\Big)\frac{L(D)}{\min_{j\in R(D)}m_j}\le\frac{1}{m}\Big(\sum_{j\in R_{\ker}}\frac{1}{k_j}\Big)L(D)<L(D).\tag{36}$$

From the proof of Theorem 3.2, it can be deduced that $\sum_{j\in[r+1]}\frac{1}{k_j}<m$ is also sufficient for posterior propriety; obviously, the condition in Theorem 3.2 is easier to satisfy. Theorem 3.2 reveals that, for fixed $m$, the posterior is more likely to be proper for higher dimensions of the units in the kernel levels. Theorem 3.2 also provides researchers with a powerful tool to check posterior propriety quickly.

Remark 3.1

Consider model (Equation4) with $r=1$, namely a two-level hierarchical model. When $d\ge2$, we have $m=\min\{d,m_1\}\ge2$. Then the posterior using the recommended prior is always proper for $k_1\ge2$ by Theorem 3.2.

Example 3.2

Continue with Example 2.1

Consider the GHNL modelling of the mixed-effect ANOVA as in (Equation10), which is a 3-level hierarchical model with $r=2$, $m_0=s_1s_2s_3$, $m_1=s_1s_2$, $m_2=s_1$, $m_3=p$, $k_0=k_1=k_2=p$ and $k_3=1$. It is natural to assume that we have at least two schools, each school has at least two classes and each class has at least two students, namely $s_1\ge2$, $s_2\ge2$ and $s_3\ge2$. If (a) $p>2$ and $s_1\ge2$ or (b) $p\ge2$ and $s_1>2$ holds, it can readily be derived that the set of kernel levels is empty; thus, the posterior is always proper according to Corollary 3.1. When $s_1=p=2$, the set of kernel levels is $R_2=\{2,3\}$; since $\frac{1}{k_2}+\frac{1}{k_3}<2$, the posterior is proper by applying Theorem 3.2. In conclusion, the posterior using the recommended prior is always proper when $p\ge2$.

In Berger et al. (Citation2020b)'s work, for a technical reason, they assumed $k=sp$ for the 3-level hierarchical normal model (Equation2), so that the design matrices for units within levels 2 and 3 are square matrices. They eventually concluded that the posterior employing the recommended prior in model (Equation2) is always proper for $k=sp$ and $p\ge2$ ($p$ being the dimension of the hypermean in model (Equation2)). However, the assumption that the design matrices for the units at higher levels are square appears unnatural and hard to interpret in practice. Nevertheless, we generalize Berger et al. (Citation2020b)'s result to the GHNL model below so as to draw a conclusion consistent with theirs.

Corollary 3.2

Consider the GHNL model (Equation4) with priors (Equation12) and (Equation13) on $\eta$ and $V_i\in V$, respectively. Assume that $d\ge2$ and $k_j=m_{j+1}k_{j+1}$, $j\in[r]$. Then the posterior is always proper.

Proof.

Since $m_j\ge2$, $j\in[r+1]$, we have $m\ge2$. By Theorem 3.2 it remains to show that $\sum_{j\in R_{\ker}}\frac{1}{k_j}<2$. Utilizing the condition $k_j=m_{j+1}k_{j+1}$ for $j\in[r]$ and $k_{r+1}=1$, we have $k_j=\prod_{s=j+1}^{r+1}m_s\ge2^{r+1-j}$ for $j\in[r]$. Thus,
$$\sum_{j\in R_{\ker}}\frac{1}{k_j}\le\sum_{j\in[r+1]}\frac{1}{k_j}\le\sum_{j\in[r+1]}\frac{1}{2^{r+1-j}}=2-\frac{1}{2^r}<2,$$
which completes the proof.

In summary, a general procedure for checking the posterior propriety of the GHNL model (Equation4) employing the recommended prior for $d\ge2$ can be summarized as follows.

Guidance for checking posterior propriety when $d\ge2$ (a code sketch implementing steps (b) and (c) follows the list):

  1. If the design matrices for each unit in each level are square matrices, then the posterior is proper; otherwise, turn to (b).

  2. Derive the set of kernel levels, $R_{\ker}$. If $R_{\ker}=\emptyset$ or inequality (Equation32) holds, the posterior is proper. If neither holds, turn to (c).

  3. Check inequality (Equation23) for all $D$ belonging to $S(R_{\ker})$. If it always holds, the posterior is proper. If not, posterior propriety can hardly be guaranteed.
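Steps (b) and (c) of the guidance can be combined into a single routine. The sketch below (function names are ours) re-implements the kernel-level recursion, tests inequality (Equation32) first, and otherwise enumerates condition (Equation23) over $S(R_{\ker})$ using the indicators $c_{jl,s}$ of (Equation19).

```python
from fractions import Fraction
from itertools import chain, combinations

def kernel_levels(m):
    """Kernel levels R_ker of Corollary 3.1; m maps level j in {1, ..., r+1} to m_j."""
    R = {j for j, mj in m.items() if mj <= len(m)}
    while True:
        R_next = {j for j in R if m[j] <= len(R)}
        if R_next == R:
            return R
        R = R_next

def posterior_proper_d_ge_2(m, k):
    """Guidance for d >= 2.  m, k are dicts over levels j = 0, 1, ..., r+1,
    with m[r+1] = d and k[r+1] = 1.  Returns True when propriety is guaranteed."""
    levels = [j for j in m if j >= 1]
    R_ker = kernel_levels({j: m[j] for j in levels})            # step (b)
    if not R_ker:
        return True
    if sum(Fraction(1, k[j]) for j in R_ker) < min(m[j] for j in levels):   # inequality (32)
        return True
    n = m[0] * k[0]
    H = [(j, l) for j in R_ker for l in range(1, k[j] + 1)]
    subsets = chain.from_iterable(combinations(H, sz) for sz in range(1, len(H) + 1))
    for D in subsets:                                            # step (c): inequality (23)
        lhs = sum(1 for s in range(1, n + 1)
                  if any(m[j] * (l - 1) < s <= m[j] * l for (j, l) in D))
        if lhs <= sum(Fraction(1, k[j]) for (j, l) in D):
            return False
    return True

# The s1 = p = 2 case of the school example (s2 = s3 = 2): proper, via inequality (32).
print(posterior_proper_d_ge_2(m={0: 8, 1: 4, 2: 2, 3: 2}, k={0: 2, 1: 2, 2: 2, 3: 1}))
```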

3.3. Conditions for the posterior to be proper when d = 1

It is quite common in practice that the dimension of the fixed effect $\eta$ is one. However, when $d=1$, note that
$$1=m_{r+1}=\sum_{s=1}^{m_0k_0}1\Big\{\sum_{(j,l)\in D}c_{jl,s}>0\Big\}=\sum_{(j,l)\in D}\frac{1}{k_j}=\frac{1}{k_{r+1}}=1$$
for $D=\{(r+1,1)\}$, so the sufficient condition in Theorem 3.1 fails. Therefore, in this subsection we reinvestigate the conditions for the posterior to be proper when $d=1$.

Theorem 3.3

Consider the GHNL model (Equation4) with the constant prior on $\eta$ and prior (Equation13) on $V_j\in V$. When $d=1$, the posterior is proper if
$$\sum_{s=1}^{m_0k_0}1\Big\{\sum_{(j,l)\in D}c_{jl,s}>0\Big\}>1\{\exists\,j\in[r],\ (j,1)\in D\}+\sum_{(j,l)\in D}\frac{1}{k_j}\tag{37}$$
holds for all $D\in S([r])$.

Proof.

When $d=1$, the vector $\eta$ degenerates into a scalar $\eta$, and the prior (Equation12) on $\eta$ becomes a constant prior. Based on (Equation5), integrating out $\eta$ and dropping the exponential term (which is less than one) gives the upper bound on the integrated likelihood of $V$:
$$L(V)<\frac{1}{|\Delta|^{\frac12}|X_r^\top\Delta^{-1}X_r|^{\frac12}}.$$
Since $|X_r^\top\Delta^{-1}X_r|\ge X_r^\top X_r\,\lambda_{\min}(\Delta^{-1})=X_r^\top X_r\,\lambda_{\max}(\Delta)^{-1}$, using Lemma 3.1 (a) and (b), we have
$$L(V)<\frac{C_0}{|\Delta|^{\frac12}}\Big(1+\sum_{j=1}^{r}\omega_{j1}\Big)^{\frac12}\le C_1\Big(1+\sum_{j=1}^{r}\omega_{j1}\Big)^{\frac12}|M_2|^{-\frac12},$$
where $C_0$ and $C_1$ are constants independent of the $\Omega_j$ for $j\in[r]$, $M_2=I_{m_0k_0}+\sum_{j=1}^{r}D_j$ and the $D_j$ are defined in (Equation26). Similar to the proof of Theorem 3.1, we can derive the upper bound on $m(y)$ as
$$m(y)\le C_1\int\frac{\big(1+\sum_{j=1}^{r}\omega_{j1}\big)^{\frac12}}{|M_2|^{1/2}\prod_{j=1}^{r}|\Omega_j|^{1-\frac{1}{2k_j}}}\prod_{j=1}^{r}d\Omega_j\equiv C_1I_1.$$
It follows from the definition of $c_{jl,s}$ in (Equation19) that
$$|M_2|=\prod_{s=1}^{m_0k_0}\Big(1+\sum_{j=1}^{r}\sum_{l=1}^{k_j}c_{jl,s}\omega_{jl}\Big).$$
Thus,
$$I_1=\int\frac{\big(1+\sum_{j=1}^{r}\omega_{j1}\big)^{\frac12}\prod_{j=1}^{r}d\Omega_j}{\prod_{j=1}^{r}|\Omega_j|^{1-\frac{1}{2k_j}}\prod_{s=1}^{m_0k_0}\Big(1+\sum_{j=1}^{r}\sum_{l=1}^{k_j}c_{jl,s}\omega_{jl}\Big)^{\frac12}},$$
which is finite iff
$$\sum_{(j,l)\in D}\Big(1-\frac{1}{2k_j}\Big)+\frac{1}{2}\sum_{s=1}^{m_0k_0}1\Big\{\sum_{(j,l)\in D}c_{jl,s}>0\Big\}-\frac{1}{2}1\{\exists\,j\in[r],\ (j,1)\in D\}>\mathrm{card}(D)\tag{38}$$
for any $D\in S([r])$, by Lemma 3.2. It is obvious that inequality (Equation38) is equivalent to that in (Equation37).

Resembling the interpretation of Theorem 3.1, the left-hand side of inequality (Equation37) is the cardinality of the set $\{s\mid\exists(j,l)\in D\text{ such that }c_{jl,s}>0\}$ for any $D\in S([r])$. To reduce the burden of checking inequalities, we have the following corollary.

Corollary 3.3

Recursively define
$$\tilde R_1=\{j\mid m_j\le r+1,\ j\in[r]\};\quad \tilde R_2=\{j\mid m_j\le\mathrm{card}(\tilde R_1)+1,\ j\in\tilde R_1\};\quad\ldots;\quad \tilde R_q=\{j\mid m_j\le\mathrm{card}(\tilde R_{q-1})+1,\ j\in\tilde R_{q-1}\},$$
where $q$ is the smallest positive integer $i$ such that $\{j\mid m_j>\mathrm{card}(\tilde R_i)+1,\ j\in\tilde R_i\}=\emptyset$. We call the levels within $\tilde R_q$ kernel levels and denote $\tilde R_q$ by $\tilde R_{\ker}$. Inequality (Equation37) holds for any $D\in S([r])$ iff inequality (Equation37) holds for any $D\in S(\tilde R_{\ker})$. Consequently, if $\tilde R_{\ker}=\emptyset$, then the resulting posterior is always proper.

In the process of extracting kernel levels, the thresholds on $m_j$ used to split up the levels are increased by one in Corollary 3.3 compared with Corollary 3.1, because the upper bound on the right-hand side of inequality (Equation37) is one larger than that of inequality (Equation23). Apart from this point, the proof of Corollary 3.3 is the same as that of Corollary 3.1. For good measure, a simple tool to check posterior propriety is given next, the counterpart of Theorem 3.2 for $d=1$.

Theorem 3.4

Consider the GHNL model (Equation4) with the constant prior on $\eta$ and prior (Equation13) on $V_j\in V$. When $d=1$, the posterior is always proper if
$$\sum_{j\in\tilde R_{\ker}}\frac{1}{k_j}<m-1,\tag{39}$$
where $m=\min_{j\in[r]}m_j$ and $\tilde R_{\ker}$ is the derived set of kernel levels.

Proof.

For any $D\in S([r])$, define $L(D)$ and $R(D)$ in the same way as in (Equation33). According to Corollary 3.3, it suffices to show that
$$L(D)>\sum_{(j,l)\in D}\frac{1}{k_j}+1$$
for any $D\in S(\tilde R_{\ker})$. Similar to (Equation36), we have
$$\sum_{(j,l)\in D}\frac{1}{k_j}\le\frac{1}{m}\Big(\sum_{j\in R(D)}\frac{1}{k_j}\Big)L(D)\le\frac{1}{m}\Big(\sum_{j\in\tilde R_{\ker}}\frac{1}{k_j}\Big)L(D)<L(D)-\frac{L(D)}{m},$$
for any $D\in S(\tilde R_{\ker})$. Since $L(D)\ge m$ always holds, the proof is complete.

Remark 3.2

In model (Equation4), when $r=1$ and $d=1$, suppose $m_1\ge2$ and $k_1\ge2$. The posterior using the recommended prior is always proper by Theorem 3.4.

Remark 3.3

If all the $k_j$ are equal to one, the sufficient condition in Theorem 3.3 simplifies to
$$\mathrm{card}(D)<\max_{j\in D}m_j-1,$$
where $D\in F([r])$. By employing the technique of extracting kernel levels, Theorem 3.4 is then equivalent to Theorem 3.3, rather than being a mere sufficient condition.

Example 3.3

Continue with Example 2.1

Consider model (Equation10), assuming that $s_1\ge2$, $s_2\ge2$ and $s_3\ge2$. If $p=1$, we have $k_0=k_1=k_2=k_3=m_3=1$. When $s_1>2$, the set of kernel levels is easily seen to be empty; thus the posterior is always proper by Corollary 3.3. If $s_1=2$, inequality (Equation37) fails, and posterior propriety can hardly be guaranteed. Consequently, when $p=1$, the posterior using the recommended prior is always proper for $s_1>2$.

Next, we generalize Berger et al. (Citation2020b)'s result to the GHNL models for $d=1$, assuming that the design matrices $Z_{ji_j}$ for the units are square matrices.

Corollary 3.4

When $d=1$, consider the same model and prior as in Theorem 3.3. Suppose $m_j\ge3$ and $k_j=m_{j+1}k_{j+1}$, $j\in[r]$. Then the posterior is always proper.

Proof.

It follows from Theorem 3.4 that we only need to show $\sum_{j\in[r]}\frac{1}{k_j}<m-1$. By the conditions $k_j=m_{j+1}k_{j+1}$, $j\in[r]$, and $m_{r+1}=k_{r+1}=1$, we have $k_j=\prod_{s=j+1}^{r}m_s$, $j\in[r]$. Thus,
$$\sum_{j\in[r]}\frac{1}{k_j}\le\sum_{j\in[r]}\frac{1}{3^{r-j}}=\frac{3}{2}\Big(1-\frac{1}{3^r}\Big)<\frac{3}{2}<2\le m-1.$$

Summing up the theoretical results above, a general procedure for checking the posterior propriety of the GHNL model (Equation4) employing the recommended prior for $d=1$ is as follows.

Guidance for checking the posterior propriety when d = 1:

  1. If the design matrices for each unit in each level are square matrices and $m_j\ge3$, $j\in[r]$, then the posterior is proper; otherwise, turn to (b).

  2. Derive the set of kernel levels $\tilde R_{\ker}$. If $\tilde R_{\ker}=\emptyset$ or inequality (Equation39) holds, then the posterior is proper. If neither holds, turn to (c).

  3. Check inequality (Equation37) for all $D$ belonging to $S(\tilde R_{\ker})$. If it always holds, then the posterior is proper. If not, posterior propriety can hardly be guaranteed.

4. Computation

In this section, we consider MCMC sampling from the posterior arising from the model in Section 2. For the GHNL model (Equation4) with priors (Equation14) and (Equation13) on $\eta$ and $V$, respectively, the joint posterior of $(\Theta,V,\eta,\lambda)$ can be written as
$$\pi(\Theta,V,\eta,\lambda\mid y)\propto f(y\mid\theta_1)\prod_{j=1}^{r-1}f(\theta_j\mid\theta_{j+1},V_j)\,f(\theta_r\mid\eta,V_r)\times\prod_{s=1}^{r}\pi(V_s)\,\pi(\eta\mid\lambda)\,\pi(\lambda).\tag{40}$$
Sampling $(\Theta,V,\eta,\lambda)$ from the posterior density (Equation40) can be handled by a Gibbs sampler. The main computational difficulty is to sample the covariance matrices $V_j$ efficiently.

4.1. Gibbs sampling for input effects

The full conditionals of the input effects (Θ,η) can be derived from the joint posterior (Equation40) and are illustrated as follows.

  1. Conditioning on $\theta_2$ and $V_1$, the posterior distribution of $\theta_1$ is
$$(\theta_1\mid\theta_2,V_1;y)\sim N_{m_1k_1}(\tilde\theta_1,\tilde V_1),\tag{41}$$
where $\tilde V_1=(Z_0^\top\Sigma^{-1}Z_0+I_{m_1}\otimes V_1^{-1})^{-1}$ and $\tilde\theta_1=\tilde V_1[Z_0^\top\Sigma^{-1}y+(I_{m_1}\otimes V_1^{-1})Z_1\theta_2]$.

  2. The full conditional posteriors of $\theta_j$, $j=2,\ldots,r$, have the form
$$(\theta_j\mid\theta_{j-1},\theta_{j+1},V_{j-1},V_j)\sim N_{m_jk_j}(\tilde\theta_j,\tilde V_j),\tag{42}$$
where $\tilde V_j=[Z_{j-1}^\top(I_{m_{j-1}}\otimes V_{j-1}^{-1})Z_{j-1}+I_{m_j}\otimes V_j^{-1}]^{-1}$, $\tilde\theta_j=\tilde V_j[Z_{j-1}^\top(I_{m_{j-1}}\otimes V_{j-1}^{-1})\theta_{j-1}+(I_{m_j}\otimes V_j^{-1})Z_j\theta_{j+1}]$, and $\theta_{r+1}\equiv\eta$.

  3. Using (Equation14), the full conditional of $\eta$ can be derived as
$$(\eta\mid\theta_r,\lambda,V_r)\sim N_d(\tilde\eta,\tilde V_\eta),\tag{43}$$
where $\tilde V_\eta=[Z_r^\top(I_{m_r}\otimes V_r^{-1})Z_r+\lambda^{-1}I_d]^{-1}$ and $\tilde\eta=\tilde V_\eta Z_r^\top(I_{m_r}\otimes V_r^{-1})\theta_r$.

The input effects $\theta_j\in\Theta$ and $\eta$ can be readily sampled from these conditionals during the Gibbs sampling procedure, as their full conditional posterior distributions are all standard distributions.
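A direct transcription of the conditionals (Equation41)-(Equation42) is straightforward in numpy; the sketch below (function and argument names are ours) performs one draw of $\theta_j$, with $\theta_{j-1}$ replaced by $y$ and $V_{j-1}$ by $\Sigma$ when $j=1$, and $\theta_{j+1}$ replaced by $\eta$ when $j=r$.

```python
import numpy as np

def draw_theta_j(theta_prev, theta_next, V_prev, V_j, Z_prev, Z_j, rng):
    """One Gibbs draw of theta_j from its full conditional (42).

    theta_prev : theta_{j-1} (or y when j = 1);  theta_next : theta_{j+1} (or eta when j = r);
    V_prev     : V_{j-1} (or Sigma when j = 1);  V_j : V_j;
    Z_prev     : Z_{j-1};                        Z_j : Z_j.
    """
    m_prev = Z_prev.shape[0] // V_prev.shape[0]
    m_j = Z_prev.shape[1] // V_j.shape[0]
    A = np.kron(np.eye(m_prev), np.linalg.inv(V_prev))     # I_{m_{j-1}} (x) V_{j-1}^{-1}
    B = np.kron(np.eye(m_j), np.linalg.inv(V_j))            # I_{m_j}     (x) V_j^{-1}
    cov = np.linalg.inv(Z_prev.T @ A @ Z_prev + B)           # tilde V_j
    mean = cov @ (Z_prev.T @ A @ theta_prev + B @ Z_j @ theta_next)
    return rng.multivariate_normal(mean, cov)
```

The draw of $\eta$ in (Equation43) has the same form, with $Z_r$, $I_{m_r}\otimes V_r^{-1}$ and $\lambda^{-1}I_d$ supplying the two precision blocks.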

4.2. Gibbs sampling for variance components

The variance components which include Vj's and λ can be updated from their full conditionals, and these conditionals have densities as follows.

  1. Given $\eta$, the conditional posterior density of $\lambda$ is
$$\pi(\lambda\mid\eta)\propto\lambda^{-\frac{d+1}{2}}\exp\Big\{-\frac{1+\|\eta\|^2}{2\lambda}\Big\},\tag{44}$$
which is an inverse gamma distribution, $\mathrm{IG}\big(\frac{d-1}{2},\frac{1+\|\eta\|^2}{2}\big)$.

  2. For $j\in[r]$, define $t_j=\frac{m_j}{2}+1-\frac{1}{2k_j}$. The full conditional posterior density of $V_j$ given $(\theta_j,\theta_{j+1})$ is
$$\pi(V_j\mid\theta_j,\theta_{j+1})\propto\frac{1}{|V_j|^{t_j}\prod_{1\le s<t\le k_j}(\omega_{js}-\omega_{jt})}\,\mathrm{etr}\Big\{-\frac{1}{2}V_j^{-1}H_j\Big\},\tag{45}$$
where $\mathrm{etr}(A)$ denotes $\exp(\mathrm{tr}(A))$ for a square matrix $A$, and $H_j\equiv H_j(\theta_j,\theta_{j+1})=\sum_{i=1}^{m_j}(\theta_{ji}-Z_{ji}\theta_{j+1})(\theta_{ji}-Z_{ji}\theta_{j+1})^\top$.

Updating $\lambda$ is simply carried out by sampling from an inverse gamma distribution. The full conditional posteriors of the $V_j$ in (Equation45) belong to a recently proposed class of distributions for covariance matrices, introduced by Berger et al. (Citation2020a) and called the Shrinkage Inverse Wishart (SIW) distributions. The class $\mathrm{SIW}(a,H)$ for a $k\times k$ covariance matrix $W$ has density
$$\pi_{\mathrm{SIW}}(W\mid a,H)\propto\frac{\mathrm{etr}\big(-\frac{1}{2}W^{-1}H\big)}{|W|^{a}\prod_{i<j}(\nu_i-\nu_j)},\tag{46}$$
where $\nu_1>\nu_2>\cdots>\nu_k>0$ are the ordered eigenvalues of $W$, $a$ is a real constant and $H$ is a $k\times k$ non-negative definite matrix. Thus, the $V_j$ are conditionally distributed as $\mathrm{SIW}(t_j,H_j)$, $j\in[r]$. To sample the covariance matrices from the full conditional posteriors, previously suggested methods include the Metropolis-Hastings algorithm (cf. Berger et al., Citation2005) and the hit-and-run method (cf. Yang & Berger, Citation1994). Both methods generate full candidate matrices using full-parameter proposal distributions, so they only work for moderate dimensions of the covariance matrices. To tackle this issue, Berger et al. (Citation2020a) proposed a powerful Gibbs method for efficiently sampling covariance matrices from such conditional densities, and this new method works for higher dimensions $k$. Readers can refer to Berger et al. (Citation2020a) or Appendix 3 for details of this Gibbs sampling method. According to the simulation results of Berger et al. (Citation2020a), the new Gibbs method outperforms the Metropolis-Hastings and hit-and-run methods for moderate dimensions and works for $k$ up to 100, while the other two algorithms break down at much lower dimensions.
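A sketch of the variance-component step is given below. The update of $\lambda$ follows (Equation44) directly; the update of each $V_j$ is delegated to a user-supplied sampler `draw_siw`, standing in for the eigenvalue-wise Gibbs step of Berger et al. (Citation2020a), which we do not reproduce here. Function and variable names are ours.

```python
import numpy as np

def update_variance_components(state, Z, d, rng, draw_siw):
    """Update (lambda, V_1, ..., V_r) given the current input effects.

    state : dict with 'theta' (list of theta_j), 'eta', 'lam', 'V' (list of V_j);
    Z     : list [Z_0, ..., Z_r] of stacked design matrices;
    draw_siw(t, H, current) : sampler for the SIW(t, H) full conditional of a covariance.
    """
    r = len(state["V"])
    # lambda | eta ~ IG((d-1)/2, (1 + ||eta||^2)/2), as in (44); requires d >= 2.
    shape = (d - 1) / 2.0
    scale = (1.0 + state["eta"] @ state["eta"]) / 2.0
    state["lam"] = scale / rng.gamma(shape)                 # inverse gamma via a gamma draw
    # V_j | theta_j, theta_{j+1} ~ SIW(t_j, H_j), as in (45), with theta_{r+1} = eta.
    for j in range(1, r + 1):
        theta_j = state["theta"][j - 1]
        parent = state["eta"] if j == r else state["theta"][j]
        k_j = state["V"][j - 1].shape[0]
        m_j = theta_j.size // k_j
        resid = theta_j.reshape(m_j, k_j) - (Z[j] @ parent).reshape(m_j, k_j)
        H_j = resid.T @ resid                               # sum of per-unit outer products
        t_j = m_j / 2.0 + 1.0 - 1.0 / (2 * k_j)
        state["V"][j - 1] = draw_siw(t_j, H_j, state["V"][j - 1])
    return state
```

Alternating this step with the input-effect draws of Section 4.1 gives one full Gibbs sweep.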

In the framework of the 2-level HNLM, Berger et al. (Citation2020b) compared the numerical performance, from the mean square error (MSE) perspective, of a dozen objective hyperpriors, namely the products of three objective hyperpriors for the hypermean and four objective hyperpriors for the hypercovariance matrix. Priors on the hypermean include the constant prior, the conjugate prior and the recommended prior (Equation12). Priors on the hypercovariance matrix include the constant prior, the hierarchical Jeffreys prior, the hierarchical reference prior and the recommended prior (Equation13). Their simulation results show that the recommended combination of hyperpriors dominates all the others in terms of Bayes risk, and the constant prior on the hypercovariance performs the worst; however, neither of the two remaining choices for the hypercovariance is computationally easy. Considering the 4-level HNLM, Song et al. (Citation2020) performed a numerical experiment comparing the recommended prior with the constant prior for the hypercovariance matrices, the other two priors being excluded because of intractable computation. Song et al. (Citation2020)'s results likewise show the dominance of the recommended hyperpriors over the other priors. In conclusion, both Berger et al. (Citation2020b) and Song et al. (Citation2020) have provided strong numerical evidence of the superiority of the recommended hyperpriors for use in the GHNL model, since the 2-level and 4-level HNLMs are both special cases of the GHNL model.

5. Discussions

We have proposed a generalized hierarchical normal linear model applicable to nested data with complex structures. The GHNL model proves to be equivalent to an LMM, while the GHNL model is more natural for researchers modelling nested data from scratch, especially when incorporating covariates at high levels. Like generalizations of the simple normal linear model, the GHNL model can be extended to a hierarchical model with a generalized linear model at the first level, so that discrete observations can be handled. Besides, the first level (or even higher levels) of the GHNL model can also be extended to the setting of semiparametric regression models, such as the single index model and the partially linear model. The technique of modelling and investigation in this paper can be applied to the linear part of the models mentioned above. The statistical analysis would be complicated, however, and such explorations are beyond the scope of this paper.

Berger et al. (Citation2020b) put an end to the long search for appropriate hyperpriors in hierarchical modelling and investigated their properties comprehensively to justify the recommendation. Nonetheless, when it came to the propriety of the resulting posterior, they only conjectured that it holds when the prior is used at every level of a general hierarchical normal model, and conditions were not given. To complete the story, we have studied the conditions for the posterior to be proper in more general situations than Berger et al. (Citation2020b) when employing the recommended prior for the GHNL model. Theorems 3.1 and 3.3 present the main results, and Corollaries 3.1 and 3.3 reduce the computational burden by defining kernel sets for $d\ge2$ and $d=1$, respectively. In addition, Theorems 3.2 and 3.4 provide powerful tools of simple form for checking posterior propriety for $d\ge2$ and $d=1$, separately. User-friendly guidance for checking posterior propriety is then supplied. Note that our results only present sufficient conditions; necessary conditions have not been discussed. The reason is that the derivation of a lower bound on the integrated likelihood of the hyperparameters is intractable. Moreover, it is not worthwhile to investigate necessary conditions, as the derived upper bounds are tight enough that the corresponding sufficient conditions are very modest, according to the remarks and examples in Section 3. Finally, an efficient and powerful Gibbs sampling method for sampling from the posterior is introduced, overcoming the computational bottleneck that previously proposed sampling methods work only in low dimensions, or inefficiently in moderate dimensions. Numerical evidence supporting the superiority of the recommended prior for hierarchical models was presented in Berger et al. (Citation2020b) and Song et al. (Citation2020).

Though we have made much progress in hierarchical linear modelling, a major obstacle to applying our results is that the variance component for the first level is supposed to be known, which is hardly satisfactory in practice. If we assume an unknown covariance matrix $\Sigma_0$ for the first level and specify prior (Equation13) on it, the exponential term within the likelihood can no longer simply be dropped when deriving the upper bound; otherwise, the resulting integral is always infinite. An upper bound on the exponential term with respect to the eigenvalues of the covariance matrices is very tricky to obtain, and the condition for the integrability of the resulting integral remains to be studied further. Thus, GHNL modelling with unknown $\Sigma_0$ is left as a follow-up study to this paper.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

The research was supported by the National Natural Science Foundation of China [grant number 11671146].

References

  • Berger, J. (1980). A robust generalized Bayes estimator and confidence region for a multivariate normal mean. Annals of Statistics, 8(4), 716–761. https://doi.org/10.1214/aos/1176345068
  • Berger, J., Strawderman, W., & Tang, D. (2005). Posterior propriety and admissibility of hyperpriors in normal hierarchical models. Annals of Statistics, 33(2), 606–646. https://doi.org/10.1214/009053605000000075
  • Berger, J., Sun, D., & Song, C. (2020a). Bayesian analysis of the covariance matrix of a multivariate normal distribution with a new class of priors. Annals of Statistics, 48(4), 2381–2403. https://doi.org/10.1214/19-AOS1891
  • Berger, J., Sun, D., & Song, C. (2020b). An objective prior for hyperparameters in normal hierarchical models. Journal of Multivariate Analysis, 178(2020), 1–13. https://doi.org/10.1016/j.jmva.2020.104606
  • Consonni, G., Fouskakis, D., Liseo, B., & Ntzoufras, I. (2018). Prior distributions for objective Bayesian analysis. Bayesian Analysis, 13(2), 627–679. https://doi.org/10.1214/18-BA1103
  • Daniels, M. J., & Kass, R. E. (1999). Nonconjugate Bayesian estimation of covariance matrices and its use in hierarchical models. Journal of the American Statistical Association., 94(448), 1254–1263. https://doi.org/10.1080/01621459.1999.10473878
  • Everson, P. J., & Morris, C. N. (2000). Inference for multivariate normal hierarchical models. Journal of the Royal Statistical Society: Series B, 62(2), 399–412. https://doi.org/10.1111/rssb.2000.62.issue-2
  • Fourdrinier, D., Strawderman, W. E., & Wells, M. T. (1998). On the construction of Bayes minimax estimators. Annals of Statistics, 26(2), 660–671. https://doi.org/10.1214/aos/1028144853
  • Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis, 1(3), 515–534. https://doi.org/10.1214/06-BA117A
  • Goldstein, H. (2011). Multilevel statistical models (Vol. 922). John Wiley & Sons.
  • Gustafson, P., Hossain, S., & Macnab, Y. C. (2006). Conservative prior distributions for variance parameters in hierarchical models. Canadian Journal of Statistics, 34(3), 377–390. https://doi.org/10.1002/cjs.v34:3
  • Hobert, J. P., & Casella, G. (1996). The effect of improper priors on Gibbs sampling in hierarchical linear mixed models. Journal of the American Statistical Association, 91(436), 1461–1473. https://doi.org/10.1080/01621459.1996.10476714
  • Hoff, P. D. (2009b). Simulation of the matrix Bingham-Von Mises-Fisher distribution, with applications to multivariate and relational data. Journal of Computational and Graphical Statistics, 18(2), 438–456. https://doi.org/10.1198/jcgs.2009.07177
  • Horn, R. A., & Johnson, C. R. (2012). Matrix analysis. Cambridge university press.
  • Lindenberger, U., & Pötter, U. (1998). The complex nature of unique and shared effects in hierarchical linear regression: Implications for developmental psychology. Psychological Methods, 3(2), 218–230. https://doi.org/10.1037/1082-989X.3.2.218
  • Michalak, S. E., & Morris, C. N. (2016). Posterior propriety for hierarchical models with log-Likelihoods that have norm bounds. Bayesian Analysis, 11(2), 545–571. https://doi.org/10.1214/15-BA962
  • Raudenbush, S., & Bryk, A. S. (1986). A hierarchical model for studying school effects. Sociology of Education, 59(1), 1–17. https://doi.org/10.2307/2112482
  • Shimotsu, K. (2010). Exact local Whittle estimation of fractional integration with unknown mean and time trend. Econometric Theory, 26(2), 501–540. https://doi.org/10.1017/S0266466609100075
  • Song, C., Sun, D., Fan, K., & Mu, R. (2020). Posterior propriety of an objective prior in a 4-Level normal hierarchical model. Mathematical Problems in Engineering, 2020. https://doi.org/10.1155/2020/8236934
  • Speckman, P. L., & Sun, D. (2003). Fully Bayesian spline smoothing and intrinsic autoregressive priors. Biometrika, 90(2), 289–302. https://doi.org/10.1093/biomet/90.2.289
  • Sun, D., Tsutakawa, R. K., & He, Z. (2001). Propriety of posteriors with improper priors in hierarchical linear mixed models. Statistica Sinica, 11(1), 77–95. http://www.jstor.org/stable/24306811
  • Xia, A., Ma, H., & Carlin, B. P. (2011). Bayesian hierarchical modeling for detecting safety signals in clinical trials. Journal of Biopharmaceutical Statistics, 21(5), 1006–1029. https://doi.org/10.1080/10543406.2010.520181
  • Yang, R., & Berger, J. (1994). Estimation of a covariance matrix using the reference prior. Annals of Statistics, 22(3), 1195–1211. https://doi.org/10.1214/aos/1176325625

Appendices

Appendix 1. A special LMM

Consider a special LMM of the form
(A1) y = Xβ + Z_1u_1 + ⋯ + Z_ru_r + ϵ,
where y is the n×1 vector of observations and β is the p×1 vector of fixed effects. For i ∈ [r], u_i is a q_i×1 vector of random effects, and the u_i's are assumed to be independently distributed as N_{q_i}(0, W_i), where the W_i's are unknown q_i×q_i positive definite matrices. X is an n×p matrix, the Z_i's are n×q_i matrices, and X and the Z_i's are known design matrices. ϵ is the vector of random errors, distributed as N_n(0, Σ), where Σ is a given n×n positive definite matrix.

It follows from Berger et al. (Citation2020b) that we can assign independent priors to (β, W_1, …, W_r) as
(A2) π(β) ∝ 1/(1 + ‖β‖²)^{(p−1)/2}, β ∈ R^p,
     π(W_j) ∝ 1/( |W_j|^{1−1/(2q_j)} ∏_{1≤s<t≤q_j} (ν_{js} − ν_{jt}) ), W_j > 0, j ∈ [r],
where ν_{j1} > ν_{j2} > ⋯ > ν_{jq_j} > 0 are the ordered eigenvalues of W_j, j ∈ [r]. The prior on β has a hierarchical structure of the form
(β | τ) ∼ N_p(0, τ I_p) and π(τ) ∝ τ^{−1/2} exp(−1/(2τ)).
The posterior propriety result for the special LMM (A1) is presented as follows. First, let τ ≡ ν_{01} and q_0 = 1. Denote the index set of the variance scale and of the eigenvalues of the covariance matrices by F = {(j, l) | j = 0, 1, …, r, l ∈ [q_j]}, and let T = {D | D ⊆ F, D ≠ ∅} be the collection of non-empty subsets of F. Define c_{jl,s} = 1{l = s} for j ∈ [r], l ∈ [q_j], and s ∈ [n].

Theorem A.1

Consider the linear mixed effect model (A1) with prior (A2) on (β, W_1, …, W_r). Assume p > 1. Then the posterior is proper if
(A3) ∑_{s=1}^n 1{ 1{(0,1) ∈ D} 1{s ≤ p} + ∑_{j≠0, (j,l)∈D} c_{jl,s} > 0 } > ∑_{(j,l)∈D} 1/q_j
holds for any D ∈ T.

The proof of Theorem A.1 is similar to that of Theorem 3.1 and is omitted here.

Fact A.1

When p > 1, (A3) holds for any D ∈ T iff
(A4) ∑_{j∈[r]} 1/q_j < 1 and p > 1 + ∑_{j∈[r]} min(p, q_j)/q_j.

Proof.

For any D ∈ T with (0,1) ∉ D, (A3) is equivalent to
(A5) ∑_{s=1}^n 1{ ∑_{(j,l)∈D} c_{jl,s} > 0 } > ∑_{(j,l)∈D} 1/q_j.
It can be deduced that inequality (A5) holds for any D ∈ T with (0,1) ∉ D iff
(A6) L > ∑_{j∈[r]} min(L, q_j)/q_j for all L ∈ [n],
which is equivalent to ∑_{j∈[r]} 1/q_j < 1, since q_j ≥ 1 for j ∈ [r].

Inequality (A3) holds for any D ∈ T with (0,1) ∈ D iff
(A7) L > 1 + ∑_{j∈[r]} min(L, q_j)/q_j for L = p, …, n.
Under the condition ∑_{j∈[r]} 1/q_j < 1, (A7) is equivalent to p > 1 + ∑_{j∈[r]} min(p, q_j)/q_j.

Corollary A.1

Consider model (A1) with prior (A2) on the parameters. The posterior is proper if one of the following conditions holds:

  (a) p > 1 + r and ∑_{j∈[r]} 1/q_j < 1;

  (b) p > 1 and ∑_{j∈[r]} 1/q_j < 1 − 1/p.

Proof.

Since ∑_{j∈[r]} min(p, q_j)/q_j ≤ r and ∑_{j∈[r]} min(p, q_j)/q_j ≤ p ∑_{j∈[r]} 1/q_j, conditions (a) and (b) follow from Fact A.1 directly.
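
To make these conditions easy to apply, the following sketch (a hypothetical helper written for this exposition, not code from the authors) checks the exact condition (A4) of Fact A.1 together with the two sufficient conditions of Corollary A.1, given p and the random-effect dimensions q_1, …, q_r.

```python
# A minimal sketch, assuming the notation of model (A1): p = dim(beta), q = [q_1, ..., q_r].
def propriety_check(p, q):
    """Return (Fact A.1 condition (A4), Corollary A.1 condition (a) or (b))."""
    cond_a4 = sum(1.0 / qj for qj in q) < 1 and p > 1 + sum(min(p, qj) / qj for qj in q)
    cond_a = p > 1 + len(q) and sum(1.0 / qj for qj in q) < 1           # Corollary A.1 (a)
    cond_b = p > 1 and sum(1.0 / qj for qj in q) < 1 - 1.0 / p          # Corollary A.1 (b)
    return cond_a4, (cond_a or cond_b)

# Example consistent with Remark A.1 (r = 1): p = 2, q_1 = 3 gives propriety, whereas p = 1 does not.
print(propriety_check(2, [3]))   # (True, True)
print(propriety_check(1, [3]))   # (False, False)
```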

Remark A.1

Consider model (A1) with r = 1. The posterior under prior (A2) is proper if either (a) p ≥ 2, q_1 ≥ 3 or (b) p ≥ 3, q_1 ≥ 2 holds. If p = 1, posterior propriety can hardly be achieved, for two reasons. First, inequality (A3) fails for D = {(0,1)}. Second, if we follow the thread of deriving the condition in Theorem 3.3, a sufficient condition can be derived as
(A8) ∑_{s=1}^n 1{ ∑_{(j,l)∈D} c_{jl,s} > 0 } > 1{∃ j ∈ [r]: (j,1) ∈ D} + ∑_{(j,l)∈D} 1/q_j,
for any D ∈ T with (0,1) ∉ D. However, inequality (A8) does not hold for D = {(j,1)}, j ∈ [r].

Appendix 2

Proofs of the lemmas in Section 3.1

Lemma A.1

(Min–max theorem; cf. Horn & Johnson, Citation2012) For an n×n symmetric matrix A and a non-zero n×1 vector x, the Rayleigh quotient of A and x is defined as
R(A, x) = ⟨Ax, x⟩ / ⟨x, x⟩,
where ⟨·, ·⟩ denotes the Euclidean inner product. Then
(A9) λ_k(A) = max_U { min_x { R(A, x) | x ∈ U, x ≠ 0 } | dim(U) = k }, k ∈ [n],
where U denotes a linear subspace of R^n. In particular,
(A10) λ_max(A) = max_x { R(A, x) | x ≠ 0 } and λ_min(A) = min_x { R(A, x) | x ≠ 0 }.

Lemma A.2

For n×n symmetric matrices A_j, j ∈ [r], we have

  • (a) λ_max(∑_{j=1}^r A_j) ≤ ∑_{j=1}^r λ_max(A_j);

  • (b) supposing A_j ≥ 0 (positive semi-definite) for j ∈ [r], then λ_k(∑_{j=1}^r A_j) ≥ (1/r) ∑_{j=1}^r λ_k(A_j), k ∈ [n].

Proof.

For (a), given an n×1 non-zero vector x, by (A10),
(A11) R(∑_{j=1}^r A_j, x) = ∑_{j=1}^r R(A_j, x) ≤ ∑_{j=1}^r max_{x_j ≠ 0} R(A_j, x_j) = ∑_{j=1}^r λ_max(A_j).
The proof of (a) is completed by using (A10) again.

For (b), it suffices to prove that for any j ∈ [r],
(A12) λ_k(∑_{j=1}^r A_j) ≥ λ_k(A_j), k ∈ [n],
since averaging (A12) over j ∈ [r] yields (b). Since A_l ≥ 0 for l ∈ [r], for any j ∈ [r] and any x ∈ R^n with x ≠ 0,
R(∑_{l=1}^r A_l, x) = ∑_{l=1}^r R(A_l, x) ≥ R(A_j, x).
Minimize both sides of the inequality above over {x | x ∈ U, x ≠ 0} first, and then take the maximum over {U | U ⊆ R^n, dim(U) = k}; (A12) then follows from Lemma A.1.
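
As a quick sanity check, the two inequalities of Lemma A.2 can be verified numerically on random matrices. The sketch below is a small illustration written for this exposition (not part of the paper); note that numpy returns eigenvalues in ascending order, which is immaterial because the inequalities hold for every k once the eigenvalues of all matrices are sorted consistently.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 5, 3

# (a): for symmetric A_j, lambda_max of the sum is at most the sum of the lambda_max's.
A_sym = [(M + M.T) / 2 for M in rng.standard_normal((r, n, n))]
assert np.linalg.eigvalsh(sum(A_sym))[-1] <= sum(np.linalg.eigvalsh(A)[-1] for A in A_sym) + 1e-12

# (b): for positive semi-definite A_j, lambda_k of the sum dominates the average of the lambda_k's.
A_psd = [B @ B.T for B in rng.standard_normal((r, n, n))]
lam_sum = np.linalg.eigvalsh(sum(A_psd))                        # ascending order
lam_avg = np.mean([np.linalg.eigvalsh(A) for A in A_psd], axis=0)
assert np.all(lam_sum >= lam_avg - 1e-12)
```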

A.1. Proof of Lemma 3.1

For (a), by Lemma A.2(a), it suffices to prove that λ_max(X_j A_j X_j') ≤ C_1 λ_max(A_j) for j ∈ [r]. For any j ∈ [r], applying (A10) yields
0 < λ_max(X_j A_j X_j') = max_{x ≠ 0} R(X_j A_j X_j', x) = R(X_j A_j X_j', x*)
for some x* ≠ 0. It is obvious that X_j'x* ≠ 0; otherwise R(X_j A_j X_j', x*) = 0, a contradiction. In addition, since ⟨X_j'x*, X_j'x*⟩ ≤ λ_max(X_j X_j')⟨x*, x*⟩ ≤ C_1⟨x*, x*⟩, we have
R(X_j A_j X_j', x*) ≤ C_1 R(A_j, X_j'x*) ≤ C_1 max{ R(A_j, z) | z ∈ R^{p_j}, z ≠ 0 } = C_1 λ_max(A_j).
Therefore, we have proved part (a).

For (b), we only need to prove that λ_k(H) ≥ (C_2/r) ∑_{j=1}^r a_{jk}, k ∈ [n]. It follows from Lemma A.2(b) that for any k ∈ [n],
λ_k(H) ≥ (1/r) ∑_{j=1}^r λ_k(X_j A_j X_j').
Since rank(X_j A_j X_j') = rank(A_j), we have λ_k(X_j A_j X_j') = 0 for p_j < k ≤ n. Thus, it remains to show that
(A13) λ_k(X_j A_j X_j') ≥ C_2 λ_k(A_j), j ∈ [r], k ∈ [p_j].
First, for any j ∈ [r], introduce the linear map L_j: R^n → R^{p_j} defined by L_j(ν) = X_j'ν, ν ∈ R^n, and denote its kernel by Ker(L_j) = {ν ∈ R^n : L_j(ν) = 0}. Since X_j has full column rank p_j ≤ n, the orthogonal complement Ker(L_j)^⊥ has dimension p_j, i.e., dim(Ker(L_j)^⊥) = p_j. Thus the restriction L_j: Ker(L_j)^⊥ → R^{p_j} is a one-to-one mapping. For any U ⊆ R^{p_j} with dim(U) = k, k ∈ [p_j], define
L_j^{-1}(U) = {ν ∈ Ker(L_j)^⊥ : L_j(ν) = x, x ∈ U}.
It is obvious that L_j^{-1}(U) ⊆ Ker(L_j)^⊥ and dim(L_j^{-1}(U)) = k. For any U ⊆ R^{p_j} with dim(U) = k and any x ∈ U, there exists one and only one ν ∈ L_j^{-1}(U) such that L_j(ν) = x. Since ⟨x, x⟩ = ν'(X_j X_j')ν ≥ C_2⟨ν, ν⟩, we have
(A14) R(X_j A_j X_j', ν) ≥ C_2 R(A_j, x).
It follows from Lemma A.1 that
(A15) λ_k(X_j A_j X_j') ≥ max_V { min_ν { R(X_j A_j X_j', ν) | ν ∈ V, ν ≠ 0 } | V ⊆ Ker(L_j)^⊥, dim(V) = k }, k ∈ [p_j].
Minimize both sides of inequality (A14) over {x | x ∈ U, x ≠ 0} first, and then take the maximum over {U | U ⊆ R^{p_j}, dim(U) = k}; (A13) then follows from (A15).
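
The key inequality (A13) can also be checked numerically. The sketch below is an illustration assumed for this appendix (not the authors' code); it takes C_2 = λ_min(X'X) for a single full-column-rank design matrix X and verifies λ_k(X A X') ≥ C_2 λ_k(A) for the leading p eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 8, 3
X = rng.standard_normal((n, p))          # full column rank with probability one
B = rng.standard_normal((p, p))
A = B @ B.T                              # a p x p positive semi-definite matrix

C2 = np.linalg.eigvalsh(X.T @ X)[0]                 # smallest eigenvalue of X'X
lam_XAX = np.linalg.eigvalsh(X @ A @ X.T)[-p:]      # p largest eigenvalues of X A X'
lam_A = np.linalg.eigvalsh(A)                       # eigenvalues of A (ascending)
assert np.all(lam_XAX >= C2 * lam_A - 1e-10)
```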

A.2. Proof of Lemma 3.2

The domain of the integral can be divided into
Ω_0 = {λ | 0 ≤ λ_j ≤ 1, j ∈ [k]},
Ω_D = {λ | λ_j > 1, j ∈ D, and 0 ≤ λ_i ≤ 1, i ∈ [k]∖D}, D ∈ F([k]),
i.e., Ω = (∪_{D∈F([k])} Ω_D) ∪ Ω_0. Thus, the integral I is finite iff the integrals over Ω_0 and over Ω_D for each D ∈ F([k]) are all finite.

Denote the integrand by F(λ). Then
∫_{Ω_0} F(λ) dλ ≍ ∫_{Ω_0} ∏_{j=1}^k λ_j^{−a_j} dλ,
which is finite iff condition (a) is satisfied.

To verify condition (b), we only need to justify the following statement; from now on, we assume that condition (a) always holds.

Fact A.2

For all D ∈ F([k]) with card(D) = L, 1 ≤ L ≤ k, the integrals ∫_{Ω_D} F(λ) dλ are finite iff the inequalities
(A16) ∑_{j∈E} a_j + ∑_{i∈G_E} b_i > card(E)
hold for all E ∈ F([k]) with card(E) ≤ L, where G_E = {i ∈ [n] | E ∩ C_i ≠ ∅}. Under the condition above,
(A17) ∫_{Θ_D(t)} G_D(λ) (∏_{r∈D} dλ_r) ≲ exp{ −(log t)( ∑_{j∈D} a_j + ∑_{i∈G_D} b_i − card(D) ) }
always holds (here ≲ denotes a bound up to a constant not depending on t), where
Θ_D(t) = {λ | λ_j ≥ t, j ∈ D, and 0 ≤ λ_i ≤ 1, i ∈ [k]∖D},
G_D(λ) = 1 / ( [∏_{j∈D} λ_j^{a_j}] [∏_{i∈G_D} (1 + ∑_{r∈C_i} λ_r)^{b_i}] ).

Formula (A17) is required because it plays an important role in verifying condition (A16) within the induction.

Proof.

We prove the result by mathematical induction. First, assume that the statement in Fact A.2 is true for L = l, 1 ≤ l ≤ (k−1). With this assumption, we show that the statement is true for its successor, L = l + 1. Write an arbitrary set D ∈ F([k]) with cardinality (l+1) as {j_1, …, j_{l+1}}, where 1 ≤ j_1 < ⋯ < j_{l+1} ≤ k, and denote D_{j_i} = D∖{j_i}, i = 1, …, (l+1).

Step 1: We first prove that ∫_{Ω_D} F(λ) dλ is finite iff the inequalities (A16) hold for L = l + 1.

Region Ω_D can be divided into
Σ_1 = {λ | λ_j ≥ λ_{j_1} > 1, j ∈ D_{j_1}, and 0 ≤ λ_i ≤ 1, i ∈ [k]∖D},
⋮
Σ_{l+1} = {λ | λ_j ≥ λ_{j_{l+1}} > 1, j ∈ D_{j_{l+1}}, and 0 ≤ λ_i ≤ 1, i ∈ [k]∖D}.
Therefore, the integral ∫_{Ω_D} F(λ) dλ is finite iff ∫_{Σ_i} F(λ) dλ < ∞ for every i = 1, …, (l+1). For i = 1, …, (l+1), we have
∫_{Σ_i} F(λ) dλ ≍ ∫_{Σ_i} ( ∏_{s∉D} λ_s^{−a_s} ) (1 + λ_{j_i})^{−( a_{j_i} + ∑_{r∈H_i} b_r )} G_{D_{j_i}}(λ) dλ,
where H_i = {r ∈ [n] | C_r ∩ D_{j_i} = ∅ and j_i ∈ C_r}; it is easy to see that G_D = G_{D_{j_i}} ∪ H_i and G_{D_{j_i}} ∩ H_i = ∅. Since
(A18) ∫_{Σ_i} G_{D_{j_i}}(λ) (∏_{r∈D_{j_i}} dλ_r) ≤ ∫_{Θ_{D_{j_i}}(λ_{j_i})} G_{D_{j_i}}(λ) (∏_{r∈D_{j_i}} dλ_r),
(A19) ∫_{Ω_{D_{j_i}}} F(λ) dλ ≍ ∫_{Ω_{D_{j_i}}} ( ∏_{s∉D_{j_i}} λ_s^{−a_s} ) G_{D_{j_i}}(λ) dλ,
and the RHS (right-hand side) of (A19) and of (A18) are finite simultaneously under condition (a), the LHS (left-hand side) of (A18) is finite iff ∫_{Ω_{D_{j_i}}} F(λ) dλ is finite.

Furthermore, by the induction assumption and (A17), we have
∫_{Σ_i} G_{D_{j_i}}(λ) (∏_{r∈D_{j_i}} dλ_r) ≲ exp{ −(log λ_{j_i})( ∑_{j∈D_{j_i}} a_j + ∑_{r∈G_{D_{j_i}}} b_r − l ) }.
Thus, under condition (a) and the induction assumption, we have
∫_{Σ_i} F(λ) dλ ≲ ∫_1^∞ (1 + λ_{j_i})^{−( ∑_{j∈D} a_j + ∑_{r∈G_D} b_r − l )} dλ_{j_i},
the RHS of which is finite iff ∑_{j∈D} a_j + ∑_{r∈G_D} b_r > 1 + l = card(D).

In conclusion, ∫_{Ω_D} F(λ) dλ is finite iff ∑_{j∈D} a_j + ∑_{r∈G_D} b_r > card(D) and ∫_{Ω_{D_{j_i}}} F(λ) dλ is finite for every i ∈ [l+1]. Since D is arbitrary and card(D_{j_i}) = l, the goal of Step 1 is accomplished.

Step 2: Next, we prove that formula (A17) holds for D with cardinality (l+1).

Region Θ_D(t) can be divided into
Θ_D^{(1)}(t) = {λ | λ_j ≥ λ_{j_1} ≥ t, j ∈ D_{j_1}, and 0 ≤ λ_i ≤ 1, i ∈ [k]∖D},
⋮
Θ_D^{(l+1)}(t) = {λ | λ_j ≥ λ_{j_{l+1}} ≥ t, j ∈ D_{j_{l+1}}, and 0 ≤ λ_i ≤ 1, i ∈ [k]∖D}.
Similarly to the proof of Step 1, we can show that for i = 1, …, (l+1),
∫_{Θ_D^{(i)}(t)} G_D(λ) (∏_{r∈D} dλ_r) ≲ ∫_t^∞ (1 + λ_{j_i})^{−( ∑_{j∈D} a_j + ∑_{r∈G_D} b_r − l )} dλ_{j_i} ≍ exp{ −(log t)( ∑_{j∈D} a_j + ∑_{r∈G_D} b_r − card(D) ) }.
This proves Step 2.

Step 3: To complete the proof by mathematical induction, it remains to verify that the statement is true for the base case L = 1.

Denote D = {r}, r = 1, …, k. Then
∫_{Ω_D} F(λ) dλ ≍ ∫_1^∞ (1 + λ_r)^{−( a_r + ∑_{i∈G_D} b_i )} dλ_r,
which is finite iff ∑_{j∈D} a_j + ∑_{i∈G_D} b_i > 1 = card(D). Under this condition,
∫_{Θ_D(t)} G_D(λ) (∏_{i∈D} dλ_i) ≲ ∫_t^∞ (1 + λ_r)^{−( a_r + ∑_{i∈G_D} b_i )} dλ_r ≍ exp{ −(log t)( ∑_{j∈D} a_j + ∑_{i∈G_D} b_i − card(D) ) },
which accomplishes the proof of Fact A.2.
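
Since condition (A16) is a finite combinatorial check, it can be verified by enumeration. The sketch below is a hypothetical brute-force checker written for this appendix (not the authors' code); it assumes the reading G_E = {i : E ∩ C_i ≠ ∅} used above and enumerates all non-empty subsets E of [k].

```python
from itertools import combinations

def check_A16(a, b, C):
    """a: exponents a_1..a_k; b: exponents b_1..b_n; C: list of subsets C_i of {1,...,k}."""
    k = len(a)
    for size in range(1, k + 1):
        for E in combinations(range(1, k + 1), size):
            G_E = [i for i, Ci in enumerate(C) if set(E) & set(Ci)]  # i with E and C_i overlapping
            if sum(a[j - 1] for j in E) + sum(b[i] for i in G_E) <= size:
                return False     # (A16) fails for this E
    return True

# Toy example: k = 2, n = 1, C_1 = {1, 2}; every subset's exponent sum exceeds its cardinality.
print(check_A16([0.6, 0.6], [1.0], [{1, 2}]))   # True
```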

Appendix 3

Gibbs sampling from the SIW distributions

As for the SIW distribution (46), we first consider the change of variables from W to Ξ = diag(ν_1, …, ν_k) and the orthogonal matrix O of corresponding eigenvectors. The Jacobian is
(A20) |∂W/∂(Ξ, O)| = ∏_{i<j} (ν_i − ν_j).
According to (A20) and Lemma 4 in Berger et al. (Citation2020a), (46) can be transformed to
(A21) π(Ξ, O) ∝ (1/|Ξ|^a) etr( −(1/2) Ξ^{-1} O'HO ).
Gibbs sampling of Ξ: We first sample Ξ given (O, H) from
π(Ξ | O, H) ∝ ( 1/∏_{i=1}^k ν_i^a ) etr( −(1/2) Ξ^{-1} O'HO ) = ∏_{i=1}^k ν_i^{−a} exp( −c_i/ν_i ),
where c_i is the i-th diagonal element of O'HO/2, i ∈ [k]. Therefore, we can sample the ν_i's independently from IG(a − 1, c_i).
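
A minimal sketch of this eigenvalue step is given below (hypothetical helper names, assuming a > 1 so that the inverse-gamma shape a − 1 is valid); it exploits the fact that if X ~ Gamma(a − 1, scale 1/c_i) then 1/X ~ IG(a − 1, c_i).

```python
import numpy as np

def sample_eigenvalues(O, H, a, rng):
    """Draw nu_1, ..., nu_k independently, nu_i ~ IG(a - 1, c_i) with c_i = (O' H O)_{ii} / 2."""
    c = np.diag(O.T @ H @ O) / 2.0
    return 1.0 / rng.gamma(shape=a - 1.0, scale=1.0 / c)

# Example usage with a random symmetric H and a random orthogonal O (assumed inputs):
rng = np.random.default_rng(0)
k, a = 4, 2.5
M = rng.standard_normal((k, k))
H = M @ M.T
O, _ = np.linalg.qr(rng.standard_normal((k, k)))
nu = sample_eigenvalues(O, H, a, rng)
```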

Gibbs sampling of O: Given (Ξ, H), the full conditional density of O has the form
π(O | Ξ, H) ∝ etr( −(1/2) H O Ξ^{-1} O' ).
Let H = LUL', where L'L = I_k and U = diag(u_1, …, u_k) is the diagonal matrix of the corresponding eigenvalues with u_1 ≥ ⋯ ≥ u_k. Define G = L'O. Since the invariant right Haar measure is invariant under orthonormal transformations, the conditional density of G is
(A22) π(G | Ξ, H) ∝ etr( −(1/2) U G Ξ^{-1} G' ).
The update of G from (A22) can be implemented by applying a Gibbs update to two randomly selected columns (cf. Hoff, Citation2009b) or rows (cf. Berger et al., Citation2020a). The two approaches are essentially equivalent when rank(H) = k, but the method of Berger et al. (Citation2020a) is considerably faster if rank(H) < k. Without loss of generality, assume that the two randomly selected rows are the first and the second. The updated value of G can be written as G^new = diag(Φ, I_{k−2}) (G_{12}^old; G_{−12}^old), where G_{12}^old denotes the first two rows of the old value G^old, G_{−12}^old denotes the remaining k−2 rows of G^old, and
Φ = D_ϵ Φ_0 = (ϵ_1, 0; 0, ϵ_2)(cos ϕ, −sin ϕ; sin ϕ, cos ϕ),
with ϕ ∈ (−π/2, π/2] and ϵ_i = ±1 for i = 1, 2. Let U_1 = diag(u_1, u_2). The full conditional density of ϕ has the form
π(ϕ | G^old, Ξ, H) ∝ etr{ −(1/2) U_1 Φ_0 G_{12}^old Ξ^{-1} (G_{12}^old)' Φ_0' }.
Write
G_{12}^old Ξ^{-1} (G_{12}^old)' = (cos θ, −sin θ; sin θ, cos θ)(s_1, 0; 0, s_2)(cos θ, sin θ; −sin θ, cos θ),
where θ ∈ (−π/2, π/2] and s_1 > s_2. Then the conditional density of ϕ can be rewritten as
π(ϕ | G^old, Ξ, H) ∝ exp{ −c_0 cos²(ϕ + θ) },
where c_0 = (1/2)(s_1 − s_2)(u_1 − u_2) ≥ 0. Define α = cos²(ϕ + θ). Then the full conditional density of α has the form
π(α | G^old, Ξ, H) ∝ exp{ −c_0 α } α^{−1/2} (1 − α)^{−1/2}, α ∈ [0, 1].
Simulating α ∈ [0, 1] can proceed with a rejection sampler using a Beta(1/2, 1/2) proposal.
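
A minimal sketch of this rejection step (a hypothetical helper written for this appendix): because the target density of α is proportional to exp(−c_0 α) α^{−1/2} (1 − α)^{−1/2} with c_0 ≥ 0, a Beta(1/2, 1/2) proposal can be accepted with probability exp(−c_0 α) ≤ 1.

```python
import numpy as np

def sample_alpha(c0, rng):
    """Rejection sampler for alpha = cos^2(phi + theta), given c0 >= 0."""
    while True:
        alpha = rng.beta(0.5, 0.5)                 # proposal ~ Beta(1/2, 1/2)
        if rng.uniform() < np.exp(-c0 * alpha):    # accept with probability exp(-c0 * alpha)
            return alpha

# Example usage; phi is then recovered from alpha up to the sign choices handled by epsilon_1, epsilon_2.
rng = np.random.default_rng(0)
alpha = sample_alpha(c0=1.3, rng=rng)
```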