A robustness evaluation of Bayesian tests for longitudinal data

Abstract

Linear mixed models are standard models for analyzing repeated measures or longitudinal data under the assumption of normality for the random components of the model. Although mixed models are widely used in both frequentist and Bayesian inference, their evaluation from a robustness perspective has received less attention in Bayesian inference than in frequentist inference. The aim of this study is to evaluate Bayesian tests in mixed models for their robustness to departures from normality. We use a general class of exponential power distributions (EPD) and focus in particular on testing fixed effects in longitudinal models. The EPD class contains both light- and heavy-tailed distributions, with the normal distribution as a special case. Further, we consider a new paradigm of Bayesian testing decision theory in which the hypotheses are formulated as a mixture model, with subsequent testing based on the posterior distribution of the mixture weights. It is shown that the EPD class provides a flexible alternative to the normality assumption, particularly in the presence of outliers. Applications to real data are also demonstrated.

1. Introduction

Linear mixed models (LMMs) provide standard tools for analyzing longitudinal or repeated measures data in both Bayesian and frequentist inference. LMMs offer much flexibility for modeling a variety of covariance structures and can be used for both balanced and unbalanced data. Furthermore, their implementation is facilitated by the availability of statistical software. Most modeling of continuous data in real-life problems through LMMs is based on the normality assumption for the random components of the postulated model. When the assumption is not tenable, the resulting tests and confidence intervals are no longer valid.

In the Bayesian theory of LMMs, the choice of priors offers considerable flexibility, but the normality assumption remains the main source of the likelihood in mixed models. This has serious consequences for posterior modeling if the likelihood is misspecified. The problem is exacerbated if the data additionally contain outliers. This robustness aspect has not been considered as often in the Bayesian context as in the frequentist case, although there have been a few studies in this direction.

For example, skewness in the random components of the LMM has been considered by Arellano-Valle, Bolfarine, and Lachos (2005), Lin and Lee (2008), and Zhang and Davidian (2001). Of particular interest is the study of the influence of outliers using multivariate t and Laplace distributions; see e.g., Lange, Little, and Taylor (1989), Pinheiro, Liu, and Wu (2001), Rukhin and Possolo (2011), Yavuz and Arslan (2018). Another approach focuses directly on the log-likelihood by replacing the quadratic term in the normal distribution by a slower growing function (Huggins 1993).

The aforementioned references each deal with a specific distribution as a replacement for normality and hence lack a broader perspective of evaluation. Our main objective in this article is to evaluate the robustness of Bayesian testing theory from the perspective of a general class of distributions, called exponential power distributions (EPD), which includes the uniform, Laplace and normal distributions. The univariate EPD class was introduced by Box and Tiao (1962) for the purpose of studying the robustness of the Bayesian t-test; see also Box and Tiao (1964). We use its multivariate extension in the context of Bayesian testing of fixed effects in a mixed model set up for longitudinal data. The EPD class is a subclass of the elliptically contoured distributions, which are widely used, particularly in frequentist inference, for similar purposes.

An extension of the EPD class to the multivariate case was given in Gómez, Gómez-Villegas, and Marín (1998). For its use as an alternative to normality in Bayesian inference, see e.g., Choy and Walker (2003), Haro-López and Smith (1999), Lindsey (1999), Walker and Gutiérrez-Peña (1998). Like the normal distribution, the EPD is parametrized by location and variance parameters, but it has an additional parameter which determines the kurtosis and makes the EPD class particularly attractive for studying robustness. Setting this parameter to 1 reduces the EPD to the normal distribution. Otherwise, the distribution has heavier or lighter tails than the normal, depending on the value of the parameter. For a special application of the EPD class in cross-over experiments, see Lindsey (1999).

The aforementioned aspect needs special emphasis in the context of Bayesian inference. Generally, the choice of priors provides most of the flexibility in Bayesian inference, whereas the likelihood comes from a relatively restricted assumption. If, however, the likelihood is misspecified, the resulting posterior distribution leads to seriously misleading predictive inference. From this perspective, the EPD class provides alternative likelihoods for Bayesian testing theory when the normality assumption is suspect, apart from being an effective means of assessing robustness. A more technical discussion of the structure of the EPD, supplemented with graphs, is provided in Section 2.

After a brief orientation to the EPD class in Section 2, along with the form of the mixed models to be considered under the EPD, the Bayesian framework of the problem is given in Section 3.1, with the corresponding Markov chain Monte Carlo (MCMC) algorithms given in Section 3.2. A detailed simulation study focusing on the use of the EPD for robustness in the considered models is provided in Section 3.3, and its application to real data is illustrated in Section 3.4.

2. Preliminaries

Consider the LMM $y_{i} = X_{i} β + Z_{i} b_{i} + e_{i}, \quad i = 1, \dots, m,$ (1) where $y_{i} \in \mathbb{R}^{n_{i}}$ is the response vector for the $i$th individual, $X_{i} \in \mathbb{R}^{n_{i} \times p}$ and $Z_{i} \in \mathbb{R}^{n_{i} \times q}$ are the design matrices for the fixed and random effects, respectively, and $β \in \mathbb{R}^{p}$ and $b_{i} \in \mathbb{R}^{q}$ are the corresponding parameter vectors. We denote the total number of observations by $N = \sum_{i = 1}^{m} n_{i}.$ For inferential purposes, it is commonly assumed that $b_{i} \sim N_{q} (0, Ψ), \quad e_{i} \sim N_{n_{i}} (0, σ^{2} I_{n_{i}}),$ (2) where $Ψ \in \mathbb{R}^{q \times q}$ is a symmetric positive definite matrix and $I_{n_{i}}$ is the $n_{i} \times n_{i}$ identity matrix. The matrix $Ψ$ is assumed unknown and unstructured. It is further assumed that $b_{i}$ and $e_{i}$ are independent.

Model (1) covers a wide variety of models as special cases. One of the simplest is the one-way repeated measures ANOVA model, which is the special case we shall mainly deal with. For this model, $X_{i} = Z_{i} = 1_{n_{i}},$ where $1_{n_{i}}$ is an $n_{i} \times 1$ vector of ones. To avoid confusion, the fixed and random effects under the ANOVA model are denoted by $μ$ and $α_{i},$ respectively. Model (1) then reduces to its one-way repeated measures ANOVA form $y_{i} = μ 1_{n_{i}} + α_{i} 1_{n_{i}} + e_{i}, \quad i = 1, \dots, m.$ (3) The distributional assumptions correspondingly reduce to $α_{i} \sim N (0, τ), \quad e_{i} \sim N_{n_{i}} (0, σ^{2} I_{n_{i}}),$ (4) where $α_{i}$ and $e_{i}$ are assumed independent.

The setting outlined above is often used in frequentist inference for Model (1) and its special form. Our purpose is to assess the tests of the fixed effects components of these models in a Bayesian context under the EPD class. The normality assumptions stated above will therefore be replaced with their counterparts under the EPD setting. For this, we provide a brief outline of the EPD class and mention some essential ingredients that will be frequently referred to in the sequel.

For a random variable y, the pdf of the univariate EPD is defined as (Box and Tiao 1962) $f (y; μ, σ, κ) = \left(σ Γ \left(1 + \frac{1}{2 κ}\right) 2^{1 + \frac{1}{2 κ}}\right)^{- 1} \exp \left\{- \frac{1}{2} \left| \frac{y - μ}{σ} \right|^{2 κ}\right\}, \quad μ \in \mathbb{R}, \ σ \in \mathbb{R}^{+}, \ κ \in \mathbb{R}^{+},$ (5) with mean and variance $E (y) = μ \text{ and } Var (y) = \frac{2^{\frac{1}{κ}} Γ (\frac{3}{2 κ}) σ^{2}}{Γ (\frac{1}{2 κ})},$ where κ is the kurtosis parameter, indicating the extent of non-normality. For κ = 1, (5) reduces to the normal distribution, whereas the distribution is leptokurtic for $κ < 1$ and platykurtic for $κ > 1.$
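As a numerical illustration of (5), the following minimal Python sketch evaluates the univariate EPD density and variance and checks the special cases mentioned above. The function names `epd_pdf`, `epd_var` and `normal_pdf` are hypothetical helpers introduced here for illustration only; they are not part of any software used in the article.

```python
import math


def epd_pdf(y, mu=0.0, sigma=1.0, kappa=1.0):
    """Univariate EPD density, transcribed from (5)."""
    norm_const = (sigma * math.gamma(1.0 + 1.0 / (2.0 * kappa))
                  * 2.0 ** (1.0 + 1.0 / (2.0 * kappa)))
    return math.exp(-0.5 * abs((y - mu) / sigma) ** (2.0 * kappa)) / norm_const


def epd_var(sigma=1.0, kappa=1.0):
    """Variance of the univariate EPD, transcribed from the expression after (5)."""
    return (2.0 ** (1.0 / kappa) * math.gamma(3.0 / (2.0 * kappa))
            / math.gamma(1.0 / (2.0 * kappa)) * sigma ** 2)


def normal_pdf(y, mu=0.0, sigma=1.0):
    """Reference normal density used to check the kappa = 1 case."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


# kappa = 1 recovers the normal density; kappa = 0.5 gives a Laplace-type
# density with heavier tails than the normal.
print(epd_pdf(0.3, kappa=1.0), normal_pdf(0.3))
print(epd_var(kappa=1.0), epd_var(kappa=0.5))
```

With σ = 1, the density for κ = 0.5 is $\frac{1}{4} \exp \{- | y | / 2\},$ a Laplace density with variance 8, which agrees with the variance expression above.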

The pdf of the multivariate extension of the EPD is given as (Gómez, Gómez-Villegas, and Marín 1998) $f (y; μ, Σ, κ) = \frac{p Γ (\frac{p}{2})}{π^{\frac{p}{2}} Γ (1 + \frac{p}{2 κ}) 2^{1 + \frac{p}{2 κ}}} | Σ |^{- \frac{1}{2}} \exp \left\{- \frac{1}{2} \left({(y - μ)}^{T} Σ^{- 1} (y - μ)\right)^{κ}\right\},$ (6) with $E (y) = μ \text{ and } Var (y) = \frac{2^{\frac{1}{κ}} Γ (\frac{p + 2}{2 κ})}{p Γ (\frac{p}{2 κ})} Σ,$ where $μ \in \mathbb{R}^{p}$ is the mean vector, $Σ \in \mathbb{R}^{p \times p}$ is the scale matrix, to which the covariance matrix is proportional, and $κ \in \mathbb{R}^{+}.$ We write $y \sim E P_{p} (μ, Σ, κ).$ As in the univariate case, the most important parameter in the multivariate EPD, particularly from the perspective of studying it as an extension of the multivariate normal distribution, is κ. Figure 1 depicts the pdf in (6) for $κ \in \{0.5, 1, 5\}.$
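The multivariate density can be checked numerically in the same way. The sketch below evaluates the logarithm of (6), with the normalizing constant written in terms of κ throughout, and compares the case κ = 1 with the multivariate normal log-density. The name `mv_epd_logpdf` and the example values of `Sigma` and `y` are illustrative assumptions, not quantities from the article.

```python
import math

import numpy as np


def mv_epd_logpdf(y, mu, Sigma, kappa):
    """Log-density of EP_p(mu, Sigma, kappa), following (6)."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    p = y.size
    diff = y - mu
    # Quadratic form (y - mu)^T Sigma^{-1} (y - mu) without forming the inverse.
    quad = float(diff @ np.linalg.solve(Sigma, diff))
    _, logdet = np.linalg.slogdet(Sigma)
    log_const = (math.log(p) + math.lgamma(p / 2.0)
                 - (p / 2.0) * math.log(math.pi)
                 - math.lgamma(1.0 + p / (2.0 * kappa))
                 - (1.0 + p / (2.0 * kappa)) * math.log(2.0))
    return log_const - 0.5 * logdet - 0.5 * quad ** kappa


# Made-up example values, for illustration only.
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
y = np.array([0.5, -1.0])
mu = np.zeros(2)
print(mv_epd_logpdf(y, mu, Sigma, kappa=1.0))
print(mv_epd_logpdf(y, mu, Sigma, kappa=0.5))
```

Working on the log scale avoids underflow in the tails, which matters when such a density is evaluated repeatedly inside an MCMC scheme.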

Figure 1. The density function of $E P_{2} (μ, Σ, κ)$ displayed for $κ = (0.5, 1, 5) .$ Special cases of multivariate Laplace in (a) and multivariate normal in (b).

For $κ \in (0, 1],$ a convenient reformulation of the multivariate EPD is as a scale mixture of normals (Gómez, Gómez-Villegas, and Marín 2008), $f (y; μ, Σ, κ) = \int_{\mathbb{R}^{+}} N_{p} (y; μ, v^{2} Σ) \, d H_{κ} (v),$ (7) where $N_{p} (\cdot; μ, Σ)$ denotes a p-variate normal distribution and $H_{κ}$ is a one-dimensional distribution function with density function $h_{κ} (v) = \frac{2^{1 + \frac{p}{2} (1 - \frac{1}{κ})} Γ (1 + \frac{p}{2})}{Γ (1 + \frac{p}{2 κ})} v^{p - 3} S (v^{- 2}; κ, 1, γ_{κ}, δ_{κ}), \quad v > 0, \quad γ_{κ} = 2^{1 - \frac{1}{κ}} \cos \left(\frac{π κ}{2}\right), \quad δ_{κ} = γ_{κ} \tan \left(\frac{π κ}{2}\right),$ (8) where $S (\cdot; κ, 1, γ_{κ}, δ_{κ})$ in (8) is the density function of a stable distribution with characteristic function (Nolan 1997) $φ (t) = \exp \left\{- γ_{κ}^{κ} | t |^{κ} \left[1 - i \tan \left(\frac{π κ}{2}\right) \mathrm{sign} (t)\right] + i δ_{κ} t\right\}.$ When κ = 1, $H_{κ} (v)$ in (7) is degenerate at 1.

3. Bayesian tests of fixed effects under the EPD

3.1. Model set up

We are interested in evaluating Bayesian tests for any subset of the fixed effects parameters in Model (1), namely, $H_{0} : {(β_{c_{1}}, \dots, β_{c_{k}})}^{T} = 0 \quad \text{vs.} \quad H_{1} : {(β_{c_{1}}, \dots, β_{c_{k}})}^{T} \neq 0,$ (9) under the EPD class, where $\{c_{i}\}_{i = 1}^{k} \subseteq \{1, \dots, p\}.$ The same hypotheses for the special case in (3) can be stated as $H_{0} : μ = 0 \quad \text{vs.} \quad H_{1} : μ \neq 0.$ (10) To carry out these tests in a Bayesian context, we need to deal with the joint and marginal distributions of the components involved. For this, recall that we can write the distributional assumptions in (2) as $\begin{bmatrix} y_{i} \\ b_{i} \end{bmatrix} \sim N_{n_{i} + q} \left(\begin{bmatrix} X_{i} β \\ 0 \end{bmatrix}, \begin{bmatrix} Z_{i} Ψ Z_{i}^{T} + σ^{2} I_{n_{i}} & Z_{i} Ψ \\ Ψ Z_{i}^{T} & Ψ \end{bmatrix}\right), \quad i = 1, \dots, m.$ (11) Motivated by the procedures for robust estimation using the t-distribution outlined in Bai, Chen, and Yao (2016), Lange, Little, and Taylor (1989), and Pinheiro, Liu, and Wu (2001), we recast the joint distributional assumption for the EPD as $\begin{bmatrix} y_{i} \\ b_{i} \end{bmatrix} \sim E P_{n_{i} + q} \left(\begin{bmatrix} X_{i} β \\ 0 \end{bmatrix}, \begin{bmatrix} Z_{i} Ψ Z_{i}^{T} + σ^{2} I_{n_{i}} & Z_{i} Ψ \\ Ψ Z_{i}^{T} & Ψ \end{bmatrix}, κ\right), \quad i = 1, \dots, m.$ (12) We shall consider the reparametrizations $Ψ = σ^{2} D,$ where D, like $Ψ,$ is unknown and unstructured, and $τ = σ^{2} d,$ which allow partial collapsing of the random and fixed regression coefficients in the MCMC algorithms outlined in the next section (Park and Min 2016).
Thus, with the scale mixture of normals representation of the EPD, (12) can be expressed as $\begin{bmatrix} y_{i} \\ b_{i} \end{bmatrix} | v_{i} \sim N_{n_{i} + q} \left(\begin{bmatrix} X_{i} β \\ 0 \end{bmatrix}, σ^{2} v_{i}^{2} \begin{bmatrix} Σ_{i} & Z_{i} D \\ D Z_{i}^{T} & D \end{bmatrix}\right), \quad v_{i} \sim h_{κ} (v_{i}),$ (13) where $Σ_{i} = Z_{i} D Z_{i}^{T} + I_{n_{i}}.$ As interest lies in inference on the fixed effect coefficients, the marginal model of $y_{i}$ in (13) could be considered. This corresponds to integrating out the random effects from the posterior distribution, meaning fewer parameters to sample in an MCMC sampling scheme. However, this approach would lead to an unknown normalizing constant of the conditional distribution of D due to the presence of the inverse and determinant of $Σ_{i}$ in the likelihood. Thus the joint approach is often preferred to a marginal one.

For the matrix form of (13), denote $V = \mathrm{diag} (v_{1}^{2}, \dots, v_{m}^{2}), v = {(v_{1}, \dots, v_{m})}^{T},$ the stacked vectors $y = {(y_{1}^{T}, \dots, y_{m}^{T})}^{T}, b = {(b_{1}^{T}, \dots, b_{m}^{T})}^{T}$ and the stacked matrices $X = {(X_{1}^{T}, \dots, X_{m}^{T})}^{T}, Z = \mathrm{diag} (Z_{1}, \dots, Z_{m})$ and $Σ = \mathrm{diag} (v_{1}^{2} Σ_{1}, \dots, v_{m}^{2} Σ_{m}),$ where the squared scale factors follow from the covariance in (13). Then (13) can be expressed as $\begin{bmatrix} y \\ b \end{bmatrix} | v \sim N_{N + m q} \left(\begin{bmatrix} Xβ \\ 0 \end{bmatrix}, σ^{2} \begin{bmatrix} Σ & Z (V \otimes D) \\ (V \otimes D) Z^{T} & V \otimes D \end{bmatrix}\right).$ (14)

Following the same strategy for the special case in (3), (13) can be expressed as $\begin{bmatrix} y_{i} \\ α_{i} \end{bmatrix} | v_{i} \sim N_{n_{i} + 1} \left(\begin{bmatrix} μ 1_{n_{i}} \\ 0 \end{bmatrix}, σ^{2} v_{i}^{2} \begin{bmatrix} {\tilde{Σ}}_{i} & d 1_{n_{i}} \\ d 1_{n_{i}}^{T} & d \end{bmatrix}\right), \quad i = 1, \dots, m,$ (15) where $J_{n_{i}} = 1_{n_{i}} 1_{n_{i}}^{T}$ and ${\tilde{Σ}}_{i} = d J_{n_{i}} + I_{n_{i}}.$ Denote $α = {(α_{1}, \dots, α_{m})}^{T},$ and the block diagonal matrices $1 = \mathrm{diag} (1_{n_{1}}, \dots, 1_{n_{m}})$ and $\tilde{Σ} = \mathrm{diag} (v_{1}^{2} {\tilde{Σ}}_{1}, \dots, v_{m}^{2} {\tilde{Σ}}_{m}),$ with $V = \mathrm{diag} (v_{1}^{2}, \dots, v_{m}^{2})$ collecting the squared scale factors of (15). The matrix form of (15) is then $\begin{bmatrix} y \\ α \end{bmatrix} | v \sim N_{N + m} \left(\begin{bmatrix} μ 1_{N} \\ 0 \end{bmatrix}, σ^{2} \begin{bmatrix} \tilde{Σ} & d 1V \\ d V 1^{T} & d V \end{bmatrix}\right).$ (16)

Now, under this set up, Bayesian tests of the form (9) and (10) can be carried out, for which the models associated with H₀ and H₁ can be formulated as $M_{0} : y \sim f_{0} (y | θ_{0}), θ_{0} \in Θ_{0} \quad \text{and} \quad M_{1} : y \sim f_{1} (y | θ_{1}), θ_{1} \in Θ_{1},$ (17) respectively, with corresponding prior distributions $π_{0} (θ_{0})$ and $π_{1} (θ_{1}).$ Further, for the hypothesis tests (9) and (10) we have $Θ_{0} \subset Θ_{1}.$ For example, the parameter spaces stipulated by (10) with model (16) are $Θ_{0} = {(0, \infty)}^{2 + m} \times (0, 1), \quad Θ_{1} = \mathbb{R} \times Θ_{0}.$ The standard procedure is then to compute the marginal likelihoods $m_{0} = \int_{Θ_{0}} f_{0} (y | θ_{0}) π_{0} (θ_{0}) \, d θ_{0} \quad \text{and} \quad m_{1} = \int_{Θ_{1}} f_{1} (y | θ_{1}) π_{1} (θ_{1}) \, d θ_{1},$ with model choice subsequently carried out by the Bayes factor, defined as the quotient of m₀ and m₁, or by the posterior probability of either hypothesis (Kass and Raftery 1995).
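For concreteness, the step from marginal likelihoods to a decision quantity can be sketched as follows. The marginal likelihoods are assumed to be available on the log scale; the function names and the default prior probability of 0.5 for H₀ are illustrative assumptions rather than choices made in this article.

```python
import math


def bayes_factor_01(log_m0, log_m1):
    """Bayes factor m0 / m1, computed from log marginal likelihoods."""
    return math.exp(log_m0 - log_m1)


def posterior_prob_h0(log_m0, log_m1, prior_h0=0.5):
    """Posterior probability of H0 given prior probability prior_h0."""
    log_odds = (log_m0 - log_m1) + math.log(prior_h0) - math.log(1.0 - prior_h0)
    # Numerically stable logistic transform of the posterior log-odds.
    if log_odds >= 0.0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


# Hypothetical log marginal likelihoods, for illustration only.
print(bayes_factor_01(-10.0, -12.0))
print(posterior_prob_h0(-10.0, -12.0))
```

With equal prior probabilities the posterior odds of H₀ equal the Bayes factor, so the two summaries carry the same information.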

Recently, a new paradigm of Bayesian testing decision theory has been introduced. For models $M_{0}$ and $M_{1}$ in (17), the problem is phrased as a two-component mixture (Kamary 2016) $M_{ω} : y \sim ω f_{0} (y | θ_{0}) + (1 - ω) f_{1} (y | θ_{1}), \quad ω \in [0, 1],$ (18) with $π_{0} (θ_{0})$ and $π_{1} (θ_{1})$ as the corresponding priors. In this paper we use (18), where model choice is based on the posterior distribution of ω rather than a discrete choice determined by some threshold.

For hypotheses (9), the model under H₀ is nested in that under H₁ and is thus parametrized by the same $β.$ Let ζ₀ be a $p \times 1$ vector with zeros in the positions $\{c_{i}\}_{i = 1}^{k}$ and ones in the complement $(\{c_{i}\}_{i = 1}^{k})^{c},$ where complementation is taken with respect to $\{1, \dots, p\}.$ Denote the model matrix associated with the null hypothesis by $X_{0, i} = X_{i} \mathrm{diag} (ζ_{0})$ and the stacked null matrix by $X_{0} = {(X_{0, 1}^{T}, \dots, X_{0, m}^{T})}^{T}.$ The mixture model used for testing parts of $β$ is then $M_{ω}^{β} : y \sim ω N_{N} (X_{0} β + Zb, σ^{2} \tilde{V}) + (1 - ω) N_{N} (Xβ + Zb, σ^{2} \tilde{V}),$ (19) where $\tilde{V} = \mathrm{diag} (v_{1}^{2} 1_{n_{1}}^{T}, \dots, v_{m}^{2} 1_{n_{m}}^{T}),$ so that the block for individual i is $v_{i}^{2} I_{n_{i}}.$ The corresponding structure for hypotheses (10) is $M_{ω}^{μ} : y \sim ω N_{N} (1α, σ^{2} \tilde{V}) + (1 - ω) N_{N} (μ 1_{N} + 1α, σ^{2} \tilde{V}).$ (20)

Having settled the likelihood formulation, we need the priors. We consider conjugate prior distributions, conditional on the scale mixture parameters, so that the priors for the general model (14) and its special case (16) are $μ | σ^{2} \sim N (μ_{0}, σ^{2} σ_{μ}^{2}), \quad β | σ^{2} \sim N_{p} (μ_{β}, σ^{2} Σ_{β}), \quad σ^{2} \sim \frac{σ_{0}}{χ_{ν}^{2}}, \quad τ \sim \frac{τ_{0}}{χ_{η}^{2}}, \quad Ψ \sim W^{- 1} (ξ, Ψ_{0}),$ (21) where $χ_{ν}^{2}$ denotes the $χ^{2}$ distribution with ν degrees of freedom and $W^{- 1} (ξ, Ψ_{0})$ denotes the inverse Wishart distribution with ξ degrees of freedom and scale matrix $Ψ_{0}.$ Moreover, the prior distributions of the kurtosis and mixture parameters are $κ \sim U (0, 1) \text{ and } ω \sim \mathrm{Beta} (a_{1}, a_{2}).$ (22)

To sample from the posterior of (19), latent indicators $z_{1}, \dots, z_{m}$ are utilized such that $z_{i} \in \{0, 1\},$ with $p (z_{i} = 0 | y, X, Z, θ_{β} \setminus z_{i}) = ω$ and $p (z_{i} = 1 | y, X, Z, θ_{β} \setminus z_{i}) = 1 - ω,$ where $θ_{β} = \{β, b, D, z, ω, v, σ^{2}, κ\}$ and $\setminus$ denotes the set-theoretic difference. The likelihood augmented by the latent indicators is $p (y, z | X, Z, θ_{β} \setminus z) = \prod_{i = 1}^{m} \left(ω N_{n_{i}} (y_{i}; X_{0, i} β + Z_{i} b_{i}, σ^{2} v_{i}^{2} I_{n_{i}})\right)^{z_{i 0}} \left((1 - ω) N_{n_{i}} (y_{i}; X_{i} β + Z_{i} b_{i}, σ^{2} v_{i}^{2} I_{n_{i}})\right)^{z_{i 1}},$ where $z_{i 0} = 1_{0} (z_{i})$ and $z_{i 1} = 1_{1} (z_{i}),$ with $1_{a} (\cdot)$ denoting the indicator function of the value a. Similarly, the likelihood of (20) augmented by the latent component indicators is $p (y, z | X, Z, θ_{μ} \setminus z) = \prod_{i = 1}^{m} \left(ω N_{n_{i}} (y_{i}; α_{i} 1_{n_{i}}, σ^{2} v_{i}^{2} I_{n_{i}})\right)^{z_{i 0}} \left((1 - ω) N_{n_{i}} (y_{i}; (μ + α_{i}) 1_{n_{i}}, σ^{2} v_{i}^{2} I_{n_{i}})\right)^{z_{i 1}},$ where $θ_{μ} = \{μ, α, d, z, ω, v, σ^{2}, κ\}.$ The posterior distributions of (19) and (20) are thus given by $p (θ_{β} | y, X, Z) \propto p (y, z | X, Z, θ_{β} \setminus z) \, p (b | y, X, Z, θ_{β} \setminus b) \prod_{i = 1}^{m} h_{κ} (v_{i}) \, π (β | σ^{2}) π (σ^{2}) π (D) π (κ) π (ω),$ (23) $p (θ_{μ} | y) \propto p (y, z | X, Z, θ_{μ} \setminus z) \, p (α | y, X, Z, θ_{μ} \setminus α) \prod_{i = 1}^{m} h_{κ} (v_{i}) \, π (μ | σ^{2}) π (σ^{2}) π (d) π (κ) π (ω).$ (24)

3.2. Sampling from the mixture of hypotheses

To sample from (23) we consider an extension of the partially collapsed Gibbs (PCG) sampler outlined in Park and Min (2016), which is based on normality and has no mixture level. The MCMC sampler, outlined in Sampler 1, is designed to block $(σ^{2}, b, β),$ resulting in faster mixing.

Sampler 1: LMM.

Steps 1 to 3 are a generalization of a blocked sample from $p (σ^{2}, β, b | y, X, Z, θ_{β} \setminus \{σ^{2}, b, β\}) = p (σ^{2} | y, X, Z, θ_{β} \setminus \{σ^{2}, b, β\}) \, p (β | y, X, Z, θ_{β} \setminus \{b, β\}) \, p (b | y, X, Z, θ_{β} \setminus b)$ by partial collapsing, provided that the internal order of the steps is not changed. Partial collapsing is achieved by marginalization, permutation and trimming (van Dyk and Park 2008). Denoting $X_{i; z_{i}} = z_{i 0} X_{0, i} + z_{i 1} X_{i}$ and $X_{z} = {(X_{1; z_{1}}^{T}, \dots, X_{m; z_{m}}^{T})}^{T},$ the conditional distributions of steps 1 to 3 are given by $σ^{2} | y, X, Z, θ_{β} \setminus \{σ^{2}, b, β\} \sim \left(ν σ_{0}^{2} + {(\hat{β} - μ_{β})}^{T} Σ_{β}^{- 1} (\hat{β} - μ_{β}) + tr (D^{- 1} Ψ_{0}) + {(y - X_{z} \hat{β})}^{T} Σ^{- 1} (y - X_{z} \hat{β})\right) \frac{1}{χ_{ν + N + q η}^{2}},$ (25) $β | y, X, Z, θ_{β} \setminus \{b, β\} \sim N_{p} \left(\hat{β}, σ^{2} {(Σ_{β}^{- 1} + X_{z}^{T} Σ^{- 1} X_{z})}^{- 1}\right),$ (26) $b | y, X, Z, θ_{β} \setminus b \sim N_{m q} \left(\hat{b}, σ^{2} \left((V \otimes D) - (V \otimes D) Z^{T} Σ^{- 1} Z (V \otimes D)\right)\right),$ (27) where $\hat{β} = {(Σ_{β}^{- 1} + X_{z}^{T} Σ^{- 1} X_{z})}^{- 1} (Σ_{β}^{- 1} μ_{β} + X_{z}^{T} Σ^{- 1} y), \quad \hat{b} = (V \otimes D) Z^{T} Σ^{- 1} (y - X_{z} β).$ The conditional distributions of steps 4 to 6 are given by $D | y, X, Z, θ_{β} \setminus D \sim W^{- 1} (η + m, σ^{- 2} (Ψ_{0} + b b^{T})),$ (28) $z_{i} | X_{i}, θ_{β} \setminus z_{i} \sim \mathrm{Ber} \left({\left(1 + \frac{ω N_{n_{i}} (y_{i}; X_{0, i} β + Z_{i} b_{i}, σ^{2} v_{i}^{2} I_{n_{i}})}{(1 - ω) N_{n_{i}} (y_{i}; X_{i} β + Z_{i} b_{i}, σ^{2} v_{i}^{2} I_{n_{i}})}\right)}^{- 1}\right),$ (29) $ω | y, θ_{β} \setminus ω \sim \mathrm{Beta} \left(a_{1} + \sum_{i = 1}^{m} z_{i 0}, a_{2} + \sum_{i = 1}^{m} z_{i 1}\right),$ (30) where $\mathrm{Ber} (p)$ denotes the Bernoulli distribution with mean p. The form of (30) follows from combining the $\mathrm{Beta} (a_{1}, a_{2})$ prior in (22) with the factors $ω^{z_{i 0}} {(1 - ω)}^{z_{i 1}}$ of the augmented likelihood. Note that the specification of (19) means that values of ω closer to 1 imply that H₀ is more likely, and vice versa, as can be seen in (30).
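The two mixture-level steps can be sketched in a few lines of Python. The sketch assumes that the log-densities of each $y_{i}$ under the null and alternative components have already been evaluated at the current parameter values, and it derives the weight update directly from the augmented likelihood, in which observations with $z_{i} = 0$ contribute a factor ω, combined with a $\mathrm{Beta} (a_{1}, a_{2})$ prior. The function names and the NumPy-based interface are assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np


def update_indicators(loglik_null, loglik_alt, omega, rng):
    """Draw z_i in {0, 1}; z_i = 1 assigns y_i to the alternative component."""
    log_w0 = np.log(omega) + np.asarray(loglik_null, dtype=float)
    log_w1 = np.log1p(-omega) + np.asarray(loglik_alt, dtype=float)
    # P(z_i = 1) = w1 / (w0 + w1), written as a stable logistic transform.
    prob_alt = 0.5 * (1.0 + np.tanh(0.5 * (log_w1 - log_w0)))
    return (rng.random(prob_alt.shape) < prob_alt).astype(int)


def update_weight(z, a1, a2, rng):
    """Draw omega from its Beta full conditional given the indicators."""
    z = np.asarray(z)
    n_alt = int(z.sum())      # observations assigned to H1 (z_i = 1)
    n_null = z.size - n_alt   # observations assigned to H0 (z_i = 0)
    return rng.beta(a1 + n_null, a2 + n_alt)


rng = np.random.default_rng(1)
# Hypothetical per-individual log-densities, for illustration only.
z = update_indicators([-3.0, -2.5, -4.0], [-3.2, -6.0, -4.1], omega=0.5, rng=rng)
print(z, update_weight(z, 0.5, 0.5, rng))
```

When most indicators equal 0, the draw of ω concentrates near 1, in line with the interpretation that large values of ω favor H₀.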

The conditional distribution of $v_{i}$ is given by $p (v_{i} | y_{i}, X_{i}, Z_{i}, θ_{β} \setminus v_{i}) \propto h_{κ} (v_{i}) N_{n_{i}} (y_{i}; X_{i; z_{i}} β + Z_{i} b_{i}, v_{i}^{2} σ^{2} I_{n_{i}}),$ (31) which does not have a known normalizing constant, so $v_{i}$ is sampled by a Metropolis-Hastings (MH) step. As in Gómez, Gómez-Villegas, and Marín (2008), we utilize the transformation $w_{i} = 2^{\frac{1}{κ} - 1} v_{i}^{- 2},$ which has the conditional distribution $p (w_{i} | y_{i}, X_{i}, Z_{i}, θ_{β} \setminus v_{i}) \propto w_{i}^{- \frac{n_{i}}{2}} S (w_{i}; κ, 1, γ_{κ}^{*}, 1, δ_{κ}^{*}) N_{n_{i}} (y_{i}; X_{i; z_{i}} β, 2^{1 - \frac{1}{κ}} w_{i}^{- 1} Σ_{i}),$ (32) where $γ_{κ}^{*} = \cos (\frac{π κ}{2})$ and $δ_{κ}^{*} = γ_{κ}^{*} \tan (\frac{π κ}{2}).$ By generating proposals independently of the previous state as $w'_{i} \sim S (\cdot; κ, 1, γ_{κ}^{*}, 1, δ_{κ}^{*}),$ the stable densities cancel out in the acceptance probability, given by $ψ (w_{i}, w'_{i}) = \min \left(1, \exp \left\{\frac{w_{i} - w'_{i}}{2 σ^{2}} {(y_{i} - X_{i; z_{i}} β - Z_{i} b_{i})}^{T} (y_{i} - X_{i; z_{i}} β - Z_{i} b_{i})\right\}\right).$ (33) The conditional density of the kurtosis parameter is given by $p (κ | θ_{β} \setminus κ) \propto \prod_{i = 1}^{m} \frac{2^{1 + \frac{n_{i}}{2} (1 - \frac{1}{κ})} Γ (1 + \frac{n_{i}}{2})}{Γ (1 + \frac{n_{i}}{2 κ})} v_{i}^{n_{i} - 3} S (v_{i}^{- 2}; κ, 1, γ_{κ}, δ_{κ}),$ (34) where $γ_{κ}$ and $δ_{κ}$ are defined as in (8). The normalizing constant of (34) is not known, and κ is thus sampled by an MH step with proposals generated by a normal random walk truncated to $[0, 1]$ with standard deviation ϵ. The acceptance probability of a proposed $κ'$ is given by $ψ (κ, κ') = \frac{p (κ' | θ_{μ} \setminus κ) \left(Φ (\frac{1 - κ}{ϵ}) - Φ (\frac{- κ}{ϵ})\right)}{p (κ | θ_{μ} \setminus κ) \left(Φ (\frac{1 - κ'}{ϵ}) - Φ (\frac{- κ'}{ϵ})\right)},$ (35) where $Φ$ denotes the standard normal cumulative distribution function. The conditional distribution (34) is unimodal and very peaked around its mode.
A slice sampler is more suitable but is extremely time-consuming because of the repeated evaluations of the stable density function. An MH step has proved sufficient in our applications and has been compared with a slice sampler. The peakedness of (34) means that posterior samples of κ are largely determined by the likelihood rather than the prior distribution. Replacing the uniform prior by a more informative one has little impact on the posterior distribution. Mitigating this by a truncated prior generally leads to a situation similar to fixing the kurtosis parameter at either the upper or lower bound of the truncation set.
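The truncated random-walk step for κ, including the ratio of normal cumulative distribution functions in (35), can be sketched generically as follows. The argument `log_target` stands in for the logarithm of (34), which requires the stable density and is not implemented here; all function names are hypothetical, and the acceptance ratio is capped at 1 as usual for MH.

```python
import math
import random


def std_normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def truncation_mass(center, eps):
    """Mass that N(center, eps^2) places on [0, 1]."""
    return std_normal_cdf((1.0 - center) / eps) - std_normal_cdf(-center / eps)


def propose_truncated(center, eps, rng):
    """Draw from N(center, eps^2) truncated to (0, 1) by rejection."""
    while True:
        x = rng.gauss(center, eps)
        if 0.0 < x < 1.0:
            return x


def mh_step_kappa(kappa, log_target, eps, rng):
    """One MH update of kappa with a truncated normal random-walk proposal."""
    prop = propose_truncated(kappa, eps, rng)
    # Target ratio times the proposal correction, as in (35).
    log_ratio = (log_target(prop) - log_target(kappa)
                 + math.log(truncation_mass(kappa, eps))
                 - math.log(truncation_mass(prop, eps)))
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return prop
    return kappa


# Toy run with a flat stand-in target on (0, 1), for illustration only.
rng = random.Random(1)
kappa, chain = 0.5, []
for _ in range(5000):
    kappa = mh_step_kappa(kappa, lambda k: 0.0, 0.3, rng)
    chain.append(kappa)
print(sum(chain) / len(chain))
```

The correction term is needed because the truncated proposal is not symmetric: a state close to the boundary loses more proposal mass to truncation than a state in the interior.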

A similar MCMC algorithm for the posterior distribution of the one-way ANOVA in (24) is outlined in Sampler 2.

Sampler 2: one-way-ANOVA.

As for Sampler 1, steps 1, 2 and 3 form a blocked sample from $(μ, α, σ^{2})$ through partial collapsing, provided that their internal order is unchanged. Their conditional distributions are $σ^{2} | y, θ_{μ} \setminus \{σ^{2}, μ, α\} \sim \frac{1}{χ_{N + η + ν}^{2}} \left(d^{- 1} η τ_{0} + ν σ_{0} + \frac{{(μ_{0} - \tilde{μ})}^{2}}{σ_{μ}^{2}} + \sum_{i = 1}^{m} {(y_{i} - \tilde{μ} z_{i 1} 1_{n_{i}})}^{T} {(v_{i}^{2} {\tilde{Σ}}_{i})}^{- 1} (y_{i} - \tilde{μ} z_{i 1} 1_{n_{i}})\right),$ (36) $μ | y, θ_{μ} \setminus \{μ, α\} \sim N \left(\tilde{μ}, σ^{2} {\left(\sum_{i = 1}^{m} z_{i 1} 1_{n_{i}}^{T} {(v_{i}^{2} {\tilde{Σ}}_{i})}^{- 1} 1_{n_{i}} + σ_{μ}^{- 2}\right)}^{- 1}\right),$ (37) $α | y, θ_{μ} \setminus α \sim N_{m} \left(d V 1^{T} {\tilde{Σ}}^{- 1} (y - 1 μ), σ^{2} (d V - d^{2} V 1^{T} {\tilde{Σ}}^{- 1} 1V)\right),$ (38) where $\tilde{μ} = {\left(\sum_{i = 1}^{m} z_{i 1} 1_{n_{i}}^{T} {(v_{i}^{2} {\tilde{Σ}}_{i})}^{- 1} 1_{n_{i}} + σ_{μ}^{- 2}\right)}^{- 1} \left(\sum_{i = 1}^{m} z_{i 1} 1_{n_{i}}^{T} {(v_{i}^{2} {\tilde{Σ}}_{i})}^{- 1} y_{i} + σ_{μ}^{- 2} μ_{0}\right).$ Furthermore, ${\tilde{Σ}}_{i}^{- 1} = I_{n_{i}} - 1_{n_{i}} 1_{n_{i}}^{T} \frac{d}{1 + n_{i} d},$ as ${\tilde{Σ}}_{i}$ is compound symmetric. The derivations of (36), (37) and (38) are outlined in the Appendix.
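The closed-form inverse of the compound symmetric matrix ${\tilde{Σ}}_{i} = d J_{n_{i}} + I_{n_{i}}$ avoids a numerical matrix inversion in every iteration. A short numerical check, with the hypothetical helper name `cs_inverse` and arbitrary example values of n and d, is:

```python
import numpy as np


def cs_inverse(n, d):
    """Closed-form inverse of d * J_n + I_n for a compound symmetric matrix."""
    return np.eye(n) - np.ones((n, n)) * d / (1.0 + n * d)


# Arbitrary example values, for illustration only.
n, d = 4, 0.7
Sigma_tilde = d * np.ones((n, n)) + np.eye(n)
print(np.allclose(Sigma_tilde @ cs_inverse(n, d), np.eye(n)))
```

The identity is an instance of the Sherman-Morrison formula applied to the rank-one update $I_{n} + d 1_{n} 1_{n}^{T}.$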

The conditional distributions of d, $z_{i}$ and ω are given by $d | y, θ_{μ} \setminus d \sim σ^{- 2} (τ_{0} + α^{T} α) \frac{1}{χ_{η + m}^{2}},$ (39) $z_{i} | y, θ_{μ} \setminus z_{i} \sim \mathrm{Ber} \left({\left(1 + \frac{ω N_{n_{i}} (y_{i}; α_{i} 1_{n_{i}}, σ^{2} v_{i}^{2} I_{n_{i}})}{(1 - ω) N_{n_{i}} (y_{i}; (μ + α_{i}) 1_{n_{i}}, σ^{2} v_{i}^{2} I_{n_{i}})}\right)}^{- 1}\right),$ (40) $ω | y, θ_{μ} \setminus ω \sim \mathrm{Beta} \left(a_{1} + \sum_{i = 1}^{m} z_{i 0}, a_{2} + \sum_{i = 1}^{m} z_{i 1}\right),$ (41) where (41) combines the $\mathrm{Beta} (a_{1}, a_{2})$ prior in (22) with the factors $ω^{z_{i 0}} {(1 - ω)}^{z_{i 1}}$ of the augmented likelihood.

The scale mixture weight is sampled by the same procedure as in Sampler 1, with acceptance probability $ψ (w_{i}, w'_{i}) = \min \left(1, \exp \left\{\frac{w_{i} - w'_{i}}{2 σ^{2}} {(y_{i} - 1_{n_{i}} (α_{i} + z_{i 1} μ))}^{T} (y_{i} - 1_{n_{i}} (α_{i} + z_{i 1} μ))\right\}\right).$ (42)

The kurtosis parameter is sampled as in Sampler 1, with acceptance probability (35).

3.3. Simulation study

To compare the performance of the mixture test for varying values of the kurtosis parameter, we consider settings similar to those of Pinheiro, Liu, and Wu (2001) and Yavuz and Arslan (2018), with focus on the LMM and its mixture representation in (19). The kurtosis parameter is treated as a hyper-parameter by using one-point distributions as its prior. Data are simulated as $y_{i} = X_{i} β + 1_{4} b_{i} + e_{i}, \quad i = 1, \dots, 20,$ (43) where $X_{i}$ and $β$ are defined as $X_{i} = {\begin{bmatrix} 1 & 1 & 1 & 1 \\ 8 & 10 & 12 & 14 \end{bmatrix}}^{T} \text{ and } β = \begin{bmatrix} 20 \\ 0.5 \end{bmatrix},$ and the null hypothesis is that the second element of $β$ is 0. The random effects and errors in (43) are distributed as $b_{i} \overset{iid}{\sim} N (0, σ^{2} d), \quad e_{i} \overset{iid}{\sim} N_{4} (0, σ^{2} I_{4}).$ To contaminate the data with outliers, $b_{i}$ and $e_{i}$ are expressed as mixtures $b_{i} \sim (1 - p_{b}) N (0, σ^{2} d) + p_{b} f N (0, σ^{2} d), \quad e_{i} \sim (1 - p_{e}) N_{4} (0, σ^{2} I_{4}) + p_{e} f N_{4} (0, σ^{2} I_{4}),$ where f is a constant that scales one of the mixture components, inflating the variance of that component by the factor $f^{2}.$ The resulting covariance of $y_{i}$ is $Var (y_{i}) = σ^{2} \left((1 + (f^{2} - 1) p_{b}) J_{4} + (1 + (f^{2} - 1) p_{e}) I_{4}\right).$
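The data-generating mechanism in (43) can be reproduced with a short simulation. The sketch below draws contaminated random effects and errors and compares the empirical covariance of $y_{i}$ with the expression above. The values $σ^{2} = 1$ and $d = 1$ are assumptions made here for illustration, since they are not stated in the text; the covariance helper includes d explicitly and reduces to the displayed formula when d = 1. The function names are hypothetical.

```python
import numpy as np


def simulate_contaminated(m=20, sigma2=1.0, d=1.0, f=6.0, p_b=0.0, p_e=0.2,
                          beta=(20.0, 0.5), rng=None):
    """Simulate m individuals from (43) with outlier contamination."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.column_stack([np.ones(4), [8.0, 10.0, 12.0, 14.0]])
    # With probability p_b (p_e) the random effect (error vector) of an
    # individual is scaled by f, which inflates its variance by f^2.
    scale_b = np.where(rng.random(m) < p_b, f, 1.0)
    scale_e = np.where(rng.random(m) < p_e, f, 1.0)
    b = scale_b * rng.normal(0.0, np.sqrt(sigma2 * d), size=m)
    e = scale_e[:, None] * rng.normal(0.0, np.sqrt(sigma2), size=(m, 4))
    y = X @ np.asarray(beta) + b[:, None] + e   # one row per individual
    return y, X


def theoretical_cov(sigma2=1.0, d=1.0, f=6.0, p_b=0.0, p_e=0.2):
    """Covariance of y_i implied by the contamination mixtures."""
    return sigma2 * (d * (1.0 + (f ** 2 - 1.0) * p_b) * np.ones((4, 4))
                     + (1.0 + (f ** 2 - 1.0) * p_e) * np.eye(4))


y, X = simulate_contaminated(m=100000, rng=np.random.default_rng(0))
print(np.round(np.cov(y, rowvar=False), 2))
print(theoretical_cov())
```

A large m is used here only to make the empirical covariance stable; the simulation study itself uses m = 20 individuals per data set.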

Results are based on 500 Monte Carlo simulations for all combinations of $f \in \{2, 4, 6\}, κ \in \{0.6, 0.7, 0.8, 0.9, 1\}$ and $p_{e} \in \{0, 0.1, 0.2\}$ with $p_{b} = 0,$ as well as a worst case scenario with $p_{b} = 0.2,$ f = 6 and $p_{e} = 0.2$ for all κ. We estimate the mixture weight by the posterior median based on 10⁴ samples from Sampler 1, with a transient phase of 10³ iterations. The median is considered rather than the mean as the posterior of ω generally concentrates on the boundaries of its support (Kamary 2016). The hyper-parameters are set as $β | σ^{2} \sim N_{2} (0, N σ^{2} {(X^{T} X)}^{- 1}), \quad d \sim \frac{1}{χ_{1}^{2}}, \quad σ^{2} \sim \frac{1}{χ_{1}^{2}} \text{ and } ω \sim \mathrm{Beta} (0.5, 0.5).$

The results with no outliers in the random effects are shown in Figure 3. The effect of the kurtosis parameter on the posterior median of the mixture weight is negligible for scaling factor f = 2. For scaling factors 4 and 6, the effect of an increasing percentage of outliers is clearer. With a higher kurtosis parameter, i.e., closer to the normal distribution, choosing the correct model becomes less probable as $p_{e}$ increases. The difference is clearest with scaling factor 6, where the effect of increasing $p_{e}$ is quite small for $κ = 0.6$ but drastic for κ = 1. The results with $p_{b} = p_{e} = 0.2$ and f = 6 are displayed in Figure 2, and are nearly identical to those for the setting $p_{b} = 0$ and f = 6. Overall, from Figures 2 and 3, we see that inference based on a lower kurtosis parameter is less affected by increasing contamination by outliers. Further, with no or low contamination by outliers, there is little difference in model choice between the different values of the kurtosis parameter.

Figure 2. Posterior median of the mixture weight for each κ with $p_{e} = 0.2, p_{b} = 0.2$ and f = 6.

Figure 3. Posterior median of the mixture weight for each combination of κ, p_e and f with p_b = 0.

3.4. Applications

3.4.1. Example I

The dataset found in Mid-Michigan Medical Center (1999) consists of repeated measurements of oral condition for 23 cancer patients. Each patient is randomly assigned to either a treatment or a placebo group, where the treated patients received aloe juice treatment. For each individual, an initial measurement is taken, with repeated measurements at weeks 2, 4 and 6. Other variables are the initial age, weight and cancer stage of the patients. The data are displayed in Figure 4.

Figure 4. Cancer data.

The model of interest is defined as $y_{i j} = β_{0} + b_{1, i} + (β_{1} + b_{2, i}) \, \mathrm{week}_{i j} + β_{2} \, \mathrm{treatment}_{i} + β_{3} \, \mathrm{age}_{i} + β_{4} \, \mathrm{weight}_{i} + e_{i j}, \quad i = 1, \dots, 23, \quad j = 1, \dots, 4,$ (44) where cancer stage has been omitted because its inclusion led to issues in the frequentist estimation of the random effects covariance matrix; this omission did not affect the results in terms of model choice. For testing parts of (44), we consider the null $H_{0} : (β_{1}, β_{2})' = 0$ against $H_{1} : (β_{1}, β_{2})' \neq 0.$ The hyper-parameters are set as $β | σ^{2} \sim N_{5} (0, σ_{β}^{2} I_{5}), \quad D \sim W^{- 1} (2, I_{2}), \quad σ^{2} \sim \frac{1}{χ_{1}^{2}}, \quad ω \sim \mathrm{Beta} (\frac{1}{2}, \frac{1}{2}) \text{ and } κ \sim U (0, 1),$ where results are compared for $σ_{β}^{2} = 100$ and 10. All results are based on 10⁴ iterations of Sampler 1, with a burn-in phase of $5 \times 10^{3}$ iterations. The trace and posterior densities of the mixture weight are displayed in Figure 5, where the posterior median is estimated as 0.986 for $σ_{β}^{2} = 100$ and 0.934 for $σ_{β}^{2} = 10.$ The null is therefore favored with both settings of $σ_{β}^{2},$ but the difference is quite large based on Figure 5, with far fewer jumps between models for $σ_{β}^{2} = 100.$ Posterior samples of the kurtosis parameter, the treatment and week fixed effect coefficients and the between subject variance are displayed in Figure 6. The posterior mean of the kurtosis parameter is estimated as 0.91 for both values of $σ_{β}^{2},$ and this result is not sensitive to changes in the hyper-parameters of the prior on κ. Both the treatment and week coefficients are generally sampled from the prior with $σ_{β}^{2} = 100.$ For $σ_{β}^{2} = 10,$ the posterior means of the treatment and week coefficients, based on the iterations where one or more observations are assigned to the mixture component corresponding to the alternative hypothesis, are estimated as −0.406 and 0.61, respectively.
For the between subjects variance, $σ^{2},$ the posterior means with $σ_{β}^{2} = 100$ and $σ_{β}^{2} = 10$ are estimated as 8.72 and 8.02, respectively. The increased oscillation of the mixture weights with $σ_{β}^{2} = 10$ can also be seen in the trace plots of the variance term in Figure 6, when compared with the higher value of $σ_{β}^{2}.$ The posterior means of the elements of D for $σ_{β}^{2} = 100$ and $σ_{β}^{2} = 10$ are estimated as ${\hat{D}}_{σ_{β}^{2} = 100} = \begin{bmatrix} 0.112 & - 0.016 \\ - 0.016 & 0.053 \end{bmatrix}, \quad {\hat{D}}_{σ_{β}^{2} = 10} = \begin{bmatrix} 0.118 & - 0.014 \\ - 0.014 & 0.052 \end{bmatrix},$ and are thus largely unaffected by the increased assignment of observations to the alternative hypothesis component with $σ_{β}^{2} = 10.$ In Figure 7, the random effects, i.e., $b_{1, i}$ and $b_{2, i}$ in (44), are compared for the patients with IDs 6 and 14, which are marked as gray lines in Figure 4. Patient 6 received treatment and thus corresponds to the dashed gray line, whilst patient 14 did not and thus corresponds to the solid gray line. There is not much difference between the posterior samples of the random effects for the different values of $σ_{β}^{2}.$

Figure 5. Posterior sample of the mixture weight for the hypotheses of no treatment effect.

Figure 6. Trace plots of the kurtosis parameter, treatment and week fixed effect coefficients, error variance and elements of the random effects covariance based on Sampler 1.

Figure 7. Posterior samples of the random effects of individuals with id 6 and 14.

Figure 8. Posterior density of the mixture weights with the kurtosis parameter treated as a hyper-parameter.

In Figure 8, the posterior densities of the mixture weights are compared when fixing the kurtosis parameter to 0.6 and 1. In all settings, the null is favored. The evidence for the null is slightly weaker with a lower kurtosis for both $σ_{β}^{2} = 10$ and 100.

To compare the models in a frequentist setting based on the normal likelihood, the models stipulated by the null and alternative hypotheses are estimated using the lme4 (Bates et al. 2015) package in R (R Core Team 2020) and compared using the Akaike information criterion (AIC). The AIC is 436.15 for the unrestricted model (44), where the treatment and week fixed effects are estimated as −0.622 and 0.535 respectively. The AIC for the restricted model is 444.69. Thus, the model under the alternative hypothesis is favored in a standard frequentist setting.
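The AIC comparison above amounts to a difference of 8.54 in favor of the unrestricted model. The short sketch below reproduces that arithmetic from the two reported values. The relative likelihood $\exp(- Δ / 2)$ is a common way to read an AIC difference; it is added here as an illustration and is not part of the reported analysis.

```python
import math

# AIC values reported in the text for Example I.
aic_unrestricted = 436.15  # model (44), alternative hypothesis
aic_restricted = 444.69    # treatment and week fixed effects removed, null hypothesis

delta_aic = aic_restricted - aic_unrestricted
# Relative likelihood of the higher-AIC model with respect to the lower-AIC one.
relative_likelihood = math.exp(-delta_aic / 2.0)

print(f"AIC difference: {delta_aic:.2f}")
print(f"relative likelihood of the restricted model: {relative_likelihood:.4f}")
```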

3.4.2. Example II

The dataset considered in Example 3.3 in Crowder and Hand (1990) consists of measurements of plasma ascorbic acid (PAA) for twelve patients who underwent a dietary regime treatment, with measurements taken at weeks 1, 2, 6, 10, 14, 15 and 16. The first two weeks consist of pretreatment measurements, the last two of post-treatment measurements, and the measurements in the remaining weeks were taken during treatment. The data are displayed in Figure 9.

Figure 9. PAA data.

To test whether the means for the pretreatment, treatment and post-treatment periods are equal, the data are transformed by taking the mean within each period. Letting ${\tilde{y}}_{i}$ be the vector of 7 measurements for patient i, the data are transformed as $y_{i} = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & \frac{1}{3} & \frac{1}{3} & \frac{1}{3} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & \frac{1}{2} & \frac{1}{2} \end{pmatrix} {\tilde{y}}_{i}, \quad i = 1, \dots, 12,$ with the model specified as $y_{i j} = μ_{j} + α_{i} + e_{i j}, \quad i = 1, \dots, 12, \quad j = 1, 2, 3,$ (45) where $μ_{j}$ represents the average within the $j^{th}$ period. We shall first consider testing $H_{0} : μ_{1} = μ_{2} = μ_{3}$ vs. $H_{1} :$ not $H_{0},$ and subsequently $H_{0} : μ_{1} = μ_{3}$ vs. $H_{1} : μ_{1} \neq μ_{3} .$ To test equality over the periods, (45) is expressed as $y_{i j} = β_{1} + b_{1, i} + β_{2} \text{Treatment}_{i j} + β_{3} \text{Post-treatment}_{i j} + e_{i j}, \quad i = 1, \dots, 12, \quad j = 1, 2, 3,$ (46) with corresponding null hypothesis $(β_{2}, β_{3})' = 0,$ where $\text{Treatment}_{i j}$ is 1 if j = 2 and 0 otherwise, and similarly $\text{Post-treatment}_{i j}$ is 1 if j = 3 and 0 otherwise. The hyper-parameters are set as $β | σ^{2} \sim N_{3} (0, 10 σ^{2} I_{3}),$ $d \sim \frac{1}{χ_{1}^{2}},$ $σ^{2} \sim \frac{1}{χ_{1}^{2}},$ $ω \sim Beta (\frac{1}{2}, \frac{1}{2})$ and $κ \sim U (0, 1) .$ Results are based on $10^{4}$ iterations of Sampler 1 with a burn-in phase of $5 \cdot 10^{3}$ iterations, with some steps for observations sampled under the null model being identical to those of Sampler 2. The trace and posterior density of the mixture weight are displayed in Figure 10, with the posterior median estimated as 0.078, strongly favoring the alternative hypothesis. Posterior samples of the kurtosis, fixed effects, between subjects variance and random effects variance are displayed in Figure 11.
The posterior mean of the kurtosis parameter is estimated as 0.84, and the estimate is not sensitive to changes in the hyper-parameters for the prior on κ. The estimated posterior means of the intercept, treatment and post-treatment coefficients, based on iterations where one or more observations are assigned to the mixture component corresponding to the alternative hypothesis, are 0.546, 0.606 and 0.158 respectively. For the variance components, the posterior mean of the between subjects variance is estimated as 0.117, and the posterior mean of the random effects variance, d, is estimated as 0.921. In Figure 12, the random intercepts $b_{1, i}$ are displayed for individuals 2 and 10, which are identified by dashed lines in Figure 9, where patient 10 corresponds to the dashed line with an initial measurement near 1.5. The posterior means of the random effects for patients 2 and 10 are −0.112 and 0.296 respectively.
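The averaging step that maps the seven measurements of a patient to the three period means can be sketched as below. The matrix is the one given in the text; the measurement vector is made up for illustration and is not taken from the PAA data, and the name `period_means` is hypothetical.

```python
# Averaging matrix from the text: rows are pretreatment, treatment and post-treatment.
A = [
    [1 / 2, 1 / 2, 0, 0, 0, 0, 0],
    [0, 0, 1 / 3, 1 / 3, 1 / 3, 0, 0],
    [0, 0, 0, 0, 0, 1 / 2, 1 / 2],
]


def period_means(measurements):
    """Map the 7 measurements of one patient to the 3 period means."""
    if len(measurements) != 7:
        raise ValueError("expected 7 measurements (weeks 1, 2, 6, 10, 14, 15, 16)")
    return [sum(a * y for a, y in zip(row, measurements)) for row in A]


# Made-up measurements for a single patient (assumption, not from the data set).
y_tilde = [0.4, 0.6, 1.0, 1.2, 1.1, 0.5, 0.3]
print(period_means(y_tilde))
```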

Figure 10. Posterior sample of the mixture weight for the hypotheses of the fixed effect equal to 0.5.

Figure 11. Trace plots of the kurtosis parameter, treatment coefficient and error and random effects variance based on Sampler 2.

Figure 12. Posterior samples of random coefficients for individuals with id 1 and 20.

In Figure 13, the posterior densities of the mixture weights are compared when fixing the kurtosis parameter to 0.6 and 1. The posterior median is estimated as 0.046 for the normal case, and 0.102 for $κ = 0.6 .$ The alternative hypothesis is thus strongly favored in both cases.

Figure 13. Posterior density of the mixture weights with the kurtosis parameter treated as a hyper-parameter.

We now consider testing whether the pretreatment levels of PAA are equal to the post-treatment levels, i.e., $H_{0} : μ_{1} = μ_{3}$ vs. $H_{1} : μ_{1} \neq μ_{3}$ in (45). The test is carried out based on model (46) with null hypothesis $β_{3} = 0 .$ For this case we only compare $κ = 0.6$ against $κ = 1,$ with the posterior densities of the mixture weights shown in Figure 14. The difference between the posterior densities is larger than in Figure 13, with the posterior median estimated as 0.90 for the normal case and 0.772 for the kurtosis parameter set to 0.6. The difference in terms of posterior medians is thus large, although both cases tend to favor the null.

Figure 14. Posterior density of the mixture weights with the kurtosis parameter treated as a hyper-parameter.

For the frequentist model comparison, we consider tests based on Hotelling’s $T^{2}$ as in Example 4.2 in Crowder and Hand (1990). To test $μ_{1} = μ_{2} = μ_{3},$ the general null hypothesis $Hμ = 0$ is considered, where $μ = (μ_{1}, μ_{2}, μ_{3})'$ and $H = \begin{pmatrix} 1 & - 1 & 0 \\ 1 & 0 & - 1 \end{pmatrix} .$ When testing equality of pretreatment and post-treatment, H is set to the row vector $H = (1, 0, - 1) .$ For both cases, the test statistic is defined as $T^{2} = m {\bar{y}}^{T} H^{T} {(HS H^{T})}^{- 1} H \bar{y},$ (47) where $\bar{y}$ is the sample mean vector with elements ${\bar{y}}_{j} = n^{- 1} \sum_{i = 1}^{m} y_{i j},$ S is the sample covariance matrix with elements $S_{j k} = {(n - 1)}^{- 1} \sum_{i = 1}^{m} (y_{i j} - {\bar{y}}_{j}) (y_{i k} - {\bar{y}}_{k}),$ and $n = m = 12$ is the number of patients. With the standard assumption of normality, $T^{2} \sim F_{q, n - q}$ under the null, where q is the number of rows of H and $F_{q, n - q}$ is the F distribution with q and $n - q$ degrees of freedom. For the test of equality over all periods, $T^{2} = 69.75,$ which is $F_{2, 10}$ distributed under the null, with a p-value of virtually 0. For testing equality between pretreatment and post-treatment, $T^{2} = 2.06,$ which is $F_{1, 11}$ distributed under the null, with p-value 0.18. Thus, model choice based on Hotelling’s $T^{2}$ agrees with our results.
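The statistic in (47) can be computed with a few lines of plain Python. The sketch below uses made-up data for five subjects, not the PAA measurements, and all function names are hypothetical. The helper `f_statistic` applies the usual textbook scaling $(n - q) / (q (n - 1))$ of $T^{2}$ to an $F_{q, n - q}$ variable, which equals one when q = 1; whether the values reported above already include this factor is an assumption left open here.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sample_covariance(rows):
    n = len(rows)
    ybar = mean_vector(rows)
    p = len(ybar)
    return [
        [sum((r[j] - ybar[j]) * (r[k] - ybar[k]) for r in rows) / (n - 1)
         for k in range(p)]
        for j in range(p)
    ]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    q = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(q)]
    for c in range(q):
        piv = max(range(c, q), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, q):
            f = aug[r][c] / aug[c][c]
            for k in range(c, q + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * q
    for r in reversed(range(q)):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, q))
        x[r] = (aug[r][q] - tail) / aug[r][r]
    return x


def hotelling_t2(rows, H):
    """T^2 = n (H ybar)' (H S H')^{-1} (H ybar) for contrast matrix H."""
    n = len(rows)
    ybar = mean_vector(rows)
    S = sample_covariance(rows)
    p = len(ybar)
    Hy = [sum(h[j] * ybar[j] for j in range(p)) for h in H]
    HS = [[sum(h[l] * S[l][k] for l in range(p)) for k in range(p)] for h in H]
    HSH = [[sum(hs[k] * h2[k] for k in range(p)) for h2 in H] for hs in HS]
    x = solve(HSH, Hy)
    return n * sum(a * b for a, b in zip(Hy, x))


def f_statistic(t2, n, q):
    """Textbook scaling of T^2 to an F(q, n - q) variable (assumption, see lead-in)."""
    return (n - q) / (q * (n - 1)) * t2


# Made-up period means for 5 subjects (assumption, not the PAA data).
data = [
    [0.5, 1.0, 0.4],
    [0.6, 1.3, 0.5],
    [0.3, 0.9, 0.5],
    [0.7, 1.1, 0.6],
    [0.4, 1.2, 0.3],
]
H_all = [[1, -1, 0], [1, 0, -1]]   # equality over all periods
H_pre_post = [[1, 0, -1]]          # pretreatment vs. post-treatment

t2_all = hotelling_t2(data, H_all)
t2_pre_post = hotelling_t2(data, H_pre_post)
print(t2_all, f_statistic(t2_all, len(data), 2))
print(t2_pre_post, f_statistic(t2_pre_post, len(data), 1))
```

For a single contrast the statistic reduces to the squared paired t statistic, which gives a simple check of the implementation.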

4. Conclusions

In this paper, the EPD class has been considered, in place of the standard normal assumption, in the context of Bayesian hypothesis testing of the fixed effects in LMMs for repeated measures. Tests have been carried out using a mixture representation rather than the traditional Bayes factor or posterior probability of a given hypothesis. In a simulation study, the kurtosis parameter is treated as a hyper-parameter to study its effect on model choice under increasing contamination by outliers. The main focus is on outliers in the error term, but outliers in the random effects are also considered. With no outliers in the random effects, results from the simulation study show that the EPD with a lower kurtosis than that of the normal distribution performs better in terms of consistently choosing the true model. Compared with the normal case, a kurtosis of 0.6 is much less affected by an increasing percentage of outliers and an increasing scaling factor. With increasing kurtosis, the sensitivity to outliers also increases. One notable result is that the difference between 0.6 and 0.7 is smaller than the difference between 0.9 and 1, both in terms of spread and average of the posterior median over the Monte Carlo replications. Increasing the outliers of the random effects had no effect on the result with the settings used in our simulation design.

When treating the kurtosis as a hyper-parameter in the applications to real data, we find in Example I that the normal case and $κ = 0.6$ generally agree, albeit with the mixture parameter concentrating more around 1, i.e., favoring the null, for κ = 1. This was the case for both hyper-parameters considered for the variance of the fixed effects. Similarly, for Example II, we find that the results generally agree when treating the kurtosis parameter as a hyper-parameter for the test of all periods. When testing only pretreatment and post-treatment however, the difference was larger for the different values of the kurtosis parameter. One notable difference between Examples I and II is that the kurtosis parameter was sampled much closer to 1, i.e., normal, in Example I. When comparing the results in Example I to model choice in a frequentist setting with the usual normal assumption, we found that the full model under the alternative was favored. For Example II, we found that the model choice based on Hotelling’s T² agreed with our results.

The results from the applications and simulations indicate that, when the normality assumption is in doubt, using the general EPD class rather than a single specific replacement distribution provides a flexible solution. For future research, the random effects could be included in the set of hypotheses in conjunction with extending to multiple hypotheses.

Acknowledgements

The authors are thankful to the editor, the associate editor and the two reviewers for their help and suggestions for the improvement of the article. Further, we gratefully acknowledge that the computations were enabled by resources provided by the Swedish National Infrastructure for Computing (SNIC) at UPPMAX partially funded by the Swedish Research Council through grant agreement no. 2018-05973.

References

Arellano-Valle, R. B., H. Bolfarine, and V. H. Lachos. 2005. Skew-normal linear mixed models. Journal of Data Science 3:415–38.
Bai, X., K. Chen, and W. Yao. 2016. Mixture of linear mixed models using multivariate t distribution. Journal of Statistical Computation and Simulation 86 (4):771–87. doi: 10.1080/00949655.2015.1036431.
Bates, D., M. Mächler, B. Bolker, and S. Walker. 2015. Fitting linear mixed-effects models using lme4. Journal of Statistical Software 67 (1):1–48. doi: 10.18637/jss.v067.i01.
Box, G. E. P., and G. C. Tiao. 1962. A further look at robustness via Bayes’s theorem. Biometrika 49 (3–4):419–32. doi: 10.2307/2333976.
Box, G. E. P., and G. C. Tiao. 1964. A Bayesian approach to the importance of assumptions applied to the comparison of variances. Biometrika 51 (1–2):153–67. doi: 10.2307/2334203.
Choy, S. T. B., and S. G. Walker. 2003. The extended exponential power distribution and Bayesian robustness. Statistics & Probability Letters 65 (3):227–32. doi: 10.1016/j.spl.2003.01.001.
Crowder, M. J., and D. J. Hand. 1990. Analysis of repeated measures. Volume 41 of Monographs on statistics and applied probability. London: Chapman & Hall.
Gómez, E., M. A. Gómez-Villegas, and J. M. Marín. 1998. Multivariate generalization of the power exponential family of distributions. Communications in Statistics - Theory and Methods 27 (3):589–600. doi: 10.1080/03610929808832115.
Gómez, E., M. A. Gómez-Villegas, and J. M. Marín. 2008. Multivariate exponential power distributions as mixtures of normal distributions with Bayesian applications. Communications in Statistics - Theory and Methods 37 (6):972–85. doi: 10.1080/03610920701762754.
Haro-López, R. A., and A. F. M. Smith. 1999. On robust Bayesian analysis for location and scale parameters. Journal of Multivariate Analysis 70 (1):30–56. doi: 10.1006/jmva.1999.1820.
Huggins, R. M. 1993. A robust approach to the analysis of repeated measures. Biometrics 49 (3):715–20. doi: 10.2307/2532192.
Kamary, K. 2016. Non-informative priors and modelization via mixture models. PhD thesis, PSL Research University.
Kass, R. E., and A. E. Raftery. 1995. Bayes factors. Journal of the American Statistical Association 90 (430):773–95. doi: 10.1080/01621459.1995.10476572.
Lange, K. L., R. J. A. Little, and J. M. G. Taylor. 1989. Robust statistical modeling using the t distribution. Journal of the American Statistical Association 84 (408):881–96. doi: 10.2307/2290063.
Lin, T. I., and J. C. Lee. 2008. Estimation and prediction in linear mixed models with skew-normal random effects for longitudinal data. Statistics in Medicine 27 (9):1490–507. doi: 10.1002/sim.3026.
Lindsey, J. K. 1999. Multivariate elliptically contoured distributions for repeated measurements. Biometrics 55 (4):1277–80. doi: 10.1111/j.0006-341x.1999.01277.x.
Mid-Michigan Medical Center. 1999. A study of oral condition of cancer patients. http://calcnet.mth.cmich.edu/org/spss/prj_cancer_data.htm.
Nolan, J. P. 1997. Numerical calculation of stable densities and distribution functions. Communications in Statistics. Stochastic Models 13 (4):759–74. doi: 10.1080/15326349708807450.
Park, T., and S. Min. 2016. Partially collapsed Gibbs sampling for linear mixed-effects models. Communications in Statistics - Simulation and Computation 45 (1):165–80. doi: 10.1080/03610918.2013.857687.
Pinheiro, J. C., C. H. Liu, and Y. N. Wu. 2001. Efficient algorithm for robust estimation in linear mixed-effects models using a multivariate t-distribution. Journal of Computational and Graphical Statistics 10 (2):249–76. doi: 10.1198/10618600152628059.
R Core Team. 2020. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
Rukhin, A. L., and A. Possolo. 2011. Laplace random effects models for interlaboratory studies. Computational Statistics & Data Analysis 55 (4):1815–27. doi: 10.1016/j.csda.2010.11.016.
van Dyk, D. A., and T. Park. 2008. Partially collapsed Gibbs samplers: Theory and methods. Journal of the American Statistical Association 103 (482):790–6. doi: 10.1198/016214508000000409.
Walker, S. G., and E. Gutiérrez-Peña. 1998. Robustifying Bayesian procedures (with discussion). In Proceedings of the 6th Valencia International Meeting on Bayesian Statistics, 685–713. Oxford University Press.
Yavuz, F. G., and O. Arslan. 2018. Linear mixed model with Laplace distribution (LLMM). Statistical Papers 59 (1):271–89. doi: 10.1007/s00362-016-0763-x.
Zhang, D., and M. Davidian. 2001. Linear mixed models with flexible distributions of random effects for longitudinal data. Biometrics 57 (3):795–802. doi: 10.1111/j.0006-341x.2001.00795.x.

Appendix: Gibbs sampler for repeated measures one-way-ANOVA

The posterior (24) is

$\begin{matrix} p (θ_{μ} | y) \propto {(σ^{2})}^{- \frac{N}{2}} \exp \{ - \frac{1}{2 σ^{2}} \sum_{i = 1}^{m} v_{i}^{- 2} {(y_{i} - z_{i 1} μ 1_{n_{i}} - α_{i} 1_{n_{i}})}^{T} (y_{i} - z_{i 1} μ 1_{n_{i}} - α_{i} 1_{n_{i}}) \} \\ \times ω^{\sum_{i = 1}^{n} z_{i 0} + a_{2}} {(1 - ω)}^{\sum_{i = 1}^{n} z_{i 1}} ( \prod_{i = 1}^{m} {(v_{i}^{2})}^{- \frac{n_{i}}{2}} h_{κ} (v_{i}) ) {(σ^{2})}^{- \frac{m}{2}} \exp \{ - \frac{1}{2 σ^{2} d} α^{T} V^{- 1} α \} \\ \times {(σ^{2})}^{- \frac{1}{2}} \exp \{ - \frac{{(μ_{0} - μ)}^{2}}{2 σ^{2} σ_{μ}^{2}} \} {(σ^{2})}^{- 1 - \frac{ν}{2}} \exp \{ - \frac{ν σ_{0}}{σ^{2}} \} d^{- 1 - \frac{η}{2}} {(σ^{2})}^{- \frac{η}{2}} \exp \{ - \frac{η τ_{0}}{σ^{2} d} \} 1_{[0, 1]} (κ) . \end{matrix}$

$σ^{2},$ $μ$ and $α$ are sampled jointly from

$(σ^{2}, μ, α) \sim p (σ^{2} | y, θ_{μ} \setminus \{μ, α\}) \, p (μ | y, θ_{μ} \setminus \{α\}) \, p (α | y, θ_{μ}) .$

To derive the conditionals of $σ^{2}$ and $μ$ with $α$ marginalized out, the likelihood is based on the marginal distribution of $y_{i}$ in (15) rather than the conditional. For $σ^{2},$

$\begin{matrix} p (σ^{2}, μ | y, θ_{μ} \setminus \{σ^{2}, α\}) \propto \int_{α} p (θ_{μ} | y) \, d α \propto \\ {(σ^{2})}^{- \frac{N}{2}} \exp \{ - \frac{1}{2 σ^{2}} \sum_{i = 1}^{m} {(y_{i} - z_{i 1} μ 1_{n_{i}})}^{T} {(v_{i}^{2} {\tilde{Σ}}_{i})}^{- 1} (y_{i} - z_{i 1} μ 1_{n_{i}}) \} \\ \times {(σ^{2})}^{- 1 - \frac{ν}{2}} \exp \{ - \frac{ν σ_{0}}{σ^{2}} \} {(σ^{2})}^{- \frac{η}{2}} \exp \{ - \frac{η τ_{0}}{σ^{2} d} \} {(σ^{2})}^{- \frac{1}{2}} \exp \{ - \frac{{(μ_{0} - μ)}^{2}}{2 σ^{2} σ_{μ}^{2}} \} . \end{matrix}$

Next, $μ$ is integrated out:

$\begin{matrix} p (σ^{2} | y, θ_{μ} \setminus \{σ^{2}, μ, α\}) \propto {(σ^{2})}^{- (1 + \frac{1}{2} (N + ν + η))} \exp \{ - \frac{1}{2 σ^{2}} (d^{- 1} η τ_{0} + ν σ_{0}) \} \\ \times \int_{μ} {(σ^{2})}^{- \frac{1}{2}} \exp \{ - \frac{1}{2 σ^{2}} ( \sum_{i = 1}^{m} {(y_{i} - z_{i 1} μ 1_{n_{i}})}^{T} {(v_{i}^{2} {\tilde{Σ}}_{i})}^{- 1} (y_{i} - z_{i 1} μ 1_{n_{i}}) + \frac{{(μ - μ_{0})}^{2}}{σ_{μ}^{2}} ) \} \, d μ \\ \propto {(σ^{2})}^{- (1 + \frac{1}{2} (N + ν + η))} \exp \{ - \frac{1}{2 σ^{2}} (d^{- 1} η τ_{0} + ν σ_{0}) \} \exp \{ - \frac{1}{2 σ^{2}} [ \sum_{i = 1}^{m} y_{i}^{T} {(v_{i}^{2} {\tilde{Σ}}_{i})}^{- 1} y_{i} \\ + \frac{μ_{0}^{2}}{σ_{μ}^{2}} - \tilde{μ} ( \sum_{i = 1}^{m} z_{i 1} 1_{n_{i}}^{T} {(v_{i}^{2} {\tilde{Σ}}_{i})}^{- 1} y_{i} + σ_{μ}^{- 2} μ_{0} ) ] \} \\ = {(σ^{2})}^{- (1 + \frac{1}{2} (N + ν + η))} \exp \{ - \frac{1}{2 σ^{2}} [ (d^{- 1} η τ_{0} + ν σ_{0}) \\ + \sum_{i = 1}^{m} {(y_{i} - \tilde{μ} z_{i 1} 1_{n_{i}})}^{T} {(v_{i}^{2} {\tilde{Σ}}_{i})}^{- 1} (y_{i} - \tilde{μ} z_{i 1} 1_{n_{i}}) + \frac{{(μ_{0} - \tilde{μ})}^{2}}{σ_{μ}^{2}} ] \}, \end{matrix}$

where

$\tilde{μ} = {( \sum_{i = 1}^{m} z_{i 1} 1_{n_{i}}^{T} {(v_{i}^{2} {\tilde{Σ}}_{i})}^{- 1} 1_{n_{i}} + σ_{μ}^{- 2} )}^{- 1} ( \sum_{i = 1}^{m} z_{i 1} 1_{n_{i}}^{T} {(v_{i}^{2} {\tilde{Σ}}_{i})}^{- 1} y_{i} + σ_{μ}^{- 2} μ_{0} ) .$

The conditional distribution of $σ^{2}$ is then

$σ^{2} | y, θ_{μ} \setminus \{μ, α\} \sim \frac{1}{χ_{N + η + ν}^{2}} ( d^{- 1} η τ_{0} + ν σ_{0} + \frac{{(μ_{0} - \tilde{μ})}^{2}}{σ_{μ}^{2}} + \sum_{i = 1}^{m} {(y_{i} - \tilde{μ} z_{i 1} 1_{n_{i}})}^{T} {(v_{i}^{2} {\tilde{Σ}}_{i})}^{- 1} (y_{i} - \tilde{μ} z_{i 1} 1_{n_{i}}) ) .$
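A draw from a conditional of this form, a positive sum divided by a chi-square variable, can be sketched with the standard library alone. The values of `scale_sum` and `df` below are hypothetical stand-ins for the bracketed sum and for $N + η + ν;$ the function name is also an assumption, and the code is not the sampler used for the reported results.

```python
import random


def draw_scaled_inv_chi2(scale_sum, df, rng):
    """Draw scale_sum / X with X ~ chi-square(df).

    Uses the identity chi-square(df) = Gamma(shape=df/2, scale=2).
    """
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return scale_sum / chi2


rng = random.Random(2024)
# Hypothetical values: `scale_sum` stands for the bracketed sum in the
# conditional above and `df` for N + eta + nu.
scale_sum, df = 8.0, 10
draws = [draw_scaled_inv_chi2(scale_sum, df, rng) for _ in range(20_000)]

# For df > 2 the mean of the draws should be near scale_sum / (df - 2).
print(sum(draws) / len(draws))
```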

The conditional distribution of $μ$ with $α$ marginalized is

$p (μ | y, θ_{μ} \setminus \{μ, α\}) \propto \exp \{ - \frac{1}{2 σ^{2}} ( \sum_{i = 1}^{m} {(y_{i} - z_{i 1} μ 1_{n_{i}})}^{T} {(v_{i}^{2} {\tilde{Σ}}_{i})}^{- 1} (y_{i} - z_{i 1} μ 1_{n_{i}}) + \frac{{(μ - μ_{0})}^{2}}{σ_{μ}^{2}} ) \},$

thus

$μ | y, θ_{μ} \setminus \{μ, α\} \sim N ( \tilde{μ}, σ^{2} {( \sum_{i = 1}^{m} z_{i 1} 1_{n_{i}}^{T} {(v_{i}^{2} {\tilde{Σ}}_{i})}^{- 1} 1_{n_{i}} + σ_{μ}^{- 2} )}^{- 1} ) .$
