Research Article

A trivariate Bernoulli regression model

Article: 1472519 | Received 24 Jul 2017, Accepted 01 May 2018, Published online: 04 Jun 2018

Abstract

A trivariate Bernoulli regression model is proposed in this paper. There is an extensive need for analysing correlated binary outcomes that arise from repeated measures or longitudinal data. The proposed model is based on marginal and conditional probabilities as functions of covariates. The estimation and test procedures are shown. The tests include tests of hypotheses for first- and second-order associations among outcome variables. It is noteworthy that the test procedures for dependence in outcome variables can be expressed in terms of the vectors of regression parameters of the models for the outcome variables. The proposed model can be conveniently extended to more than three correlated outcome variables.

PUBLIC INTEREST STATEMENT

This research work provides an important development in modelling correlated outcomes data and shows the test for different types of associations. This model is developed for three binary outcomes. In various fields, the presence of correlation in outcome variables poses formidable difficulty to model the relationship between potential explanatory variables and outcome variables. However, if the dependence in outcome variables is not considered in modelling, the relationships between explanatory and outcome variables may be affected to a large extent resulting in misleading results. This paper provides a trivariate Bernoulli regression model and tests for the potential relationships of first, second and third orders are proposed. The applications of these procedures will make the analysis of correlated binary outcomes possible with appropriate interpretation and underlying mechanism of relationships among outcome variables and between outcome and explanatory variables.

1. Introduction

The use of the Bernoulli regression model is well established. The logistic regression model, which is based on the univariate Bernoulli distribution, has been one of the most extensively used techniques in various applications. Since the development of generalized linear models, it has been presented more explicitly using the logit link function from the exponential family of distributions for binary data. There have been attempts to develop bivariate and multivariate Bernoulli regression models. Some noteworthy works were presented by McCullagh and Nelder (1989), Glonek and McCullagh (1995), Yee and Dirnbock (2009) and Dai, Ding, and Wahba (2013). McCullagh and Nelder (1989) considered the proportional odds model as a starting point for constructing a proportional odds model for three ordinal categories. Glonek and McCullagh generalized the model for several categorical responses. Alternative conditional models based on Markovian assumptions were proposed by Islam and Chowdhury (2006, 2008, 2017), Islam, Chowdhury, and Huda (2009) and Islam et al. (2012a). On the other hand, Dai et al. (2013) showed a multivariate Bernoulli model to estimate the structure of graphs with binary nodes. Islam, Alzaid, Chowdhury, and Sultan (2013) proposed an alternative procedure using a marginal-conditional approach to construct a bivariate Bernoulli model and provided tests for dependence in outcome variables. In this paper, a trivariate Bernoulli regression model is shown using the marginal-conditional approach. Tests for dependence in trivariate models are also shown. The proposed model can be used extensively in longitudinal studies with trivariate binary outcomes in various fields.

2. Trivariate Bernoulli distribution

Marshall and Olkin (1985) showed the bivariate Bernoulli form of Y1 and Y2 with Bernoulli marginals. Let us consider three binary variables Y1, Y2 and Y3, which, in longitudinal studies, can be considered as the status of the outcome variable at time points T1, T2 and T3, respectively. The probability distribution is displayed below:

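| Y1 | Y2 | Y3 = 0 | Y3 = 1 | Total |
|---|---|---|---|---|
| 0 | 0 | $p_{000}$ | $p_{001}$ | $p_{00\cdot}$ |
| 0 | 1 | $p_{010}$ | $p_{011}$ | $p_{01\cdot}$ |
| 1 | 0 | $p_{100}$ | $p_{101}$ | $p_{10\cdot}$ |
| 1 | 1 | $p_{110}$ | $p_{111}$ | $p_{11\cdot}$ |
| Total | | $p_{\cdot\cdot 0}$ | $p_{\cdot\cdot 1}$ | 1 |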

and the trivariate Bernoulli probability can be represented as follows

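$$P(Y_1=y_1,Y_2=y_2,Y_3=y_3)=p_{000}^{(1-y_1)(1-y_2)(1-y_3)}\,p_{001}^{(1-y_1)(1-y_2)y_3}\,p_{010}^{(1-y_1)y_2(1-y_3)}\,p_{100}^{y_1(1-y_2)(1-y_3)}\,p_{011}^{(1-y_1)y_2y_3}\,p_{101}^{y_1(1-y_2)y_3}\,p_{110}^{y_1y_2(1-y_3)}\,p_{111}^{y_1y_2y_3}. \tag{1}$$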

The above trivariate Bernoulli distribution can easily be written in exponential family form; taking logarithms, the log-likelihood for n = 1 is

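$$\begin{aligned} l ={}& y_1\ln\frac{p_{100}}{p_{000}}+y_2\ln\frac{p_{010}}{p_{000}}+y_3\ln\frac{p_{001}}{p_{000}}+y_1y_2\ln\frac{p_{110}\,p_{000}}{p_{100}\,p_{010}}+y_1y_3\ln\frac{p_{101}\,p_{000}}{p_{100}\,p_{001}}\\ &+y_2y_3\ln\frac{p_{011}\,p_{000}}{p_{010}\,p_{001}}+y_1y_2y_3\ln\frac{p_{100}\,p_{010}\,p_{001}\,p_{111}}{p_{110}\,p_{011}\,p_{101}\,p_{000}}+\ln p_{000}. \end{aligned} \tag{2}$$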

The natural link functions are:

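$$\begin{aligned} \theta_1&=\ln\frac{p_{100}}{p_{000}},\quad \theta_2=\ln\frac{p_{010}}{p_{000}},\quad \theta_3=\ln\frac{p_{001}}{p_{000}},\\ \theta_{12}&=\ln\frac{p_{110}\,p_{000}}{p_{100}\,p_{010}},\quad \theta_{13}=\ln\frac{p_{101}\,p_{000}}{p_{100}\,p_{001}},\quad \theta_{23}=\ln\frac{p_{011}\,p_{000}}{p_{010}\,p_{001}},\\ \theta_{123}&=\ln\frac{p_{100}\,p_{010}\,p_{001}\,p_{111}}{p_{110}\,p_{011}\,p_{101}\,p_{000}},\quad \theta_0=\ln p_{000}. \end{aligned} \tag{3}$$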

Islam, Alzaid, Chowdhury, and Sultan (2013) showed for the bivariate Bernoulli regression model that the underlying relationships can be explored more conveniently if a marginal-conditional approach is employed. We can use the underlying conditional and marginal models to express the association parameters in the proposed model. The bivariate Bernoulli model comprises 3 ($2^2-1$) models, two conditional and one marginal. Extension to the trivariate Bernoulli model shows that there are 7 ($2^3-1$) models. The natural link functions above consist of three first-order, three second-order and one third-order link function, and there are underlying relationships among them. The first-order link functions are simply the log odds of the respective cell probability for an outcome variable relative to the baseline probability of non-occurrence of any of the outcomes at the three time points.
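The relationship between Equations (1), (2) and (3) can be checked numerically. The following is a minimal sketch, not part of the paper: the cell probabilities are hypothetical values chosen only so that they sum to one, and the function names `natural_parameters` and `log_pmf` are illustrative. It computes the natural parameters from the cell probabilities and confirms that the exponential-family form reproduces every cell probability.

```python
import itertools
import math

# Hypothetical cell probabilities p[(y1, y2, y3)]; illustrative values only.
p = {
    (0, 0, 0): 0.20, (0, 0, 1): 0.10, (0, 1, 0): 0.15, (1, 0, 0): 0.05,
    (0, 1, 1): 0.10, (1, 0, 1): 0.10, (1, 1, 0): 0.10, (1, 1, 1): 0.20,
}


def natural_parameters(p):
    """Natural parameters of Equation (3), keyed by their subscript."""
    ln = math.log
    return {
        "0": ln(p[0, 0, 0]),
        "1": ln(p[1, 0, 0] / p[0, 0, 0]),
        "2": ln(p[0, 1, 0] / p[0, 0, 0]),
        "3": ln(p[0, 0, 1] / p[0, 0, 0]),
        "12": ln(p[1, 1, 0] * p[0, 0, 0] / (p[1, 0, 0] * p[0, 1, 0])),
        "13": ln(p[1, 0, 1] * p[0, 0, 0] / (p[1, 0, 0] * p[0, 0, 1])),
        "23": ln(p[0, 1, 1] * p[0, 0, 0] / (p[0, 1, 0] * p[0, 0, 1])),
        "123": ln(p[1, 0, 0] * p[0, 1, 0] * p[0, 0, 1] * p[1, 1, 1]
                  / (p[1, 1, 0] * p[0, 1, 1] * p[1, 0, 1] * p[0, 0, 0])),
    }


def log_pmf(y, theta):
    """Log-likelihood of Equation (2) for one observation y = (y1, y2, y3)."""
    y1, y2, y3 = y
    return (theta["0"] + y1 * theta["1"] + y2 * theta["2"] + y3 * theta["3"]
            + y1 * y2 * theta["12"] + y1 * y3 * theta["13"]
            + y2 * y3 * theta["23"] + y1 * y2 * y3 * theta["123"])


theta = natural_parameters(p)
# Largest discrepancy between exp(l) and the original cell probability.
max_error = max(abs(math.exp(log_pmf(y, theta)) - p[y])
                for y in itertools.product((0, 1), repeat=3))
print(max_error)
```

The discrepancy should be zero up to floating-point rounding, since Equation (2) is only a re-expression of the logarithm of Equation (1).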

3. The marginal-conditional models for trivariate Bernoulli

We can express the joint probability of three outcome variables Y1, Y2 and Y3 for given X as follows:

$$P(Y_1=y_1,Y_2=y_2,Y_3=y_3\mid X)=P(Y_1=y_1\mid X)\,P(Y_2=y_2\mid y_1,X)\,P(Y_3=y_3\mid y_1,y_2,X). \tag{4}$$

Here $P(Y_1=y_1\mid X)=\pi_{y_1}(X)$ is the marginal probability for Y1, $P(Y_2=y_2\mid y_1,X)=\pi_{y_1y_2}(X)$ is the conditional probability for Y2 given Y1, $P(Y_3=y_3\mid y_1,y_2,X)=\pi_{y_1y_2y_3}(X)$ is the conditional probability for Y3 given Y1 and Y2, and $X=(1,X_1,\ldots,X_p)$. Using the relationship shown in Equation (4), the joint probabilities are

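$$\begin{aligned} p_{000}(X)&=[1-\pi_1(X)]\,[1-\pi_{01}(X)]\,[1-\pi_{001}(X)],\\ p_{100}(X)&=\pi_1(X)\,[1-\pi_{11}(X)]\,[1-\pi_{101}(X)],\\ p_{010}(X)&=[1-\pi_1(X)]\,\pi_{01}(X)\,[1-\pi_{011}(X)],\\ p_{001}(X)&=[1-\pi_1(X)]\,[1-\pi_{01}(X)]\,\pi_{001}(X),\\ p_{110}(X)&=\pi_1(X)\,\pi_{11}(X)\,[1-\pi_{111}(X)],\\ p_{011}(X)&=[1-\pi_1(X)]\,\pi_{01}(X)\,\pi_{011}(X),\\ p_{101}(X)&=\pi_1(X)\,[1-\pi_{11}(X)]\,\pi_{101}(X),\\ p_{111}(X)&=\pi_1(X)\,\pi_{11}(X)\,\pi_{111}(X). \end{aligned} \tag{5}$$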
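The construction in Equations (4) and (5) can be illustrated with a short sketch. The function name `joint_probabilities` and the numerical values of the marginal and conditional probabilities are assumptions made for illustration; they do not come from the paper. The code multiplies one marginal and two conditional factors for each of the eight cells and checks that the result is a proper probability distribution.

```python
import itertools


def joint_probabilities(pi1, pi01, pi11, pi001, pi011, pi101, pi111):
    """Cell probabilities p[(y1, y2, y3)] from the factorization in Equation (4)."""
    second = {0: pi01, 1: pi11}                      # P(Y2 = 1 | y1)
    third = {(0, 0): pi001, (0, 1): pi011,           # P(Y3 = 1 | y1, y2)
             (1, 0): pi101, (1, 1): pi111}
    p = {}
    for y1, y2, y3 in itertools.product((0, 1), repeat=3):
        f1 = pi1 if y1 else 1 - pi1
        f2 = second[y1] if y2 else 1 - second[y1]
        f3 = third[y1, y2] if y3 else 1 - third[y1, y2]
        p[y1, y2, y3] = f1 * f2 * f3
    return p


# Hypothetical probabilities for a single covariate pattern X.
p = joint_probabilities(0.4, 0.3, 0.6, 0.2, 0.5, 0.35, 0.7)
total = sum(p.values())
print(total)
print(p[1, 1, 0], 0.4 * 0.6 * (1 - 0.7))   # both are pi1 * pi11 * (1 - pi111)
```

Because each factor is a Bernoulli probability, the eight cells sum to one for any admissible choice of the seven probabilities.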

The marginal model π1(X) is

$$\pi_1(X)=\frac{e^{X\beta_1}}{1+e^{X\beta_1}} \tag{6}$$

where $\beta_1=(\beta_{10},\beta_{11},\ldots,\beta_{1p})'$ and $\pi_0(X)=\dfrac{1}{1+e^{X\beta_1}}$, as $\pi_0(X)+\pi_1(X)=1$.

The conditional probabilities can be obtained from the first- and second-order Markov models with covariate dependence (Islam et al., 2012b, 2012a; Islam et al., 2013; Islam & Chowdhury, 2017). The conditional models for the first-order Markov chain transition probabilities in the relationships (5) are displayed in Table 1 for outcome variables Y1 and Y2:

Table 1. Transition probabilities for outcome variables Y1 and Y2

The transition probabilities are functions of covariates as displayed below

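$$\pi_{01}(X)=\frac{e^{X\beta_{01}}}{1+e^{X\beta_{01}}}, \tag{7}$$

$$\pi_{11}(X)=\frac{e^{X\beta_{11}}}{1+e^{X\beta_{11}}} \tag{8}$$

where $\beta_{01}=(\beta_{010},\beta_{011},\ldots,\beta_{01p})'$, $\beta_{11}=(\beta_{110},\beta_{111},\ldots,\beta_{11p})'$,

$$\pi_{00}(X)=\frac{1}{1+e^{X\beta_{01}}},\quad \pi_{10}(X)=\frac{1}{1+e^{X\beta_{11}}},$$

and $\pi_{00}(X)+\pi_{01}(X)=1$ and $\pi_{10}(X)+\pi_{11}(X)=1$.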

Four second-order conditional models, $\pi_{001}(X)$, $\pi_{011}(X)$, $\pi_{101}(X)$ and $\pi_{111}(X)$, are needed for the trivariate Bernoulli model. The conditional probabilities for the second-order Markov models satisfy $\pi_{000}(X)+\pi_{001}(X)=1$, $\pi_{010}(X)+\pi_{011}(X)=1$, $\pi_{100}(X)+\pi_{101}(X)=1$ and $\pi_{110}(X)+\pi_{111}(X)=1$.

The second-order transition probabilities are shown in Table 2:

Table 2. Transition probabilities for outcome variables Y1, Y2 and Y3

The covariate-dependent second-order models are

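$$\pi_{001}(X)=\frac{e^{X\beta_{001}}}{1+e^{X\beta_{001}}},\quad \pi_{000}(X)=\frac{1}{1+e^{X\beta_{001}}}, \tag{9}$$

$$\pi_{011}(X)=\frac{e^{X\beta_{011}}}{1+e^{X\beta_{011}}},\quad \pi_{010}(X)=\frac{1}{1+e^{X\beta_{011}}}, \tag{10}$$

$$\pi_{101}(X)=\frac{e^{X\beta_{101}}}{1+e^{X\beta_{101}}},\quad \pi_{100}(X)=\frac{1}{1+e^{X\beta_{101}}} \tag{11}$$

and

$$\pi_{111}(X)=\frac{e^{X\beta_{111}}}{1+e^{X\beta_{111}}},\quad \pi_{110}(X)=\frac{1}{1+e^{X\beta_{111}}}. \tag{12}$$

Here, $\beta_{001}=(\beta_{0010},\beta_{0011},\ldots,\beta_{001p})'$, $\beta_{011}=(\beta_{0110},\beta_{0111},\ldots,\beta_{011p})'$, $\beta_{101}=(\beta_{1010},\beta_{1011},\ldots,\beta_{101p})'$ and $\beta_{111}=(\beta_{1110},\beta_{1111},\ldots,\beta_{111p})'$.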

4. Link functions and estimating equations for trivariate Bernoulli model

The link functions for the trivariate Bernoulli regression model are displayed in (3). In Section 3, we have shown the relationship between the joint probabilities and the marginal-conditional probabilities. The conditional probabilities are obtained from the first- and second-order transition probabilities. We have introduced seven logistic regression models: one marginal model in Equation (6) and six conditional models in Equations (7) to (12). Using the relationships between the joint and marginal-conditional probabilities, we can redefine the link functions, which are summarized below:

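$$\theta_0=\ln p_{000}=-\ln(1+e^{X\beta_1})-\ln(1+e^{X\beta_{01}})-\ln(1+e^{X\beta_{001}}), \tag{13}$$

$$\theta_1=\ln\frac{p_{100}}{p_{000}}=X\beta_1-\ln(1+e^{X\beta_{11}})+\ln(1+e^{X\beta_{01}})-\ln(1+e^{X\beta_{101}})+\ln(1+e^{X\beta_{001}}), \tag{14}$$

$$\theta_2=\ln\frac{p_{010}}{p_{000}}=X\beta_{01}-\ln(1+e^{X\beta_{011}})+\ln(1+e^{X\beta_{001}}), \tag{15}$$

$$\theta_3=\ln\frac{p_{001}}{p_{000}}=X\beta_{001}, \tag{16}$$

$$\theta_{12}=\ln\frac{p_{110}\,p_{000}}{p_{100}\,p_{010}}=X\beta_{11}-X\beta_{01}-\ln(1+e^{X\beta_{111}})+\ln(1+e^{X\beta_{011}})+\ln(1+e^{X\beta_{101}})-\ln(1+e^{X\beta_{001}}), \tag{17}$$

$$\theta_{13}=\ln\frac{p_{101}\,p_{000}}{p_{100}\,p_{001}}=X\beta_{101}-X\beta_{001}, \tag{18}$$

$$\theta_{23}=\ln\frac{p_{011}\,p_{000}}{p_{010}\,p_{001}}=X\beta_{011}-X\beta_{001}, \tag{19}$$

$$\theta_{123}=\ln\frac{p_{111}\,p_{100}\,p_{010}\,p_{001}}{p_{110}\,p_{011}\,p_{101}\,p_{000}}=X\beta_{111}+X\beta_{001}-X\beta_{101}-X\beta_{011}. \tag{20}$$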
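The identities in Equations (13) to (20) follow from substituting the joint probabilities (5) into the natural link functions (3). As a numerical cross-check, the sketch below evaluates both sides for one covariate vector. The covariate values, the coefficient vectors and the helper names (`expit`, `softplus`, `dot`) are hypothetical and serve only to illustrate the algebra.

```python
import math


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def softplus(t):
    return math.log1p(math.exp(t))          # ln(1 + e^t)


def dot(x, b):
    return sum(xi * bi for xi, bi in zip(x, b))


x = (1.0, 0.5, -1.2)                         # hypothetical X = (1, X1, X2)
beta = {                                     # hypothetical coefficient vectors
    "1": (-0.3, 0.8, 0.1), "01": (0.2, -0.4, 0.6), "11": (0.5, 0.3, -0.2),
    "001": (-0.7, 0.1, 0.4), "011": (0.1, 0.9, -0.3),
    "101": (0.4, -0.5, 0.2), "111": (-0.2, 0.6, 0.7),
}
eta = {s: dot(x, b) for s, b in beta.items()}   # linear predictors X beta_s
pi = {s: expit(e) for s, e in eta.items()}

# Joint probabilities from Equation (5).
p000 = (1 - pi["1"]) * (1 - pi["01"]) * (1 - pi["001"])
p100 = pi["1"] * (1 - pi["11"]) * (1 - pi["101"])
p010 = (1 - pi["1"]) * pi["01"] * (1 - pi["011"])
p001 = (1 - pi["1"]) * (1 - pi["01"]) * pi["001"]
p110 = pi["1"] * pi["11"] * (1 - pi["111"])
p011 = (1 - pi["1"]) * pi["01"] * pi["011"]
p101 = pi["1"] * (1 - pi["11"]) * pi["101"]
p111 = pi["1"] * pi["11"] * pi["111"]

ln = math.log
# Each entry pairs the definition in (3) with the closed form in (13)-(20).
checks = {
    "theta0": (ln(p000),
               -softplus(eta["1"]) - softplus(eta["01"]) - softplus(eta["001"])),
    "theta1": (ln(p100 / p000),
               eta["1"] - softplus(eta["11"]) + softplus(eta["01"])
               - softplus(eta["101"]) + softplus(eta["001"])),
    "theta2": (ln(p010 / p000),
               eta["01"] - softplus(eta["011"]) + softplus(eta["001"])),
    "theta3": (ln(p001 / p000), eta["001"]),
    "theta12": (ln(p110 * p000 / (p100 * p010)),
                eta["11"] - eta["01"] - softplus(eta["111"])
                + softplus(eta["011"]) + softplus(eta["101"])
                - softplus(eta["001"])),
    "theta13": (ln(p101 * p000 / (p100 * p001)), eta["101"] - eta["001"]),
    "theta23": (ln(p011 * p000 / (p010 * p001)), eta["011"] - eta["001"]),
    "theta123": (ln(p111 * p100 * p010 * p001 / (p110 * p011 * p101 * p000)),
                 eta["111"] + eta["001"] - eta["101"] - eta["011"]),
}
for name, (from_cells, closed_form) in checks.items():
    print(name, round(from_cells, 10), round(closed_form, 10))
```

Note that the association parameters in (18) to (20) are linear in the regression coefficients, which is what makes the tests for dependence expressible as tests of equality of coefficient vectors.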

The estimating equations are obtained by differentiating the log-likelihood function (2) with respect to the regression parameters of the seven marginal and conditional models. For simplicity, the summation over i = 1,…,n is omitted and the equations are shown for n = 1. The estimating equations are:

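$$\frac{\partial l}{\partial \beta_{sk}}=\begin{pmatrix} \partial l/\partial\beta_{1k}\\ \partial l/\partial\beta_{01k}\\ \partial l/\partial\beta_{11k}\\ \partial l/\partial\beta_{001k}\\ \partial l/\partial\beta_{011k}\\ \partial l/\partial\beta_{101k}\\ \partial l/\partial\beta_{111k} \end{pmatrix}=\begin{pmatrix} X_k\{y_1-\pi_1(X)\}\\ X_k(1-y_1)\{y_2-\pi_{01}(X)\}\\ y_1X_k\{y_2-\pi_{11}(X)\}\\ X_k(1-y_1-y_2+y_1y_2)\{y_3-\pi_{001}(X)\}\\ y_2X_k(1-y_1)\{y_3-\pi_{011}(X)\}\\ y_1X_k(1-y_2)\{y_3-\pi_{101}(X)\}\\ y_1y_2X_k\{y_3-\pi_{111}(X)\} \end{pmatrix}=\begin{pmatrix}0\\0\\0\\0\\0\\0\\0\end{pmatrix},\quad k=0,1,\ldots,p. \tag{21}$$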

where $X_k$, $k=1,\ldots,p$, is the kth explanatory variable corresponding to the coefficient $\beta_{sk}$ (with $X_0=1$ for the intercept), and s denotes the marginal and conditional models represented by 1, 01, 11, 001, 011, 101 and 111. The information matrix comprises the information matrices for the seven sets of parameters of the marginal and conditional models. The $(k,k')$th element of the information matrix for model s is $-\dfrac{\partial^2 l}{\partial\beta_{sk}\,\partial\beta_{sk'}}$. Let us denote by I the $7(p+1)\times 7(p+1)$ block-diagonal matrix containing the seven $(p+1)\times(p+1)$ diagonal blocks, as shown below:

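$$I=\begin{pmatrix} I_1&0&0&0&0&0&0\\ 0&I_2&0&0&0&0&0\\ 0&0&I_3&0&0&0&0\\ 0&0&0&I_4&0&0&0\\ 0&0&0&0&I_5&0&0\\ 0&0&0&0&0&I_6&0\\ 0&0&0&0&0&0&I_7 \end{pmatrix} \tag{22}$$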

where

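$$\begin{aligned} I_1&=\left[X_kX_{k'}\,\pi_1(X)\{1-\pi_1(X)\}\right],\\ I_2&=\left[X_kX_{k'}\,(1-y_1)\,\pi_{01}(X)\{1-\pi_{01}(X)\}\right],\\ I_3&=\left[X_kX_{k'}\,y_1\,\pi_{11}(X)\{1-\pi_{11}(X)\}\right],\\ I_4&=\left[X_kX_{k'}\,(1-y_1-y_2+y_1y_2)\,\pi_{001}(X)\{1-\pi_{001}(X)\}\right],\\ I_5&=\left[X_kX_{k'}\,(1-y_1)y_2\,\pi_{011}(X)\{1-\pi_{011}(X)\}\right],\\ I_6&=\left[X_kX_{k'}\,y_1(1-y_2)\,\pi_{101}(X)\{1-\pi_{101}(X)\}\right],\\ I_7&=\left[X_kX_{k'}\,y_1y_2\,\pi_{111}(X)\{1-\pi_{111}(X)\}\right], \end{aligned}$$

each with $k,k'=0,1,\ldots,p$ and of dimension $(p+1)\times(p+1)$.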
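The score contributions in Equation (21) and the information blocks in Equation (22) can be computed for a single observation as in the sketch below. This is an illustration under stated assumptions rather than the author's implementation: it uses NumPy, and the function name `score_and_information`, the covariate values and the coefficient values are hypothetical.

```python
import numpy as np

MODELS = ("1", "01", "11", "001", "011", "101", "111")


def score_and_information(x, y, beta):
    """Per-observation score vectors (21) and information blocks (22)."""
    x = np.asarray(x, dtype=float)
    y1, y2, y3 = y
    # Indicator weights: which conditional model this observation informs.
    weight = {"1": 1, "01": 1 - y1, "11": y1,
              "001": (1 - y1) * (1 - y2), "011": (1 - y1) * y2,
              "101": y1 * (1 - y2), "111": y1 * y2}
    # Outcome modelled by each of the seven logistic regressions.
    response = {"1": y1, "01": y2, "11": y2,
                "001": y3, "011": y3, "101": y3, "111": y3}
    score, info = {}, {}
    for s in MODELS:
        pi = 1.0 / (1.0 + np.exp(-x @ np.asarray(beta[s], dtype=float)))
        score[s] = weight[s] * (response[s] - pi) * x
        info[s] = weight[s] * pi * (1.0 - pi) * np.outer(x, x)
    return score, info


x = [1.0, 0.5, -1.2]                                  # hypothetical (1, X1, X2)
beta = {s: [0.1, -0.2, 0.3] for s in MODELS}          # hypothetical coefficients
score, info = score_and_information(x, (1, 0, 1), beta)
active = [s for s in MODELS if np.any(info[s] != 0)]
print(active)
```

For an observation with (y1, y2, y3) = (1, 0, 1), only the marginal model, the first-order model given Y1 = 1 and the second-order model given (Y1, Y2) = (1, 0) receive a non-zero contribution. Summing these quantities over all observations gives the full score vector and the diagonal blocks of I.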

5. Tests for models and dependence of outcome variables

For trivariate Bernoulli regression models, we have seven sets of parameters: one set for the marginal model for Y1 (Model 1 for s = 1), two sets for the conditional models from the first-order Markov chain for transitions from Y1 to Y2 (Models 1 and 2 for s = 01, 11, respectively) and four sets from the second-order Markov chain for outcomes Y1, Y2 and Y3 (s = 001, 011, 101, 111, respectively). The likelihood ratio test for the overall joint model is

$$-2\left[\ln L(\beta_0)-\ln L(\beta)\right]\sim\chi^2_{7p}$$

where $\beta_0=[\beta_{s0},\ s=1,01,11,001,011,101,111]$ and $\beta=[\beta_s,\ s=1,01,11,001,011,101,111]$. Let $\beta^{*}=[\beta^{*}_s,\ s=1,01,11,001,011,101,111]$, where $\beta^{*}_s=[\beta_{s1},\ldots,\beta_{sp}]$. The null hypothesis of the above test is $H_0:\beta^{*}=0$.

Tests for the marginal and conditional models can be performed separately as well. In that case, the null hypothesis for each model is $H_0:\beta^{*}_s=0$, s = 1, 01, 11, 001, 011, 101, 111. The likelihood ratio test for each model can be shown as

$$-2\left[\ln L(\beta_{s0})-\ln L(\beta_s)\right]\sim\chi^2_{p}.$$

The Wald test will be applied to test for significance of a parameter for each model.
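The model-wise likelihood ratio tests can be sketched as follows. Everything in this example is an assumption made for illustration: the data are simulated, the coefficient values are arbitrary, and `fit_logistic` is a plain Newton-Raphson routine written for the sketch rather than a routine from the paper. Each conditional model is fitted on the subset of observations selected by its conditioning values, and the statistic compares the intercept-only fit with the full fit.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50):
    """Newton-Raphson fit of a logistic model; returns (beta, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (y - pi)
        info = X.T @ (X * (pi * (1.0 - pi))[:, None])
        beta = beta + np.linalg.solve(info, score)
    eta = X @ beta
    return beta, float(np.sum(y * eta - np.log1p(np.exp(eta))))


def lr_statistic(X, y):
    """-2 [ln L(intercept only) - ln L(full model)] for one logistic model."""
    _, loglik_null = fit_logistic(X[:, :1], y)
    _, loglik_full = fit_logistic(X, y)
    return -2.0 * (loglik_null - loglik_full)


def expit(t):
    return 1.0 / (1.0 + np.exp(-t))


# Simulated data with p = 2 covariates (illustrative values only).
rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y1 = rng.binomial(1, expit(X @ np.array([-0.3, 0.8, -0.5])))
# Y2 depends on Y1 through different coefficient vectors for y1 = 0 and y1 = 1.
eta2 = np.where(y1 == 1,
                X @ np.array([0.5, 0.4, 0.0]),
                X @ np.array([-0.5, -0.6, 0.3]))
y2 = rng.binomial(1, expit(eta2))

lr_marginal = lr_statistic(X, y1)                  # model s = 1
lr_01 = lr_statistic(X[y1 == 0], y2[y1 == 0])      # model s = 01
lr_11 = lr_statistic(X[y1 == 1], y2[y1 == 1])      # model s = 11
print(lr_marginal, lr_01, lr_11)
```

Each statistic would be compared with a chi-square distribution with p = 2 degrees of freedom in this setting. Since the log-likelihood separates across the seven models, the overall statistic with 7p degrees of freedom can be obtained by adding the seven model-wise statistics; the four second-order models would be handled in the same way by subsetting on (y1, y2).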

The tests for the association parameters $\theta_{12}$, $\theta_{13}$, $\theta_{23}$ and $\theta_{123}$ can be performed based on the parameters of the regression models in Equations (17) to (20). The null hypotheses and test statistics are displayed here:

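  • (i) Test for independence of Y2 and Y3

The null hypothesis is

$$H_{01}:\beta_{001}=\beta_{011}.$$

The test statistic for equality of parameters is

$$A_1=\left(\hat\beta_{001}-\hat\beta_{011}\right)'\left[V\!\left(\hat\beta_{001}-\hat\beta_{011}\right)\right]^{-1}\left(\hat\beta_{001}-\hat\beta_{011}\right)$$

which is asymptotically chi-square with p degrees of freedom. It may be noted here that $\left[V\!\left(\hat\beta_{001}-\hat\beta_{011}\right)\right]^{-1}\approx I_4+I_5$.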

  • (ii) Test for independence of Y1 and Y3

The null hypothesis is

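$$H_{01}:\beta_{001}=\beta_{101}.$$

The test statistic for equality of parameters is

$$A_2=\left(\hat\beta_{001}-\hat\beta_{101}\right)'\left[V\!\left(\hat\beta_{001}-\hat\beta_{101}\right)\right]^{-1}\left(\hat\beta_{001}-\hat\beta_{101}\right)$$

which is asymptotically chi-square with p degrees of freedom. It may be noted here that $\left[V\!\left(\hat\beta_{001}-\hat\beta_{101}\right)\right]^{-1}\approx I_4+I_6$.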

  • (iii) Test for independence of Y1 and Y2.

It appears from Equation (17) that independence of Y1 and Y2 depends not only on equality of regression parameters from Models 7 and 8 but also on equality of parameters of Models 9 and 11 and also Models 10 and 12.

The null hypotheses are

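$$H_{01}:\beta_{01}=\beta_{11},\qquad H_{02}:\beta_{001}=\beta_{101},\qquad H_{03}:\beta_{011}=\beta_{111}.$$

The test for equality of the regression parameters from Models 7 and 8 ($H_{01}$) can be performed using the following statistic

$$A_3=\left(\hat\beta_{01}-\hat\beta_{11}\right)'\left[V\!\left(\hat\beta_{01}-\hat\beta_{11}\right)\right]^{-1}\left(\hat\beta_{01}-\hat\beta_{11}\right).$$

The middle term is obtained approximately from

$$\left[V\!\left(\hat\beta_{01}-\hat\beta_{11}\right)\right]^{-1}\approx I_2+I_3.$$

The test for $H_{02}$ is shown in (ii). Similarly, the test for $H_{03}$ is based on Models 10 and 12 and the test statistic is

$$A_4=\left(\hat\beta_{011}-\hat\beta_{111}\right)'\left[V\!\left(\hat\beta_{011}-\hat\beta_{111}\right)\right]^{-1}\left(\hat\beta_{011}-\hat\beta_{111}\right)$$

where $\left[V\!\left(\hat\beta_{011}-\hat\beta_{111}\right)\right]^{-1}\approx I_5+I_7$.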

  • (iv) Test for independence of Y1, Y2 and Y3

It appears from Equation (20) that independence of Y1, Y2 and Y3 depends not only on equality of the regression parameters from Models 9 and 11 but also on equality of the parameters of Models 10 and 12.

The null hypotheses are

$$H_{01}:\beta_{001}=\beta_{101},\qquad H_{02}:\beta_{011}=\beta_{111}.$$

The first null hypothesis is shown in (ii), which is the test for independence of Y1 and Y3 ($A_2$), and the second null hypothesis is discussed in (iii) as the partial test for independence of Y1 and Y2 ($A_4$). In other words, independence of Y1, Y2 and Y3 depends on both (a) conditional independence of Y1 and Y2 ($A_4$) and (b) independence of Y1 and Y3 ($A_2$). Independence of the three outcome variables thus depends on the pairwise conditional or unconditional independence of outcome variables examined in the previous tests.

An alternative null hypothesis follows directly from the model shown in Equation (20). Let us define $\beta_{\ldots}=\beta_{111}+\beta_{001}-\beta_{101}-\beta_{011}$; then the independence of Y1, Y2 and Y3 can be tested alternatively for the null hypothesis $H_{01}:\beta_{\ldots}=0$. We can test the null hypothesis using

$$A_5=\hat\beta_{\ldots}'\left[V\!\left(\hat\beta_{\ldots}\right)\right]^{-1}\hat\beta_{\ldots}$$

where $\left[V\!\left(\hat\beta_{\ldots}\right)\right]^{-1}\approx I_4+I_5+I_6+I_7$.
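The statistics A1 to A5 share the same quadratic form, so a single helper is enough to illustrate them. The sketch below is hypothetical throughout: the estimates and covariance matrices are made-up numbers, and the function name `wald_statistic` is illustrative. The helper takes the covariance matrix of the contrast as an input and makes no claim about how that matrix should be obtained; the example builds it by adding the covariance matrices of the individual estimators, which is an assumption that holds when the estimators are independent.

```python
import numpy as np


def wald_statistic(diff, cov):
    """Quadratic form diff' cov^{-1} diff used by the statistics A1 to A5."""
    diff = np.asarray(diff, dtype=float)
    cov = np.asarray(cov, dtype=float)
    return float(diff @ np.linalg.solve(cov, diff))


# Hypothetical estimates and covariance matrices for p = 2 (not from the paper).
b = {"001": np.array([0.20, -0.10]), "011": np.array([0.90, 0.40]),
     "101": np.array([0.25, 0.05]), "111": np.array([0.70, 0.30])}
V = {"001": np.diag([0.04, 0.05]), "011": np.diag([0.06, 0.05]),
     "101": np.diag([0.05, 0.04]), "111": np.diag([0.07, 0.06])}

# A1: equality of beta_001 and beta_011 (test (i)).
A1 = wald_statistic(b["001"] - b["011"], V["001"] + V["011"])

# A5: the third-order contrast beta_111 + beta_001 - beta_101 - beta_011.
contrast = b["111"] + b["001"] - b["101"] - b["011"]
A5 = wald_statistic(contrast, V["001"] + V["011"] + V["101"] + V["111"])
print(A1, A5)
```

Each statistic would then be referred to a chi-square distribution with the degrees of freedom stated for the corresponding test.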

6. Concluding remarks

We need to analyse binary repeated measures data in many instances where the outcome variables are correlated. The modelling of correlated outcome variables has been of interest in many fields because of the growing need to analyse repeated measures data in the presence of correlation among outcome variables, in addition to models for identifying explanatory variables associated with the outcome variables. Several attempts have been made in the past to model such data, but owing to the inherent complexity of modelling multivariate data with correlated outcomes it remained a challenge for a long time. This study shows an alternative approach based on a marginal-conditional formulation to describe a joint model and provides a set of marginal and conditional models that yield the joint probabilities for the trivariate binary case. The procedure can easily be extended to more than three correlated outcomes using the same approach, which is not shown in this paper to keep the exposition simple. Several tests are presented in this paper for testing the overall trivariate Bernoulli model as well as for examining dependence in outcome variables.

Acknowledgements

This work was supported by the HEQEP sub-project CP 3293 sponsored by UGC, Bangladesh and World Bank. I would like to express my gratitude to the anonymous reviewers for their helpful suggestions.

Additional information

Funding

This work was supported by the HEQEP sub-project [Grant Number CP 3293] sponsored by UGC, Bangladesh and World Bank.

Notes on contributors

M. Ataharul Islam

M. Ataharul Islam is currently the QM Husain Professor, ISRT, University of Dhaka. He was a former professor of statistics at the University Sains Malaysia, King Saud University, University of Dhaka and the East West university. He was a visiting faculty at the University of Hawaii and University of Pennsylvania. He is recipient of the Pauline Stitt Award, the WNAR Biometric Society Award for content and writing, University Grants Commission Award for book and research, Ibrahim Gold Medal for research, etc. He published more than 100 research papers in international journals on various topics, extensively on longitudinal and repeated measures data, including multistate and multistage hazards models, statistical models for repeated measures data, Markov models with covariate dependence, generalized linear models, conditional and joint models for correlated outcomes, etc. He authored several books either published or being published.

References

  • Dai, B., Ding, S., & Wahba, G. (2013). Multivariate Bernoulli distribution. Bernoulli, 19(4), 1465–1483. doi:10.3150/12-BEJSP10
  • Glonek, G. F. V., & McCullagh, P. (1995). Multivariate logistic models. Journal of the Royal Statistical Society, Series B (Methodological), 57, 533–546.
  • Islam, M. A., Alzaid, A. A., Chowdhury, R. I., & Sultan, K. S. (2013). A generalized bivariate Bernoulli model with covariate dependence. Journal of Applied Statistics, 40(5), 1064–1075. doi:10.1080/02664763.2013.780156
  • Islam, M. A., & Chowdhury, R. I. (2006). A higher order Markov model for analyzing covariate dependence. Applied Mathematical Modeling, 30, 477–488. doi:10.1016/j.apm.2005.05.006
  • Islam, M. A., & Chowdhury, R. I. (2008). Chapter 4: First and higher order transition models with covariate dependence. In F. Columbus (Ed.). Progress in applied mathematical modeling (pp. 153–196). Hauppauge, NY: Nova Science Publishers.
  • Islam, M. A., & Chowdhury, R. I. (2017). Analysis of repeated measures data. Singapore: Springer.
  • Islam, M. A., Chowdhury, R. I., & Alzaid, A. A. (2012b). Tests for dependence in binary repeated measures data. Journal of Statistical Research, 46, 203–217.
  • Islam, M. A., Chowdhury, R. I., & Briollais, L. (2012a). A bivariate binary model for testing dependence in outcomes. Bulletin of Malaysian Mathematical Sciences Society, 35, 845–858.
  • Islam, M. A., Chowdhury, R. I., & Huda, S. (2009). Markov models with covariate dependence for repeated measures. New York: Nova Science Publishers.
  • Marshall, A. W., & Olkin, I. (1985). A family of bivariate distributions generated by the bivariate Bernoulli distribution. Journal of the American Statistical Association, 80, 332–338. doi:10.1080/01621459.1985.10478116
  • McCullagh, P., & Nelder, J. A. (1989). Generalized linear models (2nd ed.). London: Chapman and Hall.
  • Yee, T. W., & Dirnbock, T. (2009). Models for analyzing species’ presence/absence data at two time points. Journal of Theoretical Biology, 259, 684–694. doi:10.1016/j.jtbi.2009.05.004