2,727
Views
0
CrossRef citations to date
0
Altmetric
Articles

Regression models of Pearson correlation coefficient

, & ORCID Icon
Pages 97-106 | Received 06 Sep 2022, Accepted 01 Jan 2023, Published online: 11 Jan 2023

Abstract

We propose two simple regression models of Pearson correlation coefficient of two normal responses or binary responses to assess the effect of covariates of interest. Likelihood-based inference is established to estimate the regression coefficients, upon which bootstrap-based method is used to test the significance of covariates of interest. Simulation studies show the effectiveness of the method in terms of type-I error control, power performance in moderate sample size and robustness with respect to model mis-specification. We illustrate the application of the proposed method to some real data concerning health measurements.

1. Introduction

Most regression models have been developed to describe the relationship between the expected value of response(s) and a number of covariates (predictors). In some situations, it is desired to understand the influence of covariates on the strength of Pearson correlation between two responses. For example, in some psychological study, it is of interest to understand the impact of age and/or sex on the association between physical functionality and mental functionality of the elder (Thomas et al., Citation2016).

One traditional measure for such association is the partial correlation coefficient, i.e., the correlation coefficient of two responses after removing the covariate effect (Anderson, Citation2003). However, its inference depends on the normality assumption and it misses the connection to the actual values of the covariates which is supposed to have different effects on the association. In terms of the rank-based counterpart, Liu et al. (Citation2018) proposed covariate-adjusted partial Spearman's rank correlation using probability scale residuals, which is particularly useful for ordinal responses.

Another method is to use a conditional approach to assess the correlation at different levels of the covariates. Bartlett (Citation1993) proposed to model the Fish z-transformed correlation by a polynomial regression of a single covariate z. His model builds a linear regression up on paired observations of (zk,r(zk)), k=1,,K, where r(zk) is the sample correlation coefficient of two responses at zk over K distinct levels of z. One clear drawback of this method is that it requires repeated measures of responses at zk, which would lead to the demand of a large total sample size and even larger when multiple covariates are included. Other studies along this direction include Yu and Dunn (Citation1982) and Paul (Citation1989). Wilding et al. (Citation2011) proposed a model to relate a transformed correlation coefficient of two normal responses (y1 and y2) with a linear combination of covariates (z1,,zp) through the probit link, i.e., (1+ρ)/2=Φ(γ0+γ1z1++γpzp), where ρ=corr(y1,y2) and Φ is the distribution function of the standard normal variable. (Restricted maximum likelihood corrected LRT is used to test the significance of covariates.) We point out that the linear transformed correlation (1+ρ)/2 guarantees the range for a distribution function. However, it can lose efficiency when ρ is known positive (since it can be modelled directly without transformation).

All the aforementioned methods consider the continuous responses. In many situations, there are only binary responses available, e.g., after dichotomization. In this paper, we propose two simple regression models of Pearson correlation coefficient of two normal responses and two binary responses (Section 2). Specifically, when the correlation coefficient is known positive, the logistic link function is used and when the correlation coefficient is arbitrary in (1,1), the hyperbolic tangent (or Fisher z-transformation) link function is used. Likelihood-based inference is developed to estimate the regression coefficients (Section 3). We demonstrate the performance of the proposed method by simulation in Section 4 and illustrate the applications in great detail to a real data concerning physical functionality and mental functionality of the elder (Section 5). We defer the technical details in Appendix.

2. Model

Let (y1,y2) be a pair of responses of a subject with correlation coefficient ρ=corr(y1,y2). Let x1,,xp denote p associated covariates of the paired responses, which follow a joint distribution f(x1,,xp). Consider modelling ρ as a monotonic function of the linear combination of these covariates through (1) ρ=h(xβ),(1) where x=(1,x1,,xp) and β=(β0,β1,,βp). When ρ>0, we use the logistic function h(x)=(1+ex)1, referred by link 1. When 1<ρ<1, we use the hyperbolic tangent function h(x)=tanh(x)=(e2x1)/(e2x+1), referred by link 2. We adopt these two commonly used link functions to map the real line to intervals (0,1) and (1,1), respectively, as they are analytically simple. Other links can also be used, e.g., the probit link for ρ>0. When the correlation is known positive, the logistic function is preferred as it is more efficient for estimation and easy for interpretation. Let h and h denote the first and second derivatives of h, respectively. Our goal is to assess model (Equation1).

2.1. Bivariate normal responses

Assume that y1 and y2 follow a bivariate normal distribution with marginal distributions N(μ1,σ12) and N(μ2,σ22), respectively. For j = 1, 2, we further model the expected value of yj by a linear regression, i.e., E(yjx)=μj=xγj where γj=(γ0,γ1,,γp). Denote η=(γ1,γ2,σ12,σ22) as the collection of all nuisance parameters.

The log-likelihood function is (2) (β,η)=log(2π)T12(1ρ2)+ρT21ρ2log(1ρ2)2,(2) where (3) T1=(y1μ1σ1)2+(y2μ2σ2)2,T2=(y1μ1)(y2μ2)σ1σ2,(3) ρ=h(xβ), and μj=xγj. The first and second derivatives of ℓ with respect to β are, respectively (4) s(β,η)={a(ρ)T1+b(ρ)T2+c(ρ)}x,H(β,η)={u(ρ)T1+v(ρ)T2+w(ρ)}xx,(4) where a, b, c, u, v and w are given in Table . By the fact that E(T1x)=2 and E(T2x)=ρ, we have E(s)=0, where 0 is the zero vector.

Table 1. Some functions under two link functions.

2.2. Bivariate binary responses

Assume that both y1 and y2 are binary responses. Let E(y1)=p1 and E(y2)=p2. For a, b = 0, 1, let Iab=I(y1=a,y2=b), where I() is the indicator function, and pab=E(Iab)=P(y1=a,y2=b). Clearly, p00+p01+p10+p11=1. By definition, we have p1=p11+p10,p2=p11+p01,ρ=p11p1p2p1(1p1)p2(1p2). Inversely, given p1, p2 and ρ, we can obtain (5) p11=p1p2+ρp1(1p1)p2(1p2),p10=p1p11,p01=p2p11.(5) For j = 1, 2, we further model the expected value of yj by a logistic regression through pj={1+exp(xγj)}1, where γj is as defined in Section 2.1. Denote η=(γ1,γ2). (Through (Equation1) and (Equation5), pab is a function of (β,η).)

By (Equation5), the log-likelihood is (β,η)=a,b=0,1Iablogpab=a,b=0,1Iablog(cab+dabρ), where c11=p1p2,d11=p1(1p1)p2(1p2),c10=p1(1p2),d10=d11,c01=p2(1p1),d01=d11,c00=(1p1)(1p2),d00=d11. Then, the first and second derivatives of ℓ with respect to β are, respectively (6) s(β,η)=a,b=0,1Iabeab(ρ)x,H(β,η)=a,b=0,1Iabfab(ρ)xx,(6) where eab(ρ)=dabh(xβ)cab+dabρ,fab(ρ)=dab2h2(xβ)+dab(cab+dabρ)h(xβ)(cab+dabρ)2. The detailed expressions of eab and fab under the two models of h are given in Appendix. When E(Iabx)=pab, we have E(s)=0.

3. Inference

Let {(y1,y2,x1,,xp):i=1,,n} denote independent samples of (y1,y2,x1,,xp) of size n. For i=1,,n and j = 1, 2, denote xi=(1,xi1,,xip), yj=(y1j,,ynj), x¯=n1i=1nxi and Xn×p=(x1,,xn).

The gradient (first derivative) and the Hessian matrix (second derivative) of the log-likelihood over n independent samples with respect to β are s(β,η)=i=1nsi(β,η) and H(β,η)=i=1nHi(β,η), respectively, where si(β,η) and Hi(β,η) are obtained by replacing (y1,y2,x) in (Equation4) or (Equation6) by (yi1,yi2,xi) for the ith subject.

In addition, let Gi(β,η)=2i/βη. Denote Θ=E(Hi) and J=E(Gi), where the expectation is taken with respect to the joint distribution of (yi1,yi2,xi). Let I=E{si(β,η)si(β,η)} denote the Fisher information matrix with respect to β. It is noted that unlike the usual regression of the mean value of response, the regularity condition does not hold for the model (Equation1) (Tsiatis, Citation2006, Theorem 3.2). (It can be easily verified that IΘ.)

Let ηˆ be the maximum likelihood estimate (MLE) of η obtained from the marginal models. For instance, the marginal MLEs of the nuisance parameters under the bivariate normal response case are γˆj=(XX)1Xyj, σˆj2=n1i=1n(yjiμˆji)2, where μˆji=xiγˆj. By large sample theory, we have ηˆpη and n(ηˆη)dN(0,Ση), where Ση is the covariance matrix.

Next, we estimate β by using the Newton–Raphson method through iterating (7) β(r+1)=β(r)H1(β(r),ηˆ)s(β(r),ηˆ),(7) where β(r) is the estimate of β at the rth iteration. The convergence of Newton–Raphson method depends on the initial point and the negative definitiveness of the Hessian matrix, which guarantees the (unique) global maximizer of the log-likelihood function. The initial estimate β(1) can be chosen such that h(x¯β(1)) is in the range of ρ. However, the Hessian matrix involves the plug-in estimators of the nuisance parameters. It is not trivial to show its negative definitiveness theoretically. Nevertheless, numerical study shows the condition holds with all eigenvalues being negative. Denote the root of the Newton–Raphson method by βˆ.

Theorem 3.1

Suppose that the conditions of Lemma A.1 for the bivariate normal responses or those of Lemma A.2 for the bivariate binary responses hold. (Lemmas A.1 and A.2 are given in Appendix.) Suppose the iteration (Equation7) converges. (i) βˆ is a consistent estimator of β, i.e., βˆpβ, and (ii) n(βˆβ)dN(0,Σ), where Σ is the covariance matrix given in the proof.

The significance of the overall model is assessed by testing the hypothesis H0:β1==βp=0. Denote the Wald statistic W=βˆ0covˆ1(βˆ0)βˆ0, where βˆ0=(βˆ1,,βˆp) and covˆ(βˆ0) is the estimate of the covariance of βˆ0. By Theorem 3.1, under the null hypothesis, W follows a chi-square distribution of degrees of freedom p asymptotically. The significance of individual covariate xj (j=1,,p) is assessed by testing the hypothesis H0j:βj=0 using the statistic wj=βˆj/seˆ(βˆj), where seˆ(βˆj) is the estimate of the standard deviation of βˆj, which is the square root of the (j,j)th element of covˆ(βˆ0). The null distribution of wj is asymptotically standard normal. Since the exact expression of Σ is rather complicated, we use bootstrap to estimate it in practice.

4. Simulation study

We conduct simulations to examine the performance of the proposed model for making inference in estimation and testing.

4.1. Setup

Unlike possible large numbers of covariates for the mean model, a small number of relevant covariates are usually adequate for modelling the correlation coefficient. Consider the number of covariates p to be two and three, respectively. Set the sample size n to be 100, 200, 400 and 1000, respectively.

For bivariate normal responses, set the marginal variances σ12=σ22=1. For both bivariate normal responses and bivariate binary responses, we set both γ1 and γ2 to be a vector of p + 1 zeros for the marginal mean models. It simplifies the data generation without simplifying the estimation procedure. Let x1,,xp be independent uniform random variables in (0,1).

For the correlation coefficient model, set β0=0.25. Under the nonnull model, set β1,…,βp under two link functions as in Table , where only the first covariate is significant. Under the null model, simply set all β1,,βp to be zeros. We set (0.25,0,,0) as the initial values for β under both link functions.

Table 2. Specifications of β1,,βp under p = 2, 3 and two link functions.

Throughout, we set the significant level to be 0.05 for both the global test and individual test, and set the number of replications, B, to be 1000.

4.2. Results

First, denote the empirical root of mean square error of βˆ by ERMSE=B1b=1B{j=0p(βˆj(b)βj)2}1/2, where βˆ(b) is the estimate of β in the bth replication. It measures the performance of the estimation in terms of magnitude. Second, denote the directional consistency rate (CR) by CR=B1b=1B{n1i=1nI(ρˆi(b)ρi(b)>0)}, where ρi(b) is the true correlation coefficient in the bth replication and ρˆi(b)=h(xiβˆ(b)). It measures the performance of estimation in terms of sign direction.

Table  presents the type-I error of the global test and individual tests under the null model. It is seen that the proposed resampling test controls the type-I error well at the nominal level.

Table 3. Type-I errors (in %) of the global test (for the model significance) and individual test (for the significance of the covariates) under two types of responses, two link functions and various sample sizes.

Columns 4–5 and 9–10 of Table  present the ERMSE and CR under the nonnull model considered in Section 4.1. It is seen that as the sample size increases the ERMSE decreases while the CR increases as expected. The CR ranges from 85% to 100% indicating that the proposed model yields a good fit in the sign direction of the correlation coefficient. Columns 7–9 and 12–15 of Table  present the powers of the global test and individual tests of the covariates. It is seen that both the global test and the individual test for the significant covariate (i.e., x1) gain power as the sample size increases as desired. The rejection rates for the insignificant covariates (i.e., x2 and x3) are close to the nominal level as desired. We note that the power performance depends on the actual model. In general, we see the proposed method works well under moderate sample size.

Table 4. ERMSE, CR and testing power of β's of models considered in Table  with various sample sizes.

4.3. Sensitivity analysis

We adopt the simulation model from Wilding et al. (Citation2011) for bivariate normal responses, where y1N(1+2x1,1), y2N(1.5+1.2x1,1) and their correlation coefficient is given by ρ(y1,y2)=2Φ(0.5+β1x1)1 with x1 being a uniform random variable in (0,1). Consider β1 to be 0, 0.5 and 1, respectively, to represent the null case and two nonnull cases. Set n to be 30, 60 and 150, respectively.

We compute the rejection rate for testing the significance of x1 using the proposed method with link 2 in comparison with Wilding et al. (Citation2011)'s method by which the model is correctly specified. This serves as a sensitivity analysis under model mis-specification. Table  reports the rejection rates under the considered cases. It is seen that under the null case when β1=0 the proposed method controls the type-I error well. Under the nonnull cases when β1=0.5 and 1 the proposed method is less powerful than Wilding et al.'s method when the sample size is 30 and becomes comparable in power when the sample size gets larger (60).

Table 5. Rejection rates (in %) of Wilding et al. (Citation2011)'s method and the proposed method for testing the significance of the covariate.

5. Real data application

Our data are taken from the survey of National Health Measurement Study in 2005–2006 (Fryback, Citation2009). It collects responses to health-related quality of life questionnaires and health status questions through Short Form SF-36 questionnaires from 3844 older adults in the continental United States (1641 males and 2203 females).

5.1. Bivariate normal responses

First, we illustrate the application of the proposed model to the bivariate normal responses case. The SF-36 items were aggregated into eight health concepts, which were further summarized into the physical component summary (PCS) and the mental component summary (MCS), both in the range of (0,100) with higher scores indicating better physical and mental health functions, respectively [28]. Several medical studies showed that the PCS declined significantly with age, but that the MCS did not change with age (Kim, Citation2016; Ware Jr et al., Citation1994). To continue along these lines, we applied the proposed method to further investigate the effect of gender and marital status. We used the Fisher z-transformation (i.e., tanh link) to accommodate negative value of correlation. The estimate model for the correlation between PCS and age is found to be tanh(ρ(PCS,age))=0.2030.145×MARRIED+0.018×SEX, where MARRIED is 1 if married or living with a partner and 0 otherwise, and SEX is 1 if female and 0 if male. The corresponding p-values for marital status and gender are <0.001 and 0.592, respectively, indicating strong significance of marriage and insignificance of gender. The negative coefficient of MARRIED implies that the status of being married or living with a partner aggravates the declination of PCS with age (0.283 for people married or living with a partner and 0.209 for people living by themselves). In other words, the people who are married or living with a partner have relatively lower physical functionality than the people who live by themselves when they age in general. On the other hand, our model concurred with the conclusion of insignificant correlation between MCS and age. In addition, neither gender nor marital status has significant effect on this correlation. (The details are omitted.)

Moreover, we applied the proposed model to study the age effect on the correlation of PCS and MCS. (Throughout this section the observations of age are standardized, denoted by AGE, before fitting the model as covariate.) Insignificant result is found and supported by the visual inspection of the correlations over 20 age intervals determined by the percentiles (Figure (a)). This result is different from the finding of Wilding et al. (Citation2011), who showed a nonlinear downward trend of the correlation coefficient over age based on the 2003–2004 Health Outcome Survey data. When we further include gender and marital status as covariates, there is no significant gender effect either (p-value 0.312) as also seen in Figure (b). However, there appears a very mild effect of marital status (p-value 0.236) in the way that the status of being married or living with a partner slightly reduces the correlation from otherwise slight positive to the level of near zero (Figure (c)). (The fitted model is tanh(ρ(PCS,MCS))=0.1480.016×AGE0.063×MARRIED0.052×SEX.)

Figure 1. (a) Correlation of PCS and MCS over 20 age groups, (b) correlation of PCS and MCS over 20 age groups with different genders, (c) correlation of PCS and MCS over 20 age groups with different marital status.

Figure 1. (a) Correlation of PCS and MCS over 20 age groups, (b) correlation of PCS and MCS over 20 age groups with different genders, (c) correlation of PCS and MCS over 20 age groups with different marital status.

5.2. Bivariate binary responses

Second, we applied the proposed method to model the correlation of two binary indicators of stroke (STR) and diabetes (DIA) using the covariates of age, gender and marital status. Since the correlation is known positive, the logistic link is used. The estimated model which includes the second-order age effect is logistic(ρ(STR,DIA))=2.0501.352×AGE0.090×AGE2+0.938×MARRIED+0.270×SEX with strong significance of the linear age effect (p-value <0.001) and marriage effect (p-value 0.028) and insignificance of the second-order term of age (p-value 0.786) and gender effect (p-value 0.675). It is seen from Figure (a) that the correlation exhibits a parabola shape in age where the largest value (about 0.25) occurs around age 55 and then starts to decline afterwards. This indicates that the association of stroke and diabetes is strongly age dependent. The diabetes becomes a less significant risk factor to stroke after a certain age, e.g., 70 (Kim, Citation2016). The status of married or living with a partner contributes significantly to the positive correlation (0.145 for married vs 0.106 for otherwise) especially for people who are more than 50 years old (Figure (c)), which could be explained by the evidence of low physical functionality found before.

Figure 2. (a) Correlation of STROKE and DIABETES over 10 age groups, (b) correlation of STROKE and DIABETES over 10 age groups with different genders, (c) correlation of STROKE and DIABETES over 10 age groups with different marital status.

Figure 2. (a) Correlation of STROKE and DIABETES over 10 age groups, (b) correlation of STROKE and DIABETES over 10 age groups with different genders, (c) correlation of STROKE and DIABETES over 10 age groups with different marital status.

6. Concluding remarks

We propose two simple regression models of Pearson correlation coefficient of two continuous responses or two binary responses. Likelihood-based inference is developed to estimate the model and test the significance of covariates. Our method for the binary response case is new to the literature and is useful for analysing data when outcomes are observed in binary form (such as yes or no) as seen in many psychological and sociological studies. The proposed method is easy to implement and computationally affordable. (The Newton–Raphson iteration converges in a few steps.) We have made R package available upon request.

It is noted that for the binary response case, the correlation ρ actually has a restricted range given by max{ψ1ψ2,(ψ1ψ2)1}ρmin{ψ1ψ21,ψ2ψ11}, where ψ1=p1/(1p1) and ψ2=p2/(1p2) (Qaqish, Citation2003). A more precise model than the Fisher z-transformation is warranted to further improve the efficiency.

The limitation of the present paper is that the proposed method does not handle the case of mixed responses, i.e., one continuous response and one binary response, or more generally an ordinal response categorized from a latent variable, such as ‘mild’, ‘moderate’, ‘severe’. Regression model of correlation involving latent variable(s) is worth investigation. We will report the result in a separate work.

Disclosure statement

All authors declare no conflict of interest.

Data availability statement

The data that support the findings of this study are available from the corresponding author upon request.

References

  • Anderson, T. (2003). An introduction to statistical multivariate analysis. 3rd ed. Wiley.
  • Bartlett, R. F. (1993). Linear modelling of Pearson's product moment correlation coefficient: An application of Fisher's z-transformation. Journal of the Royal Statistical Society: Series D, 42(1), 45–53.
  • Fryback, D. G. (2009). United States National Health Measurement Study, 2005-2006. Inter-University Consortium for Political and Social Research.
  • Kim, D. (2016). Correlation between physical function, cognitive function, and health-related quality of life in elderly persons. Journal of Physical Therapy Science, 28(6), 1844–1848. https://doi.org/10.1589/jpts.28.1844
  • Liu, Q., Li, C., Wanga, V., & Shepherd, B. E. (2018). Covariate-adjusted Spearman rank correlation with probability-scale residuals. Biometrics, 74(2), 595–605. https://doi.org/10.1111/biom.v74.2
  • Newey, W. K., & McFadden, D. (1994). Large sample estimation and hypothesis testing. Handbook of Econometrics, 4, 2111–2245. https://doi.org/10.1016/S1573-4412(05)80005-4
  • Paul, S. (1989). Test for the equality of several correlation coefficients. Canadian Journal of Statistics, 17(2), 217–227. https://doi.org/10.2307/3314850
  • Qaqish, B. F. (2003). A family of multivariate binary distributions for simulating correlated binary variables with specified marginal means and correlations. Biometrika, 90(2), 455–463. https://doi.org/10.1093/biomet/90.2.455
  • Thomas, P., Jolanda, L., Antonius, J. M. d. C., Joris, P. J. S., & Rudi, G. J. W. (2016). Impact of physical and mental health on life satisfaction in old age: A population based observational study. BMC Geriatrics, 16(1), 194. https://doi.org/10.1186/s12877-016-0365-4
  • Tsiatis, A. A. (2006). Semiparametric theory and missing data. Springer.
  • Ware Jr, J. E., Kosinski, M., & Keller, S. D. (1994). SF-36 physical and mental summary scales: A user's manual. The Health Institute, New England Medical Center.
  • Wilding, G. E., Cai, X., Hutson, A., & Yu, Z. (2011). A linear model-based test for the heterogeneity of conditional correlations. Journal of Applied Statistics, 38(10), 2355–2366. https://doi.org/10.1080/02664763.2011.559201
  • Yu, M., & Dunn, O. (1982). Robust tests for the equality of two correlation coefficients: A monte carlo study. Educational and Psychological Measurement, 42(4), 987–1004. https://doi.org/10.1177/001316448204200407

Appendix. Technical details

A.1. Details of eab and fab

The expressions of eab and fab under two link functions of h are, respectively eab(ρ)={dabρ(1ρ)cab+dabρ,link 1,dab(1+ρ)(1ρ)cab+dabρ,link 2, and fab(ρ)={{dab2+dab(cab+dabρ)(12ρ)}ρ2(1ρ)2(cab+dabρ)2,link 1,{dab2dab(cab+dabρ)2ρ}(1+ρ)2(1ρ)2(cab+dabρ)2,link 2.

A.2. Lemmas

Lemma A.1

Assume that under the bivariate normal response model in Section 2.1, there exist constants c1, c2, c3, and c4 such that 0<c1|ρ|c2<1, 0|μˆj|<c3, |σˆj|c4>0 for j = 1, 2. Then, (i) n1i=1nHi(β,ηˆ)pΘ, and (ii) n1i=1nGi(β,η)pJ.

Proof.

Note that we cannot apply the weak law of large number as i(β,ηˆ)s and i(β,η)s are neither independent nor identically distributed.

(i) Let Tˆ1i=(y1iμˆ1iσˆ1)2+(y2iμˆ2iσˆ2)2 and Tˆ2i=(y1iμˆ1i)(y2iμˆ2i)σˆ1σˆ2. First, by the assumptions of 0|μˆj|<c3 and |σˆj|c4>0 for j = 1, 2, we have |Tˆ1i|=|(y1iμˆ1σˆ1)2+(y2iμˆ2σˆ2)2|y1i2+c32+2c3|y1i|c42+y2i2+c32+2c3|y2i|c42;|Tˆ2i|=|(y1iμˆ1)(y2iμˆ2)σˆ1σˆ2|(|y1i|+c3)(|y2i|+c3)c42. Second, by the assumption of 0<c1|ρ|c2<1, there exists a positive constant M1 such that |u(ρ)|, |v(ρ)|, and |w(ρ)| are all bounded by M1. Since Hi(β,ηˆ)={u(ρi)Tˆ1i+v(ρi)Tˆ2i+w(ρi)}xixi, then, there exists a function g(x,y) such that Hi(β,ηˆ)∥⩽g(xi,yi) and E{g(xi,yi)}, where is the Euclidean norm (i.e., A∥={tr(AA)}1/2). Clearly, Hi(β,η) is continuous in (β,η).

By Lemma 2.4 of Newey and McFadden (Citation1994), we have supβ,ηn1i=1nHi(β,η)Θp0. Since βpβ (because βˆpβ and β is between β and βˆ), ηˆpη and Θ is continuous in (β,η), by Tsiatis (Citation2006, Section 3.2), the result of (i) follows.

(ii) By (Equation2) and (Equation3), we have T1iγ1=2(y1iμ1)σ12xi,T2iγ1=y2iμ2σ1σ2xi,T1iγ2=2(y2iμ2)σ22xi,T2iγ2=y1iμ1σ1σ2xi,T1iσ12=(y1iμ1)2σ14,T2iσ12=(y1iμ1)(y2iμ2)2σ13σ2,T1iσ22=(y2iμ2)2σ24,T2iσ22=(y1iμ1)(y2iμ2)2σ1σ23, and 2i(β,η)βγ1={a(ρ)2(y1iμ1)σ12b(ρ)y2iμ2σ1σ2}xixi,2i(β,η)βγ2={a(ρ)2(y2iμ2)σ22b(ρ)y1iμ1σ1σ2}xixi,2i(β,η)βσ12={a(ρ)(y1iμ1)2σ14b(ρ)(y1iμ1)(y2iμ2)2σ13σ2}xi,2i(β,η)βσ22={a(ρ)(y2iμ2)2σ24b(ρ)(y1iμ1)(y2iμ2)2σ1σ23}xi. By the assumptions, we can find a function k(x,y) such that Di(β,η)k(xi,yi) and E{|k(xi,yi)|}<. And Di(β,η) is continuous in (β,η). Using the similar argument as in the proof for (i), by Lemma 2.4 of Newey and McFadden (Citation1994) supβ,ηn1i=1nGi(β,η)Jp0. Since ηpη and J is continuous in (β,η), the result of (ii) follows.

Lemma A.2

Assume that under the bivariate Bernoulli response model in Section 2.2 there exist constants d1,,d6 such that |h(xβ)|d1, |h(xβ)|d2 for all x, 0<d3<pj<d4<1, for j = 1, 2, 0<d5<|ρ|<d6<1. Then, (i) n1i=1nHi(β,ηˆ)pΘ, and (ii) n1i=1nGi(β,η)pJ.

Proof.

(i) By the assumptions, there exists M2 such that |fiab(β,η)|M2 for all a, b = 0, 1 and i=1,,n. Then, there exists a function g(x,y) such that Hi(β,ηˆ)g1(xi,yi) and E{g1(xi,yi)}. Clearly, Hi(β,η) is continuous in (β,η). By Lemma 2.4 of Newey and McFadden (Citation1994), supβ,ηn1i=1nHi(β,η)Θp0.

Since βpβ, ηˆpη and Θ is continuous in (β,η), by Tsiatis (Citation2006) Section 3.2, the result of (i) follows.

(ii) Observe that Gi(β,η)=a,b=0,1Iiabeiab(β,η)ηxi=a,b=0,1Iiabdiabη(ciab+diabρi)(ciabηdiabηρi)diab(ciab+diabρi)2h(xiβ)xi. By the logistic model defined in Section 2.2, pji/ηj=pj(1pj)xi. The expressions of diab/η and ciab/η are obtained as follows. di11ηj={p1i(1p1i)p2i(1p2i)}1/2(12pji)xi,j=1,2;di10ηj=di01ηj=di11ηj,di00ηj=di11ηj,j=1,2;ci11ηj=p1ip2i(1pji)xi,j=1,2;ci10η1=p1iη1p2ip1iη1=p1i(1p1i)(1p2i)xi;ci10η2=p1ip2iη2=p1ip2i(1p2i)xi;ci01η1=p2ip1iη1=p2ip1i(1p1i)xi;ci01η2=p2iη2p1ip2iη2=p2i(1p2i)(1p1i)xi;ci00ηj=pji(1p1i)(1p2i)xi,j=1,2. By the assumptions, we can find a function ω(x,y) such that Gi(β,η)ω(xi,yi) and E{ω(xi,yi)}<. And, Gi(β,η) is continuous in (β,η). Using the similar argument as in the proof for (i), by Lemma 2.4 of Newey and McFadden (Citation1994) supβ,ηn1i=1nGi(β,η)Jp0. Since ηpη in probability and J is continuous in (β,η), the result of (ii) follows.

A.3. Proof of Theorem 3.1

By the standard theory of regression, we have n(ηˆη)N(0,Ση).

By the fact that βˆ solves s(β,ηˆ)=0, expand s(βˆ,ηˆ) (with respect to β) around β as (A1) 0=s(βˆ,ηˆ)=s(β,ηˆ)+i=1nHi(β,ηˆ)(βˆβ),(A1) where β is between βˆ and β component-wise.

Second, expand s(β,ηˆ) (with respect to η) around η as (A2) s(β,ηˆ)=i=1nsi(β,ηˆ)=i=1nsi(β,η)+i=1nGi(β,η)(ηˆη),(A2) where η is between ηˆ and η component-wise.

Combining (EquationA1) and (EquationA2), n(βˆβ) is expressed as (A3) n{1ni=1nHi(β,ηˆ)}1[1ni=1nsi(β,η)+{1ni=1nGi(β,η)}(ηˆη)].(A3) By Lemma A.1 for the bivariate normal response case or Lemma A.2 for the bivariate Bernoulli response case, (A4) 1ni=1nHi(β,ηˆ)pΘand1ni=1nGi(β,η)pJ.(A4) And by the central limit theorem, n1/2i=1nsi(β,η)dN(0,I). Then, by Slusky theorem, (A5) {1ni=1nHi(β,ηˆ)}11ni=1nsi(β,η)dN(0,Θ1IΘ1),(A5) and (A6) {1ni=1nHi(β,ηˆ)}1{1ni=1nGi(β,η)}n(ηˆη)dN(0,Θ1JΣηJΘ1).(A6) Combining (EquationA3), (EquationA5), and (EquationA6), n(βˆβ) is asymptotically normal with zero mean vector and covariance matrix given by Σ=Θ1IΘ1+Θ1JΣηJΘ1+cov{LHS of (A5),LHS of (A6)}.