Abstract
The moment estimate of the correlation parameters obtained from generalized estimating equations (GEE) is not guaranteed to exist or to be feasible. We introduce the Gaussian estimation method and show that the estimate of the correlation parameters in the longitudinal data setup is asymptotically unbiased and feasible. We derive the large sample properties of the regression and correlation estimates and illustrate these estimators via a real-life data example.
1 Introduction
Longitudinal or repeated measures data arise in several areas of research, including the social, environmental, medical and biological sciences, where many experimental units (humans, animals, plants, schools, etc.) are measured repeatedly over time. The broad goal of such studies is to assess changes over time within units and relate them to treatments, interventions and other relevant covariates using the idea of regression. The difficulty of exactly specifying the distribution of nonnormal data (binary, count, etc.) precludes the use of maximum likelihood for parameter estimation. However, the idea of quasi-likelihood (QL) for independent observations (Wedderburn, 1974; Heyde, 1997) requires only a model for the mean and the relationship between the mean and variance. The generalized estimating equations (GEE) method of Liang and Zeger (1986) extended the framework of QL estimation to dependent longitudinal data by specifying models for the marginal means and variances. They relied on a 'working' correlation matrix with few parameters (say, α) to write an analog of the generalized least squares normal equations. Some pitfalls and limitations of the GEE estimator of α were pointed out by Crowder (2001) and other researchers; for example, the moment estimator of the correlation parameter may not exist in some simple cases, or it may not be feasible. Despite this list of drawbacks, researchers have maintained that GEE usually works well in practice, even under misspecified structures (Crowder, 2001). Several articles proposed alternative approaches based on minimizing objective functions, instead of solving ad hoc estimating equations, to obtain efficient estimators of β and α (Chaganty, 1997; Chaganty and Shults, 1999; Al-Rawwash, 2001; Crowder, 2001; Al-Rawwash and Pourahmadi, 2006).
Chaganty (1997) relied on partial minimization of the generalized least squares criterion to introduce the quasi-least squares (QLS) estimates, where the covariance matrix is a function of both the regression and correlation parameters (β, α). The estimating equation for β is precisely the same as in GEE, but the estimating equation for α is new and different from all previously developed approaches (see Diggle et al., 1994). Shults and Chaganty (1998) used the QLS method for certain correlation models and showed the advantage of the QLS over the GEE in the unbalanced longitudinal data setup. The fact that the QLS estimator of the correlation parameter is asymptotically biased motivated Chaganty and Shults (1999) to propose a method to eliminate this problem. They proposed using a continuous, one-to-one transformation that depends on the working correlation matrix under the QLS scheme. They presented the method for different correlation structures and obtained a bias-corrected estimate of the correlation matrix. Pourahmadi (2000) developed the maximum likelihood estimators for the parameters of a generalized linear model for the covariance matrix and discussed the asymptotic results under a normality condition. Al-Rawwash (2005) introduced a nonparametric approach to estimate the parameters of interest as an alternative to the classical parametric techniques. Al-Rawwash and Pourahmadi (2006) introduced the generalized Gaussian likelihood function as their objective function and used a data-driven approach to obtain estimates of the parameters under minimal distributional requirements. In this paper, we follow the footsteps of Al-Rawwash and Pourahmadi (2006) and use the generalized Gaussian likelihood function as the objective function to pursue the estimation of the regression and correlation parameters.
The history of Gaussian estimation (GE) goes back to Whittle (1961) in the time series literature, where the parameter estimates are the values that maximize the Gaussian likelihood function for certain correlation structures. Several articles discussed the GE in the longitudinal data literature, focusing on obtaining the parameter estimates by minimizing an objective function even when the observations are not normally distributed (Crowder, 1995; Crowder, 2001; Al-Rawwash, 2001; Al-Rawwash and Pourahmadi, 2006). In this article, we assume only knowledge of the first two moments and not the exact specification of the distribution of the data. The notion of Gaussian estimation is distinct from GEE in that it minimizes an objective function rather than relying on estimating equations that might produce flawed estimators (Chaganty, 1997; Al-Rawwash, 2001; Crowder, 2001). Several articles discussed the idea of modeling longitudinal data sets, including but not limited to Vonesh and Chinchilli (1997), Lipsitz et al. (2000), Wang and Carey (2003), Pourahmadi et al. (2007) and Buzkova and Lamley (2008).
The organization of the paper is as follows. In Section 2, we introduce the Gaussian estimation and outline a Newton-Raphson iterative method for computing the estimates. Some closed form solutions for the estimate of the correlation parameter α are given in Section 3 when the correlation structure is AR(1). In Section 4, we establish the consistency and asymptotic normality of the regression and correlation parameter estimates under mild regularity conditions. In Section 5, we provide a real-life data analysis comparing the GE to other well-known estimation procedures. Finally, we outline the proofs of some results in the appendix.
2 The Gaussian estimation
To set the notation, we consider a random vector Y = (y1, … , yn)′ comprising the measurements taken on a generic subject at times t1, … , tn, and the associated covariates xj = (xj1, … , xjp)′, j = 1, 2, … , ni, i = 1, 2, ⋯ , m. We make no distributional assumption about Y except for its first two moments. Firstly, the means and variances of the entries of the response are related to the covariates xj via

g(μj) = xj′β, var(yj) = φ v(μj),

where β = (β1, … , βp)′ is the vector of regression parameters of primary interest, g is an invertible known link function, v(·) is a known variance function and φ is a dispersion parameter not depending on β. The importance of the parameter φ appears clearly when we handle a nonnormal data set, for it expresses the variability beyond the mean. The setup can be carried out for unbalanced longitudinal data; however, we focus in this article on the balanced case where ni = n. Secondly, we use the matrix R = R(α) to model correlations among the measurements on the same subject. The matrix R and its parameters α are usually viewed as nuisance parameters, though in some situations they are of primary interest. Now, if we define μ = μ(β) = (μ1, … , μn)′ and A(β) = diag(v(μ1), … , v(μn)), then the covariance matrix of Y is decomposed into

Σ = cov(Y) = φ A^{1/2}(β) R(α) A^{1/2}(β).
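As a concrete sketch of this covariance decomposition, the snippet below builds Σ = φ A^{1/2}(β) R(α) A^{1/2}(β) for one subject, assuming (purely for illustration, since the text prescribes none of these choices) a log link, a Poisson-type variance function v(μ) = μ and an AR(1) working correlation:

```python
import numpy as np

def ar1_corr(rho, n):
    """AR(1) working correlation matrix R(rho) = (rho^|i-j|)."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def covariance(beta, X, phi, rho):
    """Sigma = phi * A^{1/2}(beta) R(alpha) A^{1/2}(beta), illustrated
    with a log link g = log and variance function v(mu) = mu."""
    mu = np.exp(X @ beta)            # g(mu_j) = x_j' beta
    A_half = np.diag(np.sqrt(mu))    # A(beta)^{1/2} = diag(v(mu_j))^{1/2}
    R = ar1_corr(rho, len(mu))
    return phi * A_half @ R @ A_half

# one subject measured at 4 occasions, hypothetical design and parameters
X = np.column_stack([np.ones(4), np.arange(4.0)])
Sigma = covariance(np.array([0.5, 0.1]), X, phi=1.2, rho=0.3)
```

The resulting Σ is symmetric with diagonal entries φ v(μj), which is the decomposition stated above.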
In the longitudinal data setup, we assume Yi, i = 1, 2, ⋯ , m, to be the vector of repeated measures on the ith subject with covariance Σi = φ Ai^{1/2}(β) R(α) Ai^{1/2}(β). The Gaussian estimates (Al-Rawwash and Pourahmadi, 2006) of (β, α) are the minimizers of the objective function

G(β, α) = Σ_{i=1}^m { log|Σi| + (Yi − μi)′ Σi⁻¹ (Yi − μi) }, (1)

where Zi = Ai^{−1/2}(β)(Yi − μi) is the vector of partially standardized measurements on the ith subject. Even if G(·, ·) is differentiable, its differentiation with respect to β is complicated because of the appearance of β in Ai(β). This problem is avoided by pretending that Ai(β∗) is a function of a parameter β∗ different from β (sometimes called decoupling). Therefore, if we differentiate Equation (1) with respect to β and α and then substitute β for β∗, we arrive at:
Σ_{i=1}^m Di′ Ai^{−1/2}(β) R⁻¹(α) Ai^{−1/2}(β) (Yi − μi) = 0, (2)

Σ_{i=1}^m [ tr( R⁻¹(α) ∂R(α)/∂αk ) − φ⁻¹ Zi′ R⁻¹(α) (∂R(α)/∂αk) R⁻¹(α) Zi ] = 0, k = 1, ⋯ , q, (3)

where D(β) = diag(D1, D2, ⋯ , Dm) with Di = ∂μi/∂β′ for 1 ⩽ i ⩽ m, A(β) = diag(A1(β), A2(β), ⋯ , Am(β)), and ⊗ is the Kronecker product. The estimating Eqs. (2) are precisely the GEEs for β, but Eq. (3) is different from the estimating equations developed in the literature to estimate the correlation parameter. Solving these two equations for β and α gives the GE of (β, α). Since a closed form solution is not available in general, we suggest the following Newton-Raphson iterative algorithm. Firstly, we choose an initial value β^(0) for β, then we evaluate the quantities μi(β^(0)), Ai(β^(0)), Di(β^(0)) and Zi(β^(0)) for 1 ⩽ i ⩽ m. Accordingly, we solve Eq. (3) for α^(0), then we construct the covariance matrices Σi(β^(0), α^(0)) as well as D(β^(0)) and A(β^(0)). Hence, we update the value of β using a Newton-Raphson step based on Eq. (2). Finally, we repeat the process until successive iterates of β differ by less than a prescribed tolerance, then we take the final iterates β̂ and α̂ as the estimates of β and α. The least squares estimate of β is a good candidate for the initial value of β, while initial estimates of α must be chosen so that the positive definiteness of the correlation matrix is guaranteed. The estimate of φ depends on the GE of β and α obtained from the Newton-Raphson iterations and reduces to

φ̂ = (1/(mn)) Σ_{i=1}^m Ẑi′ R⁻¹(α̂) Ẑi,

where Ẑi = Ai^{−1/2}(β̂)(Yi − μi(β̂)).
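The iteration above can be sketched for the simplest special case: an identity link with unit variance function (so Ai = I and Σi = φR(ρ)) and an AR(1) working structure. The α-step here minimizes the Gaussian objective over a grid rather than solving Eq. (3) exactly, so this is an illustrative approximation of the algorithm, not the paper's exact procedure:

```python
import numpy as np

def ar1_corr(rho, n):
    """AR(1) working correlation matrix R(rho) = (rho^|i-j|)."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def profile_objective(rho, resid):
    """Gaussian objective in rho with phi profiled out, for the
    identity-link, unit-variance case (A_i = I, Sigma_i = phi*R(rho))."""
    m, n = resid.shape
    _, logdet = np.linalg.slogdet(ar1_corr(rho, n))
    quad = np.einsum('ij,jk,ik->', resid, np.linalg.inv(ar1_corr(rho, n)), resid)
    return m * logdet + m * n * np.log(quad / (m * n))

def gaussian_estimate(Y, X, tol=1e-8, max_iter=100):
    """Alternate a GLS update for beta with a grid search for rho.
    Y is (m x n), X is a common (n x p) design shared by all subjects."""
    m, n = Y.shape
    beta = np.linalg.lstsq(X, Y.mean(axis=0), rcond=None)[0]  # OLS start
    rhos = np.linspace(-0.95, 0.95, 191)
    for _ in range(max_iter):
        resid = Y - X @ beta
        rho = rhos[np.argmin([profile_objective(r, resid) for r in rhos])]
        Rinv = np.linalg.inv(ar1_corr(rho, n))
        beta_new = np.linalg.solve(X.T @ Rinv @ X, X.T @ Rinv @ Y.mean(axis=0))
        if np.linalg.norm(beta_new - beta) < tol:
            beta = beta_new
            break
        beta = beta_new
    resid = Y - X @ beta
    phi = np.einsum('ij,jk,ik->', resid,
                    np.linalg.inv(ar1_corr(rho, n)), resid) / (m * n)
    return beta, rho, phi
```

With a common (n × p) design across subjects, the β-step is ordinary generalized least squares, which is why the subject-averaged response suffices in the update.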
The unbiasedness of the GE of (β,α) is not guaranteed when the working correlation structure is misspecified. However, we show in Section 4 that the GE of β and α is consistent when the working correlation matrix is correctly specified.
3 Specific correlation structures
In this section, we present closed form solutions for the correlation parameter under the AR(1) correlation structure when the working correlation matrix is either misspecified or correctly specified. These can be used within the Newton-Raphson algorithm and hence avoid an iterative cycle inside it. The impact of misspecifying the correlation matrix on the degree of the estimating equations and the form of their solutions is illuminated. Chaganty (1997) and Chaganty and Shults (1999) considered special correlation structures and derived closed form solutions for the estimator of the correlation parameter using the QLS and the C-QLS. The asymptotic bias of the QLS estimator of the correlation parameter was corrected by Chaganty and Shults (1999) using a continuous one-to-one function that transforms the biased correlation matrix into an asymptotically unbiased one. We show that the GE of α is asymptotically unbiased and feasible when the number of measurements per subject n is greater than 3, or when a slightly modified (conditional) form of the Gaussian estimation is used.
In the following example we assume that the true correlation matrix, denoted by R̄, is unstructured, and we give explicit formulas for the estimate of α by solving Equation (3) for the AR(1) 'working' correlation matrix.
Example 1.
Let the working correlation structure of the repeated measurements be the AR(1) matrix R(ρ) = (ρ^|i−j|) with the parameter α = ρ ∈ (−1, 1). Note that

(1 − ρ²) R⁻¹(ρ) = I − ρC1 + ρ²C2,

where C1 is a tridiagonal matrix with 0 on the diagonal and 1 on the first upper and lower diagonals, and C2 = diag(0, 1, ⋯ , 1, 0). Since the entries of (1 − ρ²)R⁻¹ are quadratic functions of ρ, the estimating Eq. (3) reduces to the quadratic equation

am ρ² − 2bm ρ + am = 0,

where the coefficients am and bm are quadratic forms in the standardized residuals Zi defined in Equation (1), evaluated at the current estimate of β. Its (unique) feasible solution is given by

ρ̂m = ( bm − sign(bm) √(bm² − am²) ) / am,

provided that |bm| > |am|. This is in sharp contrast to the QLS estimate of the correlation parameter, which is known to be feasible all the time (Chaganty, 1997). The differences between the coefficients of the quadratic equations for the GE and QLS estimates of α lie in the second terms of am and bm. Nevertheless, there are datasets for which |bm| < |am| and the quadratic equation has complex roots.
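Both the inverse identity and the feasibility condition in Example 1 can be checked numerically. The helper below treats am and bm as given scalars (their data-dependent definitions are not reproduced above) and assumes the quadratic takes the symmetric form am ρ² − 2bm ρ + am = 0, which is consistent with the stated condition |bm| > |am|, since the two roots then multiply to 1 and exactly one lies inside (−1, 1):

```python
import numpy as np

def ar1_corr(rho, n):
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def ar1_inverse_identity(rho, n):
    """Check (1 - rho^2) R^{-1}(rho) = I - rho*C1 + rho^2*C2."""
    C1 = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    C2 = np.diag(np.r_[0.0, np.ones(n - 2), 0.0])
    lhs = (1 - rho ** 2) * np.linalg.inv(ar1_corr(rho, n))
    rhs = np.eye(n) - rho * C1 + rho ** 2 * C2
    return np.allclose(lhs, rhs)

def feasible_root(a, b):
    """Feasible root in (-1, 1) of a*rho^2 - 2*b*rho + a = 0 when |b| > |a|.
    The roots multiply to 1, so exactly one lies inside the unit interval."""
    if a == 0.0:
        return 0.0                       # equation degenerates to -2*b*rho = 0
    if abs(b) <= abs(a):
        raise ValueError("no feasible real root: |b| must exceed |a|")
    return (b - np.sign(b) * np.sqrt(b ** 2 - a ** 2)) / a
```

For example, with the hypothetical values a = 1 and b = 2 the feasible root is 2 − √3 ≈ 0.268; its reciprocal 2 + √3 is the infeasible companion root.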
It is important to mention that the working and true correlation matrices may have the same AR(1) structure, which consequently makes corr(Yi) correctly specified. In this case, the estimating Eq. (3) is cubic in ρ; therefore, we derive closed formulas for its solutions and study their feasibility.
Example 2
Let both the working and true correlation matrices have AR(1) structures, with R(ρ) and R⁻¹(ρ) as given in Example 1. Then Equation (3) reduces to the cubic equation

f(ρ) = ρ³ − a1ρ² + b1ρ − c1 = 0, (4)

where the coefficients a1, b1 and c1 are functions of the standardized residuals Zi defined in Equation (1). To solve the cubic Eq. (4), we follow the standard procedure of reducing it to depressed form and define the discriminant Δ together with the auxiliary quantities u, v and γ1, where the upper sign in γ1 is used if v is positive and the lower sign if v is negative. The nature and number of feasible roots of Equation (4) depend on the signs of Δ and u, as discussed below:

Case 1: If Δ > 0 and u > 0, then Equation (4) has exactly one real root.

Case 2: If Δ < 0 and u < 0, then Equation (4) has three distinct real roots.

Case 3: If Δ = 0 and u < 0, then Equation (4) has three real roots, at least two of which coincide.

In each case, the feasibility of the roots must be checked against the interval (−1, 1).
3.1 Unique Feasible Solutions
In this section we show that when the number of subjects m is large, or when a modified (conditional) Gaussian estimate is used, a unique feasible correlation parameter estimate satisfying Equation (4) exists.
A number of simulations in Al-Rawwash (2001), corresponding to different values of the triplet (m, n, ρ), revealed that the cubic Eq. (4) has a unique feasible solution for m ⩾ 3. This provided a clue that for large m Equation (4) must have a unique feasible solution in (−1, 1), which we show next. From Equation (4) and the definitions of a1 and b1, it follows that f(−1) < 0 and f(1) > 0. Thus, if we show that f(ρ) is monotone in (−1, 1) as m → ∞, then the existence of a unique feasible root follows. A sufficient condition for f(ρ) to be monotone is for its derivative f′(ρ) = 3ρ² − 2a1ρ + b1 to have at most one real root. But this happens if and only if its discriminant 4(a1² − 3b1) is nonpositive. Now, applying the law of large numbers (as m → ∞) to a1 and b1 shows that the limit of a1² − 3b1 is nonpositive if and only if n ⩾ 4.
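A sketch of locating the unique feasible root by bisection, valid under the monotonicity condition derived above (a1² ⩽ 3b1 together with f(−1) < 0 < f(1)); the coefficients passed in are illustrative inputs, not the data-dependent quantities of Eq. (4):

```python
def unique_feasible_root(a1, b1, c1, tol=1e-12):
    """Bisection for the unique root in (-1, 1) of
    f(rho) = rho^3 - a1*rho^2 + b1*rho - c1,
    assuming f(-1) < 0 < f(1) and a1**2 <= 3*b1 (so f is monotone)."""
    f = lambda r: r ** 3 - a1 * r ** 2 + b1 * r - c1
    if not (f(-1.0) < 0.0 < f(1.0)):
        raise ValueError("sign condition f(-1) < 0 < f(1) fails")
    if a1 ** 2 > 3 * b1:
        raise ValueError("f' may have two real roots; f need not be monotone")
    lo, hi = -1.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Bisection is guaranteed to converge here precisely because the monotonicity argument above rules out multiple sign changes in (−1, 1).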
Another situation that gives rise to a unique feasible correlation parameter estimate for the AR(1) 'working' correlation structure is obtained by replacing the objective function (1) by the one corresponding to the conditional Gaussian likelihood function.
4 Large Sample Properties
This section presents the large sample properties of the Gaussian estimates β̂ and α̂. We establish their consistency and asymptotic normality as m → ∞ under certain regularity conditions. Chaganty (1997) showed that the QLS estimate of β is consistent, that the QLS estimate of α is asymptotically biased, and that the joint distribution of the two estimates is asymptotically normal. A corrected version of the QLS that eliminates the bias problem was proposed by Chaganty and Shults (1999), who considered Markov and generalized Markov structures to model the unbalanced correlation matrix cases. Thus, compared to the QLS estimate of α, our proposed GE of the correlation parameter is asymptotically unbiased, and we compare our results to the QLS as well as the C-QLS estimates. Our proofs are in the spirit of Chaganty (1997) and Sen and Singer (1993, p. 206). Crowder (2001) presented an alternative proof establishing consistency of the GE, but asymptotic normality only for β̂.

We start by introducing some preliminary facts and notation about the components of Equation (1). Let Yi = (yi1, yi2, ⋯ , yin)′, i = 1, 2, ⋯ , m, be independent random vectors with E(Yi) = μi(β) and cov(Yi) = Σi(β, α), where the expectation and the covariance are taken under the true probability distribution. Also, let R(α) be the working correlation matrix and let R̄ be the true correlation matrix.
Theorem 1

Let θ = (β, α)′ and set

vi(θ) = log|Σi(θ)| + (Yi − μi(β))′ Σi⁻¹(θ) (Yi − μi(β)), i = 1, 2, ⋯ , m. (5)

If ∇vi(θ) and ∇²vi(θ) exist everywhere and satisfy 0 < E(∇²vi(θ)) < ∞, then

(a) E(∇vi(θ)) = 0,

(b) cov(∇vi(θ)) = ψi(θ),

(c) E(∇²vi(θ)) = ξi(θ),

where ψi(θ) and ξi(θ) are partitioned into blocks ψijk(θ) and ξijk(θ), 1 ⩽ j, k ⩽ 2, conformably with θ = (β, α)′. Assume that, as m → ∞, we have

(1/m) Σ_{i=1}^m ψi(θ) → ψ(θ) and (1/m) Σ_{i=1}^m ξi(θ) → ξ(θ). (6)

The entries of these matrices are expressed in terms of ej, the (p × 1) column vector with one in the jth row and zeros elsewhere, the true correlation structure R̄ and the Kronecker product ⊗. It is assumed throughout that the components of ψ(θ) and ξ(θ) are finite.
Next, we present our results on the large sample properties of the GE of (β, α). The regularity conditions (6) needed to establish consistency and asymptotic normality are similar to those used in the multivariate central limit theorem for independent but not identically distributed random vectors.
Theorem 2

Let θ = (β, α)′ and assume that Equation (6) holds together with a local continuity condition on ∇²vi(θ) over shrinking neighborhoods ∥θ′ − θ∥ ⩽ δ as δ → 0. Then the Gaussian estimate θ̂ = (β̂, α̂)′ is consistent for (β, α) and √m(θ̂ − θ) converges in distribution to N(0, ξ⁻¹(θ) ψ(θ) ξ⁻¹(θ)).

Theorem 2 establishes the consistency and asymptotic normality of the GE of (β, α); we outline the proof in the Appendix.
Theorem 3

The Gaussian estimates β̂ and α̂ are asymptotically correlated, and their joint asymptotic covariance matrix is ξ⁻¹(θ) ψ(θ) ξ⁻¹(θ), partitioned conformably with θ = (β, α)′.

In view of the previous theorems, we notice that if R(α) is correctly specified and equal to R̄, then ψi11(θ) = 2ξi11(θ), ψ11(θ) = 2ξ11(θ), and the asymptotic covariance of √m(β̂ − β) reduces to twice the corresponding block of ξ⁻¹(θ). Also, the asymptotic distribution of β̂ depends on the parameters β, α, φ and R̄, while the asymptotic distribution of α̂ depends on α and R̄ but not on β. It also depends on the third and fourth moments of Yi, which may cause a loss of efficiency if these moments are misspecified. The sandwich-type covariance estimator protects against the impact of misspecification of R(α); however, we may lose some efficiency when using this estimator.
5 Data analysis
Consider the dataset from Potthoff and Roy (1964), which consists of 27 subjects in a dental study (16 boys and 11 girls). The response yij is the distance, in mm, from the center of each subject's pituitary to the pteryomaxillary fissure, recorded at ages 8, 10, 12 and 14. We treat the data as normally distributed with mean vector μ = Xβ and covariance matrix Σ. Also, we consider the models used by Chaganty and Shults (1999), analyze the data using the GE and compare the results with theirs. First, we fit a model in which xi1 and xi2 are indicator variables for the two genders, boys and girls respectively, and xi3 is the subject's age at the jth measurement time. The estimates of (β, α, φ) are computed using the Newton-Raphson iterative method assuming the working correlation structure to be:
1. an AR(1) structure,
2. an unstructured correlation matrix.
Table 1 contains the estimates and the standard errors of the parameters using the GEE, C-QLS and GE with the AR(1) working correlation structure. Similarly, Table 2 contains the estimates when the working correlation matrix is unstructured. The standard errors in both tables were computed using the model-robust, sandwich-type estimator. From Tables 1 and 2, it can be seen that the estimates of the regression parameters are in reasonable agreement, but the standard errors are slightly different. Overall, the GE and C-QLS have smaller standard errors than the GEE; moreover, the GE has smaller standard errors for all estimates except that of γ2 in Table 2. Finally, Table 3 shows the estimates of the working correlation matrices using the GE for the AR(1) and the unstructured cases.
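For concreteness, the stacked design matrix implied by this description can be built as follows. The gender-specific age slopes (the γ parameters referred to with the tables) are an assumption about the exact parameterization, since the model display is not reproduced here:

```python
import numpy as np

ages = [8.0, 10.0, 12.0, 14.0]   # measurement occasions in years
n_boys, n_girls = 16, 11         # 27 subjects in total

rows = []
for i in range(n_boys + n_girls):
    boy = 1.0 if i < n_boys else 0.0
    girl = 1.0 - boy
    for age in ages:
        # columns: boy intercept, girl intercept, boy slope, girl slope
        rows.append([boy, girl, boy * age, girl * age])
X = np.array(rows)               # (27*4) x 4 stacked design matrix
```

Each subject contributes a 4 × 4 block, so the balanced layout gives 108 stacked rows in all.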
Table 1 Parameter estimates of the dental data using AR(1) structure.
Table 2 Parameter estimates of the dental data using unstructured correlation.
Table 3 Estimated correlation matrix using GE.
Notes
Peer review under responsibility of University of Bahrain.
References
- Al-Rawwash, M., 2001. Gaussian estimation and modeling covariance in longitudinal data analysis. Ph.D. thesis, Northern Illinois University, USA.
- Al-Rawwash, M., 2005. Covariance matrix estimation using repeated measurements when data are incomplete. Applied Mathematics and Computation 167, 299–315.
- Al-Rawwash, M., Pourahmadi, M., 2006. Gaussian estimation and joint modeling of dispersions and correlations in longitudinal data. Computer Methods and Programs in Biomedicine 82, 106–113.
- Buzkova, P., Lamley, T., 2008. Semiparametric log-linear regression for longitudinal measurements subject to outcome-dependent follow-up. Journal of Statistical Planning and Inference 138, 2450–2461.
- Chaganty, N.R., 1997. An alternative approach to the analysis of longitudinal data via generalized estimating equations. Journal of Statistical Planning and Inference 63, 39–54.
- Chaganty, N.R., Shults, J., 1999. On eliminating the asymptotic bias in the quasi-least squares estimate of the correlation parameter. Journal of Statistical Planning and Inference 76, 145–161.
- Crowder, M., 1995. On the use of a working correlation matrix in using generalized linear models for repeated measurements. Biometrika 82, 407–410.
- Crowder, M., 2001. On repeated measures analysis with misspecified covariance structure. Journal of the Royal Statistical Society B 63, 55–62.
- Diggle, P., Liang, K., Zeger, S., 1994. Analysis of Longitudinal Data. Oxford University Press, Oxford.
- Heyde, C.C., 1997. Quasi-Likelihood and Its Application: A General Approach to Optimal Parameter Estimation. Springer-Verlag, New York.
- Liang, K.Y., Zeger, S.L., 1986. Longitudinal data analysis using generalized linear models. Biometrika 73, 13–22.
- Lipsitz, S.R., Molenberghs, G., Fitzmaurice, G.M., Ibrahim, J., 2000. GEE with Gaussian estimation of the correlations when data are incomplete. Biometrics 56, 528–536.
- Potthoff, R.F., Roy, S.N., 1964. A generalized multivariate analysis of variance model useful especially for growth curve problems. Biometrika 51, 665–680.
- Pourahmadi, M., 2000. Maximum likelihood estimation of generalized linear multivariate normal covariance matrix. Biometrika 87, 425–435.
- Pourahmadi, M., Daniels, M., Park, T., 2007. Simultaneous modelling of the Cholesky decomposition of several covariance matrices. Journal of Multivariate Analysis 98, 568–587.
- Sen, P., Singer, J., 1993. Large Sample Methods in Statistics: An Introduction with Applications. Chapman and Hall, New York.
- Shults, J., Chaganty, N.R., 1998. Analysis of serially correlated data using quasi-least squares. Biometrics 54, 1622–1630.
- Vonesh, E.F., Chinchilli, V.M., 1997. Linear and Nonlinear Models for the Analysis of Repeated Measurements. Marcel Dekker, New York.
- Wang, Y., Carey, V., 2003. Working correlation structure misspecification, estimation and covariate design: implications for generalised estimating equations performance. Biometrika 90, 29–41.
- Wedderburn, R.W.M., 1974. Quasi-likelihood functions, generalized linear models, and the Gauss–Newton method. Biometrika 61, 439–447.
- Whittle, P., 1961. Gaussian estimation in stationary time series. Bulletin of the International Statistical Institute 39 (2), 105–112.
Appendix A
Consider the objective function

G(θ) = Σ_{i=1}^m vi(θ), vi(θ) = log|Σi| + (Yi − μi)′ Σi⁻¹ (Yi − μi),

where |Σi| = det(Σi), θ = (β, α)′, R(α) is the working correlation matrix, Σi = φ Ai^{1/2}(β) R(α) Ai^{1/2}(β), and β is a (p × 1) vector of unknown regression parameters while α is a (q × 1) vector of unknown correlation parameters. In this proof, we use arguments similar to those found in Sen and Singer (1993, p. 206).

Define εm(t, θ) by

εm(t, θ) = G(θ + m^{−1/2} t) − G(θ).

Under certain regularity conditions and using the Taylor series expansion of G around θ, for ∥t∥ ⩽ K, 0 < K < ∞, we can rewrite εm(t, θ) as

εm(t, θ) = m^{−1/2} t′ Σ_{i=1}^m ∇vi(θ) + (2m)⁻¹ t′ [ Σ_{i=1}^m ∇²vi(θ∗) ] t,

where θ∗ is a point on the line joining θ and θ + m^{−1/2} t.

Now, adding and subtracting the term (2m)⁻¹ t′[ Σ_{i=1}^m ∇²vi(θ) ]t, we get:

εm(t, θ) = m^{−1/2} t′ Σ_{i=1}^m ∇vi(θ) + (2m)⁻¹ t′ [ Σ_{i=1}^m ∇²vi(θ) ] t + ηm(t), (A.1)

where ηm(t) = (2m)⁻¹ t′ [ Σ_{i=1}^m ( ∇²vi(θ∗) − ∇²vi(θ) ) ] t.

Given δ > 0, there exists an integer m0 = m0(δ) such that ∥θ∗ − θ∥ ⩽ δ for all m ⩾ m0, since θ∗ lies on the line joining θ and θ + m^{−1/2}t. Therefore, for sufficiently large m, we have

|ηm(t)| ⩽ (K²/2) (1/m) Σ_{i=1}^m sup{ ∥∇²vi(θ′) − ∇²vi(θ)∥ : ∥θ′ − θ∥ ⩽ δ }.

Hence, using Khintchine's strong law of large numbers (Sen and Singer, 1993, p. 69) and assuming that E(∇²vi(θ)) is finite, the right-hand side converges almost surely to a limit that vanishes as δ → 0; therefore sup{ |ηm(t)|, ∥t∥ ⩽ K } → 0 almost surely as m → ∞.

Next, using the weak law of large numbers and Theorem 1(c), we get

(1/m) Σ_{i=1}^m ∇²vi(θ) → ξ(θ) in probability.

Therefore,

εm(t, θ) = m^{−1/2} t′ Σ_{i=1}^m ∇vi(θ) + (1/2) t′ ξ(θ) t + op(1).

In order to find the value of t that minimizes εm(t, θ), we differentiate with respect to t and disregard the op(1) term, which gives

t̂m = −ξ⁻¹(θ) m^{−1/2} Σ_{i=1}^m ∇vi(θ). (A.2)

Notice that, by the definition of εm(t, θ), the minimizer of εm(·, θ) corresponds closely to the Gaussian estimate of θ, which is attained at θ̂; therefore t̂m can be written as

t̂m = √m (θ̂ − θ). (A.3)

Combining Theorem 1, Equation (A.2), Equation (A.3) and the weak law of large numbers, we conclude that

√m (θ̂ − θ) + ξ⁻¹(θ) m^{−1/2} Σ_{i=1}^m ∇vi(θ) → 0 (A.4)

in probability as m → ∞.

The consistency of the Gaussian estimates follows easily from Equation (A.4). Finally, under the assumptions leading to (A.2)–(A.4) and Equation (6), the results in Theorem 1, the multivariate central limit theorem and Slutsky's theorem yield

√m (θ̂ − θ) converges in distribution to N(0, ξ⁻¹(θ) ψ(θ) ξ⁻¹(θ)), (A.5)

which completes the proof.