Original Article

Gaussian estimation of regression and correlation parameters in longitudinal data

Pages 28-34 | Published online: 27 Mar 2018

Abstract

The moment estimate of the correlation parameters using generalized estimating equations (GEE) is not guaranteed to exist or to be feasible. We introduce the Gaussian estimation method and show that the estimate of the correlation parameters in the longitudinal data setup is asymptotically unbiased and feasible. We derive the large sample properties of the regression and correlation estimates and illustrate these estimators using a real-life data example.

1 Introduction

Longitudinal or repeated measures data arise in several areas of research, including the social, environmental, medical and biological sciences, where many experimental units (humans, animals, plants, schools, etc.) are measured repeatedly over time. The broad goal of such studies is to assess changes over time within units and relate them to treatments, interventions and other relevant covariates using the idea of regression. The difficulty of exactly specifying the distribution of nonnormal data (binary, count, etc.) precludes the use of maximum likelihood for parameter estimation. However, the idea of quasi-likelihood (QL) for independent observations (Wedderburn, 1974; Heyde, 1997) requires only a model for the mean and the relationship between the mean and variance. The generalized estimating equations (GEE) method of Liang and Zeger (1986) extended the framework of QL estimation to dependent longitudinal data by specifying models for the marginal means and variances. They relied on a ‘working’ correlation matrix with a few parameters (say, α) to write an analog of the generalized least squares normal equations. Some pitfalls and limitations of the GEE estimator of α were pointed out by Crowder (2001) and other researchers. For example, the moment estimator of the correlation parameter may not exist in some simple cases, or it may not be feasible. Despite this list of drawbacks, researchers have maintained that GEE usually works well in practice, even under misspecified structures (Crowder, 2001). Several articles proposed alternative approaches based on minimizing objective functions, instead of solving ad hoc estimating equations, to obtain efficient estimators of β and α (Chaganty, 1997; Chaganty and Shults, 1999; Al-Rawwash, 2001; Crowder, 2001; Al-Rawwash and Pourahmadi, 2006).

Chaganty (1997) relied on partial minimization of the generalized least squares criterion to introduce the quasi-least squares (QLS) estimates, where the covariance matrix is a function of both the regression and correlation parameters (β, α). The estimating equation for β is precisely the same as in GEE, but the estimating equation for α is new and different from all previously developed approaches (see Diggle et al., 1994). Shults and Chaganty (1998) used the QLS method for certain correlation models and showed the importance of the QLS compared to the GEE in the unbalanced longitudinal data setup. The fact that the QLS estimator of the correlation parameter is asymptotically biased motivated Chaganty and Shults (1999) to propose a method to eliminate this problem. They proposed using a continuous, one-to-one transformation that depends on the working correlation matrix under the QLS scheme. They presented the method for different correlation structures and obtained a bias-corrected estimate of the correlation matrix. Pourahmadi (2000) developed the maximum likelihood estimators for the parameters of a generalized linear model for the covariance matrix and discussed the asymptotic results under the normality condition. Al-Rawwash (2005) introduced a nonparametric approach to estimate the parameters of interest as an alternative to the classical parametric techniques. Al-Rawwash and Pourahmadi (2006) introduced the generalized Gaussian likelihood function as their objective function and used a data-driven approach to obtain estimates of the parameters under minimal distributional requirements. In this paper, we follow the footsteps of Al-Rawwash and Pourahmadi (2006) and use the generalized Gaussian likelihood function as the objective function to estimate the regression and correlation parameters.

The history of Gaussian estimation (GE) goes back to Whittle (1961) in the time series literature, where the parameter estimates are the values that maximize the Gaussian likelihood function for certain correlation structures. Several articles discussed the GE in the longitudinal data literature, focusing on obtaining parameter estimates by minimizing an objective function even when the observations are not normally distributed (Crowder, 1995; Crowder, 2001; Al-Rawwash, 2001; Al-Rawwash and Pourahmadi, 2006). In this article, we assume only knowledge of the first two moments and not the exact specification of the distribution of the data. The notion of Gaussian estimation is distinct from GEE in that it minimizes an objective function rather than relying on estimating equations that might produce flawed estimators (Chaganty, 1997; Al-Rawwash, 2001; Crowder, 2001). Several articles discussed the idea of modeling longitudinal data sets, including but not limited to Vonesh and Chinchilli (1997), Lipsitz et al. (2000), Wang and Carey (2003), Pourahmadi et al. (2007) and Buzkova and Lumley (2008).

The organization of the paper is as follows. In Section 2, we introduce the Gaussian estimation and outline a Newton-Raphson iterative method for computing the estimates. Closed form solutions for the estimates of the correlation parameter α are given in Section 3 when the correlation structure is AR(1). In Section 4, we establish the consistency and asymptotic normality of the regression and correlation parameter estimates under mild regularity conditions. In Section 5, we provide a real-life data analysis comparing the GE to other well-known estimation procedures. Finally, we outline the proofs of some results in the Appendix.

2 The Gaussian estimation

To set the notation, we consider a random vector Y = (y1, … , yn)′ comprising the measurements taken on a generic subject at times t1, … , tn and the associated covariates xj = (xj1, … ,xjp)′, j = 1, 2, … , ni, i = 1,2, ⋯ ,m. We make no distributional assumption about Y except for its first two moments. Firstly, the means and variances of the entries of the response are related to the covariates xj via g(μj) = xj′β, var(yj) = φv(μj), where β = (β1, … , βp)′ is the vector of regression parameters of primary interest, g is an invertible known link function, v(·) is a known variance function and φ is a dispersion parameter not depending on β. The importance of the parameter φ appears clearly when we handle nonnormal data, for it expresses the variability beyond the mean. The setup may be carried out for unbalanced longitudinal data; however, in this article we focus on the balanced case where ni = n. Secondly, we use the matrix R = R(α) to model the correlations among the measurements on the same subject. The matrix R and its parameters α are usually viewed as nuisance quantities, though in some situations they are of primary interest. Now, if we define μ = μ(β) = (μ1, … ,μn)′ and A(β) = diag(v(μ1), … ,v(μn)), then the covariance matrix of Y is decomposed into Σ = Σ(β,α) = φ A^{1/2}(β) R(α) A^{1/2}(β).
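The decomposition above can be sketched numerically. In the snippet below, the identity variance function and the AR(1) form of the working correlation are illustrative placeholder choices, not the paper's:

```python
import numpy as np

def covariance_decomposition(mu, rho, phi, v=lambda m: m):
    """Sigma = phi * A^{1/2} R A^{1/2} with A = diag(v(mu_1), ..., v(mu_n)).
    The variance function v (identity here) and the AR(1) form of R are
    placeholder choices for illustration."""
    mu = np.asarray(mu, dtype=float)
    idx = np.arange(mu.size)
    A_half = np.diag(np.sqrt(v(mu)))                       # A^{1/2}
    R = rho ** np.abs(np.subtract.outer(idx, idx))         # AR(1) correlation
    return phi * A_half @ R @ A_half

Sigma = covariance_decomposition([2.0, 3.0, 4.0], rho=0.5, phi=1.5)
```

The resulting matrix is symmetric positive definite for |ρ| < 1 and positive variances, with diagonal entries φ v(μj).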

In the longitudinal data setup, we assume Yi, i = 1,2, ⋯ ,m, to be the vector of repeated measures on the ith subject with covariance Σi = Σi(β,α) = φ Ai^{1/2}(β) R(α) Ai^{1/2}(β). The Gaussian estimates (Al-Rawwash and Pourahmadi, 2006) of (β, α) are the minimizers of the objective function

G(β,α) = Σ_{i=1}^{m} [log∣Σi∣ + (Yi − μi)′ Σi^{-1} (Yi − μi)] = Σ_{i=1}^{m} [log∣Σi∣ + φ^{-1} Zi′ R^{-1}(α) Zi],   (1)

where Zi = Ai^{-1/2}(β)(Yi − μi(β)) is the vector of partially standardized measurements on the ith subject. Even if G(·,·) is differentiable, its differentiation with respect to β is complicated because of the appearance of β in Ai(β). This problem is avoided by pretending that the β appearing in Ai(β) is a separate parameter β* (sometimes called decoupling). Therefore, if we differentiate (1) with respect to β and α and then substitute β for β*, we arrive at

Σ_{i=1}^{m} Di′ Σi^{-1} (Yi − μi) = 0,   (2)

Σ_{i=1}^{m} [tr(R^{-1} ∂R/∂αk) − φ^{-1} Zi′ R^{-1} (∂R/∂αk) R^{-1} Zi] = 0,  k = 1, ⋯ , q,   (3)

where Di = ∂μi/∂β′, D(β) = diag(D1,D2, ⋯ ,Dm) and A(β) = diag(A1(β),A2(β), ⋯ , Am(β)). The estimating Eqs. (2) are precisely the GEEs for β, but Eq. (3) is different from the estimating equations developed in the literature to estimate the correlation parameter. Solving these two equations for β and α gives the GE of (β,α). Since a closed form solution is not available in general, we suggest the following Newton-Raphson iterative algorithm. Firstly, we choose an initial value β^(0) for β and evaluate the quantities Di, Ai, Zi and Σi at β^(0). Accordingly, we solve Eq. (3) for the corresponding estimate of α, then we recompute Ai and Σi for 1 ⩽ i ⩽ m and construct the covariance matrices as well as D and A. Hence, we update the value of β via

β^(k+1) = β^(k) + [Σ_{i=1}^{m} Di′ Σi^{-1} Di]^{-1} Σ_{i=1}^{m} Di′ Σi^{-1} (Yi − μi).

Finally, we repeat the process until successive values of β differ by less than a prescribed tolerance, and we take the final β^(k) as the estimate of β and the corresponding solution of (3) as the estimate of α. The least squares estimate of β is a good candidate for the initial value, while initial estimates of α must be chosen so that positive definiteness of the correlation matrix is guaranteed. The estimate of φ depends on the GE results for β and α obtained from the Newton-Raphson iterations and reduces to

φ̂ = (mn)^{-1} Σ_{i=1}^{m} Zi′ R^{-1}(α̂) Zi.

The unbiasedness of the GE of (β,α) is not guaranteed when the working correlation structure is misspecified. However, we show in Section 4 that the GE of β and α is consistent when the working correlation matrix is correctly specified.
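As a rough numerical illustration of the alternating scheme, the sketch below assumes an identity link with v(μ) ≡ 1 (so Σi = φR(ρ)) and an AR(1) working correlation, and replaces the closed-form update for α by a grid minimization of the Gaussian objective; the function name and simulation setup are ours, not the paper's:

```python
import numpy as np

def gaussian_estimate(Y, X, tol=1e-8, max_iter=100):
    """Alternating sketch of Gaussian estimation: Y is (m, n) responses,
    X is (m, n, p) covariates.  beta is updated by generalized least
    squares (estimating equation (2)); rho is updated by minimizing the
    Gaussian objective (1) over a grid, standing in for the root of
    estimating equation (3)."""
    m, n = Y.shape
    # ordinary least squares start for beta
    beta = np.linalg.lstsq(X.reshape(m * n, -1), Y.ravel(), rcond=None)[0]
    J = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    rho_grid = np.linspace(-0.95, 0.95, 191)
    for _ in range(max_iter):
        Z = Y - X @ beta                     # residuals, shape (m, n)

        def G(rho):                          # objective (1) with phi profiled out
            R = rho ** J
            Rinv = np.linalg.inv(R)
            phi = np.einsum('ij,jk,ik->', Z, Rinv, Z) / (m * n)
            return m * (n * np.log(phi) + np.log(np.linalg.det(R))) + m * n

        rho = rho_grid[np.argmin([G(r) for r in rho_grid])]
        Rinv = np.linalg.inv(rho ** J)
        # GLS update of beta (estimating equation (2))
        bread = sum(X[i].T @ Rinv @ X[i] for i in range(m))
        rhs = sum(X[i].T @ Rinv @ Y[i] for i in range(m))
        beta_new = np.linalg.solve(bread, rhs)
        if np.linalg.norm(beta_new - beta) < tol:
            beta = beta_new
            break
        beta = beta_new
    Z = Y - X @ beta
    phi = np.einsum('ij,jk,ik->', Z, Rinv, Z) / (m * n)   # dispersion estimate
    return beta, rho, phi
```

On data simulated from a correctly specified AR(1) model, this recovers the regression coefficients, the correlation parameter and the dispersion to within sampling error.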

3 Specific correlation structures

In this section, we present closed form solutions for the correlation parameter of the AR(1) correlation structure when the correlation matrix is either misspecified or correctly specified. These can be used in the Newton–Raphson algorithm and hence avoid an iterative cycle within this algorithm. The impact of correlation matrix misspecification on the degree of the estimating equations and the form of their solutions is illuminated. Chaganty (1997) and Chaganty and Shults (1999) considered special correlation structures and derived closed form solutions for the estimator of the correlation parameter using the QLS and the C-QLS. The asymptotic bias of the QLS estimator of the correlation parameter is corrected by Chaganty and Shults (1999) using a continuous one-to-one function that transforms the biased correlation matrix to an asymptotically unbiased one. We show that the GE of α is asymptotically unbiased and feasible when the number of measurements per subject n is greater than 3, or when a slightly modified (conditional) form of the Gaussian estimation is used.

In the following example, we assume that the true correlation matrix is unstructured, and we give explicit formulas for the estimate of α by solving Eq. (3) for the AR(1) ‘working’ correlation matrix.

Example 1.

Let the working correlation structure of the repeated measurements be the AR(1) matrix R(ρ) = (ρ^∣j−k∣) with parameter α = ρ ∈ (−1, 1). Note that

(1 − ρ²) R^{-1}(ρ) = I − ρ C1 + ρ² C2,

where C1 is the tridiagonal matrix with 0 on the diagonal and 1 on the first upper and lower diagonals, and C2 = diag(0,1, ⋯ ,1,0). Since the entries of (1 − ρ²)R^{-1} are quadratic functions of ρ, the estimating Eq. (3) reduces to a quadratic equation of the form

am ρ² − 2 bm ρ + am = 0,

where the coefficients am and bm are quadratic forms in the standardized residuals Zi defined in (1), evaluated at the current estimate of β. Since the product of the roots of this quadratic equals one, exactly one root lies in (−1, 1) whenever ∣bm∣ > ∣am∣, and this root is the (unique) feasible solution. This is in sharp contrast to the QLS estimate of the correlation parameter, which is known to be feasible all the time (Chaganty, 1997). The quadratic equations for the GE and QLS estimates of α have the same form and differ only in the second terms of the coefficients am and bm. Nevertheless, there are datasets for which ∣bm∣ < ∣am∣ and the quadratic equation has complex roots.
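The tridiagonal identity for the AR(1) inverse quoted above is easy to verify numerically:

```python
import numpy as np

n, rho = 5, 0.4
idx = np.arange(n)
R = rho ** np.abs(np.subtract.outer(idx, idx))           # AR(1) correlation
C1 = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
C2 = np.diag([0.0] + [1.0] * (n - 2) + [0.0])
lhs = (1 - rho ** 2) * np.linalg.inv(R)
rhs = np.eye(n) - rho * C1 + rho ** 2 * C2               # I - rho*C1 + rho^2*C2
```

The two sides agree to machine precision for any |ρ| < 1.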

It is important to mention that the working and the true correlation matrices may have the same AR(1) structure, which consequently makes corr(Yi) correctly specified. In this case, the estimating Eq. (3) is cubic in ρ; therefore, we derive closed formulas for its solutions and study their feasibility.

Example 2

Let both the working and true correlation matrices have AR(1) structures, with R(ρ) and R^{-1}(ρ) as given in Example 1. Then, Eq. (3) reduces to the cubic equation

f(ρ) = ρ³ − a1 ρ² + b1 ρ − c1 = 0,   (4)

where the coefficients a1, b1 and c1 are functions of the standardized residuals Zi defined in (1). To solve the cubic Eq. (4), we follow the standard Cardano procedure: we reduce (4) to a depressed cubic, compute its discriminant Δ, and for later use define the quantities u, v and γ1, where the upper sign in γ1 is used if v is positive and the lower sign if v is negative. The nature and number of feasible roots of Eq. (4) depend on the signs of Δ and u, as discussed below:

Case 1:

If Δ > 0 and u > 0, then Eq. (4) has only one real root, which is feasible and is given by the Cardano formula in terms of u, v and γ1.

Case 2:

If Δ < 0 and u < 0, then Eq. (4) has three real roots, expressed in terms of γ, where the upper sign in γ is used if v is positive and the lower sign otherwise. All three roots are real, but the number of feasible roots depends on the data and cannot be decided in advance.

Case 3:

If Δ = 0 and u < 0, then Eq. (4) has a unique feasible closed form solution, in which the upper sign is used if v is positive and the lower sign otherwise.
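The trichotomy above can also be checked numerically by extracting the real roots of the cubic and keeping those in the feasible interval. The monic form ρ³ − a1ρ² + b1ρ − c1 is inferred from the derivative f′(ρ) = 3ρ² − 2a1ρ + b1 quoted in Section 3.1; the name c1 for the constant term is ours:

```python
import numpy as np

def feasible_roots(a1, b1, c1):
    """Real roots of f(rho) = rho^3 - a1*rho^2 + b1*rho - c1 lying in the
    feasible interval (-1, 1).  The sign convention for the constant term
    c1 is an assumption."""
    roots = np.roots([1.0, -a1, b1, -c1])
    real = roots[np.abs(roots.imag) < 1e-9].real
    return sorted(r for r in real if -1.0 < r < 1.0)
```

For example, a cubic with one real and two complex roots yields a single feasible value, while a cubic with three real roots in (−1, 1) yields all three, mirroring Cases 1 and 2.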

3.1 Unique Feasible Solutions

In this section, we show that when the number of subjects m is large, or when a modified (conditional) Gaussian estimate is used, a unique feasible correlation parameter estimate satisfying Eq. (4) exists.

A number of simulations in Al-Rawwash (2001), corresponding to different values of the triplet (m,n,ρ), revealed that the cubic Eq. (4) has a unique feasible solution for m ⩾ 3. This provided a clue that for large m, Eq. (4) must have a unique feasible solution in (−1,1), which we show next. From Eq. (4) and the definitions of a1 and b1, it follows that f(−1) < 0 and f(1) > 0. Thus, if we show that f(ρ) is monotone on (−1,1) as m → ∞, then the existence of a unique feasible root follows. A sufficient condition for f(ρ) to be monotone is that its derivative f′(ρ) = 3ρ² − 2a1ρ + b1 has at most one real root, which happens if and only if its discriminant 4a1² − 12b1 is nonpositive. Now, applying the law of large numbers (as m → ∞) to a1 and b1 shows that the limiting value of this discriminant is nonpositive if and only if n ⩾ 4.
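The sufficient condition used in this argument (a sign change of f on (−1, 1) together with a nonpositive discriminant of f′) can be encoded directly; as before, the constant-term name c1 is a placeholder:

```python
def has_unique_feasible_root(a1, b1, c1):
    """Sufficient condition of Section 3.1 for f(rho) = rho^3 - a1*rho^2 +
    b1*rho - c1: the derivative f'(rho) = 3*rho^2 - 2*a1*rho + b1 has
    nonpositive discriminant 4*a1**2 - 12*b1 (so f is monotone) and f
    changes sign on (-1, 1); the feasible root is then unique."""
    f = lambda r: r ** 3 - a1 * r ** 2 + b1 * r - c1
    return (4 * a1 ** 2 - 12 * b1 <= 0) and f(-1.0) < 0.0 < f(1.0)
```

For instance, f(ρ) = ρ³ + ρ satisfies the condition, while f(ρ) = ρ³ − 3ρ does not, since its derivative has two real roots.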

Another situation that gives rise to a unique feasible correlation parameter estimate for the AR(1) ’working’ correlation structure is obtained by replacing the objective function Equation(1) by that corresponding to the conditional Gaussian likelihood function.

4 Large Sample Properties

This section presents the large sample properties of the Gaussian estimates of β and α. We establish their consistency and asymptotic normality as m → ∞ under certain regularity conditions. Chaganty (1997) showed that the QLS estimate of β is consistent, that the QLS estimate of α is asymptotically biased, and that the joint distribution of the estimates is asymptotically normal. A corrected version of the QLS that eliminates the bias problem was proposed by Chaganty and Shults (1999), who considered Markov and generalized Markov structures to model the unbalanced correlation matrix cases. Thus, compared to the QLS estimate of α, our proposed GE of the correlation parameter is asymptotically unbiased, and we compare our results to the QLS as well as the C-QLS estimates. Our proofs are in the spirit of Chaganty (1997) and Sen and Singer (1993, p. 206). Crowder (2001) presented an alternative proof establishing consistency of the GE, but asymptotic normality only of the estimate of β.

We start by introducing some preliminary facts and notation about the components of (1). Let Yi = (yi1,yi2, ⋯ , yin)′, i = 1,2, ⋯ ,m, be independent random vectors with E(Yi) = μi(β) and cov(Yi) = Σi(β,α), where the expectation and the covariance are taken under the true probability distribution. Also, let R(α) denote the working correlation matrix, to be distinguished from the true correlation matrix.

Theorem 1

Let θ = (β,α)′, and set

vi(θ) = log∣Σi(θ)∣ + (Yi − μi(β))′ Σi^{-1}(θ) (Yi − μi(β)),   (5)

so that the Gaussian objective (1) is G(θ) = Σ_{i=1}^{m} vi(θ).

If ∇vi(θ) and ∇2vi(θ) exist everywhere and satisfy 0 < E(∇2vi(θ)) < ∞, then

(a)

E(∇vi(θ)) = 0,

(b)

cov(∇vi(θ)) = ψi(θ),

(c)

E(∇2vi(θ)) = ξi(θ),

where the matrices ψi(θ) and ξi(θ) are partitioned according to θ = (β,α) and their entries involve the first four moments of Yi. Assume that, as m → ∞, the averages of ψi(θ) and ξi(θ) converge to limiting matrices ψ(θ) and ξ(θ), so that condition (6) holds. In the expressions for these limits, ej is the (p × 1) column vector with one in the jth row and zeros elsewhere, the true correlation structure appears in place of the working one where appropriate, and ⊗ is the Kronecker product. It is assumed throughout that the components of ψ(θ) and ξ(θ) are finite.

Next, we present our results on the large sample properties of the GE of (β,α). The regularity conditions (6) needed to establish consistency and asymptotic normality are similar to those used in the multivariate central limit theorem for independent but not identically distributed random vectors.

Theorem 2

Let θ = (β,α), assume that (6) holds, and assume that the remainder in the expansion of ∇²vi(θ) vanishes as δ → 0. Then, the Gaussian estimate of (β,α) is consistent and asymptotically normal. Theorem 2 establishes the consistency and asymptotic normality of the GE of (β,α); we outline the proof in the Appendix.

Theorem 3

The marginal distributions of the Gaussian estimates of β and α are each asymptotically normal, and the two estimates are asymptotically correlated.

In view of the previous theorems, we notice that if R(α) is correctly specified, i.e. equal to the true correlation matrix, then ψi11(θ) = 2ξi11(θ) and ψ11(θ) = 2ξ11(θ), and the asymptotic covariance of the estimate of β simplifies accordingly. Also, the asymptotic distribution of the estimate of β depends on the parameters β, α and φ, while the asymptotic distribution of the estimate of α depends on α but not on β. It also depends on the third and fourth moments of Yi, which may cause a loss of efficiency if these moments are misspecified. The sandwich-type covariance estimator protects against the impact of misspecification of R(α); however, we may lose some efficiency when using this estimator.
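The sandwich-type covariance estimator mentioned above has the standard GEE bread-and-meat form B⁻¹MB⁻¹; a minimal sketch follows, where the per-subject inputs are assumed to be already evaluated at the parameter estimates:

```python
import numpy as np

def sandwich_cov(D_list, Sigma_list, resid_list):
    """Model-robust (sandwich) covariance estimate for the regression
    estimate: B^{-1} M B^{-1}, with bread B = sum_i D_i' Sigma_i^{-1} D_i
    and meat M = sum_i D_i' Sigma_i^{-1} r_i r_i' Sigma_i^{-1} D_i, where
    r_i = Y_i - mu_i.  This is the standard GEE-type form, sketched for
    illustration."""
    B = 0.0
    M = 0.0
    for D, S, r in zip(D_list, Sigma_list, resid_list):
        Sinv = np.linalg.inv(S)
        B = B + D.T @ Sinv @ D
        M = M + D.T @ Sinv @ np.outer(r, r) @ Sinv @ D
    Binv = np.linalg.inv(B)
    return Binv @ M @ Binv

# toy check: one subject, p = 1, identity working covariance
V = sandwich_cov([np.array([[1.0], [1.0]])],
                 [np.eye(2)],
                 [np.array([1.0, 1.0])])
```

When the working covariance is correct, B and M estimate proportional quantities and the sandwich reduces to the model-based covariance.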

5 Data analysis

Consider the dataset from Potthoff and Roy (1964) that consists of 27 subjects in a dental study (16 boys and 11 girls). The response yij is the distance, in mm, from the center of each subject's pituitary to the pterygomaxillary fissure, recorded at ages 8, 10, 12 and 14. We treat the data set as normally distributed with mean vector μ and covariance matrix Σ. Also, we consider the models used by Chaganty and Shults (1999), analyze the data using the GE, and compare the results with those in Chaganty and Shults (1999). First, we fit a model with gender-specific intercepts and age slopes,

μij = β1 xi1 + β2 xi2 + γ1 xi1 xij3 + γ2 xi2 xij3,

where xi1, xi2 are indicator variables for the two genders, boys and girls, respectively, and the covariate xij3 is the subject's age at the jth measurement time. The estimates of (β,α,φ) are computed using the Newton-Raphson iterative method with the working correlation structure taken to be:

1.

AR(1) structure,

2.

An unstructured correlation matrix.
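For concreteness, here is a hypothetical design-matrix construction for a model of this form (gender-specific intercepts and age slopes); the exact parameterization of the fitted model is our assumption, not taken verbatim from the paper:

```python
import numpy as np

ages = np.array([8.0, 10.0, 12.0, 14.0])

def design_block(is_boy):
    """4 x 4 design block for one subject: columns are the boy indicator,
    the girl indicator, and the two gender-specific age terms."""
    x1, x2 = (1.0, 0.0) if is_boy else (0.0, 1.0)
    return np.column_stack([np.full(4, x1), np.full(4, x2),
                            x1 * ages, x2 * ages])

# stacked design for the 16 boys followed by the 11 girls
X = np.vstack([design_block(True)] * 16 + [design_block(False)] * 11)
```

The stacked matrix has 27 × 4 = 108 rows and one column per parameter (β1, β2, γ1, γ2).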

Table 1 contains the estimates and the standard errors of the parameters using the GEE, C-QLS and the GE with the AR(1) working correlation structure. Similarly, Table 2 contains the estimates when the working correlation matrix is unstructured. The standard errors in both tables were computed using the model-robust, sandwich-type estimator. From Tables 1 and 2, it can be seen that the estimates of the regression parameters are in reasonable agreement, but the standard errors differ slightly. Overall, the GE and C-QLS have smaller standard errors than the GEE; moreover, the GE has smaller standard errors for all estimates except the estimate of γ2 in Table 2. Finally, Table 3 shows the estimates of the working correlation matrices using the GE for the AR(1) and the unstructured cases.

Table 1 Parameter estimates of the dental data using AR(1) structure.

Table 2 Parameter estimates of the dental data using unstructured correlation.

Table 3 Estimated correlation matrix using GE.

Notes

Peer review under responsibility of University of Bahrain.

References

  • Al-Rawwash, M., 2001. Gaussian estimation and modeling covariance in longitudinal data analysis. Ph.D. thesis, Northern Illinois University, USA.
  • Al-Rawwash, M., 2005. Covariance matrix estimation using repeated measurements when data are incomplete. Applied Mathematics and Computation 167, 299–315.
  • Al-Rawwash, M., Pourahmadi, M., 2006. Gaussian estimation and joint modeling of dispersions and correlations in longitudinal data. Computer Methods and Programs in Biomedicine 82, 106–113.
  • Buzkova, P., Lumley, T., 2008. Semiparametric log-linear regression for longitudinal measurements subject to outcome-dependent follow-up. Journal of Statistical Planning and Inference 138, 2450–2461.
  • Chaganty, N.R., 1997. An alternative approach to the analysis of longitudinal data via generalized estimating equations. Journal of Statistical Planning and Inference 63, 39–54.
  • Chaganty, N.R., Shults, J., 1999. On eliminating the asymptotic bias in the quasi-least squares estimate of the correlation parameter. Journal of Statistical Planning and Inference 76, 145–161.
  • Crowder, M., 1995. On the use of a working correlation matrix in using generalized linear models for repeated measurements. Biometrika 82, 407–410.
  • Crowder, M., 2001. On repeated measures analysis with misspecified covariance structure. Journal of the Royal Statistical Society B 63, 55–62.
  • Diggle, P., Liang, K., Zeger, S., 1994. Analysis of Longitudinal Data. Oxford University Press, Oxford.
  • Heyde, C.C., 1997. Quasi-Likelihood and Its Application: A General Approach to Optimal Parameter Estimation. Springer-Verlag, New York.
  • Liang, K.Y., Zeger, S.L., 1986. Longitudinal data analysis using generalized linear models. Biometrika 73, 13–22.
  • Lipsitz, S.R., Molenberghs, G., Fitzmaurice, G.M., Ibrahim, J., 2000. GEE with Gaussian estimation of the correlations when data are incomplete. Biometrics 56, 528–536.
  • Potthoff, R.F., Roy, S.N., 1964. A generalized multivariate analysis of variance model useful especially for growth curve problems. Biometrika 51, 665–680.
  • Pourahmadi, M., 2000. Maximum likelihood estimation of generalised linear models for multivariate normal covariance matrix. Biometrika 87, 425–435.
  • Pourahmadi, M., Daniels, M., Park, T., 2007. Simultaneous modelling of the Cholesky decomposition of several covariance matrices. Journal of Multivariate Analysis 98, 568–587.
  • Sen, P., Singer, J., 1993. Large Sample Methods in Statistics: An Introduction with Applications. Chapman and Hall, New York.
  • Shults, J., Chaganty, N.R., 1998. Analysis of serially correlated data using quasi-least squares. Biometrics 54, 1622–1630.
  • Vonesh, E.F., Chinchilli, V.M., 1997. Linear and Nonlinear Models for the Analysis of Repeated Measurements. Marcel Dekker, New York.
  • Wang, Y., Carey, V., 2003. Working correlation structure misspecification, estimation and covariate design: implications for generalised estimating equations performance. Biometrika 90, 29–41.
  • Wedderburn, R.W.M., 1974. Quasi-likelihood functions, generalized linear models, and the Gauss–Newton method. Biometrika 61, 439–447.
  • Whittle, P., 1961. Gaussian estimation in stationary time series. Bulletin of the International Statistical Institute 39 (livraison 2), 105–112.

Appendix A

Consider the objective function G(θ) = Σ_{i=1}^{m} vi(θ) with vi(θ) = log∣Σi∣ + (Yi − μi)′ Σi^{-1} (Yi − μi), where ∣Σi∣ = det(Σi), θ = (β,α)′, R(α) is the working correlation matrix, and β is a (p × 1) vector of unknown regression parameters while α is a (q × 1) vector of unknown correlation parameters. In this proof, we use arguments similar to those found in Sen and Singer (1993, p. 206).

Define εm(t,θ) = Σ_{i=1}^{m} [vi(θ + m^{-1/2} t) − vi(θ)].

Under certain regularity conditions, and using a Taylor series expansion of εm(t,θ) around θ for ∥t∥ ⩽ K, 0 < K < ∞, we can rewrite εm(t,θ) as

εm(t,θ) = m^{-1/2} t′ Σ_{i=1}^{m} ∇vi(θ) + (2m)^{-1} t′ [Σ_{i=1}^{m} ∇²vi(θ*)] t,

where θ* is a point on the line joining θ and θ + m^{-1/2} t.

Now, adding and subtracting the term (2m)^{-1} t′ [Σ_{i=1}^{m} ∇²vi(θ)] t, we get (A.1), where ηm(t) denotes the resulting remainder. Given δ > 0, there exists an integer m0 = m0(δ) such that ∥θ* − θ∥ ⩽ δ for all m ⩾ m0. Therefore, for sufficiently large m, the remainder is bounded by the oscillation of ∇²vi over the δ-neighborhood of θ.

Hence, using Khintchine's strong law of large numbers (Sen and Singer, 1993, p. 69) and assuming that E(∇²vi(θ)) is finite, this bound vanishes almost surely as δ → 0; therefore sup{∣ηm(t)∣ : ∥t∥ ⩽ K} → 0 almost surely as m → ∞.

This holds since θ* lies on the line joining θ and θ + m^{-1/2} t. Hence, using the weak law of large numbers, the quadratic term converges in probability and εm(t,θ) takes the asymptotic quadratic form used below.

To find the value of t that minimizes εm(t,θ), we differentiate εm(t,θ) with respect to t and disregard the op(1) term, which yields (A.2). Notice that, by the definition of εm(t,θ), its minimizer corresponds to the Gaussian estimate of θ and is attained at t̂m = √m(θ̂m − θ); therefore t̂m can be written as in (A.3). But then, using Theorem 1, (A.2), (A.3) and the weak law of large numbers, we conclude (A.4) in probability as m → ∞.

The consistency of the Gaussian estimates follows easily from (A.4). Finally, using (A.2)–(A.4) and condition (6), the results in Theorem 1, the multivariate central limit theorem and Slutsky's theorem, we conclude (A.5), which completes the proof.
