
Dealing with covariate measurement error in a clustered cross-sectional survey

Alexander K. Muoka, Henry Mwambi, George O. Agogo & Oscar Ngesa
Article: 1945743 | Received 13 Feb 2021, Accepted 15 Jun 2021, Published online: 09 Jul 2021

Abstract

Many surveys are complex cross-sectional studies that involve clustered data. Such surveys can be further complicated by measurement error in the covariates. Ignoring the measurement error and the clustering may lead to incorrect inferences and conclusions. The purpose of this study was to demonstrate the application of regression calibration to correct for covariate measurement error in a clustered cross-sectional survey in a generalized estimating equations (GEE) framework. Methods that ignore both covariate measurement error and within-cluster correlation structure are compared to the proposed regression calibration-GEE method. The study found that clustering does not affect the association estimates adjusted for measurement error using regression calibration. However, the standard errors of the coefficient estimates are overestimated or underestimated in methods that ignore the within-cluster dependency despite adjusting for measurement error. Specifically, for clusters of size 10 and under unstructured and exchangeable correlation structures, the standard error was about 10.3% higher and 13.6% lower, respectively, in the method that ignores the within-cluster dependency than in the proposed method. From the findings of this study, we conclude that it is important to adjust for covariate measurement error in clustered data, while accounting for the within-cluster correlation.

PUBLIC INTEREST STATEMENT

Cross-sectional surveys are widely used to collect data from the population of interest. Features such as stratification and sampling weights form a critical part in designing surveys. Data collected from surveys are prone to measurement error. Measurement error in covariates/exposures is often ignored in statistical analyses, despite its adverse effects on the results. This study provides insights on how to model the association between an outcome and a covariate, while adjusting for measurement error in the covariate and addressing the within-cluster dependencies in clustered cross-sectional data. We hope that the findings of this study will positively impact how the public handles data from cross-sectional surveys. This will help present correct inferences from statistical analyses of survey data, advance science faster, and benefit society.

1. Introduction

Many surveys are complex in design and cross-sectional in nature. These surveys make use of data collection tools that are prone to measurement error, for instance, self-reported questionnaires. Measurement error (ME) in exposures (or covariates) biases the association between the covariate and an outcome. The bias can be in any direction depending on the error structure (Agogo, 2017; Fosgate, 2006; Fuller, 2009; Hill & Kleinbaum, 2014; Stefanski, 1985). Study designs range from simple to complicated. In many cross-sectional surveys with complex study design features, data within clusters are usually correlated (Akter et al., 2018; Hanley et al., 2003; Liang & Zeger, 1993; Neuhaus et al., 1991; Santos et al., 2008). Analysis of such data using standard methods that ignore covariate ME and clustering may lead to invalid inferences and conclusions. Regression calibration (RC) is a popular technique for adjusting for ME in a continuous covariate. Regression calibration replaces the error-prone covariate with the conditional expectation of the true covariate, given the measured covariate and a vector of error-free covariates (Agogo et al., 2014; Carroll et al., 2006; Carroll & Stefanski, 1990; Freedman et al., 2008; Gleser, 1990). In a clustered survey, the generalized estimating equations (GEE) approach is commonly used to account for the within-cluster dependencies, while estimating the association parameter of interest (Hanley et al., 2003; Liang & Zeger, 1986; Zeger & Liang, 1986).

Currently, there is limited research focusing on correcting for covariate ME, while accounting for survey design simultaneously in cross-sectional surveys. In this work, we demonstrated how to apply RC in a GEE context to correct for covariate ME while accounting for within-cluster correlation. We re-emphasize the need to correct for ME in a covariate and simultaneously allow for correlation structure in clustered data.

The other sections of this paper are organized as follows: In section 2, we present the methods and materials for this study. Specifically, in section 2, we review the RC method and GEE approach, describe the simulation design and provide a real-data example. Simulation and real data results are presented in section 3. Section 4 provides a discussion and concluding remarks.

2. Methods and materials

2.1. Regression calibration method

Usually, in epidemiological studies, it is impossible to observe the true covariate of interest, X. Instead, we observe a mismeasured covariate, Q. Regression calibration was first proposed by Carroll and Stefanski (1990) and Gleser (1990) as a method for correcting ME in the covariates. Regression calibration involves approximation of the conditional expectation of the true covariate given the mismeasured covariate and a vector of error-free covariates (Freedman et al., 2008; Guolo, 2008; Küchenhoff & Carroll, 1997). The basic idea of RC is to replace X, which is unobservable, with an estimate $\hat{Q}^{\text{calib}}$, a function of the error-prone covariate Q and a vector of error-free covariates Z. Regression calibration is applicable under the assumptions that: (i) the measurement error in the observed covariate Q is non-differential given X and the vector of error-free covariates, that is, the measured covariate contains no extra information about the outcome other than what is contained in the true covariate (Carroll et al., 2006), and (ii) the measurement error in the unbiased measurement, say, R of the true covariate X is uncorrelated with the measurement error in the observed covariate Q and with the true covariate, X. Note that R is a reference measurement from the calibration study.

Regression calibration is implemented in two main steps:

Step 1. Estimating the calibration function. This involves estimating the conditional expectation of X given Q and Z, denoted by

(1) $E[X \mid Q, Z] = \hat{Q}^{\text{calib}}$,

where $\hat{Q}^{\text{calib}}$ is the calibrated version of Q. In the calibration function in equation (1), the unobservable true covariate X is replaced with R, which can be obtained from validation, replication or instrumental data. Therefore, equation (1) can be re-expressed as

(2) $E[R \mid Q, Z] = \hat{Q}^{\text{calib}}$.

Step 2. Using $\hat{Q}^{\text{calib}}$ instead of Q in the standard analysis to obtain the parameter estimate that quantifies the association between the outcome and the covariate of interest given the error-free covariates.
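Step 1 can be illustrated with a minimal sketch in Python. This is not the authors' code (the study used R); it assumes a single error-prone covariate, omits the error-free covariates Z for brevity, and reads the variances from the simulation design in section 2.3.1. The function names `fit_simple_ols` and `calibrate` are hypothetical.

```python
import random

def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def calibrate(q, r):
    """Step 1: regress the reference R on the error-prone Q and
    return the fitted values (the calibrated covariate) and the fit."""
    intercept, slope = fit_simple_ols(q, r)
    return [intercept + slope * qi for qi in q], (intercept, slope)

random.seed(1)
# Assumed data-generating values: X ~ N(5, variance 4), error variances 25 and 0.01.
x = [random.gauss(5.0, 2.0) for _ in range(5000)]   # true covariate (unobserved in practice)
q = [xi + random.gauss(0.0, 5.0) for xi in x]       # error-prone measurement Q
r = [xi + random.gauss(0.0, 0.1) for xi in x]       # unbiased reference measurement R

q_calib, (intercept, slope) = calibrate(q, r)
```

In Step 2, `q_calib` would take the place of `q` in the outcome model. The calibration slope here should be close to the ratio of the variance of X to the variance of Q.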

2.2. The GEE approach

Zeger and Liang (1986) proposed the GEE to extend generalized linear models (GLMs) to the analysis of correlated observations. The GEE approach requires the specification of the first two moments (mean and variance) of responses from the same cluster and a working correlation rather than the full specification of the joint distribution (Akter, Sarker, & Rahman, 2018). Provided the mean model is correctly specified, the GEE yields asymptotically unbiased regression coefficient estimates regardless of the specified correlation structure. The GEE estimates have a marginal (population-averaged) interpretation.

Assume that a population of size N is divided into K non-overlapping clusters of sizes $n_i$ ($i = 1, 2, \ldots, K$) such that $\sum_{i=1}^{K} n_i = N$. Let $Y_{ij}$, $j = 1, 2, \ldots, n_i$, be the jth response from the ith cluster and $X_{ij} = (x_{ij1}, \ldots, x_{ijp})$ be a vector of the corresponding p covariates. Using the GLM framework, the marginal expectation $E(Y_i \mid X_i) = \mu_i = (\mu_{i1}, \mu_{i2}, \ldots, \mu_{in_i})^{T}$ can be modeled as $\phi(\mu_i) = X_i \beta$, where $\beta = (\beta_0, \beta_1, \ldots, \beta_p)^{T}$ is a $(p+1)$-dimensional vector of regression coefficients to be estimated, $X_i$ is a matrix whose first column is a vector of 1's corresponding to the intercept terms and $\phi(\cdot)$ is the appropriate link function. For a binary response variable a logit link can be used such that the mean model can be expressed elementwise, with $X_{ij}$ now including the leading 1 for the intercept, as

(3) $E(Y_{ij} \mid X_{ij}) = \mu_{ij} = \dfrac{\exp(X_{ij}\beta)}{1 + \exp(X_{ij}\beta)}$.

We denote the working covariance matrix by $V_i = A_i^{1/2} \rho_i(\alpha) A_i^{1/2}$, where $A_i$ is a diagonal matrix with a known variance function $v(\mu_{ij})$ and $\rho_i(\alpha)$ is the corresponding working correlation matrix, which depends on a vector of parameters $\alpha$ that is generally unknown. Assuming that the structure of $\rho_i(\alpha)$ is known, the regression parameters $\beta$ can be estimated by solving the GEE,

(4) $U(\beta, \alpha) = \sum_{i=1}^{K} D_i^{T} V_i^{-1} (y_i - \mu_i) = 0$,

where $D_i = \partial \mu_i / \partial \beta$.

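The four commonly used correlation structures are the exchangeable, independence, auto-regressive (AR) and unstructured structures. In the exchangeable structure, it is assumed that any two observations within a cluster are equally correlated with correlation $\rho$ (fixed) but observations between clusters are assumed to be uncorrelated. For the ith cluster with size $n_i$ the exchangeable (or compound symmetry) correlation matrix can be expressed as follows:

(5) $\rho_i(\alpha) = \begin{pmatrix} 1 & \rho_{12} & \rho_{13} & \cdots & \rho_{1n_i} \\ \rho_{21} & 1 & \rho_{23} & \cdots & \rho_{2n_i} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho_{n_i1} & \rho_{n_i2} & \rho_{n_i3} & \cdots & 1 \end{pmatrix} = \begin{pmatrix} 1 & \rho & \rho & \cdots & \rho \\ \rho & 1 & \rho & \cdots & \rho \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \rho & \cdots & 1 \end{pmatrix}$.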

Horton and Lipsitz (1999) proposed the exchangeable structure as the appropriate correlation structure for handling data from a complex clustered design where, unlike in longitudinal data, observations from the same cluster are not ordered chronologically.

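Under the independent (or scaled identity) correlation structure, it is assumed that observations are uncorrelated, in which case the estimating equations reduce to those of a standard GLM. The independent working correlation matrix for the ith cluster can be expressed as follows:

(6) $\rho_i(\alpha) = \begin{pmatrix} 1 & \rho_{12} & \rho_{13} & \cdots & \rho_{1n_i} \\ \rho_{21} & 1 & \rho_{23} & \cdots & \rho_{2n_i} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho_{n_i1} & \rho_{n_i2} & \rho_{n_i3} & \cdots & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}$.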

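In the AR correlation structure, which is more appropriate for observations made over time on the same unit, repeated observations that are close together in time are strongly correlated, and the correlation weakens as the observations get further apart in time. The correlation between, say, the ath and bth observations in cluster i is given by $\rho^{|a-b|}$, where $0 \le \rho \le 1$, as shown in the AR(1) correlation matrix below:

(7) $\rho_i(\alpha) = \begin{pmatrix} 1 & \rho_{12} & \rho_{13} & \cdots & \rho_{1n_i} \\ \rho_{21} & 1 & \rho_{23} & \cdots & \rho_{2n_i} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho_{n_i1} & \rho_{n_i2} & \rho_{n_i3} & \cdots & 1 \end{pmatrix} = \begin{pmatrix} 1 & \rho & \rho^{2} & \cdots & \rho^{|1-n_i|} \\ \rho & 1 & \rho & \cdots & \rho^{|2-n_i|} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{|n_i-1|} & \rho^{|n_i-2|} & \rho^{|n_i-3|} & \cdots & 1 \end{pmatrix}$.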

In the unstructured correlation structure, no constraints are imposed, and each pair of observations in a cluster can have its own correlation. Though this correlation structure is flexible, fitting it becomes computationally costly, as the number of parameters to be estimated increases with the number of observations in a cluster.
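The three parametric working correlation structures described above can be constructed as in the following sketch. It is an illustration only; the function name `working_correlation` and the structure labels are assumptions, not part of any package used in the study.

```python
def working_correlation(n, structure, rho=0.0):
    """Build an n x n working correlation matrix as a list of rows.

    structure: "independence", "exchangeable" or "ar1" (labels are illustrative).
    rho: common correlation (exchangeable) or lag-1 correlation (AR(1)).
    """
    if structure == "independence":
        # Identity matrix: no within-cluster correlation.
        return [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    if structure == "exchangeable":
        # Every pair of distinct observations shares the same correlation rho.
        return [[1.0 if a == b else rho for b in range(n)] for a in range(n)]
    if structure == "ar1":
        # Correlation decays as rho ** |a - b| with the distance between positions.
        return [[rho ** abs(a - b) for b in range(n)] for a in range(n)]
    raise ValueError("unknown structure: " + structure)

exch = working_correlation(4, "exchangeable", rho=0.925)
ar1 = working_correlation(4, "ar1", rho=0.925)
```

The unstructured case is omitted because it has no closed form: each of its n(n-1)/2 off-diagonal entries is a separate parameter.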

2.2.1. GEE procedure

Zeger and Liang (1986) proposed an iterative procedure for obtaining the GEE estimates $\hat{\beta}$ of $\beta$ under the exchangeable correlation structure. The first step involves choosing the initial estimate $\beta^{(0)}$ of $\beta$, obtained by fitting a GLM under the independence working correlation. In the second step, we set $\hat{\beta} = \beta^{(0)}$ and calculate the moment estimate $\hat{\alpha}$ of $\alpha$; for instance, for the exchangeable working correlation matrix $\rho(\alpha)$, $\hat{\alpha}$ is calculated as

(8) $\hat{\alpha} = \dfrac{1}{K} \sum_{i=1}^{K} \dfrac{1}{n_i (n_i - 1)} \sum_{j \neq k}^{n_i} s_{ij} s_{ik}$,

where $s_{ij} = \dfrac{y_{ij} - \mu_{ij}}{\sqrt{v(\mu_{ij})}}$.
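The moment estimate in equation (8) can be sketched as follows, assuming a binary outcome with variance function v(mu) = mu(1 - mu) and clusters of at least two observations. The function names are hypothetical, and the sketch follows equation (8) as written, without any dispersion or degrees-of-freedom correction.

```python
import math

def pearson_residuals(y, mu):
    """Pearson residuals s = (y - mu) / sqrt(v(mu)) for a binary outcome."""
    return [(yi - mi) / math.sqrt(mi * (1.0 - mi)) for yi, mi in zip(y, mu)]

def exchangeable_alpha(residuals_by_cluster):
    """Average, over clusters, of the mean cross-product of residuals
    taken over all ordered pairs j != k within a cluster."""
    total = 0.0
    for s in residuals_by_cluster:
        n = len(s)
        sum_s = sum(s)
        sum_sq = sum(v * v for v in s)
        # sum over j != k of s_j * s_k equals (sum s)^2 - sum s^2
        cross = sum_s ** 2 - sum_sq
        total += cross / (n * (n - 1))
    return total / len(residuals_by_cluster)

# Two toy clusters with fitted means of 0.5 for every observation.
clusters_y = [[1, 1, 0], [0, 0, 1]]
clusters_mu = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]
residuals = [pearson_residuals(y, mu) for y, mu in zip(clusters_y, clusters_mu)]
alpha_hat = exchangeable_alpha(residuals)
```

The identity used for the cross-product sum avoids an explicit double loop over pairs, which matters for the larger cluster sizes.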

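In the third step, the working correlation matrix $\rho(\hat{\alpha})$ obtained in the second step is used to update the current estimate $\hat{\beta}^{(t)}$ using the Newton–Raphson method as

(9) $\hat{\beta}^{(t+1)} = \hat{\beta}^{(t)} + \left[ I(\beta, \alpha) \right]^{-1} \Big|_{\beta = \hat{\beta}^{(t)}} \, U(\beta, \alpha) \Big|_{\beta = \hat{\beta}^{(t)}}$.

Steps two and three are repeated until convergence to obtain the estimate $\hat{\beta}$ of $\beta$.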

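The standard error (SE) of the GEE estimate is commonly calculated using the sandwich-based robust method. This is because the sandwich-based robust estimator is consistent and asymptotically unbiased, even under mis-specification of the working correlation structure. The variance of $\hat{\beta}$, $\text{var}(\hat{\beta})$, is obtained by substituting the estimate of $\beta$ at each iteration, and evaluating the following equation at the final estimate:

(10) $\text{var}(\hat{\beta}) = \left( \sum_{i=1}^{K} D_i^{T} V_i^{-1} D_i \right)^{-1} \left( \sum_{i=1}^{K} D_i^{T} V_i^{-1} \, \text{cov}(Y_i) \, V_i^{-1} D_i \right) \left( \sum_{i=1}^{K} D_i^{T} V_i^{-1} D_i \right)^{-1}$,

where $\text{cov}(Y_i) = E\left[ (Y_i - \mu_i)(Y_i - \mu_i)^{T} \right]$ and the sums run over the K clusters, as in equation (4).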

2.3. Monte Carlo simulations

In this study, we first use Monte Carlo simulations to show the application of RC in GEE for analyzing clustered data when the covariate is subject to ME. The simulations were conducted in R software. This section provides details of the simulation design, a description of the methods used and how the methods are evaluated.

2.3.1. Simulation design

For simplicity and without loss of generality, we focus on the following binary logit model with two regressors, one of which is subject to additive ME

(11) $\text{logit}\, P(Y_{ij} = 1) = 0.2 X_{ij1} + 0.8 X_{ij2}$,
(12) $Q_{ij} = X_{ij1} + U_{ij}$,

where $X_1 \sim N(5, 4)$, $X_2$ is a binary covariate (assumed to be error-free), and Q is the mis-measured version of $X_1$. The additive error U is assumed to follow a normal distribution with mean 0 and variance $\sigma_U^2 = 25$. Note that the binary outcome Y is generated based on $X_1$, $X_2$, and a pre-defined working correlation structure using the rbin function in the SimCorMultRes package (Touloumis, 2016).

The unbiased version, R, of $X_1$ is simulated such that it contains a small additive ME, u,

(13) $R_{ij} = X_{ij1} + u_{ij}$,

where $u \sim N(0, 0.01)$.
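As a rough check on how severe the simulated error is, the classical-error attenuation (reliability) factor can be computed from the stated variances. The sketch below assumes that N(5, 4) is parameterized by its variance and that the linear-regression attenuation formula is a usable approximation for the logit model. The function name is illustrative and the numbers are not results from the study.

```python
def reliability_ratio(var_x, var_u):
    """Share of the variance of Q = X + U that is due to the true covariate X."""
    return var_x / (var_x + var_u)

# Assumed design values: var(X1) = 4, var(U) = 25, true coefficient 0.2.
lam = reliability_ratio(4.0, 25.0)
approx_naive_beta1 = 0.2 * lam      # approximate coefficient when Q is used as is
approx_relative_bias = lam - 1.0    # approximate relative bias of the naive estimate
```

Under these assumptions the reliability ratio is about 0.14, so a strongly attenuated naive coefficient would be expected.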

We generate a total of 100 clusters with cluster sizes $n_i \in \{5, 10, 30, 90, 200\}$, assuming the commonly used correlation structures described in section 2.2. For illustrative purposes, the following $n_i \times n_i$ working correlation matrices are used in the simulation of the clustered observations:

1. For the exchangeable correlation structure, we use a working correlation matrix of the form:

(14) $\rho_i(\alpha) = \begin{pmatrix} 1 & 0.925 & 0.925 & \cdots & 0.925 \\ 0.925 & 1 & 0.925 & \cdots & 0.925 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0.925 & 0.925 & 0.925 & \cdots & 1 \end{pmatrix}$.

2. The AR(1) working correlation matrix is generated as

(15) $\rho_i(\alpha) = \begin{pmatrix} 1 & 0.925 & 0.925^{2} & \cdots & 0.925^{|1-n_i|} \\ 0.925 & 1 & 0.925 & \cdots & 0.925^{|2-n_i|} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0.925^{|n_i-1|} & 0.925^{|n_i-2|} & 0.925^{|n_i-3|} & \cdots & 1 \end{pmatrix}$.

3. For the unstructured working correlation, we first generate a positive definite covariance matrix, and then convert it to a correlation matrix. This is implemented in the clusterGeneration package (Qiu et al., 2015).

4. For the independence correlation structure, we model the simulated data using GLM.

Survey weights form a key feature of complex clustered surveys and are used to ensure that statistics calculated from the data are more representative of the population of interest. To incorporate this feature, the binary covariate $X_2$ is simulated such that it takes two possible values, Male and Female, with probabilities 0.6 and 0.4, respectively. To account for the simulation of the values of $X_2$ with unequal probabilities, we use the rake function in the survey package (Lumley & Lumley, 2007) to create weights for the simulated clustered data.

2.3.2. Calibration and methods description

The calibrated version of the observable mis-measured version of $X_1$, $\hat{Q}^{\text{calib}}$, is the predicted value obtained from the linear regression of R on Q and the error-free covariate, $X_2$. Thus the calibrated exposure variable of interest is given by

(16) $\hat{Q}_{ij}^{\text{calib}} = E[R_{ij} \mid Q_{ij}, X_{ij2}]$.

We compare the estimates of the association between the outcome and the covariate of interest obtained from the following described methods:

M1 True GEE: This method relates the outcome (Y) to the true simulated covariate ($X_1$) and an error-free covariate ($X_2$), taking into consideration the within-cluster correlation structure.

M2 Naive GEE: This method models the association between Y and (Q, $X_2$), taking into consideration the within-cluster correlation structure.

M3 Calibrated GEE: This method relates Y to ($\hat{Q}^{\text{calib}}$, $X_2$), taking into consideration the correlation structure of observations within a cluster.

M4 True GLM: This method models the association between Y and ($X_1$, $X_2$) without taking into consideration the within-cluster correlation.

M5 Naive GLM: This method relates Y to (Q, $X_2$), ignoring both the covariate ME and the within-cluster dependencies.

M6 Calibrated GLM: This method relates Y to ($\hat{Q}^{\text{calib}}$, $X_2$), ignoring the within-cluster dependencies.

The methods are summarized in the flow-chart diagram shown in Figure 1.

Figure 1. Flow-chart diagram for the methods to be compared

2.3.3. Model evaluation

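Our interest is in the coefficient estimate $\hat{\beta}_1$ of the parameter $\beta_1 = 0.2$, which quantifies the association between Y and $X_1$. Model comparison is based on the following:

  1. Relative bias in $\hat{\beta}_1$: $\text{Rel.bias}(\hat{\beta}_1) = \text{bias}(\hat{\beta}_1) / \beta_1$,

  2. Empirical standard error of $\hat{\beta}_1$: $\text{SE}(\hat{\beta}_1) = \sqrt{\dfrac{1}{M-1} \sum_{c=1}^{M} \left( \hat{\beta}_{1,c} - \bar{\hat{\beta}}_1 \right)^{2}}$,

  3. Mean squared error of $\hat{\beta}_1$: $\text{MSE}(\hat{\beta}_1) = [\text{SE}(\hat{\beta}_1)]^{2} + [\text{bias}(\hat{\beta}_1)]^{2}$, where $\bar{\hat{\beta}}_1 = \dfrac{1}{M} \sum_{c=1}^{M} \hat{\beta}_{1,c}$, $\text{bias}(\hat{\beta}_1) = \bar{\hat{\beta}}_1 - \beta_1$, M is the number of simulated data sets and $\hat{\beta}_{1,c}$ is the parameter estimate from the cth simulated data set (Burton et al., 2006).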
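The three evaluation measures listed above can be computed from a vector of per-replicate estimates as in this sketch. The function name `evaluate` and the toy input are illustrative assumptions, not the simulation output of the study.

```python
import math

def evaluate(estimates, true_beta):
    """Relative bias, empirical SE and MSE of a list of per-replicate estimates."""
    m = len(estimates)
    mean_est = sum(estimates) / m
    bias = mean_est - true_beta
    rel_bias = bias / true_beta
    # Empirical SE uses the M - 1 divisor around the mean of the estimates.
    emp_se = math.sqrt(sum((b - mean_est) ** 2 for b in estimates) / (m - 1))
    mse = emp_se ** 2 + bias ** 2
    return {"rel_bias": rel_bias, "se": emp_se, "mse": mse}

# Toy example with three replicate estimates of a true coefficient of 0.2.
summary = evaluate([0.1, 0.2, 0.3], true_beta=0.2)
```

In the simulation setting each of the six methods would contribute its own list of 500 estimates, one per simulated data set.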

We compared the results obtained by using the methods described in section 2.3.2, under correctly specified within-cluster correlation structure and different cluster sizes (ni). We also compared the results from the different methods when the within-cluster correlation structure is mis-specified. The simulations were repeated 500 times. A random seed was used to ensure the reproducibility of the results. We provide the mean coefficient estimates and Monte Carlo standard errors in the supplemental data for this article.

2.4. Application to real data

In this study, we illustrate the use of RC to correct for covariate ME in real clustered cross-sectional data. Specifically, we used a subset of data on cigarette smokers extracted from the South African National Health and Nutrition Examination Survey 2011–2012 (SANHANES-1). The survey applied a stratified cluster sampling approach (Human Sciences Research Council, 2017). Enumeration areas (EAs) were the primary sampling units. The selection of EAs was stratified by province. Responses from the same EA are likely to be correlated in this survey, since they share the same cluster information. We focused on modeling the association between coughing status and smoking. In the study, smoking was quantified using the self-reported average number of cigarettes smoked per week. In addition to the average number of cigarettes smoked per week, some smokers reported the number of cigarettes smoked daily. The self-reported number of cigarettes smoked weekly is prone to ME, and therefore using it unadjusted to model the association between coughing and smoking yields biased estimates of the association.

We first adjusted for ME in the average number of cigarettes smoked per week before modeling the association between coughing and smoking. In this study, the number of cigarettes smoked daily was used to calibrate those smoked weekly in the following RC setting:

(17) $E[R_{ij} \mid Q_{ij}, Z_{ij}] = \hat{Q}_{ij}^{\text{calib}}$,

where, for the jth response in the ith cluster, $R_{ij}$ is the number of cigarettes smoked daily, $Q_{ij}$ is the number of cigarettes smoked weekly, $Z_{ij}$ is an error-free covariate (in this case, gender) and $\hat{Q}_{ij}^{\text{calib}}$ is the calibrated number of cigarettes smoked weekly.

Taking into consideration the survey design features (i.e. clustering, stratification and sampling weight), we modeled the association between coughing status (1 = Yes, 0 = No), and the calibrated number of cigarettes as follows:

(18) $\phi\left[ E(Y_{ij} \mid \hat{Q}_{ij}^{\text{calib}}, Z_{ij}) \right] = \hat{\beta}_0 + \hat{\beta}_1 \hat{Q}_{ij}^{\text{calib}} + \hat{\beta}_2 Z_{ij}$,

where $\phi(\cdot)$ is a logit link function, $Y_{ij}$ is the coughing status of the jth individual from the ith cluster (EA), $\hat{\beta}_0$ is the estimate of the intercept term, $\hat{\beta}_1$ is the coefficient estimate for the calibrated number of cigarettes and $\hat{\beta}_2$ is the coefficient estimate for gender. We compared $\hat{\beta}_1$ and its SE with those obtained when using a naive model under different correlation structure considerations.

3. Results

3.1. Simulation results

Table 1 shows the relative bias, standard error (SE), and mean squared error (MSE) of the estimate of the association between the outcome and the covariate of interest obtained using the methods described in section 2.3.2, under different cluster sizes and correctly specified working correlation structures. We considered clusters with 5, 10, 30, 90 and 200 observations. This facilitates a comparison of how the models perform at different cluster sizes.

Table 1. Comparison of relative bias, SE and MSE of the estimate of the association between the outcome and covariate of interest obtained using different methods under different cluster sizes with correctly specified dependency structure (true parameter, $\beta_1 = 0.2$)

The relative bias of the regression coefficient estimates obtained using the calibrated GEE and calibrated GLM under different cluster sizes and correctly specified correlation structures was close to zero. As the clusters become bigger, the relative bias approaches zero (Table 1). Negative relative bias is obtained when naive methods are used.

The results further showed that when the exchangeable and AR(1) correlation structures are correctly specified in clusters with 5, 10 and 30 observations, the SE obtained when using the calibrated GEE method is larger than that obtained when using the calibrated GLM method. The SEs obtained in bigger clusters are essentially the same; for instance, for correctly specified AR(1) and $n_i = 90$, the SE obtained from both calibrated GEE and calibrated GLM is 0.014, and for $n_i = 200$, the SE is 0.009. A similar pattern is observed for the SEs obtained from naive methods. When the unstructured correlation structure is correctly specified, the SEs obtained under calibrated GEE are slightly lower than those obtained with calibrated GLM.

The MSEs obtained when using the calibrated methods are smaller and closer to zero than those from the naive methods. With the naive methods, the MSEs remain the same regardless of the cluster size. However, for calibrated methods, the MSEs are larger in small clusters ($n_i = 5, 10$) than in large clusters ($n_i \ge 30$). Specifically, the MSEs obtained when using calibrated methods in large clusters are approximately equal to zero.

Presented in Table 2 are the results for the comparison of relative bias, SE, and MSE for the coefficient estimate of the association between the outcome and covariate of interest obtained using different methods, with correctly specified and mis-specified within-cluster dependency structures. With the calibrated GEE method, mis-specifying the exchangeable correlation structure as AR(1) resulted in relatively higher bias. However, with the naive GEE, mis-specifying the correlation structure does not change the relative bias. A similar pattern is observed when the AR(1) dependency structure is mis-specified as exchangeable. With the calibrated GEE method, mis-specifying the unstructured dependency structure as either exchangeable or AR(1) results in higher relative biases and SEs. Similar SEs are obtained under mis-specification of the exchangeable and AR(1) correlation structures, whereas slightly higher SEs are obtained under mis-specification of the unstructured correlation structure. The MSEs remain unchanged under mis-specification of the dependency structures. For further details, see Table S2 in the supplemental data for this article.

Table 2. Comparison of relative bias, SE and MSE for the estimate of the association between the outcome and covariate of interest obtained using different methods with correctly specified and mis-specified dependency structure ($n_i = 10$)

3.2. Real application results

Presented in Table 3 are the results obtained from analyzing real data as described in section 2.4. The results show that using the number of cigarettes smoked per week before adjusting for ME yielded lower odds of coughing than when the covariate is adjusted for ME. For instance, considering the exchangeable correlation structure, the odds of coughing are found to increase by 0.1% (odds ratio $= e^{0.0011}$) per unit increase in the number of cigarettes smoked per week under the naive model, and by 0.4% when the number of cigarettes is adjusted for ME. Note that the coefficient estimates are approximately similar across the correlation structures considered but the SEs are different. The P-values obtained under the independence correlation structure are smaller than those obtained under either the exchangeable or AR(1) correlation structures.
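The conversion from a logit coefficient to a percentage change in odds used in the paragraph above can be sketched as follows. The function name is hypothetical; the only input taken from the text is the naive coefficient 0.0011.

```python
import math

def percent_change_in_odds(beta):
    """Percentage change in the odds per unit increase in the covariate."""
    return (math.exp(beta) - 1.0) * 100.0

# Naive coefficient under the exchangeable structure, as quoted in the text.
naive_change = percent_change_in_odds(0.0011)
```

For coefficients this small the percentage change is nearly 100 times the coefficient itself, which is why 0.0011 corresponds to roughly a 0.1% increase in odds.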

Table 3. The estimate of the association between coughing status and the number of cigarettes smoked, $\hat{\beta}_1$, alongside its standard error (SE) and the P-value

4. Discussion and conclusion

In this study, we have shown the application of RC in GEE for analyzing data when the covariate is subject to ME. In the simulation study, we compared results from naive and calibrated models under correctly specified and mis-specified correlation structures. The relative bias of the regression coefficient estimates obtained using both the calibrated GEE and calibrated GLM models across different cluster sizes was close to zero, an indication that the coefficient estimates obtained after adjusting for covariate ME closely approximated the true coefficient. Furthermore, the results imply that RC is not sensitive to changes in cluster sizes and the within-cluster dependencies.

The negative relative bias obtained under the naive GLM is an indication that ignoring the covariate ME led to underestimation of the true coefficient. Our finding is in line with Stefanski et al. (1985), who noted that ME in covariates attenuates predicted probabilities in logistic regression. Similarly, the underestimation effect was also observed in the method that considered the dependency structure but ignored the covariate ME. This is a clear indication that covariate ME in clustered data can lead to underestimation of the true association between the covariate and an outcome.

As expected, the SEs and the MSEs of the coefficient estimates were found to decrease with an increase in cluster sizes, due to the reduced uncertainty in estimating the true coefficient. Differences in SEs of the coefficient estimates obtained from the GLM and GEE models can be attributed to the within-cluster correlations. The smaller MSEs obtained when using the calibrated methods than when using the naive methods imply that better estimates are obtained under the calibrated models.

The comparison of relative bias, SEs and MSEs of the coefficient estimate of the association between an outcome and a covariate subject to ME under mis-specification of the within-cluster correlation structure has two implications: (i) mis-specifying the exchangeable working correlation structure as AR(1), and vice versa, can yield approximately similar results; (ii) mis-specifying the unstructured correlation structure as either exchangeable or AR(1) can result in either smaller or larger coefficient estimates and SEs. The AR(1) correlation structure is commonly used for longitudinal data; therefore, as proposed by Horton and Lipsitz (1999), and based on the findings of our study, the exchangeable correlation structure may be the only stable option for handling clustered cross-sectional data.

As a motivating example, we showed the use of RC to correct for ME in cross-sectional data from SANHANES-1. The results re-affirmed that ignoring ME in a covariate can underestimate the association between the covariate and an outcome in complex surveys. Furthermore, the results showed that ignoring the structure of correlation in clustered data can underestimate the SEs of the coefficient estimates (Hu et al., 1998; Ghisletta & Spini, 2004) and produce smaller P-values (Ying et al., 2017), irrespective of whether or not the ME in the covariate is corrected.

The study has the advantage that, apart from adjusting for within-cluster dependencies and covariate ME, it incorporates other survey design features such as stratification and sampling weights. Our study has a few limitations: (1) for simplicity and illustration purposes, we assumed that the covariate of interest is measured with classical additive error. However, in practice, the covariate can be measured with systematic error. In such a case, the systematic error components can be incorporated in the measurement error model in equation (12); (2) although a covariate can have a multiplicative measurement error structure (Heid et al., 2004), our study assumed an additive measurement error structure. A covariate measured with multiplicative error can be handled by first converting the multiplicative structure to an additive structure, through an appropriate transformation that linearizes the error structure.

From the findings of this study, we conclude that it is important to adjust for covariate ME in clustered data while accounting for within-cluster correlation.

Ethical statement

Ethics approval was granted by the HSRC Research Ethics Committee and was based on the Helsinki Declaration which has been adopted by the World Medical Association. Informed written consent or assent was obtained from each participant in the study. Participants were provided with written information on the study (including the background and objectives of the study) and their rights regarding participation and withdrawing at any time.

Disclosure statement

No potential conflict of interest to declare.

Data availability

SANHANES-1 data is made available to the researcher upon registration and agreeing to the terms and conditions of use in the Human Sciences Research Council (HSRC) website at http://curation.hsrc.ac.za/Dataset-565-datafiles.phtml.

Supplementary material

Supplemental data for this article can be accessed here.

Additional information

Funding

This work was supported through the DELTAS Africa Initiative. The DELTAS Africa Initiative is an independent funding scheme of the African Academy of Sciences (AAS)’s Alliance for Accelerating Excellence in Science in Africa (AESA), and is supported by the New Partnership for Africa’s Development Planning and Coordinating Agency (NEPAD Agency), with funding from the Wellcome Trust [grant 107754/Z/15/Z- DELTAS Africa Sub-Saharan Africa Consortium for Advanced Biostatistics (SSACAB) programme] and the UK government. The views expressed in this publication are those of the authors and not necessarily those of AAS, NEPAD Agency, Wellcome Trust, or the UK government.

Notes on contributors

Alexander K. Muoka

Alexander K. Muoka is a PhD student in the School of Mathematics, Statistics and Computer Science at the University of KwaZulu-Natal, South Africa. He is an assistant lecturer in the Department of Mathematics, Statistics and Physical Sciences at Taita Taveta University, Kenya. He has research interests in covariate measurement error modeling, multivariate analysis, among others.

Henry Mwambi

Henry G. Mwambi is a Professor of Statistics in the School of Mathematics, Statistics and Computer Science at the University of KwaZulu-Natal, South Africa. Henry has vast experience in modeling and analysis of biological and health outcome data including survival data, missing data, among others.

George O. Agogo

George O. Agogo is a biostatistician at the Centers for Disease Control and Prevention, Kenya. He has research interests in mixed modeling, covariate measurement error modeling, epidemiology, analysis of survival data, among others.

Oscar Ngesa

Oscar O. Ngesa is a Senior Lecturer in the Department of Mathematics, Statistics and Physical Sciences at the Taita Taveta University, Kenya. He has research interests in Spatial, Bayesian, food security and resilience analysis, among others.

References