Full article: Dealing with missing observations in the outcome and covariates in randomized controlled trials

Formulae display: $MathJax Logo$ ?Mathematical formulae have been encoded as MathML and are displayed in this HTML version using MathJax in order to improve their display. Uncheck the box to turn MathJax off. This feature requires Javascript. Click on a formula to zoom.

Abstract

This article compares different missing data methods in randomized controlled trials, specifically addressing cases involving joint missingness in the outcome and covariates. In the existing literature, it is still unclear how advanced methods like linear mixed model (LMM) and multiple imputation (MI) perform in comparison to simpler methods regarding the estimation of treatment effects and their standard errors. We therefore evaluates the performance of LMM and MI against simple alternatives across a wide range of simulation scenarios for various realistic missingness mechanisms. The results show that no single method universally outperforms the others. However, LMM followed by MI demonstrates superior performance across most missingness scenarios. Interestingly, a simple method that combines complete case analysis for the missing outcome and mean imputation for the missing covariate (CCAME) performs similarly to LMM and MI. All methods are furthermore compared in the context of a randomized controlled trial on chronic obstructive pulmonary disease.

KEYWORDS:

1. Introduction

Missing data are common in clinical research and a cause of concern in statistical analyses since they may lead to biased and/or imprecise results [Citation1,Citation2]. Depending on whether one is dealing with missing covariates, missing outcomes or both, missing data can have two effects: reduced efficiency and power and potential to introduce bias [Citation3]. In randomized trials, the former effect applies when dealing with missing covariates as long as the outcome is complete [Citation2], whereas both effects come into play if there are missing data in the outcome [Citation1]. In this article, unless stated otherwise, we consider the following linear regression as the analysis model of interest: (1) $Y_{i} = β_{0} + β_{1} X_{i 1} + β_{2} X_{i 2} + ε_{i}, i = 1, \dots, n,$ (1) where $X_{1}$ is the treatment indicator ( $X_{1} = 0$ for control, and $X_{1} = 1$ for intervention); unless specified otherwise, $X_{2}$ is the baseline outcome (which is independent of $X_{1}$ due to randomization), $β_{1}$ is the treatment effect of interest, and $ε_{i}$ ‘s are (independent) error terms with a mean of zero and a common variance $σ^{2}$ . $X_{1}$ is always complete, but $X_{2}$ and/or $Y$ might have missing data.

Statistical analyses involving missing data make assumptions about the cause of missingness, known as the missingness mechanism. Following Rubin [Citation4] and Little and Rubin [Citation5], we consider three missingness mechanisms: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). Data are classified as MCAR if the probability of missing data does not depend on both observed and unobserved data. MAR implies that the probability of missing data does not depend on unobserved data, conditional on the observed data. MNAR, on the other hand, implies that the probability of missing data depends on unobserved data, even after conditioning on the observed data. In model (1), $Y$ could be missing based on any of these three missingness mechanisms, but $X_{2}$ may be missing only under MCAR or MNAR (MAR is implausible because $X_{2}$ is collected before $X_{1}$ and $Y$ and, therefore, its missingness cannot be influenced by these two variables). If there are additional baseline covariates (e.g. $X_{3}$ ), the missingness of $X_{2}$ could be considered MAR, depending on those covariates. Although this particular scenario is not within the scope of the present study, randomization ensures that the results would still be akin to those obtained under MCAR for X₂.

As stated above, the missingness of the covariate $X_{2}$ and/or outcome $Y$ in randomized trials always causes a reduction in efficiency. The amount of efficiency reduction depends on the missingness rate (See, among others, Kayembe et al. [Citation6] and Sullivan et al. [Citation1]). However, the missingness of the covariate $X_{2}$ does not introduce bias, provided $X_{2}$ is measured before randomization. This is because neither the covariate nor its missingness can depend on the treatment assignment (i.e. $X_{1}$ ) [Citation1,Citation2]. This fact holds in randomized trials with missing data in more than one covariate if these covariates are all measured before randomization. In contrast, missing data in the outcome $Y$ can introduce bias in the analysis if a suitable method for handling missing data is not used. For example, if individuals who stopped adhering to the treatment protocol (because of their poor outcome values which are missing at the end of the study) are excluded from the analysis, the treatment effect estimate will likely be biased (see Jackson et al. [Citation1,Citation3]).

There are several theoretically simple as well as advanced methods for handling missing data in the literature: Complete-case analysis (CCA) restricts the analysis to individuals with complete data; unadjusted analysis (UA) excludes the incomplete covariate(s) from the analysis; overall mean imputation (ME) replaces each missing value of a variable with the observed mean of that variable and then uses the standard analysis for complete data. If missingness is confined to covariates only, another simple method is the missing-indicator method (M), which utilizes the mean imputation (ME) approach for each missing value and adds an indicator of missingness for the incomplete variable in the analysis model. The latter two methods (ME and M) are generally suitable to deal with missing covariates in randomized trials [Citation2,Citation6,Citation7], but not for missing outcomes as explained in sections 2.3 and 2.4. Advanced methods for addressing missing data include multiple imputation (MI) and linear mixed effects models (LMM) [Citation1,Citation8] as discussed later in sections 2.5 and 2.6. These methods generally provide unbiased treatment effect estimates with increased efficiency and power, particularly when missingness (of the covariate or outcome) is MCAR or MAR. While they are recommended for dealing with missing outcomes in randomized trials, they are not necessarily advocated for missing covariates [Citation1,Citation8].

There is extensive literature on how to deal with missing data in randomized trials. Table presents a summary of published studies addressing this topic, specifically when linear regression is the analysis model of interest. The table categorizes studies based on data structures and distinguishes between missing covariates, missing outcomes, and missing in both. Additionally, the table classifies studies based on various missingness mechanisms and the different methods investigated for handling missing data.

Table 1. An overview of the data structures, missingness mechanism scenarios, and methods to deal with missing data in the literature. Note: Y = continuous outcome, X₁= treatment assignment, and X₂, X₃= baseline variables; R indicates missingness (R = 1 for observed, and R = 0 for missing); R₂∼X₁+X₂ means the missingness mechanism of X₂ is a function of (X₁, X₂); the other missingness mechanisms are defined similarly for R₂ and R_Y (representing the outcome missingness).

Download CSV Display Table

The case of missing covariate(s) has received attention from a few authors [Citation1,Citation2,Citation6,Citation7,Citation9,Citation10]. These studies have compared the performance of several missing data methods under different missingness mechanisms through simulation studies. The conclusion from these studies is that ME and M are preferable to all other methods as outlined in Table . This finding holds under the condition that the missingness of the covariate does not depend both on the treatment and the covariate simultaneously (see [Citation6,Citation7]). In situations where missingness depends on the treatment assignment, which can occur if the covariate is measured after treatment assignment, ME and especially M are biased. In contrast, UA and CCA are unbiased under such conditions but may be inefficient depending on the covariate-outcome correlation and the percentage of missingness. It should be noted that measuring covariates after treatment assignment represents poor practice and should be avoided in general. Consequently, this specific scenario is not further investigated in the present paper.

The case of missing outcome(s) has also been a focus of research in the literature (see, among others, [Citation1,Citation8,Citation11]. Simulations conducted under various missingness mechanisms have indicated that MI and LMM are the most suitable methods to deal with missingness in the outcome. In this context, the overall mean imputation (ME) was not specifically investigated, but it might perhaps be useful if implemented per treatment group, equivalent to a mean imputation method conditional on the treatment assignment but not on the other covariates. This method has been shown to produce unbiased estimates of the regression coefficients (though not their standard errors) under MCAR or MAR, when missing values on a variable (in this case, the outcome) are replaced with the mean of that variable conditional on the other variables (in this case, the treatment) with which it is associated in the regression analysis [Citation12,Citation13].

The more general case of missing data in both covariates and the outcome in the context of an RCT with a continuous outcome has been explored to some extent by Carpenter & Kenward [Citation8] and Jackson et al. [Citation3] through case studies. While Carpenter & Kenward [Citation8] provided formal proofs for MI and LMM, Jackson et al. [Citation3] focused on the plausibility of different assumptions of the missingness mechanism. In view of the limited scope of these case studies and the absence of a systematic simulation study comparing various methods, there are significant gaps in our understanding of missing data methods for RCTs in scenarios involving missingness in both covariates and the outcome. Firstly, these studies did not cover practical scenarios where the missingness in the outcome may depend on the baseline recording of the outcome, the treatment indicator and their interaction, all of which will be addressed in this paper. Furthermore, Jackson et al. [Citation3] concentrated on the plausibility of various assumptions regarding the missingness mechanism but did not perform a detailed comparison of different missing data methods, which is the primary focus of our study. Hence, a good contribution to the literature on dealing with missing data in RCTs would be to consider this problem in the general setting explored by Carpenter & Kenward [Citation8] and Jackson et al. [Citation3], but going beyond their work by comparing different missing data methods across a wider range of simulated missingness scenarios. Finally, the previous studies on missingness in both covariates and the outcome have shown little interest in the performance of simple methods, such as UA, CCA, ME and M, particularly in the case of the missing outcome. Here, we focus on evaluating the performance of these simple methods in comparison with advanced methods through simulations under various missingness mechanisms. We aim to determine if and when these simple methods can be used without introducing substantial bias in the treatment effect estimate or its standard error.

This paper is arranged as follows. In section 2, we formally describe the various methods for handling missing data. In section 3, we conduct simulations to compare all proposed methods of dealing with missing data across a range of missingness mechanisms, focusing on the case of joint missingness in a baseline covariate and an outcome. The evaluation is done concerning various performance criteria in section 4. In section 5, we extend the simulation to the case of joint missingness in multiple baseline covariates and the outcome. We present a real case study in section 6. In section 7, we discuss the results obtained from simulations and the real-world application and conclude findings presented in this article.

2. Missing-data methods

This section defines several missing data methods that will be compared in the subsequent simulation study in Section 3. We begin by assuming that model (1) is correctly specified. In this ideal scenario, where no missing data exists, the ordinary least squares estimator ( ${\hat{β}}_{1}$ ) and its standard error (SE) are unbiased [Citation14–16]. Model (1) serves as our reference throughout the study against which to compare the various missing data methods.

2.1. Unadjusted analysis (UA)

Unadjusted analysis (UA) involves estimating the treatment effect without accounting for any covariates. Typically, UA is inefficient, unless the covariates are not predictive. In situations where both X₂ and Y have missing data, UA estimates the treatment effect without adjusting for X₂ and only uses cases with observed Y. Thus, given the analysis model (1), the UA model can be represented as: (2) $Y_{i} = (α_{0} + α_{1} X_{i 1} + ε_{i}) R_{iY}, i = 1, \dots, n,$ (2) where $R_{iY}$ is the missingness indicator of $Y_{i}$ , $R_{iY} = 0$ if $Y_{i}$ is missing, and $R_{iY} = 1$ if $Y_{i}$ is observed. Here, the missingness mechanism of X₂ is irrelevant, and the bias of the treatment effect estimator ( ${\hat{α}}_{1}$ ) and its standard error (SE) (i.e. efficiency) depend on the missingness mechanism of Y and possibly on the size of the treatment effect $β_{1}$ . Bias in the treatment effect means that $α_{1}$ (and so $E ({\hat{α}}_{1})$ ) is unequal to $β_{1}$ in the model (1). Moreover, the efficiency of ${\hat{α}}_{1}$ relative to the reference case (i.e. ${\hat{β}}_{1}$ in the model (1) applied to the complete data) depends on the percentage of missing data in Y and the covariate effect $β_{2}$ (i.e. the association between Y and X₂). The UA model (2) may thus be useful in terms of efficiency if the proportion of missingness is low for Y but high for X₂. However, its bias (and efficiency) under various missingness mechanisms and percentages will be studied through simulation in Section 3.

2.2. Complete-case analysis (CCA)

In complete-case analysis (CCA), only individuals without any missing data (whether on X₂, Y, or both) are included in the analysis. Here, given the analysis model (1), the model for CCA is expressed as: (3) $Y_{i} Q_{i} = (α_{0} + α_{1} X_{i 1} + α_{2} X_{i 2} + ε_{i}) Q_{i}, i = 1, \dots, n,$ (3) where $Q_{i} = 1$ if the i^th individual has complete data (i.e. $R_{iY} = R_{i 1} = R_{i 2} = 1$ ), and $Q_{i} = 0$ otherwise. The OLS estimator ofor the CCA model (3) is given by theorem 2.1 of Jones [Citation15]. According to this theorem, the bias of the OLS estimator of $α_{1}$ and its standard error (SE) (i.e. efficiency) depend on the missingness mechanism of both X₂ and Y. The bias in the SE estimator additionally depends on the number of complete cases. As with UA, bias in the treatment effect means that $α_{1}$ (and so $E ({\hat{α}}_{1})$ ) in the CCA model (3) is unequal to $β_{1}$ in the model (1). CCA may thus be useful if the covariate X₂ is strongly correlated with the outcome Y, and both X₂ and Y do not suffer from substantial missingness. However, like UA, the bias and efficiency of CCA under various missingness mechanisms and percentages will be studied through simulation in Section 3.

2.3. Mean imputation (ME)

The mean imputation (ME) method replaces each missing value of a variable with its observed mean, and then the originally intended analysis model is applied to the completed dataset. When both X₂ and Y have missing data, the ME method involves replacing the missing values in X₂ and Y with the overall observed mean of X₂ and the mean per treatment of the observed Y, respectively. It should be noted that the overall mean imputation (rather than the mean imputation per treatment group) should be used for X₂ because it provides better precision and coverage for the treatment effect in the subsequent analysis [Citation2,Citation6]. For the outcome Y, the mean per treatment is used because the outcome Y can be influenced by the treatment assignment, and using the overall mean imputation (for Y) leads to bias in the treatment effect estimates, except when β₁= 0 (i.e. no treatment effect).

Given the analysis model (1), the ME model is represented by: (4) $Z_{iY} = α_{0} + α_{1} X_{i 1} + α_{2} Z_{i 2} + ε_{i}, i = 1, \dots, n$ (4) where $Z_{iY} = Y_{i} R_{iY} + {\bar{Y}}_{X_{1}}^{obs} (1 - R_{iY})$ , and $R_{iY}$ indicates missingness in the outcome( $R_{iY} = 0$ if $Y_{i}$ is missing, $R_{iY} = 1$ if $Y_{i}$ is observed). ${\bar{Y}}_{X_{1}}^{obs}$ represents the observed mean of $Y$ within each treatment group. Additionally, $Z_{i 2} = X_{i 2} R_{i 2} + {\bar{X}}_{2}^{obs} (1 - R_{i 2})$ , where $R_{i 2}$ indicates missingness in $X_{2}$ ( $R_{i 2} = 0$ if $X_{2 i}$ is missing, $R_{i 2} = 1$ if $X_{2 i}$ is observed), and ${\bar{X}}_{2}^{obs}$ is the overall observed mean of $X_{2}$ . Bias in the OLS estimator of $α_{1}$ and its standard error (SE) from the ME model (4) is expected to depend on the rate of missing data and the missingness mechanisms for both X₂ and Y. This bias and the efficiency of ME will be studied through simulation in Section 3. Once again, the bias in the treatment effect means that $α_{1}$ (and thus $E ({\hat{α}}_{1})$ ) from the ME model (4) is unequal to $β_{1}$ in the model (1).

2.4. Missing-indicator method (M)

The missing-indicator method (M) is generally used for handling missing covariates. It replaces missing values of a covariate with a fixed value (such as zero or the mean) and modifies the analysis model by incorporating an indicator of missing values on that covariate. In the context where both the covariate (X₂) and the outcome (Y) have missing data, we extend the M method as follows: missing values of X₂ are imputed using the overall mean imputation, while the missing values of Y are imputed using the mean per treatment group. An indicator of missingness in X₂ is then added to the analysis model. Given the analysis model (1), the M model can be represented as: (5) $Z_{iY} = α_{0} R_{i 2} + α_{1} X_{i 1} + α_{2} X_{i 2} R_{i 2} + α_{3} (1 - R_{i 2}) + ε_{i}; i = 1, \dots, n$ (5) where $Z_{iY} = Y_{i} R_{iY} + {\bar{Y}}_{X_{1}}^{obs} (1 - R_{iY})$ as defined in equation (4). Furthermore, when the expression of $Z_{i 2}$ , defined above, is plugged into equation (4), it is apparent that equation (5) differs from equation (4) by replacing $α_{0}$ and $α_{2} {\bar{X}}_{2}^{obs} (1 - R_{i 2})$ with $α_{0} R_{2}$ and $α_{3} (1 - R_{i 2})$ , respectively. This introduces an extra parameter and provides greater flexibility compared to equation (4).

Bias in the OLS estimator of $α_{1}$ and its standard error (SE) from the M model (5) is anticipated to depend on the missingness mechanisms and the rate of missingness in both X₂ and Y. This bias as well as the efficiency of the M method will be explored through simulation in Section 3. Once again, bias in the treatment effect means that $α_{1}$ (and thus $E ({\hat{α}}_{1})$ ) from the M model (5) is unequal to $β_{1}$ in the model (1).

2.5. Multiple imputation (MI)

Multiple imputation (MI) is an advanced method for handling missing data. It involves the use of imputation models, which are models fitted to the observed data, to replace missing values multiple times (m times). These imputation values are drawn randomly from the posterior predictive distributions of incomplete variables. MI accounts for the between-imputation uncertainty (i.e. uncertainty due to what values to impute) and generates several completed datasets. Each completed dataset is then analyzed separately using standard methods suitable for complete data. Finally, the results from these multiple analyses are combined using Rubin’s [Citation17,Citation18]. It is important to note that MI results in a larger estimated variance of the estimator of interest ( ${\hat{β}}_{1}$ in this case) compared to single imputation methods because it takes into account both the between-imputation and within-imputation variances.

In general, MI provides unbiased estimates of the regression coefficients under MCAR and MAR mechanisms provided that the imputation model is congenial to the analysis model. This loosely means that the imputation model should contain at least all variables in the analysis model (including the outcome) to preserve the relationships between these variables [Citation19].

In the context of randomized trials, MI is likely to yield unbiased estimates with missingness limited to baseline covariates irrespective of any missingness mechanisms [Citation1,Citation6,Citation7]. However, a MAR-based MI might yield biased estimates under MNAR for the outcome missingness. In this article, we will explore whether MI leads to unbiased estimates of $β_{1}$ when both the covariate (s) and the outcome have missing observations.

2.6. Maximum likelihood-based method

In the current paper's setting where $X_{2}$ and $Y$ are the baseline and post measurement of a continuous outcome, another advanced method for handling missing data is to assume these repeated measures follow a bivariate normal distribution, adjusted for the treatment group $X_{1}$ . If there exists another covariate (e.g. $X_{3}$ ) in addition to the treatment group $X_{1}$ adjustment will be made for ( $X_{1}, X_{3}$ ) – this extension is considered in a later section. In the absence of missing data, the bivariate model is given as follows: (6) $(\begin{matrix} X_{i 2} \\ Y_{i} \end{matrix}) | X_{i 1} \sim N {(\begin{matrix} α_{02} + α_{12} X_{i 1} \\ α_{0 Y} + α_{1 Y} X_{i 1} \end{matrix}), (\begin{matrix} σ_{X_{2}}^{2} & σ_{X_{2} Y} \\ σ_{X_{2} Y} & σ_{Y}^{2} \end{matrix})}$ (6)

Let $μ_{1 A} = E [Y | X_{1} = A]$ and $μ_{1 B} = E [Y | X_{1} = B]$ be the means of post measurement in groups $A$ and $B$ , respectively. Also, let $α_{12} = 0$ constrain the baseline mean to be the same for both treatment groups due to randomization (i.e. $α_{02}$ ). Let ${\hat{α}}_{02}$ , ${\hat{μ}}_{1 A}$ , and ${\hat{μ}}_{1 B}$ be the maximum likelihood (ML) estimators of $α_{02}$ , $μ_{1 A}$ , and $μ_{1 B}$ , respectively. The ML estimator of the treatment effect $α_{1 Y}$ is then ${\hat{α}}_{1 Y} = {\hat{μ}}_{1 B} - {\hat{μ}}_{1 A} .$ This estimator and its variance are asymptotically equivalent to their corresponding counterparts in the analysis of covariance (ANCOVA) model (1) when there are no missing data [Citation8, p60; Citation20,Citation21]. The joint model (6) is equivalent to a linear mixed model (LMM) with fixed effects for treatment (set to zero at baseline because of randomization), time, time-by-treatment interaction (which is the effect of interest), and other relevant covariates if present. The LMM also incorporates an unstructured covariance matrix for the repeated measures [Citation22]; Sullivan et al. [Citation1]). Liu et al. [Citation23] and Van Breukelen [Citation20] provided a detailed analytical comparison between LMM and ANCOVA models concerning the estimated treatment effect and its variance.

The ML estimator of the treatment effect ( ${\hat{α}}_{1 Y}$ ) is known to be consistent and asymptotically unbiased under MAR and MCAR mechanisms. It is also unaffected by any missingness in the baseline covariates (as the missingness in X₂ or any other baseline covariates is unrelated to the treatment and post measurement of the outcome – this is a result of obtaining baseline measures before randomization). However, unbiasedness of the treatment effect estimator ( ${\hat{α}}_{1 Y}$ ) is not guaranteed if the missingness of Y is MNAR. These aspects will be evaluated in comparison to the other methods through simulation. Additionally, the variance of ${\hat{α}}_{1 Y}$ is likely to increase due to loss of information when compared to the case of no missing data [Citation8,Citation23].

3. Simulation study

We conducted an extensive simulation study to compare the finite-sample performance of different methods of handling missing data in terms of bias, efficiency, and the 95% confidence interval coverage rate of the treatment effect. A randomized controlled trial was simulated with 200 observations, evenly divided between the control group ( $X_{1} = 0$ ) and the intervention group ( $X_{1} = 1$ ). The study included a single baseline covariate $X_{2}$ (representing the continuous baseline outcome) and with a post measurement outcome Y. Missing data were allowed to occur in both the covariate $X_{2}$ and the outcome Y, but not in the treatment indicator $X_{1}$ . The case of multiple baseline covariates (i.e. X₂ and X₃) is considered in a later section of the present paper. The simulation setup involves the following steps.

3.1. Generation of complete data

The complete data were generated as follows:

$X_{1}$ from a Bernoulli distribution with $P (X_{1} = 0) = P (X_{1} = 1)$ ,
$(\begin{matrix} X_{2} \\ Y \end{matrix}) | X_{1} \sim N ((\begin{matrix} α_{02} + α_{12} X_{1} \\ α_{0 Y} + α_{1 Y} X_{1} \end{matrix}), (\begin{matrix} \begin{matrix} 1 & 0.5 \end{matrix} \\ \begin{matrix} 0.5 & 1 \end{matrix} \end{matrix}))$

With

ρ_{X_{2} Y} = (σ_{X_{2} Y} / σ_{X_{2}} σ_{Y}) = 0.5

and

α_{02} = α_{0 Y} = 0

. Due to randomization

α_{12} = 0

, which implies that α_1Y is the treatment effect (equivalent to

β_{1}

in the ANCOVA model (1)). The coefficient α_1Y is set to −0.5, 0, or 0.50 to assess whether the performance of each method (under various missingness mechanisms) was influenced by the magnitude of the treatment effect. Notably, α_1Y = 0.50 corresponds to a medium effect size d = 0.50, defined as

d = α_{1 Y} / σ_{Y}

, which is analogous to the definition of d for an RCT without baseline [Citation24,Citation25]. Hence, three scenarios were considered for simulating complete data: one with a negative treatment effect, one without any treatment effect, and one with a positive treatment effect. For each of these scenarios, 1500 datasets, each with a sample size of 200 were generated.

3.2. Generation of incomplete data

Missing values in $X_{2}$ and Y were independently created using the following models:

Logit {P(R₂ = 1|X₁,X₂,Y)} = φ₀₂ + φ₂₂X₂,
Logit {P(R_Y = 1|X₁,X₂,Y)} = φ_0Y + φ_1YX₁ + φ_2YX₂ + φ_(1*2)YX₁*X₂ +φ_YYY,

where R₂= 1 (R_Y= 1) if X₂ (Y) is observed, and R₂= 0 (R_Y= 0) if X₂ (Y) is missing. The setup values for the parameters in the above models are given in Table . Two ranges of missingness rate were considered per variable: low (10–17%) and high (30–37%).

Table 2. Parameter values of the missingness models for X₂ (Logit {P(R₂ = 1|X₁,X₂,Y)} = φ₀₂ + φ₂₂X₂) and for Y (Logit {P(R_Y = 1|X₁,X₂,Y)} = φ_0Y + φ_1YX₁ + φ_2YX₂ + φ_(1×2)YX₁×X₂ +φ_YYY). Note: To achieve two ranges of missingness rate within each scenario, the intercept values (φ₀₂, φ_0Y) were varied accordingly.

Download CSV Display Table

The low missingness rate (per variable) resulted in about 1–3% joint missingness in both X₂ and Y. For the high missingness rate, the joint missingness was about 9–14%. To achieve the required missingness percentage per variable, the intercepts (φ₀₂, φ_0Y) were adjusted based on the values of the other φ parameters.

For each generated complete dataset and each missingness scenario presented in Table , two incomplete datasets were created, one with 10–17% missingness and one with 30–37% missingness. Thus, a total of 2 (missingness rates) ×14 (missingness scenarios) = 28 incomplete datasets were created per generated complete dataset.

3.3. Imputation and analysis

The following methods were used to analyze each simulated dataset: unadjusted analysis (UA), complete-case analysis (CCA), mean imputation (ME), missing-indicator method (M), multiple imputation (MI), and linear mixed model (LMM). Also, as a benchmark, each original simulated dataset (before inducing missing data,) was analyzed.

For MI, we used the full-conditional specification (FCS) approach. This approach specifies a conditional imputation model for each partially observed variable given all the other variables [Citation18,Citation26]. In the context of the present paper, the imputation model for X2 was a linear regression of X2 on X1 and Y, and the imputation model for Y was a linear regression of Y on X1 and X2. It is worth noting that the interaction X1×X2 could also be included in the imputation model for Y under the missingness mechanism 5. However, this was not done to maintain consistency with the other missingness mechanisms and the analysis model. Furthermore, Y was included in the imputation model for X2 to ensure that the imputation model contained all the variables of the analysis model, preserving the relationships between these variables [Citation19]. Finally, seventy imputations (m = 70) were created following the rule of thumb that the number of imputations should be at least as large as the percentage of incomplete cases [Citation27]. For LMM, we fitted the LMM procedure to the available data.

3.4. Methods comparison

All methods were compared across 84 scenarios (3 treatment effects × 14 missingness mechanisms × 2 missingness rates) in the simulation study. Let ${\hat{β}}_{1 i}$ and ${\hat{SE}}_{i}$ be the treatment effect estimate and its standard error in the i^th simulated dataset ( $i = 1, \dots, s$ ) and $s = 1500$ . The following five performance criteria were used.

Bias of ${\hat{β}}_{1}$ : The bias was calculated as $Bias ({\hat{β}}_{1}) = {\bar{\hat{β}}}_{1} - β_{1}$ , where ${\bar{\hat{β}}}_{1} = \sum_{i}^{s} {\hat{β}}_{1 i} / s$ ;
Coverage of the 95% confidence interval (CI): The coverage was computed as the proportion of simulated datasets where the 95% CI contained the true treatment effect. It was calculated as $Coverage (95 %CI) = (\sum_{i = 1}^{s} K_{i} / s) \times 100$ , where $K_{i} = 1$ if $β_{1} \in C I_{i}$ and $K_{i} = 0$ otherwise;
Mean squared error (MSE) of ${\hat{β}}_{1}$ : MSE was calculated as $MSE ({\hat{β}}_{1}) = (Bias ({\hat{β}}_{1}))^{2} + Var ({\hat{β}}_{1})$ , where $Var ({\hat{β}}_{1}) = \sum_{i}^{s} ({\hat{β}}_{1 i} - {\bar{\hat{β}}}_{1})^{2} / (s - 1)$ ;
Type I error rate (false positive rate (FPR)): For scenarios where α_1Y (and thus also β₁) was zero, the type I error rate was computed as the proportion of 1500 simulated datasets where the 95% CI did not contain zero;
Power to detect a true treatment effect: For scenarios where α_1Y (and thus also β₁) is not zero, the power was calculated as the proportion of 1500 simulated datasets where the 95%CI did not contain zero.

These criteria were calculated for each method and compared against the results obtained from the complete data, which served as the reference method (REF). The simulations were implemented in R 3.5.1, where the mice package (version 3.3.0) and the nlme package (version 3.1-137) were used for the MI and LMM methods, respectively. Calculations were conducted on a computer with an Intel®Core™i5-6500 [email protected].

4. Results

The simulation results are presented as follows: We begin by comparing the methods based on each performance criterion across various missingness mechanisms and subsequently summarise the findings according to the non-MNAR and MNAR mechanism for Y. It is important to note that the results for missingness mechanisms 1–7 (where X₂ is MCAR) are detailed in this section. Due to their similarity, the results for missingness mechanisms a1-a7 (where X₂ is MNAR) can be found in the supplementary materials.

In terms of the bias of ${\hat{β}}_{1}$ (Figure ), all methods showed unbiased results under the MCAR mechanism and all MAR mechanisms, except for mechanism 4. In mechanism 4, CCA, MI, and LMM showed no or negligible bias, while the other methods (UA, ME, M) exhibited bias, which increased with the missingness rate. Intriguingly, more bias was found in mechanism 4 (where R_Y depended on the additive effects of X₁ and X₂) than in mechanism 5 (where R_Y depended on the multiplicative effect of X₁ and X₂ in addition to their additive effects). This can be explained as follows: In both mechanisms, the observed mean of Y is shifted upward in both groups because the odds of Y being missing are higher for smaller X₂. This corresponds to smaller values of Y because of the positive correlation between X₂ and Y. In mechanism 4, this shift is much bigger in the control group (X₁= 0) than in the treated group (X₁= 1) because the odds of Y being missing are higher in X₁= 0, which leads to a downward bias in ${\hat{β}}_{1}$ (the mean difference of Y between the two groups). In mechanism 5, the mean shift is about the same in both groups because the odds of Y being missing are higher in X₁= 0 if X₂> −1 and in X₁= 1 if X₂< −1. Hence, this leads to no or negligible bias in ${\hat{β}}_{1}$ (see Table S1 and Figure S1 of the Supplement for illustration).

Figure 1. Bias of the treatment effect estimate per missing data method, missingness mechanism (columns), as given in Table 2, true treatment effect (rows), and missingness rate (curves).

Under MNAR mechanisms, all methods exhibited bias except for scenario 6 when there was no treatment effect ( $β_{1} = 0$ ). The extent of bias increased with a higher missingness rate. Interestingly, in mechanism 6, the bias was towards 0 (upwards if β₁ < 0 and downwards if β₁ > 0), while it was consistency negative regardless of the sign of β₁ in mechanism 7. This can be explained as follows. In mechanism 6, the odds of Y being missing are higher for smaller values of Y, causing an upward shift in the observed mean of Y. When $β_{1}$ is positive, Y is smaller in the control group than in the treated group, leading to more missing data and a stronger upward shift in the observed mean of Y in the control group than in the treated group. Consequently, this leads to an underestimation of the treatment effect $β_{1}$ . Conversely, when $β_{1}$ is negative, the opposite effect occurs in mechanism 6. In mechanism 7, on the other hand, missingness additionally depends on the treatment, leading to more missingness and thus more upward bias in the mean of Y in the control group (remembering that missingness increases as Y decreases). Compared to mechanism 6, this leads to a more negative (or less positive) ${\hat{β}}_{1}$ .

In terms of coverage (Figure ), under MCAR and MAR mechanisms, all methods achieved close to the 95% coverage rate except for ME and M, which showed substantial undercoverage. Under MNAR mechanisms, except for ME and M, all other methods maintained close to the 95% coverage rate in mechanism 6 for all values of β₁ (> 0, 0, < 0). However, in mechanism 7, substantial undercoverage occurred for at least one β₁, and in this scenario, MI and LMM did not outperform CCA. ME and M showed substantial undercoverage in both MNAR mechanisms. The degree of undercoverage worsened with an increased missingness rate.

Figure 2. Coverage (%) of the 95% CI for the treatment effect estimate per missing data method, missingness mechanism (columns), as given in Table 2, true treatment effect (rows), and missingness rate (curves). Lower and upper horizontal reference lines (94.4 and 95.6) correspond to $95 \mp \sqrt{95 \times 5 / 1500}$ (Monte Carlo Errors), with 1500 simulated datasets per scenario.

In terms of mean squared error (MSE) (Figure ), all methods showed similar MSEs under MCAR and MAR mechanisms, except for UA when the missingness rate was between 10–17% and UA and CCA when the missingness rate was between 30–37%. The higher MSE for UA and CCA can be attributed to a larger residual outcome variance for UA and a smaller sample size for CCA. Under MNAR mechanism 6, the pattern remained consistent with that under MCAR and MAR. However, under MNAR mechanism 7, all methods (although to a slightly lesser extent for MI and LMM) suffered from strongly elevated MSE in scenarios with β₁= 0 or β₁= 0.5, reflecting their substantial bias in these scenarios, as observed in Figure . A similar but weaker effect was observed under mechanisms 4 and 6.

Figure 3. Mean squared error (MSE) for the treatment effect estimate per missing data method, missingness mechanism (columns), as given in Table 2, true treatment effect (rows), and missingness rate (curves).

In terms of false positive rate (FPR, or type I error rate) (Figure ), under all missingness mechanisms, ME and M produced substantially higher FPRs than the nominal level of 5% (≥ 7.5% and ≥ 15% when the missingness rate was between 10–17% and 30–37%, respectively). In contrast, UA, CCA, MI and LMM displayed FPRs close to the nominal level of 5% except for mechanism 7 (for all four methods) and mechanism 4 (only for UA). This particular result for UA can be explained by considering the bias of UA in scenario 4, as illustrated in Figure .

Figure 4. False positive rate (FPR) for detecting a treatment effect per missing data method, missingness mechanism (columns), as given in Table 2, true treatment effect (rows), and missingness rate (curves). Horizontal reference line is drawn at 5% FPR.

In terms of statistical power (Figure ), the power of all methods was lower compared to that of the complete data (REF) under all missingness mechanisms. However, the reduction in power was most prominent for UA and CCA, while ME and M were least affected. This difference can be attributed to the larger residual outcome variance for UA and the smaller sample size for CCA, as indicated in Figure . Additionally, ME and M showed relatively higher power due to their underestimation of the standard error of β₁ (also note that these two methods showed lower coverage and higher FPR compared to the other methods, as demonstrated in Figures and ). Furthermore, under the MNAR mechanism 7, the power of all methods was dramatically reduced, particularly in scenarios with a 30–37% missingness rate and a positive β₁. This decrease in power can be explained by observing the bias pattern in Figure .

Figure 5. Power (%) to detect a treatment effect per missing data method, missingness mechanism (columns), as given in Table 2, true treatment effect (rows), and missingness rate (curves).

In summary, the methods exhibited varying performances, showing better results under the non-MNAR mechanisms (for Y) compared to the MNAR mechanisms. Accordingly, the comparison of methods is presented briefly, first for the non-MNAR and then for the MNAR cases as outlined below.

Under the non-MNAR mechanisms (scenarios 1–5, Figures ), CCA, MI and LMM demonstrated minimal or no bias in estimating the treatment effect $β_{1}$ , while UA, ME and M showed substantial bias in mechanism 4 (Figure ). Coverage rates (Figure ) were close to the nominal level of 95% for all methods except for ME and M. The latter two methods suffered from substantial undercoverage. MSEs (Figure ) were comparable across all methods except for UA and CCA, particularly when the missingness rate was between 30–37%. The FPRs (Figure ) mirrored the coverage pattern (Figure ), maintaining approximately the nominal level of 5% whenever the coverage rate was close to 95% and substantially exceeding the nominal level of 5% when there was substantial undercoverage (for ME and M). The statistical power (Figure ) of all methods was lower than that of the REF method, with UA and CCA showing the most considerable reduction (especially with 30–37% missingness), and ME and M demonstrating relatively better performance, albeit at the cost of higher FPRs for ME and M when the missingness rate was between 30–37%.

Under MNAR scenarios 6–7 (Figures ), all methods showed bias in estimating $β_{1}$ except in scenario 6 where there was no treatment effect ( $β_{1} = 0$ ). ME and M displayed substantial undercoverage in mechanism 6, while other methods achieved approximately the 95% nominal level. As opposed, all methods suffered from substantial undercoverage in mechanism 7. The MSEs of all methods were comparable to those observed in the non-MNAR scenarios in mechanism 6 but notably higher for mechanism 7. FPRs were close to the nominal level of 5% in mechanism 6 for all methods except for ME and M but exceeded the nominal level in mechanism 7, consistent with the poor coverage observed for that mechanism. The power patterns resembled those in the non-MNAR scenarios, with mechanism 7 leading to more power loss compared to other mechanisms.

In summary, ME and M showed the poorest performance both under MNAR and non-MNAR mechanisms.

5. Missingness in multiple baseline covariates and an outcome

In this extended simulation study, we explored the scenario involving multiple covariates with missing data. Specifically, we considered a situation where outcome Y and two baseline covariates X₂ and X₃ have missing observations. The analysis model of interest was the linear regression model $Y_{i} = β_{0} + β_{1} X_{i 1} + β_{2} X_{i 2} + β_{3} X_{i 3} + ε_{i}, i = 1, \dots, n$ , where X₁ is the treatment group indicator which is fully observed.

The simulation setup followed similar steps as described in Section 3, with a few modifications. A new covariate (X₃) was generated from a Bernoulli distribution with $P (X_{3} = 0) = P (X_{3} = 1)$ . The baseline (X₂) and post measurement (Y) of the outcome were generated from the bivariate model $(\begin{matrix} X_{2} \\ Y \end{matrix}) | X_{1}, X_{3} \sim N ((\begin{matrix} α_{02} + α_{12} X_{1} + α_{32} X_{3} \\ α_{0 Y} + α_{1 Y} X_{1} + α_{3 Y} X_{3} \end{matrix}), (\begin{matrix} \begin{matrix} 1 & 0.5 \end{matrix} \\ \begin{matrix} 0.5 & 1 \end{matrix} \end{matrix})) .$ Here, α₃₂ and α_3Y were both set at 0.5 (indicating a medium effect size), while all other parameters remained consistent with those specified in Section 3.1. The missingness model for X₃ was defined as Logit{P(R₃ = 1|X₁, X₂, X₃, Y)} = φ₀₃ + φ₃₃X₃, with φ₃₃= 0 and 1 for MCAR and MNAR, respectively. Note that the missingness mechanisms for X₂ and Y were consistent with those outlined in Table . Meanwhile, for X₃, the missingness mechanism was either MCAR (for missingness types 8–14) or MNAR (for missingness types a8-a14).

Using the five performance criteria described in section 3.4, we compared the following methods: UA, CCA, CCAME (CCA for Y with ME for X₂ and X₃), MI, and three versions of LMM: LMMUA (LMM with X₃ excluded from the analysis), LMMCCA (LMM with persons with missing X₃ excluded), and LMMME (LMM with mean imputation for X₃). ME and M were omitted due to their poor performance in the case of joint missingness in X₂ and Y. It is important to note that CCAME and LMMME were included based on previous findings [Citation6,Citation7] demonstrating the adequacy of ME for missing covariates in RCTs. For MI, the imputation model for X₂ was a linear regression of X₂ on X₁, X₃ and Y. Likewise, the imputation model for X₃ was a logistic regression of X₃ on X₁, X₂ and Y, and the imputation model for Y was a linear regression of Y on X₁ and X₂ and X₃.

The simulation results for UA, CCA, MI and LMMUA under the missingness mechanisms 8–14 (in Figures A6–A10 in the Appendix) were similar to those under the missingness mechanisms 1–7 (in Figures ). Hence, the focus of this section is on the newly added methods (CCAME, LMMCCA and LMMME) in comparison with the old ones (excluding UA due to its worse performance across all criteria). Specifically, the comparison includes CCAME versus CCA and MI, and the three LMM variants versus each other and MI.

Concerning the bias of the treatment effect estimate ( ${\hat{β}}_{1}$ ) (Figure A6 in the Appendix), the results of missingness mechanisms 8–14 were comparable to those obtained in missingness mechanisms 1–7 (in Figure ). As expected, CCAME, LMMCCA and LMMME showed no or negligible bias under MCAR and MAR mechanisms (scenarios 8–12), while they exhibited bias under MNAR mechanisms except when $β_{1} = 0$ (scenario 13).

Under MCAR and MAR mechanisms, all methods produced coverage rates close to the 95% nominal level across all scenarios (Figure A7 in the Appendix). However, under MNAR mechanisms, the methods showed a coverage rate close to 95% in mechanism 13 but substantial undercoverage in mechanism 14 for β₁ ≥ 0, aligning with the bias pattern observed in Figure A6. This undercoverage worsened with higher missingness rates.

In terms of mean squared error (MSE) shown in Figure A8 in the Appendix, CCAME (and MI) and LMMUA (and LMMME) exhibited similar patterns, resulting in smaller MSE values compared to CCA and LMMCCA in all missingness mechanisms. The differences between these methods became more pronounced with increased missingness rates. Notably, CCA displayed the largest standard error due to a substantial sample size reduction, followed by LMMCCA, which excludes individuals with missing values on X₃ only.

The false positive rate (FPR, or type I error) of all methods was close to 5% under all missingness mechanisms except for missingness mechanism 14 (Figure A9 in the Appendix). In this particular scenario, the FPR exceeded the nominal level, consistent with the coverage pattern shown in Figure A7.

The statistical power of all methods was lower than that of the REF method under the missingness mechanisms 8–14 (Figure A10 in the Appendix). This reduction was especially prominent for CCA followed by LMMCCA, particularly when the missingness rate ranged between 30–37%. The sharp decline in power under missingness mechanism 14 and the positive treatment effect resulted from the patterns in Figures A6–A8 in this specific scenario.

In summary, the methods exhibited varying performance across different scenarios, performing better under the non-MNAR mechanisms of Y compared to the MNAR mechanisms. It is worth noting that UA (not discussed here for simplicity) and CCA, MI and LMMUA demonstrated similar performance when considering joint missingness in Y, X₂ and X₃ and joint missingness in Y and X₂. Based on this, the comparison between methods is briefly summarized, first for non-MNAR scenarios and then for MNAR scenarios.

Under the non-MNAR scenarios 8–12 (Figures A6–A10 in the Appendix), all methods (except UA) showed no or negligible bias in estimating $β_{1}$ , with the coverage rate close to 95%. The MSEs were similar across methods, except for CCA and LMMCCA. For all methods, the FPR remained close to 5%. Regarding statistical power, all methods performed worse than REF, with CCA followed by LMMCCA demonstrating the lowest power, further decreasing with increased missingness rates.

Under the MNAR scenarios 13–14 (Figures A6–A10 in the Appendix), all methods showed a bias for ${\hat{β}}_{1}$ except for scenario 13 when there was no treatment effect ( $β_{1} = 0$ ). Coverage rates remained close to 95% in mechanism 13 but substantially declined in mechanism 14 when the treatment effect was non-negative ( $β_{1} \geq 0$ ). CCA and LMMCCA had higher MSE values compared to the other methods, with substantial differences observed at a 30–37% missingness rate. FPRs of all methods were close to the nominal level in mechanism 13 but increased in mechanism 14. Statistical power was lower than REF for all methods, with CCA and LMMCCA showing the weakest performance in terms of power.

It is important to note that the simulation results for all methods under missingness mechanisms a8-a14 were comparable to those under missingness mechanisms 8–14. These specific results can be found in the supplementary materials (Figures A6–A10).

6. Case study: Chronic Obstructive Pulmonary Disease (COPD) trial

The Chronic Obstructive Pulmonary Disease (COPD) trial was a two-arm randomized controlled trial (RCT) conducted to assess the effectiveness of a 4-month exercise training program on functional exercise capacity in patients with mild to moderate COPD in primary care compared to a control group receiving usual care [Citation28]. Ninety patients were recruited from general practices in the southern part of the Netherlands and were randomly assigned to two treatment arms: the intervention group (exercise training programme (ETP); n = 46) and the Control group receiving usual care with a low-intensity exercise programme (LIEP; n = 44). This study hypothesized that the ETP could be more effective than LIEP in managing, and potentially reducing, symptoms related to mild to moderate COPD, including impairments in exercise capacity, quadriceps strength, health-related quality of life and physical activity levels.

The primary outcome of this trial was improvement in functional exercise capacity, assessed by the 6-min walking distance (6MWD). Secondary outcomes included breathlessness (Medical research council (MRC) dyspnoea score), disease-specific quality of life (Clinical COPD Questionnaire (CCQ) and Chronic Respiratory Questionnaire (CRQ)), muscle strength (handgrip force, isometric knee extension and shoulder abduction strength) and objective daily physical activity. Each outcome was measured before treatment, at 4 months after treatment (post-treatment), and again at 6 months after treatment (follow-up). The results of this trial were previously published [Citation28].

Because the primary outcome (6MWD) had no missing data at baseline, we focused on shoulder abduction strength as the outcome of interest, which had 52% and 61% missing data at baseline and 4 months after treatment (post-treatment), respectively. In addition to the treatment group and baseline measurement of the outcome, several other baseline covariates were available, among which we chose smoking status (which had a 3% missingness rate and was measured before treatment), to maintain consistency with the simulation setup (in Section 4).

A comparison of these baseline measurements revealed no significant differences between the intervention and control groups, which was expected as these measurements were taken before randomization. Since the missingness mechanism of baseline variables has minimal impact on the performance of the methods concerning the treatment effect estimate and its SE in RCTs, there was no need to further investigate the missingness mechanism of baseline shoulder abduction strength (X2) or smoking status (X3). To explore the missingness mechanism of post-treatment shoulder abduction strength (Y), we used a logistic regression of its missingness indicator (denoted by R_Y) on the treatment group (X1), baseline shoulder abduction (X2), smoking status (X3), and the interaction between X1×X2. Using the likelihood ratio tests, it turned out that R_Y is significantly associated with X1 and X3 (omnibus p-value = 0.0360) but not with X1×X2, and subsequently X2 after its removal. While this finding does not definitively rule out the possibility of MNAR mechanism for Y, it suggests that the missingness of Y may not be MCAR as it depended on X1 and X3. The treatment effect and its SE were estimated using UA, CCA, CCAME, ME, M, MI, LMMUA, LMMCCA and LMME. For MI, we used m = 70 imputations, incorporating all covariates as well as the outcome in the imputation models. For UA, baseline shoulder abduction strength (X2) and smoking status (X3) were excluded from the analysis. In contrast, only X3 was excluded for LMMUA. For all other methods, all covariates were included in the analysis.

Table shows the treatment effect estimate with its 95% confidence interval (CI) for all methods. The CI of UA, CCA, CCAME, MI and LMMUA included zero, suggesting the absence of a significant between-group difference in the outcome. In contrast, the CI of ME, M, LMMCCA, and LMMME did not include zero, indicating the presence of a significant between-group difference in shoulder abduction strength. However, it is important to note that the differences between the methods, except UA, ME and M, were generally small in most respects. This is because, given the missingness patterns of the data (refer to Table S2 of the supplementary materials), the proportion of the data used by most methods did not substantially differ from one another (see Table S3 of the supplementary material).

Table 3. Chronic Obstructive Pulmonary Disease (COPD) trial results – Methods performances. Note: SE is the standard error, and CIL and CIU are the lower and upper bound of the 95% confidence interval.

Download CSV Display Table

In terms of efficiency, as indicated by the standard error (SE), UA showed the lowest efficiency by producing the largest SE (as UA omitted two baseline variables shoulder abduction strength (X2) and smoking status (X3), and there was a high percentage of missing data in the outcome (61%)). CCA, on the other hand, resulted in a much smaller SE than UA because although it utilized 33 of the 34 cases of UA (with one person excluded due to missing data in X2), it had a smaller residual variance due to adjustment for covariates, notably X2, which was strongly correlated (about 0.8) with Y. CCAME was slightly more efficient than CCA since it used just one more case than CCA. Among the remaining 56 incomplete cases (10 with missing values in Y only, 43 in both X2 and Y, and 3 in X2, X3 and Y), the three LMM methods only added these 10 additional cases to the analysis (note that LMM, by default, excludes cases with missing data in either X2 or X3. Hence, the effective sample size for LMM was 44, whereas it remained 33 for CCA.

Despite using more cases than CCA, LMM only showed a slight efficiency gain over CCA. This could be attributed to the limited information carried by the 10 cases with missing data in Y only, which were included in LMM but not in CCA. Notably, LMMCCA and LMMME produced identical results in all aspects because they used the same data. LMMUA, despite excluding X3 from the analysis, showed similar results to LMMCCA and LMMME. This similarity might be because X3 did not appear to affect Y (p-value = 0.4940).

The SE for MI was approximately 5% larger than those of the LMM methods because it used the same 44 cases as LMM but accounted for uncertainty due to imputation. Increasing the sample size from 44 to 90 (by including all 46 incomplete cases with both X2 and Y missing) had little impact on the results because these cases carried almost no information for the imputation process. Finally, ME and M showed the lowest SEs because of decreased outcome variance (and so SE). This is the result from mean imputation per treatment for the post-treatment outcome in both ME and M methods. Additionally, these two latter methods did not account for uncertainty in the imputation, as indicated by the between-imputation variance in MI.

7. Discussion and conclusion

In this study, we explored the issue of missing data in randomized controlled trials (RCTs) with a continuous outcome. A literature review highlighted the need to further investigate the performance of simple methods versus advanced methods, especially in scenarios involving joint missingness in covariates and the outcome. Our investigation focused on comparing various methods (as described in section 2) through extensive simulations, covering a wide range of realistic missingness mechanisms. Initially, we examined joint missingness in the outcome (Y) and a single baseline covariate (X2). Subsequently, we expanded our analysis to incorporate joint missingness in the outcome (Y) and multiple baseline covariates (X2, X3). In this extended analysis, we excluded methods that performed poorly in the previous case and introduced additional methods for comparison. Finally, we demonstrated the practical application of these methods in a randomized trial on chronic obstructive pulmonary disease [Citation28].

In scenarios involving joint missingness in both Y and X2, the simulations revealed varying patterns of results across different missingness mechanisms. However, a consistent trend emerged, indicating better performance under non-MNAR mechanisms of Y compared to MNAR mechanisms.

Under non-MNAR mechanisms (except for mechanism 4), our findings showed no or negligible bias in estimating the treatment effect across all methods. However, differences emerged in other aspects based on specific scenarios. UA and CCA produced a higher MSE, leading to substantially reduced power. This was particularly noticeable for CCA when the missingness rate was between 30–37%. This behaviour can be attributed to the larger residual outcome for UA and the smaller sample size for CCA. ME and M suffered from substantial undercoverage when applied to missing data in outcome Y, which is not surprising since both methods yielded biased (underestimated) standard errors (SE) due to two reasons: First, the mean squared error (MSE) as an estimator of the residual variance is reduced by replacing missing outcome values with the group mean (as the sum of squared errors is unchanged while the degrees of freedom are increased). This leads to a decrease in the SE, which is proportional to the square root of MSE. Second, these methods ignored the uncertainty regarding the true outcome values, leading to underestimation of SE, which, in turn, resulted in higher FPR and greater power compared to other methods.

MI and LMM performed similarly and emerged as the top-performing methods (cf. Carpenter & Kenward [Citation8]). These methods utilized more information, enhancing precision and power. Moreover, MI and LMM preserved coverage and FPR better than ME and M as they accounted for the associations between variables more effectively.

Under missingness mechanism 4, UA, ME and M were the only methods showing negative bias in estimating the treatment effect because the missingness in the outcome (Y) depended on X1 and X2, and these methods did not account for any of these variables so that they produced biased estimates.

Under MNAR mechanisms, the methods performed less severely in mechanism 6 than in mechanism 7. This is reasonable because the missingness in mechanism 6 is not directly linked to the treatment group (X1). Exceptionally, when $β_{1} = 0$ (i.e. no treatment effect), all methods produced unbiased estimates of the treatment effect under the missingness mechanism 6 even though the missingness of Y depended on Y itself. This result is expected since, in the absence of a treatment effect, the distribution of Y is identical in both treatment groups.

In general, MI and LMM performed better than other methods under both MNAR mechanisms, although both methods showed biased estimates of $β_{1}$ and a reduced coverage rate. On the other hand, ME and M ranked as the least performing methods, suffering from substantial undercoverage and an inflated FPR due to underestimated SEs as described previously. CCA showed a similar performance to MI and LMM but with reduced power due to the smaller sample size. UA was comparable to ME and M in terms of bias in ${\hat{β}}_{1}$ but offered better coverage and FPR because it neither underestimated the residual variance nor disregarded the uncertainty about the true outcome values.

In the case of joint missingness in Y, X2 and X3, the simulation results also revealed varying patterns across different missingness mechanisms. Generally, the methods performed better under non-MNAR mechanisms (for Y) compared to MNAR, as detailed below.

Under non-MNAR mechanisms, MI, CCAME, LMMUA, and LMMME demonstrated acceptable results in all aspects and generally outperformed UA, CCA, and LMMCCA, especially in terms of smaller MSE and higher power. This is attributed to the smaller residual outcome variance compared to UA and a larger sample size compared to CCA and LMMCCA. Specifically, LMMUA outperformed UA due to the inclusion of X2 in the analysis (which reduces the residual variance) and the incorporation of cases with missing Y but observed X2. Moreover, CCAME surpassed LMMCCA in terms of power because of two differences between these methods: LMMCCA excludes persons with missing values in X3, while CCAME includes them. LMMCCA includes persons with missing values in Y if X2 is observed, whereas CCAME excludes them. Further, LMMUA outperformed LMMCCA when the missingness rate was between 30–37% as LMMCCA (unlike LMMUA) excludes persons with missing values in X3. It seems that the advantage gained by LMMCCA from adjusting for X3 was counterbalanced by the reduction in sample size. The performance of LMMME was similar to LMMUA but better than LMMCCA due to its larger sample size. Interestingly, LMMME did not show any advantage over LMMUA, despite its adjustment for X3. Two potential explanations for this are (a) mean imputation for missing values in X3 weakened the association between X3 and Y, and (b) adding X3 likely did not reduce the residual outcome variance because the baseline outcome (X2) is already included in the model.

Similar to the non-MNAR mechanisms, MI, CCAME, LMMUA, and LMMME continued to outperform UA, CCA, and LMMCCA under MNAR mechanisms due to the same reasons explained previously. Across all methods, the results for mechanism 13 were better than those for mechanism 14.

In summary, our simulation studies revealed that the methods’ performance was impacted under MNAR mechanisms (for Y) compared to the non-MNAR scenarios. This is attributed to the fact that all proposed methods are MAR-based, highlighting the challenges posed by MNAR data. Additionally, the findings presented were based on the MCAR mechanism of X2 (and X3), but similar results were obtained for the MNAR mechanism of both covariates, as demonstrated in the supplementary figures. This consistency aligns with previous research indicating that the missingness mechanism for baseline covariates has minimal impact on methods’ performance, as long as these covariates are independent of the treatment or any post-randomization variable (see, among others, Kayembe et al. [Citation6,Citation7]; Sullivan et al. [Citation1]; [Citation2]).

Implementation of the methods to the COPD trial highlighted several differences observed in the simulations, notably the larger standard error (lower efficiency) of UA, along with the severe underestimation of the standard error and bias in the treatment effect estimate by ME and M.

The choice of methods for handling missing data in RCTs requires careful consideration. Based on our findings, the following recommendations can be provided. For baseline covariates measured before randomization, the missingness mechanism may not substantially impact the results. However, it is essential to differentiate between the missingness mechanisms for the outcome. CCAME, LMMUA and LMMME are preferable in the case of non-MNAR scenarios because these methods provide unbiased treatment effect estimates with good coverage, high power and close-to-nominal FPR. Among these, CCAME is the most straightforward to implement. However, it may lead to reduced power, particularly in scenarios with a very high rate of missingness in the outcome (e.g. exceeding 50%), due to a substantially reduced sample size. MI is a suitable option, although it might have slightly lower power compared to CCAME, especially when the missingness rate is high (30–37%). It should be noted that mean imputation (ME) and missing-indicator method (M) are not recommended for handling missing outcome data due to their tendency to underestimate SEs. They are, nevertheless, more appropriate for managing incomplete covariates in RCTs (see, among others, [Citation6,Citation7]).

Under MNAR mechanisms of the outcome, all methods tend to yield biased treatment effect estimates. Moreover, they consistently performed poorly across various performance criteria, particularly when the missingness in the outcome depended on both the treatment and the outcome itself. Interestingly, the best performing methods under the non-MNAR scenarios were the least severe under the MNAR scenarios. Hence, we recommend the use of CCAME due to its ease of implementation, followed by LMMUA, LMMME and MI. CCA and LMMCCA might also be considered if the missingness rate falls below 10%. It is important to recognize that none of these methods including CCAME can fully address the challenge posed by MNAR mechanisms in the outcome. Modelling the missingness mechanism coupled with sensitivity analysis is the appropriate approach to tackle this issue effectively (see [Citation29], Chap. 5.3).

This study focused exclusively on RCTs with a continuous outcome, addressing the issue of missing data in both covariates (including baseline outcomes) and post-treatment outcomes. An important extension of our research could explore cluster randomized trials with joint missingness in the outcome and multiple covariates. Additionally, investigating the issue of joint missingness within the context of binary or survival data expands the scope of the present analysis.

Data sharing

The data (R-code) that support the findings of this study are available from the corresponding author upon request.

Supplemental material

Supplemental Material

Download Zip (406.9 KB)

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

Sullivan TR, White IR, Salter AB, et al. Should multiple imputation be the method of choice for handling missing data in randomized trials? Stat Methods Med Res. 2018;27(9):2610–2626. doi:10.1177/0962280216683570
PubMed Web of Science ®Google Scholar
White IR, Thompson SG. Adjusting for partially missing baseline measurements in randomized trials. Stat Med. 2005;24(7):993–1007. doi:10.1002/sim.1981
PubMed Web of Science ®Google Scholar
Jackson D, White IR, Leese M. How much can we learn about missing data?: an exploration of a clinical trial in psychiatry. J R. Stat Soc: Ser A (Stat Soc). 2010;173(3):593–612. doi:10.1111/j.1467-985X.2009.00627.x
PubMedGoogle Scholar
Rubin DB. Inference and missing data. Biometrika. 1976;63(3):581–592. doi:10.1093/biomet/63.3.581
Web of Science ®Google Scholar
Little RJ, Rubin DB. The analysis of social science data with missing values. Sociol Methods Res. 1989;18(2–3):292–326. doi:10.1177/0049124189018002004
Web of Science ®Google Scholar
Kayembe MT, Jolani S, Tan FE, et al. Imputation of missing covariate in randomized controlled trials with a continuous outcome: scoping review and new results. Pharm Stat. 2020;19(6):840–860. doi:10.1002/pst.2041
PubMed Web of Science ®Google Scholar
Kayembe MT, Jolani S, Tan FE, et al. Imputation of missing covariates in randomized controlled trials with continuous outcomes: simple, unbiased and efficient methods. J Biopharm Stat. 2022;32(5):717–739. doi:10.1080/10543406.2021.2011898
PubMed Web of Science ®Google Scholar
Carpenter JR, Kenward MG. Missing data in randomised controlled trials: a practical guide. Birmingham: London School of Hygiene and Tropical Medicine; 2007.
Google Scholar
Schemper M, Smith TL. Efficient evaluation of treatment effects in the presence of missing covariate values. Stat Med. 1990;9(7):777–784. doi:10.1002/sim.4780090707
PubMed Web of Science ®Google Scholar
Groenwold RH, White IR, Donders ART, et al. Missing covariate data in clinical research: when and when not to use the missing-indicator method for analysis. Cmaj. 2012;184(11):1265–1269. doi:10.1503/cmaj.110977
PubMed Web of Science ®Google Scholar
Xi W, Pennell ML, Andridge RR, et al. Comparison of intent-to-treat analysis strategies for pre-post studies with loss to follow-up. Contemporary clinical trials communications. 2018;11:20–29. doi:10.1016/j.conctc.2018.05.008
PubMed Web of Science ®Google Scholar
Afifi AA, Elashoff RM. Missing observations in multivariate statistics I. Review of the literature. J Am Stat Assoc. 1966;61(315):595–604.
Web of Science ®Google Scholar
Little RJ. Regression with missing X’s: a review. J Am Stat Assoc. 1992;87(420):1227–1237.
Web of Science ®Google Scholar
Beck N. OLS in matrix form. Lect Notes. 2001;1:2.
Google Scholar
Jones MP. Indicator and stratification methods for missing explanatory variables in multiple linear regression. J Am Stat Assoc. 1996;91(433):222–230. doi:10.1080/01621459.1996.10476680
Web of Science ®Google Scholar
Seber GAF. Linear regression analysis. New York: John Wiley & Sons; 1977.
Google Scholar
Rubin DB. Multiple imputation for non-response in Surveys. New York: John Wiley & Sons; 1987.
Google Scholar
Van Buuren S. Flexible imputation of missing data. Boca Raton: Chapman & Hall/CRC Press; 2012.
Google Scholar
Bartlett JW, Seaman SR, White IR, et al. Multiple imputation of covariates by fully conditional specification: accommodating the substantive model. Stat Methods Med Res. 2015;24(4):462–487. doi:10.1177/0962280214521348
PubMed Web of Science ®Google Scholar
Van Breukelen GJ. ANCOVA versus CHANGE from baseline in nonrandomized studies: the difference. Multivariate Behav Res. 2013;48(6):895–922. doi:10.1080/00273171.2013.831743
PubMed Web of Science ®Google Scholar
Winkens B, van Breukelen GJ, Schouten HJ, et al. Randomized clinical trials with a pre-and a post-treatment measurement: repeated measures versus ANCOVA models. Contemp Clin Trials. 2007;28(6):713–719. doi:10.1016/j.cct.2007.04.002
PubMed Web of Science ®Google Scholar
Lu K, Mehrotra DV. Specification of covariance structure in longitudinal data analysis for randomized clinical trials. Stat Med. 2010;29(4):474–488. doi:10.1002/sim.3820
PubMed Web of Science ®Google Scholar
Liu GF, Lu K, Mogg R, et al. Should baseline be a covariate or dependent variable in analyses of change from baseline in clinical trials? Stat Med. 2009;28(20):2509–2530. doi:10.1002/sim.3639
PubMed Web of Science ®Google Scholar
Cohen J. Statistical power analysis for the behavioral sciences. New York: Academic Press; 1969.
Google Scholar
Rice ME, Harris GT. Comparing effect sizes in follow-up studies: ROC Area, Cohen’s d, and r. Law Hum Behav. 2005;29(5):615–620. doi:10.1007/s10979-005-6832-7
PubMed Web of Science ®Google Scholar
Seaman SR, Hughes RA. Relative efficiency of joint-model and full-conditional-specification multiple imputation when conditional models are compatible: The general location model. Stat Methods Med Res. 2018;27(6):1603–1614. doi:10.1177/0962280216665872
PubMed Web of Science ®Google Scholar
White IR, Royston P, Wood AM. Multiple imputation using chained equations: issues and guidance for practice. Stat Med. 2011;30(4):377–399. doi:10.1002/sim.4067
PubMed Web of Science ®Google Scholar
Fastenau A, Van Schayck OC, Winkens B, et al. Effectiveness of an exercise training programme COPD in primary care: a randomized controlled trial. Respir Med. 2020;165:105943. doi:10.1016/j.rmed.2020.105943
PubMed Web of Science ®Google Scholar
Tan FES, Jolani S. Applied linear regression for longitudinal data with an emphasis on missing observations. Boca Raton: Chapman & Hall/CRC Press; 2022.
Google Scholar

Appendix

Results of simulation scenarios with joint missingness (Y, X₂, X₃) under missingness mechanisms 8–14 for Y and the MCAR mechanism for X₂ and X₃.

Figure A6. Bias of the treatment effect estimate per missing data method, missingness mechanism (columns), true treatment effect (rows), and missingness rate (curves).

Figure A7. Coverage (%) of the 95% CI for the treatment effect estimate per missing data method, missingness mechanism (columns), true treatment effect (rows), and missingness rate (curves). Lower and upper horizontal reference lines (94.4 and 95.6) correspond to

95 \mp \sqrt{95 \times 5 / 1500}

(Monte Carlo Errors), with 1500 simulated datasets per scenario.

Figure A7. Coverage (%) of the 95% CI for the treatment effect estimate per missing data method, missingness mechanism (columns), true treatment effect (rows), and missingness rate (curves). Lower and upper horizontal reference lines (94.4 and 95.6) correspond to 95∓95×5/1500 (Monte Carlo Errors), with 1500 simulated datasets per scenario.

Figure A8. Mean squared error (MSE) for the treatment effect estimate per missing data method, missingness mechanism (columns), true treatment effect (rows), and missingness rate (curves).

Figure A9. False positive rate (FPR) for detecting a treatment effect per missing data method, missingness mechanism (columns), true treatment effect (rows), and missingness rate (curves). Horizontal reference line is drawn at 5% FPR.

Figure A10. Power (%) to detect a treatment effect per missing data method, missingness mechanism (columns), true treatment effect (rows), and missingness rate (curves).

Dealing with missing observations in the outcome and covariates in randomized controlled trials

Abstract

1. Introduction