
Model averaging for generalized linear models in fragmentary data prediction

Pages 344-352 | Received 01 Feb 2022, Accepted 18 Jul 2022, Published online: 30 Jul 2022

ABSTRACT

Fragmentary data are becoming increasingly common in many areas, which brings big challenges to researchers and data analysts. Most existing methods dealing with fragmentary data consider a continuous response, while in many applications the response variable is discrete. In this paper, we propose a model averaging method for generalized linear models in fragmentary data prediction. The candidate models are fitted based on different combinations of covariate availability and sample size. The optimal weight is selected by minimizing the Kullback–Leibler loss in the complete cases, and its asymptotic optimality is established. Empirical evidence from a simulation study and a real data analysis of Alzheimer's disease is presented.

1. Introduction

Our study is motivated by the study of Alzheimer's disease (AD). Its main clinical features are the decline of cognitive function, mental symptoms, behaviour disorders, and the gradual decline of activities of daily living. It is the most common cause of dementia among people over the age of 65, yet no prevention methods or cures have been discovered. The Alzheimer's Disease Neuroimaging Initiative (ADNI, http://adni.loni.usc.edu) is a global research programme that actively supports the investigation and development of treatments that slow or stop the progression of AD. The researchers collect multiple sources of data from voluntary subjects, including cerebrospinal fluid (CSF), positron emission tomography (PET), magnetic resonance imaging (MRI) and genetics data (GENE). In addition, the mini-mental state examination (MMSE) score is collected for each subject, which is an important diagnostic criterion for AD. Our target is to establish a model for AD prediction (the probability of having AD). This task is relatively easy if the data are fully observed. However, in the ADNI data, not all the data sources are available for each subject. As we can see from Table 2 in Section 5, among the total of 1170 subjects, only 409 have all the covariate data available, 368 do not have the GENE data, 40 do not have the MRI data, and so on. Such 'fragmentary data' are nowadays very common in medical studies, risk management, marketing research and the social sciences (Fang et al., Citation2019; Lin et al., Citation2021; Xue & Qu, Citation2021; Y. Zhang et al., Citation2020). The extremely high missing rate and complicated missing patterns, however, bring big challenges to the analysis of fragmentary data.

Table 1. An illustrative example for fragmentary data.

Table 2. Response patterns and sample sizes for ADNI data.

In this paper we discuss model averaging methods for fragmentary data prediction. Model averaging was historically proposed as an alternative to model selection. The most well-known model selection methods include AIC (Akaike, Citation1970), Mallows' Cp (Mallows, Citation1973), BIC (Schwarz, Citation1978), the lasso (Tibshirani, Citation1996), the smoothly clipped absolute deviation (Fan & Li, Citation2001), sure independence screening (Fan & Lv, Citation2008) and so on.

Model averaging, unlike most variable selection methods which focus on identifying a single 'correct model', aims at prediction accuracy given several predictors (Ando & Li, Citation2014). Without 'putting all inferential eggs in one unevenly woven basket' (Longford, Citation2005), model averaging takes all the candidate models into account and makes predictions by a weighted average. Such methods can be classified into Bayesian and frequentist model averaging. In this paper, we focus on frequentist model averaging (Buckland et al., Citation1997; Hansen, Citation2007; Hjort & Claeskens, Citation2003; Leung & Barron, Citation2006; Yang, Citation2001, Citation2003, among many others) and refer readers interested in Bayesian model averaging to Hoeting et al. (Citation1999) and the references therein. Researchers have developed many frequentist model averaging methods over the past two decades. To name just a few, the smoothed AIC and smoothed BIC (Buckland et al., Citation1997), Mallows model averaging (Hansen, Citation2007), jackknife model averaging (Hansen & Racine, Citation2012) and heteroskedasticity-robust Cp (Liu & Okui, Citation2013) mainly focus on low-dimensional linear models. Ando and Li (Citation2014) and X. Zhang et al. (Citation2020) consider least squares model averaging with high-dimensional data. For more complex models, we have model averaging for generalized linear models (Ando & Li, Citation2017; Zhang et al., Citation2016), quantile regression (Lu & Su, Citation2015), semiparametric 'model averaging marginal regression' for time series (Chen et al., Citation2018; D. Li et al., Citation2015), model averaging for covariance matrix estimation (Zheng et al., Citation2017), varying-coefficient models (C. Li et al., Citation2018; Zhu et al., Citation2019), vector autoregressions (Liao et al., Citation2019), semiparametric model averaging for a dichotomous response (Fang et al., Citation2022), and so on.

All the model averaging methods mentioned above assume that the data are fully observed and cannot be applied to fragmentary data directly. Due to the extremely high missing rate and complex response patterns, traditional missing data techniques such as imputation and inverse propensity weighting (Kim & Shao, Citation2013; Little & Rubin, Citation2002) cannot be applied efficiently either. Recently, Y. Zhang et al. (Citation2020), Xue and Qu (Citation2021) and Lin et al. (Citation2021) have developed methods for block-wise missing or individual-specific missing data, but they only consider a continuous response.

On the other hand, Schomaker et al. (Citation2010) and Dardanoni et al. (Citation2011, Citation2015) propose model averaging methods based on imputation, but without asymptotic optimality. Zhang (Citation2013) proposes a model averaging method for linear models that imputes the missing data with zeros. Liu and Zheng (Citation2020) extend it to generalized linear models. In the context of fragmentary data, Fang et al. (Citation2019) propose a model averaging method that selects the weight by cross-validation on the complete cases and show its advantage over previous model averaging methods. Ding et al. (Citation2021) extend it to multiple quantile regression. Asymptotic optimality is established for the last four methods, but all of them except Liu and Zheng (Citation2020) are applicable only to a continuous response.

In this paper, we propose a model averaging method for fragmentary data prediction in generalized linear models. The candidate models are fitted based on different combinations of covariate availability and sample size. The optimal weight is selected by minimizing the Kullback–Leibler loss in the complete cases, and its asymptotic optimality is established. Unlike the method in Fang et al. (Citation2019), our method does not need to refit the candidate models in the complete cases for weight selection. Empirical results from a simulation study and a real data analysis of Alzheimer's disease show the superiority of the proposed method.

The paper is organized as follows. Section 2 discusses the proposed method in detail. Asymptotic optimality is established in Section 3. Empirical results of a simulation study and a real data analysis are presented in Sections 4 and 5, respectively. Section 6 concludes the paper with some remarks. All the proofs are provided in the Appendix.

2. The proposed method

For illustration, we consider the fragmentary data in Fang et al. (Citation2019) as presented in Table 1. Assume we observe $n$ subjects with a response variable $Y$ and a covariate set $V=\{X_j: j=1,\dots,p\}$. Only a covariate subset $V_i\subseteq V$ can be observed for each subject $i$. Note that $V_1=V_2=\{X_1,\dots,X_8\}$, $V_3=\{X_1,X_2,X_3\}$ and so on. All the covariate subsets can be classified into different response patterns denoted by $\{\Delta_k: k=1,\dots,K\}$. In Table 1, $K=7$, $\Delta_1=\{X_1,\dots,X_8\}$, $\Delta_2=\{X_1,X_2,X_3\}$, …, and $\Delta_7=\{X_1,X_2,X_7,X_8\}$. For notational simplicity, throughout the paper we also use $V_i$ or $\Delta_k$ to denote the set of indices of the covariates in $V_i$ or $\Delta_k$, e.g., $V_i=\{X_1,X_2,X_3\}=\{1,2,3\}$ or $\Delta_k=\{X_1,X_4,X_5,X_6\}=\{1,4,5,6\}$. Denote $S_k=\{i: V_i\supseteq\Delta_k\}$ as the set of subjects for which all the covariates in $\Delta_k$ are available. In Table 1, $S_1=\{1,2\}$, $S_2=\{1,2,3,4\}$, …, and $S_7=\{1,2,4,9,10\}$.
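To make this bookkeeping concrete, the following minimal Python sketch (not part of the original paper; the function name and the coding of unobserved entries as NaN are our own assumptions) recovers the response patterns $\Delta_k$ and the subject sets $S_k$ from a covariate matrix:

```python
import numpy as np

def fragmentary_patterns(X):
    """Recover the distinct response patterns Delta_k and the subject sets
    S_k = {i: Delta_k is contained in V_i} from an n x p covariate matrix X
    whose unobserved entries are coded as np.nan."""
    observed = ~np.isnan(X)                                   # n x p availability indicators
    V = [frozenset(np.flatnonzero(row)) for row in observed]  # V_i for each subject i
    patterns = sorted(set(V), key=len, reverse=True)          # distinct Delta_k, largest first
    S = {k: [i for i, Vi in enumerate(V) if delta <= Vi]      # S_k = {i: Delta_k subset of V_i}
         for k, delta in enumerate(patterns)}
    return patterns, S

# toy example with 4 subjects and 3 covariates
X = np.array([[1.0, 0.5, 2.0],
              [0.3, 1.2, 0.7],
              [0.9, 1.1, np.nan],
              [0.4, np.nan, np.nan]])
patterns, S = fragmentary_patterns(X)   # patterns[0] is the full set; S[0] is the CC sample S_1
```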

Our target is to make predictions given the fragmentary data $\{(y_i,x_{ij}): i=1,\dots,n,\ j\in V_i\}$, where the $y_i$'s and $x_{ij}$'s are observations of $Y$ and $X_j$ whenever they are observed. Specifically, consider that $Y$ given $X=(X_1,\dots,X_p)^\top$ has an exponential family distribution
(1) $f(Y|X)=\exp\left\{\frac{Y\theta(X)-b(\theta(X))}{\phi}+c(Y,\phi)\right\}$
for some known functions $b(\cdot)$, $c(\cdot,\cdot)$ and a known dispersion parameter $\phi$. The canonical parameter $\theta(\cdot)$ is unknown. For a new subject with available covariate set $V^*\in\{\Delta_k: k=1,\dots,K\}$, we need to estimate $\theta(V^*)$.

Without loss of generality, we assume that $\Delta_1=V=\{1,2,\dots,p\}$. Then $S_1$ is the CC (complete cases) sample in the missing data terminology. Similar to Fang et al. (Citation2019), we mainly focus on prediction of $\theta(x)$ with $x=(x_1,\dots,x_p)^\top$ from pattern $\Delta_1$, i.e., $V^*=\Delta_1=V$. Any $x$ from another pattern $\Delta_k$ can be handled in the same way by ignoring the covariates not in $\Delta_k$, which will be illustrated in the real data analysis.

As discussed in Fang et al. (Citation2019), there exists a natural trade-off between the covariates included in the prediction model and the available sample size. Taking Table 1 as an example, if we want to include all 8 covariates in the model, only subjects 1 and 2 can be used without imputation. But if we only include the first covariate in the model, all 10 subjects can be used. This trade-off naturally provides a sequence of candidate models for model averaging.

Specifically, we can fit a generalized linear model $M_k$ on the data $\{(y_i,x_{ij}): i\in S_k,\ j\in\Delta_k\}$ and try to combine the prediction results from all the candidate models $\{M_k: k=1,\dots,K\}$. Denote $y=(y_1,\dots,y_n)^\top$ and $x_i=(x_{i1},\dots,x_{ip})^\top$. Besides, the design matrix of $M_k$ is expressed as $X_k=(x_{ij}: i\in S_k,\ j\in\Delta_k)\in\mathbb{R}^{n_k\times p_k}$, where $n_k=|S_k|$ and $p_k=|\Delta_k|$. We assume $n_1\geq p$. Consequently $n_k\geq p_k$ since $n_k\geq n_1$ and $p_k\leq p$. The candidate model $M_k$ is expressed as
(2) $f(y_i|\theta_i^{(k)},\phi)=\exp\left\{\frac{y_i\theta_i^{(k)}-b(\theta_i^{(k)})}{\phi}+c(y_i,\phi)\right\},\quad i\in S_k,$
where $\theta_i^{(k)}$ is the $i$-th element of the parameter $\theta^{(k)}=(\theta_i^{(k)},\ i\in S_k)^\top$. It is modelled by a linear model $\theta^{(k)}=X_k\beta^{(k)}$. Denote the maximum likelihood estimator of $\beta^{(k)}$ by $\hat\beta^{(k)}$. Note that we do not assume that the true model $\theta(X)$ in (1) is indeed a linear function of $X$. Thus, all the candidate models can be misspecified. For a new $x=(x_1,\dots,x_p)^\top$, we predict $\theta(x)$ by $\hat\theta(w)=\sum_{k=1}^K w_k x_k^\top\hat\beta^{(k)}=x^\top\hat\beta(w)$, where $x_k=(x_j: j\in\Delta_k)^\top$, $\hat\beta(w)=\sum_{k=1}^K w_k\Pi_k^\top\hat\beta^{(k)}$, $\Pi_k$ is a projection matrix of size $p_k\times p$ consisting of 0s and 1s such that $x_k=\Pi_k x$, and the weight vector $w=(w_1,\dots,w_K)^\top$ belongs to $H_n=\{w\in[0,1]^K:\sum_{k=1}^K w_k=1\}$. Let $\theta\{\hat\beta(w)\}=(\theta_1\{\hat\beta(w)\},\dots,\theta_{n_1}\{\hat\beta(w)\})^\top=X_1\hat\beta(w)$ be the model averaging estimator of $\theta^{(1)}$. Our weight choice criterion is motivated by the Kullback–Leibler (KL) loss in Zhang et al. (Citation2016) and is defined as follows. Denote the true value of $\theta^{(1)}$ as $\theta_0=(\theta_{01},\dots,\theta_{0n_1})^\top$. Let $y^*=(y_1^*,\dots,y_{n_1}^*)^\top$ be another realization from $f(\cdot|\theta_0,\phi)$ that is independent of $y$. The KL loss of $\theta\{\hat\beta(w)\}$ is
(3) $\mathrm{KL}(w)=2\sum_{i\in S_1}E_{y^*}\big[\log\{f(y_i^*|\theta_{0i},\phi)\}-\log(f[y_i^*|\theta_i\{\hat\beta(w)\},\phi])\big]=2\phi^{-1}B\{\hat\beta(w)\}-2\phi^{-1}\mu_{S_1}^\top\theta\{\hat\beta(w)\}-2\phi^{-1}B_0+2\phi^{-1}\mu_{S_1}^\top\theta_0=2J(w)-2\phi^{-1}B_0+2\phi^{-1}\mu_{S_1}^\top\theta_0,$
where $B_0=\sum_{i\in S_1}b(\theta_{0i})$, $B\{\hat\beta(w)\}=\sum_{i\in S_1}b[\theta_i\{\hat\beta(w)\}]$, $\mu_{S_1}=(\mu_{S_1,1},\dots,\mu_{S_1,n_1})^\top=(E(y_i|i\in S_1),\ i=1,\dots,n_1)^\top$ and $J(w)=\phi^{-1}B\{\hat\beta(w)\}-\phi^{-1}\mu_{S_1}^\top\theta\{\hat\beta(w)\}=\phi^{-1}\{\sum_{i\in S_1}b[\theta_i\{\hat\beta(w)\}]-\sum_{i\in S_1}\mu_{S_1,i}\theta_i\{\hat\beta(w)\}\}$. As Zhang et al. (Citation2016) discussed, we would obtain a weight vector by minimizing $J(w)$ given $\mu_{S_1}$. However, this is infeasible in practice since the parameter $\mu_{S_1}$ is unknown. Instead, we replace $\mu_{S_1}$ by $y_{S_1}=(y_i,\ i\in S_1)^\top$ and add a penalty term to $J(w)$ to avoid overfitting, which gives us the following weight choice criterion
$G(w)=2\phi^{-1}\Big\{\sum_{i\in S_1}b[\theta_i\{\hat\beta(w)\}]-\sum_{i\in S_1}y_i\theta_i\{\hat\beta(w)\}\Big\}+\lambda_n\sum_{k=1}^K w_k p_k,$
where $\lambda_n\sum_{k=1}^K w_k p_k$ is the penalty term, $\lambda_n$ is a tuning parameter that usually takes the value 2 or $\log(n_1)$, and $p_k$ is the number of variables in the $k$-th candidate model. The optimal weight vector is defined as
(4) $\hat w=\arg\min_{w\in H_n}G(w).$
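As an illustration only, here is a minimal sketch of how the candidate fits and the averaged prediction $\hat\theta(w)=\sum_k w_k x_k^\top\hat\beta^{(k)}$ could be computed, assuming logistic candidate models (cf. Remark 2.2) and the statsmodels GLM interface; `fit_candidates` and `theta_hat` are hypothetical helper names, not the authors' code:

```python
import numpy as np
import statsmodels.api as sm

def fit_candidates(y, X, patterns, S):
    """Fit one candidate GLM M_k per pattern, using only the subjects in S_k and
    the covariates in Delta_k (an intercept column is assumed to be part of X)."""
    fits = []
    for k, delta in enumerate(patterns):
        rows, cols = list(S[k]), sorted(delta)
        res = sm.GLM(y[rows], X[np.ix_(rows, cols)],
                     family=sm.families.Binomial()).fit()   # logistic link for concreteness
        fits.append((cols, np.asarray(res.params)))
    return fits

def theta_hat(x, fits, w):
    """Model-averaged canonical parameter: theta_hat(w) = sum_k w_k * x_k' beta_hat^(k)."""
    return sum(wk * x[cols] @ beta for wk, (cols, beta) in zip(w, fits))
```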

Remark 2.1

Basically, our idea is to use all available data to estimate the parameters of each candidate model and to use the CC data to construct the optimal weights. This is similar to Fang et al. (Citation2019), which deals with linear models for fragmentary data. However, unlike Fang et al. (Citation2019), our proposed method does not need to refit the candidate models in the CC data to decide the optimal weight. Similar to Zhang (Citation2013), Liu and Zheng (Citation2020) select weights by applying the KL loss to the entire data set with unavailable covariate values replaced by zeros, which does not perform well in the empirical studies.

Remark 2.2

Under the logistic regression model, $\phi=1$ and $b(\theta)=\log(1+e^\theta)$. Let $\theta_i\{\hat\beta(w)\}=\log\frac{\hat p_i(w)}{1-\hat p_i(w)}$. Then
(5) $J(w)=\sum_{i\in S_1}\log[1+e^{\theta_i\{\hat\beta(w)\}}]-\sum_{i\in S_1}\mu_{S_1,i}\theta_i\{\hat\beta(w)\}=-\sum_{i\in S_1}\log\{1-\hat p_i(w)\}-\sum_{i\in S_1}\mu_{S_1,i}\log\frac{\hat p_i(w)}{1-\hat p_i(w)}=-\sum_{i\in S_1}\big[\mu_{S_1,i}\log\hat p_i(w)+(1-\mu_{S_1,i})\log\{1-\hat p_i(w)\}\big]$
and $G(w)=-2\sum_{i\in S_1}\big[y_i\log\hat p_i(w)+(1-y_i)\log\{1-\hat p_i(w)\}\big]+\lambda_n\sum_{k=1}^K w_k p_k$.
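A minimal sketch of the weight-choice step (4) under this logistic specification, minimizing $G(w)$ over $H_n$ with a generic constrained optimizer (scipy's SLSQP; all names are illustrative, and `lam` would be set to 2 or $\log(n_1)$):

```python
import numpy as np
from scipy.optimize import minimize

def select_weights(y_cc, X_cc, fits, p_ks, lam):
    """Minimize G(w) over the simplex H_n for logistic candidate models.
    y_cc, X_cc: response and covariates of the complete cases S_1;
    fits: list of (columns, beta_hat) pairs from the candidate fits;
    p_ks: number of covariates in each candidate model; lam: lambda_n."""
    K = len(fits)
    # linear predictors theta_i of each candidate model on the CC sample (n_1 x K)
    thetas = np.column_stack([X_cc[:, cols] @ beta for cols, beta in fits])

    def G(w):
        theta_w = thetas @ w                               # theta_i{beta_hat(w)}, i in S_1
        p_w = np.clip(1.0 / (1.0 + np.exp(-theta_w)), 1e-10, 1 - 1e-10)
        loglik = np.sum(y_cc * np.log(p_w) + (1 - y_cc) * np.log(1 - p_w))
        return -2.0 * loglik + lam * np.dot(w, p_ks)       # G(w) as in Remark 2.2

    w0 = np.full(K, 1.0 / K)                               # start from equal weights
    res = minimize(G, w0, method="SLSQP",
                   bounds=[(0.0, 1.0)] * K,
                   constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}])
    return res.x
```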

3. Asymptotic optimality

Let $\beta^{*(k)}$ be the parameter vector that minimizes the KL divergence between the true model and the $k$-th candidate model (2). From Theorem 3.2 of White (Citation1982), we know that, under certain regularity conditions,
(6) $\hat\beta^{(k)}-\beta^{*(k)}=O_p(n_k^{-1/2})=O_p(n_1^{-1/2}).$
Let $\epsilon_{S_1}=(\epsilon_{S_1,1},\dots,\epsilon_{S_1,n_1})^\top=(y_1,\dots,y_{n_1})^\top-\mu_{S_1}$, $\bar\sigma^2=\max_{i\in S_1}\mathrm{Var}(\epsilon_{S_1,i})$, $\beta^*(w)=\sum_{k=1}^K w_k\Pi_k^\top\beta^{*(k)}$, $\mathrm{KL}^*(w)=2\phi^{-1}B\{\beta^*(w)\}-2\phi^{-1}B_0-2\phi^{-1}\mu_{S_1}^\top[\theta\{\beta^*(w)\}-\theta_0]$, and $\xi_n=\inf_{w\in H_n}\mathrm{KL}^*(w)$. We assume the following conditions.

(C1)

$\|X_1^\top\mu_{S_1}\|=O(n_1)$, $\|X_1^\top\epsilon_{S_1}\|=O_p(n_1^{1/2})$, and, uniformly for $w\in H_n$, $\|\partial B(\beta)/\partial\beta|_{\beta=\tilde\beta(w)}\|=O_p(n_1)$ for every $\tilde\beta(w)$ between $\hat\beta(w)$ and $\beta^*(w)$.

(C2)

Uniformly for $k\in\{1,\dots,K\}$, $n_1^{-1}\bar\sigma^2\|\theta(\Pi_k^\top\beta^{*(k)})\|^2=O(1)$.

(C3)

$n_1\xi_n^{-2}=o(1)$.

The following theorem establishes the asymptotic optimality of the model averaging estimator θ{βˆ(wˆ)}.

Theorem 3.1

Under (6), Conditions (C1)–(C3), and $n_1^{-1/2}\lambda_n=O(1)$, we have
$\frac{\mathrm{KL}(\hat w)}{\inf_{w\in H_n}\mathrm{KL}(w)}\stackrel{p}{\to}1,$
where $\mathrm{KL}(w)$ is defined in (3) and $\hat w$ is defined in (4).

Conditions (C1)–(C3) are similar to Conditions (C.1)–(C.3) in Zhang et al. (Citation2016). What is slightly different is the order $O(n_1)$ rather than $O(n)$. This is reasonable because our weight selection is based on the CC data ($i\in S_1$) with sample size $n_1$. Condition (C3) requires that $\xi_n$ grows at a rate no slower than $n_1^{1/2}$, which is the same as the third part of Condition (A7) of Zhang et al. (Citation2014), and is also implied by Conditions (7) and (8) of Ando and Li (Citation2014). Condition (C3) is imposed in order to obtain the asymptotic optimality and is slightly stronger than $\xi_n\to\infty$. Note that Theorem 3.1 holds for both $\lambda_n=2$ and $\lambda_n=\log(n_1)$. These two versions of the model averaging method are both applied in Sections 4 and 5.

4. Simulation

In this section, we conduct a simulation study to compare the finite sample performances of the following methods.

  1. CC: a generalized linear regression using only the subjects for which all the covariates are available.

  2. SAIC & SBIC: use the smoothed AIC and smoothed BIC in Buckland et al. (Citation1997) to decide the model weights.

  3. IMP: the zero imputation method in Liu and Zheng (Citation2020). We use IMP1 and IMP2 to denote the IMP method with $\lambda_n=2$ and $\lambda_n=\log(n_1)$, respectively.

  4. GLASSO: the method that uses the CC data and the group lasso of Meier et al. (Citation2008) to select covariates and then fits a model with the subjects for which all the selected covariates are available.

  5. OPT: the proposed method. We use OPT1 and OPT2 to denote the OPT method with $\lambda_n=2$ and $\lambda_n=\log(n_1)$, respectively.

The data are generated as follows. A binary $y_i$ is generated from $\mathrm{Binomial}(1,p_i)$ with $p_i=\exp(\sum_{j=1}^p\beta_j x_{ij})/\{1+\exp(\sum_{j=1}^p\beta_j x_{ij})\}$, $i=1,\dots,n$, where $p=14$, $\beta=0.4\times(1,1/2,\dots,1/p)^\top$, $0.1\times(1,1,\dots,1)^\top$ or $0.2\times(1/p,\dots,1/2,1)^\top$, $x_{i1}=1$, $(x_{i2},\dots,x_{ip})^\top$ is generated from a multivariate normal distribution with $E(x_{ij})=1$, $\mathrm{Var}(x_{ij})=1$ and $\mathrm{Cov}(x_{ij_1},x_{ij_2})=\rho$ for $j_1\neq j_2$, $\rho=0.3$, 0.6 or 0.9, and the sample size is $n=400$ or 800.
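The following is a minimal sketch of this data-generating step (shown for the first $\beta$ configuration and $\rho=0.3$; the seed and variable names are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, rho = 400, 14, 0.3
beta = 0.4 * np.array([1.0 / j for j in range(1, p + 1)])   # 0.4 * (1, 1/2, ..., 1/p)

# equicorrelated covariates with mean 1 and variance 1; the first column is the intercept
Sigma = np.full((p - 1, p - 1), rho)
np.fill_diagonal(Sigma, 1.0)
X = np.ones((n, p))
X[:, 1:] = rng.multivariate_normal(np.ones(p - 1), Sigma, size=n)

prob = 1.0 / (1.0 + np.exp(-(X @ beta)))                    # p_i in the logistic model
y = rng.binomial(1, prob)                                   # binary response
```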

To mimic the situation where all candidate models are misspecified, we pretend that the last covariate is not available for any candidate model. The remaining 12 covariates other than the intercept are divided into 3 groups, where the $s$-th group consists of $X_{4(s-1)+2}$ to $X_{4s+1}$, $s=1,2,3$. The covariates in the $s$-th group are available only if the first covariate of the group satisfies $X_{4(s-1)+2}<1$, which results in $K=8$ (a code sketch of this mechanism is given after the list of conclusions below). The percentages of CC ($S_1$) data are 19%, 25.5% and 38.8% for $\rho=0.3$, 0.6 and 0.9, respectively. We consider the prediction when $V^*=V$ and use the KL loss (divided by $n_1$) defined in (5) for assessment. The number of simulation runs is 200. Figures 1–3 present the KL loss boxplots for each method under the different simulation settings. The main conclusions are as follows.

  1. The SAIC, SBIC and CC methods perform much worse than OPT1 and OPT2. In many situations, these three methods perform quite similarly, indicating that SAIC and SBIC tend to select the model with more covariates and a smaller sample size ($M_1$ with the CC data).

  2. The zero imputation methods IMP1 and IMP2 generally do not perform as well as the proposed methods OPT1 and OPT2. Some exceptions occur when n and ρ are small (for example, the first panel in Figure ), in which case using zeros to replace unavailable covariates has a relatively small effect on the prediction.

  3. The performance of GLASSO is also worse than that of the proposed methods, which shows that the model selection method does not work well when the candidate models are misspecified.

  4. The proposed method OPT1 produces the lowest KL loss in most situations.
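For completeness, the group-wise missingness mechanism described before the list above can be imposed on the simulated covariates as follows (a sketch continuing the previous snippet; the printed complete-case proportion is only the approximate value reported in the text):

```python
# group s (4 covariates) is treated as unobserved whenever its first covariate is >= 1,
# and the last covariate X_p is withheld from every candidate model so that all
# candidate models are misspecified
X_obs = X[:, :p - 1].copy()                     # drop X_p for modelling
for s in range(3):
    group = slice(1 + 4 * s, 1 + 4 * (s + 1))   # columns of group s (after the intercept)
    unobserved = X_obs[:, 1 + 4 * s] >= 1.0     # first covariate of the group
    X_obs[unobserved, group] = np.nan

cc = ~np.isnan(X_obs).any(axis=1)               # complete cases S_1
print("CC proportion:", cc.mean())              # roughly 0.19 when rho = 0.3
```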

Figure 1. The KL losses of all the methods in 200 replications when $\beta=0.4\times(1,1/2,\dots,1/p)$.

Figure 2. The KL losses of all the methods in 200 replications when $\beta=0.1\times(1,1,\dots,1)$.

Figure 3. The KL losses of all the methods in 200 replications when $\beta=0.2\times(1/p,\dots,1/2,1)$.

5. A real data example

To illustrate the application of our proposed method, we consider the ADNI data, which are available at http://adni.loni.usc.edu. The ADNI study contains three different phases: ADNI1, ADNIGO, and ADNI2. In this paper, we use ADNI2, in which some new data are added. For every subject, visits at different longitudinal time points are recorded, and here we focus on the baseline data. As mentioned in Section 1, the ADNI data mainly include four different sources: CSF, PET, MRI and GENE. The CSF data include 3 variables: ABETA, TAU and PTAU. Quantitative variables from the PET images, 241 in total, are computed by the Helen Wills Neuroscience Institute, UC Berkeley, and the Lawrence Berkeley National Laboratory. The MRI data are segmented and analysed in FreeSurfer by the Center for Imaging of Neurodegenerative Diseases at the University of California, San Francisco, which produces 341 variables on volume, surface area, and thickness of regions of interest. GENE, which plays an important role in AD, contains 49,386 variables.

The overall sample size is 1170. The K = 8 response patterns and the sample size of each pattern are presented in Table 2. The total missing rate is about 65%. The MMSE provides a picture of an individual's present cognitive performance based on direct observation of the completion of test items. A score below 28 is the general cutoff indicating the presence of cognitive impairment. We therefore classify the MMSE score into two levels and consider the binary response Y = 1 if the MMSE score is no less than 28 and Y = 0 otherwise.

It can be seen that the data are high dimensional and may contain variables with redundant information. Thus, we first use correlation screening to select features that are most likely to be related to the response variable. All 3 variables in CSF are kept, and 10 variables each are screened from PET, MRI and GENE. We also tried other numbers of variables but found that this screening procedure gave the smallest KL loss.
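A sketch of this screening step (assuming each data source is stored as a numeric matrix with NaN for unobserved entries; the helper name and the use of the absolute Pearson correlation are our own illustrative choices):

```python
import numpy as np

def screen_top_k(block, y, k):
    """Keep the k features of a data block with the largest absolute Pearson
    correlation with the binary response y, computed on the subjects for which
    the block is observed (NaN rows are skipped)."""
    cors = np.zeros(block.shape[1])
    for j in range(block.shape[1]):
        col = block[:, j]
        obs = ~np.isnan(col)
        cors[j] = abs(np.corrcoef(col[obs], y[obs])[0, 1])
    return np.sort(np.argsort(cors)[::-1][:k])

# e.g. keep all 3 CSF variables and screen 10 variables each from PET, MRI and GENE:
# kept = {name: screen_top_k(block, y, 10)
#         for name, block in {"PET": pet, "MRI": mri, "GENE": gene}.items()}
```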

To compare the prediction performance of the methods considered in the simulation, we randomly select 75% of the subjects from each response pattern, combine them into a training set for model fitting, and use the remaining subjects as the test set for performance evaluation. For each of the considered methods, we use the training data to fit the model, apply it to the test data, and compute the KL loss of the predictions on the test data. The KL loss instead of the misclassification rate is considered because the probability of AD is what we really care about. We repeat this procedure independently for 100 replications.
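A sketch of this evaluation loop (an illustrative implementation of the 75%/25% split stratified by response pattern and of the binary KL loss on the test set; all names are hypothetical):

```python
import numpy as np

def split_by_pattern(pattern_groups, frac=0.75, rng=None):
    """pattern_groups maps each response pattern to the subjects having exactly that
    pattern; take `frac` of each group for training and pool them."""
    rng = rng or np.random.default_rng()
    train = []
    for subjects in pattern_groups.values():
        subjects = np.asarray(subjects)
        m = int(round(frac * len(subjects)))
        train.extend(rng.choice(subjects, size=m, replace=False))
    return np.array(sorted(train))            # the remaining subjects form the test set

def test_kl_loss(y_test, p_hat, eps=1e-10):
    """Binary KL loss of the predicted probabilities on the test set (up to constants)."""
    p_hat = np.clip(p_hat, eps, 1 - eps)
    return -np.mean(y_test * np.log(p_hat) + (1 - y_test) * np.log(1 - p_hat))
```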

Note that in this real data analysis, we do not only consider the prediction for $V^*=V$. For $V^*\subsetneq\{\text{CSF},\text{PET},\text{MRI},\text{GENE}\}$, the proposed method ignores the covariates not in $V^*$ for modelling and prediction. For example, when $V^*=\{\text{PET},\text{MRI},\text{GENE}\}$, the covariates from CSF are ignored and only 5 candidate models are considered. More details on this kind of procedure can be found in Fang et al. (Citation2019).

Figure 4 displays boxplots of the KL losses over 100 replications for the different methods. The boxplots for IMP1 and IMP2 are not shown in the figure because their KL losses are too large. The proposed methods OPT1 and OPT2 outperform the other methods.

Figure 4. The KL losses of all the methods in 100 replications for the ADNI data.

6. Concluding remarks

Fragmentary data are becoming increasingly common in many areas and are not easy to handle. Most existing methods dealing with fragmentary data consider a continuous response, while in many applications the response variable is discrete. We propose a model averaging method to deal with fragmentary data under generalized linear models. Its asymptotic optimality is established, and empirical results from a simulation study and a real data analysis of Alzheimer's disease show the superiority of the proposed method.

There are several topics for our future study. First, the covariate dimension p and the number of candidate models K are assumed to be fixed. The asymptotic optimality with diverging p and K needs further investigation. Second, we do not focus on the comparison of $\lambda_n=2$ and $\lambda_n=\log(n_1)$. Which tuning parameter should we use in practice? In fact, how to choose the best tuning parameter for model averaging is still a challenging problem even under linear models. Third, we assume the overall model belongs to an exponential family, which is still restrictive. The extension to more general models deserves further study.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

The research of Fang was supported by National Key R&D Program of China [grant numbers 2021YFA1000100, 2021YFA1000101] and National Natural Science Foundation of China [grant numbers 11831008, 12071143].

References

  • Akaike, H. (1970). Statistical predictor identification. Annals of the Institute of Statistical Mathematics, 22(1), 203–217. https://doi.org/10.1007/BF02506337
  • Ando, T., & Li, K.-C. (2014). A model averaging approach for high dimensional regression. Journal of American Statistical Association, 109(505), 254–265. https://doi.org/10.1080/01621459.2013.838168
  • Ando, T., & Li, K.-C. (2017). A weight-relaxed model averaging approach for high-dimensional generalized linear models. The Annals of Statistics, 45(6), 2654–2679. https://doi.org/10.1214/17-AOS1538
  • Buckland, S. T., Burnham, K. P., & Augustin, N. H. (1997). Model selection: An integral part of inference. Biometrics, 53(2), 603–618. https://doi.org/10.2307/2533961
  • Chen, J., Li, D., Linton, O., & Lu, Z. (2018). Semiparametric ultra-high dimensional model averaging of nonlinear dynamic time series. Journal of the American Statistical Association, 113(522), 919–932. https://doi.org/10.1080/01621459.2017.1302339
  • Dardanoni, V., Luca, G. D., Modica, S., & Peracchi, F. (2015). Model averaging estimation of generalized linear models with imputed covariates. Journal of Econometrics, 184(2), 452–463. https://doi.org/10.1016/j.jeconom.2014.06.002
  • Dardanoni, V., Modica, S., & Peracchi, F. (2011). Regression with imputed covariates: A generalized missing indicator approach. Journal of Econometrics, 162(2), 362–368. https://doi.org/10.1016/j.jeconom.2011.02.005
  • Ding, X., Xie, J., & Yan, X. (2021). Model averaging for multiple quantile regression with covariates missing at random. Journal of Statistical Computation and Simulation, 91(11), 2249–2275. https://doi.org/10.1080/00949655.2021.1890733
  • Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of American Statistical Association, 96(456), 1348–1360. https://doi.org/10.1198/016214501753382273
  • Fan, J., & Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space (with discussions). Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(5), 849–911. https://doi.org/10.1111/rssb.2008.70.issue-5
  • Fang, F., Li, J., & Xia, X. (2022). Semiparametric model averaging prediction for dichotomous response. Journal of Econometrics, 229(2), 219–245. https://doi.org/10.1016/j.jeconom.2020.09.008
  • Fang, F., Wei, L., Tong, J., & Shao, J. (2019). Model averaging for prediction with fragmentary data. Journal of Business & Economic Statistics, 37(3), 517–527. https://doi.org/10.1080/07350015.2017.1383263
  • Hansen, B. E. (2007). Least squares model averaging. Econometrica, 75(4), 1175–1189. https://doi.org/10.1111/ecta.2007.75.issue-4
  • Hansen, B. E., & Racine, J. S. (2012). Jackknife model averaging. Journal of Econometrics, 167(1), 38–46. https://doi.org/10.1016/j.jeconom.2011.06.019
  • Hjort, N. L., & Claeskens, G. (2003). Frequentist model average estimators. Journal of American Statistical Association, 98(464), 879–899. https://doi.org/10.1198/016214503000000828
  • Hoeting, J., Madigan, D., Raftery, A., & Volinsky, C. (1999). Bayesian model averaging: A tutorial. Statistical Science, 14(4), 382–401. https://doi.org/10.1214/ss/1009212519
  • Kim, J. K., & Shao, J. (2013). Statistical methods for handling incomplete data. Chapman & Hall/CRC.
  • Leung, G., & Barron, A. R. (2006). Information theory and mixing least-squares regressions. IEEE Transactions on Information Theory, 52(8), 3396–3410. https://doi.org/10.1109/TIT.2006.878172
  • Li, C., Li, Q., Racine, J. S., & Zhang, D. (2018). Optimal model averaging of varying coefficient models. Statistica Sinica, 28(2), 2795–2809. https://doi.org/10.5705/ss.202017.0034
  • Li, D., Linton, O., & Lu, Z. (2015). A flexible semiparametric forecasting model for time series. Journal of Econometrics, 187(1), 345–357. https://doi.org/10.1016/j.jeconom.2015.02.025
  • Liao, J., Zong, X., Zhang, X., & Zou, G. (2019). Model averaging based on leave-subject-out cross-validation for vector autoregressions. Journal of Econometrics, 209(1), 35–60. https://doi.org/10.1016/j.jeconom.2018.10.007
  • Lin, H., Liu, W., & Lan, W. (2021). Regression analysis with individual-specific patterns of missing covariates. Journal of Business & Economic Statistics, 39(1), 179–188. https://doi.org/10.1080/07350015.2019.1635486
  • Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). Wiley.
  • Liu, Q., & Okui, R. (2013). Heteroskedasticity-robust Cp model averaging. The Econometrics Journal, 16(3), 463–472. https://doi.org/10.1111/ectj.12009
  • Liu, Q., & Zheng, M. (2020). Model averaging for generalized linear model with covariates that are missing completely at random. The Journal of Quantitative Economics, 11(4), 25–40. https://doi.org/10.16699/b.cnki.jqe.2020.04.003
  • Longford, N. T. (2005). Editorial: Model selection and efficiency: Is 'Which model…?' the right question? Journal of the Royal Statistical Society: Series A (Statistics in Society), 168(3), 469–472. https://doi.org/10.1111/rssa.2005.168.issue-3
  • Lu, X., & Su, L. (2015). Jackknife model averaging for quantile regressions. Journal of Econometrics, 188(1), 40–58. https://doi.org/10.1016/j.jeconom.2014.11.005
  • Mallows, C. (1973). Some comments on Cp. Technometrics, 15(4), 661–675. https://doi.org/10.2307/1267380
  • Meier, L., van de Geer, S., & Bühlmann, P. (2008). The group lasso for logistic regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(1), 53–71. https://doi.org/10.1111/j.1467-9868.2007.00627.x
  • Schomaker, M., Wan, A. T. K., & Heumann, C. (2010). Frequentist model averaging with missing observations. Computational Statistics and Data Analysis, 54(12), 3336–3347. https://doi.org/10.1016/j.csda.2009.07.023
  • Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464. https://doi.org/10.1214/aos/1176344136
  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267–288. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x
  • Wan, A. T. K., Zhang, X., & Zou, G. (2010). Least squares model averaging by Mallows criterion. Journal of Econometrics, 156(2), 277–283. https://doi.org/10.1016/j.jeconom.2009.10.030
  • White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica, 50(1), 1–25. https://doi.org/10.2307/1912526
  • Xue, F., & Qu, A. (2021). Integrating multi-source block-wise missing data in model selection. Journal of American Statistical Association, 116(536), 1914–1927. https://doi.org/10.1080/01621459.2020.1751176
  • Yang, Y. (2001). Adaptive regression by mixing. Journal of American Statistical Association, 96(454), 574–588. https://doi.org/10.1198/016214501753168262
  • Yang, Y. (2003). Regression with multiple candidate models: Selecting or mixing? Statistica Sinica, 13, 783–809.
  • Zhang, X. (2013). Model averaging with covariates that are missing completely at random. Economics Letters, 121(3), 360–363. https://doi.org/10.1016/j.econlet.2013.09.008
  • Zhang, X., Yu, D., Zou, G., & Liang, H. (2016). Optimal model averaging estimation for generalized linear models and generalized linear mixed-effects models. Journal of the American Statistical Association, 111(516), 1775–1790. https://doi.org/10.1080/01621459.2015.1115762
  • Zhang, X., Zou, G., & Liang, H. (2014). Model averaging and weight choice in linear mixed effects models. Biometrika, 101(1), 205–218. https://doi.org/10.1093/biomet/ast052
  • Zhang, X., Zou, G., Liang, H., & Carroll, R. J. (2020). Parsimonious model averaging with a diverging number of parameters. Journal of the American Statistical Association, 115(530), 972–984. https://doi.org/10.1080/01621459.2019.1604363
  • Zhang, Y., Tang, N., & Qu, A. (2020). Imputed factor regression for high-dimensional block-wise missing data. Statistica Sinica, 30(2), 631–651. https://doi.org/10.5705/ss.202018.0008
  • Zheng, H., Tsui, K-W, Kang, X., & Deng, X. (2017). Cholesky-based model averaging for covariance matrix estimation. Statistical Theory and Related Fields, 1(1), 48–58. https://doi.org/10.1080/24754269.2017.1336831
  • Zhu, R., Wan, A. T. K., Zhang, X., & Zou, G. (2019). A Mallow-type model averaging estimator for the varying-coefficient partially linear model. Journal of the American Statistical Association, 114(526), 882–892. https://doi.org/10.1080/01621459.2018.1456936

Appendix. Proof of Theorem 3.1

Proof.

Let $\tilde G(w)=G(w)-2\phi^{-1}B_0+2\phi^{-1}\mu_{S_1}^\top\theta_0$. It is obvious that $\hat w=\arg\min_{w\in H_n}\tilde G(w)$. From the proof of Theorem 1 in Wan et al. (Citation2010), Theorem 3.1 is valid if the following two conclusions hold:
(A1) $\sup_{w\in H_n}\frac{|\mathrm{KL}(w)-\mathrm{KL}^*(w)|}{\mathrm{KL}^*(w)}\stackrel{p}{\to}0$,
and
(A2) $\sup_{w\in H_n}\frac{|\tilde G(w)-\mathrm{KL}(w)|}{\mathrm{KL}(w)}\stackrel{p}{\to}0$.
By (6), we know that uniformly for $w\in H_n$,
(A3) $\hat\beta(w)-\beta^*(w)=\sum_{k=1}^K w_k\Pi_k^\top(\hat\beta^{(k)}-\beta^{*(k)})=O_p(n_1^{-1/2})$.
It follows from (A3), Condition (C1), and a Taylor expansion that, uniformly for $w\in H_n$,
$|B\{\hat\beta(w)\}-B\{\beta^*(w)\}|\leq\left\|\frac{\partial B(\beta)}{\partial\beta}\Big|_{\beta=\tilde\beta(w)}\right\|\,\|\hat\beta(w)-\beta^*(w)\|=O_p(n_1^{1/2}),$
$|\mu_{S_1}^\top[\theta\{\hat\beta(w)\}-\theta\{\beta^*(w)\}]|\leq\|\mu_{S_1}^\top X_1\|\,\|\hat\beta(w)-\beta^*(w)\|=O_p(n_1^{1/2}),$
and
$|\epsilon_{S_1}^\top[\theta\{\hat\beta(w)\}-\theta\{\beta^*(w)\}]|\leq\|\epsilon_{S_1}^\top X_1\|\,\|\hat\beta(w)-\beta^*(w)\|=O_p(1),$
where $\tilde\beta(w)$ is a vector between $\hat\beta(w)$ and $\beta^*(w)$. In addition, using the central limit theorem and Condition (C2), we know that uniformly for $w\in H_n$, $\epsilon_{S_1}^\top\theta\{\beta^*(w)\}=\sum_{k=1}^K w_k\,\epsilon_{S_1}^\top\theta(\Pi_k^\top\beta^{*(k)})=O_p(n_1^{1/2})$.

Then we have
(A4) $\sup_{w\in H_n}|\mathrm{KL}(w)-\mathrm{KL}^*(w)|\leq 2\phi^{-1}\sup_{w\in H_n}|B\{\hat\beta(w)\}-B\{\beta^*(w)\}|+2\phi^{-1}\sup_{w\in H_n}|\mu_{S_1}^\top[\theta\{\hat\beta(w)\}-\theta\{\beta^*(w)\}]|=O_p(n_1^{1/2})$
and
(A5) $\sup_{w\in H_n}|\tilde G(w)-\mathrm{KL}(w)|\leq 2\phi^{-1}\sup_{w\in H_n}|B\{\hat\beta(w)\}-B\{\beta^*(w)\}|+2\phi^{-1}\sup_{w\in H_n}|y_{S_1}^\top\theta\{\hat\beta(w)\}-\mu_{S_1}^\top\theta\{\beta^*(w)\}|+\lambda_n\sum_{k=1}^K w_k p_k\leq 2\phi^{-1}\sup_{w\in H_n}|B\{\hat\beta(w)\}-B\{\beta^*(w)\}|+2\phi^{-1}\sup_{w\in H_n}|\mu_{S_1}^\top[\theta\{\hat\beta(w)\}-\theta\{\beta^*(w)\}]|+2\phi^{-1}\sup_{w\in H_n}|\epsilon_{S_1}^\top\theta\{\beta^*(w)\}|+2\phi^{-1}\sup_{w\in H_n}|\epsilon_{S_1}^\top[\theta\{\hat\beta(w)\}-\theta\{\beta^*(w)\}]|+\lambda_n\sum_{k=1}^K w_k p_k=O_p(n_1^{1/2})+\lambda_n\sum_{k=1}^K w_k p_k.$
Now, from (A4), (A5), $n_1\xi_n^{-2}=o(1)$, and $n_1^{-1/2}\lambda_n=O(1)$, we can obtain (A1) and (A2). This completes the proof.