Abstract
The increased availability of healthcare data has made predictive modeling popular in clinical settings. If an expected patient-specific outcome can be estimated prior to a medical intervention, healthcare costs can be reduced for both patient and provider. The data used to train such predictive models are frequently longitudinal, because interventions with convalescence periods and chronic conditions yield outcome measures at intermediate follow-up points. Here we outline a predictive modeling approach that takes advantage of the longitudinal structure of the data by sequentially predicting the outcomes at intermediate time points and including them as predictors in models for later time points. This is done for continuous and threshold-dichotomized outcomes. The proposed method improves predictive accuracy because it uses the correlation among follow-up measures to distribute the estimation of coefficients over several models, making it advantageous for smaller datasets. This formulation also allows for effective screening of first-order interaction effects. The improved performance is illustrated using a simulation study and an applied example of predicting outcomes following surgery. The proposed approach is shown to be consistent for prediction, effective in modeling interactions, and robust to the presence of noise variables.
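To make the sequential idea concrete, the following is a minimal sketch of chaining models across follow-up points. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not name an estimator, so closed-form ridge regression is used as a stand-in; the function names (`fit_ridge`, `fit_chain`, `predict_chain`), the penalty `alpha`, and the synthetic data are all hypothetical. Each stage is fit on the baseline predictors plus the predictions from all earlier stages.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge fit with an unpenalized intercept (stand-in estimator)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]),
                           Xc.T @ (y - y_mean))
    return coef, y_mean - x_mean @ coef

def fit_chain(X, Y, alpha=1.0):
    """Fit one model per follow-up point.

    Stage k uses the baseline predictors X plus the predicted outcomes
    of every earlier stage as additional columns.
    """
    models, feats = [], X
    for k in range(Y.shape[1]):
        coef, intercept = fit_ridge(feats, Y[:, k], alpha)
        models.append((coef, intercept))
        pred = feats @ coef + intercept
        feats = np.column_stack([feats, pred])  # carry prediction forward
    return models

def predict_chain(models, X):
    """Predict all follow-up outcomes from baseline predictors only."""
    feats, preds = X, []
    for coef, intercept in models:
        pred = feats @ coef + intercept
        preds.append(pred)
        feats = np.column_stack([feats, pred])
    return np.column_stack(preds)

# Synthetic longitudinal data (assumption): three correlated follow-ups.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
y1 = X @ rng.normal(size=p) + rng.normal(scale=0.5, size=n)
y2 = 0.8 * y1 + 0.3 * X[:, 0] + rng.normal(scale=0.5, size=n)
y3 = 0.8 * y2 + rng.normal(scale=0.5, size=n)
Y = np.column_stack([y1, y2, y3])

models = fit_chain(X[:150], Y[:150])      # train on the first 150 patients
Y_hat = predict_chain(models, X[150:])    # predict for the remaining 50
```

At prediction time only the baseline predictors are needed, because every intermediate outcome is itself predicted and passed along the chain. For a threshold-dichotomized outcome, one option in this sketch would be to compare the final column of `Y_hat` against the chosen threshold, although the paper may instead model the dichotomized outcome directly.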
Data availability
The data used in the example are available from the joineR package in R.