Abstract
This article proposes an extended state-space model for accommodating multivariate panel data. The novel aspect of this contribution is an adjustment to the classical model for multiple subjects that allows missingness in the covariates in addition to the responses. Missing covariate data are handled by a second state-space model nested inside the first to represent unobserved exogenous information. Relevant Kalman filter equations are derived, and explicit expressions are provided for both the E- and M-steps of an expectation-maximization (EM) algorithm, to obtain maximum (Gaussian) likelihood estimates of all model parameters. In the presence of missing data, the resulting EM algorithm becomes computationally intractable, but a simplification of the M-step leads to a new procedure that is shown to be an expectation/conditional maximization (ECM) algorithm under exogeneity of the covariates. Simulation studies reveal that the approach appears to be relatively robust to moderate percentages of missing data, even with small numbers of subjects and time points, and that estimates are generally consistent with the asymptotics. The methodology is applied to a dataset from a published panel study of elderly patients with impaired respiratory function. Forecasted values thus obtained may serve as an “early-warning” mechanism for identifying patients whose lung function is nearing critical levels. Supplementary materials for this article are available online.
Acknowledgments
This research is based on Naranjo’s Ph.D. thesis and was supported by National Science Foundation grants DMS-0631632 and SES-0631588. The authors thank Dr. Susanna Lagorio, MD, Senior Researcher, National Centre for Epidemiology Surveillance and Health Promotion (CNESPS), National Institute of Health (Istituto Superiore di Sanità), Rome (Italy), for access to the Lagorio et al. () data. We are indebted to Prof. Dr. Miguel Jerez, Universidad Complutense de Madrid (Spain), for helpful discussions and access to E4, a MATLAB toolbox for time series modeling, which permitted us to carry out model identification calculations. We also acknowledge the suggestions of two anonymous reviewers that led to vast improvements. Finally, we dedicate this work to the memory of our mentor and colleague, George Casella, whose passing leaves an immense void in the statistics community.