Abstract
In this article, the authors derive likelihood-based exact inference for singly and multiply imputed synthetic data in the context of a multivariate regression model. The synthetic data are generated via the Plug-in Sampling method, in which the unknown parameters of the model are set equal to the observed values of their point estimators based on the original data, and the synthetic data are then drawn from this estimated version of the model. Simulation studies are carried out to confirm the theoretical results. The authors provide exact test procedures which, in cases where multiple synthetic datasets are permissible, are compared with the asymptotic results of Reiter. An application using 2000 U.S. Current Population Survey public use data is discussed. Furthermore, the properties of the proposed methodology are evaluated in scenarios where some of the conditions used to derive it do not hold, namely for nonnormal and discrete random variables; in these cases the inferential procedures developed still perform very well.
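The Plug-in Sampling step described above can be illustrated with a minimal sketch. It assumes a normal multivariate regression model Y = XB + E whose error rows are independent N(0, Σ), the least-squares estimator of B, and the residual covariance estimator with divisor n - p; these choices, the function name `plug_in_synthetic`, and its arguments are assumptions made for illustration and are not taken from the article.

```python
import numpy as np

def plug_in_synthetic(X, Y, M=1, rng=None):
    """Draw M synthetic response matrices by plug-in sampling (illustrative sketch).

    X : (n, p) design matrix, Y : (n, m) response matrix.
    Assumed model: Y = X B + E, rows of E i.i.d. N(0, Sigma).
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    m = Y.shape[1]

    # Point estimates computed from the ORIGINAL data.
    B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)   # least-squares estimate of B
    resid = Y - X @ B_hat
    S = resid.T @ resid / (n - p)                   # assumed divisor: n - p

    # Plug the estimates into the model and sample from it.
    L = np.linalg.cholesky(S)                       # S = L L^T (needs S positive definite)
    mean = X @ B_hat
    synthetic = [mean + rng.standard_normal((n, m)) @ L.T for _ in range(M)]
    return synthetic, B_hat, S
```

Setting M = 1 corresponds to a singly imputed synthetic dataset and M > 1 to multiply imputed ones; the exact inferential procedures derived in the article are not reproduced in this sketch.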
Acknowledgments
Ricardo Moura sincerely thanks the faculty of the Department of Mathematics and Statistics at UMBC (University of Maryland, Baltimore County) for their support and encouragement. Part of Ricardo Moura’s work was also completed during a ‘Summer at Census’ visit to the U.S. Census Bureau, for which he is thankful to Dr. Tommy Wright. The authors also earnestly thank the Editors-in-Chief, Associate Editors, and reviewers who handled the manuscript since its first submission for their constructive contributions.