Abstract
Data augmentation (DA) turns seemingly intractable computational problems into simple ones by augmenting latent missing data. In addition to computational simplicity, it is now well established that DA equipped with a deterministic transformation can improve the convergence speed of iterative algorithms such as the EM algorithm or the Gibbs sampler. In this article, we outline a framework for transformation-based DA, which we call data transforming augmentation (DTA), that allows the augmented data to be a deterministic function of latent data, observed data, and unknown parameters. Under this framework, we investigate a novel DTA scheme that turns heteroscedastic models into homoscedastic ones in order to take advantage of the simpler computations typically available in homoscedastic cases. Applying this DTA scheme to linear mixed models, we demonstrate that the resulting iterative algorithms require simpler computations and converge faster than those under a non-transformation-based DA scheme. We also fit a Beta-Binomial model using the proposed DTA scheme, which enables sampling from approximate marginal posterior distributions that are available only under homoscedasticity. Supplementary materials are available online.
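To make "simpler computations typically available in homoscedastic cases" concrete, here is a minimal sketch under assumptions that are not taken from the article: a Gaussian hierarchical model y_j | θ_j ~ N(θ_j, V_j), θ_j ~ N(μ, A), with known V_j. With a common variance V, the maximum-likelihood estimate of (μ, A) has a closed form; with unequal V_j it does not. The sketch then uses a plain (non-transformation) missing-data augmentation: it treats each y_j as a hypothetical y_j* ~ N(θ_j, V0), with V0 = min V_j, plus independent noise of variance V_j − V0, so that every EM M-step reuses the closed-form homoscedastic update. This is not the DTA scheme of this article, and the article's DA baseline may differ; the function names `homoscedastic_mle` and `em_homogenized` are hypothetical.

```python
import numpy as np


def homoscedastic_mle(y, V):
    """Closed-form MLE of (mu, A) when y_j ~ N(mu, V + A) with one common V.

    Assumed model (illustration only): y_j | theta_j ~ N(theta_j, V),
    theta_j ~ N(mu, A), so marginally the y_j are i.i.d. N(mu, V + A).
    """
    y = np.asarray(y, dtype=float)
    mu = y.mean()
    # MLE of V + A is the (1/k) sample variance; truncate A at zero.
    A = max(0.0, np.mean((y - mu) ** 2) - V)
    return mu, A


def em_homogenized(y, V, tol=1e-12, max_iter=10_000):
    """EM for (mu, A) under unequal known V_j via a homoscedastic augmentation.

    Hypothetical augmentation (not the article's DTA): y_j = y_j* + e_j with
    y_j* ~ N(mu, V0 + A) marginally, V0 = min(V), and e_j ~ N(0, V_j - V0)
    free of parameters, so the complete-data MLE is the closed form above.
    """
    y = np.asarray(y, dtype=float)
    V = np.asarray(V, dtype=float)
    V0 = V.min()
    mu, A = homoscedastic_mle(y, V0)  # crude starting value
    for _ in range(max_iter):
        tau = V0 + A  # common variance of the augmented y_j*
        # E-step: y_j* | y_j is normal with this mean and variance.
        m = mu + tau / (V + A) * (y - mu)
        s2 = tau * (V - V0) / (V + A)
        # M-step: homoscedastic closed form applied to expected statistics,
        # with tau constrained to tau >= V0 (that is, A >= 0).
        mu_new = m.mean()
        tau_new = np.mean(s2 + (m - mu_new) ** 2)
        A_new = max(0.0, tau_new - V0)
        if abs(mu_new - mu) < tol and abs(A_new - A) < tol:
            return mu_new, A_new
        mu, A = mu_new, A_new
    return mu, A
```

At a fixed point with A > 0, the iteration satisfies the marginal score equations Σ w_j (y_j − μ) = 0 and Σ w_j = Σ w_j² (y_j − μ)², where w_j = 1/(V_j + A). When all V_j are equal, the augmentation is vacuous and the loop returns the closed form at once. The price of reusing the simple homoscedastic update is iteration, which roughly slows as the V_j become more unequal relative to A; speeding up this kind of iteration is where the abstract says transformation-based augmentation helps.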
Acknowledgements
Both Joseph Kelly and Hyungsuk Tak would like to thank Carl N. Morris for his supervision and the integral role he played in the development of Kelly (2014). Hyungsuk Tak also thanks Xiao-Li Meng for thoughtful comments on the first draft of this manuscript and Phillip Everson for a series of productive discussions on the multivariate linear mixed model. Finally, we thank the associate editor and the two anonymous reviewers for their insightful comments, which significantly improved the presentation of this manuscript.
Supplementary materials
The online supplementary materials include (A) the details of the Gibbs samplers and EM algorithms based on both DTA and DA used in Section 3.1, (B) the corresponding details of the Gibbs samplers and EM algorithms for the multivariate linear mixed model in Section 3.2, (C) the details of the DTA scheme used in Section 4, (D) the hospital profiling data used in Section 3.2, and (E) all the R code used in this article.