Abstract
Bayesian inference with a large dataset is computationally intensive, as Markov chain Monte Carlo simulation requires a complete scan of the dataset for each proposed parameter update. To reduce the number of data points evaluated at each iteration of posterior simulation, we develop a double marginalized subsampling method that applies to a wide array of microeconometric models, including Tobit, Probit, regressions with non-Gaussian errors, heteroscedasticity and stochastic volatility, hierarchical longitudinal models, time-varying-parameter regressions, and Gaussian mixtures. We also develop an extension, double pseudo-marginalized subsampling, whose applications reach beyond conditionally conjugate models. With rank-one updates of the cumulative statistics, both methods target the exact posterior distribution, and a parameter draw can be obtained after evaluating each single observation. Simulation studies demonstrate the statistical and computational efficiency of the marginalized sampler. The methods are also applied to a massive real-world dataset of incidentally truncated mortgage rates.
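To make the idea of rank-one updates of cumulative statistics concrete, the following is a minimal, generic sketch for a conditionally conjugate Gaussian linear regression. It is not the paper's double marginalized subsampler. It assumes y_i = x_i' beta + e_i with e_i ~ N(0, sigma2), sigma2 known, and a N(0, tau2 * I) prior on beta; the class name `RunningRegressionStats` and the values of `sigma2` and `tau2` are hypothetical. The point it illustrates is that X'X and X'y can be refreshed one observation at a time (a rank-one change to X'X, or a single-vector change to X'y when a latent response is redrawn), after which a beta draw from the exact conditional posterior needs no rescan of the data.

```python
import numpy as np


class RunningRegressionStats:
    """Cumulative statistics X'X and X'y for a Gaussian linear regression.

    Hypothetical helper (illustration only): assumes y_i = x_i' beta + e_i,
    e_i ~ N(0, sigma2) with sigma2 known, and prior beta ~ N(0, tau2 * I).
    """

    def __init__(self, p):
        self.xtx = np.zeros((p, p))
        self.xty = np.zeros(p)

    def add(self, x, y):
        """Rank-one update when one observation (x, y) enters: O(p^2) work."""
        x = np.asarray(x, dtype=float)
        self.xtx += np.outer(x, x)
        self.xty += x * y

    def replace_response(self, x, y_old, y_new):
        """Swap one response, e.g. a redrawn latent value in a Tobit model.

        X'X is unchanged; X'y moves by x * (y_new - y_old): O(p) work.
        """
        self.xty += np.asarray(x, dtype=float) * (y_new - y_old)

    def posterior(self, sigma2, tau2):
        """Conditional posterior mean and precision of beta given the stats."""
        p = self.xty.shape[0]
        prec = self.xtx / sigma2 + np.eye(p) / tau2
        mean = np.linalg.solve(prec, self.xty / sigma2)
        return mean, prec

    def draw(self, sigma2, tau2, rng):
        """One draw of beta from N(mean, prec^{-1}) via a Cholesky factor."""
        mean, prec = self.posterior(sigma2, tau2)
        chol = np.linalg.cholesky(prec)  # prec = chol @ chol.T
        z = rng.standard_normal(mean.shape[0])
        # chol^{-T} z has covariance (chol @ chol.T)^{-1} = prec^{-1}
        return mean + np.linalg.solve(chol.T, z)


# Usage: the statistics are refreshed one observation at a time, and a
# beta draw is available after each update without rescanning the data.
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.standard_normal((n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(n)

stats = RunningRegressionStats(p)
for x_i, y_i in zip(X, y):
    stats.add(x_i, y_i)
    beta = stats.draw(sigma2=1.0, tau2=100.0, rng=rng)

# Redraw a single latent response and update X'y in O(p).
y_new = y[7] + 0.3
stats.replace_response(X[7], y[7], y_new)
```

The per-observation cost here is O(p^2) for the update plus one p-by-p factorization per draw, independent of n; a production sampler would typically also update the Cholesky factor itself rather than refactorizing.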
Supplementary Material
The supplementary materials contain computer programs that implement the double marginalized subsampling method proposed in the paper. The empirical results of the article (Gaussian mixture model, hierarchical longitudinal model, stochastic volatility, etc.) can be reproduced with these programs.
Acknowledgments
The author thanks the editor, the associate editor, and the reviewers for comments and insights that greatly improved the article.