Review Article

A selective review of statistical methods using calibration information from similar studies

Pages 175-190 | Received 04 Jan 2021, Accepted 10 Jan 2022, Published online: 17 Feb 2022


  • Babcock, B., Babu, S., Datar, M., Motwani, R., & Widom, J. (2002). Models and issues in data stream systems. In Proceedings of the 21st ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems (pp. 1–16). ACM.
  • Back, K., & Brown, D. P. (1992). GMM, maximum likelihood, and nonparametric efficiency. Economics Letters, 39(1), 23–28. https://doi.org/10.1016/0165-1765(92)90095-G
  • Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. (2009). Introduction to meta-analysis. Wiley.
  • Braverman, M., Garg, A., Ma, T., Nguyen, H., & Woodruff, D. (2016). Communication lower bounds for statistical estimation problems via a distributed data processing inequality. In Proceedings of the 48th annual ACM symposium on theory of computing (pp. 1011–1020). ACM.
  • Chatterjee, N., Chen, Y.-H., Maas, P., & Carroll, R. J. (2016). Constrained maximum likelihood estimation for model calibration using summary-level information from external big data sources. Journal of the American Statistical Association, 111(513), 107–117. https://doi.org/10.1080/01621459.2015.1123157
  • Chaudhuri, S., Handcock, M. S., & Rendall, M. S. (2008). Generalized linear models incorporating population level information: an empirical likelihood based approach. Journal of the Royal Statistical Society: Series B, 70(2), 311–328. https://doi.org/10.1111/rssb.2008.70.issue-2
  • Chen, J., & Qin, J. (1993). Empirical likelihood estimation for finite populations and the effective usage of auxiliary information. Biometrika, 80(1), 107–116. https://doi.org/10.1093/biomet/80.1.107
  • Chen, J., Sitter, R., & Wu, C. (2002). Using empirical likelihood methods to obtain range restricted weights in regression estimators for surveys. Biometrika, 89(1), 230–237. https://doi.org/10.1093/biomet/89.1.230
  • Cochran, W. G. (1977). Sampling techniques (3rd ed.). Wiley.
  • DerSimonian, R., & Laird, N. (1986). Meta-analysis in clinical trials. Controlled Clinical Trials, 7(3), 177–188. https://doi.org/10.1016/0197-2456(86)90046-2
  • Duan, R., Ning, Y., & Chen, Y. (2020). Heterogeneity-aware and communication-efficient distributed statistical inference. arXiv:1912.09623v1.
  • Duchi, J., Jordan, M., Wainwright, M., & Zhang, Y. (2015). Optimality guarantees for distributed statistical estimation. arXiv:1405.0782.
  • Han, P., & Lawless, J. (2016). Comment. Journal of the American Statistical Association, 111(513), 118–121. https://doi.org/10.1080/01621459.2016.1149399
  • Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica, 50(4), 1029–1054. https://doi.org/10.2307/1912775
  • Hartley, H. O., & Rao, J. N. K. (1968). A new estimation theory for sample surveys. Biometrika, 55(3), 547–557. https://doi.org/10.1093/biomet/55.3.547
  • Imbens, G., & Lancaster, T. (1994). Combining micro and macro data in microeconometric models. Review of Economic Studies, 61(4), 655–680. https://doi.org/10.2307/2297913
  • Jordan, M. I., Lee, J. D., & Yang, Y. (2019). Communication-efficient distributed statistical inference. Journal of the American Statistical Association, 114(526), 668–681. https://doi.org/10.1080/01621459.2018.1429274
  • Lee, J., Liu, Q., Sun, Y., & Taylor, J. (2017). Communication-efficient sparse regression. Journal of Machine Learning Research, 18, 1–30. http://jmlr.org/papers/v18/16-002.html
  • Lin, D. Y., & Zeng, D. (2010). On the relative efficiency of using summary statistics versus individual-level data in meta-analysis. Biometrika, 97(2), 321–332. https://doi.org/10.1093/biomet/asq006
  • Luo, L., & Song, P. X.-K. (2020). Renewable estimation and incremental inference in generalized linear models with streaming data sets. Journal of the Royal Statistical Society: Series B, 82(1), 69–97. https://doi.org/10.1111/rssb.12352
  • Neiswanger, W., Wang, C., & Xing, E. (2015). Asymptotically exact, embarrassingly parallel MCMC. In Proceedings of the 30th conference on uncertainty in artificial intelligence (pp. 623–632). AUAI Press.
  • Nguyen, T. D., Shih, M. H., Srivastava, D., Tirthapura, S., & Xu, B. (2021). Stratified random sampling from streaming and stored data. Distributed and Parallel Databases, 39(3), 665–710. https://doi.org/10.1007/s10619-020-07315-w
  • Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75(2), 237–249. https://doi.org/10.1093/biomet/75.2.237
  • Owen, A. B. (1990). Empirical likelihood ratio confidence regions. Annals of Statistics, 18(1), 90–120. https://doi.org/10.1214/aos/1176347494
  • Owen, A. B. (2001). Empirical likelihood. CRC.
  • Qin, J. (2000). Combining parametric and empirical likelihoods. Biometrika, 87(2), 484–490. https://doi.org/10.1093/biomet/87.2.484
  • Qin, J. (2017). Biased sampling, over-identified parameter problems and beyond. Springer.
  • Qin, J., & Lawless, J. (1994). Empirical likelihood and general equations. Annals of Statistics, 22(1), 300–325. https://doi.org/10.1214/aos/1176325370
  • Qin, J., Zhang, H., Li, P., Albanes, D., & Yu, K. (2015). Using covariate specific disease prevalence information to increase the power of case-control study. Biometrika, 102(1), 169–180. https://doi.org/10.1093/biomet/asu048
  • Schennach, S. M. (2007). Point estimation with exponentially tilted empirical likelihood. Annals of Statistics, 35(2), 634–672. https://doi.org/10.1214/009053606000001208
  • Tian, L., & Gu, Q. (2016). Communication-efficient distributed sparse linear discriminant analysis. arXiv:1610.04798.
  • van de Geer, S., Bühlmann, P., Ritov, Y., & Dezeure, R. (2014). On asymptotically optimal confidence regions and tests for high dimensional models. Annals of Statistics, 42(3), 1166–1202. https://doi.org/10.1214/14-AOS1221
  • van der Vaart, A. W. (2000). Asymptotic statistics. Cambridge University Press.
  • Wang, X., & Dunson, D. (2015). Parallelizing MCMC via Weierstrass sampler. arXiv:1312.4605.
  • Wang, J., Kolar, M., Srebro, N., & Zhang, T. (2017). Efficient distributed learning with sparsity. In Proceedings of the 34th international conference on machine learning, Sydney, Australia, PMLR 70 (pp. 3636–3645).
  • Wu, C., & Sitter, R. R. (2001). A model-calibration approach to using complete auxiliary information from survey data. Journal of the American Statistical Association, 96(453), 185–193. https://doi.org/10.1198/016214501750333054
  • Wu, C., & Thompson, M. E. (2020). Sampling theory and practice. Springer.
  • Zeng, D., & Lin, D. Y. (2015). On random-effects meta-analysis. Biometrika, 102(2), 281–294.
  • Zhang, H., Deng, L., Schiffman, M., Qin, J., & Yu, K. (2020). Generalized integration model for improved statistical inference by leveraging external summary data. Biometrika, 107(3), 689–703. https://doi.org/10.1093/biomet/asaa014
  • Zhang, Y., Duchi, J., & Wainwright, M. (2013). Communication-efficient algorithms for statistical optimization. Journal of Machine Learning Research, 14, 3321–3363.