References
- Ball, P., and Price, M. (2019), “Using Statistics to Assess Lethal Violence in Civil and Inter-State War,” Annual Review of Statistics and Its Application, 6, 63–84. DOI: 10.1146/annurev-statistics-030718-105222.
- Betancourt, B., Sosa, J., and Rodríguez, A. (2020), “A Prior for Record Linkage Based on Allelic Partitions,” arXiv:2008.10118.
- Betancourt, B., Zanella, G., and Steorts, R. C. (2020), “Random Partition Models for Microclustering Tasks,” Journal of the American Statistical Association, pp. 1–13, DOI: 10.1080/01621459.2020.1841647.
- Bilenko, M., Mooney, R. J., Cohen, W. W., Ravikumar, P., and Fienberg, S. E. (2003), “Adaptive Name Matching in Information Integration,” IEEE Intelligent Systems, 18, 16–23. DOI: 10.1109/MIS.2003.1234765.
- Binder, D. A. (1978), “Bayesian Cluster Analysis,” Biometrika, 65, 31–38. DOI: 10.1093/biomet/65.1.31.
- Binette, O., and Steorts, R. C. (2020), “(Almost) All of Entity Resolution,” arXiv:2008.04443.
- Bird, S. M., and King, R. (2018), “Multiple Systems Estimation (or Capture–Recapture Estimation) to Inform Public Policy,” Annual Review of Statistics and Its Application, 5, 95–118. DOI: 10.1146/annurev-statistics-031017-100641.
- Enamorado, T., Fifield, B., and Imai, K. (2019), “Using a Probabilistic Model to Assist Merging of Large-Scale Administrative Records,” American Political Science Review, 113, 353–371. DOI: 10.1017/S0003055418000783.
- Enamorado, T., and Steorts, R. C. (2020), “Probabilistic Blocking and Distributed Bayesian Entity Resolution,” in International Conference on Privacy in Statistical Databases, Cham: Springer, pp. 224–239.
- Fellegi, I. P., and Sunter, A. B. (1969), “A Theory for Record Linkage,” Journal of the American Statistical Association, 64, 1183–1210. DOI: 10.1080/01621459.1969.10501049.
- Fortini, M., Liseo, B., Nuccitelli, A., and Scanu, M. (2001), “On Bayesian Record Linkage,” Research in Official Statistics, 4, 185–198.
- Herbei, R., and Wegkamp, M. H. (2006), “Classification With Reject Option,” Canadian Journal of Statistics, 34, 709–721. DOI: 10.1002/cjs.5550340410.
- Hof, M. H., Ravelli, A. C., and Zwinderman, A. H. (2017), “A Probabilistic Record Linkage Model for Survival Data,” Journal of the American Statistical Association, 112, 1504–1515. DOI: 10.1080/01621459.2017.1311262.
- Jaro, M. A. (1989), “Advances in Record-Linkage Methodology as Applied to Matching the 1985 Census of Tampa, Florida,” Journal of the American Statistical Association, 84, 414–420. DOI: 10.1080/01621459.1989.10478785.
- Klami, A., and Jitta, A. (2016), “Probabilistic Size-Constrained Microclustering,” in Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, Arlington, VA: AUAI Press, pp. 329–338.
- Larsen, M. D. (2005), “Advances in Record Linkage Theory: Hierarchical Bayesian Record Linkage Theory,” in Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 3277–3284.
- Larsen, M. D., and Rubin, D. B. (2001), “Iterative Automated Record Linkage Using Mixture Models,” Journal of the American Statistical Association, 96, 32–41. DOI: 10.1198/016214501750332956.
- Liseo, B., and Tancredi, A. (2011), “Bayesian Estimation of Population Size via Linkage of Multivariate Normal Data Sets,” Journal of Official Statistics, 27, 491–505.
- Marchant, N. G., Kaplan, A., Elazar, D. N., Rubinstein, B. I., and Steorts, R. C. (2021), “d-Blink: Distributed End-to-End Bayesian Entity Resolution,” Journal of Computational and Graphical Statistics, 30, 406–421. DOI: 10.1080/10618600.2020.1825451.
- Matsakis, N. E. (2010), “Active Duplicate Detection with Bayesian Nonparametric Models,” PhD thesis, Massachusetts Institute of Technology.
- Meilă, M. (2007), “Comparing Clusterings: An Information Based Distance,” Journal of Multivariate Analysis, 98, 873–895. DOI: 10.1016/j.jmva.2006.11.013.
- Miller, J., Betancourt, B., Zaidi, A., Wallach, H., and Steorts, R. C. (2015), “Microclustering: When the Cluster Sizes Grow Sublinearly With the Size of the Data Set,” arXiv:1512.00792.
- Sadinle, M. (2014), “Detecting Duplicates in a Homicide Registry Using a Bayesian Partitioning Approach,” The Annals of Applied Statistics, 8, 2404–2434. DOI: 10.1214/14-AOAS779.
- Sadinle, M. (2017), “Bayesian Estimation of Bipartite Matchings for Record Linkage,” Journal of the American Statistical Association, 112, 600–612.
- Sadinle, M., and Fienberg, S. E. (2013), “A Generalized Fellegi–Sunter Framework for Multiple Record Linkage With Application to Homicide Record Systems,” Journal of the American Statistical Association, 108, 385–397. DOI: 10.1080/01621459.2012.757231.
- Steorts, R. C. (2015), “Entity Resolution With Empirically Motivated Priors,” Bayesian Analysis, 10, 849–875. DOI: 10.1214/15-BA965SI.
- Steorts, R. C., Hall, R., and Fienberg, S. E. (2016), “A Bayesian Approach to Graphical Record Linkage and Deduplication,” Journal of the American Statistical Association, 111, 1660–1672. DOI: 10.1080/01621459.2015.1105807.
- Tancredi, A., and Liseo, B. (2011), “A Hierarchical Bayesian Approach to Record Linkage and Size Population Problems,” The Annals of Applied Statistics, 5, 1553–1585.
- Tancredi, A., Steorts, R. C., and Liseo, B. (2020), “A Unified Framework for De-Duplication and Population Size Estimation” (with discussion), Bayesian Analysis, 15, 633–682. DOI: 10.1214/19-BA1146.
- Tran, K.-N., Vatsalan, D., and Christen, P. (2013), “GeCo: An Online Personal Data Generator and Corruptor,” in Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, pp. 2473–2476.
- Wade, S., and Ghahramani, Z. (2018), “Bayesian Cluster Analysis: Point Estimation and Credible Balls” (with discussion), Bayesian Analysis, 13, 559–626. DOI: 10.1214/17-BA1073.
- Winkler, W. E. (1990), “String Comparator Metrics and Enhanced Decision Rules in the Fellegi–Sunter Model of Record Linkage,” in Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 354–359.
- Winkler, W. E. (1994), “Advanced Methods for Record Linkage,” in Proceedings of the Section on Survey Research Methods, Alexandria, VA: American Statistical Association, pp. 467–472.
- Zanella, G., Betancourt, B., Miller, J. W., Wallach, H., Zaidi, A., and Steorts, R. C. (2016), “Flexible Models for Microclustering With Application to Entity Resolution,” in Advances in Neural Information Processing Systems, NY, USA: Curran Associates Inc., pp. 1417–1425.