References
- S. Aleshin-Guendel and M. Sadinle, Multifile partitioning for record linkage and duplicate detection, J. Am. Stat. Assoc. (2022), pp. 1–10. Available at https://doi.org/10.1080/01621459.2021.2013242.
- M.J. Beal, Variational Algorithms for Approximate Bayesian Inference, University of London, University College London (United Kingdom), 2003.
- B. Betancourt, J. Sosa, and A. Rodríguez, A prior for record linkage based on allelic partitions, Comput. Stat. Data Anal. 172 (2022), p. 107474.
- B. Betancourt, G. Zanella, J.W. Miller, H. Wallach, A. Zaidi, and R.C. Steorts, Flexible models for microclustering with application to entity resolution, Adv. Neural Inf. Process. Syst. 29 (2016), pp. 1417–1425.
- B. Betancourt, G. Zanella, and R.C. Steorts, Random partition models for microclustering tasks, J. Am. Stat. Assoc. (2020), pp. 1–13. Available at https://doi.org/10.1080/01621459.2020.1841647.
- A. Borg and M. Sariyar, RecordLinkage: Record Linkage in R, R package version 0.4-10, 2016.
- T. Broderick and R.C. Steorts, Variational Bayes for merging noisy databases, preprint (2014). Available at arXiv:1410.4792.
- G. Casella, E. Moreno, and F.J. Girón, Cluster analysis, model selection, and prior distributions on models, Bayesian Analysis 9 (2014), pp. 613–658.
- T. Chen, E. Fox, and C. Guestrin, Stochastic gradient Hamiltonian Monte Carlo, International Conference on Machine Learning, PMLR, 2014, pp. 1683–1691.
- P. Christen and A. Pudjijono, Accurate synthetic generation of realistic personal information, Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2009, pp. 507–514.
- P. Christen and D. Vatsalan, Flexible and extensible generation and corruption of personal data, Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, ACM, 2013, pp. 1165–1168.
- H. Crane, The ubiquitous Ewens sampling formula, Stat. Sci. 31 (2016), pp. 1–19.
- P. Domingos, Multi-relational record linkage, Proceedings of the KDD-2004 Workshop on Multi-Relational Data Mining, Citeseer, 2004.
- T. Enamorado and R.C. Steorts, Probabilistic blocking and distributed Bayesian entity resolution, International Conference on Privacy in Statistical Databases, Springer, 2020, pp. 224–239.
- T. Ferguson, A Bayesian analysis of some nonparametric problems, Ann. Stat. 1 (1973), pp. 209–230.
- D. Gamerman and H.F. Lopes, Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, CRC Press, Boca Raton, FL, 2006.
- M.S. Handcock, A.E. Raftery, and J.M. Tantrum, Model-based clustering for social networks, J. R. Stat. Soc.: Ser. A (Stat. Soc.) 170 (2007), pp. 301–354.
- P.D. Hoff, A.E. Raftery, and M.S. Handcock, Latent space approaches to social network analysis, J. Am. Stat. Assoc. 97 (2002), pp. 1090–1098.
- S. Jain and R.M. Neal, A split-merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model, J. Comput. Graph. Stat. 13 (2004), pp. 158–182.
- M.I. Jordan, Z. Ghahramani, T.S. Jaakkola, and L.K. Saul, An introduction to variational methods for graphical models, Mach. Learn. 37 (1999), pp. 183–233.
- P.N. Krivitsky and M.S. Handcock, Fitting latent cluster models for networks with latentnet, J. Stat. Softw. 24 (2008), pp. 1–23.
- J.W. Lau and P.J. Green, Bayesian model-based clustering procedures, J. Comput. Graph. Stat. 16 (2007), pp. 526–558.
- N.G. Marchant, A. Kaplan, D.N. Elazar, B.I. Rubinstein, and R.C. Steorts, d-blink: Distributed end-to-end Bayesian entity resolution, J. Comput. Graph. Stat. 30 (2021), pp. 406–421.
- P. McCullagh and J. Yang, Stochastic classification models, International Congress of Mathematicians, Vol. 3, Citeseer, 2006, pp. 72–145.
- J. Miller, B. Betancourt, A. Zaidi, H. Wallach, and R.C. Steorts, Microclustering: When the cluster sizes grow sublinearly with the size of the data set, preprint (2015). Available at arXiv:1512.00792.
- J.W. Miller and M.T. Harrison, Mixture models with a prior on the number of components, J. Am. Stat. Assoc. 113 (2018), pp. 340–356.
- P. Müller and A. Rodríguez, Nonparametric Bayesian Inference, Institute of Mathematical Statistics, 2013. Available at https://imstat.org/overview/.
- A. Narayanan and V. Shmatikov, De-anonymizing social networks, 2009 30th IEEE Symposium on Security and Privacy, IEEE, 2009, pp. 173–187.
- R.M. Neal, Markov chain sampling methods for Dirichlet process mixture models, J. Comput. Graph. Stat. 9 (2000), pp. 249–265.
- R.M. Neal, MCMC using Hamiltonian dynamics, Handbook of Markov Chain Monte Carlo, Vol. 2, 2011, pp. 114–162.
- J.S. Rosenthal, Optimal proposal distributions and adaptive MCMC, Handbook of Markov Chain Monte Carlo, 2011.
- M. Sadinle, Detecting duplicates in a homicide registry using a Bayesian partitioning approach, Ann. Appl. Stat. 8 (2014), pp. 2404–2434.
- M. Sadinle and S.E. Fienberg, A generalized Fellegi–Sunter framework for multiple record linkage with application to homicide record systems, J. Am. Stat. Assoc. 108 (2013), pp. 385–397.
- L.K. Saul, T. Jaakkola, and M.I. Jordan, Mean field theory for sigmoid belief networks, J. Artif. Intell. Res. 4 (1996), pp. 61–76.
- J. Sethuraman, A constructive definition of Dirichlet priors, Stat. Sin. 4 (1994), pp. 639–650.
- A.L. Smith, D.M. Asta, and C.A. Calder, The geometry of continuous latent space models for network data, Stat. Sci. 34 (2019), pp. 428–453.
- J. Sosa and L. Buitrago, A review of latent space models for social networks, Revista Colombiana De Estadística 44 (2021), pp. 171–200.
- J. Sosa and A. Rodríguez, A record linkage model incorporating relational data, preprint (2018). Available at arXiv:1808.04511.
- R.C. Steorts, Entity resolution with empirically motivated priors, Bayesian Analysis 10 (2015), pp. 849–875.
- R.C. Steorts, R. Hall, and S.E. Fienberg, A Bayesian approach to graphical record linkage and deduplication, J. Am. Stat. Assoc. 111 (2016), pp. 1660–1672.
- R.C. Steorts, S.L. Ventura, M. Sadinle, and S.E. Fienberg, A comparison of blocking methods for record linkage, International Conference on Privacy in Statistical Databases, Springer, 2014, pp. 253–268.
- A. Tancredi, R.C. Steorts, and B. Liseo, A unified framework for de-duplication and population size estimation (with discussion), Bayesian Analysis 15 (2020), pp. 633–682.
- H. Wallach, S. Jensen, L. Dicker, and K. Heller, An alternative prior process for nonparametric Bayesian clustering, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, JMLR Workshop and Conference Proceedings, 2010, pp. 892–899.