Research Article

Fast Bayesian Record Linkage for Streaming Data Contexts

Received 28 Sep 2022, Accepted 03 Nov 2023, Published online: 03 Jan 2024

