Original Articles

A Bayesian Approach to Graphical Record Linkage and Deduplication

Pages 1660-1672 | Received 01 Dec 2013, Published online: 12 Jan 2017

