Theory and Methods

Bayesian Estimation of Bipartite Matchings for Record Linkage

Pages 600-612 | Received 01 Jun 2015, Published online: 30 Mar 2017
 

ABSTRACT

The bipartite record linkage task consists of merging two disparate datafiles containing information on two overlapping sets of entities. This is nontrivial in the absence of unique identifiers and it is important for a wide variety of applications given that it needs to be solved whenever we have to combine information from different sources. Most statistical techniques currently used for record linkage are derived from a seminal article by Fellegi and Sunter (1969). These techniques usually assume independence in the matching statuses of record pairs to derive estimation procedures and optimal point estimators. We argue that this independence assumption is unreasonable and instead target a bipartite matching between the two datafiles as our parameter of interest. Bayesian implementations allow us to quantify uncertainty on the matching decisions and derive a variety of point estimators using different loss functions. We propose partial Bayes estimates that allow uncertain parts of the bipartite matching to be left unresolved. We evaluate our approach to record linkage using a variety of challenging scenarios and show that it outperforms the traditional methodology. We illustrate the advantages of our methods merging two datafiles on casualties from the civil war of El Salvador. Supplementary materials for this article are available online.

Acknowledgments

The author thanks Kira Bokalders, Bill Eddy, Steve Fienberg, Rebecca Nugent, Jerry Reiter, Beka Steorts, Andrea Tancredi, Bill Winkler, the editors, associate editor, and referees for helpful comments and suggestions on earlier versions of this article; Patrick Ball and Megan Price from the Human Rights Data Analysis Group (HRDAG) for providing access to the data used in this article; and Peter Christen for sharing his synthetic datafile generator.

Funding

This research is derived from the Ph.D. thesis of the author and was supported by NSF grants SES-11-30706 to Carnegie Mellon University and SES-11-31897 to Duke University/NISS.
