Theory and Methods

Bayesian Estimation of Bipartite Matchings for Record Linkage

Pages 600-612 | Received 01 Jun 2015, Published online: 30 Mar 2017
