A simple two-step procedure using the Fellegi–Sunter model for frequency-based record linkage

Pages 2789-2804 | Received 10 Aug 2020, Accepted 22 Apr 2021, Published online: 04 May 2021
 

Abstract

The widely used Fellegi–Sunter model for probabilistic record linkage does not use the information contained in field values, so it classifies match status identically whether records agree on rare or common values. Because agreement on a rare value is less likely to occur by chance than agreement on a common value, records that agree on rare values are more likely to be matches. Existing frequency-based methods typically require the error probabilities associated with field values and the frequencies of agreed field values among matches, which are often derived from prior studies or training data. When such information is unavailable, these methods are difficult to apply. In this paper, we propose a simple two-step procedure for frequency-based matching within the Fellegi–Sunter framework that overcomes these challenges. Matching weights are adjusted using the frequency distributions of agreed field values among matches and non-matches, which are estimated by the Fellegi–Sunter model without relying on prior studies or training data. In a real-world application and a simulation study, our method performs comparably to or better than the unadjusted method. Furthermore, frequency-based matching improves matching accuracy most when the matching fields discriminate poorly, and the benefit diminishes as the discriminating power of the matching fields increases.
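
To make the idea of a frequency-adjusted weight concrete, the following is a minimal sketch and not the paper's exact formula. It assumes a single field with agreement probabilities `m` (among matches) and `u` (among non-matches), and it assumes that the value distribution of agreeing pairs is proportional to the population frequency among matches and to the squared frequency among non-matches (chance agreement). In the paper these distributions are estimated by the Fellegi–Sunter model itself; here they are derived from hypothetical surname frequencies, and all function names and numbers are illustrative.

```python
import math

def agreement_weight(m, u):
    """Standard Fellegi-Sunter agreement weight for one field: log2(m / u)."""
    return math.log2(m / u)

def agreed_value_distributions(pop_freq):
    """Illustrative assumption: among matches, agreed values follow the
    population frequencies; among non-matches, two records agree on value v
    by chance with probability proportional to freq(v) ** 2."""
    total = sum(pop_freq.values())
    f_match = {v: f / total for v, f in pop_freq.items()}
    sq_total = sum(f * f for f in f_match.values())
    f_nonmatch = {v: f * f / sq_total for v, f in f_match.items()}
    return f_match, f_nonmatch

def frequency_adjusted_weight(m, u, f_match, f_nonmatch, value):
    """Weight for agreement on a specific value:
    log2( m * f_match[value] / (u * f_nonmatch[value]) )."""
    return math.log2((m * f_match[value]) / (u * f_nonmatch[value]))

# Hypothetical surname frequencies (not from the paper).
pop_freq = {"Smith": 0.8, "Zabinski": 0.2}
f_match, f_nonmatch = agreed_value_distributions(pop_freq)

m, u = 0.9, 0.1  # hypothetical agreement probabilities
base = agreement_weight(m, u)
common = frequency_adjusted_weight(m, u, f_match, f_nonmatch, "Smith")
rare = frequency_adjusted_weight(m, u, f_match, f_nonmatch, "Zabinski")

print(f"unadjusted: {base:.2f}, common value: {common:.2f}, rare value: {rare:.2f}")
```

Under these assumptions, agreement on the rare surname receives a larger weight than the unadjusted weight, and agreement on the common surname receives a smaller one, which is the behaviour the abstract describes.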

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This project was supported by the Agency for Healthcare Research and Quality [grant number R01HS023808] and by the Patient-Centered Outcomes Research Institute [grant number ME-2017C1-6425]. The content is solely the responsibility of the authors and does not necessarily represent the official views of the Agency for Healthcare Research and Quality or the Patient-Centered Outcomes Research Institute.
