Research Paper

Assessing data linkage quality in cohort studies

Pages 218-226 | Received 24 Oct 2019, Accepted 03 Mar 2020, Published online: 20 May 2020
