ORIGINAL RESEARCH

A Harmonised Approach to Curating Research-Ready Datasets for Asthma, Chronic Obstructive Pulmonary Disease (COPD) and Interstitial Lung Disease (ILD) in England, Wales and Scotland Using Clinical Practice Research Datalink (CPRD), Secure Anonymised Information Linkage (SAIL) Databank and DataLoch

Pages 235-247 | Received 30 Aug 2023, Accepted 23 Feb 2024, Published online: 04 Apr 2024

Abstract

Background

Electronic healthcare records (EHRs) are an important resource for health research that can be used to improve patient outcomes in chronic respiratory diseases. However, consistent approaches to analysing these datasets are needed, both for coherent messaging and for comparative studies across different populations.

Methods and Results

We developed a harmonised curation approach to generate comparable patient cohorts for asthma, chronic obstructive pulmonary disease (COPD) and interstitial lung disease (ILD) using datasets from Clinical Practice Research Datalink (CPRD; for England), Secure Anonymised Information Linkage (SAIL; for Wales) and DataLoch (for Scotland), by defining commonly derived variables consistently across the datasets. By developing the curation methodology for CPRD, SAIL and DataLoch in parallel for asthma, COPD and ILD, we were able to highlight key differences in coding and recording between the databases and identify solutions that enable valid comparisons.
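The idea of defining a derived variable once and applying it consistently to each database can be illustrated with a minimal Python sketch. It assumes that each database supplies coded clinical events (patient identifier, code, event date) and has its own codelist for each condition, because coding differs between databases. The names `Event`, `CODELISTS` and `derive_cohort` and every code value are hypothetical placeholders rather than real clinical codes or the authors' published scripts.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """One coded clinical event (hypothetical, simplified record layout)."""
    patient_id: str
    code: str
    event_date: date


# Hypothetical per-database codelists: the same condition maps to different
# codes in each database. Placeholder strings, NOT real clinical codes.
CODELISTS = {
    "CPRD": {"asthma": {"CPRD_ASTHMA_1", "CPRD_ASTHMA_2"}},
    "SAIL": {"asthma": {"SAIL_ASTHMA_1"}},
    "DataLoch": {"asthma": {"DL_ASTHMA_1"}},
}


def derive_cohort(events, database, condition, start, end):
    """Return IDs of patients with at least one code for `condition`
    dated within [start, end], using that database's own codelist.

    The derivation rule is identical for every database; only the codelist
    changes, which is what keeps the resulting cohorts comparable.
    """
    codes = CODELISTS[database][condition]
    return {
        e.patient_id
        for e in events
        if e.code in codes and start <= e.event_date <= end
    }


if __name__ == "__main__":
    sail_events = [
        Event("p1", "SAIL_ASTHMA_1", date(2020, 1, 1)),
        Event("p2", "SAIL_ASTHMA_1", date(2015, 1, 1)),  # outside window
        Event("p3", "OTHER_CODE", date(2020, 6, 1)),     # not on codelist
    ]
    print(derive_cohort(sail_events, "SAIL", "asthma",
                        date(2019, 1, 1), date(2021, 12, 31)))
```

Because the time window is a parameter, the same rule could be re-run for different study periods. A real curation pipeline would also need database-specific handling of registration periods, linkage and data quality, which this sketch leaves out.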

Conclusion

The codelists and metadata generated have been made available to help re-create the asthma, COPD and ILD cohorts in CPRD, SAIL and DataLoch for different time periods. They also provide a starting point for curating respiratory datasets in other EHR databases, expediting further comparable respiratory research.

Data Sharing Statement

Data are available on request from the CPRD. Their provision requires the purchase of a licence, and this licence does not permit the authors to make them publicly available. Licences are available from the CPRD (http://www.cprd.com): The Clinical Practice Research Datalink Group, The Medicines and Healthcare products Regulatory Agency, 10 South Colonnade, Canary Wharf, London E14 4PU. The Stata scripts used within CPRD have been made available via a project GitHub repository. Please contact the corresponding author for access.

For SAIL, researchers can apply to use the underlying data tables and the curation scripts on a collaborative basis within the SAIL trusted research environment, subject to the standard SAIL project application process and agreeing to a derived dataset policy (saildatabank.com/contact).

The DataLoch data are available as part of the DataLoch Respiratory Registry – a de-identified registry of linked respiratory data from the South-East Scotland region – which can be accessed by application to the DataLoch service (dataloch.org/connect-with-us).

Ethics Statement

CPRD has NHS Health Research Authority (HRA) Research Ethics Committee (REC) approval to allow the collection and release of anonymised primary care data for observational research [NHS HRA REC reference number: 05/MRE04/87]. Each year CPRD obtains Section 251 regulatory support through the HRA Confidentiality Advisory Group (CAG), to enable patient identifiers, without accompanying clinical data, to flow from CPRD contributing GP practices in England to NHS Digital for the purposes of data linkage [CAG reference number: 21/CAG/0008]. The protocol for this research was approved by CPRD's Research Data Governance (RDG) Process (protocol number: 22_001769), and the approved protocol is available upon request. Linked pseudonymised data were provided for this study by CPRD. Data are linked by NHS Digital, the statutory trusted third party for linking data, using identifiable data held only by NHS Digital. Select general practices consent to this process at a practice level, with individual patients having the right to opt out.

All work conducted in SAIL Databank was completed under the permission and approval of the SAIL independent Information Governance Review Panel (IGRP) under project number 1387.

The DataLoch work was reviewed and approved under project number DL_2022_054.

Acknowledgments

This study makes use of anonymised data held in CPRD Aurum, SAIL Databank, and DataLoch. We would like to acknowledge all the data providers who make anonymised data available for research.

This study is based in part on data from the Clinical Practice Research Datalink obtained under licence from the UK Medicines and Healthcare products Regulatory Agency. The data is provided by patients and collected by the NHS as part of their care and support. The interpretation and conclusions contained in this study are those of the author/s alone. Hospital Episode Statistics (HES) data, copyright © 2023, re-used with the permission of The Health & Social Care Information Centre. All rights reserved.

SAIL Databank receives core funding from the Welsh Government’s Health and Care Research Wales.

DataLoch is core-funded by the Data-Driven Innovation programme within the Edinburgh and South East Scotland City Region Deal (ddi.ac.uk) and the Chief Scientist Office, Scottish Government (http://www.cso.scot.nhs.uk).

Part of this paper, with interim findings, was presented as an abstract at the DataLoch conference. The poster's abstract was published online: Enabling innovation in health and social care (dataloch.org).

This work was discussed with the BREATHE curiosity group prior to initiation to obtain views on the usefulness of the project.

Disclosure

AS has been supported by institutional research grants from the Industrial Strategy Challenge Fund, the Medical Research Council and Health Data Research, and from the UK and Scottish Governments for the Usher Data Driven Innovation Hub, which manages DataLoch. JKQ has been supported by institutional research grants from the Industrial Strategy Challenge Fund, the Medical Research Council, Health Data Research, GSK, BI, asthma+lung UK and AZ, and has received personal fees for advisory board participation, consultancy or speaking from GlaxoSmithKline, Evidera, Chiesi, AstraZeneca and Insmed. SS reports grants from Industrial Strategy Challenge Fund, grants from Medical Research Council and grants from Health Data Research UK during the conduct of the study, and grants from Industrial Strategy Challenge Fund, grants from Medical Research Council and grants from Health Data Research UK outside the submitted work. CO reports grants from Medical Research Council during the conduct of the study. The authors report no other conflicts of interest in this work.

Additional information

Funding

This work is supported by BREATHE-The Health Data Research Hub for Respiratory Health (MC_PC_19004). BREATHE is funded through the UK Research and Innovation Industrial Strategy Challenge Fund with additional support from the Medical Research Council and delivered through Health Data Research UK. Infrastructure support for this research was provided by the NIHR Imperial Biomedical Research Centre (BRC).