Perspective

Real world data for rare diseases research: The beginner’s guide to registries

Pages 9-15 | Received 10 May 2023, Accepted 17 Jul 2023, Published online: 01 Aug 2023

ABSTRACT

Introduction

Rare disease research has specific challenges that can be addressed using registries.

Areas covered

There are at least three different types of registries: patient registries, disease registries, and product registries. Patient registries recruit rare disease patients, potentially including several rare diseases within a registry, while disease registries can be considered a subset of patient registries which focus on specific diseases. Product registries focus on specific drugs. These registries may be used to conduct research that is specifically requested by a regulatory authority, may be developed by a drug company to monitor the use of a particular drug, or may be developed for public health monitoring purposes.

Expert Opinion

Compared to other sources of real-world data (RWD), such as electronic medical records (EMRs) and claims data, registries are more likely to have a correct diagnosis and more specific information about RDs. However, registries also have their challenges. Competition between registries may lead to missing or incomplete data. Registries could also have limited information on drug and medical history, which are better captured in EMRs or claims. Nevertheless, registries remain an important source of RWD in the rare disease space and are increasingly being leveraged to comply with regulatory requirements.

1. Introduction

Real-world data (RWD) are routinely collected data relating to a patient’s health status or the delivery of health care from a variety of sources other than traditional clinical trials [Citation1–3]. RWD have increasingly been used to generate evidence on the use and outcomes of drugs [Citation4]. The use of RWD for research purposes is not new and has been a cornerstone of pharmacoepidemiology for decades. The growth of electronic health records and advances in technology, including the use of the cloud, together with the 21st Century Cures Act in the U.S.A., have further increased the importance of RWD for faster drug development and/or for providing evidence on the effectiveness and safety of pharmaceuticals. This trend accelerated during the coronavirus disease 2019 (COVID-19) pandemic, when rapid regulatory decisions were needed to control or manage the pandemic.

Recently, the US Food and Drug Administration (FDA) as well as European investigators have published definitions of real-world evidence (RWE) and multiple guidance documents on the use of RWE for regulatory purposes. According to the FDA, RWE is defined as ‘the clinical evidence regarding the usage, and potential benefits or risks, of a medical product derived from analysis of RWD’ [Citation1], while in Europe, RWE is defined as ‘the information derived from the analysis of RWD’ [Citation3]. In earlier stages of the drug lifecycle, RWE enables quantifying and characterizing the target population, understanding the disease characteristics, contributing to the planning of clinical trials, and complementing risk-benefit assessment [Citation5]. In later stages, RWE makes it possible to confirm drug effectiveness in real-world patient populations, to identify subgroups, and to assess drug safety.

Drugs for rare diseases (RDs) have specific evidence needs, including prevalence estimates to meet regulatory thresholds for orphan drug designation (ODD). The ODD was implemented by regulatory agencies to stimulate the development of drugs for patients with RDs [Citation6,Citation7]. However, each regulatory agency sets its own prevalence threshold for considering a disease rare, along with other requirements [Citation8]. For example, the European Medicines Agency (EMA) requires that the prevalence of the condition in the EU should be less than 5 in 10,000 to be considered rare or ‘it must be unlikely that marketing of the medicine would generate sufficient returns to justify the investment needed for its development; the product must be intended for the treatment, prevention or diagnosis of a disease that is life-threatening or chronically debilitating; and there is no satisfactory method of diagnosis, prevention or treatment of the condition concerned can be authorized, or, if such a method exists, the medicine must be of significant benefit to those affected by the condition’ [Citation7]. The FDA defines a condition as rare if it affects <200,000 persons in the US population [Citation6].
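The two prevalence thresholds quoted above are expressed differently (a rate per 10,000 for the EMA, an absolute count for the FDA), so a small worked example may help. The following is a minimal sketch that checks only the prevalence criterion as worded in this article; the function names and the example figures are hypothetical, and the other conditions for orphan designation are not modeled.

```python
def prevalence_per_10k(cases: int, population: int) -> float:
    """Return prevalence expressed as cases per 10,000 persons."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 10_000


def below_ema_threshold(cases: int, population: int,
                        limit_per_10k: float = 5.0) -> bool:
    """Prevalence criterion as stated in the text: fewer than 5 in 10,000."""
    return prevalence_per_10k(cases, population) < limit_per_10k


def below_fda_threshold(us_cases: int, limit: int = 200_000) -> bool:
    """Criterion as stated in the text: fewer than 200,000 persons in the US."""
    return us_cases < limit


# Hypothetical figures for illustration only (not real epidemiological data).
example_cases = 45_000
example_population = 450_000_000
print(round(prevalence_per_10k(example_cases, example_population), 3))
print(below_ema_threshold(example_cases, example_population))
print(below_fda_threshold(example_cases))
```

Because one threshold is a rate and the other a count, the same number of cases can be classified differently depending on the size of the reference population.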

It is challenging to generate robust RWE for RDs, which encompass a vast range of different conditions that are frequently genetic and present in early childhood. Given the very small number of patients (ultra-rare diseases may affect as few as 1 person per 50,000 [Citation9]) and their wide geographical dispersion, it is difficult to comprehensively identify cases and obtain adequate samples to assess drug efficacy and safety using study designs that are standard for more common diseases. Moreover, RDs are frequently underdiagnosed due to heterogeneity in presentation, course, genetic profile, and limited clinical expertise outside a few specialized centers. Additionally, the lack of treatment standards hinders the definition of a comparator group. Given the restrictions in generating robust efficacy and safety data in the development phase, post-approval studies for orphan drugs play an increasingly important role for these medications [Citation10]. Patient registries are often established to evaluate post-approval safety and up to 67% of drugs approved by EMA with the request of a registry had orphan designation [Citation11]. This article aims to describe RWD relevant for RD research, focusing on registries, and to discuss their uses, advantages, and limitations.

2. Characteristics of registries

Rare disease-related registries are organized systems that use observational study methods to collect uniform data to evaluate specified outcomes for a population defined by a disease, condition, or exposure (including use of a specific drug), that is followed up over time [Citation12,Citation13]. Registries are established for one or more scientific, clinical, or policy purposes, including evaluation of drug safety, and their design varies accordingly [Citation12,Citation14]. There are three main kinds of registries: patient registries, disease-based registries, and product-based registries.

Patient registries have traditionally been generated by academic or research institutions and operated as a single institution or multicenter registry. Recruitment is typically clinic- and physician-based. However, the active participation of patients, family members, and/or advocacy groups has resulted in patient-powered patient registries [Citation12] such as the Italian Rare Disease Registry [Citation15]. In this case, recruitment is carried out by patient advocacy groups, or these groups facilitate recruitment and the identification of research priorities. For instance, patients can be reported to the registry by the specialist who made the diagnosis or by the treating physician, or they can enroll directly, e.g. through dedicated websites or advocacy groups; a combination of these mechanisms is frequent. Validation of diagnosis is a requirement for good data quality. The term 'patient registry' has also been used for clinical registries, clinical data registries, disease registries, and outcome registries.

A disease-based registry is a patient registry whose members are defined by a particular disease or disease-related patient characteristic regardless of exposure to any medicinal product, other treatment, or particular health service [Citation16]. Such registries include longitudinal data on patients based on a defined diagnosis, focusing on one specific RD or a group of RDs. They collect static information about demographics, family history, and genetic and clinical characteristics at study entry. The registries also collect retrospective and prospective time-varying characteristics including examinations, laboratory tests, medical therapies, and outcomes. These outcomes may include patient-reported outcomes and quality of life measures, but also changes in treatment and clinical parameters. These longitudinal data enable natural history information to emerge from patient registries. Patient data can be linked to external sources of data (e.g. mortality registries, biobanks, imaging, or pathology registries).
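The split described above between static information collected at study entry and time-varying follow-up data can be pictured as a simple record structure. The following is a minimal sketch under stated assumptions, not the schema of any actual registry; every class and field name is hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Observation:
    """One time-varying entry, e.g. an examination, lab test, therapy or outcome."""
    observed_on: date
    kind: str            # hypothetical categories: "lab_test", "therapy", "outcome"
    value: str
    retrospective: bool = False  # set when the entry predates enrollment


@dataclass
class RegistryRecord:
    """Static baseline data plus a growing list of longitudinal observations."""
    patient_id: str
    diagnosis: str
    enrolled_on: date
    sex: str
    birth_year: int
    family_history: bool
    genetic_variant: Optional[str] = None
    observations: List[Observation] = field(default_factory=list)

    def add(self, obs: Observation) -> None:
        # Entries dated before enrollment are flagged as retrospective.
        obs.retrospective = obs.observed_on < self.enrolled_on
        self.observations.append(obs)

    def timeline(self) -> List[Observation]:
        # Chronological view, the basis for natural history descriptions.
        return sorted(self.observations, key=lambda o: o.observed_on)
```

Keeping a stable patient identifier on the record is what would allow linkage to the external sources mentioned above, although linkage itself is outside this sketch.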

RD-related registries support several research and regulatory endeavors, which include informing diagnosis ascertainment and its timing with clinical/genetic research, informing therapeutic guidelines, or identifying gaps in care through natural history, prognosis, evaluation of treatment effects, assessing the safety of treatments, quantifying outcomes of care, evaluating quality of care, quality of life, and disease management. It is important to distinguish a patient registry from a registry-based study: in the latter, the data already collected in, or the infrastructure of, one or more registries are used to investigate a research question [Citation16].

Another type of registry is the product-based registry, in which data are collected on patients exposed to a specific medicinal product or substance. This type of registry is useful to collect additional information on the effectiveness, safety, and use of the product. Examples include the Italian Drug Agency’s drug monitoring registries [Citation17] and some registries developed by pharmaceutical companies to monitor the use and safety of their products [Citation18].

3. Use of registries to generate evidence on rare diseases

Registries are databases that describe the epidemiology of rare or ultra-rare conditions, which may serve the purpose of orphan drug filing, and to characterize clinical phenotypes and genotypes. Since registry recruitment usually occurs through clinicians, the diagnoses are more reliable than those identified in other RWD sources, such as electronic medical records or claims databases [Citation19]. Registries also contain much more specific information on RDs than other RWD sources. For instance, the Grace Science Foundation N-glycanase 1 (NGLY1) Registry [Citation20] collects demographics, clinical, and genetic (i.e. genetic test reports and/or biochemical testing validated via genetic counselor review) information of patients with NGLY1 deficiency (MIM #615273), an ultra-rare, autosomal recessive disorder. The analysis of registry data indicated an incidence of this disease in the US of ~12 individuals born per year, higher than prior literature reports, and identified previously unreported NGLY1 variants [Citation21].

Registries can also provide evidence on natural history and prognostic factors. For instance, evidence on overall survival and prognostic factors for overall survival of neuroendocrine neoplasms (NENs) was obtained from the European Neuroendocrine Tumour Society registry, thanks to the large-scale data collected in this multi-center and multinational patient sample with long-term follow-up [Citation22]. Another example is the use of data from the International Niemann-Pick Disease Registry to describe the natural history of Niemann-Pick Disease Type C, characterized by progressive neurological and visceral manifestations [Citation23].

Registries can also provide evidence on how a rare disease is diagnosed and treated in real-world clinical practice. For instance, EURACAN data have been used to characterize diagnostic and treatment patterns of pineal region tumors, rare intracranial tumors representing <1% of all adult intracranial tumor lesions [Citation24]. EURACAN is the European registry for all rare adult solid cancers. It was launched by the European Commission on 17 March 2017 and registers patients from a European Reference Network of more than 70 highly specialized cancer centers across 24 European countries, involving health-care providers and patient representatives. Registry data are also used to monitor drug safety in the post-marketing setting. For instance, the EURACAN registry is being used to support the characterization of the safety profile of larotrectinib (Vitrakvi), a cancer medicine for treating solid tumors that display a neurotrophic tyrosine receptor kinase gene fusion [Citation25]. In the Duchenne muscular dystrophy research space, the STRIDE Registry proved important in comparing standard of care with ataluren, a marketed drug, as requested by the EMA’s Pharmacovigilance Risk Assessment Committee (PRAC) [Citation26].

An example of a patient registry for RDs is the International Collaborative Gaucher Group (ICGG) Gaucher Registry [Citation27]. Established in 1991, the ICGG Gaucher Registry includes longitudinal data on an estimated 12,000 patients with a confirmed diagnosis of Gaucher disease, irrespective of their treatment [Citation28]. Thanks to the broad international coverage and long-term follow-up, studies based on that registry have contributed greatly to important insights into this lysosomal storage disorder [Citation29–32]. For instance, disease presentation has been better characterized in even smaller sub-groups [Citation31,Citation32], and long-term real-world drug effectiveness and safety have been monitored [Citation30,Citation33]. Moreover, methodological issues related to the use of registry data for case–control studies have been explored [Citation34].

International networks of patient registries have been created, for instance, in the context of the Translational Research in Europe for the Assessment and Treatment of Neuromuscular disease initiative, known as TREAT-NMD [Citation35]. Initially established as an EU-funded ‘network of excellence’, it encompasses patient registries for several NMDs. This initiative has a strong focus on the harmonization of data collection, common datasets focused on trial planning, feasibility, and recruitment, as well as gene-specific/locus-specific registries that collect genetic and clinical information, making it an important resource for genotype–phenotype correlation and natural history studies.

An example of a public health/surveillance registry is the National Data Bank for Rare Diseases (Banque Nationale de Données Maladies Rares, BNDMR) [Citation36]. This registry collects a minimum standardized set of information at the national level including clinical data. On the other hand, the urea cycle disorders consortium (UCDC) is one of the original networks supported by the Rare Diseases Clinical Research Network at the US National Institutes of Health (NIH) since 2008. UCDC originally included four US clinical sites and has expanded to include 16 sites in the U.S.A., Canada, and Europe with research supported by NIH, foundation, and philanthropic funding [Citation37]. This consortium supports the collection of natural history data from participants with congenital deficiencies impairing urea biosynthesis. To date, this longitudinal study includes information on over 800 participants. Participants provided longitudinal data including any hospitalization episode for hyperammonemia, results of age-adapted neuropsychological tests, laboratory measurements, and medical interventions or therapies including nitrogen scavenger drugs, nutrition, and liver transplantation. This longitudinal study has provided important information on the prevalence of these deficiencies (estimated at 1:14,000 to 1:350,000 depending on the deficiency type), the frequency of outcomes such as mortality, hospitalization, and neuropsychological deficits, and the management of the disease, such as therapies used, neuropsychological patterns, and possible triggers for hyperammonemic episodes. The consortium has also supported multiple studies within the participating centers that collect additional data to assess specific aims, including developing better diagnoses or assessing the effectiveness of novel urea cycle disorder therapies. Of note, inclusion in a registry is subject to legal and ethical obligations, including patients’ informed consent and data protection.

4. Strengths and limitations of registries

RD patient registries provide information on the descriptive epidemiology, in terms of population-based prevalence and incidence estimates, geographical distributions, and temporal trends, provided that cases are comprehensively registered, recruitment is complete and not affected by selection (e.g. self-selection of research-active patients), and the population is well defined and has a denominator. Patient registries whose main purpose is public health surveillance are usually suited to this aim, but the lack of clinical and follow-up information may limit their ability to study natural history or to implement analytic studies [Citation9].

Disease-based patient registries that collect detailed clinical data and follow-up are useful to characterize the disease and its natural history. There are several advantages over health-care databases. First, the population in a registry is well-defined with consensus-based clinical definition and ascertainment of eligibility criteria. In contrast, a healthcare database has to rely on reported diagnosis codes that may not be specific enough to identify the rare diseases. Second, registries may collect outcome information not routinely available in healthcare databases (e.g. functional status, pain, and patient-reported outcomes). Finally, although the data in a registry may come from multiple sources, data are collected in a standardized way with standardized protocols, visit schedules, procedures, and a central database. In contrast, healthcare databases rely on standards of care that may vary from one healthcare practice to another. For the same reasons, disease registries collecting detailed clinical and follow-up information offer advantages over retrospective chart reviews, which extract clinical information recorded in a non-uniform way across different centers; registries also reduce the work and time needed to retrieve retrospective patient data.

Patient registries may also facilitate the planning and carrying out of clinical trials. For instance, the TREAT-NMD network implemented patient registries as a means to facilitate recruitment of participants for clinical trials [Citation35]. Furthermore, observational studies (e.g. cohort or case–control studies) may be implemented in registries, for instance, to evaluate post-marketing drug safety. Regulatory agencies, however, foster the establishment of registries not restricted to users of a single medication but also including untreated patients or those receiving another therapy to enable comparative assessment of drug outcomes. This is not without its challenges. There may be competition with sponsors in drug research and de novo registry establishment for rare diseases. In addition, it is challenging to have registries that meet the needs of multiple stakeholders. It is also difficult to adopt different data collection strategies to accommodate sponsor objectives within a single RD registry, which may not even have been created for that purpose.

Recruitment plans are needed that ensure the inclusion of cases with a validated diagnosis and cover the entire spectrum of disease, avoiding systematic selection or exclusion of subgroups (e.g. more complicated, more advanced or severe diseases, and older patients). To capture the entire spectrum of the disease, RD registries often need to be global and recruit cases from a multiplicity of centers worldwide. Additionally, the follow-up should be long enough and complete for as many patients as possible to capture the entire spectrum of disease courses and outcomes. It can be challenging to retain patients in the registries, particularly for RDs that manifest in childhood, because there is a need to follow the patient into adulthood.

However, registries have several limitations. Depending on the size of the registry and/or the catchment area, scientific data even at the aggregate level can potentially be unpublishable because the small number of patients could make the data identifiable. This may limit direct scientific public outreach. Another limitation might be the lack of detailed information on concomitant drugs, comorbidities, and healthcare utilization (e.g. hospital/emergency department admission), which are potentially more accurately captured in claims and EHRs than in registries. Finally, registries capturing only cases referred at the discretion of a health-care professional may underestimate the number of patients with a specific RD. There are other limitations which are specific to each registry, depending on the data collection process and the quality/completeness of the data collected. It is important to have high levels of transparency about a registry’s data quality [Citation15]. It is possible that competing registries lead to poor data collection in one of the registries. This could happen if data collection is mandatory in more than one registry.
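The identifiability concern for small aggregate counts raised above is commonly handled by masking small cells before publication. The following is a minimal sketch of that idea, offered as an illustration rather than a method described in this article; the function name, the threshold of 5, and the choice to report zeros unmasked are all assumptions, and actual disclosure rules vary between registries and jurisdictions.

```python
from typing import Dict


def suppress_small_cells(counts: Dict[str, int],
                         min_cell: int = 5) -> Dict[str, str]:
    """Mask non-zero counts below `min_cell` (hypothetical threshold).

    Zero counts are reported as-is here, which is an assumption; some
    disclosure policies mask zeros as well.
    """
    masked = {}
    for group, n in counts.items():
        if 0 < n < min_cell:
            masked[group] = f"<{min_cell}"
        else:
            masked[group] = str(n)
    return masked


# Hypothetical regional counts, for illustration only.
regional_counts = {"Region A": 12, "Region B": 3, "Region C": 0}
print(suppress_small_cells(regional_counts))
```

This sketch does not handle complementary suppression: if row or column totals are also published, a masked cell may still be recoverable by subtraction.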

5. Regulator initiatives and other relevant recommendations or guidelines

The EMA Patient Registry Initiative, launched in 2015, aims at improving the use of existing registries for regulatory decision-making and to facilitate the establishment of high-quality new registries if no adequate source of post-authorization data already exists [Citation38]. The EMA has also created a specific Task Force to leverage and optimize patient registries [Citation39]; a guideline for patient-based registries was published by EMA in 2021 [Citation16]. The FDA also published guidance on patient registries in 2021 [Citation40].

Specifically for RDs, the International Rare Diseases Research Consortium (IRDiRC) recommended that RD patient registries should be broad in scope rather than focused on a single therapeutic product [Citation41]. The IRDiRC recommendations aim to increase the quality of patient registries and their utility in facilitating the development of new therapies for RDs. For instance, they highlight the importance of harmonization between registries (e.g. regarding coding systems, geographical coverage, and type of data collected), of interoperability and data linkage with other data sources (e.g. biobanks, genetic data, and other databases), and of assuring data quality and long-term sustainability. Useful resources to identify RD registries are the Registry of Patient Registries (RoPR) and Orphanet. The European Network of Centres for Pharmacoepidemiology and Pharmacovigilance inventory of disease registries and the EC JRC European Platform on Rare Disease Registration are further sources to identify patient registries. Recommendations on rare disease registries were also developed by EURORDIS-NORD-CORD, which lists 10 principles that should guide the development and use of rare disease registries [Citation42].

6. Conclusion

Conducting research among rare disease patients can be challenging due to the difficulty of identifying or recruiting patients, especially for ultra-rare diseases. Registries are an important way of addressing these challenges, whether through disease registries, product registries, or more general patient registries. However, the use of registries to generate RWE requires careful consideration of the strengths and limitations of each data source and the appropriateness of study designs to apply. Although registries are useful to identify rare disease patients, the creation of consortia and other types of scientific collaborations is essential in order to recruit greater numbers of patients and to develop and standardize methods that are increasingly adapted to generating RWE from these data sources.

7. Expert opinion

The use of registries for rare diseases is expanding around the world. With this increased interest there is a risk of fragmentation in the data landscape, leading to duplication of registries and multiplication of small registries of undefined quality. Recently, there have been concerns because of the burden that registries can produce on the healthcare system. These problems have been recognized, and measures are being taken to prevent them. In Europe, there was a distinct move from the work of individual clinical or research groups to much larger national groups, known as European Reference Networks (ERN) [Citation43]. Some of these groups, including the German Network for Mitochondrial Disorders and the German Network for Leukodystrophies, have developed registries for these diseases. Even larger, international registries have been created using funds from the European Commission, such as EUROGLYCANET, recruiting patients with congenital disorders of glycosylation and the European Rare Kidney Disease Registry. In Europe, the registry landscape is now considered to be much less scattered than it was before. However, this does not say much about the quality of existing registries. The European Commission has developed a set of basic data elements needed for a registry [Citation44], but to our knowledge, large-scale screening for compliance has not been done. This could be an important next step to guarantee the quality of the registries being used for rare disease research. Registries still have limitations related to rare disease research that have not yet been addressed. One important limitation concerns the coding of rare diseases. The lack of a dedicated and widely used coding system means that it may be difficult to derive epidemiological estimates from any data source where a rare disease is not coded well [Citation45]. In order for registries to be truly interoperable, future work should focus on mapping rare disease registries to a single coding system.

Another area that deserves more attention is the comparison of differences and similarities between types of registries. In this paper, registries were presented as three distinct types (patient registries, disease registries, and product registries) and indeed there are several differences between each type of registry. Patient registries encompass all therapeutic areas including rare diseases considering the patient’s perspective, and they might focus on one disease, or on a family of related rare diseases. Both patient and disease registries might lack information on treatment, while this is a core component of drug registries; whether it is collected will depend on the objective of the registry. Furthermore, the source of data for patient registries and disease registries might be different. For patient registries, the source of data may be patients directly, while for a disease registry, the source of data might be electronic medical records, HCPs, patients, or others. There are similarities between the three registry types that create some overlap. Patient registries and disease registries are arguably similar, with the latter being a more focused version of the former. If such registries contain consistently recorded information on RD treatment, an overlap with product registries occurs. In this sense, some registries can potentially be more correctly referred to as hybrid registries. The reason to reflect on the classification of a registry is that the selection of the registry would depend on the objective for which that registry was created and the perspective of interest, and this could impact the quality standards and expectations that go with it.

Article highlights

  • Research on rare diseases is fraught with challenges, particularly concerning the identification and recruitment of patients for studies investigating drug efficacy/effectiveness and safety.

  • Registries are a source of real-world data (RWD) that are useful to address some of the challenges associated with research on rare diseases.

  • There are at least three types of registries: patient registries, disease registries, and product registries.

  • The main strength of registries is that they are more likely than other RWD sources, such as electronic medical records or administrative databases, to have the correct diagnosis.

  • An important limitation of registries is the burden that they can produce on the healthcare system, particularly for physicians, pharmacists, drug distributors, regulators, and manufacturers, in terms of data entry, data management and monitoring.

  • The role of registries is increasing in importance within a regulatory setting, where they are being leveraged by the European Medicines Agency and the Food and Drug Administration.

Declaration of interests

F Pisa is a member of the International Society of Pharmacoepidemiology (ISPE)’s Special Interest Group (SIG) on Rare Diseases and is employed by Bayer. A Arias is a member of the ISPE SIG on Rare Diseases. Emily Bratton is the Chair of the International Society of Pharmacoepidemiology’s Special Interest Group on Rare Diseases, adjunct Professor at the University of North Carolina – Chapel Hill (Epidemiology Department) and is employed by IQVIA. M Salas is a member of the ISPE SIG on Rare Diseases and is employed by Daiichi Sankyo Inc. J Sultana is Vice-Chair of the ISPE SIG on Rare Diseases and carries out consultancies for the University of Exeter and the University of Cambridge in therapeutic areas not related to rare diseases. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or material discussed in the manuscript.

Reviewer disclosures

Peer reviewers on this manuscript have no relevant financial or other relationships to disclose.

Author contributions

FP conceived the paper. FP, JS, EB, MS, and AA prepared the first draft and made critical revisions.

Acknowledgments

The authors would like to thank Dr Rima Izem for her contribution to the scientific content of the paper.

Additional information

Funding

This paper was not funded.

References