Review

Harnessing Real-World Data for Regulatory Use and Applying Innovative Applications

Pages 671-679 | Published online: 22 Jul 2020

Abstract

A vast quantity of real-world data (RWD) are available to healthcare researchers. Such data come from diverse sources such as electronic health records, insurance claims and billing activity, product and disease registries, medical devices used in the home, and applications on mobile devices. The analysis of RWD produces real-world evidence (RWE), which is clinical evidence that provides information about usage and potential benefits or risks of a drug. This review defines and explains RWD, and it also details how regulatory authorities are using RWD and RWE. The main challenges in harnessing RWD include collating and analyzing numerous disparate types or categories of available information including both structured (eg, field entries) and unstructured (eg, doctor notes, discharge summaries, social media posts) data. Although the use of artificial intelligence to capture, amalgamate, standardize, and analyze RWD is still evolving, it has the potential to support the increased use of RWE to improve global health and healthcare.

Introduction

In the healthcare industry, real-world data (RWD) comprise data regarding patient health and/or healthcare delivery, obtained outside of randomized, controlled trials (RCTs).Citation1,Citation2 RWD may be derived from electronic health records (EHRs), insurance claims and billing activity, product and disease registries, medical devices used in the home, and applications (apps) on mobile devices (Figure 1). The nature and sources of RWD provide the potential for large volumes of data to be collected, often in near-real time (eg, data streamed from mobile devices), that can provide information at a granular or large-scale level. For example, the MyHeart Counts Cardiovascular Health Study, which collected data from 48,968 individuals, demonstrated that a study using smartphones to measure physical activity and cardiovascular health is feasible and provides a means of collecting large amounts of data in real time.Citation3 The analysis of RWD produces real-world evidence (RWE), which is clinical evidence that provides information about the usage and potential benefits or risks of a drug product.Citation4

Figure 1 From real-world data to real-world evidence, key capabilities such as health economics outcomes research, real world evidence and population health.

Abbreviations: FDA, US Food and Drug Administration; WHO, World Health Organization.


RWE may be generated from different study designs or analyses, including randomized (non-controlled) trials, large simple trials, pragmatic trials (ie, trials that incorporate elements of routine clinical practice), and prospective or retrospective observational studies.Citation5 Because RWE can be derived from a variety of sources, its use provides an opportunity to combine diverse datasets and gain a broader understanding than might be available through RCTs alone.Citation2

In order to obtain reliable and useful evidence, RWD must be collated and analyzed using accurate and robust algorithms tailored to the specific contexts. The responsible use and application of RWD requires addressing issues around data standardization, integration, quality control, access, privacy, and security. For example, RWD may be available from multiple repositories and registries, with each source storing data in separate clouds (ie, siloed) and/or in different languages.

Data from multiple sources may include disparate types or categories of information, both structured (eg, field entries) and unstructured (eg, doctor notes, discharge summaries, social media posts).Citation6 Data aggregation and translation are, therefore, necessary before analyses can be performed on such data: database or repository systems must be interoperable, and varied data formats must be translated and integrated. Data analysis must also take into account the level of data accuracy (eg, correct identification of all relevant patients, absence of data entry errors), miscoding of claims, and variations across health systems in data coding practices.Citation7 To leverage data and gain actionable insights, artificial intelligence (AI) capabilities such as machine learning, deep learning, and other algorithms can be useful.

This review article serves as an introduction for readers who would like to learn more about RWE and the potential for AI to aid the capture and analysis of RWD. Additional information and details on how RWE can provide key information on “the safety and effectiveness of a medication in large, heterogeneous populations” are provided in an earlier review.Citation8

RWE for Regulatory Purposes

Across the globe, regulatory agencies and pharmaceutical companies increasingly embrace RWE and AI for their ability to support a more comprehensive picture of health.Citation9 In the United States, the Food and Drug Administration (FDA) 21st Century Cures Act, passed in 2016,Citation4 and the Prescription Drug User Fee Act VI (PDUFA VI)Citation10 have both expanded the opportunities for the use of RWE in the healthcare and pharmaceutical industries.

Data from RCTs, and the synthesis of these data in systematic reviews and meta-analyses, remain the gold standard for evidence, but the restrictions of trial design and patient inclusion criteria mean that the data produced in such trials are not necessarily applicable to real-world settings.Citation2 RWE also has the potential to help identify rare adverse events.Citation11 Therefore, the FDA considers RWD in its evaluation of medical product safety, and occasionally to inform decisions about efficacy.Citation12 In one case, RWD were also used in Japan to support regulatory decision-making around lorazepam when a small clinical trial failed to meet the primary endpoint due to low enrollment, yet the secondary endpoint and real-world usage supported use of the drug in the target population.Citation13 However, concerns remain about the use of RWE in studies in which the effectiveness of a treatment (or its safety signals) is small or negligible.Citation11

There is a relatively long historical precedent for using RWD for safety purposes,Citation14 whereas the FDA has only more recently established a framework for the use of RWD for generating evidence of effectiveness. The agency has developed a set of guidance documents, listed chronologically below:

  • Use of RWE to support regulatory decision-making for medical devices (August 2017)Citation15

  • Use of EHR data in clinical investigations: Guidance for Industry (July 2018)Citation16

  • Framework for FDA’s real-world evidence program (December 2018)Citation17

  • Submitting documents using RWD and RWE to FDA for drugs and biologics: Guidance for Industry (May 2019).Citation18

Additional perspectives outside of the United States include:

  • Regulatory perspective on RWE in scientific advice (European Medicines Agency; April 2018)Citation19

  • Key considerations in using RWE to support drug development (China Center for Drug Evaluation; May 2019).Citation20

From a regulatory perspective, RWE may now be used to support label expansion (including new indications, dosages, and patient sub-populations), to assess the feasibility of Phase IV studies, to provide evidence for products in an expedited approval pathway, to provide a historical control arm for clinical studies, and in pragmatic trials.Citation5 Indeed, there is a growing list of drug approvals and labelling changes that have utilized RWE and data from non-traditional study designs in their regulatory submissions.Citation21 For example, an expansion in the indication of palbociclib was supported by RWD from EHRs and insurance claims.Citation21,Citation22 Such changes are the result of ever-greater amounts of health-related data being stored by computers, mobile and wearable devices, biosensors, and medical devices.Citation1

A survey conducted in 2017 among life science companies showed that more than half (54%) of respondents were providing significant investment to increase their RWE programs,Citation23 suggesting that the use of RWE is likely to grow in the near future. Subsequent surveys have demonstrated that 90% of respondents from pharma companies are looking to build their RWE capabilities across all of the product life cycle, through preclinical, clinical, approval, and post-approval stages. However, fewer than half (45%) of pharma companies felt that they currently had this sort of end-to-end RWE capability.Citation24 The most recent survey (conducted in January and February 2020) revealed that 94% of respondents believe that the use of RWE in research and development will be important or very important to their organization in the next three years. The importance of RWE was thought likely to move from post-approval value demonstration to research and development, notably in supporting regulatory filings and augmenting clinical trials.Citation25 The opportunity to harness RWD for regulatory purposes will grow further as AI experts refine the necessary software and statisticians develop new algorithms and analytical skills that refine the methods of data analysis.

For RWE to be efficiently used for regulatory purposes, certain conditions must be in place and standards met. For example, standards must be established regarding acceptable data quality, sources, and analytical methods. A clearer understanding is also needed around the ways in which RWE can be used to inform about alternative dosing schedules, benefits for patient subpopulations, and targeted/precision medicine strategies. From a practical perspective, basic infrastructure must be in place to collect RWD in any countries that would use RWE. This is particularly important in developing countries, in which EHRs or claims information may not be widely used or where patient registries collect disparate information. Similarly, trials conducted in real-world settings require sufficient patient participation to generate meaningful RWD. Where RWD are available, advanced, validated analytical methodologies are also needed to generate accurate and informative RWE. The use of standardized methods for both collecting and analyzing data will improve reproducibility and comparability of real-world studies. The drive for more standardization has been led by organizations such as the Observational Health Data Sciences and Informatics group (OHDSI).Citation26 The OHDSI has produced a common-data model, which is used to encode and store clinical data in a standardized manner. Use of common-data models allows the same question to be addressed consistently across different databases and countries, thus making large-scale international observational research feasible.Citation27 Finally, the findings of RWE analyses and related information must be communicated in a credible and comprehensible way to patients, the medical community, and any other stakeholders (eg, policymakers).
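The value of a common-data model can be illustrated with a minimal sketch. The example below maps records from two hypothetical sources, each with its own field names and diagnosis codes, onto one shared record layout so that the same question can be asked of both. The source names, field names, and the `STANDARD_CONCEPTS` lookup are assumptions made for illustration only; they are not the OHDSI model or its vocabulary.

```python
from dataclasses import dataclass

# Hypothetical lookup from (source system, local code) to a shared concept name.
# A real common-data model relies on a curated, versioned vocabulary instead.
STANDARD_CONCEPTS = {
    ("ehr_a", "HTN"): "hypertension",
    ("claims_b", "401.9"): "hypertension",
    ("ehr_a", "T2DM"): "type 2 diabetes",
    ("claims_b", "250.00"): "type 2 diabetes",
}


@dataclass(frozen=True)
class ConditionRecord:
    """Shared layout used for every source after mapping."""
    person_id: str
    concept: str
    source: str


def to_common_model(source, row):
    """Map one source row onto the shared layout; return None if the code is unmapped."""
    if source == "ehr_a":
        person_id, code = row["patient"], row["dx"]
    elif source == "claims_b":
        person_id, code = row["member_id"], row["diagnosis_code"]
    else:
        raise ValueError(f"unknown source: {source}")
    concept = STANDARD_CONCEPTS.get((source, code.strip()))
    if concept is None:
        return None
    return ConditionRecord(person_id=person_id, concept=concept, source=source)


def count_by_concept(records):
    """Answer the same question (records per condition) across all sources."""
    counts = {}
    for record in records:
        counts[record.concept] = counts.get(record.concept, 0) + 1
    return counts


# Invented example rows; the two sources use different field names and codes.
ehr_rows = [{"patient": "p1", "dx": "HTN"}, {"patient": "p2", "dx": "T2DM"}]
claims_rows = [
    {"member_id": "m9", "diagnosis_code": "401.9"},
    {"member_id": "m7", "diagnosis_code": "999"},  # no mapping available
]

records = [to_common_model("ehr_a", r) for r in ehr_rows]
records += [to_common_model("claims_b", r) for r in claims_rows]
mapped = [r for r in records if r is not None]
unmapped = len(records) - len(mapped)

print(count_by_concept(mapped), "unmapped:", unmapped)
```

Unmapped codes are counted rather than silently dropped, since the share of records that cannot be standardized is itself a data-quality signal.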

How Can Artificial Intelligence Be Used to Generate RWE?

To obtain high-quality RWE that is acceptable for regulatory use and other decision-making applications, RWD must be collected in large quantities, collated into an analyzable format, and checked to ensure a certain level of accuracy and reliability. To support these efforts, AI capabilities are increasingly being applied to the analysis of RWD.Citation24,Citation28 Furthermore, in a 2018 survey, 60% of pharma industry respondents were using AI in their RWE programs and 95% anticipated utilizing AI for this purpose in the coming years.Citation24 AI constitutes a combination of self-learning capabilities that mimic the way the human brain works, with the intent of replicating human decision-making and interactions.

Natural language processing (NLP), machine learning, and robotic process automation (RPA) are three AI capabilities that are particularly applicable to the generation of RWE for the healthcare industry.Citation6 NLP, the computerized interpretation and organization of human language, includes text classification, recognition of syntax, interpretation of word meaning based on location in a sentence, and language translation.Citation9 Machine learning, which includes a variety of predictive statistical and mathematical modeling techniques (Table 1),Citation9,Citation29 is often layered onto NLP to reinterpret and correct initial assumptions following repeated usage.Citation30 RPA involves software that automates repeated tasks, speeding up processes while reducing human error and oversight.Citation31,Citation32 RPA software could also be used to capture, interpret, and process patient-level data, as well as trigger responses and communicate with other digital systems.Citation33

Table 1 Ten Machine Learning Algorithm TechniquesCitation29

In the context of analyzing RWD to generate RWE, RPA can be used to extract large amounts of data from EHRs, insurance claims data, apps, and social media. However, much RWD is stored in unstructured formats, from which the relevant data must be interpreted and stored in a new, structured format.Citation6 The potential for AI tools to extract and translate large amounts of unstructured RWD for RWE generation is huge. To achieve this, RPA can be combined with natural language processing and machine learning to produce more uniform datasets containing only the information relevant to the analyses. For example, RPA might be employed to access clinical narratives in EHRs. However, because clinical narratives contain unstructured data, an NLP approach could be used to extract relevant information and translate the text (including physician shorthand) into structured field entries.Citation34 Machine learning algorithms and RPA might then be used to rapidly analyze the data from multiple perspectives. Applying a similar approach across a variety of data sources and therapeutic areas, previously unseen trends may emerge and new insights may be gained into aspects of patient health, drug product usage and safety, and healthcare resource utilization.
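The step from a clinical narrative to structured field entries can be sketched as follows. This is a deliberately simple, rule-based stand-in that uses a regular expression rather than a trained NLP model, and it handles only one pattern (drug name, dose in mg, frequency shorthand). The shorthand dictionary and the sample note are invented for illustration; a production system would need a validated vocabulary and far broader coverage.

```python
import re

# Hypothetical shorthand dictionary mapping physician abbreviations to plain terms.
SHORTHAND = {"qd": "once daily", "bid": "twice daily", "tid": "three times daily"}

# One pattern only: drug name, dose in mg, frequency shorthand,
# e.g. "metformin 500 mg bid" or "lisinopril 10mg qd".
MED_PATTERN = re.compile(
    r"\b([a-z]+)\s+(\d+(?:\.\d+)?)\s*mg\s+(qd|bid|tid)\b",
    re.IGNORECASE,
)


def extract_medications(note):
    """Turn free-text medication mentions into structured field entries."""
    entries = []
    for drug, dose, freq in MED_PATTERN.findall(note):
        entries.append(
            {
                "drug": drug.lower(),
                "dose_mg": float(dose),
                "frequency": SHORTHAND[freq.lower()],
            }
        )
    return entries


# Invented example note containing shorthand and inconsistent spacing.
note = "Pt w/ hx of T2DM. Continue Metformin 500 mg BID; start lisinopril 10mg qd."
structured = extract_medications(note)

for entry in structured:
    print(entry)
```

Anything the pattern does not recognize is simply missed, which is one concrete way in which medication information can fail to be captured from unstructured data.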

We are not aware of scenarios or studies in which all three elements (RPA, NLP, and machine learning algorithms) have been used together to analyze RWD. However, the individual elements are already being incorporated in research settings. For example, a recent study of myopia in Chinese school-aged children applied machine learning to RWD.Citation35 In this study, RWD were collected from electronic medical records across eight ophthalmic centers. The random forest method of machine learning was used to develop an algorithm that would predict myopia among children as young as 3 years of age. The algorithm was refined to improve accuracy, weeding out unnecessary data and resolving heterogeneous data, and then internally and externally validated. A separate study used hospital episode statistics from England’s National Health Service to develop a predictive machine learning algorithm to screen patients at risk of idiopathic pulmonary arterial hypertension.Citation36 In that study, a multidisciplinary team that included both clinicians and AI experts collaborated to develop the algorithm throughout numerous testing and validation stages, ultimately achieving 99.99% specificity and 14.10% sensitivity. In examples such as these, the use of machine learning allows, in the investigators’ words, the combination of “enormous numbers of predictors in a non-linear and highly interactive way”.Citation35 However, before machine learning approaches can be relied upon to produce accurate RWE, the algorithms must be tested, refined, and retested in real-world situations to ensure high accuracy.Citation37
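For readers less familiar with the two screening metrics quoted above, the sketch below shows how sensitivity and specificity are computed from the four cells of a confusion matrix. The counts are invented solely so that the arithmetic yields proportions similar to those reported; they are not taken from the cited study.

```python
def sensitivity(true_pos, false_neg):
    """Share of patients with the condition whom the algorithm flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of patients without the condition whom the algorithm leaves unflagged."""
    return true_neg / (true_neg + false_pos)


# Invented counts for illustration only (not data from any study).
true_pos, false_neg = 141, 859      # 1,000 patients with the condition
true_neg, false_pos = 99_990, 10    # 100,000 patients without the condition

print(f"sensitivity = {sensitivity(true_pos, false_neg):.2%}")
print(f"specificity = {specificity(true_neg, false_pos):.2%}")
```

A screening algorithm with this profile rarely raises a false alarm but misses most true cases, which is why both numbers need to be read together.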

Challenges with Using AI and RWD

Although AI shows promise for application in the healthcare industry, the diverse, complex, and observational nature of RWD presents challenges for data analysis.Citation9 For example, claims data are typically generated for insurance billing purposes, are not adjudicated for data quality, and can contain errors. Refinement, or training, of AI algorithms is essential to improving the accuracy of RWD analysis, and both studies described above focused on this aspect of the methodology. However, researchers must first ensure that the data obtained are complete and relevant to the condition, patient population, and treatment analyzed.Citation28 For example, unstructured data may contain relevant information for only certain sub-populations, or information may be entered for some patients but not others. Even structured data pose challenges in the application of AI to RWD, as field-entry data may be entered using inconsistent terms, may be formatted differently between sources, or may be incomplete or contain errors.Citation7 Any of these situations might lead to inaccuracy in the analyses and/or data being rejected by the algorithms.

Poor data recall and precision (ie, incorrect exclusion or inclusion of data in an analysis) also appear to pose a great challenge to the use of AI in RWD analysis. Algorithms that are not optimized might identify “false-positive” and “false-negative” data, for example by including patients who do not actually meet the intended criteria or by excluding patients who do.Citation28 A recent retrospective study of EHR data for patients with cardiovascular disease illustrates the difficulty that analysts encounter in algorithm development. The study results suggested that, despite the researchers taking various approaches toward generating the algorithms, the final model did not accurately analyze structured data, and medication information was not sufficiently captured from unstructured data.Citation28 In addition, analysts must consider that data and relevant variables may change over time, and their algorithms may also need to evolve to take this into account. To address these challenges, AI algorithms must be continually refined and their limitations understood by those who interpret the resulting RWE. Initial attempts to elevate the quality of analysis might include combining multiple data sources, the consideration of doctor shorthand in language processing, and systematic comparisons of machine learning algorithms within or across therapeutic areas.Citation34,Citation35,Citation37,Citation38
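One way to make inclusion and exclusion errors visible is to compare the patients an algorithm selects with a reference cohort, for instance one established by manual chart review. The sketch below does this with simple set operations; the patient identifiers and the existence of a chart-reviewed reference are assumptions for illustration.

```python
def compare_cohorts(selected, reference):
    """Compare algorithm-selected patients with a reference (eg, chart-reviewed) cohort."""
    selected, reference = set(selected), set(reference)
    true_pos = selected & reference    # correctly included
    false_pos = selected - reference   # included but do not meet the criteria
    false_neg = reference - selected   # meet the criteria but were excluded
    precision = len(true_pos) / len(selected) if selected else 0.0
    recall = len(true_pos) / len(reference) if reference else 0.0
    return {
        "false_positives": sorted(false_pos),
        "false_negatives": sorted(false_neg),
        "precision": precision,
        "recall": recall,
    }


# Invented identifiers for illustration.
algorithm_cohort = ["p1", "p2", "p5"]
chart_review_cohort = ["p1", "p2", "p3", "p4"]

report = compare_cohorts(algorithm_cohort, chart_review_cohort)
print(report)
```

Listing the individual false positives and false negatives, rather than reporting only summary rates, gives analysts concrete records to inspect when refining an algorithm.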

In some countries, data aggregation may pose an equally significant challenge. The availability of RWD to which AI tools can be applied is affected by legal barriers around data privacy (eg, the European Union General Data Protection Regulation [GDPR]),Citation39 practical barriers related to data storage across multiple organizations (ie, data silos), and economic barriers involving a lack of incentives for organizations to collaborate and share data.Citation9 Additional challenges are likely to be encountered in this fast-moving field, as new methods, applications, and learning algorithms are utilized to capture and evaluate RWD. If challenges like these are successfully met and large-volume RWD are shown to provide sufficiently accurate and comprehensive RWE, the application of AI to RWD has the potential to shorten the timeline for clinical trial design and regulatory approval, and to uncover patterns in large sets of data that would otherwise not be observed.Citation40

Collaborations and Partnerships to Harness RWD

With the growing understanding of the utility of RWE, pharmaceutical researchers have begun to partner with academia and government with the intent of advancing the use of RWD in addressing major healthcare issues.

For example, the Statistical Partnerships Among Academe, Industry, and Government Award of the American Statistical Association was established in 2002 to recognize outstanding partnerships between academia, industry, and government organizations, as well as to promote new partnerships among these organizations.Citation41 The award also identifies key individual contributors. A number of companies and universities have formed fruitful public-private partnerships in statistics, analytics, and data science.Citation41 In addition, 80% of respondents from the pharmaceutical industry who completed a survey in early 2020 reported that they are entering into strategic partnerships to utilize new sources of RWD.Citation25 Furthermore, some of the studies outlined in this review have involved partnerships between pharmaceutical and academic researchers.Citation28,Citation36

In order for these partnerships to be successful, all parties must view the collaboration as a long-term relationship and work together to evolve ideas from the stage of problem identification to that of solution implementation. In the context of applying AI to RWE generation, the process might involve developing innovative statistical methodology to solve the novel statistical problems that emerge in each unique context. For example, data-driven partnerships can play an important role in improving the management of diseases in patients across diverse geographical settings.Citation42,Citation43

AI capabilities such as RPA, NLP, and machine learning will necessarily be employed to facilitate the analyses of data obtained from any studies conducted through these partnerships, with careful attention paid to the technologic and statistical methods used.

The Future of AI for RWE

Although RWE has long been used to aid the understanding of patients, health conditions, and healthcare resource usage, its use in a regulatory capacity is in its infancy. Both those who generate RWE and those who interpret and use it in a practical sense must keep in mind the limitations of the source data and analytical approaches used.Citation2 For this reason, as with data obtained from RCTs, transparency of methodology and the use of methodological best practices will be essential.Citation9 All parties will benefit from a clear understanding of the analytical methodology and of the extent to which conclusions can be drawn. Because of the ready availability (and relatively low cost) of RWD relative to data from RCTs, RWE may be increasingly relied upon by both the pharmaceutical industry and regulatory agencies to inform decision-makers on the clinical value and benefit of interventions.Citation2 However, data access, privacy, and security will remain key concerns and must be addressed with each new RWD analysis. Data quality and bias will also remain key issues for interpreting the generated RWE, and potential confounding factors must be communicated.Citation12,Citation44 For regulatory purposes, early engagement with regulators will support subsequent efforts to obtain and analyze RWD. Finally, in an era of digital innovation, AI will enable extensive collection, aggregation, analyses, and interpretations to generate RWE.Citation24,Citation45

Conclusions

In practical terms, the use of AI in an era of “Big Data” and RWD is still evolving but has great potential to support the increased use of RWE to improve global health and healthcare (Figure 2). The limitations of AI software and methodology will need to be considered while continual efforts are made to improve the capabilities of the AI tools. Cross-disciplinary expertise will also be necessary to ensure that the software and subsequent refinements are tailored to each analytical approach. Careful attention must be given to the application and evolution of AI technology in the context of RWD. If this is done, researchers, regulators, and other stakeholders may benefit by more clearly understanding patterns in patient treatment and behavior, disease progression, and resource usage. This will support the ultimate goal of influencing positive patient outcomes in real time.

Figure 2 Benefits achieved with real-world evidence and AI.

Abbreviation: AI, artificial intelligence.


Disclosure

Kelly H Zou, Jim Z Li, Joseph Imperato, Chandrashekhar N Potkar, Nikuj Sethi, and Amrit Ray are employees of the Upjohn Division, Pfizer Inc. Jon Edwards is an employee of Envision Pharma Group. Envision Pharma Group has a consultancy agreement with Pfizer Inc. The views expressed are their own and do not necessarily represent those of their employers. The authors report no other conflicts of interest in this work.

Additional information

Funding

This work was funded by the Upjohn Division, Pfizer Inc.

References