Editorial

Natural language processing – relevance to patient outcomes and real-world evidence

Pages 5-9 | Received 07 Jul 2023, Accepted 23 Oct 2023, Published online: 30 Oct 2023

Amidst all the recent publicity and debate around the societal risks posed by artificial intelligence, longstanding developments in the field are maturing that could transform healthcare research in more positive ways, in particular by enhancing the information available for analysis from routine health records.

1. The value/role of routine healthcare data

Research into patient outcomes and real-world evidence requires data from people using health services. The choice at the outset is whether to collect this de novo or to use information that is already routinely collected. The first approach is adopted by clinical cohort studies and trials in which patients are systematically approached, recruited, and examined over the study period with research-grade information collection. Data generated can attain high levels of accuracy, objectivity, and standardization; however, these studies are expensive and can generally only be sustained to answer specific research questions. Furthermore, the recruitment and follow-up processes often result in samples that do not represent their source populations; indeed, it is reasonable to assume that groups who stand to benefit most from health research are least likely to be represented as research participants.

Despite self-evident challenges of information availability, standardization, and accuracy, routine healthcare data capture much more generalizable clinical populations, particularly groups who may be marginalized as a result of their socioeconomic background or the severity of their illness. In addition, these information sources capture ‘real-world’ experiences of healthcare, outside the rarefied and highly standardized environments of clinical trials. Finally, the sample sizes generated from a health record, even over a small geographic area, are likely to be several orders of magnitude greater than what can feasibly be assembled in a recruited research cohort. Therefore, despite important limitations, health records provide real-world evidence through the detailed descriptions of individual-level care episodes, and a longstanding dream has been for a health service that genuinely learns from its records in the same way that its staff learn from their experiences of clinical practice.

2. Opportunities arising from electronic health records

Although there have been attempts to systematize and derive new knowledge from healthcare for almost as long as medicine has existed as a discipline, the potential for a genuinely learning system has leapt forward with the widespread introduction of electronic health records over the last 10–20 years. As healthcare information becomes digital at source, and now that compute capacity has become large enough to handle the volumes of data generated, the size and depth of health records databases can finally be utilized both for knowledge generation and potentially for translation of that knowledge back to source services and the clinical interface. The healthcare that someone receives in 2023 could therefore genuinely be informed by all the recorded experiences of care provided in similar circumstances over the previous decade or more; in turn, that person’s experiences could contribute to improving the care of someone else in 2033.

Clearly, learning healthcare is in its infancy and we are some distance from realizing this potential; however, it remains a potentially achievable scenario at some point in the future. One of the key challenges lies in the dependence of learning algorithms (or indeed of any research study seeking to use routine data sources) on healthcare information that is available in an accessible format. Although valuable information is captured by structured fields in the source record (e.g. prescribing data, blood assays, imaging, and healthcare episodes), many features and occurrences are traditionally recorded in text, and these include much of the data required for effective real-world evidence. For example, a diagnosis may well be represented as a structured entity, quite often made more amenable to analysis through the use of a standard coding system; however, information on the symptoms experienced, their duration, their impact, and any change in their intensity following an intervention is much more likely to lie in text. The application of natural language processing (NLP) to healthcare data has therefore been an important prerequisite for unlocking this potential.

3. Natural language processing (NLP)

NLP is a theoretically motivated range of computational techniques for analyzing and representing naturally occurring texts at one or more levels of linguistic analysis for the purpose of achieving human-like language processing for a range of tasks or applications [Citation1]. NLP has been in use since the 1950s as an intersection of linguistics and artificial intelligence [Citation2]. While it began as mostly rules-based, following grammatical syntaxes and patterns of regular expressions, statistical approaches to modeling language have come to dominate the field, most recently through the use of artificial neural networks to construct large language models – models of word occurrence comprising millions or billions of parameters, learned from very large text collections [Citation3]. These advances have largely been driven by the availability of big data, access to more computational power, and innovations in machine learning methods [Citation4,Citation5]. Methods utilizing such big data from electronic health record (EHR) text have been increasingly used for downstream tasks such as text classification [Citation6–8], prediction of medical outcomes [Citation9–12], and named entity recognition [Citation13,Citation14], the latter increasingly used to aid clinical coding by linking mentions of medical concepts in text to terminologies, such as the widely adopted SNOMED CT [Citation15,Citation16].
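
To make the rule-based end of this spectrum concrete, the sketch below shows a few lines of pattern matching that pull a numeric measurement and a negation-aware symptom mention out of note text. It is purely illustrative: the note fragments, symptom term, and negation cues are invented for this editorial and do not correspond to any published algorithm.

```python
import re

# Invented note fragments for illustration only.
notes = [
    "BMI recorded as 31.2 today; patient reports low mood for three weeks.",
    "No evidence of low mood. BMI 24.8.",
]

# Rule-based extraction: a regular expression for a measurement, plus a very
# simple negation check for a symptom term (both patterns are illustrative).
BMI_PATTERN = re.compile(r"\bBMI\b\D{0,20}(\d{2}(?:\.\d)?)", re.IGNORECASE)
SYMPTOM = "low mood"
NEGATION_CUES = ("no evidence of", "denies", "not")

for note in notes:
    lowered = note.lower()
    bmi_match = BMI_PATTERN.search(note)
    negated = any(f"{cue} {SYMPTOM}" in lowered for cue in NEGATION_CUES)
    print({
        "bmi": float(bmi_match.group(1)) if bmi_match else None,
        "low_mood_present": SYMPTOM in lowered and not negated,
    })
```

Rules of this kind are transparent and fast but brittle, which is one reason statistical and neural approaches have come to dominate the field.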

Since electronic health records were not designed for research purposes, it can be challenging to access any information that is stored as free text within these records. NLP can help overcome these challenges by converting clinical documents to analyzable data elements. NLP techniques are generally scalable, adaptable to other similar data, and make the task of data extraction and analysis less time consuming and labor intensive [Citation17].

4. A case study of applied NLP in electronic mental health records – the Maudsley CRIS platform

In two recent reviews of UK healthcare NLP [Citation18,Citation19], mental health predominated as a focus for application. This in part reflects the fact that electronic health records were mandated for many mental health services in the UK earlier than in other specialties, but it also reflects the greater need in a specialty where the pre-structured data fields available from source records are limited in scope and value (and vary substantially between providers), and where textual case summaries are traditionally much more extensive and detailed. As just one case example, the Clinical Record Interactive Search (CRIS) platform at the National Institute for Health and Care Research (NIHR) Maudsley Biomedical Research Centre has supported researcher access to deidentified electronic mental health records since 2008 [Citation20,Citation21]. In the last 10 years, over 100 NLP algorithms have been developed and applied in research projects, using a variety of techniques including pattern matching, supervised machine learning such as support vector machines, and large language models. These are described individually and in detail in an open-source online catalog but in summary have been used to ascertain diverse entities including over 60 individual symptoms of severe mental disorders [Citation22], pharmacotherapy (in the absence of a prescribing database) [Citation23], scores for blood assays, measurements (e.g. body mass index [Citation24]) and cognitive assessments [Citation25], illicit drug use [Citation26], co-occurring physical disorders [Citation27], loneliness [Citation28], and experiences of violence [Citation29]. The overarching objective has been to ‘unlock’ hitherto unavailable information covering interventions received (e.g. medication, psychotherapy [Citation30]), clinical indications for these (e.g. symptom profiles), outcomes experienced (e.g. clinical improvement, response to medication, adverse drug events), and other relevant factors predicting prognosis (e.g. social stressors, health, health-related behaviors).
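
As a flavor of the supervised machine learning approaches mentioned above, the sketch below trains a linear support vector machine to classify short note fragments by whether a symptom is asserted or negated. It is not the CRIS implementation: the handful of training snippets and labels are invented, and a real application would rely on large, clinically annotated training and evaluation sets.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented training fragments; 1 = symptom asserted, 0 = absent or negated.
texts = [
    "reports persistent low mood and poor sleep",
    "denies any low mood at present",
    "ongoing auditory hallucinations noted",
    "no psychotic symptoms elicited today",
    "describes low mood most days this month",
    "mood reported as good, sleeping well",
]
labels = [1, 0, 1, 0, 1, 0]

# Bag-of-words (TF-IDF) features feeding a linear support vector machine.
classifier = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
classifier.fit(texts, labels)

print(classifier.predict(["patient reports low mood and poor appetite"]))
```

In practice, performance would be established against a held-out, manually annotated test set before any such algorithm was applied at scale across a record database.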

Initially, these NLP algorithms involved the creation of bespoke gazetteers (e.g. medications [Citation23], measurements [Citation24]) in combination with rules-based approaches. This was soon followed by the application of machine learning-based NLP to rapidly develop algorithms for the extraction of many more targets such as interventions and symptoms [Citation22,Citation31]. More recently, models pre-trained on large quantities of general language text have been used to extract more complex constructs such as social context [Citation29]. Development has also included the infrastructure to manage the operational use of these algorithms at scale and frequency across the 30 million text documents within CRIS, as well as a prototype platform, the Mental Health Text Analytics Cloud (MH-TAC), developed in collaboration with the University of Sheffield and the University of Cambridge, which allows this functionality to be made available to other services within National Health Service-approved data transfer and processing environments. In addition, by combining NLP developed through CRIS with the live processing and visualization functionality of CogStack [Citation32], these capabilities are beginning to be translated back to the clinical interface via the VIEWER dashboard [Citation33].
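
The gazetteer-plus-rules starting point described above can be as simple as a curated term list matched against tokenized note text, as in the sketch below; the drug names listed are a tiny illustrative subset invented here, not the CRIS medication gazetteer.

```python
# A tiny, illustrative medication gazetteer; real gazetteers are curated from
# formularies and extended with brand names, misspellings, and dose patterns.
MEDICATION_GAZETTEER = {"olanzapine", "sertraline", "lithium", "clozapine"}


def find_medications(note: str) -> list[str]:
    """Return gazetteer terms mentioned in a note (simple token lookup)."""
    tokens = {token.strip(".,;:()") for token in note.lower().split()}
    return sorted(MEDICATION_GAZETTEER & tokens)


print(find_medications("Commenced olanzapine 10mg nocte; sertraline continued."))
# -> ['olanzapine', 'sertraline']
```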

5. Expert opinion

Electronic records represent an important example of ‘big data’ in health research and are unprecedented resources, the research implications of which are only beginning to be realized. The novelty here is not in the sample sizes themselves (routine administrative data have provided large samples for decades) but in the depth (granularity) of information available on these populations. Achieving usable granularity in routine information may depend substantially on extracting structure from text fields, and healthcare-applied NLP therefore has at least the potential to become a component underpinning the generation of real-world evidence.

5.1. NLP vs incorporation of standardized structured instruments

Of course, the alternative to applied NLP for enhancing healthcare data availability is to require more structured evaluation at source – for example, through the routine use of standardized assessments and other research instruments (including self-completed online questionnaires). Standardized assessments have the advantage of providing more readily available pre-structured data for analysis, as well as imposing a more consistent assessment approach at the clinical interface. However, there are substantial logistic challenges in imposing widespread structured information collection on busy clinical services (e.g. beyond the incorporation of a few symptom scales), as evidenced in the near-total collapse in the 1980s of the once-prevalent ‘psychiatric case register’ design [Citation34] which relied on this imposition, albeit in pre-digital times. Furthermore, the extent to which structured assessments from routine care can be considered ‘research-grade’ (i.e. equivalent to the same instruments administered in a research study) depends on sustainable training and motivation of staff, and an advantage of NLP is that the meaning of derived meta-data can at least be assessed against the original text, whereas the accuracy of a clinical entity recorded in a checkbox has to be taken at face value. Also, any pre-structured data may be just as subject to recording bias as text-derived data, for example as has been observed when coding incentives have been applied in UK primary care [Citation35], and structured routine data capture is likely to vary more than clinical text between different healthcare providers with different health records interfaces. Standardized assessments can overcome the important limitation of NLP in inferring an entity’s absence (e.g. the non-recording of a symptom in text may be because it was not present or because it was not enquired about); however, this advantage depends on regular repeats of such assessments, as symptom profiles vary over time. Ultimately, data derived from text or structured fields in routine health records should not be seen as competing, but as complementary, resources.

5.2. Challenges in wider NLP implementation

Case studies such as the CRIS platform demonstrate that a text-dominated mental healthcare record can at least be ‘enabled’ via NLP to support a range of research projects which would not otherwise have been possible. However, a limitation here is that NLP development and application to date have tended to be restricted to specific sites with academic interests in this field. There is a need for larger networks to be enabled, which will in turn provide opportunities for evaluating cross-site applicability and for driving standardization in reporting. Although there are technical hurdles to overcome in achieving multi-site implementation, the challenges are primarily logistic and financial (e.g. sustaining services to support this infrastructure) as well as political (reflecting confused healthcare and academic funding landscapes that encourage both collaboration and competition). In addition, availability of clinical text for NLP development has been very variable – in the UK, this has varied both for technical reasons (reflecting different electronic health record providers) and because of access permissions (reflecting different governance/security interpretations by data custodians). Both sources of variation have been major impediments to progress in this field and might be readily solved by concerted national efforts at harmonization.

5.3. Challenges in further NLP development

‘Easy win’ entities for healthcare NLP are likely to be succeeded rapidly by much harder tasks – challenging either because the entity itself is complex (e.g. social support) or because of the diverse wording used to describe or imply it in source text (e.g. suicidality). Methodological progress is therefore essential, as are platforms for developing and evaluating innovations on clinical text with appropriate levels of data security and governance. This is likely to be an exciting and rapidly progressing field for the foreseeable future, as NLP needs in health research are far outstripped by NLP capabilities in other fields. However, advances in healthcare application depend on the availability of clinical text for NLP training and evaluation, which in turn creates governance challenges for some approaches (e.g. commercial large language models currently). It is also important that NLP development keeps in step with an appropriately cautious and methodical approach to application and evaluation. One advantage of the CRIS platform described above, for example, has been the strong (and bi-directional) historic link between NLP development and records-based research, facilitated by the internationally leading academic setting. This environment ensures that NLP development is appropriately targeted, that its applicability is evaluated rapidly in case study projects, and that this feeds into algorithm refinement and continued advance.

5.4. Challenges in the use/application of routine healthcare data

The utility of NLP-derived resources primarily reflects broader limitations of routine healthcare data. Although some patients may be better represented than they would be in recruited research studies, there may be important groups under-represented because of limited healthcare access, particularly in settings where healthcare provision is not universal, or universally affordable. Also, prospective research using a geographic catchment-based data resource (such as Maudsley CRIS) may be complicated by in- and out-migration. Measurement accuracy, as discussed, is a key consideration with routine data, as information is not collected for research purposes. A thorough understanding of motivations for health record completion is therefore needed, and appropriate caution applied in interpretation. Text has some advantages over structured fields in that NLP algorithm performance can be checked and re-checked against the source, and in that there is a clinical incentive for accuracy in what is written. Medicolegal or other (e.g. financial) incentives may generate bias, although this may be as problematic for pre-structured as for text-derived data. Furthermore, NLP methods such as word embeddings [Citation36], latent Dirichlet allocation [Citation37], and sentiment analysis [Citation38] have the potential to objectively identify bias within clinical notes. Finally, there are often missing data issues on confounding factors in routine information resources (which may not be soluble through NLP if the confounder is not routinely captured in a clinical record), as well as the wider limitations on causal inference from any observational research.
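
As an illustration of the embedding-based bias probes referred to above (in the spirit of [Citation36]), the sketch below trains a small word2vec model on a fabricated toy corpus and compares how closely descriptor terms sit to different group terms in the embedding space; systematic asymmetries of this kind, measured on a real de-identified corpus, are one way recording bias can be surfaced. All data here are invented for illustration and imply nothing about any actual record.

```python
from gensim.models import Word2Vec

# Fabricated, pre-tokenized note fragments, repeated so the toy model has signal;
# a real analysis would train on a large de-identified clinical corpus.
sentences = [
    ["he", "was", "aggressive", "and", "uncooperative", "on", "the", "ward"],
    ["she", "was", "pleasant", "and", "cooperative", "on", "the", "ward"],
] * 100

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, seed=1, epochs=20)

# Compare cosine similarity between group terms and descriptor terms.
for group in ("he", "she"):
    for descriptor in ("aggressive", "pleasant"):
        print(group, descriptor, round(float(model.wv.similarity(group, descriptor)), 3))
```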

6. Conclusion

Over the next few years, it is likely that research utility will be matched by improvements in the quantity of record-derived, NLP-generated meta-data presented back at the clinical interface. This is an exciting prospect, and it is also possible that early steps toward a genuine learning healthcare records system might be enabled as a result. Over-arching predictive algorithms, of course, come with their own ethical and practical challenges around clinical application [Citation39] and need substantial methodological progress in development [Citation40], not least to ensure that the above biases in routine information recording are not amplified or thoughtlessly incorporated. As mentioned, the veracity and provenance of source information in a health record (whether structured or text-derived) need careful consideration in any research application. However, the granularity offered by NLP ought to be transformative in model performance, given the level of potentially relevant information that is missing from the structured fields used to date. Of note, if this is to be realized, data architectures will need to be developed that support real-time processing at scale, and the pace of change is likely to depend as much on building the multidisciplinary teams required to develop, apply, and maintain these technologies as on the technical advances themselves.

Declaration of interest

R Stewart declares research support received in the last 3 years from Janssen, GSK, and Takeda, and royalties from Oxford University Press. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.

Reviewer disclosures

Peer reviewers on this manuscript have no relevant financial or other relationships to disclose.

Additional information

Funding

R Stewart is part-funded by the NIHR Maudsley Biomedical Research Centre at the South London and Maudsley NHS Foundation Trust and King’s College London. R Stewart and A Roberts are part-funded by i) UKRI – Medical Research Council through the DATAMIND HDR UK Mental Health Data Hub (MRC reference: MR/W014386); ii) the UK Prevention Research Partnership (Violence, Health and Society; MR-VO49879/1), an initiative funded by UK Research and Innovation Councils, the Department of Health and Social Care (England) and the UK devolved administrations, and leading health research charities. J Chaturvedi is supported by the KCL-funded Centre for Doctoral Training (CDT) in Data-Driven Health. R Stewart is part-funded by the National Institute for Health Research (NIHR) Applied Research Collaboration South London (NIHR ARC South London) at King’s College Hospital NHS Foundation Trust. The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care.

References

  • Liddy ED. Natural language processing. In: Levine-Clark M, McDonald J, editors. Encyclopedia of library and information sciences. 4th ed. Boca Raton (FL): CRC Press; 2018. p. 3–4.
  • Nadkarni PM, Ohno-Machado L, Chapman WW. Natural language processing: an introduction. J Am Med Inform Assoc. 2011;18(5):544–551. doi: 10.1136/amiajnl-2011-000464
  • Laparra E, Mascio A, Velupillai S, et al. A review of recent work in transfer learning and domain adaptation for natural language processing of electronic health records. Yearb Med Inform. 2021;30(1):239–244. doi: 10.1055/s-0041-1726522
  • Malte A, Ratadiya P. Evolution of transfer learning in natural language processing. arXiv preprint; 2019. doi: 10.48550/arxiv.1910.07370
  • Jurafsky D, Martin JH. Speech and language processing. 2nd ed. Upper Saddle River, NJ, USA: Prentice-Hall Inc.; 2008.
  • Wang Y, Sohn S, Liu S, et al. A clinical text classification paradigm using weak supervision and deep representation. BMC Med Inform Decis Mak. 2019;19:1. doi: 10.1186/s12911-018-0723-6
  • Kalchbrenner N, Grefenstette E, Blunsom P. A convolutional neural network for modelling sentences. Proc 52nd Annual Meeting Assoc Comput Linguist. 2014;1:655–665.
  • Mithun S, Jha AK, Sherkhane UB, et al. Development and validation of deep learning and BERT models for classification of lung cancer radiology reports. IMU. 2023;40:101294. doi: 10.1016/j.imu.2023.101294
  • Choi E, Bahadori MT, Schuetz A, et al. Doctor AI: predicting clinical events via recurrent neural networks. JMLR Workshop Conf Proc. 2016;56:301–318.
  • Suresh H, Hunt N, Johnson A, et al. Clinical intervention prediction and understanding with deep neural networks. Proc 2nd Mach Learn Healthcare Conf. 2017;68:322–337.
  • Zhang XS, Tang F, Dodge HH, et al. MetaPred: meta-learning for clinical risk prediction with limited patient electronic health records. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; New York, NY, USA. Association for Computing Machinery; 2019. p. 2487–2495.
  • Rasmy L, Xiang Y, Xie Z, et al. Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction. NPJ Digit Med. 2021;4:86. doi: 10.1038/s41746-021-00455-y
  • Cho H, Lee H. Biomedical named entity recognition using deep neural networks with contextual information. BMC Bioinf. 2019;20(1):735. doi: 10.1186/s12859-019-3321-4
  • Du N, Chen K, Kannan A, et al. Extracting symptoms and their status from clinical conversations. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics; Florence, Italy. Association for Computational Linguistics; 2019. p. 915–925.
  • Kersloot MG, Lau F, Abu-Hanna A, et al. Automated SNOMED CT concept and attribute relationship detection through a web-based implementation of cTAKES. J Biomed Semant. 2019;10(1):14. doi: 10.1186/s13326-019-0207-3
  • Peterson KJ, Liu H. Automating the transformation of free-text clinical problems into SNOMED CT expressions. AMIA Jt Summits Transl Sci Proc. 2020;2020:497–506.
  • Kim E, Rubinstein SM, Nead KT, et al. The evolving use of electronic health records (EHR) for research. Semin Radiat Oncol. 2019;29(4):354–361. doi: 10.1016/j.semradonc.2019.05.010
  • Ford E, Curlewis K, Squires E, et al. The potential of research drawing on clinical free text to bring benefits to patients in the United Kingdom: a systematic review of the literature. Front Digit Health. 2021;3:606599. doi: 10.3389/fdgth.2021.606599
  • Wu H, Wang M, Wu J, et al. A survey on clinical natural language processing in the United Kingdom from 2007 to 2022. npj Digital Med. 2022;5(1):186. doi: 10.1038/s41746-022-00730-6
  • Stewart R, Soremekun M, Perera G, et al. The South London and Maudsley NHS Foundation Trust Biomedical research Centre (SLAM BRC) case register: development and descriptive data. BMC Psychiatry. 2009;9(1):51. doi: 10.1186/1471-244X-9-51
  • Perera G, Broadbent M, Callard F, et al. Cohort profile of the South London and Maudsley NHS Foundation Trust Biomedical research Centre (SLaM BRC) case register: current status and recent enhancement of an electronic mental health record-derived data resource. BMJ Open. 2016;6(3):e008721. doi: 10.1136/bmjopen-2015-008721
  • Khapre S, Stewart R, Taylor C. An evaluation of symptom domains in the 2 years before pregnancy as predictors of relapse in the perinatal period in women with severe mental illness. Eur Psychiatry. 2021;64(1):e26. doi: 10.1192/j.eurpsy.2021.18
  • Kadra G, Stewart R, Shetty H, et al. Long-term antipsychotic polypharmacy prescribing in secondary mental health care and the risk of mortality. Acta Psychiatr Scand. 2018;138(2):123–132. doi: 10.1111/acps.12906
  • Chen J, Perera G, Shetty H, et al. Body mass index and mortality in patients with schizophrenia spectrum disorders: a cohort study in a South London catchment area. Gen Psych. 2022;35(5):e100819. doi: 10.1136/gpsych-2022-100819
  • Bishara D, Perera G, Harwood D, et al. Centrally acting anticholinergic drugs used for urinary conditions associated with worse outcomes in dementia. J Am Med Dir Assoc. 2021;22(12):2547–2552. doi: 10.1016/j.jamda.2021.08.011
  • Patel R, Wilson R, Jackson R, et al. Association of cannabis use with hospital admission and antipsychotic treatment failure in first episode psychosis: an observational study. BMJ Open. 2016;6(3):e009888. doi: 10.1136/bmjopen-2015-009888
  • Bendayan R, Kraljevic Z, Shaari S, et al. Mapping multimorbidity in individuals with schizophrenia and bipolar disorders: evidence from the South London and Maudsley NHS Foundation Trust Biomedical research Centre (SLAM BRC) case register. BMJ Open. 2022;12(1):e054414. doi: 10.1136/bmjopen-2021-054414
  • Parmar M, Ma R, Attygalle S, et al. Associations between loneliness and acute hospitalisation outcomes among patients receiving mental healthcare in South London: a retrospective cohort study. Soc Psychiatry Psychiatr Epidemiol. 2022;57(2):397–410. doi: 10.1007/s00127-021-02079-9
  • Botelle R, Bhavsar V, Kadra-Scalzo G, et al. Can natural language processing models extract and classify instances of interpersonal violence in mental healthcare electronic records: an applied evaluative study. BMJ Open. 2022;12(2):e052911. doi: 10.1136/bmjopen-2021-052911
  • Morris RM, Sellwood W, Edge D, et al. Ethnicity and impact on the receipt of cognitive-behavioural therapy in people with psychosis or bipolar disorder: an English cohort study. BMJ Open. 2020;10:e034913. doi: 10.1136/bmjopen-2019-034913
  • Jackson RG, Patel R, Jayatilleke N, et al. Natural language processing to extract symptoms of severe mental illness from clinical text: the clinical record interactive search comprehensive data extraction (CRIS-CODE) project. BMJ Open. 2017;7(1):e012012. doi: 10.1136/bmjopen-2016-012012
  • Jackson R, Kartoglu I, Stringer C, et al. CogStack - experiences of deploying integrated information retrieval and extraction services in a large National health Service Foundation Trust hospital. BMC Med Inform Decis Mak. 2018;18(1):47. doi: 10.1186/s12911-018-0623-9
  • Dr Rob Harland at the KHP Annual Conference 2021 [video]. YouTube [accessed 2023 Jun 23]. https://www.youtube.com/watch?v=k6BcDfAJ0R4&feature=youtu.be
  • Perera G, Soremekun M, Breen G, et al. The psychiatric case register: noble past, challenging present, but exciting future. Br J Psychiatry. 2009;195:191–193. doi: 10.1192/bjp.bp.109.068452
  • Kendrick T, Stuart B, Newell C, et al. Changes in rates of recorded depression in English primary care 2003-2013: time trend analyses of effects of the economic recession, and the GP contract quality outcomes framework (QOF). J Affect Disord. 2015;180:68–78. doi: 10.1016/j.jad.2015.03.040
  • Garg N, Schiebinger L, Jurafsky D, et al. Word embeddings quantify 100 years of gender and ethnic stereotypes. Proc Natl Acad Sci, USA. 2018;115:E3635–44. doi: 10.1073/pnas.1720347115
  • Don’t Walk OJ B, Reyes Nieva H, Lee S-J, et al. A scoping review of ethics considerations in clinical natural language processing. JAMIA Open. 2022;5:ooac039. doi: 10.1093/jamiaopen/ooac039
  • Weissman GE, Ungar LH, Harhay MO, et al. Construct validity of six sentiment analysis methods in the text of encounter notes of patients with critical illness. J Biomed Informat. 2019;89:114–121. doi: 10.1016/j.jbi.2018.12.001
  • Shah ND, Steyerberg EW, Kent DM. Big data and predictive analytics: recalibrating expectations. JAMA. 2018;320(1):27–28. doi: 10.1001/jama.2018.5602
  • Navarro CLA, Damen JAA, Takada T, et al. Risk of bias in studies on prediction models developed using supervised machine learning techniques: systematic review. BMJ. 2021;375:n2281. doi: 10.1136/bmj.n2281
