ABSTRACT
Introduction
Knowledge graphs have proven to be promising systems of information storage and retrieval. Due to the recent explosion of heterogeneous multimodal data sources generated in the biomedical domain, and an industry shift toward a systems biology approach, knowledge graphs have emerged as attractive methods of data storage and hypothesis generation.
Areas covered
In this review, the author summarizes the applications of knowledge graphs in drug discovery. They evaluate the utility of these applications, differentiating between academic exercises in graph theory and useful tools for deriving novel insights, and highlight target identification and drug repurposing as two areas showing particular promise. They provide a case study on COVID-19, summarizing the research that used knowledge graphs to identify repurposable drug candidates. They describe the dangers of degree and literature bias, and discuss mitigation strategies.
Expert opinion
Whilst knowledge graphs and graph-based machine learning have certainly shown promise, they remain relatively immature technologies. Many popular link prediction algorithms fail to address strong biases in biomedical data, and only highlight biological associations, failing to model causal relationships in complex dynamic biological systems. These problems need to be addressed before knowledge graphs reach their true potential in drug discovery.
Article highlights
• Knowledge graphs provide an elegant solution to the ‘data problem’ in the pharmaceutical industry, integrating and harmonizing the ever-growing number of multimodal data sources.
• Representing biological systems as knowledge graphs has allowed for the exploitation of graph theory and powerful graph machine learning methodologies, well suited to the target-based systems biology approach to drug discovery.
• The most common application of knowledge graphs in the pharmaceutical industry is in early-stage drug discovery and repurposing, particularly in the identification of pathogenic genes and drug targets.
• Biomedical knowledge graphs have yielded noteworthy repurposing candidates for COVID-19, directly leading to clinical validation and emergency use authorization.
• The predictive power in many graph machine learning techniques comes mainly from connectivity, and not network proximity, introducing a significant bias in link prediction tasks.
• This connectivity bias is further exacerbated when training on literature-derived knowledge graphs whose degree distribution diverges from that of the underlying biological system. Mitigation strategies are needed.
This box summarizes key points contained in the article.
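The connectivity bias noted in the highlights can be illustrated with a minimal sketch. The toy graph, the node names, and the two scoring functions below are assumptions made for illustration and are not taken from the review: preferential attachment stands in for a purely degree-driven predictor, and a common-neighbour count stands in for a simple proximity measure.

```python
from itertools import combinations

# Hypothetical toy graph: "HUB" stands in for a heavily studied entity with
# many recorded links; w, x, y, z form a separate, tightly knit neighbourhood.
edges = [
    ("HUB", "a"), ("HUB", "b"), ("HUB", "c"), ("HUB", "d"),
    ("x", "y"), ("y", "z"), ("x", "w"), ("w", "z"),
]

def build_adjacency(edge_list):
    """Return an undirected adjacency map {node: set(neighbours)}."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree_score(adj, u, v):
    """Preferential attachment: product of degrees, blind to graph position."""
    return len(adj[u]) * len(adj[v])

def common_neighbour_score(adj, u, v):
    """A simple proximity measure: number of shared neighbours."""
    return len(adj[u] & adj[v])

def rank_candidates(adj, score):
    """Rank all currently unlinked node pairs by the given score, best first."""
    nodes = sorted(adj)
    candidates = [(u, v) for u, v in combinations(nodes, 2) if v not in adj[u]]
    return sorted(candidates, key=lambda pair: score(adj, *pair), reverse=True)

adj = build_adjacency(edges)
top_by_degree = rank_candidates(adj, degree_score)[:4]
top_by_proximity = rank_candidates(adj, common_neighbour_score)[:2]

print("Degree-driven top predictions:   ", top_by_degree)
print("Proximity-driven top predictions:", top_by_proximity)
```

In this toy setting every top degree-driven prediction involves the hub, even though the hub shares no neighbours with the nodes it is paired with, while the proximity measure favours the unlinked pairs inside the tightly knit neighbourhood. Real link prediction models are far more complex, so this is only a picture of the failure mode, not a measurement of it.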
Acknowledgments
The author would like to express their gratitude to Delphine Rolando, Rachel Hodos and Dane Corneil. Their expertise in drug discovery, graph machine learning, and knowledge graphs was instrumental in writing this review. Lastly, the author thanks Daniel Miskell for his insight over the years.
Reviewer disclosures
Peer reviewers on this manuscript have no relevant financial or other relationships to disclose.
Declaration of interest
F MacLean is a full-time employee of BenevolentAI. The author has no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.