Research Articles

Phytochemical drug discovery for COVID-19 using high-resolution computational docking and machine learning assisted binder prediction

Pages 6643-6663 | Received 31 Mar 2022, Accepted 31 Jul 2022, Published online: 22 Aug 2022
 

Abstract

The COVID-19 pandemic has resulted in millions of deaths around the world. Multiple vaccines are in use, but many underserved locations do not have adequate access to them. Variants that are highly resistant to existing vaccines may also emerge, so cheap and readily obtainable therapeutics are needed. Phytochemicals, or plant chemicals, are candidates for such therapeutics. They can be used in a polypharmacological approach, in which multiple viral proteins are inhibited and escape mutations are made less likely. Finding the right phytochemicals for viral protein inhibition is challenging, but in-silico screening methods can make this a more tractable problem. In this study, we screen a wide range of natural drug products against a comprehensive set of SARS-CoV-2 proteins using a high-resolution computational workflow. The workflow begins with structure-based virtual screening (SBVS), in which an initial phytochemical library was docked against all selected protein structures. Subsequently, ligand-based virtual screening (LBVS) was employed, in which chemical features of the 34 lead compounds obtained from the SBVS were used to predict 53 lead compounds from a larger phytochemical library via supervised learning. A computational docking validation of the 53 leads predicted by LBVS revealed that 28 of them elicit strong binding interactions with SARS-CoV-2 proteins. Thus, the inclusion of LBVS resulted in a 4-fold increase in the lead discovery rate. Of the 62 total leads, 18 showed promising pharmacokinetic properties in a computational ADME screening. Collectively, this study demonstrates the advantage of incorporating machine learning elements into a virtual screening workflow.
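The reported 4-fold increase can be checked with a short calculation. The sketch below is one plausible reading of "lead discovery rate", namely leads found divided by compounds docked at each stage; the authors' exact definition is not given in the abstract, so this interpretation is an assumption. The SBVS library size of 272 is taken from the data availability statement, and the function and variable names are illustrative only.

```python
def hit_rate(leads: int, screened: int) -> float:
    """Fraction of docked compounds that were classified as leads."""
    if screened <= 0:
        raise ValueError("screened must be a positive count")
    return leads / screened


# Counts stated in the article text.
SBVS_SCREENED = 272   # phytochemicals docked in the SBVS stage
SBVS_LEADS = 34       # leads obtained from SBVS
LBVS_PREDICTED = 53   # leads predicted by LBVS, then docked for validation
LBVS_CONFIRMED = 28   # predicted leads that showed strong binding

sbvs_rate = hit_rate(SBVS_LEADS, SBVS_SCREENED)
lbvs_rate = hit_rate(LBVS_CONFIRMED, LBVS_PREDICTED)

# Assumed definition: ratio of the two stage-wise hit rates.
fold_increase = lbvs_rate / sbvs_rate
total_leads = SBVS_LEADS + LBVS_CONFIRMED

print(f"SBVS hit rate: {sbvs_rate:.1%}")
print(f"LBVS hit rate: {lbvs_rate:.1%}")
print(f"Fold increase: {fold_increase:.1f}")
print(f"Total leads:   {total_leads}")
```

Under this reading, the SBVS rate is 34/272 and the LBVS validation rate is 28/53, whose ratio is consistent with the roughly 4-fold figure, and the two lead counts sum to the 62 total leads quoted in the abstract.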

Communicated by Ramaswamy H. Sarma

Acknowledgements

We thank Andrew Bruno of the University at Buffalo for the molecular fingerprint algorithm we used (the ECFP algorithm is available at https://github.com/ubccr/pinky) and Dr. Yong-hui Zheng (Department of Microbiology and Molecular Genetics at MSU) for providing the LBVS compound list. We thank the Institute for Cyber Enabled Research (ICER) at Michigan State University for technical help and computational resources.

Data availability statement

Molecular docking was performed with Rosetta 3.12, which can be obtained for free with an academic license (https://www.rosettacommons.org/software/license-and-download). Rosetta 3.12 was installed on a cluster maintained by the Michigan State University Institute for Cyber Enabled Research, and docking jobs on this cluster were submitted using the Slurm workload manager. The BCL:Conf ligand conformer generator was installed alongside Rosetta 3.12 on the cluster; it was obtained for free with an academic license from http://www.meilerlab.org/index.php/bclcommons/show/b_apps_id/1. OpenBabel was obtained for free from http://openbabel.org/wiki/Category:Installation.

The CIDs, names, and SMILES of the 272 phytochemicals initially used in SBVS are available in the supporting files in a spreadsheet titled ‘Ligand_Library_Key_SBVS’. The SMILES and names of all the additional compounds screened through LBVS are available in a spreadsheet titled ‘AdditionalLibraryForLBVS’. The complete SwissADME data for the 62 lead compounds are available in the spreadsheet titled ‘SwissADMEfinalresult’.

The Python scripts used to generate plots, process docking input files, and generate docking jobs on the cluster are all available at https://github.com/ziruiwang1996/ligand_protein_docking. Matplotlib was also used for plotting data. R and the statistics module in Python were used for calculating sample means, covariances, correlations, and all other statistical parameters. Other files containing raw docking data, components for the LBVS algorithm, and PDB files of all the lead compounds docked against specific proteins are stored in a Google Drive folder titled ‘data’, which is reached through a link in the README.md document of the GitHub repository above. Additional score function testing data is available at https://ziruiw.shinyapps.io/score_functions_on_sarscov2/.

Disclosure statement

The authors declare no competing interests.

Funding

Daniel R. Woldring, Michigan State University.

Correction Statement

This article has been republished with minor changes. These changes do not impact the academic content of the article.
