Abstract
The COVID-19 pandemic has resulted in millions of deaths around the world. Multiple vaccines are in use, but many underserved locations lack adequate access to them. Moreover, variants that are highly resistant to existing vaccines may emerge, so inexpensive and readily obtainable therapeutics are needed. Phytochemicals, or plant chemicals, are candidates for such therapeutics. They can be used in a polypharmacological approach, in which multiple viral proteins are inhibited simultaneously, making escape mutations less likely. Finding the right phytochemicals for viral protein inhibition is challenging, but in silico screening methods make the problem more tractable. In this study, we screen a wide range of natural drug products against a comprehensive set of SARS-CoV-2 proteins using a high-resolution computational workflow. The workflow begins with structure-based virtual screening (SBVS), in which an initial library of 272 phytochemicals was docked against all selected protein structures. Ligand-based virtual screening (LBVS) was then employed: the chemical features of the 34 lead compounds identified by SBVS were used, via supervised learning, to predict 53 lead compounds from a larger phytochemical library. Docking validation of these 53 predicted leads revealed that 28 of them form strong binding interactions with SARS-CoV-2 proteins. Including LBVS therefore increased the lead discovery rate roughly 4-fold (28 of 53 predicted compounds, about 53%, versus 34 of 272 screened compounds, 12.5%, in SBVS). Of the 62 total leads, 18 showed promising pharmacokinetic properties in a computational ADME screen. Collectively, this study demonstrates the advantage of incorporating machine learning into a virtual screening workflow.
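The fold change in lead discovery can be checked directly from the reported counts. The abstract does not define "lead discovery rate," so this short sketch assumes it means the fraction of docked compounds that turned out to be leads in each stage (34 of the 272-compound SBVS library, and 28 of the 53 LBVS predictions). The variable names and the `hit_rate` helper are illustrative only and are not taken from the authors' scripts.

```python
# Sketch: lead discovery rates implied by the counts reported in the abstract.
# Assumption: "lead discovery rate" = leads found / compounds docked in that stage.

sbvs_screened = 272   # phytochemicals docked in the initial SBVS library
sbvs_leads = 34       # leads identified by SBVS
lbvs_predicted = 53   # compounds predicted as leads by the LBVS model
lbvs_confirmed = 28   # LBVS predictions confirmed as leads by docking


def hit_rate(hits: int, tested: int) -> float:
    """Hypothetical helper: fraction of docked compounds that were leads."""
    return hits / tested


sbvs_rate = hit_rate(sbvs_leads, sbvs_screened)      # 0.125
lbvs_rate = hit_rate(lbvs_confirmed, lbvs_predicted)  # about 0.528
fold_increase = lbvs_rate / sbvs_rate                 # about 4.23
total_leads = sbvs_leads + lbvs_confirmed             # 62

print(f"SBVS rate: {sbvs_rate:.1%}")          # SBVS rate: 12.5%
print(f"LBVS rate: {lbvs_rate:.1%}")          # LBVS rate: 52.8%
print(f"Fold increase: {fold_increase:.1f}x") # Fold increase: 4.2x
print(f"Total leads: {total_leads}")          # Total leads: 62
```

Under this interpretation the ratio comes out near 4.2, consistent with the "4-fold" figure, and the two stages together account for the 62 leads passed to ADME screening.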
Communicated by Ramaswamy H. Sarma
Acknowledgements
We thank Andrew Bruno of the University at Buffalo for the molecular fingerprint algorithm we used (the ECFP algorithm is available at https://github.com/ubccr/pinky) and Dr. Yong-hui Zheng (Department of Microbiology and Molecular Genetics, MSU) for providing the LBVS compound list. We thank the Institute for Cyber Enabled Research (ICER) at Michigan State University for technical help and computational resources.
Data availability statement
For molecular docking, we used Rosetta 3.12, which can be obtained free of charge with an academic license (https://www.rosettacommons.org/software/license-and-download). Rosetta 3.12 was installed on a cluster maintained by the Michigan State University Institute for Cyber Enabled Research, and docking jobs were submitted to this cluster using the Slurm workload manager. The BCL:Conf ligand conformer generator was installed alongside Rosetta 3.12 on the same cluster; it was obtained free of charge with an academic license from http://www.meilerlab.org/index.php/bclcommons/show/b_apps_id/1. OpenBabel was obtained free of charge from http://openbabel.org/wiki/Category:Installation. The CIDs, names, and SMILES of the 272 phytochemicals initially used in SBVS are provided in the supporting files in a spreadsheet titled ‘Ligand_Library_Key_SBVS’. The SMILES and names of all additional compounds screened through LBVS are provided in a spreadsheet titled ‘AdditionalLibraryForLBVS’. The complete SwissADME data for the 62 lead compounds are provided in a spreadsheet titled ‘SwissADMEfinalresult’. The Python scripts used to generate plots, process docking input files, and generate docking jobs on the cluster are available at https://github.com/ziruiwang1996/ligand_protein_docking; Matplotlib was used for plotting. R and Python's statistics module were used to calculate sample means, covariances, correlations, and all other statistical parameters. Raw docking data, components of the LBVS algorithm, and PDB files of all lead compounds docked against specific proteins are stored in a Google Drive folder titled ‘data’, which can be accessed through the link in the README.md file of the GitHub repository above. Additional score function testing data are available at https://ziruiw.shinyapps.io/score_functions_on_sarscov2/.
Disclosure statement
The authors declare no competing interests.
Funding
Daniel R. Woldring, Michigan State University.
Correction Statement
This article has been republished with minor changes. These changes do not impact the academic content of the article.