Editorial

Does ‘Big Data’ Exist in Medicinal Chemistry, and If So, How can It Be Harnessed?

Pages 1801-1806 | Received 01 Aug 2016, Accepted 12 Aug 2016, Published online: 15 Sep 2016

The term ‘Big Data’ has gained increasing popularity within the chemistry field and across science broadly in recent years [Citation1]. Chemical databases have seen dramatic growth over the past decade, with, for example, ChEMBL, REAXYS and PubChem providing hundreds of millions of experimental facts for tens of millions of compounds [Citation1]. Moreover, even larger datasets of experimental measurements are held within in-house data collections at pharma companies [Citation2]. Overall, the total number of entries across these databases is in the range of a billion (10⁹); although this number may seem impressive, it pales in comparison with other fields [Citation3], where the amount of data is frequently measured in exabytes (10¹⁸). Thus, does Big Data really exist within the chemistry field? What are such data within medicinal chemistry specifically, and where do the challenges lie in the analysis of these data? Big Data refers to data beyond the scale of traditional applications, requiring efforts beyond traditional analysis [Citation1]. In this article, we discuss how the term applies to medicinal chemistry and provide an overview of some of the most important trends at the interface of medicinal chemistry and Big Data.

Does Big Data exist in medicinal chemistry?

A dataset could be classified as ‘big’ if the available technical resources (processing speed, memory) are not capable of analyzing the data using existing methods. Big Data in a field such as the analysis of particle collisions at CERN [Citation3] is driven by physical challenges (the hardware, computing speed and physical memory required to store and analyze such data), which may be addressed by the development of new and more advanced software.

Medicinal chemistry-related data are created and curated in the pharmaceutical industry via high-throughput screening (HTS) and drug discovery campaigns, and are additionally available in databases sourced from scientific journals, patents and other documents. For example, the AstraZeneca in-house screening database contains over 150 million structure–activity relationship (SAR) data points [Citation2]. HTS data from pharma companies are usually very sparse, and for each screened target there is only a small number of active hits. Further development is carried out on relatively small compound series, usually comprising hundreds to thousands of compounds. Specialists who work on these target-specific data do not encounter Big Data in their daily work; traditional modeling algorithms are perfectly capable of handling their datasets.

When the focus is on chemogenomics data, the situation is different. The biggest medicinal chemistry data reservoir, PubChem, currently comprises 91 million chemical structures and 230 million bioactivity data points corresponding to over 10,000 protein targets. The total data size is around 60 GB [Citation4], which is considered ‘big’ in medicinal chemistry terms but is in fact still well below the terabytes or even petabytes of data held in databases such as those of eBay [Citation5] and Amazon [Citation6]. However, if chemical descriptors (such as structural fingerprints) are generated for a dataset of this size, the total data volume will probably approach the conventional Big Data scale. For each specific protein target, the available SAR data will be much smaller, in the range of hundreds of thousands of data points, or up to a few million if inactive compounds from an HTS are taken into account. For building single-target quantitative SAR (QSAR) models, traditional machine learning algorithms are still capable of handling this magnitude of data [Citation7]. But if one wants to use all available chemogenomics data (in databases such as PubChem, ChEMBL etc.) to pursue multitask learning (see below) and build one multilabel model that predicts activity against multiple targets simultaneously, the traditional algorithms used in chemoinformatics are unlikely to work, and huge computing power and a dedicated parallel programming model would be required to solve the problem.
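
To give a sense of how descriptor generation scales, the following minimal sketch generates 2048-bit Morgan fingerprints with the open-source RDKit toolkit and makes a back-of-the-envelope estimate of the storage needed for a PubChem-sized compound set; the SMILES strings, fingerprint parameters and dense-storage assumption are illustrative choices, not taken from the cited studies.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Illustrative molecules; a real run would stream tens of millions of SMILES.
smiles = ["CCO", "c1ccccc1O", "CC(=O)Oc1ccccc1C(=O)O"]

fps = []
for smi in smiles:
    mol = Chem.MolFromSmiles(smi)
    if mol is None:
        continue  # skip unparsable structures
    # 2048-bit Morgan (ECFP4-like) fingerprint
    fps.append(AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048))

# Back-of-the-envelope storage for ~91 million compounds as a dense bit matrix
n_compounds, n_bits = 91e6, 2048
print(f"~{n_compounds * n_bits / 8 / 1e9:.0f} GB of raw bits")
# Real-valued descriptor sets (thousands of floats per compound) quickly push
# such a matrix into the terabyte range.
```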

Another big challenge in medicinal chemistry where Big Data can have an impact relates to the question of which molecule to synthesize next in a drug discovery project. To identify the optimal candidate for synthesis, large virtual chemical spaces need to be explored, which is clearly a Big Data-related problem. So, to return to our initial question, we can conclude that Big Data does exist in medicinal chemistry, and there are a number of associated challenges, depending on which aspect of the field is under focus.

Is Big Data really useful for prediction?

Let us now consider an example of how data size can make a difference in property predictions. In 2014, one of us (IVT) published a melting point (MP) model based on approximately 50k compounds [Citation8], which was succeeded in 2016 by a model based on approximately 275k compounds [Citation9]. The ‘large’ set of 50k compounds was processed by the On-line Chemical Database and Modeling Environment (OCHEM) [Citation10,Citation11] using the same approaches applied in multiple previous studies. The latter set was considered Big Data, since we could not use the previous tools without changes. Among other things, we had to implement parallel calculations for the support vector machine method, solve the problem of storing very large data matrices in a sparse format (including calculations with matrices comprising >0.2 trillion entries) and address several other technical challenges. Were the results worth our efforts? Yes. The model developed with Big Data was more accurate and gave, for example, the lowest published error for the Bergström set of drugs [Citation12]. Moreover, prediction of water solubility using Yalkowsky's General Solubility Equation, which is based on MP and the octanol/water partition coefficient (logP) [Citation13], was also significantly improved compared with the model developed with 50k compounds [Citation9]. Considering that the Big Data set mainly contained data automatically mined from the patent literature [Citation9], this also demonstrated the feasibility and success of the fully automatic data extraction technology that was developed.
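
For reference, the General Solubility Equation mentioned above relates aqueous solubility to MP and logP as log S = 0.5 - 0.01(MP - 25) - logP, with MP in °C and S in mol/l. A minimal sketch of this calculation is shown below; the example values are arbitrary.

```python
def gse_log_solubility(mp_celsius: float, logp: float) -> float:
    """Yalkowsky General Solubility Equation: log S in mol/l."""
    # For compounds that are liquid at 25 C the melting point term is zero.
    mp_term = max(mp_celsius - 25.0, 0.0)
    return 0.5 - 0.01 * mp_term - logp

# Arbitrary example: a compound melting at 150 C with logP = 2.0
print(gse_log_solubility(150.0, 2.0))  # -2.75, i.e., ~1.8 mM
```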

Multitask learning & deep learning for Big Data

Data collection is always a challenging task. The measurement of even some simple properties, such as solubility in water, can be difficult, time consuming and error prone [Citation14]. However, many physicochemical and biological properties are strongly interrelated; for example, water solubility depends on logP and MP, as shown by the General Solubility Equation [Citation13]. One strategy is to exploit these relationships by developing models for several properties simultaneously [Citation15]. This multitask learning concept is especially attractive for building polypharmacology prediction models, since many protein targets are interrelated through intrinsic similarity in sequence or in the interaction patterns of their binding pockets. One recent study showed that massive multitask networks achieve predictive accuracies significantly better than single-task methods when the different outcomes are related to each other [Citation16]. Deep learning methods are also expected to help in addressing this issue (see [Citation17,Citation18] for reviews of these approaches in drug discovery). These new and promising approaches have already been used to achieve superhuman accuracy in recognizing Chinese characters [Citation19] and to develop a computer program (AlphaGo) capable of beating 18-time world champion Lee Sedol at the ancient and complicated game of Go [Citation20]. An important milestone for deep learning and multitask learning was their performance in the Tox21 challenge, where their combination provided the overall best accuracy as measured by the area under the curve [Citation21]. Interestingly, the best balanced accuracy was calculated using Associative Neural Networks [Citation22], which were developed using traditional ‘shallow’ neural networks [Citation23]. This method also contributed the top-ranked model for the ToxCast challenge [Citation24]. Their combination, also known as ‘deep Associative Neural Networks’, can probably contribute even better models. Deep learning has already demonstrated advantages in combining tasks and data through the simultaneous analysis of 259 datasets totaling >40 million data points from public databases [Citation16]. Large-scale chemogenomics modeling is currently an active research field, with several important recent publications from Janssen Pharmaceuticals [Citation25–28]. The approaches may still need to be optimized to learn from imbalanced datasets and from data weighted by measurement accuracies, to identify methods for the optimal combination of qualitative and quantitative data, and to make use of unsupervised data [Citation17]. Thus, instead of filtering data by removing low-quality records coming from less reliable experiments, one may develop better global models by using all data. A Horizon2020-funded research project is currently working to address the current limitations of large-scale chemogenomics modeling [Citation29]. Data of different quality could be weighted by their experimental or estimated accuracies. In the future, machine learning methods may also be merged with systems biology approaches to predict pharmacokinetics/pharmacodynamics (PK/PD) and/or to better use in vitro measurements to estimate in vivo toxicity, which remains a challenge for traditional methods [Citation24].
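
As an illustration of the multitask idea, the sketch below builds a small feed-forward network with a shared representation and one sigmoid output per protein target, using the Keras API; the fingerprint size, number of targets, layer widths and random toy data are all assumptions made for illustration, and a production model would also need a masked loss to handle missing labels.

```python
import numpy as np
from tensorflow.keras import layers, Model

n_features, n_targets = 2048, 100  # fingerprint bits, protein targets (illustrative)

inputs = layers.Input(shape=(n_features,))
hidden = layers.Dense(1024, activation="relu")(inputs)   # shared representation
hidden = layers.Dense(512, activation="relu")(hidden)
outputs = layers.Dense(n_targets, activation="sigmoid")(hidden)  # one head per target

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")

# Toy data: random fingerprints and random activity labels
X = np.random.randint(0, 2, size=(256, n_features)).astype("float32")
Y = np.random.randint(0, 2, size=(256, n_targets)).astype("float32")
model.fit(X, Y, epochs=1, batch_size=64, verbose=0)
```
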
Another important direction is the optimal use of global models to create highly accurate local models based on additional data, as was demonstrated in our study on predicting the logP of Pt complexes using computational methods [Citation30]. The development of such approaches is important for improving global models for new compound series using only a few high-accuracy measurements. Such tasks are typical in medicinal chemistry, and this is one of the application areas where the use of Big Data is most needed.
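
One simple way to turn such a global model into a local one, sketched below, is to continue training it on a handful of high-accuracy measurements for the new series at a reduced learning rate; this is a generic fine-tuning recipe using the assumed names from the sketch above (`model`, `n_features`, `n_targets`), not the specific procedure of [Citation30].

```python
from tensorflow.keras.optimizers import Adam

# A few dozen high-accuracy measurements for a new compound series (toy data)
X_local = np.random.randint(0, 2, size=(48, n_features)).astype("float32")
Y_local = np.random.randint(0, 2, size=(48, n_targets)).astype("float32")

# Re-compile the global model with a small learning rate and continue training,
# so the local data refine rather than overwrite the globally learned weights.
model.compile(optimizer=Adam(learning_rate=1e-4), loss="binary_crossentropy")
model.fit(X_local, Y_local, epochs=5, batch_size=16, verbose=0)
```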

Challenges of exploring virtual chemical spaces

The global models developed with the technologies described in the previous section can be particularly useful for searching virtual chemical spaces, which is essentially a Big Data problem. In a drug discovery project, the question constantly posed by medicinal chemists is which molecule to synthesize next. Owing to the vastness of chemical space, even enumerating the space around existing prioritized compounds and scoring those compounds for synthetic feasibility, ADME and on-target as well as off-target potency is a true Big Data problem. If all available information is taken into account with the latest machine learning algorithms, such as multitask learning where all models are trained simultaneously, much larger computing resources are needed than for standard single QSAR models. Additionally, it is desirable that all models can be automatically updated every time new experimental data are uploaded. A specific example would be building models to predict off-targets for each molecule proposed for synthesis. Training multitask models for the whole genome would mean several thousand models, with up to more than a million data points per model if HTS data are used for training. Thus, the enumeration of chemical space, as well as the building and updating of models, are all Big Data problems that are highly relevant for medicinal chemistry.

As an example, the chemical universe database GDB-17 enumerates >166 billion compounds containing up to 17 atoms [Citation31], while the total space of drug-like molecules is estimated to be about 10⁶⁰ molecules. For medicinal chemists, these virtual spaces can be used to identify new drug-like molecules with favorable properties, for example, promising ADME/T properties that are conducive to drug development. This is a highly challenging task. Even annotating GDB-17 with a fast prediction model that calculates 100,000 molecules per minute would require more than 3 years of computing time on a single processor [Citation1]. Moreover, the straightforward prediction of all these compounds can be of limited value. The number of existing experimental data points varies from hundreds (for complex biological properties such as oral bioavailability) to hundreds of thousands (for simple physicochemical properties such as MP). The use of multitask learning of several properties can help to enlarge the experimental space, but even in this case the developed models would still need to extrapolate from one experimental measurement to hundreds of thousands or even hundreds of millions of compounds when predicting GDB-17. The extrapolations, of course, cannot be reliable for all compounds. Applicability domain (AD) methods [Citation32] can help to identify subsets of molecules in the twilight drug-like zone, in other words, molecules with reliable predictions but also with properties favorable for drug development (high solubility, low toxicity and so on). Although many AD methods exist [Citation32], a model based on compounds that are solid at room temperature failed, even with a state-of-the-art method for AD definition, to identify compounds with MP below 0°C [Citation8]. This result indicates that AD methods need further development before they can be used reliably for the analysis of large virtual chemical spaces. To better estimate the AD, and accordingly the confidence in predictions, the method of conformal prediction has been pioneered in the drug discovery field by AstraZeneca and collaborators [Citation33]. Conformal predictors were originally developed at Royal Holloway, University of London, UK.
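
The computing-time estimate above can be checked with simple arithmetic, as in the sketch below; the throughput of 100,000 molecules per minute is the assumption quoted in the text.

```python
gdb17_size = 166.4e9        # ~166 billion molecules in GDB-17
rate_per_minute = 100_000   # assumed single-processor prediction throughput

minutes = gdb17_size / rate_per_minute
years = minutes / (60 * 24 * 365)
print(f"{years:.1f} years")  # roughly 3.2 years, i.e., >3 years on one processor
```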

A prominent example of a medicinal chemistry-related application of Big Data in the pharmaceutical industry is the design and utilization of so-called virtual libraries, which are constructed by compiling large numbers of organic chemical reactions and available chemical reagents. At AstraZeneca, such a virtual library was constructed using synthetic protocols extracted from the in-house corporate electronic laboratory notebook to enable virtual screening of this huge chemical space (which can reach 10¹⁵ products in theory) via 2D structural fingerprints [Citation34]. Similar systems have been developed in other pharmaceutical companies. BI-Claim from Boehringer Ingelheim uses in-house combinatorial library generation protocols and commercial reagents to generate virtual libraries [Citation35]. The system can theoretically enumerate 5 × 10¹¹ virtual chemical structures, and similarity searching can be carried out via Ftrees-Fragment Spaces. One application of this platform to a drug discovery project has been reported [Citation36], in which virtual screening of the combinatorial libraries via Ftrees-Fragment Spaces led to the identification of two new structural classes of GPR119 agonists with submicromolar in vitro potencies. Researchers from Pfizer reported the Pfizer Global Virtual Library (PGVL) of synthetically feasible compounds, which makes use of over 1200 combinatorial reactions and can theoretically enumerate 10¹⁴–10¹⁸ virtual compounds [Citation37]. A custom desktop software package, PGVL-Hub, has been developed to enable similarity searches for queries of interest and the design of focused libraries [Citation38]. PGVL-Hub has been applied [Citation39] in the discovery of novel Chk1 inhibitors, where two lead compounds were obtained through two rounds of focused library design based on a single initial HTS hit. Recently, researchers at Lilly reported using their own virtual library platform, the Proximal Lilly Collection, to carry out near-neighbor searches, focused library design and virtual screening. To develop selective hRIO2 kinase inhibitors, a similarity search of the Proximal Lilly Collection based on the old anti-inflammatory drug diphenpyramide led to the identification of one follow-up compound with a tenfold increase in potency [Citation40].
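
The kind of 2D fingerprint similarity search used to mine such virtual libraries can be sketched with open-source tools as shown below; the RDKit Morgan fingerprints, the Tanimoto cutoff and the tiny stand-in library are illustrative assumptions rather than the proprietary workflows described above.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def fingerprint(smiles: str):
    """2048-bit Morgan fingerprint, or None if the SMILES cannot be parsed."""
    mol = Chem.MolFromSmiles(smiles)
    return AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048) if mol else None

query_fp = fingerprint("CC(=O)Oc1ccccc1C(=O)O")      # illustrative query (aspirin)
library = ["CC(=O)Nc1ccc(O)cc1", "O=C(O)c1ccccc1O"]  # stand-in for an enumerated virtual library

hits = []
for smi in library:
    fp = fingerprint(smi)
    if fp is None:
        continue
    similarity = DataStructs.TanimotoSimilarity(query_fp, fp)
    if similarity >= 0.3:                            # arbitrary similarity cutoff
        hits.append((smi, similarity))

print(sorted(hits, key=lambda hit: -hit[1]))
```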

Training in Big Data analytics

The development and use of methods to advance the analysis of Big Data requires adequately trained specialists [Citation1]. In this respect, chemoinformatics specialists, who receive education in both chemistry and informatics, will play a leading role in the further development of this field. Funding provided by the European Commission to educational programs, such as the Marie Skłodowska-Curie Innovative Training Network European Doctorate ‘Big Data in Chemistry’ [Citation41], is also important for developing specialized training programs that closely match the requirements of industry with the proposed theoretical and practical training.

Conclusion

While one may question whether ‘Big Data’ accurately describes the datasets handled within the medicinal chemistry field, there is no denying the demand for Big Data approaches capable of analyzing the increasing volumes of data in this field. Techniques and methods that enable the exploration of virtual chemical spaces and the (deep) learning of several properties simultaneously are expected to allow medicinal chemists to exploit Big Data efficiently. Last but not least, further progress will also critically depend on training programs and advances in chemoinformatics, a discipline bridging chemistry and informatics.

Acknowledgements

This article reflects only the authors’ view and neither the European Commission nor the Research Executive Agency are responsible for any use that may be made of the information it contains. The authors thank BIGCHEM partners for their comments and suggestions, which were important to improve this manuscript.

Financial & competing interests disclosure

The project leading to this article has received funding from the European Union's Horizon2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No 676434, ‘Big Data in Chemistry’ (‘BIGCHEM’). IV Tetko is CEO and founder of BigChem GmbH, which licenses the OCHEM [Citation10]. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.

No writing assistance was utilized in the production of this manuscript.

References

1. Tetko IV, Engkvist O, Koch U, Reymond JL, Chen H. BIGCHEM: challenges and opportunities for Big Data analysis in chemistry. Mol. Inform. doi:10.1002/minf.201600073 (2016) (Epub ahead of print).
2. Muresan S, Petrov P, Southan C et al. Making every SAR point count: the development of Chemistry Connect for the large-scale integration of structure and bioactivity data. Drug Discov. Today 16(23–24), 1019–1030 (2011).
3. Mondal K. Design issues of Big Data parallelisms. Adv. Intell. Syst. Comput. 434, 209–217 (2016).
4. PubChem BioAssay Database. http://pubchem.ncbi.nlm.nih.gov
5. Inside eBay's 90PB data warehouse. www.itnews.com.au/news/inside-ebay8217s-90pb-data-warehouse-342615
6. How Amazon Works. http://money.howstuffworks.com/amazon1.htm
7. Mervin LH, Afzal AM, Drakakis G, Lewis R, Engkvist O, Bender A. Target prediction utilising negative bioactivity data covering large chemical space. J. Cheminform. 7, 51 (2015).
8. Tetko IV, Sushko Y, Novotarskyi S et al. How accurately can we predict the melting points of drug-like compounds? J. Chem. Inf. Model. 54(12), 3320–3329 (2014).
9. Tetko IV, Lowe D, Williams AJ. The development of models to predict melting and pyrolysis point data associated with several hundred thousand compounds mined from PATENTS. J. Cheminform. 8, 2 (2016).
10. Sushko I, Novotarskyi S, Korner R et al. Online chemical modeling environment (OCHEM): web platform for data storage, model development and publishing of chemical information. J. Comput. Aided Mol. Des. 25(6), 533–554 (2011).
11. Online Chemical Database. http://ochem.eu
12. Bergstrom CA, Norinder U, Luthman K, Artursson P. Molecular descriptors influencing melting point and their role in classification of solid drugs. J. Chem. Inf. Comput. Sci. 43(4), 1177–1185 (2003).
13. Ran Y, Yalkowsky SH. Prediction of drug solubility by the general solubility equation (GSE). J. Chem. Inf. Comput. Sci. 41(2), 354–357 (2001).
14. Balakin KV, Savchuk NP, Tetko IV. In silico approaches to prediction of aqueous and DMSO solubility of drug-like compounds: trends, problems and solutions. Curr. Med. Chem. 13(2), 223–241 (2006).
15. Varnek A, Gaudin C, Marcou G, Baskin I, Pandey AK, Tetko IV. Inductive transfer of knowledge: application of multi-task learning and feature net approaches to model tissue-air partition coefficients. J. Chem. Inf. Model. 49(1), 133–144 (2009).
16. Ramsundar B, Kearnes S, Riley P, Webster D, Konerding D, Pande V. Massively multitask networks for drug discovery. ArXiv e-prints 1502.02072 (2015).
17. Baskin I, Winkler D, Tetko IV. A renaissance of neural networks in drug discovery. Expert Opin. Drug Discov. 11(8), 785–795 (2016).
18. Gawehn E, Hiss JA, Schneider G. Deep learning in drug discovery. Mol. Inf. 35(1), 3–14 (2016).
19. 96.7% recognition rate for handwritten Chinese characters using AI that mimics the human brain. http://phys.org/news/2015-09-recognition-handwritten-chinese-characters-ai.html
20. Borowiec S. AlphaGo seals 4–1 victory over Go grandmaster Lee Sedol. The Guardian, 15 March (2016). www.theguardian.com/technology/2016/mar/15/googles-alphago-seals-4-1-victory-over-grandmaster-lee-sedol
21. Mayr A, Klambauer G, Unterthiner T, Hochreiter S. DeepTox: toxicity prediction using deep learning. Front. Environ. Sci. 3, 80 (2016).
22. Tetko IV. Associative neural network. Methods Mol. Biol. 458, 185–202 (2008).
23. Abdelaziz A, Spahn-Langguth H, Werner-Schramm K, Tetko IV. Consensus modeling for HTS assays using in silico descriptors calculates the best balanced accuracy in Tox21 challenge. Front. Environ. Sci. 4, 2 (2016).
24. Novotarskyi S, Abdelaziz A, Sushko Y, Korner R, Vogt J, Tetko IV. ToxCast EPA in vitro to in vivo challenge: insight into the Rank-I model. Chem. Res. Toxicol. 29(5), 768–775 (2016).
25. Simm J, Arany A, Zakeri P et al. Macau: scalable Bayesian multi-relational factorization with side information using MCMC. ArXiv e-prints 1509.04610 (2015). https://arxiv.org/abs/1509.04610
26. Zawbaa HM, Szlek J, Grosan C, Jachowicz R, Mendyk A. Computational intelligence modeling of the macromolecules release from PLGA microspheres – focus on feature selection. PLoS ONE 11(6), e0157610 (2016).
27. Arany A, Simm J, Zakeri P et al. Highly scalable tensor factorization for prediction of drug–protein interaction type. ArXiv e-prints 1512.00315 (2015). https://arxiv.org/pdf/1512.00315.pdf
28. Unterthiner T, Mayr A, Klambauer G et al. Deep learning as an opportunity in virtual screening. Presented at: NIPS 2014 Deep Learning and Representation Learning Workshop, Montreal, Canada, 8–13 December 2014.
29. Exascalable Compound Activity Prediction Engines. www.excape-h2020.eu
30. Tetko IV, Jaroszewicz I, Platts JA, Kuduk-Jaworska J. Calculation of lipophilicity for Pt(II) complexes: experimental comparison of several methods. J. Inorg. Biochem. 102(7), 1424–1437 (2008).
31. Ruddigkeit L, Van Deursen R, Blum LC, Reymond JL. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. J. Chem. Inf. Model. 52(11), 2864–2875 (2012).
32. Tetko IV, Bruneau P, Mewes HW, Rohrer DC, Poda GI. Can we estimate the accuracy of ADME-Tox predictions? Drug Discov. Today 11(15–16), 700–707 (2006).
33. Norinder U, Carlsson L, Boyer S, Eklund M. Introducing conformal prediction in predictive modeling. A transparent and flexible alternative to applicability domain determination. J. Chem. Inf. Model. 54(6), 1596–1603 (2014).
34. Vainio MJ, Kogej T, Raubacher F. Automated recycling of chemistry for virtual screening and library design. J. Chem. Inf. Model. 52(7), 1777–1786 (2012).
35. Lessel U, Wellenzohn B, Lilienthal M, Claussen H. Searching fragment spaces with feature trees. J. Chem. Inf. Model. 49(2), 270–279 (2009).
36. Wellenzohn B, Lessel U, Beller A, Isambert T, Hoenke C, Nosse B. Identification of new potent GPR119 agonists by combining virtual screening and combinatorial chemistry. J. Med. Chem. 55(24), 11031–11041 (2012).
37. Peng Z. Very large virtual compound spaces: construction, storage and utility in drug discovery. Drug Discov. Today Technol. 10(3), e387–394 (2013).
38. Peng Z, Yang B, Mattaparti S et al. PGVL Hub: an integrated desktop tool for medicinal chemists to streamline design and synthesis of chemical libraries and singleton compounds. Methods Mol. Biol. 685, 295–320 (2011).
39. Teng M, Zhu J, Johnson MD et al. Structure-based design and synthesis of (5-arylamino-2H-pyrazol-3-yl)-biphenyl-2′,4′-diols as novel and potent human CHK1 inhibitors. J. Med. Chem. 50(22), 5253–5256 (2007).
40. Nicolaou CA, Watson IA, Hu H, Wang J. The Proximal Lilly Collection: mapping, exploring and exploiting feasible chemical space. J. Chem. Inf. Model. 56(7), 1253–1266 (2016).
41. BigChem. http://bigchem.eu