Abstract
The ‘big data’ concept plays an increasingly important role in many scientific fields. Big data involves more than the unprecedentedly large volumes of data that are becoming available. Different criteria characterizing big data must be carefully considered in computational data mining, as we discuss herein with a focus on medicinal chemistry. This is a scientific discipline where big data is beginning to emerge and provide new opportunities. For example, the ability of many drugs to specifically interact with multiple targets, termed promiscuity, forms the molecular basis of polypharmacology, a hot topic in drug discovery. Compound promiscuity analysis is an area that is strongly influenced by big data phenomena. As we also demonstrate, different results are obtained depending on the data selection and confidence criteria that are chosen.
Lay abstract
‘Big data’ is affecting many areas of life, more so than we might realize. It is often not well understood what big data really means. This is also true in science. For example, medicinal chemistry, which is a conservative scientific discipline and slow to respond to new trends, is currently entering the big data era. This provides many new opportunities and challenges, as exemplified by the computational study of biological activities of drugs and other compounds from medicinal chemistry.
Graphical abstract
For the anticancer drug imatinib (top), a kinase inhibitor, the histogram indicates the relative number of target annotations on the basis of low- (red), medium- (green) or high-confidence (blue) activity data. Hence, the magnitude of detectable promiscuity (multitarget activity) decreases with increasing data confidence.
Author contributions
J Bajorath conceived the study; Y Hu and J Bajorath planned the analysis; Y Hu carried out the analysis; Y Hu and J Bajorath analyzed the results and wrote the manuscript.
Acknowledgements
The authors thank OpenEye Scientific Software for a free academic license.
Financial & competing interests disclosure
The authors have no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.
No writing assistance was utilized in the production of this manuscript.