Abstract
The use of modern data science has recently emerged as a promising new path to tackling the complex challenges involved in the creation of next-generation chemistry and materials. However, despite the appeal of this potentially transformative development, the chemistry community has yet to incorporate it as a central tool in everyday work. Our research program is designed to enable and advance this emerging research approach. It is centred around the creation of a software ecosystem that brings together physics-based modelling, high-throughput in silico screening and data analytics (i.e. the use of machine learning and informatics for the validation, mining and modelling of chemical data). This cyberinfrastructure is devised to offer a comprehensive set of data science techniques and tools as well as a general-purpose scope to make it as versatile and widely applicable as possible. It also emphasises user-friendliness to make it accessible to the community at large. It thus provides the means for the large-scale exploration of chemical space and for a better understanding of the hidden mechanisms that determine the properties of complex chemical systems. Such insights can dramatically accelerate, streamline and ultimately transform the way chemical research is conducted. Aside from serving as a production-level tool, our cyberinfrastructure is also designed to facilitate and assess methodological innovation. Both the software and method development work are driven by concrete molecular design problems, which also allow us to assess the efficacy of the overall cyberinfrastructure.
Acknowledgements
Computing time on the high-performance computing clusters ‘Rush’, ‘Alpha’, ‘Beta’ and ‘Gamma’ was provided by the University at Buffalo (UB) Center for Computational Research (CCR). The work presented here is part of MAFA’s, MH’s and YP’s PhD theses. In addition, this review mentions contributions of the following members and former members of the Hachmann group: Dr. Andrew J. Schultz (Research Assistant Professor; deep eutectic solvent and organic photovoltaics applications), Aditya Sonpal (MSc; ChemBDDB, ChemML, deep eutectic solvent and high-refractive index polymer applications), Gaurav Vishwakarma (MSc; ChemML), Po-Han Chen (MSc; ChemML), Vigneshwar Kumaran Sudalayandi Rajeswari (MSc; biodegradable polymer application), Amol Rajendra Mahajan (MSc; liquid organic hydrogen carrier application), Shirish Sivaraj (MSc; ChemBDDB), Noah A. Zydel (BSc; ChemML), Chi Hin Chan (BSc; deep eutectic solvent application), Andrew J. DeRooy (BSc; deep eutectic solvent application), Sykhere A. Brown (BSc; ChemML), Supriya Agrawal (MSc ’17; ChemBDDB), Sai Prasad Ganesh (BSc ’17; high-refractive index polymer application), Mark A. Pitman (BSc ’17; hydrolysis catalyst application), William S. Evangelista II (MEng ’16; ChemHTPS), Yujie Tian (MSc ’16; ChemML, organic semiconductor application), Mikhail Pechagin (BSc ’16; ChemML), Ching-Yen Shih (MSc ’15; ChemML) and Bryan A. Moore (BSc ’15; organic photovoltaics application).
Notes
No potential conflict of interest was reported by the authors.