References
- Wu Z , Ramsundar B , Feinberg EN , et al . MoleculeNet: a benchmark for molecular machine learning. Chem Sci. 2018;9(2):513–530.
- Ramprasad R , Batra R , Pilania G , et al . Machine learning in materials informatics: recent applications and prospects. npj Comput Mater. 2017;3(1):54–66.
- Ramakrishnan R , von Lilienfeld A . Machine learning, quantum chemistry, and chemical space. Rev Comput Chem. 2017;225–256.
- Ward L , Wolverton C . Atomistic calculations and materials informatics: a review. Curr Opin Solid State Mater Sci. 2017;21(3):167–176.
- Agrawal A , Choudhary A . Perspective: materials informatics and big data: realization of the ‘fourth paradigm’ of science in materials science. APL Mater. 2016;4(5):053208.
- Jain A , Hautier G , Ong SP , et al . New opportunities for materials informatics: resources and data mining techniques for uncovering hidden relationships. J Mater Res. 2016;31(8):977–994.
- Wagner N , Rondinelli JM . Theory-guided machine learning in materials science. Front Mater. 2016;3:28–36.
- von Lilienfeld A . Quantum machine learning in chemical compound space. Angew Chem Int Ed. 2017;57:2–8.
- Janet JP , Kulik HJ . Resolving transition metal chemical space: feature selection for machine learning and structure-property relationships. J Phys Chem A. 2017 Nov;121:8939–8954.
- Janet JP , Kulik HJ . Predicting electronic structure properties of transition metal complexes with neural networks. Chem Sci. 2017 Feb;8:5137–5152.
- Faber FA , Hutchison L , Huang B , et al . Prediction errors of molecular machine learning models lower than hybrid DFT error. J Chem Theory Comput. 2017;13(11):5255–5264.
- Ma J , Sheridan RP , Liaw A , et al . Deep neural nets as a method for quantitative structure-activity relationships. J Chem Inf Model. 2015 Feb;55:263–274.
- Fernandez M , Trefiak NR , Woo TK . Atomic property weighted radial distribution functions descriptors of metal-organic frameworks for the prediction of gas uptake capacity. J Phys Chem C. 2013;117:14095–14105.
- Potyrailo R , Rajan K , Stoewe K , et al . Combinatorial and high-throughput screening of materials libraries: review of state of the art. ACS Comb Sci. 2011;13(6):579–633.
- Murphy RF . An active role for machine learning in drug development. Nat Chem Biol. 2011;7(6):327–330.
- Fernandez M , Barnard AS . Identification of nanoparticle prototypes and archetypes. ACS Nano. 2015;9(12):11980–11992.
- Fernandez M , Breedon M , Cole IS , et al . Modeling corrosion inhibition efficacy of small organic molecules as non-toxic chromate alternatives using comparative molecular surface analysis (CoMSA). Chemosphere. 2016;160:80–88.
- Varnek A , Baskin I . Machine learning methods for property prediction in chemoinformatics: quo vadis? J Chem Inf Model. 2012;52(6):1413–1437.
- Swann ET , Fernandez M , Coote ML , et al . Bias-free chemically diverse test sets from machine learning. ACS Comb Sci. 2017;19(8):544–554.
- De S , Bartók AP , Csányi G , et al . Comparing molecules and solids across structural and alchemical space. Phys Chem Chem Phys. 2016;18:13754–13769.
- Sadeghi A , Ghasemi SA , Schaefer B , et al . Metrics for measuring distances in configuration spaces. J Chem Phys. 2013;139(18):184118.
- Cherkasov A , Muratov EN , Fourches D , et al . QSAR modeling: where have you been? Where are you going to? J Med Chem. 2014;57(12):4977–5010.
- Ghiringhelli LM , Vybiral J , Levchenko SV , et al . Big data of materials science: critical role of the descriptor. Phys Rev Lett. 2015;114:105503.
- Sun B , Fernandez M , Barnard AS . Machine learning for silver nanoparticle electron transfer property prediction. J Chem Inf Model. 2017;57(10):2413–2423.
- Fernandez M , Bilić A , Barnard AS . Machine learning and genetic algorithm prediction of energy differences between electronic calculations of graphene nanoflakes. Nanotechnology. 2017;28(38):38LT03–38LT06.
- Ma X , Li Z , Achenie LEK , et al . Machine-learning-augmented chemisorption model for CO2 electroreduction catalyst screening. J Phys Chem Lett. 2015 Sep;6:3528–3533.
- Behler J . First principles neural network potentials for reactive simulations of large molecular and condensed systems. Angew Chem Int Ed. 2017;56(42):12828–12840.
- Behler J . Constructing high-dimensional neural network potentials: a tutorial review. Int J Quantum Chem. 2015;115(16):1032–1050.
- Pietrucci F , Andreoni W . Graph theory meets ab initio molecular dynamics: atomic structures and transformations at the nanoscale. Phys Rev Lett. 2011;107:085504.
- Eshet H , Khaliullin RZ , Kühne TD , et al . Ab initio quality neural-network potential for sodium. Phys Rev B. 2010 May;81:184107.
- Bartók AP , Kondor R , Csányi G . On representing chemical environments. Phys Rev B. 2013;87:184115.
- Hansen K , Biegler F , Ramakrishnan R , et al . Machine learning predictions of molecular properties: Accurate many-body potentials and nonlocality in chemical space. J Phys Chem Lett. 2015;6(12):2326–2331.
- Von Lilienfeld OA . First principles view on chemical compound space: gaining rigorous atomistic control of molecular properties. Int J Quantum Chem. 2013;113(12):1676–1689.
- Huang B , von Lilienfeld OA . Communication: understanding molecular representations in machine learning: The role of uniqueness and target similarity. J Chem Phys. 2016 Oct;145:161102–161107.
- Seko A , Togo A , Tanaka I . Descriptors for machine learning of materials data. Singapore: Springer; 2018. p. 3–23.
- Todeschini R , Consonni V . Molecular descriptors for chemoinformatics: volume I: alphabetical listing/volume II: appendices, references. vol. 41. Weinheim: Wiley; 2009.
- Todeschini R , Consonni V . Handbook of molecular descriptors. vol. 11. Weinheim: Wiley; 2008.
- Katritzky AR , Lobanov VS , Karelson M . CODESSA 2.0 (Comprehensive descriptors for structural and statistical analysis). Gainesville, USA: University of Florida; 1996.
- Gardiner EJ , Gillet VJ , Haranczyk M , et al . Turbo similarity searching: effect of fingerprint and dataset on virtual-screening performance. Stat Anal Data Mining. 2009;2(2):103–114.
- Nguyen KT , Blum LC , Deursen RV , et al . Classification of organic molecules by molecular quantum numbers. ChemMedChem. 2009;4:1803–1805.
- Pearlman RS , Smith KM . Novel software tools for chemical diversity. Perspect Drug Discovery Des. 1998;9:339–353.
- Sliwoski G , Kothiwale S , Meiler J , et al . Computational methods in drug discovery. Pharmacol Rev. 2014;66(1):334–395.
- Rognan D . The impact of in silico screening in the discovery of novel and safer drug candidates. Pharmacol Ther. 2017;175:47–66.
- Willett P . Similarity-based virtual screening using 2D fingerprints. Drug Discovery Today. 2006;11(23–24):1046–1053.
- Bonchev D , Trinajstić N . Information theory, distance matrix, and molecular branching. J Chem Phys. 1977;67(10):4517.
- Bertz SH . The first general index of molecular complexity. J Am Chem Soc. 1981;103(12):3599–3601.
- Balaban AT . Highly discriminating distance-based topological index. Chem Phys Lett. 1982;89(5):399–404.
- Hall LH , Kier LB . The molecular connectivity chi indexes and kappa shape indexes in structure-property modeling. Rev Comput Chem. 2007;2:367–422.
- Bonchev D . On the concept for overall topological representation of molecular structure. Adv Math Chem App. 2015(1):42–75.
- Fernandez M , Abreu JI , Shi H , et al . Machine learning prediction of the energy gap of graphene nanoflakes using topological autocorrelation vectors. ACS Comb Sci. 2016;18(11):661–664.
- Hastie T , Tibshirani R , Friedman J . The elements of statistical learning. New York (NY): Springer; 2009.
- Rupp M , Tkatchenko A , Müller K-R , et al . Fast and accurate modeling of molecular atomization energies with machine learning. Phys Rev Lett. 2012;108(5):058301.
- Bartók AP , Gillan MJ , Manby FR , et al . Machine-learning approach for one- and two-body corrections to density functional theory: Applications to molecular and condensed water. Phys Rev B. 2013;88:054104.
- Montavon G , Rupp M , Gobre V , et al . Machine learning of molecular electronic properties in chemical compound space. New J Phys. 2013;15(9):095003.
- Hansen K , Montavon G , Biegler F , et al . Assessment and validation of machine learning methods for predicting molecular atomization energies. J Chem Theory Comput. 2013;9(8):3404–3419.
- Rupp M , Tkatchenko A , Müller K-R , et al . Rupp et al. reply. Phys Rev Lett. 2012;109:059802.
- Faber FA , Hutchison L , Huang B , et al . Prediction errors of molecular machine learning models lower than hybrid DFT error. J Chem Theory Comput. 2017 Oct;13:5255–5264.
- Hemmer MC , Steinhauer V , Gasteiger J . Deriving the 3D structure of organic molecules from their infrared spectra. Vib Spectrosc. 1999;19(1):151–164.
- Hemmer MC , Gasteiger J . Prediction of three-dimensional molecular structures using information from infrared spectra. Anal Chim Acta. 2000;420(2):145–154.
- Fernandez M , Trefiak NR , Woo TK . Atomic property weighted radial distribution functions descriptors of metal-organic frameworks for the prediction of gas uptake capacity. J Phys Chem C. 2013;117(27):14095–14105.
- Fernandez M , Shi H , Barnard AS . Quantitative structure-property relationship modeling of electronic properties of graphene using atomic radial distribution function scores. J Chem Inf Model. 2015;55(12):2500–2506.
- Hemmer MC . Radial distribution functions in computational chemistry -- theory and applications [PhD thesis]. Friedrich-Alexander University Erlangen-Nürnberg; 2007.
- Jackson JE . A user’s guide to principal components. New York (NY): Wiley; 1991.
- Tenenbaum JB , de Silva V , Langford JC . A global framework for nonlinear dimensionality reduction. Science. 2000;290(5500):2319–2323.
- Roweis S , Saul L . Nonlinear dimensionality reduction by locally linear embedding. Science. 2000;290(5500):2323–2326.
- Belkin M , Niyogi P . Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 2003;15(6):1373–1396.
- Kruskal JB . Nonmetric multidimensional scaling: a numerical method. Psychometrika. 1964;29(2):115–129.
- Kruskal JB . Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika. 1964;29(1):1–27.
- van der Maaten L . Accelerating t-SNE using tree-based algorithms. J Mach Learn Res. 2014;15:3221–3245.
- Huo X , Ni X , Smith AK . A survey of manifold-based learning methods. Recent Adv Data Mining Enterprise Data. 2007;691–745.
- Kohonen T . The self-organizing map. Neurocomputing. 1998;21(1–3):1–6.
- Gasteiger J , Li X , Rudolph C , et al . Representation of molecular electrostatic potentials by topological feature maps. J Am Chem Soc. 1994;116(11):4608–4620.
- Vatanen T , Osmala M , Raiko T , et al . Self-organization and missing values in SOM and GTM. Neurocomputing. 2015;147(1):60–70.
- Wittek P , Gao SC , Lim IS , et al . Somoclu: an efficient parallel library for self-organizing maps. J Stat Softw. 2017;78:1–21. DOI:10.18637/jss.v078.i09
- O’Grady KE . Measures of explained variance: cautions and limitations. Psychol Bull. 1982;92(3):766–777.
- Cutler A , Breiman L . Archetypal analysis. Technometrics. 1994;36(3):338–347.
- Huggins P , Pachter L , Sturmfels B . Toward the human genotope. Bull Math Biol. 2007;69(8):2723–2735.
- Thøgersen JC , Mørup M , Damkiær S , et al . Archetypal analysis of diverse Pseudomonas aeruginosa transcriptomes reveals adaptation in cystic fibrosis airways. BMC Bioinf. 2013;14:279–293.
- Shoval O , Sheftel H , Shinar G , et al . Evolutionary trade-offs, Pareto optimality, and the geometry of phenotype space. Science. 2012;336:1157–1160.
- Marinetti S , Finesso L , Marsilio E . Archetypes and principal components of an IR image sequence. Infrared Phys Technol. 2007;49(3):272–276.
- Mørup M , Hansen LK . Archetypal analysis for machine learning and data mining. Neurocomputing. 2012;80:54–63.
- Kosti MV , Feldt R , Angelis L . Archetypal personalities of software engineers and their work preferences: a new perspective for empirical studies. Empir Softw Eng. 2016;21:1509–1532.
- Porzio GC , Ragozini G , Vistocco D . On the use of archetypes as benchmarks. Appl Stoch Model Bus Ind. 2008;24(5):419–437.
- Eugster MJA , Leisch F . From spider-man to hero -- archetypal analysis in R. J Stat Softw. 2009;30(8):1–23.
- Estivill-Castro V . Why so many clustering algorithms. ACM SIGKDD Explor Newsl. 2002;4(1):65–75.
- Jain AK , Topchy A , Law MH , et al . Landscape of clustering algorithms. In: Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004. vol. 1. Cambridge, UK: IEEE; 2004. p. 260–263.
- Sun B , Barnard AS . The impact of size and shape distributions on the electron charge transfer properties of silver nanoparticles. Nanoscale. 2017;9(34):12698–12708.
- Sun B , Barnard AS . Impact of speciation on the electron charge transfer properties of nanodiamond drug carriers. Nanoscale. 2016;8(29):14264–14270.
- Sun B , Fernandez M , Barnard AS . Statistics, damned statistics and nanoscience -- using data science to meet the challenge of nanomaterial complexity. Nanoscale Horiz. 2016;1(2):89–95.
- Barnard AS . Impact of distributions on the photocatalytic performance of anatase nanoparticle ensembles. J Mater Chem A. 2015;3(1):60–64.
- Shi H , Rees RJ , Per MC , et al . Impact of distributions and mixtures on the charge transfer properties of graphene nanoflakes. Nanoscale. 2015;7(5):1864–1871.
- Barnard AS , Per MC . Size and shape dependent deprotonation potential and proton affinity of nanodiamond. Nanotechnology. 2014;25(44):445702.
- Fernandez M , Barnard AS . Geometrical properties can predict CO2 and N2 adsorption performance of metal-organic frameworks (MOFs) at low pressure. ACS Comb Sci. 2016;18(5):243–252.
- Barnard AS . Computational strategies for predicting the potential risks associated with nanotechnology. Nanoscale. 2009;1(1):89–95.
- Von Luxburg U . A tutorial on spectral clustering. Stat Comput. 2007;17(4):395–416.
- Xiang T , Gong S . Spectral clustering with eigenvector selection. Pattern Recognit. 2008;41(3):1012–1029.
- Ng AY , Jordan MI , Weiss Y . On spectral clustering: analysis and an algorithm. Adv Neural Inf Process Syst. 2002;849–856.
- Everitt BS , Landau S , Leese M , et al . Hierarchical clustering. New York (NY): Wiley; 2011.
- Murtagh F , Contreras P . Algorithms for hierarchical clustering: an overview. Wiley Interdiscip Rev Data Min Knowl Discov. 2012;2(1):86–97.
- Murtagh F , Contreras P . Methods of hierarchical clustering. Comput (Long Beach Calif). 2011;38(2):1–21.
- Johnson RD III . NIST computational chemistry comparison and benchmark database, NIST standard reference database number 101. 2016 [cited 2016 Oct 18]. Available from: http://cccbdb.nist.gov/
- Durant JL , Leland BA , Henry DR , et al . Reoptimization of MDL keys for use in drug discovery. J Chem Inf Comput Sci. 2002;42:1273–1280.
- Landrum G . RDKit: open source cheminformatics. 2016. Available from: http://www.rdkit.org
- Curtiss LA , Raghavachari K , Redfern PC , et al . Assessment of Gaussian-3 and density functional theories for a larger experimental test set. J Chem Phys. 2000;112(17):7374–7383.
- Grimme S . Accurate calculation of the heats of formation for large main group compounds with spin-component scaled MP2 methods. J Phys Chem A. 2005;109:3067–3077.
- Winter NW . Theoretical description of the diimide molecule. J Chem Phys. 1975;62(4):1269.
- Parsons CA , Dykstra CE . Electron correlation and basis set effects in unimolecular reactions. A study of the model rearrangement system N2H2. J Chem Phys. 1979;71(7):3025.
- Jensen HJA , Jørgensen P , Helgaker T . Ground-state potential energy surface of diazene. J Am Chem Soc. 1987;109:2895–2901.
- Andzelm J , Sosa C , Eades RA . Theoretical study of chemical reactions using density functional methods with nonlocal corrections. J Phys Chem. 1993;97(18):4664–4669.
- McKee ML . Catalyzed cis/trans isomerization of diazene. A computational study in the gas and aqueous phases. J Phys Chem. 1993;97:13608–13614.
- Smith BJ . Isomers and transition structures of diazene. J Phys Chem. 1993;97:10513–10514.
- Angeli C , Cimiraglia R , Hofmann H-J . On the competition between the inversion and rotation mechanisms in the cis-trans thermal isomerization of diazene. Chem Phys Lett. 1996;259:276–282.
- Jursic B . Ab initio and density functional theory study of the diazene isomerization. Chem Phys Lett. 1996;261:13–17.
- Mach P , Masik J , Urban J , et al . Single-root multireference Brillouin-Wigner coupled-cluster theory. Rotational barrier of the N2H2 molecule. Mol Phys. 1998;94(1):173–179.
- Martin JML , Taylor PR . Benchmark ab initio thermochemistry of the isomers of diimide, N2H2, using accurate computed structures and anharmonic force fields. Mol Phys. 1999;96(4):681–692.
- Stepanic V , Baranovic G . Ground and excited states of isodiazene - an ab initio study. Chem Phys. 2000;254:151–168.
- Chattaraj PK , Perez P , Zevallos J , et al . Theoretical study of the trans N2H2 → cis N2H2 and F2S2 → FSSF reactions in gas and solution phases. J Mol Struct. 2001;580:171–182.
- Hwang D , Mebel A . Reaction mechanism of N2/H2 conversion to NH3: a theoretical study. J Phys Chem A. 2003;107:2865–2874.
- Pu X , Wong N-B , Zhou G , et al . Substituent effects on the trans/cis isomerization and stability of diazenes. Chem Phys Lett. 2005;408(1–3):101–106.
- Biczysko M , Poveda L , Varandas A . Accurate MRCI study of ground-state N2H2 potential energy surface. Chem Phys Lett. 2006;424(1–3):46–53.
- Chaudhuri RK , Freed KF , Chattopadhyay S , et al . Potential energy curve for isomerization of N2H2 and C2H4 using the improved virtual orbital multireference Møller-Plesset perturbation theory. J Chem Phys. 2008;128(14):144304.
- Mahapatra US , Chattopadhyay S . Evaluation of the performance of single root multireference coupled cluster method for ground and excited states, and its application to geometry optimization. J Chem Phys. 2011;134(4):044113.
- Jana J . Relative stabilities of two difluorodiazene isomers: density functional and molecular orbital studies. Reports Theor Chem. 2012;1:1–10.
- Musiał M , Lupa Ł , Szopa K , et al . Potential energy curves via double ionization potential calculations: example of 1,2-diazene molecule. Struct Chem. 2012;23(5):1377–1382.
- Sand AM , Schwerdtfeger CA , Mazziotti DA . Strongly correlated barriers to rotation from parametric two-electron reduced-density-matrix methods in application to the isomerization of diazene. J Chem Phys. 2012;136(3):034112.
- Swann E , Fernandez M , Barnard A , et al . Cmolsc-1 quantum chemical test set. v1. CSIRO Data Collection. 2017. DOI:10.4225/08/58bcf1565950a
- Swann E , Fernandez M , Barnard A , et al . Cmolst-1 quantum chemical test set. v1. CSIRO Data Collection. 2017. DOI:10.4225/08/58bcf21ca85b6
- Fernandez M , Wilson HF , Barnard AS . Impact of distributions on the prediction of nanoparticle prototypes and archetypes. Nanoscale. 2017;9:832–843.
- Lai L , Barnard AS . Tuning the electron transfer properties of entire nanodiamond ensembles. J Phys Chem C. 2014;118:30209–30215.
- Barnard AS . Impact of distributions on the photocatalytic performance of anatase nanoparticle ensembles. J Mater Chem A. 2015;3:60–64.
- Barnard AS , Wilson HF . Optical emission of statistical distributions of silicon quantum dots. J Phys Chem C. 2015;119:7969–7977.
- Barron H , Barnard AS . Using structural diversity to tune the catalytic performance of Pt nanoparticle ensembles. Catal Sci Technol. 2015;5:2848–2855.
- Silver D , Huang A , Maddison CJ , et al . Mastering the game of Go with deep neural networks and tree search. Nature. 2016;529(7587):484–489.
- Rusk N . Deep learning. Nat Methods. 2015;13(1):35.