Figures & data
![](/cms/asset/83691ff8-4218-454a-9a4a-144578eef00f/tsta_a_1439253_uf0001_oc.jpg)
Figure 1. Example of feature binning procedure for (a) electronegativity and (b) Voronoi feature face area (in red) in Pbcn LiSbWO6 with 4 Li, 4 Sb, 4 W, and 16 O atoms in the unit cell.
![Figure 1. Example of feature binning procedure for (a) electronegativity and (b) Voronoi feature face area (in red) in Pbcn LiSbWO6 with 4 Li, 4 Sb, 4 W, and 16 O atoms in the unit cell.](/cms/asset/78607c34-f08a-4ec0-b576-6bfb18f96939/tsta_a_1439253_f0001_oc.gif)
Table 1. Applied smoothing parameters for various histograms.
Figure 2. (a) Schematic workflow for the extraction procedure implemented for Voronoi tessellation of crystal structures, (b) Test case crystal structure Pbcn LiSbWO6 (green spheres are Li atoms, brown spheres are Sb atoms, magenta spheres are W atoms, and red spheres are O atoms). Voronoi tessellation for LiSbWO6 (c) with all atoms considered (1st nearest-neighbor (NN) information), (d) with Li atoms only (2NN information), and (e) with O atoms only (2NN information).
![Figure 2. (a) Schematic workflow for the extraction procedure implemented for Voronoi tessellation of crystal structures, (b) Test case crystal structure Pbcn LiSbWO6 (green spheres are Li atoms, brown spheres are Sb atoms, magenta spheres are W atoms, and red spheres are O atoms). Voronoi tessellation for LiSbWO6 (c) with all atoms considered (1st nearest-neighbor (NN) information), (d) with Li atoms only (2NN information), and (e) with O atoms only (2NN information).](/cms/asset/1b23310f-3925-404d-9c29-75ae78e027ae/tsta_a_1439253_f0002_oc.gif)
Table 2. Fitting conditions employed for GBR and SVR models.
Figure 3. Test data-set grand average errors for DFT-CE and DFT-BG by (a), (c) GBR (total: 420 trees/data split × 100 data splits = 4200 fitting instances for CE and BG, respectively) and (b), (d) SVR fitting. Test data-set error surface with respect to hyperparameter combination (C and ε) for SVR fitting for (c) DFT-CE and (d) DFT-BG; a total of 18,600 regularly-spaced hyperparameter coordinate sets (or fitting instances) each, respectively.
![Figure 3. Test data-set grand average errors for DFT-CE and DFT-BG by (a), (c) GBR (total: 420 trees/data split × 100 data splits = 4200 fitting instances for CE and BG, respectively) and (b), (d) SVR fitting. Test data-set error surface with respect to hyperparameter combination (C and ε) for SVR fitting for (c) DFT-CE and (d) DFT-BG; a total of 18,600 regularly-spaced hyperparameter coordinate sets (or fitting instances) each, respectively.](/cms/asset/93a3a08d-cfaf-42f8-8de2-7b3c5234d018/tsta_a_1439253_f0003_oc.gif)
Figure 4. Fitting quality of final models: (a) CE with GBR, (b) CE with SVR, (c) BG with GBR, and (d) BG with SVR. Insets in (a) and (c) show deviance (error residuals) plots terminating at the optimal number of base tree learners for GBR fitting (50 trees for CE and 70 trees for BG, respectively). Optimal hyperparameters (C and ε) for SVR fitting are indicated in (b) and (d).
![Figure 4. Fitting quality of final models: (a) CE with GBR, (b) CE with SVR, (c) BG with GBR, and (d) BG with SVR. Insets in (a) and (c) show deviance (error residuals) plots terminating at the optimal number of base tree learners for GBR fitting (50 trees for CE and 70 trees for BG, respectively). Optimal hyperparameters (C and ε) for SVR fitting are indicated in (b) and (d).](/cms/asset/5d57515f-5c4a-4f23-9e83-195998af7a16/tsta_a_1439253_f0004_oc.gif)
Table 3. Comparison of predictive accuracy between EN+VORO and RDF-based final machine models with ICSD-based DFT calculated data-set (140 compounds).
Figure 5. Test data-set grand average errors (from 100 random data splits) for (a) DFT-CE, (c) DFT density (DFT-d), (e) DFT-BG, and (g) DFT decomposition energy (DFT-Ed) by GBR fitting. Final models have (b) 120, (d) 160, (f) 300, (h) 430 trees, respectively.
![Figure 5. Test data-set grand average errors (from 100 random data splits) for (a) DFT-CE, (c) DFT density (DFT-d), (e) DFT-BG, and (g) DFT decomposition energy (DFT-Ed) by GBR fitting. Final models have (b) 120, (d) 160, (f) 300, (h) 430 trees, respectively.](/cms/asset/36b4dad6-716d-44c4-bd2a-5418e9d73c17/tsta_a_1439253_f0005_oc.gif)
Table 4. Comparison of predictive accuracy between EN+VORO- and RDF-based final machine models using Materials Project data-set (1000 compounds).
Figure 6. Scaled variable importance (VI) plot of bin features from the final GBR models for (a) DFT-CE and (b) DFT-BG.
![Figure 6. Scaled variable importance (VI) plot of bin features from the final GBR models for (a) DFT-CE and (b) DFT-BG.](/cms/asset/8e13820e-5d09-4521-bd17-a30d4563f701/tsta_a_1439253_f0006_oc.gif)
Figure 7. (a) Average number of vertices vs. counted average face area <0.2 from Voronoi tessellations of atom centers, (b) histogram for face areas relative to average per Voronoi cell for Pbcn LiSbWO6 crystal structure.
![Figure 7. (a) Average number of vertices vs. counted average face area <0.2 from Voronoi tessellations of atom centers, (b) histogram for face areas relative to average per Voronoi cell for Pbcn LiSbWO6 crystal structure.](/cms/asset/2db91362-a97d-479f-b569-9f58f0b86255/tsta_a_1439253_f0007_oc.gif)