Abstract
Background: Artificial neural networks (ANNs) are widely used to model ‘omics’ data. The choice of modeling methodology and the combination of adjustable parameters both influence model performance and complicate model optimization. Methodology: We evaluated the optimization of four ANN modeling parameters (learning rate annealing, stopping criteria, data split method and network architecture) using retention index (RI) data for 390 compounds. Models were assessed by independent validation (I-Val) using newly measured RI values for 1492 compounds. Conclusion: The best model achieved an I-Val standard error of 55 RI units and was built using a Ward’s clustering data split and a minimally nonlinear network architecture. Using validation statistics for the stopping criterion and for final model selection yielded better independent validation performance than using test set statistics.
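The abstract does not spell out how the Ward’s clustering data split works. As a minimal sketch, assuming each compound is represented by a numeric descriptor vector, the Python below groups compounds with Ward’s minimum-variance criterion and then draws test compounds from every cluster, so that the test set spans the same descriptor space as the training set. The helper names `ward_clusters` and `ward_split`, the parameter `every` and the per-cluster sampling rule are hypothetical illustrations, not the authors’ procedure.

```python
from typing import List, Sequence, Tuple


def ward_clusters(points: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Agglomerative clustering with Ward's minimum-variance criterion.

    Starts with one cluster per point and repeatedly merges the pair whose
    merge gives the smallest increase in within-cluster sum of squares,
    until k clusters remain. Returns lists of point indices.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    clusters: List[List[int]] = [[i] for i in range(len(points))]
    centroids: List[List[float]] = [list(map(float, p)) for p in points]

    def merge_cost(a: int, b: int) -> float:
        # Ward's criterion: n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2
        na, nb = len(clusters[a]), len(clusters[b])
        sq = sum((x - y) ** 2 for x, y in zip(centroids[a], centroids[b]))
        return na * nb / (na + nb) * sq

    while len(clusters) > k:
        # Exhaustive search over all cluster pairs; fine for a few hundred compounds.
        _, a, b = min(
            (merge_cost(i, j), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        na, nb = len(clusters[a]), len(clusters[b])
        # Size-weighted mean gives the centroid of the merged cluster.
        centroids[a] = [(na * x + nb * y) / (na + nb)
                        for x, y in zip(centroids[a], centroids[b])]
        clusters[a].extend(clusters[b])
        del clusters[b], centroids[b]  # b > a, so index a stays valid
    return clusters


def ward_split(points: Sequence[Sequence[float]], k: int,
               every: int = 3) -> Tuple[List[int], List[int]]:
    """Hypothetical cluster-based split: within each Ward cluster, every
    `every`-th member (by index) goes to the test set, the rest to training."""
    train: List[int] = []
    test: List[int] = []
    for members in ward_clusters(points, k):
        for pos, idx in enumerate(sorted(members)):
            (test if pos % every == every - 1 else train).append(idx)
    return sorted(train), sorted(test)


# Two well-separated groups of toy 2-D "descriptors".
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
train_idx, test_idx = ward_split(pts, k=2)
print(train_idx, test_idx)  # [0, 1, 3, 4] [2, 5]
```

Sampling within each cluster, rather than splitting at random, is one way to keep both sets representative of the same regions of descriptor space. With `every >= 2`, singleton clusters (possible outliers) stay in the training set.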
Supplementary data
To view the supplementary data that accompany this paper, please visit the journal website at: www.tandfonline.com/doi/full/10.4155/BIO.15.1
Financial & competing interests disclosure
This research was funded by NIH grant 1R01GM087714. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.
No writing assistance was utilized in the production of this manuscript.