Research Article

XGBoost odor prediction model: finding the structure-odor relationship of odorant molecules using the extreme gradient boosting algorithm

Received 19 Jan 2023, Accepted 07 Sep 2023, Published online: 18 Sep 2023
 

Abstract

Determining the structure-odor relationship remains a challenging task. The main difficulty in correlating a molecule's structure with its associated odor is the ambiguity of verbally defined odor descriptors, particularly when the odorant molecules come from different sources. With recent developments in machine learning (ML), ML and data-analytic techniques are increasingly used for quantitative structure-activity relationship (QSAR) modeling in chemistry, supporting knowledge discovery where traditional Edisonian methods have not been useful. Predicting the smell of odorant molecules is one such task, as olfaction is among the least understood of the senses. In this study, an XGBoost odor prediction model was developed to classify the smells of odorant molecules from their SMILES strings. We first collected a dataset of 1278 odorant molecules annotated with seven basic odor descriptors and then calculated 1875 physicochemical properties of the odorant molecules. To retain the relevant physicochemical features, principal component analysis (PCA) was applied for feature reduction. When tested on an independent test dataset, the model predicted all seven basic smells with high precision (>99%) and high sensitivity (>99%). The results were also compared with those of three recent studies; the comparison indicates that the XGBoost–PCA model performed better than the other models at predicting common odor descriptors. The methodology and ML model developed in this study may be helpful in understanding the structure-odor relationship.
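
The precision and sensitivity figures quoted in the abstract are per-descriptor metrics. As a minimal sketch of how such values are computed, and not the authors' evaluation code, the following assumes each molecule carries a single odor label; the function name `per_label_precision_sensitivity` and the toy labels are hypothetical and are not taken from the study's dataset.

```python
from collections import Counter


def per_label_precision_sensitivity(y_true, y_pred):
    """Return {label: (precision, sensitivity)} for single-label predictions.

    precision   = TP / (TP + FP)
    sensitivity = TP / (TP + FN)   (also called recall)
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label was wrong for this sample
            fn[truth] += 1  # true label was missed for this sample

    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        sensitivity = tp[label] / actual if actual else 0.0
        scores[label] = (precision, sensitivity)
    return scores


# Toy labels invented for illustration only (not data from the study).
y_true = ["fruity", "fruity", "sweet", "woody", "sweet", "woody"]
y_pred = ["fruity", "sweet", "sweet", "woody", "sweet", "woody"]

scores = per_label_precision_sensitivity(y_true, y_pred)
for label, (precision, sensitivity) in scores.items():
    print(f"{label}: precision={precision:.2f}, sensitivity={sensitivity:.2f}")
```

In this toy case one "fruity" molecule is misclassified as "sweet", which lowers the sensitivity for "fruity" and the precision for "sweet" while leaving "woody" unaffected. If the study's labels are multi-label rather than single-label, the same formulas would be applied to each descriptor as a separate binary problem.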

Communicated by Ramaswamy H. Sarma

Acknowledgments

The authors acknowledge the Department of Bioinformatics & Applied Sciences, Indian Institute of Information Technology, Allahabad for providing a computing facility.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The source code for the XGBoost odor prediction model is available at https://github.com/pbi2015002/XGBoost_Odor-Prediction_model, and the dataset related to the study is provided in the Supplementary Material files.

Additional information

Funding

The author(s) reported there is no funding associated with the work featured in this article.
