Abstract
Per- and polyfluoroalkyl substances (PFASs), a class of persistent organic pollutants, have immunosuppressive effects, and the evaluation of these effects has been a focus of regulatory toxicology. In this investigation, 146 PFASs (immunosuppressive or nonimmunosuppressive) and their corresponding concentration gradients were collected from the literature, and their structures were characterized using Dragon descriptors. Feature importance analysis and stepwise feature elimination were used for feature selection. Three machine learning (ML) methods, namely Random Forest (RF), Extreme Gradient Boosting Machine (XGB), and Categorical Boosting Machine (CB), were used for model development. Model interpretability was explored by feature importance analysis and correlation analysis. All three models exhibited excellent performance; the best-performing model, RF, achieved an average AUC of 0.9720 on the testing set. Feature importance analysis showed that concentration, SpPosA_X, IVDE, R2s, and SIC2 were the crucial features. Applicability domain analysis was also performed to determine the boundaries within which the model's predictions are reliable. In conclusion, this study is the first application of ML models to investigate the immunosuppressive activity of PFASs. The variables used in the models can help elucidate the mechanism of this activity, allow researchers to assess the immunosuppressive potential of large numbers of PFASs more efficiently, and thus better guide environmental and health risk assessment.
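For readers unfamiliar with the reported metric, the following is a minimal sketch of how an AUC averaged over several test splits could be computed. It is not the authors' code: the function names `roc_auc` and `mean_auc` are hypothetical, the label coding (1 = immunosuppressive, 0 = nonimmunosuppressive) is an assumption, and the abstract does not state how the averaging was done, so averaging over repeated test splits is also an assumption.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for the positive class (assumed here: immunosuppressive), 0 otherwise.
    scores: predicted score or probability for the positive class.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")

    # Sort sample indices by score and assign 1-based ranks, averaging over ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic of the positive class, normalised by the number of pos/neg pairs.
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def mean_auc(runs: Sequence[Tuple[Sequence[int], Sequence[float]]]) -> float:
    """Average AUC over several (labels, scores) test splits."""
    aucs = [roc_auc(y, s) for y, s in runs]
    return sum(aucs) / len(aucs)
```

An AUC of 1.0 means every immunosuppressive sample is scored above every nonimmunosuppressive one, while 0.5 corresponds to random ranking. In practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead of a hand-written function.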
Acknowledgments
This work was supported by the Key R&D and Promotion Project of Henan Province, China (Grant No. 232102311047) and the Key Scientific Research Project of Colleges and Universities in Henan Province (Grant No. 23B330002).
CRediT authorship contribution statement
Yuxin Xuan: data analysis and writing. Yulu Wang: data download and analysis. Rui Li: data download and analysis. Yuyan Zhong: data pre-processing. Na Wang: manuscript editing. Lingyin Zhang: data pre-processing. Qian Chen: visualization. Shuling Yu: review. Jintao Yuan: visualization, review, editing, and supervision.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Data availability statement
The data used in this work and the developed models are freely available in the Supplementary materials.