ABSTRACT
Quantitative structure–property relationship (QSPR) modelling is used in many scientific fields and has been applied extensively in environmental research to predict the physicochemical properties of compounds with potential environmental impact. The soil sorption coefficient (Koc) is an important parameter in environmental risk assessment because it helps to determine the final fate of substances in the environment. In recent years, several QSPR models have been developed to estimate this coefficient. In this study, several QSPR models were generated and evaluated for the prediction of log Koc from its relationship with log P. The models were obtained from an extensive and diverse training set (n = 639) and from subsets of this set (i.e. halves, fourths and eighths). The aim was to investigate whether the size of the training set affects the statistical quality of the resulting models. In addition, the models obtained from the smaller sets were tested for statistical equivalence with the model obtained from the full training set. The results confirmed this equivalence, indicating that smaller training sets can be used without compromising statistical quality or predictive capability, as long as most chemical classes in the test set are represented in the training set.
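The design summarised above can be sketched roughly as follows, assuming a simple one-descriptor linear model of the form log Koc = a · log P + b fitted by ordinary least squares. The data here are synthetic placeholders generated with arbitrary coefficients and noise, not the study's data set, and the helper names (`fit_line`, `split_into_subsets`) are hypothetical. The sketch only compares fitted coefficients across subset sizes; it does not reproduce the statistical equivalence test or the QSARINS workflow used by the authors.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r2).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2


def split_into_subsets(pairs, k):
    """Split a list into k non-overlapping subsets of near-equal size."""
    return [pairs[i::k] for i in range(k)]


# Synthetic placeholder data (ASSUMPTION): the slope, intercept and noise
# level are arbitrary illustrative values, not results from the study.
rng = random.Random(0)
ASSUMED_SLOPE, ASSUMED_INTERCEPT, ASSUMED_NOISE = 0.5, 1.0, 0.4
log_p = [rng.uniform(-1.0, 7.0) for _ in range(639)]
log_koc = [ASSUMED_SLOPE * x + ASSUMED_INTERCEPT + rng.gauss(0.0, ASSUMED_NOISE)
           for x in log_p]
pairs = list(zip(log_p, log_koc))

# Fit one model per subset for the full set, halves, fourths and eighths.
results = {}
for k in (1, 2, 4, 8):
    fits = []
    for subset in split_into_subsets(pairs, k):
        xs, ys = zip(*subset)
        fits.append((len(subset),) + fit_line(xs, ys))
    results[k] = fits

for k, fits in results.items():
    for n, slope, intercept, r2 in fits:
        print(f"1/{k} of set (n = {n}): slope = {slope:.3f}, "
              f"intercept = {intercept:.3f}, R2 = {r2:.3f}")
```

With real data, the subsets would also need to preserve the chemical-class coverage that the abstract identifies as the condition for smaller training sets to remain reliable; the plain strided split used here does not guarantee that.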
Acknowledgements
The authors thank the National Council for Scientific and Technological Development/Ministry of Science and Technology of Brazil (CNPq/MCT/Brazil) for financial support, and the QSAR Research Group in Environmental Chemistry and Ecotoxicology of the Department of Theoretical and Applied Sciences of the University of Insubria (DiSTA/UNINSUBRIA) for providing the QSARINS 2.2.2 software.
Disclosure statement
The authors report no conflict of interest.
SUPPLEMENTAL DATA
Supplemental data for this article can be accessed at: https://doi.org/10.1080/1062936X.2019.1586759