Research Article

Consequences of spatial structure in soil–geomorphic data on the results of machine learning models

Article: 2245381 | Received 26 Apr 2023, Accepted 02 Aug 2023, Published online: 16 Aug 2023

References

  • Abu-Mostafa YS, Magdon-Ismail M, Lin HT. 2012. Learning from data. United States: AMLBook.
  • Angelini ME, Heuvelink GBM. 2018. Including spatial correlation in structural equation modelling of soil properties. Spatial Stat. 25:35–51. doi: 10.1016/j.spasta.2018.04.003.
  • Anselin L. 2002. Under the hood: issues in the specification and interpretation of spatial regression models. Agric Econ. 27(3):247–267. doi: 10.1111/j.1574-0862.2002.tb00120.x.
  • Behrens T, Schmidt K, MacMillan RA, Viscarra Rossel RA. 2018a. Multi-scale digital soil mapping with deep learning. Sci Rep. 8(1):15244. doi: 10.1038/s41598-018-33516-6.
  • Behrens T, Schmidt K, Viscarra Rossel RA, Gries P, Scholten T, MacMillan R. 2018b. Spatial modelling with Euclidean distance fields and machine learning. Eur J Soil Sci. 69(5):757–770. doi: 10.1111/ejss.12687.
  • Behrens T, Schmidt K, MacMillan RA, Viscarra Rossel RA. 2018c. Multiscale contextual spatial modelling with the Gaussian scale space. Geoderma. 310:128–137. doi: 10.1016/j.geoderma.2017.09.015.
  • Bel L, Laurent JM, Bar-Hen A, Allard D, Cheddadi R. 2005. A spatial extension of CART: application to classification of ecological data. In: Renard P, Demougeot-Renard H, Froidevaux R, editors. Geostatistics for environmental applications. Berlin: Springer; p. 99–109.
  • Bischl B, Lang M, Kotthoff L, Schiffner J, Richter J, Studerus E, Casalicchio G, Jones ZM. 2016. mlr: machine learning in R. J Mach Learn Res. 17:1–5.
  • Bivand RS, Pebesma E, Gómez-Rubio V. 2013. Applied spatial data analysis with R. New York, NY: Springer.
  • Breiman L. 2001. Random forests. Mach Learn. 45(1):5–32. doi: 10.1023/A:1010933404324.
  • Brungard CW, Boettinger JL, Duniway MC, Wills SA, Edwards TC. 2015. Machine learning for predicting soil classes in three arid landscapes. Geoderma. 239–240:68–83. doi: 10.1016/j.geoderma.2014.09.019.
  • Burrough PA. 2001. GIS and geostatistics: essential partners for spatial analysis. Environ Ecol Stat. 8(4):361–377. doi: 10.1023/A:1012734519752.
  • Chang CC, Lin CJ. 2011. LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol. 2(3):1–27. doi: 10.1145/1961189.1961199.
  • Currit N. 2002. Inductive regression: overcoming OLS limitations with the general regression neural network. Comput Environ Urban Syst. 26(4):335–353. doi: 10.1016/S0198-9715(01)00045-X.
  • Dormann CF, Elith J, Bacher S, Buchmann C, Carl G, Carré G, Marquéz JRG, Gruber B, Lafourcade B, Leitão PJ, et al. 2013. Collinearity: a review of methods to deal with it and a simulation study evaluating their performance. Ecography. 36(1):27–46. doi: 10.1111/j.1600-0587.2012.07348.x.
  • Drucker H, Burges CJC, Kaufman L, Smola A, Vapnik V. 1997. Support vector regression machines. Adv Neural Inf Process Syst. 1:155–161.
  • Ferraciolli MA, Bocca FF, Rodrigues LHA. 2019. Neglecting spatial autocorrelation causes underestimation of the error of sugarcane yield models. Comput Electron Agric. 161:233–240. doi: 10.1016/j.compag.2018.09.003.
  • Feurer M, Hutter F. 2019. Hyperparameter optimization. In: Hutter F, Kotthoff L, Vanschoren J, editors. Automated machine learning: methods, systems, challenges. The Springer series on challenges in machine learning. Cham: Springer International Publishing; p. 3–33. doi: 10.1007/978-3-030-05318-5_1.
  • Fisher A, Rudin C, Dominici F. 2019. All models are wrong, but many are useful: learning a variable’s importance by studying an entire class of prediction models simultaneously. J Mach Learn Res. 20:1–81.
  • Gaspard G, Kim D, Chun Y. 2019. Residual spatial autocorrelation in macroecological and biogeographical modeling: a review. J Ecol Environ. 43:19. doi: 10.1186/s41610-019-0118-3.
  • Georganos S, Grippa T, Niang Gadiaga A, Linard C, Lennert M, Vanhuysse S, Mboga N, Wolff E, Kalogirou S. 2021. Geographical random forests: a spatial extension of the random forest algorithm to address spatial heterogeneity in remote sensing and population modelling. Geocarto Int. 36(2):121–136. doi: 10.1080/10106049.2019.1595177.
  • Getis A, Griffith DA. 2002. Comparative spatial filtering in regression analysis. Geogr Anal. 34(2):130–140. doi: 10.1111/j.1538-4632.2002.tb01080.x.
  • Grekousis G. 2019. Artificial neural networks and deep learning in urban geography: a systematic review and meta-analysis. Comput Environ Urban Syst. 74:244–256. doi: 10.1016/j.compenvurbsys.2018.10.008.
  • Griffith DA. 2000. A linear regression solution to the spatial autocorrelation problem. J Geograph Syst. 2(2):141–156. doi: 10.1007/PL00011451.
  • Griffith DA. 1996. Spatial autocorrelation and eigenfunctions of the geographic weights matrix accompanying geo-referenced data. Canadian Geographer. 40(4):351–367. doi: 10.1111/j.1541-0064.1996.tb00462.x.
  • Griffith DA. 2003. Spatial autocorrelation and spatial filtering: gaining understanding through theory and scientific visualization. Berlin: Springer-Verlag.
  • Grömping U. 2009. Variable importance assessment in regression: linear regression versus random forest. Am Stat. 63(4):308–319. doi: 10.1198/tast.2009.08199.
  • Guo Z, Adhikari K, Chellasamy M, Greve MB, Owens PR, Greve MH. 2019. Selection of terrain attributes and its scale dependency on soil organic carbon prediction. Geoderma. 340:303–312. doi: 10.1016/j.geoderma.2019.01.023.
  • Gupta S, Lehmann P, Bonetti S, Papritz A, Or D. 2021. Global prediction of soil saturated hydraulic conductivity using random forest in a covariate-based geotransfer function (CoGTF) framework. J Adv Model Earth Syst. 13(4):e2020MS002242. doi: 10.1029/2020MS002242.
  • Guyon I, Weston J, Barnhill S, Vapnik V. 2002. Gene selection for cancer classification using support vector machines. Mach Learn. 46(1/3):389–422. doi: 10.1023/A:1012487302797.
  • Head ML, Holman L, Lanfear R, Kahn AT, Jennions MD. 2015. The extent and consequences of p-hacking in science. PLoS Biol. 13(3):e1002106. doi: 10.1371/journal.pbio.1002106.
  • Hengl T, Nussbaum M, Wright MN, Heuvelink GBM, Gräler B. 2018. Random forest as a generic framework for predictive modeling of spatial and spatio-temporal variables. PeerJ. 6:e5518. doi: 10.7717/peerj.5518.
  • Heung B, Ho HC, Zhang J, Knudby A, Bulmer CE, Schmidt MG. 2016. An overview and comparison of machine-learning techniques for classification purposes in digital soil mapping. Geoderma. 265:62–77. doi: 10.1016/j.geoderma.2015.11.014.
  • Heuvelink GBM, Webster R. 2022. Spatial statistics and soil mapping: a blossoming partnership under pressure. Spatial Stat. 50:100639. doi: 10.1016/j.spasta.2022.100639.
  • Jiang Z, Li Y, Shekhar S, Rampi L, Knight J. 2017. Spatial ensemble learning for heterogeneous geographic data with class ambiguity: a summary of results. Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems 23. p. 1–10. doi: 10.1145/3139958.3140044.
  • Kanevski M, Timonin V, Pozdnukhov A. 2009. Machine learning for spatial environmental data: theory, applications, and software. Boca Raton: CRC Press LLC.
  • Kaya F, Mishra G, Francaviglia R, Keshavarzi A. 2023. Combining digital covariates and machine learning models to predict the spatial variation of soil cation exchange capacity. Land. 12(4):819. doi: 10.3390/land12040819.
  • Khaledian Y, Miller BA. 2020. Selecting appropriate machine learning methods for digital soil mapping. Appl Math Modell. 81:401–418. doi: 10.1016/j.apm.2019.12.016.
  • Kim D. 2021. Predicting the magnitude of residual spatial autocorrelation in geographical ecology. Ecography. 44(7):1121–1130. doi: 10.1111/ecog.05403.
  • Kim D, Song I. 2021. Predicting model improvement by accounting for spatial autocorrelation: a socio-economic perspective. Prof Geogr. 73(1):131–149. doi: 10.1080/00330124.2020.1812408.
  • Kim D, Hirmas DR, McEwan RW, Mueller TG, Park SJ, Šamonil P, Thompson JA, Wendroth O. 2016. Predicting the influence of multi-scale spatial autocorrelation on soil–landform modeling. Soil Sci Soc Am J. 80(2):409–419. doi: 10.2136/sssaj2015.10.0370.
  • Kim D, Šamonil P, Jeong G, Tejnecký V, Drábek O, Hruška J, Park SJ. 2019. Incorporation of spatial autocorrelation improves soil–landform modeling at A and B horizons. Catena. 183:104226. doi: 10.1016/j.catena.2019.104226.
  • Kühn I, Dormann CF. 2012. Less than eight (and a half) misconceptions of spatial analysis. J Biogeogr. 39(5):995–998. doi: 10.1111/j.1365-2699.2012.02707.x.
  • Kuhn M, Johnson K. 2013. Applied predictive modeling. New York, NY: Springer.
  • Lark RM. 2012. Towards soil geostatistics. Spatial Stat. 1:92–99. doi: 10.1016/j.spasta.2012.02.001.
  • Legendre P. 1993. Spatial autocorrelation: trouble or new paradigm? Ecology. 74(6):1659–1673. doi: 10.2307/1939924.
  • Makungwe M, Chabala LM, Chishala BH, Lark RM. 2021. Performance of linear mixed models and random forests for spatial prediction of soil pH. Geoderma. 397:115079. doi: 10.1016/j.geoderma.2021.115079.
  • Ma Y, Minasny B, Malone BP, McBratney AB. 2019. Pedology and digital soil mapping (DSM). Eur J Soil Sci. 70(2):216–235. doi: 10.1111/ejss.12790.
  • McBratney AB, Santos MLM, Minasny B. 2003. On digital soil mapping. Geoderma. 117(1–2):3–52. doi: 10.1016/S0016-7061(03)00223-4.
  • Miralha L, Kim D. 2018. Accounting for and predicting the influence of spatial autocorrelation in water quality modeling. ISPRS Int J Geo-Inf. 7(2):64. doi: 10.3390/ijgi7020064.
  • Molnar C. 2022. Interpretable machine learning. Victoria, BC: Leanpub.
  • Möller M, Zepp S, Wiesmeier M, Gerighausen H, Heiden U. 2022. Scale-specific prediction of topsoil organic carbon contents using terrain attributes and SCMaP soil reflectance composites. Remote Sens. 14(10):2295. doi: 10.3390/rs14102295.
  • Montgomery DC, Peck EA, Vining GG. 2012. Introduction to linear regression analysis. Hoboken, NJ: Wiley.
  • Nussbaum M, Spiess K, Baltensweiler A, Grob U, Keller A, Greiner L, Schaepman ME, Papritz A. 2018. Evaluation of digital soil mapping approaches with large sets of environmental covariates. Soil. 4(1):1–22. doi: 10.5194/soil-4-1-2018.
  • Nikparvar B, Thill JC. 2021. Machine learning of spatial data. ISPRS Int J Geo-Inf. 10(9):600. doi: 10.3390/ijgi10090600.
  • O’Callaghan JF, Mark DM. 1984. The extraction of drainage networks from digital elevation data. Comput Vision Graphics Image Process. 28(3):323–344. doi: 10.1016/S0734-189X(84)80011-0.
  • R Core Team. 2021. R: the R Project for statistical computing.
  • Schölkopf B, Smola AJ, Williamson RC, Bartlett PL. 2000. New support vector algorithms. Neural Comput. 12(5):1207–1245. doi: 10.1162/089976600300015565.
  • Schölkopf B, Smola AJ. 2001. Learning with kernels: support vector machines, regularization, optimization, and beyond. Cambridge, MA: MIT Press.
  • Schratz P, Muenchow J, Iturritxa E, Richter J, Brenning A. 2019. Performance evaluation and hyperparameter tuning of statistical and machine-learning models using spatial data. Ecol Modell. 406:109–120. doi: 10.1016/j.ecolmodel.2019.06.002.
  • Sekulić A, Kilibarda M, Heuvelink GBM, Nikolić M, Bajat B. 2020. Random forest spatial interpolation. Remote Sens. 12(10):1687. doi: 10.3390/rs12101687.
  • Sergeev AP, Buevich AG, Baglaeva EM, Shichkin AV. 2019. Combining spatial autocorrelation with machine learning increases prediction accuracy of soil heavy metals. Catena. 174:425–435. doi: 10.1016/j.catena.2018.11.037.
  • Shukla G, Garg RD, Srivastava HS, Garg PK. 2018. An effective implementation and assessment of a random forest classifier as a soil spatial predictive model. Int J Remote Sens. 39(8):2637–2669. doi: 10.1080/01431161.2018.1430399.
  • Sinha P, Gaughan AE, Stevens FR, Nieves JJ, Sorichetta A, Tatem AJ. 2019. Assessing the spatial sensitivity of a random forest model: application in gridded population modeling. Comput Environ Urban Syst. 75:132–145. doi: 10.1016/j.compenvurbsys.2019.01.006.
  • Svetnik V, Liaw A, Tong C, Wang T. 2004. Application of Breiman’s random forest to modeling structure-activity relationships of pharmaceutical molecules. In: Roli F, Kittler J, Windeatt T, editors. Multiple classifier systems. Lecture notes in computer science. Berlin, Heidelberg: Springer; p. 334–343. doi: 10.1007/978-3-540-25966-4_33.
  • Tiefelsdorf M, Griffith DA. 2007. Semiparametric filtering of spatial autocorrelation: the eigenvector approach. Environ Plan A. 39(5):1193–1221. doi: 10.1068/a37378.
  • van der Westhuizen S, Heuvelink GBM, Hofmeyr DP, Poggio L. 2022. Measurement error-filtered machine learning in digital soil mapping. Spatial Stat. 47:100572. doi: 10.1016/j.spasta.2021.100572.
  • Venables WN, Ripley BD. 2002. Modern applied statistics with S. New York, NY: Springer.
  • Wadoux AMJC. 2019. Using deep learning for multivariate mapping of soil with quantified uncertainty. Geoderma. 351:59–70. doi: 10.1016/j.geoderma.2019.05.012.
  • Wadoux AMJC, Minasny B, McBratney AB. 2020. Machine learning for digital soil mapping: applications, challenges and suggested solutions. Earth-Sci Rev. 210:103359. doi: 10.1016/j.earscirev.2020.103359.
  • Wadoux AMJC, Padarian J, Minasny B. 2019. Multi-source data integration for soil mapping using deep learning. Soil. 5(1):107–119. doi: 10.5194/soil-5-107-2019.
  • Wright MN, Ziegler A. 2017. ranger: a fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw. 77(1):1–17. doi: 10.18637/jss.v077.i01.
  • Zevenbergen LW, Thorne CR. 1987. Quantitative analysis of land surface topography. Earth Surf Process Landforms. 12(1):47–56. doi: 10.1002/esp.3290120107.