Abstract
This paper presents a novel method, Area and Feature Guided Regularised Random Forest (AFGRRF), for modelling binary geographic phenomena (occurrence versus absence). AFGRRF is a wrapper feature-selection method based on an earlier modification of Random Forest (RF), namely the Guided Regularised Random Forest (GRRF). AFGRRF produces maps that minimise the affected area without a significant difference in accuracy. To do so, it tunes the GRRF hyper-parameters according to a trade-off between the True Positive Rate and the affected area (Success Rate). AFGRRF also addresses the ‘Rashomon effect’, that is, the multiplicity of good models. The proposed method was tested by modelling illegal landfills on Gran Canaria Island (Spain). AFGRRF performance was compared with that of other RF-based methods: (i) standard RF; (ii) Area Random Forest (ARF); (iii) Feature Random Forest (FRF); (iv) Area Feature Random Forest (AFRF) and (v) GRRF. AFGRRF predicted the smallest affected area, 19% of the island, at a similar True Positive Rate. This percentage is substantially smaller than those predicted by RF (27.43%), ARF (26%), FRF (27.78%), AFRF (23%) and GRRF (29.67%).
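The trade-off between True Positive Rate and affected area can be illustrated with a minimal Python sketch. It is not the authors' R implementation: the function names, the 0.5 probability threshold, the fixed tolerance on True Positive Rate (standing in for the paper's significance criterion) and the toy scores are all assumptions made for illustration only.

```python
from typing import Dict, List, Tuple


def true_positive_rate(scores: List[float], labels: List[int],
                       threshold: float = 0.5) -> float:
    """Share of known occurrence cells (label 1) predicted as occurrence."""
    hits = [s >= threshold for s, y in zip(scores, labels) if y == 1]
    if not hits:
        raise ValueError("labels contain no occurrence cells")
    return sum(hits) / len(hits)


def affected_area_fraction(scores: List[float],
                           threshold: float = 0.5) -> float:
    """Share of all map cells predicted as occurrence."""
    if not scores:
        raise ValueError("empty map")
    return sum(s >= threshold for s in scores) / len(scores)


def pick_smallest_area(candidates: Dict[str, List[float]], labels: List[int],
                       tpr_tolerance: float = 0.05, threshold: float = 0.5
                       ) -> Tuple[str, Dict[str, Tuple[float, float]]]:
    """Keep candidates whose TPR is within `tpr_tolerance` of the best TPR,
    then return the one with the smallest affected area.

    The fixed tolerance is a hypothetical stand-in for a statistical test.
    """
    summary = {
        name: (true_positive_rate(scores, labels, threshold),
               affected_area_fraction(scores, threshold))
        for name, scores in candidates.items()
    }
    best_tpr = max(tpr for tpr, _ in summary.values())
    eligible = [n for n, (tpr, _) in summary.items()
                if tpr >= best_tpr - tpr_tolerance]
    chosen = min(eligible, key=lambda n: summary[n][1])
    return chosen, summary


# Toy map of 10 cells; the first four are known occurrences (made-up data).
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_candidates = {
    "model_a": [0.9, 0.8, 0.7, 0.6, 0.7, 0.6, 0.55, 0.2, 0.1, 0.1],
    "model_b": [0.9, 0.8, 0.7, 0.6, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1],
    "model_c": [0.9, 0.8, 0.4, 0.3, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1],
}

chosen, summary = pick_smallest_area(toy_candidates, toy_labels)
for name, (tpr, area) in summary.items():
    print(f"{name}: TPR={tpr:.2f}, affected area={area:.2f}")
print("selected:", chosen)
```

In this toy case, model_c flags the smallest area but misses half of the known occurrences, so it is excluded. Of the two candidates with equal True Positive Rate, the one that flags less of the map is selected.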
Disclosure statement
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Data and codes availability statement
The data and codes that support the findings of this study are available at https://github.com/AFGRRF/Area-Feature-Guide-Regularised-Random-Forest. The proposed AFGRRF code requires the following R libraries: RRF, Raster, Rgdal, and ROC, which were written by others who are not affiliated with this research.
Additional information
Funding
Notes on contributors
Lorenzo Carlos Quesada-Ruiz
Lorenzo Carlos Quesada-Ruiz received the B.Sc. degree in geography from the University of Las Palmas de Gran Canaria, the M.Sc. degree in Geographical Information Sciences and Remote Sensing from the University of Zaragoza, and the Ph.D. degree in Geography from the University of Seville. He holds a Margarita Salas postdoctoral fellowship at the University of Seville. His current research focuses on spatial analysis applied to environmental problems.
Victor Francisco Rodriguez-Galiano
Victor Francisco Rodriguez-Galiano received the B.Sc. degree in environmental sciences, the M.Eng. degree in geodesy and cartography, and the Ph.D. degree in remote sensing from the University of Granada, Spain. He is an associate professor at the Department of Geography of the University of Seville. His current research focuses on machine learning for addressing environmental problems, and on satellite-derived land surface phenology and its validation with ground data.
Raúl Zurita-Milla
Raúl Zurita-Milla received the Agricultural Engineering degree from the University of Cordoba (Spain), and the M.Sc. and Ph.D. degrees in Geo-information Science and Earth Observation from Wageningen University. He is a full professor and head of the Geo-Information Department at the Faculty ITC of the University of Twente. His current research focuses on the use of data-driven approaches for modelling seasonal processes.
Emma Izquierdo-Verdiguier
Emma Izquierdo-Verdiguier received the B.Sc. degree in physics and the M.Sc. and Ph.D. degrees in remote sensing from the University of Valencia, Spain. She is an assistant postdoc at BOKU and a Google Developer Expert. Her research interests are the use of machine learning for Earth Observation data analysis and cloud computing environments for land surface monitoring.