ABSTRACT
This study compares the performance of gradient boosting decision trees (GBDT), artificial neural networks (ANNs), and random forests (RF) in modeling land use change (LUC) in the Seoul metropolitan area (SMA). GBDT and RF showed higher predictive power than ANN, indicating that tree-based ensemble methods are effective for LUC prediction. Beyond their predictive performance, the decision tree (DT)-based ensemble models offer insight into which factors drive LUCs in complex urban dynamics, through the relative importance and nonlinear marginal effects of the predictor variables. The GBDT results indicate that distance to existing residential sites contributes most to urban land use conversion (30.4% of the relative importance), while proximity to industrial and public sites were the other significant predictors (a combined 32.3% of the relative importance). New residential development tends to occur adjacent to existing residential sites, whereas nonresidential development occurs at some distance (about 600 m) from them. Distance to the central business district (CBD) had increasing marginal effects on residential land use conversion but showed no significant pattern for nonresidential land use conversion, indicating that Seoul has experienced more population suburbanization than employment decentralization.
Data and codes availability statement
The code and data that support the findings of the present study are available on Figshare (DOI: 10.6084/m9.figshare.12749813).
Disclosure statement
No potential conflict of interest was reported by the author(s).
Notes
1. The SMA, the capital region of South Korea, was selected as the case study area because it has experienced massive and complex LUCs over the past few decades through the suburbanization and decentralization of population and employment, making it a good test bed for ML-based LUC modeling. The SMA is one of the largest and densest metropolitan areas in the world in terms of population, which increased roughly five-fold from 5.2 million in 1960 to 25.3 million in 2015 (Korean Statistical Information Service: http://kosis.kr).
2. LIMS is the parcel boundary map provided by the National Spatial Data Infrastructure Portal. The data are downloadable at http://data.nsdi.go.kr/dataset/12771.
3. The dataset was randomly split 70%:30% into training and test datasets, and Bayesian optimization of the hyperparameters was conducted for each selected ML model on the training dataset. The optimal hyperparameters for the GBDT were learning rate = 0.1, number of iterations = 1415, maximum depth = 8, minimum sample split = 10, minimum sample leaf = 10, and subsample = 0.9, while those for the ANN were learning rate = 0.000392, num_dense_layers = 4, num_dense_nodes = 512, dropout = 0.2, epoch = 200, batch_size = 126, and activation = ‘relu’. The optimal hyperparameters for the RF were number of iterations = 2000, maximum depth = 40, minimum sample split = 10, minimum sample leaf = 10, and max_features = 8.
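For readers who want to reproduce the tree-based configurations in Note 3, the following is a minimal sketch of how the reported GBDT and RF hyperparameters and the 70%:30% split might be expressed. It assumes scikit-learn and a classification formulation of the LUC problem; neither is stated in the notes. The seed value, the helper names (build_gbdt, build_rf, split_70_30), and the synthetic data are hypothetical and for illustration only. The study's own code is the Figshare deposit, not this sketch.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

SEED = 0  # hypothetical: Note 3 does not report a random seed


def build_gbdt():
    """Unfitted GBDT using the hyperparameter values reported in Note 3."""
    return GradientBoostingClassifier(
        learning_rate=0.1,
        n_estimators=1415,      # "number of iterations"
        max_depth=8,
        min_samples_split=10,
        min_samples_leaf=10,
        subsample=0.9,
        random_state=SEED,
    )


def build_rf():
    """Unfitted RF using the hyperparameter values reported in Note 3."""
    return RandomForestClassifier(
        n_estimators=2000,      # "number of iterations"
        max_depth=40,
        min_samples_split=10,
        min_samples_leaf=10,
        max_features=8,         # requires at least 8 predictor variables
        n_jobs=-1,
        random_state=SEED,
    )


def split_70_30(X, y):
    """Random 70%:30% split into training and test datasets."""
    return train_test_split(X, y, test_size=0.30, random_state=SEED)


if __name__ == "__main__":
    # Synthetic stand-in data; the real predictors are parcel-level variables.
    X, y = make_classification(
        n_samples=1000, n_features=10, n_informative=6, random_state=SEED
    )
    X_train, X_test, y_train, y_test = split_70_30(X, y)

    # Fewer trees than reported, only so that this demo finishes quickly.
    rf = build_rf().set_params(n_estimators=50)
    rf.fit(X_train, y_train)
    print("RF test accuracy:", rf.score(X_test, y_test))
    print("RF relative importances:", rf.feature_importances_)
```

The ANN settings are omitted here because mapping them onto a specific deep learning framework would require further assumptions about the network architecture.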
Additional information
Notes on contributors
Myung-Jin Jun
Myung-Jin Jun is a professor in the Department of Urban Planning and Real Estate at Chung-Ang University, South Korea. His research interests include urban modelling, urban big data analysis, and machine learning, and their applications in urban sciences.