Research Article

DCAI-CLUD: a data-centric framework for the construction of land-use datasets

Received 05 Jan 2024, Accepted 25 Jul 2024, Published online: 05 Aug 2024
 

Abstract

A high-quality land-use dataset is crucial for constructing a high-performance land-use classification model. Because land use is complex and spatially heterogeneous, the dataset construction process is inefficient and costly, which lowers dataset quality and, in turn, model performance. The emerging field of Data-Centric Artificial Intelligence (DCAI) is expected to deliver techniques for dataset optimization, offering a promising solution to this problem. This study therefore proposes a data-centric framework named DCAI-CLUD for the construction of land-use datasets. With this framework, the accuracy and rate of data labeling improve by 5.93% and 28.97%, respectively. The Gini index of the dataset and the proportion of samples with non-mixed land-use categories improve by 3.27% and 8.52%, respectively. The overall accuracy (OA) and Kappa of the land-use classification model improve significantly, by 27.87% and 58.08%, respectively. This study is the first to introduce DCAI into the field of geographic information and remote sensing and to verify its effectiveness. The proposed framework can effectively improve the construction efficiency and quality of the dataset and, at the same time, optimize model performance. Based on the proposed framework, we constructed a multi-source land-use dataset of major cities in China named CN-MSLU-100K.
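
For readers unfamiliar with the two model metrics reported in the abstract, the following is a minimal sketch of how overall accuracy (OA) and Cohen's Kappa are conventionally computed from a confusion matrix. It uses the standard textbook definitions and is not taken from the authors' code; the function names and the toy labels are hypothetical and serve only as an illustration.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     n_classes: int) -> List[List[int]]:
    """Count samples by (true class, predicted class); rows are true classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def overall_accuracy(matrix: List[List[int]]) -> float:
    """OA: share of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix: List[List[int]]) -> float:
    """Kappa: agreement beyond chance, (p_o - p_e) / (1 - p_e).

    Assumes p_e < 1, i.e. the labels are not all in a single class
    for both the reference and the prediction.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    p_o = overall_accuracy(matrix)
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    p_e = sum(
        sum(matrix[i]) * sum(matrix[r][i] for r in range(n))
        for i in range(n)
    ) / (total ** 2)
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical labels for three land-use classes (0, 1, 2).
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 0, 1, 2, 2, 2]
    m = confusion_matrix(y_true, y_pred, n_classes=3)
    print("OA:", overall_accuracy(m))
    print("Kappa:", cohen_kappa(m))
```

Kappa discounts the agreement expected by chance, so on class-imbalanced land-use data it can change by a much larger margin than OA does.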

HIGHLIGHTS

  1. A framework for optimizing the land-use dataset construction process is proposed.

  2. Filtering and pre-labeling improved the quality and efficiency of data labeling.

  3. The performance of the land-use classification model is enhanced by dataset optimization.

  4. Preconceived results have a subjective impact on the data labelers.

  5. This is the first study to introduce DCAI for land-use classification.

Acknowledgement

We are deeply grateful to Professor Yuan May, Dr. Andreas Züfle, and the anonymous reviewers for their constructive comments and suggestions on our paper. We also extend our sincere thanks to the young volunteers who contributed significantly to this human-computer collaboration project.

Disclosure statement

No conflict of interest exists in the submission of this manuscript, and the manuscript is approved by all authors for publication. I would like to declare on behalf of my co-authors that the work described is original research that has not been published previously and is not under consideration for publication elsewhere, in whole or in part.

Data and codes availability statement

The CN-MSLU-100K dataset cannot be shared publicly for copyright reasons. However, readers can access the dataset upon request. The CN-MSLU-DEMO dataset is publicly available at http://doi.org/10.6084/m9.figshare.24942510. We have provided a website for full data application and download at https://urbancomp.net/s/cn-mslu-100k-land-use-classification-dataset-at-block-scale-for-multi-source-spatio-temporal-dataen. The ‘human-computer collaborative’ data annotation method mentioned in Section 2.2 runs on a data annotation platform of Alibaba, so the code of the platform cannot be disclosed for copyright reasons. The rest of the code and sample data used to reproduce our work are publicly available at http://doi.org/10.6084/m9.figshare.24942510.

Additional information

Funding

This work was supported by the National Key Research and Development Program of China [2023YFB3906803]; the Alibaba Group through the Alibaba Innovation Research Program [No. 20228670]; the National Natural Science Foundation of China [42171466]; the ‘CUG Scholar’ Scientific Research Funds at China University of Geosciences (Wuhan) [2022034]; and a Guangdong-Hong Kong-Macau Joint Laboratory Program [2020B1212030009].

Notes on contributors

Hao Wu

Hao Wu obtained his master’s degree from China University of Geosciences (Wuhan). He is currently working at the State Grid Corporation of China. His research interests are geospatial big data mining and data-centric urban modeling. He contributed to the methodology, software development, writing – original draft, visualization, writing – review and editing.

Zhangwei Jiang

Zhangwei Jiang is a staff algorithm engineer at Alibaba Group. His research interests are LBS data mining and research & recommendation algorithm. He contributed to the project administration, conceptualization, data curation, investigation, methodology, writing – original draft, writing – review and editing.

Anning Dong

Anning Dong obtained his master’s degree from China University of Geosciences (Wuhan). He is currently working at the State Administration of Foreign Exchange in China. His research interests are spatiotemporal big data mining and crime geography. He contributed to the methodology, data curation, software development, validation, writing – original draft, writing – review and editing.

Ronghui Gao

Ronghui Gao is a graduate student at China University of Geosciences (Wuhan). His research interests are geospatial big data mining and the interpretability of urban models. He contributed to the methodology, validation, writing – original draft, writing – review and editing.

Xiaoqin Yan

Xiaoqin Yan is currently a Ph.D. student in GIScience at the Institute of Remote Sensing and Geographical Information Systems, Peking University, Beijing. His research interests are spatio-temporal big data computing and social sensing. He contributed to the methodology, data curation, validation, writing – original draft, writing – review and editing.

Zhihui Hu

Zhihui Hu is a graduate student at China University of Geosciences (Wuhan). His research interests are geospatial big data mining, land use classification and trajectory representation learning. He contributed to the methodology, validation, writing – original draft, writing – review and editing.

Fengling Mao

Fengling Mao is an algorithm engineer at Alibaba Group. Her research interests are trajectory pattern mining and spatiotemporal data embedding. She contributed to the methodology, data curation, validation, software development, writing – review and editing.

Hong Liu

Hong Liu is a senior staff algorithm engineer at Alibaba Group. His research interests are data mining and research & recommendation algorithm. He contributed to the conceptualization, investigation, methodology, writing – review and editing.

Pengxuan Li

Pengxuan Li is a senior staff data engineer at Alibaba Group. His research interests are data mining and data science. He contributed to the methodology, validation, software development, writing – review and editing.

Peng Luo

Peng Luo has obtained his Ph.D. from the Chair of Cartography and Visual Analytics at the Technical University of Munich, Germany. He is about to join the Senseable City Lab at the Massachusetts Institute of Technology. His research interests include spatial association modelling, social sensing, and applied artificial intelligence. He contributed to the validation, writing – original draft, writing – review and editing.

Zijin Guo

Zijin Guo has obtained his master’s degree from China University of Geosciences (Wuhan). He is currently working at the Changjiang Water Resources Commission in China. His research interests are trajectory data mining and complex network analysis. He contributed to the validation, writing – original draft, writing – review and editing.

Qingfeng Guan

Qingfeng Guan is a professor at China University of Geosciences (Wuhan). His research interests are high-performance spatial intelligence computation and urban computing. He contributed to the supervision, writing – review and editing.

Yao Yao

Yao Yao is a Professor at China University of Geosciences (Wuhan) and a researcher at the University of Tokyo. His research interests are geospatial big data mining, analysis, and computational urban science. He contributed to the supervision, project administration, conceptualization, data curation, investigation, methodology, writing – original draft, visualization, writing – review and editing.
