
Automatic training sample collection utilizing multi-source land cover products and time-series Sentinel-2 images

Article: 2352957 | Received 29 Nov 2023, Accepted 06 May 2024, Published online: 14 May 2024

ABSTRACT

Collecting reliable training samples plays a crucial role in improving the accuracy of land cover (LC) mapping products, which are essential foundational data for global environmental and climate change research. However, the process is labor-intensive and time-consuming, as it heavily relies on human interpretation. This article proposes an automatic training sample collection approach (ATSC) that utilizes multi-source LC products and time-series Sentinel-2 images. Firstly, a preliminary sample dataset was generated by fusing multiple LC products with the weighted majority voting (WMV) algorithm. Secondly, a locally selective combination in parallel outlier ensembles (LSCP) anomaly detection algorithm was applied to filter abnormal samples. The results revealed that (1) the China Land Cover Dataset (CLCD) had the highest overall accuracy (73.22%), and the ESRI Land Cover (ESRI) had the lowest overall accuracy (59.93%). Tree cover, built area, and water showed high accuracy across all products, while shrubland and wetland generally had low accuracy. (2) The average accuracy of the preliminary training samples for the four study areas was 95.62%. However, there were still abnormal samples, such as classification errors, LC changes within a year, and spectral anomalies. (3) Using the LSCP algorithm, 70.10% of the abnormal samples were removed, resulting in a final training sample accuracy that exceeded 97.95% in each region. The ATSC approach provides higher-quality training samples for LC classification and facilitates large-scale LC mapping initiatives.

1. Introduction

LC refers to the various biological or physical cover types on the Earth’s surface, primarily reflecting the natural properties of the land (Di Gregorio and Jansen Citation2000). Accurate and extensive LC mapping provides foundational data support for scientific research on the relationships between natural and biological activities, spatial patterns of surface cover, simulations of ecological environment changes, monitoring and evaluation, and human societal and economic development (Cao et al. Citation2015; Giri et al. Citation2013; Kayet et al. Citation2016). With the rapid development of remote sensing technology and easier access to aerial and satellite imagery, low-cost and efficient acquisition of LC data has become possible (Z. Chen et al. Citation2019; Congalton et al. Citation2014).

Numerous LC products have been released worldwide. Many early global-scale products, such as the International Geosphere-Biosphere Programme (IGBP) DISCover (Loveland and Belward Citation1997), University of Maryland (UMD) (Hansen and Reed Citation2000), and Global Land Cover dataset for 2000 (GLC2000) (T. R. Loveland et al. Citation2000), were created using data from the AVHRR sensor, resulting in a low spatial resolution (1 km). Later, MODIS at 500-meter spatial resolution and MERIS at 300-meter spatial resolution became the primary data sources for LC mapping. Several products with medium spatial resolution, such as the Global Land Cover by National Mapping Organization (GLCNMO) (Shirahata et al. Citation2017), MODIS Land Cover (MCD12Q1) (Friedl et al. Citation2010), and Land Cover-Climate Change Initiative (LC-CCI) (Bontemps et al. Citation2012), were subsequently developed. In recent years, large archives of Landsat and Sentinel-2 images were used to produce LC products with 30-meter and 10-meter spatial resolution. These fine-resolution products include Finer Resolution Observation and Monitoring of Global Land Cover (FROM-GLC) (Gong et al. Citation2013), Global Land Cover Mapping at 30 m Resolution (GlobeLand30) (J. Chen et al. Citation2017), Dynamic World (Brown et al. Citation2022), and ESA WorldCover (Zanaga et al. Citation2021). It is important to note that the accuracy of LC products varies; even in the best cases, reported overall accuracies are only approximately 80%, and the products remain subject to uncertainties. Previous researchers have evaluated and compared various LC products in specific regions, revealing persistent challenges of low accuracy, particularly in the shrubland, wetland, and grassland categories (L. Liu et al. Citation2021; Z. Wang and Mountrakis Citation2023; J. Wang et al. Citation2022; T. Zhao et al. Citation2023). Further endeavors are warranted to enhance the overall quality of LC maps.

Many scholars advocate reducing the uncertainty in LC maps by fusing existing multi-source LC products. Schepaschenko et al. (Citation2011) used a suitability index to fuse multiple data sources, generating a LC map for the Russian region. Kinoshita et al. (Citation2014) employed a logistic regression model to fuse six global-scale LC products, revealing that the number of products significantly impacts the accuracy of the fused results. A. Pérez-Hoyos et al. (Citation2012) employed fuzzy set theory to fuse four LC products within the European region, resulting in improved accuracy. Consequently, the fusion of multi-source products has become a crucial approach for enhancing map accuracy as the number of available LC products continues to increase.

Training samples are critical for training classifiers and directly impact the accuracy and reliability of LC mapping results (Foody and Mathur Citation2006). There are two approaches to collecting training samples. The first and most commonly used approach is interpretation-based, which yields high-quality samples, but a large amount of manual work is required for large-scale mapping (Calderón-Loor, Hadjikakou, and Bryan Citation2021; M. Li et al. Citation2022). The second approach involves collecting training samples from existing LC products and has been proven to offer advantages such as fully automatic collection and the production of a large and geographically distributed training dataset (Colditz et al. Citation2011; Hermosilla et al. Citation2022; Hu, Dong, and Batunacun Citation2018; Radoux et al. Citation2014; H. K. Zhang and Roy Citation2017; X. Zhang et al. Citation2021). However, in many studies, LC products with lower spatial resolution were often chosen when the second approach was employed, and training samples were mostly collected from an individual product. Classification errors and LC changes can affect training sample reliability, particularly in regions with high landscape fragmentation, leading to potentially low-quality training samples. Many fine spatial resolution LC products have emerged recently, yet few scholars have fused them for training sample collection. The reliability of information obtained from a single source of LC products is generally lower than that obtained from the fusion of multiple products (Ran et al. Citation2012). The fusion of multiple LC products can compensate for the deficiencies of individual products, reduce errors, and enhance the credibility of collected training samples. LC products with fine spatial resolution can provide a wealth of spatial detail. Training samples collected from products with fine spatial resolution will be more accurate and applicable to fine-resolution LC mapping. Additionally, many previous approaches did not consider the issue of missing satellite images in the locations of collected sample points. Model performance can be affected by algorithms sensitive to missing values (C. Zhang, Zhang, and Tian Citation2023). Therefore, utilizing multiple fine spatial resolution LC products and satellite images is necessary to collect high-quality and less cloud-covered training samples.

Researchers have implemented various approaches to remove outliers during the automatic collection of training samples due to the inherent uncertainties in LC products. Radoux et al. (Citation2014) removed pixels in fringe areas through morphological processing and excluded pixels with spectral anomalies based on Mahalanobis distance measurements. Zhang et al. (Citation2021) removed abnormal points based on the spectral statistical distribution of training samples. Jin et al. (Citation2022) used principal component analysis (PCA) to reduce the dimensions of 68 spectral features and remove outlier pixels by a statistical approach. Wen et al. (Citation2022) utilized NDVI time-series data to calculate the monthly mean (μ) and standard deviation (σ), removing corn samples falling outside the range of μ±σ in the NDVI. However, many outlier removal approaches do not consider time-series spectral features, which results in their inability to detect anomalies caused by LC changes within a year. Most of these approaches utilize statistically-based outlier removal techniques, assuming that the data are drawn from a specific distribution (Chandola, Banerjee, and Kumar Citation2009), often a normal distribution. However, the assumption often does not hold true for higher-dimensional real datasets. Even when the statistical assumption can be reasonably justified, several hypothesis test statistics can be applied to detect anomalies; choosing the best statistic is often not a straightforward task. In recent years, anomaly detection has garnered more attention from machine learning researchers. Numerous novel algorithms have been proposed and applied across various domains, including financial fraud detection, network virus attack warning, and natural disaster prevention (J. Li et al. Citation2023; Nassif et al. Citation2021; Proverbio, Bertola, and Smith Citation2018; Zuo et al. Citation2023). Regrettably, many advanced anomaly detection algorithms have not been employed to eliminate outliers in LC training samples. In contrast to statistical techniques, machine learning-based anomaly detection algorithms can adapt to various types of data distributions, including high-dimensional complex cases. Machine learning approaches have broader applicability and more robust performance while also providing outlier scores to aid in understanding the severity of anomalies (Han et al. Citation2022; Nassif et al. Citation2021). In particular, anomaly detection algorithms based on ensemble learning can enhance the accuracy and robustness of detection by reducing the dependence on individual detectors (Ouyang et al. Citation2021; J. Zhang et al. Citation2019).

To fill these research gaps, an approach for automatically collecting LC training samples has been developed. This approach can reduce the uncertainty in automatic training sample collection, yielding higher-quality and more representative LC training samples. It is applicable to LC classification with fine spatial resolution. The main content included the following: (1) The accuracy of multiple LC products was evaluated using a unified validation dataset. (2) The category weight values were computed according to user accuracy, and multiple LC products were fused based on the WMV algorithm. (3) Stable regions, which were high-confidence and cloud-free regions, were extracted from the fused LC map. Subsequently, training samples were collected using a local adaptive strategy. (4) An ensemble-based anomaly detection algorithm was implemented to identify and remove abnormal samples using time-series spectral features extracted from Sentinel-2 images. The reliable training samples collected through this approach can be used for large-scale and fine-scale LC mapping in specific regions.

2. Study area and materials

2.1. Study area

Four geographically and climatically distinct regions of China were chosen as the study areas. The four regions are the Beijing-Tianjin-Hebei region (218,000 km2), Heilongjiang Province (473,000 km2), Guangdong Province (180,000 km2), and northern Xinjiang (184,000 km2, including the Bortala Mongol Autonomous Prefecture, Ili Kazakh Autonomous Prefecture, Tacheng Prefecture, Huocheng County, Kokdala County, Karamay City, Shihezi City, and Shuanghe City).

Figure 1 shows that these regions are located in different parts of China. The northern Xinjiang region, located in northwestern China, mainly experiences a temperate continental climate. This climate is characterized by cold winters and hot summers, with significant diurnal and annual temperature variations and relatively low annual precipitation. Both the Beijing-Tianjin-Hebei region in the north and Heilongjiang Province in the northeast have a temperate monsoon climate. Heilongjiang is situated at a higher latitude and experiences longer and colder winters. Guangdong Province, situated along the southern coast, experiences a subtropical monsoon climate characterized by warm temperatures throughout the year. Comparing results from different regions can provide a comprehensive understanding of the generalization of the proposed approach.

Figure 1. The geographical location of the four study areas. (a) Northern Xinjiang region; (b) Heilongjiang Province; (c) Guangdong Province; (d) Beijing-Tianjin-Hebei region.


2.2. Data and pre-processing

2.2.1. Land cover products

Training samples were collected utilizing multiple LC products from 2020. Four LC products, namely Dynamic World (DW), European Space Agency WorldCover (ESA), Environmental Systems Research Institute Land Cover (ESRI), and China Land Cover Dataset (CLCD), were utilized (Brown et al. Citation2022; Zanaga et al. Citation2021; Karra et al. Citation2021; J. Yang and Huang Citation2021). All of these products can be accessed through the Google Earth Engine (GEE). The spatial resolution is 10 meters, except for the CLCD, which is 30 meters. These four LC products were chosen for their fine resolution and the provision of maps for multiple reference years, which supports sample collection across different years. However, these LC products are all subject to biases and uncertainties. Differences in classification systems (where different products have slightly different definitions for certain categories) are a significant factor influencing the consistency and biases of different products (Hao et al. Citation2023; Kang et al. Citation2022; Venter et al. Citation2022). For instance, rice paddies and irrigated/inundated agriculture are classified as flooded vegetation in the DW and ESRI but as cropland in the ESA. Furthermore, the classification methods, data sources, and pre-processing techniques can affect the consistency among various products (Hao et al. Citation2023; J. Wang et al. Citation2022). Specifically, the CLCD was obtained by constructing several temporal metrics using 335,709 Landsat images from GEE and inputting them into the random forest classifier (J. Yang and Huang Citation2021). While the DW and ESRI products are both derived from Sentinel-2 data and utilize deep learning methods, their data pre-processing methods and input features differ (Venter et al. Citation2022). It is also important to note that the accuracy validation results of the products above were obtained using different validation samples by the data producers (T. Zhao et al. Citation2023). The quantity and quality of the validation dataset can also lead to errors in the evaluation results.

Pre-processing of the LC products was carried out on GEE. Given that DW provides near-real-time LC data, the annual DW data were composited by taking the mode for each pixel of all available data from January 1st to 31 December 2020. The LC products were initially clipped using China’s boundary vector data. All of them were standardized to the WGS1984 coordinate system with a 10-meter spatial resolution through reprojection operations. The LC product classification systems were unified through the reclassification process. The resulting class codes (pixel values) and names are as follows: 1-Tree cover, 2-Shrubland, 3-Grassland, 4-Cropland, 5-Built/Impervious area, 6-Bare land, 7-Snow and ice, 8-Water, and 9-Wetland/Flooded vegetation. It is worth noting that the categories “Herbaceous wetland” and “Mangroves” from the ESA were merged into the “Wetland” category. The category “Moss and lichen” is absent in the other three LC products and has a limited distribution in China according to the ESA, so this category was excluded.
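For illustration, a minimal Earth Engine Python sketch of the Dynamic World compositing and reclassification step is given below. Only the public Dynamic World asset ID is used; the region geometry and the class-code mapping are illustrative placeholders, and the other three products would be processed analogously.

```python
import ee

ee.Initialize()  # assumes an authenticated Earth Engine session

# Placeholder region of interest; in practice, China's boundary vector data is used.
region = ee.Geometry.Rectangle([113.0, 22.0, 115.5, 24.5])

# Annual Dynamic World composite for 2020: per-pixel mode of the 'label' band.
dw_2020 = (ee.ImageCollection('GOOGLE/DYNAMICWORLD/V1')
           .filterDate('2020-01-01', '2021-01-01')   # covers 1 Jan-31 Dec 2020
           .filterBounds(region)
           .select('label')
           .mode()
           .clip(region))

# Standardize to WGS84 at 10 m and remap Dynamic World codes
# (0 water, 1 trees, 2 grass, 3 flooded vegetation, 4 crops, 5 shrub and scrub,
#  6 built, 7 bare, 8 snow and ice) to the unified codes 1-9 of Section 2.2.1.
dw_unified = (dw_2020
              .reproject(crs='EPSG:4326', scale=10)
              .remap([1, 5, 2, 4, 6, 7, 8, 0, 3],
                     [1, 2, 3, 4, 5, 6, 7, 8, 9]))
```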

2.2.2. Land cover validation dataset

The novel stratified random sampling global validation dataset in 2020 (SRS_Val dataset), developed by Liu et al. (Citation2023) (https://zenodo.org/records/7846090, accessed on 12 June 2023), was used for assessing the accuracy of the LC products. The SRS_Val dataset was established using a stratified equal-area random sampling strategy and a visual interpretation method (T. Zhao et al. Citation2023). Compared to previous validation datasets, this dataset enhances the sample density of heterogeneous landscapes and rare LC types. It adopts a standardized classification system from the UN-LCCS, ensuring good compatibility and coherence with various LC products. Although multiple sources of remote sensing imagery and rigorous quality control measures were integrated to ensure high confidence in the SRS_Val dataset, some validation samples remain uncertain. It should be noted that some categories may have relatively fewer samples when assessing the SRS_Val dataset at a national scale.

Clipping and reclassification procedures were also applied to the SRS_Val. “Broadleaved forest,” “Needle-leaved forest,” and similar categories were merged into “Tree cover.” “Rain-fed cropland” and “Irrigated cropland” were merged into “Cropland.” “Sparse vegetation” and “Bare areas” were merged into “Bare land.” Since the SRS_Val is intended for global-scale accuracy assessment, some categories may have limited samples within the region of China. To mitigate uncertainty in the accuracy assessment due to limited samples, the number of samples for categories with fewer samples (e.g. wetland and snow/ice) was increased to 300 through visual interpretation of Google Earth imagery. Ultimately, there were 7,691 validation samples, as depicted in Figure 2.

Figure 2. Spatial distribution of 7691 land cover validation samples in China.


2.2.3. Satellite images

The Sentinel-2 satellite is equipped with a high-resolution multispectral imaging instrument, the MultiSpectral Instrument (MSI). It is widely used in land monitoring, providing imagery of vegetation, soil and water cover, inland waterways, and coastal areas (Drusch et al. Citation2012; Phiri et al. Citation2020). The satellite orbits at an altitude of 786 kilometers and captures imagery across 13 spectral bands with a swath width of 290 kilometers; the spatial resolution varies across the spectral bands (10, 20, and 60 m). The mission offers a revisit period of 10 days for a single satellite and five days with complementary coverage from two satellites.

The Sentinel-2 Level-2A data served as the satellite data source. The Sentinel-2 data from March to November 2020 were chosen and clipped on the GEE platform. The platform’s cloud removal function was employed to eliminate cloud and cloud shadow pixels, and the mean function was used to composite the corresponding monthly images.
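The exact cloud-removal function used on the platform is not specified in the text; the sketch below shows one common QA60-based masking approach with the Earth Engine Python API and should be read as an assumption rather than the authors’ implementation.

```python
import ee

ee.Initialize()
region = ee.Geometry.Rectangle([113.0, 22.0, 115.5, 24.5])  # placeholder ROI

def mask_s2_clouds(image):
    """Mask cloud and cirrus pixels using the QA60 bitmask (bits 10 and 11)."""
    qa = image.select('QA60')
    mask = (qa.bitwiseAnd(1 << 10).eq(0)
            .And(qa.bitwiseAnd(1 << 11).eq(0)))
    return image.updateMask(mask)

s2 = (ee.ImageCollection('COPERNICUS/S2_SR')
      .filterBounds(region)
      .filterDate('2020-03-01', '2020-12-01')
      .map(mask_s2_clouds))

# Mean composite for each month from March (3) to November (11).
monthly = [s2.filter(ee.Filter.calendarRange(m, m, 'month')).mean().clip(region)
           for m in range(3, 12)]
```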

3. Methods

An approach (ATSC) has been developed for automatic training sample collection using multi-source LC products and time-series satellite images. This approach allows the generation of a high-quality and dependable sample dataset. Figure 3 shows the technical flowchart of the ATSC approach. Multiple LC product fusion, preliminary sample collection, and spectral feature extraction were implemented on GEE. The anomaly detection algorithm was implemented using Python. Details of each component are presented in the following sections.

Figure 3. Technical flowchart of ATSC approach.


3.1. Fusing multi-source land cover products

The LC products utilized provide maps of LC categories, with only DW additionally providing confidence layers for each category. Therefore, the WMV algorithm for data fusion is a straightforward and suitable approach. In the simple majority voting algorithm, every LC product’s vote carries the same weight of 1, and the final classification result for a given pixel is the category that receives the most votes. It is important to note, however, that the accuracy of each LC product varies, with disparities in classification accuracy across categories. It is often necessary to consider the differences among the products to leverage each product’s strengths and achieve more effective combination results with greater accuracy. This entails assigning different weights to the votes of each LC product during the voting process, an algorithm known as weighted majority voting (WMV) (H. Kim et al. Citation2011; Zhu et al. Citation2021). The final prediction for each pixel is made based on the highest weighted votes. The formula for the WMV algorithm is as follows:

(1) \( R(x) = \arg\max_{j} \sum_{i=1}^{N} w_{i,j}\, C_{i,j}(x) \)

In this context, \(w_{i,j}\) denotes the weight value of LC product \(C_i\) for category \(j\), \(C_{i,j}(x)\) equals 1 if product \(C_i\) classifies pixel \(x\) as category \(j\) and 0 otherwise, and \(N\) is the number of LC products; the fused result \(R(x)\) is the category receiving the highest weighted votes.

The weight value for each category of all products was initially set at 0.25. Later, the accuracy of the four products in China was assessed using a unified validation dataset (SRS_Val). The initial weights were adjusted based on user accuracy (UA). The choice of UA is motivated by the relatively limited number of samples in the validation dataset. The producer accuracy (PA) could be affected by sample imbalance among categories. The specific process for calculating weight values is detailed in formula (2). To measure the confidence level of the fused LC classification results, the confidence value for each pixel was defined as the highest weighted votes calculated by the WMV algorithm. The higher the consistency in the classification results of four LC products, the higher the confidence value.

(2) \( w_{i,j} = \dfrac{UA_{C_i,j}}{\sum_{i=1}^{N} UA_{C_i,j}} \)

where \(UA_{C_i,j}\) represents the user accuracy of category \(j\) in the LC product \(C_i\), as in formula (3). Other accuracy evaluation metrics used include the PA, overall accuracy (OA), and kappa coefficient, as shown in formulas (4)-(6).

(3) \( UA_{class\,i} = \dfrac{x_{ii}}{\sum_{k=1}^{n} x_{ik}} \)
(4) \( PA_{class\,i} = \dfrac{x_{ii}}{\sum_{k=1}^{n} x_{ki}} \)
(5) \( OA = \dfrac{\sum_{k=1}^{n} x_{kk}}{\sum_{i,k=1}^{n} x_{ik}} \)
(6) \( Kappa = \dfrac{N \sum_{i=1}^{n} x_{ii} - \sum_{k=1}^{n} \left( \sum_{i=1}^{n} x_{ik} \cdot \sum_{i=1}^{n} x_{ki} \right)}{N^{2} - \sum_{k=1}^{n} \left( \sum_{i=1}^{n} x_{ik} \cdot \sum_{i=1}^{n} x_{ki} \right)} \)

where \(N\) is the number of all validation samples; \(n\) is the number of rows/columns of the matrix; \(x_{ii}\) is the element in row \(i\), column \(i\) of the confusion matrix; and \(x_{ik}\) is the element in row \(i\), column \(k\) of the confusion matrix (\(x_{kk}\) and \(x_{ki}\) are defined similarly).
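To make the weighted voting and confidence computation concrete, a minimal NumPy sketch is given below. The weight matrix and product maps are small illustrative arrays rather than the values used in the study, which was implemented on GEE.

```python
import numpy as np

n_products, n_classes = 4, 9

# Per-class weights w[i, j] derived from user accuracy via formula (2);
# illustrative random values here, normalized so each category's weights sum to 1.
ua = np.random.rand(n_products, n_classes)
weights = ua / ua.sum(axis=0, keepdims=True)

# Toy class maps (codes 1-9) from the four products for a 100 x 100 tile.
maps = np.random.randint(1, n_classes + 1, size=(n_products, 100, 100))

# Accumulate weighted votes per category and take the arg-max (formula (1)).
votes = np.zeros((n_classes, 100, 100))
for i in range(n_products):
    for j in range(n_classes):
        votes[j] += weights[i, j] * (maps[i] == j + 1)

fused = votes.argmax(axis=0) + 1   # fused LC category per pixel
confidence = votes.max(axis=0)     # highest weighted votes, used as confidence
```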

3.2. Collecting preliminary training samples through local adaptive strategies

To ensure the collection of highly reliable training samples, extracting stable regions with high confidence in LC classification results and minimal cloud cover effects is essential. Two constraints were employed to extract regions with high confidence: (1) the confidence value (i.e. the highest weighted votes) of pixels after the fusion of LC products was greater than 0.7. (2) For each pixel, there is agreement in the classification results from at least three of the four LC products. These two constraint conditions help mitigate the negative impact resulting from the poor accuracy of a particular LC product in a specific category. This approach also enhances the credibility of training samples collected from categories with poor classification accuracy and consistency. Morphological processing (i.e. erode) was then applied to remove pixels in fringe areas, thereby reducing the uncertainty caused by edge effects (Wen et al. Citation2022). To ensure that the training samples collected were minimally affected by the cloud cover mask, regions without data gaps were extracted from the time-series satellite images. Training samples were collected from the ultimately extracted stable regions.
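The two confidence constraints and the erosion step might be expressed as in the following NumPy/SciPy sketch; the toy arrays stand in for the WMV outputs, and the 3 × 3 structuring element is an assumption, since the kernel size is not stated.

```python
import numpy as np
from scipy import ndimage

# Toy stand-ins for the WMV outputs of the previous sketch.
maps = np.random.randint(1, 10, size=(4, 100, 100))   # per-product class maps
fused = np.random.randint(1, 10, size=(100, 100))     # fused categories
confidence = np.random.rand(100, 100)                 # WMV confidence values

# Constraint (1): confidence value > 0.7; constraint (2): >= 3 of 4 products agree.
agreement = (maps == fused).sum(axis=0)
high_confidence = (confidence > 0.7) & (agreement >= 3)

# Morphological erosion removes fringe pixels (3 x 3 structuring element assumed).
stable = ndimage.binary_erosion(high_confidence, structure=np.ones((3, 3)))

# A further mask requiring valid (cloud-free) observations in every monthly
# composite would be intersected with `stable` before sampling.
```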

After extracting stable regions, a local adaptive strategy was used to collect preliminary training samples. A grid dataset was initially generated for each study area (each grid was 5 km × 5 km). One sample point was collected for each LC category within each grid. This strategy was chosen considering that the same LC category within a region might exhibit spectral differences influenced by geographic location. Therefore, by collecting training samples on a grid-by-grid basis, the spatial distribution of the collected samples becomes more uniform and representative. Multiple experiments revealed that setting the grid size to 5 km enables the collection of sufficient training samples to achieve LC mapping at both large regional and city-level scales. Moreover, during execution on GEE, the process does not consume excessive computational time or resources despite the large number of collected samples. Furthermore, certain categories, such as wetlands, may have relatively sparse spatial distributions within the region. The final number of preliminary samples obtained from these categories may be limited. An initial threshold for the number of preliminary samples was set to address this issue. Considering that different regions may vary significantly in area, it is necessary to set different thresholds for each region (see formula (7)). In cases where the total number of preliminary samples collected for a particular category falls below the set threshold, more samples were collected for that category within each grid. To avoid collecting preliminary samples for a certain category too densely or too closely, which could affect the model’s generalization ability and classification accuracy, the maximum number of samples collected per grid is set to 5. Once five samples per grid have been collected for a specific category, sampling for that category stops even if the total number of samples does not meet the set threshold. Notably, the grid size and sample quantity thresholds in this strategy can still be modified by the users in accordance with the research domain and specific application requirements.

(7) \( X = 500 + 500 \left\lceil \dfrac{S - 10}{10} \right\rceil \)

where \(X\) denotes the initial threshold, \(S\) denotes the region’s area (in units of 10,000 square kilometers), and \(\lceil \cdot \rceil\) denotes the ceiling function.
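Using formula (7) as reconstructed above, the threshold can be computed as in the small sketch below; the per-grid sampling rule (one sample per category per grid, increased up to a cap of five when the category total stays below the threshold) is summarized in the docstring, and the area values are taken from Section 2.1.

```python
import math

def sample_threshold(area_10k_km2: float) -> int:
    """Initial per-category sample threshold X from formula (7).

    `area_10k_km2` is the region's area S in units of 10,000 km^2. If the total
    collected for a category falls below X, the number of samples drawn per
    5 km grid cell is increased, up to a cap of five samples per grid.
    """
    return 500 + 500 * math.ceil((area_10k_km2 - 10) / 10)

# Thresholds for the four study areas (areas from Section 2.1).
for name, s in [('Northern Xinjiang', 18.4), ('Heilongjiang', 47.3),
                ('Guangdong', 18.0), ('Beijing-Tianjin-Hebei', 21.8)]:
    print(f'{name}: X = {sample_threshold(s)}')
```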

3.3. Filtering abnormal samples utilizing time-series spectral features

3.3.1. Extracting time-series spectral features

Monthly composited Sentinel-2 images from March to November were used to compute five additional spectral indices, namely the Normalized Difference Vegetation Index (NDVI), Enhanced Vegetation Index (EVI), Normalized Difference Water Index (NDWI), Normalized Difference Built-Up Index (NDBI), and Bare Soil Index (BSI). Following the collection of preliminary training samples, the 11 spectral features shown in Table 1 were extracted for each month, resulting in a total of 99 features. The time-series spectral features were utilized to filter abnormal samples from the preliminary training samples.

Table 1. Spectral features extracted from Sentinel-2 images for filtering abnormal samples.
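As an illustration, the five indices could be added to a monthly composite with the Earth Engine Python API roughly as follows; band designations are the standard Sentinel-2 names, and the EVI rescaling assumes Level-2A reflectances scaled by 10,000.

```python
import ee

def add_indices(image):
    """Append NDVI, EVI, NDWI, NDBI, and BSI bands to a Sentinel-2 composite."""
    ndvi = image.normalizedDifference(['B8', 'B4']).rename('NDVI')
    ndwi = image.normalizedDifference(['B3', 'B8']).rename('NDWI')
    ndbi = image.normalizedDifference(['B11', 'B8']).rename('NDBI')
    # EVI needs reflectance on a 0-1 scale, so the 10,000 scaling is removed.
    evi = image.expression(
        '2.5 * (NIR - RED) / (NIR + 6 * RED - 7.5 * BLUE + 1)',
        {'NIR': image.select('B8').divide(10000),
         'RED': image.select('B4').divide(10000),
         'BLUE': image.select('B2').divide(10000)}).rename('EVI')
    bsi = image.expression(
        '((SWIR1 + RED) - (NIR + BLUE)) / ((SWIR1 + RED) + (NIR + BLUE))',
        {'SWIR1': image.select('B11'), 'RED': image.select('B4'),
         'NIR': image.select('B8'), 'BLUE': image.select('B2')}).rename('BSI')
    return image.addBands(ndvi).addBands(evi).addBands(ndwi) \
                .addBands(ndbi).addBands(bsi)

# Applied to each monthly composite, e.g. monthly_with_indices = add_indices(img).
```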

The time series of satellite images from March to November was chosen after careful consideration. Firstly, this timeframe encompasses the growing season for most vegetation types (such as trees, shrubland, and grassland), allowing for an accurate reflection of the characteristics of different categories. Secondly, for certain LC categories, such as built area and bare land, the seasonal variations are relatively small, and the data range from this period is sufficient to describe their characteristics. Additionally, certain regions in China at middle and high latitudes and high altitudes may be affected by snow cover and vegetation leaf fall in winter, so that winter imagery may not represent surface features well. Therefore, selecting data from March to November can reduce such interference and more accurately reflect surface features.

3.3.2. Anomaly detection algorithm

3.3.2.1. Local outlier factor (LOF) algorithm

An ensemble strategy for anomaly detection was employed to identify abnormal samples within the preliminary samples. The LOF serves as the base anomaly detector, determining the degree of anomaly by calculating the density difference between each data point and its neighboring points (Breunig et al. Citation2000). The LOF algorithm not only considers the density of individual sample points but also considers the density of neighboring points, making it adaptable to data with varying densities and distributions. Therefore, it is suitable for situations in geographic environments where the spectral characteristics of the same LC category may vary due to differences in climate, location, and other factors. Furthermore, the LOF algorithm is robust even for high-dimensional datasets without requiring explicit assumptions about the data distribution. Different types of outliers, including both global and local outliers, can be detected (D. Kim, Lee, and Lee Citation2020). The LOF algorithm is described in detail below, and the distances calculated in the algorithm all refer to the distance across the feature space of the derived variables.

For a given sample point \(x_i\), let \(D_k(x_i)\) represent the distance between \(x_i\) and its k-th nearest neighbor, and let \(L_k(x_i)\) denote the set of points within the k-nearest-neighbor distance. Then, the reachability distance between two sample points \(x_i\) and \(x_j\), denoted as \(R_k(x_i, x_j)\), is calculated as follows:

(8) \( R_k(x_i, x_j) = \max\!\left\{ dist(x_i, x_j),\; D_k(x_j) \right\} \)

When \(x_j\) is in a dense region and \(x_i\) is far from \(x_j\), the reachability distance equals the actual distance. If \(x_j\) is situated in a sparse region, the reachability distance will be smoothed out by its k-nearest-neighbor distance \(D_k(x_j)\). This allows us to calculate the average reachability distance \(AR_k(x_i)\) of \(x_i\) by averaging the reachability distances to its k-nearest-neighbor points:

(9) \( AR_k(x_i) = \underset{x_j \in L_k(x_i)}{\mathrm{MEAN}}\, R_k(x_i, x_j) \)

The local outlier factor of \(x_i\) is the average ratio of \(AR_k(x_i)\) to the average reachability distances of its k nearest neighbors \(x_j\):

(10) \( LOF_k(x_i) = \underset{x_j \in L_k(x_i)}{\mathrm{MEAN}}\, \dfrac{AR_k(x_i)}{AR_k(x_j)} \)

There are many spectral features utilized for anomaly detection, and these features may exhibit high correlation (especially in adjacent months with the same wavelength bands). Satisfactory results may not be achieved using traditional Euclidean distance metrics. Therefore, the Mahalanobis distance was used as the distance metric for the LOF algorithm. The formula for calculating the Mahalanobis distance is as follows (De Maesschalck, Jouan-Rimbaud, and Massart Citation2000):

(11) \( d_M(x_i, x_j) = \sqrt{(x_i - x_j)^{T} \Sigma^{-1} (x_i - x_j)} \)

where \(\Sigma\) is the covariance matrix of the feature data from which \(x_i\) and \(x_j\) are drawn.
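The paper does not name a specific LOF implementation; a minimal scikit-learn sketch of a single LOF detector with a Mahalanobis metric is shown below, with a random stand-in matrix for the 99 time-series spectral features and an illustrative neighborhood size.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Stand-in feature matrix: 500 preliminary samples x 99 time-series features.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 99))

# The Mahalanobis metric needs the inverse covariance matrix of the features.
inv_cov = np.linalg.pinv(np.cov(X, rowvar=False))

lof = LocalOutlierFactor(n_neighbors=20, metric='mahalanobis',
                         metric_params={'VI': inv_cov})
labels = lof.fit_predict(X)                      # -1 marks flagged outliers
outlier_scores = -lof.negative_outlier_factor_   # larger values = more anomalous
```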

3.3.2.2. Ensemble-based anomaly detection algorithm: LSCP

As an ensemble approach for anomaly detection, Locally Selective Combination in Parallel Outlier Ensembles (LSCP), proposed by Zhao et al. (Citation2019), was adopted. LSCP first defines a local region for test instances and then identifies the most competent base detectors in this local region by measuring their similarity relative to the pseudo ground truth. More robust predictions can be achieved through this ensemble process (Y. Zhao et al. Citation2019). The specific steps are as follows (a brief implementation sketch is given after the steps):

  1. Initially, a set of \(r\) models is trained using the training samples \(X_{train} \in \mathbb{R}^{n \times d}\), resulting in an aggregated outlier score matrix \(O(X_{train})\). In formula (12), \(C_r\) denotes the score vector from the r-th base detector. Each detector score, \(C_r(X_{train})\), is standardized using Z-score normalization (Aggarwal and Sathe Citation2015; Zimek, Campello, and Sander Citation2014).

    (12) \( O(X_{train}) = \left[ C_1(X_{train}), \ldots, C_r(X_{train}) \right] \)

  2. Generating pseudo ground truth for evaluation. The pseudo ground truth is generated from \(O(X_{train})\) using the maximum score across detectors (as in the original article). This is generalized in formula (13), where \(\varphi\) represents the aggregation taken across all base detectors.

    (13) \( target = \varphi\left( O(X_{train}) \right) \in \mathbb{R}^{n \times 1} \)

  3. Local region definition. The local region \(\psi_j\) of a test instance \(X_{test}^{j}\) is defined as the set of its k nearest training objects, formally denoted as in formula (14). The process involves randomly selecting \(t\) feature subspaces of \(d/2\) to \(d\) dimensions. The k nearest neighbor samples to \(X_{test}^{j}\) in the training samples are identified in each subspace. Then, training objects that occur more than \(t/2\) times in \(kNN_{ens}^{j}\) are included, thus defining the local region. Following the original article, the value of k was set at 10% of the training samples, bounded in the range of [30, 100].

    (14) \( \psi_j = \left\{ x_i \mid x_i \in X_{train},\; x_i \in kNN_{ens}^{j} \right\} \)

  4. Model selection and combination. Once the local region is determined, the local outlier score matrix \(O(\psi_j)\) is obtained utilizing the trained base detectors. The local pseudo ground truth \(target_{\psi_j}\) is obtained by extracting the values associated with the local region \(\psi_j\) from the target vector. LSCP measures the local competence of each base detector by evaluating the Pearson correlation between the local pseudo ground truth \(target_{\psi_j}\) and the local detector scores \(C_r(\psi_j)\) (Schubert et al. Citation2012). Finally, the most competent detectors are selected, the outlier scores of the test instance \(X_{test}^{j}\) are computed with them, and the average of these outlier scores serves as the final outlier score for the test instance.

    (15) \( O(\psi_j) = \left[ C_1(\psi_j), \ldots, C_r(\psi_j) \right] \in \mathbb{R}^{|\psi_j| \times r} \)
    (16) \( target_{\psi_j} = \left\{ target_{x_i} \mid x_i \in \psi_j \right\} \in \mathbb{R}^{|\psi_j| \times 1} \)
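As the implementation sketch referred to above, the following uses the PyOD library, which provides LSCP and LOF; the detector pool mirrors the settings described in Section 4.4.2, while the feature matrix is random stand-in data.

```python
import numpy as np
from pyod.models.lof import LOF
from pyod.models.lscp import LSCP

# Stand-in feature matrix: preliminary samples x 99 time-series spectral features.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 99))

# Pool of 30 LOF base detectors with different neighborhood sizes (MinPts 5-150).
# The study pairs LOF with a Mahalanobis metric, which could be passed here via
# LOF(metric='mahalanobis', metric_params={'VI': inv_cov}).
detector_pool = [LOF(n_neighbors=k) for k in range(5, 155, 5)]

# LSCP with a local region of 30 nearest training objects.
model = LSCP(detector_pool, local_region_size=30, random_state=0)
model.fit(X)

outlier_scores = model.decision_scores_   # higher scores = more likely abnormal
labels = model.labels_                    # 1 = flagged abnormal, 0 = normal
```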

The performance of LSCP was evaluated by comparing it to various traditional global combination frameworks. These frameworks include the following (Aggarwal and Sathe Citation2015): (1). Average combination: an outlier score is assigned to each data point based on the average score generated by each base detector. (2). Maximum combination: The maximum score is used as the outlier score. (3). Average of Maximum (AOM) combination: The base detectors are randomly divided into predefined subsets, and the final score is computed by averaging the maximum score in each subset. (4). Maximum of Average (MOA) combination: The final score is defined as the maximum of the average scores in each subset.

All combination frameworks mentioned above utilize the same pool of individual base detectors to ensure consistency. A higher final outlier score suggests a greater likelihood of being an abnormal point. It’s usually necessary to set a threshold to distinguish abnormal samples. The area under curve percentage (AUCP) algorithm was used in this study to compute the thresholds. The area under the curve was used to evaluate a non-parametric means to threshold scores generated by the outlier scores. The outliers are set to any value beyond where the AUC of the kernel density estimate (KDE) is less than the (mean + abs(mean-median)) percent of the total KDE AUC (Ren et al. Citation2019). The area under the curve (AUC) is defined as follows:

(17) AUC=limxinfi=1nfxδx(17)

fx is the curve, and δx is the incremental step size of the rectangles whose areas will be summed up. The AUCP method generates a curve using the pdf of the normalized decision scores over a range of 0–1. This is done with a kernel density estimation.

Recall and precision were used to evaluate the effect of these algorithms (Ma et al. Citation0000). The calculation results of the two indicators are shown in equations (18) and (19):

(18) Recall=TPTP+FN(18)
(19) Precision=TPTP+FP(19)

where TP is the number of abnormal samples judged as abnormal samples, FN is the number of abnormal samples judged as normal samples, and FP is the number of normal samples judged as abnormal samples.

4. Results and discussion

4.1. Accuracy assessment of multi-source land cover products

The classification accuracy of the four LC products in China was validated using the SRS_Val dataset, and the results are shown in . ESA and CLCD had relatively high OA, exceeding 70%. The DW and ESRI demonstrated comparatively lower OA, with the ESRI being the lowest at 59.93% and a kappa coefficient of 0.523.

Table 2. Accuracy statistics of four land cover products.

The accuracy of different LC categories also varied significantly. Due to the relatively sparse distribution of validation samples at the national scale, the UA was used as the accuracy metric for different categories. shows that the classification accuracy for categories such as tree cover, built area, bare land, water, and snow and ice was relatively high. However, the classification accuracy for certain categories, such as shrubland and wetland, was suboptimal across all products. Grassland had higher classification accuracy only in the ESA and CLCD. The low classification accuracy of certain categories and differences in category definitions across different products may impact the reliability of the collected samples (Venter et al. Citation2022; H. Yang et al. Citation2017). Therefore, fusing multi-source LC products is necessary for collecting reliable training samples.

4.2. Fusion of multi-source land cover products

4.2.1. Confidence values for the results of fusing four land cover products

As shown in , the weight values for each product were calculated based on the UA. It is worth noting that the weight values calculated using this method may result in a situation where the UA for a certain category is low, yet its weight value is high. For example, although the UA of shrubland was low across all products, its weight values in the DW and ESRI were high. There is a greater probability of categorizing the classification result as shrubland when applying the WMV algorithm. However, the reliability of the final collected training samples is not significantly affected by the problem. These uncertainty areas will subsequently be removed by placing constraints on the result of the WMV algorithm before collecting the preliminary training samples.

Figure 4. Weight values for each category of the four land cover products used for the weighted majority voting algorithm.

Figure 4. Weight values for each category of the four land cover products used for the weighted majority voting algorithm.

Figure 5. Confidence values for the results of fusing four land cover products. Note that the confidence value for each pixel was defined as the highest weighted votes calculated by the weighted majority voting algorithm. (a) Northern Xinjiang region; (b) Heilongjiang Province; (c) Guangdong Province; (d) Beijing-Tianjin-Hebei region.

Figure 5. Confidence values for the results of fusing four land cover products. Note that the confidence value for each pixel was defined as the highest weighted votes calculated by the weighted majority voting algorithm. (a) Northern Xinjiang region; (b) Heilongjiang Province; (c) Guangdong Province; (d) Beijing-Tianjin-Hebei region.

Due to the prior selection of four example regions in different geographical locations in China, subsequent experiments focused mainly on these four regions. As depicted in , confidence value for each pixel was defined as the highest weighted votes calculated by the WMV algorithm. The confidence values were categorized into four levels: low confidence (value less than 0.4, indicating low consistency in classification results), moderate-low confidence (value between 0.4 and 0.7), moderate-high confidence (value between 0.7 and 1.0), and high confidence (value of 1, indicating complete agreement in classification results across all products). shows that the high confidence regions in northern Xinjiang were relatively sparse, mostly including some southern, eastern, and northern areas. Many areas in northern Xinjiang were classified as moderate-high confidence, while moderate-low confidence regions were more concentrated and tended to form strip patterns. In Heilongjiang Province, high confidence regions were primarily located in the western and southern areas, and moderate-high confidence regions were concentrated in the north. Some areas in the northern and western regions of Heilongjiang Province exhibited moderate-low confidence. In Guangdong Province, in addition to specific regions such as the southern coastal area showing moderate-low confidence, most areas exhibited high confidence. The distribution of high confidence regions in the Beijing-Tianjin-Hebei was primarily concentrated in the southeast, while the northern and western areas exhibited predominantly moderate-high confidence values.

Overall, low confidence regions were rare among the four regions, with most areas having confidence values above 0.7, indicating high confidence at a macro scale. The spatial distribution patterns of confidence values are closely associated with geographical environmental features. Cropland, tree cover, built area, water, and snow/ice showed high confidence in most areas, achieving a confidence value of 1, indicating reliable classification results. The LC in these areas is stable and can be easily identified in remote sensing images. Conversely, certain areas showed relatively low confidence values for categories like shrubland, grassland, bare land, and wetland, indicating inconsistent classification across LC products. The low confidence values in these regions are primarily attributed to the similarity of spectral and textural features and the high semantic similarity of these categories (H. Wang et al. Citation2022). Discrepancies in the definitions of grassland-shrubland and wetland-grassland among different LC products contribute to confusion in the classification results (Baig et al. Citation2022). Furthermore, confidence values tend to be lower in areas with complex LC categories and high spatial heterogeneity of the Earth’s surface and in edge areas of land features. This is mainly due to uncertainties in pixel identification, differences in category definitions, and edge effects (e.g. mixed pixels or regions), resulting in poor spatial consistency among different LC products (Hao et al. Citation2023; Radoux et al. Citation2014). The regions with lower confidence values can affect the quality of training sample collection. It is advisable to avoid collecting training samples within these areas. Additionally, the number of reliable training samples collected can also be affected if there is a significant proportion of regions with lower confidence values within a specific LC category.

4.2.2. Land cover maps using the weighted majority voting algorithm

The fused classification result by the WMV algorithm corresponds to the LC category receiving the highest weighted votes. The LC maps using the WMV algorithm, as depicted in , exhibited notable regional disparities. Northern Xinjiang region was dominated by grasslands, bare land, and croplands, with Aydingkol Lake (the largest saltwater lake in Xinjiang) and Sayram Lake in the west. The high-altitude areas of northern Xinjiang had substantial permanent snow and ice cover. Heilongjiang Province was characterized by extensive croplands and trees, with a concentration of water and wetlands in the western area. Most areas of Guangdong Province were dominated by trees, along with numerous rivers and lakes. The Pearl River Delta region on Guangdong’s southern coast had a high density of built areas due to its developed economy. In the Beijing-Tianjin-Hebei region, croplands were most widespread and were concentrated in the southeast, while trees and grasslands dominated in the north and central-west. Built areas were particularly concentrated and extensive in Beijing and Tianjin.

Figure 6. Land cover maps using the weighted majority voting algorithm. (a) Northern Xinjiang region; (b) Heilongjiang Province; (c) Guangdong Province; (d) Beijing-Tianjin-Hebei region.

Figure 6. Land cover maps using the weighted majority voting algorithm. (a) Northern Xinjiang region; (b) Heilongjiang Province; (c) Guangdong Province; (d) Beijing-Tianjin-Hebei region.

4.3. Collection of preliminary training samples

The results of the preliminary training sample collection by the local adaptive strategy are shown in after extracting stable regions. A larger number of samples of tree cover, cropland, and built/impervious areas were collected in Heilongjiang Province, Guangdong Province, and the Beijing-Tianjin-Hebei region due to the extensive presence of these categories. The majority of grassland and bare land samples were collected in the northern Xinjiang region. However, some categories had fewer collected samples, notably snow and ice, which were exclusively collected in northern Xinjiang. Wetland samples were relatively scarce in all three regions except Heilongjiang Province, where they were more abundant. Relatively fewer shrubland and grassland samples were collected in specific regions. For instance, only 8 shrubland samples were collected in Heilongjiang Province, and 119 grassland samples were collected in Guangdong Province. The scarcity of samples in these cases is primarily attributed to the limited spatial distribution and the high uncertainty of the classification results of these categories. Furthermore, the approach extracts cloud-free regions from March to November before sampling, which impacts the quantity of samples collected. The four study areas were minimally affected by cloud cover masks throughout the year, reducing the impact on sample collection quantity. However, in many other regions, especially southern China, there is significant cloud cover during the summer, which may result in extensive data gaps in composited monthly satellite images. To address this problem, the optimal solution is to exclude satellite images from months with extensive data gaps. Only composited images from months with fewer data gaps were utilized, and cloud-free regions were extracted for preliminary sample collection. The same applies when using the collected training samples for LC classification.

Table 3. Numbers of preliminary training samples for various land cover categories in the four regions collected by the local adaptive strategy.

The quality and quantity of training samples are crucial for training classification models (Mellor et al. Citation2015). The sample collection strategy described in this paper enables the collection of reliable and evenly distributed preliminary training samples. However, the quantity of preliminary samples may vary significantly depending on the study area, which could result in sample imbalance. Unbalanced training samples refer to significantly more or fewer training samples for one or multiple categories. This may result in rare LC types being underrepresented relative to more abundant classes, which may degrade the overall classification accuracy (Estabrooks, Jo, and Japkowicz Citation2004; Mellor et al. Citation2015). Techniques such as down-sampling of majority classes (Freeman, Moisen, and Frescino Citation2012) and over-sampling of minority classes (Ling and Li Citation1998) have been explored to alleviate the problem of unbalanced training samples. Therefore, users may need to adjust based on the specific study area and classification model when applying our sample collection strategy. For example, users can increase the number of samples collected per grid for categories with fewer samples or adjust the grid size (set at 5 km in this paper) to increase or decrease the number of samples collected for certain categories.

The accuracy of the collected preliminary training samples was validated through visual interpretation of 6671 randomly selected samples from the four regions using online satellite images such as Google Earth and Bing Maps. The number of samples selected randomly for each region ranged from 1606 to 1739. The time-series spectral curves for 2020 were also extracted and analyzed to aid interpretation. Validation revealed that the accuracies of the preliminary training samples in the northern Xinjiang, Heilongjiang, Guangdong, and Beijing-Tianjin-Hebei regions were 96.26%, 97.01%, 94.24%, and 94.97%, respectively. The average accuracy of the preliminary training samples across the four regions was 95.62%, indicating satisfactory overall quality.

4.4. Filtering abnormal samples by LSCP algorithm

4.4.1. Types of abnormal samples

Despite the high accuracy achieved by the collected preliminary training samples, errors and abnormal samples were still present. The abnormal samples can be classified into three types: (1) classification errors (label errors), (2) LC changes within a year, and (3) spectral anomalies (also referred to as feature anomalies, including mixed pixels and anomalies caused by environmental factors or issues with satellite data quality).

lists anomaly types, descriptions, images, and time-series spectral curves of the six typical abnormal samples. Samples (a) and (b) represent anomalies caused by classification errors (label errors). In the spectral curve charts, the orange curve represents the spectral curve of abnormal samples, while the blue curve represents the spectral curves of normal samples. Differences in the NDVI time-series curves allow for the differentiation of abnormal samples. Sample (c) shows anomalies caused by changes in the LC category within a year. From June to August, sample (c) had positive NDVI values but negative NDWI values. The main reason may be the seasonal hydrological changes in the region and influences from tides, resulting in changes in water levels (Y. Li et al. Citation2019; Yin et al. Citation2023). These factors cause areas originally covered by water to become exposed, coupled with vigorous summer vegetation growth, leading to a significant increase in the NDVI. Sample (d) represents anomalies induced by mixed pixels in the satellite image (X. Liu, Li, and Zhang Citation2010). The sample is located at the edge of a building, where the pixel in the satellite image contains spectral information from multiple LC categories, such as buildings and trees. As a result, it presented a different spectral curve from that of pure pixels. Samples (e) and (f) show anomalies caused by environmental factors or issues with satellite data quality. Spectral curves revealed that sample (e) had extreme anomalies in March, with the green and near-infrared bands being lower than those in other months. The NDVI was negative, and the NDWI even exceeded 0.4. At sample (f), spectral anomalies occurred in October, where values in the visible and near-infrared bands approached zero (0.0001), resulting in calculated spectral indices such as the EVI, NDVI, and NDWI all being zero.

Table 4. Six typical abnormal samples with anomaly types, descriptions, images, and time-series spectral curves.

The presence of abnormal samples (noise) in training samples can adversely affect the training of models, consequently impacting the accuracy of LC classification. During the training process, models strive to minimize the loss function, meaning that they attempt to minimize the discrepancy between the true labels in the training samples and the model’s predicted results. If there is noise in the training samples, the model may also attempt to fit it, which could result in overfitting. Previous researchers have assessed the impact of noise on classification accuracy. Rodriguez-Galiano et al. (Citation2012) reported that the performance of random forest classifiers was relatively insensitive to the intentional mislabeling of training samples up to 20%, beyond which the error rate exhibited exponential growth. Mellor et al. (Citation2015) discovered that the classification accuracy gradually decreases as the proportion of noise increases. In addition, the complexity of the learning model may be impacted by some classification algorithms (Pelletier et al. Citation2017). For example, the average path length of training instances may increase in the presence of class label noise, leading to increased computational training time. Therefore, it is crucial to remove abnormal samples present in the preliminary training samples.

4.4.2. Comparison of different anomaly detection models

Ensemble frameworks for anomaly detection were implemented in Python, utilizing time-series spectral features to identify abnormal samples. The anomaly detection models were trained and applied separately for each category in the four regions. Due to the scarcity of shrubland samples in Heilongjiang Province, they were removed preemptively. All the ensemble frameworks utilized a pool of 30 LOF-based base detectors for consistent performance evaluation. To induce diversity among the base detectors (Britto, Sabourin, and Oliveira Citation2014; Zimek, Campello, and Sander Citation2014), different initialization hyperparameters, i.e. the number of neighbors (MinPts) used in each LOF detector, were selected within the range of [5, 150]. For the AOM and MOA frameworks, the base detectors were divided into five subgroups; each group contains six base detectors selected without replacement.

The training samples that have been visually interpreted were utilized to evaluate the performance of the different models. The results of LSCP and other models are shown in . The recall rate indicates the proportion of abnormal samples that are correctly identified. Compared to conventional models, the LSCP models showed superior and robust performance, with a generally higher recall rate. The superiority of LSCP can be attributed to its ability to integrate only competent base detectors in the local region of test instances, which mitigates the influence of underperforming detectors. Its advantages were pronounced in the northern Xinjiang and Beijing-Tianjin-Hebei regions. Nevertheless, significant differences in recall rates were observed across different regions. The recall rates in northern Xinjiang and Guangdong exceeded 75%, whereas the recall rate in the Beijing-Tianjin-Hebei region was the lowest at only 60.47%. Hence, the performance of anomaly detection models varies significantly across different study areas. shows that all anomaly detection models exhibited low precision, indicating that many normal samples were detected as anomalies. This occurrence may be attributed to the distribution of these samples in the feature space deviating from most of the normal samples despite their correct labels and lack of apparent anomalies found during visual interpretation. The percentages of anomalies detected (removal ratio) by the LSCP models in the preliminary training samples in northern Xinjiang, Heilongjiang, Guangdong, and Beijing-Tianjin-Hebei were 8.37%, 3.01%, 4.84%, and 5.29%, respectively. Although the use of anomaly detection models may remove some normal samples, the proportion of removal relative to the total sample count is very small, and thus, the overall impact is minimal. After the abnormal samples were removed by the LSCP models, the accuracy of the final training samples in the northern Xinjiang, Heilongjiang, Guangdong, and Beijing-Tianjin-Hebei regions increased by 2.05% to 4.34%, reaching 99.04%, 99.06%, 98.58%, and 97.95%, respectively. Even after sample filtering, errors and uncertainties are still present in the final training samples. However, the errors in the final samples are small and sufficient for model training. It is worth noting that the number of visual interpretation samples used to evaluate anomaly detection algorithms may be limited, and the proportion of abnormal samples in the overall sample set is particularly small. As a result, the validation results of this study may deviate from the actual accuracy. Despite this, they still hold some reference value for readers.

Table 5. Comparison of the performance of different anomaly detection models in the four regions (%).

Table 6. Comparison of the performance between the LSCP and conventional statistical approaches (%).

The total number and recall rate of the three types of anomalies were calculated separately to analyze the detection effectiveness of the anomaly detection models for different types of anomalies. A total of 291 abnormal samples were found in all four regions during visual interpretation. There were 170 samples with classification errors (label errors), 33 samples with LC changes within a year, and 88 samples with spectral anomalies (feature anomalies). Using the LSCP models for anomaly detection, the recall rates for classification errors, LC changes within a year, and spectral anomalies were 67.65%, 69.70%, and 75.00%, respectively. The identification accuracy of the LSCP model varied for different types of abnormal samples. The accuracy of identifying anomalies caused by classification errors was the lowest. Certain vegetation types, especially shrubland, grassland, and wetland, are susceptible to misclassification due to their similar spectral and textural features, as well as the confusion existing in LC products. Therefore, many abnormal samples with classification errors are also challenging to correctly identify. The accuracy of identifying abnormal samples with LC changes within a year, as well as abnormal samples with spectral anomalies, was relatively higher. For samples that have spectral anomalies, abnormal spectral values over one or more time periods lead to significant differences in the time-series spectral curves compared to those of other samples, making them easier to detect. Additionally, as LC change is a slow process, the number of samples with LC changes within a year is relatively small, mostly related to seasonal changes in water bodies (as shown in , sample (c)). The effectiveness of the anomaly detection algorithm on abnormal samples with LC changes within a year needs to be validated further. It should also be noted that the threshold in anomaly detection algorithms is computed based on the outlier scores of all points. Only samples with outlier scores exceeding the threshold are classified as anomalies. Due to variations in threshold computation among different algorithms, the removal ratio of training samples and the accuracy of anomaly detection can also differ. Overall, ensemble machine learning-based anomaly detection algorithms are effective for filtering abnormal samples. The detection performance in the future is expected to be further enhanced by improvements in anomaly detection algorithms and ensemble strategies, as well as the selection of other typical features.

4.4.3. Comparison with conventional approaches

The performances of the different approaches were evaluated by comparing the results of the LSCP models with those of conventional statistical techniques; specifically, the outlier-removal approaches of Jin et al. (Citation2022) and Wen et al. (Citation2022) were used. Both approaches assume normally distributed data and remove training samples that fall outside the μ ± σ range. Jin et al. performed PCA and used the first two principal components (PC1 and PC2) to filter abnormal samples, whereas Wen et al. used NDVI time-series data to eliminate samples deviating from the average NDVI curve. Other approaches, such as that of Zhang et al. (Citation2021), similarly remove outlier samples based on the statistical distribution of spectra; because they share the same principles, they were not compared separately.
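The sketch below illustrates one possible reading of this μ ± σ filtering idea: for each class, samples whose deviation from the class-average NDVI curve falls outside μ ± σ are discarded. The arrays and the per-sample deviation summary are simplifying assumptions, not the exact procedures of Jin et al. or Wen et al.

```python
# Sketch of standard-deviation (mu +/- sigma) filtering against a class-average NDVI curve.
import numpy as np

def std_filter(ndvi_series: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """ndvi_series: (n_samples, n_dates); labels: (n_samples,). Returns a keep-mask."""
    keep = np.ones(len(labels), dtype=bool)
    for cls in np.unique(labels):
        idx = np.where(labels == cls)[0]
        mean_curve = ndvi_series[idx].mean(axis=0)
        # mean absolute deviation of each sample from the class-average curve
        dev = np.abs(ndvi_series[idx] - mean_curve).mean(axis=1)
        mu, sigma = dev.mean(), dev.std()
        keep[idx] = (dev >= mu - sigma) & (dev <= mu + sigma)
    return keep

rng = np.random.default_rng(2)
ndvi = rng.uniform(0, 1, size=(300, 9))          # placeholder 9-month NDVI series
labels = rng.integers(0, 4, size=300)            # placeholder class labels
mask = std_filter(ndvi, labels)
print(f"kept {mask.sum()} / {len(mask)} samples")
```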

Recall, precision, and removal ratio were used to compare the different approaches, with these evaluation metrics calculated over all samples from the four regions. Table 6 indicates that the LSCP had a recall rate of 70.10%, lower than those of the two statistical approaches; in particular, the standard deviation filtering approach achieved a recall rate of 97.59%, identifying almost all abnormal samples. However, the precision of the two statistical approaches was significantly lower than that of LSCP, and their removal ratios were notably greater. Although conventional statistical approaches can correctly identify most abnormal samples, a large percentage of normal samples is also removed. Visual interpretation identified only a minimal proportion of abnormal samples, suggesting that relying solely on statistical approaches could lead to excessive removal of training samples. Moreover, because of the reduction in data dimensionality, the PCA-Lajda criterion approach may overlook many samples exhibiting LC changes within a year or spectral anomalies. Wen et al. originally applied the standard deviation filtering approach to remove abnormal corn samples; when it is used on multi-category LC training samples, a large portion of the samples is eliminated because the time-series spectral characteristics within the same category are diverse (e.g. different crops exhibit different spectral curves). It is important to note that, in addition to sample quality, sample quantity is also crucial for model training. Despite its slightly lower recall rate compared with the conventional approaches, LSCP better balances sample quality and quantity and avoids excessive removal of training samples. Machine learning-based anomaly detection approaches such as LSCP make no assumption about the data distribution and provide an outlier score for each sample, helping users gauge the degree of anomaly. Thus, machine learning-based anomaly detection algorithms have promising applications in filtering abnormal LC training samples.
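For clarity, the three evaluation metrics used in this comparison can be computed as follows from detector output and visually interpreted reference labels; the boolean arrays here are synthetic examples rather than the study's data.

```python
# Sketch: recall, precision, and removal ratio for an anomaly-filtering result.
import numpy as np

def evaluate_filtering(flagged: np.ndarray, is_abnormal: np.ndarray) -> dict:
    """flagged: detector output; is_abnormal: visual-interpretation reference."""
    tp = np.sum(flagged & is_abnormal)
    recall = tp / is_abnormal.sum()              # abnormal samples correctly removed
    precision = tp / flagged.sum()               # removed samples that were truly abnormal
    removal_ratio = flagged.mean()               # share of all samples removed
    return {"recall": recall, "precision": precision, "removal_ratio": removal_ratio}

rng = np.random.default_rng(3)
is_abnormal = rng.random(1000) < 0.03                        # sparse reference anomalies
flagged = is_abnormal & (rng.random(1000) < 0.7) | (rng.random(1000) < 0.05)
print(evaluate_filtering(flagged, is_abnormal))
```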

4.4.4. Comparison with models utilizing single-temporal spectral features

The effectiveness of models based on single-temporal versus time-series spectral features was compared for the different anomaly types using the best-performing ensemble framework, LSCP. The time-series features encompass spectral features from March to November, while the single-temporal spectral features were extracted from composite images in April, July, and October, representing spring, summer, and autumn, respectively. Six typical abnormal samples presented earlier were used to validate the detection results with the different temporal features. In Table 7, abnormal samples are labeled “1” if the model successfully identified them and “0” if not, and the “Total” column gives the number of abnormal samples successfully identified.

Table 7. Comparison of the model performance between single-temporal and time-series spectral features on six typical abnormal samples.

Table 7 shows that all models effectively identified anomalies caused by classification errors (e.g. samples (a) and (b)) and mixed pixels (e.g. sample (d)). Nevertheless, models relying on single-temporal features alone still struggled in some cases; for instance, sample (b) was not identified as abnormal when only the October spectral features were used. Models using single-temporal features also showed insufficient ability to detect anomalies resulting from LC changes within a year (e.g. sample (c)) and spectral anomalies (e.g. samples (e) and (f)). For sample (c), the seasonal and water-level changes were evident mainly in summer, so their detection relies on spectral features from that season; the same applies to samples (e) and (f). Consequently, anomaly detection based on time-series spectral features is more appropriate: it yields the highest accuracy and performs well for all types of abnormal samples.
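The following sketch illustrates how such a comparison could be set up: the same detector is run once on the full March-November feature matrix and once on each single-month slice, and the flagged samples are compared. The feature array, the month indices for April, July, and October, and the use of a single LOF detector (instead of the full LSCP ensemble) are illustrative simplifications.

```python
# Sketch: compare anomaly flags from single-temporal and time-series spectral features.
import numpy as np
from pyod.models.lof import LOF

rng = np.random.default_rng(4)
features = rng.normal(size=(800, 9, 4))            # Mar-Nov composites, 4 bands per month

def flag_anomalies(x: np.ndarray) -> np.ndarray:
    model = LOF(n_neighbors=20, contamination=0.05)
    model.fit(x)
    return model.labels_ == 1                      # True where a sample is flagged

time_series_flags = flag_anomalies(features.reshape(len(features), -1))
for name, month_idx in [("April", 1), ("July", 4), ("October", 7)]:
    single_flags = flag_anomalies(features[:, month_idx, :])
    agreement = (single_flags == time_series_flags).mean()
    print(f"{name}: {single_flags.sum()} flagged, "
          f"{100 * agreement:.1f}% agreement with the time-series model")
```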

4.5. Generalization of the ATSC approach

Although the experiments in this study were limited to China, the country spans a vast area whose climate and geography differ considerably between regions. The four experimental regions are located in diverse parts of China and exhibit significant differences in climate, land use and land cover, topography, and socio-economic development. The LC training samples obtained using the ATSC approach achieved desirable accuracy across these geographically diverse regions, indicating that the approach generalizes well. With the increasing availability of high-resolution LC products at both global and regional scales, coupled with the global coverage of satellite images such as Sentinel-2, the ATSC approach appears feasible and promising for cross-regional applications.

When applying the ATSC approach to other regions, several key factors need to be considered. Firstly, selecting suitable LC products with appropriate temporal and spatial coverage is crucial; either global-scale LC products or regional-scale products covering the target region can be chosen. Secondly, a validation dataset covering the target region is necessary to calculate category weight values for the different products when applying the WMV algorithm. The validation dataset can be taken from publicly available datasets (such as SRS_Val) or created through visual interpretation. To minimize the impact of temporal differences, the reference years of the selected LC products and the validation dataset should be as close as possible. Thirdly, when employing the local adaptive sampling strategy, the grid size and sampling quantity should be set reasonably according to the specific study area and application requirements. Finally, parameter adjustments may be necessary for anomaly detection models such as LSCP to suit the local context. Moreover, the uncertainties associated with this approach must be noted. The reliability of the collected training samples is influenced by the quantity and accuracy of the LC products, as well as the quality of the validation dataset, and the accuracy of anomaly detection algorithms can be affected by the complexity and variability of LC spectra in certain regions.
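As a minimal illustration of the WMV fusion step mentioned above, the sketch below accumulates per-class weighted votes from several co-registered LC products and takes the class with the highest total weight at each pixel. The product array, the number of classes, and the randomly generated weights stand in for the actual products and the category weights derived from user's accuracy on a validation dataset.

```python
# Minimal sketch of weighted majority voting (WMV) across co-registered LC products.
import numpy as np

def wmv_fuse(products: np.ndarray, weights: np.ndarray, n_classes: int) -> np.ndarray:
    """products: (n_products, n_pixels) class codes; weights: (n_products, n_classes).
    Returns the fused class per pixel by accumulating per-class weighted votes."""
    n_products, n_pixels = products.shape
    votes = np.zeros((n_classes, n_pixels))
    for p in range(n_products):
        cls = products[p]                                     # class chosen by product p
        votes[cls, np.arange(n_pixels)] += weights[p, cls]    # add that product's weight
    return votes.argmax(axis=0)

rng = np.random.default_rng(5)
n_classes = 9
products = rng.integers(0, n_classes, size=(4, 10_000))      # 4 products, 10k pixels
weights = rng.uniform(0.5, 1.0, size=(4, n_classes))         # per-product class weights
fused = wmv_fuse(products, weights, n_classes)
```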

However, the approach has not yet been validated in other regions worldwide. When the ATSC approach is applied to regions with markedly different geography, such as tropical rainforests with climates distinct from China's, its performance may vary. Future research should therefore include case studies under diverse geographic and climatic conditions to evaluate the generalizability and applicability of the ATSC approach.

5. Conclusion

This article proposes a novel approach for automatically collecting training samples for LC classification. The approach initially evaluated the accuracy of multi-source LC products using a unified validation dataset. Category weight values were computed according to the UA, and multiple LC products were fused by using the WMV algorithm. High-confidence regions were then extracted from the fused LC map, and morphological processing was applied to remove pixels in fringe areas. Subsequently, time-series Sentinel-2 images were utilized to extract cloud-free regions. A local adaptive strategy was used to automatically collect high-quality training samples. The validation results demonstrated the high accuracy of the collected preliminary training samples, ranging from 94.24% to 97.01% across the four study areas, with an average accuracy of 95.62%. An ensemble-based anomaly detection algorithm was then utilized to further filter the abnormal samples. Validation using visually interpreted samples showed the effectiveness of different anomaly detection models in identifying abnormal samples from the LC preliminary sample dataset. Machine learning-based anomaly detection algorithms, such as LSCP, were found to better balance the quality and quantity of training samples than traditional statistical approaches. Among them, the LSCP model utilizing time-series spectral features performed the best. The results indicated a further improvement in the accuracy of the final training samples to 97.95% or higher across all regions after removing abnormal samples.

Compared to previous approaches, the ATSC approach offers several advantages. (1) The collection of training samples is fully automatic, eliminating the labor-intensive process of generating sufficient training samples for large-scale LC classification, and it is applicable to regions with different geographical locations and environments as well as to different years. (2) Multiple existing fine spatial resolution LC products are fused, making the collected training samples more suitable for LC classification with high spatial resolution satellite imagery (such as Sentinel-2). (3) The final training sample dataset includes two indicators of sample reliability, the confidence value and the outlier score, which users can combine to further filter training samples across diverse regions and classification algorithms. Consequently, the reliable training samples collected through our approach can be employed for high-quality LC mapping.

Author contributions statement

Yanzhao Wang: conception and design, analysis and interpretation of the data, the drafting of the paper. Yonghua Sun: conception and design, revising it critically for intellectual content, the final approval of the version to be published. Xuyue Cao: conception and design, the drafting of the paper. Yihan Wang: analysis and interpretation of the data, the drafting of the paper. Wangkuan Zhang: analysis and interpretation of the data, the drafting of the paper. Xinglu Cheng: analysis and interpretation of the data, revising it critically for intellectual content. Ruozeng Wang: analysis and interpretation of the data. Jinkun Zong: analysis and interpretation of the data. All authors agree to be accountable for all aspects of the work.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The Dynamic World, ESA WorldCover, and Sentinel-2 L2A data are available in Google Earth Engine Data Catalog (https://developers.google.cn/earth-engine/datasets/catalog). The ESRI Land Cover is available through the Earth Engine Python or JavaScript client libraries (ee.ImageCollection("projects/sat-io/open-datasets/landcover/ESRI_Global-LULC_10m")) and the following website (https://www.arcgis.com/apps/instant/media/index.html?appid=fc92d38533d440078f17678ebc20e8e2). The China Land Cover Dataset is available through the Earth Engine Python or JavaScript client libraries (ee.Image("projects/lulc-datase/assets/LULC_HuangXin/CLCD_v01_2020")) and Zenodo (https://zenodo.org/records/8176941). The SRS_Val dataset is available through Zenodo (https://zenodo.org/records/7846090). All data generated during this study are available from the corresponding author upon request.

Additional information

Funding

This work was supported in part by the Beijing Outstanding Young Scientists Program [BJJWZYJH01201910028032] and the National Key Research and Development Project [2018YFC1508902, 2017YFC0406006, 2017YFC0406004].

References

  • Aggarwal, C. C., and S. Sathe. 2015. “Theoretical Foundations and Algorithms for Outlier Ensembles.” ACM SIGKDD Explorations Newsletter 17 (1): 24–25. https://doi.org/10.1145/2830544.2830549.
  • Baig, M. F., M. R. U. Mustafa, I. Baig, H. B. Takaijudin, and M. T. Zeshan. 2022. “Assessment of Land Use Land Cover Changes and Future Predictions Using CA-ANN Simulation for Selangor, Malaysia.” Water 14 (3): 402. https://doi.org/10.3390/w14030402.
  • Bontemps, S., M. Herold, L. Kooistra, A. Van Groenestijn, A. Hartley, O. Arino, I. Moreau, and P. Defourny. 2012. “Revisiting Land Cover Observation to Address the Needs of the Climate Modeling Community.” Biogeosciences 9 (6): 2145–2157. https://doi.org/10.5194/bg-9-2145-2012.
  • Breunig, M. M., H.-P. Kriegel, R. T. Ng, and J. Sander. 2000. “LOF: Identifying Density-Based Local Outliers.” Proceedings of the 2000 ACM SIGMOD international conference on Management of data, Dallas, Texas, USA, 93–104.
  • Britto, A. S., Jr, R. Sabourin, and L. E. Oliveira. 2014. “Dynamic Selection of Classifiers—A Comprehensive Review.” Pattern Recognition 47 (11): 3665–3680. https://doi.org/10.1016/j.patcog.2014.05.003.
  • Brown, C. F., S. P. Brumby, B. Guzder-Williams, T. Birch, S. B. Hyde, J. Mazzariello, W. Czerwinski, et al. 2022. “Dynamic World, Near Real-Time Global 10 M Land Use Land Cover Mapping.” Scientific Data 9 (1). https://doi.org/10.1038/s41597-022-01307-4.
  • Calderón-Loor, M., M. Hadjikakou, and B. A. Bryan. 2021. “High-Resolution Wall-To-Wall Land-Cover Mapping and Land Change Assessment for Australia from 1985 to 2015.” Remote Sensing of Environment 252:112148. https://doi.org/10.1016/j.rse.2020.112148.
  • Cao, Q., D. Yu, M. Georgescu, Z. Han, and J. Wu. 2015. “Impacts of Land Use and Land Cover Change on Regional Climate: A Case Study in the Agro-Pastoral Transitional Zone of China.” Environmental Research Letters 10 (12): 124025. https://doi.org/10.1088/1748-9326/10/12/124025.
  • Chandola, V., A. Banerjee, and V. Kumar. 2009. “Anomaly Detection: A Survey.” ACM Computing Surveys (CSUR) 41 (3): 1–58. https://doi.org/10.1145/1541880.1541882.
  • Chen, J., S. Li, H. Wu, and X. Chen. 2017. “Towards a Collaborative Global Land Cover Information Service.” International Journal of Digital Earth 10 (4): 356–370. https://doi.org/10.1080/17538947.2016.1267268.
  • Chen, Z., B. Yu, Y. Zhou, H. Liu, C. Yang, K. Shi, and J. Wu. 2019. “Mapping Global Urban Areas from 2000 to 2012 Using Time-Series Nighttime Light Data and MODIS Products.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 12 (4): 1143–1153. https://doi.org/10.1109/JSTARS.2019.2900457.
  • Colditz, R. R., M. Schmidt, C. Conrad, M. C. Hansen, and S. Dech. 2011. “Land Cover Classification with Coarse Spatial Resolution Data to Derive Continuous and Discrete Maps for Complex Regions.” Remote Sensing of Environment 115 (12): 3264–3275. https://doi.org/10.1016/j.rse.2011.07.010.
  • Congalton, R. G., J. Gu, K. Yadav, P. Thenkabail, and M. Ozdogan. 2014. “Global Land Cover Mapping: A Review and Uncertainty Analysis.” Remote Sensing 6 (12): 12070–12093. https://doi.org/10.3390/rs61212070.
  • De Maesschalck, R., D. Jouan-Rimbaud, and D. L. Massart. 2000. “The Mahalanobis Distance.” Chemometrics and Intelligent Laboratory Systems 50 (1): 1–18. https://doi.org/10.1016/S0169-7439(99)00047-7.
  • Diek, S., F. Fornallaz, M. E. Schaepman, and R. de Jong. 2017. “Barest Pixel Composite for Agricultural Areas Using Landsat Time Series.” Remote Sensing 9 (12): 1245. https://doi.org/10.3390/rs9121245.
  • Di Gregorio, A., and L. J. M. Jansen. 2000. Land Cover Classification System (LCCS): Classification Concepts and User Manual. Rome: Food and Agriculture Organization of the United Nations.
  • Drusch, M., U. Del Bello, S. Carlier, O. Colin, V. Fernandez, F. Gascon, B. Hoersch, et al. 2012. “Sentinel-2: ESA’s Optical High-Resolution Mission for GMES Operational Services.” Remote Sensing of Environment 120:25–36. https://doi.org/10.1016/j.rse.2011.11.026.
  • Estabrooks, A., T. Jo, and N. Japkowicz. 2004. “A Multiple Resampling Method for Learning from Imbalanced Data Sets.” Computational Intelligence 20 (1): 18–36. https://doi.org/10.1111/j.0824-7935.2004.t01-1-00228.x.
  • Foody, G. M., and A. Mathur. 2006. “The Use of Small Training Sets Containing Mixed Pixels for Accurate Hard Image Classification: Training on Mixed Spectral Responses for Classification by a SVM.” Remote Sensing of Environment 103 (2): 179–189. https://doi.org/10.1016/j.rse.2006.04.001.
  • Freeman, E. A., G. G. Moisen, and T. S. Frescino. 2012. “Evaluating Effectiveness of Down-Sampling for Stratified Designs and Unbalanced Prevalence in Random Forest Models of Tree Species Distributions in Nevada.” Ecological Modelling 233:1–10. https://doi.org/10.1016/j.ecolmodel.2012.03.007.
  • Friedl, M. A., D. Sulla-Menashe, B. Tan, A. Schneider, N. Ramankutty, A. Sibley, and X. Huang. 2010. “MODIS Collection 5 Global Land Cover: Algorithm Refinements and Characterization of New Datasets.” Remote Sensing of Environment 114 (1): 168–182. https://doi.org/10.1016/j.rse.2009.08.016.
  • Gao, B. C. 1996. “NDWI—A Normalized Difference Water Index for Remote Sensing of Vegetation Liquid Water from Space.” Remote Sensing of Environment 58 (3): 257–266. https://doi.org/10.1016/S0034-4257(96)00067-3.
  • Giri, C., B. Pengra, J. Long, and T. R. Loveland. 2013. “Next Generation of Global Land Cover Characterization, Mapping, and Monitoring.” International Journal of Applied Earth Observation and Geoinformation 25:30–37. https://doi.org/10.1016/j.jag.2013.03.005.
  • Gong, P., J. Wang, L. Yu, Y. Zhao, Y. Zhao, L. Liang, Z. Niu, X. Huang, H. Fu, and S. Liu. 2013. “Finer Resolution Observation and Monitoring of Global Land Cover: First Mapping Results with Landsat TM and ETM+ Data.” International Journal of Remote Sensing 34 (7): 2607–2654. https://doi.org/10.1080/01431161.2012.748992.
  • Han, S., X. Hu, H. Huang, M. Jiang, and Y. Zhao. 2022. “Adbench: Anomaly Detection Benchmark.” Advances in Neural Information Processing Systems 35:32142–32159. https://doi.org/10.2139/ssrn.4266498.
  • Hansen, M. C., and B. Reed. 2000. “A Comparison of the IGBP DISCover and University of Maryland 1 Km Global Land Cover Products.” International Journal of Remote Sensing 21 (6–7): 1365–1373. https://doi.org/10.1080/014311600210218.
  • Hao, X., Y. Qiu, G. Jia, M. Menenti, J. Ma, and Z. Jiang. 2023. “Evaluation of Global Land Use–Land Cover Data Products in Guangxi, China.” Remote Sensing 15 (5): 1291. https://doi.org/10.3390/rs15051291.
  • Hermosilla, T., M. A. Wulder, J. C. White, and N. C. Coops. 2022. “Land Cover Classification in an Era of Big and Open Data: Optimizing Localized Implementation and Training Data Selection to Improve Mapping Outcomes.” Remote Sensing of Environment 268:112780. https://doi.org/10.1016/j.rse.2021.112780.
  • Hu, Y., Y. Dong, and Batunacun. 2018. “An Automatic Approach for Land-Change Detection and Land Updates Based on Integrated NDVI Timing Analysis and the CVAPS Method with GEE Support.” ISPRS Journal of Photogrammetry and Remote Sensing 146:347–359. https://doi.org/10.1016/j.isprsjprs.2018.10.008.
  • Huete, A. R., H. Q. Liu, K. Batchily, and W. van Leeuwen. 1997. “A Comparison of Vegetation Indices Over a Global Set of TM Images for EOS-MODIS.” Remote Sensing of Environment 59 (3): 440–451. https://doi.org/10.1016/S0034-4257(96)00112-5.
  • Jin, Q., E. Xu, and X. Zhang. 2022. “A Fusion Method for Multi-Source Land Cover Products Based on Superpixels and Statistical Extraction for Enhancing Resolution and Improving Accuracy.” Remote Sensing 14 (7): 1676. https://doi.org/10.3390/rs14071676.
  • Kang, J., X. Yang, Z. Wang, H. Cheng, J. Wang, H. Tang, Y. Li, Z. Bian, and Z. Bai. 2022. “Comparison of Three Ten-Meter Land Cover Products in a Drought Region: A Case Study in Northwestern China.” Land 11 (3): 427. https://doi.org/10.3390/land11030427.
  • Karra, K., C. Kontgis, Z. Statman-Weil, J. C. Mazzariello, M. Mathis, and S. P. Brumby. 2021. “Global Land Use/Land Cover With Sentinel 2 and Deep Learning.” In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 4704–4707.
  • Kayet, N., K. Pathak, A. Chakrabarty, and S. Sahoo. 2016. “Spatial Impact of Land Use/Land Cover Change on Surface Temperature Distribution in Saranda Forest, Jharkhand.” Modeling Earth Systems and Environment 2 (3): 1–10. https://doi.org/10.1007/s40808-016-0159-x.
  • Kim, H., H. Kim, H. Moon, and H. Ahn. 2011. “A Weight-Adjusted Voting Algorithm for Ensembles of Classifiers.” Journal of the Korean Statistical Society 40 (4): 437–449. https://doi.org/10.1016/j.jkss.2011.03.002.
  • Kim, D., S. Lee, and J. Lee. 2020. “An Ensemble-Based Approach to Anomaly Detection in Marine Engine Sensor Streams for Efficient Condition Monitoring and Analysis.” Sensors 20 (24): 7285. https://doi.org/10.3390/s20247285.
  • Kinoshita, T., K. Iwao, and Y. Yamagata. 2014. “Creation of a Global Land Cover and a Probability Map Through a New Map Integration Method.” International Journal of Applied Earth Observation and Geoinformation 28:70–77. https://doi.org/10.1016/j.jag.2013.10.006.
  • Li, M., B. Chen, C. Webster, P. Gong, and B. Xu. 2022. “The Land-Sea Interface Mapping: China’s Coastal Land Covers at 10 M for 2020.” Science Bulletin 67 (17): 1750–1754. https://doi.org/10.1016/j.scib.2022.07.012.
  • Li, J., J. Li, C. Wang, F. J. Verbeek, T. Schultz, and H. Liu. 2023. “Outlier Detection Using Iterative Adaptive Mini-Minimum Spanning Tree Generation with Applications on Medical Data.” Frontiers in Physiology 14. https://doi.org/10.3389/fphys.2023.1233341.
  • Ling, C. X., and C. Li. 1998. “Data Mining for Direct Marketing: Problems and Solutions.” Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, New York, USA, 73–79.
  • Liu, X., X. Li, and X. Zhang. 2010. “Determining Class Proportions within a Pixel Using a New Mixed-Label Analysis Method.” IEEE Transactions on Geoscience and Remote Sensing 48 (4): 1882–1891. https://doi.org/10.1109/TGRS.2009.2033178.
  • Liu, L., X. Zhang, Y. Gao, X. Chen, X. Shuai, and J. Mi. 2021. “Finer-Resolution Mapping of Global Land Cover: Recent Developments, Consistency Analysis, and Prospects.” Journal of Remote Sensing 2021. https://doi.org/10.34133/2021/5289697.
  • Liu, L., T. Zhao, and X. Zhang. 2023. “A Novel Stratified Random Sampling Global Validation Dataset in 2020—Srs_val Dataset.” https://doi.org/10.5281/zenodo.7724983.
  • Li, Y., Q. Zhang, X. Liu, Z. Tan, and J. Yao. 2019. “The Role of a Seasonal Lake Groups in the Complex Poyang Lake-Floodplain System (China): Insights into Hydrological Behaviors.” Journal of Hydrology 578:124055. https://doi.org/10.1016/j.jhydrol.2019.124055.
  • Loveland, T. R., and A. S. Belward. 1997. “The IGBP-DIS Global 1km Land Cover Data Set, DISCover: First Results.” International Journal of Remote Sensing 18 (15): 3289–3295. https://doi.org/10.1080/014311697217099.
  • Loveland, T. R., B. C. Reed, J. F. Brown, D. O. Ohlen, Z. Zhu, L. Yang, and J. W. Merchant. 2000. “Development of a Global Land Cover Characteristics Database and IGBP DISCover from 1 Km AVHRR Data.” International Journal of Remote Sensing 21 (6–7): 1303–1330. https://doi.org/10.1080/014311600210191.
  • Ma, Y., J. Niu, B. Xu, X. Song, W. Huang, and G. Sun. “Identification Method for Users-Transformer Relationship in Station Area Based on Local Selective Combination in Parallel Outlier Ensembles Algorithm.”
  • Mellor, A., S. Boukir, A. Haywood, and S. Jones. 2015. “Exploring Issues of Training Data Imbalance and Mislabelling on Random Forest Performance for Large Area Land Cover Classification Using the Ensemble Margin.” ISPRS Journal of Photogrammetry and Remote Sensing 105:155–168. https://doi.org/10.1016/j.isprsjprs.2015.03.014.
  • Nassif, A. B., M. A. Talib, Q. Nasir, and F. M. Dakalbab. 2021. “Machine Learning for Anomaly Detection: A Systematic Review.” IEEE Access 9:78658–78700. https://doi.org/10.1109/ACCESS.2021.3083060.
  • Ouyang, B., Y. Song, Y. Li, G. Sant, and M. Bauchy. 2021. “EBOD: An Ensemble-Based Outlier Detection Algorithm for Noisy Datasets.” Knowledge-Based Systems 231:107400. https://doi.org/10.1016/j.knosys.2021.107400.
  • Pelletier, C., S. Valero, J. Inglada, N. Champion, C. Marais Sicre, and G. Dedieu. 2017. “Effect of Training Class Label Noise on Classification Performances for Land Cover Mapping with Satellite Image Time Series.” Remote Sensing 9 (2): 173. https://doi.org/10.3390/rs9020173.
  • Pérez-Hoyos, A., F. J. García-Haro, and J. San-Miguel-Ayanz. 2012. “A Methodology to Generate a Synergetic Land-Cover Map by Fusion of Different Land-Cover Products.” International Journal of Applied Earth Observation and Geoinformation 19:72–87. https://doi.org/10.1016/j.jag.2012.04.011.
  • Phiri, D., M. Simwanda, S. Salekin, V. R. Nyirenda, Y. Murayama, and M. Ranagalage. 2020. “Sentinel-2 Data for Land Cover/Use Mapping: A Review.” Remote Sensing 12 (14): 2291. https://doi.org/10.3390/rs12142291.
  • Proverbio, M., N. J. Bertola, and I. F. C. Smith. 2018. “Outlier-Detection Methodology for Structural Identification Using Sparse Static Measurements.” Sensors 18 (6): 1702. https://doi.org/10.3390/s18061702.
  • Radoux, J., C. Lamarche, E. Van Bogaert, S. Bontemps, C. Brockmann, and P. Defourny. 2014. “Automated Training Sample Extraction for Global Land Cover Mapping.” Remote Sensing 6 (5): 3965–3987. https://doi.org/10.3390/rs6053965.
  • Ran, Y., X. Li, L. Lu, and Z. Li. 2012. “Large-Scale Land Cover Mapping with the Integration of Multi-Source Information Based on the Dempster–Shafer Theory.” International Journal of Geographical Information Science 26 (1): 169–191. https://doi.org/10.1080/13658816.2011.577745.
  • Ren, K., H. Yang, Y. Zhao, W. Chen, M. Xue, H. Miao, S. Huang, and J. Liu. 2019. “A Robust AUC Maximization Framework with Simultaneous Outlier Detection and Feature Selection for Positive-Unlabeled Classification.” IEEE Transactions on Neural Networks and Learning Systems 30 (10): 3072–3083. https://doi.org/10.1109/TNNLS.2018.2870666.
  • Rodriguez-Galiano, V. F., B. Ghimire, J. Rogan, M. Chica-Olmo, and J. P. Rigol-Sanchez. 2012. “An Assessment of the Effectiveness of a Random Forest Classifier for Land-Cover Classification.” ISPRS Journal of Photogrammetry and Remote Sensing 67:93–104. https://doi.org/10.1016/j.isprsjprs.2011.11.002.
  • Schepaschenko, D., I. McCallum, A. Shvidenko, S. Fritz, F. Kraxner, and M. Obersteiner. 2011. “A New Hybrid Land Cover Dataset for Russia: A Methodology for Integrating Statistics, Remote Sensing and in situ Information.” Journal of Land Use Science 6 (4): 245–259. https://doi.org/10.1080/1747423X.2010.511681.
  • Schubert, E., R. Wojdanowski, A. Zimek, and H.-P. Kriegel. 2012. “On Evaluation of Outlier Rankings and Outlier Scores.” Proceedings of the 2012 SIAM International Conference on Data Mining (SDM), Anaheim, California, USA, 1047–1058.
  • Shirahata, L. M., K. Iizuka, A. Yusupujiang, F. R. Rinawan, R. Bhattarai, and X. Dong. 2017. “Production of Global Land Cover Data–GLCNMO2013.” Journal of Geography & Geology 9. https://doi.org/10.5539/jgg.v9n3p1.
  • Tucker, C. J. 1979. “Red and Photographic Infrared Linear Combinations for Monitoring Vegetation.” Remote Sensing of Environment 8 (2): 127–150. https://doi.org/10.1016/0034-4257(79)90013-0.
  • Venter, Z. S., D. N. Barton, T. Chakraborty, T. Simensen, and G. Singh. 2022. “Global 10 M Land Use Land Cover Datasets: A Comparison of Dynamic World, World Cover and Esri Land Cover.” Remote Sensing 14 (16): 4101. https://doi.org/10.3390/rs14164101.
  • Wang, Z., and G. Mountrakis. 2023. “Accuracy Assessment of Eleven Medium Resolution Global and Regional Land Cover Land Use Products: A Case Study Over the Conterminous United States.” Remote Sensing 15 (12): 3186. https://doi.org/10.3390/rs15123186.
  • Wang, J., X. Yang, Z. Wang, H. Cheng, J. Kang, H. Tang, Y. Li, Z. Bian, and Z. Bai. 2022. “Consistency Analysis and Accuracy Assessment of Three Global Ten-Meter Land Cover Products in Rocky Desertification Region—A Case Study of Southwest China.” ISPRS International Journal of Geo-Information 11 (3): 202. https://doi.org/10.3390/ijgi11030202.
  • Wang, H., H. Yan, Y. Hu, Y. Xi, and Y. Yang. 2022. “Consistency and Accuracy of Four High-Resolution LULC Datasets—Indochina Peninsula Case Study.” Land 11 (5): 758. https://doi.org/10.3390/land11050758.
  • Wen, Y., X. Li, H. Mu, L. Zhong, H. Chen, Y. Zeng, S. Miao, et al. 2022. “Mapping Corn Dynamics Using Limited but Representative Samples with Adaptive Strategies.” ISPRS Journal of Photogrammetry and Remote Sensing 190:252–266. https://doi.org/10.1016/j.isprsjprs.2022.06.012.
  • Yang, J., and X. Huang. 2021. “The 30 M Annual Land Cover Dataset and Its Dynamics in China from 1990 to 2019.” Earth System Science Data 13 (8): 3907–3925. https://doi.org/10.5194/essd-13-3907-2021.
  • Yang, H., S. Li, J. Chen, X. Zhang, and S. Xu. 2017. “The Standardization and Harmonization of Land Cover Classification Systems Towards Harmonized Datasets: A Review.” ISPRS International Journal of Geo-Information 6 (5): 154. https://doi.org/10.3390/ijgi6050154.
  • Yin, Y., R. Xia, Y. Chen, R. Jia, N. Zhong, C. Yan, Q. Hu, X. Li, and H. Zhang. 2023. “Non-Steady State Fluctuations in Water Levels Exacerbate Long-Term and Seasonal Degradation of Water Quality in River-Connected Lakes.” Water Research 242:120247. https://doi.org/10.1016/j.watres.2023.120247.
  • Zanaga, D., R. Van De Kerchove, W. De Keersmaecker, N. Souverijns, C. Brockmann, R. Quast, J. Wevers, et al. 2021. “ESA WorldCover 10 m 2020 v100.” https://doi.org/10.5281/zenodo.5571936.
  • Zha, Y., J. Gao, and S. Ni. 2003. “Use of Normalized Difference Built-Up Index in Automatically Mapping Urban Areas from TM Imagery.” International Journal of Remote Sensing 24 (3): 583–594. https://doi.org/10.1080/01431160304987.
  • Zhang, J., Z. Li, K. Nai, Y. Gu, and A. Sallam. 2019. “DELR: A Double-Level Ensemble Learning Method for Unsupervised Anomaly Detection.” Knowledge-Based Systems 181:104783. https://doi.org/10.1016/j.knosys.2019.05.026.
  • Zhang, X., L. Liu, X. Chen, Y. Gao, S. Xie, and J. Mi. 2021. “GLC_FCS30: Global Land-Cover Product with Fine Classification System at 30m Using Time-Series Landsat Imagery.” Earth System Science Data 13 (6): 2753–2776. https://doi.org/10.5194/essd-13-2753-2021.
  • Zhang, H. K., and D. P. Roy. 2017. “Using the 500 M MODIS Land Cover Product to Derive a Consistent Continental Scale 30 M Landsat Land Cover Classification.” Remote Sensing of Environment 197:15–34. https://doi.org/10.1016/j.rse.2017.05.024.
  • Zhang, C., H. Zhang, and S. Tian. 2023. “Phenology-Assisted Supervised Paddy Rice Mapping with the Landsat Imagery on Google Earth Engine: Experiments in Heilongjiang Province of China from 1990 to 2020.” Computers and Electronics in Agriculture 212:108105. https://doi.org/10.1016/j.compag.2023.108105.
  • Zhao, Y., Z. Nasrullah, M. K. Hryniewicki, and Z. Li. 2019. “LSCP: Locally Selective Combination in Parallel Outlier Ensembles.” Proceedings of the 2019 SIAM International Conference on Data Mining (SDM), Calgary, Alberta, Canada, 585–593.
  • Zhao, T., X. Zhang, Y. Gao, J. Mi, W. Liu, J. Wang, M. Jiang, and L. Liu. 2023. “Assessing the Accuracy and Consistency of Six Fine-Resolution Global Land Cover Products Using a Novel Stratified Random Sampling Validation Dataset.” Remote Sensing 15 (9): 2285. https://doi.org/10.3390/rs15092285.
  • Zhu, L., G. Jin, X. Zhang, R. Shi, Y. La, and C. Li. 2021. “Integrating Global Land Cover Products to Refine GlobeLand30 Forest Types: A Case Study of Conterminous United States (CONUS).” International Journal of Remote Sensing 42 (6): 2105–2130. https://doi.org/10.1080/01431161.2020.1851797.
  • Zimek, A., R. J. Campello, and J. Sander. 2014. “Ensembles for Unsupervised Outlier Detection: Challenges and Research Questions a Position Paper.” ACM SIGKDD Explorations Newsletter 15 (1): 11–22. https://doi.org/10.1145/2594473.2594476.
  • Zuo, Z., Z. Li, P. Cheng, and J. Zhao. 2023. “A Novel Subspace Outlier Detection Method by Entropy-Based Clustering Algorithm.” Scientific Reports 13 (1). https://doi.org/10.1038/s41598-023-42261-4.