Canadian Journal of Remote Sensing
Journal canadien de télédétection
Volume 49, 2023 - Issue 1
Research Article

Object-Based Image Analysis (OBIA) and Machine Learning (ML) Applied to Tropical Forest Mapping Using Sentinel-2

L’analyse d’images basée sur des objets (OBIA) et l’apprentissage automatique (ML) appliqués à la cartographie des forêts tropicales à l’aide de Sentinel-2

Article: 2259504 | Received 20 Mar 2023, Accepted 10 Sep 2023, Published online: 16 Oct 2023

Abstract

The purpose of this research was to distinguish and estimate natural forest areas in Paraná, Brazil. Forest plantations (silviculture) and natural forests have high vegetative vigor throughout the year, as do agricultural areas during the cropping periods, which can cause classification errors among these Land Use and Land Cover (LULC) classes. These classes have similar spectral signatures but distinct textures, so they can be separated in the supervised classification process by combining object-based and pixel-based classification approaches. Thus, image segmentation through Object-Based Image Analysis (OBIA), combined with Machine Learning (ML), made forest mapping possible over a large territorial extent. The Google Earth Engine (GEE) platform was used to calculate vegetation indices (VIs) and Spectral Mixture Analysis (SMA) spectral fractions from Sentinel-2 images, and to create spectrally homogeneous regions for supervised classification by phytoecological region and mesoregion. The mapping achieved a Kappa Index (KI) of 0.94 and an Overall Accuracy (OA) of 96%, which indicates high performance in large-scale forest mapping. The proposed dataset, source code, and trained models are available on GitHub (https://github.com/Cechim/simepar-brazil/), creating opportunities for further advances in the field.

Résumé

Le but de cette recherche était de distinguer et d’estimer les superficies forestières indigènes du Paraná, au Brésil. Les plantations forestières (sylviculture) et les forêts naturelles ont une vigueur végétative annuelle élevée de même que les zones agricoles pendant les périodes de récoltes agricoles, ce qui peut entraîner des erreurs de classification entre ces trois classes d’utilisation des terres (LULC). Ces classes ont des signatures spectrales similaires, mais ont une texture distincte qui peut être séparée dans le processus de classification supervisée en combinant les méthodes de classification par objet et par pixel. Ainsi, les techniques de segmentation d’images par l’analyse basée sur les objets (OBIA) et l’apprentissage automatique (ML) ont rendu possible la cartographie forestière sur un grand territoire. La plate-forme Google Earth Engine (GEE) a été utilisée pour calculer les indices de végétation (VIs), pour analyser les mélanges spectraux (SMA) d’images Sentinel-2, et pour créer des régions spectralement homogènes lors de la classification supervisée des régions phytoécologiques et des mésorégions. Les classifications résultantes ont obtenu un indice Kappa (KI) global de 0,94 et une précision globale (OA) de 96%, ce qui indique une performance élevée dans la cartographie forestière à grande échelle. L’ensemble de données, les codes sources et les modèles entraînés sont disponibles sur GitHub (https://github.com/Cechim/simepar-brazil/), ce qui ouvre la voie à de nouvelles avancées dans ce domaine.

1. Introduction

The global forest area is 4.06 billion hectares (ha), which represents 31% of the planet’s total land area. More than half of the world’s forests (54%) are concentrated in five countries: Russia (20%), Brazil (12%), Canada (9%), the USA (8%), and China (5%) (FAO 2020). The mapping of natural forests plays a strategic role in forest management and policies, conservation, and environmental licensing. Estimates of natural forest area obtained by remote sensing depend on the method and on the spatial resolution of the images.

Recent satellite series carry sensors with high spatial and temporal resolution, the latter being the capability of revisiting the same area (Segarra et al. 2020). Furthermore, the adoption of new technologies and the emergence of several alternative image providers, such as Planet (nanosatellites), AIRBUS, Digital Globe, and ESA (European Space Agency), have reduced the image cost per km². These providers offer several operational satellite alternatives, such as Pleiades A, Pleiades B, SPOT 7, SPOT 6, SPOT 5, WorldView 2, WorldView 3, and Sentinel-2. The Sentinel-2 images are produced by passive optical sensors, and the temporal resolution is generally weekly due to the use of satellite constellations (Vrdoljak and Kilić Pamuković 2022).

The Sentinel-2A and Sentinel-2B satellites generate multispectral images (bands 2 to 4 and 8) with a 10-meter spatial resolution and a 5-day temporal resolution, with the advantage that these images have no acquisition cost for the user. Due to their multispectral bands, Sentinel-2 images are used for forest type classification (Chen et al. 2018), biomass estimation (Duan et al. 2019), urban forest spatial distribution (Eskandari et al. 2020), forest removal (Pałaś and Zawadzki 2020), mapping of land cover and land use (Zeng et al. 2020), and forest and mangrove mapping (Cissell et al. 2021).

Therefore, the temporal and spectral resolutions of Sentinel-2 images have great potential for projects based on mapping, area estimation, and the identification of changes in LULC. There are also various initiatives and projects involving public and private institutions focused on the development of LULC maps both in Brazil and in Paraná state, the focus of this study.

This large database can be processed for vast territorial extensions using the near-real-time processing of the GEE cloud platform, whose computing capabilities can be applied to several high-impact social issues, including deforestation, drought, catastrophes, diseases, food security, water management, climate monitoring, global water surface changes (Pekel et al. 2016), and environmental protection (Gorelick et al. 2017). In the study of forests, this platform has been used for the analysis of forest cover and forest loss (Hansen et al. 2013), estimation of crop harvest (Lobell et al. 2015), generation of land cover products (Sousa et al. 2020; Zeng et al. 2020), coniferous forest classification (Kaplan 2021), biomass estimation of forest plantations (Theofanous et al. 2021), mangrove mapping (Mondal et al. 2019), forest cover mapping (Ganz et al. 2020), forest estimation and detection of forest change (Zulfiqar et al. 2021), and analysis of forest species distribution (Xie et al. 2021).

The GEE platform provides ML algorithms, and many works indicate their efficiency in different mapping and supervised classification applications. Some examples are shoreline mapping to differentiate substrate types (Banks et al. 2017), estimation of terrestrial latent heat flux (Wang et al. 2017), mapping of land cover dynamics (Huang et al. 2017), land use classification (Hird et al. 2017), mapping of wetlands (Brovelli et al. 2020), and forest mapping and monitoring (Waśniewski et al. 2020).

Therefore, the mapping of natural forests with a new methodological approach employing multispectral images from the Sentinel-2 satellite series and ML in GEE will contribute to the enforcement and oversight of public policies in any process of environmental licensing or authorization for forest suppression. This is especially important to support the monitoring of natural forest resources and of large-scale deforestation. The aim of this work was the development of a methodological approach for the mapping of natural forests.

Thus, the main objective of this work was to map and estimate the area of natural forest formations of the Atlantic Forest biome in Paraná state, Southern Brazil, using an OBIA process with ML supervised classification of Sentinel-2 images, implemented in the GEE computing environment, with spectral fractions and VIs applied to each phytoecological region of the biome.

1.1. Organization of research steps

The article is organized in the following steps: (1) Introduction; (2) Definition of the study area (southern Brazil), data selection, and satellite image acquisition; (3) Research methods (methodological procedures): OBIA, including extraction of the Grey Level Co-occurrence Matrix (GLCM) texture index, image segmentation, and topology correction; supervised classification in GEE, including asset importation from OBIA, selection of dates with low cloud incidence by mesoregion, cloud mask application, spectral fraction acquisition and calculation of VIs, independent sample selection by phytoecological region, supervised classification by ML, and spatial filter application; validation methodology, spatial accuracy evaluation, and natural forest area estimation; (4) Results and discussion; and (5) Conclusions.

2. Materials

2.1. Study area

The state of Paraná is located in Southern Brazil, between the parallels 22°29’S and 26°43’S latitude, and between the meridians 48°2’W and 54°38’W longitude (Figure 1). Paraná has an annual average precipitation between 1100 mm and 1920 mm, an average temperature between 15 °C and 24 °C (Aparecido et al. 2016), and an annual average evapotranspiration of 700 mm to 1600 mm (Caviglione et al. 2000).

Figure 1. Paraná phytoecological regions: Dense Ombrophilous Forest, Mixed Ombrophilous Forest, Semideciduous Seasonal Forest, Pioneer Vegetal Formation, Savanna (Cerrado), Steppe (Campos Sulinos), Water and Areas of Ecotone (Contact zones).


The state’s climatic characteristics vary regionally, yet the Köppen and Geiger (1928) climate classification system establishes Cfa (warm temperate with hot summer) and Cfb (humid temperate with moderately hot summer) as the predominant classes. However, Cwa (humid temperate with dry winter) and Aw (humid tropical savanna) are also present in the northern portion of the state (Aparecido et al. 2016).

As a result of the interaction between biotic (vegetation and animals) and abiotic (climate, rock, topography, and soil) components, the state has a great diversity of landscapes and numerous types of vegetation. The phytoecological regions are spaces defined by typical floristic genera and characteristic biological forms that recur within the same climate, occurring on land of varied lithology but with defined topography (IBGE 2012) (Figure 1).

2.2. Data selection

Sentinel-2 satellite images from June to December 2016 were used for natural forest classification. The selection criteria were periods without predominant agricultural crops, such as soybean and corn (summer) and wheat (winter), and low cloudiness at the time of image acquisition (Table 1).

Table 1. Characteristics of Sentinel-2 MSI bands used for OBIA.

3. Methods

3.1. Methodological procedures

The research method was divided into the steps shown in Figure 2.

Figure 2. Schematic overview of natural forest mapping methodological procedures.


3.2. Object-based image analysis (OBIA)

The supervised classification used samples acquired from different types of Sentinel-2 color composites to discriminate forest types, separating natural (native) forest formations from forest plantations (silviculture). For the natural forest class, 1,885 samples distributed throughout the state of Paraná were selected. Segmentation was performed with OBIA in the eCognition software. Different parameters were tested for the “Multiresolution Segmentation” algorithm, and the base parameters “Scale”, “Shape”, and “Compactness” were set to 150, 0.1, and 0.5, respectively.

To assist in the classification process, different official reference cartographic bases from the state of Paraná were used as extra information. The Normalized Difference Vegetation Index (NDVI), the Normalized Difference Water Index (NDWI), and the Haralick texture index (GLCM Homogeneity) (Haralick et al. 1973; Haralick 1979; Conners and Harlow 1980) were generated and used as additional bands.
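As a rough illustration of the texture band, GLCM Homogeneity can be sketched in plain Python. The one-pixel horizontal offset, the symmetric matrix, and the function name are assumptions made for this example; the settings used in eCognition are not stated in the text.

```python
from collections import Counter

def glcm_homogeneity(image, offset=(0, 1)):
    """Homogeneity of a symmetric, normalised grey-level co-occurrence matrix.

    image  : list of rows of integer grey levels
    offset : (row, col) displacement between paired pixels
             (assumed here: the right-hand neighbour)
    """
    d_row, d_col = offset
    pairs = Counter()
    for r, row in enumerate(image):
        for c, level in enumerate(row):
            rr, cc = r + d_row, c + d_col
            if 0 <= rr < len(image) and 0 <= cc < len(image[rr]):
                neighbour = image[rr][cc]
                # Count the pair in both directions so the matrix is symmetric.
                pairs[(level, neighbour)] += 1
                pairs[(neighbour, level)] += 1
    total = sum(pairs.values())
    # Homogeneity = sum of P(i, j) / (1 + (i - j)^2)
    return sum(n / total / (1 + (i - j) ** 2) for (i, j), n in pairs.items())

uniform = [[3, 3], [3, 3]]        # no grey-level variation
checkerboard = [[0, 1], [1, 0]]   # every neighbour differs by one level
print(glcm_homogeneity(uniform), glcm_homogeneity(checkerboard))
```

Values close to 1 indicate smooth texture, and lower values indicate stronger local variation, which is the property used to help separate classes with similar spectral signatures.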

The classification process is iterative. Some land use classes were segmented with different parameterizations, and the segmentation was assessed both qualitatively, by visual interpretation, and quantitatively, using reference data (Costa et al. 2018) and measures that report the overall accuracy of segmentation (Clinton et al. 2010). Consequently, classes were reclassified more than once until the desired classification performance was achieved.

We used the Nearest Neighbor (NN) classification algorithm from eCognition Developer, a supervised classification method derived from statistical learning theory (Cover and Hart 1967). The result was submitted to class editing, a post-classification process to correct possible inconsistencies in the classification, which are mainly related to wrongly labeled classes.
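The nearest neighbor rule itself can be illustrated with a minimal sketch. The two features (mean NDVI and GLCM homogeneity per object) and all values below are invented for the example; this is not the eCognition implementation, which works on its own feature space.

```python
import math

def nearest_neighbor_label(sample, training):
    """Return the class of the training object closest to `sample`.

    training : list of (feature_vector, class_label) pairs
    Distance : Euclidean, an assumption for this sketch
    """
    _, label = min(training, key=lambda item: math.dist(sample, item[0]))
    return label

# Hypothetical training objects: (mean NDVI, GLCM homogeneity) -> class
training_objects = [
    ((0.85, 0.30), "natural forest"),
    ((0.80, 0.70), "silviculture"),
    ((0.25, 0.50), "agriculture"),
]

print(nearest_neighbor_label((0.82, 0.35), training_objects))
```

In this toy feature space the two forest classes have almost the same NDVI, so the texture feature is what decides the label.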

3.3. Topological analysis

After the OBIA classification, topological analyses were applied to all classes using ArcGIS 10.8.1. The “Intersect” tool was used to identify overlapping areas or edges between classes, and “Symmetrical Difference” and “Erase” were used to identify areas not classified by the NN algorithm. The “Dissolve” function was applied to the Forest class, and the “Check Geometry” and “Repair Geometry” functions were applied to correct inconsistencies in the geometries or the attribute table. After these topology analyses, the “Explode” tool was applied to break the forest class into individual polygons, which were then cut and separated by mesoregion and imported into GEE.

3.4. Supervised classification in GEE

The OBIA supervised classification was imported into GEE as an Asset. In the ML supervised classification process, the stable samples from MapBiomas for the year 2016 were used as reference. Stable samples are training and calibration samples extracted from classes that did not change their values during all years of collection 6.0 (1985 to 2020). The digital classification was performed for each phytoecological region contained in each mesoregion using the Random Forest (RF) algorithm (Breiman 2001) available in GEE, run with 70 iterations (trees). This classifier is less sensitive to the quality of training samples and to overfitting because of the large number of decision trees produced from randomly selected subsets of the training samples (Belgiu and Drăguţ 2016).
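The idea behind the ensemble, many simple trees trained on random subsets of the samples and combined by majority vote, can be sketched in plain Python. This is a simplified stand-in (bagged one-split "stumps") rather than the GEE Random Forest; the feature values, class names, and function names are invented for illustration.

```python
import random
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels, rng, variables_per_split=1):
    """One-split tree on a random subset of the features."""
    features = rng.sample(range(len(rows[0])), variables_per_split)
    best = None
    for f in features:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            left_label = majority(left)
            right_label = majority(right) if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]

def fit_forest(rows, labels, n_trees=70, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], rng))
    return forest

def predict(forest, row):
    """Majority vote over all stumps."""
    return majority([left if row[f] <= t else right
                     for f, t, left, right in forest])

# Hypothetical samples: (NDVI, texture contrast) -> class
rows = [(0.85, 0.30), (0.80, 0.35), (0.90, 0.25), (0.82, 0.28),
        (0.30, 0.70), (0.25, 0.80), (0.35, 0.75), (0.20, 0.65)]
labels = ["forest"] * 4 + ["non-forest"] * 4
forest = fit_forest(rows, labels, n_trees=70)
print(predict(forest, (0.95, 0.20)), predict(forest, (0.10, 0.90)))
```

Because each stump sees a different bootstrap sample, an unusual or mislabeled training sample influences only some of the votes, which is the intuition behind the lower sensitivity to sample quality mentioned above.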

A total of 23 variables, comprising Sentinel-2 visible and infrared spectral bands at 10 m spatial resolution, Spectral Mixture Analysis (SMA) spectral fractions, and Vegetation Indices (VIs), were used in the classification process (Table 2). The SMA was generated from the calculation of the Green Vegetation (GV), Non-Photosynthetic Vegetation (NPV), Soil, and Shade fractions implemented in GEE. SMA is a physically based form of image processing that aids in the repeated and accurate derivation of quantitative sub-pixel information (Smith et al. 1990). Some studies have already used SMA for the estimation and mapping of agricultural crop residues (Bannari et al. 2006; Pacheco and McNairn 2010).

Table 2. Sentinel-2 spectral bands and vegetation indices used in the classification.

SMA works under the assumption that the spectrum measured by a sensor is a linear combination of the spectra of all components within the pixel, weighted by the proportions of the endmembers, which reflect the proportions of the area occupied by the defined features on the Earth’s surface (Adams et al. 1995; Lu et al. 2004).
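The linear mixing assumption is commonly written as follows. The notation is introduced here for illustration only, and the sum-to-one and non-negativity constraints are the usual ones for this model rather than details confirmed by the text.

```latex
R_b = \sum_{i=1}^{n} F_i \, R_{i,b} + \varepsilon_b ,
\qquad \sum_{i=1}^{n} F_i = 1 ,
\qquad F_i \ge 0
```

Here $R_b$ is the pixel reflectance in band $b$, $R_{i,b}$ is the reflectance of endmember $i$ in band $b$, $F_i$ is the fraction of the pixel occupied by endmember $i$, and $\varepsilon_b$ is the residual. With the four endmembers used in this work (GV, NPV, Soil, and Shade), $n = 4$.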

The VIs were calculated in GEE and based on the median (Ganz et al. 2020) (Table 2).
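A minimal sketch of a median-based index calculation for a single pixel is given below. It assumes the indices are computed from a per-band median composite, uses Sentinel-2 band names B3 (green), B4 (red), and B8 (near infrared), and takes NDWI in its green/NIR form; the reflectance values are invented, and the exact formulation used in the study may differ.

```python
from statistics import median

def median_composite(series):
    """Per-band median over a time series of observations of one pixel.

    series : list of dicts mapping band name -> surface reflectance
    """
    return {band: median(obs[band] for obs in series) for band in series[0]}

def normalized_difference(a, b):
    """(a - b) / (a + b), returning 0.0 when the denominator is zero."""
    return (a - b) / (a + b) if (a + b) != 0 else 0.0

def ndvi(pixel):
    return normalized_difference(pixel["B8"], pixel["B4"])

def ndwi(pixel):
    return normalized_difference(pixel["B3"], pixel["B8"])

# Hypothetical cloud-free observations of one forest pixel
observations = [
    {"B3": 0.08, "B4": 0.10, "B8": 0.50},
    {"B3": 0.06, "B4": 0.05, "B8": 0.40},
    {"B3": 0.10, "B4": 0.30, "B8": 0.60},
]
composite = median_composite(observations)
print(composite, ndvi(composite), ndwi(composite))
```

Using the median rather than the mean limits the effect of residual clouds or shadows that survive the cloud mask in individual dates.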

To improve accuracy, a new RF classification was performed with the following parameters: the number of trees in the random forest classifier was varied from 50 to 100 and the number of variables per split was varied from 1, and the classifier with 70 trees was adopted for the Sentinel-2 data resolution. The sample set consisted of the stable samples from the MapBiomas project for the year 2016 (Souza et al. 2020).

After the supervised classification process, a spatial filter created with the “connectedPixelCount” function was applied to avoid unwanted changes to the edges of pixel groups. This function, available in the GEE platform, locates connected components (neighbors) that share the same pixel value. Thus, only pixels that do not share connections with a predefined number of identical neighbors are considered isolated. In this filter, at least six connected pixels are required to reach the minimum connection value. Consequently, the minimum mapping unit is directly determined by the applied spatial filter, and it was defined as 6 pixels, reported as approximately 0.5 ha following Souza et al. (2020). Note that six 10 m pixels cover 0.06 ha, whereas six 30 m pixels cover 0.54 ha.
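The logic of this filter can be sketched without GEE as a connected-component count on a small classified grid. The eight-neighbor connectivity and the function names are assumptions for this example, and the step that reassigns the isolated pixels is left out because the text does not describe it.

```python
from collections import deque

def connected_pixel_count(grid, eight_connected=True):
    """For each pixel, the size of its patch of connected equal-valued pixels."""
    rows, cols = len(grid), len(grid[0])
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if eight_connected:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    counts = [[0] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if counts[r0][c0]:
                continue  # already labelled as part of an earlier patch
            value = grid[r0][c0]
            patch, seen = [(r0, c0)], {(r0, c0)}
            queue = deque(patch)
            while queue:  # breadth-first flood fill
                r, c = queue.popleft()
                for d_r, d_c in steps:
                    nr, nc = r + d_r, c + d_c
                    if (0 <= nr < rows and 0 <= nc < cols
                            and (nr, nc) not in seen
                            and grid[nr][nc] == value):
                        seen.add((nr, nc))
                        patch.append((nr, nc))
                        queue.append((nr, nc))
            for r, c in patch:
                counts[r][c] = len(patch)
    return counts

def isolated_mask(grid, min_connected=6):
    """True where a pixel belongs to a patch smaller than `min_connected`."""
    return [[n < min_connected for n in row]
            for row in connected_pixel_count(grid)]

classified = [
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
counts = connected_pixel_count(classified)
mask = isolated_mask(classified)
```

In this toy grid, the single forest pixel in the corner and the four-pixel patch both fall below the six-pixel threshold and are flagged, while the eleven connected background pixels are kept.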

3.5. Validation methodology

The methodology for validating the natural forest classification by ML was divided into two steps: grid generation within the study area, and the generation of point samples within each grid. The number of grids sampled was defined according to the methodology described in the Technical Specification Standard for Quality Control of Geospatial Data (ET-CQDG) of DSG (2015), which adopts the sampling plans described in ISO (International Organization for Standardization) standards.

The thematic accuracy validation was defined based on the number of points generated from the regular grid (sampling is uniform, non-proportional, and non-random). The definition of the grid is based on the scale of the generated product, which, in this case, has a spatial resolution of 10 m, so the scale would be 1:60,000. Spatial sampling is done by partitioning the area into cells of 4 × 4 cm at the scale of the product to be evaluated (2.4 × 2.4 km on the ground at 1:60,000), using integer values in the form of a grid, while the number of cells depends on the scale and size of the study area (Table 3).

Table 3. Satellite scale compatibility from ET-CQDG spatial sampling from DSG 2015.

Following the scale definition, the grid and the number of points to be used for validation were generated with the “Fishnet” tool from ArcGIS, and the population point set was generated from the grid centroids. The resulting population set was 34,708 points for the entire state of Paraná, i.e., one reference point every 2.4 km. From the attribute table, the class corresponding to each point was defined based on the Sentinel-2 images with a spatial resolution of 10 m and Planet images with a spatial resolution of 5 m. In this way, all points in the population set were visually identified via satellite image interpretation, thus defining the standardized reference matrix for two classes: Natural forest and Non-forest.
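The grid spacing follows from the map cell size and the scale by simple arithmetic, sketched below. The function names are illustrative; the inputs (4 cm cells, 1:60,000, 34,708 points) are the values quoted in the text.

```python
def ground_cell_size_m(map_cell_cm, scale_denominator):
    """Side of a grid cell on the ground, in metres."""
    return map_cell_cm * scale_denominator / 100

def grid_area_km2(n_cells, cell_size_m):
    """Total area covered by n_cells square cells, in square kilometres."""
    return n_cells * (cell_size_m / 1000) ** 2

cell = ground_cell_size_m(4, 60_000)   # 4 cm at 1:60,000
area = grid_area_km2(34_708, cell)     # area represented by the point set
print(cell, round(area))
```

A 4 cm cell at 1:60,000 corresponds to 2,400 m on the ground, which matches the spacing of one reference point every 2.4 km, and the 34,708 cells together represent about 199,900 km².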

3.6. Accuracy analysis and area estimation

The classifier performance was evaluated using metrics such as the Kappa Index (KI), Overall Accuracy (OA), Inclusion Errors (IE), and Omission Errors (OE) (Congalton 1991; Congalton and Green 2019), and Global Disagreement with its Allocation and Quantity components (Pontius and Millones 2011). The forest area was estimated by counting the pixels contained in each municipality using the QGIS “Zonal Statistics” tool (Sherman et al. 2011). The estimated area from the Sentinel-2 mapping was then compared with the area obtained from the 2016 MapBiomas mapping and with the area from the IAT (Instituto Água e Terra, the Water and Land Institute of Paraná) mapping, generated with WorldView images from 2012 to 2016.
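These metrics can all be derived from a confusion matrix, as in the sketch below. The matrix orientation (rows are mapped classes, columns are reference classes) and the example counts are assumptions for illustration and are not the values of this study.

```python
def accuracy_metrics(matrix):
    """OA, Kappa, inclusion and omission errors from a confusion matrix.

    matrix[i][j] = number of points mapped as class i whose reference class is j
    """
    n = len(matrix)
    total = sum(map(sum, matrix))
    row_totals = [sum(row) for row in matrix]                      # mapped
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # reference
    observed = sum(matrix[k][k] for k in range(n)) / total         # OA
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    inclusion = [1 - matrix[k][k] / row_totals[k] for k in range(n)]
    omission = [1 - matrix[k][k] / col_totals[k] for k in range(n)]
    return {"OA": observed, "KI": kappa, "IE": inclusion, "OE": omission}

# Hypothetical two-class matrix: [Natural forest, Non-forest]
example = [[45, 5],
           [5, 45]]
metrics = accuracy_metrics(example)
print(metrics)
```

Inclusion (commission) error is computed along the mapped class, omission error along the reference class, and Kappa discounts the agreement expected by chance from the marginal totals.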

The Albers Equal Area projection and the SIRGAS2000 datum were the standard used to compute the areas. The normality of the estimates was assessed with the Shapiro-Wilk and Anderson-Darling tests, and Spearman’s correlation coefficient (rs) was used to verify the data dispersion when comparing the estimated municipal areas with those obtained from other mappings (IAT and MapBiomas). The Refined Index of Agreement (dr) of Willmott et al. (2012) (Equation (1)), which measures the precision of the estimated values in relation to the 1:1 line, and the Mean Error (ME) (Equation (2)), which measures the mean of the errors, were used as statistical indicators, since they determine the accuracy of the method and indicate the distance between the estimated and the observed values. The dr index ranges from −1 to 1, with positive values close to 1 indicating better agreement.

$$d_r = 1 - \frac{\sum_{i=1}^{n} |E_i - O_i|}{2 \sum_{i=1}^{n} |O_i - \bar{O}|} \tag{1}$$

$$ME = \frac{1}{n} \sum_{i=1}^{n} (O_i - E_i) \tag{2}$$

where: $E_i$ = estimated forest area; $O_i$ = observed forest area; $\bar{O}$ = average observed forest area; $n$ = number of municipalities.
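Both indicators are straightforward to compute, as the following sketch shows. The second branch of the index, used when the summed absolute error exceeds twice the summed absolute deviation of the observations, follows the definition in Willmott et al. (2012) and is what extends the range down to −1. The sign convention of the mean error (observed minus estimated) and the sample values are assumptions for this example.

```python
def refined_index_of_agreement(estimated, observed):
    """Willmott's refined index of agreement (dr), in the range [-1, 1].

    Undefined (division by zero) if all observed values are identical
    and the estimates are perfect.
    """
    mean_obs = sum(observed) / len(observed)
    abs_error = sum(abs(e - o) for e, o in zip(estimated, observed))
    scaled_dev = 2 * sum(abs(o - mean_obs) for o in observed)
    if abs_error <= scaled_dev:
        return 1 - abs_error / scaled_dev
    return scaled_dev / abs_error - 1

def mean_error(estimated, observed):
    """Mean of (observed - estimated); assumed sign convention."""
    return sum(o - e for e, o in zip(estimated, observed)) / len(observed)

# Hypothetical municipal forest areas in hectares
observed = [10, 20, 30]
estimated = [12, 18, 33]
print(refined_index_of_agreement(estimated, observed), mean_error(estimated, observed))
```

With this convention, a negative mean error indicates that the estimates are, on average, larger than the reference areas.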

4. Results and discussion

4.1. Mapping of LULC

For OBIA mapping, samples for 9 thematic classes were selected and classified with the NN supervised algorithm using Sentinel-2 images from 2016. The forest cover classification for the state of Paraná, produced by supervised OBIA classification with topological correction, was imported into GEE. For the natural forest class, a KI of 0.87 and an OA of 91.05% were obtained (Figure 3).

Figure 3. Classification of LULC by OBIA (images from June to December 2016, Sentinel-2).


4.2. Supervised classification in GEE

The stable samples obtained for the Savanna phytoecological region from the MapBiomas project for the year 2016 (Souza et al. 2020) were used to test the performance of different ML supervised classification algorithms in GEE. The Random Forest (RF) and Gradient Tree Boost (GTB) classifiers showed the best and most similar performance, with better definition and smoothing between the mapping classes (Table 4).

Table 4. Spatial accuracy, KI and OA(%) of the classifiers tested by ML in GEE.

The Classification and Regression Trees (CART) classifier showed greater spectral confusion among the land use and land cover classes. The accuracy evaluation was done by means of a confusion matrix in GEE according to Stehman (1997), with 70% of the data sampled for training and 30% for validation testing. The highest spectral confusions were between the Natural forest formation and Silviculture (Planted forest) classes, and between Pasture and Agriculture areas. Among the classifiers used, RF obtained the best accuracy in LULC classification. No spatial filter was evaluated in this test; only the classification algorithms, Support Vector Machine (SVM), RF, GTB, and CART, and their performance using the same training sample were compared.
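A 70/30 split of a sample set can be sketched as below. The fixed seed, the function name, and the use of integer identifiers in place of real samples are assumptions for this example; the study performed the split inside GEE.

```python
import random

def split_samples(samples, train_fraction=0.7, seed=42):
    """Shuffle a copy of the samples and split it into training and validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

sample_ids = list(range(100))          # stand-ins for labelled sample points
train, validation = split_samples(sample_ids)
print(len(train), len(validation))
```

Keeping the same split when comparing classifiers, as done in the test described above, ensures that differences in accuracy come from the algorithms rather than from the samples.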

After the best ML algorithm (RF) was defined, all LULC classes were imported into GEE as an Asset. A new classification was then made with a new sample set for training and validation, improving the thematic accuracy, which can be verified by the Correctly Classified Pixels (CCP%) and OA% in the matrix adapted from Richards (1993) (Table 5).

Table 5. Confusion matrix among the classes of LULC.

The Accuracy score metric indicates how close the predicted values are to the true values. The Macro average is the unweighted mean of the per-label scores, and the Weighted average is the support-weighted mean of the per-label scores (Table 6).
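The difference between the two averages can be seen in a short sketch. The per-class scores and supports (number of reference samples per class) below are invented for illustration.

```python
def macro_average(scores):
    """Unweighted mean of the per-class scores."""
    return sum(scores) / len(scores)

def weighted_average(scores, supports):
    """Mean of the per-class scores weighted by the number of samples per class."""
    return sum(s * n for s, n in zip(scores, supports)) / sum(supports)

# Hypothetical precision per class and number of reference samples per class
precision = [0.9, 0.7]
support = [300, 100]
print(macro_average(precision), weighted_average(precision, support))
```

When classes are unbalanced, the weighted average moves toward the score of the larger class, while the macro average treats all classes equally.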

Table 6. Accuracy assessment criteria for the natural forest class.

4.3. Mapping of natural forest formation

It is important to highlight that, in this mapping, the forests’ spatial distribution is evident, especially in the South, Southeast, and Metropolitan mesoregions. This map was generated from the initial mapping done by OBIA with supervised classification (NN) and refined by a reclassification process in GEE by ML with RF (Figure 4).

Figure 4. Natural forest mapping by OBIA and ML with RF (Year 2016, Sentinel-2 images).


4.4. Spatial accuracy assessment

According to the classification proposed by Landis and Koch (1977) for the KI value, the spatial accuracy analysis of the classifications showed excellent thematic quality, with KI values between 0.80 and 1.00 (Table 7). The lowest values, in turn, were obtained for the Mixed Ombrophilous Forest, which can be explained by the large number of reforestation areas.
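The Landis and Koch (1977) scale can be expressed as a simple lookup. The labels below follow the wording of the original scale, in which the top band (0.81 to 1.00) is called "almost perfect"; the text above describes the same band as excellent. The function name is illustrative.

```python
def landis_koch_label(kappa):
    """Strength-of-agreement label for a Kappa value (Landis and Koch 1977)."""
    if kappa < 0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("Kappa cannot exceed 1")

print(landis_koch_label(0.94), landis_koch_label(0.87))
```

Both the overall KI of 0.94 and the OBIA-stage KI of 0.87 reported in this work fall in the top band of the scale.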

Table 7. Evaluation statistics of supervised classification by Phytoecological Region.

Similar results were found in the mapping of Souza et al. (2020) in GEE with Landsat images of the Atlantic Forest biome in 2016, using RF for the MapBiomas project. It resulted in an OA of 91.4%, an Allocation disagreement of 6.5%, and a Quantity disagreement of 2.9%. The Quantity component corresponds to incorrect proportions of pixels in the classes, and the Allocation component refers to the incorrect spatial distribution of pixels among the classes.

Similar results were also obtained by Waśniewski et al. (2020), with accuracies of 92.6% and 98.5% using RF and Sentinel-2 images for forest mapping in Northwestern Gabon, and by Niculescu et al. (2018), who obtained an OA of 93% while monitoring vegetation with Sentinel-2 and SPOT-6 data in France in 2017.

The values for IE and OE obtained in this study were 6.4% and 5.4%, respectively, for the entire state of Paraná. These results were lower than the OE and IE obtained by Souza et al. (2020), 14% and 6.2%, respectively, in the mapping of the forest formation class for the MapBiomas project in 2016. This improvement in mapping performance with Sentinel-2 images can be explained by the use of the Paraná state mesoregions (10 homogeneous regions). In addition, the refined classification process performed by RF in GEE was done by forest type, considering the phytoecological regions.

Because the KI computes accuracy relative to chance agreement, the analysis of Global Disagreement (GD), with its Allocation and Quantity components, arises as an alternative. These components provide additional information that helps explain the error in the mappings. The contribution of the allocation component of 2.0% and of the quantity component of 0.1% indicates the effectiveness of this new approach, which associates OBIA and ML, for large-scale forest mapping (Figure 5).

Figure 5. Commission and omission errors by domain, and allocation and quantity components.


The determination of GD and of the intensity of omission and commission errors compares the reference sample set of the classes under study with the population set of pixels of the generated mapping. In the global disagreement graph, formed by the percentages of the allocation and quantity components, the contribution of the allocation component to the total disagreement was greater than that of the quantity component, which implies that the disagreement is mainly due to the incorrect spatial distribution of pixels among the classes (Figure 6).
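The two components can be computed from a confusion matrix following the definitions of Pontius and Millones (2011), as in the sketch below. It assumes that the sample counts can be used directly as estimates of the population proportions, which holds only for certain sampling designs; the matrix values are invented for illustration.

```python
def disagreement_components(matrix):
    """Quantity and allocation disagreement from a confusion matrix.

    matrix[i][j] = points mapped as class i whose reference class is j
    Returns (quantity, allocation) as proportions of all points.
    """
    n = len(matrix)
    total = sum(map(sum, matrix))
    mapped = [sum(matrix[g]) for g in range(n)]                          # row totals
    reference = [sum(matrix[i][g] for i in range(n)) for g in range(n)]  # column totals
    # Quantity: mismatch between mapped and reference class proportions.
    quantity = sum(abs(mapped[g] - reference[g]) for g in range(n)) / (2 * total)
    # Allocation: errors that could be fixed by swapping pixel locations.
    allocation = sum(
        2 * min(mapped[g] - matrix[g][g], reference[g] - matrix[g][g])
        for g in range(n)
    ) / (2 * total)
    return quantity, allocation

example = [[40, 10],
           [5, 45]]
quantity, allocation = disagreement_components(example)
print(quantity, allocation, quantity + allocation)
```

The two components add up to the total disagreement, which equals one minus the overall accuracy (0.15 in this example, for an overall accuracy of 0.85).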

Figure 6. Intensity of omission and commission errors and global disagreement.


4.5. Estimated forest area

The estimation indicated that the forest area obtained by RF mapping with Sentinel-2 images was underestimated by 213,164 ha (3.4%) when compared to the mapping from MapBiomas collection 5 for 2016. In contrast, when compared with the IAT mapping done with WorldView images from 2012 to 2016, the area was overestimated, which can be explained by the spectral mixing of some areas contained in the 10 m pixels of Sentinel-2 images. The area estimations were calculated considering the original spatial resolution of each mapping technique (Table 8).
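Converting pixel counts to area, and expressing the difference between two area estimates as a percentage, takes only a few lines. The function names and the example numbers are illustrative and are not the values of Table 8.

```python
def pixels_to_hectares(pixel_count, pixel_size_m=10):
    """Area in hectares of `pixel_count` square pixels of side `pixel_size_m`."""
    return pixel_count * pixel_size_m ** 2 / 10_000

def relative_difference_percent(estimate, reference):
    """Signed difference of `estimate` relative to `reference`, in percent."""
    return (estimate - reference) / reference * 100

print(pixels_to_hectares(100))        # one hundred 10 m pixels
print(pixels_to_hectares(6))          # six 10 m pixels
print(pixels_to_hectares(6, 30))      # six 30 m pixels
print(relative_difference_percent(90.0, 100.0))
```

The same pixel count represents nine times more area at 30 m than at 10 m, which is why each mapping is converted to area at its own original resolution before the totals are compared.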

Table 8. Estimation of natural forest area by different satellites.

As the estimated forest area data do not follow a normal probability distribution, Spearman’s correlation coefficient (rs) was used. A value of rs = 0.99 was obtained in the comparisons with the MapBiomas/Landsat and IAT/WorldView-2 mappings, indicating a strong correlation with the data obtained in the mapping done with Sentinel-2 images using the methodology associating OBIA and RF (Figure 7).
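Spearman's coefficient is the Pearson correlation of the ranks of the two series, which is why it does not require normally distributed data. A minimal sketch with average ranks for ties is shown below; the municipal areas are invented for illustration.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1   # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman's rank correlation coefficient (Pearson correlation of ranks)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical forest areas (ha) per municipality from two mappings
sentinel = [120, 450, 900, 3000]
reference = [130, 400, 950, 2800]
print(spearman(sentinel, reference))
```

Because only the ordering of the municipalities matters, a few very large municipalities do not dominate the coefficient as they would with a Pearson correlation on the raw areas.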

Figure 7. Comparison between area estimates using Sentinel-2 mapping, year 2016. (A) MapBiomas and (B) IAT.


The statistical comparison indicated that the estimated area had a Mean Error (ME) of 534.25 ha in relation to the MapBiomas mapping and an ME of 576.03 ha in relation to the IAT mapping. This dissimilarity can be explained by the difference in spatial resolution of the sensors on the different satellites. Willmott’s refined index of agreement (dr) measured the agreement between the area estimated with the Sentinel-2 mapping and the areas from the MapBiomas mapping and the IAT mapping using WorldView images. Values of 0.94 (MapBiomas) and 0.93 (IAT) were found, indicating an optimal performance, i.e., high accuracy among the area estimations.

5. Conclusions

The results indicate the potential of applying OBIA and ML techniques for supervised classification of natural forests using VIs and spectral fraction from Sentinel-2 images. Moreover, it was possible to map and estimate the forest area for large territorial extensions such as the entire state of Paraná.

The division of the state by phytoecological region and mesoregions implemented in GEE enabled a better spectral homogeneity of the regions for mapping. This facilitated the selection of images from days without cloud cover, and in periods with agricultural crops at low vegetative vigor, which allowed for a significant improvement in mapping. As a result, there was a decrease in spectral confusion among the LULC classes.

The use of vegetation indices and texture improved the performance and accuracy of the classifier. Furthermore, the use of OBIA facilitated the post-classification editing and enabled the reduction of spectral confusion between classes, and consequently increased the thematic spatial accuracy.

The high performance of this approach demonstrated its methodological efficiency, based on the analysis of the IE and OE, the high OA, the area estimation, and statistical indicators such as Spearman’s correlation coefficient (rs), the Mean Error, and the refined index of agreement, which showed excellent performance in comparison with mappings from other sensors.

Therefore, the methodology can be applied in projects involving forest mapping in large territorial extensions that require high thematic precision and area estimation. This makes possible the monitoring and generation of reliable area estimations for subsequent years.

Algorithm

The code for the RF classification and for the Sentinel-2 mosaic bands, including vegetation indices, SMA fractions, and Sentinel-1 bands, is publicly available at: https://github.com/Cechim/simepar-brazil/.

Acknowledgment

The authors would like to thank the Academic Publishing Advisory Center (Centro de Assessoria de Publicação Acadêmica, CAPA – www.capa.ufpr.br) of the Federal University of Paraná (UFPR) for assistance with English language translation and developmental editing.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported by the Federal University of Paraná (UFPR) and the Technology System and Environmental Monitoring of Paraná (SIMEPAR).

References

  • Adams, J.B., Sabol, D.E., ValerieKapos, F. R.A., Roberts, D.A. O., Smith, M., and Gillespie, A.R. 1995. “Classification of multispectral images based on fractions of endmembers: Application to land-cover change in the Brazilian Amazon.” Remote Sensing of Environment, Vol. 52(No. 2): pp. 137–154. doi:10.1016/00344257(94)00098-8.
  • Aparecido, L.E.O., Rolim, G.S., Richetti, J., Souza, P.S., and Johann, J.A. 2016. “Köppen, Thornthwaite and Camargo climate classifications for climatic zoning in the State of Parana’.” Ciência e Agrotecnologia, Vol. 40(No. 4): pp. 405–417. doi:10.1590/1413-70542016404003916.
  • Banks, S., Millard, K., Behnamian, A., White, L., Ullmann, T., Charbon-Neau, F., Chen, Z., Wang, H., Pasher, J., and Duffe, J. 2017. “Contributions of actual and simulated satellite sar data for substrate type differentiation and shoreline mapping in the Canadian arctic.” Remote Sensing, Vol. 9(No. 12): pp. 1206. doi:10.3390/rs9121206.
  • Bannari, A., Pacheco, A., Staenz, K., McNairn, H., and Omari, K. 2006. “Estimating and mapping crop residues cover on agricultural lands using hyperspectral and ikonos data.” Remote Sensing of Environment, Vol. 104(No. 4): pp. 447–459. doi:10.1016/j.rse.2006.05.018.
  • Belgiu, M., and Drăguţ, L. 2016. “Random forest in remote sensing: A review of applications and future directions.” ISPRS Journal of Photogrammetry and Remote Sensing Vol. 114: pp. 24–31. doi:10.1016/j.isprsjprs.2016.01.011.
  • Blackburn, G.A. 1998. “Spectral indices for estimating photosynthetic pigment concentrations: A test using senescent tree leaves.” International Journal of Remote Sensing, Vol. 19(No. 4): pp. 657–675. doi:10.1080/014311698215919.
  • Breiman, L. 2001. “Random Forests.” Machine Learning, Vol. 45(No. 1): pp. 5–32. doi:10.1023/A:1010933404324.
  • Brovelli, M.A., Sun, Y., and Yordanov, V. 2020. “Monitoring forest change in the amazon using multi-temporal remote sensing data and machine learning classification on google earth engine.” ISPRS International Journal of Geo-Information (MDPI), Vol. 9: pp. 1–21. doi:10.3390/ijgi9100580.
  • Caviglione, J.H., Kiihl, L.R.B., Caramori, P.H., de Oliveira, D., Galdino, J., Borrozino, E., Giacomini, C.C.Y., Sonomura, M., and Pugsley, L. 2000. “Cartas climáticas do Paraná”. Instituto Agronômico do Paraná (IAPAR). Londrina: IAPAR, CD-Room.
  • Chen, L., Ren, C., Zhang, B., Wang, Z., and Xi, Y. 2018. “Estimation of forest above ground biomass by geographically weighted regression and machine learning with sentinel imagery.” Forests, Vol. 9(No. 10): pp. 582. doi:10.3390/f9100582.
  • Cissell, J.R., Canty, S.W.J., Steinberg, M.K., and Simpson, L.T. 2021. “Mapping National Mangrove Cover for Belize Using Google Earth Engine and Sentinel-2 Imagery.” Applied Sciences, Vol. 11(No. 9): pp. 4258. doi:10.3390/app11094258.
  • Clinton, N., Holt, A., Scarborough, J., Yan, L., and Gong, P. 2010. “Accuracy assessment measures for object-based image segmentation goodness.” Photogrammetric Engineering & Remote Sensing, Vol. 76(No. 3): pp. 289–299. doi:10.14358/PERS.76.3.289.
  • Congalton, R.G. 1991. “A review of assessing the accuracy of classifications of remotely sensed data.” Remote Sensing of Environment, Vol. 37(No. 1): pp. 35–46. doi:10.1016/0034-4257(91)90048-B.
  • Congalton, R.G., and Green, K. 2019. Assessing the Accuracy of ReMotely Sensed Data: Principles And Practices. 3st ed., Boca Raton: CRC Press. doi:10.1201/9780429052729.
  • Conners, R.W., and Harlow, C.A. 1980. “A Theoretical Comparison of Texture Algorithms.” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 2(No. 3): pp. 204–222. doi:10.1109/TPAMI.1980.4767008.
  • Costa, H., Foody, G.M., and Boyd, D.S. 2018. “Supervised methods of image segmentation accuracy assessment in land cover mapping.” Remote Sensing of Environment, Vol. 205: pp. 338–351. doi:10.1016/j.rse.2017.11.024.
  • Cover, T.M., and Hart, P. 1967. “Nearest neighbor pattern classification.” IEEE Transactions on Information Theory, Vol. 13(No. 1): pp. 21–27. doi:10.1109/TIT.1967.1053964.
  • Dash, J., and Curran, P. 2004. “The meris terrestrial chlorophyll index.” International Journal of Remote Sensing, Vol. 25(No. 23): pp. 5403–5413. doi:10.1080/0143116042000274015.
  • Delegido, J., Verrelst, J., Alonso, L., and Moreno, J. 2011. “Evaluation of sentinel-2 red-edge bands for empirical estimation of green lai and chlorophyll content.” Sensors, Vol. 11(No. 7): pp. 7063–7081. doi:10.3390/s110707063.
  • Duan, Q., Tan, M., Guo, Y., Wang, X., and Xin, L. 2019. “Understanding the spatial distribution of urban forests in China using sentinel-2 images with google earth engine.” Forests, Vol. 10(No. 9): pp. 729. doi:10.3390/f10090729.
  • Escadafal, R. 1989. “Remote sensing of arid soil surface color with landsat thematic mapper.” Advances in Space Research, Vol. 9(No. 1): pp. 159–163. doi:10.1016/0273-1177(89)90481-X.
  • Eskandari, S., Jaafari, M.R., Oliva, P., Ghorbanzadeh, O., and Blaschke, T. 2020. “Mapping land cover and tree canopy cover in Zagros Forests of Iran: Application of Sentinel-2, Google Earth, and Field Data.” Remote Sensing, Vol. 12(No. 12): pp. 1912. doi:10.3390/rs12121912.
  • FAO. 2020. Global Forest Resources Assessment 2020 – Key findings. Food and Agriculture Organization of the United Nations (FAO), pp. 1–16. Rome: FAO. doi:10.4060/ca8753en.
  • Ganz, S., Adler, P., and KäNdler, G. 2020. “Forest cover mapping based on a Combination of aerial images and sentinel-2 satellite data compared to national forest inventory data.” Forests, Vol. 11(No. 12): pp. 1322. doi:10.3390/f11121322.
  • Gitelson, A.A., Gritz, Y., and Merzlyak, M.N. 2003. “Relationships between leaf chlorophyll content and spectral reflectance and algorithms for non-destructive chlorophyll assessment in higher plant leaves.” Journal of Plant Physiology, Vol. 160(No. 3): pp. 271–282. doi:10.1078/0176-1617-00887.
  • Gitelson, A.A., Kaufman, Y.J., and Merzlyak, M.N. 1996. “Use of a green channel in remote sensing of global vegetation from eos-modis.” Remote Sensing of Environment, Vol. 58(No. 3): pp. 289–298. doi:10.1016/S0034-4257(96)00072-7.
  • Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., and Moore, R. 2017. “Google earth engine: Planetary-scale geospatial analysis for everyone.” Remote Sensing of Environment Vol. 202: pp. 18–27. doi:10.1175/1520-0477.
  • Guyot, G., and Baret, F. 1988. Utilisation de la haute resolution spectrale pour suivre l’etat des couverts vegetaux, In 4. Colloque international, ASE, Aussois, France. https://hal.inrae.fr/hal-02780265.
  • Guyot, G., Frederic, B., and Major, D. 1988. “High spectral resolution: Determination of spectral shifts between the red and the near infrared.” International Archives of Photogrammetry and Remote Sensing, Vol. 11: pp. 750–760.
  • Hansen, M.C., Potapov, P.V., Moore, R., Hancher, M., Turubanova, S.A., TyukavIna, A., Thau, D., et al. 2013. “High-resolution global maps of 21st-century forest cover change.” Science, Vol. 342(No. 6160): pp. 850–853. doi:10.1126/science.124469.
  • Haralick, R.M. 1979. “Statistical and structural approaches to texture.” Proceedings of the IEEE, Vol. 67(No. 5): pp. 786–804. doi:10.1109/PROC.1979.11328.
  • Haralick, R.M., Shanmugam, K., and Dinstein, I. 1973. “Textural features of image classification.” IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-3(No. 6): pp. 610–621. doi:10.1109/TSMC.1973.4309314.
  • Hird, J.N., DeLancey, E.R., McDermid, G.J., and Kariyeva, J. 2017. “Google earth engine, open-access satellite data, and machine learning in support of large-area probabilistic wetland mapping.” Remote Sensing, Vol. 9(No. 12): pp. 1315. doi:10.3390/rs9121315.
  • Huang, H., Chen, Y., Clinton, N., Wang, J., Wang, X., Liu, C., Gong, P., et al. 2017. “Mapping major land cover dynamics in Beijing using all Landsat images in Google Earth Engine.” Remote Sensing of Environment, Vol. 202: pp. 166–176. doi:10.1016/j.rse.2017.02.021.
  • Huete, A. 1988. “A soil-adjusted vegetation index (savi).” Remote Sensing of Environment, Vol. 25 (No. 3): pp. 295–309. doi:10.1016/0034-4257(88)90106-X.
  • IBGE 2012. Manual Tecnico da Vegetação Brasileira. 2 ed., p. 272. Rio de Janeiro: IBGE. Instituto Brasileiro de Geografia e Estatıstica (IBGE).
  • Jay, S., Gorretta, N., Morel, J., Maupas, F., Bendoula, R., Rabatel, G., Dutartre, D., Comar, A., and Baret, F. 2017. “Estimating leaf chlorophyll content in sugar beet canopies using millimeter to centimeter-scale reflectance imagery.” Remote Sensing of Environment, Vol. 198: pp. 173–186. doi:10.1016/j.rse.2017.06.008.
  • Jiang, Z., Huete, A.R., Didan, K., and Miura, T. 2008. “Development of a two-band enhanced vegetation index without a blue band.” Remote Sensing of Environment, Vol. 112(No. 10): pp. 3833–3845. doi:10.1016/j.rse.2008.06.006.
  • Jordan, C.F. 1969. “Derivation of leaf-area index from quality of light on the forest floor.” Ecology, Vol. 50(No. 4): pp. 663–666. doi:10.2307/1936256.
  • Kaplan, G. 2021. “Broad-leaved and coniferous forest classification in google earth engine using sentinel imagery.” Environmental Sciences Proceedings (MDPI) 3, 1–6. doi:10.3390/IECF2020-07888.
  • Kimes, D., Markham, B., Tucker, C., and McMurtrey, J. 1981. “Temporal relationships between spectral response and agronomic variables of a corn canopy.” Remote Sensing of Environment, Vol. 11: pp. 401–411. doi:10.1016/0034-4257(81)90037-7.
  • Landis, J.R., and Koch, G.G. 1977. “The measurement of observer agreement for categorical data.” Biometrics, Vol. 33(No. 1): pp. 159–174.
  • Lobell, D.B., Thau, D., Seifert, C., Engle, E., and Little, B. 2015. “A scalable satellite based crop yield mapper.” Remote Sensing of Environment Vol. 164: pp. 324–333. doi:10.1016/j.rse.2015.04.021.
  • Lu, D., Batistella, M., Moran, E., and Mausel, P. 2004. “Application of spectral mixture analysis to amazonian land-use and land-cover classification.” International Journal of Remote Sensing, Vol. 25(No. 23): pp. 5345–5358. doi:10.1080/01431160412331269733.
  • McFeeters, S.K. 1996. “The use of the normalized difference water index (ndwi) in the delineation of open water features.” International Journal of Remote Sensing, Vol. 17(No. 7): pp. 1425–1432. doi:10.1080/01431169608948714.
  • Mondal, P., Liu, X., Fatoyinbo, T.E., and Lagomasino, D. 2019. “Evaluating Combinations of Sentinel-2 Data and Machine-Learning Algorithms for Mangrove Mapping in West Africa.” Remote Sensing, Vol. 11(No. 24): pp. 2928. doi:10.3390/rs11242928.
  • Niculescu, S., Billey, A., and Talab Ou Ali, H. 2018. “Random forest classification using Sentinel-1 and Sentinel-2 series for vegetation monitoring in the Pays de Brest (France).” Paper presented at SPIE Remote Sensing 10783. doi:10.1117/12.2325546.
  • Pacheco, A., and McNairn, H. 2010. “Evaluating multispectral remote sensing and spectral unmixing analysis for crop residue mapping.” Remote Sensing of Environment, Vol. 114(No. 10): pp. 2219–2228. doi:10.1016/j.rse.2010.04.024.
  • Pałaś, K.W., and Zawadzki, J. 2020. “Sentinel-2 imagery processing for tree logging observations on the Białowiez˙a Forest World Heritage Site.” Forests, Vol. 11(No. 8): pp. 857. doi:10.3390/f11080857.
  • Pekel, J.F., Cottam, A., Gorelick, N., and Belward, A.S. 2016. “High-resolution mapping of global surface water and its long-term changes.” Nature, Vol. 540(No. 7633): pp. 418–422. doi:10.1038/nature20584.
  • Pontius, R.G., and Millones, M. 2011. “Death to kappa: birth of quantity disagreement and allocation disagreement for accuracy assessment.” International Journal of Remote Sensing, Vol. 32(No. 15): pp. 4407–4429. doi:10.1080/01431161.2011.552923.
  • Pouget, M., Floc’h, E., Kamal, S., and Saloum, B. 1990. Utilisation des données spot pour la cartographie des ressources renouvelables: application à la région côtière nord-ouest de l’egypte. Cartographie des etats de surface (SATCARTO), pp. 103–143.
  • Richards, J. 1993. Remote Sensing Digital Image Analysis. An Introduction, 340. Berlin Heidelberg: Springer-Velarg. doi:10.1007/978-3-642-88087-2.
  • Rouse, J.W., H.R.S.J., and Deering, D. 1973. Monitoring vegetation systems in the great plains with ERTS. Proceedings of 3rd Earth Resources Technology Satellite Symposium 1, 309–317.
  • Segarra, J., Buchaillot, M.L., Araus, J.L., and Kefauver, S.C. 2020. “Remote sensing for precision agriculture: Sentinel-2 improved features and applications.” Agronomy, Vol. 10(No. 5): pp. 641. doi:10.3390/agronomy10050641.
  • Sherman, G.E., Sutton, T., Blazek, R., Holl, S., Dassau, O. B. M., Mitchell, T., and Luthaman, L. 2011. Quantum GIS User Guide – Version 3.16 “Wroclaw”.
  • Smith, M.O., Susan, L.U., Adams, J.B., and Gillespie, A.R. 1990. “Vegetation in deserts: I. A regional measure of abundance from multispectral images.” Remote Sensing of Environment (Elsevier), Vol. 31(No. 1): pp. 1–26. doi:10.1016/0034-4257(90)90074-V.
  • Sousa, C., Fatoyinbo, L., Neigh, C., Boucka, F., Angoue, V., and Larsen, T. 2020. “Cloud-computing and machine learning in support of country-level land cover and ecosystem extent mapping in Liberia and Gabon.” PLos One, Vol. 15(No. 1): pp. e0227438. doi:10.1371/journal.pone.0227438.
  • Souza, C.M., Z. Shimbo, J., Rosa, M.R., Parente, L.L., A. Alencar, A., Rudorff, B.F.T., Hasenack, H., et al. 2020. “Reconstructing three decades of land use and land cover changes in Brazilian Biomes with Landsat Archive and Earth Engine.” Remote Sensing, Vol. 12(No. 17): pp. 2735. doi:10.3390/rs12172735.
  • Stehman, S.V. 1997. “Selecting and interpreting measures of thematic classification accuracy.” Remote Sensing of Environment, Vol. 62(No. 1): pp. 77–89. doi:10.1016/S0034-4257(97)00083-7.
  • Theofanous, N., Chrysafis, I., Mallinis, G., Domakinis, C., Verde, N., and SiaHalou, S. 2021. “Aboveground biomass estimation in short rotation forest plantations in Northern Greece using ESA’s sentinel medium-high resolution multispectral and radar imaging missions.” Forests, Vol. 12(No. 7): pp. 902. doi:10.3390/f12070902.
  • Vrdoljak, L., and Kilic´ Pamukovic´, J. 2022. “Assessment of atmospheric correction processors and spectral bands for satellite-derived bathymetry using sentinel-2 data in the middle adriatic.” Hydrology, Vol. 9(No. 12): pp. 215. doi:10.3390/hydrology9120215.
  • Wang, X., Yao, Y., Zhao, S., Jia, K., Zhang, X., Zhang, Y., Zhang, L., Xu, J., and Chen, X. 2017. “MODIS-based estimation of terrestrial latent Heat Flux over North America using three machine learning algorithms.” Remote Sensing, Vol. 9(No. 12): pp. 1326. doi:10.3390/rs9121326.
  • Waśniewski, A., Hościło, A., Zagajewski, B., and Moukétou-Tarazewicz, D. 2020. “Assessment of Sentinel-2 satellite images and random forest classifier for rainforest mapping in gabon.” Forests, Vol. 11(No. 9): pp. 941. doi:10.3390/f11090941.
  • Welikhe, P., Essamuah-Quansah, J., Fall, S., and McElhenney, W. 2017. “Estimation of soil moisture percentage using landsat-based moisture stress index.” Journal of Remote Sensing GIS, Vol. 06: pp. 200. doi:10.4172/2469-4134.1000200.
  • Willmott, C.J., Robeson, S.M., and Matsuura, K. 2012. “A refined index of model performance.” International Journal of Climatology, Vol. 32(No. 13): pp. 2088–2094. doi:10.1002/joc.2419.
  • Xie, B., Cao, C., Xu, M., Duerler, R.S., Yang, X., Bashir, B., Chen, Y., and Wang, K. 2021. “Analysis of regional distribution of tree species using multi-seasonal sentinel-12 imagery within Google Earth Engine.” Forests, Vol. 12(No. 5): pp. 565. doi:10.3390/f12050565.
  • Zeng, H., Wu, B., Wang, S., Musakwa, W., Tian, F., Mashimbye, Z.E., Poona, N., and Syndey, M. 2020. “a synthesizing land-cover classification method based on Google Earth Engine: A case study in Nzhelele and Levhuvu Catchments, South Africa.” Chinese Geographical Science, Vol. 30 (No. 3): pp. 397–409. doi:10.1007/s11769-020-1119-y.
  • Zulfiqar, A., Ghaffar, M.M., Shahzad, M., Weis, C., Malik, M.I., Shafait, F., and Wehn, N. 2021. “AI-ForestWatch: semantic segmentation based end-to-end framework for forest estimation and change detection using multi-spectral remote sensing imagery.” Journal of Applied Remote Sensing, Vol. 15(No. 02): pp. 1–21. doi:10.1117/1.JRS.15.024518.