Canadian Journal of Remote Sensing
Journal canadien de télédétection
Volume 49, 2023 - Issue 1
Research Article

Comparative Analysis of Empirical and Machine Learning Models for Chla Extraction Using Sentinel-2 and Landsat OLI Data: Opportunities, Limitations, and Challenges

Analyse comparative de modèles empiriques et d’apprentissage automatique pour l’extraction de la Chla à l’aide des données Sentinel-2 et Landsat OLI: opportunités, limites et défis

Article: 2215333 | Received 25 Aug 2022, Accepted 04 May 2023, Published online: 06 Jun 2023

Abstract

Remote retrieval of near-surface chlorophyll-a (Chla) concentration in small inland waters is challenging due to substantial optical interferences of various water constituents and uncertainties in the atmospheric correction (AC) process. Although various algorithms have been developed to estimate Chla from moderate-resolution terrestrial missions (∼10–60 m), the production of both accurate distribution maps and time series of Chla has proven challenging, limiting the use of remote analyses for lake monitoring. Here, we develop a support vector regression (SVR) model, which uses satellite-derived remote-sensing reflectance spectra (Rrsδ) from Sentinel-2 and Landsat-8 images as input for Chla retrieval in a representative eutrophic prairie lake, Buffalo Pound Lake (BPL), Saskatchewan, Canada. Validated against in situ Chla from seven ice-free seasons (N ∼ 200; 2014–2020), the SVR model outperformed both locally tuned, Rrsδ-fed empirical models (Normalized Difference Chlorophyll Index, 2- and 3-band, and OC3) and Mixture Density Networks (MDNs) by 15–65%, while exhibiting comparable performance to a locally trained MDN, with an error of ∼35%. Comparison of Chla retrieval models, AC processors (iCOR, ACOLITE), and radiometric products (Rayleigh-corrected, surface, and top-of-atmosphere reflectance) showed that the best Chla maps and optimal time series (up to 100 mg m−3) were produced using a coupled SVR-iCOR system.

RÉSUMÉ

L’extraction à distance de la concentration de chlorophylle-a (Chla) près de la surface dans les petites eaux intérieures est difficile en raison des interférences optiques importantes de divers constituants de l’eau et des incertitudes dans le processus de correction atmosphérique (CA). Bien que divers algorithmes aient été développés pour estimer Chla à partir de missions terrestres à résolution modérée (∼10–60 m), la production de cartes de répartition précises et de séries chronologiques de Chla s’est avérée difficile, limitant l’utilisation d’analyses à distance pour la surveillance des lacs. Ici, nous développons un modèle de régression vectorielle de support (RVS), qui utilise des spectres de réflectance dérivés de satellites (Rrsδ) utilisant des images Sentinel-2 et Landsat-8 comme entrée pour la récupération de la Chla d’un lac eutrophique des prairies représentatif, Buffalo Pound Lake (BPL), Saskatchewan, Canada. Validé à partir des données Chla in situ de sept saisons sans glace (N ∼ 200; 2014–2020), le modèle SVR a surpassé à la fois les modèles empiriques à réglage local, alimentés en (Rrsδ) (indice de chlorophylle par différence normalisée, bandes 2 et 3 et OC3) et les réseaux de densité de mélange (MDN) de 15% à 65%, tout en présentant des performances comparables à celles d’un MDN formé localement, avec une erreur de ∼35%. La comparaison des modèles de récupération de Chla, des processeurs AC (iCOR, ACOLITE) et des produits radiométriques (correction de Rayleigh, réflectance de surface et de la haute atmosphère) a montré que les meilleures cartes Chla et les séries chronologiques optimales (jusqu’à 100 mg m−3) ont été produites à l’aide d’un système SVR-iCOR couplé.

Introduction

Small inland waters (SIWs) are the predominant form of lakes globally, with 64% of basins <100 km2 (Downing et al. 2006), yet they are highly subject to water quality degradation due to changes in climate and land use (Carpenter et al. 1998, Delpla et al. 2009). Despite recognition of the problem for decades, the water quality of SIWs continues to degrade, resulting in harmful algal blooms (HABs) composed of cyanobacteria (Walker 2019). The frequency, magnitude, and persistence of HABs have also increased globally due to atmospheric warming (Ho et al. 2019; Hayes et al. 2020). A change in the near-surface concentration of chlorophyll-a (Chla) is one of the most reliable proxies of algal bloom intensification retrievable from satellite analyses, as Chla is present in all phytoplankton, including cyanobacteria (Roesler et al. 2017), and has unique absorption features (peaks at ∼430 and ∼670 nm in live organisms) that can be detected through optical imaging (Kutser 2009).

Accurate Chla retrieval from optical radiometry is affected by the interplay between the inherent optical properties (absorption, scattering) of pure water, its dissolved or suspended constituents, and solar photons in water-leaving radiance (Lw). In particular, reflectance is affected strongly by phytoplankton density, colored dissolved organic matter (CDOM), and non-algal particles (NAP) (Babin et al. 2003). Further, Lw is modulated by atmospheric properties of the transmission path to satellite sensors; consequently, atmospheric correction (AC) processors are used to convert top-of-atmosphere reflectance (ρTOA) to satellite-derived remote sensing reflectance (Rrsδ) to retrieve Chla. Estimates of Rrsδ include uncertainties in the AC and sensor radiometric measurements, as well as the effect of surface-reflected radiance (sun glint; Bulgarelli and Zibordi 2018), but are useful estimates of remote sensing reflectance (Rrs), defined as the ratio of water-leaving radiance to the total downwelling irradiance just above water. Once Rrsδ is approximated, a wide range of algorithms, including semi-analytical, empirical, and machine-learning (ML) models, can be applied to retrieve Chla from reflectance measurements (Carder et al. 1999; Morel 1980; Odermatt et al. 2012).

Semi-analytical models retrieve water absorption and scattering properties from Rrs measurements and can be used to estimate Chla (Gons 1999; Lee et al. 2002; Schroeder et al. 2007). While accurate in some circumstances (Santini et al. 2010; Van Der Woerd and Pasterkamp 2008), these models are sensitive to the form of AC and require accurate estimates of optical water parameters (IOCCG 2006; Odermatt et al. 2012). In contrast, empirical models (differential/ratio-based indices) based on blue-green wavelengths (e.g., NASA’s OCx models) tend to perform well in phytoplankton-dominated aquatic ecosystems (O'Reilly et al. 1998; O'Reilly and Werdell 2019). Further, various red- and near infrared (NIR)-indices have been developed and validated for ocean color sensors, including the 2band, 3band, and Normalized Difference Chlorophyll Index (NDCI) (Dall’Olmo and Gitelson 2005; Mishra and Mishra 2012; Moses et al. 2009) for use with data from the Medium Resolution Imaging Spectrometer (MERIS). Models based on red or NIR bands may be less sensitive to uncertainties in AC, especially when closely spaced (Moses et al. 2009); nonetheless, model performance depends on the range of Chla variation, the amount of interference from other constituents (e.g., backscattering NAP), and the band configuration of sensors (Gitelson 1992; Gitelson et al. 2007). Instead, empirical ML algorithms, especially neural networks (NN), are widely used to retrieve Chla over geographically-extensive regions using large synthetic or in situ radiometric measurements from diverse optical water types (OWTs) (Doerffer and Schiller 2007; Hu et al. 2021; Pahlevan et al. 2020; Schroeder et al. 2007).

To date, remotely-sensed Chla estimates have been applied successfully to large waterbodies, including the open ocean (Bryan et al. 2005; O'Reilly and Werdell 2019), coastal waters (Moses et al. 2012; Werdell et al. 2009), and large lakes (Binding et al. 2021; Binding et al. 2011; Gons et al. 2008; Schaeffer et al. 2018), using ocean-color sensors, such as MERIS and the Sea-viewing Wide Field-of-view Sensor (SeaWiFS). In contrast, Chla retrieval for SIWs has been challenging because the optical regimes of inland waters are influenced by particulate organic and inorganic particles, as well as CDOM (Mobley 1994). Generally, ocean-color sensors lack sufficiently high spatial resolution (<100 m) to sample SIWs (Ansper and Alikas 2018; Philipson et al. 2014). Likewise, very high-resolution sensors are not widely used to monitor water quality, mostly because they do not offer much improvement in terms of spectral resolution and signal-to-noise ratio (SNR), despite their commercial nature. Instead, the Multi-Spectral Instrument (MSI) and Operational Land Imager (OLI) sensors onboard the Sentinel-2 (S2) and Landsat-8 (L8) satellites show potential for sensing Chla in SIWs, as they provide excellent global coverage with spatial resolutions from 10 to 60 m. Although designed for land observations, these sensors may be applicable to small aquatic ecosystems (Cao et al. 2019; Pahlevan et al. 2014; Xu et al. 2020), primarily because of their radiometric performance and stability (Claverie et al. 2018; Helder et al. 2018; Pahlevan et al. 2019; Wulder et al. 2015) compared to heritage Landsat-class missions (Allan et al. 2011; Tebbs et al. 2013; Yacobi et al. 1995). Once combined, S2 and L8 images are available at a sub-weekly revisit rate in high-latitude regions (Li and Roy 2017), favoring their use for water quality and HAB monitoring.

A broad range of Chla models has been used for MSI and OLI images, including 2band, 3band, and NDCI (Ansper and Alikas 2018). While MSI has been utilized for detecting cyanobacterial blooms and retrieval of Chla in subalpine lakes (Bresciani et al. 2018), studies suggest that current approaches have limitations at the extremes of the observed Chla range, e.g., Chla < 10 mg m−3 or Chla > 100 mg m−3 (Toming et al. 2016; Dörnhöfer et al. 2016; Kutser et al. 2016). Instead, the application of Mixture Density Networks (MDN) to a large dataset of in situ radiometry and Chla measurements has allowed the development of models which outperformed other state-of-the-art algorithms for a wide range of Chla concentration (0.1–100 mg m−3) using MSI (Pahlevan et al. 2020), as well as OLI data (Smith et al. 2021). Additionally, Cao et al. (2020) developed BST, a model based on the Gradient Boosting Tree algorithm (XGBoost) (Chen and Guestrin 2016), and successfully tested it on OLI data taken from lakes in eastern China. As another example of ML models employed to retrieve Chla, Support Vector Machines/Regressions (SVM/SVR) (Vapnik 2013) have been applied to oceanic (Camps-Valls et al. 2006; Hu et al. 2021; Kwiatkowska and Fargion 2003; Martinez et al. 2020) and inland waters (Tian et al. 2022).

Despite recent developments, reliable estimates of Chla in SIWs remain challenging when based on moderate-resolution (∼10–60 m) satellite data. Empirical models leverage only a limited range of the spectrum and may not optimally solve ill-posed conditions (O'Sullivan 1986) that are common to inverse problems, such as Chla retrieval (Defoin‐Platel and Chami 2007; Pahlevan et al. 2020; Sydor et al. 2004; Werdell et al. 2018). Similarly, while globally trained ML (GML) models (e.g., MDN) can leverage the full visible and near-infrared spectrum (VNIR) and may handle non-linear and ill-posed problems (Pahlevan et al. 2020), they can be susceptible to uncertainties in AC that could reduce their suitability under sub-optimal atmospheric or aquatic conditions (Pahlevan et al. 2020; Smith et al. 2021). These observations suggest that the development of locally trained ML (LML) models using Rrsδ measurements might be a suitable solution for optimal monitoring of Chla at local scales.

Here, we employed an ML approach based on SVR to retrieve robust and reliable Chla time series and maps for Buffalo Pound Lake, Saskatchewan, Canada, using MSI and OLI imagery. Our SVR model was trained and validated with ∼200 co-located in situ Chla measurements with corresponding Rrsδ observations. We compared model performance against several state-of-the-art algorithms, including OC3, MDN, 2band, BST, and LMDN (a locally trained MDN), in terms of its quantitative (general and stratified) performance, as well as its spatial and temporal consistency. Then, we assessed the robustness of the model to uncertainties from two AC processors (i.e., iCOR and ACOLITE) and different broadly-defined OWTs, to assess its potential utility for other small eutrophic lakes. Our overall objective was to develop a reliable baseline model for BPL that might also be suitable for other regional lakes exhibiting similar HABs and water conditions.

Materials and methods

Study site

Buffalo Pound Lake (BPL) is a long (∼30 km), narrow (<1 km), and shallow (<6 m) lake located in the Qu’Appelle River watershed, Saskatchewan, Canada (Figure 1, Table 1). Currently, the basin is eutrophic, with summer blooms occurring during June–September and peak surface populations of phytoplankton during July–August (Kehoe et al. 2019). Continuous monitoring for over 25 years shows that cyanobacteria are the predominant phytoplankton during July–September (Swarbrick et al. 2019; Vogt et al. 2018). Because the lake is oriented parallel to the direction of prevailing winds, the water column is polymictic, experiencing frequent mixing periods with only intermittent vertical stratification (Dröscher et al. 2008).

Figure 1. Map and location of Buffalo Pound Lake (BPL), Saskatchewan, Canada. (a) Location of the Qu’Appelle River watershed within Canada. (b) Location of BPL within Qu’Appelle River watershed. (c) A Landsat-8 RGB image of BPL is overlaid on a bathymetric map on which sampling stations are also shown (solid black triangles numbered 1–11).

Table 1. Buffalo Pound Lake characteristics and water quality parameters at station 1 (averaged from late May to early September 2014–2020).

Several attributes make BPL suitable for the development of remote sensing models of Chla. First, the lake is an important freshwater resource as it supplies drinking water to one-quarter of the provincial population, including the nearby cities of Regina and Moose Jaw (Hosseini et al. 2018). Second, multi-decadal records demonstrate elevated Chla content during late summer, with abundant surface blooms of toxic cyanobacteria (Kehoe et al. 2015; Hayes et al. 2020). Third, the lake size and elongated shape produce large gradients and patches of differing Chla concentration (10–100 mg Chla m−3), as well as regions of contrasting optical properties (NAP turbidity, HABs) that are suitable for analysis with spatially resolved MSI and OLI platforms. Finally, BPL is representative of many other prairie lakes in terms of physical, biological, and chemical properties (Finlay et al. 2015; Hayes et al. 2020), suggesting that models developed in this site may have regional suitability for water quality monitoring.

BPL exhibits two distinct OWTs (Appendix A). OWT1 characterizes the southern basin (stations 1–8), where Chla concentrations are elevated and optical characteristics are similar to those recorded in phytoplankton-rich systems elsewhere (OWT4 in Pahlevan et al. 2021; OWT8 in Spyrakos et al. 2018). In contrast, the northern basin (stations 9–11) exhibits elevated suspended sediments and lower Chla values (Appendix A), similar to OWT5 in Pahlevan et al. (2021) or OWT4 in Spyrakos et al. (2018).

Data

Although there is a long history of recorded in situ data in BPL (Swarbrick et al. 2019), we selected the period of 2014–2020 to match Landsat-8 and Sentinel-2 missions.

In situ Chla data

In situ Chla data originated from multiple datasets (Table 2). At station 1, measurements from autonomous fluorescence probes deployed on a buoy were available. These fluorometric measurements were calibrated following Chegoonian et al. (2022). In addition, discrete water samples were collected from the lake surface and 0.8-m depth, with Chla collected on Whatman GF/F filters, frozen, and later extracted following Wintermans and De Mots (1965) and analyzed using a UV-visible spectrophotometer (Shimadzu UV-1601-PC). Samples from station 2 were obtained from the water treatment plant intake at ∼3 m depth in this polymictic lake. Samples from the intake were filtered onto a 0.45-µm pore filter, extracted in 90% acetone, and analyzed via spectrophotometry following standard methods (Eaton et al. 2017).

Table 2. Details of in situ Chla measurements employed in this study.

Phytoplankton from station 3 was collected on GF/C glass-fiber filters (nominal pore size 1.2 µm) following Swarbrick et al. (2019). Briefly, surface water (∼0.5-m depth) and depth-integrated samples were filtered through GF/C filters and frozen (−10 °C) until analysis for Chla (mg m−3) through standard trichromatic assays (Jeffrey and Humphrey 1975) and biomarker pigment (nmoles pigment L−1) analysis by HPLC following Leavitt and Hodgson (2001).

Samples from stations 4 to 11 were collected during monthly field visits at a 1-m depth using a Niskin bottle. Sub-samples for Chla analysis were transferred into laboratory bottles, stored in dark, cool containers, and analyzed using method 10200H of Eaton et al. (2017). Briefly, samples were filtered at low vacuum through 0.45 µm nitrocellulose filters, and pigments were extracted using a 90% acetone solution by mixing. The resulting samples were steeped for <24 hours before Chla values were calculated following Jeffrey and Humphrey (1975).

Satellite images

Cloud-free Level-1C MSI images acquired by the Sentinel-2A/B satellites with a 2–3 day revisit time during the open water season were identified manually and downloaded for the period 2017–2020. The MSI sensor collects data in 13 spectral bands from 443 to 2190 nm at spatial resolutions of 10, 20, and 60 m, and with a 12-bit radiometric resolution (Li et al. 2017). In addition, cloud-free OLI Level-1 images from the Landsat-8 satellite (launched 2013) were downloaded for the period 2014–2020. The spatial resolution of the optical channels of OLI is 30 m, and the satellite overpasses the study site every ∼8 days. Appendix B compares the spectral configurations of MSI and OLI with Chla spectral reflectance, including reflectance spectra for samples with different Chla measured in BPL using an ASD spectrometer (Analytical Spectral Devices, ASD Inc., Boulder, CO, USA).

Methodology

A similar data analysis workflow was used for all analyses in this study, although algorithms (e.g., AC processors, Chla retrieval models) and train-test split approaches differed between experiments.

Data preprocessing

All images were corrected for atmospheric effects to produce two different reflectance quantities, namely satellite-derived remote sensing reflectance (Rrsδ) and Rayleigh-corrected reflectance (ρrc). We selected ACOLITE (v20210114.0) (Vanhellemont 2019; Vanhellemont and Ruddick 2014) and iCOR (version 3) (De Keukelaere et al. 2018) as AC processors since they outperform other processors in inland waters with OWTs similar to BPL (Pahlevan et al. 2021), especially when red-NIR wavelengths are used (Ilori et al. 2019). Visual inspection of images showed no significant sunglint effect in BPL; besides, a sunglint correction in the presence of adjacency effect (AE) may result in overcorrection (Vanhellemont 2019). iCOR applies the SIMilarity Environment Correction (SIMEC) algorithm (Sterckx et al. 2015) to reduce AE, which may be an issue for BPL due to its narrow width. Although ACOLITE lacks an inherent AE correction in the current version, a low threshold for the SWIR band (top-of-atmosphere reflectance at 1609 nm = 0.0215) was set to remove pixels highly impacted by AE and sunglint, as well as land pixels (Vanhellemont 2019). Furthermore, thanks to its dynamic band selection, the dark spectrum fitting (DSF) algorithm used in ACOLITE selects other bands (typically blue or red, which might be unaffected by AE from nearby dark vegetation) if the NIR/SWIR adjacency effects are severe (Vanhellemont 2019). Regardless of AC processor, all MSI spectral bands were then resampled to a 60-m grid to be consistent for further steps (Ansper and Alikas 2018). Analysis of model performance using images resampled at 10, 20, and 60 m resolution demonstrated that resampling at 60 m did not affect retrieval accuracy, yet improved time efficiency in model development.
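The SWIR screening step described above can be sketched as a simple per-pixel mask. This is a minimal illustration rather than ACOLITE's own implementation: the function names `water_mask` and `apply_mask` are hypothetical, the only value taken from the text is the 0.0215 threshold on top-of-atmosphere reflectance at 1609 nm, and the choice to reject pixels strictly above the threshold is an assumption.

```python
import numpy as np

SWIR_THRESHOLD = 0.0215  # rho_TOA at 1609 nm, as quoted in the text


def water_mask(rho_toa_swir, threshold=SWIR_THRESHOLD):
    """Return True where a pixel is kept (SWIR reflectance at or below threshold).

    Hypothetical helper: NaN pixels compare False and are therefore rejected.
    """
    rho = np.asarray(rho_toa_swir, dtype=float)
    return rho <= threshold


def apply_mask(band, mask):
    """Set rejected pixels of a reflectance band to NaN."""
    return np.where(mask, np.asarray(band, dtype=float), np.nan)


# Toy 2 x 2 scene: one bright (land or glint) pixel and one missing pixel.
swir = np.array([[0.010, 0.030], [0.020, np.nan]])
red = np.array([[0.004, 0.050], [0.006, 0.007]])
mask = water_mask(swir)
masked_red = apply_mask(red, mask)
```

Keeping the mask separate from the band data lets the same mask be reused for every spectral band of a scene.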

Optically deep waters are the focus of this study; hence, Chla samples for which Secchi Disk Depth (SDD) measurements were equal to bottom depth were excluded. This was to ensure that bottom reflection was avoided in our assessments. In situ samples (1394 station-day samples) were then collated with the closest matching satellite-derived Rrs products to create co-located Rrsδ−Chla matchups. The maximum time span between field sampling and image acquisition was 3 days (median = 0 days). While longer than the ±3 hours interval recommended for oceanic waters (Werdell and Bailey 2005), this value is much shorter than the interval needed (up to ±7 days) for reliable retrieval from optically-stable inland waters (Ansper and Alikas 2018; Dörnhöfer et al. 2018; Lunetta et al. 2015; Tang et al. 2003). However, to minimize potential mismatches in the sampling date, we used continuous Chla from the buoy to exclude matchups for which Chla at the time of satellite overpass differed from in situ values by >20%. Representative Rrsδ spectra for matchups were chosen to be the median of 3 × 3-element windows centered around the matchup locations.
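The matchup rules in this paragraph (3 × 3 median window, maximum 3-day offset, and the 20% buoy consistency check) can be sketched as follows. All function names are hypothetical, the treatment of NaN pixels and of windows truncated at the image edge is an assumption, and the relative difference is taken with respect to the in situ value.

```python
import datetime as dt

import numpy as np


def window_median(band, row, col, half=1):
    """Median of the (2*half+1)^2 window centred on (row, col), ignoring NaNs.

    Windows are truncated at the image edge (an assumption of this sketch).
    """
    band = np.asarray(band, dtype=float)
    r0, r1 = max(row - half, 0), min(row + half + 1, band.shape[0])
    c0, c1 = max(col - half, 0), min(col + half + 1, band.shape[1])
    window = band[r0:r1, c0:c1]
    if np.all(np.isnan(window)):
        return float("nan")
    return float(np.nanmedian(window))


def within_time_window(sample_date, image_date, max_days=3):
    """True if field sampling and image acquisition are at most max_days apart."""
    return abs((sample_date - image_date).days) <= max_days


def buoy_consistent(chla_at_overpass, chla_in_situ, max_rel_diff=0.20):
    """True if buoy Chla at overpass is within 20% of the in situ value."""
    return abs(chla_at_overpass - chla_in_situ) / chla_in_situ <= max_rel_diff


band = np.arange(25, dtype=float).reshape(5, 5)
centre = window_median(band, 2, 2)   # full 3 x 3 window
corner = window_median(band, 0, 0)   # truncated 2 x 2 window
```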

Both AC processors mask land and clouds automatically; however, we manually deleted matchups that were contaminated by thin clouds/haze and cloud shadow through a visual assessment of images. Both processors occasionally overcorrect for atmospheric effects, mostly due to aerosol contribution, resulting in negative reflectance, especially in the 443 and 490 nm bands. However, in this study, there were few instances of negative reflectance values (∼5%) and these were excluded after inspection. Finally, we implemented an outlier detection algorithm to remove samples whose Rrsδ deviated from the mean values of Rrsδ by more than ±3σ. Approximately 200 matchups (varies by sensor type and AC processor) were selected for algorithm development and evaluation. Hereafter, Rrsδ derived from ACOLITE and iCOR is denoted Rrsδ,ACL and Rrsδ,iCOR, respectively.
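A minimal sketch of the two screening steps above (negative reflectance, then the ±3σ rule) is given below. The text does not say whether the 3σ test is applied per band or to a summary of the spectrum; this sketch assumes a band-wise test in which a spectrum is dropped if any band fails. The function name `screen_spectra` is hypothetical.

```python
import numpy as np


def screen_spectra(rrs, n_sigma=3.0):
    """Boolean mask of spectra kept after both screening steps.

    rrs: array of shape (n_samples, n_bands).
    Step 1 drops spectra with any negative band.
    Step 2 drops spectra deviating from the band-wise mean by more than
    n_sigma standard deviations in any band (band-wise test is an assumption).
    """
    rrs = np.asarray(rrs, dtype=float)
    non_negative = np.all(rrs >= 0, axis=1)
    # Statistics are computed from the non-negative spectra only.
    mean = rrs[non_negative].mean(axis=0)
    std = rrs[non_negative].std(axis=0, ddof=1)
    within = np.all(np.abs(rrs - mean) <= n_sigma * std, axis=1)
    return non_negative & within


# Toy data: 30 regular spectra, one bright outlier, one negative spectrum.
base = np.tile([[0.010], [0.012]], (15, 4))
outlier = np.full((1, 4), 0.1)
negative = np.array([[-0.001, 0.01, 0.01, 0.01]])
rrs = np.vstack([base, outlier, negative])
keep = screen_spectra(rrs)
```

Note that a single pass of a 3σ rule needs a reasonably large sample to flag anything, because an extreme value inflates the standard deviation it is tested against.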

Model development

Input and output Chla values were log10-transformed in the SVR model (see Appendix C). We tolerated some outliers by setting the regularization parameter C = 2.5 to decrease the chance of overfitting. We also employed a Radial Basis Function (RBF) kernel with γ = 0.14 and 0.25 for MSI and OLI data, respectively, to handle non-linearity in the feature space. These hyperparameters (C, γ, and kernel type) were tuned using a grid-search cross-validation process that minimizes model error (mean absolute error) on a validation set. Here, the validation set was one-fifth of the training data that was repeatedly set apart for hyperparameter tuning. After identifying optimized hyperparameters, the validation set was merged with the whole training data and fed into the model for a final training process.
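The tuning procedure described above resembles a standard five-fold grid search, which can be sketched with scikit-learn. This is an illustration under assumptions, not the authors' code: the library choice, the candidate grid, the synthetic data, and the helper names `fit_svr` and `predict_chla` are all hypothetical, and only C = 2.5, γ = 0.14/0.25, the RBF kernel, the log10 transform, and the mean-absolute-error criterion come from the text.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVR

# Hypothetical candidate grid; it merely contains the values quoted in the text.
PARAM_GRID = {"C": [0.5, 2.5, 10.0], "gamma": [0.05, 0.14, 0.25, 0.5]}


def fit_svr(rrs, chla, param_grid=PARAM_GRID, seed=0):
    """Grid-search an RBF-kernel SVR on log10(Rrs) -> log10(Chla)."""
    X = np.log10(rrs)
    y = np.log10(chla)
    search = GridSearchCV(
        SVR(kernel="rbf"),
        param_grid,
        scoring="neg_mean_absolute_error",  # minimise mean absolute error
        cv=KFold(n_splits=5, shuffle=True, random_state=seed),
    )
    # refit=True (the default) retrains on all training data with the best
    # hyperparameters, mirroring the final training step described above.
    search.fit(X, y)
    return search.best_estimator_, search.best_params_


def predict_chla(model, rrs):
    """Back-transform predictions from log10 space to mg m^-3."""
    return 10.0 ** model.predict(np.log10(rrs))


# Synthetic seven-band spectra, for illustration only.
rng = np.random.default_rng(0)
chla = 10.0 ** rng.uniform(0.0, 2.0, size=120)
exponents = np.linspace(-0.3, 0.3, 7)
noise = 10.0 ** (0.02 * rng.standard_normal((120, 7)))
rrs = 0.01 * chla[:, None] ** exponents * noise

model, params = fit_svr(rrs, chla)
pred = predict_chla(model, rrs)
```

Scoring on log10-transformed Chla keeps the search from being dominated by the few very high bloom values.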

Using Rrsδ−Chla matchups, we calibrated several state-of-the-art empirical Chla retrieval algorithms for use with MSI and OLI spectral bands, namely OC3 (O'Reilly et al. 1998; O'Reilly and Werdell 2019), 2band (O'Reilly et al. 1998), 3band (Dall’Olmo and Gitelson 2005), and NDCI (Mishra and Mishra 2012) for MSI, and OC3, as well as FLH-blue (Beck et al. 2016) for OLI. Although OC3 was originally developed for clear oceanic waters, this model is commonly used as a benchmark for Chla retrieval in inland waters (e.g., Pahlevan et al. 2020). After a log10 transformation, these differential/ratio-based indices implied a linear relationship with log10-transformed Chla. The exceptions were 2band and 3band, for which we added a power-of-two term to better fit the data. The tuned formula and coefficients for empirical models are presented in Appendix D.
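As an illustration of how such indices are computed and locally tuned, the sketch below uses the commonly published forms of the red-NIR indices and a polynomial fit in log10 space. The MSI band centres (665, 705, and 740 nm), the function names, and the synthetic coefficients are assumptions; the tuned formulas actually used in the study are those of Appendix D, which is not reproduced here. The fit is shown for the 2band ratio only, because it is always positive; this section does not state how negative index values are handled before the log10 transform.

```python
import numpy as np


def two_band(r665, r705):
    """NIR-to-red reflectance ratio."""
    return r705 / r665


def three_band(r665, r705, r740):
    """Three-band index: (1/R_red - 1/R_rededge) * R_nir."""
    return (1.0 / r665 - 1.0 / r705) * r740


def ndci(r665, r705):
    """Normalized Difference Chlorophyll Index."""
    return (r705 - r665) / (r705 + r665)


def fit_log_poly(index, chla, degree):
    """Least-squares polynomial of log10(Chla) on log10(index)."""
    return np.polyfit(np.log10(index), np.log10(chla), degree)


def predict_log_poly(coeffs, index):
    """Evaluate the fitted polynomial and back-transform to mg m^-3."""
    return 10.0 ** np.polyval(coeffs, np.log10(index))


# Synthetic 2band values with a known quadratic relation in log space.
index = np.array([0.8, 1.0, 1.5, 2.0, 3.0])
log_index = np.log10(index)
chla = 10.0 ** (1.0 + 1.2 * log_index + 0.3 * log_index**2)
coeffs = fit_log_poly(index, chla, degree=2)  # degree 2 adds the squared term
chla_hat = predict_log_poly(coeffs, index)
```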

We also applied MDN and BST models as representatives of state-of-the-art ML models developed for MSI and OLI. MDN was implemented using the code available via https://github.com/STREAM-RS/STREAM-RS (Pahlevan et al. 2020; Smith et al. 2021). In addition, we implemented a locally trained MDN (LMDN) using local Rrsδ−Chla matchups. A similar process was conducted for the BST model (Cao et al. 2020) using the BST-OLI package (https://github.com/zgcao/bst_oli) and a locally trained XGBoost model, LBST. The reflectance spectra imported into these LML models (LMDN and LBST) were identical to our SVR model; i.e., Rrsδ derived from the first seven and four spectral bands (400–800 nm) for MSI and OLI, respectively.

Model assessment

Chla retrievals were assessed from three different aspects: quantitative performance, spatial integrity, and temporal validity. We also examined the robustness of our proposed model under various scenarios, including changes in water type, AC processors and radiometric products, and remote sensing data types.

For the quantitative assessment, the MSI datasets were selected as the main data source. Matchups were split into training and test datasets for the following experiments to estimate general and stratified performance, spatial and temporal integrity, model sensitivity to sensors and AC processors, and model transferability; however, the splitting method differed among the experiments to ensure a complete assessment of our model. The evaluation approach (training-test splitting) and the number of training/test matchups available therefore vary by experiment.

Assessment of general performance (section General performance) and model transferability (section Model transferability over water types) was based on a cross-validation approach in which the matchups were categorized either annually or geographically (southern/northern basins). In each run, Rrsδ−Chla matchups related to a single year (or basin) were put aside as test data before the model was trained with the remaining data and used to assess model performance.
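The leave-one-year-out (or leave-one-basin-out) scheme described above can be sketched as a small generator of train/test index sets. The helper name `leave_one_group_out` and the example labels are hypothetical.

```python
import numpy as np


def leave_one_group_out(groups):
    """Yield (group, train_idx, test_idx), holding out one group per run.

    groups: one label per matchup, e.g. acquisition year or basin name.
    """
    groups = np.asarray(groups)
    for g in np.unique(groups):
        test = groups == g
        yield g, np.flatnonzero(~test), np.flatnonzero(test)


# Hypothetical year label for six matchups.
years = [2017, 2017, 2018, 2019, 2019, 2020]
splits = list(leave_one_group_out(years))
for year, train_idx, test_idx in splits:
    # A model would be trained on train_idx and evaluated on test_idx here.
    pass
```

Because whole years (or basins) are withheld, the test data never share an acquisition season (or water type) with the training data, which is a stricter check than a random split.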

To gain insight into the model performance in two eutrophic conditions (OWTs; stratified performance hereafter; section Stratified performance), model sensitivity to the two AC processors (section Model sensitivity to AC and radiometric products), and its robustness for each sensor (section Model sensitivity to sensor type), we used a 5-fold cross-validation approach to randomly select among Rrsδ−Chla matchups. This approach ensures sufficient, equal training/test data for each run.

Assessment of model capability in generating Chla maps (section Spatial integrity) using both MSI and OLI images was based on images from a single date (July 16, 2020) when we had cloud-free images from both sensors (∼10 minutes apart) and the maximum number of coincident (within 2 hours) in situ Chla samples (nine total), spanning a broad range of Chla (∼10–100 mg m−3). The corresponding matchups were considered equivalent to unseen test data, and the models were trained with the remaining matchups (184 matchups for MSI and 169 for OLI). In addition, to assess the stability of Chla retrieval over time (section Temporal validity), MSI-derived Rrsδ−Chla matchups corresponding to the continuous measurements of the buoy in 2020 were considered as unseen test data, and the remaining matchups were used to train the models.

Accuracy metrics

Both linear and log10-transformed metrics were examined to assess model accuracy. In general, metrics calculated in log10-transformed space (i.e., RMSLE, SSPB, and MdSA) are believed to provide a better assessment due to the log-normal distribution of Chla (O'Reilly and Werdell 2019; Seegers et al. 2018). The performance metrics for accuracy assessment were estimated as follows:

$$\mathrm{RMSE} = \left[ \frac{1}{N} \sum_{i=1}^{N} (P_i - M_i)^2 \right]^{1/2} \quad (\mathrm{mg\ m^{-3}}) \tag{1}$$

$$\mathrm{RMSLE} = \left[ \frac{1}{N} \sum_{i=1}^{N} \left( \log_{10}(P_i) - \log_{10}(M_i) \right)^2 \right]^{1/2} \tag{2}$$

$$\mathrm{MAPE} = 100 \times \mathrm{median}\left( \frac{|P_i - M_i|}{M_i} \right) \tag{3}$$

$$\mathrm{SSPB} = 100 \times \mathrm{sign}(z) \left( 10^{|z|} - 1 \right), \quad z = \mathrm{median}\left( \log_{10}(P_i / M_i) \right) \quad (\%) \tag{4}$$

$$\mathrm{MdSA} = 100 \times \left( 10^{y} - 1 \right), \quad y = \mathrm{median}\left| \log_{10}(P_i / M_i) \right| \quad (\%) \tag{5}$$

where $P_i$ and $M_i$ stand for predicted and measured Chla, respectively, and $N$ is the number of matchups. RMSE is the root mean squared error, RMSLE is the root mean squared log-error, MAPE is the median absolute percentage error, SSPB represents symmetric signed percentage bias, and MdSA is the median symmetric accuracy, computed in log-space (Morley et al. 2018).

SSPB and MdSA are expressed as percentages (%) and are expected to be resistant to outliers, zero-centered, and easily interpretable (Pahlevan et al. 2020). While SSPB measures the bias of a model, MdSA is believed to be an indicator of its precision. Because SSPB and MdSA are relatively new indices, we also estimated RMSE, RMSLE, and MAPE to facilitate the comparison with earlier studies. Finally, models were evaluated using Slope and Model Win Rate (MWR) criteria, wherein Slope was used to compare the results with earlier studies, while MWR, expressed in %, was used to determine which model performed better in pair-wise comparison of the residuals (Seegers et al. 2018).
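The five accuracy metrics defined in this section translate directly into a few lines of NumPy. The sketch below follows the definitions given here; the function names are illustrative, inputs are assumed to be strictly positive Chla values, and Slope and MWR are not covered.

```python
import numpy as np


def rmse(p, m):
    """Root mean squared error (mg m^-3)."""
    p, m = np.asarray(p, float), np.asarray(m, float)
    return float(np.sqrt(np.mean((p - m) ** 2)))


def rmsle(p, m):
    """Root mean squared error of log10-transformed values."""
    p, m = np.asarray(p, float), np.asarray(m, float)
    return float(np.sqrt(np.mean((np.log10(p) - np.log10(m)) ** 2)))


def mape(p, m):
    """Median absolute percentage error (%)."""
    p, m = np.asarray(p, float), np.asarray(m, float)
    return float(100.0 * np.median(np.abs(p - m) / m))


def sspb(p, m):
    """Symmetric signed percentage bias (%)."""
    p, m = np.asarray(p, float), np.asarray(m, float)
    z = np.median(np.log10(p / m))
    return float(100.0 * np.sign(z) * (10.0 ** abs(z) - 1.0))


def mdsa(p, m):
    """Median symmetric accuracy (%)."""
    p, m = np.asarray(p, float), np.asarray(m, float)
    y = np.median(np.abs(np.log10(p / m)))
    return float(100.0 * (10.0 ** y - 1.0))


# Toy example: one exact retrieval, one doubled, one halved.
predicted = [10.0, 20.0, 40.0]
measured = [10.0, 10.0, 80.0]
```

In the toy example the doubled and halved retrievals cancel in SSPB (no net bias) but both count as a factor-of-two error in MdSA, which illustrates why the two metrics are reported together.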

Results

Quantitative assessment of the model on MSI data

Quantitative assessments were conducted using both general and stratified performance. Here, general performance analysis employed all matchups, whereas stratified analysis was conducted separately on two OWTs and provides insights into the use of SVR models in eutrophic conditions.

General performance

The overall accuracy of models for retrieving Chla from MSI-derived Rrsδ,ACL values was computed over all stations and the whole Chla range (∼1–125 mg m−3; Figure 2, Table 3). Results show that LML models (SVR, LMDN) significantly outperformed (>15% improvement in MdSA) all other empirical and GML models. In particular, SVR outperformed all empirical models as reported via MWR, representing >60% of retrievals. Compared to LMDN, SVR performed marginally better (∼3% improvement in MdSA) but returned equal estimates of bias (as SSPB). The slope for SVR (0.78) demonstrates reasonable performance through the whole range of Chla in BPL. Among other models, the performance of OC3 was poor, as expected because of its dependency on blue-green band ratios, while other empirical models for eutrophic waters (2band, 3band, and NDCI) performed better and similarly in BPL, with the 2band algorithm generally outperforming other empirical models.

Figure 2. Matchup analysis of Chla derived from different algorithms applied on MSI-derived Rrsδ,ACL data and near-coincident, co-located in situ Chla samples in BPL. Year of data acquisition indicated by colored solid circles.

Table 3. Evaluation metrics (general performance) for Chla retrieval models on MSI and in situ Chla matchups (N = 193).

The MDN model, trained on global Rrs data, exhibited comparable precision to empirical models (∼56% error), albeit with a high bias (SSPB = 28%) and a tendency to overestimate Chla (Slope = 1.23) reflecting its sensitivity to Rrsδ. The LMDN showed good performance, implying MDN’s strong performance even with a relatively small training sample size (∼10% of matchups used by Pahlevan et al. 2020).

Visual inspection of scatter plots revealed that SVR and LMDN predictions were less biased than other models when based on annual sampling. Scatter plots of Chla retrievals also illustrated a reasonable overall performance of all models (except OC3) for Chla > 10 mg m−3 (MdSA¯ = 39.9 ± 10.32%). However, retrieving Chla < 10 mg m−3 was less accurate (MdSA¯ = 57.8 ± 6.95%), with most models overestimating Chla in this lower range (SSPB¯ = 46.8 ± 12.1%). Nonetheless, SVR and LMDN models exhibited a substantially better performance (Slope¯ = 0.47 ± 0.04) when compared to the other models (Slope¯ = −0.19 ± 0.09). While all models failed to retrieve Chla < 2 mg m−3, the scarcity of data in this range (2 matchups) prevented a detailed evaluation of performance. Instead, we infer from the scatter plots that empirical models underestimated Chla > 30 mg m−3 (SSPB¯ = −43.4 ± 7.1%), especially values >100 mg m−3, while ML models (SVR and LMDN) did not (SSPB¯ = −7.2 ± 5.2%), possibly because the latter use at least four additional MSI spectral bands.

Stratified performance

Analysis of stratified performance (OWT1 vs. OWT2) suggests that SVR significantly outperformed all other algorithms in the southern basin, which is almost 80% of the lake area (Table 4). SVR also excelled relative to other algorithms in the northern basin, when considering most performance metrics, including MdSA and MWR. However, LMDN performance was comparable to that of SVR in the northern basin and even surpassed it in terms of SSPB and Slope. The reasonable performance of LMDN with few data (e.g., in the northern basin with only 45 samples for training) was unexpected; however, results should be treated with caution due to the low availability of test data (11 samples in each run). Scatter plots in Appendix D further demonstrate that empirical models failed to estimate Chla in the northern basin (Slope < 0.1), while LML models provided better estimates of Chla in turbid water (Slope = 0.3).

Table 4. Evaluation metrics for Chla retrieval models on MSI and in situ Chla matchups based on water type.

Except for OC3, Chla retrieval was more accurate (15–50% improvement in MdSA) in the southern basin than in the northern basin. The higher concentration of suspended sediments and NAP in the northern basin, which leads to stronger interference with the Chla signal by NAP backscattering, particularly at longer wavelengths (red-NIR), likely explains the lower accuracy of Chla retrieval at that location. This pattern may also explain the higher accuracy of OC3 in the northern basin, given that it was the only model that did not use red-NIR bands.

Model sensitivity to AC and radiometric products

Model performance was assessed over two different AC processors (ACOLITE, iCOR) and three radiometric products (Rrsδ, ρrc, and ρTOA) applied to MSI data (Figure 3). While ACOLITE provided all three products, iCOR only returns Rrsδ. Overall, SVR and LMDN produced robust outputs for both AC processors and all the radiometric products (MdSA¯ = 43.7 ± 3.7%). In contrast, the mean variability for empirical models was almost 2-fold greater (±7.8%) than for these models, with a maximum for OC3 (±14.8%) and a minimum for 2band (±3.5%). SVR-Rrsδ,ACL exhibited the best performance of all combinations of retrieval models and AC processors. SVR’s superiority was also evident when employing Rrsδ,iCOR or ρrc, with only ρTOA showing comparable results to those obtained with LMDN (<2% difference).

Figure 3. Median Symmetric Accuracy (MdSA) for Chla retrieval algorithms when applied to MSI-A/B data processed to produce different radiometric products (Rrsδ, ρrc, and ρTOA) with different AC processors (ACOLITE and iCOR). Note that ρrc is generated with ACOLITE and theoretically is not different when using iCOR. N is the total number of matchups. See Table D1 for the detailed training/test split process.

No single AC processor or radiometric product performed best in all Chla retrieval models. For example, OC3 and 3band worked better with iCOR as the AC processor, while the others (2band, NDCI, LMDN, SVR) all presented better results with ACOLITE. For these latter models, Rrsδ displays the highest accuracy compared to the other products (ρrc, ρTOA), suggesting that ACOLITE outperformed iCOR whenever it successfully carried out aerosol correction (ρrc → Rrsδ). Our results also show that Rayleigh correction (ρTOA → ρrc) as implemented in ACOLITE reduced Chla retrieval accuracy except for OC3, confirming that this procedure over-corrects reflectance in red-NIR wavelengths while remaining suitable for use with blue-green bands. On the other hand, declining accuracy after aerosol correction in OC3 applications indicates that the AC processors failed to accurately remove aerosol effects in blue-green bands, a task that has proven to be challenging elsewhere (Pahlevan et al. 2021).

Model transferability over water types

Model transferability over two OWTs in BPL was assessed using Rrsδ,ACL–Chla matchups derived from MSI images (see section Model assessment; Figure 4). All empirical algorithms (OC3, NDCI, 2band, and 3band) failed to retrieve Chla when they were trained by matchups from a different, but similar, OWT (MdSA > 100%, Slope < 0.2). Additionally, LMDN showed poor transferability over both water types (MdSA > 200%, Slope < 0.2). In contrast, SVR maintained a reasonable transferability over two OWTs (MdSA = 61%, Slope = 0.35) compared to alternate models. Although the error and bias increased ∼2- to 4-fold compared to instances where both OWTs were used to train the SVR model (MdSA = 61 vs. 36% and SSPB = 15.8 vs. 3.4%) (see section General performance), they remained within an acceptable range for many applications. SVR’s high transferability might be related to its proven resistance to overfitting, thanks to the regularization parameter C.

Figure 4. Scatter plot of in situ Chla versus predicted Chla from MSI-A/B images. Chla values in the northern basin (OWT2, red solid circles) are predicted using a model trained with southern basin matchups (OWT1, blue solid circles) and vice versa.

Model sensitivity to sensor type

Matchups of Rrsδ,ACL–Chla derived from OLI images were employed to retrieve Chla in BPL. LMDN outperformed SVR in most metrics when using OLI data, by ∼5% in MdSA and with a 2-fold greater Slope (Table 5). MDN displayed an overall error of 95% and a bias of ∼50%, reflecting the training of this global model with in situ Rrs rather than Rrsδ. Additionally, even though OLI lacks spectral bands at red-edge wavelengths, a red-NIR empirical model (FLH-blue) outperformed the blue-green-based index of OC3 by ∼10%. A global BST model (Cao et al. 2020) failed to estimate Chla in BPL (results not shown here), similar to what has been observed elsewhere (Smith et al. 2021), likely due to much lower CDOM absorption in BPL compared to the waterbodies that were used to train BST (aCDOM(440)¯ = 0.28 m−1 vs. aCDOM(440)¯ = 0.8–1 m−1). Finally, the LBST model exhibited poor performance (MdSA = 77%), possibly because boosting algorithms degrade in the presence of outliers and errors in training data (Li and Bradic 2018).

Table 5. Evaluation metrics for Chla retrieval models on OLI and in situ Chla matchups (N = 178).

Overall, Chla retrieval using OLI data (Table 5; MdSA¯ = 71.3 ± 13.2%) appeared less accurate than that based on MSI data summarized in Table 3 (MdSA¯ = 55.2 ± 19.3%). OLI’s poor performance was also inferred from low Slope (< 0.5), likely due to the absence of a red-edge band. Similar to MSI, LML models exhibited better performance than empirical and GML models when applied to OLI data. The analysis of scatter plots also revealed that all models failed to estimate Chla values <10 mg m−3 and concentrations >100 mg m−3. Although the former limitation was also observed when using MSI data (see section General performance), the latter might be intensified because OLI does not possess a spectral band in the domain of Chla fluorescence (680–710 nm).

Spatial integrity

Chla maps for BPL were generated from an MSI image taken on July 16, 2020 (Figure 5). All model-processor combinations suggested Chla as low as ∼10 mg m−3 in the north basin, whereas some models/processors (e.g., SVR-iCOR) predicted Chla values up to ∼100 mg m−3 in the south basin. Regardless of the AC processor used, ML models (SVR and LMDN) seem to deliver overall smoother maps (less noise) compared to the 2band output, probably because they leverage all spectral bands.

Figure 5. Chla maps for BPL derived from different retrieval algorithms/AC processors couples applied on MSI-A image acquired on July 16, 2020. The markers in the insets represent examples of the location of in situ data, collected on the same date, and employed as unseen test data. The color bars and associated numbers beside the markers show estimated Chla concentration in mg m−3. In situ Chla concentration in points A, B, and C are 41.2, 66.2, and 102.8 mg m−3, respectively. 2band was used as the best representative of empirical models.

Visual comparison of Chla maps based on near-coincident in situ measurements revealed that the SVR model, coupled with the iCOR processor, had the highest consistency with in situ measurements (Figure 5). Although all models/processors showed a reasonable and similar performance in mapping moderate Chla concentrations (Figure 5, upper insets), they differed more substantially in estimating high Chla values at the south of the lake. SVR tended to estimate higher Chla concentrations than did LMDN and 2band models, regardless of AC processors (lower insets in Figure 5). SVR-iCOR also seemed to be more capable of detecting high spatial gradients in Chla, as it is the only combination to capture large gradients of Chla at two nearby stations (Chla = 66.2 to Chla = 102.8 mg m−3; lower insets in Figure 5). Such high-frequency changes in Chla may be related to the surface patchiness of cyanobacteria.

SVR results appeared more prone to mixed-pixel effects than those of LMDN and 2band models. Although this effect was limited to 1–2 pixels close to the shoreline, this issue should be treated with caution when producing maps of nearshore Chla. Similarly, although the lake is very eutrophic (SDD¯<1 m), SDD measurements across the lake show that a small portion of the lake area in the north basin can be considered as optically shallow waters, mostly in very early or late summer (i.e., May or October). Consequently, the elevated Chla estimates produced for the northern basin by models are probably influenced by very shallow depths (<2 m) or a high density of rooted aquatic macrophytes. While maps were produced using MDN and empirical models, none outperformed the above-mentioned models. For instance, MDN returned some unrealistically high Chla values, and OC3 routinely and significantly underestimated Chla.

As it is sometimes more important to reconstruct spatial patterns of Chla than to accurately estimate absolute concentrations, we normalized the predicted Chla vector of unseen matchups for stations 4–11 (a longitudinal transect along the lake) by dividing by the vector norm, to better evaluate which algorithms recorded spatial patterns of Chla in BPL (Figure 6). Overall, normalization did not reveal a single superior model/processor in terms of retrieving spatial gradients of Chla. While SVR-iCOR provided the most similar pattern to measured Chla gradients in the northern basin (stations > 8), SVR-ACOLITE demonstrated good performance in retrieving Chla changes in the southern stations 5–8. In contrast, the 2band model performed well at stations 4–5, whereas LMDN performed poorly at stations 4–6 and 10–11. Together, these patterns suggest that SVR showed the highest overall capability in retrieving the Chla gradient along the lake.
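The normalization step can be illustrated with a short sketch. The text specifies only division by "the vector norm"; the Euclidean (L2) norm used below is an assumption, as are the function name and the example values.

```python
import math


def normalize_by_norm(values):
    """Scale a Chla transect so that its Euclidean (L2) norm equals 1.

    Assumption: the L2 norm is used; the source does not state which norm was applied.
    """
    norm = math.sqrt(sum(v * v for v in values))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero vector")
    return [v / norm for v in values]


if __name__ == "__main__":
    # Hypothetical Chla values (mg m-3) along a transect, for illustration only.
    in_situ = [40.0, 65.0, 100.0, 30.0]
    predicted = [20.0, 32.5, 50.0, 15.0]  # biased low by a factor of two
    print(normalize_by_norm(in_situ))
    print(normalize_by_norm(predicted))
```

A prediction that differs from the measurements only by a constant multiplicative bias yields the same normalized profile, which is why normalized profiles isolate the spatial pattern from absolute accuracy.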

Figure 6. Spatial profile of normalized Chla along the lake (south to north) for July 16, 2020, derived from in situ measurement Chla (solid line) as well as predicted Chla from algorithms applied on MSI image (dashed lines). X-axis denotes station number (see Figure 1).

We also mapped Chla over the lake using OLI data for the same date (July 16, 2020) using FLH-Blue, LBST, LMDN, and SVR models (Figure 7). Maps from LBST and LMDN were markedly noisy, even though LMDN showed reasonable quantitative performance for OLI data (Table 5), whereas FLH-Blue and SVR generated smooth maps. The SVR model exhibited more consistency with in situ data (marked points in Figure 7), while LMDN retrieved Chla values higher (120 mg m−3) than observed in situ, and the other algorithms underestimated Chla. In terms of reconstructing the spatial pattern of Chla, LMDN seems to provide the best performance, consistent with its higher Slope (Slope = 0.45; Table 5). Appendix E also shows more examples of the produced Chla maps for BPL as well as a pixel-by-pixel comparison of MSI- and OLI-derived Chla in same-date images over BPL.

Figure 7. Chla map for BPL derived from different algorithms applied on OLI image acquired on July 16, 2020. The markers in the insets represent examples of in situ data, collected on the same date, and employed as unseen test data. The color bars and associated numbers beside the markers show estimated Chla concentration in mg m−3. In situ Chla concentration in points A, B, and C are 41.2, 66.2, and 102.8 mg m−3, respectively.

Temporal validity

Robust retrieval of Chla over time is a daunting task in a eutrophic waterbody due to high variations in surface bloom densities, resultant water optics, and atmospheric conditions. Comparison of SVR-iCOR, SVR-ACOLITE, LMDN-ACOLITE, and 2band-ACOLITE processing couples at station 1 in BPL revealed that SVR-iCOR tracked in situ Chla measurements better than the other model/processor combinations (Figure 8), with particularly good capture of intense summer blooms in July to September. Although none of the models accurately captured the peak of Chla (>100 mg m−3) over the investigated period, SVR-iCOR followed the shape and magnitude of the measured time series with a ∼15% underestimation of peak Chla values. In contrast, couples based on ACOLITE failed to deliver consistent Chla values on August 7, 2020, when cloud shadow contaminated images. For more moderate Chla concentrations (20–60 mg m−3), SVR-ACOLITE displayed better performance than SVR-iCOR. Overall, a correlation analysis between the time series of measured and predicted Chla showed that SVR-iCOR (ρ = 0.798) outperformed other models (ρ = 0.684–0.728) in retrieving Chla time series.
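A correlation between measured and predicted time series of this kind can be sketched as below. The text does not state which correlation coefficient was used; the Pearson coefficient shown here is an assumption (a rank-based coefficient would be an equally plausible reading of ρ), and the function name and example values are hypothetical.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical same-date Chla values (mg m-3), for illustration only.
    measured = [15.0, 42.0, 88.0, 105.0, 60.0]
    predicted = [20.0, 38.0, 70.0, 90.0, 66.0]
    print(round(pearson_correlation(measured, predicted), 3))
```

Note that a correlation coefficient rewards agreement in the shape of the series and is insensitive to a constant bias, so it complements rather than replaces SSPB and MdSA.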

Figure 8. Time series of Chla in station 1 in BPL for summer 2020, derived from in situ measurement Chla (solid line) as well as predicted Chla from algorithms applied to MSI images.

Discussion

Analysis of Landsat-8 (OLI) and Sentinel-2 (MSI) images using locally trained machine-learning models, particularly those based on SVR, provided robust retrieval of Chla for a small eutrophic lake (sections General performance and Stratified performance). These models also generated realistic annual time series and spatial gradients of Chla at scales appropriate to the prairie lake (sections Spatial integrity and Temporal validity). Overall, these models were robust to variations in AC processors (ACOLITE vs. iCOR) and sensor types (MSI vs. OLI). Together, our analysis suggests that pre-trained SVR models may provide important information on spatial and temporal patterns of water quality and HABs in regional lakes, provided that optical water types and atmospheric conditions are similar. However, because the results here are based on a single lake, the presented model should be investigated further in other regional lakes.

Uncertainties in Chla and radiometric data

Although we attempted to reduce the uncertainties associated with in situ Chla data, any comparison of remotely sensed images and discrete lake measurements can be complicated due to the high variability of in situ data (Clay et al. 2019; Qiu et al. 2021). Here, we tried to reduce random noise in Chla measurements by conducting each measurement several times and averaging values. However, our in situ data originated from different laboratories using contrasting measurement techniques (field fluorometry, laboratory spectrophotometry, HPLC), instrumentation, calibration, and field sampling (surface 1 m vs. depth-integrated). While these factors may affect model performance, they also suggest that our algorithms exhibit minimal overfitting and systematic errors in performance assessment, and may be generalizable to other regional lakes.

Several lines of evidence suggest that potential outliers and other uncertainties in Chla measurements did not alter the results of comparative assessment of retrieval models. First, we used the median symmetric accuracy (MdSA) as the main metric to compare the models, as it is highly robust to potential outliers in in situ Chla measurements. Second, we conducted various experiments with different numbers and combinations of matchups, and in all cases, SVR showed robust and similar results, meaning that uncertainties in lake production do not substantially alter results. Moreover, given that BPL is well mixed vertically (Dröscher et al. 2008), we expect that differences in sampling protocols may not greatly affect our findings. Finally, earlier studies suggest that SVR models can handle diverse, highly uncertain datasets because they use only a part of the data (support vectors) for learning (Chegoonian et al. 2017; Chegoonian et al. 2021; Foody and Mathur 2006; Hu et al. 2021; Nikparvar and Thill 2021). Handling uncertainties of in situ data becomes crucial when input data to observatory systems originate from diverse field and laboratory sources.

Comparable Chla retrieval accuracy obtained from 10, 20, and 60 m MSI data supported the use of 60 m data, which substantially (>5 times) reduced the time needed to develop a reliable model. This finding is consistent with studies that utilized 60 m resampled data (Ansper and Alikas 2018) and averaged pixel values over an equal-sized window (Pahlevan et al. 2020; Werther et al. 2022). Although the use of 60 m data may increase the likelihood of having mixed pixels in our model, lower spatial resolution also reduces random noise arising from very fine resolution (10–20 m) imaging of aquatic environments. Moreover, our sampling stations (Figure 1) were on the central axis of the lake, at least 500 m from shore, which eliminates the possibility of mixed aquatic-terrestrial pixels even with 60 m data. Our field observations of bloom formation in the lake during the summers of 2014–2021 also indicate that very small-scale patchiness of phytoplankton blooms is rare, consistent with the similar performance of models based on 10, 20, and 60 m MSI data.

Merits of locally trained ML models

When compared with traditional empirical models (e.g., OC3, 2band), LML models exhibit several clear advantages, particularly with regard to SVR models. First, their ability to leverage all spectral bands and to learn and model diverse uncertainties (in situ data, non-linearity, non-Chla constituents) is an advantage over traditional empirical/physical models and led to a 15–65% error reduction. Such performance might be improved further when using models such as LMDN, which can deal with ill-posed problems (Pahlevan et al. 2020).

Currently, the uncertainties in AC processors are the major hurdle for employing GML models in inland waterbodies (Pahlevan et al. 2020). These models are often trained with in situ radiometric measurements and can be degraded when fed by satellite-derived measurements. LML models that can learn AC uncertainties (Rrsδ) specific to a lake of interest may be an important solution for application to local and regional resource management issues, such as blooms of toxic cyanobacteria near recreational areas or drinking water inlets. Meanwhile, the development of global models based on satellite-derived reflectance or including ancillary data may provide an opportunity to expand the geographic range of applications of ML models (Smith et al. 2021).

Presently, the need for substantial training data is a major obstacle to the development of local ML models. Fortunately, here we demonstrate that LML models (SVR and LMDN) were trainable with ∼200 matchups (section General performance), while the stability of the results even with only ∼50 matchups was encouraging (section Stratified performance), as many regional agencies in Europe and North America conduct routine monitoring (e.g., Soranno et al. 2017). Ideally, such locally trained models should possess reasonable generalization to retrieve reliable Chla in nearby lakes where optical conditions, water type, and atmospheric conditions differ only slightly. Our results suggest that SVR models exhibit adequate transferability when trained and tested with two different (but similar) water types in BPL (section Model transferability over water types). Although this capability is in agreement with SVR resistance to overfitting (Kwiatkowska and Fargion 2003; Mountrakis et al. 2011; Zhan et al. 2003), it is still essential to further validate our results using a more consistent and systematically collected/calibrated in situ Chla dataset.

This study was also the first independent assessment of the global MDN model in a small eutrophic lake. Although MDN is not expected to outperform locally trained models, it showed errors within ∼60% of in situ measurements. Nonetheless, MDN tended to significantly overestimate Chla (high bias) relative to locally trained, Rrsδ-fed models. Substantial uncertainties in the AC process, which can be seen in drastically different Rrsδ distributions from ACOLITE and iCOR, or low performance of the model with respect to spectral ambiguities, may explain MDN overestimation.

Atmospheric correction

Algorithms developed to retrieve downstream products, such as Chla, should always exhibit consistent performance with different intermediate processors, specifically AC processors. Here, we demonstrate the robustness of the SVR model when data are processed using ACOLITE and iCOR, and with three different radiometric products (Rrsδ, ρrc, and ρTOA). The fact that ρrc and ρTOA exhibit reasonable results, especially when using red-NIR bands, is in agreement with findings from previous studies (Matthews et al. 2012; Matthews and Odermatt 2015; Wynne et al. 2010) and can support using atmospherically uncorrected data commonly available on a global scale (e.g., Google Earth Engine). Furthermore, our results show that the accuracy of Chla estimates was generally greatest when using Rrsδ for retrieval models, other than those based on OC3 and 3band, for which ρrc generated more accurate products. We also note that empirical algorithms using blue-green bands (e.g., OC3) significantly benefited from Rayleigh correction for blue-light scattering. While Rayleigh correction did not appear to increase the accuracy of the models that were based on red-NIR bands, further work is needed to confirm this finding.

Modeling results were consistent with those of Pahlevan et al. (2021), who recently conducted a comprehensive comparison between AC processors in retrieving Rrs using an extensive global dataset. For example, we observed that 2band and NDCI, two algorithms that use only the 665 and 704 nm bands, performed better when coupled with ACOLITE than with iCOR. Similarly, OC3 and 3band models that use blue (443 or 492 nm) and 740 nm bands showed better performance with iCOR when compared to ACOLITE (Figure 3). We interpret the high consistency between the assessments of downstream products (Chla concentration) and satellite-derived reflectance as an indicator of the effect of the AC process on the accuracy of downstream products. However, we also recognize that further examination of the effectiveness of AC will require a separate estimate of retrieval uncertainty from the AC process, an assessment that needs field radiometric measurements, which were not sufficiently available in our study.
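For orientation, the two red-edge indices mentioned above can be sketched from the 665 and 704 nm bands as follows. The forms shown are the commonly used definitions of a red-edge/red band ratio and of the Normalized Difference Chlorophyll Index; they are assumed here rather than quoted from this section, and the locally tuned regression that converts each index to Chla is not reproduced. Function names and reflectance values are hypothetical.

```python
def two_band_ratio(rrs_665, rrs_704):
    """Red-edge to red band ratio, Rrs(704) / Rrs(665) (assumed standard form)."""
    return rrs_704 / rrs_665


def ndci(rrs_665, rrs_704):
    """Normalized difference of the red-edge and red bands (assumed standard form)."""
    return (rrs_704 - rrs_665) / (rrs_704 + rrs_665)


if __name__ == "__main__":
    # Hypothetical reflectance values (sr-1) for one pixel, for illustration only.
    rrs_665, rrs_704 = 0.010, 0.030
    print(two_band_ratio(rrs_665, rrs_704))
    print(ndci(rrs_665, rrs_704))
```

Both indices depend only on the ratio r = Rrs(704)/Rrs(665), since NDCI = (r − 1)/(r + 1), which is consistent with the two models responding similarly to the choice of AC processor.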

Comparisons among experiments in this study, as well as drastically different Rrsδ distributions derived from ACOLITE and iCOR, suggest that different AC processors may lead to significant differences in retrieval performance. Thus, the algorithms for retrieval of downstream products should be examined as coupled retrieval model/AC processor systems. For instance, the SVR model shows greater accuracy when used in conjunction with ACOLITE (Figure 3), and more temporal stability when using iCOR as the AC processor (Figure 8). However, the comparison between ACOLITE and iCOR is not entirely equivalent due to differences in the number of matchups (15 more for iCOR when masked by ACOLITE); thus, other studies (Ilori et al. 2019; Pahlevan et al. 2021; Warren et al. 2019; Xu et al. 2020) are needed for a more comprehensive comparison of AC processors.

Conclusion

This paper presents a machine-learning model based on support vector regression (SVR) to retrieve Chla concentration from satellite-derived reflectance measurements (Rrsδ) of Sentinel-2 (MSI) and Landsat-8 (OLI). The proposed model was trained and evaluated using a dataset of near-coincident, co-located in situ Chla and Rrsδ observations (N ∼ 200), collected in a mid-latitude eutrophic lake from 2014 to 2020. Comparison of the SVR model against state-of-the-art, commonly used alternates revealed that SVR outperformed all other algorithms when using MSI data. This superiority is seen in both general (entire samples, Chla = 1–125 mg m−3) and stratified levels (two distinct optical water types).

The proposed model also showed superiority in retrieving time series of Chla and producing Chla maps, two important applications of remote sensing in monitoring and mapping of harmful algal blooms. The superiority of SVR was also demonstrated by the return of robust and similar results following the alteration of AC processors (ACOLITE vs. iCOR). The model was also stable when fed with different radiometric products (Rrsδ, ρrc, and ρTOA). Quantitative evaluation of SVR also showed a promising transferability among two optical water types common to this study region, particularly in comparison to standard models.

Together, these findings reveal the high potential of SVR models to retrieve Chla in small waterbodies, even using data from multi-spectral terrestrial missions, such as MSI and OLI. Although results are presented only for BPL, the fact that the lake is broadly representative of over 100 regional lakes within a 240,000 km² area (Finlay et al. 2015; Hayes et al. 2020) suggests that our findings may be generalized to other eutrophic mid-latitude waterbodies of similar optical water types. Development of such models for consistent retrievals from long-term observational records of satellite missions, such as Landsat and Sentinel, increases the potential for monitoring and mapping the extent and intensity of harmful algal blooms in an era of global warming.

Author contributions

Conceptualization: AMC; limnological in situ data: PRL, HMB, and JMD; method development: AMC, NP, and KZ; data analysis: AMC; manuscript preparation: AMC; editing and approval: All authors; funding: CRD, PRL, HMB, and NP.

Acknowledgments

We thank members of University of Saskatchewan (US) Global Institute for Water Security and the University of Regina (UR) Limnology Laboratory for field data collection, as well as David Vandergucht and seasonal staff with the Saskatchewan Water Security Agency, and the Buffalo Pound Water Treatment Plant (BPWTP). Operations of the monitoring buoy were supported by Jay Bauer, Katy Nugent, Cameron Hoggarth, and staff of BPWTP. We gratefully acknowledge that the field research and UR analyses took place on Treaty 4 territory, homelands of the Cree, Saulteaux, Lakota, Dakota, and Nakota peoples, as well as the Metis/Michief nation. US is located on Treaty 6 territory, while University of Waterloo is located on the traditional territory of the Neutral, Anishinaabeg, and Haudenosaunee peoples.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

Buoy operations and associated research supported by grants to HMB from the Natural Sciences and Engineering Research Council of Canada (NSERC), Canada Foundation for Innovation (CFI), Global Water Futures (GWF)-Canada First Research Excellence Fund (FORMBLOOM project), Global Institute for Water Security, and Buffalo Pound Water Treatment Plant. Qu’Appelle Long-term Ecological Research program (QU-LTER) was supported by grants to PRL from NSERC, CFI, Canada Research Chairs, the Province of Saskatchewan, and University of Regina. NP was supported under the NASA ROSES contract #80HQTR19C0015, Remote Sensing of Water Quality element, and the USGS Landsat Science Team Award #140G0118C0011. AMC, CRD and KZ were supported by GWF TTSW (Transformative sensor Technologies and Smart Watersheds for Canadian Water Futures) project.

References

  • Allan, M.G., Hamilton, D.P., Hicks, B.J., and Brabyn, L. 2011. “Landsat remote sensing of chlorophyll a concentrations in central North Island lakes of New Zealand.” International Journal of Remote Sensing, Vol. 32(No. 7): pp. 2037–2055. doi:10.1080/01431161003645840.
  • Ansper, A., and Alikas, K. 2018. “Retrieval of chlorophyll a from Sentinel-2 MSI data for the European Union water framework directive reporting purposes.” Remote Sensing, Vol. 11(No. 1): pp. 64. doi:10.3390/rs11010064.
  • Babin, M., Stramski, D., Ferrari, G.M., Claustre, H., Bricaud, A., Obolensky, G., and Hoepffner, N. 2003. “Variations in the light absorption coefficients of phytoplankton, nonalgal particles, and dissolved organic matter in coastal waters around Europe.” Journal of Geophysical Research, Vol. 108(No. C7): pp. 3211. doi:10.1029/2001JC000882.
  • Beck, R., Zhan, S., Liu, H., Tong, S., Yang, B., Xu, M., Ye, Z., et al. 2016. “Comparison of satellite reflectance algorithms for estimating chlorophyll-a in a temperate reservoir using coincident hyperspectral aircraft imagery and dense coincident surface observations.” Remote Sensing of Environment, Vol. 178: pp. 15–30. doi:10.1016/j.rse.2016.03.002.
  • Binding, C., Pizzolato, L., and Zeng, C. 2021. “EOLakeWatch; delivering a comprehensive suite of remote sensing algal bloom indices for enhanced monitoring of Canadian eutrophic lakes.” Ecological Indicators, Vol. 121: pp. 106999. doi:10.1016/j.ecolind.2020.106999.
  • Binding, C.E., Greenberg, T.A., and Bukata, R.P. 2011. “Time series analysis of algal blooms in Lake of the Woods using the MERIS maximum chlorophyll index.” Journal of Plankton Research, Vol. 33(No. 12): pp. 1847–1852. doi:10.1093/plankt/fbr079.
  • Bresciani, M., Cazzaniga, I., Austoni, M., Sforzi, T., Buzzi, F., Morabito, G., and Giardino, C. 2018. “Mapping phytoplankton blooms in deep subalpine lakes from Sentinel-2A and Landsat-8.” Hydrobiologia, Vol. 824(No. 1): pp. 197–214. doi:10.1007/s10750-017-3462-2.
  • Bryan, A.F., Werdell, P.J., Gerhard, M., Sean, W.B., Robert, E.E. Jr., Gene, C.F., Ewa, J.K., Charles, R.M., Frederick, S.P., and Donna, T. 2005. “The continuity of ocean color measurements from SeaWiFS to MODIS.” In Proc. SPIE.
  • Bulgarelli, B., and Zibordi, G. 2018. “On the detectability of adjacency effects in ocean color remote sensing of mid-latitude coastal environments by SeaWiFS, MODIS-A, MERIS, OLCI, OLI and MSI.” Remote Sensing of Environment, Vol. 209: pp. 423–438. doi:10.1016/j.rse.2017.12.021.
  • Camps-Valls, G., Gómez-Chova, L., Muñoz-Marí, J., Vila-Francés, J., Amorós-López, J., and Calpe-Maravilla, J. 2006. “Retrieval of oceanic chlorophyll concentration with relevance vector machines.” Remote Sensing of Environment, Vol. 105: pp. 23–33.
  • Cao, Z., Ma, R., Duan, H., Pahlevan, N., Melack, J., Shen, M., and Xue, K. 2020. “A machine learning approach to estimate chlorophyll-a from Landsat-8 measurements in inland lakes.” Remote Sensing of Environment, Vol. 248: pp. 111974. doi:10.1016/j.rse.2020.111974.
  • Cao, Z., Ma, R., Duan, H., and Xue, K. 2019. “Effects of broad bandwidth on the remote sensing of inland waters: Implications for high spatial resolution satellite data applications.” ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 153: pp. 110–122. doi:10.1016/j.isprsjprs.2019.05.001.
  • Carder, K.L., Chen, F., Lee, Z., Hawes, S., and Kamykowski, D. 1999. “Semianalytic moderate‐resolution imaging spectrometer algorithms for chlorophyll a and absorption with bio‐optical domains based on nitrate‐depletion temperatures.” Journal of Geophysical Research: Oceans, Vol. 104(No. C3): pp. 5403–5421. doi:10.1029/1998JC900082.
  • Carpenter, S.R., Caraco, N.F., Correll, D.L., Howarth, R.W., Sharpley, A.N., and Smith, V.H. 1998. “Nonpoint pollution of surface waters with phosphorus and nitrogen.” Ecological Applications, Vol. 8(No. 3): pp. 559–568. doi:10.1890/1051-0761(1998)008[0559:NPOSWW].2.0.CO;2
  • Chegoonian, A., Mokhtarzade, M., and Valadan Zoej, M. 2017. “A comprehensive evaluation of classification algorithms for coral reef habitat mapping: Challenges related to quantity, quality, and impurity of training samples.” International Journal of Remote Sensing, Vol. 38(No. 14): pp. 4224–4243. doi:10.1080/01431161.2017.1317934.
  • Chegoonian, A.M., Zolfaghari, K., Baulch, H.M., and Duguay, C.R. 2021. “Support vector regression for chlorophyll-a estimation using Sentinel-2 images in small waterbodies.” In 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, pp. 7449–7452. IEEE.
  • Chegoonian, A.M., Zolfaghari, K., Leavitt, P.R., Baulch, H.M., and Duguay, C.R. 2022. “Improvement of field fluorometry estimates of chlorophyll a concentration in a cyanobacteria‐rich eutrophic lake.” Limnology and Oceanography: Methods, Vol. 20(No. 4): pp. 193–209. doi:10.1002/lom3.10480.
  • Chen, T., and Guestrin, C. 2016. “Xgboost: A scalable tree boosting system.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794.
  • Claverie, M., Ju, J., Masek, J.G., Dungan, J.L., Vermote, E.F., Roger, J.-C., Skakun, S.V., and Justice, C. 2018. “The harmonized Landsat and Sentinel-2 surface reflectance data set.” Remote Sensing of Environment, Vol. 219: pp. 145–161. doi:10.1016/j.rse.2018.09.002.
  • Clay, S., Peña, A., DeTracey, B., and Devred, E. 2019. “Evaluation of satellite-based algorithms to retrieve chlorophyll-a concentration in the Canadian Atlantic and Pacific Oceans.” Remote Sensing, Vol. 11(No. 22): pp. 2609. doi:10.3390/rs11222609.
  • Dall’Olmo, G., and Gitelson, A.A. 2005. “Effect of bio-optical parameter variability on the remote estimation of chlorophyll-a concentration in turbid productive waters: Experimental results.” Applied Optics, Vol. 44(No. 3): pp. 412–422. doi:10.1364/ao.44.000412.
  • De Keukelaere, L., Sterckx, S., Adriaensen, S., Knaeps, E., Reusen, I., Giardino, C., Bresciani, M., et al. 2018. “Atmospheric correction of Landsat-8/OLI and Sentinel-2/MSI data using iCOR algorithm: validation for coastal and inland waters.” European Journal of Remote Sensing, Vol. 51(No. 1): pp. 525–542. doi:10.1080/22797254.2018.1457937.
  • Defoin‐Platel, M., and Chami, M. 2007. “How ambiguous is the inverse problem of ocean color in coastal waters?” Journal of Geophysical Research, Vol. 112(No. C3): pp. 1–16. doi:10.1029/2006JC003847.
  • Delpla, I., Jung, A.V., Baures, E., Clement, M., and Thomas, O. 2009. “Impacts of climate change on surface water quality in relation to drinking water production.” Environment International, Vol. 35(No. 8): pp. 1225–1233. doi:10.1016/j.envint.2009.07.001.
  • Doerffer, R., and Schiller, H. 2007. “The MERIS Case 2 water algorithm.” International Journal of Remote Sensing, Vol. 28(No. 3–4): pp. 517–535. doi:10.1080/01431160600821127.
  • Dörnhöfer, K., Göritz, A., Gege, P., Pflug, B., and Oppelt, N. 2016. “Water constituents and water depth retrieval from Sentinel-2A—A first evaluation in an oligotrophic lake.” Remote Sensing, Vol. 8(No. 11): pp. 941. doi:10.3390/rs8110941.
  • Dörnhöfer, K., Klinger, P., Heege, T., and Oppelt, N. 2018. “Multi-sensor satellite and in situ monitoring of phytoplankton development in a eutrophic-mesotrophic lake.” The Science of the Total Environment, Vol. 612: pp. 1200–1214. doi:10.1016/j.scitotenv.2017.08.219.
  • Downing, J.A., Prairie, Y.T., Cole, J.J., Duarte, C.M., Tranvik, L.J., Striegl, R.G., McDowell, W.H., et al. 2006. “The global abundance and size distribution of lakes, ponds, and impoundments.” Limnology and Oceanography, Vol. 51(No. 5): pp. 2388–2397. doi:10.4319/lo.2006.51.5.2388.
  • Dröscher, I., Finlay, K., Patoine, A., and Leavitt, P.R. 2008. “Daphnia control of the spring clear-water phase in six polymictic lakes of varying productivity and size.” Internationale Vereinigung Für Theoretische Und Angewandte Limnologie: Verhandlungen, Vol. 30(No. 2): pp. 186–190. doi:10.1080/03680770.2008.11902107.
  • Eaton, A.D., Clesceri, L.S., Rice, E.W., and Greenberg, A.E., eds. 2017. Standard Methods for the Examination of Water and Wastewater. 23rd ed. American Public Health Association, Method 10200H, p. 10:22–24.
  • Filazzola, A., Mahdiyan, O., Shuvo, A., Ewins, C., Moslenko, L., Sadid, T., Blagrave, K., et al. 2020. “A database of chlorophyll and water chemistry in freshwater lakes.” Scientific Data, Vol. 7(No. 1): pp. 310. doi:10.1038/s41597-020-00648-2.
  • Finlay, K., Vogt, R.J., Bogard, M.J., Wissel, B., Tutolo, B.M., Simpson, G.L., and Leavitt, P.R. 2015. “Decrease in CO2 efflux from northern hardwater lakes with increasing atmospheric warming.” Nature, Vol. 519(No. 7542): pp. 215–218. doi:10.1038/nature14172.
  • Foody, G.M., and Mathur, A. 2006. “The use of small training sets containing mixed pixels for accurate hard image classification: Training on mixed spectral responses for classification by a SVM.” Remote Sensing of Environment, Vol. 103(No. 2): pp. 179–189. doi:10.1016/j.rse.2006.04.001.
  • Gitelson, A. 1992. “The peak near 700 nm on radiance spectra of algae and water: Relationships of its magnitude and position with chlorophyll concentration.” International Journal of Remote Sensing, Vol. 13(No. 17): pp. 3367–3373. doi:10.1080/01431169208904125.
  • Gitelson, A.A., Schalles, J.F., and Hladik, C.M. 2007. “Remote chlorophyll-a retrieval in turbid, productive estuaries: Chesapeake Bay case study.” Remote Sensing of Environment, Vol. 109(No. 4): pp. 464–472. doi:10.1016/j.rse.2007.01.016.
  • Gons, H.J. 1999. “Optical teledetection of chlorophyll a in turbid inland waters.” Environmental Science & Technology, Vol. 33(No. 7): pp. 1127–1132. doi:10.1021/es9809657.
  • Gons, H.J., Auer, M.T., and Effler, S.W. 2008. “MERIS satellite chlorophyll mapping of oligotrophic and eutrophic waters in the Laurentian Great Lakes.” Remote Sensing of Environment, Vol. 112(No. 11): pp. 4098–4106. doi:10.1016/j.rse.2007.06.029.
  • Hayes, N.M., Haig, H.A., Simpson, G.L., and Leavitt, P.R. 2020. “Local and regional effects of lake warming on risk of toxic algal exposure.” Limnology and Oceanography Letters, Vol. 5(No. 6): pp. 393–402. doi:10.1002/lol2.10164.
  • Helder, D., Markham, B., Morfitt, R., Storey, J., Barsi, J., Gascon, F., Clerc, S., et al. 2018. “Observations and recommendations for the calibration of Landsat 8 OLI and Sentinel 2 MSI for improved data interoperability.” Remote Sensing, Vol. 10(No. 9): pp. 1340. doi:10.3390/rs10091340.
  • Ho, J.C., Michalak, A.M., and Pahlevan, N. 2019. “Widespread global increase in intense lake phytoplankton blooms since the 1980s.” Nature, Vol. 574(No. 7780): pp. 667–670. doi:10.1038/s41586-019-1648-7.
  • Hosseini, N., Akomeah, E., Davies, J.-M., Baulch, H., and Lindenschmidt, K.-E. 2018. “Water quality modeling of a prairie river-lake system.” Environmental Science and Pollution Research International, Vol. 25(No. 31): pp. 31190–31204. doi:10.1007/s11356-018-3055-2.
  • Hu, C., Feng, L., and Guan, Q. 2021. “A machine learning approach to estimate surface chlorophyll a concentrations in global oceans from satellite measurements.” IEEE Transactions on Geoscience and Remote Sensing, Vol. 59(No. 6): pp. 4590–4607. doi:10.1109/TGRS.2020.3016473.
  • Ilori, C.O., Pahlevan, N., and Knudby, A. 2019. “Analyzing performances of different atmospheric correction techniques for Landsat 8: Application for coastal remote sensing.” Remote Sensing, Vol. 11(No. 4): pp. 469. doi:10.3390/rs11040469.
  • IOCCG. 2006. Remote Sensing of Inherent Optical Properties: Fundamentals, Tests of Algorithms, and Applications. In Z.-P. Lee (Ed.), Reports of the international ocean-color coordinating group, No. 5. Dartmouth, Canada: IOCCG. doi:10.25607/OBP-96.
  • Jeffrey, S.W., and Humphrey, G.F. 1975. “New spectrophotometric equations for determining chlorophylls a, b, c1 and c2 in higher plants, algae and natural phytoplankton.” Biochemie Und Physiologie Der Pflanzen, Vol. 167(No. 2): pp. 191–194. doi:10.1016/S0015-3796(17)30778-3.
  • Kehoe, M., Ingalls, B., Venkiteswaran, J., and Baulch, H. 2019. “Successful forecasting of harmful cyanobacteria blooms with high frequency lake data.” bioRxiv, pp. 674325.
  • Kehoe, M.J., Chun, K.P., and Baulch, H.M. 2015. “Who smells? Forecasting taste and odor in a drinking water reservoir.” Environmental Science & Technology, Vol. 49(No. 18): pp. 10984–10992. doi:10.1021/acs.est.5b00979.
  • Kutser, T. 2009. “Passive optical remote sensing of cyanobacteria and other intense phytoplankton blooms in coastal and inland waters.” International Journal of Remote Sensing, Vol. 30(No. 17): pp. 4401–4425. doi:10.1080/01431160802562305.
  • Kutser, T., Paavel, B., Verpoorter, C., Ligi, M., Soomets, T., Toming, K., and Casal, G. 2016. “Remote sensing of black lakes and using 810 nm reflectance peak for retrieving water quality parameters of optically complex waters.” Remote Sensing, Vol. 8(No. 6): pp. 497. doi:10.3390/rs8060497.
  • Kwiatkowska, E.J., and Fargion, G.S. 2003. “Application of machine-learning techniques toward the creation of a consistent and calibrated global chlorophyll concentration baseline dataset using remotely sensed ocean color data.” IEEE Transactions on Geoscience and Remote Sensing, Vol. 41(No. 12): pp. 2844–2860. doi:10.1109/TGRS.2003.818016.
  • Leavitt, P., and Hodgson, D. 2001. Sedimentary Pigments. Tracking Environmental Change Using Lake Sediments. Volume 3: Terrestrial, Algal, and Siliceous Indicators. Dordrecht: Kluwer Academic Publishers.
  • Lee, Z., Carder, K.L., and Arnone, R.A. 2002. “Deriving inherent optical properties from water color: A multiband quasi-analytical algorithm for optically deep waters.” Applied Optics, Vol. 41(No. 27): pp. 5755–5772. doi:10.1364/ao.41.005755.
  • Li, A.H., and Bradic, J. 2018. “Boosting in the presence of outliers: Adaptive classification with nonconvex loss functions.” Journal of the American Statistical Association, Vol. 113(No. 522): pp. 660–674. doi:10.1080/01621459.2016.1273116.
  • Li, J., and Roy, D.P. 2017. “A global analysis of Sentinel-2A, Sentinel-2B and Landsat-8 data revisit intervals and implications for terrestrial monitoring.” Remote Sensing, Vol. 9(No. 9): pp. 902. doi:10.3390/rs9090902.
  • Li, S., Ganguly, S., Dungan, J.L., Wang, W., and Nemani, R.R. 2017. “Sentinel-2 MSI radiometric characterization and cross-calibration with Landsat-8 OLI.” Advances in Remote Sensing, Vol. 6(No. 2): pp. 147–159. doi:10.4236/ars.2017.62011.
  • Lunetta, R.S., Schaeffer, B.A., Stumpf, R.P., Keith, D., Jacobs, S.A., and Murphy, M.S. 2015. “Evaluation of cyanobacteria cell count detection derived from MERIS imagery across the eastern USA.” Remote Sensing of Environment, Vol. 157: pp. 24–34. doi:10.1016/j.rse.2014.06.008.
  • Martinez, E., Gorgues, T., Lengaigne, M., Fontana, C., Sauzède, R., Menkes, C., Uitz, J., Di Lorenzo, E., and Fablet, R. 2020. “Reconstructing global chlorophyll-a variations using a non-linear statistical approach.” Frontiers in Marine Science, Vol. 7: pp. 464. doi:10.3389/fmars.2020.00464.
  • Matthews, M.W., Bernard, S., and Robertson, L. 2012. “An algorithm for detecting trophic status (chlorophyll-a), cyanobacterial-dominance, surface scums and floating vegetation in inland and coastal waters.” Remote Sensing of Environment, Vol. 124: pp. 637–652. doi:10.1016/j.rse.2012.05.032.
  • Matthews, M.W., and Odermatt, D. 2015. “Improved algorithm for routine monitoring of cyanobacteria and eutrophication in inland and near-coastal waters.” Remote Sensing of Environment, Vol. 156: pp. 374–382. doi:10.1016/j.rse.2014.10.010.
  • Mishra, S., and Mishra, D.R. 2012. “Normalized difference chlorophyll index: A novel model for remote estimation of chlorophyll-a concentration in turbid productive waters.” Remote Sensing of Environment, Vol. 117: pp. 394–406. doi:10.1016/j.rse.2011.10.016.
  • Mobley, C.D. 1994. Light and Water: Radiative Transfer in Natural Waters. New York: Academic Press.
  • Morel, A. 1980. “In-water and remote measurements of ocean color.” Boundary-Layer Meteorology, Vol. 18(No. 2): pp. 177–201. doi:10.1007/BF00121323.
  • Morley, S.K., Brito, T.V., and Welling, D.T. 2018. “Measures of model performance based on the log accuracy ratio.” Space Weather, Vol. 16(No. 1): pp. 69–88. doi:10.1002/2017SW001669.
  • Moses, W.J., Gitelson, A.A., Berdnikov, S., and Povazhnyy, V. 2009. “Satellite estimation of chlorophyll-a concentration using the red and NIR bands of MERIS—The Azov sea case study.” IEEE Geoscience and Remote Sensing Letters, Vol. 6(No. 4): pp. 845–849. doi:10.1109/LGRS.2009.2026657.
  • Moses, W.J., Gitelson, A.A., Berdnikov, S., Saprygin, V., and Povazhnyi, V. 2012. “Operational MERIS-based NIR-red algorithms for estimating chlorophyll-a concentrations in coastal waters—The Azov Sea case study.” Remote Sensing of Environment, Vol. 121: pp. 118–124. doi:10.1016/j.rse.2012.01.024.
  • Mountrakis, G., Im, J., and Ogole, C. 2011. “Support vector machines in remote sensing: A review.” ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 66(No. 3): pp. 247–259. doi:10.1016/j.isprsjprs.2010.11.001.
  • Nikparvar, B., and Thill, J.-C. 2021. “Machine learning of spatial data.” ISPRS International Journal of Geo-Information, Vol. 10(No. 9): pp. 600–632. doi:10.3390/ijgi10090600.
  • O'Reilly, J.E., Maritorena, S., Mitchell, B.G., Siegel, D.A., Carder, K.L., Garver, S.A., Kahru, M., and McClain, C. 1998. “Ocean color chlorophyll algorithms for SeaWiFS.” Journal of Geophysical Research: Oceans, Vol. 103(No. C11): pp. 24937–24953. doi:10.1029/98JC02160.
  • O'Reilly, J.E., and Werdell, P.J. 2019. “Chlorophyll algorithms for ocean color sensors-OC4, OC5 & OC6.” Remote Sensing of Environment, Vol. 229: pp. 32–47. doi:10.1016/j.rse.2019.04.021.
  • O'Sullivan, F. 1986. “A statistical perspective on ill-posed inverse problems.” Statistical Science, Vol. 1(No. 4): pp. 502–518. doi:10.1214/ss/1177013525.
  • Odermatt, D., Gitelson, A., Brando, V.E., and Schaepman, M. 2012. “Review of constituent retrieval in optically deep and complex waters from satellite imagery.” Remote Sensing of Environment, Vol. 118: pp. 116–126. doi:10.1016/j.rse.2011.11.013.
  • Pahlevan, N., Chittimalli, S.K., Balasubramanian, S.V., and Vellucci, V. 2019. “Sentinel-2/Landsat-8 product consistency and implications for monitoring aquatic systems.” Remote Sensing of Environment, Vol. 220: pp. 19–29. doi:10.1016/j.rse.2018.10.027.
  • Pahlevan, N., Lee, Z., Wei, J., Schaaf, C.B., Schott, J.R., and Berk, A. 2014. “On-orbit radiometric characterization of OLI (Landsat-8) for applications in aquatic remote sensing.” Remote Sensing of Environment, Vol. 154: pp. 272–284. doi:10.1016/j.rse.2014.08.001.
  • Pahlevan, N., Mangin, A., Balasubramanian, S.V., Smith, B., Alikas, K., Arai, K., Barbosa, C., et al. 2021. “ACIX-Aqua: A global assessment of atmospheric correction methods for Landsat-8 and Sentinel-2 over lakes, rivers, and coastal waters.” Remote Sensing of Environment, Vol. 258: pp. 112366. doi:10.1016/j.rse.2021.112366.
  • Pahlevan, N., Smith, B., Schalles, J., Binding, C., Cao, Z., Ma, R., Alikas, K., et al. 2020. “Seamless retrievals of chlorophyll-a from Sentinel-2 (MSI) and Sentinel-3 (OLCI) in inland and coastal waters: A machine-learning approach.” Remote Sensing of Environment, Vol. 240: pp. 111604. doi:10.1016/j.rse.2019.111604.
  • Philipson, P., Eriksso, K., and Stelzer, K. 2014. MERIS data for monitoring of small and medium sized humic Swedish lakes. In 2014 IEEE/OES Baltic International Symposium (BALTIC), 1–4. IEEE. doi:10.1109/BALTIC.2014.6887835.
  • Qiu, G., Xing, X., Boss, E., Yan, X.-H., Ren, R., Xiao, W., and Wang, H. 2021. “Relationships between optical backscattering, particulate organic carbon, and phytoplankton carbon in the oligotrophic South China Sea basin.” Optics Express, Vol. 29(No. 10): pp. 15159–15176. doi:10.1364/OE.422671.
  • Roesler, C., Uitz, J., Claustre, H., Boss, E., Xing, X., Organelli, E., Briggs, N., et al. 2017. “Recommendations for obtaining unbiased chlorophyll estimates from in situ chlorophyll fluorometers: A global analysis of WET Labs ECO sensors.” Limnology and Oceanography: Methods, Vol. 15(No. 6): pp. 572–585. doi:10.1002/lom3.10185.
  • Santini, F., Alberotanza, L., Cavalli, R.M., and Pignatti, S. 2010. “A two-step optimization procedure for assessing water constituent concentrations by hyperspectral remote sensing techniques: An application to the highly turbid Venice lagoon waters.” Remote Sensing of Environment, Vol. 114(No. 4): pp. 887–898. doi:10.1016/j.rse.2009.12.001.
  • Schaeffer, B.A., Bailey, S.W., Conmy, R.N., Galvin, M., Ignatius, A.R., Johnston, J.M., Keith, D.J., et al. 2018. “Mobile device application for monitoring cyanobacteria harmful algal blooms using Sentinel-3 satellite ocean and land colour instruments.” Environmental Modelling & Software: With Environment Data News, Vol. 109: pp. 93–103. doi:10.1016/j.envsoft.2018.08.015.
  • Schroeder, T., Schaale, M., and Fischer, J. 2007. “Retrieval of atmospheric and oceanic properties from MERIS measurements: A new case‐2 water processor for BEAM.” International Journal of Remote Sensing, Vol. 28(No. 24): pp. 5627–5632. doi:10.1080/01431160701601774.
  • Seegers, B.N., Stumpf, R.P., Schaeffer, B.A., Loftin, K.A., and Werdell, P.J. 2018. “Performance metrics for the assessment of satellite data products: An ocean color case study.” Optics Express, Vol. 26(No. 6): pp. 7404–7422. doi:10.1364/OE.26.007404.
  • Smith, B., Pahlevan, N., Schalles, J., Ruberg, S., Errera, R., Ma, R., Giardino, C., et al. 2021. “A chlorophyll-a algorithm for Landsat-8 based on mixture density networks.” Frontiers in Remote Sensing, Vol. 1: pp. 623678. doi:10.3389/frsen.2020.623678.
  • Smola, A.J., and Schölkopf, B. 2004. “A tutorial on support vector regression.” Statistics and Computing, Vol. 14(No. 3): pp. 199–222. doi:10.1023/B:STCO.0000035301.49549.88.
  • Soranno, P.A., Bacon, L.C., Beauchene, M., Bednar, K.E., Bissell, E.G., Boudreau, C.K., Boyer, M.G., et al. 2017. “LAGOS-NE: A multi-scaled geospatial and temporal database of lake ecological context and water quality for thousands of US lakes.” GigaScience, Vol. 6(No. 12): pp. 1–22. doi:10.1093/gigascience/gix101.
  • Spyrakos, E., O'Donnell, R., Hunter, P.D., Miller, C., Scott, M., Simis, S.G.H., Neil, C., et al. 2018. “Optical types of inland and coastal waters.” Limnology and Oceanography, Vol. 63(No. 2): pp. 846–870. doi:10.1002/lno.10674.
  • Sterckx, S., Knaeps, S., Kratzer, S., and Ruddick, K. 2015. “SIMilarity Environment Correction (SIMEC) applied to MERIS data over inland and coastal waters.” Remote Sensing of Environment, Vol. 157: pp. 96–110. doi:10.1016/j.rse.2014.06.017.
  • Swarbrick, V.J., Simpson, G.L., Glibert, P.M., and Leavitt, P.R. 2019. “Differential stimulation and suppression of phytoplankton growth by ammonium enrichment in eutrophic hardwater lakes over 16 years.” Limnology and Oceanography, Vol. 64(No. S1): pp. S130–S149. doi:10.1002/lno.11093.
  • Sydor, M., Gould, R.W., Arnone, R.A., Haltrin, V.I., and Goode, W. 2004. “Uniqueness in remote sensing of the inherent optical properties of ocean water.” Applied Optics, Vol. 43(No. 10): pp. 2156–2162. doi:10.1364/ao.43.002156.
  • Tang, D., Kawamura, H., Lee, M.-A., and Van Dien, T. 2003. “Seasonal and spatial distribution of chlorophyll-a concentrations and water conditions in the Gulf of Tonkin, South China Sea.” Remote Sensing of Environment, Vol. 85(No. 4): pp. 475–483. doi:10.1016/S0034-4257(03)00049-X.
  • Tebbs, E.J., Remedios, J.J., and Harper, D.M. 2013. “Remote sensing of chlorophyll-a as a measure of cyanobacterial biomass in Lake Bogoria, a hypertrophic, saline–alkaline, flamingo lake, using Landsat ETM+.” Remote Sensing of Environment, Vol. 135: pp. 92–106. doi:10.1016/j.rse.2013.03.024.
  • Tian, S., Guo, H., Xu, W., Zhu, X., Wang, B., Zeng, Q., Mai, Y., and Huang, J.J. 2022. “Remote sensing retrieval of inland water quality parameters using Sentinel-2 and multiple machine learning algorithms.” Environmental Science and Pollution Research, Vol. 30(No. 7): pp. 18617–18630. doi:10.1007/s11356-022-23431-9.
  • Toming, K., Kotta, J., Uuemaa, E., Sobek, S., Kutser, T., and Tranvik, L.J. 2020. “Predicting lake dissolved organic carbon at a global scale.” Scientific Reports, Vol. 10(No. 1): pp. 8471. doi:10.1038/s41598-020-65010-3.
  • Toming, K., Kutser, T., Laas, A., Sepp, M., Paavel, B., and Nõges, T. 2016. “First experiences in mapping lake water quality parameters with Sentinel-2 MSI imagery.” Remote Sensing, Vol. 8(No. 8): pp. 640. doi:10.3390/rs8080640.
  • Van Der Woerd, H.J., and Pasterkamp, R. 2008. “HYDROPT: A fast and flexible method to retrieve chlorophyll-a from multispectral satellite observations of optically complex coastal waters.” Remote Sensing of Environment, Vol. 112(No. 4): pp. 1795–1807. doi:10.1016/j.rse.2007.09.001.
  • Vanhellemont, Q. 2019. “Adaptation of the dark spectrum fitting atmospheric correction for aquatic applications of the Landsat and Sentinel-2 archives.” Remote Sensing of Environment, Vol. 225: pp. 175–192. doi:10.1016/j.rse.2019.03.010.
  • Vanhellemont, Q., and Ruddick, K. 2014. “Turbid wakes associated with offshore wind turbines observed with Landsat 8.” Remote Sensing of Environment, Vol. 145: pp. 105–115. doi:10.1016/j.rse.2014.01.009.
  • Vapnik, V. 2013. The Nature of Statistical Learning Theory. Berlin, Heidelberg, Germany: Springer Science & Business Media.
  • Vogt, R.J., Sharma, S., and Leavitt, P.R. 2018. “Direct and interactive effects of climate, meteorology, river hydrology, and lake characteristics on water quality in productive lakes of the Canadian Prairies.” Canadian Journal of Fisheries and Aquatic Sciences, Vol. 75(No. 1): pp. 47–59. doi:10.1139/cjfas-2016-0520.
  • Walker, H.W. 2019. Harmful Algae Blooms in Drinking Water: Removal of Cyanobacterial Cells and Toxins. Boca Raton, Florida, USA: CRC Press.
  • Warren, M.A., Simis, S.G., Martinez-Vicente, V., Poser, K., Bresciani, M., Alikas, K., Spyrakos, E., Giardino, C., and Ansper, A. 2019. “Assessment of atmospheric correction algorithms for the Sentinel-2A MultiSpectral Imager over coastal and inland waters.” Remote Sensing of Environment, Vol. 225: pp. 267–289. doi:10.1016/j.rse.2019.03.018.
  • Werdell, P.J., and Bailey, S.W. 2005. “An improved in-situ bio-optical data set for ocean color algorithm development and satellite data product validation.” Remote Sensing of Environment, Vol. 98(No. 1): pp. 122–140. doi:10.1016/j.rse.2005.07.001.
  • Werdell, P.J., Bailey, S.W., Franz, B.A., Harding, L.W., Feldman, G.C., and McClain, C.R. 2009. “Regional and seasonal variability of chlorophyll-a in Chesapeake Bay as observed by SeaWiFS and MODIS-Aqua.” Remote Sensing of Environment, Vol. 113(No. 6): pp. 1319–1330. doi:10.1016/j.rse.2009.02.012.
  • Werdell, P.J., McKinna, L.I.W., Boss, E., Ackleson, S.G., Craig, S.E., Gregg, W.W., Lee, Z., et al. 2018. “An overview of approaches and challenges for retrieving marine inherent optical properties from ocean color remote sensing.” Progress in Oceanography, Vol. 160: pp. 186–212. doi:10.1016/j.pocean.2018.01.001.
  • Werther, M., Odermatt, D., Simis, S.G.H., Gurlin, D., Lehmann, M.K., Kutser, T., Gupana, R., et al. 2022. “A Bayesian approach for remote sensing of chlorophyll-a and associated retrieval uncertainty in oligotrophic and mesotrophic lakes.” Remote Sensing of Environment, Vol. 283: pp. 113295. doi:10.1016/j.rse.2022.113295.
  • Wintermans, J., and De Mots, A.S. 1965. “Spectrophotometric characteristics of chlorophylls a and b and their pheophytins in ethanol.” Biochimica et Biophysica Acta, Vol. 109(No. 2): pp. 448–453. doi:10.1016/0926-6585(65)90170-6.
  • Wulder, M.A., Hilker, T., White, J.C., Coops, N.C., Masek, J.G., Pflugmacher, D., and Crevier, Y. 2015. “Virtual constellations for global terrestrial monitoring.” Remote Sensing of Environment, Vol. 170: pp. 62–76. doi:10.1016/j.rse.2015.09.001.
  • Wynne, T.T., Stumpf, R.P., Tomlinson, M.C., and Dyble, J. 2010. “Characterizing a cyanobacterial bloom in western Lake Erie using satellite imagery and meteorological data.” Limnology and Oceanography, Vol. 55(No. 5): pp. 2025–2036. doi:10.4319/lo.2010.55.5.2025.
  • Xu, Y., Feng, L., Zhao, D., and Lu, J. 2020. “Assessment of Landsat atmospheric correction methods for water color applications using global AERONET-OC data.” International Journal of Applied Earth Observation and Geoinformation, Vol. 93: pp. 102192. doi:10.1016/j.jag.2020.102192.
  • Yacobi, Y.Z., Gitelson, A., and Mayo, M. 1995. “Remote sensing of chlorophyll in Lake Kinneret using high-spectral-resolution radiometer and Landsat TM: Spectral features of reflectance and algorithm development.” Journal of Plankton Research, Vol. 17(No. 11): pp. 2155–2173. doi:10.1093/plankt/17.11.2155.
  • Zhan, H., Shi, P., and Chen, C. 2003. “Retrieval of oceanic chlorophyll concentration using support vector machines.” IEEE Transactions on Geoscience and Remote Sensing, Vol. 41: pp. 2947–2951.

Appendix A

Chla in BPL is higher than the average for freshwaters (Filazzola et al. 2020), whereas Dissolved Organic Carbon (DOC) is low to moderate for freshwaters (Toming et al. 2020). This suggests that particles, especially algal particles, largely control the optical characteristics of water in BPL. This hypothesis is supported by Figure A1 (upper diagonal), where the optical characteristics of water samples, e.g., turbidity and Secchi Disk Depth (SDD), are highly correlated with Total Suspended Solids (TSS) and Chla. However, Figure A1 (lower diagonal) also reveals that the relationship depends on station location, as southern stations (1–8) show a stronger relationship between Chla and optical characteristics. This can be explained by the distribution of water constituents, plotted in Figure A1 (diagonal), where northern stations (9–11) contain more sediments, whereas the southern stations are dominated by Chla.

Figure A1. Pair plots of some optically-derived/driven parameters in BPL (averaged on stations 4–11 from late May to early September of 2014–2020). Diagonal elements are the distribution of each parameter, color-coded according to station numbers. Upper-diagonal elements are the scatter plot of paired parameters. Lower-diagonal charts are the contour plots showing the relationship between the parameters in northern and southern stations. N and ρ are the number of samples and correlation coefficients, respectively. Units are mg m−3, g m−3, NTU, and m for Chla, TSS, turbidity, and SDD, respectively.

Appendix B

Figure B1. Comparison of MSI bands (red boxes) and OLI bands (blue boxes) in wavelengths <800 nm. The spectra are from three different samples measured at BPL using an ASD spectrometer and display how water spectra change with changes in Chla content.

Figure B2. Overview of a workflow developed in this study.

Figure B3. Scatter plot of in situ Chla versus predicted Chla from MSI-A/B images, resampled to 10, 20, and 60 m. Data from Buffalo Pound Lake 2017–2021. Model performance does not vary consistently with image resolution.

Figure B4. Normalized frequency distributions of MSI-derived Rrsδ spectra for the matchups processed via ACOLITE (N = 193) and iCOR (N = 208) processors.

Appendix C

SVR uses an ε-insensitive cost function (with ε as a threshold) in which errors ($e_i$) up to ε are not penalized, whereas larger errors are penalized linearly, i.e., $L(e_i) = \max(|e_i| - \varepsilon,\ 0)$. Thus, compared to traditional ML models (e.g., multilayer-perceptron neural networks), SVR is more robust to small errors and inherent uncertainties in training data (Zhan et al. 2003). Weights ($\omega$) are estimated in the linear regression problem (Equation A1), where $i$ indexes the training data, $j$ indexes the predictors (spectral bands), and $\phi$ is a kernel (a non-linear mapping function). SVR minimizes Equation (A2), where $\xi_i$ are the errors with $|e_i| > \varepsilon$ and $C$ is the regularization parameter, which balances the minimization of errors against generalization capability (Camps-Valls et al. 2006; Smola and Schölkopf 2004). Figure C1 depicts a schematic view of the regression between Chla and reflectance measurements using SVR.

$$\mathrm{Chla}_i = \sum_{j=1}^{M} \omega^{T} \phi\left(R_{rs,i}^{\delta}\right) \tag{A1}$$

$$\mathrm{Cost} = \frac{1}{2}\lVert \omega \rVert^{2} + C \sum_{i} L(e_i) \tag{A2}$$
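
The ε-insensitive loss and the cost in Equation (A2) can be illustrated with a minimal sketch. The function names and the numerical values below are hypothetical, chosen only for illustration; they are not taken from the study, and the sketch evaluates the cost for fixed weights and errors rather than solving the SVR optimization.

```python
from typing import Sequence


def epsilon_insensitive_loss(error: float, epsilon: float) -> float:
    """L(e) = max(|e| - epsilon, 0): zero inside the epsilon tube, linear outside."""
    return max(abs(error) - epsilon, 0.0)


def svr_cost(weights: Sequence[float], errors: Sequence[float],
             epsilon: float, C: float) -> float:
    """Cost = 0.5 * ||w||^2 + C * sum_i L(e_i), following Equation (A2)."""
    regularization = 0.5 * sum(w * w for w in weights)
    data_term = C * sum(epsilon_insensitive_loss(e, epsilon) for e in errors)
    return regularization + data_term


# Hypothetical weights and residuals (illustrative only, not from the study).
weights = [3.0, 4.0]
errors = [0.5, -2.0, 3.0]   # the first residual lies inside the epsilon tube
cost = svr_cost(weights, errors, epsilon=1.0, C=2.0)
print(cost)
```

With ε = 1, the residual 0.5 contributes nothing to the cost, while the residuals −2 and 3 contribute 1 and 2, respectively. Increasing C gives these out-of-tube errors more weight relative to the ‖ω‖² term.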

Figure C1. Graphical depiction of principles of support vector regression (SVR). (a) Schematic view of regression between Chla and Rrsδ using SVR. (b) Loss function defined for SVR; while errors less than ε are not penalized, larger errors are penalized by a linear function.

Appendix D

Figure D1. Matchup analysis of measured and predicted Chla from in situ Chla and MSI-A/B images for two different regions in BPL, categorized based on optical water type. For each optical water type, a model is trained and tested using a 5-fold cross-validation approach.

Figure D2. Matchup analysis of Chla derived from different algorithms applied to OLI data and near-coincident, co-located in situ Chla samples in BPL. The results are from a 5-fold cross-validation approach.

Table D1. Assessment approaches (training-test split) as well as the number of Rrsδ–Chla training/test matchups available for each experiment in this study.

Table D2. Formulas and coefficients of empirical models employed in this study. b# and w# are the reflectance and wavelength at specified bands, respectively.

Table D3. Annual frequency and statistics of Rrsδ–Chla matchups derived from the MSI sensor.

Table A1. Statistics for water constituents associated with two distinct OWTs in BPL (averaged from late May to early September of 2017–2020).

Appendix E

Figure E1. Chla maps for BPL derived from the SVR algorithm applied to MSI (left) and OLI (right) images acquired on the same dates from 2017 to 2020. The color bars show the estimated Chla concentration in mg m−3.

Figure E2. Matchup analysis of Chla derived from the SVR algorithm applied to the three same-date MSI and OLI images captured over BPL from 2017 to 2020.
