1,693
Views
5
CrossRef citations to date
0
Altmetric
Research Article

Retrieval of daily sea ice thickness from AMSR2 passive microwave data using ensemble convolutional neural networks

ORCID Icon & ORCID Icon
Pages 812-830 | Received 09 Mar 2021, Accepted 08 Jun 2021, Published online: 14 Jul 2021

ABSTRACT

Recently, measurement of sea ice thickness (SIT) has received increasing attention due to the importance of thinning ice in the context of global warming. Although altimeter sensors onboard satellite missions enable continuous SIT measurements over larger areas compared to in situ observations, these sensors are inadequate for mapping daily Arctic SIT because of their small footprints. We exploited passive microwave data from AMSR2 (Advanced Microwave Scanning Radiometer 2) by incorporating a state-of-the-art deep learning (DL) approach to address this limitation. Passive microwave data offer better temporal resolutions than those from a single altimeter sensors, but are rarely used for SIT estimations due to their limited physical relationship with SIT. In this study, we proposed an ensemble DL model with different modalities to produce daily pan-Arctic SIT retrievals. The proposed model determined the hidden and unknown relationships between the brightness temperatures of AMSR2 channels and SITs measured by CryoSat-2 (CS2) from the extended input features defined by our feature augmentation strategy. Although AMSR2-based SITs agreed well with CS2-derived gridded SIT values, they had similar uncertainties and errors in the CS2 SIT measurements, particularly for thin ice. However, based on quantitative validations using long-term unseen data and IceBridge data, the proposed retrieval model consistently generated SITs from AMSR2 at 25 km spatial resolution, regardless of time and space.

1. Introduction

Sea ice is the largest reflector of solar radiation, is known to reduce heat exchanges between the ocean and the atmosphere and is one of the most visible components of the climate system (Smith Citation1998; Spreen, Kaleschke, and Heygster Citation2008; Vinnikov et al. Citation1999). Thinning ice and increases in both ice-free periods and regions reportedly accelerate global warming (Ivanova et al. Citation2015; Stroeve et al. Citation2012, Citation2014). Among the most important Arctic sea ice parameters, sea ice thickness (SIT) and volume are related to changes in the heat budget of the Arctic and the exchanges of fresh water between sea ice and the ocean (Han et al. Citation2020; Laxon et al. Citation2013; Liu et al. Citation2020).

Observations of ice thickness by research vessels and aircraft are generally made via field campaigns, such as the National Aeronautics and Space Administration (NASA) Operation IceBridge (OIB), which links the measurements of air- and space-borne laser altimeters to compare and generate a long-term ice altimetry record (Koenig et al. Citation2010; Kwok et al. Citation2011). Ice thickness data from ships and aircraft enable coverage of extended areas, compared to data from ice profiling sonar (IPS) or upward-looking sonar (ULS), and are used to fill in gaps in polar observations between satellite missions (Koenig et al. Citation2010; Kwok et al. Citation2011). A moored sonar has the advantage of being able to record ice drafts at specific locations throughout the year continuously; however, these data are often noisy and are challenging to accurately convert into their corresponding thickness values (Proshutinsky, Bourke, and McLaughlin Citation2002; Yoshizawa, Shimada, and Cho Citation2018). Alternatively, since the initiation of the ICESat (IS) and CryoSat (CS) missions, satellite-based altimeters now provide the capacity to periodically measure ice freeboard over extended areas; thus, enabling ice volume estimations (Kwok et al. Citation2019; Laxon et al. Citation2013). Sea ice thickness observations from satellite missions have been validated using airborne electromagnetic sensors, sonar moorings, and OIB aircraft lasers (Kwok et al. Citation2019; Laxon et al. Citation2013). Since overall comparisons between satellite-based thickness estimates and in situ data show good agreement, these data are readily available and are promising for estimating continuous SIT in the Arctic.

Satellite-driven SIT datasets from altimeters have larger footprints than those obtained from close-range platforms. However, due to the extremely small footprints of altimeter sensors (), it is believed that data should be collected at intervals of at least one month in order to map the entire Arctic with a single instrument. Nevertheless, for several applications, this data collection interval is not suitable for detecting rapid changes in sea ice volume. Based on an empirical relationship between emissivity and ice thickness, SIT can also be estimated using brightness temperature (TB) data from passive microwave (PMW) sensors, such as the Advanced Microwave Scanning Radiometer 2 (AMSR2). Compared with altimeter sensors, the larger footprint of the PMW sensors enables the generation of daily thickness maps for the pan-Arctic region. Surface salinity inversely correlates with ice thickness due to the relationship between the surface brine volume fraction and the dielectric properties of sea ice at microwave frequencies (Naoki et al. Citation2008; Vant, Ramseier, and Makios Citation1978). However, microwave radiometric signals are theoretically inadequate to observe radiation emitted from the vertical features of the ice layer, which is required for SIT estimation (Lee et al. Citation2020; Naoki et al. Citation2008). The Soil Moisture and Ocean Salinity (SMOS) mission uses the 1.4 GHz (L-band) channel, characterized by significant sea ice penetration depth (Kaleschke et al. Citation2012). It has been used to retrieve the SIT of thin ice (up to 0.5 m) during the Arctic freezing period. However, sea ice penetration depth becomes smaller at higher frequencies in most PMW sensors. Of note, the polarization ratio (PR) of TB between horizontal and vertical polarizations has often been considered in thin ice thickness algorithms (Iwamoto et al., Citation2014; Martin et al. Citation2004; Tamura and Ohshima Citation2011). Yoshizawa, Shimada, and Cho (Citation2018) used the gradient ratio (GR) of the vertical polarization of the AMSR2 18 GHz and 36 GHz channels to estimate the flat first-year ice draft (i.e. the ice thickness below the waterline). They evaluated the data using in situ ice draft from a moored IPS in the Chukchi Abyssal Plain. For these reasons, PMW-based SIT estimation studies have been limited to first-year or thin ice retrievals. As a multi-sensor data merging approach, Ricker et al. (Citation2017) proposed an interpolation scheme using SIT data from CS2 and SMOS to generate a weekly Arctic-wide SIT product. However, this weekly product used 2–4 weekly SIT means as the background. Based on previous studies, current PMW-based SIT retrievals are limited to thin ice, and no operational daily SIT product exists for all ice types from any satellite remote sensing data. However, some weekly or monthly averaged products generated daily are available from the Alfred Wegener Institut (AWI; https://awi.de) and the National Snow & Ice Data Center (NSIDC; https://nsidc.org).

Figure 1. Examples of gridded Arctic sea ice thickness (SIT) maps derived from CryoSat-2 (CS2): 1-day (left, 1 February 2019); 14-day (middle, February 1–14); and 28-day (right, February 1–28) composites

Figure 1. Examples of gridded Arctic sea ice thickness (SIT) maps derived from CryoSat-2 (CS2): 1-day (left, 1 February 2019); 14-day (middle, February 1–14); and 28-day (right, February 1–28) composites

Deep learning (DL), an emerging topic in the machine learning community, attempts to model high-level abstraction in big data (LeCun, Bengio, and Hinton Citation2015). Deep learning is motivated by mimicking the learning process of the human brain, which often interactively learns from past experiences. In recent studies, a variety of DL architectures, such as multi-layer perceptron (MLP) (Gardner and Dorling Citation1998), convolutional neural networks (CNNs) (Krizhevsky, Sutskever, and Hinton Citation2017), and recurrent neural networks (RNN) (Sak, Senior, and Beaufays Citation2014) have been proposed and applied in diverse fields of study, yielding outstanding results (Zou et al. Citation2015). Several studies have also been investigated using DL for retrievals of sea ice properties. Chi et al. (Citation2019) proposed an MLP-based Arctic sea ice concentration (SIC) retrieval algorithm that used AMSR2 observations and high-resolution SIC derived from Moderate Resolution Imaging Spectroradiometer (MODIS) data as reference data. In their study, DL-based SIC values exhibited better agreements than other retrieval algorithms, especially in the 20–80% SIC zones. Wang et al. (Citation2016) utilized CNN to estimate SIC from synthetic aperture radar (SAR) images during the melt season. This approach was robust regarding the effects of the incidence angle, noise, wind, and melt and produced high-resolution SIC estimates in regions with low ice concentrations. Chi and Kim (Citation2017) obtained the first SIC predictions using RNN without incorporating additional ocean–atmosphere variables. Their study yielded comparable September forecasts to the results of various statistical and numerical models obtained via the Sea Ice Outlook (https://arcus.org/sipn/sea-ice-outlook).

In this study, we propose a novel SIT retrieval algorithm using a state-of-the-art CNN approach for all ice types and daily mapping from AMSR2 PMW data, which are not generally used for retrieving ice thicknesses. The retrieval model uses a CNN-based ensemble approach that concatenates multiple types of input features generated by the proposed mathematical operations to estimate SIT from the AMSR2 data. The SIT retrieval model exploits hidden and nonlinear relationships inherent in multiple AMSR2 channels, not often captured by traditional linear models. Although the reported CS2 SIT measurements have large errors and high uncertainties for observations of ice thicknesses less than 1 m, due in part to the limitation imposed by the radar’s pulse width, it is one of the most widely used SIT products (Kurtz, Galin, and Studinger Citation2014; Laxon et al. Citation2013; Tilling, Ridout, and Shepherd Citation2018). A goal of this study is to produce daily SIT maps over the entire Arctic from AMSR2 data at 25 km spatial resolution. Therefore, SIT data derived from CS2 that have been validated with in situ data (Laxon et al. Citation2013; Tilling, Ridout, and Shepherd Citation2018) were used as a reference to train and evaluate the model, and the estimated SIT values from AMSR2 showed good statistical agreements with CS2-based SITs. However, for thin ice measurements, which contain large errors and high uncertainties similar to CS2-based estimations, SIT retrievals from the proposed model were compared with daily SMOS-estimated SIT values. Lastly, a feature importance test using feature-denial experiments was conducted to explain which feature made the largest statistical contribution to the retrievals.

2. Datasets

2.1 AMSR2 brightness temperatures as input variables

The AMSR2 is onboard the Global Change Observation Mission-Water 1 (GCOM-W1), a sun-synchronous and near-polar orbiting platform successfully placed into orbit in May 2012. The AMSR2 records data at seven frequencies (6.9, 7.3, 10.6, 18.7, 23.8, 36.5, and 89 GHz) for horizontal (H) and vertical (V) polarization (Okuyama and Imaoka Citation2015). To mitigate the impacts of different antenna patterns (footprints) for the AMSR2 channels, re-sampled and gridded Level-3 TB data (25 km spatial resolution) were obtained from the Globe Portal System (https://gportal.jaxa.jp), which is operated by the Japan Aerospace Exploration Agency (JAXA). In this study, the AMSR2 Level-3 TB data were used as input variables.

2.2 Gridded CryoSat-2 sea ice thickness (SIT) as a target variable

Following the failure, CS2 was placed into a low-Earth orbit in April 2010 with an orbital inclination of 92°, enabling coverage up to 88° N. The SAR Interferometric Radar Altimeter (SIRAL) is the primary instrument aboard the CS2 that measures ice elevation using radar. It has a footprint of approximately 0.3 km × 1.5 km along-track and cross-track, respectively (Laxon et al. Citation2013). Once the sea ice freeboard is estimated from the CS2 radar waveforms, converting to SIT is straightforward using the hydrostatic equilibrium equation as follows (Alexandrov et al. Citation2010; Kurtz, Galin, and Studinger Citation2014; Laxon et al. Citation2013):

SIT=FρwρwρI+SρsρwρI

where F is the sea ice freeboard, S is the snow depth, and ρw,ρI,ρs are the densities of seawater, sea ice, and snow, respectively. Based on the assumption that ρw,ρI,ρs are known parameters, the parameterization of snow depth on sea ice is the most challenging, as it is not routinely measured. Since there are many approaches to converting freeboard into thickness using different parameter assumptions (Alexandrov et al. Citation2010; Kwok et al. Citation2019; Laxon et al. Citation2013), we used the converted daily SIT trajectory data obtained from the AWI to minimize uncertainties in the conversion process.

The AMSR2 Level-3 TB data include multiple tracks because the ground tracks overlap at high latitudes, and the overlapping regions increase with latitude. Differences in acquisition times may be responsible for the differences between the CS2 and PMW-based SIT values due to sea ice drift from wind forcing and ice-ocean interactions (Nakayama, Ohshima, and Fukamachi Citation2012). However, because the daily ice drift rate in the Arctic is less than 10 km/day, which is smaller than the spatial resolution (25 km) of the AMSR2 data used in this study (Durner et al. Citation2017; Lund et al. Citation2018), time differences within a single day were not considered. Most CS2-based SIT products are often output onto a Polar Stereographic or EASE 2.0 grid due to the large variations in SIT estimates from CS2 (Laxon et al. Citation2013; Kurtz, Galin, and Studinger Citation2014; Tilling, Ridout, and Shepherd Citation2018). Therefore, in this study, the daily SIT trajectories from the CS2 data were mapped onto the same Polar Stereographic 25 km grid cells as those used for the AMSR2 data. Samples of gridded Arctic SIT maps derived from the CS2 data are shown in . The weighted arithmetic mean, which is based on the individual thickness uncertainties, was used to calculate the average ice thickness within each grid cell as follows:

SIT=i=1NSITiσSITi2σSIT2
σSIT=1/i=1N1σSITi2

where N is the number of SIT measurements within a grid cell, σSIT is the uncertainty of the ice thickness retrieval, and σSIT is the random uncertainty. Since the mean SIT uncertainty depends on the number of data points in a grid cell, the mean uncertainty cannot be interpolated onto a different grid. Therefore, σSIT was calculated for each grid cell, and the SIT values with high random uncertainties (three standard deviations from the mean as a cutoff) were excluded from the SIT retrieval model (the percentage of the excluded data was approximately 3.5%). Hereafter, CS2-derived SIT values are the weighted arithmetic mean SIT corresponding to a 25 km grid cell and used as a target variable in this study.

3. Methods

Convolutional neural networks provide suitable techniques by learning contextual features and are widely used in image classification tasks (Krizhevsky, Sutskever, and Hinton Citation2017; Yu et al. Citation2017). Generally, convolutions are performed in the spatial domain but can be performed in the one-dimensional (1D) spectral domain, such as for hyperspectral or multi-variable data (Li, Zhang, and Zhang Citation2014; Maggiori et al. Citation2016). Multi-layer perceptron is a widely used type of feed-forward neural network based on back-propagation. However, MLP is unsuitable for more advanced tasks because it consists of fully connected layers in which each perceptron is connected to every other perceptron. Since the total number of learnable parameters can become extremely large, this results in redundancy when there are many dimensions. In contrast, CNN often uses fewer parameters than a fully connected network, and learns and generalizes features from the input domain.

Similar to general CNN architectures, 1D-CNN consists of two parts: 1) feature extraction and 2) regression or classification (Ince et al. Citation2016; Krizhevsky, Sutskever, and Hinton Citation2017; Peng, Liu et al., Citation2018a; Huang et al. Citation2019). Feature extraction has convolutional and pooling layers, and the regression/classification has fully-connected dense and dropout layers. The flattened layer between the first and second parts of the CNN exists to connect the two parts, with each convolutional and dense layer having an activation function. To create feature maps, the convolutional layers use a set of learnable filters connected to an input. Each filter convolves across the variables and computes the dot products. A pooling layer is inserted between successive convolutional layers, reducing the number of parameters, generalizing the feature maps, and preventing overfitting of the training data. To create final nonlinear combinations of features and make decisions, each of the fully connected layers, which are regular feed-forward neural-network layers, has full connections to all of the activations in the previous layer (LeCun, Bengio, and Hinton Citation2015; Krizhevsky, Sutskever, and Hinton Citation2017).

3.1 Feature augmentation strategy

In conventional machine learning tasks, the use of a large number of input features does not always guarantee higher accuracy, owing to the curse of dimensionality (Bellman Citation2015). In contrast, recent advanced DL approaches have been able to handle this limitation by combining improved optimization algorithms, big data, and advanced machines. The general form of a DL algorithm, including that of a 1D-CNN, is based on “add” operations between the input features to determine their weights or convolutional filters instead of “product” operations (Qu et al. Citation2018). For instance, output y of the hidden layer in DL can be calculated using w1x1+w2x2++wnxn where w,x,n are weights, input features, and the number of input features, respectively. Direct mathematical operations between two or more input features such as w1x1x2+w2x1x3++wn1x1xn will never occur.

Since we wanted to determine the theoretically unknown relationship between PMW signals and ice thickness values using DL, which has not been investigated in the current PMW-based SIT retrieval algorithms, exploiting features generated by “product” operations may be worthwhile. Therefore, we defined some inter-feature operations using the inverse, logarithm, Hadamard product, and normalized differences between TB channels (NDTB) in addition to the original 14 channels in the AMSR2 TB data, as listed in . In particular, the expression for NDTB v1p,v2p was inspired by the widely used PR (v) and GR v1p,v2p for the retrieval of sea ice properties from PMW data. The expression for PR is given by the normalized difference between the H- and V-polarized TB at a single frequency (v), and GR is the spectral gradient of TB between two different frequencies (v1,v2) for the polarized component (p) (Markus and Cavalieri Citation2000). Considering all of the combinations from the original 14 features, a total of 357 features were obtained for the extended model inputs instead of the original 14 AMSR2 TB features. It is uncertain whether these additional features are physically meaningful. However, finding and defining the hidden and unknown relationships between the additional features may be helpful for AMSR2-based SIT retrievals for all ice types. In particular, a few NDTBs have been used in retrieval algorithms for studies of ice concentration, type, thickness, and snow depth, as shown in . In addition, they could be used to estimate SIT directly from the PMW data without a freeboard (or draft)-thickness conversion.

Table 1. Input features for the proposed AMSR2-based sea ice thickness (SIT) retrieval model by feature augmentation

Table 2. Relationship between NDTB feature and the corresponding ratio widely used in retrieval algorithms of sea ice properties

3.2 Ensemble one-dimensional convolutional neural network

Owing to the nature of 1D-CNN, which explores neighboring features using convolutional filters, extracting meaningful information may be inefficient when the features are distant or physically uncorrelated. For example, if the number of input features is large, it is difficult to extract some information from the first and the last features unless a large kernel size or very deep network architecture is used. To overcome this limitation, this study split the input features into seven branches based on the feature augmentation rules listed in . We then developed a 1D-CNN-based ensemble model with multiple inputs and different modalities. The proposed model takes seven types of features as inputs, each having a different size. shows the proposed model architecture, along with the shape of each layer. Separate feature extractions of the 1D-CNN model blocks are operated on each feature type (gray blocks in ), with the results from all branches concatenated (blue blocks in ) for the final SIT retrieval (yellow block in ). The widely used rectified linear unit (ReLU) (Agarap Citation2018) and adaptive moment estimation (Adam) (Kingma and Ba Citation2014) with a 10−4 learning rate were used as the activation function for the convolutional and dense layers and as the gradient descent optimization algorithm, respectively. The mean absolute error (MAE) was used to calculate the loss due to its robustness to outliers. The dropout in the dense layers was used to learn about more robust features and prevent overfitting (Srivastava et al. Citation2014). Bayesian optimization was employed to optimize network configurations (hyperparameters) using small subsets of the data (Brochu, Cora, and De Freitas Citation2010). The Keras framework (https://keras.io) was used to build the 1D-CNN to retrieve the SIT from the AMSR2 TB data with TensorFlow (https://tensorflow.org) as a backend.

Figure 2. Network architecture of the proposed ensemble one-dimensional convolutional networks (1D-CNN) model for retrieving SIT from augmented AMSR2 brightness temperature (TB) data. Black layers denote each input feature group, as defined in , and pass through a 1D-CNN block. The results from each 1D-CNN block then concatenate and deliver to the dense layer parts (blue layers) for SIT retrieval

Figure 2. Network architecture of the proposed ensemble one-dimensional convolutional networks (1D-CNN) model for retrieving SIT from augmented AMSR2 brightness temperature (TB) data. Black layers denote each input feature group, as defined in Table 1, and pass through a 1D-CNN block. The results from each 1D-CNN block then concatenate and deliver to the dense layer parts (blue layers) for SIT retrieval

3.3 Model training

Due to the high uncertainties in summer SIT retrievals, we estimated the SIT for the freezing season (October to April), similar to other SIT retrieval studies/products (Iwamoto, Ohshima, and Tamura Citation2014; Kaleschke et al. Citation2012; Kurtz, Galin, and Studinger Citation2014; Kwok et al. Citation2019; Laxon et al. Citation2013; Lee et al. Citation2020; Ricker et al. Citation2017; Yoshizawa, Shimada, and Cho Citation2018; Lee et al. Citation2020). To develop a robust SIT retrieval model, we trained using a set of corresponding AMSR2 TB and CS2 SIT data acquired from two freezing seasons (October 2015–April 2016 and October 2016–April 2017; 662,299 samples), validated using data acquired from October 2017–April 2018 (332,049 samples), and tested using data acquired from October 2018–April 2019 (335,299 samples). Pixels near the coastlines were excluded to minimize land spillover effects that are misclassified sea ice pixels because of mixed land-ocean areas within the sensor’s field of view (Cavalieri et al. Citation1999). Pixels with SICs of less than 15%, a widely used threshold value to determine sea ice extent (Vinnikov et al. Citation1999), were also excluded as they are highly mixed pixels of ice and water signals (Chi et al. Citation2019). The TB data from the ascending and descending passes were averaged and used to develop the ensemble 1D-CNN model, as illustrated in .

We performed cross-validation by shuffling the years for training/validation/test sets to avoid accidental results due to fixed datasets. For example, one randomized set used data from October 2014–April 2015 and October 2017–April 2018 for training, data from October 2016–April 2017 for validation, data from October 2012–April 2013 for testing, etc. The results confirmed that the model scores were stable in time and were insensitive to the randomness. Therefore, in this study, we conducted experiments using data from October 2015–April 2016 and October 2016–April 2017 as training sets, data from October 2017–April 2018 as validation sets, and data from October 2018–April 2019 as testing sets.

4. Results and discussion

4.1 Model comparison

We first evaluated the performance of the proposed ensemble model based on the number of input features and model architectures. Although DL approaches showed promising results in many applications, it does not always guarantee better outcomes. Therefore, we chose a random forest (RF) regression model as the baseline model because of its popularity and performance in various machine learning applications (Fernández-Delgado et al. Citation2014). Five experimental cases were considered: 1) RF using the original 14 features; 2) RF using the extended 357 features; 3) 1D-CNN using the original 14 features; 4) 1D-CNN using the extended 357 features; 5) the proposed ensemble 1D-CNN using the extended 357 features. For the RF models, a grid search was used to determine the best values of the hyperparameters (number of trees, maximum tree depths, and the maximum number of features). For the conventional 1D-CNN models, one of seven input CNN blocks and dense layers (gray and blue blocks, respectively, in ) were directly connected without the concatenate layer. The training process of the CNN-based models continued iterating using the entire set of training samples, which were exposed to the network until the loss function for the validation data reached a minimum. With a 16,384 training batch size, 1D-CNN (14 features) and 1D-CNN/ensemble 1D-CNN (357 features) converged to minimum loss values after approximately 100,000 and 70,000 training epochs, respectively, on an NVIDIATM Tesla V100 with 32 GB of memory.

summarizes MAE, root mean square error (RMSE), bias, the Pearson correlation coefficient, and computational time according to the model architectures and the number of input features for the unseen test data (October 2018–April 2019) (Note: p-values of all models from Student’s t-tests were < 0.001). The use of extended features proposed in this study contributed to improved SIT estimations from the AMSR2 data, and the proposed ensemble model further yielded better agreements with the gridded CS2-based SIT estimates. The proposed ensemble model with the extended input features produced the most accurate outcomes. Furthermore, due to there being 10% fewer trainable parameters for the proposed ensemble model (1,557,073 versus 1,707,569), it was more computationally efficient than the conventional 1D-CNN-based models with the same number of input features. Although exploiting fewer features accelerated the model training and provided better estimations than the RF, the error of the proposed model was about half that of the 14-feature 1D-CNN model (MAE: 11.99 cm versus 25.00 cm; RMSE: 18.38 cm versus 35.33 cm).

Table 3. Statistical model comparisons (mean absolute error (MAE), root mean square error (RMSE), bias, and the Pearson correlation coefficient) with computational time using test sets according to the model architectures and number of input features

4.2 Quantitative accuracy assessment using gridded CryoSat-2 estimates

First, statistical comparisons were conducted between the AMSR2-derived daily SIT values estimated by the ensemble 1D-CNN (SITAMSR2) and the corresponding SIT values derived from the gridded CS2 data (SITCS2). The SITCS2 was evaluated using fine-scale in situ measurements acquired from OIB, IPS, or ULS (Laxon et al. Citation2013; Tilling, Ridout, and Shepherd Citation2018). ) shows the SITAMSR2 versus SITCS2 for the unseen test data (October 2018–April 2019), with the number of pixels denoted by the color intensity. In statistical terms, 11.99 cm of MAE, 18.38 cm of RMSE, and a near-zero bias for the test data were achieved. Remarkable correlations between the SITAMSR2 and SITCS2 were observed in the scatter plot. The Pearson correlation coefficient was 0.98, and the regression slope was near unity (0.93). The distribution of most sample points in the scatter plot was very close to the 1:1 line. ) illustrates a histogram of SITAMSR2 (orange) and SITCS2 (blue), with a 10 cm bin width. While the AMSR2-based SIT retrieval model showed a relative weakness in estimating SITs of less than 1.3 m, it exhibited a better estimation capacity for ice with thicknesses of 1.3–4 m. ) is a boxplot of the differences in SIT between AMSR2 and CS2 (SITCS2 – SITAMSR2) according to SIT ranges. The median thickness differences (orange lines) tended to increase slightly from the negative to positive values as the sea ice thickened yet generally formed around zero, and the whiskers were approximately equal on both sides of the box for thicknesses up to 4 m. In particular, SITs between 1.5 and 3 m exhibited better agreements than the other ranges due to the higher number of training samples in this range.

Figure 3. Statistical accuracies of the proposed SIT retrieval model using the test set: (a) Scatter plot of gridded CS2-estimated SIT (SITCS2; x-axis) vs. AMSR2-estimated SIT (SITAMSR2; y-axis) values; (b) histogram of SITCS2 (blue) and SITAMSR2 (orange) with a 10 cm bin size; (c) boxplot of SIT differences between CS2 and AMSR2 according to ice thickness ranges

Figure 3. Statistical accuracies of the proposed SIT retrieval model using the test set: (a) Scatter plot of gridded CS2-estimated SIT (SITCS2; x-axis) vs. AMSR2-estimated SIT (SITAMSR2; y-axis) values; (b) histogram of SITCS2 (blue) and SITAMSR2 (orange) with a 10 cm bin size; (c) boxplot of SIT differences between CS2 and AMSR2 according to ice thickness ranges

Overall, errors can be attributed to two sources. First, the differences in spatial resolution and acquisition times between the AMSR2 and CS2 may have resulted in a few misestimated samples. Although our labeled matched samples used for model training and testing were selected carefully based on pre-defined criteria (Section 2.2), multiple thickness measurements from CS2 corresponding to one 25 km AMSR2 grid cell may have caused errors. Second, AMSR2 tended to underestimate SIT as the sea ice thickened, as seen in . The number of samples in the thick ice range (> 4 m) was much smaller than in the 1–3 m range. These imbalanced samples in thick SIT ranges can cause underestimations in the thick ice estimates.

We generated daily SITAMSR2 maps from October 2012–April 2020 and evaluated them by making seasonal and monthly statistical accuracy (MAE, RMSE, and correlation) comparisons with corresponding gridded SITCS2 values as listed in . All of the p-values from Student’s t-tests for the daily data were less than 0.05, implying a statistically significant difference between the two datasets at a confidence level of 95%. Since the developed ensemble 1D-CNN model was exposed to data from October 2015–April 2018 for model training and validation, the seasonal statistics for these periods were better than those for data not exposed to the model (October 2013–April 2015; October 2018–April 2020). Overall, the seasonal averages for the unseen periods were consistent throughout the unseen periods and were not significantly different from those for the test period (MAE: 13.28 cm versus 13.43 cm; RMSE: 18.07 cm versus 18.23 cm; Correlation: 0.9800 versus 0.9816). In monthly comparisons, SITAMSR2 estimates in October and April exhibited weak agreements with SITCS2 compared to those from November–March and the seasonal means. One possible explanation for this is that October and April are the freezing and melting onsets of Arctic sea ice, respectively (Markus, Stroeve, and Miller Citation2009; Peng, Steele et al., Citation2018b). Therefore, more variations in sea ice dynamics within a 25 km grid can exist during these periods. While the accuracy from November–March was relatively stable, accuracy tended to decrease slightly as sea ice extents increased because of the increased number of sea ice pixels to be calculated in the statistical comparisons (Comiso et al. Citation2008). These results indicate that the retrieval model developed herein is statistically robust regarding unseen data and generalized since the proposed DL model was trained using sufficient training and validation samples from three Arctic winter seasons. Therefore, the proposed model is expected to be suitable for future operational generation of daily AMSR2-based SIT products.

Table 4. Comparisons of seasonal and monthly MAE, RMSE, and the Pearson correlation coefficient between the daily AMSR2 and CryoSat-2-derived sea ice thickness

4.3 Quantitative accuracy assessment using gridded operational IceBridge measurements

We evaluated the SITAMSR2 estimated using the proposed method with in situ SITs collected from OIB missions (SITOIB) from 2012 to 2019. However, there were significant differences in the spatiotemporal scales between the 25 km resolution AMSR2 grid and the fine-scale in situ data. Similar to SITCS2, multiple OIB measurements were associated with the same SITAMSR2 value. For this reason, the OIB data were gridded on the same 25 km grid using the same approach used in the CS2 to AMSR2 grid conversions (see Section 2.2). We selected 740 spatiotemporally matched samples from gridded AMSR2, CS2, and OIB datasets for pixel-to-pixel cross-comparisons. illustrates this cross-comparison, whose statistics were calculated using the same metrics as the previous comparisons. As shown in ), 47 cm of MAE and 68 cm of RMSE, with a correlation of 0.76, were obtained between SITAMSR2 and gridded SITOIB, which are much larger than the error between SITAMSR2 and SITCS2 (see )). Two possible reasons may explain these differences. First, the spatiotemporal resolutions of the two data sources significantly differed. Although we assumed the gridded SITCS2 as a reference and used them to compare with SITAMSR2 in the previous experiments (Section 4.2), the SITOIB data were acquired from completely different instruments. Second, CS2-derived SIT estimates used climatological snow depth values and were multiplied over first-year ice using a factor of 0.5 due to a significantly higher fraction of first-year ice (Kurtz and Farrell Citation2011; Hendricks, Ricker, and Helm Citation2016). However, snow depths during the IceBridge campaigns were derived from the snow radar (Kurtz and Farrell Citation2011). Additionally, different ice density assumptions in freeboard to thickness conversions may have resulted in large differences between the datasets (Kurtz, Galin, and Studinger Citation2014). While the CS2 SIT product used values of 916.7 kg/m3 and 882 kg/m3 for first-year and multi-year ice, respectively (Laxon et al. Citation2013; Hendricks, Ricker, and Helm Citation2016), the OIB SIT used 915 kg/m3 (Kurtz and Farrell Citation2011; Kurtz et al. Citation2013). However, as shown in ), a comparison of SITCS2 and SITOIB also had a similar error level. A direct comparison of SITAMSR2 and SITCS2 for the matched samples in ) still had a good agreement, although the number of samples was substantially smaller than those used in ) (335,299 versus 740 samples). In previous studies, Laxon et al. (Citation2013) and Tilling, Ridout, and Shepherd (Citation2018) showed similar difference patterns between gridded SITCS2 and SITOIB measurements, although they used different samples and grid mapping scale (0.4° latitudinal by 4° longitudinal grid).

Figure 4. Statistical accuracy comparisons using 740 matched samples from gridded Operational IceBridge (SITOIB), SITAMSR2, and SITCS2 data: (a) Scatter plot of SITOIB (x-axis) vs. SITAMSR2 (y-axis); (b) scatter plot of SITOIB (x-axis) vs. SITCS2 (y-axis); (c) scatter plot of SITCS2 (x-axis) vs. SITAMSR2 (y-axis)

Figure 4. Statistical accuracy comparisons using 740 matched samples from gridded Operational IceBridge (SITOIB), SITAMSR2, and SITCS2 data: (a) Scatter plot of SITOIB (x-axis) vs. SITAMSR2 (y-axis); (b) scatter plot of SITOIB (x-axis) vs. SITCS2 (y-axis); (c) scatter plot of SITCS2 (x-axis) vs. SITAMSR2 (y-axis)

4.4 Visual inspections of AMSR2-based SIT maps

illustrates the AMSR2-based SIT maps over the entire Arctic Ocean for mid-October 2018 to April 2019 generated using the proposed DL method. Note that pixels with SICs of less than 15%. Due to a lack of daily SITCS2 data for the entire Arctic Ocean and PMW-based SIT for all ice types, direct daily visual comparisons for all AMSR2 pixels cannot be made. However, the spatial distributions of the SIT values and time-series are reasonable compared to the stage of sea ice development (Comiso et al. Citation2008), daily generated weekly/monthly averaged products (Ricker et al. Citation2017; Lee et al. Citation2020; Xiao et al. Citation2020; Artic SIT maps of Centre for Polar Observation and Modelling Data Portal (http://cpom.ucl.ac.uk)), and SIT values of adjacent pixels. show the monthly means of daily SIT generated from AMSR2 data, and –f) show the monthly means of daily CS2 trajectories on corresponding AMSR2 grid cells for January, February, and March 2019. Overall, the distribution of the SITAMSR2 exhibited a fair agreement with the SITCS2. However, relatively high regional variations were observed in the SITCS2 maps, particularly in first-year ice regions. Differing spatial coverages between the two sensors may explain these variations. While the AMSR2 data yields spatially uniform measurements over the entire Arctic every day, daily CS2 measurements are sparse, especially further from the pole (i.e. first-year ice zone). This indicates that the monthly mean in a specific area contains the estimates from different days.

Figure 5. Daily generated AMSR2-derived SIT maps at 25 km spatial resolution for mid-October 2018 to April 2019

Figure 5. Daily generated AMSR2-derived SIT maps at 25 km spatial resolution for mid-October 2018 to April 2019

Figure 6. Monthly averaged SIT maps at 25 km spatial resolution (top row: AMSR2-derived; bottom row: CS2-derived) for January (a, d), February (b, e), and March (c, f) 2019

Figure 6. Monthly averaged SIT maps at 25 km spatial resolution (top row: AMSR2-derived; bottom row: CS2-derived) for January (a, d), February (b, e), and March (c, f) 2019

4.5. Comparison with the daily SMOS SIT product for thin ice

SMOS-derived SIT (SITSMOS) is one of the most representative daily SIT products. However, it is limited to thin ice with depths of up to approximately 0.5–1 m and depends on the ice temperature and salinity (Kaleschke et al. Citation2012). Kaleschke et al. (Citation2015) and Ricker et al. (Citation2017) found that the SITCS2 over thin ice (< 1 m) and in the marginal ice zone have large relative uncertainties. In this study, the SITAMSR2 and SITSMOS for corresponding thin ice pixels (< 1 m) were quantitatively and qualitatively compared. ) shows a plot of the SITAMSR2 and SITSMOS values with statistical comparisons for the test period (Note: p-value from Student’s t-test was < 0.001). Unlike the statistical comparison with SITCS2 for all ice types in the previous section ()), the MAE and RMSE were calculated as approximately 21 cm and 37 cm, respectively. The estimated SITAMSR2 overestimated SIT compared to the SITSMOS (17 cm positive bias), and most of the data points in the scatter plot are above the 1:1 line. Notably, significant misestimations in sea ice pixels less than 10 cm were observed. As shown in ), the distribution of SITAMSR2 (orange) is significantly different from that of SITSMOS (blue). Particularly, retrievals of SIT values less than 5 cm were failed.

Figure 7. Statistical accuracy comparisons of SITAMSR2 and SMOS-derived SIT (SITSMOS) values for thin ice (< 1 m): (a) Scatter plot of SITSMOS (x-axis) vs. SITAMSR2 (y-axis) values; (b) histogram of SITSMOS (blue) and SITAMSR2 (orange) with a 5 cm bin size

Figure 7. Statistical accuracy comparisons of SITAMSR2 and SMOS-derived SIT (SITSMOS) values for thin ice (< 1 m): (a) Scatter plot of SITSMOS (x-axis) vs. SITAMSR2 (y-axis) values; (b) histogram of SITSMOS (blue) and SITAMSR2 (orange) with a 5 cm bin size

illustrates the SITAMSR2, SITSMOS, and their difference maps for thin ice (< 1 m) in the middle of January, February, and March 2019. The color scale in the ), (h), (i) is labeled SITAMSR2 – SITSMOS, with positive (SITAMSR2 > SITSMOS) and negative (SITAMSR2 < SITSMOS) differences colored red and blue, respectively. Overall, as in the scatter plot comparison, AMSR2 overestimated SIT with respect to SMOS, particularly near the coastlines of the United States, Canada, Russia, and Greenland, where first-year ice is dominant. In January images, large distributions of underestimated pixels were observed in the Beaufort, Chukchi, East Siberian, and Laptev Seas ()), where SMOS masks sea ice pixels thicker than 1 m ()). The Bering Strait and a small area of the Chukchi Sea were underestimated ()). Overall, misestimations increased as sea ice developed. In the March images (), the AMSR2 significantly overestimated thin ice in the Baffin Bay, the western Greenland Sea, and the Barents Sea.

Figure 8. Sea ice thickness maps at 25 km spatial resolution for thin ice (< 1 m) generated from AMSR2 (top row) and SMOS (middle row), and the difference between AMSR2 and SMOS (bottom row) on January 15 (a, d, g), February 15 (b, e, h), and 15 March 2019 (c, f, i)

Figure 8. Sea ice thickness maps at 25 km spatial resolution for thin ice (< 1 m) generated from AMSR2 (top row) and SMOS (middle row), and the difference between AMSR2 and SMOS (bottom row) on January 15 (a, d, g), February 15 (b, e, h), and 15 March 2019 (c, f, i)

As addressed in previous studies that exhibited large errors over thin ice regions of CS2 (Kaleschke et al. Citation2015; Ricker et al. Citation2017), we also observed large overestimations in the SITAMSR2 results compared to SITSMOS, as shown by both statistical () and visual () analyses. The proposed retrieval model highly relies on gridded SITCS2 measurements with centimeter-scale differences (as shown in Section 4.2) since it assumes that gridded SITCS2 is a reference and even mimics the potential errors/uncertainties in the SITCS2 estimations over thin ice. Sparse CS2 SIT measurements in first-year ice regions and exclusion of water-contaminated pixels (SIC < 15%) resulted in insufficient samples from new or young ice for DL model training. This can be another potential reason for weak agreements with sea ice less than 10 cm.

4.6 Statistical feature importance test

Conventional SIT retrieval algorithms generally exploited the physical characteristics of ice using specific PMW channels or altimeter waveforms. However, the DL-based retrieval model developed in this study exploits all frequency/polarization channels of the AMSR2 data. Due to the unique characteristics of neural networks, which solve problems by exploiting hidden relationships inherent in multiple input features, it is difficult to physically quantify the importance of the input variables. Alternatively, we performed a feature permutation test to statistically explain which feature makes the largest contribution to the SIT retrievals. A single feature was randomly varied, while all other features were kept constant, and this process was iterated by changing the input feature. The feature significance assessment shows how the model performance decreased due to random shuffling. In this case, performance was measured using the MAE, which was used as the loss metric in model training.

shows heat maps of the increased MAE values resulting from the input feature permutation tests, calculated after ten replicates. Inputs with greater increased MAE values due to random permutations were more significant for determining the SIT. Using this metric, the original TB features were not statistically significant, whereas the additional features generated by our feature augmentation strategy demonstrated greater responses. Although the logarithm TB features ()) seem to be more important than the original TB features ()), they were not significant. The Hadamard product of two AMSR2 channels ()) had a greater effect on retrievals than that of a single feature operation (–d)). Overall, the Hadamard products combining low- and high-frequency channels contributed the most to the AMSR2-based SIT retrievals. The features defined by inverse operations of the original, logarithm, and Hadamard product (, d, f)) generally exhibited greater feature importance than the non-inverted features (, c, e)). In particular, the five most important features were 1/(TB(89V)TB(18V)), 1/(TB(89V)TB(10V)), 1/(TB(89V)TB(7V)), 1/(TB(36V)TB(10V)), and 1/(TB(36V)TB(7V)), which are combinations of inverse and product operations of two features. As some of the NDTB features are conventionally used to retrieve sea ice/snow properties listed in , we expected that the NDTB features would be significant. However, except for NDTB (18V, 10V) and NDTB (23V, 18V), the NDTB features were generally less critical than expected, as shown in ); this could be because information regarding the NDTB features may be mutually contained in the other features in the hidden layers of the neural network.

Figure 9. Increasing MAE in the AMSR2 SIT retrievals obtained from feature permutation importance tests. (a) Original p polarized TB at a single AMSR2 frequency (ν). (b) Inverse of TBν,p. (c) Logarithm of TBν,p. (d) Inverse-logarithm of TBν,p. (e) Hadamard product of two frequency channels (v1,v2) for p polarization. (f) Inverse of TBν1,pTBν2,p. (g) Normalized difference TB between two different frequencies (ν1, ν2) for the polarization (p). Note: p denotes horizontal or vertical polarization. ν denotes the AMSR2 channels (6.9, 7.3, 10.6, 23.8, 36.5, and 89 GHz)

Figure 9. Increasing MAE in the AMSR2 SIT retrievals obtained from feature permutation importance tests. (a) Original p polarized TB at a single AMSR2 frequency (ν). (b) Inverse of TBν,p. (c) Logarithm of TBν,p. (d) Inverse-logarithm of TBν,p. (e) Hadamard product of two frequency channels (v1,v2) for p polarization. (f) Inverse of TBν1,p∘TBν2,p. (g) Normalized difference TB between two different frequencies (ν1, ν2) for the polarization (p). Note: p denotes horizontal or vertical polarization. ν denotes the AMSR2 channels (6.9, 7.3, 10.6, 23.8, 36.5, and 89 GHz)

Although this statistical feature importance test may not provide a physical meaning for the relationship between the PMW signals and SIT, we derived a partial relationship between the feature importance of the proposed model and the current PMW-based SIT retrieval algorithms. summarizes the average feature importance (according to frequency) of the features associated with a specific frequency channel from . Features associated with frequencies higher than 10.6 GHz tended to exhibit greater significances than those associated with low-frequency channels. In particular, two of the most critical frequencies (18.7 and 36.5 GHz) are widely used in thin ice thickness retrieval algorithms (Iwamoto, Ohshima, and Tamura Citation2014; Yoshizawa, Shimada, and Cho Citation2018). The 89 GHz channel helps exclude weather-related contamination (Iwamoto, Ohshima, and Tamura Citation2014; Yoshizawa, Shimada, and Cho Citation2018). Moreover, Lee, Sohn, and Shi (Citation2018) reported that the estimation of optical thickness associated with the scattering of ice freeboard and snow layers is possible using microwave frequencies between 10.65 and 36.5 GHz. In Lee et al. (Citation2020), Arctic basin-scale SIT estimates from AMSR2 were demonstrated using a piecewise linear regression of optical thickness at 36.5 GHz of AMSR2 and CS2-driven ice freeboard. These studies presented new possibilities for PMW-based SIT mapping based on the assumption that the scattering optical thickness of the surface ice layer at microwave frequencies was linearly proportional to the physical thickness of the ice freeboard layer. Unfortunately, it is still difficult to clearly explain which input feature has the most physically significant effect on the retrievals. However, based on the results from the feature importance test in this study and the studies mentioned above, a partial relationship between PMW signals and SIT was explained.

Table 5. Average feature importance (according to frequency) for the features associated with a specific frequency channel. The more important frequencies have higher increased MAE values (unit: m)

Determining the significances of the input features will allow for a statistical explanation and will foreseeably help to optimize the input features of the SIT retrieval model for future operational systems. Moreover, the results demonstrate that the use of feature augmentation based on mathematical operations that cannot be exhibited in “add” operation-based general neural network architectures is more effective than only exploiting the original TB data. The absence of negative features in indicates that each PMW channel plays specific roles in improving the SIT retrieval model in DL hidden networks.

5. Conclusions

This study devised an ensemble 1D-CNN model and employed PMW data from the winter season to estimate daily SITs for the entire Arctic at 25 km spatial resolution, which could not be retrieved by CS2 due to ground track constraints. In general, the proposed method successfully estimated the SIT compared to the gridded SITCS2 and produced spatially/temporally consistent outputs in conjunction with a robust retrieval model and well-calibrated AMSR2 TB data. The following conclusions can be derived from this study.

First, most SIT retrieval studies using PMW data have been conducted to estimate thin ice thicknesses. Considering the constellation of multiple altimetry data such as Jason-3, Sentinel-3A/B, or SARAL/AltiKa may provide a better temporal resolution. However, a single altimeter sensor has been unable to map the entire Arctic Ocean on a daily scale due to small sensor footprints, and the temporal resolution of the SIT generated by merging multi-sensor data is seven days. These limitations motivated this study to use the DL approach to exploit hidden information across multiple PMW channels; thus, a DL-based retrieval model for all ice types was developed based on PMW data. Employing additional features and an ensemble approach improved the results. Second, we achieved overall MAE and RMSE accuracies of approximately 12 cm and 18 cm for unseen data, respectively, and remarkable correlations between the SITAMSR2 and gridded SITCS2. Due to the nature of machine learning, it is difficult to physically explain which AMSR2 channel made the largest contribution to the SIT estimations. Using a statistical approach, we evaluated the relative importance of the AMSR2 channels. In particular, since our approach exploited all of the AMSR2 channels, it enabled direct SIT retrievals without separate freeboard and snow depth calculations, which are hidden in the model training. Additionally, features associated with the 10.65–36.5 GHz channels exhibited more significant contributions to the AMSR2-based SIT retrievals. These frequency channels have also been incorporated into the development of a relationship between optical thickness and ice freeboard by Lee, Sohn, and Shi (Citation2018) and Lee et al. (Citation2020). These studies support a physical relationship between PMW signals and ice thickness for all ice types. Lastly, since the proposed retrieval model consistently generated eight-year historical data with consistent accuracy, it can be considered as a new operational SIT retrieval system for providing daily Arctic SIT data.

Although the proposed ensemble 1D-CNN model successfully estimated SITs from AMSR2 PMW data at 25 km spatial resolution and showed good agreements with gridded CS2 estimates, the uncertainties and errors in the CS2 measurements, particularly for ice less than 1 m thick, were also observed in comparisons with daily SMOS ice thickness data (). Since the SIT retrieval model used in this study was trained using gridded SITCS2 as a reference for all ice thickness ranges, it may also capture uncertainties and errors in the reference data due to the characteristics of the DL models. Although 12 cm of MAE may not be trivial for new or young ice, it is worth noting that providing daily SIT maps for the entire Arctic would improve various cryosphere studies/models and support industrial applications. In addition, combining the proposed model with L-band (1.4 GHz) data from SMOS, widely used to derive thin ice thicknesses, would be worthwhile for extending this research.

Acknowledgements

Daily Level-3 gridded TB data of AMSR2, SIT trajectory data of CS2, and NASA’s OIB data are publicly available at the Globe Portal System (https://gportal.jaxa.jp) operated by JAXA, AWI (https://awi.de), and NSDIC (https://nsidc.org), respectively.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported by the Korea Polar Research Institute [PE21040].

References

Reprints and Corporate Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

To request a reprint or corporate permissions for this article, please click on the relevant link below:

Academic Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

Obtain permissions instantly via Rightslink by clicking on the button below:

If you are unable to obtain permissions via Rightslink, please complete and submit this Permissions form. For more information, please visit our Permissions help page.