Research Article

Aerosol data assimilation and forecast using Geostationary Ocean Color Imager aerosol optical depth and in-situ observations during the KORUS-AQ observing period

Pages 1175-1194 | Received 16 May 2021, Accepted 11 Aug 2021, Published online: 10 Sep 2021

ABSTRACT

This study develops an aerosol data assimilation and forecast system using the Weather Research and Forecasting model coupled with Chemistry (WRF-Chem) and the three-dimensional variational (3D-VAR) data assimilation method. The system assimilates the aerosol optical depth (AOD) from the Geostationary Ocean Color Imager (GOCI) satellite and surface particulate matter (PM) observations. The simulation domain covers Northeast Asia at 15 km horizontal resolution, and the assimilation and forecast skill is evaluated for the Korea–US Air Quality (KORUS-AQ) intensive observing period. Observing system experiments (OSEs) are conducted to examine how the quality of the analysis and the forecast skill depend on the assimilated observational data. The baseline model simulation underestimates AOD and surface PM concentration in most regions, and the assimilation of satellite and in-situ data improves the mean biases and spatial distribution. It also improves the forecast skill for the surface concentrations of PM10 and PM2.5. The OSE results indicate that the assimilation of GOCI AOD enhances the forecast skill only slightly, whereas most of the skill improvement comes from the surface PM assimilation, which provides a practically useful level of skill up to 12 hours from the initial state. The marginal improvement in the PM10 forecasts from the GOCI AOD suggests a non-negligible difference between the column-representing AOD and the surface PM concentration.

1. Introduction

Previous studies indicate that atmospheric aerosols are closely related to local air quality and cause various health issues (Seaton et al. 1995; Pope et al. 2002; Giannadaki, Lelieveld, and Pozzer 2017). In particular, fine particulate matter (PM) and sulfates cause severe cardiopulmonary diseases, which are likely to increase mortality by approximately 6% and 8%, respectively, for each 10 μg m−3 increase in airborne fine PM concentration (Pope et al. 2002). The global aerosol concentration has increased continuously, as observed in the aerosol optical depth (AOD) from satellites, at a rate of 0.011 per decade for 2000–2009 (Alfaro-Contreras et al. 2017). The trend in Northeast Asia, where the annual-mean AOD is the highest in the world, is almost eight times stronger than the global-mean trend. Atmospheric aerosol monitoring and prediction are becoming more important for early warning and for making proper decisions to mitigate damage and adverse health impacts. Most national agencies now operate routine air quality forecasts for atmospheric aerosols based on numerical models (Kukkonen et al. 2012; Marécal et al. 2015). The European Centre for Medium-Range Weather Forecasts (ECMWF) has provided an air quality forecast service (the Copernicus Atmosphere Monitoring Service; https://atmosphere.copernicus.eu/) over Europe and the global domain since 2015. The National Oceanic and Atmospheric Administration (NOAA) operates the National Air Quality Forecast Capability (NAQFC), which provides operational air quality forecast guidance for the United States (Lee et al. 2017; https://airquality.weather.gov/). Moreover, many recent studies use satellite data extensively to monitor ground-level air quality (Althuwaynee, Balogun, and Madhoun 2020; Filonchyk et al. 2020; Unnithan and Gnanappazham 2020).

Data assimilation is one of the key elements in improving forecast skill. It combines the model background state, which contains the inevitable systematic biases of a forecasting model, with observational data that are incomplete in space and time. Updated observations correct the systematic biases of the model, and the observations are fitted into the gridded analysis under the dynamical and physical constraints formulated by the model. Because forecasting is an initial value problem, it is significantly affected by the data assimilation technique that determines the quality of the initial state. Several existing studies have improved air quality forecast skill by integrating satellite-derived and in-situ observations through a data assimilation system. Liu et al. (2011) assimilated the satellite-derived AOD at 550 nm wavelength from the Moderate Resolution Imaging Spectroradiometer (MODIS; Justice et al. 1998) instrument into the Weather Research and Forecasting–Chemistry (WRF-Chem; Grell et al. 2005) model, based on the 3-dimensional variational (3D-VAR) method. Their results show not only an improved representation of the aerosol analysis fields but also a significant increase in the forecast skill measured by the equitable threat score (ETS). However, the results also show the limitations of polar-orbiting satellites due to their limited coverage in space and time. Li et al. (2013) assimilated only the surface PM2.5 observations collected from the California Nexus (CalNex; http://www.esrl.noaa.gov/csd/calnex/) 2010 field campaign. The correlation between observation and analysis increased significantly compared with the case without data assimilation. The study suggested that surface observations alone could sufficiently improve the quality of the aerosol analysis for PM2.5. Saide et al. (2014) constructed a data assimilation system using the geostationary AOD product from the Geostationary Ocean Color Imager (GOCI) onboard the Communication, Ocean and Meteorological Satellite (COMS), in addition to the polar-orbiting MODIS AOD data. Their system also adopted the 3D-VAR method and applied it to the WRF-Chem model coupled with the Model for Simulating Aerosol Interactions and Chemistry (MOSAIC; Fast et al. 2006) aerosol scheme. Their study suggested that the geostationary GOCI AOD data are highly beneficial in improving the quality of the aerosol analysis when combined with the MODIS AOD. Over the validation region in South Korea, the fractional bias of the surface PM10 analysis decreased substantially when both AOD products were used, compared with the case when only the MODIS AOD data were used.

The aforementioned studies demonstrated the benefits of using satellite-derived AOD data in improving the analysis and, thereby, air quality forecast skill. In particular, for regions where the long-range transport of dust and chemical pollutants is dominant, such as Northeast Asia, including China, Korea, and Japan, satellite data are ideal due to their wide areal coverage. On the other hand, the conclusions of the previous studies need to be considered carefully, as they use various sources of observational data with different model configurations. In particular, the individual contributions of the satellite-derived and in-situ observations have not been assessed quantitatively in a single data assimilation system. For example, Liu et al. (2011) and Saide et al. (2014) used only satellite-derived AOD products, and Li et al. (2013) used only in-situ PM2.5 observations. Moreover, most studies were limited to demonstrating the improvement for specific cases (Liu et al. 2011) or for a short testing period of fewer than two weeks (Saide et al. 2014). A more extended test is therefore needed for a quantitative assessment of the improvement.

This study first describes the construction of a 3D-VAR data assimilation system that uses the AOD from GOCI and the surface PM observations in the Northeast Asian domain. It then evaluates the quality of the aerosol analysis, with a primary focus on the veracity of the spatial and temporal variation. Observing system experiments (OSEs; Gelaro and Zhu 2009) are conducted to isolate the individual impacts of the satellite and the ground observations on the data assimilation and forecast performance. The experiment period is 41 days, covering the Korea–United States Air Quality (KORUS-AQ; Al-Saadi et al. 2016) campaign period.

This study consists of the following sections. Section 2 describes the observation data, chemical transport model, data assimilation system, and experimental designs. Section 3 presents the analysis quality for PM10, PM2.5, and AOD, respectively, regarding spatial and temporal variations. Then, the impacts on the forecast skill are presented. This study also discusses limitations and potentials for further improvement in satellite data assimilation. Section 4 provides a summary and concluding remarks.

2. Methodology

2.1 Data

The KORUS-AQ intensive observing period was an air quality monitoring campaign for atmospheric aerosols and chemicals in Northeast Asia from 1 May to 10 June 2016. During this period, various observational data are available as input to the data assimilation and for the validation of the system performance (Table 1). These include the AOD at the 550 nm visible band from the GOCI satellite instrument produced by Yonsei University (Lee et al. 2010), originally at 500-m horizontal resolution and processed onto 6 km × 6 km grids. The GOCI data are available only during daytime, at 00–07 UTC, when the visible sensor can detect the aerosol optical signal. Independent surface observations of AOD from the Aerosol Robotic Network (AERONET; Holben et al. 1998) are used to verify the data assimilation products. Twenty-one AERONET stations were located in China, South Korea, and Japan during the KORUS-AQ period. Surface PM10 and PM2.5 data are collected from the 945 stations of the China Monitoring Network, available on its website http://pm25.in/, and the 346 National Ambient air quality Monitoring Information System (NAMIS; Lee et al. 2008) stations in South Korea.

Table 1. Observational data used in the study

This study also uses the AOD reanalysis at 550 nm by the Modern-Era Retrospective analysis for Research and Applications version 2 produced by the National Aeronautics and Space Administration Goddard Space Flight Center (MERRA-2; Gelaro et al. Citation2017) as a reference from the independent analysis. It provides a global aerosol analysis produced by the analysis splitting data assimilation method (Randles et al. Citation2017) applied to the Goddard Earth Observing System (GEOS) atmospheric model. AOD observations include several sources, including the MODIS reflectance as the vast majority of the assimilated data since 2002.

2.2 Model simulations

This study uses the WRF-Chem model version 3.9.1 to simulate the atmospheric aerosol concentration in Northeast Asia. WRF-Chem is an online-coupled transport model that simulates the interaction between meteorological conditions, atmospheric chemistry, and aerosol processes.

The chemistry scheme is the Model for Ozone and Related Chemical Tracers (MOZART; Brasseur et al. Citation1998) developed for gaseous chemistry by the National Center for Atmospheric Research (NCAR). The MOZART chemistry contains a detailed description of the tropospheric inorganic chemistry mechanism, small alkane and alkene structures, isoprene, terpenes, and aromatics.

Among the aerosol schemes available in the WRF-Chem, this study uses the Goddard Chemistry Aerosol Radiation and Transport (GOCART; Chin et al. Citation2002) scheme. The GOCART categorizes the atmospheric aerosol into seven different species, including black and organic carbons, sulfate, dust, sea salt, and uncategorized PM2.5 and PM10. GOCART also divides the species depending on the particle size, specifically for dust and sea salt. For dust, five bins are assigned for the particles with a radius of 0.5, 1.4, 2.4, 4.5, and 8.0 μm. For sea salt, four bins are for a radius of 0.3, 1.0, 3.25, and 7.5 μm. Each of the black and organic carbons is divided into two categories (hydrophilic or hydrophobic). This results in 16 different aerosol species in total resolved by the GOCART scheme.
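The species count described above can be checked with a short tally. This is only an illustrative sketch: the labels below are hypothetical names chosen for readability, not WRF-Chem or GOCART variable identifiers.

```python
# Tally of the GOCART aerosol species as described in the text.
# All labels are hypothetical; they are not WRF-Chem variable names.
DUST_RADII_UM = [0.5, 1.4, 2.4, 4.5, 8.0]      # five dust size bins
SEA_SALT_RADII_UM = [0.3, 1.0, 3.25, 7.5]      # four sea salt size bins

species = []
species += [f"dust_bin{i + 1}" for i in range(len(DUST_RADII_UM))]
species += [f"seasalt_bin{i + 1}" for i in range(len(SEA_SALT_RADII_UM))]
species += ["bc_hydrophobic", "bc_hydrophilic"]   # black carbon
species += ["oc_hydrophobic", "oc_hydrophilic"]   # organic carbon
species += ["sulfate", "pm25_uncategorized", "pm10_uncategorized"]

n_species = len(species)
print(n_species)  # 5 + 4 + 2 + 2 + 3 = 16
```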

GOCART implements an interactive scheme with meteorological conditions for the natural dust and sea salt emission. The natural dust scheme is based on Ginoux et al. (Citation2001), in which the dust emission flux is parameterized with the dust source function and the surface wind speed at 10 m. The source function is the erodible fraction of soil sediments. The model considers the topographical complexity in the surrounding grids, and the source function increases in the alluvial lowland area. The dust emission also increases by the magnitude of surface wind only when it exceeds the threshold value formulated with the particle size and surface wetness. It assumes no dust flux in the wet area where the surface wetness is larger than 0.5. Sea salt emission from the ocean also depends on the particle size and surface wind speed, based on the empirical relationship (Chin et al. Citation2002).
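The threshold behaviour of the dust emission can be sketched as follows. The functional form F = c · S · s_p · u10² · (u10 − u_t) and the constant `c` are assumptions made for illustration in the spirit of Ginoux et al. (2001); the threshold wind speed is passed in directly rather than computed from particle size and wetness, and none of the names correspond to WRF-Chem code.

```python
def dust_emission_flux(source_fraction, size_fraction, u10, u_threshold,
                       surface_wetness, c=1.0e-9):
    """Illustrative dust emission flux (kg m-2 s-1) for one size bin.

    Assumed form: F = c * S * s_p * u10**2 * (u10 - u_t) for u10 > u_t.
    source_fraction : erodible fraction S of the grid cell (0-1)
    size_fraction   : fraction s_p of the emitted mass in this size bin
    u10             : 10-m wind speed (m s-1)
    u_threshold     : threshold wind speed u_t (m s-1), supplied by the caller
    surface_wetness : surface wetness (0-1); no emission above 0.5
    c               : hypothetical dimensional constant (kg s2 m-5)
    """
    if surface_wetness > 0.5:
        return 0.0  # wet surface: no dust flux
    if u10 <= u_threshold:
        return 0.0  # wind too weak to lift particles
    return c * source_fraction * size_fraction * u10**2 * (u10 - u_threshold)


# Example: the flux is zero below the threshold and grows with wind speed.
for wind in (5.0, 8.0, 10.0):
    print(wind, dust_emission_flux(0.5, 0.1, wind, 6.0, 0.2))
```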

The WRF-Chem model specifies the anthropogenic emission from the Emission Database for Global Atmospheric Research-Hemispheric Transport of Air Pollution (EDGAR-HTAP; Janssens-Maenhout et al. Citation2012), a global emission inventory produced by the European Commission (EC). It includes NOx, SOx, methane, NH3, VOC, CO, and carbonaceous aerosols. This emission data is based on the globally archived records in 2010. Biogenic emission is estimated by the Model of Emissions of Gases and Aerosols from Nature (MEGAN; Guenther et al. Citation2016) using the satellite-derived leaf area index (LAI) from the MODIS onboard Terra and Aqua. Biomass burning emission is specified based on the Fire Inventory developed by NCAR (FINN; Wiedinmyer et al. Citation2011).

Regarding the model physics, the model parametrizes cumulus convection with the Grell-3D scheme (Grell and Dévényi 2002), cloud microphysics with the WRF Single-Moment 6-class (WSM6; Hong and Lim 2006) scheme, the land surface with the Unified Noah Land-Surface Model (Chen et al. 1996), and the planetary boundary layer (PBL) with the Yonsei University (YSU) scheme (Hong, Noh, and Dudhia 2006). Shortwave and longwave radiation are computed with the Goddard shortwave scheme (Chou et al. 1998) and the Rapid Radiative Transfer Model (RRTM; Mlawer et al. 1997) longwave scheme, respectively.

Figure 1 shows the simulation domain, and the model is run at a 15 km horizontal resolution. The domain contains 27 vertical layers up to 50 hPa. The initial and boundary conditions (ICs and BCs) for meteorology are specified using the Met Office Unified Model (UM; Bellouin et al. 2011) analysis processed by the Korea Meteorological Administration (KMA). The chemical ICs and BCs are specified by the MOZART-4 (Emmons et al. 2010) global chemical reanalysis produced by the Goddard Earth Observing System 5 (GEOS-5) global model with the MOZART chemistry.

Figure 1. (a) The simulation domain in Northeast Asia by WRF-Chem (only shaded region) is shown with surface elevation (unit: meter). The model uses the Lambert Conformal map projection. (b) shows the validation domain over the Korean Peninsula. The black circles in the figures indicate the ground PM observation sites, and the magenta triangles the AERONET sites for ground AOD observations

2.3 Aerosol data assimilation

This study uses the Gridpoint Statistical Interpolation (GSI) data assimilation system version 3.5 developed by NOAA, the National Aeronautics and Space Administration (NASA), and NCAR; the source code is publicly available. The Community Radiative Transfer Model (CRTM; Weng et al. 2005) in GSI supports the 14 aerosol species resolved in GOCART, which is the aerosol scheme of the forecast model in this study. The 3-dimensional variational (3D-VAR; Barker et al. 2003) method is used as the data assimilation algorithm, which accounts for the errors of both the model and the observations through the cost function J(x),

$$J(x) = \frac{1}{2}\left[(x - x_b)^{T} B^{-1} (x - x_b) + \big(y - H(x)\big)^{T} R^{-1} \big(y - H(x)\big)\right]. \tag{1}$$

To find the optimal analysis, the cost function is minimized, where x is the analysis state, x_b the model background, y the observation, and H the observation operator. B and R represent the background and observation error covariance matrices, respectively. The minimum of the cost function is found at the value of x for which the derivative of J(x) is zero. Because the analytical solution requires inverting the large background error covariance matrix, an iterative method is used to find the analysis fields that minimize the cost function. In GSI, the cost function J(x) is transformed into two directional functions so that the solution converges quickly. The innovation (i.e. y − H(x)) is distributed to the model prognostic variables according to the predetermined background error covariance matrix.
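A toy example may help to show how the minimization spreads the innovation through the background error covariance. The sketch below assumes a linear observation operator, a factor of 1/2 on both terms of the cost function, and a three-element state with two observations; all numbers are invented, and the closed-form solution is used in place of the iterative minimization performed in GSI.

```python
import numpy as np


def cost(x, xb, y, H, B, R):
    """3D-VAR cost function for a linear observation operator H."""
    dxb = x - xb
    dy = y - H @ x
    return 0.5 * (dxb @ np.linalg.solve(B, dxb) + dy @ np.linalg.solve(R, dy))


def analysis(xb, y, H, B, R):
    """Closed-form minimizer: xa = xb + B H^T (H B H^T + R)^-1 (y - H xb)."""
    innovation = y - H @ xb
    weights = np.linalg.solve(H @ B @ H.T + R, innovation)
    return xb + B @ H.T @ weights


# Invented toy problem: 3 state variables, observations of the 1st and 3rd.
xb = np.array([10.0, 20.0, 30.0])
B = np.array([[4.0, 2.0, 0.0],
              [2.0, 4.0, 2.0],
              [0.0, 2.0, 4.0]])
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
R = np.diag([1.0, 1.0])
y = np.array([14.0, 27.0])

xa = analysis(xb, y, H, B, R)
# Gradient of J at the analysis; it should vanish at the minimum.
grad = np.linalg.solve(B, xa - xb) - H.T @ np.linalg.solve(R, y - H @ xa)
print(xa, grad)
```

The second state variable is not observed, yet it is adjusted in the analysis because the off-diagonal elements of B link it to the observed variables.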

The model background and observation error covariances are estimated following Liu et al. (2011). The model background error covariance is calculated by the National Meteorological Center (NMC; Parrish and Derber 1992) method, which uses the forecast error covariance averaged over the differences between two short-range model forecasts that have different initial times but verify at the same time. In this study, the model error is defined as the difference between the 24- and 12-h WRF-Chem forecasts, and the error covariance for the aerosol variables is obtained from the values accumulated over one month (i.e. 1–31 May 2016). The observation error covariance is specified as follows. The GOCI AOD error is set differently over the ocean and the land as a function of the aerosol optical depth τ,
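The idea behind the NMC method can be sketched with synthetic data. The example assumes that pairs of 24-h and 12-h forecasts valid at the same times are stacked as rows of two arrays and that the mean difference is removed before the covariance is formed; the sample size, state size, and noise levels are invented, and the function name is hypothetical rather than part of GSI.

```python
import numpy as np


def nmc_background_covariance(fcst_24h, fcst_12h):
    """Estimate a background error covariance from forecast differences.

    fcst_24h, fcst_12h : arrays of shape (n_samples, n_state) holding
    forecasts with different lead times that verify at the same times.
    """
    diff = np.asarray(fcst_24h) - np.asarray(fcst_12h)
    anomalies = diff - diff.mean(axis=0)      # remove the mean difference
    n_samples = anomalies.shape[0]
    return anomalies.T @ anomalies / (n_samples - 1)


# Synthetic forecasts (invented numbers, for illustration only).
rng = np.random.default_rng(0)
n_samples, n_state = 62, 4
truth = rng.normal(50.0, 10.0, size=(n_samples, n_state))
fcst_12h = truth + rng.normal(0.0, 2.0, size=(n_samples, n_state))
fcst_24h = truth + rng.normal(0.0, 4.0, size=(n_samples, n_state))

B = nmc_background_covariance(fcst_24h, fcst_12h)
print(B.shape)
```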

$$E_{OCN} = 0.185\,\tau + 0.037, \tag{2}$$
$$E_{LND} = 0.137\,\tau + 0.08, \tag{3}$$

where E_OCN and E_LND are the observation errors over the ocean and the land, respectively (Choi et al. 2018). The surface PM error (E) combines the measurement error (E_M) and the representativeness error (E_R) as follows:

$$E_M = E_{Max} + \frac{E_{Min}}{100}, \tag{4}$$
$$E_R = E_T\,E_M\,\sqrt{\frac{\Delta x}{S_{LU}}}, \tag{5}$$
$$E = \sqrt{E_M^{2} + E_R^{2}}, \tag{6}$$

where E_Max = 1.5 μg m−3 and E_Min = 0.75 μg m−3 are the maximum and the minimum errors for PM concentration, respectively. E_T = 0.5 is a tunable parameter, Δx = 15 km is the horizontal resolution, and S_LU = 3 km is the length scale of the influence of observations. These parameters are the standard values from the GSI configuration, which are slightly modified from Pagowski and Grell (2012).
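The observation error settings can be evaluated with a few lines of code. This sketch reads Equations (4) to (6) as E_M = E_Max + E_Min/100, E_R = E_T · E_M · sqrt(Δx/S_LU), and E = sqrt(E_M² + E_R²); the square roots are an assumption about the intended formulas, and the function names are hypothetical rather than GSI routines.

```python
import math


def goci_aod_error(tau, over_ocean):
    """GOCI AOD observation error as a linear function of AOD (Eqs. 2-3)."""
    if over_ocean:
        return 0.185 * tau + 0.037
    return 0.137 * tau + 0.08


def surface_pm_error(e_max=1.5, e_min=0.75, e_t=0.5, dx_km=15.0, s_lu_km=3.0):
    """Surface PM observation error in ug m-3 (assumed reading of Eqs. 4-6)."""
    e_m = e_max + e_min / 100.0                      # measurement error
    e_r = e_t * e_m * math.sqrt(dx_km / s_lu_km)     # representativeness error
    return math.sqrt(e_m**2 + e_r**2)                # combined in quadrature


print(goci_aod_error(1.0, over_ocean=True))
print(goci_aod_error(1.0, over_ocean=False))
print(surface_pm_error())
```

Under this reading and the parameter values quoted above, the representativeness error is slightly larger than the measurement error, because the 15 km grid spacing is five times the assumed 3 km length scale.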

2.4 Data assimilation and forecast cycle

The aerosol data assimilation is conducted at 6-h intervals, at 00, 06, 12, and 18 UTC, when all 16 aerosol species in GOCART are updated by combining the 6-h WRF-Chem forecasts (i.e. the model background states) started from the previous analysis with the new observations of GOCI AOD and surface PM concentrations. The updated aerosol analysis fields are the initial conditions for the next 6-h WRF-Chem forecast and data assimilation. The concentrations of gaseous chemical species are updated continuously by the model with no observational constraint. Meteorological variables are simulated by the model for 24 hours until they are reinitialized by the UM analysis at 00 UTC each day. The GOCI data are assimilated only at 00 UTC and 06 UTC, as they are unavailable at 12 UTC and 18 UTC. The time window for GOCI is set to ±3 hours, within which the AOD values are averaged in time with equal weights. Liu et al. (2011) applied the same time window in assimilating the MODIS AOD. Note that the hourly AODs at 00, 01, 02, and 03 UTC are used for the 00 UTC data assimilation, and the data at 03, 04, 05, 06, and 07 UTC are used for the 06 UTC data assimilation, as the GOCI data are available only during daytime from 00 UTC to 07 UTC. The in-situ PM data, on the other hand, are assimilated at exactly 00, 06, 12, and 18 UTC.
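The selection of hourly GOCI scenes for each assimilation cycle can be written out explicitly. The following sketch assumes hourly AOD fields stored in a dictionary keyed by UTC hour, with missing pixels marked as NaN; the function names and data layout are hypothetical.

```python
import numpy as np

GOCI_HOURS_UTC = range(0, 8)  # hourly AOD is available from 00 to 07 UTC


def goci_hours_for_cycle(analysis_hour, half_window=3):
    """Return the GOCI hours that fall within +/- half_window of a cycle."""
    candidates = [(analysis_hour + k) % 24
                  for k in range(-half_window, half_window + 1)]
    return sorted(h for h in candidates if h in GOCI_HOURS_UTC)


def window_mean_aod(hourly_aod, analysis_hour):
    """Equal-weight time mean of the AOD in the window, ignoring NaN pixels."""
    hours = goci_hours_for_cycle(analysis_hour)
    if not hours:
        return None  # no GOCI data for this cycle (12 and 18 UTC)
    stack = np.stack([hourly_aod[h] for h in hours])
    valid = ~np.isnan(stack)
    count = valid.sum(axis=0)
    total = np.where(valid, stack, 0.0).sum(axis=0)
    return np.where(count > 0, total / np.maximum(count, 1), np.nan)


for cycle in (0, 6, 12, 18):
    print(cycle, goci_hours_for_cycle(cycle))
```

The 03 UTC scene falls in both the 00 UTC and the 06 UTC windows, which matches the hour lists given in the text.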

This study conducts three DA runs: DA1 assimilates the GOCI AOD only, DA2 assimilates the surface PM observations (PM10 and PM2.5) only, and DA3 assimilates both the GOCI AOD and the surface PM observations. The NoDA run is a free WRF-Chem forecast run in which the chemistry and aerosol fields are restarted from the end of the previous 24-h run. In NoDA, the meteorological conditions are updated at 00 UTC every day, as in the DA runs. Therefore, the difference between NoDA and the DA runs represents the impact of the aerosol data assimilation.

Aside from the data assimilation cycle, the forecast runs start every day at 00 UTC and integrate for 24 h during the entire KORUS-AQ period with different initial conditions for the aerosol variables. DA1 denotes the forecast runs initialized from the GOCI AOD assimilation, DA2 from the surface PM assimilation, and DA3 from the assimilation of both GOCI AOD and the surface PM. Initial and boundary conditions for meteorological fields are from the 6-hourly global UM forecast fields initialized at 00 UTC. Initial conditions for gaseous chemistry fields are restarted from the previous 24-h forecasts.

3. Results

3.1 Time-series analysis during KORUS-AQ

During the KORUS-AQ period, there was a high-concentration episode due to the Asian Dust on 7 May, which is clearly shown in the observed PM10 concentration averaged over South Korea (Figure 2a). The day of 7 May was officially recorded as an Asian Dust day. It was a typical springtime day with no rainfall in South Korea, with the daily maximum temperature ranging from 19°C to 23°C. The highest concentration in the figure is at 00 UTC on 7 May, with a peak concentration of 200 μg m−3. There is another period of relatively high PM10 concentration during 25 May–1 June. The ground PM10 concentration remains at a high level, above 80 μg m−3 on average, and this is attributed not to the Asian Dust but to the long-range transport of air pollutants from China (Choi et al. 2021). There is also a buildup of PM10 concentration during 11–23 May due to the accumulation of local air pollutants under stagnant weather, until they were washed out by rain on 24 May.

Figure 2. Time series of (a) PM10, (b) PM2.5, and (c) AOD averaged for South Korea from the observations (black) and the simulations by NoDA (gray), DA1 (blue), DA2 (green), and DA3 (red). The observations show the average of all 346 NAMIS stations in South Korea in (a) and (b), and of the 16 AERONET stations located in South Korea in (c). For the validation, the nearest grid values to the ground stations are averaged to represent the model values. The unit is μg m−3 for PM concentrations and dimensionless for AOD

NoDA generally underestimates the observed values, particularly in those episodic events – the Asian Dust event, the local accumulation of air pollutants, and the long-range transport from China. Simulated temporal variation is also weaker than the observed. When the GOCI AOD is assimilated (DA1), the PM10 analysis shows a much-reduced discrepancy from the observations, particularly for the three episodes. However, it is still likely to underestimate the values occasionally. When the surface PM data are assimilated (DA2), it delivers a better result. This is expected because the ground PM observations are directly provided for the data assimilation. When both satellite and ground observations are assimilated (DA3), the result is almost identical to DA2, showing the best fit for the observed time series. Especially in the Asian dust period on 7 May and the long-range transport regime of 25 May–1 June, it reproduces most of the observed PM10 concentration.
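The comparison between model and station values relies on picking the nearest grid point to each station, as noted in the caption of Figure 2. A minimal sketch of that matching and of a mean-bias calculation is given below; it assumes two-dimensional latitude and longitude arrays for the model grid, uses a simple cosine-scaled distance in degrees, and all names and numbers are invented for illustration.

```python
import numpy as np


def nearest_grid_values(field, grid_lat, grid_lon, station_lat, station_lon):
    """Return the model value at the grid point nearest to each station."""
    values = []
    for lat, lon in zip(station_lat, station_lon):
        dlat = grid_lat - lat
        dlon = (grid_lon - lon) * np.cos(np.deg2rad(lat))  # shrink with latitude
        j, i = np.unravel_index(np.argmin(dlat**2 + dlon**2), field.shape)
        values.append(field[j, i])
    return np.array(values)


def mean_bias(model_values, observed_values):
    """Difference between the station-mean model and observed values."""
    return float(np.mean(model_values) - np.mean(observed_values))


# Invented 3 x 3 grid and two stations.
grid_lat, grid_lon = np.meshgrid([35.0, 36.0, 37.0], [126.0, 127.0, 128.0],
                                 indexing="ij")
pm10_model = np.arange(9.0).reshape(3, 3) * 10.0
station_lat, station_lon = [35.1, 36.9], [126.1, 127.9]
pm10_obs = np.array([10.0, 100.0])

matched = nearest_grid_values(pm10_model, grid_lat, grid_lon,
                              station_lat, station_lon)
print(matched, mean_bias(matched, pm10_obs))
```

A negative mean bias in this toy case corresponds to the kind of underestimation described for NoDA.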

The observed time series of PM2.5 (Figure 2b) shows a temporal variation similar to that of PM10, but with some differences. Given that the PM10 concentration includes PM2.5, the surface PM2.5 contributes relatively more to PM10 in the long-range transport regime during 25 May–1 June. This suggests that the natural dust, mostly sand, consists of large particles, whereas the anthropogenic emission contributes more to the PM2.5 component. The DA2 and DA3 experiments show almost identical results, and both reproduce the observed PM2.5 concentration remarkably well. This is again due to the direct use of surface PM2.5 observations in the data assimilation, which corrects the systematic underestimation bias in NoDA, as in the case of PM10. However, DA1 shows a larger discrepancy from the observed values than DA2 or DA3. Unlike in the case of PM10, the satellite-only data assimilation tends to overestimate PM2.5 in the three major episodic events.

The observed AOD at the AERONET stations (Figure 2c) shows a somewhat different time variation from PM10 or PM2.5. It does not show its highest values during the Asian dust event of 5–9 May. This implies that the two variables do not always vary together, and it suggests a weak coupling between the column-representing AOD and the surface-representing PM observations. Instead, the time series shows sporadic increases in the late period of KORUS-AQ, for example, around 25 May–1 June and 7–11 June. The large temporal variation seems partly related to the statistical noise in an average over the small data sample from the AERONET sites. The AOD values show a large spread across the stations, especially after 25 May (not shown), indicating a large variability of AOD in space and time.

In the experiments, NoDA underestimates AOD, as in the case of PM10 and PM2.5. The AOD simulated by DA2 shows a pattern very similar to that of NoDA. The simulations by DA1 and DA3 are more consistent with each other, and both agree better with the AERONET observations. However, they have difficulty in reproducing the sporadic high-AOD events, presumably due to the difference in spatial representativeness between the ground and satellite AOD observations. When the two products are compared over the 21 AERONET sites, AERONET shows higher values than GOCI, although they maintain a high correlation (r = 0.76; see Supplementary Fig. S1).

3.2 Verification of major episodes

The Asian Dust event on 7 May 2016 during the KORUS-AQ period is selected as an example case to compare the data assimilation results (Figure 3). Although the GOCI observations have many missing values over the domain due to cloud contamination, they hint at a highly concentrated AOD plume across the southern part of the Korean Peninsula. They also show high AOD values in the East Sea. The NoDA run simulates the dust plume extending from China to South Korea and Japan (Figure 3b), suggesting that the WRF-Chem model driven by the observed meteorological conditions can simulate the dust event to some extent. However, the simulated AOD values are much weaker than the GOCI AOD. In South Korea, NoDA tends to simulate relatively high AOD values in the south, but not as high as in the GOCI observations. Assimilating only the GOCI AOD (DA1) directly benefits the simulation results. It tends to increase the AOD values in most areas of the domain. Moreover, although the satellite AOD has a limitation in depicting the dust event clearly due to the many missing values (Figure 3a), DA1 successfully identifies the large-scale dust event with considerably large AOD values (Figure 3c). Note that DA2 (Figure 3d) increases the AOD values mainly over the land area of China. A weaker AOD increase is also found in South Korea. With the assimilation of the ground PM observations only, the dust plume becomes weaker over the oceans and in Japan, where no ground observations are available. This suggests a significant beneficial impact of using satellite AOD in the aerosol data assimilation, as it fills the gaps in the ground observations. When both the GOCI AOD and the ground PM observations are assimilated together (DA3; Figure 3e), the result is consistent with DA1, with no significant difference.

Figure 3. The AOD distribution at 00 UTC on 7 May 2016 over Northeast Asia from (a) the GOCI observations, (b) NoDA (with no assimilated observations), (c) DA1 (the assimilation of GOCI only), (d) DA2 (ground PM only), and (e) DA3 (GOCI and ground PM). The area within the red frame in (a) indicates the spatial coverage by the GOCI instrument. The unit of AOD is dimensionless

Figure 4 shows another example of a high-concentration event in South Korea, due to long-range transport from China. In the GOCI observations (Figure 4a), the AOD is high from west to east along the 35° N latitude belt, although clouds contaminate most satellite pixels. NoDA (Figure 4b) simulates a high-AOD plume stretching from central China to the middle part of the Korean Peninsula and Hokkaido in Japan. However, the transport path tends to shift to the north, so that the AOD values in South Korea become weak. When the satellite data are assimilated in DA1 (Figure 4c) and DA3 (Figure 4e), the magnitude of the AOD plume increases and the plume expands meridionally, covering most of South Korea with high AOD values. The data assimilation with surface PM observations only (Figure 4d) does not improve much on NoDA in magnitude or spatial pattern. The satellite data assimilation helps improve the values at the ocean points and on the downstream side in northern Japan. Note that the surface PM assimilation further reduces the AOD values in central China from DA1 to DA3, where the satellite AOD has much uncertainty due to many missing values.

Figure 4. Same as in Figure 3 except at 00 UTC on 26 May 2016

3.3 Time-mean analysis patterns

Figure 5 compares the time-averaged surface PM10 from the surface observation sites and the various data assimilation experiments. The observation (Figure 5a) shows high values in northern China, particularly over the Beijing-Tianjin-Hebei (BTH) area, known for its highly polluting coal-fired energy and manufacturing industries, transportation, and agricultural sectors. Another polluted region is located in the Yangtze River Delta centered on Shanghai. The surface PM concentration is relatively lower in South Korea.

Figure 5. Time-averaged surface PM10 concentration (unit: μg m−3) over Northeast Asia during KORUS-AQ from (a) surface observations, (b) NoDA, (c) DA1, (d) DA2, and (e) DA3 experiments. From (f) to (h), the figures show the differences for DA1, DA2, and DA3 from NoDA, respectively. The analysis fields at 00 and 06 UTCs are averaged

NoDA () shows an overall underestimation of surface PM10 for the BTH region in China and South Korea. The exception is the continental interior in central China, where the model tends to overproduce the PM10 concentration. The overall underestimation by NoDA is consistent with the findings from , indicating a systematic model bias that can be generalized over the entire analysis period. The underestimation can be caused by many factors, including surface emission, secondary formation, transport, and deposition processes. Pinning down the root causes requires a more in-depth analysis and additional model experiments, which is well beyond the scope of this study. This study only speculates that the model deficiency is related to the considerable uncertainty in the surface emissions specified in the model and in the parameterized aerosol chemistry and transport. The current EDGAR-HTAP emission inventory is produced by a bottom-up approach that integrates reported emissions, so many unknown sources or unreported emissions possibly exist. Another deficiency is in the heterogeneous reactions associated with nitrogen and sulfur species. Many previous studies (e.g. Fairlie et al. 2010; Wang et al. 2012) indicate that the formation of nitrate and sulfate aerosols through heterogeneous thermodynamic reactions contributes significantly to the total aerosol amount in East Asia. The current MOZART and GOCART schemes do not explicitly represent these secondary aerosol formation processes. The model deficiency also lies in the parameterized natural dust scheme (see Section 2.2). The NoDA run underestimates the surface PM10 concentration in northwest China, where natural dust originating from the Gobi Desert is transported and increases the aerosol amount. The WRF-Chem model used in this study enables the dust module in the GOCART scheme, which lifts natural dust dynamically by surface winds. The overall weak surface PM10 concentration level in the northwest suggests a weak transport of natural dust from the source region by the model.

When only the GOCI AOD is assimilated (), surface PM10 increases in most areas. In the difference from NoDA (), the AOD assimilation helps increase the surface PM10 over the land area in northeastern China, North Korea, north of Japan, and the adjacent oceans. The results show more consistency with the surface observations over the BTH in China, where the positive contribution to the PM10 concentration by the AOD assimilation is rather broad in space. In this regard, DA1 shows the difficulty in resolving local-scale sources in the major cities in the BTH region in China. In addition, DA1 slightly underestimates the ground concentration in South Korea.

When only the ground PM observations are assimilated (), DA2 tends to resolve the local-scale sources in major cities over BTH by substantially increasing the PM10 level compared with NoDA. In addition, the impacts of the surface data assimilation are not limited to the observation points but extend to the adjacent regions. The data assimilation impact is largest where the surface PM observations are available. In contrast, its effect is almost negligible in North Korea and Japan, where no surface observation is provided (). Note that DA2 is almost identical to NoDA to the east of the Korean Peninsula, suggesting that the surface observations have limited ability to improve the analysis in more distant regions. The surface PM10 analysis benefits most from assimilating both satellite and ground observations (). The difference from NoDA () resembles a simple sum of the two difference patterns of DA1 and DA2 from NoDA ().

Unlike the PM10 case, the NoDA simulation for PM2.5 () agrees much better with the observations over BTH in China, with magnitudes comparable to the observed values (). However, it still shows regional discrepancies, such as the overestimation in the inland area of central China and the underestimation in the Yangtze River Delta and South Korea. The GOCI AOD-only assimilation () tends to increase the concentration from NoDA in most regions, particularly in northeastern China and northern Japan. The difference from NoDA () is quite similar to the PM10 case (). In contrast, the surface PM data assimilation () helps suppress the overestimation bias of NoDA in the continental interior. This is not driven by the satellite data assimilation (), as the region is outside the GOCI retrieval domain. As in the case of PM10, the surface data assimilation corrects NoDA near the observation sites (), while the satellite data cover a much wider region (). The difference from NoDA by DA3 () is close to the linear combination of those by DA1 and DA2 ( and 5g), consistent with the case of PM10.

Figure 6. Same as Figure 5 except PM2.5.

compares the time-averaged spatial distribution of AOD from GOCI observations (), MERRA-2 (), the WRF-Chem simulations (), and the DA analyses (). In general, the time-averaged GOCI pattern resolves highly polluted regions in the BTH and northeastern China, although the data are insufficient to locate other polluted areas south of BTH. MERRA-2 shows high AOD values in the comparison domain, with maxima in BTH and central China. It also shows high values in Manchuria. The assimilation of GOCI AOD () leads to an overall increase of the values from NoDA (). The regions of maximum AOD are also consistent with those in MERRA-2. The increment is considerable in northern China and Manchuria (). The overall increase in the AOD analysis fields demonstrates the benefit of satellite data assimilation, particularly in regions with sparse observations such as Manchuria and in regions where surface observations are unavailable.

Figure 7. Time-averaged AOD over Northeast Asia during KORUS-AQ from (a) GOCI, (b) MERRA-2 reanalysis, (c) NoDA, (d) DA1, (e) DA2, and (f) DA3 experiments. From (g) to (i), the figures show the differences for DA1, DA2, and DA3 from NoDA, respectively. Areas with the available data less than 30% for the time average are eliminated in (a)

Note that the GEOS-5 producing MERRA-2 and the WRF-Chem in this study share the same GOCART aerosol scheme. Comparing MERRA-2 with DA1, we found that MERRA-2 shows much higher sulfate aerosols (not shown), suggesting significant differences in the anthropogenic emission between the analyses. However, one cannot exclude other factors, such as the differences in the assimilated observations (MODIS versus GOCI) and the chemical transport model (GEOS versus WRF-Chem).

The assimilation of surface PM observations () tends to decrease the AOD values, particularly in China (), which is opposite to the case of the GOCI AOD assimilation (). It again suggests that the surface aerosol concentration may not represent well the aerosol concentration in the whole atmospheric column represented by AOD.

When both the satellite AOD and surface PM observations are provided to the data assimilation in DA3 ( and 7i), the results are not much different from the case of DA1. This is somewhat expected because the direct assimilation of AOD performs best for the AOD analysis. Compared with the AOD values in DA1 (), DA3 () shows a slight decrease of the AOD values caused by the simultaneous assimilation of satellite data and surface PM observations. This might suggest that the data assimilation performance is not always improved, and is often degraded, by including additional observations. Our case implies an uncertain and non-linear relationship between AOD and surface PM values.

3.4 Impacts on forecast skills

Air quality forecasts can be improved by initializing the states of the forecast model with more realistic values consistent with the observations through the aerosol data assimilation technique. As the operational air quality forecasts for most countries are conducted for the ground PM concentration (Kukkonen et al. 2012; Marécal et al. 2015), this study compares the forecast skill for the PM concentration depending on each data assimilation experiment.

Before comparing the forecast skill by different initialization methods, this study first compares the quality of each analysis (i.e. the initial states of the forecasts) for PM10, PM2.5, and AOD, respectively, in the validation area in South Korea. shows the frequency scatter plots between observations and the model analysis from various experiments. This study compares the data assimilation performance quantitatively with two measures: the correlation between the observed and the analysis values and the relative root-mean-squared error (rRMSE) defined as the RMSE between the two divided by the time mean value of the observations.
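The two measures defined above can be written down compactly. The following is a minimal sketch, assuming paired one-dimensional arrays of observed and analysis values; the function names and the sample numbers are hypothetical and are not taken from the study.

```python
import numpy as np

def correlation(obs, ana):
    """Pearson correlation between observed and analysis values."""
    obs = np.asarray(obs, dtype=float)
    ana = np.asarray(ana, dtype=float)
    valid = np.isfinite(obs) & np.isfinite(ana)  # skip missing pairs
    return float(np.corrcoef(obs[valid], ana[valid])[0, 1])

def relative_rmse(obs, ana):
    """RMSE between analysis and observations divided by the mean observation."""
    obs = np.asarray(obs, dtype=float)
    ana = np.asarray(ana, dtype=float)
    valid = np.isfinite(obs) & np.isfinite(ana)
    rmse = np.sqrt(np.mean((ana[valid] - obs[valid]) ** 2))
    return float(rmse / np.mean(obs[valid]))

# Hypothetical PM10 values (ug m-3): the analysis is uniformly 10 units too low.
obs = np.array([40.0, 55.0, 70.0, 35.0, 60.0])
ana = obs - 10.0

r = correlation(obs, ana)
rrmse = relative_rmse(obs, ana)
```

In this illustrative case the correlation is perfect while the rRMSE is non-zero, which shows why the two measures are reported together: the correlation is insensitive to a constant bias, whereas the rRMSE is not.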

Figure 8. Frequency scatter plots between observations (x-axis) and the model analysis (y-axis) for (a-d) PM10, (e-h) PM2.5, and (i-l) AOD. The model analysis is from NoDA in (a), (e), and (i); from DA1 in (b), (f), and (j); from DA2 in (c), (g), and (k); and from DA3 in (d), (h), and (l). Frequency indicates the number of samples in each corresponding bin. The 00 and 06 UTC values are used for surface PM concentrations, and the 01 and 07 UTC values for AOD. The correlation (R) and the rRMSE (see the definition in the text) values are indicated in each panel

The comparison provides several insights into the aerosol data assimilation in this study. First, the model backgrounds (NoDA in , e and i) show difficulty in reproducing the observed values. All concentrations simulated without the data assimilation suffer from significant underestimations, particularly in PM10. Because the ground aerosol concentration is more heterogeneous and more aerosol species contribute to it, PM10 seems to show larger errors than the other variables. Second, the data assimilations provide more accurate analyses of the observed values than the model backgrounds. The overall low biases are reduced, and the scatter plots align better with the diagonal line. The data assimilation of surface PM observations fits the observed PM values best (e.g. , d, g and h), and the satellite AOD assimilation best represents the observed AOD values (). Because the AOD validation is based on independent AERONET data, the best AOD correlation (r = 0.76) does not reach the high values obtained for PM10 (r = 0.90) and PM2.5 (r = 0.91). The AOD validation may also suffer from the difference in spatial representativeness between satellite and ground-based AOD observations, as discussed in . This improvement by the data assimilation may not be surprising because the data assimilation works properly in correcting biased model background states. However, note that the satellite AOD data assimilation results in improved surface PM analyses ( and f), particularly in PM2.5. A similar improvement can be found in the AOD analysis by assimilating surface PM observations (). This suggests the beneficial impacts of the 3D-VAR data assimilation method: the observation data influence all of the model prognostic variables during the minimization of the cost function in Equation (1). Finally, the data assimilation of both satellite AOD and ground PM observations does not necessarily improve individual data assimilation performance. Comparing the cases for PM10 (cf and d), PM2.5 (cf and h), and AOD (cf and l), the changes in the correlation and rRMSE seem to be negligible, or the rRMSE is even slightly degraded. This suggests that the data assimilation with multivariate observations in this 3D-VAR algorithm may degrade the consistency with the original observations. However, it does not necessarily lead to forecast skill degradation, as discussed next.
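How a single observation type can adjust several state variables at once can be illustrated with a toy linear problem. The sketch below is not the system used in this study: it assumes a hypothetical two-element state (aerosol mass at the surface and aloft), hypothetical linear observation operators for AOD and surface PM, and arbitrary error variances. For a linear operator, the minimizer of the quadratic 3D-VAR cost function has the closed form used in `analysis`.

```python
import numpy as np

def analysis(xb, B, H, y, R):
    """Closed-form minimizer of the quadratic 3D-VAR cost function
    for a linear observation operator H:
    xa = xb + B H^T (H B H^T + R)^-1 (y - H xb)."""
    innovation = y - H @ xb
    S = H @ B @ H.T + R
    gain = B @ H.T @ np.linalg.inv(S)
    return xb + gain @ innovation

# Hypothetical background state: aerosol mass at [surface, aloft].
xb = np.array([20.0, 10.0])
B = np.diag([25.0, 25.0])          # assumed background error covariance

# Assumed operators: AOD responds to the whole column, PM only to the surface.
H_aod = np.array([[0.01, 0.01]])
H_pm = np.array([[1.0, 0.0]])
y_aod, R_aod = np.array([0.5]), np.array([[0.01]])
y_pm, R_pm = np.array([35.0]), np.array([[4.0]])

xa_aod = analysis(xb, B, H_aod, y_aod, R_aod)   # AOD only (DA1-like)
xa_pm = analysis(xb, B, H_pm, y_pm, R_pm)       # surface PM only (DA2-like)

# Both observation types together (DA3-like): stack operators and errors.
H_both = np.vstack([H_aod, H_pm])
y_both = np.concatenate([y_aod, y_pm])
R_both = np.diag([R_aod[0, 0], R_pm[0, 0]])
xa_both = analysis(xb, B, H_both, y_both, R_both)
```

In this toy setting the AOD observation spreads its increment over both levels because it cannot tell them apart, while the PM observation changes only the surface value. This is qualitatively the same tension between column-integrated and surface information discussed in the text, although the real observation operators and error covariances are far more complex.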

shows the forecast skill changes as the forecast lead time increases. In the figure, the forecast skill is measured by the temporal correlation between the observed and the forecast values as a function of forecast lead time. Another measure is the RMSE between the two. This study also tests the statistical significance using the bootstrap method.
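A bootstrap confidence interval for the correlation at a single lead time can be sketched as follows. This is a minimal illustration, assuming a percentile bootstrap over paired observation and forecast samples; the text does not specify the bootstrap variant, and the synthetic data, function name, and default number of resamples here are hypothetical.

```python
import numpy as np

def bootstrap_corr_ci(obs, fcst, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the Pearson correlation.
    Pairs (obs[i], fcst[i]) are resampled together with replacement."""
    rng = np.random.default_rng(seed)
    obs = np.asarray(obs, dtype=float)
    fcst = np.asarray(fcst, dtype=float)
    n = obs.size
    idx = rng.integers(0, n, size=(n_boot, n))   # resampled pair indices
    o = obs[idx] - obs[idx].mean(axis=1, keepdims=True)
    f = fcst[idx] - fcst[idx].mean(axis=1, keepdims=True)
    r = (o * f).sum(axis=1) / np.sqrt((o ** 2).sum(axis=1) * (f ** 2).sum(axis=1))
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(r, [alpha, 1.0 - alpha])
    return float(lower), float(upper)

# Synthetic example: 40 daily forecasts valid at one lead time.
data_rng = np.random.default_rng(1)
obs = data_rng.normal(50.0, 15.0, size=40)
fcst = 0.6 * obs + data_rng.normal(0.0, 8.0, size=40)

lo, hi = bootstrap_corr_ci(obs, fcst)
```

Repeating this for every lead time and every experiment would give shaded bands like those described for the skill curves. Two experiments can then be regarded as distinguishable at a given lead time when their intervals do not overlap, which is a conservative reading of the test.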

Figure 9. The changes in the correlation of the surface (a) PM10 and (b) PM2.5 concentration forecasts and the RMSE for (c) PM10 and (d) PM2.5 from the initial state until +24 hr. The forecast skill is the average over all NAMIS stations in South Korea from the forecasts started at 00 UTC every day for 24 hours during KORUS-AQ. Each color line indicates the cases of NoDA (gray), DA1 (blue), DA2 (green), and DA3 (red). Shading indicates the 95% confidence level determined by the statistical significance test using the bootstrap method with 100,000 random samples for each experiment

Generally, the correlation decreases and the RMSE increases with time due to the systematic biases in the prediction. The forecast skill differs considerably depending on the data assimilation method used for initialization. The forecast runs started from the NoDA initial conditions show the poorest performance among the experiments. In the PM10 forecasts, the correlation starts below 0.3, although the skill rebounds about 12 hours after initialization. The correlation remains below 0.4, the minimum level for useful skill (Reichler and Roads 2005), during most of the forecast lead time. On the other hand, the RMSE tends to decrease until 12 h and then starts to increase. This rebound of skill and the error minimum at about 12 h suggest a skill recovery due to the model’s capability to reproduce the observed diurnal variation of PM10. The PM2.5 forecasts by NoDA are better than those for PM10, maintaining a correlation of around 0.5 throughout the forecast time. This is consistent with the results in , where the data assimilation performs better for PM2.5 and provides better initial conditions.

The results in further indicate that the air quality forecast skill is improved by the aerosol data assimilation using satellite and surface observations. Compared with NoDA, any forecast started from the initial states produced by the data assimilation maintains a higher correlation throughout the forecast time. This indicates that the data assimilation impact, as represented by the correlation difference between the DA forecasts and NoDA, tends to last at least 12 hours. When the initial conditions are determined by the data assimilation with both satellite AOD and surface PM observations (DA3), the forecast skill is the best for both surface PM10 and PM2.5. It maintains a correlation higher than 0.4 at all lead times. The RMSE increases with forecast time as the initialization impact fades away. DA2 shows skill comparable to DA3, which suggests that the data assimilation of surface PM data is the most important in improving the quality of the surface PM forecasts. Note that the forecast skill of DA3 surpasses that of DA2 after a few hours, for example 12 hours after initialization for PM10 and 6 hours for PM2.5. This implies that the simultaneous data assimilation with both satellite and surface data can maintain more balanced states among the variables and improve the forecast skill over an extended time.

When only the satellite data are assimilated (DA1), the correlation skill is overall increasing throughout the forecast time, with a slight decrease of RMSE. Compared with DA2 and DA3, the relatively low forecast skills for the initial few hours are related to the less accurate calibration of surface PM concentrations by the assimilation of the satellite data only (see and f). This indicates that the satellite data also have limitations in representing the local variation of surface PM concentration.

4. Further discussion

This study demonstrates both the considerable benefits and the limitations of using GOCI AOD in the data assimilation and forecasts. The current data assimilation method has much room for further improvement. The first is to increase the AOD data availability for assimilation. compares the data availability of GOCI and surface PM observations at the time of data assimilation. The GOCI data cover a broader area ( and b) than the ground data ( and d), although the availability is much less than 70% in most regions. In contrast, the surface data show much higher availability, reaching up to 100% in most areas, but the data are confined to the selected land area. The GOCI data are less available in the upstream region in central China and south of 35° N, mainly due to clouds, although they provide data over the Yellow Sea, North Korea, and northwest China, where the surface observations are unavailable. Therefore, our result with the GOCI data assimilation is limited by the data availability, and it has more potential to improve. Using the data assimilation of GOCI in all-sky conditions (Park et al. 2020) is expected to improve the performance significantly in this sense.
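The availability measure, defined in the Figure 10 caption as the number of available data relative to the total required time, and the 30% threshold used to mask the time-averaged GOCI AOD in Figure 7 can be sketched as follows. The array layout (time, y, x), the use of NaN for missing retrievals, and the function names are assumptions made for illustration only.

```python
import numpy as np

def availability(field):
    """Fraction of time steps with a valid (finite) value at each grid point.
    field has shape (time, y, x), with NaN where no retrieval exists."""
    return np.isfinite(field).mean(axis=0)

def masked_time_mean(field, min_fraction=0.3):
    """Time mean over valid samples, set to NaN where the availability
    is below min_fraction."""
    valid = np.isfinite(field)
    count = valid.sum(axis=0)
    total = np.where(valid, field, 0.0).sum(axis=0)
    keep = (availability(field) >= min_fraction) & (count > 0)
    mean = np.full(count.shape, np.nan)
    mean[keep] = total[keep] / count[keep]
    return mean

# Hypothetical AOD series: 4 time steps on a 1 x 3 grid.
aod = np.array([
    [[0.4, np.nan, 0.6]],
    [[0.5, np.nan, np.nan]],
    [[0.6, 0.9, np.nan]],
    [[0.5, np.nan, 0.8]],
])

frac = availability(aod)       # 100%, 25%, and 50% available
mean = masked_time_mean(aod)   # the 25% pixel is masked out
```

The smoothing over 100 km by 100 km grids mentioned in the Figure 10 caption is omitted from this sketch.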

Figure 10. The GOCI AOD data availability for data assimilation at (a) 00 UTC and (b) 06 UTC within ± 3 hour time window. (c) and (d) indicate the availability of ground PM observations. The data availability is defined as the number of available data for the total required time. The values are smoothed by averaging 100 km by 100 km grids

Another aspect is the discrepancy between the satellite AOD and the surface PM data. shows the vertical increment profiles of the total aerosol mixing ratio. Over China (), the satellite data assimilation tends to increase the aerosol concentration more aloft, with a maximum at 3–6 km above the ground, while the increment by the surface PM assimilation is confined within the lowest 6 km above the ground. When both data are assimilated, the increment profiles show corrections not only at the ground level but also at the mid- to upper-tropospheric levels. The results are qualitatively consistent with the findings in Schwartz et al. (2012). The benefit of using both satellite and in-situ observations is that they adequately address both the local emission-driven air pollution regime and the large-scale transport regime driven by steering winds in the free atmosphere. On the other hand, the increment profiles over South Korea () show somewhat different features. The increment by the satellite AOD assimilation is relatively weak within the planetary boundary layer below 3 km altitude and even negative in the middle troposphere. This suggests a difficulty in detecting local pollution contributions with the satellite AOD, and this discrepancy is worth further in-depth analysis. Downscaling the data by deep learning (Li et al. 2020) is expected to improve the locality of the satellite AOD.

Figure 11. Time-averaged vertical distributions of increments (analysis minus background) of the total aerosol mixing ratio (unit: μg kg−1) averaged over (a) East China (112–123° E, 30–41° N) and (b) South Korea (126–129° E, 34.3–37.8° N). DA1 (blue) is for the assimilation of GOCI only, DA2 (green) for the surface PM only, and DA3 (red) for both GOCI and surface PM assimilation

5. Summary and conclusions

This study develops an aerosol data assimilation system based on the WRF-Chem model that simultaneously assimilates the satellite-derived AOD and surface PM10 and PM2.5 observations. The assimilation and forecast skill is evaluated for the KORUS-AQ period (1 May to 10 June 2016) by integrating the analysis and forecast system at 15 km horizontal resolution.

When the verification is conducted in South Korea based on time series, the model background (NoDA) tends to underestimate the observed PM concentrations and AOD, presumably due to the deficiencies in emission and parameterized aerosol chemistry and transport processes by the model. While the data assimilation of surface PM in DA2 directly corrects the errors in the model backgrounds and describes the observed surface PM concentration level realistically, the improvement by the satellite AOD assimilation in DA1 is not trivial in representing surface PM concentration. When both satellite and surface observations are assimilated in DA3, the results show the best fit to the observed time series of the surface PM concentrations for the entire period of KORUS-AQ, which highlights the beneficial impacts of utilizing multiple observations.

Due to the broad coverage of satellite, the GOCI AOD assimilation also provides improved spatial representation of high concentration episodes in South Korea. While the improvement is limited to the land area of China and the Yellow Sea by the surface PM assimilation, the GOCI data assimilation shows significant beneficial impacts in representing large-scale aerosol transport plumes, including the regions of unavailable surface observations.

The data assimilation of GOCI AOD and surface PM observations provides the best analysis in surface PM concentrations in Northeast Asia, but with different aspects of the improvement. The GOCI AOD assimilation helps increase the aerosol amount in most of the domains, consistent with the surface observations and the pattern obtained from the independent MERRA-2 analysis. However, its impacts are rather broad in space showing difficulty in reproducing high surface PM concentration levels in the BTH area of northern China. In addition, the assimilation with GOCI AOD only tends to underestimate the surface PM concentration, particularly PM10, in South Korea. This suggests an intrinsic difficulty in detecting the heterogeneous spatial distribution of surface PM by satellite data. In this regard, the combination of satellite and surface observations represents the best in resolving both broad and local pollution sources.

The data assimilation method produces substantial differences in the air quality forecast skill. Compared with the NoDA initialization, the air quality forecast skill is improved by any data assimilation method, with its impact lasting at least 12 hours. When the forecasts are initialized from the analysis that assimilates both satellite AOD and surface PM observations, the skill is the best for surface PM forecasts, maintaining a useful correlation skill higher than 0.4 for 24 hours.

The improvement of forecast skill by the GOCI AOD-only assimilation is smaller than that from the surface PM data assimilation, as it provides a less accurate calibration of surface PM concentration in the initial states. Nonetheless, it may still demonstrate the potential benefits of satellite data assimilation in improving the forecast skill in areas where surface PM observations are sparse. This study identifies two issues in using GOCI AOD or any other satellite data to enhance aerosol data assimilation and forecast performance in Northeast Asia. The first is the relatively low data availability compared with the surface observations, primarily due to cloud contamination. The use of multiple satellites, such as geostationary and polar-orbiting ones, may increase the data samples and the quality of assimilation. A recent approach that uses a machine learning algorithm to combine satellite information and a chemical transport model to fill the data gap would significantly improve the performance (Park et al. 2020). The other issue is how to apply the satellite AOD, which represents column-integrated information, to the problem of air quality forecasts at the surface level. As shown in this study, the two variables do not necessarily agree in magnitude and temporal variation. One idea is to transform the satellite AOD into surface PM variables based on machine learning algorithms (Wang et al. 2010; Shin et al. 2020), which may reduce the uncertainty in the AOD observation operator in the radiative transfer model and improve the quality of the aerosol data assimilation at the surface by satellite data. This idea is now under test using the same data assimilation and forecast framework used in this study.

Acknowledgements

The model simulations were performed by using the supercomputing resource of the Korea Meteorological Administration (National Center for Meteorological Supercomputer).

Data availability statement

The data that support the findings of this study are openly available in figshare at http://doi.org/10.6084/m9.figshare.14602458.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Funding

This study was supported by the Basic Science Research Program of the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (NRF-2021R1A2C1008210).
