Original Articles

Improved PM2.5 concentration estimates from low-cost sensors using calibration models categorized by relative humidity

Pages 600-613 | Received 12 Aug 2020, Accepted 04 Jan 2021, Published online: 02 Feb 2021

Abstract

Low-cost sensors (LCS) for fine particulate matter (PM2.5) have been widely used to supplement regular air quality monitoring stations. However, the sensor output is affected by environmental factors, especially relative humidity, and must be calibrated to yield reliable concentration estimates. In this study, we evaluate the performance of a linear model and a generalized additive model (GAM) for calibrating LCS PM2.5 measurements at different relative humidity levels. The method is based on co-located measurements from an LCS and a conventional reference monitor at two sites in an urban area in northern China. Stepwise selection among air pollutant concentrations and meteorological observations is used to choose the model inputs and improve calibration performance. The results show that when relative humidity is below 75%, linear calibration of LCS PM2.5 observations yields PM2.5 mass concentrations close to the reference method, with a coefficient of determination (R2) of 0.86. When relative humidity is above 75%, the GAM calibration model significantly outperforms the linear model, with an R2 of approximately 0.83. Overall, the linear model fits well in dry conditions, while the GAM captures PM2.5 variations best in humid conditions. We conclude that low-cost PM2.5 sensors are sensitive to relative humidity and that condition-specific calibration methods are therefore needed to improve data quality and agreement with reference measurements.

Copyright © 2021 American Association for Aerosol Research


1. Introduction

Fine atmospheric particulate matter (PM2.5, particulate matter with aerodynamic diameter ≤2.5 µm) contains complex chemical compounds, such as water-soluble ions, elemental carbon, organic carbon, and insoluble minerals, and has a large surface area that can absorb toxic substances (Y. Zhang et al. 2016). It poses a significant threat to human health (H. H. Zhang et al. 2018). Moreover, because of its small particle size, it can penetrate the human respiratory tract, the alveoli, and even the bloodstream through breathing, causing respiratory, lung, and cardiovascular diseases, stroke, and premature death (Apte et al. 2015; Brook et al. 2010; Pope et al. 2004; Shang et al. 2018; L. Yang et al. 2012). In the Global Burden of Disease (GBD) 2015 risk assessment, exposure to PM2.5 was estimated to have caused 4.2 million deaths and was ranked as the fifth mortality risk factor worldwide (Cohen et al. 2017).

In order to reduce human PM2.5 exposure and the associated health risks, government agencies have committed to strengthening the monitoring of airborne particulate matter (PM) pollution and to formulating a series of pollutant emission reduction plans (Tian et al. 2019; Y. Wang et al. 2016; Q. Zhang et al. 2019). Conventionally, PM2.5 concentrations are measured by fixed monitoring stations with advanced instrumentation, such as the U.S. EPA Federal Reference Methods (FRM) and Federal Equivalent Methods (FEM) (U.S. Environmental Protection Agency 2013). In China, the equivalent reference instruments usually include the Tapered Element Oscillating Microbalance (TEOM) and the Beta Attenuation Monitor (BAM) (Ministry of Ecology and Environment of China 2018). These research-grade monitors ensure the quality of the measurements, but they are sparsely deployed in urban areas because of their high cost and their need for professional maintenance (Steinle, Reis, and Sabel 2013). An increasing number of studies show that the limited temporal and spatial resolution of regular monitoring sites is insufficient to capture the variability of atmospheric pollutants or detailed exposure concentrations. This makes it difficult to quantify human exposure for health risk assessments in epidemiological research (Apte et al. 2017; Ye et al. 2018). Low-cost sensors (LCS) using optical PM2.5 measurements have therefore rapidly become popular as a supplement to conventional monitoring instruments. They are typically characterized by small size, light weight, low power consumption, and easy maintenance (Badura et al. 2018). They offer promise for dense air quality monitoring networks, real-time pollution hotspot identification, and refined human exposure assessments (Gao, Cao, and Seto 2015; Pitchford et al. 2007; N. Zikova, Masiol, et al. 2017; Zusman et al. 2020).

However, the biggest limitation of low-cost light-scattering PM2.5 sensors is that the accuracy and precision of the results are sensitive to particle concentrations, chemical composition, and meteorological conditions. Interference due to temperature and relative humidity (RH) is particularly common. The data need to be calibrated, and the quality of the measurements remains controversial (Liu et al. 2017; Woodall et al. 2017; Zheng et al. 2018; Nadezda Zikova, Hopke, et al. 2017). Even though LCS performance has been evaluated in the laboratory under particular conditions (Sayahi et al. 2019a), concerns remain about their performance in real-world environments (Cordero, Borge, and Narros 2018; Li and Biswas 2017).

Field calibration is used to obtain the best possible estimates of PM2.5 concentrations in ambient air, but getting accurate results can be particularly challenging under extreme meteorological conditions or during extreme pollution episodes. Previous studies have examined the performance of numerous field calibration models by comparing low-cost PM sensors to standard particle concentration measuring instruments. The most widely used model is linear regression, which assumes a linear relationship between LCS data and research-grade instrument measurements (Jovasevic-Stojanovic et al. 2015; Nakayama et al. 2018). Relative humidity plays an important role in the calibration process because it leads to particle growth through moisture absorption as well as to the formation of water droplets that can be detected as aerosols. Badura et al. (2019) suggested that temperature and relative humidity should be used as inputs to the linear model in order to improve the calibration performance. However, the response of LCS to relative humidity is thought to be non-linear (Di Antonio et al. 2018), and the PM2.5 concentrations reported by the PMS1003 were found to increase exponentially with relative humidity up to 75% (Jayaratne et al. 2018). To address this problem, non-linear models and machine learning techniques have been applied in order to obtain a better match with measurements from reference instruments (Topalović et al. 2019; Y. Wang et al. 2019). Because the sensor response varies with humidity conditions, it is necessary to evaluate whether the same field calibration method is applicable in all cases (Kelly et al. 2017; Zusman et al. 2020). Few studies have comprehensively considered the accuracy and efficiency of humidity-specific calibration models.

Our study aims to explore the performance of field calibration methods using both a linear model and a Generalized Additive Model (GAM) for PM2.5 measurements from PMS5003 sensors under different relative humidity levels in a 37-day monitoring campaign. The GAM, which has been shown to perform well (Lee, Wang, and Yu 2019), was used to capture non-linear associations between hourly PM2.5 measurements and potential influencing factors. Additional factors, such as meteorological and gaseous pollutant parameters, were introduced in order to improve the performance of the calibration model.

2. Materials and methods

2.1. Measurements

The field campaign was carried out from 1 November 2017 to 7 December 2017, during the heating season in northern China, when PM2.5 mass concentrations are high. The monitoring sites (Figure 1), named Lvdi (LD) and Quanhe (QH), were in Wenshang County, Shandong Province, China. The two sites were approximately 2 km apart and surrounded by similar land use types. Both are located in urban residential areas, close to major roadways, and were probably both affected by coal combustion and traffic emissions. We therefore merged the original measurements from the two sites into a single dataset so that one calibration model could be fitted to both time series at once.

Figure 1. Map of Wenshang County, Shandong Province, China, with the triangles showing the two sites with co-located reference monitoring stations and LCS which are named Lvdi (LD) and Quanhe (QH), respectively, representing urban residential and traffic influenced environment.

In our study, a portable commercial measurement station (Figure 2) provided by Zhongke YunTian Environmental Protection Technology Company (located in Jining, China) was deployed at both the LD and QH sites. Each station consisted of a particulate matter monitoring module, gas pollutant monitoring modules, meteorological parameter sensors, a wireless communication module, a power supply, and a power management unit. Overall, an array of sensors was used to measure PM2.5, carbon monoxide (CO), temperature, relative humidity, wind speed, and pressure. The PM2.5 sensor is the PMS5003 (about USD 30), manufactured by Plantower, China. It is based on the light scattering of aerosols. The PMS5003 photodiode detector measures the intensity of scattered light after a laser beam passes through the air sample. This is used to estimate the equivalent size of particles and the number of particles per unit volume in different particle diameter ranges. The estimates are calculated using Mie theory as described on the manufacturer's website (http://www.plantower.com/). As with most sensors, the portable measurement station has no heating equipment at the inlet to remove water from the sample. Relative humidity therefore probably affects particle refractive properties and, thus, the optical information received by the sensors (Manikonda et al. 2016). Before field deployment, each portable measurement station underwent a trial operation in which hourly data were collected for 10 days in order to verify the efficiency and stability of every sensor; no extra calibration was made because of the difference between laboratory and field conditions. During the monitoring period, the portable measurement stations were checked once a week to confirm normal operation and continuous data collection. The sensor provides both raw PM2.5 and ambient-corrected PM2.5 values (Barkjohn et al. 2020; Tryner et al. 2020); the analysis in this study was based on the raw PM2.5 concentrations reported by the PMS5003, without ambient corrections.

Figure 2. Left: Example of the monitoring airbox which contains an array of sensors measuring PM2.5 (Plantower PMS5003), CO, temperature, relative humidity, wind speed, and pressure. Both are mounted to the rooftop of the building, where there are no specific source emissions from the surrounding area. Right: Internal view of the equipment box.

Each portable measurement station was deployed alongside a conventional instrument: the Met One Beta Attenuation Monitor (BAM), operated according to the U.S. EPA Federal Equivalent Method (FEM) for continuous PM2.5 monitoring. The instrument has a dryer at the inlet to remove water vapor, which minimizes the influence of relative humidity (Chung et al. 2001). The conventional instrument was operated and maintained by the Jining Ecological and Environmental Agency, and hourly values of PM2.5, CO, nitrogen dioxide (NO2), sulfur dioxide (SO2), ozone (O3), temperature, relative humidity, wind speed, and pressure were available. In the following sections, the terms PM2.5Ref, CORef, NO2Ref, SO2Ref, O3Ref, TRef, RHRef, WSRef, and PRef represent the reference stations' measurements of PM2.5, CO, NO2, SO2, O3, temperature, relative humidity, wind speed, and pressure, respectively. The terms PM2.5LCS, COLCS, TLCS, RHLCS, WSLCS, and PLCS represent the LCS measurements of PM2.5, CO, temperature, relative humidity, wind speed, and pressure, respectively. LCS measurements were averaged from one-minute resolution to hourly values in order to match the reference measurements. Hours with missing values were removed from the analysis, resulting in 87% data completeness.
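The averaging and matching step described above can be sketched as follows. This is a minimal illustration only, not the authors' processing code: the function names, the sample readings, and the convention of labelling each hour by its start time are assumptions introduced here.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean

def hourly_means(records):
    """Average one-minute (timestamp, value) readings into hourly means.

    Readings whose value is None are skipped; an hour with no valid
    reading is absent from the result. Hours are labelled by their
    start time (an assumption, the source does not state the convention).
    """
    buckets = defaultdict(list)
    for timestamp, value in records:
        if value is None:
            continue
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: fmean(values) for hour, values in sorted(buckets.items())}

def match_hours(lcs_hourly, ref_hourly):
    """Keep only the hours for which both LCS and reference values exist."""
    return [
        (hour, lcs_hourly[hour], ref_hourly[hour])
        for hour in sorted(lcs_hourly)
        if ref_hourly.get(hour) is not None
    ]

# Hypothetical one-minute LCS readings and hourly reference values.
minute_data = [
    (datetime(2017, 11, 1, 0, 0), 80.0),
    (datetime(2017, 11, 1, 0, 1), 84.0),
    (datetime(2017, 11, 1, 1, 0), 90.0),
    (datetime(2017, 11, 1, 2, 0), None),
]
reference = {
    datetime(2017, 11, 1, 0): 60.0,
    datetime(2017, 11, 1, 1): None,  # missing reference hour
}

lcs_hourly = hourly_means(minute_data)
paired = match_hours(lcs_hourly, reference)
```

In this toy input only the first hour survives, because the second hour has no reference value and the third has no valid LCS reading.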

2.2. Data analysis and LCS PM2.5 calibration models

2.2.1. Data analysis

Our monitoring period experienced frequent elevated PM2.5 concentrations as well as high relative humidity (Table 1). Time series (Figure 3) of sensor and BAM PM2.5 mass concentrations showed that peak PM2.5 events corresponded to episodes with high relative humidity (≥75%). Jayaratne et al. (2018) found a marked increase in the PM2.5 concentrations reported by the PMS1003 once relative humidity exceeded 75%. This happens when substances in the particles, such as sodium chloride, exceed their deliquescence point. The ratio of the PMS5003-reported PM2.5 concentration to the dry TEOM-reported PM2.5 concentration increased sharply for RH above about 74% (Tryner et al. 2020). Cruz and Pandis (2000) also report that sodium chloride and ammonium sulfate particles reach their deliquescence point when the relative humidity is above 75%. When the relative humidity exceeds the deliquescence point of such substances, the particulate matter quickly absorbs moisture, the scattering coefficient suddenly increases, and hygroscopic growth jumps markedly with relative humidity. Visual inspection of Figure S2 in the online SI shows that the characteristics of the measurements differ below and above an RH of approximately 75%. Above this threshold, the overestimation by the LCS in our field experiment increased rapidly with increasing RH (Figure S2a in the online SI). We therefore separated the dataset into two groups according to whether the LCS relative humidity was below or above 75%. The first group corresponds to dry conditions (RHLCS below 75%) and contained 1343 observations. The second group corresponds to humid conditions (RHLCS equal to or greater than 75%) and contained 207 observations.
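The grouping rule is simple enough to state as code. The sketch below assumes each hourly record is a dictionary with a hypothetical `rh_lcs` key; the field names and sample values are illustrative and not taken from the dataset.

```python
RH_THRESHOLD = 75.0  # percent; the threshold used in the text

def split_by_humidity(rows, rh_key="rh_lcs", threshold=RH_THRESHOLD):
    """Split hourly records into dry (RH < threshold) and humid (RH >= threshold)."""
    dry, humid = [], []
    for row in rows:
        (humid if row[rh_key] >= threshold else dry).append(row)
    return dry, humid

# Hypothetical hourly records.
hours = [
    {"pm25_lcs": 95.0, "pm25_ref": 70.0, "rh_lcs": 52.0},
    {"pm25_lcs": 180.0, "pm25_ref": 100.0, "rh_lcs": 75.0},
    {"pm25_lcs": 210.0, "pm25_ref": 105.0, "rh_lcs": 88.0},
    {"pm25_lcs": 60.0, "pm25_ref": 45.0, "rh_lcs": 74.9},
]
dry, humid = split_by_humidity(hours)
```

Note that a reading of exactly 75% falls in the humid group, matching the "equal to or greater than 75%" definition.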

Figure 3. Time series of hourly PM2.5 mass concentrations measured by BAM and LCS along with relative humidity, measured between 1 November and 7 December 2017 at the two sites. The red shadow area represents the period of relative humidity measured by sensors as equal to or greater than 75%. (Upper: LD site; lower: QH site.)

2.2.2. Variable selection method for calibration models

Calibration models were fitted using PM2.5Ref as the dependent variable and PM2.5LCS as the independent variable, along with the additional calibration predictors shown in Table 2. The additional predictors were selected using a stepwise forward process, which began with an empty model and added one variable at a time as an input to the linear model or GAM. In each step, the variable that contributed the greatest improvement in the coefficient of determination (R2) was included (de Foy et al. 2018). The variable selection was carried out iteratively. R2 increased with the number of input variables; however, including redundant variables with low R2 contributions would lead to collinearity problems. By visual inspection, we estimated the position of an inflection point in the curve of R2 vs. number of variables shown in Figure S5 in the online SI, and we used this to determine the optimal number of inputs to the model. The threshold at the inflection point was about 0.005 and 0.03 for dry and humid conditions, respectively (Figure S5 in the online SI). Therefore, in the model for dry conditions, a new input variable was retained as long as it increased R2 by at least 0.005; for humid conditions we used a threshold of 0.03. The difference arose mainly because agreement between the LCS and reference measurements was much better during dry conditions than during humid conditions; this is discussed further in Section 3.2.
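A forward selection loop with an R2-gain threshold can be sketched as below for the linear case. This is an illustrative implementation under stated assumptions, not the authors' code: the helper names, the toy data, and the plain normal-equation solver are all introduced here, and the solver will fail on exactly collinear predictors.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def r_squared(columns, y):
    """R2 of an ordinary least-squares fit of y on the columns plus an intercept."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def forward_select(candidates, y, min_gain):
    """Add, one at a time, the predictor with the largest R2 gain.

    Stops when the best remaining predictor raises R2 by less than min_gain.
    """
    selected, best_r2 = [], 0.0
    remaining = dict(candidates)
    while remaining:
        scores = {
            name: r_squared([candidates[s] for s in selected] + [col], y)
            for name, col in remaining.items()
        }
        name = max(scores, key=scores.get)
        if scores[name] - best_r2 < min_gain:
            break
        selected.append(name)
        best_r2 = scores[name]
        del remaining[name]
    return selected, best_r2

# Toy data (hypothetical): the target depends on the first two predictors only.
candidates = {
    "pm25_lcs": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "rh_lcs": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
    "t_lcs": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0],
}
y = [10.0 + 5.0 * a + 2.0 * b
     for a, b in zip(candidates["pm25_lcs"], candidates["rh_lcs"])]
selected, best_r2 = forward_select(candidates, y, min_gain=0.005)
```

With the dry-condition threshold of 0.005, the loop keeps the two informative predictors and rejects the third, whose R2 gain is negligible. A GAM version would swap `r_squared` for the R2 of a fitted additive model.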

Table 2. List of calibration models developed in this study using linear and GAM calibration models under dry and humid conditions, along with input variables used for each model and corresponding statistical performance metrics.

The predictors were normalized to zero mean and unit standard deviation so as to reduce the effects of extreme observations. All available measurements were used as candidate inputs to the GAM in order to identify existing dependencies in the measurements. In the future, the candidate inputs could be limited to the LCS measurements alone in order to develop an operational correction algorithm for the LCS product, as was done, for example, by Barkjohn et al. (2020).
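One practical detail of this normalization is that the mean and standard deviation should be stored so that later data can be transformed with the same values. A minimal sketch follows; the function names are hypothetical, and the use of the population standard deviation is an assumption, since the source does not say which estimator was used.

```python
from statistics import fmean, pstdev

def fit_scaler(values):
    """Return the mean and (population) standard deviation of a predictor."""
    std = pstdev(values)
    if std == 0:
        raise ValueError("constant predictor cannot be standardized")
    return fmean(values), std

def standardize(values, mean, std):
    """Transform values to zero mean and unit standard deviation."""
    return [(v - mean) / std for v in values]

# Hypothetical RH readings used to fit the scaler, then reused on a new reading.
rh_train = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, std = fit_scaler(rh_train)
rh_hat = standardize(rh_train, mean, std)
new_hat = standardize([7.0], mean, std)
```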

2.2.3. Linear calibration based on reference measurements

The linear model, one of the most widely used methods for calibrating LCS measurements, was used to establish the relationship between the target variable and one or more predictors. First, we derived the calibration formula by fitting a linear regression model to hourly LCS and co-located BAM PM2.5 values. The equation is as follows:

$$\mathrm{PM2.5}_{\mathrm{Ref}} = \alpha_0 + \alpha_1 \times \widehat{\mathrm{PM2.5}}_{\mathrm{LCS}} + \varepsilon_1 \tag{1}$$

where $\mathrm{PM2.5}_{\mathrm{Ref}}$ (PM2.5 mass concentrations measured by BAM) represents the dependent variable; $\widehat{\mathrm{PM2.5}}_{\mathrm{LCS}}$ is the LCS PM2.5 used as the predictor variable, normalized to zero mean and unit standard deviation; $\alpha_0$ and $\alpha_1$ are the intercept and slope of the linear regression; and $\varepsilon_1$ denotes the random residual.

The linear calibration model with stepwise forward selection of variables is an extension of the linear regression model, using the following equation:

$$\mathrm{PM2.5}_{\mathrm{Ref}} = \alpha_0 + \alpha_1 \times \hat{x}_1 + \alpha_2 \times \hat{x}_2 + \cdots + \alpha_n \times \hat{x}_n + \varepsilon_1 \tag{2}$$

where $n$ is the number of included predictors; $\hat{x}_1$ to $\hat{x}_n$ are the normalized values of the predictors $x_1$ to $x_n$; and $\alpha_1, \ldots, \alpha_n$ are the regression coefficients corresponding to each predictor.
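As an illustration of the single-predictor case in Equation (1), the sketch below standardizes a hypothetical LCS series and fits the intercept and slope by ordinary least squares. The data and function names are invented for the example. Because the predictor has zero mean, the fitted intercept equals the mean reference concentration.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mean, std = fmean(values), pstdev(values)
    return [(v - mean) / std for v in values]

def fit_linear(x_hat, y):
    """Least-squares intercept (alpha0) and slope (alpha1) of y on x_hat."""
    mx, my = fmean(x_hat), fmean(y)
    sxx = sum((x - mx) ** 2 for x in x_hat)
    sxy = sum((x - mx) * (v - my) for x, v in zip(x_hat, y))
    alpha1 = sxy / sxx
    alpha0 = my - alpha1 * mx
    return alpha0, alpha1

# Hypothetical hourly values in µg/m3 (the reference is exactly 0.7 * LCS + 2).
pm25_lcs = [40.0, 80.0, 120.0, 160.0]
pm25_ref = [30.0, 58.0, 86.0, 114.0]

x_hat = standardize(pm25_lcs)
alpha0, alpha1 = fit_linear(x_hat, pm25_ref)
calibrated = [alpha0 + alpha1 * x for x in x_hat]
```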

2.2.4. GAM calibration based on reference measurements

We tested the non-linear relationships between the PM2.5 concentrations and several predictors by using the GAM (Hua et al. 2021) calibration method, which is a nonparametric regression method. In order to obtain an optimal model, we followed the same procedure as for the linear calibration model: we first used only PM2.5LCS as a predictor, and then included further variables based on a stepwise forward selection algorithm. The model with a single smooth function can be described as (Eilers and Marx 1996; Wood and Augustin 2002):

$$\mathrm{PM2.5}_{\mathrm{Ref}} = \beta_0 + s\!\left(\widehat{\mathrm{PM2.5}}_{\mathrm{LCS}}\right) + \varepsilon_2 \tag{3}$$

where $\beta_0$ is the intercept of the GAM, $s(\widehat{\mathrm{PM2.5}}_{\mathrm{LCS}})$ is the smooth function of the normalized low-cost PM2.5 sensor measurements, and $\varepsilon_2$ are the residuals. With several predictors the model becomes:

$$\mathrm{PM2.5}_{\mathrm{Ref}} = \beta_0 + s(\hat{x}_1) + s(\hat{x}_2) + \cdots + s(\hat{x}_n) + \varepsilon_2 \tag{4}$$

where $s(\cdot)$ denotes the P-spline smoothing functions, which optimize the fit and control the smoothness through a penalty term (Eilers and Marx 1996); $s(\hat{x}_1)$ to $s(\hat{x}_n)$ are the smoothers that characterize the effects of the normalized $x_1$ to $x_n$ on the reference PM2.5 measurements; and $n$ is the number of selected predictors.
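For readers unfamiliar with P-splines, the penalty term mentioned above can be written out as a penalized least-squares problem in the spirit of Eilers and Marx (1996). The form below is a generic sketch, not the authors' exact configuration: the coefficient symbol $\theta_{jk}$, the basis size $K$, the difference order $d$, the observation count $N$, and the smoothing parameters $\lambda_j$ are notation introduced here, and the source does not report the values used.

```latex
s(\hat{x}_j) = \sum_{k=1}^{K} \theta_{jk}\, B_k(\hat{x}_j),
\qquad
\min_{\beta_0,\,\theta}\;
\sum_{i=1}^{N} \Bigl( \mathrm{PM2.5}_{\mathrm{Ref},i} - \beta_0
  - \sum_{j=1}^{n} s(\hat{x}_{ij}) \Bigr)^{2}
+ \sum_{j=1}^{n} \lambda_j \,\bigl\lVert D_d\, \theta_j \bigr\rVert^{2}
```

Here $B_k$ are B-spline basis functions and $D_d$ is the $d$-th order difference matrix acting on adjacent coefficients of smoother $j$. A larger $\lambda_j$ penalizes differences between neighboring coefficients more strongly and so gives a smoother fitted curve, while $\lambda_j = 0$ reduces to an unpenalized B-spline regression.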

2.3. Calibration performance metrics

The performance of the calibration models was assessed using the coefficient of determination (R2) and the root mean square error (RMSE). R2 measures how closely the calibrated LCS values follow the reference BAM PM2.5 values. The value is between zero and one, and a value closer to one reflects a better fit. RMSE provides a measure of the error between sensor and standard monitor measurements (Feenstra et al. 2019). The equations for R2 and RMSE are as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (x_i - y_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{5}$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - y_i)^2}{n}} \tag{6}$$

where $x_i$ is the calibrated value of LCS PM2.5, $y_i$ is the PM2.5 value measured by the research-grade BAM, $\bar{y}$ is the average concentration of BAM PM2.5, and $n$ is the number of measurements.
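Both metrics translate directly into code. The sketch below follows Equations (5) and (6) term by term; the function names and the four sample values are illustrative only.

```python
from math import sqrt

def r2(calibrated, reference):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((x - y) ** 2 for x, y in zip(calibrated, reference))
    ss_tot = sum((y - mean_ref) ** 2 for y in reference)
    return 1.0 - ss_res / ss_tot

def rmse(calibrated, reference):
    """Root mean square error between calibrated and reference values."""
    n = len(reference)
    return sqrt(sum((x - y) ** 2 for x, y in zip(calibrated, reference)) / n)

# Hypothetical calibrated LCS values (x) and BAM values (y).
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 5.0, 5.0, 9.0]
score = r2(x, y)    # 1 - 4/32 = 0.875
error = rmse(x, y)  # sqrt(4/4) = 1.0
```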

Bootstrapping was used to estimate the uncertainty of the linear calibration model. The regression model was fitted 100 times, each time using a randomly resampled dataset. When resampling, the data to be included were selected at random with replacement so that each dataset was the same size as the original. The models were tested with consecutive selections of 1 h to 24 h, and the results were found to be stable over time (Table S1). The uncertainty in the model coefficients was obtained from the variance of the 100 model simulations (de Foy and Schauer 2015).
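The resampling procedure can be sketched as follows for the slope of a one-predictor linear model. This is a simplified illustration, not the authors' implementation: it resamples individual hours rather than consecutive blocks, and the data, seed, and function names are assumptions made for the example.

```python
import random
from statistics import fmean, stdev

def fit_slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def bootstrap_slopes(x, y, n_boot=100, seed=0):
    """Refit the slope on n_boot datasets resampled with replacement."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # same size as the original
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:
            continue  # degenerate resample: slope undefined, draw again
        slopes.append(fit_slope(xs, [y[i] for i in idx]))
    return slopes

# Hypothetical hourly LCS (x) and reference (y) concentrations in µg/m3.
x = [20.0, 40.0, 60.0, 80.0, 100.0, 120.0, 140.0, 160.0]
y = [18.0, 29.0, 46.0, 57.0, 74.0, 85.0, 102.0, 113.0]

slopes = bootstrap_slopes(x, y)
slope_uncertainty = stdev(slopes)  # spread of the 100 refitted slopes
```

The standard deviation of the refitted coefficients then serves as the uncertainty estimate; the same loop applies to the intercept or to multi-predictor models.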

Table 1. Statistical summary of measured parameters under dry and humid conditions. The parameters include PM2.5, wind speed (WS), air temperature (T), relative humidity (RH), surface pressure (P) and carbon monoxide (CO), respectively, from BAM and LCS, and SO2, NO2, and O3 from BAM.

Data preprocessing, data analysis, linear model and GAM calculation, and model validation were performed in the R environment (Version 3.6.3) (R Core Team 2019) with the “reshape,” “ggplot2,” “mgcv,” and “boot” packages.

3. Results and discussion

3.1. Time series analysis

The time series of hourly BAM and LCS PM2.5 mass concentrations, along with LCS RH values at the LD and QH sites, can be seen in Figure 3. Other environmental factors reported by the BAM instrument at LD are provided in Figure S1 in the online SI. The time series show that LCS PM2.5 values agree quite well (R2 = 0.74) with BAM PM2.5 measurements. The LCS somewhat overestimates PM2.5 and is sensitive to sudden peaks in concentration during the monitoring period, as was also found in a previous study in Salt Lake City, USA (Sayahi, Butterfield, and Kelly 2019b). Compared to the BAM, the LCS overestimated PM2.5 mass concentrations by a factor of about 1.4 in dry conditions and about 2.0 in humid conditions (Figure S2a in the online SI). The BAM instrument uses a sharp cyclone to remove particles larger than 2.5 µm. In contrast, the PMS5003 screens for PM2.5 by estimating the particle size from light scattering. Based on Mie theory and on assumptions of spherical particles and a specific refractive index, the algorithm estimates the particle size from the amount of light of a known wavelength scattered at a fixed angle. The optical equivalent diameter of the particle sensor is usually calibrated using polystyrene standard particles and an assumed density of 1.5 g/cm3. Although the sensors were calibrated by Plantower using standard particles in the laboratory, these overestimation factors will be influenced by differences in real-world conditions as well as by varying protocols from different manufacturers (Pawar and Sinha 2020). The methods used for distinguishing PM2.5 from larger particles differ between the instruments and are a known source of discrepancy. However, there is no clear proxy for estimating the time-varying size of this effect, and so, in this study, any systematic biases introduced in the measurements can only be accounted for by the linear and GAM models at the same time as the other sources of error.

The overestimation factors are similar to the results of Sayahi, Butterfield, and Kelly (2019b), but are slightly greater in humid conditions. This is probably because the chemical composition of the aerosols in our study differs from that in Salt Lake City, leading to changes in the instrument response as a function of relative humidity. In China, PM2.5 contains high abundances of organic matter during haze events (An et al. 2019). Furthermore, stagnant weather and a rapid increase in RH enhance particle hygroscopic growth (Liu et al. 2011). In the Salt Lake area, particles are primarily composed of secondary inorganic aerosol, especially ammonium nitrate, which is mainly related to inversions (Kuprov et al. 2014). High RH during our study was therefore a likely cause of the LCS overestimation compared with the reference measurements. PM2.5 concentrations measured by LCS and BAM are more consistent in a lower RH environment than in a high RH environment. The correlation coefficient is 0.91 (Figure S3 in the online SI), which is close to the results of Sayahi, Butterfield, and Kelly (2019b).

The statistical summary of measured parameters is shown in Table 1. In dry and humid conditions, average LCS PM2.5 concentrations are 97.0 µg/m3 and 186.0 µg/m3, while average BAM PM2.5 concentrations are 70.9 µg/m3 and 101.8 µg/m3, respectively. Increasing RH drives particle hygroscopic growth, augments the PM2.5 mass concentrations (An et al. 2019; Tie et al. 2017), affects the light-scattering response of the sensors, decreases monitoring accuracy, and causes greater variation in LCS PM2.5 measurements (Yang Wang et al. 2015).

Because the sensors respond to different degrees under different humidity conditions, the relationship between the independent variables and the dependent variable is more complicated than a single linear fit. Figure S2a in the online SI shows that PM2.5LCS exhibits a nonlinear response when relative humidity exceeds 75%. Therefore, we divided the observations into two groups based on RH values and evaluated the applicability of the linear and GAM calibration methods separately for each group. The distribution of PM2.5 mass concentrations measured by BAM and LCS under dry and humid conditions is shown in Figure S4 in the online SI. The variance of LCS PM2.5 measurements is higher than that of BAM in humid conditions. This is the main reason for the greater uncertainty of the calibration results, as well as for the lower calibration accuracy in humid conditions than in dry conditions.

Figure 4. Comparison of (a) uncalibrated PM2.5 mass concentrations, (b) calibrated PM2.5 values from the one-variable linear model, (c) calibrated PM2.5 values from the three-variable linear model, (d) uncalibrated PM2.5 mass concentrations, (e) calibrated PM2.5 values from the one-variable GAM calibration, and (f) calibrated PM2.5 values from the four-variable GAM calibration with hourly PM2.5 mass concentrations from the reference monitors in dry conditions. R2 values are shown between the LCS PM2.5 measurements and the reference BAM PM2.5 values, and the line in each panel is the 1:1 line.

3.2. Explanatory variables considered in calibration models

Figure S3 in the online SI shows the correlation coefficients (R) of all measured parameters in different relative humidity conditions. PM2.5 measurements from the LCS correlated very well with those from conventional monitors, demonstrating that the PMS5003 sensor is a useful tool for PM2.5 monitoring. PM2.5 was significantly positively correlated with relative humidity because of particle hygroscopic growth; droplets are easily detected by sensors and cause high overestimation in the measurements (Pawar and Sinha 2020). PM2.5 was negatively correlated with wind speed, because in China haze episodes are usually accompanied by high humidity and stagnant weather (Sun et al. 2019). Wind speed can be an indirect factor in intensifying secondary conversion of gaseous pollutants and increasing moisture absorption by particulate matter. In addition, temperature might affect sensor performance and should be considered (Malings et al. 2020). PM2.5 exhibits positive associations with CO, SO2, and NO2, which indicates that residential emissions, such as coal combustion and biomass burning for residential heating and cooking, as well as transportation emissions, contribute substantially to PM2.5 pollution in the study area in winter. The gas pollutants serve as indicators of different sources, which have key impacts on PM2.5 composition and hence influence aerosol morphology and optical properties (Li et al. 2016).

Because there are high correlation coefficients between multiple predictors, the models run the risk of overfitting. This is especially the case if the parameters obtained by BAM and LCS monitoring are introduced at the same time. We therefore use a stepwise forward variable selection algorithm to screen the input variables. All variables in the candidate list are tested in the model individually. The predictor leading to the greatest increase in R2 is added to the list of model inputs. This process is repeated as long as R2 increases by at least 0.005 for the dry model and 0.03 for the humid model, the thresholds given in Section 2.2.2. This method not only controls the number of variables and reduces the model complexity, but also avoids introducing highly correlated explanatory variables, which improves model robustness. The increase in model R2 as a function of the number of input variables is shown in Figure S5 in the online SI for the linear and GAM models under dry and humid conditions. Under dry conditions, when the number of input variables is greater than three, the increase in R2 tends to be low. Under humid conditions, when the number of variables is greater than four, the increase in R2 is low. The final variables selected are shown in Table 2. For the dry conditions, the linear model includes PM2.5LCS, TLCS, and RHLCS, and the GAM model includes PM2.5LCS, TLCS, RHLCS, and CORef. For the humid conditions, the linear model includes PM2.5LCS, WSRef, RHLCS, and COLCS, and the GAM model includes PM2.5LCS, RHLCS, WSLCS, and TLCS.

Figure 5. Comparison (a) uncalibrated PM2.5 mass concentrations, (b) calibrated PM2.5 values with one-variable linear model, (c) calibrated PM2.5 values with three-variable linear model, (d) uncalibrated PM2.5 mass concentrations, (e) calibrated PM2.5 values with one-variable GAM calibration, and (f) calibrated PM2.5 values with four-variable GAM calibration model fitted to hourly PM2.5 values of the reference monitor in humid conditions. Correlation coefficients (R2) are shown between the uncalibrated and calibrated LCS PM2.5 measurements with the reference BAM PM2.5 values. 1:1 line of best fit is shown in each graph.

PM2.5LCS is positively related to PM2.5Ref, showing that LCS and reference measurements are consistent (). To interpret the coefficients of the other input variables to the GAM, we need to consider them as contributions to the under- and overestimation of PM2.5Ref by PM2.5LCS. Figure S8 in the online SI shows the GAM-derived relationship of the inputs to the model under dry conditions. T is positively related to PM2.5Ref, showing that at higher temperatures the LCS concentrations are too low. This could partly be because under conditions of temperature inversions (cold temperatures) when there are haze events, the LCS measurements are too high relative to the reference measurements. It could also be partly a function of the large negative association of RH with PM2.5Ref: for high RH conditions, LCS measurements are too high and need to be adjusted downwards. This is consistent with what we know about particle growth and the over-detection of droplets by optical methods relative to mass-based reference methods. CO shows a non-linear relationship with a decrease in the response function at high CO concentrations. Again, this suggests that the LCS measurements overestimate the PM2.5 concentrations during very polluted episodes when both the aerosol and CO concentrations are high.

Figure S9 in the online SI shows the non-linear relationships of the input variables for humid conditions. The dominant feature is the negative relationship of RH to PM2.5Ref which is considerably larger than under dry conditions. This underscores the overestimation of the LCS instrument at high RH conditions discussed above. In comparison, the relationship to WS is weak with downward adjustments of PM2.5LCS at both low and high wind speeds. PM2.5LCS needs to be adjusted down at cold temperatures and up at high temperatures. For cold temperatures this is most likely a function of overestimation during haze events. For high temperatures this could be a result of covariation between events with high RH and high T. The large downward adjustments at high RH conditions are slightly offset by an improved dependency on T for warm events.

The calibration models identify PM2.5LCS as the most important input variable for the models of PM2.5Ref: it explained approximately 82% of the variability of the reference PM2.5 concentrations under dry conditions and 50% under humid conditions. There is one exception, however: under humid conditions, the one-variable linear model for PM2.5Ref is more accurate using COLCS as an input. This is due to the high association between the emissions of CO and PM2.5 as a result of coal-fired heating in cities in northern China in the winter, which is the prime contributor to high concentrations of PM2.5 (J. Yang et al. 2019).

3.3. Comparison of the performance of the two calibration methods

The statistical results of the linear and GAM calibration models under different humidity conditions are shown in . Furthermore, and show the relationships between reference PM2.5 measurements and calibrated outputs for dry and humid conditions, respectively. Under dry conditions, uncalibrated PM2.5 measurements show good agreement (R2=0.82) with the measurements from the standard monitors. The linear model with three variables has an R2 of 0.86 compared with 0.82 for the one-variable model, and the GAM model with four variables has an R2 of 0.88 compared with 0.82 for the one-variable model. Under humid conditions, the linear model with three variables has an R2 of 0.75 compared with 0.50 for the one-variable model. The GAM model with three variables has an R2 of 0.83 compared with 0.52 for the one-variable model. Compared with the linear model, the R2 of the GAM model increases significantly as the number of variables increases under humid conditions. Including both meteorological and chemical parameters in the GAM calibration model yields PM2.5 concentrations that have a higher R2 with the reference PM2.5 mass concentrations.

The R2 of the GAM model is higher than that of the linear model, but the GAM is more complex, has a higher computational cost, and can be more difficult to interpret. Considering computational efficiency and calibration accuracy, the results from this study suggest that a linear calibration model is most appropriate for LCS PM2.5 measurements in dry conditions, while a GAM calibration is more applicable in humid conditions. After calibrating the data, the large overestimates present at high values of RH were eliminated such that the relationship between LCS measurements and BAM measurements was closer to 1:1 (Figure S2b in the online SI).
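The idea of fitting one calibration per humidity regime can be sketched as follows. This is a simplified illustration under stated assumptions: both regimes use an ordinary least-squares model here, whereas the study recommends a GAM for the humid regime (a GAM fit would replace the humid branch); the 75% cutoff comes from the study, but treating exactly 75% as dry, the function names, and the synthetic data are all assumptions.

```python
import numpy as np

RH_CUTOFF = 75.0  # percent; dry/humid split (RH above the cutoff counts as humid here)

def fit_linear(X, y):
    """Least-squares coefficients, intercept first."""
    A = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0]

def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]

def fit_by_humidity(X, rh, y):
    """Fit a separate calibration model for each humidity regime."""
    humid = rh > RH_CUTOFF
    return {
        "dry": fit_linear(X[~humid], y[~humid]),
        # Placeholder: the study suggests a GAM for this regime
        "humid": fit_linear(X[humid], y[humid]),
    }

def calibrate(models, X, rh):
    """Apply the regime-specific model to each observation."""
    humid = rh > RH_CUTOFF
    out = np.empty(len(rh))
    out[~humid] = predict_linear(models["dry"], X[~humid])
    out[humid] = predict_linear(models["humid"], X[humid])
    return out

# Synthetic data (assumed): the sensor overestimates more strongly when humid
rng = np.random.default_rng(1)
n = 1000
pm_lcs = rng.uniform(10, 200, n)
rh = rng.uniform(30, 100, n)
pm_ref = np.where(rh > RH_CUTOFF, 0.4, 0.7) * pm_lcs + rng.normal(0, 3, n)

X = np.column_stack([pm_lcs, rh])
models = fit_by_humidity(X, rh, pm_ref)
pm_cal = calibrate(models, X, rh)

rmse_raw = np.sqrt(np.mean((pm_lcs - pm_ref) ** 2))
rmse_cal = np.sqrt(np.mean((pm_cal - pm_ref) ** 2))
print(round(rmse_raw, 1), round(rmse_cal, 1))
```

Keeping the regime split inside calibrate means new observations are routed by their own RH reading, so the same cutoff is applied at fitting and at prediction time.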

3.4. Uncertainty analysis

We estimated the uncertainty of the linear calibration by performing 100 bootstrapped realizations. This method randomly samples the data with replacement to produce a dataset with the same number of observations as the initial dataset (which may, as a consequence, include individual data points multiple times). The standard deviations (SD) of the regression coefficients were calculated from the bootstrapped realizations, as shown in Table S2 in the online SI. Figures S6 and S7 in the online SI present histograms and scatterplots of the bootstrapped regression coefficients of the linear calibration for dry and humid conditions, respectively. The histograms indicate the uncertainty of the coefficients, and the scatterplots show that the relationships between input variables are limited. Most pairs of coefficients are not correlated with each other. The most correlated pair in dry conditions is PM2.5LCS and RHLCS, with an R2 of 0.32. In humid conditions, the largest R2 value is between PM2.5LCS and COLCS (0.58), followed by PM2.5LCS and WSRef (0.27). These correlations are within the accepted intervals, so all variables are kept in the final model. The histograms show greater uncertainty of the regression coefficients in humid conditions than in dry conditions. In dry conditions, the variation in concentration per standard deviation change for PM2.5LCS, TLCS, and RHLCS was around 5%, 9%, and 30%, respectively. In humid conditions, the variation in concentration per standard deviation change for PM2.5LCS, WSRef, RHLCS, and COLCS was around 25%, 40%, 25%, and 60%, respectively. Overall, there is less uncertainty for the linear calibration model in dry conditions than in humid conditions.
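The bootstrap procedure described above can be illustrated with a short Python sketch. It assumes an ordinary least-squares calibration and synthetic data; the function name bootstrap_coefficients and its parameters are hypothetical, and only the number of realizations (100) and the resampling scheme are taken from the text.

```python
import numpy as np

def bootstrap_coefficients(X, y, n_boot=100, seed=0):
    """Refit a linear model on n_boot resamples drawn with replacement,
    each with the same number of observations as the original data."""
    rng = np.random.default_rng(seed)
    A = np.column_stack([np.ones(len(y)), X])  # intercept column first
    n = len(y)
    coefs = np.empty((n_boot, A.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # row indices, repeats allowed
        coefs[b] = np.linalg.lstsq(A[idx], y[idx], rcond=None)[0]
    return coefs

# Synthetic data (assumed values, for illustration only)
rng = np.random.default_rng(2)
n = 400
pm_lcs = rng.uniform(10, 200, n)
rh = rng.uniform(20, 75, n)
pm_ref = 5.0 + 0.6 * pm_lcs - 0.3 * rh + rng.normal(0, 5, n)

coefs = bootstrap_coefficients(np.column_stack([pm_lcs, rh]), pm_ref)
coef_sd = coefs.std(axis=0, ddof=1)                 # SD of each coefficient
pair_r2 = np.corrcoef(coefs, rowvar=False) ** 2     # R2 between coefficient pairs
print(coef_sd)
print(pair_r2)
```

The rows of coefs are what a histogram per coefficient or a scatterplot per coefficient pair would be drawn from, and the off-diagonal entries of pair_r2 correspond to the kind of pairwise R2 values quoted in the text.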

Figures S8 and S9 in the online SI show the nonlinear relationships of the inputs for the GAM models during the dry and humid conditions, respectively. The dotted lines represent the 95% confidence interval of the model results. For the dry conditions, the uncertainty range is very narrow, showing that the GAM model is robust. Furthermore, PM2.5Ref and PM2.5LCS are linearly correlated with an R2 of 0.82, showing that there is an excellent correspondence between the two. Under humid conditions, the uncertainties are larger, but the model remains very robust. PM2.5LCS plays a dominant role, but the importance of the covariables increases. In particular, relative humidity shows a non-linear relationship.

Collinearity was tested using the variance inflation factor (VIF) for the linear calibration models, and the “concurvity” function in the “mgcv” package in R was used for the GAM calibration. For linear calibration, the VIF values are less than 5 for all variables and lower than 2 for most variables (Table S3), demonstrating that collinearity is within an acceptable range (Menard 2001). The “concurvity” function measures how well a smoothed variable can be approximated by the other smoothed variables in the model inputs. Previous results suggest that concurvity values above 0.5 start to introduce noticeable errors (Barton, Farewell, and Hallett 2020). The concurvity results (Table S3) showed that all values were less than 0.5, indicating that there is no excessive collinearity for the GAM calibrations.
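For the linear models, the VIF of a predictor is 1 / (1 - R2), where R2 comes from regressing that predictor on all the other predictors. A minimal Python sketch of this calculation is given below; the function name vif and the synthetic data are assumptions, and the concurvity check for the GAM (done with mgcv in R in the study) is not reproduced here.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress predictor j on the remaining predictors (with intercept)
        A = np.column_stack([np.ones(len(target)), others])
        coef = np.linalg.lstsq(A, target, rcond=None)[0]
        resid = target - A @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Synthetic predictors (assumed): a and b are independent, c nearly duplicates a
rng = np.random.default_rng(3)
n = 1000
a = rng.normal(size=n)
b = rng.normal(size=n)
c = a + 0.1 * rng.normal(size=n)

vif_independent = vif(np.column_stack([a, b]))
vif_collinear = vif(np.column_stack([a, b, c]))
print(vif_independent)
print(vif_collinear)
```

With independent predictors the VIF stays close to 1, while the near-duplicate pair exceeds the threshold of 5 cited in the text, which is the situation the stepwise selection is meant to avoid.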

4. Conclusions

In this study, hourly PM2.5 measurements from two LCS instruments showed good agreement with co-located conventional air quality monitors, although the LCS concentrations were somewhat overestimated. Relative humidity had a large impact on both the properties of the particles and the response of the low-cost optical sensors, with different behaviors above a cutoff of 75% relative humidity. Under dry conditions, there was a strong linear relationship between the PM2.5 mass concentrations from the LCS measurements and those from the reference monitors. Under humid conditions, the low-cost PM2.5 sensors showed a nonlinear response to relative humidity. Therefore, we assessed the performance of linear and GAM calibration methods for low-cost PM2.5 sensor measurements, categorized by relative humidity. The results suggested that including air pollutant concentrations and meteorological variables in the model would significantly improve the accuracy of the post-calibrated LCS PM2.5 values. In dry conditions, multiple linear calibration was found to be an effective way to get satisfactory calibration results, while in humid conditions, GAM calibration models would be closer to the reference PM2.5 measurements. This study demonstrates that improved calibration methods are effective in producing more accurate estimates of PM2.5 concentrations using low-cost sensors in urban ambient environments with elevated PM2.5 pollution and varying relative humidity conditions.

Acknowledgments

We would like to thank Philip K. Hopke for helping to improve the article and Sophia Wells for providing language help. We are grateful to the anonymous reviewers for their valuable comments, which were very helpful for improving the manuscript.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported by the National Natural Science Foundation of China (NSFC, No. 41877310), and partly by the National Key Research and Development Program of China (No. 2016YFC0503600).
