Regular Articles

Using Statistical Regressions to Identify Factors Influencing PM2.5 Concentrations: The Pittsburgh Supersite as a Case Study

Pages 766-774 | Received 15 Sep 2009, Accepted 18 Apr 2010, Published online: 27 Jul 2010

Abstract

Using data from the Pittsburgh Air Quality Study, we find that temperature, relative humidity, their squared terms, and their interactions explain much of the variation in airborne concentrations of PM2.5 in the city. Factors that do not appreciably influence the concentrations over a full year include wind direction, inverse mixing height, UV radiation, SO2, O3, and season of the year. Comparison with similar studies of PM2.5 in other cities suggests that the relative importance of different factors can vary greatly. Temperature and relative humidity are important factors in both Pittsburgh and New York City, and synoptic scale meteorology influencing these two sites can explain much of the pattern in PM2.5 concentrations, which peak in the summer. However, PM2.5 levels in other cities have different seasonal patterns and are affected by a number of other factors, and thus the results presented here cannot be generalized to other locations without additional study.

INTRODUCTION

The Pittsburgh Air Quality Study (PAQS) was a $7 million program funded by the U.S. Environmental Protection Agency and the U.S. Department of Energy. The main data collection effort was conducted at the Pittsburgh Supersite in Schenley Park, adjacent to the Carnegie Mellon University campus (Wittig et al. 2004). The overall goal of the study was to improve our understanding of the factors affecting concentrations of airborne particulate matter, especially PM2.5, in eastern urban areas such as Pittsburgh. PM2.5 is important because high concentrations have been associated with health effects, materials damage, ecosystem damage, and reduced visibility. PM2.5 is also believed to contribute to climate change (Seinfeld and Pandis 2006).

In a previous paper (Chu et al. 2009), we describe a simple statistical procedure to identify specific sources that are likely contributing to PM2.5 on days of elevated concentration. The procedure involves determining an effective wind direction over an averaging period that depends on estimated transport times, and examining patterns in wind direction frequency associated with periods of high PM2.5 concentration.

We apply the method to days of high concentration in the PAQS dataset to reach the conclusion that a combination of elevated background PM2.5 levels and local emissions from a few large sources in the Pittsburgh area causes exceedances of the National Ambient Air Quality Standard. It is likely that the background concentrations consist mostly of secondary particles created from gaseous precursors emitted from sources far upwind, while the local emissions are mostly primary particles.

Here we use the full PAQS dataset, which covers low as well as high PM2.5 concentrations, to explore the importance of several explanatory variables, including temperature, relative humidity, wind direction (for windspeeds greater than 1 m/s), UV radiation, SO2 concentration, O3 concentration, mixing height, and season of the year, as well as interactions among some of these variables. The regressions and statistical tests are used to identify those variables that have a major effect on PM2.5 concentrations as well as on changes in PM2.5 concentration from one averaging period to the next.

EXPERIMENTAL METHODS

The PAQS dataset covers the period July 2001 to September 2002, although here we will use exactly one year of data starting September 1, 2001. The Supersite location was about six kilometers east of downtown Pittsburgh in Schenley Park, adjacent to the Carnegie Mellon University campus at one of the highest locations in the city. Although the site is in the middle of a high density urban area, there are no heavily traveled roads within a few hundred meters of the site. There are no major stationary sources within 2 km of the site except one source that has been shown statistically to have negligible influence (Chu et al. 2009). PM2.5 was measured with a Rupprecht and Patashnick Model 1400A Tapered Element Oscillating Microbalance (TEOM), as described in Chu et al. (2009). Temperature and relative humidity were measured with a Campbell Scientific HMP45C Probe, while windspeed and wind direction data were obtained with a Met One 014A 3-cup anemometer and a Met One 024A wind vane, respectively. Ultraviolet radiation was measured with a Kipp & Zonen CUV3 UV Radiometer, which is sensitive over the entire range of UVA and UVB. SO2 concentration was determined with an API model 100A Analyzer, and O3 was measured with an API model 400A Analyzer. All of the data were obtained about 3 m above ground on the roof of a trailer positioned in the middle of a grassy field. Data have been combined to provide ten-minute average values throughout the study period for all of these instruments.

Balloon temperature measurements are taken at the Pittsburgh airport at 12:00 and 24:00 GMT (7 AM and 7 PM in Pittsburgh, Eastern Standard Time) and transformed into a vertical virtual potential temperature profile. To obtain the mixing height from the temperature profile, we used a variant of the Holzworth (1967) method applied to morning conditions, which intersects the measured virtual potential temperature with an offset line. Holzworth uses an offset line of 5°C added to the minimum surface temperature observed between 2 AM and 6 AM local standard time, “established arbitrarily to allow for urban-rural differences in morning surface temperatures and for some solar heating of the surface after sunrise” (Holzworth 1967, p. 1040).
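The intersection step can be illustrated with a short sketch. It is not the procedure used in the study: it assumes the offset line can be treated as a constant virtual potential temperature equal to the surface minimum plus the offset, and that the profile is interpolated linearly between sounding levels. The function name `mixing_height` and the sounding values are hypothetical.

```python
def mixing_height(heights_m, theta_v, surface_min, offset=5.0):
    """Height (m) where the profile first reaches surface_min + offset.

    heights_m   -- sounding heights in ascending order (m)
    theta_v     -- virtual potential temperature at each height (deg C)
    surface_min -- minimum early-morning surface temperature (deg C)
    offset      -- offset added to the surface minimum (deg C)
    Returns None if the profile never reaches the target value.
    """
    target = surface_min + offset
    if theta_v[0] >= target:
        return float(heights_m[0])  # profile already at or above the target
    levels = list(zip(heights_m, theta_v))
    for (z0, t0), (z1, t1) in zip(levels, levels[1:]):
        if t0 < target <= t1:
            # Linear interpolation between the two bracketing levels.
            return z0 + (target - t0) * (z1 - z0) / (t1 - t0)
    return None


# Hypothetical sounding, for illustration only.
heights = [0.0, 200.0, 500.0, 1000.0]
profile = [10.0, 12.0, 16.0, 20.0]
h_holzworth = mixing_height(heights, profile, surface_min=10.0, offset=5.0)
h_reduced = mixing_height(heights, profile, surface_min=10.0, offset=1.5)
```

With the same sounding, a smaller offset gives a lower estimated mixing height, which is why the choice of offset matters.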

Because we needed mixing height estimates at each hour, we chose to use data provided by the Rapid Update Cycle (RUC) model (Benjamin et al. 2004), which estimates the temperature profile at a grid of points over the entire country. We used the closest grid point, which is roughly 10 kilometers from the Pittsburgh airport. Since the airport is no longer entirely rural and RUC takes some ground heating into account, we used a maximum offset of 1.5°C (Benjamin, personal communication). The offset for each hour was taken to be proportional to the solar radiometer data, scaled to be 1.5°C at the solar zenith, with a minimum of 0.1°C. Values of mixing height lower than 100 m are set equal to 100 m to avoid singularities caused by ground level mixing height (infinite inverse mixing height) and to account for the fact that no emissions occur as low as ground level (z = 0). The explanatory variable used in the regressions is inverse mixing height, so the maximum value is 0.01 m−1.
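The hourly offset and the floor on mixing height described above amount to two small clamping rules. The sketch below is one reading of that description; the function names are hypothetical, and `solar_at_zenith` is assumed to be the radiometer reading that corresponds to the solar zenith.

```python
MAX_OFFSET_C = 1.5           # offset at the solar zenith (from the text)
MIN_OFFSET_C = 0.1           # lower bound on the offset (from the text)
MIN_MIXING_HEIGHT_M = 100.0  # floor applied to the mixing height (from the text)


def hourly_offset(solar, solar_at_zenith):
    """Offset (deg C) proportional to the solar reading, floored at 0.1."""
    scaled = MAX_OFFSET_C * solar / solar_at_zenith
    return max(MIN_OFFSET_C, scaled)


def inverse_mixing_height(mixing_height_m):
    """Inverse mixing height (1/m), capped at 0.01 by the 100 m floor."""
    return 1.0 / max(mixing_height_m, MIN_MIXING_HEIGHT_M)
```

For example, `inverse_mixing_height(50.0)` returns the cap of 0.01 m−1, and a night-time reading of zero gives the minimum offset of 0.1°C.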

STATISTICAL METHODS

Linear regression assumes a dependent variable can be modeled by a linear combination of coefficients and explanatory variables (explanatory variables are also called independent variables in the regression, but they are not necessarily statistically independent):

$$y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \epsilon_i, \qquad i = 1, \dots, n \tag{1}$$

where $y = (y_1, \dots, y_n)$ is the vector of observations of the dependent variable, $x_j = (x_{1j}, \dots, x_{nj})$ is the vector of observations of the $j$th of $p$ explanatory variables, $\beta_j$ are the coefficients (with $\beta_0$ the intercept), and $\epsilon_i$ are error terms. The task here is to find the values of $\beta_j$ that provide the best fit to the data based on minimizing the least squares error.
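To make the least-squares fit concrete, the sketch below solves the normal equations for a model with an intercept, using only the standard library. It is illustrative rather than the software used in the study; the helper names `solve` and `ols` and the toy data are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients [b0, b1, ..., bp], intercept first."""
    design = [[1.0] + list(r) for r in rows]  # prepend the constant term
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# Toy data: (temperature, humidity) pairs with an exactly linear response.
rows = [(10.0, 40.0), (20.0, 60.0), (30.0, 50.0), (15.0, 80.0), (25.0, 70.0)]
y = [2.0 + 0.5 * t + 0.1 * h for t, h in rows]
b0, b_temp, b_humid = ols(rows, y)
```

Because the toy response is exactly linear, the fit recovers the generating coefficients; real data would leave nonzero residuals.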

The explanatory variables were chosen based on the data we had available from the supersite program. These variables included concentrations of SO2 and O3 as well as meteorological observations, so each x j could be one of those variables or its transform. The dependent variable y could be the observed PM2.5 concentration or alternatively the difference of two consecutive observed PM2.5 concentrations. The former choice corresponds to building a model to explain the observed PM2.5, and the latter corresponds to building a model to explain the change of the observed PM2.5. We tried both models in our analysis.
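The two choices of dependent variable differ only by a differencing step, sketched below with a hypothetical helper. Note that the differenced series is one element shorter, so the explanatory variables have to be aligned with it; how the study handled that alignment and any gaps between periods is not stated here.

```python
def first_differences(series):
    """Change from each observation to the next (length is len(series) - 1)."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Hypothetical consecutive PM2.5 averages, for illustration only.
pm25 = [10.0, 12.0, 9.0, 15.0]
delta_pm25 = first_differences(pm25)
```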

For all the regressions presented here, there were 1,155 observations over one year. The limiting factor determining sample size was the number of 3-h time periods for which average values of all variables of interest were available.
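The way complete 3-h periods limit the sample size can be sketched as follows. This is an assumption-laden illustration: the record layout and the function name `three_hour_means` are hypothetical, and a period is kept here if every variable has at least one valid 10-min value, which may not match the completeness rule used in the study.

```python
from collections import defaultdict
from datetime import datetime


def three_hour_means(records, variables):
    """Average 10-min records into 3-h bins, keeping only complete bins.

    records   -- iterable of (datetime, {variable: value or None})
    variables -- names that must all be present for a bin to be kept
    """
    bins = defaultdict(lambda: defaultdict(list))
    for stamp, values in records:
        start = stamp.replace(hour=stamp.hour - stamp.hour % 3,
                              minute=0, second=0, microsecond=0)
        for name in variables:
            value = values.get(name)
            if value is not None:
                bins[start][name].append(value)
    means = {}
    for start, cols in sorted(bins.items()):
        # Drop the bin if any variable has no valid value in it.
        if all(cols[name] for name in variables):
            means[start] = {name: sum(cols[name]) / len(cols[name])
                            for name in variables}
    return means


# Hypothetical records: the 03:00 bin lacks a temperature and is dropped.
records = [
    (datetime(2001, 9, 1, 0, 0), {"pm25": 10.0, "temp": 20.0}),
    (datetime(2001, 9, 1, 0, 10), {"pm25": 14.0, "temp": 22.0}),
    (datetime(2001, 9, 1, 3, 0), {"pm25": 30.0, "temp": None}),
]
means = three_hour_means(records, ["pm25", "temp"])
```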

RESULTS AND DISCUSSION

We start with Model 1, which predicts PM2.5 concentration by explanatory variables $x_j$ as listed in Table 1. The variables represented are:

  • [SO2]—SO2 concentration; Temp—temperature; Humid—relative humidity

  • [O3]—O3 concentration; UV—intensity of downwelling ultraviolet radiation; iMHc—inverse mixing height, corrected to a maximum 0.01 m−1

  • rSE, rS, rSW—fraction of time wind is from the direction southeast, south, and southwest

  • fsummer—categorical variable denoting the influence of season, equal to 1 for Summer and 0 for the other seasons, where the seasons are defined by their calendar dates (Summer is June 21–September 21, etc.)

  • ffall—same as fsummer but equal to 1 for Fall

  • fwinter—same as fsummer but equal to 1 for Winter

  • Intercept—the value of PM2.5 concentration when all explanatory variables are zero, i.e., β0 in Equation (1).

There are a few points to note regarding these explanatory variables.

  1. All the explanatory variables used in the regressions are 3-h averages, except for fsummer, ffall, and fwinter.

  2. The directions for rSE, rS, and rSW are defined by potential local sources, where southeast corresponds to 112°–142°, south corresponds to 146°–201°, and southwest corresponds to 222°–254° (Chu et al. 2009).

  3. The categorical variables fspring, fsummer, ffall, and fwinter account for the effect of season. By setting fspring = 0, we can interpret the coefficients $\beta_{fsummer}$, $\beta_{ffall}$, and $\beta_{fwinter}$ as the difference in PM2.5 concentration between spring and each of the other three seasons. This is done since the four categorical variables plus the intercept $\beta_0$ are not linearly independent, and hence one variable must be eliminated to avoid a singularity in the solution. The values of the three coefficients are obtained by fitting the model, and they are interpreted as average values of the difference in concentration between each season and spring. These categorical variables have been included to determine whether the physically and chemically based variables in Table 1 successfully explain most of the variation in PM2.5; the effects of the four seasons should be negligible, for example, if the variation in PM2.5 is due entirely to changes in the variable Temperature.
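The seasonal coding in point 3 can be sketched as below, with spring as the omitted reference category. The exact boundary days are an assumption (the 21st of March, June, September, and December, with each boundary day assigned to the season that starts on it), and the function names are hypothetical.

```python
from datetime import date


def season(day):
    """Calendar season, switching on the 21st of Mar, Jun, Sep, and Dec."""
    key = (day.month, day.day)
    if (3, 21) <= key < (6, 21):
        return "spring"
    if (6, 21) <= key < (9, 21):
        return "summer"
    if (9, 21) <= key < (12, 21):
        return "fall"
    return "winter"


def seasonal_dummies(day):
    """Three indicator variables; spring is the all-zero reference case."""
    current = season(day)
    return {
        "fsummer": int(current == "summer"),
        "ffall": int(current == "fall"),
        "fwinter": int(current == "winter"),
    }
```

A spring date such as `date(2002, 4, 15)` maps to all zeros, so its expected concentration is carried by the intercept and the other variables alone; including a fourth indicator for spring would make the columns sum to the constant term and the fit singular.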

TABLE 1 Results of the linear regression Model 1 for PM2.5 concentration as the dependent variable. The symbols *, **, and § indicate significance at 0.05, 0.01, and 0.001 levels, respectively. The adjusted r2 for this model is 0.59

The estimated value of each coefficient $\hat{\beta}_j$ in the table has units such that the product $\hat{\beta}_j \bar{x}_j$, the coefficient multiplied by the average value of the explanatory variable, has units of μg/m3. The t value is the t-statistic for testing whether each coefficient is different from 0. The last column in the table shows Pr(> |t|), also known as the p-value, which is the probability of getting a value that is at least as extreme as the observed coefficient assuming the distribution of $\hat{\beta}_j$ is centered at zero. The column $\hat{\beta}_j \bar{x}_j$ in the table includes values that are positive and negative, and the total average modeled concentration is the sum of all terms, which in this case is equal to 22.3 μg/m3. Because the explanatory variables have a variety of units, the product $\hat{\beta}_j \bar{x}_j$ is particularly important. Traditionally significance levels (i.e., the p-values) for each explanatory variable are considered to represent its relative importance, but here most of the explanatory variables become significant since the sample size is so large. Therefore we report the products to see which variables are important contributors to the PM2.5 concentration on average.
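The ranking by average contribution can be illustrated with a few lines of code. The coefficients and means below are invented for the example and are not values from the tables; the function name is hypothetical.

```python
def mean_contributions(coefs, means):
    """Return {name: coefficient * mean}; the intercept is multiplied by 1."""
    contributions = {}
    for name, beta in coefs.items():
        xbar = 1.0 if name == "Intercept" else means[name]
        contributions[name] = beta * xbar
    return contributions


# Hypothetical coefficients and variable means, for illustration only.
coefs = {"Intercept": 4.0, "Temp": 0.5, "Humid": 0.1, "iMHc": 200.0}
means = {"Temp": 12.0, "Humid": 65.0, "iMHc": 0.004}

contrib = mean_contributions(coefs, means)
total = sum(contrib.values())  # average modeled concentration
ranked = sorted(contrib, key=lambda k: abs(contrib[k]), reverse=True)
```

In this made-up case the variable with the largest coefficient (iMHc) contributes least on average, which is the reason for comparing products rather than raw coefficients or p-values.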

Several results stand out from Table 1. Unlike results of our earlier work (Chu et al. 2009), the wind direction variables are not major contributors to predicting PM2.5. The inverse mixing height is also unimportant. Results show that the most important explanatory variable is relative humidity, followed by the categorical variable summer. Compared to PM2.5 concentrations in spring, the concentrations in summer are greater by about 1/3, the concentrations in fall are about the same as in spring, and the concentrations in winter are slightly smaller. From a physical viewpoint, one would expect that the concentrations are mostly affected by meteorological differences among the seasons; the fact that summer is one of the most important variables indicates that the regression has not captured the meteorological variables responsible for the seasonal differences. This suggests that Model 1 is too simple to explain the observed PM2.5 levels. Thus we introduce Model 2, which includes higher order terms of some variables, e.g., squares of variables and interactions among variables that are expressed as products, to try to account for the differences between seasons. Results are shown in Table 2.

TABLE 2 Results of the linear regression Model 2 for PM2.5 concentration as the dependent variable. The symbols *, **, and § indicate significance at 0.05, 0.01, and 0.001 levels, respectively. The adjusted r2 for this model is 0.66.

The most important terms in the column of $\hat{\beta}_j \bar{x}_j$ are those for Temp, (Temp)2, and Temp × Humidity. At the average values of the 16 explanatory variables, it is mainly these three variables that have an important effect on PM2.5 concentration. The negative value of $\hat{\beta}_j \bar{x}_j$ for Temp would suggest an inverse relation between PM2.5 concentration and temperature, but this is misleading. All terms containing temperature must be considered when examining the influence of temperature; when all five terms are considered together, the PM2.5 concentration increases as the temperature increases, since the (Temp)2 term becomes dominant at high temperature.
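Why a negative linear coefficient need not imply an inverse relation can be seen from the slope of the fitted surface with respect to temperature. The sketch below keeps only three of the temperature terms (Temp, Temp squared, and Temp × Humidity) and uses invented coefficients, so it illustrates the reasoning rather than reproducing Model 2.

```python
# Hypothetical coefficients, for illustration only.
B_TEMP = -0.3         # linear temperature term (negative, as in the text)
B_TEMP_SQ = 0.02      # squared temperature term
B_TEMP_HUMID = 0.006  # temperature-humidity interaction


def temperature_slope(temp, humid):
    """d(PM2.5)/d(Temp) for b1*T + b2*T**2 + b3*T*H."""
    return B_TEMP + 2.0 * B_TEMP_SQ * temp + B_TEMP_HUMID * humid


slope_cool = temperature_slope(0.0, 60.0)
slope_warm = temperature_slope(25.0, 60.0)
```

With these numbers the slope is positive at both temperatures and grows with temperature, because the squared and interaction terms outweigh the negative linear term.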

Care must be taken in interpreting the asterisks showing the significance of each independent variable. The last column suggests that the coefficient of iMHc is different from 0 with a probability of > 0.999. However, the value of $\hat{\beta}_j \bar{x}_j$ is only 2.07 while the sum of all terms is 17.2 μg/m3. Compared to the contributions of Temp, Temp2, and Temp × Humidity, the contribution of iMHc is quite small.

In checking the effects of an explanatory variable where there are higher order terms, it is necessary to include all terms containing that variable when considering its contribution. For example, the term Temp × Humidity has the third highest $\hat{\beta}_j \bar{x}_j$ value in Table 2 (after Temp and Temp2). Thus both Temp and Humidity are important variables despite the smaller values for Humidity and Humidity2.

To further investigate the effects of temperature and relative humidity, we have run Model 3 including only the important explanatory variables in Model 2 along with the seasonal categorical variables. Results are shown in Table 3. The table shows that the seasonal terms are all negligible, indicating we have successfully explained most of the variation in PM2.5 using physically meaningful variables. Although we only include the temperature and humidity terms in this model, along with associated quadratic terms, we are able to explain more than half (52%) of the variation in PM2.5.

TABLE 3 Results of the linear regression Model 3 for PM2.5 concentration as the dependent variable. The symbols *, **, and § indicate significance at 0.05, 0.01, and 0.001 levels, respectively. The adjusted r2 for this model is 0.52.

We have also run a model for the dependent variable $y_i$ equal to the change in PM2.5 concentration from one time period to the next, denoted by ΔPM2.5. Results show that Humidity and Humidity2 are the two most important explanatory variables in predicting ΔPM2.5. However, this model is able to explain only a small percentage of the variation in ΔPM2.5, as the adjusted value of r2 is only 0.11.
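For readers comparing the adjusted r2 values quoted for the different models, the standard adjustment for the number of predictors is sketched below. The function name is hypothetical, and treating Model 2 as having 16 predictors besides the intercept is an assumption made only for the example.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted r2 for a model with an intercept and n_predictors terms."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# With 1,155 observations the penalty for 16 predictors is small.
example = adjusted_r2(0.60, 1155, 16)
```

With a sample this large, the adjusted and unadjusted values differ by less than 0.01, so differences between models mostly reflect the added terms rather than the penalty.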

Altogether, these results show that several variables expected to be important turned out not to be. The inverse mixing height was used in the model, mainly because we would expect PM2.5 concentration to vary inversely with mixing height; as noted above it was of negligible significance in the regressions. Also of negligible significance were concentrations of SO2 and O3, wind direction, and intensity of ultraviolet radiation. Separate tests showed that day-of-week had little influence on PM2.5 concentration, suggesting that vehicle emissions were not important.

Factor analysis by principal component extraction with varimax rotation was conducted with a more complete PAQS dataset by Millet et al. (2005). Their analysis showed that one of the factors was dominated by PM2.5, and this factor had only a weak diurnal variation. On the basis of this analysis, which also included a number of individual volatile organic compounds, the authors concluded that automotive emissions were not important to the variability in PM2.5 concentration, although local point source combustion had an effect and long range transport was a dominant factor influencing PM2.5. The importance of long range transport on PM2.5 during PAQS was also reported by Tang et al. (2004) based on measurements at several sites surrounding Pittsburgh. This hypothesis was further confirmed by Robinson et al. (2006), who considered the chemistry of organic aerosols to study the influence of oxidation during atmospheric transport on source apportionment modeling.

In addition, Zhou et al. (2004a) used Positive Matrix Factorization (PMF) with PAQS data for a five-week period in summer 2001 (no overlap with the present study). The input data included particle number and volume concentration by size as well as PM2.5 mass, sulfate, and organic/elemental carbon, pollutant gases, and meteorology. The one factor dominated by PM2.5 mass was attributed to long-range transport, and Fourier analysis showed no diurnal variation, suggesting little influence of local traffic. However, using PMF with bulk chemical composition data over a five-day period in July 2001 (Zhou et al. 2004b; Zhou et al. 2005) and over 13 months (Pekney et al. 2006) during PAQS showed that the influence of local traffic on the measurement site could be discerned despite its lack of major influence on total PM2.5.

Figure 1 shows the fitted values from Model 2 versus the observed data of a random sample (size = 100) from the PM2.5 concentrations. Although the fit is reasonable, the influential factors in the model are not necessarily the cause of a high PM2.5 concentration. Rather, the cause may be obscure.

FIG. 1 Fitted PM2.5 vs. Observed PM2.5. A random sample with size 100 is taken from all 3-h observations with valid observed data and fitted values. The length of error bars of the fitted values is set to the standard error from the linear regression.


We also observe high correlations between some of the explanatory variables, e.g., temperature and relative humidity are highly correlated. This is somewhat consistent with the results that PM2.5 is mostly influenced by temperature, squared temperature, and temperature–humidity interaction, and ΔPM2.5 is mostly influenced by humidity and squared humidity. Thus temperature and humidity are essentially one factor; PM2.5 and ΔPM2.5 are essentially associated with this one main factor.

The low r2 value for the attempt to model ΔPM2.5 suggests that we are not able to explain much of the change in PM2.5 concentrations from one time period to the next. To explore these variations further, we have attempted to predict PM2.5 concentrations using Model 4, which includes an autocorrelation structure, that is, the consecutive observations are not independent but correlated. Results are shown in Table 4.

TABLE 4 Results of the linear regression Model 4 for PM2.5 concentration as the dependent variable with autocorrelated errors. The symbols *, **, and § indicate significance at 0.05, 0.01, and 0.001 levels, respectively. The adjusted r2 equals 0.54, slightly lower than Model 2 without autocorrelation, but Model 2 incorrectly assumes independent errors. Model 4 gives an estimate for the first order autocorrelation coefficient of 0.77.

Results indicate that a model with 0.77 first order autocorrelation between residuals is a better fit to the data than Models 1–3, which assume the errors are independent. Thus the PM2.5 concentration at any specified time is a very influential factor in predicting its value in the next time period. This also justifies averaging 10-min measurements to obtain 3-h average values as was done in our previous analysis (Chu et al. 2009), since PM2.5 concentrations in this study generally did not change dramatically in the time scale of a few hours.
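The first order autocorrelation of residuals can be estimated with a simple sample statistic, sketched below. How Model 4 was actually fitted is not described here; one common approach, assumed for this illustration, is to estimate the coefficient from ordinary least-squares residuals and then refit on quasi-differenced data (a Cochrane-Orcutt style step). Function names and data are hypothetical.

```python
def lag1_autocorrelation(residuals):
    """Sample lag-1 autocorrelation of a residual series."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    numerator = sum(a * b for a, b in zip(dev, dev[1:]))
    denominator = sum(d * d for d in dev)
    return numerator / denominator


def quasi_difference(series, rho):
    """Transform z_t = s_t - rho * s_(t-1), dropping the first observation."""
    return [later - rho * earlier for earlier, later in zip(series, series[1:])]


# Hypothetical residuals: a smooth run gives positive autocorrelation,
# an alternating run gives negative autocorrelation.
smooth = lag1_autocorrelation([1.0, 2.0, 3.0, 4.0, 5.0])
alternating = lag1_autocorrelation([1.0, -1.0, 1.0, -1.0])
```

Applying `quasi_difference` to the dependent variable and to each explanatory variable with the estimated coefficient yields a regression whose errors are closer to independent, which is the motivation for modeling the autocorrelation explicitly.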

Table 4 shows that the explanatory variables Temp, (Temp)2, and Temp × Humidity are the most important in explaining PM2.5 concentration, as was true in Table 2. This is expected, as Model 4 is identical to Model 2 except for the addition of the autocorrelation structure. As we did with Model 3, we ran a reduced form of Model 4 with only temperature and humidity related terms as well as the categorical seasonal variables. Results of this reduced model were similar to those of Model 3 but with a first order autocorrelation coefficient of 0.79.

Although these results are interesting, we must acknowledge several limitations of the method. First and foremost, regressions can be influenced disproportionately by a small number of high-valued points. Figure 1 shows that most measured concentrations are fairly low; the limited number of high concentration datapoints can thus play a major role in affecting the outcome. Second, these regressions assume linearity even though the actual relationships may be nonlinear. In addition, the assumption of normality of errors and independence of errors implied in the regressions may not be correct. Finally, we recognize that the existence of variables that explain much of the variation in PM2.5 does not imply causality.

There are also limitations of the input data, which were subject to experimental error and represent conditions only at the measurement site. Meteorological data may vary over the region of PM2.5 transport, which would not have been captured by the dataset.

Nevertheless, the results demonstrate that high temperatures and relative humidities in this region of eastern United States may be associated with high PM2.5 concentrations to a greater extent than elevated concentrations of SO2 or O3 or high levels of UV. There does not appear to be an association between PM2.5 and inverse mixing height. This may be useful information for air pollution control agencies tasked with predicting PM2.5 episodes.

It is interesting to compare our results with PM2.5 measurements in other cities. DeGaetano and Doherty (2004) report data from a network of 20 monitoring stations in New York City. Their analyses show that seasonal variability is similar to that in Pittsburgh: less than 20% of all morning hours in summer had average temperatures exceeding 29°C, yet more than half of the summer PM2.5 concentrations above the 95th percentile occurred during those hours. There were no occurrences of 95th percentile concentrations when the temperature was below 21°C. High PM2.5 concentrations were also associated with high humidity in their study. The annual maximum in PM2.5 occurred in the summer as in our study.

Unlike Pittsburgh, however, strong diurnal and day-of-week variations were observed, with higher concentrations during the morning rush hour and in the afternoon and evening on weekdays, and lower concentrations on weekends, consistent with the influence of local traffic. The authors note that early morning atmospheric stability probably also contributed to the morning peak concentrations. The high summer PM2.5 levels in New York were frequently associated with winds from the SW. Regional meteorology showed that these winds resulted from the presence of a high pressure center off the SE coast of the United States, indicating a westward displacement of the usual Bermuda High, and a low pressure center over northeastern Canada. Note that SW winds in New York are consistent with clockwise flow around the Bermuda High when the pressure center is located SE of the city. Spatial variations in concentration across the 20 urban sites were small. The authors conclude that regional meteorology and long-range transport dominate day-to-day changes in PM2.5 concentration.

Regional meteorology and long-range transport are also important for the day-to-day variations in PM2.5 in Pittsburgh. However, nearby sources SE of the city are also important, as shown by Millet et al. (2005) and Chu et al. (2009). Note that when winds are from the SE in Pittsburgh, this cannot be due to a displaced Bermuda High; if it were due to clockwise flow around a High, the pressure center would have to be NE of the city. Thus regional meteorology favoring high PM2.5 concentrations in Pittsburgh is likely to be different from the regional meteorological conditions reported by DeGaetano and Doherty (2004) as sometimes responsible for high PM2.5 in New York.

In contrast to Pittsburgh and New York City, Aldrin and Haff (2005) report on PM2.5 concentrations at four locations in Oslo, Norway. Their results show essentially no correlation between temperature above 0°C and PM2.5, and a strong inverse relation below freezing: PM2.5 increases as temperature decreases below zero. This is attributed to wood burning during cold weather in the city. PM2.5 concentrations increase only very weakly as relative humidity increases, although the concentration of coarse particles, PM10 – PM2.5, decreases sharply with a relative humidity increase.

There is almost no association between PM2.5 concentrations in Oslo and the difference in temperature T(z = 25 m) − T(z = 2 m) which may be considered a proxy for mixing height. The authors also note that PM2.5 concentration is positively correlated with automotive traffic in the vicinity of the four measurement sites, indicating the importance of local sources for these sites which were close to roads.

PM2.5 concentrations in Oslo show an annual maximum in late March/early April rather than in summer as in Pittsburgh and New York. Berge et al. (2002) report that winter road maintenance and use of studded tires during periods of snow can contribute to airborne particle loadings in Oslo, which may be occurring during this time period. In addition, the Joint Research Centre report on European aerosol (Putaud et al. 2003, p. 11) states that particulate matter in European cities can be strongly influenced by regional background concentrations. Note that the March/April time period is also the peak season for arctic haze (Heidam et al. 2004; Quinn et al. 2007). Thus large scale meteorology may be affecting the Oslo measurements but in a different way compared with the East Coast U.S. measurements.

Perez and Salini (2008) have developed predictive models for PM2.5 concentration in Santiago, Chile. The models predict the concentration on the following day using data from 8:00 PM on the current day from four stations in the city. Based on comparing their model results with data, they report that the best variables to use in their predictive models are (1) 1-h average PM2.5 concentrations at 7:00 PM and 8:00 PM on the current day, (2) average PM2.5 concentrations over the previous 24 h determined at 8:00 PM on the current day, (3) the thermal amplitude (Tmax – Tmin) for the city for the current day, (4) the predicted thermal amplitude for the following day, and (5) a predicted stability index for the following day, which ranges from highly stable and low mixing height to extremely unstable, e.g., for passage of a frontal system.

Their results show that unlike Pittsburgh or Oslo, high PM2.5 concentrations are associated with greater atmospheric stability (low mixing height). The authors attribute this to the geography of Santiago, which is ringed by mountains preventing adequate ventilation of ground-level emissions when mixing height is low. The highest concentrations occur in the austral winter April–August, which is when mixing height is lowest.

Finally, it is of interest to consider the annual cycle of PM2.5 concentrations in Los Angeles, which is also ringed by mountains. Concentrations peak in November–December each year (South Coast Air Quality Management District 2007), probably due at least in part to the low mixing heights in winter. A secondary peak occurs in July, which is most likely indicative of rapid conversion of precursor gases to PM2.5 during the high temperatures in summer.

We thus see that high PM2.5 concentrations in Pittsburgh and New York appear to be influenced by regional meteorology and transport from upwind areas as well as by local sources. High concentrations in both cities are associated with high temperatures and relative humidities. A different situation exists in Oslo, where neither temperature nor relative humidity is strongly associated with PM2.5 concentration. Regional meteorology appears to be important but in a different way. Both Santiago and Los Angeles are affected by high mountains, which prevent dispersion under certain conditions. PM2.5 concentrations in these cities may be more influenced by the occurrence of low mixing height than in the other locations discussed.

These results can be reconciled with our previous work. The current study shows no effect of wind direction, even though local point sources to the S and SE of the monitoring site were shown to be important when including only exceedance days in the analysis (Chu et al. 2009). It is of interest that wind direction over the full year is correlated with temperature: higher temperatures are associated with winds from the S and SE. If wind direction is important for the overall dataset and correlates with high temperature (and relative humidity), then wind direction should show up as an important variable in winter. Therefore we ran another model with the fractional wind direction variables and the seasonal categorical variables to see if the wind direction is important in specific seasons. Results are similar to those in Model 2 in that wind direction is still not important and there are no important differences among the seasons. Thus we conclude that local point sources are not important in the overall dataset even though they are important on exceedance days.

It is also significant that PM2.5 does not correlate with inverse mixing height in Pittsburgh, even though we might expect such a correlation. This result suggests that in the absence of mountains or other barriers to transport, factors affecting background concentrations such as regional meteorology will be more important than mixing height. Note that this finding is strictly applicable only to the set of data we analyzed and may not be more generally applicable. Nevertheless, we know that regional meteorology can cause major changes in background PM2.5 concentration entering a city, and may be more important in influencing PM2.5 than efforts such as reducing emissions locally. The location of Pittsburgh downwind of major sources in the Ohio Valley, and the location of New York City downwind of major sources to the west and southwest, suggests that a regional approach to PM2.5 reduction is needed.

CONCLUSIONS

Data from September 1, 2001 to August 31, 2002 collected during the Pittsburgh Air Quality Study have been used with a linear regression model to identify factors that most influence concentrations of PM2.5. Results show that temperature, relative humidity and their associated quadratic terms can explain much of the variation in PM2.5, while wind direction, inverse mixing height, UV radiation, SO2, O3, and season of the year are not important. All of the data except inverse mixing height and season were based on 10-minute average values, which were combined into 3-h averages for the regressions. Our previous work with this dataset but including only data for PM2.5 exceedance days shows that wind direction is important, implicating local sources (Chu et al. 2009). We conclude that local sources influence PM2.5 levels on the limited number of days of high concentration when the 24-h National Ambient Air Quality Standard for PM2.5 is exceeded, but are less important than long-range transport in influencing PM2.5 when examining a full year of data.

A limitation of the above regressions is that they assume the PM2.5 concentration in one 10-min period is uncorrelated with the concentration in the next period. To assess this possibility, we ran the regression using an autocorrelation structure. Results showed that a better fit than the original model was obtained with 0.77 first order autocorrelation between residual errors (PM2.5 observed minus PM2.5 fitted). Temperature, relative humidity, and quadratic terms were still the most important factors explaining the variation in PM2.5.
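To make the autocorrelation adjustment concrete, the sketch below fits a regression whose errors follow a first-order autoregressive process. The source does not state which estimation procedure was used; this example assumes the Cochrane-Orcutt iteration, one common approach. The data are synthetic, with the error autocorrelation set to 0.77 only to mirror the value reported above, and all function names are hypothetical.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def lag1_autocorr(resid):
    """Estimate rho by regressing e_t on e_{t-1} without an intercept."""
    return float(resid[1:] @ resid[:-1] / (resid[:-1] @ resid[:-1]))

def cochrane_orcutt(X, y, n_iter=20, tol=1e-6):
    """Iteratively estimate regression coefficients and AR(1) rho."""
    beta = ols(X, y)
    rho = 0.0
    for _ in range(n_iter):
        resid = y - X @ beta              # residuals on the original scale
        new_rho = lag1_autocorr(resid)
        # Quasi-difference the data so transformed errors are uncorrelated.
        y_star = y[1:] - new_rho * y[:-1]
        X_star = X[1:] - new_rho * X[:-1]
        beta = ols(X_star, y_star)
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return beta, rho

# Synthetic example: one predictor, AR(1) errors with rho = 0.77.
rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
shocks = rng.normal(size=n)
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = 0.77 * eps[t - 1] + shocks[t]
y = X @ np.array([10.0, 2.0]) + eps

beta_hat, rho_hat = cochrane_orcutt(X, y)
print("coefficients:", beta_hat, "rho:", rho_hat)
```

Because the intercept column is quasi-differenced along with the other columns, the returned coefficients remain on the scale of the original model, so they can be compared directly with the ordinary least squares estimates.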

By comparing our results with those of studies in other cities, we note that in both Pittsburgh and New York City, much of the variation in PM2.5 can be explained by measurements of temperature and, to a lesser extent, relative humidity. Concentrations in both cities peak in the summer. In contrast, concentrations of PM2.5 in Oslo peak in March/April, with a different influence of temperature and a smaller influence of relative humidity. In both Pittsburgh and Oslo, the atmospheric mixing height does not appear to have a major influence on the concentration. Finally, mixing height appears to be an important determinant in Los Angeles and in Santiago, Chile. Both of these cities are ringed by mountains. Overall we conclude that factors influencing PM2.5 vary greatly from one city to another, especially for cities with different topographic characteristics and in different climate regimes, and thus conclusions based on data obtained at one location cannot be easily generalized to other locations.

We acknowledge the generous assistance of Stan Benjamin of the NOAA/ESRL Global Systems Division in obtaining the RUC data and assisting with interpretation, and that of Louis Giardano of the National Weather Service office in Pittsburgh in estimating mixing heights. We also acknowledge the assistance of Xiting Yang, Anna Anselmi, and David Hwang in earlier versions of this work. This research was conducted as part of the Pittsburgh Air Quality Study, which was supported by the US Environmental Protection Agency under contract R82806101 and the US Department of Energy National Energy Technology Laboratory under contract DE-FC26-01NT41017. This article has not been subject to EPA's required peer and policy review, and therefore does not necessarily reflect the views of the Agency. No official endorsement should be inferred.

REFERENCES

  • Aldrin, M. and Haff, I. H. 2005. Generalised Additive Modelling of Air Pollution, Traffic Volume and Meteorology. Atmos. Environ., 39: 2145–2155.
  • Benjamin, S. G., Dévényi, D., Weygandt, S., Brundage, K., Brown, J. M., Grell, G. A., Kim, D., Schwartz, B., Smirnova, T. G., Smith, T. L. and Manikin, G. 2004. An Hourly Assimilation-Forecast Cycle: The RUC. Monthly Weather Review, 132: 495–518.
  • Berge, E., Walker, S. E., Sorteberg, A., Lenkopane, M., Eastwood, S., Jablonska, H. I. and Koltzow, M. O. 2002. A Real-Time Operational Forecast Model for Meteorology and Air Quality During Peak Air Pollution Episodes in Oslo, Norway. Water, Air, Soil Pollut.: Focus, 2: 745–757.
  • Chu, N., Kadane, J. B. and Davidson, C. I. 2009. Identifying Likely PM2.5 Sources on Days of Elevated Concentration: A Simple Statistical Approach. Environ. Sci. Technol., 43: 2407–2411.
  • DeGaetano, A. T. and Doherty, O. M. 2004. Temporal, Spatial and Meteorological Variations in Hourly PM2.5 Concentration Extremes in New York City. Atmos. Environ., 38: 1547–1558.
  • Heidam, N. Z., Christensen, J., Wahlin, P. and Skov, H. 2004. Arctic Atmospheric Contaminants in NE Greenland: Levels, Variations, Origins, Transport, Transformations, and Trends 1990–2001. Sci. Tot. Environ., 331: 5–28.
  • Holzworth, G. C. 1967. Mixing Depths, Wind Speeds and Air Pollution Potential for Selected Locations in the United States. J. Appl. Meteorol., 6: 1039–1044.
  • Millett, D. B., Donahue, N. M., Pandis, S. N., Polidori, A., Stanier, C. O., Turpin, B. J. and Goldstein, A. H. 2005. Atmospheric Volatile Organic Compound Measurements during the Pittsburgh Air Quality Study: Results, Interpretation, and Quantification of Primary and Secondary Contributions. J. Geophys. Res., 110: D07S07.
  • Pekney, N. J., Davidson, C. I., Robinson, A., Zhou, L., Hopke, P., Eatough, D. and Rogge, W. F. 2006. Major Source Categories for PM2.5 in Pittsburgh using PMF and UNMIX. Aerosol Sci. Technol., 40: 910–924.
  • Perez, P. and Salini, G. 2008. PM2.5 Forecasting in a Large City: Comparison of Three Methods. Atmos. Environ., 42: 8219–8224.
  • Putaud, J.-P. 2003. A European Aerosol Phenomenology: Physical and Chemical Characteristics of Particulate Matter at Kerbside, Urban, Rural, and Background Sites in Europe. Joint Research Centre, European Commission, 55 pages. Available at http://ccu.jrc.ec.europa.eu/publications/putaud_JF1_pdfPM-draft_02Jul.pdf
  • Quinn, P. K., Shaw, G., Andrews, E., Dutton, E. G., Ruoho-Airola, T. and Gong, S. L. 2007. Arctic Haze: Current Trends and Knowledge Gaps. Tellus B, 59: 99–114.
  • Robinson, A. L., Donahue, N. M. and Rogge, W. F. 2006. Photochemical Oxidation and Changes in Molecular Composition of Organic Aerosol in the Regional Context. J. Geophys. Res., 111: D03302.
  • Seinfeld, J. H. and Pandis, S. N. 2006. Atmospheric Chemistry and Physics: From Air Pollution to Climate Change, 2nd ed. New York: John Wiley.
  • South Coast Air Quality Management District. 2007. Final 2007 Air Quality Management Plan, Appendix 2: Current Air Quality, Chapter 2: Air Quality in the South Coast Air Basin, Figure 2-13, p. II-2-16, June 2007.
  • Tang, W., Raymond, T., Wittig, B., Davidson, C. I., Pandis, S. N., Robinson, A. L. and Crist, K. 2004. Spatial Variations of PM2.5 During the Pittsburgh Air Quality Study. Aerosol Sci. Technol., 38(S2): 80–90.
  • Wittig, A. E., Anderson, N. J., Khlystov, A. Y., Pandis, S. N., Davidson, C. I. and Robinson, A. L. 2004. Pittsburgh Air Quality Study Overview. Atmos. Environ., 38: 3107–3125.
  • Zhou, L., Kim, E., Hopke, P. K., Stanier, C. O. and Pandis, S. 2004a. Advanced Factor Analysis on Pittsburgh Particle Size-Distribution Data. Aerosol Sci. Technol., 38(S1): 118–132.
  • Zhou, L., Hopke, P. K., Paatero, P., Ondov, J. M., Pancras, P. J., Pekney, N. J. and Davidson, C. I. 2004b. Advanced Factor Analysis for Multiple Time Resolution Aerosol Composition Data. Atmos. Environ., 38: 4909–4920.
  • Zhou, L., Hopke, P. K., Stanier, C. O., Pandis, S. N., Ondov, J. M. and Pancras, J. P. 2005. Investigation of the Relationship between Chemical Composition and Size Distribution of Airborne Particles by Partial Least Squares and Positive Matrix Factorization. J. Geophys. Res., 110: D07S18.
