Systematic review of the association between long-term exposure to fine particulate matter and mortality

Pages 1647-1685 | Received 04 Dec 2020, Accepted 08 Mar 2021, Published online: 13 Apr 2021

ABSTRACT

We used a transparent systematic review framework based on best practices for evaluating study quality and integrating evidence to conduct a review of the available epidemiology studies evaluating associations between long-term exposure to ambient concentrations of PM2.5 and mortality (all-cause and non-accidental) conducted in North America. We found that while there is some consistency across studies for reporting positive associations, these associations are weak and several important methodological issues have led to uncertainties with regard to the evidence from these studies, including potential confounding by measured and unmeasured factors, exposure measurement error, and model misspecification. These uncertainties provide a plausible, alternative explanation to causality for the weakly positive findings across studies. Using a causality framework that incorporates best practices for making causal determinations, we concluded that the evidence for a causal relationship between long-term exposure to ambient PM2.5 concentrations and mortality from these studies is inadequate.

Introduction

Particulate matter (PM) is the generic term for a mixture of solid particles and liquid droplets in various size fractions in ambient air that comprises the particle phase of air pollution. PM originates from numerous primary sources, including industrial activities, fossil fuel combustion, motor vehicles, crustal material, and burning of natural materials (e.g. forest fires) (US EPA Citation2019). Secondary PM can be formed in ambient air from chemical reactions of gaseous pollutants such as nitrogen oxides, sulfur oxides, and volatile organic compounds (US EPA Citation2019). As a consequence of this wide variety of sources, PM has a variable chemical composition and particle size distribution.

While the toxicity of PM is dependent on the chemical composition of the particles, particle size is also an important characteristic with respect to potential health effects from exposure to PM (Miller Citation2014; US EPA Citation2019). Different sized particles can penetrate into different regions of the respiratory tract, with potential for different health outcomes. The United States Environmental Protection Agency (US EPA) evaluates the potential health effects of exposure to three main size fractions of PM, classified according to the aerodynamic diameter of particles (US EPA Citation2019). Coarse or thoracic coarse PM (PM10-2.5) has a nominal mean aerodynamic diameter > 2.5 μm and ≤ 10 μm, and is largely comprised of particles such as soil and street dust, road wear debris, fly ash, oxides of crustal elements, sea salt, nitrates, sulfates, and biological aerosols (e.g. pollen, fungal spores, mold). Fine PM (PM2.5) has a nominal mean aerodynamic diameter ≤ 2.5 μm and is typically comprised of water; elemental carbon; low and moderate volatility organic compounds; metal compounds; and sulfate, nitrate, ammonium, and hydrogen ions. Ultrafine particles (UFPs) are generally considered to have a diameter ≤ 0.1 μm based on physical size, thermal diffusivity, or electrical mobility, and are commonly comprised of elemental carbon, low volatility organic compounds, metal compounds, and sulfate. In addition, PM with a nominal mean aerodynamic diameter of ≤ 10 μm (which includes all of the above PM size fractions) is referred to as PM10 or thoracic PM, though US EPA does not focus on this particular size fraction for evaluations of health effects.

Particulate matter is one of six criteria air pollutants for which the Clean Air Act (CAA) mandates the US EPA set health-based National Ambient Air Quality Standards (NAAQS). In 2012, the US EPA established a new annual PM2.5 primary NAAQS of 12 μg/m3 (annual mean averaged over 3 years) and retained the 24-hour PM2.5 NAAQS of 35 μg/m3 (98th percentile averaged over 3 years) previously set in 2006 (US EPA Citation2013). The CAA mandate also requires that the NAAQS for each criteria air pollutant be reviewed every 5 years. As part of this review process, the US EPA develops an Integrated Science Assessment (ISA) for each criteria air pollutant, in which causal relationships between criteria air pollutant exposures and various human health and welfare effects are evaluated using a framework that US EPA developed specifically for this purpose (US EPA Citation2015). The most recent ISA for PM was finalized in 2019 (US EPA Citation2019) and the human health effects evaluation focused on studies of short- and long-term exposures to PM at concentrations relevant to the range of human ambient exposures that were published after those reviewed in the previous PM ISA, which was finalized in 2009 (US EPA Citation2009).

Over the last several decades, the epidemiology literature has evaluated associations between PM2.5 exposure and mortality. US EPA (Citation2019) conducted a comprehensive review of this literature in the PM ISA and concluded that there is a causal relationship between long-term (i.e. one month or longer) exposure to PM2.5 and total (nonaccidental) mortality. The Preamble to the ISAs (US EPA Citation2015) describes the general framework for evaluating scientific evidence (referred to herein as the ‘NAAQS framework’) and the Appendix of the PM ISA (US EPA Citation2019) provides aspects for assessing the quality of studies of PM exposure, but neither document provides detailed guidance on evidence evaluation and causal determinations. We have recently developed a more transparent systematic review and causality framework that is based on the NAAQS framework but is modified to incorporate best practices for evaluating study quality, evaluating and integrating evidence, and making causal determinations, to allow for a scientifically sound assessment of the evidence (Goodman et al. Citation2020). Here, we evaluate the epidemiology literature on the association between long-term exposure to ambient concentrations of PM2.5 and mortality (all-cause and non-accidental) using our modified framework. To be consistent with the evaluation in the most recent PM ISA, we limit our analysis to epidemiology studies published after those included in the 2009 PM ISA. We also limit our analysis to studies conducted in North America, as these are most generalizable to the US population and therefore are most relevant to the PM NAAQS. We contrast our analysis to that conducted by US EPA in the PM ISA and consider whether and how differences between the NAAQS framework and our modified framework led to different conclusions regarding causality.

Methods

Literature searches and study selection

The principal question of our evaluation is whether the available evidence supports a causal relationship between long-term exposure to PM2.5 and mortality (all-cause or non-accidental) at ambient concentrations. We searched the PubMed and Scopus databases for epidemiology studies published between 1 January 2009, and 1 January 2020, using the following terms: (PM2.5 OR ‘PM2.5’ OR ‘particulate matter 2.5’) AND (exposure OR exposures OR exposed) AND (mortality OR death) AND (‘all cause’ OR ‘total mortality’ OR ‘long term’). We also cross-referenced the PM ISA (US EPA Citation2019) and the bibliographies of relevant review articles to identify additional studies that were not included in the literature search results.
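The Boolean search string described above can be assembled programmatically, which makes it easier to rerun or audit. The following is a minimal sketch and not the authors' tooling: the term groups are copied from the text, while `TERM_GROUPS` and `build_query` are hypothetical names. The date limits (1 January 2009 to 1 January 2020) are not encoded and would still need to be applied through each database's own filters.

```python
# Term groups copied from the search description above.
# Quoted phrases use straight double quotes here; each database's exact
# phrase syntax is an assumption and should be checked before use.
TERM_GROUPS = [
    ['PM2.5', '"PM2.5"', '"particulate matter 2.5"'],
    ['exposure', 'exposures', 'exposed'],
    ['mortality', 'death'],
    ['"all cause"', '"total mortality"', '"long term"'],
]


def build_query(groups):
    """Join terms with OR inside each group and AND between groups."""
    return " AND ".join("(" + " OR ".join(terms) + ")" for terms in groups)


if __name__ == "__main__":
    print(build_query(TERM_GROUPS))
```

Keeping the groups as data rather than as one long string makes it straightforward to see which concept (pollutant, exposure, outcome, duration) each term belongs to.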

We included peer-reviewed, observational studies that evaluated the association between long-term exposure to PM2.5 (defined by US EPA as one month or longer in duration; US EPA Citation2019) and all-cause or non-accidental mortality. We excluded studies that met any of the following criteria: laboratory animal, in vitro, experimental, or controlled human exposure studies; studies that were not published in English; studies that evaluated constituents of PM2.5 but did not include any evaluation of total PM2.5; studies that did not evaluate long-term, ambient PM2.5 exposure (i.e. studies that evaluated short-term, indoor, or source specific PM2.5 exposures); studies that only evaluated cause-specific mortality and not all-cause mortality; studies that used relative risk estimates or concentration-response information from other epidemiology studies; reviews; editorials; commentaries; correspondence/communications; letters to the editor; studies reviewed in the 2009 PM ISA (US EPA Citation2009), and studies conducted outside of North America.

After identifying studies that met our inclusion and exclusion criteria, we further narrowed down the list of studies to focus on in our evaluation. If we identified more than one study of the same cohort, we included only the most recent study or the one or two studies reporting the most informative data regarding the PM2.5-mortality association in the cohort (e.g. greater population coverage, improved PM exposure estimates, and/or improved statistical analysis with copollutant adjustments, non-linearity examination, or additional confounder adjustments). We excluded ecological studies, because such studies are subject to ecological bias. We also excluded studies that used new or causal modeling approaches, because those approaches have not been widely applied or accepted.

Study quality criteria

The appendices of the most recent ISAs for criteria pollutants, including the PM ISA (US EPA Citation2019), provide a discussion of study quality aspects to consider for evaluating epidemiology studies of the respective pollutant. As some of these aspects are either lengthier or less detailed than others, we previously compiled the aspects into a table that included succinct criteria for what is indicative of higher quality for each aspect, and recommended additional aspects and criteria based on our survey of best practices for evaluating study quality (Goodman et al. Citation2020). Here, we further modified these aspects and criteria to be specific to epidemiology studies of PM2.5 and mortality (Table 1).

Table 1. Quality criteria for epidemiology studies of PM2.5 and mortality.

The study quality criteria include a total of 36 specific aspects of epidemiology studies, grouped into seven general categories (study design, study population, pollutant specification, PM2.5 exposure assessment, mortality outcome assessment, confounding, and statistical methods), that are informative of potential bias and uncertainties. While the majority of these aspects assess important dimensions of study conduct (i.e. those in bold font in Table 1), some assess the clarity of study reporting (e.g. those regarding study objectives, participant characteristics, inclusion/exclusion criteria, pollutant description, descriptive statistics, and univariate analyses). Because aspects of PM2.5 specification (i.e. pollutant description and pollutant source) were incorporated in the inclusion/exclusion criteria, and the assessment of outcomes (i.e. all-cause or non-accidental mortality) is much less subject to misclassification compared to other disease endpoints (e.g. incidence or cause-specific mortality), we focused more on the other five categories of quality criteria, and particularly on the aspects related to study conduct within these categories, in our evaluation of study quality.

Several details with respect to the study quality criteria are worth noting. Regarding sample size, we considered cohorts with sample sizes ≥1 million to be sufficient without power calculations. Because of the large number (e.g. 10+ or even 20+) of potential confounders that are usually adjusted for in air pollution studies, we considered all other sample sizes to require justification based on a power calculation.

Regarding recruitment/participation, if a study was a secondary analysis of data from an existing cohort that was initially recruited for research questions unrelated to PM2.5 or mortality, we considered the criteria related to whether the study population is representative of the source population and the participation rate as not applicable, because such studies usually had to exclude participants for logistic reasons (e.g. not having data for PM2.5 or mortality) rather than through a recruitment/participation process. Similarly, for secondary analyses, we considered the criterion for follow-up as not applicable if the authors used linkage to conveniently identify mortality outcomes in existing cohorts.

Regarding exposure assessment, we considered either a comparison of modeled vs. monitored PM2.5 or multiple PM2.5 modeling approaches as utilizing and comparing more than one exposure assessment method. Regarding spatial variability, we considered 10 km2 or more refined grids as sufficient for modeled PM2.5, as is generally accepted; we considered 5 km or smaller buffers as sufficient for direct site measurements of PM2.5 (i.e. ‘monitored PM2.5’), in order to reduce the potential for measurement error.

With regard to confounding, we considered the criterion for adjustment for potential confounders to be met only if all the key confounders listed in Table 1 were adjusted for. It is worth noting that this is not an exhaustive list of important potential confounders. The PM2.5-mortality association could also be confounded by factors that are not typically measured in epidemiology studies of PM2.5 and mortality, such as stress and noise (Clougherty and Kubzansky Citation2009; Stansfeld Citation2015; US EPA Citation2019). There could also be residual confounding due to incomplete adjustment of covariates (e.g. socioeconomic status [SES]) and/or lack of adjustment for confounding by secular trend and unknown confounders.

Regarding statistical methods, we focused only on key, testable assumptions (e.g. proportional hazards assumption for Cox proportional hazards model) for the criterion regarding model assumptions, and we considered five or more comparisons based on the same model/analysis pertaining to the PM2.5-mortality association of interest to be subject to the multiple comparison issue and thus needing correction (e.g. Bonferroni correction).
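As a rough illustration of the multiple comparison criterion, the sketch below flags an analysis as needing correction at five or more comparisons (the threshold stated above) and computes a Bonferroni-adjusted significance level. The function names and the conventional family-wise alpha of 0.05 are assumptions for illustration; the text does not state which alpha the reviewed studies used.

```python
def needs_correction(n_comparisons, threshold=5):
    """Apply the review's rule: five or more comparisons call for correction."""
    return n_comparisons >= threshold


def bonferroni_alpha(n_comparisons, alpha=0.05):
    """Per-comparison significance level under a Bonferroni correction.

    alpha=0.05 is an assumed family-wise level, not taken from the text.
    """
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


if __name__ == "__main__":
    for n in (4, 5, 60):
        print(n, needs_correction(n), bonferroni_alpha(n))
```

Bonferroni is only one possible correction and is conservative when comparisons are correlated, as they often are when the same model is refit with different exposure metrics.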

We tabulated whether each of the included studies met each of the criteria listed in Table 1. This tabulation allowed for a consistent evaluation of study quality across all studies, by considering whether certain studies met more of the criteria for higher quality than other studies. We used the study quality criteria to identify the strengths and limitations of the studies, and used these to evaluate the study results, as discussed below.

Evidence integration

We assessed the results of the studies in the context of their methodological strengths and limitations (as determined from the analysis of study quality aspects and criteria) and evaluated the reliability of each study’s results to inform potential causality. We then integrated the evidence across studies using Bradford Hill aspects (Hill Citation1965) modified from those listed in the Preamble to the ISAs (US EPA Citation2015) to be more succinct, as described by Goodman et al. (Citation2020) (Supplemental Table S1). We did not use the Bradford Hill aspects as a checklist, as not meeting one or more of the aspects should not automatically preclude a conclusion of causality; rather, the aspects were used to provide a framework to systematically evaluate the weight of the evidence for making causal determinations. It is difficult to imagine a situation in which an association is not causal if every one of these aspects is met, however. Thus, if all of the Bradford Hill aspects were met, we concluded that the evidence as a whole supports causation. By contrast, it may be difficult to conclude that observed associations are causal if most or all of the aspects are not met. Thus, if not all of the Bradford Hill aspects were met, we determined whether it is more likely that the evidence as a whole supports causation (i.e. we provided likely explanations for any aspect that was not met), is suggestive of causation, is inadequate to determine causation, or supports no causation, as described below.

Causal conclusion

To form a conclusion regarding causality, we used a four-tiered framework for causality that is consistent with other causal frameworks, such as that defined in the Institute of Medicine (IOM) report Improving the Presumptive Disability Decision-Making Process for Veterans (IOM Citation2008) (Supplemental Table S2). This differs from the current NAAQS framework, which uses five categories for causation (causal, likely causal, suggestive, inadequate, and not likely causal). As discussed by Goodman et al. (Citation2020), US EPA’s definitions of these categories preclude the need for a likely causal category, which can instead be represented by the suggestive category in a four-tiered framework.

Consistent with the four-tiered framework, if all the modified Bradford Hill aspects were met, we concluded that the relationship between long-term exposure to PM2.5 and mortality is causal. If most of the aspects were met and there is a likely explanation for each that was not met, we also concluded that the relationship is causal. If there was inadequate information to assess some of the modified Bradford Hill aspects and all other aspects were met, we concluded that the evidence for a causal relationship is suggestive. If there was inadequate information to assess some of the Bradford Hill aspects and there was a likely explanation for each of the other aspects that was not met, we also concluded that the evidence for a causal relationship is suggestive. If there was inadequate information to assess most or all of the modified Bradford Hill aspects, we concluded that the evidence for a causal relationship is inadequate. If most or all of the aspects were not met and there is no likely explanation for why they were not met, we also concluded that the evidence for a causal relationship is inadequate. If the overall evidence indicated there is no causal relationship based on the modified Bradford Hill aspects (e.g. there was a consistent lack of an association in robust epidemiology studies), we concluded that the relationship between long-term exposure to PM2.5 and mortality is not causal.
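The decision rules in this paragraph can be sketched as a small function. This is an illustrative reading, not the authors' procedure: the status labels, the function name, and the interpretation of "most" as more than half are all assumptions, and the "not causal" tier is passed in as a separate judgment because it rests on the overall evidence rather than on counts of aspects. Combinations that the text does not explicitly cover are returned as "undetermined" rather than guessed.

```python
from collections import Counter

# Hypothetical status labels for each modified Bradford Hill aspect.
MET = "met"
EXPLAINED = "not_met_explained"      # not met, but a likely explanation exists
UNEXPLAINED = "not_met_unexplained"  # not met, no likely explanation
UNKNOWN = "inadequate_info"          # inadequate information to assess


def causal_conclusion(statuses, evidence_indicates_no_effect=False):
    """Map per-aspect statuses to one of the four tiers described above."""
    if not statuses:
        raise ValueError("at least one aspect is required")
    if evidence_indicates_no_effect:
        return "not causal"

    n = len(statuses)
    counts = Counter(statuses)
    met, unexplained, unknown = counts[MET], counts[UNEXPLAINED], counts[UNKNOWN]

    # "Most" is read as more than half (assumption).
    if unknown > n / 2 or unexplained > n / 2:
        return "inadequate"
    if unknown == 0 and unexplained == 0 and met > n / 2:
        return "causal"  # all met, or most met with the rest explained
    if unknown > 0 and unexplained == 0:
        return "suggestive"
    return "undetermined"  # combination not explicitly covered by the text


if __name__ == "__main__":
    # Eight aspects is an arbitrary count chosen for illustration only.
    print(causal_conclusion([MET] * 6 + [UNKNOWN] * 2))
```

Writing the rules out this way also makes the gaps visible: for example, the text does not say what to conclude when only a minority of aspects are unmet without explanation.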

Results

Literature selection

Our literature search for epidemiology studies evaluating the association between long-term PM2.5 exposure and all-cause or non-accidental mortality yielded 360 studies in PubMed and 115 studies in Scopus. We also reviewed the reference lists of three relevant reviews identified in our PubMed and Scopus searches, which contained 321 studies, and the section of the PM ISA that evaluated 34 North American studies of long-term exposure to PM2.5 and mortality. After a review of titles and abstracts, we identified 127 studies from the PubMed search, 6 studies from the Scopus search, 3 studies from cross-referencing the PM ISA, and 1 study from the reference lists of reviews for full text review. After a full text review, we identified 46 studies that met our inclusion and exclusion criteria. Two of these studies only assessed PM2.5 exposure for 30 days; as these studies were outliers compared to the majority of studies that assessed PM2.5 exposure for multiple years, we excluded these two studies. We further narrowed down the study selection by excluding ecological studies and the least recent or least informative studies of cohorts examined in multiple studies (as discussed above in the Methods section). The results for this study selection and detailed exclusion rationales are shown in Table 2.

Table 2. Final study selection rationales.

The results of our literature search and study selection are summarized in Supplemental Figure S1. Overall, 23 studies representing 20 underlying cohorts were included in the present review. One study was selected for each cohort, except for the Canadian Census Health and Environment Cohort (CanCHEC) 1991, Cancer Prevention Study (CPS) II, and California Teachers Study (CTS) cohorts, where two studies were selected for each cohort to represent distinct PM2.5 measurement approaches (i.e. ‘monitored’ vs. ‘modeled’), each of which has its own strengths and limitations and neither of which is necessarily considered better than the other. In addition, the studies by Hart et al. (Citation2015) and DuPre et al. (Citation2019) both evaluated the Nurses’ Health Study (NHS) cohort, but the latter study was restricted to female nurses with breast cancer and additionally included the NHS II cohort, so the overlap in the study population was not substantial and we included both studies in this review.

The characteristics of the 23 included studies are summarized in Table 3. All of these studies used a cohort study design, with follow-up periods generally from the 1980s to the 2000s and follow-up time ranging from 5 to 35 years. Seventeen studies were conducted in US populations whereas the other six studies were conducted in Canadian populations. Within either country, most of the studies were conducted across multiple cities. In general, the studies analyzed individuals from three types of source population: (1) the general population, including both males and females; (2) individuals in specific professions (e.g. veterans, trucking industry workers, teachers, farmers, and health professionals), mostly limited to only males or only females; and (3) patients with underlying health conditions (e.g. hepatocellular cancer, myocardial infarction [MI]), including both males and females. Participants were mostly middle-aged or older, and only a few studies also included younger individuals. The sample size of the studies varied substantially, from as low as a few thousand (e.g. Malik et al. Citation2019) to as high as tens of millions (e.g. Di et al. Citation2017).

Table 3. Characteristics of epidemiology studies of long-term PM2.5 exposure and all-cause or non-accidental mortality.

All but one study (Villeneuve et al. Citation2015) examined only one type of mortality outcome, with half of the studies examining all-cause mortality and the other half examining non-accidental mortality. While seven studies measured PM2.5 concentrations directly from monitoring sites, 18 studies estimated PM2.5 concentrations using modeling approaches. Only two studies (Hart et al. Citation2015; Di et al. Citation2017) examined both direct site measured and modeled PM2.5 in relation to mortality. As shown in Table 3, the PM2.5 modeling approaches varied among the studies that examined modeled PM2.5, with GEOS-Chem chemical transport model (CTM) and the Geographic Information System (GIS)-based smoothing model being the most commonly used techniques. The reported mean PM2.5 concentration also varied among the studies, ranging from 6.32 to 10.7 μg/m3 in Canadian studies and from 9.52 to 18.2 μg/m3 in US studies.

Study quality evaluation

The results of our study quality evaluation are presented in Table 4. If a study met a specific criterion, the column for that study shows a ‘+’ in the row for that criterion. If a study did not meet a specific criterion, the column for that study is blank in the row for that criterion. If a criterion is not applicable to a particular study (as discussed above in the Methods section), the column for that study shows ‘NA’ in the row for that criterion.
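The marking scheme described above (‘+’, blank, ‘NA’) lends itself to a simple tally of how many applicable criteria each study met. The sketch below is illustrative only: the function name, the criterion labels, and the example marks are hypothetical and are not taken from the review's quality table.

```python
def summarize_quality(marks):
    """Count met and applicable criteria for one study.

    marks maps a criterion name to '+', '' (blank, not met), or 'NA'.
    Returns (number met, number applicable); 'NA' rows are excluded
    from the denominator rather than counted as unmet.
    """
    allowed = {"+", "", "NA"}
    unexpected = set(marks.values()) - allowed
    if unexpected:
        raise ValueError(f"unexpected marks: {sorted(unexpected)}")
    applicable = [m for m in marks.values() if m != "NA"]
    met = sum(1 for m in applicable if m == "+")
    return met, len(applicable)


if __name__ == "__main__":
    # Hypothetical marks for a single hypothetical study.
    example = {
        "cohort design": "+",
        "power calculation": "",
        "participation rate": "NA",
        "temporal variability": "+",
    }
    met, applicable = summarize_quality(example)
    print(f"{met} of {applicable} applicable criteria met")
```

A raw count treats every criterion as equally important, which the review does not claim; the tally is a way to compare studies consistently, not a quality score.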

Table 4. Study quality evaluation of epidemiology studies of PM2.5 and mortality.

With regard to the study reporting aspects, all 23 studies clearly described the study objectives and the size of PM fraction, reported participant characteristics, and presented descriptive statistics. All of the studies clearly reported the inclusion/exclusion criteria that were also consistent with study objectives except the study by Deng et al. (Citation2017), which did not report inclusion/exclusion criteria. No study presented univariate analyses with PM2.5, covariates, and copollutants, although it was not uncommon for the studies to instead present analyses that adjusted for a minimum set of confounders.

While the specifics related to study conduct varied among the studies, they all share many common strengths and limitations. With regard to the study design category, all of the studies used a cohort study design with long study duration (i.e. multiple years). Most of the studies were conducted in multiple cities across multiple states/provinces, except for five studies where participants were from a single state/province (Ostro et al. Citation2010, Citation2015; Hartiala et al. Citation2016; Chen et al. Citation2016; Deng et al. Citation2017). None of the studies presented a power calculation to indicate sufficient sample size, however, so the three studies considered to have met the sample size criterion had sample sizes that were greater than 1 million (Crouse et al. Citation2015; Weichenthal et al. Citation2017; Di et al. Citation2017).

With regard to the study population category, all six studies that were conducted among patients with underlying health conditions (Hartiala et al. Citation2016; Chen et al. Citation2016; Deng et al. Citation2017; DuPre et al. Citation2019; Malik et al. Citation2019; Lipfert and Wyzga Citation2019) ascertained these conditions by independent clinical assessment or self-report of physician diagnosis. Because all 23 studies were secondary analyses of existing cohorts for which members were initially recruited for research questions unrelated to PM2.5 or mortality, and all the studies used linkage to conveniently identify mortality outcomes, we considered the criteria related to representativeness of source, participation rate, and follow-up as not applicable (as discussed above).

With regard to the exposure assessment category, most of the studies used well-established, sensitive methods and sufficiently captured the spatial variability of PM2.5, and all studies estimated participants’ PM2.5 exposures before the outcome. While half of the studies accounted for temporal variability of PM2.5, fewer accounted for residential mobility and only one study (Weichenthal et al. Citation2014) accounted for personal activities by performing a stratified analysis by estimated time spent outdoors. The majority of the studies also did not compare more than one exposure assessment method. Importantly, half of the studies did not assign measured or estimated ambient PM2.5 data to participants’ locations from the same time period. Specifically, eight studies assigned PM2.5 data from as long as 10+ years later to participants’ locations (Jerrett et al. Citation2009; Hart et al. Citation2011; Lepeule et al. Citation2012; Villeneuve et al. Citation2015; Crouse et al. Citation2015; Turner et al. Citation2016; Thurston et al. Citation2016; Lipfert and Wyzga Citation2019); three studies assigned PM2.5 data from as far as 5+ (but <10) years later to participants’ locations (Weichenthal et al. Citation2014, Citation2016; Pinault et al. Citation2016); and one study assigned PM2.5 data from as far as 5+ (but <10) years earlier to participants’ locations (DuPre et al. Citation2019).

With regard to the confounding category, none of the studies adjusted for all of the key potential confounders. Specifically, very few (n = 1–2) studies adjusted for relative humidity or other chemical exposures; and only a few studies adjusted for temperature (n = 4), medication use (n = 5), physical activity (n = 6), and diet (n = 8). A small number of studies also did not adjust for race, body mass index (BMI), or smoking status (n = 3–5). Nonetheless, the confounders that were included in most of the studies were adjusted for properly. Copollutants were not adjusted for in more than half of the studies. In the studies that accounted for copollutant exposures, most of these examined the correlations between PM2.5 and the copollutants; however, the measurements of copollutants in these studies were subject to errors, as they did not properly account for temporal variation, spatial variation, residential mobility, or personal activities.

With regard to the statistical methods category, all studies employed appropriate statistical models (i.e. Cox proportional hazards model) for multivariate analyses, but only four studies (Lepeule et al. Citation2012; Chen et al. Citation2016; DuPre et al. Citation2019; Malik et al. Citation2019) indicated that key model assumptions (i.e. proportional hazards assumptions) were tested and satisfied. All but five studies are subject to the multiple comparison issue (with the number of comparisons as high as approximately 60), but none of these studies performed any correction to address this issue. While the primary objectives of the studies are variable, all but one study (Malik et al. Citation2019) assessed the robustness of the PM2.5-mortality risk estimates and half of the studies assessed potential non-linearity of the PM2.5-mortality relationship.

With regard to outcome assessment, in all studies the assessments of outcome were at time points consistent with study objectives and were blinded to exposure levels. With regard to PM2.5 specification, only four studies (Ostro et al. Citation2010, Citation2015; Turner et al. Citation2016; Lefler et al. Citation2019) additionally evaluated PM2.5 source-related indicators.

Evaluation of study results

The linear and non-linear study results are summarized in Tables 5 and 6, respectively. Regarding linear results, we included the fully adjusted result of the PM2.5-mortality association reported for each study in Table 5. If the fully adjusted result was adjusted for copollutants, we further included the result without copollutant adjustment, if available, for comparison purposes. When statistically significant effect modification on the PM2.5-mortality association was reported, we also included stratum-specific results, if available. If a study reported results for multiple PM2.5 indicators (e.g. modeled and monitored, generated from different prediction models, within different buffers), mortality indicators (i.e. all-cause and non-accidental), or statistical analyses (e.g. weighted vs. non-weighted, time-dependent vs. time-independent), we included all such results for comparison purposes. We included all non-linear results reported in the studies in Table 6. Below, we present and discuss results by type of study population (i.e. general population, occupation-specific cohorts, and patients with underlying health conditions), as the results for one type of study population cannot necessarily be applied to another type of study population.

Table 5. Results from linear association analyses.

Table 6. Results of non-linear association analyses.

General population

Eleven of the reviewed studies were conducted in the general population (Jerrett et al. Citation2009; Lepeule et al. Citation2012; Villeneuve et al. Citation2015; Crouse et al. Citation2015; Weichenthal et al. Citation2016, Citation2017; Pinault et al. Citation2016; Thurston et al. Citation2016; Turner et al. Citation2016; Di et al. Citation2017; Lefler et al. Citation2019). All of these studies reported a risk estimate for the PM2.5-mortality association assuming linearity. Seven of the eleven studies (Lepeule et al. Citation2012; Villeneuve et al. Citation2015; Crouse et al. Citation2015; Pinault et al. Citation2016; Thurston et al. Citation2016; Weichenthal et al. Citation2017; Di et al. Citation2017) also evaluated potential non-linearity of the association.

Linear Results. All studies in the general population without copollutant adjustment reported a statistically significant, positive association between PM2.5 exposure and mortality (either all-cause or non-accidental), with the exception of the study by Thurston et al. (Citation2016), which reported a statistically non-significant, positive association between PM2.5 and non-accidental mortality (hazard ratio [HR] = 1.03, 95% CI: 1.00–1.05 in a time-independent analysis; HR = 1.03, 95% CI: 0.99–1.05 in a time-dependent analysis). The magnitude of the HR estimates in these studies ranged from 1.026 (95% CI: 1.012–1.039) in the study by Weichenthal et al. (Citation2016) to 1.26 (95% CI: 1.19–1.34) in the study by Pinault et al. (Citation2016), although the corresponding exposure metric, exposure contrast, and adjustment of other confounders (i.e. other than copollutants) varied. The HR estimates in 8 of the 11 studies fell under 1.10, indicating weak associations. The width of 95% CIs in the largest study (Di et al. Citation2017; n = 60,925,443; HR = 1.084, 95% CI: 1.081–1.086) is substantially narrower than that in the smallest study (Lepeule et al. Citation2012; n = 8,096; HR = 1.14, 95% CI: 1.07–1.22). Although a larger sample size increases the statistical power of a study to detect an effect, when the sample size is too large (such as in the millions in the studies by Di et al. Citation2017; Crouse et al. Citation2015; Weichenthal et al. Citation2017), statistically significant findings could be artifacts due to inflated statistical power and extremely narrow confidence intervals rather than reflecting a true underlying association, so the results from such studies should be interpreted with caution. Results did not appear to differ substantially between studies of all-cause vs. non-accidental mortality, modeled vs. monitored PM2.5, or US vs. Canadian populations.
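The contrast in interval width between the largest and smallest studies can be made concrete by back-calculating the standard error of the log hazard ratio from each reported 95% CI. This is a minimal sketch under a stated assumption: that the intervals are Wald-type and symmetric on the log scale, which the text does not confirm. The function name is hypothetical; the CI limits are those quoted above for Di et al. and Lepeule et al.

```python
import math
from statistics import NormalDist


def se_log_hr(lower, upper, level=0.95):
    """Approximate SE of log(HR) from a confidence interval.

    Assumes a Wald-type interval that is symmetric on the log scale.
    """
    if not 0 < lower < upper:
        raise ValueError("require 0 < lower < upper")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    return (math.log(upper) - math.log(lower)) / (2 * z)


# CI limits as quoted in the paragraph above.
se_largest = se_log_hr(1.081, 1.086)   # Di et al.
se_smallest = se_log_hr(1.07, 1.22)    # Lepeule et al.

if __name__ == "__main__":
    print(f"largest study:  SE(log HR) = {se_largest:.5f}")
    print(f"smallest study: SE(log HR) = {se_smallest:.5f}")
```

Because the reported limits are rounded, especially for the largest study, the back-calculated values are approximate. They describe precision only and say nothing about bias from confounding or exposure error.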

Statistically significant effect modification by sex was identified by Pinault et al. (Citation2016), where males (HR = 1.344, 95% CI: 1.239–1.457, per 10 μg/m3 increment of PM2.5) had a higher risk of mortality (non-accidental) than females (HR = 1.181, 95% CI: 1.088–1.282, per 10 μg/m3 increment of PM2.5). The latter risk estimate is slightly higher than what was reported in the female-only study by Villeneuve et al. (Citation2015) (HR = 1.10, 95% CI: 1.03–1.17 for all-cause mortality; HR = 1.12, 95% CI: 1.04–1.19 for non-accidental mortality), which may be attributable to differences in study design.
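One common way to compare two subgroup estimates such as these is a z-test on the difference of the log hazard ratios. The sketch below applies that generic test to the sex-specific estimates quoted above. It is not necessarily the method Pinault et al. used, and it assumes independent subgroups and Wald-type CIs on the log scale; the helper names are hypothetical.

```python
import math

def log_hr_and_se(hr, lower, upper, z=1.96):
    """Return (log HR, SE of log HR), with the SE backed out of the 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return math.log(hr), se

def interaction_z(est_a, est_b):
    """z-test for the difference between two independent log-HR estimates."""
    (b_a, se_a), (b_b, se_b) = est_a, est_b
    z = (b_a - b_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    # Two-sided p-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Sex-specific estimates quoted in the text (per 10 ug/m3 of PM2.5)
males = log_hr_and_se(1.344, 1.239, 1.457)
females = log_hr_and_se(1.181, 1.088, 1.282)

z_stat, p_value = interaction_z(males, females)
print(f"z = {z_stat:.2f}, two-sided p = {p_value:.3f}")
```

Because the test works on the log scale, it compares the ratio of the two hazard ratios rather than their absolute difference.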

Eight of the studies in the general population estimated the PM2.5-mortality association with further copollutant adjustment. Specifically, four studies further adjusted for ozone (O3) alone (Jerrett et al. Citation2009; Thurston et al. Citation2016; Turner et al. Citation2016; Di et al. Citation2017); one study further adjusted for glutathione-related oxidative potential (OPGSH) alone (Weichenthal et al. Citation2016); two studies further adjusted for both O3 and nitrogen dioxide (NO2) (Crouse et al. Citation2015; Weichenthal et al. Citation2017); and one study further adjusted for PM2.5–10, O3, NO2, sulfur dioxide (SO2), and carbon monoxide (CO) (Lefler et al. Citation2019). Compared to the risk estimate without copollutant adjustment within the same study, the risk estimate with further adjustment for copollutants was slightly attenuated (i.e. closer to the null) in five of the eight studies (Crouse et al. Citation2015; Thurston et al. Citation2016; Weichenthal et al. Citation2016, Citation2017; Di et al. Citation2017). This attenuation is expected, as copollutant concentrations tend to be positively associated with PM2.5 and mortality (WHO Citation2006; US EPA Citation2019). By contrast, the risk estimate with further adjustment for copollutants remained the same in one study (Turner et al. Citation2016) and was slightly exaggerated (i.e. further away from the null) in two studies (Jerrett et al. Citation2009; Lefler et al. Citation2019). This variation in results could be due to differences in which copollutants were adjusted for or to errors in copollutant measurements that arise from sources similar to those of PM2.5 measurement errors. However, it is worth noting that the copollutant adjustments in these studies are likely ineffective, as in none of the eight studies were copollutants measured at both the same temporal and spatial scales as PM2.5 to fully and accurately capture how the different pollutants were correlated with each other.

With the adjustment of O3, Di et al. (Citation2017) identified statistically significant effect modification by sex. Similar to the study by Pinault et al. (Citation2016), which did not adjust for copollutants, Di et al. (Citation2017) reported that males (HR = 1.087, 95% CI: 1.083–1.090, per 10 μg/m3 increment of PM2.5) were at higher risk of mortality (all-cause) than females (HR = 1.060, 95% CI: 1.057–1.063, per 10 μg/m3 increment of PM2.5).

The seemingly consistent linear results in the studies should be interpreted with caution, considering the large variations across studies in terms of participants’ characteristics (e.g. location, age, sex, race), exposure assessment (e.g. measurement, metric, contrast), outcome type (all-cause vs. non-accidental), and confounder adjustments. In fact, heterogeneity underlying the consistent linear results in recent studies of long-term PM2.5 and mortality has been reported by Di et al. (Citation2017). Specifically, these authors compiled the results of 22 studies (including studies published prior to 2009) that reported HR estimates ranging from 1.01 to 1.26, which are very similar to the HR estimates from the studies reviewed here. Di et al. (Citation2017) performed a meta-analysis of these studies using a random-effects model and reported a meta-HR of 1.11 (95% CI: 1.08–1.15). A heterogeneity test, however, indicated a high degree of heterogeneity (I-squared = 95.9%, tau-squared = 0.0035, p < 0.0001) among the study results. While it is possible that the large variations in study design aspects across studies have only small impacts on the magnitude of risk estimates, one cannot rule out that the impact of this variation could also be large but masked by other factors that are consistently and potentially substantially influencing the studies and their risk estimates, as discussed below in the evaluation of study quality.
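To show how a pooled estimate and the heterogeneity statistics mentioned above are typically computed, the following minimal sketch implements a DerSimonian-Laird random-effects meta-analysis. The text does not state which estimator Di et al. (2017) used, so the choice of DerSimonian-Laird is an assumption. The inputs are five general-population estimates quoted earlier in this section, used purely for illustration: they are not the 22 studies pooled by Di et al., they are not on a common exposure contrast, and the output should not be read as a result of this review. The function name and return structure are hypothetical.

```python
import math

def dersimonian_laird(estimates, z=1.96):
    """Random-effects pooling of hazard ratios (DerSimonian-Laird).

    estimates: list of (hr, ci_lower, ci_upper) tuples.
    Assumes Wald-type 95% CIs on the log scale (illustrative assumption).
    """
    y = [math.log(hr) for hr, _, _ in estimates]
    v = [((math.log(u) - math.log(l)) / (2 * z)) ** 2 for _, l, u in estimates]

    # Fixed-effect (inverse-variance) weights and pooled mean
    w = [1 / vi for vi in v]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

    # Cochran's Q, between-study variance tau^2, and I^2
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1 / (vi + tau2) for vi in v]
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))

    return {
        "hr": math.exp(y_re),
        "ci": (math.exp(y_re - z * se_re), math.exp(y_re + z * se_re)),
        "tau2": tau2,
        "i2": i2,
    }

# Illustrative inputs only: estimates quoted in this section
quoted = [
    (1.026, 1.012, 1.039),  # Weichenthal et al. 2016
    (1.03, 1.00, 1.05),     # Thurston et al. 2016 (time-independent)
    (1.084, 1.081, 1.086),  # Di et al. 2017
    (1.14, 1.07, 1.22),     # Lepeule et al. 2012
    (1.26, 1.19, 1.34),     # Pinault et al. 2016
]

result = dersimonian_laird(quoted)
print(result)
```

I-squared expresses the share of total variation across estimates that exceeds what sampling error alone would produce, so a value near 100% means the study results disagree far more than their confidence intervals would suggest.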

Non-linear Results. In the evaluation of potential non-linearity of the PM2.5-mortality association, six of the seven studies (Lepeule et al. Citation2012; Villeneuve et al. Citation2015; Crouse et al. Citation2015; Pinault et al. Citation2016; Thurston et al. Citation2016; Di et al. Citation2017) used spline techniques, although with varied types of spline, degrees of freedom, and confounding adjustments. Unlike the linear results summarized above, the observed shapes of the PM2.5-mortality curves are inconsistent across the studies. Two studies reported a linear shape for the PM2.5-mortality (all-cause) curve with no apparent threshold (Lepeule et al. Citation2012; Di et al. Citation2017). Three studies reported a supralinear shape for the PM2.5-mortality (non-accidental) curve (Crouse et al. Citation2015; Pinault et al. Citation2016; Weichenthal et al. Citation2017), among which Pinault et al. (Citation2016) further estimated a threshold PM2.5 concentration of 0 μg/m3 (+95% CI = 4.5 μg/m3). Villeneuve et al. (Citation2015) reported the PM2.5-mortality (non-accidental) curve to be V-shaped, with an estimated threshold at 11 μg/m3 (p = 0.004), and Thurston et al. (Citation2016) reported the shape of the PM2.5-mortality (non-accidental) curve to be monotonically increasing.

While all the studies in the general population estimated linear associations between PM2.5 and mortality, the observed non-linear curves in the studies above indicate that linearity may not be a valid modeling assumption. The contrast between highly consistent linear results and highly inconsistent non-linear results in these studies also indicates that the linearity assumption, although straightforward, may have masked important heterogeneity and details of the underlying PM2.5-mortality relationships, especially considering the variations in PM2.5 assessment approach (e.g. prediction model, exposure metric, exposure contrast, and exposure window or lag time), PM2.5 concentration distribution, and confounding adjustment across the studies. It is also possible that the different non-linear modeling techniques used in the studies could contribute to the variations in the observed shapes of the PM2.5-mortality association across studies.

Study Quality. The studies conducted in the general population share certain strengths and limitations. All 11 studies were conducted in multiple cities, so the study results have higher generalizability across North American populations. Nine of the eleven studies had a sample size of 100,000 or greater, indicating these studies have greater statistical power to detect an underlying PM2.5-mortality association, if it exists. Specifically, the studies in the general population included three of the largest studies in this review, with sample sizes in the millions (Crouse et al. Citation2015; Weichenthal et al. Citation2017; Di et al. Citation2017). As discussed above, however, the extremely large sample sizes of these three studies can inflate statistical power such that the weak but statistically significant findings reported in these studies may be artifacts rather than a representation of a true underlying association.

In general, all 11 studies assessed each participant’s exposure to PM2.5 by assigning to his/her location (primarily residential location) an ambient PM2.5 concentration that was either from direct measurements at one or a few nearby stationary monitoring sites or estimates from prediction models. This approach for exposure assessment does not account for individual factors, such as time spent indoors or at non-residential locations and personal activities, that vary among participants and can greatly affect their actual PM2.5 exposures. Further, while 10 of the 11 studies meet our quality criterion for spatial variability and 7 of the 11 studies meet the criterion for temporal variability, only three studies meet the criterion for assignment to participants’ locations, three studies meet the criterion for residential mobility, and none of the 11 studies meet the criterion for personal activities. These indicate that the results of all studies are subject to substantial exposure measurement error, though the associated overestimation or underestimation of PM2.5 exposure and the direction of bias to the study results are difficult to anticipate.

It is important to note that for the eight studies that did not assign measured or estimated ambient PM2.5 data to participants’ locations in the same time period (Jerrett et al. Citation2009; Lepeule et al. Citation2012; Villeneuve et al. Citation2015; Crouse et al. Citation2015; Weichenthal et al. Citation2016; Turner et al. Citation2016; Pinault et al. Citation2016; Thurston et al. Citation2016), the reported distribution of PM2.5 concentrations was likely not representative of the distribution of participants’ actual PM2.5 exposure. Considering that ambient PM2.5 concentrations are generally decreasing over time due to the implementation of more stringent regulations, and that all eight studies that do not meet the ‘assignment to participants’ locations’ criterion assigned PM2.5 data from as long as 5+ to 10+ years later to participants’ locations, these studies likely have underestimated the participants’ actual PM2.5 exposure concentration and overestimated the mortality rate associated with lower PM2.5 exposures. It is only from the three studies that meet this criterion (Weichenthal et al. Citation2017; Di et al. Citation2017; Lefler et al. Citation2019) that an inference can confidently be made regarding the PM2.5 concentration under which an association was observed with mortality (mean PM2.5 concentration was 7.37 μg/m3 in the study by Weichenthal et al. Citation2017; 10.67 μg/m3 in the study by Lefler et al. Citation2019; and 11 μg/m3 in the study by Di et al. Citation2017). Still, in making such an inference, the other potential sources of exposure measurement error mentioned above, as well as other sources of bias and confounding, also need to be taken into consideration.

While 8 of the 11 studies adjusted for copollutants, none adjusted for physical activity or medication use, and few studies adjusted for diet, humidity, temperature, or other chemical exposures as potential confounders or primary covariates. Thus, the results of these studies, even those that are the largest and less subject to exposure measurement error (i.e. by meeting our criteria for all aspects of the exposure assessment category except for personal activities and multiple methods) (Weichenthal et al. Citation2017; Di et al. Citation2017), are still subject to residual confounding by these and other unmeasured and unknown factors. Moreover, 7 of the 11 studies examined nonlinearity, although their findings are inconsistent, as discussed above.

Occupation-specific cohorts

Six of the reviewed studies were conducted in occupation-specific cohorts without known underlying health conditions (Ostro et al. Citation2010, Citation2015; Puett et al. Citation2011; Hart et al. Citation2011, Citation2015; Weichenthal et al. Citation2014). By contrast, two studies of occupation-specific cohorts that focus only on individuals with health conditions are included below in the evaluation of studies of patients with underlying health conditions (DuPre et al. Citation2019; Lipfert and Wyzga Citation2019). Similar to the studies conducted in the general population, the six studies conducted in occupation-specific cohorts all reported a risk estimate (HR) for the PM2.5-mortality association assuming linearity. Two of the six studies (Weichenthal et al. Citation2014; Hart et al. Citation2015) also evaluated potential non-linearity of the association.

Linear Results. Among the six studies in occupation-specific cohorts, three studies included females only (teachers in the studies by Ostro et al. Citation2010, Citation2015; nurses in the study by Hart et al. Citation2015), two studies included males only (health professionals in the study by Puett et al. Citation2011; trucking industry workers in the study by Hart et al. Citation2011), and only one study included both males and females (commercial pesticide applicators, farmers, and their families in the study by Weichenthal et al. Citation2014). All studies reported results without copollutant adjustment, and only one study (Puett et al. Citation2011) further reported copollutant-adjusted results.

The three studies among females do not report consistent results. Although both Ostro et al. (Citation2010) and Ostro et al. (Citation2015) examined PM2.5-mortality (non-accidental) associations among participants of the CTS, the former study reported statistically significantly positive associations (within 8 km buffer, HR = 1.49, 95% CI: 1.28–1.74; within 30 km buffer, HR = 1.45, 95% CI: 1.36–1.55) whereas the latter study reported no association (HR = 1.01, 95% CI: 0.98–1.05). A key difference between the two studies is that Ostro et al. (Citation2010) examined direct site measured PM2.5 and restricted the analyses to subjects whose residences were within 8 km and 30 km of a monitor, respectively, whereas Ostro et al. (Citation2015) examined modeled PM2.5 and included CTS participants regardless of their distance to monitors. As a result, the participants in the study by Ostro et al. (Citation2010) (n = 7,888 within 8 km buffer; n = 44,847 within 30 km buffer) are largely a non-representative subsample of the participants in the study by Ostro et al. (Citation2015) (n = 101,884) and the results of the two studies are not directly comparable. Other differences between the two studies that could have partly contributed to the difference in observed results may be related to the follow-up period, as well as the exposure metric, temporal scale, and contrast. The study among female nurses by Hart et al. (Citation2015) reported a positive PM2.5-mortality (non-accidental) association (HR = 1.13, 95% CI: 1.05–1.22 for modeled PM2.5; HR = 1.12, 95% CI: 1.05–1.21 for monitored PM2.5) that is of similar magnitude to the female-specific results reported by Pinault et al. (Citation2016) and Villeneuve et al. (Citation2015) in studies conducted in the general population.

The results of the two male-only studies conducted in occupation-specific cohorts, without copollutant adjustment, are weaker in magnitude than the male-specific result in the general population reported by Pinault et al. (Citation2016). Specifically, Puett et al. (Citation2011) reported no PM2.5-mortality (non-accidental) association (HR = 0.94, 95% CI: 0.87–1.00) and Hart et al. (Citation2011) reported a very weak, positive PM2.5-mortality (all-cause) association (HR = 1.04, 95% CI: 1.01–1.07), whereas Pinault et al. (Citation2016) reported an HR of 1.344 (95% CI: 1.239–1.457). While the healthy worker effect is often a possible explanation for weaker associations observed in occupation-specific cohorts compared to the general population, such speculation should be made with caution in this case because Pinault et al. (Citation2016) reported an association that is much stronger than all the other studies conducted in the general population and, therefore, could be an outlier. With copollutant adjustment, Puett et al. (Citation2011) still reported no PM2.5-mortality (non-accidental) association (HR = 0.94, 95% CI: 0.87–1.02), as opposed to the male-specific result of a weak positive association in the general population with copollutant adjustment reported by Di et al. (Citation2017) (HR = 1.087, 95% CI: 1.083–1.090).

Weichenthal et al. (Citation2014) reported no PM2.5-mortality (non-accidental) association, either overall or in sex-specific subgroups, although the exact P-value was not reported for the test of effect modification by sex. These null findings are consistent with the null results reported by Puett et al. (Citation2011) and Ostro et al. (Citation2015), although the studies vary by occupation of participants and many other aspects of study design.

Non-linear Results. Hart et al. (Citation2015) used stepwise restricted cubic spline techniques (degrees of freedom not reported) to evaluate potential non-linearity of the PM2.5-mortality (non-accidental) association and reported an approximately linear shape of the curve for both direct site measured PM2.5 and modeled PM2.5, similar to the non-linear results reported in the studies by Di et al. (Citation2017) and Lepeule et al. (Citation2012) that were conducted in the general population. A potential threshold for the PM2.5-mortality curve was not examined by Hart et al. (Citation2015).

In the study by Weichenthal et al. (Citation2014), the authors stated that ‘concentration–response functions were graphed using natural splines for PM2.5 with two degrees of freedom using adjusted Cox survival models.’ However, non-linear results were only reported for cardiovascular-specific mortality, the other health outcome of interest in the study, and not for non-accidental mortality.

Study Quality. The studies conducted in occupation-specific cohorts share certain strengths and limitations. In general, these studies have smaller sample sizes than the studies conducted in the general population. The two largest studies have sample sizes just above 100,000, which we considered insufficient without justification from power calculation in our study quality evaluation. Because of the particular characteristics of workers and the limited geographic locations within which some of the studies were conducted (e.g. Ostro et al. Citation2010, Citation2015; Puett et al. Citation2011; Weichenthal et al. Citation2014), the results of these studies have limited generalizability.

Similar to the studies conducted in the general population, the six studies conducted in occupation-specific cohorts all assessed each participant’s exposure to PM2.5 by assigning to his/her location (primarily residential location) an ambient PM2.5 concentration that was either from direct site measurements at one or a few nearby stationary monitoring sites or estimates from prediction models; this methodology is subject to substantial exposure measurement error. Yet, most of the occupation-specific studies meet our criteria for assignment to participants’ locations and residential mobility and are therefore less subject to exposure measurement error associated with these aspects, which is a clear strength compared to the studies conducted in the general population.

The occupation-specific studies also, in general, adjusted for a larger number of key confounders, particularly individual-level behavioral factors (including diet, physical activity, and medication use), than the studies conducted in the general population. The results of the occupation-specific studies are still subject to residual confounding by other key confounders (particularly temperature, relative humidity, and other chemical exposures), as well as unmeasured and unknown confounders, however. Five of the six studies conducted in occupation-specific cohorts, including two studies that are less subject to exposure measurement error (i.e. by meeting our criteria for all aspects of the exposure assessment category except for personal activities and multiple methods) (Hart et al. Citation2015; Ostro et al. Citation2015), did not adjust for copollutants, indicating the results of these studies likely do not reflect the independent association of PM2.5 with mortality. This is a clear limitation compared to the studies conducted in the general population. In the only study that did adjust for copollutants (Puett et al. Citation2011), the correlation between PM2.5 and copollutants was not examined (which undermines the effectiveness of copollutant adjustment) and thus the study does not meet the quality criterion for copollutant measurement.

As mentioned above, nonlinearity was not examined in most of the studies conducted in occupation-specific cohorts, which is a clear limitation compared to the studies conducted in the general population. In addition, because non-linear results were not reported for non-accidental mortality by Weichenthal et al. (Citation2014), we did not consider this study as meeting the nonlinearity criterion in the study quality evaluation, although it is possible that the authors examined the PM2.5-mortality (non-accidental) curve but did not report the results.

Patients with underlying health conditions

Six of the reviewed studies were conducted in patients with underlying health conditions (Hartiala et al. Citation2016; Chen et al. Citation2016; Deng et al. Citation2017; Malik et al. Citation2019; DuPre et al. Citation2019; Lipfert and Wyzga Citation2019). As noted above, these include two studies where patients were also from occupation-specific cohorts (DuPre et al. Citation2019; Lipfert and Wyzga Citation2019). Similar to the studies conducted in the general population and in occupation-specific cohorts, the studies conducted in patients with underlying health conditions all reported a risk estimate (HR) for the PM2.5-mortality association assuming linearity. Three of the six studies (Chen et al. Citation2016; Deng et al. Citation2017; Malik et al. Citation2019) also evaluated potential non-linearity of the association.

Linear Results. Of the six studies in patients with underlying health conditions, four included patients with cardiovascular disease (CVD) or CVD risk factors (e.g. myocardial infarction [MI] in the studies by Malik et al. Citation2019; Chen et al. Citation2016; undergoing elective diagnostic coronary angiography in the study by Hartiala et al. Citation2016; male ostensibly hypertensive veterans in the study by Lipfert and Wyzga Citation2019) and two studies included cancer patients (e.g. female nurses with breast cancer in the study by DuPre et al. Citation2019; hepatocellular cancer in the study by Deng et al. Citation2017). One of the six studies (Malik et al. Citation2019) only reported copollutant-adjusted results, whereas the other five studies only reported results without copollutant adjustment.

Both studies conducted among MI patients reported statistically significant positive associations between PM2.5 and mortality (HR = 1.13, 95% CI: 1.07–1.20 in the study by Malik et al. Citation2019; HR = 1.22, 95% CI: 1.03–1.45 in the study by Chen et al. Citation2016), which are stronger than most of the associations reported in the general population. It is possible that MI patients are more susceptible to the impact of PM2.5 exposure, but this contrast in magnitude of association could also be at least partly attributable to differences in PM2.5 assessment, adjustments of confounders and copollutants, and other study design aspects. Chance findings also cannot be ruled out for the observed stronger association among MI patients, particularly because of the very small number of studies of these patients.

By contrast, the two studies conducted among patients with CVD risk factors reported mixed results, with weaker positive, null, or negative associations. Specifically, Hartiala et al. (Citation2016) reported no association between PM2.5 and mortality (all-cause) in patients undergoing elective diagnostic coronary angiography (HR = 1.16, 95% CI: 0.96–1.41). Lipfert and Wyzga (Citation2019) examined the PM2.5-mortality (all-cause) association among male ostensibly hypertensive veterans and reported a very weak, positive association among whites (HR = 1.051, 95% CI: 1.005–1.100) and a statistically significant inverse association among blacks (HR = 0.817, 95% CI: 0.750–0.891). It is possible that patients with CVD risk factors, similar to the general population, are less susceptible to the impact of PM2.5 exposures compared to MI patients, but, given the large variations in study design aspects and the very small number of studies available, it is impossible to rule out other possible explanations, such as confounding, bias, or chance.

DuPre et al. (Citation2019) reported no PM2.5-mortality (all-cause) association in female nurses with breast cancer (HR = 1.12, 95% CI: 0.96–1.30); whereas Deng et al. (Citation2017) reported a positive PM2.5-mortality (all-cause) association in patients with hepatocellular cancer (HR = 1.18, 95% CI: 1.16–1.20). The magnitude of this association is similar to those reported in MI patients and greater than most of the associations reported in the general population. Although it is possible that hepatocellular cancer patients are also more susceptible to the impact of PM2.5 exposure compared to the general population, it cannot be ruled out that the observed contrast is attributable to confounding, bias, or chance, given the large variations of study design aspects and the very small number of studies available.

Non-linear Results. All three studies that evaluated potential non-linearity of the PM2.5-mortality association used cubic spline techniques, although the degrees of freedom and confounding adjustments varied. Both of the studies conducted among MI patients (Chen et al. Citation2016; Malik et al. Citation2019) reported a linear shape for the PM2.5-mortality curve, similar to the studies by Di et al. (Citation2017) and Lepeule et al. (Citation2012) that were conducted in the general population, and to the study by Hart et al. (Citation2015) that was conducted in an occupation-specific cohort. In the study by Deng et al. (Citation2017) that was conducted among patients with hepatocellular cancer, a J-shaped PM2.5-mortality (all-cause) curve was reported. Potential thresholds for the PM2.5-mortality curve were not examined in the studies among patients with underlying health conditions.

Study Quality. The six studies conducted among patients with underlying health conditions share certain strengths and limitations. In general, these studies have smaller sample sizes than the studies conducted in the general population and in occupation-specific cohorts, with four studies having sample sizes below 10,000, where statistical power is very limited considering the large number of potential confounders adjusted for. All underlying health conditions were ascertained by independent clinical assessment or self-report of physician diagnosis and as such, all six studies meet our study quality criterion for underlying health conditions. Because of the particular characteristics of patients and the limited geographic location within which some of the studies were conducted (e.g. Hartiala et al. Citation2016; Chen et al. Citation2016; Deng et al. Citation2017), however, the results of these studies have limited generalizability across populations. Three of the six studies tested model assumptions in their statistical analyses to ensure that they were satisfied, which is a strength compared to the studies conducted in the general population and occupation-specific cohorts where almost none of the studies did such testing.

Similar to the studies conducted in the general population and in occupation-specific cohorts, the six studies conducted in patients with underlying health conditions all assessed each participant’s exposure to PM2.5 by assigning to his/her location (primarily residential location) an ambient PM2.5 concentration that was either from direct site measurements at one or a few nearby stationary monitoring sites or estimates from prediction models; this methodology is subject to substantial exposure measurement error. As with the studies conducted in occupation-specific cohorts, most of the studies among patients meet our study quality criterion for assignment to participants’ locations, which is a clear strength compared to the studies conducted in the general population. Similar to the studies conducted in the general population, most of the studies among patients do not meet the criterion for residential mobility, which is a clear limitation compared to occupation-specific cohorts. Further, most of the studies among patients do not meet the criteria for spatial or temporal variabilities, which is a clear limitation compared to studies conducted in the general population and occupation-specific cohorts. As a result, the studies among patients are also subject to exposure measurement error due to a lack of accounting for residential mobility or spatial or temporal variabilities.

As with the studies conducted in occupation-specific cohorts, the studies conducted in patients with underlying health conditions are more likely to have adjusted for individual-level behavioral factors, such as physical activity and medication use, than the studies conducted in the general population. However, most of the studies conducted in patients did not adjust for at least one of the key confounders that were typically adjusted for in the studies conducted in the general population, including race, BMI, and smoking. As such, the results of these studies are still subject to residual confounding by many key, unmeasured, and unknown confounders. Similar to the studies conducted in occupation-specific cohorts, five of the six studies conducted in patients, including the study that is less subject to exposure measurement error (i.e. by meeting our criteria for all aspects of the exposure assessment category except for personal activities and multiple methods) (Chen et al. Citation2016), did not adjust for copollutants, indicating the results of these studies likely do not reflect the independent association of PM2.5 with mortality. This is a clear limitation compared to the studies conducted in the general population. The only study that did adjust for copollutants (Malik et al. Citation2019) did not meet our study quality criterion for copollutant measurement, which undermines the effectiveness of copollutant adjustment.

Evidence integration

We integrated the evidence across the epidemiology studies using modified Bradford Hill aspects (Supplemental Table S1) as a framework. These aspects were originally developed to answer the question ‘is there any other way of explaining the set of facts before us, is there any other answer equally, or more, likely than cause and effect?’ (Hill Citation1965). Thus, the aspects should be used as guides for evaluating alternative explanations of the observed patterns in the study results and to assess whether they are a more compelling explanation of the results at hand than the explanation of causality (Rhomberg et al. Citation2013). Although evidence integration is typically conducted by assigning greater weight to higher quality studies and less weight to lower quality studies, the delineation of studies into higher and lower quality groups was not done in this review, considering the shared key strengths and limitations (e.g. with respect to exposure assessment, confounding, and statistical methods) and apparent consistency of the linear results across studies. In addition, some of the shared strengths and limitations go beyond the study quality criteria (e.g. using ambient PM2.5 concentration to estimate individual PM2.5 exposure, assuming linearity of the PM2.5-mortality relationship), and as discussed above, could have consistently and more substantially affected the studies and their risk estimates. Therefore, we incorporated the overall study strengths and limitations into the integration of evidence, particularly where they are relevant to the evaluation of alternative explanations of the results.

Consistency

Evidence for causality is stronger if consistent effects are observed among studies of different designs, populations, locations, circumstances, and time periods. The studies reviewed here were conducted in various locations across the US and Canada and evaluated different types of populations (general, occupational, or patients with underlying health conditions). All used a cohort study design, but there were many differences among studies with regard to specific aspects of study conduct. Despite the differences in these factors across studies, the majority of studies (particularly those in the general population) reported weak, positive associations that were statistically significant.

Null associations were reported more often in occupational populations compared to the general population, which is not surprising given that occupational populations tend to be healthier than the general population (Li and Sung Citation1999; Chowdhury et al. Citation2017). Studies of MI patients reported stronger positive associations than most of those reported for the general population, whereas studies of patients with CVD risk factors or cancer patients reported mixed results, with some positive, null, or negative PM2.5-mortality associations. One would expect that patients with underlying health conditions would be more susceptible and thus would have a greater risk of mortality from PM2.5 exposure, but this was only the case for the studies of MI patients and not patients with other health conditions. Given the small number of studies of patients with each particular underlying health condition, however, it cannot be ruled out that the observations from these studies may be attributable to chance, bias, or confounding. Regardless, most of the studies conducted in the general population, as well as some of those conducted in occupational and patient populations, reported risk estimates of a similar magnitude, indicating that there is some consistency for weak, positive associations between long-term exposure to PM2.5 and total (all-cause or non-accidental) mortality across studies.

Strength of association

Large and precise risk estimates for an exposure-outcome association are less likely to be due to bias, confounding, or chance and, therefore, are more indicative of an underlying causal relationship than risk estimates that are small and imprecise. Although the HRs for the PM2.5-mortality association reported in the studies in this review are mostly of high precision (and of extremely high precision in the studies with extremely large sample sizes), their magnitudes mostly indicate a weak association. Considering the substantial extent of potential bias and confounding that these HR estimates are subject to based on the methodology of the studies, the weak associations do not support a causal PM2.5-mortality relationship.

A key source of bias to the reported weak associations is PM2.5 exposure measurement error, which could be substantial. As discussed above, all studies assessed each participant’s exposure to PM2.5 by assigning to his/her location (primarily residential location) an ambient PM2.5 concentration that was either from direct measurements at one or a few nearby stationary monitoring sites or estimates from prediction models, which does not account for individual factors that vary among participants (such as time spent indoors or at non-residential locations, and personal activities) and can greatly affect their actual PM2.5 exposures. Moreover, almost none of the studies reviewed here accounted for personal activities, and many of the studies did not assign ambient PM2.5 data to participants’ locations from the same time period and did not account for temporal variability or residential mobility.

Another potentially important source of bias is model misspecification. As discussed above, all studies calculated a risk estimate for the PM2.5-mortality association assuming linearity, but the shapes of the PM2.5-mortality curves varied across the studies that also evaluated potential non-linearity, indicating that linearity may not be a valid modeling assumption. In calculating the risk estimate under a linear assumption, all studies also used a Cox proportional hazards regression model, which relies on a key assumption of proportional hazards, yet very few studies tested the proportional hazards assumption to ensure that it was satisfied, leaving biased modeling results unidentified.
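The two modeling assumptions discussed here can be stated compactly. The following is the standard textbook form of a Cox model with a single linear exposure term, given only as an illustration; it is not the specification of any particular study in this review, and the symbols (baseline hazard h_0, coefficient β, exposure x, exposure increment Δ) are notation assumed for this sketch.

```latex
h(t \mid x) = h_0(t)\,\exp(\beta x), \qquad
\mathrm{HR}(\Delta) = \frac{h(t \mid x + \Delta)}{h(t \mid x)} = \exp(\beta \Delta)
```

In this form, the HR for a given increment Δ depends neither on follow-up time t (the proportional hazards assumption) nor on the starting concentration x (the linearity assumption on the log-hazard scale). If either assumption fails, a single reported HR summarizes a relationship that actually varies over time or over the exposure range.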

The reported weak associations are also subject to confounding by copollutants, unmeasured confounders (e.g. diet, physical activity, temperature, relative humidity, medication use, other chemical exposures, stress, and noise), and unknown confounders (Clougherty and Kubzansky Citation2009; Stansfeld Citation2015; US EPA Citation2019). As discussed above, none of the studies meet the criterion for key confounders and many of the studies did not adjust for any copollutant exposure. Further, in the studies that meet our criterion for copollutant adjustment, only one or a few select copollutants were adjusted for and none of the studies meet the criterion for copollutant measurement, indicating that the copollutant adjustments are likely ineffective and the results likely do not reflect the independent association of PM2.5 with mortality. Residual confounding could also exist when covariate adjustment is incomplete or secular trend is not sufficiently adjusted for (Cox Citation2017).

The above-mentioned universal sources of bias and confounding could have systematically shifted the study results and artificially created consistency of weak, positive associations. Given this consistency across studies, chance is less likely as a possible non-causal explanation compared to bias and confounding. Nonetheless, it is worth noting that the majority of the HR estimates from the studies are subject to the multiple comparison issue, so chance findings are still possible. Overall, the aspect of strength for PM2.5-mortality associations is not met.
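The multiple comparison issue can be illustrated with simple arithmetic. The sketch below assumes independent tests in which every null hypothesis is true, and the numbers of tests shown are hypothetical; the number of comparisons actually made in each study is not tabulated here.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false-positive result among n_tests
    independent tests, each run at significance level alpha, when all
    null hypotheses are true."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Hypothetical numbers of comparisons within a single study.
for n_tests in (1, 5, 20):
    print(n_tests, round(familywise_error_rate(n_tests), 3))
```

Under these assumptions, the chance of at least one nominally significant finding grows quickly with the number of estimates reported, which is why unadjusted results across many exposure windows, subgroups, or model specifications should be read with caution.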

Coherence

Coherence occurs when all of the known facts related to an observed association that come from various realms of evidence fit together in a logical manner (Hill Citation1965). Coherence is difficult to assess for the evaluation of associations between long-term PM2.5 exposure and mortality. Controlled human exposure studies are conducted with short exposure durations and evaluate health outcomes of generally low adversity for ethical reasons. Experimental animal studies can be conducted with longer exposure durations and can evaluate more severe health effects, but the available chronic studies of PM2.5 exposure in experimental animals used PM2.5 concentrations that are much higher than ambient concentrations (US EPA Citation2019, Citation2020), so any health effects reported in these studies are not informative regarding potential human health effects at lower PM2.5 concentrations. It is notable, however, that in a review of multiple morbidity studies of rodents with lifetime inhalation exposures to various forms of PM2.5 (such as diesel exhaust, carbon black, and coal dust), there was no increase in mortality for any exposure level compared to controls, even when exposures were so high as to produce lung overload (Gamble Citation1998). Similarly, in studies evaluating atherosclerotic changes in apolipoprotein E-null mice (which are susceptible to atherosclerosis due to their high plasma levels of low-density lipoprotein and very low-density lipoprotein) with chronic exposures to PM2.5, such as those reviewed by Prueitt et al. (Citation2015), mortality was not increased with exposure to PM2.5 concentrations ranging from 85 to 138 μg/m3 compared to controls. The lack of increased mortality in experimental animal studies of long-term PM2.5 exposure, even at very high concentrations that induce other adverse effects and in an animal model that is susceptible to cardiovascular morbidity, does not provide support for a causal relationship between long-term PM2.5 exposure at lower, ambient concentrations and mortality.

Biological plausibility

Evidence for a plausible biological mechanism for an effect can contribute to a scientifically defensible determination of causation. Agencies such as US EPA consider the underlying morbidities for cardiovascular-, respiratory-, and metabolic disease-specific mortality (which contribute largely to total mortality) as support for the plausibility of associations with all-cause mortality (US EPA Citation2019). Several biological mechanisms have been proposed for these underlying morbidities, based on evidence from experimental animal, controlled human exposure, and epidemiology studies (US EPA Citation2019). Although we did not systematically review this evidence, we provide a high-level review of the proposed mechanisms below, based on other comprehensive reviews in the peer-reviewed literature.

Two well-studied mechanistic pathways involve induction of oxidative stress and inflammation in the respiratory tract after inhalation of PM2.5, leading to lung cell injury (Xing et al. Citation2016; Li et al. Citation2018; US EPA Citation2019; Yu et al. Citation2020). Release of inflammatory mediators, as well as direct translocation of PM2.5 particles into the systemic circulation, can contribute to local oxidative stress and inflammation at extrapulmonary sites, resulting in cardiovascular effects (e.g. arrhythmia, atherosclerotic plaque instability) that increase the risk of cardiovascular disease (US EPA Citation2019; Yitshak-Sade et al. Citation2019; Miller Citation2020; Yu et al. Citation2020), or metabolic effects such as insulin resistance and metabolic syndrome comorbidities (US EPA Citation2019). The oxidative stress induced by PM2.5 in the respiratory tract can also disrupt calcium homeostasis by increasing intracellular calcium concentrations, which can further activate inflammatory reactions and lead to cell damage or cell death (Xing et al. Citation2016). There is also evidence from a few experimental animal studies that PM2.5 can modulate the autonomic nervous system, potentially by binding to receptors on lung or nerve cells, resulting in changes in heart rate (US EPA Citation2019; Yang et al. Citation2020). Such changes could potentially lead to cardiovascular outcomes such as hypertension, arrhythmia, and cardiovascular diseases such as ischemic heart disease or heart failure (US EPA Citation2019).

Despite the available mechanistic evidence, the epidemiology evidence for associations between PM2.5 exposure and cardiovascular, respiratory, and metabolic disease morbidity is subject to issues similar to those affecting the mortality evidence reviewed here (such as potential exposure measurement error and confounding), as epidemiology studies for morbidity and mortality are conducted in a generally similar manner. Morbidity evidence that is subject to such uncertainty does not provide strong support for biological plausibility of associations between PM2.5 exposure and mortality. Further, because morbidity associated with air pollution is less severe than mortality and, thus, is a more sensitive indicator of adverse health effects than death, morbidity should show stronger associations than mortality (Gamble Citation1998). This is not observed for PM2.5, however, as the evidence reported in the PM ISA indicates that PM2.5 associations are similar or weaker, but not stronger, as the health effects become less severe (US EPA Citation2019). For example, the evidence is stronger (i.e. effect estimates are higher and positive results are more consistent) for cause-specific mortality compared to underlying morbidity outcomes such as adult asthma prevalence, ischemic heart disease, myocardial infarction, or stroke (US EPA Citation2019, Citation2020).

While some controlled human exposure and experimental animal studies provide evidence for certain morbidity endpoints with exposure to PM2.5, the evidence is neither strong nor consistent across studies, and the effects are reported almost exclusively at high exposures (US EPA Citation2020) and therefore do not support biological plausibility for more serious effects at ambient exposures. Many of the adverse health effects reported in these experimental studies also have thresholds and do not occur at lower concentrations; for example, Green et al. (Citation2002) reported that various chronic exposure studies in rats with different compositions of PM2.5 indicate that concentrations of 100–200 μg/m3 must be exceeded before potentially adverse changes occur. As this threshold is above ambient concentrations, these experimental studies do not provide support for adverse effects at ambient concentrations. Thus, while there is evidence in the literature for a variety of potential biological mechanisms for the underlying health effects that contribute to total mortality, the experimental studies of adverse health effects with PM2.5 exposure do not provide evidence of biological plausibility for mortality associated with ambient PM2.5 exposures, so the aspect of biological plausibility is only partially met.

Biological gradient

An association is more likely to be causal when a well-characterized exposure-response relationship exists (e.g. disease risk increases with greater exposure intensity and duration). The studies in this review were generally consistent in reporting weak but statistically significant associations that indicate an increasing exposure-response relationship with increasing PM2.5 exposure, but this relationship is not well characterized and therefore may not be reliable. While all the studies reported a risk estimate for the PM2.5-mortality association assuming linearity, as discussed above, a linear PM2.5-mortality relationship with no threshold is not biologically plausible for the underlying morbidity that contributes to the outcome of mortality. Among the studies that also evaluated potential non-linearity of the association, the reported shape varied substantially, from approximately linear to supralinear to V-shaped, J-shaped, or monotonically increasing. Between the two studies that formally evaluated potential thresholds for the PM2.5-mortality curve, the estimated thresholds varied drastically, from 11 μg/m3 to 0 μg/m3 (Villeneuve et al. Citation2015; Pinault et al. Citation2016).

Although a few studies reported an approximately linear shape of the exposure-response curve, the degree of potential bias in those studies due to exposure measurement error (as discussed above) may have been sufficient to produce a false linear result and prevent the detection of a threshold (Rhomberg et al. Citation2011). As discussed above, the reported variation in non-linear shapes across studies also indicates that linearity may not be a valid modeling assumption. In fact, the linear assumption may have masked important heterogeneity and details of the underlying PM2.5-mortality relationships.
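How exposure measurement error can obscure a threshold can be illustrated with a small simulation. This is a minimal sketch under stated assumptions and not a reconstruction of any analysis cited here: the threshold, slope, exposure range, and error size are hypothetical values, the error is assumed to be classical (additive, independent of true exposure), and the outcome is an expected excess risk rather than observed deaths.

```python
import random


def simulate(n=20000, threshold=10.0, slope=0.02, error_sd=3.0, seed=1):
    """Return (true_exposure, measured_exposure, excess_risk) tuples.

    The true relationship is a 'hockey stick': zero excess risk below
    the threshold and a linear increase above it. All parameter values
    are hypothetical.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        true_x = rng.uniform(2.0, 20.0)
        excess_risk = slope * max(0.0, true_x - threshold)
        measured_x = true_x + rng.gauss(0.0, error_sd)  # classical error
        rows.append((true_x, measured_x, excess_risk))
    return rows


def mean_risk_in_bin(rows, exposure_index, low, high):
    """Mean excess risk among rows whose chosen exposure lies in [low, high)."""
    values = [r[2] for r in rows if low <= r[exposure_index] < high]
    return sum(values) / len(values)


rows = simulate()
for low, high in ((4, 6), (6, 8), (8, 10)):  # all below the true threshold
    by_true = mean_risk_in_bin(rows, 0, low, high)
    by_measured = mean_risk_in_bin(rows, 1, low, high)
    print(f"{low}-{high}: true exposure {by_true:.4f}, measured exposure {by_measured:.4f}")
```

When participants are grouped by true exposure, the bins below the threshold show no excess risk. When they are grouped by error-prone measured exposure, the same bins contain some participants whose true exposure exceeds the threshold, so the apparent risk rises smoothly through the region where the true effect is zero.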

Before the PM2.5-mortality curve can be well characterized and contribute to an evaluation of causation with confidence, a number of other issues need to be addressed. For example, the studies in this review rarely used the same non-linear modeling techniques to evaluate the PM2.5-mortality exposure-response curves, so it is unclear to what extent this affects the comparability of the non-linear results. The available data at lower levels of PM2.5 (e.g. below the current standard of 12 μg/m3) are sparse, limiting the ability to characterize the curve at lower ambient PM2.5 levels with confidence (Smith and Gans Citation2015). As PM2.5 refers to a heterogeneous mixture of constituents that may vary greatly from one location to the other, and mortality (either all-cause or non-accidental) entails a variety of cause-specific deaths that have different etiologies, it is important to develop methods to account for these heterogeneities when characterizing the PM2.5-mortality curve in a multi-city or even nationwide study (Cox Citation2017). Overall, the aspect of biological gradient is not met, as these issues need to be addressed before the PM2.5-mortality exposure-response relationship can be considered to be well characterized.

Temporality

For a causal relationship to exist, exposure must precede the occurrence of disease with sufficient lag time, if any is expected. Because all the studies in this review were cohort by design, our study quality criterion for temporality is considered as being met in all studies. However, this is not without caveats that undermine the establishment of temporality and, thus, affect a judgment of causality.

As discussed above, all studies in this review were secondary analyses of data from existing cohorts that were initially recruited for research questions unrelated to PM2.5 or mortality. Although the conceptualized study baseline clearly preceded mortality follow-up in each study, ambient PM2.5 data, unlike data for the participants’ locations to which ambient PM2.5 data were assigned, were often unavailable at the exact time of baseline (or the time of address update during the follow-up), as the ratings of studies for our ‘assignment to participants’ locations’ study quality criterion show. Of the studies that do not meet this criterion, all but one study assigned PM2.5 data from as long as 5+ to 10+ years later to participants’ locations, thus underestimating the participants’ actual PM2.5 exposure concentration and overestimating the mortality rate associated with lower PM2.5 exposures.

Another caveat is that the temporality criterion used in this review does not enforce any lag time between PM2.5 exposure and mortality, as such a lag time is largely unknown. However, the PM2.5 exposure windows examined in the studies were often within a period of five years before mortality, which is unlikely to be the most relevant exposure window considering the chronic pathological changes and disease processes that have been proposed as potential underlying causal mechanisms for mortality (US EPA Citation2019). Even in the studies where longer lag times were examined, PM2.5 exposure was only measured for a short period of time when, in fact, PM2.5 exposure persists throughout an individual’s lifetime (even though the concentrations can change over time) and unmeasured historical PM2.5 exposures can be substantially higher than exposures measured in the studies. Thus, even though all of the studies in this review were designed to allow for exposure to precede the outcome, these caveats undermine the full establishment of temporality; therefore, this aspect is only partially met.

Specificity

Causal inference is strengthened when there is evidence that links a specific exposure to a specific health outcome, although any health outcome may have multiple causes. Mortality and the underlying morbidity associated with it have multiple causes and thus are not specific effects of PM2.5 exposure. Other risk factors for mortality include many of the key confounders that should be identified and adjusted for in epidemiology studies examining associations between air pollutants and mortality, such as SES, BMI, physical activity, temperature, relative humidity, medication use, smoking status, and other chemical exposures. As discussed above, other potential confounders not typically measured in air pollution epidemiology studies, such as stress and noise, are also risk factors for mortality (Clougherty and Kubzansky Citation2009; Stansfeld Citation2015; US EPA Citation2019).

It is of note that PM2.5 itself is not a ‘specific’ chemical but rather is composed of many different solid and liquid constituents that vary in their presence and concentrations across locations and time periods due to variation in their sources. As discussed below, if ambient PM2.5 is causally associated with various health effects, including mortality, the specific constituents responsible are unknown. Overall, the aspect of specificity for PM2.5-mortality associations is not met.

Analogy

The evidence for causality is stronger when a similar substance is an established causal factor for a similar effect. A comparison of PM2.5 to other types of ambient particulates is difficult, as all such particulates in the PM2.5 size fraction are included as PM2.5 components. However, exposures to other size fractions of PM (PM10-2.5 and UFPs) are not established causal factors for mortality, due to limited available data or uncertainties associated with the epidemiology studies of these PM size fractions (US EPA Citation2019).

PM2.5 composition varies from one location to another, and the specific constituents potentially responsible for the reported associations between long-term PM2.5 exposure and mortality are unknown. For example, US EPA recently concluded that the pattern of results across studies of particular components or sources of PM2.5 ‘demonstrate that no individual PM2.5 component or source is a better predictor of mortality than PM2.5 mass’ (US EPA Citation2019). It is notable that experimental studies in both humans and animals indicate that exposures to nonacidic, soluble sulfates and nitrates, which make up sizable mass fractions of ambient PM, are associated with little to no adverse effects (as reviewed by Green et al. Citation2002). Further, exposures to strongly acidic sulfates induce adverse respiratory effects in humans or experimental animals only at high exposure levels (> 100 μg/m3), but such constituents are typically present in ambient air at concentrations below 5 μg/m3 (Green et al. Citation2002). Thus, it is unlikely that lower exposures to these constituents in ambient air are associated with morbidity, let alone mortality.

Environmental tobacco smoke (ETS) is a source of PM2.5 that is itself a mixture of thousands of constituents (Rojas-Rueda et al. Citation2021). Multiple studies have reported statistically significant associations between ETS exposure and all-cause mortality, with the magnitude of associations being similar to or slightly higher than those reported for long-term PM2.5 exposure and all-cause mortality (Lv et al. Citation2015; Diver et al. Citation2018; Pelkonen et al. Citation2019). The concentration of PM2.5 particles in ETS is much higher (up to an order of magnitude) than that of PM2.5 in indoor and outdoor environments where smoking does not occur (Van Deusen et al. Citation2009; Ruprecht et al. Citation2016), so ETS can be considered an analogous substance to PM2.5 exposures well above the PM2.5 NAAQS, but not to lower, ambient concentrations near the PM2.5 NAAQS. Overall, we did not identify any particulate substances similar to PM2.5 that are established causal factors for all-cause mortality at low, ambient concentrations.

Experiment

Natural experiments can provide strong evidence for causation when an intervention or cessation of exposure results in decreased health risks. PM2.5 concentrations have decreased in the US over time as the PM NAAQS have been revised and reduced, but even the epidemiology studies with the most recent PM2.5 data continue to report positive associations between PM2.5 exposure and mortality. Most of the exposure data measured or modeled in the studies reviewed here is from 1990 to 2010, with only one study (Lefler et al. Citation2019) including exposure data after 2013, when the impact of the most recent lowering of the PM2.5 NAAQS (implemented in 2013) can be assessed. It is likely that even if future studies include PM2.5 exposure data from after 2013, they would continue to report positive associations with mortality or other health endpoints at lower and lower exposure concentrations. This is because when annual average PM2.5 concentrations decline during the study period to a similar degree across study locations, it is possible that the distribution of PM2.5 concentrations that occurred in any particular year is associated with mortality that was at least partially attributable to the higher PM2.5 exposures that occurred in earlier years (Smith and Chang Citation2020). In addition, if most studies continue to use similar exposure assessment approaches (e.g. using ambient PM2.5 to estimate individual PM2.5 exposure), the degree of potential bias due to exposure measurement error may produce a false linear result and obscure any thresholds.

Several interventional and ‘accountability’ studies have examined past reductions in ambient PM2.5 and the degree to which those reductions have resulted in decreased health risks by using causal modeling approaches, which are not within the scope of this review. Two recent, comprehensive reviews of air pollution interventional and accountability studies reported mixed results across studies, indicating that measures to reduce PM2.5 have not clearly reduced mortality risks, particularly when confounding was well controlled (Henneman et al. Citation2017; Burns et al. Citation2019a). Even in the studies that showed an association between PM2.5 reduction and mortality reduction, one cannot directly attribute the mortality reduction to a decrease in PM2.5 concentrations, as these studies primarily evaluated the effectiveness of policies that could lower ambient PM2.5 concentrations but could also affect other risk factors for mortality. Conversely, for studies reporting no association between PM2.5 reduction and mortality reduction, one can conclude that similar policy changes do not lead to a reduction in mortality, even though they may have led to a reduction in PM2.5 concentrations. Overall, these studies do not provide any compelling evidence that a reduction in ambient PM2.5 concentrations is associated with a reduction in mortality.

Causal conclusion

We evaluated the potential causal relationship between long-term PM2.5 exposure and mortality using the four-tiered causal framework shown in Supplemental Table S2. The only Bradford Hill aspect that is fully met for the studies in this review is that of consistency, as there is some consistency across studies for reporting weak, positive associations. In addition, the aspects of temporality and biological plausibility are only partially met. All studies in this analysis are cohort by design and thus allow for exposure to precede the outcome; however, several caveats undermine the full establishment of temporality, as discussed above. Although there is evidence for a variety of potential biological mechanisms for the underlying health effects that contribute to total mortality, experimental studies of these effects do not provide evidence of biological plausibility for mortality associated with ambient PM2.5 exposures.

The other Bradford Hill aspects are either not met or there is inadequate information for their full evaluation. The aspect of strength of association is not met, as all reported associations are very weak, and there are many alternative explanations for such small risk estimates, including bias attributable to exposure measurement error or model misspecification, and substantial confounding by copollutants and unmeasured or unknown confounders. The aspect of coherence is not met due to inadequate evidence. The available animal studies of PM2.5 were only conducted at very high concentrations and are not informative regarding potential human health effects at lower PM2.5 concentrations (although increased mortality was not even observed in animals exposed to high concentrations of PM2.5 and thus is not likely to be observed at lower concentrations). The aspect of biological gradient is also not met due to inadequate evidence; although the studies indicate an exposure-response relationship, there are several issues that need to be addressed before it can be well characterized and, thus, reliable (as discussed above).

The aspect of specificity is not met because PM2.5 exposure is not specific to mortality, and PM2.5 is not a specific chemical but is made up of varying constituents depending on the location and time period. The aspect of analogy is also not met, because there are no particulate substances similar to PM2.5 that are established causal factors for all-cause mortality at low, ambient concentrations. Finally, the aspect of experiment is not met due to inconsistent evidence. Although the evidence from interventional and accountability studies does not indicate that reductions of PM2.5 concentrations have clearly reduced mortality risks, these studies only evaluated the effects of policy changes that may have reduced PM2.5 concentrations but could also affect other risk factors for mortality.

Overall, our evaluation of causality using the Bradford Hill aspects indicates that there is some consistency across studies for reporting positive associations, but these associations are very weak and explanations other than causality, such as bias and confounding, cannot be ruled out. There is no coherence with the available experimental evidence and there is no clear evidence for a biological mechanism for PM2.5 to cause mortality at ambient concentrations, and several caveats undermine the full establishment of the aspects of temporality and biological gradient. Exposure to PM2.5 is not specific to mortality, there is no evidence to show that reductions in PM2.5 have clearly reduced mortality risks, and there are no substances similar to PM2.5 that are established causes of mortality. For these reasons, our evaluation supports a conclusion that the evidence for a causal relationship between long-term exposure to ambient PM2.5 and mortality (all-cause or non-accidental) from epidemiology studies published since the 2009 PM ISA is inadequate.

Discussion

We used a transparent systematic review framework based on best practices for evaluating study quality and integrating evidence to conduct a review of the available epidemiology studies evaluating associations between long-term exposure to ambient concentrations of PM2.5 and mortality (all-cause and non-accidental) conducted in North America and published after those included in the 2009 PM ISA. Using a causality framework that incorporates best practices for making causal determinations, we concluded that the evidence for a causal relationship between long-term exposure to ambient PM2.5 concentrations and mortality from these studies is inadequate.

Our conclusion differs from US EPA’s conclusion in the most recent PM ISA that there is a causal relationship between long-term exposure to PM2.5 and total (non-accidental) mortality (US EPA Citation2019). Our review includes all of the North American studies of long-term PM2.5 exposure and all-cause or non-accidental mortality included in the most recent PM ISA (but not also included in the 2009 PM ISA), with the exception of four studies that we excluded because they were ecological studies (Garcia et al. Citation2016; Shi et al. Citation2016; Wang et al. Citation2016; Pun et al. Citation2017); four studies that we excluded because they were the least recent or least informative studies of cohorts examined in more than one study (Lipsett et al. Citation2011; Crouse et al. Citation2012; Kioumourtzoglou et al. Citation2016; Wang et al. Citation2017a); and one study that we excluded because it did not present relevant effect estimates for associations with mortality (Cox and Popken Citation2015). Our review also includes seven studies that were not included in the evaluation of mortality in the PM ISA, likely because most were published after the cutoff date for the literature searches conducted for the PM ISA (Hartiala et al. Citation2016; Deng et al. Citation2017; Weichenthal et al. Citation2017; DuPre et al. Citation2019; Lefler et al. Citation2019; Lipfert and Wyzga Citation2019; Malik et al. Citation2019). Altogether, there are 16 studies included in both our review and the most recent PM ISA. While our conclusion is solely based on the evidence published since the 2009 PM ISA, it is worth noting that US EPA’s conclusion in the 2019 PM ISA, although mainly focused on the most recent studies published since the 2009 PM ISA, also relied on the evidence evaluated in the 2009 PM ISA and the associated conclusions.

Although it is possible that the difference in conclusions regarding causality between our review and that in the PM ISA may be partly attributable to the differences in the specific studies included in each review, it is likely that the difference is also attributable to the methodologies used to evaluate the evidence. In the PM ISA, US EPA (Citation2019) did not evaluate and integrate the evidence for causality in a transparent or systematic manner, as the overall process lacks a detailed protocol to ensure that the evaluation is consistent across studies. The PM ISA also lacks an explanation for how the study quality aspects provided in its Appendix were used in the evaluation and integration of the evidence, as it is clear that these aspects were not applied consistently across studies. The study quality aspects should be included in the discussion of study results so they can be considered in the evaluation (including an evaluation of alternative explanations) and appropriate conclusions with regard to causality can be drawn. While US EPA discussed some of the study quality issues (e.g. exposure measurement error, confounding) in the PM ISA, it did not fully consider their impact on the study results and their implications for causality.

US EPA also uses a five-level causal framework that is prone to bias toward causal conclusions. In this framework, the evidence is considered sufficient to conclude a causal relationship if chance, confounding, and other biases can be ruled out with ‘reasonable confidence,’ but the framework does not include guidance for what constitutes ‘reasonable confidence.’ In addition, US EPA’s causal framework requires only one high-quality study for evidence of a causal relationship to be deemed suggestive, rather than requiring an equivalent review of all studies under the same criteria. The lack of consistent application of study quality aspects to the evaluation and integration of evidence can lead to causal conclusions that are biased and not fully supported by the evidence as a whole.

For our review, when a particular cohort was evaluated in more than one study, we excluded studies if they were less recent or less informative than other studies of the same cohort, even if they met our initial study selection criteria (as described above). It is unlikely that our causal conclusion would be different if we had included these studies, however, as they had similar methodologies (and thus similar strengths and limitations) and reported similar results as the other studies of the same cohort. We did exclude some of these studies based on additional limitations with regard to exposure assessment, statistical analyses, and confounder adjustment compared to the included studies of the same cohort. For example, the studies of the CanCHEC 1991 general population cohort reviewed here (Crouse et al. 2015; Weichenthal et al. 2016) reported weak, positive associations with non-accidental mortality, as did the two studies of this cohort that we excluded (Crouse et al. 2012, 2016). Similarly, the study of female nurses in the NHS cohort by Hart et al. (2015) reviewed here reported a weak association with mortality (HR = 1.13 for non-accidental mortality), as did the two other studies of this cohort that we excluded (Puett et al. 2009, who reported an HR of 1.29; Liao et al. 2018, who reported an HR of 1.18, both for all-cause mortality). In addition, the study of female teachers in the CTS cohort by Ostro et al. (2015) reviewed here reported no association (HR = 1.01, 95% CI: 0.98–1.05) with non-accidental mortality, as did the study of this cohort that we excluded (Lipsett et al. 2011; HR = 1.01, 95% CI: 0.95–1.09). Results were likewise highly similar between the other studies that we excluded and the included studies of the same cohorts.

There are several key uncertainties related to the available epidemiology evidence for associations between exposure to ambient PM2.5 and mortality that are primarily due to potential confounding by copollutants and unmeasured/unknown confounders, exposure measurement error, model misspecification, and a limited understanding of risks related to relatively low PM2.5 concentrations. As future studies address these key uncertainties more fully, they may improve on the current literature in clarifying potential causal relationships between PM2.5 and mortality or other adverse health effects. Burns et al. (2019b) recently developed a matrix for communicating risk assessment ‘asks’ of epidemiology research that describes characteristics of epidemiology studies that should be considered when they are used for risk assessment and decision making. These characteristics include, for example, confirming exposures and outcomes and determining the direction and magnitude of error surrounding exposure and dose-response assessments. Most of the recent epidemiology studies of PM2.5 exposure and mortality do not fully meet these ‘asks’ of risk assessors or appreciably reduce uncertainty regarding associations between ambient PM2.5 concentrations and mortality and, thus, are of limited use for risk assessment. The ‘asks’ could therefore be an important tool for consideration in future epidemiology publications to improve their value for use in decision making.

Conclusions

We conducted a review of the epidemiology studies of long-term exposure to ambient PM2.5 and mortality using a transparent systematic review framework based on best practices for evaluating study quality and integrating evidence. There is some consistency across studies for reporting positive associations, but these associations are weak and several important methodological issues have led to uncertainties with regard to the evidence from these studies, including potential confounding by measured and unmeasured factors, exposure measurement error, and model misspecification. Because these uncertainties provide a plausible, alternative explanation to causality for the weakly positive findings across studies, we concluded that the evidence for a causal relationship between long-term exposure to ambient PM2.5 concentrations and mortality (all-cause or non-accidental) from these studies is inadequate. Our review shows that a relatively consistent pattern of weak, positive associations does not necessarily lead to a conclusion of causality when study quality is incorporated into the evaluation and integration of evidence in a consistent manner and alternative explanations for the evidence are explored. Our conclusion that the evidence for a causal relationship between long-term ambient PM2.5 exposure and mortality is inadequate is based on the many study limitations and uncertainties associated with the evidence, and indicates that the epidemiology studies of PM2.5 and mortality should be interpreted with caution, particularly if they are to be used for regulatory decision making.

Declaration of interest

All of the authors are employees of Gradient, an independent environmental risk science consulting firm. The work reported in this paper was conducted during the normal course of employment, with financial support by the American Petroleum Institute (API). J.E. Goodman and R.L. Prueitt have previously given presentations or testimony on topics related to air pollution at scientific conferences and meetings with regulatory agencies, with funding provided by API. All other authors declare that they have not been involved in any regulatory activities related to the contents of this paper. This manuscript is the professional work product of the authors, and the opinions and conclusions offered within are not necessarily those of their employers or the financial sponsor of the work.

Acknowledgments

We thank Ms. Carla Walker for her assistance on this manuscript.

Supplementary material

Supplemental data for this article can be accessed here.

Additional information

Funding

This work was supported by the American Petroleum Institute.

References