Abstract
Several studies have provided strong evidence that long-term exposure to air pollution, even at low levels, increases the risk of mortality. As regulatory actions are becoming prohibitively expensive, robust evidence is needed to guide the development of targeted interventions that protect the most vulnerable. In this article, we introduce a novel statistical method that (i) discovers subgroups whose effects differ substantially from the population mean, and (ii) uses randomization-based tests to assess the discovered heterogeneous effects. We also develop a sensitivity analysis method to assess the robustness of the conclusions to unmeasured confounding bias. Via simulation studies and theoretical arguments, we demonstrate that hypothesis testing focused on the discovered subgroups can substantially increase statistical power to detect heterogeneity of the exposure effects. We apply the proposed de novo method to data on 1,612,414 Medicare beneficiaries in the New England region of the United States for the period 2000–2006. We find that seniors aged between 81 and 85 with low income and seniors aged 85 and above have statistically significantly greater causal effects of long-term exposure to PM2.5 on the 5-year mortality rate than the population mean.
Supplementary Materials
Appendix: The online appendix contains proofs of Theorems 1 and 2 and additional simulations for the discovery step. (.pdf file)
R code for application and simulations: An R script illustrates our methods with a simulated dataset. Code for both implementing the de novo method and producing the simulation results is provided. (.R file)
Acknowledgments
We are grateful for helpful feedback from the editor, the associate editor, four anonymous referees, and session participants at JSM and the European Causal Inference Meeting.