Multivariate Behavioral Research Volume 46, 2011 - Issue 1

Original Articles

A Tutorial and Case Study in Propensity Score Analysis: An Application to Estimating the Effect of In-Hospital Smoking Cessation Counseling on Mortality

Peter C. Austin Institute for Clinical Evaluative Sciences and University of Toronto

Pages 119-151 | Published online: 18 Feb 2011

https://doi.org/10.1080/00273171.2011.540480

Abstract

Propensity score methods allow investigators to estimate causal treatment effects using observational or nonrandomized data. In this article we provide a practical illustration of the appropriate steps in conducting propensity score analyses. For illustrative purposes, we use a sample of current smokers who were discharged alive after being hospitalized with a diagnosis of acute myocardial infarction. The exposure of interest was receipt of smoking cessation counseling prior to hospital discharge and the outcome was mortality within 3 years of hospital discharge. We illustrate the following concepts: first, how to specify the propensity score model; second, how to match treated and untreated participants on the propensity score; third, how to compare the similarity of baseline characteristics between treated and untreated participants after stratifying on the propensity score, in a sample matched on the propensity score, or in a sample weighted by the inverse probability of treatment; fourth, how to estimate the effect of treatment on outcomes when using propensity score matching, stratification on the propensity score, inverse probability of treatment weighting using the propensity score, or covariate adjustment using the propensity score. Finally, we compare the results of the propensity score analyses with those obtained using conventional regression adjustment.

Introduction

Propensity score methods allow one to minimize the effects of observed confounding when estimating treatment effects using observational data. An article to appear in a special issue on propensity score analysis to be published in Multivariate Behavioral Research describes a framework for using propensity scores to estimate causal treatment effects using observational or nonrandomized data (Austin, in press-a). In the review paper, the different methods of using propensity scores to estimate treatment effects are highlighted along with a description of the steps in conducting a propensity score analysis. The objective of the current article is to illustrate the methods described in the overview article using a single data source.

In this article, a propensity score analysis was conducted using four different propensity score methods to estimate the effect of in-patient smoking cessation counseling on mortality in patients hospitalized with a heart attack. The results from the propensity score analyses are compared with those obtained using conventional regression adjustment.

METHODS

Data Source

The data consisted of patients hospitalized with acute myocardial infarction (AMI or heart attack) at 103 acute care hospitals in Ontario, Canada, between April 1, 1999, and March 31, 2001. Data on patient history, cardiac risk factors, comorbid conditions and vascular history, vital signs, and laboratory tests were obtained by retrospective chart review by trained cardiovascular research nurses. These data were collected as part of the Enhanced Feedback For Effective Cardiac Treatment (EFFECT) study, an ongoing initiative intended to improve the quality of care for patients with cardiovascular disease in Ontario (Tu et al., 2004; Tu et al., 2009).

The sample was restricted to those patients who survived to hospital discharge and who had documented evidence of being current smokers. For the purposes of the current case study, the treatment or exposure of interest was whether the patient received in-patient smoking cessation counseling. Smokers whose counseling status could not be determined from the medical record were excluded from the current study. Patients with missing data on important baseline clinical covariates were excluded from the sample. Patient records were linked to the Registered Persons Database using encrypted health card numbers, which allowed for determining the vital status of each patient at 3 years following hospital discharge. For the current study, the outcome was survival to 3 years, considered as both a dichotomous and a time-to-event outcome.

Baseline Comparisons of Treatment Groups

In the overall sample, continuous variables and categorical variables were compared between treatment groups using the standard t test and chi-square test, respectively. Standardized differences were also used to compare baseline characteristics between the two groups (Austin, 2009a; Flury & Riedwyl, 1986). Furthermore, basic baseline demographic characteristics and the probability of death within 3 years of discharge were compared between participants with complete data on baseline covariates and participants who were excluded from the study sample due to missing data on baseline covariates.
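The standardized difference can be sketched in a few lines. The following is a minimal illustration that assumes the commonly used definitions (difference in means or prevalences divided by the square root of the average of the two group variances); the function names are hypothetical and are not taken from the article.

```python
import math
from statistics import mean, variance


def std_diff_continuous(treated, control):
    """Standardized difference for a continuous covariate.

    Assumed definition: difference in group means divided by the
    square root of the average of the two sample variances.
    """
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


def std_diff_binary(p_treated, p_control):
    """Standardized difference for a dichotomous covariate,
    given the prevalence in each treatment group."""
    pooled_sd = math.sqrt(
        (p_treated * (1 - p_treated) + p_control * (1 - p_control)) / 2
    )
    return (p_treated - p_control) / pooled_sd
```

Unlike a t test or chi-square statistic, this quantity does not grow with sample size, which is why it is convenient for comparing balance across samples of different sizes.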

Estimating the Propensity Score

An initial propensity score model was estimated using the 33 variables described in Table 1. To estimate the propensity score, a logistic regression model was used in which treatment status (receipt of smoking cessation counseling vs. no smoking cessation counseling) was regressed on the baseline characteristics listed in Table 1 (Rosenbaum & Rubin, 1984). The continuous baseline variables were linearly related to the log-odds of receipt of treatment in the initial specification of the propensity score model. Prior research on variable selection for the propensity score suggests that it is preferable to either include those variables that affect the outcome or include those variables that affect both treatment selection and the outcome (Austin, Grootendorst, & Anderson, 2007). The variables listed in Table 1 are plausible predictors of mortality in AMI patients. Because we want to induce balance on variables that are prognostic of mortality, we included these variables in our initial propensity score model.
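To make the estimation step concrete, the sketch below fits a logistic regression of treatment status on a single covariate by Newton-Raphson and returns the fitted probabilities as propensity scores. It is a toy illustration only: the article's model has 33 covariates and would be fit with standard statistical software, and the function names and the one-covariate restriction are assumptions made here for brevity.

```python
import math


def fit_logistic_1d(x, z, iters=25):
    """Fit logit P(Z = 1 | x) = b0 + b1 * x by Newton-Raphson.

    x: covariate values; z: 0/1 treatment indicators.
    Assumes the data are not perfectly separable.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, zi in zip(x, z):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += zi - p            # score for the intercept
            g1 += (zi - p) * xi     # score for the slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * delta = score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def propensity_scores(x, b0, b1):
    """Estimated probability of treatment for each participant."""
    return [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
```

At the maximum likelihood solution the fitted propensity scores sum to the number of treated participants, which gives a quick sanity check on any fitted model.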

Matching on the Propensity Score

Treated and untreated participants were matched on the propensity score. In the data set, there were more treated participants (patients receiving smoking cessation counseling) than there were untreated participants (patients not receiving smoking cessation counseling). For technical reasons when matching, a pool of controls that is at least as large as the number of treated participants was required. Thus, in the context of propensity score matching, we attempted to match a treated participant to each participant who did not receive smoking cessation counseling. In other words, participants who received counseling were used as a pool or reservoir from which to find appropriate participants to match to those participants who did not receive counseling. Because propensity score matching allows one to estimate the average treatment effect for the treated (ATT), this implies that we are estimating the effect of smoking cessation counseling (or the lack thereof) in those patients who ultimately did not receive such therapy (Imbens, 2004).

For reasons described in the forthcoming review, participants were matched on the logit of the propensity score (Rosenbaum & Rubin, 1985) using calipers of width equal to 0.2 of the standard deviation of the logit of the estimated propensity score. This caliper width has been found to result in optimal estimation of risk differences in a variety of settings (Austin, 2010a).
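The caliper rule can be illustrated with a small matching routine. The sketch below assumes greedy nearest-neighbor matching without replacement and computes the standard deviation of the logit over the combined sample; the text specifies only the caliper width and the logit scale, so the matching algorithm and all function names here are assumptions.

```python
import math
from statistics import stdev


def logit(p):
    return math.log(p / (1 - p))


def caliper_match(ps_untreated, ps_treated_pool, caliper_sd=0.2):
    """Greedy 1:1 matching on the logit of the propensity score.

    Each untreated participant is paired with the closest still-unused
    treated participant, provided the distance on the logit scale is
    within caliper_sd standard deviations of the logit.
    Returns a list of (untreated_index, treated_index) pairs.
    """
    lu = [logit(p) for p in ps_untreated]
    lt = [logit(p) for p in ps_treated_pool]
    caliper = caliper_sd * stdev(lu + lt)
    available = set(range(len(lt)))
    pairs = []
    for i, value in enumerate(lu):
        if not available:
            break
        j = min(available, key=lambda k: abs(lt[k] - value))
        if abs(lt[j] - value) <= caliper:
            pairs.append((i, j))
            available.remove(j)  # matching without replacement
    return pairs
```

For example, `caliper_match([0.30, 0.90], [0.31, 0.50, 0.32])` pairs the first untreated participant with the first treated participant and leaves the second untreated participant unmatched, because no treated participant lies within the caliper.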

TABLE 1 Baseline Characteristics of the Study Sample

In those participants who did not receive smoking cessation counseling, differences in baseline covariates between matched and unmatched participants were examined using statistical significance testing and standardized differences.

Inverse Probability of Treatment Weighting

We weighted the entire study sample by inverse probability of treatment weights derived from the propensity score. Let Z denote treatment status (Z = 1 denotes treated; Z = 0 denotes untreated) and let e denote the estimated propensity score. Then the inverse probability of treatment weights are defined by $w = \frac{Z}{e} + \frac{1 - Z}{1 - e}$.
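Each participant's weight is the inverse of the probability of receiving the treatment that was actually received. A minimal sketch, with a hypothetical function name:

```python
def iptw_weight(z, e):
    """Inverse probability of treatment weight.

    z: treatment status (1 = treated, 0 = untreated)
    e: estimated propensity score, strictly between 0 and 1
    """
    return z / e + (1 - z) / (1 - e)
```

A treated participant with a propensity score of 0.25 receives a weight of 4, whereas an untreated participant with the same score receives a weight of about 1.33. Participants who received an unlikely treatment are therefore weighted up.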

Stratification on the Propensity Score

Using the entire study sample, we computed the quintiles of the estimated propensity score. Participants in the overall study sample were stratified into five approximately equal-size groups using the quintiles of the estimated propensity score.
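As an illustration of the stratification step, the sketch below assigns participants to approximately equal-size strata by ranking the estimated propensity scores. Rank-based assignment is an assumption made here to guarantee near-equal group sizes; software that cuts at computed quantile values may place tied scores differently.

```python
def assign_strata(ps, n_strata=5):
    """Assign each participant to a propensity score stratum.

    Returns a list of stratum labels 0 .. n_strata - 1, where 0 holds
    the lowest propensity scores. n_strata=5 gives quintiles.
    """
    order = sorted(range(len(ps)), key=lambda i: ps[i])
    strata = [0] * len(ps)
    for rank, i in enumerate(order):
        strata[i] = rank * n_strata // len(ps)
    return strata
```

Calling the same function with `n_strata=10` yields a stratification on deciles.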

Balance Diagnostics

As discussed in the forthcoming review article, the true propensity score is a balancing score: conditional on the true propensity score, treated and untreated participants will have the same distribution of measured baseline covariates. However, the true propensity score model is not known in observational studies (unlike randomized experiments in which the true propensity score is often defined by the study design). Thus, balance diagnostics allow one to assess whether the propensity score model has been adequately specified. Appropriate balance diagnostics are highlighted in our forthcoming review and are described in greater detail elsewhere (Austin, 2009b).

Propensity score matched sample

We compared the means and prevalences of continuous and dichotomous baseline covariates between treatment groups in the matched sample. The standardized difference was used to quantify differences in means or prevalences between treatment groups. Furthermore, we compared balance between treatment groups in all pairwise interactions of continuous covariates. The variance of continuous variables was compared between treatment groups in the matched sample. Finally, cumulative density plots and quantile-quantile plots were used to compare the distribution of continuous baseline covariates between treatment groups.

The reader should note that statistical significance testing was not used to compare the baseline characteristics of treated and untreated participants in the propensity score matched sample. Such practices have been criticized by different authors. Readers are referred elsewhere for a greater discussion of this practice (Austin, 2007a, 2008a, 2008b; Ho, Imai, King, & Stuart, 2007; Imai, King, & Stuart, 2008).

Diagnostics based on comparing the distribution of the propensity score between treated and untreated participants were not used. Recent research has shown that, in the context of propensity score matching, comparing the distribution of the estimated propensity score between treated and untreated participants does not provide any information as to whether the propensity score model has been adequately specified (Austin, 2009b). For similar reasons, the c statistic (equivalent to the area under the receiver operating characteristic [ROC] curve) of the propensity score model was not reported. The c statistic does not provide information as to whether the propensity score model has been adequately specified (Austin, 2009b; Weitzen, Lapane, Toledano, Hume, & Mor, 2005).

Stratification on the propensity score

Within each stratum of the propensity score, standardized differences were used to compare the means and prevalences of measured baseline covariates between treatment groups. Within-quintile standardized differences were computed for each of the 55 pairwise interactions between continuous variables.

Inverse probability of treatment weighting

In the sample weighted by the inverse probability of treatment, we computed standardized differences to compare the balance of baseline covariates between treatment groups. We also used standardized differences to compare balance on pairwise interactions between continuous baseline covariates. Empirical cumulative distribution functions and quantile-quantile plots were also used to compare the distribution of continuous baseline covariates between treatment groups in the weighted sample.
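A weighted standardized difference can be sketched as follows. The weighted variance formula used here is one common choice that reduces to the usual sample variance when all weights are equal; it is an assumption of this sketch, as are the function names, and the article does not spell out the formula it used.

```python
import math


def weighted_mean(x, w):
    return sum(wi * xi for xi, wi in zip(x, w)) / sum(w)


def weighted_variance(x, w):
    """Weighted sample variance (assumed form).

    With equal weights this equals the ordinary (n - 1) sample variance.
    """
    sw = sum(w)
    sw2 = sum(wi * wi for wi in w)
    m = weighted_mean(x, w)
    ss = sum(wi * (xi - m) ** 2 for xi, wi in zip(x, w))
    return sw / (sw * sw - sw2) * ss


def weighted_std_diff(x_treated, w_treated, x_control, w_control):
    """Standardized difference of a covariate in the weighted sample."""
    pooled_sd = math.sqrt(
        (weighted_variance(x_treated, w_treated)
         + weighted_variance(x_control, w_control)) / 2
    )
    diff = weighted_mean(x_treated, w_treated) - weighted_mean(x_control, w_control)
    return diff / pooled_sd
```

Applying the same function to the product of two continuous covariates gives a balance check on their pairwise interaction.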

Covariate adjustment using the propensity score

Austin (2008c) described the weighted conditional absolute standardized difference for comparing balance in baseline covariates after adjusting for the propensity score. Briefly, a given baseline covariate is regressed on the following three variables: the propensity score, an indicator variable denoting treatment assignment, and the interaction between the first two variables. Linear regression is used for continuous covariates, whereas logistic regression is used for dichotomous covariates. From the fitted regression model, for a given value of the propensity score, the mean response is determined assuming a participant was treated and then assuming the participant was untreated. The absolute standardized difference between the mean response for treated participants and the mean response for untreated participants is then determined. For continuous outcomes, this calculation will also use the estimate of the variance of the error term that was obtained from the linear model. This conditional (on the propensity score) absolute standardized difference is then integrated over the distribution of the propensity score in the study sample.

A second balance diagnostic involves the use of quantile regression (Austin, Tu, Daly, & Alter, 2005). For a given continuous baseline covariate, quantile regression was used to regress the given baseline covariate on the estimated propensity score in treated and untreated participants separately. The use of the 5th, 25th, 50th, 75th, and 95th percentiles has been previously suggested (Austin, 2008c). The model-based estimates of these quantiles in treated and untreated participants can then be displayed graphically.

Estimating Treatment Effects

As noted earlier, we considered two different outcomes: survival to 3 years postdischarge (a dichotomous outcome) and time to death (a time-to-event outcome) with participants censored at 3 years following hospital discharge.

Propensity score matching

The difference in the probability of 3-year mortality between treatment groups was estimated by directly estimating the difference in proportions between treated and untreated participants in the propensity score matched sample. When estimating the statistical significance of treatment effects, the use of methods that account for the matched nature of the sample is recommended (Austin, 2009d, in press-b). Accordingly, McNemar's test was used to assess the statistical significance of the risk difference. Confidence intervals were constructed using a method proposed by Agresti and Min (2004) that accounts for the matched nature of the sample. The number needed to treat (NNT) is the reciprocal of the absolute risk reduction. The relative risk was estimated as the ratio of the probability of 3-year mortality in treated participants compared with that of untreated participants in the matched sample. Methods described by Agresti and Min were used to estimate 95% confidence intervals.
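The matched-pair quantities described above can be computed directly from the paired outcomes. The sketch below assumes McNemar's test without a continuity correction and does not implement the Agresti and Min confidence intervals; the function name and the returned dictionary keys are hypothetical.

```python
import math


def matched_pair_summary(pairs):
    """Summarize a binary outcome in a 1:1 matched sample.

    pairs: list of (y_treated, y_untreated) with 1 = died, 0 = survived.
    Assumes at least one death among untreated participants when the
    relative risk is needed.
    """
    n = len(pairs)
    # Discordant pairs drive McNemar's test
    b = sum(1 for yt, yu in pairs if yt == 1 and yu == 0)
    c = sum(1 for yt, yu in pairs if yt == 0 and yu == 1)
    risk_treated = sum(yt for yt, _ in pairs) / n
    risk_untreated = sum(yu for _, yu in pairs) / n
    rd = risk_treated - risk_untreated
    chi2 = (b - c) ** 2 / (b + c) if (b + c) > 0 else 0.0
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return {
        "risk_difference": rd,
        "relative_risk": risk_treated / risk_untreated if risk_untreated > 0 else math.inf,
        "nnt": 1 / abs(rd) if rd != 0 else math.inf,
        "mcnemar_chi2": chi2,
        "p_value": p_value,
    }
```

Only the discordant pairs contribute to the test statistic, which is how the test accounts for the matched nature of the sample.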

We then estimated the effect of provision of smoking cessation counseling on the time to death. Kaplan-Meier survival curves were estimated separately for treated and untreated participants in the propensity score matched sample. The log-rank test is not appropriate for comparing the Kaplan-Meier survival curves between treatment groups because the test assumes two independent samples (Harrington, 2005; Klein & Moeschberger, 1997). However, the stratified log-rank test is appropriate for matched pairs data (Klein & Moeschberger, 1997).

Finally, we used a Cox proportional hazards model to regress survival time on an indicator variable denoting treatment status (smoking cessation counseling vs. no counseling). As the propensity score matched sample does not consist of independent observations, we used a marginal survival model with robust standard errors (Lin & Wei, 1989). An alternative to the use of a marginal model with robust variance estimation would be to fit a Cox proportional hazards model that stratified on the matched pairs (Cummings, McKnight, & Greenland, 2003). This approach accounts for the within-pair homogeneity by allowing the baseline hazard function to vary across matched sets.

Stratification on the propensity score

We estimated the probability of 3-year mortality for participants in each treatment group in each propensity score stratum. The absolute reduction in the probability of 3-year mortality was then determined in each propensity score stratum as the difference between the observed probability for treated participants and the observed probability for untreated participants within that stratum. The overall estimated treatment effect was the mean of the stratum-specific risk differences. The standard error of each stratum-specific risk difference can be estimated using standard methods for differences in two binomial proportions. The stratum-specific standard errors can then be pooled to obtain the standard error of the overall risk difference. We also obtained the Mantel-Haenszel estimate of the pooled relative risk across the propensity score strata (Breslow & Day, 1987).
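The pooling of stratum-specific risk differences can be sketched as below. The sketch assumes equal stratum weights (appropriate when strata are of approximately equal size) and treats the strata as independent when pooling the variances; the function name and input layout are hypothetical.

```python
import math


def stratified_risk_difference(strata):
    """Pool stratum-specific risk differences.

    strata: list of tuples
        (deaths_treated, n_treated, deaths_untreated, n_untreated),
    one tuple per propensity score stratum.
    Returns (overall risk difference, pooled standard error).
    """
    k = len(strata)
    rds, variances = [], []
    for d1, n1, d0, n0 in strata:
        p1, p0 = d1 / n1, d0 / n0
        rds.append(p1 - p0)
        # Variance of a difference in two independent binomial proportions
        variances.append(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    overall = sum(rds) / k
    # Variance of a mean of k independent estimates
    se = math.sqrt(sum(variances)) / k
    return overall, se
```

For two strata with counts `(10, 100, 20, 100)` and `(5, 50, 10, 50)`, both stratum-specific risk differences are -0.10, so the pooled estimate is -0.10 with a standard error of about 0.043.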

To estimate the effect of counseling on survival, we used a Cox proportional hazards model to regress survival time on treatment status. The model stratified on the propensity score strata, allowing the baseline hazard to vary across the strata.

As a sensitivity analysis, we also stratified the entire study sample into 10 approximately equal-size groups using the deciles of the estimated propensity score.

Propensity score weighting

We estimated the absolute reduction in the probability of mortality within 3 years of hospital discharge due to receipt of in-patient smoking cessation counseling using a method described by Lunceford and Davidian (2004). As mentioned earlier, let $Z_i$ be an indicator variable denoting whether or not the ith participant was treated; furthermore, let $e_i$ denote the propensity score for the ith participant. The weights are defined as $w_i = \frac{Z_i}{e_i} + \frac{1 - Z_i}{1 - e_i}$. Assume that $Y_i$ denotes the outcome variable measured on the ith participant. The first estimate of the average treatment effect (ATE) is $\frac{1}{n}\sum_{i=1}^{n}\frac{Z_i Y_i}{e_i} - \frac{1}{n}\sum_{i=1}^{n}\frac{(1 - Z_i) Y_i}{1 - e_i}$, where n denotes the number of participants in the full sample. Lunceford and Davidian also provide estimates of the standard error of the estimated treatment effect.
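A minimal sketch of this first weighted estimator follows: the weighted sum of outcomes among treated participants, divided by n, minus the corresponding quantity among untreated participants. The function name is hypothetical, and the sketch omits the standard error.

```python
def ipw_ate(y, z, e):
    """Inverse probability of treatment weighted estimate of the ATE.

    y: outcomes; z: 0/1 treatment indicators; e: propensity scores.
    Each sum runs over the full sample and is divided by n.
    """
    n = len(y)
    mean_treated = sum(zi * yi / ei for yi, zi, ei in zip(y, z, e)) / n
    mean_untreated = sum(
        (1 - zi) * yi / (1 - ei) for yi, zi, ei in zip(y, z, e)
    ) / n
    return mean_treated - mean_untreated
```

For a binary mortality outcome, the result is an estimated difference in the probability of death between treating everyone and treating no one.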

We used a second weighted estimator, also described by Lunceford and Davidian (2004), from the family of doubly robust estimators. This estimator requires specifying the propensity score model and regression models relating the expected outcome to baseline covariates in treated and untreated subjects separately. Let $m_z(X, \alpha_z) = E(Y \mid Z = z, X)$. Then

$$\hat{\Delta}_{DR} = \frac{1}{n}\sum_{i=1}^{n}\frac{Z_i Y_i - (Z_i - e_i)\, m_1(X_i, \hat{\alpha}_1)}{e_i} - \frac{1}{n}\sum_{i=1}^{n}\frac{(1 - Z_i) Y_i + (Z_i - e_i)\, m_0(X_i, \hat{\alpha}_0)}{1 - e_i}$$

has a “double-robustness” property in that the estimator remains consistent if either the propensity score model is correctly specified or if both the outcomes regression models are correctly specified (Lunceford & Davidian, 2004). For the outcomes-regression models, we used logistic regression models in which the dichotomous outcome was regressed on the 33 baseline covariates described in Table 1.
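The doubly robust estimator can be sketched as below, taking as given the predicted outcomes from the two outcome regression models. Here `m1` and `m0` are hypothetical names for the per-participant predictions under treatment and under no treatment; fitting those models is outside the scope of the sketch.

```python
def doubly_robust_ate(y, z, e, m1, m0):
    """Doubly robust estimate of the ATE.

    y: outcomes; z: 0/1 treatment indicators; e: propensity scores;
    m1, m0: predicted outcomes for each participant if treated and if
    untreated, from separately fitted outcome regression models.
    """
    n = len(y)
    mean_treated = sum(
        (zi * yi - (zi - ei) * a) / ei
        for yi, zi, ei, a in zip(y, z, e, m1)
    ) / n
    mean_untreated = sum(
        ((1 - zi) * yi + (zi - ei) * b) / (1 - ei)
        for yi, zi, ei, b in zip(y, z, e, m0)
    ) / n
    return mean_treated - mean_untreated
```

The robustness property can be seen in a small example: when the outcome models reproduce the outcomes exactly, the estimate equals the difference in mean predictions regardless of the propensity scores supplied.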

We then used an approach similar to the aforementioned to estimate the relative reduction in the probability of mortality within 3 years. Each of the estimators described by Lunceford and Davidian (2004) was modified to estimate the relative risk rather than the difference in risks. Confidence intervals were estimated using nonparametric bootstrap methods with 1,000 bootstrap samples.
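A nonparametric bootstrap confidence interval can be sketched as follows. The percentile method is assumed here because the text does not state which bootstrap interval was used; the function name and the fixed seed are also assumptions. In a full analysis, `data` would hold one record per participant and `statistic` would repeat the entire estimation procedure, including re-estimating the propensity score, on each resample.

```python
import random


def bootstrap_ci(data, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval.

    data: list of per-participant records
    statistic: function mapping a resampled list to a number
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = estimates[round(alpha / 2 * n_boot)]
    upper = estimates[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

With 1,000 resamples and a 95% interval, the limits are read from near the 2.5th and 97.5th percentiles of the sorted bootstrap estimates.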

We then used logistic regression to regress survival to 3 years (a dichotomous outcome) on an indicator variable denoting receipt of in-patient smoking cessation counseling in the weighted sample. Standard errors were obtained using a robust variance estimate (Joffe, Ten Have, Feldman, & Kimmel, 2004). The logistic regression model was then modified by adjusting for the 33 variables in Table 1.

Our second outcome was time to death with participants censored at 3 years after hospital discharge. We used two different methods to estimate the effect of smoking cessation counseling on time to death in the weighted sample. First, we fit a Cox proportional hazards model with counseling as the only predictor variable. We used the inverse probability of treatment weights. Furthermore, we used a robust sandwich variance estimator to account for the weighted nature of the sample. Our second approach was based on the method of Xie and Liu (2005) to estimate adjusted Kaplan-Meier estimates of survival curves in a sample weighted by the inverse probability of treatment. Xie and Liu proposed a weighted version of the log-rank test to test the null hypothesis that the two survival curves are equal to one another.

Covariate adjustment using the propensity score

We considered two different approaches to using covariate adjustment using the propensity score. The first approach is based on regressing the outcome on two independent variables: an indicator variable denoting treatment assignment and the estimated propensity score. For the binary outcome (survival to 3 years postdischarge) a logistic regression model was used, whereas for the time-to-event outcome, a Cox proportional hazards regression model was used.

The aforementioned two approaches, although common in the medical literature, have been shown to result in biased estimates of conditional odds ratios and hazard ratios (Austin, Grootendorst, Normand, & Anderson, 2007). Furthermore, the aforementioned approach for binary outcomes has also been shown to result in biased estimation of marginal odds ratios (Austin, 2007b). Thus, we implemented an approach based on one described by Imbens (2004). This approach is similar to one described by Austin (2010b) for estimating marginal treatment effects using logistic regression models. The aforementioned logistic regression model was fit to the sample. Then for each participant, two predicted probabilities were obtained: the probability of the outcome if the participant had been treated and the probability of the outcome if the participant had been untreated. The average probability of the outcome if untreated can then be determined over all participants in the full study sample. Similarly, the average probability of the outcome if treated can then be determined over all participants in the sample. The difference between these two probabilities is the average treatment effect (Imbens, 2004). Confidence intervals were obtained using nonparametric bootstrap techniques (Efron & Tibshirani, 1993). A similar approach can be used to estimate the relative reduction in death due to smoking cessation counseling. The aforementioned approach can be replicated for the time-to-event outcome (Austin, 2010c). Using this approach, one can determine the absolute reduction in the probability of an event occurring within a specified duration of follow-up.
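The averaging of predicted probabilities can be sketched as below. The sketch takes the coefficients of an already fitted logistic regression of the outcome on treatment status and the propensity score as inputs; the parameter names `b0`, `b_treat`, and `b_ps` are hypothetical, and model fitting and the bootstrap are omitted.

```python
import math


def expit(v):
    return 1.0 / (1.0 + math.exp(-v))


def marginal_effects(ps, b0, b_treat, b_ps):
    """Marginal risk difference and relative risk from a fitted model
    logit P(Y = 1) = b0 + b_treat * Z + b_ps * e.

    ps: estimated propensity scores for all participants in the sample.
    """
    n = len(ps)
    # Predicted probability for every participant under each treatment
    p_if_treated = sum(expit(b0 + b_treat + b_ps * e) for e in ps) / n
    p_if_untreated = sum(expit(b0 + b_ps * e) for e in ps) / n
    return p_if_treated - p_if_untreated, p_if_treated / p_if_untreated
```

Because both averages are taken over the same participants, the resulting contrast is a marginal effect rather than the conditional odds ratio given by the treatment coefficient itself.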

Regression Adjustment

For comparative purposes, we used regression adjustment to estimate the effect of smoking cessation counseling on mortality. First, logistic regression was used to regress an indicator variable denoting survival to 3 years postdischarge on an indicator variable denoting receipt of smoking cessation counseling and the 33 baseline covariates listed in Table 1. The logistic regression model was then modified by using restricted cubic smoothing splines to model the relationship between continuous baseline covariates and the log-odds of mortality.

We then used a Cox proportional hazards model to regress survival time on treatment status and the 33 baseline covariates listed in Table 1. We then modified the Cox proportional hazards model by using restricted cubic smoothing splines to model the relationship between continuous baseline covariates and the log-hazard of mortality.

RESULTS

Sample Description

The study sample for this case study consisted of 2,342 participants, of whom 1,588 received in-patient smoking cessation counseling and 754 did not. The baseline characteristics of exposed and unexposed participants are described in Table 1. Patients receiving smoking cessation counseling tended to be younger (p < .001), were less likely to be female (p = .032), tended to have a lower burden of comorbid conditions, and were more likely to receive prescriptions for cardiac medications at hospital discharge compared with patients who did not receive in-patient smoking cessation counseling. There were statistically significant differences in 22 of the 33 baseline characteristics between exposed and unexposed participants in the study sample. Twenty of the variables had standardized differences that exceeded 0.10. Thus, as is typical in observational studies, there were systematic differences in baseline characteristics between treated and untreated patients.

There were no statistically significant differences in basic demographic characteristics (age and sex) and in the probability of death within 3 years of discharge between participants with complete data on baseline covariates and participants who were excluded due to missing data on baseline covariates.