ABSTRACT
Bayesian models are increasingly fit to large administrative datasets and then used to make individualized recommendations. In particular, Medicare’s Hospital Compare webpage provides patients with information about hospital-specific mortality rates for heart attack, or acute myocardial infarction (AMI). Hospital Compare’s current recommendations are based on a random-effects logit model with a random hospital indicator and patient risk factors. Except for the largest hospitals, these individual recommendations or predictions are not checkable against data, because data from smaller hospitals are too limited to provide a meaningful check. Before individualized Bayesian recommendations became available, general advice was derived from empirical studies of many hospitals, for example: prefer hospitals of Type 1 to hospitals of Type 2, because the risk is lower at Type 1 hospitals. Here, we calibrate these Bayesian recommendation systems by checking, out of sample, whether their predictions aggregate to give correct general advice derived from another sample. This process of calibrating individualized predictions against general empirical advice leads to substantial revisions of the Hospital Compare model for AMI mortality. To make appropriately calibrated predictions, our revised models incorporate information about hospital volume, nursing staff, medical residents, and the hospital’s ability to perform cardiovascular procedures. For the ultimate purpose of comparison, hospital mortality rates must be standardized to adjust for variation in patient mix across hospitals. We find that indirect standardization, as currently used by Hospital Compare, fails to adequately control for differences in patient risk factors and systematically underestimates mortality rates at low-volume hospitals. To provide good control and correctly calibrated rates, we propose direct standardization instead. Supplementary materials for this article are available online.
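The contrast between the two standardization schemes can be made concrete with a small simulation. The sketch below uses entirely hypothetical data (three hospitals, one risk covariate, assumed logit coefficients); it is an illustration of the general direct-versus-indirect distinction, not of the paper’s fitted models. Direct standardization applies every hospital’s risk model to one common reference population, so the resulting rates differ only through hospital quality; indirect standardization rescales each hospital’s observed-to-expected ratio, computed on that hospital’s own patient mix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: three hospitals whose quality differs only through
# an additive effect on the logit scale; hospital 2 is the worst.
hosp_effect = np.array([-0.3, 0.0, 0.3])

def p_mort(x, h):
    """Mortality probability for a patient with risk score x at hospital h."""
    return 1.0 / (1.0 + np.exp(-(-2.0 + 1.5 * x + hosp_effect[h])))

# Each hospital's own case mix: hospital 2 also treats sicker patients,
# which is exactly what standardization must adjust for.
mix = [rng.normal(m, 1.0, 5000) for m in (0.0, 0.4, 0.8)]

# Direct standardization: every hospital's risk model is applied to ONE
# common reference population (here, the pooled patients).
ref = np.concatenate(mix)
direct = np.array([p_mort(ref, h).mean() for h in range(3)])

# Indirect standardization: compare each hospital's rate on its OWN
# patients with the rate an "average" hospital (effect 0, i.e. hospital 1)
# would achieve on those same patients, then rescale by the overall rate.
overall = np.mean([p_mort(mix[h], h).mean() for h in range(3)])
indirect = np.array([
    p_mort(mix[h], h).mean() / p_mort(mix[h], 1).mean() * overall
    for h in range(3)
])
```

Because the direct rates are computed on a shared population, they order the hospitals by quality alone; the indirect rates mix quality with each hospital’s own case mix through the nonlinearity of the logit, which is the mechanism behind the bias discussed in the abstract.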
Supplementary Materials
Appendix A.1 details the MCMC implementation for simulated sampling from our hierarchical logit model posteriors. This entails successive substitution Gibbs sampling from the full conditionals obtained with a suitable Polya-Gamma latent variable augmentation of the posterior. Appendix A.2 illustrates the relationship between mortality rates and the number of hospital beds with a model that excludes volume. This gives further insight into the persistent relationship between hospital mortality and hospital size. Appendix A.3 provides a second out-of-sample calibration example using US News and World Report hospital rankings. Appendix A.4 provides the full cross-classification of low, average, and high hospital mortality rates by the (C,C) and (SLI,L) models.
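The Polya-Gamma augmentation scheme referenced above can be sketched for a plain Bayesian logistic regression. The sketch below is a simplified stand-in for the hierarchical model in Appendix A.1: the data are synthetic, the prior is an assumed diagonal normal, and the PG(1, c) draws use a truncated version of the infinite gamma-sum representation rather than the exact rejection sampler used in production implementations (e.g., the `polyagamma` package).

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_pg(c, rng, K=100):
    """Approximate elementwise draw from PG(1, c) via a truncated gamma sum:
    omega = (1 / 2 pi^2) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)),
    with g_k ~ Gamma(1, 1). A sketch; exact samplers use rejection methods."""
    k = np.arange(1, K + 1)
    g = rng.gamma(1.0, 1.0, size=(c.size, K))
    denom = (k - 0.5) ** 2 + (c[:, None] / (2.0 * np.pi)) ** 2
    return (g / denom).sum(axis=1) / (2.0 * np.pi ** 2)

def gibbs_logit(X, y, n_iter=300, b_prec=0.01, rng=rng):
    """Successive substitution Gibbs sampler for a logit model with
    prior beta ~ N(0, b_prec^{-1} I), using PG data augmentation."""
    n, p = X.shape
    beta = np.zeros(p)
    kappa = y - 0.5                                  # y recentred for the PG identity
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        omega = sample_pg(X @ beta, rng)             # omega_i | beta ~ PG(1, x_i' beta)
        V = np.linalg.inv(X.T * omega @ X + b_prec * np.eye(p))
        m = V @ (X.T @ kappa)
        beta = rng.multivariate_normal(m, V)         # beta | omega is exactly normal
        draws[t] = beta
    return draws

# Synthetic check: recover a known coefficient vector.
n, beta_true = 800, np.array([-1.0, 1.5])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))
draws = gibbs_logit(X, y)
post_mean = draws[100:].mean(axis=0)                 # discard burn-in
```

The augmentation makes each full conditional a standard distribution, which is what allows the plain Gibbs ("successive substitution") updates described in Appendix A.1.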
Acknowledgments
The authors are especially grateful to Nabanita Mukherjee, an associate editor, and anonymous referees for their many constructive suggestions.
Funding
This work was supported by Agency for Healthcare Research and Quality grant R21-HS021854 and by National Science Foundation grants SBS-1260782 and DMS-1406563.