Abstract
As health care expenditures increase, patient cost mitigation becomes increasingly important. Cost mitigation through intervention programs such as accountable care organizations relies on the ability to accurately predict patient risk, which is notoriously difficult because of highly skewed data. We examine the Medicare Limited dataset (a 5% sample of Medicare claims), which includes demographics, costs, and health conditions. We first consider the Hierarchical Condition Category (HCC) linear model currently used by the Centers for Medicare and Medicaid Services (CMS) and then implement more complex two-part generalized additive and random forest models to predict patient costs in a future year based on current-year data. We find that the latter models more accurately predict the entire distribution of Medicare patient costs and can better support the existing cost mitigation frameworks. The two-part lognormal generalized additive model is chosen as the optimal model for its robust performance and reasonable interpretability when the data have extreme values.
Acknowledgments
The authors thank students in PSTAT 196 and PSTAT 296 who worked on different versions of this study: Jordan Bu, Nathan Hwangbo, Ming Zhang, Elise Bonfiglio, Jordan Jang, and Ming Yi, as well as Qimeng Zhang of Santa Barbara Actuaries for data preprocessing.
Notes
1 “Allowed charges” is the amount of a claim recognized for payment by CMS after deduction of applicable discounts but before patient cost-sharing. Because of differences in cost-sharing, it is a more reliable measure of cost than net paid claims.
2 https://www2.ccwdata.org/web/guest/condition-categories (accessed on 01/26/2022)