Respiratory Medicine

Development and evaluation of a predictive algorithm for unsatisfactory response among patients with pulmonary arterial hypertension using health insurance claims data

Pages 1019-1030 | Received 03 Dec 2021, Accepted 01 Mar 2022, Published online: 17 Mar 2022
 

Abstract

Objective

This study aimed to develop and validate a predictive algorithm for unsatisfactory response to initial pulmonary arterial hypertension (PAH) therapy using health insurance claims.

Methods

Adult patients with PAH initiated on a first PAH therapy (index date) were identified from Optum’s de-identified Clinformatics Data Mart Database (1/1/2010–12/31/2019). A random survival forest algorithm was developed using patient-month data and predicted the “survival function” (i.e. risk of not having unsatisfactory response) over time. For each patient-month observation, risk factors were assessed in the 12 months prior. Unsatisfactory response was defined as the first instance of (1) new PAH therapy, (2) PAH-related hospitalization or emergency room visit, (3) lung transplant or atrial septostomy, (4) PAH-related death or (5) chronic oxygen therapy initiation. To facilitate use in clinical practice, a simplified risk score was also developed based on a linear combination of the most important risk factors identified in the algorithm.
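The composite endpoint above (the first instance of any of five events) can be illustrated with a minimal sketch. This is not the authors' code: the component labels, the `time_to_unsatisfactory_response` helper, the days-to-months conversion, and the rule that only events strictly after the index date count are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical labels for the five endpoint components listed in the Methods.
COMPONENTS = (
    "new_pah_therapy",
    "pah_hospitalization_or_er",
    "transplant_or_septostomy",
    "pah_death",
    "chronic_oxygen",
)

DAYS_PER_MONTH = 30.4375  # assumed average month length


def time_to_unsatisfactory_response(index_date, event_dates, end_of_followup):
    """Return (months, had_event, component) for one patient.

    event_dates maps a component label to the date of its first occurrence;
    components that never occurred are simply absent from the mapping.
    """
    observed = [
        (when, name)
        for name, when in event_dates.items()
        # Assumption: only events after the index date and within follow-up count.
        if name in COMPONENTS and index_date < when <= end_of_followup
    ]
    if not observed:
        # No qualifying event: the patient is censored at end of follow-up.
        days = (end_of_followup - index_date).days
        return days / DAYS_PER_MONTH, False, None
    first_date, component = min(observed)  # earliest event defines the endpoint
    return (first_date - index_date).days / DAYS_PER_MONTH, True, component


# Made-up patient: a therapy change precedes oxygen initiation.
months, had_event, component = time_to_unsatisfactory_response(
    date(2015, 1, 1),
    {"chronic_oxygen": date(2015, 7, 1), "new_pah_therapy": date(2015, 4, 1)},
    date(2016, 1, 1),
)
```

In this made-up example the earlier of the two events, the therapy change, defines the time to unsatisfactory response.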

Results

In total, 4781 patients were included (median age = 69.0 years; 58.6% female). Over a median follow-up of 14.0 months, 3169 (66.3%) had an unsatisfactory response. The most important risk factors included in the algorithm were healthcare resource use (i.e. PAH-related outpatient visits, pulmonologist visits, cardiologist visits, all-cause hospitalizations), time since first PAH diagnosis, time since index date, Charlson Comorbidity Index, dyspnea, and age. Predictive accuracy was good for the full algorithm (C-statistic: 0.732) but was slightly lower for the simplified risk score (C-statistic: 0.668).
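For readers unfamiliar with the C-statistic, the sketch below computes a concordance index for censored time-to-event data. The abstract does not say which variant was used, so Harrell's definition is assumed here, and the function name and toy data are hypothetical. A pair of patients is comparable when the one with the shorter time had an observed event; the pair is concordant when that patient also has the higher predicted risk.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index (tied event times are not handled specially).

    times: follow-up time per patient
    events: True if the event was observed, False if censored
    risk_scores: higher value means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, not from the study: four patients, the third one censored.
c_index = harrell_c(
    times=[2, 4, 6, 8],
    events=[True, True, False, True],
    risk_scores=[0.9, 0.7, 0.8, 0.1],
)
print(c_index)  # 0.8 (4 of 5 comparable pairs are concordant)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives context for the reported values of 0.732 and 0.668.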

Conclusion

The present claims-based algorithm performed well in predicting time to unsatisfactory response following initial PAH therapy.

Transparency

Declaration of funding

This work was supported by Janssen Scientific Affairs, LLC. (Grant number: N/A).

Declaration of financial/other relationships

Janssen Scientific Affairs, LLC sponsored this study. Yuen Tsang, Peter Agron, Karimah S. Bell Lynum, and Sumeet Panjabi are employees of Janssen Scientific Affairs, LLC and may own stocks. Marjolaine Gauthier-Loiselle, Patrick Lefebvre, and Jimmy Royer are employees of Analysis Group, Inc., and Lucas Bennett is a former employee of Analysis Group, Inc., which has received consultancy fees from Janssen Scientific Affairs, LLC for the conduct of this study. The study sponsor was involved in all aspects of the research, including the collection of data, its analysis and interpretation, and approval of the final manuscript for publication. Peer reviewers on this manuscript have no relevant financial or other relationships to disclose.

Author contributions

All authors participated in the design of the study and in data interpretation, and critically revised the intellectual content of this manuscript. Marjolaine Gauthier-Loiselle, Patrick Lefebvre, Jimmy Royer, and Lucas Bennett participated in the data analysis. All authors approved the final manuscript.

Acknowledgements

The authors would like to thank Emma Russon, who is a former employee of Analysis Group, Inc., for her contribution to the development of the algorithm. The authors would also like to thank Tom Cornwall, who is currently employed at Analysis Group, Inc., for his contributions to data analysis and interpretation. Medical writing assistance was provided by Mona Lisa Chanda, PhD, an employee of Analysis Group, Inc. Funding for this assistance was provided by Janssen Scientific Affairs, LLC.

Data availability statement

The data that support the findings of this study are available from Optum, but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Researchers interested in obtaining the data used in this study can access the database through Optum under a license agreement, which includes payment of the appropriate license fee.

Notes

i Optum Clinformatics is a registered trademark of OptumInsight Life Sciences, Inc., Eden Prairie, MN, USA.

ii While interaction terms can be added to a Cox regression model, the difficulty arises from high dimensionality. With k predictors, accommodating all two-way interactions requires on the order of k² parameters, all three-way interactions approximately k³, and so on. The number of parameters very quickly exceeds the number of observations, such that the model cannot be properly estimated.

iii For the development of such algorithms, 500 trees are often used because performance improves only marginally as the number of trees increases and because of the computational burden. Based on the literature, 128 trees are presumed to be sufficient (e.g. Oshiro, Thais Mayumi, Pedro Santoro Perez, and José Augusto Baranauskas. "How many trees in a random forest?" International workshop on machine learning and data mining in pattern recognition. Springer, Berlin, Heidelberg, 2012).
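The dimensionality argument in note ii can be made concrete with a short count. The sketch below tallies the exact number of main-effect and interaction terms for k predictors, which grows on the order of the powers of k cited in the note. The value k = 100 is a hypothetical choice, and the 4781 figure is the patient count from the abstract, used only for scale; the model itself was fit on patient-month observations, whose number is not reported here.

```python
from math import comb


def interaction_parameters(k, max_order):
    """Number of terms when all interactions up to max_order are included.

    Order 1 counts the k main effects, order 2 adds every pair of
    predictors, order 3 adds every triple, and so on.
    """
    return sum(comb(k, order) for order in range(1, max_order + 1))


k = 100            # hypothetical number of candidate risk factors
n_patients = 4781  # patient count reported in the abstract, for scale only

two_way = interaction_parameters(k, 2)    # 100 + 4950 = 5050
three_way = interaction_parameters(k, 3)  # 5050 + 161700 = 166750
print(two_way > n_patients, three_way)
```

Under these assumed numbers, two-way interactions alone already require more parameters than there are patients, which is the situation the note describes.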