Original Research

Use of clinical characteristics to predict spirometric classification of obstructive lung disease

Pages 889-902 | Published online: 12 Mar 2018
 

Abstract

Background

There is no consensus on how to define patients with symptoms of asthma and chronic obstructive pulmonary disease (COPD). A diagnosis of asthma–COPD overlap (ACO) syndrome has been proposed, but its value is debated. This study (GSK Study 201703 [NCT02302417]) investigated the ability of statistical modeling approaches to define distinct disease groups in patients with obstructive lung disease (OLD) using medical history and spirometric data.

Methods

Patients aged ≥18 years with diagnoses of asthma and/or COPD were categorized into three groups: 1) asthma (nonobstructive; reversible), 2) ACO (obstructive; reversible), and 3) COPD (obstructive; nonreversible). Obstruction was defined as a post-bronchodilator forced expiratory volume in 1 second (FEV1)/forced vital capacity <0.7, and reversibility as a post-albuterol increase in FEV1 ≥200 mL and ≥12%. A primary model (PM), based on patients’ responses to a health care practitioner-administered questionnaire, was developed using multinomial logistic regression modeling. Other multivariate statistical analysis models for identifying asthma and COPD as distinct entities were developed and assessed using receiver operating characteristic (ROC) analysis. Partial least squares discriminant analysis (PLS-DA) assessed the degree of overlap between groups.
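The three-group spirometric rule can be written as a small decision function. This is a minimal Python sketch, assuming the post-bronchodilator FEV1/FVC ratio and the reversibility values (mL and %) have already been computed. The function name `spirometric_group` is hypothetical. Returning `None` for patients who are neither obstructed nor reversible is also an assumption, because the text does not say how such patients were handled.

```python
from typing import Optional

# Thresholds as stated in the Methods
OBSTRUCTION_RATIO = 0.7    # post-bronchodilator FEV1/FVC below this = obstructed
REVERSIBILITY_ML = 200.0   # minimum post-albuterol FEV1 increase, mL
REVERSIBILITY_PCT = 12.0   # minimum post-albuterol FEV1 increase, %


def spirometric_group(post_bd_ratio: float,
                      reversibility_ml: float,
                      reversibility_pct: float) -> Optional[str]:
    """Hypothetical helper: map spirometry results to asthma, ACO, or COPD."""
    obstructed = post_bd_ratio < OBSTRUCTION_RATIO
    # Both the absolute and the relative criterion must be met
    reversible = (reversibility_ml >= REVERSIBILITY_ML
                  and reversibility_pct >= REVERSIBILITY_PCT)
    if reversible and not obstructed:
        return "asthma"   # nonobstructive; reversible
    if reversible and obstructed:
        return "ACO"      # obstructive; reversible
    if obstructed:
        return "COPD"     # obstructive; nonreversible
    return None           # assumption: nonobstructive and nonreversible is unassigned


print(spirometric_group(0.62, 250, 14))  # ACO
```

Because both reversibility criteria are required, an obstructed patient with a 250 mL but only 10% increase would fall into the COPD group rather than the ACO group.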

Results

The PM predicted spirometric classifications with modest sensitivity. Other analysis models performed with high discrimination (area under the ROC curve: asthma model, 0.94; COPD model, 0.87). PLS-DA identified distinct phenotypic groups corresponding to asthma and COPD.

Conclusion

Within the OLD spectrum, patients with asthma or COPD can be identified as two distinct groups with a high degree of precision. Patients outside these classifications do not constitute a homogeneous group.

Supplementary materials

Spirometry

This study used spirometry data obtained as part of enrollment in GSK Studies 200699 (NCT02164539; data were recorded using the ERT Masterscope CT) or 201496 (NCT02299375), or obtained as part of routine clinical management within the 6 months before the patient provided consent for study inclusion (data were not accessed until the patient had provided written consent for study enrollment). For patients without spirometry data in this period, spirometry was performed at the clinic visit during which the survey was administered.

Data from the following spirometry assessments, performed according to accepted American Thoracic Society/European Respiratory Society standards,[1] were recorded in all patients: percentage predicted forced expiratory volume in 1 second (FEV1), pre- and post-bronchodilator FEV1 (L), pre- and post-bronchodilator forced vital capacity (FVC) (L), pre- and post-bronchodilator FEV1/FVC ratio, and reversibility (mL and %). Reversibility was calculated as follows:

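$$\text{Reversibility (mL)} = \left(\text{FEV}_{1,\text{post-albuterol}} - \text{FEV}_{1,\text{pre-albuterol}}\right) \times 1{,}000$$

$$\text{Reversibility (\%)} = \frac{\text{FEV}_{1,\text{post-albuterol}} - \text{FEV}_{1,\text{pre-albuterol}}}{\text{FEV}_{1,\text{pre-albuterol}}} \times 100$$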
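The two reversibility formulas translate directly into code. This is a minimal Python sketch, assuming FEV1 values are recorded in liters as listed above; the function name `reversibility` is hypothetical.

```python
from typing import Tuple


def reversibility(fev1_pre_l: float, fev1_post_l: float) -> Tuple[float, float]:
    """Hypothetical helper: (reversibility in mL, reversibility in %) from FEV1 in L."""
    change_l = fev1_post_l - fev1_pre_l
    change_ml = change_l * 1000                # liters to milliliters
    change_pct = change_l / fev1_pre_l * 100   # relative to the pre-albuterol value
    return change_ml, change_pct


ml, pct = reversibility(2.00, 2.30)
print(round(ml), round(pct, 1))  # 300 15.0
```

With a pre-albuterol FEV1 of 2.00 L and a post-albuterol FEV1 of 2.30 L, the increase is 300 mL and 15%, so both reversibility criteria (≥200 mL and ≥12%) are met. Because the percentage is relative to the pre-albuterol value, the same 300 mL increase would be only 10% in a patient with a pre-albuterol FEV1 of 3.00 L.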

Statistical considerations

The numbers of patients in each group were based on a preplanned sample size of ~1,000 patients to allow for a broad evaluation of various clinical characteristics associated with asthma and/or chronic obstructive pulmonary disease (COPD).

Data analyses were performed with SAS version 9.3 (SAS Institute, Cary, NC, USA) and R version 3.2.4.[2]

The statistical analyses aimed to identify a primary model (PM) incorporating variables (patient demographics or medical history characteristics based on responses to the patient history questionnaire) that were good predictors of patients’ disease categories (asthma only, COPD only, or asthma–COPD overlap [ACO]), and to evaluate the performance of this model in differentiating patients with ACO from those with asthma or COPD as classified by spirometry. Questionnaire items 39, 40, and 41 were intended only for gathering information on the current diagnosis and were therefore not included as explanatory variables for the modeling.

The univariate distribution of each variable was assessed to identify any differences between patient cohorts. Questionnaire items were classified according to the odds ratios (ORs) comparing the COPD and asthma cohorts with the ACO cohort, using the first answer listed for each question as the reference, as follows:

  • Type 1: ORs comparing the COPD and asthma cohorts with the ACO cohort both significantly different from 1 in the same direction (both >1 or both <1), demonstrating that the ACO cohort responded differently from both the asthma and COPD cohorts and lay at one end of the spectrum of responses.

  • Type 2: ORs both significantly different from 1 in opposite directions (one <1 and the other >1), demonstrating differences in responses between all cohorts, with the ACO cohort falling between the asthma and COPD cohorts in the spectrum of responses.

  • Type 3a: OR significantly different from 1 for comparison with COPD only, demonstrating that the ACO cohort had similar responses to the asthma cohort but different responses from the COPD cohort.

  • Type 3c: OR significantly different from 1 for comparison with asthma only, demonstrating that the ACO cohort had similar responses to the COPD cohort but different responses from the asthma cohort.

  • Type 4: Neither OR significantly different from 1, demonstrating similar responses in all cohorts.
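The item types above depend only on whether each of the two ORs differs significantly from 1 and, if so, in which direction. This is a minimal Python sketch, assuming significance is judged by whether the 95% confidence interval excludes 1; the text does not state the exact criterion. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OddsRatio:
    """Hypothetical container: OR versus the ACO cohort, with its CI."""
    estimate: float
    ci_low: float
    ci_high: float

    def direction(self) -> int:
        """+1 if significantly >1, -1 if significantly <1, 0 otherwise.

        Assumption: 'significant' means the CI excludes 1.
        """
        if self.ci_low > 1:
            return 1
        if self.ci_high < 1:
            return -1
        return 0


def item_type(or_copd: OddsRatio, or_asthma: OddsRatio) -> str:
    """Hypothetical helper: classify an item from the COPD-vs-ACO and asthma-vs-ACO ORs."""
    d_copd, d_asthma = or_copd.direction(), or_asthma.direction()
    if d_copd and d_asthma:
        # Same direction: ACO at one end; opposite directions: ACO in between
        return "Type 1" if d_copd == d_asthma else "Type 2"
    if d_copd:
        return "Type 3a"  # differs from COPD only: ACO resembles asthma
    if d_asthma:
        return "Type 3c"  # differs from asthma only: ACO resembles COPD
    return "Type 4"       # no significant difference from either cohort


print(item_type(OddsRatio(2.4, 1.6, 3.6), OddsRatio(0.5, 0.3, 0.8)))  # Type 2
```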

Associations between categorical item response variables were assessed using Spearman rank-order correlation coefficients. Variables that were moderately or highly correlated (correlation coefficient ≥0.5) were assessed for possible multicollinearity and excluded from the multinomial logistic regression analysis as appropriate. For continuous variables, a similar correlation analysis was performed using Pearson product-moment correlation coefficients.
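The correlation screen can be sketched in plain Python, using the fact that Spearman's coefficient is Pearson's coefficient computed on ranks, with tied values given their average rank. This is a minimal sketch, assuming the 0.5 threshold applies to the absolute value of the coefficient, so strong negative correlations are flagged too. The function names and example variables are hypothetical, and in practice the coefficients would come from the statistical software used in the study.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank-order correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def flag_correlated(data: Dict[str, Sequence[float]],
                    threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Hypothetical helper: pairs with |rho| >= threshold, to review for multicollinearity."""
    flagged = []
    for a, b in combinations(sorted(data), 2):
        rho = spearman(data[a], data[b])
        if abs(rho) >= threshold:
            flagged.append((a, b, round(rho, 3)))
    return flagged


items = {  # hypothetical ordinal item responses for five patients
    "cough": [0, 1, 1, 2, 2],
    "sputum": [0, 1, 1, 2, 2],
    "age_group": [2, 0, 1, 0, 2],
}
print(flag_correlated(items))  # [('cough', 'sputum', 1.0)]
```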

Multinomial logistic regression analysis was then performed to identify a subset of variables able to predict patients’ disease classification, as identified using spirometry, with reasonable accuracy and repeatability. Seven orthogonal partitions of the evaluable population were created using a prespecified random scheme (further details are given in the study protocol, available online from the GSK Clinical Study Register, study number 201703[3]). For each partition, a split-sample approach was employed for cross-validation, using 50% of the data to identify significant predictors and the other 50% for performance evaluation. Age, sex, body mass index (BMI), smoking status, and exacerbation history were considered clinically relevant and were therefore included in every model during the variable selection process. Multinomial logistic regression analysis was performed using PROC LOGISTIC in SAS, with ACO as the reference category, using the maximum likelihood method. For each round of modeling, prespecified stepwise variable selection criteria were used, with a significance level of 0.10 for a variable to enter the model and 0.15 for it to remain in the model.
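The split-sample cross-validation can be illustrated by generating repeated random 50/50 partitions. This is a minimal Python sketch, assuming each of the seven rounds is an independent random split of the evaluable patients into a selection half and a validation half. The actual prespecified randomization scheme is described in the study protocol; the function name and seed here are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_sample_rounds(patient_ids: Sequence[int],
                        n_rounds: int = 7,
                        seed: int = 12345) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical helper: n_rounds (selection, validation) pairs, each a random 50/50 split."""
    rng = random.Random(seed)  # hypothetical seed, fixed for reproducibility only
    rounds = []
    for _ in range(n_rounds):
        shuffled = list(patient_ids)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        selection = sorted(shuffled[:half])   # used to select predictors
        validation = sorted(shuffled[half:])  # used to evaluate performance
        rounds.append((selection, validation))
    return rounds


rounds = split_sample_rounds(range(1, 11))
print(len(rounds), [len(part) for part in rounds[0]])  # 7 [5, 5]
```

Fixing the seed makes the partitions reproducible, which is the purpose of a prespecified scheme. The stepwise selection (entry at 0.10, retention at 0.15) would then be run on each selection half and assessed on the matching validation half.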

In assessing the performance of the model, the predicted probabilities for each patient in the validation dataset were calculated for the three disease cohorts (asthma only, ACO, or COPD only) based on the fitted model, and the predicted outcome for the patient was determined as the disease cohort with the maximum predicted probability. The disease classifications by spirometry and by model-based prediction were compared.
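The prediction rule (assign each patient to the cohort with the highest predicted probability) and the comparison with spirometry can be sketched as follows. This is a minimal Python sketch, assuming the fitted model's predicted probabilities are already available for each validation patient. The function names are hypothetical, and sensitivity is computed here as the proportion of patients in each spirometric cohort whom the model assigns to that same cohort.

```python
from collections import Counter
from typing import Dict, List, Tuple

COHORTS = ("asthma", "ACO", "COPD")


def predict_cohort(probs: Dict[str, float]) -> str:
    """Cohort with the maximum predicted probability (first one listed wins a tie)."""
    return max(probs, key=probs.get)


def compare_with_spirometry(spirometry: List[str],
                            probabilities: List[Dict[str, float]]
                            ) -> Tuple[Counter, Dict[str, float]]:
    """Hypothetical helper: cross-tabulate predictions against spirometry."""
    predicted = [predict_cohort(p) for p in probabilities]
    # Counts keyed by (spirometric cohort, predicted cohort)
    table = Counter(zip(spirometry, predicted))
    sensitivity = {}
    for cohort in COHORTS:
        n_true = spirometry.count(cohort)
        sensitivity[cohort] = table[(cohort, cohort)] / n_true if n_true else float("nan")
    return table, sensitivity


spiro = ["asthma", "ACO", "COPD", "COPD"]
probs = [
    {"asthma": 0.6, "ACO": 0.3, "COPD": 0.1},
    {"asthma": 0.5, "ACO": 0.4, "COPD": 0.1},
    {"asthma": 0.1, "ACO": 0.2, "COPD": 0.7},
    {"asthma": 0.1, "ACO": 0.6, "COPD": 0.3},
]
table, sens = compare_with_spirometry(spiro, probs)
print(sens)  # {'asthma': 1.0, 'ACO': 0.0, 'COPD': 0.5}
```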

Post hoc analyses

To further understand the results from the PM, post hoc analyses were performed to identify variables that differentiate patients in the asthma cohort from all other patients (included in the model for asthma) and patients in the COPD cohort from all other patients (included in the model for COPD). Following the same approach as for the PM, but using binary rather than multinomial logistic regression, analyses were performed to identify variables able to discriminate between patients with and without asthma, and between patients with and without COPD. Three versions of each of the asthma and COPD models were developed: a basic model, a reduced model, and a full model. The basic model for asthma included only age, sex, smoking status, and BMI as explanatory variables, while the basic model for COPD included age, sex, smoking status, and exacerbation history. The reduced models comprised the basic model characteristics plus those found to be significant predictors of disease in at least four of the seven rounds of logistic regression modeling, and the full models comprised the basic model characteristics plus responses to questionnaire items 1–36, with the exception of item 30. Receiver operating characteristic curves were generated to compare the effectiveness of the three versions of the models. The predicted outcomes from the full models for asthma and COPD were used to classify patients into four groups according to the presence or absence of asthma and COPD (asthma without COPD, COPD without asthma, neither asthma nor COPD, or asthma and COPD). A 100% stacked bar chart was generated to compare these predictions with the spirometric disease classification.
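Two steps in these post hoc analyses are easy to make concrete. The first is the area under the ROC curve, which equals the probability that a randomly chosen patient with the condition receives a higher predicted score than a randomly chosen patient without it. The second is the cross-classification of the two binary models into four groups, summarized as row percentages for a 100% stacked bar chart. This is a minimal Python sketch. The 0.5 probability cutoff used to turn each model's output into a yes/no prediction is an assumption, since the text does not state the cutoff, and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence

CUTOFF = 0.5  # assumption: probability threshold for a positive prediction


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """AUC via pairwise comparison (Mann-Whitney form); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def four_group(p_asthma: float, p_copd: float) -> str:
    """Hypothetical helper: combine asthma and COPD model outputs into four groups."""
    asthma, copd = p_asthma >= CUTOFF, p_copd >= CUTOFF
    if asthma and copd:
        return "asthma and COPD"
    if asthma:
        return "asthma without COPD"
    if copd:
        return "COPD without asthma"
    return "neither asthma nor COPD"


def stacked_percentages(spirometry: List[str],
                        groups: List[str]) -> Dict[str, Dict[str, float]]:
    """Percentage of each spirometric cohort in each predicted group (one 100% bar each)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for cohort, group in zip(spirometry, groups):
        counts[cohort][group] += 1
    return {cohort: {g: 100 * n / sum(c.values()) for g, n in c.items()}
            for cohort, c in counts.items()}


print(roc_auc([0.9, 0.4, 0.6, 0.2], [True, True, False, False]))  # 0.75
```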

Partial least squares discriminant analysis (PLS-DA) was conducted using the R package “ropls”[4] with the input variables of age, sex, smoking status, BMI, exacerbation incidence, and the responses to questionnaire items 1–36, to further examine the degree of separation between 1) the groups identified as having asthma and having COPD and 2) the groups identified as having asthma, COPD, and ACO, by spirometric classification. Finally, data on percentage reversibility at screening, post-bronchodilator FEV1/FVC ratio, and pre-bronchodilator FEV1 were added to the input variables, and the PLS-DA was repeated. The outcome of this analysis was Q2, a cross-validated equivalent of R2, which has a maximum of 1 (representing perfect discrimination) and can take negative values.
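For a single dummy-coded response (for example, 1 = asthma and 0 = COPD), Q2 is conventionally computed as one minus the ratio of the prediction error sum of squares (PRESS), obtained from cross-validated predictions, to the total sum of squares. This is a minimal Python sketch of that standard definition, assuming the cross-validated predictions are already available. It is not the internal implementation of ropls, and the function name is hypothetical.

```python
from typing import Sequence


def q2(observed: Sequence[float], cv_predicted: Sequence[float]) -> float:
    """Hypothetical helper: Q2 = 1 - PRESS / TSS for one response.

    cv_predicted must come from models fitted without the corresponding
    observations (cross-validation); otherwise this is just R2.
    """
    mean = sum(observed) / len(observed)
    press = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, cv_predicted))
    tss = sum((y - mean) ** 2 for y in observed)
    return 1 - press / tss


y = [1, 0, 1, 0]
print(q2(y, [0.9, 0.1, 0.8, 0.2]))  # close to 1: good separation
print(q2(y, [0, 1, 0, 1]))          # -3.0: worse than predicting the mean
```

The second example shows why Q2 has no lower bound: cross-validated predictions that are worse than simply predicting the mean response give a negative value.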

References

Acknowledgments

This study was funded by GSK (study number: 201703). The funder had a role in study design, data analysis, data interpretation, and writing of the report. The corresponding author had full access to all the data and the final responsibility to submit for publication. Editorial assistance in the preparation of this manuscript (in the form of writing assistance, including development of the initial draft based on author direction, assembling tables and figures, collating and incorporating authors’ comments, grammatical editing, and referencing) was provided by Elizabeth Jameson, PhD, and Natasha Thomas, PhD, of Fishawack Indicia Ltd, UK, and was funded by GSK.

Author contributions

WW contributed to the study concept and design and was involved in data acquisition, analysis, and interpretation. SJP, KAC, LMN, KEW, and LAL contributed to the study concept and design and were involved in data analysis and interpretation. All authors were involved in preparation and review of the manuscript and approved the final version to be submitted.

Disclosure

SJP, KAC, LMN, KEW, and LAL are employees of GSK and hold stocks/shares in GSK. WW was an employee of GSK at the start of the study, owns GSK stock, and moved to PAREXEL during the conduct of the study, where she is currently an employee. The authors report no other conflicts of interest in this work.