Original Research

A Model Using Support Vector Machines Recursive Feature Elimination (SVM-RFE) Algorithm to Classify Whether COPD Patients Have Been Continuously Managed According to GOLD Guidelines

Pages 2779-2786 | Published online: 04 Nov 2020

Abstract

Purpose

Patients with chronic obstructive pulmonary disease (COPD) would have a poor prognosis if they were not continuously managed according to the Global Initiative for Chronic Obstructive Lung Disease (GOLD) guidelines. We aim to develop a model to classify whether COPD patients have been continuously managed according to GOLD in the previous year.

Methods

The Managed group were COPD patients from a prospective cohort from November 2017 to November 2019, who have been continuously managed according to GOLD for 1 year. The Control group were COPD patients who were not continuously managed according to GOLD. They were from a retrospective cohort from October 2016 to October 2017 in the same hospitals as the Managed group. A synthetic minority over-sampling technique (SMOTE) algorithm was used to up-sample the Managed group in a training dataset. Features for classification were selected using a support vector machine recursive feature elimination (SVM-RFE) algorithm. The classification model was developed using LibSVM, and its performance was assessed on the testing dataset.

Results

The final analysis included 15 subjects in the Managed group and 191 in the Control group. SVM-RFE selects nine features including smoking history, post-bronchodilator (post-)FVC before management, and those after 1-year follow-up (BMI, moderate and severe AECOPD frequency in previous 12 months, mMRC score, post-FEV1, post-FEV1%pred, post-FVC, and post-FEV1/FVC). For our model, positive predictive value is 66.7%, F1 score is 0.978, and AUC is 0.987.

Conclusion

An SVM classifier combined with the SVM-RFE feature selection algorithm could achieve good classification between COPD patients who are or are not continuously managed. This model could be applied in clinical practice to help doctors make decisions and enhance COPD patients’ compliance with standard treatment.

Introduction

Chronic obstructive pulmonary disease (COPD) has been a major public health problem worldwide in recent decades because of its high prevalence, morbidity, and mortality. The World Health Organization estimates that it will become the third leading cause of death globally in 2030.Citation1 The situation is similar in China, where the overall prevalence of COPD is as high as 8.6% according to a recent national cross-sectional study.Citation2 Moreover, the prevalence of COPD was 11.9% in men and 13.7% in people older than 40 years.Citation2 However, even among COPD patients living in a metropolis, more than 90% are not continuously managed according to the Global Initiative for Chronic Obstructive Lung Disease (GOLD) guidelines.Citation3 The problem is not confined to China: in the US, Italy, and South Korea, about 40–70% of COPD patients are not continuously managed according to GOLD guidelines.Citation4–Citation6 For COPD patients, inappropriate treatment usually increases the rate of exacerbations, symptoms, and medical cost, worsens quality-of-life and physical activity, and decreases survival.Citation7 Thus, it is crucial to identify COPD patients who are not continuously managed according to GOLD guidelines. Those patients could then get more help from physicians to receive appropriate treatment and improve their compliance with continuous and standard treatment, resulting in a better prognosis. However, recall bias and the subjectivity of patient-reported information could interfere with the doctor’s decision, particularly because most COPD patients are older. Therefore, it is also helpful for the doctor to get objective information about the treatment history of COPD patients from a classification model.

In our study, we develop a model to classify whether COPD patients were continuously managed according to the GOLD guidelines using support vector machines (SVMs). SVMs are supervised machine learning techniques that have been widely used in classification. An SVM works as a classifier by constructing a multidimensional hyperplane that optimally discriminates two classes by maximizing the margin between the data clusters. SVMs classify effectively by using linear functions or special nonlinear functions, namely kernels, to transform the input space into a multidimensional space.Citation8 An SVM is trained on a training dataset in which each case is labeled as one of the two categories. The trained model can then predict the category of a new case. SVMs are now widely used in medical research.Citation9–Citation11
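The margin-maximization idea described above can be written out as the usual textbook soft-margin problem. This is the standard formulation, not an equation taken from the paper; the symbols ($w$, $b$, slack variables $\xi_i$, feature map $\phi$, labels $y_i \in \{-1, +1\}$) are notation introduced here for illustration, while $C$ and $\gamma$ correspond to the cost and gamma hyperparameters mentioned in the Methods.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i \\
\text{subject to}\quad & y_i \bigl( w^{\top} \phi(x_i) + b \bigr) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \quad i = 1, \dots, n, \\[4pt]
\text{RBF kernel:}\quad & K(x, x') = \phi(x)^{\top} \phi(x') = \exp\!\bigl( -\gamma \lVert x - x' \rVert^{2} \bigr).
\end{aligned}
```

The margin width is $2 / \lVert w \rVert$, so minimizing $\lVert w \rVert^{2}$ widens the margin, while a larger $C$ penalizes margin violations more heavily.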

Based on the model from SVMs, we could make an objective judgment for COPD patients as to whether they'd been continuously managed according to GOLD guidelines in the previous 1 year, which could help to improve compliance with standard treatment, resulting in an improvement of therapeutic effect and prognosis.

Methods

Study Design and Participants

A prospective cohort study was conducted in Tongji Hospital, Tongji Medical College, Huazhong University of Sciences & Technology in Wuhan and Peking University Third Hospital in Beijing, China, from November 2017 to November 2019. Adult patients with acute exacerbation of COPD (AECOPD) diagnosed by pulmonologists according to the GOLD guideline in 2017Citation12 were enrolled in the study. Briefly, COPD was diagnosed based on a forced expiratory volume in 1 s (FEV1)/forced vital capacity (FVC) ratio <70%, with a reversibility of less than 15% after inhaling 200 mg of salbutamol. AECOPD is an acute worsening of respiratory symptoms that leads to additional therapy. When AECOPD patients had become stable and remained stable for at least 1 month, they were recruited into the study. All the pulmonologists were trained to diagnose COPD and manage patients according to GOLD guidelines in 2017 before the beginning of the trial.

Exclusion criteria for patients in the study were: 1) had other chronic pulmonary disease besides COPD; 2) had a history of intubation within 3 years of enrollment; 3) were pregnant or preparing for pregnancy; or 4) had psychiatric disorders. Those subjects received COPD management according to GOLD guidelines in 2017Citation12 for 1 year. During this year, the subjects received a face-to-face interview every 3 months. In each interview, subjects received a COPD assessment and an adjusted therapeutic strategy according to the GOLD guideline in 2017 based on the assessment. They also received education on smoking cessation and COPD self-care from a pulmonologist. That was the Managed group.

The Control group was from a retrospective cohort of patients who were hospitalized for AECOPD from October 2016 to October 2017 in the same hospitals as the Managed group. They received COPD treatment according to GOLD guidelines in 2016Citation13 in the hospital. They were recruited into the study 1 month after leaving hospital, when their COPD was stable. They had a face-to-face interview 1 year after being recruited into the study. In the interview, they received a COPD assessment by a pulmonologist and completed a survey about how long their COPD had been managed after leaving hospital. The inclusion and exclusion criteria were the same as for subjects in the Managed group. In addition, subjects were excluded if they were continuously managed for more than 1 month after leaving hospital.

The study protocol was approved by the Institutional Review Board (Research Ethics Committee of Tongji Medical College, Huazhong University of Sciences & Technology). All methods were performed in accordance with the declaration of Helsinki, and informed consent was obtained from each subject. The ClinicalTrials.gov ID is NCT3314077.

In the Managed group, one of 16 subjects was excluded for not continuously being managed according to GOLD guidelines for personal reasons within the 3 months. In the Control group, 217 subjects were retrospectively interviewed. Twenty-three of them were excluded for being continuously managed according to GOLD guidelines for more than 1 month after leaving hospital. Three subjects were lost to follow-up. Thus, 15 subjects in the Managed group and 191 in the Control group were included in the final sample (Figure 1).

Figure 1 Population flow chart.

Candidate Features to Classify Patients

The candidate predictors included demographic characteristics of patients and indicators of COPD assessment based on GOLD guidelines 2017. They were: sex, age, education level, smoking history, BMI before and after 1-year follow-up, moderate and severe AECOPD frequency in the previous 12 months before and after 1-year follow-up, modified British Medical Research Council (mMRC) score after 1-year follow-up, COPD assessment test (CAT)Citation14,Citation15 score after 1-year follow-up, and post-bronchodilator lung function results, including FEV1, FEV1% predicted value (FEV1%pred), FVC, and FEV1/FVC, before and after 1-year follow-up. Moderate and severe AECOPD were defined according to GOLD guidelines.Citation12 Briefly, moderate AECOPD is defined as requiring systemic corticosteroids or antibiotics or both, and severe AECOPD is defined as receiving treatment at the emergency department or hospital.Citation12

Statistical Analysis

Data were analyzed using SAS 9.4 (SAS Institute Inc., Cary, NC, USA). Group data were expressed as the mean±standard deviation (SD). Significant differences in patient characteristics were evaluated using independent-samples t-test or chi-square test. P-values<0.05 were considered to be statistically significant.

Support Vector Machines

The classification model was developed using the LibSVM algorithm.Citation16 The model was built in Python 3.5.5 with the scikit-learn 0.20.0 library,Citation17,Citation18 which is a powerful tool for scientific research.Citation19,Citation20 In each group of subjects, 80% were randomly selected as the training sample and used to develop the model. The remaining 20% (testing sample) served to test the model. Twelve patients in the Managed group and 151 in the Control group were randomly selected as the training sample, leaving three patients in the Managed group and 40 in the Control group as the testing sample. Since there was an imbalance in sample size between the Managed group and Control group (15 vs 191), we applied a synthetic minority over-sampling technique (SMOTE)Citation21 procedure to up-sample the Managed group. To keep the testing dataset representative of the real-world class distribution, only the training dataset was up-sampled, not the testing dataset. In our SMOTE procedure, the four nearest neighbors of each sample in the minority class were used,Citation22 which yielded 60 samples in the Managed group training dataset. The training data were standardized using z-score transformation, and the testing data were transformed using the same parameters as those estimated from the training data.
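The up-sampling and standardization steps described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the feature values are random placeholders, the two-feature layout is hypothetical, and the choice to cycle through the 12 original Managed samples when generating the 48 synthetic ones is an assumption. Only the counts (12 Managed and 151 Control training subjects, 43 testing subjects, four nearest neighbors, 60 Managed samples after up-sampling) come from the text.

```python
import math
import random


def smote(minority, n_synthetic, k=4, seed=0):
    """Create synthetic minority samples by interpolating between a sample
    and one of its k nearest minority neighbours (Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for i in range(n_synthetic):
        base_idx = i % len(minority)  # assumption: cycle through the originals
        base = minority[base_idx]
        others = [p for j, p in enumerate(minority) if j != base_idx]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment base -> target
        synthetic.append([b + gap * (t - b) for b, t in zip(base, target)])
    return synthetic


def zscore_fit(rows):
    """Estimate per-feature mean and (population) SD from the training rows."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def zscore_apply(rows, means, stds):
    """Standardize rows with previously estimated parameters."""
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


# Placeholder data with two hypothetical features per subject.
data_rng = random.Random(42)
managed_train = [[data_rng.gauss(2.0, 0.5), data_rng.gauss(60.0, 10.0)] for _ in range(12)]
control_train = [[data_rng.gauss(1.5, 0.5), data_rng.gauss(50.0, 10.0)] for _ in range(151)]
test_rows = [[data_rng.gauss(1.6, 0.5), data_rng.gauss(52.0, 10.0)] for _ in range(43)]

# Up-sample the Managed training group from 12 to 60; the test set is untouched.
managed_upsampled = managed_train + smote(managed_train, n_synthetic=48, k=4)
train_rows = managed_upsampled + control_train

# Fit the z-score parameters on the training data only, then reuse them.
means, stds = zscore_fit(train_rows)
train_scaled = zscore_apply(train_rows, means, stds)
test_scaled = zscore_apply(test_rows, means, stds)
```

Fitting the scaling parameters on the training rows alone, and reusing them for the testing rows, keeps information about the testing data out of model development.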

The support vector machine recursive feature elimination (SVM-RFE) algorithmCitation23 was used to find the features that could optimize the performance of the classifier. We used grid-search with 10-fold cross-validation to estimate the SVM hyperparameters. The grid-search covered C = 0.01–10 and kernel = linear or radial basis function (RBF). The best hyperparameters were an RBF kernel, a cost parameter of 1, and gamma set to scale. These hyperparameters and 10-fold cross-validation were used to train the classifier on the training set. The performance of the classifier was assessed on the testing dataset, which was not used during the training step. Since loss to follow-up led to missing data in our study, we excluded data from those who were lost to follow-up.
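A minimal scikit-learn sketch of this feature-selection and tuning workflow is given below. It is an illustration under stated assumptions rather than the study's pipeline: the data are synthetic (`make_classification` with a hypothetical 18 candidate features), the number of retained features is fixed at nine to mirror the reported result (the text does not say how that number was chosen), the specific grid points for C are assumed values inside the stated 0.01–10 range, and F1 is an assumed tuning criterion.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data; not the study's dataset.
X, y = make_classification(n_samples=200, n_features=18, n_informative=6,
                           n_redundant=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# z-score parameters come from the training data only.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# SVM-RFE: fit a linear SVM, drop the feature with the smallest squared
# weight, and repeat until nine features remain (nine is an assumption).
selector = RFE(SVC(kernel="linear", C=1.0), n_features_to_select=9, step=1)
selector.fit(X_train_s, y_train)
X_train_sel = selector.transform(X_train_s)
X_test_sel = selector.transform(X_test_s)

# Grid-search with 10-fold cross-validation over C and the kernel type.
param_grid = {
    "C": [0.01, 0.1, 1, 10],       # assumed grid points within 0.01-10
    "kernel": ["linear", "rbf"],
    "gamma": ["scale"],
}
search = GridSearchCV(SVC(), param_grid, cv=10, scoring="f1")
search.fit(X_train_sel, y_train)

# Evaluate once on the held-out testing data (F1, matching `scoring`).
test_f1 = search.score(X_test_sel, y_test)
print(search.best_params_, round(test_f1, 3))
```

`selector.support_` is a boolean mask over the candidate features, so it can be used to map the retained columns back to their clinical names.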

Results

Patient Characteristics

Table 1 illustrates participants’ characteristics. The Managed group showed a higher proportion of smoking than the Control group. After 1 year of continuous management according to GOLD guidelines, the Managed group showed a lower AECOPD frequency in the previous 12 months and a lower mMRC score than the Control group, whereas the AECOPD frequency was not significantly different before the follow-up. For the GOLD COPD groups, there were more patients in Group A and Group D in the Managed group than in the Control group after 1-year follow-up, whereas there were fewer patients in Group B and Group C in the Managed group.

Table 1 Participants’ Characteristics in the Managed Group and the Control Group

For lung function (Table 2), post-bronchodilator FEV1 was significantly higher in the Managed group than in the Control group before the 1-year follow-up, while after the 1-year follow-up it was higher in the Managed group but not significantly so. In contrast, post-bronchodilator FVC was significantly higher in the Managed group after the 1-year follow-up, and was also higher before the follow-up, although not significantly.

Table 2 Participants’ Lung Function Post Inhaling Bronchodilator in the Managed Group and the Control Group

Model

The features selected by SVM-RFE for the model were smoking history, post-bronchodilator FVC before management, and those after 1-year follow-up, including BMI, moderate and severe AECOPD frequency in the previous 12 months, mMRC score, post-bronchodilator FEV1, post-bronchodilator FEV1%pred, post-bronchodilator FVC, and post-bronchodilator FEV1/FVC. Those features could classify whether COPD patients were continuously managed according to GOLD guidelines using SVM. When the model was tested on the testing dataset, the positive predictive value (PPV) was 66.7% and the F1 score was 0.978. The area under the ROC curve (AUC) was 0.987. The kappa was 0.788. Even though the SVM hyperparameters were estimated by grid-search, a sensitivity analysis was conducted by reducing hyperparameter C in order to observe the robustness of the model. When we decreased the C-value to 0.1, the testing results were the same as those of the primary model, suggesting the robustness of our model.
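For readers less familiar with these measures, the sketch below shows how PPV, sensitivity, F1, Cohen's kappa, and AUC are computed from a confusion matrix and from classifier scores. The counts and scores are hypothetical values chosen for illustration; they are not the study's confusion matrix, which is not reported in the text, and the function names are introduced here.

```python
def classification_metrics(tp, fp, fn, tn):
    """PPV, sensitivity, F1 and Cohen's kappa from confusion-matrix counts."""
    n = tp + fp + fn + tn
    ppv = tp / (tp + fp)                      # positive predictive value
    sensitivity = tp / (tp + fn)              # recall of the positive class
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    observed = (tp + tn) / n                  # observed agreement (accuracy)
    # Agreement expected by chance from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return {"ppv": ppv, "sensitivity": sensitivity, "f1": f1, "kappa": kappa}


def auc(pos_scores, neg_scores):
    """AUC as the probability that a positive case outscores a negative one
    (ties count one half)."""
    wins = sum((p > q) + 0.5 * (p == q) for p in pos_scores for q in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts and scores, for illustration only.
metrics = classification_metrics(tp=8, fp=2, fn=1, tn=89)
example_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1])
print(metrics, example_auc)
```

Note that the F1 score depends on which class is treated as positive and on whether per-class scores are averaged, so the averaging convention should be stated alongside any reported value.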

Discussion

This study shows that we were able to develop a multivariable classifier as a promising tool to identify whether COPD patients were continuously managed according to GOLD guidelines in the previous 1 year. This model includes variables from spirometry, BMI, and moderate and severe AECOPD frequency, which could be used by both pulmonologists and general practitioners in everyday care.

In our study, the classification model was developed using LibSVM,Citation16 with hyperparameters estimated by grid-search. The hyperparameters of a model influence the efficiency of the classifier and the result of classification. Well-chosen hyperparameter values reduce the risk of overfitting and underfitting. However, it is difficult to find optimal values of the hyperparameters (C and kernel) for a given problem from the training dataset. Thus, we used grid-search with 10-fold cross-validation to select the hyperparameters of a kernel-based SVM model on the training dataset. The grid-search performs an exhaustive search over the specified parameter values for an estimator. Since SVMs tuned by grid-search have shown good performance in medical research,Citation24–Citation26 the selected hyperparameters in our study should be the best for the model and contribute to the best results. In addition, our sensitivity analysis shows the robustness of the results of our model, supporting the selected hyperparameter values.

The imbalance problem in classification is quite common in medical data. Imbalanced datasets bias the classifier towards the majority class, so that it tends to behave as a majority-class classifier.Citation27 In most cases, the class of interest is the minority class, which is why sensitivity suffers. In our study, the sample sizes of the Managed group and the Control group were markedly imbalanced (15 vs 191). To alleviate the effect of the imbalance in the training dataset, we applied the SMOTE algorithm to the Managed group of the training dataset. In contrast, the SMOTE algorithm was not applied to the testing dataset, so that the model could be tested under real-world conditions. SMOTE performs better than simple oversampling when an SVM is the base classifier.Citation28 Moreover, since overall accuracy is dominated by correct classification of the majority class, we assessed the performance of the model using F1 score, PPV, and AUC, which also take into account the performance on the minority class. Our results showed a good performance of the model in classifying patients between the Managed group and Control group in the testing sample, suggesting good generalization of the model.
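A short calculation illustrates why accuracy alone is misleading at this class ratio. It uses the group sizes reported in this study (15 Managed, 191 Control) and a hypothetical classifier that labels every patient as Control; treating Managed as the positive class is an assumption made for the example.

```python
n_managed, n_control = 15, 191  # group sizes reported in the study

# Hypothetical majority-class classifier: every patient is labeled "Control".
tp, fn = 0, n_managed   # no Managed patient is ever identified
tn, fp = n_control, 0   # every Control patient is labeled correctly

accuracy = (tp + tn) / (n_managed + n_control)
sensitivity = tp / (tp + fn)

print(f"accuracy = {accuracy:.3f}, sensitivity = {sensitivity:.3f}")
```

Such a classifier is right for about 93% of patients while identifying none of the Managed group, which is why measures that reflect minority-class performance are reported.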

In our study, nine features to classify those subjects were selected using SVM-RFE: smoking history, post-bronchodilator FVC before management, and those after 1-year follow-up, including BMI, moderate and severe AECOPD frequency in the previous 12 months, mMRC score, post-bronchodilator FEV1, post-bronchodilator FEV1%pred, post-bronchodilator FVC, and post-bronchodilator FEV1/FVC. According to GOLD guidelines, spirometry is the basis for COPD diagnosis and airflow limitation assessment,Citation12 and mMRC score and moderate or severe exacerbation history are the basis for assessing symptoms and the risk of exacerbation.Citation12 First, the mMRC score is useful to assess dyspnea symptoms, the risk of exacerbation, and hospitalization.Citation29 Second, moderate and severe AECOPD accelerates the progression of COPD and is associated with high mortality and a rapid decline in health status.Citation12,Citation30,Citation31 Thus, moderate and severe AECOPD is considered in COPD assessment. Third, BMI is associated with FEV1 decline in COPD patients,Citation32 indicating the therapeutic efficiency. Therefore, mMRC score, moderate and severe exacerbation history, and BMI are closely related to the efficiency of COPD treatment and the prognosis of patients. On the other hand, in GOLD guidelines, the overall goals of COPD management are to optimize pulmonary function, to prevent progression, to improve quality-of-life, and to prevent and reduce the frequency and severity of exacerbations.Citation12 Thus, the features in our model were consistent with the overall goals of COPD management according to GOLD guidelines. Moreover, those predictive indicators are objective and accessible in clinical practice. Therefore, our model is feasible to implement in clinical practice.

Since the overall goals of COPD management in GOLD guidelines relate to pulmonary function, prevention of progression, exacerbations, and symptoms,Citation12 we selected candidate features for the model as follows: demographic characteristics, smoking history, BMI, moderate and severe AECOPD frequency in the previous 12 months, mMRC score, CAT score, and post-bronchodilator FEV1, FEV1%pred, FVC, and FEV1/FVC before and/or after 1-year follow-up. Those candidate predictors are involved in the overall goals of standard COPD management. The GOLD COPD group of patients was not selected as a candidate in our model, because the factors used to assess a patient’s COPD group were already included as candidates, such as post-bronchodilator FEV1 and FEV1%pred, mMRC score, CAT score, and moderate and severe AECOPD frequency in the previous 12 months. Entering those factors individually keeps more information than using the COPD group of patients. It also avoids multicollinearity between features in the model.

We acknowledge limitations in this study. First, the controls were not parallel to the Managed group. The controls were recruited from a retrospective cohort, comprising hospitalized patients with COPD 1 year before the Managed group recruitment. Although the controls were diagnosed according to GOLD 2016 guidelines, the diagnosis standard and the majority of management in GOLD 2016 guidelines were the same as in GOLD 2017, which was used for the Managed group.Citation12,Citation13 Moreover, controls were recruited from the same hospitals as the Managed group, using the same inclusion and exclusion criteria. Thus, this difference is unlikely to bias the results of this study. Second, the sample size in the Managed group was much smaller than that of the Control group (15 vs 191). We adopted the SMOTECitation21 up-sampling algorithm to increase the sample size of the Managed group in the training dataset. The SMOTE up-sampling algorithm adds synthetic data between a minority sample and its nearest neighbors, which are identified by the standard Euclidean distance between minority samples. SMOTE thereby avoids problems caused by simple oversampling with replacement and by undersampling, such as low generalization of the final model or not taking full advantage of the original dataset.Citation33–Citation35 Furthermore, when we tested the model on the testing dataset without up-sampling, our model showed a good performance in F1 score, PPV, and AUC. Thus, SMOTE could solve the problem of imbalanced sample size between two groups in our SVM model.

Conclusions

In conclusion, a limited number of quantitative indicators could classify whether COPD patients are continuously managed according to GOLD guidelines in the previous 1 year. This classification model could be useful and readily applicable in clinical practice to help doctors make decisions and enhance COPD patients’ compliance with continuous and standard treatment according to GOLD guidelines in the long-term, which is crucial to improve the prognosis of COPD patients.

Data Sharing Statement

The ClinicalTrials.gov ID is NCT3314077. The individual deidentified participant data, specific data, and other study documents are available from the corresponding author upon reasonable request. The data has been made available from July 2020, which will last 3 years.

Disclosure of Interest

The authors report no conflicts of interest in this work. Jie Xia and Lina Sun contributed equally to this study and should be considered co-first authors.

References

  • The WHO website. Available from: https://www.who.int/respiratory/copd/burden/en/. Accessed 417, 2020.
  • Wang C, Xu J, Yang L, et al. Prevalence and risk factors of chronic obstructive pulmonary disease in China (the China pulmonary health [CPH] study): a national cross-sectional study. Lancet. 2018;391(10131):1706–1717. doi:10.1016/S0140-6736(18)30841-9. PMID: 29650248
  • Xiao T, Chen X, Wang N, Zhao Q, Fu C, Xu B. Study on the situation of drug use in patients with chronic obstructive pulmonary diseases in the Chinese communities of large cities. Chin J Epidemiol. 2017;38(2):142–146.
  • Surani S, Aiyer A, Eikermann S, et al. Adoption and adherence to chronic obstructive pulmonary disease GOLD guidelines in a primary care setting. SAGE Open Med. 2019;7:2050312119842221. doi:10.1177/2050312119842221. PMID: 31057794
  • Palmiotti GA, Lacedonia D, Liotino V, et al. Adherence to GOLD guidelines in real-life COPD management in the Puglia region of Italy. Int J Chron Obstruct Pulmon Dis. 2018;13:2455–2462. doi:10.2147/COPD.S157779. PMID: 30147311
  • Kim TO, Shin HJ, Kim YI, et al. Adherence to the GOLD guideline in COPD management of South Korea: findings from KOCOSS study 2011–2018. Chonnam Med J. 2019;55(1):47–53. doi:10.4068/cmj.2019.55.1.47. PMID: 30740340
  • Vogelmeier CF, Criner GJ, Martinez FJ, et al. Global strategy for the diagnosis, management, and prevention of chronic obstructive lung disease 2017 report. GOLD executive summary. Am J Respir Crit Care Med. 2017;195(5):557–582. doi:10.1164/rccm.201701-0218PP. PMID: 28128970
  • Yu W, Liu T, Valdez R, Gwinn M, Khoury MJ. Application of support vector machine modeling for prediction of common diseases: the case of diabetes and pre-diabetes. BMC Med Inform Decis Mak. 2010;10:16. doi:10.1186/1472-6947-10-16
  • Cornforth DM, Dees JL, Ibberson CB, et al. Pseudomonas aeruginosa transcriptome during human infection. Proc Natl Acad Sci U S A. 2018;115(22):E5125–E5134.
  • Ekins S, Puhl AC, Zorn KM, et al. Exploiting machine learning for end-to-end drug discovery and development. Nat Mater. 2019;18(5):435–441.
  • Irimia A, Lei X, Torgerson CM, Jacokes ZJ, Abe S, Van Horn JD. Support vector machines, multidimensional scaling and magnetic resonance imaging reveal structural brain abnormalities associated with the interaction between autism spectrum disorder and sex. Front Comput Neurosci. 2018;12:93. doi:10.3389/fncom.2018.00093. PMID: 30534065
  • The GOLD website. Available from: https://goldcopd.org/. Accessed 101, 2017.
  • The GOLD website. Available from: https://goldcopd.org/. Accessed 101, 2016.
  • Zhou QT, Mei JJ, He B, et al. Chronic obstructive pulmonary disease assessment test score correlated with dyspnea score in a large sample of Chinese patients. Chin Med J (Engl). 2013;126(1):11–15. PMID: 23286470
  • Tu YH, Zhang Y, Fei GH. Utility of the CAT in the therapy assessment of COPD exacerbations in China. BMC Pulm Med. 2014;14:42. doi:10.1186/1471-2466-14-42. PMID: 24618290
  • Chang CC, Lin C. LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol. 2013;2:1–39. doi:10.1145/1961189.1961199
  • Python software website. Available from: http://www.python.org. Accessed 105, 2019.
  • Scikit-learn website. Available from: https://scikit-learn.org/stable/. Accessed 105, 2019.
  • Lima I. Python for scientific computing python overview. Mar Chem. 2006;9:10–20.
  • Millman KJ, Aivazis M. Python for scientists and engineers. Comput Sci Eng. 2011;13(2):9–12. doi:10.1109/MCSE.2011.36
  • Bowyer KW, Chawla NV, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res. 2002;16:321–357.
  • Wolff P, Graña M, Ríos SA, Yarza MB. Machine learning readmission risk modeling: a pediatric case study. Biomed Res Int. 2019;2019:8532892. doi:10.1155/2019/8532892. PMID: 31139655
  • Li F, Zhao C, Xia Z, Wang Y, Zhou X, Li GZ. Computer-assisted lip diagnosis on traditional Chinese medicine using multi-class support vector machines. BMC Complement Altern Med. 2012;12:127. doi:10.1186/1472-6882-12-127. PMID: 22898352
  • Arslan AK, Colak C, Sarihan ME. Different medical data mining approaches based prediction of ischemic stroke. Comput Methods Programs Biomed. 2016;130:87–92. doi:10.1016/j.cmpb.2016.03.022
  • Gupta Y, Lama RK, Kwon GR. Prediction and classification of Alzheimer’s disease based on combined features from apolipoprotein-E genotype, cerebrospinal fluid, MR, and FDG-PET imaging biomarkers. Front Comput Neurosci. 2019;13:72. PMID: 31680923
  • Zhao D, Liu H, Zheng Y, He Y, Lu D, Lyu C. A reliable method for colorectal cancer prediction based on feature selection and support vector machine. Med Biol Eng Comput. 2019;57(4):901–912. doi:10.1007/s11517-018-1930-0
  • Wei Q, Dunbrack RL. The role of balanced training and testing data sets for binary classifiers in bioinformatics. PLoS One. 2013;8(7):e67863. doi:10.1371/journal.pone.0067863. PMID: 23874456
  • Wallace B, Small K, Brodley C, Trikalinos T. Class imbalance, redux. In: 2011 IEEE 11th International Conference on Data Mining (ICDM); 2011; Vancouver, Canada:754–763.
  • Cheng SL, Lin CH, Wang CC, et al. Comparison between COPD assessment test (CAT) and modified medical research council (mMRC) dyspnea scores for evaluation of clinical symptoms, comorbidities and medical resources utilization in COPD patients. J Formos Med Assoc. 2019;118(1 Pt 3):429–435. doi:10.1016/j.jfma.2018.06.018. PMID: 30150099
  • Suissa S, Dell’Aniello S, Ernst P. Long-term natural history of chronic obstructive pulmonary disease: severe exacerbations and mortality. Thorax. 2012;67(11):957–963. doi:10.1136/thoraxjnl-2011-201518. PMID: 22684094
  • Müllerova H, Maselli DJ, Locantore N, et al. Hospitalized exacerbations of COPD: risk factors and outcomes in the ECLIPSE cohort. Chest. 2015;147(4):999–1007. doi:10.1378/chest.14-0655. PMID: 25356881
  • Sun Y, Milne S, Jaw JE, et al. BMI is associated with FEV1 decline in chronic obstructive pulmonary disease: a meta-analysis of clinical trials. Respir Res. 2019;20(1):236. PMID: 31665000
  • Nakamura M, Kajiwara Y, Otsuka A, Kimura H. LVQ-SMOTE-learning vector quantization based synthetic minority over-sampling technique for biomedical data. BioData Min. 2013;6:1–10. doi:10.1186/1756-0381-6-16. PMID: 23294634
  • López V, Fernandez A, Garcia S, Palade V, Herrera F. An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inf Sci. 2013;250:113–141. doi:10.1016/j.ins.2013.07.007
  • Luengo J, Fernandez A, Garcia S, Herrera F. Addressing data complexity for imbalanced data sets: analysis of SMOTE-based oversampling and evolutionary undersampling. Soft Comput. 2011;15:1909–1936. doi:10.1007/s00500-010-0625-8