Original Articles

Unbalanced data classification using support vector machines with active learning on scleroderma lung disease patterns

Pages 676-689 | Received 26 Feb 2014, Accepted 15 Oct 2014, Published online: 09 Dec 2014
 

Abstract

Unbalanced data classification is a long-standing problem in medical vision science. We introduce support vector machines (SVM) with active learning (AL) to improve the prediction of unbalanced classes in medical imaging. We propose a standard SVM algorithm with four different AL approaches: (1) the first uses random sampling to select the initial pool for the AL algorithm; (2) the second doubles the training instances of the rare category to reduce the unbalanced ratio before running the AL algorithm; (3) the third uses a balanced initial pool with an equal number of instances from each category; and (4) the fourth uses a balanced pool and applies balanced sampling throughout the AL algorithm. Grid pixel data for two scleroderma lung disease patterns, lung fibrosis (LF) and honeycomb (HC), were extracted from computed tomography images of 71 patients to produce a training set of 348 HC and 3009 LF instances and a test set of 291 HC and 2665 LF instances. Compared with random sampling, SVM with AL using balanced sampling increased the test sensitivity for HC by 56 percentage points (17.5% vs. 73.5%) on the original dataset and by 47 percentage points (23% vs. 70%) on the denoised dataset. SVM with AL and balanced sampling can improve classification performance on unbalanced data.
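The four pool-construction strategies listed in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the abstract gives no algorithmic detail, so the function names, the pool size of 50 per class, and the "closest to the decision boundary" query rule in `balanced_query` are all assumptions made for illustration. Only the class counts (348 HC, 3009 LF) come from the text.

```python
import random

def imbalance_ratio(labels, rare="HC"):
    """Ratio of non-rare to rare instances."""
    n_rare = sum(1 for y in labels if y == rare)
    return (len(labels) - n_rare) / n_rare

def random_pool(labels, size, rng):
    """Approach 1: draw the initial pool uniformly at random (returns indices)."""
    return rng.sample(range(len(labels)), size)

def oversample_rare(labels, rare="HC"):
    """Approach 2: include every rare-class index twice (returns indices)."""
    idx = list(range(len(labels)))
    return idx + [i for i in idx if labels[i] == rare]

def balanced_pool(labels, per_class, rng):
    """Approach 3: draw the same number of indices from each class."""
    pool = []
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        pool += rng.sample(members, per_class)
    return pool

def balanced_query(scores, labels, candidates, per_class):
    """Approach 4, query step (assumed rule): for each class, pick the
    candidates whose classifier score is closest to zero, i.e. nearest
    to the decision boundary."""
    picked = []
    for cls in sorted(set(labels)):
        members = [i for i in candidates if labels[i] == cls]
        members.sort(key=lambda i: abs(scores[i]))
        picked += members[:per_class]
    return picked

# Training-set class counts reported in the abstract.
labels = ["HC"] * 348 + ["LF"] * 3009
rng = random.Random(0)  # fixed seed so the sketch is reproducible

ratio_before = imbalance_ratio(labels)
doubled = [labels[i] for i in oversample_rare(labels)]
ratio_after = imbalance_ratio(doubled)
pool = balanced_pool(labels, per_class=50, rng=rng)  # 50 is an arbitrary choice
```

With the reported counts, doubling the HC instances lowers the LF-to-HC ratio from 3009/348 (about 8.6 to 1) to 3009/696 (about 4.3 to 1), while the balanced pool starts the learner at exactly 1 to 1. In a full pipeline the scores passed to `balanced_query` would come from the SVM's decision function at each AL iteration.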

Acknowledgements

We would like to thank the National Institutes of Health (R01 HL072424) for sharing their data and making this research project possible. We would also like to thank Dr Jonathan Goldin.

Disclosure statement

No potential conflict of interest was reported by the authors.
