Cross-project defect prediction using data sampling for class imbalance learning: an empirical study

Pages 130-143 | Received 15 Jan 2019, Accepted 26 Jul 2019, Published online: 06 Aug 2019
 

Abstract

The availability of defect data from different projects has made cross-project defect prediction (CPDP) an open research issue in software engineering. In CPDP, the source and target projects are different: the prediction model is trained on data from the source projects and then tested on the target project's data. Drawing training data from varying projects leads to a highly imbalanced source dataset, and this imbalance degrades the performance of the predictive model. This is known as the class imbalance problem in machine learning. This paper conducts a two-fold empirical analysis. First, it evaluates whether data sampling techniques can handle the class imbalance problem and improve the performance of the predictive model for CPDP. Second, it evaluates whether the results of CPDP after data sampling are comparable to those of within-project defect prediction (WPDP). Ensemble learning classifiers are used as the predictive models over 12 publicly available object-oriented project datasets. The experimental results indicate that SMOTE oversampling can be applied to overcome the class imbalance problem in CPDP, and that it gives results comparable to WPDP with statistical significance.
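
To make the sampling step concrete, the following is a minimal pure-Python sketch of SMOTE-style oversampling applied only to the source (training) data in a CPDP setup. It is an illustration under stated assumptions, not the authors' implementation: the function names `smote` and `balance_source`, the neighbour count `k`, the two-metric feature rows, and the toy data are all hypothetical, and the abstract does not state which SMOTE settings or library were used.

```python
import random
from math import dist


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating between
    a minority row and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base row (excluding itself)
        others = [row for j, row in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda row: dist(base, row))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position on the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def balance_source(X, y, k=5, seed=0):
    """Oversample the defective class (label 1) of the source data until
    both classes are the same size. The target project is left untouched."""
    minority = [x for x, label in zip(X, y) if label == 1]
    n_new = (len(y) - len(minority)) - len(minority)
    if n_new <= 0:
        return list(X), list(y)
    new_rows = smote(minority, n_new, k=k, seed=seed)
    return list(X) + new_rows, list(y) + [1] * len(new_rows)


# Hypothetical source data: two metrics per module, 1 = defective.
X_source = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0], [1.2, 0.9],
            [0.8, 1.1], [1.1, 1.4], [5.5, 8.5]]
y_source = [0, 0, 1, 1, 0, 0, 0, 1]

X_bal, y_bal = balance_source(X_source, y_source, k=2)
print(sum(y_bal), len(y_bal) - sum(y_bal))  # prints "5 5"
```

In this sketch the balanced source set would then be used to train a classifier, while the target project's data is evaluated as-is, so that no synthetic rows leak into the test set.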

Disclosure statement

No potential conflict of interest was reported by the authors.
