Studying the effectiveness of deep active learning in software defect prediction

Farid Feyzi & Arman Daneshdoost
Pages 534-552 | Received 11 Jun 2023, Accepted 19 Aug 2023, Published online: 05 Sep 2023
 

Abstract

Accurate prediction of defective software modules is of great importance for prioritizing quality assurance efforts, reasonably allocating testing resources, reducing costs and improving software quality. Several studies have used machine learning to predict software defects. However, complex structures and imbalanced class distributions in software defect data make learning an effective defect prediction model challenging. In this article, two deep learning-based defect prediction models using static code metrics are proposed. To enhance the learning process and improve the performance of the proposed models, pool-based active learning is employed. In this regard, the possibility of using active learning to reduce the need for large amounts of labeled data when building deep learning models is investigated. To deal with the imbalanced distribution of software modules between the defective and non-defective classes, Near-Miss under-sampling and KNN, with different numbers of neighbors, are used. They were chosen for their good performance in binary classification problems. Experiments are performed on two well-known, publicly available datasets: the GitHub Bug Dataset and the public Unified Bug Dataset for Java projects. The evaluation results show that the proposed models are more effective than traditional machine learning algorithms. On the Unified Bug Dataset, the F-measure and AUC values improved by 13 and 11 percent, respectively, at the file level, and by 14 and 11 percent, respectively, at the class level.
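As a rough illustration of the pool-based active learning idea mentioned in the abstract, the following minimal sketch may help. It is not the authors' implementation. The abstract does not state the query strategy or the network architecture, so the sketch assumes least-confidence sampling and uses a small k-nearest-neighbor vote as a stand-in for the deep model. All function names, the two-metric synthetic data and the `oracle` labeling rule are hypothetical.

```python
import math
import random


def knn_defect_probability(x, labeled, k=3):
    """Estimate P(defective) as the share of defective modules among the k nearest labeled ones."""
    nearest = sorted(labeled, key=lambda item: math.dist(x, item[0]))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def least_confident_index(pool, labeled, k=3):
    """Return the index of the pool item whose predicted probability is closest to 0.5."""
    return min(
        range(len(pool)),
        key=lambda i: abs(knn_defect_probability(pool[i], labeled, k) - 0.5),
    )


def active_learning(labeled, pool, oracle, budget, k=3):
    """Pool-based loop: query the most uncertain module, label it, add it to the training set."""
    labeled, pool = list(labeled), list(pool)
    for _ in range(min(budget, len(pool))):
        x = pool.pop(least_confident_index(pool, labeled, k))
        labeled.append((x, oracle(x)))  # the oracle stands in for a human annotator
    return labeled, pool


def oracle(x):
    """Hypothetical ground truth: a module is defective (1) when its two metrics sum above 1.0."""
    return int(x[0] + x[1] > 1.0)


# Synthetic data (an assumption for illustration): 60 modules described by two metrics.
rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(60)]
seed = [(p, oracle(p)) for p in points[:6]]  # small initial labeled set
labeled, remaining = active_learning(seed, points[6:], oracle, budget=10)
```

Here only 10 of the 54 pooled modules are sent to the oracle, which is the sense in which active learning can lower labeling cost. In a real setting the k-NN vote would be replaced by the trained model's predicted probabilities.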

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Notes on contributors

Farid Feyzi

Farid Feyzi received his M.S. degree in Software Engineering from the Sharif University of Technology in 2012, and his Ph.D. in Software Engineering from the Iran University of Science and Technology in 2018. He is an assistant professor of Software Engineering at the University of Guilan. His research focus is on developing statistical algorithms to improve software quality with an emphasis on statistical fault localization and automated test data generation.

Arman Daneshdoost

Arman Daneshdoost received his M.S. degree in Software Engineering from the University of Guilan in 2020. His research focus is on designing, developing and fine-tuning deep learning models and transformers to improve the performance of automated industrial and service-based systems and applications.
