
Cross-project defect prediction using data sampling for class imbalance learning: an empirical study

Pages 130–143 | Received 15 Jan 2019, Accepted 26 Jul 2019, Published online: 06 Aug 2019

References

  • Tahir MA, Kittler J, Yan F. Inverse random under sampling for class imbalance problem and its application to multi-label classification. Pattern Recognit. 2012;45(10):3738–3750. doi: 10.1016/j.patcog.2012.03.014
  • Zheng J. Cost-sensitive boosting neural networks for software defect prediction. Expert Syst Appl. 2010;37(6):4537–4543. doi: 10.1016/j.eswa.2009.12.056
  • Sun Z, Song Q, Zhu X. Using coding-based ensemble learning to improve software defect prediction. IEEE Trans Syst Man Cybern C (Appl Rev). 2012;42(6):1806–1817. doi: 10.1109/TSMCC.2012.2226152
  • Liu M, Miao L, Zhang D. Two-stage cost-sensitive learning for software defect prediction. IEEE Trans Reliab. 2014;63(2):676–686. doi: 10.1109/TR.2014.2316951
  • Jing XY, Ying S, Zhang ZW, et al. Dictionary learning based software defect prediction. Proceedings of the 36th International Conference on Software Engineering; 2014 May 31–Jun 7; Hyderabad, India. New York (NY): ACM; 2014. p. 414–423.
  • Ma Y, Luo G, Zeng X, et al. Transfer learning for cross-company software defect prediction. Inf Softw Technol. 2012;54(3):248–256. doi: 10.1016/j.infsof.2011.09.007
  • Nam J, Pan SJ, Kim S. Transfer defect learning. 35th International Conference on Software Engineering (ICSE); 2013 May 18–26; San Francisco (CA). IEEE; 2013. p. 382–391.
  • Herbold S. Training data selection for cross-project defect prediction. 2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM); 2017 Nov 9–10; Toronto (ON). IEEE; 2013. p. 6–16.
  • Peters F, Menzies T, Marcus A. Better cross company defect prediction. 10th Working Conference on Mining Software Repositories (MSR); 2013 May 18–19; San Francisco (CA). IEEE; 2013. p. 409–418.
  • Canfora G, De Lucia A, Di Penta M, et al. Multi-objective cross-project defect prediction. IEEE Sixth International Conference on Software Testing, Verification and Validation; 2013 Mar 18–22; Luxembourg. IEEE; 2013. p. 252–261.
  • Singh P, Verma S, Vyas O. Cross company and within company fault prediction using object oriented metrics. Int J Comput Appl. 2013;74:5–11.
  • Dejaeger K, Verbraken T, Baesens B. Toward comprehensible software fault prediction models using Bayesian network classifiers. IEEE Trans Softw Eng. 2013;39(2):237–257. doi: 10.1109/TSE.2012.20
  • Turhan B, Mısırlı A, Bener A. Empirical evaluation of the effects of mixed project data on learning defect predictors. Inf Softw Technol. 2013;55(6):1101–1118. doi: 10.1016/j.infsof.2012.10.003
  • He P, Li B, Ma Y. Towards cross-project defect prediction with imbalanced feature sets. Cornell University Library; 2014.
  • Panichella A, Oliveto R, De Lucia A. Cross-project defect prediction models: L’Union fait la force. Software Evolution Week – IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE); 2014 Feb 3–6; Antwerp, Belgium. IEEE; 2014. p. 164–173.
  • Dai W, Yang Q, Xue G, et al. Boosting for transfer learning. Proceedings of the 24th International Conference on Machine Learning; 2007 Jun 20–24; Corvallis (OR). New York (NY): ACM; 2007. p. 193–200.
  • Mizuno O, Hirata Y. A cross-project evaluation of text-based fault-prone module prediction. 6th International Workshop on Empirical Software Engineering in Practice; 2014 Nov 12–13; Osaka, Japan. IEEE; 2014. p. 43–48.
  • Nam J, Kim S. Heterogeneous defect prediction. Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering; 2015 Aug 30–Sept 4; Bergamo, Italy. New York (NY): ACM; 2015. p. 508–519.
  • Jing XY, Wu F, Dong X, et al. Heterogeneous cross company defect prediction by unified metric representation and CCA-based transfer learning. Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering; 2015 Aug 30–Sept 4; Bergamo, Italy. New York (NY): ACM; 2015. p. 496–507.
  • Ryu D, Jang J-I, Baik J. A transfer cost-sensitive boosting approach for cross-project defect prediction. Softw Qual J. 2017;25(1):235–272. doi: 10.1007/s11219-015-9287-1
  • Chen L, Fang B, Shang Z, et al. Negative samples reduction in cross-company software defects prediction. Inf Softw Technol. 2015;62:67–77. doi: 10.1016/j.infsof.2015.01.014
  • Wang S, Yao X. Using class imbalance learning for software defect prediction. IEEE Trans Reliab. 2013;62(2):434–443. doi: 10.1109/TR.2013.2259203
  • Chawla NV, Bowyer KW, Hall LO, et al. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res. 2002;16:321–357. doi: 10.1613/jair.953
  • Elish KO, Elish MO. Predicting defect-prone software modules using support vector machines. J Syst Softw. 2008;81(5):649–660. doi: 10.1016/j.jss.2007.07.040
  • Chidamber SR, Kemerer CF. A metrics suite for object oriented design. IEEE Trans Softw Eng. 1994;20(6):476–493. doi: 10.1109/32.295895
  • Zimmermann T, Nagappan N, Gall H, et al. Cross-project defect prediction. Proceedings of the 7th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT symposium on The Foundations of Software Engineering; 2009 Aug 24–28; Amsterdam, The Netherlands. New York (NY): ACM; 2009. p. 91–100.
  • Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng. 2010;22(10):1345–1359. doi: 10.1109/TKDE.2009.191
  • Ryu D, Choi O, Baik J. Value-cognitive boosting with a support vector machine for cross-project defect prediction. Empir Softw Eng. 2016;21(1):43–71. doi: 10.1007/s10664-014-9346-4
  • Zhang F, Zheng Q, Zou Y, et al. Cross-project defect prediction using a connectivity-based unsupervised classifier. IEEE/ACM 38th International Conference on Software Engineering (ICSE); 2016 May 14–22; Austin (TX). IEEE; 2016.
  • Xia X, Lo D, Pan SJ, et al. Hydra: massively compositional model for cross-project defect prediction. IEEE Trans Softw Eng. 2016;42(10):977–998. doi: 10.1109/TSE.2016.2543218
  • Herbold S, Trautsch A, Grabowski J, et al. A comparative study to benchmark cross-project defect prediction approaches. IEEE Trans Softw Eng. 2018;44(9):811–833.
  • Estabrooks A, Jo T, Japkowicz N. A multiple resampling method for learning from imbalanced data sets. Comput Intell. 2004;20(1):18–36. doi: 10.1111/j.0824-7935.2004.t01-1-00228.x
  • Japkowicz N, Myers C, Gluck M. A novelty detection approach to classification. IJCAI (US). 1995;1:518–523.
  • Zhou ZH, Liu XY. Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Trans Knowl Data Eng. 2006;18(1):63–77. doi: 10.1109/TKDE.2006.17
  • Rokach L. Ensemble-based classifiers. Artif Intell Rev. 2010;33(1–2):1–39. doi: 10.1007/s10462-009-9124-7
  • Wang S, Chen H, Yao X. Negative correlation learning for classification ensembles. Proceedings of the International Joint Conference on Neural Networks, WCCI; 2010. p. 2893–2900.
  • Chawla NV, Lazarevic A, Hall LO, et al. SMOTEBoost: improving prediction of the minority class in boosting. In: Lavrac N, Gamberger D, Todorovski L, et al., editors. Knowledge discovery in databases: PKDD 2003. European Conference on Principles of Data Mining and Knowledge Discovery. Lecture Notes in Computer Science. Vol. 2838. Berlin: Springer; 2003. p. 107–119.
  • Briand LC, Wuest J, Ikonomovski S, et al. Investigation of quality factors in object-oriented designs: an industrial case study. Proceedings of the 1999 International Conference on Software Engineering (IEEE Cat. No.99CB37002); 1999 May 16–22; Los Angeles (CA). IEEE; 1999. p. 345–354.
  • Tang MH, Kao MH, Chen MH. An empirical study on object oriented metrics. Proceedings Sixth International Software Metrics Symposium (Cat. No.PR00403); 1999 Nov 4–6; Boca Raton (FL). IEEE; 1999. p. 242–249.
  • Briand LC, Wuest J, Daly JW, et al. Exploring the relationship between design measures and software quality in object oriented systems. J Syst Softw. 2000;51(3):245–273. doi: 10.1016/S0164-1212(99)00102-8
  • Breiman L. Random forests. Mach Learn. 2001;45(1):5–32. doi: 10.1023/A:1010933404324
  • Natekin A, Knoll A. Gradient boosting machines, a tutorial. Front Neurorobot. 2013 Dec;7:21. doi: 10.3389/fnbot.2013.00021
  • Kim S, Zhang H, Wu R, et al. Dealing with noise in defect prediction. 33rd International Conference on Software Engineering (ICSE); 2011 May 21–28; Honolulu (HI). IEEE; 2011. p. 481–490.
  • Ozturk MM, Zengin A. HSDD: a hybrid sampling strategy for class imbalance in defect prediction data sets. Eleventh International Conference on Digital Information Management (ICDIM); 2016 Sept 19–21; Porto, Portugal. IEEE; 2016. p. 60–69.
