Original Articles

Comparing software fault predictions of pure and zero-inflated Poisson regression models

Taghi M. Khoshgoftaar, Kehan Gao & Robert M. Szabo
Pages 705-715 | Received 01 Dec 2001, Accepted 01 Jun 2004, Published online: 23 Feb 2007
 

Abstract

Predicting software quality prior to system tests and operations has proven useful for achieving effective reliability improvements. Pure Poisson regression modelling is the most commonly used count modelling technique for predicting the expected number of faults in software modules. It is best suited to equidispersed fault data, that is, data in which the mean of the dependent variable (the fault count) equals its variance. However, software fault data often contain a large proportion of zeros (no faults), especially in high-assurance systems. In such cases a pure Poisson regression model (PRM) may yield inaccurate fault predictions. A zero-inflated Poisson (ZIP) model changes the mean structure of a PRM, resulting in improved predictive quality. To illustrate this, we examined software data collected from a full-scale industrial software system. Fault prediction models were calibrated using both pure Poisson and ZIP regression techniques. To prevent claims based on a biased data split (into fit and test data sets), the data set was randomly split 50 times, and models were calibrated on each of these splits. A comparative hypothesis test between the pure Poisson and ZIP modelling techniques was performed. The test revealed that the ZIP model fitted better than its counterpart. The empirical comparative study presented in this paper showed that the ZIP model yielded better predictions than the PRM and also demonstrated better robustness in prediction accuracy across the 50 data splits.
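The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes intercept-only models (no software metrics as predictors), simulated fault counts rather than the industrial data set, and held-out log-likelihood as the comparison criterion, since the abstract does not name the accuracy measures used. All function names and parameter values (`sample_zip`, `fit_zip`, `compare_on_random_splits`, `pi=0.6`, `lam=3.0`) are hypothetical.

```python
import math
import random


def poisson_logpmf(y, lam):
    """Log-probability of count y under a pure Poisson model with mean lam."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def zip_logpmf(y, pi, lam):
    """Log-probability of y under a ZIP model: with probability pi the count
    is a structural zero, otherwise it is drawn from Poisson(lam)."""
    if y == 0:
        return math.log(pi + (1 - pi) * math.exp(-lam))
    return math.log(1 - pi) + poisson_logpmf(y, lam)


def sample_zip(rng, pi, lam):
    """Draw one simulated fault count from a ZIP distribution."""
    if rng.random() < pi:
        return 0
    # Knuth's multiplication method for a Poisson draw.
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def fit_poisson(ys):
    """The maximum-likelihood Poisson mean is the sample mean."""
    return sum(ys) / len(ys)


def fit_zip(ys, iters=200):
    """EM estimate of (pi, lam) for an intercept-only ZIP model.
    Assumes ys contains at least one non-zero count."""
    n, n0, total = len(ys), sum(1 for y in ys if y == 0), sum(ys)
    pi, lam = 0.5, total / n
    for _ in range(iters):
        # E-step: probability that an observed zero is a structural zero.
        z = pi / (pi + (1 - pi) * math.exp(-lam))
        # M-step: update the mixing weight and the Poisson mean.
        pi = n0 * z / n
        lam = total / (n - n0 * z)
    return pi, lam


def compare_on_random_splits(ys, n_splits=50, seed=0):
    """Count the random 50/50 fit/test splits on which ZIP has the higher
    held-out log-likelihood (an assumed criterion, not the paper's)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_splits):
        shuffled = ys[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        fit, test = shuffled[:half], shuffled[half:]
        lam_p = fit_poisson(fit)
        pi_z, lam_z = fit_zip(fit)
        ll_p = sum(poisson_logpmf(y, lam_p) for y in test)
        ll_z = sum(zip_logpmf(y, pi_z, lam_z) for y in test)
        wins += ll_z > ll_p
    return wins


if __name__ == "__main__":
    rng = random.Random(42)
    faults = [sample_zip(rng, pi=0.6, lam=3.0) for _ in range(400)]
    wins = compare_on_random_splits(faults)
    print(f"ZIP had the higher held-out log-likelihood in {wins} of 50 splits")
```

Repeating the split many times, rather than relying on a single partition, is what guards against a conclusion that depends on one lucky or unlucky division of the data. A regression version would replace the constants `pi` and `lam` with functions of the software metrics.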

Notes on contributors

Taghi M. Khoshgoftaar is a professor in the Department of Computer Science and Engineering, Florida Atlantic University, and the Director of the Empirical Software Engineering Laboratory. His research interests are in software engineering, software metrics, software reliability and quality engineering, computational intelligence, computer performance evaluation, data mining, and statistical modeling. He has published more than 200 refereed papers in these areas. He has been a principal investigator and project leader in a number of projects with industry, government, and other research-sponsoring agencies. He is a member of the Association for Computing Machinery, the IEEE Computer Society, and the IEEE Reliability Society. He served as the general chair of the 1999 International Symposium on Software Reliability Engineering (ISSRE’99) and of the 2001 International Conference on Engineering of Computer Based Systems. He has also served on the technical program committees of various international conferences, symposia, and workshops. He has served as North American editor of the Software Quality Journal, and is on the editorial boards of the journals Empirical Software Engineering, Software Quality, and Fuzzy Systems.

Kehan Gao received the Ph.D. degree in Computer Engineering from Florida Atlantic University, Boca Raton, FL, USA, in 2003. She is currently an Assistant Professor in the Department of Mathematics and Computer Science at Eastern Connecticut State University. Her research interests include software engineering, software metrics, software reliability and quality engineering, computer performance modeling, computational intelligence, and data mining. She is a member of the IEEE Computer Society and the Association for Computing Machinery.

Robert M. Szabo received the Ph.D. degree in computer science from Florida Atlantic University, Boca Raton, FL, USA, in 1995. He received the M.S. degree (1981) in computer science and the B.S. degree (1980, Summa Cum Laude) in computer science from Cleveland State University, Cleveland, OH, USA. He is currently a Senior I/T Architect in IBM Software Group, Public Sector Solutions Development, Boca Raton, FL, USA. He is a member of the IEEE and of the Empirical Software Engineering Laboratory, Florida Atlantic University, Boca Raton, FL, USA, and an Industrial Affiliate of the Center for Cardiovascular Bioinformatics and Modeling, Johns Hopkins University, Baltimore, MD, USA. His research interests include software quality engineering, software quality modeling, and databases for supporting biological data mining.
