Research Article

Multi-Step Dynamic Ensemble Selection to Estimate Software Effort

Article: 2351718 | Received 18 Mar 2024, Accepted 26 Apr 2024, Published online: 16 May 2024

ABSTRACT

Software Effort Estimation (SEE) is a foremost concern of software companies that must develop and deliver software products within a defined budget and time. Many software companies fail to deliver the product on time, due to problems of either over-estimation or under-estimation. To aid the decision-making process of analysts and experts, this paper proposes a multi-step dynamic ensemble selection (MS-DES) approach. MS-DES works in two steps: (i) selecting, from a pool of models, those anticipated to perform best when generating a prediction, and (ii) predicting the labeled discretized effort more accurately. The paper utilizes four software effort datasets discretized into labeled effort ranges. The performance of the proposed model is evaluated using the k-Nearest Neighbor Oracle (KNORA) canonical approach to DES, and, to reduce complexity, filter feature selection techniques are applied to extract the relevant feature set. The proposed feature selection-based MS-DES model outperformed the individual models in predicting labeled effort in terms of confusion matrix parameters, with an accuracy of more than 90% on all datasets, and the results are validated using the ROC curve.

Introduction

Software Effort Estimation is a critical task in software project development. The effort required to complete the development process is predicted in person-hours or man-months. Many software development failures are caused by unattainable or poorly defined project goals, incorrect resource estimates (overestimation or underestimation), and a failure to control project complexities. Effort estimation is crucial before the development of a software project and is also an ongoing activity to prevent software failures. In recent times, researchers have put forth methods for achieving high predictability of effort, and the community has become increasingly interested in the application of machine learning techniques to software effort estimation. However, no single ML model is considered effective for all software effort datasets. Therefore, identifying a model that is efficient for all software datasets and gives the best performance in terms of accurate estimation is always a tough assignment.

Furthermore, Shepperd and Kadoda (Citation2001) argue that choosing the best model for a given situation is more beneficial than choosing the best individual model, because models behave unpredictably across different data sets. This is because no model can be said to be the best for all software effort data sets, in accordance with the no-free-lunch theorem as discussed by Bardsiri et al. (Citation2013). One way to overcome this problem is to combine diverse regression or classification algorithms through ensemble learning, as in Kocaguneli, Menzies, and Keung (Citation2011) and Idri, Hosni, and Abran (Citation2016). Since ensembles combine multiple ML algorithms, they are more dependable, as in Kocaguneli, Menzies, and Keung (Citation2011). Increased accuracy and robustness have been noted as advantages of ensembles over individual models, as in Mendes-Moreira et al. (Citation2012).

The selection of models may be either static or dynamic. In static selection, the same model subset is used to make predictions for every test instance, whereas dynamic selection may choose a different subset for each instance. Systematic reviews by Wen et al. (Citation2012), Idri, Hosni, and Abran (Citation2016), and Jorgensen and Shepperd (Citation2006) cover ML, ensembles, and effort estimation; Britto, Sabourin, and Oliveira (Citation2014) present a thorough analysis of dynamic selection; and Usman et al. (Citation2014) review agile-based software development effort estimation.

Therefore, our research proposes a multi-step dynamic ensemble selection (MS-DES) of heterogeneous models to handle SEE challenges. Since SEE is a regression problem, the models utilized in the preliminary phase of effort prediction are typically regression models. Using discretization, the original data set is transformed into a new data set suitable for classifiers: the target attribute of the original dataset (effort, worksup) is divided into intervals, and a class is formed for each.

This article evaluates the use of DES models with machine learning algorithms. The dynamic selection of heterogeneous models, made up of a collection of classification models, is suggested as a means of solving the SEE problem. The suggested method is evaluated and compared against previously published individual models and ensemble methods using confusion matrix parameters. The comparison with individual classifiers and dynamic ensemble selection methods shows the evolution of the proposed model from the individual models. The paper also utilizes feature selection techniques to extract the most relevant features of each dataset, in order to increase accuracy and minimize complexity with a limited number of features.

Our Contribution

The article aspires to facilitate the transition of both academic institutions and corporate entities from traditional modes of effort estimation to AI-driven transformation. It assesses the efficacy of employing DES models in conjunction with machine learning algorithms for this purpose. The proposed approach advocates the utilization of dynamically selected heterogeneous models, comprising a variety of classification models, as a strategy to address the Software Effort Estimation (SEE) challenge. Furthermore, it underscores the necessity of a comprehensive understanding of the impact of enabling technologies on digitization and adherence to design principles when formulating implementation plans. The main contributions of the article are as follows:

  • Machine learning and ensemble learning models are employed on the datasets to achieve enhanced accuracy in effort estimation.

  • The paper utilizes discretization, classifying the output variable into ordered labels, to transform the task from regression to classification.

  • The paper evaluates the effectiveness of DES models with machine learning algorithms, particularly focusing on the dynamic selection of heterogeneous models comprising various classification models.

  • The utilization of feature selection techniques to enhance accuracy and reduce complexity.

  • The paper compares the performance of multiple canonical approaches to dynamic ensemble selection based on evaluation parameters.

Article Organisation

The rest of the paper is organized as follows. Background briefly reviews work in the domain and defines SEE. Methodology discusses the methodology and the formal definition of the techniques utilized. Proposed work presents the proposed method, with its algorithm and flow. Results and discussion shows the experiments' statistical results based on the evaluation metrics. Discussion and threats to validity discusses the results, their comparison with existing work, and threats to validity. Lastly, Conclusion and future work summarizes the study and provides future recommendations.

Background

This section introduces the importance of software effort estimation and the work performed by various authors and researchers in the field, along with the motivation for the research.

Software Effort Estimation

An important factor in the effective development of a software project is estimating the effort of the software product prior to production. Accurate effort estimation helps decision-makers manage and reuse resources, budget, and product development time. The scale of the project, the technology being utilized, the expertise of the company staff, the environment in which the project is being developed, and business standards are just a few of the variables that affect the effort estimation of software projects. These elements, often known as cost drivers, provide an estimate of the real effort needed to construct a software project.

The goal of our research is to increase the precision of estimates for developing software projects. Many software projects fail during development as a result of faulty budgets, defects, and effort estimates, or run past the development deadline. The effort of a software product can be estimated in a variety of ways. Traditional methods include expert judgment, the top-down approach, and statistical and parametric approaches, which estimate the effort based on prior information and collected client data.

Related Work

To estimate effort, Rao and Appa Rao (Citation2021) suggested a novel model based on ensemble learning and recursive feature elimination. The suggested approach estimates effort from characteristics such as size and cost using a feature ranking and selection procedure. With the COCOMO II dataset, simulation results for the suggested technique seem positive. In comparison to individual techniques, the performance of the suggested solution is quite promising with respect to both actual cost and LOC.

Cabral and Oliveira (Citation2021) proposed a DES approach, a collection of regressors chosen in real time by classifiers, to estimate effort. The classifiers learn to choose the most effective regressor from a group of candidates across a number of different data sets; the regressors they choose have their predictions combined to form the final prediction.

Mousavi, Eftekhari, and Rahdari (Citation2018) recommended an unconventional approach integrating static and dynamic ensemble selection processes with Over-Bagging, which is used for learning from class-imbalanced data. To determine the optimal classifiers and their combiner across all test samples, the proposed technique employs a Genetic Algorithm as the static ensemble selection method; dynamic ensemble selection is then performed on a subset of the chosen classifiers for each test sample. The authors claim that the suggested Omni-Ensemble Learning (OEL) has superior performance on the G-mean, balance, and AUC metrics across seven NASA datasets.

Suresh Kumar et al. (Citation2022) proposed a gradient boosting regressor model trained on the COCOMO81 and CHINA datasets. The regression evaluation metrics MAE, MSE, RMSE, and R2 were used to measure performance. The authors state that the gradient boosting regressor model is effective, as evidenced by its accuracy of 98% on the COCOMO81 dataset and 93% on the CHINA dataset.

Alhazmi and Zubair Khan (Citation2020) employed ensemble learning bagging using linear regression as the base learner, as well as MLP, REPTree, random forest, SMOReg, and M5Rule. They also constructed two feature selection techniques, BestFit and Genetic Algorithm. The model is tested on the China dataset. According to the authors, the relative error of the Bagging M5 rule with Genetic Algorithm feature selection is 10%.

Hidmi and Erdogdu Sakar (Citation2017) proposed ensemble learning that merges the Support Vector Machine (SVM) and k-Nearest Neighbor (k-NN) machine learning algorithms. The authors used two labeled public datasets, Desharnais and Maxwell, and report that ensemble learning enhances the outcomes and that the SVM technique outperforms the k-NN technique.

Shukla and Kumar (Citation2019) applied feature reduction to determine the most important attributes in the Desharnais data set. For software effort estimation, they then applied an MLPNN to the reduced data set and obtained an R2 value of 79%.

Marco, Sharifah Syed Ahmad, and Ahmad (Citation2022) proposed AdaBoost ensemble learning and random forest (RF), with the Bayesian optimization method used to obtain the model's hyperparameters. The SEE model was trained and tested using the PROMISE repository and the ISBSG dataset. According to the authors, the AdaBoost ensemble learning and Bayesian-optimization-based RF approach outperforms the alternatives. The AdaBoost-based model also rates the relevance of each feature, making it a viable tool for estimating software effort.

Hosni, Idri, and Abran (Citation2021) use three filter feature selection methods: correlation-based feature selection (CFS), RReliefF, and linear correlation. In this study, the authors used ensemble modeling with four machine learning models (kNN, SVR, MLP, and decision tree), divided into three categories: ensembles with no feature selection, ensembles with one feature selection method, and ensembles with different feature selection methods. According to the authors, ensemble members produce more accurate estimates by using various feature subsets rather than the same feature subset or all of the available features.

Hosni, Idri, and Abran (Citation2017) used the correlation and RReliefF filter feature selection techniques on six separate datasets: Albrecht, COCOMO81, China, Desharnais, Kemerer, and Miyazaki. The estimation model applied is an ensemble learning strategy with kNN, SVR, MLP, and decision trees. According to the authors, the ensemble learning model without feature selection outperformed, in terms of accuracy, the same model applied with feature selection techniques.

Pospieszny, Czarnacka-Chrobot, and Kobylinski (Citation2018) proposed three machine learning models, SVM, MLP, and GLM, and ensembled them by averaging the three models. The ensemble approach outperforms effort prediction with each algorithm applied individually.

To conclude, the authors of the previous works used ensemble learning, which combines the predictions from multiple models and extracts the best out of them. These studies collectively contribute to advancing SEE methodologies, offering improved accuracy and robustness in effort estimation for software projects. This paper aims at the same goal, but with the objective of extracting the best models from the pool of ML models and then applying them to the datasets. Ensemble models are less interpretable, as their output is harder to trace and understand. As discussed in a survey by Mahmood et al. (Citation2022), ensemble approaches outperform individual models, and they bring further advantages: ensemble modeling minimizes modeling-method bias and reduces variance, thus decreasing the likelihood of over-fitting.

Motivation

To keep up with the current demand in software development processes, the project manager must anticipate beforehand how long project development will take and how much effort is required to develop the project successfully. The effort (cost) to complete software development is the key risk element. Seen in this light, the peculiarities of developing software projects can make the cost estimation process very challenging, as discussed by Araújo, Soares, and Oliveira (Citation2012). Due to this, software engineering still considers effort estimation a challenge. Additionally, the emergence of diverse datasets and the need for more robust and adaptable models further drive the motivation for developing and evaluating new methodologies in SEE. Overall, the goal is to incorporate discretization and multi-step DES to enhance the effectiveness and reliability of effort estimation in software projects, ultimately contributing to improved project planning, resource allocation, and project success.

Methodology

The Background section discussed the literature and our motivation for proposing a novel approach to the problem. This section describes the datasets utilized in our research and defines the proposed approaches along with their advantages and disadvantages. Lastly, the evaluation parameters for validating the performance of the proposed model are specified for clear understanding.

Taking this challenge and the work of academicians and researchers into consideration, we aim to create a model that minimizes the selection process for classification techniques and speeds up decision-making by analysts and experts in the field. The paper proposes an approach for selecting the most appropriate machine learning models from a given set of candidates and obtaining the highest accuracy in terms of performance metrics.

Dataset Description

The four publicly available datasets China, Kitchenham, Maxwell, and Desharnais are used in this research to create a model for determining the effort needed for new software development initiatives. These datasets are the most often used in the area of software effort estimation.

J.M. Desharnais acquired data on 81 projects from a Canadian software business in the late 1980s, forming the Desharnais dataset (Desharnais Citation1989; Mair et al. Citation2000). The original dataset has 12 attributes; however, the ProjectID attribute was left out of this study's analysis because it is irrelevant to the study's goals. The dataset has 10 independent attributes and one dependent attribute (effort), and all variables are numeric except the nominal Language attribute.

The Maxwell dataset, collected by Maxwell (Citation2002), covers 62 projects accomplished between 1985 and 1993. Each project is summarized by 27 numerical attributes; there are 27 independent features and one dependent feature (effort), as in Li (Citation2009).

The China dataset (Yun Citation2010) contains 499 projects with a total of 19 attributes; the Function Point serves as the measuring unit. Tim Menzies first released the dataset on the PROMISE repository in 2010. No contextual information is provided with the data.

The Kitchenham dataset was obtained from Computer Science Corporation (CSC), an American multinational. It was first released in 2002 by Kitchenham et al. (Citation2002) and was gathered from software projects that CSC worked on between 1994 and 1999. The dataset contains a total of 145 software development and maintenance projects, each with 10 attributes, as in Tsunoda (Citation2017). The attributes include start and expected completion dates, and function points are used to measure project size.

Data Preprocessing

Pre-processing the input data is the first and most important step in preventing undesired noise and artifacts in this research. This sub-section describes the methodology employed for pre-processing the data.

Discretization

The problem of "Software Effort Estimation" is a regression problem with a continuous output variable. A regression problem can be changed into a classification problem by dividing the attribute to be forecasted into distinct classes or buckets. This process is frequently referred to as discretization, as in Brownlee (Citation2017), and the output variable is classified with ordered labels (ordinal classes). To transform the task from regression to classification, we discretized the numerical target variable into a set of classes or intervals. This is important because the machine learning methods we adopted work with minimal modeling and do not predict continuous values but discrete (labeled) ones; in other words, the paper works with machine learning algorithms for the classification of labeled data. We balanced the data and labeled it into five classes each for the China, Desharnais, and Kitchenham datasets and four classes for the Maxwell dataset, as in Radlinski and Hoffmann (Citation2010); the labels are explained in Table 1. The datasets used in our work are discretized on the basis of the target field, i.e., Effort.

Table 1. Labeled dataset description with effort ranges.
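As an illustration of this step, the sketch below bins a continuous effort column into ordered labels with pandas. The quantile-based binning and the function name are assumptions for illustration; the paper's own effort ranges (Table 1) would replace the data-driven bin edges.

```python
# Minimal sketch of effort discretization, assuming a pandas DataFrame with a
# continuous "Effort" column. Quantile binning keeps the classes balanced,
# mirroring the balanced labeled classes described above.
import pandas as pd

def discretize_effort(df: pd.DataFrame, target: str = "Effort", n_classes: int = 5) -> pd.DataFrame:
    out = df.copy()
    # labels=False yields ordinal integer labels 0 .. n_classes-1.
    out[target] = pd.qcut(out[target], q=n_classes, labels=False)
    return out
```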

Data Normalisation

The datasets selected for the experiment are normalized into the range 0 to 1 using the MinMax scaling transformation (Pedregosa et al. Citation2011). MinMax scaling preserves the shape of the original data distribution and, given the wide scale of the feature ranges, improves the performance of the models over the datasets.

Feature Selection

As discussed earlier under data preprocessing, the paper targets software effort estimation, a regression problem that has been converted into classification using discretization. The relevant optimal feature set is selected using the SelectKBest classification filter feature selection technique, as in Brownlee (Citation2019). The technique selects the most relevant features, those with the greatest impact on the target variable, which proportionally increases the performance of the machine learning models. SelectKBest is a white-box filter technique that extracts the relevant features according to the k highest scores. In this research, the value of k is 7 for the Kitchenham dataset, 10 for the Desharnais and China datasets, and 12 for the Maxwell dataset, assigned using a trial-and-error approach.
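A minimal sketch of these two preprocessing steps with scikit-learn follows; the ANOVA score function (f_classif) is an assumption, since the paper names only SelectKBest.

```python
# Sketch of normalisation plus filter feature selection, assuming a feature
# matrix X and discretized labels y as NumPy arrays. f_classif is an assumed
# score function; the text specifies only SelectKBest and the k per dataset.
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import SelectKBest, f_classif

def preprocess(X, y, k=7):  # k = 7 (Kitchenham), 10 (Desharnais, China), 12 (Maxwell)
    X_scaled = MinMaxScaler().fit_transform(X)        # scale features into [0, 1]
    selector = SelectKBest(score_func=f_classif, k=k)
    X_selected = selector.fit_transform(X_scaled, y)  # keep the k best-scoring features
    return X_selected
```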

Machine Learning Models

Machine learning, often referred to as a branch of artificial intelligence, emphasizes the creation of techniques and systems that enable computers to learn and carry out tasks on their own. Machine learning techniques, which bear some similarity to the human mind, enable us to tackle complexities quickly. Machine learning involves training algorithms on data and then making predictions or taking actions based on the learned patterns. Machine learning algorithms have been recommended as an alternative method to estimate software effort for the last 20 years or so.

Dynamic Ensemble Selection (DES)

A subset of the ensemble members is automatically chosen when formulating a prediction using the ensemble learning technique known as "dynamic ensemble selection", as discussed in Cruz et al. (Citation2020). This method fits a large number of machine learning models to the training dataset and then, for a given test case, selects the models expected to perform best when making a prediction, taking into account the peculiarities of the instance to be predicted. The selection mechanism considers the accuracy, diversity, and/or stability of the models in the pool and selects a set of models that complement each other and improve overall performance.

For dynamic selection in the current approach, several distinct classifiers are used; some, though not all, are based on local accuracy. The paper applies multiple machine learning strategies to extract the best-suited models from the pool and then labels discretized effort using DES again, i.e., the proposed Multi-step DES (MS-DES) approach. The selected techniques are both parametric and non-parametric: kNN, Random Forest, and Extra Trees are non-parametric, while Naive Bayes, MLP, and Logistic Regression are parametric. This difference will enable us to reveal what kind of model works better on software effort estimation datasets. The classifiers employed in the first dynamic selection step, used to extract the best classification models in this work, are shown in Table 2.

Table 2. Classification algorithms.

Some standard ensemble learning methods (Dong et al. Citation2020) such as AdaBoost, Bagging, Extra Trees, and Random Forest are incorporated into the pool. Decision trees with one level, i.e., decision trees with only one split (decision stumps), are the most popular base algorithm used with AdaBoost. A bagging classifier is an ensemble meta-estimator that fits base classifiers on random subsets of the original dataset and then combines their predictions (either by voting or by averaging) to form the final prediction. The Extremely Randomized Trees (Extra Trees) classifier is an ensemble learning technique that combines the results of multiple de-correlated decision trees gathered in a "forest" to produce its classification outcome. Random forest, again an ensemble meta-estimator, fits multiple decision trees and uses averaging for the final prediction.

Extreme Gradient Boosting (XGBoost) is a distributed, scalable gradient-boosted decision tree (GBDT) machine learning framework that offers parallel tree boosting. The multi-layer perceptron classifier, a feed-forward artificial neural network model that uses stochastic gradient descent to minimize the log-loss function, is also utilized, as in Ray (Citation2019). The pool further includes other machine learning models such as k-Nearest Neighbors, support vector machines, logistic regression, decision trees, naive Bayes, and J48, as in Mahesh (Citation2020).
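A sketch of such a heterogeneous pool in scikit-learn is shown below. All hyperparameters are library defaults (assumptions), XGBoost is omitted to keep the sketch dependency-free, and the C4.5-style J48 learner is approximated by a plain decision tree, since C4.5 itself is not in scikit-learn.

```python
# Illustrative heterogeneous pool following Table 2 and the text above.
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              ExtraTreesClassifier, RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def build_pool():
    return [
        KNeighborsClassifier(),             # non-parametric
        GaussianNB(),                       # parametric
        LogisticRegression(max_iter=1000),  # parametric
        MLPClassifier(max_iter=1000),       # parametric
        SVC(probability=True),              # probabilities needed by some DES methods
        DecisionTreeClassifier(),           # stands in for J48 (C4.5)
        AdaBoostClassifier(),               # decision stumps by default
        BaggingClassifier(),
        ExtraTreesClassifier(),
        RandomForestClassifier(),
    ]
```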

Proposed Work

The primary goal of our paper is to extract, from the pool of models, those that work effectively for predicting effort. The secondary goal is to achieve the highest precision of estimates for developing software projects. To do this, the paper proposes a multi-step dynamic ensemble selection (MS-DES) system: a variety of machine-learning classification approaches are constructed and applied to a pool of classification algorithms in order to extract the best classification models. Dynamic ensemble selection (DES) is then applied again, utilizing the extracted classification models, to improve effort estimation accuracy for software project datasets, as in García et al. (Citation2018). The algorithmic description of the proposed Multi-step Dynamic Ensemble Selection (MS-DES) is depicted in Algorithm 1.

Dynamic ensemble selection strategies pick a group of classifiers as opposed to only one; the ensemble is made up of all base classifiers that meet a minimum degree of competence. By fitting many models, each with a low error rate, and then combining them into an ensemble that may perform better, DES can improve the accuracy of any learning system. The first DES step identifies the most relevant models for the classification of the labeled datasets (China, Maxwell, Desharnais, and Kitchenham) depending on the training data. In the next step, DES is applied with the extracted models, computing the results based on the canonical approaches to dynamic ensemble selection: the k-Nearest Neighbor Oracle (KNORA) Eliminate and Union variants, and META-DES.
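The sketch below gives one plausible reading of this two-step procedure using DESlib (Cruz et al. Citation2020). The median-accuracy rule used to extract the reduced pool in step 1 is an assumption, as the paper does not spell out its extraction criterion.

```python
# Hedged sketch of the two-step MS-DES flow with DESlib. Step 1 extracts a
# reduced pool (here: base classifiers at or above median accuracy on the
# dynamic-selection data, an assumed criterion); step 2 fits a canonical DES
# method over it (KNORA-E here; KNORAU and METADES are the other options).
import numpy as np
from deslib.des import KNORAE  # also available: KNORAU, METADES

def ms_des(pool, X_train, y_train, X_dsel, y_dsel):
    for clf in pool:                       # fit every base model on the training split
        clf.fit(X_train, y_train)

    # Step 1: keep the models anticipated to perform best.
    scores = [clf.score(X_dsel, y_dsel) for clf in pool]
    reduced = [clf for clf, s in zip(pool, scores) if s >= np.median(scores)]

    # Step 2: dynamic ensemble selection over the extracted models only.
    des = KNORAE(pool_classifiers=reduced)
    des.fit(X_dsel, y_dsel)                # defines the region of competence
    return des                             # des.predict(X_test) yields labeled effort
```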

The overall procedure is divided into three phases, data preprocessing, the proposed multi-step dynamic ensemble selection, and model evaluation, as illustrated in Figure 1. In phase 1, Data Preprocessing, the datasets are refined using three methods: discretization, normalisation, and feature selection. The datasets are then split into training and testing data in a 70:30 ratio. After refinement, the preprocessed data is provided to the proposed MS-DES model for training and testing. During the training phase, the model learns from the training data to capture patterns and relationships within the dataset. Following training, the model's performance is evaluated on the testing data to assess its ability to generalize accurately to unseen instances.

Figure 1. Proposed model.


In phase 2, Multi-step Dynamic Ensemble Selection, the pool of classifiers is first trained using the training data, allowing each classifier to learn patterns and relationships within the dataset. Once trained, the first step of the DES process is applied: DES dynamically selects the most pertinent classifiers from the pool for making predictions, based on the classifiers' performance and effectiveness in capturing the intricacies of the data. Following the selection of relevant classifiers, the next step in the MS-DES phase involves employing DES again to predict the labeled effort ranges, which provides a more detailed understanding of the effort required for various aspects of the software development process, allowing for more precise estimation.

Finally, in phase 3, Model Evaluation, the trained MS-DES model is evaluated on the testing data using the canonical approaches to DES, and confusion matrix parameter values are generated and validated in the form of results.

Results and Discussion

This section discusses the results obtained after applying the proposed approach, in terms of the evaluation metrics, and the evolution of the proposed model. To validate the performance measure, the receiver operating characteristic curve is created and depicted in this section. In the proposed multi-step DES, the outcome of the first DES step is the set of extracted classifiers relevant to the prediction. This set of classifiers is then employed in the next-level DES. The classifiers extracted by the first DES depend on the input training data, obtained by fitting a number of models, each with a low error rate. The extracted classifiers may vary across the four datasets, as shown in Table 3.

Table 3. Outcome of first-step DES.

Evaluation Metrics

The outcomes of the proposed model are analyzed using confusion matrix parameters, accuracy, sensitivity, F1-score, and precision or Positive Predictive Value (PPV), together with the r-squared value, on the multiple datasets. The formulation of the abovementioned confusion matrix parameters is depicted in Table 4.

Table 4. Confusion matrix parameter calculation.
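For reference, a hedged sketch of how the Table 4 parameters can be computed with scikit-learn follows; macro averaging across classes is an assumption, as the paper does not state the averaging mode for the multi-class labeled effort.

```python
# Sketch of the confusion-matrix-derived metrics, assuming multi-class effort
# labels in y_true and y_pred.
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score)

def evaluate(y_true, y_pred):
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision (PPV)": precision_score(y_true, y_pred, average="macro"),
        "sensitivity (recall)": recall_score(y_true, y_pred, average="macro"),
        "f1-score": f1_score(y_true, y_pred, average="macro"),
    }
```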

Model Evolution

The proposed MS-DES model evolved in an accumulative manner: the labeled datasets were first tested with the individual models NB, RF, ET, MLP, LR, and kNN, and the performance of each model was noted in terms of the confusion matrix parameter accuracy. This is then compared with the proposed multi-step DES applied with the classification models extracted by the first DES step. The performance comparison of the proposed model with the individual models is shown in Table 5.

Table 5. Accuracy of individual models vs dynamic ensemble selection.

The SelectKBest filter extracts the relevant subset of features from all the features according to the k highest scores; in this research the value of k is 7 for the Kitchenham dataset, 10 for the Desharnais and China datasets, and 12 for the Maxwell dataset. Tables 6 and 7 compare the performance of the proposed MS-DES model with and without feature selection, in terms of accuracy and r-squared value respectively.

Table 6. Accuracy comparison of DES with and without feature selection.

Table 7. R-Square value comparison of DES with and without feature selection.

In the final results, the feature-selection-based Multi-step Dynamic Ensemble Selection gives the most accurate results in terms of the confusion matrix parameters (accuracy, precision, recall, and F1-score) and the r-squared value, as shown in Table 8. The highest accuracies of 94.74%, 99.30%, 92%, and 95.45% are obtained with the Maxwell, China, Desharnais, and Kitchenham datasets respectively. An r-squared value close to 1 indicates an effective and efficient model; the r-squared values of 95.18%, 99.60%, 95.92%, and 98.16% for the Maxwell, China, Desharnais, and Kitchenham datasets respectively show that the proposed approach outperforms.

Table 8. Confusion metrics parameters values obtained based on proposed work.

ROC

The receiver operating characteristic (ROC) curve is obtained to forecast effort ranges and to validate the performance measure derived from the confusion matrix. The ROC curve assesses the numbers of true positives and false positives in the test results, plotting the true positive rate along the Y-axis against the false positive rate along the X-axis. Figure 2 illustrates the ROC curves for the proposed feature-selection-based multi-step DES model, and the ROC area under the curve scores obtained for the employed datasets are depicted in Table 9.

Figure 2. RoC curves for proposed model.


Table 9. ROC AUC score of datasets.
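A sketch of the one-vs-rest ROC/AUC check for the multi-class effort labels is given below, assuming the fitted model exposes predict_proba (DESlib estimators do when their base classifiers do). Plotting is omitted; roc_curve returns the points behind curves like those of Figure 2.

```python
# Hedged sketch of the multi-class ROC/AUC computation via one-vs-rest
# binarization of the labeled effort classes.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.preprocessing import label_binarize

def roc_summary(model, X_test, y_test):
    classes = np.unique(y_test)
    y_score = model.predict_proba(X_test)
    y_bin = label_binarize(y_test, classes=classes)
    auc = roc_auc_score(y_bin, y_score, average="macro")  # macro AUC, one-vs-rest
    curves = {c: roc_curve(y_bin[:, i], y_score[:, i])    # (fpr, tpr, thresholds)
              for i, c in enumerate(classes)}
    return auc, curves
```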

The proposed Multi-step Dynamic Ensemble Selection (MS-DES) model is developed incrementally, with initial testing conducted on individual models such as Naive Bayes, Random Forest, Extra Trees, Multilayer Perceptron, Logistic Regression, and k-Nearest Neighbors using the labeled datasets, with performance evaluated on the accuracy metric. Subsequently, the MS-DES model is compared with these individual models after the initial DES step, showcasing its performance in terms of accuracy. Additionally, feature selection is performed using the SelectKBest filter method to extract relevant subsets of features from the datasets. The evolution of the MS-DES model's performance with and without feature selection is documented in the tables, highlighting improvements in accuracy and r-squared values. The results indicate that the MS-DES KNORA-E model with feature selection yields the most accurate predictions, achieving high accuracy and r-squared values across the different datasets. Furthermore, the receiver operating characteristic (ROC) curve is utilized to assess the model's performance in forecasting effort ranges, with the area under the curve (AUC) score reported for each dataset. The proposed MS-DES model will contribute to the advancement of software engineering practices by providing more accurate effort estimation, enabling better decision-making, optimizing resource allocation, reducing project risks, and facilitating continuous improvement in project management processes.

Discussion and Threats to Validity

A performance analysis of the three standard canonical methods of dynamic ensemble selection, the k-Nearest Neighbor Oracle (KNORA) Eliminate and Union variants and META-DES, suggests that KNORA-E outperforms the other canonical approaches in terms of accuracy and r-squared value. The proposed MS-DES approach uses two levels of DES. The first level separates the appropriate classifiers from a vast collection of classifiers, choosing a group of classifiers and the most suitable combiner for each test sample. The second-level DES then employs the chosen group of classifiers against each test sample to predict labeled effort ranges. The k-Nearest Neighbor Oracle, the standard dynamic ensemble selection method, is utilized to compute the performance of the model and effectively and efficiently yields higher accuracy and quality performance.

To validate the performance of the proposed MS-DES model, the paper compares it with existing models from the literature. The comparison is based on the r-squared value, the coefficient of determination, which lies between 0 and 1. R-squared is a statistical method for evaluating the applicability and reliability of model performance; it shows the proportion of the dependent variable's variance that the independent variables account for collectively.
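For reference, the coefficient of determination over actual values y_i, predictions \hat{y}_i, and the mean \bar{y} is the standard

```latex
R^2 \;=\; 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```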

In comparison with existing work in the same domain, Malhotra and Jain (Citation2011) obtained an r-squared value of 93% with the China dataset; Ur Rehman, Ali, and Jan (Citation2021) obtained r-squared values of 72%, 19%, and 88% with the China, Desharnais, and Kitchenham datasets respectively; Burgess and Lefley (Citation2001) acquired an r-squared value of 82% with the Desharnais dataset; and Hidmi and Erdogdu Sakar (Citation2017) obtained accuracies of 91.35% with the Desharnais and 85.48% with the Maxwell datasets. Comparing our proposed approach with these existing methods shows that the proposed MS-DES approach outperforms the other techniques. The proposed model performs efficiently with these datasets and may generalize for predicting effort in software companies on the basis of previously developed products.

We obtained significant results from this research for effort estimation to enhance the software development process. The reliability of our findings is, however, still threatened by a few factors, chiefly related to the generalization of the obtained results. The toughest threat is that we tested our methodology on four publicly available datasets; several additional publicly available datasets for effort estimation exist, such as the ISBSG datasets, and the selected datasets only provide information on a small number of projects. However, we believe that our findings may be replicated on different datasets and that our suggested methodology would work effectively with any effort estimation dataset.

The choice of software metrics used to build the predictors poses another threat to validity. Even though we only used standard measurements for labeled data (confusion matrix parameters), we recognize the possibility of varying outcomes based on the software metric used. Instead of evaluating the effectiveness of predictors created using various kinds of software metrics, we investigate the role of a confined metric set in estimating effort from the perspective of the trade-off between generality, cost, and accuracy.

Conclusion and Future Work

This work proposes a heterogeneous Multi-step Dynamic Ensemble Selection (MS-DES) approach, composed of a pool of classifiers chosen dynamically by classifiers, to address software development effort estimation. Using the suggested methodology, it conducts an experimental analysis on four pertinent datasets for software development effort estimation, discretized into labeled data based on effort ranges. Since no single model is effective with all datasets, as discussed earlier, the proposed feature-selection-based MS-DES model predicts effort ranges effectively in terms of confusion matrix parameters, resulting in accuracies of 94.74%, 99.30%, 92%, and 95.45% with the Maxwell, China, Desharnais, and Kitchenham datasets respectively. The accuracy of the proposed MS-DES model is calculated using the canonical approaches to dynamic ensemble selection, the k-Nearest Neighbor Oracle variants (KNORA-E, KNORA-U) and META-DES; KNORA-E outperforms the other canonical approaches in terms of accuracy and r-squared value.

In the future, we intend to apply the model to larger available datasets in order to validate its performance. To the proposed model, which chooses the most suitable classifiers from the pool, we intend to add new classification techniques to improve its efficacy and to develop a model independent of any particular classification problem.

Disclosure Statement

No potential conflict of interest was reported by the author(s).

Data Availability Statement

The datasets analyzed during the current study are available in different repositories: Desharnais dataset is available in Promise repository, http://promise.site.uottawa.ca/SERepository/datasets/desharnais.arff, Maxwell, China, and Kitchenham datasets are available in Zenodo repository their respective web links are; https://doi.org/10.5281/zenodo.268461, https://doi.org/10.5281/zenodo.268446, https://doi.org/10.5281/zenodo.268457.

Additional information

Funding

This research was funded by Horizon Europe (HORIZON) project number 101138678 — ZEBAI

References

  • Alhazmi, O. H., and M. Zubair Khan. 2020. Software effort prediction using ensemble learning methods. Journal of Software Engineering and Applications 13 (7):143–20. doi:10.4236/jsea.2020.137010
  • Araújo, R. D. A., S. Soares, and A. L. Oliveira. 2012. Hybrid morphological methodology for software development cost estimation. Expert Systems with Applications 39 (6):6129–39. doi:10.1016/j.eswa.2011.11.077
  • Bardsiri, V. K., D. Norhayati Abang Jawawi, A. Khatibi Bardsiri, and E. Khatibi. 2013. LMES: A localized multi-estimator model to estimate software development effort. Engineering Applications of Artificial Intelligence 26 (10):2624–40. doi:10.1016/j.engappai.2013.08.005
  • Britto, A. S., Jr, R. Sabourin, and L. E. Oliveira. 2014. Dynamic selection of classifiers—a comprehensive review. Pattern Recognition 47 (11):3665–80. doi:10.1016/j.patcog.2014.05.003
  • Brownlee, J. 2017. Difference between classification and regression in machine learning. Machine Learning Mastery 25:985–1.
  • Brownlee, J. 2019. How to choose a feature selection method for machine learning. Machine Learning Mastery 10:1–7.
  • Burgess, C. J., and M. Lefley. 2001. Can genetic programming improve software effort estimation? A comparative evaluation. Information and Software Technology 43 (14):863–73. doi:10.1016/S0950-5849(01)00192-6
  • Cabral, J. T. H. D. A., and A. L. Oliveira. 2021. Ensemble effort estimation using dynamic selection. Journal of Systems and Software 175:110904. doi:10.1016/j.jss.2021.110904
  • Cruz, R. M., L. G. Hafemann, R. Sabourin, and G. D. Cavalcanti. 2020. DESlib: A dynamic ensemble selection library in Python. Journal of Machine Learning Research 21 (8):1–5.
  • Desharnais, J. M. 1989. Analyse statistique de la productivitie des projects informatique a partie de la technique des point des function. Masters thesis University of Montreal.
  • Dong, X., Z. Yu, W. Cao, Y. Shi, and Q. Ma. 2020. A survey on ensemble learning. Frontiers of Computer Science 14 (2):241–58. doi:10.1007/s11704-019-8208-z
  • García, S., Z. Zhong-Liang, A. Abdulrahman, A. Saleh, and H. Francisco. 2018. Dynamic ensemble selection for multi-class imbalanced datasets. Information Sciences 445:22–37. doi:10.1016/j.ins.2018.03.002
  • Hidmi, O., and B. Erdogdu Sakar. 2017. Software development effort estimation using ensemble machine learning. International Journal of Computing, Communication and Instrumentation Engineering 4 (1):143–47.
  • Hosni, M., A. Idri, and A. Abran. 2017. “Investigating heterogeneous ensembles with filter feature selection for software effort estimation.” In Proceedings of the 27th international workshop on software measurement and 12th international conference on software process and product measurement, October 25–27, Gothenburg Sweden, 207–220.
  • Hosni, M., A. Idri, and A. Abran. 2021. On the value of filter feature selection techniques in homogeneous ensembles effort estimation. Journal of Software: Evolution and Process 33 (6):e2343. doi:10.1002/smr.2343
  • Idri, A., M. Hosni, and A. Abran. 2016. Systematic literature review of ensemble effort estimation. Journal of Systems and Software 118:151–75. doi:10.1016/j.jss.2016.05.016
  • Jorgensen, M., and M. Shepperd. 2006. A systematic review of software development cost estimation studies. IEEE Transactions on Software Engineering 33 (1):33–53. doi:10.1109/TSE.2007.256943
  • Kitchenham, B., S. Lawrence Pfleeger, B. McColl, and S. Eagan. 2002. An empirical study of maintenance and development estimation accuracy. Journal of Systems and Software 64 (1):57–77. doi:10.1016/S0164-1212(02)00021-3
  • Kocaguneli, E., T. Menzies, and J. W. Keung. 2011. On the value of ensemble effort estimation. IEEE Transactions on Software Engineering 38 (6):1403–16. doi:10.1109/TSE.2011.111
  • Li, Y. 2009. “Effort estimation: Maxwell.” Mar. doi:10.5281/zenodo.268461
  • Mahesh, B. 2020. Machine learning algorithms-A review. International Journal of Science and Research (Ijsr) [Internet] 9:381–86.
  • Mahmood, Y., N. Kama, A. Azmi, A. Salman Khan, and M. Ali. 2022. Software effort estimation accuracy prediction of machine learning techniques: A systematic performance evaluation. Software: Practice and Experience 52 (1):39–65. doi:10.1002/spe.3009
  • Mair, C., G. Kadoda, M. Lefley, K. Phalp, C. Schofield, M. Shepperd, and S. Webster. 2000. An investigation of machine learning based prediction systems. Journal of Systems and Software 53 (1):23–29. doi:10.1016/S0164-1212(00)00005-4
  • Malhotra, R., and A. Jain. 2011. Software effort prediction using statistical and machine learning methods. International Journal of Advanced Computer Science and Applications 2 (1). doi:10.14569/IJACSA.2011.020122
  • Marco, R., S. Sharifah Syed Ahmad, and S. Ahmad. 2022. Bayesian hyperparameter optimization and ensemble learning for machine learning models on software effort estimation. International Journal of Advanced Computer Science and Applications 13 (3). doi:10.14569/IJACSA.2022.0130351
  • Maxwell, K. D. 2002. Applied Statistics for Software Managers, Illustrated. 333. Prentice Hall PTR. https://books.google.co.in/books?id=irVQAAAAMAAJ
  • Mendes-Moreira, J., C. Soares, A. M. Jorge, and J. F. S. Sousa. 2012. Ensemble approaches for regression: A survey. ACM Computing Surveys (CSUR) 45 (1):1–40. doi:10.1145/2379776.2379786
  • Mousavi, R., M. Eftekhari, and F. Rahdari. 2018. Omni-ensemble learning (OEL): Utilizing over-bagging, static and dynamic ensemble selection approaches for software defect prediction. International Journal on Artificial Intelligence Tools 27 (6):1850024.
  • Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12:2825–30.
  • Pospieszny, P., B. Czarnacka-Chrobot, and A. Kobylinski. 2018. An effective approach for software project effort and duration estimation with machine learning algorithms. Journal of Systems and Software 137:184–96. doi:10.1016/j.jss.2017.11.066
  • Radlinski, L., and W. Hoffmann. 2010. On predicting software development effort using machine learning techniques and local data. International Journal of Software Engineering and Computing 2 (2):123–36.
  • Rao, K. E., and G. Appa Rao. 2021. RETRACTED ARTICLE: Ensemble learning with recursive feature elimination integrated software effort estimation: A novel approach. Evolutionary Intelligence 14 (1):151–62. doi:10.1007/s12065-020-00360-5
  • Ray, S. 2019. “A quick review of machine learning algorithms.” In 2019 International conference on machine learning, big data, cloud and parallel computing (COMITCon), February 14–16, India, 35–39. IEEE.
  • Ur Rehman, I., Z. Ali, and Z. Jan. 2021. An empirical analysis on software development efforts estimation in machine learning perspective. Advances in Distributed Computing and Artificial Intelligence Journal.
  • Shepperd, M., and G. Kadoda. 2001. Comparing software prediction techniques using simulation. IEEE Transactions on Software Engineering 27 (11):1014–22. doi:10.1109/32.965341
  • Shukla, S., and S. Kumar. 2019. “Applicability of neural network based models for software effort estimation.” In 2019 IEEE World Congress on Services (SERVICES), Milan, Italy, Vol. 2642, 339–42. IEEE.
  • Suresh Kumar, P., H. S. Behera, J. Nayak, and B. Naik. 2022. A pragmatic ensemble learning approach for effective software effort estimation. Innovations in Systems and Software Engineering 18 (2):283–99. doi:10.1007/s11334-020-00379-y
  • Tsunoda, M. 2017. Kitchenham. doi:10.5281/zenodo.268457
  • Usman, M., E. Mendes, F. Weidt, and R. Britto. 2014. “Effort estimation in agile software development: A systematic literature review.” In Proceedings of the 10th international conference on predictive models in software engineering, Turin, Italy, 82–91.
  • Wen, J., S. Li, Z. Lin, Y. Hu, and C. Huang. 2012. Systematic literature review of machine learning based software development effort estimation models. Information and Software Technology 54 (1):41–59. doi:10.1016/j.infsof.2011.09.002
  • Yun, F. H. 2010. China: Effort estimation dataset. April. doi:10.5281/zenodo.268446