Software Defect Prediction Using Hybrid Machine Learning Techniques
Publisher: ASTU
Abstract
Many researchers have tried to develop software defect prediction methods by applying
different machine learning algorithms. However, the performance of these techniques on most
publicly available defect datasets is far from satisfactory, mainly because the datasets
suffer from two problems: high feature dimensionality and class imbalance. This research uses
five software defect datasets from the AEEEM projects, namely EQ, JDT, LC, ML and PDE, all of
which are affected by both problems. To address them, this research proposes software defect
prediction models built with seven ensemble machine learning algorithms: AdaBoost, GB,
XGBoost, RF, ET, Bagging and Stacking with base classifiers. As preprocessing steps before
the models are trained, three feature selection methods (CFS, SFS and CO) are applied to
reduce feature dimensionality, and the SMOTE data balancing technique is applied to handle
class imbalance. The experiments are evaluated using 10-fold cross validation with accuracy,
recall, precision, F-measure and AUC as performance metrics. The results indicate that CO
feature selection combined with the ET ensemble learning algorithm outperforms the other
models on all five datasets, with accuracies of 93.1%, 96.3%, 99.2%, 98.2% and 97.8% for the
EQ, JDT, LC, ML and PDE datasets respectively. The other performance metrics also exceed 91%
on all datasets. Therefore, the ET with CO model is recommended for software defect
prediction, that is, for classifying software modules as defective or non-defective.
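
The class-balancing step mentioned in the abstract can be illustrated with a minimal sketch of the idea behind SMOTE: new minority-class samples are created by interpolating between an existing minority sample and one of its nearest minority-class neighbours. This is not the authors' implementation. The function names (smote, balance), the default k=5, the fixed seed and the toy data are all assumptions made for illustration, and only the Python standard library is used.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (illustrative sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)          # fixed seed: an assumption, for repeatability
    k = min(k, len(minority) - 1)      # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()             # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal size."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    new_points = smote(minority, max(0, n_majority - len(minority)), k, seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Toy data (hypothetical): six non-defective modules (0), two defective (1).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
     [0.5, 0.5], [0.2, 0.8], [5.0, 5.0], [6.0, 5.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = balance(X, y)
print(y_bal.count(0), y_bal.count(1))
```

In this toy run the four synthetic points all lie on the segment between the two original defective samples, which is the behaviour the interpolation step is meant to produce.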
