9
A Novel Convolutional Neural Network Model to Predict Software Defects

Kumar Rajnish*, Vandana Bhattacharjee and Mansi Gupta

Department of CSE, BIT Mesra, Ranchi, India

Abstract

Machine learning (ML) is becoming increasingly important as a research tool due to its various frameworks and learning approaches. With the ever-increasing scale of software, reliability has become a crucial issue, and software defect prediction is used to assist developers in finding potential defects and allocating their testing efforts. Traditional methods of software defect prediction mainly focus on designing static code metrics, which are fed into ML classifiers to predict defects in the code. Even with the same ML techniques, many researchers apply statistical approaches to classify software modules, decide whether each module is defect prone or not, and train their models accordingly. Deep neural network (DNN) and convolutional neural network (CNN) models built with appropriate design decisions are crucial to obtaining the desired classifier performance. This is especially significant when predicting the fault proneness of software modules: when fault-prone modules are correctly identified, testing cost can be reduced by directing effort toward them. This paper proposes a Novel CNN (NCNN) model to predict software defects. The framework used is the Python programming language with Keras and TensorFlow. A comparative analysis with ML algorithms [such as Random Forest (RF), Decision Tree (DT), and Naïve Bayes (NB)] and a DNN model in terms of F-measure (also known as F1-score), recall, precision, and accuracy is presented on four NASA system data sets (KC1, PC1, PC2, and KC3) selected from the PROMISE repository. The experimental results indicate that the NCNN model is comparable to the existing classifiers and outperforms them in most of the experiments.

Keywords: Machine learning, software defect prediction, CNN model, deep learning, metrics

9.1 Introduction

The increasing complexity of modern software has raised the importance of software reliability. Building highly reliable software requires a substantial amount of testing and debugging. Due to limited budget and time, these efforts must be prioritized for better efficiency. As a result, software defect prediction methods, which predict the occurrence of defects, have been widely used to assist developers in prioritizing their testing and debugging efforts [1].

Software defect prediction [2–5] is the process of building classifiers to predict defects that occur in a definite area of source code. The prediction results can help developers prioritize their testing and debugging efforts. From the viewpoint of prediction granularity, software defect prediction can include method-level, class-level, file-level, package-level, and change-level defect prediction. In this research, we focus on file-level defect prediction. Typical software defect prediction [6] relies on extracting features from software artifacts and building classification models using various machine learning (ML) algorithms for training and validation.

Previous research on software defect prediction initially used code metrics (or simply software metrics) and statistical approaches for fault prediction. Thereafter, the focus shifted to soft computing and ML techniques, which gradually became the dominant prediction approaches [7]. In software code metric–based methods, internal attributes of the software are measured for fault prediction. The commonly used software metrics suites are the QMOOD metric suite [8], the Chidamber and Kemerer (CK) metric suite [9], the MOOD metric suite [10], etc. From the ML perspective, fault prediction is a classification task that discriminates faulty from non-faulty modules [11]. Some representative ML methods are Ensemble methods, Support Vector Machine (SVM), Naïve Bayes (NB), Logistic Regression, Decision Table, etc., and a review of such techniques applied to software fault prediction is given in [12].

For this study, we propose an NCNN model for software defect prediction. The roles of the number of layers, the nodes in each layer, the learning rate, the loss function, the optimizer, and the regularization methods have been studied. We evaluate the NCNN model on four NASA system data sets (KC1, PC1, PC2, and KC3) selected from the PROMISE repository [13] to confirm that the proposed NCNN model is comparable to or better than existing state-of-the-art models [such as Random Forest (RF), Decision Tree (DT), and NB] [12] and a Deep Neural Network (DNN) [14] in terms of F-measure (also known as F1-score), recall, precision, and accuracy. The experimental results indicate that the NCNN model is comparable to the existing classifiers and outperforms them in most cases.

The rest of the sections in this paper are organized as follows: Section 9.2 presents the related work. Section 9.3 gives the theoretical background on convolutional neural networks (CNNs) and software defect prediction. Section 9.4 presents the experimental setup together with the results and analysis. Finally, Section 9.5 concludes the paper.

9.2 Related Works

This section presents the literature review of research papers on software defect prediction based on deep learning, defect prediction based on deep features, and deep learning in software engineering.

9.2.1 Software Defect Prediction Based on Deep Learning

Singh et al. [15] used the public data set AR1 for predicting the fault proneness of modules. They compared the logistic regression technique with six ML classifiers (DT, group method of data handling polynomial method, artificial neural network (ANN), gene expression programming, SVM, and cascade correlation network). The performance was compared by computing the area under the curve using Receiver Operating Characteristic (ROC) analysis, and it was concluded that the value of 0.865 produced by DT outperformed regression and the other ML techniques. Dejaeger et al. [16] considered 15 distinct Bayesian Network (BN) classifiers and compared them with ML techniques. For feature selection, the Markov blanket principle was used. AUC and H-measure were tested using the statistical framework of Demšar. The results showed that simple and comprehensible networks with a smaller number of nodes can be constructed using BN classifiers other than the NB classifier. Kumar et al. [17] experimented on 30 open-source projects to build an ML-based software fault prediction model using Least Square SVM (LSSVM). They applied 10 distinct feature selection techniques. Their prediction model was only appropriate for projects with faulty classes below a threshold value. B. Twala [18] performed software fault prediction on four NASA public data sets using DT, SVM, K-Nearest Neighbor, and NB. He concluded that the NB classifier was the most robust and the DT classifier the most accurate.

9.2.2 Software Defect Prediction Based on Deep Features

S. Wang et al. [19] proposed a representation learning algorithm using a Deep Belief Network (DBN), which helps in learning semantic program representations directly from source code. They worked on 10 open-source projects and showed that directly learned semantic features considerably improve both within-project and cross-project defect prediction (WPDP and CPDP). On average, WPDP was improved by 14.2% in F1, 11.5% in recall, and 14.7% in precision, and the CPDP approach beat TCA+ with traditional features by 8.9% in F1. Miholca et al. [20] proposed HYGRAR, a non-linear hybrid supervised classification method for software fault prediction. HYGRAR combined relational association rule mining and ANNs to distinguish between faulty and non-faulty software objects. For experimental purposes, they used 10 open-source data sets and validated the outstanding performance of the HYGRAR classifier. J. Li et al. [21] proposed a framework called Defect Prediction via CNN (DP-CNN) that used deep learning to effectively generate features. On the basis of a program's Abstract Syntax Trees (ASTs), they initially extracted token vectors and then encoded them as numerical vectors with the help of word embedding and word mapping. These numerical vectors were then fed into a CNN that automatically learned structural and semantic program features. After this, for better software fault prediction, they combined traditional hand-crafted features with the learned features. The experiment was conducted on data from seven open-source projects, with measurement based on F-measure. The final results showed that DP-CNN improves on the state-of-the-art method by 12%. Cong Pan et al. [22] proposed an improved CNN model for WPDP and compared their results to existing CNN results and an empirical study. Their experiment was based on a 30-repetition holdout validation and a 10 * 10 cross-validation. Experimental results showed that their improved CNN model was comparable to the existing CNN model and outperformed the state-of-the-art ML models significantly for WPDP. Furthermore, they defined hyperparameter instability and examined the threat and opportunity it presents for deep learning models on defect prediction.

9.2.3 Deep Learning in Software Engineering

Apart from software defect prediction, deep learning models have been used in software maintenance [23], code clone detection [24], defect detection [25], and other areas. Guo et al. [23] used a Recurrent Neural Network (RNN) model in software maintenance to create links between requirements, design, source code, test cases, and other artifacts. Li et al. [24] proposed a deep learning-based clone detection approach. In their paper, they used AST tokens to represent method-level code clones and non-clones, trained a classifier, and used the classifier to detect code clones. Their method achieved comparable performance at low time cost. Nguyen et al. [25] used a DNN for bug localization in defect detection. Their model aimed to solve the lexical mismatch problem, pointing out that the terms used in a bug report differ from the terms and code tokens used in source files. Their model achieved 70% accuracy with five recommended files.

Other software engineering areas influenced by deep learning are source code classification [26], run-time behavior analysis [27], feature location [28], vulnerability analysis [29], and source code author identification [30].

9.3 Theoretical Background

9.3.1 Software Defect Prediction

Figure 9.1 presents a typical file-level defect prediction process based on ML concepts, which has been adopted by many researchers in recent studies. As shown in Figure 9.1, the first step of the procedure is to extract program modules (i.e., source files) from the repositories. The second step is to categorize program modules as defective (i.e., buggy) or non-defective (i.e., clean). The categorization is based on post-release defects collected from the PROMISE repository (i.e., the NASA Metrics Data Program). The third step is to extract features from the categorized program modules to form training instances. The features consist of code and design metrics. The fourth step is to build a classification model and use the training instances to train the model. In our case, we selected three ML models (RF, DT, and NB) as well as DNN for a comparative analysis with our proposed NCNN model. The last step is to feed new program feature instances into the trained classifier to predict whether a source file is defective or non-defective (buggy/clean).


Figure 9.1 File level defect prediction process.
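As an illustration only, the short scikit-learn sketch below walks through the steps of Figure 9.1 with one of the baseline classifiers (Random Forest) on static code metrics. It is not the chapter's WEKA-based setup; the feature matrix and labels are random placeholders standing in for the metric features and the buggy/clean categorization.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Steps 2-3: labeled modules (buggy = 1, clean = 0) described by static code metrics.
# Random placeholder data; a real run would use the PROMISE metrics instead.
X = np.random.rand(500, 21)
y = np.random.randint(0, 2, size=500)

# Step 4: train a classification model on the training instances.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Step 5: feed new feature instances to the trained classifier and inspect the predictions.
print(classification_report(y_test, clf.predict(X_test)))
```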

9.3.2 Convolutional Neural Network

A CNN is a special kind of neural network used to process data that has a known, grid-like topology [31], such as one-dimensional (1D) time series data and 2D image data. CNNs have been extremely successful in practical applications, including speech recognition [32], image classification [33], and natural language processing [34, 35]. In our work, we adapted our proposed model to extract features from software repositories (i.e., source files).

Figure 9.2 demonstrates a basic CNN architecture. It consists of convolutional layers, pooling layers, and a simple fully connected (dense) network. In the dense part, neuron units are connected to all neuron units of the neighboring layers, whereas the neural units in the convolutional and pooling layers are sparsely connected, with the connectivity determined by the kernel size and pooling size. The architecture reflects two key features of CNNs, sparse connectivity and shared weights, which allow a CNN to capture local structural information of its inputs.

The sparse connectivity property means that each neuron is connected to only a limited number of other neurons, and in a CNN, it is controlled by the kernel size and pooling size. In Figure 9.2, if we take node V3 and a kernel size of 3, it affects only three nodes in the convolutional layer, i.e., h1, h2, and h3, whereas node h4 is not affected by V3. Each subset acts as a local filter


Figure 9.2 A basic convolutional neural network (CNN) architecture.

connecting to the next layer in the CNN, which produces strong responses to a spatially local input pattern. To compute the output passed to the next layer, each local filter multiplies the outputs from the previous layer by its weights, adds a bias, and applies a non-linear transformation. Following Figure 9.2, we denote the ith neuron in the mth (convolutional) layer as $h_i^m$, the weights of the ith neuron in the (m−1)th layer as $W_i^{m-1}$, and the bias in the (m−1)th layer as $b^{m-1}$. In our work, we use rectified linear units (ReLUs), which were recently shown to give better performance in many neural network classification tasks. The output can be calculated as follows:

(9.1) $h_i^m = \mathrm{ReLU}\big(W_i^{m-1} \cdot h^{m-1} + b^{m-1}\big)$

To produce values from the output layer, the Softmax activation function is used, which is a type of sigmoid function. Softmax normalizes each neuron's output to the range between 0 and 1. It is non-linear in nature and is usually used when we are handling multiple classes.

The mathematical expression for Softmax activation function is as follows:

(9.2) $\sigma(z)_i = \dfrac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$

where

$z$: The input vector to the softmax function.

$z_i$: The elements of the input vector to the softmax function; they can take any real value, positive, zero, or negative.

$e^{z_i}$: The standard exponential function applied to each element of the input vector. This gives a positive value above 0, which will be very small if the input was negative and very large if the input was large.

$\sum_{j=1}^{K} e^{z_j}$: The term on the bottom of the formula is the normalization term. It ensures that all the output values of the function sum to 1 and each lie in the range (0, 1), thus constituting a valid probability distribution.

K: The number of classes in the multi-class classifier.
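To make equation (9.2) concrete, the following is a minimal NumPy sketch of the softmax computation (not the chapter's code); the example scores are arbitrary.

```python
import numpy as np

def softmax(z):
    # Subtract the maximum for numerical stability; the result is unchanged mathematically.
    exp_z = np.exp(z - np.max(z))
    return exp_z / exp_z.sum()

scores = np.array([2.0, 1.0, 0.1])   # arbitrary neuron outputs for K = 3 classes
probs = softmax(scores)
print(probs, probs.sum())            # each value in (0, 1), summing to 1
```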

Shared weights mean that each filter shares the same parameterization (weight vector and bias). In Figure 9.2, all the solid black lines connecting the input layer and the convolutional layer share the same parameters; the same is true for the blue sparse-dotted lines and the orange dense-dotted lines. This concept of shared weights allows a CNN to capture features independently of their positions and efficiently reduces model capacity.

Another significant concept of CNNs is max-pooling, which partitions the output vector into several non-overlapping sub-regions and outputs the maximum value of each sub-region. This reduction of the dimensionality of intermediate representations provides additional robustness to our defect prediction.
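As an illustration only (not the chapter's implementation), the short NumPy sketch below applies the three ideas discussed above to a toy input vector: a shared local filter of size 3 (sparse connectivity and shared weights), the ReLU non-linearity of equation (9.1), and non-overlapping max-pooling. All values are made up.

```python
import numpy as np

v = np.array([1.0, -2.0, 3.0, 0.5, 2.0, -1.0])   # input vector (e.g., V1..V6)
w = np.array([0.2, -0.5, 0.3])                    # one shared kernel of size 3
b = 0.1                                           # shared bias

# Convolution: every output neuron sees only 3 inputs and reuses the same w and b.
conv = np.array([np.dot(w, v[i:i + 3]) + b for i in range(len(v) - 2)])
h = np.maximum(conv, 0.0)                         # ReLU activation, as in equation (9.1)

# Max-pooling over non-overlapping sub-regions of size 2.
pooled = np.array([h[i:i + 2].max() for i in range(0, len(h) - 1, 2)])
print(h, pooled)
```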

CNNs are largely applied in an empirical manner, with the researcher tuning the model according to the application domain and the available data. Thus, parameter tuning is key to training a successful CNN. We discuss in Section 9.4 how these parameters are set in our proposed model.

9.4 Experimental Setup

9.4.1 Data Set Description

There are a number of open-source data sets available online for the analysis of defect prediction models. For this study, four NASA system data sets (KC1, PC1, PC2, and KC3) are selected from the PROMISE repository [13], which is freely available as a collection of public data sets. The selected data sets are of different sizes and have different numbers of metrics: KC1 has 22 attributes with 2,109 instances, PC1 has 22 attributes with 1,109 instances, PC2 has 37 attributes with 745 instances, and KC3 has 40 attributes with 194 instances. These data sets contain software metrics such as the Halstead and McCabe metrics and a Boolean variable that indicates the defect or no-defect proneness of a module. Table 9.1 displays the characteristics of the NASA data sets (PC1, PC2, KC1, and KC3).

The WEKA (Waikato Environment for Knowledge Analysis) tool was used for the statistical output processing of the data sets. WEKA is open-source software that gives the user the power of pre-processing, implementation of well-known ML algorithms, and visualization of data, so that one can develop ML techniques and apply them to real-world data problems. The data was analysed, i.e., the accuracy on the different data sets was calculated using various classifiers, namely, RF, DT, and NB [12] and DNN [14]. The results of these classifiers were then compared with the results generated by our proposed NCNN model.

Moreover, in these data sets, we are provided with 21 traditional defect prediction features for each source file, including Lines of Code (LOC), McCabe complexity measures, and Halstead base and derived measures. The 21 traditional features are carefully extracted from the PROMISE Software Engineering Repository [13]. We list the detailed description of the 21 features in Table 9.2.

Table 9.1 Characteristics of the NASA data sets.

Data set | Project | Number of attributes | Number of instances | Number of defective entities | Number of non-defective entities
NASA | PC1 | 22 | 1,109 | 77 (6.9%) | 1,032 (93.05%)
NASA | PC2 | 37 | 745 | 16 (2.10%) | 729 (97.90%)
NASA | KC1 | 22 | 2,109 | 326 (15.45%) | 1,783 (84.54%)
NASA | KC3 | 40 | 194 | 36 (18.6%) | 158 (81.4%)
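As a hedged sketch of how such a data set can be loaded for analysis (the PROMISE NASA sets are distributed as ARFF files), the snippet below reads one set with SciPy and pandas. The file name kc1.arff and the assumption that the last column holds the defect label are illustrative, not details taken from the chapter.

```python
import pandas as pd
from scipy.io import arff

# Load one NASA PROMISE data set (file name assumed to be kc1.arff).
data, meta = arff.loadarff("kc1.arff")
df = pd.DataFrame(data)

# The last column is assumed to hold the defect label as a byte string
# (e.g., b'true'/b'false'); decode it and inspect the class balance from Table 9.1.
df.iloc[:, -1] = df.iloc[:, -1].str.decode("utf-8")
print(df.shape)
print(df.iloc[:, -1].value_counts())
```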

9.4.2 Building Novel Convolutional Neural Network (NCNN) Model

This section presents our proposed NCNN model, which is based on a 1D CNN. The overall network architecture of NCNN is shown in Figure 9.3. The NCNN model consists of two convolutional layers, two max-pooling layers to extract global patterns, a flattening layer, two dense layers to generate deep features and support better generalization, and, finally, a convolutional linear sequential model classifier that predicts whether a source file is defective.

Other details about NCNN architecture are mentioned below:

  • For modeling the proposed NCNN, Python 3.5.2 is used. With the help of Keras pre-processing (version 1.1.2), a neural network library written in Python that is capable of running on top of TensorFlow (version 2.3.1), the NCNN-related results were generated. The experiment was executed on a system with a 64-bit operating system and an x64 processor with 16 GB RAM.
  • For predicting software defects, a convolutional linear sequential model classifier is proposed and implemented in Keras. After pre-processing the labeled source files, we split the data set into a training set and a test set with a split ratio of 75:25. We fed the training data to our NCNN model, fixing both weights and biases, and then fed each file in the test set into the defect prediction model to obtain the prediction results. The obtained result is a value between 0 and 1, based on which we predict a source file as defective or non-defective: if the result is above 0.5, the file is considered defective; otherwise, it is considered non-defective.

Table 9.2 Attribute information of the 21 features of PROMISE repository [13].

Attribute information | Symbol
McCabe's line count of code | loc
McCabe "cyclomatic complexity" | v(g)
McCabe "essential complexity" | ev(g)
McCabe "design complexity" | iv(g)
Halstead total operators + operands | n
Halstead "volume" | v
Halstead "program length" | l
Halstead "difficulty" | d
Halstead "intelligence" | i
Halstead "effort" | e
Halstead | b
Halstead's time estimator | t
Halstead's line count | lOCode
Halstead's count of lines of comments | lOComment
Halstead's count of blank lines | lOBlank
Halstead lines of code and comment | lOCodeAndComment
Unique operators | uniq_Op
Unique operands | uniq_Opnd
Total operators | total_Op
Total operands | total_Opnd
The flow graph | branchCount
Schematic illustration of the overall network architecture of proposed NCNN model.

Figure 9.3 Overall network architecture of proposed NCNN model.

  • We use two convolutional layers, two max-pooling layers, two dense layers, and a flattening layer; increasing the depth of deep models can yield better outcomes.
  • The input and hidden layers use the ReLU activation function, and the last layer uses the SoftMax function for classification.
  • We use the Adam optimizer as the optimization function to update the weights of the network after every iteration, and binary cross-entropy as the loss function (a code sketch of this setup follows below).
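The following is a minimal Keras sketch of the architecture and training setup described above, not the authors' exact code. The number of filters, kernel sizes, dense-layer widths, epochs, and batch size are illustrative assumptions; the data is a random placeholder shaped like PC1; and a single-unit sigmoid output is used as the binary equivalent of the SoftMax classification layer mentioned above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv1D, Dense, Flatten, MaxPooling1D

# X: (n_samples, n_metrics) static code metrics; y: 0 = clean, 1 = defective.
X = np.random.rand(1109, 21).astype("float32")   # placeholder data shaped like PC1
y = np.random.randint(0, 2, size=1109)

# 75:25 train/test split, as described in Section 9.4.2.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Conv1D expects 3D input: (samples, steps, channels).
X_train = X_train[..., np.newaxis]
X_test = X_test[..., np.newaxis]

model = Sequential([
    Conv1D(32, kernel_size=3, activation="relu", input_shape=(21, 1)),
    MaxPooling1D(pool_size=2),
    Conv1D(64, kernel_size=3, activation="relu"),
    MaxPooling1D(pool_size=2),
    Flatten(),
    Dense(64, activation="relu"),
    Dense(1, activation="sigmoid"),   # single-unit binary output (stand-in for SoftMax)
])

# Adam optimizer and binary cross-entropy loss, as stated in the chapter.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=20, batch_size=32, validation_data=(X_test, y_test))

# Files with predicted probability above 0.5 are labeled defective.
pred = (model.predict(X_test) > 0.5).astype(int)
```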

9.4.3 Evaluation Parameters

This section evaluates the effectiveness of our NCNN by comparing its F-measure and accuracy on defect prediction with other state-of-the-art methods, namely, RF, DT, and NB [12] and DNN [14]. We also explain some basic terminology associated with software defect prediction. A training set refers to a set of instances used to train a model, whereas a test set refers to a set of instances used to evaluate the learned model. When applying within-project defect prediction, the training set and the test set come from the same project data set. In the field of ML and, specifically, for the problem of statistical classification, a confusion matrix, also known as an error matrix, is used. A confusion matrix is a summary of prediction results on a classification problem. The numbers of correct and incorrect predictions are summarized with count values and broken down by each class; this is the key idea of the confusion matrix. The confusion matrix shows the ways in which a classification model is confused when it makes predictions. It gives us insight not only into the errors being made by a classifier but, more importantly, into the types of errors that are being made. Figure 9.4 shows the description of the confusion matrix.

Class 1: False

Class 2: True

Where the above terms are defined as follows:

  1. True: Observation is true.
  2. False: Observation is not true.
  3. True-Positive (TP): Observation is true, and is predicted to be true.
  4. False-Negative (FN): Observation is true, but is predicted to be false.

Figure 9.4 Description regarding confusion matrix.

  5. True-Negative (TN): Observation is false, and is predicted to be false.
  6. False-Positive (FP): Observation is false, but is predicted to be true.
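As a small illustration (with made-up labels, not the study's data), scikit-learn's confusion_matrix can be used to obtain these four counts directly, with True = defective as above.

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = defective (True), 0 = clean (False)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels, ravel() returns the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, fn, tn, fp)   # TP, FN, TN, FP as defined in Figure 9.4
```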

The other evaluation parameters used in this research work are TP Rate (TPR), TN Rate (TNR), FN Rate (FNR), FP Rate (FPR), precision, recall, F-measure (also called F1-score), and accuracy.

TPR: when the observation is actually positive, how often does the model predict positive?

$$\mathrm{TPR} = \frac{TP}{TP + FN}$$

TNR: when the observation is actually negative, how often does the model predict negative?

$$\mathrm{TNR} = \frac{TN}{TN + FP}$$

FPR: when the observation is actually negative, how often does the model predict positive?

$$\mathrm{FPR} = \frac{FP}{FP + TN}$$

FNR: the proportion of positive observations that yield negative predictions.

$$\mathrm{FNR} = \frac{FN}{FN + TP}$$

Precision (P) measures how many of the positive class predictions actually belong to the positive class.

$$P = \frac{TP}{TP + FP}$$

Recall (R) measures how many of all positive examples in the data set are predicted as positive.

$$R = \frac{TP}{TP + FN}$$

F-measure offers a single score that balances the concerns of both precision and recall in one number.

$$F\text{-}measure = \frac{2 \times P \times R}{P + R}$$

Accuracy is the total number of correct predictions divided by the total number of predictions made for a data set.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
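The following is a small self-contained helper (a sketch, not the study's code) that computes all of the above measures from the four confusion-matrix counts; the example counts are arbitrary.

```python
# Compute the evaluation measures defined above from binary confusion-matrix counts.
def evaluation_measures(tp, fn, fp, tn):
    tpr = tp / (tp + fn)              # recall / true positive rate
    tnr = tn / (tn + fp)
    fpr = fp / (fp + tn)
    fnr = fn / (fn + tp)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tpr / (precision + tpr)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"TPR": tpr, "TNR": tnr, "FPR": fpr, "FNR": fnr,
            "Precision": precision, "Recall": tpr,
            "F-measure": f_measure, "Accuracy": accuracy}

# Example with made-up counts:
print(evaluation_measures(tp=30, fn=10, fp=5, tn=155))
```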

9.4.4 Results and Analysis

For the final analysis, we computed the performance measures for all five classification techniques used in the study. The results are based on the values of precision, recall, F-measure, and accuracy. Presented below are the tables and graphs of our study for the four NASA system data sets (KC1, PC1, PC2, and KC3) selected from the PROMISE repository [13], which is freely available as a collection of public data sets.

The observations from the tables and graphs are as follows:

  • For better performance, TPR and TNR should be high, and FNR and FPR should be low. From Table 9.7, it is observed that the NCNN classifier achieves higher TPR and TNR than all the ML classifiers (RF, DT, and NB) and the DNN classifier on the data sets KC1, KC3, and PC2. The exception is PC1, where the TPR of NCNN is lower than that of the NB classifier but greater than those of RF, DT, and DNN, while the TNR of NCNN on PC1 is greater than that of all other classifiers. It is also observed that NCNN performs well on FNR and FPR: it is lower on all the data sets except KC3 and PC1, where the FPR of DNN is lower than that of NCNN and the FNR of NB is lower than that of NCNN. This indicates that the NCNN model performs best in the confusion rate analysis across the data sets and suggests that the NCNN model is neither underfitting nor overfitting.
  • From Figure 9.5, it is found that NCNN predicts more faults than RF, DT, NB, and DNN (162 faults for the KC1 data set, 33 faults for the KC3 data set, and 14 faults for the PC2 data set). For the PC1 data set, NCNN predicts more faults than DNN, RF, and DT but fewer than NB, as NB predicts 23 faults and NCNN predicts 20 faults.
  • Figure 9.6 presents the model accuracy and model loss for the data sets (KC1, KC3, PC1, and PC2). Accuracy and loss are two well-known metrics in ML and neural networks. The data split is used at every epoch, the Adam optimizer updates the weights of the network, and binary cross-entropy is used as the loss function to obtain better model performance. Accuracy is a method for measuring a classification model's performance and is naturally expressed as a percentage: it is the count of predictions where the predicted value equals the true value. A loss function instead takes into consideration the probabilities of a prediction based on how much the prediction varies from the true value, which gives a clearer view of how well the model is performing; the lower the loss, the better the model fits the training data. The loss is calculated on the training and validation sets, and its interpretation is how well the model is doing on these data sets; unlike accuracy, loss is not a percentage. From the plot of model accuracy, we can see that the model could possibly be trained a little more, as the trend of the accuracy on the KC1, KC3, and PC2 data sets is still rising during the middle and last few epochs, except on the PC1 data set, where in the middle and last epochs it moves slightly up and down. From the plot of model loss, we can see that the model has comparable performance on both the training and validation data sets (labeled test). If these plots start to depart from each other consistently, it might be a sign to stop training at an earlier epoch. For the data sets KC1, KC3, and PC2, the model loss is comparable on both the training and validation data sets, except for the PC1 data set, where in the last few epochs it increases drastically.

    Figure 9.5 Confusion matrix analysis for the data sets (KC1, KC3, PC1, and PC2).


    Figure 9.6 Model accuracy and model loss analysis for the data sets (KC1, KC3, PC1, and PC2).

  • To evaluate prediction accuracy, F-measure (also called F1-score), precision, and recall were used. Usually, there is a trade-off between precision and recall. For example, by predicting all the test files as defective, we would get a recall of 1 and a very low precision. F-measure, which combines precision and recall and falls in the range [0, 1], is therefore a better representation of prediction performance.
  • Figure 9.7 shows the performance comparison, in terms of F-measure, recall, and precision, of the different models for software defect prediction on the data sets (KC1, KC3, PC1, and PC2). From Figure 9.7, it is observed that the F-measure of NCNN is 5% higher than DNN, 9.3% higher than NB, 7.4% higher than DT, and 6.5% higher than RF for the data set KC1. For the data set KC3, the F-measure of NCNN is 1% higher than DNN, 10.9% higher than NB, 8.1% higher than DT, and 8.5% higher than RF. For the data set PC1, the F-measure of NCNN is 1.8% higher than NB and equal to RF and DT but 1% less than DNN; the reason may be that the training set of PC1 is relatively small. Similarly, for the data set PC2, the F-measure of NCNN is 1% higher than DNN, 3.9% higher than NB, 1% higher than DT, and 1% higher than RF. On almost all the data sets, NCNN outperformed the other state-of-the-art ML models and DNN in terms of F-measure for software defect prediction.
  • From Tables 9.3 to 9.6 and Figure 9.8, we examined the accuracy values for the data sets KC1, KC3, PC1, and PC2. From this analysis, we found that NCNN has the highest accuracy for the data sets KC1 (88.76), KC3 (96.97), and PC2 (98.71). For the data set PC1, NCNN has lower accuracy (93.16) than DNN (94.65) and RF (93.68) but higher than DT and NB.

Figure 9.7 Performance comparison of different models for software defect prediction for the data sets (KC1, KC3, PC1, and PC2).

Table 9.3 Performance comparison for the data set KC1.

KC1
Algorithm | Precision | Recall | F-Measure | Accuracy
RF | 0.887 | 0.965 | 0.925 | 86.67
DT | 0.865 | 0.974 | 0.916 | 84.87
NB | 0.888 | 0.905 | 0.897 | 82.36
DNN | 0.87 | 0.98 | 0.94 | 88.57
NCNN | 0.99 | 0.99 | 0.99 | 88.76

Table 9.4 Performance comparison for the data set KC3.

KC3
Algorithm | Precision | Recall | F-Measure | Accuracy
RF | 0.832 | 0.968 | 0.895 | 81.44
DT | 0.87 | 0.93 | 0.899 | 82.99
NB | 0.863 | 0.88 | 0.871 | 78.87
DNN | 0.97 | 0.97 | 0.97 | 87.88
NCNN | 0.97 | 0.98 | 0.98 | 96.97

Table 9.5 Performance comparison for the data set PC1.

PC1
Algorithm | Precision | Recall | F-Measure | Accuracy
RF | 0.95 | 0.984 | 0.96 | 93.68
DT | 0.937 | 0.99 | 0.96 | 92.87
NB | 0.947 | 0.936 | 0.942 | 89.17
DNN | 0.94 | 0.99 | 0.97 | 94.65
NCNN | 0.93 | 0.93 | 0.96 | 93.16

Table 9.6 Performance comparison for the data set PC2.

PC2
Algorithm | Precision | Recall | F-Measure | Accuracy
RF | 0.979 | 1 | 0.989 | 97.85
DT | 0.979 | 1 | 0.989 | 97.85
NB | 0.98 | 0.925 | 0.951 | 90.73
DNN | 0.99 | 0.99 | 0.98 | 98.66
NCNN | 0.99 | 0.99 | 0.99 | 98.71
  • The above analysis reflects that the proposed NCNN model outperforms the other ML classifiers and the neural network classifier on all four NASA system data sets from the PROMISE repository [13] in terms of F-measure, recall, precision, and accuracy, as well as through the confusion matrix and confusion rate analyses for software defect prediction.

9.5 Conclusion and Future Scope

In this paper, an attempt has been made to propose an NCNN model to predict software defects. A framework using the Python programming language with Keras and TensorFlow was used to implement our NCNN model. A comparative analysis with ML algorithms (such as RF, DT, and NB) and a DNN model in terms of F-measure (also known as F1-score), recall, precision, and accuracy has been presented on four NASA system data sets (KC1, PC1, PC2, and KC3) selected from the PROMISE repository. From Table 9.7, we observed that the NCNN classifier achieves higher TPR and TNR than all the ML classifiers (RF, DT, and NB) and the DNN classifier on almost all the data sets; moreover, NCNN's FNR and FPR are lower on almost all the data sets. From Tables 9.3 to 9.6 and Figures 9.7 and 9.8, we examined F-measure, recall, precision, and accuracy. From Figure 9.7, we observed that NCNN predicts software defects with an F-measure higher than all the other classifiers (DNN, RF, DT, and NB) for the data sets KC1, KC3, and PC2, except on PC1 where it is lower than DNN. From Tables 9.3 to 9.6 and Figure 9.8, we found that NCNN has the highest accuracy for the data sets KC1 (88.76), KC3 (96.97), and PC2 (98.71); for the data set PC1, NCNN has lower accuracy (93.16) than DNN (94.65) and RF (93.68) but higher than DT and NB. From Figure 9.9, it is also observed that the NCNN classifier achieves higher TPR and TNR than all the ML classifiers (RF, DT, and NB) and the DNN classifier on the data sets KC1, KC3, and PC2; the exception is PC1, where the TPR of NCNN is lower than that of NB but greater than those of RF, DT, and DNN, while the TNR of NCNN on PC1 is greater than that of all other classifiers. NCNN also performs well on FNR and FPR, which are lower on all the data sets except KC3 and PC1, where the FPR of DNN is lower than that of NCNN and the FNR of NB is lower than that of NCNN. This strongly suggests that the NCNN model performs best in the confusion rate analysis across the data sets and that it is neither underfitting nor overfitting. Thus, the proposed NCNN model outperformed the other models in most cases. In terms of future scope, we will present a deep learning framework that automatically obtains syntactic and semantic features from source code and derives key features from them for accurate software defect prediction. We plan to apply an open-source Python package named javalang, which offers a lexical analyzer and a parser based on the Java language specification and helps construct Abstract Syntax Trees (ASTs) of Java source code.
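As a minimal pointer to this planned future step (a sketch under the assumption that javalang is installed; the Java snippet is made up), javalang can tokenize and parse Java source into an AST whose node and token information could later serve as features:

```python
import javalang

code = "class Demo { int add(int a, int b) { return a + b; } }"

# Lexical analysis: token stream of the Java snippet.
tokens = [tok.value for tok in javalang.tokenizer.tokenize(code)]

# Syntactic analysis: AST of the compilation unit.
tree = javalang.parse.parse(code)
print(tokens[:5], tree.types[0].name)   # first few tokens and the class name 'Demo'
```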

Table 9.7 Confusion matrix analysis for the KC1, KC3, PC1, and PC2 data sets (TPR, True Positive Rate; TNR, True Negative Rate; FPR, False Positive Rate; FNR, False Negative Rate).

Algorithm | KC1: TPR / TNR / FPR / FNR | KC3: TPR / TNR / FPR / FNR | PC1: TPR / TNR / FPR / FNR | PC2: TPR / TNR / FPR / FNR
RF | 0.33 / 0.96 / 0.04 / 0.67 | 0.14 / 0.97 / 0.03 / 0.86 | 0.29 / 0.98 / 0.015 / 0.70 | 0 / 1 / 0 / 1
DT | 0.17 / 0.97 / 0.03 / 0.83 | 0.38 / 0.91 / 0.07 / 0.61 | 0.10 / 0.99 / 0.009 / 0.89 | 0 / 1 / 0 / 1
NB | 0.38 / 0.90 / 0.07 / 0.62 | 0.38 / 0.90 / 0.12 / 0.61 | 0.29 / 0.93 / 0.06 / 0.70 | 0.13 / 0.92 / 0.07 / 0.88
DNN | 0.47 / 0.98 / 0.02 / 0.53 | 0.88 / 0.83 / 0.02 / 0.11 | 0.22 / 0.994 / 0.005 / 0.77 | 0.81 / 0.990 / 0.009 / 0.19
NCNN | 0.50 / 0.99 / 0.01 / 0.50 | 0.91 / 0.82 / 0.03 / 0.08 | 0.25 / 0.998 / 0.001 / 0.74 | 0.87 / 0.994 / 0.005 / 0.13

Figure 9.8 Model accuracy analysis for the data sets (KC1, KC3, PC1, and PC2).


Figure 9.9 Confusion rate analysis for the data sets (KC1, KC3, PC1, and PC2).

References

1. Menzies, T., Milton, Z., Turhan, B., Cukic, B., Jiang, Y., Bener, A., Defect prediction from static code features: Current results, limitations, new approaches. Autom. Software Eng., 17, 375–407, 2010.

2. Moser, R., Pedrycz, W., Succi, G., A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction, in: Proceedings of the 30th International Conference on Software Engineering, Leipzig, Germany, 15 May 2008, p. 181.

3. Tan, M., Tan, L., Dara, S., Mayeux, C., Online Defect Prediction for Imbalanced Data, in: Proceedings of the 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering (ICSE), Florence, Italy, 16–24 May 2015, pp. 99–108.

4. Nam, J., Pan, S.J., Kim, S., Transfer defect learning, in: Proceedings of the International Conference of Software Engineering, San Francisco, CA, USA, 18–26 May 2013.

5. Nam, J., Survey on Software Defect Prediction. Ph.D. Thesis, The Hong Kong University of Science and Technology, Hong Kong, China, 3 July 2014.

6. Lyu, M.R., Handbook of Software Reliability Engineering, vol. 222, IEEE Computer Society Press, Washington, DC, USA, 1996.

7. Rathore, S.S. and Kumar, S., A decision tree logic-based recommendation system to select software fault prediction techniques. Computing, 99, 3, 255– 285, Mar. 2016.

8. Malhotra, R. and Jain, A., Fault Prediction Using Statistical and Machine Learning Methods for Improving Software Quality. J. Inf. Process. Syst., 8, 2, 241–262, Jun. 2012.

9. He, P., Li, B., Liu, X., Chen, J., Ma, Y., An empirical study on software defect prediction with a simplified metric set. Inf. Software Technol., 59, 170–190, Mar. 2015.

10. Elish, O.M., Al-Yafei, H.A., Al-Mulhem, M., Empirical comparison of three metrics suites for fault prediction in packages of object-oriented systems: A case study of Eclipse. Adv. Eng. Software, 42, 10, 852–859, Oct. 2011.

11. Peng, Y., Kou, G., Wang, G., Wu, W., Shi, Y., Ensemble Of Software Defect Predictors: An Ahp-Based Evaluation Method. Int. J. Inf. Technol. Decis. Mak., 10, 01, 187–206, Jan. 2011.

12. Malhotra, R., A systematic review of machine learning techniques for soft-ware fault prediction. Appl. Soft Comput., 27, 504–518, Feb. 2015.

13. http://promise.site.uottawa.ca/SERepository/datasets-page.html.

14. Gupta, M., Rajnish, K., Bhattacherjee, V., Impact of parameter tuning for optimizing deep neural networks models for predicting software faults. Sci. Program., Hindawi, 2021, 1–17, 2021.

15. Singh, Y., Kaur, A., Malhotra, R., Prediction of Fault-Prone Software Modules using Statistical and Machine Learning Methods. Int. J. Comput. Appl., 1, 22, 8–15, Feb. 2010.

16. Dejaeger, K., Verbraken, T., Baesens, B., Toward Comprehensible Software Fault Prediction Models Using Bayesian Network Classifiers. IEEE Trans. Software Eng., 39, 2, 237–257, Feb. 2013.

17. Kumar, L., Sripada, K.S., Sureka, A., Rath, K.S., Effective fault prediction model developed using Least Square Support Vector Machine (LSSVM). J. Syst. Software, 137, 686–712, Mar. 2018.

18. Twala, B., Predicting Software Faults in Large Space Systems using Machine Learning Techniques. Def. Sci. J., 61, 4, 306–316, Jul. 2011.

19. Wang, S., Liu, T., Tan, L., Automatically learning semantic features for defect prediction. Proceedings of the 38th International Conference on Software Engineering - ICSE ‘16, 2016.

20. Miholca, L.D., Czibula, G., Czibula, G.I., A novel approach for software defect prediction through hybridizing gradual relational association rules with artificial neural networks. Inf. Sci., 441, 152–170, May 2018.

21. Li, J., He, P., Zhu, J., Lyu, R.M., Software Defect Prediction via Convolutional Neural Network. IEEE International Conference on Software Quality, Reliability and Security (QRS), Jul. 2017.

22. Pan, C., Lu, M., Xu, B., Gao, H., An Improved CNN model for within project software defect prediction. Appl. Sci., 9, 2138–216, 2019.

23. Cheng, J., Guo, J., Cleland-Huang., J., Semantically Enhanced Software Traceability Using Deep Learning Techniques, in: Proceedings of the 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE), Buenos Aires, Argentina, 20–28 May 2017, pp. 3–14.

24. Li, L., Feng, H., Zhuang, W., Meng, N., Ryder, B., CC Learner: A Deep Learning-Based Clone Detection Approach, in: Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME), Shangai, China, 17–24 September 2017, pp. 249–260.

25. Lam, N.A., Nguyen, A.T., Nguyen, H.A., Nguyen, T.N., Bug localization with combination of deep learning and information retrieval, in: Proceedings of the 2017 IEEE/ACM 25th International Conference on Program Comprehension (ICPC), Buenos Aires, Argentina, 22–23 May 2017, pp. 218–229.

26. Reyes, J., Ramirez, D., Paciello, J., Automatic Classification of Source Code Archives by Programming Language: A Deep Learning Approach, in: Proceedings of the International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 15–17 December 2016, pp. 514–519, 2016.

27. Zekany, S., Rings, D., Harada, N., Laurenzano, M.A., Tang, L., Mars, J., Crystal Ball: Statically analyzing runtime behaviour via deep sequence learning, in: Proceedings of the 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Taipei, Taiwan, 15–19 October 2016, pp. 1–12.

28. Corley, C.S., Damevski, K., Kraft, N.A., Exploring the use of deep learning for feature location, in: Proceedings of the 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME), Bremen, Germany, 29 September–1 October 2015, pp. 556–560.

29. Pang, Y., Xue, X., Wang, H., Predicting Vulnerable Software Components through Deep Neural Network, in: Proceedings of the 2017 International Conference on Deep Learning Technologies, Chengdu China, 2–4 June 2017, pp. 6–10.

30. Bandara, U. and Wijayarathna, G., Deep Neural Networks for Source Code Author Identification, in: Proceedings of the 20th International Conference, Daegu, Korea, 3–7 November 2013, pp. 368–375.

31. Goodfellow, I., Bengio, Y., Courville, A., Deep Learning, MIT Press, Cambridge, MA, USA, 2016.

32. Abdel-Hamid, O., Mohamed, A.R., Jiang, H., Penn, G., Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition, in: Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, 25–30 March 2012, pp. 4277–4280.

33. Krizhevsky, A., Sutskever, I., Hinton, G.E., Image net classification with deep convolutional neural networks, in: Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–8 December 2012, pp. 1097–1105.

34. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., Gradient-based learning applied to document recognition. Proc. IEEE, 86, 2278–2324, 1998.

35. Zhang, X., Zhao, J., LeCun, Y., Character-level Convolutional Networks for Text Classification, in: Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015.

*Corresponding author: [email protected]