Kumar Rajnish*, Vandana Bhattacharjee and Mansi Gupta
Department of CSE, BIT Mesra, Ranchi, India
Abstract
Machine learning (ML) is becoming increasingly important as a research tool due to its various frameworks and learning approaches. With the ever-increasing scale of software, reliability has become a crucial issue, and software defect prediction is used to assist developers in finding potential defects and allocating their testing efforts. Traditional methods of software defect prediction mainly focus on designing static code metrics, which are fed into ML classifiers to predict defects in the code. Even with the same ML techniques, many researchers apply statistical approaches to classify software modules, decide whether each module is defect prone or not, and train their models accordingly. Appropriate design decisions when building deep neural network (DNN) and convolutional neural network (CNN) models are crucial to obtaining the desired classifier performance. This is especially significant when predicting the fault proneness of software modules. Correctly identifying fault-prone modules could help reduce the testing cost by directing more effort toward them. This paper proposes a Novel CNN (NCNN) model to predict software defects. The model is implemented in the Python programming language with Keras and TensorFlow. A comparative analysis with ML algorithms [Random Forest (RF), Decision Tree (DT), and Naïve Bayes (NB)] and a DNN model in terms of F-measure (also known as F1-score), recall, precision, and accuracy is presented for four NASA system data sets (KC1, PC1, PC2, and KC3) selected from the PROMISE repository. The experimental results indicate that the NCNN model is comparable to the existing classifiers and outperforms them in most of the experiments.
Keywords: Machine learning, software defect prediction, CNN model, deep learning, metrics
The increasing complexity of modern software has raised the importance of software reliability. Building highly reliable software requires a substantial amount of testing and debugging. Due to limited budget and time, these efforts must be prioritized for better efficiency. As a result, software defect prediction methods, which predict the occurrence of defects, have been widely used to assist developers in prioritizing their testing and debugging efforts [1].
Software defect prediction [2–5] is the process of building classifiers to predict defects that occur in a definite area of source code. The prediction results can help developers prioritize their testing and debugging efforts. In terms of prediction granularity, software defect prediction can be performed at the method, class, file, package, or change level. In this research, we focus on file-level defect prediction. Typical software defect prediction [6] relies on extracting features from software artifacts and building classification models using various machine learning (ML) algorithms for training and validation.
Previous research on software defect prediction initially used code metrics (or simply software metrics) and statistical approaches for fault prediction. Thereafter, the focus shifted to soft computing and ML techniques, which came to dominate prediction techniques [7]. In software code metric-based methods, internal attributes of the software were measured for fault prediction. The commonly used software metric suites include the QMOOD metric suite [8], the Chidamber and Kemerer (CK) metric suite [9], and the MOOD metric suite [10]. From the perspective of ML, fault prediction is a classification task that discriminates between faulty and non-faulty modules [11]. Some representative ML methods are Ensemble, Support Vector Machine (SVM), Naïve Bayes (NB), Logistic Regression, and Decision Table; a review of such techniques applied to software fault prediction is given in [12].
For this study, we propose the NCNN model for software defect prediction. The roles of the number of layers, the nodes in each layer, the learning rate, the loss function, the optimizer, and the regularization methods have been studied. We evaluate the NCNN model on four NASA system data sets (KC1, PC1, PC2, and KC3) selected from the PROMISE repository [13] to confirm that the proposed NCNN model is comparable to or better than existing state-of-the-art models [such as Random Forest (RF), Decision Tree (DT), and NB] [12] and a Deep Neural Network (DNN) [14] in terms of F-measure (also known as F1-score), recall, precision, and accuracy. The experimental results indicate that the NCNN model is comparable to the existing classifiers and outperforms them in most of the experiments.
The rest of the sections in this paper are organized as follows: Section 9.2 presents the related work. Section 9.3 gives the theoretical background on convolutional neural network (CNN) and software defect prediction. Section 9.4 presents the experimental setup. Section 9.5 gives the results and analysis. Finally, Section 9.6 concludes the paper.
This section presents the literature review of research papers on software defect prediction based on deep learning, defect prediction based on deep features, and deep learning in software engineering.
Singh et al. [15] used the public data set AR1 for predicting the fault proneness of modules. They compared the logistic regression technique with six ML classifiers (DT, group method of data handling polynomial method, artificial neural network (ANN), gene expression programming, SVM, and cascade correlation network). The performance was compared by computing the area under the curve (AUC) using Receiver Operating Characteristic (ROC) analysis, where it was concluded that the value obtained by DT was 0.865, which outperformed regression and the other ML techniques. Dejaeger et al. [16] considered 15 distinct Bayesian Network (BN) classifiers and compared them with ML techniques. For feature selection, the Markov blanket principle was used. AUC and H-measure were tested using the statistical framework of Demšar. The results showed that simple and comprehensible networks having a smaller number of nodes can be constructed using BN classifiers other than the NB classifier. Kumar et al. [17] experimented on 30 open-source projects to build an ML-based software fault prediction model using Least Square SVM (LSSVM). They applied 10 distinct feature selection techniques. Their prediction model was only appropriate for projects with a proportion of faulty classes below a threshold value. B. Twala [18] performed software fault prediction on four NASA public data sets using DT, SVM, K-Nearest Neighbor, and NB. He concluded that the NB classifier was the most robust and the DT classifier the most accurate.
S. Wang et al. [19] proposed a representation learning algorithm using a Deep Belief Network (DBN) that learns semantic program representations directly from source code. They worked on 10 open-source projects and showed that directly learned semantic features considerably improve both within-project and cross-project defect prediction (WPDP and CPDP). On average, WPDP was improved by 14.2% in F1, 11.5% in recall, and 14.7% in precision. Their CPDP approach beat TCA+ with traditional features by 8.9% in F1. Miholca et al. [20] proposed HYGRAR, a non-linear hybrid supervised classification method for software fault prediction. HYGRAR combined relational association rule mining and ANNs to distinguish between faulty and non-faulty software objects. For their experiments, they used 10 open-source data sets and validated the performance of the HYGRAR classifier. J. Li et al. [21] proposed a framework called Defect Prediction via CNN (DP-CNN) that used deep learning to generate features effectively. On the basis of the programs' Abstract Syntax Trees (ASTs), they first extracted token vectors and then encoded them as numerical vectors through word mapping and word embedding. These numerical vectors were then fed into a CNN that automatically learnt structural and semantic program features. Finally, for accurate software fault prediction, they combined traditional hand-crafted features with the learnt features. The experiment was conducted on seven open-source projects and measured on the basis of F-measure. The final results showed that DP-CNN improved on the state-of-the-art method by 12%. Cong Pan et al. [22] proposed an improved CNN model for WPDP and compared their results with existing CNN results and an empirical study. Their experiment was based on a 30-repetition holdout validation and a 10 × 10 cross-validation. Experimental results showed that their improved CNN model was comparable to the existing CNN model and that it significantly outperformed the state-of-the-art ML models for WPDP. Furthermore, they defined hyperparameter instability and examined the threat and opportunity it presents for deep learning models in defect prediction.
Apart from software defect prediction, deep learning models have been used in software maintenance [23], code clone detection [24], defect detection [25], and other areas. Guo et al. [23] used a Recurrent Neural Network (RNN) model in software maintenance to create links between requirements, design, source code, test cases, and other artifacts. Li et al. [24] proposed a deep learning-based clone detection approach. In their paper, they used AST tokens to represent method-level code clones and non-clones to train a classifier, which was then used to detect code clones. Their method achieved similar performance at a low time cost. Nguyen et al. [25] used a DNN for bug localization. Their model aimed to solve the lexical mismatch problem, i.e., that the terms used in a bug report differ from the terms and code tokens used in source files. Their model achieved 70% accuracy with five recommended files.
Other software engineering areas influenced by deep learning are source code classification [26], run-time behavior analysis [27], feature location [28], vulnerability analysis [29], and code author identification [30].
Figure 9.1 presents a typical file-level defect prediction process based on ML concepts, which has been adopted in many recent studies. As shown in Figure 9.1, the first step of the procedure is to extract program modules (i.e., source files) from the repositories. The second step is to categorize the program modules as defective (i.e., buggy) or non-defective (i.e., clean). The categorization is based on post-release defects collected from the PROMISE repository (i.e., NASA Metrics Data Program). The third step is to extract features from the categorized program modules to form training instances. The features consist of code and design metrics. The fourth step is to build a classification model and use the training instances to train the model. In our case, we selected three ML models (RF, DT, and NB) as well as a DNN for comparative analysis with our proposed NCNN model. The last step is to feed new program feature instances into the trained classifier to predict whether a source file is defective or non-defective (buggy/clean).
Figure 9.1 File level defect prediction process.
A CNN is a special kind of neural network used to process data that has a known, grid-like topology [31], such as one-dimensional (1D) time series data and 2D image data. CNNs have been extremely successful in practical applications, including speech recognition [32], image classification [33], and natural language processing [34, 35]. In our work, we use the proposed model to extract features from software repositories (i.e., source files).
Figure 9.2 demonstrates a basic CNN architecture. It consists of convolutional layers, pooling layers, and a simple fully connected (dense) network. In the fully connected part, each neuron unit is connected to all neuron units of its neighboring layers, whereas the neuron units of the convolutional and pooling layers are sparsely connected, as determined by the kernel size and pooling size. The architecture represents the two key features of a CNN, sparse connectivity and shared weights, which allow a CNN to capture local structural information of its inputs.
Figure 9.2 A basic convolutional neural network (CNN) architecture.
The sparse connectivity property means that each neuron is connected to only a limited number of other neurons; in a CNN, it is controlled by the kernel size and pooling size. In Figure 9.2, with a kernel size of 3, node V3 affects only three nodes in the convolutional layer, i.e., h1, h2, and h3, whereas node h4 is not affected by V3. Each subset acts as a local filter connecting to the next layer in the CNN, which produces strong responses to a spatially local input pattern. To compute the output passed to the next layer, each local filter is multiplied by the outputs from the previous layer, a bias is added, and a non-linear transformation is applied. We denote the ith neuron in the mth layer (convolutional layer) as $h_i^m$, the weights of the ith neuron in the (m - 1)th layer as $W_i^{m-1}$, and the bias in the (m - 1)th layer as $b^{m-1}$. In our work, we use rectified linear units (ReLUs), which have been shown to give better performance in many neural network classification tasks. The output can be calculated as follows:
$$h_i^m = \mathrm{ReLU}\left(W_i^{m-1} * V_i^{m-1} + b^{m-1}\right)$$
where $V_i^{m-1}$ denotes the outputs of the (m - 1)th layer to which the ith local filter is connected.
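The local-filter computation described above can be illustrated with a minimal pure-Python sketch. The function names `relu` and `conv1d_relu`, the input values, and the filter weights are illustrative assumptions, not the configuration used for NCNN. The sketch applies one shared filter with no padding and a stride of 1, so with a kernel size of 3 the third input only influences the first three outputs.

```python
def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


def conv1d_relu(inputs, weights, bias):
    """Slide one shared filter over a 1D input (no padding, stride 1),
    add the bias, and apply ReLU to each pre-activation."""
    k = len(weights)
    outputs = []
    for i in range(len(inputs) - k + 1):
        window = inputs[i:i + k]  # the local receptive field of output i
        pre_activation = sum(w * v for w, v in zip(weights, window)) + bias
        outputs.append(relu(pre_activation))
    return outputs


# Hypothetical input of six values (V1..V6) and a hypothetical kernel of size 3.
v = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
w = [0.5, 0.25, 0.25]
h = conv1d_relu(v, w, bias=0.0)

# Perturb V3 (index 2) and record which outputs change.
v_changed = list(v)
v_changed[2] = 10.0
h_changed = conv1d_relu(v_changed, w, bias=0.0)
affected = [i for i, (a, b) in enumerate(zip(h, h_changed)) if a != b]
```

Here `affected` lists the zero-based indices of the outputs that depend on V3, which correspond to h1, h2, and h3 in the notation of Figure 9.2.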
To produce values from the output layer, the Softmax activation function is used, which generalizes the sigmoid function to multiple classes. Softmax normalizes each neuron's output to the range between 0 and 1. It is non-linear in nature and is usually used when handling multiple classes.
The mathematical expression for the Softmax activation function is as follows:
$$\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \quad i = 1, \dots, K$$
where
$z$: The input vector to the softmax function.
$z_i$: The elements of the input vector to the softmax function; they can take any real value, positive, zero, or negative.
$e^{z_i}$: The standard exponential function applied to each element of the input vector. This gives a positive value, which is very small if the input is negative and very large if the input is large.
$\sum_{j=1}^{K} e^{z_j}$: The term on the bottom of the formula is the normalization term. It ensures that all the output values of the function sum to 1 and that each lies in the range (0, 1), thus constituting a valid probability distribution.
$K$: The number of classes in the multi-class classifier.
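As a small illustration of the softmax definition above, the following sketch computes it in plain Python. The function name `softmax` and the example scores are assumptions made for illustration; subtracting the maximum before exponentiating is a common numerical-stability step that leaves the result unchanged, and it is not something the text above prescribes.

```python
import math


def softmax(z):
    """Map a list of real-valued scores to a probability distribution."""
    m = max(z)  # shift by the maximum so exp() cannot overflow
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)  # normalization term (the denominator)
    return [e / total for e in exps]


# Hypothetical output-layer scores for a two-class (clean / defective) problem.
scores = [2.0, -1.0]
probabilities = softmax(scores)
```

The returned values are all strictly between 0 and 1, sum to 1, and preserve the ordering of the input scores.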
Shared weights means that each filter shares the same parameterization (weight vector and bias). In Figure 9.2, all the solid black lines connecting the input layer and the convolutional layer share the same parameters. The same is true for the blue sparse-dotted lines and the orange dense-dotted lines. This concept of shared weights allows a CNN to capture features independently of their positions and effectively reduces model capacity.
Another significant concept of CNNs is max-pooling, which partitions the output vector into several non-overlapping sub-regions and outputs the maximum value of each sub-region. Reducing the dimensionality of the intermediate representations in this way provides additional robustness to our defect prediction.
CNNs are largely tuned in an empirical manner, with the researcher adjusting the model to the application domain and the data available. Thus, parameter tuning is key to training a successful CNN. We discuss in Section 9.4 how these parameters are set in our proposed model.
There are a number of open-source data sets available online for the analysis of defect prediction models. For this study, four NASA system data sets (KC1, PC1, PC2, and KC3) are selected from the PROMISE repository [13], which is freely available as a collection of public data sets. The selected data sets differ in size and in the number of metrics: KC1 has 22 attributes with 2,109 instances, PC1 has 22 attributes with 1,109 instances, PC2 has 37 attributes with 745 instances, and KC3 has 40 attributes with 194 instances. These data sets contain software metrics such as Halstead and McCabe metrics and a Boolean variable that indicates whether a module is defect prone or not. Table 9.1 displays the characteristics of the NASA data sets (PC1, PC2, KC1, and KC3).
The WEKA (Waikato Environment for Knowledge Analysis) tool was used for the statistical output processing of the data sets. WEKA is open-source software that supports pre-processing, implementations of well-known ML algorithms, and visualization of data, so that users can develop ML techniques and apply them to real-world data problems. The data were analyzed, i.e., the accuracy on the different data sets was calculated using various classifiers, namely, RF, DT, NB [12], and DNN [14]. The results of these classifiers were then compared with the results generated by our proposed NCNN model.
Moreover, in this data set, we are provided with 21 traditional defect prediction features for each source file, including Lines of Code (LOC), McCabe complexity measures, and Halstead base and derived measures. The 21 traditional features are taken from the PROMISE Software Engineering Repository [13]. The 21 features are described in detail in Table 9.2.
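The max-pooling step can be sketched in a few lines of Python. The function name `max_pool1d` and the example values are illustrative assumptions; the sketch uses non-overlapping windows and drops any trailing values that do not fill a complete window, which is one common convention.

```python
def max_pool1d(values, pool_size):
    """Split `values` into non-overlapping windows of `pool_size`
    and keep the maximum of each complete window."""
    last_start = len(values) - pool_size
    return [max(values[i:i + pool_size])
            for i in range(0, last_start + 1, pool_size)]


# Hypothetical convolutional-layer outputs, pooled with a window of 2.
feature_map = [1.0, 3.0, 2.0, 5.0, 4.0, 0.0]
pooled = max_pool1d(feature_map, pool_size=2)
```

With a pool size of 2, the six-element feature map is reduced to three values, halving the dimensionality of the intermediate representation.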
Table 9.1 Characteristics of the NASA data sets.
Data set | Project | Number of attributes | Number of instances | Number of defective entities | Number of non-defective entities |
NASA | PC1 | 22 | 1,109 | 77 (6.94%) | 1,032 (93.06%) |
NASA | PC2 | 37 | 745 | 16 (2.15%) | 729 (97.85%) |
NASA | KC1 | 22 | 2,109 | 326 (15.46%) | 1,783 (84.54%) |
NASA | KC3 | 40 | 194 | 36 (18.56%) | 158 (81.44%) |
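The class balance in Table 9.1 can be recomputed directly from the defective and non-defective counts. The sketch below is illustrative only: the dictionary `DATASETS` and the helper names are assumptions, and the counts are copied from the table. It also computes the accuracy that a trivial classifier labelling every module as non-defective would reach, which is a useful reference point when reading accuracy figures on imbalanced data.

```python
# (defective, non-defective) counts copied from Table 9.1.
DATASETS = {
    "PC1": (77, 1032),
    "PC2": (16, 729),
    "KC1": (326, 1783),
    "KC3": (36, 158),
}


def defect_percentage(defective, clean):
    """Share of defective modules, in percent."""
    return 100.0 * defective / (defective + clean)


def majority_baseline_accuracy(defective, clean):
    """Accuracy (in percent) of always predicting the larger class."""
    return 100.0 * max(defective, clean) / (defective + clean)


for name, (defective, clean) in DATASETS.items():
    print(name,
          round(defect_percentage(defective, clean), 2),
          round(majority_baseline_accuracy(defective, clean), 2))
```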
This section presents our proposed NCNN model, which is based on a 1D CNN model. The overall network architecture of NCNN is shown in Figure 9.3. The NCNN model consists of two convolutional layers, two max-pooling layers to extract global patterns, a flattening layer and two dense layers to generate deep features and help better generalization, and, finally, a convolutional linear sequential model classifier to predict whether a source file is defective.
Other details about NCNN architecture are mentioned below:
Table 9.2 Attribute information of the 21 features of PROMISE repository [13].
Attribute information | Symbol |
McCabe’s line count of code | loc |
McCabe “cyclomatic complexity” | v(g) |
McCabe “essential complexity” | ev(g) |
McCabe “design complexity” | iv(g) |
Halstead total operators + operands | n |
Halstead “volume” | v |
Halstead “program length” | l |
Halstead “difficulty” | d |
Halstead “intelligence” | i |
Halstead “effort” | e |
Halstead | b |
Halstead’s time estimator | t |
Halstead’s line count | lOCode |
Halstead’s count of lines of comments | lOComment |
Halstead’s count of blank lines | lOBlank |
Halstead lines of code and comment | lOCodeAndComment |
Unique operators | uniq_Op |
Unique operands | uniq_Opnd |
Total operators | total_Op |
Total operands | total_Opnd |
Branch count of the flow graph | branchCount |
Figure 9.3 Overall network architecture of proposed NCNN model.
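To make the layer sequence of Figure 9.3 more concrete, the following sketch traces how the length of a feature vector changes through two convolution and max-pooling stages followed by a flattening layer. Every number here is a hypothetical assumption chosen for illustration (21 input features, kernel size 3, pool size 2, 32 and 64 filters, no padding, stride 1); the actual NCNN hyperparameters are not stated in this section, and the function names are not taken from Keras or TensorFlow.

```python
def conv_output_length(length, kernel_size):
    """Output length of a 1D convolution with no padding and stride 1."""
    return length - kernel_size + 1


def pool_output_length(length, pool_size):
    """Output length of non-overlapping 1D max-pooling (incomplete window dropped)."""
    return length // pool_size


def trace_shapes(n_features, kernel_size, pool_size, filters):
    """Return (layer name, output shape) pairs for conv -> pool stages plus flatten."""
    shapes = [("input", (n_features, 1))]
    length = n_features
    for idx, n_filters in enumerate(filters, start=1):
        length = conv_output_length(length, kernel_size)
        shapes.append((f"conv{idx}", (length, n_filters)))
        length = pool_output_length(length, pool_size)
        shapes.append((f"pool{idx}", (length, n_filters)))
    shapes.append(("flatten", (length * filters[-1],)))
    return shapes


# Hypothetical configuration: 21 metrics per module, two conv/pool stages.
shapes = trace_shapes(n_features=21, kernel_size=3, pool_size=2, filters=[32, 64])
for name, shape in shapes:
    print(name, shape)
```

Under these assumed settings, the flattened vector is what the two dense layers would receive as input; different kernel sizes, pool sizes, or padding choices would change every shape in the trace.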
This section presents the effectiveness of our NCNN by comparing its F-measure and accuracy on defect prediction with other state-of-the-art methods such as RF, DT, and NB [12] and DNN [14]. We also explain some basic terminology associated with software defect prediction. A training set refers to a set of instances used to train a model, whereas a test set refers to a set of instances used to evaluate the learned model. In our defect prediction setting, the training set and the test set come from the same project (data set). In the field of ML and, specifically, the problem of statistical classification, a confusion matrix, also known as an error matrix, is used. A confusion matrix is a summary of prediction results on a classification problem. The numbers of correct and incorrect predictions are summarized with count values and broken down by class. The confusion matrix thus shows the ways in which a classification model is confused when it makes predictions. It gives insight not only into the errors being made by a classifier but, more importantly, into the types of errors being made. Figure 9.4 describes the confusion matrix.
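Figure 9.4 Description regarding confusion matrix (Class 1: False; Class 2: True).
The terms of the confusion matrix are defined as follows: true positives (TP) are positive instances predicted as positive, true negatives (TN) are negative instances predicted as negative, false positives (FP) are negative instances predicted as positive, and false negatives (FN) are positive instances predicted as negative.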
The other evaluation parameters used in this research work are TP Rate (TPR), TN Rate (TNR), FN Rate (FNR), and FP Rate (FPR), precision, recall, F-measure (also called F1-score), and accuracy.
TPR is the proportion of actual positives that are predicted as positive: TPR = TP / (TP + FN).
TNR is the proportion of actual negatives that are predicted as negative: TNR = TN / (TN + FP).
FPR is the proportion of actual negatives that are predicted as positive: FPR = FP / (FP + TN).
FNR is the proportion of actual positives that are predicted as negative: FNR = FN / (FN + TP).
Precision (P) is the proportion of positive class predictions that actually belong to the positive class: P = TP / (TP + FP).
Recall (R) is the proportion of all positive examples in the data set that are predicted as positive: R = TP / (TP + FN).
F-measure offers a single score that balances the concerns of precision and recall in one number: F-measure = 2 × P × R / (P + R).
Accuracy is the total number of correct predictions divided by the total number of predictions made for a data set: Accuracy = (TP + TN) / (TP + TN + FP + FN).
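The evaluation measures above all follow from the four confusion-matrix counts. The sketch below is a minimal illustration with hypothetical counts; the function name `classification_metrics` and the choice to return 0.0 when a denominator is zero are assumptions of this sketch, not part of the experimental setup.

```python
def _safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, tn, fp, fn):
    """Compute rate-based and summary metrics from confusion-matrix counts."""
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)  # identical to TPR
    return {
        "tpr": recall,
        "tnr": _safe_div(tn, tn + fp),
        "fpr": _safe_div(fp, fp + tn),
        "fnr": _safe_div(fn, fn + tp),
        "precision": precision,
        "recall": recall,
        "f1": _safe_div(2 * precision * recall, precision + recall),
        "accuracy": _safe_div(tp + tn, tp + tn + fp + fn),
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=30, tn=40, fp=20, fn=10)
```

By construction, TPR and FNR sum to 1, as do TNR and FPR, which offers a quick consistency check when reading rate tables.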
For the final analysis, we computed the performance measures for all the five classification techniques used in the study. The results were based on the values of precision, recall, F-measure, and accuracy. Mentioned below are the tables and graphs of our study for four NASA system data sets (KC1, PC1, PC2, and KC3) that are selected from PROMISE repository [13] which is freely available as public data sets.
The observations from the tables and graphs are as follows:
Figure 9.5 Confusion matrix analysis for the data sets (KC1, KC3, PC1, and PC2).
Figure 9.6 Model accuracy and model loss analysis for the data sets (KC1, KC3, PC1, and PC2).
Figure 9.7 Performance comparison of different models for software defect prediction for the data sets (KC1, KC3, PC1, and PC2).
Table 9.3 Performance comparison for the data set KC1.
Algorithm | Precision | Recall | F-Measure | Accuracy (%) |
RF | 0.887 | 0.965 | 0.925 | 86.67 |
DT | 0.865 | 0.974 | 0.916 | 84.87 |
NB | 0.888 | 0.905 | 0.897 | 82.36 |
DNN | 0.87 | 0.98 | 0.94 | 88.57 |
NCNN | 0.99 | 0.99 | 0.99 | 88.76 |
Table 9.4 Performance comparison for the data set KC3.
Algorithm | Precision | Recall | F-Measure | Accuracy (%) |
RF | 0.832 | 0.968 | 0.895 | 81.44 |
DT | 0.87 | 0.93 | 0.899 | 82.99 |
NB | 0.863 | 0.88 | 0.871 | 78.87 |
DNN | 0.97 | 0.97 | 0.97 | 87.88 |
NCNN | 0.97 | 0.98 | 0.98 | 96.97 |
Table 9.5 Performance comparison for the data set PC1.
Algorithm | Precision | Recall | F-Measure | Accuracy (%) |
RF | 0.95 | 0.984 | 0.96 | 93.68 |
DT | 0.937 | 0.99 | 0.96 | 92.87 |
NB | 0.947 | 0.936 | 0.942 | 89.17 |
DNN | 0.94 | 0.99 | 0.97 | 94.65 |
NCNN | 0.93 | 0.93 | 0.96 | 93.16 |
Table 9.6 Performance comparison for the data set PC2.
Algorithm | Precision | Recall | F-Measure | Accuracy (%) |
RF | 0.979 | 1 | 0.989 | 97.85 |
DT | 0.979 | 1 | 0.989 | 97.85 |
NB | 0.98 | 0.925 | 0.951 | 90.73 |
DNN | 0.99 | 0.99 | 0.98 | 98.66 |
NCNN | 0.99 | 0.99 | 0.99 | 98.71 |
In this paper, an attempt has been made to propose an NCNN model to predict software defects. The NCNN model was implemented in the Python programming language with Keras and TensorFlow. A comparative analysis with ML algorithms (RF, DT, and NB) and a DNN model in terms of F-measure (also known as F1-score), recall, precision, and accuracy has been presented for four NASA system data sets (KC1, PC1, PC2, and KC3) selected from the PROMISE repository. From Table 9.7, we observed that the NCNN classifier yields TPR and TNR greater than all ML classifiers (RF, DT, and NB) and the DNN classifier in almost all the data sets. NCNN also performs well on FNR and FPR, which are lower in almost all the data sets. Using Tables 9.3 to 9.6 and Figures 9.7 and 9.8, we examined F-measure, recall, precision, and accuracy. From Figure 9.7, we observed that the F-measure of NCNN is higher than that of all other classifiers (DNN, RF, DT, and NB) for the data sets KC1, KC3, and PC2, but not for PC1, where it is lower than that of DNN. From Tables 9.3 to 9.6 and Figure 9.8, we found that NCNN has the highest accuracy for the data sets KC1 (88.76), KC3 (96.97), and PC2 (98.71). For the data set PC1, NCNN has lower accuracy (93.16) than DNN (94.65) and RF (93.68) but higher accuracy than DT and NB. Thus, the proposed NCNN model outperformed the other models in most cases. As future work, we plan to present a deep learning framework that automatically obtains syntactic and semantic features from the source code and derives key features from them for accurate software defect prediction. We plan to use javalang, an open-source Python package that provides a lexical analyzer and parser based on the Java language specification, to parse Java source code into ASTs.
From Figure 9.9, it is observed that the NCNN classifier yields TPR and TNR greater than all ML classifiers (RF, DT, and NB) and the DNN classifier for the data sets KC1, KC3, and PC2. For PC1, the TPR of NCNN is lower than that of the NB classifier but greater than those of the RF, DT, and DNN classifiers, while the TNR of NCNN is greater than that of all other classifiers. It is also observed that NCNN performs well on FNR and FPR: both are lower for NCNN in all the data sets except KC3 and PC1, where the FPR of DNN is lower than that of NCNN and the FNR of NB is lower than that of NCNN. These observations suggest that the NCNN model performs well in the confusion rate analysis for all the data sets and that it neither underfits nor overfits.
Table 9.7 Confusion matrix analysis for the KC1, KC3, PC1, and PC2 data sets (TPR, True Positive Rate; TNR, True Negative Rate; FPR, False Positive Rate; FNR, False Negative Rate).
Algorithm | KC1 TPR | KC1 TNR | KC1 FPR | KC1 FNR | KC3 TPR | KC3 TNR | KC3 FPR | KC3 FNR | PC1 TPR | PC1 TNR | PC1 FPR | PC1 FNR | PC2 TPR | PC2 TNR | PC2 FPR | PC2 FNR |
RF | 0.33 | 0.96 | 0.04 | 0.67 | 0.14 | 0.97 | 0.03 | 0.86 | 0.29 | 0.98 | 0.015 | 0.70 | 0 | 1 | 0 | 1 |
DT | 0.17 | 0.97 | 0.03 | 0.83 | 0.38 | 0.91 | 0.07 | 0.61 | 0.10 | 0.99 | 0.009 | 0.89 | 0 | 1 | 0 | 1 |
NB | 0.38 | 0.90 | 0.07 | 0.62 | 0.38 | 0.90 | 0.12 | 0.61 | 0.29 | 0.93 | 0.06 | 0.70 | 0.13 | 0.92 | 0.07 | 0.88 |
DNN | 0.47 | 0.98 | 0.02 | 0.53 | 0.88 | 0.83 | 0.02 | 0.11 | 0.22 | 0.994 | 0.005 | 0.77 | 0.81 | 0.990 | 0.009 | 0.19 |
NCNN | 0.50 | 0.99 | 0.01 | 0.50 | 0.91 | 0.82 | 0.03 | 0.08 | 0.25 | 0.998 | 0.001 | 0.74 | 0.87 | 0.994 | 0.005 | 0.13 |
Figure 9.8 Model accuracy analysis for the data sets (KC1, KC3, PC1, and PC2).
Figure 9.9 Confusion rate analysis for the data sets (KC1, KC3, PC1, and PC2).
1. Menzies, T., Milton, Z., Turhan, B., Cukic, B., Jiang, Y., Bener, A., Defect prediction from static code features: Current results, limitations, new approaches. Autom. Software Eng., 17, 375–407, 2010.
2. Moser, R., Pedrycz, W., Succi, G., A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction, in: Proceedings of the 30th International Conference on Software Engineering, Leipzig, Germany, 15 May 2008, p. 181.
3. Tan, M., Tan, L., Dara, S., Mayeux, C., Online Defect Prediction for Imbalanced Data, in: Proceedings of the 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering (ICSE), Florence, Italy, 16–24 May 2015, pp. 99–108.
4. Nam, J., Pan, S.J., Kim, S., Transfer defect learning, in: Proceedings of the International Conference of Software Engineering, San Francisco, CA, USA, 18–26 May 2013.
5. Nam, J., Survey on Software Defect Prediction. Ph.D. Thesis, The Hong Kong University of Science and Technology, Hong Kong, China, 3 July 2014.
6. Lyu, M.R., Handbook of Software Reliability Engineering, vol. 222, IEEE Computer Society Press, Washington, DC, USA, 1996.
7. Rathore, S.S. and Kumar, S., A decision tree logic-based recommendation system to select software fault prediction techniques. Computing, 99, 3, 255–285, Mar. 2016.
8. Malhotra, R. and Jain, A., Fault Prediction Using Statistical and Machine Learning Methods for Improving Software Quality. J. Inf. Process. Syst., 8, 2, 241–262, Jun. 2012.
9. He, P., Li, B., Liu, X., Chen, J., Ma, Y., An empirical study on software defect prediction with a simplified metric set. Inf. Software Technol., 59, 170–190, Mar. 2015.
10. Elish, O.M., Al-Yafei, H.A., Al-Mulhem, M., Empirical comparison of three metrics suites for fault prediction in packages of object-oriented systems: A case study of Eclipse. Adv. Eng. Software, 42, 10, 852–859, Oct. 2011.
11. Peng, Y., Kou, G., Wang, G., Wu, W., Shi, Y., Ensemble Of Software Defect Predictors: An Ahp-Based Evaluation Method. Int. J. Inf. Technol. Decis. Mak., 10, 01, 187–206, Jan. 2011.
12. Malhotra, R., A systematic review of machine learning techniques for software fault prediction. Appl. Soft Comput., 27, 504–518, Feb. 2015.
13. http://promise.site.uottawa.ca/SERepository/datasets-page.html.
14. Gupta, M., Rajnish, K., Bhattacherjee, V., Impact of parameter tuning for optimizing deep neural networks models for predicting software faults. Sci. Program., Hindawi, 2021, 1–17, 12th June, 2021 (Page No:234) [communicated].
15. Singh, Y., Kaur, A., Malhotra, R., Prediction of Fault-Prone Software Modules using Statistical and Machine Learning Methods. Int. J. Comput. Appl., 1, 22, 8–15, Feb. 2010.
16. Dejaeger, K., Verbraken, T., Baesens, B., Toward Comprehensible Software Fault Prediction Models Using Bayesian Network Classifiers. IEEE Trans. Software Eng., 39, 2, 237–257, Feb. 2013.
17. Kumar, L., Sripada, K.S., Sureka, A., Rath, K.S., Effective fault prediction model developed using Least Square Support Vector Machine (LSSVM). J. Syst. Software, 137, 686–712, Mar. 2018.
18. Twala, B., Predicting Software Faults in Large Space Systems using Machine Learning Techniques. Def. Sci. J., 61, 4, 306–316, Jul. 2011.
19. Wang, S., Liu, T., Tan, L., Automatically learning semantic features for defect prediction. Proceedings of the 38th International Conference on Software Engineering - ICSE ‘16, 2016.
20. Miholca, L.D., Czibula, G., Czibula, G.I., A novel approach for software defect prediction through hybridizing gradual relational association rules with artificial neural networks. Inf. Sci., 441, 152–170, May 2018.
21. Li, J., He, P., Zhu, J., Lyu, R.M., Software Defect Prediction via Convolutional Neural Network. IEEE International Conference on Software Quality, Reliability and Security (QRS), Jul. 2017.
22. Pan, C., Lu, M., Xu, B., Gao, H., An Improved CNN model for within project software defect prediction. Appl. Sci., 9, 2138–216, 2019.
23. Cheng, J., Guo, J., Cleland-Huang., J., Semantically Enhanced Software Traceability Using Deep Learning Techniques, in: Proceedings of the 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE), Buenos Aires, Argentina, 20–28 May 2017, pp. 3–14.
24. Li, L., Feng, H., Zhuang, W., Meng, N., Ryder, B., CCLearner: A Deep Learning-Based Clone Detection Approach, in: Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME), Shanghai, China, 17–24 September 2017, pp. 249–260.
25. Lam, N.A., Nguyen, A.T., Nguyen, H.A., Nguyen, T.N., Bug localization with combination of deep learning and information retrieval, in: Proceedings of the 2017 IEEE/ACM 25th International Conference on Program Comprehension (ICPC), Buenos Aires, Argentina, 22–23 May 2017, pp. 218–229.
26. Reyes, J., Ramirez, D., Paciello, J., Automatic Classification of Source Code Archives by Programming Language: A Deep Learning Approach, in: Proceedings of the International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 15–17 December 2016, pp. 514–519, 2016.
27. Zekany, S., Rings, D., Harada, N., Laurenzano, M.A., Tang, L., Mars, J., Crystal Ball: Statically analyzing runtime behaviour via deep sequence learning, in: Proceedings of the 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Taipei, Taiwan, 15–19 October 2016, pp. 1–12.
28. Corley, C.S., Damevski, K., Kraft, N.A., Exploring the use of deep learning for feature location, in: Proceedings of the 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME), Bremen, Germany, 29 September–1 October 2015, pp. 556–560.
29. Pang, Y., Xue, X., Wang, H., Predicting Vulnerable Software Components through Deep Neural Network, in: Proceedings of the 2017 International Conference on Deep Learning Technologies, Chengdu China, 2–4 June 2017, pp. 6–10.
30. Bandara, U. and Wijayarathna, G., Deep Neural Networks for Source Code Author Identification, in: Proceedings of the 20th International Conference, Daegu, Korea, 3–7 November 2013, pp. 368–375.
31. Goodfellow, I., Bengio, Y., Courville, A., Deep learning Nature, 2021, 1–17, 12th June, 2021 (Page No:234).
32. Abdel-Hamid, O., Mohamed, A.R., Jiang, H., Penn, G., Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition, in: Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, 25–30 March 2012, pp. 4277–4280.
33. Krizhevsky, A., Sutskever, I., Hinton, G.E., ImageNet classification with deep convolutional neural networks, in: Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–8 December 2012, pp. 1097–1105.
34. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., Gradient-based learning applied to document recognition. Proc. IEEE, 86, 2278–2324, 1998.
35. Zhang, X., Zhao, J., LeCun, Y., Character-level Convolutional Networks for Text Classification, in: Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015.