4
Mathematical Methods in Deep Learning

Srinivasa Manikant Upadhyayula and Kannan Venkataramanan

CRISIL Global Research & Analytics, CRISIL (A S&P Company), CRISIL House, Central Avenue, Hiranandani Business Park, Powai, Mumbai, 400 076, India

4.1 Deep Learning Using Neural Networks

Deep learning builds quantitative models composed of a series of processing layers that learn representations of data at multiple levels of abstraction [1]. Neural networks were originally developed to understand the basic functioning of the human brain and the central nervous system. Models designed to capture the working of the human nervous system were later applied to financial services domains such as link analysis of payments, fraud detection in customer transactions, and detection of anomalous transactions that may indicate money laundering.

4.2 Introduction to Neural Networks

A neural network works in a pattern similar to that of neurons in the human nervous system. The fundamental idea behind this learning technique is a large number of highly organized and interconnected neurons working in harmony to solve a specific problem, such as pattern recognition or data classification. Neural networks are not a recent phenomenon; their study began before the advent of modern computers, with the work of McCulloch and Pitts [2], who created a theoretical representation of neural networks by combining a model of the human nervous system with mathematics (calculus and linear algebra). McCulloch–Pitts networks (often referred to as MP networks) represent a finite state automaton embodying the logic of propositions, with quantifiers, in the form of computer programs [3].

With the advent of parallel distributed processing in the mid-1980s, Rumelhart, McClelland, and coworkers [4] applied its concepts to neural networks. Their work signaled the dawn of applying advanced neural network techniques in medical research. Qian and Sejnowski [5] presented a novel method for predicting the secondary structure of globular proteins based on nonlinear neural network models; the average accuracy of the model on a test set of proteins nonhomologous with the corresponding training set was 64.3%. Kneller et al. [6] applied neural networks to predict the mapping between protein sequence and secondary structure; by adding network units that detect periodicities in the input sequence and using tertiary structural class, they reached an accuracy of 79% for predicting the class of all-α proteins. Rost and Sander [7] used evolutionary information contained in multiple sequence alignments as inputs to neural networks and predicted secondary structure with significant accuracy; their model demonstrated an overall accuracy of 71.6% in a multiple cross-validation test on 126 unique protein chains.

In recent years, the applications of neural networks have increased exponentially across domains. Courbariaux et al. [8] introduced a method to train binarized neural networks (BNNs), i.e. neural networks with binary weights and activations at run-time. BNNs drastically reduce memory size and accesses and replace most arithmetic operations with bit-wise operations, which is expected to substantially improve power efficiency. Silver et al. [9] developed a new approach to computer Go in which deep neural networks are trained by a novel combination of supervised learning from human expert games and reinforcement learning from games of self-play. These networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play; their program AlphaGo achieved a 99.8% winning rate against other Go programs. Esteva et al. [10] applied deep convolutional neural networks (CNNs) to classify skin lesions, trained end-to-end directly from images using only pixels and disease labels as inputs, on a dataset of 129 450 clinical images. The CNN achieved performance on par with all tested experts, demonstrating an artificial intelligence capable of classifying skin cancer with a level of competence comparable to dermatologists.

4.2.1 Artificial Neural Network (ANN)

An artificial neural network (ANN) is a computational model based on the structure and functioning of biological neural networks; such models are built as an interconnected network of neurons with one or more hidden layers for processing data from the input variables.

An ANN consists of an input layer, hidden layers, and an output layer. The input layer is the set of neurons representing the independent features (or input variables). These data points are passed to the hidden layer. In a hidden layer with "n" neurons, each input to a neuron is multiplied by its designated weight, thereby transforming the input parameters and segregating the data toward the desired output. The weighted combination of inputs is then passed to the activation function, which determines the processing of the different neurons in the hidden layer and passes the results to the output layer. The output layer collects these results and produces the final output (Figure 4.1).

A single neuron in the hidden layer obtains information from the set of independent variables in the input data (numbered x1 to xn). Each input variable is assigned a weight wi (where i runs from 1 to n).

Mathematically, a single neuron is represented as

z = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + \varepsilon_0

Figure 4.1 Schematic representation of a single hidden layer neural network.


Figure 4.2 Schematic representation of a single neuron of the hidden layer.

In simple terms, it can be written as

z = \sum_{i=1}^{n} w_i x_i + \varepsilon_0

where xi represents the inputs from independent variables,

wi represents the weights associated with these independent variables, and

ε0 represents the error or bias associated with the neuron in the hidden layer (Figure 4.2).
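To make this computation concrete, the following minimal sketch (Python with NumPy; the values of x, w, and the bias eps0 are purely illustrative and not taken from the text) evaluates a single hidden-layer neuron as the weighted sum of its inputs plus the bias term.

import numpy as np

# Illustrative values: three input features, their weights, and a bias (error) term.
x = np.array([0.5, -1.2, 3.0])     # inputs x_1 ... x_n
w = np.array([0.8, 0.1, -0.4])     # weights w_1 ... w_n
eps0 = 0.05                        # bias / error term

# z = w_1*x_1 + ... + w_n*x_n + eps0
z = np.dot(w, x) + eps0
print(z)   # pre-activation value of the neuron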

For a single hidden layer ANN with “N” neurons, the neurons in the hidden layer are represented as

z_1 = \sum_{i=1}^{n} w_{i1} x_i + \varepsilon_1
z_2 = \sum_{i=1}^{n} w_{i2} x_i + \varepsilon_2
\vdots
z_N = \sum_{i=1}^{n} w_{iN} x_i + \varepsilon_N

The entire single hidden layer with “N” neurons with activation function can be represented as

Z_1 = g\left(\sum_{i=1}^{n} w_{i1} x_i + \varepsilon_1\right)
\vdots
Z_N = g\left(\sum_{i=1}^{n} w_{iN} x_i + \varepsilon_N\right)

where xi represents the inputs from independent variables,

wij represents the weights associated with these independent variables, and

εj represents the error or bias associated with each neuron in the hidden layer.

The output y is represented as an activation function applied to Z and depends on the activation function chosen for the neural network.

y = g(Z)
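As a sketch of the full forward pass for one hidden layer, the following Python fragment assumes a sigmoid choice for the activation g and uses randomly generated weights W, biases eps, and output weights w_out as illustrative placeholders rather than values from the text.

import numpy as np

def g(z):
    """Sigmoid activation (one possible choice of g)."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_inputs, n_hidden = 4, 5

x = rng.normal(size=n_inputs)               # input vector x_1 ... x_n
W = rng.normal(size=(n_hidden, n_inputs))   # weights w_ij for the hidden layer
eps = rng.normal(size=n_hidden)             # bias / error term for each neuron

z = W @ x + eps                  # z_j = sum_i w_ij * x_i + eps_j, for j = 1 ... N
Z = g(z)                         # hidden-layer activations
w_out = rng.normal(size=n_hidden)            # weights from hidden layer to a single output
y = g(w_out @ Z)                 # output y as an activation of Z
print(Z, y)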

4.2.1.1 Activation Function

The activation function is the computational logic of a neural network that takes into account both the input variables and their corresponding weights to determine the impact of the variables on the desired output and to segregate the relevant input information needed to process a particular neuron. Also known as the transfer function, the activation function shapes the output for a given set of inputs.

In mathematical terms, an activation function can be either a linear or a nonlinear function. For a linear activation function, the output is linear in nature and lies within the range (−∞, ∞). In simple terms, it can be represented as

g(z) = z

The identity activation function is the simplest and most commonly used linear activation function, typically applied in regression problems. The identity function is monotonic, and the derivative of the identity function g′(z) is 1.

However, this kind of activation function does not work well when the input data are complex or follow nonlinear patterns. In such cases, a nonlinear activation function is used. It is especially useful when the data follow a parabolic or exponential curve, as it makes it easier for the model to adapt to the variety of data points for an input variable.

Nonlinear activation functions are most commonly used when the output must vary smoothly with the input, i.e. when a differentiable function with a well-defined slope (derivative) is required; they are also suitable when the input variable follows a monotonic relationship with the output. Some of the commonly used nonlinear activation functions are the sigmoid function, the tanh function, and the Rectified Linear Unit (ReLU) function.

4.2.1.2 Logistic Sigmoid Activation Function

A sigmoid function has a characteristic "S"-shaped curve and is considered a special case of the logistic function. Mathematically, it is represented as

g(z) = \frac{1}{1 + e^{-z}}
= \frac{e^{z}}{e^{z} + 1}

Graphically, it is presented in Figure 4.3.


Figure 4.3 Graphical representation of the sigmoid function.

For the sigmoid activation function, the output is nonlinear and lies within the range (0, 1). Because many problems addressed with neural networks are classification problems (especially binary classification, such as predicting default on a loan or identifying whether a transaction is fraudulent), this function is well suited to predicting the probability of the output. The function is differentiable and monotonic, and the derivative of the sigmoid function g′(z) is

g'(z) = \frac{d}{dz}\left(\frac{1}{1 + e^{-z}}\right)
= \frac{e^{-z}}{(1 + e^{-z})^{2}}
= g(z)\,\bigl(1 - g(z)\bigr)

The derivative of this function follows a bell-shaped curve resembling a normal distribution, which is a convenient form for calculating the gradients used in neural networks: the gradients for a layer can be estimated using simple subtraction and multiplication. The sigmoid is also used as an activation function to bring nonlinearity into the model. For example, Gershenfeld et al. [11] used the logistic sigmoid function as an activation function to keep the response of the neural network bounded. The function used is represented as

f(x) = \frac{1}{1 + e^{-x}}
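A minimal sketch of the logistic sigmoid and its derivative, verifying numerically that g′(z) = g(z)(1 − g(z)); the test point and the finite-difference step size are arbitrary illustrations.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)     # g'(z) = g(z) * (1 - g(z))

z = 0.7
h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)   # finite-difference check
print(sigmoid_derivative(z), numeric)                   # the two values agree closely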

4.2.1.3 tanh or Hyperbolic Tangent Activation Function

A sigmoid-based activation function has limitations during training of neural networks. When highly negative inputs are fed to the logistic sigmoid, the output value is almost zero. This affects the calculation of gradient parameters for feedforward neural networks: when there are large numbers of neurons in the hidden layer, the gradients become very small and training can stall.

In such cases, an alternative to the sigmoid activation function is the hyperbolic tangent function (tanh). The points (cosh θ, sinh θ) trace the right branch of the unit (equilateral) hyperbola, and the hyperbolic functions can be defined from the two sides of a right-angled triangle covering the corresponding hyperbolic sector. Mathematically, the hyperbolic tangent is represented as

g(z) = \tanh(z) = \frac{\sinh(z)}{\cosh(z)}
= \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}

Graphically, the function is presented in Figure 4.4.


Figure 4.4 Graphical representation of the tanh function.

For the tanh activation function, the output has an "S"-shaped curve similar to the sigmoid function and lies within the range (−1, 1). Like the sigmoid, this function is differentiable and monotonic, and the derivative of the hyperbolic tangent function g′(z) is

g'(z) = \frac{d}{dz}\left(\frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}\right)
= 1 - \left(\frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}\right)^{2}
= 1 - \tanh^{2}(z)

Similar to the sigmoid function, the tanh activation function is applied in feedforward neural networks for classification and prediction problems.
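An analogous sketch for the hyperbolic tangent, using the identity g′(z) = 1 − tanh²(z); np.tanh is used directly, so the exponential form appears only in the comment.

import numpy as np

def tanh_activation(z):
    # tanh(z) = (e^z - e^-z) / (e^z + e^-z)
    return np.tanh(z)

def tanh_derivative(z):
    return 1.0 - np.tanh(z) ** 2    # g'(z) = 1 - tanh^2(z)

z = np.linspace(-3, 3, 7)
print(tanh_activation(z))   # outputs lie in (-1, 1)
print(tanh_derivative(z))   # largest near z = 0, small for large |z|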

4.2.1.4 ReLU (Rectified Linear Unit) Activation Function

The ReLU function is the most widely used activation function in all forms of neural networks, including CNNs [12]. Also known as the ramp function, ReLU outputs zero for all negative values of the input and behaves as the linear (identity) function for all positive values of the input.

Deep networks with ReLU as the activation function are easier to optimize and train than networks with sigmoid-based or tanh-based activation functions, because the gradients flow easily even when there are multiple hidden layers with large numbers of neurons. This has made ReLU a popular activation function, with notable applications in speech recognition.

Mathematically, the rectifier in the activation function is defined as

f(x) = \max(0, x)

where x is the input to the neuron.

In simple terms, this function is represented as

f(x) = \begin{cases} x, & x > 0 \\ 0, & x \le 0 \end{cases}

Graphically, the function is presented in Figure 4.5.


Figure 4.5 Graphical representation of the ReLU function.

This function does not produce negative outputs for negative input values. Unlike the sigmoid and tanh functions above, both the ReLU function and its derivative are monotonic, and the output lies in the range [0, ∞) for all values of the input. The derivative of the ReLU function is

f'(x) = \begin{cases} 1, & x > 0 \\ 0, & x \le 0 \end{cases}

However, because all negative inputs are mapped to zero, only the positive inputs contribute to training the model: neurons whose inputs are negative produce zero output and zero gradient, and therefore stop learning. If many neurons end up in this state, the predictive power and accuracy of the model can suffer.
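A short sketch of ReLU and its derivative, showing how all negative inputs (and their gradients) are mapped to zero; the choice of f′(0) = 0 is a common convention, not a prescription from the text.

import numpy as np

def relu(x):
    return np.maximum(0.0, x)        # f(x) = max(0, x)

def relu_derivative(x):
    return (x > 0).astype(float)     # 1 for x > 0, 0 otherwise (0 chosen at x = 0)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))             # [0.  0.  0.  0.5 2. ]
print(relu_derivative(x))  # [0. 0. 0. 1. 1.]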

4.3 Other Activation Functions (Variant Forms of ReLU)

4.3.1 Smooth ReLU

In this variant of ReLU, also known as the softplus function, the rectifier is replaced by a smooth approximation whose derivative is the logistic function

f(x) = \ln(1 + e^{x})

The derivative of this variant of ReLU function is

f'(x) = \frac{e^{x}}{1 + e^{x}}
= \frac{1}{1 + e^{-x}}

4.3.2 Noisy ReLU

In this variant of ReLU, the function is represented as a ramp function with Gaussian noise α

f(x) = \max(0, x + \alpha)

where α ∼ N(0, σ(x)) is the Gaussian noise term.

In simple terms, this function is represented as

f(x) = \begin{cases} x + \alpha, & x + \alpha > 0 \\ 0, & \text{otherwise} \end{cases}

The derivative of this variant of ReLU function is

f'(x) = \begin{cases} 1, & x > 0 \\ 0, & x \le 0 \end{cases}

This variant is used in restricted Boltzmann machines for computer vision activities.

4.3.3 Leaky ReLU

In this variant of ReLU, the function is represented as a ramp function with a small, positive slope (typically 0.01) for negative inputs

f(x) = \begin{cases} x, & x > 0 \\ 0.01\,x, & x \le 0 \end{cases}

The derivative of this variant of ReLU function is

f'(x) = \begin{cases} 1, & x > 0 \\ 0.01, & x \le 0 \end{cases}

4.3.4 Parametric ReLU

In this variant of ReLU, the function is a ramp function in which the coefficient of leakage for negative inputs becomes a parameter a that is learned during training

f(x) = \begin{cases} x, & x > 0 \\ a\,x, & x \le 0 \end{cases}

In cases where a ≤ 1, the function can be written as

f(x) = \max(x, a\,x)

The derivative of this variant of ReLU function is

f'(x) = \begin{cases} 1, & x > 0 \\ a, & x \le 0 \end{cases}
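The following sketch collects the ReLU variants described above: softplus (smooth ReLU), noisy ReLU, leaky ReLU, and parametric ReLU. The noise scale sigma, the leak slope 0.01, and the parameter a used here are illustrative assumptions.

import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))                 # smooth ReLU: ln(1 + e^x)

def noisy_relu(x, sigma=0.1, rng=np.random.default_rng(0)):
    alpha = rng.normal(0.0, sigma, size=np.shape(x))   # Gaussian noise term
    return np.maximum(0.0, x + alpha)

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

def parametric_relu(x, a=0.25):
    return np.where(x > 0, x, a * x)           # a is learned during training in practice

x = np.array([-2.0, -0.5, 0.5, 2.0])
print(softplus(x), noisy_relu(x), leaky_relu(x), parametric_relu(x), sep="\n")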

4.3.5 Training and Optimizing a Neural Network Model

An ANN is a supervised learning technique, i.e. the algorithm is trained on labeled data to identify the patterns in the input data that contribute to a given output. To obtain a specific output, the model adjusts the weights assigned to the inputs of each neuron. The higher the weight assigned to a particular input, the more impact that input variable has on the neuron, and this impact propagates to the neurons in subsequent layers as well. To represent inhibition, negative weights are sometimes assigned to input variables. This entire process of adjusting the weights of the neurons to obtain the desired output is known as training the neural network.

There are multiple algorithms for training a neural network model. The most commonly used is the backpropagation algorithm, which calculates the error in the estimate and correspondingly adjusts the weights of each layer to obtain the desired output.

4.4 Backpropagation Algorithm

Werbos's [13] backpropagation algorithm provided a breakthrough in the field of neural networks, paving the way for ANNs. Johansson et al. [14] developed backpropagation learning for multilayer feedforward neural networks using the conjugate gradient method to improve and optimize the learning rates. Chen and Jain [15] derived a robust backpropagation learning algorithm that is resistant to noise effects and capable of rejecting gross errors during the approximation process.

Yu et al. [16] proposed a general backpropagation algorithm for feedforward neural network learning with time-varying inputs; in their approach, a Lyapunov function is used to analyze the convergence of the weights as the algorithm minimizes the error function. Khashman [17] proposed a modified backpropagation learning algorithm with additional weights for two emotional parameters, anxiety and confidence. The proposed neural network was applied to a facial recognition problem, and the results showed improved performance, with higher recognition rates and faster recognition times than a conventional neural network.

Sapna et al. [18] proposed a backpropagation algorithm based on the Levenberg–Marquardt algorithm to obtain an efficient diabetes prediction method for assisting medical practitioners, special educators, occupational therapists, and psychologists in the better assessment of diabetes.

The backpropagation algorithm follows a gradient descent approach and applies the chain rule from calculus for differentiating compositions of two or more functions. It is related to the Gauss–Newton algorithm (Figure 4.6).


Figure 4.6 Schematic representation of a standard backpropagation neural network with input variables (x1, x2, x3, x4), hidden layer neurons (z1, z2, z3, z4, z5), and output values (y1, y2).

For training the neural network, the inputs from the input layer (represented by xi) are multiplied by weights (represented by wij, where i indexes the feature from the input layer and j indexes the neuron in the hidden layer). Each neuron is represented by the mathematical function

z_1 = \sum_{i} w_{i1} x_i + \varepsilon_1

Similarly, if there are “n” neurons in the hidden layer, then the neurons are represented as

z_j = \sum_{i} w_{ij} x_i + \varepsilon_j, \quad j = 1, 2, \ldots, n

On applying the activation function on the neurons, the result at each neuron is represented as

g(z_j) = g\left(\sum_{i} w_{ij} x_i + \varepsilon_j\right)

Now, each hidden neuron influences the output layer (represented by Zj), and each connection is assigned a weight (represented by Wjk, where j indexes the neuron in the hidden layer and k indexes the value in the output layer). Hence, the first output-layer value, represented by Z1, is a combination of the results of the N hidden neurons multiplied by their assigned weights, plus an error term E.

Z_1 = \sum_{j=1}^{N} W_{j1}\, g(z_j) + E_1
Z_2 = \sum_{j=1}^{N} W_{j2}\, g(z_j) + E_2

The output y will be represented as an activation function of Z

y = g(Z)

For “k” values of output obtained, the activation function is represented as

y_1 = g(Z_1)
\vdots
y_k = g(Z_k)

As part of training the neural network and optimizing the model over all input parameters, the predicted output of the model is compared with the actual output. Based on the difference, the model is optimized so that the variance between the predicted and actual output is minimized. The standard error function is the sum of squared differences between the predicted output of the model and the actual output. Mathematically, it is represented as

E = \sum_{k} \left( y_k^{\text{pred}} - y_k^{\text{actual}} \right)^{2}

The backpropagation algorithm effectively solves pattern identification and data classification problems through multiple iterations of training, updating the calculated weights at each node. The constraints for model training and optimization depend on the minimization of the error function E and the appropriate use of the different activation functions.
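A minimal backpropagation sketch for a single-hidden-layer network of the kind described above, assuming sigmoid activations in both layers and a squared-error function E; it performs one gradient-descent update via the chain rule. The network sizes, data, and learning rate are illustrative only.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n_in, n_hidden, n_out = 4, 5, 2
lr = 0.1                                   # learning rate (illustrative)

x = rng.normal(size=n_in)                  # one training example
t = np.array([1.0, 0.0])                   # target output

W1 = rng.normal(scale=0.5, size=(n_hidden, n_in))    # input -> hidden weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_out, n_hidden))   # hidden -> output weights
b2 = np.zeros(n_out)

# Forward pass
z1 = W1 @ x + b1
h = sigmoid(z1)                  # hidden activations
z2 = W2 @ h + b2
y = sigmoid(z2)                  # predicted outputs

E = 0.5 * np.sum((y - t) ** 2)   # squared-error function

# Backward pass (chain rule)
delta_out = (y - t) * y * (1.0 - y)                  # dE/dz2
grad_W2 = np.outer(delta_out, h)
grad_b2 = delta_out
delta_hidden = (W2.T @ delta_out) * h * (1.0 - h)    # dE/dz1
grad_W1 = np.outer(delta_hidden, x)
grad_b1 = delta_hidden

# Gradient-descent update
W2 -= lr * grad_W2; b2 -= lr * grad_b2
W1 -= lr * grad_W1; b1 -= lr * grad_b1
print("error before update:", E)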

Training and optimization are the two important aspects of building a neural network model for solving classification and prediction problems. In order to achieve the best-fit and most optimal model, multiple parameters must be tuned and optimized. Some of the most important parameters are the number of hidden layers, the activation function used in the hidden layer, the learning rate for gradient descent (for the backpropagation algorithm), the momentum for gradient descent, the number of epochs, and the output function. Applying the right combination of optimized parameters to obtain a best-fit model is a delicate process and requires multiple repetitions of model training, modifying each parameter to achieve the least error and maximum accuracy.

As part of building an ANN model, the developer (or modeler) has to assess the model's variable selection criteria and transformation process to ensure a strong relationship between the transformed predictor variables and dependent variable. As part of model's variable selection criteria, the developer needs to

  1. Review the selected set of input and predictor variables
  2. Review all the new features created and their predictive power, including the categorization of continuous variables among the input parameters
  3. Perform data assessment and profiling for identifying blanks, missing values, and outliers in the data
  4. Review the criteria used for determining whether the variable should be included for model building
  5. Critically examine all the methods used for the treatment of missing (or special) values to determine whether such treatments will have an adverse effect on the predictive performance of the ANN model.

In order to evaluate the predictive power of all the selected input variables, the following two statistical tests need to be performed:

  • Weight of evidence (WoE): Measures the strength of a set of categories, across different values of the predictor variable, in separating "good" and "bad" outcomes. Points of consideration: high negative or positive values are an indication of strong variable predictive power.
  • Information value (IV): Assesses the overall power of a variable in separating "good" and "bad" outcomes by summing, across all categories within the variable, the product of WoE and the difference between the "good" and "bad" distributions. Points of consideration: higher IV levels indicate a stronger relationship between the variable and the good/bad odds ratio; IV can be used to compare the predictive power of competing variables.
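A hedged sketch of how WoE and IV might be computed for one binned predictor; the function name, the use of pandas for grouping, and the small epsilon guarding against division by zero are assumptions for illustration, not a prescribed procedure.

import numpy as np
import pandas as pd

def woe_iv(category, target, eps=1e-6):
    """Weight of evidence per category and total information value.
    `target` is 1 for a "bad" outcome (e.g. default) and 0 for "good"."""
    df = pd.DataFrame({"cat": category, "bad": target})
    grp = df.groupby("cat")["bad"].agg(["sum", "count"])
    bad = grp["sum"] + eps
    good = grp["count"] - grp["sum"] + eps
    dist_bad = bad / bad.sum()
    dist_good = good / good.sum()
    woe = np.log(dist_good / dist_bad)
    iv = ((dist_good - dist_bad) * woe).sum()
    return woe, iv

# Illustrative data: a binned variable and a binary outcome.
cats = ["low", "low", "mid", "mid", "high", "high", "high", "low"]
y = [0, 0, 0, 1, 1, 1, 0, 0]
woe, iv = woe_iv(cats, y)
print(woe)
print("IV:", iv)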

In order to test and review the model building parameter settings and their impact on the model performance, the following parameters need to be considered:

  • Hidden layers: Number of hidden layers in the neural network
  • Activation function: The activation function can be sigmoid, linear, or tanh (hyperbolic tangent)
  • Learning rate: The learning rate for gradient descent
  • Momentum: The momentum for gradient descent, a value between 0 and 1 that increases the size of the steps taken toward the minimum and helps the optimization escape local minima
  • Learning rate scale: The learning rate will be multiplied by this scale after every iteration
  • Number of epochs: The number of passes over the training samples
  • Batch size: The size of the batches taken for training
  • Output function: The output function can be sigmoid, linear, or ReLU
  • Hidden dropout: The fraction of hidden-layer units dropped during model training
  • Visible dropout: The fraction of input-layer units dropped during model training

These parameter settings are essential to regulate the complexity of the ANN model, as they govern the computational cost, the number of variables involved in model training, and the weights assigned to each variable for obtaining the best-fit model.
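One simple way to explore these settings is an exhaustive grid search. The sketch below only enumerates candidate combinations and delegates training and scoring to a placeholder function, since the text does not prescribe a specific framework; all candidate values are illustrative.

from itertools import product

# Candidate settings (illustrative values only)
grid = {
    "hidden_layers": [(50,), (80, 50)],
    "activation": ["sigmoid", "tanh"],
    "learning_rate": [0.1, 0.3],
    "momentum": [0.5, 0.7],
    "epochs": [100, 170],
    "batch_size": [32, 64],
    "hidden_dropout": [0.0, 0.2],
}

def evaluate(params):
    """Placeholder scoring function; replace with real model training + validation."""
    return -abs(params["learning_rate"] - 0.3)   # dummy score for illustration

best_score, best_params = float("-inf"), None
for values in product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    score = evaluate(params)
    if score > best_score:
        best_score, best_params = score, params
print(best_params)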

4.5 Performance and Accuracy

The following tests can be used to formally assess the performance and accuracy of an ANN model.

  • Traditional performance metrics (from the confusion matrix): accuracy, precision, F-measure, sensitivity, and specificity. The confusion matrix compares the predictions of the model with the actual values. Points of consideration: the higher the key performance metrics, the better the model performance.
  • Receiver operating characteristic (ROC): The ROC curve displays the trade-off between sensitivity and specificity, and the area under the ROC curve is a measure of discriminatory power. Points of consideration: the closer the curve is to the left and then the top border, the more accurate the model; the closer the curve is to the diagonal, the less accurate the model.
  • Somers' D: Calculated from the difference between the number of concordant and discordant pairs. Points of consideration: a value close to zero indicates a random model, while a value close to 1 indicates higher discriminatory power.
  • Kolmogorov–Smirnov (KS): The KS test is used to test for differences between distribution functions. Points of consideration: higher values of the KS statistic correspond to a higher level of discriminatory power.
  • Error attribution analysis: Discrepancies between predicted and actual values show the magnitude of error. Points of consideration: the lower the difference between the predicted and actual values, the lower the error in the model.
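A sketch of how several of these measures could be computed from predicted probabilities using scikit-learn and SciPy; Somers' D is approximated here as 2·AUC − 1 for a binary outcome, and the 0.5 classification cut-off is an assumption.

import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score, f1_score
from scipy.stats import ks_2samp

# Illustrative actual labels and predicted default probabilities
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1, 0, 1])
y_prob = np.array([0.1, 0.4, 0.8, 0.7, 0.2, 0.9, 0.3, 0.6, 0.5, 0.65])
y_pred = (y_prob >= 0.5).astype(int)          # assumed cut-off of 0.5

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
precision = tp / (tp + fp)
f_measure = f1_score(y_true, y_pred)

auc = roc_auc_score(y_true, y_prob)
somers_d = 2 * auc - 1                        # Gini / Somers' D for a binary outcome
ks_stat = ks_2samp(y_prob[y_true == 1], y_prob[y_true == 0]).statistic

print(accuracy, sensitivity, specificity, precision, f_measure, auc, somers_d, ks_stat)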

4.6 Results and Observation

We developed an ANN model based on synthetic data that closely represents customer credit transaction data, in order to predict potential defaulters on credit card payments.

The current dataset is synthetic data prepared for the purpose of prototype development. Despite very high performance on the validation dataset, the model may be overfitting when identifying defaulters. Therefore, to account for human error in the existing system and for possible model overperformance, we created two new datasets by randomly swapping potential defaulters (PDs) with genuine customers (GCs) at rates of 35% and 65%. This process introduces randomness into the dataset and reduces the problem of overfitting to satisfactory levels. If the performance of the model deteriorates significantly, we can conclude that the model has captured random noise in the dataset, necessitating model redevelopment. This process therefore indirectly assesses the sensitivity of the ANN algorithm.

As part of data sampling and feature engineering, we aggregated the credit card transaction data at the customer level to determine customer-level behavior. We obtain the customer-level average transaction amount, maximum transaction amount, and number of transactions executed for each mode of transfer. We create a new feature based on the average and the standard deviation of the durations between any two transactions, irrespective of the mode of transfer. The final dataset has 38 746 records and 27 variables, which can be used for training the model. Although actual data may contain more variables than the synthetic dataset and show more variability, the synthetic dataset was created to closely represent the actual data, with careful segregation performed during its preparation.

For variable selection, stepwise logistic regression was used: all 27 variables were considered to build the logistic regression model, and bidirectional stepwise selection based on the Akaike information criterion (AIC) was then applied to determine the best combination of independent variables for estimating the dependent variable. The final model retained 13 variables.

We used the significant variables to initially train the ANN model. After multiple iterations, including variable addition and deletion, the final model variables were identified. After rigorous tuning based on random trials and careful adjustment, the final set of parameters for the best-fit ANN model was identified; a sketch of this configuration in a deep learning framework follows the list below. The identified model parameters include

  1. Number of hidden layers = 2
  2. Number of neurons in the first layer = 80
  3. Number of neurons in the second layer = 50
  4. Learning rate = 0.337
  5. Momentum = 0.7
  6. Number of epochs = 170
  7. Activation function used for hidden layers and output layer = sigmoid function
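A hedged sketch of this configuration in Keras (the text does not name a framework, so this mapping, the input dimension of 13, the plain SGD optimizer, and the commented-out batch size are assumptions rather than the authors' implementation).

from tensorflow import keras

n_features = 13   # the 13 variables retained after stepwise selection (assumed input size)

model = keras.Sequential([
    keras.Input(shape=(n_features,)),
    keras.layers.Dense(80, activation="sigmoid"),   # first hidden layer, 80 neurons
    keras.layers.Dense(50, activation="sigmoid"),   # second hidden layer, 50 neurons
    keras.layers.Dense(1, activation="sigmoid"),    # output layer: probability of default
])

optimizer = keras.optimizers.SGD(learning_rate=0.337, momentum=0.7)
model.compile(optimizer=optimizer, loss="binary_crossentropy",
              metrics=[keras.metrics.AUC()])

# Training call (X_train and y_train would come from the prepared dataset):
# model.fit(X_train, y_train, epochs=170, batch_size=64, validation_split=0.2)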

For comparison, we tested the performance of this ANN against a standard logistic regression model for predicting defaulters. We found that both the ANN and logistic regression yielded similar Area Under the Curve (AUC) values. Other metrics also showed similar results:

  • Logistic regression: sensitivity of 93.7%, specificity of 53.76%, and transactional volume reduction of 39.5%, with a small misclassification rate of 0.83%.
  • ANN: sensitivity of 94.8%, specificity of 50.03%, and transactional volume reduction of 37.6%, with a small misclassification rate of 0.65%.

However, on an actual dataset, the ANN and other forms of neural network algorithms are likely to capture nonlinearity better than logistic regression. The objective of the ANN modeling is to identify potential areas of improvement in identifying potential defaulters in the credit card domain using neural networks. Although we used a synthetic dataset, we tested various assumptions and patterns of the transaction data and developed a generalized ANN model with backpropagation. This model can be customized to the requirements of specific scenarios in the credit risk and fraud detection domains.

References

  1. LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature 521 (7553): 436.
  2. McCulloch, W.S. and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics 5 (4): 115–133.
  3. Cowan, J.D. (1990). Neural networks: the early days. In: Advances in Neural Information Processing Systems, 828–842.
  4. McClelland, J.L., Rumelhart, D.E., and PDP Research Group (1986). Parallel distributed processing. Explorations in the Microstructure of Cognition 2: 216–271.
  5. Qian, N. and Sejnowski, T.J. (1988). Predicting the secondary structure of globular proteins using neural network models. Journal of Molecular Biology 202 (4): 865–884.
  6. Kneller, D.G., Cohen, F.E., and Langridge, R. (1990). Improvements in protein secondary structure prediction by an enhanced neural network. Journal of Molecular Biology 214 (1): 171–182.
  7. Rost, B. and Sander, C. (1994). Combining evolutionary information and neural networks to predict protein secondary structure. Proteins: Structure, Function, and Bioinformatics 19 (1): 55–72.
  8. Courbariaux, M., Hubara, I., Soudry, D., El-Yaniv, R., and Bengio, Y. (2016). Binarized neural networks: training deep neural networks with weights and activations constrained to +1 or −1. arXiv preprint arXiv:1602.02830.
  9. Silver, D., Huang, A., Maddison, C.J. et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature 529 (7587): 484.
  10. Esteva, A., Kuprel, B., Novoa, R.A. et al. (2017). Dermatologist-level classification of skin cancer with deep neural networks. Nature 542 (7639): 115.
  11. Gershenfeld, H.K., Neumann, P.E., Li, X. et al. (1999). Mapping quantitative trait loci for seizure response to a GABAA receptor inverse agonist in mice. Journal of Neuroscience 19 (10): 3731–3738.
  12. Hahnloser, R.H.R., Sarpeshkar, R., Mahowald, M. et al. (2000). Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature 405 (6789): 947.
  13. Werbos, P. (1974). Beyond regression: new tools for prediction and analysis in the behavioral sciences. PhD dissertation. Harvard University.
  14. Wadell, I., Johansson, H., Sjölander, P. et al. (1991). Fusimotor reflexes influencing secondary muscle spindle afferents from flexor and extensor muscles in the hind limb of the cat. Journal de Physiologie 85 (4): 223–234.
  15. Chen, D.S. and Jain, R.C. (1994). A robust backpropagation learning algorithm for function approximation. IEEE Transactions on Neural Networks 5 (3): 467–479.
  16. Yu, X., Efe, M.O., and Kaynak, O. (2002). A general backpropagation algorithm for feedforward neural networks learning. IEEE Transactions on Neural Networks 13 (1): 251–254.
  17. Khashman, A. (2008). A modified backpropagation learning algorithm with added emotional coefficients. IEEE Transactions on Neural Networks 19 (11): 1896–1909.
  18. Sapna, S., Tamilarasi, A., and Kumar, M.P. (2012). Backpropagation learning algorithm based on Levenberg Marquardt Algorithm. Computer Science and Information Technology (CS and IT) 2: 393–398.