This chapter covers the following items:
–Algorithm for neural network
–Algorithm for classifying with multilayer perceptron
–Examples and applications
Artificial neural networks (ANN) are computational tools modeled on the interconnection of neurons in the nervous systems of the human brain and of other living organisms; they are also known as "artificial neural nets" or simply "neural nets." Biological neural nets (BNN) are the naturally occurring equivalent of the ANN. Both ANN and BNN are network systems composed of atomic components known as "neurons." ANN differ significantly from biological networks, yet many of the characteristics and concepts of biological systems are faithfully reproduced in the artificial ones. ANN are nonlinear processing systems suited to a wide spectrum of tasks, particularly tasks for which no algorithm exists. An ANN can be trained to solve a particular problem using sample data and a teaching method; in this way, identically constructed ANN can perform different tasks depending on the training they receive. ANN can generalize and acquire the ability to recognize similarity among different input patterns, particularly patterns that have been corrupted by noise.
Section 10.1 defines the ANN topology/structure. The backpropagation algorithm is described in Section 10.2, and the learning vector quantization (LVQ) algorithm in Section 10.3.
First, let us start with a definition of backpropagation: it is a neural network learning algorithm. The foundations of the neural network field were laid by neurobiologists and psychologists who sought to develop and test computational analogs of neurons. A general definition of a neural network is as follows: a neural network is a set of connected input/output units in which each connection has a weight associated with it [1–6]. Certain phases are involved; during the learning phase, for example, the network learns by adjusting the weights so that it becomes capable of predicting the correct class label of the input datasets. Following this short definition, we can briefly discuss the advantages and disadvantages of neural networks. One drawback is that neural networks involve long training times and are therefore more appropriate for applications where this is feasible. They require a number of parameters, such as the network structure, that are typically best determined empirically. Neural networks have also been criticized for their poor interpretability: it is difficult for humans to interpret the symbolic meaning behind the learned weights and hidden units of the network. For these reasons, neural networks have at times been considered less suitable for data mining. As for the advantages, neural networks have a high tolerance for noisy data as well as the ability to classify patterns on which they have not been trained. They can be used even when little knowledge is available about the relationships between attributes and classes. In addition, they are well suited to continuous-valued inputs and outputs [7–11], a quality missing in most decision tree algorithms. As for applicability to real life, which is also the emphasis of this book, neural networks have proven successful on a wide range of real-world and fieldwork data, including pathology, laboratory medicine and handwritten character recognition. Neural network algorithms are also inherently parallel, and parallelization techniques can be used to accelerate computation. Furthermore, techniques have recently been developed for extracting rules from trained neural networks. All of these developments indicate the practicality and usefulness of neural networks for data mining, particularly for numeric prediction and classification.
Many kinds of neural networks and neural network algorithms exist; among them, backpropagation is the most popular. Section 10.1.1 provides insight into multilayer feed-forward networks, the kind of neural network on which the backpropagation algorithm performs its learning.
The backpropagation algorithm carries out learning on a multilayer feed-forward neural network: it iteratively learns a set of weights for predicting the class label of datasets [5–11]. A multilayer feed-forward neural network is composed of an input layer, one or more hidden layers and an output layer. Figure 10.1 presents an illustration of a multilayer feed-forward network.
As can be seen from Figure 10.1, each layer consists of units. The inputs to the network correspond to the attributes measured for each training sample of the dataset. The inputs are fed simultaneously into the units constituting the input layer [11]. They pass through the input layer and are then weighted and fed simultaneously to a second layer of neuronlike units, known as a hidden layer. The outputs of the hidden layer units may in turn be input to another hidden layer, and so on. The number of hidden layers is arbitrary, although in practice usually only one is used. The weighted outputs of the last hidden layer are input to the units constituting the output layer, which emits the network's prediction for the relevant datasets.
Following the layers, here are some definitions for the units:
Input units are the units in the input layer; the units in the hidden layers and the output layer are referred to as output units. The multilayer neural network shown in Figure 10.1 has two layers of output units and is therefore called a two-layer neural network. It is fully connected, because each unit provides input to each unit in the next forward layer. Each output unit takes as input a weighted sum of the outputs from the units in the previous layer (see Figure 10.4) and applies a nonlinear function to this weighted input. Multilayer feed-forward neural networks are able to model the class prediction as a nonlinear combination of the inputs; in statistical terms, they perform a nonlinear regression [12–14].
Backpropagation learns by iteratively processing the training dataset, comparing the network's prediction for each sample with the actual known target value. For classification problems, the target value is the known class label of the training sample; for numeric prediction, it is a continuous value. For each training sample, the weights are modified so as to minimize the mean-squared error between the network's prediction and the actual target value. These modifications are made in the backward direction, that is, from the output layer through each hidden layer down to the first hidden layer; hence the name backpropagation. Although it is not guaranteed, the weights usually converge in the end, and the learning process stops. The flowchart of the algorithm is shown in Figure 10.2 and a summary is given in Figure 10.3, where the relevant steps are defined in terms of inputs, outputs and errors. Despite appearing somewhat difficult at first, each step is in fact simple, and in time they become familiar [9–15].
The general configuration of FFBP algorithm is given as follows:
In these steps, the weights are initialized to small random numbers, for example ranging from –1.0 to 1.0 or from –0.5 to 0.5. Each unit has a bias associated with it; the biases are likewise initialized to small random numbers.
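As a minimal sketch of this initialization step (the layer sizes, variable names and use of NumPy below are illustrative assumptions, not part of the algorithm's specification):

```python
import numpy as np

rng = np.random.default_rng(0)

n_input, n_hidden, n_output = 3, 2, 4   # illustrative layer sizes

# Weights and biases initialized to small random numbers, here in [-0.5, 0.5].
w_hidden = rng.uniform(-0.5, 0.5, size=(n_input, n_hidden))    # w_ij: input unit i -> hidden unit j
w_output = rng.uniform(-0.5, 0.5, size=(n_hidden, n_output))   # w_jk: hidden unit j -> output unit k
theta_hidden = rng.uniform(-0.5, 0.5, size=n_hidden)           # biases of the hidden units
theta_output = rng.uniform(-0.5, 0.5, size=n_output)           # biases of the output units
```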
Each training dataset X is processed by the following steps:
Initially, the training sample is fed to the input layer of the network; this step is called propagating the inputs forward. The inputs pass through the input units unchanged. Figure 10.4 shows a hidden or output layer unit: the inputs to unit j are the outputs from the previous layer, which are multiplied by their corresponding weights to form a weighted sum; this sum is added to the bias associated with unit j, and a nonlinear activation function is applied to the net input. For practical reasons, the inputs to unit j are labeled y1, y2, ..., yn; when unit j is in the first hidden layer, these inputs correspond to the input attributes x1, x2, ..., xn [16].
For an input unit j, its output Oj equals its input value Ij. The net input and output of each unit in the hidden and output layers are then calculated: the net input of a unit in the hidden or output layers is a linear combination of its inputs. Figure 10.1 provides an illustration to ease understanding, and Figure 10.4 presents a hidden or output layer unit. Each such unit has a number of inputs, namely the outputs of the units connected to it in the previous layer [17–19]. Each connection has a weight; each input connected to the unit is multiplied by its corresponding weight and the products are summed. Hence the net input Ij to a unit j in a hidden or output layer is

Ij = Σi wij Oi + θj     (10.1)
where wij is the weight of the connection from unit i in the previous layer to unit j, Oi is the output of unit i from the previous layer and θj denotes the bias of the unit. The bias acts as a threshold in that it serves to vary the activity of the unit. Each unit in the hidden and output layers takes its net input and then applies an activation function to it, as illustrated in Figure 10.4. The function symbolizes the activation of the neuron represented by the unit. Here the sigmoid, or logistic, function is used: given the net input Ij to unit j, the output Oj of unit j is computed as

Oj = 1 / (1 + e^(−Ij))     (10.2)
This function maps a large input domain onto the smaller range of 0 to 1 and is therefore also referred to as a squashing function. The logistic function is nonlinear and differentiable, which allows the backpropagation algorithm to model classification problems that are not linearly separable. The output values Oj are computed for each hidden layer and for the output layer, which gives the network's prediction [20–22]. In practice, it is a good idea to save the intermediate output values at each unit, since they are needed again at a later stage, when the error is backpropagated. This hint can considerably reduce the amount of computation required, saving both time and effort.
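The forward step of eqs. (10.1) and (10.2) can be sketched in a few lines of Python; the small network and its weight values below are made-up illustrations, not the chapter's worked example:

```python
import numpy as np

def sigmoid(I):
    """Logistic (squashing) function of eq. (10.2)."""
    return 1.0 / (1.0 + np.exp(-I))

# Illustrative network: 3 inputs, 2 hidden units, 4 output units.
x = np.array([1.0, 0.0, 1.0])                        # one training sample
w_hidden = np.array([[0.2, -0.3],
                     [0.4,  0.1],
                     [-0.5, 0.2]])                    # w_ij (input i -> hidden j)
theta_hidden = np.array([-0.4, 0.2])                  # hidden biases
w_output = np.array([[-0.3, 0.2, 0.1, 0.4],
                     [0.1, -0.2, 0.3, 0.2]])          # w_jk (hidden j -> output k)
theta_output = np.array([0.1, -0.1, 0.2, 0.3])        # output biases

I_hidden = x @ w_hidden + theta_hidden                # eq. (10.1): net input of the hidden units
O_hidden = sigmoid(I_hidden)                          # eq. (10.2): hidden outputs (saved for backpropagation)
I_output = O_hidden @ w_output + theta_output         # net input of the output units
O_output = sigmoid(I_output)                          # network prediction
print(O_hidden, O_output)
```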
In the step of backpropagating the error, as the name suggests, the error is propagated backward by updating the weights and biases to reflect the error of the network's prediction. For a unit j in the output layer, the error Errj is given by

Errj = Oj (1 − Oj)(Tj − Oj)     (10.3)
Here, Oj is the actual output of unit j, and Tj is the known target value for the given training sample; Oj(1 − Oj) is the derivative of the logistic function. To calculate the error of a hidden layer unit j, the weighted sum of the errors of the units connected to unit j in the next layer is considered. The error of a hidden layer unit j is

Errj = Oj (1 − Oj) Σk Errk wjk     (10.4)
where wjk is the weight of the connection from unit j to a unit k in the next higher layer, and Errk is the error of unit k.
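Assuming the quantities of a forward pass are available, eqs. (10.3) and (10.4) can be sketched as follows; all numbers and variable names here are illustrative placeholders:

```python
import numpy as np

# Values assumed to come from a forward pass (illustrative numbers).
O_output = np.array([0.33, 0.59, 0.43, 0.44])         # outputs O_k of the output units
O_hidden = np.array([0.23, 0.21])                     # outputs O_j of the hidden units
target = np.array([1.0, 0.0, 0.0, 0.0])               # T_k: one-hot class label of the sample
w_output = np.array([[-0.3, 0.2, 0.1, 0.4],
                     [0.1, -0.2, 0.3, 0.2]])          # w_jk (hidden j -> output k)

# eq. (10.3): Err_k = O_k (1 - O_k)(T_k - O_k) for output units
err_output = O_output * (1.0 - O_output) * (target - O_output)

# eq. (10.4): Err_j = O_j (1 - O_j) * sum_k Err_k w_jk for hidden units
err_hidden = O_hidden * (1.0 - O_hidden) * (w_output @ err_output)
print(err_output, err_hidden)
```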
The weights and biases are updated to reflect the propagated errors. The following equations are used to update the weights:

Δwij = (l) Errj Oi     (10.5)
wij = wij + Δwij     (10.6)
In these equations Δwij refers to the change in weight wij.
In eq. (10.5), l is the learning rate, a constant typically having a value between 0.0 and 1.0. Backpropagation learns using a gradient descent method to search for a set of weights that fits the training data, so that the mean-squared distance between the network's class prediction and the known target value of the samples is minimized [21]. The learning rate helps to avoid getting stuck at a local minimum in decision space and encourages finding the global minimum. If the learning rate is too small, learning will proceed at a very slow pace; if it is too large, oscillation between inadequate solutions may occur. A rule of thumb is to set the learning rate to 1/t, with t being the number of iterations through the training set so far.
The following equations are used for updating the biases:

Δθj = (l) Errj     (10.7)
θj = θj + Δθj     (10.8)
In these equations, Δθj is the change in bias θj. The biases and weights are updated after the presentation of each sample; this is known as case updating. Alternatively, in the strategy called epoch updating, the weight and bias increments are accumulated in variables and the weights and biases are updated only after all the samples in the training set have been presented; an epoch is one iteration through the training set. In theory, the mathematical derivation of backpropagation employs epoch updating. As mentioned at the beginning of this book, however, real-life practice may differ from theory: in practice, case updating is more common because it tends to yield more accurate results. Accuracy is vital for scientific studies; in medicine especially, accuracy affects many vital decisions concerning patients' lives [20–22].
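A hedged sketch of the update step of eqs. (10.5)–(10.8) in case-updating form follows; the variable values are illustrative placeholders assumed to come from the forward and error steps, and the 1/t schedule mentioned above is noted in a comment:

```python
import numpy as np

learning_rate = 0.7        # l in eqs. (10.5) and (10.7); could also be set to 1/t, t = iterations so far

# Illustrative quantities assumed to come from the forward and error steps.
x = np.array([1.0, 0.0, 1.0])                          # outputs of the input units for this sample
O_hidden = np.array([0.23, 0.21])
err_hidden = np.array([0.02, -0.01])
err_output = np.array([0.15, -0.14, -0.10, -0.11])
w_hidden = np.array([[0.2, -0.3], [0.4, 0.1], [-0.5, 0.2]])
w_output = np.array([[-0.3, 0.2, 0.1, 0.4], [0.1, -0.2, 0.3, 0.2]])
theta_hidden = np.array([-0.4, 0.2])
theta_output = np.array([0.1, -0.1, 0.2, 0.3])

# eqs. (10.5)-(10.6): w_ij = w_ij + l * Err_j * O_i, applied per sample (case updating)
w_output += learning_rate * np.outer(O_hidden, err_output)
w_hidden += learning_rate * np.outer(x, err_hidden)

# eqs. (10.7)-(10.8): theta_j = theta_j + l * Err_j
theta_output += learning_rate * err_output
theta_hidden += learning_rate * err_hidden
```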
Let us consider the terminating condition. Training stops when any of the following is observed (a simple check of these conditions is sketched below):
–all Δwij in the previous iteration were below a specified threshold, or
–the percentage of datasets incorrectly classified in the previous iteration is below the threshold, or
–a prespecified number of iterations has been completed.
In practice, hundreds of thousands of iterations may be required before the weights converge.
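The three stopping conditions can be checked with a small helper like the one below; the threshold values and the function name are assumptions for illustration only:

```python
import numpy as np

def should_stop(delta_weights, misclassified_fraction, epoch,
                weight_threshold=1e-4, error_threshold=0.05, max_epochs=1000):
    """Return True if any of the three termination conditions holds."""
    all_deltas_small = all(np.all(np.abs(d) < weight_threshold) for d in delta_weights)
    few_errors = misclassified_fraction < error_threshold
    out_of_epochs = epoch >= max_epochs
    return all_deltas_small or few_errors or out_of_epochs

# Example: tiny weight changes in the previous iteration -> training stops.
print(should_stop([np.array([1e-5, -2e-5])], misclassified_fraction=0.20, epoch=37))
```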
The efficiency of backpropagation is often questioned; the computational efficiency depends on the time allocated for training the network. For |D| samples and w weights, each iteration requires O(|D| × w) time. In the worst case, the number of iterations can be exponential in n, the number of inputs. In practice, the time required for the network to converge is highly variable. Certain techniques can accelerate training; simulated annealing is one such technique, which also guarantees convergence to a global optimum.
The FFBP algorithm has been applied to the economy (U.N.I.S.) dataset, the multiple sclerosis (MS) dataset and the WAIS-R dataset in Sections 10.2.1.1, 10.2.1.2 and 10.2.1.3, respectively.
As the second set of data, in the following sections we use data related to the U.N.I.S. countries. The attributes of these countries' economies concern years, unemployment, GDP per capita (current international $), youth male (% of male labor force ages 15–24) (national estimate), …, GDP growth (annual %); the data are composed of a total of 18 attributes. Data belonging to the USA, New Zealand, Italian and Swedish economies from 1960 to 2015 are defined based on the attributes given in Table 2.8. The economy dataset (http://data.worldbank.org) [15] is used in the following sections. For the classification of the D matrix through the FFBP algorithm, the training procedure is employed first: 66.66% of the D matrix is split off as the training dataset (151 × 18) and 33.33% as the test dataset (77 × 18).
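A minimal sketch of this 66.66%/33.33% split is given below; the random placeholder matrix stands in for the real economy D matrix, and the use of scikit-learn's train_test_split is an assumption (any equivalent splitting routine would serve):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder for the economy D matrix (228 samples x 18 attributes); the real
# data would be loaded from the World Bank source cited in the text.
D = np.random.rand(228, 18)
labels = np.random.randint(0, 4, size=228)     # 4 classes: USA, New Zealand, Italy, Sweden

# Roughly the 151 x 18 training / 77 x 18 test split described in the text.
D_train, D_test, y_train, y_test = train_test_split(D, labels, test_size=1/3, random_state=0)
print(D_train.shape, D_test.shape)
```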
After the training dataset has been trained with the FFBP algorithm, we can classify the test dataset. Although the procedural steps of the algorithm may seem complicated at first, all you have to do is concentrate on the steps and grasp them. Let us have a close look at the steps provided in Figure 10.7:
In order to perform classification with the FFBP algorithm, 33.3% of the economy dataset has been selected randomly as the test data. Figure 10.5 presents the multilayer feed-forward neural network of the economy dataset; as can be seen there, the learning coefficient has been identified as 0.7. Table 10.1 lists the initial values and bias values of the network. The calculation has been done using the data attributes of the first country in the economy dataset; there are 18 attributes for each sample in the economy dataset. If the economy with data X = (1, 1, …, 1) has the label 1, the relevant classification is USA. The dataset is learned by the network structure, and the output value of each node is calculated and listed in Table 10.2. Calculations are done for each of the nodes; in this way the learning procedure is realized through the computations of the network, and the errors are also calculated during the learning process. If the errors are not the optimum expected, the network continues learning from the errors it makes, and the learning process is carried on by backpropagation. The error values are listed in Table 10.3, and the updated weights and bias values can be seen in Table 10.4.
As can be seen in Figure 10.5, for the multilayer perceptron network, the sample economy data X = (1, 1, …, 1) have been chosen randomly. Now, let us find the class (USA, New Zealand, Italy or Sweden) of the sample X. The training procedure of X through FFBP continues until the minimum error rate is achieved, based on the iteration number to be determined. In Figure 10.8, the steps of the FFBP algorithm have been provided as applied in each iteration.
Steps (1–3) Initialize all weights and biases in the network (see Figure 10.6).
Steps (4–28) Before starting the training procedure of the X sample listed in Table 10.1 by the FFBP network, the initial input values (x), weight values (w) and bias values (θ) are introduced to the multilayer perceptron (MLP) network.
In Table 10.2, the net input (Ij) values are calculated from the hidden layer weight (w) values based on eq. (10.1). The output (Oj) values are calculated according to eq. (10.2). The calculations regarding learning in the FFBP algorithm are carried out based on the node numbers specified.
Tables 10.1 and 10.2 use the node numbers j: 4, 5 for the hidden layer and j: 6, 7, 8, 9 for the output layer; the calculations regarding learning in FFBP are carried out according to these node numbers.
Table 10.1: Initial input, weight and bias values for sample economy (U.N.I.S.) dataset.
Table 10.2: Net input and output calculations for sample economy dataset.
j | Net input, Ij | Output, Oj |
4 | 0.1 + 0.5 + 0.1 + 0.5 = 1.2 | 1/(1 + e^1.2) = 0.231 |
5 | 0.6 + 0.4 - 0.1 + 0.4 = 1.3 | 1/(1 + e^1.3) = 0.214 |
6 | (0.8)(0.231) + (0.4)(0.214) + 0.5 = 0.707 | 1/(1 + e^0.707) = 0.330 |
7 | (0.5)(0.231) + (0.6)(0.214) - 0.6 = -0.3561 | 1/(1 + e^-0.3561) = 0.588 |
8 | (0.1)(0.231) + (0.2)(0.214) + 0.2 = 0.2659 | 1/(1 + e^0.2659) = 0.434 |
9 | (-0.4)(0.231) + (0.1)(0.214) + 0.3 = 0.229 | 1/(1 + e^0.229) = 0.443 |
Table 10.3 lists the calculation of the error at each node; the error values Errj of the nodes in the output layer are calculated using eq. (10.3), and those of the hidden layer using eq. (10.4).
The calculation procedure for the sample X in the first iteration is completed by backpropagating the calculated errors from the output layer to the input layer; the resulting bias values (θ) and weight values (w) are given in Table 10.4.
Table 10.3: Calculation of the error at each node.
j | Errj |
9 | (0.443)(1 – 0.443)(1– 0.443) = 0.493 |
8 | (0.434)(1 – 0.434)(1– 0.434) = 0.491 |
7 | (0.588)(1 – 0.588)(1 – 0.588) = 0.484 |
6 | (0.330)(1 – 0.330)(1 – 0.330) = 0.442 |
5 | (0.214)(1 – 0.214)[(0.442)(0.4) + (0.484)(0.6) + (0.491)(0.2) + (0.493)(0.1)] = 0.103 |
4 | (0.231)(1 – 0.231)[(0.442)(0.8) + (0.588)(0.5) + (0.434)(0.1) + (0.493)(–0.4)] = 0.493 |
Table 10.4: Calculations for weight and bias updating.
Weight (bias) | New value |
w46 | 0.8 + (0.7)(0.442)(0.231) = 0.871 |
w47 | 0.5+(0.7)(0.484)(0.231) = 0.578 |
w56 | 0.4 + (0.7)(0.442)(0.214) = 0.466 |
w57 | 0.6 + (0.7)(0.484)(0.214) = 0.672 |
w48 | 0.1 + (0.7)(0.491)(0.231) = 0.179 |
w58 | 0.2 + (0.7)(0.491)(0.214) = 0.273 |
w49 | -0.4 + (0.7)(0.493)(0.231) = -0.320
w59 | 0.1 + (0.7)(0.493)(0.214) = 0.173 |
w14 | 0.1 + (0.7)(0.493)(1) = 0.445 |
w15 | 0.6 + (0.7)(0.103)(1) = 0.672 |
w24 | 0.5 + (0.7)(0.493)(1) = 0.845 |
w25 | 0.4 + (0.7)(0.103)(1) = 0.472 |
w34 | 0.1 + (0.7)(0.493)(1) = 0.445 |
w35 | – 0.1+ (0.7)(0.103)(1) = – 0.0279 |
θ9 | 0.3 + (0.7)(0.493) = 0.645 |
θ8 | 0.2 + (0.7)(0.491) = 0.543 |
θ7 | – 0.6 + (0.7)(0.484) = –0.261 |
θ6 | 0.5 + (0.7)(0.442) = 0.809 |
θ5 | 0.4 + (0.7)(0.103) = 0.472 |
θ4 | 0.5 + (0.7)(0.493) = 0.845 |
The weight (w) and bias (θ) values obtained from these calculations are used in Iteration 2.
Up to this point, the steps of the FFBP network have been applied to the attributes of one economy sample (see Table 2.8). The steps for calculating Iteration 1 are provided in Figure 10.7. As Table 2.8 shows, the same steps are applied to the other samples in the economy (U.N.I.S.) dataset, and the classification of the countries' economies can be computed with minimum error.
In the economy dataset application, the iteration number is set to 1,000. The iterations in the training procedure end when the network has learned the dataset with the maximum accuracy rate, [1 − (min error)] × 100.
A 33.3% portion has been randomly selected from the economy dataset and allocated as the test dataset. A question to be addressed at this point is: how can we classify a sample from the economy dataset whose class is not known by using the trained network?
The answer is as follows: X, a sample whose class label is treated as unknown, is applied to the network, and the net input and output values of each node are calculated. (No computation and/or backpropagation of the error is needed in this case. If there is one output node per class, then the output node with the highest value determines the predicted class label for X.) The classification accuracy rate of the neural network on the test dataset has been found to be 57.234% for learning the class labels of USA, New Zealand, Italy and Sweden.
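The decision rule described here (forward pass only, predicted class = output node with the highest value) can be sketched as follows; the function names, the random placeholder parameters and the class ordering are illustrative assumptions, not the trained network of this section:

```python
import numpy as np

def sigmoid(I):
    return 1.0 / (1.0 + np.exp(-I))

def predict_class(x, w_hidden, theta_hidden, w_output, theta_output):
    """Forward pass only (no backpropagation); the highest output node wins."""
    O_hidden = sigmoid(x @ w_hidden + theta_hidden)
    O_output = sigmoid(O_hidden @ w_output + theta_output)
    return int(np.argmax(O_output))            # e.g., 0: USA, 1: New Zealand, 2: Italy, 3: Sweden

def accuracy(samples, labels, *net):
    """Fraction of test samples whose predicted class matches the known label."""
    predictions = [predict_class(x, *net) for x in samples]
    return np.mean(np.array(predictions) == np.array(labels))

# Tiny usage example with random placeholder parameters (not trained values).
rng = np.random.default_rng(0)
net = (rng.uniform(-0.5, 0.5, (18, 2)), rng.uniform(-0.5, 0.5, 2),
       rng.uniform(-0.5, 0.5, (2, 4)), rng.uniform(-0.5, 0.5, 4))
print(predict_class(rng.random(18), *net))
```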
As presented in Table 2.12, the MS dataset contains data from the following groups: 76 samples belonging to RRMS, 76 samples to SPMS, 76 samples to PPMS and 76 samples belonging to healthy subjects of the control group. The attributes are data on the brain stem (MRI 1), corpus callosum–periventricular (MRI 2) and upper cervical (MRI 3) lesion diameter sizes (in millimeters (mm)) in the MR images and the EDSS score; the data are composed of a total of 112 attributes. Using these attributes of 304 individuals, it is known whether the data belong to an MS subgroup or to the healthy group. How can we classify which individual belongs to which MS subgroup, including healthy individuals and those diagnosed with MS, based on the lesion diameters (MRI 1, MRI 2 and MRI 3), the number of lesions for MRI 1, MRI 2 and MRI 3 as obtained from the MR images, and the EDSS scores? The D matrix has a dimension of 304 × 112, that is, it includes the MS dataset of 304 individuals along with their 112 attributes (see Table 2.12 for the MS dataset). For the classification of the D matrix through FFBP, the training procedure is employed first: 66.66% of the D matrix is split off as the training dataset (203 × 112) and 33.33% as the test dataset (101 × 112).
After the training dataset has been trained with the FFBP algorithm, we can classify the test dataset. Although the procedural steps of the algorithm may seem complicated at first glance, all you have to do is concentrate on the steps and grasp them. For this, let us have a close look at the steps provided in Figure 10.5.
The test dataset has been chosen randomly as 33.33% of the MS dataset in order to perform the classification through the FFBP algorithm. Figure 10.6 presents the multilayer feed-forward neural network of the MS dataset; as can be seen there, the learning coefficient has been identified as 0.8.
The initial values and bias values of the network are listed in Table 10.5. The calculation was done using the data attributes of the first individual in the MS dataset; there are 112 attributes for each subject in the MS dataset. If the person with data X = (1, 0, …, 1) has the label 1, the relevant subgroup is RRMS. The dataset is learned by the network structure, and the output value of each node is calculated and listed in Table 10.6. Calculations are done for each of the nodes; in this way the learning procedure is realized through the computations of the network, and the errors are also calculated during the learning process. If the errors are not the optimum expected, the network continues learning from the errors it makes, and the learning process is carried on by backpropagation. The error values are listed in Table 10.7, and the updated weights and bias values are listed in Table 10.8.
As can be seen in Figure 10.8, for the multilayer perceptron network, the sample MS data X = (1, 0, …, 1) have been chosen randomly. Now, let us find the MS subgroup class of the sample X. The training procedure of X through FFBP continues until the minimum error rate is achieved, based on the iteration number to be determined. In Figure 10.5, the steps of the FFBP algorithm are provided as applied in each iteration.
Steps (1–3) Initialize all weights and biases in the network.
Steps (4–28) Before starting the training procedure of the X sample listed in Table 10.5 by the FFBP network, the initial input values (x), weight values (w) and bias values (θ) are introduced to the MLP network.
In Table 10.6, the net input (Ij) values are calculated from the hidden layer weight (w) values in accordance with eq. (10.1). The output (Oj) values are calculated based on eq. (10.2). The calculations regarding learning in the FFBP algorithm are carried out based on the node numbers specified below.
Tables 10.5 and 10.6 use the node numbers j: 4, 5 for the hidden layer and j: 6, 7, 8, 9 for the output layer; the calculations concerning learning in FFBP are performed according to these node numbers.
Table 10.7 lists the calculation of the error at each node; the error values Errj of the nodes in the output layer are calculated using eq. (10.3), and those of the hidden layer using eq. (10.4).
The calculation procedure for the sample X in the first iteration is completed by backpropagating the calculated errors from the output layer to the input layer; the resulting bias values (θ) and weight values (w) are given in Table 10.8. These weight (w) and bias (θ) values are then used in Iteration 2.
Table 10.5: Initial input, weight and bias values for sample MS dataset.
Table 10.6: Net input and output calculations for sample MS dataset.
j | Net input, Ij | Output, Oj |
4 | 0.1 + 0 - 0.3 - 0.1 = -0.3 | 1/(1 + e^0.3) = 0.425 |
5 | 0.2 + 0 - 0.2 - 0.2 = -0.2 | 1/(1 + e^0.2) = 0.450 |
6 | (0.1)(0.425) + (0.3)(0.450) + 0.4 = 0.577 | 1/(1 + e^0.577) = 0.359 |
7 | (0.1)(0.425) + (0.1)(0.450) + 0.1 = 0.187 | 1/(1 + e^0.187) = 0.453 |
8 | (0.2)(0.425) + (0.1)(0.450) + 0.2 = 0.33 | 1/(1 + e^0.33) = 0.418 |
9 | (0.4)(0.425) + (0.5)(0.450) + 0.3 = 0.695 | 1/(1 + e^0.695) = 0.333 |
Table 10.7: Calculation of the error at each node.
j | Errj |
9 | (0.333)(1–0.333)(1–0.333) = 0.444 |
8 | (0.418)(1–0.418)(1–0.418) = 0.486 |
7 | (0.453)(1–0.453)(1–0.453) = 0.495 |
6 | (0.359)(1–0.359)(1–0.359) = 0.460 |
5 | (0.450)(1–0.450)[(0.460)(0.3) + (0.495)(0.1) + (0.486)(0.1) + (0.444)(0.5)] = 0.113 |
4 | (0.425)(1–0.425) [(0.460)(0.1) + (0.495)(0.1) + (0.486)(0.1) + (0.444)(0.4)] = 0.078 |
Up to this point, the steps of the FFBP network have been applied to the attributes of one MS patient (see Table 2.12). The calculation for Iteration 1 is provided step by step in Figure 10.5. As Table 2.12 shows, the same steps are applied to the other individuals in the MS dataset, and the classification of MS subgroups and healthy individuals can be computed with the minimum error.
In the MS dataset application, the iteration number is set to 1,000. The iterations in the training procedure end when the network has learned the dataset with the maximum accuracy rate, [1 − (min error)] × 100.
Table 10.8: Calculations for weight and bias updating.
Weight (bias) | New value |
w46 | 0.1 + (0.8)(0.460)(0.425) = 0.256 |
w47 | 0.1 + (0.8)(0.495)(0.425) = 0.268 |
w56 | 0.3 + (0.8)(0.460)(0.450) = 0.465 |
w57 | 0.1 + (0.8)(0.495)(0.450) = 0.278 |
w48 | 0.2 + (0.8)(0.486)(0.425) = 0.365 |
w58 | 0.1 + (0.8)(0.486)(0.450) = 0.274 |
w49 | 0.1 + (0.8)(0.444)(0.425) = 0.250 |
w59 | 0.5 + (0.8)(0.444)(0.450) = 0.659 |
w14 | 0.2 + (0.8)(0.113)(1) = 0.290 |
w15 | –0.3 + (0.8)(0.113)(1) = –0.209 |
w24 | 0.1 + (0.8)(0.078)(0) = 0.1 |
w25 | 0.2 + (0.8)(0.113)(0) = 0.2 |
w34 | –0.3 + (0.8)(0.078)(1) = –0.237 |
w35 | –0.2 + (0.8)(0.113)(1) = –0.109 |
θ9 | 0.3 + (0.8)(0.444) = 0.655
θ8 | 0.2 + (0.8)(0.486) = 0.588 |
θ7 | 0.1 + (0.8)(0.495) = 0.496 |
θ6 | 0.4 + (0.8)(0.460) = 0.768 |
θ5 | –0.2 + (0.8)(0.113) = –0.109 |
θ4 | –0.1 + (0.8)(0.078) = –0.037 |
A 33.3% portion has been randomly selected from the MS dataset and allocated as the test dataset. A question to be addressed at this point is: what about the case in which the data are known but it is not known which MS subgroup they belong to, or whether the subject is healthy? How can the class of such a sample be identified?
The answer is as follows: X, a sample whose class label is treated as unknown, is applied to the neural network, and the net input and output values of each node are calculated. (No computation and/or backpropagation of the error is needed in this case. If there is one output node per class, then the output node with the highest value determines the predicted class label for X.) The classification accuracy rate of the neural network has been found to be 84.1% on sample test data whose class is not known.
As presented in Table 2.19, the WAIS-R dataset contains data in which 200 samples belong to the patient group and 200 samples to the healthy control group. The attributes are data regarding school education, gender, …, DM; the data are composed of a total of 21 attributes. Using these attributes of 400 individuals, it is known whether the data belong to the patient or the healthy group. How can we classify which individual is a patient and which is healthy, based on the WAIS-R test attributes (school education, gender, the DM, vocabulary, QIV, VIV, …, DM)? The D matrix has a dimension of 400 × 21, that is, it includes the WAIS-R dataset of 400 individuals along with their 21 attributes (see Table 2.19 for the WAIS-R dataset). For the classification of the D matrix through the FFBP algorithm, the training procedure is employed first: 66.66% of the D matrix is split off as the training dataset (267 × 21) and 33.33% as the test dataset (133 × 21).
After the training dataset has been trained with the FFBP algorithm, we can classify the test dataset. Although the procedural steps of the algorithm may seem complicated at first glance, all you have to do is concentrate on the steps and grasp them. Let us have a look at the steps as presented in Figure 10.9.
In order to classify with the FFBP algorithm, 33.3% of the WAIS-R dataset has been selected randomly as the test data. Figure 10.9 presents the multilayer feed-forward neural network of the WAIS-R dataset; as can be seen there, the learning coefficient has been identified as 0.5. The initial values and bias values of the network are listed in Table 10.9. The calculation was done using the data attributes of the first individual in the WAIS-R dataset; there are 21 attributes for each subject in the WAIS-R dataset. If the person with data X = (1, 0, …, 0) has the label 1, the relevant classification is patient. The dataset is learned by the network structure, and the output value of each node is calculated and listed in Table 10.10. Calculations are done for each of the nodes; in this way the learning procedure is realized through the computations of the network, and the errors are also calculated during the learning process. If the errors are not the optimum expected, the network continues learning from the errors it makes, and the learning process is carried on by backpropagation. The error values are listed in Table 10.11, and the weights and bias values are then updated as in the previous applications.
As shown in Figure 10.9, for the multilayer perceptron network, the sample WAIS-R data X = (1, 0, …, 0) have been chosen randomly. Now, let us find the class of the sample X: patient or healthy. The training procedure of X through FFBP continues until the minimum error rate is achieved, based on the iteration number to be determined. In Figure 10.10, the steps of the FFBP algorithm have been provided as applied in each iteration.
Steps (1–3) Initialize all weights and biases in the network.
Steps (4–28) Before starting the training procedure of the X sample listed in Table 10.9 by the FFBP network, the initial input values (x), weight values (w) and bias values (θ) are introduced to the MLP network.
Table 10.9: Initial input, weight and bias values for sample WAIS-R dataset.
Table 10.10: Net input and output calculations.
j | Net input, Ij | Output, Oj |
4 | 0.2 + 0 + 0 - 0.1 = 0.1 | 1/(1 + e^0.1) = 0.475 |
5 | -0.3 + 0 + 0 + 0.2 = -0.1 | 1/(1 + e^-0.1) = 0.525 |
6 | (-0.3)(0.475) - (0.2)(0.525) + 0.1 = -0.1475 | 1/(1 + e^-0.1475) = 0.536 |
7 | (0.3)(0.475) + (0.1)(0.525) - 0.4 = -0.205 | 1/(1 + e^-0.205) = 0.551 |
Table 10.11: Calculation of the error at each node.
j | Errj |
7 | (0.551)(1–0.551)(1–0.551) = 0.3574 |
6 | (0.536)(1–0.536)(1–0.536) = 0.1153 |
5 | (0.525)(1–0.525)[(0.525)(–0.2) + (0.3574)(0.1)] = –0.0009 |
4 | (0.475)(1–0.475)[(0.3574)(0.3) + (0.1153)(–0.3)] = 0.0181 |
In Table 10.10, the net input (Ij) values are calculated from the hidden layer weight (w) values based on eq. (10.1). The output (Oj) values are calculated according to eq. (10.2). The calculations regarding learning in the FFBP algorithm are carried out based on the node numbers specified below.
Tables 10.10 and 10.11 use the node numbers j: 4, 5 for the hidden layer and j: 6, 7 for the output layer; the calculations regarding learning in FFBP are carried out according to these node numbers.
Table 10.11 lists the calculation of the error at each node; the error values Errj of the nodes in the output layer are calculated using eq. (10.3), and those of the hidden layer using eq. (10.4).