© Sayan Mukhopadhyay, Pratip Samanta 2023
S. Mukhopadhyay, P. Samanta, Advanced Data Analytics Using Python, https://doi.org/10.1007/978-1-4842-8005-8_5

5. Deep Learning and Neural Networks

Sayan Mukhopadhyay1   and Pratip Samanta1
(1)
Kolkata, West Bengal, India
 

Neural networks, specifically known as artificial neural networks (ANNs), were developed by the inventor of one of the first neurocomputers, Dr. Robert Hecht-Nielsen. He defines a neural network as follows: “…a computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs.”

Customarily, neural networks are arranged in multiple layers. The layers consist of several interconnected nodes, each containing an activation function. The input layer, which communicates with the hidden layers, presents the patterns to the network. The hidden layers are linked to an output layer.

Neural networks have many uses. As an example, in passenger load prediction in the airline domain, the passenger load in month t depends heavily on the data from t-12 months rather than on t-1 or t-2; hence, a neural network normally produces a better result than a time-series model. Image classification is another example. In a chatbot system, the memory network, which is actually a neural network over a bag of words of the previous conversation, is a popular approach. There are many ways to realize a neural network.

Figure 5-1 illustrates how multiple input nodes are connected to a hierarchy of nodes in the hidden layers, which in turn feed the predicted output node.

Figure 5-1

Neural network architecture

Backpropagation

Backpropagation, usually used in conjunction with an optimization method such as gradient descent, is a common method of training artificial neural networks. The method computes the error at the output layer, backpropagates it toward the input layer, and updates the weights as a function of that error, the inputs, and the learning rate. The goal is to minimize the error as far as possible.

Backpropagation Approach

Apply the input vector $X_p = (x_{p1}, x_{p2}, \ldots, x_{pN})^t$ to the input units.

Calculate the net input values to the hidden layer units.
$$ net_{pj}^h = \sum_{i=1}^{N} \omega_{ji}^h x_{pi} + \theta_j^h $$
Calculate the outputs from the hidden layer.
$$ i_{pj} = f_j^h\left( net_{pj}^h \right) $$
Calculate the net input values to the output layer units.
$$ net_{pk}^o = \sum_{j=1}^{L} \omega_{kj}^o i_{pj} + \theta_k^o $$
Calculate the outputs.
$$ o_{pk} = f_k^o\left( net_{pk}^o \right) $$
Calculate the error terms for the output units.
$$ \delta_{pk}^o = \left( y_{pk} - o_{pk} \right) {f_k^o}'\left( net_{pk}^o \right) $$
Calculate the error terms for the hidden units.
$$ \delta_{pj}^h = {f_j^h}'\left( net_{pj}^h \right) \sum_k \delta_{pk}^o \omega_{kj}^o $$
Update weights on the output layer.
$$ \omega_{kj}^o(t+1) = \omega_{kj}^o(t) + \eta\, \delta_{pk}^o i_{pj} $$
Update weights on the hidden layer.
$$ \omega_{ji}^h(t+1) = \omega_{ji}^h(t) + \eta\, \delta_{pj}^h x_i $$
Let’s see some code:
from numpy import exp, dot, array, random
class SimpleNN():
    def __init__(self):
        random.seed(2)
        self.weights = random.random((3, 1))
    # activation function
    def __sigmoid(self, x):
        return 1 / (1 + exp(-x))
    # derivative of the Sigmoid function.
    def __sigmoid_derivative(self, x):
        return x * (1 - x)
    # train the neural network and adjust weights
    def train(self, training_set_inputs, training_set_outputs, number_of_training_iterations):
        for iteration in range(number_of_training_iterations):
            output = self.predict(training_set_inputs)
            error = training_set_outputs - output
            adjustment = dot(training_set_inputs.T, error * self.__sigmoid_derivative(output))
            self.weights += adjustment
    # prediction
    def predict(self, inputs):
        return self.__sigmoid(dot(inputs, self.weights))
if __name__ == "__main__":
    neural_network = SimpleNN()
    print("Random starting weights: ")
    print(neural_network.weights)
    # The training set. We have 4 examples, each consisting of 3 input values
    # and 1 output value.
    training_set_inputs = array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
    training_set_outputs = array([[0, 1, 1, 0]]).T
    neural_network.train(training_set_inputs, training_set_outputs, 10000)
    print("New weights after training: ")
    print(neural_network.weights)
    # Test the neural network with a new situation.
    print(neural_network.predict(array([1, 0, 0])))
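The SimpleNN class above has no hidden layer. As a complement, here is a minimal numpy sketch, not taken from the book's repository, of one backpropagation step for a network with a single hidden layer, following the equations in this section; the layer sizes, random starting weights, and learning rate eta are hypothetical choices:
import numpy as np
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))
rng = np.random.default_rng(0)
x = rng.random(3)                     # input vector x_p
y = np.array([1.0])                   # target output y_p
W_h = rng.random((4, 3))              # hidden-layer weights (4 hidden units)
W_o = rng.random((1, 4))              # output-layer weights
eta = 0.5                             # learning rate
# forward pass
net_h = W_h @ x                       # net input to the hidden units
i_h = sigmoid(net_h)                  # hidden outputs i_pj
net_o = W_o @ i_h                     # net input to the output units
o = sigmoid(net_o)                    # outputs o_pk
# backward pass
delta_o = (y - o) * o * (1 - o)                  # output error terms
delta_h = i_h * (1 - i_h) * (W_o.T @ delta_o)    # hidden error terms
W_o += eta * np.outer(delta_o, i_h)              # update output-layer weights
W_h += eta * np.outer(delta_h, x)                # update hidden-layer weights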

Other Algorithms

Many techniques are available to train neural networks besides backpropagation. One option is to use common optimization algorithms such as gradient descent, the Adam optimizer, and so on. The simple perceptron method is also frequently applied. Hebb's postulate is another popular method. In Hebbian learning, instead of the error, the product of the input and output acts as the feedback that corrects the weight.
$$ w_{ij}(t+1) = w_{ij}(t) + \eta\, y_j(t)\, x_i(t) $$
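As an illustration, here is a minimal sketch of a Hebbian weight update; the helper function and the learning rate eta are hypothetical and not part of the book's code:
import numpy as np
def hebb_update(w, x, y, eta=0.01):
    # w has shape (outputs, inputs); the update is w_ij += eta * y_j * x_i
    return w + eta * np.outer(y, x)
# example: a 2-input, 1-output weight matrix updated by one observation
w = np.zeros((1, 2))
w = hebb_update(w, x=np.array([1.0, 0.5]), y=np.array([1.0]))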

TensorFlow

TensorFlow is a popular deep learning library in Python. Its Python API is a wrapper over the underlying C++ core, and it supports parallelism on CUDA-based GPU platforms. The following code is an example of MNIST digit classification with TensorFlow:
import tensorflow as tf
# getting mnist data set
mnist = tf.keras.datasets.mnist
# loads data and splits into train and test set
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
# model
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation="softmax")
])
# loss function (the model already applies softmax, so from_logits must be False)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
# model compile
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])
# model fit
model.fit(x_train, y_train, epochs=5)
# model evaluation
model.evaluate(x_test,  y_test, verbose=2)

People nowadays rarely write raw TensorFlow code; they use the Keras wrapper instead. The following Keras code uses data from several family surveys to predict danger signs at delivery.

You can find the data in the deep_learning_keras_1st_example folder in the Git repository of the book. Please refer to https://github.com/Apress/advanced-data-analytics-python-2e/tree/main/deep_learning_keras_1st_example.
# Importing modules
import pandas as pd
import numpy as np
import csv
from keras.layers.core import Dense, Activation, Dropout
from keras.layers.recurrent import LSTM
from keras.models import Sequential
import sys
from keras.preprocessing.sequence import TimeseriesGenerator
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay
import keras
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from plot_keras_history import plot_history
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from keras.utils import np_utils
PATH_TO_FOLDER = ''
df_delivery = pd.read_csv(PATH_TO_FOLDER+"delivery (1).csv")
print(df_delivery.head())
df_family_survey = pd.read_csv(PATH_TO_FOLDER+"family_survey (1).csv")
print(df_family_survey.head())
df_merged = pd.merge(df_delivery, df_family_survey, how="left", left_on=  ['hh_id'], right_on= ['hh_id'])
print(df_merged.columns)
print(df_merged.head())
print(df_merged.danger_signs_at_delivery.value_counts())
# Dropping unnecessary columns and typecasting
columns = ['delivery_id', 'patient_id', 'hh_id', 'delivery_date_time_submitted']
df_merged.drop(columns, inplace=True, axis=1)
print(df_merged.columns)
df_merged = pd.get_dummies(df_merged, columns = ['facility_delivery',  'first_visit_on_time', 'hand_washing_facilities', 'electricity', 'floor',
       'highest_education_achieved'])
print(df_merged.head())
print(df_merged.columns)
y = pd.Categorical(df_merged.danger_signs_at_delivery).codes
print(y[:10])
# Scaling and PCA
X = df_merged
X.fillna(0,inplace=True)
np.nan_to_num(X)
# scaled_X = X
Xscaler = StandardScaler()
Xscaler.fit(X)
scaled_X = Xscaler.transform(X)
np.nan_to_num(scaled_X)
pca = PCA(.95)
pca.fit(scaled_X)
scaled_X = pca.transform(scaled_X)
print(scaled_X.shape)
#Train, test set splitting
train_x, test_x, train_y, test_y = train_test_split( scaled_X, np.array(y), test_size=1/7.0, random_state=0)
train_y=np_utils.to_categorical(train_y,num_classes=2)
test_y=np_utils.to_categorical(test_y,num_classes=2)
print("Shape of y_train",train_y.shape)
print("Shape of y_test",test_y.shape)
# Model architecture
model=Sequential()
model.add(Dense(500,input_dim=scaled_X.shape[1],activation='relu'))
model.add(Dense(200,activation='relu'))
model.add(Dense(100,activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(2,activation='sigmoid'))
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy',keras.metrics.Recall()])
print(model.summary())
history = model.fit(train_x,train_y,validation_data=(test_x,test_y),batch_size=500,epochs=4,verbose=1)
plot_history(history.history, path="standard.png")
plt.show()
prediction=model.predict(test_x)
y_label=np.argmax(test_y,axis=1)
predict_label=np.argmax(prediction,axis=1)
# accuracy=np.sum(y_label==predict_label)/length * 100
# print("Accuracy of the dataset",accuracy )
print(classification_report(y_label, predict_label))

Network Architecture and Regularization Techniques

Before moving on to the next section, note that as you raise the number of hidden layers in your network, the accuracy may improve, but the application consumes more memory. Typically, people employ two to three hidden layers.

The Adam optimizer is commonly chosen because it combines momentum with adaptive, per-parameter learning rates. People may also add a penalty on the model's weight coefficients to the loss function (L1 or L2 regularization), because large coefficients indicate overfitting. Another option is to use a dropout layer, which ignores a certain percentage of neurons, chosen at random, during training. For a classification problem, a cross-entropy function, which is a measurement of the chaos of the system, should be used as the loss; there are separate versions for binary and multiclass classification.
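To make these options concrete, here is a minimal Keras sketch, with hypothetical layer sizes, that combines an L2 weight penalty, a dropout layer, a cross-entropy loss, and the Adam optimizer:
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.regularizers import l2
model = Sequential([
    Dense(128, activation='relu', input_dim=20,
          kernel_regularizer=l2(0.01)),  # penalize large weight coefficients
    Dropout(0.2),                        # randomly ignore 20% of neurons during training
    Dense(2, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])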

Note that the PCA presented in Chapter 3 is used here with a 95 percent variance threshold, which is the standard choice.

Updatable Model and Transfer Learning

In deep learning, declaring a model trainable is all that is required to create an updatable machine-learning model, as mentioned in Chapter 3. The following code is an example of an anomaly detection system. Every cloud service exposes network packet information, and a tool such as Suricata can generate security alerts. The following preprocessor code processes the data and marks each network packet as alert or nonalert depending on the alert type.

You can find the data in the frame_packet_june_14.csv file in the Git repository of the book. Please refer to the following links:

https://github.com/Apress/advanced-data-analytics-python-2e/blob/main/eve.json

https://github.com/Apress/advanced-data-analytics-python-2e/blob/main/frame_packet_June14.csv
# importing modules
import pandas as pd
import json
import gzip
import glob
import sys, codecs
#reading packet files and building data frame
PATH_TO_FOLDER = ""
# PATH_TO_FOLDER = "/home/ubuntu/AWS/"
df = pd.read_json(codecs.open(PATH_TO_FOLDER+'new-datasample-1.json','r','utf-8').read().replace("\\","/"), lines=True)
print(df.columns)
#adding alert type and required typecasting
df['class_label'] = 0
df['alert_type'] = ""
# df['StartTime'] = pd.to_datetime(df['StartTime'])
df['src_port'] = df['src_port'].astype(int,errors = 'ignore')
df['dst_port'] = df['dst_port'].astype(int,errors = 'ignore')
df['source'] = df['source'].str.strip()
df['destination'] = df['destination'].str.strip()
print(df.head())
#for event txt file and update df if alert found
with open(PATH_TO_FOLDER+'new-dataeve.json','r') as fjson:
    for line in fjson:
        data = json.loads(line)#.decode('utf-8'))
        if data['event_type'] == 'alert':
            src_ip = data['src_ip'].strip()
            src_port = int(data['src_port'])
            dest_ip = data['dest_ip'].strip()
            dest_port = int(data['dest_port'])
            df.loc[(df.source == src_ip) & (df.destination == dest_ip) & (df.src_port == src_port) & (df.dst_port == dest_port),['class_label','alert_type']] = [1, data['alert']['category']]
#saving into training file csv
df_label = df[df.class_label == 1]
print(df_label)
df.to_csv('frame_packet_June14.csv')
Here’s the first iteration code (build the model and save it):
##  Filename : predict_type_packet.py
##  Purpose/Description : Main Classification Code
##  Author : Sayan Mukhopadhyay
''' module to predict packet type and generate classification report '''
# importing modules
import pandas as pd
import numpy as np
import csv
from keras.layers.core import Dense, Activation, Dropout
from keras.layers.recurrent import LSTM
from keras.models import Sequential
import sys
from keras.preprocessing.sequence import TimeseriesGenerator
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay
import keras
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from plot_keras_history import plot_history
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from keras.utils import np_utils
'''global variables/parameters'''
UNITS1 = 1000
UNITS2 = 500
UNITS3 = 300
EPOCHS = 4
DROPOUT_RATE = 0.2
BATCH_SIZE = 500
'''reading training csv'''
df = pd.read_csv('frame_packet_June14.csv', nrows=49000)
'''dropping unnecessary columns and typecasting'''
columns = ['id', 'conn_flag', 'pckt_info', 'TSval', 'TSecr', 'SLE', 'SRE']#, 'time']
df.drop(columns, inplace=True, axis=1)
columns2 = ['Unnamed: 0', 'class_label']
df.drop(columns2, inplace=True, axis=1)
print(len(df.columns))
df = df.replace('-',0.0)
df = pd.get_dummies(df, columns = ['src_port', 'dst_port', 'source', 'destination', 'protocol'])
print(len(df.columns))
print(df.columns)
df['time'] = pd.to_datetime(df['time'])
df = df.sort_values(by="time")
df.set_index('time', inplace=True)
print(df.head())
df.alert_type = df.alert_type.fillna('NonAlert')
df['alert_type'] = df['alert_type'].astype('category')
'''converting categorical columns into codes'''
df['alert_type'] = pd.Categorical(df.alert_type).codes
y = df['alert_type'].tolist()
print(y[:10])
labels = df['alert_type'].unique()
print(labels)
count = len(df['alert_type'].unique())
print("alert_type count ",count)
columns3 = ['alert_type']
df.drop(columns3, inplace=True, axis=1)
'''standardscaler transformation and PCA, choosing important columns'''
X = df
X.fillna(0,inplace=True)
np.nan_to_num(X)
# scaled_X = X
Xscaler = StandardScaler()
Xscaler.fit(X)
scaled_X = Xscaler.transform(X)
np.nan_to_num(scaled_X)
pca = PCA(400)
pca.fit(scaled_X)
scaled_X = pca.transform(scaled_X)
print(scaled_X.shape)
'''train, test set splitting label encoder'''
train_x, test_x, train_y, test_y = train_test_split( scaled_X, np.array(y), test_size=1/7.0, random_state=0)
train_y=np_utils.to_categorical(train_y,num_classes=5)
test_y=np_utils.to_categorical(test_y,num_classes=5)
print("Shape of y_train",train_y.shape)
print("Shape of y_test",test_y.shape)
''' dnn model architecture '''
model=Sequential()
model.add(Dense(UNITS1,input_dim=scaled_X.shape[1],activation='relu'))
model.add(Dense(UNITS2,activation='relu'))
model.add(Dense(UNITS3,activation='relu'))
model.add(Dropout(DROPOUT_RATE))
model.add(Dense(count,activation='softmax'))
model.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['accuracy', keras.metrics.Recall()])
''' model fit'''
history = model.fit(train_x,train_y,validation_data=(test_x,test_y),batch_size=BATCH_SIZE,epochs=EPOCHS,verbose=1)
model.save('my_model_400.h5')
prediction=model.predict(train_x)
length=len(prediction)
y_label=np.argmax(train_y,axis=1)
predict_label=np.argmax(prediction,axis=1)
print(classification_report(y_label, predict_label))
matrix = confusion_matrix(y_label, predict_label)
print(matrix)
Here is the code for all subsequent iterations (load the saved model and train it further):
''' importing modules '''
import pandas as pd
import numpy as np
import csv
from keras.layers.core import Dense, Activation, Dropout
from keras.layers.recurrent import LSTM
from keras.models import Sequential
import sys
from keras.preprocessing.sequence import TimeseriesGenerator
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay
import keras
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from plot_keras_history import plot_history
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from keras.utils import np_utils
from keras.models import load_model
'''global variables/parameters'''
'''reading training csv'''
df = pd.read_csv('frame_packet_June14.csv')#, nrows=5000)
cols = ['time_x', 'source_x', 'destination_x', 'protocol_x', 'length_x', 'src_port',
       'dst_port_x', 'Seq_x', 'Ack_x', 'Win_x', 'Len_x', 'MSS_x', 'WS_x', 'SACK_PERM_x',
       'alert_type']
df = df[cols]
print(df.head())
'''dropping unnecessary columns and typecasting'''
df = df.replace('-',0.0)
df = pd.get_dummies(df, columns = ['src_port', 'dst_port_x', 'source_x', 'destination_x', 'protocol_x'])
print(len(df.columns))
print(df.columns)
print(df.loc[0,'time_x'])
df['time_x'] = pd.to_datetime(df['time_x'])
df = df.sort_values(by="time_x")
df.set_index('time_x', inplace=True)
df.alert_type = df.alert_type.fillna('NonAlert')
df['alert_type'] = df['alert_type'].astype('category')
'''converting categorical columns into codes'''
codes = pd.Categorical(df.alert_type).codes
y = codes
print(y[:10])
count = len(df['alert_type'].unique())
print("alert_type count ",count)
columns3 = ['alert_type']
df.drop(columns3, inplace=True, axis=1)
'''standardscaler transformation and PCA, choosing important columns'''
X = df
X.fillna(0,inplace=True)
np.nan_to_num(X)
Xscaler = StandardScaler()
Xscaler.fit(X)
scaled_X = Xscaler.transform(X)
np.nan_to_num(scaled_X)
pca = PCA(400)
pca.fit(scaled_X)
scaled_X = pca.transform(scaled_X)
print(scaled_X.shape)
'''train, test set splitting and timeseries generator'''
train_x, test_x, train_y, test_y = train_test_split( scaled_X, np.array(y), test_size=1/7.0, random_state=0)
train_y=np_utils.to_categorical(train_y,num_classes=5)
test_y=np_utils.to_categorical(test_y,num_classes=5)
print("Shape of y_train",train_y.shape)
print("Shape of y_test",test_y.shape)
model = load_model('my_model_400.h5')
for i in range(3):
    model.layers[i].trainable = True
history = model.fit(train_x,train_y,validation_data=(test_x,test_y),batch_size=500,epochs=4,verbose=1)
plot_history(history.history, path="standard.png")
plt.show()
'''model accuracy'''
prediction=model.predict(test_x)
y_label=np.argmax(test_y,axis=1)
predict_label=np.argmax(prediction,axis=1)
print(classification_report(y_label, predict_label))
labels = [i for i in range(0,count)]
matrix = confusion_matrix(y_label, predict_label)
print(matrix)
prediction=model.predict(train_x)
length=len(prediction)
y_label=np.argmax(train_y,axis=1)
predict_label=np.argmax(prediction,axis=1)
print(classification_report(y_label, predict_label))
matrix = confusion_matrix(y_label, predict_label)
print(matrix)

Note that PCA is not initialized with a 95 percent variance threshold here, since we need the same number of features in each iteration; instead, a fixed number of components is chosen (in the code, 400 components after one-hot encoding of the raw packet fields). PCA is not an updatable model, and the development of an updatable feature selection model is still a research area.

This type of system is also useful for a TV channel that wants to predict whether a user will churn, where churn is defined as a user who does not renew their plan within 24 hours of it expiring, indicating they are not attached to the channel. However, since hackers are continually trying new things, a network intrusion detection model always encounters new types of alerts. As a result, the system should cluster the data first, with each group representing a different type of alert, and keep one classifier model per category that determines whether an alert is of that type (a sketch follows).
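Here is a minimal sketch of that cluster-then-classify idea; the feature matrix X, the binary labels y, and the number of clusters are hypothetical placeholders rather than the book's data:
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
def build_per_cluster_classifiers(X, y, n_clusters=5):
    # group the traffic first; each cluster stands for one family of alerts
    clusterer = KMeans(n_clusters=n_clusters, random_state=0).fit(X)
    models = {}
    for c in range(n_clusters):
        mask = clusterer.labels_ == c
        if mask.sum() == 0:
            continue
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        clf.fit(X[mask], y[mask])  # y: 1 if the packet is an alert, 0 otherwise
        models[c] = clf
    return clusterer, models
# at prediction time, route a new packet to its cluster's classifier:
# c = clusterer.predict(x_new.reshape(1, -1))[0]
# models[c].predict(x_new.reshape(1, -1))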

Recurrent Neural Network

A recurrent neural network (RNN) is an extremely popular kind of network in which the output of the previous step is fed back as input to the hidden layer. It is an extremely useful solution for problems such as sequence labeling or time-series prediction. One of the more popular applications of sequence labeling is the autocomplete feature of a search engine. A minimal sketch of a recurrent layer follows.
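As a minimal illustration (not part of the book's repository), the following Keras sketch defines a recurrent layer whose hidden state from the previous step is fed back at every time step; the sequence length, feature count, and unit count are hypothetical:
from keras.models import Sequential
from keras.layers import SimpleRNN, Dense
model = Sequential([
    SimpleRNN(32, input_shape=(100, 8)),  # 100 time steps, 8 features per step
    Dense(1, activation='sigmoid')        # e.g., a label for the whole sequence
])
model.compile(optimizer='adam', loss='binary_crossentropy')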

LSTM

In an RNN, the network takes feedback from the past: $X(t) = K \cdot X(t-1) = K^2 \cdot X(t-2) = \cdots = K^N \cdot X(t-N)$. Now, if K > 1, then $K^N$ becomes very large; conversely, if K < 1, then $K^N$ becomes very small (the exploding and vanishing problems). To avoid this, the network must programmatically forget some of its past state, and that is what LSTM does.

This way, it can remember values over arbitrary intervals. LSTM works very well to classify, process, and predict time series given time lags of unknown duration. Relative insensitivity to gap length gives an advantage to LSTM over alternative RNNs, hidden Markov models, and other sequence learning methods.

RNN and HMM rely on the hidden state before emission/sequence. If we want to predict the sequence after 500 intervals instead of 5, LSTM can remember the states and predict properly.

To simplify this, suppose a person forgets old memories and remembers only recent things. But they need those memories from the past to perform some tasks in the future. This is the problem with traditional RNNs. Also, there is another person who remembers the important memories from the past along with the recent ones and deletes the useless memories from the past. This way, they can use that information to carry out the task more efficiently. This is the case with LSTM.

Each LSTM cell has three inputs, h_{t−1}, c_{t−1}, and x_t, and two outputs, h_t and c_t. For a given time t, h_t is the hidden state, c_t is the cell state or memory, and x_t is the current data point or input. The first sigmoid layer has two inputs, h_{t−1} and x_t, where h_{t−1} is the hidden state of the previous cell. It is known as the forget gate because its output selects the amount of information from the previous cell to keep. The output is a number in [0, 1], which is multiplied (pointwise) with the previous cell state c_{t−1}. A numpy sketch of a single LSTM step follows.
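The following numpy sketch of a single LSTM step illustrates the gates described above; the parameter dictionary p, holding weight matrices and biases, is a hypothetical placeholder:
import numpy as np
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))
def lstm_step(x_t, h_prev, c_prev, p):
    f = sigmoid(p['Wf'] @ x_t + p['Uf'] @ h_prev + p['bf'])  # forget gate, in [0, 1]
    i = sigmoid(p['Wi'] @ x_t + p['Ui'] @ h_prev + p['bi'])  # input gate
    g = np.tanh(p['Wg'] @ x_t + p['Ug'] @ h_prev + p['bg'])  # candidate memory
    o = sigmoid(p['Wo'] @ x_t + p['Uo'] @ h_prev + p['bo'])  # output gate
    c_t = f * c_prev + i * g   # forget part of the old cell state, add new information
    h_t = o * np.tanh(c_t)     # new hidden state
    return h_t, c_t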

The network packet is a sequence, so the RNN can readily be applied to an anomaly detection system. One difference is that we store data in a cloud database and sort it by IP address and ports. This sorting is crucial since without a suitable sequence, you will not get a good result. The autocorrelation function, which is discussed in Chapter 3, can be used to determine whether the sequence is correct.
# importing modules
import pandas as pd
import numpy as np
import csv
from keras.layers.core import Dense, Activation, Dropout
from keras.layers.recurrent import LSTM
from keras.models import Sequential
import sys
from keras.preprocessing.sequence import TimeseriesGenerator
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay
import keras
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from plot_keras_history import plot_history
import matplotlib.pyplot as plt
sys.path.append('../resource')
from MSSqlDb import MSSqlDbWrapper
mssql_instance = MSSqlDbWrapper("../config/config1.txt")
con = mssql_instance.get_connect()
df = pd.read_sql("SELECT * from packet_table order by IP, PORT;",con)
#global variables/parameters
LSTM_UNITS1 = 150
LSTM_UNITS2 = 100
LSTM_UNITS3 = 50
EPOCHS = 3
DROPOUT_RATE = 0.2
TIMESERIESLEN = 50
#reading training csv
df = pd.get_dummies(df, columns = ['src_port', 'dst_port', 'source', 'destination', 'protocol'])
df['time'] = pd.to_datetime(df['time'])
df = df.sort_values(by="time")
df.set_index('time', inplace=True)
print(df.head())
df['alert_type'] = df['alert_type'].astype('category')
#converting categorical columns into codes
df['alert_type'] = pd.Categorical(df.alert_type).codes
y = df['alert_type'].tolist()
labels = df['alert_type'].unique()
count = len(df['alert_type'].unique())
print("alert_type count ",count)
columns3 = ['alert_type']
df.drop(columns3, inplace=True, axis=1)
#standardscaler transformation and PCA, choosing important columns
X = df
X.fillna(0,inplace=True)
np.nan_to_num(X)
Xscaler = StandardScaler()
Xscaler.fit(X)
scaled_X = Xscaler.transform(X)
np.nan_to_num(scaled_X)
#changing format of class labels
y_intermediate = [[0 for i in range(count)] for j in range(len(y))]
for i in range(len(y)):
    y_intermediate[i][y[i]] = 1
y_final = np.array(y_intermediate)
# sys.exit()
#train, test set splitting and timeseries generator
train_x, test_x, train_y, test_y = train_test_split( scaled_X, y_final, test_size=1/7.0, random_state=0)
train_generator = TimeseriesGenerator(train_x,train_y, length=TIMESERIESLEN)
test_generator = TimeseriesGenerator(test_x, test_y, length=TIMESERIESLEN)
print(len(test_generator))
for i in range(len(test_generator)):
  x, y = test_generator[i]
  print(len(y))
print(len(test_y))
print("train generator")
for i in range(len(train_generator)):
  x, y = train_generator[i]
  print(len(y))
print(len(train_y))
#LSTM model train
model =  Sequential()
model.add(LSTM(LSTM_UNITS1, activation='relu', input_shape=(TIMESERIESLEN,scaled_X.shape[1]),return_sequences=True))
model.add(Dropout(DROPOUT_RATE))
model.add(LSTM(LSTM_UNITS2,return_sequences=True))
model.add(Dropout(DROPOUT_RATE))
model.add(LSTM(LSTM_UNITS3,return_sequences=False))
model.add(Dropout(DROPOUT_RATE))
model.add(Dense(count, activation='softmax'))
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=[keras.metrics.Recall(), 'accuracy'])
history = model.fit(train_generator, epochs=EPOCHS)
#model accuracy
plot_history(history.history, path="standard.png")
plt.show()
scores = model.evaluate(test_generator)
print("Final Score")
print(scores)

Reinforcement Learning

We’ll talk about reinforcement learning in this section. Reinforcement learning means learning from feedback; it is one of the three main types of machine learning, alongside supervised and unsupervised learning. It is used to learn models by performing specific tasks in a given environment. The program interacts with its surroundings and performs actions to move between different states. Actions are then judged positively or negatively through a reward or a penalty: successful actions are reinforced, and unsuccessful actions are penalized. A model goes through many iterations to find the best possible sequence of actions to achieve a given goal. The following is the algorithm behind it.

TD0

Algorithm: This function implements the tabular TD(0) algorithm. It must be called after each transition.

function TD0(X,R,Y,V ):
  • X = Last state

  • Y = Next State

  • R = Instant reward connected with this transition

  • V = Array of estimated value
    $$ \delta = R + \gamma \cdot V(Y) - V(X) $$
$$ V[X] = V[X] + \alpha \cdot \delta $$

where α is the step size.

return V
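A minimal Python sketch of this tabular TD(0) update might look as follows; the default values of alpha and gamma are hypothetical:
def td0_update(V, X, R, Y, alpha=0.1, gamma=1.0):
    # V is an array (or dict) of estimated state values indexed by state
    delta = R + gamma * V[Y] - V[X]
    V[X] = V[X] + alpha * delta
    return V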

Please pass the following file as a command-line argument when running the example code below:

https://github.com/Apress/advanced-data-analytics-python-2e/blob/main/reinforcement_learning_td0_reduce_dat.csv

Here is an example of a four-step TD0 with alpha and gamma set to 1:
import pandas as pd
import numpy as np
import sys
from sklearn.model_selection import train_test_split
import tensorflow as tf
from keras.models import Sequential
from keras.layers.core import Dense
from scipy.stats import pearsonr
from math import sqrt
if len(sys.argv) < 4:
      print("Usage is")
      print("python assignment.py    <input file path>    <output file path>    <No of split>")
      exit(0)
#Read the input data
df = pd.read_csv(sys.argv[1])
split = int(sys.argv[3])
out = open(sys.argv[2],'w')
final_error = []
size = int(df.shape[0]/split)
matched = 0
relaxed_matched = 0
count = 0
square_sum = 0
sum = 0
#Run the code for each split of input
for i in range(split):
    X = df.loc[i*size: (i+1)*size]
    X = X.astype(float)
    #Fill the missing values by the average of the column
    X.fillna(X.mean(), inplace=True)
    y = X['y']
    X.drop('y', inplace=True, axis=1)
    X_back = X
    X = X.to_numpy()
    y = y.to_numpy()
    #split the data in test and training sample
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size= 0.4, random_state=42)
    #normalize the data
    while(1):
        flag = True
        for i in range(X_train.shape[1]):
            if X_train[:,i].std() != 0:
                X_train[:,i] = (X_train[:,i]- X_train[:,i].mean())/X_train[:,i].std()
                X_test[:,i] = (X_test[:,i]- X_test[:,i].mean())/X_test[:,i].std()
            else:
                X_train = np.delete(X_train,i,1)
                X_test = np.delete(X_test,i,1)
                flag = False
                break
        if flag:
            break
    av = y_train.mean()
    st = y_train.std()
    y_train = (y_train- y_train.mean())/y_train.std()
    index = []
    i1 = 0
    processed = 0
    #keep only the columns that are correlated with y
    while(1):
        flag = True
        for i in range(X_train.shape[1]):
            if i > processed :
                i1 = i1 + 1
                corr = pearsonr(X_train[:,i], y_train)
                PEr= .674 * (1- corr[0]*corr[0])/ (len(X_train[:,i])**(1/2.0))
                if abs(corr[0]) < PEr:
                    X_train = np.delete(X_train,i,1)
                    X_test = np.delete(X_test,i,1)
                    index.append(X_back.columns[i1-1])
                    processed = i - 1
                    flag = False
                    break
        if flag:
            break
    #drop columns that are correlated with other input columns
    while(1):
        flag = True
        for i in range(X_train.shape[1]):
            for j in range(i+1,X_train.shape[1]-1):
                corr = pearsonr(X_train[:,i], X_train[:,j])
                PEr= .674 * (1- corr[0]*corr[0])/ (len(X_train[:,i])**(1/2.0))
                if abs(corr[0]) > 6*PEr:
                    X_train = np.delete(X_train,j,1)
                    X_test = np.delete(X_test,j,1)
                    flag = False
                    break
            break
        if flag:
            break
    #build the model to predict the y
    learning_rate = 0.0001
    model = Sequential([
        Dense(64, activation=tf.nn.relu, input_shape=[X_train.shape[1]]),
        Dense(64, activation=tf.nn.relu),
        Dense(1)
    ])
    optimizer = tf.keras.optimizers.RMSprop(learning_rate)
    model.compile(loss='mse',
                  optimizer=optimizer,
                  metrics=['mae', 'mse'])
    model.fit(
    X_train, y_train,
    epochs=int(X_train.shape[1]/2), validation_split = 0.2, verbose=0)
    predict = model.predict(X_train)
    #build the model to predict the error in prediction
    error = []
    for i in range(len(predict)):
        error.append(y_train[i] - predict[i][0])
    error = np.array(error)
    model_e = Sequential([
        Dense(64, activation=tf.nn.relu, input_shape=[X_train.shape[1]]),
        Dense(64, activation=tf.nn.relu),
        Dense(1)
    ])
    model_e.compile(loss='mse',
                optimizer=optimizer,
                metrics=['mae', 'mse'])
    model_e.fit(
        X_train, error,
    epochs=int(X_train.shape[1]/2), validation_split = 0.2, verbose=0)
    #predict the test data using the trained model
    predict = model.predict(X_test)
    err_p = model_e.predict(X_test)
    predict = predict + err_p
    predict = predict*st + av
    for i in range(len(predict)):
        error = y_test[i] - predict[i][0]
        if abs(error) <= 3:
            matched = matched + 1
        if abs(error/y_test[i]) <= 0.1:
            relaxed_matched = relaxed_matched + 1
        square_sum = square_sum + error*error
        sum = sum + error
        count = count + 1
out.write("RMSE="+str(sqrt(square_sum/count))+' ')
out.write("matched count="+ str(matched) +' Total count=' + str(count) +' ')
out.write("ME="+str(sqrt(abs(sum)/count))+' ')
out.write("relaxed matched count="+ str(relaxed_matched) +' Total count=' + str(count) +' ')
out.close()
print("RMSE=",str(sqrt(square_sum/count)),' ')
print("matched count=", str(matched),' ', "Total count=", str(count),' ')
print("ME=",str(sqrt(abs(sum)/count)),' ')
print("relaxed matched count=", str(relaxed_matched),' ', "Total count=", str(count),' ')

You can use this method to boost accuracy in any regression problem by predicting error, but it’s a little trickier for classification.

TDλ

Algorithm: This function implements the tabular TD(λ) algorithm with replacing traces. It must be called after each transition.

function TDLambda (X, R, Y, V, z)
  • X= Last state

  • Y= Next state

  • R= Instant reward connected with this transition

  • V = Array of estimated value

  • z= Array of eligibility traces
    $$ \delta = R + \gamma \cdot V[Y] - V[X] $$
for all $x \in \mathcal{X}$ do:
$$ z[x] = \gamma \cdot \lambda \cdot z[x] $$
    if X = x then
      z[x] = 1
    end if
    V[x] = V[x] + α · δ · z[x]
end for
return (V, z)
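A minimal Python sketch of the tabular TD(λ) update with replacing traces, following the pseudocode above; the default values of alpha, gamma, and lam are hypothetical:
def td_lambda_update(V, z, X, R, Y, alpha=0.1, gamma=1.0, lam=0.9):
    # V: array of estimated state values, z: array of eligibility traces
    delta = R + gamma * V[Y] - V[X]
    for x in range(len(z)):
        z[x] = gamma * lam * z[x]
        if x == X:
            z[x] = 1.0               # replacing trace for the last state
        V[x] = V[x] + alpha * delta * z[x]
    return V, z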

Example of Dialectic Learning

An algorithmic trader wants to divide stock price movements into three target categories: same, up, and down. The class same denotes that the stock’s price has remained unchanged. The class up denotes that the stock’s price is going up. The class down denotes that the stock’s price is decreasing.

Ninety-seven percent of the data falls in the class same, and the time series is about 20,000 points long. For this type of problem, most people use biased sampling; we did things differently. We divide the data into batches of 1,000 points and, in each iteration, train the model on 1,000 data points to predict the following 100. In Keras, we use softmax regression with an RNN with a sequence length of 100. For each iteration we calculate the probability of being up, down, or the same, and we also compute the mean and standard deviation of each probability distribution. We then use the following formulas to determine the score for each class:

    inc = (prob_increasing[j] - increasing_mean + k_inc*increasing_std)

    dec = (prob_decreasing[j] - decreasing_mean + k_dec*decreasing_std)

    same = (prob_same[j] - same_mean + k_same*same_std)

where k_inc, k_dec, and k_same are constants initialized to 1.

Then we classify the data using the following logic:
if same > 0:
      pr_status = 0
else:
      if inc > dec:
            pr_status = 100
      else:
            pr_status = -100

So, most of the data is classified as same, and the rest falls in the other classes.

Then we calculate the following parameters in each iteration:
if acc_status == 0:
      if pr_status == 0:
            wrong_count_pos_same = wrong_count_pos_same + 1
      total_count_acc_same = total_count_acc_same + 1
else:
      if pr_status != 0:
            wrong_count_neg_same = wrong_count_neg_same + 1
      total_count_acc_not_same = total_count_acc_not_same + 1
where
  • wrong_count_pos_same = Count of points wrongly classified in the same class

  • total_count_acc_same = Count of points actually belonging to the same class

  • wrong_count_neg_same = Count of points wrongly not classified in the same class

  • total_count_acc_not_same = Count of points actually not in the class same

The same parameters are calculated for the up and down classes considering the points are not in the same class, and then after each iteration, the constants are adjusted with the following logic:
if total_count_acc_same != 0:
      if wrong_count_neg_same/total_count_acc_same > .5:
            k_same = 1.2 * k_same
if total_count_acc_not_same != 0:
      if wrong_count_pos_same/total_count_acc_not_same > .5:
            k_same = 0.9 * k_same

So, a dialectic is created between all the candidate classes, and the points are adjusted depending on the local trend of 100 points, on top of the prediction based on the previous 1,000 points. This pattern improves classification accuracy with any model in stock point prediction. The original code is given next.

Please copy the CSV files from https://github.com/Apress/advanced-data-analytics-python-2e.
#import matplotlib.pyplot as plt
import numpy as np
import csv
from keras.layers.core import Dense, Activation, Dropout
from keras.layers.recurrent import LSTM
from keras.models import Sequential
import sys
def read_data(path_to_dataset, path_to_target,
              sequence_length=50,
              ratio=1.0):
    max_values = ratio * 2049280
    with open(path_to_dataset) as f:
        data = csv.reader(f, delimiter=",")
        power = []
        nb_of_values = 0
        for line in data:
            try:
                power.append([float(line[1]),float(line[4]),float(line[7])])
                nb_of_values += 1
            except ValueError:
                pass
            if nb_of_values >= max_values:
                break
    with open(path_to_target) as f:
        data = csv.reader(f, delimiter=",")
        target = []
        nb_of_values = 0
        for line in data:
            try:
                target.append(float(line[0].strip()))
                nb_of_values += 1
            except ValueError:
                pass
            if nb_of_values >= max_values:
                break
    return power, target
def create_matrix(y_train):
    y = [[0 for i in range(3)] for j in range(len(y_train))]
    for i in range(len(y_train)):
        if y_train[i] == -100:
            y[i][0] = 1
        else:
            if y_train[i] == 100:
                y[i][1] = 1
            else:
                if y_train[i] == 0:
                    y[i][2] = 1
    return y
def process_data(power, target, sequence_length):
    result = []
    for index in range(len(power) - sequence_length-1):
        result.append(power[index: index + sequence_length])
    result = np.array(result)
    #print(result.shape)
    row = int(round(0.9 * result.shape[0]))
    X_train = result[:row, :]
    #X_train = train[:, :-1]
    y_train = np.array(create_matrix(target))
    #print(y_train.shape)
    X_test = result[row:, :]
    y_test = y_train[row:]
    #print(y_test.shape)
    y_train = y_train[:row]
    #print(y_train.shape)
    #print(X_train.shape)
    X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 3))
    X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 3))
    return [X_train, y_train, X_test, y_test]
def build_model():
    model = Sequential()
    layers = [3, 100, 50, 3]
    model.add(LSTM(
        layers[1],
        input_shape=(None, layers[0]),
        return_sequences=True))
    model.add(Dropout(0.2))
    model.add(LSTM(
        layers[2],
        return_sequences=False))
    model.add(Dropout(0.2))
    model.add(Dense(
        layers[3]))
    model.add(Activation('softmax'))
    model.compile(loss="categorical_crossentropy", optimizer="adam")
    return model
def run_network(data=None, target=None):
    epochs = 2
    ratio = 0.5
    sequence_length = 50
    X_train, y_train, X_test, y_test = process_data(
        data, target, sequence_length)
    model = build_model()
    try:
        model.fit(
            X_train, y_train,
            batch_size=512, epochs=epochs, validation_split=0.05, verbose=0)
        predicted = model.predict(X_test)
    except KeyboardInterrupt:
        exit(0)
    return y_test, predicted
def convert(x):
    if x[0] == 1:
        return -100
    if x[1] == 1:
        return 100
    if x[2] == 1:
        return 0
if __name__ == '__main__':
    path_to_dataset = 'dialectic_learning_data.csv'
    path_to_target = 'dialectic_learning_label.csv'
    data, target = read_data(path_to_dataset, path_to_target)
    k_inc = 1
    k_dec = 1
    k_same = 1
    for i in range(0,len(data)-1000,89):
        d = data[i:i+1001]
        t = target[i:i+1001]
        y_test, predicted = run_network(d,t)
        prob_increasing = predicted[:,1]
        increasing_mean = prob_increasing.mean()
        increasing_std = prob_increasing.std()
        prob_decreasing = predicted[:,0]
        decreasing_mean = prob_decreasing.mean()
        decreasing_std = prob_decreasing.std()
        prob_same = predicted[:,2]
        same_mean = prob_same.mean()
        same_std = prob_same.std()
        wrong_count_pos_same = 0
        total_count_acc_not_same = 0
        wrong_count_neg_same = 0
        total_count_acc_same = 0
        wrong_count_pos_up = 0
        total_count_acc_not_up = 0
        wrong_count_neg_up = 0
        total_count_acc_up = 0
        wrong_count_pos_down = 0
        total_count_acc_not_down = 0
        wrong_count_neg_down = 0
        total_count_acc_down = 0
        for j in range(len(predicted)-1):
            inc = (prob_increasing[j] - increasing_mean + k_inc*increasing_std)
            dec = (prob_decreasing[j] - decreasing_mean + k_dec*decreasing_std)
            same = (prob_same[j] - same_mean +  k_same*same_std)
            acc_status = convert(y_test[j])
            if same > 0:
                pr_status = 0
            else:
                if inc > dec:
                    pr_status = 100
                else:
                    pr_status = -100
            if acc_status == 0:
                if pr_status == 0:
                    wrong_count_pos_same = wrong_count_pos_same + 1
                total_count_acc_same = total_count_acc_same + 1
            else:
                if pr_status != 0:
                    wrong_count_neg_same = wrong_count_neg_same + 1
                total_count_acc_not_same = total_count_acc_not_same + 1
                if acc_status == 100:
                    if pr_status != 100:
                        wrong_count_pos_up = wrong_count_pos_up + 1
                    total_count_acc_up = total_count_acc_up + 1
                else:
                    if pr_status == 100:
                        wrong_count_neg_up = wrong_count_neg_up + 1
                    total_count_acc_not_up = total_count_acc_not_up + 1
                if acc_status == -100:
                    if pr_status != -100:
                        wrong_count_pos_down = wrong_count_pos_down + 1
                    total_count_acc_down = total_count_acc_down + 1
                else:
                    if pr_status == -100:
                        wrong_count_neg_down = wrong_count_neg_down + 1
                    total_count_acc_not_down = total_count_acc_not_down + 1
            print(acc_status,',', pr_status)
        if total_count_acc_same != 0:
            if wrong_count_neg_same/total_count_acc_same > .5:
                k_same = 1.2 * k_same
        if total_count_acc_not_same !=0:
            if wrong_count_pos_same/total_count_acc_not_same > .5:
                k_same = 0.9 * k_same
        if total_count_acc_up != 0:
            if wrong_count_neg_up/total_count_acc_up > .5:
                k_inc = 1.2 * k_inc
        if total_count_acc_not_up !=0:
            if wrong_count_pos_up/total_count_acc_not_up > .5:
                k_inc = 0.9 * k_inc
        if total_count_acc_down != 0:
            if wrong_count_neg_down/total_count_acc_down > .5:
                k_dec = 1.2 * k_dec
        if total_count_acc_not_down !=0:
            if wrong_count_pos_down/total_count_acc_not_down > .5:
                k_dec = 0.9 * k_dec

You can find the data in the dialectic_learning_data.csv and dialectic_learning_label.csv files in the Git repository of the book.

Convolution Neural Networks

Now we will discuss another kind of neural network: convolutional neural networks (CNNs).

In classic feedforward neural networks (like the ones we studied previously in this book), each neuron in one layer is linked to every neuron in the following layer, forming a fully connected (FC) layer. In CNNs, however, FC layers are not used until the very final layer (or final few layers) of the network.

Figure 5-2 shows how an input frame is convolved and how features are extracted, pooled, and classified through the fully connected layers to predict the output.

Figure 5-2

CNN architecture

Instead, we compute the output using convolutional filters applied to the input layer. When you use these convolutions, you get local connections, which means that every part of the input is linked to a portion of the output (we’ll explain this later in the chapter). In a CNN, each layer applies a separate set of filters, usually hundreds or thousands, and then merges the results.

During training, a CNN learns the values for these filters automatically. When it comes to picture categorization, our CNN could learn to do the following:
  • In the first layer, detect edges using raw pixel data.

  • In the second layer, use these edges to recognize shapes (i.e., blobs).

  • In the top layers of the network, use these shapes to detect higher-level characteristics like face structures.

  • The last layer is a classifier, which makes predictions about the contents of the picture based on these higher-level characteristics.

A combination of a CNN and an RNN is a suitable choice if you have a sequence of images, such as a video. The following code classifies MNIST images with a simple CNN:
import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dense, Flatten
from keras.utils import to_categorical
import tensorflow as tf
mnist = tf.keras.datasets.mnist
# loads data and splits into train and test set
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
# # Reshape the images.
train_images = np.expand_dims(x_train, axis=3)
test_images = np.expand_dims(x_test, axis=3)
num_filters = 4
filter_size = 2
pool_size = 2
# Build the model.
model = Sequential([
  Conv2D(num_filters, filter_size, input_shape=(28, 28, 1)),
  MaxPooling2D(pool_size=pool_size),
  Flatten(),
  Dense(10, activation='softmax'),
])
# Compile the model.
model.compile(
  'adam',
  loss='categorical_crossentropy',
  metrics=['accuracy'],
)
# Train the model.
model.fit(
  train_images,
  to_categorical(y_train),
  epochs=3,
  validation_data=(test_images, to_categorical(y_test)),
)
# Predict on the first 10 test images.
predictions = model.predict(test_images[:10])
print(np.argmax(predictions, axis=1))
# Check our predictions against the ground truths.
print(y_test[:10])

Summary

This chapter is the heart of this book. We discussed neural networks such as RNN and CNN with reinforcement learning using real examples. In the next chapter, we will discuss some classical statistical methods to analyze time-series data, and the last chapter is all about how to scale your analytic application.
