Neural networks, also known as artificial neural networks (ANNs), were pioneered by Dr. Robert Hecht-Nielsen, the inventor of one of the first neurocomputers. He defines a neural network as follows: “…a computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs.”
Customarily, neural networks are arranged in multiple layers. Each layer consists of several interconnected nodes, each containing an activation function. Patterns are presented to the input layer, which communicates with one or more hidden layers; the hidden layers are in turn linked to an output layer.
Backpropagation
Backpropagation, which is usually used together with an optimization method such as gradient descent, is a common method of training artificial neural networks. The method computes the error at the output layer, propagates it backward to the input layer, and then updates the weights as a function of that error, the input, and the learning rate. The goal is to minimize the error as far as possible.
Backpropagation Approach
Apply the input vector Xp = (xp1, xp2, …, xpN)t to the input units.
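The backpropagation procedure can be sketched in a few lines of NumPy. This is a minimal illustration, not the book's code: a one-hidden-layer network trained on the classic XOR problem, where the output error is propagated back through the hidden layer and both weight matrices are updated as a function of error, input, and learning rate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: XOR, a classic non-linearly-separable problem.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

W1 = rng.normal(size=(2, 8))   # input -> hidden weights
W2 = rng.normal(size=(8, 1))   # hidden -> output weights
lr = 1.0                       # learning rate

for _ in range(5000):
    # Forward pass
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)
    # Backward pass: error at the output layer ...
    d_out = (out - y) * out * (1 - out)
    # ... propagated back to the hidden layer
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Weight updates as a function of error, input, and learning rate
    W2 -= lr * h.T @ d_out
    W1 -= lr * X.T @ d_h
```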
Other Algorithms
TensorFlow
Nowadays, people rarely write raw TensorFlow code; they use the Keras wrapper instead. The following code uses data from several family surveys to determine the risk of delivery.
Network Architecture and Regularization Techniques
Before moving on to the next section, note that as you raise the number of hidden layers in your network, the accuracy improves, but the application consumes more memory. Typically, people employ two to three hidden layers.
The Adam optimizer is commonly chosen because it combines momentum with per-weight adaptive learning rates, in the spirit of AdaGrad and RMSProp. To fight overfitting, you can add a penalty on the weight coefficients to the loss function (L1 or L2 regularization), because large coefficients indicate overfitting. Another option is to use a dropout layer, which ignores a certain percentage of neurons, chosen at random, during training. For a classification problem, a cross-entropy function, which measures the mismatch between the predicted and true distributions, should be used in the loss function; there are binary and multiclass (categorical) versions.
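As a sketch of these choices in Keras (the layer sizes, dropout rate, and L2 coefficient here are illustrative assumptions, not values from the book), a small classifier might combine L2 weight regularization, dropout, the Adam optimizer, and a categorical cross-entropy loss:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Hypothetical dimensions: 22 input features, 3 output classes.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(22,)),
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(0.01)),  # L2 penalty on weights
    layers.Dropout(0.2),                                     # ignore 20% of neurons
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(0.01)),
    layers.Dropout(0.2),
    layers.Dense(3, activation="softmax"),                   # multiclass output
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",  # multiclass cross-entropy
              metrics=["accuracy"])
```

For a binary problem, the last layer would use a single sigmoid unit with the `binary_crossentropy` loss instead.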
Note that the PCA presented in Chapter 3 is used here to retain 95 percent of the variance. That is the standard choice.
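To make the 95 percent rule concrete, here is an illustrative NumPy sketch of how the number of components is chosen: keep the smallest number of eigenvalue-ranked components whose cumulative explained variance reaches 95 percent. This mirrors what sklearn's `PCA(n_components=0.95)` does, but is not the book's code.

```python
import numpy as np

def pca_95(X, target=0.95):
    """Project X onto the fewest principal components explaining `target` variance."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]            # sort descending by variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()   # cumulative explained variance
    k = int(np.searchsorted(ratio, target) + 1)  # smallest k reaching target
    return Xc @ eigvecs[:, :k], k
```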
Updatable Model and Transfer Learning
In deep learning, declaring a model trainable is all that is required to create an updatable machine learning model, as mentioned in Chapter 3. The following code is an example of an anomaly detection system. Every cloud service exposes network packet information, and a tool such as Suricata can send security alerts. The preprocessor code that follows processes the data and marks each network packet as alert or nonalert depending on the alert type.
You can find the data in the frame_packet_june_14.csv file in the Git repository of the book. Please refer to the following links:
https://github.com/Apress/advanced-data-analytics-python-2e/blob/main/eve.json
Note that PCA is not initialized with a 95 percent variance here, since we need the same number of parameters in each iteration; we therefore chose 15 components, as each network packet has a total of 22 input parameters. PCA itself is not an updatable model; the development of updatable feature selection models is still an open research area.
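The idea of an updatable model over a fixed feature count can be sketched without any framework. In this illustrative example (not the book's Suricata pipeline), once the feature dimension is frozen at 15, as in the text, a simple logistic-regression classifier can be updated batch by batch, forever, as new data arrives:

```python
import numpy as np

class UpdatableClassifier:
    """Logistic regression updated incrementally, one mini-batch at a time."""

    def __init__(self, n_features=15, lr=0.1):
        self.w = np.zeros(n_features)  # fixed feature count across all updates
        self.b = 0.0
        self.lr = lr

    def partial_fit(self, X, y):
        # One gradient step on this batch; callable repeatedly on new data.
        p = 1.0 / (1.0 + np.exp(-(X @ self.w + self.b)))
        err = p - y
        self.w -= self.lr * X.T @ err / len(y)
        self.b -= self.lr * err.mean()

    def predict(self, X):
        return (X @ self.w + self.b > 0).astype(int)
```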
This type of system is also useful for a TV channel that wants to predict whether a user will churn. Churn is defined as a user who does not renew their plan within 24 hours of it expiring, a sign that they are not attached to the channel. However, since hackers continually try new things, a network intrusion detection model always faces new types of alerts. As a result, the system should first cluster the data, with each group representing a different type of alert, and then use a classifier model for each category to determine whether an alert is of that type.
Recurrent Neural Network
A recurrent neural network (RNN) is an extremely popular kind of network in which the output of the previous step is fed back as input to the hidden layer. It is an extremely useful solution for problems such as sequence labeling or time-series prediction. One of the more popular applications of sequence labeling is the autocomplete feature of a search engine.
LSTM
In an RNN, the network takes feedback from the past: X(t) = K × X(t − 1) = K^2 × X(t − 2) = … = K^N × X(t − N). Now, if K > 1, then K^N becomes very large (the gradient explodes); if K < 1, then K^N becomes very small (the gradient vanishes). To avoid this problem, the network must programmatically forget some of its past state. LSTM does exactly this.
This way, it can remember values over arbitrary intervals. LSTM works very well to classify, process, and predict time series given time lags of unknown duration. Relative insensitivity to gap length gives an advantage to LSTM over alternative RNNs, hidden Markov models, and other sequence learning methods.
Both RNNs and HMMs rely on the hidden state prior to the current emission in the sequence. If we want to predict the sequence after 500 intervals instead of 5, an LSTM can still remember the relevant states and predict properly.
To simplify this, suppose a person forgets old memories and remembers only recent things. But they need those memories from the past to perform some tasks in the future. This is the problem with traditional RNNs. Also, there is another person who remembers the important memories from the past along with the recent ones and deletes the useless memories from the past. This way, they can use that information to carry out the task more efficiently. This is the case with LSTM.
Each LSTM cell has three inputs, h(t − 1), c(t − 1), and x(t), and two outputs, h(t) and c(t). For a given time t, h(t) is the hidden state, c(t) is the cell state or memory, and x(t) is the current data point or input. The first sigmoid layer has two inputs, h(t − 1) and x(t), where h(t − 1) is the hidden state of the previous cell. It is known as the forget gate, as its output selects the amount of information from the previous cell to be kept. The output is a number in [0, 1], which is multiplied (pointwise) with the previous cell state c(t − 1).
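The forget-gate mechanics described above can be sketched directly in NumPy. This is an illustrative single-cell step with random weights and made-up dimensions, not the book's code: note how the forget gate f stays in [0, 1] and is multiplied pointwise with the previous cell state c(t − 1).

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step: inputs h(t-1), c(t-1), x(t); outputs h(t), c(t)."""
    H = h_prev.size
    z = W @ np.concatenate([h_prev, x_t]) + b  # all four gate pre-activations
    f = sigmoid(z[:H])            # forget gate, a number in [0, 1]
    i = sigmoid(z[H:2 * H])       # input gate
    o = sigmoid(z[2 * H:3 * H])   # output gate
    g = np.tanh(z[3 * H:])        # candidate memory
    c_t = f * c_prev + i * g      # pointwise: keep part of old state, add new
    h_t = o * np.tanh(c_t)        # new hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
H, D = 4, 3                        # hidden size, input size (illustrative)
W = rng.normal(size=(4 * H, H + D))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):  # run five time steps
    h, c = lstm_step(x, h, c, W, b)
```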
Reinforcement Learning
We’ll talk about reinforcement learning in this section. Reinforcement learning means learning from feedback; it is one of the three main types of machine learning, alongside supervised and unsupervised learning. It’s used to learn models by performing specific tasks in a given environment. The program interacts with its surroundings and performs actions to move between different states. Each action is then judged positively or negatively through a reward or penalty: successful actions are reinforced, and unsuccessful actions are penalized. A model goes through many iterations to find the best possible sequence of actions to achieve a given goal. The following is the algorithm behind it.
TD(0)
Algorithm This function implements the tabular TD(0) algorithm and must be called after each transition.
X = last state
Y = next state
R = instant reward connected with this transition
V = array of estimated values
V(X) ← V(X) + α[R + γV(Y) − V(X)], where α is the step size and γ is the discount factor
return V
Please pass the following file in a command prompt while running this function:
You can use this method to boost accuracy in any regression problem by predicting error, but it’s a little trickier for classification.
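A minimal Python version of the tabular TD(0) update might look as follows. The right-moving chain environment, the step size α, and the discount γ here are illustrative assumptions, not the book's setup:

```python
import numpy as np

def td0_update(V, x, y, r, alpha=0.1, gamma=0.9):
    """Tabular TD(0): nudge V[x] toward the bootstrapped target r + gamma * V[y]."""
    V[x] += alpha * (r + gamma * V[y] - V[x])
    return V

# Toy chain: states 0..4, reward 1 only when entering state 4.
rng = np.random.default_rng(0)
V = np.zeros(5)
for _ in range(2000):
    x = int(rng.integers(0, 4))    # random current state
    y = x + 1                      # always move right
    r = 1.0 if y == 4 else 0.0     # reward on reaching the terminal state
    V = td0_update(V, x, y, r)
```

After enough transitions, V(3) approaches 1 (the reward is one step away) and the earlier states settle near γ-discounted multiples of it.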
TD(λ)
Algorithm This function implements the tabular TD(λ) algorithm with replacing traces and must be called after each transition.
X = last state
Y = next state
R = instant reward connected with this transition
V = array of estimated values
z = array of eligibility traces
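A hedged Python sketch of the TD(λ) update with replacing traces follows; the chain environment and the values of α, γ, and λ are illustrative assumptions:

```python
import numpy as np

def td_lambda_update(V, z, x, y, r, alpha=0.1, gamma=0.9, lam=0.8):
    """Tabular TD(lambda) with replacing traces, called after each transition."""
    z *= gamma * lam                   # decay all eligibility traces
    z[x] = 1.0                         # replacing trace for the visited state
    delta = r + gamma * V[y] - V[x]    # TD error for this transition
    V += alpha * delta * z             # credit every recently visited state
    return V, z

# Demo: episodes walking the chain 0 -> 4, reward only on the last step.
V, z = np.zeros(5), np.zeros(5)
for _ in range(500):
    z[:] = 0.0                         # reset traces at episode start
    for x in range(4):
        y, r = x + 1, (1.0 if x == 3 else 0.0)
        V, z = td_lambda_update(V, z, x, y, r)
```

The eligibility trace lets a single reward update every state visited earlier in the episode, which is why TD(λ) typically converges faster than TD(0).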
Example of Dialectic Learning
An algorithmic trader now wants to divide stock prices into three target classes: same, up, and down. The class same denotes that the stock’s price has remained unchanged, the class up denotes that the price is rising, and the class down denotes that the price is falling.
Ninety-seven percent of the data belongs to the class same. The time series is about 20,000 points long. For this type of problem, most people use biased sampling; we, on the other hand, did things differently. We divide the data into batches of 1,000 points and, in each iteration, train the model with 1,000 data points to predict the following 100. In Keras, we use softmax regression with an RNN with a sequence length of 100. For each iteration we calculate the probability of being up, down, or same, and we also compute the mean and standard deviation of each probability distribution. We then use the following formula to determine the score for each class:
inc = (prob_increasing[j] - increasing_mean + k_inc*increasing_std)
dec = (prob_decreasing[j] - decreasing_mean + k_dec*decreasing_std)
same = (prob_same[j] - same_mean + k_same*same_std)
where k_inc, k_dec, and k_same are constants initialized to 1.
So, most of the data is classified as same, and the rest falls into the other classes.
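The scoring step above can be sketched in NumPy. This is an illustration of the formula only; the probability arrays here are random stand-ins, not the book's model output. Each class score is its probability, centered by the class mean and shifted by k times the standard deviation, and the highest score wins:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in softmax outputs for 100 predicted points (rows sum to 1).
# Columns: same, up, down; the first class dominates, as in the text.
probs = rng.dirichlet([8.0, 1.0, 1.0], size=100)
prob_same, prob_increasing, prob_decreasing = probs[:, 0], probs[:, 1], probs[:, 2]

k_inc = k_dec = k_same = 1.0   # constants initialized to 1

same_mean, same_std = prob_same.mean(), prob_same.std()
increasing_mean, increasing_std = prob_increasing.mean(), prob_increasing.std()
decreasing_mean, decreasing_std = prob_decreasing.mean(), prob_decreasing.std()

labels = []
for j in range(len(probs)):
    inc = prob_increasing[j] - increasing_mean + k_inc * increasing_std
    dec = prob_decreasing[j] - decreasing_mean + k_dec * decreasing_std
    same = prob_same[j] - same_mean + k_same * same_std
    labels.append(int(np.argmax([same, inc, dec])))  # 0=same, 1=up, 2=down
labels = np.array(labels)
```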
wrong_count_pos_same = Count of points wrongly classified in the same class
total_count_acc_same = Count of points actually belonging to the same class
wrong_count_neg_same = Count of points wrongly not classified in the same class
total_count_acc_not_same = Count of points actually not in the class same
So, a dialectic was created between all candidate classes, and the points were adjusted depending on the local trend of 100 points on top of the prediction based on the previous 1,000 points. This pattern improves classification accuracy with any model for stock price prediction. The original code is given next.
You can find the data in the dialectic_learning_data.csv and dialectic_learning_label.csv files in the Git repository of the book.
Convolutional Neural Networks
Now we will discuss another kind of neural network: the convolutional neural network (CNN).
Instead of connecting every input unit to every output unit, we compute the output using convolutional filters applied to the input layer. These convolutions create local connections, which means that every part of the input is linked to only a portion of the output (we’ll explain this later in the chapter). In a CNN, each layer applies a separate set of filters, usually hundreds or thousands, and then merges the results.
In the first layer, detect edges using raw pixel data.
In the second layer, use these edges to recognize shapes (i.e., blobs).
In the top layers of the network, use these shapes to detect higher-level characteristics like face structures.
The last layer is a classifier, which makes predictions about the contents of the picture based on these higher-level characteristics.
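The “local connections” idea can be sketched with a plain NumPy convolution: each output pixel depends only on a small patch of the input. The edge-detecting kernel below is an illustrative choice, echoing the first-layer behavior described above; it is not the book's code.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image`; each output value sees only a local patch."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Local connection: output (i, j) depends only on this patch.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A tiny image with a vertical edge between columns 1 and 2.
img = np.zeros((4, 4))
img[:, 2:] = 1.0
edge = conv2d_valid(img, np.array([[1.0, -1.0]]))  # horizontal difference filter
```

The filter responds only where neighboring pixel values differ, so the output highlights the edge column and is zero elsewhere.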
Summary
This chapter is the heart of this book. We discussed neural networks such as RNN and CNN with reinforcement learning using real examples. In the next chapter, we will discuss some classical statistical methods to analyze time-series data, and the last chapter is all about how to scale your analytic application.