© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
A. Pajankar, A. Joshi, Hands-on Machine Learning with Python, https://doi.org/10.1007/978-1-4842-7921-2_16

16. Bringing It All Together

Ashwin Pajankar (1) and Aditya Joshi (2)
(1) Nashik, Maharashtra, India
(2) Haldwani, Uttarakhand, India

The past chapters in this book have introduced data analysis methods, feature extraction techniques, and traditional machine learning and deep learning techniques. We have conducted multiple experiments on numeric, textual, and visual data and found how to analyze and tweak the performance.

When you are working as a part of a large team trying to solve a business or a research problem, or building complex AI-powered software that will be used by millions of users, you have to plan the project with the end goal in mind (Figure 16-1). This brings you to consider the management and engineering side of data science.
Figure 16-1

Practices for successful deployment of machine learning projects

In this chapter, we’re going to discuss strategies for planning data science and artificial intelligence projects, tools for persisting the models, and hosting the models as a microservice that can be used in the evolving applications.

Data Science Life Cycle

Data science and artificial intelligence projects are complex, and it is very easy to get caught up in smaller details or focus too much on creating models and hosting them while losing sight of the long-term vision. Every data science project is different and might be managed using different frameworks and processes. However, all projects share similar steps (Figure 16-2).
Figure 16-2

Iterative data science life cycle process

The process usually begins with defining the business or research objectives and producing the artifacts that properly define the problem we are trying to solve. This leads to a clear understanding of the data that will be required, which then expands to an analysis of data sources, the technical expertise and cost required to obtain the data, and an evaluation of how well the data will support reaching the business objective. Once the data has been obtained, we might need to clean, preprocess, and, in some cases, combine multiple data sources to enrich the quality of data.

The next step in the process is model creation. Based on the business objectives and technological constraints, we decide what kind of solutions might be applicable to this problem. We often begin with simple experiments with basic feature engineering and out-of-the-box solutions and then proceed to more thorough model development. Depending on the type of data, the chosen solution, and the available computation power, development and training can take anywhere from hours to days. This is closely tied with thorough evaluation and tuning.

This life cycle is not a rigid structure but shows the process at a top level. The aim of such processes is to provide a standard set of steps involved, along with details about information required for each such step, and the deliverables and documentations that are produced. One such highly popular framework is CRISP-DM.

CRISP-DM Process

CRoss Industry Standard Process for Data Mining (CRISP-DM) is a process framework that defines the common tasks in data-intensive projects, organized as a series of phases, with the aim of creating repeatable processes for data mining applications. It is an open standard that is developed and followed by hundreds of large enterprises around the world. It was originally devised in 1996, which led to the creation of the CRISP-DM Special Interest Group. The group obtained funding from the European Commission and held a series of workshops that over the past decades have been defining and refining the process and the artifacts involved.
Figure 16-3

CRISP-DM methodology

Figure 16-3 shows how the process model is designed at four levels of abstraction. At the top level are the phases. Each phase consists of several generic tasks, which are meant to be well defined, complete, and stable. Generic tasks are in turn carried out as specialized tasks. For example, there may be a generic task called Collect User Data, which may require specialized tasks like (1) export users table from the database, (2) find user location using external service, and (3) download data from user’s LinkedIn profile using the API. The fourth level covers the actual implementation of the specialized tasks, including a record of the actions, decisions, and results of the task that is performed.
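To make the four levels concrete, the Collect User Data example can be written down as a nested structure. This is only a minimal sketch: the placement of the task under the Data Understanding phase, the dictionary layout, and the fields of the process-instance record are assumptions made for illustration and are not prescribed by CRISP-DM.

```python
# Hypothetical sketch: one CRISP-DM phase broken down across the four levels.
crisp_dm_breakdown = {
    "phase": "Data Understanding",             # level 1 (assumed placement)
    "generic_tasks": [                          # level 2
        {
            "name": "Collect User Data",
            "specialized_tasks": [              # level 3
                "Export users table from the database",
                "Find user location using external service",
                "Download data from user's LinkedIn profile using the API",
            ],
            # level 4: records of what was actually done (illustrative fields)
            "process_instances": [
                {"action": "exported users table", "decision": "kept all rows", "result": "ok"},
            ],
        }
    ],
}


def count_specialized_tasks(breakdown):
    """Count the specialized tasks across all generic tasks in a phase."""
    return sum(len(task["specialized_tasks"]) for task in breakdown["generic_tasks"])


print(count_specialized_tasks(crisp_dm_breakdown))
```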

There are six phases of the CRISP-DM model. The following sections describe each one.

Phase 1: Business Understanding

Before diving deeper into the project, the first step is to understand the end goal of the project from the stakeholder’s point of view. There might be conflicting objectives, which, if not analyzed at this level, may lead to unnecessary repetition costs. By the end of this phase, we will have a clear set of business objectives and business success criteria. We also analyze resource availability and risks as part of assessing the situation. After this, we define the goals of the project from a technical data mining perspective and produce a project plan.

Phase 2: Data Understanding

This phase involves tasks for collecting initial data. Most projects require data from multiple sources which need to be integrated; that can be covered either in this phase or the next. However, the important part here is to create an initial data collection report that explains how the data was acquired and what problems were encountered. This phase also covers data exploration and describing the data along with verifying data quality. Any potential data quality issues must be addressed.

Phase 3: Data Preparation

The data preparation phase assumes that initial data has been obtained and studied and potential risks have been planned for. The end goal of this phase is to produce the ready-to-use datasets that will be used for modelling or analysis. An additional artifact will describe the dataset.

As a part of this phase, select the datasets – and for each dataset, document the reasons for inclusion and exclusion. This is followed by data cleaning, in which the data quality is improved. This may involve transformation, deriving more attributes or enriching the datasets. After cleaning, transformation, and integration, the data is formatted to make it simpler to load the data in the future stages.

Phase 4: Modelling

Modelling is the phase in which you build and assess various models based on the different modelling and machine learning techniques we have studied so far. At the first step, the modelling technique to be used is selected. There will be different instances of this task based on different modelling methods or algorithms that you wish to explore and evaluate. You will generate a test design, build the model, assess it thoroughly, and evaluate how closely the model fits the technical needs of the system.

Phase 5: Evaluation

The evaluation phase looks more broadly at which model best meets the business needs. The tasks in this phase test the models in a real application and assess the results generated. Next comes the review process task, in which we do a thorough review of the data mining engagement in order to determine if any important factor or task has been overlooked. Finally, we determine the next steps: whether the models require further tuning or can move to deployment. At the end of this phase, we have documented the quality of the models and a list of possible actions that should be taken next.

Phase 6: Deployment

The final phase, deployment, is the phase that brings the work done so far into actual use. This phase varies widely based on the business needs, organization policies, and engineering needs. It begins with planning the deployment, which involves developing a deployment plan containing the strategy for deployment. We also need to prepare a thorough monitoring and maintenance plan to avoid issues after the end-to-end project has been launched. Finally, the project team documents a summary of the project and conducts a project review to discuss and document what went well, what could have been better, and how to improve in the future.

In practice, most organizations use these phases as guidelines and create their own processes based on their budgets, governance requirements, and needs. Many small-scale teams might not follow these steps and get caught in a long loop of repeated development and improvement iterations, unable to avoid pitfalls that could have been planned for and handled had these steps been studied.

In the next part of this chapter, we will study the technical aspects of development and deployment of data science and AI projects.

How ML Applications Are Served

Once a model has been created, it has to be integrated with a larger enterprise application. The most common way of serving models is as a service or a microservice. The aim of this kind of architecture (Figure 16-4) is to encapsulate the whole workflow of the prediction/inference process, including data preparation, feature extraction, loading a previously created model, predicting the output values, and, often, logging, behind an easy-to-use interface. These interfaces are most commonly served as an endpoint in a web server.
Figure 16-4

Serving ML models as a microservice

In larger applications, these servers are often hosted in the cloud, commonly using Docker for easy deployment. The practice of deploying, monitoring, and maintaining machine learning models for AI applications is being formalized into a well-structured discipline known as MLOps.

In the next few pages, we will take a small project that will be eventually hosted as an ML application.

Learning with an Example

In this mini-project, we will build a sentiment analysis tool using PyTorch with an aim to experiment with model architecture to achieve relatively good performance, save the parameters, and host it using Flask.

The first attempt toward sentiment analysis was the General Inquirer system, published in 1961. The typical task in sentiment analysis is text polarity classification, where the classes of interest are positive and negative, sometimes with a neutral class. With advances in computational capabilities, machine learning algorithms, and later deep learning, sentiment analysis has become much more accurate and is used in many more situations.

Defining the Problem

Sentiment analysis is a vast field that covers the problem of identifying emotions, opinions, moods, and attitudes. There are also many names and slightly different tasks, for example, sentiment analysis, opinion mining, opinion extraction, sentiment mining, subjectivity analysis, affect analysis, emotion analysis, review mining, etc.

In this problem, we will build a model for classifying whether a movie review is positive or negative. In traditional machine learning approaches, feature engineering would be the primary task. A feature vector is a representation of actual content (document, tweet, etc.) that the classification algorithm takes as an input. The purpose of a feature, other than being an attribute, would be much easier to understand in the context of a problem. A feature is a characteristic that might help when solving the problem.

In a deep learning solution, we can either use embeddings or sequence of characters. But first, we have to obtain the data.

In some cases, you will collect the data through your database logs, or hire a data gathering team, or like in our case, get lucky and stumble over a freely available dataset. A 50,000-item movie review dataset1 has been gathered and prepared by Stanford, published in 2011.

Data

You can download the data from their webpage, though the solutions we’re going to explain here will work equally well with other datasets, including ones built from product reviews or social media text. The dataset downloaded from the website is a compressed tar file, which after decompression expands into two folders, namely, test and train, and some additional files containing information about the dataset. An alternate copy of the dataset that has been preprocessed is available on Kaggle,2 which is shared by Lakshmipathi N.

The dataset contains 50,000 reviews, each of which is marked as positive or negative. This gives an indication about the last layer of the neural network structure – all we need is a single node with sigmoid activation function. If there were more than two classes, say, positive, negative, or neutral, we would create three nodes, each representing a sentiment class label. The node with the highest value would indicate the predicted result.
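The difference between the two output-layer designs can be sketched without any deep learning library. The following is a minimal illustration in plain Python; the raw scores are made-up values, and the use of softmax to normalize the three-node case is a common choice assumed here rather than something the chapter prescribes.

```python
import math


def sigmoid(z):
    """Squash a raw score into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    shifted = [s - max(scores) for s in scores]   # subtract the max for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


# Binary case: one output node; values above 0.5 are read as "positive".
raw_score = 0.8                        # hypothetical pre-activation value
p_positive = sigmoid(raw_score)
binary_label = "positive" if p_positive > 0.5 else "negative"

# Three-class case: one node per class; the node with the highest value wins.
classes = ["positive", "negative", "neutral"]
raw_scores = [0.2, 1.5, -0.3]          # hypothetical pre-activation values
probs = softmax(raw_scores)
multi_label = classes[probs.index(max(probs))]

print(binary_label, multi_label)
```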

Assuming that you have downloaded the dataset containing one review and sentiment per row in a CSV, we can start exploring it.
import pandas as pd
dataset = "data/IMDB Dataset.csv"
df = pd.read_csv(dataset, sep=",")
sample_row = df.sample()
sample_row['review'].values
Out:
array(["Etienne Girardot is just a character actor--the sort of person people almost never would know by name. However, he once again plays the coroner--one of the only actors in the Philo Vance films that played his role more than once.

The output has been truncated here for brevity. You can see the labelled sentiment for this review using sample_row['sentiment'], which is positive for this sample.

To prepare for a machine learning experiment, we’ll divide the data into training and testing datasets.
from sklearn.model_selection import train_test_split
X,y = df['review'].values,df['sentiment'].values
X_traindata,X_testdata,y_traindata,y_testdata = train_test_split(X,y,stratify=y)
print(f'Training Data shape : {X_traindata.shape}')
print(f'Test Data shape     : {X_testdata.shape}')
print(f'Training Target shape : {y_traindata.shape}')
print(f'Test Target shape     : {y_testdata.shape}')
Out:
Training Data shape : (37500,)
Test Data shape     : (12500,)
Training Target shape : (37500,)
Test Target shape     : (12500,)

We know that most models require the data to be converted to a particular format. In our RNN-based model, we will need to convert the data into a sequence of numbers – where each number represents a word in the vocabulary.

In preprocessing stage, we will need to (1) convert all the words to lowercase, (2) tokenize and clean the string, (3) remove stop words, and (4) based on our knowledge of words in the training corpus, prepare a dictionary of words and convert all the words to numbers based on the dictionary.
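Before looking at the full implementation, the four steps can be tried on a toy corpus. This is a simplified sketch, not the chapter's tokenizer: the stop word set is a tiny hand-written stand-in for a real list, the function names are hypothetical, and the vocabulary is limited to two words only to keep the example small.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "was", "a", "and", "it", "is"}   # tiny illustrative list


def clean(word):
    """Lowercase a token and strip everything except letters."""
    return re.sub(r"[^a-z]", "", word.lower())


def tokenize(sentence):
    """Steps 1-3: lowercase, clean, and drop stop words and empty tokens."""
    tokens = (clean(w) for w in sentence.split())
    return [t for t in tokens if t and t not in STOP_WORDS]


def build_vocab(sentences, max_words):
    """Step 4a: map the most frequent words to indices starting at 1 (0 is left for padding)."""
    counts = Counter(t for s in sentences for t in tokenize(s))
    return {w: i + 1 for i, (w, _) in enumerate(counts.most_common(max_words))}


def encode(sentence, vocab):
    """Step 4b: replace each known word with its index; unknown words are skipped."""
    return [vocab[t] for t in tokenize(sentence) if t in vocab]


train = ["The movie was great, great fun!", "The plot was dull and the movie dragged."]
vocab = build_vocab(train, max_words=2)
encoded = encode("A great movie, great cast", vocab)
print(vocab)
print(encoded)
```

Words outside the vocabulary ("cast" here) simply disappear from the encoded sequence, which is the same behavior as ignoring everything beyond the most frequent words.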

We will also convert the sentiment labels so that negative is represented by a 0 and positive by a 1. In this implementation, we are limiting the vocabulary size to 2000 – thus, the most frequent 2000 words will be considered while creating the sequences and others will be ignored. You can experiment with changing this number based on computational capacity and the target quality of results. The implementation is shown here:
import re
import numpy as np
from collections import Counter
from nltk.corpus import stopwords
def preprocess_string(s):
    # Remove every character that is not a word character or whitespace
    s = re.sub(r"[^\w\s]", '', s)
    # Remove all whitespace
    s = re.sub(r"\s+", '', s)
    # Remove digits
    s = re.sub(r"\d", '', s)
    return s
def mytokenizer(x_train,y_train,x_val,y_val):
    word_list = []
    stop_words = set(stopwords.words('english'))
    for sent in x_train:
        for word in sent.lower().split():
            word = preprocess_string(word)
            if word not in stop_words and word != '':
                word_list.append(word)
    # Keep the 2000 most frequent words; indices start at 1 so that 0 can be used for padding
    corpus = Counter(word_list)
    corpus_ = sorted(corpus,key=corpus.get,reverse=True)[:2000]
    onehot_dict = {w:i+1 for i,w in enumerate(corpus_)}
    # Convert each review into a list of vocabulary indices, skipping unknown words
    final_list_train,final_list_test = [],[]
    for sent in x_train:
        final_list_train.append([onehot_dict[preprocess_string(word)] for word in sent.lower().split() if preprocess_string(word) in onehot_dict.keys()])
    for sent in x_val:
        final_list_test.append([onehot_dict[preprocess_string(word)] for word in sent.lower().split() if preprocess_string(word) in onehot_dict.keys()])
    encoded_train = [1 if label =='positive' else 0 for label in y_train]
    encoded_test = [1 if label =='positive' else 0 for label in y_val]
    # dtype=object because the encoded reviews have different lengths
    return np.array(final_list_train, dtype=object), np.array(encoded_train),np.array(final_list_test, dtype=object), np.array(encoded_test),onehot_dict
We can now prepare the training and test arrays using
X_train,y_train,X_test,y_test,vocab = mytokenizer(X_traindata,y_traindata,X_testdata,y_testdata)
There's a possibility that you might get an error message that looks like this:
LookupError:
***********************************************************
  Resource stopwords not found.
  Please use the NLTK Downloader to obtain the resource:
  >>> import nltk
  >>> nltk.download('stopwords')
  For more information see: https://www.nltk.org/data.html
  Attempted to load corpora/stopwords
  Searched in:
    - 'C:\Users\JohnDoe/nltk_data'
    - 'C:\Users\JohnDoe\Anaconda3\nltk_data'
    - 'C:\Users\JohnDoe\Anaconda3\share\nltk_data'
    - 'C:\Users\JohnDoe\Anaconda3\lib\nltk_data'
    - 'C:\Users\JohnDoe\AppData\Roaming\nltk_data'
    - 'C:\nltk_data'
    - 'D:\nltk_data'
    - 'E:\nltk_data'
************************************************************

This means the list of stopwords has not been downloaded for nltk yet. You can install it using nltk.download('stopwords'), as the message suggests. This is required only once in your Python environment. For more details, you can refer to the NLTK3 documentation.

Alternatively, you can construct a list of stopwords and add the logic to remove the words present in the stopwords list.
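A hand-built list could look like the following minimal sketch. The words in the set and the helper name remove_stop_words are assumptions chosen for illustration; a real list would be longer and tuned to the corpus. In mytokenizer, such a set could take the place of the stop_words variable.

```python
# A small hand-written stop word set (illustrative only, much shorter than NLTK's list).
CUSTOM_STOP_WORDS = {
    "a", "an", "the", "and", "or", "but", "is", "was", "were",
    "it", "this", "that", "of", "to", "in",
}


def remove_stop_words(words, stop_words=CUSTOM_STOP_WORDS):
    """Return the words that are not in the stop word set, preserving their order."""
    return [w for w in words if w.lower() not in stop_words]


print(remove_stop_words("this was the best film of the year".split()))
```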

Before proceeding further, we should verify if the objects are in the right shape and size.

vocab should be a dictionary of length 2000 (the number we limit the vocabulary size to). X_train, y_train, X_test, and y_test should be numpy.ndarray with the size same as the split of the original dataset.

We are planning to use an RNN for this task, and the batches we feed it need sequences of a fixed length. If a review is too short, we will have to pad the sequence, typically with 0s. If a review is too long, we have to truncate it to a maximum length that we decide on; this might happen more often during the prediction phase, when we encounter previously unseen data. To decide that length, let's explore the training dataset and see how long the reviews are.
review_length = [len(i) for i in X_train]
print ("Average Review Length : {} Maximum Review Length : {} ".format(pd.Series(review_length).mean(), pd.Series(review_length).max()))
Out:
Average Review Length : 81.74666666666667
Maximum Review Length : 662

The maximum looks quite long, but the average is only about 82 tokens: in general, we’ll have a lot of reviews that are short and very few that are very long. It is reasonable to truncate the review sequence length to 200 words. We will, of course, lose information in reviews that are more than 200 words long, but we assume they will be quite rare and should not have much impact on the performance of the model.
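The assumption that long reviews are rare can be checked rather than taken on faith. The sketch below uses made-up lengths and two hypothetical helper functions; with the real data you would pass the review_length list instead of toy_lengths.

```python
def share_longer_than(lengths, cutoff):
    """Fraction of sequences whose length exceeds the cutoff."""
    return sum(1 for n in lengths if n > cutoff) / len(lengths)


def percentile(lengths, q):
    """Nearest-rank percentile (q between 0 and 100) of a list of lengths."""
    ordered = sorted(lengths)
    rank = max(1, -(-q * len(ordered) // 100))   # ceil(q * n / 100), at least 1
    return ordered[rank - 1]


# Hypothetical review lengths, used only to exercise the helpers.
toy_lengths = [35, 60, 72, 81, 90, 110, 150, 190, 240, 662]
print(share_longer_than(toy_lengths, 200))
print(percentile(toy_lengths, 90))
```

If the share of reviews above the chosen cutoff turns out to be large, a higher cutoff (or a percentile-based one) may be a better trade-off between information loss and computation.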

Each review, if long, should be limited to 200 words. What if it is shorter than 200 words? We will pad it with empty cells (or zeros).
def pad(sentences, seq_len):
    features = np.zeros((len(sentences), seq_len),dtype=int)
    for ii, review in enumerate(sentences):
        if len(review) != 0:
            features[ii, -len(review):] = np.array(review)[:seq_len]
    return features
We can test it before using with the training dataset. In the following lines, we’ll pass a data row with ten elements, and the function will be called to pad it to make its length 20.
test = pad(np.array([list(range(10))]), 20)
test
Out: array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
If we instead send the numbers from 0 to 29, the function keeps the first 20 and truncates the ten trailing numbers.
test = pad(np.array([list(range(30))]), 20)
test
Out: array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
Test and train dataset can be padded now.
X_train = pad(X_train,200)
X_test = pad(X_test,200)
The data is now ready. We can proceed to define the neural network. If we have a GPU available, we'll set our device to GPU. We'll use this variable later in our code.
import torch
is_cuda = torch.cuda.is_available()
if is_cuda:
    device = torch.device("cuda")
else:
    device = torch.device("cpu")
print (device)
Out:
cuda

Preparing the Model

We will first need to convert the datasets into tensors. We can use torch.from_numpy() to create tensors, followed by TensorDataset() to pack the data values and labels together. We can then define DataLoaders to load the data for larger experiments.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader
# create Tensor datasets
train_data = TensorDataset(torch.from_numpy(X_train), torch.from_numpy(y_train))
valid_data = TensorDataset(torch.from_numpy(X_test), torch.from_numpy(y_test))
batch_size=50
train_loader = DataLoader(train_data, shuffle=True, batch_size=batch_size)
valid_loader = DataLoader(valid_data, shuffle=True, batch_size=batch_size)

The model is a simple model that will have an input layer, followed by the LSTM layer, followed by a one-unit output layer with sigmoid activation. We will use a dropout layer for basic regularization to avoid overfitting.

The input layer is an embedding layer with shape representing the batch size, layer size, and the sequence length. In the model class, the forward() method will implement the forward propagation computations. We will also implement init_hidden() method to initialize the hidden state of LSTM to zeros. The hidden state stores the internal state of the RNN from predictions made on previous tokens in the current sequence to maintain the concept of memory within the sequence. However, it is important that when we read the first token of the next review, the state should be reset, which will be updated and used by the rest of the tokens. The implementation is given here:
class SentimentAnalysisModel(nn.Module):
    def __init__(self, no_layers, vocab_size, hidden_dim, embedding_dim, drop_prob=0.5, output_dim=1):
        super(SentimentAnalysisModel,self).__init__()
        # output_dim is an argument (default 1) instead of being read from a global variable
        self.output_dim = output_dim
        self.hidden_dim = hidden_dim
        self.no_layers = no_layers
        self.vocab_size = vocab_size
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(input_size=embedding_dim, hidden_size=self.hidden_dim, num_layers=no_layers, batch_first=True)
        # Note: the dropout rate is fixed at 0.3 here; drop_prob is accepted but not used
        self.dropout = nn.Dropout(0.3)
        self.fc = nn.Linear(self.hidden_dim, output_dim)
        self.sig = nn.Sigmoid()
    def forward(self,x,hidden):
        batch_size = x.size(0)
        embeds = self.embedding(x)  # (batch, seq_len, embedding_dim)
        lstm_out, hidden = self.lstm(embeds, hidden)
        # Flatten so that the linear layer is applied to every time step
        lstm_out = lstm_out.contiguous().view(-1, self.hidden_dim)
        out = self.dropout(lstm_out)
        out = self.fc(out)
        sig_out = self.sig(out)
        sig_out = sig_out.view(batch_size, -1)
        sig_out = sig_out[:, -1] # keep only the output at the last time step
        return sig_out, hidden
    def init_hidden(self, batch_size):
        ''' Initializes hidden state and cell state to zeros '''
        h0 = torch.zeros((self.no_layers, batch_size, self.hidden_dim)).to(device)
        c0 = torch.zeros((self.no_layers, batch_size, self.hidden_dim)).to(device)
        hidden = (h0,c0)
        return hidden
We can now initialize the model object. The hyperparameters can be defined separately for fine-tuning the model later. no_layers will be used to define stacking of RNNs. vocab_size is being incremented to adjust the shape of the embedding layer to accommodate 0s for padded reviews. output_dim is 1 to have only one node in the output layer that will contain a number that can be seen as probability of a review being positive. hidden_dim is used to specify the size of hidden state in the LSTM.
no_layers = 2
vocab_size = len(vocab) + 1
embedding_dim = 64
output_dim = 1
hidden_dim = 256
model = SentimentAnalysisModel (no_layers,  vocab_size, hidden_dim, embedding_dim, drop_prob=0.5)
model.to(device)
print(model)
The model output should look like the following:
SentimentAnalysisModel(
  (embedding): Embedding(2001, 64)
  (lstm): LSTM(64, 256, num_layers=2, batch_first=True)
  (dropout): Dropout(p=0.3, inplace=False)
  (fc): Linear(in_features=256, out_features=1, bias=True)
  (sig): Sigmoid()
)
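As a quick sanity check on the size of this network, the number of trainable parameters can be worked out by hand from the printed shapes. The sketch below assumes PyTorch's standard LSTM layout (four gates per layer, input-to-hidden and hidden-to-hidden weights, and two bias vectors per layer) and a linear layer with a bias; the helper name lstm_layer_params is hypothetical.

```python
def lstm_layer_params(input_size, hidden_size):
    """Parameters in one LSTM layer: 4 gates, input and recurrent weights, two bias vectors."""
    return 4 * hidden_size * (input_size + hidden_size) + 2 * 4 * hidden_size


vocab_size, embedding_dim, hidden_dim, no_layers, output_dim = 2001, 64, 256, 2, 1

embedding_params = vocab_size * embedding_dim
# The first LSTM layer reads embeddings; each later layer reads the previous layer's hidden state.
lstm_params = lstm_layer_params(embedding_dim, hidden_dim) + sum(
    lstm_layer_params(hidden_dim, hidden_dim) for _ in range(no_layers - 1)
)
fc_params = hidden_dim * output_dim + output_dim

total = embedding_params + lstm_params + fc_params
print(embedding_params, lstm_params, fc_params, total)
```

Most of the parameters sit in the two LSTM layers, so hidden_dim and no_layers are the hyperparameters that affect model size the most.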

Let’s define the training loop. We will use the binary cross entropy loss function, which is a good choice for simple binary classification problems. We will set the learning rate to 0.001 and use the Adam optimization algorithm as the optimizer.
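To see what the loss measures, binary cross entropy can be computed by hand for a few predictions. This is a plain-Python sketch with made-up probabilities; the eps clamp is a simple guard against log(0) and is an assumption of this example, not a description of how PyTorch's BCELoss handles extreme values internally.

```python
import math


def binary_cross_entropy(predictions, labels, eps=1e-12):
    """Mean binary cross entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(predictions, labels):
        p = min(max(p, eps), 1.0 - eps)           # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


# Hypothetical sigmoid outputs for four reviews and their true labels.
labels = [1, 0, 1, 0]
confident_right = binary_cross_entropy([0.9, 0.1, 0.8, 0.2], labels)
confident_wrong = binary_cross_entropy([0.1, 0.9, 0.2, 0.8], labels)
print(confident_right < confident_wrong)
```

Predictions that are confidently wrong are penalized far more heavily than predictions that are confidently right, which is what pushes the output toward the correct label during training.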

The training loop can be made to run for a large number of epochs. We will keep a track of accuracy and losses over each epoch to see how the performance improves over multiple iterations of training.

The accuracy function rounds the value at the output layer to 0 or 1 and compares it with the label. It returns the number of correct predictions in a batch; dividing the running total by the number of examples gives the ratio of correctly labelled data points.
def acc(pred,label):
    return torch.sum(torch.round(pred.squeeze()) == label.squeeze()).item()
The training loop is implemented as follows:
lr=0.001
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
clip = 5
epochs = 10
valid_loss_min = np.inf
# train for some number of epochs
epoch_tr_loss,epoch_vl_loss = [],[]
epoch_tr_acc,epoch_vl_acc = [],[]
for epoch in range(epochs):
    train_losses = []
    train_acc = 0.0
    model.train()
    # initialize hidden state
    h = model.init_hidden(batch_size)
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        # Creating new variables for the hidden state, otherwise
        # we'd backprop through the entire training history
        h = tuple([each.data for each in h])
        model.zero_grad()
        output,h = model(inputs,h)
        loss = criterion(output.squeeze(), labels.float())
        loss.backward()
        train_losses.append(loss.item())
        # calculating accuracy
        accuracy = acc(output,labels)
        train_acc += accuracy
        #`clip_grad_norm` helps prevent the exploding gradient problem in RNNs / LSTMs.
        nn.utils.clip_grad_norm_(model.parameters(), clip)
        optimizer.step()
    val_h = model.init_hidden(batch_size)
    val_losses = []
    val_acc = 0.0
    model.eval()
    with torch.no_grad():  # gradients are not needed during validation
        for inputs, labels in valid_loader:
            val_h = tuple([each.data for each in val_h])
            inputs, labels = inputs.to(device), labels.to(device)
            output, val_h = model(inputs, val_h)
            val_loss = criterion(output.squeeze(), labels.float())
            val_losses.append(val_loss.item())
            accuracy = acc(output,labels)
            val_acc += accuracy
    epoch_train_loss = np.mean(train_losses)
    epoch_val_loss = np.mean(val_losses)
    epoch_train_acc = train_acc/len(train_loader.dataset)
    epoch_val_acc = val_acc/len(valid_loader.dataset)
    epoch_tr_loss.append(epoch_train_loss)
    epoch_vl_loss.append(epoch_val_loss)
    epoch_tr_acc.append(epoch_train_acc)
    epoch_vl_acc.append(epoch_val_acc)
    print(f'Epoch {epoch+1}')
    print(f'train_loss : {epoch_train_loss} val_loss : {epoch_val_loss}')
    print(f'train_accuracy : {epoch_train_acc*100} val_accuracy : {epoch_val_acc*100}')
    if epoch_val_loss <= valid_loss_min:
        torch.save(model.state_dict(), 'data/temp/state_dict.pt')
        print('Validation loss change ({:.6f} --> {:.6f}).  Saving model ...'.format(valid_loss_min,epoch_val_loss))
        valid_loss_min = epoch_val_loss
    print(' ')
We have added enough print statements to show a clear picture about how much the model learns in each epoch. You will be able to see the logs as follows:
Epoch 1
train_loss : 0.6903249303499858 val_loss : 0.6897501349449158
train_accuracy : 54.666666666666664 val_accuracy : 52.400000000000006
Validation loss change (inf --> 0.689750).  Saving model ...
Epoch 2
train_loss : 0.6426109115282694 val_loss : 0.7218503952026367
train_accuracy : 64.8 val_accuracy : 57.199999999999996
We can also visualize these in a chart as shown in Figure 16-5.
import matplotlib.pyplot as plt
fig = plt.figure(figsize = (20, 6))
plt.subplot(1, 2, 1)
plt.plot(epoch_tr_acc, label='Train Acc')
plt.plot(epoch_vl_acc, label='Validation Acc')
plt.title("Accuracy")
plt.legend()
plt.grid()
plt.subplot(1, 2, 2)
plt.plot(epoch_tr_loss, label='Train loss')
plt.plot(epoch_vl_loss, label='Validation loss')
plt.title("Loss")
plt.legend()
plt.grid()
plt.show()
Figure 16-5

Accuracy and loss over epochs

For further improvements and tuning, you can play with the model architecture and hyperparameters and, the easiest of all, increase the number of epochs or add more labelled data.

Serializing for Future Predictions

Usually, you’d play around with modifications to the network architecture, to how you create features (the vocabulary), and to other hyperparameters. Once you’ve got accuracy that is high enough to be relied on in the application, you would save the model state so that you don’t have to repeat the computationally intensive training process every time you want to predict the sentiment of a sentence.

Assuming that we will not use the training method again, here are the items that need to be stored for the prediction phase:
  • Logic to convert sentence into sequence

  • Vocabulary dictionary containing mapping from words to numbers

  • Network architecture and forward propagation computations

Python’s pickle module is often used to serialize objects and store them permanently on disk. PyTorch provides a method to save the model parameters, which internally uses pickle by default; models, tensors, and dictionaries of all kinds of objects can be saved using this function.
torch.save(model.state_dict(), 'model_path.pt')
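state_dict is a Python dictionary object that maps each layer to its parameter tensor. In the future, you can load the parameters by creating the model with the same hyperparameters as before and restoring the saved state:
model = SentimentAnalysisModel(no_layers, vocab_size, hidden_dim, embedding_dim)
model.load_state_dict(torch.load('model_path.pt', map_location=device))
model.to(device)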

Remember, this only saves the model parameters. You would still need the model definition that you have specified in the code before.
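The vocabulary dictionary is not part of state_dict, so it has to be persisted separately. The following is one possible sketch that uses JSON instead of pickle because the mapping is a plain dictionary of strings to integers and the resulting file stays human-readable; the file name, its location, and the helper names are assumptions for illustration.

```python
import json
import os
import tempfile


def save_vocab(vocab, path):
    """Write the word-to-index mapping to disk as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(vocab, f)


def load_vocab(path):
    """Read the word-to-index mapping back from disk."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


toy_vocab = {"movie": 1, "great": 2, "plot": 3}                  # stand-in for the real vocab
vocab_path = os.path.join(tempfile.gettempdir(), "vocab.json")   # hypothetical location
save_vocab(toy_vocab, vocab_path)
restored = load_vocab(vocab_path)
print(restored == toy_vocab)
```

Whichever format you pick, the vocabulary loaded at prediction time must be the one that was used during training, since the embedding layer was trained on exactly those indices.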

The inference method can use the vocabulary dictionary to convert sequences of word tokens to sequences of numbers. We will pad this up to the length we decided before (200) and use that as input for the model. This method is implemented as follows:
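def inference(text):
    # Lowercase first, as in training, then map known words to their vocabulary indices
    word_seq = np.array([vocab[preprocess_string(word)] for word in text.lower().split() if preprocess_string(word) in vocab.keys()])
    word_seq = np.expand_dims(word_seq,axis=0)
    # Pad or truncate to the same sequence length used in training (200)
    padded =  torch.from_numpy(pad(word_seq,200))
    inputs = padded.to(device)
    batch_size = 1
    h = model.init_hidden(batch_size)
    h = tuple([each.data for each in h])
    model.eval()  # disable dropout for prediction
    with torch.no_grad():
        output, h = model(inputs, h)
    return(output.item())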
You can simply call this method to find the sentiment score. Make sure to call model.eval() if you want to make an inference.
inference("The plot was deeply engaging and I couldn't move")

This will preprocess the sentence, split it into tokens, convert them into a sequence of vocabulary indices, pass the input sequence to the network, and return the value obtained in the output layer after a forward propagation pass. This returned a value of 0.6632, which denotes a positive sentiment. If the use case requires, you can add a conditional statement to return a string containing the word “positive” or “negative” instead of a number.

Hosting the Model

One of the most popular ways to use a trained model in a larger application is to host the model as a microservice. This means a small HTTP server will be used that can accept GET requests.

In this example, we can build and create a server that can accept GET data, which will be a review. The server will read the data and respond with a sentiment label.

Hello World in Flask

If you haven’t installed Flask, you can install it using
pip install Flask
Here’s a very simple hello world application in Flask. You will need to create an object of Flask() and implement a function that is bound to an address defined by @app.route(). Here, we’re defining an endpoint that accepts requests at http://server-url:5000/hello and returns a string saying Hello World!. The port is specified in the call to app.run().
from flask import Flask
app = Flask(__name__)
@app.route('/hello')
def hello():
    return 'Hello World!'
app.run(port=5000)
If you run it on Jupyter or terminal, you will be able to see the server logs:
* Serving Flask app "__main__" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
You can now navigate to http://127.0.0.1:5000/hello on the browser and confirm the output. The server logs will confirm that.
127.0.0.1 - - [28/Jun/2021 10:10:00] "GET /hello HTTP/1.1" 200
To host the model, you can create a separate Python file and define a new endpoint:
from flask import Flask, request
app = Flask(__name__)
@app.route("/getsentiment", methods=['GET'])
def getsentiment():
  # Read the review from the query string, for example ?reviewtext=...
  args = request.args
  text = args['reviewtext']
  score = inference(text)
  label = 'positive' if score >0.5 else 'negative'
  return {'sentimentscore': score, 'sentimentlabel':label}

The front-end application can send a request to http://server:port/getsentiment, passing the data as the reviewtext argument, and receive a JSON dictionary with sentimentscore and sentimentlabel.
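From the caller's side, a request to this endpoint might be put together as in the sketch below, which uses only the Python standard library. The host and port, the timeout value, and the function names build_sentiment_url and fetch_sentiment are assumptions; the actual network call is left commented out because it needs the server to be running.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_sentiment_url(base_url, review_text):
    """Build the GET URL, encoding the review so that spaces and punctuation survive."""
    return f"{base_url}/getsentiment?{urlencode({'reviewtext': review_text})}"


def fetch_sentiment(base_url, review_text, timeout=10):
    """Call the service and return the decoded JSON response as a dictionary."""
    with urlopen(build_sentiment_url(base_url, review_text), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# The address below is an assumption; use wherever your Flask server is running.
url = build_sentiment_url("http://127.0.0.1:5000", "The plot was deeply engaging")
print(url)
# result = fetch_sentiment("http://127.0.0.1:5000", "The plot was deeply engaging")
# result would contain the keys 'sentimentscore' and 'sentimentlabel'
```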

What’s Next

The fields of machine learning, artificial intelligence, and data science have been evolving over the past decades and will continue to do so as newer hardware technologies and algorithmic perspectives emerge.

AI is not a magic wand that will solve our unsolvable problems – but a well-structured suite of concepts, theories, and techniques that help us understand and implement solutions that help the machines learn by looking at the data that we offer them. It is important to understand the implications of potential biases and thoroughly inspect the ethical aspects of the projects and products that are the outcome of our practice. This book serves not as an end but as a handy tool to navigate the steps in your data science journey.
