We’ve already demonstrated how to train a character-level RNN to create original text. Now, we create a word-level RNN to analyze sentiment.
Sentiment analysis is the interpretation and classification of polarity, emotions, and intentions within text data using NLP text analysis tools. Polarity can be positive, negative, or neutral. Emotions can vary across a wide range of feelings such as anger, happiness, frustration, and sadness, to name a few. Intentions can also vary across a range of motives, such as interest or disinterest. A common application of sentiment analysis is identifying customer sentiment toward products, brands, or services through online feedback. General applications include social media monitoring, brand monitoring, customer service, customer feedback, and market research.
For an excellent discussion of sentiment analysis, consult the following URL:
Sentiment analysis is a very common NLP task. Technically, it computationally identifies and categorizes opinions expressed in a text corpus to determine attitude or sentiment. Typically, sentiment analysis is used to determine a positive, negative, or neutral opinion toward a particular topic or product.
Notebooks for chapters are located at the following URL: https://github.com/paperd/tensorflow.
IMDb Dataset
A popular dataset used to practice NLP is the IMDb reviews dataset. IMDb is a benchmark dataset for binary sentiment classification. The dataset contains 50,000 movie reviews labeled as either positive (1) or negative (0). Reviews are preprocessed with each encoded as a sequence of word indexes in the form of integers. Words within the reviews are indexed by their overall frequency within the dataset. The 50,000 reviews are split into 25,000 for training and 25,000 for testing. So we can predict whether a review is positive or negative with classification or other deep learning algorithms.
IMDb is popular because it is simple to use, relatively easy to process, and challenging enough for machine learning aficionados. We enjoy working with IMDb because it’s just plain fun to work with movie data.
To enable the GPU in Colab:
1. Click Runtime in the top-left menu.
2. Click Change runtime type from the drop-down menu.
3. Choose GPU from the Hardware accelerator drop-down menu.
4. Click SAVE.
Import the tensorflow library. If ‘/device:GPU:0’ is displayed, the GPU is active. If an empty string (‘’) is displayed, the regular CPU is active.
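A quick check along these lines (a sketch; tf.test.gpu_device_name returns an empty string when no GPU is available):

```python
import tensorflow as tf

# Report the GPU device name; an empty string means only the CPU is available
device_name = tf.test.gpu_device_name()
if device_name == '/device:GPU:0':
    print('GPU active:', device_name)
else:
    print('CPU active')
```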
Load IMDb as a TFDS
We use the imdb_reviews/subwords8k TFDS so that we train the model on a smaller vocabulary. The subwords8k subset has a vocabulary of roughly 8,000 subword tokens, which means that we are training the model on the most common word pieces in the reviews. It also means that we don’t have to build our own vocabulary dictionary! We can get good performance with this subset and substantially reduce training time. Loading the TFDS also gives us access to the tfds.features.text.SubwordTextEncoder, which is the TFDS text encoder.
We set with_info=True to enable access to information about the dataset and the encoder. We set as_supervised=True so that the returned TFDS has a two-tuple structure (input, label) in accordance with builder.info.supervised_keys. If set to False (the default), the returned TFDS will have a dictionary with all features included. We set shuffle_files=True because shuffling typically improves performance.
Display the Keys
We see that the dataset is split into test, train, and unsupervised samples.
Split into Train and Test Sets
Display the First Sample
The first training example contains an encoded review and a label. The review is already encoded as a tensor of integers with datatype int64. The label is a scalar value of either 0 (negative) or 1 (positive) with datatype int64.
The shape of the review tensor indicates the number of words it contains. For readability, we convert the target tensor to values with the numpy method.
Display Information About the TFDS
Peruse Metadata
Create the Encoder
An encoder is built into the TFDS SubwordTextEncoder. With the encoder, we can easily decode (integer to text) and encode (text to integer). We access the encoder from the dataset’s info object.
Now that the encoder is built, we can use it to vectorize strings and decode vectorized strings back into text strings.
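A minimal sketch of the round trip, assuming the info object returned by tfds.load with with_info=True; the helper name roundtrip is hypothetical:

```python
def roundtrip(info, text):
    # The subword encoder lives on the dataset's info object
    encoder = info.features['text'].encoder
    encoded = encoder.encode(text)    # text -> list of subword ids
    decoded = encoder.decode(encoded) # subword ids -> original text
    return encoded, decoded
```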
Use the Encoder on Samples
Display the first review
Display multiple reviews
We skip the first review because we’ve already seen it.
Finish the Input Pipeline
Create batches of the encoded strings (or reviews) to greatly enhance performance. Since every sequence in a batch must have the same length, use the padded_batch method to zero-pad the sequences so that each review is as long as the longest one in its batch.
Finish the input pipeline
Consult the following URL for updates on padding character tensors:
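The padding behavior can be demonstrated on a toy dataset of variable-length sequences (the sequences and labels here are fabricated stand-ins for the encoded reviews):

```python
import tensorflow as tf

# Toy stand-in for the encoded reviews: variable-length integer sequences
sequences = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
labels = [1, 0, 1]

ds = tf.data.Dataset.from_generator(
    lambda: zip(sequences, labels),
    output_signature=(tf.TensorSpec(shape=(None,), dtype=tf.int64),
                      tf.TensorSpec(shape=(), dtype=tf.int64)))

# Zero-pad each batch so every sequence matches the longest one in the batch
batched = ds.padded_batch(3, padded_shapes=([None], []))
x, y = next(iter(batched))
print(x.shape)  # (3, 4): padded to the longest review in the batch
```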
Create the Model
Create the model
The first layer is an embedding layer. The embedding layer is used to create word vectors for incoming words. During training, representations of word categories (or word vectors) are learned in a way that places similar categories closer to one another. So word vectors can store relationships between words like good and great. Word vectors are dense because the model learns compact representations of word relationships. As a result, word vectors aren’t padded with a huge number of zeros the way one-hot encodings are.
The embedding layer accepts vocabulary size, embedding size, and input shape. We set mask_zero=True so that padding tokens are ignored by all downstream layers. Ignoring padding tokens improves performance.
The next two layers are GRU layers, and the final layer is a single-neuron dense output layer. The output layer uses sigmoid activation to output the estimated probability that the review expresses a positive sentiment toward the movie.
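A minimal sketch of such a model; the layer sizes (embedding dimension of 128 and 128 GRU units) are assumptions consistent with the parameter counts worked out in the model summary:

```python
import tensorflow as tf
from tensorflow import keras

vocab_size = 8185   # vocabulary size reported by the subwords8k encoder
embed_size = 128    # assumed embedding dimension

model = keras.Sequential([
    keras.Input(shape=(None,)),                    # variable-length reviews
    keras.layers.Embedding(vocab_size, embed_size,
                           mask_zero=True),        # ignore padding tokens
    keras.layers.GRU(128, return_sequences=True),  # pass full sequences on
    keras.layers.GRU(128),                         # keep final state only
    keras.layers.Dense(1, activation='sigmoid')    # P(review is positive)
])
```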
Model Summary
The first layer is an embedding. So calculate the number of learnable parameters by multiplying vocabulary size of 8185 by embedding dimension (embed_size) of 128 for a total of 1,047,680.
The second layer is a GRU. The number of learnable parameters is based on the formula 3 × (n² + mn + 2n), where m is the input dimension and n is the output dimension. We multiply by 3 because a GRU has three sets of operations that each require weight matrices of these sizes. The 2n term accounts for the two bias vectors (input and recurrent) used by TensorFlow’s GRU implementation. So we get 99,072 learnable parameters.
* 3 × (128² + 128 × 128 + 2 × 128)
* 3 × (16384 + 16384 + 256)
* 3 × 33024
* 99,072
As we can see, calculating learnable parameters for the second layer is fairly involved. So let’s break it down logically. A GRU layer is a feedforward layer with feedback loops. As in a feedforward network, the 128 outputs from the previous layer connect to the 128 neurons at this layer, giving 128 × 128 input weights. Because of the feedback mechanism of an RNN, the layer’s own 128 outputs are also fed back, adding 128² recurrent weights. Each neuron also carries bias terms; TensorFlow’s GRU uses two bias vectors (input and recurrent), adding 2 × 128 parameters. Finally, a GRU performs three sets of operations (candidate hidden state, reset gate, and update gate), each with its own weights and biases, so we multiply the learnable parameters by 3.
The third layer is a GRU. We get 99,072 learnable parameters because n and m are exactly the same as the second layer. So the calculations are the same.
The final layer is dense. So calculate the number of learnable parameters by multiplying the output dimension of 1 by the input dimension of 128 and adding 1 for the bias term, for a total of 129.
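The arithmetic behind these per-layer counts can be checked directly:

```python
# Verify the per-layer parameter counts worked out above
m = n = 128                        # input and output dimensions of the GRU
gru = 3 * (n * n + m * n + 2 * n)  # three gates: recurrent weights,
                                   # input weights, and two bias vectors
embedding = 8185 * 128             # vocabulary size x embedding dimension
dense = 128 * 1 + 1                # input weights plus one bias

print(embedding, gru, dense)  # 1047680 99072 129
```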
Compile the Model
Train the Model
Generalize on Test Data
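The compile, fit, and evaluate pattern can be sketched with a tiny stand-in model and fabricated data so it runs here; with the real pipeline, pass train_ds and test_ds instead. The optimizer and loss are standard choices for binary sentiment, not necessarily the exact settings used above:

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Tiny stand-in model so the pattern is runnable on its own
model = keras.Sequential([
    keras.Input(shape=(None,)),
    keras.layers.Embedding(100, 8, mask_zero=True),
    keras.layers.GRU(8),
    keras.layers.Dense(1, activation='sigmoid')])

model.compile(loss='binary_crossentropy',  # binary sentiment target
              optimizer='adam',
              metrics=['accuracy'])

# Fabricated reviews (random word indexes) and labels
x = np.random.randint(1, 100, size=(32, 10))
y = np.random.randint(0, 2, size=(32,))
history = model.fit(x, y, epochs=1, verbose=0)
loss, acc = model.evaluate(x, y, verbose=0)
```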
Visualize Training Performance
Visualize training performance
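One way to plot the training curves (a sketch; the history keys and the helper name plot_history are assumptions based on the loss and metric configured for the model):

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen
import matplotlib.pyplot as plt

# history_dict stands in for history.history returned by model.fit
def plot_history(history_dict, path='training.png'):
    for key, values in history_dict.items():
        plt.plot(values, label=key)   # one curve per tracked quantity
    plt.xlabel('epoch')
    plt.legend()
    plt.savefig(path)
    return path

# Fabricated numbers purely to show the call
saved = plot_history({'loss': [0.6, 0.4], 'accuracy': [0.70, 0.85]})
```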
Make Predictions from Fabricated Reviews
Let’s make predictions from reviews that we fabricate. Begin by creating a function that returns the predictions. Since we create our own reviews, the function must convert the text review for TensorFlow consumption.
The function accepts a text review. It begins by encoding the review. It then converts the encoded review to float32. The function ends by making the prediction and returning it to the calling environment. We add a batch dimension of 1 to the encoded text so the TensorFlow model can consume it.
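A sketch of such a function, assuming the model and encoder from the earlier steps; the helper name predict_review is hypothetical:

```python
import tensorflow as tf

def predict_review(model, encoder, text):
    encoded = encoder.encode(text)         # text -> list of subword ids
    tensor = tf.cast(encoded, tf.float32)  # convert to float32
    tensor = tf.expand_dims(tensor, 0)     # add the batch (1) dimension
    pred = model.predict(tensor)
    return tf.squeeze(pred).numpy()        # drop the batch dimension -> scalar
```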
We have a prediction. Predictions greater than 0.5 mean that the review is positive. Otherwise, the review is negative.
The function converts the prediction to a numpy scalar.
The function removes the 1 dimension from the prediction.
As expected, the review is positive.
As expected, the review is negative.
Make Predictions on a Test Data Batch
We can also predict from the test set. Let’s make predictions on the first test batch with the predict method. Since test data is already in tensor form, we don’t need to encode.
Make predictions based on a batch from the test set
We take the first batch from test_ds. We make predictions with the predict method and place them in y_pred_64. Variable y_pred_64 holds 64 predictions because batch size is 64. We then display the first review and its associated label from this batch. Remember that label 1 means the review is positive and label 0 means it is negative. We end by displaying the size of the sample and target to verify that we have 64 examples in our first batch.
If the prediction matches the actual label, it was correct.
Prediction efficacy for five predictions
Prediction Accuracy for the First Batch
Prediction accuracy for the first batch
We begin by traversing the first batch and comparing labels to predictions. If a prediction is correct, we add this information to a list. We continue by counting the number of correct predictions. We end by dividing correct predictions by the batch size to get overall prediction accuracy.
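The thresholding and counting logic can be illustrated with fabricated labels and predictions standing in for the first test batch:

```python
# Toy labels and predictions standing in for the first test batch
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0.9, 0.2, 0.7, 0.4, 0.1, 0.8, 0.6, 0.3]

# A prediction above 0.5 counts as positive; compare against each label
correct = [label == (pred > 0.5) for label, pred in zip(y_true, y_pred)]
accuracy = sum(correct) / len(y_true)
print(accuracy)  # 0.75: 6 of 8 predictions match
```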
Leverage Pretrained Embeddings
Amazingly, we can reuse modules from pretrained models on the IMDb dataset. The TensorFlow Hub project is a library with hundreds of reusable machine learning modules. A module is a self-contained piece of a TensorFlow graph, along with its weights and assets, that can be reused across different tasks in a process known as transfer learning. Transfer learning is a machine learning method where a model developed for a task is reused as the starting point for a model on a different task.
You can browse the library by perusing the following URL:
Once you locate a module, copy the URL into your model. The module is automatically downloaded along with its pretrained weights. A huge advantage of using pretrained models is that we don’t have to create and train our own models from scratch!
Load the IMDb Dataset
We use the full vocabulary because we don’t have to worry about training with it!
Build the Input Pipeline
Create the Pretrained Model
Create the model
The hub.KerasLayer downloads the sentence encoder module. Each string input into this layer is automatically encoded as a 50-dimensional vector; the 50 dimensions are learned embedding features, not individual words. The embedding matrix was pretrained on the 7 billion–word Google News corpus. The next two dense layers are added to provide a basic model for sentiment analysis. Using TF Hub is convenient and efficient because we can reuse what the pretrained model has already learned.
Compile the Model
Train the Model
Training time is substantially reduced!
Make Predictions
Since batch size is 32, we have 32 predictions for each batch.
Misclassifications in the first batch by index
Calculate Prediction Accuracy for the First Batch
Calculate prediction accuracy for the first batch
Instead of finding misclassifications, the code finds correct predictions. The code begins by comparing an actual label to a predicted one. If the actual label is predicted correctly, a Boolean True is appended to a list. Once the first batch is traversed, the number of elements in the list is counted. This count is divided by batch size of 32 to determine accuracy, which is displayed as a percentage.
Explore IMDb with Keras
Since Keras is very popular in industry, we demonstrate how to train on IMDb with keras.datasets. We use the keras.datasets.imdb.load_data function to load the dataset in a format ready for use with neural network and deep learning models.
Loading the Keras IMDb has some advantages. First, words have already been encoded with integers. Second, encoded words are arranged by their absolute popularity in the dataset. So sentences in each review are composed of sequences of integers. Third, calling imdb.load_data the first time downloads IMDb to your computer and stores it in your home directory under ~/.keras/datasets/imdb.pkl as a 32-megabyte file. The imdb.load_data function also provides additional arguments including the number of top words to load (where words with a lower integer are marked as zero in the returned data), the number of top words to skip (to avoid words like the), and the maximum length of reviews to support.
The function loads data into train and test tuples. So train[0] contains training reviews and train[1] contains training labels. And test[0] contains test reviews and test[1] contains test labels. Each review is represented as a numpy array of integers with each integer representing a word. The labels contain lists of integer labels (0 is a negative review and 1 is positive).
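The load and the unpacking of the train and test tuples can be sketched as follows (the first call downloads the dataset):

```python
from tensorflow import keras

# Load the Keras IMDb dataset; each review is already a list of word indexes
(train_reviews, train_labels), (test_reviews, test_labels) = \
    keras.datasets.imdb.load_data()

print(len(train_reviews), len(test_reviews))  # 25000 25000
```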
As expected, we have 25,000 train and 25,000 test reviews.
As expected, we have 25,000 train and 25,000 test labels.
Explore the Train Sample
The dataset is labeled by two categories that represent sentiment of each review. And the train sample contains 88,585 unique words.
We create a list containing the number of words in each review and then find the length of the review with the maximum number of words.
We use the np.where function to find the index. We used double indexing because the function returns a tuple containing a list that holds the index we desire.
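The length and double-indexing patterns can be illustrated with toy reviews standing in for the training sample:

```python
import numpy as np

# Toy stand-in for the training reviews: lists of word indexes
reviews = [[5, 8, 2], [9, 4, 4, 7, 1], [3, 6]]

lengths = np.array([len(r) for r in reviews])
longest = np.max(lengths)
# np.where returns a tuple holding an array, hence the double indexing
index = np.where(lengths == longest)[0][0]
print(longest, index)  # 5 1: the second review is longest, with 5 words
```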
Create a Decoding Function
Function that decodes a review
The function uses the tf.keras.datasets.imdb.get_word_index utility to obtain a dictionary of words and their uniquely assigned integers. The function then creates another dictionary containing key-value groupings as value and key groupings from the first dictionary. Finally, it returns the words based on their IDs (or keys). The indices are offset by 3 because 0, 1, and 2 are reserved indices for padding, start of sequence, and unknown.
Invoke the Decoding Function
Decode the longest review
Since we already know the index of the longest review, we can easily retrieve it from train_reviews. Display a slice of it since the review is pretty long. We can also easily retrieve the label from train_labels. Make the label readable and display it. Finally, display the length of the longest review.
Let’s see what the shortest review looks like. But we can’t do this directly.
Since we don’t know which review is the shortest, we use the np.amin function to return the minimum length.
We use the np.where function to return the index we seek. Since the function returns all reviews that meet the criterion, we grab the first one and display its index.
Display the shortest review
Continue Exploring the Training Sample
First label and its review