BoW classifier using logistic regression

Our model will map a sparse BoW representation to log probabilities over our two labels, English and Spanish. Each word in the vocabulary is assigned an index. Say, for example, our vocabulary has two words, hello and world, with indices 0 and 1, respectively. Then the BoW vector for the sentence hello hello hello hello hello is [5,0]. Similarly, the BoW vector for hello world world hello world is [2,3], and so on.

In general, it is [Count(hello), Count(world)].

Let us denote this BoW vector as x.
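To make the counting concrete, here is a minimal, self-contained sketch; toy_vocab and bow_vector exist only for this illustration (the real make_bow_vector helper is defined later in this section):

import torch

# Toy two-word vocabulary from the example above
toy_vocab = {"hello": 0, "world": 1}

def bow_vector(sentence, vocab):
    # Count how often each vocabulary word occurs in the sentence
    vec = torch.zeros(len(vocab))
    for word in sentence.split():
        vec[vocab[word]] += 1
    return vec

print(bow_vector("hello hello hello hello hello", toy_vocab))  # tensor([5., 0.])
print(bow_vector("hello world world hello world", toy_vocab))  # tensor([2., 3.])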

The network output is as follows:

log Softmax(Ax + b)

That is, we pass the input through an affine map, Ax + b, and then apply log softmax.
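Before building the full model, here is a minimal sketch of that computation on the [2,3] BoW vector from the hello/world example. The values of A and b are arbitrary toy numbers, not learned parameters; the point is only that the exponentiated outputs sum to 1, as probabilities should:

import torch
import torch.nn.functional as F

# Toy values only: 2 labels x 2 vocabulary words
A = torch.tensor([[0.1, -0.2],
                  [0.3, 0.4]])
b = torch.tensor([0.05, -0.05])

x = torch.tensor([2.0, 3.0])                     # BoW vector for "hello world world hello world"
log_probs = F.log_softmax(x @ A.t() + b, dim=0)  # affine map, then log softmax
print(log_probs)                                 # two log probabilities
print(log_probs.exp().sum())                     # tensor(1.) up to rounding

With that picture in mind, let's set up our training and test data: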

data = [("El que lee mucho y anda mucho, ve mucho y sabe mucho".split(), "SPANISH"),
("The one who reads a lot and walks a lot, sees a lot and knows a lot.".split(), "ENGLISH"),
("Nunca es tarde si la dicha es buena".split(), "SPANISH"),
("It is never late if the joy is good".split(), "ENGLISH")]

test_data = [("Que cada palo aguante su vela".split(), "SPANISH"),
("May every mast hold its own sail".split(), "ENGLISH")]

# Each word in the vocabulary is mapped to a unique integer using word_to_ix,
# which becomes that word's index in the BoW vector

word_to_ix = {}
for sent, _ in data + test_data:
    for word in sent:
        if word not in word_to_ix:
            word_to_ix[word] = len(word_to_ix)
print(word_to_ix)

VOCAB_SIZE = len(word_to_ix)
NUM_LABELS = 2

class BoWClassifier(nn.Module):  # inheriting from nn.Module!

    def __init__(self, num_labels, vocab_size):
        # This calls the init function of nn.Module. Don't let the syntax
        # confuse you; just remember to always do it in an nn.Module subclass
        super(BoWClassifier, self).__init__()

Next, we will define the parameters that are needed. Here, those parameters are A and b, and the following code block shows the implementation:

        # Let's look at the parameters required for the affine mapping.
        # nn.Linear() is what Torch provides for affine maps.
        # Make sure you understand why the input dimension is vocab_size
        # and the output dimension is num_labels
        self.linear = nn.Linear(vocab_size, num_labels)

        # NOTE: the non-linearity log softmax has no parameters,
        # so there is nothing to define for it here

    def forward(self, bow_vec):
        # First, pass the input through the linear layer,
        # then through log_softmax.
        # torch.nn.functional contains many other non-linearities and functions
        return F.log_softmax(self.linear(bow_vec), dim=1)

def make_bow_vector(sentence, word_to_ix):
    vec = torch.zeros(len(word_to_ix))
    for word in sentence:
        vec[word_to_ix[word]] += 1
    return vec.view(1, -1)

def make_target(label, label_to_ix):
    return torch.LongTensor([label_to_ix[label]])

model = BoWClassifier(NUM_LABELS, VOCAB_SIZE)

Now, the model knows its own parameters. The first parameter printed is A (the weight matrix), while the second is b (the bias), as follows:

# A component assigned to a class variable in the __init__ function
# of a module, as we did with the line
# self.linear = nn.Linear(...),
# has its parameters registered with the module (here, BoWClassifier)
# by the PyTorch machinery

for param in model.parameters():
    print(param)


# To run the model, pass in a BoW vector.
# The code is wrapped in torch.no_grad() since we don't need to train here
with torch.no_grad():
    sample = data[0]
    bow_vector = make_bow_vector(sample[0], word_to_ix)
    log_probs = model(bow_vector)
    print(log_probs)

The output of the preceding code is as follows:


{'El': 0, 'que': 1, 'lee': 2, 'mucho': 3, 'y': 4, 'anda': 5, 'mucho,': 6, 've': 7, 'sabe': 8, 'The': 9, 'one': 10, 'who': 11, 'reads': 12, 'a': 13, 'lot': 14, 'and': 15, 'walks': 16, 'lot,': 17, 'sees': 18, 'knows': 19, 'lot.': 20, 'Nunca': 21, 'es': 22, 'tarde': 23, 'si': 24, 'la': 25, 'dicha': 26, 'buena': 27, 'It': 28, 'is': 29, 'never': 30, 'late': 31, 'if': 32, 'the': 33, 'joy': 34, 'good': 35, 'Que': 36, 'cada': 37, 'palo': 38, 'aguante': 39, 'su': 40, 'vela': 41, 'May': 42, 'every': 43, 'mast': 44, 'hold': 45, 'its': 46, 'own': 47, 'sail': 48}
Parameter containing:
tensor([[-0.0347, 0.1423, 0.1145, -0.0067, -0.0954, 0.0870, 0.0443, -0.0923,
0.0928, 0.0867, 0.1267, -0.0801, -0.0235, -0.0028, 0.0209, -0.1084,
-0.1014, 0.0777, -0.0335, 0.0698, 0.0081, 0.0469, 0.0314, 0.0519,
0.0708, -0.1323, 0.0719, -0.1004, -0.1078, 0.0087, -0.0243, 0.0839,
-0.0827, -0.1270, 0.1040, -0.0212, 0.0804, 0.0459, -0.1071, 0.0287,
0.0343, -0.0957, -0.0678, 0.0487, 0.0256, -0.0608, -0.0432, 0.1308,
-0.0264],
[ 0.0805, 0.0619, -0.0923, -0.1215, 0.1371, 0.0075, 0.0979, 0.0296,
0.0459, 0.1067, 0.1355, -0.0948, 0.0179, 0.1066, 0.1035, 0.0887,
-0.1034, -0.1029, -0.0864, 0.0179, 0.1424, -0.0902, 0.0761, -0.0791,
-0.1343, -0.0304, 0.0823, 0.1326, -0.0887, 0.0310, 0.1233, 0.0947,
0.0890, 0.1015, 0.0904, 0.0369, -0.0977, -0.1200, -0.0655, -0.0166,
-0.0876, 0.0523, 0.0442, -0.0323, 0.0549, 0.0462, 0.0872, 0.0962,
-0.0484]], requires_grad=True)
Parameter containing:
tensor([ 0.1396, -0.0165], requires_grad=True)
tensor([[-0.6171, -0.7755]])

We got the tensor output values but, as you can see, we never defined which of the two log probabilities corresponds to ENGLISH and which to SPANISH. To train the model, we need to define that mapping:

label_to_ix = {"SPANISH": 0, "ENGLISH": 1}
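As a quick sanity check in the same script, make_target (defined earlier) now turns a label string into the integer tensor that the loss function will expect:

print(make_target("SPANISH", label_to_ix))  # tensor([0])
print(make_target("ENGLISH", label_to_ix))  # tensor([1])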

Let's start training our model. We first pass instances through the model to get log probabilities, then compute the loss function, then compute the gradient of that loss function, and finally update the parameters with a gradient step. The nn package in PyTorch provides the loss functions; we want nn.NLLLoss(), the negative log likelihood loss. Optimization functions are defined in torch.optim.
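As a quick standalone sketch of what nn.NLLLoss() expects (the scores below are arbitrary and unrelated to our model), it consumes a batch of log probabilities of shape [N, C] together with a tensor of N target class indices:

import torch
import torch.nn as nn
import torch.nn.functional as F

loss_fn = nn.NLLLoss()
scores = torch.tensor([[1.2, -0.3]])      # arbitrary scores for two labels
log_probs = F.log_softmax(scores, dim=1)  # shape [1, 2]
target = torch.tensor([0])                # the true label index (e.g. SPANISH)
print(loss_fn(log_probs, target))         # negative log likelihood of label 0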

Here, we will just use Stochastic Gradient Descent (SGD):


# Run the model on the test data before training, just to see the
# before-and-after difference, again wrapped in torch.no_grad():

with torch.no_grad():
    for instance, label in test_data:
        bow_vec = make_bow_vector(instance, word_to_ix)
        log_probs = model(bow_vec)
        print(log_probs)


# Print the matrix column corresponding to "mucho"
print(next(model.parameters())[:, word_to_ix["mucho"]])

loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

Usually, we want to pass over the training data several times. One hundred epochs is far more than you would use on a real dataset, but real datasets have many instances, not just two; somewhere between 5 and 30 epochs is typically reasonable.

The following code shows the training loop for our example:

for epoch in range(100):
    for instance, label in data:
        # Firstly, remember that PyTorch accumulates gradients;
        # we must clear them out before each instance
        model.zero_grad()

        # Next, prepare our BoW vector and wrap the target in a tensor
        # as an integer. For example, if the target label is SPANISH,
        # the wrapped integer is 0. The loss function then knows that the
        # 0th element of the log probabilities corresponds to the SPANISH label
        bow_vec = make_bow_vector(instance, word_to_ix)
        target = make_target(label, label_to_ix)

        # Next, run the forward pass
        log_probs = model(bow_vec)

Here, we compute the loss and the gradients, and update the parameters by calling optimizer.step():


        loss = loss_function(log_probs, target)
        loss.backward()
        optimizer.step()

with torch.no_grad():
    for instance, label in test_data:
        bow_vec = make_bow_vector(instance, word_to_ix)
        log_probs = model(bow_vec)
        print(log_probs)

# After training, the index corresponding to SPANISH has gone up,
# while the one for ENGLISH has gone down!
print(next(model.parameters())[:, word_to_ix["mucho"]])

The output is as follows:


tensor([[-0.7653, -0.6258]])
tensor([[-1.0456, -0.4331]])
tensor([-0.0071, -0.0462], grad_fn=<SelectBackward>)
tensor([[-0.1546, -1.9433]])
tensor([[-0.9623, -0.4813]])
tensor([ 0.4421, -0.4954], grad_fn=<SelectBackward>)
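If you also want an explicit label prediction rather than raw log probabilities, one simple option (not in the original listing; ix_to_label is a small helper introduced here only for illustration) is to take the argmax over the label dimension and map it back through label_to_ix:

# Invert the label mapping: index 0 -> SPANISH, index 1 -> ENGLISH
ix_to_label = {ix: label for label, ix in label_to_ix.items()}

with torch.no_grad():
    for instance, label in test_data:
        bow_vec = make_bow_vector(instance, word_to_ix)
        pred_ix = model(bow_vec).argmax(dim=1).item()
        print(" ".join(instance), "->", ix_to_label[pred_ix], "(true label:", label + ")")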