Freeze the embedding layer weights

Telling PyTorch not to change the weights of the embedding layer is a two-step process:

  1. Set the requires_grad attribute to False, which instructs PyTorch that it does not need gradients for these weights.
  2. Stop passing the embedding layer's parameters to the optimizer. If this step is skipped, the optimizer throws an error, as it expects every parameter it receives to require gradients.

The following code demonstrates how easy it is to freeze the embedding layer weights and instruct the optimizer not to use those parameters:

# Freeze the embedding weights so no gradients are computed for them
model.embedding.weight.requires_grad = False

# Pass only the parameters that still require gradients to the optimizer
optimizer = optim.SGD(
    [param for param in model.parameters() if param.requires_grad], lr=0.001
)

We generally pass all of the model's parameters to the optimizer, but in the preceding code we pass only those parameters whose requires_grad attribute is True.
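To make this concrete, the following is a minimal, self-contained sketch; the EmbClassifier model, its sizes, and the dummy batch are illustrative assumptions rather than the book's running example. It freezes the embedding layer, builds the optimizer from the trainable parameters only, and checks that the embedding weights are unchanged after an optimizer step:

import torch
import torch.nn as nn
import torch.optim as optim

class EmbClassifier(nn.Module):
    # Hypothetical model: an embedding layer followed by a linear classifier
    def __init__(self, vocab_size=1000, emb_dim=50, n_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.fc = nn.Linear(emb_dim, n_classes)

    def forward(self, x):
        # Average the word embeddings of each sequence before classifying
        return self.fc(self.embedding(x).mean(dim=1))

model = EmbClassifier()

# Step 1: stop gradient computation for the embedding weights
model.embedding.weight.requires_grad = False

# Step 2: hand the optimizer only the parameters that still require gradients
optimizer = optim.SGD(
    [param for param in model.parameters() if param.requires_grad], lr=0.001
)

frozen_before = model.embedding.weight.clone()

# One dummy training step on a batch of 4 sequences of length 10
inputs = torch.randint(0, 1000, (4, 10))
targets = torch.randint(0, 2, (4,))
loss = nn.functional.cross_entropy(model(inputs), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# The embedding weights are untouched; only the linear layer was updated
assert torch.equal(frozen_before, model.embedding.weight)

Running this sketch raises no assertion error, confirming that the frozen embedding weights survive the update while the rest of the model continues to train.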

We can train the model with this change and should achieve similar accuracy. However, all of these model architectures fail to take advantage of the sequential nature of the text. In the next section, we explore two popular techniques, namely RNN and Conv1D, that do take advantage of the sequential nature of the data.
