Downloading IMDB data and performing text tokenization

For computer vision applications, we used the torchvision library, which provides many utility functions that help in building such applications. Similarly, torchtext is a library from the PyTorch project that is built to work with PyTorch and simplifies many natural language processing (NLP) tasks by providing data loaders and abstractions for text. At the time of writing, torchtext is not included in the PyTorch installation and must be installed separately. You can run the following command on your machine to install it:

pip install torchtext
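
Because torchtext is installed separately, it can be useful to confirm that the package is present before continuing. The following is a minimal sketch that uses only the Python standard library; the helper name installed_version is hypothetical and is not part of torchtext or PyTorch.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# installed_version is a hypothetical helper written for this check.
version = installed_version("torchtext")
if version is None:
    print("torchtext is not installed; run: pip install torchtext")
else:
    print("torchtext version:", version)
```

This reads the package metadata without importing torchtext itself, so it also works when PyTorch has not been set up yet.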

Once it is installed, we will be able to use it. Torchtext provides two important modules: torchtext.data and torchtext.datasets.

We can download the IMDB Movies dataset from the following link:
https://www.kaggle.com/orgesleka/imdbmovies
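
To make the idea of text tokenization concrete before working with the downloaded data, here is a minimal sketch in plain Python. It does not use torchtext; the sample sentences, the regular expression, and the helper names tokenize and build_vocab are all assumptions made for illustration, and real pipelines usually rely on a more careful tokenizer.

```python
import re
from collections import Counter

# Assumed pattern: a token is a run of word characters or a single punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def build_vocab(texts, specials=("<unk>",)):
    """Map each token to an integer index, most frequent tokens first."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    # Sort by descending frequency, then alphabetically so the result is deterministic.
    ordered = sorted(counts, key=lambda tok: (-counts[tok], tok))
    return {tok: idx for idx, tok in enumerate(list(specials) + ordered)}


# Hypothetical sample text, not taken from the dataset.
reviews = ["This movie was great!", "This movie was not great."]
vocab = build_vocab(reviews)

# Tokens missing from the vocabulary fall back to the <unk> index.
ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokenize("A great movie")]
print(tokenize(reviews[0]))
print(ids)
```

The sketch separates two steps: splitting text into tokens, and mapping each token to an integer index that a model can consume.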