Text Document Categorization

In this chapter, we discuss the application of transfer learning to text document categorization. Text categorization is a widely used natural language processing task. The key objective is to assign a document to one or more classes or categories based on its textual content. It has widespread industry applications, including classifying email as spam or non-spam, review and ratings classification, sentiment analysis, and email or incident routing, where we categorize emails or incidents so that they can be automatically assigned to the appropriate person. The following are the major topics that will be covered in this chapter:

Text categorization in general, industry applications, and challenges
Benchmark text categorization datasets and performance of traditional models
Word representation by dense vectors—deep learning models
CNN document model—word-to-sentence embedding and then to document embeddings
Application of transfer learning where the source and target domain distributions are different; that is, the source domain consists of classes with little overlap and the target domain has many overlapping classes
Application of transfer learning where the source and target domains themselves are different (for example, the source is news and the target is movie reviews)
Application of the trained model to other text analysis tasks, such as document summarization that explains why a review is categorized as negative or positive

We will focus on both the concepts and their practical implementation, with hands-on examples. The code for this chapter is available for quick reference in the Chapter 7 folder of the GitHub repository at https://github.com/dipanjanS/hands-on-transfer-learning-with-python, which you can refer to as needed while following along with the chapter.
