Text Document Categorization

In this chapter, we discuss the application of transfer learning to text document categorization. Text categorization is a very popular natural language processing task. The key objective is to assign a document to one or more classes or categories based on its textual content. It has widespread industry applications, including classifying email as spam or non-spam, review and ratings classification, sentiment analysis, and email or incident routing, where emails or incidents are categorized so that each can be automatically assigned to the responsible person. The following are the major topics that will be covered in this chapter:

  • Text categorization in general, industry applications, and challenges
  • Benchmark text categorization datasets and performance of traditional models
  • Word representation by dense vectors—deep learning models
  • CNN document model—word-to-sentence embedding and then to document embeddings
  • Application of transfer learning where source and target domain distributions are different; that is, the source domain consists of classes with little overlap, while the target domain has many overlapping classes
  • Application of transfer learning where source and target domains themselves are different (for example, the source is news and the target is movie reviews)
  • Application of the trained model to other text analysis tasks, such as document summarization (for example, explaining why a review is categorized as negative or positive)

We will cover both the concepts and their practical implementation, with hands-on examples. The code for this chapter is available in the Chapter 7 folder of the GitHub repository at https://github.com/dipanjanS/hands-on-transfer-learning-with-python; refer to it as needed to follow along with the chapter.
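
To make the task itself concrete, the following is a minimal sketch of a traditional categorization baseline: a multinomial naive Bayes classifier over bag-of-words counts, written with the Python standard library only. It is not the CNN document model or the transfer learning approach covered in this chapter, and it is not taken from the chapter's repository. The toy messages, the labels, and the helper names (tokenize, train_nb, predict_nb) are all illustrative assumptions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train_nb(documents, labels):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in zip(documents, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_doc_counts, word_counts, vocabulary


def predict_nb(model, text):
    """Return the class with the highest log posterior, using Laplace smoothing."""
    class_doc_counts, word_counts, vocabulary = model
    total_docs = sum(class_doc_counts.values())
    vocab_size = len(vocabulary)
    tokens = [t for t in tokenize(text) if t in vocabulary]  # skip unseen words
    best_label, best_score = None, -math.inf
    for label, doc_count in class_doc_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokens:
            # Add-one smoothing keeps unseen (class, word) pairs from zeroing the score
            score += math.log((word_counts[label][token] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data for the spam/non-spam use case mentioned above
train_docs = [
    "win a free prize now",
    "claim your free money today",
    "meeting agenda for monday",
    "please review the project report",
]
train_labels = ["spam", "spam", "non-spam", "non-spam"]

model = train_nb(train_docs, train_labels)
prediction_spam = predict_nb(model, "free prize money")
prediction_ham = predict_nb(model, "project meeting on monday")
print(prediction_spam, prediction_ham)
```

A model like this treats each word as an independent count and has no notion of word similarity or order, which is one reason dense word vectors and document-level models are worth exploring.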
