Sentiment Analysis of Amazon Reviews with NLP

Every day, we generate text data through emails, blog posts, social media comments, and more. Unstructured text data far exceeds in volume the tabular data held in any organization's databases, so it is important for organizations to extract useful insights from the text that concerns them. Because text differs in nature from database records, it calls for different analysis methods. In this chapter, we will learn a number of key techniques in natural language processing (NLP) that help us work with text data.

The common definition of NLP is as follows: an area of computer science and artificial intelligence that deals with the interactions between computers and human (natural) languages; in particular, how to program computers to fruitfully process large amounts of natural language data.

In general terms, NLP deals with understanding human language as it is naturally spoken and written. It helps machines read and understand text.

Human languages are highly complex, and several ambiguities need to be resolved in order to correctly comprehend spoken language or written text. NLP applies several techniques to deal with these ambiguities, including part-of-speech (POS) tagging, term disambiguation, entity extraction, relation extraction, key term recognition, and more.

For natural language systems to work successfully, a consistent knowledge base is a prerequisite: a detailed thesaurus, a lexicon of words, a dataset of linguistic and grammatical rules, an ontology, and up-to-date entities.

NLP is concerned with understanding text not just from a syntactic perspective but also from a semantic one. The idea is for machines, like humans, to perceive the underlying message behind the words and not just the structure of words in sentences. NLP has numerous application areas; the following are just a few:

  • Speech recognition systems
  • Question answering systems
  • Machine translation
  • Text summarization
  • Virtual agents or chatbots
  • Text classification
  • Topic segmentation

As NLP is a vast subject area, it is not practical to cover all of it in one chapter. Therefore, we will focus on text classification in this chapter. We do this by implementing a project that performs sentiment analysis on reviews written by Amazon.com customers. Sentiment analysis is a type of text classification task in which we classify each document (review) into one of a set of possible categories. The categories could be positive, negative, or neutral, or they could be finer-grained, such as a rating on a scale of 1 to 10.
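To make the idea of sentiment categories concrete, here is a minimal Python sketch that maps a numeric review rating to one of three labels. The function name `rating_to_sentiment`, the 1-to-5 star scale, and the choice of 3 as the neutral point are assumptions made for illustration; they are not taken from the dataset used later in the chapter.

```python
def rating_to_sentiment(stars, neutral=3):
    """Map a star rating to a coarse sentiment label.

    Assumption: ratings run from 1 to 5 and `neutral` marks the midpoint.
    """
    if stars > neutral:
        return "positive"
    if stars < neutral:
        return "negative"
    return "neutral"


# Hypothetical ratings, for illustration only
labels = [rating_to_sentiment(s) for s in (5, 3, 1)]
```

A classifier trained on such labels then predicts the category directly from the review text.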

Text documents cannot be fed directly to a machine learning algorithm. Each document first needs to be represented in a format that the ML algorithm accepts as input. In this chapter, we explore, implement, and understand two families of text representation: the Bag of Words (BoW) approach and word embedding approaches.
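As a rough illustration of what such a representation looks like, the following Python sketch builds a simple BoW count matrix using only the standard library. The helper names (`tokenize`, `build_vocabulary`, `bag_of_words`), the regular-expression tokenizer, and the two sample reviews are assumptions made for this example; a real project would typically rely on a text-mining library and apply further preprocessing.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(documents):
    """Map each distinct token to a column index, in sorted order."""
    tokens = {tok for doc in documents for tok in tokenize(doc)}
    return {tok: idx for idx, tok in enumerate(sorted(tokens))}


def bag_of_words(document, vocabulary):
    """Return a count vector for one document; unknown tokens are ignored."""
    counts = Counter(tokenize(document))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


# Hypothetical reviews, for illustration only
reviews = [
    "Great product, works great",
    "Terrible product, stopped working",
]
vocab = build_vocabulary(reviews)
matrix = [bag_of_words(r, vocab) for r in reviews]
# Columns: great, product, stopped, terrible, working, works
# matrix == [[2, 1, 0, 0, 0, 1], [0, 1, 1, 1, 1, 0]]
```

Each row is a fixed-length numeric vector that a classifier can consume. Note that word order is discarded, which is one limitation that word embedding approaches try to address.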

As we progress with the chapter, we will cover the following topics:

  • The sentiment analysis problem
  • Understanding the Amazon reviews dataset
  • Building a text sentiment classifier with the BoW approach
  • Understanding word embedding approaches
  • Building a text sentiment classifier with pretrained Word2vec word embedding based on the Reuters news corpus
  • Building a text sentiment classifier with GloVe word embedding
  • Building a text sentiment classifier with fastText