Exploring the data

If we look at some of these reviews, we can see just how difficult it is to categorize them as positive or negative, even for humans.

For instance, some words are used in ways that don't match their straightforward meaning. Look at the use of the term greatest in the following quote from a review of a Beijing hotel:

"Not the greatest area but no problems, even at 3:00 AM."

Also, many reviews recount both good and bad aspects of the hotel that they're discussing, even if the final verdict comes down decidedly one way or the other. This review of a London hotel starts off listing the positives, but then it pivots:

"… These are the only real positives. Everything else was either average or below average...."

Another reason why reviews are difficult to classify is that many of them just don't wholeheartedly endorse or condemn whatever it is they're reviewing. Instead, the review is tepid, or the reviewer qualifies the conclusion, as in this review of a Las Vegas hotel:

"It's faded, but it's fine. If you're on a budget and want to stay on the Strip, this is the place. But for a really great inexpensive experience, try the Main Street Station downtown."

All of these factors contribute toward making this task more difficult than standard document classification problems.
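To make these difficulties concrete, here is a minimal sketch of a naive word-counting scorer applied to the three quotes above. It is written in Python purely for illustration; the `POSITIVE` and `NEGATIVE` word lists and the `naive_score` function are hypothetical, hand-picked for these examples, and are not a method used elsewhere in this chapter.

```python
import re

# Hypothetical, hand-picked word lists, used only for this illustration.
POSITIVE = {"great", "greatest", "fine", "positives"}
NEGATIVE = {"problems", "faded", "below"}


def naive_score(text):
    """Count lexicon words while ignoring context.

    Returns (score, matched_words): +1 per positive word, -1 per negative word.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [t for t in tokens if t in POSITIVE or t in NEGATIVE]
    score = sum(1 if t in POSITIVE else -1 for t in hits)
    return score, hits


# The quotes from the reviews discussed above.
reviews = {
    "Beijing": "Not the greatest area but no problems, even at 3:00 AM.",
    "London": "These are the only real positives. "
              "Everything else was either average or below average.",
    "Las Vegas": "It's faded, but it's fine. If you're on a budget and want "
                 "to stay on the Strip, this is the place. But for a really "
                 "great inexpensive experience, try the Main Street Station "
                 "downtown.",
}

for place, text in reviews.items():
    print(place, naive_score(text))
```

In the Beijing quote, both matched words (greatest and problems) are negated, so each one is read with the wrong polarity. The London quote comes out neutral even though the reviewer is clearly unhappy, and the Las Vegas quote gains a point from great, a word that actually describes a different hotel.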
