Improving the results

What could we do to improve these results?

First, we should improve the test and training sets. It would be good to have multiple raters: say, have each review rated independently by three people, and use the rating that at least two of them chose.
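As a rough illustration of that voting scheme, here is a minimal sketch in Python. The function name `majority_label`, the `"+"`/`"-"` labels, and the review IDs are all made up for this example; they are not taken from the dataset used in this chapter.

```python
from collections import Counter


def majority_label(ratings):
    """Return the label chosen by a strict majority of raters.

    Returns None when no label has more than half of the votes,
    which can only happen with an even number of raters or with
    more than two possible labels.
    """
    if not ratings:
        return None
    label, count = Counter(ratings).most_common(1)[0]
    return label if count > len(ratings) / 2 else None


# Hypothetical ratings: three independent raters per review.
ratings_by_review = {
    "review-1": ["+", "+", "-"],
    "review-2": ["-", "-", "-"],
    "review-3": ["-", "+", "-"],
}

# Keep only the reviews on which a majority of raters agreed.
gold_labels = {}
for review_id, ratings in ratings_by_review.items():
    label = majority_label(ratings)
    if label is not None:
        gold_labels[review_id] = label
```

With three raters and two labels there is always a majority, so the `None` case only matters if the number of raters or labels changes.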

Most importantly, we'd like to have larger and better test and training sets. For this type of problem, 500 observations is really at the low end of what you can do anything useful with, and results will often improve with more observations. However, I do need to stress that more training data doesn't necessarily imply better results. It could help, but there are no guarantees.

We could also look at improving the features. We could select them more carefully, because having too many useless or unneeded features can make the classifier perform poorly. We could also add different kinds of features, such as dates or information about the informants; if we had any data on them, it might be useful.
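One simple way to prune features can be sketched as follows: drop tokens that appear in too few documents to be reliable, and tokens that appear in nearly every document and so carry little signal. This is only one possible approach, not the method used earlier in the chapter. The function name `select_features`, the thresholds, and the sample documents are assumptions made for this example.

```python
from collections import Counter


def select_features(docs, min_df=2, max_df_ratio=0.9):
    """Select tokens by document frequency.

    docs is a list of token lists, one per document. A token is kept
    if it occurs in at least min_df documents and in no more than
    max_df_ratio of all documents.
    """
    doc_freq = Counter()
    for tokens in docs:
        # Count each token once per document, however often it repeats.
        doc_freq.update(set(tokens))
    n_docs = len(docs)
    return {
        token
        for token, count in doc_freq.items()
        if count >= min_df and count / n_docs <= max_df_ratio
    }


# Hypothetical tokenized reviews.
docs = [
    ["the", "room", "was", "clean"],
    ["the", "room", "was", "dirty"],
    ["the", "staff", "were", "rude"],
]

selected = select_features(docs)
```

Here, `"the"` is dropped because it occurs in every document, and tokens such as `"clean"` or `"rude"` are dropped because they occur only once. On a corpus this small that is clearly too aggressive, so in practice the thresholds would need tuning against the test set.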

There has also been more recent work on moving beyond polarity classification, such as emotion classification. Another way of being more fine-grained than binary categorization is to classify the documents on a scale. For instance, instead of positive or negative, these classifiers could try to predict how the user would rate the product on a five-star scale, like the one that has become popular on Amazon and many other websites that include user ratings and reviews.

Once we have identified the positive and negative reviews, we can apply other analyses separately to each group, whether it's topic modeling, named entity recognition, or something else.
