Business reviews – the Yelp dataset challenge

Finally, we apply sentiment analysis to the significantly larger Yelp business review dataset, which has five outcome classes. The data consists of several files with information on businesses, users, reviews, and other aspects, which Yelp provides to encourage data science innovation.

We will use around six million reviews produced over the 2010-2018 period (see the relevant notebook for details). The following diagrams show the number of reviews and the average number of stars per year:

Graphs representing number of reviews and the average number of stars per year

In addition to the text features resulting from the review texts, we will also use other information submitted with the review or about the user.
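One way to combine text features with other review or user information is to stack the sparse document-term matrix and the additional columns side by side. The following is a minimal sketch on toy data rather than the notebook's actual code; the sample texts and the two numeric columns (a vote count and a user review count) are hypothetical placeholders:

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer

# Toy review texts standing in for the Yelp reviews.
texts = ['great food', 'terrible service', 'great service']

# Hypothetical non-text features, one row per review:
# column 0 = votes on the review, column 1 = number of reviews by the user.
meta = np.array([[0, 10],
                 [3, 2],
                 [1, 57]])

# Convert the texts into a sparse document-term matrix of token counts.
vectorizer = CountVectorizer()
dtm = vectorizer.fit_transform(texts)

# Append the non-text columns to the right of the text features,
# keeping the result sparse so it scales to millions of reviews.
features = sparse.hstack([dtm, sparse.csr_matrix(meta)]).tocsr()

print(dtm.shape, features.shape)
```

Keeping the combined matrix in sparse format matters at this scale because a dense matrix with millions of rows and a full vocabulary of columns would not fit in memory.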

We will train various models on data through 2017 and use 2018 as the test set.
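A time-based split like this can be sketched with pandas as follows. The DataFrame and its column names (`date`, `stars`, `text`) are assumptions made for illustration and may differ from the layout used in the notebook:

```python
import pandas as pd

# Toy stand-in for the review data; the column names are assumptions.
reviews = pd.DataFrame({
    'date': pd.to_datetime(['2016-05-01', '2017-12-31',
                            '2018-01-01', '2018-07-15']),
    'stars': [5, 3, 1, 4],
    'text': ['great food', 'it was ok', 'terrible service', 'nice place'],
})

# Train on all reviews through 2017 and hold out 2018 as the test set.
year = reviews['date'].dt.year
train = reviews[year <= 2017]
test = reviews[year == 2018]

print(len(train), len(test))
```

Splitting by date rather than at random ensures that the models are evaluated only on reviews written after everything they were trained on.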
