Finding the top rated firms

We can find the businesses with the most top ratings (5 stars) using the following script. The script uses the sqldf package, which provides SQL-like access to data frames. SQL has built-in mechanisms for selecting, grouping, and ordering data as needed.

In this script we build a computed data frame with two columns: business_id and the count of 5-star reviews for that business. The data frame is sorted in descending order of that count, so the businesses with the most 5-star ratings appear first. Once it is created, we display the head of the data frame to see the top-rated businesses in the dataset:

# businesses with the most 5-star ratings
# install.packages("sqldf", repos='http://cran.us.r-project.org')
library(sqldf)
# count the 5-star reviews per business, largest count first
five_stars = sqldf("select business_id, count(*) as five_star_count from reviews where stars = 5 group by business_id order by five_star_count desc")
head(five_stars)

It is remarkable how skewed the counts are among the top businesses: the number 1 business has close to double the number of 5-star ratings of the number 6 business. One might wonder whether such a divergence points to some collusion in the ratings process. Again, the businesses appear only under mangled names at this point.
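
For comparison, the same kind of count can be produced without SQL. The following is a minimal sketch in base R, assuming a reviews data frame with business_id and stars columns. The toy data, the column name five_star_count, and the variables top and ratio are made up here for illustration and are not taken from the dataset. It also computes the ratio between the first and last displayed rows, which is the comparison made in the text above:

```r
# Toy stand-in for the reviews data frame (made-up IDs and stars, not the real data)
reviews <- data.frame(
  business_id = c("a", "a", "a", "b", "b", "c", "c", "c", "a"),
  stars       = c(5, 5, 4, 5, 3, 5, 5, 4, 5),
  stringsAsFactors = FALSE
)

# Keep only the 5-star reviews, then count them per business
five <- reviews[reviews$stars == 5, ]
counts <- sort(table(five$business_id), decreasing = TRUE)

# Convert the sorted counts to a two-column data frame
five_stars <- data.frame(
  business_id     = names(counts),
  five_star_count = as.integer(counts),
  stringsAsFactors = FALSE
)
head(five_stars)

# Ratio between the first and the last displayed business
top <- head(five_stars)
ratio <- top$five_star_count[1] / top$five_star_count[nrow(top)]
print(ratio)
```

On the real data, head() returns six rows by default, so ratio compares the number 1 business with the number 6 business.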