All of the columns appear to have loaded correctly. Now we can look at summary statistics for the data:
summary(reviews)
There are several points in the summary worth noting:
- Some of the columns I had assumed would be simple TRUE/FALSE (0/1) flags take a range of values instead: funny has a maximum over 600, useful has a maximum of 1100, and cool has a maximum of 500.
- All of the IDs (users, businesses) are opaque, mangled strings. We could join against the user file and the business file to resolve them to actual users and businesses.
- Star ratings range from 1 to 5, as expected. However, the mean and median are both about 4, which I take to mean that many people only take the time to write a review when it is a good one.
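
To make these observations easier to check, here is a minimal sketch of how each one could be verified. It runs on a small toy data frame rather than the real data, and the column names (`user_id`, `business_id`, `stars`, `funny`, `useful`, `cool`) and the `businesses` lookup table with its `name` column are assumptions made for illustration, not something confirmed from the actual files.

```r
# Toy stand-in for the real `reviews` data frame; column names are assumptions.
reviews <- data.frame(
  user_id     = c("u1", "u2", "u1", "u3"),
  business_id = c("b1", "b1", "b2", "b2"),
  stars       = c(5, 4, 4, 1),
  funny       = c(0, 3, 0, 12),
  useful      = c(1, 0, 7, 2),
  cool        = c(0, 0, 2, 1),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table standing in for the business file.
businesses <- data.frame(
  business_id = c("b1", "b2"),
  name        = c("Cafe A", "Diner B"),
  stringsAsFactors = FALSE
)

# Min (row 1) and max (row 2) of each vote column.
# A max above 1 rules out a simple 0/1 flag.
vote_ranges <- sapply(reviews[, c("funny", "useful", "cool")], range)

# Share of reviews at each star rating, to see how skewed the ratings are.
star_share <- prop.table(table(reviews$stars))

# Left join so every review is kept, with a readable name next to the opaque ID.
reviews_named <- merge(reviews, businesses, by = "business_id", all.x = TRUE)
```

On the real data, the same three calls would apply to the loaded `reviews` frame directly. The left join (`all.x = TRUE`) is a deliberate choice so that reviews whose business ID has no match are kept rather than silently dropped.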