The following section shows how to connect to BigQuery from the R programming language and create visualizations from the returned data:
- Open RStudio and run the following script in the console or source pane to install bigrquery and ggplot2:
install.packages("bigrquery")
install.packages("ggplot2")
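- Run the next script in the console to query your Google BigQuery table:
library("bigrquery")
# ID of the Google Cloud project that is billed for the query
project <- "Enter your project ID Here"
# count visits per traffic medium in the sample Google Analytics table
query <- "SELECT trafficSource.medium AS Medium,
  COUNT(visitId) AS Visits
  FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`
  GROUP BY Medium"
# run the query with standard SQL and collect the results
result <- query_exec(query, project, use_legacy_sql = FALSE)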
- The bigrquery package authenticates against the BigQuery API with your Google account. When the script first runs, R asks whether to store a local file with the authorization information. Select 1 for yes.
- A web browser window should then open with a response from Google that includes a code, which Google uses to authenticate the session. Copy this code and paste it into the RStudio console.
- R then runs the script, which calls the BigQuery API. BigQuery runs the query and returns the results as an R data frame (named result in this case).
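- Run the next script in the R console to plot the results as a bar chart:
library(ggplot2)
# one bar per Medium, with bar height taken directly from Visits
p <- ggplot(data = result, aes(x = Medium, y = Visits)) +
  geom_bar(stat = "identity")
p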
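The query step above relies on query_exec(), which belongs to bigrquery's older interface; newer releases of the package deprecate it in favor of the bq_* functions. The following is a minimal sketch of the same query under that newer interface, assuming an installed bigrquery release that provides bq_project_query() and bq_table_download(). The helper name fetch_visits_by_medium is hypothetical and only used here for illustration.

```r
# Sketch only: assumes a bigrquery release that provides the bq_* functions.
# fetch_visits_by_medium is a hypothetical helper name, not part of bigrquery.
fetch_visits_by_medium <- function(project) {
  sql <- "SELECT trafficSource.medium AS Medium,
                 COUNT(visitId) AS Visits
          FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`
          GROUP BY Medium"
  # Run the query as a job billed to `project`, using standard SQL.
  job_table <- bigrquery::bq_project_query(project, sql, use_legacy_sql = FALSE)
  # Download the table holding the query results into a data frame.
  bigrquery::bq_table_download(job_table)
}
```

Calling result <- fetch_visits_by_medium("Enter your project ID Here") should trigger the same kind of Google authorization step on first use, and str(result) is a quick way to confirm that the Medium and Visits columns came back before plotting.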
The main selling point of a programming language like R for these types of visualizations is the flexibility that programming provides. Unlike Tableau and Google Data Studio, which provide simple but rigid frameworks for creating visualizations, R lets users be creative with their visualizations.
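As a small illustration of that flexibility, the sketch below sorts the bars by visit count, flips the axes, and adds labels and a theme. The data frame is a made-up stand-in for the query result so that the example runs without a BigQuery connection; its numbers are illustrative only and are not taken from the sample table.

```r
library(ggplot2)

# Made-up stand-in for the query result; the numbers are illustrative only.
result <- data.frame(
  Medium = c("organic", "referral", "(none)", "cpc"),
  Visits = c(40, 15, 30, 5)
)

# Sort the bars by visit count, flip the axes so the Medium labels stay
# readable, and add a title, axis labels, and a lighter theme.
p <- ggplot(result, aes(x = reorder(Medium, Visits), y = Visits)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(title = "Visits by traffic medium", x = "Medium", y = "Visits") +
  theme_minimal()
p
```

Each of these choices is one extra layer added with +, so they can be mixed, dropped, or replaced independently.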
Let's try pushing R a bit further. The following script adds error bars to the bar chart we've just created, based on how each actual value compares with the standard deviation of the visit counts.
- Run the next script in the R console to calculate each value's ratio to the standard deviation and add it to the visualization as error bars:
# add each value's ratio to the standard deviation of all visit counts
result$sd <- result$Visits / sd(result$Visits)
# plot result data with error bars of plus or minus that ratio
# width is in x-axis units; 0.2 keeps the caps narrower than the bars
p <- ggplot(data = result, aes(x = Medium, y = Visits)) +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymin = Visits - sd, ymax = Visits + sd), width = 0.2,
                position = position_dodge(0.9))
p
Now we have a more specialized visualization that shows how each Medium's visit count compares with the standard deviation of the counts.
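To make the arithmetic behind the error bars concrete, the sketch below repeats the calculation on a made-up vector of visit counts (illustrative numbers only). It also computes a z-score next to it for comparison; the z-score is an alternative measure added here as an assumption, not the calculation used in the script above.

```r
# Made-up visit counts; illustrative only.
visits <- c(40, 15, 30, 5)

# Sample standard deviation of the counts (a single number).
s <- sd(visits)

# What the script stores in result$sd: each count divided by s,
# that is, the count expressed in units of standard deviations.
ratio <- visits / s

# For comparison (an assumption, not the script's method): a z-score
# measures how far each count lies from the mean in the same units.
z <- (visits - mean(visits)) / s

print(data.frame(visits, ratio, z))
```

Because the ratio is measured in standard deviations rather than in visits, the error bars it produces are short relative to bars that represent large visit counts.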