In the preceding couple of recipes, we looked at representing a matrix of data along two axes on a heat map. In this recipe, we will learn how to summarize multivariate data using a heat map.
We are only using the base graphics functions for this recipe. So, just open up the R prompt and type in the following code. We will use the nba.csv example dataset for this recipe. So, let's first load it:
nba <- read.csv("nba.csv")
This example dataset, which shows some statistics on the top scorers in NBA basketball games, has been taken from a blog post on FlowingData (see http://flowingdata.com/2010/01/21/how-to-make-a-heatmap-a-quick-and-easy-solution/ for details). The original data is from the databaseBasketball.com website (http://databasebasketball.com/). We will use our own code to create a similar heat map showing player statistics.
We will use the RColorBrewer library for a nice color palette, so let's load it:
library(RColorBrewer)
We are going to summarize a number of NBA player statistics in the same heat map using the image() function:
# Use the player names (first column) as row names
rownames(nba) <- nba[,1]
# Scale each statistic, then transpose so that statistics are in rows
data_matrix <- t(scale(data.matrix(nba[,-1])))
pal <- brewer.pal(6, "Blues")
statnames <- c("Games Played", "Minutes Played", "Total Points",
  "Field Goals Made", "Field Goals Attempted", "Field Goal Percentage",
  "Free Throws Made", "Free Throws Attempted", "Free Throw Percentage",
  "Three Pointers Made", "Three Pointers Attempted", "Three Point Percentage",
  "Offensive Rebounds", "Defensive Rebounds", "Total Rebounds",
  "Assists", "Steals", "Blocks", "Turnovers", "Fouls")
par(mar = c(3,14,19,2), oma = c(0.2,0.2,0.2,0.2), mex = 0.5)
#Heat map
image(x = 1:nrow(data_matrix), y = 1:ncol(data_matrix),
  z = data_matrix, xlab = "", ylab = "", col = pal, axes = FALSE)
#X axis labels
text(1:nrow(data_matrix), par("usr")[4] + 1, srt = 45, adj = 0,
  labels = statnames, xpd = TRUE, cex = 0.85)
#Y axis labels
axis(side = 2, at = 1:ncol(data_matrix), labels = colnames(data_matrix),
  col = "white", las = 1, cex.axis = 0.85)
#White separating lines
abline(h = c(1:ncol(data_matrix)) + 0.5, v = c(1:nrow(data_matrix)) + 0.5,
  col = "white", lwd = 1, xpd = FALSE)
#Graph Title
text(par("usr")[1] + 5, par("usr")[4] + 12,
  "NBA per game performance of top 50 scorers",
  xpd = TRUE, font = 2, cex = 1.5)
Once again, in a way similar to the preceding couple of recipes, we first formatted the dataset with the appropriate row names (in this case, names of players) and cast it as a matrix. We did one additional thing: we scaled the values in the matrix using the scale() function, which centers each column on its mean and divides it by its standard deviation, so that we can denote the relative values of each column on the same color scale. We then transposed the result with t() so that the statistics run along the x axis and the players along the y axis.
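To see what scale() does, here is a minimal sketch on a small made-up matrix. The values and the row names A, B, and C are illustrative assumptions, not taken from nba.csv. After scaling, each column has a mean of 0 and a standard deviation of 1, which is why statistics measured in very different units can share one color scale.

```r
# Toy matrix: two statistics on very different scales (made-up values)
toy <- matrix(c(10, 20, 30,
                0.1, 0.2, 0.6),
              ncol = 2,
              dimnames = list(c("A", "B", "C"), c("PTS", "AST")))

# scale() subtracts each column's mean and divides by its standard deviation
scaled <- scale(toy)

# Check the result: every column now has mean 0 and standard deviation 1
col_means <- colMeans(scaled)
col_sds <- apply(scaled, 2, sd)

# Transposing puts the statistics in rows, as is done for data_matrix
scaled_t <- t(scaled)
print(round(scaled_t, 2))
```

Without scaling, a statistic with large raw values would dominate the color range and the smaller ones would all fall into the lightest color.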
We chose a blue color palette from the RColorBrewer library. We also created a vector with the descriptive names of the player statistics to use as labels for the x axis.
The code for the heat map itself and the axis labels is very similar to the previous recipe. We used the image() function with data_matrix as z and suppressed the default axes. Then, we used text() and axis() to add the x and y axis labels. We also used the text() function to add the graph title (instead of the title() function) in order to left align it with the y axis labels instead of the heat map.
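The label and title placement relies on par("usr"), which returns the limits of the plot region as c(x1, x2, y1, y2). The following is a minimal sketch on a made-up 3 x 2 matrix, not the recipe's data. It draws to a null PDF device (an assumption made here so that no file or window is created) and shows where the top edge of the heat map lies.

```r
# Draw to a null PDF device so that no file or window is created
pdf(NULL)

# Toy matrix: 3 cells along x, 2 cells along y (made-up values)
z <- matrix(1:6, nrow = 3)
image(x = 1:nrow(z), y = 1:ncol(z), z = z,
      xlab = "", ylab = "", axes = FALSE)

# par("usr") holds the plot region limits as c(x1, x2, y1, y2)
usr <- par("usr")

# Cells are centered on integers, so the region runs from 0.5 to 3.5 in x
# and from 0.5 to 2.5 in y; usr[4] is the top edge of the heat map
print(usr)

# Labels placed above usr[4] fall in the margin, so xpd = TRUE is needed
text(1:nrow(z), usr[4] + 0.1, labels = c("a", "b", "c"),
     srt = 45, adj = 0, xpd = TRUE)

invisible(dev.off())
```

In the recipe, the offsets added to par("usr")[4] and par("usr")[1] are in the same data units, so a value of 1 corresponds to the height of one player row.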
As shown in the FlowingData blog post, we can order the data in the matrix by the values in any one column. By default, the heat map shows the players in ascending order of total points from top to bottom (as can be seen from the light to dark blue progression in the Total Points column). Note that image() draws the first row of the data at the bottom of the plot, so the players appear in reverse order from top to bottom. To show the players from the highest to the lowest scorer, we need to sort the data in ascending order of PTS by running the following code after reading the CSV file:
nba <- nba[order(nba$PTS),]
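As a quick illustration of how order() behaves, here is a small sketch on a made-up data frame. The names and point values are hypothetical; only the PTS column name matches the recipe. By default, order() returns the row indices that sort a column in ascending order, and decreasing = TRUE reverses that.

```r
# Made-up data frame that reuses the PTS column name from the recipe
toy <- data.frame(Name = c("A", "B", "C"),
                  PTS = c(25.3, 30.2, 19.8),
                  stringsAsFactors = FALSE)

# order() gives the row indices that sort PTS in ascending order
asc <- toy[order(toy$PTS), ]

# decreasing = TRUE sorts from the highest to the lowest value instead
desc <- toy[order(toy$PTS, decreasing = TRUE), ]

print(asc$Name)
print(desc$Name)
```

Since image() draws the first row at the bottom, the ascending sort is the one that places the highest scorer at the top of the heat map.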
Then, we can run the rest of the code to make the following graph: