Summarizing multivariate data in a single heat map

In the preceding couple of recipes, we looked at representing a matrix of data along two axes on a heat map. In this recipe, we will learn how to summarize multivariate data using a heat map.

Getting ready

We are only using the base graphics functions for this recipe, so just open up the R prompt and type in the following code. We will use the nba.csv example dataset, so let's first load it (read.csv() looks for the file in the current working directory unless you give it a full path):

nba <- read.csv("nba.csv")

This example dataset, which shows some statistics on the top scorers in NBA basketball games, has been taken from a blog post on FlowingData (see http://flowingdata.com/2010/01/21/how-to-make-a-heatmap-a-quick-and-easy-solution/ for details). The original data is from the databaseBasketball.com website (http://databasebasketball.com/). We will use our own code to create a similar heat map showing player statistics.

We will use the RColorBrewer library for a nice color palette, so let's load it:

library(RColorBrewer)
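If RColorBrewer is not installed, library(RColorBrewer) stops with an error; install it once with install.packages("RColorBrewer"). The following is a minimal sketch, assuming you may be on a machine where the package is unavailable: it uses RColorBrewer's six-color "Blues" palette when the package is present and otherwise falls back to a base R approximation built with colorRampPalette(). The two fallback end colors are an assumption chosen to resemble the lightest and darkest blues; the interpolated shades will not match brewer.pal() exactly.

```r
# Prefer RColorBrewer's palette; fall back to a base R approximation otherwise.
if (requireNamespace("RColorBrewer", quietly = TRUE)) {
  pal <- RColorBrewer::brewer.pal(6, "Blues")
} else {
  # Assumed fallback: interpolate six colors between a light and a dark blue
  pal <- grDevices::colorRampPalette(c("#EFF3FF", "#08519C"))(6)
}

# Six hex color strings, ordered from lightest to darkest
print(pal)
```

Either way, pal is a character vector of six colors that image() can use directly through its col argument.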

How to do it...

We are going to summarize a number of NBA player statistics in the same heat map using the image() function:

rownames(nba)<-nba[,1]

data_matrix<-t(scale(data.matrix(nba[,-1])))

pal=brewer.pal(6,"Blues")

statnames<-c("Games Played", "Minutes Played", "Total Points",
"Field Goals Made", "Field Goals Attempted", 
"Field Goal Percentage", "Free Throws Made", 
"Free Throws Attempted", "Free Throw Percentage", 
"Three Pointers Made", "Three Pointers Attempted", 
"Three Point Percentage", "Offensive Rebounds", 
"Defensive Rebounds", "Total Rebounds", "Assists", "Steals", 
"Blocks", "Turnovers", "Fouls")

par(mar = c(3,14,19,2),oma=c(0.2,0.2,0.2,0.2),mex=0.5)

#Heat map 
image(x=1:nrow(data_matrix),y=1:ncol(data_matrix),
z=data_matrix,xlab="",ylab="",col=pal,axes=FALSE)

#X axis labels
text(1:nrow(data_matrix), par("usr")[4] + 1, 
srt = 45, adj = 0,labels = statnames,
xpd = TRUE, cex=0.85)

#Y axis labels
axis(side=2,at=1:ncol(data_matrix),
labels=colnames(data_matrix),
col="white",las=1, cex.axis=0.85)

#White separating lines
abline(h=c(1:ncol(data_matrix))+0.5,
v=c(1:nrow(data_matrix))+0.5,
col="white",lwd=1,xpd=FALSE)

#Graph Title
text(par("usr")[1]+5, par("usr")[4] + 12,
"NBA per game performance of top 50 scorers", 
xpd=TRUE,font=2,cex=1.5)

How it works...

As in the preceding couple of recipes, we first gave the dataset appropriate row names (in this case, the names of the players) and cast it as a matrix. We did one additional thing: we scaled the values in the matrix using the scale() function, which centers each column on its mean and divides it by its standard deviation, so that the relative values of every column can be shown on the same color scale. Finally, we transposed the matrix with t() so that the statistics run along the x axis and the players along the y axis.
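To make the data preparation concrete, here is a small sketch that runs the same steps (row names, data.matrix(), scale(), t()) on a hypothetical three-player data frame. The names nba_toy, PTS and AST and all the values are made up for illustration and are not taken from nba.csv. The sketch also checks scale() against a manual "subtract the column mean, divide by the column standard deviation" computation.

```r
# Hypothetical stand-in for nba.csv: a name column followed by two statistics
nba_toy <- data.frame(
  Name = c("Player A", "Player B", "Player C"),
  PTS  = c(10, 20, 30),   # made-up points per game
  AST  = c(1, 2, 3),      # made-up assists per game
  stringsAsFactors = FALSE
)

# Same steps as the recipe: names become row names, the name column is
# dropped, and the remaining columns are cast to a numeric matrix
rownames(nba_toy) <- nba_toy[, 1]
m <- data.matrix(nba_toy[, -1])

# scale() centers each column on its mean and divides by its standard deviation
s <- scale(m)
manual <- sweep(sweep(m, 2, colMeans(m), "-"), 2, apply(m, 2, sd), "/")
print(all.equal(as.vector(s), as.vector(manual)))

# PTS and AST had very different raw ranges but are now on the same scale
print(s)

# t() swaps the axes: statistics become rows (x axis of image()),
# players become columns (y axis of image())
data_toy <- t(s)
print(dim(data_toy))
```

Without scaling, a single color scale would be dominated by the statistics with the largest raw numbers (minutes or points) and columns such as percentages would look uniformly pale.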

We chose a six-color blue palette from the RColorBrewer library with brewer.pal(6, "Blues"). We also created a vector with descriptive names of the player statistics to use as labels for the x axis.
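How do six colors cover a continuous range of scaled values? By default, image() splits the range of z into as many equally spaced intervals as there are colors (see the zlim argument in ?image). The sketch below reproduces that binning with findInterval() for a handful of hypothetical scaled values; it approximates image()'s internal arithmetic and may differ for values that fall exactly on an interval boundary.

```r
# Hypothetical scaled values (not taken from nba.csv)
z <- c(-1.5, -0.4, 0, 0.3, 1.2, 2.0)
n_colors <- 6  # brewer.pal(6, "Blues") supplies six colors

# Six equal-width intervals spanning the range of z
breaks <- seq(min(z), max(z), length.out = n_colors + 1)

# Color index for each value: 1 = lightest blue, 6 = darkest blue
bin <- findInterval(z, breaks, rightmost.closed = TRUE)
print(breaks)
print(bin)
```

In the recipe the range is taken over the whole data_matrix, not column by column, which is another reason the columns need to be scaled to comparable values first.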

The code for the heat map itself and the axis labels is very similar to the previous recipe. We used the image() function with data_matrix as z and suppressed the default axes. Then, we used text() and axis() to add the x and y axis labels. We also used the text() function to add the graph title (instead of the title() function) in order to left align it with the y axis labels instead of the heat map.
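The label and separator positions depend on how image() lays out its cells. With x = 1:n, each cell is centered on an integer and extends half a unit on either side, and image() uses exact axis limits by default (xaxs = "i", yaxs = "i"), so par("usr") runs from 0.5 to n + 0.5. That is why the white separating lines sit at integer + 0.5 and why par("usr")[4] + 1 places the rotated statistic names one unit above the top row. The sketch below checks this on a hypothetical 3 x 2 matrix, drawing to a null PDF device so that no window opens and no file is written.

```r
# Hypothetical 3 x 2 matrix: 3 "statistics" (x axis) by 2 "players" (y axis)
z <- matrix(1:6, nrow = 3)

pdf(NULL)  # off-screen device; produces no output file
# z[i, j] is drawn at (x[i], y[j]), so column 1 (the first player) is at the bottom
image(x = 1:nrow(z), y = 1:ncol(z), z = z, axes = FALSE, xlab = "", ylab = "")

# Plot region limits: c(x_min, x_max, y_min, y_max)
usr <- par("usr")
print(usr)

# Cell borders lie halfway between the integer cell centers,
# matching the abline(h = ..., v = ...) call in the recipe
v_lines <- 1:nrow(z) + 0.5
h_lines <- 1:ncol(z) + 0.5
abline(v = v_lines, h = h_lines, col = "white")
invisible(dev.off())
```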

There's more

As shown in the FlowingData blog post, we can order the rows of the matrix by the values in any one column. By default, reading the heat map from top to bottom, the players appear in ascending order of total points (as can be seen from the light to dark blue progression in the Total Points column). Because image() draws the first player in the data at the bottom of the plot, sorting the data in ascending order of points puts the top scorer at the top. To order the players from highest to lowest score, reading from the top, we need to run the following code right after reading the CSV file:

nba <- nba[order(nba$PTS),]

Note

See help on the order() function by running ?order or help(order) at the R prompt.
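The following sketch shows why ascending order is the right choice here, using a hypothetical four-player data frame (the names and point values are made up). order() returns the row indices that sort PTS from lowest to highest; since image() stacks players from the bottom of the heat map upward, the last row after sorting, the top scorer, ends up in the top row.

```r
# Hypothetical players and points per game (not taken from nba.csv)
df <- data.frame(Name = c("A", "B", "C", "D"),
                 PTS  = c(18, 30, 10, 25),
                 stringsAsFactors = FALSE)

# Ascending order: lowest scorer first, i.e. drawn at the bottom of the heat map
df_sorted <- df[order(df$PTS), ]
print(df_sorted$Name)  # players listed from bottom row to top row

# Descending order would instead put the top scorer in the bottom row
df_desc <- df[order(df$PTS, decreasing = TRUE), ]
print(df_desc$Name)
```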

Then, we can run the rest of the code to make the following graph:
