Multiple histograms along the diagonal of a pairs plot

In this recipe, we will look at some slightly advanced code to embed histograms inside another kind of graph. We learned how to make pairs plots (a matrix of scatter plots) in Chapter 1, R Graphics, and Chapter 4, Creating Scatter Plots. In those pairs plots, the diagonal cells running from the top-left to the bottom-right showed the names of the variables, while the other cells showed the relationship between each pair of variables. It would be useful to also see the distribution of each variable in the same plot. Here, we will learn how to do this by adding histograms inside the diagonal cells.

Getting ready

We will use the inbuilt iris flowers dataset of R. So, we need not load any other datasets. We can simply type in the given code at the R prompt.

How to do it...

So, let's make an enhanced pairs plot showing the relationship between different measurements of the iris flower species and how each measurement's values are spread across the range:

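# Panel function that draws a density-scaled histogram in a diagonal cell
panel.hist <- function(x, ...) {
  # Keep the x limits set up by the pairs plot; set the y limits to 0 and 1.5
  par(usr = c(par("usr")[1:2], 0, 1.5))
  # add = TRUE draws into the current cell instead of starting a new plot
  hist(x, prob = TRUE, add = TRUE, col = "black", border = "white")
}

plot(iris[, 1:4],
     main = "Relationships between characteristics of iris flowers",
     pch = 19, col = "blue", cex = 0.9,
     diag.panel = panel.hist)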

How it works...

We first defined the panel.hist() function that handles how the histograms are drawn. It is called by the plot() function later when the diag.panel argument is set to panel.hist.
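To see this call pattern in isolation, here is a minimal sketch that passes a stand-in diagonal panel function. The names count.panel and seen are hypothetical and not part of the recipe; the function draws nothing and only records how many values it receives on each call. The null PDF device is an assumption made so that the sketch runs without opening a plot window.

```r
# Null device: nothing is shown on screen, but plot() still runs normally
pdf(NULL)

# Hypothetical helper: remember the length of x for every call
seen <- new.env()
seen$lengths <- integer(0)
count.panel <- function(x, ...) {
  seen$lengths <- c(seen$lengths, length(x))
}

# A data frame with more than two numeric columns is drawn as a pairs plot
plot(iris[, 1:4], diag.panel = count.panel)
dev.off()

seen$lengths  # one entry per diagonal cell, each the length of a full column
```

The function passed as diag.panel is called once for each diagonal cell and receives that variable's values as x, which is why panel.hist() can hand x straight to hist().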

The panel.hist() function has only two simple lines of code. First, we call the par() function to set the X and Y limits using the usr argument. To reiterate what we learnt in Chapter 3, Beyond the Basics – Adjusting Key Parameters, the usr argument takes a vector of the form c(xmin,xmax,ymin,ymax), giving the minimum and maximum values on the x and y axes, respectively. In the code, we keep the x axis limits the same as already set up by the plot() function call. We need to change the y axis limits for each diagonal cell because they are set by plot() to be the same as the x axis limits. Because hist() is called with prob set to TRUE, the bar heights are probability densities rather than counts, so we need y axis limits on the density scale and set them to 0 and 1.5.
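The effect of the usr setting can be checked outside a pairs plot. The following is a minimal sketch under a few assumptions: it uses a throwaway plot of 1:10, the "i" axis style so that the initial limits equal the data range exactly, and a null PDF device so that nothing is drawn on screen. The variable names are illustrative only.

```r
# Null device: no drawing occurs, but par() can still be queried
pdf(NULL)

# xaxs/yaxs = "i" make the plot limits equal to the data range
plot(1:10, xaxs = "i", yaxs = "i")
usr_before <- par("usr")  # c(xmin, xmax, ymin, ymax) as set up by plot()

# Keep the x limits, replace the y limits, as panel.hist() does
par(usr = c(usr_before[1:2], 0, 1.5))
usr_after <- par("usr")

dev.off()

usr_before
usr_after
```

Only the last two elements of the vector change, so anything drawn afterwards uses the original x scale and the new 0 to 1.5 y scale.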

Then, we make the hist() function call with the style arguments of our choice and one key argument, add (set to TRUE), which makes sure the histograms are added to the existing pairs plot and not drawn as new plots. No panel function should start a new plot, or it will terminate the pairs plot. So, we can't use the hist() function here without setting add to TRUE.
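One way to sidestep the new-plot risk altogether, similar in spirit to the example on R's ?pairs help page, is to compute the histogram without drawing it and then draw the bars with rect(). The sketch below is a variation on the recipe rather than the recipe's own method: the name panel.hist.rect is hypothetical, and it sets the upper y limit from the tallest bar (with an assumed 10 percent of headroom) instead of the fixed value of 1.5.

```r
# Hypothetical alternative to panel.hist(): bars are drawn with rect()
panel.hist.rect <- function(x, ...) {
  usr <- par("usr")
  # Compute bin breaks and densities only; plot = FALSE draws nothing
  h <- hist(x, plot = FALSE)
  n <- length(h$breaks)
  # Keep the x limits; scale the y axis to the tallest bar plus headroom
  par(usr = c(usr[1:2], 0, 1.1 * max(h$density)))
  # One rectangle per bin: left edge, bottom, right edge, top
  rect(h$breaks[-n], 0, h$breaks[-1], h$density,
       col = "black", border = "white")
}

plot(iris[, 1:4],
     main = "Relationships between characteristics of iris flowers",
     pch = 19, col = "blue", cex = 0.9,
     diag.panel = panel.hist.rect)
```

Because rect() only ever adds to the current plot, this version cannot start a new plot by accident, and the data-driven y limit keeps the bars inside the cell for variables whose densities differ widely.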
