Grouping over a variable

In this recipe, we will see how we can summarize data for a variable with respect to another variable in the dataset. We will learn to group over a variable such that a separate box plot is created for each group.

Getting ready

This recipe uses only the base graphics functions, so open the R prompt and type in the code that follows. We will use the metals.csv example dataset, so let's load it first:

metals<-read.csv("metals.csv")

How to do it...

Let's make a box plot that shows copper (Cu) concentrations grouped over measurement sites:

boxplot(Cu~Source,data=metals,
main="Summary of Copper (Cu) concentrations by Site")

How it works...

The preceding box plot uses the formula notation, y~group, where y is the variable to summarize and group is the variable that defines the groups. R draws a separate box plot of y for each distinct value of group.
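As a minimal sketch of how the formula splits the data, the following code builds a small made-up data frame (toy, with hypothetical values standing in for metals.csv) and asks boxplot() to return its per-group summary without drawing, using plot=FALSE:

```r
# Hypothetical stand-in for metals.csv; the values are invented for illustration
set.seed(1)
toy <- data.frame(
  Cu = c(rnorm(10, mean = 5), rnorm(10, mean = 8)),
  Source = factor(rep(c("SiteA", "SiteB"), each = 10))
)

# plot = FALSE returns the box plot statistics instead of drawing them
stats <- boxplot(Cu ~ Source, data = toy, plot = FALSE)
stats$names  # one label per value of Source
stats$n      # number of observations in each group
stats$stats  # one column of five summary values per group

# The same formula without plot = FALSE draws one box per group
boxplot(Cu ~ Source, data = toy, main = "Toy Cu values by Source")
```

Each column of stats$stats holds the lower whisker, lower hinge, median, upper hinge, and upper whisker for one group.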

There's more...

Grouping over a variable works well only when the group variable has a limited number of values, for example, when it is a category (or factor in terms of an R data type) such as Source in this example. Grouping over another numerical variable with lots of unique values (say, manganese (Mn) concentrations) would result in a graph with too many box plots and not tell us much about the data.
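One possible workaround, sketched here with made-up data rather than the real metals.csv values, is to bin the numerical variable into a few intervals with cut() and group over the bins. The toy data frame, the MnLevel column, and the choice of three bins are all assumptions made for illustration:

```r
# Hypothetical values; not taken from metals.csv
set.seed(2)
toy <- data.frame(Cu = runif(50, 1, 10), Mn = runif(50, 0, 100))

# Almost every Mn value is unique, so Cu ~ Mn would draw a box per value
length(unique(toy$Mn))

# Bin Mn into three equal-width intervals so that it behaves like a category
toy$MnLevel <- cut(toy$Mn, breaks = 3, labels = c("low", "medium", "high"))

boxplot(Cu ~ MnLevel, data = toy,
        main = "Toy Cu values by binned Mn")
```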

We can also group over more than one category. To group over Source and another variable, Expt (the experiment number), we can run:

boxplot(Cu~Source*Expt,data=metals,
main="Summary of Copper (Cu) concentrations by Site and Experiment")
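To see which groups such a formula produces, here is a small sketch on invented data (the toy data frame, its site names, and its experiment numbers are hypothetical). With plot=FALSE, boxplot() returns the group labels it would draw, one for each combination of Source and Expt:

```r
# Hypothetical design: 2 sites x 2 experiments, 5 measurements each
toy <- expand.grid(Source = c("SiteA", "SiteB"), Expt = c(1, 2), rep = 1:5)
set.seed(3)
toy$Cu <- rnorm(nrow(toy), mean = 5)  # invented concentrations

stats <- boxplot(Cu ~ Source * Expt, data = toy, plot = FALSE)
stats$names  # one label per Source/Expt combination
stats$n      # observations behind each box

boxplot(Cu ~ Source * Expt, data = toy,
        main = "Toy Cu values by Source and Expt")
```

The number of boxes is the product of the number of levels in each grouping variable, so the plot becomes crowded quickly as more categories are added.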

See also

We will use grouped box plots as examples in the next few recipes.
