In this recipe, we will see how to summarize one variable with respect to another variable in the dataset. We will learn to group over a variable so that a separate box plot is drawn for each group.
We will only use the base graphics functions for this recipe. So, just open up the R prompt and type in the following code. We will use the metals.csv example dataset for this recipe. So, let's first load it:
metals<-read.csv("metals.csv")
Let's make a box plot that shows copper (Cu) concentrations grouped over measurement sites:
boxplot(Cu~Source,data=metals, main="Summary of Copper (Cu) concentrations by Site")
The preceding box plot works by using the formula notation, y~group, where y is the numeric variable being summarized and group is the grouping variable: a separate box plot of y is drawn for each value of group.
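To try the formula notation without metals.csv at hand, here is a minimal sketch that uses ToothGrowth, a dataset that ships with R, as a stand-in. The dataset and its column names (len, supp) are not part of this recipe; they are used here only for illustration.

```r
# ToothGrowth ships with R: len is numeric, supp is a factor with two levels.
# This mirrors boxplot(Cu~Source, data=metals): y on the left, group on the right.
res <- boxplot(len ~ supp, data = ToothGrowth,
               xlab = "Supplement", ylab = "Tooth length",
               main = "Tooth length by supplement type")

# boxplot() invisibly returns the statistics behind the plot.
res$names   # group labels, one per box
res$n       # number of observations in each group
```

Storing the return value is optional; it is handy for checking how many boxes were drawn and how many observations went into each one.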
Grouping over a variable works well only when the group variable has a limited number of values, for example, when it is a category (or factor in terms of an R data type) such as Source in this example. Grouping over a numerical variable with lots of unique values (say, manganese (Mn) concentrations) would produce one box plot per unique value, resulting in a graph with too many box plots that does not tell us much about the data.
We can also group over more than one category. If we wanted to group over Source and another variable, Expt, the experiment number, we can run:
boxplot(Cu~Source*Expt,data=metals, main="Summary of Copper (Cu) concentrations by Site and Experiment")
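Grouping over two variables draws one box for every combination of their values. As a self-contained sketch, again assuming the built-in ToothGrowth dataset rather than metals.csv, supp (two levels) and dose (three values) yield six boxes:

```r
# One box per supp/dose combination; las = 2 turns the axis labels
# perpendicular to the axis so the longer combined names stay readable.
res2 <- boxplot(len ~ supp * dose, data = ToothGrowth,
                las = 2, xlab = "", ylab = "Tooth length",
                main = "Tooth length by supplement and dose")

length(res2$names)   # number of boxes drawn
res2$n               # observations behind each box
```

The number of boxes grows as the product of the number of levels, so this works best when both grouping variables have only a few values.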