A scatter plot is one of the simplest ways to visualize the relationship between two numeric variables. In this recipe, we will see how to produce a scatter plot of two numeric variables conditional on a categorical variable.
The dataset used for this recipe is as follows:
# Set a seed value to make the data reproducible
set.seed(12345)
qqdata <- data.frame(disA=rnorm(n=100,mean=20,sd=3),
                     disB=rnorm(n=100,mean=25,sd=4),
                     disC=rnorm(n=100,mean=15,sd=1.5),
                     age=sample(c(1,2,3,4),size=100,replace=TRUE),
                     sex=sample(c("Male","Female"),size=100,replace=TRUE),
                     econ_status=sample(c("Poor","Middle","Rich"),size=100,replace=TRUE))
The primary code structure that produces the scatter plot using the lattice environment is as follows:
library(lattice)
xyplot(disA~disB, data=qqdata)
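However, in this recipe, we want to produce a conditional scatter plot. We can perform the conditioning in two different ways: by coloring the points according to the categorical variable within a single plot, or by drawing a separate panel for each value of the categorical variable.
The code for the colored scatter plot is as follows: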
# colored scatter plot
xyplot(disA~disB,groups=sex,data=qqdata,auto.key=TRUE)
To create the panel scatter plot, we could use the following code, where a scatter plot is created for each unique value of a categorical variable. In this case, sex is the categorical variable:
# panel scatter plot
xyplot(disA~disB|sex,data=qqdata)
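The formula part of the xyplot() function specifies the variable for each axis: the variable to the left of the tilde goes on the y axis and the variable to the right goes on the x axis, so each pair of values is drawn as a single point. The groups argument (which R also accepts in abbreviated form as group) creates the conditional plot by giving the points of each category a different color within one plot. However, if we leave out this argument and instead place a vertical bar followed by the categorical variable in the formula, then a panel scatter plot is produced, with one panel for each value of that variable.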
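The two forms of conditioning described above can also be combined in one call. The following is a minimal sketch, not part of the original recipe: it rebuilds a cut-down version of the data (only the columns it needs, so the simulated values differ from the full qqdata), and the object name combined is chosen only for illustration. It colors the points by sex and draws one panel for each value of econ_status.

```r
library(lattice)

# Cut-down version of the recipe's data so that this sketch is self-contained
# (assumption: only these four columns are needed here).
set.seed(12345)
qqdata <- data.frame(
  disA = rnorm(n = 100, mean = 20, sd = 3),
  disB = rnorm(n = 100, mean = 25, sd = 4),
  sex = sample(c("Male", "Female"), size = 100, replace = TRUE),
  econ_status = sample(c("Poor", "Middle", "Rich"), size = 100, replace = TRUE)
)

# The vertical bar requests one panel per value of econ_status;
# groups colors the points by sex within each panel.
combined <- xyplot(disA ~ disB | econ_status, groups = sex,
                   data = qqdata, auto.key = TRUE)

# xyplot() returns a "trellis" object; printing it draws the plot.
print(combined)

# The stored conditioning levels show which panels were created.
combined$condlevels
```

Storing the result in a variable before printing it is optional at the console, where the plot is drawn automatically, but an explicit print() is needed when the call sits inside a function or a loop.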