Most real-world research data contains multiple categorical variables. We can summarize these variables with cross-tabulation, and a bar chart offers a straightforward way to visualize the result. In this recipe, we will see how to produce a bar chart that visualizes a cross-tabulation.
To produce a bar chart from a cross-tabulation, we will add two new variables to the dataset that we used in the first two recipes. The new variables represent sex and economic status. Here is the code that prepares the dataset:
# Set a seed value to make the data reproducible
set.seed(12345)
cross_tabulation_data <- data.frame(
  disA=rnorm(n=100,mean=20,sd=3),
  disB=rnorm(n=100,mean=25,sd=4),
  disC=rnorm(n=100,mean=15,sd=1.5),
  age=sample(c(1,2,3,4),size=100,replace=TRUE),
  sex=sample(c("Male","Female"),size=100,replace=TRUE),
  econ_status=sample(c("Poor","Middle","Rich"),size=100,replace=TRUE))
Since we want to produce a bar chart for cross-tabulated data, we will summarize the dataset so that it contains the frequency count of each combination of the variables. For this, we will use the table() function; here is the code:
# producing the cross tabulation data
# with three categorical variables
cross_table <- as.data.frame(table(cross_tabulation_data[,4:6]))
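To see what the summarized data looks like before plotting, the following minimal sketch rebuilds the dataset and inspects the result. It assumes only base R; the calls to str(), head(), and sum() are added here for inspection and are not part of the recipe itself.

```r
# Rebuild the example data with the same call as in the recipe
set.seed(12345)
cross_tabulation_data <- data.frame(
  disA = rnorm(n = 100, mean = 20, sd = 3),
  disB = rnorm(n = 100, mean = 25, sd = 4),
  disC = rnorm(n = 100, mean = 15, sd = 1.5),
  age = sample(c(1, 2, 3, 4), size = 100, replace = TRUE),
  sex = sample(c("Male", "Female"), size = 100, replace = TRUE),
  econ_status = sample(c("Poor", "Middle", "Rich"), size = 100, replace = TRUE)
)

# Cross-tabulate columns 4 to 6 (age, sex, econ_status) and flatten the
# resulting three-way table into a data frame with a Freq column
cross_table <- as.data.frame(table(cross_tabulation_data[, 4:6]))

# One row per combination of observed levels, plus the frequency count
str(cross_table)
head(cross_table)

# The counts add up to the number of rows in the original data
sum(cross_table$Freq)
```

Each categorical variable becomes a factor column, and Freq holds the number of observations in that combination; this long format is what the barchart() formula works with.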
We want to visualize the frequency distribution of age for each combination of sex and economic status; for this, the lattice code is as follows:
# load lattice if it is not already attached
library(lattice)
barchart(age~Freq|sex+econ_status,data=cross_table)
The following points describe how the code produces the figure, one argument at a time:
barchart(age~Freq|sex+econ_status,data=cross_table)
The age~Freq part of the formula specifies that age is displayed on the y axis and the frequency count on the x axis.
The sex+econ_status part, after the | symbol, specifies that the display is split into one panel for each unique combination of these two variables.
The notable feature of this plot is how the formula is written in the barchart() function. Writing the formula differently produces a different visualization. For example, let's replace age with the econ_status variable and write the following code:
barchart(econ_status~Freq|sex+age,data=cross_table)
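The effect of rearranging the formula can also be checked without looking at the plots. The sketch below stores both charts as objects and counts the levels of each conditioning variable, which determines the panel grid. The object names p_age and p_econ are illustrative, and reading the condlevels component of the trellis object is an inspection step added here, not part of the recipe.

```r
library(lattice)

# Rebuild the example data and the cross-tabulation from the recipe
set.seed(12345)
cross_tabulation_data <- data.frame(
  disA = rnorm(n = 100, mean = 20, sd = 3),
  disB = rnorm(n = 100, mean = 25, sd = 4),
  disC = rnorm(n = 100, mean = 15, sd = 1.5),
  age = sample(c(1, 2, 3, 4), size = 100, replace = TRUE),
  sex = sample(c("Male", "Female"), size = 100, replace = TRUE),
  econ_status = sample(c("Poor", "Middle", "Rich"), size = 100, replace = TRUE)
)
cross_table <- as.data.frame(table(cross_tabulation_data[, 4:6]))

# Same data, two formulas: the variable before ~ goes on the y axis,
# the variables after | define the panels
p_age  <- barchart(age ~ Freq | sex + econ_status, data = cross_table)
p_econ <- barchart(econ_status ~ Freq | sex + age, data = cross_table)

# Number of levels of each conditioning variable in each chart
lengths(p_age$condlevels)
lengths(p_econ$condlevels)

# Draw the second chart explicitly; print() is required inside
# functions and loops, where lattice objects are not auto-printed
print(p_econ)
```

Conditioning on sex+age instead of sex+econ_status changes the number of panels, because age has more levels than econ_status.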
To label each axis and give the chart a title, we specify the respective arguments in the barchart() call: main for the title, ylab for the y axis label, and xlab for the x axis label, as shown in the following code:
barchart(econ_status~Freq|sex+age,data=cross_table, main="Chart title",xlab="Frequency count",ylab="Economic Status")
To change the color of the bars, we can use the col argument to specify the desired color, for example col="black". The following figure shows the bar chart from a cross-tabulated frequency distribution. In particular, this figure shows a three-way frequency distribution of the age, economic status, and sex of the individuals.
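As a minimal sketch of how the col argument described above combines with the title and axis labels, the following code puts them in a single call. The object name p is illustrative, and the label text is the same placeholder text used earlier in the recipe.

```r
library(lattice)

# Rebuild the example data and the cross-tabulation from the recipe
set.seed(12345)
cross_tabulation_data <- data.frame(
  disA = rnorm(n = 100, mean = 20, sd = 3),
  disB = rnorm(n = 100, mean = 25, sd = 4),
  disC = rnorm(n = 100, mean = 15, sd = 1.5),
  age = sample(c(1, 2, 3, 4), size = 100, replace = TRUE),
  sex = sample(c("Male", "Female"), size = 100, replace = TRUE),
  econ_status = sample(c("Poor", "Middle", "Rich"), size = 100, replace = TRUE)
)
cross_table <- as.data.frame(table(cross_tabulation_data[, 4:6]))

# Black bars, with a title and axis labels
p <- barchart(econ_status ~ Freq | sex + age, data = cross_table,
              col = "black",
              main = "Chart title",
              xlab = "Frequency count",
              ylab = "Economic Status")
print(p)
```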