Creating bar charts to visualize cross-tabulation

Most real-world research datasets contain several categorical variables. Cross-tabulation summarizes such variables numerically, and a bar chart lets us visualize the same summary. In this recipe, we will see how to produce a bar chart that visualizes a cross-tabulation.

Getting ready

To produce a bar chart from a cross-tabulation, we will add two new variables to the dataset that we used in the first two recipes. The new variables represent sex and economic status. Here is the code that prepares the dataset:

# Set a seed value to make the data reproducible
set.seed(12345)
cross_tabulation_data <- data.frame(disA=rnorm(n=100,mean=20,sd=3),
                disB=rnorm(n=100,mean=25,sd=4),
                disC=rnorm(n=100,mean=15,sd=1.5),
                age=sample(c(1,2,3,4),size=100,replace=TRUE),
                sex=sample(c("Male","Female"),size=100,replace=TRUE),
                econ_status=sample(c("Poor","Middle","Rich"),
                                   size=100,replace=TRUE))

Since we want to produce a bar chart for cross-tabulated data, we will summarize the dataset in such a way that it contains the frequency count of each combination of the variables. For this, we will use the table() function; here is the code:

# producing the cross tabulation data 
# with three categorical variables
cross_table <- as.data.frame(table(cross_tabulation_data[,4:6]))
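As a quick sanity check (a sketch, not part of the original recipe), the summarized data frame should contain one row per combination of the three variables, and its Freq column should add up to the number of observations. The small toy_data data frame below is hypothetical and is used only to illustrate the shape of the result:

```r
# Hypothetical toy data, used only to illustrate the shape of the result
toy_data <- data.frame(
  age = c(1, 1, 2, 2, 2),
  sex = c("Male", "Female", "Male", "Male", "Female"),
  econ_status = c("Poor", "Poor", "Rich", "Rich", "Poor")
)

# table() counts every combination of the three variables;
# as.data.frame() converts the counts to long format with a Freq column
toy_table <- as.data.frame(table(toy_data))

# One row per combination: 2 ages x 2 sexes x 2 statuses
nrow(toy_table)

# The counts add up to the number of observations in toy_data
sum(toy_table$Freq)

# Column names: the three variables followed by Freq
names(toy_table)
```

Combinations that never occur in the data still get a row, with a Freq of zero.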

How to do it…

We want to visualize the frequency distribution of age for each combination of sex and economic status. The lattice code for this is as follows:

# barchart() comes from the lattice package
library(lattice)
barchart(age~Freq|sex+econ_status,data=cross_table)

How it works…

In the following bullet points, we will describe how the code produced the preceding figure. We will explain each of the arguments separately:

  • The code is barchart(age~Freq|sex+econ_status,data=cross_table)
  • The first argument is a formula; the part before the | symbol specifies what will be on the y axis and what will be on the x axis
  • The part after the | symbol specifies the variables that define the panels
  • In the age~Freq part of the formula, the left-hand side (age) is displayed on the y axis, and the right-hand side (Freq, the frequency count) is displayed on the x axis
  • The sex+econ_status part specifies that the display will have one panel for each unique combination of these two variables
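The formula structure described above can be sketched as follows. This is a minimal, self-contained illustration rather than the recipe's exact code: demo_data, demo_table, and demo_plot are hypothetical names, and because only the three categorical variables are regenerated here, the sampled values will differ from those in cross_tabulation_data.

```r
library(lattice)

set.seed(12345)
# Rebuild only the three categorical variables (hypothetical demo data)
demo_data <- data.frame(
  age = sample(c(1, 2, 3, 4), size = 100, replace = TRUE),
  sex = sample(c("Male", "Female"), size = 100, replace = TRUE),
  econ_status = sample(c("Poor", "Middle", "Rich"), size = 100, replace = TRUE)
)
demo_table <- as.data.frame(table(demo_data))

# General form: y ~ x | g1 + g2
#   age         -> y axis
#   Freq        -> x axis
#   sex, econ_status -> one panel per combination of the two
demo_plot <- barchart(age ~ Freq | sex + econ_status, data = demo_table)

# lattice functions return a "trellis" object; printing it draws the chart
print(demo_plot)
```

Storing the result in an object and printing it explicitly is useful inside functions and loops, where a lattice plot is not drawn automatically.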

A notable feature of this plot is the way the formula is written in the barchart() function. Writing the formula differently produces a different visualization. For example, let's swap the roles of age and econ_status and write the following code:

barchart(econ_status~Freq|sex+age,data=cross_table)

There's more…

If we want to write a label for each axis and provide a title for this chart, we need to specify the respective argument in the plot function. For the title, we need to use the main argument; for the y axis label, we need to use ylab; and for the x axis label, we need to use xlab, as shown in the following code:

barchart(econ_status~Freq|sex+age,data=cross_table,
main="Chart title",xlab="Frequency count",ylab="Economic Status")

To change the color of the bars, we can use the col argument to specify the desired color, for example, col="black". The following figure shows the bar chart from a cross-tabulated frequency distribution. In particular, this figure shows a three-way frequency distribution of the age, economic status, and sex of the individuals.
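As a minimal sketch of the col argument, the following code draws black bars. The freq_table data frame and its counts are hypothetical values invented for illustration, not output from the recipe's dataset:

```r
library(lattice)

# Hypothetical frequency table with made-up counts, for illustration only
freq_table <- data.frame(
  econ_status = factor(c("Poor", "Middle", "Rich", "Poor", "Middle", "Rich")),
  sex = factor(rep(c("Male", "Female"), each = 3)),
  Freq = c(12, 20, 8, 15, 18, 10)
)

# col sets the fill color of the bars; main, xlab, and ylab add the
# title and axis labels
black_plot <- barchart(econ_status ~ Freq | sex, data = freq_table,
                       col = "black",
                       main = "Chart title",
                       xlab = "Frequency count",
                       ylab = "Economic Status")
print(black_plot)
```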
