Visualizing distributions through a kernel-density plot

The kernel-density plot is another method of visualizing the distribution of numeric variables. In this recipe, we will see how we can produce a kernel density plot with minor modifications to the code that produces a histogram.

Getting ready

Recall the data from the histogram recipe using the following code:

# Set a seed value to make the data reproducible
set.seed(12345)
cross_tabulation_data <- data.frame(disA=rnorm(n=100,mean=20,sd=3),
                disB=rnorm(n=100,mean=25,sd=4),
                disC=rnorm(n=100,mean=15,sd=1.5),
                age=sample(c(1,2,3,4),size=100,replace=TRUE),
                sex=sample(c("Male","Female"),size=100,replace=TRUE),
                econ_status=sample(c("Poor","Middle","Rich"),
                                   size=100,replace=TRUE))

How to do it…

Use the following code if you want to visualize the kernel density of the disA variable for each value of the sex variable:

# densityplot() is provided by the lattice package
library(lattice)
densityplot(~disA|sex,data=cross_tabulation_data)

How it works…

The densityplot function takes the same formula interface as histogram; only the visualization is different. The numeric variable after the ~ symbol is the variable whose density we want to plot. The categorical variable after the vertical bar (|) is the conditioning variable: one density plot (panel) is produced for each of its levels, so here we get one panel for each value of sex.

Now, if we want the density of more than one variable in the same plot, we write the formula as ~disA+disB. The curves for both variables are then drawn in each panel, so we need a legend key (auto.key) to tell them apart. So, the final code will be as follows:

densityplot(~disA+disB|sex,data=cross_tabulation_data,auto.key=TRUE)
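
To check how the formula maps to panels, we can store the plot in an object instead of drawing it straight away and then inspect that object. The following is only a minimal sketch, assuming the lattice package is installed; the names p_single and p_multi are illustrative and not part of the recipe, and the data frame is re-created so that the snippet runs on its own:

```r
library(lattice)

# Re-create the recipe data so that this snippet is self-contained
set.seed(12345)
cross_tabulation_data <- data.frame(
  disA = rnorm(n = 100, mean = 20, sd = 3),
  disB = rnorm(n = 100, mean = 25, sd = 4),
  disC = rnorm(n = 100, mean = 15, sd = 1.5),
  age = sample(c(1, 2, 3, 4), size = 100, replace = TRUE),
  sex = sample(c("Male", "Female"), size = 100, replace = TRUE),
  econ_status = sample(c("Poor", "Middle", "Rich"), size = 100, replace = TRUE)
)

# densityplot() returns a "trellis" object; assigning it suppresses
# the automatic drawing, so we can look at its structure first
p_single <- densityplot(~disA | sex, data = cross_tabulation_data)
p_multi <- densityplot(~disA + disB | sex, data = cross_tabulation_data,
                       auto.key = TRUE)

# Number of panels along the conditioning variable (one per level of sex)
dim(p_single)
dimnames(p_single)

# Adding disB puts a second curve in each panel; the panel layout
# is still determined by sex alone
dim(p_multi)

# Draw the stored plot explicitly
print(p_multi)
```

Storing the plot first is also handy inside functions and loops, where lattice plots are not drawn unless print is called on them.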

There's more…

To get one density plot for each unique combination of more than one categorical variable, we just need to add another categorical variable after the vertical bar using a plus (+) sign, for example, ~disA|sex+econ_status.

A kernel density plot approximates the probability density function (PDF) of the data with a curve that is smoother than the histogram. The default kernel function is gaussian, but we can easily change this to another function through the kernel= argument. We can also specify the other parameters of the kernel estimate, such as the bandwidth (bw). Other arguments, such as main, xlab, ylab, and col, work in the same way as in the other recipes.
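
The options described here can be sketched as follows. This is a hedged example rather than the only way to do it: it assumes the lattice package is installed, the object names p_combo and p_kernel are illustrative, and the bandwidth of 1.5 and the title and axis label are arbitrary values chosen for demonstration. The data frame is re-created so that the snippet runs on its own:

```r
library(lattice)

# Re-create the recipe data so that this snippet is self-contained
set.seed(12345)
cross_tabulation_data <- data.frame(
  disA = rnorm(n = 100, mean = 20, sd = 3),
  disB = rnorm(n = 100, mean = 25, sd = 4),
  disC = rnorm(n = 100, mean = 15, sd = 1.5),
  age = sample(c(1, 2, 3, 4), size = 100, replace = TRUE),
  sex = sample(c("Male", "Female"), size = 100, replace = TRUE),
  econ_status = sample(c("Poor", "Middle", "Rich"), size = 100, replace = TRUE)
)

# One panel for each combination of sex and econ_status
p_combo <- densityplot(~disA | sex + econ_status,
                       data = cross_tabulation_data)

# Rectangular kernel with a fixed bandwidth instead of the gaussian default;
# kernel and bw are handed on to the underlying density estimate
p_kernel <- densityplot(~disA | sex,
                        data = cross_tabulation_data,
                        kernel = "rectangular",
                        bw = 1.5,                      # arbitrary illustrative value
                        main = "Kernel density of disA by sex",
                        xlab = "disA")

# Panels per conditioning variable, in the order given in the formula
dim(p_combo)

# Draw both plots
print(p_combo)
print(p_kernel)
```

A smaller bw follows the data more closely and gives a bumpier curve, while a larger one smooths more, so it is worth trying a few values before settling on one.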
