In this chapter, we will cover the following recipes:
Exploratory data analysis is a widely used approach for revealing patterns in data and associations among variables. In this chapter, we will learn how to visualize multivariate data in a single plot that can display both continuous and categorical variables. A continuous variable can take any numeric value, including fractional values, such as a person's age or height, whereas a categorical variable usually takes a limited number of values, each representing a nominal group. Examples of categorical variables include sex and occupation. In this chapter, we will use the tabplot library to produce the plot and explore the various options that this library offers.
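In R, this distinction is usually encoded in the column type: numeric vectors hold continuous values and factors hold categorical ones. The following is a minimal sketch using a small made-up data frame; the people object and all of its values are hypothetical and serve only as an illustration:

```r
# Hypothetical data, for illustration only
people <- data.frame(
  age        = c(23.5, 41, 35.2, 58),          # continuous
  height     = c(170.2, 165, 181.4, 175.9),    # continuous
  sex        = factor(c("F", "M", "M", "F")),  # categorical
  occupation = factor(c("nurse", "teacher", "nurse", "engineer"))  # categorical
)

# Numeric columns are treated as continuous, factors as categorical
col_is_numeric <- sapply(people, is.numeric)
col_is_factor  <- sapply(people, is.factor)
print(col_is_numeric)
print(col_is_factor)

# A factor can only take a limited set of values (its levels)
print(levels(people$occupation))
```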
We will create a dataset by modifying mtcars, one of the default datasets in R. We will keep the same variables but with a much larger number of observations. The mtcars dataset contains the following variables: mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, and carb.
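The structure of the original data can be inspected directly in R. The short sketch below only uses base R functions; note that every column of mtcars, including those treated as categorical in this chapter, is stored as numeric:

```r
# mtcars ships with base R (datasets package)
print(dim(mtcars))     # 32 rows, 11 columns
print(names(mtcars))   # variable names

# Compact overview of each column's type and first few values
str(mtcars)

# All columns are numeric in the original dataset
all_numeric <- all(sapply(mtcars, is.numeric))
print(all_numeric)
```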
In this dataset, we will consider the cyl, vs, am, gear, and carb variables as categorical variables and the others as continuous variables. Now, we will modify this dataset to make the number of observations 1000 through empirical simulation by generating quantiles, as follows:
# Load the mtcars dataset and store it in the dat object
dat <- mtcars

# Set the seed to make the data reproducible, that is, the code
# will create the same dataset on every run on any computer
set.seed(12345)

# Generate 1000 uniform random numbers, which will be used
# as the probs argument of the quantile function
probs <- runif(1000)

# Generate each of the variables separately
mpg  <- quantile(dat$mpg, probs = probs)
cyl  <- as.integer(quantile(dat$cyl, probs = probs))
disp <- as.integer(quantile(dat$disp, probs = probs))
hp   <- as.integer(quantile(dat$hp, probs = probs))
drat <- quantile(dat$drat, probs = probs)
wt   <- quantile(dat$wt, probs = probs)
qsec <- quantile(dat$qsec, probs = probs)
vs   <- as.integer(quantile(dat$vs, probs = probs))
am   <- as.integer(quantile(dat$am, probs = probs))
gear <- as.integer(quantile(dat$gear, probs = probs))
carb <- as.integer(quantile(dat$carb, probs = probs))

# Build a new data frame containing all the variables;
# some of them are converted to factors so that they
# are treated as categorical variables
modified_mtcars <- data.frame(mpg, cyl = factor(cyl),
                              disp, hp, drat, wt, qsec,
                              vs = factor(vs), am = factor(am),
                              gear = factor(gear), carb = factor(carb))
row.names(modified_mtcars) <- NULL
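The simulation above can be read as a form of inverse-transform sampling: each uniform random number is treated as a probability and mapped to the corresponding empirical quantile of the original variable. The sketch below illustrates this for two variables. It also shows one possible side effect of the default interpolating quantile type: combined with as.integer(), it can produce category values that never occur in mtcars. The use of type = 1 here is an assumption made for illustration, not part of the recipe; the names mpg_sim, cyl_interp, and cyl_obs are likewise hypothetical.

```r
set.seed(12345)
probs <- runif(1000)

# Default quantile type (type = 7) interpolates between
# neighbouring observations
mpg_sim <- quantile(mtcars$mpg, probs = probs, names = FALSE)
print(length(mpg_sim))   # one simulated value per probability
print(range(mpg_sim))    # stays within range(mtcars$mpg)

# Interpolation followed by as.integer() can yield cylinder counts
# that are not present in mtcars (which only contains 4, 6 and 8)
cyl_interp <- as.integer(quantile(mtcars$cyl, probs = probs, names = FALSE))
print(table(cyl_interp))

# Assumption: type = 1 (inverse of the empirical CDF) is used instead,
# so only values observed in the original data are returned
cyl_obs <- as.integer(quantile(mtcars$cyl, probs = probs,
                               type = 1, names = FALSE))
print(table(cyl_obs))
```

Because the same probs vector is reused for every variable, each simulated column is a non-decreasing function of the same underlying probability, so the simulated variables rise and fall together.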
Now, let's look at the actual recipe to see the pattern of this data.