Chapter 13. Inspecting Large Datasets

In this chapter, we will cover the following recipes:

  • Multivariate continuous data visualization
  • Multivariate visualization of categorical data
  • Visualizing mixed data
  • Zooming and filtering

Introduction

Exploratory data analysis is one of the most popular ways to look for patterns in data and for associations among variables. In this chapter, we will learn how to visualize multivariate data in a single plot that can show both continuous and categorical variables. A continuous variable can take any numeric value, including fractional values; examples are a person's age or height. A categorical variable usually takes a limited number of values, each representing a nominal group; examples include sex and occupation. In this chapter, we will use the tabplot package to produce the plots and explore the options it offers.
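To make the distinction concrete, here is a minimal sketch of how R represents the two kinds of variables. The age and sex values are hypothetical toy data, not part of the dataset used in this chapter.

```r
# Hypothetical toy values, used only to illustrate the two variable types
age <- c(23.5, 41.0, 36.2, 58.9)                       # continuous: numeric vector
sex <- factor(c("female", "male", "female", "male"))   # categorical: factor

is.numeric(age)   # TRUE for a continuous variable
is.factor(sex)    # TRUE for a categorical variable
levels(sex)       # the limited set of groups the factor can take

summary(age)      # numeric summary (minimum, quartiles, mean, maximum)
table(sex)        # frequency count per category
```

Storing categorical variables as factors matters because many R functions choose between a numeric summary and a frequency count based on the class of the variable.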

We will create a dataset by modifying mtcars, one of the datasets built into R. We will keep the same variables but increase the number of observations. The mtcars dataset contains the following variables:

  • mpg: Miles/(US) gallon
  • cyl: The number of cylinders
  • disp: Displacement (cu.in)
  • hp: The gross horsepower
  • drat: The rear axle ratio
  • wt: Weight (1000 lbs)
  • qsec: Quarter-mile time (seconds)
  • vs: Engine shape (0 = V-shaped, 1 = straight)
  • am: Transmission (0 = automatic, 1 = manual)
  • gear: The number of forward gears
  • carb: The number of carburetors
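Before deciding which variables to treat as categorical, it can help to look at the structure of mtcars and count how many distinct values each variable takes. The following is a small sketch of such a check; the name n_distinct is introduced here for illustration only.

```r
# mtcars ships with R, so no package needs to be loaded
str(mtcars)       # 32 observations of 11 numeric variables

# Count the distinct values per variable; variables with only a few
# distinct values are natural candidates for categorical treatment
n_distinct <- vapply(mtcars, function(x) length(unique(x)), integer(1))
sort(n_distinct)
```

Variables such as vs and am take only two distinct values, whereas mpg and qsec take many.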

In this dataset, we will treat the cyl, vs, am, gear, and carb variables as categorical and the others as continuous. Now, we will expand this dataset to 1,000 observations by simulating from the empirical distribution of each variable: we draw 1,000 uniform random numbers and use them as probabilities in the quantile function, as follows:

# Copy the built-in mtcars dataset into the dat object
dat <- mtcars

# Set the seed to make the data reproducible, that is,
# the code will create the same dataset on every run
# on any computer
set.seed(12345)

# Generate 1000 uniform random numbers between 0 and 1.
# They are used as the probabilities (the probs argument)
# passed to the quantile function
probs <- runif(1000)

# Generate each of the variables separately.
# as.integer() truncates any interpolated quantile to a whole number
mpg <- quantile(dat$mpg, probs = probs)
cyl <- as.integer(quantile(dat$cyl, probs = probs))
disp <- as.integer(quantile(dat$disp, probs = probs))
hp <- as.integer(quantile(dat$hp, probs = probs))
drat <- quantile(dat$drat, probs = probs)
wt <- quantile(dat$wt, probs = probs)
qsec <- quantile(dat$qsec, probs = probs)
vs <- as.integer(quantile(dat$vs, probs = probs))
am <- as.integer(quantile(dat$am, probs = probs))
gear <- as.integer(quantile(dat$gear, probs = probs))
carb <- as.integer(quantile(dat$carb, probs = probs))

# Build a new data frame containing all the variables.
# Some of the variables are converted to factors so that
# they are treated as categorical variables
modified_mtcars <- data.frame(mpg, cyl = factor(cyl),
                     disp, hp, drat, wt, qsec, vs = factor(vs),
                     am = factor(am), gear = factor(gear),
                     carb = factor(carb))

# Drop the row names so that the rows are simply numbered
row.names(modified_mtcars) <- NULL
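One detail of this approach is worth checking. By default, quantile() interpolates between observed values, so after truncation with as.integer() a categorical variable such as cyl can pick up values that do not occur in the original data. The sketch below compares the default with type = 1, which inverts the empirical distribution function. This is an assumed alternative shown for comparison, not the method used in this chapter, and the names cyl_interp and cyl_type1 are illustrative.

```r
set.seed(12345)                 # same seed as in the code above
probs <- runif(1000)

# Default (type 7) quantiles interpolate between observed values;
# as.integer() then truncates them, so values between 4, 6 and 8 can appear
cyl_interp <- as.integer(quantile(mtcars$cyl, probs = probs))

# Assumed alternative: type = 1 inverts the empirical CDF, so every
# draw is one of the values actually observed in mtcars$cyl
cyl_type1 <- quantile(mtcars$cyl, probs = probs, type = 1, names = FALSE)

table(cyl_interp)               # frequency of each simulated value (default)
table(cyl_type1)                # frequency of each simulated value (type = 1)
```

Comparing the two tables shows whether the simulated factor has picked up levels that are absent from mtcars. The rest of the chapter continues with modified_mtcars as created above.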

Now, let's move on to the recipes to explore the patterns in this data.
