Chapter 13. Inspecting Large Datasets

In this chapter, we will cover the following recipes:

Multivariate continuous data visualization
Multivariate visualization of categorical data
Visualizing mixed data
Zooming and filtering

Introduction

Exploratory data analysis is one of the most widely used approaches for finding patterns in data and associations among variables. In this chapter, we will learn how to visualize multivariate data in a single plot that shows both continuous and categorical variables. A continuous variable can take any numeric value, including fractional values; examples include a person's age or height. A categorical variable usually takes a limited number of values, each representing a nominal group; examples include sex and occupation. In this chapter, we will use the tabplot library to produce the plots and explore the options that this library offers.
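
As a small illustration of this distinction, the following sketch builds one continuous and one categorical vector in base R. The names height and occupation and their values are made up for this example; they are not part of the chapter's dataset.

```r
# Hypothetical example values, not taken from the chapter's data
height <- c(162.5, 175.0, 181.3, 158.9)            # continuous: any numeric value
occupation <- factor(c("teacher", "nurse",
                       "teacher", "engineer"))     # categorical: a few nominal groups

is.numeric(height)      # TRUE
is.factor(occupation)   # TRUE
nlevels(occupation)     # number of distinct groups
summary(height)         # numeric summary (min, quartiles, mean, max)
table(occupation)       # counts per group
```

Storing a categorical variable as a factor is what lets R, and plotting libraries built on it, treat the values as groups rather than as numbers.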

We will create a dataset by modifying mtcars, one of R's built-in datasets. We will keep the same variables but increase the number of observations. The mtcars dataset contains the following variables:

mpg: Miles/(US) gallon
cyl: The number of cylinders
disp: Displacement (cu.in)
hp: The gross horsepower
drat: The rear axle ratio
wt: Weight (1000 lb)
qsec: Quarter-mile time (seconds)
vs: Engine shape (0 = V-shaped, 1 = straight)
am: Transmission (0 = automatic, 1 = manual)
gear: The number of forward gears
carb: The number of carburetors

In this dataset, we will treat the cyl, vs, am, gear, and carb variables as categorical and the others as continuous. We will now expand the dataset to 1,000 observations through empirical simulation: we draw 1,000 uniform random probabilities and look up the corresponding quantile of each variable, as follows:

# Load the mtcars dataset and store it in the dat object
dat <- mtcars

# Set the seed to make the data reproducible, that is, the code
# creates the same dataset on every run on any computer
set.seed(12345)

# Generate 1000 uniform random numbers, which are used as
# the probs argument of the quantile function
probs <- runif(1000)

# Generate each of the variables separately.
# as.integer() truncates interpolated quantiles to whole numbers.
mpg <- quantile(dat$mpg, probs = probs)
cyl <- as.integer(quantile(dat$cyl, probs = probs))
disp <- as.integer(quantile(dat$disp, probs = probs))
hp <- as.integer(quantile(dat$hp, probs = probs))
drat <- quantile(dat$drat, probs = probs)
wt <- quantile(dat$wt, probs = probs)
qsec <- quantile(dat$qsec, probs = probs)
vs <- as.integer(quantile(dat$vs, probs = probs))
am <- as.integer(quantile(dat$am, probs = probs))
gear <- as.integer(quantile(dat$gear, probs = probs))
carb <- as.integer(quantile(dat$carb, probs = probs))

# Build a new data frame containing all the variables.
# The categorical variables are converted to factors.
modified_mtcars <- data.frame(mpg, cyl = factor(cyl),
                     disp, hp, drat, wt, qsec, vs = factor(vs),
                     am = factor(am), gear = factor(gear),
                     carb = factor(carb))
# Drop the quantile labels that were carried over as row names
row.names(modified_mtcars) <- NULL
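
The quantile-based simulation above can be read as inverse transform sampling from the empirical distribution: each uniform draw is mapped to a value between the smallest and largest observation of a variable. The following sketch repeats the idea for two variables and checks the result. The object names sim_mpg and sim_cyl are introduced here for illustration only, and type = 1 is an optional variation (not used in the code above) that returns only values actually observed in the data, which may suit discrete variables such as cyl.

```r
set.seed(12345)
probs <- runif(1000)

# Continuous variable: the default (type 7) quantiles interpolate
# between neighbouring observations
sim_mpg <- quantile(mtcars$mpg, probs = probs, names = FALSE)

# Discrete variable (assumed variation): type = 1 inverts the empirical
# CDF without interpolation, so only observed values can occur
sim_cyl <- quantile(mtcars$cyl, probs = probs, type = 1, names = FALSE)

length(sim_mpg)    # number of simulated values
range(sim_mpg)     # compare with range(mtcars$mpg)
table(sim_cyl)     # counts for each observed cylinder value
```

With the default interpolation followed by as.integer(), a discrete variable can receive whole numbers that never appear in the original data, so the choice between the two quantile types depends on whether that matters for the plot.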

Now, let's turn to the recipes to explore the patterns in this data.
