> sd(data$mpg)
[1] 7.815984
8.2.2 Basic Plots for Data Exploration
Although statistical techniques give a good idea of the nature and quality of the data in a
data set, visualization makes data exploration far more effective. R provides several powerful
libraries for exploring data through charts and plots. The front-runner amongst them is the
ggplot2 library. Created by Hadley Wickham, ggplot2 offers a comprehensive graphics module
for creating elaborate and complex plots.
In order to start using the library functions of ggplot2, we need to load the library as follows.
> library(ggplot2)
Let us now understand the different graphs used for data exploration and how to generate them
using R code.
Box Plot: A box plot is an extremely effective mechanism to get a one-shot view and understand
the nature of the data. A box plot (also called a box-and-whisker plot) gives a standard
visualization of the five-number summary statistics of a data set, namely Minimum, First
Quartile (Q1), Median (Q2), Third Quartile (Q3) and Maximum. Below is a detailed interpretation
of a box plot.
• The central rectangle or box spans from the first to the third quartile (i.e., Q1 to Q3),
thereby showing the interquartile range or IQR.
• The median is given by the line or band within the box.
• The lower whisker extends down to the smallest data value lying within 1.5 times the
interquartile range (IQR) of the bottom of the box, i.e., the first quartile or Q1.
• The upper whisker extends up to the largest data value lying within 1.5 times the
interquartile range (IQR) of the top of the box, i.e., the third quartile or Q3.
• Data values lying beyond the lower or upper whisker are unusually low or high,
respectively. These are the outliers, which may deserve special consideration.
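The interpretation above can be checked numerically. The following is a minimal sketch, assuming the Sepal.Width column of the built-in iris data set as an arbitrary example; the variable names are illustrative only. It uses fivenum(), which computes Tukey's hinges, so the quartile values may differ slightly from those returned by quantile() or summary().

```r
# Sepal.Width from the built-in iris data set is used purely as an example.
x <- iris$Sepal.Width

# Five-number summary: minimum, Q1, median, Q3, maximum.
five <- fivenum(x)
names(five) <- c("min", "Q1", "median", "Q3", "max")

# Interquartile range and the 1.5 * IQR limits for the whiskers.
iqr <- five[["Q3"]] - five[["Q1"]]
lower_limit <- five[["Q1"]] - 1.5 * iqr
upper_limit <- five[["Q3"]] + 1.5 * iqr

# Whiskers end at the most extreme observations inside the limits.
lower_whisker <- min(x[x >= lower_limit])
upper_whisker <- max(x[x <= upper_limit])

# Observations beyond the limits are flagged as outliers.
outliers <- x[x < lower_limit | x > upper_limit]

print(five)
cat("Whiskers:", lower_whisker, "to", upper_whisker, "\n")
cat("Outliers:", sort(outliers), "\n")
```

The built-in function boxplot.stats(x) reports the same whisker ends and outliers, so it can be used to cross-check the manual calculation.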
Syntax: boxplot(x, data, notch, varwidth, names, main)
Usage:
> # iris is a popular machine learning data set that comes bundled with R
> boxplot(iris)
A separate graphics window opens with the generated box plot, as shown in Figure 8.3.
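The remaining arguments listed in the syntax can be exercised in the same way. The sketch below is one possible illustration, not the only way to call the function: it assumes the formula interface (Sepal.Length ~ Species) to draw one box per species, and the display labels and title are made up for the example. The plot is sent to a null PDF device so that the code also runs without a display.

```r
# Draw to a null PDF device so the sketch also runs without a display;
# drop the pdf(NULL) and dev.off() lines to see the plot interactively.
pdf(NULL)

# One box per species, using the formula interface and the iris data set.
bp <- boxplot(Sepal.Length ~ Species, data = iris,
              notch = TRUE,      # draw a notch around each median
              varwidth = TRUE,   # box width proportional to sqrt(group size)
              names = c("Setosa", "Versicolor", "Virginica"),
              main = "Sepal length by species")

dev.off()

# boxplot() invisibly returns the statistics it plotted.
print(bp$stats)   # one column per box: lower whisker, Q1, median, Q3, upper whisker
print(bp$out)     # values drawn as outlier points
```

Because each species has the same number of observations in iris, varwidth has no visible effect here; it matters when group sizes differ.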