Working with Big Data inR | 225
Summary
R has the following basic data structures:
Scalar
Vector
Matrix
List
Array
Factor
Data frame
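As a minimal sketch, the data structures listed above can each be created in base R as follows. The variable names and values are arbitrary and chosen only for illustration.

```r
# Scalar: in R this is simply a vector of length one
x <- 42

# Vector: ordered elements of a single type
v <- c(1.5, 2.5, 3.5)

# Matrix: two-dimensional, single type
m <- matrix(1:6, nrow = 2, ncol = 3)

# Array: generalises the matrix to more than two dimensions
a <- array(1:24, dim = c(2, 3, 4))

# List: elements may have different types and lengths
l <- list(name = "R", version = 4, tags = c("stats", "graphics"))

# Factor: categorical data stored as integer codes with labels
f <- factor(c("low", "high", "low", "medium"),
            levels = c("low", "medium", "high"))

# Data frame: a table whose columns are equal-length vectors
df <- data.frame(id = 1:3, score = c(90.5, 72.0, 88.25))

# Inspect the structure of any object with str()
str(df)
```

Calling class() or str() on each object is a quick way to check which structure it is.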
The dplyr package is meant for advanced data manipulation.
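The kind of manipulation dplyr supports can be sketched as below. This assumes the dplyr package is installed. The built-in mtcars data set and the threshold of 100 horsepower are arbitrary choices for illustration.

```r
library(dplyr)

# mtcars is a data set that ships with R
result <- mtcars %>%
  filter(hp > 100) %>%                # keep rows with more than 100 horsepower
  group_by(cyl) %>%                   # one group per cylinder count
  summarise(mean_mpg = mean(mpg),     # average fuel economy per group
            n_cars   = n()) %>%       # number of cars per group
  arrange(desc(mean_mpg))             # highest average first

print(result)
```

Each verb takes a data frame and returns a data frame, which is what allows the steps to be chained with the pipe operator.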
Core R provides many functions that support basic statistical data exploration.
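A few of the core R functions commonly used for this kind of exploration are sketched below, using the built-in mtcars data set as an assumed example. The choice of columns is arbitrary.

```r
# Minimum, quartiles, median, mean and maximum of one numeric column
summary(mtcars$mpg)

# Individual statistics
avg_mpg <- mean(mtcars$mpg)
sd_mpg  <- sd(mtcars$mpg)
q_mpg   <- quantile(mtcars$mpg, probs = c(0.25, 0.5, 0.75))

# Frequency table of a categorical variable
cyl_counts <- table(mtcars$cyl)

# Correlation between two numeric variables
r_mpg_wt <- cor(mtcars$mpg, mtcars$wt)
```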
The ggplot2 library offers a comprehensive graphics module for creating elaborate and complex plots.
The basic plots supported by R for exploratory data analysis (EDA) are the box plot, histogram and scatter plot.
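The three basic plots named above can be produced with base R graphics roughly as follows. The mtcars data set and the axis labels are assumptions made for illustration.

```r
# Discard graphics output so the sketch runs in a non-interactive session;
# remove this line (and the dev.off() call) to see the plots on screen
pdf(NULL)

# Box plot: distribution of mpg for each cylinder count
bp <- boxplot(mpg ~ cyl, data = mtcars,
              xlab = "Cylinders", ylab = "Miles per gallon")

# Histogram: frequency distribution of a single numeric variable
h <- hist(mtcars$mpg, xlab = "Miles per gallon", main = "Histogram of mpg")

# Scatter plot: relationship between two numeric variables
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Close the graphics device
invisible(dev.off())
```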
The primary limitations faced when working with the conventional libraries of R are as follows:
The data tends to be larger than the memory available in RAM.
R's processing speed is relatively lower than that of comparable languages, for example, Python.
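The memory limitation arises because conventional R objects are held entirely in RAM. A rough sketch of how quickly this adds up, using an arbitrary vector of one million numbers:

```r
# One million double-precision values, all held in RAM
x <- numeric(1e6)

# Each double takes 8 bytes, so expect roughly 8 MB plus a small header
size_bytes <- as.numeric(object.size(x))
print(format(object.size(x), units = "Mb"))

# Modifying a copy creates a second full-size object in memory
y <- x
y[1] <- 1
total_bytes <- size_bytes + as.numeric(object.size(y))
```

A data set with billions of values would need thousands of times this amount, which can exceed the RAM of a typical machine.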
The following R packages help to deal with the above limitations:
ff and ffbase
parallel
data.table
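Of the packages listed, parallel ships with the standard R distribution and addresses the speed limitation by spreading work over several processes. The sketch below is illustrative only: slow_square is a hypothetical stand-in for a CPU-bound task, and the cluster size of two workers is an arbitrary assumption.

```r
library(parallel)

# Hypothetical slow task, used only for illustration
slow_square <- function(i) {
  Sys.sleep(0.01)
  i^2
}

# Start two worker processes (an arbitrary choice for this sketch)
cl <- makeCluster(2)

# Distribute the inputs over the workers and collect the results in order
res <- parLapply(cl, 1:20, slow_square)

# Always release the workers when done
stopCluster(cl)

squares <- unlist(res)
```

parLapply returns results in the same order as the inputs, so it can often replace lapply with few other changes.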
To integrate R with the Hadoop ecosystem, RHadoop can be used.
RHadoop is a collection of five R packages that allows users to manage and analyse data with Hadoop. The five packages are as follows:
rhdfs package
rhbase package
rmr2 package
plyrmr package
ravro package
Multiple-choice Questions (1 Mark Questions)
1. Which of the following is not an R data
structure?
a. Class
b. Array
c. Data frame
d. Matrix
2. Which of the following is a package for
advanced data manipulation?
a. ggplot2
b. dplyr
c. rmr2
d. None of the above
3. Which of the following is a package for
implementing advanced graphics?
a. ggplot2
b. dplyr
c. caret
d. parallel
4. The main problem of R as a language is
a. Complex coding
b. Processing speed
c. Poor graphical functionality
d. All of the above