CHAPTER 8
Working with Big Data in R

OBJECTIVE
8.1 Prerequisites
8.2 Exploratory Data Analysis
8.3 R Libraries for Dealing with Large Data Sets
8.4 Integrating Hadoop with R
8.5 Simple R Program with Hadoop
In the last seven chapters, we went through an exciting journey. We started by learning the characteristics of traditional relational databases and pointed out their inefficiencies in supporting large-scale (or should we say web-scale) enterprise systems. We then learned the concept of Big Data, which not only addresses the challenges posed by growing volumes of unstructured data, but also shows how such data can be handled on a commodity hardware set-up. With the concept of Big Data, we were introduced to the Hadoop ecosystem and its two critical pillars of success, the Hadoop Distributed File System (HDFS) and MapReduce. We also started to learn the basic concepts of the all-important sub-systems of the Hadoop ecosystem, namely NoSQL, Spark, Kafka, Pig, Hive, Sqoop, Flume, Storm and Mahout. Is the Hadoop ecosystem the only way to explore Big Data? Or are there alternatives, albeit perhaps not as powerful as Hadoop?
In this chapter, we shall explore the possibilities of using R as an alternative tool to discover and process large data sets. We shall start by gaining a basic exposure to R as a programming tool, and then do a deep dive into its abilities to handle large data sets. We shall wrap up with a quick introduction to how R can integrate with the Hadoop ecosystem, thereby boosting the statistical and analytical capabilities that can be implemented on Hadoop.
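As a small preview of what follows, here is a minimal R sketch showing how a large delimited file might be read and summarised with the data.table package, one of the R libraries commonly used for working with large data sets. The file name sales.csv and the columns region and amount are hypothetical placeholders, not data used elsewhere in this book.

    library(data.table)

    # fread() is a fast, multi-threaded reader for large delimited files,
    # typically much quicker than base R's read.csv()
    sales <- fread("sales.csv")

    # quick exploratory summary of every column
    summary(sales)

    # grouped aggregation, performed in memory by data.table
    sales[, .(total_amount = sum(amount)), by = region]

Later sections of this chapter look at this kind of workflow in more detail, and at what to do when the data no longer fits comfortably in memory.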