files, social feeds, log files and more. NiFi provides a configurable plumbing platform for moving data, and it enables tracing data in real time. It is not an interactive ETL component, but it can be part of an ETL solution.
Apache NiFi is designed from the ground up to be enterprise ready, flexible, extensible and
suitable for a range of devices from network edge devices such as a Raspberry Pi to enterprise
data clusters and the cloud. Apache NiFi can also adjust to fluctuating network connectivity that
could impact the delivery of data.
Features of Apache NiFi: Some of the standard reasons for using Apache NiFi are as follows.
• Allows data ingestion, pulling data into NiFi from numerous data sources and creating flow files.
• Offers real-time control, which helps to manage the movement of data between any source and destination (see the example after this list).
• Visualizes data flow at the enterprise level across different data sources.
• Provides common tooling and extensions.
• Allows you to take advantage of existing libraries and Java ecosystem functionality.
• Helps organizations integrate NiFi with their existing infrastructure.
• NiFi is designed to scale out in clusters, which offers guaranteed delivery of data.
• It helps you listen to, fetch, split, aggregate, route and transform data, and build data flows by drag and drop.
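These flow-control and monitoring capabilities are available through NiFi's web UI and also through its REST API. As a rough, minimal sketch (assuming a local, unsecured NiFi instance on the default HTTP port 8080; endpoint paths and security settings vary by NiFi version and configuration), the current flow and system status can be checked from the command line:

# Hypothetical local setup; adjust host, port and security as needed.
# Overall controller/flow status (active threads, queued flow files, ...).
curl http://localhost:8080/nifi-api/flow/status

# JVM, heap and repository usage statistics for the node.
curl http://localhost:8080/nifi-api/system-diagnostics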
Summary
• Hive is used for viewing and analysing the semi-structured data stored in HDFS. The advantage is that Hive supports a familiar SQL-like language, HiveQL. After a Hive query is initiated, the Hive execution engine transforms it into a MapReduce job, so development effort and time are optimized. (Short command-line sketches for Hive, Sqoop, Flume and Oozie follow this summary.)
• Hive interacts with the Hive Metastore during query execution and executes the job in a YARN container as a MapReduce job. The connection with the Metastore should be established properly. The main configuration file for integrating the Metastore with the Hive service is ‘hive-site.xml’.
• Hive provides two execution engine options for executing a HiveQL query: the first is MR (MapReduce) and the second is Tez.
• Sqoop is very useful for migrating a legacy database directly into HDFS or Hive. Incremental data loads are also possible with Sqoop.
• Solr is extensively used in document indexing use cases. ZooKeeper plays a vital role in managing Solr nodes.
• Using Flume, data can be streamed from any source system to HDFS continuously.
• NiFi can take advantage of existing libraries and Java ecosystem functionality, and in this way the actual development effort can be reduced.
• An Oozie workflow consists of action nodes and control-flow nodes. Oozie is actually the scheduler component of the Hadoop ecosystem, used to schedule different types of Hadoop jobs.
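As an illustrative sketch (the commands below are not taken from the chapter; host names, table names and paths are placeholders), a HiveQL query can be run from the command line with the Tez engine selected and the Metastore location supplied through the same properties that normally live in hive-site.xml:

# Placeholder Metastore host and table; in practice these values come from hive-site.xml.
hive --hiveconf hive.metastore.uris=thrift://metastore-host:9083 \
     --hiveconf hive.execution.engine=tez \
     -e "SELECT region, COUNT(*) AS cnt FROM sales GROUP BY region;"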
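In the same spirit, a Sqoop incremental import from a legacy relational database into HDFS might look like the following sketch (connection string, credentials, table and column names are placeholders):

# Import only the rows whose 'id' is greater than the last imported value.
sqoop import \
  --connect jdbc:mysql://db-host:3306/legacy_db \
  --username etl_user -P \
  --table customers \
  --target-dir /data/customers \
  --incremental append \
  --check-column id \
  --last-value 10000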
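A Flume agent is described in a small properties file and started with flume-ng. The sketch below (agent name, log path and HDFS directory are hypothetical) tails an application log into HDFS:

# Minimal agent definition: one exec source, one memory channel, one HDFS sink.
cat > agent.conf <<'EOF'
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

a1.sources.r1.type     = exec
a1.sources.r1.command  = tail -F /var/log/app/app.log
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

a1.sinks.k1.type      = hdfs
a1.sinks.k1.hdfs.path = /data/logs/app
a1.sinks.k1.channel   = c1
EOF

# Start the agent defined above.
flume-ng agent --conf ./conf --conf-file agent.conf --name a1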
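Finally, an Oozie workflow definition (the workflow.xml holding the action and control-flow nodes) is staged in HDFS and submitted with a job.properties file; a typical submission looks like this sketch (the Oozie server URL and properties file are placeholders):

# Submit and immediately start the workflow described by job.properties.
oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run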