170 | Big Data Simplified
hduser@sayan:~/$ hadoop fs -cat /user/hive/warehouse/emp1_bucketed/000000_0
4564546498,280813,840,820,Samir,Aus
4564546498,280813,840,820,Rabi,Aus
hduser@sayan:~/$ hadoop fs -cat /user/hive/warehouse/emp1_bucketed/000001_0
4564546678,280813,740,875,Sayan,UK
4564546678,280813,740,875,Sayan,UK
hduser@sayan:~/$ hadoop fs -cat /user/hive/warehouse/emp1_bucketed/000002_0
4564546489,280813,640,840,Soumen,USA
4564546454,280813,640,890,Arijit,USA
4564546489,280813,640,840,Astik,USA
4564546454,280813,640,890,Chiranjib,USA
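The grouping in the three files above is consistent with hash bucketing on the last column. As a rough sketch, assume the table is clustered on the country column into three buckets and that string values are hashed with the Java-style multiply-by-31 rule. Neither assumption is stated in this excerpt, and newer Hive versions may use a different hash function. Under those assumptions the bucket number can be reproduced in plain Python. The names java_string_hash and bucket_for are hypothetical helpers, not Hive APIs.

```python
def java_string_hash(text: str) -> int:
    """Java-style String.hashCode() for ASCII text: h = 31*h + char, wrapped to signed 32 bits."""
    h = 0
    for ch in text:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep only the low 32 bits
    # Reinterpret the unsigned value as a signed 32-bit integer.
    return h - 0x100000000 if h >= 0x80000000 else h


def bucket_for(value: str, num_buckets: int) -> int:
    """Assumed rule: clear the sign bit of the hash, then take it modulo the bucket count."""
    return (java_string_hash(value) & 0x7FFFFFFF) % num_buckets


if __name__ == "__main__":
    # Country values taken from the three bucket files shown above.
    for country in ("Aus", "UK", "USA"):
        print(country, "-> bucket", bucket_for(country, 3))
```

Under these assumptions, Aus, UK and USA map to buckets 0, 1 and 2, which matches the file names 000000_0, 000001_0 and 000002_0.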
As such, the most important thing about Hive is that, in many cases, it is the glue that binds the
world of Big Data to the world of traditional business intelligence. Most BI tools and data
visualization tools that tell a Big Data story do so by building their own connectors to Hive.
In effect, they generate Hive queries, which eventually run as MapReduce jobs and return data to
those tools. Hive therefore has a special role to play in getting more conventional client tools
and data visualization tools to work with Hadoop.
7.3 PIG
Pig uses a stepwise data transformation language called Pig Latin. Pig is a scripting platform
that lets you work with data in its raw and unstructured form. As an example, think of log files,
where the data has no fixed schema and the information is available as lines of messages,
warnings, etc. Pig manipulates data such as this and converts it into a structured format.
Data in this structured format can then be stored in HDFS or in Hive, where it can be queried
by anyone looking to extract insights from it.
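The stepwise idea can be illustrated outside Pig. The following is a minimal sketch in plain Python, not Pig Latin: it takes raw log lines, imposes a structure on them, and then filters and counts them one step at a time. The log layout ("date time LEVEL message"), the sample lines, and the function names parse and transform are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical raw log lines; the "date time LEVEL message" layout is an assumption.
RAW_LINES = [
    "2019-05-17 14:50:07 WARNING disk usage above 90 percent",
    "2019-05-17 14:50:09 INFO job started",
    "corrupted line",
    "2019-05-17 14:51:12 ERROR task failed",
    "2019-05-17 14:52:30 WARNING slow response from node",
]


def parse(line):
    """Turn one raw line into a structured record, or None if it does not fit the layout."""
    parts = line.split(" ", 3)
    if len(parts) != 4:
        return None
    date, time, level, message = parts
    return {"date": date, "time": time, "level": level, "message": message}


def transform(lines):
    """Apply the steps one after another, each producing a new data set."""
    records = [parse(line) for line in lines]                # step 1: impose a structure
    records = [r for r in records if r is not None]          # step 2: drop malformed lines
    problems = [r for r in records if r["level"] != "INFO"]  # step 3: keep warnings and errors
    counts = Counter(r["level"] for r in problems)           # step 4: group and count by level
    return problems, dict(counts)


if __name__ == "__main__":
    problems, counts = transform(RAW_LINES)
    for record in problems:
        print(record)
    print(counts)
```

Each intermediate result is a complete data set in its own right, which is the property that makes a stepwise transformation script easy to read and debug.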
7.3.1 Why Apache Pig
Using Pig Latin, developers can run MapReduce jobs easily without having to write
complex code in Java.
Apache Pig uses a multi-query approach, thereby reducing the length of the code. For example,
an operation that would require 200–300 lines of code in Java can be done in as few as
10 lines of code in Apache Pig. Ultimately, Apache Pig reduces development time by a
factor of almost 20.
Pig Latin is a SQL-like language, so Apache Pig is easy to learn for anyone familiar with SQL.
Apache Pig provides many built-in operators to support data operations such as joins, filters
and ordering. In addition, it provides nested data types such as tuples, bags and maps, which
are missing from core MapReduce programming.
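To make the nested data types concrete, here is a small sketch in plain Python, not Pig Latin, that models a tuple as a Python tuple, a bag as a list of tuples, and a map as a dict, and then applies group, filter and order steps to them. The names and countries are taken from the bucket files shown earlier. The (name, country) field meanings, the COUNTRY_NAMES lookup, and the function names are assumptions made for illustration.

```python
from collections import defaultdict

# Each employee is a tuple of (name, country); the field meanings are assumed.
EMPLOYEES = [
    ("Samir", "Aus"), ("Rabi", "Aus"), ("Sayan", "UK"),
    ("Soumen", "USA"), ("Arijit", "USA"), ("Astik", "USA"), ("Chiranjib", "USA"),
]

# A map from key to value; this lookup table is hypothetical.
COUNTRY_NAMES = {"Aus": "Australia", "UK": "United Kingdom", "USA": "United States"}


def group_by_country(rows):
    """Group tuples by country, yielding (country, bag) pairs where a bag is a list of tuples."""
    bags = defaultdict(list)
    for row in rows:
        bags[row[1]].append(row)
    return list(bags.items())


def summarize(rows, country_names):
    """Group, filter, order, then look each key up in the map."""
    grouped = group_by_country(rows)                                      # group
    large = [(c, bag) for c, bag in grouped if len(bag) > 1]              # filter
    ordered = sorted(large, key=lambda item: len(item[1]), reverse=True)  # order
    return [(country_names[c], len(bag)) for c, bag in ordered]           # map lookup


if __name__ == "__main__":
    print(summarize(EMPLOYEES, COUNTRY_NAMES))
```

In Pig Latin itself, these steps correspond to built-in relational operators such as GROUP, FILTER and ORDER BY, so none of the helper functions above would need to be written by hand.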