164 | Big Data Simplied
LEFT OUTER JOIN: The Hive LEFT OUTER JOIN returns all the rows from the left table, even if there
are no matches in the right table. This means, if the ON clause matches 0 (zero) records in the
right table, then the JOIN still returns a row in the result, but with NULL in each column from
the right table.
A LEFT JOIN returns all the values from the left table, plus the matched values from the right
table or NULL in case of no matching JOIN predicate.
The syntax ‘FROM a LEFT OUTER join b’ should be written in one line so as to grasp how it
works: a is to the LEFT of b in this query, then all rows from a are kept.
Left Table Data, with dept_id null:
SELECT emp.emp_id,emp.emp_name,dept.dept_name FROM emp LEFT OUTER JOIN
dept ON (dept.id = emp.dept_id);
M07 Big Data Simplified XXXX 01.indd 164 5/17/2019 2:50:05 PM
Other BigData Tools andTechnologies | 165
RIGHT OUTER JOIN: The Hive RIGHT OUTER JOIN returns all the rows from the right table, even
if there are no matches in the left table. If the ON clause matches 0 (zero) records in the left
table, then the JOIN still returns a row in the result, but with NULL in each column from the
left table.
A RIGHT JOIN returns all the values from the right table, plus the matched values from the
left table or NULL in case of no matching join predicate.
SELECT emp.emp_id,emp.emp_name,dept.dept_name FROM dept RIGHT OUTER
JOIN emp ON (dept.id = emp.dept_id);
FULL OUTER JOIN: The Hive FULL OUTER JOIN combines the records of both the left and the right
outer tables that full the JOIN condition. The joined table either contains all the records from
both the tables or lls in NULL values for missing matches on either side.
SELECT emp.emp_id,emp.emp_name,dept.dept_name FROM dept FULL OUTER
JOIN emp ON (dept.id = emp.dept_id);
M07 Big Data Simplified XXXX 01.indd 165 5/17/2019 2:50:05 PM
166 | Big Data Simplied
Sort-Merge-Bucket Map JOIN: Sort-Merge-Bucket Map join is also a technique of Hive join optimiza-
tion. These joins are very efcient as they require a simple merge of previously sorted tables. The
corresponding buckets are joined with each other at the mapper stage.
There are several scenarios where we can use Hive Sort Merge Bucket Join.
When all tables are large.
When all tables are bucketed using the join columns.
When using the join columns, Sorted.
When the number of buckets is same as the number of all tables.
Properties need to change in Hive session to perform SMB join is as follows.
hive>set hive.enforce.sortmergebucketmapjoin=false;
hive>set hive.auto.convert.sortmerge.join=true;
hive>set hive.optimize.bucketmapjoin = true;
hive>set hive.optimize.bucketmapjoin.sortedmerge = true;
hive>set hive.auto.convert.join=false;
Following are the limitations of Hive Sort Merge Bucket Join:
SQL joins tables need to be bucketed. Hence, for other types of SQL, it cannot be used.
It is possible that too much partition tables might slow down here.
For SMB join created two bucketed and sorted tables named buck_emp and buck_dept.
set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;
create table buck_emp(
emp_id int,
emp_name string,
dept_id int)
M07 Big Data Simplified XXXX 01.indd 166 5/17/2019 2:50:06 PM
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.15.144.56