Chapter 5. Advanced Joins

In this chapter, we will cover:

Joining data in the Mapper using MapReduce
Joining data using Apache Pig replicated join
Joining sorted data using Apache Pig merge join
Joining skewed data using Apache Pig skewed join
Using a map-side join in Apache Hive to analyze geographical events
Using optimized full outer joins in Apache Hive to analyze geographical events
Joining data using an external key-value store (Redis)

Introduction

In most processing environments, multiple datasets must be joined to produce a final result. Unfortunately, joins in MapReduce are non-trivial and can be expensive. This chapter demonstrates different approaches to joining data in Hadoop using several tools, including Java MapReduce, Apache Pig, and Apache Hive. It also demonstrates how to use an external in-memory key-value store (Redis) from Hadoop MapReduce.
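To give a feel for the idea behind several of the recipes listed above (joining in the Mapper, and the replicated join), here is a minimal sketch in plain Java with no Hadoop dependencies. It assumes one dataset is small enough to hold in memory as a lookup table while the larger dataset is streamed record by record. The class name `MapSideJoinSketch`, the `join` method, and the sample records are hypothetical and chosen only for illustration; this is not the chapter's own implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapSideJoinSketch {

    // Inner-joins each record of the large dataset ({key, value} pairs)
    // against a small lookup table held entirely in memory.
    // Records whose key is missing from the lookup table are dropped.
    public static List<String> join(Map<String, String> small, List<String[]> large) {
        List<String> joined = new ArrayList<>();
        for (String[] record : large) {
            String match = small.get(record[0]);
            if (match != null) {
                joined.add(record[0] + "\t" + record[1] + "\t" + match);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Hypothetical small dataset: country code -> country name.
        Map<String, String> countries = new HashMap<>();
        countries.put("US", "United States");
        countries.put("FR", "France");

        // Hypothetical large dataset: (country code, event id) records.
        List<String[]> events = new ArrayList<>();
        events.add(new String[] {"US", "event-1"});
        events.add(new String[] {"DE", "event-2"});
        events.add(new String[] {"FR", "event-3"});

        for (String line : join(countries, events)) {
            System.out.println(line);
        }
    }
}
```

Because the lookup happens while each record is read, no records need to be regrouped by key before the join, which is where much of the cost of a join in MapReduce typically comes from. The trade-off is that the smaller dataset must fit in memory.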
