Chapter 5. Advanced Joins

In this chapter, we will cover:

  • Joining data in the Mapper using MapReduce
  • Joining data using Apache Pig replicated join
  • Joining sorted data using Apache Pig merge join
  • Joining skewed data using Apache Pig skewed join
  • Using a map-side join in Apache Hive to analyze geographical events
  • Using optimized full outer joins in Apache Hive to analyze geographical events
  • Joining data using an external key-value store (Redis)

Introduction

Most data processing environments need to join multiple datasets to produce a final result. Unfortunately, joins in MapReduce are non-trivial and can be expensive, because a standard reduce-side join shuffles the records of both datasets across the network. This chapter demonstrates different approaches to joining data in Hadoop using several tools, including Java MapReduce, Apache Pig, and Apache Hive. It also demonstrates how to leverage external memory resources, such as a Redis key-value store, from Hadoop MapReduce.
