Chapter 7. Advanced Big Data Analysis

In this chapter, we will cover:

  • PageRank with Apache Giraph
  • Single-source shortest-path with Apache Giraph
  • Using Apache Giraph to perform a distributed breadth-first search
  • Collaborative filtering with Apache Mahout
  • Clustering with Apache Mahout
  • Sentiment classification with Apache Mahout

Introduction

Many graph and machine learning problems are hard to solve using the MapReduce framework. Most of them require iterative processing, complex algorithms, or both, and these can be cumbersome to implement as chains of MapReduce jobs. Luckily, two frameworks are available to help with graph and machine learning problems in the Hadoop environment. Apache Giraph is a graph-processing framework designed to run large-scale graph algorithms. Apache Mahout is a framework that provides implementations of distributed machine learning algorithms.
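The difficulty with iterative steps can be made concrete with PageRank, the first algorithm covered in this chapter. The following is a minimal single-machine Python sketch, not Hadoop or Giraph code. It assumes a small in-memory graph given as a dictionary of adjacency lists (the graph and function names are hypothetical) in which every vertex appears as a key and has at least one outgoing edge. Each pass of the loop plays the role of one complete MapReduce job: the map step emits rank shares along out-edges, and the reduce step sums them per vertex.

```python
from collections import defaultdict


def map_phase(graph, ranks):
    """Map step: each vertex emits an equal share of its rank to every out-neighbor.

    Assumes no dangling vertices (every adjacency list is non-empty).
    """
    for vertex, neighbors in graph.items():
        share = ranks[vertex] / len(neighbors)
        for neighbor in neighbors:
            yield neighbor, share


def reduce_phase(pairs, vertices, damping):
    """Reduce step: sum the incoming shares per vertex and apply the damping factor."""
    totals = defaultdict(float)
    for vertex, share in pairs:
        totals[vertex] += share
    n = len(vertices)
    return {v: (1 - damping) / n + damping * totals[v] for v in vertices}


def pagerank(graph, iterations=20, damping=0.85):
    vertices = list(graph)
    ranks = {v: 1.0 / len(vertices) for v in vertices}
    # Each iteration stands in for one full MapReduce job. A real Hadoop driver
    # would write the ranks (plus the graph structure) to HDFS and launch a new
    # job every time around this loop.
    for _ in range(iterations):
        ranks = reduce_phase(map_phase(graph, ranks), vertices, damping)
    return ranks


if __name__ == "__main__":
    example_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    print(pagerank(example_graph))
```

In a Hadoop implementation, every one of those passes pays for job scheduling and a round trip through HDFS. Giraph instead follows the Pregel model of vertex-centric supersteps, keeping the graph in memory between iterations.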

This chapter will introduce readers to these two frameworks, which are capable of leveraging the distributed power of MapReduce.
