Summary

In this chapter, as in the other chapters in this part of the book, we started with the relevant layer, the Data Storage layer, and mapped a technology to it, namely Apache Hadoop. Having identified the technology, we then examined Hadoop in detail.

We first gave our reasons for choosing this technology and then covered its history, advantages, and disadvantages. Next, we explained how Hadoop works by walking through both the Hadoop 1.x and Hadoop 2.x architectures; since we are using Hadoop 2.x, we described its architectural components in particular. We then looked at some of the most important components in the Hadoop ecosystem, several of which we will use in implementing our SCV use case.

We then covered some other aspects of Hadoop, namely its distributions, HDFS and its various file formats, and finally its deployment modes. Next, we dived into hands-on coding with Hadoop and saw how the SCV use case makes use of the Hadoop technology. We wrapped up with two sections explaining when to use Hadoop and when not to, and, as always, we closed by discussing some alternatives that can be considered in its place.

After reading this chapter, you should have a clear idea of the Data Storage layer and the Hadoop technology. Full coverage of Hadoop is beyond the scope of this book, so we focused on the core aspects that are key to implementing our use case.

Hadoop is one of the core technologies in our Data Lake implementation, and after going through this chapter you have another technology under your belt.

Well done! You are one step closer to knowing the full technology stack of our Data Lake.
