Summary

In this chapter, we explored data security and some of the issues surrounding it. We found that technical knowledge alone is not enough: a data security mindset is just as important. Because data security is often overlooked, taking a systematic approach and educating others are key responsibilities for anyone mastering data science.

We have explained the data security life cycle and outlined the most important areas of responsibility, including authorization, authentication and access, along with related examples and use cases. We have also explored the Hadoop security ecosystem and described the important open source solutions currently available.

A significant part of this chapter was dedicated to building a Hadoop InputFormat compressor that doubles as a data encryption utility and can be used with Spark. With appropriate configuration, the codec can be used in a variety of key areas, most importantly when spilling shuffled records to local disk, where no solution currently exists.

In the next chapter, we will explore Scalable Algorithms, demonstrating the key techniques that we can master to enable performance at a truly "big data" scale.
