Use case – security and configuration isolation

The Hadoop authentication and authorization model is weak, so sensitive data is hard to protect. The cluster runs multiple MapReduce workloads: production batch analysis, ad hoc analysis, and experimental tasks, each with a different SLA.
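A first hardening step is to replace Hadoop's default simple authentication with Kerberos and switch on service-level authorization in core-site.xml. The following is a minimal sketch, assuming a Hadoop 2.x cluster with a Kerberos KDC already in place:

    <!-- core-site.xml: switch from simple to Kerberos authentication -->
    <property>
      <name>hadoop.security.authentication</name>
      <value>kerberos</value>
    </property>
    <!-- enforce service-level authorization checks -->
    <property>
      <name>hadoop.security.authorization</name>
      <value>true</value>
    </property>

With hadoop.security.authorization set to true, the ACLs in hadoop-policy.xml control which users and groups may call each Hadoop service protocol.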

We need to take the following into consideration:

  • Where it makes sense, HDFS is consolidated to minimize data duplication
  • High-priority jobs get more resources so that they complete on time
  • Each type of job can use any spare resources that are available at a given time (see the sample queue configuration after this list)
  • CPU and memory contention is avoided so that resources are used efficiently and jobs finish on time
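One way to realize the last three points on a Hadoop 2.x cluster is to give each workload type its own YARN Capacity Scheduler queue. The queue names (production, adhoc, experiment) and the capacity split below are illustrative assumptions, not values from this use case:

    <!-- capacity-scheduler.xml: one queue per workload type (names are illustrative) -->
    <property>
      <name>yarn.scheduler.capacity.root.queues</name>
      <value>production,adhoc,experiment</value>
    </property>
    <!-- guaranteed shares; high-priority production work gets the largest slice -->
    <property>
      <name>yarn.scheduler.capacity.root.production.capacity</name>
      <value>60</value>
    </property>
    <property>
      <name>yarn.scheduler.capacity.root.adhoc.capacity</name>
      <value>30</value>
    </property>
    <property>
      <name>yarn.scheduler.capacity.root.experiment.capacity</name>
      <value>10</value>
    </property>
    <!-- let production elastically absorb idle capacity from other queues -->
    <property>
      <name>yarn.scheduler.capacity.root.production.maximum-capacity</name>
      <value>100</value>
    </property>

Each queue is guaranteed its configured share, while maximum-capacity lets a queue temporarily grow beyond that guarantee when the cluster has idle capacity.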

Our objective is to integrate Hadoop workloads with other workloads on a shared big data infrastructure. The Hadoop MapReduce framework uses HDFS as its underlying filesystem to process large datasets, while other technologies, such as HBase and Pivotal, use their own storage mechanisms.
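On the compute side, isolation between these mixed workloads can be sketched with YARN's cgroups support, which confines each container's CPU usage so that jobs sharing a node do not starve one another. This assumes a Hadoop 2.x cluster running on Linux and is illustrative only:

    <!-- yarn-site.xml: run containers under the Linux container executor -->
    <property>
      <name>yarn.nodemanager.container-executor.class</name>
      <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
    </property>
    <!-- enforce per-container CPU limits through cgroups -->
    <property>
      <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
      <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
    </property>
    <property>
      <name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name>
      <value>/hadoop-yarn</value>
    </property>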
