Authorization

Authorization can be storage-based, using ACLs or permissions at the storage level to determine whether a user can perform an operation. HDFS supports ACLs, which you can configure. For data on S3, EMRFS relies on S3 access control. EMRFS authorization is storage-based and fine-grained, which is useful in a shared-cluster scenario where each user assumes a different IAM role that limits what they can and cannot access on S3. You can map an IAM role to users, groups, or S3 prefixes.

For more details on using EMRFS authorization for data on S3, refer to the AWS documentation: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-emrfs-authz.html.
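The mapping of IAM roles to users, groups, and S3 prefixes described above is expressed as a list of role mappings in an EMR security configuration. The following Python sketch builds such a document and prints it as JSON. The account ID, role names, user, group, and bucket are hypothetical placeholders, and the helper functions are introduced here only for illustration; check the AWS documentation linked above for the authoritative format before using it.

```python
import json

# Hypothetical AWS account ID, used only for illustration.
ACCOUNT_ID = "123456789012"

# Identifier types that a role mapping can target: users, groups, or S3 prefixes.
VALID_IDENTIFIER_TYPES = ("User", "Group", "Prefix")


def role_arn(role_name):
    """Build the ARN of an IAM role in the placeholder account."""
    return f"arn:aws:iam::{ACCOUNT_ID}:role/{role_name}"


def role_mapping(role_name, identifier_type, identifiers):
    """Return one role mapping entry for the EMRFS configuration."""
    if identifier_type not in VALID_IDENTIFIER_TYPES:
        raise ValueError(f"unsupported identifier type: {identifier_type}")
    return {
        "Role": role_arn(role_name),
        "IdentifierType": identifier_type,
        "Identifiers": list(identifiers),
    }


# All role names, the user, the group, and the bucket below are hypothetical.
security_configuration = {
    "AuthorizationConfiguration": {
        "EmrFsConfiguration": {
            "RoleMappings": [
                role_mapping("emrfs_access_for_analyst1", "User", ["analyst1"]),
                role_mapping("emrfs_access_for_data_eng", "Group", ["data-eng"]),
                role_mapping(
                    "emrfs_access_to_shared_bucket",
                    "Prefix",
                    ["s3://example-shared-bucket/reports/"],
                ),
            ]
        }
    }
}

if __name__ == "__main__":
    # Print the JSON document that would be supplied as the security configuration.
    print(json.dumps(security_configuration, indent=2))
```

The printed JSON is the kind of document you would supply when creating a security configuration in the EMR console, CLI, or SDK. Each of the three entries maps one role to one of the three identifier types mentioned above.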

HiveServer2 and Presto support SQL standards-based access control; for example, a given user can be granted or denied access to a specific table. HBase supports cell-level access control. With Kerberos, you can authenticate users to manage access to YARN queues. At the EMR cluster level, you can use IAM and tags to implement access control. Finally, you can use Apache Ranger (essentially a policy engine for Hadoop) on an edge node (deployed using CloudFormation).
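As an illustration of cluster-level access control with IAM and tags, the following Python sketch builds an IAM policy document that allows a few EMR actions only on clusters carrying a given tag. The tag key and value (`department` and `analytics`), the function name, and the choice of actions are assumptions made for this example; treat it as a starting point rather than a recommended policy.

```python
import json


def tag_scoped_emr_policy(tag_key, tag_value):
    """Build an IAM policy allowing selected EMR actions on tagged clusters only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # An illustrative subset of EMR actions, not an exhaustive list.
                "Action": [
                    "elasticmapreduce:DescribeCluster",
                    "elasticmapreduce:ListSteps",
                    "elasticmapreduce:AddJobFlowSteps",
                    "elasticmapreduce:TerminateJobFlows",
                ],
                "Resource": "*",
                # The actions are allowed only when the cluster's tag matches.
                "Condition": {
                    "StringEquals": {
                        f"elasticmapreduce:ResourceTag/{tag_key}": tag_value
                    }
                },
            }
        ],
    }


# Hypothetical tag key and value, used only for illustration.
policy = tag_scoped_emr_policy("department", "analytics")

if __name__ == "__main__":
    print(json.dumps(policy, indent=2))
```

Attached to a user or group, a policy of this shape would restrict those principals to clusters tagged `department=analytics`, so different teams can share one account without managing each other's clusters.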
