Authorization in Apache Hadoop

With authentication, we have validated the user. The next step in securing the cluster is to implement Service Level Authorization controls for users. Service Level Authorization sets the permissions that users have on the different services in the cluster. These permissions control the actions that a user can perform, for example, submitting a MapReduce job, accessing HDFS as a client, and so on.

Service Level Authorization in Hadoop is implemented by defining access control lists (ACLs). An ACL lets the administrator define the list of users and groups that have permission to access a given service in Hadoop.

Configuring access control lists in Hadoop

The ACLs are configured in the hadoop-policy.xml file. This file is located under Hadoop's configuration directory. If Cloudera Manager was used to set up CDH on your cluster, you should see this configuration file under the /opt/cloudera/parcels/<CDH VERSION>/etc/hadoop/conf.dist directory.

In the cluster used for the examples in this book, the file is present in the /opt/cloudera/parcels/CDH-5.0.1-1.cdh5.0.1.p0.47/etc/hadoop/conf.dist directory.

The hadoop-policy.xml file consists of a name and value pair for each property. Each value is specified as a comma-separated list of users followed by a comma-separated list of groups. The user list and the group list are separated by a single space.

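For example, the following value represents an access control list for the users rohit and mark and for the groups scientist and miners. Note that there are no spaces after the commas, because the first space in the value marks the end of the user list:

<value>rohit,mark scientist,miners</value>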

By default, the value is *, which grants all users access to the service.
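
To make the value format concrete, here is a minimal Python sketch of how such a string could be interpreted. It is not Hadoop's own parser; the names Acl, parse_acl, and is_allowed are hypothetical and exist only for this illustration. It assumes the rules described above: a comma-separated user list, a single space, a comma-separated group list, and * for universal access.

```python
from typing import FrozenSet, Iterable, NamedTuple


class Acl(NamedTuple):
    """Hypothetical in-memory form of one ACL value."""
    allow_all: bool
    users: FrozenSet[str]
    groups: FrozenSet[str]


def _split_names(part: str) -> FrozenSet[str]:
    # Split a comma-separated list, dropping empty entries and stray whitespace.
    return frozenset(name.strip() for name in part.split(",") if name.strip())


def parse_acl(value: str) -> Acl:
    """Parse a value such as 'user1,user2 group1,group2' or '*'."""
    if value.strip() == "*":
        return Acl(True, frozenset(), frozenset())
    # The first space separates the user list from the group list.
    users_part, _, groups_part = value.partition(" ")
    return Acl(False, _split_names(users_part), _split_names(groups_part))


def is_allowed(acl: Acl, user: str, user_groups: Iterable[str]) -> bool:
    """A user passes if the ACL is '*', or names the user or one of their groups."""
    return acl.allow_all or user in acl.users or bool(acl.groups & set(user_groups))


if __name__ == "__main__":
    acl = parse_acl("rohit,mark scientist,miners")
    print(sorted(acl.users), sorted(acl.groups))
    print(is_allowed(acl, "alice", ["miners"]))
```

In this sketch a user is admitted either by name or through membership in any listed group, which is why the example checks alice against the miners group rather than against the user list.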

The following are a few of the properties:

  • security.client.protocol.acl: This property defines the access control list for the ClientProtocol interface, which user code uses through the distributed filesystem client. Only the list of users configured in this property will be allowed to talk to the cluster as a distributed filesystem client.
  • security.client.datanode.protocol.acl: This property defines the access control list for the client-to-datanode protocol, which is used between the client and the datanodes for block recovery. Only the list of users configured in this property will be allowed to recover blocks from the datanode.
  • security.datanode.protocol.acl: This property defines the access control list that the datanodes use to communicate with the namenode. Only the list of users configured in this property will be allowed to start the datanodes, which will have access to the namenode.
  • security.namenode.protocol.acl: Only the list of users configured in this property will be allowed to start the secondary namenode, which will have access to the namenode.
  • security.refresh.policy.protocol.acl: Only the list of users configured in this property will be allowed to refresh the security policies for Hadoop.
  • security.ha.service.protocol.acl: Only the list of users configured in this property will be allowed to perform the administration commands required to switch the namenode between the active and standby states in a high availability scenario.
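
As a rough illustration of how these properties sit in hadoop-policy.xml as name and value pairs, the following Python sketch reads a small sample file and lists its ACL properties. The sample values are made up for this example and are not taken from a real cluster, and read_acls is a hypothetical helper, not part of Hadoop.

```python
import xml.etree.ElementTree as ET

# Illustrative hadoop-policy.xml content; the ACL values are invented.
SAMPLE_POLICY = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>security.client.protocol.acl</name>
    <value>rohit,mark scientist,miners</value>
  </property>
  <property>
    <name>security.refresh.policy.protocol.acl</name>
    <value>*</value>
  </property>
</configuration>
"""


def read_acls(xml_text: str) -> dict:
    """Return {property name: raw ACL value} for every property ending in .acl."""
    root = ET.fromstring(xml_text)
    acls = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = prop.findtext("value") or ""
        if name.endswith(".acl"):
            acls[name] = value
    return acls


if __name__ == "__main__":
    for name, value in sorted(read_acls(SAMPLE_POLICY).items()):
        print(f"{name} = {value!r}")
```

A script along these lines can help when auditing a cluster, since it shows at a glance which services are still open to everyone through the * default.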