Authentication and authorization

Authentication covers the mechanisms used to verify that a user is who they claim to be. It operates at two key levels: local and remote.

Authentication can take various forms. The most common is user login, but other examples include fingerprint reading, iris scanning, and PIN entry. User logins can be managed locally, as on a personal computer, or remotely, using a directory service accessed through a protocol such as Lightweight Directory Access Protocol (LDAP). Managing users remotely provides roaming user profiles that are independent of any particular hardware and can be administered independently of the user. All of these methods execute at the operating system level. Other mechanisms sit at the application layer and provide authentication for services, such as Google OAuth.

Alternative authentication methods have their own pros and cons, and a particular implementation should be understood thoroughly before a system is declared secure. For example, a fingerprint system may seem very secure, but this is not always the case. For more information, refer to http://www.cse.msu.edu/rgroups/biometrics/Publications/Fingerprint/CaoJain_HackingMobilePhonesUsing2DPrintedFingerprint_MSU-CSE-16-2.pdf. We will not explore authentication any further here, as we assume that most systems will implement only user logins. Note that a user login is often not a secure solution in its own right and, in many cases, provides no security at all. For more information, refer to http://www.cs.arizona.edu/~collberg/Teaching/466-566/2012/Resources/presentations/2012/topic7-final/report.pdf.

Authorization is of great interest to us: it forms a critical part of basic security, it is the area over which we most often have the greatest control, and it is something we can use natively in any modern operating system. There are various ways of implementing resource authorization, the two main ones being:

  • Access control lists (ACL)
  • Role-based access control (RBAC)

We'll discuss each of these in turn.

Access control lists (ACL)

In Unix, ACLs are used throughout the filesystem. If we list directory contents at the command line, we see entries such as:

drwxr-xr-x 6 mrh mygroup 204 16 Jun 2015 resources

We can see that there is a directory called resources that has an assigned owner (mrh) and group (mygroup), has 6 links, has a size of 204 bytes, and was last modified on 16 June 2015. The ACL string drwxr-xr-x indicates:

  • d this is a directory (- if it is not)
  • rwx the owner (mrh) has read, write, and execute rights
  • r-x anyone in the group (mygroup) has read and execute rights
  • r-x everyone else has read and execute rights
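The breakdown above can be sketched in a few lines of Python. This is only an illustration: the function name decode_mode is our own, and special bits such as setuid (s) or sticky (t) are ignored for simplicity.

```python
def decode_mode(mode: str) -> dict:
    """Split a 10-character ls-style mode string into its parts."""
    if len(mode) != 10:
        raise ValueError("expected 10 characters, e.g. 'drwxr-xr-x'")
    names = {"r": "read", "w": "write", "x": "execute"}

    def rights(triplet: str) -> list:
        # Keep only the rights that are granted; '-' means "not granted".
        return [names[c] for c in triplet if c in names]

    return {
        "is_directory": mode[0] == "d",
        "owner": rights(mode[1:4]),
        "group": rights(mode[4:7]),
        "other": rights(mode[7:10]),
    }


print(decode_mode("drwxr-xr-x"))
```

For the resources directory, this reports read, write, and execute for the owner, and read and execute for both the group and everyone else.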

Using ACLs is an excellent first step towards securing our data. It should always be the first thing considered, and should always be correct; if we do not ensure these settings are correct at all times, then we are potentially making it easy for other users to access this data, and we don't necessarily know who the other users on the system are. Always avoid providing full access in the final, everyone-else part of the ACL:

-rwx---rwx 6 mrh mygroup 204 16 Jun 2015 secretFile.txt

No matter how secure the rest of our system is, any user with access to the filesystem can read, overwrite, and execute this file! A far more appropriate setting would be:

-rwxr----- 6 mrh mygroup 204 16 Jun 2015 secretFile.txt

This provides full access for the owner, read-only access for the group, and no access for anyone else.
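As a rough sketch of how such a check could be automated, the following Python uses the standard os and stat modules to detect whether everyone else has any access to a file, and then tightens it to the setting shown above. The helper names are our own, the demonstration runs on a throwaway temporary file rather than a real secretFile.txt, and POSIX permission semantics are assumed.

```python
import os
import stat
import tempfile


def other_has_access(path: str) -> bool:
    """Return True if any permission bit is set for 'everyone else'."""
    return bool(os.stat(path).st_mode & stat.S_IRWXO)


def restrict_to_owner_and_group(path: str) -> None:
    """Set mode 740: rwx for the owner, r-- for the group, nothing for others."""
    os.chmod(path, 0o740)


# Demonstration on a temporary file (assumes a POSIX filesystem).
fd, demo_path = tempfile.mkstemp()
os.close(fd)
os.chmod(demo_path, 0o707)                 # the risky -rwx---rwx setting
before = other_has_access(demo_path)
restrict_to_owner_and_group(demo_path)
after = other_has_access(demo_path)
mode_string = stat.filemode(os.stat(demo_path).st_mode)
os.remove(demo_path)

print(before, after, mode_string)
```

On a POSIX system, the final line should report that others had access before the change, have none afterwards, and that the mode is now -rwxr-----.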

HDFS implements ACLs natively; these can be administered using the command line:

hdfs dfs -chmod 777 /path/to/my/file.txt

This gives everyone full permissions to the file in HDFS, assuming we own the file or are the HDFS superuser; otherwise, we are not allowed to make the change. As with local files, a setting this permissive should be avoided in practice.
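The numeric argument to chmod is an octal encoding of the same three rwx triplets seen earlier: one digit each for owner, group, and everyone else, where read counts 4, write counts 2, and execute counts 1. A minimal Python sketch of the mapping follows; the function name octal_to_symbolic is our own and is not part of any HDFS tooling.

```python
def octal_to_symbolic(octal: str) -> str:
    """Convert a three-digit octal mode such as '777' to 'rwxrwxrwx'."""
    if len(octal) != 3 or any(c not in "01234567" for c in octal):
        raise ValueError("expected three octal digits, e.g. '740'")
    out = []
    for digit in octal:
        value = int(digit)
        out.append("r" if value & 4 else "-")  # read bit
        out.append("w" if value & 2 else "-")  # write bit
        out.append("x" if value & 1 else "-")  # execute bit
    return "".join(out)


print(octal_to_symbolic("777"))  # rwxrwxrwx
print(octal_to_symbolic("740"))  # rwxr-----
```

Seen this way, 740 is the numeric form of the more restrictive rwxr----- setting recommended for local files.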

Note

When Apache released Hadoop in 2008, it was often not understood that a cluster left at its default settings did not authenticate users at all. On a cluster that had not been correctly configured, any user could act as the Hadoop superuser, hdfs, simply by creating an hdfs user on a client machine (sudo useradd hdfs).

Role-based access control (RBAC)

RBAC takes a different approach, by assigning users one or more roles. These roles are related to common tasks or job functions, so that they can easily be added or removed depending on the user's responsibilities. For example, in a company, there may be many roles, including accounts, stock, and deliveries. An accountant may be given all three roles, so that they can compile the end-of-year finances, whereas an administrator booking deliveries would have just the deliveries role. This makes it much easier to add new users and to manage users when they change departments or leave the organization.

Three key rules are defined for RBAC:

  • Role assignment: a user can exercise a permission only if the user has selected or been assigned a role
  • Role authorization: a user's active role must be authorized for the user
  • Permission authorization: a user can exercise a permission only if the permission is authorized for the user's active role
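The three rules can be expressed as a single check. The Python below is a minimal sketch rather than a real RBAC implementation: the role names come from the company example above, while the permission names, the User class, and the can_exercise function are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical mapping of roles to the permissions they grant.
ROLE_PERMISSIONS = {
    "accounts": {"read_ledger", "write_ledger"},
    "stock": {"read_inventory"},
    "deliveries": {"book_delivery"},
}


@dataclass
class User:
    name: str
    authorized_roles: set = field(default_factory=set)
    active_role: Optional[str] = None


def can_exercise(user: User, permission: str) -> bool:
    # Rule 1, role assignment: the user must have an active role.
    if user.active_role is None:
        return False
    # Rule 2, role authorization: the active role must be authorized for the user.
    if user.active_role not in user.authorized_roles:
        return False
    # Rule 3, permission authorization: the active role must grant the permission.
    return permission in ROLE_PERMISSIONS.get(user.active_role, set())


accountant = User("accountant", {"accounts", "stock", "deliveries"}, "accounts")
administrator = User("administrator", {"deliveries"}, "deliveries")

print(can_exercise(accountant, "write_ledger"))      # True
print(can_exercise(administrator, "write_ledger"))   # False
print(can_exercise(administrator, "book_delivery"))  # True
```

Note that a user who activates a role they were never authorized for fails the second rule, even if that role would grant the requested permission.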

The relationships between users and roles can be summarized as follows:

  • Role-Permissions: a particular role grants specific permissions to the user.
  • User-Role: the relationships between types of users and specific roles.
  • Role-Role: the relationships between roles. These can be hierarchical, so role1 => role2 could mean that, if a user has role1, then they automatically have role2, but if they have role2, this does not necessarily mean they have role1.
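To illustrate the one-directional nature of a role hierarchy, here is a small Python sketch that expands a user's assigned roles into the full set they effectively hold. The ROLE_IMPLIES table and the effective_roles function are hypothetical, and role3 is added purely to show that the implication is followed transitively.

```python
# Hypothetical hierarchy: each key role implies every role in its value set.
ROLE_IMPLIES = {
    "role1": {"role2"},
    "role2": {"role3"},
}


def effective_roles(assigned: set) -> set:
    """Return the assigned roles plus every role they imply, transitively."""
    result = set()
    pending = list(assigned)
    while pending:
        role = pending.pop()
        if role in result:
            continue  # already expanded; this also guards against cycles
        result.add(role)
        pending.extend(ROLE_IMPLIES.get(role, ()))
    return result


print(sorted(effective_roles({"role1"})))  # ['role1', 'role2', 'role3']
print(sorted(effective_roles({"role2"})))  # ['role2', 'role3']
```

A user holding role1 picks up role2 automatically, whereas a user holding only role2 never gains role1.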

RBAC is realized in Hadoop through Apache Sentry. Organizations can define the privileges for datasets that will be enforced from multiple access paths, including HDFS, Apache Hive, and Impala, as well as Apache Pig and Apache MapReduce/YARN via HCatalog. As an example, each Spark application runs as the requesting user and requires access to the underlying files. Spark cannot enforce access control directly, since it is running as the requesting user and is untrusted. Therefore, it is restricted to filesystem permissions (ACLs). Apache Sentry provides role-based control over resources in this case.
