Impala security

Impala is designed to run on top of Hadoop, so you must understand the Hadoop security model as well as the security provided by the operating system on which Hadoop runs. If Hadoop is running on Linux, the Linux administrator and the Hadoop administrator can tighten security at those levels, and these measures complement the security that Impala itself provides. Impala 1.1 and higher use the open source Sentry project to provide a detailed authorization framework for Hadoop. Impala 1.1.1 and higher support auditing in a cluster by generating audit data, which can be collected from all nodes and then processed for further analysis and insight.

In this chapter, we will talk about the security features provided by Impala. They fall into the following categories.

Authorization

Authorization determines "who can access the data resources" and "what kind of action is approved for which user." Impala uses the Linux OS user ID of the user who started the Impala shell process or another client application. This user ID is then associated with the privileges that apply within Impala. Starting with Impala 1.1, the open source Sentry project is used for authorization, so the Sentry project documentation is the place to look for further details.

Impala uses the same authorization privilege model as other database systems such as MySQL and Hive. In Impala, privileges are granted on the various kinds of objects in a schema. Every privilege that can be granted is associated with a level in the object hierarchy, and a privilege granted on a container object is automatically inherited by its child objects. For example, a privilege granted on a database also applies to the tables in that database.

Currently, privileges can be restricted only at the server name, URI, database, and table levels; partition-level and column-level restrictions are not supported.
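
The inheritance rule described above can be illustrated with a small Python sketch. This is a toy model, not how Sentry is implemented: the class and method names are hypothetical, the server, database, and table names are made up, and treating ALL as covering SELECT and INSERT is an assumption of the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class PrivilegeStore:
    """Toy privilege store; names here are hypothetical, not Sentry's API."""

    # Maps (user, scope) to a set of privileges. A scope is a tuple such as
    # ("server1",), ("server1", "sales") or ("server1", "sales", "orders").
    grants: dict = field(default_factory=dict)

    def grant(self, user, scope, privilege):
        self.grants.setdefault((user, tuple(scope)), set()).add(privilege)

    def is_allowed(self, user, scope, privilege):
        scope = tuple(scope)
        # Check the object itself, then each enclosing container in turn,
        # so a grant on a database is inherited by its tables.
        for depth in range(len(scope), 0, -1):
            held = self.grants.get((user, scope[:depth]), set())
            # Assumption of this sketch: ALL covers every other privilege.
            if privilege in held or "ALL" in held:
                return True
        return False


store = PrivilegeStore()
store.grant("alice", ["server1", "sales"], "SELECT")

# The table inherits SELECT from its database.
print(store.is_allowed("alice", ["server1", "sales", "orders"], "SELECT"))
# A table in a different database is not covered by that grant.
print(store.is_allowed("alice", ["server1", "hr", "staff"], "SELECT"))
```

Note that the sketch stops at the table level, matching the restriction that partitions and columns cannot carry their own privileges.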

Next, we will look at the restricted set of privileges that determines what you can do with each object.

The SELECT privilege

The SELECT privilege allows the user to read data from a table. When a user runs the SHOW DATABASES and SHOW TABLES statements, only the objects for which the user has this privilege are shown in the output. The same applies to the REFRESH and INVALIDATE METADATA statements, which only access metadata for tables for which the user has this privilege.

The INSERT privilege

The INSERT privilege applies only to the INSERT and LOAD DATA statements, and allows the user to write data into a table.

The ALL privilege

With the ALL privilege, users can create or modify any object. This privilege is needed to execute DDL statements such as CREATE TABLE, ALTER TABLE, or DROP TABLE for a table, CREATE DATABASE or DROP DATABASE for a database, and CREATE VIEW, ALTER VIEW, or DROP VIEW for a view.

Here are a few examples of how you can set the described privileges:

GRANT SELECT ON TABLE table_name TO USER user_name
GRANT ALL ON TABLE table_name TO GROUP group_name
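
The privilege descriptions in this section can be summarized as a lookup from statement type to the privilege it requires. The following Python sketch is only an illustration of that summary: the function name and the prefix-matching approach are assumptions, and it does not reflect how Impala parses or authorizes statements internally.

```python
# Statement-to-privilege mapping taken from the descriptions in this section.
REQUIRED_PRIVILEGE = {
    "SELECT": "SELECT",
    "INSERT": "INSERT",
    "LOAD DATA": "INSERT",
    "CREATE TABLE": "ALL",
    "ALTER TABLE": "ALL",
    "DROP TABLE": "ALL",
    "CREATE DATABASE": "ALL",
    "DROP DATABASE": "ALL",
    "CREATE VIEW": "ALL",
    "ALTER VIEW": "ALL",
    "DROP VIEW": "ALL",
}


def required_privilege(statement):
    """Return the privilege a statement needs, or None if it is not listed."""
    # Collapse whitespace and ignore case so "drop   view v" still matches.
    normalized = " ".join(statement.upper().split())
    # Try longer prefixes first so multi-word statement types match as a whole.
    for prefix in sorted(REQUIRED_PRIVILEGE, key=len, reverse=True):
        if normalized.startswith(prefix):
            return REQUIRED_PRIVILEGE[prefix]
    return None


for sql in ("SELECT * FROM orders",
            "LOAD DATA INPATH '/tmp/new_rows' INTO TABLE orders",
            "DROP VIEW recent_orders"):
    print(sql, "->", required_privilege(sql))
```

The table and view names in the usage loop are made up for the example.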

Authentication through Kerberos

Authentication means verifying the credentials and confirming the identity of the user before processing a request. Impala uses the Kerberos security subsystem to authenticate users.

In the Cloudera Hadoop distribution, Kerberos security can be enabled through Cloudera Manager. When Impala runs in a managed environment, Cloudera Manager completes the Kerberos configuration automatically. At the time of writing this book, Impala does not support wire encryption of application data. Once your Hadoop distribution has Kerberos security enabled, you can enable Kerberos security in Impala.

Note

To learn more about enabling Kerberos security features with Impala, please visit the Cloudera Impala documentation website, where you can find the latest information.

Auditing

Auditing means keeping a record of every operation executed in the system, including whether it succeeded or failed. Using the auditing features, users can look back to check which operations were executed and which parts of the data were accessed by which user. This helps track down such activities in the system, so that the responsible professionals can take appropriate measures. In Impala, the auditing feature produces audit data, which is collected and presented in a user-friendly form by Cloudera Manager.

Auditing features were introduced with Impala 1.1.1, and the key features are as follows:

  • Enable auditing by specifying the audit log directory with the -audit_event_log_dir option at impalad startup.
  • By default, Impala starts a new audit logfile after every 5,000 queries. To change this count, use the -max_audit_event_log_file_size option at impalad startup.
  • Optionally, the Cloudera Navigator application is used to collect and consolidate audit logs from all nodes in the cluster.
  • Optionally, Cloudera Manager is used to filter, visualize, and produce the audit reports.
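
As a rough illustration of the two startup options above, the following Python sketch assembles them into an argument list for impalad. The helper name and the directory path are hypothetical; only the two option names come from the text, and how the options are actually passed depends on how impalad is launched in your environment.

```python
def audit_flags(log_dir, max_queries_per_file=None):
    """Build the audit-related impalad startup options as a list of strings.

    Hypothetical helper for illustration; it only formats the two options
    named in the text and does not start or configure impalad itself.
    """
    flags = [f"-audit_event_log_dir={log_dir}"]
    if max_queries_per_file is not None:
        if max_queries_per_file <= 0:
            raise ValueError("max_queries_per_file must be positive")
        # Overrides the default of a new logfile every 5,000 queries.
        flags.append(f"-max_audit_event_log_file_size={max_queries_per_file}")
    return flags


# Example directory only; choose a location that suits your cluster.
print(" ".join(audit_flags("/var/log/impalad/audit", 10000)))
```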

Here are the types of SQL queries that are logged with audit logs:

  • SQL queries that were blocked because they could not be authorized
  • SQL queries that are authorized to execute, which are logged after analysis is done and before the actual execution

Query information is written to the audit log in JSON format, with one line per SQL query. Each logged query can be accessed through SQL syntax by providing any combination of session ID, user name, and client network address.
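
Because each audit record is a single line of JSON, the log is also easy to process with a short script. The following Python sketch filters records by any combination of session ID, user name, and client network address. The sample records and their field names (session_id, user, network_address, sql_statement) are assumptions made for illustration; check an actual audit logfile for the real layout before relying on them.

```python
import json

# Hypothetical sample log: one JSON object per line, as the text describes.
SAMPLE_LOG = """\
{"session_id": "s1", "user": "alice", "network_address": "10.0.0.5:4242", "sql_statement": "SELECT * FROM orders"}
{"session_id": "s2", "user": "bob", "network_address": "10.0.0.9:5151", "sql_statement": "DROP TABLE orders"}
{"session_id": "s3", "user": "alice", "network_address": "10.0.0.7:4343", "sql_statement": "SHOW TABLES"}
"""


def find_queries(lines, **criteria):
    """Return the records whose fields equal every given criterion."""
    matches = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        if all(record.get(key) == value for key, value in criteria.items()):
            matches.append(record)
    return matches


# Any combination of the three fields can be supplied as keyword arguments.
for record in find_queries(SAMPLE_LOG.splitlines(), user="alice", session_id="s1"):
    print(record["sql_statement"])
```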
