Impala is designed and developed to run on top of Hadoop, so you must understand the Hadoop security model as well as the security provided by the OS on which Hadoop runs. If Hadoop runs on Linux, the Linux administrator and the Hadoop administrator can tighten security at their respective levels, and that security works alongside the security provided by Impala. Impala 1.1 and higher uses the open source Sentry project to provide a fine-grained authorization framework for Hadoop. Impala 1.1.1 and higher supports auditing: the cluster produces audit data, which can be collected from all nodes and then processed for further analysis and insight.
In this chapter, we will talk about the security features provided by Impala. Impala security can be divided into three types of features: authorization, authentication, and auditing.
Authorization determines who can access which data resources and which actions each user is allowed to perform. Impala identifies a user by the Linux OS user ID of the user who started the Impala shell process or another client application, and privileges for use with Impala are associated with this user ID. Starting with Impala 1.1, authorization is provided by the open source Sentry project; refer to the Sentry documentation for more details.
Impala uses the same authorization privilege model as other database systems, such as MySQL and Hive. In Impala, privileges are granted on various kinds of schema objects, and every privilege that can be granted is associated with a level in the object hierarchy. If a privilege is granted on a container object, its child objects automatically inherit it; for example, a privilege granted on a database applies to every table in that database.
Currently, privileges can be restricted only at the server name, URI, database, and table levels; partition-level and column-level restrictions are not supported.
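To make the inheritance rule concrete, here is a minimal Python sketch of how a grant on a container object covers its children. It assumes a simple Server > Database > Table hierarchy in which each object is identified by a path tuple. The names `server1`, `sales`, `orders`, `hr`, and `employees` are hypothetical, and URI privileges are left out. This is an illustration of the rule, not how Sentry implements it.

```python
from typing import Set, Tuple

# An object is identified by its path in the hierarchy, e.g.
# ("server1",), ("server1", "sales"), or ("server1", "sales", "orders").
# All object names used here are hypothetical.
ObjectPath = Tuple[str, ...]


def is_covered(granted_on: ObjectPath, target: ObjectPath) -> bool:
    """Return True if a privilege granted on `granted_on` applies to `target`.

    Child objects inherit privileges from their containers, so a grant
    covers the target whenever the grant's path is a prefix of the
    target's path. A grant on a table does NOT cover its database.
    """
    return target[: len(granted_on)] == granted_on


def has_privilege(grants: Set[ObjectPath], target: ObjectPath) -> bool:
    """Check whether any of the user's grants covers `target`."""
    return any(is_covered(g, target) for g in grants)


if __name__ == "__main__":
    grants = {("server1", "sales")}  # privilege granted on database "sales"
    print(has_privilege(grants, ("server1", "sales", "orders")))  # True
    print(has_privilege(grants, ("server1", "hr", "employees")))  # False
```

Modeling objects as path prefixes keeps the inheritance check to a single comparison, which mirrors the container/child relationship described above.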
Next, we will look at what each privilege allows you to do with an object.
The SELECT privilege allows the user to read data from a table. When a user runs the SHOW DATABASES and SHOW TABLES statements, only objects on which the user has this privilege appear in the output. The same applies to the REFRESH and INVALIDATE METADATA statements, which access metadata only for tables on which the user has this privilege.
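The filtering behavior of SHOW TABLES can be sketched in a few lines of Python. This is a hypothetical model: the table names and the per-table privilege sets are made up, and the sketch assumes that ALL implies SELECT. A real Impala deployment derives this information from its authorization policy.

```python
from typing import Dict, List, Set


def visible_tables(all_tables: List[str],
                   privileges: Dict[str, Set[str]]) -> List[str]:
    """Return only the tables the user may see in SHOW TABLES output.

    `privileges` maps a (hypothetical) table name to the privileges the
    user holds on it. A table is visible if the user holds SELECT, or
    ALL, which this sketch assumes includes SELECT.
    """
    return [
        table for table in all_tables
        if privileges.get(table, set()) & {"SELECT", "ALL"}
    ]


if __name__ == "__main__":
    tables = ["orders", "customers", "salaries"]
    privs = {"orders": {"SELECT"}, "customers": {"ALL"}, "salaries": {"INSERT"}}
    print(visible_tables(tables, privs))  # ['orders', 'customers']
```

In this sketch, a table on which the user holds only INSERT is hidden, matching the rule that visibility follows the SELECT privilege.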
The INSERT privilege applies only to the INSERT and LOAD DATA statements, and allows the user to write data into a table.
The ALL privilege allows the user to create or modify any object. This privilege is required to execute DDL statements: CREATE TABLE, ALTER TABLE, or DROP TABLE for a table; CREATE DATABASE or DROP DATABASE for a database; and CREATE VIEW, ALTER VIEW, or DROP VIEW for a view.
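The statement-to-privilege rules described above can be summarized as a lookup table. This is a minimal Python sketch for illustration only. The mapping is taken from the text, the assumption that ALL covers SELECT and INSERT is labeled in the code, and the function name `is_allowed` is hypothetical rather than part of any Impala API.

```python
from typing import Dict, Set

# Privilege each statement type requires, following the rules above.
REQUIRED_PRIVILEGE: Dict[str, str] = {
    "SELECT": "SELECT",
    "REFRESH": "SELECT",
    "INVALIDATE METADATA": "SELECT",
    "INSERT": "INSERT",
    "LOAD DATA": "INSERT",
    "CREATE TABLE": "ALL",
    "ALTER TABLE": "ALL",
    "DROP TABLE": "ALL",
    "CREATE DATABASE": "ALL",
    "DROP DATABASE": "ALL",
    "CREATE VIEW": "ALL",
    "ALTER VIEW": "ALL",
    "DROP VIEW": "ALL",
}


def is_allowed(statement: str, held: Set[str]) -> bool:
    """Return True if the privileges in `held` permit `statement`.

    Assumption: ALL is treated as a superset of SELECT and INSERT.
    """
    required = REQUIRED_PRIVILEGE[statement]
    return required in held or "ALL" in held


if __name__ == "__main__":
    print(is_allowed("LOAD DATA", {"INSERT"}))             # True
    print(is_allowed("DROP TABLE", {"SELECT", "INSERT"}))  # False
```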
Here are a few examples of how you can set the described privileges:
GRANT SELECT ON TABLE table_name TO USER user_name;
GRANT ALL ON TABLE table_name TO GROUP group_name;
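The two GRANT statements differ in who receives the privilege. The first grants it to a single user, and the second grants it to every member of a group. The following Python sketch is a hypothetical model of how a user's effective grants could be resolved under those semantics. The user, group, and table names are made up. In a real cluster, group membership comes from the cluster's group mapping rather than from a dictionary.

```python
from typing import Dict, List, Set, Tuple

Grant = Tuple[str, str]  # (privilege, table_name); hypothetical representation


def effective_grants(user: str,
                     user_groups: Dict[str, List[str]],
                     user_grants: Dict[str, Set[Grant]],
                     group_grants: Dict[str, Set[Grant]]) -> Set[Grant]:
    """Combine grants made directly to `user` with grants to any of its groups."""
    grants = set(user_grants.get(user, set()))  # copy so inputs are not mutated
    for group in user_groups.get(user, []):
        grants |= group_grants.get(group, set())
    return grants


if __name__ == "__main__":
    groups = {"alice": ["analysts"], "bob": []}
    by_user = {"bob": {("SELECT", "orders")}}       # like GRANT ... TO USER
    by_group = {"analysts": {("ALL", "orders")}}    # like GRANT ... TO GROUP
    print(effective_grants("alice", groups, by_user, by_group))  # {('ALL', 'orders')}
    print(effective_grants("bob", groups, by_user, by_group))    # {('SELECT', 'orders')}
```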
Authentication means verifying a user's credentials and confirming the user's identity before processing a request. Impala uses the Kerberos security subsystem to authenticate users.
In the Cloudera Hadoop distribution, Kerberos security can be enabled through Cloudera Manager. When Impala runs in an environment managed by Cloudera Manager, Cloudera Manager completes the Kerberos configuration automatically. Once Kerberos security is enabled in your Hadoop distribution, you can enable Kerberos security in Impala. At the time of writing this book, Impala does not support wire encryption of application data.
Auditing means keeping a record of every operation executed in the system and whether it succeeded or failed. With auditing, users can look back to see which operations were executed and which data was accessed by which user. Tracking these activities lets the responsible professionals take appropriate measures. In Impala, the auditing feature produces audit data, which Cloudera Manager collects and presents in a user-friendly form.
Auditing features were introduced in Impala 1.1.1, and the key features are as follows:
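- Auditing is enabled with the -audit_event_log_dir option of the impalad startup options, which sets the directory where audit log files are written.
- The size of each audit log file is controlled with the -max_audit_event_log_file_size option of the impalad startup options.
Here are the types of SQL queries that are logged with audit logs: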
Query information is written to the audit log in JSON format, one line per SQL query. Each logged query records client session details, such as the session ID, user name, and client network address, so entries can be looked up by any combination of these fields.
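Because each audit entry is a single JSON object on its own line, a log file can be filtered with ordinary JSON tooling. This Python sketch assumes flat entries with the hypothetical field names `session_id`, `user`, `client_address`, and `sql_statement`. The actual field names and nesting written by impalad may differ, so check a real log file before adapting it. Passing any combination of the three filter arguments narrows the result, and omitted arguments match everything.

```python
import json
from typing import Dict, Iterable, List, Optional


def filter_audit_events(lines: Iterable[str],
                        session_id: Optional[str] = None,
                        user: Optional[str] = None,
                        client_address: Optional[str] = None) -> List[Dict]:
    """Return audit entries matching every filter that was supplied.

    Each line is parsed as one JSON object (one query per line).
    Field names are hypothetical and may not match impalad's format.
    """
    criteria = {"session_id": session_id, "user": user,
                "client_address": client_address}
    active = {k: v for k, v in criteria.items() if v is not None}
    matches = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        event = json.loads(line)
        if all(event.get(k) == v for k, v in active.items()):
            matches.append(event)
    return matches


if __name__ == "__main__":
    sample = [
        '{"session_id": "s1", "user": "alice", "client_address": "10.0.0.5", "sql_statement": "SELECT * FROM orders"}',
        '{"session_id": "s2", "user": "bob", "client_address": "10.0.0.7", "sql_statement": "DROP TABLE tmp"}',
    ]
    for event in filter_audit_events(sample, user="alice"):
        print(event["sql_statement"])  # SELECT * FROM orders
```

To read a real file, pass an open file object as `lines`, since iterating over it yields one line at a time.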