Views

Views are logical data structures that can be used to simplify queries by hiding the complexities, such as joins, subqueries, and filters. It is called logical because views are only defined in metastore without the footprint in HDFS. Unlike what's in the relational database, views in HQL do not store data or get materialized. Once the view is created, its schema is frozen immediately. Subsequent changes to the underlying tables (for example, adding a column) will not be reflected in the view's schema. If an underlying table is dropped or changed, subsequent attempts to query the invalid view will fail. In addition, views are read-only and may not be used as the target of the LOAD/INSERT/ALTER statements.

The following is an example of a view creation statement:

> CREATE VIEW IF NOT EXISTS employee_skills
> AS
> SELECT 
> name, skills_score['DB'] as DB, > skills_score['Perl'] as Perl, > skills_score['Python'] as Python, > skills_score['Sales'] as Sales, > skills_score['HR'] as HR > FROM employee; No rows affected (0.253 seconds)

When creating views, there is no yarn job triggered since this is only a metadata change. However, the job will be triggered when querying the view. To check the view definition, we can use the SHOW statement. When modifying the view definition, we can use the ALTER VIEW statement. The following are some examples to show, check, and modify the view:

  1. Show only views in the database. This was introduced in Hive v2.2.0. We can use the SHOW TABLES statement in the earlier version of Hive instead:
      > SHOW VIEWS;
> SHOW VIEWS 'employee_*';
No rows affected (0.19 seconds)
  1. Show the view's definition:
      > DESC FORMATTED employee_skills;
> SHOW CREATE TABLE employee_skills; -- this is recommended
No rows affected (0.19 seconds)
  1. Alter the views' properties:
      > ALTER VIEW employee_skills SET TBLPROPERTIES ('comment'='A 
view');

No rows affected (0.19 seconds)
  1. Redefine the views:
      > ALTER VIEW employee_skills as SELECT * from employee;
No rows affected (0.17 seconds)
  1. Drop the views:
      > DROP VIEW employee_skills;
No rows affected (0.156 seconds)

There is a special view in HQL, called LateralView. It is usually used with user-defined table-generating functions in Hive, such as explode(), for data normalization or processing JSON data. LateralView first applies the table-generation function to the data, and then joins the function's input and output together. See the following examples:

> SELECT name, workplace FROM employee_internal

> LATERAL VIEW explode(work_place) wp as workplace;
+---------+-----------+
| name | workplace |
+---------+-----------+
| Michael | Montreal |
| Michael | Toronto |
| Will | Montreal |
| Shelley | Montreal |
| Lucy | Vancouver |
+---------+-----------+
5 rows selected (6.693 seconds)

By adding OUTER after LATERAL VIEW, we can ensure we generate the result even if the table-generating function's output is NULL:

> SELECT name, workplace FROM employee_internal
> LATERAL VIEW explode(split(null, ',')) wp as workplace;
+-------+------------+
| name | workplace |
+-------+------------+
+-------+------------+
No rows selected (5.499 seconds)

> SELECT name, workplace FROM employee_internal
> LATERAL VIEW OUTER explode(split(null, ',')) wp as workplace;
+---------+-----------+
| name | workplace |
+---------+-----------+
| Michael | NULL |
| Michael | NULL |
| Will | NULL |
| Shelley | NULL |
| Lucy | NULL |
+---------+-----------+
5 rows selected (5.745 seconds)
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.227.79.241