Chapter 7. HiveQL: Views

A view allows a query to be saved and treated like a table. It is a logical construct, as it does not store data like a table. In other words, materialized views are not currently supported by Hive.

When a query references a view, the information in its definition is combined with the rest of the query by Hive’s query planner. Logically, you can imagine that Hive executes the view and then uses the results in the rest of the query.

Views to Reduce Query Complexity

When a query becomes long or complicated, a view may be used to hide the complexity by dividing the query into smaller, more manageable pieces; similar to writing a function in a programming language or the concept of layered design in software. Encapsulating the complexity makes it easier for end users to construct complex queries from reusable parts. For example, consider the following query with a nested subquery:

FROM (
  SELECT * FROM people JOIN cart
    ON (cart.people_id=people.id) WHERE firstname='john'
) a SELECT a.lastname WHERE a.id=3;

It is common for Hive queries to have many levels of nesting. In the following example, the nested portion of the query is turned into a view:

CREATE VIEW shorter_join AS
SELECT * FROM people JOIN cart
ON (cart.people_id=people.id) WHERE firstname='john';

Now the view is used like any other table. In this query we added a WHERE clause to the SELECT statement. This exactly emulates the original query:

SELECT lastname FROM shorter_join WHERE id=3;

Views that Restrict Data Based on Conditions

A common use case for views is restricting the result rows based on the value of one or more columns. Some databases allow a view to be used as a security mechanism. Rather than give the user access to the raw table with sensitive data, the user is given access to a view with a WHERE clause that restricts the data. Hive does not currently support this feature, as the user must have access to the entire underlying raw table for the view to work. However, the concept of a view created to limit data access can be used to protect information from the casual query:

hive> CREATE TABLE userinfo (
    >   firstname string, lastname string, ssn string, password string);
hive> CREATE VIEW safer_user_info AS
    > SELECT firstname,lastname FROM userinfo;

Here is another example where a view is used to restrict data based on a WHERE clause. In this case, we wish to provide a view on an employee table that only exposes employees from a specific department:

hive> CREATE TABLE employee (firstname string, lastname string,
    >   ssn string, password string, department string);
hive> CREATE VIEW techops_employee AS
    > SELECT firstname,lastname,ssn FROM userinfo WERE department='techops';

Views and Map Type for Dynamic Tables

Recall from Chapter 3 that Hive supports arrays, maps, and structs datatypes. These datatypes are not common in traditional databases as they break first normal form. Hive’s ability to treat a line of text as a map, rather than a fixed set of columns, combined with the view feature, allows you to define multiple logical tables over one physical table.

For example, consider the following sample data file that treats an entire row as a map rather than a list of fixed columns. Rather than using Hive’s default values for separators, this file uses ^A (Control-A) as the collection item separator (i.e., between key-value pairs in this case, where the collection is a map) and ^B (Control-B) as the separator between keys and values in the map. The long lines wrap in the following listing, so we added a blank line between them for better clarity:

time^B1298598398404^Atype^Brequest^Astate^Bny^Acity^Bwhite plains^Apart^Bmuffler

time^B1298598398432^Atype^Bresponse^Astate^Bny^Acity^Btarrytown^Apart^Bmuffler

time^B1298598399404^Atype^Brequest^Astate^Btx^Acity^Baustin^Apart^Bheadlight

Now we create our table:

CREATE EXTERNAL TABLE dynamictable(cols map<string,string>)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '04'
  COLLECTION ITEMS TERMINATED BY '01'
  MAP KEYS TERMINATED BY '02'
STORED AS TEXTFILE;

Because there is only one field per row, the FIELDS TERMINATED BY value actually has no effect.

Now we can create a view that extracts only rows with type equal to requests and get the city, state, and part into a view called orders:

CREATE VIEW orders(state, city, part) AS
SELECT cols["state"], cols["city"], cols["part"]
FROM dynamictable
WHERE cols["type"] = "request";

A second view is created named shipments. This view returns the time and part column from rows where the type is response:

CREATE VIEW shipments(time, part) AS
SELECT cols["time"], cols["parts"]
FROM dynamictable
WHERE cols["type"] = "response";

For another example of this feature, see http://dev.bizo.com/2011/02/columns-in-hive.html#!/2011/02/columns-in-hive.html.

View Odds and Ends

We said that Hive evaluates the view and then uses the results to evaluate the query. However, as part of Hive’s query optimization, the clauses of both the query and view may be combined together into a single actual query.

Nevertheless, the conceptual view still applies when the view and a query that uses it both contain an ORDER BY clause or a LIMIT clause. The view’s clauses are evaluated before the using query’s clauses.

For example, if the view has a LIMIT 100 clause and the query has a LIMIT 200 clause, you’ll get at most 100 results.

While defining a view doesn’t “materialize” any data, the view is frozen to any subsequent changes to any tables and columns that the view uses. Hence, a query using a view can fail if the referenced tables or columns no longer exist.

There are a few other clauses you can use when creating views. Modifying our last example:

CREATE VIEW IF NOT EXISTS shipments(time, part)
COMMENT 'Time and parts for shipments.'
TBLPROPERTIES ('creator' = 'me')
AS SELECT ...;

As for tables, the IF NOT EXISTS and COMMENT … clauses are optional, and have the same meaning they have for tables.

A view’s name must be unique compared to all other table and view names in the same database.

You can also add a COMMENT for any or all of the new column names. The comments are not “inherited” from the definition of the original table.

Also, if the AS SELECT contains an expression without an alias—e.g., size(cols) (the number of items in cols)—then Hive will use _CN as the name, where N is a number starting with 0. The view definition will fail if the AS SELECT clause is invalid.

Before the AS SELECT clause, you can also define TBLPROPERTIES, just like for tables. In the example, we defined a property for the “creator” of the view.

The CREATE TABLE … LIKE … construct discussed in Creating Tables can also be used to copy a view, that is with a view as part of the LIKE expression:

CREATE TABLE shipments2
LIKE shipments;

You can also use the optional EXTERNAL keyword and LOCATION … clause, as before.

Warning

The behavior of this statement is different as of Hive v0.8.0 and previous versions of Hive. For v0.8.0, the command creates a new table, not a new view. It uses defaults for the SerDe and file formats. For earlier versions, a new view is created.

A view is dropped in the same way as a table:

DROP VIEW IF EXISTS shipments;

As usual, IF EXISTS is optional.

A view will be shown using SHOW TABLES (there is no SHOW VIEWS), however DROP TABLE cannot be used to delete a view.

As for tables, DESCRIBE shipments and DESCRIBE EXTENDED shipments displays the usual data for the shipment view. With the latter, there will be a tableType value in the Detailed Table Information indicating the “table” is a VIRTUAL_VIEW.

You cannot use a view as a target of an INSERT or LOAD command.

Finally, views are read-only. You can only alter the metadata TBLPROPERTIES for a view:

ALTER VIEW shipments SET TBLPROPERTIES ('created_at' = 'some_timestamp');
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.142.97.219