A view allows a query to be saved and treated like a table. It is a logical construct, as it does not store data like a table. In other words, materialized views are not currently supported by Hive.
When a query references a view, the information in its definition is combined with the rest of the query by Hive’s query planner. Logically, you can imagine that Hive executes the view and then uses the results in the rest of the query.
When a query becomes long or complicated, a view may be used to hide the complexity by dividing the query into smaller, more manageable pieces; similar to writing a function in a programming language or the concept of layered design in software. Encapsulating the complexity makes it easier for end users to construct complex queries from reusable parts. For example, consider the following query with a nested subquery:
FROM
(
SELECT
*
FROM
people
JOIN
cart
ON
(
cart
.
people_id
=
people
.
id
)
WHERE
firstname
=
'john'
)
a
SELECT
a
.
lastname
WHERE
a
.
id
=
3
;
It is common for Hive queries to have many levels of nesting. In the following example, the nested portion of the query is turned into a view:
CREATE
VIEW
shorter_join
AS
SELECT
*
FROM
people
JOIN
cart
ON
(
cart
.
people_id
=
people
.
id
)
WHERE
firstname
=
'john'
;
Now the view is used like any other table. In this query we added a
WHERE
clause to the SELECT
statement. This exactly emulates the
original query:
SELECT
lastname
FROM
shorter_join
WHERE
id
=
3
;
A common use case for views is restricting the result rows
based on the value of one or more columns. Some databases allow a view to
be used as a security mechanism. Rather than give the user access to the
raw table with sensitive data, the user is given access to a view with a
WHERE
clause that restricts the data.
Hive does not currently support this feature, as the user must have access
to the entire underlying raw table for the view to work. However, the
concept of a view created to limit data access can be used to protect
information from the casual query:
hive
>
CREATE
TABLE
userinfo
(
>
firstname
string
,
lastname
string
,
ssn
string
,
password
string
);
hive
>
CREATE
VIEW
safer_user_info
AS
>
SELECT
firstname
,
lastname
FROM
userinfo
;
Here is another example where a view is used to restrict data based
on a WHERE
clause. In this case, we wish to provide a
view on an employee
table that only
exposes employees from a specific department:
hive
>
CREATE
TABLE
employee
(
firstname
string
,
lastname
string
,
>
ssn
string
,
password
string
,
department
string
);
hive
>
CREATE
VIEW
techops_employee
AS
>
SELECT
firstname
,
lastname
,
ssn
FROM
userinfo
WERE
department
=
'techops'
;
Recall from Chapter 3 that Hive supports arrays, maps, and structs datatypes. These datatypes are not common in traditional databases as they break first normal form. Hive’s ability to treat a line of text as a map, rather than a fixed set of columns, combined with the view feature, allows you to define multiple logical tables over one physical table.
For example, consider the following sample data file that treats an
entire row as a map rather than a list of fixed columns. Rather than using
Hive’s default values for separators, this file uses ^A
(Control-A) as the collection item separator
(i.e., between key-value pairs in this case, where the collection is a
map) and ^B
(Control-B) as the
separator between keys and values in the map. The long lines wrap in the
following listing, so we added a blank line between them for better
clarity:
time^B1298598398404^Atype^Brequest^Astate^Bny^Acity^Bwhite plains^Apart^Bmuffler
time^B1298598398432^Atype^Bresponse^Astate^Bny^Acity^Btarrytown^Apart^Bmuffler
time^B1298598399404^Atype^Brequest^Astate^Btx^Acity^Baustin^Apart^Bheadlight
Now we create our table:
CREATE
EXTERNAL
TABLE
dynamictable
(
cols
map
<
string
,
string
>
)
ROW
FORMAT
DELIMITED
FIELDS
TERMINATED
BY
' 04'
COLLECTION
ITEMS
TERMINATED
BY
' 01'
MAP
KEYS
TERMINATED
BY
' 02'
STORED
AS
TEXTFILE
;
Because there is only one field per row, the FIELDS TERMINATED BY
value actually has no effect.
Now we can create a view that extracts only rows with
type
equal to requests
and get the
city
, state
, and part
into a view called orders
:
CREATE
VIEW
orders
(
state
,
city
,
part
)
AS
SELECT
cols
[
"state"
],
cols
[
"city"
],
cols
[
"part"
]
FROM
dynamictable
WHERE
cols
[
"type"
]
=
"request"
;
A second view is created named shipments
. This view returns the time
and part
column from rows where the type
is
response
:
CREATE
VIEW
shipments
(
time
,
part
)
AS
SELECT
cols
[
"time"
],
cols
[
"parts"
]
FROM
dynamictable
WHERE
cols
[
"type"
]
=
"response"
;
For another example of this feature, see http://dev.bizo.com/2011/02/columns-in-hive.html#!/2011/02/columns-in-hive.html.
We said that Hive evaluates the view and then uses the results to evaluate the query. However, as part of Hive’s query optimization, the clauses of both the query and view may be combined together into a single actual query.
Nevertheless, the conceptual view still applies when the view and a
query that uses it both contain an ORDER
BY
clause or a LIMIT
clause.
The view’s clauses are evaluated before the using
query’s clauses.
For example, if the view has a LIMIT
100
clause and the query has a LIMIT
200
clause, you’ll get at most 100 results.
While defining a view doesn’t “materialize” any data, the view is frozen to any subsequent changes to any tables and columns that the view uses. Hence, a query using a view can fail if the referenced tables or columns no longer exist.
There are a few other clauses you can use when creating views. Modifying our last example:
CREATE
VIEW
IF
NOT
EXISTS
shipments
(
time
,
part
)
COMMENT
'Time and parts for shipments.'
TBLPROPERTIES
(
'creator'
=
'me'
)
AS
SELECT
...;
As for tables, the IF NOT EXISTS
and COMMENT …
clauses are optional, and
have the same meaning they have for tables.
A view’s name must be unique compared to all other table and view names in the same database.
You can also add a COMMENT
for
any or all of the new column names. The comments are not “inherited” from
the definition of the original table.
Also, if the AS SELECT
contains
an expression without an alias—e.g., size(cols)
(the number of items in cols
)—then Hive will use _CN
as the name, where N
is a number starting with 0. The view
definition will fail if the AS SELECT
clause is invalid.
Before the AS SELECT
clause, you
can also define TBLPROPERTIES
, just
like for tables. In the example, we defined a property for the “creator”
of the view.
The CREATE TABLE … LIKE …
construct discussed in Creating Tables can also be used
to copy a view, that is with a view as part of the LIKE
expression:
CREATE
TABLE
shipments2
LIKE
shipments
;
You can also use the optional EXTERNAL
keyword and LOCATION …
clause, as before.
The behavior of this statement is different as of Hive v0.8.0 and previous versions of Hive. For v0.8.0, the command creates a new table, not a new view. It uses defaults for the SerDe and file formats. For earlier versions, a new view is created.
A view is dropped in the same way as a table:
DROP
VIEW
IF
EXISTS
shipments
;
As usual, IF EXISTS
is
optional.
A view will be shown using SHOW
TABLES
(there is no SHOW
VIEWS
), however DROP TABLE
cannot be used to delete a view.
As for tables, DESCRIBE shipments
and DESCRIBE EXTENDED shipments
displays the usual data for the shipment
view. With the latter, there will be a
tableType
value in the Detailed Table Information
indicating the
“table” is a VIRTUAL_VIEW
.
You cannot use a view as a target of an INSERT
or LOAD
command.
Finally, views are read-only. You can only alter the metadata
TBLPROPERTIES
for a view:
ALTER
VIEW
shipments
SET
TBLPROPERTIES
(
'created_at'
=
'some_timestamp'
);
3.142.97.219