Filtering data with conditions

It is quite common to narrow down the result set by using a condition clause, such as LIMIT, WHERE, IN/NOT IN, and EXISTS/NOT EXISTSThe LIMIT keyword limits the specified number of rows returned randomly. Compared with LIMIT, WHERE is a more powerful and generic condition clause to limit the returned result set by expressions, functions, and nested queries as in the following examples:

> SELECT name FROM employee LIMIT 2;
+----------+
| name |
+----------+
| Lucy |
| Michael |
+----------+
2 rows selected (71.125 seconds)


> SELECT name, work_place FROM employee WHERE name = 'Michael';
+----------+------------------------+
| name | work_place |
+----------+------------------------+
| Michael | ["Montreal","Toronto"] |
+----------+------------------------+
1 row selected (38.107 seconds)

-- All the conditions can use together and use after WHERE
> SELECT name, work_place FROM employee WHERE name = 'Michael' LIMIT 1;
+----------+------------------------+
| name | work_place |
+----------+------------------------+
| Michael | ["Montreal","Toronto"] |
+----------+------------------------+
1 row selected (39.103 seconds)

IN/NOT IN is used as an expression to check whether values belong to a set specified by IN or NOT IN. With effect from Hive v2.1.0, IN and NOT IN statements support more than one column:

> SELECT name FROM employee WHERE gender_age.age in (27, 30);
+----------+
| name |
+----------+
| Michael |
| Shelley |
+----------+
2 rows selected (0.3 seconds)

-- With multiple columns support after v2.1.0
> SELECT
> name, gender_age
> FROM employee
> WHERE (gender_age.gender, gender_age.age) IN
> (('Female', 27), ('Male', 27 + 3)); -- Also support expression
+---------+------------------------------+
| name | gender_age |
+---------+------------------------------+
| Michael | {"gender":"Male","age":30} |
| Shelley | {"gender":"Female","age":27} |
+---------+------------------------------+
2 rows selected (0.282 seconds)

In addition, filtering data can also use a subquery in the WHERE clause with IN/NOT IN and EXISTS/NOT EXISTS. A subquery that uses EXISTS or NOT EXISTS must refer to both inner and outer expressions:

> SELECT 
> name, gender_age.gender as gender > FROM employee > WHERE name IN > (SELECT name FROM employee WHERE gender_age.gender = 'Male'); +----------+----------+ | name | gender | +----------+----------+ | Michael | Male | | Will | Male | +----------+----------+ 2 rows selected (54.644 seconds)

> SELECT
> name, gender_age.gender as gender > FROM employee a > WHERE EXISTS ( > SELECT *
> FROM employee b > WHERE
> a.gender_age.gender = b.gender_age.gender AND
b.gender_age.gender = 'Male' > );
-- This likes join table a and b with column gender +----------+----------+ | name | gender | +----------+----------+ | Michael | Male | | Will | Male | +----------+----------+ 2 rows selected (69.48 seconds)

There are additional restrictions for subqueries used in WHERE clauses:

  • Subqueries can only appear on the right-hand side of WHERE clauses
  • Nested subqueries are not allowed
  • IN/NOT IN in subqueries only support the use of a single column, although they support more in regular expressions
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.191.139.169