It is quite common to narrow down the result set by using a condition clause, such as LIMIT, WHERE, IN/NOT IN, and EXISTS/NOT EXISTS. The LIMIT keyword limits the specified number of rows returned randomly. Compared with LIMIT, WHERE is a more powerful and generic condition clause to limit the returned result set by expressions, functions, and nested queries as in the following examples:
> SELECT name FROM employee LIMIT 2;
+----------+
| name |
+----------+
| Lucy |
| Michael |
+----------+
2 rows selected (71.125 seconds)
> SELECT name, work_place FROM employee WHERE name = 'Michael';
+----------+------------------------+
| name | work_place |
+----------+------------------------+
| Michael | ["Montreal","Toronto"] |
+----------+------------------------+
1 row selected (38.107 seconds)
-- All the conditions can use together and use after WHERE
> SELECT name, work_place FROM employee WHERE name = 'Michael' LIMIT 1;
+----------+------------------------+
| name | work_place |
+----------+------------------------+
| Michael | ["Montreal","Toronto"] |
+----------+------------------------+
1 row selected (39.103 seconds)
IN/NOT IN is used as an expression to check whether values belong to a set specified by IN or NOT IN. With effect from Hive v2.1.0, IN and NOT IN statements support more than one column:
> SELECT name FROM employee WHERE gender_age.age in (27, 30);
+----------+
| name |
+----------+
| Michael |
| Shelley |
+----------+
2 rows selected (0.3 seconds)
-- With multiple columns support after v2.1.0
> SELECT
> name, gender_age
> FROM employee
> WHERE (gender_age.gender, gender_age.age) IN
> (('Female', 27), ('Male', 27 + 3)); -- Also support expression
+---------+------------------------------+
| name | gender_age |
+---------+------------------------------+
| Michael | {"gender":"Male","age":30} |
| Shelley | {"gender":"Female","age":27} |
+---------+------------------------------+
2 rows selected (0.282 seconds)
In addition, filtering data can also use a subquery in the WHERE clause with IN/NOT IN and EXISTS/NOT EXISTS. A subquery that uses EXISTS or NOT EXISTS must refer to both inner and outer expressions:
> SELECT
> name, gender_age.gender as gender > FROM employee > WHERE name IN > (SELECT name FROM employee WHERE gender_age.gender = 'Male'); +----------+----------+ | name | gender | +----------+----------+ | Michael | Male | | Will | Male | +----------+----------+ 2 rows selected (54.644 seconds)
> SELECT
> name, gender_age.gender as gender > FROM employee a > WHERE EXISTS ( > SELECT *
> FROM employee b > WHERE
> a.gender_age.gender = b.gender_age.gender AND
b.gender_age.gender = 'Male' > ); -- This likes join table a and b with column gender +----------+----------+ | name | gender | +----------+----------+ | Michael | Male | | Will | Male | +----------+----------+ 2 rows selected (69.48 seconds)
There are additional restrictions for subqueries used in WHERE clauses:
- Subqueries can only appear on the right-hand side of WHERE clauses
- Nested subqueries are not allowed
- IN/NOT IN in subqueries only support the use of a single column, although they support more in regular expressions