SQL join operation with the example database

Now, we will try to check which maker is common between both tables and their models by using the join SQL command as follows:

[Hadoop.testdomain:21000] > select automobiles.make, automobiles.model, motorcycles.make,motorcycles.model from automobiles JOIN motorcycles USING (make);
Query finished, fetching results ...
+-------+-------+-------+---------+
| make  | model | make  | model   |
+-------+-------+-------+---------+
| BMW   | M3    | BMW   | R1200RT |
| BMW   | X5    | BMW   | R1200RT |
| Honda | Civic | Honda | VFR400  |
| Honda | Civic | Honda | CB900   |
| Honda | Civic | Honda | CBR600  |
+-------+-------+-------+---------+
Returned 5 row(s) in 0.40s

Now, we will insert automakers from both the automobiles and motorcycles tables into the automakers table by using the Insert Overwrite and Insert Into SQL statements as follows:

[Hadoop.testdomain:21000] > INSERT OVERWRITE TABLE automakers SELECT distinct(make)  from automobiles;
Inserted 10 rows in 2.00s
[Hadoop.testdomain:21000] > INSERT  INTO TABLE automakers SELECT distinct(make)  from motorcycles;
Inserted 10 rows in 0.71s

Using various types of SQL statements

In the automobiles table, I have included a field named city_hwy_mpg. It includes miles per gallon in the city and highway, separated by a dash (-). Here, I want to show you how to use the STRING manipulation to get both highway and city miles per gallon values along with the LIMIT and WHERE clauses:

[Hadooptestdomain:21000] > select city_hwy_mpg, substr(city_hwy_mpg,1,2),  substr(city_hwy_mpg, instr(city_hwy_mpg, "-")+1 , 5) from automobiles WHERE price > 15000 LIMIT 2;
Query finished, fetching results ...
+--------------+----------------------------+-------------------------------------------------------+
| city_hwy_mpg | substr(city_hwy_mpg, 1, 2) | substr(city_hwy_mpg, instr(city_hwy_mpg, '-') + 1, 5) |
+--------------+----------------------------+-------------------------------------------------------+
| 22-30        | 22                         | 30                                                    |
| 24-32        | 24                         | 32                                                    |
+--------------+----------------------------+-------------------------------------------------------+
Returned 2 row(s) in 0.35s

In the preceding code snippet, the substr SQL command is used to select the first two letters that represent the city_mpg value. After this, instr is used to find the location of - and then the same substr is used to get the hwy_mpg part of the value.

Now, let's use the COUNT and DISTINCT clause to collect unique automakers as follows:

[Hadoop.testdomain:21000] > select count(distinct(make)) from automobiles;
Query finished, fetching results ...
+----------------------+
| count(distinct make) |
+----------------------+
| 10                   |
+----------------------+
Returned 1 row(s) in 0.48s

The preceding distinct SQL command removes the duplicate make values to make them unique and after that the count SQL command counts all these items to return the result.

Now, you will see how to use the EXPLAIN clause to understand a query execution as follows:

[Hadoop.testdomain:21000] > explain select * from motorcycles where price > 10000 and price < 20000;
PLAN FRAGMENT 0
  PARTITION: UNPARTITIONED
  1:EXCHANGE
     tuple ids: 0 
PLAN FRAGMENT 1
  PARTITION: RANDOM
  STREAM DATA SINK
    EXCHANGE ID: 1
    UNPARTITIONED
  0:SCAN HDFS
     table=autos.motorcycles #partitions=1 size=889B
     predicates: price > 10000.0, price < 20000.0
     tuple ids: 0

The explain command takes the remaining part of the query and then shows how it is going to handle the query. If you pass the select command with explain, you will get results about the query execution plan from the optimizer. Depending upon your query, explain will show you if a partition is used, where the data is stored, how the results are fetched from the table, and if any index is used or not. You can use the result from the explain SQL command to do an interactive analysis of your query and then modify it for faster execution as and if needed. You can also use the results from explain to troubleshoot specific issues.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.22.181.47