4564546454 280813 640 890 Chiranjib USA
4564546489 280813 640 840 Astik USA
4564546454 280813 640 890 Arijit USA
4564546489 280813 640 840 Soumen USA
Time taken: 0.186 seconds, Fetched: 4 row(s)
Bucketing: Now, let us assume that the dataset is huge. At times, even after partitioning
on a particular field or fields, the partitioned files are still larger than expected, and we
want to split the data in each partition into further parts. To overcome this limitation of
partitioning, Hive provides the concept of bucketing, which allows the user to divide a
table's data into more manageable parts.
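The idea behind bucketing can be sketched in a few lines of Python: a hash of the clustering column is reduced modulo the number of buckets, so every row with the same column value lands in the same bucket. This is only an illustration. The function names are hypothetical, and the Java-style string hash is an assumption used as a stand-in; the hash that Hive actually applies depends on the Hive version and the column type.

```python
def java_string_hash(text: str) -> int:
    """Signed 32-bit hash in the style of Java's String.hashCode.

    Assumption: used here only as a stand-in for Hive's own hash function.
    """
    h = 0
    for ch in text:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep the value within 32 bits
    # Reinterpret the unsigned 32-bit value as a signed integer.
    return h - (1 << 32) if h >= (1 << 31) else h


def bucket_for(key: str, num_buckets: int) -> int:
    """Map a clustering-column value to a bucket index in [0, num_buckets)."""
    # Mask off the sign bit so the modulo result is never negative.
    return (java_string_hash(key) & 0x7FFFFFFF) % num_buckets


print(bucket_for("USA", 3))  # → 2
```

Because the bucket depends only on the column value, a lookup on that column needs to read only one bucket rather than the whole table.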
First, create an external table ‘emp’ from the ‘emp.txt’ data, and then create a bucketed
table as shown below.
hive> create table emp1_bucketed(accno string,
    > dt string,
    > ctrycd int,
    > groupid int,
    > name string,
    > ctry string)
    > CLUSTERED BY (ctry) into 3 buckets
    > row format delimited
    > fields terminated by ',';
OK
Time taken: 0.278 seconds
Now, load the data into the bucketed table from the ‘emp’ table as shown below.
hive> from emp insert into table emp1_bucketed select *;
Listing the Hive warehouse directory in HDFS shows one file for each of the three buckets.
hduser@sayan:~/$ hadoop fs -ls /user/hive/warehouse
Found 1 items
drwxrwxr-x - hduser supergroup 0 2016-02-10 16:33 /user/hive/warehouse/emp1_bucketed
hduser@sayan:~/$ hadoop fs -ls /user/hive/warehouse/emp1_bucketed
Found 3 items
-rwxrwxr-x 1 hduser supergroup 71 2016-02-10 16:33 /user/hive/warehouse/emp1_bucketed/000000_0
-rwxrwxr-x 1 hduser supergroup 70 2016-02-10 16:33 /user/hive/warehouse/emp1_bucketed/000001_0
-rwxrwxr-x 1 hduser supergroup 230 2016-02-10 16:33 /user/hive/warehouse/emp1_bucketed/000002_0
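The three bucket files in the listing differ in size (71, 70 and 230 bytes). One plausible reading is key skew: all rows that share a ‘ctry’ value go to the same bucket, so a frequent value makes its bucket larger. The Python sketch below illustrates this under stated assumptions. The four ‘USA’ values mirror the query output shown earlier, while ‘IND’ and ‘UK’ are hypothetical values added for illustration, and the Java-style string hash is an assumed stand-in for whatever hash the installed Hive version uses.

```python
from collections import Counter


def bucket_for(key: str, num_buckets: int) -> int:
    """Bucket index from a Java-style string hash (an assumed stand-in for Hive's hash)."""
    h = 0
    for ch in key:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep the value within 32 bits
    # Drop the sign bit, then reduce to the range [0, num_buckets).
    return (h & 0x7FFFFFFF) % num_buckets


# Four 'USA' values as in the earlier query output; 'IND' and 'UK' are hypothetical.
ctry_values = ["USA", "USA", "USA", "USA", "IND", "UK"]

# Count how many rows each of the 3 buckets would receive.
rows_per_bucket = Counter(bucket_for(ctry, 3) for ctry in ctry_values)

for bucket in range(3):
    print(f"bucket {bucket}: {rows_per_bucket[bucket]} row(s)")
```

With these sample values, the bucket that receives ‘USA’ holds four rows and the other two hold one row each. A clustering column with many distinct, evenly distributed values tends to give buckets of more similar size.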