Data visualization with Hive tables

Now let us visualize the loaded data by creating Hive tables over the binary Avro data. Execute the following table scripts using the Hive Query Editor (sample verification queries follow the steps):

  1. Create a customer Hive table by executing the following script:
CREATE EXTERNAL TABLE customer
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/datalake/raw/customer'
TBLPROPERTIES (
  'avro.schema.literal'='{"namespace": "example.avro",
    "type": "record",
    "name": "Customer",
    "fields": [
      {"name": "id", "type": "int"},
      {"name": "first_name", "type": "string"},
      {"name": "last_name", "type": "string"},
      {"name": "dob", "type": "long"}
    ]}'
);
  2. Create an address Hive table by executing the following script:
CREATE EXTERNAL TABLE address
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/datalake/raw/address'
TBLPROPERTIES (
  'avro.schema.literal'='{"namespace": "example.avro",
    "type": "record",
    "name": "Address",
    "fields": [
      {"name": "id", "type": "int"},
      {"name": "street1", "type": "string"},
      {"name": "street2", "type": "string"},
      {"name": "city", "type": "string"},
      {"name": "state", "type": "string"},
      {"name": "country", "type": "string"},
      {"name": "zip_pin_postal_code", "type": "string"}
    ]}'
);
  3. Create a contacts Hive table by executing the following script:
CREATE EXTERNAL TABLE contacts
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/datalake/raw/contacts/load1'
TBLPROPERTIES (
  'avro.schema.literal'='{"namespace": "example.avro",
    "type": "record",
    "name": "Contact",
    "fields": [
      {"name": "id", "type": "string"},
      {"name": "cell", "type": "string"},
      {"name": "phone", "type": "string"},
      {"name": "email", "type": "string"}
    ]}'
);
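
Note that none of these scripts declares a column list: the AvroSerDe derives each table's columns from the avro.schema.literal property. As a quick sanity check (assuming the Avro files from the earlier ingestion steps are already present under the respective LOCATION paths), you can describe a table and sample a few rows:

-- Confirm that Hive derived the columns from the Avro schema
DESCRIBE customer;

-- Sample a few ingested records; dob is declared as a long, so if it
-- holds an epoch timestamp in seconds (an assumption about how the
-- data was loaded), it can be rendered with from_unixtime()
SELECT id, first_name, last_name, from_unixtime(dob) AS dob
FROM customer
LIMIT 10;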
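
Embedding the schema with avro.schema.literal works, but it duplicates the schema in every DDL script. The AvroSerDe also accepts an avro.schema.url property pointing to an .avsc file, which keeps a single authoritative copy of the schema. A minimal sketch, assuming the schema files are kept at a hypothetical HDFS path such as /datalake/schemas:

CREATE EXTERNAL TABLE address_from_avsc
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/datalake/raw/address'
-- illustrative schema location; adjust to wherever the .avsc files live
TBLPROPERTIES ('avro.schema.url'='hdfs:///datalake/schemas/address.avsc');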

Now we have all the data ingested into Hadoop and represented as external Hive tables. The datasets were sourced in different ways, but they all come together in the Data Lake in a form that enables querying and further processing. With the mechanisms explained in this chapter, the coverage of the Data Lake comes together as shown in Figure 29:

Figure 29: Single Customer View Coverage
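
To illustrate the kind of querying this enables, the sketch below joins the three tables into a first cut of the single customer view. It assumes that address.id and contacts.id carry the customer's id (an assumption about the sample data); note that contacts.id is declared as a string in its Avro schema, so the join needs a cast:

-- Hedged sketch: assumes address.id and contacts.id reference customer.id
SELECT c.id,
       c.first_name,
       c.last_name,
       a.city,
       a.country,
       ct.cell,
       ct.email
FROM customer c
JOIN address a
  ON a.id = c.id
JOIN contacts ct
  ON CAST(ct.id AS INT) = c.id
LIMIT 10;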