Querying data in a partition table

For this demo, a public dataset by the name of nyc-tlc will be used. You can locate this dataset under Public Datasets, as shown in the following screenshot. This demo will show you how to create partition tables for existing data in other tables and how to query the table.

Run the following commands to extract 5 days of data from the trips_2015 table to the daily tables. Replace the destination dataset name and table name as per your project.

This command loads 2015-Mar-30 data into the trips_20150330 table in the PartitionTaxiTables dataset:

bq query --destination_table PartitionTaxiTables.trips_20150330 "SELECT * FROM [nyc-tlc:green.trips_2015] WHERE TIMESTAMP(pickup_datetime) between TIMESTAMP('2015-03-30 00:00:00 UTC') and TIMESTAMP('2015-03-30 23:59:59 UTC')"

The following command loads 2015-Mar-31 data into trips_20150331 in the PartitionTaxiTables dataset:

bq query --destination_table PartitionTaxiTables.trips_20150331 "SELECT * FROM [nyc-tlc:green.trips_2015] WHERE TIMESTAMP(pickup_datetime) between TIMESTAMP('2015-03-31 00:00:00 UTC') and TIMESTAMP('2015-03-31 23:59:59 UTC')"

This command loads 2015-Apr-01 data into the trips_20150401 table in the PartitionTaxiTables dataset:

bq query --destination_table PartitionTaxiTables.trips_20150331 "SELECT * FROM [nyc-tlc:green.trips_2015] WHERE TIMESTAMP(pickup_datetime) between TIMESTAMP('2015-04-01 00:00:00 UTC') and TIMESTAMP('2015-04-01 23:59:59 UTC')"

This one loads 2015-Apr-02 data into the trips_20150402 table in the same dataset:

bq query --destination_table PartitionTaxiTables.trips_20150402 "SELECT * FROM [nyc-tlc:green.trips_2015] WHERE TIMESTAMP(pickup_datetime) between TIMESTAMP('2015-04-02 00:00:00 UTC') and TIMESTAMP('2015-04-02 23:59:59 UTC')"

The following command loads 2015-Apr-03 data into trips_20150403 in the PartitionTaxiTables dataset:

bq query --destination_table PartitionTaxiTables.trips_20150403 "SELECT * FROM [nyc-tlc:green.trips_2015] WHERE TIMESTAMP(pickup_datetime) between TIMESTAMP('2015-04-03 00:00:00 UTC') and TIMESTAMP('2015-04-03 23:59:59 UTC')"

You should be able to see the data for the 5 days in separate tables, as shown in the following screenshot. You can see the count of records for each day by switching the value in the Table Details drop-down at the top.

To move the data from these daily tables to new partition tables, run the following command. Replace the dataset for source and destination tables and also the source and destination table name as per your project. The first argument, PartitionTaxiTables.trips_, is a prefix for the source table. The second argument is the destination table, which is a new partitioned table:

bq partition PartitionTaxiTables.trips_ PartitionTaxiTables.trips_partitioned

The following screenshot shows the structure of the newly created partition table from the preceding command:

For the same tables, create a non-partitioned table by loading all records into one table using the command given as follows. This command uses standard SQL; hence, the flag --use_legacy_sql is set to false:

bq query --destination_table PartitionTaxiTables.trips_non_partition --use_legacy_sql=false 'SELECT * FROM `PartitionTaxiTables.trips_2015*`'

Compare the storage size and record count of both the trips_partitioned table and the trips_non_partition table. They should have the same record count and same storage size, as shown in this screenshot:

The following query returns records for trips between 2015-Apr-01 and 2015-Apr-02 from the non-partitioned table. This query uses legacy SQL:

SELECT * FROM [PartitionTaxiTables.trips_non_partition] 
WHERE pickup_datetime between TIMESTAMP("2015-04-01 00:00:00") and TIMESTAMP("2015-04-02 23:59:59")

The following query returns records from the partitioned table for trips between 2015-Apr-01 and 2015-Apr-02. This query also uses legacy SQL:

SELECT * FROM [my-first-project-170319:PartitionTaxiTables.trips_partitioned] WHERE _PARTITIONTIME BETWEEN TIMESTAMP("2015-04-01 00:00:00") and TIMESTAMP("2015-04-02 23:59:59")

The processing details for these two queries are shown in the next screenshot for comparison. The query executed using the partitioned table used only 15.9 MB of data compared to the one that used the non-partitioned table, which used 39.1 MB of data. The bytes billed were also lower for the query that used the partition table.

To query only on the day's data from the partition table, you can directly specify the partition table name in the query as shown in the next screenshot. This query will return data from the 2015-Apr-01 table only. This query uses legacy SQL. If you are using this command in the Unix command shell, when passing to the bq utility, you need to escape the $ with $:

SELECT * FROM [PartitionTaxiTables.trips_partitioned$20150401]

You can combine data from multiple partitions of the partition table. The following query uses legacy SQL. This query gets records from both the 2015-Apr-01 and 2015-Apr-02 tables:

SELECT * FROM [PartitionTaxiTables.trips_partitioned$20150401],[PartitionTaxiTables.trips_partitioned$20150402]

To delete a partition, run the following command from the bq utility. The rm command is used to remove tables from a dataset and delete a specific partition. Provide the dataset name and partition name of the partition table as shown in this command. It deletes the trips for 2015-Apr-01:

bq rm PartitionTaxiTables.trips_partitioned$20150401

To get the list of partitions in a partitioned table, query the __PARTITIONS_SUMMARY__ internal table. The following is a legacy SQL command used to see a list of partitions in a partition table. Add the $__PARTITIONS_SUMMARY__ partition decorator to the query and select the partition_id column:

#legacySQL
SELECT partition_id
FROM [PartitionTaxiTables.trips_partitioned$__PARTITIONS_SUMMARY__]

The following is the standard SQL equivalent of the list of partitions in the partition table. The query just gets distinct values of the _PARTITIONTIME column from the partition table:

#standardSQL
SELECT
DISTINCT _PARTITIONTIME as pt
FROM
`PartitionTaxiTables.trips_partitioned`

Partitioned tables support streaming insert and streaming insert can be done to any partition, not just the current day's partition. The next section shows how to migrate data in your tables to partition tables, make your application query the data using partitions to save billing, and improve the performance of queries using partition tables in your projects.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.129.210.102