Wildcard tables

A wildcard table performs a union over tables whose names share a common prefix and whose schemas are compatible. The following queries show how to use wildcard tables in the public dataset bigquery-public-data:new_york provided by Google.

The following query counts the yellow taxi trips made in New York in each year. It uses UNION ALL to combine one SELECT statement per table whose name starts with tlc_yellow_trips_. If a new table is added for 2017, the query has to be modified to include that table as well. Wildcard table syntax avoids this maintenance by automatically including every table with a matching name. The UNION ALL version, written in standard SQL, is as follows:

#standardSQL
SELECT MAX(EXTRACT(YEAR FROM pickup_datetime)) AS TripYear, COUNT(1) AS TripCount
FROM `bigquery-public-data.new_york.tlc_yellow_trips_2009`
UNION ALL
SELECT MAX(EXTRACT(YEAR FROM pickup_datetime)) AS TripYear, COUNT(1) AS TripCount
FROM `bigquery-public-data.new_york.tlc_yellow_trips_2010`
UNION ALL
SELECT MAX(EXTRACT(YEAR FROM pickup_datetime)) AS TripYear, COUNT(1) AS TripCount
FROM `bigquery-public-data.new_york.tlc_yellow_trips_2011`
UNION ALL
SELECT MAX(EXTRACT(YEAR FROM pickup_datetime)) AS TripYear, COUNT(1) AS TripCount
FROM `bigquery-public-data.new_york.tlc_yellow_trips_2012`
UNION ALL
SELECT MAX(EXTRACT(YEAR FROM pickup_datetime)) AS TripYear, COUNT(1) AS TripCount
FROM `bigquery-public-data.new_york.tlc_yellow_trips_2013`
UNION ALL
SELECT MAX(EXTRACT(YEAR FROM pickup_datetime)) AS TripYear, COUNT(1) AS TripCount
FROM `bigquery-public-data.new_york.tlc_yellow_trips_2014`
UNION ALL
SELECT MAX(EXTRACT(YEAR FROM pickup_datetime)) AS TripYear, COUNT(1) AS TripCount
FROM `bigquery-public-data.new_york.tlc_yellow_trips_2015`
UNION ALL
SELECT MAX(EXTRACT(YEAR FROM pickup_datetime)) AS TripYear, COUNT(1) AS TripCount
FROM `bigquery-public-data.new_york.tlc_yellow_trips_2016`
ORDER BY TripYear

The following query uses the wildcard table format in standard SQL and returns the same per-year counts, provided each table holds only trips picked up in its own year. The FROM clause specifies the table name prefix followed by *, which selects every table whose name starts with tlc_yellow_trips_:

#standardSQL
SELECT EXTRACT(YEAR FROM pickup_datetime) AS TripYear, COUNT(1) AS TripCount
FROM `bigquery-public-data.new_york.tlc_yellow_trips_*`
GROUP BY TripYear
ORDER BY TripYear

To select only the data from 2009 to 2012 and ignore the other tables, filter on _TABLE_SUFFIX in the WHERE clause. _TABLE_SUFFIX holds the part of each table name that the * matched. The following query returns the trips from 2009 to 2012:

#standardSQL
SELECT EXTRACT(YEAR FROM pickup_datetime) AS TripYear, COUNT(1) AS TripCount
FROM `bigquery-public-data.new_york.tlc_yellow_trips_*`
WHERE _TABLE_SUFFIX BETWEEN '2009' AND '2012'
GROUP BY TripYear
ORDER BY TripYear
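
Because _TABLE_SUFFIX behaves like a column, it can also appear in the select list. As a sketch (the TableYear alias is illustrative and not part of the original queries), the following query counts rows per source table rather than per pickup year, which is a quick way to check which tables the wildcard actually matched:

```sql
#standardSQL
-- Count rows per matched table; TableYear is an illustrative alias.
SELECT _TABLE_SUFFIX AS TableYear, COUNT(1) AS TripCount
FROM `bigquery-public-data.new_york.tlc_yellow_trips_*`
GROUP BY TableYear
ORDER BY TableYear
```

If the counts here differ from the counts grouped by pickup year, some table likely contains trips dated outside its own year.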

The next query returns data only from 2010 and 2016. BigQuery reads only the tables whose suffix satisfies the _TABLE_SUFFIX condition, so the query scans less data, which lowers the billed bytes and improves performance:

#standardSQL
SELECT EXTRACT(YEAR FROM pickup_datetime) AS TripYear, COUNT(1) AS TripCount
FROM `bigquery-public-data.new_york.tlc_yellow_trips_*`
WHERE (_TABLE_SUFFIX = '2010'
OR _TABLE_SUFFIX = '2016')
GROUP BY TripYear
ORDER BY TripYear
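
The wildcard does not have to stand in for the whole year. As a sketch, assuming the tables are named as in the queries above, a longer prefix such as tlc_yellow_trips_201 limits the match to the tables for 2010 onward. _TABLE_SUFFIX then holds only the characters matched by *, so 2010 and 2016 correspond to '0' and '6':

```sql
#standardSQL
-- The prefix ends in 201, so the suffix is the single last digit of the year.
SELECT EXTRACT(YEAR FROM pickup_datetime) AS TripYear, COUNT(1) AS TripCount
FROM `bigquery-public-data.new_york.tlc_yellow_trips_201*`
WHERE _TABLE_SUFFIX IN ('0', '6')
GROUP BY TripYear
ORDER BY TripYear
```

This should return the same rows as the previous query. The IN list is simply a shorter way of writing the two OR conditions.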