BigQuery SQL functions

BigQuery, much like other database management systems, allows users to use functions on their data. Functions are objects that perform actions on input data and output result data. While there are a number of different types of functions, this section will focus on the basic string (which acts on string or character values) and Aggregate functions (which act on a set of values to return a single value). In most cases, functions can be used by typing the function name and wrapping the value that will be acted upon in parentheses. In some cases, a function parameter (or parameters) will also have to be added, which is usually delimited by a comma. Here is an example:

FUNCTION_NAME(value,<parameter>)
Function Function type Action Example
CHAR_LENGTH String Returns the length of a string CHAR_LENGTH(value)
CONCAT String Concatenates two or more values into a single string CONCAT("hey","you")
LOWER String Coerces a string to lowercase LOWER("HELLO")
TRIM String Removes leading and trailing spaces from a string TRIM(" CAR ")
SUBSTR String Returns the string starting from the position specified SUBSTR("CAT",2)
SUM Aggregate Returns the sum of the input values SUM(2,2)
AVG Aggregate Returns the average of the input values AVG(2,10)
MIN Aggregate Returns the minimum of the input values MIN(1,3,5)
MAX Aggregate Returns the maximum of the input values

MAX(1,3,5)

 

The following section provides some examples on how to use aggregate functions and other built-in functions. The following query returns the minimum and maximum values in the column specified in the query:

#legacySQL
SELECT MIN( trip_start_timestamp),MAX( trip_start_timestamp )
FROM [bigquery-public-data:chicago_taxi_trips.taxi_trips]

The following query returns the maximum year and minimum year from the specified column. This query uses the YEAR function to get the year from the timestamp column, and the min and max functions to get the minimum and maximum year available for the trip_start_timestamp column:

#legacySQL
SELECT YEAR(MIN(trip_start_timestamp)),YEAR(MAX( trip_start_timestamp ))
FROM [bigquery-public-data:chicago_taxi_trips.taxi_trips]

The following query returns the number of trips made by the taxi companies for each available year in the table. It uses the count aggregate function:

#legacySQL
SELECT YEAR(trip_start_timestamp) as trip_year, company, count(1) trip_per_year
FROM [bigquery-public-data:chicago_taxi_trips.taxi_trips]
WHERE company is not null
AND company != ''
GROUP BY trip_year, company
ORDER by trip_year, company

The following query returns the number of trips made by the companies for each month available in the table. The CAST function is used to convert the timestamp type column trip_start_timestamp to a DATE type:

#standardSQL
SELECT FORMAT_DATE('%Y%m', CAST(trip_start_timestamp AS DATE)) as trip_year_month, company, count(1) trip_per_month
FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
WHERE company is not null
AND company != ''
GROUP BY trip_year_month, company
ORDER by trip_year_month, company

The following query uses a subquery to return the monthly average trips made by each company for the available years in the table. The inner query uses the CAST function to convert the trip_start_timestamp column to a DATE type and then uses FORMAT_DATE to format the date as YYYYMM. In the outer query, the SUBSTR function is used to get the year from the YYYYMM value:

#standardSQL
SELECT SUBSTR(trip_year_month,1,4) as trip_year, company, AVG(trip_per_month) as monthly_average
FROM
(
SELECT FORMAT_DATE('%Y%m', CAST(trip_start_timestamp AS DATE)) as trip_year_month, company, count(1) trip_per_month
FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
WHERE company is not null
AND company != ''
GROUP BY trip_year_month, company
ORDER by trip_year_month, company
)
GROUP BY trip_year, company
ORDER by trip_year, company

The following query uses the previous query, but applies a filter on the grouped columns using the HAVING clause:

#standardSQL
SELECT SUBSTR(trip_year_month,1,4) as trip_year, company, AVG(trip_per_month) as monthly_average, SUM(trip_per_month) as total_trips_year
FROM
(
SELECT FORMAT_DATE('%Y%m', CAST(trip_start_timestamp AS DATE)) as trip_year_month, company, count(1) trip_per_month
FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
WHERE company is not null
AND company != ''
GROUP BY trip_year_month, company
ORDER by trip_year_month, company
)
GROUP BY trip_year, company
HAVING monthly_average > 100 and total_trips_year > 1500
ORDER by trip_year, company
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.191.195.111