Many of the functions discussed so far in this chapter can be used for the sanitation/transformation phase of data warehousing. Let's look at some of these functions with the specific task of sanitization/transformation:
- Decoding of encoded values. The CASE function can be used to create columns based on conditions of other columns:
CASE
WHEN region = "W" THEN "West"
WHEN region = "E" THEN "East"
END AS region
- Calculation of values. Any of the arithmetic functions can be used in this case:
SUM(revenue) / COUNT(orders) AS average_order_value
- Splitting delimited string values into individual columns. The REGEX_EXTRACT() function can be used to extract individual parts of a string. Here is a how the function can be used to pull the value prior to the first space:
#standardSQL
SELECT REGEXP_EXTRACT(title,r'^([w-]+)')
FROM `bigquery-public-data.samples.wikipedia`
- Calculating values based on aggregation. The GROUP BY statement can be used to aggregate rows into summarized columns:
#standardSQL
SELECT word, SUM(word_count)
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY word