Functions for transformation

Many of the functions discussed so far in this chapter can be used for the sanitation/transformation phase of data warehousing. Let's look at some of these functions with the specific task of sanitization/transformation:

  1. Decoding of encoded values. The CASE function can be used to create columns based on conditions of other columns:
 CASE
WHEN region = "W" THEN "West"
WHEN region = "E" THEN "East"
END AS region
  1. Calculation of values. Any of the arithmetic functions can be used in this case:
 SUM(revenue) / COUNT(orders) AS average_order_value
  1. Splitting delimited string values into individual columns. The REGEX_EXTRACT() function can be used to extract individual parts of a string. Here is a how the function can be used to pull the value prior to the first space:
#standardSQL 
SELECT REGEXP_EXTRACT(title,r'^([w-]+)')
FROM `bigquery-public-data.samples.wikipedia`
  1. Calculating values based on aggregation. The GROUP BY statement can be used to aggregate rows into summarized columns:
#standardSQL
SELECT word, SUM(word_count)
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY word
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.118.7.102