© Kim Berg Hansen 2020
K. Berg HansenPractical Oracle SQLhttps://doi.org/10.1007/978-1-4842-5617-6_8

8. Pivoting Rows to Columns

Kim Berg Hansen1 
(1)
Middelfart, Denmark
 

The previous chapter was about unpivoting, which is the process of turning columns into rows. The opposite operation is called pivoting, which – surprise, surprise – is turning rows into columns.

The idea is that you have a resultset with some dimensional values in one or more columns and some facts/measure values in one or more other columns. You’d like the output grouped by some other columns, so you only have one aggregated row for those values, and then the values from your measures should be placed in a set of columns, one for each value of your dimension (or combination of values if you have multiple dimensions).

One thing to remember here is that in SQL, the engine needs at parse time to be able to determine names and datatypes of each column. That means that you have to hardcode the dimension values and what column names they should be turned into.

If you wish to have dynamic pivoting, where there automatically will be columns for every dimension value in the data, you need to build it with dynamic SQL similarly to what I showed at the end of the previous chapter. That way there will be a parsing every time you run it, and the column names can then be known at that time. Alternatively the pivot clause supports returning XML instead of columns, which allows you dynamic pivoting without dynamic SQL – which can be an option if an XML output is acceptable. Either way of dynamic pivoting will not be covered in this book.

Tip

In Oracle version 18c or newer, there is a third dynamic pivoting method using polymorphic table functions. I won’t be covering PTFs in this book, but Chris Saxon of the Oracle AskTom team has an example of a PTF for dynamic pivoting on Live SQL: https://livesql.oracle.com/apex/livesql/file/content_HPN95108FSSZD87PXX7MG3LW3.html.

Tables for pivoting

The Good Beer Trading Co purchases beer from some breweries, storing the information in the purchases table shown in Figure 8-1, along with dimension lookup tables breweries, products, and product_groups.
../images/475066_1_En_8_Chapter/475066_1_En_8_Fig1_HTML.jpg
Figure 8-1

Purchases table and associated dimension tables

I’ll be demonstrating pivoting data by brewery, product group, and year. To do that, I use the view purchases_with_dims in Listing 8-1, which simply joins the purchases table with the dimension tables.
SQL> create or replace view purchases_with_dims
  2  as
  3  select
  4     pu.id
  5   , pu.purchased
  6   , pu.brewery_id
  7   , b.name as brewery_name
  8   , pu.product_id
  9   , p.name as product_name
 10   , p.group_id
 11   , pg.name as group_name
 12   , pu.qty
 13   , pu.cost
 14  from purchases pu
 15  join breweries b
 16     on b.id = pu.brewery_id
 17  join products p
 18     on p.id = pu.product_id
 19  join product_groups pg
 20     on pg.id = p.group_id;
View PURCHASES_WITH_DIMS created.
Listing 8-1

View joining purchases table with the dimensions

At first I’m going to aggregate the quantity grouped by brewery, product group, and year in Listing 8-2, which is a simple group by without any pivoting at all.
SQL> select
  2     brewery_name
  3   , group_name
  4   , extract(year from purchased) as yr
  5   , sum(qty) as qty
  6  from purchases_with_dims pwd
  7  group by
  8     brewery_name
  9   , group_name
 10   , extract(year from purchased)
 11  order by
 12     brewery_name
 13   , group_name
 14   , yr;
Listing 8-2

Yearly purchased quantities by brewery and product group

The output shows me that the company bought from three breweries, two different product groups from each brewery, in three years from 2016 to 2018, resulting in 18 rows for those combinations:
BREWERY_NAME        GROUP_NAME  YR    QTY
Balthazar Brauerei  Belgian     2016  800
Balthazar Brauerei  Belgian     2017  1000
Balthazar Brauerei  Belgian     2018  1000
Balthazar Brauerei  Wheat       2016  500
Balthazar Brauerei  Wheat       2017  500
Balthazar Brauerei  Wheat       2018  400
Brewing Barbarian   IPA         2016  200
Brewing Barbarian   IPA         2017  300
Brewing Barbarian   IPA         2018  500
Brewing Barbarian   Stout       2016  800
Brewing Barbarian   Stout       2017  1000
Brewing Barbarian   Stout       2018  1200
Happy Hoppy Hippo   IPA         2016  1000
Happy Hoppy Hippo   IPA         2017  900
Happy Hoppy Hippo   IPA         2018  800
Happy Hoppy Hippo   Wheat       2016  200
Happy Hoppy Hippo   Wheat       2017  100
Happy Hoppy Hippo   Wheat       2018  100

Now I’d like to have a column for quantity purchased each of the three years instead of a row for each year – this is what pivoting is all about.

Pivoting single measure and dimension

Listing 8-3 shows how I do the pivoting of the years using the pivot clause.
SQL> select *
  2  from (
  3     select
  4        brewery_name
  5      , group_name
  6      , extract(year from purchased) as yr
  7      , sum(qty) as qty
  8     from purchases_with_dims pwd
  9     group by
 10        brewery_name
 11      , group_name
 12      , extract(year from purchased)
 13  ) pivot (
 14     sum(qty)
 15     for yr
 16     in (
 17        2016 as y2016
 18      , 2017 as y2017
 19      , 2018 as y2018
 20     )
 21  )
 22  order by brewery_name, group_name;
Listing 8-3

Pivoting the year rows into columns

I built the query of these elements:
  • Lines 3–12 simply are the select from Listing 8-2, wrapped in an inline view.

  • The pivot keyword in line 13 tells Oracle I want to pivot the data.

  • Then I define my measures – in this case only one, the quantity – in line 14. I must use an aggregate function here – it can be any aggregate, the one that makes sense in this case is sum.

  • After the keyword for in line 15, I define the dimensions I want – here only the year.

  • Last, the in clause in lines 16–19 maps in which columns the aggregated measure should be placed for which values of the dimension – columns that do not exist in the table, but will be created in the output.

Shown schematically, you can see in Figure 8-2 that the measure sum(qty) flows to the three column aliases, one for each of the values of the yr dimension.
../images/475066_1_En_8_Chapter/475066_1_En_8_Fig2_HTML.jpg
Figure 8-2

The flows of the pivot clause

And so I get the output that I desired with 18 aggregated quantities shown in six rows of three quantity columns (one per year) instead of 18 rows:
BREWERY_NAME        GROUP_NAME  Y2016  Y2017  Y2018
Balthazar Brauerei  Belgian     800    1000   1000
Balthazar Brauerei  Wheat       500    500    400
Brewing Barbarian   IPA         200    300    500
Brewing Barbarian   Stout       800    1000   1200
Happy Hoppy Hippo   IPA         1000   900    800
Happy Hoppy Hippo   Wheat       200    100    100

Notice that the yr and qty columns from the inline view are no longer in the output, but brewery_name and group_name are. What happens is that those columns I am referencing in the measures and dimensions in the pivot clause are used for the pivoting. The columns that are left over, they are used for an implicit group by.

Since in my inline view I have already grouped the data by brewery, product group, and year, this means that the sum(qty) in line 14 actually always will “aggregate” just a single row of data into each of the year columns, so that aggregation is not really necessary. But I cannot skip it – the pivot clause demands an aggregate function.

What I can do instead is to skip the group by within the inline view and instead let the implicit group by performed by pivot do the aggregation alone, thus avoiding an unnecessary grouping operation. Listing 8-4 simply is the same as Listing 8-3, just with the group by from Listing 8-3 lines 9–12 removed.
SQL> select *
  2  from (
  3     select
  4        brewery_name
  5      , group_name
  6      , extract(year from purchased) as yr
  7      , qty
  8     from purchases_with_dims pwd
  9  ) pivot (
 10     sum(qty)
 11     for yr
 12     in (
 13        2016 as y2016
 14      , 2017 as y2017
 15      , 2018 as y2018
 16     )
 17  )
 18  order by brewery_name, group_name;
Listing 8-4

Utilizing the implicit group by

Listing 8-4 gives exactly the same output as Listing 8-3; it is just a little bit more efficient from not doing a superfluous grouping operation.

You might think that I could then skip the inline view completely? Well, sometimes it is possible, but not in this case, first because I need to extract the year from the purchased date column and second because the pivot performs an implicit group by on the remaining columns after some of the columns have been used for measures and dimensions.

If I had the yr column in the view and could pivot directly on the purchases_with_dims view, the grouping would be performed on all the columns of the view except qty and yr – it would give me the wrong result. The inline view lets me keep only the columns I need – those to be used in the pivoting and those to be used for the implicit group by.

To make it a little more clear what’s happening behind the scenes with the pivot clause, let me show you pivoting performed manually without pivot.

Do-it-yourself manual pivoting

In really old database versions (before version 10), I would have had to do pivoting myself with no help from the pivot clause. Instead I would have had to write a query like Listing 8-5.
SQL> select
  2     brewery_name
  3   , group_name
  4   , sum(
  5        case extract(year from purchased)
  6           when 2016 then qty
  7        end
  8     ) as y2016
  9   , sum(
 10        case extract(year from purchased)
 11           when 2017 then qty
 12        end
 13     ) as y2017
 14   , sum(
 15        case extract(year from purchased)
 16           when 2018 then qty
 17        end
 18     ) as y2018
 19  from purchases_with_dims pwd
 20  group by
 21     brewery_name
 22   , group_name
 23  order by brewery_name, group_name;
Listing 8-5

Manual pivoting without using pivot clause

I do a group by brewery and product group in lines 20–22. And then I have three case structures for each of the three columns I want, so that all rows in the view from the year 2016 will have the qty value summed in column y2016, all rows from 2017 will be summed in y2017, and 2018 in y2018. The output is exactly the same as Listing 8-4 and Listing 8-3.

This structure is built for me automatically when I use the pivot clause. In Listing 8-4, I defined I wanted to use aggregate function sum on the value from column qty, but such that qty for rows in year 2016 goes to a column I want to be named y2016, and so on. I am not defining what to use for the implicit group by – this will be whatever columns are left over, so therefore I am using the inline view to limit the columns that go to the pivot clause rather than use all columns of the view.

Knowing this is the way pivot works will help, when I now show you pivoting with multiple measures by also using the column cost from the table purchases and the view purchases_with_dims, instead of just qty.

Multiple measures

I’m going to extend my query to not only pivot the aggregate quantity but also the aggregate cost. In Listing 8-6, you see I’ve simply added the cost column in line 8, so I also can add the aggregate measure sum(cost) in line 12.
SQL> select *
  2  from (
  3     select
  4        brewery_name
  5      , group_name
  6      , extract(year from purchased) as yr
  7      , qty
  8      , cost
  9     from purchases_with_dims pwd
 10  ) pivot (
 11     sum(qty)
 12   , sum(cost)
 13     for yr
 14     in (
 15        2016 as y2016
 16      , 2017 as y2017
 17      , 2018 as y2018
 18     )
 19  )
 20  order by brewery_name, group_name;
Error at Command Line : 1 Column : 8
Error report -
SQL Error: ORA-00918: column ambiguously defined
Listing 8-6

Getting an ORA-00918 error with multiple measures

Why do I get an error saying column ambiguously defined? I haven’t written the same column alias twice? Well, not directly, but indirectly I have.

What happens is that I have defined two measures with no column aliases. Then I have defined the three year values in the yr dimension and column aliases for them. There will be created a column for every combination, so 2 x 3 = 6 columns. Those six columns will be named <dimension alias>_<measure alias>, but if there are no measure aliases, then they will just be named <dimension alias>, as you saw in Listings 8-3 and 8-4. There it was okay, but here it means there will be two columns named y2016, two columns y2017, and two columns y2018. Thus the ORA-00918 error.

The solution is to also give the measures column aliases, so, for example, I can do as shown in Figure 8-3, where I alias the measures simply q and c, while the dimension values are aliased with two digits of the year (since those aliases do not start with a letter, they need to be quoted).

This generates therefore the six columns (2 x 3) that are named 16_Q, 16_C, and so on.
../images/475066_1_En_8_Chapter/475066_1_En_8_Fig3_HTML.jpg
Figure 8-3

Schematic flow when you have multiple measures

And to show you it is not just in a schematic diagram it works, I change Listing 8-6 by aliasing the measures and dimension values as shown in Figure 8-3:
...
 10  ) pivot (
 11     sum(qty)  as q
 12   , sum(cost) as c
 13     for yr
 14     in (
 15        2016 as "16"
 16      , 2017 as "17"
 17      , 2018 as "18"
 18     )
 19  )
 ...
And I get the output I want:
BREWERY_NAME        GROUP_NAME  16_Q  16_C  17_Q  17_C  18_Q  18_C
Balthazar Brauerei  Belgian     800   5840  1000  7360  1000  6960
Balthazar Brauerei  Wheat       500   3280  500   3600  400   2800
Brewing Barbarian   IPA         200   1440  300   1680  500   3920
Brewing Barbarian   Stout       800   5600  1000  6960  1200  8960
Happy Hoppy Hippo   IPA         1000  7360  900   6400  800   5680
Happy Hoppy Hippo   Wheat       200   960   100   800   100   720

(Normally I’d probably pick a little more descriptive column aliases, but using so short aliases makes the lines fit in a book.)

So I’ve now demonstrated getting pivoted columns as combinations of multiple measures and values of a single dimension. Next up is adding multiple dimensions too.

Multiple dimensions as well

So far I’ve pivoted only with the year as a dimension, leaving brewery and product group as the columns that are used for implicit group by. Now I’m going to also pivot the product group as a second dimension, leaving only the brewery to be grouped upon.

I have in my data 4 product groups and 3 years, which would mean 12 combinations of dimension values, each showing 2 measures (quantity and cost) for a total of 24 columns. That’s a bit large to demo here on a printed page, so in Listing 8-7, I’m reducing the data a bit by selecting only two product groups in line 10 and only two years (2017 and 2018) in lines 11–12.
SQL> select *
  2  from (
  3     select
  4        brewery_name
  5      , group_name
  6      , extract(year from purchased) as yr
  7      , qty
  8      , cost
  9     from purchases_with_dims pwd
 10     where group_name in ('IPA', 'Wheat')
 11     and   purchased >= date '2017-01-01'
 12     and   purchased <  date '2019-01-01'
 13  ) pivot (
 14     sum(qty)  as q
 15   , sum(cost) as c
 16     for (group_name, yr)
 17     in (
 18        ('IPA'  , 2017) as i17
 19      , ('IPA'  , 2018) as i18
 20      , ('Wheat', 2017) as w17
 21      , ('Wheat', 2018) as w18
 22     )
 23  )
 24  order by brewery_name;
Listing 8-7

Combining two dimensions and two measures

You’ll notice that the content of the inline view in lines 3–12 is in principle the same as before; I’ve simply added a where clause to reduce the dataset I’m pivoting.

The measures q and c in lines 14–15 are also unchanged, just as they were when I only used a single dimension.

Line 16 is different, since here I am no longer just specifying a single column to be my dimension. I am specifying an expression list of two columns instead – group_name and yr.

And since I use an expression list of two columns in my for clause, I also need to use corresponding expression lists of values in the in clause mappings in lines 18–21. Each value expression list (combinations of dimension values) I give a column alias – in this case a very short alias to keep my lines short enough for print; in real life more meaningful aliases should be used.

In total you can see in Figure 8-4 that the combining of the two dimensions I do manually with the expression list and then the combining of the dimension values and the measures automatically creates the columns named with the aliases joined by an underscore.
../images/475066_1_En_8_Chapter/475066_1_En_8_Fig4_HTML.jpg
Figure 8-4

Flows with multiple dimensions just have expression lists instead of single expressions

And those eight column names you see in the output of Listing 8-7:
BREWERY_NAME        I17_Q I17_C I18_Q I18_C W17_Q W17_C W18_Q W18_C
Balthazar Brauerei                          500   3600  400   2800
Brewing Barbarian   300   1680  500   3920
Happy Hoppy Hippo   900   6400  800   5680  100   800   100   720

The blanks are because the Good Beer Trading Co does not buy any IPA from Balthazar Brauerei nor any Wheat beers from Brewing Barbarian.

Knowing how the pivoting works as an implicit group by as I showed earlier about do-it-yourself manual pivoting, you can also see that in principle, I did not need to reduce the dataset with the where clause in lines 10–12. I could simply remove those three lines, and my output would be exactly the same. (Since I do have all three breweries in my output already, if I had had breweries with no purchases at all within the years and product groups I’m after, then there’d be output differences in the form of empty rows.)

However, it would not be a good idea to do so, since the data from the other years and other product groups still would be processed; the implicit case structures would just mean no data from those other years and product groups would be added to the aggregate sums. It would be a waste of CPU cycles and I/O.

Lessons learned

With the help of a mix of code examples and some diagrams showing how the bits and pieces of the pivot clause work together creating new columns, I’ve covered pivoting topics as
  • Pivoting with the three elements of the pivot clause, measures, dimensions, and mappings

  • Naming the pivoted columns with measure and dimension aliases, where combinations with multiple measures are automatically joined with underscores

  • Manual pivoting with group by and aggregation on case structures to aid understanding of how pivot works

  • Using expression lists for values from multiple dimensions when pivoting

Pivoting is a very useful tool in your toolbox for a variety of things, quite often simply because users get a much better overview of their data if they do not need to read a lot of rows like the output of Listing 8-2, but can have fewer rows with more columns like the various pivoted outputs in the chapter.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.137.192.3