Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

8. Pivoting Rows to Columns

Kim Berg Hansen¹

(1)

Middelfart, Denmark

The previous chapter was about unpivoting, which is the process of turning columns into rows. The opposite operation is called pivoting, which – surprise, surprise – is turning rows into columns.

The idea is that you have a resultset with some dimensional values in one or more columns and some facts/measure values in one or more other columns. You’d like the output grouped by some other columns, so you only have one aggregated row for those values, and then the values from your measures should be placed in a set of columns, one for each value of your dimension (or combination of values if you have multiple dimensions).

One thing to remember here is that in SQL, the engine needs at parse time to be able to determine names and datatypes of each column. That means that you have to hardcode the dimension values and what column names they should be turned into.

If you wish to have dynamic pivoting, where there automatically will be columns for every dimension value in the data, you need to build it with dynamic SQL similarly to what I showed at the end of the previous chapter. That way there will be a parsing every time you run it, and the column names can then be known at that time. Alternatively the pivot clause supports returning XML instead of columns, which allows you dynamic pivoting without dynamic SQL – which can be an option if an XML output is acceptable. Either way of dynamic pivoting will not be covered in this book.

Tip

In Oracle version 18c or newer, there is a third dynamic pivoting method using polymorphic table functions. I won’t be covering PTFs in this book, but Chris Saxon of the Oracle AskTom team has an example of a PTF for dynamic pivoting on Live SQL: https://livesql.oracle.com/apex/livesql/file/content_HPN95108FSSZD87PXX7MG3LW3.html.

Tables for pivoting

The Good Beer Trading Co purchases beer from some breweries, storing the information in the purchases table shown in Figure 8-1, along with dimension lookup tables breweries, products, and product_groups.

../images/475066_1_En_8_Chapter/475066_1_En_8_Fig1_HTML.jpg — Figure 8-1
Purchases table and associated dimension tables

I’ll be demonstrating pivoting data by brewery, product group, and year. To do that, I use the view purchases_with_dims in Listing 8-1, which simply joins the purchases table with the dimension tables.

SQL> create or replace view purchases_with_dims

2 as

3 select

4 pu.id

5 , pu.purchased

6 , pu.brewery_id

7 , b.name as brewery_name

8 , pu.product_id

9 , p.name as product_name

10 , p.group_id

11 , pg.name as group_name

12 , pu.qty

13 , pu.cost

14 from purchases pu

15 join breweries b

16 on b.id = pu.brewery_id

17 join products p

18 on p.id = pu.product_id

19 join product_groups pg

20 on pg.id = p.group_id;

View PURCHASES_WITH_DIMS created.

Listing 8-1

View joining purchases table with the dimensions

At first I’m going to aggregate the quantity grouped by brewery, product group, and year in Listing 8-2, which is a simple group by without any pivoting at all.

SQL> select

2 brewery_name

3 , group_name

4 , extract(year from purchased) as yr

5 , sum(qty) as qty

6 from purchases_with_dims pwd

7 group by

8 brewery_name

9 , group_name

10 , extract(year from purchased)

11 order by

12 brewery_name

13 , group_name

14 , yr;

Listing 8-2

Yearly purchased quantities by brewery and product group

The output shows me that the company bought from three breweries, two different product groups from each brewery, in three years from 2016 to 2018, resulting in 18 rows for those combinations:

BREWERY_NAME GROUP_NAME YR QTY

Balthazar Brauerei Belgian 2016 800

Balthazar Brauerei Belgian 2017 1000

Balthazar Brauerei Belgian 2018 1000

Balthazar Brauerei Wheat 2016 500

Balthazar Brauerei Wheat 2017 500

Balthazar Brauerei Wheat 2018 400

Brewing Barbarian IPA 2016 200

Brewing Barbarian IPA 2017 300

Brewing Barbarian IPA 2018 500

Brewing Barbarian Stout 2016 800

Brewing Barbarian Stout 2017 1000

Brewing Barbarian Stout 2018 1200

Happy Hoppy Hippo IPA 2016 1000

Happy Hoppy Hippo IPA 2017 900

Happy Hoppy Hippo IPA 2018 800

Happy Hoppy Hippo Wheat 2016 200

Happy Hoppy Hippo Wheat 2017 100

Happy Hoppy Hippo Wheat 2018 100

Now I’d like to have a column for quantity purchased each of the three years instead of a row for each year – this is what pivoting is all about.

Pivoting single measure and dimension

Listing 8-3 shows how I do the pivoting of the years using the pivot clause.

SQL> select *

2 from (

3 select

4 brewery_name

5 , group_name

6 , extract(year from purchased) as yr

7 , sum(qty) as qty

8 from purchases_with_dims pwd

9 group by

10 brewery_name

11 , group_name

12 , extract(year from purchased)

13 ) pivot (

14 sum(qty)

15 for yr

16 in (

17 2016 as y2016

18 , 2017 as y2017

19 , 2018 as y2018

20 )

21 )

22 order by brewery_name, group_name;

Listing 8-3

Pivoting the year rows into columns

I built the query of these elements:

Lines 3–12 simply are the select from Listing 8-2, wrapped in an inline view.
The pivot keyword in line 13 tells Oracle I want to pivot the data.
Then I define my measures – in this case only one, the quantity – in line 14. I must use an aggregate function here – it can be any aggregate, the one that makes sense in this case is sum.
After the keyword for in line 15, I define the dimensions I want – here only the year.
Last, the in clause in lines 16–19 maps in which columns the aggregated measure should be placed for which values of the dimension – columns that do not exist in the table, but will be created in the output.

Shown schematically, you can see in Figure 8-2 that the measure sum(qty) flows to the three column aliases, one for each of the values of the yr dimension.

../images/475066_1_En_8_Chapter/475066_1_En_8_Fig2_HTML.jpg — Figure 8-2
The flows of the pivot clause

And so I get the output that I desired with 18 aggregated quantities shown in six rows of three quantity columns (one per year) instead of 18 rows:

BREWERY_NAME GROUP_NAME Y2016 Y2017 Y2018

Balthazar Brauerei Belgian 800 1000 1000

Balthazar Brauerei Wheat 500 500 400

Brewing Barbarian IPA 200 300 500

Brewing Barbarian Stout 800 1000 1200

Happy Hoppy Hippo IPA 1000 900 800

Happy Hoppy Hippo Wheat 200 100 100

Notice that the yr and qty columns from the inline view are no longer in the output, but brewery_name and group_name are. What happens is that those columns I am referencing in the measures and dimensions in the pivot clause are used for the pivoting. The columns that are left over, they are used for an implicit group by.

Since in my inline view I have already grouped the data by brewery, product group, and year, this means that the sum(qty) in line 14 actually always will “aggregate” just a single row of data into each of the year columns, so that aggregation is not really necessary. But I cannot skip it – the pivot clause demands an aggregate function.

What I can do instead is to skip the group by within the inline view and instead let the implicit group by performed by pivot do the aggregation alone, thus avoiding an unnecessary grouping operation. Listing 8-4 simply is the same as Listing 8-3, just with the group by from Listing 8-3 lines 9–12 removed.

SQL> select *

2 from (

3 select

4 brewery_name

5 , group_name

6 , extract(year from purchased) as yr

7 , qty

8 from purchases_with_dims pwd

9 ) pivot (

10 sum(qty)

11 for yr

12 in (

13 2016 as y2016

14 , 2017 as y2017

15 , 2018 as y2018

16 )

17 )

18 order by brewery_name, group_name;

Listing 8-4

Utilizing the implicit group by

Listing 8-4 gives exactly the same output as Listing 8-3; it is just a little bit more efficient from not doing a superfluous grouping operation.

You might think that I could then skip the inline view completely? Well, sometimes it is possible, but not in this case, first because I need to extract the year from the purchased date column and second because the pivot performs an implicit group by on the remaining columns after some of the columns have been used for measures and dimensions.

If I had the yr column in the view and could pivot directly on the purchases_with_dims view, the grouping would be performed on all the columns of the view except qty and yr – it would give me the wrong result. The inline view lets me keep only the columns I need – those to be used in the pivoting and those to be used for the implicit group by.

To make it a little more clear what’s happening behind the scenes with the pivot clause, let me show you pivoting performed manually without pivot.

Do-it-yourself manual pivoting

In really old database versions (before version 10), I would have had to do pivoting myself with no help from the pivot clause. Instead I would have had to write a query like Listing 8-5.

SQL> select

2 brewery_name

3 , group_name

4 , sum(

5 case extract(year from purchased)

6 when 2016 then qty

7 end

8 ) as y2016

9 , sum(

10 case extract(year from purchased)

11 when 2017 then qty

12 end

13 ) as y2017

14 , sum(

15 case extract(year from purchased)

16 when 2018 then qty

17 end

18 ) as y2018

19 from purchases_with_dims pwd

20 group by

21 brewery_name

22 , group_name

23 order by brewery_name, group_name;

Listing 8-5

Manual pivoting without using pivot clause

I do a group by brewery and product group in lines 20–22. And then I have three case structures for each of the three columns I want, so that all rows in the view from the year 2016 will have the qty value summed in column y2016, all rows from 2017 will be summed in y2017, and 2018 in y2018. The output is exactly the same as Listing 8-4 and Listing 8-3.

This structure is built for me automatically when I use the pivot clause. In Listing 8-4, I defined I wanted to use aggregate function sum on the value from column qty, but such that qty for rows in year 2016 goes to a column I want to be named y2016, and so on. I am not defining what to use for the implicit group by – this will be whatever columns are left over, so therefore I am using the inline view to limit the columns that go to the pivot clause rather than use all columns of the view.

Knowing this is the way pivot works will help, when I now show you pivoting with multiple measures by also using the column cost from the table purchases and the view purchases_with_dims, instead of just qty.

Multiple measures

I’m going to extend my query to not only pivot the aggregate quantity but also the aggregate cost. In Listing 8-6, you see I’ve simply added the cost column in line 8, so I also can add the aggregate measure sum(cost) in line 12.

SQL> select *

2 from (

3 select

4 brewery_name

5 , group_name

6 , extract(year from purchased) as yr

7 , qty

8 , cost

9 from purchases_with_dims pwd

10 ) pivot (

11 sum(qty)

12 , sum(cost)

13 for yr

14 in (

15 2016 as y2016

16 , 2017 as y2017

17 , 2018 as y2018

18 )

19 )

20 order by brewery_name, group_name;

Error at Command Line : 1 Column : 8

Error report -

SQL Error: ORA-00918: column ambiguously defined

Listing 8-6

Getting an ORA-00918 error with multiple measures

Why do I get an error saying column ambiguously defined? I haven’t written the same column alias twice? Well, not directly, but indirectly I have.

What happens is that I have defined two measures with no column aliases. Then I have defined the three year values in the yr dimension and column aliases for them. There will be created a column for every combination, so 2 x 3 = 6 columns. Those six columns will be named <dimension alias>_<measure alias>, but if there are no measure aliases, then they will just be named <dimension alias>, as you saw in Listings 8-3 and 8-4. There it was okay, but here it means there will be two columns named y2016, two columns y2017, and two columns y2018. Thus the ORA-00918 error.

The solution is to also give the measures column aliases, so, for example, I can do as shown in Figure 8-3, where I alias the measures simply q and c, while the dimension values are aliased with two digits of the year (since those aliases do not start with a letter, they need to be quoted).

This generates therefore the six columns (2 x 3) that are named 16_Q, 16_C, and so on.

../images/475066_1_En_8_Chapter/475066_1_En_8_Fig3_HTML.jpg — Figure 8-3
Schematic flow when you have multiple measures

And to show you it is not just in a schematic diagram it works, I change Listing 8-6 by aliasing the measures and dimension values as shown in Figure 8-3:

...

10 ) pivot (

11 sum(qty) as q

12 , sum(cost) as c

13 for yr

14 in (

15 2016 as "16"

16 , 2017 as "17"

17 , 2018 as "18"

18 )

19 )

...

And I get the output I want:

BREWERY_NAME GROUP_NAME 16_Q 16_C 17_Q 17_C 18_Q 18_C

Balthazar Brauerei Belgian 800 5840 1000 7360 1000 6960

Balthazar Brauerei Wheat 500 3280 500 3600 400 2800

Brewing Barbarian IPA 200 1440 300 1680 500 3920

Brewing Barbarian Stout 800 5600 1000 6960 1200 8960

Happy Hoppy Hippo IPA 1000 7360 900 6400 800 5680

Happy Hoppy Hippo Wheat 200 960 100 800 100 720

(Normally I’d probably pick a little more descriptive column aliases, but using so short aliases makes the lines fit in a book.)

So I’ve now demonstrated getting pivoted columns as combinations of multiple measures and values of a single dimension. Next up is adding multiple dimensions too.

Multiple dimensions as well

So far I’ve pivoted only with the year as a dimension, leaving brewery and product group as the columns that are used for implicit group by. Now I’m going to also pivot the product group as a second dimension, leaving only the brewery to be grouped upon.

I have in my data 4 product groups and 3 years, which would mean 12 combinations of dimension values, each showing 2 measures (quantity and cost) for a total of 24 columns. That’s a bit large to demo here on a printed page, so in Listing 8-7, I’m reducing the data a bit by selecting only two product groups in line 10 and only two years (2017 and 2018) in lines 11–12.

SQL> select *

2 from (

3 select

4 brewery_name

5 , group_name

6 , extract(year from purchased) as yr

7 , qty

8 , cost

9 from purchases_with_dims pwd

10 where group_name in ('IPA', 'Wheat')

11 and purchased >= date '2017-01-01'

12 and purchased < date '2019-01-01'

13 ) pivot (

14 sum(qty) as q

15 , sum(cost) as c

16 for (group_name, yr)

17 in (

18 ('IPA' , 2017) as i17

19 , ('IPA' , 2018) as i18

20 , ('Wheat', 2017) as w17

21 , ('Wheat', 2018) as w18

22 )

23 )

24 order by brewery_name;

Listing 8-7

Combining two dimensions and two measures

You’ll notice that the content of the inline view in lines 3–12 is in principle the same as before; I’ve simply added a where clause to reduce the dataset I’m pivoting.

The measures q and c in lines 14–15 are also unchanged, just as they were when I only used a single dimension.

Line 16 is different, since here I am no longer just specifying a single column to be my dimension. I am specifying an expression list of two columns instead – group_name and yr.

And since I use an expression list of two columns in my for clause, I also need to use corresponding expression lists of values in the in clause mappings in lines 18–21. Each value expression list (combinations of dimension values) I give a column alias – in this case a very short alias to keep my lines short enough for print; in real life more meaningful aliases should be used.

In total you can see in Figure 8-4 that the combining of the two dimensions I do manually with the expression list and then the combining of the dimension values and the measures automatically creates the columns named with the aliases joined by an underscore.

../images/475066_1_En_8_Chapter/475066_1_En_8_Fig4_HTML.jpg — Figure 8-4
Flows with multiple dimensions just have expression lists instead of single expressions

And those eight column names you see in the output of Listing 8-7:

BREWERY_NAME I17_Q I17_C I18_Q I18_C W17_Q W17_C W18_Q W18_C

Balthazar Brauerei 500 3600 400 2800

Brewing Barbarian 300 1680 500 3920

Happy Hoppy Hippo 900 6400 800 5680 100 800 100 720

The blanks are because the Good Beer Trading Co does not buy any IPA from Balthazar Brauerei nor any Wheat beers from Brewing Barbarian.

Knowing how the pivoting works as an implicit group by as I showed earlier about do-it-yourself manual pivoting, you can also see that in principle, I did not need to reduce the dataset with the where clause in lines 10–12. I could simply remove those three lines, and my output would be exactly the same. (Since I do have all three breweries in my output already, if I had had breweries with no purchases at all within the years and product groups I’m after, then there’d be output differences in the form of empty rows.)

However, it would not be a good idea to do so, since the data from the other years and other product groups still would be processed; the implicit case structures would just mean no data from those other years and product groups would be added to the aggregate sums. It would be a waste of CPU cycles and I/O.

Lessons learned

With the help of a mix of code examples and some diagrams showing how the bits and pieces of the pivot clause work together creating new columns, I’ve covered pivoting topics as

Pivoting with the three elements of the pivot clause, measures, dimensions, and mappings
Naming the pivoted columns with measure and dimension aliases, where combinations with multiple measures are automatically joined with underscores
Manual pivoting with group by and aggregation on case structures to aid understanding of how pivot works
Using expression lists for values from multiple dimensions when pivoting

Pivoting is a very useful tool in your toolbox for a variety of things, quite often simply because users get a much better overview of their data if they do not need to read a lot of rows like the output of Listing 8-2, but can have fewer rows with more columns like the various pivoted outputs in the chapter.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for 8. Pivoting Rows to Columns

Create new playlist

Sign In

Sign Up

8. Pivoting Rows to Columns

Tables for pivoting

Pivoting single measure and dimension

Do-it-yourself manual pivoting

Multiple measures

Multiple dimensions as well

Lessons learned

Table of Contents for
8. Pivoting Rows to Columns