© Kim Berg Hansen 2020
K. Berg Hansen, Practical Oracle SQL, https://doi.org/10.1007/978-1-4842-5617-6_9

9. Splitting Delimited Text

Kim Berg Hansen, Middelfart, Denmark

Particularly if you get data from somewhere else, it is not uncommon to get it in the form of a string with a list of values separated by some delimiter, typically comma, semicolon, tab, or similar. As you most often don’t know the number of elements in the list, you can’t just use substr to split it into a fixed number of columns. Instead it is normally most useful to be able to turn the list into rows, so you can treat it as a table in your SQL.

Such splitting involves generating rows, which you can do in many ways. I’ll show some different methods, ranging from using PL/SQL to loop over the elements of the list and generate a row at a time, through generating all rows at once by selecting from dual and retrieving the elements for each row from the list, to pretending the list is JSON and parsing it with native JSON functionality.

Customer favorites and reviews

You would practically never model your tables with a column containing delimited strings (actually I can’t think of a use case for it, but it’s safer never to say never). You would get such strings from external data sources like files. For demonstration purposes here, the web site of Good Beer Trading Co lets customers pick their favorite beers as well as review beers; the favorites and reviews end up in the customer_favorites and customer_reviews tables shown in Figure 9-1.
Figure 9-1

Tables involved in these examples

Both tables have a proper foreign key to the customers table, but of course cannot have one to the products table, as the product ids are just part of the strings in columns favorite_list and review_list – I show sample data in the upcoming sections. The task at hand is basically to extract those product ids so they can be joined to the products table.
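For reference, the two tables could be created roughly like this (a sketch only; the exact datatypes and constraint names are in the companion scripts, so treat these as assumptions):

create table customer_favorites (
   customer_id    number not null
      constraint customer_favorites_cust_fk references customers (id)
 , favorite_list  varchar2(4000)  -- comma-separated product ids
);

create table customer_reviews (
   customer_id    number not null
      constraint customer_reviews_cust_fk references customers (id)
 , review_list    varchar2(4000)  -- comma-separated product:score pairs
);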

Delimited single values

In Listing 9-1, I examine the data of the customer_favorites table, where column favorite_list contains a comma-separated list of product ids. One customer has saved an empty favorite list.
SQL> select customer_id, favorite_list
  2  from customer_favorites
  3  order by customer_id;
CUSTOMER_ID  FAVORITE_LIST
50042        4040,5310
50741        5430,7790,7870
51007
51069        6520
Listing 9-1

Comma-delimited content of customer_favorites table

I now need to treat this list as if it was a child table with a row for each of the comma-separated entries. That will enable me to join to the products table (and any other table with a product id column, for that matter). In the rest of this section, I show four different ways to do this.

Pipelined table function

One way that works even in old database versions (all the way back to version 8i) is to extract values from the string in a PL/SQL table function. That requires a collection type (nested table type) and a function whose return value is of that type, such as what I create in Listing 9-2.
SQL> create type favorite_coll_type
  2     as table of integer;
  3  /
Type FAVORITE_COLL_TYPE compiled
SQL> create or replace function favorite_list_to_coll_type (
  2     p_favorite_list   in customer_favorites.favorite_list%type
  3  )
  4     return favorite_coll_type pipelined
  5  is
  6     v_from_pos  pls_integer;
  7     v_to_pos    pls_integer;
  8  begin
  9     if p_favorite_list is not null then
 10        v_from_pos := 1;
 11        loop
 12           v_to_pos := instr(p_favorite_list, ',', v_from_pos);
 13           pipe row (to_number(
 14              substr(
 15                 p_favorite_list
 16               , v_from_pos
 17               , case v_to_pos
 18                    when 0 then length(p_favorite_list) + 1
 19                           else v_to_pos
 20                 end - v_from_pos
 21              )
 22           ));
 23           exit when v_to_pos = 0;
 24           v_from_pos := v_to_pos + 1;
 25        end loop;
 26     end if;
 27  end favorite_list_to_coll_type;
 28  /
Function FAVORITE_LIST_TO_COLL_TYPE compiled
Listing 9-2

Collection type and pipelined table function

Collection types can be of object types or scalar types – in this case a scalar type: integer.
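Just for comparison, a collection of an object type might be declared roughly like this (a sketch; the names favorite_obj_type and favorite_obj_coll_type are hypothetical and not used in the rest of the chapter):

create type favorite_obj_type as object (
   product_id  integer
);
/

create type favorite_obj_coll_type
   as table of favorite_obj_type;
/

With such a collection, the table(...) call you’ll see shortly would expose a column named product_id instead of the generic column_value.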

I’ve chosen to make the table function pipelined by using the keyword pipelined in line 4.

Inside the function, I create a loop beginning in line 11, where I search for the position of the next comma (the first if it’s the first iteration of the loop). Lines 13–22 then pipe a row to the output containing the substr from the previous comma to the found comma (or the end of the string if no comma was found).

If I reach the end of the string (no comma was found), line 23 breaks out of the loop. If there’s still something left in the string, line 24 sets the next v_from_pos to be used in the next iteration of the loop.

The loop strategy works if there’s at least one element in the comma-separated list. If it’s a completely empty list, I make sure in line 9 that I don’t start the loop at all – in such a case, no rows will be piped to the output.

Tip

I could have used a regular table function instead of pipelined – then I would have had to build the entire output collection before returning it. But if a table function is meant to be used strictly from SQL and never from PL/SQL, it is almost always a good idea to make it pipelined. This has the advantage of less PGA memory usage as well as the ability to quit processing if the client SQL stops fetching rows from the function. The downside is that you cannot use it in PL/SQL.
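To illustrate the difference, a non-pipelined variant of Listing 9-2 might look roughly like this (a sketch only; the name favorite_list_to_coll is hypothetical, and the whole collection is built in memory before it is returned):

create or replace function favorite_list_to_coll (
   p_favorite_list   in customer_favorites.favorite_list%type
)
   return favorite_coll_type
is
   v_coll      favorite_coll_type := favorite_coll_type();
   v_from_pos  pls_integer := 1;
   v_to_pos    pls_integer;
begin
   if p_favorite_list is not null then
      loop
         v_to_pos := instr(p_favorite_list, ',', v_from_pos);
         -- append the element between the previous and the found comma
         v_coll.extend;
         v_coll(v_coll.last) := to_number(
            substr(
               p_favorite_list
             , v_from_pos
             , case v_to_pos
                  when 0 then length(p_favorite_list) + 1
                         else v_to_pos
               end - v_from_pos
            )
         );
         exit when v_to_pos = 0;
         v_from_pos := v_to_pos + 1;
      end loop;
   end if;
   -- the entire collection is returned in one go instead of piping row by row
   return v_coll;
end favorite_list_to_coll;
/

Such a function can be called both from SQL (via table(...)) and from PL/SQL, at the cost of materializing all elements in PGA memory first.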

Having created my table function, I can use it in Listing 9-3 to split my strings into collections and turn the collections into rows.
SQL> select
  2     cf.customer_id
  3   , fl.column_value as product_id
  4  from customer_favorites cf
  5     , table(
  6          favorite_list_to_coll_type(cf.favorite_list)
  7       ) fl
  8  order by cf.customer_id, fl.column_value;
Listing 9-3

Using pipelined table function to split string

The table keyword in line 5 takes a collection (nested table) and turns the elements of the collection into rows. If the collection had been of an object type, the columns of the result would have been named like the object attributes, but here the collection is of a scalar type (integer), and then the single column is always called column_value, which in line 3 I give a more meaningful column alias:
CUSTOMER_ID  PRODUCT_ID
50042        4040
50042        5310
50741        5430
50741        7790
50741        7870
51069        6520

But you’ll undoubtedly notice that the customer with a blank favorite_list is missing in the output. That’s how Listing 9-3 works; I’m joining the customer_favorites table to the row source that is pipelined from my function, and it outputs (correctly) no rows for a blank favorite_list. This is exactly as if I was inner joining to a child table where no rows existed for this customer.

If I want to show the customer with no favorites, I need the equivalent of a left outer join. But as there are no join predicates, I cannot use the (+) syntax on a predicate column. Instead Oracle supports putting the (+) syntax directly after the table(...) call, so I can change line 7 to this:
...
  7       )(+) fl
...
And that gives me an output that includes the customer with no favorites:
CUSTOMER_ID  PRODUCT_ID
50042        4040
50042        5310
50741        5430
50741        7790
50741        7870
51007
51069        6520
Of course, I can also use the row source resulting from the table function in joins, just as if it were a real child table. I demonstrate this in Listing 9-4, at the same time showing you how to do ANSI-style joins to the table function instead of the traditional comma used in Listing 9-3.
SQL> select
  2     cf.customer_id  as c_id
  3   , c.name          as cust_name
  4   , fl.column_value as p_id
  5   , p.name          as prod_name
  6  from customer_favorites cf
  7  cross apply table(
  8     favorite_list_to_coll_type(cf.favorite_list)
  9  ) fl
 10  join customers c
 11     on c.id = cf.customer_id
 12  join products p
 13     on p.id = fl.column_value
 14  order by cf.customer_id, fl.column_value;
Listing 9-4

Join the results of the splitting to products

The normal join syntax requires an on clause, which I do not have and do not need. In principle what I need is like a cross join lateral to an inline view, but in ANSI SQL, it has been decided instead to use a special syntax cross apply for this, which I put just before the table keyword in line 7.

The rest is normal SQL with normal joins using the column_value column in the on clause in line 13:
C_ID   CUST_NAME        P_ID  PROD_NAME
50042  The White Hart   4040  Coalminers Sweat
50042  The White Hart   5310  Monks and Nuns
50741  Hygge og Humle   5430  Hercule Trippel
50741  Hygge og Humle   7790  Summer in India
50741  Hygge og Humle   7870  Ghost of Hops
51069  Der Wichtelmann  6520  Der Helle Kumpel
If I again want to include the customer with no favorites, in ANSI style I do not use (+); instead I change the cross apply in line 7 to outer apply, which also necessitates changing join in line 12 to left outer join:
...
  7  outer apply table(
  8     favorite_list_to_coll_type(cf.favorite_list)
  9  ) fl
 10  join customers c
 11     on c.id = cf.customer_id
 12  left outer join products p
 13     on p.id = fl.column_value
...
Customer Boom Beer Bar, who has no favorites, is now included in the output:
C_ID   CUST_NAME        P_ID  PROD_NAME
50042  The White Hart   4040  Coalminers Sweat
50042  The White Hart   5310  Monks and Nuns
50741  Hygge og Humle   5430  Hercule Trippel
50741  Hygge og Humle   7790  Summer in India
50741  Hygge og Humle   7870  Ghost of Hops
51007  Boom Beer Bar
51069  Der Wichtelmann  6520  Der Helle Kumpel

This first method uses a custom-built table function made for this purpose only. You could also write a generic function, but in fact you don’t need to. The built-in APEX schema that you probably have in your database has already done this for you, as I’ll show next.

Built-in APEX table function

There is an APEX API function, apex_util.string_to_table(favorite_list, ','), but it returns a PL/SQL collection type defined in a package, not a nested table type defined in SQL. It is a deprecated function anyway, so I mention it only so you won’t use it, even if you happen to Google it.

Note

As of version 12.2, APEX is not installed in the database by default; rather it is just shipped with the software for easy installation. Even if your company does not use APEX applications as such, I think it is a good idea to install APEX in the database anyway to take advantage of the API packages when you code SQL and PL/SQL. If you wish, you can do it without configuring a web listener (ORDS, embedded PL/SQL gateway, or Oracle HTTP Server).

From APEX version 5.1, the supported function for this is apex_string.split, which returns a SQL nested table type and therefore is good to use in SQL as well. Listing 9-5 is like Listing 9-4, just using the APEX API function instead of the custom function I created before.
SQL> select
  2     cf.customer_id  as c_id
  3   , c.name          as cust_name
  4   , to_number(fl.column_value) as p_id
  5   , p.name          as prod_name
  6  from customer_favorites cf
  7  cross apply table(
  8     apex_string.split(cf.favorite_list, ',')
  9  ) fl
 10  join customers c
 11     on c.id = cf.customer_id
 12  join products p
 13     on p.id = to_number(fl.column_value)
 14  order by cf.customer_id, p_id;
Listing 9-5

Splitting with apex_string.split

The difference is just the function call in line 8 and then a small detail in line 14, where I utilize the fact that I can use column aliases in the order by clause to order by the more meaningful p_id instead of fl.column_value.

The output of Listing 9-5 is identical to that of Listing 9-4. Both methods call PL/SQL functions to do the actual splitting of the strings, which of course means context switching between the SQL and PL/SQL engines. Next up is a method in straight SQL without the context switching.

Straight SQL with row generators

No matter which method I use, I need to generate rows for each of the elements in the comma-delimited lists. The two previous methods used collections and the table function for this purpose. Another typical method of generating rows is to use a connect by query on dual, and this can be used here as well, as I show in Listing 9-6.
SQL> select
  2     favs.customer_id as c_id
  3   , c.name           as cust_name
  4   , favs.product_id  as p_id
  5   , p.name           as prod_name
  6  from (
  7     select
  8        cf.customer_id
  9      , to_number(
 10           regexp_substr(cf.favorite_list, '[^,]+', 1, sub#)
 11        ) as product_id
 12     from customer_favorites cf
 13     cross join lateral(
 14        select level sub#
 15        from dual
 16        connect by level <= regexp_count(cf.favorite_list, ',') + 1
 17     ) fl
 18  ) favs
 19  join customers c
 20     on c.id = favs.customer_id
 21  join products p
 22     on p.id = favs.product_id
 23  order by favs.customer_id, favs.product_id;
Listing 9-6

Generating as many rows as delimiter count

Using cross join lateral in line 13 causes the inline view fl in lines 14–16 to be executed for each row in customer_favorites, since I correlate the lateral inline view by using cf.favorite_list in line 16. By counting the number of commas and adding one, the inline view generates exactly as many rows as there are elements in the comma-separated list.

As I’ve numbered the fl rows consecutively 1, 2, 3... in column sub#, I can use sub# in regexp_substr in line 10 to extract the first, second, third... occurrence of a “list of at least one character not containing a comma.” This is then my product_id which I use to join the products table.

The output of Listing 9-6 is identical to both Listing 9-5 and Listing 9-4.

The preceding simple regular expression works if every element in the list has at least one character (hence the +). If I want it to work even when an element can be blank (meaning two commas in a row in the string), simply changing the + to a * is not enough; instead I need to switch to a slightly more complex regular expression like this:
...
 10           regexp_substr(
 11              cf.favorite_list
 12            , '(^|,)([^,]*)'
 13            , 1
 14            , sub#
 15            , null
 16            , 2
 17           )
...

The second group in the expression is like before, just with + changed to *, but I need to state that it must follow either the beginning of the string or a comma. As I don’t want that preceding comma to be part of the output, I ask regexp_substr to return just the second group (line 16).
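Put together, the blank-tolerant version of the inner query of Listing 9-6 might look like this (a sketch combining the pieces above; the outer joins to customers and products stay as in Listing 9-6):

select
   cf.customer_id
 , to_number(
      regexp_substr(
         cf.favorite_list
       , '(^|,)([^,]*)'   -- element may be empty, preceded by start of string or comma
       , 1
       , sub#
       , null
       , 2                -- return only the second group (the element itself)
      )
   ) as product_id
from customer_favorites cf
cross join lateral(
   select level sub#
   from dual
   connect by level <= regexp_count(cf.favorite_list, ',') + 1
) fl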

Treating the string as a JSON array

A simple comma-separated list of values can become a JSON array as shown in Listing 9-7.
SQL> select
  2     cf.customer_id  as c_id
  3   , c.name          as cust_name
  4   , fl.product_id   as p_id
  5   , p.name          as prod_name
  6  from customer_favorites cf
  7  outer apply json_table(
  8     '[' || cf.favorite_list || ']'
  9   , '$[*]'
 10     columns (
 11        product_id number path '$'
 12     )
 13  ) fl
 14  join customers c
 15     on c.id = cf.customer_id
 16  left outer join products p
 17     on p.id = fl.product_id
 18  order by cf.customer_id, fl.product_id;
Listing 9-7

Treating the string as a JSON array

Instead of a PL/SQL table function, I use the SQL function json_table in line 7.

The first parameter to json_table must be valid JSON, which in this case I can very simply accomplish by surrounding the comma-separated list with square brackets in line 8.

Note

I can keep line 8 very simple only because my values are all numeric. If there had been text values involved, I would have needed to surround the text values with double quotes by replacing commas with quote-comma-quotes and take into consideration escaping any existing quotes. Then I would do as Stew Ashton shows here: https://stewashton.wordpress.com/2018/06/05/splitting-strings-a-new-champion/.
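As a rough illustration of the quote-comma-quote idea, the expression could look like this (a deliberately simplified sketch that only escapes embedded double quotes and ignores backslashes and control characters; some_text_list is a hypothetical column):

'["'
|| replace(
      replace(some_text_list, '"', '\"')  -- escape any embedded double quotes
    , ','
    , '","'                               -- quote-comma-quote between elements
   )
|| '"]'

For anything beyond the simplest data, use a robust approach like the one in the linked post.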

In line 9, I state that there should be one row output from json_table for every element in the JSON array. As those elements are simple scalars, the path in line 11 becomes a simple $.

I’ve shown four methods to split simple delimited strings into rows of scalar values. In most cases, I’d choose between straight SQL, JSON arrays, and apex_string.split. If you have very long strings with many elements, the SQL method of asking for the 1st, 2nd, 3rd… occurrence in regexp_substr might become slow by the time it reaches the 50th occurrence – such a case might be better served by a function that pipes a row as it traverses the string. On the other hand, if you have many relatively short strings, each with few elements, the overhead of retrieving elements by occurrence might be smaller than the overhead of the comparatively larger amount of context switching to PL/SQL.

As always, test your own use case to see whether SQL or a pipelined function is best. If a pipelined function is the answer for you, the built-in apex_string.split is often a good choice – creating your own pipelined function is useful if your database does not have the APEX API packages installed or if you need some special datatype handling.

Now it’s time to increase the complexity and look at delimited strings with some more structure in them.

Delimited multiple values

From time to time, I see applications where a string contains data with two delimiters – a row delimiter and a column delimiter. These days that would typically be a JSON string instead, but as data lives on a long time, you might still have to deal with such strings.

As an example here, I’ve chosen that the customers on the Good Beer Trading Co web site not only can enter their favorite lists, but they can also enter a list of beers that they review, each beer with a score of A, B, or C. This information is stored in column review_list of table customer_reviews, the content of which I show in Listing 9-8.
SQL> select customer_id, review_list
  2  from customer_reviews
  3  order by customer_id;
Listing 9-8

Comma- and colon-delimited content of customer_reviews table

The row delimiter is a comma, the column delimiter is a colon, so the data is like product:score,product:score,…
CUSTOMER_ID  REVIEW_LIST
50042        4040:A,6600:C,7950:B
50741        4160:A
51007
51069        4280:B,7790:B

To split up those strings into rows and columns, I’ll show you four different methods.

Custom ODCI table function

The first method I’ll show involves a pipelined table function again, but not a straightforward one like Listing 9-2.

Instead I am implementing it with the Oracle Data Cartridge Interface (ODCI), which allows me to hook into specific points in the processing of a SQL statement. This means that when the SQL engine hard parses a statement using this function, it will call my code to find out what columns and datatypes will be returned – instead of getting this information from the data dictionary. When a statement is prepared, when a row is fetched, and when the cursor is closed – all of these will call my code instead of the standard handling.

Note

This is just one type of ODCI function implementing a custom pipelined table function. ODCI can also be used to implement a custom aggregate function, which I’ll show you in the next chapter.

Here I’ll focus on using this ODCI function – all of the details of the PL/SQL are outside the scope of this book. In Listing 9-9, I just show the skeleton of the object type used for implementation of the function.

For the curious reader, the complete code is available in the companion scripts. I describe the internals in detail on my blog: www.kibeha.dk/2015/06/supposing-youve-got-data-as-text-string.html.
SQL> create or replace type delimited_col_row as object (
...
 14   , static function parser(
 15        p_text      in    varchar2
 16      , p_cols      in    varchar2
 17      , p_col_delim in    varchar2 default '|'
 18      , p_row_delim in    varchar2 default ';'
 19     ) return anydataset pipelined
 20       using delimited_col_row
 21
 22   , static function odcitabledescribe(
...
 28     ) return number
 29
 30   , static function odcitableprepare(
...
 37     ) return number
 38
 39   , static function odcitablestart(
...
 45     ) return number
 46
 47   , member function odcitablefetch(
...
 51     ) return number
 52
 53   , member function odcitableclose(
...
 55     ) return number
 56  )
 57  /
Type DELIMITED_COL_ROW compiled
SQL> create or replace type body delimited_col_row as
...
260  end;
261  /
Type Body DELIMITED_COL_ROW compiled
Listing 9-9

The skeleton of the object type that implements the ODCI function

The object type must contain and implement the 5 odci* functions – they will be called by the SQL engine, not by anyone using the type.

The parser function is the one that should be called when you wish to use it. As it references the implementing object type using the syntax using delimited_col_row (line 20), it need not be inside the object type; if you prefer, it could be implemented as a stand-alone function or in a package.
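A stand-alone version of the specification might look roughly like this (a sketch only; the name standalone_parser is hypothetical, and the implementation still lives in the delimited_col_row type):

create or replace function standalone_parser (
   p_text      in varchar2
 , p_cols      in varchar2
 , p_col_delim in varchar2 default '|'
 , p_row_delim in varchar2 default ';'
) return anydataset pipelined
  using delimited_col_row;
/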

The object type can be used generically – in Listing 9-10, I use it for this specific case.
SQL> select cr.customer_id, rl.product_id, rl.score
  2  from customer_reviews cr
  3  outer apply table (
  4     delimited_col_row.parser(
  5        cr.review_list
  6      , 'PRODUCT_ID:NUMBER,SCORE:VARCHAR2(1)'
  7      , ':'
  8      , ','
  9     )
 10  ) rl
 11  order by cr.customer_id, rl.product_id;
Listing 9-10

Using the ODCI table function to parse the delimited data

Just like Listing 9-4, I do an apply on my table function – in this case I chose an outer apply instead of a cross apply. The table function delimited_col_row.parser then takes four parameters:
  • First, the string that contains my delimited data: cr.review_list

  • Then, the specification of the “columns” of each “row” of delimited data – their names and datatypes (this should be a literal, not a variable, as it is used at hard parse time, not at soft parse time)

  • Last, what is the column delimiter and the row delimiter in the data (these same delimiters I use in the column specification in line 6)

When I execute this statement the first time (hard parse), the SQL engine calls my odcitabledescribe function, which parses the second parameter and lets the SQL engine know the table function will return a row set with two columns, product_id and score, of the specified datatypes.

Then the SQL engine runs through odcitableprepare, odcitablestart, odcitablefetch, and odcitableclose. The actual splitting of the string data happens in odcitablefetch, where the next row delimiter is found and the data is split by the column delimiter, so a “row” is returned. At the end I see this output:
CUSTOMER_ID  PRODUCT_ID  SCORE
50042        4040        A
50042        6600        C
50042        7950        B
50741        4160        A
51007
51069        4280        B
51069        7790        B
Note that I didn’t have to do any column aliasing of a generic column_value – I can use rl.product_id and rl.score directly. I use this in Listing 9-11 for a meaningful join to the products table.
SQL> select
  2     cr.customer_id  as c_id
  3   , c.name          as cust_name
  4   , rl.product_id   as p_id
  5   , p.name          as prod_name
  6   , rl.score
  7  from customer_reviews cr
  8  cross apply table (
  9     delimited_col_row.parser(
 10        cr.review_list
 11      , 'PRODUCT_ID:NUMBER,SCORE:VARCHAR2(1)'
 12      , ':'
 13      , ','
 14     )
 15  ) rl
 16  join customers c
 17     on c.id = cr.customer_id
 18  join products p
 19     on p.id = rl.product_id
 20  order by cr.customer_id, rl.product_id;
Listing 9-11

Joining with real column names instead of generic column_value

In line 8, I used cross apply, so the output doesn’t have the customer with no reviews:
C_ID   CUST_NAME        P_ID  PROD_NAME         SCORE
50042  The White Hart   4040  Coalminers Sweat  A
50042  The White Hart   6600  Hazy Pink Cloud   C
50042  The White Hart   7950  Pale Rider Rides  B
50741  Hygge og Humle   4160  Reindeer Fuel     A
51069  Der Wichtelmann  4280  Hoppy Crude Oil   B
51069  Der Wichtelmann  7790  Summer in India   B

Using an ODCI implementation like this allows fine control of all the small details of the implementation. This is all well and good, but there are other solutions as well that don’t require installing a custom ODCI function.

Combining apex_string.split and substr

For the simple delimited list, I showed using apex_string.split as an alternative to building your own pipelined table function. There is no such standard alternative for the ODCI function delimited_col_row.parser that will handle both rows and columns.

But I can separate handling of columns from handling of rows, as shown in Listing 9-12.
SQL> select
  2     cr.customer_id  as c_id
  3   , c.name          as cust_name
  4   , p.id            as p_id
  5   , p.name          as prod_name
  6   , substr(
  7        rl.column_value
  8      , instr(rl.column_value, ':') + 1
  9     ) as score
 10  from customer_reviews cr
 11  cross apply table(
 12     apex_string.split(cr.review_list, ',')
 13  ) rl
 14  join customers c
 15     on c.id = cr.customer_id
 16  join products p
 17     on p.id = to_number(
 18                  substr(
 19                     rl.column_value
 20                   , 1
 21                   , instr(rl.column_value, ':') - 1
 22               ))
 23  order by cr.customer_id, p_id;
Listing 9-12

Getting rows with apex_string.split and columns with substr

I start by splitting the review list into rows in line 12 by using apex_string.split with the row delimiter comma. That means that rl will have rows with column_value, which will contain values with the two columns delimited by a colon – for example, 4040:A.

Then it is a simple matter of using substr to pick out the product id in lines 17–22 and pick out the score in lines 6–9. The output is identical to Listing 9-11.

I’ve eliminated the custom function, but I’m still incurring a lot of context switches to PL/SQL, so next I’ll try to use pure SQL again.

Row generators and regexp_substr

Similar to how I used apex_string.split to get the rows and then substr to get the columns, I am adapting Listing 9-6 to create Listing 9-13, where I generate rows with dual and use regexp_substr to get the columns.
SQL> select
  2     revs.customer_id as c_id
  3   , c.name           as cust_name
  4   , revs.product_id  as p_id
  5   , p.name           as prod_name
  6   , revs.score
  7  from (
  8     select
  9        cr.customer_id
 10      , to_number(
 11           regexp_substr(
 12              cr.review_list
 13            , '(^|,)([^:,]*)'
 14            , 1
 15            , sub#
 16            , null
 17            , 2
 18           )
 19        ) as product_id
 20      , regexp_substr(
 21           cr.review_list
 22         , '([^:,]*)(,|$)'
 23         , 1
 24         , sub#
 25         , null
 26         , 1
 27        ) as score
 28     from customer_reviews cr
 29     cross join lateral(
 30        select level sub#
 31        from dual
 32        connect by level <= regexp_count(cr.review_list, ',') + 1
 33     ) rl
 34  ) revs
 35  join customers c
 36     on c.id = revs.customer_id
 37  join products p
 38     on p.id = revs.product_id
 39  order by revs.customer_id, revs.product_id;
Listing 9-13

Generating as many rows as delimiter count

The lateral inline view in lines 29–33 is just as I did in Listing 9-6. The trick here is to specify suitable regular expressions in lines 13 and 22 to extract the two columns as what comes before and after the colon, respectively:
  • Line 13 looks for either the beginning of the string or a comma (group 1), followed by zero or more characters that are neither colon nor comma (group 2). Line 17 states the function should return the second group (this needs minimum version 11.2).

  • Line 22 looks for zero or more characters that are neither colon nor comma (group 1), followed by either a comma or the end of the string (group 2). Line 26 states the function should return the first group.

Listing 9-13 produces output identical to that of Listing 9-11 and Listing 9-12, but does it without any PL/SQL calls at all. The cost is heavier use of regular expression functions, which can be relatively CPU expensive – so to find out which performs best, you should test the approaches against your specific use case.

All three solutions so far handle the string as it is, but I also mentioned at the start of the chapter that in many modern applications, such data would be stored as JSON rather than delimited. The database is capable of efficiently handling JSON as well as XML, so here’s a fourth method that utilizes this.

Transformation to JSON

The first thing I want to do is to transform the delimited string into some valid JSON. This I do in Listing 9-14, where I transform the delimited pieces into a JSON array of JSON arrays, where each inner array has two elements, the first having the value of the product id and the second having the value of the review score.
SQL> select
  2     customer_id
  3   , '[["'
  4     || replace(
  5           replace(
  6              review_list
  7            , ','
  8            , '"],["'
  9           )
 10         , ':'
 11         , '","'
 12        )
 13     || '"]]'
 14     as json_list
 15  from customer_reviews
 16  order by customer_id;
Listing 9-14

Turning delimited text into JSON

Let me show you the output before I explain the code:
CUSTOMER_ID  JSON_LIST
50042        [["4040","A"],["6600","C"],["7950","B"]]
50741        [["4160","A"]]
51007        [[""]]
51069        [["4280","B"],["7790","B"]]

You can see in the output that the code in lines 3–13 transformed the text of review_list into nested JSON arrays: an outer array whose elements correspond to rows, where each row is itself an inner array whose elements correspond to columns.

To do this transformation, the innermost replace in lines 5–9 replaces each row delimiter (comma) with the five characters "],[", where each character is
  • End of inner element

  • End of inner array

  • Comma as delimiter between elements of the outer array

  • Start of new inner array

  • Start of new inner element

After that the replace in lines 4 and 10–12 replaces each column delimiter (colon) with the three characters ",", where each character is
  • End of inner element

  • Comma as delimiter between elements in the inner array

  • Start of new inner element

In line 3, the JSON begins with the three characters [[" for start of outer array, start of inner array, and start of inner element.

Finally in line 13, the JSON ends with the three characters "]] for end of inner element, end of inner array, and end of outer array.

Having created the string concatenation expression that transforms the delimited string to JSON, I can now use it in the json_table function in Listing 9-15.
SQL> select
  2     cr.customer_id  as c_id
  3   , c.name          as cust_name
  4   , rl.product_id   as p_id
  5   , p.name          as prod_name
  6   , rl.score
  7  from customer_reviews cr
  8  cross apply json_table (
  9     '[["'
 10     || replace(
 11           replace(
 12              cr.review_list
 13            , ','
 14            , '"],["'
 15           )
 16         , ':'
 17         , '","'
 18        )
 19     || '"]]'
 20   , '$[*]'
 21     columns (
 22        product_id  number      path '$[0]'
 23      , score       varchar2(1) path '$[1]'
 24     )
 25  ) rl
 26  join customers c
 27     on c.id = cr.customer_id
 28  join products p
 29     on p.id = rl.product_id
 30  order by cr.customer_id, rl.product_id;
Listing 9-15

Parsing JSON with json_table

The first parameter to the json_table function is the JSON itself, so lines 9–19 are the expression I developed in the previous listing.

The second parameter in line 20 specifies that json_table should take as rows all the inner arrays (*) in the outer JSON array that is in the root of the JSON string ($).

And last in the column specification lines 22–23, I state that the first element ($[0]) of the inner array is a number and should be a column called product_id, while the second element ($[1]) of the inner array is a varchar2 and should be a column called score.

As you see, this output is identical to the output of the three previous methods:
C_ID   CUST_NAME        P_ID  PROD_NAME         SCORE
50042  The White Hart   4040  Coalminers Sweat  A
50042  The White Hart   6600  Hazy Pink Cloud   C
50042  The White Hart   7950  Pale Rider Rides  B
50741  Hygge og Humle   4160  Reindeer Fuel     A
51069  Der Wichtelmann  4280  Hoppy Crude Oil   B
51069  Der Wichtelmann  7790  Summer in India   B

As shown before, if I wanted to show the customer with a blank review_list, I would change cross apply in line 8 to outer apply.

Tip

Listing 9-15 can be adapted to use linefeed for row delimiter and comma for column delimiter if you have plain CSV in a CLOB, for example. Alternatively you could look into the apex_data_parser package as shown here: https://blogs.oracle.com/apex/super-easy-csv-xlsx-json-or-xml-parsing-about-the-apex_data_parser-package.
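For example, assuming a hypothetical table csv_files (id, csv_content CLOB) holding simple two-column CSV with Unix-style linefeeds and no embedded quotes or commas, a sketch of such an adaptation could be:

select f.id, lines.col1, lines.col2
from csv_files f
cross apply json_table(
   '[["'
   || replace(
         replace(f.csv_content, ',', '","')  -- comma = column delimiter (replaced first,
                                              -- since the row replacement below inserts commas)
       , chr(10)
       , '"],["'                              -- linefeed = row delimiter
      )
   || '"]]'
 , '$[*]'
   columns (
      col1 varchar2(100) path '$[0]'
    , col2 varchar2(100) path '$[1]'
   )
) lines;

A trailing linefeed or Windows-style carriage returns would need trimming first; for real-world CSV, the apex_data_parser package in the linked post is the more robust choice.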

Using json_table requires version 12.1.0.2 or newer. If you need to support older versions, you’ll find in the companion script an example of doing the same thing by transforming to XML and using xmltable instead.
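Purely as an illustration of the idea (a sketch, not the book’s companion code), an xmltable-based equivalent could be built with the same kind of transformation:

select cr.customer_id, rl.product_id, rl.score
from customer_reviews cr
   , xmltable(
        '/list/row'
        passing xmltype(
           '<list><row><c>'
           || replace(
                 replace(cr.review_list, ',', '</c></row><row><c>')  -- row delimiter
               , ':'
               , '</c><c>'                                           -- column delimiter
              )
           || '</c></row></list>'
        )
        columns
           product_id number      path 'c[1]'
         , score      varchar2(1) path 'c[2]'
     ) rl
order by cr.customer_id, rl.product_id;

Note that for the customer with a blank review_list, this produces a single all-null row rather than no row at all.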

Lessons learned

Delimited text is most often a list of values separated by a single delimiter, but it can also be more structured with, for example, both a “row” delimiter and a “column” delimiter. I’ve shown both types of examples in this chapter along with multiple ways of splitting them, so you can
  • Split delimited text with SQL only or built-in PL/SQL functionality.

  • Create custom PL/SQL table functions – both regular and the ODCI variant – for special needs.

  • Transform the text to JSON and use native JSON parsing.

If you create your own data model, you should use child tables, collections, XML, or JSON rather than relying on storing data as delimited text. But it is common to receive delimited text from places outside your control, in which case any of the shown methods can be useful. Normally, using native and built-in functionality is the easiest and performs best, but for more special use cases, you can test whether the other methods are better suited for you.
