Using STRING_SPLIT

Since SQL Server does not support arrays, when multiple values need to be sent to it, developers use a list of values (usually comma-separated ones).

SQL Server 2008 introduced an excellent feature called table-valued parameters (TVP), which allows you to pack values in a table and transfer them to SQL Server in table format. On the server, stored procedures or queries use this parameter as a table variable and can leverage set-based operations to improve performance, compared to separate executions per single parameter value. Thus, in all editions of SQL Server 2008 onwards it is strongly recommended to use TVP instead of a list of values in such cases.

However, lists of values as parameters for stored procedures are still widely used, mainly for the following two reasons:

  • Missing support for TVP in JDBC drivers: Java applications and services still have to use comma-separated lists or XML to transfer a list of values to SQL Server
  • Legacy code: Significant amounts of Transact-SQL code from the previous SQL Server versions, where TVP was not supported.

You might ask yourself why companies still have legacy code in their production systems and why they don't migrate the old code so that they can benefit from new features and enhancements. For instance, why are old implementations with comma-separated lists not replaced by the recommended TVPs? The migration steps are not complex and every developer can perform them. However, in a medium or large company, developers cannot decide what should be done. Their responsibility scope is related to how and not to what. Therefore, in such cases, developers can suggest the migration to project managers or product owners and the decision about the priority of the action is made on the business side. To migrate a comma-separated list to TVP, you need to change not only the body of stored procedures, but also their parameters and their interface. You also need to change the data access layer to touch the application code, to adjust unit tests, to compile the project, and to deploy it. Even if your tests are fully automated, this is not a trivial effort. On the other hand, the migration does not bring significant improvements for customers. Nowadays, development processes are mostly based on the agile methodology and features mostly wanted and appreciated by customers have the highest priority. Therefore, such migration actions usually remain at the bottom of the to-do list.

When a list of values is transferred to SQL Server as a stored procedure parameter, in the stored procedure body this list has to be converted to a table. Until SQL Server 2016, there was no built-in function that could perform this action. Developers had to write user-defined functions (UDF) or play with the FOR XML PATH extension for that purpose. An excellent overview and performance comparison of existing UDFs for converting a string to a table can be found in the article Split strings the right way – or the next best way, written by Aaron Bertrand. The article is available at the following address: http://sqlperformance.com/2012/07/t-sql-queries/split-strings.

Finally, the SQL Server development team added the STRING_SPLIT function into the latest release. This is a table-valued function and converts a delimited string into a single-column table. The function accepts two input arguments:

  • String: An expression of any non-deprecated string data type that needs to be split
  • Separator: Single character used as a separator in the input string

Since it is a table-valued function, it returns a table. The returned table contains only one column with the name value and with a data type and length that are the same as those of the input string.

Here is an example showing how this function can be used to produce a three-row table for a comma-separated list as input. Execute this code:

USE tempdb; 
SELECT value FROM STRING_SPLIT(N'Rapid Wien,Benfica Lisboa,Seattle Seahawks',','); 

The preceding query produces the following output:

value
------------------
Rapid Wien
Benfica Lisboa
Seattle Seahawks

The actual execution plan for the preceding query looks as follows:

Estimated Number of Rows for the STRING_SPLIT function

Notice that the Estimated Number of Rows is 50. This is always the case with this function: the estimated output is 50 rows and it does not depend on the number of string elements. Even when you specify the OPTION (RECOMPILE) query hint, the estimation remains the same. In the case of user-defined table-valued functions, the Estimated Number of Rows is 100.

As a table-valued function, STRING_SPLIT can be used not only in the SELECT clause, but also in FROM, WHERE, and wherever a table expression is supported. To demonstrate its usage, you will use the new SQL Server sample database: WideWorldImporters. The database is available for download at https://github.com/Microsoft/sql-server-samples/releases/tag/wide-world-importers-v1.0. The following query extracts stock items with the 16GB tag in the Tags attribute:

USE WideWorldImporters; 
SELECT StockItemID, StockItemName, Tags  
FROM Warehouse.StockItems  
WHERE '"16GB"' IN (SELECT value FROM STRING_SPLIT(REPLACE(REPLACE(Tags,'[',''), ']',''), ','));

This query produces the following result:

The following code example demonstrates how this function can be used to return details about orders for IDs provided in a comma-separated list:

USE WideWorldImporters; 
DECLARE @orderIds AS VARCHAR(100) = '1,3,7,8,9,11'; 
SELECT o.OrderID, o.CustomerID, o.OrderDate FROM Sales.Orders AS o 
INNER JOIN STRING_SPLIT(@orderIds,',') AS x ON x.value= o.OrderID; 

This produces the following output:

OrderID     CustomerID  OrderDate
----------- ----------- ----------
1           832         2013-01-01
3           105         2013-01-01
7           575         2013-01-01
8           964         2013-01-01
9           77          2013-01-01
11          586         2013-01-01

Note that, since the function returns a column of STRING data type, there is an implicit conversion between the columns involved in the JOIN clause:

DECLARE @input AS NVARCHAR(20) = NULL; 
SELECT * FROM STRING_SPLIT(@input,',');

This is the output produced by the preceding command:

value
--------

The STRING_SPLIT function requires that the database is at least in compatibility level 130. If this is not the case, you will get an error. The next code example demonstrates an attempt to use this function under compatibility level 120:

USE WideWorldImporters; 
ALTER DATABASE WideWorldImporters SET COMPATIBILITY_LEVEL = 120; 
GO 
SELECT value FROM STRING_SPLIT('1,2,3',','); 
/*Result: 
Msg 208, Level 16, State 1, Line 65 
Invalid object name 'STRING_SPLIT'. 
*/ 
--back to the original compatibility level 
ALTER DATABASE WideWorldImporters SET COMPATIBILITY_LEVEL = 130; 

This is a handy function and will definitely have its use cases. However, as you will have already noticed, there are some limitations:

  • Single character separator: The function accepts only a single character separator; if you had a separator with more characters, you would still need to write your own user-defined function.
  • Single output column: The output is a single column table, without the position of the string element within the delimited string. Thus, you can only sort the output by the element name.
  • String data type: When you use this function to delimit a string of numbers, although all the values in the output column are numbers, their data type is string, and when you join them to numeric columns in other tables, data type conversion is required.

If these limitations are acceptable to you, you should use the STRING_SPLIT function in future developments. I would always suggest a built-in function rather than a user-defined one if they are similar from a performance point of view. It is always pre-deployed and available in all databases. Once again, take note that the database must be at least in compatibility level 130.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.144.38.92