CLR integration

You probably noticed that the two spatial data types are implemented as CLR data types. The spatial data types are shipped with SQL Server; therefore, Microsoft developers created them. However, you can also create your own CLR data types. SQL Server featured CLR inside the database engine for the first time in the 2005 version.

You can create the following CLR objects in a SQL Server database:

  • User-defined functions
  • Stored procedures
  • Triggers
  • User-defined aggregate functions
  • User-defined data types

You can use CLR objects to extend the functionality of the Transact-SQL language. You should use CLR for objects that you can't create in Transact-SQL, such as user-defined aggregates or user-defined data types. For objects that you can also create in Transact-SQL, such as functions, stored procedures, and triggers, you should use Transact-SQL for manipulating the data, and CLR only in the areas where CLR languages such as Visual C# are faster than Transact-SQL—such as complex calculations, and string manipulations.

For example, Transact-SQL language includes only a fistful of aggregate functions. To describe a distribution of a continuous variable, you use the first four population moments in the descriptive statistics, namely the following:

  • Mean, or average value
  • Standard deviation
  • Skewness
  • Kurtosis, or peakedness

Transact-SQL includes only aggregate functions for calculating the mean and the standard deviation. These two measures might be descriptors good enough to describe the regular normal, or Gaussian distribution, as the following figure shows:

Normal or Gaussian distribution

However, a distribution in the real world might not follow the normal curve exactly. Often it is skewed. A typical example is income, which is usually highly skewed to the right. The following figure shows a positively skewed distribution, where you have a long tail on the right side of the distribution:

Positively skewed distribution

Here is the formula for the skewness:

Where, , , , and

The formula for the skewness uses the mean value and the standard deviation. I don't want to calculate these values in advance. If I calculated these values in advance, I would need to scan through the data twice. I want to have a more efficient algorithm, an algorithm that would scan the data only once.

I use a bit of mathematics for this optimization. First, I expand the formula for the subtraction of the mean from the ith value on the third degree:

Then I use the fact that the sum is distributive over the product, as shown in the formula for two values only:

This formula can be generalized for all values:

Of course, I can do the same mathematics for the remaining elements of the expanded formula for the subtraction, and calculate all the aggregates I need with a single pass through the data, as shown in the following C# code for the user-defined aggregate function that calculates the skewness.

The first part of the code declares the namespaces used:

-- C# code for skewness 
using System; 
using System.Data; 
using System.Data.SqlClient; 
using System.Data.SqlTypes; 
using Microsoft.SqlServer.Server; 

You represent a user-defined aggregate (UDA) with a class or a structure in CLR. First, you decorate it with attributes that give some information about the UDA's behavior and information for potential optimization of the queries that use it:

[Serializable] 
[SqlUserDefinedAggregate( 
   Format.Native,                   
   IsInvariantToDuplicates = false,  
   IsInvariantToNulls = true,        
   IsInvariantToOrder = true,      
   IsNullIfEmpty = false)]        

The next part of the code for the UDA defines the structure and internal variables used to hold the intermediate results for the elements of the calculation, as I explained in the formula reorganization:

public struct Skew 
{ 
   private double rx;       
   private double rx2;      
   private double r2x;      
   private double rx3;      
   private double r3x2; 
   private double r3x;      
   private Int64 rn; 

Structures or classes that represent UDAs must implement four methods. The Init() method initializes the internal variables:

   public void Init() 
   { 
         rx = 0; 
         rx2 = 0; 
         r2x = 0; 
         rx3 = 0; 
         r3x2 = 0; 
         r3x = 0; 
         rn = 0; 
   } 

The Accumulate() method does the actual work of aggregating:

   public void Accumulate(SqlDouble inpVal) 
   { 
         if (inpVal.IsNull) 
         { 
               return; 
         } 
         rx = rx + inpVal.Value; 
         rx2 = rx2 + Math.Pow(inpVal.Value, 2); 
         r2x = r2x + 2 * inpVal.Value; 
         rx3 = rx3 + Math.Pow(inpVal.Value, 3); 
         r3x2 = r3x2 + 3 * Math.Pow(inpVal.Value, 2); 
         r3x = r3x + 3 * inpVal.Value; 
         rn = rn + 1; 
   } 

The Merge() method accepts another aggregate as the input. It merges two aggregates. Where do two or more aggregates come from? SQL Server might decide to execute the aggregating query in parallel, store the intermediate aggregate results internally, and then merge them by using the Merge() method:

   public void Merge(Skew Group) 
   { 
         this.rx = this.rx + Group.rx; 
         this.rx2 = this.rx2 + Group.rx2; 
         this.r2x = this.r2x + Group.r2x; 
         this.rx3 = this.rx3 + Group.rx3; 
         this.r3x2 = this.r3x2 + Group.r3x2; 
         this.r3x = this.r3x + Group.r3x; 
         this.rn = this.rn + Group.rn; 
   } 

The Terminate() method does the final calculations and returns the aggregated value to the calling query:

   public SqlDouble Terminate() 
   { 
         double myAvg = (rx / rn); 
         double myStDev = Math.Pow((rx2 - r2x * myAvg + rn * Math.Pow(myAvg, 2)) 
                          / (rn - 1), 1d / 2d); 
         double mySkew = (rx3 - r3x2 * myAvg + r3x * Math.Pow(myAvg, 2) 
                         - rn * Math.Pow(myAvg, 3)) / 
                        Math.Pow(myStDev,3) * rn / (rn - 1) / (rn - 2); 
         return (SqlDouble)mySkew; 
   } 
 
} 

You can use the C# compiler to compile the code for the UDA. However, in the associated code for the book, a compiled assembly, the .dll file, is provided for your convenience. The code also includes the function that calculates the kurtosis; for the sake of brevity, this code is not explained in detail here.

In order to use CLR objects, you need to enable CLR for your instance. The following code enables the CLR for your SQL Server instance, and then checks the CLR configuration options:

EXEC sp_configure 'clr enabled', 1; 
RECONFIGURE WITH OVERRIDE; 
-- Check the CLR options
SELECT name, value, minimum, maximum, value_in_use
FROM sys.configurations
WHERE name LIKE N'clr %';

The query returns two rows in SQL Server 2017:

name                 value  minimum  maximum  value_in_use
------------------- ----- ------- ------- ------------

clr enabled 1 0 1 1
clr strict security 1 0 1 1

Note the second option, the clr strict security option, which is new in SQL Server 2017, and also available with the latest updates for SQL Server 2016. So what is this option about?

In .NET (or CLR) code, you could traditionally use Code Access Security (CAS) as a security boundary to protect the system resources. With CAS, you could narrow down the permissions of the user executing the CLR code. For each assembly you imported to SQL Server, you could define a set of CAS permissions, named permission sets. There were three permission sets: SAFE,  EXTERNAL_ACCESS, and  UNSAFE. With the SAFE set, the CLR code could not perform any action that T-SQL code could not perform, and therefore it was safe to use the assembly. With EXTERNAL_ACCESS, you could access some external resources, like local disks and network shares. With the UNSAFE set, you could do nearly anything with the CLR code, including crashing SQL Server. There were some prerequisites for using unsafe CLR code in SQL Server.

In recent versions of .NET Framework, CAS is not a security boundary anymore. This is not a good piece of news for CLR code in SQL Server. Now SQL Server treats any CLR assembly as UNSAFE.

If the clr strict security option is not set to 1 on your SQL Server instance, please execute the following code:

EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE WITH OVERRIDE;
EXEC sys.sp_configure 'clr strict security', 1;
RECONFIGURE WITH OVERRIDE;

In order to use a CLR assembly, you need to catalog, or deploy the assembly in the database with the CREATE ASSEMBLY statement. So, let's try to import the DescriptiveStatistics CLR assembly with permission set to SAFE. The code assumes that the assembly is stored in the C:SQL207DevGuide folder:

CREATE ASSEMBLY DescriptiveStatistics  
FROM 'C:SQL2017DevGuideDescriptiveStatistics.dll' 
WITH PERMISSION_SET = SAFE; 

You should get error 10343, and import of the assembly should fail. You can get the assembly in the database in one of the following ways:

  • Alter the database to set the TRUSTWORTHY option on (which is not the recommended way)
  • Sign the assembly with a certificate or an asymmetric key that has a corresponding login with the UNSAFE ASSEMBLY permission
  • Add an assembly to the list of trusted assemblies for the server with the sys.sp_add_trusted_assembly system stored procedure

Although this is not a recommended option, I am showing you here how to set the database option TRUSTWORTHY on, because this is the simplest and the shortest way. However, in production, you should sign your assemblies with a certificate. The following code sets the TRUSTWORTHY option on for the WideWorldImportersDW database. In order to change this option, the owner of the database should be the sa  login:

ALTER AUTHORIZATION ON database::WideWorldImportersDW TO sa;
ALTER DATABASE WideWorldImportersDW SET TRUSTWORTHY ON;

Now you can import the assembly. Then you create the aggregate functions with the CREATE AGGREGATE statement. The following code enables CLR, deploys the assembly provided with the book, and then creates the two aggregate functions:

CREATE ASSEMBLY DescriptiveStatistics  
FROM 'C:SQL2017DevGuideDescriptiveStatistics.dll' 
WITH PERMISSION_SET = SAFE; 
 
CREATE AGGREGATE dbo.Skew(@s float) 
RETURNS float 
EXTERNAL NAME DescriptiveStatistics.Skew; 
 
CREATE AGGREGATE dbo.Kurt(@s float) 
RETURNS float 
EXTERNAL NAME DescriptiveStatistics.Kurt; 

Once the assembly is cataloged and the UDAs are created, you can use them just like the built-in aggregate functions. The following query calculates the four moments for the sum over customers of the amount ordered without tax. In a CTE, it calculates the sum of the amount per customer, and then in the outer query the average, the standard deviation, the skewness, and the kurtosis for this total:

WITH CustomerSalesCTE AS 
( 
SELECT c.Customer,  
  SUM(f.[Total Excluding Tax]) AS TotalAmount 
FROM Fact.Sale AS f 
  INNER JOIN Dimension.Customer AS c 
    ON f.[Customer Key] = c.[Customer Key] 
WHERE c.[Customer Key] <> 0  
GROUP BY c.Customer 
) 
SELECT ROUND(AVG(TotalAmount), 2) AS Average, 
  ROUND(STDEV(TotalAmount), 2) AS StandardDeviation,  
  ROUND(dbo.Skew(TotalAmount), 6) AS Skewness, 
  ROUND(dbo.Kurt(TotalAmount), 6) AS Kurtosis 
FROM CustomerSalesCTE; 

Here is the result:

   Average        StandardDeviation Skewness  Kurtosis 
   -------------  ----------------- --------  ---------
   270479.220000  38586.08          0.005943  -0.263897
  

After you have tested the UDAs, you can execute the following code to clean up your database, and potentially disable CLR. Do not forget to set the TRUSTWORTHY option for the database to off:

DROP AGGREGATE dbo.Skew;
DROP AGGREGATE dbo.Kurt;
DROP ASSEMBLY DescriptiveStatistics;
ALTER DATABASE WideWorldImportersDW SET TRUSTWORTHY OFF;
GO
/*
EXEC sp_configure 'clr enabled', 0;
RECONFIGURE WITH OVERRIDE;
EXEC sys.sp_configure 'show advanced options', 0;
RECONFIGURE WITH OVERRIDE;
GO
*/

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.144.252.204