Using HASHBYTES

The HASHBYTES built-in function is used to hash the string of characters using one of the seven supported hashing algorithms. The function accepts the following two input arguments:

  • algorithm: This is a hashing algorithm for hashing the input. The possible values are MD2, MD4, MD5, SHA, SHA1, SHA2_256, and SHA2_512, but only the last two are recommended in SQL Server 2017.
  • input: This is an input variable, column, or expression that needs to be hashed. The data types that are allowed are varchar, nvarchar, and varbinary.

The return type of the function is varbinary (8000). This function has been available in SQL Server since 2005, but it is enhanced in SQL Server 2016. The most important enhancement is removing the limit for the input size. Prior to SQL Server 2016, the allowed input values were limited to 8,000 bytes; now no limit is defined. In addition to this significant enhancement, five old algorithms, MD2, MD4, MD5, SHA, and SHA1, are marked for deprecation. The SHA2_256 and SHA2_512 algorithms are stronger, require more storage space, and the hash calculation is slower, but the collision probability is very low.

To demonstrate the importance of the removed input limit, the former standard sample AdventureWorks database will be used. Execute the following code in a SQL Server 2014 instance with this sample database installed to calculate a hash value for the XML representation of the first six orders in the SalesOrderHeader table:

USE AdventureWorks2014; 
SELECT HASHBYTES('SHA2_256',(SELECT TOP (6) * FROM Sales.SalesOrderHeader FOR XML AUTO)) AS hashed_value; 

The following hashed value is produced by the previous command:

hashed_value
------------------------------------------------------------------
0x26C8A739DB7BE2B27BCE757105E159647F70E02F45E56C563BBC3669BEF49AAF

However, when you want to include the seventh row in the hash calculation, use the following code:

USE AdventureWorks2014; 
SELECT HASHBYTES('SHA2_256',(SELECT TOP (7) * FROM Sales.SalesOrderHeader FOR XML AUTO)) AS hashed_value; 

Instead of the hashed value, this query, executed in an SQL Server 2014 instance, generates a very well-known and very frustrating SQL Server error message:

Msg 8152, Level 16, State 10, Line 2
String or binary data would be truncated.

Clearly, the reason for this error is the size of the input string which exceeds the limit of 8,000 bytes. You can confirm this by executing the following query:

SELECT DATALENGTH(CAST((SELECT TOP (7) * FROM Sales.SalesOrderHeader FOR XML AUTO) AS NVARCHAR(MAX))) AS input_length; 

Indeed, the size of the input argument for the HASHBYTES function exceeds 8,000 bytes:

input_length
--------------------
8754

Since SQL Server 2017, this limitation has been removed:

USE AdventureWorks2017; 
SELECT HASHBYTES('SHA2_256',(SELECT TOP (7) * FROM Sales.SalesOrderHeader FOR XML AUTO)) AS hashed_value; 

The preceding hash query returns the following result:

hashed_value
------------------------------------------------------------------
0x864E9FE792E0E99165B46F43DB43E659CDAD56F80369FD6D2C58AD2E8386CBF3

Prior to SQL Server 2016, if you wanted to hash more than 8 KB of data, you had to split input data to 8 KB chunks and then combine them to get a final hash for the input entry. Since the limit does not exist anymore, you can use the entire table as an input parameter now. You can slightly modify the initial query to calculate the hash value for the entire order table:

USE AdventureWorks2017; 
SELECT HASHBYTES('SHA2_256',(SELECT * FROM Sales.SalesOrderHeader FOR XML AUTO)) AS hashed_value; 

This generates the following output:

hashed_value
------------------------------------------------------------------
0x2930C226E613EC838F88D821203221344BA93701D39A72813ABC7C936A8BEACA

I played around with it and could successfully generate a hash value, even for an expression with a size of 2 GB. It was slow, of course, but it did not break. I just want to check the limit; it does not make much sense to use HASHBYTES to detect changes in a large table.

AdventureWorks has long been one of the most used SQL Server sample databases. After introducing the WideWorldImporter database, it was not supported in the SQL Server 2016 RTM, but the AdventureWorks is back in SQL Server 2017. You can download it at https://github.com/Microsoft/sql-server-samples/releases/tag/adventureworks.

This function can be very useful to check whether relative static tables are changed or to compare the same database tables in two instances. With the following query, you can check the status of products in the AdventureWorks2017 database:

USE AdventureWorks2017; 
SELECT HASHBYTES('SHA2_256',(SELECT *  
 FROM  
  Production.Product p 
  INNER JOIN Production.ProductSubcategory sc ON p.ProductSubcategoryID = sc.ProductSubcategoryID 
  INNER JOIN Production.ProductCategory c ON sc.ProductCategoryID = c.ProductCategoryID 
  INNER JOIN Production.ProductListPriceHistory ph ON ph.ProductID = p.ProductID 
  FOR XML AUTO)) AS hashed_value; 

The following is the output generated by this query:

hashed_value
------------------------------------------------------------------
0xAFC05E912DC6742B085AFCC2619F158B823B4FE53ED1ABD500B017D7A899D99D

If you want to check whether any of the product attributes defined by the previous statement are different for two or more instances, you can execute the same query against the other instances and compare the values only, without loading and comparing the entire datasets.

With no limit for the input string, you do not need to implement workarounds for large inputs anymore, and the fact that you can easily generate a hash value for multiple joined tables can increase the number of use cases for the HASHBYTES function.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.16.79.33