Encryption is a process of obfuscating data with an algorithm that uses keys and certificates so that if security is bypassed and data is accessed or stolen by unauthorized users, then it will be useless, unless the keys that were used to encrypt it are also obtained. This adds an additional layer of security over and above access control, but it does not replace the need for an access control implementation. Encrypting data also has the potential to considerably degrade performance, so you should use it on the basis of need, as opposed to implementing it on all data as a matter of routine.
In this chapter, we discuss the SQL Server encryption hierarchy before demonstrating how to implement transparent data encryption (TDE) as well as cell-level encryption. We also discuss Always Encrypted, and secure enclaves.
Encryption Hierarchy
SQL Server offers the ability to encrypt data through a hierarchy of keys and certificates. Each layer within the hierarchy encrypts the layer below it.
Encryption Concepts
Before we discuss the hierarchy in detail, it is important to understand the concepts that relate to encryption. The following sections provide an overview of the main artifacts that are involved in encryption.
Symmetric Keys
A symmetric key is an algorithm that you can use to encrypt data. It is the weakest form of encryption because it uses the same algorithm for both encrypting and decrypting the data. It is also the encryption method that has the least performance overhead. You can encrypt a symmetric key with a password or with another key or certificate.
Asymmetric Keys
In contrast to a symmetric key, which uses the same algorithm to both encrypt and decrypt data, an asymmetric key uses a pair of keys or algorithms. You can use one for encryption only and the other for decryption only. The key that is used to encrypt the data is called the private key and the key that is used to decrypt the data is known as the public key.
Certificates
A certificate is issued by a trusted source, known as a certificate authority (CA). It uses an asymmetric key and provides a digitally signed statement, which binds the public key to a principal or device, which holds the corresponding private key.
Windows Data Protection API
The Windows Data Protection API (DPAPI) is a cryptographic application programming interface (API) that ships with the Windows operating system. It allows keys to be encrypted by using user or domain secret information. DPAPI is used to encrypt the Service Master Key, which is the top level of the SQL Server encryption hierarchy.
SQL Server Encryption Concepts
SQL Server’s cryptography functionality relies on a hierarchy of keys and certificates, with the root level being the Service Master Key. The following sections describe the use of master keys, as well as SQL Server’s encryption hierarchy.
Master Keys
The root level of the SQL Server encryption hierarchy is the Service Master Key. The Service Master Key is created automatically when the instance is built, and it is used to encrypt database master keys, credentials, and linked servers’ passwords using the DPAPI. The Service Master Key is stored in the Master database and there is always precisely one per instance. The Service Master Key is a symmetric key that is generated using the AES 256 algorithm. This is in contrast to older versions of SQL Server, which use the Triple DES algorithm.
If you ever need to regenerate the Service Master Key, it involves decrypting and then reencrypting every key and certificate that sits below it in the hierarchy. This is a very resource-intensive process and should only be attempted during a maintenance window.
Regenerating the Service Master Key
Backing Up and Restoring the Service Master Key
service_master_key is the name of the key file as opposed to a folder. By convention, it does not have an extension.
As when you are regenerating a Service Master Key, when you restore it, you can also use the FORCE keyword with the same consequences.
Creating a Database Master Key
Backing Up and Restoring a Database Master Key
As with the Service Master Key, if the restore is unable to decrypt and reencrypt any of the keys below it in the hierarchy, the restore fails. You are able to use the FORCE keyword to force the restore to succeed, but when you do so, you permanently lose access to the data encrypted using the key(s) that could not be decrypted and reencrypted.
SQL Server 2022 introduces support for backing up the Database Master key to Azure Blob storage. To use this option, you will need to create a SQL Server Credential that uses the URL of the Blob storage as its name, the shared access signature as the identity, and the shared access signature key as the secret. Credentials are discussed in Chapter 10 and creating credentials for Azure is demonstrated in Chapter 21.
Once you have a Credential in place, you can simply replace the TO FILE clause of the BACKUP MASTER KEY and RESTORE MASTER KEY statements with TO URL and FROM URL clauses, where the URL is the same as the URL used by the Credential.
Hierarchy
The diagram shows that the Service Master Key and a copy of the Database Master Key are stored at the instance level, with the Database Master Key also being stored within the database. The certificates, symmetric keys, and asymmetric keys that are encrypted using the Database Master Key are also stored within the database.
To the right of the diagram, you see a section called the EKM Module. An Extensible Key Management (EKM) module allows you to generate and manage keys and certificates used to secure SQL Server data in third-party hardware security modules, which interface with SQL Server using the Microsoft Cryptographic API (MSCAPI). This is more secure because the key is not being stored with the data, but it also means that you can benefit from advanced features that may be offered by the third-party vendor, such as key rotation and secure key disposal.
Enabling EKM and Importing the EKM Module
A full discussion of EKM is beyond the scope of this book, but you can obtain further information from your cryptographic provider.
Transparent Data Encryption
When implementing a security strategy for your sensitive data, one important aspect to consider is the risk of data being stolen. Imagine a situation in which a privileged user with malicious intent uses detach/attach to move a database to a new instance in order to gain access to data they are not authorized to view. Alternatively, if a malicious user gains access to the database backups, they can restore the backups to a new server in order to gain access to the data.
Transparent data encryption (TDE) protects against these scenarios by encrypting the data pages and log file of a database and by storing the key, known as a Database Encryption Key, in the boot record of the database. Once you enable TDE on a database, pages are encrypted before they are written to disk and they are decrypted when they are read into memory.
TDE also provides several advantages over cell-level encryption, which will be discussed later in this chapter. First, it does not cause bloat. A database encrypted with TDE is the same size as it was before it was encrypted. Also, although there is a performance overhead, this is significantly less than the performance overhead associated with cell-level encryption. Another significant advantage is that the encryption is transparent to applications, meaning that developers do not need to modify their code to access the data.
When planning the implementation of TDE, be mindful of how it interacts with other technologies. For example, you are able to encrypt a database that uses In-Memory OLTP, but the data within the In-Memory filegroup is not encrypted because the data resides in memory, and TDE only encrypts data at rest, meaning when it is on disk. Even though the memory-optimized data is not encrypted, log records associated with in-memory transactions are encrypted.
It is also possible to encrypt databases that use FILESTREAM, but again, data within a FILESTREAM filegroup is not encrypted. If you use full-text indexes (FTE), then new full-text indexes are encrypted. Existing full-text indexes are only encrypted after they are imported during an upgrade. It is regarded as bad practice to use full-text indexing with TDE, however. This is because data is written to disk in plain text during the full-text indexing scan operation, which leaves a window of opportunity for attackers to access sensitive data.
High availability and disaster recovery technologies such as database mirroring, AlwaysOn Availability Groups, and log shipping are supported with databases that have TDE enabled. Data on the replica database is also encrypted, and the data within the log is encrypted, meaning that it cannot be intercepted as it is being sent between the servers. Replication is also supported with TDE, but the data in the subscriber is not automatically encrypted. You must enable TDE manually on subscribers and the distributor.
Even if you enable TDE at the subscriber, data is still stored in plain text while it is in intermediate files. This, arguably, poses a greater risk than using FTE (full-text indexes), so you should closely consider the risk/benefit scenario.
It is also important to note that enabling TDE for any database within an instance causes TDE to be enabled on TempDB. The reason for this is that TempDB is used to store user data for intermediate results sets, during sort operations, spool operations, and so on. TempDB also stores user data when you are using Temp Tables, or row versioning operations occur. This can have the undesirable effect of decreasing the performance of other user databases that have not had TDE enabled.
It is also important to note, from the viewpoint of the performance of database maintenance, that TDE is incompatible with instant file initialization. Instant file initialization speeds up operations that create or expand files, as the files do not need to be zeroed out. If your instance is configured to use instant file initialization, then it no longer works for the files associated with any databases that you encrypt. It is a hard technical requirement that files are zeroed out when TDE is enabled.
Implementing TDE
To implement transparent data encryption, you must first create a Database Master Key. Once this key is in place, you can create a certificate. You must use the Database Master Key to encrypt the certificate. If you attempt to encrypt the certificate using a password only, then it will be rejected when you attempt to use it to encrypt the Database Encryption Key. The Database Encryption Key is the next object that you need to create, and as implied earlier, you encrypt this using the certificate. Finally, you can alter the database to turn encryption on.
It is possible to encrypt the Database Encryption Key using an asymmetric key as opposed to a server certificate, but only if the asymmetric key is protected using an EKM module.
Dropping a file
Dropping a filegroup
Dropping the database
Detaching the database
Taking the database offline
Setting the database as read_only
Luckily, a feature in recent versions of SQL Server gives DBAs more control over this process, with the ability to pause and restart the encryption scan, using an ALTER DATABASE statement, with either SET ENCRYPTION SUSPEND or SET ENCRYPTION RESTART options specified.
It is also important to note that the operation to enable TDE will fail if any of the filegroups within a database are marked as read_only. This is because all pages within all files need to be encrypted when TDE is enabled, and this involves changing the data within the pages to obfuscate them.
Creating the Chapter12Encrypted Database
You can see that we have selected our server certificate from the drop-down box and have chosen to enable database encryption. In the Encryption Algorithm drop-down box, we have selected AES 128, which is the default option.
Choosing an algorithm is essentially a trade-off between security and performance. Longer keys consume more CPU resources but are more difficult to crack.
Enabling Transparent Data Encryption
In SQL Server 2022, certificates can also be created and backed up as in PFX format. This allows the server certificate, intermediate certificates, and the private key to all be stored in the same file, using a binary format. This can be used to avoid using the weaker RC4 algorithm, to allow conversion to public key pairs.
Managing TDE
Backing Up the Certificate
Migrating an Encrypted Database
Restoring the Server Certificate
Make sure that the SQL Server service account has permissions to the certificate and key files in the operating system. Otherwise, you will receive an error stating that the certificate is not valid, does not exist, or that you do not have permissions to it. This means that you should check the restore immediately and periodically repeat the test.
Managing Cell-Level Encryption
Cell-level encryption allows you to encrypt a single column, or even specific cells from a column, using a symmetric key, an asymmetric key, a certificate, or a password. Although this can offer an extra layer of security for your data, it can also cause a significant performance impact and a large amount of bloat. Bloat means that the size of the data is much larger after the data has been encrypted than it was before. Additionally, implementing cell-level encryption is a manual process that requires you to make code changes to applications. Therefore, encrypting data should not be your default position, and you should only do it when you have a regulatory requirement or clear business justification.
Cryptographic Functions
Encryption Method | Encryption Function | Decryption Function |
---|---|---|
Asymmetric key | ENCRYPTBYASYMKEY() | DECRYPTBYASYMKEY() |
Certificate | ENCRYPTBYCERT() | DECRYPTBYCERT() |
Passphrase | ENCRYPTBYPASSPHRASE() | DECRYPTBYPASSPHRASE() |
When we created the SensitiveData table in our database, you may have noticed that we used the VARBINARY(8000) data type for the CreditCardNumber column when the obvious choice would have been a CHAR(19). This is because encrypted data must be stored as one of the binary data types. We have set the length to 8000 bytes, because this is the maximum length of the data that is returned from the function used to encrypt it.
Encrypting a Column of Data
You will notice that we back up the symmetric key before using it. This is a new feature of SQL Server 2022. In previous versions, only Service Master Keys, Database Master Keys, and Certificates could be backed up. The feature allows for the backup of symmetric keys to either file (as demonstrated) or to Azure Blob storage. To back up to Azure Blob storage, use a Credential to authenticate and replace the TO FILE clause with a TO URL clause, as discussed for Database Master Keys, earlier in this chapter. Keys can then be restored using a RESTORE SYMMETRIC KEY statement. Once again, this statement accepts FROM FILE or FROM URL clauses.
EncryptByKey() Parameters
Parameter | Description |
---|---|
Key_GUID | The GUID of the symmetric key that is used to encrypt the data |
ClearText | The binary representation of the data that you wish to encrypt |
Add_authenticator | A BIT parameter that indicates if an authenticator column should be added |
Authenticator | A parameter that specifies the column that should be used as an authenticator |
Also notice that before we use the key to encrypt the data, we issue a statement to open the key. The key must always be opened before it is used for either encrypting or decrypting data. To do this, the user must have permissions to open the key.
When you encrypt a column of data using the method shown in Listing 12-10, you still have a security risk caused by the deterministic nature of the algorithm used for encryption, which means when you encrypt the same value, you get the same hash. Imagine a scenario in which a user has access to the SensitiveData table but is not authorized to view the credit card numbers. If that user is also a customer with a record in that table, they could update their own credit card number with the same hashed value as that of another customer in the table. They have then successfully stolen another customer’s credit card number, without having to decrypt the data in the CreditCardNumber column. This is known as a whole-value substitution attack.
To protect against this scenario, you can add an authenticator column, which is also known as a salt value. This can be any column but is usually the primary key column of the table. When the data is encrypted, the authenticator column is encrypted along with the data. At the point of decryption, the authenticator value is then checked, and if it does not match, then the decryption fails.
It is very important that the values in the authenticator column are never updated. If they are, you may lose access to your sensitive data.
The script in Listing 12-11 shows how we can use an authenticator column to encrypt the CreditCardNumber column using the primary key of the table as an Authenticator column. Here, we use the HASHBYTES() function to create a hash value of the Authenticator column, and then we use the hash representation to encrypt the data. If you have already encrypted the column, the values are updated to include the salt.
This script is included as an example, but you should avoid running it at this point so you are able to follow later code examples.
Encrypting a Column Using an Authenticator
At the end of the script, we close the key. If we do not close it explicitly, then it remains open for the rest of the session. This can be useful if we are going to perform multiple activities using the same key, but it is good practice to explicitly close it immediately following its final usage within a session.
Even though it is possible to encrypt data using symmetric keys, asymmetric keys, or certificates for performance reasons, you will usually choose to use a symmetric key and then encrypt that key using either an asymmetric key or a certificate.
Accessing Encrypted Data
DecryptByKey Parameters
Parameter | Description |
---|---|
Cyphertext | The encrypted data that you want to decrypt |
AddAuthenticator | A BIT value specifying if an authenticator column is required |
Authenticator | The column to be used as an authenticator |
Reading an Encrypted Column
Always Encrypted
Always Encrypted is a technology which protects data against privileged users, such as members of the sysadmin role. Because DBAs cannot view the encrypted data, Always Encrypted provides true segregation of duties. This can help with compliance issues for sensitive data when your platform support is outsourced to a third-party vendor. This is especially true if you have a regulatory requirement not to make your data available outside of your country’s jurisdiction and the third-party vendor is using offshore teams.
Always Encrypted uses two separate types of key: a column encryption key and a column master key. The column encryption key is used to encrypt the data within a column and the column master key is used to encrypt the column encryption keys.
The column master key is a key or a certificate, located within an external store.
Having the second layer of key means that SQL Server needs only to store an encrypted value of the column encryption key, instead of storing it in plain text. The column master key is not stored in the database engine at all. Instead, it is stored in an external key store. The key store used could be an HSM (hardware security module), Windows Certificate Store, or an EKM provider, such as Azure Key Vault or Thales. SQL Server then stores the location of the column master key, within the database metadata.
Instead of SQL Server being responsible for the encryption and decryption of data, this responsibility is handled by the client driver. Of course, this means that the application must be using a supported driver, and the following link contains details of working with supported drivers: https://msdn.microsoft.com/en-gb/library/mt147923.aspx.
When an application issues a request, which will require data to either be encrypted or decrypted, the client driver liaises with the database engine to determine the location of the column master key. The database engine also provides the encrypted column encryption key and the algorithm used to encrypt it.
The client driver can now contact the external key store and retrieve the column master key, which it uses to decrypt the column encryption key. The plaintext version of the column encryption key can then be used to encrypt or decrypt the data, as required.
The entire process is transparent to the application, meaning that changes are not required to the application’s code in order to use Always Encrypted. The only change that may be required is to use a later supported driver.
The client driver will cache the plaintext version of column encryption keys as an optimization, which attempts to avoid repeated round trips to the external key store.
Always Encrypted has some significant limitations, including an inability to perform nonequality comparisons (even equality comparisons are only available with deterministic encryption). SQL Server 2019 introduced Secure Enclaves to address some of these issues. With secure enclaves, operators such as <, >, and even LIKE are supported, providing that randomized encryption is used. Secure Enclaves also support in-place encryption.
Secure Enclaves work, by using a protected area of memory, inside the SQL Server process, as a trusted execution environment. Within this memory region, data is decrypted and computations are performed. It is not possible for the rest of the SQL Server process, or any other process on the server, to access the secure memory, meaning that the decrypted data cannot be leaked, even when using debugging tools.
If SQL Server determines that a secure enclave is required to satisfy a query, then the client driver uses a secure channel, to send the encryption keys to the secure enclave. The client driver then submits the query, and encrypted query parameters, for execution. Because the data (even the encrypted parameters) are only ever decrypted inside the enclave, the data, parameters, and encryption keys are never exposed, in plain text.
Because the decrypted data and keys are available inside of the enclave, the client driver needs to verify that the enclave is genuine. To do this, it requires an external arbiter, known as an attestation service, such as the Windows Server Host Guardian Service. Before sending any data to the enclave, the client driver will contact the attestation service to determine the enclave’s validity.
When using Always Encrypted, there are many limitations. For example, advanced data types are not supported. A full list of limitations can be found at docs.microsoft.com/en-us/sql/relational-databases/security/encryption/always-encrypted-database-engine?view=sql-server-2022#feature-details.
Implementing Always Encrypted
In this section, we will encrypt data, within a database called Chapter12AlwaysEncrypted, using Always Encrypted with Secure Enclaves. We will use VBS (virtualization-based security) enclaves. In production environments, you should ensure that your enclaves use TPM (Trusted Platform Module) attestation, for enhanced security. TPM is hardware-based attestation and is beyond the scope of this chapter. Further details can be found at https://docs.microsoft.com/en-us/windows-server/identity/ad-ds/manage/component-updates/tpm-key-attestation.
In this section, we will configure a second server, as the attestation service, and register our SQL Server with it, so that it may arbitrate, when client wishes to use the enclave.
The first step is to use the PowerShell script in Listing 12-13 to configure a second server as the attestation service. This server must be running Windows Server 2019 or higher. The process requires the server to perform several reboots. Therefore, the script uses comments to mark the sections. After running each section of the script, the subsequent section should not be run, until the reboot is complete. Remember that you cannot use the same server, which hosts your SQL Server instance. It is also worthy of note that the server acting as the attestation service may not be domain joined, at the point it is configured. This is because a new domain is created, for the host guardian.
The PowerShell terminal should be run as Administrator, for the script to execute successfully.
Configure the Attestation Server
Prepare to Register the Host As a Guarded Host
At this point, you should manually copy the certificate file that is generated in the c:keys folder to the attestation server. Assuming that you copy the certificate to a folder called c:keys, the script in Listing 12-15 will import the key into the attestation service.
Be sure to change the server and key names to match your own.
Import Client Key into Attestation Service
Configure the Client
Create the Always Encrypted Cryptographic Objects
Key Store Values
Key Store Type | Description |
---|---|
Windows Certificate Store - Current User | The key or certificate is stored in the area of the Windows Certificate Store that is reserved for the profile of the user that created the certificate. This option may be appropriate if you use the database engine’s service account interactively, to create the certificate. |
Windows Certificate Store - Local Machine | The key or certificate is stored in the area of the Windows Certificate Store that is reserved for the local machine. |
Azure Key Vault | The key or certificate is stored in the Azure Key Vault EKM service. |
Key Storage Provider (CNG) | The key or certificate is stored in an EKM store that supports Cryptography API: Next Generation. |
Enable Secure Enclaves
We now want to encrypt the CreditCardNumber, ExpMonth, and ExpYear columns of the dbo.CreditCards table, which is loosely based on the Sales.CreditCard table of the AdventureWorks database.
When encrypting the data, we have a choice of two methods: deterministic or randomized. This is an important decision to understand, as it may have an impact on performance, security, and the features that are available with secure enclaves.
Deterministic encryption will always produce the same encrypted value, for the same plaintext value. This means that if deterministic encryption is used, operations including equality joins, grouping, and indexing are possible on an encrypted column, proving a BIN2 collation is used for the column. This leaves the possibility of attacks against the encryption, however.
If you use randomized encryption, then different encrypted values can be generated for the same plaintext values. This means that while encryption loopholes are plugged, for standard Always Encrypted implementations, equality joins, grouping, and indexing are not supported against the encrypted data.
Encryption Types and Feature Compatibility
Encryption Type | In-Place Encryption | Equality Comparisons | Rich Computations | LIKE |
---|---|---|---|---|
Deterministic Without Enclaves | No | Yes | No | No |
Deterministic With Enclaves | Yes | Yes | No | No |
Randomized Without Enclaves | No | No | No | No |
Randomized With Enclaves | Yes | Yes (Inside Enclave) | Yes | Yes |
Create the CreditCards Table, with Encrypted Columns
If encrypting existing data, only ever perform the operation during a maintenance window, as DML statements against the table, while encryption is in progress could potentially result in data loss.
Insert Data into Encrypted Columns
Administering Keys
sys.column_master_keys Columns
Column | Description |
---|---|
Name | The name of the column master key. |
Column_master_key_id | The internal identifier of the column master key. |
Create_date | The date and time that the key was created. |
Modify_date | The date and time that the key was last modified. |
Key_store_provider_name | The type of key store provider, where the key is stored. |
Key_path | The path to the key, within the key store. |
Allow_enclave_computations | Specifies if the key is enclave enabled. |
Signature | A digital signature, combining key_path and allow_enclave_computations. This stops malicious administrators from changing the key’s enclave enabled setting. |
Columns Returned by sys.column_encryption_keys
Name | Description |
---|---|
Name | The name of the column encryption key |
Column_encryption_key_id | The internal ID of the column encryption key |
Create_date | The date and time that the key was created |
Modify_date | The date and time that the key was last modified |
sys.column_encryption_key_values Columns
Name | Description |
---|---|
Column_encryption_key_id | The internal ID of the column encryption key |
Column_master_key_id | The internal ID of the column master key |
Encrypted_value | The encrypted value of the column encryption key |
Encrypted_algorithm_name | The algorithm used to encrypt the column encryption key |
Therefore, we could use the query in Listing 12-21 to find all columns in a database that have been encrypted with enclave enabled keys.
Remove the WHERE clause to return all columns that are secure with Always Encrypted, and determine which columns do and do not support secure enclaves.
Return Details of Columns That Use Secure Enclaves
It is not possible for an Administrator to toggle a key between enclave enabled and not enclave enabled. This is a conscious design decision by Microsoft to protect against malicious Administrators. It is possible to rotate keys, however, and when rotating keys, you can rotate out a key that is not enclave enabled, and replace it with one that is (or vice versa).
The following demonstration assumes that an additional column master key exists within the Chapter12AlwaysEncrypted database.
Summary
The SQL Server encryption hierarchy begins with the Service Master Key, which is encrypted using the Data Protection API (DPAPI) in the Windows operating system. You can then use this key to encrypt the Database Master Key. In turn, you can use this key to encrypt keys and certificates stored within the database. SQL Server also supports third-party Extensible Key Management (EKM) providers to allow for advanced key management of keys used to secure data.
Transparent data encryption (TDE) gives administrators the ability to encrypt an entire database with no bloat and an acceptable performance overhead. This offers protection against the theft of data by malicious users attaching a database to a new instance or stealing the backup media. TDE gives developers the advantage of not needing to modify their code in order to access the data.
Cell-level encryption is a technique used to encrypt data at the column level, or even the specific rows within a column, using a symmetric key, an asymmetric key, or a certificate. Although this functionality is very flexible, it is also very manual and causes a large amount of bloat and a large performance overhead. For this reason, I recommended that you only use cell-level encryption to secure the minimum amount of data you need in order to fulfill a regulatory requirement or that you have clear business justification for using it.
In order to mitigate the impact of bloat and performance degradation when using cell-level encryption, it is recommended that you encrypt data using a symmetric key. You can then encrypt the symmetric key using an asymmetric key or certificate.