**After reading this appendix, you will be able to:**

Grasp core cryptographic constructs.

Better understand common cryptographic terms.

Originally, we didn’t think we needed this appendix. But as we wrote more text, we realized we were wrong: the book needed this material to be well-rounded.

This appendix is designed to give a layperson a brief introduction to the cornerstones of modern cryptographic solutions. It is not designed to replace extensive books on the topic of cryptography—and there are plenty of good ones. Here are just a few:

*Cryptography Engineering: Design Principles and Practical Applications* by Niels Ferguson, Bruce Schneier, and Tadayoshi Kohno (Wiley)

*Practical Cryptography* by Niels Ferguson and Bruce Schneier (Wiley)

*Serious Cryptography: A Practical Introduction to Modern Encryption* by Jean-Philippe Aumasson (No Starch Press)

*The Manga Guide to Cryptography* by Masaaki Mitani, Shinichi Sato, and Idero Hinoki (No Starch Press)

Note

You may think that the last book is a tongue-in-cheek suggestion, and perhaps it is. But there’s no denying it’s an excellent book if you’re new to cryptography and enjoy manga!

At a high level, much of cryptography can be carved up into two major areas:

Secrecy and integrity/authentication

Symmetric and asymmetric algorithms

These two groups lead to a combination of four cryptographic solutions and associated techniques (see Table A-1):

Symmetric secrecy

Asymmetric secrecy

Symmetric integrity/authentication

Asymmetric integrity/authentication

**TABLE A-1** Cryptographic solutions and associated techniques

|  | Symmetric | Asymmetric |
|---|---|---|
| Secrecy | Symmetric ciphers | Asymmetric ciphers |
| Integrity/authentication | Hashes, message authentication codes | Digital signatures |

The rest of this appendix covers these four areas and associated cryptographic techniques. After that, it covers other topics of importance, such as certificates and key derivation.

When most people think about cryptography, they’re thinking of symmetric ciphers. The most common of these today is the Advanced Encryption Standard (AES) algorithm. AES is a block cipher, which means it encrypts and decrypts one block at a time. The standard block size for AES is 128 bits. With AES, the same key is used to encrypt and decrypt data and must be known to all communicating parties.

Note

Today, most TLS sessions use AES for bulk encryption of data.

Symmetric ciphers operate in various cipher modes, such as the following:

**Electronic code book (ECB)** This is rarely used, because with the same key, two identical plaintext blocks yield the same ciphertext block. This helps attackers: if two ciphertext blocks are the same, then the two plaintext blocks are the same. So, don’t use it.

**Cipher block chaining (CBC)** This is the default for many libraries. When a block is encrypted, the resulting single block of ciphertext is XORed with the next block of plaintext before it is encrypted (hence the name). For many situations, this is a good default mode, as it does not suffer from the “same key, same plaintext, same ciphertext” weakness of ECB.

**Galois Counter Mode (GCM)** GCM is becoming an important mode. It is complex to explain and certainly beyond the scope of this appendix. GCM not only encrypts, but also produces a tag. This is like a message authentication code (MAC), which we cover later in this appendix.
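The ECB weakness can be demonstrated without a real cipher. The sketch below uses a keyed PRF (truncated HMAC-SHA-256) as a stand-in for the per-block function; it is not invertible and is not a real block cipher, but it mimics the one property that matters here: the same key and the same block always produce the same output. CBC-style chaining hides the repeated block; ECB does not.

```python
import hashlib
import hmac

BLOCK = 16  # 128-bit blocks, as in AES

def prp_stand_in(key: bytes, block: bytes) -> bytes:
    # Toy stand-in for a block cipher: a keyed PRF truncated to one
    # block. NOT invertible and NOT a real cipher; it only mimics
    # "same key + same block -> same output".
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]

def ecb_like(key: bytes, plaintext: bytes) -> bytes:
    # ECB: each block is processed independently.
    blocks = [plaintext[i:i + BLOCK] for i in range(0, len(plaintext), BLOCK)]
    return b"".join(prp_stand_in(key, b) for b in blocks)

def cbc_like(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    # CBC: each plaintext block is XORed with the previous ciphertext
    # block (or the IV, for the first block) before being processed.
    out, prev = [], iv
    for i in range(0, len(plaintext), BLOCK):
        mixed = bytes(a ^ b for a, b in zip(plaintext[i:i + BLOCK], prev))
        prev = prp_stand_in(key, mixed)
        out.append(prev)
    return b"".join(out)

key = b"0123456789abcdef"
iv = b"fedcba9876543210"
pt = b"SIXTEEN BYTE MSG" * 2  # two identical 16-byte blocks

ecb_ct = ecb_like(key, pt)
cbc_ct = cbc_like(key, iv, pt)
print(ecb_ct[:BLOCK] == ecb_ct[BLOCK:])  # True: ECB leaks the repeat
print(cbc_ct[:BLOCK] == cbc_ct[BLOCK:])  # False: chaining hides it
```

With a real cipher the effect is identical: repeated plaintext blocks under ECB produce visibly repeated ciphertext blocks.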

Block ciphers also require a padding mode. Because plaintext might not carve up nicely into 64-bit or 128-bit blocks, the last block is usually padded. Examples of padding modes include the following:

**ANSIX923** A sequence of bytes filled with zeros, followed by the length

**ISO10126** Random data, followed by the length

**None** No padding

**PKCS7** A sequence of bytes, each of which is equal to the total number of padding bytes added

**Zeros** A sequence of bytes set to zero

.NET defaults to using PKCS7 padding, as does the Always Encrypted cipher suite used by SQL Server and Cosmos DB and the Microsoft Data Encryption SDK.
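PKCS7 padding is simple enough to sketch in a few lines of Python. This is a minimal illustration of the rule "each padding byte equals the number of padding bytes added"; real libraries implement this for you and validate padding far more defensively.

```python
def pkcs7_pad(data: bytes, block_size: int = 16) -> bytes:
    # Pad with n bytes, each of value n, where n is the number of
    # padding bytes needed (1..block_size). A full extra block is
    # added when the data already ends on a block boundary.
    n = block_size - (len(data) % block_size)
    return data + bytes([n]) * n

def pkcs7_unpad(data: bytes) -> bytes:
    # The last byte says how many padding bytes to strip. Every
    # padding byte is checked; sloppy checks enable padding-oracle
    # attacks, discussed at the end of this appendix.
    n = data[-1]
    if n == 0 or n > len(data) or data[-n:] != bytes([n]) * n:
        raise ValueError("invalid PKCS7 padding")
    return data[:-n]

padded = pkcs7_pad(b"hello")               # 11 bytes of value 0x0b appended
print(len(padded))                         # 16
print(pkcs7_unpad(padded))                 # b'hello'
print(len(pkcs7_pad(b"0123456789abcdef")))  # 32: a full padding block
```

Note the last case: even input that already fills a block gets a whole extra block of padding, so the unpadder is never ambiguous.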

Some block modes, such as CBC and GCM, need an initialization vector (IV), which is fed into the first encryption block. An IV must be unique, and it is not encrypted. The IV usually travels with the ciphertext (often prepended to it) so the decryption process has access to it. The role of an IV is to produce different ciphertext even when an encryption key is reused, which is why it must be unique. For GCM, the IV need only be unique; for CBC, it should also be unpredictable, so in practice it is generated randomly.
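The "IV travels with the ciphertext" layout can be sketched as follows. The ciphertext here is a placeholder blob, not real cipher output; the point is only the packaging convention.

```python
import secrets

IV_LEN = 16  # AES block size

def package(iv: bytes, ciphertext: bytes) -> bytes:
    # Common layout: the unencrypted IV travels with the ciphertext
    # (prepended here) so the decryptor can recover it.
    return iv + ciphertext

def unpackage(blob: bytes):
    return blob[:IV_LEN], blob[IV_LEN:]

iv = secrets.token_bytes(IV_LEN)              # fresh and unique per message
ciphertext = b"...opaque cipher output..."    # placeholder, not real AES
blob = package(iv, ciphertext)
recovered_iv, recovered_ct = unpackage(blob)
print(recovered_iv == iv and recovered_ct == ciphertext)  # True
```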

There are many other well-known and no-longer-secure symmetric block ciphers, such as DES and TripleDES. There also exist stream ciphers that encrypt and decrypt one byte at a time. Currently, AES is the go-to symmetric cipher, however.

Please also note that AES is derived from the Rijndael algorithm, but Rijndael is not AES, so please do not use Rijndael.

The major advantage of symmetric ciphers is they are quick and well understood. The major disadvantage is all communicating parties must share the same key to encrypt and decrypt the data.

Unlike symmetric ciphers, asymmetric ciphers use key pairs: a private key and a public key. These keys are generated using math. You cannot deduce one key just by knowing the other. If software performs a cryptographic operation using one key, the inverse cryptographic operation must be performed using the other key.

The public portion is, as its name suggests, public, and can be shared with anyone. The private key is just that: private. It must remain protected and never be shared, but it should still be backed up.

The most well-known asymmetric algorithms in widespread use today are as follows:

Rivest-Shamir-Adleman (RSA)

Elliptic curve (EC)

Diffie-Hellman (DH)

The following are the core asymmetric cryptographic operations:

Encrypt and decrypt

Sign and verify (for digital signatures)

Wrap and unwrap (to protect and unprotect other keys, such as symmetric keys)

Each of these uses the public and private key differently. For example:

Encrypt with public, decrypt with private.

Sign with private, verify with public.

Wrap with public, unwrap with private.

This means if you know someone’s public key, you could send them an encrypted message, and only the valid recipient can decrypt it because only they have the private key associated with the public key. Or someone could digitally sign a document with their private key, and anyone could verify it came from them using their public key.

Not all asymmetric algorithms support all cryptographic operations, however. RSA can perform all six of the aforementioned operations, but EC can only sign and verify. You cannot encrypt using EC.
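The encrypt/decrypt and sign/verify pairings above can be illustrated with tiny "textbook" RSA. This toy example uses small primes and no padding, so it is hopelessly insecure and for illustration only; it exists purely to show which key is used for which operation.

```python
# Toy "textbook" RSA with tiny primes -- for illustration ONLY.
# No padding, trivially factorable modulus: never use in real code.
p, q = 61, 53
n = p * q                           # public modulus: 3233
e = 17                              # public exponent
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent: 2753

def encrypt(m: int) -> int:    # encrypt with the PUBLIC key
    return pow(m, e, n)

def decrypt(c: int) -> int:    # decrypt with the PRIVATE key
    return pow(c, d, n)

def sign(m: int) -> int:       # sign with the PRIVATE key
    return pow(m, d, n)

def verify(m: int, sig: int) -> bool:  # verify with the PUBLIC key
    return pow(sig, e, n) == m

c = encrypt(42)
print(decrypt(c))        # 42
sig = sign(99)
print(verify(99, sig))   # True
print(verify(100, sig))  # False
```

Notice the symmetry: encryption and verification use the public exponent, while decryption and signing use the private one, exactly matching the pairings listed above.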

The advantage of asymmetric ciphers is they don’t require key sharing among all parties the way symmetric cryptography does. The downside is performance. RSA and DH are both incredibly slow. EC, while slower than symmetric operations, is significantly faster than RSA and DH.

A hash is a data fingerprint. A person’s fingerprint identifies them, but it does not tell you anything else about them. A hash works the same way. The result of a hash function is often called a fingerprint, thumbprint, or digest.

The most common hash algorithms today are the SHA-2 suite of algorithms:

SHA-2 224

SHA-2 256

SHA-2 384

SHA-2 512

SHA-3 is newer and still not commonly used. Currently, TLS does not use SHA-3. In our opinion, you should continue to use SHA-2 until SHA-3 has had a little more time in the market; besides, SHA-2 is not broken. SHA-2 has been a standard since 2001 and SHA-3 since 2015. Also, NIST is not presently replacing SHA-2 with SHA-3.

Back to SHA-2. You’ll notice that each hash function in the preceding list has a number at the end. This is the bit size of the resulting digest. So, if you have a 2 MB Word document and you hash the contents using SHA-2 256, you get a 256-bit digest that identifies the document (that is, a fingerprint).
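You can see the fixed digest sizes with Python’s standard `hashlib` module; the digest length depends only on the algorithm, never on the input size.

```python
import hashlib

data = b"Imagine this is the content of a 2 MB Word document"

# Each SHA-2 variant produces a fixed-size digest, regardless of input.
for name in ("sha224", "sha256", "sha384", "sha512"):
    digest = hashlib.new(name, data).digest()
    print(name, len(digest) * 8)  # prints the digest size in bits

# The fingerprint is the same size for a one-byte input, too.
print(len(hashlib.sha256(b"x").digest()) * 8)  # 256
```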

A property of a good hash function is that it resists collisions, so you should not be able to find two documents that have the same digest. Some older hash functions—such as MD4, MD5, and SHA-1—have been demonstrated to have collisions and as such should no longer be used.

Fun fact #1: SHA-2 224 is computed like SHA-2 256 (with different initial values) and then truncated, and SHA-2 384 is computed like SHA-2 512 in the same manner.

Fun fact #2: There is a well-known but esoteric attack against hash functions called a length extension attack. SHA-2 256 and SHA-2 512 are potentially subject to these attacks, but the truncated versions, SHA-2 224 and SHA-2 384, are not.

Hash functions are often called an integrity check, and it’s not uncommon to see, for example, a list of files with their SHA-2 hash. You can see an example at https://azsec.tech/gp7 when you click the Updated Packages tab. When you download the file updates, you can verify they have not been tampered with by recalculating the hash yourself.

Hash functions are lightning fast. For most solutions, you will want to use SHA-2 256; it’s a good middle ground of speed and security. Always Encrypted uses SHA-2 256. The problem with hashes is that a document and its hash cannot safely travel together: an attacker could change the document and recalculate the hash at any stage, and no one would know. There is nothing secret, like a key, to protect the hash from alteration. Hashes work in the sample link in the preceding paragraph because the hash values are kept on the website, where they cannot be tampered with.

We need to look at another option: message authentication code (MAC). A MAC is used to provide tamper detection and authentication. It does this by using a shared key called a MAC key.

The most prevalent MAC today is the hashed MAC (HMAC), which uses a hash function as its base. An easy way to think about an HMAC is it’s a hash function, but instead of hashing only a message, it stirs in a secret key, also.

If you prefer a more precise definition, if a hash function can be represented as:

`h = H(m)`

where `h` is the resulting digest, `H` is the hash algorithm, and `m` is the message, then an (incredibly simplified) HMAC could be represented as:

`hmac = H( K || H( K || m ) )`

where `hmac` is the resulting HMAC, `H` is the hash algorithm, `K` is a MAC key, and `m` is the message. (The `||` symbol means concatenation.)

This construct yields an interesting property: if a user creates a document and derives its HMAC using an HMAC key, an attacker would need to modify the document and then re-create the HMAC. But doing so would mean having access to the key (`K`), which is a secret known only to the communicating parties.
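Python’s standard `hmac` module implements the real RFC 2104 construction. The sketch below computes both the real tag and the simplified nested form from the formula above; the two differ because real HMAC also XORs the key with fixed "ipad"/"opad" constants before each hash, but the nested keyed-hash shape is the same.

```python
import hashlib
import hmac

key = b"shared MAC key"
msg = b"the message"

# The real thing: RFC 2104 HMAC via the standard library.
tag = hmac.new(key, msg, hashlib.sha256).hexdigest()

# The simplified construct from the text: H( K || H( K || m ) ).
# It will NOT match the real tag (real HMAC mixes the key with
# ipad/opad constants), but it shows the nested keyed-hash shape.
inner = hashlib.sha256(key + msg).digest()
simplified = hashlib.sha256(key + inner).hexdigest()

print(len(tag))           # 64 hex characters = 256 bits
print(tag == simplified)  # False: the pad constants matter

# Verification should always use a constant-time comparison.
expected = hmac.new(key, msg, hashlib.sha256).hexdigest()
print(hmac.compare_digest(tag, expected))  # True
```

Using `hmac.compare_digest` rather than `==` avoids leaking timing information during verification.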

Note

In the previous section, we said Always Encrypted uses SHA-2 256. This is technically not correct, however. In fact, Always Encrypted uses HMAC SHA-2 256.

HMACs are fast, but they have one major problem: verifying an HMAC means you must know the MAC key (`K`), and that key is the same for all parties. This means there is no way to know who actually created the document and its HMAC, because if 10 entities have access to the same MAC key, any of those 10 could have performed the work.

There is a way to solve the problem associated with HMACs, gain the benefits of asymmetric keys, and use them rather than shared, symmetric MAC keys: digital signatures, covered in the next section.

From a practical perspective, the big difference between an HMAC and a digital signature is the key. Sure, there are other differences, like the algorithms used and such, but let’s ignore those details in this 100-level introduction.

A common digital signature algorithm used today is Elliptic Curve Digital Signature Algorithm (ECDSA). Ethereum uses ECDSA to sign blockchain transactions.

When your code digitally signs something, it uses your private key on a document hash. This offers an advantage over an HMAC: if you sign something, anyone with your public key can verify it came from you. There is no need to disseminate sensitive HMAC keys; all anyone needs is your public key. And because the key is public, anyone can gain access to it to verify your digitally signed documents. Also, because only you have the private key used to sign the hash, the signed document must have come from you.

But there’s a nagging problem. How do you know to whom the public key belongs? That’s where certificates come in. They’re discussed next.

Certificates are designed to do one thing: to cryptographically bind a name, such as an email address or a DNS name, to a public key. To understand how certificates work, you must understand their lifecycle. To explain this, we will use a real example: the certificate used for the Azure Security Podcast website at https://azsecuritypodcast.net. This website contains the show notes for all episodes and is hosted on Azure as an App Service. Here’s how it works:

1. We need a certificate so we can use TLS on the site. Some software on Azure App Service creates an RSA key pair. (It could have been EC, but we chose RSA.)

2. The private key is stored securely in Azure.

3. The public key is wrapped into a certificate signing request (CSR) along with the site’s name and contact information, and the entire request is signed with the private key. Note that the private key does not leave its source. It’s used to sign the request, but nothing more.

4. The CSR is sent to a certificate authority (CA)—in our case, GoDaddy.

5. GoDaddy verifies that the CSR came from someone with administrative access to the azsecuritypodcast.net site. (There are multiple ways to do this; we won’t cover all the possibilities here.)

6. Assuming the verification is OK, GoDaddy creates the certificate. In it they include a serial number, valid from and valid to fields, the name of the website, the public key, and a few other critical items.

7. The entire blob is digitally signed by the private key associated with a GoDaddy certificate. This certificate is usually an intermediate certificate, which is perhaps signed by another intermediate certificate, which in turn is signed by a root certificate.

8. The result is the App Service has a digitally signed certificate we can use for TLS using the private key created in step 1.

Note

The current certificate standard is X.509 version 3.

We want to cover one final topic that’s not represented in Table A-1: key derivation. Humans generally use passwords and passphrases as credentials, but human language makes for lousy cryptographic keys because human language is low entropy. A better way to use passphrases as a starting point for keys is to use key derivation functions. By the way, the best solution is to never use anything but cryptographically random keys!

The most well-known derivation function is defined in RFC 2898 and is referred to as PBKDF2, a *password-based key derivation function*. This function takes a few arguments:

A hash algorithm

A salt (a unique value)

An iteration count

The starting passphrase

On the first iteration, the function applies the hash (keyed with the passphrase, as an HMAC) to the salt plus a block counter. On each subsequent iteration, it applies the same keyed hash to the output of the previous iteration, and the outputs of all iterations are combined. This continues until the iteration count is reached and is often called *salting and stretching*. The final output could be used as an AES key or an HMAC key. Microsoft internally uses iteration counts of 100,000.

PBKDFs are also used in environments that need to prove a user possesses a password. In this scenario, you don’t need the password; you just need to store something that can be used to prove they know the password. So, when a user presents their password, you pass it through the same PBKDF function and compare it with the stored PBKDF result; if they are the same, then the user knows the password. This is better than storing the password or even an encrypted password, because if an attacker gets the list of PBKDF-derived passwords, they can do nothing with the data other than attempt a brute-force attack. In this scenario, you also need to store the salt and the iteration count with the resulting PBKDF data.
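The derive-store-verify flow just described maps directly onto Python’s standard `hashlib.pbkdf2_hmac`. The helper names here are our own, but the storage rule is straight from the text: keep the salt and the iteration count alongside the derived result.

```python
import hashlib
import secrets

def store_password(password: str):
    # Derive and return everything that must be stored to verify
    # later: the salt, the iteration count, and the derived key.
    salt = secrets.token_bytes(16)     # unique per user
    iterations = 100_000
    dk = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, iterations, dk

def verify_password(password: str, salt: bytes, iterations: int, dk: bytes) -> bool:
    # Re-run the same PBKDF2 with the stored salt and count, then
    # compare in constant time.
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return secrets.compare_digest(candidate, dk)

salt, iterations, dk = store_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, iterations, dk))  # True
print(verify_password("wrong guess", salt, iterations, dk))                   # False
```

The password itself is never stored; an attacker who steals the derived values can only mount a brute-force attack, slowed down by the iteration count.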

The salt and the iteration count protect against online and offline attacks. This includes resistance to rainbow tables, which are precomputed tables of password hashes. So, rather than calculate hashes from passwords, an attacker takes the hash, looks it up in a rainbow table, and gets the password directly! You can read more about rainbow tables here: https://azsec.tech/x8c/.

You can also think of the iteration count as a “Moore’s law compensator.” Like an IV, the salt does not need to be random or encrypted, but it must be unique.

Other derivation functions include Argon2 and scrypt. These are both memory hard, which means they are inefficient when used on massively parallel hardware like custom chips and GPU cards—a favorite tool for attackers who perform brute-force attacks.

Over the decades, numerous weaknesses have been found in cryptographic algorithms and solutions, so it’s important to stay ahead of new research in this area. A common issue is the combination of AES, CBC, and PKCS7 padding.

There’s a known weakness called the Padding Oracle attack, discussed here: https://azsec.tech/74a. What’s interesting is that a recent issue affected the Azure SDK and the code used to encrypt data at the client. The fix was to move to AES-GCM (identified as version 2) rather than AES-CBC (identified as version 1). Because the SDK achieves cryptographic agility by adding a 1-byte version number at the start of the protected blob, the code can still decrypt older data encrypted with AES-CBC but encrypts and writes new data using the more secure AES-GCM.
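That version-byte pattern can be sketched as below. This is a hypothetical illustration, not the Azure SDK’s actual code; the "decrypt" functions are placeholders that merely tag their input so the dispatch on the version byte is visible.

```python
# Hypothetical sketch of cryptographic agility via a version byte.
# The decrypt_* functions are placeholders, NOT real AES code.
VERSION_CBC = 1  # legacy format
VERSION_GCM = 2  # current format

def decrypt_aes_cbc(ciphertext: bytes) -> bytes:   # placeholder
    return b"cbc:" + ciphertext

def decrypt_aes_gcm(ciphertext: bytes) -> bytes:   # placeholder
    return b"gcm:" + ciphertext

def decrypt(blob: bytes) -> bytes:
    # The first byte selects the algorithm, so old AES-CBC blobs stay
    # readable while new data is written with AES-GCM.
    version, ciphertext = blob[0], blob[1:]
    if version == VERSION_CBC:
        return decrypt_aes_cbc(ciphertext)
    if version == VERSION_GCM:
        return decrypt_aes_gcm(ciphertext)
    raise ValueError(f"unknown format version {version}")

def encrypt(plaintext: bytes) -> bytes:
    # New writes always use the current (v2, AES-GCM) format.
    return bytes([VERSION_GCM]) + plaintext  # placeholder "encryption"

print(decrypt(bytes([VERSION_CBC]) + b"old data"))  # b'cbc:old data'
print(decrypt(encrypt(b"new data")))                # b'gcm:new data'
```

The cost of agility is one byte per blob; the benefit is that an algorithm migration never strands previously encrypted data.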
