© Stephen Haunts 2019
Stephen Haunts, Applied Cryptography in .NET and Azure Key Vault, https://doi.org/10.1007/978-1-4842-4375-6_4

4. Hashing and Hashed Message Authentication Codes

Stephen Haunts, Belper, Derbyshire, UK

Now that you know the importance of random numbers in cryptography and how to generate them, let’s look at one of the pillars of cryptography. In Chapter 2, I mentioned the four pillars of modern cryptography: confidentiality, integrity, authentication, and non-repudiation. In this chapter, we explore integrity and authentication by looking at the various hashing and authenticated hashing operations available in .NET. Let’s start with integrity and hashing.

Hashing and Integrity

A hash function in cryptography is an algorithm that takes in a block of data of any size and returns a fixed-size output, which is the (cryptographic) hash value (see Figure 4-1). Any change to the original input data results in the hash code changing. The data to be hashed is often called the message, and the hash value is often called the hash code, the message digest, or simply, the digest.
Figure 4-1. A hash function takes data and generates a unique hash code for it

To be reliable and useful, a hash function must have four main properties.
  • The hash code must be easily calculated for any given input message.

  • You should not be able to create a message that has a specified hash code.

  • Any changes to the original message should completely change the hash code.

  • You should not be able to find two input messages that result in the same hash code.

Another way to frame the concept of a hash function is to think of it as the digital equivalent of a fingerprint for a piece of data. Once you have generated a hash code for that piece of data, the hash code is always the same if you calculate it again, unless the original data changes in any way, no matter how small that change is.
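
The fingerprint idea can be sketched in a few lines. The following is only an illustration, assuming SHA-256 (one of the algorithms covered later in this chapter); the class name FingerprintDemo and the helper Sha256Of are made up for this sketch. It hashes the same text twice, and then hashes a copy whose last character has been changed.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

class FingerprintDemo
{
    // Hypothetical helper: returns the SHA-256 digest of a UTF-8 string.
    static byte[] Sha256Of(string text)
    {
        using (var sha256 = SHA256.Create())
        {
            return sha256.ComputeHash(Encoding.UTF8.GetBytes(text));
        }
    }

    static void Main()
    {
        byte[] first = Sha256Of("Message to hash");
        byte[] again = Sha256Of("Message to hash");
        byte[] changed = Sha256Of("Message to hasi"); // last character differs

        // Hashing the same data twice always gives the same fingerprint.
        Console.WriteLine("Same input, same hash:    " + first.SequenceEqual(again));

        // A one-character change gives an unrelated-looking fingerprint.
        Console.WriteLine("Changed input, same hash: " + first.SequenceEqual(changed));
        Console.WriteLine(Convert.ToBase64String(first));
        Console.WriteLine(Convert.ToBase64String(changed));
    }
}
```

The first comparison prints True and the second prints False; the two base64 strings have the same length but share no obvious pattern.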

The process of calculating a hash code or digest of an item of data is straightforward in the .NET Framework or .NET Core. There are different algorithms you can use in .NET, including MD5, SHA-1, SHA-256, and SHA-512, which we explore in this chapter.

Generating a hash code for a piece of data is a one-way operation: once you have calculated the hash code, there is no process that reverses it back to the original data. Encryption, on the other hand, is designed to be a two-way operation. Once you have encrypted data with a key, you can decrypt that data using the same key to recover the original message. Encryption is covered later in this book.

The properties of hashing, such as working in only one direction and producing a hash code that is effectively unique to a piece of data, make it a good mechanism for checking the integrity of data. Integrity checking means that when you send data across a network to someone else, you can use hashing to tell whether the data has been corrupted or altered along the way. Before sending the data, you calculate a hash of the data to get its fingerprint. You then send the data and the hash to the recipient. The recipient calculates the hash of the data they received and compares it to the hash you sent. If the hash codes are identical, the data was received without loss or corruption. If the hash codes do not match, the data received is not the same as the data initially sent. The two most common hashing methods are MD5 and the SHA family of hashes (SHA-1, SHA-256, and SHA-512), which are all supported in .NET. Let’s look at these in more detail.
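
As a rough sketch of the send-and-compare process just described, the following example plays both roles in one program. It assumes SHA-256 as the hash; the class IntegrityCheckDemo, the method IsIntact, and the sample message are hypothetical names chosen for illustration.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

class IntegrityCheckDemo
{
    static byte[] ComputeSha256(byte[] data)
    {
        using (var sha256 = SHA256.Create())
        {
            return sha256.ComputeHash(data);
        }
    }

    // Recipient side: recompute the hash and compare it with the one received.
    static bool IsIntact(byte[] receivedData, byte[] receivedHash)
    {
        return ComputeSha256(receivedData).SequenceEqual(receivedHash);
    }

    static void Main()
    {
        // Sender side: hash the data before sending it.
        byte[] data = Encoding.UTF8.GetBytes("Transfer 100 GBP to account 1234");
        byte[] hash = ComputeSha256(data);

        Console.WriteLine("Unmodified data intact: " + IsIntact(data, hash));

        // Simulate corruption in transit by flipping a single bit.
        byte[] corrupted = (byte[])data.Clone();
        corrupted[9] ^= 0x01;
        Console.WriteLine("Corrupted data intact:  " + IsIntact(corrupted, hash));
    }
}
```

The first check reports True and the second False. Note that a plain hash only detects changes when the hash itself arrives unaltered; the authenticated hashing section later in this chapter deals with that limitation.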

MD5

The MD5 message digest algorithm is a widely used cryptographic hash function that produces a 128-bit (16-byte) hash value, which is expressed in text format as a 32-digit hexadecimal number or as a base64-encoded string. MD5 is used in a wide variety of cryptographic applications for operating systems and large-scale enterprise systems. One of the most common uses is verifying file integrity.
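
To make the two text formats concrete, here is a small sketch that prints the same MD5 digest as hexadecimal and as base64. The class name Md5EncodingDemo is invented for this example, and BitConverter.ToString is just one of several ways to produce hex in .NET.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

class Md5EncodingDemo
{
    static void Main()
    {
        byte[] digest;
        using (var md5 = MD5.Create())
        {
            digest = md5.ComputeHash(Encoding.UTF8.GetBytes("Message to hash"));
        }

        // BitConverter.ToString yields "AB-CD-..."; strip the dashes for plain hex.
        string hex = BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
        string base64 = Convert.ToBase64String(digest);

        Console.WriteLine(digest.Length + " bytes");
        Console.WriteLine(hex + " (" + hex.Length + " hex digits)");
        Console.WriteLine(base64 + " (" + base64.Length + " base64 characters)");
    }
}
```

The digest is 16 bytes, which comes out as 32 hex digits or 24 base64 characters. Legacy systems often exchange the hex form, so it is worth checking which encoding the other side expects before comparing values.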

MD5 was designed by Ron Rivest in 1991 to replace MD4, an earlier hash function.

A flaw in the design of MD5 was found in 1996. The flaw did not seem to be a fatal weakness, but cryptographers recommended the use of other algorithms, such as the SHA family, which we explore later in this chapter. MD5 was used commercially for a long time, but in 2004, it was shown that MD5 is not collision-resistant, which means that it is practical to construct two different sets of data that produce the same MD5 hash.

Because of this flaw, MD5 is not recommended in any new systems. It is still important to talk about its use, though, as you may still need it in applications if you are checking the integrity of data coming from a legacy system that makes use of MD5. In most companies, legacy code is something developers have to live with, which is why MD5 may still be very relevant. Old data stored in a database may contain MD5 hashes as part of the data stored in their table, and legacy code potentially checks these hashes in disparate systems. If you need to read or receive any of that data from these older systems, you need the ability to recalculate and check the same hashes.

I had this problem in a company I used to work for, which was a large Internet bank in the United Kingdom. The core banking platform lived on AS400 mainframes, and the modern website and services that the company provided were developed in .NET and ASP.NET on top of the core banking platform. This means that we frequently had to query data from the AS400 banking system. All of this data was sent with a corresponding MD5 hash of the payload, which meant to check the integrity of the financial data coming from the banking platform, we had to recalculate the MD5 hash in .NET and compare the values. If they matched, then we were happy that the data integrity was intact. The banking platform used MD5, which wasn’t going to change, so we had to accept this decision and work with it. This is why MD5 is still relevant today, but don’t use it unless you have to.

Creating an MD5 hash in your code is very straightforward, as you can see in the following code. In the static method, ComputeHashMd5, we pass in a byte array of the data we want to create a hash code for. All the hashing operations for MD5 and the SHA family of hashes work with byte arrays, so if your input data isn’t in this format already, you first need to convert it; for example, if your input data is a string, then you need to use something like the Encoding.UTF8.GetBytes method to turn that string into an array of bytes. You see this in the example in a moment.
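using System.Security.Cryptography;

public class HashData
{
    // Returns the 16-byte MD5 digest of the supplied data.
    public static byte[] ComputeHashMd5(byte[] toBeHashed)
    {
        // MD5.Create returns a disposable object, so wrap it in a using block.
        using (var md5 = MD5.Create())
        {
            return md5.ComputeHash(toBeHashed);
        }
    }
}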

Once you have a byte array for your data, you then need to hash it. To do this with MD5, you call the static Create method on the MD5 class, which gives you an instance of a class that you can use to create the hash. Once you have that instance, you call the ComputeHash method and pass in the byte array of the data where you want the hash code created.

Let’s now look at wiring up this method and calling it. In the following sample code, we start by creating two strings containing the same text. Then we create an MD5 hash code for each string by first converting it to a byte array. To display the resulting hash codes on the screen, we have to convert them from byte arrays to something more display-friendly. A useful format to convert to is a base64-encoded string, which is done using the static Convert.ToBase64String method. Once we have done this, the hash codes are written to the console window.
using System;
using System.Text;

class Program
{
    static void Main()
    {
        // Two separate strings that hold identical text.
        const string originalMessage = " Message to hash";
        const string originalMessage2 = " Message to hash";
        // Convert each string to bytes, then hash it.
        var md5HashedMessage = HashData.ComputeHashMd5(
                 Encoding.UTF8.GetBytes(originalMessage));
        var md5HashedMessage2 = HashData.ComputeHashMd5(
                 Encoding.UTF8.GetBytes(originalMessage2));
        // Base64-encode the raw hash bytes for display.
        Console.WriteLine("Message 1 hash = " +
                   Convert.ToBase64String(md5HashedMessage));
        Console.WriteLine("Message 2 hash = " +
                   Convert.ToBase64String(md5HashedMessage2));
    }
}
As you can see, even though we were hashing two separate strings (Figure 4-2), the final hash code is the same because the strings contained the same message.

Figure 4-2. The result of running MD5 against two identical strings

If you were to change just a single character in one of those strings, then the generated hash codes would be completely different.

Secure Hash Algorithm (SHA) Family

MD5 shouldn’t be used if you can help it, but what is the available alternative in .NET? The alternative is the Secure Hash Algorithm family of hash functions, or the SHA family. The SHA family is a family of cryptographic hash functions published by the US National Institute of Standards and Technology (NIST). The premise of the SHA family of hashes is the same as with MD5. You supply some input data, run it through the hashing function, and get a hash code back. The concept is the same, but the underlying algorithm is different, and you get a much longer and more robust hash code.

The Secure Hash family covers many variants, including the following:
  • SHA-1. The SHA-1 hash function produces a 160-bit (20 bytes) hash code. SHA-1 was designed by the National Security Agency to be part of the Digital Signature Algorithm (DSA). Cryptographic weaknesses were discovered in SHA-1, and the standard was no longer approved for most cryptographic uses after 2010. As with MD5, it is still around to enable integration with legacy systems that use SHA-1.

  • SHA-2. SHA-2 is a family built around two similar hash functions, SHA-256 and SHA-512, which differ in block size and word size. SHA-256 uses 32-bit words, whereas SHA-512 uses 64-bit words. The family also includes truncated variants, such as SHA-224 and SHA-384. The NSA designed these versions of the SHA algorithm.

  • SHA-3. SHA-3 was defined after a public competition to find a hashing function that was not designed by the NSA. The winner was chosen in 2012. It is based on a hashing implementation called Keccak. SHA-3 supports the same hash lengths as SHA-2, but its internal workings and structure are entirely different from SHA-1 and SHA-2. At the time of writing, SHA-3 is not supported directly in the .NET Framework, although third-party implementations are available.

Implementing SHA in your applications is a straightforward process because the SHA classes expose the same methods as the MD5 class. Table 4-1 lists the size of the hash code that each algorithm produces. In the example class that follows, we have three methods for creating our different SHA-based hashes: SHA-1, SHA-256, and SHA-512. As with the MD5 hash, the code for creating each of these hashes is almost identical; only the hashing class names are different: SHA1, SHA256, and SHA512.
Table 4-1. Size of Hash Codes in Bits and Bytes

Hash Type    Size in Bits    Size in Bytes
SHA-1        160             20
SHA-256      256             32
SHA-512      512             64

A SHA-1 hash returns a 160-bit or 20-byte hash code. The SHA-256 hash returns a hash code that is 256 bits or 32 bytes in length, and finally, a SHA-512 hash returns a hash code that is 512 bits or 64 bytes in length. Choosing between SHA-256 and SHA-512 is largely a matter of preference and of how much storage the hashes need, but longer hash codes are more resistant to hash collisions.
using System.Security.Cryptography;

public class HashData
{
    public static byte[] ComputeHashSha1(byte[] toBeHashed)
    {
        using (var sha1 = SHA1.Create())
        {
            return sha1.ComputeHash(toBeHashed);
        }
    }
    public static byte[] ComputeHashSha256(byte[] toBeHashed)
    {
        using (var sha256 = SHA256.Create())
        {
            return sha256.ComputeHash(toBeHashed);
        }
    }
    public static byte[] ComputeHashSha512(byte[] toBeHashed)
    {
        using (var sha512 = SHA512.Create())
        {
            return sha512.ComputeHash(toBeHashed);
        }
    }
}
Let’s now wire them up as we did with the MD5 hash example. This time, I made the two original messages differ by changing one character in originalMessage2. Again, we need to convert each string into a byte array before we can calculate the hash code. When this is completed for each of our three hash types, the resulting hash codes are printed to the console window by converting each hash code byte array into a base64-encoded string.
using System;
using System.Text;

class Program
{
    static void Main()
    {
        const string originalMessage = "Message to hash";
        const string originalMessage2 = "M3ssage to hash";
        var sha1HashedMessage = HashData.ComputeHashSha1(
            Encoding.UTF8.GetBytes(originalMessage));
        var sha1HashedMessage2 = HashData.ComputeHashSha1(
            Encoding.UTF8.GetBytes(originalMessage2));
        var sha256HashedMessage = HashData.ComputeHashSha256(
            Encoding.UTF8.GetBytes(originalMessage));
        var sha256HashedMessage2 = HashData.ComputeHashSha256(
            Encoding.UTF8.GetBytes(originalMessage2));
        var sha512HashedMessage = HashData.ComputeHashSha512(
            Encoding.UTF8.GetBytes(originalMessage));
        var sha512HashedMessage2 = HashData.ComputeHashSha512(
            Encoding.UTF8.GetBytes(originalMessage2));
        Console.WriteLine();
        Console.WriteLine("SHA 1 Hashes");
        Console.WriteLine("Message 1 hash = " +
             Convert.ToBase64String(sha1HashedMessage));
        Console.WriteLine("Message 2 hash = " +
             Convert.ToBase64String(sha1HashedMessage2));
        Console.WriteLine();
        Console.WriteLine("SHA 256 Hashes");
        Console.WriteLine("Message 1 hash = " +
             Convert.ToBase64String(sha256HashedMessage));
        Console.WriteLine("Message 2 hash = " +
             Convert.ToBase64String(sha256HashedMessage2));
        Console.WriteLine();
        Console.WriteLine("SHA 512 Hashes");
        Console.WriteLine("Message 1 hash = " +
             Convert.ToBase64String(sha512HashedMessage));
        Console.WriteLine("Message 2 hash = " +
             Convert.ToBase64String(sha512HashedMessage2));
    }
}
When you look at the output of this example in the console window (Figure 4-3), you can see the difference in the size of the final base64 string. The SHA-512 hash is double the size of the SHA-256 hash. If you are not storing many hashes, then you may want to go straight to SHA-512, which provides the best security and future proofing. If you are storing many hashes and doubling the size of each stored hash is an issue, then SHA-256 is a reasonable default.

Figure 4-3. The result of running SHA family hashes against two different strings
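
The sizes in Table 4-1, and the storage cost of the base64 form, can be checked with a short sketch like the following. The class DigestSizeDemo and the Report helper are hypothetical names for this example; the program passes each algorithm through the common HashAlgorithm base class.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

class DigestSizeDemo
{
    // Hashes the data with the given algorithm and prints the digest sizes.
    static void Report(string name, HashAlgorithm algorithm, byte[] data)
    {
        using (algorithm)
        {
            byte[] digest = algorithm.ComputeHash(data);
            string base64 = Convert.ToBase64String(digest);
            Console.WriteLine(name + ": " + digest.Length * 8 + " bits, " +
                digest.Length + " bytes, " + base64.Length + " base64 characters");
        }
    }

    static void Main()
    {
        byte[] data = Encoding.UTF8.GetBytes("Message to hash");
        Report("SHA-1", SHA1.Create(), data);
        Report("SHA-256", SHA256.Create(), data);
        Report("SHA-512", SHA512.Create(), data);
    }
}
```

The base64 strings come out at 28, 44, and 88 characters, respectively, which is the figure to plan for if you store hashes as text in a database column.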

We now have the ability to perform integrity checking through hashing. Let’s extend this capability with authentication by looking at hashed message authentication codes.

Authenticated Hashing

So far, we have covered MD5 and the SHA family of hashing functions. Their purpose is to provide integrity checking capabilities within applications to help detect if data has been tampered with or corrupted over time. What we want to do now is satisfy another of our four pillars of cryptography by talking about authentication, which naturally follows integrity.

If you combine a one-way hash function with a secret cryptographic key (Figure 4-4), you get a hashed message authentication code (HMAC). Like a hash code, an HMAC verifies the integrity of a message.

Figure 4-4. HMAC is similar to a normal hashing function, except that it takes a key as well as its input data

An HMAC also allows you to verify the authenticity of a message, because only someone who knows the key can calculate the same hash code for that message. Let’s walk through that with an example.

Let’s say you have a PDF file on your computer, and you calculate an HMAC of that data. To do this, you create a key using the same technique we talked about in Chapter 3: you generate a 256-bit (32-byte) random number using the RNGCryptoServiceProvider class. You take the PDF file and the key, pass them into the HMAC function, and get a hash code back. You then send the PDF and the hash code to a colleague, along with the key. (We won’t worry about how you send the key just yet because we tackle that problem later in the book.) Your colleague recalculates the hash code using the same key and gets the same hash code that you sent, which means they can be confident that the PDF file is intact.

The same day, someone else gets hold of a copy of the PDF file and tries to calculate the same hash code. The problem is they do not have a copy of the key, so when they try to recalculate the hash code, they get a completely different response; the hash code doesn’t match for this person because they are not in possession of the correct key. This means that only the authorized person can calculate the same hash code for a file. The authorized party has a copy of the correct key.

Why does this matter so much? Well, it gives us a level of trust. Suppose Alice and Bob are the only people who know the authentication key for the HMAC. Alice sends Bob a message and a hash code, and Bob recalculates the hash with the key he has. If the hashes match, Bob is confident that the data came from Alice. On the flip side, if our hacker, Eve, sends a message to Bob but uses a different key, then when Bob recalculates the hash with the key he shares with Alice, the hash codes won’t match. If they don’t match, Bob shouldn’t trust the message and should disregard it.
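
The Alice, Bob, and Eve scenario can be sketched as follows. This is an illustration under a few assumptions: it uses HMAC-SHA256, the names HmacVerifyDemo, NewKey, Sign, and Verify are invented for the example, and the comparison uses CryptographicOperations.FixedTimeEquals, which is available in .NET Core 2.1 and later but not in the .NET Framework.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

class HmacVerifyDemo
{
    // Generates a random 32-byte key.
    static byte[] NewKey()
    {
        var key = new byte[32];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(key);
        }
        return key;
    }

    // Sender side: calculate the HMAC of the message with the key.
    static byte[] Sign(byte[] message, byte[] key)
    {
        using (var hmac = new HMACSHA256(key))
        {
            return hmac.ComputeHash(message);
        }
    }

    // Recipient side: recalculate the HMAC and compare it in constant time.
    static bool Verify(byte[] message, byte[] receivedHmac, byte[] key)
    {
        byte[] expected = Sign(message, key);
        return CryptographicOperations.FixedTimeEquals(expected, receivedHmac);
    }

    static void Main()
    {
        byte[] sharedKey = NewKey(); // known only to Alice and Bob
        byte[] evesKey = NewKey();   // Eve has to make up her own key
        byte[] message = Encoding.UTF8.GetBytes("Meet at noon");

        byte[] fromAlice = Sign(message, sharedKey);
        byte[] fromEve = Sign(message, evesKey);

        Console.WriteLine("Alice's message verifies: " + Verify(message, fromAlice, sharedKey));
        Console.WriteLine("Eve's message verifies:   " + Verify(message, fromEve, sharedKey));
    }
}
```

Bob accepts the first message and rejects the second. The constant-time comparison is a design choice rather than a requirement of HMAC itself: it avoids leaking, through timing, how many leading bytes of a forged code were correct.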

I’ll summarize the fundamental differences between a standard MD5 or SHA hash and an HMAC: anyone can calculate a hash code using MD5 or SHA and get the same results for a piece of data. Only an authorized individual can generate the same hash code using an HMAC because they need to have the same key used to generate the original HMAC hash code.

An HMAC requires a key to be passed in, and it can be used with different hashing functions, such as MD5 or the SHA family of algorithms. The cryptographic strength of an HMAC depends on the underlying hash function and on the size and quality of the key that is used. When I use an HMAC in the systems I develop, I tend to use a 32-byte random number. Another common way to provide a key is a standard password that is first hashed with SHA-256, and then the hashed password is used as the key. If you need an ordinary person to provide a key, then using passwords is common, but then you have the problem of weak passwords to deal with. I talk about passwords and password storage in the next chapter.
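
The password-based approach mentioned above might look like the following sketch. The class PasswordKeyDemo, the method KeyFromPassword, and the sample password are hypothetical; the only steps taken from the text are hashing the password with SHA-256 and using the result as the HMAC key.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

class PasswordKeyDemo
{
    // Turns a password into a 32-byte key by hashing it with SHA-256.
    static byte[] KeyFromPassword(string password)
    {
        using (var sha256 = SHA256.Create())
        {
            return sha256.ComputeHash(Encoding.UTF8.GetBytes(password));
        }
    }

    static void Main()
    {
        byte[] key = KeyFromPassword("correct horse battery staple");

        using (var hmac = new HMACSHA256(key))
        {
            byte[] code = hmac.ComputeHash(Encoding.UTF8.GetBytes("Message to hash"));
            Console.WriteLine("Key length = " + key.Length + " bytes");
            Console.WriteLine("HMAC = " + Convert.ToBase64String(code));
        }
    }
}
```

The key is always 32 bytes long, but it is only as unpredictable as the password behind it, which is the weakness discussed next.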

The most common attack against an HMAC is a brute-force attack to uncover the key. A brute-force attack involves trying one candidate key after another until you find the correct one. The attacker iterates over candidate keys in a loop, calculates the HMAC of a known message with each one, and compares the output with the original hash code. This is why using a secure key, such as a 32-byte randomly generated key, is better; the chances of finding the correct key are significantly lower.
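
To get a feel for what brute-forcing a 32-byte key means, the following back-of-the-envelope sketch counts the possible keys and estimates the average search time. The guessing rate of one trillion keys per second is purely an assumption chosen for illustration, and the class name KeySpaceEstimate is made up. On the .NET Framework, the project needs a reference to System.Numerics.

```csharp
using System;
using System.Numerics;

class KeySpaceEstimate
{
    static void Main()
    {
        // A 32-byte key has 256 bits, so there are 2^256 possible keys.
        BigInteger keySpace = BigInteger.Pow(2, 256);

        // Assumption for illustration only: 10^12 guesses per second.
        BigInteger guessesPerSecond = BigInteger.Pow(10, 12);
        BigInteger secondsPerYear = 60L * 60 * 24 * 365;

        // On average, the attacker finds the key after searching half the key space.
        BigInteger expectedGuesses = keySpace / 2;
        BigInteger years = expectedGuesses / (guessesPerSecond * secondsPerYear);

        Console.WriteLine("Possible keys: " + keySpace);
        Console.WriteLine("Average years to find the key: " + years);
    }
}
```

Under that assumption, the average search time is on the order of 10^57 years, which is why a randomly generated 32-byte key is considered out of reach for brute force.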

Passwords, on the other hand, are much easier to crack because the attacker can use a dictionary attack to recover the password. This is where a vast precomputed list of passwords and their corresponding hashes is stored. The attacker then checks to see if the hash code for the key is in the dictionary. If it is, they know the key. The dictionaries contain several gigabytes of precomputed passwords, including all the common variants in which people switch vowels into numbers or insert an exclamation mark at the end of the password.

Earlier I said that one of the requirements for a hashing algorithm is that it should not produce the same hash code for two different pieces of original data. When that happens, it is called a hash collision. Hash collisions are one of the main reasons why MD5 is no longer recommended. HMACs are substantially less affected by hash collisions than their underlying hashing algorithms, such as MD5 or SHA, because you are also using a key to add entropy to the source data being hashed.

In the following example, we look at how to use the different HMAC variants available in .NET. First, we have a class called Hmac. This class has everything needed to calculate a hashed message authentication code, including the generation of a random number key. The code in GenerateKey is identical to the random number generator we used in Chapter 3. In this example, we use RNGCryptoServiceProvider to generate a fixed-size 32-byte (256-bit) key. The key doesn’t have to be this size, but I always default to 32 bytes because it is highly resistant to a brute-force attack. You can make the key longer if you wish, or you can make it shorter, but personally, I wouldn’t go shorter than 32 bytes.

Next, we have the code to generate an HMAC based on the SHA-256 algorithm. First, an instance of the HMACSHA256 class is instantiated with the key passed into the constructor. Next, you call ComputeHash by passing in a byte array of the data you want hashed. Again, if your data is not represented as a byte array, you need to convert it. When ComputeHash finishes, it returns a byte array with the final hash code. The fundamental difference from ordinary hashing is that this hash is dependent on the key that is generated, so if the recipient of the hash wants to calculate the same hash code for the same input data, they need a copy of that key.

The process for calculating SHA-1, SHA-512, or MD5-based HMACs is the same, except that the class you instantiate with the key as the constructor parameter is different. The classes are HMACSHA1, HMACSHA512, and HMACMD5, respectively.
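using System.Security.Cryptography;

public class Hmac
{
    private const int KeySize = 32; // 32 bytes = 256 bits
    // Generates a cryptographically secure random key.
    // Note: RNGCryptoServiceProvider is marked obsolete from .NET 6 onward;
    // RandomNumberGenerator.Create() is the replacement there.
    public static byte[] GenerateKey()
    {
        using (var randomNumberGenerator =
                      new RNGCryptoServiceProvider())
        {
            var randomNumber = new byte[KeySize];
            randomNumberGenerator.GetBytes(randomNumber);
            return randomNumber;
        }
    }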
    public static byte[] ComputeHmacsha256(byte[] toBeHashed, byte[] key)
    {
        using (var hmac = new HMACSHA256(key))
        {
            return hmac.ComputeHash(toBeHashed);
        }
    }
    public static byte[] ComputeHmacsha1(byte[] toBeHashed, byte[] key)
    {
        using (var hmac = new HMACSHA1(key))
        {
            return hmac.ComputeHash(toBeHashed);
        }
    }
    public static byte[] ComputeHmacsha512(byte[] toBeHashed, byte[] key)
    {
        using (var hmac = new HMACSHA512(key))
        {
            return hmac.ComputeHash(toBeHashed);
        }
    }
    public static byte[] ComputeHmacmd5(byte[] toBeHashed, byte[] key)
    {
        using (var hmac = new HMACMD5(key))
        {
            return hmac.ComputeHash(toBeHashed);
        }
    }
}
We now have a helper class that can calculate HMACs and generate our key. Let’s hook them up to see an example. In our Main method, we first declare a const string with the text for which we want to calculate the HMAC. Then we generate our key using our helper method, GenerateKey. Next, we calculate the HMACs of our test string using the key. HMACs are calculated using the MD5, SHA-1, SHA-256, and SHA-512 variants by first converting our sample string into a byte array. Once the HMACs are calculated, they are converted to base64-encoded strings. Then the results are output to the console display.
using System;
using System.Text;

class Program
{
    static void Main()
    {
        const string originalMessage = "Message to hash";
        var key = Hmac.GenerateKey();
        var hmacMd5Message = Hmac.ComputeHmacmd5(
            Encoding.UTF8.GetBytes(originalMessage), key);
        var hmacSha1Message = Hmac.ComputeHmacsha1(
            Encoding.UTF8.GetBytes(originalMessage), key);
        var hmacSha256Message = Hmac.ComputeHmacsha256(
            Encoding.UTF8.GetBytes(originalMessage), key);
        var hmacSha512Message = Hmac.ComputeHmacsha512(
            Encoding.UTF8.GetBytes(originalMessage), key);
        Console.WriteLine();
        Console.WriteLine("MD5 HMAC");
        Console.WriteLine("hash = " +
                     Convert.ToBase64String(hmacMd5Message));
        Console.WriteLine();
        Console.WriteLine("SHA 1 HMAC");
        Console.WriteLine("hash = " +
                    Convert.ToBase64String(hmacSha1Message));
        Console.WriteLine();
        Console.WriteLine("SHA 256 HMAC");
        Console.WriteLine("hash = " +
                  Convert.ToBase64String(hmacSha256Message));
   
        Console.WriteLine();
        Console.WriteLine("SHA 512 HMAC");
        Console.WriteLine("hash = " +
                  Convert.ToBase64String(hmacSha512Message));
    }
}
The result of this sample application is shown in Figure 4-5. As with standard hashing, you can see the difference in size between the resulting hash codes.
Figure 4-5. The result of running HMACs for our input data with a precomputed key
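
The size differences in the HMAC output follow the underlying hash function, which a short sketch can confirm. The class HmacSizeDemo and the Report helper are hypothetical names; the example relies on all four HMAC classes deriving from the common HMAC base class in System.Security.Cryptography.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

class HmacSizeDemo
{
    // Calculates an HMAC with the given implementation and prints its length.
    static void Report(string name, HMAC hmac, byte[] data)
    {
        using (hmac)
        {
            byte[] code = hmac.ComputeHash(data);
            Console.WriteLine(name + ": " + code.Length + " bytes");
        }
    }

    static void Main()
    {
        // The same 32-byte random key is used for every variant.
        var key = new byte[32];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(key);
        }

        byte[] data = Encoding.UTF8.GetBytes("Message to hash");
        Report("HMAC-MD5", new HMACMD5(key), data);
        Report("HMAC-SHA1", new HMACSHA1(key), data);
        Report("HMAC-SHA256", new HMACSHA256(key), data);
        Report("HMAC-SHA512", new HMACSHA512(key), data);
    }
}
```

The lengths are 16, 20, 32, and 64 bytes, the same as the plain MD5, SHA-1, SHA-256, and SHA-512 digests; the size of the key does not change the size of the output.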

Summary

In this chapter, we explored classes in .NET to help satisfy two of the four pillars of cryptography: integrity and authentication. We used hashing algorithms such as MD5 and the SHA family of hashes to accomplish our integrity checking. The benefit of integrity checking is that if you’re sending data to another system or person, you can calculate a hash code of that data before sending it. The recipient can then recalculate the hash code from the data they receive and compare it to the original hash code sent to them. If they match, then the data wasn’t tampered with or corrupted.

Hashing functions, such as MD5 and the SHA hashes, work by passing source data into the hashing function, and then getting a unique hash code returned for that data. It should not be possible to produce the same hash for two different pieces of source data; if you encounter this, it is called a hash collision. MD5 is susceptible to this problem, which is why it is not recommended to use it in new applications. We covered how to use MD5 because you will most likely need it when interfacing with legacy systems. The recommendation is to use the SHA family of hashes, ideally SHA-256.

Then, I introduced authentication. Hashed message authentication codes, or HMACs for short, extend the hashing concept by providing a key as well as the original data that you want to hash. This gives us a unique property in that the recipient can only recalculate the same hash for some data if they are in possession of the key. If they don’t have the key or have an incorrect key, then the resulting hash code will be different. This has another unique property in that if the recipient can recalculate the correct hash code with their key, then they have a level of confidence that the correct person sent the message. If an imposter sent the message and hash code, they would have a different key, provided the originator had kept their key safe. I cover the safe storage of keys later in the book.

In the next chapter, we build upon what we have covered in this chapter by talking about secure password storage.
