Chapter 11 Hash Functions

11.1 Hash Functions

A basic component of many cryptographic algorithms is what is known as a hash function. When a hash function satisfies certain non-invertibility properties, it can be used to make many algorithms more efficient. In the following, we discuss the basic properties of hash functions and attacks on them. We also briefly discuss the random oracle model, which is a method of analyzing the security of algorithms that use hash functions. Later, in Chapter 13, hash functions will be used in digital signature algorithms. They also play a role in security protocols in Chapter 15, and in several other situations.

A cryptographic hash function h takes as input a message of arbitrary length and produces as output a message digest of fixed length, for example, 256 bits as depicted in Figure 11.1. Certain properties should be satisfied:

  1. Given a message m, the message digest h(m) can be calculated very quickly.

  2. Given a y, it is computationally infeasible to find an m with h(m)=y (in other words, h is a one-way, or preimage resistant, function). Note that if y is the message digest of some message, we are not trying to find this message. We are only looking for some m with h(m)=y.

  3. It is computationally infeasible to find messages m1 and m2 with h(m1)=h(m2) (in this case, the function h is said to be strongly collision resistant).

Figure 11.1 A Hash Function

An illustration shows a cryptographic hash function. It shows a long message and 256-bit message digest.

Note that since the set of possible messages is much larger than the set of possible message digests, there should always be many examples of messages m1 and m2 with h(m1)=h(m2). The requirement (3) says that it should be hard to find examples. In particular, if Bob produces a message m and its hash h(m), Alice wants to be reasonably certain that Bob does not know another message m with h(m)=h(m), even if both m and m are allowed to be random strings of symbols.

Preimage resistance and collision resistance are closely related, but we list them separately because they are used in slightly different circumstances. The following argument shows that, for our hash functions, collision resistance implies preimage resistance: Suppose H is not preimage resistant. Take a random x and compute y=H(x). If H is not preimage resistant, we can quickly find x with H(x)=y=H(x). Because H is many-to-one, it is likely that xx, so we have a collision, contradicting the collision resistance of H. However, there are examples that show that for arbitrary functions, collision resistance does not imply preimage resistance. See Exercise 12.

In practice, it is sometimes sufficient to weaken (3) to require H to be weakly collision resistant. This means that given x, it is computationally infeasible to find xx with H(x)=H(x). This property is also called second preimage resistance.

Requirement (3) is the hardest one to satisfy. In fact, in 2004, Wang, Feng, Lai, and Yu (see [Wang et al.]) found many examples of collisions for the popular hash functions MD4, MD5, HAVAL-128, and RIPEMD. The MD5 collisions have been used by Ondrej Mikle [Mikle] to create two different and meaningful documents with the same hash, and the paper [Lenstra et al.] shows how to produce examples of X.509 certificates (see Section 15.5) with the same MD5 hash (see also Exercise 15). This means that a valid digital signature (see Chapter 13) on one certificate is also valid for the other certificate, hence it is impossible for someone to determine which is the certificate that was legitimately signed by a Certification Authority. It has been reported that weaknesses in MD5 were part of the design of the Flame malware, which attacked several computers in the Middle East, including Iran’s oil industry, from 2010 to 2012.

In 2005, Wang, Yin, and Yu [Wang et al. 2] predicted that collisions could be found for the hash function SHA-1 with around 269 calculations, which is much better than the expected 280 calculations required by the birthday attack (see Section 12.1). In addition, they found collisions in a smaller 60-round version of SHA-1. These weaknesses were a cause for concern for using these hash algorithms and led to research into replacements. Finally, in 2017, a joint project between CWI Amsterdam and Google Research found collisions for SHA-1 [Stevens et al.]. Although SHA-1 is still common, it is starting to be used less and less.

One of the main uses of hash functions is in digital signatures. Since the length of a digital signature is often at least as long as the document being signed, it is much more efficient to sign the hash of a document rather than the full document. This will be discussed in Chapter 13.

Hash functions may also be employed as a check on data integrity. The question of data integrity comes up in basically two scenarios. The first is when the data (encrypted or not) are being transmitted to another person and a noisy communication channel introduces errors to the data. The second occurs when an observer rearranges the transmission in some manner before it gets to the receiver. Either way, the data have become corrupted.

For example, suppose Alice sends Bob long messages about financial transactions with Eve and encrypts them in blocks. Perhaps Eve deduces that the tenth block of each message lists the amount of money that is to be deposited to Eve’s account. She could easily substitute the tenth block from one message into another and increase the deposit.

In another situation, Alice might send Bob a message consisting of several blocks of data, but one of the blocks is lost during transmission. Bob might never realize that the block is missing.

Here is how hash functions can be used. Say we send (m, h(m)) over the communications channel and it is received as (M, H). To check whether errors might have occurred, the recipient computes h(M) and sees whether it equals H. If any errors occurred, it is likely that h(M)H, because of the collision-resistance properties of h.

Example

Let n be a large integer. Let h(m)=m(modn) be regarded as an integer between 0 and n1. This function clearly satisfies (1). However, (2) and (3) fail: Given y, let m=y. Then h(m)=y. So h is not one-way. Similarly, choose any two values m1 and m2 that are congruent mod n. Then h(m1)=h(m2), so h is not strongly collision resistant.

Example

The following example, sometimes called the discrete log hash function, is due to Chaum, van Heijst, and Pfitzmann [Chaum et al.]. It satisfies (2) and (3) but is much too slow to be used in practice. However, it demonstrates the basic idea of a hash function.

First we select a large prime number p such that q=(p1)/2 is also prime (see Exercise 15 in Chapter 13). We now choose two primitive roots α1 and α2 for p. Since α1 is a primitive root, there exists a such that α1aα2(modp). However, we assume that a is not known (finding a, if not given it in advance, involves solving a discrete log problem, which we assume is hard).

The hash function h will map integers mod q2 to integers mod p. Therefore, the message digest usually contains approximately half as many bits as the message. This is not as drastic a reduction in size as is usually required in practice, but it suffices for our purposes.

Write m=x0+x1q with 0x0, x1q1. Then define

h(m)α1x0α2x1(modp).

The following shows that the function h is probably strongly collision resistant.

Proposition

If we know messages mm with h(m)=h(m), then we can determine the discrete logarithm a=Lα1(α2).

Proof

Write m=x0+x1q and m=x0'+x1'q. Suppose

α1x0α2x1α1x0'α2x1'(modp).

Using the fact that α2α1a(modp), we rewrite this as

α1a(x1x1')(x0'x0)1(modp).

Since α1 is a primitive root mod p, we know that α1k1(modp) if and only if k0(modp1). In our case, this means that

a(x1x1)x0x0(modp1).

Let d=gcd(x1-x'1, p-1). There are exactly d solutions to the preceding congruence (see Subsection 3.3.1), and they can be found quickly. By the choice of p, the only factors of p1 are 1, 2, q, p1. Since 0x1, x1q1, it follows that (q1)x1x1q1. Therefore, if x1x10, then it is a nonzero multiple of d of absolute value less than q. This means that dq, p1, so d=1 or 2. Therefore, there are at most two possibilities for a. Calculate α1a for each possibility; only one of them will yield α2. Therefore, we obtain a, as desired.

On the other hand, if x1x1=0, then the preceding yields x0x00(modp1). Since (q1)x0x0q1, we must have x0=x0. Therefore, m=m, contrary to our assumption.

It is now easy to show that h is preimage resistant. Suppose we have an algorithm g that starts with a message digest y and quickly finds an m with h(m)=y. In this case, it is easy to find m1m2 with h(m1)=h(m2): Choose a random m and compute y=h(m), then compute g(y). Since h maps q2 messages to p1=2q message digests, there are many messages m with h(m)=h(m). It is therefore not very likely that m=m. If it is, try another random m. Soon, we should find a collision, that is, messages m1m2 with h(m1)=h(m2). The preceding proposition shows that we can then solve a discrete log problem. Therefore, it is unlikely that such an algorithm g exists.

As we mentioned earlier, this hash function is good for illustrative purposes but is impractical because of its slow nature. Although it can be computed efficiently via repeated squaring, it turns out that even repeated squaring is too slow for practical applications. In applications such as electronic commerce, the extra time required to perform the multiplications in software is prohibitive.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.144.48.135