A hash function maps bit strings of any length to bit strings of a fixed length n. For practical uses, hash functions should be easy to compute, that is, computing the hash of x should be doable in time polynomial in the size of x.
Since a hash function H maps an infinite set to a finite set, there must exist pairs (x1, x2) of distinct strings with H(x1) = H(x2). Such a pair is called a collision for H. For cryptographic applications (for example, for generating digital signatures), it should be computationally infeasible to find collisions for hash functions. To elaborate this topic further we mention the following two desirable properties of hash functions used in cryptography.
A hash function H is called second pre-image resistant, if it is computationally infeasible[6] to find, for a given bit string x1, a second bit string x2 with H(x1) = H(x2).
|
A hash function H is called collision resistant, if it is computationally infeasible to find any two distinct bit strings x1 and x2 with H(x1) = H(x2). |
In order to prevent existential forgery (Exercise 5.15) of digital signatures, hash functions should also be difficult to invert.
A hash function (provably or believably) satisfying all these three properties is called a cryptographic hash function. A hash function having first and second pre-image resistance is often called a one-way hash function. Some authors require both second pre-image resistance and collision resistance to define a collision-resistant hash function, but here we stick to Definitions A.3 and A.4. In what follows, an unqualified use of the phrase hash function indicates a cryptographic hash function.
Most of the properties of a cryptographic hash function are mutually independent. However, we have the following implication.
The converse of Proposition A.1 is not true: A second pre-image resistant hash function need not be collision resistant (Exercise A.19). Also collision resistance (or second pre-image resistance) does not imply first pre-image resistance (Exercise A.20), and first pre-image resistance does not imply second pre-image resistance (Exercise A.21).
A hash function may or may not be used in conjunction with a secret key. An unkeyed hash function is typically used to check the integrity of a message and is often called a modification detection code (MDC). A keyed hash function, on the other hand, is usually employed to authenticate the origin of a message (in addition to verifying the integrity of the message) and so is often called a message authentication code (MAC).
Let us now describe a generic method of constructing hash functions. We start by defining the following basic building block.
Since m > n, collisions must exist for F. For cryptographic use, collisions should be difficult to locate. We can define first and second pre-image resistance and collision resistance of compression functions as before.
Algorithm A.18 demonstrates how a compression function can be used to design an n-bit hash function H. The input message x is first broken into l ≥ 0 blocks each of bit length r, after padding zero bits, if necessary. The initial bit length λ of x is then stored in a new block. This implies that H cannot handle bit strings of length ≥ 2r. For a reasonably big r, this is not a practical limitation. Storing λ is necessary for several reasons. First, it ensures that the for loop is executed at least once for any message. This prevents the trivial hash value 0r (the bit string of length r containing zero bits only) for the null message. Moreover, if hi = 0r for some , then, without the length block, we would get H(x1 ‖ . . . ‖ xl) = H(xi+1 ‖ . . . ‖ xl) that leads to a collision for H.
We now show if F possesses the desired properties for use in cryptography, then so does H too.
If F is first pre-image resistant, then so is H. Proof Assume that H is not first pre-image resistant, that is, an efficient algorithm A exists to compute x with H(x) = y for most (if not all) . Since y = hl+1 = F (hl ‖ xl+1), a pre-image (namely, hl ‖ xl+1) of y under F is easily computable. |
If F is collision resistant, then H is collision resistant (and hence also second pre-image resistant). Proof Given a collision (x, x′) for H, we can find a collision for F with little additional effort. We use the notations of Algorithm A.18 with primed variables for x′. First consider l ≠ l′. But then, in particular, the length blocks xl+1 and are different and thus is a collision for F. So for the rest of the proof we take l = l′. Now, suppose that for some . Choose the largest such i and note that hi+1 and are defined and equal for this choice. This gives us the collision for F. The only case that remains to be treated is for all . Since x ≠ x′, there is at least one with . For such an i, the equality implies that is a collision for F. |
In order to design cryptographic hash functions, it suffices to design cryptographic compression functions. Block ciphers can be used for that purpose. Let f be a block cipher with block size n and key size r. Take m := n + r and consider the map that sends x = L ‖ R with and to the encrypted bit string fR(L). If fR are assumed to be random permutations of , the resulting compression function F possesses the desirable properties.
Several custom-designed hash functions have been popularly used by the cryptography community. MD4 and MD5 are somewhat older 128-bit hash functions. Soon after its conception, MD4 was found to be vulnerable to several attacks. Also collisions for the compression function of MD5 are known. Therefore, these two hash functions have lost the desired level of confidence for cryptographic uses.
NIST has proposed a family of four hash algorithms. These algorithms are called secure hash algorithms and have the short names SHA-1, SHA-256, SHA-384 and SHA-512, which respectively produce 160-, 256-, 384- and 512-bit hash values. No collisions for SHA are known till date. In the rest of this section, we explain the SHA-1 algorithm. The workings of the other SHA algorithms are very similar and can be found in the FIPS document [222]. RIPEMD-160 is another popular 160-bit hash function.
SHA-1 (like other custom-designed hash functions mentioned above) is suitable for implementation in 32-bit processors. Suppose that we want to compute the hash SHA-1(M) of a message M of bit length λ. First, M is padded to get the bit string M′ := M ‖ 1 ‖ 0k ‖ Λ, where Λ is the 64-bit representation of λ, and where k is the smallest non-negative integer for which the bit length of M′, that is, λ + 1 + k + 64, is a multiple of 512. M′ is broken into blocks M(1), M(2), . . . , M(l) each of length 512 bits. Each M(i) is represented as a collection of sixteen 32-bit words , j = 0, 1, . . . , 15. SHA-1 supports big-endian packing, that is, stores the leftmost 32 bits of M(i), the next 32 bits of the rightmost 32 bits of M(i).
The SHA-1 computations are given in Algorithm A.19. One starts with a fixed initial 160-bit hash H(0). Successively for i = 1, 2, . . . , l the i-th message block M(i) is considered and the previous hash value H(i–1) is updated to H(i). At the end of the loop the 160-bit string H(l) is returned as SHA-1(M). Each H(i) is represented by five 32-bit words , j = 0, 1, 2, 3, 4. Here also, big-endian notation is used, that is, stores the leftmost 32 bits of H(i), . . . , the rightmost 32 bits of H(i).
The updating procedure uses logical functions fj. Here, product (like xy) implies bit-wise AND, bar (as in ) denotes bit-wise complementation and ⊕ denotes bit-wise XOR, each on 32-bit operands. The notation LRk(z) (resp. RRk(z)) stands for a left (resp. right) rotation, that is, a cyclic left (resp. right) shift, of the bit string z of length 32 by k positions.
The bits of H(i) are well-defined transformations of the bits of H(i–1) under the guidance of the bits of M(i). The good amount of non-linearity, introduced by the functions fj and the modulo 232 sums, makes it difficult to invert the transformation H(i–1) ↦ H(i) and thereby makes SHA-1 an (apparently) secure hash function.
Output: The hash SHA-1(M) of M. Steps: Generate the message blocks M(i), i = 1, 2, . . . , l. |
A test vector for SHA-1 is the following (here 616263 is the string “abc”):
SHA-1(616263) = a9993e364706816aba3e25717850c26c9cd0d89d.
A.18 | Let x be a bit string. Break up x into blocks x1, . . . , xl each of bit size n (after padding, if necessary). Define H1(x) := x1 ⊕ . . . ⊕ xl. Show that H1 possesses none of the desirable properties of a cryptographic hash function. |
A.19 | Let H be an n-bit cryptographic hash function and S a finite set of strings with #S ≥ 2. Define the function . Here, 0n+1 refers to a bit string of length n + 1 containing zero-bits only. Show that H2 is second pre-image resistant, but not collision resistant. [H] |
A.20 | Let H be an n-bit cryptographic hash function. Show that the function H3 defined as is collision resistant (and hence second pre-image resistant), but not first pre-image resistant. [H] |
A.21 | Let m be a product of two (unknown) big primes and let the binary representation of m (with leading one-bit) have n bits. Assume that it is computationally infeasible to compute square roots modulo m. We can identify bit strings with integers in a natural way. For a bit string x, take y := 1 ‖ x and let H4(x) denote the n-bit binary representation of y2 (mod m). Show that H4 is first pre-image resistant, but not second pre-image resistant (and hence not collision-resistant). [H] |
A.22 | Let H be an n-bit cryptographic hash function. Assume that H produces random hash values on random input strings. Prove that O(2n/2) hash values need to be computed to detect a collision for H with high probability. [H] Deduce also that nearly 2n–1 hash values need to be computed on an average to obtain a second pre-image x′ of H(x). |
A.23 | Let be a collision resistant compression function.
|
A.24 |
|
A.25 | Assume that in the SHA-1 algorithm the designers opted for Algorithm A.19 with the following minor modifications: They defined fj as fj(x, y, z) := x ⊕ y ⊕ z for all and they replaced all costly mod 232 addition operations (+) by cheap bit-wise XOR operations (⊕). Do you sense anything wrong with this design? [H] |
18.216.47.169