Hash Functions

A.4. Hash Functions

A hash function maps bit strings of any length to bit strings of a fixed length n. For practical uses, hash functions should be easy to compute, that is, computing the hash of x should be doable in time polynomial in the size of x.

Since a hash function H maps an infinite set to a finite set, there must exist pairs (x₁, x₂) of distinct strings with H(x₁) = H(x₂). Such a pair is called a collision for H. For cryptographic applications (for example, for generating digital signatures), it should be computationally infeasible to find collisions for hash functions. To elaborate on this topic, we mention the following two desirable properties of hash functions used in cryptography.
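
Collisions are guaranteed to exist; what matters is how hard they are to find. As a rough illustration, the Python sketch below finds a collision for a deliberately short hash by a birthday search. The function truncated_hash is a hypothetical toy (the leading n bits of SHA-256, computed with the standard hashlib module), not a real n-bit hash function. For random-looking outputs the search needs on the order of 2^(n/2) evaluations (compare Exercise A.22), which is why practical hash values are long.

```python
import hashlib
import os


def truncated_hash(data: bytes, n_bits: int) -> int:
    """Hypothetical toy n-bit hash: the leading n_bits bits of SHA-256(data)."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - n_bits)


def find_collision(n_bits: int, max_tries: int = 1 << 20):
    """Birthday search: hash random 8-byte inputs until two distinct ones collide."""
    seen = {}  # hash value -> first input that produced it
    for _ in range(max_tries):
        x = os.urandom(8)
        h = truncated_hash(x, n_bits)
        if h in seen and seen[h] != x:
            return seen[h], x
        seen[h] = x
    return None  # no collision within max_tries (very unlikely for small n_bits)


x1, x2 = find_collision(24)
print(x1.hex(), x2.hex(), truncated_hash(x1, 24) == truncated_hash(x2, 24))
```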

Definition A.3.

A hash function H is called second pre-image resistant, if it is computationally infeasible^[6] to find, for a given bit string x₁, a second bit string x₂ with H(x₁) = H(x₂).

^[6] A problem P is said to be computationally infeasible if any known or possible algorithm (deterministic or randomized) to solve P runs in infeasible (like super-polynomial) time, except perhaps for a set of some input instances, the density of which in the input space is zero (or, more generally, negligibly small).

Definition A.4.

A hash function H is called collision resistant, if it is computationally infeasible to find any two distinct bit strings x₁ and x₂ with H(x₁) = H(x₂).

In order to prevent existential forgery (Exercise 5.15) of digital signatures, hash functions should also be difficult to invert.

Definition A.5.

An n-bit hash function H is called first pre-image resistant (or simply pre-image resistant), if it is computationally infeasible to find, for almost all bit strings y of length n, a bit string x (of any length) such that y = H(x). The qualification "almost all" in the previous sentence is necessary, since one can compute and store the pairs (x_i, H(x_i)), i = 1, 2, . . . , k, for some small k and for some x_i of one's choice. If the given y turns out to be one of these hash values H(x_i), a pre-image of y is easily available.

A hash function (provably or presumably) satisfying all three of these properties is called a cryptographic hash function. A hash function having first and second pre-image resistance is often called a one-way hash function. Some authors require both second pre-image resistance and collision resistance to define a collision-resistant hash function, but here we stick to Definitions A.3 and A.4. In what follows, an unqualified use of the phrase hash function indicates a cryptographic hash function.

Most of the properties of a cryptographic hash function are mutually independent. However, we have the following implication.

Proposition A.1.

A collision resistant hash function is second pre-image resistant.

Proof

Let H be a (non-cryptographic) hash function that is not second pre-image resistant. This means that there is an algorithm A that efficiently computes second pre-images, except perhaps for a vanishingly small fraction of inputs. Choose a random bit string x₁. With very high probability, x₁ is not one of the bad inputs for A and, in that case, A quickly outputs a second pre-image x₂ ≠ x₁ with H(x₂) = H(x₁). This gives us an efficient randomized algorithm to compute collisions (x₁, x₂) for H, so H is not collision resistant either.
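
The proof is constructive: any second pre-image finder becomes a collision finder after one random choice. The sketch below assumes a deliberately weak, hypothetical hash weak_hash (the byte sum modulo 2¹⁶), for which the role of algorithm A is played by appending a zero byte; all function names are illustrative only.

```python
import os


def weak_hash(data: bytes) -> int:
    """A deliberately weak 16-bit hash: the sum of the bytes modulo 2**16."""
    return sum(data) % (1 << 16)


def second_preimage(x1: bytes) -> bytes:
    """Plays the role of algorithm A: appending a zero byte keeps the sum unchanged."""
    return x1 + b"\x00"


def collision_from_second_preimages() -> tuple[bytes, bytes]:
    """The reduction in the proof: pick a random x1 and ask A for a second pre-image."""
    x1 = os.urandom(16)
    x2 = second_preimage(x1)
    return x1, x2


x1, x2 = collision_from_second_preimages()
print(x1 != x2 and weak_hash(x1) == weak_hash(x2))  # True
```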

The converse of Proposition A.1 is not true: A second pre-image resistant hash function need not be collision resistant (Exercise A.19). Also collision resistance (or second pre-image resistance) does not imply first pre-image resistance (Exercise A.20), and first pre-image resistance does not imply second pre-image resistance (Exercise A.21).

A hash function may or may not be used in conjunction with a secret key. An unkeyed hash function is typically used to check the integrity of a message and is often called a modification detection code (MDC). A keyed hash function, on the other hand, is usually employed to authenticate the origin of a message (in addition to verifying the integrity of the message) and so is often called a message authentication code (MAC).
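
As a concrete contrast, the Python standard library exposes unkeyed hashes through hashlib and one standard keyed construction, HMAC (RFC 2104), through hmac. HMAC serves here only as a familiar example of a MAC built from a hash function; the key and messages below are made up for illustration.

```python
import hashlib
import hmac

message = b"transfer 100 to account 42"

# Unkeyed hash (MDC): anyone can recompute it, so it detects modification only
# if the digest itself reaches the verifier over a trusted channel.
mdc = hashlib.sha256(message).hexdigest()

# Keyed hash (MAC): only holders of the secret key can produce a valid tag.
key = b"shared secret key"  # hypothetical key
tag = hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the HMAC tag and compare it in constant time."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


print(mdc)
print(verify(key, message, tag))                        # True
print(verify(key, b"transfer 900 to account 42", tag))  # False
```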

A.4.1. Merkle’s Meta Method

Let us now describe a generic method of constructing hash functions. We start by defining the following basic building block.

Definition A.6.

Let m, n ∈ ℕ with m = n + r for some r ∈ ℕ. A function F : {0, 1}^m → {0, 1}^n that maps bit strings of length m to bit strings of length n is called a compression function. Henceforth, we will consider only those compression functions that can be computed easily, that is, in time polynomial in the input size.

Since m > n, collisions must exist for F. For cryptographic use, collisions should be difficult to locate. We can define first and second pre-image resistance and collision resistance of compression functions as before.

Algorithm A.18. Merkle’s meta method

Input: A compression function F : {0, 1}^m → {0, 1}^n with m = n + r and a bit string x of length < 2^r.

Output: The hash value H(x).

Steps:

Let λ be the bit length of x.
Set l := ⌈λ/r⌉.
If (λ is not a multiple of r) { Append rl – λ zero bits to the right of x. }
Break the padded x into blocks x₁, . . . , x_l each of length r.
Store in a new block x_{l+1} the r-bit representation of λ.
Initialize h₀ := 0^n.
for i = 1, 2, . . . , l + 1 { h_i := F(h_{i–1} ‖ x_i) }
Set H(x) := h_{l+1}.

Algorithm A.18 demonstrates how a compression function can be used to design an n-bit hash function H. The input message x is first broken into l ≥ 0 blocks each of bit length r, after padding with zero bits if necessary. The initial bit length λ of x is then stored in a new block. This implies that H cannot handle bit strings of length ≥ 2^r. For a reasonably big r, this is not a practical limitation. Storing λ is necessary for several reasons. First, it ensures that the for loop is executed at least once for any message. This prevents the trivial hash value 0^n (the bit string of length n containing zero bits only) for the null message. Second, it separates messages that coincide after zero padding, such as x and x ‖ 0 when λ is not a multiple of r. Moreover, if h_i = 0^n for some i ∈ {1, 2, . . . , l}, then, without the length block, we would get H(x₁ ‖ . . . ‖ x_l) = H(x_{i+1} ‖ . . . ‖ x_l), which leads to a collision for H.
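
The following Python sketch follows the steps of Algorithm A.18 on bit strings written as strings of '0' and '1' characters. The compression function F and the sizes N (for n) and R (for r) are hypothetical stand-ins; F is built from SHA-256 purely for illustration. The flag with_length_block lets one drop the length block to observe the trivial collisions it prevents.

```python
import hashlib

N, R = 32, 32  # hypothetical sizes: n-bit chaining values, r-bit message blocks


def F(bits: str) -> str:
    """Toy compression function {0,1}^(N+R) -> {0,1}^N derived from SHA-256 (assumption)."""
    assert len(bits) == N + R
    digest = hashlib.sha256(bits.encode()).digest()
    return format(int.from_bytes(digest, "big") >> (256 - N), f"0{N}b")


def merkle_hash(x: str, with_length_block: bool = True) -> str:
    """Algorithm A.18 on a bit string x given as a str of '0'/'1' characters."""
    lam = len(x)
    assert lam < 2 ** R
    l = -(-lam // R)                   # l = ceil(lam / R)
    padded = x + "0" * (R * l - lam)   # zero-pad to a multiple of R
    blocks = [padded[i * R:(i + 1) * R] for i in range(l)]
    if with_length_block:
        blocks.append(format(lam, f"0{R}b"))  # block x_{l+1}: r-bit encoding of lam
    h = "0" * N                        # h_0 = 0^n
    for block in blocks:
        h = F(h + block)               # h_i = F(h_{i-1} || x_i)
    return h


x = "1011"
print(merkle_hash(x, with_length_block=False) == merkle_hash(x + "0", with_length_block=False))  # True
print(merkle_hash("", with_length_block=False) == "0" * N)  # True
print(merkle_hash(x) == merkle_hash(x + "0"))
```

Without the length block, x and x ‖ 0 yield identical padded blocks and hence the same hash, and the empty message hashes to 0^n. With the length block, the two messages enter the final call to F with different last blocks.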

We now show that if F possesses the properties desired for cryptographic use, then so does H.

Proposition A.2.

If F is first pre-image resistant, then so is H.

Proof

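Assume that H is not first pre-image resistant, that is, there is an efficient algorithm A that computes, for most (if not all) y ∈ {0, 1}^n, some x with H(x) = y. Given such a y, run A to obtain x, and then run Algorithm A.18 on x. Since y = h_{l+1} = F(h_l ‖ x_{l+1}), this yields a pre-image (namely, h_l ‖ x_{l+1}) of y under F with little additional effort, so F is not first pre-image resistant either.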

Proposition A.3.

If F is collision resistant, then H is collision resistant (and hence also second pre-image resistant).

Proof

Given a collision (x, x′) for H, we can find a collision for F with little additional effort. We use the notations of Algorithm A.18 with primed variables for x′.

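First consider l ≠ l′. Then, in particular, λ ≠ λ′, so the length blocks x_{l+1} and x′_{l′+1} are different. Since F(h_l ‖ x_{l+1}) = h_{l+1} = H(x) = H(x′) = h′_{l′+1} = F(h′_{l′} ‖ x′_{l′+1}), the pair (h_l ‖ x_{l+1}, h′_{l′} ‖ x′_{l′+1}) is a collision for F. So for the rest of the proof we take l = l′.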

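Now, suppose that h_i ≠ h′_i for some i ∈ {1, 2, . . . , l}. Choose the largest such i and note that h_{i+1} and h′_{i+1} are defined and equal for this choice (for i = l, this is the assumption H(x) = H(x′)). Since h_{i+1} = F(h_i ‖ x_{i+1}) and h′_{i+1} = F(h′_i ‖ x′_{i+1}), this gives us the collision (h_i ‖ x_{i+1}, h′_i ‖ x′_{i+1}) for F.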

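The only case that remains to be treated is h_i = h′_i for all i ∈ {1, 2, . . . , l}. Since x ≠ x′, there is at least one i ∈ {1, 2, . . . , l + 1} with x_i ≠ x′_i: either λ ≠ λ′ and the length blocks differ, or λ = λ′ and the padded messages differ in some block. For such an i, the equality F(h_{i–1} ‖ x_i) = h_i = h′_i = F(h′_{i–1} ‖ x′_i) implies that (h_{i–1} ‖ x_i, h′_{i–1} ‖ x′_i) is a collision for F.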
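
The proof is effectively an algorithm: walk backwards through the two chains of compression-function inputs until they first differ. The Python sketch below implements that walk on top of a toy version of Algorithm A.18. The compression function F is a hypothetical stand-in built from SHA-256 and deliberately truncated to N = 20 bits, so that a collision for H can be found by brute force and then converted into a collision for F.

```python
import hashlib
import os

N, R = 20, 32  # hypothetical toy sizes; a small N makes H-collisions easy to find


def F(bits: str) -> str:
    """Toy compression function {0,1}^(N+R) -> {0,1}^N derived from SHA-256 (assumption)."""
    digest = hashlib.sha256(bits.encode()).digest()
    return format(int.from_bytes(digest, "big") >> (256 - N), f"0{N}b")


def chain_inputs(x: str) -> list[str]:
    """The inputs h_{i-1} || x_i to F for i = 1, ..., l+1, as in Algorithm A.18."""
    lam = len(x)
    l = -(-lam // R)
    padded = x + "0" * (R * l - lam)
    blocks = [padded[i * R:(i + 1) * R] for i in range(l)] + [format(lam, f"0{R}b")]
    h, inputs = "0" * N, []
    for block in blocks:
        inputs.append(h + block)
        h = F(h + block)
    return inputs


def H(x: str) -> str:
    """The hash value h_{l+1}: F applied to the last chain input."""
    return F(chain_inputs(x)[-1])


def extract_F_collision(x: str, y: str) -> tuple[str, str]:
    """Given H(x) == H(y) with x != y, return distinct u, v with F(u) == F(v)."""
    a, b = chain_inputs(x), chain_inputs(y)
    if len(a) != len(b):   # l != l': the length blocks, hence the last inputs, differ
        return a[-1], b[-1]
    for u, v in zip(reversed(a), reversed(b)):  # walk back from i = l + 1
        if u != v:
            return u, v    # first difference from the end: the outputs still agree
    raise ValueError("all chain inputs agree, so x == y")


def find_H_collision(max_tries: int = 1 << 16):
    """Birthday search over random 48-bit messages."""
    seen = {}
    for _ in range(max_tries):
        x = format(int.from_bytes(os.urandom(6), "big"), "048b")
        h = H(x)
        if h in seen and seen[h] != x:
            return seen[h], x
        seen[h] = x
    return None


x, y = find_H_collision()
u, v = extract_F_collision(x, y)
print(u != v and F(u) == F(v))  # True
```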

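In order to design cryptographic hash functions, it therefore suffices to design cryptographic compression functions. Block ciphers can be used for that purpose. Let f be a block cipher with block size n and key size r. Take m := n + r and consider the map F : {0, 1}^m → {0, 1}^n that sends x = L ‖ R with L ∈ {0, 1}^n and R ∈ {0, 1}^r to the encrypted bit string f_R(L). Some care is needed here: since each f_R is a permutation that anyone knowing R can invert, this bare map is easy to invert (for any R, take L := f_R⁻¹(y)) and to find collisions for, even if the f_R are random permutations of {0, 1}^n. A feed-forward of the plaintext, as in the Davies–Meyer construction F(L ‖ R) := f_R(L) ⊕ L, removes this weakness; if the f_R are assumed to behave like random permutations of {0, 1}^n, the resulting compression function possesses the desirable properties.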
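
Because each f_R is a permutation that anyone who knows R can invert, the bare map L ‖ R ↦ f_R(L) can be attacked by decryption alone. The Python sketch below uses a hypothetical 32-bit toy Feistel cipher (round function derived from SHA-256; not DES or any standardized cipher) to exhibit a collision and a pre-image of that map. A feed-forward such as Davies–Meyer, F(L ‖ R) := f_R(L) ⊕ L, is the usual way to block this particular attack.

```python
import hashlib

HALF_BITS = 16
HALF_MASK = (1 << HALF_BITS) - 1


def round_fn(key: int, rnd: int, half: int) -> int:
    """Round function of a hypothetical toy Feistel cipher, derived from SHA-256."""
    data = key.to_bytes(8, "big") + bytes([rnd]) + half.to_bytes(2, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")


def encrypt(key: int, block: int, rounds: int = 8) -> int:
    """f_key: a permutation of 32-bit blocks for every key (Feistel structure)."""
    left, right = block >> HALF_BITS, block & HALF_MASK
    for rnd in range(rounds):
        left, right = right, left ^ round_fn(key, rnd, right)
    return (left << HALF_BITS) | right


def decrypt(key: int, block: int, rounds: int = 8) -> int:
    """Inverse of encrypt: undo the rounds in reverse order."""
    left, right = block >> HALF_BITS, block & HALF_MASK
    for rnd in reversed(range(rounds)):
        left, right = right ^ round_fn(key, rnd, left), left
    return (left << HALF_BITS) | right


def F(L: int, R: int) -> int:
    """Compression function F(L || R) := f_R(L) without feed-forward (n = r = 32)."""
    return encrypt(R, L)


# Collision: for any other key R2, solve f_R2(L2) = f_R1(L1) by decryption.
L1, R1 = 0x12345678, 0x0BADF00D
R2 = 0xCAFEBABE
L2 = decrypt(R2, F(L1, R1))
print(F(L1, R1) == F(L2, R2))  # True, although R1 != R2

# Pre-image: for a target y and any key R, L := f_R^{-1}(y) gives F(L, R) = y.
y = 0xDEADBEEF
print(F(decrypt(7, y), 7) == y)  # True
```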

A.4.2. The Secure Hash Algorithm

Several custom-designed hash functions have been widely used by the cryptography community. MD4 and MD5 are somewhat older 128-bit hash functions. Soon after its conception, MD4 was found to be vulnerable to several attacks. Collisions for the compression function of MD5 are also known. Therefore, these two hash functions have lost the desired level of confidence for cryptographic uses.

NIST has proposed a family of four hash algorithms. These algorithms are called secure hash algorithms and have the short names SHA-1, SHA-256, SHA-384 and SHA-512, which produce 160-, 256-, 384- and 512-bit hash values, respectively. When these algorithms were standardized, no collisions were known for any of them; a practical collision for SHA-1 was, however, publicly demonstrated in 2017. In the rest of this section, we explain the SHA-1 algorithm. The workings of the other SHA algorithms are very similar and can be found in the FIPS document [222]. RIPEMD-160 is another popular 160-bit hash function.

SHA-1 (like the other custom-designed hash functions mentioned above) is suited to implementation on 32-bit processors. Suppose that we want to compute the hash SHA-1(M) of a message M of bit length λ < 2⁶⁴. First, M is padded to get the bit string M′ := M ‖ 1 ‖ 0^k ‖ Λ, where Λ is the 64-bit representation of λ, and where k is the smallest non-negative integer for which the bit length of M′, that is, λ + 1 + k + 64, is a multiple of 512. M′ is broken into blocks M⁽¹⁾, M⁽²⁾, . . . , M^(l), each of length 512 bits. Each M⁽ⁱ⁾ is represented as a collection of sixteen 32-bit words M_j^(i), j = 0, 1, . . . , 15. SHA-1 uses big-endian packing, that is, M_0^(i) stores the leftmost 32 bits of M⁽ⁱ⁾, M_1^(i) the next 32 bits of M⁽ⁱ⁾, . . . , M_15^(i) the rightmost 32 bits of M⁽ⁱ⁾.
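
A minimal Python sketch of the padding and word-splitting steps, assuming byte-aligned messages (SHA-1 itself accepts any bit length λ < 2⁶⁴). With whole bytes, the bit 1 followed by seven zero bits is the byte 0x80, and the struct formats '>Q' and '>16I' produce the big-endian 64-bit length Λ and the sixteen big-endian words M_j^(i).

```python
import struct


def sha1_pad(message: bytes) -> bytes:
    """Return M' = M || 1 || 0^k || Lambda for a byte-aligned message M."""
    bit_len = 8 * len(message)
    padded = message + b"\x80"                     # the 1 bit plus seven 0 bits
    padded += b"\x00" * ((56 - len(padded)) % 64)  # zeros up to 448 mod 512 bits
    padded += struct.pack(">Q", bit_len)           # Lambda: 64-bit big-endian length
    return padded


def blocks_as_words(padded: bytes) -> list[tuple[int, ...]]:
    """Split M' into 512-bit blocks, each as sixteen big-endian 32-bit words."""
    return [struct.unpack(">16I", padded[i:i + 64]) for i in range(0, len(padded), 64)]


words = blocks_as_words(sha1_pad(b"abc"))
print(len(words), hex(words[0][0]), words[0][15])  # 1 0x61626380 24
```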

The SHA-1 computations are given in Algorithm A.19. One starts with a fixed initial 160-bit hash H⁽⁰⁾. Successively for i = 1, 2, . . . , l the i-th message block M⁽ⁱ⁾ is considered and the previous hash value H^(i–1) is updated to H⁽ⁱ⁾. At the end of the loop the 160-bit string H^(l) is returned as SHA-1(M). Each H⁽ⁱ⁾ is represented by five 32-bit words H_j^(i), j = 0, 1, 2, 3, 4. Here too, big-endian notation is used, that is, H_0^(i) stores the leftmost 32 bits of H⁽ⁱ⁾, . . . , H_4^(i) the rightmost 32 bits of H⁽ⁱ⁾.

The updating procedure uses logical functions f_j. Here, a product (like xy) denotes bit-wise AND, a bar (as in x̄) denotes bit-wise complementation and ⊕ denotes bit-wise XOR, each on 32-bit operands. The notation LR^k(z) (resp. RR^k(z)) stands for a left (resp. right) rotation, that is, a cyclic left (resp. right) shift, of the 32-bit string z by k positions.
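
The rotations and logical functions translate directly into operations on Python integers masked to 32 bits. A sketch, assuming the standard SHA-1 definitions of f_j from FIPS 180 (choice for rounds 0 to 19, parity for 20 to 39, majority for 40 to 59, parity for 60 to 79); the names LR, RR and f merely mirror the notation of the text.

```python
MASK32 = 0xFFFFFFFF


def LR(k: int, z: int) -> int:
    """Left rotation LR^k of a 32-bit word z by k positions (0 < k < 32)."""
    return ((z << k) | (z >> (32 - k))) & MASK32


def RR(k: int, z: int) -> int:
    """Right rotation RR^k of a 32-bit word z by k positions (0 < k < 32)."""
    return ((z >> k) | (z << (32 - k))) & MASK32


def f(j: int, x: int, y: int, z: int) -> int:
    """SHA-1 logical function f_j on 32-bit words (product = AND, bar = NOT)."""
    if j < 20:
        return ((x & y) ^ (~x & z)) & MASK32  # xy XOR (not x)z: x chooses y or z
    if j < 40:
        return x ^ y ^ z                      # parity
    if j < 60:
        return (x & y) ^ (x & z) ^ (y & z)    # majority
    return x ^ y ^ z                          # parity


print(RR(2, 0x12345678) == LR(30, 0x12345678))  # True
```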

The bits of H⁽ⁱ⁾ are well-defined transformations of the bits of H^(i–1) under the guidance of the bits of M⁽ⁱ⁾. The substantial non-linearity introduced by the functions f_j and the modulo 2³² sums makes it difficult to invert the transformation H^(i–1) ↦ H⁽ⁱ⁾ and thereby makes SHA-1 an (apparently) secure hash function.

Algorithm A.19. The SHA-1 algorithm

Input: A message M.

Output: The hash SHA-1(M) of M.

Steps:

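Generate the message blocks M⁽ⁱ⁾, i = 1, 2, . . . , l.
/* Initialize the hash value */
H⁽⁰⁾ := 0x67452301 efcdab89 98badcfe 10325476 c3d2e1f0.
for i = 1, 2, . . . , l {
   /* Compute the message schedule W_j, 0 ≤ j ≤ 79. */
   for j = 0, 1, . . . , 15 { W_j := M_j^(i) }
   for j = 16, 17, . . . , 79 { W_j := LR¹(W_{j–3} ⊕ W_{j–8} ⊕ W_{j–14} ⊕ W_{j–16}) }
   /* Store the previous hash words */
   for j = 0, 1, . . . , 4 { t_j := H_j^(i–1) }
   /* Compute the updating values; + denotes addition modulo 2³² */
   for j = 0, 1, . . . , 79 {
      T := LR⁵(t₀) + f_j(t₁, t₂, t₃) + t₄ + K_j + W_j, where

         f_j(x, y, z) := xy ⊕ x̄z        for 0 ≤ j ≤ 19,
                         x ⊕ y ⊕ z       for 20 ≤ j ≤ 39,
                         xy ⊕ xz ⊕ yz    for 40 ≤ j ≤ 59,
                         x ⊕ y ⊕ z       for 60 ≤ j ≤ 79,

      and

         K_j := 0x5a827999   for 0 ≤ j ≤ 19,
                0x6ed9eba1   for 20 ≤ j ≤ 39,
                0x8f1bbcdc   for 40 ≤ j ≤ 59,
                0xca62c1d6   for 60 ≤ j ≤ 79.

      t₄ := t₃, t₃ := t₂, t₂ := RR²(t₁), t₁ := t₀, t₀ := T.
   }
   /* Update the hash value */
   for j = 0, 1, . . . , 4 { H_j^(i) := H_j^(i–1) + t_j }
}
Set SHA-1(M) := H^(l).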

A test vector for SHA-1 is the following (here 616263 is the hexadecimal encoding of the ASCII string "abc"):

SHA-1(616263) = a9993e364706816aba3e25717850c26c9cd0d89d.
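
Putting the pieces together, the following Python sketch transcribes Algorithm A.19 for byte-aligned messages and reproduces the test vector above. It is meant for study only: it is slow, and SHA-1 is no longer suitable where collision resistance matters. For real use, the library routine hashlib.sha1 computes the same function.

```python
import hashlib
import struct

MASK32 = 0xFFFFFFFF
# K_j for the four ranges of 20 rounds each.
K = [0x5A827999] * 20 + [0x6ED9EBA1] * 20 + [0x8F1BBCDC] * 20 + [0xCA62C1D6] * 20


def LR(k, z):
    """Left rotation of a 32-bit word."""
    return ((z << k) | (z >> (32 - k))) & MASK32


def RR(k, z):
    """Right rotation of a 32-bit word."""
    return ((z >> k) | (z << (32 - k))) & MASK32


def f(j, x, y, z):
    """Logical functions f_j: choice, parity, majority, parity."""
    if j < 20:
        return ((x & y) ^ (~x & z)) & MASK32
    if j < 40 or j >= 60:
        return x ^ y ^ z
    return (x & y) ^ (x & z) ^ (y & z)


def sha1(message: bytes) -> str:
    """SHA-1 of a byte-aligned message, returned as a hex string."""
    # Padding: M' = M || 1 || 0^k || Lambda.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack(">Q", 8 * len(message))
    H = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]  # H^(0)
    for start in range(0, len(padded), 64):
        # Message schedule W_0, ..., W_79 from the sixteen big-endian words.
        W = list(struct.unpack(">16I", padded[start:start + 64]))
        for j in range(16, 80):
            W.append(LR(1, W[j - 3] ^ W[j - 8] ^ W[j - 14] ^ W[j - 16]))
        t0, t1, t2, t3, t4 = H  # store the previous hash words
        for j in range(80):
            T = (LR(5, t0) + f(j, t1, t2, t3) + t4 + K[j] + W[j]) & MASK32
            t4, t3, t2, t1, t0 = t3, t2, RR(2, t1), t0, T
        # H_j^(i) := H_j^(i-1) + t_j (mod 2^32)
        H = [(h + t) & MASK32 for h, t in zip(H, (t0, t1, t2, t3, t4))]
    return b"".join(struct.pack(">I", h) for h in H).hex()


print(sha1(b"abc"))  # a9993e364706816aba3e25717850c26c9cd0d89d
```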

Exercise Set A.4

A.18	Let x be a bit string. Break up x into blocks x₁, . . . , x_l each of bit size n (after padding, if necessary). Define H₁(x) := x₁ ⊕ . . . ⊕ x_l. Show that H₁ possesses none of the desirable properties of a cryptographic hash function.
A.19	Let H be an n-bit cryptographic hash function and S a finite set of strings with #S ≥ 2. Define the function H₂ by H₂(x) := 0ⁿ⁺¹ if x ∈ S, and H₂(x) := 1 ‖ H(x) if x ∉ S. Here, 0ⁿ⁺¹ refers to a bit string of length n + 1 containing zero-bits only. Show that H₂ is second pre-image resistant, but not collision resistant. [H]
A.20	Let H be an n-bit cryptographic hash function. Show that the function H₃ defined as H₃(x) := 1 ‖ x if x has bit length n, and H₃(x) := 0 ‖ H(x) otherwise, is collision resistant (and hence second pre-image resistant), but not first pre-image resistant. [H]
A.21	Let m be a product of two (unknown) big primes and let the binary representation of m (with leading one-bit) have n bits. Assume that it is computationally infeasible to compute square roots modulo m. We can identify bit strings with integers in a natural way. For a bit string x, take y := 1 ‖ x and let H₄(x) denote the n-bit binary representation of y² (mod m). Show that H₄ is first pre-image resistant, but not second pre-image resistant (and hence not collision-resistant). [H]
A.22	Let H be an n-bit cryptographic hash function. Assume that H produces random hash values on random input strings. Prove that O(2^{n/2}) hash values need to be computed to detect a collision for H with high probability. [H] Deduce also that nearly 2^{n–1} hash values need to be computed on average to obtain a second pre-image x′ of H(x).
A.23	Let F₁ : {0, 1}^{2n} → {0, 1}^n be a collision resistant compression function. Define a compression function F₂ : {0, 1}^{4n} → {0, 1}^n as follows. Let x be a bit string of length 4n. Write x = L ‖ R, where each of L and R is of length 2n bits. Define F₂(x) := F₁(F₁(L) ‖ F₁(R)). Show that F₂ is also collision resistant. Inductively define F_k : {0, 1}^{2^k n} → {0, 1}^n as F_k(x) := F₁(F_{k–1}(L) ‖ F_{k–1}(R)), where L and R are the left and right halves of x. Show that each F_k is collision resistant. Show that if F₁ is first pre-image resistant, then so is each F_k. Define an n-bit hash function H as follows. Let x be a bit string of length l. If l < n, take k := 1, else choose k ∈ ℕ such that 2^{k–1}n ≤ l < 2^k n. Construct the string y := x ‖ 1 ‖ 0^{2^k n – l – 1} and define H(x) := F_k(y). Is H collision resistant? [H] (Appending a one-bit at the end of x delimits x and thereby prevents trivial collisions.)
A.24	(a) Let F₁ : {0, 1}^{m₁} → {0, 1}^{n₁} and F₂ : {0, 1}^{m₂} → {0, 1}^{n₂} be cryptographic compression functions. Show that F : {0, 1}^{m₁+m₂} → {0, 1}^{n₁+n₂} defined as F(L ‖ R) := F₁(L) ‖ F₂(R) (where L ∈ {0, 1}^{m₁} and R ∈ {0, 1}^{m₂}) is again a cryptographic compression function. (b) The hash function H derived from DES (Section A.4.1) produces 64-bit hash values. For reasonable security, we require n-bit hash values with n at least 128. Use Part (a) to propose a method to make H achieve this desired level of security.
A.25	Assume that in the SHA-1 algorithm the designers opted for Algorithm A.19 with the following minor modifications: they defined f_j as f_j(x, y, z) := x ⊕ y ⊕ z for all j = 0, 1, . . . , 79, and they replaced all costly mod 2³² addition operations (+) by cheap bit-wise XOR operations (⊕). Do you sense anything wrong with this design? [H]