7.3 Differential Cryptanalysis

This section is rather technical and can be skipped on a first reading.

Differential cryptanalysis was introduced by Biham and Shamir around 1990, though it was probably known much earlier to the designers of DES at IBM and NSA. The idea is to compare the differences in the ciphertexts for suitably chosen pairs of plaintexts and thereby deduce information about the key. Note that the difference of two strings of bits can be found by XORing them. Because the key is introduced by XORing with E(Ri1), looking at the XOR of the inputs removes the effect of the key at this stage and hence removes some of the randomness introduced by the key. We’ll see that this allows us to deduce information as to what the key could be.

7.3.1 Differential Cryptanalysis for Three Rounds

We eventually want to describe how to attack the above system when it uses four rounds, but we need to start by analyzing three rounds. Therefore, we temporarily start with L1R1 instead of L0R0.

The situation is now as follows. We have obtained access to a three-round encryption device that uses the preceding procedure. We know all the inner workings of the encryption algorithm such as the S-boxes, but we do not know the key. We want to find the key by a chosen plaintext attack. We use various inputs L1R1 and obtain outputs L4R4.

We have

R2=L1f(R1, K2)L3=R2=L1f(R1, K2)R4=L3f(R3, K4)=L1f(R1, K2)f(R3, K4).

Suppose we have another message L1R1 with R1=R1. For each i, let Ri=RiRi and Li=LiLi. Then LiRi is the “difference” (or sum; we are working mod 2) of LiRi and LiRi. The preceding calculation applied to L1R1 yields a formula for R4. Since we have assumed that R1=R1, we have f(R1, K2)=f(R1, K2). Therefore, f(R1, K2)f(R1, K2)=0 and

R4=R4R4=L1f(R3, K4)f(R3, K4).

This may be rearranged to

R4L1=f(R3, K4)f(R3, K4).

Finally, since R3=L4 and R3=L4, we obtain

R4L1=f(L4, K4)f(L4, K4).

Note that if we know the input XOR, namely L1R1, and if we know the outputs L4R4 and L4R4, then we know everything in this last equation except K4.

Now let’s analyze the inputs to the S-boxes used to calculate f(L4, K4) and f(L4, K4). If we start with L4, we first expand and then XOR with K4 to obtain E(L4)K4, which are the bits sent to S1 and S2. Similarly, L4 yields E(L4)K4. The XOR of these is

E(L4)E(L4)=E(L4L4)=E(L)4

(the first equality follows easily from the bit-by-bit description of the expansion function). Therefore, we know

  1. the XORs of the inputs to the two S-boxes (namely, the first four and the last four bits of E(L4));

  2. the XORs of the two outputs (namely, the first three and the last three bits of R4L1).

Let’s restrict our attention to S1. The analysis for S2 will be similar. It is fairly fast to run through all pairs of four-bit inputs with a given XOR (there are only 16 of them) and see which ones give a desired output XOR. These can be computed once for all and stored in a table.

For example, suppose we have input XOR equal to 1011 and we are looking for output XOR equal to 100. We can run through the input pairs (1011, 0000), (1010, 0001), (1001, 0010), ..., each of which has XOR equal to 1011, and look at the output XORs. We find that the pairs (1010, 0001) and (0001, 1010) both produce output XORs 100. For example, 1010 means we look at the second row, third column of S1, which is 110. Moreover, 0001 means we look at the first row, second column, which is 010. The output XOR is therefore 110010=100.

We know L4 and L4. For example, suppose L4=101110 and L4=000010. Therefore, E(L4)=10111110 and E(L4)=00000010, so the inputs to S1 are 1011K4L and 0000K4L, where K4L denotes the left four bits of K4. If we know that the output XOR for S1 is 100, then (1011K4L, 0000K4L) must be one of the pairs on the list we just calculated, namely (1010, 0001) and (0001, 1010). This means that K4L=0001 or 1010.

If we repeat this procedure a few more times, we should be able to eliminate one of the two choices for K4 and hence determine four bits of K. Similarly, using S2, we find four more bits of K. We therefore know eight of the nine bits of K. The last bit can be found by trying both possibilities and seeing which one produces the same encryptions as the machine we are attacking.

Here is a summary of the procedure (for notational convenience, we describe it with both S-boxes used simultaneously, though in the examples we work with the S-boxes separately):

  1. Look at the list of pairs with input XOR=E(L4) and output XOR=R4L1.

  2. The pair (E(L4)K4, E(L4)K4) is on this list.

  3. Deduce the possibilities for K4.

  4. Repeat until only one possibility for K4 remains.

Example

We start with

L1R1=000111011011

and the machine encrypts in three rounds using the key K=001001101, though we do not yet know K. We obtain (note that since we are starting with L1R1, we start with the shifted key K2=01001101)

L4R4=000011100101.

If we start with

L1R1=101110011011

(note that R1=R1), then

L4R4=100100011000.

We have E(L4)=00000011 and E(L4)=10101000. The inputs to S1 have XOR equal to 1010 and the inputs to S2 have XOR equal to 1011. The S-boxes have output XOR R4L11=111101101001=010100, so the output XOR from S1 is 010 and that from S2 is 100.

For the pairs (1001, 0011), (0011, 1001), S1 produces output XOR equal to 010. Since the first member of one of these pairs should be the left four bits of E(L4)K4=0000K4, the first four bits of K4 are in {1001, 0011}. For the pairs (1100, 0111), (0111, 1100), S2 produces output XOR equal to 100. Since the first member of one of these pairs should be the right four bits of E(L4)K4=0011K4, the last four bits of K4 are in {1111, 0100}.

Now repeat (with the same machine and the same key K) and with

L1R1=010111011011 and L1R1=101110011011.

A similar analysis shows that the first four bits of K4 are in {0011, 1000} and the last four bits are in {0100, 1011}. Combining this with the previous information, we see that the first four bits of K4 are 0011 and the last four bits are 0100. Therefore, K=00001101 (recall that K4 starts with the fourth bit of K.

It remains to find the third bit of K. If we use K=000001101, it encrypts L1R1 to 001011101010, which is not L4R4, while K=001001101 yields the correct encryption. Therefore, the key is K=001001101.

7.3.2 Differential Cryptanalysis for Four Rounds

Suppose now that we have obtained access to a four-round device. Again, we know all the inner workings of the algorithm except the key, and we want to determine the key. The analysis we used for three rounds still applies, but to extend it to four rounds we need to use more probabilistic techniques.

There is a weakness in the box S1. If we look at the 16 input pairs with XOR equal to 0011, we discover that 12 of them have output XOR equal to 011. Of course, we expect on the average that two pairs should yield a given output XOR, so the present case is rather extreme. A little variation is to be expected; we’ll see that this large variation makes it easy to find the key.

There is a similar weakness in S2, though not quite as extreme. Among the 16 input pairs with XOR equal to 1100, there are eight with output XOR equal to 010.

Suppose now that we start with randomly chosen R0 and R0 such that R0=R0R0=001100. This is expanded to E(001100)=00111100. Therefore the input XOR for S1 is 0011 and the input XOR for S2 is 1100. With probability 12/16 the output XOR for S1 will be 011, and with probability 8/16 the output XOR for S2 will be 010. If we assume the outputs of the two S-boxes are independent, we see that the combined output XOR will be 011010 with probability (12/16)(8/16) = 3/8. Because the expansion function sends bits 3 and 4 to both S1 and S2, the two boxes cannot be assumed to have independent outputs, but 3/8 should still be a reasonable estimate for what happens.

Now suppose we choose L0 and L0 so that L0=L0L0=011010. Recall that in the encryption algorithm the output of the S-boxes is XORed with L0 to obtain R1. Suppose the output XOR of the S-boxes is 011010. Then R1=011010L0=000000. Since R1=R1R1, it follows that R1=R1*.

Putting everything together, we see that if we start with two randomly chosen messages with XOR equal to L0R0=011010001100, then there is a probability of around 3/8 that L1R1=001100000000.

Here’s the strategy for finding the key. Try several randomly chosen pairs of inputs with XOR equal to 011010001100. Look at the outputs L4R4 and L4R4. Assume that L1R1=001100000000. Then use three-round differential cryptanalysis with L1=001100 and the known outputs to deduce a set of possible keys K4. When L1R1=001100000000, which should happen around 3/8 of the time, this list of keys will contain K4, along with some other random keys. The remaining 5/8 of the time, the list should contain random keys. Since there seems to be no reason that any incorrect key should appear frequently, the correct key K4 will probably appear in the lists of keys more often than the other keys.

Here is an example. Suppose we are attacking a four-round device. We try one hundred random pairs of inputs L0R0 and L0R0=L0R0011010001100. The frequencies of possible keys we obtain are in the following table. We find it easier to look at the first four bits and the last four bits of K4 separately.

A table shows a table of frequencies.

It is therefore likely that K4=11000010. Therefore, the key K is 10*110000.

To determine the remaining bit, we proceed as before. We can compute that 000000000000 is encrypted to 100011001011 using K=101110000 and is encrypted to 001011011010 using K=100110000. If the machine we are attacking encrypts 000000000000 to 100011001011, we conclude that the second key cannot be correct, so the correct key is probably K=101110000.

The preceding attack can be extended to more rounds by extensions of these methods. It might be noticed that we could have obtained the key at least as quickly by simply running through all possibilities for the key. That is certainly true in this simple model. However, in more elaborate systems such as DES, differential cryptanalytic techniques are much more efficient than exhaustive searching through all keys, at least until the number of rounds becomes fairly large. In particular, the reason that DES uses 16 rounds appears to be because differential cryptanalysis is more efficient than exhaustive search until 16 rounds are used.

There is another attack on DES, called linear cryptanalysis, that was developed by Mitsuru Matsui [Matsui]. The main ingredient is an approximation of DES by a linear function of the input bits. It is theoretically faster than an exhaustive search for the key and requires around 243 plaintext–ciphertext pairs to find the key. It seems that the designers of DES had not anticipated linear cryptanalysis. For details of the method, see [Matsui].

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.138.113.188