Implications: Subtle Differences

The advanced set of optimizations employed in today’s processors results in an interesting set of consequences. We observe that execution times depend on the following characteristics, which can be divided into three groups:

Type of instruction and the complexity of the operation. Some operations execute much faster than others.
Operand values. Certain multiple-cycle algorithms prove faster for trivial inputs. For example, multiplying a value by 0 is generally rather trivial and can be done quickly.
The memory location from which the data needed for the instruction must be retrieved. Cached memory is available sooner.
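These operand-dependent timings can be observed even from a high-level language. The following Python sketch is a toy harness, not a precise benchmark; it contrasts a trivial multiplication with a large-integer one using the interpreter's wall clock:

```python
import time

def time_op(fn, reps=100_000):
    """Return the average wall-clock time per call, in nanoseconds."""
    start = time.perf_counter_ns()
    for _ in range(reps):
        fn()
    return (time.perf_counter_ns() - start) / reps

# Operand-value dependence: multiplying by 0 is trivial, while a
# 2,000-digit multiplication keeps the arithmetic machinery busy.
big = 10**2000
small = time_op(lambda: 12345 * 0)
large = time_op(lambda: big * (big - 1), reps=10_000)
print(f"x * 0:     {small:10.1f} ns/op")
print(f"big * big: {large:10.1f} ns/op")
```

On any current machine the second figure should be orders of magnitude larger, illustrating the operand-value dependence described above.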

The importance, prevalence, and impact of each of these characteristics depends on the exact nature of the CPU architecture in question. The first characteristic, variable instruction execution times, is shared by all multi-cycle architectures but might be absent on some basic chips. The second, dependence on operand values, is increasingly rare in top-of-the-line processors.

In top-end devices, ALU and Floating Point Unit (FPU) components sometimes work at a speed higher than the CPU itself. Hence, even if there are computation speed differences, they cannot be precisely measured because much of the arithmetic is done within one CPU clock tick.

The last group of timing patterns, memory location dependence, is, for a change, exclusive to today's high-performance computers and is unheard of in low-end controllers and various embedded designs.

The first two timing pattern groups, operation complexity and operand value dependencies, can also manifest themselves on a level slightly higher than the CPU itself, namely software. Processors feature arithmetic units that deal well with fairly small integers (usually from 8 to 128 bits) and some floating-point numbers, but today's cryptography and many other applications require the manipulation of large numbers (often hundreds or thousands of digits), high-precision floats, or various mathematical operations that are not implemented in hardware. Therefore, this functionality is commonly implemented in software libraries. Algorithms in those libraries are again likely to take variable time, depending on the specifics of the operation and operands.
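As an illustration, here is a minimal sketch of how a software bignum routine might multiply digit by digit. The `schoolbook_mul` function and its digit-list representation are hypothetical, but the data-dependent shortcut in its outer loop is typical of the variable-time behavior described above:

```python
def schoolbook_mul(a_digits, b_digits, base=10):
    """Multiply two little-endian digit lists the long way.

    The inner loop is skipped entirely for zero digits of a -- a common
    micro-optimization that makes running time depend on the operands.
    """
    result = [0] * (len(a_digits) + len(b_digits))
    for i, da in enumerate(a_digits):
        if da == 0:
            continue                    # data-dependent shortcut: leaks operand info
        carry = 0
        for j, db in enumerate(b_digits):
            s = result[i + j] + da * db + carry
            result[i + j] = s % base
            carry = s // base
        result[i + len(b_digits)] += carry
    return result

# 123 * 456, digits stored least-significant first
print(schoolbook_mul([3, 2, 1], [6, 5, 4]))   # → [8, 8, 0, 6, 5, 0], i.e. 56088
```

An operand with many zero digits finishes measurably sooner, so the running time says something about the operand's value.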

Using Timing Patterns to Reconstruct Data

It can be argued that an attacker could deduce certain properties of the operands or of an operation performed by monitoring how long it takes for a program to process data. This poses a potential security risk because in several scenarios, at least one of the operands can be a secret value that is not supposed to be disclosed to a third party.

Although the concept of recovering data by watching someone, stopwatch in hand, might sound surreal, today's CPUs offer precise counters that allow parties to determine exact time intervals. Moreover, some operations can be considerably more time-consuming, with certain advanced opcodes on the Intel platform taking as many as thousands of cycles to complete. With ever-increasing network throughput and ever-improving response times, it is not entirely impossible to deduce this information, even from a remote system.

The nature of the information leaked through computation-time measurements may not be immediately clear. Paul Kocher of Cryptography Research demonstrated a great example of such an attack last century (that is, back in the '90s[54]), using the RSA algorithm we discussed in Chapter 1.

Bit by Bit . . .

Kocher observed that the process of decrypting data in the RSA algorithm is rather simple and is based on solving the following equation:

T = c^k mod M

in which T is the decrypted message, c is the encrypted message, k is the secret key, and M is the modulus, which is part of the key pair.
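In Python, this whole equation is one call to the built-in three-argument `pow`. The sketch below uses the standard textbook toy key (M = 61 * 53 = 3233, public exponent 17, secret exponent k = 2753), which is of course far too small for real use:

```python
M = 3233          # modulus: 61 * 53, part of the key pair
e = 17            # public exponent (toy value)
k = 2753          # secret exponent (toy value)

plaintext = 65
c = pow(plaintext, e, M)   # encrypt:  c = plaintext^e mod M
T = pow(c, k, M)           # decrypt:  T = c^k mod M
print(T == plaintext)      # → True
```

Note that a general-purpose `pow` makes no promise of running in constant time; a decryption routine built on it inherits any timing patterns of the underlying bignum arithmetic.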

A trivial integer modulo exponentiation algorithm used in a typical implementation has an important property: if a specific bit of the exponent is 1, a portion of the result is calculated by performing modulo multiplication on a portion of the base (some bits of c). If the bit is 0, the step is skipped. And even when the step is not skipped, the time needed by software to carry out the multiplication varies, as indicated earlier: the most trivial cases, such as multiplying by a power of 2, are solved more quickly than others.
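The algorithm in question is the classic left-to-right square-and-multiply. A minimal Python sketch, deliberately variable-time in exactly the way a hardened library must avoid, looks like this:

```python
def modexp(base, exp, mod):
    """Variable-time square-and-multiply modular exponentiation."""
    result = 1
    for bit in bin(exp)[2:]:              # exponent bits, MSB first
        result = (result * result) % mod  # always square
        if bit == '1':                    # extra multiply only for 1-bits
            result = (result * base) % mod
    return result

print(modexp(2790, 2753, 3233))   # → 65
```

Each 1-bit of the exponent costs one extra modular multiplication, so the total running time is correlated with the bits of the key.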

Hence, on such a system, it would appear that we can determine plenty of information about the key (k) by repeatedly checking how long it takes to decrypt a piece of information. Even on platforms where hardware multiplication takes a fixed amount of time, a timing pattern often results from the software multiplication algorithms (such as the Karatsuba algorithm) needed to process large numbers like the ones used in public key cryptography. Subsequent bits of the exponent make up the private key, whereas the base is a representation of the message supplied by, or visible to, the curious bystander.

The attack itself is rather simple. The attacker sends the victim two similar but slightly different pieces of encrypted data. These differ in a section X, chosen so that the two variants should take different amounts of time to decrypt. One variant of X is, as far as the attacker's understanding of the victim's modulo multiplication implementation goes, a trivial case that should make decrypting X fast; the other variant is expected to take more time.

If it takes the victim the same amount of time to decode and respond to both sequences, the attacker can safely assume that the part of the key used to decode section X consisted of zeros, and that the multiplication algorithm took the early optimization path of not performing any multiplication at all.

If, on the other hand, one of the scenarios takes more time, it is clear that the multiplication was carried out in both cases, with one case simply being easier to solve. The corresponding part of the secret key must therefore contain nonzero bits.

By following this procedure, treating subsequent bits of the encrypted message as our “section X” and generating, or even (if one has more time) simply waiting for encrypted messages that will happen to work with this scenario, it is possible to reconstruct every bit of the key.
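A toy simulation makes the leak concrete. Below, the number of multiplications performed stands in for elapsed time; the secret exponent and modulus are made-up values for illustration only:

```python
def modexp_counted(base, exp, mod):
    """Square-and-multiply that also reports its cost in multiplications,
    standing in for a timing measurement of the real operation."""
    result, cost = 1, 0
    for bit in bin(exp)[2:]:
        result = (result * result) % mod   # squaring: always happens
        cost += 1
        if bit == '1':                     # multiplication: only for 1-bits
            result = (result * base) % mod
            cost += 1
    return result, cost

secret_key = 0b1011001                     # 7 bits, four of them set
_, cost = modexp_counted(5, secret_key, 101)

# cost = (one squaring per bit) + (one multiplication per 1-bit),
# so the observed "time" directly reveals the key's Hamming weight:
leaked_weight = cost - secret_key.bit_length()
print(leaked_weight)                       # → 4
```

A real attack refines this coarse signal bit by bit, as described above, by choosing inputs that make individual multiplication steps cheap or expensive.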

Note

Kocher's research suggests that this approach can be successfully extended to just about any algorithm that runs in variable time, and it discusses some practical optimizations for the attack, such as the ability to deploy limited error detection and correction functionality.
