Side-Channel Attacks

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

7.2. Side-Channel Attacks

Side-channel attacks refer to a class of cryptanalytic tools for determining a private key by measuring signals (like timing, power fluctuation, electromagnetic radiation) from or by inducing faults in the device performing operations involving the private key. In this section, we describe three methods of side-channel cryptanalysis: timing attack, power attack and fault attack.

7.2.1. Timing Attack

Paul C. Kocher introduced the concept of side-channel cryptanalysis in his seminal paper [155] on timing attacks. Though not unreasonable, timing attacks are somewhat difficult to mount in practice.

Details of the attack

The private-key operation in many cryptographic systems (like RSA or discrete-log-based systems) is usually a modular exponentiation of the form

y := x^d (mod n),

where d is the private key. The private-key procedure may involve other overheads (like message decoding), but the running time of the routine is usually dominated by and so can be approximated by the time of the modular exponentiation.

Assume that this exponentiation is carried out by a square-and-multiply algorithm known to Carol, the attacker. For example, suppose that Algorithm 3.9 is used. Each iteration of the for loop involves a modular squaring followed conditionally by a modular multiplication. The multiplication is done in an iteration if and only if the corresponding bit e_i in the exponent is 1. Thus, an iteration runs slower if e_i = 1 than if e_i = 0. If Carol could measure the timing of each individual iteration of the for loop, she would correctly guess most (if not all) of the bits in the exponent. But it is unreasonable to assume that an attacker can collect such detailed timing data. Moreover, if Algorithm 3.10 is used, these detailed data do not help much, because in this case the timing of an individual iteration of the for loop can at best differentiate between the two cases e_i = 0 and e_i ≠ 0. There are 2^t – 1 non-zero values for each e_i.

However, it is not difficult to think of a situation where the attacker can measure, to a reasonable accuracy, the total time of the exponentiation. In order to guess d, Carol requires the times of the modular exponentiations for several different values of x, say x₁, . . . , x_k, all known to her. (Note that x_i may be messages to be signed or intercepted ciphertexts.) The same exponent d is used for all these exponentiations. Let T_i be the time for computing (mod n), as measured by Carol. We may assume that all these k exponentiations are carried out on the same machine using the same routine.

Kocher considers the attack on the exponentiation routine of RSAREF, a cryptography toolkit available from the RSA Laboratories. This routine implements Algorithm 3.10 with t = 2. For the sake of convenience, the algorithm is reproduced below. We may assume that the exponent has an even number of bits—if not, pad a leading zero.

Algorithm 7.1. RSAREF’s exponentiation routine

Input: , and d = (d_2l–1d_2l–2 · · · d₁d₀)₂.

Output: y := x^d (mod n).

Steps:

(1)  z₁ := x.
(2)  z₂ := z₁x (mod n).
(3)  z₃ := z₂x (mod n).
(4)  y := 1.
(5)  for j = l - 1, . . . , 0 {
(6)     y := y² (mod n).
(7)     y := y² (mod n).
(8)     if ((d_2j+1d_2j)₂ ≠ 0) {
(9)         y := yz_{(d_2j+1d_2j)₂} (mod n).
(10)     }
(11)  }

Every step of the above algorithm runs in a time dependent on the operands. For example, the modular multiplication in Step (9) takes time dependent on the operands y and z_{(d_2j+1d_2j)₂}. The variation in the timing depends on the implementation of the modular arithmetic routines and also on the machine’s architecture. However, we make the assumption that for fixed operands each step requires a constant time on a given machine (or on identical machines). This is actually a loss of generality, since the running time of a complex step (like modular multiplication or squaring) for fixed operands may vary for various reasons like process scheduling, availability of cache, page faults and so on. It may be difficult, perhaps impossible, for an attacker to arrange for herself a verbatim emulation of the victim’s machine at the time when the latter performed the private-key operations. Let us still proceed with our assumption, say by conceiving of a not-so-unreasonable situation where the effects of these other factors are not sizable enough.

We use the subscript i to denote the i-th private-key operation for 1 ≤ i ≤ k. The entire routine takes time T_i for the i-th exponentiation, that is, for the input x_i. This measurement may involve some (unknown) error which we denote by e_i. The first four steps are executed only once during each call and take a total time of p_i (precomputation time). The for loop is executed l times. We ignore the time needed to maintain the loop (like decrementing j) and also the time taken by the if statement in Step (8). Let s_i,j and t_i,j be the times taken respectively by Steps (6) and (7), when the loop variable (j) assumes the value j. If Step (9) is executed, we denote by m_i,j the time taken by this step, else we set m_i,j := 0. It follows that

Equation 7.1

where the index in the sum decreases from l – 1 to 0 in steps of 1. Carol does not know this break-up (that is, the explicit values of e_i, s_i,j, t_i,j and m_i,j), but she can make an inductive guess in the following way.

Carol manages a machine and a copy of the exponentiation software both identical to those of the victim. She then successively guesses the secret bit pairs d_2l–1d_2l–2, d_2l–3d_2l–4, d_2l–5d_2l–6 and so on. Assume that at some stage Carol has correctly determined the exponent bits d_2j+1d_2j for j = l–1, l–2, . . . , j′+1. Initially j′ = l–1. Using this information Carol computes d_2j′ +1d_2j′ as follows. Carol’s knowledge at this stage allows her to measure p_i and s_i,j, t_i,j, m_i,j for j = l – 1, . . . , j′ + 1 — she simply runs Algorithm 7.1 on x_i. Carol then enters the loop with j = j′. The squaring operations are unconditional. Carol has the exact operands as the victim for the squaring steps. So Carol also measures s_i,j′ and t_i,j′.

The bit pair d_2j′+1d_2j′ (considered as a binary integer) can take any one of the four values g = 0, 1, 2, 3. Carol measures the time of Step (9) for each of the four choices of g and adds this time to the time taken by the algorithm so far, in order to obtain:

Equation 7.2

Kocher observed that the distribution of T_i, i = 1, . . . , k, is statistically related to that of only for the correct guess g. In order to see how, we subtract Equation (7.2) from Equation (7.1) to get:

Equation 7.3

Let us assume that the error term e_i is distributed like a random variable E. Similarly suppose that each multiplication (resp. squaring) has the distribution of a random variable M (resp. S). Taking the variance of Equation (7.3 ) over the values i = 1, 2, . . . , k and assuming that the sample size k is so large that the sample variances are very close to the variances of the respective random variables, we obtain:

Equation 7.4

where λ denotes the number of times Step (9) is executed for j = j′ – 1, . . . , 0. Note that λ is dependent on the private key and not on the arguments to the exponentiation routine. For the correct guess g, we have and so

On the other hand, for an incorrect guess g we have:

if one of m_i,j′ or is zero, or

if both m_i,j′ and are non-zero. (Recall that Var(αX + βY) = α² Var(X) + β² Var(Y) for any real α, β.)

Calculation of the sample variances of for the four choices of g gives Carol a handle to determine (or guess) the correct choice. Carol simply takes the g for which the variance is minimum. This is the fundamental observation that makes the timing attack work.

Of course, statistical irregularities exist in practice, and the approximation of the actual variances by the sample variances introduces errors in Equation (7.4). These errors are of particular concern for large values of j′, that is, during the beginning of the attack. However, if an incorrect guess is made at a certain stage, this is detected soon with high probability, as Carol proceeds further. Suppose that an erroneous guess of d_{2j″ + 1}d_2j″ has been made for some j″ > j′. This means that the values of y are different from the actual values starting from the iteration of the loop with j = j″ – 1. (We may assume that most, if not all, x_i ≠ 1.) We then do not have a cancellation of the timings for j = j″ – 1, . . . , j′. More correctly, if the guesses for j = l – 1, . . . , j″ + 1 are correct and the first error occurs at j = j″, then denoting the subsequent timings by one gets

Equation 7.5

Since each of the square and multiplication operations takes y as an operand, the original timings and the measured timings (the ones with hat) behave like independent variables and, therefore, taking the variance of Equation (7.5) yields

for some λ′ depending on the private key and on the previous guesses, but independent of the current guess g. In other words, Carol loses a meaningful relation of Var with the correctness of the current guess. Once Carol notices this, she backtracks and changes older guesses until the expected behaviour is restored. Thus, the timing attack comes with an error detection and correction strategy.

An analysis done by Kocher (neglecting E and assuming normal distributions for S and M) shows that Carol needs k = O(l) for a good probability of success.

Countermeasures

There are several ways in which timing attacks can be prevented.

If every multiplication step takes exactly the same time and so does every squaring step, the above timing attack does not work. Thus, forcing each multiplication and each squaring take the same respective times independent of their operands disallows Carol to mount the timing attack. Making m_i,j constant alone does not suffice, for difference in square timings can be exploited in subsequent iterations to correct a guess. Forcing every operation take exactly the same time as the slowest possibility makes the implementation run slower. Moreover, finding the slowest possibility may be difficult.
Interleaving random delays also makes timing attacks difficult to mount, because the attacker then requires more number of samples in order to smooth out the effect of the delays. But again adding delays harms performance and does not completely rule out the possibility of timing attacks.
Perhaps the best strategy to thwart timing attacks is to use a random pair (u, v) with v := u^–d (mod n) for each private-key operation. Initially x is multiplied by u and then the product ux is exponentiated to get u^dx^d ≡ v^–1y (mod n). Multiplication by v then yields the desired y. A new random pair (u, v) must be used for every exponentiation. However, the exponentiation v := u^–d (mod n) is too costly to be performed during every private-key operation and may itself invite timing attacks. A good trade-off is to choose (u, v) once, keep it secret and for the next private-key operation update (and replace) the old (u, v) by (u′, v′) with u′ ≡ u^e (mod n) and v′ ≡ v^e (mod n) for some small e (random or deterministic). The choice e = 2 is quite satisfactory in practice—performing two modular squares is much cheaper than computing the full exponentiation v := u^–d (mod n).

7.2.2. Power Analysis

In connection with timing attacks, we mentioned that if an adversary were able to measure the timing of each iteration of the square-and-multiply loop during an RSA (or discrete-log-based) private-key exponentiation, she could guess the bits in the key quite efficiently from only a few timing measurements. But it is questionable if such detailed timing data can be made available.

Now, think of a situation where Carol can measure patterns of power consumption made by the decrypting (or signing) device during one or more private-key operations with Alice’s private key. If Alice carries out the private-key operations in her personal workstation, it is difficult for Carol to conduct such measurements. So assume that Alice is using a smart card with a device to which Carol has a control. Carol inserts a small resistor in series with the line which drives Alice’s smart card. The power consumed by the smart-card circuit is roughly proportional to the current through the resistor. Measuring the voltage across the resistor (and multiplying by a suitable factor) Carol can observe the power consumed by Alice’s decryption device. Carol has to use a power measuring device that takes readings at a high frequency (100 MHz to several GHz depending on the budget of Carol). A set of power measurements obtained during a cryptographic operation is called a power trace. We now study how power traces can reveal Alice’s secrets.

Simple power analysis (SPA)

The individual steps in a private-key operation may be nakedly exposed in a power trace. This is, in particular, the case when different steps consume different amounts of power and/or take different times. Obtaining information about the operation of the decrypting device and/or the secrets by a direct interpretation of power traces is referred to as simple power analysis or SPA in short.

As an example of SPA, consider an implementation of RSA exponentiation using the naive square-and-multiply Algorithm 3.9. Here, the most power-consuming operations are modular squaring and modular multiplication. Modular multiplication typically runs slower than modular squaring. Also modular multiplication requires two different operands to fetch from the memory, whereas modular squaring requires only one operand. Thus, a multiplication operation has more and longer power requirements than a squaring operation.

A hypothetical^[1] SPA trace during a portion of an RSA private-key operation is shown in Figure 7.1. Each spike in the trace corresponds to either a square or a multiplication operation. Let us assume that the power consumption is measured with sufficient resolution, so that no spike is missed. Since multiplication runs longer (and requires more operands) than squaring, multiplication spikes are wider than squaring spikes.

^[1] SPA traces from real-life experiments on smart cards, as reported in several references, look similar to this. We, however, generated the trace using a random number generator. Absolute conformity to reality is not always crucial for the purposes of illustration.

Figure 7.1. Simulated SPA trace for a portion of an RSA private-key operation

Let us denote a squaring operation by S and a multiplication operation by M. We observe that Alice’s smart card performs the sequence

SMSMSSMSSSSMSSSMSS

of operations during the measurement interval shown. Since multiplication in an iteration of the loop is skipped if and only if the corresponding bit in the exponent is zero, we can group the operations as

(SM)(SM)(S)(SM)(S)(S)(S)(SM)(S)(S)(SM)(S)(S.

This, in turn, reveals the bit string 110100010010 in Alice’s private key.

Effective as it appears, SPA, in practice, does not pose a huge threat to the security of conventional cryptographic systems. Using algorithms for which power traces do not bear direct relationships with the bits of the private key largely reduces risks of fruitful SPA. The inefficient repeated square-and-multiply Algorithm 7.2 always performs a multiplication after squaring and thereby eliminates chances of a successful SPA.

Algorithm 7.2. SPA-resistant exponentiation

Input: , and the private key d = (d_l–1 · · · d₁d₀)₂.

Output: y := x^d (mod n).

Steps:

y := 1.
for (j = l – 1, . . . , 0) {
    t₀ := y² (mod n).
    t₁ := t₀x (mod n).
    y := t_dj.
}

Using the (more efficient) Algorithm 7.1 also frustrates SPA. Some chunks of two successive 0 bits are anyway revealed by power traces collected during the execution of this algorithm. But, for a decently large and random private key, this still leaves Carol with many unknown bits to be guessed. Note, however, that neither of the three remedies suggested to thwart the timing attack on Algorithm 7.1 seems to be effective in the context of SPA. Delays normally do not consume much power (unless some power-intensive dummy computations fill up the delays). Also, the masking of (x, y) by (u, v) fails to produce any alteration in the power consumption pattern during exponentiation.

If some private-key algorithm has unavoidable branchings due to individual bits in the private key, SPA can prove to be a notorious botheration.

Differential power analysis (DPA)

A carefully designed algorithm (like Algorithm 7.2) does not reveal key information from a simple observation of power traces. Moreover, the observed power traces may be corrupted by noise to an extent where SPA is not feasible. In such cases, differential power analysis (DPA) often helps the cryptanalyst reduce the effects of noise and exploit subtle correlation of power consumption patterns with specific bits in the operands. DPA requires availability of power traces from several private-key operations with the same key.

Consider the SPA-resistant Algorithm 7.2. Suppose that k power traces P₁(t), . . . , P_k(t) for the computations of (mod n), i = 1, . . . , k, are available to Carol, that the ciphertexts x₁, . . . , x_k are known to Carol and that d = (d_l–1 · · · d₁d₀)₂. Carol successively guesses the bits d_l–1, d_l–2, d_l–3, . . . of the exponent. Suppose that Carol has correctly guessed d_j for j = l – 1, . . . , j′ + 1. She now uses DPA to guess d_j′.

Let e := (d_l–1d_l–2 · · · d_{j′ + 1})₂. At the beginning of the for loop with j = j′ the variable y holds the value x^e modulo n. The loop computes x^2e and x^2e+1 and assigns y the appropriate value. If d_j′ = 0, then in the next iteration the loop computes x^4e and x^4e+1, whereas if d_j′ = 1, then in the next iteration the loop computes x^4e+2 and x^4e+3. It follows that the algorithm handles the value x^4e if and only if d_j′ = 0.

For each i = 1, . . . , k, Carol computes (mod n). Carol then chooses a particular bit position (say, the least significant bit) and considers the bit b_i of z_i at this position. We make the assumption that there is some subsequent step (or substep) in the implementation for which the average power consumption Π₀ for b = 0 is different from the average power consumption Π₁ for b = 1.^[2]

^[2] The exact step which exhibits differential bias toward an individual bit value is dependent on the implementation. If the implementation does not provide such a step, the attack cannot be mounted in this way. Initially, the DPA was proposed for DES, a symmetric encryption algorithm, in which such a dependence is clearly available. With asymmetric-key encryption, such a strong dependence of the power, consumed by a step, on an individual bit value is not obvious. One may, however, use other dividing criteria, like low versus high Hamming weight (that is, number of one-bits) in the operand, which bear more direct relationships with power consumption.

Carol partitions {1, . . . , k} into two subsets:

I₀	:=	{i \| b_i = 0},
I₁	:=	{i \| b_i = 1}.

Carol computes the average power traces and and subsequently the differential power trace

First, let d_j′ = 0. In this case, the routine handles and so the power consumption at some time τ is correlated to the bit b_i of . At any other instant, the power consumption is uncorrelated to this particular bit value. Therefore, if the sample size is sufficiently large and if the measurement noise has mean at zero, we have:

On the other hand, if d_j′ = 1, the value never appears in the execution of the algorithm and so at every time t the power consumption is uncorrelated to the particular bit of and so we expect

Δ(t) ≈ 0

for all t.

Figure 7.2 illustrates the two cases.^[3] If the differential power trace has a distinct spike, the guess d_j′ = 0 is correct. So by observing the existence or otherwise of a spike, Carol determines whether d_j′ = 0 or d_j′ = 1.

^[3] Once again, these are hypothetical traces obtained by random number generators.

Figure 7.2. Simulated DPA trace for a portion of an RSA private-key operation

(a) for the correct guess
(b) for an incorrect guess

The number k of samples required for a good probability of success depends on the bias Π₁–Π₀ relative to the measurement noise. We assume that . If the noise has a variance of σ², then by the central limit theorem the noise in each average power trace or has at each t an approximate variance 2σ²/k, and so in the differential power trace Δ(t) the noise has an approximate variance 4σ²/k. In order that the bias Π₁ –Π₀ stands out against the noise, we require , say, , that is, k ≥ 64σ²/(Π₁ – Π₀)².

Countermeasures

Several countermeasures can be adopted to prevent DPA, both in the software level and in the hardware level.

Interleaving random delays between instructions destroys the alignment of the time τ in different power traces. Using a clock with randomly varying tick-rate has a similar effect. The delays should be such that they cannot be easily analyzed and subsequently removed. Random delays increase the number of samples required for a successful DPA to an infeasible value.
Suitable implementations of the power-critical steps destroy the power consumption signature of these steps. For example, one may go for an implementation that exhibits a constant power consumption pattern irrespective of the operands. Another possibility is replacement of complex critical instructions by atomic instructions (like assembly instructions) for which the dependence of power consumption on operands is less or difficult to analyze. However, the assumption that one can measure power at any resolution (perhaps at infinite resolution, say, using an analog device) indicates that this countermeasure challenges only the attacker’s budget.
Masking (x, y) by multiplying with (u, v) (as we did to prevent timing attacks) also eliminates chances of mounting successful DPA. One has to use a fresh mask for each private-key operation. Random unknown masks destroy the correlation of the bit values b_i with power consumption. That is, the chosen bit b_i of behaves randomly in relation to the same bit of (u_ix_i)^4e and so the differential power trace no longer leaks the bias Π₁ – Π₀.
Another strategy to foil DPA is to use randomization in the private exponent d. Instead of computing y := x^d (mod n), one chooses a small random integer r (typically of bit size ≤ 20) and computes y := x^d+rh (mod n), where h is φ(n) for RSA or the order of the discrete-log (sub)group. Since d = O(h) typically, the performance of the exponentiation routine does not deteriorate much. But random values of r during different private-key operations change the exponent bits in an unpredictable manner.
Quick changes in the exponent (the private key, that is, the key pair) also prevent the attacker to gather sufficiently many power traces for mounting a successful DPA. A key-use counter can be employed for this purpose. Whenever a given private key has been used on a small predetermined number of occasions, the key pair is updated.
Hardware shielding of the decrypting device also reduces DPA possibilities. For example, in-chip buffers between the external power source and the chip processor have been proposed to mask off the variation of internal power from external measurements. Such hardware countermeasures are, in general, somewhat costlier than software countermeasures.

Paul Kocher asserts: DPA highlights the need for people who design algorithms, protocols, software, and hardware to work closely together when producing security products.

7.2.3. Fault Analysis

We finally come to the third genre of side-channel cryptanalysis. We investigate how hardware faults occurring during private-key operations can reveal the secret to an adversary. There are situations where a single fault suffices. Boneh et al. [30] classify hardware faults into three broad categories.

Transient faults These are faults caused by random (unpredictable) hardware malfunctioning. These may be the outcomes of occasional flips of bit values in registers or of temporary erroneous outputs from logic or arithmetic circuits in the processor. These faults are called transient, because they are not repeated. It is rather difficult to detect such (silent) faults.
Latent faults These are faults generated by some permanent malfunctioning and/or bugs inherent in the processor. For example, the floating-point bug in the early releases of the Pentium processor may lead to latent faults. Latent faults are permanent, that is, repeated, but may be difficult to locate in practice.
Induced faults An induced fault is deliberately caused by an adversary. For example, a short surge of electromagnetic radiation may cause a smart card to malfunction temporarily. A malicious adversary can induce such temporary hardware faults to extract secret information from the smart card. It is, however, difficult to induce deliberate faults in a remote workstation.

Although induced faults appear to be the ones to guard against most seriously, the other two types of faults are also of relevance. Consider a certifying authority signing many messages. Transient and/or unknown latent faults may reveal the authority’s private key to a user who can later utilize this knowledge to produce false certificates.

Fault attack on RSA based on CRT

Consider the implementation of RSA private-key operation based on the CRT combination of the values obtained by exponentiation modulo the prime divisors p and q of the modulus n (Algorithm 5.4). Suppose that m is a message to be signed and s := m^d (mod n) the corresponding signature, where d is the signer’s private key. The CRT-based implementation computes s₁ := s (mod p) and s₂ := s (mod q). Assume that due to hardware fault(s) exactly one of s₁ and s₂ is wrongly computed. Say, s₁ is incorrectly computed as . The corresponding faulty signature is denoted by . We assume that the CRT combination of and s₂ is correctly computed.

An adversary requires the faulty signature and the correct signature s on the same message m in order to obtain the factor q of n. To see how, note that (mod p), s ≡ s₁ (mod p) and (mod p), so that (mod p), that is, . On the other hand, (mod q), that is, . Therefore,

This is how the fault analysis of Boneh et al. [30] works.

Arjen K. Lenstra et al. [142] point out that the knowledge of the faulty signature alone reveals the secret divisor q, that is, one does not require the genuine signature s on m. The verification key e of the signer is publicly known. Since RSA exponentiation is bijective, (mod n). However, (mod q), and so (mod p). It follows that

Fault attack on RSA without CRT

Now, consider an implementation of RSA decryption based on a single exponentiation modulo n. For such an implementation, several models of fault attacks have been proposed. These attacks are less practical than the attack on CRT-based RSA just mentioned, because now one requires several faulty signatures in order to deduce the entire private key. Here, we present an attack due to Bao et al. [17].

As usual, the RSA modulus is n = pq and the signer’s key pair is (e, d). Consider a valid signature s on a message m. Let d = (d_l–1 · · · d₁d₀)₂ be the binary representation of the private key. Consider the powers:

s_i ≡ m^2ⁱ (mod n)

for i = 0, 1, . . . , l – 1.

The signature s can be written as:

We assume that the attacker knows m and s and hence can compute s_i and modulo n for i = 0, . . . , l – 1. There is no harm in assuming that the message m is randomly chosen. (We may assume that randomly chosen integers are invertible modulo n, because encountering a non-invertible non-zero integer by chance is a stroke of unimaginable good luck and is tantamount to knowing the factors of n.)

In order to guess a bit of d, the attacker induces a fault in exactly one of the bits d_j, changing it from d_j to . The position j is random, that is, not under the control of the attacker. Now, the algorithm outputs the faulty signature

and so

A repetition in the values s_l–1, . . . , s₀, , . . . , modulo n is again an incident of minuscule probability. Hence the attacker can uniquely identify the bit position j and the bit value d_j in d by comparing with these 2l values.

Statistical analysis implies that the attacker needs to repeat this procedure about l log l times (on same or different (m, s) pairs) in order to ensure that the probability of identifying all the bits of d is at least 1/2.

Fault attack on the Rabin digital signature algorithm

Recall from Algorithm 5.34 that the Rabin signature algorithm uses CRT to combine s₁ (mod p) and s₂ (mod q). Thus, the attack on CRT-based RSA, described earlier, is applicable mutatis mutandis to the Rabin signature scheme. The computation of the square roots s₁ and s₂ demands the major portion of the running time of the routine. Inducing a fault during the execution is, therefore, expected to affect exactly one of s₁ and s₂, as desired by the attacker.

Fault attack on DSA

Bao et al. [17] propose a fault attack on the digital signature algorithm (DSA). We work with the notations of Algorithm 5.43 and Algorithm 5.44, except that, for maintaining uniformity in this section, we use m (instead of M) to denote the message to be signed. The (public) parameters are p, a prime divisor r of p – 1 of length 160 bits and an element of multiplicative order r. The signer’s DSA key pair is (d, g^d(mod p)) with 1 < d < r.

Suppose that during the generation of a DSA signature, an attacker induces a fault in exactly one bit position of d changing it to . The routine generates the faulty signature , where

(d′, g^d′) being the session key pair (not mutilated). As in the DSA signature-verification scheme, the attacker computes the following:

For each i = 0, . . . , l – 1 (where the bit length of d is l), the attacker also computes

Assume that the j-th bit d_j of d is altered. If d_j = 0, and so

On the other hand, if d_j = 1, then and a similar calculation shows that

Thus, the attacker computes and for all j = 0, . . . , l – 1 and notices a unique match (with s). This discloses the position j and the corresponding bit d_j.

Fault attack on the ElGamal signature scheme

A fault attack similar to that on the DSA scheme can be mounted on the ElGamal signature scheme. We here propose an alternative method proposed by Zheng and Matsumoto [315]. The novelty in their approach is that it performs the cryptanalysis of the ElGamal signature scheme by inducing fault on the pseudorandom bit generator of the signer’s smart card.

Algorithms 5.36 and 5.37 describe the ElGamal signature scheme on a general cyclic group G. Here, we restrict our attention to the specific group (though the following exposition works perfectly well for a general G). The parameters are a prime modulus p and a generator g of . The signer’s key-pair is (d, g^d(mod p)) for some d, 2 ≤ d ≤ p – 2.

In order to generate a signature (s, t) on a message m, a random session key d′ is generated and subsequently the following computations are carried out:

s	≡	g^d′ (mod p),
t	≡	d′^–1(H(m) – dH(s)) (mod p – 1).

Zheng and Matsumoto attack the generation of the session key d′. They propose the possibility that an abnormal physical stress (like low voltage) forces a constant output d₀ for d′ from the pseudorandombit generator (software or hardware) in the smart card. First, assume that this particular value d₀ is known a priori to the attacker. She then lets a message m generate a signature (s, t) with the session secret d₀. The private key d is then immediately available from the equation:

d ≡ H(s)^–1(H(m) – d₀t) (mod p – 1).

Here, we assume that H(s) is invertible modulo p – 1.

If d₀ is not known a priori, the attacker generates two signatures (s₁, t₁) and (s₂, t₂) on messages m₁ and m₂ respectively. Since d′ is always d₀, we have s₁ = s₂ = s₀, say. One can then easily calculate

d₀ ≡ (t₁ – t₂)^–1(H(m₁) – H(m₂)) (mod p – 1),

which, in turn, yields

d ≡ H(s₀)^–1(H(m₁) – d₀t₁) (mod p – 1).

Fault attack on the Feige–Fiat–Shamir identification protocol

Let us conclude our repertoire of fault attack examples by explaining an attack on the FFS zero-knowledge identification protocol. This attack is again from Boneh et al. [30].

We use the notations of Algorithm 5.69. A modulus n = pq, p, , is first chosen (by Alice or by a trusted third party). Alice selects random x₁, . . . , and random bits δ₁, . . . , δ_t, computes (mod n), publishes (y₁, . . . , y_t) and keeps (x₁, . . . , x_t) secret.

During an identification session with Bob, Alice generates a random commitment and sends to Bob the witness w := c² (mod n). (For simplicity, we take γ of Algorithm 5.69 to be 0.) When Alice is waiting for a challenge from Bob, a fault occurs in her smart card changing the commitment c to c + E. Assume that the fault is at exactly one bit position, that is, E = ±2^j for some , l being the bit length of c (or of n). This fault may be purposely induced by Bob with the malicious intention of guessing Alice’s secret (x₁, . . . , x_t).

Bob then generates a random challenge as usual. Upon reception of this challenge Alice computes and sends to Bob the faulty response

The knowledge of now aids Bob to obtain the product as follows. First, note that

so that

for some

There are only 4l possible values of (E, δ). Bob tries all these possibilities one by one. To simplify matters we assume that only one value of (E, δ) with E of the special form ±2^j and with satisfies the last congruence. In practice, the existence of two (or more) solutions for (E, δ) is an extremely improbable phenomenon. For a guess of (E, δ), the commitment c can be computed as

The correctness of the guess (E, δ) can be verified from the relation w ≡ c² (mod n). Bob can now compute the desired product

In order to strengthen the confidence about the correctness of T, Bob may repeat the protocol once more with the same values of ∊₁, . . . , ∊_t, but under normal conditions (that is, without faults). This time he obtains w′ ≡ (c′)² (mod n) and r′ ≡ c′T (mod n), which together give (r′)² ≡ w′T² (mod n), a relation that proves the correctness of T.

Bob repeats the above procedure t times in order to generate the system:

Equation 7.6

Here, ∊_ki and T_k are known to Bob. Moreover, the exponents ∊_ki can be so selected that the matrix (∊_ki) is invertible modulo 2. In order to determine x₁, Bob tries to find satisfying

for some integers v₁, . . . , v_t. Comparing the coefficients gives the linear system

which can be solved for u₁, . . . , u_t, since the matrix (∊_ki) is invertible modulo 2. The solution gives v₁, . . . , v_t and hence

Similarly, x₂, . . . , x_t can be determined up to sign. Plugging in these values of x_i in System (7.6) and solving another linear system modulo 2 gives the exact signs of all x_i.

Notice that Bob could have selected ∊_ki = δ_ki (where δ is the Dirac delta). For this choice, System (7.6) immediately gives x₁, . . . , x_t. But, in practice, Alice may disagree to respond to such simplistic challenges. Moreover, Bob must not raise any suspicion about a possible malpractice. For a general choice, all Bob has to do additionally is a little amount of simple linear algebra. The parameter t is rather small (typically less than 20); so this extra effort is of little concern to Bob.

Countermeasures

Fault analysis could be a serious threat, especially to smart-card users and certification authorities. We mention here some precautions to guard against such attacks. Some of these work for a general kind of fault attack, the others are specific to the algorithms they plan to protect.

One obvious general strategy is to perform the private-key operation twice and compare the results from the two executions. If the two results disagree, a fault must have taken place. It is then necessary to restart the computation from the beginning. This strategy slows down the implementation by a factor of two. Moreover, latent (permanent) faults cannot be detected by this method—the same error creeps in during every run.
It is sometimes easier to verify the correctness of the output by performing the reverse operation. For instance, after an RSA signature s ≡ m^d (mod n) is generated, one can check whether m ≡ s^e (mod n). If so, one can be reasonably confident about the correctness of s. If the RSA encryption exponent is small (like 3 or 257), this verification is quite efficient.
Ad hoc algorithm-specific tricks often offer effective and efficient checks for errors. Shamir [268] proposes the following check for CRT-based RSA signature generation. One chooses a small random prime r (say, of length ~ 32 bits) and computes s₁ ≡ m^d (mod pr) and s₂ ≡ m^d (mod qr). If s₁ ≢ s₂ (mod r), then one or both of the exponentiations went wrongly. If, on the other hand, s₁ ≡ s₂ (mod r), then s₁ (mod p) and s₂ (mod q) are combined by the CRT.
Maintaining extraneous error-checking data can guard against random bit flips. Parity check bits can detect the existence of single bit flips. Retaining a verbatim copy of a secret information d and comparison of the two copies at strategic instants can help detect more serious faults. It appears unlikely that both the copies can be affected by faults in exactly the same way. For discrete-log-based systems, maintaining d^–1 in tandem with d appears to be a sound approach. Since the bits of d^–1 are not in direct relationship with those of d, an attack on d cannot easily produce the relevant changes in d^–1. As an example, consider the attack on DSA effected by toggling a bit of the secret key d. The second part of the signature can be generated in two ways: by computing t₁ ≡ d′^–1(H(m) + ds) (mod r) using d, and by computing t₂ ≡ d′^–1(d^–1)^–1(d^–1H(m) + s) (mod r) using d^–1. If t₁ ≡ t₂ (mod r), we can be pretty confident that this common value is the correct signature.
Appending random strings to the messages being signed also prevents timing attacks. Such random strings are not known to the adversary and cannot be easily recovered by the verification routine on a faulty signature. Also in this case the signer signs different strings on different occasions, even when the message remains the same.
Hardware countermeasures can also be adopted. Adequately shielded cards resist induced faults. In a situation described by Zheng and Matsumoto, the card should refuse to work instead of generating constant random bits. In the scenario of fault analysis, it, however, appears that robustness can be implanted easily at the software level. At any rate, sloppy hardware designs are never advocated.

Exercise Set 7.2

7.1	Consider the notations of Section 7.2.1. Assume that m_i,j is constant for all i, j (and irrespective of d_2j+1d_2j), but the square times s_i,j and t_i,j vary according to their operands. Device a timing attack on such a system.
7.2	Show that under reasonable assumptions the SPA-resistant Algorithm 7.2 can be crypt-analyzed by timing attacks.
7.3	Recall that SPA of Algorithm 7.1 may leak partial information on the private key (some 00 sequences in the key). Rewrite the algorithm to prevent this leakage.
7.4	Assume that in Bao et al.’s attack on RSA described in the text, the attacker can induce faults in exactly two bit positions of d. Suggest how the two bits of d at these positions can be revealed from the resulting faulty signature.
7.5	Consider a variant of the Bao et al.’s attack on RSA described in the text, in which the valid signature s on m is unknown to the attacker. Explain how the position j of the erroneous bit and the bit d_j at this position can still be identified. [H]
7.6	Bao et al. [17] propose an alternate fault analysis on RSA with square-and-multiply exponentiation. Use the notations (n, e, d, m, s, s_i) as in the text. Assume that the attacker knows an (m, s) pair and can induce a fault in exactly one of the values s_j (and nowhere else) and generate the corresponding faulty signature. Suggest a strategy how the position j and the bit d_j can be recovered in this case.
7.7	Propose a fault attack on the ElGamal signature scheme (Algorithms 5.36 and 5.37), similar to the attack on DSA described in the text.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Side-Channel Attacks

Create new playlist

Sign In

Sign Up

7.2. Side-Channel Attacks

7.2.1. Timing Attack

Details of the attack

Algorithm 7.1. RSAREF’s exponentiation routine

Countermeasures

7.2.2. Power Analysis

Simple power analysis (SPA)

Figure 7.1. Simulated SPA trace for a portion of an RSA private-key operation

Algorithm 7.2. SPA-resistant exponentiation

Differential power analysis (DPA)

Figure 7.2. Simulated DPA trace for a portion of an RSA private-key operation

Countermeasures

7.2.3. Fault Analysis

Fault attack on RSA based on CRT

Fault attack on RSA without CRT

Fault attack on the Rabin digital signature algorithm

Fault attack on DSA

Fault attack on the ElGamal signature scheme

Fault attack on the Feige–Fiat–Shamir identification protocol

Countermeasures

Exercise Set 7.2

Table of Contents for
Side-Channel Attacks