Chapter 6. Where All Roads Meet: Modular Exponentiation

For a long time on that spot I stood, Where two roads converged in the wood and I thought: "Someone going the other way Might someday stop here for the sake Of deciding which path to take." But my direction lay where it lay. And walking on, I felt a sense Of wonder at that difference.

—Ilya Bernstein, Attention and Man

In addition to the calculational rules for addition, subtraction, and multiplication in residue classes we can also define an operation of exponentiation, where the exponent specifies how many times the base is to be multiplied by itself. Exponentiation is carried out, as usual, by means of recursive calls to multiplication: For a in • m we have

Where All Roads Meet: Modular Exponentiation

It is easy to see that for exponentiation in • m the usual rules apply (cf. Chapter 1):

Equation 6.1. 

Where All Roads Meet: Modular Exponentiation

First Approaches

The simplest approach to exponentiation consists in following the recursive rule defined above and multiplying the base a by itself e times. This requires e — 1 modular multiplications, and for our purposes that is simply too many.

A more efficient way of proceeding is demonstrated in the following examples, in which we consider the binary representation of the exponent:

Equation 6.2. 

First Approaches

Here raising the base to the fifteenth power requires only six multiplications, as opposed to fourteen in the first method. Half of these are squarings, which, as we know, require only about half as much CPU time as regular multiplications. Exponentiation to the sixteenth power is accomplished with only four squarings.

Algorithms for exponentiation of ae modulo m that calculate with the binary representation of the exponent are in general much more favorable than the first approach, as we are about to see. But first we must observe that the intermediate results of many integer multiplications one after the other quickly occupy more storage space than can be supplied by all the computer memory in the world, for from p = ab follows log p = b log a, and thus the number of digits of the exponentiated ab is the product of the exponent and the number of digits of the base. However, if we carry out the calculation of ae in a residue class ring • m, that is, by means of modular multiplication, then we avoid this problem. In fact, most applications require exponentiation modulo m, so we may as well focus our attention on this case.

Let

First Approaches

Equation 6.3. 

First Approaches

is the number of ones in the binary representation of e. If we assume that each digit takes on the value 0 or 1 with equal probability, then we may conclude that δ(e) has the expected value δ(e) = n/2, and altogether we have

First Approaches

Binary algorithm for exponentiation of ae modulo m

Equation 6.4. 

First Approaches

The following implementation of this algorithm gives good results already for small exponents, those that can be represented by the USHORT type.

Function:

mixed modular exponentiation with USHORT exponent

Syntax:

int umexp_l (CLINT bas_l, USHORT e, CLINT p_l,
               CLINT m_l);

Input:

bas_l (base)
e (exponent)
m_l (modulus)

Output:

p_l (power residue)

Return:

E_CLINT_OK if all ok
E_CLINT_DBZ if division by 0
int
umexp_l (CLINT bas_l, USHORT e, CLINT p_l, CLINT m_l)
{
 CLINT tmp_l, tmpbas_l;
 USHORT k = BASEDIV2;
 int err = E_CLINT_OK;
 if (EQZ_L (m_l))
  {
   return E_CLINT_DBZ;     /* division by zero */
  }
 if (EQONE_L (m_l))
  {
   SETZERO_L (p_l);        /* modulus = 1 ==> remainder = 0 */
   return E_CLINT_OK;
  }
 if (e == 0)     /* exponent = 0 ==> remainder = 1 */
  {
   SETONE_L (p_l);
   return E_CLINT_OK;
  }
 if (EQZ_L (bas_l))
  {
   SETZERO_L (p_l);
   return E_CLINT_OK;
  }
 mod_l (bas_l, m_l, tmp_l);
 cpy_l (tmpbas_l, tmp_l);

Note

After various checks the position of the leading 1 of the exponent e is determined. Here the variable k is used to mask the individual binary digits of e. Then k is shifted one more place to the right, corresponding to setting in - 2 in step 1 of the algorithm.

while ((e & k) == 0)
 {
  k >>= 1;
 }
k >>= 1;

Note

For the remaining digits of e we run through steps 2 and 3. The mask k serves as loop counter, which we shift to the right one digit each time. We then multiply by the base reduced modulo m_l.

while (k != 0)
  {
   msqr_l (tmp_l, tmp_l, m_l);
   if (e & k)
    {
     mmul_l (tmp_l, tmpbas_l, tmp_l, m_l);
    }
   k >>= 1;
  }
 cpy_l (p_l, tmp_l);
 return err;
}

The binary algorithm for exponentiation offers particular advantages when it is used with small bases. If the base is of type USHORT, then all of the multiplications ppa mod m in step 3 of the binary algorithm are of the type CLINT * USHORT modulo CLINT, which makes possible a substantial increase in speed in comparison to other algorithms that in this case would also require the multiplication of two CLINT types. The squarings, to be sure, use CLINT objects, but here we are able to use the advantageous squaring function.

Thus in the following we shall implement the exponentiation function wmexp_l(), the dual to umexp_l(), which accepts a base of type USHORT. The masking out of bits of the exponent is a good preparatory exercise in view of the following "large" functions for exponentiation. The way of proceeding consists essentially in testing one after the other each digit ei of the exponent against a variable b initialized to 1 in the highest-valued bit, and then shifting to the right and repeating the test until b is equal to 0.

The following function wmexp_l() offers for small bases and exponents up to 1000 bits, for example, a speed advantage of about ten percent over the universal procedures that we shall tackle later.

Function:

modular exponentiation of a USHORT base

Syntax:

int wmexp_l (USHORT bas, CLINT e_l, CLINT rest_l,
               CLINT m_l);

Input:

bas (base)
e_l (exponent)
m_l (modulus)

Output:

rest_l (remainder of base_l mod m_l)

Return:

E_CLINT_OK if all is ok
E_CLINT_DBZ if division by 0
int
wmexp_l (USHORT bas, CLINT e_l, CLINT rest_l, CLINT m_l)
{
 CLINT p_l, z_l;
 USHORT k, b, w;
 if (EQZ_L (m_l))
  {
   return E_CLINT_DBZ;     /* division by 0 */
  }
 if (EQONE_L (m_l))
  {
   SETZERO_L (rest_l);     /* modulus = 1 ==> remainder = 0 */
   return E_CLINT_OK;
  }
 if (EQZ_L (e_l))
  {
   SETONE_L (rest_l);
   return E_CLINT_OK;
  }
 if (0 == bas)
  {
   SETZERO_L (rest_l);
   return E_CLINT_OK;
  }
 SETONE_L (p_l);
 cpy_l (z_l, e_l);

Note

Beginning with the highest-valued nonzero bit in the highest-valued word of the exponent z_l the bits of the exponent are processed, where always we have first a squaring and then, if applicable, a multiplication. The bits of the exponent are tested in the expression if ((w & b) > 0) by masking their value with a bitwise AND.

b = 1 << ((ld_l (z_l) - 1) & (BITPERDGT - 1UL));
w = z_l[DIGITS_L (z_l)];
for (; b > 0; b >>= 1)
 {
  msqr_l (p_l, p_l, m_l);
  if ((w & b) > 0)
   {
    ummul_l (p_l, bas, p_l, m_l);
   }
 }

Note

Then follows the processing of the remaining digits of the exponent.

for (k = DIGITS_L (z_l) - 1; k > 0; k--)
 {
  w = z_l[k];
  for (b = BASEDIV2; b > 0; b >>= 1)
   {
    msqr_l (p_l, p_l, m_l);
    if ((w & b) > 0)
     {
      ummul_l (p_l, bas, p_l, m_l);
     }
    }
  }
 cpy_l (rest_l, p_l);
 return E_CLINT_OK;
}

M-ary Exponentiation

Through a generalization of the binary algorithm on page 82 the number of modular multiplications for exponentiation can be reduced even further. The idea is to represent the exponent in a base greater than 2 and to replace multiplication by a in step 3 by multiplication by powers of a. Thus let the exponent e be given by

M-ary Exponentiation

M-ary algorithm for exponentiation ae mod m

Equation 6.5. 

M-ary Exponentiation

The number of necessary multiplications evidently depends on the number of digits of the exponent e and thus on the choice of M. Therefore, we would like to determine M such that the exponentiation in step 3 can be computed to the greatest extent possible by means of squaring, as in the example above for 216, and such that the number of multiplications by the precomputed powers of a be minimized to a justifiable cost of storage space for the table.

The first condition suggests that we choose M as a power of 2: M = 2k. In view of the second condition we consider the number of modular multiplications as a function of M:

We require

Equation 6.6. 

M-ary Exponentiation

squares in step 3 and on average

Equation 6.7. 

M-ary Exponentiation

modular multiplications in step 4, where

Equation 6.8. 

M-ary Exponentiation

is the probability that a digit ei of e is nonzero. If we include the M - 2 multiplications for the computation of the table, then the M-ary algorithm requires on average

Equation 6.9. 

M-ary Exponentiation

Equation 6.10. 

M-ary Exponentiation

modular squarings and multiplications.

For exponents e and moduli m of, say, 512 binary places and M = 2k we obtain the numbers of modular multiplications for the calculation of ae mod m as shown in Table 6-1. The table shows as well the memory requirement for the precomputed powers of a mod m, which result from the product (2k - 2)CLINTMAXSHORT . sizeof(USHORT).

Table 6-1. Requirements for exponentiation

k

Multiplications

Memory (in Bytes)

1

766

0

2

704

1028

3

666

3084

4

644

7196

5

640

15420

6

656

31868

We see from the table that the average number of multiplications reaches a minimum of 640 at k = 5, where the required memory for each larger k grows by approximately a factor of 2. But what are the time requirements for other orders of magnitude of the exponents?

Table 6.2 gives information about this. It displays the requirements for modular multiplication for exponentiation with exponents with various numbers of binary digits and various values of M = 2k. The exponent length of 768 digits was included because it is a frequently used key length for the RSA cryptosystem (see Chapter 17). The favorable numbers of multiplications appear in boldface.

Table 6-2. Numbers of multiplications for typical sizes of exponents and various bases 2k

Number of Binary Digits in the Exponent

         

k

32

64

128

512

768

1024

2048

4096

1

45

93

190

766

1150

1534

3070

6142

2

44

88

176

704

1056

1408

2816

5632

3

46

87

170

666

996

1327

2650

5295

4

52

91

170

644

960

1276

2540

5068

5

67

105

181

640

945

1251

2473

4918

6

98

135

209

656

954

1252

2444

4828

7

161

197

271

709

1001

1294

2463

4801

8

288

324

396

828

1116

1404

2555

4858

In consideration of the ranges of numbers for which the FLINT/C package was developed, it appears that with k = 5 we have found the universal base M = 2k, with which, however, there is a rather high memory requirement of 15 kilobytes for the powers a2, a3, . . . , a31 to base a that are to be precomputed. The M-ary algorithm can be improved, however, according to [Cohe], Section 1.2, to the extent that we can employ not M - 2, but only M/2, premultiplications and thus require only half the memory. The task now additionally consists in the calculation of the power ae mod m, where

Numbers of multiplications for typical sizes of exponents and various bases 2k

M-ary Algorithm for exponentiation with reduced number of premultiplications

Equation 6.11. 

Numbers of multiplications for typical sizes of exponents and various bases 2k

The trick of this algorithm consists in dividing up the squarings required in step 3 in a clever way, such that the exponentiation of a is taken care of together with the even part 2t of ei. Within the squaring process the exponentiation of a by the odd part u of ei remains. The balance between multiplication and squaring is shifted to the more favorable squaring, and only the powers of a with odd exponent need to be precomputed and stored.

For this splitting one requires the uniquely determined representation ei = 2tu, u odd, of the exponent digit ei. For rapid access to t and u a table is used, which, for example, for k = 5 is displayed in Table 6-3.

To calculate these values we can use the auxiliary function twofact_l(), which will be introduced in Section 10.4.1. Before we can program the improved M-ary algorithm there remains one problem to be solved: How, beginning with the binary representation of the exponent or the representation to base B = 216, do we efficiently obtain its representation to base M = 2k for a variable k > 0? It will be of use here to do a bit of juggling with the various indices, and we can "mask out" the required digits ei to base M from the representation of e to base B. For this we set the following: Let

Numbers of multiplications for typical sizes of exponents and various bases 2k

Let

Numbers of multiplications for typical sizes of exponents and various bases 2k

With these settings the following statements hold:

  1. There are f + 1 digits in

    Numbers of multiplications for typical sizes of exponents and various bases 2k
  2. esi contains the least-significant bit of the digit ei.

  3. di specifies the position of the least-significant bit of ei in esi (counting of positions begins with 0). If i < f and di > 16 - k, then not all the binary digits of ei are in esi; the remaining (higher-valued) bits of ei are in esi+1. The desired digit ei thus corresponds to the k least-significant binary digits of

    Equation 6.12. 

    Numbers of multiplications for typical sizes of exponents and various bases 2k

Table 6-3. Values for the factorization of the exponent digits into products of a power of 2 and an odd factor

ei

t

u

ei

t

u

ei

t

u

0

0

0

11

0

11

22

1

11

1

0

1

12

2

3

23

0

23

2

1

1

13

0

13

24

3

3

3

0

3

14

1

7

25

0

25

4

2

1

15

0

15

26

1

13

5

0

5

16

4

1

27

0

27

6

1

3

17

0

17

28

2

7

7

0

7

18

1

9

29

0

29

8

3

1

19

0

19

30

1

15

9

0

9

20

2

5

31

0

31

10

1

5

21

0

21

   

As a result we have for

Values for the factorization of the exponent digits into products of a power of 2 and an odd factor

Equation 6.13. 

Values for the factorization of the exponent digits into products of a power of 2 and an odd factor

Since for the sake of simplicity we set e_l [sf + 2] ← 0, this expression holds as well for i = f.

We have thus found an efficient method for accessing the digits of the exponent in its CLINT representation, which arise from its representation in a power-of-two base 2k with k ≤ 16, whereby we are saved an explicit transformation of the exponent. The number of necessary multiplications and squarings for the exponentiation is now

Equation 6.14. 

Values for the factorization of the exponent digits into products of a power of 2 and an odd factor

where in comparison to μ1(k) (see page 87) the expenditure for the precomputations has been reduced by half. The table for determining the favorable values of k (Table 6-4) now has a somewhat different appearance.

Table 6-4. Numbers of multiplications for typical sizes of exponents and various bases 2k

Number of Binary Digits in the Exponent

         

k

32

64

128

512

768

1024

2048

4096

1

47

95

191

767

1151

1535

3071

6143

2

44

88

176

704

1056

1408

2816

5632

3

44

85

168

664

994

1325

2648

5293

4

46

85

164

638

954

1270

2534

5066

5

53

91

167

626

931

1237

2459

4904

6

68

105

179

626

924

1222

2414

4798

7

99

135

209

647

939

1232

2401

4739

8

162

198

270

702

990

1278

2429

4732

Starting with 768 binary digits of the exponent, the favorable values of k are larger by 1 than those given in the table for the previous version of the exponentiation algorithm, while the number of required modular multiplications has easily been reduced. It is to be expected that this procedure is on the whole more favorable than the variant considered previously. Nothing now stands in the way of an implementation of the algorithm.

To demonstrate the implementation of these principles we select an adaptive procedure that uses the appropriate optimal value for k. To accomplish this we rely again on [Cohe] and look for, as is specified there, the smallest integer value k that satisfies the inequality

Equation 6.15. 

Numbers of multiplications for typical sizes of exponents and various bases 2k

which comes from the formula μ2(k) given previously for the number of necessary multiplications based on the condition μ2(k + 1) - μ2(k) ≥ 0. The constant number of modular squarings [log2 e] for all algorithms for exponentiation introduced thus far is eliminated; here only the "real" modular multiplications, that is, those that are not squarings, are considered.

The implementation of exponentiation with variable k requires a large amount of main memory for storing the precomputed powers of a; for k = 8 we require about 64 Kbyte for 127 CLINT variables (this is arrived at via (27 - 1) * sizeof(USHORT) * CLINTMAXSHORT)), where two additional automatic CLINT fields were not counted. For applications with processors or memory models with segmented 16-bit architecture this already has reached the limit of what is possible (see in this regard, for example, [Dunc], Chapter 12, or [Petz], Chapter 7).

Depending on the system platform there are thus various strategies appropriate for making memory available. While the necessary memory for the function mexp5_l() is taken from the stack (as automatic CLINT variables), with each call of the following function mexpk_l() memory is allocated from the heap. To save the expenditure associated with this, one may imagine a variant in which the maximum needed memory is reserved during a one-time initialization and is released only at the end of the entire program. In each case it is possible to fit memory management to the concrete requirements and to orient oneself to this in the commentaries on the following code.

One further note for applications: It is recommended always to check whether it suffices to employ the algorithm with the base M = 25. The savings in time that comes with larger values of k is relatively not so large in comparison to the total calculation time so as to justify in all cases the greater demand on memory and the thereby requisite memory management. Typical calculation times for various exponentiation algorithms, on the basis of which one can decide whether to use them, are given in Appendix D.

The algorithm, implemented with M = 25, is contained in the FLINT/C package as the function mexp5_l(). With the macro EXP_L() contained in flint.h one can set the exponentiation function to be used: mexp5_l() or the following function mexpk_l() with variable k.

Function:

modular exponentiation

Syntax:

int mexpk_l (CLINT bas_l, CLINT exp_l,
               CLINT p_l, CLINT m_l);

Input:

bas_l (base)
exp_l (exponent)
m_l (modulus)

Output:

p_l (power residue)

Return:

E_CLINT_OK if all is ok
E_CLINT_DBZ if division by 0
E_CLINT_MAL if malloc() error

Note

We begin with a segment of the table for representing ei = 2tu, u odd, 0 ≤ ei < 28 The table is represented in the form of two vectors. The first, twotab[], contains the exponents t of the two-factor 2t, while the second, oddtab[], holds the odd part u of a digit 0 ≤ ei < 25. The complete table is contained, of course, in the FLINT/C source code.

static int twotab[] =
{0,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,5, ...};
static USHORT oddtab[]=
{0,1,1,3,1,5,3,7,1,9,5,11,3,13,7,15,1,17,9,19,5,21,11,23,3,25,13, ...};
int
mexpk_l (CLINT bas_l, CLINT exp_l, CLINT p_l, CLINT m_l)
{

Note

The definitions reserve memory for the exponents plus the leading zero, as well as a pointer clint **aptr_l to the memory still to be allocated, which will take pointers to the powers of bas_l to be precomputed. In acc_l the intermediate results of the exponentiation will be stored.

CLINT a_l, a2_l;
clint e_l[CLINTMAXSHORT + 1];
CLINTD acc_l;
clint **aptr_l, *ptr_l;
int noofdigits, s, t, i;
ULONG k;
unsigned int lge, bit, digit, fk, word, pow2k, k_mask;

Note

Then comes the usual checking for division by 0 and reduction by 1.

if (EQZ_L (m_l))
 {
  return E_CLINT_DBZ;
 }
if (EQONE_L (m_l))
 {
  SETZERO_L (p_l);    /* modulus = 1 ==> residue = 0 */
  return E_CLINT_OK;
 }

Note

Base and exponent are copied to the working variables a_l and e_l, and any leading zeros are purged.

cpy_l (a_l, bas_l);
cpy_l (e_l, exp_l);

Note

Now we process the simple cases a0 = 1 and 0e = 0 (e > 0).

if (EQZ_L (e_l))
 {
  SETONE_L (p_l);
  return E_CLINT_OK;
 }
if (EQZ_L (a_l))
 {
  SETZERO_L (p_l);
  return E_CLINT_OK;
 }

Note

Next, the optimal value for k is determined; the value 2k is stored in pow2k, and in k_mask the value 2k - 1. For this the function ld_l() is used, which returns the number of binary digits of its argument.

lge = ld_l (e_l);
k = 8;
while (k > 1 && ((k - 1) * (k << ((k - 1) << 1))/((1 << k ) - k - 1)) >= lge - 1)
 {
  --k;
 }
pow2k = 1U << k;
k_mask = pow2k - 1U;

Note

Memory is allocated for the pointers to the powers of a_l to be computed. The base a_l is reduced modulo m_l.

if ((aptr_l = (clint **) malloc (sizeof(clint *) * pow2k)) == NULL)
 {
  return E_CLINT_MAL;
 }
mod_l (a_l, m_l, a_l);
aptr_l[1] = a_l;

Note

If k > 1, then memory is allocated for the powers to be computed. This is not necessary for k = 1, since then no powers have to be precomputed. In the following setting of the pointer aptr_l[i] one should note that in the addition of an offset to a pointer p a scaling of the offset by the compiler takes place, so that it counts objects of the pointer type of p.

We have already mentioned that the allocation of working memory can be carried out alternatively in a one-time initialization. The pointers to the CLINT objects would in this case be contained in global variables outside of the function or in static variables within mexpk_l().

if (k > 1)
 {
  if ((ptr_l = (clint *) malloc (sizeof(CLINT) * ((pow2k >> 1) - 1))) == NULL)
   {
    return E_CLINT_MAL;
   }
  aptr_l[2] = a2_l;
  for (aptr_l[3] = ptr_l, i = 5; i < (int)pow2k; i+=2)
   {
    aptr_l[i] = aptr_l[i - 2] + CLINTMAXSHORT;
   }

Note

Now comes the precomputation of the powers of the value a stored in a_l. The values a3, a5, a7 , . . . , ak - 1 are computed (a2 is needed only in an auxiliary role).

msqr_l (a_l, aptr_l[2], m_l);
for (i = 3; i < (int)pow2k; i += 2)
 {
  mmul_l (aptr_l[2], aptr_l[i - 2], aptr_l[i], m_l);
 }
}

Note

This ends the case distinction for k > 1. The exponent is lengthened by the leading zero.

*(MSDPTR_L (e_l) + 1) = 0;

Note

The determination of the value f (represented by the variable noofdigits).

noofdigits = (lge - 1)/k;
fk = noofdigits * k;

Note

Word position si and bit position di of the digit ei in the variables word and bit.

word = fk >> LDBITPERDGT;     /* fk div 16 */
bit = fk & (BITPERDGT-1U);     /* fk mod 16 */

Note

Calculation of the digit en−1 with the above-derived formula; en−1 is represented by the variable digit.

switch (k)
 {
  case 1:
  case 2:
  case 4:
  case 8:
   digit = ((ULONG)(e_l[word + 1] ) >> bit) & k_mask;
   break;
  default:
   digit = ((ULONG)(e_l[word + 1] | ((ULONG)e_l[word + 2]
                                              << BITPERDGT)) >> bit) & k_mask;
}

Note

First run through step 2 of the algorithm, the case digit = en−1 ≠ 0.

if (digit != 0)    /* k-digit > 0 */
 {
  cpy_l (acc_l, aptr_l[oddtab[digit]]);

Note

Calculation of p2t; t is set to the two-part of en−1 via twotab[en−1]; p is represented by acc_l.

t = twotab[digit];
  for (; t > 0; t--)
   {
    msqr_l (acc_l, acc_l, m_l);
   }
  }
 else    /* k-digit == 0 */
  {
   SETONE_L (acc_l);
  }

Note

Loop over noofdigits beginning with f - 1.

for (--noofdigits, fk -= k; noofdigits >= 0; noofdigits--, fk -= k)
 {

Note

Word position si and bit position di of the digit ei in the variables word and bit.

word = fk >> LDBITPERDGT;   /* fk div 16 */
bit = fk & (BITPERDGT - 1U);         /* fk mod 16 */

Note

Computation of the digit ei with the formula derived above; ei is represented by the variable digit.

switch (k)
 {
  case 1:
  case 2:
  case 4:
  case 8:
   digit = ((ULONG)(e_l[word + 1] ) >> bit) & k_mask;
   break;
  default:
   digit = ((ULONG)(e_l[word + 1] | ((ULONG)e_l[word + 2]
                                                 << BITPERDGT)) >> bit) & k_mask;
 }

Note

Step 3 of the algorithm, the case digit = ei ≠ 0; t is set via the table twotab[ei] to the two-part of ei.

if (digit != 0)    /* k-digit > 0 */
 {
  t = twotab[digit];

Note

Calculation of p2k-t au in acc_l. Access to au with the odd part u of ei is via aptr_l [oddtab [ei]].

for (s = k - t; s > 0; s--)
 {
  msqr_l (acc_l, acc_l, m_l);
 }
mmul_l (acc_l, aptr_l[oddtab[digit]], acc_l, m_l);

Note

Calculation of p2t; p is still represented by acc_l.

for (; t > 0; t--)
  {
   msqr_l (acc_l, acc_l, m_l);
  }
 }
else /* k-digit == 0 */
 {

Note

Step 3 of the algorithm, case ei = 0: Calculate p2k.

for (s = k; s > 0; s--)
  {
   msqr_l (acc_l, acc_l, m_l);
  }
 }
}

Note

End of the loop; output of acc_l as power residue modulo m_l.

cpy_l (p_l, acc_l);

Note

At the end, allocated memory is released.

free (aptr_l);
 if (ptr_l != NULL) free (ptr_l);
 return E_CLINT_OK;
}

The various processes of M-ary exponentiation can be clarified with the help of a numerical example. To this end let us examine the calculation of the power 1234667 mod 18577, which will be carried out by the function mexpk_l() in the following steps:

  1. Precomputations

    The representation of the exponent e = 667 can be expressed to the base 2k with k = 2 (cf. the algorithm for M -ary exponentiation on page 89), whereby the exponent e has the representation e = (10 10 01 10 11)22.

    The power a3 mod 18577 has the value 17354. Further powers of a do not arise in the precomputation because of the small size of the exponent.

  2. Exponentiation loop

    exponent digit ei = 2tu

    21•1

    21•1

    20•1

    21•1

    20•3

    pp2 mod n

    -

    14132

    13261

    17616

    13599

    pp22 mod n

    -

    -

    4239

    -

    17343

    ppau mod n

    1234

    13662

    10789

    3054

    4445

    pp2 mod n

    18019

    7125

    -

    1262

    -

  3. Result

    p = 1234667 mod 18577 = 4445.

As an extension to the general case we shall introduce a special version of exponentiation with a power of two 2k as exponent. From the above considerations we know that this function can be implemented very easily by means of k-fold exponentiation. The exponent 2k will be specified by k.

Function:

modular exponentiation with exponent a power of 2

Syntax:

int mexp2_l (CLINT a_l, USHORT k, CLINT p_l,
               CLINT m_l);

Input:

a_l (base)
k (exponent of 2)
m_l (modulus)

Output:

p_l (residue of a_l2k mod m_l)

Return:

E_CLINT_OK if all is ok
E_CLINT_DBZ if division by 0
int
mexp2_l (CLINT a_l, USHORT k, CLINT p_l, CLINT m_l)
{
 CLINT tmp_l;
 if (EQZ_L (m_l))
  {
   return E_CLINT_DBZ;
  }

Note

If k > 0, then a_l is squared k times modulo m_l.

if (k > 0)
  {
   cpy_l (tmp_l, a_l);
   while (k-- > 0)
    {
     msqr_l (tmp_l, tmp_l, m_l);
    }
   cpy_l (p_l, tmp_l);
  }
else

Note

Otherwise, if k = 0, we need only to reduce modulo m_l.

{
   mod_l (a_l, m_l, p_l);
  }
 return E_CLINT_OK;
}

Addition Chains and Windows

A number of algorithms for exponentiation have been published, some of which are conceived for arbitrary operands and others for special cases. The goal is always to find procedures that employ as few multiplications and divisions as possible. The passage from binary to M-ary exponentiation is an example of how the number of these operations can be reduced.

Binary and M-ary exponentiation are themselves special cases of the construction of addition chains (cf. [Knut], Section 4.6.3). We have already taken advantage of the fact that the laws of exponentiation allow the additive decomposition of the exponent of a power:

Addition Chains and Windows

Equation 6.16. 

Addition Chains and Windows

from which follows the exponentiation in the form of alternating squarings and multiplications (cf. page 82):

Equation 6.17. 

Addition Chains and Windows

The associated addition chain is obtained by considering the exponents to powers of a that arise as intermediate results in this process:

Equation 6.18. 

Addition Chains and Windows

Here terms of the sequence are omitted if ej = 0 for a particular j. For the number 123, for example, based on the binary method the result is the following addition chain with 12 elements: 1, 2, 3, 6, 7, 14, 15, 30, 60, 61, 122, 123.

In general, a sequence of numbers 1 = a0, a1, a2, . . . , ar = e for which for every i = 1, . . . , r there exists a pair (j, k) with jk < i such that ai = aj + ak holds is called an addition chain for e of length r.

The M-ary method generalizes this principle for the representation of the exponent to other bases. Both methods have the goal of producing addition chains that are as short as possible, in order to minimize the calculational expense for the exponentiation. The addition chain for 123 produced by the 23-ary method is 1, 2, 3, 4, 7, 8, 15, 30, 60, 120, 123; with the 24-ary method the addition chain 1, 2, 3, 4, 7, 11, 14, 28, 56, 112, 123 is created. These last chains are, as expected, considerably shorter than those obtained by the binary method, which for larger numbers will have a greater effect than in this example. In view of the real savings in time one must, however, note that in the course of initialization for the calculation of ae mod n the M-ary methods construct the powers a2, a3, a5, aM−1 also for those exponents that are not needed in the representation of e to the base M or for the construction of the addition chain.

Binary exponentiation represents the worst case of an addition chain: By considering it we obtain a bound on the greatest possible length of an addition chain of log2 e + H (e) − 1, where H(e) denotes the Hamming weight of e.[14] The length of an addition chain is bounded below by log2 e + log2 H (e) − 2.13, and so a shorter addition chain for e is not to be found (cf. [Scho] or [Knut], Section 4.6.3, Exercises 28, 29). For our example this means that the shortest addition chain for e = 123 has length at least 8, and so the results of the M-ary methods cited earlier seem not to be the best possible.

The search for shortest addition chains is a problem for which there is as yet no known polynomial-time procedure. It lies in the complexity class NP of those decision problems that can be solved in polynomial time by nondeterministic methods, that is, those that can be solved by "guessing," where the necessary time for calculation is bounded by a polynomial p that is a function of the size of the input. In contrast to this, the class P contains those problems that can be solved deterministically in polynomial time.[15] It is not surprising that P is a subset of NP, since all polynomial-time deterministic problems can also be solved nondeterministically.

The determination of the shortest addition chain is an NP-complete problem, that is, a problem that is at least as difficult to solve as all other problems in the set NP (cf. [Yaco] and [HKW], page 302). The NP-complete problems are therefore of particular interest, since if for even one of them a deterministic polynomial-time procedure could be found, then all other problems in NP could be solved in polynomial time as well. In this case, the classes P and NP would collapse into a single set of problems. Although P ≠ NP is conjectured, this problem has remained unsolved, and it represents a central problem of complexity theory.

With this it is clear that all practical procedures for generating addition chains must rest on heuristics, that is to say, mathematical rules of thumb such as that for the determination of the exponent k in 2k-ary exponentiation, of which one knows that it has better time behavior than other methods.

For example, in 1990 Y. Yacobi [Yaco] described a connection between the construction of addition chains and the compression of data according to the Lempel–Ziv procedure; there an exponentiation algorithm based on this compression procedure as well as on the M-ary method is also given.

In the search for the shortest possible addition chains the M-ary exponentiation can be further generalized, which we shall pursue below in greater detail. The window methods represent the exponent not as in the M-ary method by digits to a fixed base M, but by digits of varying binary lengths. Thus, for example, long sequences of binary zeros, called zero windows, can appear as digits of the exponent. If we recall the M-ary algorithm from page 89, it is clear that for a zero window of length l only the l-fold repetition of squaring is required, and the corresponding step is then

Equation 6.19. 

Addition Chains and Windows

Digits different from zero will be treated, depending on the process, either as windows of fixed length or as variable windows with a maximal length. As with the M-ary process, for every nonzero window (in the following called not quite aptly a "1-window") of length t, in addition to the repeated squaring an additional multiplication by a precalculated factor is required, in analogy to the corresponding step of the 2k-ary procedure:

Equation 6.20. 

Addition Chains and Windows

The number of factors to be precomputed depends on the permitted maximal length of the 1-window. One should note that the 1-windows in the least-significant position always have a 1 and thus are always odd. The factorization of the exponent digit on page 89 into an even and odd factor will thus at first not be needed. On the other hand, in the course of exponentiation the exponent is processed from the most-significant to least-significant place, which means for the implementation that first the complete decomposition of the exponent must be carried out and stored before the actual exponentiation can take place.

Yet, if we begin the factorization of the exponent at the most-significant digit and travel from left to right, then every 0- or 1-window can be processed immediately, as soon as it is complete. This means, of course, that we will also obtain 1-windows with an even value, but the exponentiation algorithm is prepared for that.

Both directions of decomposition of the exponent into 1-windows with fixed length l follow essentially the same algorithm, which we formulate below for decomposition from right to left.

Decomposition of an integer e into 0-windows and 1-windows having fixed length l

  1. If the least-significant binary digit is equal to 0, then begin a 0-window and go to step 2; otherwise, begin a 1-window and go to step 3.

  2. Add the next-higher binary digits in a 0-window as long as no 1 appears. If a 1 appears, then close the 0-window, begin a 1-window, and go to step 3.

  3. Collect a further l − 1 binary digits into a 1-window. If the next-higher digit is a 0, begin a 0-window and go to step 2; otherwise, begin a 1-window and go to step 3. If in the process all digits of e have been processed, then terminate the algorithm.

The decomposition from left to right begins with the most-significant binary digit and otherwise proceeds analogously. If we suppose that e has no leading binary zeros, then the algorithm cannot reach the end of the representation of e within step 2, and the procedure terminates in step 3 under the same condition given there. The following examples illustrate this process:

  • Let e = 1896837 = (111001111000110000101)2, and let l = 3. Beginning with the least-significant binary digit, e is decomposed as follows:

    Equation 6.21. 

    Addition Chains and Windows

    The choice l = 4 leads to the following decomposition of e:

    Equation 6.22. 

    Addition Chains and Windows

    The 2k-ary exponentiation considered above yields, for example for k = 2, the following decomposition:

    Equation 6.23. 

    Addition Chains and Windows

    The window decomposition of e for l = 3 contains five 1-windows, while that for l = 4 has only four, and for each the same number of additional multiplications is required. On the other hand, the 22-ary decomposition of e contains eight 1-windows, requires double the number of additional multiplications compared to the case l = 4, and is thus significantly less favorable.

  • The same procedure, but beginning with the most-significant binary digit, yields for l = 4 and e = 123 the decomposition

    Equation 6.24. 

    Addition Chains and Windows

    likewise with four 1-windows, which, as already established above, are not all odd.

Finally, then, exponentiation with a window decomposition of the exponent can be formalized by the following algorithm. Both directions of window decomposition are taken into account.

Algorithm for exponentiation ae mod m with the representation of e in windows of (maximal) length l for odd 1-windows

  1. Decompose the exponent e into 0- and 1-windows (ωk−1 . . .ω0) of respective lengths lk−1, . . . ,l0.

  2. Calculate and store a3 mod m, a5 mod m, a7 mod m, . . ., a2l−1 mod m.

  3. Set paωk−1 mod m and ik−2.

  4. Set ppli mod m.

  5. If ωi ≠ 0, set ppaωi mod m.

  6. Set ii - 1; if i ≥ 0, go to step 4.

  7. Output p.

If not all 1-windows are odd, then steps 3 through 6 are replaced by the following, and there is no step 7:

3′. If ωk−1 = 0, set pp2lk−1 mod
Addition Chains and Windows
(lk−1 times) mod m. If ωk−1 ≠ 0, factor ωk−1 = 2tu with odd u; set pau mod m, and then pp2t mod m. In each case set ik − 2.
4′. If ωi = 0, set pp2li mod
Addition Chains and Windows
(li times) mod m.
If ω ≠ 0, factor ωi = 2tu with odd u; set pp2lit mod m, and then ppau mod m; now set pp2t mod m.
5′. Set ii − 1; if i ≥ 0, go to step 4′.
6′. Output p.

Montgomery Reduction and Exponentiation

Now we are going to abandon addition chains and turn our attention to another idea, one that is interesting above all from the algebraic point of view. It makes it possible to replace multiplications modulo an odd number n by multiplications modulo a power of 2, that is, 2k, which requires no explicit division and is therefore more efficient than a reduction modulo an arbitrary number n. This useful method for modular reduction was published in 1985 by P. Montgomery [Mont] and since then has found wide practical application. It is based on the following observation.

Let n and r be relatively prime integers, and let r−1 be the multiplicative inverse of r modulo n; and likewise let n−1 be the multiplicative inverse of n modulo r; and furthermore, define n′ := -n−1 mod r and m := tn′ mod r. For integers t we then have

Equation 6.25. 

Montgomery Reduction and Exponentiation

Note that on the left side of the congruence we have taken congruences modulo r and a division by r (note that t + mn ≡ 0 mod r, so the division has no remainder), but we have not taken congruences modulo n. By choosing r as a power of 2 in the form 2s we can reduce a number x modulo r simply by slicing off x at the sth bit (counting from the least-significant bit), and we can carry out the division of x by r by shifting x to the right by s bit positions. The left side of (6.8) thus requires significantly less computational expense than the right side, which is what gives the equation its charm. For the two required operations we can invoke the functions mod2_l() (cf. Section 4.3) and shift_l() (cf. Section 7.1).

This principle of carrying out reduction modulo n is called Montgomery reduction. Below, we shall institute Montgomery reduction for the express purpose of speeding up modular exponentiation significantly in comparison to our previous results. Since the procedure requires that n and r be relatively prime, we must take n to be odd. First we have to deal with a couple of considerations.

We can clarify the correctness of the previous congruence with the help of some simple checking. Let us replace m on the left-hand side of (6.8) by the expression tn′ mod r, which is (6.9), and further, replace tn′ mod r by tn′r

Montgomery Reduction and Exponentiation

Equation 6.26. 

Montgomery Reduction and Exponentiation

To summarize equation (6.8) we record the following: Let n, t, r

Montgomery Reduction and Exponentiation

Equation 6.27. 

Montgomery Reduction and Exponentiation

we have

Equation 6.28. 

Montgomery Reduction and Exponentiation

We shall return to this result later.

To apply Montgomery reduction we shift our calculations modulo n into a complete residue system (cf. Chapter 5)

Equation 6.29. 

Montgomery Reduction and Exponentiation

with a suitable r := 2s>0 such that 2s−1n < 2s. Then we define the Montgomery product "×" of two numbers a and b in R:

Equation 6.30. 

Montgomery Reduction and Exponentiation

with r−1 representing the multiplicative inverse of r modulo n. We have

Equation 6.31. 

Montgomery Reduction and Exponentiation

and thus the result of applying × to members of R is again in R. The Montgomery product is formed by applying Montgomery reduction, where again n′ := −n−1 mod r. From n′ we derive the representation 1 = gcd(n, r) = r′rn′n, which we calculate in anticipation of Section 10.2 with the help of the extended Euclidean algorithm. From this representation of 1 we immediately obtain

1 ≡r′r mod n

and

1 ≡ −n′n mod r,

so that r′ = r−1 mod n is the multiplicative inverse of r modulo n, and n′ = −n−1 mod r the negative of the inverse of n modulo r (we are anticipating somewhat; cf. Section 10.2). The calculation of the Montgomery product now takes place according to the following algorithm.

Calculation of the Montgomery product a × b in R(r, n)

  1. Set tab.

  2. Set mtn′ mod r.

  3. Set u ← (t + mn)/r (the quotient is an integer; see above).

  4. If un, output un, and otherwise u. Based on the above selection of the parameter we have a, b < n as well as m, n < r and finally u < 2n; cf. (6.21).

The Montgomery product requires three long-integer multiplications, one in step 1 and two for the reduction in steps 2 and 3. An example with small numbers will clarify the situation: Let a = 386, b = 257, and n = 533. Further, let r = 210. Then n′ = −n−1 mod r = 707, m = 6, t + mn = 102400, and u = 100.

A modular multiplication ab mod n with odd n can now be carried out by first transforming a′ar mod n and b′br mod n to R, there forming the Montgomery product p′a′ × b′ = a′b′r−1 mod n and then with pp′ × 1 = p′r−1 = ab mod n obtaining the desired result. However, we can spare ourselves the reverse transformation effected in the last step by setting pa′ × b at once and thus avoid the transformation of b, so that in the end we have the following algorithm.

Calculation of p = ab mod n (n odd) with the Montgomery product

  1. Determine r := 2s with 2s−1 ≤ n < 2s. Calculate 1 = r′rn′n by means of the extended Euclidean algorithm.

  2. Set a′ar mod n.

  3. Set pa′ × b and output p.

Again we present an example with small numbers for clarification: Let a = 123, b = 456, n = 789, r = 210. Then n′ = −n−1 mod r = 963, a′ = 501, and p = a′ × b = 69 = ab mod n.

Since the precalculation of r′ and n′ in steps 1 and 2 is very time-consuming and Montgomery reduction in this version also has two long-number multiplications on its balance sheet, there is actually an increased computational expenditure compared with "normal" modular multiplication, so that the computation of individual products with Montgomery reduction is not worthwhile.

However, in cases where many modular multiplications with a constant modulus are required, for which therefore the time-consuming precalculations occur only once, we may expect more favorable results. Particularly suited for the Montgomery product is modular exponentiation, for which we shall suitably modify the M-ary algorithm. To this end let once again e = (em−1em−2 . . . e0)B and n = (nl−1nl−2 . . . n0)B be the representations of the exponent e and the modulus n to the base B = 2k. The following algorithm calculates powers ae mod n in • n with odd n using Montgomery multiplication. The squarings that occur in the exponentiation become Montgomery products a × a, in the computation of which we can use the advantages of squaring.

Exponentiation modulo n (n odd) with the Montgomery product

  1. Set rBl = 2kl. Calculate 1 = rr′nn′ with the Euclidean algorithm.

  2. Set

    Montgomery Reduction and Exponentiation
  3. If em−1 ≠ 0, factor em−1 = 2tu with odd u. Set

    Montgomery Reduction and Exponentiation

    If em−1 = 0, set

    Montgomery Reduction and Exponentiation

    In each case set im − 2.

  4. If ei = 0, set

    Montgomery Reduction and Exponentiation

    If ei ≠ 0 factor ei = 2tu with odd u. Set

    Montgomery Reduction and Exponentiation
  5. If i ≥ 0, set ii − 1 and go to step 4.

  6. Output the Montgomery product

    Montgomery Reduction and Exponentiation

Further possibilities for improving the algorithm lie less in the exponentiation algorithm than in the implementation of the Montgomery product itself, as demonstrated by S. R. Dussé and B. S. Kaliski in [DuKa]: In calculating the Montgomery product on page 108, in step 2 we can avoid the assignment mtn′ mod r in the reduction modulo r. Furthermore, we can calculate with n′0 := n′ mod B instead of with n′ in executing the Montgomery reduction.

We can create a digit mitin′0 modulo B, multiply it by n, scale by the factor Bi, and add to t. To calculate ab mod n with a, b < n the modulus n has the representation n = (nl−1nl−2 . . . n0)B as above, and we let r := Bl as well as rr′nn′ = 1 and n′0 := n′ mod B.

Calculation of the Montgomery product a × b à la Dussé and Kaliski

  1. Set tab, n′0n′ mod B, i ← 0.

  2. Set mitin′0 mod B (mi is a one-digit integer).

  3. Set tt + minBi.

  4. Set ii + 1; if il − 1, go to step 2.

  5. Set tt/r.

  6. If tn, output tn and otherwise t.

Dussé and Kaliski state that the basis for their clever simplification is the method of Montgomery reduction to develop t as a multiple of r, but they offer no proof. Before we use this procedure we wish to make more precise why it suffices to calculate a × b. The following is based on a proof of Christoph Burnikel [Zieg]:

In steps 2 and 3 the algorithm calculates a sequence
Montgomery Reduction and Exponentiation
by means of the recursion

where

Equation 6.32. 

Montgomery Reduction and Exponentiation

is the already familiar function that is induced by the Montgomery equation (cf. (6.13), and there set rB in f(t)). The members of the sequence t(i) have the properties

Equation 6.33. 

Montgomery Reduction and Exponentiation

Properties (6.18) and (6.19) are derived inductively from (6.14), (6.15), (6.16), and (6.17); from (6.18) we obtain

Montgomery Reduction and Exponentiation

Equation 6.34. 

Montgomery Reduction and Exponentiation

(note here that t(0) = ab < n2 < nBl).

The expenditure for the reduction is now determined essentially by multiplication of numbers of order of magnitude the size of the modulus. This variant of Montgomery multiplication can be elegantly implemented in code that forms the core of the multiplication routine mul_l() (cf. page 36).

Function:

Montgomery product

Syntax:

void mulmon_l (CLINT a_l, CLINT b_l, CLINT n_l,
               USHORT nprime, USHORT logB_r,
               CLINT p_l);

Input:

a_l, b_l (factors a and b)
n_l (modulus n > a, b)
nprime (n′ mod B)
logB_r (logarithm of r to base B = 216;
it must hold that BlogB_r−1  ≤ n < BlogB_r)

Output:

p_l (Montgomery product a × b = a•b•r−1 mod n)
void
mulmon_l (CLINT a_l, CLINT b_l, CLINT n_l, USHORT nprime,
         USHORT logB_r, CLINT p_l)
{
 CLINTD t_l;
 clint *tptr_l, *nptr_l, *tiptr_l, *lasttnptr, *lastnptr;
 ULONG carry;
 USHORT mi;
 int i;

 mult (a_l, b_l, t_l);
 lasttnptr = t_l + DIGITS_L (n_l);
 lastnptr = MSDPTR_L (n_l);

Note

The earlier use of mult() makes possible the multiplication of a_l and b_l without the possibility of overflow (see page 72); for the Montgomery squaring we simply insert sqr(). The result has sufficient space in t_l. Then t_l is given leading zeros to bring it to double the number of digits of n_l if t_l is smaller than this.

for (i = DIGITS_L (t_l) + 1; i <= (DIGITS_L (n_l) << 1); i++)
 {
  t_l[i] = 0;
 }
SETDIGITS_L (t_l, MAX (DIGITS_L (t_l), DIGITS_L (n_l) << 1));

Note

Within the following double loop the partial products minBi with mi := tin′0 are calculated one after the other and added to t_l. Here again the code is essentially that of our multiplication function.

for (tptr_l = LSDPTR_L (t_l); tptr_l <= lasttnptr; tptr_l++)
 {
  carry = 0;
  mi = (USHORT)((ULONG)nprime * (ULONG)*tptr_l);

  for (nptr_l = LSDPTR_L (n_l), tiptr_l = tptr_l;
    nptr_l <= lastnptr; nptr_l++, tiptr_l++)
   {
    *tiptr_l = (USHORT)(carry = (ULONG)mi * (ULONG)*nptr_l +
      (ULONG)*tiptr_l + (ULONG)(USHORT)(carry >> BITPERDGT));
   }

Note

In the following inner loop a possible overflow is transported to the most-significant digit of t_l, and t_l contains an additional digit in case it is needed. This step is essential, since at the start of the main loop t_l was given a value and not initialized via multiplication by 0 as was the variable p_l.

for ( ;
  ((carry >> BITPERDGT) > 0) && tiptr_l <= MSDPTR_L (t_l);
  tiptr_l++)
 {
  *tiptr_l = (USHORT)(carry = (ULONG)*tiptr_l +
                      (ULONG)(USHORT)(carry >> BITPERDGT));
 }
if (((carry >> BITPERDGT) > 0))
 {
  *tiptr_l = (USHORT)(carry >> BITPERDGT);
  INCDIGITS_L (t_l);
 }
}

Note

Now follows division by Bl, and we shift t_l by logB_r digits to the right, or ignore the logB_r least-significant digits of t_l. Then if applicable the modulus n_l is subtracted from t_l before t_l is returned as result into p_l.

tptr_l = t_l + (logB_r);
SETDIGIT_L (tptr_l, DIGITS_L (t_l) - (logB_r));
if (GE_L (tptr_l, n_l))
 {
  sub_l (tptr_l, n_l, p_l);
 }
else
 {
  cpy_l (p_l, tptr_l);
 }
}

The Montgomery squaring sqrmon_l() differs from this function only marginally: There is no parameter b_l in the function call, and instead of multiplication with mult(a_l, b_l, t_l) we employ the squaring function sqr(a_l, t_l), which likewise ignores a possible overflow. However, in modular squaring in the Montgomery method one must note that after the calculation of p′a′ × a′ the reverse transformation pp′ × 1 = p′ r− 1 = a2 mod n must be calculated explicitly (cf. page 108).

Function:

Montgomery square

Syntax:

void sqrmon_l (CLINT a_l, CLINT n_l, USHORT nprime,
               USHORT logB_r, CLINT p_l);

Input:

a_l (factor a), n_l (modulus n > a)
nprime (n′ mod B)
logB_r (logarithm of r to base B = 216);
it must hold that BlogB_r−1 ≤ n < BlogB_r

Output:

p_l (Montgomery square a2r−1 mod n)

In their article Dussé and Kaliski also present the following variant of the extended Euclidean algorithm, to be dealt with in detail in Section 10.2, for calculating n′0 = n′ mod B, with which the expenditure for the precalculations can be reduced. The algorithm calculates −n−1 mod 2s for an s > 0 and for this requires long-number arithmetic.

Algorithm for calculating the inversen−1 mod 2s for s > 0, n odd

  1. Set x ← 2, y ← 1, and i ← 2.

  2. If x < ny mod x, set y ← y + x.

  3. Set x ← 2x and i ← i + 1; if i ≤ s, go to step 2.

  4. Output x − y.

With complete induction it can be shown that in step 2 of this algorithm yn ≡ 1 mod x always holds, and thus y ≡ n−1 mod x. After x has taken on the value 2s in step 3, we obtain with 2sy−n−1 mod 2s the desired result if we choose s such that 2sB. The short function for this can be obtained under the name invmon_l() in the FLINT/C source. It takes only the modulus n as argument and outputs the value −n−1 mod B.

These considerations are borne out in the creation of the functions mexp5m_l() and mexpkm_l(), for which we give here only the interface, together with a computational example.

Function:

modular exponentiation with odd modulus(25 -ary or 2k -ary method with Montgomery product)

Syntax:

int mexp5m_l (CLINT bas_l, CLINT exp_l,
               CLINT p_l, CLINT m_l);
int mexpkm_l (CLINT bas_l, CLINT exp_l,
               CLINT p_l, CLINT m_l);

Input:

bas_l (base)
exp_l (exponent)
m_l (modulus)

Output:

p_l (power residue)

Return:

E_CLINT_OK if all is ok
E_CLINT_DBZ if division by 0
E_CLINT_MAL if malloc() error
E_CLINT_MOD if even modulus

These functions employ the routines invmon_l(), mulmon_l(), and sqrmon_l() to compute the Montgomery products. Their implementation is based on the functions mexp5_l() and mexpk_l() modified according to the exponentiation algorithm described above.

We would like to reconstruct the processes of Montgomery exponentiation in mexpkm_l() with the same numerical example that we looked at for M -ary exponentiation (cf. page 100). In the following steps we shall calculate the power 1234667 mod 18577:

  1. Precomputations

    The exponent e = 667 is represented to the base 2k with k = 2 (cf. the algorithm for Montgomery exponentiation on page 114). The exponent e thereby has the representation

    Equation 6.35. 

    Montgomery Reduction and Exponentiation

    The value r for Montgomery reduction is r = 216 = B = 65536.

    The value n′0 (cf. page 110) is now calculated as n′0 = 34703.

    The transformation of the base a into the residue system R(r, n) (cf. page 107) follows from

    Equation 6.36. 

    Montgomery Reduction and Exponentiation

    The power ā3 in R(r, n) has the value ā3 = 9227. Because of the small exponent, further powers of ā do not arise in the precomputation.

  2. Exponentiation loop

    exponent digit ei = 2tu

    21 • 1

    21 • 1

    20 • 1

    21 • 1

    20 • 3

    Montgomery Reduction and Exponentiation

    16994

    3682

    14511

    11066

    Montgomery Reduction and Exponentiation

    6646

    12834

    Montgomery Reduction and Exponentiation

    5743

    15740

    8707

    16923

    1583

    Montgomery Reduction and Exponentiation

    9025

    11105

    1628

  3. Result

    The value of the power p after normalization:

    Equation 6.37. 

    Montgomery Reduction and Exponentiation

Those interested in reconstructing the coding details of the functions mexp5m_l() and mexpkm_l() and the calculational steps of the example related to the function mexpkm_l() are referred to the FLINT/C source code.

At the start of this chapter we developed the function wmexp_l(), which has the advantage for small bases that only multiplications p ← pa mod m of the type CLINT * USHORT mod CLINT occur. In order to profit from the Montgomery procedure in this function, too, we adjust the modular squaring to Montgomery squaring, as in mexpkm_l(), with the use of the fast inverse function invmon_l(), though we leave the multiplication unchanged. We can do this because with the calculational steps for Montgomery squaring and for conventional multiplication modulo n,

Equation 6.38. 

Montgomery Reduction and Exponentiation

we do not abandon the residue system R(r, n) = {ir mod n | 0 ≤ i < n } introduced above. This process yields us both the function wmexpm_l() and the dual function umexpm_l() for USHORT exponents, respectively for odd moduli, which in comparison to the two conventional functions wmexp_l() and umexp_l() again yields a significant speed advantage. For these functions, too, we present here only the interface and a numerical example. The reader is again referred to the FLINT/C source for details.

Function:

modular exponentiation with Montgomery reduction for USHORT-base, respectively USHORT exponents and odd modulus

Syntax:

int wmexpm_l (USHORT bas, CLINT e_l,
               CLINT p_l, CLINT m_l);
int umexpm_l (CLINT bas_l, USHORT e,
               CLINT p_l, CLINT m_l);

Input:

bas, bas_l (base)
e, e_l (exponent)
m_l (modulus)

Output:

p_l (residue of base_l mod m_l, resp. bas_le mod m_l)

Return:

E_CLINT_OK if all is ok
E_CLINT_DBZ if division by 0
E_CLINT_MOD if even modulus

The function wmexpm_l() is tailor-made for our primality test in Section 10.5, where we shall profit from our present efforts. The function will be documented with the example used previously of the calculation of 1234667 mod 18577.

  1. Precalculations

    The binary representation of the exponent is e = (1010011011)2.

    The value r for the Montgomery reduction is r = 216 = B = 65536.

    The value n′0 (cf. page 110) is calculated as above, yielding n′0 = 34703.

    The initial value of

    Montgomery Reduction and Exponentiation
  2. Exponentiation loop

    Exponent bit

    1

    0

    1

    0

    0

    1

    1

    0

    1

    1

    Montgomery Reduction and Exponentiation

    9805

    9025

    16994

    11105

    3682

    6646

    14511

    1628

    11066

    9350

    Montgomery Reduction and Exponentiation

    5743

    15740

    8707

    16923

    1349

    1583

  3. Result

    The value of the exponent p after normalization:

    Equation 6.39. 

    Montgomery Reduction and Exponentiation

A detailed analysis of the time behavior of Montgomery reduction with the various optimizations taken into account can be found in [Boss]. There we are promised a ten to twenty percent saving in time over modular exponentiation by using Montgomery multiplication. As can be seen in the overviews in Appendix D of typical calculation times for FLINT/C functions, our implementations bear out this claim fully. To be sure, we have the restriction that the exponentiation functions that use Montgomery reduction can be used only for odd moduli. Nonetheless, for many applications, for example for encryption and decryption, as well as for computing digital signatures according to the RSA procedure (see Chapter 17), the functions mexp5m_l() and mexpkm_l() are the functions of choice.

Altogether, we have at our disposal a number of capable functions for modular exponentiation. To obtain an overview, in Table 6-5 we collect these functions together with their particular properties and domains of application.

Table 6-5. Exponentiation functions in FLINT/C

Function:

Domain of Application

mexp5_l()

General 25-ary exponentiation, without memory allocation, greater stack requirements.

mexpk_l()

General 2k -ary exponentiation with optimal k for CLINT numbers, with memory allocation, lower stack requirements.

mexp5m_l()

25 -ary Montgomery exponentiation for odd moduli, without memory allocation, greater stack requirements.

mexpkm_l()

2k -ary Montgomery exponentiation for odd moduli, with optimal k for CLINT numbers up to 4096 binary digits, with memory allocation, lower stack requirements.

umexp_l()

Mixed binary exponentiation of a CLINT base with USHORT exponent, lower stack requirements.

umexpm_l()

Mixed binary exponentiation of a CLINT base with USHORT exponent and Montgomery reduction, thus only for odd moduli, lower stack requirements.

wmexp_l()

Mixed binary exponentiation of a USHORT base with CLINT exponent, lower stack requirements.

wmexpm_l()

Mixed binary exponentiation with Montgomery squaring of a USHORT base with CLINT exponent, odd moduli, lower stack requirements.

mexp2_l()

Mixed exponentiation with a power-of-2 exponent, lower stack requirements.

Cryptographic Application of Exponentiation

We have worked hard in this chapter in our calculation of powers, and it is reasonable to ask at this point what modular exponentiation might have to offer to cryptographic applications. The first example to come to mind is, of course, the RSA procedure, which requires a modular exponentiation for encryption and decryption—assuming suitable keys. However, the author would like to ask his readers for a bit (or perhaps even a byte) of patience, since for the RSA procedure we still must collect a few more items, which we do in the next chapter. We shall return to this extensively in Chapter 17.

For those incapable of waiting, we offer as examples of the application of exponentiation two important algorithms, namely, the procedure suggested in 1976 by Martin E. Hellman and Whitfield Diffie [Diff] for the exchange of cryptographic keys and the encryption procedure of Taher ElGamal as an extension of the Diffie–Hellman procedure.

The Diffie–Hellman procedure represents a cryptographic breakthrough, namely, the first public key, or asymmetric, cryptosystem (see Chapter 17). Two years after its publication, Rivest, Shamir, and Adleman published the RSA procedure (see [Rive]). Variants of the Diffie–Hellman procedure are used today for key distribution in the Internet communications and security protocols IPSec, IPv6, and SSL, which were developed to provide security in the transfer of data packets in the IP protocol layer and the transfer of data at the application level, for example from the realms of electronic commerce. This principle of key distribution thus has a practical significance that would be difficult to overestimate.[16]

With the aid of the Diffie–Hellman protocol two communicators, Ms. A and Mr. B, say, can negotiate in a simple way a secret key that then can be used for the encryption of communications between the two. After A and B have agreed on a large prime number p and a primitive root a modulo p (we shall return to this below), the Diffie–Hellman protocol runs as follows.

Protocol for key exchange à la Diffie–Hellman

  1. A chooses an arbitrary value xA ≤ p − 1 and sends yA := axA mod p as her public key to B.

  2. B chooses an arbitrary value xBp − 1 and sends yB := axB mod p as his public key to A.

  3. A computes the secret key sA := yBxA mod p.

  4. B computes the secret key sB := yAxB mod p.

Since

Equation 6.40. 

Cryptographic Application of Exponentiation

after step 4, A and B are dealing with a common key. The values p and a do not have to be kept secret, nor the values yA and yB exchanged in steps 1 and 2. The security of the procedure depends on the difficulty in calculating discrete logarithms in finite fields, and the difficulty of breaking the system is equivalent to that of calculating values xA or xB from values yA or yB in • p.[17] That the calculation of axy from ax and ay in a finite cyclic group (the Diffie–Hellman problem) is just as difficult as the calculation of discrete logarithms and thus equivalent to this problem is, in fact, conjectured but has not been proved.

To ensure the security of the procedure under these conditions the modulus p must be chosen sufficiently large (at least 1024 bits, better 2048 or more; see Table 17-1), and one should ensure that p − 1 contains a large prime divisor close to (p − 1)/2 to exclude particular calculational procedures for discrete logarithms (a constructive procedure for such prime numbers will be presented in Chapter 17 in connection with the generation of strong primes, for example for the RSA procedure).

The procedure has the advantage that secret keys can be generated as needed on an ad hoc basis, without the need for secret information to be held for a long time. Furthermore, for the procedure to be used there are no further infrastructure elements necessary for agreeing on the parameters a and b. Nonetheless, this protocol possesses some negative characteristics, the gravest of which is the lack of authentication proofs for the exchanged parameters yA and yB. This makes the procedure susceptible to man-in-the-middle attacks, whereby attacker X intercepts the messages of A and B with their public keys yA and yB and replaces them with falsified messages to A and B containing his own public key yX.

Then A and B calculate "secret" keys s′A := yXxA mod p and s′B := yXxB mod p, which X on his or her part calculates s′A from yAxAaxAxXaxXxAyXxAs′A mod p and s′B analogously. The Diffie–Hellman protocal has now been executed not between A and B, but between X and A as well as between X and B. Now X is in a position to decode messages from A or B and to replace them by falsified messages to A or B. What is fatal is that from a cryptographic point of view the participants A and B are clueless as to what has happened.

To compensate for these defects without giving up the advantages, several variants and extensions have been developed for use in the Internet. They all take into account the necessity that key information be exchanged in such a way that its authenticity can be verified. This can be achieved, for example, by the public keys being digitally signed by the participants and the associated certificate of a certification authority being sent with them (see in this regard page 400, Section 17.3), which is implemented, for example, in the SSL protocol. IPSec and IPv6 use a complexly constructed procedure with the name ISAKMP/Oakley,[18] which overcomes all the drawbacks of the Diffie–Hellman protocol (for details see [Stal], pages 422–423).

To determine a primitive root modulo p, that is, a value a whose powers ai mod p with i = 0, 1, ... , p − 2 constitute the entire set of elements of the multiplicative group • pX = {1, ... , p − 1 } (see in this regard Section 10.2), the following algorithm can be used (see [Knut], Section 3.2.1.2, Theorem C). It is assumed that the prime factorization p − 1 = P1e1 . . . pekk of the order of

Cryptographic Application of Exponentiation

Finding a primitive root modulo p

  1. Choose a random integer a € [0, p − 1] and set i ← 1.

  2. Compute ta(p−1)/pi mod p.

  3. If t = 1, go to step 1. Otherwise, set ii + 1. If ik, go to step 2. If i > k, output a and terminate the algorithm.

The algorithm is implemented in the following function.

Function:

ad hoc generation of a primitive root modulo p (2 < p prime)

Syntax:

int primroot_l (CLINT a_l, unsigned noofprimes,
               clint **primes_l);

Input:

noofprimes (number of distinct prime factors in p − 1, the order of the group)

primes_l (vector of pointers to CLINT objects, beginning with p − 1, then follow the prime divisors p1, ... , pk of the group order p − 1 = p1e1 ••• pkek, k = noofprimes)

Output:

a_l (primitive root modulo p_l)

Return:

E_CLINT_OK if all is ok

− 1 if p − 1 odd and thus p is not prime

int
primroot_l (CLINT a_l, unsigned int noofprimes, clint *primes_l[])
{
 CLINT p_l, t_l, junk_l;
 ULONG i;

 if (ISODD_L (primes_l[0]))
  {
   return −1;
  }

Note

primes_l[0] contains p − 1, from which we obtain the modulus in p_l.

cpy_l (p_l, primes_l[0]);
inc_l (p_l);
SETONE_L (a_l);

do
 {
  inc_l (a_l);

Note

As candidates a for the sought-after primitive root only natural numbers greater than or equal to 2 are tested. If a is a square, then a cannot be a primitive root modulo p, since then already a(p−1)/2 ≡ 1 mod p, and the order of a must be less than

Cryptographic Application of Exponentiation
if (issqr_l (a_l, t_l))
 {
  inc_l (a_l);
 }

i = 1;

Note

The calculation of ta(p − 1)/pi mod p takes place in two steps. All prime factors pi are tested in turn; we use Montgomery exponentiation. If a primitive root is found, it is output in a_l.

do
 {
  div_l (primes_l[0], primes_l[i++], t_l, junk_l);
  mexpkm_l (a_l, t_l, t_l, p_l);
 }
while ((i <= noofprimes) && !EQONE_L (t_l));
  }
 while (EQONE_L (t_l));

 return E_CLINT_OK;
}

As a second example for the application of exponentiation we consider the encryption procedure of ElGamal, which as an extension of the Diffie–Hellman procedure also provides security in the matter of the difficulty of computing discrete logarithms, since breaking the procedure is equivalent to solving the Diffie–Hellman problem (cf. page 119). Pretty good privacy (PGP), the workhorse known throughout the world for encrypting and signing e-mail and documents whose development goes back essentially to the work of Phil Zimmermann, uses the ElGamal procedure for key management (see [Stal], Section 12.1).

A participant A selects a public and associated private key as follows.

ElGamal key generation

  1. A chooses a large prime number p such that p − 1 has a large prime divisor close to (p − 1)/2 (cf. page 388) and a primitive root a of the multiplicative group

    Cryptographic Application of Exponentiation
  2. A chooses a random number x with 1 ≤ x > p − 1 and computes b := ax mod p with the aid of Montgomery exponentiation.

  3. As public key A uses the triple

    Cryptographic Application of Exponentiation

Using the public key triple

Cryptographic Application of Exponentiation

Protocol for encryption à la ElGamal

  1. B chooses a random number y with 1 ≤ y < p − 1.

  2. B calculates α := ay mod p and β := Mby mod p = M (ax)y mod p.

  3. B sends the cryptogram C := (α, β) to A.

  4. A computes from C the plain text using M = β/αx modulo p.

Since

Equation 6.41. 

Cryptographic Application of Exponentiation

the procedure works. The calculation of β/αx is carried out by means of a multiplication βαp−1−x modulo p.

The size of p should be, depending on the application, 1024 bits or longer (see Table 17-1), and for the encryption of different messages M1 and M2 unequal random values y1y2 should be chosen, since otherwise, from

Equation 6.42. 

Cryptographic Application of Exponentiation

it would follow that knowledge of M1 was equivalent to knowledge of M2. In view of the practicability of the procedure one should note that the cryptogram C is twice the size of the plain text M, which means that this procedure has a higher transmission cost than others.

The procedure of ElGamal in the form we have presented has an interesting weak point, which is that an attacker can obtain knowledge of the plain text with a small amount of information. We observe that the cyclic group

Cryptographic Application of Exponentiation
  1. axyU  (axU or ayU). This, and whether also βU, is tested with

  2. For all u

    Cryptographic Application of Exponentiation

One may ask how bad it might be if an attacker could gain such information about M. From the point of view of cryptography it is a situation difficult to accept, since the message space to be searched is reduced by half with little effort. Whether in practice this is acceptable certainly depends on the application. Surely, it is a valid reason to be generous in choosing the length of a key.

Furthermore, one can take some action against the weakness of the procedure, without, one hopes, introducing new, unknown, weaknesses: The multiplication Mby mod p in step 2 of the algorithm can be replaced with an encryption operation V (H (axy) , M) using a suitable symmetric encryption procedure V (such as Triple-DES, IDEA, or Rijndael, which has become the new advanced encryption standard; cf. Chapter 11) and a hash function H (cf. page 398) that so condenses the value axy that it can be used as a key for V.

So much for our examples of the application of modular exponentiation. In number theory, and therefore in cryptography as well, modular exponentiation is a standard operation, and we shall meet it repeatedly later on, in particular in Chapters 10 and 17. Furthermore, refer to the descriptions and numerous applications in [Schr] as well as in the encyclopedic works [Schn] and [MOV].



[14] If n possesses a representation n = (nk−1nk−2 . . . n0)2, then H(n) is defined as Σini (See [HeQu], Chapter 8.)

[15] If the input to such a problem is an integer n, then the number of digits of n can serve as a measure of the size of the input. There then exists a polynomial p such that the calculation time is bounded by p(log2 n). The difference whether the cost of solving the problem grows with n or with the number of digits of n is decisive.

[16] IP Security (IPSec), developed by the Internet Engineering Task Force (IETF), is, as an extensive security protocol, a part of the future Internet protocol IPv6. It was created so that it could also be used in the framework of the then current Internet protocol (IPv4). Secure Socket Layer (SSL) is a security protocol developed by Netscape that lies above the TCP protocol, which offers end-to-end security for applications such as HTTP, FTP, and SMTP (for all of this see [Stal], Chapters 13 and 14).

[17] For the problem of calculating discrete logarithms see [Schn], Section 11.6, as well as [Odly].

[18] ISAKMP: Internet Security Association and Key management Protocol.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.118.20.68