Chapter 1

Introduction

1.1 Multiple Precision Arithmetic

1.1.1 What Is Multiple Precision Arithmetic?

When we think of long-hand arithmetic such as addition or multiplication, we rarely consider the fact that we instinctively raise or lower the precision of the numbers we are dealing with. For example, in decimal we can almost immediately reason that 7 times 6 is 42. However, 42 has two digits of precision as opposed to the one digit we started with. A further multiplication by, say, 3 results in the larger precision result 126. In these few examples we have multiple precisions for the numbers we are working with. Despite the various levels of precision, a single subset¹ of algorithms can be designed to accommodate them.

By way of comparison, a fixed or single precision operation would lose precision on various operations. For example, in a decimal system fixed at one digit of precision, 6 · 7 = 2, since only the least significant digit of the true product 42 is retained.
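The same truncation occurs in binary on a computer. A minimal sketch, using an 8-bit unsigned type to stand in for a single precision digit:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
       uint8_t a = 200, b = 3;    /* two single precision operands */

       /* the true product 600 needs 10 bits; assigning it to an 8-bit
          variable keeps only the low 8 bits, and 600 mod 256 = 88    */
       uint8_t c = (uint8_t)(a * b);

       printf("%u\n", (unsigned)c);   /* prints 88, not 600 */
       return 0;
    }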

Essentially, at the heart of computer–based multiple precision arithmetic are the same long–hand algorithms taught in schools to manually add, subtract, multiply, and divide.

1.1.2 The Need for Multiple Precision Arithmetic

The most prevalent need for multiple precision arithmetic, often referred to as “bignum” math, is within the implementation of public key cryptography algorithms. Algorithms such as RSA [10] and Diffie–Hellman [11] require integers of significant magnitude to resist known cryptanalytic attacks. For example, at the time of this writing a typical RSA modulus would be at least greater than 10³⁰⁹. However, modern programming languages such as ISO C [17] and Java [18] only provide intrinsic support for integers that are relatively small and single precision.

The largest data type guaranteed to be provided by the ISO C programming language² can only represent values up to 10¹⁹, as shown in Figure 1.1. On its own, the C language is insufficient to accommodate the magnitude required for the problem at hand. An RSA modulus of magnitude 10¹⁹ could be trivially factored³ on the average desktop computer, rendering any protocol based on the algorithm insecure. Multiple precision algorithms solve this problem by extending the range of representable integers while using single precision data types.


Figure 1.1 Typical Data Types for the C Programming Language
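The idea behind that extension can be sketched with a fixed pair of digits before any library machinery is introduced: store a quantity as an array of single precision digits and propagate carries between them by hand. The digit widths below are assumptions for illustration only.

    #include <stdint.h>
    #include <stdio.h>

    /* add two 64-bit quantities stored as arrays of two 32-bit digits
       (least significant digit first), carrying between digits by hand */
    static void add64(const uint32_t a[2], const uint32_t b[2], uint32_t c[2])
    {
       uint64_t t;

       t    = (uint64_t)a[0] + b[0];             /* low digits          */
       c[0] = (uint32_t)t;
       t    = (uint64_t)a[1] + b[1] + (t >> 32); /* high digits + carry */
       c[1] = (uint32_t)t;
    }

    int main(void)
    {
       uint32_t a[2] = { 0xFFFFFFFFu, 0x00000001u }; /* 0x1FFFFFFFF */
       uint32_t b[2] = { 0x00000001u, 0x00000000u }; /* 1           */
       uint32_t c[2];

       add64(a, b, c);
       printf("0x%08X%08X\n", (unsigned)c[1], (unsigned)c[0]);
       /* prints 0x0000000200000000 */
       return 0;
    }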

Most advancements in fast multiple precision arithmetic stem from the need for faster and more efficient cryptographic primitives. Faster modular reduction and exponentiation algorithms such as Barrett’s reduction algorithm, which have appeared in various cryptographic journals, can render algorithms such as RSA and Diffie–Hellman more efficient. In fact, several major companies such as RSA Security, Certicom, and Entrust have built entire product lines on the implementation and deployment of efficient algorithms.

However, cryptography is not the only field of study that can benefit from fast multiple precision integer routines. Another auxiliary use of multiple precision integers is high precision floating point data types. The basic IEEE [12] standard floating point type is made up of an integer mantissa q, an exponent e, and a sign bit s. Numbers are given in the form n = q · bᵉ · (−1)ˢ, where b = 2 is the most common base for IEEE. Since IEEE floating point is meant to be implemented in hardware, the precision of the mantissa is often fairly small (23, 48, and 64 bits). The mantissa is merely an integer, and a multiple precision integer could be used to create a mantissa of much larger precision than hardware alone can efficiently support. This approach could be useful where scientific applications must minimize the total output error over long calculations.
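The decomposition into s, e, and q can be made concrete with a short sketch; it assumes that float is the 32-bit IEEE 754 single precision format, which is common but not guaranteed by the C standard.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
       float    f = -6.5f;   /* -1.625 * 2^2 */
       uint32_t bits;

       memcpy(&bits, &f, sizeof bits);    /* raw bit pattern of the float */

       uint32_t s = bits >> 31;           /* sign bit                     */
       uint32_t e = (bits >> 23) & 0xFFu; /* biased 8-bit exponent        */
       uint32_t q = bits & 0x7FFFFFu;     /* 23-bit mantissa field        */

       /* for normalized values the implicit leading 1 is restored and the
          bias of 127 removed: n = (-1)^s * 1.q * 2^(e - 127)             */
       printf("s = %u, e = %d, q = 0x%06X\n",
              (unsigned)s, (int)e - 127, (unsigned)q);
       return 0;
    }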

Yet another use for large integers is within arithmetic on polynomials of large characteristic (i.e., GF(p)[x] for large p). In fact, the library discussed within this text has already been used to form a polynomial basis library⁴.

1.1.3 Benefits of Multiple Precision Arithmetic

The benefit of multiple precision representations over single or fixed precision representations is that no precision is lost while representing the result of an operation that requires excess precision. For example, the product of two n–bit integers requires at least 2n bits of precision to be represented faithfully, since the largest such product, (2ⁿ − 1)² = 2²ⁿ − 2ⁿ⁺¹ + 1, occupies 2n bits. A multiple precision algorithm would augment the precision of the destination to accommodate the result, while a single precision system would truncate excess bits to maintain a fixed level of precision.

It is possible to implement algorithms that require large integers with fixed precision algorithms. For example, elliptic curve cryptography (ECC) is often implemented on smartcards by fixing the precision of the integers to the maximum size the system will ever need. Such an approach can lead to vastly simpler algorithms that can accommodate the integers required even if the host platform cannot natively accommodate them⁵. However, as efficient as such an approach may be, the resulting source code is not normally very flexible. It cannot, at run time, accommodate inputs of higher magnitude than the designer anticipated.

Multiple precision algorithms have the most overhead of any style of arithmetic. For the most part the overhead can be kept to a minimum with careful planning, but overall, multiple precision arithmetic is not well suited for most memory starved platforms. However, multiple precision algorithms do offer the most flexibility in terms of the magnitude of the inputs. That is, the same algorithms based on multiple precision integers can accommodate any reasonable size of input without the designer's explicit forethought. This leads to a lower cost of ownership for the code, as it only has to be written and tested once.

1.2 Purpose of This Text

The purpose of this text is to instruct the reader regarding how to implement efficient multiple precision algorithms. That is, to explain a limited subset of the core theory behind the algorithms, and the various “housekeeping” elements that are neglected by authors of other texts on the subject. Several texts [1], [2] give considerably detailed explanations of the theoretical aspects of algorithms and often very little information regarding the practical implementation aspects.

In most cases, how an algorithm is explained and how it is actually implemented are two very different concepts. For example, the Handbook of Applied Cryptography (HAC), algorithm 14.7 on page 594, gives a relatively simple algorithm for performing multiple precision integer addition. However, the description lacks any discussion of the fact that the two integer inputs may be of differing magnitudes. As a result, the implementation is not as simple as the text would lead people to believe. Similarly, the division routine (algorithm 14.20, p. 598) does not discuss how to handle the sign or the dividend's decreasing magnitude in the main loop (step #3).
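For instance, a correct addition must walk the digits both inputs share and then continue through the remaining digits of the larger input while the carry propagates. The sketch below, which uses decimal digits for readability and is not the LibTomMath routine, fills in exactly the detail the description omits.

    /* add an n-digit a to an m-digit b (n >= m); base-10 digits stored
       least significant first; c must hold n + 1 digits; returns the
       digit count of the sum (illustrative sketch only)               */
    static int add_sketch(const int *a, int n, const int *b, int m, int *c)
    {
       int i, t, carry = 0;

       for (i = 0; i < m; i++) {      /* digits both inputs have */
          t     = a[i] + b[i] + carry;
          c[i]  = t % 10;
          carry = t / 10;
       }
       for (; i < n; i++) {           /* digits only a has */
          t     = a[i] + carry;
          c[i]  = t % 10;
          carry = t / 10;
       }
       c[i] = carry;                  /* possible final carry digit */
       return carry ? n + 1 : n;
    }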

Neither text discusses several key optimal algorithms, such as the “Comba” and Karatsuba multipliers and fast modular inversion, which we consider a practical oversight. These optimal algorithms are vital to achieve any form of useful performance in non–trivial applications.

To solve this problem, the focus of this text is on the practical aspects of implementing a multiple precision integer package. As a case study, the “LibTomMath”⁶ package is used to demonstrate algorithms with real implementations⁷ that have been field tested and work very well. The LibTomMath library is freely available on the Internet for all uses, and this text discusses a very large portion of the inner workings of the library.

The algorithms presented will always include at least one “pseudo–code” description followed by the actual C source code that implements the algorithm. The pseudo–code can be used to implement the same algorithm in other programming languages as the reader sees fit.

This text shall also serve as a walk–through of the creation of multiple precision algorithms from scratch, showing the reader how the algorithms fit together and where to start on various tasks.

1.3 Discussion and Notation

1.3.1 Notation

A multiple precision integer of n digits shall be denoted as x = (xₙ₋₁, …, x₁, x₀)β and represent the integer x = xₙ₋₁βⁿ⁻¹ + ⋯ + x₁β¹ + x₀β⁰. The elements of the array x are said to be the radix β digits of the integer. For example, x = (1, 2, 3)₁₀ would represent the integer 1 · 10² + 2 · 10¹ + 3 · 10⁰ = 123.
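A short hypothetical helper, restricted to values that fit in a machine integer, shows how such a digit array maps back to the value it represents:

    /* evaluate the digits x[0..n-1] (least significant first) in radix
       beta via Horner's rule:
       x[0] + x[1]*beta + ... + x[n-1]*beta^(n-1)                      */
    static long radix_value(const int *x, int n, int beta)
    {
       long v = 0;
       int  i;

       for (i = n - 1; i >= 0; i--)
          v = v * beta + x[i];
       return v;
    }

    /* e.g., the digits {3, 2, 1} in base 10 evaluate to 123 */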

The term “mp_int” shall refer to a composite structure that contains the digits of the integer it represents, and auxiliary data required to manipulate the data. These additional members are discussed further in section 2.2.1. For the purposes of this text, a “multiple precision integer” and an “mp_int” are assumed synonymous. When an algorithm is specified to accept an mp_int variable, it is assumed the various auxiliary data members are present as well. An expression of the type variablename.item implies that it should evaluate to the member named “item” of the variable. For example, a string of characters may have a member “length” that would evaluate to the number of characters in the string. If the string a equals hello, then it follows that a.length = 5.
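As a preview, such a composite structure might take roughly the following shape; the member names below are chosen to match the discussion, and the authoritative definition is the one given in section 2.2.1.

    #include <stdint.h>

    typedef uint32_t mp_digit;   /* single precision digit (assumed width) */

    typedef struct {
       int       used;    /* number of digits currently significant */
       int       alloc;   /* number of digits currently allocated   */
       int       sign;    /* whether the integer is negative        */
       mp_digit *dp;      /* the digits, least significant first    */
    } mp_int;

In this notation an expression such as a.used would evaluate to the number of significant digits of a, just as a.length evaluated to the length of the string above.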

For certain discussions, more generic algorithms are presented to help the reader understand the final algorithm used to solve a given problem. When an algorithm is described as accepting an integer input, it is assumed the input is a plain integer with no additional multiple precision members. That is, algorithms that use integers as opposed to mp_ints as inputs do not concern themselves with the housekeeping operations required such as memory management. These algorithms will be used to establish the relevant theory that will subsequently be used to describe a multiple precision algorithm to solve the same problem.

1.3.2 Precision Notation

The variable β represents the radix of a single digit of a multiple precision integer and must be of the form qᵖ for q, p ∈ ℤ⁺. A single precision variable must be able to represent integers in the range 0 ≤ x < qβ, while a double precision variable must be able to represent integers in the range 0 ≤ x < qβ². The extra radix–q factor allows additions and subtractions to proceed without truncation of the carry. Since all modern computers are binary, it is assumed that q is two.

Within the source code that will be presented for each algorithm, the data type mp_digit will represent a single precision integer type, while the data type mp_word will represent a double precision integer type. In several algorithms (notably the Comba routines), temporary results will be stored in arrays of double precision mp_words. For the purposes of this text, xⱼ will refer to the j′th digit of a single precision array, and x̂ⱼ will refer to the j′th digit of a double precision array. Whenever an expression is to be assigned to a double precision variable, it is assumed that all single precision variables are promoted to double precision during the evaluation. Expressions that are assigned to a single precision variable are truncated to fit within the precision of a single precision data type.

For example, if β = 10², a single precision data type may represent a value in the range 0 ≤ x < 10³, while a double precision data type may represent a value in the range 0 ≤ x < 10⁵. Let a = 23 and b = 49 represent two single precision variables. The single precision product shall be written as c ← a · b, while the double precision product shall be written as ĉ ← a · b. In this particular case, ĉ = 1127 and c = 127. The most significant digit of the product would not fit in a single precision data type, and as a result c ≠ ĉ.
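In C the same promotion and truncation rules look as follows; the digit widths are assumptions for the sake of the example, since on a real machine β is a power of two rather than 10².

    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t mp_digit;  /* single precision type (assumed width) */
    typedef uint64_t mp_word;   /* double precision type (assumed width) */

    int main(void)
    {
       mp_digit a = 0xFFFFFFFFu, b = 2;

       /* c_hat <- a * b: the cast promotes the operands to double
          precision before the multiply, so the full product survives */
       mp_word c_hat = (mp_word)a * b;

       /* c <- a * b: the same product truncated to single precision,
          exactly as 1127 was truncated to 127 in the example above   */
       mp_digit c = (mp_digit)c_hat;

       printf("c_hat = 0x%llX, c = 0x%X\n",
              (unsigned long long)c_hat, (unsigned)c);
       /* prints c_hat = 0x1FFFFFFFE, c = 0xFFFFFFFE */
       return 0;
    }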

1.3.3 Algorithm Inputs and Outputs

Within the algorithm descriptions all variables are assumed to be scalars of either single or double precision as indicated. The only exception to this rule is when variables have been indicated to be of type mp_int. This distinction is important, as scalars are often used as array indices and various other counters.

1.3.4 Mathematical Expressions

The ⌊ ⌋ brackets imply an expression truncated to an integer not greater than the expression itself; for example, ⌊5.7⌋ = 5. Similarly, the ⌈ ⌉ brackets imply an expression rounded to an integer not less than the expression itself; for example, ⌈5.1⌉ = 6. Typically, when the / division symbol is used, the intention is to perform an integer division with truncation; for example, 5/2 = 2, which will often be written as ⌊5/2⌋ = 2 for clarity. When an expression is written as a fraction, a real value division is implied; for example, the fraction 5/2 evaluates to 2.5.

The norm of a multiple precision integer, for example ||x||, will be used to represent the number of digits in the representation of the integer; for example, ||123|| = 3 and ||79452|| = 5.
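For plain integers the norm is simply a digit count, as the following hypothetical helper illustrates:

    /* return ||x||, the number of radix-beta digits of x >= 0; by this
       convention ||0|| = 1 (hypothetical helper, not library code)    */
    static int norm(long x, int beta)
    {
       int n = 0;

       do {
          n++;
          x /= beta;
       } while (x != 0);
       return n;
    }

    /* norm(123, 10) == 3 and norm(79452, 10) == 5 */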

1.3.5 Work Effort

To measure the efficiency of the specified algorithms, a modified big–Oh notation is used. In this system, all single precision operations are considered to have the same cost⁸. That is, a single precision addition, multiplication, and division are assumed to take the same time to complete. While this is generally not true in practice, it will simplify the discussions considerably.

Some algorithms have slight advantages over others, which is why some constants will not be removed in the notation. For example, a normal baseline multiplication (section 5.2.1) requires O(n²) work, while a baseline squaring (section 5.3) requires O((n² + n)/2) work. In standard big–Oh notation, these would both be said to be equivalent to O(n²). However, in the context of this text, this is not the case, as the magnitude of the inputs will typically be rather small. As a result, small constant factors in the work effort will make an observable difference in algorithm efficiency.
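The squaring count follows directly from the symmetry of the cross products. Squaring the radix representation gives

    x^2 = \left( \sum_{i=0}^{n-1} x_i \beta^i \right)^2
        = \sum_{i=0}^{n-1} x_i^2 \beta^{2i}
          + 2 \sum_{0 \le i < j \le n-1} x_i x_j \beta^{i+j}

so only the products xᵢxⱼ with i ≤ j need to be computed: n square terms plus n(n − 1)/2 distinct cross terms, for a total of (n² + n)/2 single precision multiplications.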

All algorithms presented in this text have a polynomial time work level; that is, of the form O(nᵏ) for some integer k ≥ 1. This will help make useful comparisons in terms of the speed of the algorithms and how various optimizations will pay off in the long run.

1.4 Exercises

Within the more advanced chapters a section is set aside to give the reader some challenging exercises related to the discussion at hand. These exercises are not designed to be prize-winning problems, but instead to be thought provoking. Wherever possible the problems are forward minded, stating problems that will be answered in subsequent chapters. The reader is encouraged to finish the exercises as they appear to get a better understanding of the subject material.

That being said, the problems are designed to affirm knowledge of a particular subject matter. Students in particular are encouraged to verify they can answer the problems correctly before moving on.

Similar to the exercises described in [1, p. ix], these exercises are given a scoring system based on the difficulty of the problem. However, unlike [1], the problems do not get nearly as hard. The scoring of these exercises ranges from one (the easiest) to five (the hardest). Figure 1.2 summarizes the scoring system used.


Figure 1.2 Exercise Scoring System

Problems at the first level are meant to be simple questions the reader can answer quickly without programming a solution or devising new theory. These problems are quick tests to see if the material is understood. Problems at the second level are also designed to be easy, but will require a program or algorithm to be implemented to arrive at the answer. These two levels are essentially entry level questions.

Problems at the third level are meant to be a bit more difficult than the first two levels. The answer is often fairly obvious, but arriving at an exacting solution requires some thought and skill. These problems will almost always involve devising a new algorithm or implementing a variation of another algorithm previously presented. Readers who can answer these questions will feel comfortable with the concepts behind the topic at hand.

Problems at the fourth level are meant to be similar to those of the level–three questions except they will require additional research to be completed. The reader will most likely not know the answer right away, nor will the text provide the exact details of the answer until a subsequent chapter.

Problems at the fifth level are meant to be the hardest problems relative to all the other problems in the chapter. People who can correctly answer fifth–level problems have a mastery of the subject matter at hand.

Often problems will be tied together. The purpose of this is to start a chain of thought that will be discussed in future chapters. The reader is encouraged to answer the follow–up problems and to draw out the relevance of each problem.

1.5 Introduction to LibTomMath

1.5.1 What Is LibTomMath?

LibTomMath is a free and open source multiple precision integer library written entirely in portable ISO C. By portable it is meant that the library does not contain any code that is computer platform dependent or otherwise problematic to use on any given platform.

The library has been successfully tested under numerous operating systems, including Unix⁹, Mac OS, Windows, Linux, Palm OS, and on standalone hardware such as the Gameboy Advance. The library is designed to contain enough functionality to be able to develop applications such as public key cryptosystems and still maintain a relatively small footprint.

1.5.2 Goals of LibTomMath

Libraries that obtain the most efficiency are rarely written in a high level programming language such as C. However, even though this library is written entirely in ISO C, considerable care has been taken to optimize the algorithm implementations within the library. Specifically, the code has been written to work well with the GNU C Compiler (GCC) on both x86 and ARM processors. Wherever possible, highly efficient algorithms, such as Karatsuba multiplication, sliding window exponentiation, and Montgomery reduction have been provided to make the library more efficient.

Even with the nearly optimal and specialized algorithms that have been included, the application programming interface (API) has been kept as simple as possible. Often, generic placeholder routines will make use of specialized algorithms automatically without the developer's specific attention. One such example is the generic multiplication algorithm mp_mul(), which will automatically use Toom–Cook, Karatsuba, Comba, or baseline multiplication based on the magnitude of the inputs and the configuration of the library.
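The dispatch logic inside such a placeholder might look roughly like the sketch below; the cutoff names, cutoff values, and helper functions here are illustrative stand-ins and not the library's verbatim source.

    #include <stdint.h>

    typedef uint32_t mp_digit;

    typedef struct {
       int used, alloc, sign;
       mp_digit *dp;
    } mp_int;

    /* hypothetical cutoffs, tuned per platform in a real library */
    #define KARATSUBA_CUTOFF  80
    #define TOOM_CUTOFF      350

    /* hypothetical specialized multipliers, each returning an error code */
    int toom_mul(const mp_int *a, const mp_int *b, mp_int *c);
    int karatsuba_mul(const mp_int *a, const mp_int *b, mp_int *c);
    int comba_fits(const mp_int *a, const mp_int *b);
    int comba_mul(const mp_int *a, const mp_int *b, mp_int *c);
    int baseline_mul(const mp_int *a, const mp_int *b, mp_int *c);

    /* dispatch on the smaller operand's digit count */
    int mp_mul_sketch(const mp_int *a, const mp_int *b, mp_int *c)
    {
       int n = (a->used < b->used) ? a->used : b->used;

       if (n >= TOOM_CUTOFF)
          return toom_mul(a, b, c);        /* very large: Toom-Cook */
       if (n >= KARATSUBA_CUTOFF)
          return karatsuba_mul(a, b, c);   /* large: Karatsuba      */
       if (comba_fits(a, b))
          return comba_mul(a, b, c);       /* medium: Comba         */
       return baseline_mul(a, b, c);       /* small: baseline       */
    }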

Making LibTomMath as efficient as possible is not the only goal of the LibTomMath project. Ideally, the library should be source compatible with another popular library, which makes it more attractive for developers to use. In this case, the MPI library was used as an API template for all the basic functions. MPI was chosen because it is another library that fits in the same niche as LibTomMath. Even though LibTomMath uses MPI as the template for the function names and argument passing conventions, it has been written from scratch by Tom St Denis.

The project is also meant to act as a learning tool for students, the logic being that no easy–to–follow “bignum” library exists that can be used to teach computer science students how to perform fast and reliable multiple precision integer arithmetic. To this end, the source code has been given quite a few comments and algorithm discussion points.

1.6 Choice of LibTomMath

LibTomMath was chosen as the case study of this text not only because the author of both projects is one and the same, but for more worthy reasons. Other libraries such as GMP [13], MPI [14], LIP [16], and OpenSSL [15] have multiple precision integer arithmetic routines but would not be ideal for this text for reasons that will be explained in the following sub–sections.

1.6.1 Code Base

The LibTomMath code base is all portable ISO C source code. This means that there are no platform-dependent conditional segments of code littered throughout the source. This clean and uncluttered approach to the library means that a developer can more readily discern the true intent of a given section of source code without trying to keep track of what conditional code will be used.

The code base of LibTomMath is well organized. Each function is in its own separate source code file, which allows the reader to find a given function very quickly. On average there are 76 lines of code per source file, which makes the source very easy to follow. By comparison, MPI and LIP are single–file projects, making code tracing very hard. GMP has many conditional code segments that also hinder tracing.

When compiled with GCC for the x86 processor and optimized for speed, the entire library is approximately 100KiB¹⁰, which is fairly small compared to GMP (over 250KiB). LibTomMath is slightly larger than MPI (which compiles to about 50KiB), but is also much faster and more complete than MPI.

1.6.2 API Simplicity

LibTomMath is designed after the MPI library and shares the API design. Quite often, programs that use MPI will build with LibTomMath without change. The function names correlate directly to the action they perform. Almost all of the functions share the same parameter passing convention. The learning curve is fairly shallow with the API provided, which is an extremely valuable benefit for the student and developer alike.

The LIP library is an example of a library with an API that is awkward to work with. LIP uses function names that are often “compressed” to illegible shorthand. LibTomMath does not share this characteristic.

The GMP library also does not return error codes. Instead, it uses a POSIX.1 signal system where errors are signaled to the host application. This happens to be the fastest approach, but it is definitely not the most versatile. In effect, a math error (i.e., invalid input, heap error, etc.) can cause a program to stop functioning, which is clearly undesirable in many situations.

1.6.3 Optimizations

While LibTomMath is certainly not the fastest library (GMP often beats LibTomMath by a factor of two), it does feature a set of optimal algorithms for tasks such as modular reduction, exponentiation, multiplication, and squaring. GMP and LIP also feature such optimizations, while MPI only uses baseline algorithms with no optimizations. GMP lacks a few of the additional modular reduction optimizations that LibTomMath features¹¹.

LibTomMath is almost always an order of magnitude faster than the MPI library at computationally expensive tasks such as modular exponentiation. In the grand scheme of “bignum” libraries, LibTomMath is faster than the average library and usually slower than the best libraries such as GMP and OpenSSL by only a small factor.

New Developments

Since the writing of the original manuscript, a new project, TomsFastMath, has been created. It is directly derived from LibTomMath, with a major focus on multiplication, squaring, and reduction performance. It relaxes the portability requirements to use inline assembly for performance. Readers are encouraged to check out this project at http://tfm.libtomcrypt.com to see how far performance can go with the code in this book.

1.6.4 Portability and Stability

LibTomMath will build “out of the box” on any platform equipped with a modern version of the GNU C Compiler (GCC). This means the library will build without any configuration or changes to the source. LIP and MPI will build “out of the box” as well but have numerous known bugs. Most notably, the author of MPI has recently stopped working on his library, and LIP has long since been discontinued.

GMP requires a configuration script to run and will not build out of the box. GMP and LibTomMath are still in active development and are very stable across a variety of platforms.

1.6.5 Choice

LibTomMath is a relatively compact, well–documented, highly optimized, and portable library, which seems only natural for the case study of this text. Various source files from the LibTomMath project will be included within the text. However, readers are encouraged to download their own copies of the library to be able to work with it directly.


¹ With the occasional optimization.

² As per the ISO C standard. However, each compiler vendor is allowed to augment the precision as they see fit.

³ A Pollard–Rho factoring would take only 2¹⁶ time.

⁴ See http://poly.libtomcrypt.org for more details.

⁵ For example, the average smartcard processor has an 8-bit accumulator.

⁶ Available at http://math.libtomcrypt.com

⁷ In the ISO C programming language.

⁸ Except where explicitly noted.

⁹ All of these trademarks belong to their respective rightful owners.

¹⁰ The notation “KiB” means 2¹⁰ octets; similarly, “MiB” means 2²⁰ octets.

¹¹ At the time of this writing, GMP only had Barrett and Montgomery modular reduction algorithms.
