Chapter 2

Getting Started

2.1 Library Basics

The trick to writing any useful library of source code is to build a solid foundation and work outward from it. First, a problem along with allowable solution parameters should be identified and analyzed. In this particular case, the inability to accommodate multiple precision integers is the problem. Furthermore, the solution must be written as portable source code that is reasonably efficient across several different computer platforms.

After a foundation is formed, the remainder of the library can be designed and implemented in a hierarchical fashion. That is, to implement the lowest level dependencies first and work toward the most abstract functions last. For example, before implementing a modular exponentiation algorithm, one would implement a modular reduction algorithm. By building outward from a base foundation instead of using a parallel design methodology, you end up with a project that is highly modular. Being highly modular is a desirable property of any project as it often means the resulting product has a small footprint and updates are easy to perform.

Usually, when I start a project I will begin with the header files. I define the data types I think I will need and prototype the initial functions that are not dependent on other functions (within the library). After I implement these base functions, I prototype more dependent functions and implement them. The process repeats until I implement all the functions I require. For example, in the case of LibTomMath, I implemented functions such as mp_init() well before I implemented mp_mul(), and even further before I implemented mp_exptmod(). As an example as to why this design works, note that the Karatsuba and Toom-Cook multipliers were written after the dependent function mp_exptmod() was written. Adding the new multiplication algorithms did not require changes to the mp_exptmod() function itself and lowered the total cost of ownership and development (so to speak) for new algorithms. This methodology allows new algorithms to be tested in a complete framework with relative ease(Figure 2.1).

image

Figure 2.1: Design Flow of the First Few Original LibTomMath Functions.

Only after the majority of the functions were in place did I pursue a less hierarchical approach to auditing and optimizing the source code. For example, one day I may audit the multipliers and the next day the polynomial basis functions.

It only makes sense to begin the text with the preliminary data types and support algorithms required. This chapter discusses the core algorithms of the library that are the dependents for every other algorithm.

2.2 What Is a Multiple Precision Integer?

Recall that most programming languages, in particular ISO C [2], only have fixed precision data types that on their own cannot be used to represent values largerthan their precision will allow. The purpose of multiple precision algorithms is to use fixed precision data types to create and manipulate multiple precision integers that may represent values that are very large.

In the decimal system, the largest single digit value is 9. However, by concatenating digits together, larger numbers may be represented. Newly prepended digits (to the left) are said to be in a different power of ten column. That is, the number 123 can be described as having a 1 in the hundreds column, 2 in the tens column, and 3 in the ones column. Or more formally, 123 = 1 · 102 + 2 · 101 + 3 · 10°. Computer–based multiple precision arithmetic is essentially the same concept. Larger integers are represented by adjoining fixed precision computer words with the exception that a different radix is used.

What most people probably do not think about explicitly are the various other attributes that describe a multiple precision integer. For example, the integer 15410 has two immediately obvious properties. First, the integer is positive; that is, the sign of this particular integer is positive as opposed to negative. Second, the integer has three digits in its representation. There is an additional property that the integer possesses that does not concern pencil–and–paper arithmetic. The third property is how many digit placeholders are available to hold the integer.

A visual example of this third property is ensuring there is enough space on the paper to write the integer. For example, if one starts writing a large number too far to the right on a piece of paper, he will have to erase it and move left. Similarly, computer algorithms must maintain strict control over memory usage to ensure that the digits of an integer will not exceed the allowed boundaries. These three properties make up what is known as a multiple precision integer, or mp_int for short.

2.2.1 The mp_int Structure

The mp_int structure is the ISO C–based manifestation of what represents a multiple precision integer. The ISO C standard does not provide for any such data type, but it does provide for making composite data types known as structures. The following is the structure definition used within LibTomMath.

The mp_int structure (Figure 2.2) can be broken down as follows.

image

Figure 2.2: The mp_int Structure

• The used parameter denotes how many digits of the array dp contain the digits used to represent a given integer. The used count must be positive (or zero) and may not exceed the alloc count.

• The alloc parameter denotes how many digits are available in the array to use by functions before it has to increase in size. When the used count of a result exceeds the alloc count, all the algorithms will automatically increase the size of the array to accommodate the precision of the result.

• The pointer dp points to a dynamically allocated array of digits that represent the given multiple precision integer. It is padded with (alloc — used) zero digits. The array is maintained in a least significant digit order. As a pencil and paper analogy the array is organized such that the rightmost digits are stored first starting at the location indexed by zero1 in the array. For example, if dp contains {a,b,c,…} where dp0=a,dp1=b,dp2=c,… then it would represent the integer a + + 2 +…

• The sign parameter denotes the sign as either zero/positive (MP_ZPOS) or negative (MP_NEG).

Valid mp_int Structures

Several rules are placed on the state of an mp_int structure and are assumed to be followed for reasons of efficiency. The only exceptions are when the structure is passed to initialization functions such as mp_init() and mp_init_copy().

1. The value of alloc may not be less than one. That is, dp always points to a previously allocated array of digits.

2. The value of used may not exceed alloc and must be greater than or equal to zero.

3. The value of used implies the digit at index (used− 1) of the dp array is non–zero. That is, leading zero digits in the most significant positions must be trimmed.

(a) Digits in the dp array at and above the used location must be zero.

4. The value of sign must be MP_ZPOS if used is zero; this represents the mp_int value of zero.

2.3 Argument Passing

A convention of argument passing must be adopted early in the development of any library. Making the function prototypes consistent will help eliminate many headaches in the future as the library grows to significant complexity. In LibTomMath, the multiple precision integer functions accept parameters from left to right as pointers to mp_int structures. That means that the source (input) operands are placed on the left and the destination (output) on the right. Consider the following examples.

image

The left to right order is a fairly natural way to implement the functions since it lets the developer read aloud the functions and make sense of them. For example, the first function would read “multiply a and b and store in c.”

Certain libraries (LIP by Lenstra for instance) accept parameters the other way around, to mimic the order of assignment expressions. That is, the destination (output) is on the left and arguments (inputs) are on the right. In truth, it is entirely a matter of preference. In the case of LibTomMath the convention from the MPI library has been adopted.

Another very useful design consideration, provided for in LibTomMath, is whether to allow argument sources to also be a destination. For example, the second example (mp_add)adds a to b and stores in a. This is an important feature to implement since it allows the calling functions to cut down on the number ofvariables it must maintain. However, to implement this feature, specific care has to be given to ensure the destination is not modified before the source is fully read.

2.4 Return Values

A well–implemented application, no matter what its purpose, should trap as many runtime errors as possible and return them to the caller. By catching runtime errors a library can be guaranteed to prevent undefined behavior. However, the end developer can still manage to cause a library to crash. For example, by passing an invalid pointer an application may fault by dereferencing memory not owned by the application.

In the case of LibTomMath the only errors that are checked for are related to inappropriate inputs (division by zero for instance) and memory allocation errors. It will not check that the mp_int passed to any function is valid, nor will it check pointers for validity. Any function that can cause a runtime error will return an error code as an int data type with one of the values in Figure 2.3.

image

Figure 2.3: LibTomMath Error Codes

When an error is detected within a function, it should free any memory it allocated, often during the initialization of temporary mp_ints, and return as soon as possible. The goal is to leave the system in the same state it was when the function was called. Error checking with this style of API is fairly simple.

image

The GMP [2] library uses C style signals to flag errors, which is of questionable use. Not all errors are fatal and it was not deemed ideal by the author ofLibTomMath to force developers to have signal handlers for such cases.

2.5 Initialization and Clearing

The logical starting point when actually writing multiple precision integer functions is the initialization and clearing of the mp_int structures. These two algorithms will be used by the majority of the higher level algorithms.

Given the basic mp_int structure, an initialization routine must first allocate memory to hold the digits of the integer. Often it is optimal to allocate a sufficiently large pre–set number of digits even though the initial integer will represent zero. If only a single digit were allocated, quite a few subsequent reallocations would occur when operations are performed on the integers. There is a trade–off between how many default digits to allocate and how many reallocations are tolerable. Obviously, allocating an excessive amount of digits initially will waste memory and become unmanageable.

If the memory for the digits has been successfully allocated, the rest of the members of the structure must be initialized. Since the initial state of an mp_int is to represent the zero integer, the allocated digits must be set to zero, the used count set to zero, and sign set to MP_ZPOS.

2.5.1 Initializing an mp_int

An mp_int is said to be initialized if it is set to a valid, preferably default, state such that all the members of the structure are set to valid values. The mp_init algorithm will perform such an action (Figure 2.4).

image

Figure 2.4: Algorithm mp_init

Algorithm mp_init. The purpose of this function is to initialize an mp_int structure so that the rest of the library can properly manipulate it. It is assumed that the input may not have had any of its members previously initialized, which is certainly a valid assumption if the input resides on the stack.

Before any of the members such as sign, used, or alloc are initialized, the memory for the digits is allocated. If this fails, the function returns before setting any of the other members. The MP_PREC name represents a constant2used to dictate the minimum precision of newly initialized mp_int integers. Ideally, it is at least equal to the smallest precision number you′ll be working with.

Allocating a block of digits at first instead of a single digit has the benefit of lowering the number of usually slow heap operations later functions will have to perform in the future. If MP_PREC is set correctly, the slack memory and the number of heap operations will be trivial.

Once the allocation has been made, the digits have to be set to zero, and the used, sign, and alloc members initialized. This ensures that the mp_int will always represent the default state of zero regardless of the original condition of the input.

Remark. This function introduces the idiosyncrasy that all iterative loops, commonly initiated with the “for” keyword, iterate incrementally when the “to” keyword is placed between two expressions. For example, “for a from b to c do” means that a subsequent expression (or body of expressions) is to be evaluatedup to cb times so long as b≤ c. In each iteration, the variable a is substituted for a new integer that lies inclusively between b and c. If b> c occurred, the loop would not iterate. By contrast, if the “downto” keyword were used in place of “to,” the loop would iterate decrementally.

image

One immediate observation of this initialization function is that it does not return a pointer to a mp_int structure. It is assumed that the caller has already allocated memory for the mp_int structure, typically on the application stack. The call to mp_init() is used only to initialize the members of the structure to a known default state.

Here we see (line 24) the memory allocation is performed first. This allows us to exit cleanly and quickly if there is an error. If the allocation fails, the routine will return MP_MEM to the caller to indicate there was a memory error. The function XMALLOC is what actually allocates the memory. Technically

XMALLOC is not a function but a macro defined in tommath.h. By default, XMALLOC will evaluate to malloc(), which is the C library’s built-in memory allocation routine.

To assure the mp_int is in a known state, the digits must be set to zero. On most platforms this could have been accomplished by using calloc() instead of malloc(). However, to correctly initialize an integer type to a given value in a portable fashion, you have to actually assign the value. The for loop (line 30) performs this required operation.

After the memory has been successfully initialized, the remainder of the members are initialized (lines 34through 35) to their respective default states. At this point, the algorithm has succeeded and a success code is returned to the calling function. If this function returns MP_OKAY, it is safe to assume the mp_int structure has been properly initialized and is safe to use with other functions within the library.

2.5.2 Clearing an mp_int

When an mp_int is no longer required by the application, the memory allocated for its digits must be returned to the application’s memory pool with the mp_clear algorithm (Figure 2.5).

image

Figure 2.5 Algorithm mp_clear

Algorithm mp_clear. This algorithm accomplishes two goals. First, it clears the digits and the other mp_int members. This ensures that if a developer acci–dentally re–uses a cleared structure it is less likely to cause problems. The second goal is to free the allocated memory.

The logic behind the algorithm is extended by marking cleared mp_int structures so that subsequent calls to this algorithm will not try to free the memory multiple times. Cleared mp_ints are detectable by having a pre-defined invalid digit pointer dp setting.

Once an mp_int has been cleared, the mp_int structure is no longer in a valid state for any other algorithm with the exception of algorithms mp_init, mp_init_copy, mp_init_size, and mp_clear.

image

The algorithm only operates on the mp_int if it hasn′t been previously cleared. The if statement (line 25) checks to see if the dp member is not NULL. If the mp_int is a valid mp_int, then dp cannot be NULL, in which case the if statement will evaluate to true.

The digits of the mp_int are cleared by the for loop (line 27), which assigns a zero to every digit. Similar to mp_init(), the digits are assigned zero instead of using block memory operations (such as memset()) since this is more portable.

The digits are deallocated off the heap via the XFREE macro. Similar to XMALLOC, the XFREE macro actually evaluates to a standard C library function; in this case, free(). Since free() only deallocates the memory, the pointer still has to be reset to NULL manually (line 35).

Now that the digits have been cleared and deallocated, the other members are set to their final values (lines 36 and 37).

2.6 Maintenance Algorithms

The previous sections described how to initialize and clear an mp_int structure. To further support operations that are to be performed on mp_int structures (such as addition and multiplication), the dependent algorithms must be able to augment the precision of an mp_int and initialize mp_ints with differing initial conditions.

These algorithms complete the set of low–level algorithms required to work with mp_int structures in the higher level algorithms such as addition, multiplication, and modular exponentiation.

2.6.1 Augmenting an mp_int’s Precision

When you are storing a value in an mp_int structure, a sufficient number of digits must be available to accommodate the entire result of an operation without loss of precision. Quite often, the size of the array given by the alloc member is large enough to simply increase the used digit count. However, when the size of the array is too small it must be re-sized appropriately to accommodate the result. The mp_grow algorithm provides this functionality (Figure 2.6).

image

Figure 2.6 Algorithm mp_grow

Algorithm mp_grow. It is ideal to prevent reallocations from being performed if they are not required (step one). This is useful to prevent mp_ints from growing excessively in code that erroneously calls mp_grow.

The requested digit count is padded up to the next multiple of MP_PREC plus an additional MP_PREC (steps two and three). This helps prevent many trivial reallocations that would grow an mp_int by trivially small values.

It is assumed that the reallocation (step four) leaves the lower a.alloc digits of the mp_int intact. This is much akin to how the realloc function from the standard C library works. Since the newly allocated digits are assumed to contain undefined values, they are initially set to zero.

image

image

A quick optimization is to first determine if a memory reallocation is required at all. The if statement (line 24) checks if the alloc member of the mp_int is smaller than the requested digit count. If the count is not larger than alloc the function skips the reallocation part, thus saving time.

When a reallocation is performed, it is turned into an optimal request to save time in the future. The requested digit count is padded upwards to 2nd multiple of MP_PREC larger than alloc (line 25). The XREALLOC function is used to reallocate the memory. As per the other functions, XREALLOC is actually a macro that evaluates to realloc by default. The realloc function leaves the base of the allocation intact, which means the first alloc digits of the mp_int are the same as before the reallocation. All that is left is to clear the newly allocated digits and return.

Note that the reallocation result is actually stored in a temporary pointer tmp. This is to allow this function to return an error with a valid pointer. Earlier releases of the library stored the result of XREALLOC into the mp_int a. That would result in a memory leak if XREALLOC ever failed.

2.6.2 Initializing Variable Precision mp_ints

Occasionally, the number of digits required will be known in advance of an initialization, based on, for example, the size of input mp_ints to a given algorithm. The purpose of algorithm mp_init_size is similar to mp_init except that it will allocate at least a specified number of digits (Function 2.7).

Algorithm mp_init_size. This algorithm will initialize an mp_int structure a like algorithm mp_init, with the exception that the number of digits allocated can be controlled by the second input argument b. The input size is padded upwards so it is a multiple of MP_PREC plus an additional MP_PREC digits. This padding is used to prevent trivial allocations from becoming a bottleneck in the rest of the algorithms (Figure 2.7).

image

Figure 2.7 Algorithm mp_init_size

Like algorithm mp_init, the mp_int structure is initialized to a default state representing the integer zero. This particular algorithm is useful if it is known ahead of time the approximate size of the input. If the approximation is correct, no further memory reallocations are required to work with the mp_int.

image

image

The number of digits b requested is padded (line 24) by first augmenting it to the next multiple of MP_PREC and then adding MP_PREC to the result. If the memory can be successfully allocated, the mp_int is placed in a default state representing the integer zero. Otherwise, the error code MP_MEM will be returned (line 29).

The digits are allocated and set to zero at the same time with the calloc() function (line 27). The used count is set to zero, the alloc count is set to the padded digit count and the sign flag is set to MP_ZPOS to achieve a default valid mp_int state (lines 33,34, and 35). If the function returns successfully, then it is correct to assume that the mp_int structure is in a valid state for the remainder of the functions to work with.

2.6.3 Multiple Integer Initializations and Clearings

Occasionally, a function will require a series of mp_int data types to be made available simultaneously. The purpose of algorithm mp_init_multi (Figure 2.8) is to initialize a variable length array of mp_int structures in a single statement. It is essentially a shortcut to multiple initializations.

image

Figure 2.8 Algorithm mp_init_multi

Algorithm mp_init_multi. The algorithm will initialize the array of mp_int variables one at a time. If a runtime error has been detected (step 1.2), all of the previously initialized variables are cleared. The goal is an “all or nothing” initialization, which allows for quick recovery from runtime errors (Figure 2.8).

image

image

This function initializes a variable length list of mp_int structure pointers. However, instead of having the mp_int structures in an actual C array, they are simply passed as arguments to the function. This function makes use of the “image” argument syntax of the C programming language. The list is terminated with a final NULL argument appended on the right.

The function uses the “stdarg.h” va functions to step in a portable fashion through the arguments to the function. A count n of successfully initialized mp_int structures is maintained (line 48) such that if a failure does occur, the algorithm can backtrack and free the previously initialized structures (lines 28 to 47).

2.6.4 Clamping Excess Digits

When a function anticipates a result will be n digits, it is simpler to assume this is true within the body of the function instead of checking during the computation. For example, a multiplication of a i digit number by a j digit produces a result of at most i + j digits. It is entirely possible that the result is i + j − 1, though, with no final carry into the last position. However, suppose the destination had to be first expanded (via mp_grow)to accommodate i + j− 1 digits than further expanded to accommodate the final carry. That would be a considerable waste of time since heap operations are relatively slow.

The ideal solution is to always assume the result is i + j and fix up the used count after the function terminates. This way, a single heap operation (at most) is required. However, if the result was not checked there would be an excess high order zero digit.

For example, suppose the product of two integers was xn = (0xn−1xn−2x0)β. The leading zero digit will not contribute to the precision of the result. In fact, through subsequent operations more leading zero digits would accumulate to the point the size of the integer would be prohibitive. As a result, even though the precision is very low the representation is excessively large.

The mp_clamp algorithm is designed to solve this very problem. It will trim high-order zeros by decrementing the used count until a nons–zero most significant digit is found. Also in this system, zero is considered a positive number, which means that if the used count is decremented to zero, the sign must be set to MP_ZPOS.

Algorithm mp_clamp. As can be expected, this algorithm is very simple.

The loop in step one is expected to iterate only once or twice at the most. For example, this will happen in cases where there is not a carry to fill the last position. Step two fixes the sign for when all of the digits are zero to ensure that the mp_int is valid at all times (Figure 2.9).

image

Figure 2.9 Algorithm mp_clamp

image

Note on line 31 how to test for the used count is made on the left of the && operator. In the C programming language, the terms to && are evaluated left to right with a boolean short–circuit if any condition fails. This is important since if the used is zero, the test on the right would fetch below the array. That is obviously undesirable. The parenthesis on line 32 is used to make sure the used count is decremented and not the pointer “a”.

Exercises

1. Discuss the relevance of the used member of the mp_int structure.

1. Discuss the consequences of not using padding when performing allocations.

2. Estimate an ideal value for MP_PREC when performing 1024-bit RSA encryption when β = 228.

1. Discuss the relevance of the algorithm mp_clamp. What does it prevent?

1. Give an example of when the algorithm mp_init_copy might be useful.


1In C, all arrays begin at the zero index.

2Defined in the “tommath.h” header file within LibTomMath.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.191.139.169