4.7. Solving Large Sparse Linear Systems over Finite Rings

So far we have seen many algorithms which require solving large systems of linear equations (or congruences). The number n of unknowns in such systems can be as large as several millions. Standard Gaussian elimination on such a system takes time O(n^3) and space O(n^2). There are asymptotically faster algorithms, like Strassen's method [292] that takes time O(n^2.807) and Coppersmith and Winograd's method [60] having a running time of O(n^2.376). Unfortunately, these asymptotic gains do not show up in the range of practical interest. Moreover, the space requirements of these asymptotically faster methods are prohibitively high (though still O(n^2)).

Luckily enough, cryptanalytic algorithms usually deal with coefficient matrices that are sparse: that is, that have only a small number of non-zero entries in each row. For example, consider the system of linear congruences available from the relation collection stage of an ICM for solving the DLP over a finite field F_q. The factor base consists of a subexponential (in lg q) number of elements, whereas each relation involves at most O(lg q) non-zero coefficients. Furthermore, the sparsity of the resulting matrix A is somewhat structured in the sense that the columns of A corresponding to larger primes in the factor base tend to have fewer non-zero entries. In this regard, we refer to the interesting analysis by Odlyzko [225] in connection with the Coppersmith method (Section 4.4.4). Odlyzko took m = 2n equations in n unknown indices and showed that about n/4 columns of A are expected to contain only zero coefficients, implying that these variables never occurred in any relation collected. Moreover, about 0.346n columns of A are expected to have only a single non-zero coefficient.

The sparsity (as well as the structure of the sparsity) of the coefficient matrix A can be effectively exploited and the system can be solved in time O~(n^2). In this section, we describe some special algorithms for large sparse linear systems. In what follows, we assume that we want to compute the unknown n-dimensional column vector x from the given system of equations

Ax = b,

where A is an m × n matrix, m ≥ n, and where b is a non-zero m-dimensional column vector. Though this is not the case in general, we will often assume for the sake of simplicity that A has full rank (that is, rank n). We write vectors as column vectors, that is, an l-dimensional vector v with elements v_1, . . . , v_l is written as v = (v_1 v_2 . . . v_l)^t, where the superscript t denotes matrix transpose.

Before we proceed further, some comments are in order. First note that our system of equations is often one over the finite ring Z_r of integers modulo r, which is not necessarily a field. Most of the methods we describe below assume that Z_r is a field, that is, that r is a prime. If r is composite, we can do the following. First, assume that the prime factorization r = p_1^{α_1} p_2^{α_2} ··· p_s^{α_s}, α_i > 0, of r is known. In that case, we first solve the system over the fields Z_{p_i} for i = 1, . . . , s. Then, for each i, we lift the solution modulo p_i to a solution modulo p_i^{α_i}. Finally, all these lifted solutions are combined using the CRT to get the solution modulo r.
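
As a concrete illustration, the CRT recombination can be sketched in a few lines of Python (a minimal sketch of ours; the function and variable names are purely illustrative):

from math import prod

def crt_combine(residues, moduli):
    # Combine x ≡ r_i (mod m_i), for pairwise coprime moduli m_i, into the
    # unique x modulo m_1 m_2 ... m_s.
    M = prod(moduli)
    x = 0
    for r_i, m_i in zip(residues, moduli):
        M_i = M // m_i
        x += r_i * M_i * pow(M_i, -1, m_i)   # pow(., -1, m_i): inverse mod m_i (Python 3.8+)
    return x % M

Applied entry by entry to the lifted solution vectors modulo the prime powers p_i^{α_i}, this yields the solution vector modulo r.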

Hensel lifting can be used to lift a solution of the system Ax ≡ b (mod p) to a solution of Ax ≡ b (mod p^α), where p is a prime and α ≥ 1. We proceed by induction on α. Let us denote the (or a) solution of Ax ≡ b (mod p) by x_1, which can be computed by solving a system in the field Z_p. Now, assume that for some i ≥ 1 we know (integer) vectors x_1, . . . , x_i such that

Equation 4.14

A(x_1 + p x_2 + ··· + p^{i−1} x_i) ≡ b (mod p^i).
We then attempt to compute a vector x_{i+1} such that

Equation 4.15

A(x_1 + p x_2 + ··· + p^{i−1} x_i + p^i x_{i+1}) ≡ b (mod p^{i+1}).
Congruence (4.14) shows that the elements of A, x_1, . . . , x_i, b can be so chosen (as integers) that for some vector y_i we have the equality

A(x_1 + p x_2 + ··· + p^{i−1} x_i) = b − p^i y_i

in Z. Substituting this in Congruence (4.15) gives A x_{i+1} ≡ y_i (mod p). Thus the (incremental) vector x_{i+1} can be obtained by solving a linear system in the field Z_p.
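
The whole lifting loop can be sketched in Python as follows (a minimal sketch; solve_mod_p stands for a hypothetical solver of linear systems modulo the prime p, for instance Gaussian elimination over Z_p, and all other names are ours):

def hensel_lift(A, b, p, alpha, solve_mod_p):
    # Lift a solution of A x ≡ b (mod p) to a solution modulo p^alpha.
    # A is a list of integer rows, b a list of integers.
    m, n = len(A), len(A[0])
    x = solve_mod_p(A, [bi % p for bi in b])              # x_1: a solution modulo p
    for i in range(1, alpha):
        pi = p ** i
        Ax = [sum(A[r][c] * x[c] for c in range(n)) for r in range(m)]
        y = [(b[r] - Ax[r]) // pi for r in range(m)]      # y_i, since A x = b - p^i y_i
        corr = solve_mod_p(A, [yr % p for yr in y])       # x_{i+1}: A x_{i+1} ≡ y_i (mod p)
        x = [xc + pi * cc for xc, cc in zip(x, corr)]     # now A x ≡ b (mod p^{i+1})
    return [xc % p ** alpha for xc in x]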

It, therefore, suffices to know how to solve linear congruences modulo a prime p. However, problems arise when we do not know the factorization of r (while solving Ax ≡ b (mod r)). If r is large, it would be a heavy investment to attempt to factor r. What can be done instead is the following. First, we use trial division to extract the small prime factors of r. We may, therefore, assume that r has no small prime factors. We proceed to solve Ax ≡ b (mod r) assuming that r is a prime (that is, that Z_r is a field). In a field, every non-zero element is invertible. But if r is composite, there are non-zero elements a which are not invertible (that is, for which gcd(a, r) > 1). If, during the course of the computation, we never happen to meet (and try to invert) such non-zero non-invertible elements, then the computation terminates without any trouble. Otherwise, such an element a yields a non-trivial factor gcd(a, r) of r. In that case, we have a partial factorization of r and restart solving the system modulo each suitable factor of r.
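
The inversion attempt that either succeeds or exposes a factor of r can be illustrated by a small Python fragment (an illustrative helper of ours, built only on the standard gcd and modular inverse):

from math import gcd

def inv_or_factor(a, r):
    # Try to invert a modulo r while pretending that r is prime.
    g = gcd(a % r, r)
    if g == 1:
        return "inverse", pow(a, -1, r)
    return "factor", g      # for a non-zero, non-invertible a this is a non-trivial factor of r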

Some of the algorithms we discuss below assume that A is a symmetric matrix. In our applications, this is usually not the case; indeed, we have matrices A which are not even square. Both these problems can be overcome by trying to solve the modified system A^t A x = A^t b. If A has full rank, this leads to an equivalent system.

If r = 2 (as in the case of the QSM for factoring integers), using the special methods is often not recommended. In this case, the elements of A are bits and can be packed compactly in machine words, and addition of rows can be done word-wise (say, 32 bits at a time). This leads to an efficient implementation of ordinary Gaussian elimination, which usually runs faster than the more complicated special algorithms described below, at least for the sizes of practical systems.
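
For illustration (a sketch of ours, not a tuned implementation), such a packed elimination over GF(2) may be written in Python with an arbitrary-precision integer serving as a packed row of bits, XOR performing the word-wise row addition:

def gauss_gf2(rows, ncols):
    # Gaussian elimination over GF(2).  Each row is an int whose bit j is the
    # coefficient of variable j; a right-hand-side bit can be appended as bit ncols.
    rows = list(rows)
    rank = 0
    for col in range(ncols):
        # find a not-yet-used row with a 1 in this column
        piv = next((i for i in range(rank, len(rows)) if (rows[i] >> col) & 1), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        for i in range(len(rows)):
            if i != rank and (rows[i] >> col) & 1:
                rows[i] ^= rows[rank]      # row addition modulo 2, many bits at a time
        rank += 1
    return rows, rank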

In what follows, we discuss some well-known methods for solving large sparse linear systems over finite fields (typically prime fields). In order to simplify notations, we will refrain from writing the matrix equalities as congruences, but treat them as equations over the underlying finite fields.

4.7.1. Structured Gaussian Elimination

Structured Gaussian elimination is applied to a sparse system before one of the next three methods is employed to solve the system. If the sparsity of A has some structure (as discussed earlier), then structured Gaussian elimination tends to reduce the size of the system considerably, while maintaining the sparsity of the system. We now describe the essential steps of structured Gaussian elimination. Let us define the weight of a row or column of a matrix to be the number of non-zero entries in that row or column.

First we delete all the columns (together with the corresponding variables) that have weight 0. These variables never occur in the system and need not be considered at all.

Next we delete all the columns that have weight 1 and the rows corresponding to the non-zero entries in these columns. Each such deleted column corresponds to a variable x_i that appears in exactly one equation. After the rest of the system is solved, the value of x_i is obtained by back substitution. Deleting some rows in the matrix in this step may expose some new columns of weight 1. So this step should be repeated until all the columns have weight > 1.

Now, consider each row of weight 1. Such a row gives a direct solution for the variable x_i corresponding to its single non-zero entry. We then substitute this value of x_i in all the equations where it occurs and subsequently delete the i-th column. We repeat this step until all rows are of weight > 1.

At this point, the system usually has many more equations than variables. We may make the system a square one by throwing away some rows. Since subtracting multiples of rows of higher weights tends to increase the number of non-zero elements in the matrix, we should throw away the rows with higher weights. While discarding the excess rows, we should be careful to ensure that we are not left with a matrix having columns of weight 0. Some columns in the reduced system may again happen to have weight 1. Thus, we have to repeat the above steps again. And again and again and . . . , until we are left with a square matrix each row and column of which has weight ≥ 2.
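
The column-pruning steps just described can be sketched structurally in Python (an illustrative fragment of ours that manipulates only the positions of non-zero entries; solving the weight-1 rows, the final back substitution and the careful discarding of excess rows need the actual coefficients and are only hinted at in comments):

def structured_prune(rows):
    # Each row is the set of column indices carrying non-zero coefficients.
    rows = [set(r) for r in rows]
    deferred = []                          # (column, its single row), solved later by back substitution
    changed = True
    while changed:
        changed = False
        touching = {}                      # column -> indices of rows containing it
        for i, r in enumerate(rows):
            for c in r:
                touching.setdefault(c, []).append(i)
        # weight-0 columns never appear and are ignored automatically; a weight-1
        # column is deleted together with the single row containing it
        for c, idxs in touching.items():
            if len(idxs) == 1 and rows[idxs[0]]:
                deferred.append((c, rows[idxs[0]]))
                rows[idxs[0]] = set()
                changed = True
        rows = [r for r in rows if r]
    # excess rows would now be discarded, heaviest first, taking care not to
    # leave behind columns of weight 0
    return rows, deferred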

This procedure leads to a system which is usually much smaller than the original system. In a typical example quoted in Odlyzko [225], structured Gaussian elimination reduces a system with 16,500 unknowns to one with less than 1,000 unknowns. The resulting reduced system may be solved using ordinary Gaussian elimination which, for smaller systems, appears to be much faster than the following sophisticated methods.

4.7.2. The Conjugate Gradient Method

The conjugate gradient method was originally proposed to solve a linear system Ax = b over the field R of real numbers for an n × n (that is, square) symmetric positive definite matrix A and for a non-zero vector b, and is based on the idea of minimizing the quadratic function f(x) = (1/2) x^t A x − b^t x. The minimum is attained when the gradient ∇f = Ax − b equals zero, which corresponds to the solution of the given system.

The conjugate gradient method is an iterative procedure. The iterations start with an initial minimizer x_0 which can be any n-dimensional vector. As the iterations proceed, we obtain gradually improved minimizers x_0, x_1, x_2, . . . , until we reach the solution. We also maintain and update two other sequences of vectors e_i and d_i. The vector e_i stands for the error b − A x_i, whereas the vectors d_0, d_1, . . . constitute a set of mutually conjugate (that is, orthogonal) directions. We initialize e_0 = d_0 = b − A x_0 and for i = 0, 1, . . . repeat the steps of Algorithm 4.8, until e_i = 0. We denote the inner product of two vectors v = (v_1 v_2 . . . v_n)^t and w = (w_1 w_2 . . . w_n)^t by 〈v, w〉 = v^t w = v_1 w_1 + v_2 w_2 + ··· + v_n w_n.

Algorithm 4.8. An iteration in the conjugate gradient method

a_i := 〈e_i, e_i〉/〈d_i, A d_i〉.

x_{i+1} := x_i + a_i d_i.

e_{i+1} := e_i − a_i A d_i.

b_i := 〈e_{i+1}, e_{i+1}〉/〈e_i, e_i〉.

d_{i+1} := e_{i+1} + b_i d_i.

This method computes a set of mutually orthogonal directions d_0, d_1, . . . , and hence it has to stop after at most n − 1 iterations, since we run out of new orthogonal directions after n − 1 iterations. Provided that we work with infinite precision, we must eventually obtain e_i = 0 for some i, 0 ≤ i ≤ n − 1.

If A is sparse, that is, if each row of A has O(log^c n) non-zero entries, c being a positive constant, then the product A d_i can be computed using O~(n) field operations. The other operations clearly meet this bound. Since at most n − 1 iterations are necessary, the conjugate gradient method terminates after performing O~(n^2) field operations.

We face some potential problems when we want to apply this method to solve a system over a finite field F_q. First, the matrix A is usually not symmetric and need not even be square. This problem can be avoided by solving the system A^t A x = A^t b. The new coefficient matrix A^t A may be non-sparse (that is, dense). So instead of computing and working with A^t A explicitly, we compute the product (A^t A) d_i as A^t (A d_i); that is, we avoid multiplication by a (possibly) dense matrix at the cost of multiplications by two sparse matrices.

The second difficulty with a finite field is that the question of minimizing an F_q-valued function makes hardly any sense (and so does positive definiteness of a matrix over F_q). However, the conjugate gradient method is essentially based on the generation of a set of mutually orthogonal vectors d_0, d_1, . . . . This concept continues to make sense in the setting of a finite field.

If A is a real positive definite matrix, we cannot have 〈d_i, A d_i〉 = 0 for a non-zero vector d_i. But this condition need not hold for a matrix A over F_q. Similarly, we may have a non-zero error vector e_i over F_q for which 〈e_i, e_i〉 = 0. (Again, this is not possible for real vectors.) So for the iterations over F_q (more precisely, the computations of a_i and b_i) to proceed gracefully, all that we can hope for is that before reaching the solution we never hit a non-zero direction vector d_i for which 〈d_i, A d_i〉 = 0, nor a non-zero error vector e_i for which 〈e_i, e_i〉 = 0. If q is sufficiently large and if the initial minimizer x_0 is sufficiently randomly chosen, then the probability of encountering such a bad d_i or e_i is rather low, and as a result the method is very likely to terminate without problems. If, by a terrible stroke of bad luck, we have to abort the computation prematurely, we should restart the procedure with a new random initial vector x_0. If q is small (say, q = 2 as in the case of the QSM), it is a neater idea to select the entries of the initial vector x_0 from an extension field of F_q and to work in this extension. The eventual solution we reach will lie in F_q, but working in the larger field decreases the possibility of an attempted division by 0.
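
To make the iterations concrete, here is a minimal Python sketch of Algorithm 4.8 over a prime field F_p (all names are ours; a dense matrix representation is used only for brevity, and the exception corresponds to the bad vectors d_i and e_i discussed above):

def conjugate_gradient_fp(A, b, p, x0):
    # A must be square and symmetric (for example A^t A); entries are reduced mod p.
    n = len(b)
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v)) % p
    matvec = lambda v: [sum(Ai[j] * v[j] for j in range(n)) % p for Ai in A]
    x = [xi % p for xi in x0]
    e = [(bi - ri) % p for bi, ri in zip(b, matvec(x))]         # e_0 = b - A x_0
    d = list(e)                                                 # d_0 = e_0
    for _ in range(n):
        if not any(e):
            return x                                            # e_i = 0: x solves the system
        Ad = matvec(d)
        dAd, ee = dot(d, Ad), dot(e, e)
        if dAd == 0 or ee == 0:
            raise ArithmeticError("unlucky zero inner product; restart with a new x0")
        a = ee * pow(dAd, -1, p) % p                            # a_i
        x = [(xi + a * di) % p for xi, di in zip(x, d)]         # x_{i+1}
        e_next = [(ei - a * adi) % p for ei, adi in zip(e, Ad)] # e_{i+1}
        beta = dot(e_next, e_next) * pow(ee, -1, p) % p         # b_i
        d = [(en + beta * di) % p for en, di in zip(e_next, d)] # d_{i+1}
        e = e_next
    return x if not any(e) else None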

There is, however, a brighter side of using a finite field F_q in place of R, namely that every calculation we perform in F_q is exact, and we do not have to bother about a criterion for determining whether an error vector e_i is zero, or about the conditioning of the matrix A. One of the biggest headaches of numerical analysis is absent here.

4.7.3. The Lanczos Method

The Lanczos method is another iterative method quite similar to the conjugate gradient method. The basic difference between these methods lies in the way in which the mutually conjugate directions d_0, d_1, . . . are generated. For the Lanczos method, we start with the initializations d_0 := b, v_1 := A d_0, a_0 := 〈d_0, b〉/〈d_0, v_1〉, x_0 := a_0 d_0. Then, for i = 1, 2, . . . , we repeat the steps in Algorithm 4.9 as long as 〈d_i, A d_i〉 ≠ 0.

Algorithm 4.9. An iteration in the Lanczos method

d_i := v_i − (〈v_i, v_i〉/〈d_{i−1}, v_i〉) d_{i−1} − (〈v_i, v_{i−1}〉/〈d_{i−2}, v_{i−1}〉) d_{i−2} (the last term is omitted for i = 1).

v_{i+1} := A d_i.

a_i := 〈d_i, b〉/〈d_i, v_{i+1}〉.

x_i := x_{i−1} + a_i d_i.

If A is a real positive definite matrix, the termination criterion 〈d_i, A d_i〉 = 0 is equivalent to the condition d_i = 0. When this is satisfied, the vector x_{i−1} equals the desired solution x of the system Ax = b. Since d_0, d_1, . . . are mutually orthogonal, the process must stop after at most n − 1 iterations. Therefore, for a sparse matrix A, the entire procedure performs O~(n^2) field operations.

The problems we face with the Lanczos method applied to a system over F_q are essentially the same as those discussed in connection with the conjugate gradient method. The problem with a non-symmetric and/or non-square matrix A is solved by multiplying the system by A^t. Instead of working with A^t A explicitly, we prefer to multiply separately by A and A^t.

The more serious problem with a system over F_q is that of encountering a non-zero direction vector d_i with 〈d_i, A d_i〉 = 0. If this happens, we have to abort the computation prematurely. In order to restart the procedure, we try to solve the system BAx = Bb, where B is a diagonal matrix whose diagonal elements are chosen randomly from the non-zero elements of the field or of some suitable extension (if q is small).
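
A corresponding minimal Python sketch of the Lanczos iterations of Algorithm 4.9 (again with names of our own choosing, a dense matrix representation, and the same caveat about unlucky zero inner products) is the following:

def lanczos_fp(A, b, p):
    # A must be square and symmetric (for example A^t A); entries are reduced mod p.
    n = len(b)
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v)) % p
    matvec = lambda v: [sum(Ai[j] * v[j] for j in range(n)) % p for Ai in A]
    d = [bi % p for bi in b]                   # d_0 = b
    v = matvec(d)                              # v_1 = A d_0
    denom = dot(d, v)                          # <d_0, A d_0>
    if denom == 0:
        raise ArithmeticError("<d_0, A d_0> = 0; randomize the system and retry")
    a = dot(d, b) * pow(denom, -1, p) % p      # a_0
    x = [a * di % p for di in d]               # x_0 = a_0 d_0
    d_old, v_old = None, None                  # d_{i-2} and v_{i-1}
    for _ in range(n):
        # d_i = v_i - (<v_i,v_i>/<d_{i-1},v_i>) d_{i-1} - (<v_i,v_{i-1}>/<d_{i-2},v_{i-1}>) d_{i-2}
        c1 = dot(v, v) * pow(dot(d, v), -1, p) % p
        d_new = [(vi - c1 * di) % p for vi, di in zip(v, d)]
        if d_old is not None:
            c2 = dot(v, v_old) * pow(dot(d_old, v_old), -1, p) % p
            d_new = [(dn - c2 * do) % p for dn, do in zip(d_new, d_old)]
        if not any(d_new):
            return x                           # d_i = 0: the current x is the solution
        v_new = matvec(d_new)                  # v_{i+1} = A d_i
        denom = dot(d_new, v_new)              # <d_i, A d_i>
        if denom == 0:
            raise ArithmeticError("non-zero d_i with <d_i, A d_i> = 0; restart")
        a = dot(d_new, b) * pow(denom, -1, p) % p             # a_i
        x = [(xi + a * dn) % p for xi, dn in zip(x, d_new)]   # x_i = x_{i-1} + a_i d_i
        d_old, v_old, d, v = d, v, d_new, v_new
    return x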

4.7.4. The Wiedemann Method

The Wiedemann method for solving a sparse system Ax = b over F_q uses ideas different from those employed by the other methods discussed so far. For the sake of simplicity, we assume that A is a square non-singular matrix (not necessarily symmetric). The Wiedemann method tries to compute the minimal polynomial μ_A(X) = c_0 + c_1 X + ··· + c_d X^d, d ≤ n, of A. To that end, one selects a small positive integer l in the range 10 ≤ l ≤ 20. For i = 0, 1, . . . , 2n, let v_i denote the column vector of length l consisting of the first l entries of the vector A^i b. For the working of the Wiedemann method, we need to compute only the vectors v_0, . . . , v_{2n}. If A is a sparse matrix, this computation involves a total of O~(n^2) operations in F_q.

Since μ_A(A) = 0, we have c_0 A^j b + c_1 A^{j+1} b + ··· + c_d A^{j+d} b = 0 for every j ≥ 0. Therefore, for each k = 1, . . . , l, the sequence v_{0,k}, v_{1,k}, . . . of the k-th entries of v_0, v_1, . . . satisfies the linear recurrence

c_0 v_{j,k} + c_1 v_{j+1,k} + ··· + c_d v_{j+d,k} = 0 for all j ≥ 0.

But then the minimal polynomial μ_k(X) of the k-th such sequence is a factor of μ_A(X). There are methods that compute each μ_k(X) using O(n^2) field operations. We then expect to obtain μ_A(X) = lcm(μ_k(X) | 1 ≤ k ≤ l).

The assumption that A is non-singular is equivalent to the condition that c_0 ≠ 0. In that case, the solution vector x = A^{−1}b = −c_0^{−1}(c_1 b + c_2 Ab + ··· + c_d A^{d−1} b) can be computed using O~(n^2) arithmetic operations in the field F_q.
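
The final back substitution can be sketched in Python as follows (illustrative code of ours; it assumes that the coefficients of μ_A(X) have already been found, for instance with the Berlekamp–Massey algorithm applied to the sequences v_{j,k} followed by an lcm computation):

def wiedemann_backsubstitute(A, b, c, p):
    # c = [c_0, c_1, ..., c_d]: coefficients of the minimal polynomial of A over F_p.
    # Returns x = A^{-1} b = -c_0^{-1} (c_1 b + c_2 A b + ... + c_d A^{d-1} b).
    n = len(b)
    matvec = lambda v: [sum(Ai[j] * v[j] for j in range(n)) % p for Ai in A]
    acc = [0] * n                        # accumulates c_1 b + c_2 A b + ... + c_d A^{d-1} b
    power = [bi % p for bi in b]         # holds A^{j-1} b for the current coefficient c_j
    for cj in c[1:]:
        acc = [(ai + cj * pi) % p for ai, pi in zip(acc, power)]
        power = matvec(power)
    assert c[0] % p != 0                 # c_0 != 0 precisely when A is non-singular
    neg_c0_inv = pow(-c[0] % p, -1, p)
    return [ai * neg_c0_inv % p for ai in acc]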

If A is singular, we may find linear dependencies among the rows of A and subsequently throw away suitable rows. Doing this repeatedly eventually gives us a non-singular A. For further details on the Wiedemann method, see [303].
