Least squares approximation

Numerically, it is relatively simple to state the approximation problem for the least squares norm. This is the topic of this section.

Linear least squares approximation

In the context of linear least squares approximation, it is always possible to reduce the problem to solving a system of linear equations.
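
In general, if we write the approximant as a linear combination a1*g1(x) + a2*g2(x) + ... + ar*gr(x) of r basis functions, minimizing the least squares error leads to the normal equations

    < gj, g1 > a1 + < gj, g2 > a2 + ... + < gj, gr > ar = < gj, f >,    j = 1, 2, ..., r,

whose coefficient matrix is the Gram matrix of pairwise dot products of the basis functions. The following example is precisely this construction for the sine function and the monomial basis {1, x, x^2}: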

Consider the sine function f(x) = sin(x) on the interval from 0 to 1. We choose as approximants the polynomials of degree at most two: {a0 + a1x + a2x^2}. To compute the values [a0, a1, a2] that minimize the least squares error, we first form a 3 × 3 matrix containing the pairwise dot products (the integral of the product of two functions) of the basis functions {1, x, x^2} over the given interval. Because of the choice of interval and basis, this turns out to be the Hilbert matrix of order 3:

[   < 1, 1 >    < 1, x >    < 1, x^2 > ]     [  1   1/2  1/3 ]
[   < x, 1 >    < x, x >    < x, x^2 > ]  =  [ 1/2  1/3  1/4 ]
[ < x^2, 1 >  < x^2, x >  < x^2, x^2 > ]     [ 1/3  1/4  1/5 ]

The right-hand side of the system is the column vector of dot products of the sine function with each basis function over the given interval:

[   < sin(x), 1 > ]     [            1 - cos(1) ]
[   < sin(x), x > ]  =  [       sin(1) - cos(1) ]
[ < sin(x), x^2 > ]     [ 2*sin(1) + cos(1) - 2 ]
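
These integrals can also be obtained numerically; the following sketch (which relies on scipy.integrate.quad, a routine not otherwise used in this section) reproduces both the Gram matrix and the right-hand side:

import numpy as np
from scipy.integrate import quad

# dot product of two functions on [0, 1]: the integral of their product
def dot(f, g):
    return quad(lambda x: f(x) * g(x), 0, 1)[0]

basis = [lambda x: 1.0, lambda x: x, lambda x: x**2]

gram = np.array([[dot(f, g) for g in basis] for f in basis])
rhs = np.array([dot(np.sin, g) for g in basis])

print(gram)   # agrees with the Hilbert matrix of order 3
print(rhs)    # agrees with the closed-form values above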

We compute the coefficients and the corresponding approximation polynomial as follows:

In [1]: import numpy as np, scipy.linalg as spla
   ...: import matplotlib.pyplot as plt
In [2]: A = spla.hilbert(3); 
   ...: b = np.array([1-np.cos(1), np.sin(1)-np.cos(1), 
   ...:               2*np.sin(1)+ np.cos(1)-2])
In [3]: spla.solve(A, b)
Out[3]: array([-0.00746493,  1.09129978, -0.2354618 ])
In [4]: poly1 = np.poly1d(spla.solve(A, b)[::-1]); 
   ...: print poly1
         2
-0.2355 x + 1.091 x - 0.007465
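
Continuing the session, a quick sanity check (a small sketch of ours) measures the maximum deviation of poly1 from sin(x) on a fine grid of the interval:

x = np.linspace(0, 1, 100)
# maximum pointwise deviation of the computed quadratic from sin(x)
print(np.max(np.abs(poly1(x) - np.sin(x))))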

In general, to solve a linear least squares approximation problem for a basis with r elements, we need to solve a system of r linear equations in r unknowns. In spite of its apparent simplicity, this method is far from perfect. The two main reasons are the following:

  • The system may be ill-conditioned, as was the case in the previous example.
  • There is non-permanence of the coefficients: their values depend heavily on r. Increasing the dimension of the problem results in a new set of coefficients, different from the previous set (both issues are illustrated in the sketch after this list).
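
Both issues can be observed directly. The following sketch prints the rapidly growing condition numbers of the Hilbert matrices and shows how the first three coefficients change when the basis grows from three to four monomials (it uses scipy.integrate.quad to build the right-hand sides, an addition of ours that is not part of the original session):

import numpy as np, scipy.linalg as spla
from scipy.integrate import quad

# ill-conditioning: the condition number of hilbert(r) grows very quickly with r
for r in (3, 5, 8):
    print(r, np.linalg.cond(spla.hilbert(r)))

# non-permanence: right-hand sides < sin(x), x**k > on [0, 1], k = 0, ..., r-1
def rhs(r):
    return np.array([quad(lambda x, k=k: x**k * np.sin(x), 0, 1)[0]
                     for k in range(r)])

print(spla.solve(spla.hilbert(3), rhs(3)))      # coefficients for the basis {1, x, x^2}
print(spla.solve(spla.hilbert(4), rhs(4))[:3])  # the first three change when x^3 is added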

Note

There are ways to remedy the ill-conditioning of the system. One standard procedure is to construct an orthogonal basis from the original one with the Gram-Schmidt or modified Gram-Schmidt orthogonalization methods. This topic is beyond the scope of this monograph, but a good reference for these methods is Chapter 1 of Numerical Analysis by Walter Gautschi, Birkhäuser, 1997.

A basis that always provides simple linear systems is that of B-splines. Because the basis functions have small supports, the systems involved are banded (tridiagonal for splines of degree one) and thus easily solvable without the need for sophisticated algorithms. The object-oriented framework coded in the scipy.interpolate module allows us to perform all these computations internally. This is a brief enumeration of the classes and subclasses involved:

  • UnivariateSpline for splines in one dimension, or splines of curves in any dimension. We seldom use this class directly and resort instead to the LSQUnivariateSpline subclass.
  • BivariateSpline for splines representing surfaces over nodes placed on a rectangle. As with its univariate counterpart, this class should not be used directly. Instead, we use the LSQBivariateSpline subclass.
  • SphereBivariateSpline for splines representing surfaces over nodes placed on a sphere. The computations must be carried out through the LSQSphereBivariateSpline subclass instead.

Tip

In all three cases, the base classes and their methods are the same as their counterparts in the problem of interpolation. Refer to the Interpolation section for more information.

Let us illustrate these object-oriented techniques with a few selected examples:

Let us approximate the same sine function on the same domain, with cubic splines (k = 3), in the sense of least squares. We must provide a set of interior knots on the domain, and we may also provide a bounding box and the weights w for the least squares approximation. The related UnivariateSpline class accepts instead an optional smoothness parameter s: with s = 0 we obtain interpolation, while larger values of s yield increasingly smooth splines; for a reliable (weighted) least squares approximation, a good choice is s = len(w), which is what that routine uses by default. Note also how small the computed error is:

In [5]: f = np.sin; 
   ...: x = np.linspace(0,1,100); 
   ...: knots = np.linspace(0,1,7)[1:-1]; 
   ...: weights = np.ones_like(x)
In [6]: from scipy.interpolate import LSQUnivariateSpline
In [7]: approximant = LSQUnivariateSpline(x, f(x), knots, k=3, 
   ...:                                   w = weights, bbox = [0, 1])
In [8]: spla.norm(f(x) - approximant(x))
Out[8]: 3.370175009262551e-06

Tip

A more convenient way to compute this error of approximation is to use the .get_residual method, as follows:

In [9]: approximant.get_residual()**(.5)
Out[9]: 3.37017500928446e-06

Approximate the two-dimensional function sin(x)+sin(y) on the [-3,3] x [-3,3] domain. We first choose a representation of the domain, a set of 100 suitable knots on a grid, and the set of weights. Since all inputs to the LSQBivariateSpline function must be one-dimensional arrays, we perform the corresponding conversions prior to calling the approximation function:

In [10]: def f(x, y): return np.sin(x) + np.sin(y); 
   ....: t = np.linspace(-3, 3, 100); 
   ....: domain = np.meshgrid(t, t); 
   ....: X, Y = domain; 
   ....: Z = f(*domain)
In [11]: X = X.ravel(); 
   ....: Y = Y.ravel(); 
   ....: Z = Z.ravel()
In [12]: kx = np.linspace(-3,3,12)[1:-1]; 
   ....: ky = kx.copy(); 
   ....: weights = np.ones_like(Z);
In [13]: from scipy.interpolate import LSQBivariateSpline
In [14]: approximant = LSQBivariateSpline(X, Y, Z, kx, ky, 
   ....:                                  w = weights)
In [15]: approximant.get_residual()
Out[15]: 0.0
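
The residual reported by get_residual is the weighted sum of squared errors at the data points. As an additional sanity check (a sketch of ours continuing the session), we can evaluate the fitted spline on the original grid and compare it with the target values directly:

fitted = approximant(t, t)    # grid evaluation of the bivariate spline
print(spla.norm(fitted - np.sin(t)[:, None] - np.sin(t)[None, :]))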

Tip

It is also possible to perform this computation with the RectBivariateSpline class. To achieve a least squares approximation rather than interpolation, we provide the grid coordinates (instead of knots, since these will be computed automatically) together with a sufficiently large smoothness parameter s; a good choice is a value on the order of the number of grid points.
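
A minimal sketch of this alternative, assuming the same target f(x, y) = sin(x) + sin(y) and grid as above (the particular value of s below is only indicative):

import numpy as np
from scipy.interpolate import RectBivariateSpline

t = np.linspace(-3, 3, 100)
Z = np.sin(t)[:, None] + np.sin(t)[None, :]   # Z[i, j] = f(t[i], t[j])

# s = 0 would interpolate; a larger s trades fidelity for smoothness
approximant = RectBivariateSpline(t, t, Z, s=Z.size)
print(approximant.get_residual())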

Nonlinear least squares approximation

In the context of nonlinear least squares approximation, we do not usually have the luxury of simple matrix representations. Instead, we make use of two variations of an iterative process, the Levenberg-Marquardt algorithm, which is hosted in the scipy.optimize module. The two versions, which correspond to the LMDER and LMDIF routines from the Fortran library MINPACK, can be called through the leastsq wrapper.

The following table lists all the options to this function:

Option        Description
------        -----------
func          error function F(a)
x0            starting estimate for the minimization, of size r
args          extra arguments to func, as a tuple
Dfun          function representing the Jacobian matrix of func
full_output   Boolean; if nonzero, all optional outputs are returned
col_deriv     Boolean; if nonzero, Dfun returns the derivatives arranged down its columns (the transposed Jacobian, as in the examples below)
ftol          relative error desired in the sums of squares
xtol          relative error desired in the approximate solution
gtol          orthogonality desired between func and the columns of Dfun
maxfev        maximum number of calls. If zero, the number of calls is 100*(r+1)
epsfcn        if Dfun=None, we may specify a floating-point value as the step in the forward-difference approximation of the Jacobian matrix
factor        floating-point value between 0.1 and 100, indicating the initial step bound
diag          scale factors for each of the variables
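
As an illustration of some of these options, the following sketch (a hypothetical toy fit, not taken from the text) tightens the tolerances, caps the number of function calls, and requests the full output:

import numpy as np
from scipy.optimize import leastsq

# hypothetical toy problem: fit y = a*exp(b*t) to slightly noisy samples
t = np.linspace(0, 1, 50)
y = 2.0 * np.exp(-1.5 * t) + 0.01 * np.random.randn(50)

def residuals(a):
    return a[0] * np.exp(a[1] * t) - y

coeffs, cov_x, infodict, mesg, ier = leastsq(
    residuals, np.ones(2),
    ftol=1e-12, xtol=1e-12,   # stricter stopping criteria
    maxfev=2000,              # cap on the number of calls to residuals
    full_output=True)
print(coeffs, infodict['nfev'], mesg)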

The first variant of the algorithm is used when we have a trusted Jacobian for the error function. If this is not provided, a second variant of the algorithm is used, which approximates the Jacobian by forward differences. We illustrate both variants with several examples.

Let us start by revisiting a previous example with this method, in order to see the differences in usage and accuracy. We will focus the computations on a partition of the interval from 0 to 1, with 100 uniformly spaced points:

In [16]: from scipy.optimize import leastsq
In [17]: def error_function(a):
   ....:     return a[0] + a[1] * x + a[2] * x**2 - np.sin(x)
In [18]: def jacobian(a): return np.array([np.ones(100), x, x**2])
In [19]: coeffs, success = leastsq(error_function, np.zeros((3,)))
In [20]: poly2 = np.poly1d(coeffs[::-1]); print poly2
         2
-0.2354 x + 1.091 x - 0.007232
In [21]: coeffs, success = leastsq(error_function, np.zeros((3,)), 
   ....:                           Dfun = jacobian, col_deriv=True)
In [22]: poly3 = np.poly1d(coeffs[::-1]); 
   ...:  print poly3
         2
-0.2354 x + 1.091 x - 0.007232
In [23]: map(lambda f: spla.norm(np.sin(x) - f(x)), 
   ....:     [poly1, poly2, poly3])
Out[23]:
[0.028112146265269783, 0.02808377541388009, 0.02808377541388009]

Tip

There is another function in the scipy.optimize module to perform nonlinear least squares approximation: curve_fit. It uses the same algorithm but, instead of an error function, we feed it a generic approximant g[a](x), together with a suitable domain for the independent variable x and the values of the target function f on that domain. We do need to input an initial estimate as well. The output consists of the required coefficients, together with an estimate of their covariance.

In [23]: from scipy.optimize import curve_fit
In [24]: def approximant(t, a, b, c):
   ....:     return a + b*t + c*t**2
In [25]: curve_fit(approximant, x, np.sin(x), 
   ....:           np.ones((3,)))
(array([-0.007232  ,  1.09078356, -0.23537796]),
 array([[  7.03274163e-07,  -2.79884256e-06,
           2.32064835e-06],
        [ -2.79884256e-06,   1.50223585e-05, 
          -1.40659702e-05],
        [  2.32064835e-06,  -1.40659702e-05,  
           1.40659703e-05]]))

In this section, we focus on the leastsq function exclusively. The goals and coding of both functions are very similar, but leastsq offers more informative output on demand and finer control over the different parameters of the Levenberg-Marquardt algorithm.

Let us experiment now with a few actual nonlinear problems:

In the first example, we will approximate the tan(2*x) function on the interval from 0 to 1 with rational functions in which both numerator and denominator are polynomials of degree at most 1:

In [26]: def error_function(a):
   ....:     return (a[0] + a[1]*x)/(a[2] + a[3]*x) - np.tan(2*x)
In [27]: def jacobian(a):
   ....:     numerator = a[0] + a[1]*x
   ....:     denominator = a[2] + a[3]*x
   ....:     return np.array( [ 1./denominator, x/denominator, 
   ....:                        -1.0*numerator/denominator**2, 
   ....:                        -1.0*x*numerator/denominator**2 ])

To show the dependence on the initial estimate, we are going to experiment with three different choices: one that makes no sense (all coefficients zero), a blind standard choice (all entries equal to one), and a third that acknowledges the fact that the tan(2*x) function has a vertical asymptote. We will pretend that we do not know its exact location and place it at approximately 0.78. Our third initial estimate then represents a simple rational function with an asymptote at 0.78.

A wrong initial estimate does not give us anything useful, obviously:

In [28]: x1 = np.zeros((4,)); 
   ....: x2 = np.ones((4,)); 
   ....: x3 = np.array([1,0,0.78,-1])
In [29]: coeffs, success = leastsq(error_function, x1); 
   ....: numerator = np.poly1d(coeffs[1::-1]); 
   ....: denominator = np.poly1d(coeffs[:1:-1]); 
   ....: print numerator, denominator
0
0
In [30]: coeffs, success = leastsq(error_function, x1, 
   ....:                           Dfun=jacobian, col_deriv=True); 
   ....: numerator = np.poly1d(coeffs[1::-1]); 
   ....: denominator = np.poly1d(coeffs[:1:-1]); 
   ....: print numerator, denominator 
0
0

Neither of the two approximations using x2 as the initial guess is satisfactory: the corresponding errors are huge, and neither solution has an asymptote in the interval from 0 to 1.

In [31]: coeffs, success = leastsq(error_function, x2); 
   ....: numerator = np.poly1d(coeffs[1::-1]); 
   ....: denominator = np.poly1d(coeffs[:1:-1]); 
   ....: print numerator, denominator; 
   ....: spla.norm(np.tan(2*x) - numerator(x) / denominator(x))
-9.729 x + 4.28
-1.293 x + 1.986
Out[31]: 220.59056436054016
In [32]: coeffs, success = leastsq(error_function, x2, 
   ....:                           Dfun=jacobian, col_deriv=True); 
   ....: numerator = np.poly1d(coeffs[1::-1]); 
   ....: denominator = np.poly1d(coeffs[:1:-1]); 
   ....: print numerator, denominator; 
   ....: spla.norm(np.tan(2*x) - numerator(x) / denominator(x))
-655.9 x + 288.5
-87.05 x + 133.8
Out[32]: 220.590564978504

The approximations using x3 as the initial guess are closer to the target function, and both of them have an acceptable asymptote.

In [33]: coeffs, success = leastsq(error_function, x3); 
   ....: numerator = np.poly1d(coeffs[1::-1]); 
   ....: denominator = np.poly1d(coeffs[:1:-1]); 
   ....: print numerator, denominator; 
   ....: spla.norm(np.tan(2*x) - numerator(x) / denominator(x))
0.01553 x + 0.02421
-0.07285 x + 0.05721
Out[33]: 2.185984698129936
In [34]: coeffs, success = leastsq(error_function, x3, 
   ....:                           Dfun=jacobian, col_deriv=True); 
   ....: numerator = np.poly1d(coeffs[1::-1]); 
   ....: denominator = np.poly1d(coeffs[:1:-1]); 
   ....: print numerator, denominator; 
   ....: spla.norm(np.tan(2*x) - numerator(x) / denominator(x))
17.17 x + 26.76
-80.52 x + 63.24
Out[34]: 2.1859846981334954

We can do much better, of course, but these simple examples will suffice for now.

If we desire to output more information to monitor the quality of approximation, we may do so with the full_output option set to True:

In [35]: approximation_info = leastsq(error_function, x3, 
   ....:                              full_output=True)
In [36]: coeffs = approximation_info[0]; 
   ....: print coeffs
[ 0.02420694  0.01553346  0.0572128  -0.07284579]
In [37]: message = approximation_info[-2]; 
   ....: print message
Both actual and predicted relative reductions in the sum of squares
  are at most 0.000000
In [38]: infodict = approximation_info[2]; 
   ....: print 'The algorithm performed {0:2d} function evaluations'.format(infodict['nfev'])
The algorithm performed 97 function evaluations
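
The same dictionary also stores, among other entries, the final residual vector under the key 'fvec'; its norm measures the quality of the computed approximation (a short sketch continuing the session):

print(spla.norm(infodict['fvec']))   # norm of the final residual vector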

Although, technically, the leastsq algorithm deals mostly with the approximation of univariate functions, it is possible to work with multivariate functions with the aid of indices, raveling, unpacking (with the special * operator), and stable sums.

Tip

The usual sum of the floating-point entries of an ndarray with the NumPy instance method sum (or with the numpy function sum) is not numerically robust: when terms of widely different magnitudes cancel, all significant digits can be lost. The following example shows an undesired scenario, in which we try to add 4,000 values:

>>> arr=np.array([1,1e20,1,-1e20]*1000,dtype=np.float64)
>>> arr.sum()    # The answer should be, of course, 2000
0.0

To resolve this situation, we make use of stable sums. In the math module, there is an implementation of the Shewchuk algorithm for this very purpose:

>>> from math import fsum
>>> fsum(arr)
2000.0

For more background on floating-point arithmetic and the common pitfalls to avoid in scientific computing, we recommend the excellent guide What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg, ACM Computing Surveys, 1991, vol. 23, pp. 5-48.

This process is best explained with an example. We start by generating the target function: an image of size 32 × 32 containing white noise on top of the addition of three spherical Gaussian functions with different locations, variances, and heights. We collect all these values in a 3 × 4 array that we name values. The first and second columns contain the x and y values of the coordinates of the centers. The third column contains the heights, and the fourth column contains the variances.

In [39]: def sphericalGaussian(x0, y0, h, v):
   ....:     return lambda x,y: h*np.exp(-0.5*((x-x0)**2+(y-y0)**2)/v)
   ....:
In [40]: domain = np.indices((32, 32)); 
   ....: values = np.random.randn(3,4); 
   ....: values[:,:2] += np.random.randint(1, 32, size=(3, 2)); 
   ....: values[:,2] += np.random.randint(1, 64, size=3); 
   ....: values[:,3] += np.random.randint(1, 16, size=3); 
   ....: print values
[[ 17.43247918  17.15301326  34.86691265   7.84836966]
 [  5.5450271   20.68753512  34.41364835   4.78337552]
 [ 24.44909459  27.28360852   0.62186068   9.15251106]]
In [41]: img = np.random.randn(32,32)
In [42]: for k in xrange(3):
   ....:     img += sphericalGaussian(*values[k])(*domain)

Let us assume that we do not know the centers, heights, and variances, and wish to estimate them from the target image img. We then create an error function that receives the 12 coefficients packed in a 3 × 4 array a and returns the pointwise difference between the corresponding sum of Gaussians and the target image. Note the role of the numpy function ravel and the instance method reshape in ensuring that the data is handled correctly:

In [43]: from math import fsum
In [44]: def error_function(a):
   ....:     a = a.reshape(3,4)
   ....:     cx = a[:,0]    # x-coords
   ....:     cy = a[:,1]    # y-coords
   ....:     H = a[:,2]     # heights
   ....:     V = a[:,3]     # variances
   ....:     guess = np.zeros_like(img)
   ....:     for i in xrange(guess.shape[0]):
   ....:         for j in xrange(guess.shape[1]):
   ....:             arr = H*np.exp(-0.5*((i-cx)**2+(j-cy)**2)/V)
   ....:             guess[i,j] = fsum(arr)
   ....:     return np.ravel(guess-img)
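
For reference, the same residuals can be computed without the explicit double loop by broadcasting each Gaussian over the whole grid. This is a sketch of ours that assumes the img and domain arrays above; it trades the compensated fsum accumulation for ordinary floating-point addition, which is harmless at this scale:

def error_function_vectorized(a):
    a = a.reshape(3, 4)
    X, Y = domain                      # index grids from np.indices((32, 32))
    guess = np.zeros_like(img)
    for cx, cy, h, v in a:             # accumulate the three Gaussians
        guess += h * np.exp(-0.5 * ((X - cx)**2 + (Y - cy)**2) / v)
    return np.ravel(guess - img)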

Starting the process of least squares in this situation with guarantees of success requires a somewhat close initial guess. For this particular example, we are going to produce our initial guess from the array called values:

In [45]: x0 = np.vectorize(int)(values); 
   ....: print x0
[[17 17 34  7]
 [ 5 20 34  4]
 [24 27  0  9]]
In [46]: leastsq(error_function, x0)
Out[46]:
(array([ 17.43346907,  17.14219682,  34.82077187,   7.85849653,
         5.52511918,  20.68319748,  34.28559808,   4.8010449 ,
        25.19824918,  24.02286107,   3.87170006,   0.5289382 ]),
 1)

Let us now visually compare the target image img and the computed approximation by means of the following procedure:

In [47]: coeffs, success = _; 
   ....: coeffs = coeffs.reshape(3,4)
In [48]: output = np.zeros_like(img)
In [49]: for k in xrange(3):
   ....:     output += sphericalGaussian(*coeffs[k])(*domain)
In [50]: plt.figure(); 
   ....: plt.subplot(121); 
   ....: plt.imshow(img); 
   ....: plt.title('target image'); 
   ....: plt.subplot(122); 
   ....: plt.imshow(output); 
   ....: plt.title('computed approximation'); 
   ....: plt.show()

This gives the following diagram:

[Figure: the target image (left) and the computed approximation (right), displayed side by side]