Numerically, it is relatively simple to state the approximation problem for the least squares norm. This is the topic of this section.
In the context of linear least squares approximation, it is always possible to reduce the problem to solving a system of linear equations, as the following example shows:
Consider the sine function f(x) = sin(x) in the interval from 0 to 1. We choose as approximants the polynomials of second degree: {a0 + a1x + a2x2}. To compute the values [a0, a1, a2] that minimize this problem, we first form a 3 × 3 matrix containing the pairwise dot products (the integral of the product of two functions) of the basic functions {1, x, x2} in the given interval. Because of the nature of this problem, we obtain a Hilbert matrix of order 3:
[ < 1, 1 > < 1, x > < 1, x^2 > ] [ 1 1/2 1/3 ] [ < x, 1 > < x, x > < x, x^2 > ] = [ 1/2 1/3 1/4 ] [ < x^2, 1 > < x^2, x > < x^2, x^2 > ] [ 1/3 1/4 1/5 ]
The right-hand side of the system is the column vector with the dot product of the sine function with each basic function in the given interval:
[ < sin(x), 1 > ] [ 1 - cos(1) ] [ < sin(x), x > ] = [ sin(1) - cos(1) ] [ < sin(x), x^2 > ] [ 2*sin(1) + cos(1) - 2 ]
We compute the coefficients and the corresponding approximation polynomial as follows:
In [1]: import numpy as np, scipy.linalg as spla, ...: matplotlib.pyplot as plt In [2]: A = spla.hilbert(3); ...: b = np.array([1-np.cos(1), np.sin(1)-np.cos(1), ...: 2*np.sin(1)+ np.cos(1)-2]) In [3]: spla.solve(A, b) Out[3]: array([-0.00746493, 1.09129978, -0.2354618 ]) In [4]: poly1 = np.poly1d(spla.solve(A, b)[::-1]); ...: print poly1 2 -0.2355 x + 1.091 x - 0.007465
In general, to resolve a linear least squares approximation problem for a basis with r
elements, we need to solve a basic system of linear equations with r
equations and r
indeterminates. In spite of its apparent simplicity, this method is far from perfect. The two main reasons are the following:
There are ways to remediate the ill-conditioning of the system. One standard procedure is to construct an orthogonal basis from the original with the Gram-Schmidth and modified Gram-Schmidt orthogonalization methods. This topic is beyond the scope of this monograph, but a good reference for these methods can be read in Chapter 1 of the book Numerical Analysis by Walter Gautschi, Birkhäuser, 1997.
A basis that always provides simple linear systems is the B-splines. All the systems involved are tridiagonal and thus are easily solvable without the need of complex operations. The object-oriented system coded in the scipy.interpolate
module allows us to perform all these computations internally. This is a brief enumeration of the classes and subclasses involved:
UnivariateSpline
for splines in one dimension, or splines of curves in any dimension. We seldom use this class directly and resort instead to the LSQUnivariateSpline
subclass.BivariateSpline
for splines representing surfaces over nodes placed on a rectangle. As its univariate counterpart, this class must not be used directly. Instead, we utilize the LSQBivariateSpline
subclass.SphereBivariateSpline
for splines representing surfaces over nodes placed on a sphere. The computations must be carried out through the LSQSphereBivariateSpline
subclass instead.Let us illustrate these object-oriented techniques with a few selected examples:
Approximate the same sine
function on the same domain, with cubic splines (k = 3
), in the sense of least squares. First, note that we must provide a bounding box, a set of knots on the domain, and the weights w
for the least squares approximation. We are also allowed to provide an optional smoothness parameter s
. If s = 0
, we obtain interpolation instead, and for large values of s
, we achieve different degrees of smoothness of the resulting spline. To obtain a reliable (weighted) least squares approximation, a good choice is s = len(w)
(which is what the routine does by default). Note also how small the computed error is:
In [5]: f = np.sin; ...: x = np.linspace(0,1,100); ...: knots = np.linspace(0,1,7)[1:-1]; ...: weights = np.ones_like(x) In [6]: from scipy.interpolate import LSQUnivariateSpline In [7]: approximant = LSQUnivariateSpline(x, f(x), knots, k=3, ...: w = weights, bbox = [0, 1]) In [8]: spla.norm(f(x) - approximant(x)) Out[8]: 3.370175009262551e-06
Approximate the two-dimensional function sin(x)+sin(y)
on the [-3,3] x [-3,3]
domain. We first choose a representation of the domain, a set of 100 suitable knots on a grid, and the set of weights. Since all inputs to the LSQBivariateSpline
function must be one-dimensional arrays, we perform the corresponding conversions prior to calling the approximation function:
In [10]: def f(x, y): return np.sin(x) + np.sin(y); ....: t = np.linspace(-3, 3, 100); ....: domain = np.meshgrid(t, t); ....: X, Y = domain; ....: Z = f(*domain) In [11]: X = X.ravel(); ....: Y = Y.ravel(); ....: Z = Z.ravel() In [12]: kx = np.linspace(-3,3,12)[1:-1]; ....: ky = kx.copy(); ....: weights = np.ones_like(Z); In [13]: from scipy.interpolate import LSQBivariateSpline In [14]: approximant = LSQBivariateSpline(X, Y, Z, kx, kx, ....: w = weights) In [15]: approximant.get_residual() Out[15]: 0.0
It is also possible to perform this computation with the RectBivariateSpline
function. To achieve least squares interpolation, we provide nodes (instead of knots, since these will be computed automatically), weights w
, and a smoothness parameter s
that is sufficiently large. A good choice is s = len(w)
.
In the context of nonlinear least square approximations, we do not usually have the luxury of simple matrix representations. Instead, we make use of two variations of an iterative process, the Levenberg-Marquardt algorithm, which is hosted in the scipy.optimize
module. The two versions, which correspond to the LMDER
and LMDIF
routines from the Fortran library MINPACK
, can be called through the leastsq
wrapper.
The following table lists all the options to this function:
Option |
Description |
---|---|
|
error function |
|
starting estimate for the minimization, of size |
|
extra arguments to |
|
function representing the Jacobian matrix of |
|
Boolean |
|
Boolean |
|
relative error desired in the sums of squares |
|
relative error desired in the approximate solution |
|
orthogonality desired between |
|
maximum number of calls. If zero, the number of calls is |
|
if |
|
floating-point value between 0.1 and 100, indicating the initial step bound |
|
scale factors for each of the variables |
The first variant of the algorithm is used when we have a trusted Jacobian for the error function. If this is not provided, a second variant of the algorithm is used, which approximates the Jacobian by forward differences. We illustrate both variants with several examples.
Let us start revisiting a previous example with this method, in order to see the differences in usage and accuracy. We will focus the computations on a partition of the interval from 0 to 1, with 100 uniformly spaced points:
In [16]: from scipy.optimize import leastsq In [17]: def error_function(a): ....: return a[0] + a[1] * x + a[2] * x**2 - np.sin(x) In [18]: def jacobian(a): return np.array([np.ones(100), x, x**2]) In [19]: coeffs, success = leastsq(error_function, np.zeros((3,))) In [20]: poly2 = np.poly1d(coeffs[::-1]); print poly2 2 -0.2354 x + 1.091 x - 0.007232 In [21]: coeffs, success = leastsq(error_function, np.zeros((3,)), ....: Dfun = jacobian, col_deriv=True) In [22]: poly3 = np.poly1d(coeffs[::-1]); ...: print poly3 2 -0.2354 x + 1.091 x - 0.007232 In [23]: map(lambda f: spla.norm(np.sin(x) - f(x)), ....: [poly1, poly2, poly3]) Out[23]: [0.028112146265269783, 0.02808377541388009, 0.02808377541388009]
There is another function in the scipy.optimize
module to perform nonlinear least squares approximation: curve_fit
. It uses the same algorithm, but instead of an error function, we feed it a generic approximant g[a](x)
, together with a suitable domain for the independent variable x
, and the output of the target function f
on the same domain. We do need to input an initial estimate as well. The output is, together with the required coefficients, an estimation of the covariance of the said coefficients.
In [23]: from scipy.optimize import curve_fit In [24]: def approximant(t, a, b, c): ....: return a + b*t + c*t**2 In [25]: curve_fit(approximant, x, np.sin(x), ....: np.ones((3,))) (array([-0.007232 , 1.09078356, -0.23537796]), array([[ 7.03274163e-07, -2.79884256e-06, 2.32064835e-06], [ -2.79884256e-06, 1.50223585e-05, -1.40659702e-05], [ 2.32064835e-06, -1.40659702e-05, 1.40659703e-05]]))
In this section, we focus on the leastsq
function exclusively. The goals and coding of both the functions are the same, but leastsq
offers a more informative output on demand and more control over the different parameters of the Levenberg-Marquardt algorithm.
Let us experiment now with a few actual nonlinear problems:
In the first example, we will approximate the tan(2*x)
function in the interval from 0 to 1 with rational functions where each of the polynomials has an at most degree of 1:
In [26]: def error_function(a): ....: return (a[0] + a[1]*x)/(a[2] + a[3]*x) – np.tan(2*x) In [27]: def jacobian(a): ....: numerator = a[0] + a[1]*x ....: denominator = a[2] + a[3]*x ....: return np.array( [ 1./denominator, x/denominator, ....: -1.0*numerator/denominator**2, ....: -1.0*x*numerator/denominator**2 ])
To show the dependence of the initial estimation, we are going to experiment with three different choices: one that makes no sense (all zero coefficients), another that is a blind standard choice (with all entries equal to one), and the other choice that acknowledges the fact that the tan(2*x)
function has a vertical asymptote. We will pretend that we do not know the exact location and approximate it to 0.78
. Our third initial estimation then represents a simple rational function with an asymptote at 0.78
.
A wrong initial estimate does not give us anything useful, obviously:
In [28]: x1 = np.zeros((4,)); ....: x2 = np.ones((4,)); ....: x3 = np.array([1,0,0.78,-1]) In [29]: coeffs, success = leastsq(error_function, x1); ....: numerator = np.poly1d(coeffs[1::-1]); ....: denominator = np.poly1d(coeffs[:1:-1]); ....: print numerator, denominator 0 0 In [30]: coeffs, success = leastsq(error_function, x1, ....: Dfun=jacobian, col_deriv=True); ....: numerator = np.poly1d(coeffs[1::-1]); ....: denominator = np.poly1d(coeffs[:1:-1]); ....: print numerator, denominator 0 0
None of these two approximations using x2
as the initial guess are satisfactory: the corresponding errors are huge, and neither solution has an asymptote in the interval from 0 to 1.
In [31]: coeffs, success = leastsq(error_function, x2); ....: numerator = np.poly1d(coeffs[1::-1]); ....: denominator = np.poly1d(coeffs[:1:-1]); ....: print numerator, denominator; ....: spla.norm(np.tan(2*x) - numerator(x) / denominator(x)) -9.729 x + 4.28 -1.293 x + 1.986 Out[31]: 220.59056436054016 In [32]: coeffs, success = leastsq(error_function, x2, ....: Dfun=jacobian, col_deriv=True); ....: numerator = np.poly1d(coeffs[1::-1]); ....: denominator = np.poly1d(coeffs[:1:-1]); ....: print numerator, denominator; ....: spla.norm(np.tan(2*x) - numerator(x) / denominator(x)) -655.9 x + 288.5 -87.05 x + 133.8 Out[32]: 220.590564978504
The approximations using x3 as the initial guess are closer to the target function, and both of them have an acceptable asymptote.
In [33]: coeffs, success = leastsq(error_function, x3); ....: numerator = np.poly1d(coeffs[1::-1]); ....: denominator = np.poly1d(coeffs[:1:-1]); ....: print numerator, denominator; ....: spla.norm(np.tan(2*x) - numerator(x) / denominator(x)) 0.01553 x + 0.02421 -0.07285 x + 0.05721 Out[33]: 2.185984698129936 In [34]: coeffs, success = leastsq(error_function, x3, ....: Dfun=jacobian, col_deriv=True); ....: numerator = np.poly1d(coeffs[1::-1]); ....: denominator = np.poly1d(coeffs[:1:-1]); ....: print numerator, denominator; ....: spla.norm(np.tan(2*x) - numerator(x) / denominator(x)) 17.17 x + 26.76 -80.52 x + 63.24 Out[34]: 2.1859846981334954
We can do much better, of course, but these simple examples will suffice for now.
If we desire to output more information to monitor the quality of approximation, we may do so with the full_output
option set to True
:
In [35]: approximation_info = leastsq(error_function, x3, ....: full_output=True) In [36]: coeffs = approximation_info[0]; ....: print coeffs [ 0.02420694 0.01553346 0.0572128 -0.07284579] In [37]: message = approximation_info[-2]; ....: print message Both actual and predicted relative reductions in the sum of squares are at most 0.000000 In [38]: infodict = approximation_info[2]; ....: print 'The algorithm performed ....: {0:2d} iterations'.format(infodict['nfev']) The algorithm performed 97 iterations
Although technically, the leastsq
algorithm deals mostly with approximation to univariate functions, it is possible to work on multivariate functions with the aid of indices, raveling, unpacking (with the special *
operator), and stable sums.
The usual sum of the floating-point numbers of ndarray
with the numpy
instance method sum
(or with the numpy
function sum
) is far from stable. We firmly advise against using it for fairly large sums of numbers. The following example shows an undesired scenario, in which we try to add 4000 values:
>>> arr=np.array([1,1e20,1,-1e20]*1000,dtype=np.float64) >>> arr.sum() # The answer should be, of course, 2000 0.0
To resolve this situation, we make use of stable sums. In the math
module, there is an implementation of the Shewchuk algorithm for this very purpose:
>>> from math import fsum >>> fsum(arr) 2000.0
For more information about the Shewchuk algorithm, as well as other common pitfalls to avoid in scientific computing with floating-point arithmetic, we recommend the excellent guide What Every Computer Scientist Should Know about Floating Point Arithmetic, by David Goldberg. ACM Computing Surveys, 1991. vol. 23, pp. 5-48.
This process is best explained with an example. We start by generating the target function: an image of size 32 × 32 containing white noise on top of the addition of three spherical Gaussian functions with different locations, variances, and heights. We collect all these values in a 3 × 4 array that we name values. The first and second columns contain the x
and y
values of the coordinates of the centers. The third column contains the heights, and the fourth column contains the variances.
In [39]: def sphericalGaussian(x0, y0, h, v): ....: return lambda x,y: h*np.exp(-0.5*((x-x0)**2+(y-y0)**2)/v) ....: In [40]: domain = np.indices((32, 32)); ....: values = np.random.randn(3,4); ....: values[:,:2] += np.random.randint(1, 32, size=(3, 2)); ....: values[:,2] += np.random.randint(1, 64, size=3); ....: values[:,3] += np.random.randint(1, 16, size=3); ....: print values [[ 17.43247918 17.15301326 34.86691265 7.84836966] [ 5.5450271 20.68753512 34.41364835 4.78337552] [ 24.44909459 27.28360852 0.62186068 9.15251106]] In [41]: img = np.random.randn(32,32) In [42]: for k in xrange(3): ....: img += sphericalGaussian(*values[k])(*domain)
Let us assume that we do not know the centers, heights, and variances, and wish to estimate them from the target image img
. We then create an error function to compute the 12 coefficients that are packed in the 3 × 4 array a
. Note the role of the numpy
function ravel and the instance method reshape in ensuring that the data is handled correctly:
In [43]: from math import fsum In [44]: def error_function(a): ....: a = a.reshape(3,4) ....: cx = a[:,0] # x-coords ....: cy = a[:,1] # y-coords ....: H = a[:,2] # heights ....: V = a[:,3] # variances ....: guess = np.zeros_like(img) ....: for i in xrange(guess.shape[0]): ....: for j in xrange(guess.shape[1]): ....: arr = H*np.exp(-0.5*((i-cx)**2+(j-cy)**2)/V) ....: guess[i,j] = fsum(arr) ....: return np.ravel(guess-img)
Starting the process of least squares in this situation with guarantees of success requires a somewhat close initial guess. For this particular example, we are going to produce our initial guess from the array called values
:
In [45]: x0 = np.vectorize(int)(values); ....: print x0 [[17 17 34 7] [ 5 20 34 4] [24 27 0 9]] In [46]: leastsq(error_function, x0) Out[46]: (array([ 17.43346907, 17.14219682, 34.82077187, 7.85849653, 5.52511918, 20.68319748, 34.28559808, 4.8010449 , 25.19824918, 24.02286107, 3.87170006, 0.5289382 ]), 1)
Let us now visually compare both the target image img
and its minimization by the following procedure:
In [47]: coeffs, success = _; ....: coeffs = coeffs.reshape(3,4) In [48]: output = np.zeros_like(img) In [49]: for k in xrange(3): ....: output += sphericalGaussian(*coeffs[k])(*domain) In [50]: plt.figure(); ....: plt.subplot(121); ....: plt.imshow(img); ....: plt.title('target image'); ....: plt.subplot(122); ....: plt.imshow(output); ....: plt.title('computed approximation'); ....: plt.show()
This gives the following diagram:
3.15.229.111