Optimization involves finding extreme values of functions or their roots. We have already seen the power of optimization in the curve-fitting arena, but it does not stop there. There are applications to virtually every single branch of engineering, and robust algorithms to perform these tasks are a must in every scientist's toolbox.
The curve_fit routine is actually syntactic sugar for leastsq, the general algorithm that performs least-squares minimization, which has the imposing syntax:
leastsq(func, x0, args=(), Dfun=None, full_output=0, col_deriv=0, ftol=1.49012e-08, xtol=1.49012e-08, gtol=0.0, maxfev=0, epsfcn=0.0, factor=100, diag=None)
For instance, the curve_fit routine could have been called with a leastsq call instead:
leastsq(error_function,p0,args=(x,y))
Here, error_function is equal to lambda p,x,y: target_function(x,p[0],p[1],p[2])-y.
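As a minimal sketch of this equivalence (using a hypothetical three-parameter model in place of target_function, and synthetic noiseless data), both calls recover the same parameters:

```python
import numpy
from scipy.optimize import curve_fit, leastsq

# a hypothetical three-parameter model standing in for target_function
target_function = lambda x, a, b, c: a * numpy.exp(-b * x) + c

# synthetic noiseless data generated from known parameters
x = numpy.linspace(0, 4, 50)
y = target_function(x, 2.5, 1.3, 0.5)

p0 = [1.0, 1.0, 1.0]

# the high-level call...
popt, _ = curve_fit(target_function, x, y, p0=p0)

# ...and the equivalent low-level call
error_function = lambda p, x, y: target_function(x, p[0], p[1], p[2]) - y
plsq, _ = leastsq(error_function, p0, args=(x, y))

print(popt)  # both should recover approximately [2.5, 1.3, 0.5]
print(plsq)
```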
The implementation is given in the corresponding section of the IPython Notebook for this chapter. Most of the optimization routines in SciPy can be accessed either as native Python code or as wrappers for classical Fortran or C implementations of the corresponding algorithms. Technically, we are still using the same packages we did under Fortran or C, but from within Python. For instance, the minimization routine that implements the truncated Newton method can be called as fmin_ncg (which is pure Python) or as fmin_tnc (which wraps a C implementation).
For general minimization problems, SciPy has many different algorithms. So far, we have covered the least-squares algorithm (leastsq), but we also have brute force (brute), simulated annealing (anneal), the Brent and golden-section methods for scalar functions (brent and golden), the downhill simplex algorithm (fmin), Powell's method (fmin_powell), nonlinear conjugate gradient and Newton's version of it (fmin_cg and fmin_ncg), and the BFGS algorithm (fmin_bfgs).
Constrained minimization is also possible computationally, and SciPy has routines that implement the L-BFGS-B algorithm (fmin_l_bfgs_b), the truncated Newton algorithm (fmin_tnc), COBYLA (fmin_cobyla), and sequential least-squares programming (fmin_slsqp).
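For instance, fmin_l_bfgs_b accepts simple box constraints through its bounds parameter. A small sketch, minimizing the Rosenbrock function over a box that excludes its unconstrained minimum at (1, 1):

```python
import scipy.optimize

# minimize the Rosenbrock function subject to 2 <= x, y <= 3;
# since the unconstrained minimum (1, 1) lies outside the box,
# the constrained optimum lands on the boundary at (2, 3)
x, f, info = scipy.optimize.fmin_l_bfgs_b(scipy.optimize.rosen, [2.5, 2.5],
                                          approx_grad=True,
                                          bounds=[(2, 3), (2, 3)])
print(x, f)  # approximately [2, 3] and 101
```

Note that approx_grad=True asks the routine to estimate the gradient by finite differences, since we did not supply one.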
The following code, for example, finds a local minimum of the Rosenbrock function, scipy.optimize.rosen, near the origin using the downhill simplex algorithm:
>>> import scipy.optimize
>>> scipy.optimize.fmin(scipy.optimize.rosen,[0,0])
The output is as follows:
Optimization terminated successfully.
         Current function value: 0.000000
         Iterations: 79
         Function evaluations: 146
array([ 1.00000439,  1.00001064])
Since Version 0.11 of SciPy, all minimization routines can be called through the generic scipy.optimize.minimize, with the method parameter set to one of the strings Nelder-Mead (for the downhill simplex), Powell, CG, Newton-CG, BFGS, or anneal. For constrained minimization, the corresponding strings are L-BFGS-B, TNC (for truncated Newton), COBYLA, or SLSQP:
minimize(fun, x0, args=(), method='BFGS', jac=None, hess=None, hessp=None, bounds=None, constraints=(), tol=None, callback=None, options=None)
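For example, the earlier downhill simplex search on the Rosenbrock function can be reproduced through this generic interface:

```python
import scipy.optimize

# the same Rosenbrock search as before, via the generic interface
result = scipy.optimize.minimize(scipy.optimize.rosen, [0, 0],
                                 method='Nelder-Mead')
print(result.x)  # approximately [1, 1]
```

The returned object bundles the minimizer, the function value, and the convergence diagnostics into named attributes.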
For most special functions included in the scipy.special module, we have accurate algorithms that allow us to compute their zeros. For instance, for Bessel functions of the first kind with integer order, jn_zeros offers as many roots as desired (in ascending order). We may obtain the first three roots of the Bessel J-function of order four by issuing the following command:
>>> import scipy.special
>>> print(scipy.special.jn_zeros(4,3))
The output is as follows:
[ 7.58834243 11.06470949 14.37253667]
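We can check these values by evaluating the Bessel function itself at each reported root, using scipy.special.jv (the Bessel function of the first kind of general order):

```python
import scipy.special

roots = scipy.special.jn_zeros(4, 3)
# each reported root should make J_4 vanish, up to rounding error
for r in roots:
    print(r, scipy.special.jv(4, r))
```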
For nonspecial scalar functions, the scipy.optimize module allows approximation of roots through a great number of different algorithms. For scalar functions, we have the crude bisection method (bisect), the classical Newton-Raphson method (newton, which falls back to the secant method when no derivative is given), and more accurate and faster methods such as Ridders' algorithm (ridder) and two versions of the Brent method (brentq and brenth).
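As a small illustration (with an arbitrary test function), brentq only needs a bracket on which the function changes sign:

```python
from math import cos

import scipy.optimize

# solve cos(x) = x; the function cos(x) - x changes sign on [0, 1]
root = scipy.optimize.brentq(lambda x: cos(x) - x, 0, 1)
print(root)  # approximately 0.739085
```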
Finding roots for functions of several variables is very challenging in many ways; the larger the dimension, the more difficult the task. The effectiveness of any of these algorithms depends on the problem, and it is a good idea to invest some time and resources in knowing them all. Since Version 0.11 of SciPy, it is possible to call any of the available methods through the same routine, scipy.optimize.root, which has the following syntax:
root(fun, x0, args=(), method='hybr', jac=None, tol=None, callback=None, options=None)
The different methods are obtained upon changing the value of the method parameter to the corresponding method string. We may choose from methods such as 'hybr' for a modified hybrid Powell's method; 'lm' for a modified least-squares method; 'broyden1' or 'broyden2' for Broyden's good and bad methods, respectively; 'diagbroyden' for the diagonal Broyden Jacobian approximation; 'anderson' for Anderson's extended mixing; 'krylov' for Krylov approximation of the Jacobian; 'linearmixing' for scalar Jacobian approximation; and 'excitingmixing' for a tuned diagonal Jacobian approximation.
For large-scale problems, the Krylov approximation of the Jacobian and the Anderson extended mixing are usually the best options.
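As a sketch of the large-scale case, consider an artificial system of 100 uncoupled cubic equations solved with the Krylov method:

```python
import numpy
import scipy.optimize

# each component must satisfy x_i**3 + x_i - 2 = 0,
# whose only real solution is x_i = 1
def F(x):
    return x**3 + x - 2

# 100 unknowns, all started at zero; the Jacobian is never formed
# explicitly, only approximated through matrix-vector products
sol = scipy.optimize.root(F, numpy.zeros(100), method='krylov')
print(sol.success, sol.x[:3])
```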
Let's present an illustrative example of the power of these techniques. Consider the following system of differential equations:

x' = x^2 - 2x - y + 0.5
y' = x^2 + 4y^2 - 4
We use the quiver routine from the matplotlib.pyplot module to visualize a slope field for values of x and y between -0.5 and 2.5, and hence identify the location of the possible critical points in that region:
>>> import numpy
>>> import matplotlib.pyplot as plt
>>> f=lambda x: [x[0]**2 - 2*x[0] - x[1] + 0.5, x[0]**2 + 4*x[1]**2 - 4]
>>> x,y=numpy.mgrid[-0.5:2.5:24j,-0.5:2.5:24j]
>>> U,V=f([x,y])
>>> plt.quiver(x,y,U,V,color='r', linewidths=(0.2,), edgecolors=('k'), headaxislength=5)
>>> plt.show()
Note how there is a whole region of the plane in which the slopes are extremely small. Because of the degrees of the polynomials involved, there are at most four different possible critical points. In this area, we should be able to identify two such points (as a matter of fact there are only two noncomplex solutions). One of them seems to be near (0, 1) and the second one is near (2, 0). We use these two locations as initial guesses for our searches:
>>> import scipy.optimize
>>> f=lambda x: [x[0]**2 - 2*x[0] - x[1] + 0.5, x[0]**2 + 4*x[1]**2 - 4]
>>> scipy.optimize.root(f,[0,1])
The output is as follows:
  status: 1
 success: True
     qtf: array([ -4.81190247e-09,  -3.83395899e-09])
    nfev: 9
       r: array([ 2.38128242, -0.60840482, -8.35489601])
     fun: array([  3.59529073e-12,   3.85025345e-12])
       x: array([-0.22221456,  0.99380842])
 message: 'The solution converged.'
    fjac: array([[-0.98918813, -0.14665209],
       [ 0.14665209, -0.98918813]])
Let's look at the second case:
>>> scipy.optimize.root(f,[2,0])
  status: 1
 success: True
     qtf: array([  2.08960516e-10,   8.61298294e-11])
    nfev: 12
       r: array([-4.56575336, -1.67067665, -1.81464307])
     fun: array([  2.44249065e-15,   1.42996726e-13])
       x: array([ 1.90067673,  0.31121857])
 message: 'The solution converged.'
    fjac: array([[-0.39612596, -0.91819618],
       [ 0.91819618, -0.39612596]])
In the first case, we converged successfully to (-0.22221456, 0.99380842). In the second case, we converged to (1.90067673, 0.31121857). The routine gives us the details of the convergence and the properties of the approximation. For instance, nfev tells us the number of function calls performed, and fun indicates the output of the function at the found location. The other items in the output reflect the matrices used in the procedure, such as qtf, r, and fjac.