Optimization

The optimization problem is best described as the search for a local maximum or minimum value of a scalar-valued function f(x). This search can be performed for all possible input values in the domain of f (and in this case, we refer to this problem as an unconstrained optimization), or for a specific subset of it that is expressible by a finite set of identities and inequalities (and we refer to this other problem as a constrained optimization). In this section, we are going to explore both modalities in several settings.

Unconstrained optimization for univariate functions

We focus on the search for the local minima of a function f(x) in an interval [a, b] (the search for local maxima can then be regarded as the search for the local minima of the function –f(x) in the same interval). For this task, we have the routine minimize_scalar in the module scipy.optimize. Its obligatory input is a univariate function f(x); the search algorithm is selected through the parameter method.

Most search methods are based on the idea of bracketing that we used for root finding, although the concept of bracket is a bit different in this setting. In this case, a good bracket is a triple x < y < z where f(y) is less than both f(x) and f(z). If the function is continuous, its graph presents a U-shape on a bracket. This guarantees the existence of a minimum inside of the subinterval [x, z]. A successful bracketing method will look, on each successive step, for the target extremum in either [x, y] or [y, z].

Let's construct a simple bracketing method for testing purposes. Assume we have an initial bracket a < c < b. By quadratic interpolation, we construct a parabola through the points (a, f(a)), (c, f(c)), and (b, f(b)). Because of the U-shape condition, this interpolating parabola has a minimum that is easy to compute, say at (d, f(d)). It is not hard to prove that the value d lies between the midpoints of the subintervals [a, c] and [c, b]. We will use this point d for our next bracketing step. For example, if it happens that c < d, then the next bracket will be either c < d < b or a < c < d. Easy enough! Let's implement this method:

In [1]: import numpy as np; 
   ...: from scipy.interpolate import lagrange; 
   ...: from scipy.optimize import OptimizeResult, minimize_scalar
In [2]: def good_bracket(func, bracket):
   ...:     a, c, b = bracket
   ...:     return (func(a) > func(c)) and (func(b) > func(c))
   ...:
In [3]: def parabolic_step(f, args, bracket, **options):
   ...:     # custom method for minimize_scalar: refine the bracket by
   ...:     # successive parabolic interpolation until the vertex of the
   ...:     # parabola coincides with the middle point of the bracket
   ...:     stop = False
   ...:     funcalls = 0
   ...:     niter = 0
   ...:     while not stop:
   ...:         niter += 1
   ...:         # parabola through (a, f(a)), (c, f(c)), and (b, f(b))
   ...:         interpolator = lagrange(np.array(bracket),
   ...:                                 f(np.array(bracket)))
   ...:         funcalls += 3
   ...:         a, b, c = interpolator.coeffs
   ...:         d = -0.5*b/a               # vertex of the parabola
   ...:         if np.allclose(bracket[1], d):
   ...:             minima = d             # converged: vertex ~ middle point
   ...:             stop = True
   ...:         elif bracket[1] < d:
   ...:             newbracket = [bracket[1], d, bracket[2]]
   ...:             if good_bracket(f, newbracket):
   ...:                 bracket = newbracket
   ...:             else:
   ...:                 bracket = [bracket[0], bracket[1], d]
   ...:         else:
   ...:             newbracket = [d, bracket[1], bracket[2]]
   ...:             if good_bracket(f, newbracket):
   ...:                 bracket = newbracket
   ...:             else:
   ...:                 bracket = [bracket[0], d, bracket[1]]
   ...:     return OptimizeResult(fun=f(minima), x=minima,
   ...:                           nit=niter, nfev=funcalls)

Tip

The output of any minimizing method must be an OptimizeResult object, with at least the attribute x (the solution to the optimization problem). In the method we have just coded, the attributes returned are x, fun (the evaluation of f at that solution), nit (the number of iterations), and nfev (the number of function evaluations needed).
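To make this contract concrete, here is a minimal sketch (the attribute values are made up for illustration): OptimizeResult is a dictionary subclass with attribute access, and only x is strictly required.

    # A minimal sketch: OptimizeResult behaves like a dictionary with
    # attribute access; the values below are made up for illustration.
    from scipy.optimize import OptimizeResult

    res = OptimizeResult(x=-0.5, fun=1.5, nit=7, nfev=21)
    print(res.x, res['fun'])   # attribute and key access are equivalent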

Let's run this method over a few examples:

In [4]: f = np.vectorize(lambda x: max(1-x, 2+x))
In [5]: def g(x): return -np.exp(-x)*np.sin(x)
In [6]: good_bracket(f, [-1, -0.5, 1])
Out[6]: True
In [7]: minimize_scalar(f, bracket=[-1, -0.5, 1],
   ...:                 method=parabolic_step)
Out[7]:
  fun: array(1.5000021457670878)
 nfev: 33
  nit: 11
    x: -0.50000214576708779
In [8]: good_bracket(g, [0, 1.2, 1.5])
Out[8]: True
In [9]: minimize_scalar(g, bracket=[0,1.2,1.5],
   ...:                 method=parabolic_step)
Out[9]:
  fun: -0.32239694192707441
 nfev: 54
  nit: 18
    x: 0.78540558550495643

There are two methods already coded for univariate scalar minimization: golden, which uses a golden-section search, and brent, which follows Brent's algorithm:

In [10]: minimize_scalar(f, method='brent', bracket=[-1, -0.5, 1])
Out[10]:
   fun: array(1.5)
 nfev: 22
  nit: 21
    x: -0.5
In [11]: minimize_scalar(f, method='golden', bracket=[-1, -0.5, 1])
Out[11]:
 fun: array(1.5)
    x: -0.5
 nfev: 44
In [12]: minimize_scalar(g, method='brent', bracket=[0, 1.2, 1.5])
Out[12]:
  fun: -0.32239694194483443
 nfev: 11
  nit: 10
    x: 0.78539817180087257
In [13]: minimize_scalar(g, method='golden', bracket=[0, 1.2, 1.5])
Out[13]:
  fun: -0.32239694194483448
    x: 0.7853981573284226
 nfev: 43

Constrained optimization for univariate functions

Although the bracket passed to the routine minimize_scalar already restricts the search, the bounded method lets us force the search for a true minimum inside a suitable interval, even when no bracket can be easily found:

In [14]: minimize_scalar(g, method='bounded', bounds=(0, 1.5))
Out[14]:
  status: 0
    nfev: 10
 success: True
     fun: -0.32239694194483415
       x: 0.78539813414299553
 message: 'Solution found.'
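The bounded method also accepts solver options through the parameter options. The following is a minimal sketch; the option name xatol is taken from the current SciPy documentation for this method and is an assumption with respect to the session above:

    # A sketch: bounded search for the minimum of g on [0, 1.5] with a
    # tighter absolute tolerance on x (option name assumed from the docs).
    import numpy as np
    from scipy.optimize import minimize_scalar

    def g(x):
        return -np.exp(-x) * np.sin(x)

    result = minimize_scalar(g, method='bounded', bounds=(0, 1.5),
                             options={'xatol': 1e-10})
    print(result.x, result.fun)   # close to pi/4 and -0.3224, as before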

Unconstrained optimization for multivariate functions

Except in the cases of minimization by brute force or by basin hopping, we can perform all other searches with the common routine minimize from the module scipy.optimize. The parameter method, as with its univariate counterpart, takes care of selecting the algorithm employed to achieve the extremum. There are several well-known algorithms already coded, but we also have the possibility of implementing our own ideas via a suitable custom-made method.

In this section, we will focus on the description and usage of the coded implementations. The same technique that we employed in the construction of custom methods for minimize_scalar is valid here, with the obvious challenges that the extra dimensions bring.

To compare all the different methods, we are going to run them against a particularly challenging function: Rosenbrock's parabolic valley (also informally referred to as the banana function). The module scipy.optimize has NumPy versions of this function, as well as of its Jacobian and Hessian:

In [15]: from scipy.optimize import rosen; 
   ....: from sympy import var, Matrix, solve, pprint
In [16]: var('x y')
Out[16]: (x, y)
In [17]: F = Matrix([rosen([x, y])]); 
   ....: pprint(F)
[(-x + 1)² + 100.0⋅(-x² + y)²]
In [18]: X, Y = np.mgrid[-1.25:1.25:100j, -1.25:1.25:100j]
In [19]: def f(x,y): return rosen([x, y])
In [20]: import matplotlib.pyplot as plt, matplotlib.cm as cm; 
   ....: from mpl_toolkits.mplot3d.axes3d import Axes3D
In [21]: plt.figure(); 
   ....: plt.subplot(121, aspect='equal'); 
   ....: plt.contourf(X, Y, f(X,Y), levels=np.linspace(0,800,16),
   ....:              cmap=cm.Greys)
   ....: plt.colorbar(orientation='horizontal')
   ....: plt.title('Contour plot')
   ....: ax = plt.subplot(122, projection='3d', aspect='equal')
   ....: ax.plot_surface(X, Y, f(X,Y), cmap=cm.Greys, alpha=0.75)
   ....: plt.colorbar(orientation='horizontal')
   ....: plt.title('Surface plot')
   ....: plt.show()
[Figure: contour plot (left) and surface plot (right) of the Rosenbrock function in two variables]

The figure shows a large banana-shaped region that might contain local minima. Techniques of multivariate calculus help us locate all of the critical points precisely, instead of relying on intuition. We first need to compute the Jacobian and Hessian of the function:

In [22]: JacF = F.jacobian([x, y]); 
   ....: pprint(JacF)
[-400.0⋅x⋅(-x² + y) + 2⋅x - 2    -200.0⋅x² + 200.0⋅y]
In [23]: HesF = JacF.jacobian([x, y]); 
   ....: pprint(HesF)
⎡1200.0⋅x² - 400.0⋅y + 2  -400.0⋅x⎤
⎢                                 ⎥
⎣       -400.0⋅x           200.0  ⎦
In [24]: solve(JacF)
Out[24]: [{x: 1.00000000000000, y: 1.00000000000000}]
In [25]: HesF.subs({x: 1.0, y: 1.0})
Out[25]:
Matrix([
[ 802.0, -400.0],
[-400.0,  200.0]])
In [26]: _.det()
Out[26]: 400.000000000000

These computations show that there is only one critical point, at (1, 1). At that point, the Hessian has a positive determinant (400) and a positive upper-left entry (802), so it is positive definite; unequivocally, the function presents a local minimum at that location.
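We can cross-check this conclusion numerically. The following short sketch uses the routine rosen_hess (employed again later in this section) to verify that the Hessian at (1, 1) has only positive eigenvalues, that is, it is positive definite:

    # A numerical cross-check: the eigenvalues of the Hessian of the
    # Rosenbrock function at (1, 1) are all strictly positive.
    import numpy as np
    from scipy.optimize import rosen_hess

    eigenvalues = np.linalg.eigvalsh(rosen_hess([1.0, 1.0]))
    print(eigenvalues, np.all(eigenvalues > 0))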

Trying to compute the critical points with this technique for a Rosenbrock function in higher dimensions, while doable, is computationally intensive. Moving to four dimensions, for example, takes a decent computer about half a minute:

In [27]: var('x:4'); 
   ....: X = [x0, x1, x2, x3]; 
   ....: F = Matrix([rosen(X)])
In [28]: %time solve(F.jacobian(X))
CPU times: user 36.6 s, sys: 171 ms, total: 36.8 s
Wall time: 36.7 s
Out[28]:
[{x0:1.0,x1:1.0,x2:1.0,x3:1.0}]

For larger dimensions, the search for global minima can be done by brute force: not very elegant, but it gets the job done. A brute-force algorithm is able to track global minima (or approximate them to satisfactory precision). We can call this method with the routine brute in the module scipy.optimize. The obligatory parameters are the function to be minimized, together with a description of the domain where we will apply the optimization. This domain is best coded as a tuple of slices. For example, to search for a global minimum of the Rosenbrock function in four variables, where each variable is bounded in absolute value by three, we could issue this command:

In [29]: from scipy.optimize import brute
In [30]: interval = slice(-3, 3, 0.25); 
   ....: box = [interval] * 4
In [31]: %time brute(rosen, box)
CPU times: user 13.7 s, sys: 6 ms, total: 13.7 s
Wall time: 13.7 s
Out[31]: array([ 1.,  1.,  1.,  1.])
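By default, brute polishes the best grid point with a local search (scipy.optimize.fmin). If we only want the raw grid minimizer, we can pass finish=None; a short sketch:

    # A sketch: the same brute-force search without the final local
    # polishing step; the result is the best point of the coarse grid.
    from scipy.optimize import brute, rosen

    box = [slice(-3, 3, 0.25)] * 4
    print(brute(rosen, box, finish=None))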

Still quite a slow process! To achieve speed, it is always better to use iterative methods. The search for minima in this setting is achieved according to several schemes (and combinations of these):

  • The stochastic methods: These are methods suitable for the search of actual global minima. They generate and use random variables. In the module scipy.optimize, we have two representatives of this category:
    • One is the basin-hopping algorithm, called with the routine basinhopping in the module scipy.optimize. The implementation has an acceptance test given by the Metropolis criterion of a standard Monte Carlo simulation.
    • Another is a deprecated version of the method of simulated annealing, called with method='Anneal'. This is a variation of a Monte Carlo simulation. It is useful for optimization problems where the search space is discrete and large.
  • Deterministic algorithms that exclusively employ function evaluations: These are basically performed by successive linear minimizations in different directions. In the module scipy.optimize, we have two methods complying with this philosophy:
    • Powell's method based on the unidimensional Brent's minimization. We call it with method='Powell'.
    • The downhill simplex algorithm, also known as the amoeba method, created by Nelder and Mead in 1965. We call it with method='Nelder-Mead'.
  • The Newton methods: These are deterministic algorithms for differentiable functions that mimic multivariate calculus to search for critical points. In a nutshell, we seek at least one critical point whose Hessian satisfies the conditions for a local minimum. These algorithms employ both Jacobian and Hessian evaluations. Because of the complexity of these expressions in general, approximations to both operators are usually implemented instead. When this is the case, we refer to these methods as quasi-Newton methods. In the module scipy.optimize, we have the quasi-Newton method of Broyden, Fletcher, Goldfarb, and Shanno (BFGS), which uses exclusively first derivatives. We call it with method='BFGS'.
  • The conjugate gradient methods: Here, we have three variants:
    • A variant of the Fletcher-Reeves algorithm, due to Polak and Ribière, implements a pure conjugate gradient. It uses exclusively first derivatives and is called with method='CG'.
    • A combination of the conjugate gradient with a Newton method, the truncated Newton method, which we call with method='Newton-CG'.
    • Two different versions of the Newton conjugate gradient trust-region algorithm, which use the idea of trust-regions to more effectively bound the location of the possible minima. We call them with method='dogleg' and method='trust-ncg'.

Let's browse through these methods.

The stochastic methods

Let's find the global minimum of the Rosenbrock function of nine variables using the technique of basin hopping:

In [32]: from scipy.optimize import minimize, basinhopping
In [33]: %time basinhopping(rosen, np.zeros(9))
CPU times: user 4.59 s, sys: 7.17 ms, total: 4.6 s
Wall time: 4.6 s
Out[33]:
                  nfev: 75633
 minimization_failures: 52
                   fun: 2.5483642615054407e-11
                     x: array([ 0.99999992,  0.99999994,  0.99999992,  0.99999981,  0.99999962,
        0.99999928,  0.99999865,  0.9999972 ,  0.99999405])
               message: ['requested number of basinhopping iterations completed successfully']
                  njev: 6820
                   nit: 100
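Each basin-hopping step delegates its local search to minimize, and we can tune this inner solver through the parameter minimizer_kwargs. A sketch, choosing L-BFGS-B with the exact Jacobian as the local method:

    # A sketch: basin hopping with a faster local minimizer and the exact
    # Jacobian passed through minimizer_kwargs.
    import numpy as np
    from scipy.optimize import basinhopping, rosen, rosen_der

    result = basinhopping(rosen, np.zeros(9),
                          minimizer_kwargs={'method': 'L-BFGS-B',
                                            'jac': rosen_der})
    print(result.x, result.fun)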

Let's compare this to the behavior of the (deprecated) simulated annealing method:

In [34]: minimize(rosen, np.zeros(9), method='Anneal')
Out[34]:
  status: 5
 success: False
  accept: 19
    nfev: 651
       T: 1130372817.0369582
     fun: 707171392.44894326
       x: array([ 11.63666756, -24.41186725,  48.26727994,   3.97730959,
       -31.52658563,  18.00560694,   1.22589971,  21.97577333, -43.9967434 ])
 message: 'Final point not the minimum amongst encountered points'
     nit: 12
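Since method='Anneal' is deprecated, a natural stochastic replacement (assuming SciPy 0.15 or later, which is not required anywhere else in this section) is the differential evolution solver, which only needs box bounds for each variable:

    # A sketch: differential evolution searching the box [-3, 3]^9 for the
    # global minimum of the Rosenbrock function (requires SciPy >= 0.15).
    from scipy.optimize import differential_evolution, rosen

    result = differential_evolution(rosen, bounds=[(-3, 3)] * 9, seed=1)
    print(result.x, result.fun)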

Deterministic algorithms that exclusively employ function evaluations

Let's compare the results of the Powell method with the downhill simplex method:

In [35]: minimize(rosen, np.zeros(9), method='Powell')
Out[35]:
  status: 0
 success: True
   direc: array([[ -9.72738085e-06,   2.08442100e-05,   2.06470355e-05,
          4.39487337e-05,   1.29109966e-04,   1.98333214e-04,
          3.66992711e-04,   7.00645878e-04,   1.38618490e-03],
       [ -6.95913466e-06,  -7.25642357e-07,  -2.39771165e-06,
          4.10148947e-06,  -6.17293950e-06,  -6.53887928e-06,
         -1.06472130e-05,  -5.23030557e-06,  -2.28609232e-06],
       [  0.00000000e+00,   0.00000000e+00,   1.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       [  1.23259262e-06,   9.30817407e-07,   2.48075497e-07,
         -7.07907998e-07,  -2.01233105e-07,  -1.10513430e-06,
         -2.57164619e-06,  -2.58316828e-06,  -3.89962665e-06],
       [  6.07328675e-02,   8.51817777e-02,   1.30174960e-01,
          1.71511253e-01,   9.72602622e-02,   1.47866889e-02,
          1.12376083e-03,   5.35386263e-04,   2.04473740e-04],
       [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   1.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       [  3.88222708e-04,   8.26386166e-04,   5.56913200e-04,
          3.08319925e-04,   4.45122275e-04,   2.66513914e-03,
          6.31410713e-03,   1.24763367e-02,   2.45489699e-02],
       [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   1.00000000e+00,   0.00000000e+00],
       [ -2.82599868e-13,   2.33068676e-13,  -4.23850631e-13,
         -1.23391999e-12,  -2.41224441e-12,  -5.08909225e-12,
         -9.92053051e-12,  -2.07685498e-11,  -4.10004188e-11]])
    nfev: 6027
     fun: 3.1358222279861171e-21
       x: array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])
 message: 'Optimization terminated successfully.'
     nit: 56
In [36]: minimize(rosen, np.zeros(9), method='Nelder-Mead')
Out[36]:
  status: 1
    nfev: 1800
 success: False
     fun: 4.9724099905503065
       x: array([ 0.85460488,  0.70911132,  0.50139591,  0.24591886,  0.06234451,
       -0.01112426,  0.02048509,  0.03266785, -0.01790827])
 message: 'Maximum number of function evaluations has been exceeded.'
     nit: 1287
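The downhill simplex run above stopped only because it exhausted the default budget of function evaluations. A sketch of the same call with a larger budget follows (the option names maxfev and maxiter are documented for this method); note that, even so, convergence in nine dimensions is slow and not guaranteed:

    # A sketch: Nelder-Mead with a much larger evaluation budget.
    import numpy as np
    from scipy.optimize import minimize, rosen

    result = minimize(rosen, np.zeros(9), method='Nelder-Mead',
                      options={'maxfev': 50000, 'maxiter': 50000})
    print(result.success, result.fun, result.nit)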

The Broyden-Fletcher-Goldfarb-Shanno quasi-Newton method

Let's observe the behavior of this algorithm on our running example:

In [37]: minimize(rosen, np.zeros(9), method='BFGS')
Out[37]:
   status: 0
  success: True
     njev: 83
     nfev: 913
 hess_inv: array([[  1.77874730e-03,   1.01281617e-03,   5.05884211e-04,
          3.17367120e-04,   4.42590321e-04,   7.92168518e-04,
          1.52710497e-03,   3.06357905e-03,   6.12991619e-03],
       [  1.01281617e-03,   1.91057841e-03,   1.01489866e-03,
          7.31748268e-04,   8.86058826e-04,   1.44758106e-03,
          2.76339393e-03,   5.60288875e-03,   1.12489189e-02],
       [  5.05884211e-04,   1.01489866e-03,   2.01221575e-03,
          1.57845668e-03,   1.87831124e-03,   3.06835450e-03,
          5.86489711e-03,   1.17764144e-02,   2.35964978e-02],
       [  3.17367120e-04,   7.31748268e-04,   1.57845668e-03,
          3.13024681e-03,   3.69430791e-03,   6.16910056e-03,
          1.17339522e-02,   2.33374859e-02,   4.66640347e-02],
       [  4.42590321e-04,   8.86058826e-04,   1.87831124e-03,
          3.69430791e-03,   7.37204979e-03,   1.23036988e-02,
          2.33709766e-02,   4.62512165e-02,   9.24474614e-02],
       [  7.92168518e-04,   1.44758106e-03,   3.06835450e-03,
          6.16910056e-03,   1.23036988e-02,   2.48336778e-02,
          4.71369608e-02,   9.29927375e-02,   1.85683729e-01],
       [  1.52710497e-03,   2.76339393e-03,   5.86489711e-03,
          1.17339522e-02,   2.33709766e-02,   4.71369608e-02,
          9.44348689e-02,   1.86490477e-01,   3.72360210e-01],
       [  3.06357905e-03,   5.60288875e-03,   1.17764144e-02,
          2.33374859e-02,   4.62512165e-02,   9.29927375e-02,
          1.86490477e-01,   3.73949424e-01,   7.46959044e-01],
       [  6.12991619e-03,   1.12489189e-02,   2.35964978e-02,
          4.66640347e-02,   9.24474614e-02,   1.85683729e-01,
          3.72360210e-01,   7.46959044e-01,   1.49726446e+00]])
      fun: 6.00817150312557e-11
        x: array([ 0.99999993,  0.99999986,  0.99999976,  0.99999955,  0.99999913,
        0.99999832,  0.99999666,  0.99999334,  0.99998667])
  message: 'Optimization terminated successfully.'
      jac: array([  5.23788826e-06,  -5.45925187e-06,  -1.35362172e-06,
         8.75480656e-08,  -9.45374358e-06,   7.31889131e-06,
         3.34352248e-07,  -7.24984749e-07,   2.02705630e-08])

Note

Note that this method employs more iterations, but far fewer function evaluations than the method of Powell (including those used to approximate the Jacobian). Accuracy is comparable, but the gain in complexity and speed is remarkable.
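If the exact Jacobian is available, BFGS does not need to approximate the gradient by finite differences, which reduces the number of function evaluations even further. A sketch using rosen_der:

    # A sketch: BFGS with the exact Jacobian of the Rosenbrock function.
    import numpy as np
    from scipy.optimize import minimize, rosen, rosen_der

    result = minimize(rosen, np.zeros(9), method='BFGS', jac=rosen_der)
    print(result.nfev, result.njev, result.fun)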

The conjugate gradient method

The pure conjugate gradient method works best on functions with a single, clearly defined critical point, and where the range of the slopes is not too large. Multiple stationary points tend to confuse the iterations, and slopes that are too steep (larger than 1,000) result in severe rounding errors.

If we do not supply an expression for the Jacobian, the algorithm computes a decent finite-difference approximation of this operator to obtain the first derivatives:

In [38]: minimize(rosen, np.zeros(9), method='CG')
Out[38]:
  status: 0
 success: True
    njev: 326
    nfev: 3586
     fun: 1.5035665428352255e-10
       x: array([ 0.9999999 ,  0.99999981,  0.99999964,  0.99999931,  0.99999865,
        0.99999733,  0.9999947 ,  0.99998941,  0.99997879])
 message: 'Optimization terminated successfully.'
     jac: array([ -1.48359492e-06,   2.95867756e-06,   1.71067556e-06,
        -1.83617409e-07,  -2.47616618e-06,  -5.34951641e-06,
         2.50389338e-06,  -2.37918319e-06,  -3.86667920e-06])

Including an actual Jacobian improves matters greatly. Note the improvement in the evaluation of the found minimum (fun):

In [39]: from scipy.optimize import rosen_der
In [40]: minimize(rosen, np.zeros(9), method='CG', jac=rosen_der)
Out[40]:
  status: 0
 success: True
    njev: 406
    nfev: 406
     fun: 8.486856765134401e-12
       x: array([ 0.99999998,  0.99999996,  0.99999994,  0.99999986,  0.99999969,
        0.99999938,  0.99999875,  0.9999975 ,  0.999995  ])
 message: 'Optimization terminated successfully.'
     jac: array([  1.37934336e-06,  -9.03688875e-06,   8.53289049e-06,
         9.77779178e-06,  -2.63022111e-06,  -1.02087919e-06,
        -6.55712127e-06,  -1.71887373e-06,  -9.12268328e-07])

The truncated Newton method requires a precise Jacobian to work:

In [41]: minimize(rosen, np.zeros(9), method='Newton-CG')
ValueError: Jacobian is required for Newton-CG method
In [41]: minimize(rosen, np.zeros(9), method='Newton-CG', jac=rosen_der)
Out[41]:
  status: 0
 success: True
    njev: 503
    nfev: 53
     fun: 5.231613200425767e-08
       x: array([ 0.99999873,  0.99999683,  0.99999378,  0.99998772,  0.99997551,
        0.99995067,  0.99990115,  0.99980214,  0.99960333])
 message: 'Optimization terminated successfully.'
    nhev: 0
     jac: array([  6.67155399e-06,   2.50927306e-05,   1.03398234e-04,
         4.09953321e-04,   1.63524314e-03,   6.48667316e-03,
        -1.91779902e-03,  -2.81972861e-04,  -5.67500380e-04])

The methods using trust regions require an exact expression for the Hessian:

In [42]: from scipy.optimize import rosen_hess
In [43]: minimize(rosen, np.zeros(9), method='dogleg',
   ....:                 jac=rosen_der, hess=rosen_hess)
Out[43]:
  status: 0
 success: True
    njev: 25
    nfev: 29
     fun: 9.559277795967234e-19
       x: array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])
 message: 'Optimization terminated successfully.'
    nhev: 24
     jac: array([  3.84137166e-14,   3.00870439e-13,   1.10489395e-12,
         4.32831548e-12,   1.72455383e-11,   6.77315980e-11,
         2.48459919e-10,   6.62723207e-10,  -1.52775570e-09])
     nit: 28
In [44]: minimize(rosen, np.zeros(9), method='trust-ncg',
   ....:                 jac=rosen_der, hess=rosen_hess)
Out[44]:
  status: 0
 success: True
    njev: 56
    nfev: 67
     fun: 3.8939669818289621e-18
       x: array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])
 message: 'Optimization terminated successfully.'
    nhev: 55
     jac: array([  2.20490293e-13,   5.57109914e-13,   1.77013959e-12,
        -9.03965791e-12,  -3.05174774e-10,   3.03425818e-09,
         1.49134067e-08,   6.32240935e-08,  -3.64210218e-08])
     nit: 66

Note the huge improvement in terms of accuracy, iterations, and function evaluations over the previous methods! The obvious drawback is that quite often it is very challenging to obtain good representations of the Jacobian or Hessian operators.
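When the full Hessian is too expensive to form, the methods 'Newton-CG' and 'trust-ncg' also accept a Hessian-times-vector routine through the parameter hessp; scipy.optimize ships one for the Rosenbrock function, rosen_hess_prod. A sketch:

    # A sketch: trust-region Newton-CG using a Hessian-vector product
    # instead of the full Hessian matrix.
    import numpy as np
    from scipy.optimize import minimize, rosen, rosen_der, rosen_hess_prod

    result = minimize(rosen, np.zeros(9), method='trust-ncg',
                      jac=rosen_der, hessp=rosen_hess_prod)
    print(result.x)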

Constrained optimization for multivariate functions

Take, for example, the minimization of the plane function f(x,y) = 5x – 2y + 4 over the circle x² + y² = 4. Using SymPy, we can implement the technique of Lagrange multipliers:

In [45]: F = Matrix([5*x - 2*y + 4]); 
   ....: G = Matrix([x**2 + y**2 - 4])      # constraint
In [46]: var('z'); 
   ....: solve(F.jacobian([x, y]) - z * G.jacobian([x, y]))
Out[46]: [{x: 5/(2*z), y: -1/z}]
In [47]: soln = _[0]; 
   ....: solve(G.subs(soln))
Out[47]: [{z: -sqrt(29)/4}, {z: sqrt(29)/4}]
In [48]: zees = _; 
   ....: [(soln[x].subs(item), soln[y].subs(item)) for item in zees]
Out[48]:
[(-10*sqrt(29)/29, 4*sqrt(29)/29), (10*sqrt(29)/29, -4*sqrt(29)/29)]

Not too bad! On top of this constraint, we can further impose another condition in the form of an inequality. Think of the same problem as before, but restricted to half of the circle: y > 0. In that case, the only admissible result is the point with coordinates x = –10√(29)/29 ≈ –1.8569533817705186 and y = 4√(29)/29 ≈ 0.74278135270820744.
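To confirm that this point is indeed where the minimum is attained, we can evaluate the objective at both candidates; a short SymPy sketch:

    # A sketch: the first candidate attains the smaller value, 4 - 2*sqrt(29)
    # (about -6.7703), which the numerical solvers below should reproduce.
    from sympy import sqrt, simplify

    candidates = [(-10*sqrt(29)/29, 4*sqrt(29)/29),
                  (10*sqrt(29)/29, -4*sqrt(29)/29)]
    print([simplify(5*px - 2*py + 4) for (px, py) in candidates])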

It is, of course, possible to address this problem numerically. In the module scipy.optimize, we have basically four methods, all of which can be called from the common routine minimize:

  • The large-scale bound-constrained optimization based on the BFGS algorithm (we call it with method='L-BFGS-B'). The implementation is actually a wrapper for a FORTRAN routine with the same name, written by Ciyou Zhu, Richard Byrd, and Jorge Nocedal (for details, see for example, R. H. Byrd, P. Lu, and J. Nocedal. A Limited Memory Algorithm for Bound Constrained Optimization, (1995), SIAM Journal on Scientific and Statistical Computing, 16, 5, pp. 1190-1208).
  • A bound-constrained algorithm based on the truncated Newton method (we call it with method='TNC'). This implementation is similar to the one we called with method='Newton-CG', except this version is a wrapper for a C routine.
  • A constrained optimization by linear approximation (called with method='COBYLA'). This implementation wraps a FORTRAN routine with the same name.
  • A minimization method based on sequential least squares programming (method='SLSQP'). The implementation is a wrapper for a FORTRAN routine with the same name, written by Dieter Kraft.

Let's use our running example to illustrate how to input different constraints. We implement these as a dictionary or a tuple of dictionaries: each entry represents either an equality ('eq') or an inequality ('ineq') constraint, together with a callable expressing it (returning an ndarray when necessary) and, optionally, its derivative:

In [49]: def f(x): return 5*x[0] - 2*x[1] + 4
In [50]: def jacf(x): return np.array([5.0, -2.0])
In [51]: circle = {'type': 'eq',
   ....:           'fun': lambda x: x[0]**2 + x[1]**2 - 4.0,
   ....:           'jac': lambda x: np.array([2.0 * x[0],
   ....:                                      2.0 * x[1]])}
In [52]: semicircle = ({'type': 'eq',
   ....:                'fun': lambda x: x[0]**2 + x[1]**2 - 4.0,
   ....:                'jac': lambda x: np.array([2.0 * x[0],
   ....:                                           2.0 * x[1]])},
   ....:               {'type': 'ineq',
   ....:                'fun': lambda x: x[1],
   ....:                'jac': lambda x: np.array([0.0, 1.0])})

The constraints are fed to the routine minimize through the parameter constraints. A reasonable initial guess is needed as well; it does not have to satisfy the constraints exactly (note that [2, 2] does not lie on the circle), but if it is too far from the feasible region, the algorithm may fail to converge to anything meaningful:

In [53]: minimize(f, [2,2], jac=jacf, method='SLSQP', constraints=circle)
Out[53]:
  status: 0
 success: True
    njev: 11
    nfev: 13
     fun: -6.7703296142789693
       x: array([-1.85695338,  0.74278135])
 message: 'Optimization terminated successfully.'
     jac: array([ 5., -2.,  0.])
     nit: 11
In [54]: minimize(f, [2,2], jac=jacf, method='SLSQP', constraints=semicircle)
Out[54]:
  status: 0
 success: True
    njev: 11
    nfev: 13
     fun: -6.7703296142789693
       x: array([-1.85695338,  0.74278135])
 message: 'Optimization terminated successfully.'
     jac: array([ 5., -2.,  0.])
     nit: 11
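The bound-constrained methods listed earlier do not use constraint dictionaries; they take simple box bounds through the parameter bounds. As a closing sketch, here is 'L-BFGS-B' applied to the nine-variable Rosenbrock function, with every variable restricted to the interval [-2, 2]:

    # A sketch: bound-constrained minimization of the Rosenbrock function
    # with L-BFGS-B; every coordinate is confined to [-2, 2].
    import numpy as np
    from scipy.optimize import minimize, rosen, rosen_der

    result = minimize(rosen, np.zeros(9), method='L-BFGS-B',
                      jac=rosen_der, bounds=[(-2, 2)] * 9)
    print(result.x, result.fun)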