To achieve a definite integration of functions on suitable domains, we have mainly two methods—Numerical integration and Symbolic integration.
Numerical integration refers to the approximation of a definite integral by a quadrature process. Depending on how the function f(x) is given, the domain of integration, the knowledge of its singularities, and the choice of quadrature, we have different ways to attack this problem:
In many cases, it is also possible to perform exact integration, even for not-bounded domains, with the aid of symbolic computation. In the SciPy stack, to this effect, we have an implementation of the Risch algorithm for elementary functions, and Meijer G-functions for non-elementary integrals. Both methods are housed in the SymPy
libraries. Unfortunately, these symbolic procedures do not work for all functions. And due to the complexity of the generated codes, in general, the solutions obtained by this method are by no means as fast as any numerical approximation.
The definite integral of a polynomial function on a finite domain [a,b]
can be computed very accurately via the Fundamental Theorem of Calculus, through the module numpy.polynomial
. For instance, to calculate the integral of the polynomial p(x)=x5 on the interval [-1,1]
, we could issue:
In [1]: import numpy as np In [2]: p = np.poly1d([1,0,0,0,0,0]); ...: print p; ...: print p.integ() 5 1 x 6 0.1667 x In [3]: p.integ()(1.0) - p.integ()(-1.0) Out[3]: 0.0
In general, obtaining exact values for a definite integral of a generic function is hard and computationally inefficient. This is possible in some cases through symbolic integration, with the aid of the Risch algorithm (for elementary functions) and Meijer G-functions (for non-elementary integrals). Both methods can be called with the common routine integrate in the library SymPy
. The routine is clever enough to decide which algorithm to use, depending on the source function.
Let's show you a few examples starting with the definite integral of the polynomial from the previous case:
In [4]: from sympy import integrate, symbols In [5]: x, y = symbols('x y', real=True) In [6]: integrate(x**5, x) Out[6]: x**6/6 In [7]: integrate(x**5, (x, -1, 1)) Out[7]: 0
Let's try something more complicated. The definite integral of the function f(x) = e-xsinx on the interval [0,1]
:
In [8]: from sympy import N, exp as Exp, sin as Sin In [9]: integrate(Exp(-x) * Sin(x), x) Out[9]: -exp(-x)*sin(x)/2 - exp(-x)*cos(x)/2 In [10]: integrate(Exp(-x) * Sin(x), (x, 0, 1)) Out[10]: -exp(-1)*sin(1)/2 - exp(-1)*cos(1)/2 + 1/2 In [11]: N(_) Out[11]: 0.245837007000237
Symbolic integration, when it works, treats singularities the right way:
In [12]: integrate(Sin(x) / x, x) Out[12]: Si(x) In [13]: integrate(Sin(x) / x, (x, 0, 1)) Out[13]: Si(1) In [14]: N(_) Out[14]: 0.946083070367183 In [15]: integrate(x**(1/x), (x, 0, 1)) Out[15]: 1/2
Integration over unbounded domains is also possible:
In [16]: from sympy import oo In [17]: integrate(Exp(-x**2), (x,0,+oo)) Out[17]: sqrt(pi)/2
It is even possible to perform multivariate integration:
In [18]: integrate(Exp(-x**2-y**2), (x, -oo, +oo), (y, -oo, +oo)) Out[18]: pi
However, we need to stress this point strongly—symbolic integration is not efficient (and might not work!) for simple cases, as the following example shows:
In [19]: integrate(Sin(x)**Sin(x), x) Integral(sin(x)**sin(x), x) In [20]: integrate(Sin(x)**Sin(x), (x, 0, 1)) Integral(sin(x)**sin(x), (x, 0, 1))
Even when it works for simple cases, it generates complicated code, and might use too many computational resources.
The optimal way to address these problems is to obtain good approximations instead, with the aid of numerical integration. There are different techniques, according to the type of function and integration domain. Let's examine them in detail.
The basic problem in numerical integration is the approximation to the definite integral of any function f(x) on a finite interval [a,b]
. In general, if the function f(x) does not have singularities or discontinuities, we can obtain easy quadrature formulas by integrating different interpolations with piecewise polynomials (since these are evaluated exactly):
We have efficient algorithms for composite trapezoidal and composite Simpson's rules in the module scipy.integrate
through the routines cumtrapz
and simps
, respectively. Let's show you how to use these simple quadrature formulas for the polynomial example:
In [21]: from scipy.integrate import cumtrapz, simps In [22]: def f(x): return x**5 In [23]: nodes = np.linspace(-1, 1, 100) In [24]: simps(f(nodes), nodes) Out[24]: -1.3877787807814457e-17 In [25]: cumtrapz(f(nodes), nodes) Out[25]: array([ -1.92221161e-02, -3.65619927e-02, -5.21700680e-02, -6.61875756e-02, -7.87469280e-02, -8.99720915e-02, -9.99789539e-02, -1.08875683e-01, -1.16763077e-01, -1.23734908e-01, -1.29878257e-01, -1.35273836e-01, -1.39996314e-01, -1.44114617e-01, -1.47692240e-01, -1.50787532e-01, -1.53453988e-01, -1.55740523e-01, -1.57691741e-01, -1.59348197e-01, -1.60746651e-01, -1.61920310e-01, -1.62899066e-01, -1.63709727e-01, -1.64376231e-01, -1.64919865e-01, -1.65359463e-01, -1.65711607e-01, -1.65990811e-01, -1.66209700e-01, -1.66379187e-01, -1.66508627e-01, -1.66605982e-01, -1.66677959e-01, -1.66730153e-01, -1.66767180e-01, -1.66792794e-01, -1.66810003e-01, -1.66821177e-01, -1.66828145e-01, -1.66832283e-01, -1.66834598e-01, -1.66835799e-01, -1.66836364e-01, -1.66836598e-01, -1.66836678e-01, -1.66836700e-01, -1.66836703e-01, -1.66836703e-01, -1.66836703e-01, -1.66836703e-01, -1.66836700e-01, -1.66836678e-01, -1.66836598e-01, -1.66836364e-01, -1.66835799e-01, -1.66834598e-01, -1.66832283e-01, -1.66828145e-01, -1.66821177e-01, -1.66810003e-01, -1.66792794e-01, -1.66767180e-01, -1.66730153e-01, -1.66677959e-01, -1.66605982e-01, -1.66508627e-01, -1.66379187e-01, -1.66209700e-01, -1.65990811e-01, -1.65711607e-01, -1.65359463e-01, -1.64919865e-01, -1.64376231e-01, -1.63709727e-01, -1.62899066e-01, -1.61920310e-01, -1.60746651e-01, -1.59348197e-01, -1.57691741e-01, -1.55740523e-01, -1.53453988e-01, -1.50787532e-01, -1.47692240e-01, -1.44114617e-01, -1.39996314e-01, -1.35273836e-01, -1.29878257e-01, -1.23734908e-01, -1.16763077e-01, -1.08875683e-01, -9.99789539e-02, -8.99720915e-02, -7.87469280e-02, -6.61875756e-02, -5.21700680e-02, -3.65619927e-02, -1.92221161e-02, -1.73472348e-17])
The routine cumtrapz
computes cumulative integrals over the designated subintervals. The last entry of the output is therefore the value of the quadrature we seek. We could, of course, report only the required integral by simply accessing that entry:
In [26]: cumtrapz(f(nodes), nodes)[-1] Out[26]: -1.7347234759768071e-17
The implementation of these two algorithms does not compute the interpolators explicitly. The final formulas are the target here, and the way it is coded in SciPy is by means of Newton-Cotes quadratures.
The routines to perform Newton-Cotes are hidden (in the sense that they are not reported in the tutorials or documentation in the official pages of SciPy) and are meant to be used only internally by cumtrapz
or simps
. They provide only the corresponding coefficients that multiply the function evaluation at the nodes.
However, Newton-Cotes quadrature formulas are usually very accurate by themselves in the right scenarios. They can be used to compute better approximations in many cases, without being subjected to conform to trapezoidal or Simpson's rules.
Let's show you how it works for our running example, now with only four equally spaced nodes in the interval [-1,1]
:
In [27]: from scipy.integrate import newton_cotes In [28]: coefficients, abs_error = newton_cotes(3, equal=True); ....: nodes = np.linspace(-1, 1, 4); ....: print coefficients [ 0.375 1.125 1.125 0.375] In [29]: integral = (coefficients * f(nodes)).sum(); ....: print integral 0.0 In [30]: from math import fsum In [31]: integral = fsum(coefficients * f(nodes)); ....: print integral -7.8062556419e-18
If the nodes of our choice happen to be equally spaced, then there is an improvement of the trapezoidal rule in a special case—if the number of subintervals is a power of two. In that case, we may use the Romberg rule—an improvement that uses the Richardson extrapolation. We can access it with the routine romb
in the same module.
Let's compare results with our running example, this time using 64 subintervals of size 1/32
in the interval [-1,1]
:
In [32]: from scipy.integrate import romb In [33]: nodes = np.linspace(-1, 1, 65) In [34]: romb(f(nodes), dx=1./32) 0.0
We have the option to report the table that shows the Richardson extrapolation from the given nodes:
In [35]: romb(f(nodes), dx=1./32, show=True) Richardson Extrapolation Table for Romberg Integration ==================================================================== 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 ==================================================================== Out[35]: 0.0
We might not have any preference for the choice of nodes, but still have our mind set in using Romberg's rule for our numerical integration scheme. In that case, we could use the routine romberg
, for which we only need to provide the expression of a function and the limits of integration. Optionally, we can provide absolute or relative tolerances for the error (which are both set by default to 1.48e-8
):
In [36]: from scipy.integrate import romberg In [37]: romberg(f, -1, 1, show=True) Romberg integration of <function vfunc at 0x10ffa8c08> from [-1, 1] Steps StepSize Results 1 2.000000 0.000000 2 1.000000 0.000000 0.000000 The final result is 0.0 after 3 function evaluations. Out[37]: 0.0
Another possibility is to use Gaussian quadrature formulas. These are more powerful, since the accuracy of the approximations is gained through computing, internally, the best possible choice of nodes. There are two basic routines in the module scipy.integrate
that perform implementations of this algorithm: quadrature
, if we want to specify tolerance, or fixed_quad
, if we wish to specify the number of nodes (but not their locations!):
In [38]: from scipy.integrate import quadrature, fixed_quad In [39]: value, absolute_error = quadrature(f, -1, 1, tol=1.49e-8); ....: print value 0.0 In [40]: value, absolute_error = fixed_quad(f, -1, 1, n=4); ....: print value # four nodes -9.45424294407e-16
A more advanced method to perform Gaussian quadrature, using an adaptive scheme, is obtained through the function quad
in the module scipy.integrate
. This function is a wrapper of the routine
QAGS
in the Fortran library QUADPACK
. The algorithm breaks the domain of integration into several subintervals and on each of them, performs a 21-point Gaussian-Kronrod quadrature rule. Further acceleration is achieved with Peter Wynn's epsilon algorithm.
For more information on QAGS
as well as the other routines in the QUADPACK
libraries, refer to the netlib repositories: http://www.netlib.org/quadpack/.
Let's compare this with our running example:
In [41]: from scipy.integrate import quad In [42]: value, absolute_error = quad(f, -1, 1); ....: print value 0.0
We could obtain implementation details by setting the optional argument full_output
to True
. This gives us an additional Python dictionary with useful information:
In [43]: value, abs_error, info = quad(f, -1, 1, full_output=True) In [44]: info.keys() Out[44]: ['rlist', 'last', 'elist', 'iord', 'alist', 'blist', 'neval'] In [45]: print "{0} function evaluations".format(info['neval']) 21 function evaluations In [46]: print "Used {0} subintervals".format(info['last']) Used 1 subintervals
To fully understand all the different outputs of info
, we need to know about the underlying algorithm computing the Gaussian quadratures. These particular routines use the Clensaw-Curtis method, a technique based on Chebyshev moments.
In the preceding example, by default, the code tried to use 50 Chebyshev moments. Due to the simplicity of the integrand, and since only one subinterval was needed, it was necessary only to use one of those moments. When we report the 50-entry one-dimensional outputs rlist
, elist
, alist
, and blist
from the dictionary info, we can disregard the information offered by the last 49 entries of each of them:
In [47]: np.set_printoptions(precision=2, suppress=True) In [48]: print info['rlist'] # integral approx on subintervals [ 0.00e+000 2.32e+077 6.93e-310 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 6.45e-314 2.19e-314 6.93e-310 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 -1.48e-224 2.19e-314 6.93e-310 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000] In [49]: print info['elist'] # abs error estimates on subintervals [ 3.70e-015 2.32e+077 3.41e-322 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 7.30e+245 2.19e-314 6.93e-310 0.00e+000 0.00e+000 0.00e+000 4.74e+246 2.20e-314 6.93e-310 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 -9.52e+207 2.19e-314 6.93e-310 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 0.00e+000 2.00e+000 2.00e+000 2.27e-322 1.05e-319] In [50]: print info['alist'] # subintervals left end points [-1. 2. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] In [51]: print info['blist'] # subintervals right end pts [ 1. 2. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 2. -0.]
It is possible to impose a different number of Chebyshev moments to be used. We do so with the optional parameter maxp1
, which imposes an upper bound to this number (rather than fixing it, for optimal results).
Oscillatory integrals of the form f(x)cos(wx) or f(x)sin(wx), even when f(x) is smooth, are especially tricky. The integrator quad
can tackle these integrals by calling the routine QAWO
in QUADPACK
. We can employ this method by specifying the arguments weight='cos'
or weight='sin'
, with wvar=w
.
For example, for the integral of g(x) = sin(x)ex on the interval [-10,10]
, we compare this method with a basic quad
. We could do the following:
In [52]: def f(x): return np.sin(x) * np.exp(x) In [53]: g = np.exp In [54]: quad(g, -10, 10, weight='sin', wvar=1) Out[54]: (3249.4589405744427, 5.365398805302381e-08) In [55]: quad(f, -10, 10) Out[55]: (3249.458940574436, 1.1767634585879705e-05)
Note the significant gain in absolute error.
For details and the theory behind all the quadrature formulas that we have explored in this section, a good reference is Chapter 3, Numerical Differentiation and Integration of Walter Gautchi's Numerical Analysis.
The second case of integration is that of definite integrals on a finite interval [a,b]
of functions with singularities. We contemplate two cases: weighted functions and generic functions with singularities.
Weighted functions can be realized as products of the f(x)w(x) kind for some smooth function f(x) with a non-negative weight function w(x) containing singularities. An illustrative example is given by cos(πx/2)/√x. We could regard this case as the product of cos(πx/2) with w(x)=1/√x. The weight presents a single singularity at x=0, and is smooth otherwise.
The usual way to treat these integrals is by means of weighted Gaussian quadrature formulas. For example, to perform principal value integrals of functions of the form f(x)/(x-c), we issue quad
with the arguments weight='cauchy'
and wvar=c
. This calls the routine QAWC
from QUADPACK
.
Let's experiment with the Fresnel-type sine integral of g(x) = sin(x)/x on the interval [-1,1]
and compare it with romberg
:
In [56]: value, abs_error = quad(f, -1, 1, weight='cauchy',wvar=0); ....: print value 1.89216614073 In [57]: romberg(g, -1, 1) Out[57]: 2.35040238729
In the case of integrals of functions with weights (x-a)α(b-x)β, where a
and b
are the endpoints of the domain of integration and both alpha
and beta
are greater than -1
, we issue quad
with the arguments weight='alg'
and wvar=(alpha, beta)
. This calls the routine QAWS
from QUADPACK
.
Let's experiment with the Fresnel-type cosine integral of g(x)=cos(πx/2)/√x on the interval [0,1]
and compare it with quadrature
:
In [58]: def f(x): return np.cos(np.pi * x * 0.5) In [59]: def g(x): return np.cos(np.pi * x * 0.5) / np.sqrt(x) In [60]: value, abs_error = quad(f, 0, 1, weight='alg', wvar=(-0.5,0)); ....: print value 1.55978680075 In [61]: quadrature(g, 0, 1) quadrature.py:178: AccuracyWarning: maxiter (50) exceeded. Latest difference = 3.483182e-04 AccuracyWarning) Out[61]: (1.5425452942607543, 0.00034831815190772275)
If the weight has the form w(x)=(x-a)α(b-x)βln(x-a), w(x)=(x-a)α(b-x)βln(b-x), or w(x)=(x-a)α(b-x)βln(x-a)ln(b-x), we issue quad
with the arguments weight='alg-loga'
, or weight='alg-logb'
or weight='alg-log'
respectively, and in each case, wvar=(alpha, beta)
. For example, for the function g(x)=x2ln(x) on the interval [0,1]
, we could issue the following:
In [62]: def f(x): return x**2 In [63]: def g(x): return x**2 * np.log(x) In [64]: value, abs_error = quad(f, 0, 1, weight='alg-loga', wvar=(0,0)); ....: print value -0.111111111111
The actual value of this integral is -1/9
.
In general, we might be handling functions with singularities that do not conform to the nice forms f(x)w(x) that were indicated in the previous section. In that case, if we are aware of the locations of the singularities, we could indicate so to the integrator quad with the optional argument points. The integrator quad
calls the routine QAGP
in QUADPACK
. For example, for the function g(x) = floor(x)ln(x) that observes a singularity on each integer number, to integrate on the interval [1,8]
, we could issue the following:
In [65]: def g(x): return np.floor(x) * np.log(x) In [66]: quad(g, 1, 8, points=np.arange(8)+1) Out[66]: (45.802300241541005, 5.085076830895319e-13)
Compare this to a simple quad computation without indicating any singularities, as the next lines of code show:
In [67]: quad(g, 1, 8) quadpack.py:295: UserWarning: The maximum number of subdivisions (50) has been achieved. If increasing the limit yields no improvement it is advised to analyze the integrand in order to determine the difficulties. If the position of a local difficulty can be determined (singularity, discontinuity) one will probably gain from splitting up the interval and calling the integrator on the subranges. Perhaps a special- purpose integrator should be used. warnings.warn(msg) Out[67]: (45.80231242134967, 8.09803045669355e-05)
The versatile integrator quad
is also able to compute definite integrals on unbounded domains using adaptive quadrature formulas, by means of a call to the routine QAGI
from QUADPACK
. This process does not work with 'cauchy
', or any of the 'alg
'-type weight options.
In general, if the functions to integrate do not present singularities, the approximations are reliable. The presence of singularities gives unreliable integrals, as the following example suggests:
In [68]: def f(x): return 2 * np.exp(-x**2) / np.sqrt(np.pi) In [69]: value, absolute_error = quad(f, 0, np.inf); ....: print value 1.0 In [70]: def f(x): return np.sin(x)/x In [71]: integrate(Sin(x)/x, (x, 0, oo)) Out[71]: pi/2 In [72]: value, absolute_error = quad(f, 0, np.inf); ....: print value # ouch! 2.24786796347
In the case of oscillatory integrals in unbounded domains, besides issuing quad
with the argument weight='cos'
or weight='sin'
and the corresponding wvar
parameter, we can also place an upper bound on the number of cycles to use internally. We do so by setting the optional argument limlst
to the desired bound. It is usually a good idea to set it to something larger than three. For example, for the Fourier-like integral of the sinc
function on [1, ∞], we could issue the following command:
In [73]: def f(x): return 1./x In [74]: quad(f, 1, np.inf weight='sin', wvar=1, limlst=5) quadpack.py:295: UserWarning: The maximum number of cycles allowed has been achieved., e.e. of subintervals (a+(k-1)c, a+kc) where c = (2*int(abs(omega)+1))*pi/abs(omega), for k = 1, 2, ..., lst. One can allow more cycles by increasing the value of limlst. Look at info['ierlst'] with full_output=1. warnings.warn(msg) Out[74]: (0.636293340511029, 1.3041427865109276) In [75]: quad(f, 1, np.inf, weight='sin', wvar=1, limlst=50) Out[75]: (0.6247132564795975, 1.4220476353655983e-08)
It is also possible to perform multivariate numerical integration on different domains, through application of adaptive Gaussian quadrature rules. In the module scipy.integrate
, we have to this effect the routines dblquad
(double integrals), tplquad
(triple integrals), and nquad
(integration over multiple variables).
These routines can only compute definite integrals over type I regions:
a
and b
and two univariate functions f(x) and h(x).a
, b
, univariate functions f(x), h(x), and bivariate functions q(x,y), r(x,y).Let's run a numerical integration over the function of the example in line In [18]
. Note the order in which the different variables must be introduced in the definition of the function to be integrated:
In [76]: def f(x, y): return np.exp(-x**2 - y**2) In [77]: from scipy.integrate import dblquad In [78]: dblquad(f, 0, np.inf, lambda x:0, lambda x:np.inf) Out[78]: (0.785398163397, 6.29467149642e-09)
3.133.133.233