Monte Carlo simulation is used to reproduce and numerically solve problems that involve random variables and whose analytical solution is too complex or impossible. Simulation also makes it easier to test, in great detail, the effects of changes in the input variables or in the output function. Starting from a model of the process and from the generation of random variables, the simulation performs many runs, which together approximate the probability of certain results.
This method has become very important in many scientific and engineering fields, above all for its ability to deal with complex problems that previously could only be solved through deterministic simplifications. It is mainly used in three distinct classes of problems: optimization, numerical integration, and the generation of probability distributions. In this chapter, we will explore various techniques based on Monte Carlo methods for process simulation. First, we will learn the basic concepts, and then we will apply them to practical cases.
In this chapter, we’re going to cover the following main topics:
In this chapter, we will provide an introduction to Monte Carlo simulation. To deal with the topics in this chapter, it is necessary to have a basic knowledge of algebra and mathematical modeling.
To work with the Python code in this chapter, you’ll need the following files (available on GitHub at the following URL: https://github.com/PacktPublishing/Hands-On-Simulation-Modeling-with-Python-Second-Edition):
In simulation procedures, the evolution of a process is followed while, at the same time, forecasts of possible future scenarios are made. A simulation consists of building a model that closely imitates a real system. From this model, numerous samples of possible cases are generated and studied over time. The results are then analyzed, highlighting the alternative decisions that can be made.
The term Monte Carlo simulation dates back to the 1940s, when J. von Neumann and S. Ulam used the method in nuclear research at Los Alamos, in the context of the Manhattan Project. They replaced the parameters of the equations that describe the dynamics of nuclear explosions with sets of random numbers. The name Monte Carlo was chosen because of the uncertainty of winnings that characterizes the famous casino in the Principality of Monaco.
To obtain a simulation with satisfactory results, applications that use the Monte Carlo method are based on the following components:
Monte Carlo simulation calculates a series of possible realizations of the phenomenon in question, each weighted by its probability of occurrence, while trying to explore the whole parameter space of the phenomenon.
Once this random sample has been generated, the simulation measures the quantities of interest on it. The simulation is successful if the average value of these measurements over the realizations of the system converges to the true value.
Important note
The functionality of Monte Carlo simulation can be summarized as follows: a phenomenon is observed n times, recording the outcome of each event, to identify the statistical distribution of the variable of interest.
The primary objective of the Monte Carlo method is to estimate a parameter that is representative of a population. To do this, the computer generates a series of n random numbers that make up a sample of the population in question.
For example, suppose we want to evaluate an unknown parameter, A, that can be interpreted as the average (expected) value of a random variable, X. In this case, the Monte Carlo method estimates A by calculating the average of a sample of N values of X, obtained through a procedure based on random numbers, as shown in the following diagram:
Figure 4.1 – Process of a random generator
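A minimal sketch of this estimator in Python follows. It assumes, purely for illustration, that X = U², where U is uniformly distributed on [0, 1]. This choice does not come from the text; it is made only because the true value, A = E[X] = 1/3, is known, so the estimate can be checked.

```python
import random

def estimate_mean(n, seed=1):
    """Estimate A = E[X] for the illustrative X = U**2, U ~ Uniform(0, 1)."""
    rng = random.Random(seed)  # dedicated, reproducible generator
    total = 0.0
    for _ in range(n):
        u = rng.random()       # U ~ Uniform[0, 1)
        total += u ** 2        # one realization of X
    return total / n           # sample mean of N values of X

for n in (100, 10_000, 1_000_000):
    print(n, estimate_mean(n))
```

The printed estimates should get closer to 1/3 as N grows.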
In Monte Carlo simulation, a series of possible realizations of a phenomenon is calculated in order to explore the whole space of its parameters.
Important note
In this calculation, each event is weighted by its probability. Once the representative sample has been calculated, the simulation measures the quantities of interest on it.
Monte Carlo simulation works if the average value of these measurements on the system converges to the real value.
The Monte Carlo simulation proves to be a valid tool for addressing the following problems:
The necessary conditions for applying the method are the independence and the analogy of the experiments. Independence means that the result of each repetition of the experiment must not influence the others. Analogy means that, when observing the variable, the same experiment is repeated n times under the same conditions.
The Monte Carlo method is a problem-solving strategy based on statistics. If P denotes the probability of a certain event, we can simulate this event randomly and estimate P as the ratio between the number of times, n, that the event occurred and the total number of simulations, N, as follows:
$$P \approx \frac{n}{N}$$
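As an illustration of this ratio, here is a small sketch that estimates the probability that two fair dice sum to 7. The event is a hypothetical example, not from the text, and it was chosen because its exact probability, 6/36 = 1/6, is easy to check.

```python
import random

def estimate_probability(n_trials, seed=42):
    """Estimate P(sum of two fair dice == 7) as n / N."""
    rng = random.Random(seed)
    n_event = 0
    for _ in range(n_trials):
        if rng.randint(1, 6) + rng.randint(1, 6) == 7:
            n_event += 1               # the event occurred
    return n_event / n_trials          # n / N

p_hat = estimate_probability(100_000)
print(p_hat)  # close to the exact value 1/6 ≈ 0.1667
```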
We can apply this strategy to approximate Pi. Pi (π) is a mathematical constant equal to the ratio of the circumference of a circle to its diameter. If we denote the length of a circumference with C and its diameter with d, then C = π · d. Therefore, a circle with a diameter of 1 has a circumference equal to π.
Important note
We usually approximate π with 3.14 to simplify calculations. However, π is an irrational number: it has an infinite number of digits after the decimal point, and they never repeat in a regular pattern.
Consider a circle with a radius of 1: it can be inscribed in a square with a side equal to 2. For convenience, we will only consider a quarter of the circle, as shown in the following figure:
Figure 4.2 – A fraction of the circle
By analyzing the previous figure, we can see that the area of the square in blue is 1 and that the area of the circular sector in yellow (1/4 of the circle) is π/4. We randomly place a very large number of points inside the square. Because the points are numerous and uniformly distributed, we can approximate each area with the number of points it contains.
If we generate N random points inside the square and denote with M the number of points that fall in the circular sector, the ratio M/N approximates the ratio between the area of the circular sector and the area of the square, which is π/4. From this, we can derive the following equation:
$$\frac{M}{N} \approx \frac{\pi}{4} \quad\Longrightarrow\quad \pi \approx 4\,\frac{M}{N}$$
The greater the number of points generated, the more precise the approximation of Pi will be.
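To see how the precision grows with N, here is a vectorized sketch that uses NumPy's Generator API (np.random.default_rng) instead of the random module. The seed and the sample sizes are arbitrary choices. The error typically shrinks roughly in proportion to 1/√N, so each hundredfold increase in N gains about one decimal digit.

```python
import numpy as np

def estimate_pi(n, rng):
    """Quarter-circle estimate: pi ≈ 4 * M / N."""
    x = rng.random(n)                          # N points uniform in the unit square
    y = rng.random(n)
    m = np.count_nonzero(x**2 + y**2 <= 1.0)   # M = points inside the circular sector
    return 4.0 * m / n

rng = np.random.default_rng(0)
for n in (100, 10_000, 1_000_000):
    pi_hat = estimate_pi(n, rng)
    print(f"N={n:>9}  pi≈{pi_hat:.5f}  error={abs(pi_hat - np.pi):.5f}")
```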
Now, let’s analyze the code line by line to understand how we have implemented the simulation procedure for estimating Pi:
import math
import random
import numpy as np
import matplotlib.pyplot as plt
The math library provides access to the mathematical functions defined by the C standard. The random library implements pseudo-random number generators for various distributions and is based on the Mersenne Twister algorithm. The numpy library adds scientific computing functions to Python and is designed to perform operations on vectors and multidimensional arrays. Finally, the matplotlib library is a Python library for plotting high-quality graphics.
N = 10000
M = 0
As we mentioned previously, N is the number of points that we generate and position inside the square, while M counts the points that fall within the circular sector. M starts at zero. Each time a generated point passes the check described below, we increase it by 1.
XCircle=[]
YCircle=[]
XSquare=[]
YSquare=[]
Here, we have defined four lists that store the coordinates of two types of points: Circle and Square. A Circle point falls within the circular sector, while a Square point falls within the square but outside the circular sector. Now, we can generate the points:
for p in range(N):
    x = random.random()
    y = random.random()
Here, we used a for loop that repeats the process N times, once for each sample we want to generate. Inside the loop, we used the random() function of the random library to generate the coordinates of each point. The random() function returns the next random floating-point number from the generated sequence, in the range [0.0, 1.0).
    if x**2 + y**2 <= 1:
        M += 1
        XCircle.append(x)
        YCircle.append(y)
    else:
        XSquare.append(x)
        YSquare.append(y)
The if statement allows us to check the position of each point. Recall that the points of a circumference with center (x0, y0) and radius r are defined by the following equation:
$$(x - x_0)^2 + (y - y_0)^2 = r^2$$
If x0 = y0 = 0 and r = 1, the previous equation becomes the following:
$$x^2 + y^2 = 1$$
This tells us that a point falls within the circular sector if the following inequality holds:
$$x^2 + y^2 \le 1$$
If this condition is satisfied, M is increased by 1 and the generated x and y values are stored in the Circle point lists (XCircle, YCircle). Otherwise, M is not updated, and the generated x and y values are stored in the Square point lists (XSquare, YSquare).
Pi = 4*M/N
print("N=%d M=%d Pi=%.2f" %(N,M,Pi))
In this way, we can calculate Pi and print the results, as follows:
N=10000 M=7857 Pi=3.14
The estimate we obtained is acceptable: π is usually rounded to the second decimal place, and our result matches 3.14. Now, let’s draw a graph of the generated points. To start, we will generate the points of the circumference arc:
XLin = np.linspace(0, 1)
YLin = []
for x in XLin:
    YLin.append(math.sqrt(1 - x**2))
The linspace() function of the numpy library returns an array of evenly spaced numbers between two extremes, here 0 and 1 (50 values by default). These will be the x coordinates of the circumference arc (XLin). The corresponding y coordinates (YLin) are obtained by solving the equation of the circumference for y, as follows:
$$y = \sqrt{1 - x^2}$$
To calculate the square root, we used the math.sqrt() function.
plt.axis ("equal")
plt.grid (which="major")
plt.plot (XLin , YLin, color="red" , linewidth="4")
plt.scatter(XCircle, YCircle, color="yellow", marker =".")
plt.scatter(XSquare, YSquare, color="blue" , marker =".")
plt.title ("Monte Carlo method for Pi estimation")
plt.show()
The scatter() function draws a set of individual points on two axes without connecting them. The following diagram is printed:
Figure 4.3 – Plot of the Pi estimation
Consistent with the colors described earlier, the points inside the circular sector are plotted in yellow, while those outside it are in blue. To highlight the boundary between them, we have drawn the circumference arc in red.
Now that we’ve applied the Monte Carlo method to estimate π, it is time to explore in more depth some fundamental concepts of simulation based on random number generation.
The Monte Carlo method is essentially a numerical method for calculating the expected value of random variables; that is, an expected value that cannot be easily obtained through direct calculation. To obtain this result, the Monte Carlo method is based on two fundamental theorems of statistics: the law of large numbers and the central limit theorem.
The law of large numbers states the following: as the number of realizations, N, of a random variable becomes very large (N → ∞), their average approaches the expected value. Let’s look at an example. We flip a coin 10 times, 100 times, and 1,000 times and count how many times we get heads. We can put the results into a table, as follows:
| Number of coin flips | Number of heads | Frequency of heads |
|---|---|---|
| 10 | 4 | 40% |
| 100 | 44 | 44% |
| 1,000 | 469 | 46.9% |
Table 4.1 – Table showing the results for a coin toss
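The experiment in the table can be reproduced with a short simulation. This sketch assumes a fair coin and uses an arbitrary seed, so the exact frequencies will differ from those in Table 4.1, but the trend toward 50% should be visible. A larger run of 100,000 flips is added to make the trend clearer.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed, for reproducibility
frequencies = {}
for n_flips in (10, 100, 1_000, 100_000):
    flips = rng.integers(0, 2, size=n_flips)   # 1 = heads, 0 = tails
    frequencies[n_flips] = flips.mean()        # relative frequency of heads
    print(f"{n_flips:>7} flips: {frequencies[n_flips]:.2%} heads")
```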
Analyzing the last column of the previous table, we can see that the frequency approaches the probability of getting heads (50%). Therefore, as the number of trials increases, the relative frequency tends to the theoretical probability, which is reached only in the limit as the number of flips tends to infinity.
Important note
The law of large numbers has several uses. In the Monte Carlo method for Pi estimation section, it allowed us to equate the fraction of points falling in the circular sector with the ratio of its area to that of the square. In this way, we were able to estimate π simply by generating random numbers. In this case too, the greater the number of random points generated, the closer the estimate is to the true value of π.
The law of large numbers justifies using the averages produced by a Monte Carlo analysis to estimate quantities such as definite integrals, but it does not say how large the number, N, must be. It gives no indication of the order of magnitude of N needed for the numbers to be considered large enough. To answer this question, we need the central limit theorem.
Monte Carlo simulation not only provides an estimate of the expected value, as established by the law of large numbers, but also allows us to estimate the uncertainty associated with it. This is possible thanks to the central limit theorem, which describes how the estimate is distributed and therefore how reliable it is.
Important note
The central limit theorem can be summarized as follows: given a population with an unknown distribution and finite variance, the distribution of the means of samples drawn from it approximates a normal distribution as the sample size increases.
While the law of large numbers tells us that the sample mean converges to the expected value, the central limit theorem tells us how the sample mean is distributed around it.
An interesting feature of the central limit theorem is that it places almost no constraints on the distribution used to generate the N samples from which the mean is computed. Whatever the distribution of the random variable, as long as it has a finite variance, the average of a very large number of samples can be described by a Gaussian distribution.
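Because the sample mean is approximately normal with standard deviation σ/√N, the same simulation that produces an estimate can also report its uncertainty. Here is a sketch, assuming an illustrative X = U² with U uniform on [0, 1] (true mean 1/3), that builds an approximate 95% confidence interval. The factor 1.96 is the standard normal quantile for 95%, and the sample standard deviation stands in for the unknown σ.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
x = rng.random(n) ** 2                   # realizations of X = U**2, U ~ Uniform(0, 1)

estimate = x.mean()                      # Monte Carlo estimate of E[X]
std_error = x.std(ddof=1) / np.sqrt(n)   # CLT: the mean has std dev sigma / sqrt(N)
low, high = estimate - 1.96 * std_error, estimate + 1.96 * std_error

print(f"estimate = {estimate:.5f}, 95% CI = [{low:.5f}, {high:.5f}]")
```

Quadrupling N halves the width of the interval, which answers the question of how large N must be for a required precision.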
Let’s look at a practical example. We generate 10,000 random numbers with a uniform distribution. Then, we randomly extract a sample of 100 values from this population and compute its average. We repeat this operation 1,000 times, storing each average in a list. Finally, we draw a histogram of the distribution of these averages. Here is the Python code:
import random
import numpy as np
import matplotlib.pyplot as plt

a = 1
b = 100
N = 10000
DataPop = list(np.random.uniform(a, b, N))
plt.hist(DataPop, density=True, histtype='stepfilled', alpha=0.2)
plt.show()

SamplesMeans = []
for i in range(0, 1000):
    DataExtracted = random.sample(DataPop, k=100)
    DataExtractedMean = np.mean(DataExtracted)
    SamplesMeans.append(DataExtractedMean)

plt.figure()
plt.hist(SamplesMeans, density=True, histtype='stepfilled', alpha=0.2)
plt.show()
Now, let’s analyze the code line by line to understand how we have implemented the simulation procedure that illustrates the central limit theorem:
import random
import numpy as np
import matplotlib.pyplot as plt
The random library implements pseudo-random number generators for various distributions. The numpy library adds scientific computing functions to Python and is designed to perform operations on vectors and multidimensional arrays.
Finally, the matplotlib library is a Python library for plotting high-quality graphics.
a = 1
b = 100
N = 10000
The a and b parameters are the extremes of the range, and N is the number of values we want to generate.
Now, we can generate the uniform distribution using the NumPy random.uniform() function, as follows:
DataPop=list(np.random.uniform(a,b,N))
plt.hist(DataPop, density=True, histtype='stepfilled', alpha=0.2)
plt.show()
The plt.hist() function (matplotlib.pyplot.hist()) draws a histogram, that is, a chart that groups the values of a continuous variable into classes (bins).
Histograms are used in many contexts, usually to show statistical data when the range of the independent variable is divided into subintervals.
The following diagram is printed:
Figure 4.4 – Plot of the data distribution
The distribution appears to be uniform – we can see that each bin is populated with an almost constant frequency.
SamplesMeans = []
for i in range(0, 1000):
    DataExtracted = random.sample(DataPop, k=100)
    DataExtractedMean = np.mean(DataExtracted)
    SamplesMeans.append(DataExtractedMean)
First, we initialized the list that will contain the sample means. Then, we used a for loop to repeat the following operations 1,000 times. At each step, we extracted 100 values from the population using the random.sample() function, which samples without replacement and without changing the input sequence. We then computed the mean of the extracted values with np.mean() and appended it to the list.
plt.figure()
plt.hist(SamplesMeans, density=True, histtype='stepfilled', alpha=0.2)
plt.show()
The following histogram is printed:
Figure 4.5 – Plot of the extracted samples
The distribution has now taken on the bell-shaped curve characteristic of the Gaussian distribution. This is exactly the behavior predicted by the central limit theorem, which we have illustrated (not proved) through simulation.
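A quantitative check is also possible. The central limit theorem predicts that the standard deviation of the sample means is about σ/√n, where σ is the population standard deviation and n = 100 is the sample size. This sketch repeats the experiment with NumPy's Generator.choice, which samples without replacement like random.sample. It uses 5,000 repetitions instead of 1,000, an arbitrary choice that makes the comparison less noisy. Because samples are drawn without replacement from a finite population, the observed value should be very slightly smaller than the prediction.

```python
import numpy as np

rng = np.random.default_rng(2024)
population = rng.uniform(1, 100, 10_000)       # same setup: U(1, 100), 10,000 values
n = 100                                        # size of each extracted sample

sample_means = np.array([
    rng.choice(population, size=n, replace=False).mean()
    for _ in range(5_000)
])

predicted_sd = population.std() / np.sqrt(n)   # CLT prediction, sigma / sqrt(n)
observed_sd = sample_means.std(ddof=1)

print(f"mean of sample means: {sample_means.mean():.3f} (population mean {population.mean():.3f})")
print(f"sd of sample means:   {observed_sd:.3f} (predicted {predicted_sd:.3f})")
```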
Now, we are ready to apply the newly learned Monte Carlo simulation concepts to real cases.
Monte Carlo simulation is used to study the response of a model to randomly generated inputs. The simulation process takes place in the following three phases:
Monte Carlo simulation is widely used for analyzing financial, physical, and mathematical models.
Generating probability distributions that cannot be found with analytical methods can easily be addressed with Monte Carlo methods. For example, let’s say we want to estimate the probability distribution of the damage caused by earthquakes in a year in Japan.
Important note
In this type of analysis, there are two sources of uncertainty: how many earthquakes there will be in a year and how much damage each earthquake will cause. Even if a probability distribution can be assigned to each of these two levels, it is not always possible to combine them analytically to derive the distribution of the annual losses.
It is easier to run a Monte Carlo simulation, as follows:
By cyclically repeating these three steps, we generate a sample of annual losses, from which we can estimate the probability distribution that could not be obtained analytically.
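A sketch of this kind of simulation follows, with purely hypothetical modeling choices. The number of earthquakes per year follows a Poisson distribution with mean 3, and the damage of each earthquake follows a lognormal distribution in arbitrary monetary units. Neither assumption comes from real seismic data; they only show how the two sources of uncertainty can be combined by simulation. For speed, all simulated years are generated at once with NumPy arrays instead of an explicit loop.

```python
import numpy as np

rng = np.random.default_rng(11)
n_years = 100_000
mean_quakes_per_year = 3.0      # hypothetical Poisson rate
mu, sigma = 0.0, 1.0            # hypothetical lognormal damage parameters

# Draw the number of earthquakes in each simulated year
counts = rng.poisson(mean_quakes_per_year, size=n_years)
# Draw the damage caused by every single earthquake
damages = rng.lognormal(mean=mu, sigma=sigma, size=counts.sum())
# Sum the damages that belong to the same year (years with 0 quakes get 0)
year_of_quake = np.repeat(np.arange(n_years), counts)
annual_losses = np.bincount(year_of_quake, weights=damages, minlength=n_years)

print(f"mean annual loss: {annual_losses.mean():.3f}")
print(f"99th percentile:  {np.percentile(annual_losses, 99):.3f}")
```

The array annual_losses is the simulated sample from which the distribution of the annual losses, or any of its quantiles, can be estimated.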
Various algorithms can be used to find the local minima of a function. Typically, these algorithms proceed according to the following steps:
These steps are repeated until a minimum is reached. For a function with only one minimum, this method finds it. But what if the function has many local minima and we want to find the point that minimizes it globally? The following diagram shows the two cases just mentioned: a function with only one minimum (left) and a function with several minima (right):
Figure 4.6 – Graphs of the two distributions
A local search algorithm could stop at any of the many local minima of the function. How can we know whether we have found one of the local minima or the global minimum? There is no way to establish this rigorously. The only practical option is to explore different areas of the search domain to increase the probability of finding the global minimum among the various local minima.
Important note
Different methods have been developed to explore domains that can be very complicated, with many dimensions and with constraints to be respected.
Monte Carlo methods offer a solution to this problem. An initial population of points belonging to the domain is created, and it then evolves through algorithms that couple (cross over) the points, with random genetic mutations also occurring. As successive generations of points are simulated, a selection process keeps only the best points, that is, those that give the lowest values of the function to be minimized.
In each generation, we keep track of the point that is the best specimen found so far. As the process continues, the points tend to move toward local minima, but at the same time, they explore many areas of the optimization domain. This process could continue indefinitely; at some point, it is stopped, and the best specimen is taken as an estimate of the global minimum.
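The following is a minimal sketch of such a population-based search, not a complete genetic algorithm. The objective function, the population size, the averaging crossover, and the Gaussian mutation are all illustrative assumptions. The function f(x) = (x − 3)²/20 − cos(2(x − 3)) has many local minima on [−10, 10] and a single global minimum at x = 3, where f = −1.

```python
import numpy as np

def f(x):
    """Hypothetical multimodal objective with its global minimum f(3) = -1."""
    return (x - 3) ** 2 / 20 - np.cos(2 * (x - 3))

def evolve(pop_size=200, generations=100, low=-10.0, high=10.0,
           mutation_scale=0.1, seed=0):
    """Population-based Monte Carlo search for the global minimum of f."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(low, high, pop_size)        # initial random population
    best_x = pop[np.argmin(f(pop))]               # best specimen so far
    for _ in range(generations):
        # Selection: keep the half of the population with the lowest f values
        parents = pop[np.argsort(f(pop))[: pop_size // 2]]
        n_children = pop_size - parents.size
        # Crossover (coupling): each child averages two random parents
        i = rng.integers(0, parents.size, n_children)
        j = rng.integers(0, parents.size, n_children)
        children = (parents[i] + parents[j]) / 2
        # Mutation: random Gaussian perturbation, clipped to the domain
        children += rng.normal(0.0, mutation_scale, n_children)
        children = np.clip(children, low, high)
        pop = np.concatenate([parents, children])
        candidate = pop[np.argmin(f(pop))]
        if f(candidate) < f(best_x):              # update the best specimen
            best_x = candidate
    return best_x, f(best_x)

best_x, best_f = evolve()
print(f"best x = {best_x:.4f}, f(best x) = {best_f:.4f}")
```

Keeping the best half of each generation guarantees that the best specimen is never lost, while mutation keeps exploring the neighborhood of the survivors.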
Monte Carlo methods allow you to simulate the behavior of an event of interest and, in general, return a random variable as a result whose properties, such as mean, variance, probability density function, and so on, provide us with important information about the quality of the simulation.
This statistical analysis technique can be applied whenever we face very uncertain project estimates, to reduce the level of uncertainty through a series of simulations. In this sense, it can be used to analyze the times, costs, and risks associated with a project and, therefore, to evaluate the impact that the project may have on the community.
Important note
For each of these variables, the simulations do not provide a single estimate but a range of possible estimates, each associated with the probability that it is accurate.
For example, this technique can be used to determine the overall cost of a project through a discrete series of simulation cycles. In the planning phase of a project, the activities that make up the project are identified, and the cost of each activity is estimated. Summing these costs gives the total cost of the project. However, since these are only estimates, we cannot be sure of the overall cost, and therefore of the cost at completion. In such cases, Monte Carlo simulation can be carried out.
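A minimal sketch of this idea follows. It assumes three hypothetical activities whose costs are described by triangular distributions (optimistic, most likely, pessimistic), a common but not mandatory choice in project risk analysis. All the figures are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n_runs = 100_000

# Hypothetical activities: (optimistic, most likely, pessimistic) cost, in cost units
activities = {
    "design": (10, 12, 18),
    "procurement": (5, 7, 12),
    "construction": (20, 25, 35),
}

total_cost = np.zeros(n_runs)
for low, mode, high in activities.values():
    total_cost += rng.triangular(low, mode, high, size=n_runs)

deterministic = sum(mode for _, mode, _ in activities.values())
print(f"sum of most likely costs: {deterministic}")
print(f"mean simulated cost:      {total_cost.mean():.2f}")
print(f"90th percentile:          {np.percentile(total_cost, 90):.2f}")
print(f"P(cost > {deterministic}):       {np.mean(total_cost > deterministic):.1%}")
```

Because these distributions are skewed toward higher costs, the simulated total tends to exceed the sum of the most likely values. A single deterministic estimate hides this kind of information.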
Now, let’s learn how to apply the Monte Carlo simulation to compute integrals.
Monte Carlo simulation provides a numerical technique for calculating integrals. With the Monte Carlo algorithm, it is possible to solve numerically mathematical problems with many variables that have no analytical solution. Compared with other numerical methods, its relative efficiency increases as the dimension of the problem grows.
Important note
Let’s analyze the problem of a definite integral. In the simplest cases, the integral can be solved with techniques such as integration by parts, integration by substitution, and so on. In more complex situations, however, it is necessary to adopt numerical procedures that require a computer. In these cases, Monte Carlo simulation provides a simple solution that’s particularly useful for multidimensional integrals.
Note, however, that this simulation returns an approximation of the integral, not its exact value.
In the following equation, we use I to denote the definite integral of the function, f, over the bounded interval [a, b]:
$$I = \int_a^b f(x)\,dx$$
In the interval [a, b], we identify the maximum of the function, f, and denote it with U. We then draw a rectangle with base [a, b] and height U. Assuming f is non-negative on [a, b], the area under the function, f(x), which represents the integral of f(x), cannot be larger than the area of this rectangle. The following diagram shows the area under the function, f, which represents the integral of f(x), and the area, A, of the rectangle with base [a, b] and height U that bounds it:
Figure 4.7 – Plot of the function
By analyzing the previous diagram, we can identify the following intervals: a ≤ x ≤ b and 0 ≤ y ≤ U.
In Monte Carlo simulation, x and y are both random numbers. Consider a point (x, y) in the Cartesian plane. Our goal is to determine the probability that this point lies within the area highlighted in the previous diagram, that is, that y ≤ f(x). We can identify two areas: the area under the curve, which equals I, and the area of the rectangle, A = (b − a) · U.
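For a point uniformly distributed in the rectangle, the probability of falling under the curve equals the ratio between these two areas:
$$P(y \le f(x)) = \frac{I}{A} = \frac{I}{(b - a)\,U}$$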
We can estimate the probability, P(y ≤ f(x)), through Monte Carlo simulation; we faced a similar case in the Monte Carlo method for Pi estimation section. To do this, we generate N pairs of uniformly distributed random numbers (x_i, y_i), as follows:
$$x_i \in [a, b], \qquad y_i \in [0, U], \qquad i = 1, \dots, N$$
Some of the generated pairs will satisfy the condition y_i ≤ f(x_i). If we count them and denote this number with M, the ratio M/N approximates the probability, and the approximation becomes more accurate as the number of generated pairs (x_i, y_i) increases. The approximation of the probability, P(y ≤ f(x)), is therefore the following:
$$P(y \le f(x)) \approx \frac{M}{N}$$
After calculating this probability, we can obtain the value of the integral from the previous relationship, as follows:
$$I = A \cdot P(y \le f(x)) \approx (b - a)\,U\,\frac{M}{N}$$
This is the mathematical representation of the problem. Now, let’s see the numerical solution.
We will begin by setting up the components that we will need for the simulation, starting with the libraries and then defining the function and its domain. The Python code for numerical integration through the Monte Carlo method is shown here:
import random
import numpy as np
import matplotlib.pyplot as plt

random.seed(2)
f = lambda x: x**2
a = 0.0
b = 3.0
NumSteps = 1000000
XIntegral = []
YIntegral = []
XRectangle = []
YRectangle = []
Now, let’s analyze the code line by line to understand how we have implemented the simulation procedure for numerical integration:
import random
import numpy as np
import matplotlib.pyplot as plt
The random library implements pseudo-random number generators for various distributions. The numpy library adds scientific computing functions to Python and is designed to perform operations on vectors and multidimensional arrays. Finally, the matplotlib library is a Python library for plotting high-quality graphics. Let’s set the seed:
random.seed(2)
The random.seed() function initializes the random number generator. It is useful when we want the same data to be available for processing in different ways, because it makes the simulation reproducible: if you use the same seed in two successive simulations, you will always get the same sequence of random numbers.
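A quick way to see this behavior is to seed the generator twice with the same value and compare the resulting sequences. The value 2 matches the seed used in the text; the sequence length of 3 is arbitrary.

```python
import random

random.seed(2)
first_run = [random.random() for _ in range(3)]

random.seed(2)                     # re-seed with the same value
second_run = [random.random() for _ in range(3)]

print(first_run == second_run)     # True: the sequence is reproduced
```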
f = lambda x: x**2
We know that to define a function in Python, we use the def statement, which binds the new function to a name. Functions can be treated like other Python objects, such as strings and numbers, which can be created and used on the fly without first binding them to variables.
Functions can also be created on the fly, using a syntax called lambda. Functions created in this way are anonymous. This approach is often used when you want to pass a function as an argument to another function. The lambda syntax consists of the lambda keyword, followed by a list of arguments, a colon, and the expression to evaluate, whose value is returned when the function is called. The resulting function object can be called directly or, as in the previous line of code, assigned to a name.
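The following sketch compares the two ways of defining the same function and shows the typical use of lambda as an argument to another function. The names square_def and square_lambda are only illustrative. Note that the PEP 8 style guide recommends def over assigning a lambda to a name; the lambda form used for f in this chapter works the same way.

```python
def square_def(x):
    return x**2

square_lambda = lambda x: x**2              # same behavior, anonymous function bound to a name

print(square_def(3), square_lambda(3))      # 9 9
print((lambda x, y: x + y)(2, 5))           # 7: defined and called on the fly

# Typical use: passing a function as an argument to another function
values = [-3, 1, -2]
print(sorted(values, key=lambda v: v**2))   # [1, -2, -3]
```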
a = 0.0
b = 3.0
NumSteps = 1000000
As we mentioned in the Defining the problem section, a and b are the ends of the interval over which we want to calculate the integral. NumSteps is the number of steps into which we divide the interval to search for the minimum and maximum of the function. The greater the number of steps, the more accurate these values will be, although the algorithm becomes slower.
XIntegral=[]
YIntegral=[]
XRectangle=[]
YRectangle=[]
Whenever the generated y value is less than or equal to f (x), this value and the relative x value will be added at the end of the XIntegral, YIntegral vectors. Otherwise, they will be added at the end of the XRectangle, YRectangle vectors.
Before using this method, it is necessary to evaluate the minimum and maximum of the function:
Important note
Recall that if the function has only one minimum and one maximum in the interval, the procedure is simple. If there are several minima or maxima, the procedure becomes more complex.
# Scan the interval on a regular grid to find the extremes of f
ymin = f(a)
ymax = ymin
for i in range(NumSteps):
    x = a + (b - a) * float(i) / NumSteps
    y = f(x)
    if y < ymin:
        ymin = y
    if y > ymax:
        ymax = y
Now, we will apply the Monte Carlo method, as follows:
A = (b - a) * (ymax - ymin)
N = 1000000
M = 0
for k in range(N):
    x = a + (b - a) * random.random()
    y = ymin + (ymax - ymin) * random.random()
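Here, we calculated the area, A, of the rectangle and, inside the loop, generated the coordinates of N random points uniformly distributed within it. Next, we must check whether each point lies below the curve of the function. We can do this with an if statement, as follows: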
    if y <= f(x):
        M += 1
        XIntegral.append(x)
        YIntegral.append(y)
    else:
        XRectangle.append(x)
        YRectangle.append(y)
NumericalIntegral = M / N * A
print ("Numerical integration = " + str(NumericalIntegral))
The following result is printed:
Numerical integration = 8.996787006398996
The analytical solution for this simple integral is as follows:
$$\int_0^3 x^2\,dx = \left[\frac{x^3}{3}\right]_0^3 = 9$$
The percentage error is therefore the following:
$$\frac{|9 - 8.996787|}{9} \times 100 \approx 0.036\%$$
This error is negligible, so our estimate is reliable.
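The same calculation can also be written with NumPy arrays. This sketch is a vectorized variant, not the code used above. It also adds the term (b − a) · ymin, which is zero here because f(0) = 0 but is needed in general, because the rectangle starts at ymin rather than at 0. Like the text, it assumes that the extremes of f can be located by sampling the interval on a regular grid.

```python
import numpy as np

def mc_integrate(f, a, b, n=1_000_000, grid=100_000, seed=2):
    """Hit-or-miss Monte Carlo estimate of the integral of f over [a, b]."""
    rng = np.random.default_rng(seed)
    # Approximate the extremes of f on a regular grid, as in the text
    ygrid = f(np.linspace(a, b, grid))
    ymin, ymax = ygrid.min(), ygrid.max()
    area = (b - a) * (ymax - ymin)            # area A of the bounding rectangle
    x = rng.uniform(a, b, n)                  # N random points in the rectangle
    y = rng.uniform(ymin, ymax, n)
    hits = np.count_nonzero(y <= f(x))        # M: points below the curve
    # M/N * A is the area between ymin and f; add back the strip (b - a) * ymin
    return hits / n * area + (b - a) * ymin

estimate = mc_integrate(lambda x: x**2, 0.0, 3.0)
print(f"integral ≈ {estimate:.5f}, error = {abs(estimate - 9) / 9:.4%}")
```

With the offset term, the same function also handles integrands that do not start at zero or that take negative values, such as x² + 1 or x² − 4 on [0, 3].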
Now, let’s plot the results using the following code:
XLin=np.linspace(a,b)
YLin=[]
for x in XLin:
    YLin.append(f(x))
The linspace() function of the numpy library returns an array of evenly spaced numbers between two extremes, here a and b (50 values by default). These are the x coordinates of the curve (XLin), while the y coordinates (YLin) are obtained by evaluating the function f at each of them.
plt.axis ([0, b, 0, f(b)])
plt.plot (XLin,YLin, color="red" , linewidth="4")
plt.scatter(XIntegral, YIntegral, color="blue", marker =".")
plt.scatter(XRectangle, YRectangle, color="yellow", marker =".")
plt.title ("Numerical Integration using Monte Carlo method")
plt.show()
First, we set the range of the axes using the plt.axis() function. Then, we plotted the curve of the x² function, which is convex and monotonically increasing in the interval considered, [0, 3].
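Then, we plotted two scatter plots: the points below the curve (XIntegral, YIntegral) in blue and the points above it (XRectangle, YRectangle) in yellow.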
The scatter() function draws a set of individual points on two axes without connecting them.
The following diagram is returned:
Figure 4.8 – Plot of numerical integration results
As we can see, all the points in blue are positioned below the curve of the function (curve in red), while all the points in yellow are positioned above the curve of the function.
Often, when performing a numerical calculation, it is necessary to evaluate what effect the input variables have on the output. In this case, it is possible to use sensitivity analysis. Let’s see how.
The variability, or uncertainty, associated with a parameter propagates through the model and can contribute strongly to the variability of the model’s outputs. The model results can be so sensitive to an input parameter that small changes in the input cause significant changes in the output. A widely used methodology in data analytics for studying this effect is sensitivity analysis, which studies how the uncertainty in the output of a mathematical model can be attributed to the various sources of uncertainty in its inputs. When we focus on quantifying the uncertainty of the output itself, we speak of uncertainty analysis. This type of study has many objectives; here are some examples:
In the context of sensitivity analysis, we can distinguish between local and global methodologies. Local methodologies focus on a particular point in the input space and are used when we want to understand how the output behaves around that point. Global methods focus not on a single point but on a range of values in the input factor space; they are the type of methodology generally used for evaluations in a stochastic context.
In sensitivity analysis, the input of the model is changed, for example according to a given scenario, and the resulting variation of the output is identified. The success of this technique derives from the fact that it lets us study the behavior of complex models in a simple way, even when their complexity prevents us from understanding that behavior through intuition alone. An operational methodology is therefore needed to overcome these difficulties. You can think of the model as a black box: the system is described through its inputs and outputs, and its precise internal functioning is not visible.
Sensitivity analysis returns indices (sensitivity coefficients) that represent the importance of each parameter and thus allow you to rank the parameters. The analysis therefore aims to identify the parameters that require additional research to strengthen knowledge and reduce the uncertainty of the output, supporting the calibration of the model. It also identifies insignificant parameters that can be eliminated, allowing you to simplify the model. Through a robustness analysis, it highlights how much the predictions of the model depend on the values of the parameters and which parameters are most strongly correlated with the output. Once the model is in use, it tells us the consequences of changing a given input parameter.
The methodology is based on the following tasks:
Ultimately, this analysis makes it possible to evaluate to what extent the uncertainty surrounding each of the independent variables may affect the value assumed by the valuation base. This impact essentially depends on two elements:
While the analyst cannot intervene in the latter, as it depends on the problem being faced, it is possible to intervene in the former: the uncertainty can be reduced by acquiring additional information about the variable in question. However, any additional survey aimed at improving the accuracy of the estimates involves additional calculations for the analyst. Therefore, this intervention makes sense mainly for the variables whose deviations from the base case may change the outcome of the assessment.
Therefore, sensitivity analysis provides useful information on the riskiness of a project and the sources from which that risk originates. Concerning the latter, it should be emphasized that what interests us is not so much sensitivity in an absolute sense as whether the objective function may change its sign. Consequently, determining the range of variability of each variable is critical, since it is incorrect and risky to assume the same interval for every variable for the sake of simplicity: doing so may lead us to treat completely unlikely scenarios as possible.
In the local approach, we study the impact of small input perturbations on the model output. These small perturbations occur around nominal values, such as the mean of a random variable. This deterministic approach consists of calculating or estimating the partial derivatives of the model at a specific point in the space of the input variables. Adjoint-based methods make it possible to handle models with a large number of input variables. However, local methods are limited by their linearity and normality assumptions and by the fact that they only capture local variations.
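The local approach can be illustrated with central finite differences, one simple way of estimating the partial derivatives at a nominal point (adjoint methods are a different, more scalable technique that is not shown here). This sketch reuses the three-variable test function my_func from the Python example later in this section; the nominal point (50, 5, 5), the relative step size, and the normalized coefficient (the derivative scaled by input over output, one common choice among several) are assumptions made for illustration.

```python
import math


def my_func(x_1, x_2, x_3):
    # Same test function as in the sensitivity analysis example of this section
    return math.log(x_1 / x_2 + x_3)


def local_sensitivities(func, nominal, rel_step=1e-4):
    """Central finite-difference estimates of the partial derivatives of func
    at the nominal point, plus normalized (elasticity-style) coefficients."""
    y0 = func(**nominal)
    result = {}
    for name, value in nominal.items():
        # Step proportional to the nominal value (assumed non-zero here)
        h = rel_step * abs(value) if value != 0 else rel_step
        up = dict(nominal, **{name: value + h})
        down = dict(nominal, **{name: value - h})
        deriv = (func(**up) - func(**down)) / (2 * h)
        # Relative change in output per relative change in this input
        result[name] = (deriv, deriv * value / y0)
    return result


nominal = {"x_1": 50.0, "x_2": 5.0, "x_3": 5.0}
sens = local_sensitivities(my_func, nominal)
for name, (d, s) in sens.items():
    print(f"{name}: dY/dx = {d:.4f}, normalized = {s:.4f}")
```

At this point, x_1/x_2 + x_3 = 15, so the exact derivatives are 1/75, -2/15, and 1/15, which the finite-difference estimates should match closely.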
Global methods have been developed to overcome these limitations. With this approach, we do not single out any particular set of input values; instead, we consider the numerical model over the whole domain of possible variations of the input parameters. Therefore, global sensitivity analysis is a tool for studying a mathematical model as a whole rather than one of its solutions around specific parameter values.
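A very simple global, sampling-based screening can be sketched by drawing the inputs at random over their whole ranges and measuring how strongly each one is correlated with the output. This is only a rough measure (Pearson correlation captures linear association and is not a variance-based index such as Sobol's); the input ranges below are assumptions chosen to match the grid used in the Python example later in this section, and my_func is written with np.log so that it accepts whole arrays.

```python
import numpy as np


def my_func(x_1, x_2, x_3):
    # Vectorized version of the test function used in this section
    return np.log(x_1 / x_2 + x_3)


rng = np.random.default_rng(42)
n = 10_000
# Sample each input uniformly over its whole (assumed) range
samples = {
    "x_1": rng.uniform(10, 90, n),
    "x_2": rng.uniform(1, 9, n),
    "x_3": rng.uniform(1, 9, n),
}
y = my_func(**samples)

# Pearson correlation between each input and the output over the whole domain
corr = {name: np.corrcoef(x, y)[0, 1] for name, x in samples.items()}
for name, r in corr.items():
    print(f"{name}: r = {r:+.3f}")
```

The sign of each coefficient indicates whether the output tends to increase or decrease with that input, and its magnitude gives a first, coarse ranking of the inputs.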
Sensitivity analysis can be performed using different techniques. Let’s see some of them:
Now that we’ve adequately introduced sensitivity analysis, let’s look at a practical case in a Python environment. As we mentioned previously, with sensitivity analysis, we see how the outputs change over the entire range of possible inputs. It does not return any probability distribution of the results, instead providing a range of possible output values associated with each set of inputs. In the following code, we will learn how to use the tools available to perform sensitivity analysis on artificially generated data. Here is the Python code (sensitivity_analysis.py):
import numpy as np
import math
from sensitivity import SensitivityAnalyzer

def my_func(x_1, x_2, x_3):
    return math.log(x_1 / x_2 + x_3)

x_1 = np.arange(10, 100, 10)
x_2 = np.arange(1, 10, 1)
x_3 = np.arange(1, 10, 1)
sa_dict = {'x_1': x_1.tolist(), 'x_2': x_2.tolist(), 'x_3': x_3.tolist()}
sa_model = SensitivityAnalyzer(sa_dict, my_func)
plot = sa_model.plot()
styled_df = sa_model.styled_dfs()
As always, we will analyze the code line by line:
import numpy as np
import math
from sensitivity import SensitivityAnalyzer
To start, we imported the numpy library, which adds scientific computing functions to Python and is designed to perform operations on vectors and multidimensional arrays. numpy allows you to work with vectors and matrices more efficiently and faster than with lists and lists of lists. In addition, it contains an extensive library of high-level mathematical functions that operate on these arrays. Then, we imported the math library, which provides access to the mathematical functions defined by the C standard. Finally, we imported the SensitivityAnalyzer class from the sensitivity library, which contains the tools to perform sensitivity analysis in a Python environment.
def my_func(x_1, x_2, x_3):
    return math.log(x_1 / x_2 + x_3)
Here, we have created a simple three-variable function that combines a ratio and a logarithm, so that different inputs produce a wide variability of the output, which is exactly what we want to highlight.
Now, let’s define the variable domain of our function:
x_1 = np.arange(10, 100, 10)
x_2 = np.arange(1, 10, 1)
x_3 = np.arange(1, 10, 1)
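Here, we have created three numpy arrays using the np.arange() function. This function creates an array of evenly spaced values in the half-open interval [start, stop), so the stop value is excluded: for example, np.arange(10, 100, 10) returns the nine values 10, 20, ..., 90.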
sa_dict = {'x_1':x_1.tolist(),'x_2':x_2.tolist(),'x_3':x_3.tolist()}
We then collected the candidate values in a dictionary that maps each argument name to a list. To convert the numpy arrays into lists, we used the tolist() method, which returns a standard Python list containing the same elements as the array.
sa_model = SensitivityAnalyzer(sa_dict, my_func)
The SensitivityAnalyzer class performs sensitivity analysis based on the passed function and the possible values for each argument. We passed only two arguments: sa_dict and my_func. The first is a Python dictionary that contains the candidate values of the three inputs; the second is the function that defines the output. SensitivityAnalyzer executes the passed function with the Cartesian product of the possible values for each argument; here, that means 9 × 9 × 9 = 729 evaluations.
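To make the Cartesian product concrete, the following sketch reproduces the same 729 evaluations with itertools.product from the standard library and reports the input combinations that give the lowest and highest output. It is not how the sensitivity library works internally; it only illustrates the set of evaluations that SensitivityAnalyzer performs.

```python
import itertools
import math

import numpy as np


def my_func(x_1, x_2, x_3):
    return math.log(x_1 / x_2 + x_3)


sa_dict = {
    "x_1": np.arange(10, 100, 10).tolist(),
    "x_2": np.arange(1, 10, 1).tolist(),
    "x_3": np.arange(1, 10, 1).tolist(),
}

# Evaluate the function on every combination of candidate values
names = list(sa_dict)
results = []
for combo in itertools.product(*sa_dict.values()):
    kwargs = dict(zip(names, combo))
    results.append((kwargs, my_func(**kwargs)))

print(len(results))  # prints 729 (9 * 9 * 9 combinations)
lowest = min(results, key=lambda r: r[1])
highest = max(results, key=lambda r: r[1])
print("min:", lowest)
print("max:", highest)
```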
plot = sa_model.plot()
styled_df = sa_model.styled_dfs()
The following plots are returned:
Figure 4.9 – Plot of sensitivity analysis
Three graphs have been drawn, one for each pair of input variables: with three inputs, there are three possible pairs. Each graph shows how the output varies as the two variables of the pair change together.
By analyzing the third graph on the right, we can see, for example, that for low values of x_2, the x_3 variable seems irrelevant: when it changes, the output remains practically unchanged. We can appreciate this from the color map of the hexagonal bins, which shows the same color for the entire range of variability of x_3. As the values of x_2 increase, the output starts to vary as x_3 varies.
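The observation about x_2 and x_3 can be checked numerically. In this sketch, x_1 is fixed at 50 (an arbitrary value chosen for illustration) and we measure how much the output changes as x_3 goes from 1 to 9, once for the lowest and once for the highest value of x_2.

```python
import math


def my_func(x_1, x_2, x_3):
    return math.log(x_1 / x_2 + x_3)


x_1 = 50  # fixed, illustrative value
x_3_values = range(1, 10)

spreads = {}
for x_2 in (1, 9):
    outputs = [my_func(x_1, x_2, x_3) for x_3 in x_3_values]
    # Output range obtained by sweeping x_3 with x_1 and x_2 held fixed
    spreads[x_2] = max(outputs) - min(outputs)
    print(f"x_2 = {x_2}: output range over x_3 = {spreads[x_2]:.3f}")
```

When x_2 = 1, the ratio x_1/x_2 = 50 dominates the argument of the logarithm, so adding x_3 barely changes the output; when x_2 = 9, the ratio is only about 5.6, so x_3 has a much larger relative effect.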
In the next section, we will learn how to evaluate the average information contained in data distributions. To do this, we will introduce the concept of cross-entropy.
In Chapter 2, Understanding Randomness and Random Numbers, we introduced the concept of entropy in computing. Let's recall these concepts.
First, there's Shannon entropy. For a probability distribution, P = {p_1, p_2, ..., p_N}, where p_i is the probability of the i-th of the N possible values, x_i, of a random variable, X, Shannon defined the following measure, H, in probabilistic terms:
$$H(P) = -\sum_{i=1}^{N} p_i \log p_i$$
This equation has the same form as the expression of thermodynamic entropy, and for this reason, it was called entropy when it was introduced. The equation establishes that H is a measure of the uncertainty of an experimental result, or a measure of the information obtained from an experiment that reduces the uncertainty. It also specifies the expected value of the amount of information transmitted by a source with a given probability distribution. Shannon's entropy can be seen as the indecision of an observer trying to guess the result of an experiment, or as the disorder of a system that can be in different configurations. This measure, defined as an information index, considers only the probability that an event will happen, not its meaning or its value. This is the main limitation of the concept of entropy.
The entropy reaches its maximum when all probabilities are equal (p_i = 1/N), in which case H = log N. The maximum entropy can be considered a measure of the total uncertainty, and the statistically most probable state in which a system can find itself is the one that corresponds to the maximum entropy. Given two probability distributions over the same set of outcomes, their entropies allow us to compare the information content of the two distributions.
Entropy is always a non-negative number. Since a probability is a number between 0 and 1, the logarithm of p_i is negative or zero, reaching its maximum value of 0 when the probability is equal to 1. For this reason, the minus sign before the sum ensures that the entropy is non-negative. This is why Shannon's entropy can be used as a measure of the uncertainty in the distribution of a random variable.
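The properties described above can be checked with a short sketch. The function below computes the Shannon entropy in bits (using base-2 logarithms) and skips zero probabilities, following the usual convention 0 · log 0 = 0; the three distributions are illustrative, with the second one taken from the cross-entropy example later in this chapter.

```python
from math import log2


def shannon_entropy(p):
    """Shannon entropy in bits; terms with p_i = 0 contribute 0 by convention."""
    return sum(-pi * log2(pi) for pi in p if pi > 0)


uniform = [0.25, 0.25, 0.25, 0.25]  # all outcomes equally likely
skewed = [0.70, 0.05, 0.10, 0.15]   # one outcome much more likely than the others
certain = [1.0, 0.0, 0.0, 0.0]      # the outcome is known in advance

print(f"H(uniform) = {shannon_entropy(uniform):.3f} bits")  # maximum for 4 outcomes: log2(4) = 2
print(f"H(skewed)  = {shannon_entropy(skewed):.3f} bits")
print(f"H(certain) = {shannon_entropy(certain):.3f} bits")  # no uncertainty
```

The uniform distribution gives the maximum value, log2(4) = 2 bits, the skewed one gives less, and a certain outcome gives zero.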
Cross-entropy measures the accuracy of probabilistic forecasts, which makes it fundamental for modern forecasting systems, since it rewards estimates that assign high probability to what actually happens. Cross-entropy is also very useful in simulation because it allows us to model even the rarest events, whose direct simulation is computationally expensive.
Cross-entropy measures the difference between two probability distributions for a given random variable or set of events. The information associated with an event quantifies the number of bits needed to encode and transmit it: lower-probability events provide more information, while higher-probability events provide less. Intuitively, if the predicted distribution assigns a high probability to the events that actually occur, the cross-entropy is small; if it assigns them a low probability, the cross-entropy is high.
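The information associated with a single event, often called self-information or surprisal, is -log2 p(x) bits. A quick sketch shows that halving the probability of an event adds one bit of information, and that rare events carry more information than likely ones.

```python
from math import log2


def information_bits(probability):
    """Self-information (surprisal) of an event, in bits: -log2(p)."""
    return -log2(probability)


for prob in (0.5, 0.25, 0.05):
    print(f"p = {prob}: {information_bits(prob):.3f} bits")
```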
At this point, we can define cross-entropy: given two probability distributions, p and q, the cross-entropy, H(p, q), is defined by the following equation:
$$H(p, q) = -\sum_{x} p(x) \log q(x)$$
Here, the sum runs over all the possible values, x, of the random variable; p(x) is the probability under the real distribution (actual values), and q(x) is the probability under the distribution calculated from the statistical model (predicted values).
If the predicted distribution is equal to the real one (q = p), then the cross-entropy equals the entropy, H(p). In practice, this does not happen, and the cross-entropy is the sum of the entropy and a non-negative term that accounts for the divergence between the two distributions, the Kullback-Leibler divergence: $H(p, q) = H(p) + D_{KL}(p \| q)$.
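The decomposition of cross-entropy into entropy plus Kullback-Leibler divergence can be verified numerically. This sketch uses base-2 logarithms and the same two distributions, P and Q, as the cross_entropy.py example in this section; terms with p(x) = 0 are skipped by convention.

```python
from math import log2

p = [0.70, 0.05, 0.10, 0.15]  # actual distribution
q = [0.45, 0.10, 0.20, 0.25]  # predicted distribution


def entropy(p):
    return sum(-pi * log2(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    return sum(-pi * log2(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


h_p, h_pq, kl = entropy(p), cross_entropy(p, q), kl_divergence(p, q)
print(f"H(P) = {h_p:.3f}, D_KL(P||Q) = {kl:.3f}, H(P, Q) = {h_pq:.3f} bits")
print(f"H(P) + D_KL(P||Q) = {h_p + kl:.3f} bits")
print(f"H(P, P) = {cross_entropy(p, p):.3f} bits")  # equals H(P) when q = p
```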
Cross-entropy is commonly used in optimization procedures as a loss function. A loss function allows us to evaluate the performance of a simulation model: the better the model can predict the behavior of the real system, the smaller the values returned by the loss function. As we adjust the algorithm to improve its predictions, the loss function tells us in which direction we are heading: if the loss increases, we are heading in the wrong direction, while if it decreases, we are heading in the right direction.
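An algorithm based on cross-entropy is an iterative procedure in which each iteration can be divided into two phases: first, a random sample of candidate solutions is generated from a parametric probability distribution; then, the parameters of that distribution are updated using the best samples, so that the next iteration produces a better sample.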
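As a hedged illustration of a cross-entropy-based iterative procedure, the following sketch applies the cross-entropy method to a one-dimensional minimization problem. At each iteration, it samples candidates from a Gaussian distribution, keeps the best few (the elite samples), and refits the Gaussian's mean and standard deviation to them. The objective function, the sample sizes, and the small constant added to the standard deviation are all assumptions chosen for illustration.

```python
import numpy as np


def objective(x):
    # Hypothetical function to minimize, chosen for illustration; minimum at x = 2
    return (x - 2.0) ** 2


def cross_entropy_method(f, mu=0.0, sigma=5.0, n_samples=100, n_elite=10,
                         n_iter=50, seed=0):
    """Minimal cross-entropy method for 1-D minimization with a Gaussian sampler."""
    rng = np.random.default_rng(seed)
    for _ in range(n_iter):
        # Phase 1: generate random candidates from the current distribution
        x = rng.normal(mu, sigma, n_samples)
        # Phase 2: keep the best (elite) samples and refit the distribution to them
        elite = x[np.argsort(f(x))[:n_elite]]
        # Small constant keeps the standard deviation strictly positive
        mu, sigma = elite.mean(), elite.std() + 1e-6
    return mu


x_best = cross_entropy_method(objective)
print(f"Estimated minimum at x = {x_best:.4f}")
```

Because the sampling distribution concentrates around the elite samples at each iteration, its mean moves toward the minimum at x = 2 while its standard deviation shrinks.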
Now, let’s learn how to calculate cross-entropy in a Python environment.
Let’s start practicing with cross-entropy by applying the equation defined in the Introducing cross-entropy section to two artificially created distributions. Here is the Python code (cross_entropy.py):
from matplotlib import pyplot
from math import log2

events = ['A', 'B', 'C', 'D']
p = [0.70, 0.05, 0.10, 0.15]
q = [0.45, 0.10, 0.20, 0.25]
print(f'P = {sum(p):.3f}', f'Q = {sum(q):.3f}')

pyplot.subplot(2, 1, 1)
pyplot.bar(events, p)
pyplot.subplot(2, 1, 2)
pyplot.bar(events, q)
pyplot.show()

def cross_entropy(p, q):
    return -sum(pi * log2(qi) for pi, qi in zip(p, q))

h_pq = cross_entropy(p, q)
print(f'H(P, Q) = {h_pq:.3f} bits')
As always, we will analyze the code line by line:
from matplotlib import pyplot
from math import log2
The pyplot module of matplotlib, a Python library for producing high-quality graphics, is used to draw the plots. Next, we imported the log2() function from the math library, which provides access to the mathematical functions defined by the C standard.
events = ['A', 'B', 'C','D']
p = [0.70, 0.05,0.10,0.15]
q = [0.45, 0.10, 0.20,0.25]
print(f'P = {sum(p):.3f}',f'Q = {sum(q):.3f}')
To start, we defined four event labels so that we can identify the events when plotted. Then, we defined two lists with the probabilities associated with these events: recall that, according to the definition of cross-entropy, p indicates the probability under the real distribution (actual values), while q indicates the probability under the distribution calculated from the statistical model (predicted values). Finally, after defining the two probability distributions, we computed the sum of each one to verify that it is equal to 1.
The following result is printed on the screen:
P = 1.000 Q = 1.000
pyplot.subplot(2,1,1)
pyplot.bar(events, p)
pyplot.subplot(2,1,2)
pyplot.bar(events, q)
pyplot.show()
We have plotted two bar graphs for the two probability distributions. A bar chart represents a series of data from different categories: it displays data using multiple bars of the same width, each representing a category. The height of each bar is proportional to the probability value. The following diagram is returned:
Figure 4.10 – Bar plot of probability distributions
def cross_entropy(p, q):
    return -sum(pi * log2(qi) for pi, qi in zip(p, q))
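This function implements the cross-entropy equation: for each event, it multiplies the actual probability by the base-2 logarithm of the predicted probability, sums these products, and changes the sign of the result. To pair the probabilities of the two distributions, we used the zip() function, which returns a zip object: an iterator of tuples in which the elements of the two arguments are coupled position by position.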
h_pq = cross_entropy(p, q)
print(f'H(P, Q) = {h_pq:.3f} bits')
The following result is printed on the screen:
H(P, Q) = 1.505 bits
This is the cross-entropy in bits. Note that cross-entropy is not symmetric: if we repeat the calculation with the order of the distributions reversed, H(Q, P), we generally obtain a different value.
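Swapping the two arguments shows that cross-entropy is not symmetric. This sketch repeats the calculation from cross_entropy.py without the plotting code, once as H(P, Q) and once as H(Q, P).

```python
from math import log2

p = [0.70, 0.05, 0.10, 0.15]
q = [0.45, 0.10, 0.20, 0.25]


def cross_entropy(p, q):
    return -sum(pi * log2(qi) for pi, qi in zip(p, q))


h_pq = cross_entropy(p, q)
h_qp = cross_entropy(q, p)  # arguments reversed
print(f'H(P, Q) = {h_pq:.3f} bits')  # 1.505 bits
print(f'H(Q, P) = {h_qp:.3f} bits')  # 2.012 bits
```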
Now, let's look at a new application in which we will use cross-entropy as a loss function. We will do this for a binary classification case where the outputs belong to only two classes, 0 and 1. In this case, the loss is the average, over all observations, of the cross-entropy between the actual label and the predicted probability. Cross-entropy loss is used to measure the performance of a classification model. The loss is never negative, and 0 corresponds to a perfect model; the goal is generally to bring the loss as close to 0 as possible. For binary classification, the cross-entropy formula becomes:
$$L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right]$$
Here, N is the number of observations, y_i is the label (that is, the actual value), and p_i is the estimated probability of class 1 for the i-th observation.
In binary classification, the probability is modeled as the Bernoulli distribution for the class 1 label: the probability for class 1 is predicted directly by the model, and the probability for class 0 is given as 1 minus the predicted probability.
Here is the Python code (cross_entropy_loss_function.py):
import numpy as np
y = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
p = np.array([0.8, 0.1, 0.9, 0.2, 0.8, 0.1, 0.7, 0.3, 0.6, 0.4])
ce_loss = -sum(y*np.log(p)+(1-y)*np.log(1-p))
ce_loss = ce_loss/len(p)
print(f'Cross-entropy Loss = {ce_loss:.3f} nats')
As always, we will analyze the code line by line:
import numpy as np
Here, we imported the numpy library, which offers additional scientific functions for Python and is designed to perform operations on vectors and multidimensional arrays.
y = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
p = np.array([0.8, 0.1, 0.9, 0.2, 0.8, 0.1, 0.7, 0.3, 0.6, 0.4])
As mentioned previously, y is the label (that is, the actual value) and p is the estimated probability.
ce_loss = -sum(y*np.log(p)+(1-y)*np.log(1-p))
ce_loss = ce_loss/len(p)
print(f'Cross-entropy Loss = {ce_loss:.3f} nats')
The following result is returned:
Cross-entropy Loss = 0.272 nats
Note that the result is in nats and not in bits since we used the natural logarithm. The nat is a logarithmic unit of information or entropy based on natural logarithms, rather than the base-2 logarithms that define the bit.
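The same loss can be written more compactly with np.mean. The sketch below wraps the calculation in a hypothetical helper, binary_cross_entropy, that also clips the probabilities to avoid log(0) when a prediction is exactly 0 or 1 (a common safeguard; the value of eps is an assumption), and converts the result from nats to bits by dividing by ln 2.

```python
import numpy as np


def binary_cross_entropy(y, p, eps=1e-12):
    """Mean binary cross-entropy in nats.

    Probabilities are clipped to [eps, 1 - eps] so that a prediction of exactly
    0 or 1 does not produce log(0); eps is an illustrative value.
    """
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


y = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
p = np.array([0.8, 0.1, 0.9, 0.2, 0.8, 0.1, 0.7, 0.3, 0.6, 0.4])

loss_nats = binary_cross_entropy(y, p)
loss_bits = loss_nats / np.log(2)  # 1 nat = 1/ln(2) bits
print(f'Cross-entropy Loss = {loss_nats:.3f} nats = {loss_bits:.3f} bits')
```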
In this chapter, we addressed the basic concepts of Monte Carlo simulation. We explored the Monte Carlo components used to obtain a simulation with satisfactory results. We then used Monte Carlo methods to estimate the value of Pi.
Then, we tackled two fundamental concepts of Monte Carlo simulation: the law of large numbers and the central limit theorem. For example, the law of large numbers allows us to determine the centers and weights of a Monte Carlo analysis to estimate definite integrals. The central limit theorem is of great importance, and it is thanks to this that many statistical procedures work.
Next, we analyzed practical applications of Monte Carlo methods in real life: numerical optimization and project management. Then, we learned how to perform numerical integration using Monte Carlo techniques.
Finally, sensitivity analysis concepts and cross-entropy methods were explained using some practical examples.
In the next chapter, we will learn the basic concepts of the Markov process. We will understand the agent-environment interaction process and how to use Bellman equations as consistency conditions for the optimal value functions to determine the optimal policy. Finally, we will learn how to implement Markov chains to simulate random walks.