Appendix I: Introduction to Probability and Statistics

As Heraclitus said, “No man ever steps in the same river twice”; the world around us is fast moving and full of randomness. Either the world is created that way, so that everything is inherently random, or our knowledge is simply insufficient to understand the laws behind the apparent randomness. Regardless, we have to deal with the randomness and uncertainty represented in the random data we observe. The primary tools we humans possess for dealing with randomness are probability and statistics. Probability is a branch of theoretical mathematics that studies the likelihood that a certain event will occur; statistics, on the other hand, is a branch of applied mathematics that measures events occurring in the world through observation and presents the data collected. Both fields are important for understanding the phenomena occurring in the world: probability provides the theoretical foundation for data analysis in statistics, while statistics uses probability theory to guide the measurement and analysis of the random data observed in the world. Systems engineering efforts last years, even decades; they involve many people and thus inevitably a great deal of randomness and uncertainty, for example, the number of defects appearing in a production batch, the market share of the developed system, or the errors occurring in the manufacturing process. We have seen many examples in the book’s chapters. Understanding the basic concepts of probability and statistics is essential for the successful development and implementation of a complex system, and it is expected that probability and statistics will be included in any systems engineering curriculum as a required course. Here we include a brief review of the basic concepts of probability theory for readers to refresh their memory of these subjects.

I.1 Basic Probability Concepts

As mentioned above, probability studies the randomness of the universe. For example, if we toss a coin, when it lands it will show either heads or tails. We cannot say for sure whether it will be heads or tails before it lands; the result of tossing a coin is a typical random event that many people use to demonstrate probability. If the coin is fair, we know the chance of a head or a tail is 50% each. This simple example includes all the fundamental concepts of probability: experiment, event, sample, sample space, and probability.

I.1.1 Experiment, Sample, Sample Space and Universe/Population

I.1.1.1 Experiment

In probability theory, an experiment can be loosely defined as an orderly procedure carried out for a certain scientific purpose, such as verifying the chance of a certain thing occurring. Examples of experiments include tossing a coin, throwing a die, drawing a card from a deck of cards, testing a rocket launch, predicting tomorrow’s temperature, counting the number of customers in a store, and so on.

I.1.1.2 Sample, Sample Space, and Event

The outcome of an experiment is called a sample. For example, when tossing a coin, the possible samples are heads or tails; when throwing a die, the sample would be 1, 2, 3, 4, 5, or 6 dots on the uppermost face. We denote a sample as ω, so for throwing a die, ω = 1, 2, 3, 4, 5, or 6. The set that includes all the possible samples is called the sample space (Ω); for example, for throwing a die, Ω = {1, 2, 3, 4, 5, 6}. An event is a collection of samples resulting from a certain experiment; for example, if we toss a coin twice, the possible events for this experiment include {H,H}, {H,T}, {T,H}, and {T,T}. From this perspective, we can see that events are subsets of the sample space; if we denote an event as Ai, then Ai ⊆ Ω. If an outcome of an experiment ω ∈ A, then we say that event A occurs. The set of all possible events together forms the largest possible sample space; we also call it the universe.
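To make these definitions concrete, here is a minimal Python sketch (assuming the two-coin-toss experiment described above) that builds the sample space as a set and treats events as subsets of it:

```python
from itertools import product

# Sample space for tossing a coin twice: all ordered pairs of H/T.
omega = set(product("HT", repeat=2))
print(omega)  # {('H','H'), ('H','T'), ('T','H'), ('T','T')}

# An event is a subset of the sample space, e.g., "at least one head".
at_least_one_head = {s for s in omega if "H" in s}
print(at_least_one_head <= omega)  # True: every event is a subset of omega
```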

Set theory can be used for operations on events, such as the intersection (∩) or union (∪) of events. For example, for events A and B, A ∪ B, A ∩ B, and A − B = A ∩ B̄ are all new events (B̄ = Ω − B, the complement of B). Some of the notations for events are:

A = B: A and B are the same event

A ∪ B: union of A and B, meaning at least one of the events occurs

A ∩ B: intersection of A and B, meaning both events occur

A − B: A occurs and B does not occur

All the computation rules of set operations will apply for events; for example,

$A \cup B = B \cup A$ (I.1)

$A \cap B = B \cap A$ (I.2)

$A \cup (B \cup C) = (A \cup B) \cup C$ (I.3)

$A \cap (B \cap C) = (A \cap B) \cap C$ (I.4)

$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$ (I.5)

$A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$ (I.6)
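As a quick sanity check, the identities (I.1) through (I.6) can be verified on concrete events with Python’s built-in set operations; the particular sets below are arbitrary choices, not taken from the text:

```python
# Spot-check the set identities (I.1)-(I.6) on arbitrary concrete events
# drawn from the die-throw sample space {1, ..., 6}.
A, B, C = {1, 2, 3}, {2, 4, 6}, {3, 4, 5}

assert A | B == B | A                    # (I.1) commutativity of union
assert A & B == B & A                    # (I.2) commutativity of intersection
assert A | (B | C) == (A | B) | C        # (I.3) associativity of union
assert A & (B & C) == (A & B) & C        # (I.4) associativity of intersection
assert A & (B | C) == (A & B) | (A & C)  # (I.5) intersection distributes over union
assert A | (B & C) == (A | B) & (A | C)  # (I.6) union distributes over intersection
print("All six identities hold for these sets.")
```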

I.1.1.3 Probability

Probability measures the likelihood of certain events occurring; the probability of event A is denoted as P(A). Probability has the following characteristics:

$0 \le P(A) \le 1$ (I.7)

$P(\Omega) = 1$ (I.8)

$P(\varnothing) = 0$ (I.9)

$A \cap B = \varnothing \Rightarrow P(A \cap B) = 0$ (I.10)

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$ (I.11)

I.1.1.3.1 Mutually Exclusive Events

Events A and B are said to be mutually exclusive if A and B cannot both occur at the same time. For example, if a coin is tossed, A (the event of a head showing) and B (the event of a tail showing) are two mutually exclusive events, since a head and a tail cannot both be showing at the same time; that is, P(A ∩ B) = 0.

For mutually exclusive events, the probability of either event occurring equals the sum of the probabilities of the individual events occurring, or

$P(A \cup B) = P(A) + P(B)$

If this is a fair coin, meaning the probability of a head or a tail showing is 50% each, then $P(A \cup B) = P(A) + P(B) = 0.5 + 0.5 = 1$.

I.1.1.3.2 Independent Events

Event A is said to be independent of event B if the probability of A occurring does not depend upon whether event B occurs; in other words, the occurrence of A has nothing to do with event B, and vice versa.

For example, if we toss a coin twice, the event of a head on the first toss (event A) is independent of the event of a head on the second toss (event B). If two events are independent of each other, then the probability that both events occur is the product of the two individual probabilities. So, the probability of two heads showing in a row if we toss a fair coin twice is

$P(A \cap B) = P(A)P(B) = 0.5 \times 0.5 = 0.25$
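A short Monte Carlo simulation illustrates the product rule numerically; this is a minimal sketch assuming a fair coin simulated with Python’s random module:

```python
import random

random.seed(0)  # fixed seed for reproducibility
trials = 100_000

# Simulate two independent tosses of a fair coin per trial and count
# the trials in which both tosses land heads; the relative frequency
# should be close to P(A)P(B) = 0.5 * 0.5 = 0.25.
both_heads = 0
for _ in range(trials):
    first = random.random() < 0.5   # event A: head on the first toss
    second = random.random() < 0.5  # event B: head on the second toss
    both_heads += first and second

print(both_heads / trials)  # approximately 0.25
```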

I.1.1.3.2.1 Conditional Probability

Suppose that we throw a fair die twice in a row; we know that the probability of a sum of 2 for the two throws is 1/36 (i.e., throwing 1 on the first throw and 1 on the second throw, the two throws being independent of each other). However, once we have observed the result of the first throw, the probability becomes a conditional probability, since the first throw affects the probability of the sum. If we see that the first throw is 1, then the probability of a sum of 2 for the two throws is 1/6; if, however, we observe that the first throw is 3, then the probability of a sum of 2 is zero, since there is no way to reach a total of 2 if the first throw exceeds 1. This example illustrates the concept of conditional probability. If we denote A and B as two events, the probability of A occurring given that B has occurred is denoted by the conditional probability

P(A|B)

The conditional probability equals the probability of A and B both occurring divided by the probability of B occurring,

$P(A|B) = \dfrac{P(A \cap B)}{P(B)}$ (I.12)
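The worked dice example can be checked against (I.12) by brute-force enumeration; a minimal sketch:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of two throws of a fair die.
omega = set(product(range(1, 7), repeat=2))

A = {(i, j) for (i, j) in omega if i + j == 2}  # sum of the two throws is 2
B = {(i, j) for (i, j) in omega if i == 1}      # first throw shows 1

def prob(event):
    # Classical probability: favorable outcomes over total outcomes.
    return Fraction(len(event), len(omega))

# Conditional probability via (I.12): P(A|B) = P(A and B) / P(B).
print(prob(A & B) / prob(B))  # 1/6, as in the text
```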

In conditional probability, Bayes’ theorem (formula) is one of the most important and fundamental concepts that one should remember.

For two events A and B, the conditional probability P(A|B) is obtained by

$P(A|B) = \dfrac{P(B|A)\,P(A)}{P(B)}$ (I.13)

Given that events B1, B2, …, Bn are mutually exclusive and collectively exhaustive (exactly one of them must occur), and supposing that event A has occurred, we can determine the probability that each Bj occurred by using a more general form of Bayes’ theorem:

$P(B_j|A) = \dfrac{P(A|B_j)\,P(B_j)}{\sum_{i=1}^{n} P(A|B_i)\,P(B_i)}$ (I.14)

P(Bj) is called the prior probability and P(Bj|A) the posterior probability.
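A small numeric sketch of (I.13)–(I.14); the suppliers and all the probabilities below are hypothetical, chosen only to exercise the formula:

```python
# Hypothetical setting: parts come from two suppliers B1 and B2
# (mutually exclusive and exhaustive), and A is the event that a
# randomly sampled part is defective.
priors = {"B1": 0.6, "B2": 0.4}          # P(B_j), assumed
likelihoods = {"B1": 0.02, "B2": 0.05}   # P(A | B_j), assumed

# Total probability of A: the denominator of (I.14).
p_a = sum(likelihoods[b] * priors[b] for b in priors)

# Posterior P(B_j | A) for each supplier via Bayes' theorem.
posteriors = {b: likelihoods[b] * priors[b] / p_a for b in priors}

print(round(p_a, 3))                                     # 0.032
print({b: round(q, 3) for b, q in posteriors.items()})   # {'B1': 0.375, 'B2': 0.625}
```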

I.2 Random Variables and Distribution Function

Using events to describe randomness is quite flexible; however, it is not well suited to numeric analysis. We need a more formal way to represent random events and the outcomes of experiments. These real-valued quantities representing the outcomes of random events are known as random variables.

A random variable is a variable whose value is subject to the outcome of random events. Each value is associated with a chance of that value occurring, which is the probability of the random variable taking that value. The function that describes the relation between the values of a random variable and their associated probabilities is called the probability distribution function; for a discrete variable it is commonly called the probability mass function, and for a continuous variable the probability density function (p.d.f.).

For example, if we let X be the random variable for the outcome of a throw of a fair die, then

X=1, 2, 3, 4, 5, or 6

and their probabilities are specified as follows:

$P(X=1) = P(X=2) = P(X=3) = P(X=4) = P(X=5) = P(X=6) = \dfrac{1}{6}$

For another experiment of tossing two fair dice, if we define the random variable X as the sum of the values of two dice, then

$X \in \{2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12\}$

And the probability distribution function for this random variable is

$P(X=2) = \dfrac{1}{36}$

$P(X=3) = \dfrac{2}{36}$

$P(X=4) = \dfrac{3}{36}$

$P(X=5) = \dfrac{4}{36}$

$P(X=6) = \dfrac{5}{36}$

$P(X=7) = \dfrac{6}{36}$

$P(X=8) = \dfrac{5}{36}$

$P(X=9) = \dfrac{4}{36}$

$P(X=10) = \dfrac{3}{36}$

$P(X=11) = \dfrac{2}{36}$

$P(X=12) = \dfrac{1}{36}$

The reader can verify that the above events are mutually exclusive and that the sum of all the probabilities equals 1; that is to say,

$\sum_{i=2}^{12} P(X=i) = 1$
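The table above can be reproduced by enumerating all 36 outcomes; a minimal Python sketch:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Tally the sum of two fair dice over all 36 equally likely outcomes.
counts = Counter(d1 + d2 for d1, d2 in product(range(1, 7), repeat=2))
pmf = {s: Fraction(c, 36) for s, c in sorted(counts.items())}

for s, p in pmf.items():
    print(f"P(X = {s:2d}) = {p}")

print("total:", sum(pmf.values()))  # 1, confirming the probabilities sum to 1
```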

Here, we formally define the probability distribution function.

A random variable is called discrete if its possible values form a finite or countable set (a countable set means that you can assign a one-to-one numbering from the natural numbers 1, 2, 3, … to the elements of the set; a countable set may be finite or infinite).

We denote the discrete random variable as X and its possible values as {x1, x2, x3, …, xn, …}; the p.d.f. of X is denoted as

$f(x_i) = P(X = x_i), \quad i = 1, 2, 3, \ldots$

and

$\sum_i f(x_i) = 1$

The cumulative distribution function, denoted as F(X), is defined as

$F(a) = P(X \le a) = \sum_{\text{all } x_i \le a} P(x_i)$ (I.15)

For example, if we toss a fair die, the cumulative probability F(4) is obtained as

$F(4) = \sum_{\text{all } x_i \le 4} P(x_i) = P(1) + P(2) + P(3) + P(4) = \dfrac{1}{6} + \dfrac{1}{6} + \dfrac{1}{6} + \dfrac{1}{6} = \dfrac{2}{3}$
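The same computation in code, following (I.15) (a sketch assuming a fair die):

```python
from fractions import Fraction

# p.d.f. of a fair die: each face has probability 1/6.
pmf = {i: Fraction(1, 6) for i in range(1, 7)}

def F(a):
    # Cumulative distribution (I.15): sum P(x_i) over all x_i <= a.
    return sum(p for x, p in pmf.items() if x <= a)

print(F(4))  # 2/3, matching the worked example above
```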

I.2.1 Continuous Random Variables

If the random variable X takes real values from an interval of the real line, then X is called a continuous random variable. A continuous random variable is described by its probability density function f(x); note that f(a) is not the probability that X equals a exactly (for a continuous variable that probability is zero), but rather the density of probability near a, so that

$P(a \le X \le b) = \int_a^b f(x)\,dx$

And for the cumulative probability function F(a),

$F(a) = P\{X \in (-\infty, a]\} = \int_{-\infty}^{a} f(x)\,dx$ (I.16)

Interpreting this graphically, the cumulative probability of a continuous random variable is the area to the left of the value under the p.d.f.

I.2.1.1 Expected Value and Variance of the Random Variable

The expected value of random variable X is also sometimes called the mean value of X, or the first moment of X, denoted as E[X], or sometimes μ.

If X is a discrete random variable with probability function p(x), then the expected value of X is defined as

$E[X] = \sum_{x:\,p(x)>0} x\,p(x)$ (I.17)

We can see that the expected value of a discrete random variable is the sum of its possible values, each weighted by its corresponding probability.

If X is a continuous random variable, then its expected value is defined as

$E[X] = \int_{-\infty}^{+\infty} x\,f(x)\,dx$ (I.18)

Expectation of a function of a random variable:

If X is a discrete random variable with probability function p(x), and g is a real-valued function defined on the values of X, then the expected value of the composite function is

$E[g(X)] = \sum_{x:\,p(x)>0} g(x)\,p(x)$ (I.19)

If X is a continuous random variable with probability density function f(x), and g is a real-valued function defined on the values of X, then the expected value of the composite function is

$E[g(X)] = \int_{-\infty}^{+\infty} g(x)\,f(x)\,dx$ (I.20)

The variance of the random variable X is another important measure of the variable, usually denoted as σ². The variance is defined as

$\sigma^2 = \mathrm{Var}(X) = E\left[(X - E[X])^2\right]$ (I.21)

So for a discrete random variable

$\sigma^2 = \mathrm{Var}(X) = E\left[(X - E[X])^2\right] = \sum_{x:\,p(x)>0} (x - E[X])^2\,p(x)$ (I.22)

For a continuous random variable,

$\sigma^2 = \mathrm{Var}(X) = E\left[(X - E[X])^2\right] = \int_{-\infty}^{+\infty} (x - E[X])^2\,f(x)\,dx$ (I.23)
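Definitions (I.17) and (I.22) translate directly into code; a sketch for a fair die, using exact fractions:

```python
from fractions import Fraction

# Fair die: values 1..6, each with probability 1/6.
pmf = {i: Fraction(1, 6) for i in range(1, 7)}

# Expected value per (I.17): sum of value times probability.
mean = sum(x * p for x, p in pmf.items())

# Variance per (I.22): expected squared deviation from the mean.
var = sum((x - mean) ** 2 * p for x, p in pmf.items())

print(mean)  # 7/2
print(var)   # 35/12
```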

In the next section, we will review some of the popular random variables and distribution functions.

I.3 Some Commonly Used Probability Functions

I.3.1 Discrete Random Variables

I.3.1.1 Bernoulli Random Variable

The Bernoulli random variable has the following distribution:

For an experiment, there are two possible outcomes, success or failure (such as tossing a coin, if we call a head a success and a tail a failure). If we let X = 1 when a success occurs and X = 0 when a failure occurs, then X is called a Bernoulli random variable with success probability p, 0 ≤ p ≤ 1:

$p(1) = p$

$p(0) = 1 - p$

The expected value E[X] for a Bernoulli random variable is

$E[X] = p \cdot 1 + (1-p) \cdot 0 = p$

And the variance is

$\sigma^2 = \mathrm{Var}(X) = E\left[(X - E[X])^2\right] = p(1-p)^2 + (1-p)(0-p)^2 = p(1-p)^2 + (1-p)p^2 = p(1-p)$

I.3.1.2 Binomial Random Variable

Suppose we perform the above Bernoulli experiment n times. If we denote X as the number of successes occurring in the n trials, then X is called a binomial random variable. The p.d.f of a binomial random variable is

$p(i) = \dbinom{n}{i} p^i (1-p)^{n-i}$ (I.24)

where

$\dbinom{n}{i} = \dfrac{n!}{i!\,(n-i)!}$

The expected value for a binomial random variable is

$E[X] = np$ (I.25)

and the variance of a binomial random variable is

$\sigma^2 = \mathrm{Var}(X) = np(1-p)$ (I.26)
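A brief check of (I.24)–(I.26), computing the probability function with math.comb for the hypothetical values n = 10, p = 0.3:

```python
from math import comb

n, p = 10, 0.3  # hypothetical parameters

# Binomial p.d.f. (I.24): probability of exactly i successes in n trials.
pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]

mean = sum(i * q for i, q in enumerate(pmf))
var = sum((i - mean) ** 2 * q for i, q in enumerate(pmf))

print(round(sum(pmf), 10))  # 1.0: probabilities sum to one
print(round(mean, 10))      # 3.0 = np, per (I.25)
print(round(var, 10))       # 2.1 = np(1-p), per (I.26)
```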

I.3.1.3 Poisson Random Variable

A random variable X is said to be a Poisson random variable if X takes values in the set of nonnegative integers {0, 1, 2, 3, …} and its probability function is

$p(i) = P[X = i] = \dfrac{e^{-\lambda} \lambda^i}{i!}, \quad i = 0, 1, 2, \ldots$ (I.27)

The expected value of a Poisson distribution is

$E[X] = \lambda$ (I.28)

and the variance of a Poisson distribution is the same, that is

$\sigma^2 = \mathrm{Var}(X) = \lambda$ (I.29)
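A numeric sketch of (I.27)–(I.29) with a hypothetical rate λ = 4, building the probability function recursively to avoid large factorials and truncating the infinite sum deep in the tail:

```python
from math import exp

lam = 4.0  # hypothetical Poisson rate
N = 100    # truncation point; the tail beyond is negligible for lam = 4

# Build p(i) = e^(-lam) * lam^i / i! recursively: p(i) = p(i-1) * lam / i.
pmf = [exp(-lam)]
for i in range(1, N):
    pmf.append(pmf[-1] * lam / i)

mean = sum(i * p for i, p in enumerate(pmf))
var = sum((i - mean) ** 2 * p for i, p in enumerate(pmf))

print(round(sum(pmf), 6))  # ~1.0
print(round(mean, 6))      # ~4.0 = lambda, per (I.28)
print(round(var, 6))       # ~4.0 = lambda, per (I.29)
```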

I.3.2 Continuous Random Variables

I.3.2.1 Uniform Random Variable

The general uniform random variable defined on the interval [a, b] has the following p.d.f.:

$f(x) = \begin{cases} \dfrac{1}{b-a}, & \text{if } a < x < b \\ 0, & \text{otherwise} \end{cases}$ (I.30)

Figure I.1 illustrates the p.d.f. for a uniform random variable.

The cumulative probability distribution for a uniform random variable is

$F(x) = \begin{cases} 0, & \text{if } x \le a \\ \dfrac{x-a}{b-a}, & \text{if } a < x < b \\ 1, & \text{if } x \ge b \end{cases}$ (I.31)

The expected value for the uniform distribution is

$E[X] = \dfrac{1}{2}(a+b)$ (I.32)

and the variance for the uniform distribution is

$\mathrm{Var}(X) = \dfrac{1}{12}(b-a)^2$ (I.33)
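A quick numerical confirmation of (I.32) and (I.33) on a hypothetical interval [a, b] = [2, 5], using a midpoint Riemann sum for the integrals:

```python
a, b = 2.0, 5.0    # hypothetical interval
steps = 100_000
dx = (b - a) / steps

# Midpoints of the integration grid; the density is 1/(b-a) throughout.
xs = [a + (k + 0.5) * dx for k in range(steps)]

mean = sum(x * dx / (b - a) for x in xs)
var = sum((x - mean) ** 2 * dx / (b - a) for x in xs)

print(round(mean, 6))  # 3.5  = (a + b) / 2
print(round(var, 6))   # 0.75 = (b - a)**2 / 12
```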

I.3.2.2 Exponential Random Variables

In the chapter on queuing theory, we saw that the exponential distribution was used widely. An exponential random variable with parameter λ (λ > 0) has the following distribution function:

$f(x) = \begin{cases} \lambda e^{-\lambda x}, & \text{if } x \ge 0 \\ 0, & \text{if } x < 0 \end{cases}$ (I.34)

Figure I.2 illustrates the exponential function.

The cumulative distribution function F is

$F(a) = \int_0^a \lambda e^{-\lambda x}\,dx = 1 - e^{-\lambda a}, \quad \text{for } a > 0$ (I.35)

The expected value of the exponential function is

$E[X] = \dfrac{1}{\lambda}$ (I.36)

and the variance of the exponential variable is

$\mathrm{Var}(X) = \dfrac{1}{\lambda^2}$ (I.37)
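A sampling sketch for (I.36)–(I.37): random.expovariate draws exponential variates with a given rate; here λ = 2 is a hypothetical choice:

```python
import random

random.seed(0)
lam = 2.0   # hypothetical rate parameter
n = 200_000

# Draw exponential samples; random.expovariate takes the rate lambda.
xs = [random.expovariate(lam) for _ in range(n)]

mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n

print(round(mean, 3))  # ~0.5  = 1/lam, per (I.36)
print(round(var, 3))   # ~0.25 = 1/lam**2, per (I.37)
```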

I.3.2.3 Normal Random Variable

The probability distribution function for a normal random variable is

$f(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/2\sigma^2}$ (I.38)

The normal distribution is illustrated in Figure I.3.

As can easily be seen, the normal distribution is a bell-shaped curve, symmetric about its mean value μ.

The expected value is

$E[X] = \mu$ (I.39)

The variance is

$\mathrm{Var}(X) = \sigma^2$ (I.40)
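Similarly, (I.39) and (I.40) can be checked by sampling with random.gauss; μ = 10 and σ = 2 are hypothetical values:

```python
import random

random.seed(0)
mu, sigma = 10.0, 2.0  # hypothetical parameters
n = 200_000

# random.gauss(mu, sigma) draws from the normal distribution.
xs = [random.gauss(mu, sigma) for _ in range(n)]

mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n

print(round(mean, 2))  # ~10.0 = mu, per (I.39)
print(round(var, 2))   # ~4.0  = sigma**2, per (I.40)
```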

I.3.2.4 Lognormal Random Variable

X is said to be a lognormal random variable if its logarithm ln X is a normal random variable. The lognormal distribution has the following form, with parameters μ and σ:

$f(x) = \dfrac{1}{x\sigma\sqrt{2\pi}}\, e^{-(\ln x - \mu)^2/2\sigma^2}, \quad x > 0$ (I.41)

When parameters μ and σ change, the shape of the lognormal function also changes. μ is called its scale parameter and σ is called its shape parameter. Figures I.4 and I.5 illustrate the lognormal function with different parameters.

The expected value of the lognormal distribution is

$E[X] = e^{\mu + \sigma^2/2}$ (I.42)

and the variance of the lognormal distribution is

$\mathrm{Var}(X) = \left(e^{\sigma^2} - 1\right) e^{2\mu + \sigma^2}$ (I.43)
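The moment formulas (I.42)–(I.43) can likewise be checked by simulation; random.lognormvariate(mu, sigma) returns e^N with N normal, and μ = 0.2, σ = 0.5 below are hypothetical values:

```python
import random
from math import exp

random.seed(0)
mu, sigma = 0.2, 0.5  # hypothetical parameters
n = 500_000

# random.lognormvariate(mu, sigma) returns exp(N), N ~ Normal(mu, sigma).
xs = [random.lognormvariate(mu, sigma) for _ in range(n)]

mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n

print(round(mean, 3), round(exp(mu + sigma**2 / 2), 3))  # sample vs. (I.42)
print(round(var, 3),
      round((exp(sigma**2) - 1) * exp(2 * mu + sigma**2), 3))  # sample vs. (I.43)
```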

I.3.2.5 Weibull Random Variable

A Weibull random variable has the following distribution function:

$f(x) = \dfrac{\alpha}{\beta^\alpha}\, x^{\alpha-1} e^{-(x/\beta)^\alpha}, \quad x \ge 0$ (I.44)

where α > 0, β > 0.

Figure I.6 illustrates the Weibull distribution. The Weibull distribution is a more general form as it has the following characteristics:

  1. If α = 1, the Weibull distribution becomes an exponential distribution with parameter β.
  2. If α = 3.4, then the Weibull distribution can approximate a normal distribution.

The mean for the Weibull distribution is

$E[X] = \dfrac{\beta}{\alpha}\,\Gamma\!\left(\dfrac{1}{\alpha}\right)$ (I.45)

and the variance is

$\mathrm{Var}(X) = \dfrac{\beta^2}{\alpha}\left\{2\,\Gamma\!\left(\dfrac{2}{\alpha}\right) - \dfrac{1}{\alpha}\left[\Gamma\!\left(\dfrac{1}{\alpha}\right)\right]^2\right\}$ (I.46)

Γ(·) is the gamma function, defined as

$\Gamma(y) = \int_0^{\infty} t^{y-1} e^{-t}\,dt$ (I.47)
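Finally, a sketch relating (I.45) to sampling; note that Python’s random.weibullvariate takes (scale, shape) in that order, and the values α = 2, β = 3 are hypothetical:

```python
import random
from math import gamma

random.seed(0)
alpha, beta = 2.0, 3.0  # hypothetical shape alpha and scale beta
n = 200_000

# NOTE: random.weibullvariate(scale, shape) -- scale first, shape second.
xs = [random.weibullvariate(beta, alpha) for _ in range(n)]
sample_mean = sum(xs) / n

# Closed-form mean per (I.45): (beta/alpha) * Gamma(1/alpha),
# which equals beta * Gamma(1 + 1/alpha).
exact_mean = (beta / alpha) * gamma(1 / alpha)

print(round(sample_mean, 3))  # ~2.659
print(round(exact_mean, 3))   # 2.659
```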

Figure I.1 Uniform distribution.

Figure I.2 Exponential distribution function.

Figure I.3 Normal distribution.

Figure I.4 Lognormal functions with different values for μ (σ = 1).

Figure I.5 Lognormal functions with different values for σ (μ = 0.2).

Figure I.6 Weibull distribution.
