21

Probability and statistics

21.1 Introduction

Consider a system of computers and communication networks supporting a bank's cash dispensing machines. The machines provide instant money on the street on production of a cash card and a personal identification code with the only prerequisite being that of a positive bank balance.

The communication network must be reliable – that is, not subject too often to breakdowns. The reliability of the system, or of each part of the system, will depend on each of its component parts.

A very simple system, with only two components, can be configured in series or in parallel. If the components are in series then the system will fail if one component fails (Figure 21.1).

Figure 21.1 Components A and B must both function for the system to function.

If the components are in parallel then only one component need function (Figure 21.2) and we have built in redundancy just in case one of the components fails.

Figure 21.2 Either component A or component B must function for the system to function.

A network could be made up of tens, hundreds, or even thousands of components. It is important to be able to estimate the reliability of a complex system, and therefore the necessary rate of replacement of components and the level of annoyance our customers will suffer should the system fail.

The reliability will depend on the reliability of each one of the components but it is impossible to say exactly how long even one single component will last.

We would like to be able to answer questions like:

1. What is the likely lifetime of any one element of the system?

2. How can we estimate the reliability of the whole system by considering the interaction of its component parts?

This is just one example of the type of problem for which we need the ideas of probability and statistics. Others include a factory that produces 1000 electronic components each day, where we wish to estimate how many defective components are produced daily, or a company with eight telephone lines, where we wish to know whether that number is generally sufficient for the calls likely to be received at any one time.

All of these questions need the ideas of probability and statistics. Statistics is used to analyse data and produce predictions from it. Probability is the theory upon which statistical models are built.

For the engineering situations in previous chapters we have assumed that we know all the factors that determine how a system will behave; that is, we have modelled the system deterministically. This is not very realistic. Any real situation will have some random element. Some problems, such as the one we have just introduced, contain such a large random element that it is difficult to predict the behaviour of each part of the system. However, it is possible to say, for instance, what the average overall behaviour would be.

21.2 Population and sample, representation of data, mean, variance and standard deviation

In statistics, we are generally interested in a ‘population’ that is too large for us to measure completely. For instance, we could be interested in all the light bulbs that are produced by a certain manufacturer. The manufacturer may claim that her light bulbs have a lifetime of 1500 h. Clearly this cannot be exact but we might be satisfied to agree with the manufacturer if most of the bulbs have a lifetime greater than 1500 h. If the factory produces half a million light bulbs per year then we do not have the time to test them all. In this case we test a ‘sample’. The larger the sample, the more accurate will be the comparison with the whole population.

We have recorded the lifetimes of a sample of light bulbs in Table 21.1. We can build up a table to give information about our sample. Columns 8 and 9 are given for comparison with statistical modelling, and columns 4, 6, and 7 are to help with the calculation. The following describes each column of Table 21.1, beginning with a list of the ‘raw’ data.

Table 21.1

Frequency distribution of the lifetimes of a sample of light bulbs showing method of calculating the mean and standard deviation. N represents the number of classes, n the number in the sample

Columns: (1) Lifetime (h); (2) class midpoint x_i; (3) class frequency f_i; (4) f_i x_i; (5) cumulative frequency F_i; (6) (x_i − x̄)²; (7) f_i(x_i − x̄)²; (8) relative frequency f_i/n; (9) relative cumulative frequency F_i/n.

(1)          (2)    (3)    (4)      (5)    (6)      (7)        (8)      (9)
900–1000      950     2      1900      2   536849   1073698    0.002    0.002
1000–1100    1050     0         0      2   400309         0    0        0.002
1100–1200    1150     5      5750      7   283769   1418845    0.005    0.007
1200–1300    1250    23     28750     30   187229   4306267    0.023    0.03
1300–1400    1350    47     63450     77   110689   5202383    0.047    0.077
1400–1500    1450   103    149350    180    54149   5577347    0.103    0.18
1500–1600    1550   160    248000    340    17609   2817440    0.16     0.34
1600–1700    1650   190    313500    530     1069    203110    0.19     0.53
1700–1800    1750   165    288750    695     4529    747285    0.165    0.695
1800–1900    1850   164    303400    859    27989   4590196    0.164    0.859
1900–2000    1950    92    179400    951    71449   6573308    0.092    0.951
2000–2100    2050    49    100450   1000   134909   6610541    0.049    1

Totals: n = Σ f_i = 1000; Σ f_i x_i = 1 682 700, giving x̄ = 1 682 700/1000 = 1682.7; F_N = n; Σ f_i(x_i − x̄)² = 39 120 420, giving σ² = 39 120 420/1000 ≈ 39 120.4 and σ ≈ 198; Σ f_i/n = 1; F_N/n = 1.


Column 1: the class intervals

Find the minimum and the maximum of the data. The difference between these gives the range of the data. Choose a class range that is some convenient multiple of 10, 100, 1000, etc. (like the class range of 100 above); the data range divided by the class range, rounded up to the next integer, gives the number of classes (usually up to about 20). The class intervals are chosen so that the lowest class minimum is below the minimum data value. In our example, the lowest class minimum has been chosen as 900. Add the class range of 100 to find the first class interval, 900–1000, and carry on adding the class range to find each subsequent class interval until you pass the maximum value.

Column 2: the class midpoint

The class midpoint is found by adding the minimum and maximum values of the class interval and dividing by 2. For the interval 1300–1400, the class midpoint is (1300 + 1400)/2 = 1350.

The class midpoint is taken as a representative value for the class. That is, for the sake of simplicity we treat the data in the class 1300–1400 as though there were 47 values at 1350. This is an approximation that is not too serious if the classes are not too wide.

Column 3: class frequency and the total number in the sample

The class frequency is found by counting the number of data values that fall within that class range. For instance, there are 47 values between 1300 and 1400. You have to decide whether to include values that fall exactly on the class boundary in the class above or the class below. It does not usually matter which you choose as long as you are consistent within the whole data set.

The total number of values in the sample is given by the sum of the frequencies for each class interval.

Column 4: f_i x_i and the mean of the sample, x̄

This is the product of the frequency and the class midpoint. If the class midpoint is taken as a representative value for the class then f_i x_i gives the approximate sum of all the values in that class. For instance, in the eighth class, with class midpoint 1650, there are 190 values. If we say that each of the values is approximately 1650 then the total of all the values in that class is 1650 × 190 = 313 500. Summing this column gives the approximate total value for the whole sample, 1 682 700 h. If we only used one light bulb at a time and changed it when it failed, then the 1000 light bulbs would last approximately 1 682 700 h. The mean of the sample is this total divided by the number in the sample, 1000, giving 1 682 700/1000 = 1682.7 h. The mean is a measure of central tendency. That is, if you wanted a single number to sum up the lifetime of these light bulbs, you would say that the average life was 1683 h.

Column 5: the cumulative frequency

The cumulative frequency gives the number that falls into the current class interval or any class interval that comes before it. You could think of it as ‘the number so far’ function. To find the cumulative frequency for a class, take the number in the current class and add on the previous cumulative frequency for the class below, for example, for 1900–2000 we have a frequency of 92. The cumulative frequency for 1800–1900 is 859. Add 859 + 92 to get the cumulative frequency of 951. That is, 951 light bulbs in the sample have a lifetime below 2000 h. Notice that the cumulative frequency of the final class must equal the total number in the sample. This is because the final class must include the maximum value in the sample and all the others have lifetimes less than this.

Column 6: (x_i − x̄)², the squared deviation

This column is used in the process of calculating the standard deviation (see column 7). (x_i − x̄) represents the amount by which the class midpoint differs from the mean of the sample. If we want to measure how spread out around the mean the data are, then this would seem like a useful number. However, adding up the deviations over the whole sample just gives zero, as the positive and negative values cancel each other. Hence, we take the square, so that all the numbers are positive. For the sixth class interval the class midpoint is 1450; subtracting the mean, 1682.7, gives −232.7, which when squared is 54 149.3. This value is the square of the deviation from the mean, or just the squared deviation.

Column 7: f_i(x_i − x̄)², the variance and the standard deviation

For each class we multiply the frequency by the squared deviation, calculated in column 6. This gives an approximation to the total squared deviation for that class. For the sixth class we multiply the squared deviation of 54 149 by the frequency 103 to get 5 577 347. The sum of this column gives the total squared deviation from the mean for the whole sample. Dividing by the number of sample points gives an idea of the average squared deviation. This is called the variance. It is found by summing column 7 and dividing by 1000, the number in the sample, giving a variance of 39 120. A better measure of the spread of the data is given by the square root of this number, called the standard deviation and usually represented by σ. Here σ = √39 120 ≈ 198.

Column 8: the relative frequency

If instead of 1000 data values in the sample we had 2000, 10 000, or 500, we might expect that the proportion falling into each class interval would remain roughly the same. This proportion of the total number is called the relative frequency and is found by dividing the frequency by the total number in the sample. Hence, for the third class, 1100–1200, we find 5/1000 = 0.005.

Column 9: the cumulative relative frequency

By the same argument as above we would expect the proportion with a lifetime of less than 1900 h to be roughly the same whatever the sample size. This cumulative relative frequency can be found by dividing the cumulative frequency by the number in the sample. Notice that the cumulative relative frequency for the final class interval is 1. That is, the whole of the sample has a lifetime of less than 2000 h.
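The column-by-column recipe can be sketched in Python, starting from just the class midpoints and frequencies of Table 21.1 (variable names are ours). One caution: exact arithmetic gives a total squared deviation of 39 120 710, a little above the table's 39 120 420, because the table rounds the column 6 entries to whole numbers before multiplying; the standard deviation still rounds to 198.

```python
# A sketch of the Table 21.1 calculations, assuming only the class
# midpoints and frequencies (variable names are ours, not the book's).

midpoints = [950, 1050, 1150, 1250, 1350, 1450,
             1550, 1650, 1750, 1850, 1950, 2050]
freqs = [2, 0, 5, 23, 47, 103, 160, 190, 165, 164, 92, 49]

n = sum(freqs)                                  # total number in the sample
fx = [f * x for f, x in zip(freqs, midpoints)]  # column 4: f_i * x_i
mean = sum(fx) / n                              # sample mean, 1682.7

cum = []                                        # column 5: "the number so far"
running = 0
for f in freqs:
    running += f
    cum.append(running)

sq_dev = [(x - mean) ** 2 for x in midpoints]       # column 6
f_sq_dev = [f * d for f, d in zip(freqs, sq_dev)]   # column 7
variance = sum(f_sq_dev) / n                        # mean squared deviation
std_dev = variance ** 0.5                           # about 198

rel_freq = [f / n for f in freqs]               # column 8
rel_cum = [c / n for c in cum]                  # column 9
```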

The data can be represented in a histogram as in Figure 21.3, which gives a simple pictorial representation of the data. The right-hand side has a scale equal to the left-hand scale divided by the number in the sample. These readings, therefore, give the relative frequencies and the cumulative relative frequency as given in Table 21.1.

Figure 21.3 Histogram of the lifetimes of a sample of light bulbs

We may want to sum up our findings with a few simple statistics. To do this we use a measure of the central tendency and the dispersion of the data, which are calculated as already described above. These are summarized below.

Central tendency – the mean

The most commonly used measure of the central tendency is the mean, x̄:

x̄ = (1/n) Σ_i f_i x_i = (Σ_i f_i x_i)/n

where x_i is a representative value for the class and f_i is the class frequency. The summation is over all classes.

If the data have not been grouped into classes then the mean is found by summing all the individual data values and dividing by the number in the sample:

x̄ = (1/n) Σ_i x_i

where the summation is now over all sample values.

Dispersion – the standard deviation

The variance is the mean squared deviation, given by

v = σ² = (1/n) Σ_i f_i (x_i − x̄)² = Σ_i (f_i/n)(x_i − x̄)²

where x̄ is the sample mean, x_i is a representative value for the class, f_i is the class frequency, and the summation is over all classes. This value is the same as

v = σ² = (1/n) Σ_i f_i x_i² − x̄²

which can sometimes be quicker to calculate.
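The claimed equivalence of the two formulas can be checked numerically on the light-bulb classes of Table 21.1 (a quick sketch; variable names are ours):

```python
# Numerical check, on the Table 21.1 classes, that the shortcut
# (1/n) sum(f_i x_i^2) - mean^2 equals the mean squared deviation.
midpoints = [950, 1050, 1150, 1250, 1350, 1450,
             1550, 1650, 1750, 1850, 1950, 2050]
freqs = [2, 0, 5, 23, 47, 103, 160, 190, 165, 164, 92, 49]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n

# definition: mean squared deviation from the sample mean
var_definition = sum(f * (x - mean) ** 2
                     for f, x in zip(freqs, midpoints)) / n
# shortcut: mean of the squares minus the square of the mean
var_shortcut = sum(f * x * x for f, x in zip(freqs, midpoints)) / n - mean ** 2
```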

For data not grouped into classes we have

v = σ² = (1/n) Σ_i (x_i − x̄)²

or

v = σ² = (1/n) Σ_i x_i² − x̄²

where the summation is now over all sample values.

The standard deviation is given by the square root of the variance:

σ = √[(1/n) Σ_i f_i (x_i − x̄)²].

We have already calculated the average lifetime and the standard deviation of the lifetimes of the sample of light bulbs. We repeat the calculation below. The mean is given by

(2 × 950 + 0 × 1050 + 5 × 1150 + 23 × 1250 + 47 × 1350 + 103 × 1450 + 160 × 1550 + 190 × 1650 + 165 × 1750 + 164 × 1850 + 92 × 1950 + 49 × 2050)/1000 = 1682.7

where each term is a class frequency multiplied by the corresponding class midpoint.

The variance is given by

(2 × (950 − 1682.7)² + 0 × (1050 − 1682.7)² + 5 × (1150 − 1682.7)² + 23 × (1250 − 1682.7)² + 47 × (1350 − 1682.7)² + 103 × (1450 − 1682.7)² + 160 × (1550 − 1682.7)² + 190 × (1650 − 1682.7)² + 165 × (1750 − 1682.7)² + 164 × (1850 − 1682.7)² + 92 × (1950 − 1682.7)² + 49 × (2050 − 1682.7)²)/1000 = 39 120 420/1000 ≈ 39 120.4

where each term is a class frequency multiplied by the squared deviation of the corresponding class midpoint from the sample mean, 1682.7.

Hence, the standard deviation is

√39 120.4 ≈ 198.

Can we agree with the manufacturer's claim that the light bulbs have a lifetime of 1500 h? The number with a lifetime of less than 1500 h is 180 (the cumulative frequency up to 1500 h). This represents a proportion of only 0.18, less than 20% of the sample. However, this probably does not justify the manufacturer's claim, as nearly 20% of the customers will get light bulbs that are not as good as advertised. We might accept a small number, say 5%, failing to live up to a manufacturer's promise. Hence, it would be better to claim a lifetime of around 1350 h, for which the relative cumulative frequency lies between 0.03 and 0.077. Approximately 0.053, or just over 5%, would fail to live up to the manufacturer's amended claim.

Example 21.1

A sample of 2000-Ω resistors was tested and the resistances were found to be as below:

1997 1998 2004 2002 1999 2000 2001 2002 1999 1998
1997 1999 2001 2003 2005 1996 2000 2000 1998 1999
1998 2001 2003 2002 1996 2002 1995 2000 2000 1999
1997 2004 2001 1999 1999 1998 2002 2003 2002 2001
1999 1998 1997 2002 2001 2000 1999 2001 1999 1998
2000 1999 1998 2000 2002 1999 2001 2000 2001 2002
2004 1996 2000 2002 2000 1998 2000 2005 2000 1999
2000 2000 2001 1999 2001 1997 2002 2001 2004 2000
2000 1999 1998 2001 1998 2000 2002 1998 1998 1999
2002 1999 2001 2002 2006 2000 2001 2001 2002 2003

Group the data in class intervals and represent it using a table and a histogram. Calculate the mean resistance and the standard deviation.

Solution We follow the method of building up the table as described previously.

Column 1: class intervals. To decide on class intervals look at the range of values presented. The minimum value is 1995 and the maximum is 2006. A reasonable number of class intervals would be around 10, and in this case as the data are presented to the nearest whole number, the smallest possible class range we could choose would be 1 Ω. A 1-Ω class range would give 12 classes, which seems a reasonable choice. If we choose the class midpoints to be the integer values then we get class intervals of 1994.5–1995.5, 1995.5–1996.5, etc. We can now produce Table 21.2.
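The interval arithmetic above can be checked with a short sketch (variable names are ours), following the worked choice of integer midpoints:

```python
# Sketch of the class-interval recipe for the resistor data: 1-ohm
# classes centred on the integer values (names are ours).
lo, hi = 1995, 2006                  # minimum and maximum readings
width = 1                            # chosen class range, 1 ohm
num_classes = (hi - lo) // width + 1          # 12 classes
# class boundaries run from 1994.5 up to 2006.5 in steps of the width
boundaries = [lo - 0.5 + k * width for k in range(num_classes + 1)]
midpoints = [lo + k * width for k in range(num_classes)]
```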

Table 21.2

Frequency distribution of resistances of a sample of resistors. N represents the number of classes, n the number in the sample

Columns: (1) Resistance (Ω); (2) class midpoint x_i; (3) class frequency f_i; (4) f_i x_i; (5) cumulative frequency F_i; (6) (x_i − x̄)²; (7) f_i(x_i − x̄)²; (8) relative frequency f_i/n; (9) relative cumulative frequency F_i/n.

(1)              (2)    (3)   (4)     (5)   (6)     (7)     (8)    (9)
1994.5–1995.5   1995     1     1995     1   26.52   26.52   0.01   0.01
1995.5–1996.5   1996     3     5988     4   17.22   51.66   0.03   0.04
1996.5–1997.5   1997     5     9985     9    9.92   49.60   0.05   0.09
1997.5–1998.5   1998    13    25974    22    4.62   60.06   0.13   0.22
1998.5–1999.5   1999    17    33983    39    1.32   22.44   0.17   0.39
1999.5–2000.5   2000    19    38000    58    0.02    0.38   0.19   0.58
2000.5–2001.5   2001    16    32016    74    0.72   11.52   0.16   0.74
2001.5–2002.5   2002    15    30030    89    3.42   51.30   0.15   0.89
2002.5–2003.5   2003     4     8012    93    8.12   32.48   0.04   0.93
2003.5–2004.5   2004     4     8016    97   14.82   59.28   0.04   0.97
2004.5–2005.5   2005     2     4010    99   23.52   47.04   0.02   0.99
2005.5–2006.5   2006     1     2006   100   34.22   34.22   0.01   1

Totals: n = Σ f_i = 100; Σ f_i x_i = 200 015, giving x̄ = 200 015/100 = 2000.15; F_N = n; Σ f_i(x_i − x̄)² ≈ 446.5, giving σ² ≈ 446.5/100 = 4.465 and σ ≈ 2.11; F_N/n = 1.


The other columns are filled in as follows:

Column 3. To find the frequency, count the number of data values in each class interval. The sum of the frequencies gives the total number in the sample; in this case 100.

Column 5. The cumulative frequency is given by adding up the frequencies so far. The number of resistances up to 1995.5 is 1. As there are three in the interval 1995.5–1996.5, add 1 + 3 = 4 to get the number up to a resistance of 1996.5. The number up to 1997.5 is given by 4 + 5 = 9, etc.

Columns 4, 6, and 7 help us calculate the mean and standard deviation. To find column 4 multiply the frequencies (column 3) by the class midpoints (column 2) to get f_i x_i.

The sum of this column is an estimate of the total if we added all the values in the sample together. Therefore, dividing by the number of items in the sample gives the mean:

Mean = (1 × 1995 + 3 × 1996 + 5 × 1997 + 13 × 1998 + 17 × 1999 + 19 × 2000 + 16 × 2001 + 15 × 2002 + 4 × 2003 + 4 × 2004 + 2 × 2005 + 1 × 2006)/100 = 200 015/100 = 2000.15.

To find the standard deviation we calculate the variance first. The variance is the mean squared deviation. Find the difference between the mean and each of the class midpoints and square it. This gives column 6. Multiply by the class frequency to get column 7. Finally, add them all up and divide by the total number. This gives the variance:

σ² = 446.5/100 = 4.465

and the standard deviation

σ = √4.465 ≈ 2.11.

The relative frequency (given in column 8) is the fraction in each class interval; that is, the frequency divided by the total number. For example, the relative frequency in the class interval 2000.5–2001.5 is 16/100 = 0.16

The cumulative relative frequency is the cumulative frequency divided by the total number n. The cumulative frequency up to 2000.5 is 58. Divided by the total number gives 0.58 for the relative cumulative frequency.

We can sum up this sample by saying that the mean resistance is 2000.15 Ω with a standard deviation of 2.11 Ω. These simple figures can be used to judge how closely the claim that these are 2000-Ω resistors is justified. We can also picture the frequency distribution using a histogram, as in Figure 21.4.
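The whole example can be reproduced directly from the raw readings (a sketch; variable names are ours). Because the classes are 1 Ω wide and centred on the integers, tallying the readings themselves gives the class frequencies. Note that exact arithmetic gives a total squared deviation of 446.75 rather than the table's 446.5, because the table rounds column 6 to two decimal places; σ still rounds to 2.11.

```python
# Sketch: recompute the Table 21.2 statistics from the 100 readings.
from collections import Counter

readings = [
    1997, 1998, 2004, 2002, 1999, 2000, 2001, 2002, 1999, 1998,
    1997, 1999, 2001, 2003, 2005, 1996, 2000, 2000, 1998, 1999,
    1998, 2001, 2003, 2002, 1996, 2002, 1995, 2000, 2000, 1999,
    1997, 2004, 2001, 1999, 1999, 1998, 2002, 2003, 2002, 2001,
    1999, 1998, 1997, 2002, 2001, 2000, 1999, 2001, 1999, 1998,
    2000, 1999, 1998, 2000, 2002, 1999, 2001, 2000, 2001, 2002,
    2004, 1996, 2000, 2002, 2000, 1998, 2000, 2005, 2000, 1999,
    2000, 2000, 2001, 1999, 2001, 1997, 2002, 2001, 2004, 2000,
    2000, 1999, 1998, 2001, 1998, 2000, 2002, 1998, 1998, 1999,
    2002, 1999, 2001, 2002, 2006, 2000, 2001, 2001, 2002, 2003,
]

counts = Counter(readings)           # class midpoint -> class frequency
n = sum(counts.values())
mean = sum(x * f for x, f in counts.items()) / n
variance = sum(f * (x - mean) ** 2 for x, f in counts.items()) / n
std_dev = variance ** 0.5
```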

Figure 21.4 Histogram of the frequency distribution of the resistances given in Table 21.2.

Returning to the lifetimes of the light bulbs we might want to ask what average lifetime and standard deviation might lead to less than 5% of the light bulbs having a lifetime less than 1500 h?

To answer this sort of problem we need to build up a theory of statistical models and use probability theory.

21.3 Random systems and probability

We are dealing with complicated systems with a number of random factors affecting their behaviour; for instance, those that we have already seen: the production of resistors and the lifetimes of light bulbs. We cannot determine the exact behaviour of the system, but using its frequency distribution we can estimate the probabilities of certain events.

Relative frequency and probability

The probability of an event is related to its relative frequency. If we choose a light bulb at random from the sample in Section 21.2, then the probability of its lifetime being between 1400 and 1500 h is 0.103. This is the same as the relative frequency for that class interval; that is, the number in the class interval divided by the total number of light bulbs in the sample. Here, we have assumed that we are no more likely to pick any one light bulb than any other; that is, each outcome is equally likely. The histogram of the relative frequencies (Figure 21.3, using the right-hand scale) gives the probability distribution function (or simply the probability function) for the lifetimes of the sample of light bulbs.

As an introduction to probability, examples are often quoted involving throwing dice or dealing cards from a pack of playing cards. These are used because the probabilities of events are easy to justify and not because of any particular predilection on the part of mathematicians to a gambling vocation!

Example 21.2

A die has the numbers one to six marked on its sides. Draw a graph of the probability distribution function for the outcome from one throw of the die, assuming it is fair.

Solution If we throw the die 10 000 times we would expect the number of times each number appeared face up to be roughly the same. Here, we have assumed that the die is fair; that is, any one number is as likely to be thrown as any other. The relative frequencies would be approximately 1/6. The probability distribution function, therefore, is a flat function with a value of 1/6 for each of the possible outcomes 1, 2, 3, 4, 5, 6. This is shown in Figure 21.5.

Figure 21.5 Probability distribution for a fair die

Notice two important things about the probabilities in the probability distribution for the die:

1. each probability is less than 1;

2. the sum of all the probabilities is 1:

1/6 + 1/6 + 1/6 + 1/6 + 1/6 + 1/6 = 1.

Example 21.3

A pack of cards consists of four suits, hearts, diamonds, spades, and clubs. Each suit has 13 cards; that is, cards for the numbers 1 (the ace) to 10 and a Jack, Queen, and King. Draw a graph of the probability distribution function for the outcome when dealing one card from the pack, where only the suit is recorded. Assume the card is replaced each time and the pack is perfectly shuffled.

Solution If a card is selected, the suit recorded hearts, spades, clubs, or diamonds, the card is placed back in the pack and the pack is shuffled. If this is repeated (say 10 000 times) then we might expect that each suit will occur as often as any other, that is, 1/4 of the time. The probability distribution function again is a flat distribution, and has the value of 1/4 for each of the possible four outcomes. The probability distribution for suits is given in Figure 21.6.

Figure 21.6 The probability distribution for suits from a pack of cards.

Notice again that in Example 21.3 each probability is less than 1 and that the sum of all the probabilities is 1:

1/4 + 1/4 + 1/4 + 1/4 = 1.

We have seen that a probability distribution can be represented using a graph. Each item along the x-axis has an associated probability. Probability is a function defined on some set. The set is called the sample space, S, and contains all possible outcomes of the random system. The probability distribution function is often abbreviated to p.d.f.

Some definitions

A trial is a single observation on the random system, for example, one throw of a die, one measurement of a resistance in the example in Section 21.2.

The sample space is the set of all possible outcomes, for example, for the die it is the set {1, 2, 3, 4, 5, 6}, and for the resistance problem it is the set of all possible measured resistances. This set may be discrete or continuous. An event is a set of outcomes. For instance, A is the event of throwing less than 4 and B is the event of throwing a number greater than or equal to 5.

An event is a subset of the sample space S.

Notice that in the case of a continuous sample space an event is also a continuous set, represented by an interval of values. For example, C is the event that the resistance lies in the interval 2000 ± 1.5.

Probability

The way that probability is defined is slightly different for the case of a discrete sample space or a continuous sample space.

Discrete sample space

The outcome of any trial is uncertain; however, in a large number of trials the proportion showing a particular outcome approaches a certain number. We call this the probability of that outcome. The probability distribution function, or simply probability function, gives the value of the probability that is associated with each outcome. The probability function obeys two important conditions:

1. all probabilities are less than or equal to 1; that is, 0 ≤ p(x) ≤ 1, where x is any outcome in the sample space;

2. the probabilities of the individual outcomes sum to 1, that is, ∑ p(x) = 1.

Continuous sample space

For a continuous sample space, we assign probabilities to intervals. We might find the probability of, for instance, the resistance being in the interval 2000 ± 0.5. Hence, we assign probabilities to events and not to individual outcomes. In this case, the function that gives the probabilities is called the probability density function, and we can find the probability of some event by integrating the probability density function over the corresponding interval. For instance, if we have a probability density function f(x), where x can take values from a continuous sample space, then the probability of x being in the interval a to b is given by

p(a < x < b) = ∫_a^b f(x) dx.

The probability density function obeys the condition:

∫ f(x) dx = 1, taken over the whole sample space,

that is, the total area under the graph of the probability density function must be 1.
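These two facts can be illustrated numerically with a made-up density (our choice, not from the text): f(x) = 2x on [0, 1] is a valid density, and interval probabilities come from integrating it, here by a simple midpoint rule.

```python
# Illustrative sketch with a hypothetical density f(x) = 2x on [0, 1].

def density(x):
    return 2.0 * x

def prob(a, b, steps=10_000):
    # midpoint-rule approximation to the integral of the density over [a, b]
    width = (b - a) / steps
    return sum(density(a + (k + 0.5) * width) for k in range(steps)) * width

total = prob(0.0, 1.0)    # integral over the whole sample space: must be 1
p_half = prob(0.0, 0.5)   # p(0 < x < 0.5); exactly 0.5**2 - 0**2 = 0.25
```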

Equally likely events

If all outcomes are equally likely then the probability of an event E from a discrete sample space is given by

p(E) = (number of outcomes in E)/(total number of outcomes in S).

That is, the probability of E is equal to the proportion of the whole sample space that is in E, when each of the outcomes are equally likely.

Example 21.4

What is the probability on one throw of a die getting a number less than 3?

Solution Give this event the name A. Then A = {1, 2} and S = {1, 2, 3, 4, 5, 6}. The number in A is 2 and the number in S is 6. Therefore, as each outcome is equally likely, p(A), the probability of A, is 2/6 = 1/3.
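The counting definition in Example 21.4 can be written as a short sketch, keeping the probability as an exact fraction (names are ours):

```python
# The counting definition of probability for equally likely outcomes,
# applied to Example 21.4.
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}            # sample space for one throw of a die
A = {s for s in S if s < 3}       # event: a number less than 3

p_A = Fraction(len(A), len(S))    # 2/6 = 1/3
```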

This particular result leads to another way of representing probability – by using area.

If we use a rectangle to represent the set of all possible outcomes, S, then an event A is a subset of S; that is, one section of the rectangle. If all outcomes are equally likely then the probability of A can be pictured as the proportion of S that is in A. We will put a dotted line around the picture representing the set A to indicate the number in A, or the area of A (see Figure 21.7).

Figure 21.7 A ⊆ S (A is a subset of S), where all the outcomes in S are equally likely. The probability of A can be pictured as the proportion of S that is in A: the ratio of the areas.

This way of picturing the probability of an event can help in remembering some of the probabilities of combined events.

21.4 Addition law of probability

Disjoint events

Disjoint events are events with no outcomes in common. They cannot happen simultaneously.

Example 21.5

A is the event that a card chosen from a playing pack is under 6 (counting ace as low, i.e. = 1) and B is the event of choosing a picture card (Jack, Queen, or King). Find the probability that a card chosen from the pack is under 6 or is a picture card.

Solution


A and B are disjoint. As each outcome is considered to be equally likely the probabilities are easy to find:

p(A) = (number in A)/(number in S).

There are 52 cards; therefore, 52 possible outcomes in S, so

p(A) = 20/52 = 5/13.

Also

p(B) = (number in B)/(number in S) = 12/52 = 3/13.

We want to find the probability of A or B happening; that is, the probability that the card is either under 6 or is a picture card:


We can see that:

p(A ∪ B) = (number in A ∪ B)/(number in S) = 32/52 = 8/13.

As A and B have no outcomes in common, the number in A ∪ B is the number in A plus the number in B, and

(number in A + number in B)/(number in S) = (number in A)/(number in S) + (number in B)/(number in S)

so that

p(A ∪ B) = p(A) + p(B), that is, 8/13 = 5/13 + 3/13.

We can also see this by using the idea of area to picture it, as in Figure 21.8.

Figure 21.8 The two events A and B are disjoint; that is, they have no outcomes in common. On a Venn diagram, as in (a), they do not intersect. The probability of either A or B can be found by p(A ∪ B) = p(A) + p(B), as in (b).

We can also consider the probability of A not happening; that is, the probability of the complement of A, which we represent by A′. As A and A′ are disjoint,

p(A′) + p(A) = 1, so p(A′) = 1 − p(A).

This is shown in Figure 21.9.

Figure 21.9 p(A′) = 1 − p(A).

Non-disjoint events

Example 21.6

Consider one throw of a die, and let

A = {a | a is an even number},
B = {b | b is a multiple of 3}.

Find the probability that the result of one throw of the die is either an even number or a multiple of 3.

Solution

A = {a | a is an even number},
B = {b | b is a multiple of 3};

then

A = {2, 4, 6},  B = {3, 6},  A ∪ B = {2, 3, 4, 6}

and

p(A) = 3/6 = 1/2,  p(B) = 2/6 = 1/3,  p(A ∪ B) = 4/6 = 2/3

and

p(A ∪ B) ≠ p(A) + p(B).

Looking at the problem and using the idea of areas in the set, we can see from Figure 21.10 that the rule becomes

p(A ∪ B) = p(A) + p(B) − p(A ∩ B)

Figure 21.10 (a) A ∪ B. (b) p(A ∪ B) = p(A) + p(B) − p(A ∩ B).

21.5 Repeated trials, outcomes, and probabilities

Suppose we consider more complicated situations, like throwing a die twice. The outcomes can then be found by combining all the possible outcomes of the first throw with all the possible outcomes of the second throw. As there are six possible outcomes for the first throw and six for the second, there are 6 × 6 = 36 outcomes of throwing the die twice.

If we would like to find the probability that the first throw is a 5 and the second throw is a 5 or 6 then we can do this by listing all the 36 outcomes and finding the proportion that fall into our event.

The set S of all possible outcomes has 36 elements:

S = {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6),
(2,1), (2,2), (2,3), (2,4), (2,5), (2,6),
(3,1), (3,2), (3,3), (3,4), (3,5), (3,6),
(4,1), (4,2), (4,3), (4,4), (4,5), (4,6),
(5,1), (5,2), (5,3), (5,4), (5,5), (5,6),
(6,1), (6,2), (6,3), (6,4), (6,5), (6,6)}

and each outcome is equally likely. Let E be the event that the first throw is a 5 and the second is a 5 or 6. Hence

E = {(5,5), (5,6)}

and

p(E) = (number in E)/(number in S) = 2/36 = 1/18.
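The counting argument can be reproduced by enumerating the outcome pairs (a sketch; names are ours):

```python
# Enumerating the 36 equally likely outcomes of two throws of a die and
# counting those in E: first throw a 5, second throw a 5 or 6.
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))        # all 36 ordered pairs
E = [(a, b) for a, b in S if a == 5 and b >= 5]

p_E = Fraction(len(E), len(S))                  # 2/36 = 1/18
```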

This sort of problem can be pictured more easily using a probability tree.

21.6 Repeated trials and probability trees

Repeatedly tossing a coin

The simplest sort of trial to consider is one with only two outcomes; for instance, tossing a coin. Each trial has the outcome of head or tail and each is equally likely.

We can picture repeatedly tossing the coin by drawing a probability tree. The tree works by drawing all the outcomes and writing the probabilities for the trial on the branches. Let us consider tossing a coin three times. The probability tree for this is shown in Figure 21.11.

Figure 21.11 A probability tree for three tosses of a coin.

There are various rules and properties we can notice.

Rules of probability trees and repeated trials

1. The probabilities of outcomes associated with each of the vertices can be found by multiplying all the probabilities along the branches leading to it from the top of the tree. HTH has the probability

(1/2) × (1/2) × (1/2) = 1/8.


2. At each level of the tree, the sum of all the probabilities on the vertices must be 1.

3. The probabilities along branches out of a single vertex must sum to 1. For example, after getting a head on the first trial we have a probability of 1/2 of getting a head on the second trial and a probability of 1/2 of getting a tail. Together they sum to 1.

The fourth property is only true when the repeated trials are independent; that is, the result of the first trial has no effect on the possible result of the second trial, and so on.

4. For independent trials, the tree keeps repeating the same structure with the same probabilities associated with the branches. Here, the event of getting a head on the third toss is independent of the event of getting a head on the first or second toss.

Using the tree we can find various probabilities, as we see in Example 21.7.

Example 21.7

What is the probability on three tosses of the coin that exactly two of them will be heads?

Solution Count up all the ways that we could have two heads and one tail, looking at the foot of the probability tree in Figure 21.11. We find three possibilities:

HHT, HTH, THH

and each of these has probability 1/8. Therefore, the probability of exactly two heads is 3/8.
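Since all eight sequences of three tosses are equally likely, the result is easy to verify by enumeration; the following Python sketch is illustrative:

```python
from fractions import Fraction
from itertools import product

# All 2^3 equally likely sequences of three tosses of a fair coin.
outcomes = list(product("HT", repeat=3))

# The sequences containing exactly two heads.
two_heads = [o for o in outcomes if o.count("H") == 2]

print(Fraction(len(two_heads), len(outcomes)))  # 3/8
```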

Picking balls from a bag without replacement

We have 20 balls in a bag. Ten are red and ten are black. A ball is picked out of the bag, its colour recorded and then it is not replaced into the bag. There are two possible outcomes of each trial, red (R) or black (B). Let us consider three trials and their associated outcomes in a probability tree, as shown in Figure 21.12.

Figure 21.12 Probability tree for three trials of picking a ball out of a bag without replacement.

To find the probabilities we consider how many balls remain in each case. If the first ball chosen is red then there are only 19 balls left of which 9 are red and 10 are black; therefore, the probability of picking a red ball on the second trial is only 9/19 and the probability of picking black is 10/19.

The first three rules given in the last example apply.

1. The probabilities of outcomes associated with each of the vertices can be found by multiplying all the probabilities along the branches leading to it from the top of the tree. BRB has the probability:

10/20 × 10/19 × 9/18 = 900/6840 = 10/76.


2. At each level of the tree the sum of all the probabilities on the vertices must be 1; for example, at the second level we have

90/380 + 100/380 + 100/380 + 90/380 = (9 + 10 + 10 + 9)/38 = 38/38 = 1.

3. The probabilities along branches out of a single vertex must sum to 1; for example, after picking red on the first trial we have a probability of 9/19 of picking red and 10/19 of picking black. Together they sum to 1.

The fourth property is no longer true as the various repeated trials are not independent. The result of the first trial has an effect on the possible result of the second trial, etc.

Using the tree we can find the probability of various events as in Example 21.8.

Example 21.8

Of 20 balls in a bag 10 are red and 10 are black. A ball is picked out of the bag, its colour recorded and then it is not replaced into the bag. What is the probability that of the first three balls chosen exactly two will be red?

Solution We can use the probability tree in Figure 21.12 to solve this problem. We look at the foot of the tree, which gives all the possible outcomes after three balls have been selected.

The ways of getting two red are RRB, RBR, BRR and the associated probabilities are

900/6840, 900/6840, 900/6840

or

10/76, 10/76, 10/76.

Summing these gives the probability of exactly two reds being chosen out of the three as 30/76.
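The branch products can be checked in a few lines of Python. The helper `sequence_probability` below is a hypothetical name introduced for illustration; it multiplies the conditional probabilities along one branch of the tree:

```python
from fractions import Fraction

def sequence_probability(colours, red=10, black=10):
    """Probability of drawing this exact colour sequence without replacement,
    multiplying the conditional probabilities along the tree branches."""
    counts = {"R": red, "B": black}
    total = red + black
    p = Fraction(1)
    for c in colours:
        p *= Fraction(counts[c], total)
        counts[c] -= 1
        total -= 1
    return p

# The three branches of the tree with exactly two reds in three draws.
p = sum(sequence_probability(seq) for seq in ("RRB", "RBR", "BRR"))
print(p)  # 15/38, the same as 30/76
```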

21.7 Conditional probability and probability trees

The probabilities we have been writing along the branches of the probability tree are called conditional probabilities.

Example 21.9

What is the probability that the first throw of a die will be a 5 and the second throw will be a 5 or 6?

Solution A is the set of those outcomes with the first throw a 5 and B is the set of those outcomes with the second throw a 5 or 6. We can use a probability tree in the following way. After the first throw of the die either A is true or not, that is, we have only two possibilities, A or A'. After that, we are interested in whether B happens or not. Again we either get B or B’. We get the probability tree as in Figure 21.13.

Figure 21.13 A probability tree showing conditional probabilities.

Here, p(B|A) means the probability of B given A; similarly, p(B|A′) means the probability of B given (not A). We can fill in the probabilities using our knowledge of the fair die. The probability of A is 1/6. The second throw of the die is unaffected by the first throw of the die; therefore

p(B) = p(throwing a 5 or a 6 on one throw of the die) = 2/6 = 1/3.

Working out the other probabilities gives the probability tree in Figure 21.14.

Figure 21.14 The probability for Example 21.9.

The probability that the first throw of a die will be a 5 and the second throw will be a 5 or 6 is p(B∩A), given from the tree in Figure 21.14 as

1/6 × 1/3 = 1/18.

Notice that the probability we have calculated is the intersection of the two events A and B, that is, we calculated the probability that both occurred. When multiplying the probabilities on the branches of the probability tree we are using the following:

p(A ∩ B) = p(A) p(B|A).

Furthermore, because of independence we have used the fact that p(B|A) = p(B). That is, the probability of B does not depend on whether A has happened or not: B is independent of A. We therefore have the following important results.

Multiplication law of probability

p(A ∩ B) = p(A) p(B|A).

This law applies for any two events A and B. It is the law used in finding the probabilities of the vertices of the probability trees.

Example 21.10

It is known that 10% of a selection of 100 electrical components are faulty. What is the probability that the first two components selected are faulty if the selection is made without replacement?

Solution The probability we are looking for is:

p(first faulty ∩ second faulty)

= p(first faulty) p(second faulty | first faulty)

Here there are only two possibilities at each stage, faulty or not faulty. The probability tree is as shown in Figure 21.15.

Figure 21.15 Probability tree for Example 21.10.

Notice that there are only 99 components left after the first trial, and whether the first was faulty or not changes the probability that the second is faulty or otherwise. Each branch of the tree, after the first layer, represents a conditional probability.

The answer to our problem is therefore

p(first faulty ∩ second faulty) = p(first faulty) p(second faulty | first faulty) = 10/100 × 9/99 = 1/110.
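As an illustrative check, the multiplication law for this example can be evaluated with exact fractions:

```python
from fractions import Fraction

# 10 of the 100 components are faulty.
p_first = Fraction(10, 100)

# Conditional probability: 9 faulty components remain among the 99 left.
p_second_given_first = Fraction(9, 99)

# Multiplication law: p(A and B) = p(A) p(B|A).
p_both = p_first * p_second_given_first
print(p_both)  # 1/110
```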

Condition of independence

If two events A and B are independent then

p(B|A) = p(B).

Notice that the multiplication law changes in this case, as described below.

Multiplication law of probability for independent events

p(A ∩ B) = p(A) p(B).

There is one other much-quoted law of probability that completes all the basic laws from which probabilities can be worked out. That is Bayes's theorem and it comes from the multiplication law.

Bayes's theorem

As p(A ∩ B) = p(A) p(B|A) then as A ∩ B = B ∩ A we can also write p(B ∩ A) = p(B) p(A|B) and putting the two together gives p(A) p(B|A) = p(B) p(A|B) or

p(B|A) = p(A|B) p(B) / p(A).

Bayes's theorem is important because it gives a way of swapping conditional probabilities that may be useful in diagnostic situations where not all of the conditional probabilities can be found directly.

Example 21.11

In a certain town there are only two brands of hamburgers available, Brand A and Brand B. It is known that people who eat Brand A hamburgers have a 30% probability of suffering stomach pain and those who eat Brand B hamburgers have a 25% probability of suffering stomach pain. Twice as many people eat Brand B as Brand A hamburgers. However, no one eats both varieties. Supposing one day you meet someone suffering from stomach pain who has just eaten a hamburger, what is the probability that they have eaten Brand A and what is the probability that they have eaten Brand B?

Solution First we define the sample space S, and the other simple events.

S = people who have just eaten a hamburger

A = people who have eaten a Brand A hamburger

B = people who have eaten a Brand B hamburger

C = people who are suffering stomach pains

We are given that:

p(A) = 1/3,  p(B) = 2/3,  p(C|A) = 0.3,  p(C|B) = 0.25.

Note also that S = A ∪ B.

As no one eats both brands, A ∩ B = Ø, and therefore

p(C) = p(C ∩ S) = p(C ∩ A) + p(C ∩ B) = p(C|A) p(A) + p(C|B) p(B)
     = 0.3 × 1/3 + 0.25 × 2/3 = 8/30.

Then

p(A|C) = p(C|A) p(A) / p(C) = (0.3 × 1/3) / (8/30) = 3/8

and

p(B|C) = p(C|B) p(B) / p(C) = (0.25 × 2/3) / (8/30) = 5/8.

Hence, if they have stomach pain the probability that they have eaten Brand A is 3/8 and the probability that they have eaten Brand B is 5/8.
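The whole calculation, total probability followed by Bayes' theorem, can be sketched as follows (illustrative Python using exact fractions):

```python
from fractions import Fraction

p_A, p_B = Fraction(1, 3), Fraction(2, 3)
p_C_given_A = Fraction(3, 10)   # 30% chance of stomach pain after Brand A
p_C_given_B = Fraction(1, 4)    # 25% chance after Brand B

# Total probability of stomach pain: p(C) = p(C|A)p(A) + p(C|B)p(B).
p_C = p_C_given_A * p_A + p_C_given_B * p_B
print(p_C)  # 4/15, the same as 8/30

# Bayes' theorem: p(A|C) = p(C|A)p(A)/p(C), similarly for B.
p_A_given_C = p_C_given_A * p_A / p_C
p_B_given_C = p_C_given_B * p_B / p_C
print(p_A_given_C, p_B_given_C)  # 3/8 5/8
```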

21.8 Application of the probability laws to the probability of failure of an electrical circuit

Components in series

We denote ‘The probability that A does not fail’ as p(A). Then the probability that S fails is p(S’) = 1 – p(S).

As they are in series (see Figure 21.16), S will function if both A and B function: S = A ∩ B. Therefore

p(S) = p(A ∩ B)

Figure 21.16 Components in series. S fails if either A or B fails.

as A and B are independent

p(S) = p(A) p(B)
p(S′) = 1 − p(A) p(B).

Example 21.12

An electrical circuit has three components in series (see Figure 21.17). One has a probability of failure within the time of operation of the system of 1/5; another has been found to function on 99% of occasions; and the third component has proved to be very unreliable, with a failure once for every three successful runs. What is the probability that the system will fail on a single operation run?

Figure 21.17 Three components in series.

Solution Using the method of reasoning above we call the components A, B, and C. The information we have is

p(A′) = 1/5,  p(B) = 0.99,  p(C′) : p(C) = 1 : 3

The probability we would like to find is p(S’), the probability of failure of the system and we know that the system is in series so S = A ∩ B ∩ C; that is, all the components must function for the system to function. We can find the probability that the system will function and subtract it from 1 to find the probability of failure:

p(S′) = 1 − p(S).

As all the components are independent,

p(A ∩ B ∩ C) = p(A) p(B) p(C).

We can find p(A), p(B), and p(C) from the information we are given: p(A) = 1 − p(A′) = 1 − 1/5 = 0.8 and p(B) is given as 0.99.

Given that p(C') : p(C) = 1 : 3, C fails once in every 1 + 3 occasions, so p(C') = 1/4 = 0.25 and p(C) = 1 – p(C') = 0.75.

We can now find p(A)p(B)p(C) = 0.8 × 0.99 × 0.75 = 0.594. This is the probability that the system will function, so the probability it will fail is given by 1 – 0.594 = 0.406.
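A quick numerical check of this series calculation (an illustrative sketch; variable names are ours):

```python
# Probabilities that each component functions, from the text.
p_A = 1 - 1/5        # p(A') = 1/5
p_B = 0.99
p_C = 3 / (1 + 3)    # p(C') : p(C) = 1 : 3, so C functions 3 times in every 4

# Series system: all three components must function.
p_S = p_A * p_B * p_C
print(round(1 - p_S, 3))  # 0.406
```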

Components in parallel

If a system S consists of two components A and B in parallel, as in Figure 21.18, then S fails if both A and B fail.

Figure 21.18 Two components in parallel. S fails if both A and B fail.

Again denote ‘The probability that A does not fail’ as p(A). Then the probability that S fails is

p(S′) = 1 − p(S).

As they are in parallel, S will function if either A or B functions: S = A ∪ B. Therefore

p(S) = p(A ∪ B).

A and B are independent, but not disjoint. They are not disjoint because both A and B can function simultaneously. From the addition law of probabilities given in Section 21.4

p(A ∪ B) = p(A) + p(B) − p(A ∩ B).

As A and B are independent,

p(A ∩ B) = p(A) p(B).

Therefore

p(S) = p(A ∪ B) = p(A) + p(B) − p(A) p(B)
p(S′) = 1 − (p(A) + p(B) − p(A) p(B)) = 1 − p(A) − p(B) + p(A) p(B).

As this is a long-winded expression, it may be more useful to look at the problem the other way round. S fails only if both A and B fail: S′ = A′ ∩ B′. So

p(S′) = p(A′ ∩ B′) = p(A′) p(B′)

as A and B are independent. Finally, p(S′) = p(A’) p(B’).

This is a simpler form that we may use in preference to the previous expression we derived. It must be equivalent to our previous result, so we should just check that by substituting p(A') = 1 – p(A) and p(B') = 1 – p(B). Hence

p(S′) = p(A′) p(B′)
p(S′) = (1 − p(A))(1 − p(B))
p(S′) = 1 − p(A) − p(B) + p(A) p(B)

which is the same as we had before.

Let us try a mixed example with some components in series and others in parallel.

Example 21.13

An electrical circuit has three components, two in parallel, components A and B, and one in series, component C. They are arranged as in Figure 21.19. Components A and B are identical components with a 2/3 probability of functioning, and C has a probability of failure of 0.1%. Find the probability that the system fails.

Figure 21.19 Two components in parallel and one in series (Example 21.13).

Solution We denote ‘The probability that A functions’ as p(A). The information we have is

p(A) = 2/3,  p(B) = 2/3,  p(C′) = 0.001 ⇒ p(C) = 0.999.

The probability we would like to find is p(S‘), the probability of failure of the system and we know that the system will function if (either A or B function) and C functions. So

S = (A ∪ B) ∩ C.

From the multiplication law for independent events we have:

p((A ∪ B) ∩ C) = p(A ∪ B) p(C).

From the addition law for non-disjoint events (A and B are not disjoint because both A and B can occur), we have:

p(A ∪ B) = p(A) + p(B) − p(A) p(B)

so

p(A ∪ B) = 2/3 + 2/3 − 2/3 × 2/3 = 8/9 ≈ 0.8889.

So p(S) = p(A ∪ B) p(C) = 0.8889 × 0.999 ≈ 0.888. Hence, the probability that the system functions is 0.888 and the probability that it fails is 1 − 0.888 = 0.112.
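The parallel-then-series calculation can be sketched in Python (illustrative; variable names are ours, not the text's):

```python
p_A = p_B = 2 / 3    # identical parallel components
p_C = 0.999          # p(C') = 0.001

# Parallel pair: functions if either A or B functions
# (addition law for non-disjoint, independent events).
p_AB = p_A + p_B - p_A * p_B          # 8/9 ≈ 0.8889

# The pair is in series with C: both must function.
p_S = p_AB * p_C
print(round(p_S, 3), round(1 - p_S, 3))  # 0.888 0.112
```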

21.9 Statistical modelling

Suppose, as in Example 21.2, we have tested 5000 resistors and recorded the resistance of each one to an accuracy of 0.01 Ω. We may wish to decide quickly whether the manufacturer's claim of producing 2000-Ω resistors is correct. One way of doing this is to divide our resistors into class intervals and draw up a table and a histogram. We can then count the percentage of resistors that are outside of acceptable limits and assume that the population behaviour is the same as the sample behaviour. Hence, if 98% of the resistors lie between 2000 ± 0.1% we may be quite happy.

A quicker way of doing this is to use a statistical model. That is, we can guess what the histogram would look like based on our past experience. To use such a model we probably only need to know the population mean, μ, and the population standard deviation, σ.

In the rest of this chapter, we will look at four possible ways of modelling data: the normal distribution and the exponential distribution, which are continuous models, and the binomial distribution and the Poisson distribution, which are discrete models.

21.10 The normal distribution

The normal distribution is symmetric about its mean. It is bell-shaped and the fatness of the bell depends on its standard deviation. Examples are given in Figures 21.20 and 21.21.

Figure 21.20 A normal distribution with µ = 0 and σ = 1 (called the standard normal distribution N(0,1)).
Figure 21.21 Normal distribution with µ = 2 and σ = 3.

The normal distribution is very important because of the following points:

1. Many practical distributions approximate to the normal distribution. Look at the histograms of lifetimes given in Figure 21.3 and of resistances given in Figure 21.4 and you will see that they resemble the normal distribution. Another common example is the distribution of errors. If you were to get a large group of students to measure the diameter of a washer to the nearest 0.1 mm, then a histogram of the results would give an approximately normal distribution. This is because the errors in the measurement are normally distributed.

2. The central limit theorem. If we take a large number of samples from a population and calculate the sample means then the distribution of the sample means will behave like the normal distribution for all populations (even those populations which are not distributed normally). This is as a result of what is called the central limit theorem. There is a project exploring the behaviour of sample means given in the Projects and Investigations available on the companion website for this book.

3. Many other common distributions become like the normal distribution in special cases. For instance, the binomial distribution, which we shall look at in Section 21.12, can be approximated by the normal when the number of trials is very large.

Finding probabilities from a continuous graph

Before we look at the normal distribution in more detail, we need to find out how to relate the graph of a continuous function to our previous idea of probability. In Section 21.3, we identified the probability of a class with its relative frequency in a frequency distribution. That was all right when we had already divided the various sample values into classes. The problem with the normal distribution is that it has no such divisions along the x-axis and no individual class heights, just a nice smooth curve.

To overcome this problem we define the probability of the outcome lying in some interval of values, as the area under the graph of the probability function between those two values as shown in Figure 21.22.

Figure 21.22 A continuous probability density function and the probability that the outcome lies in the interval a ≤ x ≤ b.

As we found in Chapter 7, the area under a curve is given by the integral; therefore, for a continuous probability distribution, f (x), we define

p(a ≤ x ≤ b) = ∫_a^b f(x) dx.

The cumulative distribution function gives us the probability of this value or any previous value (it is like the cumulative relative frequency). For a continuous distribution it is the 'area so far' function, and it is therefore the integral from the lowest possible value that can occur in the distribution up to the current value.

The cumulative distribution up to a value a is represented by

F(a) = ∫_−∞^a f(x) dx

and it is the total area under the graph of the probability function up to a; this is shown in Figure 21.23.

Figure 21.23 The value of the cumulative distribution F(a) marked as an area on the graph of the probability density function, f(x), of a continuous distribution.

We can also use the cumulative distribution function to represent probabilities of a certain interval. The area between two values can be found by subtracting two values of the cumulative distribution function as in Figure 21.24.

Figure 21.24 The probability p(a < x < b) can be found by the difference between two values of the cumulative distribution function F(b) – F(a). Compare the difference in areas with Figure 21.22.

However, there is a problem with the normal distribution function in that it is not easy to integrate! The probability density function for x, where x is N(μ, σ²), is given by

f(x) = (1/(√(2π) σ)) e^(−(x−μ)²/(2σ²)).

It can only be integrated by using numerical methods; hence, values of the integral can only be tabulated. The values that we have tabulated are the areas in the tail of the standardized normal distribution; that is

∫_u^∞ f(z) dz

where f (z) is the probability distribution with 0 mean (μ = 0) and standard deviation of 1 (σ = 1). This is shown in Figure 21.25 and tabulated in Table 21.3. In order to use these values we need to use ideas of transformation of graphs from Chapter 2 to transform any normal distribution into its standardized form.

Figure 21.25 The area in the tail of the standardized normal curve, ∫_u^∞ f(z) dz, tabulated in Table 21.3.

Table 21.3

Areas in the tail of the standardized normal distribution. P(z > u) values are given where z is a variable with distribution N(0, 1)

u      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09

0.0  0.50000 0.49601 0.49202 0.48803 0.48405 0.48006 0.47608 0.47210 0.46812 0.46414
0.1  0.46017 0.45620 0.45224 0.44828 0.44433 0.44038 0.43644 0.43251 0.42858 0.42465
0.2  0.42074 0.41683 0.41294 0.40905 0.40517 0.40129 0.39743 0.39358 0.38974 0.38591
0.3  0.38209 0.37828 0.37448 0.37070 0.36693 0.36317 0.35942 0.35569 0.35197 0.34827
0.4  0.34458 0.34090 0.33724 0.33360 0.32997 0.32636 0.32276 0.31918 0.31561 0.31207
0.5  0.30854 0.30503 0.30153 0.29806 0.29460 0.29116 0.28774 0.28434 0.28096 0.27760
0.6  0.27425 0.27093 0.26763 0.26435 0.26109 0.25785 0.25463 0.25143 0.24825 0.24510
0.7  0.24196 0.23885 0.23576 0.23270 0.22965 0.22663 0.22363 0.22065 0.21770 0.21476
0.8  0.21186 0.20897 0.20611 0.20327 0.20045 0.19766 0.19489 0.19215 0.18943 0.18673
0.9  0.18406 0.18141 0.17879 0.17619 0.17361 0.17106 0.16853 0.16602 0.16354 0.16109
1.0  0.15866 0.15625 0.15386 0.15151 0.14917 0.14686 0.14457 0.14231 0.14007 0.13786
1.1  0.13567 0.13350 0.13136 0.12924 0.12714 0.12507 0.12302 0.12100 0.11900 0.11702
1.2  0.11507 0.11314 0.11123 0.10935 0.10749 0.10565 0.10383 0.10204 0.10027 0.09853
1.3  0.09680 0.09510 0.09342 0.09176 0.09012 0.08851 0.08691 0.08534 0.08379 0.08226
1.4  0.08076 0.07927 0.07780 0.07636 0.07493 0.07353 0.07215 0.07078 0.06944 0.06811
1.5  0.06681 0.06552 0.06426 0.06301 0.06178 0.06057 0.05938 0.05821 0.05705 0.05592
1.6  0.05480 0.05370 0.05262 0.05155 0.05050 0.04947 0.04846 0.04746 0.04648 0.04551
1.7  0.04457 0.04363 0.04272 0.04182 0.04093 0.04006 0.03920 0.03836 0.03754 0.03673
1.8  0.03593 0.03515 0.03438 0.03362 0.03288 0.03216 0.03144 0.03074 0.03005 0.02938
1.9  0.02872 0.02807 0.02743 0.02680 0.02619 0.02559 0.02500 0.02442 0.02385 0.02330
2.0  0.02275 0.02222 0.02169 0.02118 0.02068 0.02018 0.01970 0.01923 0.01876 0.01831
2.1  0.01786 0.01743 0.01700 0.01659 0.01618 0.01578 0.01539 0.01500 0.01463 0.01426
2.2  0.01390 0.01355 0.01321 0.01287 0.01255 0.01222 0.01191 0.01160 0.01130 0.01101
2.3  0.01072 0.01044 0.01017 0.00990 0.00964 0.00939 0.00914 0.00889 0.00866 0.00842
2.4  0.00820 0.00798 0.00776 0.00755 0.00734 0.00714 0.00695 0.00676 0.00657 0.00639
2.5  0.00621 0.00604 0.00587 0.00570 0.00554 0.00539 0.00523 0.00508 0.00494 0.00480
2.6  0.00466 0.00453 0.00440 0.00427 0.00415 0.00402 0.00391 0.00379 0.00368 0.00357
2.7  0.00347 0.00336 0.00326 0.00317 0.00307 0.00298 0.00289 0.00280 0.00272 0.00264
2.8  0.00256 0.00248 0.00240 0.00233 0.00226 0.00219 0.00212 0.00205 0.00199 0.00193
2.9  0.00187 0.00181 0.00175 0.00169 0.00164 0.00159 0.00154 0.00149 0.00144 0.00139
3.0  0.00135 0.00131 0.00126 0.00122 0.00118 0.00114 0.00111 0.00107 0.00104 0.00100
3.1  0.00097 0.00094 0.00090 0.00087 0.00084 0.00082 0.00079 0.00076 0.00074 0.00071
3.2  0.00069 0.00066 0.00064 0.00062 0.00060 0.00058 0.00056 0.00054 0.00052 0.00050
3.3  0.00048 0.00047 0.00045 0.00043 0.00042 0.00040 0.00039 0.00038 0.00036 0.00035
3.4  0.00034 0.00032 0.00031 0.00030 0.00029 0.00028 0.00027 0.00026 0.00025 0.00024
3.5  0.00023 0.00022 0.00022 0.00021 0.00020 0.00019 0.00019 0.00018 0.00017 0.00017
3.6  0.00016 0.00015 0.00015 0.00014 0.00014 0.00013 0.00013 0.00012 0.00012 0.00011
3.7  0.00011 0.00010 0.00010 0.00010 0.00009 0.00009 0.00008 0.00008 0.00008 0.00008
3.8  0.00007 0.00007 0.00007 0.00006 0.00006 0.00006 0.00006 0.00005 0.00005 0.00005
3.9  0.00005 0.00005 0.00004 0.00004 0.00004 0.00004 0.00004 0.00004 0.00003 0.00003


The standardized normal curve

The standardized normal curve is obtained from the normal curve by the substitution z = (x – μ) /σ and it converts the original distribution into one with zero mean and standard deviation 1. This is useful because we can use a table of values for z given in Table 21.3 to perform calculations.

Finding the probability that x lies between a given range of values

Suppose we have decided that a sample of resistors has a mean of 10.02 and a standard deviation of 0.06; what percentage lie inside an acceptable tolerance of 10 ± 0.1?

We want to find the area under the normal curve N(10.02,0.062) between x = 9.9 and x = 10.1, that is, the shaded area in Figure 21.26.

Figure 21.26 The area under the normal curve of mean 10.02 and standard deviation 0.06 between 9.9 and 10.1.

First, convert the x values to z values by using z = (x − μ)/σ:

x = 9.9:  z = (9.9 − 10.02)/0.06 = −2
x = 10.1: z = (10.1 − 10.02)/0.06 = 1.3333

So we now want to find the shaded area for z values (which will be the same area as above), shown in Figure 21.27.

In order to use Table 21.3, we need to express the problem in terms of the proportion that lies outside of the tolerance limits, as in Figure 21.28.

Figure 21.27 The area under the standardized normal curve of mean 0 and standard deviation 1 between z = −2 and z = 1.333. The area is equivalent to that shown in Figure 21.26.
Figure 21.28 The area outside of the tolerance limits as given in Figure 21.27.

We use the table of the standardized normal distribution to find the proportion less than z = – 2. As the curve is symmetric this will be the same as the proportion greater than z = 2. From the table this gives 0.02275.

The proportion greater than z = 1.33 from the table is 0.09176. Hence, the proportion that lies outside of our limits is

0.02275 + 0.09176 = 0.11451.

As the total area is 1, the proportion within the limits is 1 – 0.11451 = 0.88549.
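Rather than interpolating in Table 21.3, the same area can be computed with the error function, to which the normal cumulative distribution is related by Φ(z) = (1 + erf(z/√2))/2. A Python sketch (illustrative; the small difference from 0.88549 comes from the table's use of z = 1.33):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 10.02, 0.06

# Standardize the tolerance limits.
z_lo = (9.9 - mu) / sigma    # -2.0
z_hi = (10.1 - mu) / sigma   # 1.333...

p_within = phi(z_hi) - phi(z_lo)
print(round(p_within, 3))  # 0.886
```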

21.11 The exponential distribution

The exponential distribution is also known as the failure rate function, as it can be used to model the rate of failure of components.

Consider a set of 1000 light bulbs, of a similar make to those tested in Section 21.1. However, now consider a batch of bulbs chosen at random that have already been in use for some unknown time. They are, therefore, of mixed ages. On measuring the time to failure we get Table 21.4.

Table 21.4

Time to failure of a sample of light bulbs

Time of failure (h)   Class mid-point   Frequency   Cumulative frequency   fi xi
0–200                  100               260          260                  26000
200–400                300               194          454                  58200
400–600                500               154          608                  77000
600–800                700               100          708                  70000
800–1000               900                80          788                  72000
1000–1200             1100                60          848                  66000
1200–1400             1300                38          886                  49400
1400–1600             1500                33          919                  49500
1600–1800             1700                23          942                  39100
1800–2000             1900                14          956                  26600
2000–2200             2100                12          968                  25200
2200–2400             2300                10          978                  23000
2400–2600             2500                 9          987                  22500
2600–2800             2700                13         1000                  33800
Total                                   1000                              638300


These data are represented in a histogram given in Figure 21.29, giving the frequencies and relative frequencies, and Figure 21.30, giving the cumulative frequencies and relative cumulative frequencies.

Figure 21.29 Histogram of frequencies given in Table 21.4.
Figure 21.30 Cumulative frequency of time to failure of a sample of 1000 light bulbs.

Notice that Figure 21.29 looks like a dying exponential. This is not unreasonable as we might expect failure rates to be something like the problem of radioactive decay of Chapter 8, that is, a dying exponential.

We could think of it in a similar way to a population problem. The proportion that have failed after time t is given by the cumulative distribution function F. The proportion that are still functioning is therefore 1 − F. The increase in the total proportion of failures is given by the failure rate multiplied by the proportion still functioning; if λ is the failure rate this gives

dF/dt = λ(1 − F).

This differential equation can be solved to give:

F = 1 − A e^(−λt).

Using the fact that at time 0 there are no failures then we find A = 1. This gives the cumulative distribution of the exponential distribution as

F = 1 − e^(−λt)

where λ is the failure rate, that is, the proportion that will fail in unit time. The probability distribution can be found from the cumulative distribution by differentiating, giving

f = dF/dt = λ e^(−λt).

Mean and standard deviation of a continuous distribution

We can find the mean and standard deviation of a continuous distribution by using integration to replace the summation over all values. The mean is given by

μ = ∫ x f(x) dx

where the integration is over all values in the sample space for x. For the exponential distribution this gives

μ = ∫_0^∞ x λ e^(−λx) dx

which can be found by using integration by parts (Chapter 7) to be 1/λ.

The standard deviation is given by

σ = √(∫ (x − μ)² f(x) dx)

where the integration is over all values of x. For the exponential distribution this gives

σ = √(∫_0^∞ (x − 1/λ)² λ e^(−λx) dx).

Again using integration by parts, we obtain σ = 1/λ.

So we see that the mean is 1/λ, as is the standard deviation for the exponential distribution.

Comparison of the data with the model

We can now compare a statistical model with the data given in Table 21.4. To do this we calculate the cumulative frequencies for the maximum value in each of the class intervals. The mean of the sample is 638.3. We calculated that the mean of the exponential distribution is given by 1/λ, the inverse of the failure rate.

638.3 = 1/λ  ⇒  λ = 1.567 × 10⁻³.

Using F(t) = 1 − e^(−λt):

F(200) = 1 − e^(−1.567×10⁻³ × 200) ≈ 0.269
p(0 < x < 200) = F(200) − F(0) = 0.269
F(400) = 1 − e^(−1.567×10⁻³ × 400) ≈ 0.466
p(200 < x < 400) = F(400) − F(200) = 0.197

and so on, giving the values as in Table 21.5. The model's predictions agree quite well with the data. To find the model predicted frequencies and cumulative frequencies we multiply by the number in the sample, 1000.
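These model predictions can be reproduced in a few lines (an illustrative Python sketch; only the first three intervals are printed):

```python
from math import exp

lam = 1 / 638.3   # failure rate estimated from the sample mean, approx 1.567e-3

def F(t):
    """Exponential cumulative distribution: proportion failed by time t."""
    return 1 - exp(-lam * t)

n = 1000  # sample size
for a, b in [(0, 200), (200, 400), (400, 600)]:
    p = F(b) - F(a)
    print(f"{a}-{b} h: probability {p:.3f}, predicted frequency {n * p:.0f}")
```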

Table 21.5

Time to failure of a sample of light bulbs compared with values obtained by modelling with the exponential distribution

                                  Data                                    Model predictions
Time of failure (h)   Class mid-point  Frequency  Cumulative frequency  fi xi    F(x)   Probabilities  Cumulative frequency  Frequency
0–200                  100              260         260                 26000    0.269  0.269          269                   269
200–400                300              194         454                 58200    0.466  0.197          466                   197
400–600                500              154         608                 77000    0.609  0.143          609                   143
600–800                700              100         708                 70000    0.714  0.105          714                   105
800–1000               900               80         788                 72000    0.791  0.077          791                    77
1000–1200             1100               60         848                 66000    0.847  0.056          847                    56
1200–1400             1300               38         886                 49400    0.888  0.041          888                    41
1400–1600             1500               33         919                 49500    0.918  0.030          918                    30
1600–1800             1700               23         942                 39100    0.940  0.022          940                    22
1800–2000             1900               14         956                 26600    0.956  0.016          956                    16
2000–2200             2100               12         968                 25200    0.968  0.012          968                    12
2200–2400             2300               10         978                 23000    0.977  0.009          977                     9
2400–2600             2500                9         987                 22500    0.983  0.006          983                     6
2600–2800             2700               13        1000                 33800    0.988  0.005          988                     5
Total                                  1000                            638300


21.12 The binomial distribution

Consider a random system with a sequence of trials, the trials being such that:

1. Each trial has two possible outcomes (e.g. non-defective, defective), which we assign the outcomes of 1 (success) and 0 (failure). This type of trial is called a Bernoulli trial.

2. On each trial p(1) = θ and p(0) = 1 − θ, and θ is the same on all trials.

3. The outcomes of the n trials are mutually independent.

pn(r), the probability of r successes in n trials, is given by

pn(r) = nCr θ^r (1 − θ)^(n−r)

where

nCr = n! / ((n − r)! r!) = n(n − 1)…(n − r + 1) / r!.

Setting α = 1 –θ the probability of r successes in n trials is given by the rth term in the binomial expansion:

(θ + α)^n = α^n + nθα^(n−1) + (n(n−1)/2!) θ²α^(n−2) + … + (n(n−1)…(n−r+1)/r!) θ^r α^(n−r) + … + θ^n.

Hence, the term binomial distribution.

Example 21.14

In five tosses of a coin find the probability of obtaining three heads.

Solution Assign the outcome of obtaining a head to 1 and tail to 0. Assume that the coin is fair and therefore θ = 1/2, 1 − θ = 1/2. The probability of obtaining three heads in five tosses of a coin is given by the binomial probability:

p5(3) = (5!/(3!2!)) θ³ (1 − θ)² = (5 × 4 / 2!) (1/2)³ (1/2)² = 0.3125.
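A quick check using exact fractions (Python's `math.comb` gives the binomial coefficient nCr):

```python
from fractions import Fraction
from math import comb

theta = Fraction(1, 2)   # fair coin
n, r = 5, 3

# Binomial probability: nCr * theta^r * (1 - theta)^(n - r).
p = comb(n, r) * theta**r * (1 - theta)**(n - r)
print(p, float(p))  # 5/16 0.3125
```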

Mean and variance of a single trial

The mean of a discrete distribution can be found by using μ = Σ x p(x) and the variance is

σ² = Σ (x − μ)² p(x)

where the summation is over the sample space.

We can use these to find the mean and variance of a single trial with only two outcomes, success or failure. The outcome of success has the value 1 and occurs with probability θ and the outcome of failure has the value 0 with probability 1 – θ. Then, the mean is given by 1 × θ + (1 – θ) × 0 = θ.

The variance of a single trial is given by

(1 − θ)²θ + (0 − θ)²(1 − θ) = θ − 2θ² + θ³ + θ² − θ³ = θ(1 − θ).

The standard deviation is the square root of the variance:

σ = √(θ(1 − θ)).

The mean and standard deviation of the binomial distribution

The expressions involving a summation over the entire sample space can be used to find the mean and standard deviation of the binomial distribution but they take a bit of manipulation to find. Instead, we can take a short cut and use the fact that each trial is independent. The mean of the union of n trials is given by the sum of the means of the n trials. Similarly (for independent trials only), the variance of the union of the n trials is given by the sum of the variances.

Therefore, the mean of the binomial distribution for n trials is given by the number of trials × the mean for a single trial = nθ. The variance is given by nθ(1 − θ) and therefore the standard deviation is

σ = √(nθ(1 − θ)).
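For a small case, the shortcut formulas nθ and √(nθ(1 − θ)) can be checked against the direct summations over the sample space; the values n = 5, θ = 0.3 below are illustrative, not from the text:

```python
from math import comb, sqrt

n, theta = 5, 0.3   # illustrative values

# Binomial probability mass function over r = 0, ..., n.
pmf = [comb(n, r) * theta**r * (1 - theta)**(n - r) for r in range(n + 1)]

# Direct summations: mean = sum r p(r), variance = sum (r - mean)^2 p(r).
mean = sum(r * p for r, p in enumerate(pmf))
var = sum((r - mean) ** 2 * p for r, p in enumerate(pmf))

print(abs(mean - n * theta) < 1e-9)                          # True
print(abs(sqrt(var) - sqrt(n * theta * (1 - theta))) < 1e-9) # True
```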

Example 21.15

A file of data is stored on a magnetic tape with a parity bit stored with each byte (8 bits) making 9 bits in all. The parity bit is set so that the 9 bits add up to an even number. The parity bit allows errors to be detected, but not corrected. However, if there are two errors in the 9 bits then the errors will go undetected, three errors will be detected, four errors undetected, etc. A very poor magnetic tape was tested for the reproduction of 1024 bits and 16 errors were found. If on one record on the tape there are 4000 groups of 9 bits, estimate how many bytes will have undetected errors.

Solution Call 1 the outcome of a bit being in error and 0 that of a correct bit. We are given that in n = 1024 trials there were 16 errors. Taking 16 as the mean number of errors over 1024 trials and using the mean nθ, we have

1024θ = 16

θ = 16/1024 = 1/64.

Errors go undetected if there are 2,4,6, etc. The probability of two errors in 9 bits is given by

P9(2) = 9C2 θ²(1 − θ)⁷ = (9!/(7!2!))(1/64)²(63/64)⁷ ≈ 0.0078717.

Multiplying by the number of data bytes of 4000 gives approximately 31 undetected errors.

The probability of four errors will obviously be much less.

P9(4) = 9C4 θ⁴(1 − θ)⁵ = (9!/(4!5!))(1/64)⁴(63/64)⁵ ≈ 0.0000069.

This probability is too small to show up in only 4000 bytes. As the probabilities of six or eight errors are even smaller, they can safely be ignored.

The probable number of undetected errors is 31.
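The calculation in Example 21.15 can be reproduced numerically; the sketch below (illustrative only) also includes the higher even error counts, which, as argued above, contribute almost nothing:

```python
from math import comb

theta = 16 / 1024  # estimated bit error probability = 1/64

def p9(r):
    """Probability of exactly r bit errors in a 9-bit group."""
    return comb(9, r) * theta ** r * (1 - theta) ** (9 - r)

# An even, non-zero number of bit errors goes undetected by the parity check.
p_undetected = p9(2) + p9(4) + p9(6) + p9(8)
expected = 4000 * p_undetected  # expected undetected errors in 4000 groups
print(p9(2), p9(4), expected)
```

Running this confirms that P9(2) dominates and that the expected number of bytes with undetected errors is close to the 31 found above.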

21.13 The Poisson distribution

The Poisson distribution is used to model processes where the distribution of the number of incidents occurring in any interval depends only on the length of that interval. Examples of such systems are:

1. incoming telephone calls to an exchange during a busy period;

2. customers arriving at a checkout counter in a supermarket from 4 to 6p.m.;

3. accidents on a busy stretch of the M1;

4. number of misprints in a book.

When modelling situations in a Poisson process we use four assumptions:

1. If A is the event of n incidents in an interval and B the event of m incidents in another non-overlapping interval then A and B are independent, that is, p(A∩B) = p(A)p(B).

2. If A is the event of n incidents in an interval then P(A) depends only on the length of the interval, not on the starting point of the interval.

3. The probability of exactly one incident in a small interval is approximately proportional to the length of that interval, that is, P1(t) ≈ λt for small t.

4. The probability of more than one incident in a small interval is negligible. Thus, for small t, P2(t) ≈ 0 and we can also say that

lim(t→0) Pn(t)/t = 0 for n > 1.

It follows that P0(t) + P1(t) ≈ 1 and as by assumption (3), P1(t) ≈ λt we get P0(t) ≈ 1 – λt. We now think about the number of incidents in an interval of time of any given length, (0, t), where t is no longer small. We can divide the interval into pieces of length h, where h is small, and use the assumptions above. We can see that in each small interval of length h there is either no event or a single event. Therefore, each small interval is approximately behaving like a Bernoulli trial. This means that we can approximate the events in the interval (0, t) by using the Binomial distribution for the number of successes r in n trials. The probability of r incidents in n intervals, where the probability of an incident in any one interval is λh, is given by

Pr(t) = lim(h→0) nCr (λh)^r (1 − λh)^(n−r).

Substituting h = t / n gives

Pr(t) = lim(n→∞) (n!/((n − r)!r!)) (λ^r t^r/n^r) (1 − λt/n)^(n−r).

We can reorganize this expression, by taking out of the limit terms not involving n

Pr(t) = ((λt)^r/r!) lim(n→∞) (n!/((n − r)!n^r)) (1 − λt/n)^n (1 − λt/n)^(−r).

We can rewrite the first term inside the limit to give

Pr(t) = ((λt)^r/r!) lim(n→∞) (n/n)((n − 1)/n)···((n − r + 1)/n) (1 − λt/n)^n (1 − λt/n)^(−r).

Now we notice that the first term inside the limit is made up of the product of r fractional expressions, which each have a term in n on the top and bottom lines. These will all tend to 1 as n tends to ∞. The last term is similar to the limit that we saw in Chapter 7 when calculating the value of e. There, we showed that

lim(n→∞) (1 + 1/n)^n = e

and by a similar argument we could show that

lim(n→∞) (1 + x/n)^n = e^x.

It therefore follows that

lim(n→∞) (1 − λt/n)^n = e^(−λt).

The last expression involves a negative power of (1− λt/n), which will tend to 1 as n tends to ∞.

This gives the Poisson distribution as

Pr(t) = ((λt)^r/r!) e^(−λt).

This is an expression in both r and t where r is the number of events and t is the length of the time interval being considered. We usually consider the probability of r events in an interval of unit time, which gives the Poisson distribution as

Pr = (λ^r/r!) e^(−λ)

where λ is the expected number of incidents in unit time.
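The limiting argument above can be checked numerically: with the success probability λt/n in each of the n slices, the binomial probability approaches the Poisson probability as n grows. A Python sketch, with λ = 2, t = 1 and r = 3 chosen purely for illustration:

```python
from math import comb, exp, factorial

lam, t, r = 2.0, 1.0, 3  # illustrative values

def poisson(r, lam):
    """Poisson probability of r incidents with mean lam."""
    return lam ** r / factorial(r) * exp(-lam)

# Divide (0, t) into n slices of length h = t/n; each slice is an
# approximate Bernoulli trial with success probability lam * h.
for n in (10, 100, 10000):
    h = t / n
    p_binom = comb(n, r) * (lam * h) ** r * (1 - lam * h) ** (n - r)
    print(n, p_binom)

print(poisson(r, lam))  # the limiting Poisson value
```

The printed binomial values converge on the Poisson value as n increases.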

The mean and variance of the Poisson distribution

The Poisson distribution was introduced by considering the probability of a single event in a small interval of length h as λh. We then used the binomial distribution, with θ = λh and h = t/n and n tending to ∞, to derive the expression for the Poisson distribution. As the mean of the binomial distribution is nθ, it would make sense that the mean of the Poisson distribution is nλh. Using n = t/h we get the mean as λt over an interval of length t and therefore the mean is λ over an interval of unit length.

By a similar argument we know that the variance of the binomial distribution is nθ(1 − θ). Substituting θ = λh we get the variance as nλh(1 − λh). As n tends to infinity and h to 0, this tends to λt. Therefore, the variance in unit time is λ.

Example 21.16

The average number of ‘page not found’ errors on a web server is 36 in a 24-h period. Find the probability in a 60-min period that:

(a) there are no errors;

(b) there is exactly one error;

(c) there are at most two errors;

(d) there are more than three errors.

Solution Assuming that the above process is a Poisson process, then we have that the average number of errors in 24 h is 36 and therefore the average is 1.5 in 1 h. As the mean is λ, we can now assume that the number of errors in 1 h follows a Poisson distribution with λ = 1.5, giving

Pr = ((1.5)^r/r!) e^(−1.5).

(a) We want to find P(no errors) = P0 = ((1.5)⁰/0!)e^(−1.5) = e^(−1.5) ≈ 0.2231.

(b) We want to find P(exactly one error) = P1 = ((1.5)¹/1!)e^(−1.5) = 1.5e^(−1.5) ≈ 0.3347.

(c) P(at most two errors) = P0 + P1 + P2 = 0.2231 + 0.3347 + ((1.5)²/2!)e^(−1.5) ≈ 0.8088.

(d) P(more than three errors) = 1 − P(at most three errors) = 1 − (P0 + P1 + P2 + P3).

Using the result from Part (c) we get

P(more than three errors) = 1 − 0.8088 − ((1.5)³/3!)e^(−1.5) ≈ 1 − 0.8088 − 0.1255 = 0.0657.
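The arithmetic of Example 21.16 can be checked with a short Python sketch (illustrative only; note that evaluating part (d) without intermediate rounding gives 0.0656 rather than the 0.0657 obtained from the rounded values above):

```python
from math import exp, factorial

lam = 1.5  # mean number of 'page not found' errors per hour

def P(r):
    """Poisson probability of r errors in 1 h."""
    return lam ** r / factorial(r) * exp(-lam)

p_none = P(0)                                        # (a)
p_one = P(1)                                         # (b)
p_at_most_two = P(0) + P(1) + P(2)                   # (c)
p_more_than_three = 1 - sum(P(r) for r in range(4))  # (d)

print(round(p_none, 4), round(p_one, 4),
      round(p_at_most_two, 4), round(p_more_than_three, 4))
```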

21.14 Summary

1. The mean of a sample of data can be found by using

x̄ = (1/n) Σi xi


where the summation is over all sample values and n is the number of values in the sample. If the sample is divided into class intervals then

x̄ = (1/n) Σi fixi

where xi is a representative value for the class, fi is the class frequency, and the summation is over all classes.

2. The standard deviation of a sample of data can be found by using

σ = √((1/n) Σi (xi − x̄)²)


where the summation is over all sample values, n is the number of values in the sample, and x̄ is the sample mean. If the sample is divided into class intervals then

σ = √((1/n) Σi fi(xi − x̄)²)

where xi is a representative value for the class, fi is the class frequency, x̄ is the sample mean, and the summation is over all classes. The square of the standard deviation is called the variance.
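A minimal Python sketch of these formulas for ungrouped data (using the 1/n form of the variance given above):

```python
from math import sqrt, isclose

def sample_mean_std(values):
    """Mean and standard deviation using the 1/n form in the summary."""
    n = len(values)
    mean = sum(values) / n
    var = sum((x - mean) ** 2 for x in values) / n  # the variance
    return mean, sqrt(var)

data = [4, 8, 6, 2, 10]  # illustrative sample
mean, std = sample_mean_std(data)
assert isclose(mean, 6.0)
assert isclose(std, sqrt(8.0))  # variance = (4 + 4 + 0 + 16 + 16)/5 = 8
print(mean, std)
```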

3. The cumulative frequency is found by summing the values of the current class and all previous classes. It is the ‘number so far’.

4. The relative frequency of a class is found by dividing the frequency by the number of values in the data sample – this gives the proportion that fall into that class. The cumulative relative frequency is found by dividing the cumulative frequency by the number in the sample.

5. In probability theory the set of all possible outcomes of a random experiment is called the sample space. The probability distribution function, for a discrete sample space, is a function of the outcomes that obeys the conditions:

0 ≤ p(xi) ≤ 1


where xi is any outcome in the sample space and

Σi p(xi) = 1

where the summation is over all outcomes in the sample space.

6. An event is a subset of the sample space. The probability of an event (if all outcomes are equally likely) is

p(E) = (the number of outcomes in the event)/(the number of outcomes in the sample space).


7. The addition law of probability is given by P(A∪B) = P(A) + P(B) − P(A∩B) for non-disjoint events.
If A∩B = ∅ this becomes P(A∪B) = P(A) + P(B) for disjoint events.

8. Multiplication law of probabilities: p(A∩B) = p(A)p(B|A), which holds whether or not events A and B are independent.

9. The definition of independence is that B is independent of A if the probability of B does not depend on A: p(B|A) = p(B) if B is independent of A. In this case the multiplication law becomes p(A∩B) = p(A)p(B), where A and B are independent events.

10. Bayes's theorem is

p(B|A) = p(A|B)p(B)/p(A).
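A small numerical illustration of Bayes's theorem (all probabilities below are assumed purely for illustration):

```python
# Hypothetical setup: a test flags a fault (event A) in a component
# that may actually be defective (event B). All numbers are assumed.
p_B = 0.02               # prior: component is defective
p_A_given_B = 0.95       # test flags a defective component
p_A_given_not_B = 0.01   # false alarm on a good component

# Total probability: p(A) = p(A|B)p(B) + p(A|not B)p(not B)
p_A = p_A_given_B * p_B + p_A_given_not_B * (1 - p_B)

# Bayes's theorem: p(B|A) = p(A|B)p(B) / p(A)
p_B_given_A = p_A_given_B * p_B / p_A
print(round(p_B_given_A, 4))
```

Even with a reliable test, the posterior probability of a genuine defect is only about 66% here, because defects are rare to begin with.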


11. The normal, or Gaussian, distribution is a bell-shaped distribution. Many things, particularly involving error distributions, have a probability distribution that is approximately normal.

12. To calculate probabilities using a normal distribution we use areas of the standard normal distribution in table form, so the variable must be standardized by using the transformation

z = (x − µ)/σ


where μ is the mean and σ the standard deviation of the distribution.

13. The exponential distribution is used to model times to failure.

14. The probability density function of the exponential distribution is f(t) = λe^(−λt) and the cumulative distribution function is given by F(t) = 1 − e^(−λt).

15. The mean and standard deviation of a continuous distribution can be found by

µ = ∫ x f(x) dx  and  σ = √(∫ (x − µ)² f(x) dx)


where the integration is over the sample space. For the exponential distribution these give µ = 1/λ, that is, the mean time to failure is the reciprocal of the failure rate. Also σ = 1/λ.
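A minimal Python sketch of the exponential cumulative distribution function F(t) = 1 − e^(−λt) (the failure rate λ = 0.5 per hour is assumed for illustration):

```python
from math import exp, isclose

lam = 0.5  # assumed failure rate per hour, for illustration

def F(t):
    """Cumulative distribution: probability of failure by time t."""
    return 1 - exp(-lam * t)

mean_time_to_failure = 1 / lam  # reciprocal of the failure rate
assert isclose(F(0), 0)
# By the mean time to failure, 1 - e^(-1) (about 63%) have failed.
assert isclose(F(mean_time_to_failure), 1 - exp(-1))
print(mean_time_to_failure, F(mean_time_to_failure))
```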

16. The binomial distribution is a discrete distribution that models repeated trials where the outcome of each trial is either success or failure and each trial is independent of the others. Its probability function is

pn(r) = nCr θ^r (1 − θ)^(n−r)


where

nCr = n!/((n − r)!r!) = n(n − 1)···(n − r + 1)/r!

and r is the number of successes in n trials.

17. The mean of a discrete distribution is given by µ = Σxp(x) and the variance is σ² = Σ(x − µ)²p(x). The mean of the binomial distribution, for n trials, is nθ, the variance is nθ(1 − θ), and the standard deviation is σ = √(nθ(1 − θ)).

18. For the Poisson distribution the number of incidents occurring in any interval depends only on the length of that interval. Its probability function is

Pr = (λ^r/r!) e^(−λ)


where λ is the expected number of incidents in unit time. The mean and the variance are λ.

21.15 Exercises

21.1. An integrated circuit design includes a capacitor of 100 pF (picofarads). After manufacture, 80 samples are tested and the following capacitances found for the nominal 100-pF capacitor (the data are expressed in pF)

90 100 115 80 113 114 99 105 90 99 106 103
95 105 95 101 87 91 101 102 103 90 96 97
99 86 105 107 93 118 94 113 92 110 104 104
95 93 95 85 96 99 98 83 97 96 98 84
102 109 98 111 119 110 108 102 100 101 104 105
93 104 97 83 98 91 85 92 100 91 101 103
101 86 120 96 101 102 112 119


Express the data in table form and draw a histogram. Find the mean and standard deviation.

21.2. What is the probability of throwing a number over 4 on one throw of a die?

21.3. What is the probability of throwing a number less than 4 or a 6 on one throw of a die?

21.4. What is the probability of throwing an odd number or a number over 4 on one throw of a die?

21.5. What is the probability of drawing any Heart or a King from a well shuffled pack of cards?

21.6. On two throws of a die, what is the probability of a 6 on the first throw followed by an even number on the second throw?

21.7. Throwing two dice, what is the probability that the sum of the dice is 7?

21.8. What is the probability that the first two cards dealt from a pack will be clubs?

21.9. A component in a communication network has a 1% probability of failure over a 24-h period. To guard against failure an identical component is fitted in parallel with an automatic switching device should the original component fail. If that also has a 1% probability of failure what is the probability that despite this precaution the communication will fail?

21.10. Find the reliability of the system, S, in Figure 21.31. Each component has its reliability marked in the figure. Assume that each of the components is independent of the others.

f21-31-9780750658553
Figure 21.31 A system for Exercise 21.10.

21.11. A ball is chosen at random out of a bag containing two black balls and three red balls and then a selection is made from the remaining four balls. Assuming all outcomes are equally likely, find the probability that a red ball will be selected:

(a) the first time;

(b) the second time;

(c) both times.

21.12. A certain brand of compact disc (CD) player has an unreliable integrated circuit (IC), which fails to function on 1% of the models as soon as the player is connected. On 20% of these occasions the light displays fail and the buttons fail to respond, so that it appears exactly the same as if the power connection is faulty. No other component failure causes that symptom. However, 2% of people who buy the CD player fail to fit the plug correctly, in such a way that they also experience a complete loss of power. A customer rings the supplier of the CD player saying that the light displays and buttons are not functioning on the CD. What is the probability that the fault is due to the IC failing as opposed to the poorly fitted plug?

21.13. If a population, which is normally distributed, has mean 6 and standard deviation 2, then find the proportion of values greater than the following:

(a) 9

(b) 10

(c) 12

(d) 7

21.14. If a population, which is normally distributed, has mean 3 and standard deviation 4, find the proportion of values less than the following:

(a) 1

(b) –5

(c) –1

(d) 0

21.15. If a population, which is normally distributed, has mean 10 and standard deviation 3, find the proportion of values that satisfy the following:

(a) x> 3

(b) x < 12

(c) x > 9

(d) x < 11

(e) 9 < x < 11

(f) 3 < x < 12

21.16. A car battery has a mean life of 4.2 years and a standard deviation of 1.3 years. It is guaranteed for 3 years. Estimate the percentage of batteries that will need replacing under the guarantee.

21.17. A certain component has a failure rate of 0.3 per hour. Assuming an exponential distribution calculate the following:

(a) the probability of failure in a 4-h period;

(b) the probability of failure in a 30-min period;

(c) the probability that a component functions for 1 h and then fails to function in the second hour;

(d) of a group of five components the probability that exactly two fail in an hour.

21.18. A certain town has 50 cash dispenser machines but due to inaccessibility is only visited for repairs once a week, after which all the machines are working. The failure rate of the machines is approximately 0.05 per 24 h. A town councillor makes a public complaint in a newspaper that on average at least one in 10 of the machines does not function. Assume an exponential model and calculate the number that are not functioning 1 day, 2 days, …, 7 days after the day of the visit. Take the mean of these results to assess whether the councillor is correct.

21.19. A bag contains two red balls and eight green balls. A ball is repeatedly chosen at random from the bag, its colour recorded and then replaced. Find the following probabilities:

(a) the first three picked were green;

(b) in a selection of five there were exactly two red balls;

(c) there were no more than three red balls out of the first 10.

21.20. One hundred CDs, each containing 74 min of recording time, were tested for defects. The frequency of defects is given in Table 21.6. Calculate the mean number of defects per 74 min of recording time and choose an appropriate probability model. Using your model, copy and complete the two empty columns of the table. Comment on the agreement between the number of incidents and the chosen model frequencies.

Table 21.6

Data for Exercise 21.20

Number of defects r    Number of CDs      Chosen model       Chosen model
in 74 min              with r defects     probabilities      frequencies
0                      23
1                      32
2                      26
3                      12
4                       5
5                       2
6+                      0

21.21. Telephone calls are received at a call centre at a rate of 0.2 per second on average. Calculate the probability that more than 11 calls are received in 1 min.

21.22. Tankers arrive at a dock at a rate of four per day. Assume that the arrivals are a Poisson process and find the following:

(a) the probability that less than five tankers arrive during 1 day;

(b) the probability that there are over five arrivals;

(c) the probability that there are exactly five arrivals;

(d) the probability that there are between two and five arrivals, inclusive.
