Chapter 7

Measuring Uncertainty

The study of the rules of probability is interrupted in order to deal with an important, outstanding issue: the measurement of uncertainty. The method of comparison with a standard, which was used to obtain the rules, is rarely satisfactory and other methods need to be developed.

7.1 Classical Form

With any event E is associated the complementary event Eᶜ that is true whenever E is false, and false whenever E is true. It was shown in §3.7 that your two probabilities for these events necessarily add to one: p(E) + p(Eᶜ) = 1. It follows that the measurement of the uncertainty of any event may be replaced by that of its complement, because either probability can be calculated from the other. We saw in §5.5 an example involving birth dates where this was advantageous. Here we study the special case where your beliefs in the event and its complement are the same: p(E) = p(Eᶜ). In that case, since they add to one, both probabilities must equal one half: p(E) = p(Eᶜ) = ½. An example is provided by the genuine toss of what appears to you to be a coin from a reputable mint, where your belief that it will land heads equals that for tails; hence both events, “heads” and “tails”, have probability one half. Notice that there is no obligation on you to have the same beliefs in the two outcomes, only that if you do, your probabilities are both one half. The idea extends to the throw of a cubical die: if you have the same beliefs for each of the six faces falling uppermost, then each must have probability one sixth, for six equal numbers that add to one must each equal one sixth. Strictly, we have yet to prove an addition rule for probabilities with more than two events; it will be done in §8.1.

Generally, if an uncertain outcome has N possibilities, only one of which can occur, and if your beliefs in each possibility are the same, then your belief in each is 1/N. The coin has N = 2, the die N = 6, and roulette N = 37 or 38. This is the classical definition of probability and has essentially been used in §3.2 when considering an urn containing N balls numbered consecutively from 1 to N, for if all numbers are equally uncertain when a single ball is withdrawn, we said the ball was drawn at random and each value had probability 1/N. The classical definition is fine in a limited context but is deficient in that for most cases such a split into equally uncertain possibilities does not exist; for example, contemplating uncertainty about tomorrow's weather has no such split. The real importance is as a standard with which other events may be compared. Notice that a tangible account of equal beliefs was provided by your attitude to a reward, in that if you are indifferent between a prize if ball 7 is withdrawn, or one contingent on ball 37, and this for any pair of different numbers, then your beliefs, and hence your probabilities, are all the same. The classical definition is therefore operational.
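The classical rule is simple enough to state in a few lines of code. The sketch below, an illustration added here rather than part of the original argument, uses exact fractions so that the N equal probabilities visibly add to one:

```python
from fractions import Fraction

def classical_probability(n_possibilities: int) -> Fraction:
    """Classical probability: N equally uncertain possibilities,
    exactly one of which occurs, so each gets belief 1/N."""
    return Fraction(1, n_possibilities)

# The examples from the text: coin, die, and the two roulette wheels.
coin = classical_probability(2)        # 1/2
die = classical_probability(6)         # 1/6
roulette_eu = classical_probability(37)
roulette_us = classical_probability(38)

# The N equal probabilities add to one, as the rules require.
assert sum(classical_probability(6) for _ in range(6)) == 1
```

The use of `Fraction` keeps the arithmetic exact, which matters when checking that probabilities sum to one.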

It is perhaps worth repeating the point, illustrated with the toss of a coin above, that there is no obligation on you to have a probability of one half for heads, since you may judge the coin to be biased and take 0.55 or any other value between zero and one. Similarly, you may judge the roulette wheel to be biased. These are not illogical values, merely unusual ones; indeed, they may be sensible if you have reason to suspect the casino. Some people argue that if there are N possibilities, only one of which can arise, and if you assign probability 1/N to each, then you are ignorant of the outcome, using this as a definition of ignorance. This is unsound, for it is a strong statement to judge all possibilities equally uncertain, surely not one of ignorance. Why, with N = 2, is a probability of 0.50 ignorance but one of 0.55 knowledge? Our attitude is that judgments of uncertainty are always made against a knowledge base and that ignorance, or an empty base, is not a sensible position. As soon as you understand the meanings of “toss” and “coin”, you are not ignorant of coin tosses. Ignorance has no place here, but this does not mean that the assignments of 1/N should be avoided; on the contrary, they often provide a convenient default position. Thus, suppose geneticists are attempting to isolate a gene in a species having N chromosomes; their knowledge base may not guide them as to which chromosome it lies on and, in default of more information, they may assign the same probability to each. Similarly, at the commencement of a police investigation with N suspects, the police might reasonably regard all as equally likely to be guilty. In neither case is it ignorance, but merely a sensible position describing uncertainty.

The concept of equal beliefs, basic to the classical form, can often be used to advantage in other situations; we illustrate this with the example in §4.1 of inflation next year. It may be convenient to think of an inflation figure that you think is as likely to be exceeded as not attained. If you settle on 3%, then your probability of inflation being less than 3% is ½ and the same value of ½ holds for values greater than 3%. The idea can be extended to find a value, like 2%, such that you think inflation is as likely to be less than 2% as between 2% and 3%. Similarly, 5% might be a value such that you feel inflation is as likely to exceed it as to lie between 3% and 5%. Now you have four ranges of inflation, all equally probable. The idea can be extended to provide a probability distribution, as will be shown in §9.8. There remain many phenomena where the classical definition cannot be used, so we pass to a more powerful device based on frequency.
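To make the idea concrete, suppose, purely for illustration, that your beliefs about next year's inflation were summarized by a normal distribution centred at 3% with a spread of 1.5% (an assumed model, not one the text commits to). The three dividing values then split inflation into four equally probable ranges:

```python
from statistics import NormalDist

# Purely illustrative: beliefs about inflation as a normal distribution
# centred at 3% with standard deviation 1.5% (assumed numbers).
beliefs = NormalDist(mu=3.0, sigma=1.5)

# The quartile points split inflation into four equally probable ranges.
q1, median, q3 = (beliefs.inv_cdf(p) for p in (0.25, 0.50, 0.75))

def range_prob(lo, hi):
    """Probability that inflation lies between lo and hi (None = open end)."""
    upper = beliefs.cdf(hi) if hi is not None else 1.0
    lower = beliefs.cdf(lo) if lo is not None else 0.0
    return upper - lower

# Each of the four ranges carries probability 1/4.
for lo, hi in [(None, q1), (q1, median), (median, q3), (q3, None)]:
    assert abs(range_prob(lo, hi) - 0.25) < 1e-6

print(round(q1, 2), round(median, 2), round(q3, 2))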

7.2 Frequency Data

Earlier we met the idea that if you, as a doctor, had seen many patients with a disease and noted that a proportion p of them had exhibited a symptom, then your probability that a further patient with the disease would show the symptom was also p. That is, you pass from a frequency among the patients seen, to a probability or belief about a further patient. This passage is so common that there has grown up a confusion between frequency, which refers to data, and probability, which is belief, so that people speak of the frequency interpretation of probability. There is a connection between the two concepts, but it is wrong to identify them, so let us investigate the situation carefully, starting with a simple example.

Suppose that you have before you a drawing-pin; the American term is a thumbtack. Such a pin has the property that, when tossed, it can either land with the point down on the table, D, or sticking up in the air, U. You are uncertain about the event that the pin will fall with the point up and will express this by your probability p (U). We assume some knowledge base that remains fixed throughout the discussion. Now let the pin be tossed a number of times under conditions that remain stable; you do not, for instance, alter the tossing procedure. To be specific, let the results of 10 tosses in order be UUDUDUUUDD and denote this result by x. Notice that 6 times the pin fell uppermost and 4 times it fell with the point down, giving a frequency of U s of 0.6. You are about to toss an eleventh time and are uncertain about the event U on that occasion. What is your probability p (U |x)? A natural response is 0.6, the frequency in the series of 10 tosses, and this is the procedure used by the doctor in the example. Is it sound? Can you pass from a frequency to a belief in this way? Is it coherent to do so? There are three reasons for thinking that the passage from frequency to belief is not so straightforward.
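The frequency in question is just a count divided by the number of tosses; in code, a trivial sketch of the arithmetic:

```python
tosses = "UUDUDUUUDD"           # the 10 recorded tosses, in order
n_up = tosses.count("U")        # times the pin fell point up
frequency = n_up / len(tosses)  # proportion of U's in the series
print(n_up, frequency)          # 6 point-up results, frequency 0.6
```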

First, suppose that you had only tossed the pin once instead of 10 times. Looking back at the series of 10 results given above, you see that the first toss gave U and that the frequency of U s is therefore 100%. Is your probability that the second toss will result in the pin falling point uppermost the same as the frequency, that is, 1? Surely not; it might have increased a little from the original value p(U), as with the red and white urns in §6.9, but not so far as to make the event certain for you, thereby violating Cromwell's rule (see §6.8). So you cannot make the identification of frequency and belief when the former is based on little information, here a single toss. If 10 is enough to allow the identification, but 1 is not, where do you draw the borderline; is 7 enough?

A second reason for doubting the identification is demonstrated by shifting the example. Suppose that the pin and the tosses are replaced by observations of the weather on successive days. Each day you observe if it is dry D or unsettled U, meaning “not dry”. Suppose you record the weather on 10 successive days and obtain the same sequence x as before. What is your probability that the eleventh day will be unsettled? I suggest that the frequency of unsettled days in the last 10 days, here 0.6, is not a reasonable answer, at least under my knowledge base, because successive days of weather tend to be alike. Indeed, the forecast that tomorrow's weather will be the same as today's is often better than the one based solely on the frequency of weather. The last two days in x were both dry, so we are in a dry spell and your probability that tomorrow will be unsettled may be less than the frequency 0.6. This example is based on weather in Western Europe, and readers in other parts of the world may need to adapt it to their own conditions, using their knowledge base. The key point is that the order of the U s and D s may matter with weather, but usually not with drawing-pins.

Those are two reasons for doubting the identification of frequency with belief. Here is another of a different character. Suppose, after having tossed the pin with the results given, you are now provided with a different type of drawing-pin and told that the next, eleventh, toss is to be made with this pin. It would not be sensible to ignore the 10 tosses already made, since they do provide you with some information about pins in general, but on the other hand, it is a different pin and the direct use of the frequency is dubious. You may, for example, look at the new pin and see that the head is heavier than the one used in the tossing, so perhaps this one is more likely to fall point upward than the other. You might therefore express your belief with a value greater than the frequency of 0.6.

So while the idea of identifying belief with frequency is attractive, it cannot be used in all circumstances. Nevertheless, frequencies surely do influence beliefs and what has to be done is to understand the relationship between the two ideas. This we proceed to do.

7.3 Exchangeability

Consider again the drawing-pin and the result of the 10 tosses UUDUDUUUDD that was abbreviated to x. Each toss could result in one of two outcomes, so there are 2 × 2 ×…× 2 (with 10 twos), or 1,024 possible results for the 10 tosses. Before you perform the tossing, you are uncertain about the outcomes and therefore, by the general thesis, ideally have probabilities for each of the 1,024 possibilities, the assessment of which is a formidable task if only because of the number involved. An assumption is now introduced, whose adoption will make this task much easier. It needs to be emphasized that the assumption is not always appropriate.

Suppose that when you think about the possible results of the 10 tosses, you feel that your probability for any series depends only on the number of times the pin falls with the point upward and not on the arrangement of the U s and D s in the series. Thus, in the case cited, your probability of the result depends only on the fact that there are 6 U s (and therefore 4 D s), so that UUUUDDUDUD, still with 6 U s and 4 D s but in a different order, is, for you, just as probable as what you actually observed. If this were so, you would have a much easier assessment task, for there would now be only 11 possibilities, not 1,024, that is, from 0 to 10 U s. One way of expressing this is to say that any one toss, with its resulting outcome, may be exchanged for any other with the same outcome, in the sense that the exchange will not alter your belief, expressing the idea that the tosses were done under conditions that you feel were identical. Here is the formal definition:

A series of results, each of which can be one of the same two types, is exchangeable for you under knowledge base K if your probability for the series under K depends only on the numbers of the two types and not on their positions in the series. It will be called the assumption of exchangeability. In the example, the two types are U and D. Your probability, assuming exchangeability, for the series x with 6 U s out of 10, may be written p(6|10). Given 10 tosses, this is your probability for 6 U s. Series that are exchangeable are of special importance because there are many series that almost all people agree are exchangeable, and because of the simplicity that they introduce into the structure of your beliefs. The concept is related to that of sufficiency mentioned toward the end of §6.9, the number of U s, rather than their order, being sufficient.

The assumption of exchangeability implies that the series of outcomes UUUUUUDDDD, in which the 6 U s and 4 D s each occur together, is just as probable for you as the original series in which the U s and D s were mixed up. People are often unhappy with this, but its resolution is to notice that the series in the last sentence has a pattern to it, whereas the other is chaotic, and there are vastly more chaotic series than there are those with a pattern. There are 210 possible arrangements of 6 U s and 4 D s, very few of which exhibit a pattern. It is the pattern that singles out that series, not its uncertainty; it is a coincidence that the U s and D s form clumps. Coincidences are hard to discuss because the striking pattern, which owes nothing to uncertainty, is easily confused with your uncertainty. And there is the question of what constitutes a pattern; does UDDUUUDDUU have a pattern because the last 5 tosses are identical to the first 5?
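The count of 210 arrangements is easy to verify by enumeration; the small check below is added for illustration:

```python
from itertools import combinations
from math import comb

# All distinct arrangements of 6 U's and 4 D's in 10 places.
arrangements = {
    "".join("U" if i in positions else "D" for i in range(10))
    for positions in combinations(range(10), 6)
}
assert len(arrangements) == comb(10, 6) == 210

# Under exchangeability each of the 210 has the same probability;
# only a handful, like "UUUUUUDDDD", show an obvious pattern.
assert "UUUUUUDDDD" in arrangements
assert "UUDUDUUUDD" in arrangements  # the observed series x
```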

Exchangeability implies that your probability of U at any place in the series is the same as U at any other place. To see this, consider the first two terms in the series with the 2 × 2 = 4 possibilities

UU, UD, DU, DD.

Exchangeability implies that UD and DU have the same probability. U occurs at the first place if either UU or UD occurs, whereas it occurs at the second with UU or DU, and in either case, your probability of a U is the sum of your probabilities for the two possibilities. Now UU is common to both and UD has the same probability as DU, so the two sums are equal and U is just as probable at the first place as at the second. Generalizing this argument, it is apparent that any arrangement, say UDU, is just as probable at any place in an exchangeable series as at any other. An exchangeable series is stationary; its uncertainties do not change with place. In thinking about these results, you need to distinguish between your probability of U in the second place, on knowledge base K, and the same uncertainty when you have already observed U in the first place. The notation makes this clear, comparing p(U₂) and p(U₂|U₁), where a subscript refers to the place in the series and K is understood. Notice that p(U₂|U₁) is easily calculated in terms of the basic, exchangeable values, as p(U₁U₂)/p(U₁) by the multiplication rule (§5.3). Since, as was seen above, p(U₁) = p(U₂), it follows from this last result that p(U₁|U₂) = p(U₂|U₁), so that looking backward, on the left-hand side of the equation, is the same as looking forward, on the right. Although in general for an exchangeable series, U₁ and U₂ are not independent, p(U₂|U₁) ≠ p(U₂), in §7.5 it will be seen that any exchangeable series can always be built up from series in which independence does obtain.
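The symmetry argument can be checked numerically. The sketch below builds an exchangeable model as a mixture of two Bernoulli series, anticipating §7.5; the particular values 1/3, 2/3 and the weight 0.4 are assumptions for illustration only:

```python
# An exchangeable model as a mixture of two Bernoulli series:
# theta values with their weights (illustrative assumptions).
thetas = [(1/3, 0.4), (2/3, 0.6)]

def p_seq(seq):
    """Probability of a particular U/D sequence under the mixture."""
    total = 0.0
    for theta, weight in thetas:
        prob = 1.0
        for outcome in seq:
            prob *= theta if outcome == "U" else 1 - theta
        total += weight * prob
    return total

# p(U at place 1) = p(UU) + p(UD); p(U at place 2) = p(UU) + p(DU).
p_u1 = p_seq("UU") + p_seq("UD")
p_u2 = p_seq("UU") + p_seq("DU")
assert abs(p_u1 - p_u2) < 1e-12          # U equally probable at both places

# Looking backward equals looking forward: p(U1|U2) = p(U2|U1).
p_u2_given_u1 = p_seq("UU") / p_u1
p_u1_given_u2 = p_seq("UU") / p_u2
assert abs(p_u1_given_u2 - p_u2_given_u1) < 1e-12

# But U1 and U2 are not independent: p(U2|U1) differs from p(U2).
assert abs(p_u2_given_u1 - p_u2) > 1e-3
```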

Most people would consider the series of tosses of a drawing-pin to be exchangeable. They would not think it true of the series of weather on successive days, because consecutive days tend to be more alike than widely separated days, so that UUUDDD is more probable for them than UDUDUD, despite the frequency being 0.5 in both cases. The records of the doctor observing the presence or absence of a symptom with a disease, you might think exchangeable, though if you knew the sexes of the patients and thought the disease was sex related, you might not. This example also serves to illustrate an important point, that since the definition of exchangeable depends on your probabilities, it depends on your knowledge base, and a series exchangeable under one base, without knowledge of sex, may fail to be under another, with knowledge of sex.

Let us return to the question of the connection, if any, between frequency and probability. In the case of the pin, you wanted to pass from the results of the 10 tosses to an eleventh toss about to be made. The only aspect of the 10 tosses that matters under exchangeability is the 6 U s. One possibility is for you to consider the 11 tosses, the 10 already seen and the new one, exchangeable. If so, we say that the new toss is exchangeable with the others. This would be reasonable with the single pin, but not when the eleventh toss was to be performed with a different pin. It might be fine with the medical example, but perhaps not if you were the next patient and you thought yourself different in some relevant way from those patients the doctor had already seen. For example, the study may have been made in one country and you are a resident of another.

Three examples were mentioned in §7.2: tosses of a pin, weather on successive days, and tosses of one pin aiding beliefs about another. The second series is not exchangeable, and in the third, the further toss is not exchangeable with the previous ones. In both these cases, we ruled out the possible identification of frequency with belief. It is only in the first example with exchangeability in the series and extended to a further toss that the identification might be reasonable, and it is this case that we study further, beginning with a special type of exchangeable series.

7.4 Bernoulli Series

To illustrate the series, let us return to our basic urn with balls, all indistinguishable from each other except for color, some being red, the rest white, the numbers of both types being known to you, and from which you think a ball is to be drawn at random. Denote the proportion of red balls by the Greek letter θ, pronounced “theta” with a long e. (There is an important reason for going outside the Roman alphabet that will appear in §11.4.) Remember that you know the value of θ. Under these circumstances, your probability that a withdrawn ball will be red is θ. Suppose the number of balls in the urn is vast, so that the withdrawal of even a few balls will not affect the constitution of the urn and, in particular, will not change θ. Then your probability that a second ball will be red is still θ, and it remains θ however many balls are withdrawn. Furthermore, your probability is not affected by the results of all previous withdrawals; even if 10 withdrawals have each produced a white ball, your probability of drawing a red ball on the eleventh withdrawal remains θ, unless you discard the premise of randomness. In the terminology of §4.3, the withdrawals are independent, though only two events were considered there; the extension to many events will be introduced in §8.8. With this example in mind, we make the definition:

A series, each member of which can have one of only two outcomes, is for you a Bernoulli series if your probability of one outcome is the same for every member of the series and is independent of any earlier outcomes in the series. It is named after a Swiss mathematician.

A Bernoulli series is somewhat artificial because you never learn from it, in the sense that your probability remains fixed at θ, whatever happens. Such is the artificiality of the series that even 100 withdrawals, all of which resulted in a white ball, would not change your belief that withdrawal 101 would be red. Despite this, the Bernoulli series is most important, for a reason to be explained now. It is easy with a Bernoulli series to calculate your probability of any result. For example, take the series we had before with the toss of a drawing-pin, UUDUDUUUDD with 6 U s and 4 D s. If it is Bernoulli, the probability for each U is θ, for each D, 1 − θ, and since the outcome of any one toss is judged by you to be independent of previous tosses, these probabilities may be multiplied (see §4.3). Hence, your probability for that series is θ⁶(1 − θ)⁴, depending only on the number of U s, here 6, out of the 10 tosses. It follows that a Bernoulli series is exchangeable, since the dependence solely on the numbers of U s was our criterion for exchangeability. It is a special type of exchangeable series in which you judge the individual events to be independent. With independence, it is easy to write down your probabilities for any series by multiplication. With series that are exchangeable, but not Bernoulli, we do not, at the moment, know how to do this. For an exchangeable series of length 10, we saw there were 11 probabilities to think about, whereas in the Bernoulli case, there is only one, namely θ, your probability for any U; once you know that, you can find all the others by multiplying the appropriate numbers of θ and 1 − θ together. However, there is a link between exchangeable and Bernoulli series that enables the exchangeable calculation to be made in terms of the Bernoulli.
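In code, the Bernoulli calculation is a single multiplication, and its indifference to order, the mark of exchangeability, can be checked directly; an illustrative sketch:

```python
def bernoulli_series_prob(series: str, theta: float) -> float:
    """p(series) for a Bernoulli series: multiply theta for each U
    and (1 - theta) for each D, by independence."""
    n_up = series.count("U")
    n_down = series.count("D")
    return theta ** n_up * (1 - theta) ** n_down

x = "UUDUDUUUDD"
# The probability depends only on the counts (6 U's, 4 D's), not on
# the order, so a Bernoulli series is exchangeable:
assert bernoulli_series_prob(x, 0.3) == bernoulli_series_prob("UUUUUUDDDD", 0.3)
print(bernoulli_series_prob(x, 0.3))
```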

7.5 De Finetti's Result

We return to the familiar urn with a large number of balls that are identical except for their colors, some red, the rest white. Suppose that, unlike the case in the last section, you do not know the proportion of red balls but are told truthfully that it is one of two values, θ₁ or θ₂. In §6.9, the case where θ₁ was 1/3 and θ₂ was 2/3 was considered, the former being referred to as the white urn, the latter the red urn. Now the values of θ₁ and θ₂ are not restricted but, merely for identification, it is supposed that θ₁ is the smaller; the urn with the lesser proportion of red balls is referred to as the white urn. You do not know whether it is the red or the white urn that is before you and, since you are uncertain, you will have a probability that it is the white one, p say, and 1 − p that it is the red urn.

Were you to know the proportion of red balls, you would, on withdrawing balls at random, have a Bernoulli series. Suppose that the balls are drawn at random from the urn without knowing whether it is the white or the red one, and that the result of 10 such drawings is RRWRWRRRWW, abbreviated to x, essentially the same as with the tosses, though for ease of relating the result to the urns the notation has been changed, U to R, D to W, retaining x for the data. In the original urn treatment, lower-case letters were used for the data and upper-case for the true constitution. Here θ replaces the latter, freeing the capital letters. Complete consistency of notation is rarely possible. What is your probability, p(x), for this result; that is, before the drawings were made, what was your belief that this result would be obtained? It is not easy to see directly, but recall from §5.6 the rule of the extension of the conversation and extend your discussion of the series to include the value of θ. You require p(x), which, by the rule, is

p(x) = p(x | θ₁) p(θ₁) + p(x | θ₂) p(θ₂).

Now all the terms on the right-hand side of this equation are known, since once you know the proportion of red balls, the series is Bernoulli, with p(x | θ₁) = θ₁⁶(1 − θ₁)⁴, and similarly for the other possibility, p(x | θ₂) = θ₂⁶(1 − θ₂)⁴. Your probability of it being the white urn, corresponding to θ₁, was written p, so substituting these values into the right-hand side, you have

p(x) = θ₁⁶(1 − θ₁)⁴ p + θ₂⁶(1 − θ₂)⁴ (1 − p),   (7.1)

and the calculation is complete. The argument has been presented here for the case where there were only two values of θ. If there were three, the same procedure of extending the conversation would be available, except that now there would be three terms on the right-hand side of (7.1). Generally, any number of values of θ can be included, resulting in that number of terms on the right-hand side.
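Equation (7.1) and its many-valued generalization amount to a weighted sum, easily sketched in code; the choice p = 0.5 below is an illustrative assumption, not a value from the text:

```python
def p_x(r, w, prior):
    """p(x) for r reds and w whites, by extending the conversation:
    sum theta^r (1 - theta)^w p(theta) over the possible theta values,
    as in Equation (7.1)."""
    return sum(theta ** r * (1 - theta) ** w * p for theta, p in prior)

# Two urns as in the text, theta1 = 1/3 (white), theta2 = 2/3 (red),
# with an assumed p = 0.5 of it being the white urn; x = RRWRWRRRWW.
two_urns = [(1/3, 0.5), (2/3, 0.5)]
print(p_x(6, 4, two_urns))

# Three or more values of theta simply add more terms (assumed numbers):
three_values = [(0.2, 0.3), (0.5, 0.4), (0.8, 0.3)]
print(p_x(6, 4, three_values))
```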

It is obvious that this series, with two possible values of θ, is exchangeable, because the result just obtained depends only on 6, the number of red balls, out of 10, and not on their order of withdrawal. So we have established that the withdrawal of balls at random from an urn of unknown composition generates an exchangeable series. De Finetti showed that every series with two possibilities, here R and W, that you judge to be exchangeable can be represented as random withdrawals from an urn of unknown composition. In other words, the procedure just described is available for every exchangeable series, and exchangeable series necessarily reduce to combinations of Bernoulli series. The result is of considerable importance because it enables you to think about your beliefs for an exchangeable series in a simple way. In the case where θ can take only two values, all you need to think about is the probability of θ₁, denoted by p above. Generally, each possible value of θ has to be assigned a probability, and the extension of the conversation is then used to perform the evaluation. The last stage can be left to the mathematician or the computer and need not concern us here, but the assignment of probabilities, such as p, needs further thought.

An objection might be raised. Suppose you were to think about θ to two places of decimals; that is, you admitted values 0.01, 0.02,…, 0.99 so that there were 99 possibilities in all. (The extreme, special values of 0.00 and 1.00 being omitted.) Then there are 99 values of p for you to think about, whereas for the series of R s and W s of length 10, there were only 11 to be considered, and all this fuss has only made your task harder. This is perfectly sound, but once the 99 have been settled upon, the calculation will work for any length of series of R s and W s, not just 10; so that the 99 will replace the 1,000 needed for a series of length 999 and there is a real simplification. It will be seen in §9.8 that there are compact ways of studying the values of p that are not available for the raw series.

7.6 Large Numbers

In order to think about a series with two possible outcomes that you judge to be exchangeable, by de Finetti's result you need to think only about the values of θ underlying a Bernoulli series. In the case of the urn, θ had a concrete interpretation, as the proportion of red balls, but in other cases, such as patients with a disease, some exhibiting a symptom, it is not clear what meaning to attach to θ under exchangeability, so that before de Finetti's result can be used, we need to be able to escape from the tyranny of the Greek alphabet and think in medical terms. To do this, we need a mathematical result called a law of large numbers, which says that for any series, each member of which has two possible outcomes and that you consider exchangeable, you have probability 1, that is, you are sure, that the frequency of one of these outcomes tends to a fixed value as the length of the series increases, rather than wobbling about all over the place. The fixed value to which the frequency tends is an interpretation for θ. (Probability 1 may appear to violate Cromwell's rule in §6.8, but the law is the result of logic in the form of mathematics and is therefore exempted from the rule.) Consequently, to think about an exchangeable series of two outcomes, you need, apart from the Bernoulli calculations, only to think about your beliefs about the frequency of outcomes in a long series. This value is termed the limit of the observable frequencies.

Let all the threads be put together to produce your probabilistic description of a series with two outcomes that you judge to be exchangeable.

1. By exchangeability, you admit that the frequency of an outcome in the series will tend to a limit. Denote this limit by θ.
2. Assess your probabilities p (θ) for the various values of θ.
3. Combine this with the Bernoulli probabilities, θʳ(1 − θ)ⁿ⁻ʳ, giving a term θʳ(1 − θ)ⁿ⁻ʳ p(θ), and take the sum of these over the various values of θ. This is your probability for r outcomes of one type in an exchangeable series of length n.
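The three steps can be sketched as one short function; the nine-point prior below is an assumed uniform one, standing in for whatever p(θ) you assess:

```python
def p_r_out_of_n(r, n, prior):
    """Your probability for one particular exchangeable series with
    r U's in n terms: the Bernoulli term theta^r (1-theta)^(n-r),
    weighted by p(theta) and summed over the limiting frequencies."""
    return sum(theta ** r * (1 - theta) ** (n - r) * p
               for theta, p in prior.items())

# Illustrative prior over nine limiting frequencies (assumed uniform
# values; the text's own numbers are not reproduced here).
prior = {round(0.1 * k, 1): 1 / 9 for k in range(1, 10)}
assert abs(sum(prior.values()) - 1) < 1e-12

print(p_r_out_of_n(6, 10, prior))
```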

Consider the case of a drawing-pin and, to illustrate, take nine possible values of θ, 0.1, 0.2,…, 0.9, corresponding to the outcome that the pin falls with point upward, U. You need 9 numbers, adding to one, to describe your beliefs about the pin. If you feel that the pin will probably fall with its point up less often than with its point down, then the larger probabilities would be assigned to the smaller values of θ. For example, you might assign probabilities

equation

to the 9 values; thus, p (θ = 0.1) = 0.03. Contrast this with the case of a coin, where you might attach high probability to heads and tails occurring with equal frequency in the long run but, with due attention to Cromwell's rule, would not rule out bias. A possible set of probabilities, spread over the same 9 values as before, might be

equation

This means that your probability that the coin is fair, and is being tossed fairly, is 0.92, but you admit that other values are possible, with small probabilities.

We began in §7.2 by considering how frequency and belief were related; how the doctor's observation that a symptom, occurring with frequency p in the patients already seen, related to the belief that the next patient would exhibit the symptom. With the 10 tosses of the pin giving the result x, we sought p (U |x), your probability that the next toss, judged exchangeable with the other tosses, would result in it falling with point up, event U. (To avoid fussy notation, U now refers to the eleventh toss.) Next, we show how this probability can be calculated using the three-step procedure just described. By the multiplication rule (compare the case of U 1 and U 2 in §7.3),

p(U|x) = p(Ux)/p(x).

The denominator p (x) was calculated in §7.5, the last displayed equation therein, for two values of the limiting frequency θ, with its obvious generalization. The numerator p (Ux) follows in exactly the same way since Ux has one extra U, giving 7 U s and still 4 D s. Hence, the required result, p (U|x). This method is available for every exchangeable series and a future outcome judged exchangeable with it.
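The division p(Ux)/p(x) is then one extra line; the uniform nine-point prior below is again an assumed stand-in for your own p(θ):

```python
def predictive_prob_up(n_up, n_down, prior):
    """p(U|x) = p(Ux)/p(x): the numerator is the probability of the
    same series with one extra U, the denominator that of x itself."""
    def series_prob(r, w):
        return sum(theta ** r * (1 - theta) ** w * p
                   for theta, p in prior.items())
    return series_prob(n_up + 1, n_down) / series_prob(n_up, n_down)

# With the observed 6 U's and 4 D's and an assumed uniform prior over
# nine limiting frequencies (not the text's own numbers):
prior = {round(0.1 * k, 1): 1 / 9 for k in range(1, 10)}
p_next_up = predictive_prob_up(6, 4, prior)
print(round(p_next_up, 4))
```

Note that the answer is close to, but not equal to, the raw frequency 0.6; the prior pulls it slightly toward its own centre.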

There is another way of arranging the calculation, which makes use of Bayes rule in learning about θ from the observed data, and which is illustrated using the example of the red and white urns in §6.9. Here θ₁, as in §7.5, is the proportion of red balls, with θ₁ = 1/3 in the white urn W and θ₂ = 2/3 in the red urn R. Suppose that some balls are withdrawn at random and let the result be denoted x. (In §6.9, 12 balls were withdrawn and 9 found to be red, 3 white, but the exact nature of the data, x, need not concern us here.) In analogy with the doctor, uncertain about the next patient, let us consider your probability that another random ball will be red, an event denoted r. Extend the conversation to include θ, the true but unknown constitution of the urn, with the result

p(r|x) = p(r|Rx) p(R|x) + p(r|Wx) p(W|x).

Now, if you know it is the red urn R, the data x tells you nothing about the next ball, so p (r |Rx) = p (r|R) = 2/3 and similarly, p (r|Wx) = 1/3. (In the terminology of §4.3, r and x are independent, given R.) The other probability p (R|x) was found in §6.9, by Bayes rule, to be 64/65 and naturally, the complement p (W |x) = 1/65. Inserting these numerical values into the result just displayed yields

p(r|x) = (2/3)(64/65) + (1/3)(1/65) = 129/195 = 0.6615

to four decimal places. This is a little less than 2/3 = 0.6667 to the same accuracy, the slight reduction being caused by the fact that, although you are almost sure it is the red urn, a little doubt remains, expressed through your probability 1/65, that it is the white one.
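The arithmetic of the last display can be checked exactly with fractions:

```python
from fractions import Fraction

# p(R|x) = 64/65 and p(W|x) = 1/65 were found by Bayes rule in §6.9.
p_red_urn = Fraction(64, 65)
p_white_urn = Fraction(1, 65)

# Extend the conversation: p(r|x) = p(r|R)p(R|x) + p(r|W)p(W|x),
# the data adding nothing once the urn is known.
p_next_red = Fraction(2, 3) * p_red_urn + Fraction(1, 3) * p_white_urn
print(p_next_red, float(p_next_red))  # 43/65, a little below 2/3
```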

There remains a general problem, that of summing the various terms and performing the calculations in Equation (7.1) of §7.5 above. This is a technical matter and has been attended to by mathematicians. My best advice to you is to consult a statistician, just as you would consult a plumber if repairing your plumbing system were outside your capabilities. However, it is possible to describe one of the results that have been obtained in a form that is of immediate use without technical skills.

7.7 Belief and Frequency

Take a series with two outcomes, U and D, of length n that you judge to be exchangeable, and suppose that you have just observed r U s and therefore (n − r) D s. By exchangeability, it does not matter to you where the U s and D s appeared in the series. Now consider your probability that the next term, judged exchangeable with the series, will be U. This is p(U|r, n), your probability of U, given the result (r, n). Although it is tempting to equate this with r/n, the frequency of U s in the series, we saw in §7.2 that it would not be realistic to do this for short series with small n. The methods of the last section tell us how to proceed but they involve technicalities. It is now shown how they may be overcome if an assumption is made about your opinion, p(θ), of the hidden value of θ, the limiting frequency of U s.

Denote the observed frequency in the series by f = r/n, which is firmly based on data and has no element of uncertainty. There is another frequency, the limiting one, θ, that is conceptual and not data based, about which you are uncertain and have beliefs. Let g be your best guess as to the value of θ before you have any data on the series. Exactly what is meant by “best guess” will be explained in a moment. Now you have two pieces of information about the frequency with which U, rather than D, will arise: f, which is based on data, and g, which is based on initial beliefs about the series. It surely seems natural, in assessing the probability of a further U, to incorporate both these pieces of information, combining them in some way. The simplest way to do this is to take a bit of one and add it to a bit of the other, addition being the simplest arithmetic operation. So consider the expression (nf + mg)/(n + m), where m is a positive number. If m = n, the expression gives equal prominence to f and g, being the average (f + g)/2. If n is much larger than m, little attention is paid to g and the expression is near to f; similarly, if m is by far the greater, the emphasis is on g. Generally, the expression lies between f and g, exactly where depending on the balance between m and n. Technical analysis shows that it is often appropriate to equate the result (nf + mg)/(n + m) to the required probability p(U|r, n). Leaving the discussion of m for the moment, the final result is

(7.2) p(U|r, n) = (nf + mg)/(n + m).

Consider an example. Suppose with the drawing-pin, you believed initially that D might be a little more probable than U and that your best guess at the limiting frequency of U s was 0.4. This is g. Now you have data of 6 U s in 10 tosses, r = 6, n = 10, f = 0.6, and the formula gives

(7.3) p(U|r, n) = (10 × 0.6 + m × 0.4)/(10 + m) = (6 + 0.4m)/(10 + m).

This is a simple combination of the two frequencies, which necessarily lies between them, greater than what you initially believed, because of the observations, but less than what you observed, because of your lower, initial belief. It is now possible to see what g, your best guess of θ, means, for if the general result (7.2) is used with n = 0, that is, before any observations have been made, (nf + mg)/(n + m) reduces to mg/m = g. Hence, your best guess is your belief that the first member of the series will be U, rather than D. There remains the value of m to consider.
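Equation (7.2) is a one-line function; a minimal sketch, checked against the reduction to g when n = 0:

```python
def p_next_U(r, n, g, m):
    """Equation (7.2): your probability of U next, after r Us in n tosses,
    combining the observed frequency f = r/n with initial best guess g
    held with weight m."""
    return (r + m * g) / (n + m)   # equals (nf + mg)/(n + m), since nf = r

print(round(p_next_U(6, 10, 0.4, 10/3), 2))  # 0.55 for the pin when m = 10/3
print(round(p_next_U(0, 0, 0.4, 3), 6))      # with no data it reduces to g = 0.4
```

Writing nf as r sidesteps the division by zero when n = 0, so the same line covers the no-data case.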

A clue to m can be found by reflecting that so far you have not inserted any indication of how strongly you felt g reflected your initial opinion. Thus, with the pin, you may not have much strength of conviction about 0.4, whereas had it been a coin that was being tossed, you would have had a firm opinion that the frequency in the limit would be 0.5 and these feelings were reflected in the two sets of nine probabilities chosen in the last section. m measures this conviction, being small in the case of the pin and high in the case of the coin. But what of an exact value? There are several ways to assess this. One of them is to assess p (U |r, n) directly and then equate it to the above expression, so obtaining m since all the other quantities are known. For example, suppose with the pin, you felt 0.55 was your probability after the 6 U s and 4 D s, then arithmetic shows that m = 10/3, a little more than three. (Put m = 10/3 in (7.3) and you will obtain the result 0.55.) Let us take 3, rather than the more precise value. Then what you are saying in using the formula is that you are taking 10 parts of the data to 3 parts of your initial belief, out of 13 parts in all. Roughly, m = 3 says that your initial belief is worth about three observations in the series. Had m been 10, you would have given equal weight to the two frequencies. With the coin and g = 0.5, you might have had a large value, say m = 100. Equation (7.2) then gives a probability for U on the next toss of 0.509 and the observed frequency of 0.6 has only slightly affected your belief that the coin is being tossed fairly. Notice how the fact that m measures your strength of conviction about g goes some way to answering those who feel that a single probability is inadequate, instead preferring upper and lower probabilities in order to incorporate this conviction (see §3.5). The analysis demonstrates that when the conviction is relevant, it can be included within our simpler framework by introducing m. 
Furthermore, the introduction of m is balanced by your conviction about the data f, naturally expressed by the number n of observations. Here our simplicity has paid off and the additional complexity is unnecessary. The expression above requires your best guess g about θ, in the sense of your probability that the first toss will result in U, and also the strength of your conviction about θ measured by m in comparison with n, the length of the series.
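The direct assessment of m described above amounts to solving equation (7.2) for m, giving m = n(f − p)/(p − g); a small sketch using the pin's numbers from the text:

```python
def strength_m(r, n, g, p_assessed):
    """Solve equation (7.2), p = (nf + mg)/(n + m), for m:
    m = n(f - p)/(p - g)."""
    f = r / n
    return n * (f - p_assessed) / (p_assessed - g)

# The pin: 6 Us in 10 tosses, g = 0.4, directly assessed p(U|r, n) = 0.55.
m = strength_m(6, 10, 0.4, 0.55)
print(round(m, 4))                             # 3.3333, i.e. m = 10/3
# The coin: g = 0.5 held strongly, m = 100; equation (7.2) then gives
print(round((6 + 100 * 0.5) / (10 + 100), 3))  # 0.509
```

The algebra: p(n + m) = nf + mg rearranges to m(p − g) = n(f − p), whence the formula; it requires the assessed p to lie strictly between f and g.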

As remarked above, if n is large, the formula weighs f, the frequency, very heavily and the effect of g is small, so the formula says that it is sensible to identify frequency and belief, provided the exchangeable data are numerous. Thus, if the doctor had seen a lot of patients whom he judged exchangeable, with a proportion f exhibiting the symptom, a patient judged exchangeable with them would, for him, have a probability effectively f of exhibiting the symptom. This is the justification for a procedure, adopted in many cases, of equating the probability of an aspect of the future with a frequency observed in the past. Notice that it requires three conditions: an exchangeable series, a long series, and a case exchangeable with the series. The first condition rules out the weather; the last excludes a different pin.

There is one extremely important point to be made about (7.2), a point that will repeatedly arise in probability calculations and is not confined to exchangeable series. Once you have chosen the two values, g and m, to reflect your initial opinion and the strength of that opinion, you are committed to p (U|r, n) for all values of r and n, and not just those that you originally contemplated. Thus, in the case of the pin with g = 0.4 and m = 3, a series of five tosses all of which resulted in U and hence f = 1, would give your probability for another U on the sixth toss to be 0.78. When considering the values of g and m, you need to bear in mind that all these probabilities can be affected, and it is often useful to consider several hypothetical values of r and n.
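This commitment can be sketched numerically: with g = 0.4 and m = 3 fixed, every hypothetical (r, n) already has its probability determined. The pairs other than (6, 10) and (5, 5), which appear in the text, are illustrative additions:

```python
# Once g and m are chosen, equation (7.2) fixes p(U|r, n) for every (r, n),
# not just the data originally contemplated.
g, m = 0.4, 3
for r, n in [(6, 10), (5, 5), (0, 5), (60, 100)]:
    p = (r + m * g) / (n + m)   # (nf + mg)/(n + m), since nf = r
    print(f"r={r:>2}, n={n:>3}: p(U|r,n) = {p:.3f}")
```

The (5, 5) line reproduces the 0.78 (more precisely 0.775) of the text, and the last line shows the data dominating g as n grows.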

A consequence of the rules of probability and the coherence they reflect is that while a few probabilities can be chosen at will, many others are automatically determined from the few by the rules. This is a general principle and affects all calculations of beliefs. In the exchangeable case, there are many implications from your choice of g and m, one for every possible series of data, and for every possible combination of f and n. If you find that there are no values of g and m that can accommodate your beliefs for all combinations, then you have two alternatives. You can retain exchangeability, but go back to the original p(θ), which will give you more flexibility. If this is still not enough, then your only recourse is to abandon your view that the series is exchangeable. Here is an example.

There are many people who believe that if you have a long series almost entirely of U s, then there is a greater probability of a D next time than if you had experienced fewer U s, the idea being that compensation is needed to make up the appropriate frequency of D s, which has so far been too low. One can easily see that this view conflicts with (7.2) since the bigger r is, the larger is the probability of U next time. It follows that if you believe in compensation, then you cannot simultaneously have beliefs that (7.2) accommodates. More can be said, for the compensation concept and exchangeability do not even cohere and you cannot believe both. Mathematically, (nf + mg)/(n + m) = (r + mg)/(n + m) increases with r, so the more U s you see, the greater is your belief in U next time.
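The conflict with compensation can be verified directly: under equation (7.2), with n fixed, the probability of a further U only rises as r rises. The values g = 0.4, m = 3, n = 20 below are illustrative:

```python
# Under equation (7.2), for fixed n, p(U|r, n) is strictly increasing in r:
# seeing more Us can only raise your belief in U next time, never lower it,
# so there is no room for "compensation" toward D.
g, m, n = 0.4, 3, 20
probs = [(r + m * g) / (n + m) for r in range(n + 1)]
assert all(a < b for a, b in zip(probs, probs[1:]))  # strictly increasing
print(round(probs[18], 3), round(probs[20], 3))      # more Us, higher p(U)
```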

It may be felt that excessive attention has been paid to the notion of exchangeability and that we have labored unduly over a rather narrow, specialized concept. The reason for our labors is that the notion is used throughout the analysis of data, where many series, not just of two but of any number of outcomes, are generally accepted, not only by you, but by nearly everyone, to be exchangeable. Even series, such as weather, that are not exchangeable, have been studied by connecting them with other series that are exchangeable, though the technicalities are beyond us here. So exchangeability arises all over the place and our hope is that by studying it in a simple case of two outcomes, you will gain an appreciation of its value elsewhere, even though the technicalities are understandably beyond you. The quantity θ that was introduced above is called a parameter and it will be seen in Chapter 11 how parameters play a central role in science. Next, we take a closer look at the Bernoulli parameter θ.

7.8 Chance

It was seen in §4.3, with the discussion of two events, that there was some simplification if the two events were independent; in particular, the product rule was simplified. Also, instead of the three probabilities needed for a complete description of the uncertainty surrounding two events, A and B, for example, p(A), p(B|A), and p(B|A c), independence required only two, p(A) and p(B). (As usual, a fixed knowledge base is assumed, for independence can be destroyed or created by changes in the base, as we will see in a moment.) The simplification produced by independence is even greater with more than two events, considered in the next chapter. It would therefore be most desirable if you could create independence in your beliefs in some way; that is what the quantity we have denoted by θ does with exchangeable series. To see this, consider the first two tosses of the pin and the result UD. These are not independent for you, since your probability for the D on the second toss is influenced by the occurrence of U on the first. But now introduce θ and you have independence, since p(UD|θ) = p(U|θ)p(D|θ) = θ(1 − θ) by the Bernoulli nature of the series, given θ. Generally, for any length of an exchangeable series, you have independence, given θ, but not without θ.
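A numerical illustration, using two hypothetical chances, 0.2 and 0.8, held with equal belief (these numbers are assumed for the demonstration, not taken from the text): given θ the tosses factorize, but marginally p(UU) differs from p(U)², so the tosses are not independent without θ.

```python
# Two illustrative values of the chance theta with your beliefs p(theta).
prior = {0.2: 0.5, 0.8: 0.5}

p_U  = sum(th * w for th, w in prior.items())     # p(U) = sum of theta * p(theta)
p_UU = sum(th**2 * w for th, w in prior.items())  # p(UU): independence given theta

print(round(p_U**2, 3), round(p_UU, 3))  # 0.25 vs 0.34: dependent without theta
```

The first U raises your belief in the larger chance, and hence in a second U, which is exactly the dependence that conditioning on θ removes.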

It is not a topic that will arise much in this book, but there are many uncertain situations that are most profitably studied by introducing a new, and perhaps a little artificial, quantity such as θ, to create independence. For example, in agricultural experiments, two varieties will behave similarly, and therefore not independently, because they experience similar weather conditions; so a quantity representing weather is introduced to create independence, given the weather, thereby simplifying the analysis, without weather necessarily being described in terms of sunshine, temperature, humidity, and so on. Readers who are familiar with even the simplest statistical literature will have encountered the mantra “independent and identically distributed”, which occurs so frequently that it has acquired an acronym, iid. Yet the authors hardly ever mean what they say. What they intend is iid given some quantity such as θ.

Returning to the exchangeable series of two possible outcomes, U and D, let us look at θ in more detail. First notice that it behaves like a probability; indeed, within the Bernoulli series it is a probability, namely your probability of U were you to know its value, p(U|θ) = θ. Also it obeys the probability rules, for example, in calculating the result θ^r(1 − θ)^(n−r) for your probability of r U s in n tosses. Does θ therefore correspond to your belief in something? You already have beliefs about its value, expressed by a probability p(θ), yet according to the attitude adopted in this book, it is nonsense for you to have a belief about your belief, if only because doing so leads to an infinite regress of beliefs about beliefs about beliefs… Another feature of the Bernoulli θ is that it has a degree of objectivity in the sense that if Peter and Mary both judge a series to be exchangeable, then the value of θ, as a limiting frequency, will be common to them both, though unknown to them both. The objectivity is limited though, because if Paul disagrees with exchangeability, θ may not have a meaning for him. Experience shows that there is massive agreement about some series being exchangeable, so that objectivity can be at least a convenient approximation.

The upshot of these considerations is that θ, while it obeys the rules of the probability calculus, is not a probability in the sense of a belief. As a result, we prefer to give it a different name and it is often referred to as a chance. Thus, de Finetti's basic result in §7.5 is that an exchangeable series of two outcomes is always a mixture of Bernoulli series with different chances. Notice that there are now three words that are almost synonymous in the English language but to which we have assigned special, different meanings. Probability always refers to your belief, likelihood to your uncertainty of a single event under different circumstances, and chance is a concept pertaining to a Bernoulli series. It may appear pedantic to fuss in this way, but experience has shown that the separation of the ideas is essential for a proper appreciation of uncertainty. It has the minor misfortune that we cannot vary the language, as modern writers like to do (see §2.8), switching between probability, likelihood, and chance; for if probability is meant, then probability it has to remain. It also helps to understand why mathematical modes of thought differ from those of poets. Poets like to invest words with many shades of meaning and encourage ambiguity, while mathematicians are precise and a word has a single, unambiguous meaning. Poets make simple things complicated; scientists try to make complicated things simple.

The relationship between probability and chance is profitably explored a little further using the pin as an example. First, p(U) expresses your belief that the first toss will result in the pin falling with point up, U. Strictly, it should be p(U|K), referring to your knowledge base K but, as usual, K will be kept constant and conveniently omitted. In contrast, p(U|θ) is θ, your belief concerning the first toss, were you to know the value of θ. The relationship between p(U) and p(U|θ), for the case where two values of θ, θ1 and θ2, are being considered, as in §7.5, is obtained by using the extension of the conversation from U to include θ, introducing p(θ) (§7.6):

p(U) = θ1 p(θ1) + θ2 p(θ2).

Generally, if there are many values of the chance that you consider possible, there will be a term equal to the value of the chance θ, times your probability for that value p (θ), the terms being added to provide your probability for U. The expression on the right-hand side plays an important, general role that is encountered in §9.3.

The “probabilities” that are basic to quantum mechanics are really chances, in our usage of the terms. Those who accept quantum mechanics accept exchangeability as part of that acceptance and therefore have chances. In statistical mechanics, there are two forms of exchangeability, Fermi–Dirac and Bose–Einstein. The same situation is observed in genetics, which is based on chances, not on probabilities. Furthermore, since the “probabilities” that physicists and geneticists recognize are really chances, the chances are associated in their minds with frequency, so that probability is thought of in terms of frequency.

We now have two methods of assessing probabilities: using the concept of cases, which have equal uncertainties, the classical method, and that based on frequency allied with the concept of exchangeability. The former applies only to a limited class of situations, such as games that use cards or dice. The second is of such wide use that probability is often confused with frequency. There remain situations where neither of these methods applies; for example, when you attempt to assess your probability that the political party you support will win the next democratic election. Here there are no equally probable cases, and the frequency with which your party has won previous elections is no guide, if only because you do not make the judgment that those elections are, for you, exchangeable. We therefore need a further method. This is based on coherence and is treated in Chapter 13, when we have examined the phenomena that can arise when you contemplate three events.
