Chapter 32. Entropy and Uncertainty

Entropy is an information-theoretic measure of the amount of uncertainty in a variable. Beginning with Shannon's seminal works [908, 909], cryptographers and information theorists used entropy to determine how well transformations on messages obscure their meaning. Entropy has applications in a wide variety of disciplines, including cryptography, compression, and coding theory. This chapter reviews the basics of entropy, which has its roots in probability theory.

Conditional and Joint Probability

  • Definition 32–1. A random variable is a variable that represents the outcome of an event.

A word about notation. We write p(X = x1) for the probability that the random variable X has value x1. When the specific value does not matter (for example, when all values are equiprobable), we abbreviate this as p(X).

Sometimes the results of two different events are of interest.

  • Definition 32–2. The joint probability of X and Y, written p(X, Y), is the probability that the random variables X and Y will simultaneously assume particular values.

For example, if X represents the roll of a die and Y the result of a coin flip, one such joint probability is p(X = 6, Y = heads). If the two random variables are independent (that is, if the value of one does not affect the value of the other), then

  • p(X, Y) = p(X)p(Y)
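
A minimal Python sketch makes this concrete, assuming a fair die X and a fair coin Y (the variable names are ours, chosen for illustration):

    from fractions import Fraction
    from itertools import product

    # All outcomes of a fair die (X) and a fair coin (Y) are equally likely.
    die = [1, 2, 3, 4, 5, 6]
    coin = ["heads", "tails"]
    outcomes = list(product(die, coin))

    p_joint = Fraction(1, len(outcomes))     # p(X = 6, Y = heads) = 1/12
    p_x = Fraction(1, len(die))              # p(X = 6) = 1/6
    p_y = Fraction(1, len(coin))             # p(Y = heads) = 1/2

    # Independence: the joint probability is the product of the individual ones.
    assert p_joint == p_x * p_y
    print(p_joint)                           # 1/12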

Now suppose that the two random variables are not independent. For example, let X be the roll of a red die and let Y be the sum of that roll and the roll of a blue die; the value of X clearly affects the probability of each value of Y.

When two events are not independent, the formula for joint probability is more complicated. The next definition captures the notion of dependence.

  • Definition 32–3. The conditional probability of X given Y (written p(X | Y)) is the probability that X takes on a particular value, given that Y has a particular value.

We can now write the formula for joint probability in terms of conditional probability:

  • p(X, Y) = p(X | Y)p(Y) = p(X)p(Y | X)

In fact, the formula for joint probability of two independent events is a special case of the formula above. When X and Y are independent random variables,

  • p(X | Y) = p(X)
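
These formulas can be checked by enumeration. The following Python sketch (an illustration, with helper names of our choosing) takes X to be the roll of a red die and Y to be the sum of the red die and a blue die, the dependent pair used in Exercise 1, and verifies both the chain rule p(X, Y) = p(X | Y)p(Y) and the fact that p(X | Y) differs from p(X) when the variables are not independent:

    from fractions import Fraction
    from itertools import product

    # X is the roll of a red die; Y is the sum of the red die and a blue die.
    # The 36 (red, blue) pairs are equally likely.
    outcomes = list(product(range(1, 7), repeat=2))

    def prob(event):
        # Probability of an event, given as a predicate on a (red, blue) pair.
        return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

    p_y8 = prob(lambda o: o[0] + o[1] == 8)                    # p(Y = 8) = 5/36
    p_x3_y8 = prob(lambda o: o[0] == 3 and o[0] + o[1] == 8)   # p(X = 3, Y = 8) = 1/36
    p_x3_given_y8 = p_x3_y8 / p_y8                             # p(X = 3 | Y = 8) = 1/5

    # Chain rule: p(X, Y) = p(X | Y) p(Y).
    assert p_x3_y8 == p_x3_given_y8 * p_y8

    # Not independent: p(X = 3 | Y = 8) differs from p(X = 3) = 1/6.
    assert p_x3_given_y8 != Fraction(1, 6)
    print(p_x3_given_y8)                                       # 1/5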

Entropy and Uncertainty

  • Definition 32–4. Let the random variable X take values from some set { x1, …, xn }. The value xi occurs with probability p(X = xi), where Σ1≤i≤n p(X = xi) = 1.

The entropy, or uncertainty, of X is

  • H(X) = −Σ1≤i≤n p(X = xi) lg p(X = xi)

where “lg x” is the base 2 logarithm of x. (For purposes of this definition, we define 0 lg 0 to be 0.)

This definition measures the uncertainty of a message in bits.
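
A small Python sketch, with an entropy helper of our own naming, computes H(X) for a few simple distributions; a fair coin carries 1 bit of uncertainty, a fair die lg 6 ≈ 2.585 bits, and a biased coin less than 1 bit:

    from math import log2

    def entropy(probs):
        # H(X) = -sum of p lg p over the distribution; 0 lg 0 is taken as 0,
        # so zero-probability terms are simply skipped.
        return -sum(p * log2(p) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))       # fair coin: 1.0 bit
    print(entropy([1/6] * 6))        # fair die: lg 6, about 2.585 bits
    print(entropy([0.9, 0.1]))       # biased coin: about 0.469 bits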

Joint and Conditional Entropy

Joint and conditional entropy are analogous to joint and conditional probability.

Joint Entropy

  • Definition 32–5. Let the random variable X take values from some set { x1, …, xn }. The value xi occurs with probability p(X = xi), where Σ1≤i≤n p(X = xi) = 1. Let the random variable Y take values from some set { y1, …, ym }. The value yj occurs with probability p(Y = yj), where Σ1≤j≤m p(Y = yj) = 1. The joint entropy of X and Y is

  • H(X, Y) = −Σ1≤i≤n Σ1≤j≤m p(X = xi, Y = yj) lg p(X = xi, Y = yj)
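
As an illustration (the joint_entropy helper is our own naming), the sketch below computes H(X, Y) for the independent die-and-coin pair, where each of the twelve outcomes has probability 1/12. The result, lg 12 ≈ 3.585 bits, equals H(X) + H(Y), anticipating Exercise 3:

    from math import log2
    from itertools import product

    def joint_entropy(joint):
        # H(X, Y) = -sum over (x, y) of p(x, y) lg p(x, y); 0 lg 0 is taken as 0.
        return -sum(p * log2(p) for p in joint.values() if p > 0)

    # A fair die and a fair coin are independent, so every pair has probability 1/12.
    joint = {(x, y): 1 / 12 for x, y in product(range(1, 7), ["heads", "tails"])}

    print(joint_entropy(joint))      # lg 12, about 3.585 bits = H(X) + H(Y)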

Conditional Entropy

  • Definition 32–6. Let the random variable X take values from some set { x1, …, xn }. The value xi occurs with probability p(X = xi), where Σ1≤i≤n p(X = xi) = 1. Let the random variable Y take values from some set { y1, …, ym }. The value yj occurs with probability p(Y = yj), where Σ1≤j≤m p(Y = yj) = 1. The conditional entropy, or equivocation, of X given Y = yj is

  • H(X | Y = yj) = −Σ1≤i≤n p(X = xi | Y = yj) lg p(X = xi | Y = yj)

The conditional entropy of X given Y is

  • H(X | Y) = −Σ1≤j≤m p(Y = yj) [Σ1≤i≤n p(X = xi | Y = yj) lg p(X = xi | Y = yj)]

The latter is the weighted mean of the conditional entropies of X given Y = yj for the possible values of Y.
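
The sketch below, again with helper names of our choosing, computes H(X | Y) as exactly this weighted mean for the red-die-and-sum example and checks numerically that H(X, Y) = H(X | Y) + H(Y), the identity of Exercise 4:

    from math import log2
    from itertools import product

    # X is the roll of a red die; Y is the sum of the red die and a blue die.
    outcomes = list(product(range(1, 7), repeat=2))

    joint = {}                                   # p(X = x, Y = y)
    for red, blue in outcomes:
        key = (red, red + blue)
        joint[key] = joint.get(key, 0.0) + 1 / len(outcomes)

    p_y = {}                                     # marginal p(Y = y)
    for (x, y), p in joint.items():
        p_y[y] = p_y.get(y, 0.0) + p

    def H(probs):
        # Entropy of a distribution given as an iterable of probabilities.
        return -sum(p * log2(p) for p in probs if p > 0)

    # H(X | Y) as the weighted mean of H(X | Y = y) over the values of Y.
    h_x_given_y = sum(
        py * H([joint.get((x, y), 0.0) / py for x in range(1, 7)])
        for y, py in p_y.items()
    )

    # Chain rule for entropy (compare Exercise 4): H(X, Y) = H(X | Y) + H(Y).
    print(H(joint.values()))                     # H(X, Y) = lg 36, about 5.17 bits
    print(h_x_given_y + H(p_y.values()))         # the same value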

Perfect Secrecy

Perfect secrecy arises when knowing the ciphertext does not decrease the uncertainty of the plaintext. More formally:

  • Definition 32–7. Let M be a random variable that takes values from the set of messages m1, …, mn. The cipher C = E(M) achieves perfect secrecy if H(M | C) = H(M).
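
The one-time pad is the classic example of a cipher with this property. The following sketch (an illustration, with variable names of our own) models a one-bit pad, C = M XOR K with the key K uniform and independent of M, and confirms that H(M | C) = H(M) even for a skewed message distribution:

    from math import log2

    # A one-bit one-time pad: C = M XOR K, with the key K uniform and independent
    # of the message M.  The message distribution below is arbitrary (chosen to be
    # skewed so the result is not an accident of symmetry).
    p_m = {0: 0.8, 1: 0.2}
    p_k = {0: 0.5, 1: 0.5}

    joint = {}                                   # p(M = m, C = c)
    for m, pm in p_m.items():
        for k, pk in p_k.items():
            c = m ^ k
            joint[(m, c)] = joint.get((m, c), 0.0) + pm * pk

    p_c = {}                                     # marginal p(C = c)
    for (m, c), p in joint.items():
        p_c[c] = p_c.get(c, 0.0) + p

    def H(probs):
        return -sum(p * log2(p) for p in probs if p > 0)

    # H(M | C): weighted mean of H(M | C = c) over the ciphertext values.
    h_m_given_c = sum(
        pc * H([joint.get((m, c), 0.0) / pc for m in p_m])
        for c, pc in p_c.items()
    )

    print(H(p_m.values()))       # H(M)     : about 0.722 bits
    print(h_m_given_c)           # H(M | C) : the same, so knowing C tells us nothing about M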

Exercises

1. Let X represent the roll of a red die, and let Y represent the sum of the values from rolling the red die and a blue die. Prove that p(X = 3 | Y = 8) = 1/5.

2. Prove that the maximum entropy for an unknown message chosen from the set of possible messages { “yes”, “no” } occurs when the probability of each message is 1/2.

3. Let X and Y be random variables that take values from finite sets. Prove that H(X, Y) ≤ H(X) + H(Y), with equality holding when X and Y are independent.

4. Let X and Y be random variables that take values from finite sets. Prove that H(X, Y) = H(X | Y) + H(Y).

5. Let M and C be random variables that take values from the set of possible plaintexts and the set of possible ciphertexts for some cryptosystem. Prove that the cryptosystem provides perfect secrecy if and only if p(M | C) = p(M).
