Annals of Probability, 8(1) (1980), 142–147.
It is shown that, conditional on a set of given average values, the frequency distribution of a series of independent identically distributed random variables taking values in a finite set converges in probability to the distribution that has the maximum relative entropy subject to the given mean values.
In statistical mechanics and other areas of physics, empirical distributions in the phase space conform in many circumstances to the distribution maximizing the entropy of the system subject to its constraints. The constraints are typically in the form of specified mean values of some functions of phase. If p = (p1, p2, …) denotes the probability distribution over the state space, the constraints on p take the form

∑j pj gr(j) = αr,  r = 1, …, m,

where g1, …, gm are the given functions of phase and α1, …, αm the prescribed mean values, and the maximum entropy distribution is the one that maximizes the entropy function

H(p) = −∑j pj log pj

subject to the constraints.
A principle stating that the empirical distribution possesses the maximum entropy within the restrictions of the system is due to Gibbs (1902). As a special case, he proposed the so-called canonical distribution as a description of systems subject to a single constraint that the average energy has a fixed value,

∑j pj Ej = Ē,

where E1, E2, … are the energy levels of the states. In this case, the maximum entropy distribution has the form

pj = C exp(−λEj),

which is the form that Gibbs called canonical.
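As a concrete numerical illustration (not part of the original argument), the canonical distribution can be computed by solving for the multiplier λ: the mean energy under pj ∝ exp(−λEj) is strictly decreasing in λ, so λ can be found by bisection. The energy levels and target mean below are made-up values for illustration.

```python
import math

def canonical(energies, mean_energy, iters=200):
    # Canonical (maximum entropy) distribution p_j proportional to
    # exp(-lam * E_j), with lam chosen by bisection so that the mean
    # energy equals `mean_energy`.  The mean is strictly decreasing
    # in lam; `mean_energy` must lie strictly between min and max energy.
    def dist(lam):
        w = [math.exp(-lam * e) for e in energies]
        z = sum(w)
        return [x / z for x in w]

    def mean(lam):
        return sum(p * e for p, e in zip(dist(lam), energies))

    lo, hi = -1.0, 1.0
    while mean(lo) < mean_energy:   # expand bracket until mean(lo) >= target
        lo *= 2.0
    while mean(hi) > mean_energy:   # expand bracket until mean(hi) <= target
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(mid) > mean_energy:
            lo = mid
        else:
            hi = mid
    return dist(0.5 * (lo + hi))

# made-up energy levels and target mean energy, for illustration only
p = canonical([1.0, 2.0, 3.0], 2.4)
```

Since the target mean 2.4 exceeds the unweighted average 2.0, the solver returns a negative λ, i.e. a distribution tilted toward the higher energy levels.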
Gibbs offered no justification for the canonical distribution, or for the principle of maximum entropy in general. In spite of its apparent arbitrariness, however, the maximum entropy principle has since found a number of successful applications in a wide range of situations, and has led to many new developments in physics. For an informed discussion, see Jaynes (1967).
In a subsequent paper, Jaynes (1968) presented a demonstration that the distribution with the maximum entropy “can be realized experimentally in overwhelmingly more ways than can any other.” Therefore, for large physical systems, the empirical distribution should, indeed, agree with the maximum entropy distribution.
In this chapter, a limit theorem is given that provides a foundation for the above principle, in the same sense in which the law of large numbers justifies the interpretation of limiting frequencies as probabilities. Informally stated, the theorem asserts that in the equiprobable case, the frequencies conditional on given constraints converge in probability to the distribution that has the maximum entropy subject to these constraints.
A generalization of this result is also given, which relaxes the assumption that all states are equally likely. In the general case, the frequencies conditional upon a set of constraints converge to the distribution that maximizes the entropy relative to the underlying distribution.
Let {1, 2, …, k} be a finite set of k elements and consider a series X1, X2, … of independent identically distributed random variables with values in this set, such that

P[Xi = j] = 1/k,  j = 1, …, k.   (1)
Denote by fn = (fn1, fn2, …, fnk) the frequency distribution of X1, X2, …, Xn,

fnj = (1/n) ∑i I[Xi = j],  j = 1, …, k,
where I is the characteristic function. Let A = (arj) be a given m × k matrix and α = (α1, …, αm) a given vector. Put

D0 = {p ∈ S : Ap = α}   (2)

and define, for δ > 0,

Dδ = {p ∈ S : |Ap − α| ≤ δ},

where S is the set of probability distributions on {1, …, k},

S = {p = (p1, …, pk) : pj ≥ 0, j = 1, …, k, ∑j pj = 1}.
Assume that D0 is not empty. Define the entropy of a distribution p in S by

H(p) = −∑j pj log pj,   (3)

with the convention 0 log 0 = 0. Denote by p0 the maximum point of H on D0,

H(p0) = max {H(p) : p ∈ D0}.   (4)
Since H is continuous on S and D0 is compact, the maximum exists. Moreover, it is unique by virtue of the strict concavity of H on S and the convexity of the set D0.
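The uniqueness claim rests on a one-line strict-concavity argument, reproduced here for completeness (standard, but not spelled out in the text):

```latex
\[
p \ne p', \quad p, p' \in D_0, \quad H(p) = H(p') = \max_{D_0} H
\;\Longrightarrow\;
\tfrac12 (p + p') \in D_0
\ \text{and}\ 
H\big(\tfrac12 (p + p')\big) > \tfrac12 H(p) + \tfrac12 H(p') = \max_{D_0} H,
\]
```

a contradiction; hence the maximizing point is unique.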
Theorem 1. For every ε > 0, there exists δ0 > 0 such that for every δ, 0 < δ ≤ δ0,

P[ maxj |fnj − p0j| ≥ ε | fn ∈ Dδ ] → 0 as n → ∞,

where p0 is the maximum entropy distribution, given by Eq. (4).
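The following simulation (not from the original paper; all numerical choices are illustrative) exhibits Theorem 1 for a fair six-sided die: conditioning on the sample mean lying near 4.0 tilts the conditional frequencies toward the maximum entropy distribution with mean 4.0, and away from the unconditional uniform distribution.

```python
import math
import random
from collections import Counter

def maxent_mean(values, target, iters=200):
    # Maximum entropy distribution on `values` subject to a prescribed
    # mean: p_j proportional to exp(t * values[j]).  The mean is strictly
    # increasing in t, so t is found by bisection.
    def dist(t):
        w = [math.exp(t * v) for v in values]
        z = sum(w)
        return [x / z for x in w]

    def mean(t):
        return sum(p * v for p, v in zip(dist(t), values))

    lo, hi = -1.0, 1.0
    while mean(lo) > target:
        lo *= 2.0
    while mean(hi) < target:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(mid) < target:
            lo = mid
        else:
            hi = mid
    return dist(0.5 * (lo + hi))

random.seed(7)
n, trials = 30, 100_000
band = (3.9, 4.1)        # conditioning event: |sample mean - 4.0| <= 0.1

total = [0.0] * 6
accepted = 0
for _ in range(trials):
    xs = [random.randint(1, 6) for _ in range(n)]
    if band[0] <= sum(xs) / n <= band[1]:
        counts = Counter(xs)
        for j in range(6):
            total[j] += counts[j + 1] / n
        accepted += 1

cond_freq = [t / accepted for t in total]     # conditional frequency distribution
p0 = maxent_mean([1, 2, 3, 4, 5, 6], 4.0)     # maximum entropy subject to mean 4
uniform = [1.0 / 6.0] * 6

d_maxent = sum(abs(a - b) for a, b in zip(cond_freq, p0))
d_uniform = sum(abs(a - b) for a, b in zip(cond_freq, uniform))
```

With this seed a few thousand runs land in the conditioning band, and the conditional frequencies lie much closer (in total variation) to the maximum entropy distribution than to the uniform one, as the theorem predicts.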
This theorem is a special case of the more general conditional law of large numbers, which will now be stated. Replace the assumption (1) of the equiprobable case by the general assumption that

P[Xi = j] = qj,  j = 1, …, k,   (5)

where q = (q1, …, qk) is a given distribution. Assume, without loss of generality, that qj > 0 for every j. Define the entropy of a distribution p in S relative to the distribution q by

H(p; q) = −∑j pj log (pj/qj).   (6)
Again let D0 be the set in (2), assumed nonempty, and replace the definition (4) of p0 by the definition

H(p0; q) = max {H(p; q) : p ∈ D0}.   (7)
Again, the maximum relative entropy point p0 exists and is unique.
Theorem 2. For every ε > 0, there exists δ0 > 0 such that for every δ, 0 < δ ≤ δ0,

P[ maxj |fnj − p0j| ≥ ε | fn ∈ Dδ ] → 0

as n → ∞, where p0 is the distribution with the maximum entropy relative to q, given by Eq. (7).
The maximum relative entropy distribution p0 is easy to find. It is given by

p0j = C qj exp(∑r λr arj),  j = 1, …, k,   (8)

where the constants C, λ1, …, λm are determined by the condition p0 ∈ D0.
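The exponential form of p0 comes out of the standard Lagrange multiplier computation; a sketch (supplied here, not part of the text) runs as follows:

```latex
\[
\mathcal{L}(p, \lambda_0, \lambda)
  = -\sum_{j=1}^{k} p_j \log \frac{p_j}{q_j}
    + \lambda_0 \Big( \sum_{j} p_j - 1 \Big)
    + \sum_{r=1}^{m} \lambda_r \Big( \sum_{j} a_{rj} p_j - \alpha_r \Big),
\]
\[
0 = \frac{\partial \mathcal{L}}{\partial p_j}
  = -\log \frac{p_j}{q_j} - 1 + \lambda_0 + \sum_{r=1}^{m} \lambda_r a_{rj}
\quad\Longrightarrow\quad
p_j = q_j \, e^{\lambda_0 - 1} \exp\Big( \sum_{r=1}^{m} \lambda_r a_{rj} \Big),
\]
```

so $C = e^{\lambda_0 - 1}$ is the normalizing constant, and strict concavity of the relative entropy guarantees that this stationary point is the maximum.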
Theorem 1 follows immediately from Theorem 2, since for the uniform distribution qj = 1/k, j = 1, …, k,

H(p; q) = H(p) − log k,   (9)

so that the maximum points in Eqs. (4) and (8) coincide.
Proof of Theorem 2. Let ε > 0 be fixed, and put

V = {p ∈ S : maxj |pj − p0j| < ε},   (10)

where p0 is given by (8). For each δ > 0, define

Hδ = max {H(p; q) : p ∈ Dδ}.   (11)

Define uniquely a point pδ ∈ Dδ by

H(pδ; q) = Hδ;   (12)

the point pδ exists and is unique by the same continuity, compactness, and strict concavity argument as before.
Introduce a topology on S by the metric

ρ(p, p′) = maxj |pj − p′j|.
We will first prove that

pδ → p0 as δ → 0.   (13)
Let the set {pδ : δ > 0} be directed by the relation pδ ⪰ pδ′ if δ ≤ δ′. Since S is compact, the directed set has at least one limit point. Let p* be one such limit point. Choose an arbitrary η > 0 and put

N = {p ∈ S : ρ(p, p*) < η}.

There exists δ, 0 < δ ≤ η, such that

pδ ∈ N.

Then

|Ap* − α| ≤ |A(p* − pδ)| + |Apδ − α| ≤ aη + η,

where a is a constant depending only on the matrix A, and therefore p* ∈ D(a+1)η. Since this is true for every η > 0, it follows that

p* ∈ D0.
Now H(pδ; q) ≥ H(p0; q) for every δ > 0, since p0 ∈ D0 ⊂ Dδ. Since H(·; q) is a continuous function, the same is true for the limiting point,

H(p*; q) ≥ H(p0; q).

But p0 is the unique maximum point of H(·; q) on D0, and therefore p* = p0. Thus, p0 is the only limit point of {pδ}, which proves Eq. (13).
It follows that there exists δ0 > 0 such that for every δ, 0 < δ ≤ δ0,

pδ ∈ V.   (14)
Let δ be selected arbitrarily from (0, δ0] and fixed. Put W = Dδ ∖ V, where V is given by (10), and denote the adherence of W by W̄. Put

H* = max {H(p; q) : p ∈ W̄}.   (15)
Since

W̄ ⊂ Dδ

and, the set V being open, pδ ∉ W̄ by virtue of Eq. (14), it follows that

H* < H(pδ; q).
Put

h = H(pδ; q) − H*,

so that

h > 0,

and define

R = {p ∈ S : H(p; q) > H(pδ; q) − h/2}.

Let

B = V ∩ R ∩ Dδ.
We will now show that B contains an open set. Let 0 < λ ≤ 1 and put

pλ = (1 − λ)pδ + λp0.

The point pλ is an interior point of S for every λ ∈ (0, 1]. To prove that, choose

μ = λ minj p0j,

which is positive, since p0j > 0 for every j by (8). For every p′ with ∑j p′j = 1 such that

ρ(p′, pλ) < μ,

it is true that

p′j > pλj − μ ≥ λp0j − μ ≥ 0,  j = 1, …, k,

so that p′ ∈ S. Thus, pλ belongs to the interior of S for every λ ∈ (0, 1]. Since pδ is an interior point of V and, by continuity of H(·; q), also of R, the point pλ will be in the interior of both V and R if λ is sufficiently small. Moreover, |Apλ − α| ≤ (1 − λ)|Apδ − α| ≤ (1 − λ)δ < δ, so that pλ is interior to Dδ as well. Thus, such pλ is an interior point of B, and consequently B contains an open set, say C.
To summarize our results so far, we have proven that there exists an open set C such that

C ⊂ V ∩ Dδ

and

H(p; q) > H* + h/2 for every p ∈ C.
Now

P[ fn ∉ V | fn ∈ Dδ ] = P[fn ∈ W] / P[fn ∈ Dδ] ≤ P[fn ∈ W] / P[fn ∈ C],

where the last inequality holds because C ⊂ Dδ.
We will make use of the inequality

(n + 1)^(−k) exp(nH(p; q)) ≤ P[fn = p] ≤ exp(nH(p; q)),   (16)

valid for every p in the set Sn of possible values of fn, Sn = {p ∈ S : npj is an integer, j = 1, …, k}, where we define 0 log (0/qj) = 0 in agreement with the earlier convention 0 log 0 = 0. The inequality (16) is easily established from the Stirling formula. Then

P[fn ∈ W] ≤ #[Sn ∩ W] exp(nH*)  and  P[fn ∈ C] ≥ (n + 1)^(−k) #[Sn ∩ C] exp(n(H* + h/2)),
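The bound (16) is the standard method-of-types estimate. Writing H(p; q) = −∑j pj log(pj/qj), a sketch of the computation (supplied here; the text leaves it to the reader) is:

```latex
\[
P[f_n = p] \;=\; \frac{n!}{\prod_{j=1}^{k} (np_j)!} \, \prod_{j=1}^{k} q_j^{\,np_j},
\qquad p \in S_n,
\]
and Stirling's formula $\log m! = m \log m - m + O(\log (m+1))$ gives
\[
\log P[f_n = p]
 \;=\; -\,n \sum_{j=1}^{k} p_j \log \frac{p_j}{q_j} \;+\; O(k \log n)
 \;=\; n\,H(p; q) \;+\; O(k \log n),
\]
```

with the polynomial factor $(n+1)^{-k}$ in (16) absorbing the error term on the lower side.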
and therefore

P[ fn ∉ V | fn ∈ Dδ ] ≤ (n + 1)^k (#[Sn ∩ W] / #[Sn ∩ C]) exp(−nh/2),   (17)

where #[Z] denotes the number of elements of a finite set Z. Now
the ratio #[Sn ∩ C] / #[Sn] converges, as n → ∞, to the finite limit vC/vS, where vS, vC are the volumes of S, C, respectively, by (k − 1)-dimensional Lebesgue measure, and vC > 0 since C is open. Since #[Sn ∩ W] ≤ #[Sn], the right-hand side of Eq. (17) converges to zero as n → ∞, and consequently

P[ fn ∉ V | fn ∈ Dδ ] → 0 as n → ∞,
which completes the proof.