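A probability space $(\Omega, \mathcal{F}, \mathbb{P})$ will be considered throughout. In general, an r.v. with values in a measurable state space $\mathcal{V}$ is a measurable function
$$X : \omega \in \Omega \mapsto X(\omega) \in \mathcal{V}.$$
Then, for every measurable subset $E$ of $\mathcal{V}$, it holds that
$$\{X \in E\} := \{\omega \in \Omega : X(\omega) \in E\} = X^{-1}(E) \in \mathcal{F}.$$
The law of $X$ is the probability measure $\mathcal{L}(X)$ defined on $\mathcal{V}$ by $\mathcal{L}(X) = \mathbb{P} \circ X^{-1}$ and, more concretely, for measurable subsets $E$, by
$$\mathcal{L}(X)(E) = \mathbb{P}(X \in E).$$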
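In this appendix, $\mathcal{V}$ will be assumed to be discrete (finite or countably infinite), with measurable structure ($\sigma$-field) given by the collection of all subsets. Then, $X$ is an r.v. if and only if
$$\{X = x\} \in \mathcal{F}, \qquad \forall x \in \mathcal{V}.$$
In the sense of nonnegative or absolutely convergent series,
$$\mathbb{P}(X \in E) = \sum_{x \in E} \mathbb{P}(X = x), \qquad \mathbb{E}(f(X)) = \sum_{x \in \mathcal{V}} f(x)\,\mathbb{P}(X = x),$$
for $E \subset \mathcal{V}$ and functions $f : \mathcal{V} \to \mathbb{R}$, which are nonnegative (and then $\mathbb{E}(f(X))$ is in $[0,\infty]$) or satisfy $\mathbb{E}(|f(X)|) < \infty$ (and then $\mathbb{E}(f(X))$ is in $\mathbb{R}$).
Thus, the law of $X$ can be identified with the collection $(\mathbb{P}(X = x))_{x \in \mathcal{V}}$ of nonnegative real numbers with sum $1$.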
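More generally, a (nonnegative) measure $\mu$ on a discrete space $\mathcal{V}$ can be identified with a collection $(\mu(x))_{x \in \mathcal{V}}$ of nonnegative real numbers, and
$$\mu(E) = \sum_{x \in E} \mu(x), \qquad \int f\,d\mu = \sum_{x \in \mathcal{V}} f(x)\,\mu(x),$$
where $f$ is nonnegative or satisfies $\int |f|\,d\mu < \infty$. Then, $\int f\,d\mu \in [0,\infty]$ and $\int f\,d\mu \in \mathbb{R}$, respectively.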
Note that the sum of a nonnegative or an absolutely converging series does not depend on the order of summation.
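The natural state space of some random variables is $\mathbb{N} \cup \{\infty\}$, for instance when they are defined as an infimum of a possibly empty subset of $\mathbb{N}$ or as a possibly infinite sum of integers. The first step in their study is to try to determine whether $\mathbb{P}(X = \infty) > 0$, and if yes to compute this quantity.
For this, the formula
$$\mathbb{P}(X = \infty) = \lim_{n \to \infty} \downarrow \mathbb{P}(X > n)$$
is often more practical than $\mathbb{P}(X = \infty) = 1 - \sum_{n \in \mathbb{N}} \mathbb{P}(X = n)$. We give a related formula for $\mathbb{E}(X)$. Recall that if $\mathbb{P}(X = \infty) > 0$, then $\mathbb{E}(X) = \infty$, and that
$$\mathbb{E}(X) = \sum_{n \in \mathbb{N}} \mathbb{P}(X > n).$$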
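This formula is a particular instance of the following integration by parts formula: if $X$ is an $\mathbb{R}_+$-valued r.v., and $f : \mathbb{R}_+ \to \mathbb{R}$ is absolutely continuous and has nonnegative density $f'$, then by the Fubini theorem
$$\mathbb{E}(f(X)) - f(0) = \mathbb{E}\left(\int_0^X f'(t)\,dt\right) = \int_0^\infty f'(t)\,\mathbb{P}(X > t)\,dt.$$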
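As a numerical illustration, the following minimal sketch compares the direct expectation $\sum_n n\,\mathbb{P}(X = n)$ with the tail sum $\sum_{n \ge 0} \mathbb{P}(X > n)$, which is the integer-valued case of the integration by parts formula with $f(t) = t$. The law used (a fair six-sided die) and the function names are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction

def expectation_direct(law):
    """E[X] = sum_n n P(X = n), for a finitely supported law {n: P(X = n)} on N."""
    return sum(n * p for n, p in law.items())

def expectation_tail_sum(law):
    """E[X] = sum_{n >= 0} P(X > n), the tail-sum form of the same expectation."""
    top = max(law)  # P(X > n) = 0 for n >= top, so the sum stops there
    return sum(
        sum(p for m, p in law.items() if m > n)  # P(X > n)
        for n in range(top)
    )

# Hypothetical example law: a fair six-sided die.
die = {n: Fraction(1, 6) for n in range(1, 7)}

print(expectation_direct(die), expectation_tail_sum(die))
```

Exact rational arithmetic is used so that the two expressions can be compared with equality rather than up to rounding.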
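The generating function for (the law of) an r.v. $X$ with values in $\mathbb{N}$ is denoted by $g_X$ and is given by the power series
$$g_X(s) = \mathbb{E}(s^X) = \sum_{n \in \mathbb{N}} \mathbb{P}(X = n)\,s^n,$$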
possibly extended to . If , then , and this formula can be used (with proper care) for with the convention . If , then
and thus the convergence radius is greater than or equal to $1$. Hence, $g_X$ is finite and continuous on $[-1,1]$ and has derivatives of all orders at $s \in (-1,1)$ given by
$$g_X^{(k)}(s) = \sum_{n \ge k} n(n-1)\cdots(n-k+1)\,\mathbb{P}(X = n)\,s^{n-k}.$$
The function $g_X$ is determined by its restriction on $[0,1]$, and for some computations, it is easier to work on this restriction.
A Taylor series expansion of $g_X$ at $0$ shows that
$$\mathbb{P}(X = n) = \frac{g_X^{(n)}(0)}{n!}, \qquad n \in \mathbb{N},$$
which provides a theoretical inversion method yielding the law from the generating function. In practice, algebraic series expansions should be preferred.
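The preference for algebraic expansions can be illustrated on a polynomial generating function. The sketch below expands $g(s) = ((1+s)/2)^3$, which has nonnegative coefficients summing to $1$ and hence is the generating function of some law on $\{0,1,2,3\}$, reads the law off the coefficients, and evaluates the first derivative at $1$ to obtain the mean. The choice of $g$ and the helper names are assumptions made for the example.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (index = power of s)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def poly_derivative(a):
    """Coefficient list of the derivative of the polynomial a."""
    return [k * c for k, c in enumerate(a)][1:]

def poly_eval(a, s):
    """Evaluate the polynomial a at the point s."""
    return sum(c * s ** k for k, c in enumerate(a))

# Algebraic expansion of g(s) = ((1 + s) / 2)^3.
half = Fraction(1, 2)
g = [Fraction(1)]
for _ in range(3):
    g = poly_mul(g, [half, half])

law = g                                  # law[n] is the coefficient of s^n, i.e. P(X = n)
mean = poly_eval(poly_derivative(g), 1)  # g'(1) = E[X]
print(law, mean)
```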
Using the monotone convergence theorem (Theorem A.3.2), in ,
and the moments can be obtained from the second formula. If , then and can be computed using the Taylor expansion of order for at , given by
If and , then these Taylor expansions limited to order yield .
The following result is one of the reasons that generating functions are important. The converse of this result uses multivariate generating functions.
The following result is an application of the ideas of Lemma A.1.1 to generating functions.
If $A$ and $B$ are two events, $A \cap B$ can be denoted in some circumstances by $A, B$, and it is said “$A$ and $B$.”
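If $\mathbb{P}(A) > 0$, then we define the probability of $B$ conditional on $A$ by
$$\mathbb{P}(B \mid A) = \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(A)},$$
and $\mathbb{P}(\cdot \mid A)$ is a probability measure on $(\Omega, \mathcal{F})$.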
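If $X$ is an r.v. that is nonnegative or integrable for $\mathbb{P}(\cdot \mid A)$, where $A$ is an event s.t. $\mathbb{P}(A) > 0$, then its expectation or variance conditional on $A$ is defined as its expectation or variance for $\mathbb{P}(\cdot \mid A)$, that is,
$$\mathbb{E}(X \mid A) = \frac{\mathbb{E}(X \mathbf{1}_A)}{\mathbb{P}(A)}, \qquad \mathrm{Var}(X \mid A) = \mathbb{E}\big((X - \mathbb{E}(X \mid A))^2 \mid A\big).$$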
If is an event s.t. , or equivalently , then
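If $(A_i)_{i \in I}$ is a countable collection of events s.t.
$$\mathbb{P}(A_i \cap A_j) = 0 \ \text{ for } i \neq j, \qquad \sum_{i \in I} \mathbb{P}(A_i) = 1,$$
then the probability of any event $A$ or the expectation of any nonnegative or integrable r.v. $X$ can be obtained as
$$\mathbb{P}(A) = \sum_{i \in I} \mathbb{P}(A \mid A_i)\,\mathbb{P}(A_i), \qquad \mathbb{E}(X) = \sum_{i \in I} \mathbb{E}(X \mid A_i)\,\mathbb{P}(A_i),$$
with the natural convention $\mathbb{P}(A \mid A_i)\,\mathbb{P}(A_i) = 0$ and $\mathbb{E}(X \mid A_i)\,\mathbb{P}(A_i) = 0$ for $\mathbb{P}(A_i) = 0$.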
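The decomposition of a probability over a countable collection of events can be checked on a small finite example. The following sketch, given under illustrative assumptions (two fair dice, conditioning on the value of the first die, hypothetical helper names), compares $\mathbb{P}(A)$ with $\sum_i \mathbb{P}(A \mid A_i)\,\mathbb{P}(A_i)$ for the event that the sum equals $7$.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # outcomes of two fair dice
weight = Fraction(1, len(omega))              # uniform probability of each outcome

def prob(event):
    """P(event) for an event given as a predicate on outcomes."""
    return sum(weight for w in omega if event(w))

def cond_prob(event, given):
    """P(event | given) = P(event and given) / P(given), assuming P(given) > 0."""
    return prob(lambda w: event(w) and given(w)) / prob(given)

def is_seven(w):
    return w[0] + w[1] == 7

def first_is(i):
    """Event A_i: the first die shows i. These six events partition omega."""
    return lambda w: w[0] == i

direct = prob(is_seven)
total = sum(cond_prob(is_seven, first_is(i)) * prob(first_is(i)) for i in range(1, 7))
print(direct, total)
```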
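Two events $A$ and $B$ are independent if and only if
$$\mathbb{P}(A \cap B) = \mathbb{P}(A)\,\mathbb{P}(B),$$
and if $\mathbb{P}(B) > 0$ this is equivalent to $\mathbb{P}(A \mid B) = \mathbb{P}(A)$. For an arbitrary index set $I$, the events $(A_i)_{i \in I}$ are independent if and only if
$$\mathbb{P}\Big(\bigcap_{j \in J} A_j\Big) = \prod_{j \in J} \mathbb{P}(A_j) \quad \text{for every finite } J \subset I.$$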
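Two random variables $X$ and $Y$ are independent if and only if, for all measurable $E$ and $F$ in the respective state spaces, $\{X \in E\}$ and $\{Y \in F\}$ are independent, that is,
$$\mathbb{P}(X \in E, Y \in F) = \mathbb{P}(X \in E)\,\mathbb{P}(Y \in F).$$
This expresses that the joint law $\mathcal{L}(X, Y)$ is the product law $\mathcal{L}(X) \otimes \mathcal{L}(Y)$. Hence, the Fubini theorem yields that $X$ and $Y$ are independent if and only if, for any $f$ and $g$ which are nonnegative or such that $f(X)$ and $g(Y)$ are integrable,
$$\mathbb{E}(f(X)\,g(Y)) = \mathbb{E}(f(X))\,\mathbb{E}(g(Y)).$$
If $X$ and $Y$ have discrete state spaces, it is sufficient that $\mathbb{P}(X = x, Y = y) = \mathbb{P}(X = x)\,\mathbb{P}(Y = y)$ for every $x$ and $y$ in the respective state spaces.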
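For discrete state spaces, the pointwise product criterion is directly checkable. The sketch below computes the marginals of a joint law given as a dictionary and tests whether every joint probability equals the product of the marginals. The two example laws (two independent fair bits, and a pair with $Y = X$) and the function names are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product

def marginals(joint):
    """Marginal laws of X and Y from a joint law {(x, y): P(X = x, Y = y)}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return px, py

def is_independent(joint):
    """Check P(X = x, Y = y) = P(X = x) P(Y = y) for every pair (x, y)."""
    px, py = marginals(joint)
    return all(joint.get((x, y), 0) == px[x] * py[y] for x, y in product(px, py))

# Hypothetical examples: two independent fair bits, and a fair bit paired with itself.
indep = {(x, y): Fraction(1, 4) for x in (0, 1) for y in (0, 1)}
dep = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}

print(is_independent(indep), is_independent(dep))
```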
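For an arbitrary index set $I$, the random variables $(X_i)_{i \in I}$ are independent if and only if for any $k \ge 1$ and distinct $i_1, \dots, i_k$ in $I$ and measurable $E_1, \dots, E_k$ included in the respective state spaces
$$\mathbb{P}(X_{i_1} \in E_1, \dots, X_{i_k} \in E_k) = \prod_{j=1}^{k} \mathbb{P}(X_{i_j} \in E_j),$$
that is, if and only if the joint laws are given by the product of the marginals
$$\mathcal{L}(X_{i_1}, \dots, X_{i_k}) = \mathcal{L}(X_{i_1}) \otimes \cdots \otimes \mathcal{L}(X_{i_k}),$$
and the Fubini theorem can be used as earlier. If $I$ is finite, then it suffices to check this property for $\{i_1, \dots, i_k\} = I$.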
The random variables $(X_i)_{i \in I}$ are independent and identically distributed, i.i.d. for short, if they are independent and all have the same law.
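The most general independence notion is as follows. The sub-$\sigma$-fields $(\mathcal{F}_i)_{i \in I}$ of $\mathcal{F}$ (see Section A.3) are independent if and only if, for all $k \ge 1$ and distinct $i_1, \dots, i_k$ in $I$ and $A_1 \in \mathcal{F}_{i_1}, \dots, A_k \in \mathcal{F}_{i_k}$, it holds that
$$\mathbb{P}(A_1 \cap \cdots \cap A_k) = \prod_{j=1}^{k} \mathbb{P}(A_j).$$
If $I$ is finite, then it suffices to check this property for $\{i_1, \dots, i_k\} = I$.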
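Note that the random variables $(X_i)_{i \in I}$ are independent if and only if the generated sub-$\sigma$-fields $(\sigma(X_i))_{i \in I}$ are independent and that the events $(A_i)_{i \in I}$ are independent if and only if the random variables $(\mathbf{1}_{A_i})_{i \in I}$ are independent.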
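If $C$ is an event s.t. $\mathbb{P}(C) > 0$, all these independence notions can be applied to the conditional probability measure $\mathbb{P}(\cdot \mid C)$, and the terminology “independent conditional to $C$” is then used. In particular, $A$ and $B$ are independent conditional to $C$ if and only if
$$\mathbb{P}(A \cap B \mid C) = \mathbb{P}(A \mid C)\,\mathbb{P}(B \mid C)$$
or, equivalently when $\mathbb{P}(B \cap C) > 0$,
$$\mathbb{P}(A \mid B \cap C) = \mathbb{P}(A \mid C).$$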
The notion of a Markov chain is a generalization of the notion of a sequence of i.i.d. random variables. We recall two basic limit theorems for the latter. The first result shows that in a certain scale randomness tends to disappear, and the second quantifies precisely the residual randomness in the appropriate scale.
These two results have been adapted to recurrent Markov chains using regenerative techniques in Section 4.1. This has yielded notably the pointwise ergodic theorem and the Markov chain central limit theorem.
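The two scales involved in these limit theorems can be observed by simulation. The following sketch draws i.i.d. uniform random variables on $[0,1]$ (an illustrative choice, as are the sample sizes and the fixed seed): the empirical mean concentrates near the mean $1/2$, while the error multiplied by $\sqrt{n}$ keeps a spread close to the variance $1/12$.

```python
import math
import random
import statistics

rng = random.Random(0)       # fixed seed so that the run is reproducible
MEAN, VAR = 0.5, 1.0 / 12.0  # mean and variance of the uniform law on [0, 1]

def sample_mean(n):
    """Empirical mean of n i.i.d. uniform draws."""
    return sum(rng.random() for _ in range(n)) / n

def rescaled_error(n):
    """sqrt(n) * (empirical mean - true mean), the central limit theorem scale."""
    return math.sqrt(n) * (sample_mean(n) - MEAN)

lln_estimate = sample_mean(20_000)
clt_draws = [rescaled_error(200) for _ in range(2_000)]
clt_variance = statistics.pvariance(clt_draws)

print(lln_estimate, clt_variance, VAR)
```

A simulation of this kind only suggests the behavior and proves nothing; the sample sizes were chosen so that the run stays short.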
The state space $\mathcal{V}$ will here be discrete, and we develop the notions in Section 1.2.2; see Section A.3.2 for some extensions to general measurable state spaces.
The space $\mathcal{M}$ of signed measures with the total variation norm can be identified with the separable (with dense countable subset) Banach space (complete normed space) $\ell^1$ of summable real sequences indexed by $\mathcal{V}$ with its natural norm. Its dual space $L^\infty$ of bounded functions with the supremum norm can be identified with the space $\ell^\infty$ of bounded sequences.
These vector spaces are of finite dimension if and only if $\mathcal{V}$ is finite. Recall that a vector space is of finite dimension if and only if all norms on it are equivalent.
The subset $\mathcal{M}_+$ of finite nonnegative measures is a closed (for the norm) cone (stable by nonnegative linear combinations) of $\mathcal{M}$. The set $\mathcal{M}_+^1$ of probability measures is the intersection of $\mathcal{M}_+$ with the unit sphere. Thus, $\mathcal{M}_+^1$ is a closed convex subset of $\mathcal{M}$ and is complete for the distance induced by the norm.
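Let $\mathcal{B}$ be a Banach space. Its dual $\mathcal{B}^*$ is the space of all continuous (for the norm) linear forms (real linear mappings) on $\mathcal{B}$. The action of $\varphi$ in $\mathcal{B}^*$ on $x$ in $\mathcal{B}$ is denoted by duality brackets as
$$\langle \varphi, x \rangle = \varphi(x).$$
The strong dual norm on $\mathcal{B}^*$ is given by the operator norm
$$\|\varphi\|_{\mathcal{B}^*} = \sup_{x \in \mathcal{B},\ \|x\|_{\mathcal{B}} \le 1} |\langle \varphi, x \rangle|,$$
and for this norm $\mathcal{B}^*$ is a Banach space.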
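Let $\mathcal{V}$ be a discrete state space. For $1 \le p < \infty$, let $\ell^p = \ell^p(\mathcal{V})$ and $\ell^\infty = \ell^\infty(\mathcal{V})$ denote the spaces of real sequences $u = (u(x))_{x \in \mathcal{V}}$ s.t., respectively,
$$\|u\|_p = \Big(\sum_{x \in \mathcal{V}} |u(x)|^p\Big)^{1/p} < \infty, \qquad \|u\|_\infty = \sup_{x \in \mathcal{V}} |u(x)| < \infty.$$
If $\mathcal{V}$ is finite, then all these sequence spaces can be identified with $\mathbb{R}^{d}$ for $d$ the number of elements of $\mathcal{V}$, and all these norms are equivalent. The main focus is on infinite $\mathcal{V}$, and these spaces are isomorphic to the classic spaces of sequences indexed by $\mathbb{N}$.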
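The Banach space $\mathcal{M}$ of signed measures on $\mathcal{V}$ with the total variation norm can be identified with the separable space $\ell^1$, and its dual with $\ell^\infty$ by identifying $f$ in $\ell^\infty$ with the linear form
$$\mu \in \ell^1 \mapsto \langle \mu, f \rangle = \sum_{x \in \mathcal{V}} \mu(x)\,f(x),$$
and the norms are in duality with this duality bracket.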
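The Banach space $c_0$ is the subspace of $\ell^\infty$ of the sequences $u$ that converge to $0$: for all $\varepsilon > 0$, there exists a finite subset $F$ of $\mathcal{V}$ s.t. $|u(x)| \le \varepsilon$ for $x$ in $\mathcal{V} \setminus F$. Then, with continuous injections,
$$\ell^1 \subset \ell^p \subset \ell^q \subset c_0 \subset \ell^\infty, \qquad 1 \le p \le q < \infty.$$
The countable set of rational-valued sequences with finite support is dense in $c_0$ and in $\ell^p$ for $1 \le p < \infty$, and these Banach spaces hence are separable.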
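On the contrary, $\ell^\infty$ is not separable for infinite $\mathcal{V}$, and its dual contains strictly $\ell^1$. Indeed, let $(u_k)_{k \in \mathbb{N}}$ be a sequence with values in $\ell^\infty$, $(x_k)_{k \in \mathbb{N}}$ an enumeration of $\mathcal{V}$, and $v(x_k) = -1$ if $u_k(x_k) \ge 0$ and else $v(x_k) = 1$. Then, $v$ is in $\ell^\infty$ and
$$\|v - u_k\|_\infty \ge |v(x_k) - u_k(x_k)| \ge 1, \qquad k \in \mathbb{N},$$
and thus $(u_k)_{k \in \mathbb{N}}$ cannot be dense in $\ell^\infty$.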
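The dual space of $c_0$ can be identified with $\ell^1$, with duality bracket for $\mu$ in $\ell^1$ and $f$ in $c_0$ again given by
$$\langle \mu, f \rangle = \sum_{x \in \mathcal{V}} \mu(x)\,f(x).$$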
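For $\mu$ in $\ell^1$, for all $\varepsilon > 0$, there exists a finite subset $F$ of $\mathcal{V}$ s.t. $\sum_{x \notin F} |\mu(x)| \le \varepsilon$, which readily yields using Lemma A.2.1 that
$$\|\mu\|_1 = \sup_{f \in c_0,\ \|f\|_\infty \le 1} \langle \mu, f \rangle = \sup_{f \in \ell^\infty,\ \|f\|_\infty \le 1} \langle \mu, f \rangle,$$
so that the total variation norm (or the $\ell^1$ norm) is the strong dual norm both considering $\mathcal{M}$ (or $\ell^1$) as the dual of $c_0$ or as a subspace of the dual of $\ell^\infty$.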
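For a finitely supported signed measure, the dual-norm characterization can be verified by hand: the supremum of the bracket over test functions bounded by $1$ is attained at the sign of the measure, which has finite support and hence converges to $0$. A minimal sketch follows, in which the measure `mu` and the helper names are illustrative assumptions.

```python
from fractions import Fraction

def bracket(mu, f):
    """Duality bracket <mu, f> = sum_x mu(x) f(x), for a finitely supported mu."""
    return sum(m * f(x) for x, m in mu.items())

def l1_norm(mu):
    """Total variation (l^1) norm: sum_x |mu(x)|."""
    return sum(abs(m) for m in mu.values())

def sign_of(mu):
    """Function x -> sign(mu(x)): bounded by 1, with finite support."""
    def f(x):
        m = mu.get(x, 0)
        return (m > 0) - (m < 0)
    return f

# Hypothetical signed measure on N, given by its nonzero weights.
mu = {0: Fraction(1, 2), 1: Fraction(-1, 3), 5: Fraction(1, 6)}

attained = bracket(mu, sign_of(mu))        # equals the l^1 norm
with_constant = bracket(mu, lambda x: 1)   # another test function bounded by 1
print(l1_norm(mu), attained, with_constant)
```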
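The Banach space $\mathcal{M}$ can be given the weak topology
$$\sigma(\mathcal{M}, L^\infty),$$
also denoted by $\sigma(\ell^1, \ell^\infty)$. It can also be considered as the dual space of $c_0$, and given the weak-$*$ topology
$$\sigma(\mathcal{M}, c_0),$$
also denoted by $\sigma(\ell^1, c_0)$. Recall that in infinite dimension the dual space of $\ell^\infty$ is much larger than $\ell^1$.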
A simple fact is that a sequence converges for $\sigma(\ell^1, c_0)$ if and only if it is bounded (for the norm) and converges termwise. A diagonal subsequence extraction procedure then shows that a subset of $\mathcal{M}$ is relatively compact for $\sigma(\ell^1, c_0)$ if and only if it is bounded.
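Let $\mathcal{V}$ be infinite and identified with $\mathbb{N}$. Then, the sequence $(\delta_n)_{n \in \mathbb{N}}$ of $\mathcal{M}_+^1$ clearly converges to $0$ for $\sigma(\ell^1, c_0)$, and hence $\mathcal{M}_+^1$ is not closed for this topology. Moreover, this sequence cannot have an accumulation point for $\sigma(\ell^1, \ell^\infty)$, as this could only be $0$ as per the above-mentioned conditions, whereas $\langle \delta_n, 1 \rangle = 1$. Hence, the bounded set $\{\delta_n : n \in \mathbb{N}\}$ is not relatively compact for $\sigma(\ell^1, \ell^\infty)$ nor for the (stronger) topology of the total variation norm.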
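The behavior of the Dirac masses can be made concrete with a short computation. The sketch below pairs $\delta_n$ with a test function vanishing at infinity and with the constant function $1$, and computes the total variation distance between two distinct Dirac masses. The test function $x \mapsto 1/(1+x)$ and the helper names are assumptions chosen for the example.

```python
def dirac(n):
    """Dirac mass at n, stored as {point: weight}."""
    return {n: 1.0}

def bracket(mu, f):
    """Duality bracket <mu, f> = sum_x mu(x) f(x)."""
    return sum(m * f(x) for x, m in mu.items())

def l1_distance(mu, nu):
    """Total variation (l^1) norm of mu - nu."""
    return sum(abs(mu.get(x, 0.0) - nu.get(x, 0.0)) for x in set(mu) | set(nu))

def vanishing(x):
    """A test function that converges to 0 at infinity."""
    return 1.0 / (1 + x)

def constant_one(x):
    """A bounded test function that does not converge to 0."""
    return 1.0

for n in (1, 10, 100, 1000):
    print(n, bracket(dirac(n), vanishing), bracket(dirac(n), constant_one))

print(l1_distance(dirac(3), dirac(7)))
```

The first bracket tends to $0$ while the second stays equal to $1$, and distinct Dirac masses remain at total variation distance $2$, so that no subsequence can converge in norm.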
These are instances of far more general facts. Recall that a normed vector space is of finite dimension if and only if its unit sphere is compact and that the closed unit ball of a dual space is always compact for the weak-$*$ topology (but not necessarily for the weak topology), which helps explain the popularity of this topology, see the Banach–Alaoglu theorem (Rudin, W. (1991), Theorem 3.15).
Let us now assume that the above-mentioned notions are restricted to the space of probability measures $\mathcal{M}_+^1$, that is, that both the sequence and its limit are probability measures.
Then, not only do the $\sigma(\mathcal{M}, c_0)$ and $\sigma(\mathcal{M}, L^\infty)$ topologies coincide (a fact which extends to general state spaces), but as $\mathcal{V}$ is discrete, they also coincide with both the topology of the termwise convergence (product topology) and the topology of the complete metric space given by the (trace of the) total variation norm.
The resulting topology is called the topology of weak convergence of probability measures. The convergence in law of random variables is defined as the weak convergence of their laws.
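Indeed, clearly on $\mathcal{M}_+^1$, the weakest topology is that of termwise convergence, and the strongest is that of total variation. Let $\mu_n$ for $n \ge 0$ and $\mu$ be in $\mathcal{M}_+^1$, and $\lim_{n \to \infty} \mu_n(x) = \mu(x)$ for every $x$ in $\mathcal{V}$. Let $\varepsilon > 0$ be arbitrary. It is possible to choose a finite subset $F$ of $\mathcal{V}$ and then $N \ge 0$ s.t.
$$\mu(\mathcal{V} \setminus F) \le \varepsilon, \qquad n \ge N \Rightarrow \sum_{x \in F} |\mu_n(x) - \mu(x)| \le \varepsilon.$$
As these are probability measures, if $n \ge N$, then
$$\mu_n(\mathcal{V} \setminus F) = 1 - \mu_n(F) \le 1 - \mu(F) + \varepsilon \le 2\varepsilon,$$
and thus, $\|\mu_n - \mu\|_1 \le \sum_{x \in F} |\mu_n(x) - \mu(x)| + \mu_n(\mathcal{V} \setminus F) + \mu(\mathcal{V} \setminus F) \le 4\varepsilon$, and hence,
$$\lim_{n \to \infty} \|\mu_n - \mu\|_1 = 0.$$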
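The passage from termwise convergence to convergence in total variation can be observed on a classic example, chosen here as an assumption for illustration: the binomial laws with parameters $n$ and $\lambda/n$ converge termwise to the Poisson law with parameter $\lambda$, and the $\ell^1$ distance between the two laws is computed below for increasing $n$. Recurrences are used for the probabilities so as to avoid large factorials.

```python
import math

def binomial_pmf(n, p):
    """[P(X = k) for k = 0..n] for the binomial law (n, p), by recurrence on k."""
    pmf = [(1.0 - p) ** n]
    for k in range(n):
        pmf.append(pmf[-1] * (n - k) / (k + 1) * p / (1.0 - p))
    return pmf

def poisson_pmf(lam, kmax):
    """[P(Y = k) for k = 0..kmax] for the Poisson law of parameter lam."""
    pmf = [math.exp(-lam)]
    for k in range(kmax):
        pmf.append(pmf[-1] * lam / (k + 1))
    return pmf

def l1_distance(n, lam):
    """l^1 distance between the binomial (n, lam / n) and Poisson (lam) laws."""
    b = binomial_pmf(n, lam / n)
    q = poisson_pmf(lam, n)
    inside = sum(abs(x - y) for x, y in zip(b, q))
    tail = max(0.0, 1.0 - sum(q))  # Poisson mass on {n+1, ...}, not charged by the binomial
    return inside + tail

distances = {n: l1_distance(n, 2.0) for n in (10, 100, 1000)}
print(distances)
```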
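The fact that $\mathcal{M}_+^1$ is weak-$*$ relatively compact in $\mathcal{M}$, and computations quite similar to those shown earlier, show that a subset $\mathcal{C}$ of $\mathcal{M}_+^1$ is relatively compact for the weak convergence of probability measures if and only if $\mathcal{C}$ is tight, in the following sense: for every $\varepsilon > 0$, there exists a finite subset $F$ of $\mathcal{V}$ s.t.
$$\mu(F) \ge 1 - \varepsilon, \qquad \forall \mu \in \mathcal{C}.$$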
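Tightness asks for a single finite set carrying mass at least $1 - \varepsilon$ under every law of the family. As a minimal sketch under illustrative assumptions, consider geometric laws $\mathbb{P}(X = k) = p(1-p)^k$ on $\mathbb{N}$ with parameters in a finite list: the code searches for the smallest $n$ such that $F = \{0, \dots, n-1\}$ works for all of them. The function names and parameter values are hypothetical.

```python
from fractions import Fraction

def geometric_tail(p, n):
    """P(X >= n) for the geometric law P(X = k) = p (1 - p)^k on N."""
    return (1 - p) ** n

def window_for_family(ps, eps):
    """Smallest n s.t. every geometric law with parameter in ps gives
    mass at least 1 - eps to the finite set {0, ..., n - 1}."""
    n = 0
    while max(geometric_tail(p, n) for p in ps) > eps:
        n += 1
    return n

# Hypothetical family: parameters 1/2 and 1/4, with eps = 1/100.
ps = [Fraction(1, 2), Fraction(1, 4)]
n = window_for_family(ps, Fraction(1, 100))
print(n)
```

The window is driven by the smallest parameter, and it grows without bound as that parameter approaches $0$, which is the mechanism by which a family of laws can fail to be tight.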