This appendix introduces, without proofs, the main notions and results in measure and integration theory, which allow one to treat the subject of Markov chains in a mathematically rigorous way.
A probability space is given by a triple $(\Omega, \mathcal{F}, \mathbb{P})$, in which $\Omega$ is the set of all possible outcomes, $\mathcal{F}$ is a $\sigma$-field of subsets of $\Omega$, and $\mathbb{P}$ is a probability measure on $\mathcal{F}$.
The elements of $\mathcal{F}$ are called events; they regroup certain random outcomes leading to situations of interest, in such a way that these can be attributed a “likelihood measure” using $\mathbb{P}$.
Clearly, $\Omega \in \mathcal{F}$, and $\emptyset = \Omega^{\mathrm{c}} \in \mathcal{F}$, and more generally $\mathcal{F}$ is stable under complements, and if $A_n$ for $n$ in $\mathbb{N}$ is in $\mathcal{F}$, then $\bigcap_{n \in \mathbb{N}} A_n = \bigl(\bigcup_{n \in \mathbb{N}} A_n^{\mathrm{c}}\bigr)^{\mathrm{c}} \in \mathcal{F}$. Note that in order to consider finite unions or intersections it suffices to take $A_n = \emptyset$ or $A_n = \Omega$ where necessary.
The trivial $\sigma$-field $\{\emptyset, \Omega\}$ is included in any $\sigma$-field, which is in turn included in the $\sigma$-field $\mathcal{P}(\Omega)$ of all subsets of $\Omega$. The latter is often the one of choice when possible, notably if $\Omega$ is countable, but it is often too large for an appropriate probability measure to be defined on it. Moreover, the notion of sub-$\sigma$-field is used to encode partial information available in a probabilistic model.
The following important property is in fact equivalent to $\sigma$-additivity, using the fact that $\mathbb{P}$ is a finite measure.
An arbitrary intersection of $\sigma$-fields is a $\sigma$-field, and the set $\mathcal{P}(\Omega)$ of all subsets of $\Omega$ is a $\sigma$-field. This allows one to define the $\sigma$-field generated by a set $\mathcal{C}$ of subsets of $\Omega$ as the intersection of all $\sigma$-fields containing $\mathcal{C}$, and thus it is the least $\sigma$-field containing $\mathcal{C}$. This $\sigma$-field is denoted by $\sigma(\mathcal{C})$ and encodes the probabilistic information available by observing $\mathcal{C}$.
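On a finite set, the generated $\sigma$-field can be computed by brute force. The following Python sketch (the function name and the set representation are ours, not from the text) closes a given collection under complements and pairwise unions until a fixed point is reached; on a finite space this yields exactly the least $\sigma$-field containing the collection.

```python
def generated_sigma_field(omega, collection):
    """Close a collection of subsets of the finite set omega under
    complement and pairwise union, iterating to a fixed point; on a
    finite space this yields sigma(collection)."""
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(c) for c in collection}
    while True:
        new = set(family)
        new |= {omega - a for a in family}               # complements
        new |= {a | b for a in family for b in family}   # unions
        if new == family:
            return family
        family = new

# The sigma-field generated by the single subset {a} of {a, b, c}:
sigma = generated_sigma_field({"a", "b", "c"}, [{"a"}])
# It consists of: {}, {a}, {b, c}, {a, b, c}
```

Note that only finitely many unions are needed here; for countable $\Omega$ the closure would require countable unions as well.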
A subset of $\Omega$ containing an event of probability $1$ is said to be almost sure, and a subset of $\Omega$ included in an event of probability $0$ is said to be negligible; these are two complementary notions. The $\sigma$-additivity property yields that a countable union of negligible sets is negligible. By complementation, a countable intersection of almost-sure sets is almost sure. A property is almost sure, or holds a.s., if the set of all $\omega$ in $\Omega$ that satisfy it is almost sure. The classical abbreviation for almost sure is “a.s.” and is often left implicit, but care needs to be taken if an uncountable number of operations are performed.
A set $E$ furnished with a $\sigma$-field $\mathcal{E}$ is said to be measurable. A mapping $f$ from a measurable set $E$ with $\sigma$-field $\mathcal{E}$ to another measurable set $G$ with $\sigma$-field $\mathcal{G}$ is said to be measurable if and only if
$$f^{-1}(A) := \{x \in E : f(x) \in A\} \in \mathcal{E}, \qquad \forall A \in \mathcal{G}.$$
A (nonnegative) measure $\mu$ on a measurable set $E$ with $\sigma$-field $\mathcal{E}$ is a $\sigma$-additive mapping $\mu : \mathcal{E} \to [0, \infty]$ such that $\mu(\emptyset) = 0$.
By $\sigma$-additivity, if $A$ and $B$ are in $\mathcal{E}$ and $A \subset B$, then $\mu(A) \le \mu(B)$. The measure $\mu$ is said to be finite if $\mu(E) < \infty$, and then $\mu : \mathcal{E} \to [0, \mu(E)]$, and to be a probability measure or a law if $\mu(E) = 1$, and then $\mu : \mathcal{E} \to [0, 1]$.
Many results for probability spaces can be extended to this framework (which is usually introduced first) using the classical computation conventions in $[0, \infty]$.
For instance, $\mu(B \setminus A) = \mu(B) - \mu(A)$ for $A \subset B$ in $\mathcal{E}$ if this quantity has a meaning. As in Lemma A.3.1, the $\sigma$-additivity property is equivalent to the fact that if $(A_n)_{n \ge 0}$ is a nondecreasing sequence of events in $\mathcal{E}$, then $\mu(\bigcup_{n \ge 0} A_n) = \lim_{n \to \infty} \mu(A_n)$. Moreover, by complementation, if $(A_n)_{n \ge 0}$ is a nonincreasing sequence of events in $\mathcal{E}$ s.t. $\mu(A_k) < \infty$ for some $k$, then $\mu(\bigcap_{n \ge 0} A_n) = \lim_{n \to \infty} \mu(A_n)$.
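The monotone limit for nondecreasing events can be illustrated numerically; the example below assumes the geometric law $\mu(\{k\}) = 2^{-(k+1)}$ on the nonnegative integers (our choice, not from the text), for which the events $A_n = \{0, \dots, n\}$ increase to the whole space.

```python
# mu is the geometric law mu({k}) = 2**-(k+1) on the nonnegative integers.
def mu(A):
    return sum(2.0 ** -(k + 1) for k in A)

# The nondecreasing events A_n = {0, ..., n} increase to the whole space,
# and sigma-additivity forces mu(A_n) -> mu(union of the A_n) = 1.
values = [mu(range(n + 1)) for n in range(60)]
assert all(values[i] <= values[i + 1] for i in range(59))  # nondecreasing
limit = values[-1]   # mu(A_59) = 1 - 2**-60, numerically 1.0
```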
A further extension is given by signed measures $\mu$, which are $\sigma$-additive mappings $\mu : \mathcal{E} \to \mathbb{R}$. The Hahn–Jordan decomposition yields an essentially unique decomposition of a signed measure into a difference of nonnegative finite measures, under the form $\mu = \mu^+ - \mu^-$, in which the supports $D^+$ and $D^-$ of $\mu^+$ and $\mu^-$ are disjoint. The finite nonnegative measure $|\mu| := \mu^+ + \mu^-$ is called the total variation measure of $\mu$, and its total mass $\|\mu\| := |\mu|(E)$ is called the total variation norm of $\mu$.
The space of all signed measures on $(E, \mathcal{E})$ is a Banach space for this norm, which can be identified with a closed subspace of the strong dual of the space of bounded measurable functions on $E$ furnished with the supremum norm.
For every (nonnegative, possibly infinite) reference measure $\lambda$, this Banach space contains a subspace that can be identified with $L^1(\lambda)$, by identifying any signed measure $\mu$ which is absolutely continuous w.r.t. $\lambda$ with its Radon–Nikodym derivative $\frac{\mathrm{d}\mu}{\mathrm{d}\lambda}$. If $E$ is discrete, then a natural and universal choice for $\lambda$ is the counting measure, and thus $\mu$ can be identified with the family $(\mu(\{x\}))_{x \in E}$ and the space of signed measures with $\ell^1(E)$.
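On a discrete space the identification with $\ell^1$ makes the Hahn–Jordan decomposition and the total variation norm entirely concrete. A minimal sketch, with a hypothetical signed measure on a three-point space given as the difference of two probability vectors (all the numbers are our illustrative choices):

```python
# A signed measure on {a, b, c}, identified with the family of its
# point masses: mu = p - q for two probability vectors p and q.
p = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.2, "b": 0.3, "c": 0.5}
mu = {x: p[x] - q[x] for x in p}

# Jordan decomposition mu = mu_plus - mu_minus with disjoint supports.
mu_plus = {x: max(m, 0.0) for x, m in mu.items()}
mu_minus = {x: max(-m, 0.0) for x, m in mu.items()}

# Total variation norm = l^1 norm of the family of point masses.
total_variation = sum(mu_plus.values()) + sum(mu_minus.values())
# Here mu_plus charges only {a}, mu_minus only {c}, and the norm is 0.6.
```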
A probability space $(\Omega, \mathcal{F}, \mathbb{P})$ is given. A random variable (r.v.) with values in a measurable set $E$ with $\sigma$-field $\mathcal{E}$ is a measurable function $X : \Omega \to E$, which satisfies
$$X^{-1}(A) := \{X \in A\} := \{\omega \in \Omega : X(\omega) \in A\} \in \mathcal{F}, \qquad \forall A \in \mathcal{E}.$$
For an arbitrary mapping $X : \Omega \to E$, the set
$$\sigma(X) := X^{-1}(\mathcal{E}) = \{X^{-1}(A) : A \in \mathcal{E}\}$$
is a $\sigma$-field, called the $\sigma$-field generated by $X$, encoding the information available on $\Omega$ by observing $X$. Notably, $X$ is measurable if and only if $\sigma(X) \subset \mathcal{F}$.
The probability space $(\Omega, \mathcal{F}, \mathbb{P})$ is often only assumed to be fixed, without further precision, and represents some kind of ideal probabilistic knowledge. Only the properties of certain random variables are given precisely. These often represent indirect observations or effects of the random outcomes, and it is natural to focus on them to get useful information.
The law of the r.v. $X$ is the probability measure on $(E, \mathcal{E})$, which is well defined since $X$ is measurable. It is denoted by $\mathbb{P}_X$ or $\mathcal{L}(X)$ and is given by
$$\mathbb{P}_X(A) := \mathbb{P}(X^{-1}(A)) = \mathbb{P}(X \in A), \qquad \forall A \in \mathcal{E}.$$
Then, $(E, \mathcal{E}, \mathbb{P}_X)$ is a probability space, which encodes the probabilistic information available on the outcomes of $X$.
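On a finite probability space the law of an r.v. is just the pushforward of the point masses. A small sketch, assuming a fair die as probability space and the parity indicator as r.v. (both our illustrative choices):

```python
from collections import defaultdict

# Omega = {1, ..., 6} with the uniform law (a fair die).
P = {omega: 1 / 6 for omega in range(1, 7)}

def X(omega):
    return omega % 2        # X = 1 on odd rolls, 0 on even rolls

# The law P_X(A) = P(X in A) is obtained by pushing P forward through X.
P_X = defaultdict(float)
for omega, prob in P.items():
    P_X[X(omega)] += prob
# P_X is the uniform law on {0, 1}: both values get probability 1/2.
```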
The expectation $\mathbb{E}$ will be defined as a monotone linear extension of the probability measure $\mathbb{P}$, first for random variables taking a finite number of values in $[0, \infty]$, then for general random variables with values in $[0, \infty]$, and finally for real random variables satisfying an integrability condition. The notation $\mathbb{E}_{\mathbb{P}}$ is sometimes used to stress the underlying probability measure $\mathbb{P}$.
This procedure allows one to define the integral $\mu(f) := \int f \, \mathrm{d}\mu$ of a measurable function $f$, from $E$ with $\sigma$-field $\mathcal{E}$ to $\mathbb{R}$ with its Borel $\sigma$-field, w.r.t. a measure $\mu$, but we restrict this to probability measures for the sake of concision.
The classic structure of $\mathbb{R}_+$ is extended to $[0, \infty]$ by setting
$$x + \infty := \infty \;\;\forall x \in [0, \infty], \qquad x \times \infty := \infty \;\;\forall x \in (0, \infty], \qquad 0 \times \infty := 0.$$
If $X$ is an r.v. taking a finite number of values in $[0, \infty]$, then
$$\mathbb{E}(X) := \sum_{x \in X(\Omega)} x \, \mathbb{P}(X = x).$$
In particular,
$$\mathbb{E}(\mathbf{1}_A) = \mathbb{P}(A), \qquad \forall A \in \mathcal{F}.$$
For such random variables, this defines a monotone operator, in the sense that
$$X \le Y \Longrightarrow \mathbb{E}(X) \le \mathbb{E}(Y),$$
which moreover is nonnegative linear, in the sense that
$$\mathbb{E}(aX + bY) = a\,\mathbb{E}(X) + b\,\mathbb{E}(Y), \qquad \forall a, b \in [0, \infty].$$
For an r.v. $X$ with values in $[0, \infty]$, let
$$\mathbb{E}(X) := \sup\{\mathbb{E}(Y) : Y \text{ an r.v. taking a finite number of values in } [0, \infty],\; Y \le X\},$$
which is consistent with the previous definition when $X$ itself takes a finite number of values.
This extension of $\mathbb{E}$ is still monotone and nonnegative, from which we deduce the following extension of the monotone limit lemma (Lemma A.3.1). This is where the fact that $X$ is measurable becomes crucial.
This theorem allows one to prove that $\mathbb{E}$ is nonnegative linear, by replacing the supremum in the definition by the limit of an adequate nondecreasing sequence. If $X$ is a $[0, \infty]$-valued r.v., then we define for $n \ge 0$ the dyadic approximation
$$X_n := \min\bigl(2^{-n} \lfloor 2^n X \rfloor,\, n\bigr),$$
which takes a finite number of values and satisfies $X_n \le X$ and $X_n \uparrow X$.
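The dyadic approximation can be sketched numerically; the function below implements the standard construction $\min(2^{-n}\lfloor 2^n x \rfloor, n)$ pointwise (the function name and the test value are ours) and checks the two key properties: it stays below $x$ and is nondecreasing in $n$.

```python
import math

def dyadic(x, n):
    """n-th dyadic approximation of x in [0, infinity):
    floor(2^n x)/2^n capped at n, so dyadic(x, n) <= x and
    dyadic(x, n) increases to x as n grows."""
    if x >= n:
        return float(n)
    return math.floor(2 ** n * x) / 2 ** n

x = math.pi
approximations = [dyadic(x, n) for n in range(1, 30)]
assert all(a <= x for a in approximations)                  # stays below x
assert all(a <= b for a, b in zip(approximations, approximations[1:]))
error = x - approximations[-1]   # at most 2**-29 once n > x
```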
If $X$ and $Y$ are $[0, \infty]$-valued r.v., and $a, b \in [0, \infty]$, then
$$\mathbb{E}(aX + bY) = a\,\mathbb{E}(X) + b\,\mathbb{E}(Y).$$
An important corollary of the monotone convergence theorem is the following.
Let us finish with a quite useful result.
Let $X$ be an r.v. with values in $[-\infty, \infty]$. Let
$$X^+ := \max(X, 0), \qquad X^- := \max(-X, 0),$$
so that
$$X = X^+ - X^-, \qquad |X| = X^+ + X^-.$$
The natural extension to $[-\infty, \infty]$ of the operations on $\mathbb{R}$ leads to setting, except if the indeterminacy $\infty - \infty$ occurs,
$$\mathbb{E}(X) := \mathbb{E}(X^+) - \mathbb{E}(X^-).$$
This definition is monotone and linear: if all is well defined in $[-\infty, \infty]$, then
$$X \le Y \Longrightarrow \mathbb{E}(X) \le \mathbb{E}(Y), \qquad \mathbb{E}(aX + bY) = a\,\mathbb{E}(X) + b\,\mathbb{E}(Y).$$
In particular, if $\mathbb{E}(|X|) = \mathbb{E}(X^+) + \mathbb{E}(X^-) < \infty$, then
$$|\mathbb{E}(X)| \le \mathbb{E}(|X|),$$
and the latter is well defined. This is the most useful case and is extended by linearity to define $\mathbb{E}(X)$ for $X$ with values in $\mathbb{R}^d$ satisfying $\mathbb{E}(\|X\|) < \infty$ for some (and then every) norm $\|\cdot\|$ on $\mathbb{R}^d$. Then, $X$ is said to be integrable. The integrable random variables form a vector space, denoted by $L^1 = L^1(\Omega, \mathcal{F}, \mathbb{P})$.
It is a simple matter to check that if $X$ is an r.v. with values in $(E, \mathcal{E})$ and $f : E \to \mathbb{R}$ is measurable, then
$$\mathbb{E}(f(X)) = \mathbb{P}_X(f) := \int_E f \, \mathrm{d}\mathbb{P}_X$$
in all cases in which one of these expressions can be defined, and then all can.
The expectation has good properties w.r.t. the a.s. convergence of random variables. The monotone convergence theorem has already been seen. Its corollary, the Fatou lemma, will be used to prove an important result.
A sequence $(X_n)_{n \ge 0}$ of $\mathbb{R}^d$-valued random variables is said to be dominated by an r.v. $Z$ with values in $[0, \infty]$ if
$$\|X_n\| \le Z \;\;\text{a.s.}, \qquad \forall n \ge 0,$$
and to be dominated in $L^1$ by $Z$ if moreover $\mathbb{E}(Z) < \infty$. The sequence is thus dominated in $L^1$ if and only if
$$\mathbb{E}\Bigl(\sup_{n \ge 0} \|X_n\|\Bigr) < \infty.$$
This theorem can be extended to the case when
$$\lim_{n \to \infty} \mathbb{P}(\|X_n - X\| > \varepsilon) = 0, \qquad \forall \varepsilon > 0,$$
i.e., when $(X_n)_{n \ge 0}$ converges to $X$ in probability.
Indeed, the Borel–Cantelli lemma implies that then, from each subsequence, a further sub-subsequence converging a.s. can be extracted. Applying Theorem A.3.5 to this a.s. converging sequence yields that the only accumulation point of $(\mathbb{E}(X_n))_{n \ge 0}$ is $\mathbb{E}(X)$, and hence that $\lim_{n \to \infty} \mathbb{E}(X_n) = \mathbb{E}(X)$.
For $p \ge 1$, we will check that the set of all $\mathbb{R}^d$-valued random variables $X$ s.t. $\mathbb{E}(\|X\|^p) < \infty$ forms a Banach space, denoted by
$$L^p = L^p(\Omega, \mathcal{F}, \mathbb{P}), \qquad \text{with norm } \|X\|_p := \mathbb{E}(\|X\|^p)^{1/p},$$
if two a.s. equal random variables are identified (i.e., on the quotient space). In particular, $L^2$ is a Hilbert space with scalar product
$$\langle X, Y \rangle := \mathbb{E}(X \cdot Y).$$
This proof remains valid if $\mathbb{P}$ is replaced by an arbitrary positive measure. The case $p = q = 2$ is a special case of the Cauchy–Schwarz inequality.
The Jensen inequality yields that if $1 \le p \le q$, then $\|X\|_p \le \|X\|_q$. The linear form $\mathbb{E}$ hence has operator norm $1$ on $L^p$, as
$$|\mathbb{E}(X)| \le \mathbb{E}(|X|) = \|X\|_1 \le \|X\|_p,$$
with equality for constant $X$.
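The norm comparison $\|X\|_1 \le \|X\|_2$ can be checked numerically for an r.v. taking finitely many values; the uniform four-point r.v. below is our choice of example.

```python
# X uniform on {1, 2, 3, 4}: compare E|X| with E(X^2)^(1/2).
values = [1.0, 2.0, 3.0, 4.0]
probs = [0.25] * 4

norm_1 = sum(abs(x) * p for x, p in zip(values, probs))          # ||X||_1
norm_2 = sum(x ** 2 * p for x, p in zip(values, probs)) ** 0.5   # ||X||_2
assert norm_1 <= norm_2   # 2.5 <= sqrt(7.5), as Jensen predicts
```

Equality would hold if `values` were all equal, i.e., for a constant r.v.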
Let us go back to Section 1.1. Let there be given a family of laws $\pi_{n_1, \ldots, n_k}$ on $E^k$, for $k \ge 1$ and $n_1 < \cdots < n_k$ in $\mathbb{N}$. Two natural questions arise: does there exist a law $\mathbb{P}$ for a sequence $(X_n)_{n \in \mathbb{N}}$ of $E$-valued random variables s.t.
$$\mathcal{L}(X_{n_1}, \ldots, X_{n_k}) = \pi_{n_1, \ldots, n_k}, \qquad \forall k \ge 1, \; n_1 < \cdots < n_k,$$
that is, s.t. this family of laws consists of the finite-dimensional marginals of $\mathbb{P}$? And if so, is $\mathbb{P}$ unique?
Clearly, the $\pi_{n_1, \ldots, n_k}$ must be consistent, or compatible: if the $j$-tuple $(n_1, \ldots, n_j)$ is included in the $k$-tuple $(m_1, \ldots, m_k)$, then $\pi_{n_1, \ldots, n_j}$ must be equal to the corresponding marginal of $\pi_{m_1, \ldots, m_k}$.
It is natural and “economical” to take $\Omega = E^{\mathbb{N}}$, called the canonical space, the process given by the canonical projections
$$X_n : \omega = (\omega_k)_{k \in \mathbb{N}} \in E^{\mathbb{N}} \mapsto \omega_n \in E,$$
called the canonical process, and to furnish $E^{\mathbb{N}}$ with the smallest $\sigma$-field s.t. each $X_n$, and hence each $(X_0, \ldots, X_n)$, is an r.v.: the product $\sigma$-field
$$\mathcal{E}^{\otimes \mathbb{N}} := \sigma(X_n : n \in \mathbb{N}).$$
Note that if $(A_n)_{n \in \mathbb{N}}$ is a sequence of subsets of the discrete space $E$, then
$$\{X_n \in A_n, \; \forall n \in \mathbb{N}\} = \bigcap_{n \in \mathbb{N}} X_n^{-1}(A_n) \in \mathcal{E}^{\otimes \mathbb{N}},$$
and events of this form are sufficient to characterize convergence in results such as the pointwise ergodic theorem (Theorem 4.1.1). See also Section 2.1.1.
By construction, each $X_n$ is measurable and hence an r.v. on $E^{\mathbb{N}}$ furnished with the product $\sigma$-field, and if this space is furnished with a probability measure $\mathbb{P}$, then the canonical process has law $\mathbb{P}$.
The following result is fundamental. It is relatively easy to show the uniqueness part: any two laws on the product $\sigma$-field with the same finite-dimensional marginals are equal. The difficult part is the existence result, which relies on the Carathéodory extension theorem.
The explicit form given in Definition 1.2.1, in terms of the initial law and the transition matrix, allows one to check easily that these probability measures are consistent. The Kolmogorov extension theorem then yields the existence and uniqueness of the law of the Markov chain on the product space. This yields the mathematical foundation for the whole theory of Markov chains.
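The consistency check can be carried out exhaustively for a small chain. The sketch below assumes a two-state chain with a made-up initial law and transition matrix (our data, not from the text): the law of $(X_0, \dots, X_n)$ is the product of the initial mass and the transition probabilities along the path, and summing out the last coordinate must recover the law of $(X_0, \dots, X_{n-1})$.

```python
import itertools

pi = {0: 0.6, 1: 0.4}                              # initial law
P = {0: {0: 0.9, 1: 0.1},                          # transition matrix
     1: {0: 0.5, 1: 0.5}}

def path_prob(path):
    """Probability of the chain following the given path:
    pi(x_0) P(x_0, x_1) ... P(x_{n-1}, x_n)."""
    prob = pi[path[0]]
    for a, b in zip(path, path[1:]):
        prob *= P[a][b]
    return prob

# Consistency: summing the 3-step law over its last coordinate
# recovers the 2-step law, because each row of P sums to 1.
for x0, x1 in itertools.product([0, 1], repeat=2):
    marginal = sum(path_prob((x0, x1, x2)) for x2 in [0, 1])
    assert abs(marginal - path_prob((x0, x1))) < 1e-12
```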