Suppose that F : (a, b) → ℝ is an increasing function which is not necessarily strictly increasing. Let
c := lim_{x↓a} F(x) and d := lim_{x↑b} F(x).
Definition A.18. A function q : (c, d) → (a, b) is called an inverse function for F if
F(q(s)−) ≤ s ≤ F(q(s)+) for all s ∈ (c, d).
The functions
q−(s) := sup{x ∈ (a, b) | F(x) < s} and q+(s) := inf{x ∈ (a, b) | F(x) > s}
are called the left- and right-continuous inverse functions.
◊
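As a concrete numerical illustration (our own example, not part of the text), the following sketch computes q− and q+ for a simple step function F directly from the defining sup/inf formulas; the finite search grid is an assumption of this sketch, not part of the definition.

```python
# Hypothetical illustration: left- and right-continuous inverses of the
# increasing step function F(x) = 0 for x < 0, 1/2 for 0 <= x < 1, 1 for
# x >= 1, approximated on a fine grid via their defining sup/inf formulas.

def F(x):
    if x < 0:
        return 0.0
    if x < 1:
        return 0.5
    return 1.0

GRID = [k / 10000.0 - 2.0 for k in range(40001)]  # grid on [-2, 2]

def q_minus(s):
    # q-(s) = sup {x | F(x) < s}
    return max(x for x in GRID if F(x) < s)

def q_plus(s):
    # q+(s) = inf {x | F(x) > s}
    return min(x for x in GRID if F(x) > s)

# At the level s = 1/2, where F is flat on [0, 1), the two inverses differ.
print(q_minus(0.5), q_plus(0.5))
```

Any inverse function q for this F must satisfy q−(1/2) ≤ q(1/2) ≤ q+(1/2), so its value at s = 1/2 can be chosen anywhere in the flat interval.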
The following lemma explains the reason for calling q− and q+ the left- and right-continuous inverse functions of F.
Lemma A.19. A function q : (c, d) → (a, b) is an inverse function for F if and only if
q−(s) ≤ q(s) ≤ q+(s) for all s ∈ (c, d).
In particular, q− and q+ are inverse functions. Moreover, q− is left-continuous, q+ is right-continuous, and every inverse function q is increasing and satisfies q(s−) = q−(s) and q(s+) = q+(s) for all s ∈ (c, d). In particular, any two inverse functions coincide a.e. on (c, d).
Proof. We have q− ≤ q+, and any inverse function q satisfies q− ≤ q ≤ q+, due to the definitions of q− and q+. Hence, the first part of the assertion follows if we can show that F(q+(s)−) ≤ s ≤ F(q−(s)+) for all s. But x < q+(s) implies F(x) ≤ s and y > q−(s) implies F (y) ≥ s, which gives the result.
Next, it is clear that both q− and q+ are increasing. Moreover, the set {x | F (x) > s} is the union of the sets {x | F (x) > s + ε} for ε > 0, and so q+ is right-continuous. An analogous argument shows the left-continuity of q−.
Remark A.20. The left- and right-continuous inverse functions can also be represented as
q−(s) = inf{x ∈ ℝ | F(x) ≥ s} and q+(s) = sup{x ∈ ℝ | F(x) ≤ s}.
To see this, note first that q−(s) is clearly dominated by the infimum on the right. On the other hand, y > q−(s) implies F(y) ≥ s, and we get q−(s) ≥ inf{x ∈ ℝ | F(x) ≥ s}. The proof for q+ is analogous.
◊
Lemma A.21. Let q be an inverse function for F. Then F is an inverse function for q. In particular,
q(F(x)−) ≤ x ≤ q(F(x)+) for all x ∈ (a, b). (A.14)
Proof. If s > F(x) then q(s) ≥ q−(s) ≥ x, and hence q(F(x)+) ≥ x. Conversely, s < F (x) implies q(s) ≤ q+(s) ≤ x, and thus q(F(x)−) ≤ x. This proves that F is an inverse function for q.
Remark A.22. By defining q(d) := b we can extend (A.14) to all x ∈ (a, b), including those x at which F attains the boundary value d.
◊
From now on we will assume that
F(x) = F(x+) for all x, (A.13)
i.e., that F is right-continuous, and that F is normalized in the sense that c = 0 and d = 1. This assumption always holds if F is the distribution function of a random variable X on some probability space (Ω, F, P), i.e., if F is given by F(x) = P[ X ≤ x ]. The following lemma shows in particular that the converse is also true: any normalized increasing right-continuous function F : ℝ → [0, 1] is the distribution function of some random variable. By considering the laws of random variables, we also obtain the one-to-one correspondence F(x) = μ((−∞, x]) between all Borel probability measures μ on ℝ and all normalized increasing right-continuous functions F : ℝ → [0, 1].
Lemma A.23. Let U be a random variable on a probability space (Ω, F, P) with a uniform distribution on (0, 1), i.e., P[ U ≤ s ] = s for all s ∈ (0, 1). If q is an inverse function of a normalized increasing right-continuous function F : ℝ → [0, 1], then
X := q(U)
has the distribution function F.
Proof. First note that any inverse function for F is measurable because it is increasing. Since q(F(x)−) ≤ x by Lemma A.21, we have q(s) ≤ x for s < F(x). Moreover, the monotonicity of F and (A.13) yield that q(s) ≤ x implies F(x) ≥ F(q(s)) = F(q(s)+) ≥ s. It follows that
{s ∈ (0, 1) | s < F(x)} ⊆ {s ∈ (0, 1) | q(s) ≤ x} ⊆ {s ∈ (0, 1) | s ≤ F(x)}.
Hence,
F(x) = P[ U < F(x) ] ≤ P[ U ∈ {s | q(s) ≤ x} ] ≤ P[ U ≤ F(x) ] = F(x).
The assertion now follows from the identity P[ U ∈ {s | q(s) ≤ x}] = P[ X ≤ x ].
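Lemma A.23 is the basis of the inverse transform sampling method. The following numerical sketch (our own example with an assumed exponential law; the sample size and check point are arbitrary) illustrates that q(U) has distribution function F:

```python
import random
import math

# Numerical sketch of Lemma A.23: for the exponential distribution
# F(x) = 1 - exp(-lam*x), a quantile function is q(s) = -log(1 - s)/lam,
# so q(U) with U uniform on (0, 1) should have distribution function F.

random.seed(0)
lam = 2.0
samples = [-math.log(1.0 - random.random()) / lam for _ in range(100_000)]

# Empirical check at x = 1: the fraction of samples <= 1 should be close
# to F(1) = 1 - exp(-2).
empirical = sum(1 for x in samples if x <= 1.0) / len(samples)
theoretical = 1.0 - math.exp(-lam * 1.0)
print(empirical, theoretical)
```

The same recipe works for any distribution function F once an inverse function is available, which is exactly the content of the lemma.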
Definition A.24. An inverse function q : (0, 1) → ℝ of a distribution function F is called a quantile function. That is, q is a function with
F(q(s)−) ≤ s ≤ F(q(s)) for all s ∈ (0, 1).
The left- and right-continuous inverses,
q−(s) = inf{x | F(x) ≥ s} and q+(s) = inf{x | F(x) > s},
are called the lower and upper quantile functions.
◊
We will often use the generic notation FX for the distribution function of a random variable X. When the emphasis is on the law μ of X, we will also write Fμ. In the same manner, we will write qX or qμ for the corresponding quantile functions. The value qX(λ) of a quantile function at a given level λ ∈ (0, 1) is often called a λ-quantile of X.
Exercise A.3.1. Compute the upper and lower quantile functions for the following distribution functions.
(a) The distribution function of a Dirac point mass at x0 ∈ ℝ: F(x) = 𝟙_{{x ≥ x0}}, i.e., F(x) = 0 for x < x0 and F(x) = 1 for x ≥ x0.
(b) The exponential distribution function: F(x) = 0 for x < 0 and F(x) = ∫_0^x λe^{−λy} dy = 1 − e^{−λx} for x ≥ 0, where λ > 0.
(c) The distribution function of a Gumbel distribution: F(x) = exp(−e^{−(x−μ)/β}), where μ ∈ ℝ and β > 0.
◊
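Candidate answers for parts (b) and (c) can be sanity-checked numerically. The closed-form quantile expressions in the sketch below are our own assumptions, not taken from the text; since both distribution functions are continuous and strictly increasing on their range, the defining inequality F(q(t)−) ≤ t ≤ F(q(t)) collapses to the exact identity F(q(t)) = t.

```python
import math

# Assumed parameters for this sketch.
lam, mu, beta = 0.5, 1.0, 2.0

def F_exp(x):
    # exponential distribution function with rate lam
    return 0.0 if x < 0 else 1.0 - math.exp(-lam * x)

def F_gumbel(x):
    # Gumbel distribution function with location mu and scale beta
    return math.exp(-math.exp(-(x - mu) / beta))

def q_exp(t):
    # candidate quantile function of the exponential distribution
    return -math.log(1.0 - t) / lam

def q_gumbel(t):
    # candidate quantile function of the Gumbel distribution
    return mu - beta * math.log(-math.log(t))

for t in (0.1, 0.5, 0.9):
    assert abs(F_exp(q_exp(t)) - t) < 1e-12
    assert abs(F_gumbel(q_gumbel(t)) - t) < 1e-12
print("F(q(t)) = t verified at t = 0.1, 0.5, 0.9")
```

For part (a), in contrast, F has a jump at x0, and the lower and upper quantile functions coincide with the constant x0 on all of (0, 1).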
The following result complements Lemma A.23. It implies that a probability space supports a random variable with uniform distribution on (0, 1) if and only if it supports any nonconstant random variable X with a continuous distribution.
Lemma A.25. Let X be a random variable with a continuous distribution function FX and with quantile function qX. Then U := FX(X) is uniformly distributed on (0, 1), and X = qX(U) P-almost surely.
Proof. Let (Ω̃, F̃, P̃) be any probability space that supports a random variable Ũ with a uniform distribution on (0, 1). Then X̃ := qX(Ũ) has the same distribution as X due to Lemma A.23. Hence, FX(X) and FX(X̃) also have the same distribution. But if FX is continuous, then FX(qX(s)) = s for all s, and thus FX(X̃) = Ũ. This shows that FX(X) is uniformly distributed.
To show that X = qX(U) P-a.s., note first that qX(FX(x)−) ≤ x for all x by Lemma A.21. Since qX has at most countably many discontinuities and U is uniformly distributed, we have qX(U) = qX(U−) P-a.s., and hence qX(U) = qX(FX(X)−) ≤ X P-almost surely. Now let f : ℝ → (0, 1) be any strictly increasing function. Since qX(U) and X have the same law, we have E[ f(qX(U)) ] = E[ f(X) ] and get
E[ f(X) − f(qX(U)) ] = 0.
Since f(X) − f(qX(U)) ≥ 0 P-a.s. and f is strictly increasing, it follows that X = qX(U) P-almost surely.
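Both assertions of Lemma A.25 can be checked numerically; the sketch below uses an assumed exponential distribution (continuous and strictly increasing), so qX(FX(x)) = x holds exactly up to floating-point error:

```python
import random
import math

# Sketch of Lemma A.25 under assumed parameters: for a continuous, strictly
# increasing F_X (here Exp(1)), U = F_X(X) is uniform and q_X(U) recovers X.

random.seed(1)
q = lambda t: -math.log(1.0 - t)   # quantile function of Exp(1)
F = lambda x: 1.0 - math.exp(-x)   # its distribution function

xs = [q(random.random()) for _ in range(50_000)]
us = [F(x) for x in xs]

# X = q_X(U): the round trip q(F(x)) should reproduce x up to rounding.
max_err = max(abs(q(u) - x) for u, x in zip(us, xs))
# U uniform: the empirical distribution function of U at 1/2 should be ~1/2.
frac_half = sum(1 for u in us if u <= 0.5) / len(us)
print(max_err < 1e-6, round(frac_half, 2))
```

For a discontinuous FX the round trip fails on the atoms, which is exactly the situation addressed by Exercise A.3.2 and Lemma A.32 below.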
There are several ways in which the preceding lemma can be generalized to the case of a discontinuous distribution function FX. A first possibility is provided in the following exercise. It requires the existence of an independent random variable with a uniform distribution on (0, 1). A second possibility will be given in Lemma A.32. There, we will only assume the existence of some uniformly distributed random variable, not its independence of X.
Exercise A.3.2. Let X be a random variable with distribution function FX. The modified distribution function of X is defined by
F̃(x, λ) := P[ X < x ] + λ P[ X = x ], x ∈ ℝ, λ ∈ [0, 1].
Suppose that Ũ is a random variable that is independent of X and uniformly distributed on (0, 1). Show that
U := F̃(X, Ũ)
is uniformly distributed on (0, 1) and that
X = qX(U) P-a.s.
◊
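The mechanism of this exercise can be seen numerically in the simplest discontinuous case. The sketch below (our own assumed setup: a Bernoulli X, whose distribution function has a jump) spreads the mass of each atom uniformly over the corresponding interval, so that the resulting U is again uniform:

```python
import random

# Numerical check of the modified distribution function of Exercise A.3.2
# for X Bernoulli(p): U = F~(X, V) = P[X' < X] + V * P[X' = X] with V
# uniform and independent of X should be uniform on (0, 1).

random.seed(4)
p = 0.3
n = 100_000
us = []
for _ in range(n):
    x = 1 if random.random() < p else 0
    v = random.random()
    if x == 0:
        u = v * (1 - p)            # P[X < 0] = 0, P[X = 0] = 1 - p
    else:
        u = (1 - p) + v * p        # P[X < 1] = 1 - p, P[X = 1] = p
    us.append(u)

for s in (0.25, 0.5, 0.75):
    frac = sum(1 for u in us if u <= s) / n
    print(s, round(frac, 3))
```

On {X = 0} the variable U falls uniformly in (0, 1 − p), on {X = 1} uniformly in (1 − p, 1); piecing the two together yields the uniform law on (0, 1).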
The following lemma uses the concept of the Fenchel–Legendre transform of a convex function as introduced in Definition A.8.
Lemma A.26. Let X be a random variable with distribution function FX and quantile function qX such that E[ |X| ] < ∞. Then the Fenchel–Legendre transform of the convex function
Ψ(x) := E[ (x − X)+ ]
is given by
Ψ∗(y) = sup_{x∈ℝ} (xy − Ψ(x)) = ∫_0^y qX(t) dt for 0 ≤ y ≤ 1, and Ψ∗(y) = +∞ otherwise.
Moreover, for 0 < y < 1, the supremum above is attained in x if and only if x is a y-quantile of X.
Proof. Note first that, by Fubini's theorem and Lemma A.23,
Ψ(x) = E[ (x − X)+ ] = ∫_0^1 (x − qX(t))+ dt. (A.15)
It follows that Ψ∗(y) = +∞ for y < 0, Ψ∗(0) = − inf_x Ψ(x) = 0, and Ψ∗(y) = +∞ for y > 1. To prove our formula for 0 < y < 1, note that the right-hand and left-hand derivatives of the concave function f(x) = xy − Ψ(x) are given by
f′(x+) = y − FX(x) and f′(x−) = y − FX(x−).
A point x is a maximizer of f if and only if f′(x+) ≤ 0 ≤ f′(x−), i.e., FX(x−) ≤ y ≤ FX(x), which is equivalent to x being a y-quantile of X. Taking x = qX(y) and using (A.15) gives
Ψ∗(y) = y qX(y) − ∫_0^1 (qX(y) − qX(t))+ dt = y qX(y) − ∫_0^y (qX(y) − qX(t)) dt = ∫_0^y qX(t) dt,
and our formula follows.
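Lemma A.26 can be probed numerically; in the sketch below (our own assumed example with X uniform on {0, 1, 2}, a brute-force grid for the supremum, and a Riemann sum for the quantile integral), the maximizer of xy − Ψ(x) at y = 1/2 is the 1/2-quantile x = 1:

```python
# Numerical sketch of Lemma A.26 for X uniform on {0, 1, 2}:
# Psi(x) = E[(x - X)^+], and Psi*(y) = sup_x (x*y - Psi(x)) should equal
# the integral of q_X over (0, y).

def psi(x):
    return (max(x, 0.0) + max(x - 1.0, 0.0) + max(x - 2.0, 0.0)) / 3.0

def q_X(t):
    # quantile function of the uniform law on {0, 1, 2}
    return 0.0 if t <= 1/3 else (1.0 if t <= 2/3 else 2.0)

y = 0.5
# Brute-force the supremum over a grid on [-1, 3]; the true maximizer is
# any y-quantile, here x = 1.
grid = [k / 1000.0 - 1.0 for k in range(4001)]
psi_star = max(x * y - psi(x) for x in grid)

m = 100_000
integral = sum(q_X((i + 0.5) / m) for i in range(int(y * m))) / m
print(round(psi_star, 6), round(integral, 6))
```

Both quantities come out as 1/6, in agreement with Ψ∗(1/2) = ∫_0^{1/2} qX(t) dt = ∫_{1/3}^{1/2} 1 dt.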
Lemma A.27. If X = f(Y) for an increasing function f and qY is a quantile function for Y, then f(qY(t)) is a quantile function for X. In particular,
qX(t) = f(qY(t)) for a.e. t ∈ (0, 1)
for any quantile function qX of X.
If f is decreasing, then f(qY(1 − t)) is a quantile function for X. In particular,
qX(t) = f(qY(1 − t)) for a.e. t ∈ (0, 1).
Proof. If f is decreasing, then q(t) := f(qY(1 − t)) satisfies
FX(q(t)−) ≤ t ≤ FX(q(t)),
since FY(qY(1 − t)−) ≤ 1 − t ≤ FY(qY(1 − t)) by definition. Hence q(t) = f(qY(1 − t)) is a quantile function. A similar argument applies to an increasing function f.
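Both cases of Lemma A.27 can be verified in closed form for an assumed example (ours, not the text's): Y exponential with rate 1, so qY(t) = −log(1 − t).

```python
import math

# Numerical check of Lemma A.27: for Y ~ Exp(1), q_Y(t) = -log(1 - t).
# Decreasing f(y) = exp(-y): X = f(Y) is uniform on (0, 1) with quantile
# function q_X(t) = t, and the lemma predicts q_X(t) = f(q_Y(1 - t)).
# Increasing g(y) = y**2: X = g(Y) has quantile function g(q_Y(t)),
# which should solve 1 - exp(-sqrt(x)) = t.

def q_Y(t):
    return -math.log(1.0 - t)

for t in (0.2, 0.5, 0.8):
    # decreasing case: f(q_Y(1 - t)) equals the uniform quantile t
    assert abs(math.exp(-q_Y(1.0 - t)) - t) < 1e-12
    # increasing case: g(q_Y(t)) satisfies F_{g(Y)}(x) = t
    x = q_Y(t) ** 2
    assert abs(1.0 - math.exp(-math.sqrt(x)) - t) < 1e-12

print("Lemma A.27 identities verified at t = 0.2, 0.5, 0.8")
```

The decreasing case reflects the general identity exp(−(−log(1 − (1 − t)))) = t, i.e., f(qY(1 − t)) = t = qX(t) for the uniform law of X = e^{−Y}.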
Exercise A.3.3. Let X, Y, and Z be random variables such that X = f(Z) and Y = g(Z) for two increasing functions f and g. Show that if qX, qY, and qX+Y are quantile functions for X, Y, and X + Y, then
qX+Y(t) = qX(t) + qY(t) for a.e. t ∈ (0, 1).
Note: Two random variables X and Y in the form considered in this exercise are called comonotone. See Section 4.7 for more background on the notion of comonotonicity.
◊
The following theorem is a version of the Hardy–Littlewood inequalities. They estimate the expectation E [ XY ] in terms of quantile functions qX and qY .
Theorem A.28. Let X and Y be two random variables on (Ω, F, P) with quantile functions qX and qY. Then,
∫_0^1 qX(1 − s) qY(s) ds ≤ E[ XY ] ≤ ∫_0^1 qX(s) qY(s) ds,
provided that all integrals are well defined. If X = f (Y) and the lower (upper) bound is finite, then the lower (upper) bound is attained if and only if f can be chosen as a decreasing (increasing) function.
Proof. We first prove the result for X, Y ≥ 0. By Fubini's theorem,
E[ XY ] = E[ ∫_0^∞ ∫_0^∞ 𝟙_{{x<X}} 𝟙_{{y<Y}} dx dy ] = ∫_0^∞ ∫_0^∞ P[ X > x, Y > y ] dx dy.
Since
P[ X > x, Y > y ] ≥ (P[ X > x ] + P[ Y > y ] − 1)+ = ∫_0^1 𝟙_{{FX(x)<1−t}} 𝟙_{{FY(y)<t}} dt,
and since
∫_0^∞ 𝟙_{{FZ(z)<t}} dz = qZ(t) for a.e. t ∈ (0, 1)
for any random variable Z ≥ 0, another application of Fubini's theorem yields
E[ XY ] ≥ ∫_0^1 qX(1 − t) qY(t) dt.
In the same way, the upper estimate follows from the inequality
P[ X > x, Y > y ] ≤ P[ X > x ] ∧ P[ Y > y ] = ∫_0^1 𝟙_{{FX(x)<t}} 𝟙_{{FY(y)<t}} dt.
For X = f(Y),
E[ XY ] = E[ f(Y)Y ] = ∫_0^1 f(qY(t)) qY(t) dt (A.16)
due to Lemma A.23, and so Lemma A.27 implies that the upper and lower bounds are attained for increasing and decreasing functions, respectively.
Conversely, assume that X = f(Y), and that the upper bound is attained and finite:
E[ f(Y)Y ] = ∫_0^1 qX(t) qY(t) dt < ∞. (A.17)
We claim that then
X = f̄(Y) P-a.s., (A.18)
where f̄ is an increasing function on [0, ∞) such that
f̄(x) = qX(FY(x))
if 0 < FY(x) < 1 and x is a continuity point of FY, and
f̄(x) = (FY(x) − FY(x−))^{−1} ∫_{FY(x−)}^{FY(x)} qX(t) dt
if FY(x−) < FY(x). It is shown in Exercise 3.4.1 that
f̄(qY(t)) = Eλ[ qX | qY ](t) for a.e. t ∈ (0, 1), (A.19)
where Eλ[ · | qY ] denotes the conditional expectation with respect to qY under the Lebesgue measure λ on (0, 1). Therefore, (A.17) implies that
E[ f̄(Y)Y ] = ∫_0^1 f̄(qY(t)) qY(t) dt = ∫_0^1 qX(t) qY(t) dt = E[ f(Y)Y ], (A.20)
where we have used Lemma A.23 in the first identity. After these preparations, we can proceed to proving (A.18). Let ν denote the distribution of Y. By introducing the positive measures dμ := f dν and dμ̄ := f̄ dν, (A.20) can be written as
∫ y μ(dy) = ∫ y μ̄(dy). (A.21)
On the other hand, with g denoting the increasing function 𝟙_{[y,∞)}, the upper Hardy–Littlewood inequality, Lemma A.27, and (A.19) yield
μ([y, ∞)) = E[ X g(Y) ] ≤ ∫_0^1 qX(t) g(qY(t)) dt = ∫_0^1 f̄(qY(t)) g(qY(t)) dt = E[ f̄(Y) g(Y) ] = μ̄([y, ∞)).
In view of (A.21), we obtain μ = μ̄, hence f = f̄ ν-a.s. and X = f̄(Y) P-almost surely. An analogous argument applies to the lower bound, and the proof for X, Y ≥ 0 is concluded.
The result for general X and Y is reduced to the case of nonnegative random variables by separately considering the positive and negative parts of X and Y:
E[ XY ] = E[ X+Y+ ] + E[ X−Y− ] − E[ X+Y− ] − E[ X−Y+ ]
≤ ∫_0^1 qX+(t) qY+(t) dt + ∫_0^1 qX−(t) qY−(t) dt − ∫_0^1 qX+(1 − t) qY−(t) dt − ∫_0^1 qX−(1 − t) qY+(t) dt, (A.22)
where we have used the upper Hardy–Littlewood inequality on the positive terms and the lower one on the negative terms. Since qZ+(t) = (qZ(t))+ and qZ−(t) = (qZ(1 − t))− for all random variables Z due to Lemma A.27, one checks that the right-hand side of (A.22) is equal to ∫_0^1 qX(t) qY(t) dt, and we obtain the general form of the upper Hardy–Littlewood inequality. The same argument also works for the lower one.
Now suppose that X = f (Y). We first note that (A.16) still holds, and so Lemma A.27 implies that the upper and lower bounds are attained for increasing and decreasing functions, respectively. Conversely, let us assume that the upper Hardy–Littlewood inequality is an identity. Then all four inequalities used in (A.22) must also be equalities. Using the fact that XY+ = f (Y+)Y+ and XY− = f (−Y−)Y−, the assertion is reduced to the case of nonnegative random variables, and one checks that f can be chosen as an increasing function. The same argument applies if the lower Hardy–Littlewood inequality is attained.
Remark A.29. For indicator functions of two sets A and B in F, the Hardy–Littlewood inequalities reduce to the elementary inequalities
(P[ A ] + P[ B ] − 1)+ ≤ P[ A ∩ B ] ≤ P[ A ] ∧ P[ B ]; (A.23)
note that these estimates were used in the preceding proof. Applied to the sets {X ≤ x} and {Y ≤ y}, where X and Y are random variables with distribution functions FX and FY and joint distribution function FX,Y defined by FX,Y(x, y) = P[ X ≤ x, Y ≤ y ], they take the form
(FX(x) + FY(y) − 1)+ ≤ FX,Y(x, y) ≤ FX(x) ∧ FY(y). (A.24)
The estimates (A.23) and (A.24) are often called Fréchet bounds. The Hardy– Littlewood inequalities, which are also called Hoeffding–Fréchet bounds, provide their natural extension from sets to random variables.
◊
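The Hardy–Littlewood bounds can be probed numerically. The following sketch (our own toy example: assumed discrete laws, an independent coupling, and Riemann-sum approximations of the quantile integrals) checks that E[XY] falls between the two bounds:

```python
import random

# Numerical sketch of the Hardy-Littlewood inequalities (Theorem A.28):
# X uniform on {0, 1, 2} and Y ~ Bernoulli(1/2), coupled independently.

random.seed(2)
n = 200_000
xs = [random.randrange(3) for _ in range(n)]
ys = [random.randrange(2) for _ in range(n)]
e_xy = sum(x * y for x, y in zip(xs, ys)) / n

def q_X(t):
    # quantile function of the uniform law on {0, 1, 2}
    return 0.0 if t <= 1/3 else (1.0 if t <= 2/3 else 2.0)

def q_Y(t):
    # quantile function of Bernoulli(1/2)
    return 0.0 if t <= 1/2 else 1.0

# Midpoint Riemann sums for the lower and upper bounds.
m = 10_000
lower = sum(q_X(1 - (i + 0.5) / m) * q_Y((i + 0.5) / m) for i in range(m)) / m
upper = sum(q_X((i + 0.5) / m) * q_Y((i + 0.5) / m) for i in range(m)) / m
print(lower, e_xy, upper)
```

Here the bounds evaluate to 1/6 and 5/6, while the independent coupling gives E[XY] = E[X]E[Y] = 1/2; a comonotone coupling of the same marginals would push E[XY] up to the upper bound, an anticomonotone one down to the lower bound.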
Exercise A.3.4. Let X and Y be two random variables on (Ω,F, P) for which all terms appearing in the Hardy–Littlewood inequalities make sense. Suppose moreover that there exists a third random variable Z and functions f and g such that X = f (Z) and Y = g(Z).
(a) Show that the upper Hardy–Littlewood inequality reduces to an equality if X and Y are comonotone in the sense that f and g are both increasing or both decreasing.
(b) Show that the lower Hardy–Littlewood inequality reduces to an equality if X and Y are anticomonotone in the sense that one of the functions f and g is increasing and the other one is decreasing.
Note: See Section 4.7 for more background on the notion of comonotonicity.
◊
Exercise A.3.5. This exercise complements the Fréchet bounds (A.23) and (A.24).
(a) Derive similar bounds as in (A.23) for the probability P[ A ∪ B ] of a union of two events A, B ∈ F.
(b) Show that the bounds in (A.23) admit the following extension to the case of n events:
(∑_{i=1}^n P[ Ai ] − (n − 1))+ ≤ P[ A1 ∩ ⋯ ∩ An ] ≤ min_{i=1,…,n} P[ Ai ]
for A1, . . . , An ∈ F. Then derive an extension of (A.24) to the case of n random variables X1, . . . , Xn.
◊
Definition A.30. A probability space (Ω, F, P) is called atomless if it contains no atoms. That is, there is no set A ∈ F such that P[ A ] > 0 and such that P[ B ] = 0 or P[ B ] = P[ A ] whenever B ∈ F is a subset of A.
◊
Proposition A.31. For any probability space, the following conditions are equivalent.
(a) (Ω,F, P) is atomless.
(b) There exists an i.i.d. sequence X1, X2, . . . of random variables with symmetric Bernoulli distribution
P[ X1 = 1 ] = P[ X1 = 0 ] = 1/2.
(c) For any μ ∈ M1(ℝ) there exist i.i.d. random variables Y1, Y2, . . . with common distribution μ.
(d) (Ω,F, P) supports a random variable with a continuous distribution.
Proof. (a)⇒(b): We need the following intuitive fact from measure theory: If (Ω, F, P) is atomless, then for every A ∈ F and all δ with 0 ≤ δ ≤ P[ A ] there exists a measurable set B ⊂ A such that P[ B ] = δ; see Theorem 9.51 of [3] for a proof. Thus, we may take a set A ∈ F such that P[ A ] = 1/2 and define X1 := 1 on A and X1 := 0 on Ac. Now suppose that X1, . . . , Xn have already been constructed. Then
P[ X1 = x1, . . . , Xn = xn ] = 2^{−n}
for all x1, . . . , xn ∈ {0, 1}, and this property is equivalent to X1, . . . , Xn being independent with the desired symmetric Bernoulli distribution. For all x1, . . . , xn ∈ {0, 1} we may choose a set
B ⊂ {X1 = x1, . . . , Xn = xn}
such that P[ B ] = 2−(n+1) and define Xn+1 := 1 on B and Xn+1 := 0 on Bc ∩ {X1 = x1, . . . , Xn = xn}. Clearly, the collection X1, . . . , Xn+1 is again i.i.d. with a symmetric Bernoulli distribution.
(b)⇒(c): By relabeling the sequence X1, X2, . . . , we may obtain a double-indexed sequence (Xi,j)i,j∈ℕ of independent Bernoulli-distributed random variables. If we let
Ui := ∑_{j=1}^∞ 2^{−j} Xi,j,
then it is straightforward to check that Ui has a uniform distribution. Let q be a quantile function for μ. Lemma A.23 shows that the i.i.d. sequence Yi := q(Ui), i = 1, 2, . . . , has common distribution μ.
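The binary-digit construction in the step (b)⇒(c) can be sketched numerically as follows (our own illustration, simulating the fair Bernoulli bits directly and truncating the binary expansion at 30 digits):

```python
import random

# Sketch of the construction U = sum_j 2^{-j} X_j from i.i.d. fair bits
# X_1, X_2, ..., truncated here at 30 binary digits.

random.seed(3)

def uniform_from_bits(n_bits=30):
    return sum(random.randrange(2) * 2.0 ** -(j + 1) for j in range(n_bits))

us = [uniform_from_bits() for _ in range(100_000)]
for s in (0.25, 0.5, 0.75):
    frac = sum(1 for u in us if u <= s) / len(us)
    print(s, round(frac, 3))
```

The empirical distribution function of the constructed variables matches the uniform law up to sampling error; combining this with Lemma A.23 then produces i.i.d. samples from any prescribed law μ.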
The proofs of the implications (c)⇒(d) and (d)⇒(a) are straightforward.
Lemma A.32. If X is a random variable on an atomless probability space, then there exists a random variable U with uniform distribution on (0, 1) such that X = qX(U) P-a.s. for any quantile function q X of X.
Proof. Let us write q := qX and denote by λ the Lebesgue measure on (0, 1). For each x ∈ ℝ, Ix := {t ∈ (0, 1) | q(t) = x} is a (possibly empty or degenerate) real interval, which has Lebesgue measure λ(Ix) = P[ X = x ] by Lemma A.23. Consider the set D := {x ∈ ℝ | P[ X = x ] > 0}, which is at most countable. For each x ∈ D, the probability space (Ω, F, P[ · | X = x ]) is again atomless and hence supports a random variable Ux : Ω → Ix with a uniform law on Ix. That is, P[ Ux ∈ A | X = x ] = λ(A ∩ Ix)/λ(Ix) or, equivalently,
P[ Ux ∈ A, X = x ] = λ(A ∩ Ix) for all Borel sets A ⊆ (0, 1). (A.25)
Let F := FX be the distribution function of X and define the random variable
U := F(X) 𝟙_{{X∉D}} + ∑_{x∈D} Ux 𝟙_{{X=x}}.
Since q(Ux(ω)) = x for any x ∈ D, we have q(U(ω)) = X(ω) for all ω ∈ {X ∈ D}. Let us show next that q(F(X)) = X P-a.s. on {X ∉ D}, which will then yield that q(U) = X P-a.s. as desired. To this end, let Δ := {t ∈ (0, 1) | q(t+) > q(t−)} denote the set of discontinuities of q. If F(x) ∉ Δ, then q(F(x)+) = q(F(x)−), while Lemma A.21 implies that q(F(x)−) ≤ x ≤ q(F(x)+). Combining these two facts gives
q(F(x)) = x whenever F(x) ∉ Δ. (A.26)
Thus, we will have q(F(X)) = X P-a.s. on {X ∉ D} if we can prove that
P[ F(X) ∈ Δ, X ∉ D ] = 0. (A.27)
To prove (A.27), fix t ∈ Δ. Then Remark A.20 and Lemma A.19 yield that the set Jt := {x ∈ ℝ | F(x) = t} is a real interval with endpoints q(t−) and q(t+). On the one hand, by the right-continuity of F,
P[ q(t−) < X < q(t+) ] = F(q(t+)−) − F(q(t−)) = t − t = 0.
On the other hand, P[ X ∈ {q(t−), q(t+)}, X ∉ D ] = 0. Thus, P[ X ∈ Jt, X ∉ D ] = 0 for each t ∈ Δ. Since Δ is countable, our claim (A.27) follows.
It remains to show that U has a uniform law. To this end, take a measurable subset A of (0, 1). Using (A.25) we get
P[ U ∈ A ] = P[ F(X) ∈ A, X ∉ D ] + ∑_{x∈D} P[ Ux ∈ A, X = x ] = P[ F(X) ∈ A, X ∉ D ] + ∑_{x∈D} λ(A ∩ Ix).
Thus, the proof is completed by showing that P[ F(X) ∈ A, X ∉ D ] = λ(A ∩ Ic), where
I := ⋃_{x∈D} Ix.
To this end, recall first that λ ∘ q^{−1} = P ∘ X^{−1} by Lemma A.23. Now, by (A.27),
P[ F(X) ∈ A, X ∉ D ] = P[ F(X) ∈ A, F(X) ∉ Δ, X ∉ D ] = λ({F(q) ∈ A, F(q) ∉ Δ} ∩ Ic), (A.28)
as {t | q(t) ∉ D} = Ic. We now claim that
{F(q) ∈ A, F(q) ∉ Δ} ∩ Ic = A ∩ {F(q) ∉ Δ} ∩ Ic. (A.29)
To prove this claim, note first that {F(q) ∈ A} ⊆ {q(F(q)) ∈ q(A)} and q(F(q)) = q on {F(q) ∉ Δ} by (A.26). Therefore, the set on the left-hand side is contained in {q ∈ q(A)} ∩ {F(q) ∉ Δ} ∩ Ic. Next, suppose q(t) ∈ q(A) for some t ∈ {F(q) ∉ Δ} ∩ Ic. Then q(t) = q(a) for some a ∈ A. But since t ∈ Ic, we have q(t) ∉ D and therefore λ(I_{q(t)}) = 0. Hence, there is no other s ∈ (0, 1) with q(s) = q(t), which implies t = a and in turn t ∈ A. This gives "⊆" in (A.29). Conversely, suppose that t belongs to the set on the right-hand side of (A.29). Since t ∈ Ic, we have q(t) ∉ D and hence F(q(t)−) = F(q(t)). Combining the latter fact with F(q(t)−) ≤ t ≤ F(q(t)) gives F(q(t)) = t ∈ A and in turn "⊇", which completes the proof of (A.29). Plugging (A.29) into (A.28) gives
P[ F(X) ∈ A, X ∉ D ] = λ(A ∩ {F(q) ∉ Δ} ∩ Ic).
Applying Lemma A.23 to (A.27) yields
λ({F(q) ∈ Δ} ∩ Ic) = P[ F(X) ∈ Δ, X ∉ D ] = 0.
Therefore, λ(A ∩ {F(q) /∈ Δ} ∩ Ic) = λ(A ∩ Ic), and the proof is complete.