Chapter 4
Analysis for Inverse Problem
Inverse problems are ill-posed when measurable data are either insufficient for uniqueness or insensitive to perturbations of parameters to be imaged. To solve an inverse problem in a robust way, we should adopt a reasonably well-posed modified model at the expense of a reduced spatial resolution and/or add additional a priori information. Finding a well-posed model subject to practical constraints of measurable quantities requires deep knowledge about various mathematical theories in partial differential equations (PDEs) and functional analysis, including uniqueness, regularity, stability, layer potential techniques, micro-local analysis, regularization, spectral theory and others. In this chapter, we present various mathematical techniques that are frequently used for rigorous analysis and investigation of quantitative properties in forward and inverse problems.
4.1 Examples of Inverse Problems in Medical Imaging
Most inverse problems in imaging are to reconstruct cross-sectional images of a material property P from knowledge of input data X and output data Y. We express its forward problem in an abstract form as
4.1
where F is a nonlinear or linear function of P and X. To treat the problem in a computationally manageable way, we need to figure out its sensitivity, explaining how a perturbation P + ΔP influences the output data Y + ΔY. In this section, we briefly introduce some examples of forward problem formulations in imaging electrical and mechanical material properties of an imaging object such as the human body.
4.1.1 Electrical Property Imaging
We assume that an imaging object occupies a domain Ω in the three-dimensional space . When the object is a human body, we are interested in visualizing electrical properties of biological tissues inside the body. We denote the admittivity of a biological tissue at a position r and angular frequency ω as , where σω(r) and are the conductivity and permittivity, respectively. Most biological tissues are resistive at low frequencies but capacitive terms may not be negligible at 10 kHz or above. Electrical property imaging aims to image the internal admittivity distribution P = γω(r), which depends on r = (x, y, z) and ω. To measure the passive material property, we must employ a probing method, which excites the object by externally applying a form of energy and measures its response affected by the admittivity.
As a probing method, we inject a sinusoidal current of Isin(ωt) mA through a pair of electrodes that are attached on the surface Ω. This sinusoidally time-varying current produces sinusoidal variations of electric and magnetic fields at every point r with the same angular frequency ω. For these sinusoidal fields, it is convenient to use the phasor notation. We denote the sinusoidally time-varying electric and magnetic fields by and , respectively. We express them using vector field phasors of E(r) and H(r) as
Each component of E(r) or H(r) is a complex-valued function (independent of time) that contains amplitude and phase information. Table 4.1 summarizes important variables and parameters in the steady-state or time-harmonic electromagnetic field analysis.
Table 4.1 Time-varying and time-harmonic electromagnetic fields in Maxwell's equations
Gauss's law |
|
|
Gauss's law for magnetism |
|
∇ · B = 0 |
Faraday's law of induction |
|
∇ × E = − iωB |
Ampère's circuit law |
|
∇ × H = σE + iωD |
For the admittivity imaging, it is imperative to produce a current density J inside Ω to sense the admittivity by Ohm's law , where E is the electric field intensity. To create J(r) and E(r) inside the body Ω, we can use a pair of electrodes (attached on the surface of the body) to inject sinusoidal current. This produces J, E, B inside Ω, where B is the magnetic flux density. Alternatively, we may use an external coil to produce eddy currents inside the body Ω. In both cases, we should measure some quantities that enable us to estimate J and E (directly or iteratively). Here, we use Maxwell's equations to establish relations among J, E, input data and measured data.
We inject current of Isin(ωt) mA with 0 ≤ ω/2π ≤ 100 kHz through a pair of surface electrodes and on ∂Ω. The diameter of the imaging object is less than 1 m. Since the Faraday induction is negligibly small in this case, E, J and H approximately satisfy
4.2
4.3
with the boundary conditions
4.4
4.5
4.6
where n is the outward unit normal vector on ∂Ω and dS is the area element on ∂Ω. The first boundary condition (4.4) comes from J = γωE = 0 in the air, where γω ≈ 0. The boundary condition (4.5) means that the vector J on each electrode is parallel or antiparallel to n since the electrodes are highly conductive. One may adopt the Robin boundary condition to include effects of the electrode–skin contact impedance.
Since ∇ × E ≈ 0, there exists a scalar potential u such that
Since 0 = ∇ · (∇ × H) = ∇ · J = − ∇ · (γω∇u), the potential u is a solution of the following boundary value problem:
4.7
Example 4.1.1
Electrical impedance tomography (EIT) aims to visualize (the time or frequency change of) the admittivity distribution P = γ. One may use multiple electrodes attached on the boundary ∂Ω to inject currents and to measure boundary voltages. The input X could be a sequence of injection currents through chosen pairs of electrodes, and the output Y is the resulting boundary voltage data set uj[σ]|∂Ω, j = 1, …, N − 1, where the potential uj[σ] is a solution of the Neumann boundary value problem (4.7) with and . If γ = γ0 + Δγ (or P = P0 + ΔP) is a small perturbation of a known quantity γ0, we may deal with a linearized problem:
Example 4.1.2
Magnetic resonance current density imaging (MRCDI) is an imaging technique that visualizes a current density vector distribution by using a magnetic resonance imaging (MRI) scanner. Measurements are performed by applying an external current to the imaging object during an MRI acquisition. In MRCDI, the quantity to be imaged is the internal current density P = J; the input X is an injection current through a pair of electrodes and ; the output Y is the internal data of b that is measured by the MRI scanner. Note that measuring all three components of B = (Bx, By, Bz) requires rotating the imaging object inside the MRI scanner, since one can measure only one component of b,
which is in the direction of the main magnetic field of the MR scanner. MRCDI uses the relation
Example 4.1.3
In magnetic resonance electrical impedance tomography (MREIT), the property to be imaged is the conductivity distribution P = σ at a low frequency; the input X is an injection current through a pair of electrodes and ; the output Y is the internal data Hz(
r)
that is measured by an MRI scanner with its main field in the z direction. The relation between Hz and σ is determined by the z component of the Biot–Savart law:
4.8
Here, is a harmonic function determined by the geometry of lead wire and electrodes and u is the voltage satisfying the Neumann boundary value problem (4.7) with γ replaced by σ.
Example 4.1.4
Magnetic resonance electrical property tomography (MREPT) aims to visualize P = γω at the Larmor frequency. In MREPT, the input X is a radio-frequency (RF) excitation by an external coil; the output Y is the positive rotating magnetic field that is measured by a B1-mapping technique (Stollberger and Wach 1996; Akoka et al. 2009). The relation between γω and H+ is determined by
4.9
Example 4.1.5
Magnetic induction tomography (MIT) aims to provide the admittivity distribution P = γω in the frequency range of 1–10 MHz. In MIT, multiple transmit coils are used to excite the imaging object and the same or different receive coils are used to measure induced voltages by time-varying magnetic fields produced by eddy currents. The sinusoidal frequency of the external excitation must be high enough to produce a measurable induced voltage:
4.10
where C is the closed path of the receive coil and S is the coil surface. We note that the magnetic field H outside Ω conveys information on the admittivity γω inside the imaging object Ω. In MIT, the input X is the external excitation; the output Y is the measured induced voltage.
4.1.2 Mechanical Property Imaging
Elasticity is a mechanical property of an elastic object describing how the deformed object, subject to an external force, returns to its original state after the force is removed. Elastography measures the propagation of transverse strain waves in an object. Tissue stiffness is closely related to the velocity of the wave, and the shear modulus (or modulus of rigidity) varies over a wide range, differentiating various pathological states of tissues. Hence, the speed of the harmonic elastic wave provides quantitative information for describing malignant tissues, which typically are known to be much stiffer than normal tissues.
In imaging a mechanical property, we should use Hooke's law, which links the stress and strain tensors:
where (Cijkl) is the stiffness tensor, (σij) is the strain tensor and is the stress tensor. We need to apply a mechanical vibration or stress to create a displacement in Ω and measure some quantity that enables us to estimate the displacement inside Ω. Then, we use an elasticity equation to connect the stress tensor, strain tensor, displacement, input data and measured data.
Assuming that we are trying to image a shear modulus distribution in a linearly elastic and isotropic material, the time-harmonic elastic displacement field denoted by u = (u1, u2, u3) is dictated by the following PDE:
4.11
where we use the following notation: ρ is the density of the material,
is the shear modulus,
is the Lamé coefficient, E is Young's modulus and σ is Poisson's ratio.
Example 4.1.6
Elastography has been used as a non-invasive technique for the evaluation of the stiffness of the liver, for example. The mechanical property to be imaged is the shear modulus P = μ; the input X is an external low-frequency mechanical vibration applied to the boundary of the imaging object; the output Y is an internal measurement of the displacement vector u.
The relation between P = μ and Y =
u is determined by the PDE (4.11). Magnetic resonance elastography (MRE) uses an MR scanner to measure the interior displacement, whereas transient elastography (TE) uses an ultrasound system.
4.1.3 Image Restoration
The image restoration problem is to recover an original image P = u from a degraded measured (or observed) image Y = f that are related by
4.12
where η represents noise and H is a linear operator including blurring and shifting. One may often use a priori knowledge about the structure of the true image that can be investigated by looking at geometric structure by level curves.
A typical way of denoising (or image restoration) is to find the best function by minimizing the functional
4.13
where the first fidelity term forces the residual Hu − f to be small, the second regularization term enforces the regularity of u and the regularization parameter λ controls the tradeoff between the residual norm and the regularity.
4.2 Basic Analysis
To deal with an imaging object with an inhomogeneous material property, we consider the following PDE:
4.14
where σ is differentiable and u is twice differentiable in Ω for ∇ · (σ(r)∇u(r)) = 0 to make sense with the classical derivatives.
In practice, the material property σ may change abruptly. For example, we may consider a conductivity distribution σ inside the human body Ω. Then, σ may have a jump along the boundary of two different organs. Along such a boundary, the electrical field E = − ∇u, induced by an injection current through a pair of surface electrodes attached on ∂Ω, may not be continuous due to interface conditions of the electric field (like the refractive condition of Snell's law). In this case, there exists no solution u ∈ C2(Ω) in the classical sense and we should seek a practically meaningful solution u ∉ C2(Ω). Hence, we need to expand the admissible set of solutions of (4.14), which must include all practically meaningful solutions.
This motivates the concept of the generalized derivative called the weak derivative, which is a natural extension of the classical derivative. With the use of the weak derivative (reflecting the refractive condition of Snell's law), we can manage the equation ∇ · (σ∇u) = 0 for a discontinuous σ. For a quick understanding of the weaker derivative, we consider the following one-dimensional Dirichlet problem:
4.15
Hence, u satisfies
We should note that u is different from the solution v of
This is because v is linear whereas the potential u is piecewise-linear satisfying the following transmission condition (the refractive condition) at x = 0 where σ is discontinuous:
4.16
Indeed, the practical solution of (4.15) is
Note that the classical derivative u′ does not exist at x = 0:
The difficulty regarding the refraction contained in the PDE
can be removed by the use of the variational framework:
4.17
Exercise 4.2.1
Show that a solution u of (4.17) satisfies the transmission condition (4.16).
We need to take account of the set of physically meaningful solutions of the variational problem (4.17). A practically meaningful solution u, which is a voltage, should have a finite energy, that is,
We solve (4.15) by finding u in the admissible set Indeed, the solution u of (4.15) is a minimizer of Φ(v) within the set :
4.18
This will be explained in section 4.3.
We return to the three-dimensional problem of (4.14), where u(r) represents an electrical potential at r in an electrically conducting domain Ω. The physically meaningful solution u must have a finite energy:
Hence, the solution of (4.14) should be contained in the set . Assuming , the set is the same as the following Hilbert space:
where
Indeed, the solution is the minimizer of Φ(u) within the set {v ∈ H1(Ω):v|∂Ω = f}. As we mentioned before, the PDE − ∇ · (σ(r)∇u(r)) = 0 should be understood in the variational framework as
4.19
Remark 4.2.2
When , there is no difference between the classical and variational problems. However, there exist reasonable situations where the variational problem lacks a smooth solution; there are many practical σ for which the minimization problem has no solution in the class . Indeed, we can consider a problem with . Obviously, the classical problem does not have a solution. We can construct a minimizing sequence {
un}
in that is a Cauchy sequence with respect to the norm . Although the Cauchy sequence {
un}
does not converge within , it converges in the Sobolev space H1(Ω),
the completion of with respect to the norm . This means that we can solve the minimization and variational problem within the Sobolev space H1(Ω).
4.2.1 Sobolev Space
To explain the solution of a PDE in the variational framework, it is convenient to introduce the Sobolev space. Let Ω be a bounded smooth domain in with its smooth boundary ∂Ω and let x = (x1, …, xn) represent a position. We introduce a Sobolev space H1(Ω), which is the closure of the set equipped with the norm
The finite element method (FEM) for computing an approximate numerical solution and its error analysis can be easily accomplished within the variational framework with the Hilbert space. This Hilbert space H1(Ω) is the most widely used Sobolev space in PDE theory.
The generalized derivative can be explained by means of the integration-by-parts formula:
In general,
where the notions ∂α and |α| are understood in the following way:
- , and ;
- , for example, ;
- |α| = ∑k=1nαk.
A function vi satisfying the following equality behaves like the classical derivative :
Definition 4.2.3
Let and 0 ≤ α ≤ 1.
An open set is said to be a Ck, α domain if, for each p ∈ ∂Ω,
there exists an open neighborhood Up of p and a Ck, α diffeomorphism Φ
p:
Up →
B1(0)
such that
We will assume that Ω is an open subdomain of with its Ck, α boundary ∂Ω, where k + α ≥ 1. We are now ready to introduce the Sobolev spaces Wk, p(Ω) and , where Ω is a domain in with its boundary ∂Ω and :
- W1, p(Ω) and are the completion (or closure) of and , respectively, with respect to the norm
In other words,
- W2, p(Ω) and are the completion (or closure) of and , respectively, with respect to the norm
- We denote and .
Example 4.2.4
Let Ω = (0, 1)
and u(
x) =
x(1 −
x)
in Ω. We can show that but . To see this, assume that there exists such that um → u in H2(Ω)
and without loss of generality. For 0 <
s < 1,
it follows from the Schwarz inequality that
Hence,
An elementary computation shows that
Combining the above two inequalities leads to
4.20
In particular, substituting into (4.20) yields
and therefore
This means that , which contradicts the assumption. For details on this example, see the lecture note by Feldmann and Uhlmann (2003).
Example 4.2.6
Let . Let and with and tanθ =
y/
x.
Show that u1 and u2 satisfy
Hence, w = u1 − u2 satisfies
According to the maximum principle, we obtain
But, we know that . What is wrong with this conclusion? We should note that u1∉
H1(Ω)
and hence u1 is not a practically meaningful solution. Because of w∉
H1(Ω),
we cannot apply the maximum principle.
4.2.2 Some Important Estimates
We begin by explaining simplified versions of two important inequalities, the Poincaré and trace inequalities.
Example 4.2.7
Let Ω = {(x, y):0 < x, y < a} be a square with side length a > 0.
- A simplified version of the Poincaré inequality is
- A simplified version of the trace inequality is
These inequalities are based on the fundamental theorem of calculus:
From the first identity in the above expressions, we have
and therefore
Integrating over the y variable gives
and therefore
Application of the above estimate to the three other sides of ∂Ω leads to the trace inequality.
The Poincaré inequality uses the special property that u|∂Ω = 0 to get
and hence
which gives the Poincaré inequality.
Theorem 4.2.8 (Hardy–Littlewood–Sobolev inequality)
Let p, r > 1 and 0 < λ < n, with
For and ,
where
λ* = λ/n and Sn−1 denotes the unit sphere.
For the proof, see Theorem 4.3 in the book by Lieb and Loss (2001).
Theorem 4.2.9 (Sobolev's inequality)
Let . Then the following inequalities hold.
4.21
- If (one dimension), then
4.22
- If (two dimensions), then, for each ,
4.23
- Let Ω be a C0, 1 domain and 1 ≤ p ≤ q, m ≥ 1 and k ≤ m. Then,
4.24
4.25
where C is independent of u.
For the proof, see Theorem 8.8 in the book by Lieb and Loss (2001).
Lemma 4.2.10
Let Ω be a convex open set in with diameter d and let E be an open subset of Ω. For , we have
4.26
4.27
where uE = (1/|E|)∫Eu is the average of u over E and
Proof.
Let x ∈ Ω be fixed. We have
where θy = (y − x)/|y − x|. Hence,
Using the change of the order of integration and y = x + ρθ, we obtain
where
. The second inequality (
4.27) follows from the Hardy–Littlewood–Sobolev inequality.
Definition 4.2.11
For , define
This definition is consistent with the previous definition of Hk(Ω) for . For the example of , it follows from the Plancherel theorem that
Theorem 4.2.12
For given with s ≥ 0, there exists such that
Moreover, and
Proof.
By the Riesz representation theorem, there exists
such that
where 〈 ·, · 〉
s is the inner product of
Hs. Defining
v by
we have
.
Exercise 4.2.13
Prove that, for each , there exists such that
where C1 and C2 are positive constants and independent of u. Note that we need to choose such that
There are many choice of {f0, f1, f2}. For example, we could choose
Definition 4.2.15 (Negative Sobolev space)
The negative Sobolev space is for s > 0 where so that .
Theorem 4.2.16
For each , there exists {
f0,
f1, …,
fn} ⊂
L2(Ω)
such that
4.28
Moreover, obeys (4.28).
Proof.
Let
be a bounded linear map defined by
. Since
, the map
is unitary. Hence, we can define a bounded linear map
by
From the Riesz representation theorem, there exists (f0, f1, …, fn) ∈ W such that
and
. For the complete proof, see the lecture note by Feldmann and Uhlmann (2003).
Remark 4.2.17
For v ∈
L2(Ω),
the map is defined by v(
u) = ∫
Ωvu d
x.
Since , we have and . Moreover, we can show that L2(Ω)
is dense in H−1(Ω).
By the Riesz representation theorem, for all ∈
H−1(Ω),
there exists such that for all and . Consider the map
If L2(Ω)
is not dense in H−1(Ω),
there exists non-zero such that for all v ∈
L2(Ω).
Since for all v ∈
L2(Ω),
then w0 = 0.
Hence,
H−1(Ω)
can be viewed as a completion of L2(Ω)
with the norm
Remark 4.2.18
According to the Hahn–Banach theorem in section 4.4.2, we can extend to . But is more useful than [
H1(Ω)]*
since (4.28) can be written as
provided that f0, …, fn are differentiable.
Definition 4.2.19
Choosing such that in a neighborhood of ∂Ω, we define Hs(∂Ω) by the closure of with respect to the norm
Lemma 4.2.20 (Trace: restriction and extension)
Let . There is a positive constant C depending only on s such that
On the other hand, for each , there exists such that
Proof.
From the uniform bounded principle theorem, it suffices to prove that
For simplicity of notation, we shall prove only the two-dimensional case
n = 2. Indeed, we can apply the same idea to prove it for the general case. Let
be the projection map to the first component. From the Fourier inversion formula, we have
and
Comparing the above two identities, we have
and it follows from the Schwarz inequality that
Hence, we have
For the remaining proof, see the lecture note by Feldmann and Uhlmann (2003).
4.2.3 Helmholtz Decomposition
The Helmholtz decomposition states that any smooth vector field F in a smooth bounded domain Ω can be resolved into the sum of a divergence-free (solenoidal) vector field and a curl-free (irrotational) vector field.
Theorem 4.2.21
Every vector field F(r) = (F1(r), F2(r), F3(r)) ∈ [L2(Ω)]3 can be decomposed into
4.29
where u is a scalar function, ∇ · A = 0 and harmonic is a vector field whose Laplacian is zero in Ω. Moreover, u and A are solutions of ∇2u = ∇ · F and ∇2A = ∇ × F, with appropriate boundary conditions. Hence, these can be uniquely determined up to harmonic functions:
4.30
and
4.31
Proof.
We write the vector field F as
Integration by parts yields
4.32
where
Owing to the property
we have
Using the vector identity − ∇
2F = ∇ × (∇ ×
F) − ∇(∇ ·
F), we can express (
4.32) as
4.33
Integrating by parts again, we have
where Ψ
2 is a function satisfying
for
r ∈ Ω and
r′ ∈ ∂Ω. This completes the proof of the Helmholtz decomposition.
4.3 Variational Problems
4.3.1 Lax–Milgram Theorem
Let Ω be a bounded domain in with smooth boundary ∂Ω. In this section, we discuss the existence, uniqueness and stability of the boundary value problem (BVP):
4.34
where c0 and c1 are positive constants. In the case when σ is discontinuous, the fundamental mathematical theories of solutions such as existence, uniqueness and stability can be established within the framework of the Hilbert space H1(Ω), which is a norm closure of with the norm .
This problem (4.34) is equivalent to the following variational problem:
Define the map by
4.35
and define by
4.36
Hence, the solvability problem of (4.34) is equivalent to the uniqueness and existence question of finding satisfying
4.37
Note that the map satisfies the following:
- both and are linear for all u, w ∈ X;
- |a(u, v)| < c1||u|| ||v|| and c0||u||2 ≤ a(u, u).
Here, the norm ||u|| is . The estimate |a(u, v)| < c1||u|| ||v|| can be obtained using the Schwarz inequality and the Poincaré inequality gives the estimate c0||u||2 ≤ a(u, u).
Assuming that the Hilbert space has a basis {ϕk:k = 1, 2, …}, the variational problem (4.37) is to determine the coefficient {uk:k = 1, 2, …} of u = ∑ukϕk satisfying
4.38
Taking advantage of the linearity of a( ·, · ) and b( · ), the problem (4.38) is equivalent to solving
where A can be viewed as an matrix with coefficients akj = a(ϕk, ϕj). From the fact that , A is positive definite and invertible. Therefore, for given data b, there exists a unique u satisfying Au = b.
Now, we will prove the uniqueness and existence in in a general setting. Let X be a Hilbert space over with a norm || · ||. The map is called a bounded bilinear map on the Hilbert space X if
- both and are linear for all u, w ∈ X, and
- there exist M so that |a(u, v)| < M||u|| ||v||.
The solvability of the problem (4.34) can be obtained from the following theorem.
Theorem 4.3.1 (Lax–Milgram theorem)
Suppose the bounded bilinear map is symmetric and
4.39
Suppose is linear.
1. There exists a unique solution u ∈ X of the following minimization problem:
4.40
2. The solution u of the minimization problem (4.40) is the solution of the following variational problem:
4.41
Before proving Theorem 4.3.1, we gain its key idea by simple examples.
Example 4.3.2
Let ,
b ∈
X and A be an n ×
n symmetric matrix having positive eigenvalues. Define
Then, a( ·, · ) is bounded bilinear and b (·)
is bounded linear. Since A has positive eigenvalues,
a( ·, · )
satisfies the condition (4.39). Hence, it meets the requirements of Theorem 4.3.1. Noting that
we have
Proof.
(Lax–Milgram theorem) We begin by proving the existence of the minimization problem. Denoting
there exists a minimizing sequence {un}, that is,
Existence of the minimizer can be proved by showing that
4.44
The proof of (
4.44) is based on the identity
4.45
This leads to
4.46
Setting
, we have
Hence {
un} is a Cauchy sequence and there exists
u ∈
X so that
. Since Φ( · ) is continuous, owing to the assumption that
a( ·, · ) and
b( · ) are bounded and linear, we prove the existence of the minimizer
u:
Now, we prove uniqueness. If u is a minimizer,
Hence, if u and v are minimizers, then
which is equivalent to
In particular,
. Then, it follows from (
4.39) that
which gives u = v.
Exercise 4.3.4
Prove that .
4.3.2 Ritz Approach
Taking account of the finite element method, we continue to study Hilbert space techniques. As before, we assume the following:
- X is a real Hilbert space with norm || · ||;
- {Xn} is a sequence of a finite-dimensional subspaces of X such that Xn ⊂ Xn+1 and
- is a basis of Xn;
- is a bounded, symmetric, strongly positive, bilinear map and
- b ∈ X* where X* is the set of linear functionals on X.
As before, we consider the minimization problem:
4.47
The variational problem is as follows:
4.48
According to the Lax–Milgram theorem, we know the solvability of (4.47) and (4.48). We try to give a rather different analysis by using the new inner product a(u, v). Since ||u||2 ≈ a(u, u), can be viewed as a norm of the Hilbert space X.
Theorem 4.3.5 (Projection theorem)
The space X equipped with the new inner product and norm
4.49
is also a Hilbert space. For any closed subspace V =
⊂
X and for each u ∈
X,
4.50
Denoting , if u is decomposed as
then u1 = uV and u2 = u − uV.
Proof.
Let u be fixed. Since
the problem (
4.50) is equivalent to solving the following minimization problem:
4.51
where
b(
v) =
a(
u,
v). According to the Lax–Milgram theorem, there exists a unique solution
uV ∈
V of the minimization problem (
4.51). Next, we will show
u −
uV ∈
V⊥. Since
we have
Hence,
Letting t → 0, we get
and u − uV ∈ V⊥. Uniqueness of the decomposition follows from the fact that
Definition 4.3.6
Let X be a Hilbert space over the scalar field . The dual of X, denoted by X*, is the set of linear maps .
Theorem 4.3.7 (Riesz representation theorem)
For each linear functional b ∈ X*, there is a unique u* ∈ X such that
4.52
Moreover, .
Proof.
Assume b ≠ 0, that is, b(v) ≠ 0 for some v ∈ X. We first prove that the null space V: = {v ∈ X:b(v) = 0} is a closed subspace of X of codimension 1, that is,
From the assumption of b ≠ 0 and using the linearity of b( · ),
Owing to the linearity of b( · ) and b(u) = 1, we have
Hence,
This eventually gives a decomposition:
According to the projection theorem, the above decomposition is unique, and this proves that dimV⊥ = 1. Setting u* = u/a(u, u), we have
From the above identity, it is easy to see that
u* is unique. Since |
b(
v)| ≤ ||
u*||
a||
v||
a and
, we have ||
b|| = ||
u*||
a. This completes the proof.
Theorem 4.3.8
Assume that un ∈
Xn is a unique solution of the minimization problem (4.47) or the variational problem (4.48). Let u be a solution of the variational problem (4.48) with Xn replaced by X. Then,
i.
ii.
iii. (c/2)||u − un||2 ≤ Φ(un) − Φ(u).
Proof.
Since a(u, v) = b(v) = a(un, v) for all v ∈ Xn, we have
Since a(u − un, u − un) = a(u − un, u) and a(u − un, u − v) = a(u − un, u) for all v ∈ Xn, we have
which gives (ii). Part (i) follows from the assumption that
Xn →
X as
. For all
v ∈
X,
and therefore
which proves (iii).
Exercise 4.3.9 (Contraction mapping)
Let X be a Hilbert space over . Assume an operator is a contraction mapping, that is, there exists 0 < θ < 1 such that for all u, v ∈ X. Then, there exists a unique u* ∈ X such that
Note that, for fixed u0 ∈
X,
we can define a sequence and prove that the sequence is a Cauchy sequence.
Exercise 4.3.10 (Lax–Milgram theorem for non-symmetric bilinear operator)
Let X be a Hilbert space over . Assume that the map is bounded, coercive and bilinear:
Show that, for each b ∈ X*, there is a unique u ∈ X such that
For sufficiently small β > 0, is a contraction map because
With 0 < β < c/(M2 + 1), there is a unique u ∈ X such that , which is equivalent to .
4.3.3 Euler–Lagrange Equations
We now study how to compute the gradient of Φ(u) on a subset X of the Hilbert space H1(Ω) using several important examples.
Example 4.3.12 (Minimization problem in one dimension)
Let σ(
x) ∈
C+([
a,
b]) = {
v ∈
C([
a,
b]):
v > 0}.
For a given f ∈
L2(
a,
b)
and , suppose that u* is a minimizer of the following minimization problem:
Then, u* satisfies
with the boundary conditions u*(a) = α, u*(b) = β.
Proof.
1. In this Hilbert space H1, the inner product and the distance (or metric) between two functions u and v are given respectively by
2. The goal is to investigate a minimizer u* ∈ X satisfying
which is equivalent to
where
.
3. Hence,
4. By the chain rule and integrating by parts,
This can be viewed as a directional derivative of Φ at u* in the direction v, that is, the direction where the derivative is computed.
5. Since this holds for all functions
, we have
The last one is the Euler–Lagrange equation of the minimization problem.
Example 4.3.13
Consider the curve minimization problem joining two points (a, α) and (b, β):
If u* is a minimizer, then
Proof.
For all
, we have
Since the above identity holds for all
, the minimizer
u* satisfies the Euler–Lagrange equation:
Since
we have
Hence, the straight line joining (a, α) and (b, β) is the minimizer.
Example 4.3.14
Let Ω be a domain in . Consider the minimization problem:
If u is a minimizer, then u satisfies the Dirichlet problem:
Proof.
For all
,
Since this holds for all
, we have
Example 4.3.15
Consider the minimal surface problem:
If a minimizer u exists, it satisfies the following nonlinear problem with the Dirichlet boundary condition:
Proof.
For all
,
Since this holds for all , we have
Example 4.3.16
For a given f ∈ L2(Ω), consider the total variation minimization problem:
If a minimizer u exists, it satisfies the following nonlinear problem with zero Neumann boundary condition:
Proof.
For all v ∈ X,
Since this holds for all v ∈ X, we have
4.3.4 Regularity Theory and Asymptotic Analysis
We briefly discuss the regularity theory for the elliptic PDE. We know that a solution of the Laplace equation has the mean value property and therefore it is analytic (“having the ability to analyze”): if we have knowledge of all derivatives of a harmonic function at one fixed point, we can get full knowledge of the solution in its neighborhood. This mean value type property can be applied in some sense to solutions of elliptic equations. Basically, Harnack's inequality comes from a weighted mean value property. For details, please refer to the book by Gilbarg and Trudinger (2001).
Let Ω be a bounded smooth domain in . Consider the following divergence-form elliptic operator:
4.53
where and
is symmetric, bounded and positive definite. Assume that there exist positive constants c1 and c2 such that
4.54
The bounded linear form associated with a divergence-form elliptic operator L is
4.55
Theorem 4.3.17
Let f ∈
L2(Ω),
and . Assume that is a weak solution of Lu = f, that is,
4.56
Then, for each Ω0 ⊂ ⊂ Ω,
4.57
where the constant C depends only on ,
c0 and c1.
Here by Ω
0 ⊂ ⊂ Ω
we mean .
Proof.
Set
. Choose an open set Ω
1 such that Ω
0 ⊂ ⊂ Ω
1 ⊂ ⊂ Ω. Take a cut-off function
such that
and
. Substituting
into (
4.56) gives
4.58
4.59
Using
, we have
4.60
Similarly, using the Schwarz inequality, we have
4.61
By taking sufficiently small
, the estimates (
4.59)–(
4.61) lead to
4.62
for some positive constant C1.
Now, we will estimate . From now on, we shall use the notation x = (x1, x2, x3) = r = (x, y, z) just for simple expressions. For the estimate of , substitute into (4.56), where denotes the difference quotient
where h is very small and . Then,
We write I as
From the assumption of ellipticity,
Using the Hölder inequality and Poincaré inequality,
where can be chosen arbitrarily small, depends on , and .
Then, we have the estimate
Since this is true for arbitrarily small h, we have the estimate
Here, we may use a convergence theorem in real analysis. Combining this estimate into (4.59) with an appropriate choice of leads to (4.57).
Remark 4.3.18
For beginners in this area, we recommend starting with the equation Lu = − ∇
2u =
f to prove Theorem 4.3.17. With this simpler model, a much simpler computation gives the estimate (4.57). Then, it is easy to see that the use of the condition (4.54) leads to an extension to the general case.
Next, we study some important techniques for asymptotic analysis. We will use a nice simple model in the book by Chipot (2009) describing asymptotic analysis for problems in a large cylinder.
For a given f(y) ∈ C([ − a, a]), let uL and , respectively, be solutions of
4.63
and
4.64
Then, we have the following estimates.
Theorem 4.3.19
For 0 <
<
L,
we have
4.65
where [
L −
]
is the maximal integer less than L −
.
Proof.
The proof of (
4.65) is based on the identity
4.66
where
r = (
x,
y) and
is a linear function such that
The identity (
4.66) can be expressed as
4.67
Since ∇ϕ
= 0 in Ω
,
4.68
Since |∇ϕ
| ≤ 1/(
L −
),
4.69
and the left-hand side of (
4.69) can be estimated by
4.70
Dividing both sides of (
4.70) by
, we have
4.71
From the Poincaré inequality, the right-hand term in (
4.71) can be estimated by
4.72
From (
4.71) and (
4.72), we have
4.73
where
The estimate (
4.73) can be simplified as
4.74
Similar arguments for getting (
4.74) with changing ϕ
give
4.75
Hence, we have
4.76
where θ = θ(1) = 4a/(1 + 4a) < 1. Now, we will estimate Φ(L). Since
we have
and, therefore,
This completes the proof.
We can generalize this simple asymptotic analysis (4.76) to an elliptic PDE:
For details, please see section 6.3 in the book by Chipot (2009).
4.4 Tikhonov Regularization and Spectral Analysis
In this section, we briefly present a functional analytic approach for the Tikhonov regularization method, which is probably the most commonly used method when the problem is ill-posed. Regularization is an important tool to deal with an ill-posed problem by imposing a priori information on the solution. We discuss the regularization techniques for solving T*Tx = T*y, where T*T is an n × n matrix having a large condition number. For more detailed aspects, the reader may refer to the book by Engl et al. (1996).
Let X and Y be Hilbert spaces. We denote by the set of linear operators T:X → Y. For each , the Hilbert spaces X and Y, respectively, can be decomposed into
where
- N(T): = {x ∈ X:Tx = 0} is the kernel of T,
- R(T): = {Tx:x ∈ X} is the range of T,
- N(T)⊥: = {x ∈ X:〈x, z 〉 = 0 for all z ∈ N(T)} is the orthogonal space of N(T),
- is the orthogonal space of R(T).
Definition 4.4.1
For each , its generalized inverse, denoted by T†,
is defined as follows:
- D(T†) = R(T) ⊕ R(T)⊥,
- T†:D(T†) → N(T)⊥ is a linear operator such that N(T†) = R(T)⊥ and
Theorem 4.4.2
Let and let x† =
T†y.
Then,
where T* is the dual operator of T, which requires 〈Tx, y 〉 = 〈x, T*y 〉 for all x ∈ X and y ∈ R(T).
Proof.
The proof goes as follows:
Theorem 4.4.3
Let . The generalized inverse T†:
D(
T†) →
N(
T)
⊥ is bounded if and only if R(
T)
is closed, that is,
.
Proof.
The proof follows from the closed graph theorem in Theorem 4.4.12.
4.4.1 Overview of Tikhonov Regularization
This section is based on a lecture by Bastian von Harrach. To get insight into Tikhonov regularization, we consider the simplest operator , which is an m × n matrix. Assume that the problem is ill-posed. For ease of explanation, we assume that satisfies the following:
- is injective and bounded,
- (very large),
where T* is the transpose of T. Consider the ill-posed problem:
or the corresponding least-squares problem
In practice, the data b are always contaminated by noise; hence we may consider the following problem:
where bδ are the noisy data. Here, the norm || · || is the standard Euclidean distance.
From the assumption that , unregularized solutions (T*T)−1T*bdelta are prone to magnify noise in the data btrue, where btrue are the true data. Hence, (T*T)−1T*bdelta may be very different from the true solution xtrue: = (T*T)−1T*btrue even if δ ≈ 0. Hence, the goal is to find a good operator Gδ:X → X so that Gδ ≈ (T*T)−1 and GδT*bδ is a reasonably good approximation of xtrue when bδ ≈ btrue.
The motivation of Tikhonov regularization is to minimize not only ||Tx − bδ|| but also ||x||. By adding a regularization parameter, α, we consider the following minimization problem:
4.77
Lemma 4.4.4
For α > 0, T*T + αI is invertible and its inverse is bounded by
4.78
More precisely,
4.79
Proof.
Using the property of the adjacent operator T*, we have
Hence T*T + αI is injective. We can prove that T*T + αI is surjective by the Riesz lemma or Lax–Milgram theorem. Therefore T*T + αI is invertible and
Next, we will prove (4.79). Using the fact that
(see Exercise 4.4.5), we have
This proves (4.79). Here, the last inequality in the above estimate comes from
(see Exercise 4.4.5) and the following estimate:
Exercise 4.4.5
Prove the following.
1. Prove that (T*T + αI)−1T* = T*(TT* + αI)−1 by showing the identity
2. Prove that T(T*T + αI)−1 = (TT* + αI)−1T.
Theorem 4.4.6
The minimization problem (4.77) has a unique solution, and the minimizer of (4.77) is given by
Proof.
The minimization problem (
4.77) can be written as
which is the same as
The corresponding Euler–Lagrange equation is
Owing to the invertibility of T*T + αI in Lemma 4.4.4, the minimizer xα must satisfy xα = (T*T + αI)−1T*bδ.
To maximize the effect of the regularization, it is important to choose an appropriate regularization parameter α. The strategy is to choose an appropriate parameter α = α(δ), which can be viewed as a function of δ, such that
Hence, we must have α(δ) → 0 as δ → 0 because xtrue = (T*T)−1T*btrue. We need to make an appropriate choice of α(δ) so that it provides a “stable” convergence xα = (T*T + αI)−1T*bδ → xtrue as δ → 0.
Theorem 4.4.7
Let Txtrue = btrue and denote Gα: = (T*T + αI)−1. Let xα = GαT*bδ with ||bδ − btrue|| ≤ δ. Then, for α > 0, we have
Proof.
Since xα = GαT*bδ, we have
where the last inequality comes from Lemma 4.4.4.
In the proof of the previous theorem for a fixed δ, note the estimate
The term ψ1(α): = ||GαT*(bδ − btrue)|| indicates the amplified measurement error related to the difference between the exact data and the measurement data, and the term ψ2(α): = ||GαT*btrue − xtrue||, called the regularization error, illustrates how well GαT* approximates T−1. Roughly speaking, the optimal α would be a quantity near the intersecting point of ψ1(α) = ψ2(α) as shown in Figure 4.1.
4.4.2 Bounded Linear Operators in Banach Space
Definition 4.4.8
A normed linear space X equipped with a norm || · || is called a Banach space if it is complete.
The following are important examples of Banach spaces in PDE theory:
- for ;
- the closure of with norm ;
- .
Note that the vector space equipped with the Lp-norm is a normed space but not a Banach space.
Theorem 4.4.9 (Baire category)
Let X be a Banach space. If and , then one of the Xk contains an open ball.
Proof.
If every Xn does not contain an open set, we can choose the following Cauchy sequences {xn} and {rn}.
1. Choose
(because
is open).
2. For
n = 1, 2, …, there exists
with
(because
is open).
Since
X is complete,
xn →
x ∈
X. However,
x ∉
Xn for all
n, which contradicts the assumption
.
Theorem 4.4.10 (Uniform boundedness principle or Banach–Steinhaus)
Let X be a Banach space and Y be a normed linear space. Let . If for all x ∈ X, then
Proof.
From the assumption, we have
From the Baire category theorem, there exists Xn that contains an open ball Br(x*):
This implies that
Theorem 4.4.11
Let T:X → Y be linear, where X and Y are Banach spaces. If T(X) = Y, then T is open. Moreover, if T(X) = Y and T is one-to-one, then T−1 is continuous.
Proof.
Denote
Br: = {
x ∈
X:||
x|| <
r} and
. From the Baire category theorem and the assumption
, there exists
N such that
T(
BN) contains an open ball
. This proves that
T is open because
for all
x ∈
X and
a > 0. If
T is one-to-one, it is obvious that
T−1 is continuous.
Theorem 4.4.12
Let T:X → Y be linear, where X and Y are Banach spaces. Then, T is bounded if and only if the graph {(x, Tx):x ∈ X} is closed with respect to the product norm ||(x, y)|| = ||x||X + ||y||Y.
Proof.
If the graph
G: = {(
x,
Tx):
x ∈
X} is closed,
G can be viewed as a Banach space and hence the projection map Π
1:
G →
X defined by Π
1((
x,
y)) =
x has a bounded inverse due to the open mapping theorem. Since the projection map Π
2:
G →
Y defined by Π
2((
x,
y)) =
y is continuous, then
is bounded. Conversely, if
T is bounded, then clearly
G is closed because
xn →
x implies
Txn →
Tx.
Theorem 4.4.13
Let X be a Hilbert space. If an operator T:X → X is linear and symmetric, then T is bounded.
Proof.
From the closed graph theorem, it suffices to prove that the graph G: = {(x, Tx):x ∈ X} is closed. We need to show that if xn → x and Txn → y, then y = Tx. From the assumption that T* = T, we have
Since Tx − y is orthogonal to all ϕ ∈ X, we must have Tx = y and hence the graph G is closed.
We should note that, in order to apply the closed graph theorem, the domain of T should be a Banach space. To see this, consider the Hilbert space X = L2([0, 1]) equipped with L2-norm. We know that C2([0, 1]) is a dense subset of the Hilbert space X = L2([0, 1]). It is easy to see that the Laplace operator T:C2([0, 1]) → X defined by
is unbounded because
Defining , we can extend the unbounded operator T to V by defining the extension operator :V → X in such a way that whenever it makes sense. Clearly is a closed operator. However, is an unbounded operator, but it does not contract to the closed graph theorem because V is not complete (not a Banach space). At this point, we should note that V is a dense subset in L2([0, 1]) since .
Theorem 4.4.14 (Hahn–Banach)
Let X be a normed linear space over (or ) equipped with a norm || · ||
and let Y be a subspace of X. Let with (
x) ≤ ||
x||
for x ∈
Y.
Then, there exists an extension such that and , where .
Proof.
We will only prove the case where
X is a normed linear space over
. Choose
and set
, the vector space spanned by
Y and
x1. From the usual induction process and Zohn's lemma, it suffices to show that there exists an extension
such that
and
(
x) ≤ ||
x||. Since for
we must have
for all
and
y ∈
Y, we only need to show the existence of
satisfying
or
Hence, the existence of
requires
which is equivalent to
Rearranging the above inequality, we need to show that
The above inequality holds owing to the convexity of the norm || · || and the fact that
Proof.
From the Hölder inequality,
. Taking
, we get
and
, which lead to
. Hence,
.
For
∈ (
Lp)*, we can define a kind of Radon–Nikodym derivative:
For the proof of g ∈ Lq, please refer to the book by Lieb and Loss (2001).
Theorem 4.4.16 (Dual of Hilbert space: Riesz representation theorem)
Let H be a Hilbert space equipped with an inner product 〈 ·, · 〉 . For each , there exists g ∈ H such that (
f) = 〈
f,
g 〉
for all f ∈
H.
The proof is left as an exercise for the reader.
4.4.3 Regularization in Hilbert Space or Banach Space
Regularization is an important tool to deal with an ill-posed problem by approximating it as a well-posed problem using a priori information on solutions. We discuss regularization techniques for solving K*Kx = K*y, where K*K is an n × n matrix having a large condition number.
Definition 4.4.17
Let .
- The map K:X → Y is said to be a compact operator if is compact, where B1 is the unit ball in X. Equivalently, K is compact if, for any bounded sequence {xn} in X, {Kxn} has a convergent subsequence.
- The operator norm of K is .
- For an eigenvalue , Xσ: = {x ∈ X:K*Kx = σx} is called a σ-eigenspace of K*K.
Throughout this section, we assume that is a compact operator. Note that K*K maps from X to X and KK* maps from Y to Y. Both K*K and KK* are compact operators and self-adjoint, that is, 〈K*Kx, x′ 〉 = 〈x, K*Kx′ 〉 for all x, x′ ∈ X.
Exercise 4.4.18
Show that
Theorem 4.4.19
If is a compact operator, the self-adjoint compact operator K*
K:
X →
X satisfies the following.
- There exist eigenvalues σ1 ≥ σ2 ≥ ··· ≥ 0 and the corresponding orthonormal eigenfunctions {vn:n = 1, 2, …} of K*K such that
- For each σn > 0, and
- If K*K has infinitely many different eigenvalues, then .
- .
- Writing un = Kvn/||vn||, we have
Proof.
Since
K is compact, it is easy to prove that
K*
K:
X →
X is compact, that is, for any bounded sequence {
xn} in
X, there exists a convergent subsequence
. Hence, we have a singular system (σ
n;
vn,
un). Moreover,
; if not,
K*
K is not compact. The remaining proofs are elementary.
Theorem 4.4.20
Let K†:
R(
K) ⊕
R(
K)
⊥ →
N(
K)
⊥ be a generalized inverse of a compact operator . Then,
and
Equivalently,
Remark 4.4.21
Assume that . Then, and hence, for any arbitrarily small ,
This means that any small error in the data y may result in a large error in a reconstructed solution x = K†y.
To see this effect clearly, let us consider the backward diffusion equation. For a given time t > 0, let Kt:L2([0, π]) → L2([0, π]) be the compact operator defined by
where u(
x,
t)
is a solution of the diffusion equation ut −
uxx = 0
in with the initial condition u(
x, 0) =
f(
x)
and the homogeneous boundary condition u(0,
t) =
u(π,
t) = 0.
The standard method of separable variables gives the explicit representation of Ktf:
where and
Hence,
and its generalized inverse is
Though any small perturbation of f results in a small perturbation of Ktf,
the generalized inverse is severely ill-posed when t is not small. To be precise, suppose that we try to find u(
x, 0)
from knowledge of noisy data that is a small perturbation of the true data u(
x, 2) =
v1(
x).
Then, the computed solution
is very different from the true solution . This means that the true solution u(
x, 0) = e
2v1(
x)
is negligibly small in the computed solution .
For a compact operator K, the following are equivalent:
- K† is unbounded;
- R(K) is not closed, that is, ;
- ;
- if is a singular system for K such that , then there are infinitely many σn > 0 such that σn → 0 as .
Hence, if K† is unbounded and K is compact, the problem Kx = y is ill-posed, that is, a small error in the data y may result in a large error in the solution x = K†y. Recall that K† = (K*K)†K*. Numerous regularization techniques have been used to deal with this ill-posedness.
We now introduce the Tikhonov regularization method. Let be a compact operator. To explain the regularization method, it will be convenient to express the compact operator K*K in the form
where Pλ is the projection map from X to the set . Note that . Hence, if x ∈ D(K*K), then
Using the above property, we can show that
where is defined by
For y ∈ D(K†), the solution x† = K†y can be expressed as
4.80
If is compact and R(K) is not closed, then the operator K† is unbounded and the problem Kx = y is ill-posed. If , then
On the other hand,
This reconstruction method is called Tikhonov regularization. The solution
can also be interpreted as a minimizer of the energy functional:
4.81
Denoting , the regularization can be viewed as an approximation of K† in the sense that
Here, we should note that N(K*) = R(K)⊥. The most widely used iteration method for solving y = Kx is based on the identity x = x + αK*(y − Kx) for some α > 0. This method is called the Landweber iteration, which expects xk → x by setting
With this Landweber iteration, y ∈ D(K†) if and only if xk converges. Indeed,
which leads to
Regularization is to replace the ill-conditioned problem Kx = y by its neighboring well-posed problem in which we impose a constraint on desired solution properties. The problem of choosing the regularization parameter α in (4.81) is critical. In MR-based medical imaging, we often need to deal with the tradeoff between spatial resolution and temporal resolution. Accelerating the MR imaging speed requires the phase encoding lines in k-space to be skipped, and results in insufficient k-space data for MR image reconstruction via inverse Fourier transformation. To deal with the missing data, one may use the minimization problem (4.81), which consists of the fidelity term (ensuring that the k-space data y are consistent with the image x) and the regularization term (which incorporates the desired solution properties). Nowadays many research areas use regularization techniques.
4.5 Basics of Real Analysis
We now briefly review the real analysis techniques that are used in this chapter. For more details, we refer to the books by Lieb and Loss (2001), Marsden (1974), Rudin (1970) and Wheeden and Zygmund (1977). Throughout this section, we assume that Ω is a bounded domain in (n = 1, 2 or 3) and is a function.
4.5.1 Riemann Integrability
For a quick overview of the Riemann integral, we will explain its concept only in two-dimensional space. Let be a bounded domain and let be a bounded function. We enclose Ω in a rectangle and extend f to the whole rectangle B = [a1, b1] × [a2, b2] by defining it to be zero outside Ω. Let be a partition of B obtained by dividing a1 = x0 < x1 < ··· < xn = b1 and a2 = y0 < y1 < ··· < ym = b2:
- Define the upper sum of f by
and the lower sum of f by
- Define the upper integral of f on Ω by
and the lower integral of f on Ω by
We say that f is Riemann integrable or integrable if
If f is integrable on Ω, we denote
4.5.2 Measure Space
For a domain E in , we will denote the area of the domain E by μ(E). We can measure the area μ(E) based on the area of rectangles. If Q = (a, b) × (c, d) and Qj for j = 1, 2, … is a sequence of rectangles with for j ≠ k, then we measure them in the following ways:
It seems that any open domain E is measurable since any open set can be expressed as a countable union of rectangles. Then, can we measure any subset in ? For example, consider the set , where is the set of rational numbers. What is its area μ(E)? It is a bit complicated. Measure theory provides a systematic way to assign to each suitable subset E a positive quantity μ(E) representing its area.
The triple is said to be a measure space if the following are true:
1. X is a set;
2. is a σ-algebra over the set
X, that is, closed under a countable union and complementations;
3. μ is a measure on
, that is,
a. μ(
E) ≥ 0 for any
,
b. for all countable collections {
Ej:
j = 1, 2, … } of pairwise disjoint sets in
,
, and
c. μ(∅) = 0.
If the σ-algebra includes all null sets (a null set is a set N such that μ(N) = 0), then μ is said to be complete. Let denote the collection of all subsets of X.
Definition 4.5.1 (Outer measure)
An outer measure on a non-empty set X is a set function μ* defined on that is non-negative, monotone and countably subadditive.
With the use of the outer measure, we can describe a general constructive procedure for obtaining the complete measure. Let be an algebra of sets and a set-valued function such that μpre(∅) = 0. For A ⊂ X, we define
Then, μ* is an outer measure.
Example 4.5.2
Let and let be the σ-algebra generated by the set of all open rectangles in . Define
Then is a measure space but it may not be complete. This ρ is called the pre-measure. For A ⊂ X, we define
Then, μ* is an outer measure.
The following Caratheodory definition provides a method of constructing a complete measure space .
Definition 4.5.3
Let μ* be an outer measure on a set X. A subset A ⊂ X is said to be μ*-measurable if
The Lebesgue measure on is an extension of the pre-measure defined by μpre((a, b]) = b − a.
4.5.3 Lebesgue-Measurable Function
Throughout this section, we restrict ourselves to the standard Lebesgue measure μ in X ( or ) that is generated by the outer measure,
where μpre is a pre-measure defined on open sets in X. For example, in , μpre(U) is the volume of U.
A function is said to be measurable if f−1(U) is measurable for every open set U.
Exercise 4.5.4
Given two functions f and g, we define
1. Then
is measurable if
for all
.
2. Prove that, if f and g are measurable, then so are f + g, fg, f ∨ g, f∧g, f+, f− and |f|.
3. If {
fj} is a sequence of measurable functions, then
and
are measurable.
From now on, we assume that E, Ej are measurable sets. The characteristic function of E, denoted by χE, is the function defined by
Let be the set of all simple functions, a finite linear combination of characteristic functions:
4.82
Let be the set of measurable functions and let
For given , we say that f = 0 holds almost everywhere (abbreviated a.e.) if μ({x:f(x) ≠ 0}) = 0.
Theorem 4.5.5
Any can be approximated by a sequence of simple functions
4.83
where En, k =
f−1((
k2
−n, (
k + 1)2
−n])
and . Moreover, each ϕn satisfies
The proof is straightforward by drawing a diagram of ϕn. From the above theorem, we can prove that, for any measurable function f, there is a sequence of simple functions ϕn such that ϕn → f on any set on which f is bounded. The reason is that f can be decomposed into f = f+ − f−, where f = max{f, 0} and f− = max{ − f, 0}.
Now, we are ready to give the definition of the Lebesgue integral using the integral of a measurable simple function , which is defined as
4.84
We use the convention that . Let be a vector space of measurable simple functions. Then the integral (as a function of ) can be viewed as a linear functional on , that is, the operator is linear.
Proof.
It suffices to prove this for ϕ = χ
A where
. The proof follows from the following identity for
of mutually disjoint
:
Using the definition (4.84) of the integral of a measurable simple function, we can define the integral of a measurable function f ≥ 0.
Definition 4.5.7
The integral of is defined by
4.86
From the definition, we obtain that f ≤ g implies ∫f dμ ≤ ∫g dμ.
Theorem 4.5.8 (Monotone convergence theorem)
For a non-decreasing sequence , we have
Proof.
Since
, there exists
f = lim
nfn ∈
Mable. Since
and
fn ≤
f, we have
It remains to prove lim
n∫
fn dμ ≥ ∫
f dμ. Since
, it suffices to prove that, for any α, 0 < α < 1, and any
with ϕ ≤
f,
Let En = {fn ≥ αϕ}. Then,
Since ν is a measure and
, lim
nν(
En) = ν(
X) = ∫ϕ dμ. Thus, lim
n∫
fn dμ ≥ α∫ϕ dμ.
Exercise 4.5.9
Prove that, if and for some , then
Proposition 4.5.10
Let . Then, ∫
f dμ = 0
if and only if f = 0 a.e.
Proof.
If
, then the statement is immediate. If
f = 0 a.e. and ϕ ≤
f, then ϕ = 0 a.e., and hence
.
Conversely, if ∫f dμ = 0, then
and therefore f = 0 a.e. because
Lemma 4.5.11 (Fatou's lemma)
For any sequence , we have
Proof.
Since
, we have
Definition 4.5.12
(
L1(
X, dμ))
A function is integrable if and . We denote by L1(
X, dμ)
the class of all integrable functions. For f ∈
L1(
X, dμ),
we define
It is easy to see that L1(X, dμ) is a vector space equipped with the norm ||f|| = ∫|f| dμ satisfying the following:
1. ||f|| ≥ 0, for all f ∈ L1;
2. ||f|| = 0 if and only if f = 0 a.e.;
3. ||λf|| = |λ| ||f||, for all f ∈ L1 and every scalar λ;
4. ||f + g|| ≤ ||f|| + ||g||, for all f, g ∈ L1.
Now, we will prove that L1(X, dμ) is a complete normed space (or Banach space). To prove this, we need to study several convergence theorems.
Theorem 4.5.13 (Lebesgue dominated convergence theorem (LDCT))
If {fn} ⊂ L1 with fn → f a.e. and there exists g ∈ L1 so that |fn| ≤ g a.e. for all n, then
Proof.
Since g + fn ≥ 0, it follows from Fatou's lemma that
This leads to
.
On the other hand, applying the same argument to the sequence g − fn ≥ 0 yields
Exercise 4.5.14
Let
denote the fundamental solution of the heat equation in one dimension. Let fn(x) = G(x, 1/n). Then, fn → f = 0 a.e. and
The above exercise explains the reason why the LDCT requires the assumption that {fn} is dominated by a fixed L1 function g.
Exercise 4.5.15
Let {
fj} ⊂
L1 so that . Show that . Using the LDCT, prove that there exists f ∈
L1 such that
From (4.83), for any function f ∈ L1, there exists a sequence such that
From the LDCT, ||ϕn − f|| = ∫|ϕn − f| dμ → 0. Hence, is dense in L1, that is, every element in L1 is an L1-limit of a sequence of elements in .
Exercise 4.5.16
Show that a sequence of continuous functions
Using this idea, show that is dense in .
Exercise 4.5.17
Let be a mapping. Suppose that f is differentiable with respect to t and that
Then, F(t) = ∫f(x, t) dμ is differentiable on a ≤ t ≤ b and
Note that, for each t ∈ (a, b), we can apply the LDCT to the sequence
because |hn| ≤ g from the mean value theorem.
4.5.4 Pointwise, Uniform, Norm Convergence and Convergence in Measure
Let Ω be a subdomain in . The sequence of functions fk ∈ C(Ω) is said to converge pointwise to f if, for each , fk(r) → f(r). We say that the sequence of functions fk ∈ C(Ω) converges uniformly to f if .
Example 4.5.18
Consider the following.
1. The function fk(x) = xk converges pointwise to
on the interval [0, 1],
but for all k. Hence, the convergence is not uniform.
2. The function
converges uniformly to sinx in [0, 1].
3. The following functions satisfy in some sense:
a. uniformly;
b. ϕn = χ(n, n+1) → 0 pointwise;
c. ϕn = nχ(0, 1/n) → 0 a.e.
d. Define a sequence for j = 0, …, 2
k − 1
and k = 0, 1, 2, ….
Then, and ||ψ
k, j − 0|| = 2
−k → 0,
while for any x.
Definition 4.5.19
The sequence {fn} is said to converge in measure to f if
and {fn} is said to be a Cauchy sequence in measure if
For example, the sequences ϕn = (1/n)χ(0, n), χ(0, 1/n) and converge to 0 in measure, but the sequence ϕn = χ(n, n+1) does not converge to 0 in measure.
Theorem 4.5.20
Suppose {fn} is a Cauchy sequence in measure. Then,
- there exists such that fn → f in measure,
- there exists such that a.e.,
- f is uniquely determined a.e.
Proof.
Choose a subsequence
nk such that
,
Then, denoting
and
, we have
For
and
j >
i, we have
Therefore,
exists for
. By letting
f(
x) = 0 for
x ∈
Z, we have
gk →
f a.e. It is easy to prove that the sequence
gk converges to
f uniformly on
because
. We can prove
fn →
f in measure from
The statement [ fn → f in L1 ] implies [ fn → f in measure ] since
From the proof of the previous theorem, if [ fn → f in L1 ], then we can choose a subsequence nk such that ,
Hence, according to Theorem 4.5.20, [ fn → f in L1 ] implies the statement [ gk → f a.e. ].
Theorem 4.5.21 (Egorov's theorem)
Let and let fn →
f a.e. Then, for any , there exists F ⊂
E so that and fn →
f uniformly on F.
Proof. Since fn → f a.e., there exists Z ⊂ E so that μ(Z) = 0 and fn(x) → f(x) for . Hence, it suffices to prove the theorem for the case when Z = ∅.
Let . Then, for n = 1, 2, … . Hence, there exists mn such that . Let . Then, . Moreover, if j > mn, then Hence, fn → f uniformly on F.
4.5.5 Differentiation Theory
Throughout this section, we consider a bounded function . We will study a necessary and sufficient condition that f′ exists almost everywhere and
Recall that the derivative of f at x exists if all four of the following numbers have the same finite value:
According to Lebesgue's theorem, every monotonic function is differentiable almost everywhere.
Example 4.5.22
If f is a Cantor function, then f′ = 0 almost everywhere but
To understand why for a Cantor function f, we need to understand the concept of absolute continuity.
Definition 4.5.23
A measure ν is absolutely continuous with respect to μ if and only if . A non-negative f ∈
BV[
a,
b]
(the space of bounded variation functions in [
a,
b])
is absolutely continuous with respect to μ if and only if the measure ν(
E) = ∫
Ef dμ
is absolutely continuous with respect to μ. The measures μ and ν are mutually singular, written μ⊥ν, if and only if
Theorem 4.5.24 (Absolute continuity)
If f′ exists almost everywhere, f′ ∈ L1(dμ) and
then f is absolutely continuous.
Proof.
We want to prove that, for a given
, there exists δ so that
If |
f′| is bounded, then we choose
and
If f′ ∈ L1(dμ) but not bounded, then we decompose
This is possible because the bounded functions are dense in
L1(dμ). The result follows by choosing
.
From this theorem, the Cantor function is not absolutely continuous.
Every continuous and non-decreasing function f can be decomposed into the sum of an absolutely continuous function and a singular function, both monotone:
where g is an absolutely continuous function.
Theorem 4.5.25 (Radon–Nikodym, Riesz representation)
Let ν be a finite measure on [a, b], and suppose that ν is absolutely continuous with respect to μ. Then, there exists f ∈ L1(X, dμ) such that
References
Akoka S, Franconi F, Seguin F and le Pape A 2009 Radiofrequency map of an NMR coil by imaging. Magn. Reson. Imag. 11, 437–441.
Chipot M 2009 Elliptic Equations: An Introductory Course. Birkhäuser Advanced Texts. Birkhäuser, Basel.
Engl HW, Hanke M and Neubauer A 1996 Regularization of Inverse Problems. Mathematics and Its Applications. Kluwer Academic, Dordrecht.
Feldman J and Uhlmann G 2003 Inverse Problems. Lecture Note. See http://www.math.ubc.ca/∼feldman/ibook/.
Gilbarg D and Trudinger N 2001 Elliptic Partial Differential Equations of Second Order. Springer, Berlin.
Lieb EH and Loss M 2001 Analysis, 2nd edn. Graduate Studies in Mathematics, no. 14. American Mathematical Society, Providence, RI.
Marsden JE 1974 Elementary Classical Analysis. W. H. Freeman, San Francisco.
Rudin W 1970 Real and Complex Analysis. McGraw-Hill, New York.
Stollberger R and Wach P 1996 Imaging of the active B1 field in vivo. Magn. Reson. Med. 35, 246–251.
Wheeden RL and Zygmund A 1977 Measure and Integral: An Introduction to Real Analysis. Monographs and Textbooks in Pure and Applied Mathematics, vol. 43. Marcel Dekker, New York.
Further Reading
Evans LC 2010 Partial Differential Equations. Graduate Studies in Mathematics, no. 19. American Mathematical Society, Providence, RI.
Folland G 1976 Introduction to Partial Differential Equations. Princeton University Press, Princeton, NJ.
Giaquinta M 1983 Multiple Integrals in the Calculus of Variations and Non-linear Elliptic Systems. Princeton University Press, Princeton, NJ.
Grisvard P 1985 Elliptic Problems in Nonsmooth Domains. Monographs and Studies in Mathematics, no. 24. Pitman, Boston, MA.
John F 1982 Partial Differential Equations. Applied Mathematical Sciences, vol. 1. Springer, New York.
Kellogg OD 1953 Foundations of Potential Theory. Dover, New York.
Reed M and Simon B 1980 Functional Analysis. Methods of Modern Mathematical Physics, vol. I. Academic Press, San Diego, CA.
Rudin W 1970 Functional Analysis. McGraw-Hill, New York.
Zeidler E 1989 Nonlinear Functional Analysis and Its Applications. Springer, New York.