Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

2
Measurement of Information of a Discrete Source and Channel Capacity

2.1. Introduction and definitions

The purpose of a message is to provide information to a recipient. Information, like energy, for that matter, is a fundamental notion, which is part of our daily life. It can be transported (transmitted), stored (memorized) and transformed (processed).

The value of information lies in the surprise effect it causes; it is all the more interesting because it is less predictable. As a result, information is assimilated to an event of a random nature, considered, in the following, as a random variable.

The measure of the information is then given by measuring the uncertainty of an event system, that is to say, the random choice of an event in a set of possible and distinct events.

We will use the terms message or sequence to indicate any finite sequence of symbols taken in an alphabet A = { } of a discrete source: a finite set of predetermined symbols, for example:

We will use the term discrete source of information S_n to indicate the system selecting, from the set of symbols from A_s = S, the symbols to transmit:

The choice of s_m,n is carried out according to a given probability law, which can be steady temporally or variable over time.

2.2. Examples of discrete sources

2.2.1. Simple source (memoryless)

This is a source for which the probability of the occurrence of a symbol does not depend on the previous symbols:

[2.1]

[2.2]

2.2.2. Discrete source with memory

It is a source for which the probability of occurrence of a symbol depends on the symbols already emitted (statistical dependence) and on instants 1, 2, ..., n where they have been emitted:

[2.3]

If the source is in addition stationary then the dependence is only on the ranks and not on the moments when the symbols have been transmitted, so:

[2.4]

that is, the statistical properties are independent of any temporal translation. We can consider that a text written in any language is an example of such a source.

2.2.3. Ergodic source: stationary source with finite memory

The ergodicity of an information source implies that a temporal average calculated over an infinite duration is equal to the statistical average.

2.2.4. First order Markovian source (first order Markov chain)

First order Markov sources play an important role in several domains, for example, in (Caragata et al. 2015), the authors used such a source to model the cryptanalysis of a digital watermarking algorithm.

It is characterized by:

[2.5]

with:

The probability = p_l,k is called transition probability from state l to state k, and:

[2.6]

The probability that at time n the source is in the state k is:

[2.7]

By introducing matrix notations:

[2.8]

Taking into account the relationship [2.8], the relation [2.7] is also written as:

[2.9]

where T^t is the transposed matrix of transition probabilities.

Moreover, if the source is, stationary, then:

in other words, the probabilities of the occurrence of the symbols do not depend on n. It is the same for the transition probabilities p_l,k, then the relation [2.9] is written:

[2.10]

where P₀ is the matrix of probabilities governing the generation of symbols by the source at the initial instant n = 0.

2.3. Uncertainty, amount of information and entropy (Shannon’s 1948 theorem)

The realization of an event x of probability p(x) conveys a quantity of information h(x) related to its uncertainty. h(x) is an increasing function of its improbability 1/p(x):

If an event x is certain, then p(x) = 1, the uncertainty is zero and therefore the quantity of information h(x) brought by its realization is null.

Moreover, let us consider the realization of a pair of independent events x and y. Their joint realization leads to the amount of information it brings being the sum of the quantities of information brought by each of them:

[2.11]

but ft(x,y) is also written:

[2.12]

so, from the relationships [2.11] and [2.12], the form of the function f is of logarithmic type, the base a of the logarithmic function can be any

[2.13]

It is agreed by convention that we select, as the unit of measure of information, the information obtained by the random selection of a single event out of two equally probable events p_i = p_j = 1/2. In this case, we can write:

If we choose the logarithm in base 2, λ becomes equal to unity and therefore h(x_i) = h(x_j) = log₂(2) = 1 Shannon (Sh) or 1 bit of information, not to be confused with the digital bit (binary digit) which represents one of the binary digits 0 or 1.

Finally, we can then write:

[2.14]

It is sometimes convenient to work with logarithms in base e or with logarithms in base 10. In these cases, the units will be:

log_e e = 1 natural unit = 1 nat (we choose 1 among e)

log₁₀ 10 = 1 decimal unit = dit (we choose 1 among 10)

Knowing that:

the relationships between the three units are:

– natural unit: 1 nat = log₂(e) = 1/log_e(2) = 1.44 bits of information;
– decimal unit: 1 dit= log₂(10) = 1/log₁₀(2) = 3.32 bits of information.

They are pseudo-units without dimension.

2.3.1. Entropy of a source

Let a stationary memoryless source S produce random independent events (symbols) s, belonging to a predetermined set [S] = [s₁,s₂, ... ,s_N]. Each event (symbol) S_i is of given probability p_i, with:

The source S is then characterized by the set of probabilities [P] = [p₁,p₂, ... ,P_N]. We are now interested in the average amount of information from this source of information, that is to say, resulting from the possible set of events (symbols) that it carries out, each is taken into account with its probability of occurrence. This average amount of information from the source S is called “entropy H(S) of the source”.

It is therefore defined by:

[2.15]

2.3.2. Fundamental lemma

Let two probability partitions on S:

we have the inequality:

[2.16]

Indeed, since: log_e(x) ≤ x − 1, ∀x positive real, then:

2.3.3. Properties of entropy

– Positive: since 0 ≤ p_i ≤ 1; (with the agreement
– Continuous: because it is a sum of continuous functions “log” of each p_i.
– Symmetric: relative to all the variables p_i.
– Upper bounded: entropy has a maximum value: got for a uniform law:
– Additive: let , then

[2.17]

2.3.4. Examples of entropy

2.3.4.1. Two-event entropy (Bernoulli’s law)

Graph depicts Entropy of a two-event source. — **Figure 2.1.** *Entropy of a two-event source*

The maximum of the entropy is obtained for equal to 1 bit of information.

2.3.4.2. Entropy of an alphabetic source with (26 + 1) characters

– For a uniform law:
⟹H = log₂ (27) = 4.75 bits of information per character
– In the French language (according to a statistical study):
⟹H = 3.98 bits of information per character

Thus, a text of 100 characters provides an information = 398 bits.

The inequality of the probabilities makes a loss of 475 – 398 = 77 bits of information.

2.4. Information rate and redundancy of a source

The information rate of a source is defined by:

[2.18]

Where: represents the average duration of a symbol emitted by the source.

The redundancy of a source is defined as follows:

[2.19]

2.5. Discrete channels and entropies

Between the source of information and the destination, there is the medium through which information is transmitted. This medium, including the equipment necessary for transmission, is called the transmission channel (or simply the channel).

Let us consider a discrete stationary and memoryless channel (discrete: the alphabet of the symbols at the input and the one at the output are discrete).

Flow diagram depicts basic transmission system based on a discrete channel. — **Figure 2.2.** *Basic transmission system based on a discrete channel. For a color version of this figure, see* *www.iste.co.uk/assad/digital1.zip*

We denote:

– [X] = [x_l, x₂, ... , x_n]: the set of all the symbols at the input of the channel;
– [y] = [y_i, ... , y_m]: the set of all the symbols at the output of the channel;
– [P(X)] = [p(x₁), p(x₂), ...,p(x_n)]: the vector of probability of symbols at the input of the channel;
– [P(Y)] = [p(y_i), p(y₂), ... , p(y_m)]: the vector of probability of symbols at the output of the channel.

Because of the perturbations, the space [Y] can be different from the space [X], and the probabilities P(Y) can be different from the probabilities P(X).

We define a product space [X • Y] and we introduce the matrix of the probabilities of the joint symbols, input-output [P(X, Y)]:

[2.20]

We deduce, from this matrix of probabilities:

[2.21]

[2.22]

We then define the following entropies:

– the entropy of the source:
[2.23]
– the entropy of variable Y at the output of the transmission channel:
[2.24]
– the entropy of the two joint variables (X, Y)Because of the disturbances in the transmission channel, if the symbolinput-output:
[2.25]

2.5.1. Conditional entropies

Because of the disturbances in the transmission channel, if the symbol y_j appears at the output, there is an uncertainty on the symbol x_i, j = 1, ... ,,n which has been sent.

Schematic illustration of ambiguity on the symbol at the input when j is received. — **Figure 2.3.** *Ambiguity on the symbol at the input when yj is received*

The average value of this uncertainty, or the entropy associated with the receipt of the symbol y_j, is:

[2.26]

The mean value of this entropy for all the possible symbols yj received is:

[2.27]

Which can be written as:

[2.28]

or:

[2.29]

The entropy H(X/Y) is called equivocation (ambiguity) and corresponds to the loss of information due to disturbances (as I(X, Y) = H(X)− H(X/Y)). This will be specified a little further.

Because of disturbances, if the symbol x_i is issued, there is uncertainty about the received symbol y_j, j = 1, ... , m.

Schematic illustration of uncertainty on the output when we know the input. — **Figure 2.4.** *Uncertainty on the output when we know the input*

The entropy of the random variable Y at the output knowing the X at the input is:

[2.30]

This entropy is a measure of the uncertainty on the output variable when that of the input is known.

The matrix P(Y/X) is called the channel noise matrix:

[2.31]

A fundamental property of this matrix is:

[2.32]

Where: p(y_j/x_i) is the probability of receiving the symbol y_j when the symbol x_i has been emitted.

In addition, one has:

[2.33]

with:

[2.34]

p(y_j) is the probability of receiving the symbol y_jwhatever the symbol x_i emitted, and:

[2.35]

p(x_i/y_j) is the probability that the symbol x_i was issued when the symbol y_j is received.

2.5.2. Relations between the various entropies

We can write:

[2.36]

In the same way, as one has: H(Y, X) = H(X, Y), therefore:

[2.37]

In addition, one has the following inequalities:

[2.38]

and similarly:

[2.39]

SPECIAL CASES.–

– Noiseless channel: in this case, on receipt of y_j, there is certainty about the symbol actually transmitted, called x_i (one-to-one correspondence), therefore:

[2.40]

Consequently:

[2.41]

and:

[2.42]

– Channel with maximum power noise: in this case, the variable at the input is independent of that of the output, i.e.:

[2.43]

We then have:

[2.44]

[2.45]

[2.46]

Note.– In information security, if x_i is the plaintext, and y_j is the corresponding ciphertext, then p(x_i/y_i) = p(x_i) is the condition of the perfect secret of a cryptosystem.

2.6. Mutual information

The mutual information obtained on the symbol x_i when the symbol y_j is received is given by:

[2.47]

where i(x_i, y_j) represents the decrease in uncertainty on x_i due to the receipt of y_j.

The average value of the mutual information, or the amount of information I(X, Y) transmitted through the channel is:

[2.48]

or:

[2.49]

Hence:

[2.50]

Given relationships [2.36] and [2.37], we get:

[2.51]

[2.52]

[2.53]

Special cases.

– Noiseless channel: X and Y symbols are linked, so:
I(X, Y)=H(X)=H(Y)
– Channel with maximum power noise: X and Y symbols are independent, therefore:
I(X, Y) = 0

2.7. Capacity, redundancy and efficiency of a discrete channel

Claude Shannon introduced the concept of channel capacity, to measure the efficiency with which information is transmitted, and to find its upper limit.

The capacity C of a channel: (information bit/symbol) is the maximum value of the mutual information I(X, Y) over the set of input symbols probabilities

[2.54]

The maximization of I(X, Y) is performed under the constraints that:

The maximum value of I(X, Y)occurs for some well-defined values of these probabilities, which thus define a certain so-called secondary source.

The capacity of the channel can also be related to the unit of time (bitrate C_t of the channel), in this case, one has:

[2.55]

The channel redundancy R_c and the relative channel redundancy p_c are defined by:

[2.56]

[2.57]

The efficiency of the use of the channel is defined by

[2.58]

2.7.1. Shannon’s theorem: capacity of a communication system

Shannon also formulated the capacity of a communication system by the following relation:

[2.59]

where:

– B: is the channel bandwidth, in hertz;
– P_s: is the signal power, in watts;
is the power spectral density of the (supposed) Gaussian and white noise in its frequency band B;
is the noise power, in watts.

EXAMPLE.– Binary symmetric channel (BSC).

Any binary channel will be characterized by the noise matrix:

If the binary channel is symmetric, then one has:

p(y1/x2) = p(y2/x1) = p

p(y1/x1) = p(y2/x2) = 1 − p

Schematic illustration of binary symmetric channel. — **Figure 2.5.** *Binary symmetric channel*

The channel capacity is:

The conditional entropy H(Y/X) is:

Hence:

But max H(Y) = 1 for p(y₁) = p(y₂). It follows from the symmetry of the channel that if p(y₁) = p(y₂), then p(x₁) = p(x₂) = 1/2, and C will be given by:

Graph depicts variation of the capacity of a BSC according to p. — **Figure 2.6.** *Variation of the capacity of a BSC according to p*

2.8. Entropies with k random variables

The joined entropy of k random variables is written:

[2.60]

Furthermore, by setting down in the relationship:

One has:

[2.61]

Equality occurs when the variables are independent.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for 2 Measurement of Information of a Discrete Source and Channel Capacity

Create new playlist

Sign In

Sign Up