Chapter 4

Random Signals and Stochastic Processes

Luiz Wagner Pereira Biscainho,    DEL/Poli & PEE/COPPE, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil

Abstract

This chapter is divided into three main parts. First, the main concepts of Probability are introduced. Next, they are encapsulated into the Random Variable framework; a brief discussion of estimators is provided as an application example. Finally, Random Processes and Sequences are tackled. After their detailed presentation, noise models, modulation, spectral characterization, and sampling of random processes are briefly discussed. The description of linear time-invariant processing of stationary processes and sequences leads to applications like Wiener filtering and special models for random signals.

Keywords

Probability; Random variable; Random process; Random sequence; Linear time-invariant system

Acknowledgements

The author thanks Dr. Paulo A.A. Esquef for carefully reviewing this manuscript and Leonardo de O. Nunes for kindly preparing the illustrations.

1.04.1 Introduction

Probability is an abstract concept useful to model chance experiments. The definition of a numerical representation for the outcomes of such experiments (the random variable) is essential to build a complete and general framework for probabilistic models. Such models can be extended to non-static outcomes in the form of time signals,1 leading to the so-called stochastic process, which can evolve along continuous or discrete time. Its complete description is usually too complicated to be applied to practical situations. Fortunately, a well-accepted set of simplifying properties like stationarity and ergodicity allows the modeling of many problems in areas as different as Biology, Economics, and Communications.

There are plenty of books on random processes,2 each one following some preferred order and notation, but covering essentially the same topics. This chapter is just one more attempt to present the subject in a compact manner. It is structured in the usual way: probability is first introduced, then described through random variables; within this framework, stochastic processes are presented, then associated with processing systems. Due to space limitations, proofs were avoided, and examples were kept to a minimum. No attempt was made to cover the many families of probability distributions, for example. The preferred path was to define clearly and unambiguously the concepts and entities associated with the subject and, whenever possible, to give them simple and intuitive interpretations. Even at the risk of seeming redundant, the author decided to explicitly duplicate the formulations related to random processes and sequences (i.e., continuous- and discrete-time random processes); the idea was to always provide a direct answer to a consulting reader, instead of suggesting modifications to the given expressions.

Writing technical material always poses a difficult problem: which level of detail and depth will make the text useful? Our goal was to make the text approachable for an undergraduate student as well as a consistent reference for more advanced students or even researchers (re)visiting random processes. The author tried to be especially careful with notation consistency throughout the chapter in order to avoid confusion and ambiguity (which may easily occur in advanced texts). The choices of covered topics and order of presentation reflect several years of teaching the subject, and obviously match those of some preferred books. A selected list of didactic references on statistics [1,2], random variables and processes [3,4], and applications [5,6] is given at the end of the chapter. We hope you enjoy the reading.

1.04.2 Probability

Probabilistic models are useful to describe and study phenomena that cannot be precisely predicted. To establish a precise framework for the concept of probability, one should define a chance experiment, of which each trial yields an outcome s. The set of all possible outcomes is the sample space S. In this context, one speaks of the probability that one trial of the experiment yields an outcome s that belongs to a desired set image (when one says the event A has occurred), as illustrated in Figure 4.1.

image

Figure 4.1 Sample space S, event A, and outcome s.

There are two different views of probability: the subjectivist and the objectivist. For subjectivists, the probability measures someone’s degree of belief on the occurrence of a given event, while for objectivists it results from concrete reality. Polemics aside, objectivism can be more rewarding didactically.

From the objectivist viewpoint, one of the possible ways to define probability is the so-called classical or a priori approach: given an experiment whose possible outcomes s are equally likely, the probability of an event A is defined as the ratio between the number $N_A$ of acceptable outcomes (elements of A) and the number $N_S$ of possible outcomes (elements of S):

$P(A) = \dfrac{N_A}{N_S}$ (4.1)

The experiment of flipping a fair coin, with image, is an example in which the probabilities of both individual outcomes can be theoretically set: image.

Another way to define probability is the relative frequency or a posteriori approach: the probability that the event A occurs after one trial of a given experiment can be obtained by taking the limit of the ratio between the number $n_A$ of successes (i.e., occurrences of A) and the number n of experiment trials, as the number of repeats goes to infinity:

$P(A) = \lim\limits_{n \to \infty} \dfrac{n_A}{n}$ (4.2)

In the ideal coin flip experiment, one is expected to find equal probabilities for head and tail. On the other hand, this pragmatic approach allows modeling the non-ideal case, provided the experiment can be repeated.
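As a minimal numerical sketch (in Python, assuming NumPy is available), one can estimate the probability of heads for a fair coin by the relative frequency of Eq. (4.2) for a growing number of trials:

```python
import numpy as np

# Estimate P(head) for a fair coin by relative frequency (Eq. (4.2)).
rng = np.random.default_rng(0)
for n in (10, 1_000, 100_000):
    flips = rng.integers(0, 2, size=n)   # 1 = head, 0 = tail, equally likely
    n_A = np.count_nonzero(flips == 1)   # number of successes (occurrences of "head")
    print(f"n = {n:6d}:  n_A/n = {n_A / n:.4f}")
# The ratio approaches 0.5 as n grows, in agreement with the a priori approach.
```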

A pure mathematical definition can provide a sufficiently general framework to encompass every conceptual choice: the axiomatic approach develops a complete probability theory from three axioms:

1. $P(A) \ge 0$.

2. $P(S) = 1$.

3. Given $A_i$, $i = 1, 2, \ldots$, such that $A_i \cap A_j = \emptyset$ for every $i \ne j$, then $P\!\left(\bigcup_i A_i\right) = \sum_i P(A_i)$.

Referring to an experiment with sample space S:

• The events image, are said to be mutually exclusive when the occurrence of one prevents the occurrence of the others. They are the subject of the third axiom.

• The complementimage of a given event A, illustrated in Figure 4.2, is determined by the non-occurrence of A, i.e., image. From this definition, image. Complementary events are also mutually exclusive.

image

Figure 4.2 Event A and its complement image.

• An event image is called impossible. From this definition, image.

• An event image is called certain. From this definition, image.

It should be emphasized that all events A related to a given experiment are completely determined by the sample space S, since by definition image. Therefore, a set B of outcomes not in S is mapped to an event image. For instance, for the experiment of rolling a fair die, image; the event corresponding to image (showing a 7) is image.

According to the experiment, sample spaces may be countable or uncountable.3 For example, the sample space of the coin flip experiment is countable. On the other hand, the sample space of the experiment that consists in sampling with no preference any real number from the interval image is uncountable. Given an individual outcome image, defining image,

image (4.3)

As a consequence, one should not be surprised to find an event image with image or an event image with image. In the image interval sampling experiment, image has image; and image has image.

1.04.2.1 Joint, conditional, and total probability—Bayes’ rule

The joint probability of a set of events image, is the probability of their simultaneous occurrence image. Referring to Figure 4.3, given two events A and B, their joint probability can be found as

$P(A \cap B) = P(A) + P(B) - P(A \cup B)$ (4.4)

By rewriting this equation as

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$ (4.5)

one finds an intuitive result: the term image is included twice in image, and should thus be discounted. Moreover, when A and B are mutually exclusive, we arrive at the third axiom. In the die experiment, defining image and image, and image.

image

Figure 4.3 image and image.

The conditional probability $P(A|B)$ is the probability of event A conditioned on the occurrence of event B, and can be computed as

$P(A|B) = \dfrac{P(A \cap B)}{P(B)}$ (4.6)

The value of image accounts for the uncertainty of both A and B; the term image discounts the uncertainty of B, since it is certain in this context. In fact, the conditioning event B is the new (reduced) sample space for the experiment. By rewriting this equation as

$P(A \cap B) = P(A|B)\,P(B)$ (4.7)

one gets another interpretation: the joint probability of A and B combines the uncertainty of B with the uncertainty of A when B is known to occur. Using the example in the last paragraph, image in sample space image.

Since $P(A \cap B) = P(A|B)\,P(B) = P(B|A)\,P(A)$, Bayes' rule follows straightforwardly:

$P(A|B) = \dfrac{P(B|A)\,P(A)}{P(B)}$ (4.8)

This formula allows computing one conditional probability image from its reverse image. Using again the example in the last paragraph, one would arrive at the same result for image using image in sample space image in the last equation.

If the sample space S is partitioned into M disjoint sets $A_m$, $m = 1, 2, \ldots, M$, such that $\bigcup_{m=1}^{M} A_m = S$, then any event $B \subseteq S$ can be written as $B = \bigcup_{m=1}^{M} (B \cap A_m)$. Since the $B \cap A_m$, $m = 1, 2, \ldots, M$, are disjoint sets,

$P(B) = \sum\limits_{m=1}^{M} P(B|A_m)\,P(A_m)$ (4.9)

which is called the total probability of B.

For a single event of interest A, for example, the application of Eq. (4.9) to Eq. (4.8) yields

$P(A|B) = \dfrac{P(B|A)\,P(A)}{\sum_{m=1}^{M} P(B|A_m)\,P(A_m)}$ (4.10)

Within the Bayes context, $P(A)$ and $P(A|B)$ are usually known as the a priori and a posteriori probabilities of A, respectively; $P(B|A)$ is called the likelihood in the estimation context, and also the transition probability in the communications context. In the latter case, A could refer to a symbol a sent by the transmitter and B, to a symbol b recognized by the receiver. Knowing $P(B|A)$, the probability of recognizing b when a is sent (which models the communication channel), and $P(A)$, the a priori probability of the transmitter sending a, allows one to compute $P(A|B)$, the probability of a having been sent given that b has been recognized. In the case of binary communication, we could partition the sample space either into the events TX0 (0 transmitted) and TX1 (1 transmitted), or into the events RX0 (0 recognized) and RX1 (1 recognized), as illustrated in Figure 4.4; the event “error” would be $(\mathrm{TX0} \cap \mathrm{RX1}) \cup (\mathrm{TX1} \cap \mathrm{RX0})$.

image

Figure 4.4 Binary communication example: partitions of S regarding transmission and reception.
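As a minimal numerical sketch of this reasoning (the priors and transition probabilities below are assumed purely for illustration, not taken from the example), the a posteriori probability of a transmitted symbol and the probability of error follow directly from Eqs. (4.9) and (4.10):

```python
# Hypothetical binary channel; all numerical values are illustrative assumptions.
P_TX0, P_TX1 = 0.6, 0.4                 # a priori probabilities of sending 0 or 1
P_RX1_given_TX0 = 0.05                  # transition probabilities (channel model)
P_RX1_given_TX1 = 0.90

# Total probability of recognizing a 1 (Eq. (4.9))
P_RX1 = P_RX1_given_TX0 * P_TX0 + P_RX1_given_TX1 * P_TX1

# A posteriori probability that a 1 was sent given that a 1 was recognized (Eq. (4.10))
P_TX1_given_RX1 = P_RX1_given_TX1 * P_TX1 / P_RX1
print(f"P(RX1) = {P_RX1:.3f},  P(TX1|RX1) = {P_TX1_given_RX1:.3f}")

# Probability of the event "error" = (TX0 and RX1) or (TX1 and RX0)
P_error = P_RX1_given_TX0 * P_TX0 + (1 - P_RX1_given_TX1) * P_TX1
print(f"P(error) = {P_error:.3f}")
```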

1.04.2.2 Probabilistic independence

The events image are said to be mutually independent when the occurrence of one does not affect the occurrence of any combination of the others.

For two events A and B, three equivalent tests can be employed: they are independent if and only if

1. $P(A|B) = P(A)$, or

2. $P(B|A) = P(B)$, or

3. $P(A \cap B) = P(A)\,P(B)$.

The first two conditions follow directly from the definition of independence. Using any of them in Eq. (4.6) one arrives at the third one. The reader is invited to return to Eq. (4.7) and conclude that B and image are independent events. Consider the experiment of rolling a special die with the numbers 1, 2, and 3 stamped in black on three faces and stamped in red on the other three; the events image and image are mutually independent.

Algorithmically, the best choice for testing the mutual independence of more than two events is using the third condition for every combination of image events among image.

As a final observation, mutually exclusive events are not mutually independent; on the contrary, one could say they are maximally dependent, since the occurrence of one precludes the occurrence of the other.

1.04.2.3 Combined experiments—Bernoulli trials

The theory we have discussed so far can be applied when more than one experiment is performed at a time. The generalization to the multiple-experiment case can be easily done by using Cartesian products.

Consider the experiments image with their respective sample spaces image. Define the combined experiment E such that each of its trials is composed by one trial of each image. An outcome of E is the M-tuple image, where image, and the sample space of E can be written as image. Analogously, any event of E can be expressed as image, where image is a properly chosen event of image.

In the special case when the sub-experiments are mutually independent, i.e., the outcomes of one do not affect the outcomes of the others, we have image. However, this is not the general case. Consider the experiment of randomly selecting a card from a 52-card deck, repeated twice: if the first card drawn is replaced, the sub-experiments are independent; if not, the first outcome affects the second experiment. For example, for image if the first card is replaced, and image if not.

At this point, an interesting counting experiment (called Bernoulli trials) can be defined. Take a random experiment E with sample space S and a desired event A (success) with $P(A) = p$, which also defines $\overline{A}$ (failure) with $P(\overline{A}) = 1 - p$. What is the probability of getting exactly k successes in N independent repeats of E? The solution can be easily found by noticing that the desired result is composed of k successes and $N - k$ failures, which may occur in $\binom{N}{k}$ different orders. Then,

$P(k \text{ successes in } N \text{ trials}) = \binom{N}{k}\,p^k\,(1-p)^{N-k}$ (4.11)

When $N \gg 1$ and $p \ll 1$,

$P(k \text{ successes in } N \text{ trials}) \approx \dfrac{(Np)^k}{k!}\,\mathrm{e}^{-Np}$ (4.12)

Returning to the card deck experiment (with replacement): the probability of selecting exactly 2 aces of spades after 300 repeats is approximately 5.09% according to Eq. (4.11), while Eq. (4.12) provides the approximate value 5.20%. In a binary communication system where 0 and 1 are randomly transmitted with equal probabilities, the probability that exactly 2 bits 1 are sent among 3 bits transmitted is 0.375; this result can be easily checked by inspection of Figure 4.5.

image

Figure 4.5 Three consecutive bits sent through a binary communication system.
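A short sketch (in Python) reproduces these counts: the exact Bernoulli-trial probability of Eq. (4.11), its approximation by Eq. (4.12) for the card-deck example, and the three-bit case:

```python
from math import comb, exp, factorial

def bernoulli_trials(N, k, p):
    """Probability of exactly k successes in N independent trials (Eq. (4.11))."""
    return comb(N, k) * p**k * (1 - p)**(N - k)

def poisson_approx(N, k, p):
    """Approximation for large N and small p (Eq. (4.12))."""
    return (N * p)**k * exp(-N * p) / factorial(k)

# Card-deck example: exactly 2 aces of spades in 300 draws with replacement
print(bernoulli_trials(300, 2, 1 / 52))   # ~0.0509 (about 5.09%)
print(poisson_approx(300, 2, 1 / 52))     # ~0.0520 (about 5.20%)

# Binary communication: exactly 2 ones among 3 equiprobable bits
print(bernoulli_trials(3, 2, 0.5))        # 0.375
```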

1.04.3 Random variable

Mapping each outcome of a random experiment to a real number provides a different framework for the study of probabilistic models, amenable to simple interpretation and easy mathematical manipulation. This mapping is performed by the so-called random variable.

Given a random experiment with sample space S, a random variable is any function image that maps each image into some image (see Figure 4.6). The image of this transformation with domain S and co-domain image, which results of the convenient choice of the mapping function, can be seen as the sample space of image; any event A of image can be described as a subset of image; and each mapped outcome x is called a sample of image. The following conditions should be satisfied by a random variable image:

1. image is an event image;

2. image.

We will see later that these conditions allow the proper definition of the cumulative probability distribution function of image.

image

Figure 4.6 Sample space S mapped into image via random variable image.

As seen in Section 1.04.2, sample spaces (and events) can be countable or uncountable, according to the nature of the random experiment’s individual outcomes. After mapped into subsets of image, countable events remain countable—e.g., one could associate with the coin flip experiment a random variable image such that image and image. On the other hand, uncountable events may or may not remain uncountable. Consider the following four distinct definitions of a random variable image associated with the image interval experiment described immediately before Eq. (4.3):

1. image;

2. image

3. image

4. image.

The sample space of:

1. image, is uncountable;

2. image, is countable;

3. image, is part uncountable, part countable;

4. image, even if continuous, may be difficult to classify.

The classification of sample spaces as countable or uncountable leads directly to the classification of random variables as discrete, continuous, or mixed. A discrete random variable image has a countable sample space image, and has image—this is the case of image and image defined above. A continuous random variable image has an uncountable sample space image, and (to avoid ambiguity) has image—this is the case of image defined above. A mixed variable image has a sample space image composed by the union of real intervals with continuously distributed probabilities, within which image, and discrete real values with finite probabilities image—this is the case of image and image defined above. Specifically, since image, image should rather be treated as part uncountable, part countable than as simply uncountable.

1.04.3.1 Probability distributions

From the conditions that must be satisfied by any random variable image, an overall description of its probability distribution can be provided by the so-called cumulative probability distribution function (shortened to CDF),

$F_{\mathbf{x}}(x) = P\{\mathbf{x} \le x\}$ (4.13)

Since image. It is obvious that image; but since image. Moreover, image and is a non-decreasing function of x, i.e., image. Also, image, i.e., image is continuous from the right; this will be important in the treatment of discrete random variables. A typical CDF is depicted in Figure 4.7. One can use the CDF to calculate probabilities by noticing that

$P\{x_1 < \mathbf{x} \le x_2\} = F_{\mathbf{x}}(x_2) - F_{\mathbf{x}}(x_1)$ (4.14)

For the random variable image whose distribution is described in Figure 4.7, one can easily check by inspection that image.

image

Figure 4.7 Example of cumulative probability distribution function.

The CDF of the random variable image associated above with the image interval sampling experiment is

image (4.15)

The CDF of the random variable image associated above with the coin flip experiment is

image (4.16)

Given a random variable image, any single value image such that image contributes a step4 of amplitude p to image. Then, it is easy to conclude that for a discrete random variable image with image,

image (4.17)

An even more informative function that can be derived from the CDF to describe the probability distribution of a random variable image is the so-called probability density function (shortened to PDF),

$f_{\mathbf{x}}(x) = \dfrac{\mathrm{d}F_{\mathbf{x}}(x)}{\mathrm{d}x}$ (4.18)

Since $F_{\mathbf{x}}(x)$ is a non-decreasing function of x, it follows that $f_{\mathbf{x}}(x) \ge 0$. From the definition,

$F_{\mathbf{x}}(x) = \int_{-\infty}^{x} f_{\mathbf{x}}(u)\,\mathrm{d}u$ (4.19)

Then,

$\int_{-\infty}^{\infty} f_{\mathbf{x}}(x)\,\mathrm{d}x = 1$ (4.20)

The PDF corresponding to the CDF shown in Figure 4.7 is depicted in Figure 4.8. One can use the PDF to calculate probabilities by noticing that

$P\{x_1 < \mathbf{x} \le x_2\} = \int_{x_1}^{x_2} f_{\mathbf{x}}(x)\,\mathrm{d}x$ (4.21)

Again, for the random variable image whose distribution is described in Figure 4.8, one can easily check by inspection that image.

image

Figure 4.8 Example of probability density function.

The PDF of the random variable image associated above with the image interval sampling experiment is

image (4.22)

The PDF of the random variable image associated above with the coin flip experiment is

image (4.23)

which is not well-defined. But coherently with what has been seen for the CDF, given a random variable image, any single value image such that image contributes an impulse5 of area p to image. Then, it is easy to conclude that for a discrete random variable image with image,

$f_{\mathbf{x}}(x) = \sum\limits_{i} P\{\mathbf{x} = x_i\}\,\delta(x - x_i)$ (4.24)

In particular, for the coin flip experiment the PDF is image.

In the case of discrete random variables, in order to avoid the impulses in the PDF, one can operate directly on the so-called mass probability function6:

image (4.25)

In this case,

image (4.26)

This chapter favors an integrated framework for continuous and discrete variables, based on CDFs and PDFs.

It is usual in the literature to refer (for short) to the CDF as “the distribution” and to the PDF as “the density” of the random variable. This text avoids this loose terminology, since the word “distribution” better applies to the overall probabilistic behavior of the random variable, whether in the form of a CDF or a PDF.

1.04.3.2 Usual distributions

The simplest continuous distribution is the so-called uniform. A random variable $\mathbf{x}$ is said to be uniformly distributed between a and b if its PDF is

$f_{\mathbf{x}}(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b, \\ 0, & \text{otherwise}, \end{cases}$ (4.27)

depicted in Figure 4.9. Notice that the inclusion or not of the interval bounds is unimportant here, since the variable is continuous. The error produced by uniform quantization of real numbers is an example of uniform random variable.

image

Figure 4.9 Uniform probability distribution.

Perhaps the most recurrent continuous distribution is the so-called Gaussian (or normal). A Gaussian random variable image is described by the PDF

$f_{\mathbf{x}}(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma_{\mathbf{x}}}\,\mathrm{e}^{-\frac{(x-\mu_{\mathbf{x}})^2}{2\sigma_{\mathbf{x}}^2}}$ (4.28)

As seen in Figure 4.10, this function is symmetrical around image, with spread controlled by image. These parameters, respectively called statistical mean and standard deviation of image, will be precisely defined in Section 1.04.3.4. The Gaussian distribution arises from the combination of several independent random phenomena, and is often associated with noise models.

image

Figure 4.10 Gaussian probability distribution.

There is no closed expression for the Gaussian CDF, which is usually tabulated for a normalized Gaussian random variable image with image and image such that

image (4.29)

In order to compute image for a Gaussian random variable image, one can build an auxiliary variable image such that image, which can then be approximated by tabulated values of image.

Example 1

The values r of a 10-% resistor series produced by a component industry can be modeled by a Gaussian random variable image with image. What is the probability of producing a resistor within image of image?

Solution 1

The normalized counterpart of image is image. Then,

image (4.30)

Notice once more that since the variable is continuous, the inclusion or not of the interval bounds has no influence on the result.
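A minimal sketch of this normalization procedure follows; since the example's numerical values are not reproduced here, the nominal value and standard deviation below are assumed purely for illustration, and the normalized Gaussian CDF is evaluated through the error function:

```python
from math import erf, sqrt

def phi(z):
    """CDF of the normalized Gaussian variable evaluated via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Illustrative numbers only (the example's actual values are not reproduced here):
mu, sigma = 100.0, 5.0          # assumed mean and standard deviation, in ohms
lo, hi = 0.9 * mu, 1.1 * mu     # within +/-10% of the nominal value

# P(lo < r <= hi) through the auxiliary normalized variable z = (r - mu)/sigma
z_lo, z_hi = (lo - mu) / sigma, (hi - mu) / sigma
print(f"P = {phi(z_hi) - phi(z_lo):.4f}")   # ~0.9545 for these assumed numbers
```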

In Section 1.04.2.3, an important discrete random variable was implicitly defined. A random variable image that follows the so-called binomial distribution is described by

$P\{\mathbf{x} = k\} = \binom{N}{k}\,p^k\,(1-p)^{N-k}, \quad k = 0, 1, \ldots, N,$ (4.31)

and counts the number of occurrences of an event A which has probability p, after N independent repeats of a random experiment. Games of chance are related to binomial random variables.

The so-called Poisson distribution, described by

$P\{\mathbf{x} = k\} = \dfrac{(\lambda T)^k}{k!}\,\mathrm{e}^{-\lambda T}, \quad k = 0, 1, 2, \ldots,$ (4.32)

counts the number of occurrences of an event A that follows a mean rate of image occurrences per unit time, during the time interval T. Traffic studies are related to Poisson random variables.

It is simple to derive the Poisson distribution from the binomial distribution. If an event A may occur with no preference anytime during a time interval image, the probability that A occurs within the time interval image is image. If A occurs N times in image, the probability that it falls exactly k times in image is image. If image; if A follows a mean rate of image occurrences per unit time, then image and the probability of exact k occurrences in a time interval of duration T becomes image, where image substituted for Np. If a central office receives 100 telephone calls per minute in the mean, the probability that more than 1 call arrive during 1s is image.
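As a minimal numerical check of the central-office example (assuming a mean rate of 100 calls per minute observed over 1 s), the probability of more than one call follows directly from Eq. (4.32):

```python
from math import exp, factorial

def poisson_pmf(k, lam_T):
    """P{exactly k occurrences} for a mean count lam_T in the interval (Eq. (4.32))."""
    return lam_T**k * exp(-lam_T) / factorial(k)

lam = 100 / 60          # mean rate: 100 calls per minute = 100/60 calls per second
T = 1.0                 # observation interval of 1 s
lam_T = lam * T

# Probability of more than 1 call: 1 - P{0 calls} - P{1 call}
p_more_than_1 = 1.0 - poisson_pmf(0, lam_T) - poisson_pmf(1, lam_T)
print(f"P(more than 1 call in 1 s) = {p_more_than_1:.3f}")   # ~0.496
```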

Due to space restrictions, this chapter does not detail other random distributions, which can be easily found in the literature.

1.04.3.3 Conditional distribution

In many instances, one is interested in studying the behavior of a given random variable image under some constraints. A conditional distribution can be built to this effect, if the event image summarizes those constraints. The corresponding CDF of image conditioned to B can be computed as

image (4.33)

The related PDF is simply

image (4.34)

Conditional probabilities can be straightforwardly computed from conditional CDFs or PDFs.

When the conditioning event is an interval image, it can be easily deduced that

image (4.35)

and

image (4.36)

Both sentences in Eq. (4.36) can be easily interpreted:

• Within image, the conditional PDF has the same shape of the original one; the normalization factor image ensures image in the restricted sample space image.

• By definition, there is null probability of getting x outside image.

As an example, the random variable H defined by the non-negative outcomes of a normalized Gaussian variable follows the PDF

image (4.37)

The results shown in this section can be promptly generalized to any conditioning event.

1.04.3.4 Statistical moments

The concept of mean needs no special introduction: it is the single value that, substituted for each member of a set of numbers, produces the same total. In the context of probability, following a frequentist path, one could define the arithmetic mean of infinite samples of a random variable $\mathbf{x}$ as its statistical mean $\mu_{\mathbf{x}}$ or its expected value $E[\mathbf{x}]$.

Recall the random variable image associated with the fair coin flip experiment, with image. After infinite repeats of the experiment, one gets 50% of heads (image) and 50% of tails (image); then, the mean outcome will be7image. If another variable image is associated with an unfair coin with probabilities image and image, the same reasoning leads to image. Instead of averaging infinite outcomes, just summing the possible values of the random variable weighted by their respective probabilities also yields its statistical mean. Thus we can state that for any discrete random variable image with sample space image,

$E[\mathbf{x}] = \sum\limits_{i} x_i\,P\{\mathbf{x} = x_i\}$ (4.38)

This result can be generalized. Given a continuous random variable image, the probability of drawing a value in the interval image around image is given by image. The weighted sum of every image is simply

$E[\mathbf{x}] = \mu_{\mathbf{x}} = \int_{-\infty}^{\infty} x\,f_{\mathbf{x}}(x)\,\mathrm{d}x$ (4.39)

By substituting the PDF of a discrete variable image (see Eq. (4.24)) into this expression, one arrives at Eq. (4.38). Then, Eq. (4.39) is the analytic expression for the expected value of any random variable image.
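As a minimal sketch linking the frequentist and analytic views (a fair die is assumed as the discrete example), the probability-weighted sum of Eq. (4.38) can be compared with the arithmetic mean of a large number of statistical samples:

```python
import numpy as np

# Fair die: discrete random variable with values 1..6, each with probability 1/6
values = np.arange(1, 7)
probs = np.full(6, 1 / 6)

# Statistical mean as the probability-weighted sum of values (Eq. (4.38))
mean_weighted = np.sum(values * probs)

# Frequentist check: arithmetic mean of many statistical samples
rng = np.random.default_rng(1)
samples = rng.choice(values, size=1_000_000, p=probs)
print(mean_weighted, samples.mean())   # both close to 3.5
```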

Suppose another random variable image is built as a function of image. Since the probability of getting the value image is the same as getting the respective image, we can deduce that

$E[\mathbf{y}] = E[g(\mathbf{x})] = \int_{-\infty}^{\infty} g(x)\,f_{\mathbf{x}}(x)\,\mathrm{d}x$ (4.40)

A complete family of measures based on expected values can be associated with a random variable. The so-called nth-order moment of image (about the origin) is defined as

$m_n = E[\mathbf{x}^n] = \int_{-\infty}^{\infty} x^n\,f_{\mathbf{x}}(x)\,\mathrm{d}x$ (4.41)

The first two moments of image are:

• image, i.e., the mean of image, given by the centroid of image.

• image, i.e., the mean square value of image.

A modified family of parameters can be formed by computing the moments about the mean. The so-called nth-order central moment of image is defined as

$\mu_n = E\!\left[(\mathbf{x} - \mu_{\mathbf{x}})^n\right] = \int_{-\infty}^{\infty} (x - \mu_{\mathbf{x}})^n\,f_{\mathbf{x}}(x)\,\mathrm{d}x$ (4.42)

Subtracting the statistical mean from a random variable can be interpreted as disregarding its “deterministic” part (represented by its statistical mean). Three special cases deserve attention:

• image, known as the variance of image, which measures the spread of image around image. The so-called standard deviationimage is a convenient measure with the same dimension of image.

• image, whose standardized version image is the so-called skewness of image, which measures the asymmetry of image.

• image, whose standardized version8 minus three image is the so-called kurtosis of image, which measures the peakedness of image. One can say the distributions are measured against the Gaussian, which has a null kurtosis.

Of course, analogous definitions apply to the moments computed over conditional distributions.

A useful expression, which will be recalled in the context of random processes, relates three of the measures defined above:

$\sigma_{\mathbf{x}}^2 = E[\mathbf{x}^2] - \mu_{\mathbf{x}}^2$ (4.43)

Rewritten as

$E[\mathbf{x}^2] = \mu_{\mathbf{x}}^2 + \sigma_{\mathbf{x}}^2$ (4.44)

it allows one to split the overall “intensity” of $\mathbf{x}$, measured by its mean square value, into a deterministic part, represented by $\mu_{\mathbf{x}}^2$, and a random part, represented by $\sigma_{\mathbf{x}}^2$.

As an example, consider a discrete random variable image distributed as shown in Table 4.1. Their respective parameters are:

• mean image, indicating the PDF of image is shifted to the right of the origin;

• mean square value image, which measures the intensity of image;

• variance image, which measures the random part of the intensity of image;

• standard deviation image, which measures the spread of image around its mean;

• skewness image, indicating the PDF of image is left-tailed;

• kurtosis image, indicating the PDF of image is less peaky than the PDF of a Gaussian variable.

Table 4.1

Statistical Distribution of image

Image

At this point, we can discuss two simple transformations of a random variable image whose effects can be summarized by low-order moments:

• A random variable image can be formed by adding a fixed offset image to each sample of image. As a consequence of this operation, the new PDF is a shifted version of the original one: image, thus adding image to the mean of image: image.

• A random variable image can be formed by scaling by image each sample of image. As a consequence of this operation, the new PDF is a scaled version of the original one: image, thus scaling by image the standard deviation of image: image.

Such transformations do not change the shape (and therefore the type) of the original distribution. In particular, one can generate:

• a zero-mean version of image by making image, which disregards the deterministic part of image;

• a unit-standard deviation version of image by making image, which enforces a standard statistical variability to image;

• a normalized version of image by making image, which combines both effects.

We already defined a normalized Gaussian distribution in Section 1.04.3.2. The normalized version image of the random variable image of the last example would be distributed as shown in Table 4.2.

Table 4.2

Statistical Distribution of image

Image

The main importance of these expectation-based parameters is to provide a partial description of an underlying distribution without the need to resort to the PDF. In a practical situation in which only a few samples of a random variable are available, it is easier to get reliable estimates for the moments (especially the low-order ones) than for the PDF itself. The same rationale applies to the use of certain auxiliary inequalities that avoid the direct computation of probabilities on a random variable (which would otherwise require the knowledge or estimation of its PDF) by providing upper bounds for them based on low-order moments. Two such inequalities are:

• Markov’s inequality for a non-negative random variable $\mathbf{x}$: $P\{\mathbf{x} \ge a\} \le \dfrac{E[\mathbf{x}]}{a}$, for $a > 0$.

• Chebyshev’s inequality for a random variable $\mathbf{x}$: $P\{|\mathbf{x} - \mu_{\mathbf{x}}| \ge a\} \le \dfrac{\sigma_{\mathbf{x}}^2}{a^2}$, for $a > 0$.

The derivation of Markov’s inequality is quite simple:

$P\{\mathbf{x} \ge a\} = \int_{a}^{\infty} f_{\mathbf{x}}(x)\,\mathrm{d}x \le \int_{a}^{\infty} \dfrac{x}{a}\,f_{\mathbf{x}}(x)\,\mathrm{d}x \le \dfrac{1}{a}\int_{0}^{\infty} x\,f_{\mathbf{x}}(x)\,\mathrm{d}x = \dfrac{E[\mathbf{x}]}{a}$ (4.45)

Chebyshev’s inequality follows directly by substituting image for image in Markov’s inequality. As an example, the probability of getting a sample image from the normalized Gaussian random variable image is image; Chebyshev’s inequality predicts image, an upper bound almost 100 times greater than the actual probability.
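A small numerical comparison makes the looseness of the bound visible; the threshold of three standard deviations below is an assumed, illustrative choice:

```python
from math import erf, sqrt

def phi(z):
    """Normalized Gaussian CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

a = 3.0   # assumed threshold (in standard deviations); illustrative choice only
actual = 2.0 * (1.0 - phi(a))      # P{|z| >= a} for the normalized Gaussian
chebyshev = 1.0 / a**2             # Chebyshev upper bound: sigma^2 / a^2 with sigma = 1
print(f"actual = {actual:.5f}, Chebyshev bound = {chebyshev:.5f}")
# The bound holds but is loose: roughly 40 times the actual probability for a = 3.
```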

Two representations of the distribution of a random variable in alternative domains provide interesting links with its statistical moments. The so-called characteristic function of a random variable image is defined as

$\Phi_{\mathbf{x}}(\omega) = E\!\left[\mathrm{e}^{\mathrm{j}\omega\mathbf{x}}\right] = \int_{-\infty}^{\infty} f_{\mathbf{x}}(x)\,\mathrm{e}^{\mathrm{j}\omega x}\,\mathrm{d}x$ (4.46)

and inter-relates the moments image by successive differentiations:

$m_n = (-\mathrm{j})^n \left.\dfrac{\mathrm{d}^n \Phi_{\mathbf{x}}(\omega)}{\mathrm{d}\omega^n}\right|_{\omega = 0}$ (4.47)

The so-called moment generating function of a random variable image is defined as

$M_{\mathbf{x}}(v) = E\!\left[\mathrm{e}^{v\mathbf{x}}\right] = \int_{-\infty}^{\infty} f_{\mathbf{x}}(x)\,\mathrm{e}^{vx}\,\mathrm{d}x$ (4.48)

and provides a more direct way to compute image:

$m_n = \left.\dfrac{\mathrm{d}^n M_{\mathbf{x}}(v)}{\mathrm{d}v^n}\right|_{v = 0}$ (4.49)

1.04.3.5 Transformation of random variables

Consider the problem of describing a random variable image obtained by transformation of another random variable image, illustrated in Figure 4.11. By definition,

image (4.50)

If image is a continuous random variable and image is a differentiable function, the sentence image is equivalent to a set of sentences in the form image. Since image, then image can be expressed as a function of image.

image

Figure 4.11 Random variable image mapped into random variable image.

Fortunately, an intuitive formula expresses image as a function of image:

$f_{\mathbf{y}}(y) = \sum\limits_{i} \left.\dfrac{f_{\mathbf{x}}(x)}{\left|\frac{\mathrm{d}g(x)}{\mathrm{d}x}\right|}\right|_{x = x_i}$, where the $x_i$ are the real roots of $y = g(x)$. (4.51)

Given that the transformation $g(\cdot)$ is not necessarily monotonic, the $x_i$ are all possible values mapped to y; therefore, their contributions must be summed up. It is reasonable that $f_{\mathbf{y}}(y)$ must be directly proportional to $f_{\mathbf{x}}(x_i)$: the more frequent a given $x_i$, the more frequent its respective y. In its turn, the term $\left|\frac{\mathrm{d}g(x)}{\mathrm{d}x}\right|$ accounts for the distortion imposed on $f_{\mathbf{x}}(x)$ by $g(\cdot)$. For example, if this transformation is almost constant in a given region, no matter if increasing or decreasing, then the derivative in the denominator will be close to zero; this just reflects the fact that a wide range of x values will be mapped into a narrow range of values of y, which will then become denser than x in that region.

As an example, consider that a continuous random variable image is transformed into a new variable image. For the CDF of image,

image (4.52)

By Eq. (4.51), or by differentiation of Eq. (4.52), one arrives at the PDF of image:

image (4.53)

The case when real intervals of image are mapped into single values of image requires an additional care to treat the nonzero individual probabilities of the resulting values of the new variable. However, the case of a discrete random variable image with image being transformed is trivial:

image (4.54)

Again, image are all possible values mapped to image.

An interesting application of transformations of random variables is to obtain samples of a given random variable $\mathbf{y}$ from samples of another random variable $\mathbf{x}$, both with known distributions. Assume that there exists $g(\cdot)$ such that $\mathbf{y} = g(\mathbf{x})$. Then, $g(x) = F_{\mathbf{y}}^{-1}(F_{\mathbf{x}}(x))$, which requires only the invertibility of $F_{\mathbf{y}}(\cdot)$.
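A common special case of this idea is inverse-transform sampling, in which $\mathbf{x}$ is uniform in [0, 1), so that $F_{\mathbf{x}}(x) = x$ and $g(\cdot)$ reduces to the inverse CDF of the target distribution. The sketch below assumes an exponential target (an illustrative choice, not an example from the text):

```python
import numpy as np

# Inverse-transform sketch: draw x uniform in [0, 1) and map it through the inverse
# CDF of the target distribution (here an exponential with rate lam, an assumption).
rng = np.random.default_rng(2)
lam = 2.0
x = rng.random(1_000_000)              # samples of a uniform random variable
y = -np.log(1.0 - x) / lam             # inverse CDF of the exponential distribution

print(y.mean(), 1 / lam)                     # sample mean approaches 1/lam
print(np.mean(y <= 1.0), 1 - np.exp(-lam))   # empirical vs. exact CDF at y = 1
```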

1.04.3.6 Multiple random variable distributions

A single random experiment E (composed or not) may give rise to a set of N random variables, if each individual outcome image is mapped into several real values image. Consider, for example, the random experiment of sampling the climate conditions at every point on the globe; daily mean temperature image and relative air humidity image can be considered as two random variables that serve as numerical summarizing measures. One must generalize the probability distribution descriptions to cope with this multiple random variable situation: it should be clear that the set of N individual probability distribution descriptions, one for each distinct random variable, provides less information than their joint probability distribution description, since the former cannot convey information about their mutual influences.

We start by defining a multiple or vector random variable as the function image that maps image into image, such that

image (4.55)

and

image (4.56)

Notice that image are jointly sampled from image.

The joint cumulative probability distribution function of image, or simply the CDF of image, can be defined as

image (4.57)

The following relevant properties of the joint CDF can be easily deduced:

• image, since image for any image.

• image, since this condition encompasses the complete S.

• image and is a nondecreasing function of image, by construction.

Define image containing image variables of interest among the random variables in image, and leave the remaining imagenuisance variables in image. The so-called marginal CDF of image separately describes these variables’ distribution from the knowledge of the CDF of image:

image (4.58)

One says the variables image have been marginalized out: the condition image simply means they must be in the sample space, thus one does not need to care about them anymore.

The CDF of a discrete random variable image with image, analogously to the single variable case, is composed by steps at the admissible N-tuples:

image (4.59)

The joint probability density function of image, or simply the PDF of image, can be defined as

image (4.60)

Since image is a non-decreasing function of image, it follows that image. From the definition,

image (4.61)

Then,

image (4.62)

The probability of any event image can be computed as

image (4.63)

Once more we can marginalize the nuisance variables image to obtain the marginal PDF of image from the PDF of image:

image (4.64)

Notice that if image and image are statistically dependent, the marginalization of image does not “eliminate” the effect of image on image: the integration is performed over image, which describes their mutual influences. For a discrete random variable image with image consists of impulses at the admissible N-tuples:

image (4.65)

Suppose, for example, that we want to find the joint and marginal CDFs and PDFs of two random variables image and image jointly uniformly distributed in the region image, i.e., such that

image (4.66)

The marginalization of image yields

image (4.67)

The marginalization of image yields

image (4.68)

By definition,

image (4.69)

The reader is invited to sketch the admissible region image on the image plane and try to solve Eq. (4.69) by visual inspection. The marginal CDF of x is given by

image (4.70)

which could also have been obtained by direct integration of image. The marginal CDF of y is given by

image (4.71)

which could also have been obtained by direct integration of image.

Similarly to the univariate case discussed in Section 1.04.3.3, the conditional distribution of a vector random variable image restricted by a conditioning event image can be described by its respective CDF

image (4.72)

and PDF

image (4.73)

A special case arises when B imposes a point conditioning: Define image containing image variables among those in image, such that image. It can be shown that

image (4.74)

an intuitive result in light of Eq. (4.6)—to which one may directly refer in the case of discrete random variables. Returning to the last example, the point-conditioned PDFs can be shown to be

image (4.75)

and

image (4.76)

1.04.3.7 Statistically independent random variables

Based on the concept of statistical independence, discussed in Section 1.04.2.2, the mutual statistical independence of a set of random variables means that the probabilistic behavior of each one is not affected by the probabilistic behavior of the others. Formally, the random variables image are independent if any of the following conditions is fulfilled:

• image, or

• image, or

• image, or

• image.

Returning to the example developed in Section 1.04.3.6, x and y are clearly statistically dependent. However, if they were jointly uniformly distributed in the region defined by image, they would be statistically independent—this verification is left to the reader.

The PDF of a random variable $\mathbf{y} = \mathbf{x}_1 + \mathbf{x}_2 + \cdots + \mathbf{x}_N$, composed by the sum of N independent variables, can be computed as9

$f_{\mathbf{y}}(y) = f_{\mathbf{x}_1}(y) * f_{\mathbf{x}_2}(y) * \cdots * f_{\mathbf{x}_N}(y)$ (4.77)
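As a numerical check of Eq. (4.77) (a sketch assuming two independent variables uniformly distributed in [0, 1]), the convolution of the two uniform PDFs yields the triangular PDF of their sum, which can also be verified by a histogram of sampled sums:

```python
import numpy as np

# PDF of the sum of two independent uniform [0, 1] variables: convolution of the
# individual PDFs (Eq. (4.77)), which yields the triangular PDF on [0, 2].
dx = 0.001
x = np.arange(0.0, 1.0, dx)
f = np.ones_like(x)                      # uniform PDF on [0, 1)

f_sum = np.convolve(f, f) * dx           # numerical convolution of the two PDFs
y = np.arange(len(f_sum)) * dx           # support of the sum: [0, 2)

# Monte Carlo check: histogram of sampled sums
rng = np.random.default_rng(3)
samples = rng.random(1_000_000) + rng.random(1_000_000)
hist, _ = np.histogram(samples, bins=np.arange(0.0, 2.01, 0.1), density=True)
print(f_sum[y.searchsorted(1.0)], hist.max())   # values near the triangular peak at y = 1
```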

1.04.3.8 Joint statistical moments

The definition of statistical mean can be directly extended to a vector random variable image with known PDF. The expected value of a real scalar function image is given by:

image (4.78)

We can analogously proceed to the definition of N-variable joint statistical moments. However, due to the more specific usefulness of the image cases, only 2-variable moments will be presented.

An imageth-order joint moment of image and image (about the origin) has the general form

image (4.79)

The most important case corresponds to image: image is called the statistical correlation between image and image. Under a frequentist perspective, image is the average of infinite products of samples x and y jointly drawn from image, which links the correlation to an inner product between the random variables. Indeed, when image, image and image are called orthogonal. One can say the correlation quantifies the overall linear relationship between the two variables. Moreover, when image, image and image are called mutually uncorrelated; this “separability” in the mean is weaker than the distribution separability implied by the independence. In fact, if image and image are independent,

image (4.80)

i.e., they are also uncorrelated. The converse is not necessarily true.

An imageth-order joint central moment of image and image, computed about the mean, has the general form

image (4.81)

Once more, the case image is specially important: image is called the statistical covariance between image and image, which quantifies the linear relationship between their random parts. Since10image, image and image are uncorrelated when image. If image, one says image and image are positively correlated, i.e., the variations of their statistical samples tend to occur in the same direction; if image, one says image and image are negatively correlated, i.e., the variations of their statistical samples tend to occur in opposite directions. For example, the age and the annual medical expenses of an individual are expected to be positively correlated random variables. A normalized covariance can be computed as the correlation between the normalized versions of image and image:

$\rho_{\mathbf{x}\mathbf{y}} = E\!\left[\dfrac{(\mathbf{x} - \mu_{\mathbf{x}})}{\sigma_{\mathbf{x}}}\,\dfrac{(\mathbf{y} - \mu_{\mathbf{y}})}{\sigma_{\mathbf{y}}}\right] = \dfrac{C_{\mathbf{x}\mathbf{y}}}{\sigma_{\mathbf{x}}\,\sigma_{\mathbf{y}}}$ (4.82)

known as the correlation coefficient between $\mathbf{x}$ and $\mathbf{y}$, with $-1 \le \rho_{\mathbf{x}\mathbf{y}} \le 1$, which can be interpreted as the percentage of correlation between x and y. Recalling the inner product interpretation, the correlation coefficient can be seen as the cosine of the angle between the statistical variations of the two random variables.
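A minimal sketch (with synthetic, negatively correlated Gaussian data assumed for illustration; the values are unrelated to the example that follows) shows how correlation, covariance, and the correlation coefficient are computed from jointly drawn samples:

```python
import numpy as np

# Synthetic, negatively correlated pair (illustrative only):
rng = np.random.default_rng(4)
x = rng.normal(0.0, 1.0, 100_000)
y = -0.8 * x + rng.normal(0.0, 0.6, 100_000)

R_xy = np.mean(x * y)                              # correlation E[xy]
C_xy = np.mean((x - x.mean()) * (y - y.mean()))    # covariance
rho = C_xy / (x.std() * y.std())                   # correlation coefficient, in [-1, 1]
print(f"R_xy = {R_xy:.3f}, C_xy = {C_xy:.3f}, rho = {rho:.3f}")
print(np.corrcoef(x, y)[0, 1])                     # library check of rho
```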

Consider, for example, the discrete variables image and image jointly distributed as shown in Table 4.3. Their joint second-order parameters are:

• correlation image, indicating image and image are not orthogonal;

• covariance image, indicating image and image tend to evolve in opposite directions;

• correlation coefficient image, indicating this negative correlation is relatively strong.

Table 4.3

Statistical Distribution of image

Image

Returning to N-dimensional random variables, the characteristic function of a vector random variable image is defined as

image (4.83)

and inter-relates the moments of order image by:

image (4.84)

The moment generating function of a vector random variable image is defined as

image (4.85)

and allows the computation of image as:

image (4.86)

1.04.3.9 Central limit theorem

A result that partially justifies the ubiquitous use of Gaussian models, the Central Limit Theorem (CLT) states that (under mild conditions in practice) the distribution of a sum $\mathbf{y} = \mathbf{x}_1 + \mathbf{x}_2 + \cdots + \mathbf{x}_N$ of N independent variables approaches a Gaussian distribution as $N \to \infty$. Having completely avoided the CLT proof, we can at least recall that in this case $f_{\mathbf{y}}(y) = f_{\mathbf{x}_1}(y) * f_{\mathbf{x}_2}(y) * \cdots * f_{\mathbf{x}_N}(y)$ (see Eq. (4.77)). Of course, $\mu_{\mathbf{y}} = \sum_{i=1}^{N} \mu_{\mathbf{x}_i}$ and, by the independence property, $\sigma_{\mathbf{y}}^2 = \sum_{i=1}^{N} \sigma_{\mathbf{x}_i}^2$.

Gaussian approximations for finite-N models are useful for sufficiently high N, but due to the shape of the Gaussian distribution, the approximation error grows as y moves away from $\mu_{\mathbf{y}}$.

The reader is invited to verify the validity of the approximation provided by the CLT for successive sums of independent variables uniformly distributed in image; a numerical sketch of this verification is given below.
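One possible sketch of this verification (assuming uniform variables in [0, 1) and N = 12 terms) standardizes the sum and checks two Gaussian quantiles empirically:

```python
import numpy as np

# Sum of N independent uniform [0, 1) variables: mean N/2, variance N/12.
rng = np.random.default_rng(5)
N, M = 12, 200_000
y = rng.random((M, N)).sum(axis=1)

# Standardize the sum and compare empirical proportions with Gaussian values.
z = (y - N / 2) / np.sqrt(N / 12)
print(np.mean(np.abs(z) <= 1), "vs. 0.683")   # ~68.3% within one standard deviation
print(np.mean(np.abs(z) <= 2), "vs. 0.954")   # ~95.4% within two standard deviations
```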

Interestingly, the CLT also applies to the sum of discrete distributions: even if image remains impulsive, the shape of image approaches the shape of the CDF of a Gaussian random variable as N grows.

1.04.3.10 Multivariate Gaussian distribution

The PDF of an N-dimensional Gaussian variable is defined as

$f_{\mathbf{x}}(x_1, x_2, \ldots, x_N) = \dfrac{1}{\sqrt{(2\pi)^N \left|\mathbf{C}_{\mathbf{x}}\right|}}\,\mathrm{e}^{-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_{\mathbf{x}})^{\mathrm{T}} \mathbf{C}_{\mathbf{x}}^{-1} (\mathbf{x} - \boldsymbol{\mu}_{\mathbf{x}})}$ (4.87)

where image is an image vector with elements image, and image is an image matrix with elements image, such that any related marginal distribution remains Gaussian. As a consequence, by Eq. (4.74), any conditional distribution image with point conditioning B is also Gaussian.

By this definition, we can see the Gaussian distribution is completely defined by its first- and second-order moments, which means that if N jointly distributed random variables are known to be Gaussian, estimating their mean-vectorimage and their covariance-matriximage is equivalent to estimate their overall PDF. Moreover, if image and image are mutually uncorrelated, image, then image is diagonal, and the joint PDF becomes image, i.e., image will be mutually independent. This is a strong result inherent to Gaussian distributions.

1.04.3.11 Transformation of vector random variables

The treatment of a general multiple random variable image resulting from the application of the transformation image to the multiple variable image always starts by enforcing image, with image.

The special case when image is invertible, i.e., image (or image), follows a closed expression:

image (4.88)

where

image (4.89)

The reader should notice that this expression with image reduces to Eq. (4.51) particularized to the invertible case.

Given the multiple random variable image with known mean-vector image and covariance-matrix image, its linear transformation image will have:

• a mean-vector image

• a covariance-matrix image

It is possible to show that the linear transformation image of an N-dimensional Gaussian random variable image is also Gaussian. Therefore, in this case the two expressions above completely determine the PDF of the transformed variable.

The generation of samples that follow a desired multivariate probabilistic distribution from samples that follow another known multivariate probabilistic distribution applies the same reasoning followed in the univariate case to the multivariate framework. The reader is invited to show that given the pair image of samples from the random variables image, jointly uniform in the region image, it is possible to generate the pair image of samples from the random variables image, jointly Gaussian with the desired parameters image by making

image (4.90)
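This construction is commonly realized with the Box-Muller transform; the sketch below follows that standard route (which may differ in detail from Eq. (4.90)), and the target means, standard deviations, and correlation coefficient are hypothetical values chosen for illustration:

```python
import numpy as np

# Box-Muller-style sketch: from (x1, x2) jointly uniform in [0, 1) x [0, 1) to a
# standard Gaussian pair; the coloring step then imposes assumed target parameters.
rng = np.random.default_rng(6)
x1, x2 = rng.random(100_000), rng.random(100_000)

g1 = np.sqrt(-2.0 * np.log(1.0 - x1)) * np.cos(2.0 * np.pi * x2)   # independent
g2 = np.sqrt(-2.0 * np.log(1.0 - x1)) * np.sin(2.0 * np.pi * x2)   # standard Gaussians

mu1, mu2, s1, s2, rho = 1.0, -2.0, 2.0, 0.5, 0.7    # hypothetical target parameters
y1 = mu1 + s1 * g1
y2 = mu2 + s2 * (rho * g1 + np.sqrt(1.0 - rho**2) * g2)
print(y1.mean(), y2.mean(), np.corrcoef(y1, y2)[0, 1])   # ~1.0, ~-2.0, ~0.7
```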

1.04.3.12 Complex random variables

At first sight, defining a complex random variable image may seem impossible, since the seminal event image makes no sense. However, in the vector random variable framework this issue can be circumvented if one jointly describes the real and imaginary parts (or magnitude and phase) of image.

The single complex random variable image, is completely represented by image, which allows one to compute the expected value of a scalar function image as image. Moreover, we can devise general definitions for the mean and variance of image as, respectively:

• image;

• $\sigma_{\mathbf{x}}^2 = E\!\left[|\mathbf{x} - \mu_{\mathbf{x}}|^2\right]$, which measures the spread of $\mathbf{x}$ about its mean in the complex plane.

An N-dimensional complex random variable image can be easily tackled through a properly defined joint distribution, e.g., image. The individual variables image, image, will be independent when

image (4.91)

The following definitions must be generalized to cope with complex random variables:

• correlation: image;

• covariance: image.

Now, image. As before, image and image will be:

• orthogonal when image;

• uncorrelated when image.

1.04.3.13 An application: estimators

One of the most encompassing applications of vector random variables is the so-called parameter estimation, per se an area supported by its own theory. The typical framework comprises using a finite set of measured data from a given phenomenon to estimate the parameters11 of its underlying model.

A set image of N independent identically distributed (iid) random variables such that

image (4.92)

can describe an N-size random sample of a population modeled by image. Any function image (called a statistic of image) can constitute a point estimatorimage of some parameter image such that given an N-size sample image of image, one can compute an estimate image of image.

A classical example of point estimator for a fixed parameter is the so-called sample mean, which estimates image by the arithmetic mean of the available samples image:

$\hat{\mu}_{\mathbf{x}} = \dfrac{1}{N}\sum\limits_{n=1}^{N} x_n$ (4.93)

Defining a desired estimator is not always as trivial a task as in the previous example. In general, resorting to a proper analytical criterion is necessary. Suppose we are given the so-called likelihood function of image, image, which provides a probabilistic model for image as an explicit function of image. Given the sample image, the so-called maximum likelihood (ML) estimator image computes an estimate of image by direct maximization of image. This operation is meant to find the value of image that would make the available sample image most probable.

Sometimes the parameter image itself can be a sample of a random variable image described by an a priori PDF image. For example, suppose one wants to estimate the mean value of a shipment of components received from a given factory; since the components may come from several production units, we can think of a statistical model for their nominal values. In such situations, any reliable information on the parameter distribution can be taken into account to better tune the estimator formulation; these so-called Bayesian estimators rely on the a posteriori PDF of image:

$f(\theta \mid \mathbf{x}) = \dfrac{f(\mathbf{x} \mid \theta)\,f(\theta)}{f(\mathbf{x})}$ (4.94)

Given a sample image, the so-called maximum a posteriori (MAP) estimator image computes an estimate of image as the mode of image. This choice is meant to find the most probable image that would have produced the available data image. Several applications favor the posterior mean estimator, which for a given sample image computes image, over the MAP estimator.

The quality of an estimator image can be assessed through some of its statistical properties.12 An overall measure of the estimator performance is its mean square errorimage, which can be decomposed in two parts:

$E\!\left[(\hat{\theta} - \theta)^2\right] = B^2 + \sigma_{\hat{\theta}}^2$ (4.95)

where $B = E[\hat{\theta}] - \theta$ is the estimator bias and $\sigma_{\hat{\theta}}^2$ is its variance. The bias measures the deterministic part of the error, i.e., how much the estimates deviate from the target in the mean, thus providing an accuracy measure: the smaller its bias, the more accurate the estimator. The variance, in its turn, measures the random part of the error, i.e., how much the estimates spread about their mean, thus providing a precision measure: the smaller its variance, the more precise the estimator. Another property attributable to an estimator is consistency13: image is said to be consistent when image. Using the Chebyshev inequality (see Section 1.04.3.4), one finds that an unbiased estimator with image is consistent. It can be shown that the sample mean defined in Eq. (4.93) has image (thus is unbiased) and image (thus is consistent). The similarly built estimator for image, the unbiased sample variance, computes

$\hat{\sigma}_{\mathbf{x}}^2 = \dfrac{1}{N-1}\sum\limits_{n=1}^{N}\left(x_n - \hat{\mu}_{\mathbf{x}}\right)^2$ (4.96)

The reader is invited to prove that this estimator is unbiased, while its more intuitive form with denominator N instead of $N - 1$ is not; a numerical illustration is sketched below.
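A minimal empirical illustration (drawing many small Gaussian samples with a known variance, values assumed for illustration) compares the averages of the two estimators:

```python
import numpy as np

# Repeatedly draw small samples from a population with known variance and average
# the two variance estimators: denominator N-1 (Eq. (4.96)) vs. denominator N.
rng = np.random.default_rng(7)
N, trials, true_var = 5, 200_000, 4.0
data = rng.normal(0.0, np.sqrt(true_var), (trials, N))

unbiased = data.var(axis=1, ddof=1).mean()   # 1/(N-1) denominator
biased = data.var(axis=1, ddof=0).mean()     # 1/N denominator
print(f"unbiased ~ {unbiased:.3f}, biased ~ {biased:.3f}, true = {true_var}")
# The biased version converges to (N-1)/N * true_var = 3.2 for N = 5.
```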

One could ask how reliable a point estimate is. In fact, if the range of image is continuous, one can deduce that image. However, perhaps we would feel safer if the output of an estimator were something like “image with probability p.” image and image are the confidence limits and p is the confidence of this interval estimate.

Example 2

Suppose we use a 10-size sample mean to estimate image of a unit-variance Gaussian random variable image. Determine the 95%-confidence interval for the estimates.

Solution 2

It is easy to find that image is a Gaussian random variable with image and image. We should find a such that image, or image. For the associated normalized Gaussian random variable, image. Then, image.
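A numerical companion to this example (a sketch assuming a true mean of zero) checks the coverage of the interval $\hat{\mu}_{\mathbf{x}} \pm 1.96/\sqrt{10}$ by simulation:

```python
import numpy as np

# Example 2 check: sample mean of N = 10 unit-variance Gaussian samples.
rng = np.random.default_rng(8)
N, trials, mu = 10, 200_000, 0.0
means = rng.normal(mu, 1.0, (trials, N)).mean(axis=1)

half_width = 1.96 / np.sqrt(N)     # 95% confidence half-width for sigma = 1
coverage = np.mean(np.abs(means - mu) <= half_width)
print(f"half-width = {half_width:.3f}, empirical coverage = {coverage:.3f}")   # ~0.95
```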

1.04.4 Random process

If the outcomes s of a given random experiment are amenable to variation in time t, each one can be mapped by some image into a real function image. This is a direct generalization of the random variable. As a result, we get an infinite ensemble of time-varying sample functions or realizationsimage which form the so-called stochastic or random processimage. For a fixed image, image reduces to a single random variable, thus a random process can be seen as a time-ordered multiple random variable. An example of random process is the simultaneous observation of air humidity at every point of the globe, image: each realization image describes the humidity variation in time at a randomly chosen place on the earth whereas the random variable image describes the distribution of air humidity over the earth at instant image. A similar construction in the discrete time n domain would produce image composed of realizations image. If the hourly measures of air humidity substitute for its continuous measure in the former example, we get a new random process image.

As seen, there can be continuous- or discrete-time random processes. Since random variables have already been classified as continuous or discrete, one analogously refers to continuous-valued and discrete-valued random processes. Combining these two classifications according to time and value is bound to result ambiguous or awkwardly lengthy; when this complete categorization is necessary, one favors the nomenclature random process for the continuous-time case and random sequence for the discrete-time case, using the words “continuous” and “discrete” only with reference to amplitude values.14 In the former examples, the continuous random process image and random sequence image would model the idealized actual air humidity as a physical variable, whereas the practical digitized measures of air humidity would be modeled by their discrete counterparts.

1.04.4.1 Distributions of random processes and sequences

One could think of the complete description of a random process or sequence as a joint CDF (or PDF) encompassing every possible instant of time t or n, respectively, which is obviously impractical. However, their partial representation by Lth-order distributions (employing a slightly modified notation to include the reference to time) can be useful. The CDF of a random process image can be defined as

image (4.97)

with an associated PDF

image (4.98)

The CDF of a random sequence image can be defined as

image (4.99)

with an associated PDF

image (4.100)

This convention can be easily extended to the joint distributions of M random processes image (sequences image), as will be seen in the next subsection. The notation for point-conditioned distributions can also be promptly inferred (see Section 1.04.4.12).

1.04.4.2 Statistical independence

As with random variables, the mutual independence of random processes or sequences is tested by the complete separability of their respective distributions. The random processes image, are independent when

image (4.101)

for any choice of time instants. Similarly, the random sequences image, are independent when

image (4.102)

for any choice of time instants.

1.04.4.3 First- and second-order moments for random processes and sequences

It is not necessary to redefine moments in the context of random processes and sequences, since we will always be tackling a set of random variables as in Section 1.04.3.8. But it is useful to revisit the first- and second-order cases with a slightly modified notation to include time information.

For a random process image, the following definitions apply:

• the meanimage;

• the mean square value or mean instantaneous powerimage;

• the varianceimage;

• the auto-correlationimage;

• the auto-covarianceimage.

As seen for random variables,

image (4.103)

which means that the mean instantaneous power of the process is the sum of a deterministic parcel with a random parcel.

Analogously, for a random sequence image, the following definitions apply:

• the meanimage;

• the mean square value or mean instantaneous powerimage;

• the varianceimage;

• the auto-correlationimage;

• the auto-covarianceimage.

Again,

image (4.104)

i.e., the mean instantaneous power of the sequence is the sum of a deterministic parcel with a random parcel.

Given two random processes image and image, one defines:

• the cross-correlationimage;

• the cross-covarianceimage.

The processes image and image are said to be mutually:

• orthogonal when image;

• uncorrelated when image, which is the same as image.

The mean instantaneous power of image is given by

image (4.105)

which suggests the definition of image as the mean instantaneous cross power between image and image.

Analogously, given two random sequences image and image, one defines:

• the cross-correlationimage;

• the cross-covarianceimage.

The sequences image and image are said to be mutually:

• orthogonal when image;

• uncorrelated when image, which is the same as image.

The mean instantaneous power of image is given by

image (4.106)

which suggests the definition of image as the mean instantaneous cross power between image and image.

The interpretation of this “cross power” is not difficult: it accounts for the constructive or destructive interaction of the two involved processes (sequences) as dictated by their mutual correlation.

We will dedicate some space later to the properties of second-order moments.

1.04.4.4 Stationarity

The stationarity of random processes and sequences refers to the time-invariance of their statistical properties, which turns their treatment considerably easier.

Strict-sense is the strongest class of stationarity. A strict-sense stationary (SSS) random process image (sequence image) bears exactly the same statistical properties as image (image), which means that its associated joint distribution for any set of time instants does not change when they are all shifted by the same time lag. A random process (sequence) in which each realization is a constant sampled from some statistical distribution in time is SSS.

Under this perspective, we can define that a random process image is Lth-order stationary when

image (4.107)

or, alternatively,

image (4.108)

Analogously, a random sequence image is Lth-order stationary when

image (4.109)

or, alternatively,

image (4.110)

We can say that an SSS random process or sequence is stationary for any order L. Moreover, Lth-order stationarity implies imageth-order stationarity, by marginalization on both sides of any of the four equations above.

First-order stationarity says the statistical distribution at any individual instant of time t (n) is the same, i.e., image does not depend on t (image does not depend on n). A natural consequence is the time-invariance of its statistical mean: image image. Second-order stationarity says the joint statistical distribution at any two time instants t and image for a fixed image (n and image for a fixed k) is the same. A natural consequence is the time-invariance of its auto-correlation, which can depend only on the time lag between the two time instants: image image.

A weaker class of stationarity, called wide-sense, is widely used in practice because it is simple to test. A random process image (sequence image) is said to be wide-sense stationary (WSS) when its mean and auto-correlation are time-invariant. Such conditions do not imply stationarity of any order, although second-order stationarity implies wide-sense stationarity. As an extension of this definition, one says that two processes image and image (sequences image and image) are jointly WSS when they are individually WSS and their cross-correlation is time-invariant, i.e., can depend only on the time lag between the two time instants: image image.

1.04.4.5 Properties of correlation functions for WSS processes and sequences

Wide-sense stationarity in random processes and sequences induces several interesting properties, some of which are listed below. For processes,

• image;

• image;

• If image;

• If image has a periodic component, image has the same periodic component;

• image;

• image;

• image.

For sequences,

• image;

• image;

• If image, image;

• If image has a periodic component, image has the same periodic component;

• image;

• image;

• image.

Although the properties are not proven here, some interesting observations can be made about them:

• The statistical behavior of the process (sequence) at a given instant of time is maximally correlated with itself, an intuitive result.

• The autocorrelation commutes.

• The statistical behaviors of a WSS process (sequence) with no periodic components at two time instants separated by image are uncorrelated. Then, the autocorrelation tends to the product of two identical statistical means.

• Periodic components in the process (sequence) cause the correlation maximum and surrounding behavior to repeat at each fundamental period.

• The arithmetic and geometric mean properties are in a certain sense linked to the first property. The second property is more stringent than the first.

1.04.4.6 Time averages of random processes and sequences

The proper use of random processes and sequences to model practical problems depends on the careful understanding of both their statistical and time variations. The main difference between these two contexts, which must be kept in mind, is the fact that time is inexorably ordered, thus allowing the inclusion in the model of some deterministic (predictable) time-structure. When we examine the statistical auto-correlation between two time instants of a random process or sequence, we learn how the statistical samples taken at those instants follow each other. In a WSS process or sequence, this measure for different lags can convey an idea of statistical periodicity.15 But what if one is concerned with the time structure and characteristics of each (or some) individual realization of a given process or sequence? The use of time averages can provide this complementary description.

The time average of a given continuous-time function image can be defined as

image (4.111)

Given the random process image, the time average of any of its realizations image is

image (4.112)

The average power of image is

image (4.113)

The time auto-correlation of image for a lag image is defined as

image (4.114)

The time cross-correlation between the realizations image of process image and image of process image for a lag image is defined as

image (4.115)

As in the statistical case, an average cross power between image and image, which accounts for their mutual constructive or destructive interferences due to their time structure, can be defined as

image (4.116)

Computed for the complete ensemble, such measures produce the random variables image, image, image, image, and image, respectively. Under mild convergence conditions,

image (4.117)

image (4.118)

image (4.119)

image (4.120)

and

image (4.121)

which are respectively the overall mean value, mean power, and auto-correlation (for lag image) of process image, and cross-correlation (for lag image) and mean cross power between processes image and image.

The time average of a given discrete-time function image can be defined as

image (4.122)

Given the random sequence image, the time average of any of its realizations image is

image (4.123)

The average power of image is

image (4.124)

The time auto-correlation of image for lag k is defined as

image (4.125)

The time cross-correlation between the realizations image of sequence image and image of sequence image for lag k is defined as

image (4.126)

As in the statistical case, an average cross power between image and image, which accounts for their mutual constructive or destructive interferences due to their time structure, can be defined as

image (4.127)

Computed for the complete ensemble, such measures produce the random variables image, image, image, image, and image, respectively. Under mild convergence conditions,

image (4.128)

image (4.129)

image (4.130)

image (4.131)

and

image (4.132)

which are respectively the overall mean value, mean power, and auto-correlation (for lag k) of sequence image, and cross-correlation (for lag k) and mean cross power between sequences image and image.

As an example, consider that a deterministic signal image (with image) is additively contaminated by random noise image, which can be modeled as a realization of the WSS random process image (with image and image), thus yielding the random signal image. The corresponding process image is obviously not WSS, since image. Furthermore, following the nomenclature we have adopted, we can compute its:

• mean instantaneous power image;

• mean power image;

• overall mean power image.

Defining the signal-to-noise ratio (SNR) as the ratio between signal and noise powers, we can conclude that image.

Now, suppose that two differently contaminated versions of image are available: image and image, and that the underlying noise processes image and image are jointly WSS and uncorrelated, with image, image, and image. If an average signal image is computed, we can guarantee image and image (i.e., noise is reduced) as long as image.
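
The noise-averaging effect just described can be checked numerically. The following Python sketch is merely illustrative: the sinusoidal signal, the unit noise variances, and the record length are arbitrary choices, and time averages stand in for the statistical quantities above under an implicit ergodicity assumption.

import numpy as np

rng = np.random.default_rng(0)
n = np.arange(100000)
x = np.cos(2 * np.pi * 0.01 * n)                 # deterministic signal, power 0.5
v1 = rng.normal(0.0, 1.0, n.size)                # first noise realization, power 1
v2 = rng.normal(0.0, 1.0, n.size)                # second noise realization, uncorrelated with the first
y1, y2 = x + v1, x + v2                          # two contaminated versions of the signal
y_avg = 0.5 * (y1 + y2)                          # averaged signal

def snr_db(clean, noisy):
    # empirical SNR: signal power over residual noise power
    return 10 * np.log10(np.mean(clean**2) / np.mean((noisy - clean)**2))

print(snr_db(x, y1))     # about -3 dB for each noisy version
print(snr_db(x, y_avg))  # about 3 dB higher: uncorrelated noises halve in power when averaged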

1.04.4.7 Ergodicity

The so-called ergodicity is a property that allows interchanging statistical and temporal characteristics of some random processes (sequences), which are then called ergodic. There are several levels of ergodicity, some of which are discussed below.

A random process image with constant statistical mean is said to be mean-ergodic when any time average image is equal to image with probability 1, which requires image. If image is WSS, a necessary and sufficient condition for mean-ergodicity is

image (4.133)

A random process image with time-invariant auto-correlation is said to be auto-correlation-ergodic when any time auto-correlation image is equal to image with probability 1, which requires image. Two processes image and image with time-invariant cross-correlation are cross-correlation-ergodic when any time cross-correlation image is equal to image with probability 1, which requires image. Conditions for correlation-ergodicity of random processes involve 4th-order moments.

A random sequence image with constant statistical mean is said to be mean-ergodic when any time average image is equal to image with probability 1, which requires image. If image is WSS, a necessary and sufficient condition for mean-ergodicity is

image (4.134)

A random sequence image with time-invariant auto-correlation is said to be auto-correlation-ergodic when any time auto-correlation image is equal to image with probability 1, which requires image. Two sequences image and image with time-invariant cross-correlation are cross-correlation-ergodic when any time cross-correlation image is equal to image with probability 1, which requires image. Conditions for correlation-ergodicity of random sequences involve 4th-order moments.

A process (sequence) that is mean- and auto-correlation-ergodic is called wide-sense ergodic. Two wide-sense ergodic processes (sequences) that are cross-correlation-ergodic are called jointly wide-sense ergodic.

A process (sequence) is distribution-ergodic when it is ergodic for every moment.

The SSS random process (sequence) formed by random constant realizations is not ergodic in any sense. From Eqs. (4.133) and (4.134), a WSS process image (sequence image), statistically uncorrelated at any different time instants image and image (image and image), i.e., with image (image), is mean-ergodic.

In practical real-life situations, quite often just a single realization of an underlying random process (sequence) is available. In such cases, if the latter is known to be ergodic, we can make use of the complete and powerful statistical modeling framework described in this chapter. But how can one guarantee the property is enjoyed by a process (sequence) of which only one sample function is known? This is not so stringent a requirement: in fact, one needs just to be sure that there can be an ergodic process (sequence) of which that time function is a realization. Strictly speaking, a given music recording image additively contaminated by background noise image cannot be modeled as a realization of a mean-ergodic process,16 since each member of the ensemble would combine the same image with a different image. The random process which describes the noisy signal can be written as image; thus, image, while in practice we know that image.
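
As a small numerical illustration of the contrast drawn above between ergodic and non-ergodic models, the sketch below (with arbitrarily chosen means and variances) compares the time average of a single realization in two cases: white noise around a constant mean, which is mean-ergodic, and the random-constant process, which is not.

import numpy as np

rng = np.random.default_rng(1)
N = 100000

# Mean-ergodic case: one realization of white noise around a constant mean.
mu = 2.0
x = mu + rng.normal(0.0, 1.0, N)
print(x.mean())                   # time average close to the statistical mean 2.0

# Non-ergodic case: each realization is a constant drawn once from a zero-mean distribution.
c = rng.normal(0.0, 1.0)          # drawn a single time for this realization
y = np.full(N, c)
print(y.mean())                   # time average equals c, not the statistical mean 0.0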

1.04.4.8 An encompassing example

Define the random sequences image and image, such that:

• image is a random variable with mean image and variance image;

• image is a random variable with mean image and variance image;

• image rad/sample is a real constant;

• image is a random variable uniformly distributed between image and image rad;

• image is a random variable uniformly distributed between image and image rad;

• image, image, image, and image are mutually independent.

Compute their time and statistical means, mean powers, auto-correlations, cross-correlations, and mean cross-powers; and discuss their correlation, orthogonality, stationarity, and ergodicity.

Solution 3

Time-averages:

image (4.135)

image (4.136)

image (4.137)

then,

image (4.138)

image (4.139)

then,

image (4.140)

image (4.141)

then,

image (4.142)

Expected values:

image (4.143)

image (4.144)

image (4.145)

then,

image (4.146)

image (4.147)

then,

image (4.148)

image (4.149)

Conclusions:

• Since image and image do not depend on image, image is WSS.

• Since image and image do not depend on image, image is WSS.

• Since image is WSS, image is WSS, and image does not depend on image, image and image are jointly WSS.

• Since image, image and image are uncorrelated.

• Since image, image and image are orthogonal.

• Since image, image is mean-ergodic.

• Since image, image is mean-ergodic.

1.04.4.9 Gaussian processes and sequences

A very special case of random processes (sequences) is the Gaussian-distributed one. The corresponding Lth-order PDFs can be written (see Section 1.04.3.10):

• for processes as

image (4.150)

where image is the image mean-vector with elements image, for image, and image is the image covariance-matrix with elements image, for image;

• for sequences as

image (4.151)

where image is the image mean-vector with elements image, for image, and image is the image covariance-matrix with elements image, for image.

From the definition above:

• A WSS Gaussian random process (sequence) is SSS: Since the PDF of a Gaussian process (sequence) is entirely described by first- and second-order moments, if these moments do not depend on t (n), the PDF itself does not depend on t (n).

• Uncorrelated Gaussian processes (sequences) are independent: A joint PDF of two uncorrelated Gaussian processes (sequences) can be easily factorized as the product of their individual PDFs, since the covariance-matrix becomes block diagonal.

These two strong properties turn Gaussian models mathematically simpler to tackle, and the strict-sense stationarity and independence conditions easier to meet in practice.

The reader is invited to show that a stationary Gaussian process (sequence) whose auto-correlation is absolutely integrable in image (summable in k) is ergodic.
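
Since the Lth-order PDF of a Gaussian process (sequence) is fully specified by its mean vector and covariance matrix, samples of such a vector can be generated from independent standard Gaussian variables through a Cholesky factor of the covariance matrix. The sketch below assumes a hypothetical zero-mean model with an exponentially decaying auto-covariance, chosen only for illustration.

import numpy as np

rng = np.random.default_rng(2)
L = 4
idx = np.arange(L)
C = 0.8 ** np.abs(idx[:, None] - idx[None, :])   # hypothetical covariance matrix (zero mean assumed)
A = np.linalg.cholesky(C)                        # C = A A^T

z = rng.normal(size=(L, 100000))                 # i.i.d. standard Gaussian samples
x = A @ z                                        # Gaussian vectors with covariance C

print(np.cov(x))                                 # empirical covariance approximates C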

1.04.4.10 Poisson random process

The Poisson process is an example of a discrete random process, which we have already defined in Section 1.04.3.2. Each realization image of image counts the occurrences along time of a certain event whose mean occurrence rate is image per time unit, provided that:

• there are no simultaneous occurrences;

• occurrence times are independent.

It can model the arrival of customers at a store, for example. By convention:

• image, i.e., the count starts at image;

• image provides the count between image and image;

• image provides the count between image and image.

The time origin can be shifted if necessary.

From Eq. (4.32), the first-order PDF of a Poisson process is given by

image (4.152)

Applying the definition iteratively, one can show that for image, the corresponding Lth-order PDF is

image (4.153)
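
A Poisson counting process with the conventions above can be simulated either by sampling the count at a given time directly from its first-order distribution or by accumulating independent exponential inter-arrival times whose mean is the reciprocal of the occurrence rate (a standard construction, not stated in the text). The rate, observation time, and number of realizations below are arbitrary.

import numpy as np

rng = np.random.default_rng(3)
lam, t, n_real = 2.0, 5.0, 100000               # occurrence rate, observation time, realizations

counts = rng.poisson(lam * t, n_real)           # the count at time t, sampled for each realization
print(counts.mean(), counts.var())              # both approach lam * t = 10

# Equivalent construction from i.i.d. exponential inter-arrival times:
arrivals = np.cumsum(rng.exponential(1.0 / lam, 100))
print(np.searchsorted(arrivals, t))             # number of occurrences up to time t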

1.04.4.11 Complex random processes and sequences

Analogously to what has been done for random variables, real random processes and sequences can be generalized to handle the complex case.

A complex random process can be described as image, where image and image are real random processes. It is stationary in some sense as long as image and image are jointly stationary in that sense. The remaining definitions related to process stationarity are kept the same.

For a complex random process image, the following definitions apply:

• the mean image;

• the mean instantaneous power image;

• the variance image;

• the auto-correlation image;

• the auto-covariance image.

Given two random processes image and image, one defines:

• the mean instantaneous cross power image;

• the cross-correlation image;

• the cross-covariance image.

The processes image and image are said to be mutually:

• orthogonal when image;

• uncorrelated when image, which is the same as image.

For a random sequence image, the following definitions apply:

• the mean image;

• the mean instantaneous power image;

• the variance image;

• the auto-correlation image;

• the auto-covariance image.

Given two random sequences image and image, one defines:

• the mean instantaneous cross power image;

• the cross-correlation image;

• the cross-covariance image.

The sequences image and image are said to be mutually:

• orthogonal when image;

• uncorrelated when image, which is the same as image.

1.04.4.12 Markov chains

The simplest random processes (sequences) are those whose statistics at a given time are independent of those at every other time: a first-order distribution suffices to describe them. A less trivial model is found when the statistical time interdependency assumes a special recursive behavior such that the knowledge of a past conditioning state summarizes all the previous history of the process (sequence). For the so-called Markov random process (sequence), the knowledge of the past does not affect the expectation of the future when the present is known.

Mathematically, a random process image is said to be a Markov process when for image,

image (4.154)

or for image,

image (4.155)

We will restrict ourselves to the discrete-time case. A random sequence image is said to be a Markov sequence when

image (4.156)

which can be seen as a transition PDF. From the definition, one arrives at the chain rule

image (4.157)

which means that the overall Markov sequence can be statistically described for image from the knowledge of its distribution at image and its subsequent transition distributions. Some interesting properties can be deduced from the expressions above:

• Since image, a time-reversed Markov sequence is also a Markov sequence.

• For image, image.

A very important property of Markov sequences is the so-called Chapman-Kolmogorov equation: for image,

image (4.158)

which provides a recursive way to compute arbitrary transition PDFs.

We can say a Markov sequence image is stationary when image and image are shift-invariant. In this case, the overall sequence can be obtained from image. A less trivial (and quite useful) model is provided by homogeneous Markov sequences, which are characterized only by a shift-invariant transition distribution; they are not stationary in general, but can be asymptotically stationary (i.e., for image) under certain conditions.

Discrete Markov processes and sequences are called Markov chains, which can assume a countable number of random states image described by their state probabilities (image for sequences) and transition probabilities (image for sequences). Discrete-time Markov chains enjoy the following properties, for image:

• image, which totals all possible ways to leave state image in image;

• image, which totals all possible ways to arrive at state image in image;

• image (Chapman-Kolmogorov equation).

If the chain has a finite number of states, a matrix notation can be employed. The state probability vector image, with elements image, and the transition probability matrix image, with elements image, are related by image.

The special class of homogeneous Markov chains enjoys the following additional properties:

• image;

• image.

Accordingly, the transition probability matrix becomes image. Defining image,

image (4.159)

i.e., image. When asymptotic stationarity is reachable, one can find the steady-state probability vector image such that image.

Consider, for example, the homogeneous Markov chain image depicted in Figure 4.12, with states 1 and 2, state probabilities image and image, and transition probabilities image and image. Its corresponding one-sample transition matrix is

image (4.160)

such that

image (4.161)

It can also be shown that the chain reaches the steady-state probability vector

image (4.162)

image

Figure 4.12 Homogeneous Markov chain with two states.
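
The steady-state behavior of a two-state homogeneous chain such as the one in Figure 4.12 can be verified numerically. The transition probabilities below are hypothetical (the values in the figure are not reproduced here), and a row-stochastic convention is assumed, i.e., entry (i, j) of the matrix holds the probability of going from state i to state j in one step.

import numpy as np

# Hypothetical one-step transition matrix of a two-state homogeneous chain (rows sum to 1).
Phi = np.array([[0.9, 0.1],
                [0.4, 0.6]])
p = np.array([1.0, 0.0])          # initial state probability vector

for _ in range(200):              # iterate p <- p Phi; for a homogeneous chain the n-step matrix is Phi**n
    p = p @ Phi

print(p)                          # steady-state probability vector, here [0.8, 0.2]

# Cross-check: the steady-state vector is the left eigenvector of Phi associated with eigenvalue 1.
w, v = np.linalg.eig(Phi.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
print(pi / pi.sum())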

1.04.4.13 Spectral description of random processes and sequences

At first sight, the direct conversion of each realization image (image) to the frequency domain seems the easiest way to spectrally characterize a random process image (sequence image). However, it is difficult to guarantee the existence of the Fourier transform

image (4.163)

(in the case of random processes) or

image (4.164)

(in the case of random sequences) for every realization. And even when it exists, we would get a random spectrum of limited applicability. In Section 1.04.4.5, we have found that the auto-correlation conveys information about every sinusoidal component found in the process (sequence). A correct interpretation of the Fourier transform17 suggests that the auto-correlation in fact conveys information about the overall spectrum of the process (sequence). Furthermore, it is a better behaved function than the individual realizations, and thus more amenable to Fourier transformation.

In Section 1.04.4.6, Eq. (4.118), we defined the overall mean power of the random process image, which, for the general complex case, becomes

image (4.165)

It can be shown that

image (4.166)

where

image (4.167)

and

image (4.168)

We can then define a power spectral density

image (4.169)

such that image. Some additional algebraic manipulation yields

image (4.170)

which directly relates the auto-correlation to the power spectral density of the process, as predicted. From the expressions above, image is a non-negative real function of image. Furthermore, if image is real, then image is an even function of image.

In Section 1.04.4.6, Eq. (4.121), we also defined the overall mean cross-power between the processes image and image, which, for the general complex case, becomes

image (4.171)

Following the same steps as above, we arrive at

image (4.172)

where image is defined as before,

image (4.173)

and

image (4.174)

We can then define the corresponding cross-power spectral density

image (4.175)

such that image. In terms of the cross-correlation, the cross-power spectral density can be written as

image (4.176)

As expected, the cross-power density of orthogonal processes is zero. From the expressions above, image. Furthermore, if image and image are real, then the real part of image is even and the imaginary part of image is odd.

It should be noticed that in the WSS case, image, image, image, and image.

A similar development can be done for random sequences. In Section 1.04.4.6, Eq. (4.129), we defined the overall mean power of the random sequence image, which, for the general complex case, becomes

image (4.177)

It can be shown that

image (4.178)

where

image (4.179)

and

image (4.180)

We can then define a power spectral density

image (4.181)

such that image. Some additional algebraic manipulation yields

image (4.182)

which directly relates the auto-correlation to the power spectral density of the random sequence, as expected. From the expressions above, image is a non-negative real function of image. Furthermore, if image is real, then image is an even function of image.

In Section 1.04.4.6, Eq. (4.132), we also defined the overall mean cross-power between the random sequences image and image, which, for the general complex case, becomes

image (4.183)

Following the same steps as above, we arrive at

image (4.184)

where image is defined as before,

image (4.185)

and

image (4.186)

We can then define the corresponding cross-power spectral density

image (4.187)

such that image. In terms of the cross-correlation, the cross-power spectral density can be written as

image (4.188)

As expected, the cross-power density of orthogonal sequences is zero. From the expressions above, image. Furthermore, if image and image are real, then the real part of image is even and the imaginary part of image is odd.

It should be noticed that in the WSS case, image, image, image, and image.
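
The construction of the power spectral density of a WSS random sequence from ensemble-averaged spectra of finite-length realizations, in the spirit of Eqs. (4.177)–(4.182), can be mimicked with a simple Monte Carlo sketch. The first-order recursive model below is an arbitrary choice; its theoretical power spectral density is written from the filtering result presented later in Section 1.04.4.16 (white noise shaped by the squared magnitude of the filter response).

import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(4)
N, n_real = 256, 2000
psd_est = np.zeros(N)
for _ in range(n_real):
    w = rng.normal(size=N)                      # white noise, unit variance
    y = lfilter([1.0], [1.0, -0.8], w)          # y[n] = 0.8 y[n-1] + w[n]
    psd_est += np.abs(np.fft.fft(y))**2 / N     # normalized squared spectrum of one realization
psd_est /= n_real                               # ensemble average over realizations

omega = 2 * np.pi * np.arange(N) / N
psd_theory = 1.0 / np.abs(1 - 0.8 * np.exp(-1j * omega))**2

print(psd_est.mean(), psd_theory.mean())        # both approximate the overall mean power of the sequence
print(psd_est[N // 2], psd_theory[N // 2])      # close agreement away from the spectral peak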

1.04.4.14 White and colored noise

One usually refers to a disturbing signal image (image) as noise. Since noise signals are typically unpredictable, they are preferably modeled as realizations of some noise random process image (sequence image).

A very special case is the so-called white noise (by analogy with white light), which is a uniform combination of all frequencies.

Continuous-time white noise image is sampled from a random process image characterized by the following properties:

• zero mean: image;

• non-correlation between distinct time instants: image;

• constant power spectral density: image;

• infinite overall mean power: image.

From the last property, continuous-time white noise is not physically realizable. Furthermore, WSS continuous-time white noise has image.

Discrete-time white noise image is sampled from a random sequence image characterized by the following properties:

• zero mean: image;

• non-correlation between distinct time instants: image;

• constant power spectral density: image;

• overall mean power image.

For WSS discrete-time white noise, image, image, and image.

Unless otherwise stated, white noise is implicitly assumed to be generated by a WSS random process (sequence). Notice, however, that the sequence image, where image rad and image is WSS white noise with unit variance, for example, satisfies all conditions to be called white noise, even if image.

A common (more stringent) model of white noise imposes that values at different time instants of the underlying process (sequence) be statistically i.i.d.
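
A quick numerical check of the defining properties of discrete-time white noise, under the i.i.d. model just mentioned and assuming ergodicity so that time averages replace statistical ones; the Gaussian distribution and the record length are arbitrary choices.

import numpy as np

rng = np.random.default_rng(5)
N = 100000
w = rng.normal(0.0, 1.0, N)          # i.i.d. zero-mean, unit-variance white noise

def autocorr(x, max_lag):
    # biased time-average estimate of the auto-correlation for non-negative lags
    return np.array([np.dot(x[:x.size - k], x[k:]) / x.size for k in range(max_lag + 1)])

print(w.mean())                      # close to 0
print(autocorr(w, 5))                # close to [1, 0, 0, 0, 0, 0]: impulse-like auto-correlation,
                                     # i.e., a flat power spectral density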

Any random noise whose associated power spectral density is not constant is said to be colored noise. One can find in the literature several identified colored noises (pink, grey, etc.), each one with a pre-specified spectral behavior. It should also be noticed that any band-limited approximation of white noise destroys its non-correlation property. Consider image generated by a WSS random process image such that

image (4.189)

It is common practice to call image “white noise” of bandwidth W and overall mean power P. Since its auto-correlation is

image (4.190)

strictly speaking one cannot say image is white noise.

1.04.4.15 Applications: modulation, “Bandpass” and band-limited processes, and sampling

An interesting application of the spectral description of random processes is the modeling of AM (amplitude modulation). Assuming a carrier signal image modulated by a real random signal image sampled from a process image, the resulting modulated random process image has auto-correlation

image (4.191)

If image is WSS, then

image (4.192)

and the corresponding power spectral density is

image (4.193)

We conclude that the AM structure of image carries over to its auto-correlation, which provides a simple statistical model for AM.

Quite often we find ourselves dealing with band-limited random signals. In the AM discussion above, typically image has bandwidth image. For a random process image whose spectrum is at least concentrated around image (baseband process), we can attribute to it an RMS (root-mean-square) bandwidth image such that

image (4.194)

If the spectrum is concentrated around a centroid

image (4.195)

image is given by

image (4.196)

If the process bandwidth excludes image, it is called18 a “bandpass” process.

It can be shown that if a band-limited baseband WSS random process image with bandwidth W, i.e., such that image for image, is sampled at rate image to generate the random sequence

image (4.197)

then image can be recovered from image with zero mean-squared-error. This is the stochastic version of the Nyquist criterion for lossless sampling.

1.04.4.16 Processing of random processes and sequences

Statistical signal modeling is often employed in the context of signal processing, and thus the interaction between random signals and processing systems calls for a systematic approach. In this chapter, we will restrict ourselves to linear time-invariant (LTI) systems.

A system defined by the operation image between input x and output y is called linear if any linear combination of m inputs produces as output the same linear combination of their m corresponding outputs:

• for a continuous-time system, image,

• for a discrete-time system, image.

For a linear system, one can define an impulse response h as the system output to a unit impulse image applied to the system input at a given time instant:

• for a continuous-time system, a unit impulse image applied at instant image produces an impulse response image,

• for a discrete-time system, a unit impulse19 image applied at instant k produces an impulse response image.

A system is called time-invariant if a given input x applied at different time instants produces the same output y correspondingly time-shifted:

• for a continuous-time system, if image, then image; thus, image,

• for a discrete-time system, if image, then image; thus, image.

The output of an LTI system to any input x is the convolution image between the input and the impulse response h of the system:

• for a continuous-time system, image,

• for a discrete-time system, image.

In the frequency domain, the output Y is related to the input X by the system frequency response H:

• for a continuous-time system, image,

• for a discrete-time system, image.

The product XH conveys the concept of filtering: H defines how much of each frequency component of the input X passes to the output Y. From now on, when referring to the processing of a random process (sequence) by a system, we imply that each process realization is filtered by that system.

The filtering of a random process image by an LTI system with impulse response image results in another random process image such that image. Assuming image WSS and possibly complex, as well as image, then image will also be WSS and20:

• image;

• image;

• image;

• image;

• image.

The processing of a random sequence image by an LTI system with impulse response image results in another random sequence image such that image. Assuming image WSS and possibly complex, as well as image, then image will also be WSS and21:

• image;

• image;

• image;

• image;

• image.

Among the above results, one of the most important is the effect of filtering on the power spectral density image of the input WSS random process (sequence): it is multiplied by the squared magnitude image of the frequency response of the LTI system to produce the power spectral density image of the output WSS random process (sequence). As a direct consequence, any WSS random process (sequence) with known power spectral density S can be modeled as the output of an LTI system with squared magnitude response image whose input is white noise.

In order to qualitatively illustrate this filtering effect (a generation sketch follows the figures):

• Figure 4.13 shows 200 samples of white noise image;

image

Figure 4.13 White noise.

• Figure 4.14 shows 200 samples of image, a slowly evolving signal obtained by low-pass filtering of image to 20% of its original bandwidth;

image

Figure 4.14 Realization of a “low-pass” sequence.

• Figure 4.15 shows 200 samples of image, an almost sinusoidal signal obtained by band-pass filtering of image to the central 4% of its original bandwidth.

image

Figure 4.15 Realization of a “band-pass” sequence.
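
Realizations qualitatively similar to those of Figures 4.13–4.15 can be produced by filtering a single white-noise record, as sketched below. The FIR design routine, the filter orders, and the exact band placement are incidental choices; only the bandwidth fractions follow the description above.

import numpy as np
from scipy.signal import firwin, lfilter

rng = np.random.default_rng(6)
w = rng.normal(size=2000)                           # white noise realization (cf. Figure 4.13)

h_lp = firwin(101, 0.2)                             # low-pass filter: 20% of the original bandwidth
h_bp = firwin(301, [0.48, 0.52], pass_zero=False)   # band-pass filter: central 4% of the bandwidth
x_lp = lfilter(h_lp, [1.0], w)                      # slowly evolving sequence (cf. Figure 4.14)
x_bp = lfilter(h_bp, [1.0], w)                      # almost sinusoidal sequence (cf. Figure 4.15)

# Since the input power spectral density is flat with unit level, the mean power of each
# output approximately equals the energy of the corresponding impulse response.
print(np.var(x_lp), np.sum(h_lp**2))
print(np.var(x_bp), np.sum(h_bp**2))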

From the area of optimum filtering, whose overall goal is finding the best filter to perform a defined task, we can draw an important application of the concepts presented in this section. The general discrete Wiener filter H, illustrated in Figure 4.16, is the linear time-invariant filter which minimizes the mean quadratic value image of the error image between the desired signal image and the filtered version image of the input signal image. Assuming that image and image are realizations of two jointly WSS random sequences image and image, if the optimum filter is not constrained to be causal, one finds that it must satisfy the equation

image (4.198)

i.e., its frequency response is

image (4.199)

If one needs a causal FIR solution, a different but equally simple solution can be found for image.

image

Figure 4.16 General discrete Wiener filter.

We can solve a very simple example where we compute an optimum zero-order predictor for a given signal of which a noisy version is available. In this case, image (a simple gain to be determined), the input image (signal image additively corrupted by noise image) and image. The solution is

image (4.200)

which yields the minimum mean quadratic error

image (4.201)

Notice that if no noise is present,

image (4.202)

and

image (4.203)

The reader should be aware that this example studies a theoretical probabilistic model, which may therefore depend on several parameters that would probably not be available in practice and would have to be estimated. In fact, it points out the solution a practical system should pursue.
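
The zero-order example can be reproduced numerically for the standard case of zero-mean, mutually uncorrelated signal and noise; the powers below are arbitrary. The optimum gain is estimated from time averages of a single pair of realizations (again invoking ergodicity) and compared with the ratio of signal power to total input power, which is the closed-form result this special case is known to yield under those assumptions.

import numpy as np

rng = np.random.default_rng(7)
N = 200000
var_s, var_v = 4.0, 1.0                            # hypothetical signal and noise powers
s = rng.normal(0.0, np.sqrt(var_s), N)             # zero-mean signal (white here, for simplicity)
v = rng.normal(0.0, np.sqrt(var_v), N)             # zero-mean noise, uncorrelated with the signal
x = s + v                                          # noisy input; the desired signal is s itself

h0 = np.dot(s, x) / np.dot(x, x)                   # empirical cross-correlation over input power at lag 0
print(h0, var_s / (var_s + var_v))                 # both about 0.8

mse = np.mean((s - h0 * x)**2)
print(mse, var_s * var_v / (var_s + var_v))        # minimum mean quadratic error, both about 0.8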

The Wiener filter is a kind of benchmark for adaptive filters [7], which optimize themselves recursively.

1.04.4.17 Characterization of LTI systems and WSS random processes

A direct way to find the impulse response image of an LTI system is to apply white noise image with power spectral density image to the system input, which yields image. From an estimate image of the cross-correlation, one can obtain an estimate of the desired impulse response: image. In practice, assuming ergodicity, the following approximations are employed:

• image;

• image for sufficiently long T.

Furthermore, the operations are often performed in the discrete-time domain.
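
For the discrete-time case just mentioned, the cross-correlation method can be sketched as follows. The FIR system below is hypothetical, the probe is unit-power white noise, and the time-average estimate of the cross-correlation between output and input directly returns the impulse response samples (ergodicity assumed).

import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(8)
N = 200000
h_true = np.array([1.0, 0.5, -0.3, 0.1])           # hypothetical impulse response to be identified
w = rng.normal(0.0, 1.0, N)                        # unit-power white-noise probe
y = lfilter(h_true, [1.0], w)                      # measured system output

# Time-average estimate of the output-input cross-correlation at non-negative lags;
# with a unit-power white input it equals the impulse response.
h_est = np.array([np.dot(y[k:], w[:N - k]) / N for k in range(6)])
print(h_est)                                        # close to [1.0, 0.5, -0.3, 0.1, 0.0, 0.0]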

The effective bandwidth of an LTI system with frequency response image can be estimated by its noise bandwidth image. For a low-pass system, one looks for an equivalent ideal low-pass filter with bandwidth image around image and passband gain image; image is computed to guarantee that both systems deliver the same mean output power when the same white noise is applied to their inputs. A straightforward algebraic development yields

image (4.204)

For a band-pass system with magnitude response centered about image, one looks for an equivalent ideal band-pass filter with bandwidth image around image and passband gain image; image is computed to guarantee that both systems deliver the same mean output power when the same white noise is applied to their inputs. For this case,

image (4.205)

A real “bandpass” random process (see Section 1.04.4.15) about center frequency image can be expressed as image, where the envelope image and the phase image are both baseband random processes. We can write

image (4.206)

where image and image are the quadrature components of image. In the typical situation where image is a zero-mean WSS random process with auto-correlation image, we can explicitly find the first- and second-order moments of the quadrature components that validate the model. First define the Hilbert transform of a given signal image as image, where

image (4.207)

From image, we can find image and proceed to write:

• image;

• image;

• image.

In the frequency domain, if we define

image (4.208)

then

• image;

• image.

In the particular case where image is Gaussian, it can be shown that image and image are Gaussian, and thus completely defined by their first- and second-order moments above.

1.04.4.18 Statistical modeling of signals: random sequence as the output of an LTI system

A discrete-time signal image with spectrum image can be modeled as the output of a linear time-invariant system whose frequency response is equal to image when excited by a unit impulse image. In this case, the impulse response of the system is expected to be image.

In the context of statistical models, an analogous model can be built: now, we look for a linear time-invariant system which produces at its output the WSS random sequence image with power spectral density image, when excited by a unit-variance white-noise random sequence image. As before, the frequency response of the system alone is expected to shape the output spectrum, yet this time in the mean power sense, i.e., the system must be such that image.

Assume the overall modeling system is described by the difference equation

image (4.209)

where image is white noise with image. The corresponding transfer function is

image (4.210)

The output image of this general system model with image at its input is called an ARMA (auto-regressive moving-average22) process of order image. It can be described by the following difference equation in terms of its auto-correlation:

image (4.211)

The values of image for image can be easily found by symmetry.

If one restricts the system to be FIR (i.e., to have a finite-duration impulse response), then image, i.e., image for image, and

image (4.212)

The output of this “all-zero”23 system model with image at its input is called an MA (moving-average) process of order q. Its auto-correlation can be found to be

image (4.213)

In this case, image for image or image. This model is more suited for modeling notches in the random sequence power spectrum.

If one restricts the system to be “all-pole,”24 then image, i.e., image for image, and

image (4.214)

The output of this system model with image at its input is called an AR (auto-regressive) process of order p. Its auto-correlation follows the difference equation below:

image (4.215)

Again, the values of image for image can be easily found by symmetry. This model is more suited for modeling peaks in the random sequence power spectrum, which are typical of quasi-periodic signals (audio signals, for example). It should be noticed that, unlike the ARMA and MA cases, the equation for the AR process auto-correlation is linear in the system coefficients, which makes their estimation easier. If image is known for image, and recalling that image, one can (see the sketch after this list):

• solve the pth-order linear equation system obtained by substituting image in Eq. (4.215) to find image;

• compute image from Eq. (4.215) with image.
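
A minimal numerical sketch of this procedure for a hypothetical second-order model, written here in the common convention x[n] = a1 x[n-1] + a2 x[n-2] + w[n] (which may differ in sign convention from Eq. (4.214)). The auto-correlation values are estimated from a simulated realization instead of being given, and the linear system solved below is the standard Yule-Walker formulation of the auto-correlation recursion under that convention.

import numpy as np
from scipy.linalg import toeplitz, solve
from scipy.signal import lfilter

rng = np.random.default_rng(9)
a_true = np.array([0.75, -0.5])                            # hypothetical AR(2) coefficients
w = rng.normal(0.0, 1.0, 500000)                           # unit-variance white-noise input
x = lfilter([1.0], np.concatenate(([1.0], -a_true)), w)    # x[n] = 0.75 x[n-1] - 0.5 x[n-2] + w[n]

# Auto-correlation estimates for lags 0, 1, 2 (time averages, ergodicity assumed).
r = np.array([np.dot(x[:x.size - k], x[k:]) / x.size for k in range(3)])

# Yule-Walker system: the Toeplitz matrix built from r[0], r[1] times the
# coefficient vector equals [r[1], r[2]].
a_hat = solve(toeplitz(r[:2]), r[1:3])
var_w_hat = r[0] - np.dot(a_hat, r[1:3])                   # white-noise variance from the lag-0 equation

print(a_hat)                                               # close to [0.75, -0.5]
print(var_w_hat)                                           # close to 1.0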

All-pole modeling of audio signals is extensively used in restoration systems [8].

Relevant Theory: Signal Processing Theory

See this Volume, Chapter 2 Continuous-Time Signals and Systems

See this Volume, Chapter 3 Discrete-Time Signals and Systems

References

1. Deep R. Probability and Statistics. San Diego: Academic; 2006.

2. Casella G, Berger RL. Statistical Inference. second ed. Pacific Grove: Duxbury; 2001.

3. Papoulis A, Pillai SU. Probability, Random Variables and Stochastic Processes. fourth ed. New York: McGraw-Hill; 2002.

4. Peebles Jr PZ. Probability, Random Variables and Random Signal Principles. fourth ed. New York: McGraw-Hill; 2001.

5. Shanmugan KS, Breipohl AM. Random Signals: Detection, Estimation and Data Analysis. New York: Wiley; 1988.

6. Hayes MH. Statistical Digital Signal Processing and Modeling. Hoboken: Wiley; 1996.

7. Diniz PSR. Adaptive Filtering: Algorithms and Practical Implementation. New York: Springer; 2012.

8. Godsill SJ, Rayner PJW. Digital Audio Restoration: A Statistical Model Based Approach. London: Springer; 1998.


1Of course, other independent variables (e.g., space) can substitute for time in this context.

2Even if the literature sometimes prefers to employ “stochastic” for processes and “random” for signals, the expression “random processes” became common use and will be used throughout the chapter.

3One could think of mixed cases, but this discussion is better conveyed in the random variable framework, which comprises every possible mapping from the original experiment to a subset of the real numbers.

4Unit step function: image

5Unit impulse distribution, or Dirac delta: For image and image.

6This unusual notation is employed here for the sake of compatibility with the other definitions.

7Notice that the expected value of a random variable is not necessarily meaningful for the associated experiment.

8The classical definition does not subtract three.

9Convolution integral: image.

10Equation (4.43) can be seen as the next expression computed for image, since image and image.

11We will tackle only scalar parameters, without loss of generality.

12For the sake of simplicity, we refer only to the estimation of a fixed parameter image.

13There are other definitions of consistence.

14In this text, for completeness we opted for always explicitly writing both formulations, for processes and sequences.

15We hinted at this point when discussing the properties of second-order moments, in Section 1.04.4.5.

16Even if the author has done so many times.

17The Fourier transform represents a time signal as a linear combination of continuously distributed sinusoids.

18With considerable freedom of nomenclature.

19Unit impulse function, or Kronecker delta: For image, and image.

20Where notation image was used.

21Where notation image was used.

22Auto-regressive for its output feedback, moving average for its weighted sum of past input samples.

23This is a misnomer, since the poles of the system are at the origin. In fact, the zeros alone are responsible for shaping the frequency response of the system.

24This is another misnomer, since the zeros of the system are at the origin. In fact, the poles alone are responsible for shaping the frequency response of the system.
