In this chapter we take a distinctly different approach to the problem of channel modeling. In our previous work, the channel was defined in terms of the noise, interference, and other disturbances that combine with the transmitted signal to produce a distorted and noisy waveform at the input to the receiver. The transmitted signal, as well as the noise, interference, and other channel disturbances, were all represented by samples of waveforms. The result was a waveform-level simulation that processed data on a sample-by-sample basis. We now consider partitioning the system in a way that eliminates much of the need for simulating at the waveform level. The result is a discrete channel model in which simulations operate on a symbol-by-symbol basis. The motivation for replacing a waveform-level channel model by a discrete channel model is speed of simulation. We will see that the discrete channel model is an abstraction of the physical (waveform) channel in which the channel is completely characterized by a small set of parameters. Determining these parameters constitutes an important part of the modeling process, and must be accomplished through measurements made on the physical channel or through the use of a single waveform-level simulation.
The basic model of a communication system, illustrated in Figure 15.1, typically consists of a discrete data source, a channel coder for error control, a modulator and transmitter, a channel, a receiver, and a decoder. Depending on the application for the system and the detail of the simulation, other elements such as equalizers, interleavers, deinterleavers, carrier synchronizers, and symbol synchronizers, may be part of the system model. As we know, the modulator maps a symbol, or a sequence of symbols, at the modulator input onto a waveform at the modulator output. The waveform is subjected to a number of degrading effects in the channel. Among these effects are noise, bandlimiting, interference, and fading. All of these effects can be characterized by waveforms. The input to the receiver is also a waveform, which is the transmitted waveform combined with the degrading effects of the channel. The role of the receiver is to observe the input waveform over a symbol period, or over a sequence of symbols, and determine the transmitted symbols. The channel in this case is referred to as a waveform channel, since both the channel input and the channel output are characterized as waveforms. It is important to note that all system elements, with the exception of the channel, are characterized by deterministic mappings. The channel performs a stochastic, or random, mapping of the channel input to the channel output.
The terminology “discrete channel model” (DCM) is used to denote all the elements of a communication system that lie between the two points A and B in the system, where the input at point A is a vector of discrete symbols (input sequence), denoted X = [x1,x2, . . . ,xk, . . . ], and the output at B is another vector of discrete symbols (output sequence), denoted Y = [y1,y2, . . . , yk, . . . ]. Usually, point A will be the output of the channel encoder or, equivalently, the modulator input, and point B will be the input to the decoder as shown in Figure 15.1. Note that, with the partitioning shown in Figure 15.1, the modulator and transmitter, the waveform channel, and the receiver are part of the discrete channel model. The relationship between the discrete channel input vector X and the output vector Y will be affected by the system components, such as filters, and by the random disturbances induced in the physical (waveform) channel. In a binary communication system with hard decision receivers, the elements of both X and Y will be binary sequences. In the case of soft-decision receivers, the elements of Y will be from an M-ary alphabet. Transmission errors resulting from imperfections in the system elements between points A and B in Figure 15.1, including the physical channel, will cause elements of X and Y to be different, at least occasionally.
Discrete channel models describe the error-generation mechanism probabilistically. These models fall into two categories. The first class of these models, referred to as “memoryless” channel models, are used to model the transmission errors or the transitions from the channel input to the channel output under the assumption that there is no temporal correlation in the transition mechanism. That is, the input-output transition probabilities for the nth channel input symbol are not affected by what happened to any other input symbol. Such models are applicable to channels in which there is no inter-symbol interference (ISI) or fading, and the noise is AWGN. In a hard decision binary system, this assumption implies that bit errors are uncorrelated. Discrete channel models for memoryless channels are usually trivial to derive.
The second and more interesting class of discrete models apply to situations in which the transitions from the input symbols to the output symbols are temporally correlated. In this case the probability of error for the nth symbol depends on whether or not an error occurred in the transmission of previous symbols. The fading channel commonly encountered in wireless communications is a good example of a channel having correlated errors. Errors will tend to occur in bursts when the radio channel is in a deep fade, and these channels are referred to as burst error channels or channels with memory.
Discrete channel models are probabilistic models that are computationally more efficient than waveform channel models. The increased efficiency comes from two factors. Discrete models are simulated at the symbol rate, whereas waveform level models are typically simulated at 8 to 16 times the symbol rate. This alone reduces the computational burden by roughly an order of magnitude. While each individual block is simulated in detail in a waveform-level model, there is a high level of abstraction in the discrete channel model. This level of abstraction, as we will see, further reduces the computational burden. These two factors may well contribute to several orders of magnitude savings in the time required to execute a simulation.
In simple cases, discrete channel models can be derived analytically from the models of the underlying components between the discrete channel input A and the discrete channel output B. However, in most cases, discrete channel models are derived from simulated or measured error patterns between the points A and B. Discrete channel models are used to design and analyze the impact of components outside the portion of the system between points A and B. This includes, for example, error control coders, source encoders, and interleavers.
Modeling discrete memoryless channels is a straightforward process. For example, in the case of a discrete memoryless symmetric channel with binary input and binary output, all we need to know to characterize the channel is a single number, namely, the bit error probability. Simulation of this channel involves drawing a single random number and comparing that number to a threshold in order to decide whether or not a given bit will suffer a transmission error or be received without error. Thus, in order to simulate the transmission of a million bits through a given channel, all we have to do is generate a million independent uniformly distributed random numbers and perform the required threshold comparisons. The entire system represented by this channel is therefore simulated efficiently. (Compare this with a waveform-level simulation, which involves generating millions of samples representing the waveforms present in the system being simulated, and processing these samples though all of the functional blocks in the system.)
Discrete channels with memory are more difficult to model. This temporally correlated error generation mechanism is usually modeled by a discrete-time Markov sequence [1, 2, 3, 4] in which a state model is used to characterize the various states of the channel and a set of transition probabilities are used to capture the progression of the channel states. Each state will also have associated with it a set of input-to-output symbol transition probabilities. Thus, the model is now more complex and requires more parameters than the uncorrelated channel. Model structure and parameter values are estimated from either simulated or measured error patterns. Simulation of the discrete channel consists of generating a random number prior to the transmission of each symbol to determine the channel state, and then drawing another random number to determine the input-to-output transition. While the model and parameter estimation procedures are complex, simulation of the Markov model is very efficient. We will elaborate on these ideas in the following sections.
In a discrete memoryless channel, the mapping of input to output is instantaneous and is described by a set of transition probabilities. The simplest discrete memoryless channel model is the binary symmetric channel (BSC), which is illustrated in Figure 15.2(a). The input to the discrete channel is a sequence of binary symbols, denoted by the vector X. The kth component of this vector, xk, corresponds to the kth channel input, and pk corresponds to the probability of error on the kth symbol transmission. For a memoryless channel, pk is independent of k, so the error on all symbols is effected by the channel in the same manner. (If the channel has memory, pk will, in general, be a function of pk−1.) A sequence of symbols of length M is processed through the memoryless channel by invoking the channel model M times in succession. For the kth symbol, a binary input of “0” is received correctly as “0” with a probability 1 − pk, and incorrectly as “1” with an error probability of pk. The kth channel output is denoted by yk and the M-symbol sequence of outputs is denoted by the vector Y. The channel is symmetric in that zeroes and ones are affected by the channel in the same manner. Note that one consequence of symmetry is that the error probability is independent of the transmitted symbol so that the error source can be simulated separately from the information (data) source.
For a binary channel, the input-output relationship can be expressed as
in which X and Y are the input and output data vectors, respectively, ⊕ denotes the XOR operation, and E is the error vector. Specifically, E = {e1,e2,e3, . . .} is a binary vector or sequence having elements {0,1} in which ek = 0 denotes that the kth element of X, xk, is received correctly (yk = xk), and ek = 1 denotes that the kth element of X is received in error (yk ≠ xk). The parameter defining the performance of the binary symmetric channel model is the probability of error PE, which can be easily estimated from measurements or from a system simulation performed at the waveform level. Note that, for a binary system, the Monte Carlo estimate of the error probability is the Hamming weight of the error vector E divided by N, the number of elements in the vector E.[1]
Note that once pk is known, the error sequence corresponding to N transmitted symbols can be generated by making N calls to a uniform random number generator and comparing the random number to the threshold p. Specifically:
where Uk denotes the number obtained from the kth call to the random number generator. It is important to realize that the discrete channel model will not reproduce the original error pattern used to originally determine p. It will, however, reproduce a statistically equivalent error pattern. The simplicity of (15.2) illustrates why substitution of a DCM for the portion of the system between points A and B in Figure 15.1 results in a significant savings in computational complexity.
A slightly more complicated model for the binary channel is the binary nonsymmetric channel model in which the probability of error may be different for zeroes and ones, as in some optical communication systems. This model is illustrated in Figure 15.2(b). This model is characterized by a set of four input-to-output transition probabilities, but only two of these probabilities, αk and βk, are independent. The channel is nonsymmetric if αk ≠ βk.
Extension of this model to the case in which the input and output belong to an M-ary alphabet is straightforward and is shown in Figure 15.3 for M = 3. The channel is characterized by a set of probabilities αij, i,j = 0,1,2, in which αij represents the conditional probability
If the set of probabilities αij are constant, and therefore independent of k, the kth channel output depends only on the kth channel input. For this case, the model shown in Figure 15.3 is memoryless. The transition probabilities are estimated using a detailed waveform-level simulation in which the transitions from the ith input to the jth output are counted. In these models, the transition probabilities are estimated from simulated or measured data as
where Ni is the number of occurrences of the ith input symbol, and nij is the number of times the ith input symbol transitions to the jth output. Another example, shown in Figure 15.4, is used for memoryless channels in which the output of the channel is quantized to a different number of levels from the number of input levels, as in soft decision decoding.
For channels with memory, the most commonly used model is the discrete-time, finite-state Markov model (MM). There are several reasons for the popularity of Markov models. These models are analytically tractable, their theory is well established in the statistical literature, and they have been applied successfully to a variety of important communication problems. Markov models have been used to model the output of discrete information sources such as English text. (The occurrence of a sequence of letters of the English alphabet in a typical passage can be modeled as a Markov sequence.) Similarly, the sampled values of an audio or video waveform can also be modeled by an MM. Markov modeling techniques are directly applicable to the modeling and analysis of discrete communication channels [1]. Also, Markov models can be used to evaluate the capacity of a discrete channel and for the design of optimal error control coding techniques. Markov models have been successfully used to characterize fading channels in wireless communication systems [5–10]. Most importantly, computationally efficient techniques are available for estimating the parameters of Markov sequences from simulated or measured error patterns. As an introduction to the Markov Model, we first consider a simple two-state model. The results are then generalized to an N-state model.
To set the stage for Markov channel models, consider a fading channel in which the received signal strength is above an acceptable performance threshold part of the time, and below the threshold during a deep fade. If we have interest only in the “above threshold” or “below threshold” conditions, we can model the channel to be in either one of the two states as shown in Figure 15.5. In Figure 15.5 we have a good state, g, in which the system performance is acceptable (PE < 10−3) and a bad state, b, in which the received signal level is so low that the probability of error is unacceptably high (PE > 10−3).
Thus, in the simple two-state model shown in Figure 15.5, the states can be represented by the set
As time progresses, the channel goes from good state to bad state, and vice versa. The rate of transition and the length of stay in each of the two states will depend on the temporal correlation of the fading process. If time is measured in increments of a symbol (bit) time, then we can construct a discrete channel model as follows.
At the beginning of each symbol (bit) interval, the channel is in one of the two states. If the channel is in a good state, the probability of transmission error is negligible. On the other hand, if the channel is in a bad state, the probability of transmission error is unacceptably high. Prior to the transmission of each new bit, the channel may change state or remain in the current state. This transition between states takes place with a set of transition probabilities, aij. These are conditional probabilities, and for the two-state model there are four probabilities of interest. Let St and St+1 denote the state of the channel at time t and at time t+1, respectively. (Note that t+1 represents time one increment greater than t, and not one second greater than t, so that t can be viewed as a discrete time index.) With this notation, we define the four transition probabilities to be
This can be represented by the state transition matrix
Note that since we are considering a two-state model, given that St = g, we must have St+1 = g (the channel remains in the good state) or St+1 = b (the channel transitions to the bad state). Thus, each row of the state transition matrix must sum to 1. A graphical representation of the state transition diagram of the two-state model is shown in Figure 15.6.
In the work to follow we will assume that the channel model is stationary and therefore the state state transition matrix A(t) is fixed so that A(t) = A. However, if a simulation is performed in which the initial state probability distribution Π0 is different from the steady-state distribution, some time is required for the state distribution to evolve to the steady-state value Πss. The probability of finding the model in a given state is of interest. Define Πt as the state probability distribution at time t. Specifically:
in which πt,g and πt,b represent the probabilities of finding the channel in the good state or in the bad state at time t, respectively.[2] By definition of the state transition matrix, the state distribution at time t+1 is given by
In a similar manner, the state distribution at time t+2 is
Thus, in general
where Ak represents the k step transition matrix.
Most, but not all, Markov processes settle to a steady-state probability distribution as time evolves.[3] Assuming that the Markov process converges to a steady-state value distribution Πt+k = Πt for sufficiently large t and arbitrary k, it follows that
for arbitrary k. It is easily shown that for sufficiently large values of k, the rows of Ak give Πss [1]. The convergence of Π0 to Πss is illustrated in the following example.
Example 15.1.
As a simple example let
with the initial state distribution
The following MATLAB program illustrates the convergence of Π0 to Πss:
% File: c15_MMtransient.m N = 100; pie = zeros(N,2); A = [0.98 0.02; 0.05 0.95]; pie(1,:) = [0.50 0.50]; for k=2:N pie(k,:) = pie(k-1,:)*A; end kk = 1:N; plot(kk,pie(:,1),'k-',kk,pie(:,2),'k:') xlabel('Iteration') ylabel('Probability') text1 = ['The steady-state probabilities are ',... num2str(pie(N,1)),' and ' ,num2str(pie(N,2)),'.']; legend('State 1','State 2',2) disp(text1) disp(' ') disp('The value of A^N is'), A^N % End of script file.
Executing the program results in the script
≫ c15_MMtransient The steady-state probabilities are 0.71412 and 0.28588. The value of A^N is ans = 0.7145 0.2855 0.7138 0.2862
We see that calculating the value of Πss iteratively using (15.9) and by raising A to a high value gives, as expected, consistent results. The manner in which the probability distribution converges to the steady-state value is shown in Figure 15.7.
Before leaving the two-state model, we need to discuss the error generation matrix, defined as
in which “C” denotes that a correct decision is made and “E” denotes that an error is made. By simple matrix multiplication it follows that the unconditional probability of a correct decision PC and error PE are given by
where Πss is the steady-state state distribution matrix and BT is the transpose of the error generation matrix B.
Note that, if all elements of B are nonzero, either state can produce an error although the error probabilities may be quite different for different states. Thus, upon observation of an error, the state that produced the error cannot be identified. It is for this reason that we call the model “hidden”. These models are referred to as hidden Markov models (HMMs).
Example 15.2.
In this example, the simulation of the channel using the Markov model is demonstrated and the system error probability is computed. Assume that errors can be produced in either state where, obviously, the probability of error in the good state will be much less than the error probability in the bad state. Specifically we define the conditional error probabilities as
and
For these error probabilities, the error generation matrix B takes the form
In addition, the Markov chain is defined by the transition matrix
as in the previous example. The MATLAB program for simulating the channel is
% File: c15_hmm2.m N = 100000; % number of iterations state = 'Good'; % initial state A = [0.98 0.02; 0.05 0.95]; % state transition matrix B = [0.0005 0.1000]; % second row of B out = zeros(1,N); % initialize matrix errors = 0; for i=1:N error = 0; % initialize error counter y = rand(1); % RV for state transition err = rand(1); % RV for error given state if state=='Good' % test for Good state if y<A(1,1) state='Good'; % remain in Good state if err<B(1); % test for error error = 1; % record an error end else state='Bad '; % transition to Bad state if err<B(2); % test for error error = 1; % record an error end end else % state = Bad if y<A(2,2); state='Bad '; % remain in Bad state if err<B(2); % test for error error = 1; % record an error end else state='Good'; % transition to Good state if err<B(1); % test for error error = 1; % record an error end end end errors = errors + error; % increment error counter end PE = errors/N % calculate error probability % End of script file.
Executing the program yields the following MATLAB result:
≫ c15_hmm2 PE = 0.0285
Thus, we have
Note that PE computed in this manner is a random variable. We now “sanity-check” this simulation result.
From the previous example, the steady-state probabilities are
From (15.16) we have
This gives
which is very close (PE = 0.0289) to the value given by the simulation. Thus, the simulation result appears reasonable.
A more general version of the two-state Markov model can be constructed with a larger number of states, with many good and bad states with varying degrees of “goodness” or “badness” corresponding to increasingly more severity of fading and or noise. The result is the N-state Markov model, which is a generalization of the two-state model considered in this section.
An N-state Markov (processes) model for a discrete communication channel is defined by a number of parameters. The set of states S is denoted by
and St, as before, denotes the state at time t. Thus, St ranges over the set S. We have interest in the probability, Πt,i, that the Markov model will be in state i at time t. We denote this as
The transition probabilities, aij, denote the probability of going from state i at time t(St = i) to state j at time t+1 (St+1 = j). In equation form
This transition normally takes place in a time increment equal to a bit or symbol duration. Finally we have the input-to-output transition (error symbol) probabilities. The error symbols for an M-ary symbol alphabet are denoted by the set
For the familiar binary case
where 0 denotes no error and 1 denotes an error. The probability that error symbol ek occurs given that the model is in state i is denoted bi(ek). In equation form we have
In binary hard decision applications, the parameters bi(ek) are represented by a matrix having two rows and N columns, where N represents the number of states in the model. Thus, the error generation matrix takes the form
where b1i denotes the probability of a correct decision given that the model is in state i, and b2i represents the probability of error given that the system is in state i. With this formulation, note that the columns of B must sum to one. For nonbinary applications, or binary soft-decision applications, the error generation matrix B will have more than two rows.
These parameters define a discrete-time Markov process operating at a transition rate equal to the symbol rate of the communication system, and the output of the process consists of two sequences: the sequence of states {St}, and a sequence of error symbols {Et}, where t is the time index that can be indexed over the integer set {0,1,2, . . .}. Normally, only the input and the output of the channel and hence the error sequence can be observed. The state sequence itself cannot be observed. Hence the state sequence is “hidden” or not visible from external observations and the Markov model is labeled as a hidden Markov model.
The Markov process of order m (or memory m), defined as the probability of the state at time t+1, St+1, is given by
which states that the probability is dependent upon the previous m states. For a first-order Markov process, the probability of the state at time t+1 is a function only of the previous state. In other words:
For channel models we will use a first-order Markov model. For modeling data sources, higher-order models may be required. As an example, modeling the symbol sequences in an English-language text may require a second-order or third-order Markov model, since the probability of occurrence of a given letter may depend on the previous two or three letters.
Since stationarity is usually assumed in modeling the analog portion of the communication channel, it is common to assume stationarity of the Markov model for discrete channels also. Stationarity implies that the parameters of the model, that is, the probabilities Πt,i, aij(t), and bi(ek) do not depend on t. In this case, we have
where the terms aki are the elements in the ith column of the state transition matrix
and Πk are the elements of the state probability distribution vector
As in the case of a two-state model, the n-stage transition matrix is given by An. As was demonstrated in Example 15.1 for a two-state model, (15.34) along with the constraint that
implies that the state probabilities, πi, are uniquely determined from the transition probabilities, aij. Hence the Markov model is completely defined by the matrix A of the state transition probabilities and B, where B is the matrix of the input-to-output symbol transition (i.e., error) probabilities
as previously discussed.
Once the discrete Markov model is derived, simulation of the model is relatively easy. We saw this in Example 15.2 for a two-state model, and the technique used in Example 15.2 is now generalized for an N-state model.
Assume that the model is in state k at time t and that both the next state and the corresponding component of the error vector, ek+1, is to be simulated. The first step is to generate two uniformly distributed random variables, U1 and U2. The first random variable, U1, is used to determine the new state. Given that the model is in state k, the transition probabilities are defined by the kth row of the state transition matrix A. The kth row of A is designated by the vector Ak where
The cumulative probabilities associated with Ak are denoted βk and are given by
Note that the MATLAB library function cumsum
maps the vector Ak into the a vector having elements βk. The probability of making a transition from state k to state i is, by definition, aki, and is given by
This probability is represented by the shaded area in Figure 15.8(a).
After the new state is established, the next step is to determine whether or not an error is made in the new state. In order to accomplish this, we compare the second uniform random variable U2 with the threshold γ shown in Figure 15.8(b). The probability of a correct decision is Pr{U2 < γ}, and the probability of error is Pr{U2 > γ}, which is the shaded region in Figure 15.8(b). Since the current state is i, the value of γ is given by the element b1i(row 1, column i) in the state observation matrix B.
The implementation of the simulation procedure is realized using the following MATLAB code:
u1 = rand(1); % get uniform RV 1 cum_sum = [0 cumsum(A(state,:))]; for i=1:total_states % loop to determine new state if u1 > = cum_sum(i) & u1 < cum_sum(i+1); state = i; % assign new state end end state_seq(t) = state; % new state u2 = rand(1); % get uniform RV 2 if u2 > B(1,state) out(t) = 1; % record error end
We now demonstrate this process in the following example.
Example 15.3.
In this example, a binary sequence of length N = 20,000 is generated representing symbol errors on a channel. This error sequence is generated using a three-state Markov model. If a given element in the sequence is a binary “one,” a channel symbol error is recorded for the position corresponding to the given element. A binary “zero” denotes that an error did not occur in the corresponding position.
We assume that errors can occur in any of the three states. Specifically we assume
and
which corresponds to the error probability matrix
The state transition matrix for the Markov model is assumed to be
We assume that the model is initially in state 1.
MATLAB code c15_errvector.m
, for generating the error sequence, is given in Appendix A. Execution of the program c15_errvector
requires as input data the parameters N and the matrices B and A. The default values are N = 20,000, and the matrices B and A are defined by (15.45) and (15.46), respectively. The program produces two outputs. These are the error vector out
, which gives the error positions in a sequence of N transmissions, and the state sequence vector state_seq
, which details the state transitions. The probability of finding the model in a given state can be computed from the state sequence vector, and the simulated error probability can be computed from the error vector. These calculations are carried out in the MATLAB program c15_hmmtest.m
. Execution of these two programs using the default values produces the following output, in which the dialog relating to the acceptance or rejection of default values is suppressed:
≫ c15_errvector ≫ c15_hmmtest Simulation results: The probability of State 1 is 0.2432. The probability of State 2 is 0.1634. The probability of State 3 is 0.5934. The error probability is 0.01445.
Note that the simulated values given for the state probabilities and the error probability are random variables. The variance of these random variables is reduced by increasing the length of the error vector.
A “sanity check” on the above values can be obtained by calculating the state probabilities and the error probability directly from the matrices B and A. The simple MATLAB code, together with the output, is as follows:
≫ A100 = A^100 A100 = 0.2353 0.1765 0.5882 0.2353 0.1765 0.5882 0.2353 0.1765 0.5882 ≫ PE = B*A100' PE = 0.9851 0.9851 0.9851 0.0149 0.0149 0.0149
Note that the theoretical state probabilities S1, S2, and S3 are 0.2353, 0.1765, and 0.5882, respectively, and that the theoretical error probability is 0.0149. These obviously differ slightly from the simulated values, which, as previously mentioned, are random variables. The variance of these random variables can obviously be reduced, as the student should verify, by choosing a larger value of N .
The characterization of discrete channels using discrete-time, finite-state Markov sequences has been proposed in the past by Gilbert, Fritchman, and others [11, 12]. The Gilbert model is a two-state model with a good error-free state, and a bad state with an error probability of p. This model was used by Gilbert to calculate the capacity of a channel with burst errors. Parameters of the model can be estimated from measured or simulated data using the procedures that will be explored in the following section.
The Fritchman model, first proposed in 1967 [12], is now receiving considerable attention, since the Fritchman model appears to be suitable for modeling burst errors in mobile radio channels and also because it is relatively easy to estimate the parameters of the Fritchman model from burst error distributions. For binary channels, Fritchman’s framework divides the state space into k good states and N − k bad states. The good states represent error-free transmissions, and the bad states always produce a transmission error. Hence the entries in the B matrix are zeros and ones and they need not be estimated. A Fritchman model with three good states and one bad state is illustrated in Figure 15.9. This model has the state transition matrix
The B matrix for this case takes the very simple form
In general, the state transition matrix A for the Fritchmann model can be partitioned as
where the submatrices represent the transition probabilities between various good and bad states. For this model, it is possible to derive analytically the expressions for burst error distributions in terms of the model parameters and use these expressions to estimate the parameters of the model from empirical (measured or simulated) burst error distributions.
Let {O1 ,O2, . . . , OT} be an error sequence with Ok = 1 indicating that the kth transmitted bit suffered a transmission error, and Ok = 0 indicating error-free transmission of the kth symbol. Also, let the notation (0m|1) denote the event of observing m or more consecutive error-free transmissions following an error, and (1m|0) represent the event of observing m or more consecutive errors following an error-free transmission. Fritchman showed that the probabilities of occurrence of these two events are given by the sum of weighted exponentials
and
where λi, i = 1, 2, . . ., k, and λi, i = k + 1, k + 2 , . . . ,N, are the eigenvalues of AGG and ABB, respectively, and the corresponding values of fi are functions of aij. From (15.50) and (15.51) we can obtain the probability of obtaining exactly m zeros (or ones) as
and
The Fritchman model can be interpreted as equivalent to a Markov process with a state transition probability matrix
where Λgg and Λbb are diagonal matrices given by
and
Note that in this equivalent model there are no transitions within the set of k good states and no transitions within the set of N − k bad states. Such transitions are indistinguishable from the observed error sequence, since a transition from one good state to another good state still produces no errors and the transition is not observable from the output. This structure will be revisited when semi-Markov models are considered in a later section.
The Fritchman model is not unique except when there is only one bad state. This is so because, in the general case, the error-free run distribution Pr(0m|1) and the error burst distribution Pr(1m|0) do not specify the statistical dependence of the error-free runs and the error bursts. In the case of a single error state model, as in Figure 15.9, Fritchman showed that [12]
From (15.57) it can be seen that, in the case of a single error state model, the error-free run uniquely specifies the 2(N − 1) model parameters. An empirical procedure is used to fit an N− 1 mixture of exponentials as in (15.50) and (15.51) to the (simulated or measured) error-free run distribution.
While Fritchman’s model is applicable to discrete channels with simple burst error distributions, it may not be adequate to characterize very complex burst error patterns that will require more than one error state in the model. In such cases it will be very difficult to estimate the model parameters from the burst error distributions alone. Fortunately, it is possible to find a maximum likelihood estimate of the parameters using iterative techniques.
The Markov model for a discrete channel is described by the N × N state transition matrix A and the M × N error probability generation matrix B. An iterative procedure for estimating these parameters γ = {A, B} from a given error sequence, obtained through simulation or measurement, = {O1,, ... ,Ot, ... OT} is based on the Baum-Welch algorithm [13]. This iterative algorithm is designed to converge to the maximum likelihood estimator of Γ = {A,B} that maximizes Pr(|Γ).
The goal is to compute estimates of the elements of the state-transition matrix A. These are given by
We also require the estimates of the elements of the error generation matrix, which are given by
The calculations required to implement the Baum-Welch algorithm are described in the following steps. In addition, the MATLAB program for implementing the Baum-Welch algorithm is developed. Two versions of the MATLAB code for implementing the detailed calculations are given. The first version, denoted the “basic code,” contains a number of looping operations and is therefore inefficient. It is, however, easier to relate the basic code to the defining equations. The second version, denoted the “more efficient version,” eliminates some or all of the looping operations in order to take advantage of MATLAB’s ability to implement vector operations. It should be noted that in several instances the code is not fully vectorized. This was done in order to have more readable code. The version of the Baum-Welch algorithm given in Appendix B was developed by combining the vectorized code segments.
Start with an initial (assumed) model Γ= {A,B}.
With Γ = {A,B} as the model, we first compute the “forward variables”
and the “backward variables”
for t = 1, 2, ..., T and i = 1, 2, ..,N. Details of these calculations follow.
Forward variables Calculation of the forward variables involves three steps: initialization, induction, and termination.
Initialization:
Induction:
Termination:
Note that
The basic MATLAB code for performing these calculations follows. (Note the use of the scaling factor. The scaling factor is described in the following section.)
alpha = zeros(len,states); % memory allocation for column = 1:states alpha(1,column) = pye(1, column)*b(1, column); hspace*50em% initialization end scale(1) = sum(alpha(1,:)); % normalizing factor alpha(1,:) = alpha(1,:)/scale(1,:); % normalization sum1 = 0; for t = 1:(len - 1) % induction for j = 1:states for i = 1:states inner_sum = alpha(t,i)*p(i,j); sum1 = sum1 + inner_sum; end alpha(t+1,j) = sum1*b(out(t+1)+1,j); sum1 = 0; end scale(t+1) = sum(alpha(t+1,:)); % normalizing factor alpha(t+1,:) = alpha(t+1,:)/scale(t+1); % normalization end
More efficient MATLAB code results by eliminating several of the looping operations. This yields
alpha = zeros(len,states); % memory allocation alpha(1,:) = pye.*b(1,:); % initialization scale(1) = sum(alpha(1,:)); % normalizing factor alpha(1,:) = alpha(1,:)/scale(1); % normalization for t = 1:len-1 % induction alpha(t+1,:) = (alpha(t,:)*p).*b(out(t+1)+1,:); scale(t+1) = sum(alpha(t+1,:)); % normalizing factor alpha(t+1,:) = alpha(t+1,:)/scale(t+1); % normalization end
Backward variables Calculation of the backward variables involves two steps: initialization and induction.
Initialization:
Induction:
The details of these calculations are illustrated in Figure 15.10. The basic MATLAB code for this segment follows:
beta = zeros(len, states); % memory allocation beta(len,:) = 1/scale(len); % initialization and scaling for t = (len-1):-1:1 % induction for i = 1:states for j = 1:states inner_sum = p(i,j)... * b(out(t+1)+1,j) * beta(t+1,j); sum2 = sum2 + inner_sum; end beta(t,i) = sum2; sum2 = 0; end beta(t,:) = beta(t,:)/scale(t); % scaling end
The more efficient (partially vectorized) version follows:
beta = zeros(len, states); % memory allocation beta(len,:) = 1/scale(len); % initialization for t = len-1:-1:1 % induction beta(t,:) = (beta(t+1,:).*b(out(t+1)+1,:))... *(p')/scale(t); end
The next step is to compute γt(i) according to
The basic MATLAB code for implementing this calculation follows:
gamma = zeros(len,states); % memory allocation for i = 1:len for j = 1:states gamma(i,j) = alpha(i,j)*beta(i,j); % calculation of gamma % variable end gamma(i,: = gamma(i,:)/sum(gamma(i.:)); end
We observe that we don’t need to keep all of these values in memory, since they can be summed for the time indices. The following MATLAB code explains this further:
gamma_sum = zeros(1,states); % memory allocation for t = 1:len gamma_sum = gamma_sum + alpha(t,:).*beta(t,:); end
The quantity ξt(i, j) is defined by
which can be calculated as follows:
prob_model_given_seq = zeros(1,len); % memory allocation % for model eta = zeros(states,states,len); % memory allocation for eta for t = 1:len % start the loop for i = 1:states prob_model_given_seq = zeros(1,len); % memory allocation temp(i) = alpha(t,i)*beta(t,i); end prob_model_given_seq(t) = sum(temp); % probability of model end for i = 1:states for j = 1:states for t = 1:(len-1) eta(i,j,t)=((alpha(t,i)*p(i,j)... *b(out(t+1)+1,j)*beta(t+1,j))); end end end
Careful MATLAB programming indicates that we don’t need to save this variable for all the time indices. Rather we can compute
eta = zeros(states,states); % memory allocation sum_eta = zeros(states,states) % memory allocation for t = 1:(len-1) for i = 1:states eta(i,:)= ((alpha(t,i)*(p(i,:). ... *(b(out(t+1)+1,:))).*beta(t+1,:))); end sum_eta = sum_eta + eta; % adds the values of eta end
We now determine the new state state transition probability âij according to
The basic MATLAB code is
for i = 1:states for j = 1:states p_estimate(i,j) = sum(eta(i,j,:))/(sum(gamma(:,i)) -gamma(len,i)); end p_estimate(i,:) = p_estimate(i,:)/sum... (p_estimate(i,:)); % normalization end
Alternatively:
for i = 1:states for j = 1:states p_estimate(i,j) = sum_eta(i,j)... /(gamma_sum(i)-alpha(len,i).*beta(len,i)... /(sum(alpha(len,:).*beta(len,:)))); end p_estimate(i,:) = p_estimate(i,:)/sum(p_estimate(i,:)); end
Next, , defined by
is computed. The basic MATLAB code is
for j = 1:states i = find(out==0); % find indices of correct bits for k = 1:length(i) sum_gamma = sum_gamma +gamma(i(k),j); end b(1,j) = sum_gamma/sum(gamma(:,j)); sum_gamma = 0; end for j = 1:states ii = find(out==1); % find the indices of errors for k = 1:length(ii) sum_gamma = sum_gamma +gamma(ii(k),j); end b(2,j) = sum_gamma/sum(gamma(:,j)); sum_gamma = 0; end for i = 1:states b(:,i) = b(:,i)/sum(b(:,i)); end
Using more efficient computations, the output symbol probability matrix can be estimated as
out_0 = find(out == 0); % find correct bits indices out_1 = find(out == 1); % find indices of errors sum_0 = zeros(1,states); sum_1 = zeros(1,states); gamma_sum = sum(gamma); for i = 1:length(out_0) sum_0 = sum_0 + gamma(out_0(i),:); % adds correct bits end for i = 1:length(out_1) sum_1 = sum_1 + gamma(out_1(i),:); % adds error bits end for i = 1:states for j = 1:2 if j == 1 b(j,i) = sum_0(i)/gamma_sum(i); % elements b correct bits end if j == 2 b(j,i) = sum_1(i)/gamma_sum(i); % elements b error bits end end end for i = 1:states b(:,i) = b(:,i)/sum(b(:,i)); % normalize the b matrix end
We can also compute
Go back to Step 1 with the new values of or equivalently obtained in Step 2 and repeat until the desired level of convergence, as discussed in a section to follow, is reached.
The forward and backward vectors tend to zero exponentially for large data size and must be scaled properly in order to prevent numerical underflow. The scaling constant, the use of which is implemented in the MATLAB code given in Appendix B, is first defined as
The scaled values of αt(j), denoted , are given by
This, of course, implies that
The values of Ct are saved and used to scale the backward variables. The scaled values of βt(i), denoted are given by
with the initialization
where 1 denotes the column vector containing all ones. The gamma variable can also be normalized if desired, but scaling the gamma variables is not necessary. Turin [1] discusses scaling in more detail.
Since the Baum-Welch algorithm is iterative, the number of iterations to be performed for a required level of model accuracy must be determnined. Perhaps the best way to accomplish this is to display the estimates of A and B as the algorithm is executing. If one desires each element of A and B to be accurate to a given number of significant figures, execution of the algorithm is allowed to continue until the elements of A and B no longer change, within the given accuracy, from iteration to iteration. The algorithm is then terminated manually. This technique has considerable appeal, since the level of accuracy is known. Also, based on previous knowledge, one may simply perform a given number of iterations.
Another commonly used method for determining convergence is to continue the iterations until successive values of Pr[|Γ] differ very little. (The Baum-Welch algorithm is guaranteed to converge to a maximum likelihood solution. A proof of this statement is given in [1].) The value of Pr[|Γ] is determined in terms of the scaling constant Ct in (15.73). Specifically
For T large, this number will be very small and is usually expressed as
This is referred to as the log-likelihood ratio and is illustrated in Example 15.4.
It should be pointed out that the estimates of A and B, for a given set of data, are not unique unless the initial estimates of A, B, and Π are specified. Since the error vector is a function of both A and B, various combinations may produce statistically equivalent results and a specific result will depend on the initial conditions.
The Baum-Welch algorithm is one of many reestimation algorithms available for the computation of model parameters based on an error vector. This error sequence could involve many thousands, or even millions, of symbols. Note that long observation periods are required to accurately estimate small values of aij. In communication systems, these small values are often critical for estimating system performance. The computational burden associated with the Baum-Welch algorithm is quite high when the error vector is long, since the forward and backward variables are computed for each symbol in the given error sequence. In addition, convergence is sometimes slow, which adds to the computational burden. For low error probability applications, the error vector has long runs of zeros. In addition, the error vector must contain a significant number of error events in order for the error vector to accurately define the model. In these situations the Baum-Welch algorithm is especially inefficient and faster algorithms are needed.
One method, proposed by Turin [14], for dealing with this problem, involves the computation of the forward and backward variables using a block matrix version of the form
The relationship between the terms in the matrix defined by (15.79) are illustrated in Figure 15.11. Note that the powers of submatrices involved in the computations can be precomputed and reused in order to reduce the overall computational burden.
Another modification proposed by Sivaprakasam and Shanmugan [6] is based on the fact that for a general Markov model there is a statistically equivalent Fitchman-like model with k good states and N − k bad states and an A matrix having the form
where Λ00 and Λ11 are diagonal matrices. These models are often referred to as block diagonal Markov models. With this model, the channel remains in the same state during an error burst and changes state only at the end of a burst. As a result, all variables are computed only at time steps involving the change of error symbol. In other words, variables are computed at the beginning of each burst of errors rather than once every symbol. With this modification, the computations take the form illustrated in Figure 15.12. Note that the computations during long bursts of ones and zeros now involve raising the power of a diagonal matrix rather than raising the power of arbitrary matrices, and matrix multiplications occur only at the transition from one burst to another. Thus, the computations are very efficient, and long error bursts can be processed without excessive storage and computational requirements.
Using this model we have interest only in the run lengths of zeros and ones. Thus, the error vector can be placed in a very compact form. As a simple illustration, consider an error sequence
The run-length vector corresponding to this error sequence can be written as
The MATLAB routine c15_seglength.m
, given in Appendix D, generates the run-length vector defined by (15.83) from the error vector defined by (15.81).
The development of reestimation algorithms based on block equivalent Markov models will not be discussed here. They have, however, been extensively studied in the literature [1, 14, 15]. A MATLAB program, based on the work of Sivaprakasam and Shanmugan [15] is given in Appendix C. An example illustrating the use of this algorithm is presented in a following section.
Irrespective of the algorithms used, the Markov model for a discrete channel must be computed for different values of the parameters of the underlying physical channel. If the underlying physical channel is changed in any way, the discrete channel model must be reestimated. For example, if a discrete channel model is developed for a given channel, then the Markov model must be developed from measurements or waveform-level simulations for different values of the parameters of the channel including S/N. Thus a parametrized set of Markov models for the underlying channel can be developed and used for the design and analysis of error control coders, interleavers, etc.
We conclude this chapter with two examples that demonstrate the determination of a Markov channel and the determination of a semi-Markov model. The flow of these examples is as follows:
In the first example (Example 15.4), the error vector generated in Example 15.3 is used as input to the Baum-Welch algorithm. The result is an estimated channel model. The error probability and the run-length distribution resulting from this model are determined and are then compared to the error probability and run-length distribution of the original sequence generated in the first example.
The second example (Example 15.5) is similar to the first example. The essential difference is that a block equivalent semi-Markov model is used rather than a Markov model. The motivation for using a semi-Markov model is that the semi-Markov model leads to a substantial reduction in the time required to derive the model from measured or simulated data.
Example 15.4. In this example we generate an error sequence of length N = 20,000 with a known two-state model defined by
and
The Baum-Welch algorithm is then used to estimate the model using the generated error sequence. Note that we assume a three-state model when the Baum-Welch model is applied. This may at first seem strange but, given simulation or measurement data, the channel model that produced the data is not known. The MATLAB dialog, with some of the nonessential dialog eliminated, is as follows:
≫ c15_errvector Accept default values? Enter y for yes or n for no > n Enter N, the number of points to be generated > 20000 Enter A, the state transition matrix > [0.95 0.05; 0.1 0.9] Enter B, the error distribution matrx > [0.95 0.5; 0.05 0.5] ≫ out1 = out; ≫ c15_bwa(20,3,out) Enter the initial state transition matrix P > [0.9 0.05 0.05; 0.1 0.8 0.1; 0.1 0.2 0.7] Enter the initial state probability vector pye > [0.3 0.3 0.4] Enter the initial output symbol probability matrix... B > [0.9 0.8.7; 0.1 0.2 0.3]
After one iteration we have[4]
Also, after 10 iterations, the estimated state transition matrix is
At the 20th iteration, which is the termination point, the estimated state transition matrix is
The log likelihood function is illustrated in Figure 15.13. It can be seen that the log likelihood function converges in about 10 iterations. Note that this conclusion is consistent with the preceding computations of Ak and Bk for k = 10 and k = 20. Note also that, as discussed previously, the likelihood numbers are very small.
We now generate a second sequence, out2
, using the model estimated by the Baum-Welch algorithm. This gives, with the dialog dealing with default values suppressed:
≫ a = [0.9437 0.0299 0.0263; 0.1173 0.7425 0.1402;... 0.0663 0.1324 0.8013]; ≫ b = [0.9522 0.6637 0.4313; 0.0478 0.3363 0.5687]; ≫ c15_errvector Accept default values? Enter y for yes or n for no > n Enter N, the number of points to be generated > 20000 Enter A, the state transition matrix > a Enter B, the error distribution matrix > b ≫ out2 = out;
In order to calculate and plot Pr{0m|1} for both out1
and out2
, we execute the following lines of MATLAB code:
≫ runcode1 = c15_seglength(out1); ≫ runcode2 = c15_seglength(out2); ≫ c15_intervals2(runcode1,runcode2)
This produces the plots illustrated in Figure 15.14. The top frame illustrates Pr{0m|1} for the original data and the bottom frame illustrates Pr{0m|1} using the data generated by the model estimated by the Baum-Welch algorithm.
There are a few obvious differences between the two plots illustrated in Figure 15.14, especially in the neighborhood of m = 10. Note, however, that the error probabilities are given by
≫ p1 = sum(out1)/N p1 = 0.0038 ≫ p2 = sum(out2)/N p2 = 0.0035
Example 15.5. In this final example, we consider a test case used by Sivaprakasam and Shanmugan [6] to illustrate the block diagonal Markov model. Assume a three-state model having two good states and one bad state, defined by[5]
Since the good states are error-free and errors are always produced in the bad state, the error observation matrix is defined by
Note that this model is semi-hidden. If an error occurs, we know that the error was generated by state 3. If, however, no error is generated, the state cannot be identified.
First, we generate an error vector, having length 20,000, for the state transition matrix A and the error observation matrix B. The appropriate MATLAB commands, with the nonessential dialog supressed, follows:
≫ a1 = [0.90 0.02 0.08; 0.10 0.50 0.40; 0.10 0.20 0.70]; ≫ b1 = [1 1 0; 0 0 1]; ≫ c15_errvector Accept default values? Enter y for yes or n for no > n Enter N, the number of points to be generated > 20000 Enter A, the state transition matrix > a1 Enter B, the error distribution matrix > b1 ≫ out1 = out; ≫ runcode1 = c15_seglength(out1); ≫ [A matrix, pi_est] = c15_semiMarkov(runcode1,100,[2 1])
Note that we must run c15_seglength
prior to running c15_semiMarkov
in order to generate the run-length vector.
After 50 iterations, the estimate of the state transition matrix is
Note that this is close to, but not exactly equal to, the result obtained by Sivaprakasam and Shanmugan [6]. There are several explanations for the differences. First, the error vector upon which the estimated model is based is a sample function of a stochastic process and, for practical purposes, no two error vectors will be identical. Also, error vectors of different lengths will result in slightly different models. Finally, we fixed the number of iterations, and Sivaprakasam and Shanmugan used a different stopping criteria.
The next step is to generate a second error vector based on the estimated model and plot Pr{0m|1} for both the original error vector and for the error vector generated by the estimated model. This is accomplished using the following MATLAB code:
≫ a2 = [0.9043 0.0 0.0957; 0.0 0.4911 0.45089; 0.1402 0.1515 0.7083]; ≫ b2 = [1 1 0; 0 0 1]; ≫ c15_errvector Accept default values? Enter y for yes or n for no > n Enter N, the number of points to be generated > 20000 Enter A, the state transition matrix > a2 Enter B, the error distribution matrix > b2 ≫ out2 = out; ≫ runcode2 = c15_seglength(out2); ≫ intervals2(runcode1,runcode2);
The results are illustrated in Figure 15.15. Note that the results are in close agreement even though the original model, defined by A, and the estimated model, defined by Â, are quite different. Obviously we knew from the start of this problem that A and  would be different, since  is constrained to have a block diagonal form.
It is also instructive to check the error probabilities. In order to accomplish this we compute
≫ pe1 = sum(out1)/20000 pe1 = 0.3617 ≫ pe2 = sum(out2)/20000 pe2 = 0.3538
Once again we have close agreement.
The focus of this chapter was the development of discrete channel models. Discrete channel models are attractive, since their use greatly reduces the computational complexity associated with the execution of a simulation. The parameters of the model may be determined from either measured data or from the results of a waveform-level simulation. Discrete channel models have found widespread use in wireless communication systems, since they can be used to effectively model fading channels. Discrete channel models are an abstraction of waveform-level models in that they characterize the input-output characteristics of the channel but do not model the physical functionality of the channel. It is through this abstraction that the computational burden is reduced.
A two-state model was initially considered in order to establish the concept of a state-transition matrix, the state distribution vector, and the error generation matrix. Several different techniques for determining the steady-state distribution matrix were considered. These concepts were then extended to the N-state model including the development of a technique for efficiently simulating a channel based on a discrete channel model.
We next considered the two-state Gilbert model and the N-state Fritchman model as examples of a hidden Markov model. The Fritchman model is attractive from a computational point of view, since the state-transition matrix contains a large number of zeros (relatively sparse). The Fritchman model has also been shown to effectively model the fading channel environments encountered in wireless communication systems.
The heart of this chapter dealt with the development of discrete channel models based on measured data or data obtained from a waveform-level simulation. The basic tool used for parameter estimation is the Baum-Welch algorithm. The development of the Baum-Welch algorithm was briefly outlined and the development of a MATLAB implementation of the Baum-Welch algorithm was presented. Scaling to prevent underflow was discussed, as was the block equivalent Markov model, which can be estimated more efficiently than the general Markov model.
The chapter concluded with three examples summarizing techniques for estimating both Markov models and block-equivalent Markov models. The first example focuses on the generation of an error vector given a Markov model. Validation of the result is based on the probability of state occupancy and the probability of error. The second example uses the Baum-Welch algorithm to estimate a channel model. Validation includes a comparison of the run-length statistic Pr{0m|1} for both the original error vector and the error vector produced by the estimated model. The third example is similar to the second example except that a block equivalent Markov model is used. These three examples should serve as a guide to the use of the tools developed in this chapter.
The development and use of Markov models for burst error channels (channels with memory) are currently an active area of research and new papers appear frequently. The interested reader may find recent articles on this topic in IEEE Transactions on Communications, IEEE Transactions on Vehicular Technology, and other related journals and conference proceedings. Markov models are also being developed for burst errors at the higher layers of communication networks. For example, packet errors at the transport layer in a network. The book by Turin [1] is recommended as a comprehensive treatment of the subject.
15.1 | Assume that a binary symmetric channel is defined by the error probability p = 10−2. Using the technique defined by (15.2), with pk independent of k, simulate the transmission of N symbols through the channel for N = 100, N = 1,000 and N = 10,000. In each case count the number of errors that actually occur and calculate the BER resulting from the simulation. Repeat this experiment ten times for each value of N. What do you conclude? |
15.2 | Rework Example 15.1 assuming that |
15.3 | Repeat the preceding problem assuming that the initial state probability distribution and the state transition matrix |
15.4 | A Markov process has the state transition matrix Determine the steady-state distribution Πss using the technique illustrated in Example 15.1. Verify the result by raising A to a sufficiently high power. |
15.5 | We know that Πss can be determined by raising the state transition probability matrix to a high power. As yet another method for solving this problem, recognize that Πss = ΠssA implies that Πss is an eignvector of A. Using this observation, develop a MATLAB program for determining Πss given the state transition matrix A. Apply this technique to |
15.6 | Develop a MATLAB program to generate a Markov sequence based on the state transition matrix Assuming that the system is initially in state 1, generate 200 state transitions and develop a figure that illustrates the state of the channel for each time step. Next generate 40,000 state transitions and, from the simulation results, compute the steady-state probabilities of each state. Verify the result by computing Ak for sufficiently large k. |
15.7 | Consider the channel model illustrated in Figure 15.16. The probability of error given that the channel is in the good state is 0 and the probability of error given that the channel is in the bad state is 1.
|
15.8 | Develop a figure similar to Figure 15.9 for the Fritchman model |
15.9 | Generalize |
15.10 | Rework Example 15.4 by estimating a two-state model. Compare the estimated model with the original two-state model. Plot the estimated parameters as a function of the number of iterations up to 20 iterations. Discuss the convergence. |
15.11 | Rework Example 15.4 using, as initial conditions to the Baum-Welch algorithm, the matrices A and B that were used to generate the original error vector. Discuss the convergence. |
15.12 | Rework Example 15.4 with N = 4, 3, and 2, where N is the number of states in the stimated model. Compare the likelihood metric after convergence is reached. Use as input to the Baum-Welch algorithm the data generated by the two-state model used in Example 15.2. Work this problem several times using different initial conditions for the Baum-Welch algorithm and discuss the results. |
15.13 | Modify the Baum-Welch algorithm given in Appendix A ( |
15.14 | Modify the MATLAB code for Example 15.4 so that the CPU time spent in the model estimation algorithm can be measured. After this is accomplished, rework Example 15.4 for error vectors having length N = 20,000 and 100,000. Compare the times required to estimate the model. Also compare the error probabilities. |
15.15 | Rework Example 15.5 assuming two good states and two bad states. |
15.16 | Assume that a discrete channel is defined by the state transition matrix Determine an equivalent block diagonal Markov model having two good states and two bad states. (Note: This channel model represents another test case presented by Sivaprakasam and Shanmugan [6]. The student should compare the result with that found by Sivaprakasam and Shanmugan.) |
% File: c15_errvector.m disp(' ') disp('Default values are:') N = 20000 % default N A = [0.8 0.1 0.1; 0.2 0.6 0.2; 0.02 0.08 0.90] % default A B = [0.999 0.95 0.99; 0.001 0.05 0.01] % default B disp(' ') disp('Accept default values?') dtype = input('Enter y for yes or n for no > ','s'), if dtype == 'n' N = input(' Enter N, the number of points to be generated > '), A = input(' Enter A, the state transition matrix > '), B = input(' Enter B, the error distribution matrix > '), end state = 1; % initial state total_states = size(A,1); out = zeros(1,N); % initialize error vector state_seq = zeros(1,N); % initialize state sequence h = waitbar(0,'Calculating Error Vector'), % ran_b = rand(1); % get random number if ran_b>B(1,state) % test for error out(1) = 1; % record error end state_seq(1) = state; % record state for t=2:N u1 = rand(1); % get random number cum_sum = [0 cumsum(A(state,:))]; for i=1:total_states % loop to determine new state if u1 >= cum_sum(i) & u1 < cum_sum(i+1); state = i; % assign new state end end state_seq(t) = state; % new record state u2 = rand(1); % get random number if u2 > B(1,state)} out(t) = 1; % record error end waitbar(t/N) end close(h) % End of script file.
Note: This program is for a three-state model. It is easily generalized for an arbitrary number of states.
% File: c15_hmmtest.m pe = sum(out)/N; state_sum = zeros(1,total_states); for k=1:N if state_seq(k)==1 state_sum(1)=state_sum(1)+1; end if state_seq(k)==2 state_sum(2)=state_sum(2)+1; end if state_seq(k)==3 state_sum(3)=state_sum(3)+1; end end a = ['The probability of State 1 is ',num2str(state_sum(1)/N),'.']; b = ['The probability of State 2 is ',num2str(state_sum(2)/N),'.']; c = ['The probability of State 3 is ',num2str(state_sum(3)/N),'.']; d = ['The error probability is ',num2str(pe),'.']; disp('Simulation results:') disp(a) % display probability of state 1 disp(b) % display probability of state 2 disp(c) % display probability of state 3 disp(d) % display error probability % End script file.
% File: c15_bwa.m function [p, pye, b] = c15_bwa(iterations,states,out); len = length(out); p = input('Enter the initial state transition matrix P > '), pye = input('Enter the initial state probability vector pye >'), b = input('Enter the initial output symbol probability matrix B > '), alpha=zeros(len,states); beta=zeros(len,states); eta=zeros(states,states); gamma=zeros(1,states); scale=zeros(len,1); log_likelihood = zeros(1,iterations); iplot = 1; % plot switch % p % display initial p pye % display initial pye % pye_rec = zeros(states,1); pye_rec(:,1) = pye'; sum_gamma = 0; sum_eta = 0; % for cycle = 1:iterations cycle % display iteration index % % alpha generation % alpha(1,:) = pye.*b(1,:); scale(1) = sum(alpha(1,:)); alpha(1,:) = alpha(1,:)/scale(1); for t = 2:len alpha(t,:) = (alpha(t-1,:)*p).*b(out(t)+1,:); scale(t) = sum(alpha(t,:)); alpha(t,:) = alpha(t,:)/scale(t); end % % beta generation % beta(len,:) = 1/scale(len); for t = len-1:-1:1 beta(t,:) = (beta(t+1,:).*b(out(t+1)+1,:))*(p')/scale(t); end % % eta generation % sum_eta = zeros(states); for t = 1:len-1 for i = 1:states eta(i,:) = ... ((alpha(t,i)*(p(i,:).*(b(out(t+1)+1,:))). ... *beta(t+1,:)))... /sum(alpha(t,:).*beta(t,:)); end sum_eta = sum_eta + eta; end % % gamma generation % gamma_sum = zeros(1,states); for t = 1:len gamma_sum = gamma_sum + alpha(t,:).*beta(t,:); end % % calculate and display the log_likelihood function % loglikelihood = sum(log10(scale)); log_likelihood(cycle) = loglikelihood; loglikelihood % display result % % Re-estimation of the intial state probability vector pye % pye(1,:) = alpha(1,:).*beta(1,:)/sum(alpha(1,:).*beta(1,:)); % pye % display pye % pye_rec(:,cycle+1) = pye'; % Save for plot % % Re-estimation of the state transition matrix P % for i = 1:states for j = 1:states p_estimate(i,j) = ... sum_eta(i,j)/(gamma_sum(i)-alpha(len,i). ... *beta(len,i)... /(sum(alpha(len,:).*beta(len,:)))); end p_estimate(i,:) = p_estimate(i,:)/sum(p_estimate(i,:)); end % p = p_estimate % display p % % Re-estimation of output symbol probability matrix B % out_0 = find(out == 0); out_1 = find(out == 1); sum_0 = zeros(1,states); sum_1 = zeros(1,states); for i = 1:length(out_0) sum_0 = sum_0 + alpha(out_0(i),:).*beta(out_0(i),:)... /sum(alpha(out_0(i),:).*beta(out_0(i),:)); end for i = 1:length(out_1) sum_1 = sum_1 + alpha(out_1(i),:).*beta(out_1(i),:)... /sum(alpha(out_1(i),:).*beta(out_1(i),:)); end for i = 1:states for j = 1:2 if j == 1 b(j,i) = sum_0(i)/gamma_sum(i); end if j == 2 b(j,i) = sum_1(i)/gamma_sum(i); end end end for i = 1:states b(:,i) = b(:,i)/sum(b(:,i)); end b % display b end if iplot==1 plot(1:iterations,log_likelihood) xlabel('iterations') ylabel('log likelihood') end % End of function file.
% File: c15_semiMarkov.m function [A_matrix, pi_est] = c15_semiMarkov(runlength,cycles,... partition) % runlength = runlength code % cycles = number of iterations % partition = 1 by 2 vector = [good states, bad states] % [symbols len_symbols] = size(runlength); % gets size of runlength % vector m = runlength(1,:); % the first bit received is error free u = runlength(2,:); % arbitrary number of elements C = len_symbols; % total length of runlength vector % A = cell(length(partition)); % a 2x2 array only 2 symbols pye = rand(1,sum(partition)); % initialize the initial state vector pye = pye(1,:)/sum(pye(1,:)); % normalize pi_u1 = pye(1:partition(1)); % % initialize A matrix % A = cell(partition); % allocate memeory for the A matrix A00 = diag(1 - abs(randn(partition(1),1)/1000)); % initialize A00 % matrix A10 = rand(partition(2),partition(1)); % initialize A10 % matrix A01 = rand(partition(1),partition(2)); % initialize the % A01 matrix A11 = diag(1 - abs(randn(partition(2),1)/1000)); % initialize A11 % matrix A{1,1} = A00; A{1,2} = A01; A{2,1} = A10; A{2,2} = A11; A_matrix = [A{1,1} A{1,2};A{2,1} A{2,2}]; % cell-array in % matrix form % for i = 1:sum(partition) A_matrix(i,:) = A_matrix(i,:)/sum(A_matrix(i,:)); % normalize the % A matrix end A_matrix; % A{1} = A_matrix(1:partition(1),1:partition(1)); A{2} = A_matrix(partition(1)+1:partition(1)+partition(2),1:... partition(1)); A{3} = A_matrix(1:partition(1),partition(1)+1:partition(1)... +partition(2)); A{4} = A_matrix(partition(1)+1:partition(1)+... partition(2),partition(1)+1:partition(1)+partition(2)); % for iterations = 1:cycles % % alpha generation % alpha{1} = pi_u1*(A{u(1)+1,u(1)+1}.^(m(1)-1)); scale(1)= ... sum(alpha{1}); alpha{1}= alpha{1}/scale(1); % normalization for c = 2:C alpha{c}= alpha{c-1}*A{u(c-1)+1,u(c)+1}*A{u(c)+1,... u(c)+1}^(m(c)-1); scale(c)= sum(alpha{c}); % scaling factor alpha{c}= alpha{c}/scale(c); % normalize alpha end; % % beta generation % beta{C}= ones(partition(u(C)+1),1)/scale(C); % last element of % beta for(c= C-1:-1:1) beta{c}= A{u(c)+1,u(c+1)+1}*(A{u(c+1)+1,u(c+1)+1}... ^(m(c+1)-1))*beta{c+1}/scale(c); end; % % gamma generation % Gamma{1} = alpha{1}.*beta{1}'; Gamma{2} = alpha{2}.*beta{2}'; % sum_Tii_00s = diag(zeros(partition(1),1)); % initialization % of A00 sum_Tii_11s = diag(zeros(partition(2),1)); % initialization % of A11 sum_Tij_01s = zeros(partition(1),partition(2)); % initialization % of A01 sum_Tij_10s = zeros(partition(2),partition(1)); % initialization % of A10 % % re-estimation for the A00 matrix % for c=1:2:C-1 if ( c == 1) Tii_00s = diag((m(1)-1)*(pi_u1)'.*(diag(A{u(1)+1,... u(1)+1}).^(m(1)-1)).*beta{1}); else Tii_00s = diag((m(c)-1)*((alpha{c-1}*A{u(c-1)+1,... u(c)+1})'.*(diag(A{u(c)+1,u(c)+1}^(m(c)-1)))). ... *beta{c}); end sum_Tii_00s = sum_Tii_00s + Tii_00s; % sum elements % of A00 end % % re-estimation for the A11 matrix % for c=2:2:C-1 Tii_11s = diag((m(c)-1)*((alpha{c-1}*A{u(c-1)+1,u(c)+1})'... .*(diag(A{u(c)+1,u(c)+1}^(m(c)-1)))).*beta{c}); sum_Tii_11s = sum_Tii_11s + Tii_11s; % sum elements % of A11 end % % re-estimation for the A01 matrix % for c=1:2:C-1 Tij_01s = (alpha{c}'*((A{u(c+1)+1,u(c+1)+1}^(m(c+1)-1))... *beta{c+1})').*A{u(c)+1,u(c+1)+1}; sum_Tij_01s = sum_Tij_01s + Tij_01s; % sum elements % of A01 end % % re-estimation for the A10 matrix % for c=2:2:C-1 Tij_10s = (alpha{c}'*((A{u(c+1)+1,u(c+1)+1}^(m(c+1)-1))... *beta{c+1})').*A{u(c)+1,u(c+1)+1}; sum_Tij_10s = sum_Tij_10s + Tij_10s; % sums elements of A10 end % A_matrix = [sum_Tii_00s sum_Tij_01s; sum_Tij_10s sum_Tii_11s]; % for i = 1:sum(partition) A_matrix(i,:) = A_matrix(i,:)/sum(A_matrix(i,:)); % normalize % A end % A{1} = A_matrix(1:partition(1),1:partition(1)); A{2} = A_matrix(partition(1)+1:partition(1)+partition(2),1:... partition(1)); A{3} = A_matrix(1:partition(1),partition(1)+1:partition(1)... +partition(2)); A{4} = A_matrix(partition(1)+1:partition(1)+partition(2),... partition(1)+1:partition(1)+partition(2)); % pi_est = [Gamma{1} Gamma{2}]; % re-estimated initial state vector pi_est = pi_est/sum(pi_est); % normalized initial state vector pi_rec(iterations,:) = pi_est; pi_u1 = pi_est(1:partition(1)); iterations % display current iteration A_matrix % display estimated A matrix end % End of function file.
% File: c15_seglength.m function runcode=c15_seglength(errvect) % Produces a two-row matrix of error intervals and error-free % intervals. Row 1 specifies the interval length and row 2 % specifies the interval class (error(1) or no error(0)). % len = length(errvect); % length of input vector j = 1; % initialize index of m count = 1; % initialize counter for i=1:(len-1) if errvect(i+1) == errvect(i); % compare elements count = count+1; % on match increment count else m(j) = count; % record count j = j+1; % increment index of m count = 1; % reset counter end end % runcode = zeros(2,length(m)); % allocate memory runcode(1,:) = m; % assign counts to row 1 % if errvect(1)==0 runcode(2,2:2:length(m)) = 1; % even index error count else runcode(2,1:2:length(m)) = 1; % odd index error count end % End of function file.
In this appendix two MATLAB programs are given for determing a plotting the error-free distribution. The first program is designed for a single data file, and the second program is for the comparison of two data files.
% File: c15_intervals1.m function [] = c15_Intervals1(r1); start = find(r1(2,:)==0); % index of first % interval maxLength_1 = max(r1(1,start(1):2:length(r1))); % maximum length % of interval interval_1 = r1(1,start(1):2:length(r1)); % get the intervals for i = 1:maxLength_1 rec_1(i) = length(find(interval_1>=i)); % record the % intervals end int1out = rec_1/max(rec_1); figure; plot(1:maxLength_1,int1out) grid; ylabel('Pr(0m|1)'), xlabel('Length of intervals m'), % End of function file.
% File: c15_intervals2.m function [] = c15_intervals2(r1,r2); start1 = find(r1(2,:)==0); % index of first % interval maxLength_1 = max(r1(1,start1(1):2:length(r1))); % maximum length % of interval interval_1 = r1(1,start1(1):2:length(r1)); % get the % intervals for i = 1:maxLength_1 rec_1(i) = length(find(interval_1>=i)); % record the % intervals end start2 = find(r2(2,:)==0); % index of first % interval maxLength_2 = max(r2(1,start2(1):2:length(r2))); % maximum length % of interval interval_2 = r2(1,start2(1):2:length(r2)); % get the % intervals for i = 1:maxLength_2 rec_2(i) = length(find(interval_2>=i)); % record the % intervals end subplot(2,1,1) plot(1:maxLength_1,rec_1/max(rec_1)) v = axis; grid; ylabel('Pr(0m|1)'), xlabel('Original sequence - Length of intervals m'), subplot(2,1,2) plot(1:maxLength_2,rec_2/max(rec_2)) axis([v]) grid; ylabel('Pr(0m|1)'), xlabel('Regenerated sequence - Length of intervals m'), % End of function file.
[1] Recall that the Hamming weight of a binary vector, consisting only of the elements “0” and “1,” is equal to the number of binary ones in the vector.
[2] Note that in this formulation we are not using t as a continuous variable but as a time index that ranges over a set of integers.
[3] We have interest only in those process for which the state distribution converges to a steady-state value.
[4] It should be mentioned that rerunning this program may result in estimates for A and B that are different from the results given here, even though the same parameters and initial conditions are used. The reason for this lies in the fact that the error vector, upon which the estimates of A and B are based, is a sample function of a random process. In addition, the rows of  and the columns of may not sum to one when the elements of  and are represented with finite precision.
[5] Note that A represents a physical channel and does not have a block diagonal form. The estimated state transition matrix  will have a block diagonal form.
18.119.131.235