4.2. Linear Protection

4.2.1. Introduction to Linear Protection

Linear protection is the simplest and perhaps the fastest of all the protection mechanisms. Figure 4-2 illustrates linear protection. In this figure, N (≥ 1) primary (or working) lines and M (≥ 1) backup (or protection) lines are shown. This arrangement is sometimes referred to as linear M:N protection. The lines shown can be SONET lines, WDM optical links, SDH paths, and so on. The equipment at either end of the working and protection lines terminates these signals in some form. There may be other equipment between the line termination equipment, but its operation is essentially transparent, and it does not participate in the linear protection mechanism. The basic idea behind linear protection is fairly simple: should any of the N working lines fail, the signal being transported over that line is switched onto an available protection line. It seems like nothing could be simpler! There are, however, a few issues that need to be dealt with: (a) What is the benefit of linear protection? (b) What triggers the switchover to protection (called the “protection switch”)? (c) If there are more working lines than protection lines, should the traffic be reverted from the protection line back to the original working line once the working line is repaired? (d) If signals are bidirectional, should protection be bidirectional or unidirectional? (e) How much coordination between the two sides is needed to implement protection? (f) Can idle protection line bandwidth be used to carry additional, lower-priority traffic?

Figure 4-2. M:N Linear Protection


4.2.1.1. BENEFIT OF LINEAR PROTECTION

To a provider of communications services, the availability of the service to customers is of prime concern, since it tends to have legal and monetary consequences. Availability is a statistical quantity. Equation (1) defines availability in terms of the mean time between failures (MTBF) and the mean time to repair (MTTR), both of which are also statistical quantities.

Equation 1

$$A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}$$

Availability is typically expressed as a percentage or, in common jargon, as “X nines.” For example, “five nines” is 99.999 percent. This implies that if the MTBF is one year, then the MTTR must be less than about 5.25 minutes.

Table 4-1 shows more examples of availability, MTBF and MTTR.

Table 4-1. Example Availability, MTBF and MTTR
| Availability | MTTR for MTBF = 1 year | MTTR for MTBF = 1 month |
|---|---|---|
| 99.999% | 5.25 minutes | 25.9 seconds |
| 99.9999% | 31.5 seconds | 2.6 seconds |

It is clear that the longer the time between failures and the quicker the time to repair, the higher the availability.
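Equation (1) can be solved for MTTR to reproduce Table 4-1. The following is a minimal sketch. It assumes a 365-day year and a 30-day month, and the function names are illustrative only. Its results agree with the table to within rounding.

```python
# Sketch: Equation (1), A = MTBF / (MTBF + MTTR), rearranged to give the
# largest MTTR that still meets a target availability for a given MTBF.

MINUTES_PER_YEAR = 365 * 24 * 60   # assumption: 365-day year
MINUTES_PER_MONTH = 30 * 24 * 60   # assumption: 30-day month


def availability(mtbf: float, mttr: float) -> float:
    """Equation (1): long-run fraction of time the service is up."""
    return mtbf / (mtbf + mttr)


def max_mttr(target: float, mtbf: float) -> float:
    """Solve Equation (1) for MTTR; the result has the same units as mtbf."""
    if not 0.0 < target < 1.0:
        raise ValueError("availability must lie strictly between 0 and 1")
    return mtbf * (1.0 - target) / target


if __name__ == "__main__":
    for a in (0.99999, 0.999999):
        per_year_s = max_mttr(a, MINUTES_PER_YEAR) * 60
        per_month_s = max_mttr(a, MINUTES_PER_MONTH) * 60
        print(f"{a * 100:g}%: MTBF 1 year -> {per_year_s:.1f} s, "
              f"MTBF 1 month -> {per_month_s:.1f} s")
```

Each additional “nine” divides the permissible MTTR by ten for the same MTBF.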

4.2.1.2. DETECTING THAT SOMETHING IS WRONG

MTTR also includes the time to detect the failure. This is significant since a correctable fault could go unnoticed until a phone call comes in from an unhappy customer. This, in effect, lengthens the MTTR unnecessarily. On the other hand, falsely reporting a fault violates the maxim of “do no harm” as it causes outages due to unnecessary protection switches. Hence, timely and accurate indication of fault conditions is an important aspect of any protection scheme. This is easier with some technologies than with others. For example, with SONET/SDH signals, loss of frame (LOF) or loss of pointer (LOP) can be detected very quickly and accurately. In addition, the extensive performance monitoring capabilities of SONET and SDH make it fairly easy to detect signal degrade or signal fail conditions that result in unacceptable signal quality (bit error rate). This is not the case in transparent (OOO) optical networks, where the main indicator of problems is a “loss of light” condition. Unfortunately, the presence of light does not necessarily indicate that the signal is in good shape.

4.2.1.3. TYPES OF LINEAR PROTECTION AND COMMUNICATION REQUIREMENTS

Figure 4-3 illustrates a very specialized form of linear protection known as 1+1 unidirectional protection. In this form of protection, the same signal (content) is sent on both the working and the protection lines. The receiver is responsible for selecting one of these signals, based on signal quality or fault information. No coordination is required between the sender and the receiver.
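The receiver's decision in unidirectional 1+1 protection is purely local, so it can be sketched as a small selector function. This sketch makes several assumptions:

- A three-level condition ordering (OK, signal degrade, signal fail).
- Nonrevertive operation, which Table 4-2 lists as the 1+1 default.
- The names `Condition` and `select_1plus1` are hypothetical.

```python
from enum import IntEnum


class Condition(IntEnum):
    """Assumed severity ordering for this sketch: larger means worse."""
    OK = 0
    SIGNAL_DEGRADE = 1
    SIGNAL_FAIL = 2


def select_1plus1(selected: str, working: Condition, protection: Condition) -> str:
    """Tail-end selector for nonrevertive unidirectional 1+1 protection.

    Both lines carry the same signal, so the receiver switches to the other
    copy only when it is strictly better; nothing is sent to the far end.
    When both copies are equally good it stays where it is (nonrevertive).
    """
    if selected == "working" and protection < working:
        return "protection"
    if selected == "protection" and working < protection:
        return "working"
    return selected
```

Because each receiver decides independently, the two ends of a bidirectional link can end up on different lines. This is the situation that the bidirectional 1+1 mode coordinates away.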

Figure 4-3. Unidirectional 1+1 Protection


Figure 4-4 illustrates 1+1 protection for bidirectional links, where each “link” consists of a receive fiber and a transmit fiber. In this case, the unidirectional 1+1 protection described earlier can still be applied: each receiving node independently selects the signal from either the working or the protection line. This could, however, result in one node receiving the signal from the working line while the other node receives it from the protection line. It may be required that both nodes receive the signal from the same line, working or protection. This is important, for instance, if servicing is to be performed on a fiber carrying the working or the protection line. Bidirectional 1+1 protection can be used in these cases. This type of protection requires coordination between the two sides so that both receive the signal from the same line: if one side switches from working to protection (or vice versa), the other side must do the same.

Figure 4-4. Bidirectional 1+1 Protection


The next level of complexity comes with the 1:N protection scheme shown in Figure 4-5. Here, a single protection line is used to protect N working lines. When a failure occurs, the node detecting the failure must coordinate with the other node to determine which of the N working lines needs to be backed up by the protection line. As in the 1+1 case, the restoration can occur in unidirectional or bidirectional mode. Because the protection line is being shared among multiple working lines, the unidirectional mode can potentially restore more faults since the protection path is not tied up in the direction along which the working link is still functioning.

Figure 4-5. Bidirectional 1:N Protection


Finally, the general M:N case was shown in Figure 4-2. In this case, there are N working lines and M protection lines with N > M. Under M:N protection, the node detecting the failure must indicate to the other node the identity of the working line to protect, as well as the identity of the protection line to use.

4.2.1.4. EXTRA TRAFFIC

Under 1+1 protection, the signal is concurrently sent on both the working and the protection lines. Under 1:N or M:N protection, however, the protection lines remain unused until a failure occurs. Thus, it seems reasonable to consider using the bandwidth on the protection line(s) for carrying lower priority traffic (called “extra traffic”) that could be preempted. That is, the extra traffic would be dropped whenever the corresponding protection line is used to back up a working line after a failure event.

4.2.1.5. REVERSION

In the 1:N and M:N cases, what should be done after a protection switch? Well, a good idea would be to repair the fault in the working line. After that the signal may be reverted back to the repaired working line. This frees up the protection resources and also allows extra traffic to return to the protection link.

4.2.2. Theoretical Underpinning of Linear Protection

Although this section is not needed to understand the rest of this chapter, the subject is important as it deals with the quantitative benefits of protection and its limitations. Specifically, it will be shown that the benefits of M:N protection as compared with 1:N protection are not as significant as one might think. This is one of the reasons why interoperable standards for M:N linear protection do not yet exist.

The probabilistic quantities quoted here rest on the “classic” assumption that the failures of the components of the protection group (the working and protection lines) are independent. The “lifetimes” of the lines are all assumed to be exponentially distributed with the same failure rate λ, so each line has an MTBF = 1/λ. The exponential distribution is actually quite a reasonable approximation, as described in [Trivedi82].

As given in [Trivedi82], the MTBF of an m-out-of-n system whose components have independent, exponentially distributed lifetimes with failure rate λ is

Equation 2

$$\mathrm{MTBF}_{m/n} = \frac{1}{\lambda}\sum_{i=m}^{n}\frac{1}{i}$$

By an m-out-of-n system, we mean a system in which m of its n components (m ≤ n) must be operational for the system to function. Since a 1:1 or 1+1 linear protection system can be viewed as a 1-out-of-2 system,

Equation 3

$$\mathrm{MTBF}_{1/2} = \frac{1}{\lambda}\left(1 + \frac{1}{2}\right) = \frac{3}{2\lambda}$$

Equation (3) shows that adding a redundant line does not double the MTBF. It increases the MTBF by only one half compared with that of an unprotected line. If another redundant line is added to the 1:1 or 1+1 system, the MTBF increases by only a further 1/(3λ). Hence Equation (2) gives us a “law of diminishing returns” in terms of the additional cost needed for each protection line. Another issue not reflected in Equation (2) is that a failed line (whether working or protection) will be repaired. With a 1:N protection group, repair is a significant effect, since it frees the protection line once the failed line is repaired.
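Equation (2) is straightforward to evaluate, and tabulating the 1-out-of-n case makes the diminishing returns explicit. The following is a minimal sketch under the same independence, exponential-lifetime and no-repair assumptions as the text. The function names are illustrative only.

```python
from fractions import Fraction


def mtbf_m_of_n(m: int, n: int, lam: float) -> float:
    """Equation (2): MTBF of an m-out-of-n system whose components fail
    independently at exponential rate lam, with no repair."""
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    if lam <= 0:
        raise ValueError("failure rate must be positive")
    return sum(1.0 / i for i in range(m, n + 1)) / lam


def gain_1_of_n(n: int) -> Fraction:
    """MTBF of a 1-out-of-n group in units of 1/lam (one unprotected line = 1)."""
    return sum((Fraction(1, i) for i in range(1, n + 1)), Fraction(0))


if __name__ == "__main__":
    for n in range(1, 5):
        g = gain_1_of_n(n)
        print(f"1 working + {n - 1} redundant: MTBF = {g} / lambda ({float(g):.3f})")
```

The exact fractions are 1, 3/2, 11/6 and 25/12. The n-th line adds only 1/n to the MTBF (in units of 1/λ).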

4.2.3. SONET/SDH Line Protection

SONET line (SDH Multiplex Section (MS)) protection can be applied between two interconnected pieces of SONET line (SDH MS) equipment, and it is one of the most widely used forms of optical protection. In linear 1+1 switching, the same SONET line signal is sent (bridged) on two separate SONET links, and the receiving equipment selects which of the copies to use. In the linear 1:N case, N working lines share one protection line (which can also be used to carry extra traffic). Although linear protection is the simplest category of protection, the SONET/SDH protocols for it are somewhat complex because of the features supported.

SONET/SDH protection switching is automatically initiated under two general conditions: signal fail and signal degrade. Signal fail is a hard failure condition such as loss of signal, loss of frame, or the occurrence of AIS-L (see Chapter 3). In addition, signal fail is declared when the line BER exceeds a user-specified threshold in the range 10^-3 to 10^-5. The signal degrade condition, on the other hand, is a soft failure condition triggered when the line BER exceeds a user-specified threshold in the range 10^-5 to 10^-9. These threshold settings are associated with individual SONET lines (SDH Multiplex Sections).
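The two trigger conditions can be sketched as a classification of one line's state. This sketch makes the following assumptions:

- A BER estimate is already available. Real equipment derives it from error counts over an integration window, which is not modeled here.
- LOS, LOF and AIS-L are collapsed into a single `hard_defect` flag.
- The default exponents are taken from Table 4-3.
- The function name is hypothetical.

```python
def classify_line(ber: float, hard_defect: bool = False,
                  sf_exponent: int = 5, sd_exponent: int = 7) -> str:
    """Return "SF", "SD" or "OK" for one SONET line (SDH MS).

    hard_defect: True if LOS, LOF or AIS-L is present (always signal fail).
    sf_exponent / sd_exponent: thresholds are 10^-x, with x in 3..5 for SF
    and 5..9 for SD, matching the ranges given in the text.
    """
    if not 3 <= sf_exponent <= 5:
        raise ValueError("SF threshold must be 10^-3 .. 10^-5")
    if not 5 <= sd_exponent <= 9:
        raise ValueError("SD threshold must be 10^-5 .. 10^-9")
    if hard_defect or ber > 10.0 ** -sf_exponent:
        return "SF"
    if ber > 10.0 ** -sd_exponent:
        return "SD"
    return "OK"
```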

4.2.4. SDH/SONET Linear 1:N Protection

4.2.4.1. INTRODUCTION

Linear 1:N protection allows up to fourteen working lines to be protected by one protection line. In addition, when not used for protection purposes a protection line can also be used to carry extra traffic [ANSI95c, ITU-T95a].

The user can assign a protection priority (high or low) for each working channel. This priority is used to determine which requests for protection take precedence in the APS protocol. In the case of a tie, the channel with the lowest APS channel number is given priority. Note that the APS channel number is a user-assigned protection attribute for the line and is distinct from other line identifiers (port numbers, etc.). It must be consistently set at each end of a SONET/SDH link.

4.2.4.2. GROUP CONFIGURATION

The configuration of a linear APS system first requires the configuration of a linear 1:N APS group. This starts with the creation of a new APS group and the selection of its attributes (Table 4-2). These include name, type (1:N or 1+1), reversion, and wait-to-restore period. The next step is to add the protection and working lines to this APS group.

The APS protocol requires the use of channel numbers for the working lines in an APS group. These are numbers between 1 and 14 assigned to the working lines independently of their port numbers. The number 0 is reserved for the protection channel (sometimes referred to as the null channel), and 15 is reserved for indicating the extra traffic that can optionally be carried over the protection channel.
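The channel-numbering rules can be enforced when working lines are added to a group. The `ApsGroup` class and its port strings below are hypothetical, intended only as a sketch of the constraints stated above rather than any vendor's configuration model.

```python
from dataclasses import dataclass, field
from typing import Dict

PROTECTION_CHANNEL = 0      # reserved: the protection (null) channel
EXTRA_TRAFFIC_CHANNEL = 15  # reserved: extra traffic on the protection channel
MAX_WORKING_CHANNEL = 14


@dataclass
class ApsGroup:
    """Hypothetical 1:N APS group; channel numbers are independent of ports."""
    name: str
    protection_port: str
    working: Dict[int, str] = field(default_factory=dict)  # channel -> port

    def add_working(self, channel: int, port: str) -> None:
        # Channels 0 and 15 are reserved, so working lines use 1..14 only.
        if not 1 <= channel <= MAX_WORKING_CHANNEL:
            raise ValueError(f"working channel must be 1..{MAX_WORKING_CHANNEL}")
        if channel in self.working:
            raise ValueError(f"channel {channel} is already assigned")
        if port == self.protection_port or port in self.working.values():
            raise ValueError(f"port {port} is already in the group")
        self.working[channel] = port
```

The channel number is what travels in the K1 byte, so the same channel-to-line mapping must be provisioned at both ends of the link.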

4.2.4.3. LINE PARAMETERS

Table 4-3 lists the parameters that can be set at the individual line level. Note that the user must specify every parameter that has no default.

In addition to the line level parameters, two key APS statistics are kept at the line level: Protection Switch Count (PSC) and Protection Switch Duration (PSD).

4.2.5. SONET/SDH K1/K2 Linear APS (LAPS) Protocol

The K1 and K2 bytes in the SONET line (SDH MS) overhead are used to control automatic protection switching (APS). Most APS group configuration is done at the management layer. However, a few code points in the K2 byte (bits 5–8) also inform the other side of the link about the APS configuration (e.g., 1+1 or 1:N operation, and unidirectional or bidirectional mode). Table 4-6 shows these code points. The K1 and K2 bytes of the protection channel are used in both directions as the signaling channel for the APS group. These bytes are considered valid only if they are identical in three successive frames. Since the frame time is 125 µs, this gives a worst-case notification (signaling) latency of 375 µs, not including propagation delay.
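Packing and unpacking K1, together with the three-frame validation rule, can be sketched as follows. The sketch assumes the usual SONET convention that bit 1 is the most significant bit of the byte. The persistence filter is a simplified, hypothetical model applied to one byte at a time.

```python
FRAME_PERIOD_US = 125  # SONET/SDH frame period


def k1_encode(request: int, channel: int) -> int:
    """Pack K1: bits 1-4 = request code, bits 5-8 = channel number.
    Bit 1 is taken as the most significant bit of the byte."""
    if not (0 <= request <= 0xF and 0 <= channel <= 0xF):
        raise ValueError("request and channel are 4-bit fields")
    return (request << 4) | channel


def k1_decode(k1: int) -> tuple:
    """Unpack K1 into (request code, channel number)."""
    return (k1 >> 4) & 0xF, k1 & 0xF


class PersistenceFilter:
    """Accept a received APS byte only after 3 identical consecutive frames."""

    def __init__(self, frames: int = 3):
        self.frames = frames
        self.accepted = None      # last validated value
        self._candidate = None
        self._count = 0

    def receive(self, value: int):
        """Feed one frame's byte; return the currently accepted value."""
        if value == self._candidate:
            self._count += 1
        else:
            self._candidate, self._count = value, 1
        if self._count >= self.frames:
            self.accepted = value
        return self.accepted


# Worst-case validation delay, excluding propagation: 3 * 125 = 375 us.
VALIDATION_DELAY_US = 3 * FRAME_PERIOD_US
```

A transient bit error in one frame therefore cannot trigger a switch. The value in effect only changes after three matching frames.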

Table 4-2. APS Group Parameters
| Parameter | Description | Default |
|---|---|---|
| Name | The name of the group | N/A |
| Protection Group Type | 1+1 or 1:N | N/A |
| Protection Line | The protection line | N/A |
| Working Line | A list of the working lines | N/A |
| Directionality | Unidirectional (1+1 case only) or bidirectional | N/A |
| Reversion | Nonrevertive or revertive option. Default for 1:N is revertive; for 1+1, nonrevertive [Bellcore95] | See description |
| WTR Period | Wait-to-restore period (revertive switching only) | 5 minutes |

Table 4-3. APS Line Parameters
| Parameter | Description | Default |
|---|---|---|
| Channel Number | 0–14 (limited by the K1 bits 5–8, see for details) | N/A |
| APS Line Type | Working or protect | N/A |
| Extra Traffic | Extra traffic state; applies to the protect line only | N/A |
| APS Line Priority | APS line priority: high or low | Low |
| SF BER Exponent | BER threshold to cause switchover due to signal failure | 10^-5 |
| SD BER Exponent | BER threshold to cause switchover due to signal degrade | 10^-7 |

Table 4-4. K1 bits (1–4) Request Codes for LAPS
| Bits 1–4 | Condition |
|---|---|
| 1111 | Lockout of protection |
| 1110 | Forced switch |
| 1101 | Signal fail, high priority (not used in 1+1) |
| 1100 | Signal fail, low priority |
| 1011 | Signal degrade, high priority (not used in 1+1) |
| 1010 | Signal degrade, low priority |
| 1001 | (Not used) |
| 1000 | Manual switch |
| 0110 | Wait-to-restore (revertive only) |
| 0101 | (Not used) |
| 0100 | Exercise |
| 0010 | Reverse request (bidirectional only) |
| 0001 | Do not revert (nonrevertive only) |
| 0000 | No request |

4.2.5.1. UNIDIRECTIONAL CASE

The linear APS (LAPS) protocol is described by considering the unidirectional case first (Figure 4-6). As per standard APS terminology [ANSI95c], the transmitter of the signal is denoted the head end and the receiver of the signal the tail end. It is the tail end's duty to request an APS switch upon detecting a signal fail or signal degrade condition, or upon receiving an external command. When the tail end detects such a condition, it puts the appropriate command code (see Table 4-4) in bits 1–4 of the K1 byte being sent to the head end on the protection line. The tail end also sends the channel number of the line requesting protection action in bits 5–8 of the K1 byte, as shown in Table 4-5. (Note that the command code is mostly informative in the unidirectional switching case, but it is important in the bidirectional, multiple-failure case.)

Figure 4-6. 1:N Unidirectional LAPS Switch-Over


When the head end receives the request from the tail end, it first bridges or switches the appropriate working line onto the protection line. It then writes the channel number of this working line in bits 1–4 of the K2 byte of the protection line, as shown in Table 4-6. At this point, the tail end selects the protection line as a substitute for the failed working line.
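The unidirectional exchange can be traced step by step as bytes on the protection line. The sketch makes several assumptions:

- The default request is a low-priority signal fail.
- K2 bits 5–8 are filled per Table 4-6 as 1:N, provisioned unidirectional.
- Bit 1 is the most significant bit.
- The function name is hypothetical.

```python
SF_LOW = 0b1100  # K1 request code: signal fail, low priority (Table 4-4)


def unidirectional_1n_switch(failed_channel: int, request: int = SF_LOW):
    """Trace the unidirectional 1:N exchange on the protection line.

    Returns (messages, tail_selection), where messages is a list of
    (sender, byte, value) tuples in the order they are sent.
    """
    if not 1 <= failed_channel <= 14:
        raise ValueError("working channels are numbered 1..14")
    messages = []
    # 1. Tail end: request code in K1 bits 1-4, failed channel in bits 5-8.
    messages.append(("tail end", "K1", (request << 4) | failed_channel))
    # 2. Head end bridges that working line onto protection and reports the
    #    bridged channel in K2 bits 1-4; bit 5 = 1 (1:N) and bits 6-8 = 100
    #    (provisioned unidirectional), as in Table 4-6.
    k2 = (failed_channel << 4) | 0b1000 | 0b100
    messages.append(("head end", "K2", k2))
    # 3. Tail end selects protection once K2 shows the channel it asked for.
    selection = "protection" if (k2 >> 4) == failed_channel else "working"
    return messages, selection
```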

Table 4-5. K1 (bits 5–8), Channel Numbers and Meaning
| Channel Number | Function |
|---|---|
| 0 | Protection channel requesting switch action. The protection channel will be supplied with a signal containing valid transport overhead for carriage of the APS bytes and line BIP-8. |
| 1–14 | Number of the working channel requesting switch action |
| 15 | Extra traffic is present on the protection channel (not valid for 1+1) |

4.2.5.2. BIDIRECTIONAL CASE

In the unidirectional case, the tail end used the K1 byte to request a bridge from the head end. The head end indicated completion of the request by writing the channel number into bits 1–4 of the K2 byte. In the bidirectional case, the head end is also responsible for initiating the second half of the bidirectional switch. It does this by putting the reverse request code, 0010, into bits 1–4 of the K1 byte and the channel number of the working line to be switched into bits 5–8 of the K1 byte. The tail end processes this switch request and bridges the desired working line onto the protection line. It also places the channel number into bits 1–4 of the K2 byte sent on the protection line. At this point, the bidirectional APS switch is complete. This process is shown in Figure 4-7.

In the unidirectional case, the priorities of APS requests from different working lines within the same protection group can be evaluated using local information only. In the bidirectional case, requests received from the far end via the K1 byte must also be evaluated. Figure 4-8 illustrates a bidirectional 1:2 protection group. In this example, working line #2 first experiences a signal degrade condition, prompting a bidirectional protection switch. Note that Network Element (NE) A informs NE B of this signal degrade condition on line #2 via the K1 byte. Later, working line #1 experiences a signal fail, which is detected directly at NE B. At the time of the failure on line #1, NE B compares the remote request being received on K1 from NE A with the local condition that it has detected. The locally detected signal fail has a higher priority than the signal degrade asserted via the K1 byte. Therefore, the signal on working line #1 is switched onto the protection line.
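The comparison NE B makes in Figure 4-8 can be sketched as choosing the highest-priority request among local and remote ones. The choice uses the Table 4-4 codes, where a larger code means a higher priority. Ties go to the lowest channel number, as described in section 4.2.4.1. This is a simplified sketch: it compares only signal fail and signal degrade requests and ignores reverse requests and external commands. The names are hypothetical.

```python
from typing import List, NamedTuple, Optional

# K1 request codes from Table 4-4; a larger code means a higher priority.
SF_HIGH, SF_LOW, SD_HIGH, SD_LOW = 0b1101, 0b1100, 0b1011, 0b1010
NO_REQUEST = 0b0000


class Request(NamedTuple):
    code: int      # K1 bits 1-4
    channel: int   # K1 bits 5-8
    source: str    # "local" (detected here) or "remote" (received on K1)


def winning_request(requests: List[Request]) -> Optional[Request]:
    """Choose which request the protection line serves: highest request code
    first, then the lowest channel number on a tie."""
    active = [r for r in requests if r.code != NO_REQUEST]
    if not active:
        return None
    return min(active, key=lambda r: (-r.code, r.channel))
```

For the Figure 4-8 scenario, a remote signal degrade on channel 2 loses to a local signal fail on channel 1:
`winning_request([Request(SD_LOW, 2, "remote"), Request(SF_LOW, 1, "local")])` selects channel 1.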

Figure 4-7. Bidirectional 1:N LAPS with Message Sequence Chart


Figure 4-8. Bidirectional 1:N Protection with Multiple Failures of Different Priorities


Table 4-6. K2 Bit Functions
| Bits | Function |
|---|---|
| 1–4 | Number of the channel bridged onto the protection line, unless channel 0 is received in bits 5–8 of K1 (in which case these bits are set to 0000) |
| 5 | 1 = provisioned for 1:N mode; 0 = provisioned for 1+1 mode |
| 6–8 | 111 = AIS-L; 110 = RDI-L; 101 = provisioned for bidirectional switching; 100 = provisioned for unidirectional switching; 000–011 = reserved for future use |
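Table 4-6 maps directly onto a small decoder for a received K2 byte. The following is a minimal sketch. It assumes bit 1 is the most significant bit, and the field names in the returned dictionary are illustrative only.

```python
K2_MODE_BITS = {
    0b111: "AIS-L",
    0b110: "RDI-L",
    0b101: "bidirectional",
    0b100: "unidirectional",
}


def k2_decode(k2: int) -> dict:
    """Split a received K2 byte into the fields of Table 4-6."""
    if not 0 <= k2 <= 0xFF:
        raise ValueError("K2 is a single byte")
    return {
        "bridged_channel": (k2 >> 4) & 0xF,                 # bits 1-4
        "architecture": "1:N" if k2 & 0b1000 else "1+1",    # bit 5
        "mode": K2_MODE_BITS.get(k2 & 0b111, "reserved"),   # bits 6-8
    }
```

For example, 0x2D decodes as channel 2 bridged, 1:N architecture and bidirectional mode. A receiver could compare the decoded architecture and mode with its own provisioning to detect a configuration mismatch.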

4.2.5.3. REVERSION AND WAIT TO RESTORE

Since a single protection line protects many working lines, it is highly desirable to switch a signal back to the original working line once the fault condition that caused the switchover has been cleared. This process is called reversion. A potential issue with reversion is that a line can experience an intermittent failure, that is, one that comes and goes. When such a condition exists, the signal on that line tends to be switched back and forth between the working and protection lines. Rapid protection switching and reversion can wreak havoc on the signal quality. Hence, instead of reverting the signal back to its working line immediately after the failure is cleared, the reversion is delayed for an interval of time. This time interval is known as the Wait To Restore (WTR) period. It is typically 5 minutes, but it can usually be set to other values (some service providers may want to wait until a non-peak traffic period to restore).

Now, what happens if there is a failure on another line during the WTR period? From Table 4-4, it can be seen that the WTR is also a command sent in the K1 byte to indicate that the protection line is in the WTR “mode.” This command has a lower priority than either the signal fail or the signal degrade condition. Hence, another line experiencing a failure or degrade condition can be immediately switched onto the protection line during the WTR period.
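The WTR behavior can be sketched as a small per-channel state machine. A helper expresses the rule that requests with a higher code than WTR may take the protection line during the WTR period. This sketch makes the following assumptions:

- Time is an abstract number of seconds.
- The class and method names are hypothetical.
- The sketch does not model another channel actually seizing the protection line.

```python
WTR_CODE = 0b0110            # K1 wait-to-restore code (Table 4-4)
DEFAULT_WTR_SECONDS = 300.0  # 5-minute default (Table 4-2)


def preempts_wtr(request_code: int) -> bool:
    """A request with a higher K1 code than WTR (e.g. SF or SD on another
    line) may take the protection line during the WTR period."""
    return request_code > WTR_CODE


class RevertiveChannel:
    """Hypothetical per-channel reversion tracker for a revertive 1:N group.

    States: "normal" (on working), "switched" (on protection, fault present),
    "wtr" (fault cleared, waiting before reverting to the working line).
    """

    def __init__(self, wtr_seconds: float = DEFAULT_WTR_SECONDS):
        self.wtr_seconds = wtr_seconds
        self.state = "normal"
        self.wtr_started = None

    def fault(self) -> None:
        # A fault (including an intermittent one returning) cancels any WTR.
        self.state, self.wtr_started = "switched", None

    def cleared(self, now: float) -> None:
        if self.state == "switched":
            self.state, self.wtr_started = "wtr", now

    def tick(self, now: float) -> str:
        # Revert only after the line has stayed clean for the whole WTR period.
        if self.state == "wtr" and now - self.wtr_started >= self.wtr_seconds:
            self.state, self.wtr_started = "normal", None
        return self.state
```

When an intermittent fault recurs during the WTR period, the timer is cancelled and restarts only when the fault clears again. This suppresses the back-and-forth switching described above.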

4.2.6. APS Commands

The “external” APS commands are listed in Table 4-7 in order of priority, with protection lockout having the highest precedence. Automatically generated “commands,” such as switching based on the detection of signal fail, have lower priority than a forced switch and higher priority than a manual switch. Hence, a switch based on a signal degrade or signal fail condition takes precedence over a manual switch request, but not over a forced switch request.

4.2.7. Linear Subnetwork Connection Protection

Linear M:N protection, as discussed earlier, works on a link basis. A transport network, however, may consist of a number of switching layers as discussed in previous chapters. Figure 4-9 illustrates a SONET line (SDH MS) being carried over a network of transparent optical switches (or at least switches that appear transparent to the SONET line layer) denoted as PXCs (see Chapter 1). In this case, the connection between the SONET line equipment via the transparent optical network can be considered as a subnetwork connection. In such a situation, the individual subnetwork connections (SNCs) can be protected via a linear protection scheme implemented by the end systems that terminate that SNC.

Figure 4-9. Example of a SONET Line/SDH MS as a Subnetwork Connection across a Transparent Optical Network


When setting up the working and protection SNCs within a protection group, it is important to understand whether the SNCs are physically diverse across the subnetwork. For example, if it turns out that these connections traverse the same path within the subnetwork, then the probability of the protection SNC being available in the event of a problem with the working SNC will be greatly reduced. In particular, in Figure 4-9, if the links within the subnetwork are single fiber WDM links, then two SNCs on the same path are actually on the same fiber and hence vulnerable to the same fiber cuts.
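The diversity check can be sketched by comparing the failure risks that each SNC path depends on. The representation is an assumption: each link maps to a set of risk identifiers (fibers, conduits, and so on), an idea often called a shared risk link group. The link and fiber names are hypothetical.

```python
def shared_risks(working_path, protection_path, link_risks):
    """Return the risks (fibers, conduits, ...) that both SNC paths depend on.

    Paths are lists of link names; link_risks maps a link name to the set of
    risk identifiers it uses. A link missing from the map counts as its own
    risk. An empty result means no single listed risk can take down both paths.
    """
    def risks(path):
        found = set()
        for link in path:
            found |= link_risks.get(link, {link})
        return found

    return risks(working_path) & risks(protection_path)
```

In the single-fiber WDM case described above, two SNCs routed over the same link share that fiber as a risk. A check of this kind would therefore flag them as not diverse.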

Table 4-7. APS Commands
| Command | Description |
|---|---|
| Protection Lockout | Prevents the chosen working line from being switched to the protection line. If the protection line is chosen, this locks out all protection switching requests. |
| Forced Switch | Forces a switch from a working line to the protection line, or from the protection line to a working line, without regard to the state of either line. |
| Manual Switch | Initiates a switch from a working line to the protection line, or from the protection line to a working line, assuming the line being switched is not in the failed state. This command has lower priority than the previous two. It also has lower priority than automatically initiated requests based on signal fail or degrade conditions. |
| Exercise | Exercises the APS protocol without causing a switchover. |
| Clear | Clears any of the above switching requests or protocol exercises. |

4.2.7.1. 1+1 SUBNETWORK CONNECTION PROTECTION

The primary responsibility for implementing SNC protection rests with the end systems rather than the subnetwork elements. Elaborate linear protection mechanisms are currently defined only for the SONET line (SDH MS) layer. Consequently, most SNC protection is done via the unidirectional 1+1 approach discussed in section 4.2.1.3. As seen previously, however, 1+1 mechanisms are not bandwidth efficient compared with 1:N methods. But if only relatively few finer-granularity SNCs require additional protection, a 1+1 SNC method may be appropriate.

As an example, consider a SONET OC-48 (2.5 Gbps) line containing a mix of different types of traffic. Suppose it is desired that a particular STS-1 SPE within this OC-48 be protected. For this, either 1:1 protection can be established at the SONET line level (with 2.5 Gbps of protection capacity ensured on all the links within the line level subnetwork), or a 1+1 protection group can be set up at the STS-1 SPE level. The latter would require an additional 51 Mbps of capacity within the subnetwork. Since the SONET STS path level supports good performance monitoring and fault management capabilities, implementation of the 1+1 SNC functionality is straightforward. Hence, instead of requiring another OC-48 line dedicated to protection purposes (2.5 Gbps) only an additional 51 Mbps of bandwidth needs to be found on an alternative link (this other link does not have to have the same overall capacity).
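The capacity comparison in this example is simple arithmetic, sketched below. The sketch uses the exact STS-1 rate of 51.84 Mb/s (which the text rounds to 51 Mbps) and the fact that an OC-48 carries 48 STS-1s. The function name is hypothetical.

```python
STS1_MBPS = 51.84       # STS-1 rate (the text rounds to 51 Mbps)
OC48_STS1_SLOTS = 48    # an OC-48 carries 48 STS-1s (about 2.5 Gbps)


def protection_capacity_mbps(protected_sts1: int, line_level: bool) -> float:
    """Extra subnetwork capacity to reserve for protection.

    line_level=True: 1:1 protection of the whole OC-48 line.
    line_level=False: 1+1 SNC protection of only the chosen STS-1 SPEs.
    """
    if not 0 <= protected_sts1 <= OC48_STS1_SLOTS:
        raise ValueError("an OC-48 holds at most 48 STS-1s")
    slots = OC48_STS1_SLOTS if line_level else protected_sts1
    return slots * STS1_MBPS
```

Path-level 1+1 protection needs less capacity until every STS-1 in the line needs protection. At that point the two approaches reserve the same amount.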

4.2.7.2. VIRTUAL 1:N SUBNETWORK CONNECTION PROTECTION

Linear SNC protection considered in the previous two sections requires minimal involvement from the subnetwork itself, that is, only the end systems were involved in restoration and the subnetwork only provided diverse paths for working and protection SNCs. Figure 4-10 illustrates SNC level protection. Here, two SONET path level (SDH VC-3/4 level) SNCs are set up between two pairs of SONET path termination equipment (PTE). In addition, two diversely routed protection connections are set up between these pairs of PTE.

Figure 4-10. Potential Bandwidth Savings with Linear SNC Protection (Virtual 1:N SNC)


As seen from Figure 4-10, the two protection connections could share the capacity on the links between NE A and NE C, and between NE B and NE D. To implement this sharing, the subnetwork itself must now get involved. In particular, it must distinguish working connections from protection connections so that only protection connections are allowed to share bandwidth on a link. More elaborate mechanisms can prevent bandwidth sharing among protection connections whose working connections are vulnerable to the same failures. For the subnetwork to respond to a failure or signal degrade condition, either the end system(s) must inform it of the condition, or it must perform some type of nonintrusive fault and/or performance monitoring of the SNC.
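One common way to decide how much protection bandwidth to reserve on a shared link is a single-failure model. For each link, reserve the largest amount of protection traffic that any one working-link failure would activate there. The following sketch is an assumption for illustration, not the specific mechanism the text refers to:

- It considers link failures only.
- The connection tuples and link names are hypothetical.

```python
from collections import defaultdict


def shared_reservation(connections):
    """Protection capacity to reserve per link under shared mesh protection.

    connections: iterable of (bandwidth, working_links, protection_links).
    Assumes at most one working link fails at a time. For every protection
    link, the reservation is the worst case, over single failures, of the
    protection bandwidth that failure would activate on that link. Connections
    whose working paths are disjoint therefore share capacity; connections that
    share a working link do not.
    """
    demand = defaultdict(lambda: defaultdict(float))  # link -> failed link -> bw
    for bandwidth, working, protection in connections:
        for failed in set(working):
            for link in set(protection):
                demand[link][failed] += bandwidth
    return {link: max(per_failure.values()) for link, per_failure in demand.items()}
```

In the Figure 4-10 situation, two protection connections with disjoint working paths need only one unit each on links A–C and B–D, not two. If their working paths shared a link, both would be activated by the same failure, and the reservation would double.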

This type of protection generally works in a mesh subnetwork, and it is called “shared mesh protection.” Recall that in section 1.2 it was emphasized that there are always trade-offs involved with various restoration techniques. The SNC protection scheme described above gives up robustness for bandwidth efficiency. Specifically, a connection cannot be restored if failures impact both the working and protection paths. This is analogous to 1:N line level protection in that the diversity of the working and protection paths determines the overall reliability. Table 4-8 summarizes the design choices concerning reliability and robustness of virtual 1:N SNC protection. Table 4-9 summarizes the bandwidth efficiency and restoration speed design choices.

Table 4-8. Virtual 1:N SNC Protection: Reliability and Robustness Design Choices
| Factor | Design Choice |
|---|---|
| Failure types that are restorable | One or more failures that affect the original path only, and not the diverse reserved path |
| Distributed vs. Centralized restoration | Distributed |
| Interoperability with other protection schemes | Fully interoperable with line layer 1+1, 1:N, BLSR and source rerouted mesh |
| Implementation complexity | Set up of diverse path and reserved protection path slightly more complicated than regular connection establishment. Restoration behavior simpler than source reroute mesh restoration. |

Table 4-9. Virtual 1:N SNC Protection: Bandwidth Efficiency and Restoration Speed Design Choices
| Factor | Design Choice |
|---|---|
| Bandwidth granularity | Same as connection granularity |
| Impact on existing connections | None. If “extra traffic” option is used then this traffic will be preempted in the case of restoration |
| Local vs. End-to-End restoration | End-to-end restoration |
| Method(s) used to find alternative routes | Precalculated diverse route |
| Bandwidth reservation vs. coordination time | Protection bandwidth is reserved. No contention coordination is required. |
