Chapter 9

Global Snapshot

9.1 Introduction

One of the difficulties in a distributed system is that no process has access to the global state of the system, that is, it is impossible for a process to know the current global state of the system (unless the computation is frozen). For many applications, it is sufficient to capture a global state that happened in the past instead of the current global state. For example, in case of a failure the system can restart from such a global state. As another example, suppose that we were interested in monitoring the system for the property that the token in the system has been lost. This property is stable, that is, once it is true it stays true forever; therefore, we can check this property on an old global state. If the token is found to be missing in the old global state, then we can conclude that the token is also missing in the current global state. An algorithm that captures a global state is called a global snapshot algorithm.

A global snapshot algorithm is a useful tool in building distributed systems. Computing a global snapshot is beautifully exemplified by Chandy and Lamport as the problem of taking a picture of a big scene such as a sky filled with birds. The scene is so big that it cannot be captured by a single photograph, and therefore multiple photographs must be taken and composed together to form the global picture. The multiple photographs cannot be taken at the same time instant because there is no shared physical clock in a distributed system. Furthermore, the act of taking a picture cannot change the behavior of the underlying process. Thus birds may fly from one part of the sky to the other while the local pictures are being taken. Despite these problems, we require that the composite picture be meaningful. For example, it should give us an accurate count of the number of birds. We next define what is meant by “meaningful” global state.

Consider the following definition of a global state: A global state is a set of local states that occur simultaneously. This definition is based on physical time. We use the phrase “time-based model” to refer to such a definition. A different definition of a global state based on the “happened-before model” is possible. In the happened-before model, a global state is a set of local states that are all concurrent with each other. By concurrent, we mean that no two states have a happened-before relationship with each other. A global state in the time-based model is also a global state in the happened-before model; if two states occur simultaneously, then they cannot have any happened-before relationship. However, the converse is not true; two concurrent states may or may not occur simultaneously in a given execution.

We choose to use the definition for the global state from the happened-before model for two reasons.

1. It is impossible to determine whether a given global state occurs in the time-based model without access to perfectly synchronized local clocks. For example, the statement. “there exists a global state in which more than two processes have access to the critical section” cannot be verified in the time-based model. In the happened-before model, however, it is possible to determine whether a given global state occurs.

2. Program properties that are of interest are often more simply stated in the happened-before model than in the time-based model, which makes them easier to understand and manipulate. This simplicity and elegance is gained because the happened-before model inherently accounts for different execution schedules. For example, an execution that does not violate mutual exclusion in the time-based model may do so with a different execution schedule. This problem is avoided in the happened-before model.

It is instructive to observe that a consistent global state is not simply a product of local states. To appreciate this, consider a distributed database for a banking application. Assume for simplicity that there are only two sites that keep the accounts for a customer. Also assume that the customer has $500 at the first site and $300 at the second site. In the absence of any communication between these sites, the total money of the customer can be easily computed to be $800. However, if there is a transfer of $200 from site A to site B, and a simple procedure is used to add up the accounts, we may falsely report that the customer has a total of $1000 in his or her accounts (to the chagrin of the bank). This happens when the value at the first site is used before the transfer and the value at the second site after the transfer. It is easily seen that these two states are not concurrent. Note that $1000 cannot be justified even by the messages in transit (or, that “the check is in the mail”).

images

Figure 9.1: Consistent and inconsistent cuts

Figure 9.1 depicts a distributed computation. The dashed lines labeled G1 and G2 represent global states that consist of local states at P1, P2, and P3, where G1 and G2 intersect the processes. Because a global state can be visualized in such a figure as a cut across the computation, the term, “cut” is used interchangeably with “global state.” The cut GI in this computation is not consistent because it records the message m2 as having been received but not sent. This is clearly impossible. The cut G2 is consistent. The message m3 in this cut has been sent but not yet received. Thus it is a part of the channel from process P1 to P3.

Formally, in an event-based model of a computation (E, →), with total order ≺ on events in a single process, we define a cut as any subset FE such that

fFefeF.

We define a consistent cut, or a global snapshot, as any subset FE such that

fFefeF.

9.2 Chandy and Lamport’s Global Snapshot Algorithm

In this section, we describe an algorithm to take a global snapshot (or a consistent cut) of a distributed system. Our example of the distributed database in the previous section illustrates the importance of recording only the consistent cuts. The computation of the snapshot is initiated by one or more processes. We assume that all channels are unidirectional and satisfy the FIFO property. Assuming that channels are unidirectional is not restrictive because a bidirectional channel can simply be modeled by using two unidirectional channels. The assumption that channels are FIFO is essential to the correctness of the algorithm as explained later.

The interface that we study in this chapter is called Camera. It allows any application that uses a camera to invoke the method globalState, which records a consistent global state of the system.

public interface Camera extends MsgHandler {
      void globalState ();
}

The class Camera can be used by any application that implements the interface CamUser. Thus, the application is required to implement the method localState, which records the local state of the application whenever invoked.

public interface CamUser extends MsgHandler {
      void localState ();
}

The algorithm is shown in Figure 9.3. We associate with each process a variable called color that is either white or red. Intuitively, the computed global snapshot corresponds to the state of the system just before the processes turn red. All processes are initially white. After recording the local state, a process turns red. Thus the state of a local process is simply the state just before it turned red.

There are two difficulties in the design of rules for changing the color for the global snapshot algorithm: (1) we need to ensure that the recorded local states are mutually concurrent, and (2) we also need a mechanism to capture the state of the channels. To address these difficulties, the algorithm relies on a special message called a marker. Once a process turns red, it is required to send a marker along all its outgoing channels before it sends out any message. A process is required to turn red on receiving a marker if it has not, already done so. Since channels are FIFO, the above mentioned rule guarantees that no white process ever receives a message sent by a red process. This in turn guarantees that local states are mutually concurrent.

Now let us turn our attention to the problem of computing states of the channels. Figure 9.2 shows that messages in the presence of colors can be of four types:

1. ww messages: These are the messages sent by a white process to a white process. These messages correspond to the messages sent and received before the global snapshot.

2. rr messages: These are the messages sent by a red process to a red process. These messages correspond to the messages sent and received after the global snapshot.

images

Figure 9.2: Classification of messages

3. rw messages: These are the messages sent by a red process received by a white process. In the figure, they cross the global snapshot in the backward direction. The presence of any such message makes the global snapshot inconsistent. The reader should verify that such messages are not possible if a marker is used.

4. wr messages: These are the messages sent by a white process received by a red process. These messages cross the global snapshot in the forward direction and form the state of the channel in the global snapshot because they are in transit when the snapshot is taken.

To record the state of the channel, Pj starts recording all messages it receives from Pi after turning red. Since Pi sends a marker to Pj on turning red, the arrival of the marker at Pj from Pi indicates that there will not be any further white messages from Pi sent to Pj. It can, therefore, stop recording messages once it has received the marker.

The program shown in Figure 9.3 uses chan[k] to record the state of the kth incoming channel and closed [k] to stop recording messages along that channel. In the program, we say that Pj is a neighbor of Pi if there is a channel from Pi to Pj. In our implementation, we have assumed that channels are bidirectional.

Lines 10-17 initialize the variables of the algorithm. All channels are initialized to empty. For each neighboring process Pk, closed[k] is initialized to false. The method globalState turns the process red, records the local state, and sends the marker message on all outgoing channels. Lines 25-34 give the rule for receiving a marker message. If the process is white, it turns red by invoking globalState. Line 27 sets closed[src] to true because there cannot be any message of type wr in that channel after the marker is received. The method isDone determines whether the process has recorded its local state and all incoming channels. Lines 29-33 print all the messages recorded as part of the channels. Lines 36-38 handle application messages. The condition on line 36 is true if the application message is of type wr.

In the algorithm, any change in the value of color must be reported to all neighbors. On receiving any such notification, a process is required to update its own color. This may result in additional messages because of the method globalState. The net result is that if one process turns red, all processes that can be reached directly or indirectly from that process also turn red.

The Chandy-Lamport algorithm requires that a marker be sent along all channels. Thus it has an overhead of e messages, where e is the number of unidirectional channels in the system. We have not discussed the overhead required to combine local snapshots into a global snapshot. A simple method would be for all processes to send their local snapshots to a predetermined process, say, P0.

9.3 Global Snapshots for non-FIFO Channels

We now describe an algorithm due to Mattern that works even if channels are not FIFO. We cannot rely on the marker any more to distinguish between white and red messages. Therefore, we include the color in all the outgoing messages for any process besides sending the marker. Further, even after Pi gets a red message from Pj or the marker, it cannot be sure that it will not receive a white message on that channel. A white message may arrive later than a red message due to the overtaking of messages. To solve this problem we include in the marker the total number of white messages sent by that process along that channel. The receiver keeps track of the total number of white messages received and knows that all white messages have been received when this count equals the count included in the marker. We leave the details of the algorithm to the reader as an exercise.

9.4 Channel Recording by the Sender

Chandy and Lamport’s algorithm requires the receiver to record the state of the channel. Since messages in real channels may get lost, it may be advantageous for senders to record the state of the channel. We will assume that control messages can be sent over unidirectional channels even in the reverse direction.

The mechanism to ensure that we do not record inconsistent global state is based on the coloring mechanism discussed earlier. A process sends white messages before it has recorded its local state and red messages after it has recorded the local state. By ensuring that a white process turns red before accepting a red message, we are guaranteed that there are no rw messages and therefore we will record only a consistent global snapshot.

images

Figure 9.3: Chandy and Lamport’s snapshot algorithm

Now let us turn our attention to recording the state of channels. We assume that the sender records all the messages that it sends out on any outgoing channel before it turned red. Whenever a process turns red, it sends a marker message on all its incoming channels (in the reverse direction) indicating the messages it has received on that channel sofar. The sender can now compute the state of the channel by removing from its buffer all messages that have been received according to the marker.

In this scheme, the sender may end up storing a large number of messages before the marker arrives. Assuming that all control messages (marker and acknowledgment messages) follow FIFO ordering, we can reduce the storage burden at the sender by requiring the receiver to send acknowledgments. When the sender receives an acknowledgment and has not received the marker, it can delete the message from the storage. To identify each message uniquely, we use sequence numbers with messages as encapsulated by the class SeqMessage given below.

public class SeqMessage {
      Msg m.
      int seqNo;
      public SeqMessage (Msg m, int seqNo) {
            this m = m;
            this seqNo = seqNo;
      }
      public int getSeqNo() {
            return seqNo;
      }
      public Msg getMessage () {
            return m;
      }
}

Thus, the algorithm can be summarized by the following rules.

1. Every process is white before recording its state and red after recording its state. A white process sends white messages and a red process sends a red message.

2. A white process turns red before accepting a red message or a marker.

3. On turning red, a process sends markers on all incoming channels in the reverse direction.

4. A white process acknowledges a white message.

5. A white process records any message sent. On receiving an acknowledgment, the corresponding message is removed from the record.

Since this algorithm requires every application message to include the color and the sequence number, we extend the Linker class as shown in Figure 9.4. The method sendMsg works as follows. If the message is either a marker or an acknowledgment message, then no special action is required, and super. sendMsg is invoked. If it is a white application message, then it is recorded as part of the channel history. The method sendMsg also appends the tag “white” or “red” with the message and includes a sequence number.

The algorithm for recording a global snapshot with channel states recorded by the sender is shown in Figure 9.5. For simplicity we have assumed a completely connected topology.

The method globalState is identical to Chandy and Lamport’s algorithm except that the markers are sent on the incoming channels in the reverse direction. When a marker message is received, a white process invokes the method globalState at line 26. Also, that incoming channel is closed at line 27. When an acknowledgement message is received then the corresponding message is removed from the channel history at line 31. This is accomplished by the method removeM. When an application message is received, a white process sends an acknowledgment for a white message at line 37. If the message is red, then the process also turns red by invoking globalState at line 39.

Note that this algorithm also does not require channels for application messages to be FIFO. If channels are known to be FIFO then the receiver only needs to record the sequence number of the last message it received before turning red. The algorithm does require the ability to send control messages in the reverse direction for any application channel. Furthermore, it requires control messages to follow FIFO order. (Why?) If the underlying network does not support FIFO, then sequence numbers can be used to ensure FIFO ordering of messages.

9.5 Application: Checkpointing a Distributed Application

As a simple example, let us try our snapshot algorithm on the circulating token algorithm discussed in the Chapter 2. Figure 9.6 gives a program that constructs a circulating token and a camera. The computation of the global snapshot is initiated by the method globalState.

images

Figure 9.4: Linker extended for use with Sendercamera

images

Figure 9.5: A global snapshot algorithm based on sender recording

images

Figure 9.6: Invocation of the global snapshot algorithm

The global snapshot algorithm can be used for providing fault tolerance in distributed systems. On failure, the system can be restarted from the last snapshot. Global snapshots can also be used for distributed debugging. Inspection of intermediate snapshots may sometimes reveal the source of an error.

9.6 Problems

9.1. Show that if G and H are consistent cuts of a distributed computation (E, →), then so are GH and GH.

9.2. The global snapshot algorithms discussed in this chapter do not freeze the underlying computation. In some applications it may be okay for the underlying application to be frozen while the snapshot algorithm is in progress. How can the snapshot algorithm be simplified if this is the case? Give an algorithm for global snapshot computation and its Java implementation.

9.3. Extend the Java implementation of Chandy and Lamport’s algorithm to allow repeated computation of global snapshots.

9.4. The original algorithm proposed by Chandy and Lamport does not require FIFO but a condition weaker than that. Specify the condition formally.

9.5. How can you use Lamport’s logical clock to compute a consistent global snapshot?

9.6. Give Java implementation of global snapshot algorithm when channels are not FIFO.

9.7. Extend Chandy and Lamport’s algorithm to compute a transitless global state. A consistent global state is transitless if there are no messages in any channel in that global state. Note that a process may have to record its local state multiple times until the recorded local state can be part of a transitless global state. Give Java implementation of your algorithm.

9.8. Give an example of a distributed computation in the interleaving model (with the events of the superimposed global snapshot algorithm) in which the recorded global snapshot does not, occur in the computation.

9.9. How will you use snapshot algorithms to detect that the application has reached a deadlock state?

9.7 Bibliographic Remarks

Chandy and Lamport [CL85] were the first to give an algorithm for computation of a meaningful global snapshot (a colorful description of this algorithm is given by Dijkstra [Dij85]). Spezialetti and Kearns have given efficient algorithms to disseminate a global snapshot to processes initiating the snapshot computation [SK86]. Bouge [Bou87] has given an efficient algorithm for repeated computation of snapshots for synchronous computations. In the absence of the FIFO assumption, as shown by Taylor [Tay89], any algorithm for a snapshot is either inhibitory (that is, it may delay actions of the underlying application) or requires piggybacking of control information on basic messages. Lai and Yang [LY87] and Mattern [Mat93] have given snapshot algorithms that require only the piggybacking of control information. Helary [Hel89] has proposed an inhibitory snapshot algorithm.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.191.223.208