Chapter 14

Integration of the VideoLAN Client with OpenSAF: An Example

Anik Mishra1 and Ali Kanso2

1Ericsson, Town of Mount Royal, Quebec, Canada

2Concordia University, Montreal, Quebec, Canada

14.1 Introduction

The Availability Management Framework (AMF) manages the high availability (HA) of the services provided by an application by dynamically assigning the workload to the application's components and controlling their life-cycle. To achieve the highest level of availability, the application's components typically need to interface with the AMF [48] and possibly with other services. This design decision is usually made in the development process of the application's components. However, legacy applications do not implement the interface that would allow them to interact with AMF. In order to improve the availability of such applications, different levels of integration are possible: they range from the nonproxied-non-SA-aware integration that leaves the application's code intact, through SA-aware integration in which the application is modified to allow more interaction with AMF, to integrations with additional services of the Service Availability (SA) Forum middleware (e.g., Checkpoint).

In any case, for AMF to manage the availability of an application, it requires a configuration describing the application.

In this chapter we illustrate how these different levels of integration offered by the SA Forum middleware can be used to improve the availability of a legacy application.

More specifically, we focus on the steps and the efforts required in achieving various levels of integration and demonstrate them on the example of integrating the VLC (VideoLAN Client) application [109] with the OpenSAF implementation of the SA Forum services [91]. We discuss the achieved availability of the application services versus the complexity associated with implementing each of these levels.

We chose VLC as our example for this exercise because with the increasing growth of internet bandwidth and the number of users, video streaming is gaining more and more interest from both users and suppliers.

HA is an important factor in the quality of service of the delivered stream. This availability is reflected in two main features: (i) the availability of the stream upon demand and (ii) the continuity of the stream during the transmission.

As a result a streaming application like VLC is ideal for examining the effectiveness of HA solutions, because it is a real-time application and the service outage is visually experienced, so the end user can easily appreciate fault tolerance.

In addition VLC is an open source video streaming application. It can be used as a streaming server, or a client receiving the video. It has a modular architecture and it is a product intended to be used by both developers and consumers, in the sense that it offers developers Application Programming Interfaces (APIs) which they can use to add VLC functionalities to their own applications. The application is also reasonably documented.

14.2 Going Under the Hood: The VLC Workflow

VLC's code is structured into cohesive functional modules shown in Figure 14.1. We can divide them into two major categories: stream modules (shown in the rounded rectangle) and management modules.

  • Stream modules take input from various sources, process this input and produce the required output. Examples of this processing are multiplexing/de-multiplexing, encoding/decoding, and so on.
  • The management modules are necessary to coordinate and manage the stream modules; one such management module is the VideoLAN Manager (VLM) module.

Figure 14.1 VLC workflow.


For instance, to broadcast a video file, the Control Module informs VLM of the location of the file, and instructs it how this file needs to be streamed (whether it is a broadcast or video-on-demand (VoD), the broadcast address, etc.). VLM in turn will request (i) the I/O module to open the file and (ii) the other streaming modules to process the file as needed (e.g., convert its format if needed). Finally, the Real-time Transport Protocol (RTP) module will stream the file and send it to the network.

Video streaming can be configured to function in one of two modes: broadcast or VoD:

  • Broadcast simply sends the video stream to a configured address. This is typically used with a multicast destination address. Within this dedicated range of Internet protocol (IP) addresses, special handling is defined for end users to be able to subscribe to an address instead of establishing a connection. This allows multiple clients to subscribe to the same feed without consuming additional resources on the video server. Of course, this entails some drawbacks as individual clients cannot personalize their experience. They simply receive what is on the network. As such, pause and seek commands would affect all clients and there is no option to resend data in case of packet drop.
  • By contrast, VoD is client request driven. A user contacts a Real-Time Streaming Protocol (RTSP) service to set up a video stream. This stream starts at the beginning and can be paused/resumed as requested by the user without interfering with other users' feeds. When configured for VoD, VLM starts an RTSP server module to accept the streaming requests. Each requesting user then gets its own instance of the video stream to control.

VLM wraps almost all the functionality needed for a streaming service; however, it does not start by itself and it also requires some input to properly perform its task. This is provided by a Control Module.

VLC offers several Control Modules: For clients, VLC offers a graphical user interface (GUI) Control Module that makes it into a full-featured video player application. On the server side, Telnet and Hypertext Transfer Protocol (HTTP) Control Modules are offered, among others.

For our exercise, we stripped VLC of certain functionalities: We only support the broadcast aspect of VLM, and we implemented our own Control Module, which we explain in more detail in the next section.

14.3 Integrating VLC with OpenSAF

There is no standard (or single) way of integrating legacy applications with the SA Forum middleware. The method we present is based on our experience with the specifications. We start with the nonproxied-non-SA-aware integration of VLC with OpenSAF. Then we move onto the SA-aware version. Finally we present the addition of checkpointing to this SA-aware version.

In all three cases we followed these generic steps:

1. Selecting an integration method.
2. Defining the component types and their component service types (CSTs).
3. Implementing the life-cycle commands.
   a. For nonproxied-non-SA-aware: Implementing the health monitoring.
4. Integrating with the middleware APIs.
   a. Selection of services to integrate with.
   b. For SA-aware: Implementing the AMF API.
   c. If with checkpointing: Implementing the Checkpoint service (CKPT) API [42].
5. Designing the AMF configuration.
6. Deploying and testing the implementation.

The steps that are in common for all integrations may in fact differ slightly in their details as we will see subsequently.

The application integrator must make a choice of which integration technique to adopt. Normally this decision is driven by the specificities of the application itself and the implementation efforts to be invested in the integration.

In terms of the implementation efforts, the nonproxied-non-SA-aware integration is the least demanding. We present this approach first.

14.3.1 Nonproxied-Non-SA-Aware Integration

14.3.1.1 Component and CS Types

AMF manages the availability of the services of a nonproxied-non-SA-aware component by controlling its life-cycle. The assumption is that the component starts to provide its service—the single component service instance (CSI) it can provide—at the moment of its instantiation. Obviously when the component is terminated it stops providing the CSI.

All VLC modules discussed in Section 14.2 run as a single process, so without code modifications we need to equate the AMF component—at least—to such a single process. This determines the VLC-component component type.1

When the VLC process starts, it reads a configuration file, which is a list of command lines. It defines the mode in which different medias—different streams—are enabled and disabled and all the required attributes for them. Among these attributes each media is associated with one or more inputs composing a playlist the VLC process should stream. In broadcast mode each media stream is associated with a broadcast IP address. This means that a client can access the broadcast by subscribing to this broadcast IP address.

This behavior determines for our VLC-components the CST (Video-CST): it is a request to broadcast a preconfigured playlist to the configured IP address. Accordingly, different configuration files represent different CSIs.

Since the VLC process implementing the VLC-component reads its configuration at its start, there is no need for additional environment variables for assigning the CSI. It also means that different configuration files are needed if we need to run more than one component providing different CSIs on a node. At this time we limited our solution to a single component per node.

Note that even though a VLC-component could support VoD (since we did not change the code) we do not consider and do not enable this mode in the configuration.

From the clients' perspective each stream is identified by two IP addresses: The destination broadcast IP address mentioned above and the source IP address of the streaming server. This means that when, for example, the node goes down and the VLC service is failed over to another node, the source IP address changes (i.e., it becomes the IP address of the originally standby node). The clients would perceive this as a different stream due to the different source IP address and would not play it until the original stream times out.

In this case and for any service where a client should expect continued communications from the same system, or needs to perform a request, the IP address of the service must be preserved; and therefore in case of a failure, when the service is failed over to a redundant component deployed on a different node, the IP address must be migrated to this node in order to mask the failure. This is applicable to both the broadcast and the VoD—not covered here—services offered by VLC.

This can be done in one of several ways. Many vendors have prepackaged solutions; however, we will use the simplest one: binding the IP address, when needed, to the node with the active video service.

To make this transition automatic we decided to use the life-cycle management provided by AMF for components. Therefore we have created an additional nonproxied-non-SA-aware component type—namely the IP-component—that binds and unbinds the IP address from the node. AMF is able to perform the migration for us simply by specifying a second CST (IP-CST).

This IP-CSI is required for the proper streaming represented by the Video-CSI as it needs to be initiated at this preserved IP address. In other words we also defined a dependency between the two CSIs.

14.3.1.2 Implementing the Life-Cycle Commands

AMF requires three CLC-CLI (Component Life-Cycle—Command Line Interface) commands to be implemented for a nonproxied-non-SA-aware component: the INSTANTIATE, the TERMINATE, and the CLEANUP.

OpenSAF is implemented on the Linux operating system; therefore we implemented the CLC-CLI commands as Bash (Bourne-Again Shell) scripts.

To perform error recovery, AMF also needs to detect component failures; therefore the CLC-CLI commands also include the optional AM_START and AM_STOP commands to start and stop external active monitoring.

We start our discussion of the CLC-CLIs with the issue of health monitoring.

VLC-Component Health Monitoring

AMF can only monitor the health of nonproxied-non-SA-aware components through passive or external active monitoring, because these types of monitoring can be implemented without modifications to the component itself.

External active monitoring involves defining some entity external to the component (referred to as the active monitor) that assesses the health of the component and that reports back to AMF when it detects a component error using the AMF API.

On the other hand, passive monitoring mostly uses operating system features to assess the health of the component; therefore in our nonproxied-non-SA-aware integration we opt for the latter.

This solution still requires a call to the API function that instructs AMF to start the passive monitoring, namely saAmfPmStart_3(), which our nonproxied-non-SA-aware VLC obviously does not implement. One way of doing this is through the INSTANTIATE command, where the instantiation script starts not only VLC but also the passive monitoring.

The INSTANTIATE command is implemented as a shell script, which cannot invoke the passive monitoring function of AMF directly; therefore we implemented a small program in C, which performs this task:

#include <stdio.h>
#include <stdlib.h>
#include <saAmf.h>
...
SaVersionT ver = {.releaseCode = 'B', .majorVersion = 0x01,
       .minorVersion = 0x01};
SaAisErrorT rc;
...
// initialize a handle
rc = saAmfInitialize(&amf_hdl, &reg_callback_set, &ver);
if (rc != SA_AIS_OK)
    {
     fprintf(stderr, "cannot get handle to AMF - %u\n", rc);
     return 1;
    }
/* call the passive monitoring function, where comp_name and
       argv[2] would be the component name and process ID
       that were passed as arguments.*/
rc = saAmfPmStart(amf_hdl, &comp_name, atoi(argv[2]), 0,
    SA_AMF_PM_NON_ZERO_EXIT | SA_AMF_PM_ZERO_EXIT,
    SA_AMF_NO_RECOMMENDATION
    );
if (rc != SA_AIS_OK)
    {
     fprintf(stderr, "saAmfPmStart FAILED - %u\n", rc);
     return 2;
    }
...

The INSTANTIATE command will run the executable file built from this code (which we refer to as exec_StartMonitoring), as illustrated in Figure 14.2.

Figure 14.2 OpenSAF interactions during the instantiation of VLC.


It is important to note that when the INSTANTIATE shell script runs exec_StartMonitoring, it passes as arguments the process ID and the component name assigned to VLC, because this information is part of the parameter list of the passive monitoring API invoked later. The code snippet above shows the saAmfInitialize() and the saAmfPmStart() function calls made to start the passive monitoring. These calls constitute the bulk of the C code. In the code above we give AMF no recovery recommendation (SA_AMF_NO_RECOMMENDATION) for the case in which the passive monitoring reports an error, while in fact we could have specified a recovery action to take place, for example, a component restart.
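The argument handling described above can be sketched as follows. This is only an illustration under stated assumptions: comp_name_t is a local stand-in for the AIS name type, COMP_NAME_MAX is an assumed buffer size, and the helper names parse_comp_name and parse_pid are hypothetical. Unlike the atoi() call in the listing, the sketch rejects malformed process IDs before anything is handed to AMF.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define COMP_NAME_MAX 256  /* assumed size limit of the name buffer */

/* Local stand-in for the AIS name type: a length plus a byte array. */
typedef struct {
    uint16_t length;
    uint8_t  value[COMP_NAME_MAX];
} comp_name_t;

/* Copy the component name given on the command line into the name
   structure. Returns 0 on success, -1 if it is empty or too long. */
static int parse_comp_name(const char *arg, comp_name_t *out)
{
    assert(arg != NULL && out != NULL);
    size_t len = strlen(arg);
    if (len == 0 || len >= COMP_NAME_MAX)
        return -1;
    memset(out, 0, sizeof *out);
    memcpy(out->value, arg, len);
    out->length = (uint16_t)len;
    return 0;
}

/* Convert the process ID argument. Returns 0 on success, -1 if the
   argument is not a plain positive decimal number. */
static int parse_pid(const char *arg, uint64_t *out)
{
    assert(arg != NULL && out != NULL);
    if (!isdigit((unsigned char)arg[0]))
        return -1;              /* rejects "", signs and leading blanks */
    char *end = NULL;
    errno = 0;
    unsigned long long v = strtoull(arg, &end, 10);
    if (errno != 0 || *end != '\0' || v == 0)
        return -1;
    *out = (uint64_t)v;
    return 0;
}
```

In a helper such as exec_StartMonitoring, these checks would run on the command line arguments before the AMF calls, so that a missing or corrupt pid file makes the INSTANTIATE script fail early with a nonzero exit status.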

Figure 14.2 illustrates the interactions performed to instantiate and start the passive monitoring on the nonproxied-non-SA-aware VLC implementation.

VLC-Component CLC-CLIs

Instantiate
cvlc --daemon --pidfile /var/run/vlc/vlc.pid
exec_StartMonitoring

The cvlc command invokes VLC.

We use the first argument (--daemon) for two reasons:

  • It serves as a notification toward AMF as it returns a zero integer value when the command is executed successfully satisfying the AMF requirement on CLC-CLI commands to have a zero exit status in case of success.
  • It also instructs VLC to detach itself from the controlling terminal and run in the background.

The next pair of arguments instructs VLC to create a file containing its process ID (--pidfile) and store it under the name defined by the following argument—/var/run/vlc/vlc.pid in our case. The value stored in this file will be used by the other two CLC-CLI commands.

The exec_StartMonitoring executable initiates the AMF passive monitoring as presented in the section "VLC-Component Health Monitoring."

Cleanup
kill -9 "$(< /var/run/vlc/vlc.pid)"
rm /var/run/vlc/vlc.pid

The CLEANUP command will read the process ID (pid) stored in the /var/run/vlc/vlc.pid file and pass it to the kill CLI command to terminate the process. The SIGKILL (9) signal cannot be caught or ignored by a process and therefore it is used for immediate termination.

For a complete cleanup we also need to remove the file containing the process ID using the remove command (rm).

Terminate
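pid="$(< /var/run/vlc/vlc.pid)"
if [[ -n "$pid" ]]; then
  # default signal is SIGTERM: ask VLC to terminate itself
  kill "$pid"
  i=0
  # kill -0 sends no signal; it only tests whether the process still exists
  while kill -0 "$pid" 2>/dev/null; do
    i=$((i+1))
    if [ "$i" -gt 5 ]; then
      exit 1
    fi
    sleep 1
  done
fi
rm "/var/run/vlc/vlc.pid"
exit 0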

The TERMINATE command will also read the process ID (pid) stored in the /var/run/vlc/vlc.pid file and pass it to the kill CLI command to terminate the process. This kill command sends the default signal (SIGTERM), which is interpreted by the application as a request to terminate itself.

After issuing the termination command we check whether a process with this pid still exists; if that is the case, we wait for 1 second and repeat the loop. Otherwise we remove (rm) the file storing the process ID and return success (exit 0).

When the loop is repeated five times, that is, 5 seconds have passed by and the process is still alive, then we exit with an error (exit 1), which notifies AMF that the termination was unsuccessful.

IP-Component CLC-CLI

For the nonproxied-non-SA-aware IP-component we also need to implement the same three life-cycle commands. For this component type the TERMINATE and CLEANUP implementations are identical, because we are simply unbinding an IP address; there is no process to kill or resources to de-allocate.

Instantiate
ip addr add $ip dev $dev
arping -U -c 1 -I $dev $ip

The INSTANTIATE command script consists of two commands: The first adds the IP address (held in $ip) to the selected device (given by $dev).

The second command performs an ARP (Address Resolution Protocol) takeover of the address to indicate to any attached network switch that the address now belongs to this server. This is done by instructing the arping command (through its -U option) to send an unsolicited ARP message to the networking equipment.

Terminate, Cleanup
ip addr del $ip/32 dev $dev

This command removes the IP address (given by $ip) from the device (given by $dev).

14.3.1.3 The AMF Configuration

For AMF to manage any application, it requires the configuration of this application. Using this configuration AMF selects the components to instantiate and the workload to assign to them.

Figure 14.3 illustrates the AMF entities of our VLC configuration:2 we have two redundant service units (SUs) that form a service group (SG); each SU includes two components—one instance of IP-component and one instance of VLC-component. As discussed in Section 14.3.1.1 a CSI is defined for each component: the IP-CSI of type IP-CST and the Video-CSI of type Video-CST.

Figure 14.3 The AMF configuration structure for VLC.


The AMF configuration is specified in an XML (eXtensible Markup Language) file compliant with the IMM (Information Model Management) XML schema [110], which is loaded by IMM at cluster start.

There are certain attributes in the configuration that require more considerations:

As mentioned in Section 14.3.1.1, the IP-CSI must be assigned first (before streaming any video); this we capture through the CSI dependency attribute in the CSI class, where the Video-CSI is configured to depend on the IP-CSI.

For the SG of our nonproxied-non-SA-aware VLC we select the no-redundancy redundancy model, which is captured in the redundancy model attribute of the SG type.

There are several timer attributes that are typically set based on experimenting with the application. Among these attributes are the timeouts for the life-cycle commands.

Another attribute, which is configured based on the relation of the measured timings, is the recovery on error attribute for the components (e.g., whether the recommended recovery is a restart or a failover). Typically the choice is based on which recovery takes less time to complete. For the nonproxied-non-SA-aware integration of VLC the time needed to recover the service is the same for both recoveries. However, we favor the failover, because it starts the service on a different node and therefore platform-related faults that cannot be recovered by restarting the component are also covered by this recovery.

We set the appropriate Bash script for each of the component types as the CLC-CLI commands, while in the class associating the software bundles with nodes we indicate the location of the executables as required by the deployment environment.

14.3.1.4 Test Deployment

To test our application, we used a test environment available in OpenSAF's development sources, under the tools/cluster_sim_uml directory. It is a preconfigured User-mode Linux (UmL) cluster that allows one to easily start and stop an entire cluster.

UmL is a lightweight virtualization solution in that it simply uses the current real Linux kernel to provide actual functionality. However, applications running within it see a complete Linux environment, with its own independent root user. Furthermore, to simply run a UmL instance, root access is not necessary.

To integrate with this test system, we used the XML file containing our AMF configuration for the VLC application and merged it with the existing imm.xml describing other parts of the system. We included this file in building the test UmL software (consisting of the Linux kernel and the system image to be loaded at the virtual cluster startup). We then installed our VLC software into the common mapped folder, which is in the root_fs directory. The merged IMM configuration that we made indicated that the executables resided in that directory. That means that if we want to test a new version, we simply need to replace the software and restart the test cluster.

Since our application is network oriented, it is useful for the host system to be able to talk to the test cluster through a standard networking interface. This is needed so that the broadcast video can be seen by the hosting system, and thereafter forwarded to the system's external network interface if needed (i.e., if the end user receiving the stream is on a different machine). To do this we created a tunnel using the following steps:

  • Using the tunctl program we created the tunnel.
  • We then assigned an IP address to the interface within the same subnet as the test cluster and brought the interface up.
  • We started the cluster with the tap environment variable set to—tap tap0. This indicates the cluster startup script to use the tap0 interface for networking.

On the end-user side, the user starts the VLC client application and requests it to play the current broadcast stream by specifying the broadcast IP address and port on which the stream is being broadcast, that is, as defined in the configuration file of the VLC-component instances running on the server side.

14.3.2 SA-Aware VLC Integration

The main difference between the SA-aware and the nonproxied-non-SA-aware integrations is the addition of the AMF API implementation to the VLC application so that it can interact with the AMF implementation of the OpenSAF middleware. This means that we need to change the application code in order to allow dynamic work load assignments and other required features.

This adds more complexity to the integration and requires a deeper understanding of the application's workflow, compared to the superficial knowledge needed for the nonproxied-non-SA-aware integration.

14.3.2.1 Component and CS Types

VLC is a highly threaded application: it consists of various modules, each of which runs in its own thread. These threads are tightly coupled; therefore it would be a tedious task to separate each thread into its own process with its own independent life-cycle.

An alternative is to consider the different threads as contained components; however, their fault isolation is still a problem and they would fail together anyway.

Consequently, and instead of making substantial modifications to the application, we decided again to represent the VLC process as a single component. We call this component type again the VLC-component.

In contrast to nonproxied-non-SA-aware components that start to provide their CSIs the moment they are instantiated, SA-aware components are assigned the CSIs any time after their instantiation. Therefore we also implement the Video-CST of our SA-aware VLC-component differently.

The workload represented by a VLM-configuration file, which contains information such as the media to broadcast and its properties is no longer associated with the instantiate command. Instead it is passed as an attribute in the assignment of a CSI, when this CSI is assigned to a VLC-component. We again refer to our modified CST as Video-CST.

For the same reason as presented in Section 14.3.1.1, we need the IP-component and the related CST. They can be reused as-is.

14.3.2.2 Implementing the Life-Cycle Commands

As the implementation of the IP-component remains the same, its CLC-CLI commands remain unchanged as well.

For the VLC-component the INSTANTIATE and CLEANUP CLC-CLI command implementations remain the same as defined in Section 14.3.1.2 for the nonproxied-non-SA-aware implementation.

We do not need the TERMINATE CLC-CLI command any more as the termination is implemented as a callback function for SA-aware components. We will discuss the implementation of this callback in the next section.

Regarding health monitoring we can use the same passive monitoring as discussed for the nonproxied-non-SA-aware solution, which is started by the INSTANTIATE command. However since the application code is now linked with the AMF library implementation, AMF can use its own tools and the passive monitoring is not essential.

In either case the health monitoring can be enhanced with the use of health-checks; however, we will not cover that in this chapter.

14.3.2.3 Integrating with the AMF API

In its original form VLC is capable of being idle—when no VLM-configuration file is provided yet. Once the configuration file has been loaded VLC starts to broadcast the requested media, that is, it assumes the HA active state for the CSI described in the VLM-configuration file that has been loaded.

For our SA-aware integration we would like VLC also to be able to assume the HA standby state: In this first approach this would mean that when a CSI is assigned to a VLC-component, it loads the configuration file, but does not start the broadcast.

As discussed in Section 14.2, in VLC it is the Control Module that instructs VLM and the other modules what to do in terms of setting up and controlling streams by loading a configuration, then starting and stopping them. Considering that the primary goal of the AMF API is to control the workload of the components, this seems to be a suitable integration point with AMF. Therefore our intention is to create a new Control Module with the interface providing the interaction with AMF, thus implementing the SA-awareness in the VLC-component type.

Component Life-Cycle API

Internally, any VLC module provides three functions: Open, Close, and Run (or equivalents) that implement the VLC module life-cycle API. These functions respectively are responsible for initializing, terminating, and performing the assigned tasks of the module.

This means that it is within these functions that we can map the AMF life-cycle instructions (CLC CLI commands and callbacks) and forward them to other VLC modules.

When an SA-aware VLC-component is instantiated by AMF (such as the AMF implementation of OpenSAF), our Control Module first invokes the Open function on VLM (in VLM it is called vlm_New), then it initializes a handle and registers this instance of the VLC-component with AMF.

Thereafter a selection object is obtained by calling the saAmfSelectionObjectGet() function. This allows the SA-aware VLC-component to discover AMF callbacks and dispatch them without continual polling. Figure 14.4 illustrates these interactions.

Figure 14.4 The main (SA-aware) VLC interactions upon instantiation.


For each of the above calls, we must of course verify that the invocation was successful. When it is not the case, we close any resources we have successfully opened and we tell VLC to exit.

The Control Module registers with AMF the following three callbacks: saAmfCSISetCallback(), saAmfCSIRemoveCallback(), and saAmfComponentTerminateCallback(). AMF can use the first two for managing the component's workload:

  • saAmfCSISetCallback: When AMF desires the VLC-component instance to take an HA state for a CSI it calls this function. As arguments it provides the name and attributes of the CSI and the desired HA state as well as some information about the status of other assignments for the same CSI.
  • saAmfCSIRemoveCallback: When AMF desires the VLC-component to drop a CSI assignment, it calls this function indicating which CSI is concerned.
  • saAmfComponentTerminateCallback: SA-aware components are terminated by invoking this callback function as opposed to the TERMINATE CLC CLI command, which is used in the case of nonproxied-non-SA-aware components. This callback invokes the Close function implemented in the Control Module. As the reader might expect, we free allocated resources and make VLC quit when this function is called.

By calling saAmfDispatch (Figure 14.4) we allow AMF to perform callbacks to the API implemented in the Control Module.

The continuous operation tasks of the VLC-component are performed within the Run function of the Control Module: It handles the AMF workload related requests as they arrive. We look at these details next.
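The structure of such a Run function can be sketched as a loop that blocks on the selection object and dispatches only when something is pending. The sketch below is a self-contained illustration, not the code of our Control Module: it assumes that the selection object can be used as a file descriptor with poll() (as is customary on Linux), and the names wait_readable, run_loop, dispatch_fn, and demo_dispatch are hypothetical. The demo_dispatch function merely stands in for the place where saAmfDispatch() would be called.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical dispatch hook: returns 0 to continue, a positive value
   to stop the loop, and a negative value on error. */
typedef int (*dispatch_fn)(void *ctx);

/* Wait up to timeout_ms for fd to become readable.
   Returns 1 if readable, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc;
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);   /* retry if interrupted */
    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;
    return (pfd.revents & POLLIN) ? 1 : -1;
}

/* Block on the selection object and dispatch pending callbacks
   until the dispatch hook asks to stop or an error occurs. */
static int run_loop(int sel_fd, int timeout_ms, dispatch_fn dispatch, void *ctx)
{
    assert(dispatch != NULL);
    for (;;) {
        int ready = wait_readable(sel_fd, timeout_ms);
        if (ready < 0)
            return -1;
        if (ready == 0)
            continue;                     /* timeout: nothing pending */
        int rc = dispatch(ctx);
        if (rc < 0)
            return -1;
        if (rc > 0)
            return 0;                     /* stop requested */
    }
}

/* Stand-in for the AMF dispatch call: consumes one pending byte. */
struct demo_ctx { int fd; int handled; };

static int demo_dispatch(void *p)
{
    struct demo_ctx *c = p;
    char byte;
    if (read(c->fd, &byte, 1) != 1)
        return -1;
    c->handled++;
    return byte == 'q';                   /* 'q' asks the loop to stop */
}
```

The point of this design is that the thread sleeps inside poll() while nothing is pending, which is what allows the component to react to AMF requests without continual polling.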

CSI Management Callbacks

As we mentioned, when the saAmfDispatch() function is invoked, three functions may be called. Here we elaborate further on the two callbacks AMF uses to manage the CSI assignments.

The first one is saAmfCSISetCallback(), which is the most complicated. This function is responsible for assigning an HA state on behalf of some CSI. The HA states are the already mentioned active and standby, as well as the quiescing and quiesced states. The function has three arguments of note: the component name, the desired state and a CSI descriptor. In our case, in each component instance the component name is constant since each component registers only itself. So we are only interested in the desired state and the CSI descriptor.

This descriptor is a structure that has four properties. The first is csiFlags, which indicates whether this invocation applies to one CSI or to all currently assigned CSIs. The second, csiName, indicates the name of the CSI the callback is invoked for; in our implementation a component takes only a single CSI assignment. The third, csiStateDescriptor, holds additional information about the state transition; since it does not affect the behavior of our solution, we ignore it. Finally, the csiAttr property may contain a list of the CSI attributes for the CSI if an active or standby assignment is being given. In our case it contains, for the active and standby HA state assignments, the name of the VLM configuration file describing the workload represented by the CSI for which the HA state is being assigned.
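Extracting the configuration file name from the attribute list amounts to a lookup by attribute name. The following sketch illustrates the idea with local stand-in types (csi_attr_t and csi_attr_list_t are simplified placeholders, not the AMF structures), and the attribute name and file path used in the usage note are purely hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for one CSI attribute: a name/value pair. */
typedef struct {
    const char *name;
    const char *value;
} csi_attr_t;

/* Simplified stand-in for the attribute list in the CSI descriptor. */
typedef struct {
    const csi_attr_t *attr;   /* array of attributes */
    size_t number;            /* number of entries in the array */
} csi_attr_list_t;

/* Return the value of the attribute called name, or NULL if the
   list does not contain such an attribute. */
static const char *find_csi_attr(const csi_attr_list_t *list, const char *name)
{
    assert(list != NULL && name != NULL);
    for (size_t i = 0; i < list->number; i++) {
        if (list->attr[i].name != NULL &&
            strcmp(list->attr[i].name, name) == 0)
            return list->attr[i].value;
    }
    return NULL;
}
```

With a list holding the hypothetical pair ("vlmConfigFile", "/etc/vlc/stream1.vlm"), calling find_csi_attr(&list, "vlmConfigFile") yields the file name to hand to VLM, while a NULL result tells the callback that the assignment carries no workload description.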

The most important question is what to do when assigned one of those four HA states for a CSI:

  • Active: This means that the component should be providing the service for the CSI indicated in the csiName. The csiAttr provides the name of the VLM configuration file to be loaded if necessary. Since neither the original VLC implementation nor the AMF API by itself provides any functionality that helps to synchronize the stream position between different VLC instances, the component receiving this assignment starts to broadcast the media indicated in the VLM configuration file from the beginning, after loading the file if necessary (e.g., at the initial CSI assignment). For this purpose VLM may open and run additional modules.
  • Standby: It means another component is tasked with providing the service; and the component receiving this standby assignment needs to prepare to be ready to take over the active assignment if needed. As we set out at the beginning of the section, in this integration the only preparation we want the standby component to make is to load the VLM-configuration file.
  • Quiesced: This state is assigned when AMF wishes to switch-over active assignments from the active component to another component. In this integration the switch from active to quiesced state is done instantly by stopping all the broadcast streams the VLC-component instance is providing.
  • Quiescing: This state is assigned when AMF wishes the active assignment to terminate gracefully, that is, the component should finish servicing its current clients before giving up its assignment. When the component is assigned this state, it continues streaming the current video in the playlist until its completion and then calls saAmfCSIQuiescingComplete() once to indicate to AMF that the task is done.
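
The handling of the four HA states listed above can be sketched as follows. This is a minimal illustration and not the Control Module code itself: the ha_state_t enum, the vlc_component_t structure and both function names are hypothetical stand-ins defined locally so that the sketch builds without the SA Forum headers, and the actual file loading and streaming are reduced to flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for the AMF HA states (assumption: a real component
 * would use SaAmfHAStateT from the SA Forum headers instead). */
typedef enum { HA_ACTIVE = 1, HA_STANDBY, HA_QUIESCED, HA_QUIESCING } ha_state_t;

/* Hypothetical view of one VLC-component instance. */
typedef struct {
    bool config_loaded;    /* VLM configuration file has been loaded */
    bool streaming;        /* broadcasts are currently running */
    bool stop_after_item;  /* quiescing: stop when the current item ends */
} vlc_component_t;

/* Apply an HA state assignment. csi_attr_file is the VLM configuration
 * file name taken from csiAttr; it is only needed when the file has not
 * been loaded yet. */
static void on_csi_set(vlc_component_t *c, ha_state_t state,
                       const char *csi_attr_file)
{
    switch (state) {
    case HA_ACTIVE:
        assert(c->config_loaded || csi_attr_file != NULL);
        c->config_loaded = true;    /* load the file if not loaded yet */
        c->streaming = true;        /* broadcast from the beginning */
        c->stop_after_item = false;
        break;
    case HA_STANDBY:
        assert(csi_attr_file != NULL);
        c->config_loaded = true;    /* the only preparation: load the file */
        break;
    case HA_QUIESCED:
        c->streaming = false;       /* stop all broadcasts at once */
        c->stop_after_item = false;
        break;
    case HA_QUIESCING:
        c->stop_after_item = true;  /* keep streaming until the item ends */
        break;
    }
}

/* Called when the current playlist item ends. Returns true when the
 * component should now report completion to AMF, which the real
 * component does with saAmfCSIQuiescingComplete(). */
static bool on_item_finished(vlc_component_t *c)
{
    if (!c->stop_after_item)
        return false;
    c->streaming = false;
    c->stop_after_item = false;
    return true;
}
```

In a real component on_csi_set() would be driven from the saAmfCSISetCallback() implementation, which also has to answer AMF with saAmfResponse() for the active, standby and quiesced assignments.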

The second function is saAmfCSIRemoveCallback(), which is called when AMF wishes to remove an assignment. When this callback is invoked on a VLC-component instance, we unload the configuration file by cleaning up any objects and releasing any resources created and allocated in association with it. The Control Module closes all other modules that are not needed for the idle state.

14.3.2.4 AMF Configuration

The entities of the configuration of the SA-aware VLC application and their arrangement remain the same as presented in Figure 14.3. However, we need to adjust certain attribute values and add some objects:

  • the VLC-component type category: we set it to SA_AMF_COMP_SA_AWARE instead of SA_AMF_COMP_LOCAL (which was specified for the nonproxied-non-SA-aware local components);
  • the redundancy model of the SG type: it changes to SA_AMF_2N_REDUNDANCY_MODEL from SA_AMF_NO_REDUNDANCY_MODEL;
  • the Video-CST now includes a CSI attribute, so we specify the attribute name; and accordingly
  • we add to the Video-CSI configuration object an association object specifying the VLC configuration file.

14.3.2.5 Test Deployment

The deployment and testing remain the same as described for the nonproxied-non-SA-aware integration in Section 14.3.1. As noted there, for deployment we need to install the new version of the VLC application into the root_fs common mapped folder.

14.3.3 SA-Aware VLC with Service Continuity

Service continuity is a feature that any highly available streaming application must offer, as it would be frustrating for the end user to have to watch the same video from the beginning if a failure occurs on the streaming server side. Therefore the purpose of this third integration variant is to provide service continuity by using additional middleware services.

When the application provides a stateful service, ensuring service continuity requires that the state information be communicated to the standby component, so that in case of a failure it can resume the work from the state where the failed, previously active component left off. There are several ways of doing this synchronization with the different AIS utility services as discussed in Chapter 7. There are two factors to consider:

The first one is the data required to describe the state so that another component can take over the service provisioning. The important point here is to make sure that with the state information we do not communicate the fault that caused the failure of the active component. In case of our streaming application the state of the streaming is described by the contents of the VLM configuration file (e.g., playlists for the different streams, their associated broadcast address, etc.) and the position of the broadcast for each configured media stream. This data needs to be externalized from the application, and potentially duplicated on other nodes. Note that only the position information changes over time, while the amount of data required to describe it remains constant.

Secondly, we need to consider the frequency at which the state updates need to be propagated to the standby. This is usually a tradeoff among several factors: the amount of data required to transfer, the conceptual effort to segment the work and produce this data, the resources needed to recover the proper state from the communicated data and most importantly the experience of the user and the guarantees promised to the user.

We need to compare these needs with the functionality of the different services. The Event or Messaging services can be used to propagate the stream position; however the initial configuration would need to be repeated in case of a standby failure. On the other hand all the information can be stored easily in a checkpoint, which can be read by any standby or if there is no standby even the restarted active can use it for picking up its previous state.

We decided to use the Checkpoint Service as it meets all our needs. This service allows us to atomically update the checkpoint, guaranteeing that either all or none of the changes were performed. Furthermore, the service is responsible for all of the logic with regard to duplicating the data on other node(s). Additionally, by using a collocated checkpoint we can improve the performance of the checkpoint update operations.

Accordingly there is a need for some changes in our SA-aware VLC-component. Namely, we need to add the Checkpoint API implementations and we need to incorporate them in the handling of the CSI assignment callbacks. In our discussion we will focus on the traits where the difference exists.

In particular we will not discuss the component type and CST definitions as they remain the same as for the basic SA-aware solution as well as the CLC-CLI commands.

We also need to migrate the source IP address to the new standby component the same way as we presented for the nonproxied-non-SA-aware component in section ‘IP-Component CLC-CLI’. Therefore our IP-component and related IP-CST need to remain part of our application and they require no change.

14.3.3.1 Integrating the Middleware APIs

The objective is to store the current stream position in a checkpoint so that when another VLC-component instance needs to continue the broadcast provided by the active instance, it can do so from the stored position; that is, we obtain service continuity.

As a result the Run function of the Control Module has more tasks to perform: while a component is active it also periodically updates the video position within the checkpoint, and at take-over the checkpoint needs to be read to obtain the position.

To do this, in addition to the initialization of the AMF library described in section ‘Component Life-Cycle API’ and shown in Figure 14.4, the component also needs to initialize the Checkpoint service (CKPT) by calling saCkptInitialize() to obtain a handle and, in turn, a selection object for CKPT as well.

CSI Management Callbacks

To obtain service continuity the workload assignment callback (i.e., saAmfCSISetCallback) is implemented differently for the different HA states to incorporate checkpointing:

  • Active: In addition to providing the service for the CSI, the component needs to periodically checkpoint its state information (i.e., the current position of the stream being broadcast). When the component is assigned the active state, it first opens the checkpoint associated with the CSI and tries to read it. If the checkpoint is not empty, this is not an initial assignment. If the component was the standby for the CSI, it has already obtained the VLM configuration contents and only needs to read the current position from the checkpoint to resume the stream from that position.

If the component was not the standby (i.e., it is the active which is being restarted) it needs to read both the configuration information and the position information from the checkpoint.

If the checkpoint is empty, then it was created with the open call and this is an initial assignment; therefore the component needs to read the configuration information from the VLM configuration file, store the information in the checkpoint and to start to broadcast all streams from the beginning.

In any case the component also needs to update the checkpoint periodically with the current streaming position. To allow this we need to modify the way the component detects callbacks. Rather than waiting indefinitely, it now waits on the selection object for a limited amount of time. If an event arrives before the timeout, it is dispatched by calling the saAmfDispatch() function, this time dispatching one callback at a time so that the component does not miss the checkpointing time. After the dispatch returns, the component checks whether it is time to perform a checkpoint and, if so, performs it. Then it resumes the timed wait on the selection object.

  • Standby: The component needs to open the checkpoint first. Opening the checkpoint instructs the Checkpoint service to ensure that there is a replica of the checkpoint on the local node. With the change of storing the configuration information in the checkpoint itself, taking the standby assignment for a CSI means that the component needs to obtain the configuration information from the checkpoint rather than from the configuration file itself. (With that the configuration file can now be modified without impacting ongoing streams.) The OpenSAF implementation of the service also allows for tracking checkpoint updates to keep the standby up to date. We do not use this extension as in our case the state information is not incremental.
  • Quiesced: This state assignment is given to the currently active component so that another component, the standby, can take over the active assignment. Accordingly the component reacts to the assignment by stopping all broadcasts and saving their positions in the checkpoint, so that the component taking over can resume the broadcasts from the stored positions. It also closes the checkpoint.
  • Quiescing: This state assignment is also given to the component currently active for the CSI. In this state, the component continues to broadcast as well as checkpoint all the streams, but only till the end of the current item on the playlist. When the end of the current item is reached in each stream the quiescing is complete and the quiesced state is reached by the component, which confirms this change with AMF by calling the saAmfCSIQuiescingComplete function. It also closes and unlinks the checkpoint as going through the quiescing state indicates that the service is being removed gracefully.
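
The timed wait described for the active assignment can be sketched with poll(). This is only an illustration under stated assumptions: the helper names, the STEP_* flags and the idea of returning the required actions to the caller are hypothetical, and the file descriptor stands for the selection object obtained from AMF. The caller is expected to dispatch a single callback when STEP_DISPATCH is set and to write the stream positions when STEP_CHECKPOINT is set.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <poll.h>
#include <time.h>

/* Actions the caller has to perform after one wait step (hypothetical). */
enum { STEP_DISPATCH = 1, STEP_CHECKPOINT = 2 };

/* Monotonic clock in milliseconds. */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000LL + ts.tv_nsec / 1000000L;
}

/* Deadline of the next checkpoint; period_ms is an assumed tuning value. */
static long long next_deadline(long long from_ms, int period_ms)
{
    assert(period_ms > 0);
    return from_ms + period_ms;
}

/* Wait on the selection object until an event arrives or the checkpoint
 * deadline is reached, whichever comes first. A negative descriptor is
 * ignored by poll(), so the call then degenerates to a plain timed wait. */
static int wait_step(int selection_fd, long long deadline_ms)
{
    long long remaining = deadline_ms - now_ms();
    if (remaining < 0)
        remaining = 0;              /* deadline already passed: do not block */
    if (remaining > INT_MAX)
        remaining = INT_MAX;

    struct pollfd pfd = { .fd = selection_fd, .events = POLLIN, .revents = 0 };
    int actions = 0;

    if (poll(&pfd, 1, (int)remaining) > 0 && (pfd.revents & POLLIN))
        actions |= STEP_DISPATCH;   /* dispatch exactly one callback */
    if (now_ms() >= deadline_ms)
        actions |= STEP_CHECKPOINT; /* time to store the positions */
    return actions;
}
```

A Run loop built on this would call wait_step() repeatedly, handle the returned actions, and compute a new deadline with next_deadline() after each checkpoint. Dispatching one callback per step keeps the time spent in callbacks between two deadline checks bounded.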

We also need to modify the processing of the saAmfCSIRemoveCallback for the case when the service is abruptly stopped, that is if the CSI is being removed from the component when it has the active assignment. In this case the component unlinks the checkpoint as part of the removal procedure, so that the checkpoint can be removed completely from the system as soon as all components close it.
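
The decisions the component takes when it receives the active assignment, and when a CSI is removed, can be summarized in a small sketch. The structure, the enum and the function names are hypothetical and only encode the rules stated in this section; the actual file parsing and checkpoint calls are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Steps to perform when the active assignment is received (hypothetical). */
typedef struct {
    bool read_config_file;      /* parse the VLM configuration file on disk */
    bool write_initial_ckpt;    /* store the configuration in the checkpoint */
    bool read_config_from_ckpt; /* restore the configuration from the checkpoint */
    bool read_positions;        /* restore the stream positions */
} active_plan_t;

/* ckpt_empty:  the checkpoint was just created by the open call
 * was_standby: the component already held the standby assignment */
static void plan_active(bool ckpt_empty, bool was_standby, active_plan_t *plan)
{
    assert(plan != NULL);
    /* Initial assignment: read the file, fill the checkpoint, start over. */
    plan->read_config_file      = ckpt_empty;
    plan->write_initial_ckpt    = ckpt_empty;
    /* Restarted active: nothing is in memory, so restore everything. */
    plan->read_config_from_ckpt = !ckpt_empty && !was_standby;
    /* Any take-over resumes from the stored positions. */
    plan->read_positions        = !ckpt_empty;
}

/* Assignment held by the component for the CSI (hypothetical). */
typedef enum { ROLE_NONE, ROLE_STANDBY, ROLE_ACTIVE, ROLE_QUIESCED } role_t;

/* On CSI removal the checkpoint is unlinked only if the component held
 * the active assignment, that is, the service is stopped abruptly. */
static bool unlink_on_remove(role_t role)
{
    return role == ROLE_ACTIVE;
}
```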

14.3.3.2 Implementing the Checkpointing

The components assigned the active and the standby HA states need to open the checkpoint for the CSI of concern.

For this they use the saCkptCheckpointOpen() call. It takes as arguments:

  • The handle to the Checkpoint service obtained at initialization.
  • The distinguished name of the checkpoint; we use the CSI name for this purpose.
  • The checkpoint attributes, which are set only by the component with the active assignment:
    • The checkpoint creation flags. We chose to use a collocated checkpoint to maximize the performance of the active component by needing to write to the active replica only and have it as a local replica. Therefore we set the SA_CKPT_WR_ACTIVE_REPLICA and the SA_CKPT_CHECKPOINT_COLLOCATED flags.
    • The maximum checkpoint size: 10 000 000 (i.e., 10 MB).
    • The retention duration of the checkpoint, that is, until what time should it be kept around. We gave SA_TIME_MAX. This is significant only for the time when no component keeps the checkpoint open, that is, when the service is not being provided. We do not keep the state for the CSI if it has been stopped gracefully or abruptly.
    • The maximum number of sections: 3 (only two of them are used).
    • The maximum size of each section: 10 000 000 (i.e., 10 MB).
    • The maximum length of each section Id: 2.
  • The opening mode for the checkpoint: the component with the standby assignment sets the SA_CKPT_CHECKPOINT_READ flag only, while the component with the active assignment sets the SA_CKPT_CHECKPOINT_READ, SA_CKPT_CHECKPOINT_WRITE and SA_CKPT_CHECKPOINT_CREATE flags.
  • A pointer to a checkpoint handle instance, so that the open can return it to the calling function.

Opening the checkpoint ensures that it is being replicated on the local node of the component opening it.
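
The argument values listed above can be collected in a short sketch. To keep it buildable without the SA Forum headers, the structure and the constants below are local stand-ins (assumptions) that mirror the roles of the creation attributes and flags named in the text; their numeric values are placeholders, and a real component would use the definitions from the Checkpoint service header and pass the result to saCkptCheckpointOpen().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values standing in for the real flags (assumptions). */
#define DEMO_WR_ACTIVE_REPLICA 0x2u      /* SA_CKPT_WR_ACTIVE_REPLICA */
#define DEMO_COLLOCATED        0x8u      /* SA_CKPT_CHECKPOINT_COLLOCATED */
#define DEMO_OPEN_READ         0x1u      /* SA_CKPT_CHECKPOINT_READ */
#define DEMO_OPEN_WRITE        0x2u      /* SA_CKPT_CHECKPOINT_WRITE */
#define DEMO_OPEN_CREATE       0x4u      /* SA_CKPT_CHECKPOINT_CREATE */
#define DEMO_TIME_MAX          INT64_MAX /* SA_TIME_MAX */

/* Stand-in for the checkpoint creation attributes. */
typedef struct {
    uint32_t creation_flags;
    uint64_t checkpoint_size;
    int64_t  retention_duration;
    uint32_t max_sections;
    uint64_t max_section_size;
    uint64_t max_section_id_size;
} ckpt_creation_attrs_t;

/* Values used for the VLC checkpoint, as listed in the text. */
static ckpt_creation_attrs_t vlc_ckpt_attrs(void)
{
    ckpt_creation_attrs_t a;
    a.creation_flags      = DEMO_WR_ACTIVE_REPLICA | DEMO_COLLOCATED;
    a.checkpoint_size     = 10000000u;   /* 10 MB */
    a.retention_duration  = DEMO_TIME_MAX;
    a.max_sections        = 3;           /* only two are used */
    a.max_section_size    = 10000000u;   /* 10 MB */
    a.max_section_id_size = 2;
    return a;
}

/* The standby only reads; the active reads, writes and may create. */
static uint32_t vlc_open_flags(bool active)
{
    return active ? (DEMO_OPEN_READ | DEMO_OPEN_WRITE | DEMO_OPEN_CREATE)
                  : DEMO_OPEN_READ;
}

/* Only the active component supplies creation attributes. */
static const ckpt_creation_attrs_t *
attrs_for_open(bool active, const ckpt_creation_attrs_t *attrs)
{
    assert(attrs != NULL);
    return active ? attrs : NULL;
}
```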

Following that, if the component is assigned the active HA state it also needs to tell the Checkpoint service to make the local checkpoint replica the active one by calling saCkptActiveReplicaSet().

If the checkpoint was just created (it is empty) then the sections need to be created using the saCkptSectionCreate() function. It takes as arguments the checkpoint handle, the creation attributes, which include the section name and the section expiration, a pointer to initial data to put in the checkpoint, as well as the size of this data.

We use two sections. The first hosts a copy of the content of the VLM configuration file. This guarantees that the active configuration stays consistent, even if the version on disk has been modified. The second section maintains the current position of the media streams, so that we can resume at the same position whenever needed.

Next, to write the initial data to the checkpoint, saCkptCheckpointWrite() is used. It takes as arguments:

  • The checkpoint handle that we obtained at the opening of the checkpoint.
  • A pointer to an array of vectors. These vectors contain the section ID, the offset to write at, the amount of data to write and finally a pointer to the data that needs to be written.
  • The number of vectors in the array, which is two for the two sections.
  • A pointer to a SaUint32T. This number is modified on error. It gives the index of the vector that caused the error.

For subsequent updates the actively broadcasting component only updates the second section with the current positions. For this it overwrites the section using the saCkptSectionOverwrite() function, which requires the checkpoint handle, the section ID, the pointer to the data, and the data size.

To read from a checkpoint, the saCkptCheckpointRead() function is used; it is analogous to saCkptCheckpointWrite() and takes the same arguments.

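To close and unlink a checkpoint, saCkptCheckpointClose() and saCkptCheckpointUnlink() need to be called, respectively. saCkptCheckpointClose() takes the handle to the checkpoint, whereas saCkptCheckpointUnlink() takes the Checkpoint service handle and the name of the checkpoint.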
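
The two-section layout and the write, overwrite and read pattern can be illustrated with an in-memory stand-in for a checkpoint. Nothing below calls the Checkpoint service: mem_ckpt_t, io_vec_t, stream_pos_t and the three functions are hypothetical, and only mimic the shape of the vector-based write and read calls and of the section overwrite described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SEC_CONFIG   0              /* copy of the VLM configuration */
#define SEC_POSITION 1              /* current stream positions */
#define SEC_COUNT    2
#define SEC_MAX      ((size_t)256)  /* demo section capacity */

typedef struct {
    unsigned char data[SEC_COUNT][256];
    size_t        size[SEC_COUNT];
} mem_ckpt_t;

/* One vector element: section, offset, amount of data, data pointer. */
typedef struct {
    int    section;
    size_t offset;
    size_t size;
    void  *buf;
} io_vec_t;

/* Hypothetical position record for one stream. */
typedef struct {
    unsigned           item_index;  /* playlist item being broadcast */
    unsigned long long offset_ms;   /* position within that item */
} stream_pos_t;

/* Write all vectors. Validation comes first so that either all or none of
 * the writes are applied; on error *bad_index names the failing vector. */
static int ckpt_write(mem_ckpt_t *c, const io_vec_t *v, size_t n, size_t *bad_index)
{
    assert(c != NULL && v != NULL && bad_index != NULL);
    for (size_t i = 0; i < n; i++) {
        if (v[i].section < 0 || v[i].section >= SEC_COUNT ||
            v[i].offset > SEC_MAX || v[i].size > SEC_MAX - v[i].offset) {
            *bad_index = i;
            return -1;
        }
    }
    for (size_t i = 0; i < n; i++) {
        size_t end = v[i].offset + v[i].size;
        memcpy(c->data[v[i].section] + v[i].offset, v[i].buf, v[i].size);
        if (end > c->size[v[i].section])
            c->size[v[i].section] = end;
    }
    return 0;
}

/* Replace the whole content of one section, as done for position updates. */
static int ckpt_overwrite(mem_ckpt_t *c, int section, const void *buf, size_t size)
{
    assert(c != NULL && buf != NULL);
    if (section < 0 || section >= SEC_COUNT || size > SEC_MAX)
        return -1;
    memcpy(c->data[section], buf, size);
    c->size[section] = size;
    return 0;
}

/* Read the requested ranges into the buffers named by the vectors. */
static int ckpt_read(const mem_ckpt_t *c, const io_vec_t *v, size_t n, size_t *bad_index)
{
    assert(c != NULL && v != NULL && bad_index != NULL);
    for (size_t i = 0; i < n; i++) {
        int s = v[i].section;
        if (s < 0 || s >= SEC_COUNT || v[i].offset > c->size[s] ||
            v[i].size > c->size[s] - v[i].offset) {
            *bad_index = i;
            return -1;
        }
        memcpy(v[i].buf, c->data[s] + v[i].offset, v[i].size);
    }
    return 0;
}
```

The active component would issue one two-element write for the initial data and afterwards only overwrite the position section, while a component taking over reads the position section, and the configuration section if it does not have the configuration in memory.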

14.4 Summary and Conclusion

In this chapter we have presented three different levels of integration of the VLC application with the OpenSAF middleware:

In the first one we left the application code intact and had AMF manage the application as a nonproxied-non-SA-aware component. This is the minimum level of integration where the interaction between AMF and the application is limited to life-cycle management, and therefore the least effort is required to implement it. The integration simply consisted of implementing the three life-cycle commands and the passive monitoring code—approximately 70 lines of C code—that instructed AMF to start monitoring the health of the process implementing the component and providing AMF with the configuration based on which it could perform the management.

With this level of integration AMF was already capable of detecting the failure of the VLC process and restarting it on the other node within a couple of seconds, which would result in an availability measure of approximately 4 × 9's (with a mean time to failure of roughly 8 hours). The main shortcoming of this integration was that it did not offer the service continuity needed for streaming: In case of a failure the streams restarted from the beginning.

In the second version we rendered VLC an SA-aware application. This was done by creating our own Control Module for VLC, which implemented the APIs required for the interaction with AMF. This new Control Module is roughly 400 lines of C code, and it also required about 30 lines for the build system. In this case only two CLC-CLI commands were implemented. In terms of effort, the configuration was similar to that of the first version.

In this SA-aware VLC implementation the standby was better prepared as it did not need to read the configuration file, so the failover became faster, below half a second on average, which would allow for 5 × 9's of availability (again with the same failure rate). But this solution would still not provide the service continuity expected from a streaming application as the streams would still be restarted. However the introduction of the standby was a necessary step toward the next level of integration.
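
The relation between recovery time and the number of nines can be checked with the usual steady-state formula A = MTTF / (MTTF + MTTR). The chapter does not state which formula was used, so this formula is an assumption, as are the function names and the use of 2 seconds for "a couple of seconds".

```c
#include <assert.h>

/* Steady-state availability for a given mean time to failure and
 * recovery time, both in seconds (assumed formula, see above). */
static double availability(double mttf_s, double recovery_s)
{
    assert(mttf_s > 0.0 && recovery_s >= 0.0);
    return mttf_s / (mttf_s + recovery_s);
}

/* Longest recovery time that still reaches the target availability. */
static double max_recovery_s(double mttf_s, double target)
{
    assert(mttf_s > 0.0 && target > 0.0 && target < 1.0);
    return mttf_s * (1.0 - target) / target;
}
```

Under these assumptions, an 8-hour MTTF (28 800 s) with a 2 s recovery gives an availability of about 0.99993, that is, four nines, and reaching 0.99999 requires a recovery time of about 0.29 s or less.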

In order to ensure service continuity, in the third integration besides AMF we also used the Checkpoint service. This addition was handled within the Control Module that now also opened, read and updated a checkpoint with the current stream positions. When the active component failed, the redundant standby VLC-component instance would fetch the stream positions from the checkpoint and would continue the streaming from the same positions. Adding checkpointing increased the effort needed for the integration: We added approximately 390 lines of C code to the 400 lines of our Control Module. We also had to add 150 lines of C code to the VLM and RTP modules of VLC. Again we used a similar AMF configuration as in the previous cases.

With these changes the time required for the recovery remained similar to the previous case, but with this added effort we finally achieved the service continuity experience that the end user watching the stream expects.

Table 14.1 summarizes the efforts and the benefits of the different integration levels for the VLC application.

Table 14.1 Comparison of the different integration levels of VLC

[Table image not reproduced: images/c14tnt001.jpg]

In conclusion we can say that there is more than one method to integrate legacy applications with an SA Forum middleware implementation. Depending on the application, little effort may go a long way, for example, if the application does not have state information or already uses some solution, such as a database, to store it. With more effort better integration is possible, which is still far from a complete application rewrite or from implementing all the availability concepts from scratch. For applications that do support some of these concepts, most importantly the concept of the standby, additional options such as the use of proxies or containers are also possible, although we did not experiment with them. We have also seen that the SA Forum concepts allow not only for achieving HA, but also for providing service continuity.

The appropriate level of integration is really influenced by (i) the application itself, the type of service(s) it provides and whether it already implements some availability concepts, (ii) the availability level expected after the integration, and (iii) the amount of effort to be spent on implementing the integration.

In our experience the integration challenge was more on the side of understanding the application features and mapping them to the middleware concepts. But at the simplest level, for the nonproxied-non-SA-aware integration, even this was not necessary. It was straightforward and did not require any deep understanding of, or changes or additions to, the application's code. On the other hand such a solution may not offer a good user experience for stateful services, as we have seen for our media streaming application.

At the other extreme the SA-aware integration with checkpointing has offered the expected availability level and user experience, but required a deeper understanding of the application and its structure as it required some modifications to the code.

Finally, while the API integration of the application is a major part of the story, it is not the complete one. AMF requires a configuration of the application, without which it cannot perform its task, and its settings do affect the availability at runtime. It is important that this configuration is carefully defined and populated with the proper values.

 

 

1 For brevity and easier reading we will refer to the VLC-component component type as VLC-component and to components of its type as VLC-component instances.

2 The same entities are used for all three integrations we implemented, but some of their configuration attributes differ as we will discuss subsequently.
