Chapter 14
Integration of the VideoLAN Client with OpenSAF: An Example
The Availability Management Framework (AMF) manages the high availability (HA) of the services provided by an application by dynamically assigning the workload to the application's components and controlling their life-cycle. To achieve the highest level of availability, the application's components typically need to interface with the AMF [48] and possibly with other services. This design decision is usually made during the development of the application's components. However, legacy applications do not implement the interface that would allow them to interact with AMF. To improve the availability of such applications, different levels of integration are possible. They range from the nonproxied-non-SA-aware integration that leaves the application's code intact, through SA-aware integration in which the application is modified to allow more interaction with AMF, to integration with additional services of the Service Availability (SA) Forum middleware (e.g., Checkpoint).
In any case, for AMF to manage the availability of an application, it requires a configuration describing the application.
In this chapter we illustrate how these different levels of integration offered by the SA Forum middleware can be used to improve the availability of a legacy application.
More specifically, we focus on the steps and the effort required to achieve the various levels of integration and demonstrate them by integrating the VLC (VideoLAN Client) application [109] with the OpenSAF implementation of the SA Forum services [91]. We discuss the achieved availability of the application services versus the complexity associated with implementing each of these levels.
We chose VLC as our example for this exercise because with the increasing growth of internet bandwidth and the number of users, video streaming is gaining more and more interest from both users and suppliers.
HA is an important factor in the quality of service of the delivered stream. This availability is reflected in two main features: (i) the availability of the stream upon demand and (ii) the continuity of the stream during the transmission.
As a result a streaming application like VLC is ideal for examining the effectiveness of HA solutions, because it is a real-time application and the service outage is visually experienced, so the end user can easily appreciate fault tolerance.
In addition VLC is an open source video streaming application. It can be used as a streaming server, or a client receiving the video. It has a modular architecture and it is a product intended to be used by both developers and consumers, in the sense that it offers developers Application Programming Interfaces (APIs) which they can use to add VLC functionalities to their own applications. The application is also reasonably documented.
VLC's code is structured into cohesive functional modules shown in Figure 14.1. We can divide them into two major categories: stream modules (shown in the rounded rectangle) and management modules.
For instance, to broadcast a video file, the Control Module informs VLM of the location of the file, and instructs it how this file needs to be streamed (whether it is a broadcast or video-on-demand (VoD), the broadcast address, etc.). VLM in turn will request (i) the I/O module to open the file and (ii) the other streaming modules to process the file as needed (e.g., convert its format if needed). Finally, the Real-time Transport Protocol (RTP) module will stream the file and send it to the network.
Video streaming can be configured to function in one of two modes: broadcast or VoD:
VLM wraps almost all the functionality needed for a streaming service; however, it does not start by itself and it also requires some input to properly perform its task. This is provided by a Control Module.
VLC offers several Control Modules: For clients, VLC offers a graphical user interface (GUI) Control Module that makes it into a full-featured video player application. On the server side, Telnet and Hypertext Transfer Protocol (HTTP) Control Modules are offered, among others.
For our exercise, we stripped VLC of certain functionalities: We only support the broadcast aspect of VLM, and we implemented our own Control Module, which we explain in more detail in the next section.
There is no standard (or single) way of integrating legacy applications with the SA Forum middleware. The method we present is based on our experience with the specifications. We start with the nonproxied-non-SA-aware integration of VLC with OpenSAF. Then we move onto the SA-aware version. Finally we present the addition of checkpointing to this SA-aware version.
In all three cases we followed these generic steps:
The steps that are in common for all integrations may in fact differ slightly in their details as we will see subsequently.
The application integrator must make a choice of which integration technique to adopt. Normally this decision is driven by the specificities of the application itself and the implementation efforts to be invested in the integration.
In terms of implementation effort, the nonproxied-non-SA-aware integration is the least demanding. We present this approach first.
AMF manages the availability of the services of a nonproxied-non-SA-aware component by controlling its life-cycle. The assumption is that the component starts to provide its service—the single component service instance (CSI) it can provide—at the moment of its instantiation. Obviously when the component is terminated it stops providing the CSI.
All VLC modules discussed in Section 14.2 run as a single process, so without code modifications we need to equate the AMF component—at least—to such a single process. This determines the VLC-component component type.1
When the VLC process starts, it reads a configuration file, which is a list of command lines. It defines the mode in which the different media (that is, the different streams) are enabled and disabled, and all the attributes required for them. Among these attributes, each media item is associated with one or more inputs composing a playlist that the VLC process should stream. In broadcast mode each media stream is associated with a broadcast IP address. This means that a client can access the broadcast by subscribing to this broadcast IP address.
This behavior determines for our VLC-components the CST (Video-CST): it is a request to broadcast a preconfigured playlist to the configured IP address. Accordingly, different configuration files represent different CSIs.
Since the VLC process implementing the VLC-component reads its configuration at its start, there is no need for additional environment variables for assigning the CSI. It also means that different configuration files are needed if we need to run more than one component providing different CSIs on a node. At this time we limited our solution to a single component per node.
Note that even though a VLC-component could support VoD (since we did not change the code) we do not consider and do not enable this mode in the configuration.
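To make the notion of "a list of command lines" more concrete, the following is a hedged sketch of what such a VLM configuration file might look like for a single broadcast stream. The media name, file paths, address, and port are hypothetical and not taken from the configuration used in this chapter; the exact output chain options depend on the VLC version.

```text
new stream1 broadcast enabled
setup stream1 input /srv/video/clip1.mpg
setup stream1 input /srv/video/clip2.mpg
setup stream1 output #rtp{dst=239.255.12.42,port=5004,mux=ts}
control stream1 play
```

In this sketch the two input lines form the playlist, and the output line names the broadcast address to which clients subscribe. A second configuration file with a different playlist or address would then represent a different CSI.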
From the clients' perspective each stream is identified by two IP addresses: the destination broadcast IP address mentioned above and the source IP address of the streaming server. This means that when, for example, the node goes down and the VLC service is failed over to another node, the source IP address changes (i.e., to the IP address of the originally standby node). The clients would perceive this as a different stream due to the different source IP address and would not play it until the original stream times out.
In this case and for any service where a client should expect continued communications from the same system, or needs to perform a request, the IP address of the service must be preserved; and therefore in case of a failure, when the service is failed over to a redundant component deployed on a different node, the IP address must be migrated to this node in order to mask the failure. This is applicable to both the broadcast and the VoD—not covered here—services offered by VLC.
This can be done in one of several ways. Many vendors have prepackaged solutions; however, we will use the simplest one: binding the IP address, when needed, to the node with the active video service.
To make this transition automatic we decided to use the life-cycle management provided by AMF for components. Therefore we have created an additional nonproxied-non-SA-aware component type—namely the IP-component—that binds and unbinds the IP address from the node. AMF is able to perform the migration for us simply by specifying a second CST (IP-CST).
This IP-CSI is required for the proper streaming represented by the Video-CSI as it needs to be initiated at this preserved IP address. In other words we also defined a dependency between the two CSIs.
AMF requires three CLC-CLI (Component Life-Cycle—Command Line Interface) commands to be implemented for a nonproxied-non-SA-aware component: the INSTANTIATE, the TERMINATE, and the CLEANUP.
OpenSAF is implemented on the Linux operating system; therefore we implemented the CLC-CLI commands as BASH (Bourne-Again Shell) scripts.
To perform error recovery AMF also needs to detect component failures; therefore the CLC-CLI commands also include the optional AM_START and AM_STOP commands to start and stop external active monitoring.
We start our discussion of the CLC-CLIs with the issue of health monitoring.
AMF can only monitor the health of nonproxied-non-SA-aware components through passive or external active monitoring, because these types of monitoring can be implemented without modifications to the component itself.
External active monitoring involves defining some entity external to the component (referred to as the active monitor) that assesses the health of the component and that reports back to AMF when it detects a component error using the AMF API.
On the other hand, passive monitoring mostly uses operating system features to assess the health of the component; therefore in our nonproxied-non-SA-aware integration we opt for the latter.
This solution still requires invoking the API function that instructs AMF to start the passive monitoring, namely saAmfPmStart_3(), which our nonproxied-non-SA-aware VLC obviously does not call. One way of doing this is through the INSTANTIATE command, where the instantiation script not only starts VLC but also starts the passive monitoring.
The INSTANTIATE command is implemented as a shell script, which cannot invoke the passive monitoring function of AMF directly; therefore we implemented a small program in C that performs this task:
    #include <stdio.h>
    #include <stdlib.h>
    #include <saAmf.h>
    ...
    SaVersionT ver = {.releaseCode = 'B', .majorVersion = 0x01, .minorVersion = 0x01};
    SaAisErrorT rc;
    ...
    /* initialize a handle */
    rc = saAmfInitialize(&amf_hdl, &reg_callback_set, &ver);
    if (rc != SA_AIS_OK) {
        fprintf(stderr, "cannot get handle to AMF - %u\n", rc);
        return 1;
    }
    /* Call the passive monitoring function; comp_name and argv[2] are the
       component name and process ID that were passed as arguments. */
    rc = saAmfPmStart(amf_hdl, &comp_name, atoi(argv[2]), 0,
                      SA_AMF_PM_NON_ZERO_EXIT | SA_AMF_PM_ZERO_EXIT,
                      SA_AMF_NO_RECOMMENDATION);
    if (rc != SA_AIS_OK) {
        fprintf(stderr, "saAmfPmStart FAILED - %u\n", rc);
        return 2;
    }
    ...
The INSTANTIATE command will run the executable file (we refer to as exec_StartMonitoring) of this code, as illustrated in Figure 14.2.
It is important to note that when the INSTANTIATE shell script runs exec_StartMonitoring, it passes as arguments the process ID and the component name assigned to VLC, as this information is part of the parameter list of the passive monitoring API invoked later. The code snippet above shows the saAmfInitialize() and saAmfPmStart() function calls made to start the passive monitoring. These calls constitute the bulk of the C code. In the code above, if passive monitoring detects an error, we give no recovery recommendation to AMF (SA_AMF_NO_RECOMMENDATION), although we could have specified a recovery action, for example, a component restart.
Figure 14.2 illustrates the interactions performed to instantiate and start the passive monitoring on the nonproxied-non-SA-aware VLC implementation.
    cvlc --daemon --pidfile /var/run/vlc/vlc.pid
    exec_StartMonitoring
The cvlc command invokes VLC.
We use the first argument (--daemon) for two reasons:
The next pair of arguments instructs VLC to create a file containing its process ID (--pidfile) and store it under the name defined by the following argument—/var/run/vlc/vlc.pid in our case. The value stored in this file will be used by the other two CLC-CLI commands.
The exec_StartMonitoring initiates the AMF passive monitoring as presented in section ‘VLC Component Health Monitoring’.
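One detail the command sequence leaves implicit is how the process ID reaches exec_StartMonitoring. The following is a minimal sketch, assuming that the PID file may appear slightly after cvlc returns (because of --daemon) and that exec_StartMonitoring takes the component name as its first and the process ID as its second argument, as the comment in the C snippet suggests. The helper name wait_for_pid and the retry count are hypothetical; SA_AMF_COMPONENT_NAME is the environment variable that AMF is expected to set for CLC-CLI commands.

```shell
# wait_for_pid FILE [TRIES]
# Hypothetical helper: prints the PID stored in FILE as soon as the file
# holds a plain number, retrying once per second up to TRIES times.
wait_for_pid() {
    local pidfile=$1
    local tries=${2:-5}
    local pid
    while [ "$tries" -gt 0 ]; do
        if [ -s "$pidfile" ]; then
            pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')
            case $pid in
                ''|*[!0-9]*) ;;                      # not a number yet
                *) printf '%s\n' "$pid"; return 0 ;;
            esac
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then sleep 1; fi
    done
    return 1
}

# Possible use inside the INSTANTIATE script (not executed here):
#   cvlc --daemon --pidfile /var/run/vlc/vlc.pid
#   pid=$(wait_for_pid /var/run/vlc/vlc.pid) || exit 1
#   exec_StartMonitoring "$SA_AMF_COMPONENT_NAME" "$pid"
```

Returning a non-zero status when no PID shows up lets the INSTANTIATE script report the failed instantiation to AMF instead of starting the monitoring with an empty argument.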
    kill -9 "$(< /var/run/vlc/vlc.pid)"
    rm /var/run/vlc/vlc.pid
The CLEANUP command reads the process ID (pid) stored in the /var/run/vlc/vlc.pid file and passes it to the kill CLI command to terminate the process. The SIGKILL (9) signal cannot be caught or ignored by a process and therefore it is used for immediate termination.
For a complete cleanup we also need to remove the file containing the process ID using the remove command (rm).
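CLEANUP may also be invoked when there is nothing left to clean up, for instance after a failed instantiation that never wrote the PID file; in that situation both kill and rm report an error. The following is a hedged sketch of a more tolerant variant, assuming that CLEANUP should report success whenever no process and no PID file remain. The function name cleanup_pidfile is hypothetical.

```shell
# cleanup_pidfile FILE
# Hypothetical helper: force-kill the process named in FILE (if any) and
# remove FILE. Always returns 0, so a missing file or an already dead
# process does not count as a cleanup failure.
cleanup_pidfile() {
    local pidfile=$1
    local pid
    if [ -r "$pidfile" ]; then
        pid=$(tr -cd '0-9' < "$pidfile")    # keep digits only
        if [ -n "$pid" ]; then
            kill -9 "$pid" 2>/dev/null || true
        fi
        rm -f "$pidfile"
    fi
    return 0
}

# Possible use as the body of the CLEANUP script (not executed here):
#   cleanup_pidfile /var/run/vlc/vlc.pid
```

Whether tolerating a missing PID file is appropriate depends on the deployment: if another mechanism could have left a VLC process running without a PID file, a stricter check would be needed.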
    pid="$(< /var/run/vlc/vlc.pid)"
    if [[ -n "$pid" ]]; then
      kill "$pid"
      i=0
      # kill -0 sends no signal; it only tests whether the process still exists
      while kill -0 "$pid" 2>/dev/null; do
        i=$((i+1))
        if [ "$i" -gt 5 ]; then
          exit 1
        fi
        sleep 1
      done
    fi
    rm "/var/run/vlc/vlc.pid"
    exit 0
The TERMINATE command also reads the process ID (pid) stored in the /var/run/vlc/vlc.pid file and passes it to the kill CLI command to terminate the process. This kill command sends the default signal (SIGTERM), which is interpreted by the application as a request to terminate itself.
After issuing the termination command we check whether a process with this pid still exists. If that is the case, we wait for 1 second and repeat the loop. Otherwise we remove (rm) the file storing the process ID and return success (exit 0).
When the loop is repeated five times, that is, 5 seconds have passed by and the process is still alive, then we exit with an error (exit 1), which notifies AMF that the termination was unsuccessful.
For the nonproxied-non-SA-aware IP-component we also need to implement the same three life-cycle commands. For this component type the TERMINATE and CLEANUP implementations are identical, because we are simply unbinding an IP address; there is no process to kill or resources to de-allocate.
    ip addr add $ip dev $dev
    arping -U -c 1 -I $dev $ip
The INSTANTIATE command script consists of two commands: The first adds the IP address (held in $ip) to the selected device (given by $dev).
The second command performs an ARP (Address Resolution Protocol) takeover of the address to indicate to any attached network switch that the address now belongs to this server. This is done by indicating to the arping command to send an unsolicited ARP reply to the networking equipment.
    ip addr del $ip/32 dev $dev
This command removes the IP address (given by $ip) from the device (given by $dev).
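Because TERMINATE and CLEANUP may run when the address is no longer bound, and INSTANTIATE may run when it still is, it can help to make both scripts idempotent. The following is a sketch under stated assumptions: it relies on the one-line output format of iproute2 (ip -o -4 addr show), it uses the /32 prefix from the removal command above, and the function names has_addr, bind_ip, and unbind_ip are hypothetical.

```shell
# has_addr IP
# Succeeds if stdin (the output of `ip -o -4 addr show dev DEV`) lists IP.
# -F makes grep match the dots literally instead of as regex wildcards.
has_addr() {
    grep -qF " inet $1/"
}

# bind_ip IP DEV: add the address only if it is missing, then announce it.
bind_ip() {
    if ! ip -o -4 addr show dev "$2" | has_addr "$1"; then
        ip addr add "$1/32" dev "$2" || return 1
    fi
    arping -U -c 1 -I "$2" "$1"
}

# unbind_ip IP DEV: remove the address if present; succeed if already gone.
unbind_ip() {
    if ip -o -4 addr show dev "$2" | has_addr "$1"; then
        ip addr del "$1/32" dev "$2"
    fi
}
```

With these helpers the INSTANTIATE script would reduce to a call of bind_ip "$ip" "$dev", and both TERMINATE and CLEANUP to unbind_ip "$ip" "$dev", so that a repeated invocation does not turn into a reported life-cycle failure.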
For AMF to manage any application, it requires the configuration of this application. Using this configuration AMF selects the components to instantiate and the workload to assign to them.
Figure 14.3 illustrates the AMF entities of our VLC configuration:2 we have two redundant service units (SUs) that form a service group (SG); each SU includes two components—one instance of IP-component and one instance of VLC-component. As discussed in Section 14.3.1.1 a CSI is defined for each component: the IP-CSI of type IP-CST and the Video-CSI of type Video-CST.
The AMF configuration is specified in an XML (eXtensible Markup Language) file compliant with the IMM (Information Model Management) XML schema [110], which is loaded by IMM at cluster start.
There are certain attributes in the configuration that require more considerations:
As mentioned in Section 14.3.1.1, the IP-CSI must be assigned first (before streaming any video); this we capture through the CSI dependency attribute in the CSI class, where the Video-CSI is configured to depend on the IP-CSI.
For the SG of our nonproxied-non-SA-aware VLC we select the no-redundancy redundancy model, which is captured in the redundancy model attribute of the SG type.
There are several timer attributes that are typically set based on experimenting with the application. Among these attributes are the timeouts for the life-cycle commands.
Another attribute that is configured based on the relation of the measured timings is the recovery on error attribute of the components (e.g., whether the recommended recovery is a component restart or a failover). Typically the choice is based on which recovery takes less time to complete. For the nonproxied-non-SA-aware integration of VLC the time needed to recover the service is the same for both recoveries. However, we favor the failover, because it starts the service on a different node and therefore platform-related faults that cannot be recovered by restarting the component are also covered by this recovery.
We set the appropriate BASH script for each of the component types as the CLC-CLI commands, while in the class associating the software bundles with nodes we indicate the location of the executables as required by the deployment environment.
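As an illustration of the CSI dependency discussed above, the following hedged sketch shows how the two CSI objects might be expressed in the IMM XML file. The distinguished names (application, service instance, and CSI names) and the version number are hypothetical; only the class and attribute names are meant to follow the AMF information model, and the fragment should be checked against the IMM XML schema before use.

```xml
<object class="SaAmfCSI">
  <dn>safCsi=IP-CSI,safSi=VLC-SI,safApp=VLC</dn>
  <attr>
    <name>saAmfCSType</name>
    <value>safVersion=1,safCSType=IP-CST</value>
  </attr>
</object>
<object class="SaAmfCSI">
  <dn>safCsi=Video-CSI,safSi=VLC-SI,safApp=VLC</dn>
  <attr>
    <name>saAmfCSType</name>
    <value>safVersion=1,safCSType=Video-CST</value>
  </attr>
  <attr>
    <name>saAmfCSIDependencies</name>
    <value>safCsi=IP-CSI,safSi=VLC-SI,safApp=VLC</value>
  </attr>
</object>
```

The dependency is recorded on the dependent object: the Video-CSI names the IP-CSI, so the address is bound before the broadcast is started.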
To test our application, we used a test environment available in OpenSAF's development sources, under the tools/cluster_sim_uml directory. It is a preconfigured User-mode Linux (UmL) cluster that allows one to easily start and stop an entire cluster.
UmL is a lightweight virtualization solution in that it simply uses the current real Linux kernel to provide actual functionality. However, applications running within it see a complete Linux environment, with its own independent root user. Furthermore, to simply run a UmL instance, root access is not necessary.
To integrate with this test system, we used the XML file containing our AMF configuration for the VLC application and merged it with the existing imm.xml describing other parts of the system. We included this file in building the test UmL software (consisting of the Linux kernel and the system image to be loaded at the virtual cluster startup). We then installed our VLC software into the common mapped folder, which is in the root_fs directory. The merged IMM configuration that we made indicated that the executables resided in that directory. That means that if we want to test a new version, we simply need to replace the software and restart the test cluster.
Since our application is network oriented, it is useful for the host system to be able to talk to the test cluster through a standard networking interface. This is needed so that the broadcast video can be seen by the host system and then forwarded to the system's external network interface if needed (i.e., if the end user receiving the stream is on a different machine). To do this we created a tunnel using the following steps:
The end user has to start the VLC client application and request it to play the current broadcast stream by specifying the broadcast IP address and port on which the stream is being broadcast, that is, the ones defined in the configuration file of the VLC-component instances running on the server side.
The main difference between the SA-aware and the nonproxied-non-SA-aware integrations is the addition of the AMF API implementation to the VLC application so that it can interact with the AMF implementation of the OpenSAF middleware. This means that we need to change the application code in order to allow dynamic workload assignments and other required features.
This adds more complexity to the integration and requires a deeper understanding of the application's workflow, compared to the superficial knowledge needed for the nonproxied-non-SA-aware integration.
VLC is a highly threaded application: it consists of various modules, each of which runs in its own thread. These threads are tightly coupled; therefore it would be a tedious task to separate each thread into its own process with its own independent life-cycle.
An alternative is to consider the different threads as contained components; however, their fault isolation is still a problem and they would fail together anyway.
Consequently, and instead of making substantial modifications to the application, we decided again to represent the VLC process as a single component. We call this component type again the VLC-component.
In contrast to nonproxied-non-SA-aware components that start to provide their CSIs the moment they are instantiated, SA-aware components are assigned the CSIs any time after their instantiation. Therefore we also implement the Video-CST of our SA-aware VLC-component differently.
The workload represented by a VLM-configuration file, which contains information such as the media to broadcast and its properties is no longer associated with the instantiate command. Instead it is passed as an attribute in the assignment of a CSI, when this CSI is assigned to a VLC-component. We again refer to our modified CST as Video-CST.
For the same reason as presented in Section 14.3.1.1, we need the IP-component and the related CST. They can be reused as-is.
As the implementation of the IP-component remains the same its CLC-CLI commands remain as well.
For the VLC-component the INSTANTIATE and CLEANUP CLC-CLI command implementations remain the same as defined in Section 14.3.1.2 for the nonproxied-non-SA-aware implementation.
We do not need the TERMINATE CLC-CLI command any more as the termination is implemented as a callback function for SA-aware components. We will discuss the implementation of this callback in the next section.
Regarding health monitoring, we can use the same passive monitoring as discussed for the nonproxied-non-SA-aware solution, which is started by the INSTANTIATE command. However, since the application code is now linked with the AMF library implementation, AMF can use its own tools and the passive monitoring is not essential.
In either case the health monitoring can be enhanced with the use of health-checks; however, we will not cover that in this chapter.
In its original form VLC is capable of being idle—when no VLM-configuration file is provided yet. Once the configuration file has been loaded VLC starts to broadcast the requested media, that is, it assumes the HA active state for the CSI described in the VLM-configuration file that has been loaded.
For our SA-aware integration we would like VLC also to be able to assume the HA standby state: In this first approach this would mean that when a CSI is assigned to a VLC-component, it loads the configuration file, but does not start the broadcast.
As discussed in Section 14.2, in VLC it is the Control Module that instructs VLM and the other modules what to do in terms of setting up and controlling streams by loading a configuration, then starting and stopping them. Considering that the primary goal of the AMF API is to control the workload of the components, this seems to be a suitable integration point with AMF. Therefore our intention is to create a new Control Module with the interface providing the interaction with AMF, and thus implement the SA-awareness in the VLC-component type.
Internally, any VLC module provides three functions: Open, Close, and Run (or equivalents) that implement the VLC module life-cycle API. These functions respectively are responsible for initializing, terminating, and performing the assigned tasks of the module.
This means that it is within these functions that we can map the AMF life-cycle instructions (CLC CLI commands and callbacks) and forward them to other VLC modules.
When an SA-aware VLC-component is instantiated by AMF (such as the AMF implementation of OpenSAF), our Control Module first invokes the Open function on VLM (in VLM it is called vlm_New), then it initializes a handle and registers this instance of the VLC-component with AMF.
Thereafter a selection object is obtained by calling the saAmfSelectionObjectGet() function. This allows the SA-aware VLC-component to discover AMF callbacks and dispatch them without continual polling. Figure 14.4 illustrates these interactions.
For each of the above calls, we must of course verify that the invocation was successful. When it is not the case, we close any resources we have successfully opened and we tell VLC to exit.
The Control Module registers with AMF the following three callbacks: saAmfCSISetCallback(), saAmfCSIRemoveCallback(), and saAmfComponentTerminateCallback(). AMF can use the first two for managing the component's workload:
By calling saAmfDispatch (Figure 14.4) we allow AMF to perform callbacks to the API implemented in the Control Module.
The continuous operation tasks of the VLC-component are performed within the Run function of the Control Module: It handles the AMF workload related requests as they arrive. We look at these details next.
As we mentioned, when the saAmfDispatch() function is invoked, three functions may be called. Here we elaborate further on the two callbacks AMF uses to manage the CSI assignments.
The first one is saAmfCSISetCallback(), which is the most complicated. This function is responsible for assigning an HA state on behalf of some CSI. The HA states are the already mentioned active and standby, as well as the quiescing and quiesced states. The function has three arguments of note: the component name, the desired state and a CSI descriptor. In our case, in each component instance the component name is constant since each component registers only itself. So we are only interested in the desired state and the CSI descriptor.
This descriptor is a structure that has four properties. The first is csiFlags, which indicates whether this invocation applies to one CSI or to all currently assigned CSIs. In our implementation a component takes only a single CSI assignment, so the csiName property indicates the name of the CSI the callback is invoked for. The csiStateDescriptor property holds additional information about the state transition; since it does not affect the behavior of our solution, we ignore it. Finally, the csiAttr property may contain a list of the CSI attributes for the CSI if an active or standby assignment is being given. In our case it contains, for the active and standby HA state assignments, the name of the VLM configuration file describing the workload represented by the CSI for which the HA state is being assigned.
The most important question is what to do when assigned one of those four HA states for a CSI:
The second function is saAmfCSIRemoveCallback(). This is called when AMF wishes to remove an assignment. That is, when this callback is invoked on a VLC-component instance, we unload the configuration file by cleaning up any objects and releasing any resources created and allocated in association with the configuration file. The Control Module closes all other modules that are not needed for the idle state.
The entities of the configuration and their arrangement of the SA-aware VLC application remain the same as the one presented in Figure 14.3. However, we need to adjust certain attribute values and add some objects:
The deployment and testing remain the same as described in Section 14.3.2.5. As noted there, for deployment we need to install the new version of the VLC application into the root_fs common mapped folder.
Service continuity is a feature that any highly available streaming application must have, as it would be frustrating for the end user to have to watch the same video from the beginning if a failure occurs on the streaming server side. Therefore the purpose of this third integration variant is to provide service continuity by using additional middleware services.
When the application provides a stateful service, ensuring service continuity requires that the state information be communicated to the standby component, so that in case of a failure it can resume the work from the state where the failed, previously active component left off. There are several ways of doing this synchronization with the different AIS utility services as discussed in Chapter 7. There are two factors to consider:
The first one is the data required to describe the state so that another component can take over the service provisioning. The important point here is to make sure that with the state information we do not communicate the fault that caused the failure of the active component. In case of our streaming application the state of the streaming is described by the contents of the VLM configuration file (e.g., playlists for the different streams, their associated broadcast address, etc.) and the position of the broadcast for each configured media stream. This data needs to be externalized from the application, and potentially duplicated on other nodes. Note that only the position information changes over time, but the amount of data required to describe it remains constant.
Secondly, we need to consider the frequency at which the state updates need to be propagated to the standby. This is usually a tradeoff among several factors: the amount of data required to transfer, the conceptual effort to segment the work and produce this data, the resources needed to recover the proper state from the communicated data and most importantly the experience of the user and the guarantees promised to the user.
We need to compare these needs with the functionality of the different services. The Event or Messaging services can be used to propagate the stream position; however the initial configuration would need to be repeated in case of a standby failure. On the other hand all the information can be stored easily in a checkpoint, which can be read by any standby or if there is no standby even the restarted active can use it for picking up its previous state.
We decided to use the Checkpoint Service as it meets all our needs. This service allows us to atomically update the checkpoint, guaranteeing that either all or none of the changes were performed. Furthermore, the service is responsible for all of the logic with regard to duplicating the data on other node(s). Additionally, by using a collocated checkpoint we can improve the performance of the checkpoint update operations.
Accordingly there is a need for some changes in our SA-aware VLC-component. Namely, we need to add the Checkpoint API implementations and we need to incorporate them in the handling of the CSI assignment callbacks. In our discussion we will focus on the traits where the difference exists.
In particular we will not discuss the component type and CST definitions as they remain the same as for the basic SA-aware solution as well as the CLC-CLI commands.
We also need to migrate the source IP address to the new standby component the same way as we presented for the nonproxied-non-SA-aware component in section ‘IP-Component CLC-CLI’. Therefore our IP-component and related IP-CST need to remain part of our application and they require no change.
The objective is to store the current stream position in a checkpoint so that when another VLC-component instance needs to continue the broadcast currently provided by the active instance, it can do so from the stored position; that is, we obtain service continuity.
As a result the Run function of the Control Module will have more tasks to perform: when a component is active it will also periodically update the video position within the checkpoint, while at take-over this checkpoint needs to be read to obtain the position.
To do this in addition to the initialization of the AMF library described in section ‘Component Life-Cycle API’ and shown in Figure 14.4, the component also needs to initialize the CKPT by calling saCkptInitialize() and obtain a handle and in turn a selection object for the CKPT as well.
To obtain service continuity the workload assignment callback (i.e., saAmfCSISetCallback) is implemented differently for the different HA states to incorporate checkpointing:
If the component was not the standby (i.e., it is the active component that is being restarted) it needs to read the configuration information and the position information.
If the checkpoint is empty, then it was created with the open call and this is an initial assignment; therefore the component needs to read the configuration information from the VLM configuration file, store the information in the checkpoint, and start broadcasting all streams from the beginning.
In any case the component also needs to update the checkpoint periodically with the current streaming position. To allow this we need to modify the way the component detects callbacks. Rather than waiting indefinitely, it should now wait for a limited amount of time. If an event arrives before the timeout, it is dispatched by calling the saAmfDispatch() function, and this time it should dispatch one callback at a time so that it does not miss the checkpointing time. After the dispatch returns, the component checks whether it is time to perform a checkpoint and, if that is the case, it does so. Then it resumes the timed wait on the selection object.
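The timed wait described above can be sketched in plain C as follows. This is only an illustration under stated assumptions, not the code of the Control Module: an ordinary file descriptor (for example the read end of a pipe) stands in for the AMF selection object, the hypothetical dispatch_one() stands in for a call to saAmfDispatch() with the SA_DISPATCH_ONE flag, and the hypothetical update_checkpoint() stands in for overwriting the position section. The run_ms bound exists only so that the sketch terminates.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical loop state; none of these names come from OpenSAF. */
typedef struct {
    int sel_fd;       /* stands in for the AMF selection object        */
    int period_ms;    /* checkpointing period                          */
    int dispatched;   /* callbacks dispatched so far                   */
    int checkpoints;  /* checkpoint updates performed so far           */
} control_loop_t;

static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Stand-in for dispatching exactly one pending callback. */
static void dispatch_one(control_loop_t *c)
{
    char byte;
    if (read(c->sel_fd, &byte, 1) == 1)
        c->dispatched++;
}

/* Stand-in for writing the current stream positions to the checkpoint. */
static void update_checkpoint(control_loop_t *c)
{
    c->checkpoints++;
}

/* Runs the loop for about run_ms milliseconds; returns 0, or -1 on a poll error. */
static int run_control_loop(control_loop_t *c, int run_ms)
{
    long long start = now_ms();
    long long next_ckpt = start + c->period_ms;
    long long end = start + run_ms;

    for (;;) {
        long long now = now_ms();
        if (now >= end)
            break;

        /* Wait only until the next checkpoint is due (or the run ends). */
        long long deadline = next_ckpt < end ? next_ckpt : end;
        int timeout = deadline > now ? (int)(deadline - now) : 0;

        struct pollfd pfd = { .fd = c->sel_fd, .events = POLLIN };
        int rc = poll(&pfd, 1, timeout);
        if (rc < 0 && errno != EINTR)
            return -1;

        /* One callback per iteration, so a burst cannot starve checkpointing. */
        if (rc > 0 && (pfd.revents & POLLIN))
            dispatch_one(c);

        /* After the dispatch (or the timeout) check whether a checkpoint is due. */
        if (now_ms() >= next_ckpt) {
            update_checkpoint(c);
            next_ckpt += c->period_ms;
        }
    }
    return 0;
}
```

Computing the poll timeout from the next checkpoint deadline, rather than using a fixed timeout, keeps the checkpoint period roughly constant even when callbacks arrive in bursts.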
We also need to modify the processing of the saAmfCSIRemoveCallback for the case when the service is abruptly stopped, that is if the CSI is being removed from the component when it has the active assignment. In this case the component unlinks the checkpoint as part of the removal procedure, so that the checkpoint can be removed completely from the system as soon as all components close it.
The components assigned the active and the standby HA states need to open the checkpoint for the CSI of concern.
For this they use the saCkptCheckpointOpen() call. It takes as arguments:
Opening the checkpoint ensures that it is being replicated on the local node of the component opening it.
Following that, if the component is assigned the active HA state it also needs to tell the Checkpoint service to make the local checkpoint replica the active one by calling saCkptActiveReplicaSet().
If the checkpoint was just created (it is empty) then the sections need to be created using the saCkptSectionCreate() function. It takes as arguments the checkpoint handle, the creation attributes, which include the section name and the section expiration, a pointer to initial data to put in the checkpoint, as well as the size of this data.
We use two sections. The first hosts a copy of the content of the VLM configuration file. This guarantees that the active configuration stays consistent, even if the version on disk has been modified. The second section maintains the current position of the media streams, so that we can resume at the same position whenever needed.
Next to write the initial data to the checkpoint, saCkptCheckpointWrite() is used. It takes as arguments:
For subsequent updates the component actively broadcasting only updates the second section with the current positions. For this it overwrites the section using the saCkptSectionOverwrite() function, which requires the checkpoint handle, the section ID, the pointer to the data, and the data size.
To read from a checkpoint, the saCkptCheckpointRead() function is used. It is analogous to saCkptCheckpointWrite() and takes the same arguments.
To close and unlink a checkpoint, saCkptCheckpointClose() and saCkptCheckpointUnlink() need to be called, respectively. The close call takes the checkpoint handle as its argument, while the unlink call identifies the checkpoint by its name and takes the Checkpoint Service library handle together with the checkpoint name.
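The behaviour described in this section (two sections, the test for an empty checkpoint at assignment, the periodic overwrite of the position section, and deletion only after unlink and the last close) can be modelled with a small in-memory stand-in. The sketch below deliberately does not use the SA Forum API: model_ckpt_t and all function names are hypothetical, the section sizes are arbitrary assumptions, and replication between nodes is not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define CONFIG_MAX  256   /* assumed size limit for the configuration text */
#define MAX_STREAMS 4     /* assumed maximum number of streams             */

/* In-memory model of one checkpoint; not the Checkpoint Service itself. */
typedef struct {
    bool exists;                    /* created by the first open            */
    bool has_sections;              /* false while the checkpoint is empty  */
    char config[CONFIG_MAX];        /* section 1: copy of VLM configuration */
    long position_ms[MAX_STREAMS];  /* section 2: current stream positions  */
    int  open_count;                /* components that have it open         */
    bool unlinked;                  /* deleted once the last user closes    */
} model_ckpt_t;

/* Open: create the checkpoint if needed, then count the opener. */
static void model_open(model_ckpt_t *c)
{
    if (!c->exists) {
        memset(c, 0, sizeof *c);
        c->exists = true;
    }
    c->open_count++;
}

static void destroy_if_unused(model_ckpt_t *c)
{
    if (c->unlinked && c->open_count == 0)
        memset(c, 0, sizeof *c);    /* exists becomes false */
}

static void model_close(model_ckpt_t *c)
{
    assert(c->open_count > 0);
    c->open_count--;
    destroy_if_unused(c);
}

static void model_unlink(model_ckpt_t *c)
{
    c->unlinked = true;
    destroy_if_unused(c);
}

/* Active assignment. Returns true if this was an initial assignment.
 * config_out must provide room for CONFIG_MAX characters. */
static bool take_active(model_ckpt_t *c, const char *config_file_text,
                        char *config_out, long pos_out[MAX_STREAMS])
{
    bool initial = !c->has_sections;
    if (initial) {
        /* Empty checkpoint: seed both sections, start from the beginning. */
        strncpy(c->config, config_file_text, CONFIG_MAX - 1);
        c->config[CONFIG_MAX - 1] = '\0';
        memset(c->position_ms, 0, sizeof c->position_ms);
        c->has_sections = true;
    }
    /* In both cases the component works from the checkpointed copy. */
    strcpy(config_out, c->config);
    memcpy(pos_out, c->position_ms, sizeof c->position_ms);
    return initial;
}

/* Periodic update: only the position section is overwritten. */
static void save_positions(model_ckpt_t *c, const long pos[MAX_STREAMS])
{
    memcpy(c->position_ms, pos, sizeof c->position_ms);
}
```

In this model a component that takes over after a failure receives the configuration and positions that the previous active instance stored, even if the configuration file on disk has changed in the meantime, and the data disappear only once the checkpoint has been unlinked and every component has closed it.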
In this chapter we have presented three different levels of integration of the VLC application with the OpenSAF middleware:
In the first one we left the application code intact and had AMF manage the application as a nonproxied-non-SA-aware component. This is the minimum level of integration, where the interaction between AMF and the application is limited to life-cycle management, and therefore it requires the least effort to implement. The integration consisted of implementing the three life-cycle commands and the passive monitoring code (approximately 70 lines of C code), which instructed AMF to start monitoring the health of the process implementing the component, and of providing AMF with the configuration based on which it could perform the management.
With this level of integration AMF was already capable of detecting the failure of the VLC process and restarting it on the other node within a couple of seconds, which would result in an availability of approximately 4 × 9's (assuming a failure roughly every 8 hours). The main shortcoming of this integration was that it did not offer the service continuity needed for streaming: In case of a failure the streams restarted from the beginning.
In the second version we rendered VLC an SA-aware application. This was done by creating our own Control Module for VLC, which implemented the APIs required for the interaction with AMF. This new Control Module is roughly 400 lines of C code, and it also required about 30 lines for the build system. In this case only two CLC-CLI commands were implemented. The AMF configuration was similar to that of the first version, so it required a similar effort.
In this SA-aware VLC implementation the standby was better prepared as it did not need to read the configuration file, so the failover became faster, below half a second on average, which would allow for 5 × 9's of availability (again with the same failure rate). But this solution would still not provide the service continuity expected from a streaming application as the streams would still be restarted. However the introduction of the standby was a necessary step toward the next level of integration.
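As a rough check of the availability figures quoted for the first two integrations, one can apply the usual steady-state availability formula. This is a sketch under our own simplifying assumptions: the failure rate of one failure per 8 hours is taken as the mean time to failure (MTTF = 28 800 s), and the recovery time is taken as the only source of downtime (MTTR).

```latex
A = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}

% Restart within about 2 s (nonproxied-non-SA-aware integration):
A = \frac{28\,800}{28\,800 + 2} \approx 0.99993 \quad \text{(four nines)}

% Longest recovery time that still yields five nines (A = 0.99999):
\mathrm{MTTR}_{\max} = \mathrm{MTTF}\,\frac{1 - A}{A}
                     = 28\,800 \times \frac{10^{-5}}{0.99999} \approx 0.29\ \text{s}
```

Under these assumptions, five nines requires the average failover to take about 0.29 s or less, that is, to stay well below half a second.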
In order to ensure service continuity, in the third integration we also used the Checkpoint service besides AMF. This addition was handled within the Control Module, which now also opened, read and updated a checkpoint with the current stream positions. When the active component failed, the redundant standby VLC-component instance would fetch the stream positions from the checkpoint and would continue the streaming from the same positions. Adding checkpointing increased the effort needed for the integration: We added approximately 390 lines of C code to the 400 lines of our Control Module. We also had to add 150 lines of C code to the VLM and RTP modules of VLC. Again we used an AMF configuration similar to that of the previous cases.
With these changes the time required for the recovery remained similar to that of the previous case, but with this added effort we finally achieved the service continuity that the end user watching the stream expects.
Table 14.1 summarizes the efforts and the benefits of the different integration levels for the VLC application.
In conclusion we can say that there is more than one method to integrate legacy applications with an SA Forum middleware implementation. Depending on the application, little effort may go a long way, for example, if the application does not have state information or already uses some solution, such as a database, to store it. With more effort better integration is possible, which is still far from a complete application rewrite or from implementing all the availability concepts from scratch. For applications that do support some of these concepts, most importantly the concept of the standby, additional options such as the use of proxies or containers are also possible, although we did not experiment with them. We have also seen that the SA Forum concepts allow not only for achieving HA, but also for providing service continuity.
The appropriate level of integration is influenced by (i) the application itself, the type of service(s) it provides and whether it already implements some availability concepts, (ii) the availability level expected after the integration, and (iii) the amount of effort to be spent on implementing the integration.
In our experience the integration challenge was more on the side of understanding the application features and mapping them to the middleware concepts. But at the simplest level, for the nonproxied-non-SA-aware integration, even this was not necessary. It was straightforward and did not require any deep understanding of, or changes or additions to, the application's code. On the other hand such a solution may not offer a good user experience for stateful services, as we have seen for our media streaming application.
At the other extreme the SA-aware integration with checkpointing has offered the expected availability level and user experience, but required a deeper understanding of the application and its structure as it required some modifications to the code.
Finally, while the API integration of the application is a major part of the story, it is not the complete one. AMF requires a configuration of the application, without which it cannot perform its task, and the settings of this configuration do affect the availability at runtime. It is important that this configuration is carefully defined and populated with the proper values.
1 For brevity and readability we will refer to the VLC-component component type as VLC-component and to components of its type as VLC-component instances.
2 The same entities are used for all three integrations we implemented, but some of their configuration attributes differ as we will discuss subsequently.