CHAPTER 42
Network Adapter Failover

The script in this chapter provides network redundancy. It monitors the local machine's network connectivity and, when it detects a problem with the primary network interface, moves the network configuration to a backup interface. We assume that the machine running the script has two network interface cards (NICs) installed, that both interfaces are cabled to the network, and that both are configured the same way (subnet/VLAN, speed, duplex, and so on). For redundancy, each interface should be physically connected to a different network switch.

The goal is that if the primary network hardware fails for any reason, the system will recognize the lack of connectivity and switch the network settings to a backup interface. This script probably wouldn't be very useful in a small environment, as redundant network hardware can get expensive. However, it is a good tool for use in an environment where high availability and redundancy are key.

This script performs very well. In testing, I logged into the system over the network, ran a few commands to confirm the connection, and then unplugged the primary interface's cable. The failover completed in less than 10 seconds, and my command-line session carried on as if nothing had happened.

How long a failover takes depends on when the failure occurs relative to the script's cycle. The script sleeps for 10 seconds, checks network availability, and then repeats. If the failure happens just after a successful check, a full sleep interval plus the time spent on the failing pings passes before the swap, so the maximum is about 15 seconds. If the failure happens just before a check, the failover probably completes in less than 5 seconds. Most systems can tolerate an interruption of this length without much impact.

As in many scripts in this book, the configuration variables are set in the script itself. It would probably be cleaner to keep the configuration in a separate file and source it from the script; you could then change the values without touching the code.
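A minimal sketch of the separate-file approach follows. The file name /etc/nic_switch.conf and the helper name load_config are assumptions for illustration; the file itself would hold plain assignments such as PRIMARY=eth0. The `.` (dot) command reads the file into the running shell, so its assignments become ordinary script variables, and the `${VAR:-default}` expansions fill in the chapter's values for anything the file omits.

```shell
#!/bin/sh
# load_config FILE: hypothetical helper. Reads FILE into the running
# shell (if it is readable), then applies this chapter's defaults to
# any variable the file left unset or empty.
load_config() {
  if [ -r "$1" ]
  then
    . "$1"            # use an absolute path; "." searches PATH otherwise
  fi
  PRIMARY=${PRIMARY:-eth0}
  SECONDARY=${SECONDARY:-eth1}
  PING_COUNT=${PING_COUNT:-2}
  SLEEPTIME=${SLEEPTIME:-10}
  MAILLIST=${MAILLIST:-sysadmins}
}

# /etc/nic_switch.conf is an assumed location, not a standard file.
load_config /etc/nic_switch.conf
```

Because sourcing executes the file as shell code, it should be writable only by root.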

#!/bin/sh
LOG=/var/log/messages
PRIMARY=eth0
SECONDARY=eth1
ME=`uname -n`

This first group of configuration variables sets up the log file where log entries for any potential network failures will be entered. The primary and secondary interface names are also defined. These names will change depending on your hardware and operating system. For instance, network interfaces on most Linux machines have names like eth0 or eth1. Other UNIX variants might use names such as iprb0 or en1. We also determine the system name so that failover messages can indicate the machine that had the problem.

The following code sets the networking information. These are the settings that will be switched when a failure occurs:

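# First address in /etc/hosts whose name fields include exactly $ME
IP=`grep -v '^#' /etc/hosts |
  awk -v me="$ME" '{for (i = 2; i <= NF; i++) if ($i == me) {print $1; exit}}'`
NETMASK=255.255.255.0
BROADCAST="`echo $IP | cut -d. -f1-3`.255"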

The networking information is specific to your installation, so you will need to determine the IP address appropriately. It may come from the local hosts file (as shown here) or from NIS or DNS, or it may simply be set by hand. The subnet mask and broadcast address are likewise system-specific; note that the BROADCAST calculation above is correct only for a 255.255.255.0 mask.
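For other masks, the broadcast address can be computed instead of assumed. A sketch, assuming dotted-quad IPv4 input without leading zeros: each broadcast octet is the IP octet ORed with the inverted mask octet (255 minus the mask octet), which sets every host bit. The function name broadcast_addr is hypothetical.

```shell
#!/bin/sh
# broadcast_addr IP NETMASK: print the IPv4 broadcast address, i.e. the
# IP with every host bit (every bit that is 0 in the mask) set to 1.
# Hypothetical helper; assumes dotted-quad input without leading zeros.
# The ( ) body runs in a subshell, so the IFS change does not leak out.
broadcast_addr() (
  IFS=.
  set -- $1 $2        # $1-$4: IP octets, $5-$8: mask octets
  printf '%d.%d.%d.%d\n' \
    "$(( $1 | (255 - $5) ))" "$(( $2 | (255 - $6) ))" \
    "$(( $3 | (255 - $7) ))" "$(( $4 | (255 - $8) ))"
)

# In the script, this could replace the cut-based line:
#   BROADCAST=`broadcast_addr "$IP" "$NETMASK"`
```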

The next set of configuration variables determines the way the script monitors for network availability.

PINGLIST="Replace with a space-separated list of IP addresses"
PING_COUNT=2
SLEEPTIME=10
MAILLIST=sysadmins

The PINGLIST variable holds the IP addresses of hosts that are reached through, and lie beyond, the switches to which the redundant interfaces are attached. Every PINGLIST address should belong to a system that is always running, such as a core network router. The variable can hold any number of addresses; a single address doesn't provide enough redundancy, whereas two or three do. I used three router addresses outside our local subnet.

The PING_COUNT and SLEEPTIME variables describe the number of pings to use for each of the addresses in the PINGLIST and the amount of time to sleep between network checks. The MAILLIST variable is a comma-delimited list of mail addresses that will be sent a notification when any failover takes place.

The ping switch that sets the number of packets to send differs between operating systems. The following check determines which OS the script is running on and sets a variable containing the appropriate switch.

case "`uname`" in
  HP-UX) ping_switch="-n" ;;   # HP-UX: ping host -n count
  *)     ping_switch="-c" ;;   # Linux (and the default): ping host -c count
esac
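The script places the host name before the count switch (ping $node $ping_switch $PING_COUNT). That order suits HP-UX and is accepted by Linux ping, but BSD-derived ping implementations stop reading options at the host name, and Solaris ping uses a different form (ping -s host data_size count). A sketch of a wrapper that builds the command line per OS follows; the function name ping_host and the OS override variable are hypothetical, and the HP-UX and Solaris forms should be checked against your local man pages.

```shell
#!/bin/sh
# ping_host HOST COUNT: send COUNT echo requests to HOST and return
# ping's exit status. Hypothetical helper; setting OS overrides the
# uname result (useful for testing the command line it builds).
ping_host() {
  case "${OS:-$(uname)}" in
    HP-UX) ping "$1" -n "$2" ;;       # HP-UX: host first, then -n count
    SunOS) ping -s "$1" 56 "$2" ;;    # Solaris: -s host data_size count
    *)     ping -c "$2" "$1" ;;       # Linux, BSD: options before host
  esac
}
```

In the main loop, the ping test would then become `if ping_host "$node" "$PING_COUNT" > /dev/null 2>&1`, and the ping_switch variable would no longer be needed.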

Now we have to determine the currently active network interfaces.

NICS=`netstat -i | awk '{print $1}' |
  egrep -v "Kernel|Iface|Name|^lo" | sort -u`
# Count the same filtered list so the two values always agree
NIC_COUNT=`echo "$NICS" | wc -l`

The script needs to know which interface is primary before it enters the main loop so that it switches interfaces in the correct direction. You may need to adjust these commands for your operating system, and you may want to filter out additional entries with egrep. For instance, my FreeBSD box has a point-to-point interface that I wouldn't want involved, so I'd exclude it here.

Now we have the list of currently active interfaces on the system. If there is only one interface, we of course assume it to be the primary interface. If there are more interfaces, we loop through all the active ones to find the interface with the specified primary IP address and make it the current interface.

if [ $NIC_COUNT -gt 1 ]
then
  for nic in $NICS
  do
    # -w keeps 10.0.0.1 from matching 10.0.0.10
    if ifconfig $nic | grep -w "$IP" > /dev/null
    then
      CURRENT_NIC=$nic
    fi
  done
else
  CURRENT_NIC=$NICS
fi
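On Linux systems with the iproute2 tools, `ip -o -4 addr show` prints one line per IPv4 address, with the interface name in the second field and the address/prefix in the fourth, which avoids parsing ifconfig output altogether. A sketch, assuming that output format; nic_for_ip is a hypothetical name, and it reads the listing on standard input so it can be tried against sample text. Comparing whole fields means 192.168.1.10 can never match 192.168.1.100.

```shell
#!/bin/sh
# nic_for_ip IP: read `ip -o -4 addr show` output on stdin and print the
# interface that holds exactly IP (first match only). Hypothetical helper.
nic_for_ip() {
  awk -v ip="$1" '{
    split($4, addr, "/")          # "192.168.1.10/24" -> addr[1] = address
    if (addr[1] == ip) { print $2; exit }
  }'
}

# Typical use (Linux with iproute2):
#   CURRENT_NIC=`ip -o -4 addr show | nic_for_ip "$IP"`
```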

If the interface that currently holds the IP address is the one configured as SECONDARY, the script swaps the PRIMARY and SECONDARY values so that it won't switch interfaces in the wrong direction.

if [ "$CURRENT_NIC" = "$SECONDARY" ]
then
  SECONDARY=$PRIMARY
  PRIMARY=$CURRENT_NIC
fi

This starts the main loop for checking the network's availability. It starts by sleeping for the configured amount of time and then initializes the variable for the ping response.

while :
do
  sleep $SLEEPTIME
  answer=""

Check the Network

The core of the script is the following loop. It iterates through each IP address in PINGLIST and sends PING_COUNT (here, two) pings to each one.

  for node in $PINGLIST
  do
    # Record each host that replies; a silent host adds nothing
    if ping $node $ping_switch $PING_COUNT > /dev/null 2>&1
    then
      answer="${answer}alive"
    fi
  done

The answer is based on the return code of the ping. If a ping fails, its return code will be nonzero. If the ping is successful, the answer variable will have "alive" appended to it. Under normal conditions, if all router addresses are replying, the answer variable will be in the form of "alivealivealive" (if you have, say, three addresses in the PINGLIST).
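The answer string is really used only as a yes/no flag. As a standalone alternative (not part of the script as printed), a function can count the hosts that reply; a nonzero count gives the same yes/no answer, and a count below the number of PINGLIST hosts shows partial loss that may be worth logging. A sketch, assuming the script's PINGLIST, ping_switch, and PING_COUNT variables; the name count_alive is hypothetical.

```shell
#!/bin/sh
# count_alive: print how many PINGLIST hosts answered. Hypothetical
# helper; relies on the script's PINGLIST, ping_switch, and PING_COUNT.
count_alive() {
  alive=0
  for node in $PINGLIST
  do
    if ping "$node" $ping_switch "$PING_COUNT" > /dev/null 2>&1
    then
      alive=$((alive + 1))
    fi
  done
  echo $alive
}
```

Inside the main loop, `alive=\`count_alive\`` followed by `[ "$alive" -gt 0 ]` would reproduce the script's test.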

If answer is non-null, the network is available, so the script uses continue to skip the rest of the loop body and go back to sleep. Consequently, every IP address in PINGLIST must fail to respond before a failover occurs.

  if [ "$answer" != "" ]
  then
    echo network is working...
    continue

This allows us to avoid moving the network settings unnecessarily in the event of one IP address in the PINGLIST being slow to respond or down when the network is in fact available through the primary interface.

If all pings fail, the script records the event with the logger program, a shell interface to syslog. Using syslog to track the failover this way is simpler than writing your own formatted entries to the log file. Note that logger does not take an output file: its -f option names a file whose contents are to be logged. The messages go wherever the syslog configuration sends them, which on many systems is /var/log/messages, the file named by LOG.

  else
    logger -i -t nic_switch "Ping failed on $PINGLIST"
    logger -i -t nic_switch "Possible nic or switch failure." \
      "Moving $IP from $PRIMARY to $SECONDARY"

Switch the Interfaces

Now we perform the actual interface swap.

    ifconfig $PRIMARY down
    ifconfig $SECONDARY $IP netmask $NETMASK broadcast $BROADCAST
    ifconfig $SECONDARY up

First we need to take down the primary interface. Then we have to configure the secondary interface. Depending on your operating system, the final command to bring up the newly configured interface may not be required. With Linux, configuring the interface is enough to bring it online, whereas Solaris requires a separate command for this.

On Solaris, an interface that has been brought down still appears in ifconfig output. To remove the entry, run ifconfig INTERFACE unplumb; conversely, ifconfig INTERFACE plumb makes an interface available before it is configured. FreeBSD accepts the same options, but only for Solaris compatibility; its native equivalents are create and destroy.
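The Solaris plumb and unplumb steps could be folded into the swap as in the following sketch. The helper name move_ip and the IFCONFIG and OS override variables are hypothetical (setting IFCONFIG=echo gives a dry run that only prints the commands); FreeBSD and Linux fall through to the default branch, which mirrors the script's own commands.

```shell
#!/bin/sh
# move_ip OLD NEW IP NETMASK BROADCAST: move the address from interface
# OLD to interface NEW. Hypothetical helper; must run as root. IFCONFIG
# and OS may be overridden, e.g. IFCONFIG=echo for a dry run.
move_ip() {
  ifc=${IFCONFIG:-ifconfig}
  case "${OS:-$(uname)}" in
    SunOS)
      $ifc "$1" down
      $ifc "$1" unplumb          # remove the old entry entirely
      $ifc "$2" plumb            # make the new interface configurable
      $ifc "$2" "$3" netmask "$4" broadcast "$5" up
      ;;
    *)
      $ifc "$1" down
      $ifc "$2" "$3" netmask "$4" broadcast "$5" up
      ;;
  esac
}
```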

Next the script sends an e-mail notification that the primary interface had a problem and that the address was moved to the alternate NIC. It would be wise to add a check here that the network is actually available; that way, if both interfaces are down, undeliverable mail won't start filling the mail queue.

    echo "`date '+%b %d %T'` $ME nic_switch[$$]: Possible nic or switch" \
      "failure. Moving $IP from $PRIMARY to $SECONDARY" |
      mail -s "Nic failover performed on $ME" $MAILLIST
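The availability check suggested before the mail command could look like the following standalone sketch (not part of the script as printed). It re-pings the PINGLIST hosts with one packet each, stops at the first reply, and mails only if some host answered; otherwise it records the situation with logger. The names network_ok and notify are hypothetical; PINGLIST, ping_switch, ME, and MAILLIST are the script's own variables.

```shell
#!/bin/sh
# network_ok: succeed as soon as any PINGLIST host answers one ping.
network_ok() {
  for node in $PINGLIST
  do
    if ping "$node" $ping_switch 1 > /dev/null 2>&1
    then
      return 0
    fi
  done
  return 1
}

# notify MESSAGE: mail MESSAGE only when the network is reachable, so
# messages do not pile up in the mail queue while both NICs are down.
notify() {
  if network_ok
  then
    echo "$1" | mail -s "Nic failover performed on $ME" $MAILLIST
  else
    logger -i -t nic_switch "Network still unreachable; mail not sent"
  fi
}
```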

Now that the interfaces have been switched, the script will swap the values of the PRIMARY and SECONDARY variables so any subsequent failovers will be performed in the right direction.

    place_holder=$PRIMARY
    PRIMARY=$SECONDARY
    SECONDARY=$place_holder
  fi
done