Chapter 7. High Availability

BES version 5 offers high availability (HA) straight out of the box. In all previous versions, disaster recovery followed an active-passive model that required manual intervention; in BES 5, automatic failover is a standard feature. Previously, we would have to configure a standby server, which would usually mean an additional BlackBerry Server license, and at the time of failure, users would need to be issued with new service books if the SRP ID changed. In this chapter, we are going to have a look at the vast improvements made in this area with BES version 5, starting with how high availability is offered without any additional licensing. You would still need an additional server to run the standby BlackBerry Server.

Understanding high availability

The first improvement is that the need for manual intervention when a failure occurs has been removed.

With BES version 5, there is automatic failover between BES instances (the BES servers themselves) and between BES components (the individual services, as seen in Chapter 1, Introduction to BES 5). Instead of the old active-passive model, the new one is primary-standby, where the instances exchange health scores to ensure components are running at an optimal level. There is still the option to force a manual failover for maintenance tasks. We no longer need to worry that the SRP ID will be locked out (this occurs if two instances of a BES join the RIM network with the same SRP ID); this is avoided when running the BES in high availability mode. We can also use the primary-standby architecture to limit downtime during upgrades, such as applying maintenance releases (MRs). The key point about this architecture is that no additional BlackBerry Server license is needed for the standby server.

Understanding how it works

The primary BES would have all live connections and communications with the organization's mail servers, the RIM network and the local BES configuration database. The standby server would have a live connection to the BES configuration database, and a warm connection to the mail server, and no connectivity to the RIM network.

When the primary BES, which runs all the services with live connections, fails, the dispatcher component in the standby BES identifies the failure and automatically begins the failover process. The standby BES then tries to establish a connection to SRP.xx.blackberry.net (where xx is the country location). The standby server now becomes the primary server, verifies that it is the only BES with the SRP connection, and creates an active connection. Once back online, the original primary BES becomes the standby server and attempts to make a live connection to the database server and a warm connection to the mail servers.
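The sequence above can be modelled as a simple role swap. The following is a minimal sketch for illustration only, not BES code: the BesInstance class, its fields, and the names bes1 and bes2 are hypothetical, and the connection states are plain strings rather than real connections.

```python
from dataclasses import dataclass


@dataclass
class BesInstance:
    name: str
    role: str            # "primary" or "standby"
    srp_connected: bool  # True when connected to the RIM network
    db: str              # "live" or "none"
    mail: str            # "live", "warm" or "none"


def fail_over(primary: BesInstance, standby: BesInstance) -> None:
    """Swap roles after the standby's dispatcher detects a primary failure."""
    # Assumption: the failed primary loses all of its connections.
    primary.srp_connected = False
    primary.role = "standby"
    primary.db, primary.mail = "none", "none"

    # The standby opens the SRP connection and takes over as primary.
    standby.srp_connected = True
    standby.role = "primary"
    standby.mail = "live"


def rejoin(instance: BesInstance) -> None:
    """Once back online, the old primary settles into the standby role."""
    instance.db = "live"    # live connection to the configuration database
    instance.mail = "warm"  # warm connection to the mail servers


bes1 = BesInstance("bes1", "primary", True, "live", "live")
bes2 = BesInstance("bes2", "standby", False, "live", "warm")

fail_over(bes1, bes2)
rejoin(bes1)
print(bes1)
print(bes2)
```

At no point in this model do both instances hold the SRP connection, which is the condition that would otherwise lock out the SRP ID.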

The primary-standby model works on health scores; these are calculated for each BES component and shared as a heartbeat between the two instances.

The threshold values for each component's score are configured once on the primary server (these are known as failover thresholds) and once on the standby server (these are known as promotion thresholds). A primary score that falls below its failover threshold, combined with a standby score above its promotion threshold, triggers a failover and promotion.

A default threshold score is configured for each component. An administrator can change the default thresholds that trigger a failover or promotion; this is shown later in the chapter.

Note

It should also be noted that the BlackBerry configuration database is vital to the high availability of your BES server; therefore, if it is housed on a SQL Server, I would strongly recommend setting up database mirroring, which is now an option certified by RIM.

Configuring high availability

  1. On a new server, extract the BlackBerry Enterprise media.
  2. Select Install standby server.
  3. If mirroring has been set up, point the database location to the mirror database location, so if a failover of the primary database occurs, the mirror takes over.

As mentioned in the introduction to high availability, the architecture revolves around health scores being exchanged between the primary and standby servers. If the primary's health score for a component drops below its failover threshold while the standby's score is above its promotion threshold, an automatic failover is triggered.

The components send their health scores to the BlackBerry dispatcher, which in turn writes them to the configuration database. There are 17 health components or parameters that are measured; each can be set with a threshold health score from 0 to 63.

For example, let's say the default promotion threshold value for the wireless network access is 57. If the primary BES reports a health score of 20 and the standby server reports a health score of 58, then an automatic failover will occur and the standby server will initiate the process to become the primary BES. It's important to understand the following when we are setting thresholds:

  • Promotion thresholds: These are examined when the standby instance needs to determine whether it can promote itself to the primary instance
  • Failover thresholds: These are examined when the primary BES needs to determine whether it should demote itself or not
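The worked example above can be expressed as a small decision function. This is only a sketch under stated assumptions: it assumes the primary demotes itself when its score is below the failover threshold and the standby promotes itself when its score is above the promotion threshold. The failover threshold value of 57 and the function name are hypothetical; only the promotion threshold of 57 and the scores 20 and 58 come from the example.

```python
PROMOTION_THRESHOLD = 57  # value used in the example above
FAILOVER_THRESHOLD = 57   # assumed for illustration; not given in the text


def should_fail_over(primary_score: int, standby_score: int,
                     failover_threshold: int = FAILOVER_THRESHOLD,
                     promotion_threshold: int = PROMOTION_THRESHOLD) -> bool:
    """Return True when both sides agree that the roles should swap."""
    # Failover threshold: should the primary demote itself?
    primary_unhealthy = primary_score < failover_threshold
    # Promotion threshold: is the standby healthy enough to take over?
    standby_ready = standby_score > promotion_threshold
    return primary_unhealthy and standby_ready


print(should_fail_over(primary_score=20, standby_score=58))
print(should_fail_over(primary_score=60, standby_score=58))
print(should_fail_over(primary_score=20, standby_score=50))
```

With the scores from the example (20 and 58), the function returns True. It returns False when the primary is still healthy or when the standby is not healthy enough to take over.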

Let's have a look at these settings.

Examining the default threshold values and setting failovers

The following steps show how to view the settings mentioned previously and how to change the default values to meet your network needs:

  1. Expand Servers and components, then expand High Availability, and select Highly available BlackBerry Enterprise Servers.
  2. Select the BlackBerry Enterprise pair.
  3. Click on Automatic Failover Settings.
  4. We can now change the order and threshold values of the 17 parameters.

In the same section, we can choose whether we want the failover to happen automatically. This can be done by selecting Turn on automatic BlackBerry Enterprise Server failover.

Forcing a manual failover

We also have the option to perform a manual failover, which is ideal for applying maintenance packs on the primary server. To force a manual failover, we need to:

  1. Expand Servers and components, then expand High Availability, and select Highly available BlackBerry Enterprise Servers.
  2. Select the BlackBerry Enterprise pair.
  3. Click on Manual failover.
  4. In the list, choose the BlackBerry Enterprise instance that we want to fail over to.
  5. Click on Yes — failover to standby instance.

Introducing HA for databases

If your environment is using MS SQL Server 2005 with SP2 or later, we can introduce high availability for the database that the BES uses by utilizing database mirroring. In principle, database mirroring resembles RAID 1 disk mirroring: it maintains a synchronized copy of the database, although the copy is held on a separate SQL Server instance rather than on a second disk in the same machine. If the principal database were to fail, active connections would be made to the mirrored database automatically. In this way, mirroring provides fault tolerance to the BES infrastructure. If the database connection fails on our primary BES, it will try to open a connection to the mirrored database. If this also fails, the primary BES will lower its health score in anticipation that the standby server will assume the primary role and try to open an active connection to the BES database.
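The connection fallback described above can be sketched as follows. This is an illustration of the logic only, not how the BES is implemented: the connect_with_mirror function, the fake_connect stand-in, and the server names sql-principal and sql-mirror are all hypothetical.

```python
from typing import Callable, Optional, Set


def connect_with_mirror(connect: Callable[[str], None],
                        principal: str, mirror: str) -> Optional[str]:
    """Try the principal database first, then the mirror.

    Returns the name of the server that accepted the connection, or None
    if both attempts failed.
    """
    for server in (principal, mirror):
        try:
            connect(server)
            return server
        except ConnectionError:
            continue  # fall through to the next candidate
    return None


def make_fake_connect(down: Set[str]) -> Callable[[str], None]:
    """Build a stand-in for a real database connection call."""
    def fake_connect(server: str) -> None:
        if server in down:
            raise ConnectionError(f"{server} is unreachable")
    return fake_connect


# Principal is down, so the connection lands on the mirror.
result = connect_with_mirror(make_fake_connect({"sql-principal"}),
                             "sql-principal", "sql-mirror")
print(result)

# Both are down: this is the point at which the primary BES would lower
# its health score so that the standby can take over.
result_both_down = connect_with_mirror(
    make_fake_connect({"sql-principal", "sql-mirror"}),
    "sql-principal", "sql-mirror")
print(result_both_down)
```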

Most of the components within the BES environment have some feature to help with high availability: load balancing, DNS round-robin (demonstrated during the installation of the BlackBerry Administration Service site), or failover with an active connection to one instance and standby connections to another, warm instance. The one BES component with no high availability is the monitoring website that was created during the installation. We will be having a look at this site next.
