Chapter 12. Application Failover

For many applications, the cost of having a database be unavailable is enormous. For applications used in the banking and brokerage industry, for example, the cost of database downtime can exceed a million dollars per hour. Consequently, backup databases often are used to provide protection against the failure of a primary database.

When a primary database instance fails, applications connected to the instance will receive an error indicating a lost connection. Not only do those user connections to the database get lost, but their uncommitted transactions also get lost. Users then are forced to reestablish a connection to the backup database and resubmit their transactions. This creates a lot of work disruption. The process of automatically switching an application from the primary database to a backup database is referred to as application failover. The objective of application failover is to minimize work disruptions as much as possible by transparently reconnecting to the backup database.

Often, an Oracle Parallel Server (OPS) database is used to support application failover. In an OPS environment, when one or more Oracle instances fail, applications still can access the shared database through one of the surviving instances. Other approaches to failover include the use of standby databases and replicated databases. Although the focus in this chapter is on application failover in an OPS environment, we’ll talk briefly about these other approaches as well.

In this chapter, we’ll discuss the factors you need to consider when an OPS database is used to support failover and explain the difficulty of the failover process. We’ll also discuss the various aspects of application design and Oracle network configuration you’ll need to know about in order to implement application failover. Many of the techniques we’ll describe here for OPS environments are equally applicable when you’re using a standby database or a replicated database for application failover.

Tip

The discussion in this chapter assumes some knowledge on your part of Net8, Oracle’s networking product. You also should be familiar with the tnsnames.ora file and with the Oracle Names product.

Maintaining a Failover Database

As we mentioned, application failover is the ability of client applications to connect to a backup database automatically, without user intervention, when the primary database fails. When no backup database is maintained, any failure of the primary instance results in downtime because users have to wait until the primary database becomes available again. By maintaining a backup database, also referred to as a failover database, and implementing application failover, you can decrease or eliminate downtime from database failures.

There are three commonly used methods for maintaining a failover database:

  • Using OPS to ensure that you have a backup instance available

  • Using Oracle’s replication features to maintain a replicated database

  • Using Oracle’s standby database features to maintain a standby database

In addition to the three methods discussed here, many sites have implemented proprietary ways to maintain a backup database. For the Windows NT environment, Oracle also offers a product called Fail-Safe, which is used to implement failover.

Tip

In an OPS environment, there is only one shared database. Consequently, failover occurs to a backup instance, not to a backup database. When replicated or standby databases are used, failover occurs to a backup database. In this chapter, we use the term backup database when we are talking about failover in general and not in reference to a specific method.

Using an OPS Database for Failover

In an OPS environment, multiple nodes access a shared database. When one database instance fails because of hardware or software failure, the client applications connected to the failed instance can reconnect to another OPS instance. This allows users to continue working. Figure 12-1 shows a two-node OPS database where a client begins by using the Oracle instance on node 1 to access the database. When node 1 fails, the client application fails over to the Oracle instance running on node 2. The mechanism for achieving this automatic application failover is discussed later in this chapter.

Figure 12-1. Application failover
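The failover shown in Figure 12-1 can be sketched in a few lines of client-side logic. The following Python example is only an illustration of the idea, not the Net8 mechanism described later in the chapter. The names `connect_with_failover`, `AllInstancesDown`, and `fake_connect` are hypothetical, and `fake_connect` stands in for a real database driver call so that the example runs without a database.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Conn = TypeVar("Conn")


class AllInstancesDown(Exception):
    """Raised when no instance in the list accepts a connection."""


def connect_with_failover(
    addresses: Sequence[str],
    connect: Callable[[str], Conn],
) -> Tuple[str, Conn]:
    """Try each address in order and return the first that connects.

    `connect` is a hypothetical stand-in for a real driver call. It must
    raise ConnectionError when the instance is unreachable.
    """
    errors: List[str] = []
    for address in addresses:
        try:
            return address, connect(address)
        except ConnectionError as exc:
            # Remember why this instance was skipped, then try the next one.
            errors.append(f"{address}: {exc}")
    raise AllInstancesDown("; ".join(errors))


def fake_connect(address: str) -> str:
    """Simulated two-node cluster: node1 has failed, node2 is still up."""
    if address == "node1":
        raise ConnectionError("instance unavailable")
    return f"session on {address}"


used, session = connect_with_failover(["node1", "node2"], fake_connect)
print(used, "->", session)  # prints: node2 -> session on node2
```

Note that the sketch only reestablishes a connection. Any transaction that was uncommitted when node 1 failed is still lost and would have to be resubmitted by the application.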

Advantages of OPS-based failover

The OPS-based approach to failover has some distinct advantages over the replication-based approach and the standby database-based approach:

  • In an OPS configuration, all of the instances concurrently access the database. Under normal circumstances, these instances also are being used for processing other transactions. The result is an efficient use of hardware resources, because you don’t need to have a dedicated (and unused) backup system that kicks in only when a failure occurs.

  • When OPS is used to support failover, the failure of a node impacts only the subset of users who are connected to that node. Users connected to the remaining nodes are not affected and continue to access the database without interruption.

  • A final advantage of OPS is the immediacy with which clients can switch to a backup instance. When one node fails, clients can immediately connect to another node, because a database instance is already up and running on that node. In contrast, using a standby database for failover requires that the standby database be recovered and opened before it can be used. (There are some factors that can add to the failover time in an OPS environment, and we’ll discuss those issues later in this chapter.)

Disadvantages of OPS-based failover

Everything in life seems to be a tradeoff. While using OPS to support failover has some distinct advantages, there are also disadvantages to consider:

  • With OPS, the shared disk used to store the database files represents a single point of failure, because all OPS instances access the same database files. When a disk failure occurs, the hardware failure has to be rectified first. Then the database has to be restored and recovered. The OPS database is unavailable until the recovery is complete. This situation is illustrated in Figure 12-2. To guard against hardware failures like this, you can mirror your data on multiple disks.

Figure 12-2. Failure of shared disk prevents any instance from operating on the database

  • OPS systems require either cluster or MPP architectures. In both types of architectures, nodes must be located in close proximity to each other due to the performance requirements of the interconnect. In essence, all nodes must be in the same physical location. This means that an OPS system cannot provide protection from a site failure. If your building burns down, all your OPS nodes will go with it. For disaster recovery, you may consider other failover configurations, such as a standby database or a replicated database. Also, these configurations can be used in combination with OPS.

Using a Replicated Database for Failover

Oracle’s advanced replication features can be used to maintain a failover database. With multi-master replication, changes made to one database are replicated to the other database. In contrast to the standby database setup described in the next section, a replicated database is always available for transactions. Replication can be done either synchronously or asynchronously. Figure 12-3 illustrates failover to a replicated database.

Figure 12-3. Failover to a replicated database

Advantages of replicated database-based failover

The use of a replicated database to support failover has two significant advantages:

  • A replicated database can provide protection when there is a disaster at the primary site. Protection from a disaster comes from the fact that a replicated database does not need to be physically close to the master database. By placing your replicated database in a separate geographic location, you ensure that you will still have a database even in the event of a fire or other such disaster at the primary site.

  • A replicated database can eliminate the shared disk system as a single point of failure. In contrast to an OPS configuration in which many instances share one database, when multi-master replication is used, each replicated instance has its own database. As each replicated database resides on its own disk system, the loss of one disk system will not bring down all your databases. With OPS, if the disk fails, none of the instances can run.

Disadvantages of replicated database-based failover

Along with the good comes the bad. Replication also has several disadvantages when it comes to supporting failover:

  • Multi-master replication does not support the replication of tables with LONG RAW columns. Use of the LONG RAW datatype is diminishing in favor of Oracle’s newer large object types, but if you are currently using LONG RAWs, having to write your own routines to replicate them could present a formidable challenge.

  • Oracle’s replication mechanism may not be able to handle very high transaction volumes; for example, those on the order of a hundred or more transactions per second. The actual scalability limit of replication depends on the hardware platform, network characteristics, and so on.

  • Updates to the main database may conflict with data in the replicated database. You have to write conflict-resolution routines to resolve those conflicts, and developing these routines can be challenging and time consuming.
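To give a feel for what a conflict-resolution routine has to decide, here is a minimal Python sketch of one common policy, in which the most recently updated version of a row wins. This is an illustrative assumption, not Oracle's replication API and not necessarily the policy your application needs. The `RowVersion` type, the `resolve_latest_timestamp` function, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class RowVersion:
    """One site's version of a replicated row (hypothetical structure)."""
    key: int
    value: str
    updated_at: datetime
    site: str


def resolve_latest_timestamp(local: RowVersion, incoming: RowVersion) -> RowVersion:
    """Keep whichever version was updated most recently.

    Ties are broken by site name so that every site picks the same winner.
    """
    if local.key != incoming.key:
        raise ValueError("versions refer to different rows")
    return max(local, incoming, key=lambda v: (v.updated_at, v.site))


local = RowVersion(7, "credit limit 5000", datetime(2000, 1, 1, 9, 0, 0), "site_a")
incoming = RowVersion(7, "credit limit 8000", datetime(2000, 1, 1, 9, 0, 5), "site_b")
winner = resolve_latest_timestamp(local, incoming)
print(winner.site, winner.value)  # prints: site_b credit limit 8000
```

Even this simple policy silently discards the losing update, which is one reason real conflict-resolution routines tend to be application specific and time consuming to develop.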

Using a Standby Database for Failover

A standby database is another approach to maintaining a copy of a primary database that can be used as a backup in case the primary database fails. Initially, you create the standby database as a copy of the primary database. You then continually apply archived redo log files from the primary database to the standby database. In essence, the standby database is in continual recovery mode. This mechanism allows the standby database to remain synchronized with the primary database. In the event of a primary database failure, you apply the last of the redo logs to the standby database and then open it for use. Figure 12-4 illustrates this scenario.

Figure 12-4. Failover to a standby database
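The standby life cycle described above (copy, apply archived logs in order, apply the last logs, open) can be modeled with a small Python class. This is a toy model for illustration only and contains no Oracle commands. The `StandbyDatabase` class, its method names, and the log sequence numbers are hypothetical, and the model makes the simplifying assumption that opening the standby is final.

```python
from typing import Iterable


class StandbyDatabase:
    """Toy model of a standby database that is kept in recovery mode."""

    def __init__(self, copied_through_sequence: int) -> None:
        # The standby starts as a copy of the primary, current through
        # this archived log sequence number.
        self.applied_through = copied_through_sequence
        self.is_open = False

    def apply_archived_logs(self, sequences: Iterable[int]) -> None:
        """Apply archived logs in sequence order, refusing any gap."""
        if self.is_open:
            # Simplifying assumption of this model: opening is final.
            raise RuntimeError("standby is already open")
        for seq in sorted(sequences):
            if seq <= self.applied_through:
                continue  # already applied
            if seq != self.applied_through + 1:
                raise RuntimeError(
                    f"missing archived log {self.applied_through + 1}"
                )
            self.applied_through = seq

    def fail_over(self, remaining_sequences: Iterable[int]) -> None:
        """Apply the last available logs, then open the database for use."""
        self.apply_archived_logs(remaining_sequences)
        self.is_open = True


standby = StandbyDatabase(copied_through_sequence=100)
standby.apply_archived_logs([101, 102])  # normal operation
standby.fail_over([103])                 # primary has failed
print(standby.applied_through, standby.is_open)  # prints: 103 True
```

The gap check reflects the reason the standby cannot be opened instantly: recovery has to work through every outstanding archived log, in order, before the database is usable.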

Advantages of standby database-based failover

The main advantage of using a standby database for failover is that the standby database can be maintained in a physical location separate from the primary database. Consequently, a standby database, like a replicated database, can provide protection from site-wide disasters. In addition, standby databases do not suffer from the same scalability limitations as replicated databases, so they can be effectively maintained for a database with a very high transaction volume.

Disadvantages of standby database-based failover

There are also a few drawbacks to using standby databases for failover:

  • In contrast to OPS instances, a standby database is not immediately available when the primary database fails. Before applications can fail over to a standby database, all of the archived redo logs have to be applied, and then the database needs to be opened.

  • Use of a standby database for failover requires that you have two systems, each with its own disk storage. You also have the administrative overhead of maintaining two separate databases. And, in contrast to an OPS configuration, the additional system resources cannot be utilized during normal operations. The one exception is that in Oracle8i, it is possible to temporarily open a standby database in read-only mode.
