Failover Methods

Three methods are commonly used to implement application failover:

  • Connection-time failover using Net8

  • Transparent Application Failover (TAF) using Net8 and OCI8

  • Failover using a three-tier architecture

The failover methods discussed here are equally applicable to OPS environments, environments that use standby databases, and environments that use replicated databases as a basis for failover.

Connect-Time Failover with Net8

Connect-time failover is failover that occurs when an application makes a new connection to a database. Using Net8, you can configure a service so that if a connection to the primary database is not successful, the connection is automatically routed to an alternate database. Net8 configuration in support of connect-time failover involves making appropriate entries in each client’s tnsnames.ora file or in your Oracle Names server. Because clients go through Net8 to connect to a database, connect-time failover can be completely transparent to your applications.

tnsnames.ora configuration

On client machines, Net8 uses a configuration file named tnsnames.ora to translate a database service name into a specific hostname, port number, and instance name. When you connect to a service, Net8 reads this file to find out where that service resides. For any given service, you can define alternate database instances in order to implement connect-time failover. When you do this, any new connection to the service is first attempted using the first database instance in the list. If that attempt fails, Net8 automatically tries again using the second instance in the list. This process continues until a connection is made or until Net8 runs out of instances to try.
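The ordered retry described above can be modeled in a few lines of Python. This is a simulation rather than Net8 code: connect_with_failover, try_connect, and ConnectError are hypothetical names, and in a real client the retries happen inside Net8, not in the application.

```python
from typing import Callable, Optional, Sequence


class ConnectError(Exception):
    """Raised by the simulated connection attempt when an instance is unreachable."""


def connect_with_failover(
    instances: Sequence[str],
    try_connect: Callable[[str], str],
) -> Optional[str]:
    """Try each instance in list order and return the first successful connection.

    A failed attempt simply moves on to the next entry, as connect-time
    failover does. Returns None when every instance is unreachable.
    """
    for instance in instances:
        try:
            return try_connect(instance)
        except ConnectError:
            continue  # unreachable: fall through to the next instance in the list
    return None


if __name__ == "__main__":
    down = {"HADB1"}  # simulate the first instance being unavailable

    def fake_connect(instance: str) -> str:
        if instance in down:
            raise ConnectError(instance)
        return f"connected to {instance}"

    print(connect_with_failover(["HADB1", "HADB2", "HADB3"], fake_connect))
    # prints: connected to HADB2
```

Because every failed attempt falls through to the next entry, the order of the list is the failover order; a list whose instances are all unreachable yields None.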

The following example shows a database service name definition from a tnsnames.ora file. The service name HADB contains three DESCRIPTION entries in its definition. One points to the database instance named HADB1 on NODE1. The other two point to the instance HADB2 on NODE2 and to the instance HADB3 on NODE3. When a client connects to the HADB service, Net8 will attempt to connect to the three instances in the order in which they are listed. HADB1 will be tried first. If that connection cannot be made for any reason, Net8 will try HADB2. If the connection attempt to HADB2 also fails, Net8 will move on to HADB3. The LOAD_BALANCE = OFF entry is what fixes this order: for a DESCRIPTION_LIST, client load balancing is on by default, and Net8 would otherwise try the descriptions in random order. FAILOVER = ON is already the default and is shown only for clarity:

HADB =
  (DESCRIPTION_LIST =
    (FAILOVER = ON)
    (LOAD_BALANCE = OFF)
    (DESCRIPTION = 
      (ADDRESS=
        (PROTOCOL=TCP)(HOST=NODE1)(PORT = 1521)
      )
      (CONNECT_DATA = (SID = HADB1))
    )
    (DESCRIPTION = 
      (ADDRESS=
        (PROTOCOL=TCP)(HOST=NODE2)(PORT = 1521)
      )
      (CONNECT_DATA = (SID = HADB2))
    )
    (DESCRIPTION = 
      (ADDRESS=
        (PROTOCOL=TCP)(HOST=NODE3)(PORT = 1521)
      )
      (CONNECT_DATA = (SID = HADB3))
    )
  )

Connect-time failover happens transparently, without requiring any changes to the client application. Note that because each client machine has its own tnsnames.ora file, the failover configuration can differ from one client machine to another, including the order in which the failover instances are listed. For example, the tnsnames.ora file on some client machines can list the instances for service name HADB in the order HADB1, HADB2, HADB3, while the file on other client machines lists them as HADB1, HADB3, HADB2. When HADB1 fails, the first group of clients fails over to HADB2 and the second group to HADB3, which balances the failover load between the two surviving instances.
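To see how per-client ordering spreads the load, the sketch below gives a hypothetical set of ten client machines one of the two orders mentioned above and counts where each client lands. It models only the choice of instance; the client names and helper functions are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set


def first_available(order: Sequence[str], down: Set[str]) -> str:
    """Return the first instance in a client's failover order that is up."""
    for instance in order:
        if instance not in down:
            return instance
    raise RuntimeError("no instance available")


def failover_distribution(client_orders: Dict[str, List[str]], down: Set[str]) -> Counter:
    """Count how many clients connect to each instance, given the instances that are down."""
    return Counter(first_available(order, down) for order in client_orders.values())


if __name__ == "__main__":
    # Hypothetical clients: even-numbered ones list HADB2 before HADB3,
    # odd-numbered ones list HADB3 before HADB2.
    orders = {
        f"client{n}": ["HADB1", "HADB2", "HADB3"] if n % 2 == 0 else ["HADB1", "HADB3", "HADB2"]
        for n in range(10)
    }
    print(failover_distribution(orders, down=set()))      # all ten on HADB1
    print(failover_distribution(orders, down={"HADB1"}))  # five on HADB2, five on HADB3
```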

Net8 release 8.1.5 can identify a database by its service name rather than by its SID. When you use a service name, you do not need to specify connect data separately for every SID. Using a service name, our sample tnsnames.ora configuration looks like this:

HADB =
  (DESCRIPTION = 
    (ADDRESS=
      (PROTOCOL=TCP)(HOST=NODE1)(PORT = 1521)
    )
    (ADDRESS=
      (PROTOCOL=TCP)(HOST=NODE2)(PORT = 1521)
    )
    (ADDRESS=
      (PROTOCOL=TCP)(HOST=NODE3)(PORT = 1521)
    )
    (CONNECT_DATA= (SERVICE_NAME= HADB))
  )

Refer to Oracle Corporation’s Net8 Administrator’s Guide, Release 8.1 for detailed information on tnsnames.ora configuration, and for information about Net8 features that facilitate load balancing in an OPS environment.

Name server configuration

Sometimes, Oracle Names is used to resolve database service names. This is very helpful when you have a large number of clients that connect to a database. Oracle Names allows you to centralize the management of service name definitions, so that whenever a change needs to be made, you can make it in one central location instead of editing the tnsnames.ora file on each individual client machine. For purposes of failover, alternate database instances are registered in Oracle Names in a manner similar to that used for tnsnames.ora.

Transparent Application Failover with Net8 and OCI8

Transparent Application Failover (TAF) is a new feature in Oracle8. It provides connect-time failover as well as runtime failover, although some restrictions apply when TAF is used for runtime failover. TAF relies on both Net8 and the Oracle Call Interface 8 (OCI8). To support TAF, Net8 requires additional configuration information in your tnsnames.ora file, and your applications must use the failover-related API calls in OCI8.

Currently, Oracle has implemented TAF in SQL*Plus using the new failover features of OCI8. Client sessions using SQL*Plus can automatically fail over to a backup instance when the connection to the primary instance is lost. In the future, Oracle will build TAF capabilities into other tools and products such as Developer/2000, the Pro*C precompiler, and the JDBC thick driver, which uses OCI8.

Net8 provides two TAF parameters that you specify in the tnsnames.ora file. These parameters, TYPE and METHOD, control the failover operation and are described in the following two sections.

The TYPE parameter

The TYPE parameter is used in a tnsnames.ora connect string to specify the type of failover functionality that you want for the connection. There are three possible selections:

NONE

No failover is to be attempted. This is the default setting.

SESSION

A connection to a backup database is automatically established in the event that the primary database goes down. However, TAF does not apply any of the ALTER SESSION commands to re-create the session environment.

SELECT

A connection to a backup database is automatically established in the event that the primary database goes down. SELECT statements that were in progress in the primary database are then automatically reexecuted in the backup database.

The METHOD parameter

The METHOD parameter is used in a tnsnames.ora connect string to determine when the connection to the backup database occurs. It has two possible values:

BASIC

Connects to the backup database only when the connection to the primary database fails. This is the default value.

PRECONNECT

Connects to the backup database at the same time that the connection to the primary database is made.

Using the PRECONNECT setting saves time when a failure occurs, because you will already be connected to the backup database. The tradeoff, however, is the overhead caused by the normally unused connections to the backup database.
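Because TYPE and METHOD each accept only a small, fixed set of values, a script that generates tnsnames.ora entries can validate them before writing the file. The helper below is a hypothetical sketch, not an Oracle tool. It applies the defaults stated above (TYPE = NONE, METHOD = BASIC) and emits the FAILOVER_MODE clause in the same form used in the TAF example later in this section.

```python
from typing import Optional

VALID_TYPES = {"NONE", "SESSION", "SELECT"}
VALID_METHODS = {"BASIC", "PRECONNECT"}


def failover_mode_clause(
    failover_type: str = "NONE",
    method: str = "BASIC",
    backup: Optional[str] = None,
) -> str:
    """Render a FAILOVER_MODE clause, rejecting TYPE/METHOD values not listed above."""
    failover_type = failover_type.upper()
    method = method.upper()
    if failover_type not in VALID_TYPES:
        raise ValueError(f"unknown TYPE: {failover_type}")
    if method not in VALID_METHODS:
        raise ValueError(f"unknown METHOD: {method}")
    clause = f"(FAILOVER_MODE = (TYPE = {failover_type})(METHOD = {method})"
    if backup:
        clause += f"(BACKUP = {backup})"  # net service name of the backup
    return clause + ")"


if __name__ == "__main__":
    print(failover_mode_clause("select", "preconnect", backup="HADB2"))
    # prints: (FAILOVER_MODE = (TYPE = SELECT)(METHOD = PRECONNECT)(BACKUP = HADB2))
```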

TAF limitations

TAF has several restrictions. When a runtime failover occurs using TAF, the effects of any ALTER SESSION statements that were executed in the primary database are lost. They are not carried over to the backup database. The state of any PL/SQL package variables is also lost. TAF is helpful with query-only transactions when the value of the TYPE parameter is set to SELECT. In that case, the OCI8 library keeps track of the number of rows fetched by the SELECT statements being executed in an instance. When a failover occurs, OCI8 automatically reexecutes the queries in the backup database. The rows that were already fetched are discarded, and only the remaining rows are returned to the user. However, because the query is executed a second time, response time for that query will be slower than normal.

DML transactions represent the most difficult case to handle in a failover situation. Currently, failover-aware products such as SQL*Plus cannot fail over transactions composed of DML statements such as INSERT, UPDATE, and DELETE. Remember that when a primary instance fails, one of the surviving instances will recover any lost transactions. All committed changes are reapplied, and all uncommitted changes are rolled back. To handle DML transactions interrupted by a failover, your application needs failover-specific code. This code has to re-create the session environment and resubmit the failed transaction to the backup instance. Also, if the application is a batch application, the failover code needs to be sophisticated enough to skip over transactions that were already committed prior to the failover, allowing the application to resume processing from the point at which the failure occurred.
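One way to give a batch application the restart behavior described above is to commit a checkpoint (the id of the last processed item) in the same transaction as the DML, then read it back after reconnecting. The sketch below simulates that idea in memory. The checkpoint mechanism, SimulatedDatabase, and the other names are assumptions for illustration, not something Oracle provides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


class InstanceFailure(Exception):
    """Simulated loss of the instance in the middle of a transaction."""


@dataclass
class SimulatedDatabase:
    """Storage shared by all instances; only committed work survives a failure."""
    committed: Dict[int, str] = field(default_factory=dict)
    checkpoint: int = 0  # hypothetical: id of the last committed item


def process_item(db: SimulatedDatabase, item_id: int, payload: str, fail_on: Set[int]) -> None:
    """Apply one item's DML and advance the checkpoint in the same commit."""
    if item_id in fail_on:
        fail_on.discard(item_id)  # fail only once; the uncommitted work is rolled back
        raise InstanceFailure(f"instance lost while processing item {item_id}")
    db.committed[item_id] = payload  # the DML ...
    db.checkpoint = item_id          # ... and the checkpoint update commit together


def run_batch(
    db: SimulatedDatabase,
    items: List[Tuple[int, str]],
    fail_on: Set[int],
    max_failovers: int = 3,
) -> int:
    """Process items in ascending id order, resuming after failovers; return the failover count."""
    failovers = 0
    while True:
        # After a reconnect, real code would re-create the session environment here
        # (ALTER SESSION settings, package state) before resubmitting work.
        resume_after = db.checkpoint  # read from the surviving instance
        try:
            for item_id, payload in items:
                if item_id <= resume_after:
                    continue  # committed before the failure: skip it
                process_item(db, item_id, payload, fail_on)
            return failovers
        except InstanceFailure:
            failovers += 1
            if failovers > max_failovers:
                raise
```

Because the checkpoint and the DML commit together, instance recovery either keeps both or rolls back both, so the checkpoint read after a failover always matches the committed work.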

A TAF example

The following excerpt from a tnsnames.ora file shows an example of two net service names configured for TAF. The two services back each other up. HADB2 serves as the backup for HADB1 and vice versa. In this configuration, runtime failover is indicated by the FAILOVER_MODE parameter:

HADB1 =
  (DESCRIPTION = 
    (ADDRESS = (PROTOCOL = TCP)(HOST = NODE1)(PORT = 1521))
    (CONNECT_DATA = 
      (SID = HADB1)
      (FAILOVER_MODE = (TYPE = SESSION)(METHOD = BASIC)
                       (BACKUP = HADB2))
    )
  )

HADB2 =
  (DESCRIPTION = 
    (ADDRESS = (PROTOCOL = TCP)(HOST = NODE2)(PORT = 1521))
    (CONNECT_DATA = 
      (SID = HADB2)
      (FAILOVER_MODE = (TYPE = SESSION)(METHOD = BASIC)
                       (BACKUP = HADB1))
    )
  )

Using SQL*Plus, you can easily test the TAF configuration illustrated in the sample tnsnames.ora file. You can test failover by shutting down the primary database. You also can use the ALTER SYSTEM DISCONNECT SESSION command to trigger a failover even though the primary instance is alive. Note that failover does not occur when a user process is killed while the primary instance is alive. In other words, using a “kill -9” command to kill a SQL*Plus session won’t result in failover. To verify a session’s TAF configuration and check whether the session has failed over, query the V$SESSION view:

SELECT sid, username, failover_type,
       failover_method, failed_over
FROM   V$SESSION
WHERE  audsid = USERENV('SESSIONID');  -- restrict the result to the current session

The example in this chapter uses a simple tnsnames.ora configuration to explain how the TAF mechanism works. TAF also may be used in conjunction with other Net8 options such as the Oracle Names Server. For information on how to configure TAF in different Net8 environments, refer to Oracle Corporation’s Oracle8i Net8 Administrator’s Guide.

Failover in a Three-Tier Architecture

Three-tier architectures often are used in OLTP environments with high transaction volumes. As the name suggests, there are three distinct logical layers that participate in the processing of a transaction:

  • The frontend is responsible for presentation and interaction with the user.

  • Often, the middle tier contains Transaction Processing (TP) monitors that are responsible for transaction routing. Application servers are also often on the middle tier and implement the business logic to process client requests.

  • The backend database servers provide access to the data from the application servers.

You may have several different types of application servers in a three-tier environment, with each application server providing a specific type of service. TP monitors then distribute the workload among application servers. In an OPS environment, TP monitors can reduce synchronization overhead by helping to achieve application partitioning. TP monitors can route transactions to the appropriate application server and database server combination based on the data access requirements of those transactions.
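Data-dependent routing of this kind can be as simple as a lookup from key ranges to server pairs. The sketch below is hypothetical: the account ranges, server names, and route function illustrate the idea and do not correspond to any TP monitor’s API.

```python
from typing import Dict, Tuple

# Hypothetical partitioning scheme: each account range is owned by one
# (application server, database instance) pair, so the blocks for that range
# are accessed mostly from a single instance.
ROUTES: Dict[Tuple[int, int], Tuple[str, str]] = {
    (0, 49_999): ("appserver_a", "HADB1"),
    (50_000, 99_999): ("appserver_b", "HADB2"),
}


def route(account_id: int) -> Tuple[str, str]:
    """Return the (application server, instance) pair that owns this account."""
    for (low, high), target in ROUTES.items():
        if low <= account_id <= high:
            return target
    raise KeyError(f"no route for account {account_id}")


if __name__ == "__main__":
    print(route(12_345))  # ('appserver_a', 'HADB1')
    print(route(75_000))  # ('appserver_b', 'HADB2')
```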

Application servers register their service with the TP monitor when they are up and have successfully connected to the backend database. In the event that any service is unavailable because a database instance has failed (or for any other reason), the service is deregistered from the TP monitor. Thus, when the TP monitor routes a new client transaction, it routes to an application server that is available and is still connected to a database instance. This mechanism achieves connect-time failover indirectly.
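The register/deregister cycle can be modeled with a small routing table. TPMonitor and its methods are hypothetical stand-ins for whatever interface a real TP monitor exposes, and the round-robin choice among available servers is an assumption made for the example.

```python
from typing import Dict, List


class TPMonitor:
    """Toy TP monitor that routes new transactions only to registered servers."""

    def __init__(self) -> None:
        self._services: Dict[str, List[str]] = {}  # service name -> app servers
        self._next: Dict[str, int] = {}            # round-robin position per service

    def register(self, service: str, server: str) -> None:
        """Called by an app server once it is up and connected to its instance."""
        servers = self._services.setdefault(service, [])
        if server not in servers:
            servers.append(server)

    def deregister(self, service: str, server: str) -> None:
        """Called when the server's instance fails (or it goes away for any other reason)."""
        servers = self._services.get(service, [])
        if server in servers:
            servers.remove(server)

    def route(self, service: str) -> str:
        """Pick an available server for a new transaction, round-robin."""
        servers = self._services.get(service, [])
        if not servers:
            raise LookupError(f"service {service!r} is unavailable")
        i = self._next.get(service, 0) % len(servers)
        self._next[service] = i + 1
        return servers[i]


if __name__ == "__main__":
    monitor = TPMonitor()
    monitor.register("ORDERS", "app1@HADB1")
    monitor.register("ORDERS", "app2@HADB2")
    monitor.deregister("ORDERS", "app1@HADB1")  # HADB1 has failed
    print(monitor.route("ORDERS"))  # app2@HADB2
```

New transactions never reach a deregistered server, which is how the TP monitor achieves connect-time failover indirectly.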

Application servers can trap database errors to detect instance failures that occur once they begin working on a specific transaction assigned by the TP monitor. Instance failures that occur before a transaction starts usually are indicated by one of the following errors:

ORA-1033: ORACLE initialization or shutdown in progress
ORA-1034: ORACLE not available
ORA-1089: immediate shutdown in progress--no operations are permitted

If the database connection is lost during a transaction, then the application server will typically get one of the following errors:

ORA-3113: end-of-file on communication channel
ORA-3114: not connected to ORACLE
ORA-1092: ORACLE instance terminated--disconnection forced

Application servers can be programmed to check for these errors and to reconnect to a predetermined backup database when these errors occur. Application servers then can resubmit any failed transaction. The actual mechanisms used to implement application failover vary between different implementations of three-tier architecture. The specific mechanisms used depend on the capability and features of the TP monitor and of the application server. Figure 12-5 illustrates a failover situation in a three-tier architecture. One instance has failed, and the application servers have connected to the backup instance.


Figure 12-5. Failover in a three-tier architecture
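The error-trapping approach described above might look like the following in an application server. OracleError is a hypothetical wrapper for whatever exception type the database driver raises, and the instance list stands in for the predetermined backup. The error numbers are the ones listed in the text.

```python
from typing import Callable, Optional, Sequence

# Error numbers listed in the text.
BEFORE_TRANSACTION = {1033, 1034, 1089}  # instance down or shutting down
DURING_TRANSACTION = {3113, 3114, 1092}  # connection lost mid-transaction
FAILOVER_ERRORS = BEFORE_TRANSACTION | DURING_TRANSACTION


class OracleError(Exception):
    """Hypothetical exception carrying an ORA- error number."""

    def __init__(self, code: int) -> None:
        super().__init__(f"ORA-{code:05d}")
        self.code = code


def run_with_failover(
    transaction: Callable[[str], str],
    instances: Sequence[str],
) -> str:
    """Submit a transaction, moving to the next instance on failover errors.

    The first entry is the primary; the rest are predetermined backups.
    Errors not in FAILOVER_ERRORS are application errors and are re-raised.
    """
    last_error: Optional[OracleError] = None
    for instance in instances:
        try:
            return transaction(instance)  # (re)connect and (re)submit
        except OracleError as err:
            if err.code not in FAILOVER_ERRORS:
                raise
            last_error = err  # instance unusable: try the next one
    if last_error is None:
        raise ValueError("no instances given")
    raise last_error
```

One caution: if the connection is lost while the COMMIT itself is in flight, the application cannot tell from the error alone whether the transaction committed, so blind resubmission can apply it twice. Production code would need a way to check the outcome (for example, a unique transaction key) before resubmitting.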

The physical location of the different layers in a three-tier architecture may vary greatly. Sometimes, the middle tier and the database servers are all located on the same physical system. In other implementations, the TP monitors and application servers may be located on physically separate systems, in effect creating a four-tier environment. Often, the middle layer is deployed on more than one system in order to eliminate the TP monitor and application servers as a single point of failure. BEA’s Tuxedo is one of the most widely used TP monitors.

Failback

Failback refers to the process by which client applications that have failed over to a backup node are reconnected to the primary node after the database instance on that node has been successfully restarted. Failback involves disconnecting the client application from the backup instance at an appropriate time, so that ongoing transactions are not interrupted. The client application then is reconnected to the original instance.

In a two-tier architecture, failback is achieved when the application exits and restarts. When the application restarts, it is by default connected to the primary instance. In a three-tier architecture, the middle layer handles failback. Instead of restarting the middle-tier applications, which would disrupt a large number of users, a signal may be sent that causes the middle-tier applications to attempt a reconnect to the original database instance.
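A middle-tier failback handler can defer the switch to a transaction boundary, as sketched below. MiddleTierConnection and its methods are hypothetical. On Unix, request_failback could be wired to a handler registered with Python’s signal.signal (for example, for SIGUSR1), but the sketch keeps the trigger as a plain method call so it runs anywhere.

```python
import threading
from typing import Callable


class MiddleTierConnection:
    """Toy middle-tier connection holder that supports signalled failback.

    A failback request only sets a flag; the switch back to the primary
    happens before the next transaction, so work already running on the
    backup is not interrupted.
    """

    def __init__(self, primary: str, backup: str) -> None:
        self.primary = primary
        self.current = backup  # start out failed over to the backup
        self._failback_requested = threading.Event()

    def request_failback(self) -> None:
        """Called when the failback signal arrives; safe to call from another thread."""
        self._failback_requested.set()

    def run_transaction(self, work: Callable[[str], str]) -> str:
        """Run one unit of work, failing back first if a request is pending."""
        if self._failback_requested.is_set() and self.current != self.primary:
            # Transaction boundary: disconnect from the backup, reconnect to the primary.
            self.current = self.primary
            self._failback_requested.clear()
        return work(self.current)


if __name__ == "__main__":
    conn = MiddleTierConnection(primary="HADB1", backup="HADB2")
    print(conn.run_transaction(lambda inst: f"ran on {inst}"))  # ran on HADB2
    conn.request_failback()
    print(conn.run_transaction(lambda inst: f"ran on {inst}"))  # ran on HADB1
```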
