Chapter 11. Data Guard Best Practices

In many Data Guard installation cases, people may think that seeing the main functions of Data Guard running is enough for a successful deployment of Data Guard. In other words, if redo is being transferred from the primary database to a standby database(s), and is being applied on the standby, it's a smooth Data Guard configuration. However, if the configuration is prepared keeping in mind best practices, which is the topic of this chapter, it will be more robust, effective, and complete.

In this chapter, we will discuss the following topics:

  • Configuring connection failover
  • Archived log deletion policy on a standby database
  • Using flashback on a standby database
  • Database rolling upgrade using a transient logical standby
  • Corruption detection, prevention, and automatic repair with Oracle Data Guard

Let's start with configuring a connection failover in a Data Guard environment.

Configuring a connection failover

Building a configuration in which database clients are able to automatically connect to a new primary database after a role change is vital in Data Guard. If we skip this important aspect, it may be very hard to configure connections of all database users to the new primary database.

Now, let's learn about the important connection failover terms – Transparent Application Failover (TAF), Fast Connection Failover (FCF), and Fast Application Notification (FAN).

Transparent Application Failover (TAF)

TAF is a connection failover configuration of Oracle Call Interface (OCI) that is used for high-availability environments such as Oracle Data Guard, Oracle Real Application Clusters (RAC), and Oracle Fail Safe.

When using this configuration, clients can automatically establish a prespecified connection to the database after a failure of the database instance. In RAC, this means connecting to one of the surviving instances and in Data Guard it means connecting to the new primary database after failover.

We can configure TAF in two ways – client-side configuration and server-side configuration:

  • Client-side TAF configuration: The TNS connect string is configured to specify the failover details.
  • Server-side TAF configuration: The database service attributes are configured to specify the failover details. This method will be more effective when there are many client connections.

Tip

If both client- and server-side TAF configurations exist in the database environment, the server-side configuration properties will be valid regarding the order of precedence.

TAF will not only establish a new connection to the database, but also re-run a select statement and reposition the cursor if preferred. We can configure TAF only for establishing a new connection, which is called session failover, or for the recovery of the session and query, which is called select failover. With the select failover mode, the number of rows fetched by the cursor is tracked and when the connection is lost, it's recovered by TAF by repositioning the cursor. So, the client doesn't restart but resumes fetching rows. This is especially good for long-running, time-critical reports and data extractions.

Tip

TAF cannot failover inserts, updates, or deletes. The Oracle database rolls back these DML operations in case of a failure.

It's possible to monitor TAF properties of sessions using the V$SESSION dynamic performance view. The following query result will show the service name, failover type, failover method, and failover failovers, if occurred, for the clients connected to the database:

SQL> SELECT USERNAME, SERVICE, FAILOVER_TYPE, FAILOVER_METHOD, FAILED_OVER FROM V$SESSION;

Configuring the client-side TAF

The client-side TAF is configured using the TNS connection string. The following string is an example of a Data-Guard-enabled client-side TAF configuration where primary and standby databases are 11gR2 RAC with Single Client Access Names (SCAN) being used:

OLTP =  
  (DESCRIPTION =  
    (LOAD_BALANCE=OFF)                                    
    (FAILOVER=ON)
    (ADDRESS_LIST =  
      (ADDRESS = (PROTOCOL = TCP)(HOST = PRIMARY_SCAN)(PORT = 1521))  
      (ADDRESS = (PROTOCOL = TCP)(HOST = STANDBY_SCAN)(PORT = 1521))  
    )  
    (CONNECT_DATA =  
      (SERVICE_NAME = OLTP)  
      (SERVER = DEDICATED)  
        (FAILOVER_MODE =  
          (TYPE = session)  
          (METHOD = BASIC)  
          (RETRIES = 15)  
          (DELAY = 10)  
        ))))

Configuring the server-side TAF

In 11gR2, we configure the server-side TAF using Server Control Utility (SRVCTL). The srvctl add service command adds a new service, and the srvctl modify service command changes settings for a predefined service. The following is an example of creating a TAF-enabled database service on a RAC and Data Guard configuration. We must create this service both in the primary and standby hosts with the following commands:

  • For the primary cluster, use the following:
    srvctl add service -d ORCL_PRIM -s OLTP -r prim_node1,prim_node2 -l PRIMARY -q TRUE -e SESSION -m BASIC -w 10 -z 15
  • For the standby cluster, use the following:
    srvctl add service -d ORCL_STD -s OLTP -r std_node1,std_node2 -l    PRIMARY -q TRUE -e SESSION -m BASIC -w 10 -z 15

The following table lists the definitions of the command options:

Option

Description

-d

This gives a unique name for the database.

-s

This gives the service name.

-r

For RAC databases, this gives the list of preferred instances on which the service runs.

-l {[primary] | [physical_standby] |

[logical_standby] | [snapshot_standby]}

This gives the service role. Service is automatically started when the database is in this role.

-q {TRUE | FALSE}

This indicates whether AQ HA notifications should be enabled for this service.

-e {NONE | SESSION | SELECT}

This gives the failover type – session failover, select failover, or none.

-m {NONE | BASIC}

This gives the failover method. If the session failover or select failover type is selected, you should use BASIC for this option. NONE means TAF is disabled.

-z

This gives the number of failover retries.

-w

This gives the time delay between failover attempts.

If we're going to use the physical standby database with active Data Guard for reporting, we should create a service for this purpose on both primary and standby hosts. For example:

  • For the primary cluster, create the following service:
    srvctl add service -d ORCL_PRIM -s REPORTING -r prim_node1,prim_node2 -l PHYSICAL_STANDBY -q TRUE -e SESSION -m BASIC -w 10 -z 15
  • For the standby cluster, create the following service:
    srvctl add service -d ORCL_STD -s REPORTING-r std_node1,std_node2 -l    PHYSICAL_STANDBY -q TRUE -e SESSION -m BASIC -w 10 -z 15

In this case, the service will be created at the cluster level but the service definition will not be applied to the standby database because it's read-only. For this reason, we run the DBMS_SERVICE.CREATE_SERVICE procedure for the REPORTING service on the primary database, and the service definition will be replicated to the standby database with Redo Apply.

EXECUTE DBMS_SERVICE.CREATE_SERVICE( service_name => 'reporting' network_name => 'reporting' goal => 'NULL' dtp => 'NULL' aq_ha_notifications => 'TRUE' failover_method => 'BASIC' failover_type => 'SESSION' failover_retries => 15 failover_delay => 10 clb_goal => 'NULL'),

Using the previous server-side TAF configuration (the OLTP and REPORTING services), it's not necessary to configure TAF at the client side in the tnsnames.ora file. The following TNS entry is an example that can be used to connect the OLTP service:

OLTP= 
  (DESCRIPTION_LIST=
  (LOAD_BALANCE=OFF) 
  (FAILOVER=ON) 
(DESCRIPTION= 
  (CONNECT_TIMEOUT=3)(TRANSPORT_CONNECT_TIMEOUT=2)(RETRY_COUNT=3) 
    (ADDRESS_LIST=
   (LOAD_BALANCE=ON)
   (ADDRESS=(PROTOCOL=TCP)(HOST=PRIMARY_SCAN)(PORT=1521))) 
    (CONNECT_DATA=(SERVICE_NAME=OLTP)))
(DESCRIPTION= 
  (CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3) 
    (ADDRESS_LIST=
   (LOAD_BALANCE=ON) 
(ADDRESS=(PROTOCOL=TCP)(HOST= STANDBY_SCAN)(PORT=1521))) 
    (CONNECT_DATA=(SERVICE_NAME=OLTP))))

In this TNS entry, both the primary and standby SCAN hostnames are involved. Just below DESCRIPTION_LIST, we can see LOAD_BALANCE=OFF. This means that the client will try to connect the DESCRIPTION definitions in order. If it can't connect to the primary database, it'll try to connect to the standby database. However, below the DESCRIPTION definitions, we see LOAD_BALANCE=ON. This is about the connection to the RAC database and new connections are going to be assigned to three IP addresses of the SCAN name randomly.

For each DESCRIPTION definition, we can see some TNS string parameters set. CONNECT_TIMEOUT specifies the total time to establish an Oracle net connection to a database. It includes the TRANSPORT_CONNECT_TIMEOUT value, which is the time taken by a client to establish a TCP connection to the database server. It's possible to set the CONNECT_TIMEOUT value globally for a database instance in the sqlnet.ora file using the SQLNET.OUTBOUND_CONNECT_TIMEOUT parameter. Also, we can set the TCP.CONNECT_TIMEOUT parameter for a global TRANSPORT_CONNECT_TIMEOUT value. The last parameter, RETRY_COUNT, specifies the maximum number of connection attempts for the DESCRIPTION definition.

In this configuration, the following algorithm will be applied when clients connect to a database using the OLTP service:

  1. The PRIMARY_SCAN hostname is resolved to three IP addresses.
  2. One of the IP addresses is randomly selected and a connection attempt is performed.
  3. If the IP address doesn't respond in the time we set in TRANSPORT_CONNECT_TIMEOUT, which is two seconds, or the IP address responds but a connection can't be established in three seconds (CONNECT_TIMEOUT), it'll try the next IP address. There will be a maximum of three retry attempts because of the RETRY_COUNT setting.
  4. When the client can't connect to the primary database with the first DESCRIPTION definition, it'll try to connect the second DESCRIPTION definition, which is the standby database.
  5. The STANDBY_SCAN hostname will be resolved to three IP addresses.
  6. One of the IP addresses is randomly selected and an attempt for a connection is performed. The same settings are defined for the standby database description.

Note that automatically controlling the startup of services by assigning a role to the service with SRVCTL is an 11gR2 feature. In earlier releases, we can create a trigger to ensure that the service is started only for the specified database role, such as the following example:

create trigger TAF_TRIGGER after startup on database
declare
 db_role varchar(30);
begin
 select database_role into db_role from v$database;
 if db_role = 'PRIMARY' then
 DBMS_SERVICE.START_SERVICE('OLTP'),
 else
 DBMS_SERVICE.STOP_SERVICE('OLTP'),
 end if;
end;
/

Fast Connection Failover (FCF)

Fast Connection Failover is the equivalent of Transparent Application Failover for Java Database Connectivity (JDBC) clients. TAF works for OCI clients and FCF works for JDBC clients.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.223.206.69