High availability
IBM z Systems is an excellent choice for installations where high availability (HA) is a key criterion in selecting the hardware infrastructure for Oracle database deployments. This chapter shows how the highly regarded HA characteristics of IBM z Systems complement installations that run the Oracle Database and use Oracle's specific HA features.
In this chapter, we describe many of the Oracle Database options that are available to customers, either as a standard feature or as a separately licensed option. For more information about all of the Oracle options, see Oracle's documentation or online resources.
We also describe the Linux on IBM z Systems features that play an important role for clients who are looking for a hardware server with a robust set of characteristics in this domain.
Finally, we provide examples of several scenarios where the Oracle Database running on Linux on IBM z Systems can provide an excellent enterprise-ready HA solution.
This chapter includes the following topics:
6.1, "Oracle HA options"
6.2, "HA building blocks for Oracle Database on IBM z Systems"
6.3, "IBM z Systems with Oracle Database: Considerations and examples for HA"
6.4, "Summary and recommendations"
6.1 Oracle HA options
The topic of HA is important for installations that run databases that process critical applications for the organizations they serve. The subject affects almost every data center and cloud environment in the IT industry. HA is especially important for customers who choose Oracle Database as their software in this area and have (or are considering) IBM z Systems as their infrastructure hardware.
The available Oracle HA options are listed in Table 6-1.
Table 6-1 Oracle HA options
HA/DR feature | Benefit | Cost | Level of availability
Real Application Clusters One Node | Automated failover (active/passive) for planned and unplanned outages | Oracle optional licensed feature | High Availability with minimal disruption to the database (seconds)
Real Application Clusters | Continuous availability of the cluster | Oracle optional licensed feature | High Availability with no disruption to the database
Application Continuity | Protects applications from database session failures by replaying in-flight transactions | Licensed (included with RAC, RAC One Node, and Active Data Guard) | Disaster Recovery
Edition-Based Redefinition (EBR) | Enables online application upgrades with uninterrupted availability of the applications | Included with Enterprise Edition | High Availability for maintenance
Oracle Flashback | Provides efficient and quick error correction (logical errors and table rows) | Included with Enterprise Edition | High Availability for error correction
Online Reorganization and Redefinition | Provides flexibility to modify a table's physical attributes, while allowing users full access to the database | Included with Enterprise Edition | High Availability for database maintenance
Data Guard/Active Data Guard | Disaster-recovery solution that provides fast automatic failover in the case of database or node failures | Data Guard included with Enterprise Edition; Active Data Guard requires a license | Disaster Recovery
GoldenGate | Provides real-time data replication for heterogeneous environments | Licensed product | Disaster Recovery and Migration/Replication Services
Global Data Services (GDS) | Provides region-based workload routing, and replication lag-based workload routing for Active Data Guard | Included with GoldenGate or Active Data Guard | Disaster Recovery
Storage-based replication solutions | Uses disk consistency groups for databases and applications to provide a simplified DR approach | Replication software cost (non-Oracle products) | Disaster Recovery
Oracle Site Guard | Enables administrators to automate complete site switchover or failover | Included with Data Guard | Disaster Recovery
Recovery Manager (RMAN) | Provides a foundation for backing up and recovering the Oracle database | Included with Oracle Database Enterprise Edition | High Availability for backups
Secure Backup (OSB) | Delivers centralized disk and backup management | Included with Oracle Database Enterprise Edition | High Availability for backups
Oracle Multitenant | Consists of a container database (CDB) for the metadata and pluggable databases (PDBs) that contain the application database (and can be unplugged and plugged into different CDBs) | Optional licensed feature of the database | High Availability
Table 6-1 provides a general overview of the Oracle Database HA features that might be used with IBM z Systems.
Oracle Maximum Availability Architecture (MAA) is a set of best practices, developed by Oracle, that help with building an HA architecture. It uses Oracle HA and disaster recovery technologies. MAA is designed to provide data protection and availability by minimizing or eliminating planned and unplanned downtime at all technology layers, including hardware and software components. It delivers protection and HA for many types of failure, from hardware failures that cause data corruption to human error and software malfunctions. It also includes HA features to help an installation when nature-related disasters affect a broad geographic area.
Oracle provides a comprehensive and integrated set of HA technologies that enable rapid recovery from failures and minimize planned downtimes. The focus of this section is on HA options that are related to the Oracle Database.
6.1.1 Oracle Real Application Clusters (RAC)
Oracle Real Application Clusters (RAC) is a cluster database with a shared cache architecture that provides highly scalable and available database solutions for all business applications.
It was introduced with Oracle Database 9i and is one of the key HA options that installations can choose when running Oracle Database. RAC is a separately licensed option of Oracle Database Enterprise Edition.
Oracle RAC on Linux on IBM z Systems runs database instances in multiple Linux guest operating systems, which helps limit application downtime. Every quarter, Oracle provides patch set updates with important security and database fixes. Oracle RAC/RAC One Node application downtime can be minimized by patching one system at a time (rolling), while the other Linux guest systems continue to run applications.
6.1.2 Oracle RAC One Node
Oracle RAC One Node is a single instance of an Oracle RAC-enabled database that is running on one node in a cluster. It is an HA solution that runs a database on one node of a cluster at a lower price point than full Oracle RAC.
RAC One Node provides automatic failover capability to other nodes in the cluster. Typically, planned maintenance operations can be completed with minimal disruption to application users by relocating the application Virtual IP addresses (VIPs) while the instances are active and running.
Technologies such as transparent application failover (TAF) can be used to seamlessly move an active, running SQL statement from one Oracle RAC One Node instance to another node without disruption to the user.
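As an illustration only, the following tnsnames.ora entry sketches how TAF might be enabled on the client side; the host, port, and service names are placeholders, not values from any configuration described here:

ORCL_TAF =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac-scan.example.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = orcl_svc)
      (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 30)(DELAY = 5))
    )
  )

With TYPE=SELECT, an in-flight query is resumed on the surviving node. Committed transactional state is not replayed by TAF; that is the role of Application Continuity, which is described in 6.1.8, "Application Continuity".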
One of the benefits of Oracle RAC One Node is that applications that might not be RAC friendly are not affected as much as in a full multi-node RAC cluster solution. For example, an insert-intensive application that uses many non-cached sequences might perform better running on only one node than when distributed across multiple nodes in a RAC cluster.
Another benefit of Oracle RAC One Node is that a RAC One Node database can be converted to full RAC (and vice versa) by using the srvctl convert command.
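For illustration, the following commands sketch the conversion in both directions. The database and instance names are placeholders, and the exact option names vary slightly between 11gR2 and 12c:

# Convert a RAC One Node database to a full (multi-instance) RAC database
srvctl convert database -db orcl -dbtype RAC

# Convert it back to RAC One Node, naming the single active instance
srvctl convert database -db orcl -dbtype RACONENODE -instance orcl_1

# 11gR2 uses the short option form, for example: srvctl convert database -d orcl -c RAC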
6.1.3 Oracle Clusterware
Oracle Clusterware provides the foundation for running Oracle RAC/RAC One Node.
A cluster consists of multiple interconnected computers or servers that appear as though they are one server to users and applications. Oracle RAC uses Oracle Clusterware for the infrastructure to bind multiple servers so that they operate as a single system.
In addition to being the integrated foundation for Oracle RAC, Oracle Clusterware 12c Release 1 can be used to deliver HA for other applications. Oracle Clusterware also depends on interconnect technology for rapid dissemination of information between the different Oracle instances.
6.1.4 Edition-based redefinition
Edition-based redefinition (EBR) is an Oracle Database feature that was originally introduced with Oracle Database 11gR2. It enables online application upgrades with uninterrupted availability of the applications. When the installation of an upgrade is complete, the pre-upgrade application and the post-upgrade application can be used simultaneously. Therefore, a session can continue to use the pre-upgrade application until its user decides to end it, and all new sessions can use the post-upgrade application. When no sessions are using the pre-upgrade application any longer, it can be retired. EBR enables hot rollover from a pre-upgrade version to a post-upgrade version, with no downtime.
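The following SQL*Plus sketch illustrates the basic EBR flow, assuming a hypothetical application schema named APPUSER; the edition and object names are placeholders:

-- Allow the application schema to own editioned objects
ALTER USER appuser ENABLE EDITIONS;

-- Create the post-upgrade edition as a child of the current edition
CREATE EDITION release_2 AS CHILD OF ora$base;
GRANT USE ON EDITION release_2 TO appuser;

-- Install the new versions of editionable objects (PL/SQL, views, synonyms)
-- while existing sessions continue to run against the pre-upgrade edition
ALTER SESSION SET EDITION = release_2;
-- CREATE OR REPLACE PROCEDURE appuser.place_order ...  (new version installed here)

-- When the upgrade is validated, make the new edition the database default
ALTER DATABASE DEFAULT EDITION = release_2;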
6.1.5 Online Reorganization and Redefinition
The Online Reorganization and Redefinition feature that is available in Oracle Database 12c provides the flexibility to modify a table’s physical attributes and transform data and table structure, while allowing users full access to the database. This capability improves data availability, query performance, response time, and disk space utilization.
During an application upgrade, administrators can start the redefinition process and at intervals synchronize the interim table so that it includes the latest changes to the original table. The advantage is that the amount of time to complete the final redefinition step is reduced. Also, administrators can validate and use the data in the interim table before completing the redefinition process.
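As a sketch of that flow, the following PL/SQL uses the DBMS_REDEFINITION package against a hypothetical APP.ORDERS table and a pre-created interim table ORDERS_INTERIM; all names are placeholders:

DECLARE
  v_errors PLS_INTEGER;
BEGIN
  -- Verify that the table can be redefined online by using its primary key
  DBMS_REDEFINITION.CAN_REDEF_TABLE('APP', 'ORDERS', DBMS_REDEFINITION.CONS_USE_PK);

  -- Start the redefinition: the data is copied into the interim table
  DBMS_REDEFINITION.START_REDEF_TABLE('APP', 'ORDERS', 'ORDERS_INTERIM');

  -- Copy dependent objects (indexes, constraints, triggers, privileges)
  DBMS_REDEFINITION.COPY_TABLE_DEPENDENTS('APP', 'ORDERS', 'ORDERS_INTERIM',
      copy_indexes => DBMS_REDEFINITION.CONS_ORIG_PARAMS, num_errors => v_errors);

  -- Periodically apply changes made to the original table since the copy
  DBMS_REDEFINITION.SYNC_INTERIM_TABLE('APP', 'ORDERS', 'ORDERS_INTERIM');

  -- Final, brief switch of the table definitions
  DBMS_REDEFINITION.FINISH_REDEF_TABLE('APP', 'ORDERS', 'ORDERS_INTERIM');
END;
/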
6.1.6 Oracle Flashback
Oracle Database Flashback Technologies are a set of data recovery solutions that allow reversing mistakes by selectively undoing the effects of a previous error. Flashback provides efficient and quick error correction. Flashback supports recovery at all levels, including the row, transaction, table, and entire database. It also provides a growing set of features to view and rewind data back and forth in time.
Oracle Flashback Database is a data protection feature that enables a user to rewind data back in time to correct any problems that are caused by logical data corruption or user errors within a designated time window. Oracle Flashback provides an efficient alternative to point-in-time recovery and does not require a backup of the database to be restored first.
Flashback Database and restore points are effective in traditional database recovery situations and can also be useful during database upgrades, application deployments, and testing scenarios when test databases must be quickly created and re-created. Flashback Database also provides an efficient alternative to rebuilding a failed primary database after a Data Guard failover.
 
Note: Oracle Flashback does not replace a regular database backup and recovery procedure.
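The following SQL*Plus sketch shows one common use of these features: a guaranteed restore point that is taken before a risky change, and a rewind if the change must be backed out. The restore point name is a placeholder, and the database must already meet the flashback prerequisites, such as ARCHIVELOG mode:

CREATE RESTORE POINT before_app_change GUARANTEE FLASHBACK DATABASE;

-- ... apply the application or database change ...

-- If the change must be backed out, rewind the whole database
SHUTDOWN IMMEDIATE
STARTUP MOUNT
FLASHBACK DATABASE TO RESTORE POINT before_app_change;
ALTER DATABASE OPEN RESETLOGS;

-- Otherwise, drop the restore point so that the flashback logs can be released
DROP RESTORE POINT before_app_change;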
6.1.7 Oracle Data Guard/Active Data Guard
Data Guard is an Oracle database offering that provides the management, monitoring, and automation software to create and maintain one or more synchronized copies of a production database. It also provides HA for mission critical applications.
It is a HA and disaster-recovery solution that provides fast automatic failover, in the case of database failures, node failures, corruption, and media failures. The standby databases can be used for read-only access, reporting purposes, and testing and development purposes. Data Guard is included with Oracle Database Enterprise Edition.
Active Data Guard is an optional license component for Oracle Database Enterprise Edition. Active Data Guard adds advanced capabilities to extend basic Data Guard functionality by allowing for databases to be opened at the disaster recovery (DR) site for read-only access use by applications.
Oracle Far Sync allows for zero data loss at any distance by first replicating the transaction logs synchronously to a geographically close intermediate DR site. That nearby site's data is then replicated asynchronously to a standby DR site at a much greater distance.
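A Data Guard configuration is often managed through the Data Guard broker. The following DGMGRL sketch assumes a hypothetical primary database (prim) and physical standby (stby) for which redo transport is already prepared; the names and connect identifiers are placeholders:

DGMGRL> CREATE CONFIGURATION 'dg_cfg' AS
          PRIMARY DATABASE IS 'prim' CONNECT IDENTIFIER IS prim;
DGMGRL> ADD DATABASE 'stby' AS CONNECT IDENTIFIER IS stby MAINTAINED AS PHYSICAL;
DGMGRL> ENABLE CONFIGURATION;
DGMGRL> SHOW CONFIGURATION;
DGMGRL> SWITCHOVER TO 'stby';

SWITCHOVER TO performs a planned role reversal between the primary and standby; FAILOVER TO is the equivalent command after a primary failure.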
6.1.8 Application Continuity
Application Continuity (AC) protects applications from database session failures that are caused by a failure of an instance, server, storage, network, or any other related component, and even a complete database failure. Application Continuity replays affected "in-flight" requests so that the failure appears to the application as a slightly delayed execution, masking the outage from the user.
AC is a feature that is available with Oracle RAC, Oracle RAC One Node, and Oracle Active Data Guard.
If an entire Oracle RAC cluster fails, which makes the database unavailable, Application Continuity replays the session, including the transaction, following an Oracle Active Data Guard failover. Use of Application Continuity with a standby database requires Data Guard Maximum Availability mode (zero data loss) and Data Guard Fast Start Failover (automatic database failover).
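Application Continuity is typically enabled on a database service. The following srvctl sketch creates a hypothetical AC-enabled service on a two-node RAC database; the names and values are placeholders, and the option names can differ slightly by release:

srvctl add service -db orcl -service oltp_ac \
    -preferred orcl1 -available orcl2 \
    -failovertype TRANSACTION -commit_outcome TRUE \
    -replay_init_time 1800 -retention 86400 -notification TRUE
srvctl start service -db orcl -service oltp_ac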
6.1.9 Oracle GoldenGate
Oracle GoldenGate is a software package for real-time data integration and replication in heterogeneous IT environments. It also enables HA solutions, real-time data integration, transactional change data capture, data replication, transformations, and verification between operational and analytical enterprise systems.
Oracle GoldenGate 12c is the newest version of this Oracle product. While maintaining excellent performance with simplified configuration and management, it features a tighter integration with Oracle Database, support for cloud environments, expanded heterogeneity, and enhanced security.
6.1.10 Global Data Services
Oracle GoldenGate and Oracle Active Data Guard allow for distribution of application workloads with replicated databases. When applications are spread across multiple databases in multiple data centers, it can sometimes be challenging to efficiently optimize databases with the best performance based on server loads and network latencies.
Oracle Global Data Services (GDS) provides the following key capabilities for a set of replicated databases that are globally distributed or located within the same data center:
Region-based workload routing
Connect-time load balancing
Runtime load balancing advisory for Oracle-integrated clients
Inter-database service failover
Replication lag-based workload routing for Active Data Guard
Role-based global services for Active Data Guard
Centralized workload management framework
Oracle GDS/Global Service Manager (GSM) is available for Linux running on IBM z Systems, starting with Oracle 12c. Oracle GDS is included with an Active Data Guard or a GoldenGate replication license.
6.1.11 Storage-based replication
Oracle databases typically use the principle of dependent-write I/O through a series of synchronized writes to the data files and logs. No dependent write is issued until the predecessor write completes. This behavior allows the database to be restorable if there is a power failure.
For an Oracle consistent backup/DR solution in "hot" or online mode, the database must be synchronized with the online database logs. This synchronization is typically done by using the ALTER DATABASE BEGIN BACKUP and ALTER DATABASE END BACKUP commands. When the BEGIN BACKUP command is issued, the data blocks for the tablespaces are flushed to disk and the data file headers are updated with the last checkpoint System Change Number (SCN).
While the database remains in backup mode, Oracle does not update the checkpoint SCN in the data file headers. When these files are copied, the non-updated SCN signifies to the database that recovery is required.
For a storage-based "consistent" Oracle replication strategy, coordination between the storage-based disk backup solution and the Oracle database is needed.
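A minimal sketch of that coordination, run on the database server as the Oracle software owner, looks like the following example. The storage snapshot step is vendor-specific and is indicated only by a placeholder comment:

#!/bin/bash
# Put the database into hot backup mode before the storage snapshot
sqlplus -s "/ as sysdba" <<'EOF'
ALTER DATABASE BEGIN BACKUP;
EXIT
EOF

# <invoke the storage subsystem CLI here to snapshot/FlashCopy the
#  consistency group that contains the data file and log volumes>

# Take the database out of backup mode and archive the current log
sqlplus -s "/ as sysdba" <<'EOF'
ALTER DATABASE END BACKUP;
ALTER SYSTEM ARCHIVE LOG CURRENT;
EXIT
EOF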
Oracle Data Guard is the only certified and supported way (by Oracle) to protect Oracle database files from Database block corruption. If an Oracle data block is corrupted on disk, a storage-based replication can replicate the corrupted blocks to the DR site.
Disk-based storage replication is supported for replicating Linux OS file systems, Oracle binaries, and any non-database file systems that are required to be kept in sync with the database.
6.1.12 Oracle Site Guard
Oracle Site Guard is a disaster-recovery solution that enables administrators to automate complete site switchover or failover. It orchestrates the coordinated failover of Oracle Data Guard with Application Servers, which can be integrated with a storage-based replication technique.
Oracle Site Guard integrates with underlying replication mechanisms that synchronize primary and standby environments and protect mission critical data. It includes built-in support for Oracle Data Guard for Oracle database. Oracle Site Guard supports typical storage replication solutions that are used with IBM z Systems but requires some scripting to integrate with a SAN storage-based replication solution.
6.1.13 Oracle Recovery Manager
Oracle Recovery Manager (RMAN) provides a comprehensive foundation for efficiently backing up and recovering the Oracle database. Providing block-level corruption detection during backup and restore, RMAN optimizes performance and space consumption during backup with file multiplexing and backup set compression. It works with Oracle Secure Backup and third-party media management products for tape backup.
RMAN handles all underlying database procedures before and after a backup or restore, which removes the dependency on operating system and SQL*Plus scripts.
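A minimal RMAN sketch of a full backup with block checking, using placeholder settings, looks like the following example:

rman target / <<'EOF'
CONFIGURE CONTROLFILE AUTOBACKUP ON;
# Back up the database and archived logs; blocks are checked as they are read
BACKUP AS COMPRESSED BACKUPSET DATABASE PLUS ARCHIVELOG;
# Optionally validate the database for physical and logical corruption
BACKUP VALIDATE CHECK LOGICAL DATABASE;
EOF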
6.1.14 Oracle Secure Backup
Oracle Secure Backup (OSB) delivers centralized disk and backup management for the entire IT environment. It consists of the following offerings, both of which are integrated with Oracle RMAN:
Centralized backup management and high-performance and heterogeneous data protection in distributed UNIX, Linux, Windows, and network-attached storage (NAS) environments.
An OSB Cloud Module that provides integrated Oracle Database backup to Amazon S3 cloud (Internet) storage.
6.1.15 Oracle Multitenant
Oracle Database 12c offers a new separately licensed option that is named Oracle Multitenant. This option provides simplified consolidation that requires no changes to your applications. In this new architecture, a multitenant container database can hold many pluggable databases. An administrator manages the multitenant container database, but application code connects to one pluggable database.
Oracle Multitenant consists of a container database (CDB) which features all of the metadata for the Oracle system. No application data schemas are configured in the CDB.
The pluggable database (PDB) contains the application-specific database, and can be unplugged and plugged into different CDB databases.
Oracle Multitenant allows the management of many databases as one and can provide several benefits from an administration perspective. For example, in a multiple PDB configuration, many pluggable databases can be backed up with one Oracle RMAN configuration, which reduces the time that is necessary to back up many databases if they were independent.
Oracle Multitenant in a HA scenario allows for the ability to unplug and move a pluggable database from one container database to another. This feature is useful in a HA environment where a pluggable database is moved to another CDB running on another guest.
In the case of an Oracle upgrade, database upgrade downtime is reduced because the schema updates are made mostly to the system tablespaces, which are a component of the CDB, and not to the metadata components of the PDB. For example, unplugging a PDB from a 12.1.0.1 CDB and plugging it into a 12.1.0.2 CDB reduces downtime for an application compared to directly upgrading a database from 12.1.0.1 to 12.1.0.2.
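The unplug/plug operation itself is a short sequence of SQL statements. The following sketch moves a hypothetical PDB named SALESPDB between two CDBs that share storage; the names and paths are placeholders:

-- On the source CDB
ALTER PLUGGABLE DATABASE salespdb CLOSE IMMEDIATE;
ALTER PLUGGABLE DATABASE salespdb UNPLUG INTO '/u01/app/oracle/salespdb.xml';

-- On the target CDB (same storage, so the data files are reused in place)
CREATE PLUGGABLE DATABASE salespdb USING '/u01/app/oracle/salespdb.xml'
  NOCOPY TEMPFILE REUSE;
ALTER PLUGGABLE DATABASE salespdb OPEN;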
6.2 HA building blocks for Oracle Database on IBM z Systems
This section describes how Oracle HA options interact with Linux running on IBM z Systems. The many Oracle HA options are an ideal complement for the superb HA features that are provided by IBM z Systems.
Figure 6-1 shows a complete HA scenario that consists of the following major components:
The Oracle Database and its HA options
The Linux operating system
IBM z Systems
Figure 6-1 Building blocks of an HA environment for Oracle Database on Linux for IBM z Systems
6.2.1 Hardware provided HA
IBM z Systems provides very high reliability through various redundant hardware options. For example, IBM z Systems has transparent CPU sparing: if a CPU encounters a hardware failure, a spare CPU can be brought online quickly without affecting the applications or database.
IBM z Systems has Redundant Array of Independent Memory (RAIM) memory modules. RAIM memory prevents a memory module from affecting the availability of a system or application.
In addition, IBM z Systems includes numerous other RAS features. The following hardware-provided HA features are a standard offering:
Concurrent maintenance: With a minimum of two processor books, processor hardware parts can be replaced without a power-on reset while operations continue normally.
Chip sparing in memory: An error detection and correction mechanism in IBM z Systems that allows error detection during instruction execution and transparent error correction when spare processors are configured in the system.
N+1 power supplies: Two sets of redundant power supplies. Each set of power supplies has its individual power cord or pair of power cords, depending on the number of Bulk Power Regulator (BPR) pairs that are installed.
6.2.2 Operating system HA
Several operating systems can run on IBM z Systems. In this section, we describe those systems that are relevant for an environment in which Oracle Databases are deployed.
Linux clustering
Red Hat Enterprise Linux and SUSE Linux Enterprise Server are the only two Linux distributions that are certified by Oracle to run its database on IBM z Systems. Oracle Clusterware works by moving application VIPs between Linux guest nodes. Several Linux operating system-based solutions can also be used to provide HA for applications.
SUSE Linux Enterprise High Availability Extension, Sine Nomine's High Availability Option (HAO) for Red Hat, the Red Hat High Availability and Resilient Storage Add-Ons (as of RHEL 7 Update 2), and IBM Tivoli System Automation can all be configured to provide high availability for business-critical applications.
These Linux-based HA solutions work in a similar manner, whereby Oracle Database is configured to run on a Linux guest. If planned maintenance or an unplanned system event occurs, the HA solution fails over to another idle Linux guest.
The Linux HA solution shuts down the database and unmounts the Oracle file systems on the source system. Next, it mounts the necessary Oracle file systems on the failover server, starts the application VIPs there, and then starts the Oracle listener on the VIP address. The last step is to start the database on the failover Linux guest so that applications can connect to the failover server.
Depending on activity load in the database and other factors, a failover can be as quick as a few seconds to the new server, or up to several minutes if many insert or update operations require rolling back.
Hypervisor (z/VM)
IBM z/VM offers a base to use IBM virtualization technology on IBM z Systems. Fully tested and deployed in Oracle development environments, Oracle products are certified with z/VM.
IBM z/VM provides a highly secure and scalable enterprise cloud infrastructure. It also provides an environment for efficiently running multiple diverse critical applications on IBM z Systems with support for more virtual servers than any other platform in a single footprint.
IBM z/VM virtualization technology is designed to allow for the capability to run hundreds to thousands of Linux servers on a single IBM z Systems footprint.
IBM Wave for z/VM helps dramatically simplify administration and management of z/VM and virtual Linux servers that are running on IBM z Systems. IBM Wave integrates seamlessly with z/VM and enterprise Linux environments to help administrators view, organize, and manage resources in an optimized and standardized manner.
6.2.3 Oracle provided HA
Oracle Clusterware provides HA for an application by transferring the applications connectivity information with VIP. When a node goes down (planned or unplanned), the application connectivity information or VIP is moved to another node in the cluster to accept the application’s connections.
When the VIP is moved from one Linux host to another, Oracle broadcasts the new IP address and the newly associated media access control (MAC) address to the network by using an ARP broadcast.
Applications can seamlessly route database connections to the new server without incurring downtime by connecting to the highly available VIP, as opposed to a hardcoded Linux server IP address.
Oracle Clusterware interfaces directly with the hardware technology available on IBM z Systems.
6.3 IBM z Systems with Oracle Database: Considerations and examples for HA
In this section, we provide some considerations and examples of HA.
6.3.1 Networking
IBM z Systems has several options for designing the network between logical partitions (LPARs) and other servers on a network.
An Open System Adapter (OSA) is a physical network card in IBM z Systems. An OSA can be dedicated to an Oracle Linux guest, shared among multiple Linux guests, or configured in a virtual switch (VSwitch) configuration. VSwitches are beneficial for systems with many Linux guests sharing the network infrastructure.
A HiperSocket is a high-speed, low-latency memory-to-memory network that runs internally within IBM z Systems. IBM HiperSockets™ are advantageous for workloads in which a large amount of data must be transferred quickly between LPARs, provided that there is enough CPU capacity to support the network transfers.
Figure 6-2 shows some of these network concepts to consider in a network design for development, test, and production environments.
Figure 6-2 IBM z Systems network considerations
6.3.2 Oracle HA networking options
IBM z Systems includes the following options for supporting the Oracle interconnect between nodes in an Oracle RAC or Oracle RAC One Node cluster configuration:
Link Aggregation: (active/active) Allows up to eight OSA-Express adapters to be aggregated using a VSwitch type of configuration.
Linux Bonding: Linux provides the HA by using two or more Linux interfaces in an active/backup or active/active configuration. For example, Linux interfaces eth1 and eth2 are configured to create a highly available bonded interface that is named bond0 (a minimal configuration sketch follows this list).
Oracle HA IP (HAIP): With Oracle 11gR2 and later, up to four private interconnect interfaces can be configured, and Oracle's HAIP functionality balances the network load across the Oracle RAC/RAC One Node interconnect network interfaces.
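As a minimal sketch of the Linux bonding option, the following commands create an active-backup bond from two guest interfaces on RHEL 7 with NetworkManager. The interface names, connection names, and IP address are placeholders; SUSE Linux Enterprise Server uses wicked/YaST for the equivalent configuration:

# Create the bond and enslave eth1 and eth2 to it
nmcli connection add type bond con-name bond0 ifname bond0 mode active-backup
nmcli connection add type bond-slave con-name bond0-eth1 ifname eth1 master bond0
nmcli connection add type bond-slave con-name bond0-eth2 ifname eth2 master bond0

# Assign a static address on the private interconnect VLAN and activate the bond
nmcli connection modify bond0 ipv4.method manual ipv4.addresses 192.168.10.11/24
nmcli connection up bond0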
6.3.3 Oracle RAC Interconnect z/VM Link Aggregation
Figure 6-3 shows a typical IBM z Systems topology with multiple Oracle RAC clusters that use multiple VSwitches with Link Aggregation.
Figure 6-3 Oracle Interconnect with z/VM Link Aggregation
6.3.4 Oracle RAC interconnect considerations
The design of the Oracle RAC/OneNode interconnect is vital to the availability of an Oracle cluster. Oracle requires that the Cluster interconnect is configured on a private, dedicated LAN or VLAN (tagged or untagged), which is non-routable and isolated from other non-interconnect traffic.
The benefit of a VSwitch approach is that multiple RAC clusters can share the network infrastructure. It is strongly recommended to have a minimum of two VSwitches for an Oracle RAC configuration (public and private).
IBM z Systems environments that run in two separate LPARs can use multiple HiperSocket interfaces, which can be load balanced with Oracle’s HAIP load balancing capability.
For environments that are CPU bound or cluster interconnect sensitive, the dedicated OSA cards approach (outside of a VSwitch) works well to ensure that the cluster interconnect continues to operate efficiently during periods of high CPU load.
6.3.5 I/O channel failover considerations
IBM z Systems supports multiple logical channel subsystem (LCSS) images that are mapped onto a physical channel subsystem (see Figure 6-4).
Figure 6-4 Example of an I/O channel failover architecture
Consider the following points:
The physical hardware, such as FICON channels and Open System Adapter (OSA) cards, is shared by using the logical channel architecture.
The processors that run the channel subsystem are called the System Assist Processors (SAP). More than one SAP can be running the channel subsystem. The SAP drives the z Systems I/O channel subsystem, which serves a collection of more than 1,000 high-speed buses.
The SAP relieves the operating system and the general-purpose CPs of much of the work that is required to run I/O operations.
Specifically, the SAP schedules and starts each I/O request; that is, it locates an available channel path to the requested I/O device and starts the I/O operation. The SAP does not handle the actual data movement between central storage (CS) and the channel.
Channels are the communication path from the CSS to the control units and I/O devices. They are represented by the black rectangles in Figure 6-4: One for FICON, one for OSA.
A Channel Path Identifier (CHPID) is a value that is assigned to each channel path and uniquely identifies that path. A maximum of 256 CHPIDs can be defined in an LCSS.
Within the physical I/O subsystem, channel paths and I/O adapters (for example, Open System Adapter Ethernet cards) can be shared, potentially by all logical subsystems. As shown in Figure 6-4, the FICON channel is shared by all four LCSSs, and thus by all partitions.
If necessary (for example, for performance reasons), a path or an adapter can be restricted to one LCSS or a subset of the LCSSs, as shown with the OSA on the right in Figure 6-4, which can be used only by LCSS 4 and the LPARs that are shown at the top of Figure 6-4.
More information
In this section, we provide more information about the LCSS, its maximums in terms of the LPARs, channels, and physical devices that it supports, and why the LCSS concept is needed to map to the physical channel subsystem.
IBM z Systems has a unique channel architecture that is designed to provide powerful and flexible support for the most demanding I/O performance and high volume workloads. This channel architecture is managed through the foundation technology called LCSS. Each IBM z Systems can have up to four of these LCSSs, and each can support 15 LPARs. Each LPAR can address 256 data channels and 64,000 I/O devices. Therefore, a single IBM z Systems today can handle over 1,000 data channels and a quarter of a million I/O devices.
The LCSS’s architecture is further enhanced by the z Systems Multiple Image Facility. This Multiple Image Facility allows all 15 logical partitions that share a common LCSS to directly access each I/O device without having to forward the request through an intermediate partition, as is the case with UNIX architectures. The direct result of a shorter path length is improved I/O performance.
IBM z Systems Channel Spanning allows each device (disk or tape units that are attached by using Fibre Channel technology) to appear to be on any LCSS. The result is that it can be accessed by any partition on z Systems, which achieves greater I/O flexibility and simplified operational management.
IBM z Systems supports the following:
Up to four Logical Channel Subsystems (LCSS)
Up to 15 Logical Partitions (LPARs) per LCSS
Up to 256 Channels per LCSS
Up to 256 CHPIDs per LCSS
Up to 64 K Physical Devices per LCSS
Up to 1024 CHPIDs made available for entire system
Up to 1024 Physical Channels for entire system
Usually in computing, virtualization allows more logical resources than physical. In this case, the physical resources exceed the logical channel addressing ability of the LCSS.
The architecturally defined channel-path identification number, called the channel-path identifier (CHPID) must be maintained without change. The CHPID value is defined as an 8-bit binary number that results in a range of unique CHPID values 0 - 255.
Since the inception of the precursor S/370 XA channel-subsystem architecture in the late 1970s, this 8-bit CHPID was maintained without change because of its pervasive use in the IBM z/VM operating system.
For example, the CHPID value is maintained in many internal programming control blocks, is displayed in various operator messages, and is the object of various system commands, programming interfaces, and so on, all of which must be redesigned if the CHPID value was increased to more than an 8-bit number to accommodate more than 256 channel paths.
Therefore, another level of channel-path-addressing indirection was created in the I/O subsystem, which allows more than 256 physical channel paths to be installed and uniquely identified without changing the former 8-bit CHPID value and the corresponding programming dependencies on the CHPID.
The new channel-path-identification value, called the physical-channel identifier (PCHID), is a 16-bit binary number 0 - 65,279, which uniquely identifies each physically installed channel path.
With current IBM z Systems, a maximum of 1024 external channel paths out of the 65 K (for example, ESCON, FICON, OSA) and 48 internal channel paths (for example, Internal Coupling and IQDIO hyperlink) are each assigned a unique PCHID value of 0 - 2,047.
6.3.6 Banking example: High availability architecture
Figure 6-5 shows an example of a highly available banking infrastructure design that uses IBM z Systems, Oracle RAC, and IBM Geographically Dispersed Parallel Sysplex™ (GDPS®). GDPS uses Peer-to-Peer Remote Copy (PPRC) synchronous disk replication, which uses the existing IBM z Systems infrastructure for disaster recovery of IBM z/OS® workloads.
IBM GDPS uses the dependent-write concept (see 6.1.11, "Storage-based replication") as the basis for providing a storage-based disaster recovery solution. This configuration is done by ensuring that all of the disks that contain the data and log disk groups or file systems are kept in the same disk consistency group.
IBM GDPS provides advanced clustering technology with automated failover. GDPS moves Linux services from one physical IBM z Systems machine to another within seconds, helping to minimize failover and recovery time.
Synchronous disk replication with Extended Count Key Data (ECKD) disk volumes in storage consistency groups is used to ensure availability at a DR site (Site 2) if there is a problem with the infrastructure at Site 1.
Figure 6-5 Banking - Example of High Availability Architecture deployed on IBM z Systems
This banking high availability solution mitigates most availability and disaster scenarios, except for an Oracle data block corruption. Oracle Data Guard and Oracle Block Media Recovery can help protect against this additional failure scenario.
Oracle's Block Media Recovery functionality is used to search the Oracle flashback logs for good copies of the corrupted blocks. It then searches for the blocks in full or level 0 incremental backups. When RMAN finds good copies, it restores them and performs media recovery on the blocks.
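A minimal RMAN sketch of block media recovery follows; the data file and block numbers are placeholders:

rman target / <<'EOF'
# Repair a single known-corrupt block
RECOVER DATAFILE 7 BLOCK 1234;
# Or repair every block currently listed in V$DATABASE_BLOCK_CORRUPTION
RECOVER CORRUPTION LIST;
EOF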
A layered approach to availability that combines Oracle RAC, a disk-replication DR solution such as IBM GDPS, and a resilient Oracle RMAN and Oracle Flashback configuration on the most highly available hardware platform helps ensure this banking customer the highest availability possible for its business.
6.3.7 Government Client Example: Oracle MAA RAC and Data Guard
Oracle’s Maximum Availability Architecture (MAA) best practices can also be used to provide an Oracle-centric HA and DR solution with IBM z Systems. In this scenario, the Oracle Database, RAC, and Data Guard are deployed on IBM z Systems (for the database portion of the deployment).
Oracle MAA (see Figure 6-6) provides a comprehensive architecture for reducing the time for scheduled and unscheduled outages. An Oracle MAA solution consists of two identical sites: the primary site contains the RAC database, and the secondary site contains a physical standby database (or both physical and logical standby databases) on RAC. Active Data Guard (separately licensed) is used for online reporting at the DR site. The Data Guard switchover and failover functions allow the roles to be traded between sites.
Figure 6-6 Public Sector client - MAA example
6.4 Summary and recommendations
In this chapter, we described the main Oracle Database options for High Availability. We also described a few of the High Availability options for IBM z Systems.
IBM z Systems has the highest availability rating of any commercially available server on the market today. The ability of IBM z Systems to perform dynamic reconfiguration, such as sparing out a failed processor, using RAIM memory, or using another I/O or network path, is true high availability.
IBM z Systems virtualization technology has been in the marketplace for over 50 years. It has been deployed in Oracle development since the first 10gR2 database was certified on the platform. Decades of experience in error correction and testing methodologies help ensure that the hardware is highly available and that the software is also designed that way.
Combining the highly available server infrastructure of IBM z Systems with Oracle's Maximum Availability Architecture creates the most powerful and robust combination available for running business-critical Oracle workloads.
IBM z Systems is also a good fit for Oracle HA because of its dynamic capabilities. More memory, CPU, I/O, or network resources can be turned on dynamically through microcode and added to a system without the need to shut down an application.
IBM z Systems with Oracle’s HA can protect an application and a business from unplanned events, which provides excellent business value through Reliability, Availability, Scalability, and Oracle’s HA options.
The IBM z Systems hardware stack is fully certified with Oracle for the entire Oracle database Enterprise Edition solution stack to run virtualized under z/VM.
Oracle continues to deliver its quarterly patch set updates and critical patches for the Red Hat and SUSE Linux distributions on IBM z Systems concurrently with other platforms; the most recent at the time of writing was delivered on January 17, 2017. This commitment to the currency of critical patches adds to the HA capabilities of running the Oracle database with IBM z Systems.
Customers that have deployed Oracle solutions with IBM z Systems have enjoyed a lower total cost of ownership over the entire product life cycle. They also achieve a near immediate return on investment by using the platform's embedded security, near flawless system availability, and reliability.
When it comes to HA, IBM z Systems delivers unmatched performance, scalability and mission-critical reliability in physical, virtual, and cloud environments.