CHAPTER 5

image

Storage and ASM Practices

by Kai Yu

In this chapter, I will discuss one of the key infrastructure components for the Real Application Cluster (RAC) database: storage. In the RAC Database environment, storage needs to be accessible and shared by all the nodes of the RAC Database. Database files such as data files, control files, online redo log files, temp files, spfiles, and the Flash Recovery Area (FRA) are kept in this shared storage. In addition, two key sets of files of the Oracle Clusterware—Oracle Cluster Registry (OCR) and voting disk files—are also stored here. Due to the presence of these key components of the RAC Database and Clusterware, shared storage is one of the most critical components of the RAC Database infrastructure.

This shared storage plays a key role in the stability of the Oracle Clusterware. If any node of the RAC cluster is unable to access the voting disk files stored in the shared storage within a predetermined time threshold (200 seconds by default), that RAC node will be evicted from the cluster and rebooted, and as a result the database instance on that node will be restarted. Even more seriously, if the OCR on the storage is lost permanently, the Oracle Clusterware on the entire RAC cluster will no longer function. If this happens, the OCR needs to be recovered from its backup to resume the normal operation of Oracle Clusterware.

Shared storage is also essential for the availability of the RAC Database. Any storage issue may lead to the partial or total loss of the RAC Database. For example, losing access to the shared storage will bring down the RAC Database instance on the affected node, and the loss of a data file or data corruption due to disk errors may cause the partial or total loss of the database.

The performance of the RAC Database depends upon the I/O performance of the shared storage. Storage I/O performance is even more important in a RAC Database environment where the heavier I/Os from multiple RAC Database nodes to a single shared storage may trigger huge I/O bottlenecks that hinder database performance.

As an essential part of the life cycle management of a RAC Database, the design, configuration, and management of such a shared storage system is vital for the long-term health and performance of the RAC Database. However, in a typical IT department, storage administration and database administration are two separate job responsibilities that may belong to two different teams, in line with the separation of duties policy commonly followed in IT. Cooperation and mutual understanding between the storage administrator and the database administrator (DBA) are crucial for RAC Database storage design and configuration. In order to meet the database availability and performance requirements defined by the database SLA (Service Level Agreement), some special design considerations and best practices should be incorporated into the storage provisioning process for Oracle RAC Databases. Here we can highlight some of the major tasks of storage provisioning and examine how the different roles such as the storage administrator, the system administrator, and the DBA work together.

  1. Architecting and implementing the storage solution. Since shared storage is required by the RAC infrastructure, many IT departments use SAN (Storage Area Network) storage for the RAC Database, as it can be accessed by multiple hosts through the storage network. The reality is that a big SAN storage array may also be shared by many different applications. The storage administrator and the DBA need to work together to make sure the storage requirements of the RAC Database are met. Some key areas for storage design include the storage network protocol, the topology of the physical network connections between the RAC node hosts and the storage, storage capacity, and I/O load balancing among the applications that share the same storage.
  2. Provisioning storage volumes in the shared storage for the Oracle RAC Databases. At the very least, you need to provision the storage volume for OCR and the voting disk files and the volumes for the database files. The goal is to ensure that these volumes meet the high availability and I/O performance and capacity requirements of the RAC Database with the optimal design and configuration of these storage volumes, including the RAID configuration of the volume; the capacity and number of disk spindles and what kind of disks (speed of disks) form the storage volume; and the storage controller to which the storage volume should be assigned.
  3. Making the storage volumes accessible to the RAC nodes. This includes configuring the logical network connections to the storage volumes and presenting the storage volumes as OS devices in all the RAC nodes.
  4. Creating Automatic Storage Management (ASM) diskgroups on the storage volumes for OCR and voting disk files of the Oracle Clusterware.
  5. Creating ASM diskgroups for database files and optionally an ASM Cluster File System (ACFS) for non–Oracle Database files such as the Oracle RAC home and a cluster file system for other applications.
  6. Ongoing maintenance tasks such as monitoring database I/O performance and identifying storage I/O issues and performing storage reconfiguration, upgrade, or migration tasks.

In the preceding task list, tasks 1 and 2 are usually completed by the storage administrator and the OS system administrator with input from the Oracle DBA. Task 3 is performed by the OS system administrator with input from the Oracle DBA. Tasks 4–6 are traditionally the Oracle DBA’s responsibility. In many IT organizations, especially smaller IT departments, these responsibilities are combined and performed by one system administrator or DBA.

This chapter covers some of the techniques and best practices that are used for shared storage design and configuration of the RAC Database, with a focus on how to meet the special storage requirements for the Oracle Clusterware and RAC Database. These topics will be covered in this chapter:

  • Storage architecture and configuration for RAC
  • ASM
  • Storing OCR and voting disk files in ASM
  • ACFS

Storage Architecture and Configuration for Oracle RAC

In the Oracle RAC Database cluster, the cluster nodes connect directly to the same storage array where the critical components of the Oracle Clusterware (OCR and voting disk files) and the database files are stored. The availability of the storage array to each cluster node and the I/O performance of the storage array are critical to the Oracle RAC Database. Optimal storage array design and configuration are the foundation of Oracle RAC Database design. Understanding the optimal configuration of the storage array and how it contributes to the Oracle RAC Database design helps us make the right design decisions and establish an optimized infrastructure for the Oracle RAC Database at the very start of RAC Database configuration.

image Note   Oracle RAC 12c introduced a new cluster architecture called Oracle Flex Cluster in addition to the standard cluster configuration, which is similar to the Oracle 11gR2 cluster. In the Oracle 12c Flex Cluster, there are two types of cluster nodes: Hub nodes and Leaf nodes. All the Hub nodes have direct access to the shared storage, while the Leaf nodes do not require direct access to shared storage. These Leaf nodes get data from the shared storage through Hub nodes. (Please refer to Chapter 4 for more details.) This chapter will focus mainly on the standard cluster in which all the nodes have direct access to the shared storage.

Storage Architecture and I/O for RAC

Before we design the storage architecture, we need to understand the characteristics of the I/O operations between the Oracle RAC nodes and the storage. The shared storage for an Oracle RAC environment holds the OCR and voting disk files of the Oracle Clusterware as well as the database files and online redo logs of the Oracle RAC Database. In theory, these storage components can be stored in one physical volume. However, for optimal I/O performance and better manageability, separate physical volumes may be created for each of the components.

To examine the I/O operations between the RAC Database nodes and the shared storage, for example, we create four volumes for a two-node RAC Database as illustrated in Figure 5-1. These four volumes are the two online redo log volumes, one OCR and voting disk volume, and one data volume. Each of the two online redo log volumes holds the redo log thread of one RAC node. The data volume is for database files, control files, etc. Online redo logs and database files are very critical components of the RAC Database.

9781430250449_Fig05-01.jpg

Figure 5-1. Oracle RAC Database architecture

The OCR and voting disk volume is for the OCR and voting disk files. The OCR stores the metadata of the resources that Oracle Clusterware manages, such as the Oracle RAC Database, listeners, virtual IPs, and services. The voting disk files are used to store and manage the node membership of the cluster. The Oracle Clusterware processes Cluster Ready Services (CRS) and Cluster Synchronization Services (CSS) on each RAC node constantly access the OCR and voting disks, so the OCR and voting disks need to be accessible from each of the RAC nodes at all times. If a RAC node fails to access the voting disk files within 200 seconds, a node eviction event is triggered that causes the RAC node to reboot itself. Therefore, the critical requirement for the OCR and voting disk volume is the availability and fault tolerance of the storage.
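
When the Clusterware is up, the state of these two components can be verified from any node with the standard Clusterware utilities; the following is a minimal sketch (ocrcheck performs its full logical check only when run as root):

# Check the integrity, version, and location of the OCR
$ ocrcheck

# List the voting disk files and where they are located
$ crsctl query css votedisk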

To understand how the Oracle RAC Database accesses shared storage, let’s review the Oracle RAC Database architecture. Figure 5-1 shows a two-node RAC Database. Each of the RAC Database instances has a set of database background processes, such as the log writers (LGWR), DB writers (DBWn) and server processes, etc., along with RAC-specific processes such as LMS, LMON, LMD, etc. Each RAC instance also has its memory structure, including the System Global Area (SGA) memory structure, where database buffer cache and redo log buffers are located. The database buffer cache is the memory area that stores copies of data blocks read from data files. The redo log buffer is a circular buffer in the SGA that stores redo entries describing changes made to the database.

When a user sends a query request to the database instance, a server process is spawned to query the database. The block request is sent to the master instance of the block to check whether this block has already been read into any instance's buffer cache. If the block cannot be found in any instance's buffer cache, the server process has to perform storage I/O, reading the data block from the data files into the local buffer cache through data file read operations. If the data block is found in the buffer cache of one or more RAC instances, the instance asks the Global Cache Service (GCS), implemented by the LMS processes, for the latest copy of the block. If the latest copy is on a remote instance, the copy is shipped from the buffer cache of the remote instance to the local buffer cache. In this way, Oracle cache fusion moves the current blocks between the RAC instances. As long as the block is in an instance's buffer cache, all other instances can get the latest copy of the block from that buffer cache instead of reading it from the storage.
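
If you want a rough picture of how much block access in a running cluster is served through cache fusion rather than disk reads, the global cache ("gc") wait events exposed through the GV$ views are one place to look. The following is only a sketch; exact event names vary slightly between versions, and it assumes you can connect as SYSDBA on one of the database nodes:

$ sqlplus -s / as sysdba <<'EOF'
-- Summarize global cache (cache fusion) wait events across all RAC instances
SELECT inst_id, event, total_waits, time_waited_micro
FROM   gv$system_event
WHERE  event LIKE 'gc%block%'
ORDER  BY time_waited_micro DESC;
EOF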

Different types of application workloads determine the way in which RAC Database instances interact with the storage. For example, data file reads can be "random reads" or "sequential reads." For online transaction processing (OLTP) database workloads, most queries involve small random reads on the data files by taking advantage of index scans. For data warehouse or decision support (DSS) workloads, the queries involve large sequential reads of the data files due to large full-table scan operations. In order to achieve optimal I/O performance for the OLTP workload, it is critical to have fast I/O operations, as measured by IOPS (I/O operations per second) and I/O latency. The IOPS number describes the I/O throughput, namely how many I/O operations can be performed per second, while I/O latency is the time it takes to complete a single I/O operation.

One way to achieve higher IOPS is to have the data striped across multiple disk drives so that these disk drives can be read in parallel. Another, more promising, solution is to use Solid State Drives (SSD), which can significantly increase IOPS and reduce I/O latency by removing the performance bottleneck created by the mechanical parts of a traditional hard disk. For DSS workloads, it is important to be able to read a large amount of data contiguously stored on disk into the buffer cache at high speed, as measured in MBPS (megabytes per second). The bandwidth of the components linking the server with the storage, such as the HBAs (Host Bus Adapters), the storage network protocol, the physical connection fabrics, and the storage controllers, is the key to this type of performance. In reality, many database applications mix these two types of workloads. When we evaluate storage for the database, its IOPS, MBPS, and I/O latency should all be examined to ensure that they meet the database I/O requirements.
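
Before the database is built, a synthetic I/O tool such as fio can give a rough idea of whether a candidate volume delivers the required IOPS, MBPS, and latency. The sketch below assumes fio is installed and that /dev/sdX is a test LUN containing no data; the block sizes simply mimic small random OLTP reads versus large sequential DSS reads:

# Small (8 KB) random reads: look at the reported IOPS and completion latency
$ fio --name=oltp-randread --filename=/dev/sdX --direct=1 --rw=randread \
      --bs=8k --ioengine=libaio --iodepth=16 --numjobs=4 \
      --runtime=60 --time_based --group_reporting

# Large (1 MB) sequential reads: look at the reported bandwidth in MB/s
$ fio --name=dss-seqread --filename=/dev/sdX --direct=1 --rw=read \
      --bs=1m --ioengine=libaio --iodepth=8 \
      --runtime=60 --time_based --group_reporting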

Besides reading data from storage, writing data to storage is also critical to RAC Database performance. This is especially true for the OLTP-type database workload. Two important disk write operations are as follows:

  • writing redo logs from the redo log buffer in SGA to the online redo log files in the storage by the logwriter process (LGWR).
  • writing modified blocks (dirty blocks) in the buffer cache to the data files by the DB writer process (DBWn).

While writing dirty blocks by DBWn involves random writes, writing redo logs by LGWR involves sequential writes. In order to guarantee that a data change made by a transaction is fully recoverable, the transaction has to wait until all the redo for the transaction, together with the system change number (SCN) of the transaction, has been written to the online redo logs. In an OLTP database with a high volume of transactions, this sequential write operation by LGWR can become the performance bottleneck that holds up database transactions. A high number of 'log file sync' wait events in the AWR report is a good indication of this bottleneck. In order to optimize database performance, we need to focus on improving LGWR's I/O performance by placing the online redo logs on storage that has high IOPS and low I/O latency.
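
A quick way to check for this bottleneck outside of an AWR report is to compare the 'log file sync' waits seen by user sessions with the 'log file parallel write' waits seen by LGWR itself; the following query is only a sketch of that idea, run on one instance:

$ sqlplus -s / as sysdba <<'EOF'
-- Average wait time (in microseconds) for redo-related wait events on this instance
SELECT event, total_waits,
       ROUND(time_waited_micro / NULLIF(total_waits, 0)) AS avg_wait_us
FROM   v$system_event
WHERE  event IN ('log file sync', 'log file parallel write');
EOF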

Although user transactions don't wait for DBWR directly, a slow DBWR write operation can still delay user transactions. As we know, at a database checkpoint the DBWR process needs to write all the dirty blocks to the data files in the storage. A slow DBWR process can delay the checkpoint; for instance, if during the checkpoint the log writer needs to reuse a log file that still protects a dirty block with its redo, the log writer has to wait until DBWR finishes writing that dirty block. This log writer wait caused by the slow DBWR process in turn slows down user transactions. In this case, you will see the "Checkpoint not complete, cannot allocate new log" message in the alert.log file, which indicates that the log writer had to wait and transaction processing was suspended until the checkpoint completed. Therefore, the speed at which DBWR writes dirty blocks to the data files impacts database performance.
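
To confirm whether the database has been hitting this condition, you can simply search the alert log; the path below assumes a default Automatic Diagnostic Repository location and uses a placeholder database and instance name, so adjust it for your environment:

# Count the checkpoint warnings in the database alert log (path is an example only)
$ grep -c "Checkpoint not complete" \
    /u01/app/oracle/diag/rdbms/knewdb/knewdb_1/trace/alert_knewdb_1.log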

Having explained the requirements of storage I/O under different database workloads and its impact on database performance, in the next section I will discuss the technology options that we should consider in the storage architecture design for the Oracle RAC Database.

RAID Configuration

As you saw in the last section, the availability and performance of a RAC Database depends on the shared storage infrastructure configuration. One of the most important storage configurations is RAID (Redundant Array of Inexpensive Disks), which was introduced as a way to combine multiple disk drive components into a logical unit. The following are some of the commonly used levels of RAID configuration adopted to enhance storage reliability and performance:

  • RAID 0 for block level striping: data blocks are striped across a number of disks in a sequential order. This is often used to improve storage performance as it allows reading data from multiple disks in parallel. As RAID 0 doesn’t provide mirroring or parity, any drive failure will destroy the array.
  • RAID 1 for mirroring: two or more disks hold exact copies of the data. This is often used to achieve reliability against disk failures, as more than one copy of the data is kept. It can improve read performance, as a read can be served from either copy, but it slows down writes, as the controller needs to write every copy of the data. It also cuts the usable storage capacity to half of the raw disk capacity. This option requires a minimum of two disk drives. RAID 1 doesn't provide striping.
  • RAID 1+0 for mirroring and block-level striping: As shown on the right side of Figure 5-2, this option creates mirrored sets of disks and then stripes data blocks such as B1, B2 . . . across these mirrored sets, so it is also called a "stripe of mirrors." It provides both reliability and improved performance.
  • RAID 0+1 for block-level striping and mirroring: As shown on the left side of Figure 5-2, this option creates two striped sets, each with data blocks such as B1, B2 . . . , and then mirrors them against each other, so it is also called a "mirror of stripes." This option also achieves both reliability and improved performance.

9781430250449_Fig05-02.jpg

Figure 5-2. RAID 0+1 vs. RAID 1+0

The difference between RAID 1+0 and RAID 0+1 is that if one disk fails in RAID 0+1, the entire striped set fails; if another disk in the second striped set then fails, the array is lost. In the RAID 1+0 configuration, by contrast, the array survives multiple disk failures as long as the failed disks are not both in the same mirrored set. Both options reduce the usable disk capacity by half. These mirrored-plus-striped configurations are commonly used for OLTP-type database workloads, as they don't impose a heavy performance overhead on the write operations that are so frequent in this type of workload.

  • RAID 5 for striping blocks with parity on all the disks. To provide error protection, distributed parity blocks are kept on all the disks. In case of a disk failure, the parity blocks are used to reconstruct the errant sector. Although this option reduces the capacity of only one disk, it imposes a heavy performance overhead for write operations, as the parity block has to be recalculated every time. Considering the cost savings, RAID 5 may be an acceptable configuration option for a data warehouse–type database, which mostly does read operations and which needs a significant amount of disk capacity, but it is not a good choice for update-heavy databases engaged in OLTP workloads.
  • RAID 6 for striping blocks with parity on all the disks. It is similar to RAID 5 except for having double distributed parity. It provides fault tolerance up to two failed drives.

A RAID array can be implemented as either hardware RAID or software RAID. Hardware RAID operates at the level of the RAID controller; the OS does not know anything about it. When the data comes from the OS to the storage, the RAID controller card takes care of the striping or mirroring. In software RAID, the OS takes the responsibility of striping the data and writing to the appropriate disk as needed. Software RAID is low cost, as it doesn't require special hardware, but it consumes host CPU resources. Hardware RAID doesn't consume any host CPU, but there is an additional cost for the RAID controller card. The external storage used as shared storage for an Oracle RAC Database is usually SAN storage or something similar. The storage controller of such a storage system provides hardware RAID functionality and supports different kinds of disk RAID configurations such as RAID 0, 1, 1+0, and 5. It is part of the storage design and configuration to create a disk RAID configuration (called a RAID group in some storage product terminologies) based on a set of individual physical disks. Then, data volumes or logical unit numbers (LUNs) can be created on these RAID configurations or groups.
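
To make the distinction concrete, the following is a minimal sketch of building a software RAID 10 array on Linux with mdadm; it is purely illustrative (the device names are placeholders), since the shared storage for a RAC Database would normally rely on the hardware RAID in the SAN controller instead:

# Create a software RAID 10 array from four local disks (hypothetical devices);
# the OS md driver performs the striping and mirroring, consuming host CPU
$ mdadm --create /dev/md0 --level=10 --raid-devices=4 \
        /dev/sdb /dev/sdc /dev/sdd /dev/sde

# Check the state of the new array
$ cat /proc/mdstat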

Figure 5-3 and Table 5-1 show an example of such a storage configuration. Table 5-1 lists the design of the RAID groups and the storage volumes associated with these RAID groups. In this example, there are three RAID 10 RAID groups—DataG1, DataG2, and DataG3—and two RAID 1 RAID groups—RedoG1 and RedoG2. There are eight storage volumes or LUNs: Data1-Data3, OCR1-OCR3, and Redo1-Redo2. These volumes are used as the storage LUNs for database files, OCR, voting disk files, and redo logs, as indicated by their names. Figure 5-3 shows the actual configuration of these storage volumes in a storage controller management GUI tool called Dell PowerVault Modular Disk Storage Manager (MDSM). Although different storage vendors have different storage management tools, they provide similar functionality that allows storage administrators to create storage volumes based on physical disks with certain RAID configurations.

Table 5-1. Storage RAID configuration design

image

9781430250449_Fig05-03.jpg

Figure 5-3. An example of storage RAID configuration for RAC Database

Storage Protocols

Apart from the RAID configuration, a critical part of the storage design is determining how the RAC node hosts access the shared storage and perform database I/O operations against it. This design includes the topology of the physical links between the hosts and the storage and the storage protocols that are used to transfer the data. Here we explore some widely used storage network protocols: Small Computer System Interface (SCSI), Fibre Channel (FC), Internet Protocol (IP), and Network Attached Storage (NAS). SCSI, FC, and IP protocols send block-based data across the network (block-based protocols), while NAS sends file-based data (a file-based protocol).

The SCSI protocol defines how host operating systems do I/O operations on disk drives. In the SCSI protocol, the data is sent in a chunk of bits called a “block” in parallel over a physical connection such as a copper SCSI cable. Every bit needs to arrive at the other end of cable at the same time. This limits the maximum distance between the host and the disk drives to under 25 meters. In order to transfer the data over a longer distance, the SCSI protocol usually works with other protocols such as FC and Internet SCSI (iSCSI).

FC is a SAN technology that carries data transfers between hosts and SAN storage at very high speeds. The protocol supports 1 Gbps, 2 Gbps, 4 Gbps, and 16 Gbps (most recently) and maintains very low latency. The FC protocol supports three different topologies: 1) connecting two devices in a point-to-point model; 2) An FC-arbitrated loop connecting up to 128 devices; 3) FC switched fabric model. FC switched fabric is the most common topology for Oracle RAC Database. The example in Figure 5-4 shows a configuration of two-node Oracle RAC connecting with Dell Compellent SC8000 FC SAN storage using two FC switches.

9781430250449_Fig05-04.jpg

Figure 5-4. FC switch fabric topology

To provide highly available storage connections, each RAC host has two HBAs (Host Bus Adapters), each of which connects to an FC switch through a fiber optic cable. Each FC switch connects to both FC storage controllers. To increase the storage I/O bandwidth, each FC switch is connected to both FC controllers with two fiber cables. By using FC switches, the number of servers that can connect to an FC storage array is not restricted, and the distance between hosts and storage can be up to 10 km. The components in the storage path, such as the HBAs, fiber optic cables, FC switches, and storage controllers, are very robust and capable of providing highly efficient and reliable data transfer between the server hosts and the storage. The FC storage controllers can also connect to multiple storage enclosures; the storage enclosures are daisy-chained to the storage controllers with SAS cables.

Figure 5-3 shows a configuration with two enclosures connected to two storage controllers. It is also possible to add more storage enclosures. Storage vendors usually specify the maximum number of storage enclosures that their controllers can support. Adding multiple storage enclosures allows you to add more disk spindles to the storage. For example, if one enclosure holds 24 disks, two enclosures can hold up to 48 disks. You can also put disks of different speeds in different enclosures. For example, to improve storage I/O performance, you can put SSDs in one enclosure and 15K rpm hard disks in another enclosure. Today, many SAN storage vendors provide some kind of storage tiering technology that can direct application workloads to different levels of storage media to achieve the most suitable performance and cost characteristics. In addition to the redundant connection paths, a disk RAID configuration such as RAID 10 or RAID 5 should be implemented to avoid any single point of failure at the disk level.

Using the configuration shown in Figure 5-4, let's examine how multiple I/O paths are formed from a RAC node to the storage. In an FC storage configuration, all the devices connected to the fabric, such as HBAs and storage controller ports, are given a 64-bit identifier called a World Wide Name (WWN). For example, an HBA's WWN in Linux can be found as follows:

$ more /sys/class/fc_host/host8/port_name
0x210000e08b923bd5

On the switch layer, a zone is configured to connect an HBA's WWN with a storage controller port's WWN. As shown in Figure 5-4, the physical connections are as follows:

  1. RAC host 1 has HBA1-1 and HBA1-2, which connect to FC switches SW1 and SW2, respectively.
  2. RAC host 2 has HBA2-1 and HBA2-2, which connect to FC switches SW1 and SW2, respectively.
  3. There are two FC controllers, FC1 and FC2, which are connected to FC switches SW1 and SW2, respectively.

The storage zoning process is to create multiple independent physical I/O paths from RAC node hosts to the storage through the FC switches to eliminate the single point of failure.

After zoning, each RAC host establishes multiple independent physical I/O paths to the SAN storage. For example, RAC host 1 has four paths:

  • I/O Path1: HBA1-1, SW1, FC1
  • I/O Path2: HBA1-1, SW1, FC2
  • I/O Path3: HBA1-2, SW2, FC1
  • I/O Path4: HBA1-2, SW2, FC2

These redundant I/O paths give a host multiple independent ways to reach a storage volume. The paths through different HBAs show up as different devices on the host (such as /dev/sda or /dev/sdc), even though these devices point to the same volume. These devices have one thing in common: they all share the same SCSI ID. In the next section, I will explain how to create a logical device that combines all the redundant I/O paths. Since this logical device is backed by multiple independent I/O paths, access to the volume is protected against multiple component failures, even the case where one HBA, one switch, and one controller all fail at the same time.

An FC SAN provides a highly reliable and high-performance storage solution for the RAC Database. However, the cost and complexity of FC components make it hard to adopt for many small and medium businesses. Meanwhile, the continuously improving speed of Ethernet and the low cost of its components have led to wider adoption of the iSCSI storage protocol. iSCSI SAN storage extends the traditional SCSI storage protocol by sending SCSI commands over IP on Ethernet. This protocol can transfer data at high speed over very long distances, especially when adding high-performance features such as high-speed NICs with TCP/IP Offload Engines (TOE) and switches with low-latency ports. The newer 10 GbE Ethernet allows iSCSI SAN storage to deliver even higher performance. Today, network bandwidths for both FC and iSCSI are improving: FC has moved through 1 Gbps, 2 Gbps, and 4 Gbps to 16 Gbps, and iSCSI is moving from 1 GbE to 10 GbE. Both FC and iSCSI storage are able to deliver storage performance good enough to meet enterprise database needs.

As shown in Figure 5-5, iSCSI storage uses regular Ethernet to connect hosts and storage. Traditional 1 GbE Ethernet can use regular Ethernet network cards, cables, and switches for data transfer between servers and storage. To design a 10 GbE iSCSI SAN solution, you have to make sure that all the components support 10 GbE Ethernet, including 10 GbE network adapters, high-speed cables, 10 GbE switches, and 10 GbE storage controllers. Of course, this configuration will raise the cost of the iSCSI storage deployment.

9781430250449_Fig05-05.jpg

Figure 5-5. iSCSI storage configuration for a two-node RAC
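
On the host side, the Linux open-iscsi initiator is typically used to discover and log in to the iSCSI targets presented by the storage. The following is a sketch that assumes the iscsi-initiator-utils package is installed; the portal IP address and target IQN shown are placeholders:

# Discover the targets presented by a storage portal (IP address is a placeholder)
$ iscsiadm -m discovery -t sendtargets -p 192.168.10.10

# Log in to a discovered target (the IQN is a placeholder)
$ iscsiadm -m node -T iqn.2001-05.com.example:storage.lun1 -p 192.168.10.10 --login

# List the active iSCSI sessions
$ iscsiadm -m session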

Multipath Device Configuration

It is a recommended best practice to configure redundant I/O paths between the Oracle RAC servers and the storage, to ensure high availability of shared storage access, as shown in the figures in the last section. For example, from RAC host 1 there are four redundant I/O paths to the same storage volume. These redundant I/O paths are represented by multiple SCSI devices on the RAC node that point to the same storage volume (also called a logical unit, identified by its LUN). Since these SCSI devices point to the same LUN, they share the same SCSI ID.

For example, /dev/sdc and /dev/sde represent two redundant I/O paths to the same LUN with the SCSI ID 36842b2b000742679000007a8500b2087. You can use the scsi_id command to find the SCSI ID of a device:

[root@k2r720n1 ∼]# scsi_id --page=0x83 --whitelisted --device=/dev/sdc
36842b2b000742679000007a8500b2087

The following Linux shell script finds the SCSI IDs of the devices:

[root@k2r720n1 ∼]# for i in sdc sdd sde sdf; do printf "%s %s\n" "$i" "$(scsi_id --page=0x83 --whitelisted --device=/dev/$i)"; done
sdc 36842b2b000742679000007a8500b2087
sdd 36842b2b000742679000007a5500b1cd9
sde 36842b2b000742679000007a8500b2087
sdf 36842b2b000742679000007a5500b1cd9

Many operating systems have their own multipathing software that can be used to create a pseudo-device to facilitate the sharing and balancing of I/O operations on a LUN across all available I/O paths. For example, in Linux, a commonly used multipath device driver is the native Device Mapper. To verify whether the rpm package is already installed, you can run this command:

[root@k2r720n1 yum.repos.d]# rpm -qa | grep device-mapper-multipath
device-mapper-multipath-libs-0.4.9-56.el6_3.1.x86_64
device-mapper-multipath-0.4.9-56.el6_3.1.x86_64

If the rpm package is not installed by default, you can install it from the yum repository:

$yum -y install device-mapper-multipath

Or install it manually:

rpm -ivh device-mapper-multipath-0.4.9-56.el6_3.1.x86_64.rpm

To configure the multipath driver to combine these I/O paths into a pseudo-device, you need to add related entries to the multipathing driver configuration file /etc/multipath.conf. This is an example of the /etc/multipath.conf file:

defaults {
        udev_dir                /dev
        polling_interval        5
        path_selector           "round-robin 0"
        path_grouping_policy    failover
        getuid_callout          "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
        prio                    const
        path_checker            directio
        rr_min_io               1000
        rr_weight               uniform
        failback                manual
        no_path_retry           fail
        user_friendly_names     yes
}
...
multipaths {
        multipath {
                wwid            36842b2b000742679000007a8500b2087  #<---sdc and sde
                alias           ocrvoting
         }
         multipath {
                wwid            36842b2b000742679000007a5500b1cd9  #<--- sdd and sdf
                alias           data
         }
}
...

Start the multipath service:

# service multipathd start
Starting multipathd daemon:                                [  OK  ]

The device-mapper-multipath daemon creates the pseudo-devices 'ocrvoting' and 'data':

[oracle@k2r720n1 ∼]$ ls -l /dev/mapper/
lrwxrwxrwx 1 root root      7 Oct 29 06:33 /dev/mapper/data -> ../dm-9
lrwxrwxrwx 1 root root      7 Oct 29 06:33 /dev/mapper/ocrvoting -> ../dm-8

This shows that soft links such as /dev/mapper/data and /dev/mapper/ocrvoting point to the underlying block devices /dev/dm-9 and /dev/dm-8.

In addition to combining multiple I/O paths into a single pseudo-device, the multipath driver also provides a consistent device name. In Linux, by default, the device name of a storage LUN is not guaranteed to stay the same every time the OS is rebooted. For example, for the same LUN, /dev/sdc may change to /dev/sdd after the next reboot. This can create a serious problem if the RAC Database uses this volume to store database files, as the RAC Database instance recognizes the volume by name: if the name changes, the volume cannot be found again. In an Oracle RAC environment, we also need to make sure that the same shared storage LUN has the same name on every node of the RAC, which by default is not guaranteed either. By mapping a LUN's SCSI ID to a pseudo-device name and setting 'user_friendly_names' to 'yes' in the multipath.conf file, the pseudo-device name stays consistent every time each RAC server is rebooted, and the name is guaranteed to be identical on every node of the RAC if we use the same multipath.conf file on all the RAC nodes.
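
Two quick follow-up checks are worth doing at this point: verify the path topology behind each alias, and make sure the multipath service starts at boot so the names survive a reboot. A sketch for RHEL/Oracle Linux 6:

# Show each multipath alias, its WWID, and the underlying /dev/sdX paths
$ multipath -ll

# Make the multipath daemon start automatically at boot
$ chkconfig multipathd on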

Set Ownership of the Devices

The next challenge in setting up devices for the Oracle RAC environment is to set proper ownership of these devices. When these devices were initially created, they were owned by the root user, thus:

$ ls -l /dev/mapper/*
lrwxrwxrwx 1 root root      7 Oct 29 17:16 ocrvoting -> ../dm-8
lrwxrwxrwx 1 root root      7 Oct 29 17:16 data -> ../dm-9

And block devices dm-8 and dm-9 are owned by root:

$ls -l /dev/dm-*
brw-rw---- 1 root disk 253,  8 Oct 29 06:33 /dev/dm-8
brw-rw---- 1 root disk 253,  9 Oct 29 06:33 /dev/dm-9

In the Oracle RAC Database, if Oracle ASM is selected to manage the shared storage volumes for OCR, voting disk files, and database files, the ASM instance needs to have write privileges on these devices. The solution is to change the ownership of these devices from root to the owner of the ASM instance, for example the ‘grid’ user.

There are two methods to change the proper ownerships in Linux:

  • Use the Linux udev utility to create udev rules that change the ownership of the devices.
  • Create ASM disks using ASMLib. The ASM disks will be given a new owner, which can be the owner of ASM instance like the “grid” user.

Of these two methods, the udev rule method is based on a Linux utility that is available for various versions of Linux, such as Red Hat Enterprise Linux 5.x and 6.x and Oracle Linux 5.x and 6.x. Oracle ASMLib has been generally available for both Red Hat Enterprise Linux 5.x and Oracle Linux 5.x. However, for Enterprise Linux 6.x, ASMLib was initially only available for the Oracle Linux 6.x UEK kernel. ASMLib was not available for the Oracle Linux 6 Red Hat–compatible kernel or Red Hat Enterprise Linux 6 until spring 2013, when Oracle released the rpm packages for Oracle Linux 6 and Red Hat made the ASMLib kernel package available for Red Hat Enterprise Linux 6 (beginning with 6.4). The udev rules method is the choice if you don't use ASMLib. Let's discuss how to set the udev rules in detail here; ASMLib will be discussed in the next section.

The following are the steps to create udev rules for the devices:

  1. Get the SCSI IDs (WWIDs) of all the devices that will be used for the RAC Database by using the scsi_id script mentioned previously.
  2. Create a udev rule file, for example /etc/udev/rules.d/99-oracle-asmdevices.rules
    #------------------------ start udev rule contents ----------------------------------------------------------#
    KERNEL=="dm-*", PROGRAM="scsi_id --page=0x83 --whitelisted --device=/dev/%k",RESULT=="36842b2b000742679000007a8500b2087", OWNER:="grid", GROUP:="asmadmin"
    KERNEL=="dm-*", PROGRAM="scsi_id --page=0x83 --whitelisted --device=/dev/%k",RESULT=="36842b2b000742679000007a5500b1cd9", OWNER:="grid", GROUP:="asmadmin"
    #-------------------------- end udev rule contents --------------------------------------------------------#
  3. Run the start_udev command to apply the newly created rules:
    [root@k2r720n1 rules.d]# start_udev
    Starting udev:                                              [  OK  ]
  4. Check the new ownership and permission settings:
    # ls -l  /dev/dm-*
    brw-rw---- 1 grid asmadmin 253,  8 Oct 29 12:08 /dev/dm-8
    brw-rw---- 1 grid asmadmin 253,  9 Oct 29 12:08 /dev/dm-9

In summary, the provisioning of storage volumes for an Oracle RAC Database consists of these steps:

  • Establish redundant paths from RAC hosts to the storage.
  • Design and configure the proper storage volumes and assign these volumes to the RAC hosts so that these volumes are presented as the block devices on the RAC hosts.
  • Configure multipathing on the block devices by mapping them to the multipath block device names.
  • Set the proper ownerships and the access permissions on these multipath block device names.

ASM

In the previous section, I discussed storage design and how to present storage volumes as block devices to the Oracle RAC Database hosts. In order to store database files on these block devices, we need to establish database file systems on them. For the Oracle RAC environment, this file system has to be a cluster file system that can be shared by multiple RAC nodes. Existing volume managers such as LVM in Linux and OS file systems such as ext3 are designed for local systems only; they are not designed for a cluster environment and therefore cannot be used for shared storage in Oracle RAC. Prior to Oracle 10g, the Oracle RAC Database was built on raw devices, which turned out to be very difficult to use and manage due to lack of flexibility: each file required its own raw device, and a file could not grow or extend. Some Oracle E-Business Suite databases that needed more than 255 files actually ran into the limit of 255 available raw devices.

Oracle ASM was initially introduced in Oracle 10g as a volume manager and file system for Oracle data files. Oracle ASM supports both single-node Oracle Database and Oracle RAC Database. For Oracle RAC Database, Oracle ASM provides a cluster file system that allows all the RAC nodes to share. With ASM, database files can be created or grown and expanded as needed. Oracle ASM also provides high availability and performance features such as RAID configuration and I/O load balancing. Oracle ASM is Oracle’s recommended storage solution for Oracle Database.

Oracle ASM has evolved since it was first introduced in Oracle 10gR1. For example, 10gR2 introduced the command-line interface ASMCMD, multiple database version support, and database storage consolidation for both single-instance and RAC databases. 11gR1 introduced Fast Mirror Resynchronization for ASM redundancy disk groups, ASM instance rolling upgrade and patching support, and the separate SYSASM connect privilege. Oracle Grid Infrastructure 11gR2 introduced quite a few changes in ASM. The most significant ones include combining Oracle Clusterware and ASM into a single Grid Infrastructure stack, storing the Oracle OCR and cluster voting disk files in ASM, and introducing the ASM Cluster File System (ACFS) and the ASM Dynamic Volume Manager (ADVM). Oracle 12cR1 also brings many new features to Oracle ASM and Oracle ACFS. One of the most significant is Oracle Flex ASM, which decouples the Oracle ASM instance from the database server and allows database instances to connect to remote ASM instances. Chapter 4 gives you more details about Oracle Flex ASM. Some other Oracle ASM enhancements introduced in Oracle 12c include increased storage limits that support up to 511 ASM diskgroups and a maximum Oracle ASM disk size of 32 petabytes (PB); an ASM shared password file in a disk group; ASM rebalance and ASM Disk Resync enhancements; and ASMCMD extensions including the icmd command-line interface, a unified shell environment that integrates all the required functionality to manage the Grid Infrastructure home. The Oracle ASM software binary is installed in the Oracle ASM home directory, just as the Oracle RAC Database software binary is installed in the Oracle Database home directory.

In Oracle 10g and 11g R1, although Oracle ASM and Oracle Database can be installed in the same home directory, Oracle highly recommends separating the Oracle ASM home from the Oracle Database home. This helps to improve flexibility in performing upgrade and patching operations. This is even more important if you run multiple versions of Oracle Database software on a cluster. In Oracle 11gR2 and Oracle 12cR1, the separation of the Oracle ASM home from the Oracle Database home has become mandatory as Oracle combines Oracle ASM and Oracle Clusterware into a new product called Grid Infrastructure. In this case, both Oracle ASM software binary and Oracle Clusterware binary are installed at the same time to the same Grid Infrastructure home.

ASM Instance

Oracle ASM instances provide the storage management for Oracle Database. An Oracle ASM instance has an architecture similar to that of an Oracle Database instance: it has an SGA, background processes, and a listener process. Since Oracle ASM performs fewer tasks than an Oracle Database instance, it has a much smaller SGA and usually has a minimal performance impact on the server. A RAC node runs at most one Oracle ASM instance, which provides storage access to all the Oracle Database instances on that node, no matter how many versions of the Oracle Database binaries or how many database instances are running.

Prior to Oracle 11gR2, Oracle ASM ran above the Oracle cluster stack and provided the storage management for Oracle Database. As shown in Figure 5-6, the Oracle ASM instance depends on Oracle Clusterware to join the cluster environment and access the shared storage, and Oracle Database instances depend on the ASM instance to access the ASM diskgroups on the shared storage and depend on the Clusterware for cluster management. The entire RAC Database stack starts with Clusterware startup followed by ASM instance startup. The database instances will not start until both Clusterware and ASM instance have started up successfully.

9781430250449_Fig05-06.jpg

Figure 5-6. Dependency prior to Oracle 11gR2

Oracle 11gR2 introduced Grid Infrastructure, which combined Oracle ASM and Oracle Clusterware into a single product that is installed and configured during the Grid Infrastructure installation. ASM becomes a part of the CRS stack of the Clusterware. A single Grid Infrastructure startup command starts up both Oracle Clusterware and the ASM instance; then the database instances can be started. This dependency also applies to Oracle 12cR1. Prior to Oracle 12c, an Oracle ASM instance was required on every RAC Database node. This requirement has been removed with the introduction of the Oracle Flex ASM option in Oracle 12cR1. With the Flex ASM option enabled, Oracle ASM instances may run on separate nodes from the Oracle Database 12c instances.

Some RAC nodes can run Oracle Database instances without an Oracle ASM instance. For RAC nodes where there is no local ASM instance or where the local ASM instance fails, the database instances on those nodes can remotely connect to an Oracle ASM instance on another RAC node. Therefore, in Oracle 12c, depending on whether or not the Oracle Flex ASM option is enabled, Oracle ASM can run in two distinct modes: Standard ASM and Flex ASM. Standard ASM works similarly to Oracle 11gR2 ASM, with every RAC node running an Oracle ASM instance and all the Oracle Database instances on the cluster connecting to their local ASM instances. Figure 5-7 shows the dependency in Oracle 11gR2 and in Oracle 12cR1 Standard ASM, without Flex ASM enabled.

9781430250449_Fig05-07.jpg

Figure 5-7. Dependency 11gR2 ASM and 12cR1 standard ASM

In this architecture, a single ASM instance per RAC node provides a clustered pool of storage to all the database instances running on that node; the ASM instance has access to the shared storage volumes and presents the ASM diskgroup as the file system for all the database instance(s) running on the RAC node.

In Oracle 12cR1, with Flex ASM enabled, a small set of cluster nodes run Oracle ASM instances to provide storage access to a larger number of cluster nodes. Figure 5-8 shows an example of the Flex ASM configuration. By default, three RAC nodes (nodes 1, 2, and 3) run Oracle ASM instances, and all the database instances, including the ones on nodes 4 and 5, connect to these ASM instances. Oracle Flex ASM often works with Oracle Flex Cluster. The nodes that have a direct connection to the shared storage are called Hub nodes (such as nodes 1, 2, 3, and 4 in Figure 5-8), and the nodes that don't have a direct connection to the shared storage (such as node 5 in Figure 5-8) are Leaf nodes. Usually one or more Leaf nodes connect to a Hub node to access shared storage. In Figure 5-8, Leaf node 5 accesses shared storage through Hub node 3. Chapter 4 discusses the details of Oracle Flex ASM and Oracle Flex Clusters.

9781430250449_Fig05-08.jpg

Figure 5-8. Dependency on Oracle 12c Flex ASM
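
You can check from any node whether a 12c cluster is running standard ASM or Flex ASM; the following is a small sketch using the Grid Infrastructure utilities, run as the grid user:

# Reports whether the ASM cluster is in Flex mode or standard mode
$ asmcmd showclustermode

# Shows the nodes on which ASM instances are currently running
$ srvctl status asm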

Creation and Ownership of ASM Instance

The relationship between the ASM instance and Oracle Clusterware changed between Oracle 11gR1 and Oracle 11gR2/12cR1. The ASM instance depends on Oracle Clusterware for the basic cluster infrastructure and for the communication between the cluster nodes. The 11gR1 Oracle Clusterware doesn't rely on the ASM instance to access the OCR and voting disk files, as these two important pieces of Clusterware metadata are stored directly on block devices. In 11gR1, the ASM instance is created with the DBCA tool after the Oracle Clusterware installation and before database creation. However, in 11gR2 and 12cR1, if you choose to store the OCR and voting disks in ASM, Oracle Clusterware depends on the ASM instance to access them. This leads to a mutual dependency between Oracle Clusterware and Oracle ASM. The solution to this dependency is that the two products are combined into one product, Grid Infrastructure, and installed together into a single Grid Infrastructure home by the Grid Infrastructure installer. The installation process also creates the ASM instance, creates the diskgroup for storing the OCR and voting disks, and finishes the Oracle Clusterware configuration on all the RAC nodes.

In a RAC Database environment, the default ASM instance SID is +ASMnode_number, such as +ASM1 for node 1 and +ASM2 for node 2. Like an Oracle Database instance, an ASM instance has a set of background processes. For example, the following is the list of processes of an Oracle 12cR1 ASM instance:

$ps -ef | grep -v grep | grep asm_
grid      4335     1  0 Feb19 ?        00:06:51 asm_pmon_+ASM1
grid      4337     1  0 Feb19 ?        00:05:08 asm_psp0_+ASM1
grid      4339     1  2 Feb19 ?        07:47:18 asm_vktm_+ASM1
grid      4343     1  0 Feb19 ?        00:01:48 asm_gen0_+ASM1
grid      4345     1  0 Feb19 ?        00:00:43 asm_mman_+ASM1
grid      4349     1  0 Feb19 ?        00:18:12 asm_diag_+ASM1
grid      4351     1  0 Feb19 ?        00:03:18 asm_ping_+ASM1
grid      4353     1  0 Feb19 ?        01:12:41 asm_dia0_+ASM1
grid      4355     1  0 Feb19 ?        00:48:21 asm_lmon_+ASM1
grid      4357     1  0 Feb19 ?        00:27:40 asm_lmd0_+ASM1
grid      4359     1  0 Feb19 ?        01:00:51 asm_lms0_+ASM1
grid      4363     1  0 Feb19 ?        00:13:18 asm_lmhb_+ASM1
grid      4365     1  0 Feb19 ?        00:01:06 asm_lck1_+ASM1
grid      4367     1  0 Feb19 ?        00:00:14 asm_gcr0_+ASM1
grid      4369     1  0 Feb19 ?        00:00:44 asm_dbw0_+ASM1
grid      4371     1  0 Feb19 ?        00:00:54 asm_lgwr_+ASM1
grid      4373     1  0 Feb19 ?        00:01:52 asm_ckpt_+ASM1
grid      4375     1  0 Feb19 ?        00:00:38 asm_smon_+ASM1
grid      4377     1  0 Feb19 ?        00:00:45 asm_lreg_+ASM1
grid      4379     1  0 Feb19 ?        00:06:21 asm_rbal_+ASM1
grid      4381     1  0 Feb19 ?        00:04:37 asm_gmon_+ASM1
grid      4383     1  0 Feb19 ?        00:02:14 asm_mmon_+ASM1
grid      4385     1  0 Feb19 ?        00:03:35 asm_mmnl_+ASM1
grid      4387     1  0 Feb19 ?        00:03:49 asm_lck0_+ASM1
grid      4429     1  0 Feb19 ?        00:00:31 asm_asmb_+ASM1
grid     22224     1  0 Mar03 ?        00:00:16 asm_scrb_+ASM1

In this case, the “grid” user is the owner of the ASM instance as well as the owner of Grid Infrastructure. The “grid” user belongs to three operating system groups: asmdba, asmoper, and asmadmin.

$id grid
uid=54322(grid) gid=54321(oinstall) groups=54321(oinstall), 54325(asmadmin), 54326(asmdba), 54327(asmoper)

During the Grid Infrastructure installation, these three OS groups, namely asmdba, asmoper, and asmadmin, are granted the Oracle ASM DBA (SYSDBA), Oracle ASM Operator (SYSOPER), and Oracle ASM Administrator (SYSASM) privileges, respectively, as shown in Figure 5-9.

9781430250449_Fig05-09.jpg

Figure 5-9. Three OS groups for ASM management

The SYSDBA privilege provides access to the data stored in ASM diskgroups; the SYSOPER privilege allows performing instance operations such as startup, shutdown, mount, dismount, and checking diskgroup; and the SYSASM privilege allows full administration for the Oracle ASM instance, such as creating or altering diskgroup.

Through these three OS groups, the "grid" user is granted all three system privileges for the Oracle ASM instance: SYSDBA, SYSOPER, and SYSASM. This user is the owner of the Grid Infrastructure and is used to install and manage the Grid Infrastructure, including the ASM instance. In your environment, you can also create additional users with one or two of these system privileges to perform specific operations such as monitoring ASM diskgroups.

Like a database instance, you can manage an ASM instance using the sqlplus command. For example, in Linux or Unix, you can set the following environment in the bash profile of the grid user:

export ORACLE_SID=+ASM1
export ORACLE_HOME=/u01/app/12.1.0/grid

Through the grid user, you can log in to the ASM instance with the sysasm privilege and perform ASM administration tasks:

$ sqlplus / as sysasm
 
SQL*Plus: Release 12.1.0.1.0 Production on Tue Mar 5 13:22:37 2013
 
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
 
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
 
SQL> show parameter spfile
 
NAME      TYPE  VALUE
------------------------------------ ----------- ------------------------------
spfile   string  +DATA1/knewrac/ASMPARAMETERFILE/registry.253.807834851

Notice that in the preceding example, the spfile of the ASM instance is stored in the diskgroup DATA1. Another user is "oracle," which is granted the SYSDBA and SYSOPER privileges through the asmdba and asmoper groups.

$ id oracle
uid=54321(oracle) gid=54321(oinstall) groups=54321(oinstall), 54322(dba), 54326(asmdba), 54327(asmoper)

Without being granted the SYSASM privilege, the “oracle” user cannot use the sysasm privilege to log in to the Oracle ASM instance to do operations such as creating and altering ASM diskgroup. SYSASM enables the separation of SYSDBA database administration from Oracle ASM storage administration. In this case, we log in as the grid user to do ASM administration and log in as the Oracle user to do database administration.

Listener for ASM Instance

By default, the listener runs from the Grid Infrastructure home and listens for both ASM instance and database instances on that RAC node. You don’t need to run another listener from Oracle RAC home.

Here is an example of the Oracle 12c listener:

$ps -ef | grep -v grep | grep LISTENER
grid      4799     1  0 Feb19 ?        00:02:00 /u01/app/12.1.0/grid/bin/tnslsnr LISTENER_SCAN3 -no_crs_notify -inherit
grid      8724     1  0 Feb19 ?        00:01:11 /u01/app/12.1.0/grid/bin/tnslsnr LISTENER -no_crs_notify -inherit
 
$lsnrctl status
LSNRCTL for Linux: Version 12.1.0.1.0 - Production on 05-MAR-2013 13:39:22
 
Copyright (c) 1991, 2013, Oracle.  All rights reserved.
 
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))
STATUS of the LISTENER
------------------------
Alias                       LISTENER
Version                   TNSLSNR for Linux: Version 12.1.0.1.0 - Production
Start Date                19-FEB-2013 22:47:56
Uptime                    13 days 14 hr. 51 min. 29 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/12.1.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/knewracn1/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.16.9.41)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.16.150.37)(PORT=1521)))
Services Summary...
Service "+APX" has 1 instance(s).
  Instance "+APX1", status READY, has 1 handler(s) for this service...
Service "+ASM" has 1 instance(s).
  Instance "+ASM1", status READY, has 2 handler(s) for this service...
Service "-MGMTDBXDB" has 1 instance(s).
  Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
Service "_mgmtdb" has 1 instance(s).
  Instance "-MGMTDB", status READY, has 2 handler(s) for this service...
Service "knewdb.kcloud.dblab.com" has 1 instance(s).
  Instance "knewdb_3", status READY, has 1 handler(s) for this service...
Service "knewdbXDB.kcloud.dblab.com" has 1 instance(s).
  Instance "knewdb_3", status READY, has 1 handler(s) for this service...
Service "knewpdb1.kcloud.dblab.com" has 1 instance(s).
  Instance "knewdb_3", status READY, has 1 handler(s) for this service...
Service "knewpdb2.kcloud.dblab.com" has 1 instance(s).
  Instance "knewdb_3", status READY, has 1 handler(s) for this service...
The command completed successfully

In this example, the listener listens for the ASM instance plus three other instances: +APX1, -MGMTDB, and knewdb_3. +APX1 is the Oracle ASM proxy instance used by ADVM/ACFS, -MGMTDB is the Grid Infrastructure Management Repository database instance, and knewdb_3 is the user database instance.

Startup and Shutdown of ASM Instance

Similar to an Oracle Database instance, an ASM instance can be shut down or started up using the shutdown or startup command after you log in to the ASM instance with SQL*Plus, or by using the srvctl utility, for example srvctl stop asm -n <nodename>.
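
For example, the following sketch checks where ASM is currently running and then stops and restarts the ASM instance on one node with srvctl (the node name is a placeholder). As the rest of this section shows, stopping ASM on a node is only safe when the Clusterware itself does not depend on it, that is, when the OCR and voting disk files are not stored in ASM:

# Show the nodes on which ASM instances are running
$ srvctl status asm

# Stop and then restart the ASM instance on a single node
$ srvctl stop asm -n k2r720n1
$ srvctl start asm -n k2r720n1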

However if the OCR and voting disk files of the Clusterware are stored in Oracle ASM, shutting down ASM will also bring down Oracle CRS of the Clusterware. Here are some of the error messages shown in the alert.log file of Clusterware after executing the “shutdown” command in the ASM instance:

[crsd(5491)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/12.1.0/grid/log/k2r720n1/crsd/crsd.log.2013-02-01 17:17:29.604
...
[crsd(5491)]CRS-2765:Resource 'ora.VOCR.dg' has failed on server 'k2r720n1'.2013-02-01 17:17:29.924
[/u01/app/12.1.0/grid/bin/oraagent.bin(5600)]CRS-5822:Agent '/u01/app/12.1.0/grid/bin/oraagent_grid' disconnected from server. ...
...
[crsd(8205)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
...
[ohasd(3813)]CRS-2765:Resource 'ora.crsd' has failed on server 'k2r720n1'.
2013-02-01 17:17:50.104
[ohasd(3813)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart
In Cluster Registry process log: crsd.log:
...
2013-02-01 17:17:39.253: [  OCRASM][2793891616]ASM Error Stack : ORA-15077: could not locate ASM instance serving a required diskgroup
2013-02-01 17:17:39.254: [  OCRASM][2793891616]proprasmo: The ASM instance is down
2013-02-01 17:17:39.254: [  OCRRAW][2793891616]proprioo: Failed to open [+VOCR]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2013-02-01 17:17:39.254: [  OCRRAW][2793891616]proprioo: No OCR/OLR devices are usable
...
2013-02-01 17:17:39.258: [  CRSOCR][2793891616] OCR context init failure.  Error: PROC-26: Error while accessing the physical storage
ORA-15077: could not locate ASM instance serving a required diskgroup
2013-02-01 17:17:39.258: [ CRSMAIN][2793891616] Created alert : (:CRSD00111:) :  Could not init OCR, error: PROC-26: Error while accessing the physical storage
ORA-15077: could not locate ASM instance serving a required diskgroup
2013-02-01 17:17:39.258: [    CRSD][2793891616][PANIC] CRSD exiting: Could not init OCR, code: 26
2013-02-01 17:17:39.258: [    CRSD][2793891616] Done.

The crsctl check also showed that the CRS was offline.

$crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
At this point, restarting the ASM instance still failed to mount the diskgroup for the OCR/voting disk, which prevented the Clusterware from accessing the OCR and voting disk files. Without access to the OCR and voting disk files, we were not able to start the CRS service.
SQL> startup
ASM instance started
Total System Global Area  409194496 bytes
Fixed Size                  2228864 bytes
Variable Size             381799808 bytes
ASM Cache                  25165824 bytes
ORA-15032: not all alterations performed
ORA-15017: diskgroup "VOCR" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "VOCR"

ASM disk groups are registered as resources with Grid Infrastructure. Each resource registered with Grid Infrastructure has resource dependencies that define the relations among resources. These dependencies determine the startup and shutdown order of the resources. For example, the following shows the dependencies associated with the VOCR diskgroup:

$  crsctl stat res ora.VOCR.dg -p | grep DEPENDENCIES
START_DEPENDENCIES=pullup:always(ora.asm) hard(ora.asm)
STOP_DEPENDENCIES=hard(intermediate:ora.asm)

This shows that the VOCR diskgroup has a hard STOP dependency on the ASM instance: if the ASM instance stops, the VOCR diskgroup also stops. It also shows that the VOCR diskgroup has both hard and pull-up START dependencies on the ASM instance: the VOCR diskgroup can be started only when the ASM instance is running (hard dependency), and the VOCR diskgroup is automatically started whenever the ASM instance starts (pull-up dependency). For more details about resource dependencies in 12cR1 Clusterware, refer to Chapter 8.

Because of this dependency of the VOCR diskgroup on the ASM instance, whenever the ASM instance stops, the VOCR diskgroup goes offline. As a result, the OCR stored in the VOCR diskgroup becomes inaccessible, which stops the OCR service and brings down Oracle Clusterware. That is exactly what we saw in the alert.log and crsd.log files.

This case shows that if we have OCR/voting disk stored in ASM, the best way to shut down ASM is to shut down the entire Grid Infrastructure cleanly when all the components of the Grid Infrastructure are up:

$crsctl stop crs

We should avoid shutting down only the ASM instance, as Clusterware and ASM are deeply connected and integrated. They should be started up and shut down together.
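
A minimal sketch of that clean sequence on one node (run as root) is shown below; the last command simply verifies that the stack is back online:

# crsctl stop crs
# crsctl start crs
# crsctl check crs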

ASM Storage Structure

In this section, I discuss how Oracle ASM provides a volume manager and a cluster file system for both Oracle Clusterware and RAC Databases. I will start with the ASM storage components such as ASM disks, ASMLib, and ASM diskgroups, and then examine the ASM file system structure and how to manage ASM diskgroups, files, and directories.

ASM Disks

The ASM disk is the basic element of the Oracle ASM storage structure and is what ASM diskgroups are built from. An ASM disk is usually based on a storage device such as a physical disk, or on a storage volume such as a LUN presented by a storage array. Depending on how the storage is presented to the RAC node, an ASM disk can be built on a block device, a partition of a block device, a network-attached file, or a pseudo-device.

As stated in the preceding “Multipath Device Configuration” section, in an Oracle RAC environment that has multiple I/O paths to the external shared SAN storage, the same LUN in the SAN storage is presented as multiple devices on a RAC node, for example /dev/sdc and /dev/sde. These devices point to the same storage volume and have the same SCSI ID. We should not treat these devices as different ASM disks; otherwise, Oracle ASM will discover multiple ASM disks that point to the same storage volume and report an error. Nor should we use just one of the device names, such as /dev/sdc, for an ASM disk, because an ASM disk based on a single I/O path device cannot take advantage of the multiple I/O paths to the storage volume. The highly recommended method is to use the multipath pseudo-device of the storage volume for the ASM disk. Although Oracle ASM itself doesn’t provide a way to implement multipathing, Oracle ASM can work with the OS multipath utility by using the multipath pseudo-device for the ASM disk. This method ensures that the Oracle ASM disk benefits from load balancing as well as high availability against a single point of failure on the I/O paths from the RAC node to the storage array.

In order for an ASM instance to use a storage volume for an ASM disk, the ASM instance needs to discover the storage volume. To help the ASM instance discover a storage volume, a proper value may need to be set for the ASM instance initialization parameter ASM_DISKSTRING, with a pattern that matches the device name representing the storage volume. For example, to discover a storage volume that is represented by a Linux multipath pseudo-device with a name pattern like /dev/mapper/, the ASM instance initialization parameter can be set as ASM_DISKSTRING='/dev/mapper/*'.
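
As a quick sketch, assuming the ASM instance uses an spfile and the multipath pseudo-devices live under /dev/mapper, the parameter can be changed from within the ASM instance like this:

SQL> ALTER SYSTEM SET ASM_DISKSTRING = '/dev/mapper/*' SCOPE=BOTH;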

Another requirement is the ownership of the storage devices. For example, in the Linux environment, all devices are owned by the root user by default. In the previous section entitled “Set Ownership of the Devices,” I discussed the method that uses the Linux udev rules utility to change the owner of the devices from root to the owner of the ASM instance, such as the grid user. The following section discusses how to use ASMLib to create ASM disks with the proper ownership setting.
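
For reference, a minimal udev rule of the kind discussed in that earlier section might look like the following; the rule file name and the DM_NAME value are hypothetical and must match your own multipath device names:

# /etc/udev/rules.d/99-oracle-asmdevices.rules (hypothetical file name)
KERNEL=="dm-*", ENV{DM_NAME}=="data1", OWNER="grid", GROUP="asmadmin", MODE="0660"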

ASMLib: What and Why

For Linux platforms, especially Oracle Linux and Red Hat Enterprise Linux, Oracle introduced a support library called Oracle ASMLib that provides an alternative interface for accessing disks and gives the Oracle Database more efficient and capable access to the disk groups. Note that ASMLib is an optional tool; it is not required in order to use ASM for your Oracle Database. ASMLib provides a simplified method of meeting the two requirements for preparing storage devices as ASM disks:

  • Ensure the name consistency of a storage volume across all the RAC nodes
  • Set the proper ownership and permissions on the disk devices that are used for ASM disks

ASMLib meets both requirements. If you use ASMLib in your RAC environment, you don’t have to set up udev rules. Table 5-2 compares the three methods: multipathing, udev rules, and ASMLib.

Table 5-2. Comparison of Multipathing, udev Rules, and ASMLib


* As of Red Hat Enterprise Linux 6.x, the uid and gid parameters in the multipath.conf file that were previously used to set the ownership of devices have been deprecated. The ownership of the device is instead set by means of udev rules.

In addition to the two benefits mentioned previously, ASMLib makes it possible to add a data protection feature in the Linux kernel that allows a checksum to be computed for each data block. This checksum is validated at the firmware level of the storage adapter before the data is sent to the storage network, and validated again at the other end of the storage network before the data block is written to the physical disk. Wim Coekaerts’s ASMLib blog at https://blogs.oracle.com/wim/entry/asmLib explains this feature, which had its first implementation through a joint effort by EMC, Emulex, and Oracle. For the details of this solution and its implementation, refer to the article “How to Prevent Silent Data Corruption” at this link: www.oracle.com/technetwork/articles/servers-storage-dev/silent-data-corruption-1911480.html.

Download and Install ASMLib Packages

ASMLib configuration starts with ASMLib package installation. All ASMLib installations require three RPM packages to be installed on the host:

  • Support package: oracleasm-support
  • Tool library:  oracleasmlib
  • Kernel driver: oracleasm

For Oracle Linux 5.x and Red Hat Enterprise Linux 5.x, you can download the RPMs of these packages from www.oracle.com/technetwork/server-storage/linux/downloads/rhel5-084877.html and install them with the rpm tool. For example, in Oracle Linux 5.8:

#rpm -ivh oracleasm-support-2.1.7-1.el5.x86_64.rpm
#rpm -ivh oracleasm-2.6.18-274.el5-2.0.5-1.el5.x86_64.rpm
#rpm -ivh oracleasmlib-2.0.4-1.el5.x86_64.rpm

The story for Oracle Linux 6.x and Red Hat Enterprise Linux 6.x is different. Oracle ships Oracle Linux 6.x with two kernels: the Unbreakable Enterprise Kernel (UEK) and the Red Hat Compatible Kernel. You can boot your operating system with either of these kernels. For either kernel, you need to download oracleasm-support from the Oracle Unbreakable Linux Network (ULN) (https://linux.oracle.com/) if you have an active support subscription, or from the Oracle public Yum repository (http://public-yum.oracle.com). You also need to download oracleasmlib from the Oracle ASMLib 2.0 web site: http://www.oracle.com/technetwork/server-storage/linux/asmlib/ol6-1709075.html.

However, for the oracleasm kernel driver, you need to treat these two kernels differently. Since the oracleasm kernel driver is already built into the UEK kernel, you don’t need to load the oracleasm kernel driver if you boot your OS with the UEK kernel. If you use the Oracle Linux 6.x Red Hat Compatible Kernel, you need to download the oracleasm kernel driver kmod-oracleasm-2.0.6.rh1-2.el6.x86_64.rpm manually from ULN and install it:

rpm -ivh kmod-oracleasm-2.0.6.rh1-2.el6.x86_64.rpm

or you can install it from Oracle public yum http://public-yum.oracle.com using the yum tool:

# yum install kmod-oracleasm

This kernel driver is not version specific, and you don’t need to upgrade the driver when the kernel is upgraded. However, in order to install this RPM package, you need to run Oracle Linux Server release 6.4 or later with kernel version 2.6.32-358.el6 or later. Otherwise, you will get an error message similar to this one, where the OS version is Oracle Linux Server release 6.2:

[root]# rpm -ivh kmod-oracleasm-2.0.6.rh1-2.el6.x86_64.rpm
error: Failed dependencies:
        kernel >= 2.6.32-358.el6 is needed by kmod-oracleasm-2.0.6.rh1-2.el6.x86_64
        kernel(kmem_cache_alloc_trace) = 0x2044fa9e is needed by kmod-oracleasm-2.0.6.rh1-2.el6.x86_64
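
To verify which kernel you are running before attempting the installation, you can check it with the uname command; the output shown here is only illustrative:

# uname -r
2.6.32-358.el6.x86_64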

Starting with Red Hat Enterprise Linux Server 6.4, you can install ASMLib. The kernel driver package ‘kmod-oracleasm’ for Red Hat Enterprise Linux Server 6.4 is now available in the Red Hat Enterprise Linux 6 Supplementary RHN channel. You also need to download and install the ‘oracleasmlib’ and ‘oracleasm-support’ packages, as they are required by the ASMLib kernel package ‘kmod-oracleasm’. These two packages are maintained by Oracle and can be downloaded from the Oracle Technology Network at www.oracle.com/technetwork/server-storage/linux/asmlib/rhel6-1940776.html.

For other details about Oracle ASMLib for Red Hat Enterprise Linux, refer to Oracle support note “Oracle ASMLib Software Update Policy for Red Hat Enterprise Linux Supported by Red Hat [ID 1089399.1].” For ASMLib support in other versions of Linux, you can refer to the ASMLib Release Notes at www.oracle.com/technetwork/server-storage/linux/release-notes-092521.html.

Configure ASMLib and Create ASM Disks

After you install the three ASMLib packages (two ASMLib packages for the UEK kernel), you need to configure the ASM library driver with the oracleasm configure command, as shown in the following example:

# oracleasm configure -i
Configuring the Oracle ASM library driver.
This will configure the on-boot properties of the Oracle ASM library
driver. The following questions will determine whether the driver is
loaded on boot and what permissions it will have. The current values
will be shown in brackets ('[]'). Hitting <ENTER> without typing an
answer will keep that current value. Ctrl-C will abort.
Default user to own the driver interface []: grid
Default group to own the driver interface []: asmadmin
Start Oracle ASM library driver on boot (y/n) [n]: y
Scan for Oracle ASM disks on boot (y/n) [y]: y
Writing Oracle ASM library driver configuration: done

This configuration ensures that all the ASM disks are owned by the grid user and the asmadmin group. Then you can create the ASM disks on one RAC node:

# service oracleasm createdisk  OCR1 /dev/mapper/ocrvoting1
# service oracleasm createdisk  OCR2 /dev/mapper/ocrvoting2
# service oracleasm createdisk  OCR3 /dev/mapper/ocrvoting3
# service oracleasm createdisk  DATA1 /dev/mapper/data1
# service oracleasm createdisk  DATA2 /dev/mapper/data2
# service oracleasm createdisk  FRA /dev/mapper/fra

Scan the ASM disks on all other RAC nodes:

# service oracleasm scandisks
Scanning the system for Oracle ASMLib disks:               [  OK  ]
# service oracleasm listdisks
DATA1
DATA2
FRA
OCR1
OCR2
OCR3

With Oracle ASMLib, Oracle ASM disks are named with the prefix ‘ORCL:’, such as ‘ORCL:OCR1’ and ‘ORCL:DATA1’. The Oracle ASM instance is able to discover these ASM disks with the default ASM_DISKSTRING setting, and you can see the ASM disks in /dev/oracleasm/disks. All of these ASM disks are owned by the grid user.

# ls –l /dev/oracleasm/disks
total 0
brw-rw---- 1 grid asmadmin    8,       81    OCT 20, 10:35    DATA1
brw-rw---- 1 grid asmadmin    8,       49    OCT 20, 10:35    DATA2
brw-rw---- 1 grid asmadmin    8,      129    OCT 20, 10:35    DATA3
brw-rw---- 1 grid asmadmin    8,      130    OCT 20, 10:35    DATA4
brw-rw---- 1 grid asmadmin    8,       65    OCT 20, 10:35    OCR1
brw-rw---- 1 grid asmadmin    8,       97    OCT 20, 10:35    OCR2
brw-rw---- 1 grid asmadmin    8,       98    OCT 20, 10:35    OCR3
brw-rw---- 1 grid asmadmin    8,       98    OCT 20, 10:35    FRA

Notice that all the Oracle ASMLib commands require root privilege to execute.
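
From the ASM instance side, you can also confirm that these ASMLib disks are discoverable; the following is a minimal sketch against the standard V$ASM_DISK view (the values returned will differ in your environment):

SQL> SELECT NAME, PATH, HEADER_STATUS FROM V$ASM_DISK;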

ASM Diskgroup

Once an ASM disk is discovered, it can be used to create ASM diskgroups, which in turn store the Clusterware files, the database files, and ACFS volumes such as a shared ACFS Oracle home. Every Oracle ASM diskgroup is divided into allocation units (AUs), the size of which is determined by the AU_SIZE diskgroup attribute. The AU_SIZE can be 1, 2, 4, 8, 16, 32, or 64 MB. Files that are stored in an ASM diskgroup are separated into stripes and evenly spread across all the disks in the diskgroup. This striping aims to balance the load across all the disks in the diskgroup and reduce I/O latency. Every ASM disk that participates in the striping should have the same capacity and performance characteristics. There are two types of striping: coarse striping and fine striping. The coarse stripe size depends on the size of the AU; coarse striping is used for most files, including database files, backup sets, and so on. Fine striping is used for control files, online redo logs, and flashback logs; its stripe size is 128 KB.
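
If you want an AU size other than the default, it can be requested only at diskgroup creation time through the au_size attribute. The following is a minimal sketch; the diskgroup name, the disk names, and the 4MB value are purely illustrative:

SQL> CREATE DISKGROUP data_4m EXTERNAL REDUNDANCY
     DISK 'ORCL:DATA1', 'ORCL:DATA2'
     ATTRIBUTE 'au_size' = '4M';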

The failure group concept was introduced to define a subset of disks in a diskgroup that could fail at the same time; for example, disks in the same failure group could be linked to the same storage controller. If we mirror the storage in an ASM diskgroup, we want to make sure to put the mirroring copies on the disks in different failure groups to avoid losing all the mirroring copies at the same time.

ASM provides three types of redundancy for ASM diskgroup:

  1. External Redundancy: Oracle ASM does not provide mirroring and relies on the storage system to provide RAID protection. The ASM files are striped across all the disks of the diskgroup. With this setting, the ASM diskgroup cannot tolerate the failure of any disk in the diskgroup: a failure of one or more disks leads to dismount of the diskgroup from the ASM instance.

    It is highly recommended to use an external RAID configuration such as RAID 1+0 for the ASM disks to ensure the redundancy of the disks.

  2. Normal Redundancy: Oracle ASM uses two-way mirroring, which requires two failure groups. The effective disk space is one-half of the total disk capacity. The mirroring is at the file extent level, which means every file has two copies of each extent: the primary extent and the mirror extent. In order to achieve the maximum I/O bandwidth and load balance on all the disks, the primary extents are distributed across both failure groups. If the primary copy of an extent fails, the mirror copy of the extent is read from the other failure group.
  3. High Redundancy: Oracle ASM uses three-way mirroring, which requires at least three failure groups. Due to this three-way mirroring, the effective disk space is one-third of the total disk capacity. With this setting, the ASM diskgroup can tolerate the failure of two failure groups: if the primary copy of an extent fails, one of the mirror copies is used; if both the primary copy and one of the mirror copies fail, the remaining mirror copy is used.

An ASM diskgroup can be created with the ASMCA GUI tool, or with a SQL command like this one after logging in to the ASM instance with the SYSASM privilege:

SQL>  CREATE DISKGROUP  data  NORMAL  REDUNDANCY
   FAILGROUP fg1 disk 'ORCL:DATA1' name data1, 'ORCL:DATA2' name data2,
   FAILGROUP fg2  disk  'ORCL:DATA3' name data3, 'ORCL:DATA4' name data4;

If you decide to take advantage of a RAID 1+0 configuration in the external SAN storage, you can use external redundancy for the diskgroup:

SQL>  CREATE DISKGROUP  data  EXTERNAL  REDUNDANCY
   Disk 'ORCL:DATA1' name data1, 'ORCL:DATA2' name data2;

You can perform the diskgroup administration tasks in an ASM instance on one of the RAC nodes with sqlplus:

SQL> DROP DISKGROUP data INCLUDING CONTENTS;
SQL> ALTER DISKGROUP DATA ADD DISK 'ORCL:DATA3' REBALANCE POWER 3;
SQL> ALTER DISKGROUP DATA DROP DISK data2 REBALANCE POWER 3;

The REBALANCE POWER clause specifies the power of the rebalancing operation. Combining the ADD DISK and DROP DISK operations in a single ALTER DISKGROUP statement allows you to perform online storage migration for your database: that is, to migrate your database from one storage array to another while keeping it online.

For example, imagine that you want to migrate your database from an old storage array to a new one. The DATA1 and DATA2 ASM disks are on the old storage, and the DATA3 and DATA4 ASM disks are on the new storage. Executing the following ALTER DISKGROUP SQL command in the ASM instance of one RAC node will migrate your database from the old storage to the new storage without any database downtime:

SQL>ALTER DISKGROUP DATA ADD DISK 'ORCL:DATA3' , 'ORCL:DATA4'
                   DROP DISK 'ORCL:DATA1','ORCL:DATA2' REBALANCE Power 8;

This feature has been widely used for storage migration and can be considered one of the greatest benefits of using ASM as the storage solution for Oracle Database.
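
While such a rebalance is running, you can monitor its progress from the ASM instance; a simple sketch against the standard V$ASM_OPERATION view, which reports the current power and the estimated minutes remaining, is:

SQL> SELECT GROUP_NUMBER, OPERATION, STATE, POWER, EST_MINUTES
     FROM V$ASM_OPERATION;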

The range of values for the POWER clause is 0-11 inclusive if the diskgroup ASM compatibility is set to less than 11.2.0.2. This range is extended to 1-1024 if the diskgroup ASM compatibility is set to 11.2.0.2 or higher; for example, COMPATIBLE.ASM=12.1.0.0 in Oracle 12cR1 ASM.

ASMCMD Utility and File System

Oracle ASM provides the volume manager and a file system for the Oracle Database files.

When an ASM diskgroup is created in ASM, this diskgroup will be presented as a file system for Oracle Databases. Unlike the OS file system, which you see and manage through OS file system commands, ASM files have to be managed through the following two interfaces:

  • Oracle ASM command-line utility (ASMCMD): This utility provides a tool to administer Oracle ASM; for example, to manage ASM instances and diskgroups, and to control file access for diskgroups, files, and directories within a diskgroup.
  • SQL commands: You can log in to the ASM instance to execute SQL commands. ASM also provides a set of V$ASM views for you to query the status of ASM disks, diskgroups, and so on.

ASMCMD Utility

To use the ASMCMD utility, you need to log in as an OS user with ASM administrative privileges, such as the grid user, and set the ORACLE_SID and ORACLE_HOME environment variables:

$ env | grep ORA
ORACLE_SID=+ASM1
ORACLE_HOME=/u01/app/12.1.0/grid

Enter the ASMCMD utility by running the OS command ‘asmcmd’, which connects to the ASM instance:

[grid@k2r720n1 ∼]$ asmcmd
ASMCMD>

Then you can run the ASMCMD commands in the interactive mode from the ASMCMD> prompt.

ASMCMD> du
Used_MB      Mirror_used_MB
1036439       1036439

Most ASMCMD commands are similar to Linux file system commands, such as ls, cd, mkdir, pwd, rm, and so on. You can type “help” to get a list of ASMCMD commands. The ASMCMD utility can also be used in non-interactive mode, in which you run a single command directly from the OS shell:

[grid@k2r720n1 ∼]$ asmcmd ls

Or put a series of ASMCMD commands in a command file like this test.cmd file:

[grid@k2r720n1 ∼]$ cat test.cmd
ls
pwd
du

And execute this command file using the ASMCMD utility:

[grid@k2r720n1 ∼]$  asmcmd <test.cmd
ASMCMD> DATA/
VOCR/
ASMCMD> +
ASMCMD> Used_MB      Mirror_used_MB
        1036439             1036439

ASM File System

We can use the ASMCMD utility to examine the ASM file system structure. When you enter the ASMCMD utility, you are at the very top of the ASM file system, marked as “+,” which is similar to “/” in the Linux file system. From the top, you can see all the diskgroups in ASM, which are subdirectories under “+”, and navigate down the file system directories using the cd command:

[grid@k2r720n1 ∼]$ asmcmd
ASMCMD> pwd
+
ASMCMD> ls
DATA/
VOCR/
ASMCMD> cd DATA
ASMCMD> cd KHDB

On the ASM file system, each file is given a fully qualified file name during its creation:

+diskgroup/dbname/filetype/filetypetag.file.incarnation

For example, the system tablespace data file is stored in '+VOCR/KHDB/DATAFILE/SYSTEM.256.789808227':

ASMCMD> ls -l +VOCR/KHDB/DATAFILE/SYSTEM.256.789808227
Type      Redund  Striped  Time             Sys  Name
DATAFILE  UNPROT  COARSE   NOV 01 07:00:00  Y    SYSTEM.256.789808227

The control file is ‘+DATA/KHDB/CONTROLFILE/Current.256.789808289’:

ASMCMD> ls -l +DATA/KHDB/CONTROLFILE/Current.256.789808289
Type         Redund  Striped  Time             Sys  Name
CONTROLFILE  UNPROT  FINE     NOV 01 07:00:00  Y    Current.256.789808289

This fully qualified file name also indicates the file directory structure where the file is located. You can use the cd command to navigate the directory structure to reach the leaf directory where this file is located.

ASMCMD> cd +VOCR/KHDB/DATAFILE
ASMCMD> ls
SYSAUX.257.789808227
SYSTEM.256.789808227
UNDOTBS1.258.789808227
UNDOTBS2.264.789808307
USERS.259.789808227

The ASM file that stores OCR and voting disks looks like this:

ASMCMD> ls -l +VOCR/kr720n-scan/OCRFILE/REGISTRY.255.789801895
Type     Redund  Striped  Time             Sys  Name
OCRFILE  UNPROT  COARSE   NOV 01 07:00:00  Y    REGISTRY.255.789801895

This is the ASM instance spfile:

ASMCMD> ls -l +VOCR/kr720n-scan/ASMPARAMETERFILE/REGISTRY.253.789801893
Type              Redund  Striped  Time             Sys  Name
ASMPARAMETERFILE  UNPROT  COARSE   JUL 28 05:00:00  Y    REGISTRY.253.789801893

Note   Since these two files are not database files, they use the cluster name ‘kr720n-scan’ instead of a database name in the file path.

Each ASM diskgroup can store files from multiple databases as well as Clusterware and ASM instance files. For example, the +VOCR diskgroup stores the files of the KHDB and KHRN databases as well as the kr720n-scan cluster’s files:

ASMCMD> pwd
+VOCR
ASMCMD> ls
KHDB/
KHRN/
kr720n-scan/

Manage ASM Using SQL Command and V$ASM Views

A set of SQL commands and V$ASM views can be used to administer ASM diskgroups and ASM disks, as well as the ASM instance itself. In the last section, I showed the SQL commands that are used to create and alter diskgroups. This section covers some V$ASM views and how to write queries against them to monitor the capacity and performance of ASM disks and diskgroups. Some of the commonly used V$ASM views are V$ASM_DISKGROUP, V$ASM_DISK, V$ASM_DISK_STAT, and V$ASM_DISKGROUP_STAT.

The following query shows the capacity and space usage of ASM diskgroups and the ASM disks:

SQL> SELECT D.PATH, D.FAILGROUP, D.TOTAL_MB DISKSIZE, G.NAME GROUP_NAME,
     G.TOTAL_MB GROUPSIZE, G.FREE_MB GROUP_FREE
     FROM V$ASM_DISK D, V$ASM_DISKGROUP G
     WHERE D.GROUP_NUMBER = G.GROUP_NUMBER;

PATH          FAILGROUP      DISKSIZE   GROUP_NAME   GROUPSIZE   GROUP_FREE
----------    ----------     --------   ----------   ---------   ----------
/dev/dm-7     DATA_0000       1433589   DATA           1433589       515833
/dev/dm-5     VOCR_0000        138231   VOCR            138231        19298

And this query shows the I/O performance on each ASM disk:

SQL> SELECT PATH, READS, WRITES, READ_TIME, WRITE_TIME,
     READ_TIME/READS AVEREADTIME, WRITE_TIME/WRITES AVGWRITETIME
     FROM V$ASM_DISK_STAT;

PATH          READS    WRITES   READ_TIME   WRITE_TIME  AVEREADTIME  AVGWRITETIME
----------  -------   -------  ----------   ----------  -----------  ------------
/dev/dm-7      2371    193779     .747134   258.353864   .000315113     .00133324
/dev/dm-5   2237295    600232  635.231634   453.596385   .000283928    .000755702

Store OCR and Voting Disk in ASM

This section looks at how to store the two most important sets of Oracle Clusterware files, the OCR and the voting disk files, in ASM.

Choose ASM for OCR and Voting Disk at GI Installation

The storage used for the OCR and voting disk files has varied across Oracle Clusterware releases: raw devices in 10gR2 and block devices in 11gR1. Starting with 11gR2, the OCR and voting disk files can be stored in ASM. Unless you are upgrading your Clusterware from 11gR1 to 11gR2, where you can keep the voting disk on a block device, you should put the voting disk files in ASM. The OCR and voting disk files can be stored in the same ASM diskgroup as the Oracle Database files or in a separate ASM diskgroup. The ASM diskgroup that stores the voting disk files can be configured with one of the following three redundancy levels. These are similar to those of a regular diskgroup for database files, but the minimum number of failure groups required is higher than for diskgroups that store only database files. The number of voting disk files is determined by the number of failure groups in the ASM diskgroup that stores them:

  1. External Redundancy: no mirroring; only one failure group is needed. This provides only one copy of the voting disk file. It is strongly recommended to have an external RAID configuration for this setting.
  2. Normal Redundancy: at least three failure groups are needed. This provides three copies of the voting disk files.
  3. High Redundancy: at least five failure groups are needed. This provides five copies of the voting disk files.

If you decide to store the voting disk files in the same ASM diskgroup as the data files, adding an additional failure group could mean adding a huge amount of disk space, which you may not have. To solve this problem, the quorum failure group was introduced. When a quorum failure group is added to a diskgroup, it is used only to store an extra copy of the voting disk file; no database files are stored in this failure group. As a result, the disk for this failure group can be much smaller than the other disks in the ASM diskgroup. For example, the ASM disk for a quorum failure group can be as small as 300MB to cover the 200MB voting disk file. The following example shows how to use the QUORUM FAILGROUP clause to include a quorum failure group in a diskgroup:

CREATE DISKGROUP data NORMAL REDUNDANCY
FAILGROUP fg1 DISK 'ORCL:DATA1'
FAILGROUP fg2 DISK 'ORCL:DATA2'
QUORUM FAILGROUP fg3 DISK 'ORCL:OCR1'
ATTRIBUTE 'compatible.asm' = '12.1.0.0.0';

In this case, the disks ORCL:DATA1 and ORCL:DATA2 can be 200GB each, while the disk ORCL:OCR1 can be as small as 300MB.

During the Grid Infrastructure installation you will be asked to provide the disks to create the ASM diskgroup for OCR and voting disk files (Figure 5-10). You have an option to select the redundancy level setting of the ASM diskgroup for OCR and voting disks.


Figure 5-10. Selecting three ASM disks for the high-redundancy diskgroup

Oracle 12cR1 introduced the Grid Infrastructure Management Repository (also called the Cluster Health Monitor (CHM) repository), which is a central database used to store all the metrics data collected by the System Monitor Service process of the Clusterware on all the cluster nodes. Currently, this repository is configured during Oracle Grid Infrastructure installation. Refer to the CHM section in Chapter 2. Figure 2-6 shows the Grid Infrastructure Management Repository option that you can select during Grid Infrastructure installation. If you select this option, the diskgroup you are going to create needs to be big enough to store the CHM repository as well as the OCR and voting disks; otherwise you will get the error message shown in Figure 5-11. In my test run, I initially allocated a 20GB diskgroup to store the CHM repository, OCR, and voting disks and hit this error. Once I made the ASM diskgroup a bigger size, such as 100GB, the issue was resolved.


Figure 5-11. The diskgroup has insufficient space for the CHM repository

As part of a successful Grid Infrastructure installation, the VOCR diskgroup was created with three failure groups: VOCR1, VOCR2, and VOCR3. Three copies of the voting disk files are stored in this diskgroup, as shown in the following queries:

SQL> SELECT d.PATH, g.NAME, d.FAILGROUP FROM V$ASM_DISK d, V$ASM_DISKGROUP g
     WHERE d.GROUP_NUMBER = g.GROUP_NUMBER and g.NAME = 'VOCR';

PATH                 NAME                 FAILGROUP
----------           ----------           ----------
ORCL:VOCR1           VOCR                 VOCR1
ORCL:VOCR2           VOCR                 VOCR2
ORCL:VOCR3           VOCR                 VOCR3

Three copies of voting disk files are as follows:

[grid@kr720n1 bin]$ ./crsctl   query css votedisk
##    STATE    File Universal Id                                     File Name          Disk group
--    -----    --------------------------------                     --------------     ------------
 1.   ONLINE   8b562c9b2ec34f88bfe8343142318db7                     (ORCL:VOCR1)        [VOCR]
 2.   ONLINE   1c88a98d5c9a4f88bf55549b6c2dc298                     (ORCL:VOCR2)        [VOCR]
 3.   ONLINE   7581ef0020094fccbf1c8d3bca346eb1                     (ORCL:VOCR3)        [VOCR]
 
Located three voting disk(s).

Unlike the voting disk files, only one copy of the OCR can be stored in a given ASM diskgroup or cluster file system. Oracle requires at least one OCR location and allows up to five copies of the OCR. If you want to store multiple copies of the OCR, you can store them in multiple ASM diskgroups or in a combination of ASM diskgroups and cluster file systems.

In order to add an additional copy of OCR on a different ASM diskgroup (for example, DATA2), you can use the following command:

$ocrconfig -add +DATA2

This example shows how to add an additional copy of your OCR on a cluster file system labeled /u01/data:

$ocrconfig -add /u01/data/ocr.dbf

Another example showing how to move the original OCR to a different ASM diskgroup DATA3 is as follows:

$ocrconfig -replace +VOCR -replacement +DATA3

The following example shows how to list all copies of OCR:

$ ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 2860
Available space (kbytes) : 259260
ID : 2322227200
Device/File Name :+VOCR Device/File integrity check succeeded
Device/File Name :+DATA1 Device/File integrity check succeeded
Device/File Name :+DATA2 Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded

In this example, there are three copies of the OCR. They are stored in three ASM diskgroups: +VOCR, +DATA1, and +DATA2, respectively.


Move OCR and Voting Disk Files to a New ASM Diskgroup

Although it may not occur frequently, you may need to move your OCR and voting disk files to a different ASM diskgroup. This kind of task may be required on two occasions: 1) moving the OCR and voting disk files of a newly upgraded 11gR2 Clusterware from raw devices or block devices to an ASM diskgroup; and 2) moving the OCR and voting disk files from one diskgroup to a new diskgroup. The methods for moving the OCR and voting disks in these two cases are very similar. The following is an example of moving the OCR and voting disk files from one ASM diskgroup to another. These steps also apply if you need to move them from raw devices or block devices.

  1. Show that the current OCR and voting disks are in +OCRV diskgroup:
    $ ocrcheck
    Status of Oracle Cluster Registry is as follows:
             Version                  :          3
             Total space (kbytes)     :     262120
             Used space (kbytes)      :       3480
             Available space (kbytes) :     258640
             ID                       : 1318929051
             Device/File Name         :      +OCRV
                                        Device/File integrity check succeeded
     
                                        Device/File not configured
     
                                        Device/File not configured
     
                                        Device/File not configured
     
                                        Device/File not configured
     
             Cluster registry integrity check succeeded
     
             Logical corruption check bypassed due to non-privileged user
     
    $ crsctl query css votedisk
    ##  STATE    File Universal Id                   File Name      Disk group
    --  -----    -----------------                   ---------      ---------
    1. ONLINE    327b1f75f8374f1ebf8611c847ffbdad    (ORCL:OCR1)    [OCRV]
    2. ONLINE    985958d30e4f4f45bf8134a9b07cae4f    (ORCL:OCR2)    [OCRV]
    3. ONLINE    80650a28b2fa4ffdbf93716a8bc22668    (ORCL:OCR3)    [OCRV]
  2. Create the ASM diskgroup “VOCR”:
    SQL>CREATE DISKGROUP VOCR NORMAL REDUNDANCY
    FAILGROUP VOCRG1 DISK  'ORCL:VOCR1' name VOCR1
    FAILGROUP VOCRG2 DISK  'ORCL:VOCR2' name VOCR2
    FAILGROUP VOCRG3 DISK  'ORCL:VOCR3' name VOCR3;
    SQL>alter diskgroup VOCR set attribute 'compatible.asm'='12.1.0.0.0';
  3. Move OCR to the new ASM diskgroup:

    Add the new ASM diskgroup for OCR

    $ocrconfig -add +VOCR

    Drop the old ASM diskgroup from OCR

    $ocrconfig -delete +OCRV
  4. Move the voting disk files from old ASM diskgroup to the new ASM diskgroup:
    $ crsctl replace votedisk +VOCR
    Successful addition of voting disk 29adbae485454f72bf9d66519c921e17.
    Successful addition of voting disk 529b802332674f9fbf8543d5acd55672.
    Successful addition of voting disk 017523650cd64f47bf65bb90e8ed98e6.
    Successful deletion of voting disk 327b1f75f8374f1ebf8611c847ffbdad.
    Successful deletion of voting disk 985958d30e4f4f45bf8134a9b07cae4f.
    Successful deletion of voting disk 80650a28b2fa4ffdbf93716a8bc22668.
    Successfully replaced voting disk group with +VOCR.
    CRS-4266: Voting file(s) successfully replaced
     
    $ crsctl query css votedisk
    ##  STATE    File Universal Id                   File Name       Disk group
    --  -----    -----------------                   ---------       ---------
    1. ONLINE    29adbae485454f72bf9d66519c921e17    (ORCL:VOCR1)    [VOCR]
    2. ONLINE    529b802332674f9fbf8543d5acd55672    (ORCL:VOCR2)    [VOCR]
    3. ONLINE    017523650cd64f47bf65bb90e8ed98e6    (ORCL:VOCR3)    [VOCR]
    Located 3 voting disk(s).

    Make sure the /etc/oracle/ocr.loc file gets updated to point to the new ASM diskgroup:

    $ more /etc/oracle/ocr.loc
    ocrconfig_loc=+VOCR
    local_only=FALSE
  5. Shut down and restart CRS (the Clusterware) on all the nodes, using the force option for the stop:
    # ./crsctl stop crs -f
    # ./crsctl start crs
    CRS-4123: Oracle High Availability Services has been started.
    # ./crsctl check crs
    CRS-4638: Oracle High Availability Services is online
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online

    Before you can dismount the old ASM diskgroup, you need to check whether anything is still stored on it. In this example, we found that the ASM instance spfile was on that diskgroup and needed to move the spfile to the new ASM diskgroup.

  6. Move the ASM spfile to the new diskgroup.

    Check the current ASM spfile location:

    SQL> show parameter spfile
     
    NAME         TYPE            VALUE
    ----         -----------     ------------------------------------------------------------
    spfile       string         +OCRV/k2r720n-cluster/asmparameterfile/registry.253.773325457

    Create a pfile from the spfile:

    SQL> create pfile='/home/grid/init+ASM.ora' from spfile;

    Recreate the new spfile on the new diskgroup VOCR from the pfile:

    SQL> create spfile='+VOCR' from pfile='/home/grid/init+ASM.ora';

    Restart HAS (Oracle High Availability Services)

    # ./crsctl stop has
    # ./crsctl start has
      

    Check the new spfile location:

    SQL> show parameter spfile;
     
    NAME         TYPE           VALUE
    ----         -----------    ------------------------------------------------------------
    spfile       string         +VOCR/k2r720n-cluster/asmparameterfile/registry.253.779041523

In this example, there are a few places where you need to pay special attention. Since the new ASM diskgroup VOCR is for the OCR and voting disk files, you need to follow the failure group rules, namely, three failure groups for normal redundancy and five failure groups for high redundancy; otherwise, the command ‘crsctl replace votedisk +VOCR’ will fail with the error “ORA-15274: Not enough failgroups(s) to create voting files.” You also need to set the compatible.asm attribute to 12.1.0.0.0 for the new VOCR diskgroup in order for it to store the OCR and the voting disk files. The default setting of the compatible.asm attribute of an ASM diskgroup is ‘10.1.0.0.0’.
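
To double-check the attribute before attempting the move, you can query the standard V$ASM_ATTRIBUTE view; the following sketch assumes the diskgroup is named VOCR, as in this example:

SQL> SELECT NAME, VALUE FROM V$ASM_ATTRIBUTE
     WHERE GROUP_NUMBER = (SELECT GROUP_NUMBER FROM V$ASM_DISKGROUP WHERE NAME = 'VOCR')
     AND NAME LIKE 'compatible%';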

ACFS

When it was introduced in Oracle 10gR1, Oracle ASM was designed to be a volume manager and a file system for Oracle Database files, not for general-purpose files. This remained the case until Oracle ACFS was introduced in Oracle 11gR2. Oracle ACFS is an extension of Oracle ASM for storing non–Oracle Database files. Such files can be software binaries (such as the Oracle Database binaries), application files, trace files, BFILEs, video, audio, text, images, and other general-purpose files. To help you understand the file types supported by Oracle ASM and Oracle ACFS, here are some general guidelines:

  1. Oracle ASM is designed for and optimized to provide the best performance for Oracle Database files such as data files, controlfiles, spfiles, redo logs, tempfiles, and the OCR and voting disk files of Oracle Clusterware.
  2. In Oracle 11gR2, Oracle ACFS provided support for non-database files and didn’t support database files. This limitation has been removed in Oracle 12c. With Oracle 12c, Oracle ACFS provides support for all the database files for Oracle Database 12cR1 except for data files and redo logs in an Oracle Restart configuration and database files on Windows. Oracle ACFS on Windows supports only RMAN backups, archive logs, and data pump dumpsets.
  3. Oracle ACFS supports Oracle Database home files for Oracle Database release 11gR2 or higher. However, it doesn’t support any files associated with the management of Oracle ASM. It doesn’t support Oracle Grid Infrastructure home, which should be stored in the local disk in each RAC node.
  4. Starting with 11.2.0.3, Oracle ACFS supports RMAN backups, archive logs (ARCHIVELOG file type), and data pump dump sets (DUMPSET file type).

In Oracle 12cR1, Oracle ACFS supports all database files for Oracle Database 12cR1, except for data files and redo logs in an Oracle Restart configuration; it also doesn’t support data files for Oracle 12cR1 on Windows. Oracle ACFS is built on Oracle ASM technology: since Oracle ACFS files are stored in an Oracle ASM diskgroup, ACFS leverages data reliability through redundancy and mirroring within the diskgroup and also benefits from I/O load balancing through data striping among the ASM disks. Oracle ACFS is a cluster file system that can be accessed from all nodes of a cluster. Some of the use cases for this cluster file system include a shared Oracle RAC Database home and other application binaries and files that need to be shared by multiple hosts.

Unlike Oracle Database files in Oracle ASM, which are created in an ASM diskgroup directly, Oracle ACFS is based on ASM volumes, which are created in an Oracle ASM diskgroup. These volumes are equivalent to logical volumes in Linux. Like the LVM (Logical Volume Manager), which provides logical volume management in Linux, the Oracle ASM Dynamic Volume Manager (ADVM), introduced in Oracle 11gR2, manages ASM volumes and also provides the standard device driver interface to file systems such as ACFS, OCFS2, and regular ext3.

Figure 5-12 shows an example of an ASM diskgroup configuration. In this example, there are two ASM diskgroups in ASM: DATADG and ACFSDG. The diskgroup DATADG is for the Oracle Database files and the Oracle Clusterware OCR and voting disk files. Two databases, namely RACDB1 and RACDB2, store their database files in two directories of the ASM file system in the diskgroup DATADG:


Figure 5-12. Oracle ASM and Oracle ACFS

+DATADG/RACDB1/
    +DATADG/RACDB2/

And the Oracle 11gR2 Clusterware stores its OCR/voting disk files in +DATADG/<Clustername>.

The diskgroup ACFSDG is for Oracle ACFS. ASM volumes are created in the ASM diskgroup ACFSDG. For each of these volumes, an ASM device on the operating system is automatically created under /dev/asm. These volumes are managed by Oracle ADVM. Oracle ACFS can be created on these volumes. Oracle ADVM provides the standard device driver interface for ACFS to access these volumes. These ACFS are mounted under given OS mount points; for example, /acfs1, /acfs2, and /acfs3 in this case.

Create ACFS

Before starting to create an ACFS, you have to ensure that several prerequisites are met in your environment.

In Oracle 11gR2, you need to ensure that the required kernel modules are loaded by running the lsmod command. These modules are oracleasm, oracleoks, oracleadvm, and oracleacfs. If they are all loaded, the lsmod command should return results like the following:

$lsmod
   Module          Size       Used by
   oracleacfs      781732     0
   oracleadvm      212736     0
   oracleoks       224992     2 oracleacfs, oracleadvm
   oracleasm       46484      1

If these modules are not automatically loaded, you need to load them manually using the acfsload command. You might see an error message like this:

[root@k2r720n1 ∼]# /u01/app/11.2.0/grid/bin/acfsload start -s
ACFS-9459: ADVM/ACFS is not supported on this OS version: '2.6.32-220.el6.x86_64'

This indicates that the OS version is not supported by ACFS. Refer to MOS note [1369107.1] to check the current ACFS support status for your OS platform. For example, at the time this chapter was written, ACFS was supported on update 3 or later of Oracle Linux 5.x and Red Hat Enterprise Linux 5.x. For Oracle Linux 6.x and Red Hat Enterprise Linux 6.x, it was necessary to apply the 11.2.0.3.3 GI PSU for 6.0, 6.1, and 6.2, and the 11.2.0.3.4 GI PSU for 6.3. If ACFS is not supported on your OS kernel version, the acfsload command will report the error shown in the preceding example, and in the ASMCA GUI interface the Volumes tab and ASM Cluster File Systems tab are grayed out, as shown in Figure 5-13. You need to follow the support document’s instructions to get ACFS supported on your OS. For example, since the test cluster was running EL 6.2 with kernel version ‘2.6.32-220.el6.x86_64’, it was required to apply the 11.2.0.3.3 GI PSU (Patch 13919095) to get ACFS supported.


Figure 5-13. Volume tab and ASM Cluster file systems tab grayed out

In Oracle Clusterware 12cR1 and Oracle ASM 12cR1, these modules are loaded automatically. For example, these are the related modules automatically loaded on Oracle Linux 6.3 after an Oracle 12cR1 Grid Infrastructure installation:

# lsmod | grep oracle
          oracleacfs            3053165   2
          oracleadvm            320180    8
          oracleoks             417171    2 oracleacfs,oracleadvm
          oracleasm             53352     1

The creation of ACFS starts with the ASM volumes. Once all the prerequisites are met, you need to create ASM volumes for ACFS. ASM volumes can be created in one of the following ways:

  • Use ASMCA GUI tool
  • Use Oracle Enterprise Manager
  • Use ASMCMD tool
  • Use SQL*Plus

Here is an example of creating an ASM volume acf_vm1 on diskgroup ACFSDG1 using the ASMCMD tool:

ASMCMD> volcreate -G ACFSDG1 -s 1g acf_vm1
ASMCMD>
ASMCMD> volinfo -G ACFSDG1 acf_vm1
Diskgroup Name: ACFSDG1
 
         Volume Name: ACF_VM1
         Volume Device: /dev/asm/acf_vm1-105
         State: ENABLED
         Size (MB): 1024
         Resize Unit (MB): 32
         Redundancy: UNPROT
         Stripe Columns: 4
         Stripe Width (K): 128
         Usage:
         Mountpath:

Now you can use the mkfs command to create an ACFS file system on this volume device /dev/asm/acf_vm1-105:

# mkfs -t acfs  /dev/asm/acf_vm1-105
mkfs.acfs: version                   = 12.1.0.1.0
mkfs.acfs: on-disk version           = 39.0
mkfs.acfs: volume                    = /dev/asm/acf_vm1-105
mkfs.acfs: volume size               = 1073741824
mkfs.acfs: Format complete.

Then you can mount the ACFS. The acfsutil registry command can be used to register the ACFS with the ACFS mount registry. Once the file system is registered, the ACFS mount registry ensures that it is mounted automatically.

Create an OS directory as a mount point: /u01/acfs/asm_vol1

mkdir /u01/acfs/asm_vol1
/sbin/acfsutil  registry -a /dev/asm/acf_vm1-105 /u01/acfs/asm_vol1
acfsutil registry: mount point /u01/acfs/asm_vol1 successfully added to Oracle Registry

Now you can see the ACFS that is mounted on /u01/acfs/asm_vol1.

# df -k | egrep 'Filesystem|asm'
Filesystem            1K-blocks      Used      Available  Use%      Mounted on
/dev/asm/acf_vm1-105  1048576        119112    929464     12%       /u01/acfs/asm_vol1
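
You can also inspect the mounted file system with the acfsutil info fs command (run as root); a minimal sketch for the mount point used above is:

# /sbin/acfsutil info fs /u01/acfs/asm_vol1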

Create ACFS for Oracle RAC Home with ASMCA

In an Oracle RAC Database environment, you have the option of storing the Oracle RAC Database software ORACLE_HOME in an ACFS. You can either use ASMCMD and follow the steps just described to manually create the ACFS for the Oracle home, or you can use the GUI-based ASMCA shortcut to simplify the creation of an ACFS for a shared Oracle RAC Database home. Assume that you have created an ASM diskgroup ORAHOME for this ACFS. This step usually occurs after the successful Grid Infrastructure installation and configuration and before you are ready to install the Oracle RAC software. The creation of this ACFS allows you to install the Oracle RAC software on the ACFS so that the Oracle RAC home can be shared by all the RAC nodes.

We use the ASMCA utility to create the ACFS on this dedicated ASM diskgroup, as shown in the following steps (Figure 5-14).

  1. As shown in Figure 5-14, start ASMCA and highlight the ASM diskgroup ORAHOME that is the diskgroup for ACFS. Right-click to open a drop-down menu and select “Create ACFS for Database Home” from the menu.


    Figure 5-14. Start “Create ACFS for Database Home” on the ASMCA menu

  2. Specify the database home’s volume name, mountpoint, size, owner name, and owner group (Figure 5-15).


    Figure 5-15. Specify the parameters for Oracle database home volume

  3. Click OK to start the creation process (Figure 5-16).


    Figure 5-16. Creating ACFS for database home

  4. Run the acfs_script.sh as the root user on one of the RAC nodes to complete the creation of ACFS Oracle home (Figure 5-17).


    Figure 5-17. Prompt for acfs_script

  5. Check the ACFS:

    Node1:

    # df -k | egrep 'Filesystem|u01'
    Filesystem            1K-blocks Used       Available   Use%    Mounted on
    /dev/asm/orahome-10   47185920  159172     47026748    1%      /u01/app/oracle/acfsorahome

    Node2:

    # df -k | egrep 'Filesystem|u01'
    Filesystem           1K-blocks Used       Available    Use%    Mounted on
    /dev/asm/orahome-10   47185920  159172      47026748     1%      /u01/app/oracle/acfsorahome

Summary

Shared storage is one of the key infrastructure components for Oracle Grid Infrastructure and the RAC Database. Storage design and configuration can largely determine RAC Database availability and performance. The introduction of Oracle ASM has significantly simplified storage configuration and management. The striping and mirroring features of Oracle ASM also help improve storage I/O performance and reliability against hardware failure.
