Performance
This chapter describes how IBM FlashSystem software and IBM FlashSystem A9000 or IBM FlashSystem A9000R hardware work together to deliver and monitor system performance. Performance characteristics are ingrained in the system design to deliver optimized and consistent performance. Little else is necessary on FlashSystem A9000 or FlashSystem A9000R to gain performance beyond what the system automatically provides. However, several considerations and practices can help prioritize performance for business-critical applications or for certain hosts or domains.
This chapter includes the following sections:
 
6.1 Performance considerations
The architecture of FlashSystem A9000 and FlashSystem A9000R is designed to deliver a high-performance, hotspot-free storage system.
Real-world production environments involve multiple application servers that make multiple simultaneous I/O demands on storage systems. When customers decide to purchase a FlashSystem A9000 or FlashSystem A9000R, they have the reasonable expectation that they will migrate existing applications or install new applications to FlashSystem A9000 or FlashSystem A9000R and experience great flash performance.
6.1.1 Sizing
To get the required performance from a FlashSystem A9000 or FlashSystem A9000R, it is essential, as with any storage system, to size the system characteristics correctly. For FlashSystem A9000R, each grid controller adds an amount of cache and processing power to the grid, and each additional flash enclosure adds more capacity and performance capabilities. The overall system performance increases with the number of grid elements that are included in FlashSystem A9000R.
The best performance configuration might look different from the required capacity configuration. Storage requirements might indicate that a FlashSystem A9000R that is built of just two grid elements is sufficient to fulfill the capacity needs, but the performance requirements of the applications might indicate the need for a FlashSystem A9000R that consists of four grid elements. In this case, to satisfy the performance requirements, you need to select the larger configuration of four grid elements.
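The rule that the larger of the two requirements wins can be sketched as a small calculation. This is a hypothetical illustration only: the per-element capacity and IOPS figures below are placeholders, not published FlashSystem A9000R specifications.

```python
import math

# Hypothetical sizing sketch: the number of grid elements is driven by
# whichever requirement, capacity or performance, demands more elements.
def grid_elements_needed(capacity_tb, iops,
                         capacity_per_element_tb, iops_per_element):
    by_capacity = math.ceil(capacity_tb / capacity_per_element_tb)
    by_performance = math.ceil(iops / iops_per_element)
    # The configuration must satisfy both requirements simultaneously.
    return max(by_capacity, by_performance)

# Capacity alone would be satisfied by 2 grid elements,
# but the performance requirement calls for 4.
print(grid_elements_needed(100, 900_000, 60, 250_000))  # 4
```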
Performance requirements can be determined in several ways:
Benchmark results
Application vendor specifications (based on benchmark testing)
Actual I/O performance characteristics of the existing storage that is being replaced by FlashSystem A9000 or FlashSystem A9000R
6.1.2 Multipathing considerations
The optimum performance of FlashSystem A9000 and FlashSystem A9000R is realized by maximizing the use of the grid controllers. Ensuring this optimal usage and balancing the application workload among all of the grid controllers are the most important considerations when you deploy FlashSystem A9000 or FlashSystem A9000R in your environment.
 
Important: When you plan the host connections to FlashSystem A9000 and FlashSystem A9000R, it is important to ensure that all grid controllers are used.
One main multipathing goal, from a performance perspective, is for the host connectivity to create a balance of the I/O workload across all of the resources in FlashSystem A9000 or FlashSystem A9000R. The best way to achieve this balance is by distributing the host physical connections evenly across all of the grid controllers.
Providing host I/O access to every grid controller from every host bus adapter (HBA) on every host has the following advantages:
Uses the most cache
Uses the maximum available processor power to handle I/O
Fully uses the grid architecture
Minimizes the impact of a host interface hardware failure
For different multipathing configurations, see IBM FlashSystem A9000, IBM FlashSystem A9000R and IBM XIV Storage System Host Attachment and Interoperability, SG24-8368.
 
Important: To achieve a balance between port usage and performance, ideally use six paths per host and use as many grid controllers as possible. Do not exceed 12 paths because a higher number results in higher resource use on the host side with no additional performance gain.
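The even-distribution idea can be sketched as a round-robin assignment of paths to grid controllers. This is an illustrative model only, not an actual host-attachment tool; the port and controller names are made up.

```python
# Illustrative sketch: spread a host's paths evenly across grid
# controllers by assigning them round-robin.
def distribute_paths(hba_ports, controllers, total_paths=6):
    """Return (hba_port, controller) pairs, balanced across controllers."""
    paths = []
    for i in range(total_paths):
        paths.append((hba_ports[i % len(hba_ports)],
                      controllers[i % len(controllers)]))
    return paths

# Two HBA ports, three grid controllers, six paths in total:
# each grid controller receives exactly two paths.
pairs = distribute_paths(["hba0", "hba1"], ["gc1", "gc2", "gc3"])
```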
6.2 Quality of service
The quality of service (QoS) feature allows FlashSystem A9000 and FlashSystem A9000R to deliver different service levels to hosts that are connected to the same FlashSystem.
The QoS feature is intended to favor the performance of critical business applications that run concurrently with less critical applications. Because FlashSystem A9000 and FlashSystem A9000R processing capacity and cache are shared among all applications and all hosts are attached to the same resources, equal allocation of these resources among both critical and less critical applications might negatively affect the performance of the business-critical applications.
The response to this issue is to limit the input/output operations per second (IOPS) rate and bandwidth of certain applications by specifying and then enforcing limits. As a result, the QoS feature in FlashSystem A9000 and FlashSystem A9000R enables better performance for the critical host applications that run on the same system, concurrently with the noncritical host applications.
See IBM Spectrum Accelerate Family Storage Configuration and Usage for IBM FlashSystem A9000, IBM FlashSystem A9000R and IBM XIV Gen3, SG24-8376 for information about how to set QoS by defining performance classes in terms of IOPS and bandwidth limitation. It also explains how to assign specific members to a particular performance class. Each member can be assigned to only a single performance class at a time. However, the number of members within a specified class is not limited.
6.2.1 Limitation by bandwidth
The interface service that is running on each grid controller enforces the configured limitations. The intended limitation value depends on the number of grid controllers that are used by the hosts within the same performance class. The maximum rate value that is specified for the class is divided by the number of grid controllers that are installed in FlashSystem A9000 (always three) or FlashSystem A9000R (4 - 12) to determine the rate that each grid controller enforces.
For example, a noncritical host is connected to all three grid controllers on a FlashSystem A9000:
If the application administrator intends to enforce a 300 MBps limit for that host, the administrator user must set the QoS bandwidth limit for that host to 300 and the Bandwidth Limit per grid controller is automatically set to 100.
With three grid controllers, the enforcement is 100 MBps per grid controller, limiting the host to an aggregate bandwidth of 300 MBps (100 MBps x 3 grid controllers = 300 MBps). If only two grid controllers were used, the limit for the host is 200 MBps (100 MBps x 2 grid controllers = 200 MBps).
If the host has connections to only two of three grid controllers in a FlashSystem A9000, the actual host bandwidth limitation is only 200 MBps with this performance class setting (100 MBps x 2 modules = 200 MBps). Therefore, if the user intends to have a 300 MBps bandwidth limitation with two grid controllers that are connected in a full three-grid controller FlashSystem A9000, the bandwidth limit per interface is 150 MBps and the Bandwidth Limit must be set to 450.
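The division of a class limit across grid controllers, and the aggregate limit a host actually experiences, can be expressed as a short calculation. The function names are illustrative only; they are not part of the FlashSystem CLI or API.

```python
# Hypothetical helpers illustrating how a QoS bandwidth limit is divided
# across grid controllers and what a host effectively experiences.

def per_controller_limit(class_limit, total_controllers):
    """Rate that each grid controller enforces for a performance class."""
    return class_limit / total_controllers

def effective_host_limit(class_limit, total_controllers,
                         connected_controllers):
    """Aggregate limit a host sees, given how many controllers it uses."""
    return (per_controller_limit(class_limit, total_controllers)
            * connected_controllers)

# FlashSystem A9000 always has three grid controllers.
print(effective_host_limit(300, 3, 3))  # all three connected: 300 MBps
print(effective_host_limit(300, 3, 2))  # only two connected: 200 MBps
print(effective_host_limit(450, 3, 2))  # set 450 to get 300 MBps over two
```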
6.2.2 Limitation by input/output operations per second
If the intent is to set a limitation at 10,000 IOPS for a specified host in a FlashSystem A9000 configuration, the IOPS limit must be set to 10,000 and the enforcement is 3,333 (10,000/3) for each grid controller.
If the host is attached to only two grid controllers in a three grid controller FlashSystem A9000, the host IOPS limitation is only 6,666 with this performance class setting (3,333 IOPS x 2 grid controllers = 6,666 IOPS).
If the intent is to have a 10,000 IOPS limitation for a host that is connected to only two grid controllers in this scenario, the IOPS Limit Per Interface must be set to 5,000 or the IOPS Limit for the performance class must be set to 15,000.
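The inverse calculation, solving for the class-wide limit that yields a desired effective limit for a partially connected host, can be sketched as follows. The function name is illustrative, not part of the product.

```python
# Hypothetical sketch: given the effective IOPS limit a host should see
# and the number of grid controllers it is connected to, compute the
# class-wide limit that must be configured.
def required_class_limit(desired_effective, total_controllers,
                         connected_controllers):
    return desired_effective * total_controllers / connected_controllers

# Host connected to 2 of 3 grid controllers, desired limit 10,000 IOPS:
limit = required_class_limit(10_000, 3, 2)  # class limit: 15,000 IOPS
per_interface = limit / 3                   # 5,000 IOPS per grid controller
```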
 
Note: Users must consider these grid controller multiplication factors to meet their expected limitations correctly when a host is connected to only a few grid controllers.
Shared or independent limitation
During the creation of a performance class, the performance class can be given a parameter setting of either Shared or Independent. This parameter defines whether the specific limits are shared among all class members or apply individually to each class member.
When Shared is selected, the maximum limit (bandwidth, IOPS, or both) is shared among all members of a performance class. Each member can reach the maximum limited value, but not all at the same time. For example, with a performance class with a limit of 300 MBps and two assigned hosts, each host can reach a maximum of 300 MBps because 300 MBps is the maximum allowed limitation. However, when both hosts are performing I/O, they share the limit. In this situation, no division between the members is done. Every member gets as much as it can get, but not more than the defined limit.
When Independent is selected, the maximum limit (bandwidth, IOPS, or both) applies to each member of the performance class. For example, for a performance class with a limit of 300 MBps and two assigned members, each member can reach a maximum of 300 MBps, regardless of how much bandwidth another member in the performance class consumes at the same time.
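The difference between the two modes can be modeled in a few lines. This is a simplified illustration of the enforcement semantics described above, not actual FlashSystem code.

```python
# Minimal model of shared vs. independent enforcement for a
# performance class (illustrative only).
def within_limit(mode, limit, member_rates):
    """Check whether the observed member rates respect the class limit."""
    if mode == "shared":
        # All members together may not exceed the limit.
        return sum(member_rates.values()) <= limit
    # Independent: each member is capped individually.
    return all(rate <= limit for rate in member_rates.values())

rates = {"hostA": 200, "hostB": 150}  # MBps, hypothetical hosts
print(within_limit("shared", 300, rates))       # False: 350 > 300 combined
print(within_limit("independent", 300, rates))  # True: each is under 300
```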
Performance class members
A performance class can be defined for one of the following four levels:
Domains
Hosts
Pools
Volumes
When a performance class is created, either by using the graphical user interface (GUI) or through the command-line interface (CLI), it is assigned a name, the system where it is created, the shared or independent parameter, and either the bandwidth limit, IOPS limit, or both. Later on, the members are added to the class. The members can be one of the four types that are listed, but you cannot mix the types of members. Each performance class can contain one or multiple members of a certain type.
Domains
When a domain is added to a performance class, the total bandwidth or IOPS rate that is defined as the limit for this domain applies to all I/O operations of a domain member. Each host, volume, or pool within the domain cannot exceed the specific limit and all domain members share the maximum limitation. For example, if two hosts are in a domain, they both share the limitation that is given to the domain by the performance class.
If the performance class is defined with the shared parameter setting, the limitation is even shared among all domains that are part of the performance class. If it is defined as independent, each domain has its own limitation.
Hosts
When a host is added to a performance class, the total limit applies to all I/O operations of the member host, regardless of which volume the host has access to.
Pool
A performance class limitation on a pool level applies to all volumes in that particular pool. A host that is accessing a volume from that pool will be limited to the defined values. If the same host is accessing other volumes from a different pool, the limit does not apply because the pool might not be a member of a performance class.
Volume
The behavior for volumes is comparable to the behavior setting for pools. The only difference is that not all volumes in the pool will be limited, but only the particular volumes that are added to the performance class. All other volumes in the pool that are not members of a performance class can be accessed by hosts without limitation.
Layering quality of service
You can layer FlashSystem A9000 and FlashSystem A9000R performance classes.
For instance, a domain can be added to a performance class, and hosts that are part of the domain can be added to other performance classes. In this situation, the lower limit always applies. For example, if a domain has a limit of 1,000 MBps and a host, which is a member of this domain, is added to another performance class by the domain administrator with a limit of 200 MBps, that host will be limited to 200 MBps. Otherwise, the limit of the domain, which is 1,000 MBps, applies and the host can use up to that bandwidth.
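The "lower limit always applies" rule amounts to taking the minimum over all performance classes that cover an object. A minimal sketch, using None to mean "no limit":

```python
# Layered QoS: the effective limit for an object is the lowest limit of
# all performance classes that apply to it (hypothetical model).
def effective_limit(*limits):
    applicable = [l for l in limits if l is not None]
    return min(applicable) if applicable else None

print(effective_limit(1_000, 200))   # domain 1,000 MBps, host class 200 -> 200
print(effective_limit(1_000, None))  # host not in a class -> domain's 1,000
print(effective_limit(None, None))   # no class at any level -> unlimited
```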
To illustrate this possibility, consider the following scenario. A service provider has multiple customers that run their services on a single FlashSystem A9000R, and the service provider uses multi-tenancy. The service provider wants to provide different QoS levels to the tenants that run on this system and decides to create three service levels: Bronze, Silver, and Gold.
To achieve this goal, the service provider creates two performance classes with the following settings and leaves the Gold tier without any performance class:
Bronze: Bandwidth limit: 1,000 MBps and Shared
Silver: Bandwidth limit: 1,000 MBps and Independent
Gold: No limit
For the Gold customers, the service provider does not create a performance class because the Gold customers are not limited.
Next, the service provider adds the correct customer domains to each performance class and leaves the Gold customer domains out of any performance class.
With this setting, the Bronze customers can reach up to 1,000 MBps, but they have to share the maximum bandwidth with all of the other customers at the Bronze level. The Silver customers can reach up to 1,000 MBps, and they do not have to share their limit with other customers in their performance class. The Gold customers are unlimited because they are not part of any performance class.
Each domain administrator now can create their own performance class, for example, to limit certain hosts, which are less critical than other hosts within their own domain. The limit can be either on bandwidth, on IOPS, or on both, regardless of what was defined in the domain level performance class. However, the lowest limit applies always. Figure 6-1 shows an illustration of this scenario.
Figure 6-1 Multiple performance classes
In Figure 6-1, five domains are defined on the system, Customer 1 Domain to Customer 5 Domain. Two performance classes were defined to which domains can be added, one for Bronze customers and one for Silver customers:
Customer 1 Domain and Customer 2 Domain are added to the Bronze performance class.
Customer 3 Domain and Customer 4 Domain are added to the Silver performance class.
Customer 5 Domain is not added to any class because the customer is a Gold level customer and will not be limited.
The domain administrator for Customer 1 Domain created a performance class (C1_Class) with a limit of 200 MBps and added HostA and HostB to it. HostC and HostD were not added to C1_Class. HostA and HostB are now limited to 200 MBps. HostC and HostD are limited to a maximum of 1,000 MBps due to their domain membership.
By looking at Customer 2 Domain, we see that the domain administrator created C2_Class. HostX and HostY are members of this class, so they are limited to 1,000 IOPS, but they are also limited to 1,000 MBps due to their domain limit.
Furthermore, all I/Os to Customer 1 Domain and Customer 2 Domain must share 1,000 MBps because the Bronze performance class is configured as shared.
The domain administrator for the Customer 3 Domain created a performance class (C4_lowVolumes) and added volumes to it, which is also possible. Vol1 and Vol2 are limited to 400 MBps and Vol3 is limited to 1,000 MBps, due to the domain limit in the Silver tier. All volumes of this domain share the limit of 1,000 MBps. If the Silver performance class were set to shared, Customer 3 Domain would also need to share that bandwidth with Customer 4 Domain, which is a member of the same performance class.
In Customer 4 Domain, we see a configuration that will not work correctly. This configuration can be created, but HostK and HostL will be limited to 1,000 MBps instead of 2,000 MBps, due to the lower limit of the Silver performance class.
Even if Customer 5 Domain is not a member of any performance class and has no bandwidth or IOPS limit, the domain administrator can create performance classes inside the domain and use its own QoS rules. The domain administrator created a class C5_slowPool with a limit of 1,000 IOPS and added Pool1 and Pool2 to it.
6.3 Performance monitoring
You can monitor the performance of FlashSystem A9000 or FlashSystem A9000R by using different methods and tools.
The Hyper-Scale Storage Management GUI is the primary tool to monitor performance. Other tools, such as IBM Spectrum Control and IBM certified third-party tools, can be used for monitoring. The Hyper-Scale Storage Management GUI displays both current and historical performance statistics, and it allows historical statistics to be exported to files by using the CLI utility for further trending and analysis by the user.
6.3.1 Using the Storage Management GUI
Several views and panels are available for performance management.
Overall performance
The lower-right section of the Dashboard view displays the overall performance of all of the systems in the Hyper-Scale Manager inventory. The panel shows current statistics. You can switch between displaying IOPS and latency.
System statistics
Users can navigate to the system statistics view by clicking the performance charts that are shown in the Dashboard or by clicking the Statistics icon on the left side panel and choosing Systems and Interfaces Statistics from the Statistic Views menu, as shown in Figure 6-2.
Figure 6-2 System and Interfaces Statistics selection
The system statistics Workspace view is displayed. The top portion displays a list of systems that are managed under the Storage Management GUI, which are filtered according to your criteria. The bottom shows the average IOPS, latency, and bandwidth for systems that are selected in the list, as illustrated in Figure 6-3.
Figure 6-3 System statistics view
By clicking any of the vertical arrows at the bottom, you can display a chart that shows the historical statistics for the corresponding selection (IOPS, latency, or bandwidth). You can drill down to details about various aspects of the performance statistics. Figure 6-4 on page 117 shows an example for IOPS.
Figure 6-4 Filter options on the Statistics view
You can also refine the view or select to display historical statistics, as shown in the menu in Figure 6-4.
The system can display historical statistics for IOPS, bandwidth, and latency for various time ranges, up to one year, as illustrated in Figure 6-5.
Figure 6-5 Historical latency
You can choose various filters so that you can view the following measurements:
Read + Write: Filters for Read, Write, and Read + Write to view historical IOPS, bandwidth, and latency by reads and writes.
Hit + Miss: Filters Memory miss, Memory Hit, and Hit + Miss to view historical IOPS, bandwidth, and latency by hits and misses, and total hits plus misses.
All: Filters under All provide a view of the block size breakdown, where you can choose 0 - 8 KB, > 8 - 64 KB, > 64 - 512 KB, > 512 KB, or All. The breakdown filters are useful in determining the I/O profile of the host and in further tuning host-side queues and other parameters, such as block size and coalescing.
Range: You can use range filters to filter and view by time period. You can choose Last Year, Last Month, Last Week, Last Day, Last Hour, and From To to customize the time period.
6.3.2 Using the command-line interface
The second method to collect statistics is by using the command-line interface (CLI).
First, retrieve the system’s time by issuing the time_list command, as shown in Example 6-1.
Example 6-1 Retrieving the system time
ITSO_2_A9000R>>time_list
Time Date Time Zone Daylight Saving Time
13:37:58 2016-03-16 US/Arizona no
ITSO_2_A9000R>>
After you obtain the system time, the statistics_get command can be formatted and issued. The statistics_get command requires several parameters: a starting or ending time point, a count for the number of intervals to collect, the size of each interval, and the resolution unit for that size. The time stamp is based on the system time that was returned by the previous time_list command. Example 6-2 shows the syntax of the command.
Example 6-2 Syntax of statistics_get command
statistics_get [ perf_class=perfClassName | host=HostName |
host_iscsi_name=initiatorName | host_fc_port=WWPN | target=RemoteTarget |
remote_fc_port=WWPN | remote_ipaddress=IPAddress | vol=VolName |
ipinterface=IPInterfaceName | local_fc_port=ComponentId ] < start=TimeStamp |
end=TimeStamp > [ module=ModuleNumber ] count=N interval=IntervalSize
resolution_unit=<minute|hour|day|week|month>
To further explain this command, assume that you want to collect 10 intervals, and each interval is for 1 minute. The point of interest occurred on 16 March 2016 roughly 15 minutes after 13:45:00.
 
Note: Use the statistics_get command to gather the performance data from any time period.
The time stamp is formatted as YYYY-MM-DD.hh:mm:ss, where the YYYY represents a four-digit year, MM is the two-digit month, and DD is the two-digit day. After the date portion of the time stamp is specified, you specify the time, where hh is the hour, mm is the minute, and ss represents the seconds. See Example 6-3.
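If you script the collection, the YYYY-MM-DD.hh:mm:ss time stamp maps directly to a standard strftime pattern. A minimal sketch (the helper function name is made up for illustration):

```python
from datetime import datetime

# The statistics_get time stamp uses YYYY-MM-DD.hh:mm:ss, which
# corresponds to the strftime pattern below.
def a9000_timestamp(dt):
    return dt.strftime("%Y-%m-%d.%H:%M:%S")

end = datetime(2016, 3, 16, 13, 45, 30)
print(a9000_timestamp(end))  # 2016-03-16.13:45:30
```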
Example 6-3 Using the statistics_get command
ITSO_2_A9000R>>statistics_get end=2016-03-16.13:45:30 count=10 interval=1 resolution_unit=minute
Figure 6-6 shows a sample output of the statistics. The output that is shown is a small portion of the data that was provided.
Figure 6-6 Statistics output