IBM Spectrum Scale adjustments
In addition to VDisk and file system settings, the SAP workload requires some specific tuning parameters in the cluster configuration. This chapter describes some of those parameters.
This chapter includes the following topics:
4.1 Overview
This section describes several server and client settings to consider.
4.1.1 Server-side settings
Most parameters on the server side (the IBM Elastic Storage Server (ESS) I/O nodes) are set by the default deployment procedure. However, if you add memory to the machine and increase the loghome capacity, some of those parameters must be adjusted, as shown in Example 4-1.
Example 4-1 Configuration changes
mmchconfig nsdRAIDFlusherFWLogLimitMB=60k -N gss_ppc64
mmchconfig nsdRAIDFlusherFWLogHighWatermarkMB=30k -N gss_ppc64
mmchconfig nsdRAIDFastWriteFSMetadataLimit=1m -N gss_ppc64
mmchconfig nsdRAIDFastWriteFSDataLimit=2m -N gss_ppc64
 
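To confirm that these server-side values are active, you can query the cluster configuration. The following is a minimal check, assuming the gss_ppc64 node class from Example 4-1 and that the NSD RAID parameters appear in the daemon configuration dump:

```shell
# List the nsdRAID flusher and fast-write settings that are stored
# in the cluster configuration (the names used in Example 4-1).
mmlsconfig | grep -i nsdRAID

# Show the values that the daemon is actually using; mmdiag must be
# run locally on the I/O node that you want to check.
mmdiag --config | grep -i nsdRAID
```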
4.1.2 Client-side settings
A similar procedure applies to the client nodes. In addition to the ESS head nodes, you must check that the appropriate gssClientConfig script was applied. Because client nodes can be dynamically added to and removed from a cluster, there is no guarantee that the correct client settings are implemented by the default deployment procedure.
To ease the process of adding and removing clients, create node classes and configure the client settings (which are deployed by the sample script) on these node classes. New clients then receive their settings by adding them to the correct node class. For more information, see the IBM Spectrum Scale documentation for node classes.
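This approach can be sketched as follows. The class name hanaclients and the node names are placeholders, not part of the default deployment:

```shell
# Create a user-defined node class for the HANA client nodes.
mmcrnodeclass hanaclients -N hananode01,hananode02

# Apply client settings once against the node class instead of
# against individual nodes.
mmchconfig pagepool=32G -N hanaclients

# A new client inherits the class settings when it is added
# to the node class.
mmchnodeclass hanaclients add -N hananode03
```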
The location of the sample script for the minimum ESS client tuning is shown in Example 4-2.
Example 4-2 Script for minimum ESS clients tuning
[root@gssio1 gss]# cd /usr/lpp/mmfs/samples/gss/
[root@gssio1 gss]# ll
total 24
-rwxr-xr-x 1 root root 7817 Jul 26 15:20 gssClientConfig.sh
Because HANA nodes feature an unusually large amount of memory, adjust the pagepool after the client configuration is applied. This adjustment is necessary because the gssClientConfig script uses an internal heuristic to calculate the pagepool from the available memory.
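For example, to override the heuristically calculated value with a fixed pagepool size (the node name and the 32G value are illustrative; size the pagepool for your HANA nodes):

```shell
# Set the pagepool explicitly; -i makes the change take effect
# immediately and also persists it across daemon restarts.
mmchconfig pagepool=32G -i -N hananode01

# Verify the value that the daemon is using (run on the client node).
mmdiag --config | grep pagepool
```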
In addition to these default settings, you must adjust other settings, as shown in Example 4-3. The commands are split into single lines for better text formatting; the settings can be deployed all at the same time.
Example 4-3 Adjusting default settings
mmchconfig maxMBpS=2000,maxGeneralThreads=2048 -N hananode
mmchconfig numaMemoryInterleave=yes,verbsRdmaMinBytes=8k -N hananode
mmchconfig verbsRdmaSend=yes,verbsRdmasPerConnection=128 -N hananode
mmchconfig verbsSendBufferMemoryMB=1024,nsdInlineWriteMax=4k -N hananode
mmchconfig aioWorkerThreads=256 -N hananode
mmchconfig disableDIO=yes,aioSyncDelay=10 -N hananode
mmchconfig ignorePrefetchLUNCount=yes -N hananode
mmchconfig pagepool=32G -N hananode
4.2 IBM Spectrum Scale parameters
This publication is not intended to describe all of the various IBM Spectrum Scale parameters. Some commonly used parameters are described in this section.
4.2.1 DirectIO in IBM Spectrum Scale
Even if DirectIO (DIO) is requested, the file system is always allowed to ignore the DIO option and run a read/write as normal, buffered I/O. Buffered I/O might be used instead of DIO, regardless of which configuration parameters are set; for example, when a read/write is not aligned on sector boundaries (although a correctly written application should always read/write on sector boundaries). Another example is when DIO is used to write a new file (rather than an update-in-place of an existing file) or when writing to a sparse file. In these cases, the normal DIO path cannot be used because disk space must be allocated before anything can be written.
According to the Portable Operating System Interface (POSIX) definition, there is no requirement that data is written through to disk unless the application specifies O_SYNC. However, some UNIX systems traditionally interpreted O_DIRECT to imply O_SYNC and so some applications rely on this behavior.
Therefore, IBM Spectrum Scale implements the same semantics. This implementation is done by implicitly performing an fsync at the end of each DIO write if the write was run as buffered I/O instead of DIO, regardless of why that fallback occurred (as though the application had specified O_SYNC in addition to O_DIRECT).
Therefore, if DIO is disabled by using the disableDIO option, data is still written through to disk, and the application receives the same semantics as it would without this option.
The HANA workload frequently forces DIO operation. However, IBM Spectrum Scale needs to occasionally switch to buffered mode+sync, depending on the conditions.
Some non-trivial overhead exists for switching between DIO and buffered mode. Therefore, it is better in many cases to stay in buffered mode for some specific types of workload.
With the disableDIO=yes,aioSyncDelay=10 settings on the client, you can adjust IBM Spectrum Scale to stay in buffered mode and fsync the data for any operation that is called with DIO.
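You can check on a client node that both values are active. The following is a quick check, assuming the parameters appear in the daemon configuration dump:

```shell
# disableDIO and aioSyncDelay are set in Example 4-3; confirm that
# the running daemon picked them up (run on the client node).
mmdiag --config | grep -iE 'disabledio|aiosyncdelay'
```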
4.2.2 ignorePrefetchLUNCount
This client parameter controls how many threads the IBM Spectrum Scale daemon wakes for write-behind or pre-fetching. An old internal heuristic calculates and starts threads depending on the number of Network Shared Disks (NSDs). With IBM Spectrum Scale RAID, the number of NSDs is small, so have IBM Spectrum Scale use all available threads that are derived from the cluster configuration.
Setting ignorePrefetchLUNCount tells the NSD client not to limit the number of requests based on the number of visible LUNs (because many physical disks can sit behind them). Instead, the limit is the maximum number of buffers and pre-fetch threads.
The default of this parameter is no (0). The correct value is set automatically after the gssClientConfig script is run.
You can check that the parameter is set correctly on each NSD client by using the command that is shown in Example 4-4.
Example 4-4 Checking parameter setting
[root@ems1 ~]# mmlsconfig | grep -i ignorepre
ignorePrefetchLUNCount yes
[root@ems1 ~]#
4.3 Performance numbers
A performance test and verification environment is shown in Figure 4-1. The numbers were obtained on a model GL6 that was deployed with the ESS 4.5.1 code level. Use gpfsperf to verify your setup.
Figure 4-1 Test and verification environment
As shown in Figure 4-1, the ESS nodes are connected by 4 x InfiniBand FDR and the clients by 2 x FDR; IBM Spectrum Scale code level 4.2.0.4 was used on the client side. The numbers that are shown in Example 4-5 are real measured numbers that were achieved in a customer setup. The NSD client machines are all virtual machines (VMs or LPARs) on a POWER8 E880 model with at least four cores each and 32 GB of memory for the IBM Spectrum Scale pagepool.
Example 4-5 Multiple clients, write
root@ems1 # mmdsh -N beer0200g,beer0201g,beer0202g,beer0203g,beer0205g,beer0206g,beer0207g,beer0204g,beer0208g "gpfsperf create seq /gpfs/test/data/$(hostname)/100Gfile -n 100g -r 16m -th 12 -fsync" | grep "Data rate"
beer0206g: Data rate was 2925860.09 Kbytes/sec, thread utilization 0.771, bytesTransferred 107374182400
beer0201g: Data rate was 2889809.46 Kbytes/sec, thread utilization 0.749, bytesTransferred 107374182400
beer0202g: Data rate was 2888886.65 Kbytes/sec, thread utilization 0.770, bytesTransferred 107374182400
beer0203g: Data rate was 2863675.27 Kbytes/sec, thread utilization 0.766, bytesTransferred 107374182400
beer0205g: Data rate was 2859437.49 Kbytes/sec, thread utilization 0.771, bytesTransferred 107374182400
beer0200g: Data rate was 2767664.24 Kbytes/sec, thread utilization 0.835, bytesTransferred 107374182400
beer0207g: Data rate was 2738951.66 Kbytes/sec, thread utilization 0.867, bytesTransferred 107374182400
beer0204g: Data rate was 2340173.58 Kbytes/sec, thread utilization 0.917, bytesTransferred 107374182400
beer0208g: Data rate was 1150506.74 Kbytes/sec, thread utilization 0.749, bytesTransferred 107374182400
 
~ 23.4 GBps
As the read performance that is shown in Example 4-6 indicates, we are approaching the theoretical overall SAS bandwidth of the building block, which is 3 SAS adapters x 4 ports (12 Gbps) ~ 36 GBps.
Example 4-6 Multiple clients, read
[root@rb3i0001 hwcct]# mmdsh -N beer0200g,beer0201g,beer0202g,beer0203g,beer0205g,beer0206g,beer0207g "gpfsperf read seq /gpfs/test/data/$(hostname)/100Gfile -n 100g -r 16m -th 12 -fsync" | grep "Data rate"
beer0200g: Data rate was 4779483.20 Kbytes/sec, thread utilization 0.968, bytesTransferred 107374182400
beer0203g: Data rate was 4428156.11 Kbytes/sec, thread utilization 0.973, bytesTransferred 107374182400
beer0206g: Data rate was 4419566.91 Kbytes/sec, thread utilization 0.980, bytesTransferred 107374182400
beer0205g: Data rate was 4413607.93 Kbytes/sec, thread utilization 0.972, bytesTransferred 107374182400
beer0202g: Data rate was 4409906.75 Kbytes/sec, thread utilization 0.985, bytesTransferred 107374182400
beer0201g: Data rate was 4408141.93 Kbytes/sec, thread utilization 0.982, bytesTransferred 107374182400
beer0207g: Data rate was 4408088.04 Kbytes/sec, thread utilization 0.984, bytesTransferred 107374182400
 
 ~ 31.2 GBps
4.3.1 Single client performance
For a HANA environment, single client performance is essential for recovery and for the time it takes to load data from disk into the HANA database.
A rough test scenario is shown in Example 4-7, which demonstrates IBM Spectrum Scale single client performance of about 10 GBps read performance. For more information about the hardware setup, see Figure 4-1 on page 22.
Example 4-7 Test scenario
beer0201 [data] # gpfsperf read seq /gpfs/test/data/tmp1/file100g -n 100g -r 8m -th 8 -fsync
gpfsperf read seq /gpfs/test/data/tmp1/file100g
recSize 8M nBytes 100G fileSize 100G
nProcesses 1 nThreadsPerProcess 8
file cache flushed before test
not using direct I/O
offsets accessed will cycle through the same file segment
not using shared memory buffer
not releasing byte-range token after open
fsync at end of test
Data rate was 10318827.72 Kbytes/sec, thread utilization 0.806, bytesTransferred 107374182400
4.3.2 SAP HANA HWCCT test
Although the ESS model was certified with eight productive HANA DB instances, an ESS can outperform this certified value by more than 50%. If all of the customized settings are configured correctly, you can achieve high numbers with the SAP test tool HWCCT, which is included with the HANA distribution.
For more information about HWCCT, see the SAP HANA Tailored Data Center Integration - Frequently Asked Questions page of the SAP website.
A summary of the results is shown in Figure 4-2.
Figure 4-2 HWCCT results
The results show a test with 12 HANA nodes on a POWER8 E880 machine running in parallel against one ESS GL6 building block, which is connected by InfiniBand FDR. In the summary chart, the columns include the following information:
The first column indicates whether the workload is log (sequential) or data (random).
The second column references the various I/O sizes from the HWCCT.
The third column lists the expected minimum level.
The measured performance numbers from SAP's HWCCT for each client are listed in the rest of the table.
 