IBM Spectrum Scale adjustments
In addition to VDisk and file system settings, the SAP workload requires some specific tuning parameters in the cluster configuration. This chapter describes some of those parameters.
This chapter includes the following topics:
4.1 Overview
This section describes several server and client settings to consider.
4.1.1 Server-side settings
Most parameters on the server side (the IBM Elastic Storage Server (ESS) I/O nodes) are set by the default deployment procedure. However, if you add memory to the machine and increase the loghome capacity, some of those parameters must be adjusted, as shown in Example 4-1.
Example 4-1 Configuration changes
mmchconfig nsdRAIDFlusherFWLogLimitMB=60k -N gss_ppc64
mmchconfig nsdRAIDFlusherFWLogHighWatermarkMB=30k -N gss_ppc64
mmchconfig nsdRAIDFastWriteFSMetadataLimit=1m -N gss_ppc64
mmchconfig nsdRAIDFastWriteFSDataLimit=2m -N gss_ppc64
 
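To confirm that these server-side values are active, you can query the cluster configuration. The following is a minimal check, assuming the gss_ppc64 node class from Example 4-1 and that the NSD RAID parameters appear in the daemon configuration dump:

```shell
# List the nsdRAID flusher and fast-write settings that are stored
# in the cluster configuration (the names used in Example 4-1).
mmlsconfig | grep -i nsdRAID

# Show the values that the daemon is actually using; mmdiag must be
# run locally on the I/O node that you want to check.
mmdiag --config | grep -i nsdRAID
```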
4.1.2 Client-side settings
A similar procedure applies to the client nodes. In addition to the ESS head nodes, you must check that the appropriate gssClientConfig script was applied. Because client nodes can be dynamically added to and removed from a cluster, there is no guarantee that the correct client settings are implemented by the default deployment procedure.
To ease the process of adding and removing clients, create node classes and configure the client settings (which are deployed by the sample script) on these node classes. New clients then receive their settings by adding them to the correct node class. For more information, see the IBM Spectrum Scale documentation for node classes.
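This approach can be sketched as follows. The class name hanaclients and the node names are placeholders, not part of the default deployment:

```shell
# Create a user-defined node class for the HANA client nodes.
mmcrnodeclass hanaclients -N hananode01,hananode02

# Apply client settings once against the node class instead of
# against individual nodes.
mmchconfig pagepool=32G -N hanaclients

# A new client inherits the class settings when it is added
# to the node class.
mmchnodeclass hanaclients add -N hananode03
```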
The location of the sample script for the minimum ESS client tuning is shown in Example 4-2.
Example 4-2 Script for minimum ESS clients tuning
[root@gssio1 gss]# cd /usr/lpp/mmfs/samples/gss/
[root@gssio1 gss]# ll
total 24
-rwxr-xr-x 1 root root 7817 Jul 26 15:20 gssClientConfig.sh
Because HANA nodes feature an unusually large amount of memory, adjust the pagepool after the client configuration is applied. This adjustment is necessary because the gssClientConfig script uses an internal heuristic to calculate the pagepool from the available memory.
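For example, to override the heuristically calculated value with a fixed pagepool size (the node name and the 32G value are illustrative; size the pagepool for your HANA nodes):

```shell
# Set the pagepool explicitly; -i makes the change take effect
# immediately and also persists it across daemon restarts.
mmchconfig pagepool=32G -i -N hananode01

# Verify the value that the daemon is using (run on the client node).
mmdiag --config | grep pagepool
```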
In addition to these default settings, you must adjust other settings, as shown in Example 4-3. The commands are split into single lines for better text formatting; the settings can be deployed all at the same time.
Example 4-3 Adjusting default settings
mmchconfig maxMBpS=2000,maxGeneralThreads=2048 -N hananode
mmchconfig numaMemoryInterleave=yes,verbsRdmaMinBytes=8k -N hananode
mmchconfig verbsRdmaSend=yes,verbsRdmasPerConnection=128 -N hananode
mmchconfig verbsSendBufferMemoryMB=1024,nsdInlineWriteMax=4k -N hananode
mmchconfig aioWorkerThreads=256 -N hananode
mmchconfig disableDIO=yes,aioSyncDelay=10 -N hananode
mmchconfig ignorePrefetchLUNCount=yes -N hananode
mmchconfig pagepool=32G -N hananode
4.2 IBM Spectrum Scale parameters
This publication is not intended to describe all of the various IBM Spectrum Scale parameters. Some commonly used parameters are described in this section.
4.2.1 DirectIO in IBM Spectrum Scale
Even if DirectIO (DIO) is requested, the file system is always allowed to ignore the DIO option and run a read/write as normal, buffered I/O. Buffered I/O might be used instead of DIO, regardless of which configuration parameters are set; for example, when a read/write is not aligned on sector boundaries (although a correctly written application should always read/write on sector boundaries). Another example is when DIO is used to write a new file (rather than an update-in-place of an existing file) or when writing to a sparse file. In these cases, the normal DIO path cannot be used because disk space must be allocated before anything can be written.
According to the Portable Operating System Interface (POSIX) definition, there is no requirement that data is written through to disk unless the application specifies O_SYNC. However, some UNIX systems traditionally interpreted O_DIRECT to imply O_SYNC and so some applications rely on this behavior.
Therefore, IBM Spectrum Scale implements the same semantics. This implementation is done by implicitly performing an fsync at the end of each DIO write if the write was run as buffered I/O instead of DIO, regardless of why that fallback occurred (as though the application had specified O_SYNC in addition to O_DIRECT).
Therefore, if DIO is disabled by using the disableDIO option, data is still written through to disk, and the application receives the same semantics as it would without this option.
The HANA workload frequently forces DIO operation. However, IBM Spectrum Scale needs to occasionally switch to buffered mode+sync, depending on the conditions.
Some non-trivial overhead exists for switching between DIO and buffered mode. Therefore, it is better in many cases to stay in buffered mode for some specific types of workload.
With the disableDIO=yes,aioSyncDelay=10 settings on the client, you can adjust IBM Spectrum Scale to stay in buffered mode and fsync the data for any operation that is called with DIO.
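You can check on a client node that both values are active. The following is a quick check, assuming the parameters appear in the daemon configuration dump:

```shell
# disableDIO and aioSyncDelay are set in Example 4-3; confirm that
# the running daemon picked them up (run on the client node).
mmdiag --config | grep -iE 'disabledio|aiosyncdelay'
```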
4.2.2 ignorePrefetchLUNCount
This client parameter controls how many threads the IBM Spectrum Scale daemon wakes for write-behind or pre-fetching. An old internal heuristic calculates and starts threads depending on the number of Network Shared Disks (NSDs). With IBM Spectrum Scale RAID, the number of NSDs is small, so have IBM Spectrum Scale use all available threads that are derived from the cluster configuration.
Setting ignorePrefetchLUNCount tells the NSD client not to limit the number of requests based on the number of visible LUNs (because many physical disks can sit behind them). Instead, the limit is the maximum number of buffers and pre-fetch threads.
The default of this parameter is no (0). The correct value is set automatically after the gssClientConfig script is run.
You can check that the parameter is set correctly on each NSD client by using the command that is shown in Example 4-4.
Example 4-4 Checking parameter setting
[root@ems1 ~]# mmlsconfig | grep -i ignorepre
ignorePrefetchLUNCount yes
[root@ems1 ~]#
4.3 Performance numbers
A performance test and verification environment is shown in Figure 4-1. The numbers were obtained on a model GL6 that was deployed with the ESS 4.5.1 code level. Use gpfsperf to verify your setup.
Figure 4-1 Test and verification environment
As shown in Figure 4-1, the ESS nodes are connected by 4 x InfiniBand FDR and the clients by 2 x FDR; IBM Spectrum Scale code level 4.2.0.4 was used on the client side. The numbers that are shown in Example 4-5 are real measured numbers that were achieved in a customer setup. The NSD client machines are all virtual machines (VMs or LPARs) on a POWER8 E880 model with at least four cores each and 32 GB of memory for the IBM Spectrum Scale pagepool.
Example 4-5 Multiple clients, write
root@ems1 # mmdsh -N beer0200g,beer0201g,beer0202g,beer0203g,beer0205g,beer0206g,beer0207g,beer0204g,beer0208g "gpfsperf create seq /gpfs/test/data/$(hostname)/100Gfile -n 100g -r 16m -th 12 -fsync" | grep "Data rate"
beer0206g: Data rate was 2925860.09 Kbytes/sec, thread utilization 0.771, bytesTransferred 107374182400
beer0201g: Data rate was 2889809.46 Kbytes/sec, thread utilization 0.749, bytesTransferred 107374182400
beer0202g: Data rate was 2888886.65 Kbytes/sec, thread utilization 0.770, bytesTransferred 107374182400
beer0203g: Data rate was 2863675.27 Kbytes/sec, thread utilization 0.766, bytesTransferred 107374182400
beer0205g: Data rate was 2859437.49 Kbytes/sec, thread utilization 0.771, bytesTransferred 107374182400
beer0200g: Data rate was 2767664.24 Kbytes/sec, thread utilization 0.835, bytesTransferred 107374182400
beer0207g: Data rate was 2738951.66 Kbytes/sec, thread utilization 0.867, bytesTransferred 107374182400
beer0204g: Data rate was 2340173.58 Kbytes/sec, thread utilization 0.917, bytesTransferred 107374182400
beer0208g: Data rate was 1150506.74 Kbytes/sec, thread utilization 0.749, bytesTransferred 107374182400
 
~ 23.4 GBps
As the read performance that is shown in Example 4-6 indicates, we are approaching the theoretical overall SAS bandwidth of the building block, which is 3 SAS adapters x 4 ports (12 Gbps) ~ 36 GBps.
Example 4-6 Multiple clients, read
[root@rb3i0001 hwcct]# mmdsh -N beer0200g,beer0201g,beer0202g,beer0203g,beer0205g,beer0206g,beer0207g "gpfsperf read seq /gpfs/test/data/$(hostname)/100Gfile -n 100g -r 16m -th 12 -fsync" | grep "Data rate"
beer0200g: Data rate was 4779483.20 Kbytes/sec, thread utilization 0.968, bytesTransferred 107374182400
beer0203g: Data rate was 4428156.11 Kbytes/sec, thread utilization 0.973, bytesTransferred 107374182400
beer0206g: Data rate was 4419566.91 Kbytes/sec, thread utilization 0.980, bytesTransferred 107374182400
beer0205g: Data rate was 4413607.93 Kbytes/sec, thread utilization 0.972, bytesTransferred 107374182400
beer0202g: Data rate was 4409906.75 Kbytes/sec, thread utilization 0.985, bytesTransferred 107374182400
beer0201g: Data rate was 4408141.93 Kbytes/sec, thread utilization 0.982, bytesTransferred 107374182400
beer0207g: Data rate was 4408088.04 Kbytes/sec, thread utilization 0.984, bytesTransferred 107374182400
 
 ~ 31.2 GBps
4.3.1 Single client performance
For a HANA environment, single client performance is essential for recovery and for the time it takes to load data from disk into the HANA database.
A rough test scenario is shown in Example 4-7, which demonstrates IBM Spectrum Scale single client performance of about 10 GBps read performance. For more information about the hardware setup, see Figure 4-1 on page 22.
Example 4-7 Test scenario
beer0201 [data] # gpfsperf read seq /gpfs/test/data/tmp1/file100g -n 100g -r 8m -th 8 -fsync
gpfsperf read seq /gpfs/test/data/tmp1/file100g
recSize 8M nBytes 100G fileSize 100G
nProcesses 1 nThreadsPerProcess 8
file cache flushed before test
not using direct I/O
offsets accessed will cycle through the same file segment
not using shared memory buffer
not releasing byte-range token after open
fsync at end of test
Data rate was 10318827.72 Kbytes/sec, thread utilization 0.806, bytesTransferred 107374182400
4.3.2 SAP HANA HWCCT test
Although the ESS model was certified with eight productive HANA DB instances, an ESS can outperform this certified value by more than 50%. If all of the customized settings are configured correctly, you can achieve high numbers with the SAP test tool HWCCT, which is included with the HANA distribution.
For more information about HWCCT, see the SAP HANA Tailored Data Center Integration - Frequently Asked Questions page of the SAP website.
A summary of the results is shown in Figure 4-2.
Figure 4-2 HWCCT results
The results show a test with 12 HANA nodes on a POWER8 E880 machine running in parallel against one ESS GL6 building block, which is connected by InfiniBand FDR. In the summary chart, the columns include the following information:
The first column indicates whether the workload is log (sequential) or data (random).
The second column references the various I/O sizes from the HWCCT.
The third column lists the expected minimum level.
The measured performance numbers from SAP's HWCCT for each client are listed in the rest of the table.
 