Introduction to IBM Spectrum Scale Erasure Code Edition
This chapter introduces IBM Spectrum Scale Erasure Code Edition (ECE). It is a scalable, high-performance data and file management solution. ECE is designed to run on any industry standard server that meets the ECE minimum hardware requirements.
ECE also provides all the functionality, reliability, scalability, and performance of IBM Spectrum Scale with the added benefit of network-dispersed IBM Spectrum Scale RAID, which provides data protection, storage efficiency, and the ability to manage storage in hyperscale environments that are composed from standardized hardware.
This chapter includes the following topics:
1.1 Overview
1.2 Value proposition
1.3 Advantages and key features
1.4 Configuration options
1.5 Example ECE use cases
1.6 Example configuration
1.7 Summary
1.1 Overview
IBM Spectrum Scale Erasure Code Edition (ECE) is a high-performance, scale-out storage system for commodity servers. It is a new software edition of the IBM Spectrum Scale family, as shown in Figure 1-1. ECE provides all the functionality, reliability, scalability, and performance of IBM Spectrum Scale on the customer’s choice of commodity servers with the added benefit of network-dispersed IBM Spectrum Scale RAID, providing data protection, storage efficiency, and the ability to manage storage in hyperscale environments.
Figure 1-1 High-performance, scale-out storage with IBM Spectrum Scale Erasure Code Edition
Although ECE is a new IBM Spectrum Scale edition, the IBM Spectrum Scale RAID technology is field-proven in over 1000 deployed IBM Elastic Storage Server (ESS) systems. ESS is the storage power behind the fastest supercomputers on the planet. Summit and Sierra, supercomputers at Oak Ridge National Laboratory and Lawrence Livermore National Laboratory, were ranked the first and second fastest computers in the world at the time of this writing.
With the innovative network-dispersed IBM Spectrum Scale RAID adapted for scale-out storage, ECE delivers the same capabilities on industry standard compute, storage, and network components. Customers can choose their preferred servers that meet ECE hardware requirements with the best flexibility and cost.
ECE can be integrated into existing Spectrum Scale clusters, or expanded with extra ECE servers or any other storage that is supported by Spectrum Scale, including IBM ESS, IBM block storage, or other vendors' block storage.
Spectrum Scale ECE, ESS and other Scale Editions provide the freedom to choose and combine different storage hardware. This feature is a major advantage over storage that is purchased as an “appliance” where expansion is limited to other appliances from the same vendor.
Software and middleware that are certified to operate with IBM Spectrum Scale software continue to be certified with IBM Spectrum Scale ECE, or with a cluster that combines ECE and other Spectrum Scale storage pools.
A Spectrum Scale cluster that includes ECE servers can also include industry-standard IBM POWER® servers that run Linux or IBM AIX®, x86 servers that run Linux or Windows, and IBM Z® servers that run Linux.
All of the servers in the same Spectrum Scale cluster use high-performance parallel access to access or serve data.
Users on servers that are outside the Spectrum Scale cluster can access the same data by using various industry standard protocols, such as NFS, SMB, HDFS, SWIFT, and S3. Other protocols can also be added by using “gateway” servers that are running open software, such as FTP, or vendor software, such as ownCloud.
1.2 Value proposition
The demand for storage systems that are based on commodity servers has grown quickly in recent years. Many customers ask for enterprise storage software so that they can adopt the most suitable server platform with the best flexibility and cost, avoid hardware vendor lock-in, and simplify management of their IT infrastructure. The following example user quotes explain why they need ECE:
Supplier mandates:
 – “We buy from Dell, HP, Lenovo, SuperMicro - whoever is cheapest at that moment.”
 – “Our designated configuration is HPE Apollo.”
 – “We assemble our own servers that are OCP-compliant.”
Technical and architectural mandates:
 – “This is for an analytical grid where the IT architecture team only allows x86.”
 – “We need a strategic direction for scale-out storage.”
 – “Only storage rich servers are acceptable, no appliances.”
 – “We use storage arrays today and we are forced by upper management to go with storage rich servers.”
Cost perception:
 – “We want the economic benefits of commodity hardware.”
 – “We don't want to pay for high-end or even mid-range storage.”
As commodity servers with internal disk drives become more popular, they are widely adopted in various use cases, especially emerging AI, big data analytics, and cloud environments. This architecture provides great flexibility in choosing storage hardware platforms and makes large-scale storage systems much more affordable, which grows more important as enterprise data volumes explode. However, commodity storage servers also expose the following major challenges:
Poor storage utilization
Many storage systems use traditional data replication to protect data from hardware or software failures, typically by using three replicas. This results in low storage efficiency (33 percent), which requires much more hardware in the storage system. With large volumes of data, customers must pay a large amount of money to acquire and operate the extra hardware.
High failure rates
Commodity hardware is less reliable than enterprise hardware, which introduces more failures in components such as nodes, HBAs, and disk drives. These high failure rates result in poor durability and a greater performance impact during failure events, which become common rather than rare. Because of these factors, achieving high data reliability and sustaining storage performance during failures is a significant challenge for distributed storage systems.
Data integrity concerns
With a large quantity of data in the storage system, the probability of silent data corruption becomes much higher than in traditional storage systems of much smaller scale.
Scalability challenges and data silos
It is a challenge to manage many servers and disk drives in the same system. Some distributed storage systems might not scale well when approaching exa-scale or even tens of petabytes. This issue introduces unnecessary data movement among storage systems or from storage systems to data processing systems.
Missing enterprise storage features
The features include data lifecycle management (tiering, ILM policies), auditing, multi-site synchronization, snapshots, backup and restore, disaster recovery, and disk management. Without these features, it becomes difficult to manage large server farms with frequent maintenance requirements.
To address these issues, ECE provides the value of enterprise storage that is based on industry standard servers to our customers. A typical ECE hardware architecture is shown in Figure 1-2.
Figure 1-2 Hardware Architecture of IBM Spectrum Scale Erasure Code Edition
It is composed of a set of homogeneous storage servers with internal disk drives, typically NVMe or SAS SSD and spinning disks. They are connected to each other with a high-speed network infrastructure.
ECE delivers all the capability of IBM Spectrum Scale Data Management Edition, including enormous scalability, high performance and enterprise manageability, and information lifecycle management tools. It also delivers the following durable, robust, and storage-efficient capabilities of IBM Spectrum Scale RAID:
Data is distributed across nodes and drives for higher durability without the cost of replication
End-to-end checksum identifies and corrects errors that are introduced by network or media
Rapid recovery and rebuild after hardware failure while generally maintaining performance levels
Disk hospital function manages drive health issues before they become disasters
Continuous background scrub and error correction support deployment on many drives while maintaining data integrity
All of these features are delivered on your choice of ECE storage servers.
1.3 Advantages and key features
ECE delivers the full feature set of IBM Spectrum Scale and IBM Spectrum Scale RAID on commodity servers as a distributed storage system. It solves the challenges of managing large-scale, server-based distributed storage.
1.3.1 High-performance erasure coding
ECE supports several erasure codes and delivers much better storage efficiency; for example, ~73 percent with 8+3p and ~80 percent with 8+2p Reed-Solomon codes (see Figure 1-3).
Figure 1-3 8+2p / 8+3p Reed Solomon Code in ECE
Better storage efficiency means less hardware, improved network utilization, and lower operating costs. These benefits provide customers with significant savings without compromising system availability and data reliability.
ECE erasure coding protects data better than traditional RAID 5/6, providing three nodes of fault tolerance with 8+3p in a configuration of 11 or more nodes. It also provides much faster rebuild and recovery performance than traditional RAID 5/6.
This configuration can survive concurrent failures of multiple servers and storage devices. Furthermore, ECE implements high-performance erasure coding, which allows it to be used as tier one storage.
One of the typical use cases of ECE is to accelerate data processing by using enterprise NVMe drives, which can deliver high throughput and low latency. High performance is a key differentiation compared with other erasure coding implementations in distributed storage systems. These other schemes are typically used for cold data only.
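The storage-efficiency figures above follow directly from the code parameters; a minimal sketch (real usable capacity is further reduced by spare space and metadata, as discussed later):

```python
# Storage efficiency and fault tolerance of Reed-Solomon erasure codes.
# These are simple ratios; spare space and RAID metadata in a real ECE
# system reduce the usable capacity somewhat further.

def erasure_code_efficiency(data_strips: int, parity_strips: int) -> float:
    """Fraction of raw capacity available for user data."""
    return data_strips / (data_strips + parity_strips)

for data, parity in [(8, 2), (8, 3)]:
    eff = erasure_code_efficiency(data, parity)
    print(f"{data}+{parity}p: efficiency {eff:.0%}, "
          f"tolerates {parity} concurrent strip failures")
# → 8+2p: efficiency 80%, tolerates 2 concurrent strip failures
# → 8+3p: efficiency 73%, tolerates 3 concurrent strip failures

# Compare with 3-way replication: 1/3 ≈ 33% efficiency, tolerates 2 failures.
```

The same ratios explain the comparison with replication in 1.1: an 8+2p code stores 25 percent more user data per raw terabyte than three replicas do, while still surviving two concurrent failures.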
1.3.2 Declustered erasure coding
ECE implements advanced declustered RAID with erasure coding. ECE declustered RAID can manage many disk drives across multiple servers in a single grouping that is known as a declustered array (also referred to as DA in this publication).
The left side of Figure 1-4 on page 9 shows a declustered RAID array that is composed of disk drives from multiple nodes in an ECE storage system. The ECE failure domain feature can detect and analyze hardware topology automatically and distribute data evenly among all the nodes and disk drives. The spare space is also distributed evenly across the drives in the declustered array. This even distribution results in a low probability of losing two or three strips in the same data block, which means much less data to rebuild during hardware failure.
With many disk drives in the same group and evenly distributed spare space, the data rebuild process can read from all surviving servers and disk drives in parallel and write to them in parallel as well, which results in shorter rebuild time and better mean time to data loss (MTTDL).
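A back-of-the-envelope model illustrates why declustered rebuild is faster; the drive counts and bandwidths below are assumptions for the sketch, not ECE measurements:

```python
# Illustrative rebuild-time model: traditional RAID rebuilds onto a single
# spare drive, while a declustered array spreads the rebuild work across
# all surviving drives. All numbers here are assumed for illustration.

def rebuild_hours(failed_capacity_tb: float,
                  participating_drives: int,
                  per_drive_mbps: float) -> float:
    """Time to rebuild one failed drive's data when the work is shared
    by `participating_drives` drives in parallel."""
    aggregate_mbps = participating_drives * per_drive_mbps
    return failed_capacity_tb * 1e6 / aggregate_mbps / 3600

# Traditional RAID: bottlenecked by one spare drive at ~100 MBps.
print(f"traditional: {rebuild_hours(10, 1, 100):.1f} h")    # → ~27.8 h
# Declustered array: ~100 surviving drives each contribute a share.
print(f"declustered: {rebuild_hours(10, 100, 100):.1f} h")  # → ~0.3 h
```

The model ignores network overhead and rebuild throttling, but it captures the core effect: rebuild time shrinks roughly in proportion to the number of drives that share the work, which is what improves MTTDL.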
A key advantage of ECE’s Spectrum Scale RAID that distinguishes it from other declustered RAID approaches is its ability to maintain near-normal levels of performance to the user in many failure scenarios. This ability to maintain service levels at large scale was proven extensively with Spectrum Scale RAID that is running at some of the largest and most challenging computing sites in the world.
ECE achieves this by categorizing data rebuild into critical rebuild and normal rebuild. Critical rebuild occurs when data is in a high-risk situation; for example, having lost two strips with the 8+2p or three strips with the 8+3p erasure code. In this situation, ECE rebuilds data urgently by using as much bandwidth as possible. Because much less data must be rebuilt, critical rebuild can complete in a short time.
After critical rebuild, ECE enters normal rebuild and reserves most of the bandwidth for applications, because the data again has enough fault tolerance that the rebuild does not need to complete as urgently.
With declustered RAID, even distribution of data and spare space, and prioritized critical/normal rebuild, ECE balances high data reliability against low performance impact to applications.
1.3.3 End-to-end checksum for comprehensive data integrity
ECE is highly reliable and provides strong data integrity protection against many types of silent data corruption.
ECE calculates, transfers, and verifies checksum for each data block over the network. If corruption occurs during network transfer, the data is retransmitted until it succeeds.
ECE also calculates, stores, and verifies a checksum and other information, such as data versions, VDisk association, and data block and strip location. These metadata items are called the buffer trailer in ECE and are used to protect data from various forms of corruption, especially silent data corruption caused by hardware failures, offset writes, dropped writes, garbage writes, and media errors.
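The network-transfer protection described above can be sketched as a verify-and-retransmit handshake. CRC-32 stands in here for whatever checksum ECE actually uses; that choice, and the function names, are assumptions of this sketch:

```python
# Sketch of end-to-end checksum verification over a network transfer.
# CRC-32 is used as a stand-in checksum for illustration; the real ECE
# checksum algorithm and buffer-trailer layout are not modeled here.
import zlib

def send_with_checksum(payload: bytes) -> tuple[bytes, int]:
    """Sender computes the checksum before the data leaves the node."""
    return payload, zlib.crc32(payload)

def receive(payload: bytes, checksum: int) -> bool:
    """Receiver recomputes and compares; a mismatch triggers retransmit."""
    return zlib.crc32(payload) == checksum

data, crc = send_with_checksum(b"block 42 contents")
assert receive(data, crc)           # clean transfer: accepted
corrupted = b"block 42 c0ntents"    # one byte flipped in flight
assert not receive(corrupted, crc)  # mismatch: receiver requests retransmit
```

In ECE the same idea extends to the media path: the stored checksum and trailer metadata are verified on read, so a dropped or misplaced write is detected rather than silently returned to the application.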
1.3.4 Extreme scalability
One of the major advantages of IBM Spectrum Scale and IBM Spectrum Scale RAID is its high scalability. This capability was proven in many large-scale systems. The latest and most impressive examples are the Coral systems.
The Summit system is in the US Department of Energy's Oak Ridge National Laboratory (ORNL), and is 8 times more powerful than ORNL's previous top-ranked system, Titan. Summit is the world's fastest supercomputer, with 200 PFLOPS of compute performance and 300 PB of storage capacity with 2.5 TBps of I/O bandwidth, delivered by IBM ESS storage hardware.
IBM Spectrum Scale and IBM Spectrum Scale RAID are the same core storage software technologies powering ESS and ECE storage systems. A set of storage servers can be configured with ECE to provide a high-performance and reliable storage building block. Many of these building blocks can be aggregated together into the same large Spectrum Scale file system, which eliminates data silos and unnecessary data movement.
1.3.5 Enterprise storage features and manageability
IBM Spectrum Scale has been in production for over 20 years. It is well-known as an enterprise file system with a competitive list of features to meet data management requirements in various use cases.
ECE further extends Spectrum Scale to enable the use of industry standard storage servers.
ECE automatically configures storage layout by sensing the hardware topology and distributing data evenly among all nodes and drives to automatically achieve high data reliability and durability. ECE detects changes in the hardware topology, such as a node failure, and rearranges data to maintain an optimal distribution of data on the remaining hardware. It can also help system administrators manage their hardware in a simple and convenient way.
ECE implements a disk hospital to predict and detect disk failures, diagnose problems, and identify failing disks for replacement to the system administrator. It defines a standard procedure to help system administrators identify and replace bad disk drives.
It also reports the server and slot in which a bad disk drive is located and can turn on an indicator LED for most types of drives, which makes disk replacement convenient. ECE provides this functionality on industry-standard, commodity server-based storage by remaining hardware-platform neutral.
1.4 Configuration options
IBM Spectrum Scale ECE is configured in one or more building blocks, also known as recovery groups, that are made up of storage rich servers. All of the servers in an ECE building block must have the same configurations in terms of CPU, memory, network, storage drive types, and operating system. The storage drives are used for storing data by striping the data across all the servers and drives in a building block.
A Spectrum Scale cluster can be constructed from multiple ECE building blocks. Each building block can have a unique server type and drive types. The storage topology must be the same for each building block. Multiple drive types can be installed into each server, but each server in a building block must have the same number of drives of each type, and the drives of each type must have the same capacity, and, for HDDs, the same rotational speed.
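The uniformity rules above can be checked mechanically; the following is a hypothetical validator sketch (the data layout and field names are illustrative, not an ECE API):

```python
# Hypothetical check that every server in an ECE building block has the
# same drive population, per the rules above: same count of each drive
# type, same capacities, and (for HDDs) the same rotational speed.
# The dictionary layout and field names are illustrative only.

def validate_building_block(servers: list[dict]) -> bool:
    """True when all servers carry an identical drive population."""
    reference = servers[0]["drives"]
    return all(s["drives"] == reference for s in servers[1:])

servers = [
    {"name": f"node{i}",
     "drives": {("NVMe", "1.8TB"): 2, ("HDD", "10TB", "7.2k rpm"): 10}}
    for i in range(16)
]
print(validate_building_block(servers))   # True: homogeneous block

servers[3]["drives"] = {("NVMe", "1.8TB"): 2, ("HDD", "10TB", "7.2k rpm"): 8}
print(validate_building_block(servers))   # False: node3 has fewer HDDs
```

Note that the check compares within a building block only; as stated above, different building blocks in the same cluster may use different server and drive types.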
Different types of storage can be configured into separate file system storage pools, and these storage pools can be used for different types of workloads. For example, NVMe or SAS SSD devices are configured for metadata and small data fast I/Os, while HDD devices can be used to store cold or archived data.
The following storage options are supported by IBM Spectrum Scale Erasure Code Edition versions 5.0.3 and 5.0.4. Check the ECE Knowledge Center for the latest updates in hardware support:
NVMe drives only
Storage rich servers that are populated with enterprise class NVMe drives with U.2 form factor can be configured with Erasure Code Edition to store data on the NVMe drives.
Combination of HDD, NVMe, and SSD drives
Storage rich servers that are populated with a combination of SAS HDD, SAS SSD, and NVMe drives can be configured with ECE to have data stored on those drives. Normally, NVMe or SSD drives are configured for metadata and small data I/Os.
ECE with multiple building blocks
Depending on your use case, you might want to create a storage system that is constructed from multiple building blocks. For a high-capacity use case, several building blocks might be HDD with a few NVMe drives per node. For a high-performance file serving use case, one building block might be NVMe only and a second might be a mix of NVMe and HDD drives, or several building blocks might include both NVMe drives and HDDs.
ECE with ESS
At large-scale installations, the Erasure Code Edition servers (along with ESS storage systems) can be configured in a single IBM Spectrum Scale cluster environment. Separate recovery groups must be configured on the ESS storage systems and the ECE storage servers. Figure 1-4 on page 9 shows a typical configuration of an IBM Spectrum Scale cluster with ECE storage rich servers and ESS storage systems.
Figure 1-4 IBM Spectrum Scale cluster made up of a combination of ECE and ESS servers
1.5 Example ECE use cases
IBM Spectrum Scale Erasure Code Edition can be used in many customer scenarios where industry standard scale-out storage systems are required. Examples include, but are not limited to, AI and analytics, life sciences, manufacturing, media and entertainment, financial services, academia and government, and cloud storage; that is, the use cases where IBM Spectrum Scale has demonstrated significant value.
This section describes several typical workloads or use cases that are used in ECE customer environments.
1.5.1 High-performance file serving
IBM Spectrum Scale Erasure Code Edition can provide back-end storage with IBM Spectrum Scale Protocol services to allow clients to access data with the NFS, SMB, and Object protocols, in addition to high-speed native access by using the IBM Spectrum Scale client.
Each ECE storage server is typically configured with several NVMe drives to store and accelerate IBM Spectrum Scale metadata and small data I/Os, and several HDD drives to store user data. ECE can deliver high-performance file serving for the user’s workloads and also achieve cost savings by tiering data from the NVMe storage pool to the HDD storage pool as the usage of data goes from hot to cold.
1.5.2 High-performance compute tier
IBM Spectrum Scale Erasure Code Edition implements high-performance erasure coding and provides the capability of storage tiering to different storage media (for example, flash drives, spinning disks, tape, and cloud storage) with different performance and cost characteristics.
Spectrum Scale’s policy-based Information Lifecycle Management (ILM) feature makes it convenient to automatically or manually manage data movement among different storage tiers.
A typical ECE high-performance compute tier is composed of servers with NVMe drives to store and accelerate IBM Spectrum Scale metadata and the set of hot data for high-performance computing and analytics.
1.5.3 High capacity data storage
IBM Spectrum Scale Erasure Code Edition delivers the cost efficiency and data reliability that are essential to large-scale storage systems, with space-efficient erasure coding and comprehensive end-to-end data protection.
A typical ECE storage system for high capacity storage can be composed of an NVMe storage pool to store and accelerate IBM Spectrum Scale metadata and small data I/Os, and a larger set of HDD drives to store the bulk of user data. It also can move cold data to lower-cost tape or object storage, if needed.
1.6 Example configuration
Consider an example configuration of 16 servers, with a mix of NVMe and SAS HDD drives, as listed in Table 1-1.
Table 1-1 Example ECE server configuration
ECE server configuration     Description
Number of servers            16
CPUs per server              2 (Intel(R) Xeon(R) Silver 4110)
Cores per CPU                8
Memory per server            256 GB (16 GB x 16 DIMMs)
NVMe drives per server       2 x 1.8 TB (1.5 TiB)
SAS HDD drives per server    10 x 10.0 TB (9.1 TiB)
SAS HBA                      LSI MegaRAID SAS3516
HCA                          Mellanox MT27800 Family [ConnectX-5] (100 Gbps)
Price per node               Approximately 10,000 USD
This type of server is shown in Figure 1-5.
Figure 1-5 Example ECE server
With this configuration, the approximate file system capacity is 1100 TB (1000 TiB) by using an 8+3p erasure code. This usable space accounts for erasure code overhead, reserved spare space, and IBM Spectrum Scale RAID metadata.
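The quoted capacity can be sanity-checked from the raw numbers in Table 1-1. In this rough calculation, the 10 percent allowance for spare space and Spectrum Scale RAID metadata is an assumed figure, not a published one:

```python
# Rough usable-capacity estimate for the 16-server example configuration.
# The 10% allowance for spare space and RAID metadata is assumed for
# this sketch; actual overhead depends on the configuration.

servers = 16
hdd_tib_per_server = 10 * 9.1    # 10 x 10.0 TB (9.1 TiB) HDDs
nvme_tib_per_server = 2 * 1.5    # 2 x 1.8 TB (1.5 TiB) NVMe drives

raw_tib = servers * (hdd_tib_per_server + nvme_tib_per_server)
after_ec = raw_tib * 8 / 11      # 8+3p erasure code efficiency
usable_tib = after_ec * 0.90     # assumed spare + metadata overhead

print(f"raw: {raw_tib:.0f} TiB, usable: {usable_tib:.0f} TiB")
# → raw: 1504 TiB, usable: 984 TiB
```

The result lands close to the stated 1000 TiB, which shows that most of the gap between raw and usable capacity comes from the 8/11 erasure code ratio rather than from spare space or metadata.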
1.7 Summary
IBM Spectrum Scale Erasure Code Edition is a new member of the IBM Spectrum Scale family. It offers exciting potential to deploy systems that are highly tuned to your compute and storage needs.
In this IBM Redpaper publication, we describe ECE use cases in more detail, provide more information about the underlying technology, discuss planning considerations, and then describe an installation scenario, day-to-day management examples, and provide an overview of problem determination procedures.
 
 
