Introduction to iSCSI in IBM Storwize storage systems
This chapter provides a beginner’s perspective of Internet Small Computer System Interface (iSCSI), including considerations for implementing iSCSI storage. It also describes several iSCSI keywords and their Fibre Channel (FC) and Fibre Channel over Ethernet (FCoE) equivalents. Finally, it lists the benefits of deploying an iSCSI-based storage solution, along with points to consider.
This chapter describes the following topics:
2.1 What iSCSI is
The Small Computer System Interface (SCSI) is a family of protocols for connecting and communicating with peripheral devices, such as printers, scanners, tape drives, and hard disk drives (HDDs). SCSI is built on a client/server architecture: the client (initiator) issues commands and the server (target) runs them and returns responses. The individual I/O devices are called logical units (LUs), and each is identified by a logical unit number (LUN). The SCSI target exposes the LUs to the SCSI initiator, which can then query them or perform I/O operations on them. A SCSI initiator sends a command in a specific format, the Command Descriptor Block (CDB), and the SCSI target processes it.
iSCSI is a protocol that uses the Transmission Control Protocol and Internet Protocol (TCP/IP) to encapsulate and send SCSI commands to storage devices that are connected to a network. The detailed specification of the iSCSI standard is documented in RFC 3720.
iSCSI is used to deliver SCSI commands from a client interface, which is called an iSCSI Initiator, to the server interface, which is known as the iSCSI Target. The iSCSI payload contains the SCSI CDB and, optionally, data. The target carries out the SCSI commands and sends the response back to the initiator.
In summary, the way iSCSI works is that it encapsulates SCSI commands by adding a special iSCSI header. This header is forwarded to the TCP layer, which creates TCP segments. The TCP segments are further broken down into IP packets, which can be transferred over a local area network (LAN), wide area network (WAN), or the internet in general. Figure 2-1 shows the path that a SCSI command takes when it is transmitted by using iSCSI.
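The encapsulation that is described above can be sketched in code. The following Python example builds the 48-byte Basic Header Segment (BHS) of a SCSI Command PDU per the layout in RFC 3720. It is a minimal sketch for illustration only: additional header segments, digests, and sequence-number management are omitted, and the simple LUN encoding shown applies only to small, single-level LUNs.

```python
import struct

def scsi_command_bhs(cdb: bytes, lun: int, task_tag: int,
                     cmd_sn: int = 1, exp_stat_sn: int = 1,
                     data_len: int = 0, write: bool = False) -> bytes:
    """Build the 48-byte Basic Header Segment of a SCSI Command PDU."""
    opcode = 0x01                                 # SCSI Command opcode
    flags = 0x80 | (0x20 if write else 0x40)      # Final bit + W or R bit
    return struct.pack(
        ">BB2xB3sQIIII16s",
        opcode,
        flags,
        0,                                  # TotalAHSLength (no AHS)
        data_len.to_bytes(3, "big"),        # DataSegmentLength (24-bit)
        lun << 48,                          # simple single-level LUN encoding
        task_tag,                           # Initiator Task Tag
        data_len,                           # Expected Data Transfer Length
        cmd_sn,                             # CmdSN
        exp_stat_sn,                        # ExpStatSN
        cdb.ljust(16, b"\x00"),             # SCSI CDB, zero-padded to 16
    )

# A READ(10) CDB wrapped for transport; the header is then handed to the
# TCP layer exactly as Figure 2-1 shows.
bhs = scsi_command_bhs(b"\x28" + b"\x00" * 9, lun=0, task_tag=1,
                       data_len=512)
print(len(bhs))  # 48
```

The resulting 48 bytes are what the TCP layer segments and the IP layer packetizes on the way to the wire.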
Figure 2-1 iSCSI through the layers and the packet format on the wire
2.2 iSCSI sessions
The iSCSI session is the basic block through which the entire iSCSI layer processing is done. An iSCSI session can be considered equivalent to a SCSI I_T nexus (that is, a path from the initiator to the target). An iSCSI session must be established between an iSCSI initiator and target before the initiator can send any SCSI commands to the target. To process SCSI I/O, the iSCSI initiator establishes an iSCSI session with the iSCSI target after agreeing on certain operational parameters. For more information, see 2.2.1, “Components of an iSCSI session” on page 9.
2.2.1 Components of an iSCSI session
An iSCSI session has three main components that help define it.
iSCSI names
iSCSI initiators and targets are identified by their iSCSI names, which can be specified in two formats: the iSCSI qualified name (IQN) and the IEEE Extended Unique Identifier (EUI).
iSCSI qualified name
The IQN is the most commonly used naming mechanism for iSCSI. It has a maximum length of 223 bytes and the following format, beginning with the letters “iqn”:
iqn.<yyyy-mm>.<reverse-domain-name>:<unique-name>
yyyy-mm is the year and month when the naming authority was created.
reverse-domain-name is the domain name of the naming authority in reverse.
unique-name is a portion of the IQN that can be used by the naming authority to add some meaningful parameters.
iqn.1986-03.com.ibm:2145.cluster.node1 is an example of an IQN.
IEEE Extended Unique Identifier
The EUI starts with the letters “eui” and has the following format:
eui.<16 hexadecimal digits>
The 16 hexadecimal digits encode a globally unique identifier in IEEE EUI-64 format.
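Both name formats are easy to check programmatically. The following sketch validates iSCSI names against the two formats that are described above; the regular expression for the IQN domain part is a simplification of the full RFC 3720 grammar, and the 223-byte limit is the length cap that the RFC places on iSCSI names.

```python
import re

# Simplified patterns for the two iSCSI name formats
IQN_RE = re.compile(
    r"^iqn\.\d{4}-\d{2}"                       # iqn.<yyyy-mm>
    r"(\.[a-z0-9]([a-z0-9-]*[a-z0-9])?)+"      # .<reverse-domain-name>
    r"(:.+)?$")                                # optional :<unique-name>
EUI_RE = re.compile(r"^eui\.[0-9A-Fa-f]{16}$") # eui.<16 hex digits>

def is_valid_iscsi_name(name: str) -> bool:
    """Accept an IQN or EUI name within the RFC 3720 length limit."""
    if len(name.encode()) > 223:
        return False
    return bool(IQN_RE.match(name) or EUI_RE.match(name))

print(is_valid_iscsi_name("iqn.1986-03.com.ibm:2145.cluster.node1"))  # True
print(is_valid_iscsi_name("eui.02004567A425678D"))                    # True
print(is_valid_iscsi_name("scsi.not-a-name"))                         # False
```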
iSCSI discovery
iSCSI initiators must identify which targets are present in the system to serve I/O requests. For this purpose, an initiator can run a discovery. There are three supported mechanisms that can be used for discovery: static configuration, SendTargets, and iSCSI Name Server (iSNS).
Static configuration
In this mechanism, the initiator already knows the target IP address and port and no real discovery is done. The initiator can directly establish a session. This option can be selected for small, unchanging iSCSI configurations.
SendTargets
With this mechanism, the assumption is that the initiator knows the target’s IP address. The initiator sends a SendTargets command to that address, and the response consists of a list of available targets. The SendTargets mechanism is suitable for comparatively large configurations.
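The SendTargets response is a text payload of NUL-separated key=value pairs: each TargetName entry may be followed by one or more TargetAddress entries of the form ip:port,portal-group-tag. The following sketch parses such a response; the target name and address shown are illustrative only.

```python
def parse_send_targets(payload: bytes) -> dict:
    """Parse a SendTargets text response into {target_name: [addresses]}."""
    targets, current = {}, None
    for pair in payload.split(b"\x00"):
        if not pair:
            continue
        key, _, value = pair.decode().partition("=")
        if key == "TargetName":
            current = value
            targets[current] = []
        elif key == "TargetAddress" and current is not None:
            targets[current].append(value)
    return targets

# Example response advertising one target on one portal
resp = (b"TargetName=iqn.1986-03.com.ibm:2145.cluster.node1\x00"
        b"TargetAddress=192.0.2.10:3260,1\x00")
print(parse_send_targets(resp))
```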
iSCSI Name Server
In an environment with many targets supporting iSCSI connectivity and many hosts that must connect to the target controllers, configuring target connectivity on each iSCSI initiator host can be cumbersome. The iSCSI protocol enables setting up a mechanism called iSNS. It is a name service mechanism to which all targets register. After a name server is configured on the initiator, the initiator can discover available targets from the name server and establish connectivity to the listed targets.
iSCSI login
iSCSI enables two kinds of sessions.
Discovery session
This is a restricted type of session that initiators can use only to discover targets. The initiator specifies that the session type is Discovery. The target accepts only requests with the SendTargets key, to send back a list of targets to the initiator, and logout requests to close the session.
Normal operational session
In this type of session, all iSCSI commands are accepted, and responded to, by the target.
2.2.2 The three phases of iSCSI login
After an iSCSI initiator discovers the targets that it can communicate with, an iSCSI login must be performed before data transfer can begin. The iSCSI target listens on a TCP port (by default, port 3260); the initiator opens a TCP connection to it and begins the login by sending a login request. Authentication and negotiation of supported session parameters are then carried out. This process is done in three phases: security negotiation, operational parameter negotiation, and the full feature phase.
Security negotiation
In this phase of the iSCSI login, the initiator sends its list of supported authentication methods and the one to be used is determined (the most popular one is CHAP). The initiator and target authenticate each other by using the selected method and the next phase can now begin.
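CHAP, the most common choice in this phase, is a challenge-response scheme: the response is the MD5 digest of the identifier byte, the shared secret, and the challenge (RFC 1994), so the secret itself never crosses the wire. A minimal sketch follows; the secret shown is illustrative only.

```python
import hashlib
import os

def chap_response(identifier: int, secret: bytes, challenge: bytes) -> bytes:
    """One-way CHAP response: MD5(identifier || secret || challenge)."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()

# The target issues a random challenge; the initiator proves knowledge
# of the secret, and the target recomputes the digest to verify it.
challenge = os.urandom(16)
resp = chap_response(0x01, b"initiator-secret", challenge)
print(resp == chap_response(0x01, b"initiator-secret", challenge))  # True
print(resp == chap_response(0x01, b"wrong-secret", challenge))      # False
```

In mutual CHAP, the same exchange is then repeated in the opposite direction so that the initiator also authenticates the target.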
Operational parameter negotiation
Operational parameter negotiation might be the first phase if security negotiation is skipped. This exchange goes on in the form of login requests and responses until both parties agree on the operational parameters, which include (but are not limited to) the following things:
Header digest
Data digest
Immediate data
MaxRecvDataSegmentLength
MaxConnections
For the complete list of operational parameters, see the IETF website.
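As a simplified sketch of how this exchange converges, RFC 3720 defines a result function per key: numeric keys such as MaxConnections resolve to the minimum of the two offered values, and list-valued keys such as HeaderDigest resolve to the first mutually supported option. The per-key rules in the RFC are richer than this (some keys are declarative or Boolean), so treat the following as an illustration only.

```python
def negotiate(offered: dict, ours: dict) -> dict:
    """Resolve offered key=value parameters against our capabilities.

    Numeric keys take the minimum of both sides; list-valued keys take
    the first offered option that we also support, else "Reject".
    """
    result = {}
    for key, value in offered.items():
        mine = ours.get(key)
        if isinstance(mine, int):
            result[key] = min(int(value), mine)
        elif isinstance(mine, list):
            wanted = value.split(",")
            result[key] = next((v for v in wanted if v in mine), "Reject")
    return result

print(negotiate({"MaxConnections": "8", "HeaderDigest": "CRC32C,None"},
                {"MaxConnections": 4, "HeaderDigest": ["None"]}))
# {'MaxConnections': 4, 'HeaderDigest': 'None'}
```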
Full feature phase
After authentication and parameter negotiation are done, the iSCSI session can proceed. The initiator starts sending SCSI commands and data to LUNs in the form of iSCSI Protocol Data Units (PDUs).
SAN Volume Controller and IBM Storwize storage systems enable you to advance to the full-feature phase directly from operational negotiation.
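The allowed phase transitions, including the shortcut that skips security negotiation, can be sketched as a small state machine. The stage codes below are the values that the CSG/NSG fields of the Login PDU carry in RFC 3720.

```python
# RFC 3720 login stage codes (carried in the CSG/NSG fields)
SECURITY_NEGOTIATION = 0
OPERATIONAL_NEGOTIATION = 1
FULL_FEATURE_PHASE = 3

ALLOWED = {
    SECURITY_NEGOTIATION: {OPERATIONAL_NEGOTIATION, FULL_FEATURE_PHASE},
    OPERATIONAL_NEGOTIATION: {FULL_FEATURE_PHASE},
}

def next_stage(current: int, requested: int) -> int:
    """Validate a login stage transition and return the new stage."""
    if requested not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {requested}")
    return requested

# Advancing to the full feature phase directly from operational
# negotiation, as the text above describes:
print(next_stage(OPERATIONAL_NEGOTIATION, FULL_FEATURE_PHASE))  # 3
```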
2.3 iSCSI adapters
Three types of iSCSI adapter are most commonly deployed in iSCSI storage solutions. Latency and performance vary depending on which adapter is chosen. This section lists the impact of each type in terms of cost and performance.
2.3.1 Ethernet card (network interface card)
Figure 2-2 shows an Ethernet card.
Figure 2-2 iSCSI implemented directly with an Ethernet card
Ethernet cards are designed to transfer IP packets. Therefore, to send SCSI commands and data over Ethernet, they must first be packetized so that the adapter can process and forward them. The iSCSI protocol performs this task, but all iSCSI protocol processing is done in software before the operating system’s TCP/IP stack is called. This method is processor-intensive and reduces the overall performance of the server. It leads to increased latencies, and performance also suffers when the Ethernet card receives traffic from other applications that share the interface.
2.3.2 TCP offload engine
Figure 2-3 shows a TCP offload engine (TOE).
Figure 2-3 iSCSI implemented with a TCP offload engine
The TOE interface card is more sophisticated in terms of the processing capabilities, and most of the TCP packet processing is done by specialized hardware that is built into the adapter. This implementation means that the TOE is better than the NIC when compared on a performance or latency basis. The TOE is also more expensive than the NIC.
2.3.3 iSCSI offload engine
Figure 2-4 shows an iSCSI offload engine.
Figure 2-4 iSCSI offload engine
The iSCSI offload engine sends all of the iSCSI protocol processing to the iSCSI-specific hardware that is built into the iSCSI offload engine card. This implementation provides the least latency and best performance of the three. The iSCSI offload engine is expensive because most functions are implemented in hardware, but the latency is reduced because there is little or no processing in the operating system kernel.
2.4 iSCSI routing
The Internet Protocol (IP) has well-defined and widely used routing standards, and iSCSI relies on IP for its routing requirements.
FCoE and iSCSI can coexist in the same data center if an iSCSI gateway is used to route traffic. An iSCSI gateway is a device that facilitates conversion between FCoE and iSCSI. It has FCoE ports in addition to normal Ethernet ports that provide connectivity for the TCP/IP protocols. An iSCSI gateway exports FC LUNs as iSCSI targets to provide integration with use of fewer cables and host bus adapters (HBAs).
Figure 2-5 shows an example of how iSCSI routing can be done.
Figure 2-5 iSCSI gateway
2.5 Ethernet for iSCSI
In traditional deployments of Ethernet, frames that are lost (either because they are dropped or because a collision occurred) are retransmitted. This loss is acceptable in normal non-storage data networks because the application can usually afford to wait for the frame retransmission. For storage traffic, lost frames are more harmful because the application suffers from large I/O latencies. Thus, even though iSCSI can handle the best-effort delivery of Ethernet networks, reduced retransmission significantly improves iSCSI performance.
In its early days, iSCSI was deployed only on 1-Gigabit Ethernet (GbE). This implementation gave a theoretical maximum performance of 125 MBps. (This theoretical maximum performance cannot be achieved because there is processing impact for each of the three protocols, that is, iSCSI, TCP, and IP.) With 10 GbE, mechanisms can be used to prevent retransmission of frames. The Converged Enhanced Ethernet (CEE) (also known as Data Center Bridging (DCB)) standard was developed to consolidate FC and Ethernet networks to provide lossless connectivity in the data center with the fewest number of cables. DCB is not supported on 1 GbE.
The following section explains DCB and other trends in Ethernet networks at the time of writing. It also details a few challenges and opportunities as newer and better Ethernet technology emerges.
2.5.1 Data Center Bridging
DCB is a set of standards that are defined by the Institute of Electrical and Electronics Engineers (IEEE) Task Group to enhance existing 802.1 bridge standards. This enhancement is done by improving link robustness and enabling a 10 GbE link to support multiple traffic types simultaneously while preserving their respective traffic properties.
The goal of DCB is to improve the Ethernet protocol so it becomes lossless by eliminating packet loss due to queue overflow. This scenario is known as lossless Ethernet.
The DCB standards include Priority Flow Control (PFC), Enhanced Transmission Selection (ETS), Congestion Notification (CN), and Data Center Bridging Exchange (DCBx).
Priority Flow Control (IEEE standard 802.1Qbb)
In traditional Ethernet networks, a transmitter can send frames faster than a receiver can accept them. If the receiver runs out of available buffer space to store incoming frames for further processing, it is forced to drop arriving frames, which leads to retransmission. The solution is to pause traffic when it exceeds the receiver’s capacity.
The traditional Ethernet flow control uses a PAUSE mechanism. If the port becomes busy, the switch manages congestion by pausing all the traffic on the port, regardless of traffic type.
PFC can individually pause traffic according to the assigned tags, facilitating lossless or no-drop behavior for a given priority at the receiving port. When implemented end to end on a network, lossless behavior prevents frames from being dropped during congestion by pausing the traffic classes that use PFC. Each frame that a sending port transmits is tagged with a priority value (0 - 7) in the virtual local area network (VLAN) tag.
Figure 2-6 shows how PFC works. It divides the available 10 GbE bandwidth into eight different virtual lanes, with each lane assigned a priority level. If bursts of heavy congestion occur, lower priority traffic can be paused. In this example, lane 3 is paused while the rest of the lanes allow flow of traffic.
Figure 2-6 Priority Flow Control by pausing virtual lane 3
PFC provides fine-grained flow control. The switch pauses certain traffic types that are based on 802.1p Class of Service (CoS) values in the VLAN tag.
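The tag that carries those CoS values is easy to construct. The following sketch builds the 4-byte 802.1Q tag: the Tag Protocol Identifier 0x8100 followed by 3 priority (PCP) bits, 1 drop-eligible (DEI) bit, and a 12-bit VLAN ID. PFC pauses traffic per PCP value, one value per virtual lane.

```python
import struct

def vlan_tag(priority: int, vlan_id: int) -> bytes:
    """Build a 4-byte 802.1Q VLAN tag with the given PCP and VLAN ID."""
    tci = (priority & 0x7) << 13 | (vlan_id & 0xFFF)  # DEI bit left at 0
    return struct.pack(">HH", 0x8100, tci)

# Frames tagged with priority 3 land in virtual lane 3, the lane that
# Figure 2-6 shows being paused during congestion.
tag = vlan_tag(priority=3, vlan_id=100)
print(tag.hex())  # 81006064
```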
PFC works together with ETS, which is described in “Enhanced Transmission Selection (IEEE 802.1Qaz)” on page 16.
For more information about the PFC standard, see the IEEE 802 website.
Enhanced Transmission Selection (IEEE 802.1Qaz)
ETS is used to allocate link bandwidth between different traffic classes. With ETS enabled, bandwidth allocation is carried out based on the 802.1p priority values in the VLAN tag. It is possible to combine multiple priority values into traffic groups or classes. The important traffic can be assigned high priorities and ensured bandwidths. To improve the overall network efficiency, ETS allows lower priority traffic to use unused bandwidth from the high-priority queues and to exceed their own bandwidth guarantees.
Figure 2-7 shows an example of assigning priority values to each type of traffic.
Figure 2-7 Enhanced Transmission Selection working with Priority Flow Control to pause specific types of traffic
For more information about ETS, see the IEEE 802 website.
Congestion Notification (IEEE 802.1Qau)
CN is a flow control protocol for Layer 2 networks to eliminate heavy congestion due to long-lived traffic flows by throttling frame transmission. The congestion point, which can be a network switch or end-device port, can request that ingress ports limit their speed of transmissions when congestion is occurring. When the congestion ends, the ingress ports can increase their speed of transmission again. This process allows traffic flows to be throttled at the source to react to or prevent congestion by having a temporary reduction in transmission rate. This reduction is preferred to lost packets, which can cause long timeouts. This feature must be considered for large-scale environments with multi-hop networks.
Figure 2-8 shows a basic example of what occurs when CN is used.
Figure 2-8 An example of Congestion Notification in a network
For more information, see the IEEE 802 website.
Data Center Bridging Exchange
DCBx is a protocol that is used by DCB devices to exchange configuration information with directly connected peers to ensure a consistent configuration across the network.
DCBx uses Link Layer Discovery Protocol (LLDP) to exchange parameters between two link peers so that each learns the capabilities of the other, for example, whether both link peers support PFC. PFC is described in “Priority Flow Control (IEEE standard 802.1Qbb)” on page 15.
This protocol also can be used to detect misconfiguration of a feature between the peers on a link and to configure DCB features in its link peer.
For more information about this protocol, see the IEEE 802 website.
2.5.2 The future of Ethernet and its impact on iSCSI
An IEEE study group has been formed to research 25 GbE, with plans to achieve higher bandwidths in multiples of 25 Gbps. With Ethernet speeds increasing to 40 Gbps and 100 Gbps and becoming commercially viable, new challenges for iSCSI storage must be addressed. The processor becomes a bottleneck for software-based iSCSI, necessitating the use of a TOE or mechanisms such as iSCSI over Remote Direct Memory Access (RDMA). The protocol-processing impact on I/O performance must be analyzed and resolved.
To address the need for higher bandwidths and lower latencies for iSCSI interconnects, new protocols that extend RDMA to Ethernet interconnects are being developed, such as internet Wide Area RDMA Protocol (iWARP) and RDMA over Converged Ethernet (RoCE). iSCSI Extensions for RDMA (iSER) is a new standard that enables iSCSI hosts and targets to take advantage of RDMA capabilities. iSER can run on top of any RDMA capable Network Interface Card (rNIC) regardless of the protocol, that is, iWARP or RoCE (V1 or V2). These new technologies are being adopted as higher Ethernet bandwidths, such as 25, 40, 50, and 100 Gbps, gain acceptance.
2.6 Fibre Channel: FCoE terms and their iSCSI equivalents
This section describes a few FC concepts and their iSCSI equivalents.
2.6.1 Fibre Channel zoning
FC zoning is a method to partition the switched fabric to restrict the visibility of FC endpoints and isolate devices into zones. It is used to simplify security and management of the fabric.
iSCSI does not provide zoning; similar isolation is typically achieved with VLANs and network access controls.
2.6.2 Virtual SAN
Virtual fabric or Virtual SAN (vSAN) is a set of ports that is selected from a set of connected switches. Both FC zones and a vSAN can be used for isolation of traffic, but the key difference is that in a vSAN all the FC services are replicated within the switch so that it can act as a self-sufficient SAN.
This function is similar to a VLAN in iSCSI, and it enables easier administration of a large SAN.
2.6.3 Buffer-to-Buffer credit
Buffer-to-Buffer credits (BB credits) are used in FC deployments for flow control, much like PFC is used in iSCSI. Each time an FC port transmits data, its BB credit is decremented by one and it is incremented when a recipient issues it some credits. A port with zero credits cannot transmit again until it obtains credits.
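The credit mechanism that is described above can be modeled in a few lines. The following is a toy sketch: each transmitted frame consumes one credit, each R_RDY primitive from the peer returns one, and a port at zero credits must stop sending.

```python
class BBCreditPort:
    """Toy model of Fibre Channel buffer-to-buffer flow control."""

    def __init__(self, credits: int):
        self.credits = credits          # credits granted by the peer

    def can_send(self) -> bool:
        return self.credits > 0

    def send_frame(self) -> None:
        if not self.can_send():
            raise RuntimeError("no BB credits: transmitter must wait")
        self.credits -= 1               # each frame consumes one credit

    def receive_r_rdy(self) -> None:
        self.credits += 1               # peer freed a buffer

port = BBCreditPort(credits=2)
port.send_frame()
port.send_frame()
print(port.can_send())   # False: credits exhausted, port must wait
port.receive_r_rdy()
print(port.can_send())   # True: the peer returned a credit
```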
2.6.4 Worldwide name
A worldwide name (WWN) is a unique 64-bit identifier that is assigned to an FC device. It can either be a worldwide port name (WWPN) or worldwide node name (WWNN). A WWN consists of Network Address Authority (NAA) bits, usually followed by an Organizationally Unique Identifier (OUI). It can loosely be equated to an iSCSI IQN because an IQN uniquely identifies an iSCSI device.
2.6.5 Fabric name server
The fabric name server is a database of all the devices that are attached to a network fabric. An FC host can query the fabric name server to obtain information about a particular FC device, and the name server is mandatory for fabric operation. Its iSCSI equivalent is iSNS, which is optional. iSNS reduces the possibility of human error and provides for easier maintenance of the fabric.
2.7 Comparison of iSCSI and FCoE
Figure 2-9 shows the FCoE packet format.
Figure 2-9 FCoE protocol stack and packet format
Although both FCoE and iSCSI operate on Ethernet, there are several differences in the manner in which they work. Table 2-1 lists the key differences between iSCSI and FCoE.
Table 2-1 Comparison of FCoE and iSCSI
FCoE: FCoE enables the encapsulation of FC frames over Ethernet networks. The underlying protocol is not TCP/IP.
iSCSI: iSCSI encapsulates SCSI into the TCP/IP format.

FCoE: FCoE is not routable because it does not have IP headers. FCoE works within a subnet.
iSCSI: iSCSI can be routed based on IP headers, and iSCSI payloads can be carried beyond a subnet through a gateway.

FCoE: Practical implementations of FCoE require DCBx and PFC.
iSCSI: Although they are good to have, iSCSI implementations can work without DCBx and PFC.

FCoE: In practice, a usable FCoE solution requires a fabric name server.
iSCSI: Small and medium iSCSI solutions can work well without iSNS.

FCoE: Commercially viable FCoE solutions require a partly firmware-based implementation.
iSCSI: iSCSI can be implemented completely in software.
2.8 Why use iSCSI
This section lists a few advantages of implementing an iSCSI-based storage solution.
2.8.1 iSCSI is cost-effective
iSCSI is a cost-effective storage solution because it uses existing hardware and network elements.
Cost of installation
iSCSI does not require expensive proprietary hardware on which to run. It does not need dedicated cabling and switches like FC. iSCSI can be implemented on standard Ethernet network hardware. Almost all organizations, including small and medium businesses, already have Ethernet network and cabling.
Maintenance and expansion costs
Replacement of parts of the network and burned-out hardware is inexpensive, which reduces the cost of maintaining the data center. Also, capacity expansion can be easily achieved by acquiring new disk arrays.
Administrative costs
Data center and network administrators are well-versed with TCP/IP configurations. Therefore, iSCSI has a natural advantage because it is implemented over IP. The cost of training staff in iSCSI can be lower than for other technologies.
2.8.2 No distance limitations
With the internet being so ubiquitous, it is possible to implement iSCSI storage such that the data center can be miles away from the application server. iSCSI also simplifies long-distance disaster recovery (DR) solutions, which are an affordable alternative to FC DR setups that require high-priced optical cables to be laid from the primary site to the secondary site.
2.8.3 Good interoperability
iSCSI does not require specialized cabling or switches like FC. An iSCSI HBA provides Ethernet connectivity to storage devices; only the higher-level protocols are aware of iSCSI, and the transport layer (and the layers below it) treats the iSCSI packets as payload. iSCSI also provides good interoperability with equipment from multiple vendors because IP and Ethernet are common industry standards.
2.8.4 Bandwidth usage and Converged Enhanced Ethernet benefits
CEE provides a consolidated transport for both storage and networking traffic, which leads to better bandwidth usage in the data center. Implementation of lossless Ethernet over 10-Gigabit Ethernet, as described in 2.5, “Ethernet for iSCSI” on page 14, provides lower latency and improves performance. As servers deploy better processors for which the bus is no longer the bottleneck and commodity internet reaches higher speeds, converged Ethernet helps provide maximum performance with optimum resource usage.
2.8.5 Security
RFC 3720 lists six methods that are supported by iSCSI to provide security through authentication. The iSCSI initiator and target agree upon one of the six methods when the iSCSI session is established.
The most-widely used method that iSCSI uses to provide security is through Challenge Handshake Authentication Protocol (CHAP). CHAP limits an initiator’s access to volumes by using a challenge-response authentication mechanism. There are two ways CHAP can be set up: one-way CHAP and mutual CHAP.