Storage Management
In this chapter, you will learn how to
• Provision storage for use
• Manage storage devices using common management protocols
• Select an effective administration method for the task
• Configure monitoring, alerting, and reporting for storage
The previous chapters discussed the technical, physical, and theoretical elements of storage. This chapter builds upon that foundation of how hardware, protocols, and other network resources work together and explores the allocation, optimization, and management of storage resources. One of the first steps in this process is to allocate storage resources where needed in the enterprise. The allocation or partitioning of storage resources is referred to as provisioning.
Why is provisioning important? Provisioning determines dynamically and in real time where critical system resources are needed. Priorities that will guide the selection of method or technique for provisioning may be based on a few of the following factors:
• Importance or priority of the process, routine, or task
• Importance or priority of data
• The identification of what critical storage and network resources are required to satisfy and maintain system policies and service-level standards
Storage Provisioning
Storage provisioning is the process of assigning disk space to users while optimizing storage resources. Techniques for provisioning storage are LUN, thick, and thin provisioning.
• LUN provisioning  A logical unit number (LUN) is created and assigned to each enterprise application.
• Thick provisioning  The space required is determined when creating the virtual disk based on anticipated storage requirements.
• Thin provisioning  The primary goal is to optimally allocate storage on the basis of demand (multiple users) and actual size required.
Provisioning usually begins with the creation of storage known as a logical unit (LU) and referred to by a LUN. Storage can be allocated to a device through thin or thick provisioning. Thin provisioning allows the host to believe that it has the maximum amount of space available while the storage system allocates only what is actually used, whereas thick provisioning allocates the entire amount to the storage device. Thin provisioning represents a more optimal means of allocating storage resources than thick provisioning when trying to conserve the amount of space allocated. For example, say a storage administrator configures a LUN with a size of 10GB that can grow to 100GB. This LUN will appear to the operating system as a LUN of 100GB, but the LUN will consume only 10GB of space on the storage array. As more data is added to the LUN, the space consumed on the storage array will increase accordingly, so when 30GB of the 100GB is used, the LUN will take up 30GB of space on the storage array but still appear to the operating system as a 100GB LUN.
With thick provisioning in the same situation, the administrator would need to allocate 100GB of space for the LUN and then map it to the host. The LUN would not grow, and as more space is consumed by the operating system, the storage provisioned on the storage array would not change. In thick provisioning, storage resources are allocated in a fixed manner, which often results in less-than-optimal utilization. Performance is sometimes better with thick provisioning, however, because the system does not need to allocate additional space from the array each time more space is consumed by the host. The space can also be allocated in contiguous locations on the storage array, thus avoiding the fragmentation that can occur when thin LUNs are expanded.
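The 100GB example can be traced with a small model. This is only an illustrative sketch: the `Lun` class and its field names are hypothetical and do not correspond to any storage array's interface.

```python
from dataclasses import dataclass


@dataclass
class Lun:
    """Toy model of a LUN; all sizes are in GB (hypothetical, for illustration)."""
    reported_gb: int      # size the host operating system sees
    thin: bool            # True for thin provisioning, False for thick
    written_gb: int = 0   # data the host has written so far
    initial_gb: int = 0   # space a thin LUN starts with on the array

    def write(self, gb: int) -> None:
        if self.written_gb + gb > self.reported_gb:
            raise ValueError("host cannot write past the reported size")
        self.written_gb += gb

    def consumed_on_array_gb(self) -> int:
        if not self.thin:
            return self.reported_gb  # thick: everything is allocated up front
        return max(self.initial_gb, self.written_gb)  # thin: grows with use


thin = Lun(reported_gb=100, thin=True, initial_gb=10)
thick = Lun(reported_gb=100, thin=False)
thin.write(30)
thick.write(30)
print(thin.reported_gb, thin.consumed_on_array_gb())    # 100 30
print(thick.reported_gb, thick.consumed_on_array_gb())  # 100 100
```

Both LUNs report 100GB to the host, but only the thick LUN occupies 100GB on the array from the start.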
Storage arrays may need to reclaim space after it has been allocated to a thin-provisioned LUN if data within the LUN is deleted. Storage arrays may utilize an automatic method for reclaiming space occurring on a scheduled basis, or they may require that space reclamation be performed manually by the storage administrator.
LUN Provisioning
The process of provisioning storage for use begins with the creation of a RAID group from multiple physical disks. (RAID groups were discussed in Chapter 1.) RAID groups allow for the availability and use of a larger amount of storage as compared to a single disk. RAID groups can then be combined to form larger storage pools from which storage can be provisioned. Storage pools can be partitioned into smaller chunks to be allocated to machines. These chunks are called logical units, and each logical unit is referenced with a number called a logical unit number.
EXAM TIP  LUNs can be given any number supported by the storage array, but they must be unique within the storage array. The same number can be used on different storage arrays, but it cannot be reused within a storage array.
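The uniqueness rule can be expressed as a short sketch. The `StorageArray` class below is a hypothetical stand-in that tracks LUNs in a dictionary; real arrays enforce this rule internally.

```python
class StorageArray:
    """Hypothetical array that tracks which LUNs are in use."""

    def __init__(self, name):
        self.name = name
        self.luns = {}  # LUN -> size in GB

    def create_lu(self, lun, size_gb):
        # A LUN may appear only once within a single array.
        if lun in self.luns:
            raise ValueError(f"LUN {lun} already exists on {self.name}")
        self.luns[lun] = size_gb

    def next_free_lun(self):
        # Lowest number not yet used on this array.
        lun = 0
        while lun in self.luns:
            lun += 1
        return lun


array_a = StorageArray("array-a")
array_b = StorageArray("array-b")
array_a.create_lu(5, 100)
array_b.create_lu(5, 250)   # same number on a different array is fine
try:
    array_a.create_lu(5, 50)
except ValueError as err:
    print(err)              # LUN 5 already exists on array-a
```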
Thick Provisioning
Thick provisioning creates LUs that are a fixed or static size. Thick LUs take up the same amount of space on the storage array as is represented to the host operating system, as shown in Figure 6-1. For example, if a 100GB LU is thick provisioned for a server, the server will see it as a 100GB drive, and it will consume 100GB of space on the storage array.
Figure 6-1  Thick-provisioned LUs
For example, here is the procedure for creating a thick-provisioned LUN on a Hitachi AMS 200 storage array:
1. Click the array and then click the Change Operation Mode button.
2. Type in the password and click OK.
3. Double-click the array unit.
4. Click the Logical Status tab.
5. Expand RAID Groups on the left.
6. If you click RAID Groups, each RAID group will be shown with its free capacity, as shown in Figure 6-2.
Figure 6-2  Storage array management console showing RAID groups
7. Right-click the RAID group from which you want to create an LU and select Create New Logical Unit.
8. The next screen, displayed in Figure 6-3, allows you to choose options for the new LUN.
Figure 6-3  LUN creation
9. Use the SAN RAID groups and LUN allocation documentation to find the next available LU and type that LUN into the blank called Logical Unit No.
10. The controllers should alternate, so choose either controller 0 or 1 depending on the other LUs that are in that RAID group. The RAID groups should be balanced in terms of gigabytes allocated between controllers.
11. Choose the size.
12. Click OK.
13. Click OK again.
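The controller choice in step 10 amounts to picking whichever controller currently owns fewer gigabytes in the RAID group. A minimal sketch, assuming a hypothetical list of `(controller, size_gb)` pairs taken from the allocation documentation:

```python
def pick_controller(existing_lus):
    """Return the controller (0 or 1) with fewer GB allocated in this RAID group.

    existing_lus is a list of (controller, size_gb) pairs; ties go to controller 0.
    """
    allocated = {0: 0, 1: 0}
    for controller, size_gb in existing_lus:
        allocated[controller] += size_gb
    return min(allocated, key=lambda c: (allocated[c], c))


# Controller 0 already owns 250GB and controller 1 owns 100GB.
print(pick_controller([(0, 200), (1, 100), (0, 50)]))  # 1
```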
Thin Provisioning
Thin provisioning allows for the flexible creation of LUs. The LU appears to the host system as the maximum size it can grow to, but it consumes only the amount of space the host uses on the storage system, as shown in Figure 6-4. Thin provisioning creates LUs of variable size, in contrast to thick provisioning, where LUs are created with a fixed size. Because thin LUs grow dynamically, they must be closely monitored to prevent “resource hogging,” or overutilization, of critical storage resources. It should be noted that the dynamic growth does put load on the system. With thick provisioning, there is load on the system during the creation process, but because thick provisioning uses a static size, there is no additional load later. Both thin and thick LUs appear the same to the host operating system to which they are assigned.
Figure 6-4  Thin-provisioned LUs
Thick and thin provisioning offer two means of allocating storage space: thick provisioning allocates storage in static, fixed-size blocks, and thin provisioning allocates it on a dynamic basis. While each scheme is different in terms of how it allocates storage resources, both require a method for reallocating resources back into the pool of available storage. Thin reclamation is a process where space that was previously consumed by a thin-provisioned disk and then left unused can be returned to the overall storage pool. Thin reclamation can be configured to operate when the free space on an LU reaches a specified percentage of the LU's size or exceeds a specified amount, such as 10GB. Some systems allow for scheduled reclamation, while others can reclaim space immediately following its availability in the LU.
Suppose a 300GB LUN number 105 is created and users store 212GB of data on it for a project. When the project concludes, 100GB of data is archived to tape and removed from the server. This 100GB is unused now on LUN 105, but it must be reclaimed by the storage system in order for it to be allocated to another LU. The process of thin reclamation does just that. If a thin reclamation limit was set to 10GB with a schedule running every night, the night after the 100GB was deleted, the storage array would return 90GB to the pool to be used by whichever LU may need it next. If the setting was 10 percent instead, then the storage array would return 70GB to the pool, leaving 30GB (10 percent of 300GB) as unused space on LUN 105.
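The arithmetic in the LUN 105 example can be checked with a short sketch. The function name and parameters are hypothetical; the calculation simply subtracts the amount the threshold keeps on the LU from the unused space.

```python
def reclaimable_gb(unused_gb, lu_size_gb, keep_gb=None, keep_percent=None):
    """Space returned to the pool, given a fixed-size or percentage threshold."""
    if (keep_gb is None) == (keep_percent is None):
        raise ValueError("set exactly one of keep_gb or keep_percent")
    if keep_gb is not None:
        keep = keep_gb
    else:
        keep = lu_size_gb * keep_percent / 100
    return max(0, unused_gb - keep)


# LUN 105: 300GB in size, 100GB left unused after the archive.
print(reclaimable_gb(100, 300, keep_gb=10))       # 90
print(reclaimable_gb(100, 300, keep_percent=10))  # 70.0
```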
Thin Provisioning Woes
A storage administrator learns that he can save space by provisioning only what is currently needed using thin provisioning. This allows the storage administrator to put off purchasing additional storage for his storage array. He has 2TB left on the array, and he creates twelve 400GB LUs using thin provisioning and creates several virtual machines on a highly available hypervisor cluster using the LUs as their drives. Several months later, he receives a call saying that the 12 machines he set up are all not responding. When he investigates, he finds that all the machines are paused and he cannot access them. The event collection system shows out-of-disk-space errors for all machines. This causes him to check the storage array to find that all the space on the array has been consumed by the thin-provisioned LUs. He migrates two of the LUs to another storage system and then resumes the virtual machines, knowing that he will have to propose purchasing additional storage immediately. Monitoring the size of thin-provisioned LUs in the situation described here would have alerted this storage administrator to the need for additional storage before the problem occurred. Such monitoring solutions are discussed later in this chapter.
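The scenario's numbers show how far the array was overcommitted. The sketch below assumes 2TB is treated as 2,000GB and uses a hypothetical 80 percent warning level; neither value comes from a particular product.

```python
def overcommit_ratio(lu_sizes_gb, pool_free_gb):
    """Capacity promised to hosts divided by capacity actually available."""
    return sum(lu_sizes_gb) / pool_free_gb


def pool_alert(consumed_gb, pool_gb, warn_at=0.80):
    """True when consumed space reaches the warning fraction (assumed 80%)."""
    return consumed_gb / pool_gb >= warn_at


lus = [400] * 12                      # twelve 400GB thin LUs
print(sum(lus))                       # 4800 (GB promised)
print(overcommit_ratio(lus, 2000))    # 2.4
print(pool_alert(1700, 2000))         # True: time to plan a purchase
```

A check like `pool_alert` run on a schedule would have raised a warning well before the pool was exhausted.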
Best Practices for Disk Provisioning
Repeatedly growing an LU can also lead to fragmentation across the storage pool since the new space allocated to an LU may not be in a contiguous location on the RAID group. It could even be on a different RAID group in the pool. For this reason, it is best to estimate the near-term needs of an LU and to set its minimum size accordingly to avoid excessive growth. Growth sizes can also be configured. It may be best for some LUs to increase by larger amounts than others based on their usage. This can be configured as a storage array default or on a per-LU basis. Reclamation thresholds may also need to be increased if an LU regularly consumes space and then frees it up again so that the free space is not immediately returned to the pool. It is sometimes best to re-create thin-provisioned LUs that have been increased several times. The newly created LU will have space allocated from the same area of disk, and this can remove fragmentation problems from the LU.
As with any critical network resource, the provisioning of storage must include practical as well as technical considerations. Storage optimization must include tuning system performance by assigning storage to hosts and servers, identifying and optimizing the paths between these hosts and servers, and zoning and masking. Collaboration and communication are two key elements in successfully provisioning resources for a given enterprise. Collaborative identification of critical data locations, potential provisioning bottlenecks, and enterprise-wide provisioning standards will result in the development of comprehensive storage allocation and management policies that foster security, optimization, and availability of storage resources.
Oversubscription
While thin provisioning is a more flexible method for allocating storage in a given environment, it can result in oversubscription, a situation where the capacity promised to hosts exceeds the capacity actually available. The same term applies to storage network links, where oversubscription allows multiple devices to be connected to the same switch port. The devices share the bandwidth of that port, which can be effective when multiple slower connections, such as 1 Gbps fiber links, feed a single 8 Gbps fiber link. However, if the devices connected to a single port demand more bandwidth than is available, contention will arise, and response times could suffer. For this reason, it is important to determine the needs and capabilities of nodes before oversubscribing them to a link. High-bandwidth nodes should be on their own port, whereas low-bandwidth nodes, or nodes that utilize bandwidth only during certain times, can be grouped onto the same port.
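Link oversubscription is usually reasoned about as a ratio of attached bandwidth to shared bandwidth. The following is a rough sketch with hypothetical function names; actual contention depends on when each node transmits, not only on link speeds.

```python
def oversubscription_ratio(device_links_gbps, shared_link_gbps):
    """Total attached link speed divided by the shared link's speed."""
    return sum(device_links_gbps) / shared_link_gbps


def will_contend(peak_demands_gbps, shared_link_gbps):
    """True if simultaneous peak demand exceeds the shared link."""
    return sum(peak_demands_gbps) > shared_link_gbps


print(oversubscription_ratio([1] * 8, 8))    # 1.0: eight 1 Gbps links fit exactly
print(oversubscription_ratio([1] * 12, 8))   # 1.5: oversubscribed
print(will_contend([0.5] * 12, 8))           # False: peak demand is only 6 Gbps
```

The last line shows why oversubscription can still work: twelve nodes on 1 Gbps links are acceptable if their combined peak demand stays under the shared 8 Gbps.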
Management Protocols
Storage and data management are inextricably bound. As the lifeblood of an enterprise, data is one of the most valuable assets it possesses. Any circumstance that delays the availability of and access to this critical resource will have serious and potentially unrecoverable consequences. Successful storage management ensures that data is stored and accessible to the enterprise when it is needed. Storage management maintains information about where, in what form, and how data is stored throughout the system. It allows for monitoring, which is invaluable for ongoing system optimization, capacity planning, and troubleshooting.
Data-hungry applications and transactions are a source of ever-increasing demand. Many enterprises react to this demand by buying additional network and storage resources. While the price of network infrastructure and storage has declined dramatically, both still represent potentially sizable capital expenditures for an organization. A well-devised storage management plan can leverage these investments by providing a proactive road map to meeting an organization’s data storage needs. The following sections provide an overview of several storage management techniques and trends.
SNMP
Simple Network Management Protocol (SNMP) is one of the earliest management protocols, and it can be used to manage routers, switches, and other devices such as storage devices. At its simplest, SNMP defines four operations: Get, GetNext, Set, and Trap. Get retrieves one or more management information base (MIB) values, GetNext sequentially retrieves data from an MIB table, Set updates an MIB value, and Trap flags unusual network and storage conditions.
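The four operations can be illustrated with a toy, in-memory agent. This sketch does not speak SNMP over the network; the `ToyAgent` class is hypothetical, and the values stored under the object identifiers (OIDs) are made up for the example.

```python
import bisect


class ToyAgent:
    """In-memory stand-in for an SNMP agent (illustration only)."""

    def __init__(self, mib, trap_sink):
        self.mib = dict(mib)        # OID tuple -> value
        self.trap_sink = trap_sink  # list that collects trap messages

    def get(self, oid):
        return self.mib[oid]

    def get_next(self, oid):
        # Return the first entry whose OID sorts after the one given.
        oids = sorted(self.mib)
        i = bisect.bisect_right(oids, oid)
        return (oids[i], self.mib[oids[i]]) if i < len(oids) else None

    def set(self, oid, value):
        self.mib[oid] = value

    def trap(self, message):
        self.trap_sink.append(message)


SYS_DESCR = (1, 3, 6, 1, 2, 1, 1, 1, 0)
SYS_NAME = (1, 3, 6, 1, 2, 1, 1, 5, 0)
traps = []
agent = ToyAgent({SYS_DESCR: "example array", SYS_NAME: "array01"}, traps)

print(agent.get(SYS_NAME))              # array01
print(agent.get_next(SYS_DESCR))        # ((1, 3, 6, 1, 2, 1, 1, 5, 0), 'array01')
agent.set(SYS_NAME, "array02")
agent.trap("disk 4 failed")
print(agent.get(SYS_NAME), traps)       # array02 ['disk 4 failed']
```

Repeated calls to `get_next`, each starting from the OID just returned, walk the whole table, which is how a management station can discover values without knowing every OID in advance.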
EXAM TIP  SNMP information can be helpful in troubleshooting problems such as unavailable systems, logical volumes, or shares.
SNMP is a method for devices to share information with a management station through UDP ports 161 and 162. SNMP-managed devices allow management systems to query them for information. Figure 6-5 shows a management console for Dell OpenManage that uses SNMP to gather information from servers. Management systems are configured with MIBs for the devices they will interact with. These MIBs define which variables can be queried on a device. This is what makes SNMP so versatile since manufacturers can create MIBs for their devices that can then be installed onto management stations.
Figure 6-5  SNMP information shown in a Dell OpenManage system
SNMP version 1 devices share data with management servers that are in the same family, denoted by a community name configured on the device. This name is somewhat similar to a workgroup name. There is no authentication required to be part of the family like there would be in a domain environment, so this offers little security. Many common off-the-shelf systems management applications and open-source tools offer SNMP management, and the protocol is stable and mature.
SNMP was enhanced in version 2, and SNMPv2 offers performance and security enhancements over SNMPv1. SNMPv2 can achieve greater performance by requesting data in bulk from devices rather than issuing many separate requests. SNMPv2 can operate in one of two modes depending on the security desired and the level of interoperability required. The community-based mode, SNMPv2c, authenticates devices based solely on the community string they present, whereas SNMPv2u is user based.
SNMP interoperability between versions 1 and 2 can be achieved through a proxy that converts data between the protocols or through a system that supports both protocols at once. This is a process known as bilingual network management.
The latest version of SNMP is version 3, which offers even greater security than version 2. SNMPv3 supports encrypted channels and mutual authentication of both managed devices and management systems. SNMPv3 also offers additional data integrity checks to better protect against corruption or modification of data in transit.
WBEM
While SNMP is commonplace in organizations for monitoring network storage, the increasing demands and complexity of modern networks and the storage assets they connect sometimes require more robust and dynamic methods of managing them. The Common Information Model (CIM), a DMTF standard adopted by the storage industry's Storage Management Initiative (SMI), is an object-oriented approach to organizing information where objects have attributes and each is created, or “instantiated,” from a class that describes the format of the object. Objects can inherit properties from parent objects and pass down properties to child objects below them. This allows for high customization without unnecessary duplication of effort or an increase in complexity. Figure 6-6 shows a sample WBEM tool for a storage array.
Figure 6-6  WBEM tool for a storage array
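The object-oriented idea behind CIM (classes, instantiation, and inheritance) can be sketched in a few lines. The class names below are hypothetical and are not taken from the actual CIM schema.

```python
class ManagedDevice:
    """Hypothetical parent class: attributes every managed object shares."""

    def __init__(self, name, vendor):
        self.name = name
        self.vendor = vendor

    def describe(self):
        return f"{self.name} ({self.vendor})"


class DiskArray(ManagedDevice):
    """Child class: inherits name and vendor, adds a storage-specific attribute."""

    def __init__(self, name, vendor, capacity_gb):
        super().__init__(name, vendor)
        self.capacity_gb = capacity_gb

    def describe(self):
        return f"{super().describe()}, {self.capacity_gb}GB"


# Instantiating the class produces an object with its own attribute values.
array = DiskArray("array01", "ExampleVendor", 2000)
print(array.describe())                   # array01 (ExampleVendor), 2000GB
print(isinstance(array, ManagedDevice))   # True
```

The child class reuses everything defined on the parent and only adds what is specific to storage, which is the duplication-avoiding property the text describes.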
Web-Based Enterprise Management (WBEM) was created by the Distributed Management Task Force (DMTF) as a framework to be utilized by a collection of technologies to remotely control servers or other equipment and to configure, audit, update, and monitor devices. WBEM is used to provide a common platform for defining standards for the heterogeneous technologies, services, and applications that comprise most enterprise-wide networks. The standards of WBEM provide a web-based approach for exchanging CIM data across different technologies and platforms. Extensible Markup Language (XML) can be used to encode CIM data, and this is then transmitted between WBEM servers and clients using Hypertext Transfer Protocol (HTTP).
The design of WBEM allows it to be extensible. This means that new applications, devices, and even operating systems can be specified in the future. There are many applications for WBEM, including grid computing, utility computing, and web services.
SMI-S
The Storage Management Initiative-Specification (SMI-S) is a protocol used for sharing management information on storage devices. It is a good alternative to SNMP for managing devices such as storage arrays, switches, tape drives, HBAs, CNAs, and directors. It is a client-server model where the client is the management station requesting information and the server is the device offering the information. The latest version of SMI-S is version 1.5, published in 2011. SMI-S includes specifications for WBEM that allow the protocol to take advantage of CIM, web-based management and monitoring, and XML encoding, among other things.
In-Band vs. Out-of-Band Management
In-band management in a storage network means that data as well as management and monitoring information travel over the same path. In-band management is generally easier to implement because it has a connection to all the devices in band already and, when bundled with a storage system, typically does not require additional or special software to be installed in order for it to function. In-band management provides caching and advanced functions within a storage network. In-band management systems, however, are limited to only managing devices that are in the fabric that they are on.
Out-of-band management involves the use of a path for device management and monitoring that is separate from the data path. Out-of-band management allows the system administrator to monitor and manage servers as well as other network equipment through remote control from outside the environment and to pool management and monitoring resources in a central location for distributed data centers and SANs.
In addition to the benefits listed previously, the two forms of management differ in subtler ways. While in-band management is cheaper than its counterpart, out-of-band management, it does not allow access to Basic Input/Output System (BIOS) settings or the reinstallation of the operating system. Hence, in-band management is not suitable for solving boot issues. Out-of-band management supports remote management activities such as shutting down, restarting, or power cycling hardware and redirecting keyboard, video, and mouse (KVM).
Storage Administration
Given the complexity of most modern enterprises and their associated networks, storage administration has become an equally daunting task. Capacity planning, configuration management, system utilization, and performance optimization, along with a robust problem identification and resolution strategy, are the key responsibilities of effective administration. Storage administration must be based on proactive practices and is oftentimes done with a variety of tools and procedures. The following sections provide a brief overview of the most common tools and techniques that can be deployed as part of a comprehensive storage management policy. Administration tools fall into two categories: graphical and command line.
GUI
A graphical user interface (GUI) is a method of interacting with a system whereby programs and operations are represented by small pictures known as icons and menus and where navigation and possible user selections are displayed on the screen as pictures or text that the user can select by clicking with a mouse, pressing keys on the keyboard, or touching the screen. GUIs are relatively easy to learn because users do not need to memorize commands or conform to strict command syntax.
Some forms of GUI administration include management applications and web-based administration tools using HTTP or HTTPS.
Management Applications
Some storage systems can be managed by applications that are installed on a server. The software typically runs on top of one or more mainstream operating systems such as Microsoft Windows, Linux, or Solaris and can be managed in band or out of band.
HTTP/S
Hypertext Transfer Protocol Secure (HTTP/S or HTTPS) is a protocol for secure communication over a computer network. HTTPS is implemented by layering the Hypertext Transfer Protocol (HTTP) over the Secure Sockets Layer (SSL)/Transport Layer Security (TLS) protocol, which adds the security capabilities of SSL/TLS to HTTP communications.
HTTPS works by encrypting and decrypting user page requests as well as the pages that are returned by the web server. The use of HTTPS helps prevent eavesdropping and man-in-the-middle attacks. Netscape developed HTTPS. Both HTTPS and SSL support the use of X.509 digital certificates from servers so that a user can authenticate the server, if necessary. HTTPS uses port 443 by default, whereas HTTP uses port 80.
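The default ports and certificate checking can be seen from a client's point of view using Python's standard library. In this sketch the hostname `array.example.com` is a placeholder, and no network connection is opened because the connection objects are only constructed, never used.

```python
import http.client
import ssl

# A default TLS context verifies the server's X.509 certificate and hostname.
context = ssl.create_default_context()

# Constructing the objects does not connect; it only records host and port.
secure = http.client.HTTPSConnection("array.example.com", context=context)
plain = http.client.HTTPConnection("array.example.com")

print(secure.port, plain.port)                        # 443 80
print(context.verify_mode == ssl.CERT_REQUIRED)       # True
print(context.check_hostname)                         # True
```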
CLI
A command-line interface (CLI) is a way of interacting with a computer program where a user or client sends instructions or “commands” to a program using text-based commands. CLI is also known as command-line user interface (CLUI), console user interface (CUI), and character user interface.
CLI was the primary way of interacting with most computer systems before GUI systems were invented, and it was a big improvement over the system that preceded it, punch cards. The interface is a program that accepts text inputs as instructions and then converts them into the appropriate operating system functions that the equipment, server, or storage device can execute.
CLIs in operating systems are distinct programs that are supplied with the OS. The program that implements text interfaces is called a command-line interpreter (also known as command processor or shell). Examples of command-line interpreters, and of the terminal programs used to reach them, include the following:
• PuTTY for SSH access
• HyperTerminal for Telnet access
• Shells for Unix (csh, ksh, sh, bash, tcsh, and so on)
• Command prompt for Windows (CMD.EXE)
• DOS (COMMAND.COM)
• Command shell for Apple
• PowerShell for Windows Vista/7/8 and Server editions
Because of their complexity and the dominance of GUIs, command-line interfaces are rarely used by casual computer users. However, CLIs provide a deeper level of system control and are preferred by advanced computer users as an easy way to issue routine or batch commands and perform complex logical operations, such as the use of looping and conditional programming to an operating system. CLIs allow administrators to write a series of commands that can be executed whenever desired or even on a schedule. CLIs allow for more complex instructions and operations because a series of commands to multiple computers can be executed simultaneously or in sequence rather than having to perform a standard set of tasks manually on many individual servers, computers, or devices.
Some forms of CLI administration include local administration of the device through the serial interface or remote administration via Telnet or SSH.
Serial
The serial port has been used for decades to manage network devices. The serial port is for local administration, when the administrator has physical access to the device. The most common serial connector has nine pins, and a serial cable is used to connect two devices together. Once the cable is attached, a program such as HyperTerminal on Windows can be used to initiate a session with the device at the other end of the serial cable. Serial ports have a logical communication (COM) port that they are associated with. To create a connection, you must choose the speed, data bits, parity, stop bits, and flow control.
The speed is the rate at which data can be transferred over the cable, measured in bits per second (bps). Common speeds include 1200, 2400, 4800, 9600, 19200, 38400, 57600, and 115200 bps. The data bit setting configures how many bits will be framed by a start signal and a stop signal. Most communication frames are 8 bits because this correlates to a single ASCII character, but you may encounter 5, 6, 7, or 9 as well. Parity can be set to even, odd, or off. When parity is on, an extra bit is added to each set of data bits to make the number of 1s in the set either odd or even. If the receiver finds a set with an odd number of 1s when the parity is set to even, it knows that the data was corrupted in transit. This method is still prone to error because corrupted data can still produce a correct parity calculation, so other mechanisms for error detection are usually required on top of parity. The last setting is the stop bits. This setting determines how many bits will be used to show that a character has ended. This is usually set to 1. Serial communication also allows for flow control if this is specified in the configuration. Flow control allows a device to pause and resume communication.
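Parity and framing overhead can both be worked through in code. This is a sketch with hypothetical function names; it assumes one start bit per character, which is the usual arrangement for asynchronous serial links.

```python
def parity_bit(data_bits, mode):
    """Extra bit that makes the count of 1s even or odd."""
    ones = sum(data_bits)
    if mode == "even":
        return ones % 2
    if mode == "odd":
        return (ones + 1) % 2
    raise ValueError("mode must be 'even' or 'odd'")


def passes_check(data_bits, received_parity, mode):
    return parity_bit(data_bits, mode) == received_parity


def chars_per_second(bps, data_bits=8, parity=False, stop_bits=1):
    """Characters per second, assuming one start bit per character."""
    bits_per_char = 1 + data_bits + (1 if parity else 0) + stop_bits
    return bps / bits_per_char


bits = [int(b) for b in format(ord("A"), "08b")]   # 01000001
p = parity_bit(bits, "even")
print(bits, p)                                     # [0, 1, 0, 0, 0, 0, 0, 1] 0

one_flip = bits.copy()
one_flip[0] ^= 1
print(passes_check(one_flip, p, "even"))           # False: error detected

two_flips = one_flip.copy()
two_flips[1] ^= 1
print(passes_check(two_flips, p, "even"))          # True: error missed

print(chars_per_second(9600))                      # 960.0
```

The two-flip case shows the weakness described above: an even number of corrupted bits leaves the parity unchanged, so the error goes unnoticed.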
Serial communication, specified in Recommended Standard 232 (RS-232), operates asynchronously, meaning that there is no clock signal or time spacing used to keep the communication in sync. There are a number of serial port types, but the most common one is shaped roughly like the letter D and has nine pins. This type of serial port can be referred to as DB-9. As the name suggests, serial ports send data serially, one bit at a time. Each set of bits is framed with a start signal and a stop signal. The number of bits that must be framed is specified when configuring a serial port. One of the key limitations of serial cables is their maximum distance of 50 feet. However, there are network-based serial port modules that can be used to connect to serial sessions over an IP address. These devices operate much like keyboard, video, and mouse (KVM) devices, and they encapsulate the information received on a serial port over an IP network and send information received on the IP network to the device connected to the serial port.
Telnet
Telnet is a protocol that provides a command-line session, known as a virtual terminal, for communicating with and controlling another device. As an early network access protocol, Telnet was used to connect various remote hosts, computers, or devices in a network using TCP as the transport protocol. Telnet operates over TCP port 23 by default.
EXAM TIP  Telnet exchanges information in plain text, meaning it is not encrypted. Other devices on the network may be able to intercept this traffic and view the username and password used to connect to the device, along with any information that is shared between the user and the device over Telnet.
Telnet is easy to use and configure, and there are many client applications that support Telnet, including the Terminal application built into Microsoft Windows. Telnet, however, does not include many security features and does not encrypt data; thus, it is not suitable for use over an unsecured channel such as the Internet. When connecting to a remote device using Telnet, a virtual private network (VPN) should be used to provide authentication and encryption between the local and remote networks.
SSH
Secure Shell (SSH) is a Unix-based command interface and protocol for accessing a remote computer or device in a secure manner. Figure 6-7 shows an SSH logon session using PuTTY. SSH is widely used by network administrators to control web and other types of servers remotely. SSH is also used for remote command-line login, remote command execution, secure data communication, and other secure network services between two networked computers. SSH does this through a secure channel over an insecure network between a client and a server (running SSH client and SSH server software, respectively).
Figure 6-7  SSH session using PuTTY
SSH is composed of a suite of three utilities: slogin, ssh, and scp. All of these are secure versions of the earlier Unix utilities rlogin, rsh, and rcp. SSH commands are encrypted and are protected in several ways. Both ends of the client-server connection are authenticated using a digital certificate, and passwords are encrypted in transit for added protection.
SSH makes use of the Rivest, Shamir, Adleman (RSA) public key cryptography for both connection and authentication. Blowfish, Data Encryption Standard (DES), and International Data Encryption Algorithm (IDEA) are among the list of encryption algorithms. The default among these is IDEA.
EXAM TIP  Many systems that support Telnet also support SSH. If an exam question asks how to best connect to a system and both Telnet and SSH are listed, SSH is most likely the answer.
The latest version of SSH is SSH2, which is a proposed set of standards from the Internet Engineering Task Force (IETF). SSH2 is more efficient and more secure than SSH1, but it is not compatible with SSH version 1. The enhanced security of SSH2 comes from its use of message authentication codes (MACs), additional integrity checking, and a more secure method of key exchange.
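The role of a message authentication code can be illustrated with Python's standard `hmac` module. This is a general illustration of the MAC concept, not the SSH2 packet format; the key, message, and choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute a MAC over the message with a shared secret key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, received_tag: bytes) -> bool:
    """Recompute the MAC and compare in constant time."""
    return hmac.compare_digest(make_tag(key, message), received_tag)


key = b"shared-session-key"          # placeholder secret
message = b"show lun status"
tag = make_tag(key, message)

print(verify(key, message, tag))                  # True: message is intact
print(verify(key, b"delete lun 105", tag))        # False: message was altered
print(verify(b"wrong-key", message, tag))         # False: sender lacks the key
```

Because the tag depends on both the message and the secret key, a receiver can detect modification in transit and confirm that the sender holds the key.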
The following example procedure generates an SSH key to be used with PuTTY:
1. Download and run the puttygen program to create a key.
2. Select the type of key to generate (for example, SSH-1 RSA, SSH-2 RSA, or SSH-2 DSA).
3. Click Generate.
4. Move your mouse around under the public key label until the bar completes. The screen will look like Figure 6-8.
Figure 6-8  PuTTY key generation
Storage Monitoring, Alerting, and Reporting
Monitoring, alerting, and reporting are integral parts of a comprehensive storage management plan. These capabilities foster rapid prototyping; fast moves, adds, and changes; efficient fault-tolerant upgrades; and a proactive resource for capacity planning, performance analysis and optimization, problem identification and management, and system availability, reliability, and security.
Storage monitoring is a major function within storage management. It monitors the pulse and health of a storage network and all associated resources. Monitoring is an expensive proposition. Real-time monitoring is crucial but costly. Hence, care must be taken to prioritize which storage resources and data assets are essential to business health and continuity in the event of their loss. Snapshots may be taken over a period of time to offset the cost of monitoring those resources deemed less critical.
Alerting is the mechanism by which the system administrator or, in some cases, automated routines and policies are made aware of changing conditions in the network, device, and storage. Thresholds are typically established to delimit upper and lower operating parameters across a range of metrics. The true benefits of storage monitoring and alerting are captured in a series of reports. These reports detail the trends in system/storage utilization, performance, and faults. On the high end, analysis of this data can form the input to a system that “self-heals,” or automatically adjusts to maintain system optimization and integrity. In more practical use, these reports allow for the proactive administration of complex system and storage resources.
Setting Thresholds
In storage management, thresholds are used to monitor the storage usage of a database. Administrators can set warning and alarm thresholds in storage management tools, which then compare the set values against real-time readings from the system. If the storage state exceeds the safe levels—or thresholds—that were set for it, an alert flag will be shown beside the object whose value has increased beyond its safety level.
EXAM TIP  Metrics that exceed the alarm threshold should be addressed immediately. For example, an alarm threshold might be a temperature above 90 degrees in a server room.
Every object is created with a default threshold. The children of this object will inherit the same value. However, these values can be overridden when an administrator decides to set a specific value for certain objects—or all of them, for that matter. Once a new value has been set for any object, the children of that object will inherit the set value.
Normally, there are three threshold zones: normal, warning, and alarm. Normal is when an object shows no signs of problems or is within its set or default value. A warning occurs when an object has gone beyond its set or default warning value but is not yet a cause for immediate concern and has not progressed into a threat. An alarm occurs when action must be taken because the object has exceeded its set or default alarm value and has progressed into a state where it can threaten the entire system.
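The inheritance and zone rules described in this section can be sketched in a few lines of Python. This is an illustration only, not the behavior of any particular storage management tool: the class name, the percent-used readings, and the threshold values are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MonitoredObject:
    """A monitored item whose thresholds may be inherited from its parent."""
    name: str
    parent: Optional["MonitoredObject"] = None
    warning: Optional[float] = None   # None means "inherit from the parent"
    alarm: Optional[float] = None     # None means "inherit from the parent"

    def effective(self, attr):
        """Walk up the hierarchy until a value for the threshold is found."""
        node = self
        while node is not None:
            value = getattr(node, attr)
            if value is not None:
                return value
            node = node.parent
        raise ValueError(f"no {attr} threshold set for {self.name}")

    def classify(self, reading):
        """Return the zone (normal, warning, or alarm) for a reading."""
        if reading >= self.effective("alarm"):
            return "alarm"
        if reading >= self.effective("warning"):
            return "warning"
        return "normal"


# Hypothetical hierarchy: the pool inherits both values from the array,
# while the LUN overrides only the warning threshold.
array = MonitoredObject("array", warning=80, alarm=95)
pool = MonitoredObject("pool", parent=array)
lun = MonitoredObject("lun", parent=pool, warning=70)

print(pool.classify(75))  # normal: inherited warning threshold is 80
print(lun.classify(75))   # warning: overridden warning threshold is 70
print(lun.classify(96))   # alarm: inherited alarm threshold is 95
```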
Trending
A trend is a pattern of gradual change in a condition, output, or process, or an average or general tendency of a series of data points to move in a certain direction over time, represented by a line or curve on a graph. This definition is applied in many fields, especially in forecasting. For example, a business can analyze shopping data to forecast which products consumers will demand in the future.
This same concept can also be applied in storage management. An administrator, through the use of a storage management tool, can compare growth trends for a storage resource between two points in time. Be it the number of files, overall file size, or available free space, a storage management tool gives an administrator all the information needed to better manage storage devices.
The trending information that an administrator gets from a storage management tool will help the administrator determine the growth rate on a volume, calculate capacity, and budget for more storage. By doing this, an organization can better plan for the use of its resources and purchase additional ones before they are needed. As mentioned throughout, data is important to a company, and having sufficient storage to put that data in is crucial. And with the help of storage trending, a business can easily determine when it needs to add more based on data compared over different time periods.
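As a rough illustration of comparing two points in time, the following sketch computes an average daily growth rate and estimates how long the remaining capacity will last. It assumes simple linear growth, which real workloads may not follow, and the volume size, usage figures, and dates are made up for the example.

```python
from datetime import date


def daily_growth(used_start_gb, used_end_gb, start, end):
    """Average growth in GB per day between two measurements."""
    days = (end - start).days
    if days <= 0:
        raise ValueError("end must be later than start")
    return (used_end_gb - used_start_gb) / days


def days_until_full(capacity_gb, used_gb, growth_gb_per_day):
    """Days of headroom left at the current growth rate, or None if not growing."""
    if growth_gb_per_day <= 0:
        return None
    return (capacity_gb - used_gb) / growth_gb_per_day


# Hypothetical 1,000GB volume measured on two dates 90 days apart.
rate = daily_growth(600, 690, date(2024, 1, 1), date(2024, 3, 31))
print(rate)                              # 1.0 GB per day
print(days_until_full(1000, 690, rate))  # 310.0 days of headroom
```

An estimate like this gives a date by which additional storage should be budgeted and purchased, which is the planning benefit described above.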
The use of real-time network/system maps or applications that supply access to aggregate data via GUIs, graphs, charts, and other interactive multimedia methods has greatly aided the analysis of data from monitoring and alerts and supplied invaluable tools for forecasting and capacity planning based on observed trends.
Forecasting and Capacity Planning
Storage capacity planning refers to the practice of assessing or making a forecast with regard to future storage requirements. The aim of capacity planning is to strike a balance between providing high levels of system, storage, and data reliability and availability at reasonable operating costs. The goal is to ensure that an enterprise is proactive rather than reactive in addressing capacity planning through forecasting.
In the past, many network as well as storage resources were poorly allocated or in some cases grossly underutilized. Forecasting and capacity planning have taken much of the guesswork out of meeting an enterprise’s growing storage needs. With effective capacity planning, data storage administrators can make the most of capital expenditures by procuring only the storage and supporting infrastructure that is currently needed while predicting future growth with a high degree of accuracy. Overall cash flow and fiscal management are optimized.
The following ten questions can serve as a starting point for forecasting and capacity planning. This list is in no way comprehensive, but it can demonstrate the direction and mind-set found in the process.
• How much storage is in use out of what has been allocated?
• How much bandwidth is required between sites, switches, and VLANs?
• How much bandwidth is available for WANs, trunks/ISLs, front-end and back-end ports, and routing ports?
• What is the current read-to-write cache allocation for storage arrays?
• What is the relationship between storage I/O and network traffic?
• What is the recovery time objective (RTO) and recovery point objective (RPO)? (See Chapter 7.)
• How much data must be backed up, replicated, or archived? What is the schedule?
• Where is storage located?
• Where are the users of storage located?
• How will systems, applications, and users interface with the storage system?
Storage administrators can determine how storage is being allocated with the use of storage capacity planning tools that analyze storage systems and then generate a report based on the data available and storage performance. These analysis and reporting tools are disparate and often proprietary. Proprietary forecasting and reporting tools can make it difficult for storage administrators to get an end-to-end look at the overall health of storage resources and report on future capacity and performance needs, but they still save time in collecting and interpreting data. The establishment of baselines helps to alleviate the disparity between various data collection and analysis tools.
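Where no dedicated planning tool is available, the first question in the preceding list (how much storage is in use out of what has been allocated) can at least be answered for a single host volume with built-in facilities. The sketch below uses Python's standard shutil.disk_usage call; the choice of the current directory as the path to inspect is an assumption for the example.

```python
import shutil


def percent_used(used_bytes, total_bytes):
    """Return the share of allocated space that is in use, as a percentage."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


# Inspect the volume that holds the current directory (an arbitrary choice).
usage = shutil.disk_usage(".")
print(f"{percent_used(usage.used, usage.total):.1f}% of "
      f"{usage.total / 1024**3:.1f} GiB in use")
```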
Recording a Baseline
A baseline is defined as statistical information on performance over a period of time. Establishing a baseline is the starting point before comparisons can be made. For example, in business, sales from the second quarter can be compared to those from the first quarter to determine whether sales are growing or declining. This same concept is applicable in storage management to determine whether system performance is as expected or within tolerance.
When dealing with storage devices, it helps to know their capabilities and limitations, which provide a reliable and accurate frame of reference for comparison. Establishing and monitoring baselines for various elements of performance and utilization allows the administrator to identify the values that define normal performance and to differentiate between normal and abnormal system conditions.
A baseline is created by capturing data on normal operations over a period of time, so it should be created when a system has reached a point of stability and then should be updated when significant planned changes are made and/or on a periodic basis. The period of time during which you collect data is discretionary, but if your organization does not have a standard, one week is a good starting point. This period typically provides enough time to average usage changes that may occur during the week and to take into account expected periods of low and high utilization. If you don’t have a baseline now, consider creating one right away.
NOTE  The goal of the baseline is to understand normal traffic. Do not collect a baseline during times when you know that traffic will be outside of the norm, such as the end of the fiscal year on an accounting system.
The baseline should include data on both peak and average performance. The peak statistics show the utilization when it is at its highest level, and the average statistics show the sum of a set of performance statistics taken at regular intervals divided by the number of intervals in the set. Some elements of a baseline include peak and average performance statistics for the following:
• Error rates
• Input and output per second
• Requests
• Queue sizes
• Concurrent users
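The peak and average figures for any of these elements can be derived directly from samples taken at regular intervals. The following sketch shows the calculation for one metric; the sample values are invented for illustration.

```python
def summarize(samples):
    """Return the peak and average of samples taken at regular intervals."""
    if not samples:
        raise ValueError("at least one sample is required")
    return {
        "peak": max(samples),                    # highest observed value
        "average": sum(samples) / len(samples),  # sum divided by interval count
    }


# Hypothetical input/output per second readings from four intervals.
iops_samples = [1200, 1500, 900, 2400]
print(summarize(iops_samples))  # {'peak': 2400, 'average': 1500.0}
```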
The average statistics are your normal values, but the peak statistics will help in filtering out false-positive events that differ from the average yet are expected. For example, consider a situation where average NIC utilization is 10 percent, with an expected deviation of +/– 40 percent of that value, for the link-aggregated connection to a NAS. Peak utilization, occurring at 8 a.m. and 4:30 p.m., is 35 percent, with an expected deviation of +/– 10 percent of that value. Given this information, anomalous behavior would be NIC utilization that is less than 6 percent or greater than 14 percent, except for 8 a.m. and 4:30 p.m., when anomalous behavior would be NIC utilization below 31.5 percent or higher than 38.5 percent.
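The arithmetic in this example can be checked with a short sketch. It treats each deviation as a percentage of the stated utilization value, which is an interpretation of the example rather than a formal statistical definition, and the function names are made up for illustration.

```python
def band(center, deviation_percent):
    """Return the (low, high) range around a baseline value."""
    delta = center * deviation_percent / 100
    return (center - delta, center + delta)


def is_anomalous(reading, center, deviation_percent):
    """True when a reading falls outside the expected range."""
    low, high = band(center, deviation_percent)
    return reading < low or reading > high


print(band(10, 40))              # (6.0, 14.0) for the average period
print(band(35, 10))              # (31.5, 38.5) for the 8 a.m. and 4:30 p.m. peaks
print(is_anomalous(20, 10, 40))  # True: 20 percent is unusual off-peak
print(is_anomalous(36, 35, 10))  # False: 36 percent is expected at peak times
```

Keeping a separate range for the known peak periods is what prevents the routine morning and afternoon spikes from raising false alarms.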
It is also helpful to map out the average load on data sets within the storage system or systems. A single storage system may host data for a variety of shares and purposes. Some of the data may be accessed continually, and other data may be accessed rarely. Understanding which data is accessed the most will help in planning for expansion or in prioritizing troubleshooting or recovery efforts so that the most critical data is made available first.
You can acquire storage performance metrics using SAN vendor-based tools, third-party storage management programs, or built-in operating system utilities.
Displaying Performance Data in Windows with Performance Monitor
Microsoft Windows systems have a built-in tool called Performance Monitor that can be helpful in obtaining performance statistics from the local machine. While the host is not central to the storage device, its NICs and HBAs provide useful metrics for the baseline. On a Windows machine, select the Start button and go to Run. Type perfmon and press ENTER. This will open Performance Monitor, as shown here:
image
The % Processor Time counter is monitored by default, but you can click the red X to remove it. Click the green plus sign to add more counters. Each counter will be given a line color as it is displayed on the graph.
Setting Alerts
In the event that storage thresholds or baseline conditions are not met or are exceeded, some form of alerting is needed. In this regard, the type and frequency of monitoring and reporting become critical. The complex task of alerting is shared by humans, applications, and devices. Many applications and devices are able to monitor themselves and send alerts if operating policies are not met. Intelligence in the network has reached a staggering degree of sophistication as broken or ailing paths, devices, storage, and other resources are automatically taken offline or even bypassed in order to maintain system functioning and integrity. Nonetheless, human vigilance and interaction are always required. Figure 6-9 shows the alerts section for an EMC storage array within a WBEM management tool.
Figure 6-9  Alerts for a storage array in a WBEM management tool
Once an alert has been sent, the storage administrator can carry out a number of predetermined contingency actions. As previously stated, much of the work of resolving or correcting system conditions has been automated, with solutions “pushed” to the administrator. By analyzing the type, source, and other critical information related to the alert, the administrator may identify trends or patterns of behavior in the system. Used properly, this information allows the administrator to proactively devise strategies to minimize the impact of these issues across time, thus ensuring system health and financial stability.
Auditing Log Files
Log files are another important aspect in the monitoring of storage devices. A log file gives the administrator a complete picture of what went wrong. Some legacy or inexpensive storage devices are not capable of indicating the source and nature of a fault that has occurred. Log files provide details that assist in system monitoring and reporting. Logs provide data about desired system conditions as well as abnormal conditions, which allows the administrator to validate or change those areas of the storage administration plan as needed. While more tedious than other monitoring techniques because of the size and the sheer number, log files are invaluable for longitudinal trend analysis.
NOTE  Be sure to record the machine time for devices when collecting or analyzing log files. If the machines being compared have different times, you will need to adjust the timestamps in the log files to compensate so that you can accurately track activity.
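The timestamp adjustment can be sketched as follows. The example assumes each log has already been parsed into (timestamp, message) pairs and that the offset of each device clock from a reference clock was recorded at collection time. The device names, messages, and two-minute offset are hypothetical.

```python
from datetime import datetime, timedelta


def normalize(entries, clock_offset):
    """Shift timestamps by the device's offset (device clock minus reference)."""
    return [(stamp - clock_offset, message) for stamp, message in entries]


def merge_logs(*logs):
    """Combine already-normalized logs into one timeline ordered by time."""
    return sorted((entry for log in logs for entry in log),
                  key=lambda entry: entry[0])


# Hypothetical entries: the switch clock runs two minutes fast.
switch_log = [(datetime(2024, 5, 1, 10, 2, 5), "switch: port 4 down")]
array_log = [(datetime(2024, 5, 1, 10, 0, 10), "array: path lost")]

timeline = merge_logs(normalize(switch_log, timedelta(minutes=2)),
                      normalize(array_log, timedelta(0)))
for stamp, message in timeline:
    print(stamp.isoformat(), message)
```

Without the adjustment, the array event would appear to come first. After it, the port failure on the switch correctly precedes the lost path on the array.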
Alerting Methods
An administrator can set different types of alerts in order to be notified if something goes wrong with a storage device, such as a hardware failure, or if a high-priority or critical-event log entry is detected. Alerts can be sent when a critical error occurs or when a server or system metric has exceeded its threshold value. For example, alerts could be sent out when a hard drive fails, and then the administrator could check the particular disk that issued the alert, replacing it if necessary.
Some alerting methods include the following:
• E-mail  E-mail is one of the most commonly used methods for alerting. E-mail uses Simple Mail Transfer Protocol (SMTP) to deliver e-mail from the alerting server to the mailboxes of those you want to alert, such as storage administrators or other IT staff.
• SNMP  Another alert method of choice is through the Simple Network Management Protocol. Through SNMP traps, a management application can receive alert information so that it can be displayed in a network operations center or reviewed on a periodic basis by IT personnel. Management applications may also send alerts via other methods when an SNMP trap is received for specific items.
• Short Message Service (SMS)  SMS is used for sending text messages to phones, software that emulates a phone, or SMS software. Many employees carry their cell phone with them, and this is an effective way to reach them wherever they may be.
• Phone or modem  Some systems can dial a number and leave a prerecorded message on a phone or cell phone. Others may use a modem to send notifications. Storage devices may be configured with a modem or a network connection to a remote site managed by a vendor so that they can “call home.” This way, the vendor knows of the problem and can dispatch support personnel or troubleshoot the issue remotely.
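To make the e-mail method concrete, the following sketch builds and sends an alert message with Python's standard email and smtplib modules. The addresses, device name, and relay host are placeholders, and a real alerting server would normally also need authentication and transport encryption configured, which are omitted here.

```python
import smtplib
from email.message import EmailMessage


def build_alert(sender, recipient, device, severity, detail):
    """Compose an alert e-mail with the severity and device in the subject."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = f"[{severity.upper()}] {device}: {detail}"
    message.set_content(
        f"Device: {device}\nSeverity: {severity}\nDetail: {detail}\n"
    )
    return message


def send_alert(message, smtp_host, smtp_port=25):
    """Hand the message to an SMTP relay for delivery."""
    with smtplib.SMTP(smtp_host, smtp_port) as server:
        server.send_message(message)


# Placeholder values for illustration; nothing is sent in this example.
alert = build_alert("alerts@example.com", "storage-team@example.com",
                    "array01", "critical", "disk 7 failed")
print(alert["Subject"])  # [CRITICAL] array01: disk 7 failed
```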
Figure 6-10 shows an alert configuration screen for an EMC storage array. ACLUPD alerts with a critical severity will be sent to the e-mail address [email protected], and ADMIN events with a critical severity will be sent using SNMP to an event collector residing on a server with the IP address 192.168.5.30. This event collector is configured to receive alerts tagged with the community string “EMC-alerts.”
Figure 6-10  Configuring alerts
Chapter Summary
This chapter marked a transition from the technical, physical, and theoretical aspects of storage and began the discussion of managing storage and the data it holds.
• Logical units are assigned to a host by associating their logical unit number with that host.
• There are two methods for provisioning LUs—thick or thin.
• Thick LUs are static in size, meaning they consume the same amount of storage on the storage array as is presented to the host.
• Thin LUs are provisioned with only the actual amount of space that is consumed by the host, but a maximum size is presented to the host.
The second part of the chapter focused on management systems and protocols. To monitor critical events such as free space, management protocols were created. Management protocols allow for information on systems, including critical events and performance, to be gathered and configuration changes to be made.
• The notification of such events is known as alerting.
• Alerts can be set up based on criteria such as the presence of an event and the crossing of a threshold.
• Alerts can be logged or sent directly to administrators.
• Storage devices and network devices often retain logs containing such information that can be reviewed.
• The default log settings may not be appropriate for all circumstances, and some administrators may want to enable logging on other items.
• Logs may be overwritten over time, so they must be archived if administrators want to view them in the future.
• Management protocols are as follows:
• SNMP gathers system information from devices and other network components and stores it in a repository.
• SNMP gathers information from network devices over UDP ports 161 and 162.
• SNMP interprets the data based on a management information base.
• The Storage Management Initiative-Specification operates in a client-server model with clients requesting information from storage resources.
• Web-Based Enterprise Management is a collection of technologies used to configure, audit, update, and monitor devices.
• In-band management in a storage network means that both data and management traffic move over the same path, whereas out-of-band management uses a path for management that is separate from the data path.
• Some management utilities, such as WBEM systems, use graphical interfaces delivered over Hypertext Transfer Protocol (HTTP) or other HTTP-based management consoles. Other control functions are performed using a command-line interface.
• A serial cable is used to make direct connections from a computer to a device such as a network switch or storage device.
• Telnet and Secure Shell are methods of accessing a computer or device remotely.
• Telnet does not include many security features and thus is not suitable for use over an unsecured channel such as the Internet.
• SSH uses an encrypted secure channel to connect a server or device and a client.
• Storage monitoring is summarized as follows:
• A threshold is an established target value that administrators desire systems to remain above or below. For example, one threshold could be temperature.
• Baselines are metrics that define normal performance. A baseline is gathered so that future metrics can be compared against it to identify anomalous behavior.
• Trending is a pattern of change over time that can be used to predict utilization or a need of systems in the future.
• Forecasting allows for the analysis of trends identified as part of the ongoing process of monitoring and alerting.
Chapter Review Questions
1. Which of the following are possible provisioning methods?
A. Thin and thick
B. Lazy and thin
C. Think and thin
D. Thick and lazy
2. You are making a business case for implementing thin provisioning on your storage arrays. Which reason would you give your manager to explain why thin provisioning should be used?
A. It replicates data to another storage array.
B. Resources are minimized until needed.
C. Space on the array can easily be consumed.
D. Monitoring of the space available is simplified.
3. Which of the following is a best practice when using thin provisioning?
A. Nothing. Thin-provisioned software will automatically manage everything.
B. Run Disk Cleanup weekly.
C. Set Disk Defragment and Optimize to run daily.
D. Monitor the disk space used and trends in data capacity.
4. What is the advantage of SNMPv3 over SNMPv2?
A. Supports encrypted channels and mutual authentication
B. Performance and security enhancements over v2
C. Can share data with management servers that are in the same family
D. Can connect multiple devices to the same port
5. How does Web-Based Enterprise Management (WBEM) help the administrator manage devices?
A. WBEM has a limited set of management standards that function flawlessly.
B. WBEM has no expansion to allow future enhancements.
C. WBEM is a standard that allows both the Common Information Model and Extensible Markup Language (XML) to allow future enhancements.
D. WBEM is a protocol used for sharing data on storage devices.
6. How does out-of-band storage management differ from in-band storage management?
A. Only certified operations are allowed in in-band storage management.
B. Out-of-band installations are less complex.
C. Out-of-band requires a dedicated management channel.
D. Out-of-band requires a complex network IP addressing scheme.
7. Where did the OpenWBEM, OpenPegasus, and WBEMsource feature sets originate?
A. Originally proposed by Apple in the 1970s
B. DOS and Linux
C. CIM
D. PowerShell
8. What is considered the biggest benefit to the CLI?
A. It is more intuitive.
B. Commands can be scheduled or sequenced rather than as individual processes.
C. The graphic user interface is easier to use.
D. Commands are easier to remember than a long series of steps in the operating system.
9. You have been tasked with documenting the controls in use on your storage network. How should you describe the use of log files?
A. Log files provide details that assist in system monitoring and reporting.
B. Log files are used to preserve data integrity.
C. Log files ensure that third parties meet availability SLAs.
D. Log files are used only by hardware vendors to troubleshoot hardware issues.
10. Which of the following storage administration methods is not encrypted?
A. SSH
B. Telnet
C. VPN
D. IPSec
11. Which of the following describes oversubscription?
A. Oversubscription is when multiple devices receive the same data on the network.
B. Oversubscription should be avoided whenever possible.
C. Oversubscription allows multiple devices to be connected to the same switch port.
D. Oversubscription is an error state when devices consume too much bandwidth.
Chapter Review Answers
1. A is correct. Thin provisioning allows the host to believe that it has the maximum amount of space available while the storage system allocates only what is actually used, whereas thick provisioning allocates the entire amount to the storage device. Thin provisioning represents a more optimal means of allocating storage resources than thick provisioning.
B, C, and D are incorrect. Thick and thin are both methods of provisioning, while lazy and think are not. Choices B, C, and D use the terms lazy or think.
2. B is correct. A thin-provisioned drive will increase in size as needed.
A, C, and D are incorrect. A is incorrect because thin provisioning is not used for replication. C and D are incorrect because they are disadvantages of thin provisioning.
3. D is correct. Monitoring of data storage usage and data usage trends is absolutely necessary to prevent drive space overusage.
A, B, and C are incorrect. A is incorrect because applications running on thin-provisioned LUNs do not know that they are thin provisioned. B and C are incorrect. While Disk Cleanup and Defragmentation can be helpful for regular drive maintenance, they do not benefit thin-provisioned disks better than thick-provisioned disks.
4. B is correct. SNMP version 3 offers many of the same features as version 2, but it provides better performance and security.
A, C, and D are incorrect. A and C are incorrect because these features are available in both SNMP versions. D is incorrect because SNMP does not handle physical connections.
5. C is correct. WBEM is a standard that allows both the Common Information Model and Extensible Markup Language (XML) to have future enhancements. WBEM is web based, and XML is a data formatting standard used in both web-based systems and data storage.
A, B, and D are incorrect. A is incorrect because WBEM does not function flawlessly. B is incorrect because WBEM offers many options for future expansion by providing a framework that systems can continue to use even as other technologies and processes change. Lastly, D is incorrect because WBEM shares data with management stations and users of management stations, not between storage devices.
6. C is correct. Out-of-band requires a dedicated management channel, while in-band uses the same channel for data and management.
A, B, and D are incorrect. A is incorrect because certification is not required for in-band storage management. B is also incorrect because out-of-band installations are more complex, rather than less complex, than in-band solutions. Lastly, D is incorrect because out-of-band storage management utilizes the same IP addressing scheme as any other service on an IP network.
7. C is correct. These feature sets were born out of CIM.
A, B, and D are incorrect. OpenWBEM, OpenPegasus, and WBEMsource did not originate with Apple, DOS, Linux, or PowerShell.
8. B is correct. Commands can be scheduled or sequenced rather than as individual processes.
A, C, and D are incorrect. A is incorrect because CLI is often less intuitive since users must know and remember commands and their structure. C is incorrect because a CLI does not have a graphical user interface. D is also incorrect since a graphical user interface provides visual cues for users to remember a process, and users can fumble around until they find the right option, whereas CLI commands must be entered perfectly in order to execute.
9. A is correct. Log files provide details that assist in system monitoring and reporting. An administrator should look to log files when errors are reported because log files contain information on what actions the system performed, what errors were encountered, and the time each action took place.
B, C, and D are incorrect. B is incorrect because log files do not provide data integrity. C is incorrect because log files have no enforcement ability over SLAs. Lastly, D is incorrect because log files can be used by anyone who understands them, not only hardware vendors.
10. B is correct. Telnet is not encrypted.
A, C, and D are incorrect. SSH, VPNs, and IPSec can all be encrypted.
11. C is correct. Oversubscription allows multiple devices to be connected to the same switch port. This can save on the number of ports and cables required to network components together.
A, B, and D are incorrect. A is incorrect because although these devices may be connected to the same port, they do not receive the same data. B is incorrect because oversubscription can be valuable in lowering the cost of system implementation. D is incorrect because oversubscription is not an error state.