Chapter 18. Facilities Management

Introduction

This chapter discusses the major elements associated with managing the physical environment of an infrastructure. We begin by offering a formal definition of facilities management and discussing some of the implications this definition represents. Next we list the many entities involved with facilities management, including a few that are normally overlooked. This leads to a key topic of designating a process owner and the traits most desirable in such an individual.

One of the most critical responsibilities of the process owner is to proactively ensure the stability of the physical infrastructure environment. In support of this, we list the major risks that many infrastructures are exposed to along with methods to proactively address them. We conclude with a quick and simple method to assess the overall quality of an infrastructure’s facilities management process.

Definition of Facilities Management

The words for this definition have been chosen carefully. The phrase "an appropriate physical environment" implies that all environmental factors (such as air conditioning, humidity, electrical power, static electricity, and controlled physical access) are accounted for at the proper levels on a continuous basis. The term "all critical infrastructure equipment" refers not only to hardware in the data center, but also to key infrastructure devices located outside of the centralized facility, including switch rooms, vaults, wiring closets, and encryption enclosures.

Major Elements of Facilities Management

If we were to ask typical infrastructure managers to name the major elements of facilities management, they would likely mention common items such as air conditioning, electrical power, and perhaps fire suppression. Some may also mention smoke detection, uninterruptible power supplies (UPS), and controlled physical access. Few of them would likely include less common entities such as electrical grounding, vault protection, and static electricity.

A comprehensive list of the major elements of facilities management is as follows:

  1. Air conditioning
  2. Humidity
  3. Electrical power
  4. Static electricity
  5. Electrical grounding
  6. Uninterruptible power supply (UPS)
  7. Backup UPS batteries
  8. Backup generator
  9. Water detection
  10. Smoke detection
  11. Fire suppression
  12. Facility monitoring with alarms
  13. Earthquake safeguards
  14. Safety training
  15. Supplier management
  16. Controlled physical access
  17. Protected vaults
  18. Physical location
  19. Classified environment

Temperature and humidity levels should be monitored constantly, either electronically or with recording charts, and reviewed once each shift to detect any unusual trends. High-density devices such as blade servers can create hot spots within a data center. These concentrations of heat need to be addressed with proper cooling design. Virtualization, in which a server is partitioned to run multiple applications, is another way to address this, since consolidating workloads reduces the number of physical machines generating heat.

Electrical power includes continuous supply at the proper voltage, current, and phasing as well as the conditioning of the power. Conditioning improves the quality of the electricity for greater reliability. It involves filtering out stray magnetic fields that can induce unwanted inductance, doing the same to stray electric fields that can generate unwanted capacitance, and providing surge suppression to prevent voltage spikes.

Static electricity, which affects the operation of sensitive equipment, can build up on insulating materials such as carpeting, clothing, draperies, and other nonconductive fibers. Antistatic devices can be installed to minimize this condition. Proper grounding is required to prevent outages and potential human injury due to short circuits. Another element sometimes overlooked is whether UPS batteries are kept fully charged.

Water and smoke detection are common environmental guards in today’s data centers, as are fire-suppression mechanisms. Facility-monitoring systems and their alarms should be visible and audible enough to be seen and heard from almost any area in the computer room, even when noisy equipment such as printers is running at its loudest. Equipment should be anchored and secured to withstand moderate earthquakes. The large mainframes of yesteryear used to be safely anchored, in part, by the massive plumbing for water-cooled processors and by the huge bus and tag cables that interconnected the various units. In today’s era of fiber-optic cables, air-cooled processors, and smaller boxes designed for non-raised flooring, this built-in anchoring of equipment is no longer as prevalent.

Emergency preparedness for earthquakes and other natural or man-made disasters should be a basic part of general safety training for all personnel working inside a data center. They should be knowledgeable about emergency powering off, evacuation procedures, first-aid assistance, and emergency telephone numbers. Training data-center suppliers in these matters is also recommended.

Most data centers have acceptable methods of controlling physical access to their machine rooms, but not always to vaults or rooms that store sensitive documents, check stock, or tapes. The physical location of a data center can also be problematic. A basement level may be safe and secure from the outside, but it might also be exposed to water leaks and evacuation obstacles, particularly in older buildings. Locating a data center along the outside walls of a building can expose it to sabotage from the outside. Classified environments almost always require data centers to be located as far away from outside walls as possible to safeguard them from outside physical forces such as bombs or projectiles as well as from electronic-sensing devices.

In fairness to infrastructure managers and operations personnel, several of these elements may be under the management of the facilities department for which no one in IT would have direct responsibility. But even in this case, infrastructure personnel and operations managers would normally want and need to know who to go to in the facilities department for specific types of environmental issues.

The Facilities Management Process Owner

This brings us to the important issue of designating a facilities management process owner. There are two key activities associated with this designation:

• Determining the scope of this person’s responsibilities

• Identifying desirable traits, or skill sets, of such an individual

Determining the Scope of Responsibilities of a Facilities Management Process Owner

The previous discussion about the major elements of facilities management demonstrates that a company’s facilities department plays a significant role in managing the physical environment of a company’s data center. Determining the exact boundary of responsibilities between the facilities department and IT’s facilities management is critical. Clearly scoping out the areas of responsibility and, more important, the degree of authority between these two groups usually spells the difference between resolving a facilities problem in a data center quickly and efficiently versus dragging out the resolution amid chaos, miscommunication, and strained relationships.

For example, suppose a power distribution unit feeding a critical server fails. A computer operations supervisor would likely call in electricians from the facilities department to investigate the problem. Their analysis may find that the unit needs to be replaced and that a new unit will take days to procure, install, and make operational. Alternative solutions need to be brainstormed and evaluated between facilities and IT to determine each option’s costs, time, resources, practicality, and long-term impact, and all this activity needs to occur in a short amount of time—usually less than an hour. This is no time to debate who has responsibility and authority for the final decisions. That needs to have been determined well in advance. Working with clearly defined roles and responsibilities shortens the time of the outage to the clients, lessens the chaos, and reduces the effort toward a satisfactory resolution.

The lines of authority between an IT infrastructure and its facilities department will vary from shop to shop depending on size, platforms, degree of outsourcing, and other factors. The key point is that the two departments clearly agree on these boundaries, communicate them to their staffs, and ensure compliance with them.

Desired Traits of a Facilities Management Process Owner

The owner of the facilities management process almost always resides in the computer operations department. There are rare exceptions—small shops or those with unique outsourcing arrangements—in which the facilities management process owner is part of the facilities department and matrixed back to IT or is part of the IT executive staff. In any event, the selection of the person assigned the responsibility for a stable physical operating environment is an important decision. An understanding of at least some of the basic components of facilities management, such as power and air conditioning, is a high-priority characteristic of an ideal candidate.

Table 18-1 lists, in priority order, a variety of desirable traits that an infrastructure manager might look for in such an individual. Knowledge of hardware configurations is a high priority because understanding how devices are logically connected and physically wired and how they can be impacted environmentally helps in their operation, maintenance, and recoverability. The high priority for backup systems refers both to physical backups such as UPS, electrical generators, and air conditioning as well as to data backup that may need to be restored after physical interruptions. The restore activity drives the need to be familiar with database systems. The ability to think and plan strategically comes into play when laying out computer rooms, planning for expansion, and anticipating advances in logical and physical technologies.

Table 18-1. Prioritized Characteristics of a Facilities Management Process Owner

[Table 18-1 is an image in the source and is not reproduced here.]

Evaluating the Physical Environment

As we read from our definition, the facilities management process ensures the continuous operation of critical equipment. The overriding implication is that the physical environment in which these devices operate is sound, stable, and likely to stay that way. But how does one determine the current state of the physical environment and the direction in which it is likely to trend?

There are a number of sources of information that can assist data center managers in evaluating the current state of their physical environment. The following list details some of the more common of these sources. Outage logs normally associated with availability reports should point to the frequency and duration of service interruptions caused by facilities. If the problem-management system includes a robust database, it should be easy to analyze trouble tickets caused by facilities issues and highlight trends, repeat incidents, and root causes.

  1. Outage logs
  2. Problem tickets
  3. Facilities department staff
  4. Hardware repair technicians
  5. Computer operators
  6. Support staff
  7. Auditors

The remaining sources are of a more human nature. Facilities department staff can sometimes speak to unusual conditions they observed as part of normal walk-throughs, inspections, or routine maintenance. Similarly, hardware supplier repair technicians can typically spot when elements of the physical environment appear out of the ordinary. Some of the best observers of their physical surroundings are computer operators, especially off-shift staff who are not as distracted by visitors, telephone calls, and the more hectic pace of prime shift. Support staff who frequent the data center (such as network, systems or database administrators) are also good sources of input as to possible glitches in the physical environment. Finally, there are facilities-type auditors whose job is to identify irregularities in the physical operation of the data center and recommend actions to correct them.
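The ticket analysis described above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular problem-management product: the record layout, the field names (`category`, `cause`, `outage_minutes`), and the sample tickets are all hypothetical.

```python
from collections import Counter

# Hypothetical ticket records; the field names are illustrative assumptions.
tickets = [
    {"id": 101, "category": "facilities", "cause": "air conditioning", "outage_minutes": 45},
    {"id": 102, "category": "network", "cause": "switch failure", "outage_minutes": 20},
    {"id": 103, "category": "facilities", "cause": "UPS battery", "outage_minutes": 90},
    {"id": 104, "category": "facilities", "cause": "air conditioning", "outage_minutes": 30},
]


def summarize_facilities_tickets(records):
    """Return incident counts and total outage minutes per facilities cause."""
    counts = Counter()
    minutes = Counter()
    for record in records:
        if record["category"] != "facilities":
            continue  # only facilities-related tickets are of interest here
        counts[record["cause"]] += 1
        minutes[record["cause"]] += record["outage_minutes"]
    return counts, minutes


def repeat_incidents(counts, threshold=2):
    """List causes seen at least `threshold` times, most frequent first."""
    return [cause for cause, n in counts.most_common() if n >= threshold]


counts, minutes = summarize_facilities_tickets(tickets)
print("Repeat incidents:", repeat_incidents(counts))
print("Outage minutes by cause:", dict(minutes))
```

Grouping by cause is only one possible cut of the data; grouping by month or by shift would highlight trends over time instead.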

Major Physical Exposures Common to a Data Center

Most operations managers do a reasonable job of keeping their data centers up and running. Many shops go for years without experiencing a major outage specifically caused by the physical environment. But the infrequent nature of these types of outages can often lull managers into a false sense of security and lead them to overlook the risks to which they may be exposed. The following list details the most common of these risks. The older the data center, the greater these exposures. I have clients who collectively have experienced at least half of these exposures during the past three years. Many of their data centers were less than 10 years old.

  1. Physical wiring diagrams out of date
  2. Logical equipment configuration diagrams and schematics out of date
  3. Infrequent testing of UPS
  4. Failure to recharge UPS batteries
  5. Failure to test generator and fuel levels
  6. Lack of preventive maintenance on air conditioning equipment
  7. Annunciator system not tested
  8. Fire-suppression system not recharged
  9. Emergency power-off system not tested
  10. Emergency power-off system not documented
  11. Hot spots due to blade servers
  12. Infrequent testing of backup generator system
  13. Equipment not properly anchored
  14. Evacuation procedures not clearly documented
  15. Circumvention of physical security procedures
  16. Lack of effective training to appropriate personnel

Keeping Physical Layouts Efficient and Effective

In addition to ensuring a stable physical environment, the facilities management process owner has another responsibility that is sometimes overlooked. The process owner must ensure efficiencies are designed into the physical layout of the computer facility. A stable and reliable operating environment will result in an effective data center. Well-planned physical layouts will result in an efficient one. Analyzing the physical steps that operators take to load and unload printers, to relocate tapes, to monitor consoles, and to perform other routine physical tasks can result in a well-designed floor plan that minimizes time, minimizes motion, and maximizes efficiency.

One other point to consider in this regard is the likelihood of expansion. Physical computer centers, not unlike IT itself, are an ever-changing entity. Factoring in future expansion due to capacity upgrades, possible mergers, or departmental reorganizations can assist in keeping current floor plans efficient in the future.

Tips to Improve the Facilities Management Process

There are a number of simple actions that can be taken to improve the facilities management process (as shown in the following list). Establishing good relationships with key support groups such as the facilities department and local government inspecting agencies can help keep maintenance and expansion plans on schedule. This can also lead to a greater understanding of what the infrastructure group can do to enable both of these groups to better serve the IT department.

  1. Nurture relationships with facilities department.
  2. Establish relationships with local government inspecting agencies, especially if you are considering major physical upgrades to the data center.
  3. Consider using video cameras to enhance physical security.
  4. Analyze environmental monitoring reports to identify trends, patterns, and relationships.
  5. Design adequate cooling for hot spots due to concentrated equipment.
  6. Check on effectiveness of water and fire detection and suppression systems.
  7. Remove all tripping hazards in the computer center.
  8. Check on earthquake preparedness of data center (devices anchored down, training of personnel, and tie-in to disaster recovery).

Video cameras have been around for a long time to enhance and streamline physical security, but their condition is occasionally overlooked. Cameras must be checked periodically to make sure that the recording and playback mechanism is in good shape and that the tape is of sufficient quality to ensure reasonably good playback.

Environmental recording devices also must be checked periodically. Many of these devices are quite sophisticated; they collect a wealth of data about temperature, humidity, purity of air, hazardous vapors, and other environmental measurements. The data is only as valuable as the effort expended to analyze it for trends, patterns, and relationships. A reasonably thorough analysis should be done on this type of data quarterly. Anticipating hot spots due to concentrated servers and providing adequate cooling in such instances can prevent serious outages to critical systems.
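One simple form of the trend analysis mentioned above is a least-squares slope over a series of readings. The sketch below assumes daily average temperatures exported from a monitoring device; the sample values and the 0.1 degrees-per-day limit are hypothetical and chosen only for illustration.

```python
def trend_slope(values):
    """Least-squares slope of the values against their index (units per reading)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two readings")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


# Hypothetical daily average temperatures (degrees C) for one row of racks.
daily_avg_temp_c = [21.0, 21.2, 21.1, 21.5, 21.8, 22.0, 22.4]

slope = trend_slope(daily_avg_temp_c)
# Flag a sustained rise; the 0.1 degrees-per-day limit is an arbitrary example value.
rising = slope > 0.1
print(f"slope={slope:.3f} C/day, rising={rising}")
```

A steadily positive slope on one row while other rows stay flat is the kind of pattern that may point to a developing hot spot.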

In my experience, most shops do a good job of periodically testing their backup electrical systems such as UPS, batteries, generators, and power distribution units (PDUs), but not such a good job of testing their fire detection and suppression systems. This is partly due to the huge capital investment companies make in their electrical backup systems; managers want to ensure a good return on such a sizable outlay of cash. Maintenance contracts for these systems frequently include inspection and testing, at least at the outset. However, this is seldom the case with fire detection and suppression systems. Infrastructure personnel need to be proactive in this regard by insisting on regularly scheduled inspection and maintenance of these systems as well as up-to-date evacuation plans.

Real Life Experience—Operators Devise Shocking Solution

A municipality data center once decided to improve the comfort levels of its computer operators by replacing the raised floor vinyl covers in its data center with carpeted floor panels. The operators were very appreciative of the softer flooring, but noticed static electricity would often build up in them and discharge when they touched the command consoles, often causing a disruption to service.

The facilities department and console vendors devised a solution involving the simple grounding of the consoles, but the work could not be done until the weekend, which was four days away. In the meantime, operators devised their own temporary solution by discharging the built-up charges prior to any console actions by touching, and lightly shocking, unsuspecting co-workers.

One of the simplest actions to take to improve a computer center’s physical environment is to remove all tripping hazards. While this sounds simple and straightforward, it is often neglected in favor of equipment moves, hardware upgrades, network expansions, general construction, and—one of the most common of all—temporary cabling that ends up being semi-permanent. This is not only unsightly and inefficient; it can be outright dangerous as physical injuries become a real possibility. Operators and other occupants of the computer center should be trained and authorized to keep the environment efficient, orderly, and safe.

The final tip is to make sure the staff is trained and practiced on earthquake preparedness, particularly in geographic areas most prone to this type of disaster. Common practices such as anchoring equipment, latching cabinets, and properly storing materials should be verified by qualified individuals several times per year.

Facilities Management at Outsourcing Centers

Shops that outsource portions of their infrastructure services—co-location of servers is an example—often feel that the responsibility for the facilities management process is also outsourced and no longer of their concern. While outsourcers have direct responsibilities for providing stable physical environments, the client has an indirect responsibility to ensure this will occur. During the evaluation of bids and in contract negotiations, appropriate infrastructure personnel should ask the same types of questions about the outsourcer’s physical environment that they would ask if it were their own computer center.

Assessing an Infrastructure’s Facilities Management Process

The worksheets shown in Figures 18-1 and 18-2 present quick and simple methods for assessing the overall quality, efficiency, and effectiveness of a facilities management process. The first worksheet is used without weighting factors, meaning that all 10 categories are weighted evenly for the assessment of a facilities management process. Sample ratings are inserted to illustrate the use of the worksheet. In this case, the facilities management process scored a total of 25 points for an overall nonweighted assessment score of 63 percent. The weighted assessment score is coincidentally an identical 63 percent based on the sample weights used on our worksheet.
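The arithmetic behind the two scores can be sketched as follows. The text implies a maximum of 4 points per category, since 25 points across 10 categories gives 25/40, or 62.5 percent, which rounds to 63 percent. The weighted formula used here (the sum of weight times rating, divided by the maximum weighted total) is an assumption about how the weighted worksheet works, and the individual ratings and weights are hypothetical.

```python
MAX_RATING = 4  # assumed top rating per category (10 categories x 4 = 40 points)


def nonweighted_score(ratings):
    """Total points as a fraction of the maximum possible points."""
    return sum(ratings) / (MAX_RATING * len(ratings))


def weighted_score(ratings, weights):
    """Weight each rating, then divide by the maximum weighted total."""
    if len(ratings) != len(weights):
        raise ValueError("ratings and weights must have the same length")
    earned = sum(r * w for r, w in zip(ratings, weights))
    possible = MAX_RATING * sum(weights)
    return earned / possible


# Hypothetical ratings for the 10 categories; they total 25 points as in the text.
ratings = [3, 2, 3, 2, 3, 2, 3, 2, 3, 2]
# Hypothetical weights; the worksheet's actual sample weights are not reproduced here.
weights = [5, 3, 3, 1, 5, 3, 1, 3, 5, 1]

print(f"nonweighted: {nonweighted_score(ratings):.1%}")
print(f"weighted:    {weighted_score(ratings, weights):.1%}")
```

With different weights the two scores will generally diverge, which is why the text calls the identical sample result a coincidence.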

Figure 18-1. Sample Assessment Worksheet for Facilities Management Process

[Figure 18-1 is an image in the source and is not reproduced here.]

Figure 18-2. Sample Assessment Worksheet for Facilities Management Process with Weighting Factors

[Figure 18-2 is an image in the source and is not reproduced here.]

One of the most valuable characteristics of these worksheets is that they are customized to evaluate each of the 12 processes individually. The worksheets in this chapter apply only to the facilities management process. However, the fundamental concepts applied in using these evaluation worksheets are the same for all 12 disciplines. As a result, the detailed explanation on the general use of these worksheets presented near the end of Chapter 7, “Availability,” also applies to the other worksheets in the book. Please refer to that discussion if you need more information.

Measuring and Streamlining the Facilities Management Process

We can measure and streamline a facilities management process with the help of the assessment worksheet shown in Figure 18-1. We can measure the effectiveness of a facilities management process with service metrics such as the number of outages due to facilities management issues and the number of employee safety issues measured over time. Process metrics help us gauge the efficiency of this process. Examples include the frequency of preventive maintenance and inspections of air conditioning, smoke detection, and fire suppression systems and the frequency of testing of uninterruptible power supplies and backup generators. And we can streamline the facilities management process by automating actions such as notifying facilities personnel when environmental monitoring thresholds are exceeded for air conditioning, smoke detection, and fire suppression.
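The automated notification mentioned above might look something like the following sketch. The sensor names, the threshold values, and the `send` callback are all hypothetical; real limits depend on equipment specifications, and a real deployment would hand the message to whatever paging or ticketing system the shop already uses.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    low: float
    high: float


# Hypothetical limits; real values depend on equipment specifications.
THRESHOLDS = {
    "temperature_c": Threshold(low=18.0, high=27.0),
    "humidity_pct": Threshold(low=40.0, high=60.0),
}


def check_reading(sensor, value, thresholds=THRESHOLDS):
    """Return an alert message if the value is out of range, else None."""
    limit = thresholds.get(sensor)
    if limit is None:
        return None  # no threshold configured for this sensor
    if value < limit.low:
        return f"{sensor} reading {value} is below the minimum of {limit.low}"
    if value > limit.high:
        return f"{sensor} reading {value} is above the maximum of {limit.high}"
    return None


def notify_facilities(readings, send):
    """Call send(message) for every out-of-range reading; return the alert count."""
    sent = 0
    for sensor, value in readings.items():
        message = check_reading(sensor, value)
        if message:
            send(message)
            sent += 1
    return sent


# Collect alerts in a list here; a real `send` would page or e-mail facilities staff.
alerts = []
count = notify_facilities({"temperature_c": 29.5, "humidity_pct": 45.0}, alerts.append)
print(count, alerts)
```

Passing the delivery mechanism in as `send` keeps the threshold logic independent of how facilities personnel are actually contacted.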

Summary

A world-class infrastructure requires stable and reliable facilities to ensure the continuous operation of critical equipment to process critical applications. We began this chapter with a definition of facilities management built around these concepts. Maintaining a high-quality production environment requires familiarity with numerous physical elements of facilities management. We listed and discussed the most common of these. The diversity of these elements shows the scope of knowledge desired in a facilities management process owner.

Next we discussed the topic of selecting a process owner, including alternatives that some shops use in the placement of this key individual. One of the prime responsibilities of a process owner is to identify and correct major physical exposures. We offered several sources of information to assist in evaluating an infrastructure’s physical environment and identified several of the risks common to data centers. The final part of this chapter presented customized assessment sheets used to evaluate an infrastructure’s facilities management process.

Test Your Understanding

1. The lines of authority between IT and its facilities department will vary from shop to shop depending on size, platforms, and degrees of outsourcing. (True or False)

2. The only way to eliminate static electricity is to remove insulating materials such as carpeting, clothing, draperies, and other nonconductive fibers. (True or False)

3. All of the following are major elements of facilities management except for:

a. backup generator

b. backup tape library

c. fire-suppression system

d. backup UPS batteries

4. Most shops do a good job of periodically testing their backup electrical systems, but do less than a good job testing ____________ and ____________ systems.

5. What factors would you consider in locating a data center within a large building?

Suggested Further Readings

2. Build the Best Data Center Facility for Your Business (Networking Technology); 2005; Alger, Douglas; Cisco Press

3. Association for Computer Operations Managers (AFCOM); http://www.afcom.com

4. Designing a Safe House for Data (data centers); Security Management; 2005; Reese, Lloyd F.
