Chapter 5. The Problems Short-Listed

How did these mainframe shops attain and maintain 99.9+ percent availability? Think about this question for a moment. How did the mainframe shops work so effectively compared to today's environments, in which the client/server centers have every redundant component built into their systems? Anyone who purchases new Unix or NT servers usually orders them with backup power supplies, double power strips, extra RAID disks, etc. Based on the product specifications alone, the newer systems should never, ever go down because they are double- or triple-packed with redundant parts. And most of the shops we visit gobble up this technology (with a healthy price tag) at an alarming rate. So why is client/server RAS non-existent? What's going on? The hardware is there. This isn't rocket science. The mainframe survived and is still excelling because of disciplines (organization structure, processes, metrics, etc.) that are enforced by the people who maintain the mainframes. The uptime isn't a result of the mainframe boxes themselves. The availability comes from a mainframe support staff that knows what it is doing and does it well. These are disciplines that should be used throughout the enterprise.

With all this technology available for redundancy comes complexity. The good news is that today we have more choices of components to use; call it "open systems." This means that we choose our databases, our utilities, our network vendors, and our operating systems. The bad news is that often these vendors' products don't easily plug and play together. Vendors say their products work well together. But when you need to make sure the versions are in sync, well, that's another matter. It's mainly a problem of making sure that the releases work well together, which is one more reason to adopt key processes and deal with the people issues. If you rely only on the technology, which is what most IT shops do today, you will surely fail.

We recently visited a shop where the most critical system was down for over 24 hours, and it all started with a simple hardware problem. (Bad memory was the fault.) The hardware and software configuration had full fail-over capability, and the staff were able to switch systems. They fixed the problem, but the switch back corrupted a number of tables. It just went downhill from there. They needed to restore the database from the most recent backup. That tape was corrupted. They had to go back several weeks to find a good backup. There were no backup processes or procedures to test for integrity. There were no disaster-recovery procedures or periodic tests to restore data. To make a very long story short, there wasn't a tape librarian function. Instead of being down for a few hours, they were down for more than 24 hours. When the vendor sold the customer this multimillion-dollar solution, they guaranteed continuous operation.

Why is everyone in IT turning his or her back on an environment that supported mission-critical systems like no other? We're referring to the mainframe world. Why does everyone want to reinvent the wheel or, as in most of the companies we studied, not invent the wheel at all? In this section we take a closer look at the most critical non-organizational issues, along with recommendations.
