How the SAP Solution Vision Drives TCO

Total cost of ownership analysis seeks to measure the life cycle costs of a particular solution stack—the complete end-to-end costs incurred to own and operate a mySAP solution over its useful life. Thus TCO serves to highlight the relationship between cost and performance, illuminating how quickly a particular solution can claim a return on investment.

Chapter 3 discussed a variety of solution characteristics that need to be envisioned and planned for in advance. I have collapsed these into a few key areas or considerations:

  • High Availability

  • Disaster Recovery

  • Performance

  • Scalability

  • Security, Manageability, and other Operations areas

Throughout the next few pages, I will focus on these areas as each pertains to the SAP Solution Stack from a technology perspective. Later, I will identify the people and process considerations inherent to each layer in the stack, like ongoing operations, systems management, and other processes that are subject to continuous improvement and therefore lend themselves to reducing (or increasing) TCO.

The Impact of High Availability Requirements on TCO

High-availability (HA) requirements refer to the need for your SAP solution to suffer only a limited amount of unplanned downtime. In other words, the higher the level of HA, the more available a solution is to its end users. This availability is often expressed as a percentage of the total number of minutes available in a year. Over time, these percentages have been labeled, giving us the infamous “3 nines,” “4 nines,” and “5 nines” of availability (which equate to 99.9%, 99.99%, and 99.999% availability, respectively).
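The arithmetic behind the nines is worth making concrete. The following back-of-the-envelope sketch (written in Python here for convenience; its only assumption is a 365-day year of 525,600 minutes) computes the unplanned downtime each level permits:

  # Allowed unplanned downtime per year for each "nines" level,
  # assuming a 365-day year (525,600 minutes).
  MINUTES_PER_YEAR = 365 * 24 * 60

  for label, availability in (("3 nines", 0.999),
                              ("4 nines", 0.9999),
                              ("5 nines", 0.99999)):
      downtime = MINUTES_PER_YEAR * (1 - availability)
      print(f"{label}: {downtime:,.1f} minutes of unplanned downtime per year")

  # 3 nines: 525.6 minutes of unplanned downtime per year
  # 4 nines: 52.6 minutes of unplanned downtime per year
  # 5 nines: 5.3 minutes of unplanned downtime per year

Note that moving from 4 nines to 5 nines buys back only about 47 minutes per year, a point Figure 5.4 makes in dollar terms.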

Generally, the higher the level of availability required by the business, the more costly it becomes to procure, implement, and support the system in question. And the relationship between cost and high availability is not linear; rather, it grows exponentially as we strive to achieve something closer and closer to 100% availability, as you see in Figure 5.3.

Figure 5.3. Small incremental percentage gains cost more and more as we inch closer to 100% availability.


Figure 5.4 really puts this into perspective. As we move from 99.99% availability to the famous “5 nines” of availability, the incremental cost to the business to achieve another 47 minutes of availability is $2.9 million, nearly five times the cost required to jump from 3 nines to 4 nines ($590K). That works out to more than $60,000 per minute of additional uptime. And yet in terms of raw numbers, we add only a fraction of a percentage point to our availability targets—a number measured in mere minutes. For businesses that lose millions of dollars for every minute of unplanned downtime they incur, five nines is the way to go, of course. In my experience, though, the incremental cost is usually not worth the nearly negligible difference in uptime.

Figure 5.4. For this fictional enterprise, note the small amount of time or availability gained as we increase our SAP solution’s availability from 99.9% to 99.999%.


I have always told my customers that I can provide them pretty much any level of availability they’d like—the only issue is money, of course. Many of my customers would commence our initial discussions on high availability by telling me that their business demanded “the highest levels of availability” or “little to no downtime.” Many of these same companies had failed to do the “5 nines of availability” math, however. Their need for this level of availability was little more than a perceived requirement, a desire. After we worked out the budget and ROI numbers together, it was amazing to watch the perceived business requirements nose dive to embrace less expensive technology solutions more representative of the business’s real needs.

I like to structure the end result of these types of ROI calculations such that they reflect how much the business will pay for that extra few hours or minutes of availability. The real-world numbers have been pretty significant in my experience, in one case running up to another $500,000 over the three-year life cycle of the project for seven hours less unplanned downtime per year. In other words, giving 21 hours back to the business would cost the business nearly $24K per hour. I then had to pose this question to the customer: “Will you suffer more than $500,000 in lost revenue or productivity if you are down an incremental seven hours a year?” In this particular case, the cost of downtime escalated quickly with every hour of downtime incurred, and the customer indeed required the solution. Most of the time, though, the answer is more like “Oh! I guess not,” and my TCO-educated client settles for a lower level of availability that may not be the highest, but is good enough for the business.
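The math from that engagement is simple enough to sketch out. The figures below are the ones from my example, of course, not universal constants:

  # Cost per hour of downtime avoided, using the figures from the
  # case described above (illustrative numbers only).
  extra_cost = 500_000         # incremental solution cost over the life cycle ($)
  hours_saved_per_year = 7     # reduction in unplanned downtime per year
  life_cycle_years = 3

  total_hours_saved = hours_saved_per_year * life_cycle_years   # 21 hours
  cost_per_hour = extra_cost / total_hours_saved

  print(f"${cost_per_hour:,.0f} per hour of downtime avoided")  # $23,810

  # The more expensive solution pays off only if an hour of unplanned
  # downtime costs the business more than this figure.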

Good enough—what a great phrase! In these days of redundant and clustered everything, companies often seem to forget that going after that extra wall-clock tick of availability can really add up, and yet not significantly impact their ability to do business. Good enough is all about compromising, settling for 95% of what you desire, but paying only 50% of the cost of a solution that could satisfy 100% of your desires. In so many cases, for so many SAP implementations, it’s the right attitude. The decision boils down to identifying business requirements versus desires. As I’ve said before, just do the math. There’s nothing like a logical discussion around ROI, TCO, and HA to convince a business that good enough will serve most companies quite well (not to mention leaving a little money in the corporate budget for things like Go-Live bonuses!).

Disaster Recovery Requirements That Drive TCO

As with High Availability discussions, determining a company’s Disaster Recovery (DR) requirements also boils down to how much it costs the organization when the SAP system is unavailable. But where HA addresses timelines of minutes and hours, DR discussions focus most often on hours and days.

To learn about Disaster Recovery options in depth, see “Determining Your Required Level of Disaster Tolerance,” p. 167 in Chapter 6.

Figuring out the real cost of downtime after an extended period of time—days—can be quite complicated. Consider the following:

  • Business processes must usually be capable of failing over to a completely different physical location. Therefore, the technology ramifications are huge, to the point of requiring completely redundant data centers or hosting sites in some cases.

  • Ownership issues are huge, too—every member of the SAP TSO needs to understand exactly what their team is responsible for providing in the event of a disaster, and staff/plan for this accordingly.

  • Communication issues are therefore paramount. A communications plan must be developed, tested, and continually retested as the solution stack evolves over time. Similarly, communication vehicles like escalation plans and even system current-state/as-is and process documentation must be maintained and updated at both the primary and DR sites, and tested regularly.

  • Technology issues abound, including backup/restore concerns, data synchronization between the primary and DR site, access to the DR site by the system’s end users, access to the site by other computing systems (integration touch points), day-to-day operations/management of the DR site in the event of a disaster, and issues with how to fail back over to the primary site (or another site) after the disaster has subsided.

  • Additional people-issues exist, too, like determining contingency and other backup plans should key people be unavailable to perform their duties in the event of a disaster. Here, the importance of consulting agreements (that is, “consulting on demand” or “reactive services” contracts) is underscored, as is training a backup for each key role, maintaining excellent documentation, and so on.

All of these issues impact cost, and therefore present opportunities to minimize or increase TCO. Most often, my involvement in Disaster Recovery-related TCO exercises has amounted to doing the math between different DR alternatives. In some cases, the math supports building redundant data centers. More often, though, establishing a mini-data center or putting into place a support agreement with a third party to provide such an environment is more appropriate.

In the best scenarios, though, a customer is able to distribute their SAP environment across two existing sites. For example, it’s quite common in my experience to house the Production and Development systems at one physical location, and a Staging or Test/QA environment at another physical location. In this way, the site with the Staging or Test/QA system becomes the de facto DR site. This is generally an excellent approach to addressing disaster recoverability. Consider the following, however:

  • The system that takes on the role of the DR system must be sized appropriately, based on what the business considers “appropriate” performance after a failover. It is not uncommon to size a DR system for half or perhaps three-quarters of the concurrent SAP end users typically hosted by the production system. In some cases, though, a DR system capable of hosting all production users, batch processes, and so on is mandated by the business. And in worst-case situations, both the Production system and a fully-functioning Staging or Test/QA system need to be available concurrently, even in the event of a failover.

  • In a similar manner, the high availability built into the DR system must be considered. That is, the DR system may need to be clustered or configured with redundant components to achieve a certain level of availability should it become the Production system for an extended period of time.

  • Along with maintaining the system for its primary purpose of testing or quality assurance, this system must also be kept up-to-date from a DR perspective, consistent with the service-level agreements and other requirements of the business. Therefore, this could entail anything from weekly SAP client refreshes to 15-minute database snapshots replicated across a WAN or other network link, in addition to the load already placed on the system.

  • SAP end-user access to the DR site must be addressed, because a DR site is no good to anyone if no one can access it. This often involves redundant network links between the client’s public networks and the DR site. In the best of cases, separate carriers (for example, AT&T and Southwestern Bell) are employed, to avoid single points of failure specific to the carriers themselves. But most often the real challenge here is more a matter of explaining to the end users how to get to the production system should it fail over to the other site—which SAP logon group to use, which special DR-only SAPGUI icon to double-click, which ITS server to access, and so on.

  • The location of each site is critical. For example, there is more risk of a single disaster taking down both the primary and DR sites if they are within a few minutes of each other rather than if the sites are separated by a hundred miles or so. This location issue is common in campus or single-building environments, where the backup site might be only a mile (or a few floors) away from the primary site. Of course, the economics of such a decision are easy to understand—a DR site that is close by will probably be easier to manage and maintain. My point is this, though—identify the lack of critical “distance” between the two sites as a risk, and either mitigate or minimize this risk.

When it comes to Disaster Recovery and TCO, it’s important to focus on the functions that will cripple the business if interrupted or unavailable. Therefore, core transactional systems responsible for generating revenue, maintaining minimum levels of customer service, and keeping the production lines rolling are most often addressed. Reporting systems, internal procurement systems, and other such functions typically fail to garner enough business support to cover the expense of a dedicated DR system or site.

In addition to focusing on the core business areas, it’s also wise to establish exactly what it means to say that a business function is interrupted or unavailable. Does this mean eight hours? Three days? A week? Bottom line, when does unavailability become unacceptable? The answer to this question drives failover timelines, impacts service level agreements, and more.

And finally, as you see in Figure 5.5, the most effective TCO Disaster Recovery analyses benefit from a sound costing model or baseline—such a baseline should reflect how much revenue or productive time may be lost prior to moving business processes over to the DR site. And it should express losses in dollars as a function of time, so that the relationship between the two clearly illustrates how losses grow the longer the outage lasts.

Figure 5.5. The relationship between time and dollars can help characterize and justify the relative importance of a DR solution, as well as serve in delta comparisons.
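A costing baseline like the one in Figure 5.5 can be prototyped in a few lines. In the sketch below, the hourly loss figures and the DR-site cost are purely illustrative assumptions; real losses rarely grow this neatly, but the shape of the model is the point, and it also makes the “when does unavailability become unacceptable” threshold explicit:

  # Illustrative downtime costing baseline (all dollar figures assumed).
  # Losses typically accelerate: lost transactions at first, then missed
  # shipments, contract penalties, and finally lost customers.
  loss_per_hour_by_day = {1: 10_000, 2: 25_000, 3: 60_000}  # $ per outage hour

  def cumulative_loss(hours_down):
      """Rough cumulative loss after a given number of outage hours."""
      total = 0
      for hour in range(hours_down):
          day = min(hour // 24 + 1, max(loss_per_hour_by_day))
          total += loss_per_hour_by_day[day]
      return total

  # At what point does the loss justify a dedicated DR site?
  dr_site_cost = 1_500_000
  for hours in (8, 24, 48, 72):
      print(f"{hours} hours down: ${cumulative_loss(hours):,} "
            f"(exceeds DR site cost: {cumulative_loss(hours) > dr_site_cost})")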


With this information, you naturally have a better understanding as to how long the business can actually tolerate disasters, including different levels or thresholds of pain. Such a keen understanding will serve to further refine the DR solution/approach down the road, as cheaper and more capable technology solutions continue to evolve and therefore warrant delta TCO analyses in their own right. In my opinion, this whole area is probably one of the best places for something beyond a simple delta analysis; understanding the complete DR picture can conceivably mean the difference between surviving a disaster and going out of business a few months later. And with so many people and process considerations, building a good technical solution is simply not good enough.

Performance Requirements and TCO

As in so many other areas of TCO analysis, the more performance sought, the larger the IT budget typically needs to be. The key lies not so much in understanding this, though, as in understanding how to measure performance such that it can be factored into a TCO performance delta analysis of solution stack options. In the past, I have characterized SAP performance in terms of

  • End-user response times as related to average and peak periods (like end-of-month peaks, or peaks observed during a seasonal cycle).

  • Dialog steps processed during the peak hour (as a reminder, dialog steps represent units of work in SAP, where one dialog step equates to a user pressing the Enter key or otherwise completing a transaction against the database).

  • Average number of fully completed transactions processed in an hour (like dialog steps, this is another way of measuring work completed by the SAP system).

  • Average number of concurrent processes executing in the system while it’s under load (easily gathered point-in-time by SAP CCMS transaction SM66, or historically via any number of SAP-aware management applications).

  • Average disk queue length while under load (captured via PerfMon, similar UNIX-based utilities, or most SAP-aware management applications). This kind of performance measurement is most applicable to changes in the disk subsystem or database layers of the SAP Solution Stack.

  • Average CPU utilization while under load (again, using PerfMon or a similar UNIX utility), most applicable to changes in server infrastructure, but also valuable when new disk subsystems are introduced (to observe how a bottleneck moves from the disk subsystem to the server, for example, due to a faster subsystem).

After a measurement is embraced, it’s fairly easy to redefine it in terms of dollars in preparation for a TCO exercise. Thus, transactions per hour becomes “$x per transaction,” CPU utilization becomes “% CPU utilization per dollar,” and so on.
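As a sketch, with an assumed annualized solution cost and a measured average throughput (both figures hypothetical):

  # Hypothetical conversion of a raw performance metric into a TCO metric.
  annual_solution_cost = 1_200_000   # assumed annualized cost of the stack ($)
  transactions_per_hour = 45_000     # measured average throughput
  hours_per_year = 250 * 8           # assumed productive hours per year

  annual_transactions = transactions_per_hour * hours_per_year
  cost_per_transaction = annual_solution_cost / annual_transactions
  print(f"${cost_per_transaction:.4f} per transaction")  # ~$0.0133

Two competing stack configurations can then be compared on a dollars-per-unit-of-work basis rather than on raw throughput alone.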

Further, it’s usually easier to perform a delta analysis rather than a full-blown performance analysis. Some of the most common areas of TCO analysis when it comes to performance involve testing new versions of SAP, new disk subsystems, and configuration changes to existing systems. Another area might include a change in operating system, or the application of a Service Pack or patch to a particular layer in the stack, or the impact of installing multiple ITS instances on a single server. In all of these cases, testing the before and after scenarios simply makes the most sense. And it simplifies both testing and success criteria, as absolute numbers tend not to be as important as the delta between the two.

How Scalability Impacts TCO

The need for scalability in an SAP system also impacts total cost of ownership. Scalability is historically addressed by purchasing more than you need when it comes to hardware resources. In this way, headroom is available should it become necessary, for example, in the event of heavier-than-usual end-of-month processing or after new processor- or disk-intensive functionality is added to the system. These “in the box” scalability considerations include

  • Buying a server with additional CPUs, RAM, or disks, in effect supersizing the solution to address unknown what-if needs at a later date. In the meantime, such an approach will naturally provide better-than-expected performance of the current solution as well.

  • Buying a server capable of scaling or growing in terms of the number and speed of CPUs, RAM, disk capacity, I/O slots, and so on—without actually adding these components. In this way, the oversized system is ready to grow with only minimal downtime required. Unfortunately, you might wind up buying a more capable platform than you ever need.

  • Buying a disk subsystem or supporting infrastructure capable of easily housing more data, or supporting rapid expansion of capabilities, instead of buying the exact capacity required for the time being. Such incremental capacity might include a SAN’s switched fabric containing a higher available port density than currently required, or a disk subsystem with a few extra empty disk shelves (and therefore ready to add disk drives).

Scalability can also be addressed by virtue of the architecture of the solution. In a classic example, it’s common to see a large number of relatively cheap servers employed as SAP application servers, rather than fewer, larger, and more expensive boxes. In other cases, clustered resources are introduced into the SAP system landscape, providing for both improved scalability and availability.

Scalability goes way beyond hardware, though. Even the OS, database, and SAP application components can benefit from buying more than you need. I have many customers who have implemented Windows-based operating systems as the foundation for their SAP systems, for instance. The capabilities of different versions within the same family of operating systems differ, though—to address greater than 2GB of RAM per process, NT 4.0 Enterprise Edition or Windows 2000 Advanced Server is required, for example. Of course, these more capable versions are priced at a premium compared to their less capable brethren. Customers who decide to purchase the more capable versions without a current need for the capabilities inherent to those versions are actually investing in scalability.

What all of the aforementioned cases have in common is that each lends itself to a simple delta TCO analysis, and the benefits such an analysis can provide. Each scenario includes an opportunity to spend more money on something that you may or may not actually need. Depending upon the raw dollars at stake, a quick comparison between the two cost models can make a lot of sense.
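Here is a minimal sketch of such a comparison; the prices and the probability that the headroom is ever used are made-up figures for illustration only:

  # Delta TCO sketch: buy headroom now versus upgrade later (figures assumed).
  supersized_now = 180_000        # 8-CPU server purchased up front ($)
  right_sized_now = 110_000       # 4-CPU server purchased up front ($)
  upgrade_later = 95_000          # cost of the later upgrade, if needed ($)
  upgrade_probability = 0.6       # estimated chance the headroom is ever needed
  upgrade_downtime_cost = 15_000  # outage and labor cost of the upgrade ($)

  expected_right_sized = (right_sized_now
                          + upgrade_probability
                          * (upgrade_later + upgrade_downtime_cost))
  print(f"Supersized now:           ${supersized_now:,}")
  print(f"Right-sized, may upgrade: ${expected_right_sized:,.0f}")  # $176,000

With these particular assumptions, the two approaches are nearly a wash; the value of the exercise is that it makes the trade-off explicit.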

As you will see in Chapter 7, these kinds of scalability considerations play a common role in sizing the SAP solutions within your system landscape. It is insane to buy exactly what you think you need, because invariably a need will evolve over time that was never addressed during the SAP vision phase. Appropriate attention to scalability, and sizing a system for a period of time (that is, a three- or four-year life cycle, taking into account growth in terms of users, database size, and so on), can mitigate this risk.
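A quick sketch of that kind of forward sizing, with assumed (and purely illustrative) growth rates:

  # Forward-sizing sketch over a four-year life cycle (growth rates assumed).
  users_today = 800     # concurrent SAP users at Go-Live
  db_gb_today = 250     # initial database size (GB)
  user_growth = 0.15    # assumed 15% user growth per year
  db_growth = 0.30      # assumed 30% database growth per year

  for year in range(1, 5):
      users = users_today * (1 + user_growth) ** year
      db_gb = db_gb_today * (1 + db_growth) ** year
      print(f"Year {year}: ~{users:.0f} users, ~{db_gb:.0f}GB database")

  # Size the platform (or its upgrade path) for the year-4 figures,
  # not the day-1 figures.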

Other SAP Solution Vision TCO Drivers

Other SAP solution vision drivers impact the solution’s total cost of ownership. The most prevalent are security, manageability, and operations. That is, the design and architecture of a particular SAP solution may cost substantially more (or less), depending on how these areas are addressed. The tightest security constraints, for example, will dictate deployment of firewalls, lock-down of services and ports, implementation of virus protection, and so on. Each has a particular cost that must be factored into the total solution cost. And because a variety of firewalls and virus protection methods exist, delta analyses between competing solutions may be appropriate as well.

A need for implementing the most manageable system, or a system that is inherently “operations-friendly,” will also impact TCO. There are a multitude of management/operations approaches and software packages on the market for SAP, for example. Some are operating system- or database-specific, not to mention mySAP component-specific. Others are supported only on a particular hardware vendor’s product line. Still others differ in how much time is required to learn them, or in how difficult they are to work with when it comes to managing changes in the landscape.

For detailed Operations and Systems Management information, see “Systems Management Techniques for SAP,” p. 511 in Chapter 14.
