Providing Data for an SLA

Let’s make a list of what users really care about when it comes to their network applications.

What Users Care About:

response time for interactive transactions

throughput for file transfers and print jobs

high availability

ease of use

convenience

Now let’s make another list of things users don’t care about as they use their day-to-day network applications:

What Users Don’t Care About:

network backbone utilization

percent error rate

percent packet loss

ping round trip time

Now isn’t that interesting? The very things that network managers routinely measure are of no concern to the users. SNMP provides dozens of performance metrics, none of which relate directly to the user experience. This is because SNMP was designed to manage networks, not applications.

Network managers may take the position that they provide error-free network bandwidth and that application response time is the business of server administrators. Application response time can be measured at the application server if the code is instrumented with the Application Response Measurement (ARM) API. But the transaction response time seen by the user is the sum of client-side response time, server-side response time, and network latency at each network location between client and server. No wonder there is so much finger-pointing when a user complains about performance.
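To make the decomposition concrete, here is a minimal sketch in Python. The class name, field names, and timing values are hypothetical and chosen only for illustration; in practice each component would have to be measured separately at the client, at the server, and along the network path.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TransactionTiming:
    """Hypothetical breakdown of one transaction, in milliseconds."""
    client_ms: float              # time spent on the client side
    server_ms: float              # time spent on the server side
    network_hops_ms: List[float]  # latency at each network location on the path

    def total_ms(self) -> float:
        # The user sees the sum of all three contributions.
        return self.client_ms + self.server_ms + sum(self.network_hops_ms)

    def shares(self) -> Dict[str, float]:
        # Fraction of the total attributable to each party.
        total = self.total_ms()
        if total == 0:
            return {"client": 0.0, "server": 0.0, "network": 0.0}
        return {
            "client": self.client_ms / total,
            "server": self.server_ms / total,
            "network": sum(self.network_hops_ms) / total,
        }


# Illustrative (made-up) numbers, not measurements.
timing = TransactionTiming(client_ms=120.0, server_ms=300.0,
                           network_hops_ms=[15.0, 40.0, 25.0])
print(timing.total_ms())
print(timing.shares())
```

A breakdown like this shows which party owns the largest share of the delay, which is exactly the information missing when a complaint turns into an argument.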

Network and system administrators alike know that capacity headroom is critical for good performance. That’s why they measure utilization, and that’s why an SLA should contain an agreement about this metric. Remember that an SLA is a tool used by the end-user community and their IT service provider to come to an agreement. It is not a monthly excuse to complain, point fingers, exercise political agendas, and take down names.

So you see that an SLA must be “as simple as possible and no simpler” [1] and be based on measurable quantities. For example, the parties may agree that the IT department will engineer the connection to the Internet such that utilization is below 50% for 90% of the time, as long as there are fewer than 100 active users. Utilization data is easily obtained from the router via SNMP. The number of active users isn’t directly measurable with SNMP. Indeed, the measurement is a difficult one to make properly, but it can be harvested from the firewall proxy server log or from the router via its IP accounting feature. Every month at the SLA meeting a simple graph is presented, such as the one in Figure 9-3.
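As a rough illustration of how utilization can be derived from SNMP data, the sketch below assumes that two readings of an interface octet counter (such as ifInOctets or ifOutOctets) and the link speed have already been collected; the polling itself is not shown. The function name and the sample numbers are hypothetical.

```python
def link_utilization(octets_start, octets_end, interval_s, link_bps,
                     counter_bits=32):
    """Return utilization as a fraction (0.5 means 50%).

    octets_start / octets_end: two readings of an octet counter
    interval_s: seconds between the two readings
    link_bps: link speed in bits per second
    counter_bits: counter width; the modulo handles at most one wrap
    """
    if interval_s <= 0 or link_bps <= 0:
        raise ValueError("interval_s and link_bps must be positive")
    delta_octets = (octets_end - octets_start) % (2 ** counter_bits)
    return (delta_octets * 8) / (interval_s * link_bps)


# Made-up example: a 1.544 Mbit/s line polled five minutes apart.
util = link_utilization(octets_start=1_000_000,
                        octets_end=29_950_000,
                        interval_s=300,
                        link_bps=1_544_000)
print(f"{util:.0%}")
```

The polling interval matters: if the counter can wrap more than once between readings, the result is silently too low, so fast links need either short intervals or wider counters.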

[1] This quote is attributed to Albert Einstein.

Figure 9-3. A sample chart used in an SLA.

The IT department and the user community have agreed that 90% of the time the line utilization to the Internet will not exceed 50% as long as there are fewer than 100 active users. Just by graphing this data, you see that Friday is the busiest day of the week and that the last day of the month is the busiest day of the month. Special business days often bring out more users, who use a different application mix.
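The agreement above can be checked mechanically once utilization and active-user counts have been sampled together. The following is a minimal sketch, assuming each sample is a (utilization, active users) pair; the function names and the sample data are hypothetical, and "will not exceed 50%" is read here as less than or equal to 0.50.

```python
def sla_compliance(samples, max_utilization=0.50, user_limit=100):
    """Fraction of in-scope samples whose utilization is within the limit.

    samples: iterable of (utilization_fraction, active_users) pairs.
    Samples taken with user_limit or more active users fall outside the
    agreement and are ignored. Returns None if no sample is in scope.
    """
    in_scope = [util for util, users in samples if users < user_limit]
    if not in_scope:
        return None
    within = sum(1 for util in in_scope if util <= max_utilization)
    return within / len(in_scope)


def sla_met(samples, required_fraction=0.90, **limits):
    """True if the agreed fraction of in-scope samples is within the limit."""
    fraction = sla_compliance(samples, **limits)
    return fraction is None or fraction >= required_fraction


# Made-up monthly samples: (utilization, active users).
samples = [
    (0.20, 40), (0.35, 60), (0.45, 80), (0.50, 90), (0.30, 50),
    (0.25, 45), (0.40, 70), (0.48, 95), (0.10, 20), (0.65, 99),
    (0.80, 120), (0.75, 130),   # 100 or more users: outside the agreement
]
print(sla_compliance(samples), sla_met(samples))
```

Excluding the out-of-scope samples keeps the result faithful to the agreement: heavy utilization on a day with 120 active users does not count against the IT department.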

