Chapter 10. Could They Do It?: Real User Monitoring

A company once ran a beautiful monster of a marketing campaign.

The campaign was an attempt to drive traffic to its e-commerce site. Shortly after launching the campaign, sales dropped by nearly half. Synthetic tests suggested everything was fine. Web analytics reported an increase in visits, but a huge drop in conversions across every visitor segment. It looked like the campaign had appealed to a large number of visitors who came to the site but didn’t buy anything.

Management was understandably annoyed. The official response amounted to, “Don’t ever do that again, and fire the guy who did it the first time.”

Fortunately, one of the company’s web operators was testing out new ways of monitoring end user performance at this time. He noticed something strange: a sudden spike in traffic, followed by the meltdown of much of the payment infrastructure on which the site depended. This payment system wasn’t part of the synthetic tests the company was running.

As it turned out, the company had hit upon an incredibly successful promotion that nearly killed the system. So many people were trying to buy that the checkout page took over 20 seconds to load, and often didn’t load at all. Nearly all of the visitors abandoned their purchases. Once the company responded by adding servers, upgrading the payment system, and fixing some performance bottlenecks, it tripled its monthly revenues.

It’s one thing to know your site is working. When your synthetic tests retrieve a page quickly and without errors, you can be sure the site was available to those tests. But while you know it’s working for your tests, there’s something you don’t know: is it broken for anyone, anywhere?

Just because a test was successful doesn’t mean users aren’t experiencing problems:

  • The visitor may be on a different browser or client than the test system.

  • The visitor may be accessing a portion of the site you’re not testing, or following a navigational path you haven’t anticipated.

  • The visitor’s network connection may be different from that used by the test for a number of reasons, including latency, packet loss, firewall issues, geographic distance, or the use of a proxy.

  • The outage may have been so brief that it occurred in the interval between two tests.

  • The visitor’s data—such as what he put in his shopping cart, the length of his name, the length of a storage cookie, or the number of times he hit the Back button—may cause the site to behave erratically or to break.

  • Problems may be intermittent, with synthetic testing hitting a working component while some real users connect to a failed one. This is particularly true in a load-balanced environment: if one-third of your servers are broken, a third of your visitors will have a problem, but there’s a two-thirds chance that a synthetic test will get a correct response to its HTTP request.
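The arithmetic behind the last point, and behind the brief-outage point, can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are ours, not measurements from any real site: the load balancer spreads requests evenly and independently across servers, tests are instantaneous, and an outage starts at a random moment relative to the test schedule. The function names are hypothetical.

```python
from fractions import Fraction


def chance_all_tests_pass(broken_fraction, tests):
    """Probability that every one of `tests` synthetic checks hits a healthy server.

    Assumes each request is routed independently and evenly across servers.
    """
    healthy_fraction = 1 - Fraction(broken_fraction)
    return healthy_fraction ** tests


def chance_outage_missed(outage_seconds, test_interval_seconds):
    """Probability that an outage falls entirely between two consecutive tests.

    Assumes instantaneous tests and an outage that starts at a uniformly
    random point within the test interval.
    """
    covered = Fraction(outage_seconds) / Fraction(test_interval_seconds)
    return max(Fraction(0), 1 - covered)


if __name__ == "__main__":
    one_third = Fraction(1, 3)
    # A single test against a pool with one-third of its servers broken.
    print(chance_all_tests_pass(one_third, 1))
    # Six tests in a row (an hour of testing at a 10-minute interval).
    print(chance_all_tests_pass(one_third, 6))
    # A 30-second outage with tests every 5 minutes.
    print(chance_outage_missed(30, 300))
```

Under these assumptions, a single test has the two-thirds chance of success described above, and six tests in a row all succeed roughly 9 percent of the time, even though a third of real visitors are seeing failures throughout.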

In other words, there are plenty of ways your site can be working and still be broken. As one seasoned IT manager put it, “Everything could be blinking green in the data center with no critical events on the monitoring tools, but the user experience was terrible: broken, slow, and significantly impacting the business.” To find and fix problems that impact actual visitors, you need to watch those visitors as they interact with your website.

Real user monitoring (RUM) is a collection of technologies that capture, analyze, and report a site’s performance and availability from this perspective. RUM may involve sniffing the network connection, adding JavaScript to pages, installing agents on end user machines, or some combination thereof.

RUM and Synthetic Testing Side by Side

For this book, we’re using a simple distinction between synthetic testing and RUM. If you collect data every time someone visits your site, it’s RUM. This means that if you have 10 times the visitors, you’ll collect 10 times the data. On the other hand, with the synthetic testing approaches we saw in the previous chapter, the amount of data that you collect has nothing to do with how busy the site is. A 10-minute test interval will give you six tests an hour, whether you have one or a thousand visitors that hour.
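To make the difference in data volume concrete, here is a minimal sketch in Python. The function names and the `pages_per_visit` parameter are hypothetical, introduced only for illustration; the point is simply that one quantity depends on the test schedule and the other on traffic.

```python
def synthetic_samples_per_hour(test_interval_minutes, test_locations=1):
    """Number of synthetic test results collected per hour.

    Depends only on the schedule and the number of testing locations,
    never on how busy the site is.
    """
    return (60 // test_interval_minutes) * test_locations


def rum_samples_per_hour(visitors, pages_per_visit):
    """Number of RUM page records collected per hour.

    Scales directly with traffic: ten times the visitors produce
    ten times the data.
    """
    return visitors * pages_per_visit


if __name__ == "__main__":
    for visitors in (1, 1000):
        print(
            visitors,
            synthetic_samples_per_hour(10),
            rum_samples_per_hour(visitors, pages_per_visit=5),
        )
```

With a 10-minute interval, the synthetic count stays at six per hour whether one visitor or a thousand arrive, while the RUM count grows in step with the visitors.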

Here’s a concrete example of RUM alongside synthetic data. Figure 10-1 shows page requests to an actual website across an hour. Each dot in the figure is an HTTP GET. The higher the dot, the greater the TCP round-trip time; the bigger the dot, the larger the request.

Figure 10-1. A scatterplot of page requests over time in Coradiant’s TrueSight, showing relative TCP round-trip time


While requests happen throughout the hour for which the data was collected, there are distinct stacks of dots at regular intervals. These columns of requests, which occurred at five-minute intervals, are actually synthetic tests from the AlertSite synthetic testing service, coming from Australia, Florida, and New York.

Figure 10-2 highlights these five-minute intervals. The tests from each of the three locations form “bands” of latency; tests from Australia had the highest round-trip time, as we’d expect. Notice that without the synthetic tests, there would have been no data on Australia. Also notice that the only way to discover the excessively large request (the big dot) was to watch actual visitors; a synthetic test could not have detected it. Finally, notice the dozens of real requests that happen in the five minutes between tests, which is an eternity of Internet time.

Figure 10-2. The same scatterplot in Figure 10-1, with synthetic tests identified
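If you wanted to pick out synthetic tests like these from your own request data, one rough approach is to look for sources whose requests arrive at suspiciously regular intervals. The following Python sketch is not how TrueSight or any other product does it; the function name, the input format of `(source, timestamp)` pairs, and the thresholds are all assumptions made for illustration. It also assumes one recorded request per test run, which a real test fetching many page components would not satisfy without first grouping requests into visits.

```python
from collections import defaultdict
from statistics import mean, pstdev


def find_periodic_sources(requests, min_requests=4, max_jitter=0.05):
    """Return {source: average_gap_seconds} for sources that look scheduled.

    `requests` is an iterable of (source, timestamp_in_seconds) pairs.
    A source is flagged when the spread of its inter-request gaps is small
    relative to the average gap (coefficient of variation <= max_jitter).
    """
    by_source = defaultdict(list)
    for source, timestamp in requests:
        by_source[source].append(timestamp)

    periodic = {}
    for source, stamps in by_source.items():
        if len(stamps) < min_requests:
            continue  # too few requests to judge regularity
        stamps.sort()
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        average_gap = mean(gaps)
        if average_gap > 0 and pstdev(gaps) / average_gap <= max_jitter:
            periodic[source] = average_gap
    return periodic


if __name__ == "__main__":
    sample = [
        ("synthetic-ny", 0), ("synthetic-ny", 300),
        ("synthetic-ny", 600), ("synthetic-ny", 900),
        ("visitor-a", 12), ("visitor-a", 40),
        ("visitor-a", 500), ("visitor-a", 2900),
    ]
    print(find_periodic_sources(sample))
```

In the made-up sample, only the source that requests a page every 300 seconds is flagged; the visitor’s irregular clicks are not.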


Synthetic tests give you an idea of what users might experience, but RUM tells you what actually happened—whether users could accomplish the things they tried to do. In this respect, RUM is the natural complement to web analytics. However, you cannot use RUM on its own: it’s useless if users don’t visit the site, because there are no visits to analyze.
