Chapter 2. Tooling

“Measure twice, cut once.”

Carpenter’s adage

This chapter considers tool selection. Tools help you understand the current performance of your digital applications—both in ideal conditions and to end users. Such understanding is required on two bases: absolute, and relative to key competitors and other mass market so-called “bellwether” sites such as Facebook, BBC, and CNN—wherever your customers are when they are not on your site.

Tools also let you see how the individual components that make up the overall customer experience of your site are performing—images, multimedia, logic (JavaScript), third-party affiliates, etc. Finally, tools capture the detailed measurements needed to inform core analytics on frontend performance, leading to identification of root-cause issues and ultimately to improved performance.

I will not compare specific vendor offerings, but rather will explain the various generic approaches and their strengths and weaknesses. Success in this field cannot be achieved by a one-size-fits-all approach, no matter what some would have us believe!

Introduction to FEO Tools

I will provide a summary of available tool types (see “Relevant Tool Categories for FEO”) and then a structured FEO process (see Chapter 3). Before doing so, let’s start with some high-level considerations. This book assumes an operations-centric rather than developer-centric approach. Certainly, the most robust approach to ensuring client-side performance efficiency is to bake it in from inception, using established “Performance by Design” principles and cutting edge techniques. However, because in most cases, “I wouldn’t have started here” is not exactly a productive recommendation, let’s set the scene for approaches to understanding and optimizing the performance of existing web applications.

So, tooling. Any insights gained will originate with the tools used. The choice will depend upon the technical characteristics of the target (e.g., traditional website, Single Page Application, PWA/WebApp, Native Mobile App), and the primary objective of the test phase (covering the spectrum from [ongoing] monitoring to [point] deep-dive analysis).

Note

I will use examples of many tools to illustrate points. These do not necessarily represent endorsement of the specific tools. Any decision made should include a broad consideration of your individual needs and circumstances.

Gaining Visibility

The first hurdle is gaining appropriate visibility. Note, however, that any tool will produce data; the key is effective interpretation of the results. This is largely a function of knowledge and control of the test conditions.

Two Red Herrings

A good place to start in tool selection is to stand back from the data and understand the primary design goal of the tool class. As examples, consider two tools, both widely used, neither of which is appropriate to FEO work even though they are superficially relevant.

Firstly, let’s consider behavioral web analytics, such as Google Analytics. Some of these powerful, mass-market products certainly will generate some performance (page response) data. They are primarily designed for understanding and leveraging user behavior, not managing performance. Still, the information that such tools provide can be extremely useful for defining analysis targets both in terms of key transaction flows and specific cases (e.g., top-ranked search engine destination pages with high bounce rates). They are, however, of no practical use for FEO analysis. This is for several detailed reasons but mainly because the reported performance figures are averaged from a tiny sample of the total traffic, and granular component response data is absent.

Secondly, consider functional/cross-browser test tooling, like Selenium. These are somewhat more niche than behavioral analytics, but they certainly add considerable value to the pre-launch testing of applications, both via device emulation and real devices. However, all testing originates in a few geographic locations (often a single one), thus introducing high and unpredictable latency into the measurements. This tooling class is excellent for functional testing, which is what it is designed to do. Different choices are required for effective FEO support.

Key Aspects of FEO Practice

As we will see when considering process, FEO practice in operations essentially consists of two aspects. One is understanding the outturn performance to external end points (usually end users). This is achieved through monitoring: obtaining an objective understanding of transaction, page, or page component response from replicate tests in known conditions, or of site visitors over time. Monitoring provides information on the response patterns of the target site or application, both in absolute terms and relative to key competitors or other comparators.

The other aspect is analysis of the various components delivered to the end-user device. These components fall into three categories: static, dynamic, or logic (JavaScript code). Data for detailed analysis may be obtained as a by-product of monitoring, or from single or multiple point “snapshot” tests. Component analysis will be covered in a subsequent section (see “Component-Level Analysis”).

What Is External Monitoring?

External monitoring may be defined as any regular measurement of application response time and availability from outside the edge servers of the delivery infrastructure. There are broadly two types of external monitoring approach: synthetic, which relies on the regular automated execution of what are effectively functional test scripts, and passive (also known as Real User Monitoring, or RUM), which relies on the capture or recording of visitor traffic relative to various timing points in the web application code.

Note

It is useful to think of FEO as an extension activity supported by specifically targeted monitoring but undertaken separately to “core” production monitoring.

Production monitoring is typically characterized by ongoing recording and trending of defined key performance indicators (KPIs); see the reference to my detailed treatment of this subject in “Securing Gains with Ongoing Monitoring and KPI Definition”. These are most effectively used to populate dashboards and balanced scorecards. They provide an extremely useful mechanism for understanding system health and issue resolution, and are often supported by Application Performance Management (APM) tooling.

Relevant Tool Categories for FEO

So what are the relevant categories of frontend test tooling? The following does not seek to provide a blow-by-blow comparison of the multiplicity of competitors in each category; in any case, the best choice for you will be determined by your own specific circumstances. Rather, it is a high-level category guide. As a general rule of thumb, examples of each category will ideally be used to provide a broad insight into end-user performance status and FEO. Modern APM tools increasingly tick many of these boxes, although some of the more arcane (but useful) details are yet to appear—beware the caveat (see “APM Tools and FEO: A Cautionary Note”)!

As outlined in the next section, tools for monitoring external performance fall into two distinct types: active or passive. Each is then covered in more detail.

Tooling Introduction

Following is a high-level introduction to the principal generic types of tooling used to understand web application performance and provide preliminary insights for use in subsequent FEO. Tools fall into two main categories, which will be discussed in more detail in subsequent sections. Open source options do exist in each category, although for a variety of technical reasons, these are often best reserved for the more experienced user—at least when undertaking detailed optimization work.

Firstly, synthetic monitoring. This has several subtypes, not all of which may be provided by any given vendor. The principal test variants are:

  • Backbone (primary-ISP-based) testing, either from individual Tier 1 (or quasi T1) data centers such as Verizon, British Telecom, or Deutsche Telekom or from an Internet Exchange Point (such as the London Internet Exchange [LINX]). The latter provides low-latency multiple tests across a variety of carriers.

  • Cloud-based testing, for comparison of relative CDN performance.

  • Private peer locations, which can be any specific location where a vendor test agent has been installed. Typically, these are inside a corporate firewall (e.g., sites such as customer service centers or branch offices), although they could include testing from partner organizations, such as an insurance company underwriting application accessed by independent brokers. In theory, such testing could involve Internet of Things (IoT) devices or customer test panels (e.g., VIP users of betting and gaming sites).

  • End user testing from test agents deployed to consumer-grade devices, connected via so-called “last mile” (e.g., standard domestic or commercial office) connections. Depending upon the technology used, these can vary between “true” end users recruited from the general population in a given country or region, Private Peer testing (see above), or quasi end-user testing from consumer-grade devices over artificially modeled connection speeds. WebPageTest provides a good open source example of the latter.

The second type of tooling is passive, visitor, or real-user monitoring (RUM):

  • The performance analysis of incoming traffic, reporting individual or grouped user responses against a variety of timing points in the page delivery process.

  • Performance metrics are associated with other user-device related information, such as:

    • Operating system

    • Screen resolution

    • Device type

A subtle variant of RUM is end-user experience monitoring (EUM):

  • EUM is essentially RUM (i.e., it’s a synonym used by some vendors), but note the distinction between experience in this sense (that is, speed of response) and behavioral-based end-user experience tools and techniques such as click-capture heat maps (see Figure 3-1). The latter are more associated with design-led behavior and support a separate category of tools, although heat-map-type outputs are increasingly being incorporated into RUM tools.

Active (Synthetic) Tooling

The term active (sometimes called synthetic or heartbeat monitoring) is used to describe testing that works by requesting information from the test application from a known, remote location (data center or end-user device) and timing the response received.

Active Monitoring: Key Considerations

Active (aka synthetic) monitoring involves replicate testing from known external locations. Data captured is essentially based on reporting on the network interactions between the test node and the target site. The principal value of such monitoring lies in the following three areas:

  • Understanding the availability of the target site.

  • Understanding site response/patterns in consistent test conditions; for example, to determine long-term trends, the effect of visitor traffic load, performance in low-traffic periods, or objective comparison with competitor (or other comparator) sites.

  • Understanding response/patterns of individual page components. These can be variations in the response of the various elements of the object delivery chain—DNS resolution, initial connection, first byte (i.e., the dwell time between the connection handshake and the start of data transfer over the connection, which is a measure of infrastructure latency), and content delivery time. Alternatively, the objective may be to understand the variation in total response time of a specific element, such as third-party content (useful for Service Level Agreement management).
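As an illustration, these delivery-chain stages map directly onto the W3C Navigation Timing entries exposed by modern browsers. The following is a minimal sketch (run in a browser console or included in a RUM snippet) that derives the stage timings for the current page; treat it as illustrative rather than as any particular tool’s implementation:

    // Break the page request into the delivery-chain stages described above,
    // using the W3C Navigation Timing Level 2 entry for the current page.
    const [nav] = performance.getEntriesByType('navigation');
    if (nav) {
      console.table({
        dnsResolution: nav.domainLookupEnd - nav.domainLookupStart,
        initialConnection: nav.connectEnd - nav.connectStart,
        // Dwell time between the request and the first byte of the response
        // (a measure of infrastructure latency).
        firstByte: nav.responseStart - nav.requestStart,
        contentDelivery: nav.responseEnd - nav.responseStart
      }); // all values in milliseconds
    }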

Increasingly, modern APM tools offer synthetic monitoring options. These tend to be useful in the context of the APM (i.e., holistic, ongoing performance understanding), but more limited in terms of control of test conditions and specific granular aspects of FEO point analysis such as Single Point of Failure (SPOF) testing of third-party content. Although it may sound arcane, this is a key distinction for those wishing to really get inside the client performance of their applications.

In brief, the key advantages of synthetic tooling for FEO analysis are these:

  • Range of external locations – geography and type

    • Tier 1 ISP/LINX test locations; end-user locations; private peer (i.e., specific known test source)

    • PC and mobile (the latter is becoming increasingly important)

  • Control of connection conditions—hardwired versus wireless; connection bandwidth

  • Ease and sophistication of transaction scripting—introducing cookies, filtering content, coping with dynamic content (popups, etc.)

  • Control of recorded page load end point (see “What Are You Measuring? Defining Page-Load End Points”), although this also applies to RUM if custom markers are supported by the given tool. As a rule of thumb, the more control the better. However, a good compromise position is to take whatever is on offer from the APM vendor—provided you are clear as to exactly what is being captured—and supplement this with a “full fat” tool that is more analysis-centric (WebPageTest is a popular and open source choice). Beware variable test node environments with this tool if using the public network.
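As an illustration of that supplementary route, runs against the public WebPageTest instance can be launched programmatically via its HTTP API. The sketch below is minimal and assumes you have an API key; parameter and response field names follow the public documentation at the time of writing, and the target URL is a placeholder:

    // Sketch: kick off a WebPageTest run via the public HTTP API.
    // Requires Node 18+ (global fetch) or any environment with fetch;
    // 'YOUR_API_KEY' and the target URL are placeholders.
    const target = encodeURIComponent('https://www.example.com/');
    const api = 'https://www.webpagetest.org/runtest.php'
      + '?url=' + target + '&k=YOUR_API_KEY&f=json';

    fetch(api)
      .then((res) => res.json())
      .then((body) => {
        // The JSON response includes the test ID and result URLs to poll.
        console.log(body.data);
      })
      .catch((err) => console.error('WebPageTest request failed:', err));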

Figure 2-1 is an example of a helpful report that enables comparison of site response between major carriers (hardwired ISPs or public mobile networks). Although significant peerage issues (i.e., problems with the “handover” between networks) are relatively rare, if they do exist, they:

  • Are difficult to determine without such control of test conditions

  • Have the propensity to affect many customers—in certain cases/markets, 50% or more

Figure 2-1. Synthetic monitoring—ISP peerage report (UK)

End-user synthetic testing. Figure 2-2 is an example from a synthetic test tool. It illustrates the creation of specific (consumer-grade) test peers from participating members of the public. Note the flexibility/control provided in terms of geography and connection speed. Such control is highly advantageous, although it will ultimately be determined by the features of your chosen tool. In the absence of such functionality, you will likely have to fall back on RUM reporting, although bear in mind (as mentioned elsewhere) that RUM is inferential rather than absolute, and it will not give you an understanding of availability, as it is reliant on visitor traffic.

Figure 2-2. Creation of end user test clusters in synthetic testing

What Are You Measuring? Defining Page-Load End Points

Now, an important word on page-load end points. Traditional synthetic tools rely on the page onload navigation marker. It is essential to define an end point based more closely on end-user experience (i.e., browser fill time), as this is what is perceived by the user as page response regardless of what is happening “behind the page.” With older tools, this needs to be done by introducing a flag to the page. This flag can either be existing content, such as an image appearing at the base of the page (at a given screen resolution), or content introduced at the appropriate point. This marker can then be recorded by modification of the test script.

Note

Given the dynamic nature of many sites, attempting to time to a visual component can be a short-lived gambit. Introducing your own marker, if you have access to the code, is a more robust intervention.

It is worth exploring whether a tool will support this feature automatically, thus saving a lot of work.
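If it is not, a minimal hand-rolled version of the flag technique is sketched below: a small flag image is placed at the desired visual end point, and a User Timing mark is recorded when it loads. The element ID, image path, and mark name are arbitrary placeholders:

    // Sketch: record a custom "above the fold" end point when a flag image
    // placed at the visual end point finishes loading. Run this inline,
    // immediately after the flag element, so the listener is attached before
    // the image completes.
    // Markup: <img id="atf-flag" src="/img/timing-flag.gif" width="1" height="1" alt="">
    document.getElementById('atf-flag').addEventListener('load', () => {
      performance.mark('atf-visible');
      // startTime is relative to navigation start, i.e., the perceived
      // browser fill time to this point, in milliseconds.
      const fill = performance.getEntriesByName('atf-visible')[0].startTime;
      console.log('Above-the-fold fill time (ms):', fill);
    });

A modified synthetic test script (or a RUM custom timer) can then record the “atf-visible” mark as its end point rather than onload.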

Figure 2-3 illustrates the manual introduction of a test pixel as a timing end point into the source code of a web application should this not be supported as a standard feature within your chosen tool.

Figure 2-3. “Above the fold” end point; custom insertion of flag image; synthetic testing

Some modern tooling has introduced this as a standard feature. It is likely that competitors will follow suit.

Warning

Let me emphasize: using the onload marker will produce results that do not bear any meaningful relationship to end user experience, particularly in sites with high affiliate content loads.

Figure 2-4 illustrates the very high variation in recorded page response depending upon the end point used.

Figure 2-4. PC page response variation with end point (example)

Modifications of standard testing are required to manage potentially misleading results in specific cases (e.g., server push, Single Page Applications). These are covered in “Emerging Developments”.

Passive (RUM-based) Monitoring Tools

The previous section considered synthetic (active) monitoring of PC-based sites by examining data from replicate “heartbeat” external tests in known conditions. Now let’s consider complementary monitoring of actual visitor traffic and aspects of mobile device monitoring.

Passive monitoring—also known as Real-User Monitoring (RUM), End User Monitoring (EUM), or User Experience Monitoring (UEM)—is based on the performance analysis of actual visitors to a website. This is achieved by manual or (more typically) automatic introduction of small JavaScript components to the web page. These record and return (by means of a beacon) the response values for the page, based on standard W3C navigation metrics—DOM ready time, page onload time, etc. It is worth noting in passing that these metrics are not supported by all browsers, notably older versions of Safari and of some other browsers. However, the proportion of user traffic on unsupported versions of browsers other than Safari will probably be fairly negligible today, at least for core international markets.
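As a minimal sketch of the mechanism (not any particular vendor’s implementation), a hand-rolled RUM snippet might read the W3C navigation metrics after page load and return them, together with basic device context, via a beacon. The collector endpoint shown is a hypothetical placeholder:

    // Minimal RUM-style beacon sketch; '/rum-collector' is a placeholder endpoint.
    window.addEventListener('load', () => {
      // Defer briefly so that loadEventEnd has been populated.
      setTimeout(() => {
        const [nav] = performance.getEntriesByType('navigation');
        const payload = {
          page: location.pathname,
          domContentLoaded: nav ? nav.domContentLoadedEventEnd : null,
          onload: nav ? nav.loadEventEnd : null,
          // Device/context dimensions typically reported alongside the timings:
          userAgent: navigator.userAgent,
          screen: screen.width + 'x' + screen.height,
          connection: navigator.connection ? navigator.connection.effectiveType : 'unknown'
        };
        navigator.sendBeacon('/rum-collector', JSON.stringify(payload));
      }, 0);
    });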

Figure 2-5 shows a typical RUM dashboard, illustrating near real-time split of visitor traffic by geography, devices, operating system, etc.

Figure 2-5. A typical RUM dashboard (Credit: AppDynamics)

Modern RUM tooling increasingly captures some information—subject to certain technical limitations outside the scope of this book—at object level as well (or it can be modified to do so). A useful capability, available in some tools, is the ability to introduce custom end points. If supported, these can be coordinated with appropriately modified synthetic tests (see “What Are You Measuring? Defining Page-Load End Points”), providing the ability to read across between active and passive test results. Figure 2-6 illustrates a useful end user monitoring report. The table shows the variation of mobile visitor traffic response for all the target pages on a site.

A further useful capability in some RUM tools is event timing. Event timing involves the placing of flags to bracket and record specific user-invoked events (e.g., the invocation of a call to a payment service provider as part of an ecommerce purchase).
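Where such event timing is supported (or hand-rolled), the standard User Timing API provides the bracketing mechanism. In the sketch below, the payment-provider function and its URL are stand-ins for a real integration; many RUM agents collect User Timing measures automatically, and failing that the measure can be beaconed explicitly:

    // Sketch: bracket a user-invoked event (a call to a payment service
    // provider) with User Timing marks so its duration can be reported.
    // callPaymentProvider() and '/api/psp/charge' are placeholders.
    const callPaymentProvider = (order) =>
      fetch('/api/psp/charge', { method: 'POST', body: JSON.stringify(order) });

    async function submitPayment(order) {
      performance.mark('psp-call-start');
      try {
        return await callPaymentProvider(order);
      } finally {
        performance.mark('psp-call-end');
        performance.measure('psp-call', 'psp-call-start', 'psp-call-end');
      }
    }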

Figure 2-6. APM End User Monitoring: mobile visitors by usage and response

Performance APIs

I should include a few words on the use of performance-centric APIs. This includes “traditional” navigation flags—DOM Ready, page unload, etc.—that have been around for a few years now, along with more leading-edge developments, such as the sendBeacon method (already referenced regarding monitoring service worker/push content), the Event.timeStamp property, and others. The only negative to introducing timing APIs in this book is that doing so moves us toward the “dev” end of the spectrum and away from an introduction to day-to-day operations. Failure to exploit them, however, will prove a serious limitation to effective performance practice going forward, so awareness and, if possible, adoption is increasingly important.

Network timing attributes are collected for each page resource. Navigation and resource timers are delivered as standard in most modern browsers for components of “traditional” page downloads. User interaction and more client-centric designs (e.g., SPAs), however, require event-based timers. Basic custom timers introduce a timing mark() at points within the page/code. Your RUM tooling should be able to support these, and they enable read-across between different tooling (e.g., aligning custom user-experience end points in RUM with browser fill time end points in synthetic measurements). Not all RUM products do support these, however, so this is an important aspect to understand when making a product purchase decision. Other APIs have been developed to support things like image rendering and frame timing, which are important if seeking to ensure smooth user experiences.
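As an illustration of the resource timers mentioned above, the sketch below uses a PerformanceObserver to watch per-resource network timings as they arrive, with a simple feature-detection guard for browsers that lack the API; the third-party hostname is a hypothetical placeholder:

    // Sketch: observe per-resource timings, filtering for a (hypothetical)
    // third-party host. Note that cross-origin resources expose full detail
    // only when served with a Timing-Allow-Origin header.
    if ('PerformanceObserver' in window) {
      const observer = new PerformanceObserver((list) => {
        for (const entry of list.getEntries()) {
          if (entry.name.includes('thirdparty.example.com')) {
            console.log(entry.name, 'duration (ms):', entry.duration,
                        'transferred (bytes):', entry.transferSize);
          }
        }
      });
      observer.observe({ type: 'resource', buffered: true });
    } else {
      // No observer support: fall back to the static entry buffer,
      // or accept the blind spot for these users.
      console.log(performance.getEntriesByType('resource').length, 'resource entries');
    }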

Browser support cannot be taken for granted, particularly with the newer APIs. It is important to be aware of which browsers support each method, as you will be “blind” with respect to the performance of users with non-supported technologies. In certain cases (e.g., Opera in Russia, or Safari for media-centric user bases), this can introduce serious distortions to results interpretation.

A useful primer for Web Performance Timing APIs, which also contains links to further specialist information in this evolving area, can be found here. Figure 2-7 shows the extremely valuable “Can I Use” reference site. This provides a regularly updated lookup of which browsers currently support a particular performance timing API.

Figure 2-7. Browser support for resource timing API (Credit: Can I Use, May 2016)

Monitoring Mobile Devices

The increasing traffic from mobile device users makes careful consideration of the end user experience a key part of most current FEO efforts. Investigation typically uses a combination of emulation-based analysis (including browser developer tools and rules-based screening such as Google’s PageSpeed Insights—see “Rules-based screening”) and real device testing.
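Rules-based screening can also be scripted rather than run interactively. The sketch below calls the PageSpeed Insights web API (endpoint and parameters as documented for version 5 at the time of writing); the target URL is a placeholder, and an API key is advisable for anything beyond light usage:

    // Sketch: rules-based screening via the PageSpeed Insights web API.
    // Exact response fields vary by API version, so only the top-level
    // structure is inspected here. Requires an environment with fetch.
    const target = encodeURIComponent('https://www.example.com/');
    const endpoint = 'https://www.googleapis.com/pagespeedonline/v5/runPagespeed'
      + '?url=' + target + '&strategy=mobile';

    fetch(endpoint)
      .then((res) => res.json())
      .then((report) => console.log(Object.keys(report)))
      .catch((err) => console.error('PageSpeed Insights request failed:', err));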

The key advantage of testing from real mobile devices as opposed to spoofed user-string/PC-based testing is that the interrelationship between device metrics and application performance can be examined. As discussed in “Active Monitoring: Key Considerations”, ensuring good, known control conditions is essential. This applies to connectivity (bandwidth, SIM/mobile phone network, or WiFi) and device environment. Both are crucial to effective interpretation of results.

Most cross-device tools are designed for functional (or, in some cases, load) testing rather than performance testing per se. This limits their value. The choices are:

  • Limiting investigation to browser-based developer tools.

  • Building/running an in-house device lab with access to presentation layer timings and device system metrics (not as trivial an undertaking as it may seem).

  • Using a commercial tool—these are thin on the ground, but a few exist.

  • Using the real device testing offered by performance vendors. Look before you leap—the devil is in the detail!

Four approaches to understanding the performance of native mobile applications are possible:

  • Consider a commercial tool (if you can find one—they may emerge)

  • Instrument the application code using a Software Development Kit (this is the approach adopted by the APM vendors). Typically, these are stronger on end-user visibility than on control of test conditions or range of device metrics. Crash analytics can be useful, if included (not all provide this).

  • Use a Network Packet Capture (PCAP) approach, analyzing the initial download size and ongoing network traffic between the user device and origin. This is the approach taken by the (open source) AT&T ARO tool.

  • Build your own in-house device lab with API hooks into device hardware metrics like CPU utilization, memory, battery state, etc. Figure 2-8 is a trace of CPU utilization (user, kernel, and total) by a mobile device during a repeated (three-phase) test transaction; a minimal metric-polling sketch follows the figure.

Figure 2-8. CPU utilization trace during test transaction on an Android device
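As a minimal illustration of the device-lab approach, basic system metrics can be polled from an attached Android device over adb while a test transaction runs. The sketch below assumes Node.js, a device with USB debugging enabled, and adb on the PATH; the package name is a placeholder, and dumpsys output formats vary by Android version, so treat the parsing as illustrative:

    // Sketch: poll Android device metrics over adb during a test transaction.
    // 'com.example.app' is a placeholder package name.
    const { execSync } = require('child_process');

    function sample() {
      const cpuTotal = execSync('adb shell dumpsys cpuinfo').toString()
        .split('\n').find((line) => line.includes('TOTAL'));
      const battery = execSync('adb shell dumpsys battery').toString()
        .match(/level: (\d+)/);
      const memory = execSync('adb shell dumpsys meminfo com.example.app').toString()
        .match(/TOTAL\s+(\d+)/);
      console.log(new Date().toISOString(), {
        cpu: cpuTotal && cpuTotal.trim(),      // user + kernel + total summary line
        batteryLevel: battery && battery[1],   // percent
        appMemoryKb: memory && memory[1]       // resident memory (KB)
      });
    }

    setInterval(sample, 5000); // sample every five seconds during the run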

Whichever approach you take, having defined your control conditions within the constraints of the tool/approach selected, the key aspects are:

  • Timeline: understand the interrelationship between the various delivery components, such as JavaScript processing, image handling, etc., and CPU utilization

  • System metrics when delivering cached and uncached content. These include:

    • CPU—OS (kernel), user, total

    • Memory—free, total

    • Battery state

    • Signal strength

  • Crash analytics

  • Impact of third-party content

Key Categories of Mobile Monitoring

There is a core distinction between emulation and real device testing.

Emulation testing

Emulation testing has the advantage of convenience and the ability to rapidly test delivery across a wide variety of device types. It also uses a consistent, powerful, PC-based platform. This can be useful depending on the precise nature of the testing. Emulation consists of “spoofing” the browser user string such that the request is presented to the target site as coming from a mobile device. Given that it is important to replicate (a range of) realistic user conditions to gain an understanding of actual performance in the field, the most useful tools will permit comparison across a variety of connection types and bandwidths, including hardwired, WiFi, and public carrier network. Figure 2-9 shows a 20% page-size variation at fixed bandwidth for an Android smartphone over a public wireless carrier. Such variation can be much larger on occasion, and with careful analysis the root cause can usually be pinpointed. Bear in mind that such effects can be introduced by specific carriers as well as (more commonly) by the delivery infrastructure. Failure of third-party content is relatively common at high-traffic periods (e.g., betting sites during the Grand National steeplechase in the UK, retail sites during Black Friday weekend) due to the inability of a specific third party to cope with the aggregate delivery demands across the sector.

Figure 2-9. Page-size variation at fixed bandwidth (public wireless carrier, Android smartphone)

Many tools (e.g., WebPageTest, browser dev tools, etc.) only offer hardwired connectivity throttled to provide a range of connection speeds. This can be appropriate during deep-dive analysis. It is, however, insufficient for monitoring comparisons. Figure 2-10 illustrates the wide range of “devices” available for emulation-based testing within one market-leading browser-based developer tool.

Figure 2-10. Emulation testing: device selection (Chrome Developer Tool)
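For deep-dive analysis, this kind of emulation and throttling can also be scripted directly. The sketch below assumes Node.js with the Puppeteer package (not otherwise discussed in this chapter); it spoofs a mobile user-agent string, sets a handset-like viewport, and throttles the connection via the Chrome DevTools Protocol. The user-agent string, target URL, and throughput figures are illustrative placeholders:

    // Sketch: emulation-style test run with a spoofed mobile user agent and
    // a throttled connection (assumes the 'puppeteer' npm package).
    const puppeteer = require('puppeteer');

    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();

      // Present the request to the site as a mobile device (placeholder UA).
      await page.setUserAgent('Mozilla/5.0 (Linux; Android 10) AppleWebKit/537.36 Mobile Safari/537.36');
      await page.setViewport({ width: 360, height: 640, isMobile: true, hasTouch: true });

      // Throttle the connection via the DevTools Protocol (roughly "3G-like" values).
      const client = await page.target().createCDPSession();
      await client.send('Network.enable');
      await client.send('Network.emulateNetworkConditions', {
        offline: false,
        latency: 150,                                // ms round trip
        downloadThroughput: 1.5 * 1024 * 1024 / 8,   // ~1.5 Mbps in bytes/s
        uploadThroughput: 750 * 1024 / 8
      });

      await page.goto('https://www.example.com/', { waitUntil: 'load' });
      // onload is reported here for brevity only; a custom end point is a
      // better reflection of user experience, as discussed earlier.
      const loadTime = await page.evaluate(
        () => performance.getEntriesByType('navigation')[0].loadEventEnd
      );
      console.log('onload (ms):', loadTime);

      await browser.close();
    })();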

Real device monitoring

Testing from real mobile devices has several advantages. Access to the graphical user interface for script recording (supported by some tools) enables visual end-point recording. Transactions may be recorded, not only for websites but also native mobile applications. A further advantage of testing from real devices is enhanced control and understanding of the performance influence of device characteristics.

The performance delivered to a given device is likely to be influenced by system constraints. These may be inherent (e.g., processor and memory capacity, operating system version) or dynamic (battery state, memory and CPU utilization, etc.). In addition, user behavior and environmental factors can have a significant influence—everything from applications running in the background and the number of browser tabs open to the ambient temperature. Figure 2-11 illustrates device selection within a tool supporting real-device testing.

Figure 2-11. Testing from real device—device selection (Credit: Perfecto)

It’s that control word again—the more accurate your modeling of test conditions (particularly edge states), the more accurate and relevant your interpretation will become.

Monitoring and Analysis of Native Mobile Applications

Two approaches are possible here. For monitoring/visitor analysis, the most widely used approach (and that adopted by APM tooling) is to provide measurement based on a Software Development Kit (SDK). The application is instrumented by introducing libraries to the code via the SDK. The degree of visibility can usually be extended by introducing multiple timing points, such as for the various user interactions across a logical transaction. Errors are reported, usually together with some crash data.

All the major vendors support both Android and iOS. Choices for other operating systems (such as RIM or Windows Mobile) are much more limited due to their relatively small market share. This limitation should be borne in mind when making decisions about which mobile operating systems to support when creating a native mobile application. Figure 2-12 shows the use by an APM tool of an SDK code snippet for mobile application instrumentation.

Other tools exist for point analysis of native apps. AT&T’s Application Resource Optimizer (ARO) utility is a useful (open source) example. This utility screens applications against best practice in 25 areas based on initial download size and network interactions (PCAP analysis) via a Virtual Private Network (VPN) probe. Figure 2-13 shows the rules-based RAG (red/amber/green) output offered within this valuable open source tool.

Figure 2-12. Mobile instrumentation using Software Development Kit (Credit: New Relic)
Figure 2-13. Rules-based best practice analysis (25 parameters) for native mobile apps (AT&T ARO)

Most modern APM tools offer both synthetic and passive external monitoring to support end-to-end visibility of user transactions. Although it is possible to integrate “foreign” external monitoring into an APM backend, this is unlikely to repay the effort and maintenance overhead. The key advantage of using the APM vendor’s own end-user monitoring is that the data collected is automatically integrated with the core APM tool. The great strength of APM is the ability to provide a holistic view of performance. The various metrics are correlated, supporting a logical drilldown from an end-user transaction to root cause, whether application code or infrastructure.

Be aware, however, that correlation between frontend and backend is unlikely to be at individual user-session level—the RUM data may be, but the backend comparison will usually be to an aggregated time slot. This is probably OK for gross errors, but becomes more limiting the more episodic the issue is. It is important to understand any limitations of the RUM and active test capabilities offered, both to assist in accurate interpretation and to make provisions for supplementary tooling to support deep-dive FEO analytics.

Ultimately, the strength of an APM lies in its ability to monitor over time against defined KPIs and Health Rules, to understand performance trends and issues as they occur, and to rapidly isolate the root cause of such issues. These are powerful benefits, but they do not extend to detailed client-side analysis against best-practice performance-by-design principles. Such analysis is best undertaken as a standalone exercise, using independent tools designed specifically with it in mind. The key use case for APM is to support an understanding of “now” in both absolute and relative terms, and to support rapid issue isolation/resolution when problems occur. Figure 2-14 illustrates the ability, offered by most APM tooling, to correlate—to some extent—frontend and backend performance.

Figure 2-14. Frontend/backend correlation—RUM timings, link (highlighted) to relevant backend data (APM tooling example)