Chapter 10

Performance Tuning

There is an industry perception that Flash technology is slow. This has been reinforced by negative statements in the media, such as Apple CEO Steve Jobs's “Thoughts on Flash,”1 where he stated that “Flash has not performed well on mobile devices.”

While it is possible to write slow-performing applications in Flash or any other mobile technology, with the right tools and techniques you can also create fast, responsive applications. Just like native applications, Flash technology lets you take advantage of hardware-accelerated video playback and GPU-accelerated rendering. Proper use of these techniques can significantly speed up your application's performance and reduce the drain on the battery.

It is also very easy to fall into the trap of taking existing content optimized for desktop use and misusing it for mobile applications. Mobile devices have smaller screens, slower processors, less memory, and often slower or unreliable network connectivity. If you build your application with these constraints in mind and test often on your target mobile device, you will have much better results.

In this chapter, you will learn how the Flash runtime works in enough detail to understand the critical factors that constrain your application performance. We will then dig into several different performance-sensitive areas, including images, media, and text. Along the way, we will go over new APIs in ActionScript and Flex that were introduced specifically to optimize mobile content, and that you should take advantage of.

There will always be poorly written Flash applications for the detractors to point out as examples of why Flash is not fit for mobile. However, by following the advice and guidelines in this chapter, you will ensure that your application is not one of them.

_______________

1 Apple, “Thoughts on Flash,” www.apple.com/hotnews/thoughts-on-flash/, April 2010

Mobile Performance Tuning Basics

Tuning the performance of mobile applications is not much different from tuning desktop applications, and it breaks down into the same three fundamental considerations:

  • Execution time
  • Memory usage
  • Application size

Execution time is CPU cycles spent by your application on processing prior to each frame being displayed. This could be application logic that you wrote to prepare or update the content, network I/O where your application is waiting on a response from an external server, or time spent in the underlying Flash runtime for validation or rendering of graphics.

Memory is the amount of device RAM that you are using while your application is running. This will typically grow over the duration of your application's execution until it hits a steady state where no additional objects are being created or the number of new objects roughly equals the number of freed objects. Continual growth of memory might indicate a memory leak where resources are not being freed or invisible/offscreen objects are not dereferenced.

Mobile devices add an additional level of complexity, with memory limitations both on the main system and on the GPU. Garbage collection also factors into this, because the memory in use will often be double what is actually needed as the collector copies over live objects to free unused memory.

Application size is an important consideration as well, because it affects both the initial download of your application from Android Market and its startup performance. Compiled ActionScript is actually quite compact, so static assets, such as images and video that you embed in your project, usually dominate the size of the application.

All of these factors are important in determining the overall performance of your application. However, what matters more than the absolute measures of execution time, memory, and application size is performance as perceived by your end users.

Perceived vs. Actual Performance

If you have written an application that is in widespread use, you have probably experienced user dissatisfaction with performance. For every user that complains about slow performance, there are tens or hundreds who give up or stop using the application instead of reporting an issue.

This correlation between slow application performance and low usage and user satisfaction has been substantiated by research done by John Hoxmeier and Chris DiCesare at Colorado State University.2 Through testing with a control group of 100 students, they found support for the following hypotheses:

  • Satisfaction decreases as response time increases
  • Dissatisfaction leads to discontinued use
  • Ease of use decreases as satisfaction decreases

Though they were testing with a web-based application, these findings are highly analogous to what you will experience with a rich client application built on the Flash platform. In this study, responses that took six seconds or less were perceived as being powerful and fast enough, while responses that took nine seconds or more were rated highly unfavorably.

NOTE: In this study, they also disproved the hypothesis that expert users were more likely to tolerate slower response times, so don't assume that this research does not apply to your application.

So how fast does your application need to be in order to satisfy users? According to Ben Shneiderman,3 you should stay within the following bounds:

  • Typing, cursor motion, mouse selection: 50–150 milliseconds
  • Simple frequent tasks: 1 second
  • Common tasks: 2–4 seconds
  • Complex tasks: 8–12 seconds

In addition, giving users feedback about long-running tasks with a progress bar or a spinning wait icon makes a huge difference to their willingness to wait. Beyond the 15-second mark, this is absolutely crucial to ensure the user will wait or come back after context switching.

So what does this mean for your Flash application?

Flash applications typically make use of animations and transitions to improve the user experience. If you plan to make use of these, they need to have relatively high frame rates in order to give the user the impression that the application is responding quickly. The goal for these should be around 24 frames per second, or roughly 42 milliseconds per frame, which is the minimum frame rate for users to perceive animation as being smooth. We talk more about how you can tune rendering performance to hit this in the next section.

_________________

2 John A. Hoxmeier and Chris DiCesare, “System Response Time and User Satisfaction: An Experimental Study of Browser-Based Applications.” AMCIS 2000 Proceedings (2000). Paper 347.

3 Ben Shneiderman, “Response time and display rate in human performance with computers.” Computing Surveys 16 (1984), p. 265–285.

For frequent tasks, such as showing details, submitting forms, or drag-and-drop, you should target under one second of response time. Flash applications have a distinct advantage over web applications in performing these operations since they can give the user immediate feedback while executing tasks in the background to retrieve or send data.

Common tasks, such as loading a new page or navigating via a tab or link can take longer, but should be accompanied by an indeterminate progress indicator to let the user know that activity is happening. In addition, judicious use of transition animations can make the loading seem to occur faster than it actually does.

Complex tasks, such as searching or populating a large list of data, can take longer, but should either be bounded to complete in less than twelve seconds or provide a progress bar that indicates how long the task will take to complete. Often it is possible to display intermediate results, such as partial search results or the first few pages of data. This will allow the user to continue using the application while additional data is loaded in the background, dramatically changing the perceived wait time.
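
As a rough illustration of the feedback described above, the following sketch loads data in the background with URLLoader and updates a label as bytes arrive. It is written as a script-block fragment for a Flex view. The URL, the statusLabel control (assumed to be a Spark Label with that id), and the handler names are placeholders invented for this example and do not come from any listing in this chapter.

```actionscript
import flash.events.Event;
import flash.events.IOErrorEvent;
import flash.events.ProgressEvent;
import flash.net.URLLoader;
import flash.net.URLRequest;

// Hypothetical endpoint; substitute your own service.
private const DATA_URL:String = "http://example.com/results.xml";

private function loadInBackground():void {
  var loader:URLLoader = new URLLoader();
  loader.addEventListener(ProgressEvent.PROGRESS, onProgress);
  loader.addEventListener(Event.COMPLETE, onComplete);
  loader.addEventListener(IOErrorEvent.IO_ERROR, onError);
  // Give the user feedback immediately, before any network I/O completes.
  statusLabel.text = "Loading...";
  loader.load(new URLRequest(DATA_URL));
}

private function onProgress(e:ProgressEvent):void {
  // bytesTotal is 0 when the server does not report a content length,
  // so only show a percentage when it is known.
  if (e.bytesTotal > 0) {
    var percent:int = Math.round(100 * e.bytesLoaded / e.bytesTotal);
    statusLabel.text = "Loading " + percent + "%";
  }
}

private function onComplete(e:Event):void {
  var data:String = String(URLLoader(e.target).data);
  statusLabel.text = "Loaded " + data.length + " characters";
}

private function onError(e:IOErrorEvent):void {
  statusLabel.text = "Load failed";
}
```

Because the load is asynchronous, the UI stays responsive while the request is in flight, and the label acts as a simple determinate progress indicator whenever the server reports a total size.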

Tuning Graphics Performance

At its core, the Flash runtime is a frame-based animation engine that processes retained mode graphics. Even if you are building applications using higher-level frameworks such as Flex, it is helpful to understand the rendering fundamentals of the Flash Player so that you can tune your processing and content for the best performance.

The heartbeat of the Flash engine is the frames-per-second setting, which controls how many frames get drawn onscreen each second. While performance bottlenecks may cause the number of frames per second to get reduced, there will never be more than this number of frames processed.

Many graphics toolkits use what is called immediate mode rendering to draw to screen. In immediate mode rendering, the application implements a callback where it has to redraw the contents of the screen each clock cycle. While this is conceptually simple and close to what the hardware implements, it leaves the job of saving state and providing continuity for animations up to the application developer.

Flash uses retained mode graphics where you instead build up a display list of all the objects that will be rendered on the screen, and let the framework take care of rendering and blitting the final graphics each clock cycle. This is better suited to animation and graphics applications, but can be more costly in resources based on the size and complexity of the display list.
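
To make the retained mode idea concrete, here is a minimal sketch, assuming a plain ActionScript project whose document class extends Sprite. The class name, the ball shape, and the 320-pixel wrap width are illustrative choices, not part of this chapter's sample application. The shape is drawn and added to the display list once; each frame the code changes only a property, and the runtime takes care of redrawing.

```actionscript
package {
  import flash.display.Shape;
  import flash.display.Sprite;
  import flash.events.Event;

  // Hypothetical document class for illustration only.
  public class RetainedModeSketch extends Sprite {
    private var ball:Shape;

    public function RetainedModeSketch() {
      // Keep the constructor light and do the setup in a regular method.
      setUpScene();
    }

    private function setUpScene():void {
      ball = new Shape();
      ball.graphics.beginFill(0x3366CC);
      ball.graphics.drawCircle(0, 0, 20);
      ball.graphics.endFill();
      ball.y = 100;
      addChild(ball);   // the object now lives in the display list
      addEventListener(Event.ENTER_FRAME, onFrame);
    }

    private function onFrame(e:Event):void {
      // No drawing calls here: only state changes. The runtime re-renders
      // the retained shape at its new position.
      ball.x = (ball.x + 2) % 320;
    }
  }
}
```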

The Elastic Racetrack

Ted Patrick came up with a very useful conceptual model for how the Flash Player handles rendering, which he called the Elastic Racetrack.4 Shown in Figure 10–1, this model splits the work in each frame between code execution and rendering.

Figure 10–1. The Flash Player Elastic Racetrack

Code execution is the time spent running any ActionScript associated with that frame, including event handlers that get fired for user input, the Timer, and ENTER_FRAME events. Rendering includes processing done by the Flash Player to prepare the display list, composite images, and blit the graphics to the screen. To keep a steady frame rate, the total duration of these two activities cannot exceed the time slice allocated for that frame.

So how much time do you have to execute all your logic? Table 10–1 lists some common frame rates and how many milliseconds you have to process both code execution and rendering.

images
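
The time slice for a frame is simply 1000 milliseconds divided by the frame rate. The following fragment is a small sketch of that arithmetic; the function names and the particular frame rates are chosen for illustration and are not necessarily the ones listed in Table 10–1.

```actionscript
// Time available per frame, in milliseconds, for a given frame rate.
private function frameBudgetMs(fps:Number):Number {
  return 1000 / fps;
}

private function traceBudgets():void {
  var rates:Vector.<int> = new <int>[12, 24, 30, 60];
  for each (var fps:int in rates) {
    trace(fps + " fps -> " + frameBudgetMs(fps).toFixed(1) + " ms per frame");
  }
  // 12 fps -> 83.3 ms per frame
  // 24 fps -> 41.7 ms per frame
  // 30 fps -> 33.3 ms per frame
  // 60 fps -> 16.7 ms per frame
}
```

Both code execution and rendering have to fit inside this budget, so at 60fps you have under 17 milliseconds for everything that happens in a frame.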

_________________

4 Ted Patrick, “Flash Player Mental Model - The Elastic Racetrack,” http://ted.onflash.org/2005/07/flash-player-mental-model-elastic.php, July 2005

The default frame rate for the Flash Player is 24fps; anything lower than this is noticeably choppy or laggy to the user. However, users can easily perceive frame rate differences up to 60fps, especially in tasks where there is a large amount of motion or scrolling. Shooting for frame rates above 60fps is usually not worthwhile, especially considering that most LCDs are capped at a refresh rate of 60 Hz and some devices have their maximum frame rate capped at 60fps.

When trying to diagnose a slow frame rate, the first step is to determine whether you are constrained by long code execution or slow rendering. Code execution is the easier of the two to profile since it is under your control, and if it approaches or exceeds the total frame length for your target frame rate, this is where you will want to start your optimization.

Reducing Code Execution Time

If your code execution time takes slightly longer than a single frame cycle, you may be able to get enough performance by optimizing your code. This will vary based on whether you are doing pure ActionScript or building on Flex. Also, if you are doing a complex or long-running operation, a different approach may be needed.

Some common ActionScript code performance best practices that are worth investigating include the following:

  • Prefer Vectors over Arrays: The Vector datatype is highly optimized and much faster than doing basic list operations using Arrays. In some cases, such as large, sparse lists, Arrays will perform better, but this is a rare exception.
  • Specify strong types wherever possible: ActionScript is dynamically typed, allowing you to leave off explicit type information. However, when provided, type information can allow the compiler to generate more efficient code.
  • Keep constructors light: The just-in-time (JIT) compiler does not optimize code in variable initializers or constructors, forcing the code to run in interpreted mode. In general, object construction is expensive and should be deferred until the elements become visible onscreen.
  • Use binding judiciously: Binding introduces an extra level of overhead that makes sense for updating the UI, but should be avoided elsewhere.
  • Regular expressions are costly: Use regular expressions sparingly and for validating data. If you need to search, String.indexOf is an order of magnitude faster.
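
Two of the practices above can be sketched in a few lines. The helper names sum and containsTerm are made up for this example; the fragment shows one way to apply typed Vectors, strongly typed locals, and String.indexOf, and is not a benchmark.

```actionscript
// Typed Vector plus explicitly typed locals, so the compiler knows every type.
private function sum(values:Vector.<Number>):Number {
  var total:Number = 0;
  var n:int = values.length;   // read the length once, outside the loop
  for (var i:int = 0; i < n; i++) {
    total += values[i];
  }
  return total;
}

// Plain substring search with indexOf instead of a regular expression.
private function containsTerm(text:String, term:String):Boolean {
  return text.indexOf(term) != -1;
}
```

For example, sum(new <Number>[1.5, 2.5, 4]) returns 8, and containsTerm("performance tuning", "tun") returns true.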

If you are writing a Flex application, you will want to look into the following in addition:

  • Minimize nesting of groups and containers: Measurement and layout of large object graphs can be very expensive. By keeping your containment as flat as possible, you will speed up your application. This is particularly important when building grid or list renderers that will be reused repeatedly.
  • Prefer groups over containers: The new Spark graphics library was redesigned with performance in mind. As a result, groups are very lightweight in comparison with containers and should be used for layout instead.

If code execution is still the bottleneck after tuning your code, you may want to look into splitting up the workload over multiple frames. For example, if you are doing a hit detection algorithm, it may not be possible to check all the objects within a single frame. However, if you can group objects by region and process them incrementally, the work can be spread over multiple frames, increasing your application's rendering speed.
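
One way to spread work over multiple frames is to process items from a queue until a per-frame time budget runs out, and then resume on the next ENTER_FRAME event. The sketch below assumes it lives in a display object or Flex component; the 10-millisecond budget and the names pending, processSlice, and checkItem are illustrative assumptions, and checkItem stands in for whatever per-object test your application performs.

```actionscript
import flash.events.Event;
import flash.utils.getTimer;

// Assumed share of each frame reserved for this work; tune for your target frame rate.
private const BUDGET_MS:int = 10;

private var pending:Vector.<Object> = new Vector.<Object>();
private var nextIndex:int = 0;

private function startProcessing(items:Vector.<Object>):void {
  pending = items;
  nextIndex = 0;
  addEventListener(Event.ENTER_FRAME, processSlice);
}

private function processSlice(e:Event):void {
  var start:int = getTimer();
  // Work until the queue is empty or this frame's budget is used up.
  while (nextIndex < pending.length && getTimer() - start < BUDGET_MS) {
    checkItem(pending[nextIndex]);
    nextIndex++;
  }
  if (nextIndex >= pending.length) {
    // All items handled; stop listening so no further frames are spent here.
    removeEventListener(Event.ENTER_FRAME, processSlice);
  }
}

private function checkItem(item:Object):void {
  // Placeholder for the real per-object work, such as a hit test.
}
```

Using a time budget rather than a fixed item count lets the same code adapt to faster and slower devices.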

Speeding Up Rendering

When running on the CPU, Flash uses a highly optimized retained mode software renderer for drawing graphics to the screen. To render each frame, it goes through the list of all the objects in your DisplayList to figure out what is visible and needs to be drawn.

The software renderer scans line by line through the update region, calculating the value of each pixel by looking at the ordering, position, and opacity of each element in the DisplayList. Figure 10–2 contains a sample graphic created in Flash with several layers of text and graphics composited to create an image.

Figure 10–2. Sample Flash graphic of internal organs5

When placed in a Stage, this scene would have a DisplayList similar to that shown in Figure 10–3.

Figure 10–3. DisplayList for the sample Flash organ graphic

____________

5 Graphics based on public domain organ library: Mikael Häggström, “Internal Organs,” http://commons.wikimedia.org/wiki/File:Internal_organs.png

During the rendering phase, Flash would use this DisplayList to determine how to draw each pixel on the screen. Since the graphics are opaque and the nesting is only three levels deep, this would render very quickly onscreen. As the complexity of your DisplayList goes up, you need to pay careful attention to the type of objects used in your application, and the effects applied to them.

Some of the ways that you can improve your application rendering performance include the following:

  • Keep your DisplayList small: A well-pruned DisplayList will help the Flash renderer to save memory and execution time scanning the hierarchy. If objects are no longer in use, make sure to remove them from their parent. Otherwise you can hide and show individual elements by changing their visibility on the fly.
  • Use appropriate object types: A Shape or Bitmap is the smallest object in the DisplayList, consuming only 236 bytes. Sprites are more heavyweight, with features for interaction and event handling that take 412 bytes. MovieClips are the most expensive objects in the scene at 440 bytes, plus additional overhead to support animation. To speed up rendering, you should choose the least complex object type that meets your needs.
  • Avoid alpha, masking, filters, and blends: The Flash rendering engine cannot make certain optimizations if you use these features, which slows down the rendering performance. Rather than using alpha to hide and show objects, use the visibility flag instead. Masking is very expensive, and can often be substituted with simple cutouts or layering of the scene. Blend modes are particularly expensive and should be avoided whenever possible.
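
The guidelines above translate into very little code. The following fragment is a sketch for plain ActionScript display objects, with helper names invented for this example. In a Flex application you would remove elements from Spark containers with removeElement rather than removeChild.

```actionscript
import flash.display.DisplayObject;
import flash.display.Shape;

// Remove an object that is no longer in use from the display list.
private function detach(obj:DisplayObject):void {
  if (obj.parent != null) {
    obj.parent.removeChild(obj);
  }
}

// Hide or show an object with the visible flag instead of setting alpha to 0.
private function setShown(obj:DisplayObject, shown:Boolean):void {
  obj.visible = shown;
}

// A non-interactive decoration only needs a Shape, not a Sprite or MovieClip.
private function makeBadge(color:uint, radius:Number):Shape {
  var badge:Shape = new Shape();
  badge.graphics.beginFill(color);
  badge.graphics.drawCircle(radius, radius, radius);
  badge.graphics.endFill();
  return badge;
}
```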

If you are developing a Flex-based application, you will want to pay careful attention to your use of UIComponents, GraphicElements, and FXG. Table 10–2 lists the trade-offs of using these different object types.

images

UIComponents are the most complex object types in Flex and can significantly impact your rendering performance, especially if used extensively within a table or list renderer. GraphicElements and FXG are both very lightweight, and the renderer can optimize them significantly. FXG has a slight performance edge since it is compiled down to graphics when the application is built, as opposed to GraphicElements, which need to be processed at runtime.

A common mistake in mobile development is to develop exclusively in the desktop emulator and wait until the application is almost complete to start testing on device. If you wait until you have an extremely complex DisplayList, it will be very difficult to figure out which elements are contributing to the slowdown. On the other hand, if you are testing regularly as you build out the application, it will be very easy to diagnose which changes affect the performance the most.

Scene Bitmap Caching

Another technique that you can use to speed up rendering performance at the expense of memory is scene bitmap caching. Flash has built-in support via the cacheAsBitmap and cacheAsBitmapMatrix properties to easily capture and substitute static images in place of a completed scene hierarchy. This is particularly important on mobile devices where vector graphics operations are much slower and can significantly impact your performance.

cacheAsBitmap

cacheAsBitmap is a Boolean property of DisplayObject, so every visual element you use in Flash and Flex, including Sprites and UIComponents, has access to it. When set to true, each time the DisplayObject or one of its children changes, it will take a snapshot of the current state and save it to an offscreen buffer. Then for future rendering operations it will redraw off the saved offscreen buffer, which can be orders of magnitude faster for a complicated portion of the scene.

To enable cacheAsBitmap on a DisplayObject, you would do the following:

cacheAsBitmap = true;

Flex UIComponents have a cache policy that will automatically enable cacheAsBitmap based on a heuristic. You can override this behavior and force cacheAsBitmap to be enabled by doing the following:

cachePolicy = UIComponentCachePolicy.ON;

Turning on cacheAsBitmap is an important technique when you have complex graphics that change infrequently, such as a vector-rendered background. Even though the background is static, other elements that move around it can trigger an update when they overlap and obscure portions of it. Also, simple translations, such as scrolling the background, will cause an expensive redraw operation.
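
As a sketch of the vector-rendered background case, the function below draws a moderately complex background once and asks the runtime to cache it. The function name, the circle count, and the size ranges are arbitrary choices made for this example.

```actionscript
import flash.display.Sprite;

// Builds a static, vector-drawn background and enables bitmap caching on it.
private function createBackground(w:Number, h:Number):Sprite {
  var bg:Sprite = new Sprite();
  for (var i:int = 0; i < 50; i++) {
    bg.graphics.beginFill(0xFFFFFF * Math.random());
    bg.graphics.drawCircle(Math.random() * w, Math.random() * h,
                           10 + Math.random() * 30);
    bg.graphics.endFill();
  }
  // The background never needs input events, so skip hit testing on it.
  bg.mouseEnabled = false;
  bg.mouseChildren = false;
  // Render the vectors once into an offscreen bitmap and reuse that bitmap
  // when other objects move across it or when it is translated.
  bg.cacheAsBitmap = true;
  return bg;
}
```

The cached bitmap is regenerated whenever the contents of the background change, so this only pays off when the drawing itself stays the same from frame to frame.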

To figure out what portions of the screen are being repainted on each frame redraw by your application, you can enable showRedrawRegions with the following code:

flash.profiler.showRedrawRegions(true);

This will draw red rectangles around the screen areas that are being actively updated, and can be turned on and off programmatically. Figure 10–4 shows an example of a CheckBox control that lets you toggle redraw regions on and off. The control has recently been clicked, so it has a red rectangle drawn around it.

Figure 10–4. Example of the redraw region debugging feature

This option is available only in the debug player, so it will work in the AIR Debug Launcher while testing your application, but will not work when deployed in a runtime player, such as on a mobile device. Figure 10–4 also demonstrates a very simple frames-per-second monitor that can be used to benchmark your Flex application performance while under development. The full code for both of these is shown in the upcoming section on building the Flash Mobile Bench application.

While cacheAsBitmap is a very powerful tool for optimizing the redraw of your application, it is a double-edged sword if not used properly. A full-size screen buffer is kept and refreshed for each DisplayObject with cacheAsBitmap set to true, which can consume a lot of device memory or exhaust the limited GPU memory if you are running in graphics accelerated mode.

Also, if you have an object that updates frequently or has a transform applied, then cacheAsBitmap will simply slow down your application with unnecessary buffering operations. Fortunately, for the transformation case there is an improved version of cacheAsBitmap, called cacheAsBitmapMatrix, that you can take advantage of.

cacheAsBitmapMatrix

cacheAsBitmapMatrix is also a property on DisplayObject, and works together with cacheAsBitmap. For cacheAsBitmapMatrix to have any effect, cacheAsBitmap must also be turned on.

As mentioned previously, cacheAsBitmap does not work when a transformation, such as a rotation or a skew, is applied to the object. The reason for this is that applying such a transformation to a saved Bitmap produces scaling artifacts that would degrade the appearance of the final image. Therefore, if you would like to have caching applied to objects with a transform applied, Flash requires that you also specify a transformation matrix for the Bitmap that is stored in the cacheAsBitmapMatrix property.

For most purposes, setting cacheAsBitmapMatrix to the identity matrix will do what you expect. The offscreen Bitmap will be saved in the untransformed position, and any subsequent transforms on the DisplayObject will be applied to that Bitmap. The following code shows how to set cacheAsBitmapMatrix to the identity transform:

cacheAsBitmap = true;
cacheAsBitmapMatrix = new Matrix();

If you were doing the same on a Flex UIComponent utilizing a cachePolicy, you would do the following:

cachePolicy = UIComponentCachePolicy.ON;
cacheAsBitmapMatrix = new Matrix();

NOTE: If you plan on setting cacheAsBitmapMatrix on multiple objects, you can reuse the same matrix to get rid of the cost of the matrix creation.

The downside to this is that the final image may show some slight aliasing, especially if the image is enlarged or straight lines are rotated. To account for this, you can specify a transform matrix that scales the image up prior to buffering it. Similarly, if you know that the final graphic will always be rendered at a reduced size, you can specify a transform matrix that scales down the buffered image to save on memory usage.

If you are using cacheAsBitmapMatrix to scale the image size down, you need to be careful that you never show the DisplayObject at the original size. Figure 10–5 shows an example of what happens if you set a cache matrix that reduces and rotates the image first, and then try to render the object at its original size.

Figure 10–5. Demonstration of the effect of cacheAsBitmapMatrix on image quality when misapplied

Notice that the final image has quite a bit of aliasing from being scaled up. Even though you are displaying it with a one-to-one transform from the original, Flash will upscale the cached version, resulting in a low-fidelity image.

The optimal use of cacheAsBitmapMatrix is to set it slightly larger than the expected transform so you have enough pixel information to produce high-quality transformed images.
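
A minimal sketch of this advice, assuming you know the largest scale at which the object will be shown: the helper below caches the object slightly larger than that scale. The function name and the 10 percent headroom factor are assumptions made for illustration, not values recommended by the runtime documentation.

```actionscript
import flash.display.DisplayObject;
import flash.geom.Matrix;

// Caches obj at a resolution a little above the largest scale it will be shown at.
private function cacheForScale(obj:DisplayObject, maxScale:Number):void {
  var headroom:Number = 1.1;   // arbitrary 10% margin for rotation and rounding
  var m:Matrix = new Matrix();
  m.scale(maxScale * headroom, maxScale * headroom);
  obj.cacheAsBitmap = true;       // required for the matrix to have any effect
  obj.cacheAsBitmapMatrix = m;
}
```

Calling cacheForScale(icon, 2) on an object that is later displayed at up to twice its size keeps the cached bitmap from being upscaled, at the cost of a larger buffer in memory.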

Flash Mobile Bench

Flash Mobile Bench is a simple application that lets you test the effect of different settings on the performance of your deployed mobile application.

The functionality that it lets you test includes the following:

  • Addition of a large number of shapes to the display list
  • Animation speed of a simple x/y translation
  • Animation speed of a simple clockwise rotation
  • Impact of cacheAsBitmap on performance
  • Impact of cacheAsBitmapMatrix on performance
  • Impact of the automatic Flex cache heuristic on performance

It also includes a simple FPS monitoring widget that you can reuse in your own applications.

In order to stress the capabilities of the device this application is running on, the first thing we had to do was increase the frame rate from the default of 24fps to something much more aggressive. Based on testing on a few devices, we found 240fps to be a ceiling limit that lots of platforms hit, and chose this as the target frame rate setting. Remember that this is a benchmark application testing theoretical performance, but in most cases you will not want to have the frame rate set this high, because you may be processing more frames than the hardware is able to display.

To change the frame rate, there is a property called frameRate on the Application class. Listing 10–1 demonstrates how you can set this in your Flex mobile application.

Listing 10–1. Flash Mobile Bench ViewNavigatorApplication (MobileBench.mxml)

<?xml version="1.0" encoding="utf-8"?>
<s:ViewNavigatorApplication xmlns:fx="http://ns.adobe.com/mxml/2009"
  xmlns:s="library://ns.adobe.com/flex/spark"
  firstView="views.MobileBenchHomeView"
  frameRate="240">
</s:ViewNavigatorApplication>

This follows the ViewNavigatorApplication pattern for building Flex mobile applications with a single View called MobileBenchHomeView. The layout for this View is done in MXML and shown in Listing 10–2.

Listing 10–2. Flash Mobile Bench View Code for Layout (MobileBenchHomeView.mxml)

<?xml version="1.0" encoding="utf-8"?>
<s:View xmlns:fx="http://ns.adobe.com/mxml/2009"
    xmlns:s="library://ns.adobe.com/flex/spark"
    title="Flash Mobile Bench" initialize="init()">

  <fx:Script>
    <![CDATA[
      …
    ]]>
  </fx:Script>
  <s:VGroup top="10" left="10" right="10">
    <s:Label id="fps"/>
    <s:CheckBox id="redraw" label="show redraw"
                click="{flash.profiler.showRedrawRegions(redraw.selected)}"/>
    <s:HGroup verticalAlign="middle" gap="20">
      <s:Label text="Cache:"/>
      <s:VGroup>
        <s:RadioButton label="Off" click="cacheOff()"/>
        <s:RadioButton label="Auto" click="cacheAuto()"/>
      </s:VGroup>
      <s:VGroup>
        <s:RadioButton label="Bitmap" click="cacheAsBitmapX()"/>
        <s:RadioButton label="Matrix" click="cacheAsBitmapMatrixX()"/>
      </s:VGroup>
    </s:HGroup>
    <s:TileGroup id="tiles" width="100%">
      <s:Button label="Generate Rects" click="generateSquares()"/>
      <s:Button label="Generate Circles" click="generateCircles()"/>
      <s:Button label="Start Moving" click="moving = true"/>
      <s:Button label="Stop Moving" click="moving = false"/>
      <s:Button label="Start Rotating" click="rotating = true"/>
      <s:Button id="stop" label="Stop Rotating" click="rotating=false"/>
    </s:TileGroup>
  </s:VGroup>
  <s:Group id="bounds" left="20" top="{stop.y + tiles.y + stop.height + 20}">
    <s:Group id="shapeGroup" transformX="{tiles.width/2 - 10}"
             transformY="{(height - bounds.y)/2 - 10}"/>
  </s:Group>
</s:View>

This creates the basic UI for the application, including a label to display the FPS reading, radio buttons for selecting the cache policy, and buttons for adding GraphicElements and starting and stopping the animations.

There is also an extra check box to show redraw regions. This control can be dropped into your own applications as-is, and can help you to minimize the size of the redraw region in order to optimize render performance. Remember that this feature works only in the AIR Debug Launcher, so you can't use it in the device runtime.

Other than the UI label, the code for the FPS monitor is fairly stand-alone. It consists of an event listener that is tied to the ENTER_FRAME event, and some bookkeeping variables to keep track of the average frame rate. The code for this is shown in Listing 10–3.

Listing 10–3. ActionScript Imports, Initialization, and Code for the FPS Handler

import flash.events.Event;
import flash.geom.Matrix;
import flash.profiler.showRedrawRegions;
import flash.utils.getTimer;
import mx.core.UIComponentCachePolicy;
import mx.graphics.SolidColor;
import mx.graphics.SolidColorStroke;
import spark.components.Group;
import spark.primitives.Ellipse;
import spark.primitives.Rect;
import spark.primitives.supportClasses.FilledElement;

private function init():void {
  // Both handlers run once per frame.
  addEventListener(Event.ENTER_FRAME, calculateFPS);
  addEventListener(Event.ENTER_FRAME, animateShapes);
}

// FPS handler

private var lastTime:int = getTimer();       // timestamp of the previous frame (ms)
private var frameAvg:Number = 0;             // weighted average frame duration (ms)
private var lastFPSUpdate:int = getTimer();  // last time the label was refreshed (ms)

private function calculateFPS(e:Event):void {
  var currentTime:int = getTimer();
  var duration:int = currentTime - lastTime;
  // Slower frames get more weight; the +10 gives fast frames a 1% minimum.
  var weight:Number = (duration + 10) / 1000;
  frameAvg = frameAvg * (1 - weight) + duration * weight;
  lastTime = currentTime;
  // Refresh the label at most five times per second.
  if (currentTime - lastFPSUpdate > 200) {
    fps.text = "FPS: " + Math.round(1000.0 / frameAvg).toString();
    lastFPSUpdate = currentTime;
  }
}

The algorithm used for calculating the frame rate is tuned for the following characteristics:

  • Refresh no more than five times per second: Refreshing the counter too frequently makes it difficult to read and can impact your performance negatively. This condition is enforced by the lastFPSUpdate comparison against a 200ms threshold.
  • Weight slow frames higher: As the frame rate decreases, the number of events goes down. This requires each frame to be weighted higher to avoid lag in the reading. The weight variable accomplishes this up to the threshold of 1000ms (1 second).
  • Give a minimum weight to fast frames: As the frame rate goes up, the weighting approaches zero. Therefore, a minimum weight of 1% is allocated to prevent the reading from lagging at the other extreme.
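
To see how the weighting behaves, the fragment below isolates the weight expression from the FPS handler, (duration + 10) / 1000, in a helper whose name is invented for this example, and evaluates it for a few frame durations. The values in the comments follow directly from that expression.

```actionscript
// Same weight expression as in the FPS handler, isolated for experimentation.
private function frameWeight(durationMs:int):Number {
  return (durationMs + 10) / 1000;
}

private function traceWeights():void {
  trace(frameWeight(0));    // 0.01, the 1% floor for extremely fast frames
  trace(frameWeight(17));   // about 0.027 for a frame at roughly 60fps
  trace(frameWeight(42));   // about 0.052 for a frame at roughly 24fps
  trace(frameWeight(190));  // 0.2 for a frame at roughly 5fps
}
```

At high frame rates each frame nudges the average by only a few percent, which smooths the reading, while a single very slow frame moves it substantially.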

Something else to note in this algorithm is the use of integer and floating point arithmetic. The former is faster and preferred where possible (such as calculating the duration), while the latter is more accurate, and required for keeping a precise average (frameAvg).

The next critical section of code is the population of GraphicElements into the scene. The code in Listing 10–4 accomplishes this.

Listing 10–4. ActionScript Code for Creation of GraphicElements

[Bindable]
private var shapes:Vector.<FilledElement> = new Vector.<FilledElement>();

private function populateRandomShape(shape:FilledElement):void {
  shape.width = shape.height = Math.random() * 20 + 20;
  shape.x = Math.random() * (tiles.width - 20) - shape.width/2;
  shape.y = Math.random() * (height - bounds.y - 20) - shape.height/2;
  shape.fill = new SolidColor(0xFFFFFF * Math.random());
  shape.stroke = new SolidColorStroke(0xFFFFFF * Math.random());
  shapes.push(shape);
  shapeGroup.addElement(shape);
}

private function generateCircles():void {
  for (var i:int=0; i<100; i++) {
    populateRandomShape(new Ellipse());
  }
}

private function generateSquares():void {
  for (var i:int=0; i<100; i++) {
    populateRandomShape(new Rect());
  }
}

All the attributes of the shapes are randomized, from the color of the fill and stroke to the size and location. The overlapping logic between the Rect and Ellipse creation is also abstracted out into a common function to maximize code reuse.

To animate the shapes, we use the code found in Listing 10–5.

Listing 10–5. ActionScript Code for Animation of the Rect and Ellipse Shapes

private var moving:Boolean;
private var rotating:Boolean;
private var directionCounter:int;

private function animateShapes(e:Event):void {
  if (moving) {
    shapeGroup.x += 1 - ((directionCounter + 200) / 400) % 2;
    shapeGroup.y += 1 - (directionCounter / 200) % 2;
    directionCounter++;
  }
  if (rotating) {
    shapeGroup.rotation += 1;
  }
}

Rather than using the Flex animation classes, we have chosen to do it via a simple ENTER_FRAME event listener. This gives you the flexibility to extend the harness to modify the variables on the shape classes that are not first-class properties.
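For animateShapes to run, it has to be registered as a listener. The following is a minimal sketch of that wiring, assuming it sits in the same script block as Listing 10–5; the helper names startAnimation and stopAnimation are hypothetical and not part of the original harness.

```actionscript
import flash.events.Event;

// Hypothetical helpers: start and stop the per-frame callback.
private function startAnimation():void {
  // ENTER_FRAME is dispatched once per frame at the application frame rate.
  addEventListener(Event.ENTER_FRAME, animateShapes);
}

private function stopAnimation():void {
  // Remove the listener when idle so that no work is done on every frame.
  removeEventListener(Event.ENTER_FRAME, animateShapes);
}
```

Removing the listener when neither moving nor rotating is set keeps the benchmark from paying for an empty callback on each frame.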

Finally, the code to modify the cacheAsBitmap settings is shown in Listing 10–6.

Listing 10–6. ActionScript Code for Modifying the cacheAsBitmap Settings

private var identityMatrix:Matrix = new Matrix();

private function cacheOff():void {
  shapeGroup.cachePolicy = UIComponentCachePolicy.OFF;
}

private function cacheAuto():void {
  shapeGroup.cachePolicy = UIComponentCachePolicy.AUTO;
}

private function cacheAsBitmapX():void {
  shapeGroup.cachePolicy = UIComponentCachePolicy.ON;
  shapeGroup.cacheAsBitmapMatrix = null;
}

private function cacheAsBitmapMatrixX():void {
  shapeGroup.cachePolicy = UIComponentCachePolicy.ON;
  shapeGroup.cacheAsBitmapMatrix = identityMatrix;
}

This code should look very familiar after reading the previous section. Even though we have only one instance of an object to apply the cacheAsBitmapMatrix on, we follow the best practice of reusing a common identity matrix to avoid extra memory and garbage collection overhead.
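The benefit of a shared matrix grows with the number of cached objects. As a rough sketch of the same practice applied to many display objects, the helper below assigns one static identity matrix to every child of a container; the names IDENTITY and cacheChildren are hypothetical.

```actionscript
import flash.display.DisplayObject;
import flash.display.DisplayObjectContainer;
import flash.geom.Matrix;

// One shared instance; new Matrix() creates an identity matrix.
private static const IDENTITY:Matrix = new Matrix();

// Hypothetical helper: enable matrix caching on every child of a container.
private function cacheChildren(container:DisplayObjectContainer):void {
  for (var i:int = 0; i < container.numChildren; i++) {
    var child:DisplayObject = container.getChildAt(i);
    child.cacheAsBitmap = true;
    // Reusing IDENTITY avoids allocating one Matrix per child.
    child.cacheAsBitmapMatrix = IDENTITY;
  }
}
```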

Upon running Flash Mobile Bench, you will immediately see the FPS counter max out on your given device. Click the buttons to add some shapes to the scene, set the cache to your desired setting, and see how your device performs. Figure 10–6 shows the Flash Mobile Bench application running on a Motorola Droid 2 with 300 circles rendered using cacheAsBitmapMatrix.

Figure 10–6. Flash Mobile Bench running on a Motorola Droid 2

How does the performance of your device compare?

GPU Rendering

One of the other techniques that is currently available only for mobile devices is offloading rendering to the graphics processing unit (GPU). While the GPU is a highly constrained chip, which cannot do everything a normal CPU is capable of, it excels at doing graphics and rendering calculations that take several orders of magnitude longer on the CPU. At the same time, the GPU produces less battery drain, allowing the mobile device to cycle down the CPU to conserve battery life.

The default setting for Flash mobile projects is to have a renderMode of “auto”, which defaults to cpu at present. You can explicitly change this to gpu rendering to see if you get significant gains in performance for your application. To change the renderMode in Flash Professional, open the AIR for Android Settings dialog and choose GPU from the render mode drop-down, as shown in Figure 10–7.

Figure 10–7. GPU render mode setting in Flash Professional

To change the renderMode in a Flash Builder project, you will need to edit the application descriptor file and add in an additional renderMode tag under initialWindow, as shown in Listing 10–7.

Listing 10–7. Application Descriptor Tag for Setting the renderMode (Addition in Bold)

<application>
  …
  <initialWindow>
    <renderMode>gpu</renderMode>
    …
  </initialWindow>
</application>

The results you get from gpu mode will vary greatly based on the application features you are using and the hardware you are running on. In some cases, you will find that your application actually runs slower in gpu mode than it does in cpu mode. Table 10–3 lists some empirical results from running Flash Mobile Bench on a Motorola Droid 2 with 100 circles and 100 squares on different cache and gpu modes.

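Table 10–3. Flash Mobile Bench results on a Motorola Droid 2 (100 circles and 100 squares) for different cache and render modes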

As you can see from the results with this scenario on this particular device, the GPU provided no advantage, and was significantly slower in the case where cacheAsBitmap was enabled without a matrix set.

This underscores the importance of testing with different devices before you commit to design decisions in your application. In this particular example, the reduced performance was most likely due to write-back overhead of the GPU sending data back to the CPU. Most GPU devices are optimized for receiving data from the CPU in order to write it to the screen quickly. Sending data back in the other direction for processing is prohibitively expensive on some devices.

This is changing quickly, however, with new chipsets such as the NVIDIA Tegra 2 featured in the Motorola ATRIX and XOOM, which have optimized pipelines for bidirectional communication. Also, the Flash team is working on an optimized render pipeline that will reduce the need for write-backs to the CPU by doing more work on the graphics processor. For more information about the performance improvements being done by the Flash team, see “The Future of Flash Performance” section later in this chapter.

Performant Item Renderers

Performance is best tuned in the context of a critical application area, which will be noticeable by users. For Flex mobile applications, organizing content by lists is extremely common, yet presents a significant performance challenge.

Since scrolling lists involve animation, it is very noticeable if the frame rate drops during interactions. At the same time, any performance issues in the item renderer code are magnified by the fact that the renderer is reused for each individual list cell.

To demonstrate these concepts, we will build out a simple example that shows a list of all the Adobe User Groups and navigates to the group web site when an item is clicked.

Listing 10–8 shows the basic View code for creating a Flex list and wiring up a click event handler that will open a browser page. We are also making use of the FPSComponent developed earlier to keep track of the speed of our application while developing.

Listing 10–8. Adobe User Group Application View Class

<?xml version="1.0" encoding="utf-8"?>
<s:View xmlns:fx="http://ns.adobe.com/mxml/2009"
    xmlns:s="library://ns.adobe.com/flex/spark"
    xmlns:renderers="renderers.*" xmlns:profile="profile.*"
    title="Adobe User Groups (Original)">
  <fx:Script>
    <![CDATA[
      import flash.net.navigateToURL;
      private function clickHandler(event:MouseEvent):void {
        navigateToURL(new URLRequest(event.currentTarget.selectedItem.url));
      }
    ]]>
  </fx:Script>
  <s:VGroup width="100%" height="100%">
    <profile:FPSDisplay/>
    <s:List width="100%" height="100%" dataProvider="{data}"
            click="clickHandler(event)">
      <s:itemRenderer>
        <fx:Component>
          <renderers:UserGroupRendererOriginal/>
        </fx:Component>
      </s:itemRenderer>
    </s:List>
  </s:VGroup>
</s:View>

TIP: For mobile applications, always use the itemRenderer property rather than the itemRendererFunction property. The latter results in the creation of multiple instances of the item renderer and will negatively impact performance.
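If you set the renderer from ActionScript rather than MXML, the itemRenderer property expects a factory object. A minimal sketch, assuming the List has been given the hypothetical id userGroupList and that configureList is called from the View's initialize handler:

```actionscript
import mx.core.ClassFactory;
import renderers.UserGroupRendererOriginal;

// Hypothetical handler, for example wired to the View's initialize event.
private function configureList():void {
  // A single factory lets the List create and recycle renderer instances.
  userGroupList.itemRenderer = new ClassFactory(UserGroupRendererOriginal);
}
```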

This class references the UserGroupRendererOriginal item renderer, which will display the list items. The creation of this renderer involves combining the following components:

  • An image component for the user group logo
  • Two text fields for displaying the user group name and description
  • A horizontal line to separate different visual elements

Listing 10–9 shows a straightforward implementation of an ItemRenderer that meets these requirements.

Listing 10–9. Unoptimized ItemRenderer Code

<?xml version="1.0" encoding="utf-8"?>
<s:ItemRenderer xmlns:fx="http://ns.adobe.com/mxml/2009"
        xmlns:s="library://ns.adobe.com/flex/spark">
  <fx:Style>
    .descriptionStyle {
      fontSize: 15;
      color: #606060;
    }
  </fx:Style>
  <s:Line left="0" right="0" bottom="0">
    <s:stroke><s:SolidColorStroke color="gray"/></s:stroke>
  </s:Line>
  <s:HGroup left="15" right="15" top="12" bottom="12" gap="10" verticalAlign="middle">
    <s:Image source="{data.logo}"/>
    <s:VGroup width="100%" gap="5">
      <s:RichText width="100%" text="{data.groupName}"/>
      <s:RichText width="100%" text="{data.description}" styleName="descriptionStyle"/>
    </s:VGroup>
  </s:HGroup>
</s:ItemRenderer>

Upon running this example, we have a very functional scrolling list, as shown in Figure 10–8.

Figure 10–8. Adobe User Group list using a custom ItemRenderer

While the functionality and appearance are both fine, the performance of this implementation is less than ideal. For normal scrolling, the frame rate drops to around 18fps, and when doing long throws of the list by swiping across the screen you get only 7fps. At these speeds, the scrolling is visually distracting and gives the impression that the entire application is slow.

Flex Image Classes

Flash provides several different image classes that provide different functionality and have very different performance characteristics. Using the right image class for your application needs can make a huge difference in performance.

The available image classes in increasing order of performance are as follows:

  • mx.controls.Image: This is the original Flex image component. It is now obsolete and should not be used for mobile applications.
  • spark.components.Image: This replaced the previous image class and should be used anywhere styling, progress indicators, or other advanced features are required.
  • spark.primitives.BitmapImage: This is the lightweight Spark graphic element for displaying bitmap data. It has limited features, and is the highest performance way to display images onscreen.

For the original version of the ItemRenderer, we used the Flex Image class. While this was not a bad choice, we are also making no use of the advanced features of this class, so we can improve performance by using a BitmapImage instead.

Also, a new feature that was added in Flex 4.5 is the ContentCache class. When set as the contentLoader on a BitmapImage, it caches images that were fetched remotely, significantly speeding up the performance of scrolling where the same image is displayed multiple times.

Listing 10–10 shows an updated version of the item renderer class that incorporates these changes to improve performance.

Listing 10–10. ItemRenderer Code with Optimizations for Images (Changes in Bold)

<?xml version="1.0" encoding="utf-8"?>
<s:ItemRenderer xmlns:fx="http://ns.adobe.com/mxml/2009"
        xmlns:s="library://ns.adobe.com/flex/spark">
  <fx:Style>
    .descriptionStyle {
      fontSize: 15;
      color: #606060;
    }
  </fx:Style>
  <fx:Script>
    <![CDATA[
      import spark.core.ContentCache;
      static private const cache:ContentCache = new ContentCache();
    ]]>
  </fx:Script>
  <s:Line left="0" right="0" bottom="0">
    <s:stroke><s:SolidColorStroke color="gray"/></s:stroke>
  </s:Line>
  <s:HGroup left="15" right="15" top="12" bottom="12" gap="10" verticalAlign="middle">
    <s:BitmapImage source="{data.logo}" contentLoader="{cache}"/>
    <s:VGroup width="100%" gap="5">
      <s:RichText width="100%" text="{data.groupName}"/>
      <s:RichText width="100%" text="{data.description}" styleName="descriptionStyle"/>
    </s:VGroup>
  </s:HGroup>
</s:ItemRenderer>

With these additional improvements, we have increased the frame rate to 19fps for scrolling and 12fps for throws. The latter is over a 70% improvement for only a few lines of code and no loss of functionality.
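ContentCache also exposes a few tuning properties. The sketch below shows how the shared cache from Listing 10–10 might be configured inside the renderer's script block; the helper name createCache and the values chosen are illustrative assumptions, not settings we measured.

```actionscript
import spark.core.ContentCache;

// Shared by all renderer instances, as in Listing 10–10.
static private const cache:ContentCache = createCache();

// Hypothetical factory function; the values are illustrative, not tuned.
static private function createCache():ContentCache {
  var c:ContentCache = new ContentCache();
  c.maxCacheEntries = 50;   // upper bound on the number of cached images
  c.enableQueueing = true;  // queue load requests instead of starting all at once
  c.maxActiveRequests = 2;  // concurrent loads allowed while queueing is enabled
  return c;
}
```

A smaller cache trades memory for more frequent reloads, so the right limit depends on how many distinct images your list displays.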

Text Component Performance

One of the most notable performance differences that you will notice between desktop and mobile is the performance of text. When you are able to use text components and styles that map to device fonts, you will get optimal performance. However, using custom fonts or components that give you precise text control and anti-aliasing has a significant performance penalty.

With the release of Flash Player 10, Adobe introduced a new low-level text engine called the Flash Text Engine (FTE) and a framework built on top of it called the Text Layout Framework (TLF). TLF has significant advantages over the previous text engine (commonly referred to as Classic Text), such as the ability to support bidirectional text and print-quality typography. However, this comes with a significant performance penalty for mobile applications.

To get high-performance text display in Flash Player, set the text engine to “Classic Text” and turn off anti-aliasing by choosing “Use device fonts” in the Text Properties pane, as shown in Figure 10–9.

Figure 10–9. Flash Professional optimal mobile text settings

For Flex applications, you have a wide array of different Text components that make use of everything from Classic Text to TLF, and have varying performance characteristics as a result.

The available Text components are shown in Table 10–4, along with the text framework they are built on and mobile performance characteristics.

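Table 10–4. Flex Text components, the text framework each is built on, and their mobile performance characteristics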

For mobile applications, you will get the best performance by using the Label, TextInput, and TextArea components, and you should use them whenever possible. Since they don't support bidirectional text and other advanced features and styling, you may still have certain instances where you will need to use RichEditableText or RichText.

Since we do not require any advanced text features for the User Group List application, we can replace the use of RichText with Label. The updated code for this is shown in Listing 10–11.

Listing 10–11. ItemRenderer Code with Optimizations for Text (Changes in Bold)

<?xml version="1.0" encoding="utf-8"?>
<s:ItemRenderer xmlns:fx="http://ns.adobe.com/mxml/2009"
        xmlns:s="library://ns.adobe.com/flex/spark">
  <fx:Style>
    .descriptionStyle {
      fontSize: 15;
      color: #606060;
    }
  </fx:Style>
  <fx:Script>
    <![CDATA[
      import spark.core.ContentCache;
      static private const cache:ContentCache = new ContentCache();
    ]]>
  </fx:Script>
  <s:Line left="0" right="0" bottom="0">
    <s:stroke><s:SolidColorStroke color="gray"/></s:stroke>
  </s:Line>
  <s:HGroup left="15" right="15" top="12" bottom="12" gap="10" verticalAlign="middle">
    <s:BitmapImage source="{data.logo}" contentLoader="{cache}"/>
    <s:VGroup width="100%" gap="5">
      <s:Label width="100%" text="{data.groupName}"/>
      <s:Label width="100%" text="{data.description}" styleName="descriptionStyle"/>
    </s:VGroup>
  </s:HGroup>
</s:ItemRenderer>

After this change, the scrolling speed is 20fps and the throw speed is 18fps, which is a significant improvement. We could have achieved even higher speeds by using a StyleableTextField, which is exactly what the Flex team has done for their built-in components.

Built-In Item Renderers

In the past few sections, we have taken the performance of our custom item renderer from completely unacceptable speeds below 10fps, up to around 20fps on our test device. We could continue to optimize the renderer by doing some of the following additional changes:

  • Use cacheAsBitmap to save recent cell images.
  • Rewrite in ActionScript to take advantage of the StyleableTextField.
  • Remove groups and use absolute layout.

However, there is already a component available that has these optimizations included and can be used right out of the box.

The Flex team provides a default implementation of a LabelItemRenderer and IconItemRenderer that you can use and extend. These classes already have quite a lot of functionality included in them that you can take advantage of, including support for styles, icons, and decorators. They are also highly tuned, taking advantage of all the best practices discussed throughout this chapter.

Listing 10–12 shows the code changes you would make to substitute the built-in IconItemRenderer for our custom item renderer.

Listing 10–12. View Code Making Use of the Built-In IconItemRenderer

<?xml version="1.0" encoding="utf-8"?>
<s:View xmlns:fx="http://ns.adobe.com/mxml/2009"
    xmlns:s="library://ns.adobe.com/flex/spark"
    xmlns:views="views.*"
    title="Adobe User Groups (Built-in)" xmlns:profile="profile.*">
  <fx:Script>
    <![CDATA[
      import flash.net.navigateToURL;
      private function clickHandler(event:MouseEvent):void {
        navigateToURL(new URLRequest(event.currentTarget.selectedItem.url));
      }
    ]]>
  </fx:Script>
  <fx:Style>
    .descriptionStyle {
      fontSize: 15;
      color: #606060;
    }
  </fx:Style>
  <s:VGroup width="100%" height="100%">
    <profile:FPSDisplay/>
    <s:List width="100%" height="100%" dataProvider="{data}"
              click="clickHandler(event)">
      <s:itemRenderer>
        <fx:Component>
          <s:IconItemRenderer labelField="groupName"
                    fontSize="25"
                    messageField="description"
                    messageStyleName="descriptionStyle"
                    iconField="logo"/>
        </fx:Component>
      </s:itemRenderer>
    </s:List>
  </s:VGroup>
</s:View>

The results of running this code are extremely close to our original item renderer, as shown in Figure 10–10. If you compare the images side by side, you will notice subtle differences in the text due to the use of the StyleableTextField, but there are no significant differences that would affect the usability of the application.

Figure 10–10. Adobe User Group list using the built-in IconItemRenderer

The resulting performance of using the built-in component is 24fps for scrolling and 27fps for throws on a Motorola Droid 2. This exceeds the default frame rate of Flex applications, and demonstrates that you can build featureful and performant applications in Flash with very little code.

Performance Monitoring APIs and Tools

The best-kept secret to building performant mobile applications is to test performance early and often. By identifying performance issues as you build out your application, you will be able to quickly identify performance-critical sections of code and tune them as you go along.

Having the right tools to get feedback on performance makes this job much easier. This section highlights several tools that are freely available, or that you may already have on your system, and that you can start taking advantage of today.

Hi-ReS! Stats

Getting real-time feedback on the frame rate, memory usage, and overall performance of your application is critical to ensure that you do not introduce regressions in performance during development. While you can roll your own performance measurements, if you are not careful, you could be skewing your results by slowing down your application with your own instrumentation.

Fortunately, an infamous web hacker, who goes by the name Mr. Doob, created an open source statistics widget that you can easily incorporate in your project. You can download the source from the following URL: https://github.com/mrdoob/Hi-ReS-Stats.

Mr. Doob's Hi-ReS! Stats gives you the following instrumentation:

  • Frames per second: This shows the current FPS plus the target FPS set in the player (higher is better).
  • Frame duration: The inverse of frames per second, this lets you know how many milliseconds it is taking to render a frame (lower is better).
  • Memory usage: The current amount of memory in use by the application (in megabytes)
  • Peak memory usage: The highest memory usage threshold that this application has hit (also in megabytes)
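To make these four readings concrete, here is a minimal sketch of how similar values could be sampled with standard runtime calls. It is not the Hi-ReS! Stats implementation; the class name SimpleStats and its fields are hypothetical, and it writes to the trace log instead of drawing a graph.

```actionscript
package {
  import flash.display.Sprite;
  import flash.events.Event;
  import flash.system.System;
  import flash.utils.getTimer;

  // Hypothetical class: samples frame rate and memory roughly once per second.
  public class SimpleStats extends Sprite {
    private var frames:int = 0;
    private var lastSample:int = getTimer();
    private var peakMemoryMB:Number = 0;

    public function SimpleStats() {
      addEventListener(Event.ENTER_FRAME, onEnterFrame);
    }

    private function onEnterFrame(e:Event):void {
      frames++;
      var now:int = getTimer();
      var elapsed:int = now - lastSample;
      if (elapsed < 1000) return; // wait until about one second has passed

      var fps:Number = frames * 1000 / elapsed;   // frames per second
      var frameMs:Number = elapsed / frames;      // average frame duration
      var memoryMB:Number = System.totalMemory / (1024 * 1024);
      peakMemoryMB = Math.max(peakMemoryMB, memoryMB);

      // The target frame rate is only known once we are on the stage.
      var targetFps:Number = stage ? stage.frameRate : 0;
      trace("fps:", fps.toFixed(1), "/", targetFps,
            "frame ms:", frameMs.toFixed(1),
            "mem MB:", memoryMB.toFixed(2),
            "peak MB:", peakMemoryMB.toFixed(2));

      frames = 0;
      lastSample = now;
    }
  }
}
```

Sampling once per second rather than on every frame keeps the instrumentation itself from skewing the numbers.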

To add Hi-ReS! Stats to an ActionScript project, you can use the following code:

import net.hires.debug.Stats;
addChild(new Stats());

Since it is a pure ActionScript component, you need to do a little more work to add it to a Flex project, which can be done as follows:

import mx.core.IVisualElementContainer;
import mx.core.UIComponent;
import net.hires.debug.Stats;
private function addStats(parent:IVisualElementContainer):void {
  var comp:UIComponent = new UIComponent();
  parent.addElement(comp);
  comp.addChild(new Stats());
}

Then, to attach this to a View, simply invoke it from the initialize method with a self reference:

<s:View … initialize="addStats(this)"> … </s:View>

Below the statistics, a graph of these values is plotted, giving you an idea of how your application is trending. You can also increase or decrease the application frame rate by clicking the top or bottom of the readout. Figure 10–11 shows an enlarged version of the Hi-ReS! Stats UI.

Figure 10–11. Enlarged screen capture of Hi-ReS! Stats

PerformanceTest v2 Beta

Once you have identified that you have a performance issue, it can be very tricky to track down the root cause and make sure that once you have fixed it, the behavior does not regress with future changes.

Grant Skinner has taken a scientific approach to the problem with PerformanceTest, giving you pure ActionScript APIs to time methods, profile memory usage, and create reproducible performance test scenarios. Sample output from running the PerformanceTest tool is shown in Figure 10–12.

Figure 10–12. Output from running the PerformanceTest tool

Since the output is in XML, you can easily integrate this with other tools or reporting, including TDD frameworks for doing performance testing as you write code. For more information on PerformanceTest v2, see the following URL:

http://gskinner.com/blog/archives/2010/02/performancetest.html.
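For a quick one-off measurement without any library, you can approximate the idea of method timing by hand with getTimer(). The sketch below is not the PerformanceTest API; the function name timeIt and its parameters are hypothetical. As written it suits a frame script; inside a class you would add access modifiers.

```actionscript
import flash.utils.getTimer;

// Hypothetical helper: returns the average time in milliseconds of one call.
function timeIt(fn:Function, iterations:int = 1000):Number {
  var start:int = getTimer();
  for (var i:int = 0; i < iterations; i++) {
    fn();
  }
  return (getTimer() - start) / iterations;
}

// Usage: time a simple string concatenation loop.
var concatMs:Number = timeIt(function():void {
  var s:String = "";
  for (var i:int = 0; i < 100; i++) s += i;
});
trace("concat:", concatMs.toFixed(3), "ms per call");
```

Because getTimer() has millisecond resolution, running many iterations and averaging is what makes short operations measurable at all.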

Flash Builder Profiler

For heap and memory analysis, one of the best available tools is the profiler built into Flash Builder. The Flash Builder profiler gives you a real-time graph of your memory usage, allows you to take heap snapshots and compare them against a baseline, and can capture method-level performance timings for your application.

While this does not currently work when running directly on a mobile device, it can be used to profile your mobile application when running in the AIR Debug Launcher. To launch your application in the profiler, select Profile from the Run menu. Upon execution, you will see a real-time view of your application, as shown in Figure 10–13.

Figure 10–13. Flash Builder profiler running against a Flash mobile project in debug mode

The Future of Flash Performance

The Flash runtime team at Adobe is continually looking for new ways to improve the performance of Flash applications on the desktop and mobile. This includes performance enhancements in the Flash and AIR runtimes that are transparent to your application as well as new APIs and features that will let you do things more efficiently from within your application.

CAUTION: All of the improvements and changes in this section have been proposed for the Flash roadmap, but are not committed features. The final implementation may vary significantly from what is discussed.

Faster Garbage Collection

As the size of your application grows, garbage collection pauses take an increasingly large toll on the responsiveness of your application. While the amortized cost of garbage collection is very low given all the benefits it provides, the occasional hit caused by a full memory sweep can be devastating to the perceived performance of your application.

Since Flash Player 8, the Flash runtime has made use of a mark and sweep garbage collector. The way that mark and sweep garbage collectors work is that they pause the application before traversing from the root objects through all the active references, marking live objects as shown in Figure 10–14. Objects that are not marked in this phase are marked for deletion in the sweep phase of the algorithm. The final step is to de-allocate the freed memory, which is not guaranteed to happen immediately.

Figure 10–14. Visual representation of the mark and sweep garbage collection algorithm

The benefit of the mark and sweep algorithm is that there is very little bookkeeping involved, and it is reasonably fast to execute. However, as the size of the heap grows, so does the duration of the garbage collection pause. This can wreak havoc on animations or other timing-critical operations that will seemingly hang while the collection takes place.
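The two phases are easier to follow on a toy object graph. The sketch below models them in ActionScript; it illustrates the algorithm only and says nothing about how the Flash runtime implements it. The class names MarkSweepDemo and HeapNode are hypothetical.

```actionscript
package {
  import flash.utils.Dictionary;

  // Toy model of mark and sweep over a hypothetical object graph.
  public class MarkSweepDemo {
    // Returns the heap objects that survive a collection.
    public static function collect(roots:Array, heap:Array):Array {
      var marked:Dictionary = new Dictionary();

      // Mark phase: traverse from the roots through all references.
      var stack:Array = roots.concat();
      while (stack.length > 0) {
        var node:HeapNode = stack.pop();
        if (marked[node]) continue; // already visited, also handles cycles
        marked[node] = true;
        for each (var ref:HeapNode in node.refs) {
          stack.push(ref);
        }
      }

      // Sweep phase: anything left unmarked is unreachable.
      var live:Array = [];
      for each (var candidate:HeapNode in heap) {
        if (marked[candidate]) live.push(candidate);
      }
      return live;
    }

    public static function demo():void {
      var a:HeapNode = new HeapNode("a");
      var b:HeapNode = new HeapNode("b");
      var c:HeapNode = new HeapNode("c"); // never referenced from a root
      a.refs.push(b);
      b.refs.push(a);
      var live:Array = collect([a], [a, b, c]);
      trace(live.length); // 2: a and b survive, c is swept
    }
  }
}

// Hypothetical heap object holding its outgoing references.
class HeapNode {
  public var name:String;
  public var refs:Array = [];
  public function HeapNode(name:String) { this.name = name; }
}
```

Note that both loops visit the whole graph or the whole heap, which is why the pause grows with heap size.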

The Flash runtime team is looking at several improvements to the garbage collection algorithms that would benefit performance:

  • Incremental GC
  • GC hint API
  • Generational garbage collection

Incremental garbage collection would allow the garbage collector to split the mark and sweep work over several frames. In this scenario, the total cost of garbage collection will be slightly higher; however, the impact on any particular frame duration is minimized, allowing the application to sustain a high frame rate during collection.

The garbage collector is fairly naïve about when to trigger collections to take place, and invariably will choose the worst possible times to mark and sweep. A GC hint API would let the developer give hints to the garbage collector about performance-critical moments when a garbage collection would be undesirable. If memory is low enough, a garbage collection may still get triggered, but this will help prevent spurious garbage collections from slowing down the application at the wrong moment.

While it is not very well known, the converse is already possible. Flash already has a mechanism to manually trigger a garbage collection to occur. To trigger a garbage collection cycle to happen immediately, you need to call the System.gc() method twice, once to force a mark and a second time to force a sweep, as shown in Listing 10–13.

Listing 10–13. Code to Force a Garbage Collection (Duplicate Call Intentional)

flash.system.System.gc();
flash.system.System.gc();

TIP: Previously this API was available only from AIR and worked only while running in debug mode, but it is now fully supported in all modes.

While mark and sweep collectors are fairly efficient and easy to implement, they are poorly suited for interactive applications and have a tendency to thrash on newly created objects. In practice, long-lived objects need collection fairly infrequently, while newly created objects are frequently discarded. Generational garbage collectors recognize this trend and group objects into different generations based on their age. This makes it possible to trigger a collection on younger generations more frequently, allowing for the reclamation of larger amounts of memory for less work.

Having an efficient generational garbage collector would make a huge difference in the usage pattern of ActionScript, getting rid of the need for excessive object pooling and caching strategies that are commonly used today to increase performance.
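For reference, the kind of object pooling this refers to can be as small as the following sketch, which recycles Point instances instead of allocating new ones. The class name PointPool is hypothetical.

```actionscript
package {
  import flash.geom.Point;

  // Hypothetical minimal pool that recycles Point instances.
  public class PointPool {
    private static const free:Vector.<Point> = new Vector.<Point>();

    // Hand out a recycled instance when one is available, else allocate.
    public static function acquire(x:Number = 0, y:Number = 0):Point {
      var p:Point = free.length > 0 ? free.pop() : new Point();
      p.x = x;
      p.y = y;
      return p;
    }

    // Return an instance to the pool instead of leaving it for the collector.
    public static function release(p:Point):void {
      free.push(p);
    }
  }
}
```

The cost of this pattern is that callers must remember to call release and must not keep using a released object, which is exactly the bookkeeping a better collector would make unnecessary.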

Faster ActionScript Performance

The Flash applications that you write and even the libraries in the platform itself are written using ActionScript, so incremental improvements in ActionScript performance can have a huge effect on real-world performance.

Some of the improvements that the Flash team is looking into that will benefit all applications include the following:

  • Just-in-time (JIT) compiler optimizations
  • Float numeric type

Flash makes use of what is known as a just-in-time (JIT) compiler to optimize Flash bytecodes on the fly. The JIT compiler translates performance-critical code sections into machine code that can be run directly on the device for higher performance. At the same time, it has information about the code execution path that it can take advantage of to perform optimizations that speed up the application.

Some of the new JIT optimizations that are planned include the following:

  • Type-based optimizations: ActionScript is a dynamic language, and as such type information is optional. In places where the type is either explicitly specified or can be implicitly discovered by inspecting the call chain, more efficient machine code can be generated.
  • Numeric optimizations: Currently in the Flash runtime all numeric operations, including overloaded operators like addition and multiplication, work on numeric objects rather than primitive numbers and integers. As a result, the code that gets generated includes extra instructions to check the type of number and fetch the value out of the object, which can be very expensive in tight loops. By inspecting the code to determine where primitive values can be substituted, the performance of these operations can be dramatically improved.
  • Nullability: ActionScript is a null-safe language, which is very convenient for UI programming, but means that a lot of extra checks are generated to short-circuit calls that would otherwise dereference null pointers. This is even the case for variables that are initialized on creation and are never set to null. In these cases, the JIT has enough information to safely skip the null checks, reducing the amount of branching in the generated code.
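From the developer's side, the first two points come down to how much type information the code carries. The two hypothetical functions below compute the same sum, one untyped and one fully typed; how much a given runtime version gains from the typed form is an assumption this sketch does not measure.

```actionscript
// Untyped: the compiler warns about the missing declarations, and every
// operation has to work on values of unknown type.
function sumUntyped(values:Array):* {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}

// Typed: element, counter, and result types are all declared.
function sumTyped(values:Vector.<Number>):Number {
  var total:Number = 0;
  var n:int = values.length;
  for (var i:int = 0; i < n; i++) {
    total += values[i];
  }
  return total;
}
```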

The net result of these JIT optimizations is that with no changes to your application code, you will benefit from faster performance. In general, the more CPU-bound your application is, the greater the benefit you will receive.

In addition, the Flash team has proposed the addition of an explicit float numeric type and matching Vector.<float>. By definition, the Number type in Flash is a 64-bit precision value, and changing the semantics of this would break backward compatibility with existing applications. However, many mobile devices have optimized hardware for doing floating point arithmetic on 32-bit values. By giving programmers the choice of specifying the precision of numeric values, they can decide to trade off accuracy for performance where it makes sense.

Concurrency

Modern computers have multiple processors and cores that can be used to do operations in parallel for higher efficiency. This trend has also extended to mobile hardware, where modern devices such as the Motorola ATRIX are able to pack dual-core processors in a very small package. This means that to make full use of the hardware your application needs to be able to execute code in parallel on multiple threads.

Even where multiple processors are not available, it is still a useful abstraction to think about code executing in parallel on multiple threads. This allows you to incrementally work on long-running tasks without affecting operations that need frequent updates, like the rendering pipeline.
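Until explicit threading is available, a common way to get this effect is to split a long-running task into slices that each fit within a frame. The following is a minimal sketch of that approach; the class name ChunkedTask and the 5-millisecond budget are illustrative assumptions.

```actionscript
package {
  import flash.display.Sprite;
  import flash.events.Event;
  import flash.utils.getTimer;

  // Hypothetical example: sum a large array a few milliseconds at a time.
  public class ChunkedTask extends Sprite {
    private static const BUDGET_MS:int = 5; // illustrative per-frame budget
    private var items:Array;
    private var index:int = 0;
    public var total:Number = 0;

    public function ChunkedTask(items:Array) {
      this.items = items;
      addEventListener(Event.ENTER_FRAME, onEnterFrame);
    }

    private function onEnterFrame(e:Event):void {
      var deadline:int = getTimer() + BUDGET_MS;
      // Work until the budget is spent, then yield back to rendering.
      while (index < items.length && getTimer() < deadline) {
        total += Number(items[index++]);
      }
      if (index >= items.length) {
        removeEventListener(Event.ENTER_FRAME, onEnterFrame);
        dispatchEvent(new Event(Event.COMPLETE));
      }
    }
  }
}
```

A caller would listen for Event.COMPLETE and then read total; in the meantime the render pipeline keeps getting its share of every frame.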

Many built-in Flash operations are already multithreaded behind the scenes and can make effective use of multiple cores. This includes the networking code, which executes the I/O operations in the background, and Stage Video, which makes use of native code running in a different thread. By using these APIs, you are implicitly taking advantage of parallelism.

To allow you to take advantage of explicit threading, the Flash team is considering two different mechanisms for exposing this to the developer:

  • SWF delegation: Code is compiled to two different SWF files that are independent. To spawn off a new thread, you would use the worker API from your main SWF file to create a new instance of the child SWF.
  • Entrypoint class: Multithreaded code is separated into a different class using a code annotation to specify that it is a unique application entry point.

In both of these scenarios, a shared-nothing concurrency model is used. This means that you cannot access variables or change state between the code executing in different threads, except by using explicit message passing. The advantage of a shared-nothing model is that it prevents race conditions, deadlocks, and other threading issues that are very difficult to diagnose.

By having an explicit concurrency mechanism built into the platform, your application will benefit from more efficient use of multi-core processors and can avoid pauses in animation and rendering while CPU-intensive operations are being executed.

Threaded Render Pipeline

The Flash rendering pipeline is single-threaded today, which means that it cannot take advantage of multiple cores on newer mobile devices, such as the Motorola ATRIX. This is particularly problematic when rendering graphics and video, which end up being processed sequentially, as shown in Figure 10–15.

Figure 10–15. Single-threaded render pipeline

When the ActionScript code execution takes longer than expected, this can cause video frames to get dropped. Flash will compensate by skipping stage rendering and prioritizing video processing on the subsequent frame. The result is that your video and animation performance both suffer significant degradation while one of your processors remains idle.

The threaded render pipeline offloads video processing to a second CPU core, allowing video to run smoothly regardless of delays in ActionScript execution or stage rendering. This makes optimal use of the available resources on a multi-core system, as shown in Figure 10–16.

Figure 10–16. Multithreaded render pipeline

We can take this a step further by leveraging Stage Video to offload video decoding and compositing to the graphics processor, which gives you the optimized render pipeline shown in Figure 10–17.

Figure 10–17. Multithreaded render pipeline with Stage Video

The net result is that you are able to do more processing in your ActionScript code without impacting either your frame rate or video playback.
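As a reminder of what opting in to Stage Video looks like in code, here is a minimal sketch using the StageVideo API in flash.media. The class name StageVideoSketch and the file name video.mp4 are hypothetical. It assumes the class is the document class, so that stage is available in the constructor, and it falls back to a regular Video object when Stage Video is not available.

```actionscript
package {
    import flash.display.Sprite;
    import flash.events.StageVideoAvailabilityEvent;
    import flash.geom.Rectangle;
    import flash.media.StageVideo;
    import flash.media.StageVideoAvailability;
    import flash.media.Video;
    import flash.net.NetConnection;
    import flash.net.NetStream;

    // Hypothetical document class that prefers Stage Video when present.
    public class StageVideoSketch extends Sprite {
        private var stream:NetStream;

        public function StageVideoSketch() {
            var connection:NetConnection = new NetConnection();
            connection.connect(null); // progressive download, no media server
            stream = new NetStream(connection);
            // Empty handler so metadata callbacks do not raise errors.
            stream.client = { onMetaData: function(info:Object):void {} };
            stage.addEventListener(
                StageVideoAvailabilityEvent.STAGE_VIDEO_AVAILABILITY,
                onAvailability);
        }

        private function onAvailability(event:StageVideoAvailabilityEvent):void {
            // Handle only the first notification in this sketch.
            stage.removeEventListener(
                StageVideoAvailabilityEvent.STAGE_VIDEO_AVAILABILITY,
                onAvailability);
            var bounds:Rectangle =
                new Rectangle(0, 0, stage.stageWidth, stage.stageHeight);
            if (event.availability == StageVideoAvailability.AVAILABLE
                    && stage.stageVideos.length > 0) {
                // Hardware path: video is composited behind the display list.
                var stageVideo:StageVideo = stage.stageVideos[0];
                stageVideo.viewPort = bounds;
                stageVideo.attachNetStream(stream);
            } else {
                // Software fallback: video is rendered in the display list.
                var video:Video = new Video(bounds.width, bounds.height);
                video.attachNetStream(stream);
                addChild(video);
            }
            stream.play("video.mp4"); // hypothetical file name
        }
    }
}
```

A production application would keep listening for availability changes and switch between the two paths, since Stage Video can become unavailable while the application is running.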

Stage3D

One of the other items on the Flash roadmap that has received considerable attention is Stage3D. The code name for this technology is Molehill, and it is of particular interest to game developers who need a cross-platform 3D library that is very close to the underlying graphics hardware. Some of the applications that Stage3D makes possible are shown in Figure 10–18.

Figure 10–18. Molehill demos from Away3D (top and bottom-right) and Adobe MAX (bottom-left)

These examples were built using a third-party 3D toolkit called Away3D on top of a pre-release version of Stage3D. Some other toolkits that you can expect to take advantage of Stage3D include Alternativa3D, Flare3D, Sophie3D, Unity, Yogurt3D, and M2D.

Besides being useful to game developers, Stage3D also opens up the possibility of having a highly optimized 2D UI toolkit. As discussed earlier with the GPU acceleration support, graphics processors can do many operations much faster than the CPU can, while consuming less power and saving battery life. By completely offloading the UI toolkit to the graphics processor, the CPU can be dedicated to application and business logic, leaving the display list management, compositing, and rendering to the GPU via the existing 3D scenegraph.
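To give a feel for how close to the hardware Stage3D sits, the following minimal sketch requests a GPU context and clears the back buffer on every frame. Stage3D was still in pre-release when this chapter was written, so the sketch assumes the API as it later shipped in Flash Player 11 and AIR 3. The class name Stage3DSketch is hypothetical, and a real application would upload geometry and shaders and issue draw calls where the comment indicates.

```actionscript
package {
    import flash.display.Sprite;
    import flash.display.Stage3D;
    import flash.display3D.Context3D;
    import flash.events.Event;

    // Hypothetical document class that acquires a Context3D and renders
    // an empty frame each tick.
    public class Stage3DSketch extends Sprite {
        private var context:Context3D;

        public function Stage3DSketch() {
            var stage3D:Stage3D = stage.stage3Ds[0];
            // The context arrives asynchronously, and again after a
            // device loss.
            stage3D.addEventListener(Event.CONTEXT3D_CREATE, onContextCreated);
            stage3D.requestContext3D();
        }

        private function onContextCreated(event:Event):void {
            context = Stage3D(event.target).context3D;
            // No anti-aliasing and no depth/stencil buffer in this sketch.
            context.configureBackBuffer(
                stage.stageWidth, stage.stageHeight, 0, false);
            addEventListener(Event.ENTER_FRAME, onEnterFrame);
        }

        private function onEnterFrame(event:Event):void {
            context.clear(0.1, 0.1, 0.3); // dark blue background
            // Geometry, shaders, and drawTriangles() calls would go here.
            context.present(); // show the finished back buffer
        }
    }
}
```

A 2D UI toolkit built on Stage3D would follow the same loop, drawing its components as textured quads instead of going through the CPU-rendered display list.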

Summary

As you have learned in this chapter, building high-performance Flex applications with advanced graphics, high frame rate, and smooth animation is attainable by following some mobile tuning best practices. Some of the specific areas in which you have gained performance tuning knowledge include the following:

  • Speeding up graphics rendering
  • Caching portions of the scenegraph as Bitmaps
  • Building high-performance item renderers
  • Optimal use of Text and Item components

In addition, you learned about upcoming improvements in the Flash runtime and graphics processing capabilities, which in many cases you will be able to take advantage of with no code changes.

All of these performance tuning techniques also apply to our final topic, which is extending the reach of your Flash and Flex applications to tablet, TV, and beyond.
