Chapter 17
IceT
Kenneth Moreland
Sandia National Laboratories
17.1 Introduction ...................................................... 373
17.2 Motivation ........................................................ 374
17.3 Implementation .................................................. 374
17.3.1 Theoretical Limitations ... and How to Break Them ... 375
17.3.2 Pixel Reduction Techniques ............................. 376
17.3.3 Tricks to Boost the Frame Rate ......................... 377
17.4 Application Programming Interface ............................. 378
17.4.1 Image Generation ........................................ 378
17.4.2 Opaque versus Transparent Rendering .................. 379
17.5 Conclusion ........................................................ 379
References .......................................................... 381
The Image Composition Engine for Tiles (IceT) is a high-performance,
sort-last parallel rendering library. IceT is designed for use in large-scale,
distributed-memory rendering applications. It works efficiently with large
numbers of processes, with large amounts of geometry, and with high-
resolution images. In addition to providing accelerated rendering for a stan-
dard display, IceT also provides the unique ability to generate images for tiled
displays. The overall resolution of the display may be several times larger
than any viewport rendered by a single machine. IceT is currently being used
as the parallel rendering implementation in high performance visualization
applications as like VisIt (see Chap. 16) and ParaView (see Chap. 18).
17.1 Introduction
The Image Composition Engine for Tiles (IceT) is an API designed to
enable applications to perform sort-last parallel rendering on large displays [8].
The design philosophy behind IceT is to allow very large data sets to be
rendered on arbitrarily high-resolution displays. Although frame rates may be sacrificed in favor of scalable polygons-per-second rendering rates, there are many features in IceT that allow an application to achieve interactive rates. These
373
374 High Performance Visualization
features include image compression, empty pixel skipping, image reduction,
and data replication. Together, these features make IceT a versatile parallel
rendering solution that provides optimal parallel rendering under most data
size and image size combinations.
IceT is designed to take advantage of spatial decomposition of the geometry
being rendered. That is, it works best if all the geometry on each process is
located in as small a region of space as possible. When this is true, each process
usually projects geometry on only a small section of the screen. This results
in less work for the compositing engine.
Overall, IceT demonstrates extraordinary speed and scalability. It is used
to render at tremendous rates, such as billions of polygons per second, and on
the largest supercomputers in the world [10].
17.2 Motivation
The original motivation for IceT was the need to support high performance
rendering for scientific visualization on large format displays [16]. Further-
more, the IceT development group needed to take advantage of the distributed
memory rendering clusters that were replacing the more expensive multipipe
rendering computers of the day. These requirements still ring true today. Sci-
entific data continues to grow, desktop displays with over 2 megapixels are
common, and nearly all high performance scientific visualization is performed
on distributed-memory computers.
There are three general classes of parallel rendering algorithms: sort-first,
sort-middle, and sort-last [7] (although, it is possible to combine elements of
these classes together [14]). When run on a distributed-memory machine, every
type of parallel rendering algorithm has some overhead caused by communi-
cation. In sort-first and sort-middle algorithms, this overhead is proportional
to the amount of geometry being rendered. In sort-last algorithms, this overhead is proportional to the number of pixels on the display.
Although sort-first and sort-middle parallel rendering algorithms efficiently
divide screen space and are often used to drive tiled displays, these algorithms
simply cannot scale to the size of data that sort-last algorithms are able to
support [12, 18]. Because large-scale data is its primary concern, IceT im-
plements sort-last rendering and employs the techniques used to reduce the
overhead incurred with high-resolution displays.
17.3 Implementation
The most important aspect of parallel rendering in IceT’s implementation
is that it performs well with both large amounts of data and high-resolution
displays. The sort-last compositing algorithms, described in Chapter 5, en-
sure that IceT performs well with large amounts of data and large process
IceT 375
counts [10]. This chapter primarily describes the techniques IceT uses to ef-
fectively render to large-format displays.
17.3.1 Theoretical Limitations ... and How to Break Them
There are a number of theoretical metrics with important practical conse-
quences. These include the number of pixel-blending operations performed by
each process (which affects the total time computing), the number of pixels
sent or received by each process (which affects how long it takes to trans-
fer data), the number of messages sent at any one time (which can affect
network congestion), and the number of sequential messages sent (which can
accumulate the effect of the network latency).
With respect to IceT’s performance on large images, the most important of
these metrics are the number of pixel-blending operations and the number of
pixels sent and received. It is easy to show, for example, that the binary-swap
algorithm is optimal on both counts. The binary-swap algorithm is described in Section 5.2.2 as well as in previous studies [3, 4].
Binary-swap is a divide-and-conquer algorithm that operates in rounds
that pair processes, swap image halves, and recurse in each half. Given p
processes compositing an image of $n$ pixels, there must be at least $(p-1) \cdot n$ blending operations (because it takes $p-1$ operations to blend the $p$ versions of each pixel generated by all the processes). A perfectly balanced parallel algorithm will blend $\frac{(p-1) \cdot n}{p}$ pixels in each process.
The binary-swap algorithm has $\log_2 p$ rounds, with each round blending $n/2^i$ pixels in each process, where $i$ is the round index starting at 1. The total number of blending operations performed in each process is therefore
\[
\sum_{i=1}^{\log_2 p} \frac{n}{2^i} = n - \frac{n}{2^{\log_2 p}} = n - \frac{n}{p} = \frac{(p-1) \cdot n}{p}, \tag{17.1}
\]
which is, as previously mentioned, optimal. Likewise, binary-swap transfers an optimal number of pixels. Radix-k [13], which is also supported in IceT, is similarly optimal with respect to pixel blending and pixel transfer.
In addition, radix-k can also reduce the number of rounds as well as overlap
computation and communication [2].
Although this theoretical, optimal solution still grows linearly with respect
to the resolution of the image, it is possible, in practice, to perform much
better. The previous analysis makes a critical assumption: that every pixel generated by every process contains useful data. That is true in the worst case, but in practice many pixels can be ignored.
Consider the example of parallel rendering shown in Figure 17.1. Each
process renders a localized cube of data surrounded by an abundance of blank
space. Although the example in Figure 17.1 may seem artificial, this case is
actually quite common. Sort-last volume rendering requires the geometry to
FIGURE 17.1: An example rendering of a cubic volume by eight processes.
The first eight images represent those generated by each process. The image
at the right is a fully composited image.
be spatially decomposed in this way [9], and even unstructured data tends to
exhibit good spatial decomposition.
If this blank space is identified and grouped, it does not need to be trans-
ferred or blended. In this way, the overhead from high-resolution images can
be reduced. Furthermore, as more processes are used, the geometry gets di-
vided into even smaller units, thereby further reducing the amount of pixel
data. Consequently, the larger overhead from higher-resolution images can be
compensated for by adding more processes to the task.
17.3.2 Pixel Reduction Techniques
The first step in reducing the amount of pixel data to be composited is
to identify the active pixels, which are those that have been rendered to,
and the complementary inactive pixels, which are those that have no render
information. IceT first conservatively estimates a region of inactive pixels by
projecting the bounding box of the geometry on the screen and declaring
everything outside of this inactive. It then checks the remaining pixels for the
background depth, or opacity, to find any remaining inactive pixels.
Once inactive pixels are identified, they are marked and their data is re-
moved from the image data storage. IceT uses a form of run-length encoding
called active-pixel encoding [11]. Active-pixel encoding stores alternating run
lengths of inactive and active pixels. Data for inactive pixels are removed,
whereas data for active pixels follow their respective run lengths. Active-pixel
encoding provides the double benefit of: (1) decreasing the amount of data
transferred; and (2) providing a simplified means of skipping over inactive
pixels that need not be blended.
Although active-pixel encoding does reduce the overhead of sort-last par-
allel rendering, the savings are generally not well balanced. When binary-swap
or radix-k partitions an image, the resulting subimages are unlikely to contain
the same number of active pixels.
A simple but effective method to rebalance the parallel compositing is to
interlace the image [7, 17]. In image interlacing, the pixels are shuffled around
to distribute local regions throughout the image. When interlacing an image,
IceT carefully shuffles regions that match those created by the binary-swap
or radix-k algorithm such that each partition binary-swap or radix-k creates
contains a block of unshuffled pixels. Thus, the reshuffling back to the original
FIGURE 17.2: Image interlacing in IceT.
pixel order is combined with the image partition collection at no extra cost, as shown in Figure 17.2.
IceT can also take advantage of spatial decomposition in other ways on
multitile displays. In such a case, each process tends to render geometry onto only a small set of tiles. IceT identifies completely blank tiles and removes them from
the compositing computation. Special parallel compositing algorithms balance
the compositing work for the remaining tiles that contain valid data [11].
The most effective of these algorithms is a reduction algorithm that assigns
processes to tiles in proportion to the number of images generated for each
tile. All images are sent to a process assigned to the corresponding tile, and
subsequently, each process group composites an image for its assigned tile.
17.3.3 Tricks to Boost the Frame Rate
Even with the pixel reduction techniques implemented in IceT, it may be
the case that the image composition overhead is still too great to maintain
an interactive frame rate. For example, this can occur when there are too few
processes driving a large-format display or when the view is zoomed in on a large portion of the geometry.
A straightforward method to reduce the image compositing overhead is
to simply reduce the number of pixels in the image. Rather than render a full-resolution image, the user can render a smaller image and then upsample it to the displayed size. Clearly, this is not a technique that can be used for
every render because it loses detail. However, it is often the case in scientific
visualization that, when interacting with the data, a reduced level of detail
can be used in place of a full-resolution representation [1]. In such a case, an application might render a lower-resolution image during interaction and replace it with a full-resolution image once interaction is finished. To better support this technique, IceT can
automatically inflate images when drawing to an OpenGL context.