counts [10]. This chapter primarily describes the techniques IceT uses to ef-
fectively render to large-format displays.
17.3.1 Theoretical Limitations ... and How to Break Them
There are a number of theoretical metrics with important practical conse-
quences. These include the number of pixel-blending operations performed by
each process (which affects the total time computing), the number of pixels
sent or received by each process (which affects how long it takes to trans-
fer data), the number of messages sent at any one time (which can affect
network congestion), and the number of sequential messages sent (which can
accumulate the effect of the network latency).
With respect to IceT’s performance on large images, the most important of
these metrics are the number of pixel-blending operations and the number of
pixels sent and received. It is easy to show, for example, that the binary-swap
algorithm is optimal on both counts. The binary-swap algorithm is described
in Section 5.2.2 as well as in previous studies [3, 4].
Binary-swap is a divide-and-conquer algorithm that operates in rounds: in
each round, processes pair up, swap halves of their current image partitions,
and recurse on the half they keep. Given p
processes compositing an image of n pixels, there must be at least (p −1) · n
blending operations (because it takes p −1 operations to blend the p versions
of each pixel generated by all the processes). A perfectly balanced parallel
algorithm will blend $\frac{(p-1) \cdot n}{p}$ pixels in each process.
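The halving of image ownership can be sketched as a toy model (illustrative Python, not IceT's actual implementation): with p a power of two, each process ends up owning a disjoint range of n/p contiguous pixels after log2(p) rounds.

```python
def binary_swap_ranges(p, n):
    """Toy model of binary-swap ownership for p power-of-two processes
    compositing an image of n pixels. Returns each rank's final
    (start, end) pixel range and the pixels it blended over all rounds."""
    rounds = p.bit_length() - 1            # log2(p)
    ranges = [(0, n)] * p                  # every rank starts with the whole image
    blended = [0] * p
    for i in range(1, rounds + 1):
        dist = p >> i                      # pairing distance in round i
        nxt = []
        for rank in range(p):
            lo, hi = ranges[rank]
            mid = (lo + hi) // 2
            # Each rank keeps one half, sends the other to its partner,
            # and blends the partner's copy of the kept half: n/2**i ops.
            kept = (mid, hi) if rank & dist else (lo, mid)
            blended[rank] += kept[1] - kept[0]
            nxt.append(kept)
        ranges = nxt
    return ranges, blended

ranges, blended = binary_swap_ranges(8, 1024)
# Final ranges are disjoint n/p-pixel pieces; every rank blended (p-1)*n/p pixels.
```

Running the model for p = 8 and n = 1024 confirms the counts derived below: each rank blends 896 = 7 · 1024/8 pixels.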
The binary-swap algorithm has $\log_2 p$ rounds, with each round blending
$n/2^i$ pixels in each process, where $i$ is the round index starting at 1.
The total number of blending operations performed in each process is therefore
$$
\sum_{i=1}^{\log_2 p} \frac{n}{2^i}
\;=\; n - \frac{n}{2^{\log_2 p}}
\;=\; n - \frac{n}{p}
\;=\; \frac{(p-1) \cdot n}{p},
\tag{17.1}
$$
which is, as previously mentioned, optimal. Likewise, binary-swap transfers
an optimal number of pixels. Radix-k [13], which is also supported in
IceT, is similarly optimal with respect to pixel blending and pixel transfer.
In addition, radix-k can also reduce the number of rounds as well as overlap
computation and communication [2].
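The telescoping sum in Equation 17.1 generalizes to radix-k: in a round of radix k over a span of s pixels, each process keeps s/k pixels and blends the k−1 incoming copies of that piece, so any factoring of p yields the same optimal blend total. A quick numerical check (an illustrative sketch, not IceT code):

```python
from math import prod

def blends_per_process(factoring, n):
    """Pixels blended per process when p = prod(factoring) processes run
    a radix-k composite with the given per-round radices. A round of
    radix k over a span of s pixels costs (k-1)*s/k blend operations."""
    span, total = n, 0
    for k in factoring:
        total += (k - 1) * span // k
        span //= k
    return total

n = 1 << 20
p = 64
# Binary-swap is radix-k with every radix equal to 2 (6 rounds for p = 64);
# radix-8 needs only 2 rounds yet blends the same optimal total.
assert blends_per_process([2] * 6, n) == (p - 1) * n // p
assert blends_per_process([8, 8], n) == (p - 1) * n // p
```

The fewer rounds of a larger radix reduce the accumulated latency term while leaving the blend and transfer totals at the optimum.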
Although this theoretical, optimal solution still grows linearly with respect
to the resolution of the image, it is possible, in practice, to perform much
better. The previous analysis makes a critical assumption: that every pixel
generated by every process contains useful data. Such a worst case is
possible, but in practice, many pixels can be ignored.
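For instance, blank pixels can be skipped with a simple run-length encoding. The sketch below (a hypothetical helper, not IceT's actual compression format) records only the runs of non-blank pixels, so blank spans are neither transferred nor blended:

```python
def rle_active(pixels, blank=0):
    """Compress a pixel list to (offset, values) runs of non-blank pixels."""
    runs, i, n = [], 0, len(pixels)
    while i < n:
        while i < n and pixels[i] == blank:   # skip a blank span
            i += 1
        start = i
        while i < n and pixels[i] != blank:   # collect an active run
            i += 1
        if i > start:
            runs.append((start, pixels[start:i]))
    return runs

img = [0, 0, 5, 7, 0, 0, 0, 9, 0]
runs = rle_active(img)
# Only the 3 active pixels (of 9) need to be transferred or blended.
active = sum(len(values) for _, values in runs)
```

When most of a process's viewport is empty, as in the rendering example discussed next, the active-pixel count, and hence the real cost, can be far below the theoretical bound.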
Consider the example of parallel rendering shown in Figure 17.1. Each
process renders a localized cube of data surrounded by an abundance of blank
space. Although the example in Figure 17.1 may seem artificial, this case is
actually quite common. Sort-last volume rendering requires the geometry to