80 High Performance Visualization
above, other possible parameters for
k are [12], [6, 2], [2, 6], [3, 4], or [2, 2, 3],
to name a few. With a judicious selection of
k, higher compositing rates are
attained when the underlying hardware offers support for multiple commu-
nication links and the ability to perform communication and computation
simultaneously. Even when hardware support for increased parallelism is not
available or the image size or number of processes dictates that binary-swap
or direct-send is the best approach, those algorithms are valid radix-k config-
urations.
5.3.3 Optimizations
Kendall et al. [11] extended radix-k to include active-pixel encoding and
compression, and they showed that such optimizations benefit radix-k more
than its predecessors, because the choice of k-values is configurable. Hence,
when message size is decreased by active-pixel identification and encoding,
k-values can be increased and performance can be further enhanced. Their
implementation encodes nonempty pixel regions, based on the bounding box
information, into two separate buffers: one for alternating counts of empty
and nonempty pixels, and the other for the actual pixel values. This way,
new subsets of the image can be taken by reassigning pointers rather than
copying pixels, and images remain encoded throughout all of the composit-
ing rounds. Non-overlapping regions are copied from one round to the next
without performing the blending operation.
A set of empirical tests was performed to determine a table of target k-
values for different image sizes, number of processes, and architectures at both
the Argonne and Oak Ridge Leadership Computing Facilities. The platforms
tested were IBM Blue Gene/P, Cray XT, and two graphics clusters. With this
table, radix-k can look up the closest entry for a given image size, system
size, and architecture, and automatically factor the number of processes into
k-values as close to the target value as possible.
Moreland et al. [18] deployed and evaluated optimizations in a production
framework, rather than in isolated tests: IceT serves as both this test and
production framework. The advantages of this approach are that the tests
represent real workloads and improvements are ready for use in a production
environment sooner. These improvements include minimizing pixel copying
through compositing order and scanline interleaving, and a new telescoping
algorithm for the non-power-of-two number of processes that can further im-
prove radix-k. A final advantage of using IceT for these improvements is that
IceT provides unified and reproducible benchmarks that other researchers can
repeat.
One of the improvements was devising a compositing order that minimizes
pixel copying. The usual, accumulative order causes nonoverlapping pixels to
be copied up to k
i
−1 times in round i, whereas tree methods only incur log k
i
copy operations. Pixel copying can further be reduced while using a novel
image interlacing algorithm. Rather than interleaving partitions according to