106 High Performance Visualization
many ICs must be communicated to the block owning processors. For sparse
seeding, the ICs are more uniformly distributed and communication costs are
therefore lower.
6.3.6 Hybrid Data Structure and Communication Algorithm
Large seed sets and large time-varying vector fields pose additional chal-
lenges for parallel IC algorithms. The first challenge is the decomposition
of 4D time-varying flow data, in space and time. This section discusses a
hybrid data structure that combines both 3D and 4D blocks for comput-
ing time-varying ICs efficiently. The second challenge arises in both steady
and transient flows but it is magnified in the latter case: this is the problem
of performing nearest-neighbor communication iteratively while minimizing
synchronization points during the execution. A communication algorithm is
introduced, then, for the exchange of information among neighboring blocks
that allows tuning the amount of synchronization desired. Together, these
techniques are used to compute 1/4 million steady-state and time-varying
ICs on vector fields over 1/2 terabyte in size [10].
Peterka et al. [9] introduced a novel 3D/4D hybrid data structure, which is
shown in Figure 6.7. It is called “hybrid” because it allows a 4D space–time to
be viewed as both a unified 4D structure and a 3D space × 1D time structure.
The top row of Figure 6.7 illustrates this idea for individual blocks, and the
bottom row shows the same idea for neighborhoods of blocks.
The same data structure is used for both steady-state and time-varying
flows, by considering steady-state as a single timestep of the time-varying gen-
eral case. After all, a particle is a 4D (x, y, z, t) entity. In the left column of
Figure 6.7, a block is also a 4D entity, with both minimum and maximum
extents in all four dimensions. Each 4D block sits in the center of a 4D neigh-
borhood, surrounded on all sides and corners by other 4D blocks. A single
neighborhood consists of 3
4
blocks: the central block and adjacent blocks are
in both directions in the x, y, z,andt dimensions.
If the data structure were strictly 4D, and not hybrid, the center and right
columns of Figure 6.7 would be unnecessary; but then, all timesteps of a time-
varying flow field would be needed at the same time, in order to compute an
IC. Modern CFD simulations can produce hundreds or even thousands of
timesteps, and the requirement to simultaneously load the entire 4D data set
into a main memory would exceed the memory capacity of even the largest
HPC systems. At any given time in the particle advection, however, it is only
necessary to load data blocks that contain vector fields at the current time
and perhaps one timestep in the future. On the other hand, loading multiple
timesteps simultaneously often results in a more efficient data access pattern
from storage systems and reduced I/O time. Thus, the ability to step through
time in configurable-sized chunks, or time windows, results in a flexible and
robust algorithm that can run on various architectures with different memory
capacities.