that the ICs will traverse a relatively small amount of the overall data. On
the other hand, for some applications, like streamline statistics, a sparse seed
point set covers the entire vector field evenly. This distribution results in ICs
traversing the entire data set. Hence, the seed set distribution is a significant factor in determining whether performance stands to gain the most from parallel computation, data distribution, or both.
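To make the two seeding scenarios concrete, the following minimal sketch (hypothetical; the `Vec3` type and `boxSeeds` helper are illustrative, not from this chapter) generates both a dense seed set confined to a small box around a feature of interest and a sparse set spread uniformly over the whole domain:

```cpp
#include <cstddef>
#include <cstdio>
#include <random>
#include <vector>

struct Vec3 { double x, y, z; };

// Uniformly sample n seed points inside an axis-aligned box [lo, hi].
std::vector<Vec3> boxSeeds(std::size_t n, Vec3 lo, Vec3 hi, std::mt19937& rng) {
    std::uniform_real_distribution<double> ux(lo.x, hi.x), uy(lo.y, hi.y), uz(lo.z, hi.z);
    std::vector<Vec3> seeds(n);
    for (auto& s : seeds) s = {ux(rng), uy(rng), uz(rng)};
    return seeds;
}

int main() {
    std::mt19937 rng(42);
    // Dense scenario: 10,000 seeds packed into ~0.1% of a unit-cube domain;
    // the resulting ICs touch only a small portion of the data.
    auto dense  = boxSeeds(10000, {0.45, 0.45, 0.45}, {0.55, 0.55, 0.55}, rng);
    // Sparse scenario (e.g., streamline statistics): 100 seeds covering the
    // entire domain evenly; the resulting ICs traverse the whole data set.
    auto sparse = boxSeeds(100, {0.0, 0.0, 0.0}, {1.0, 1.0, 1.0}, rng);
    std::printf("dense: %zu seeds, sparse: %zu seeds\n", dense.size(), sparse.size());
}
```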
Vector Field Complexity. Depending on the choice of seed points, the struc-
ture of a vector field can have a strong influence on which parts of the data need
to be taken into account in the IC computation process. Critical points or invariant manifolds of a strongly attracting nature draw streamlines towards them; ICs seeded in or traversing their vicinity remain closely localized. In the opposite case, a nearly uniform vector field requires
ICs to pass through large parts of the data. This dependency of IC com-
putation on the underlying vector field is both counterintuitive and hard to
identify without conducting a prior analysis to determine the field structure,
as is done in Yu et al.’s study [13], for example. While such analysis can be
useful for specific problems, this chapter’s contents consider a more general
setting, where the burden and cost of pre-processing is not considered.
6.3 Approaches to Parallelization
IC calculation places demands on almost all components of a computa-
tional system, including memory, processing, communication, and I/O. Be-
cause of the complex nature of vector fields, seeding scenarios, and the types
of analyses, as outlined in 6.2, there is no single scalable algorithm suitable
for all situations.
Generally, algorithms can be parallelized in two primary ways. The first
way is parallelization across the seed points, where seeds are assigned to pro-
cessors, and data blocks are loaded as needed. The second way is paralleliza-
tion across the data blocks, where processors are assigned a set of data blocks,
and particles are communicated to the processor that owns the data block. In
choosing between these two axes of parallelization, the different assignments
of particles and data blocks to processors will place different demands on the
computing, memory, communication, and I/O subsystems of a cluster.
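The following single-process sketch contrasts the two decompositions on a toy 1D domain (hypothetical; the block layout, the `overSeeds` and `overBlocks` functions, and the modulo block-ownership rule are placeholders for illustration, not any particular system's API). The first function parallelizes over seeds and pulls in blocks on demand; the second parallelizes over blocks and hands particles off to the owning processor.

```cpp
#include <cstdio>
#include <map>
#include <queue>
#include <vector>

struct Particle { double pos; };   // position in a 1D toy domain [0, 4)
constexpr int kBlocks = 4;         // four unit-width data blocks

int blockOf(double pos) { return static_cast<int>(pos); }

// Parallelize over seeds: this "rank" owns a fixed subset of seeds and
// loads whatever data blocks its particles enter. Blocks may be loaded
// redundantly by several ranks, so I/O grows with the blocks touched.
void overSeeds(std::vector<Particle> mySeeds) {
    std::map<int, bool> loaded;    // stands in for on-demand block I/O
    for (auto& p : mySeeds) {
        while (p.pos < kBlocks) {  // advect until the particle exits the domain
            int b = blockOf(p.pos);
            if (!loaded[b]) { loaded[b] = true; std::printf("load block %d\n", b); }
            p.pos += 0.5;          // stand-in for one integration step
        }
    }
}

// Parallelize over blocks: this rank owns every block b with
// b % ranks == myRank and advects only inside those blocks; a particle
// crossing into a foreign block is queued for the owning rank (the queue
// stands in for point-to-point message traffic).
void overBlocks(int myRank, int ranks, std::vector<Particle> incoming) {
    std::queue<Particle> outbox;
    for (auto& p : incoming) {
        while (p.pos < kBlocks && blockOf(p.pos) % ranks == myRank)
            p.pos += 0.5;          // advect only within owned blocks
        if (p.pos < kBlocks) {     // left our blocks before exiting the domain
            outbox.push(p);
            std::printf("send particle at %.1f to rank %d\n",
                        p.pos, blockOf(p.pos) % ranks);
        }
    }
}

int main() {
    overSeeds({{0.0}, {1.2}});
    overBlocks(/*myRank=*/0, /*ranks=*/2, {{0.0}});
}
```

Note how the costs differ even in this toy setting: `overSeeds` pays in redundant block loads, while `overBlocks` pays in particle communication; which cost dominates depends on the seeding and field structure discussed above.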
These two classes of algorithms have a tendency to perform poorly due to workload imbalance, arising either from an uneven particle-to-processor assignment or from loading too many data blocks, which leaves the computation I/O bound.
Recent research blending these two approaches shows promising results.
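A common way to quantify the imbalance described above is the ratio of the most-loaded processor's work to the mean work per processor, where a value near 1.0 indicates good balance. A small sketch, illustrative only, using integration steps as the unit of work:

```cpp
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <vector>

// Load-imbalance factor: max per-rank work divided by mean per-rank work.
// Perfect balance gives 1.0; here "work" counts integration steps per rank.
double imbalance(const std::vector<long>& stepsPerRank) {
    long maxWork = *std::max_element(stepsPerRank.begin(), stepsPerRank.end());
    double mean = std::accumulate(stepsPerRank.begin(), stepsPerRank.end(), 0L)
                  / static_cast<double>(stepsPerRank.size());
    return maxWork / mean;
}

int main() {
    // One rank doing nearly all the stepping, e.g. because its seeds sit
    // near a strongly attracting critical point: imbalance = 9000/2500 = 3.60.
    std::printf("imbalance = %.2f\n", imbalance({9000, 300, 400, 300}));
}
```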
The subsequent sections outline algorithms that parallelize with respect to
particles (6.3.2) and data blocks (6.3.3), as well as a hybrid approach (6.3.4)
and discuss each algorithm’s performance characteristics. These strategies aim
to keep a balanced workload across the set of processors and to design efficient
data structures for handling IC computation and communication.
In the approaches outlined below, the problem mesh is decomposed into a set of data blocks.