Parallel Integral Curves 99
For example, if the entire mesh will fit into the main memory of a single
process, or if the flow is known to be spatially local, the I/O will be minimal.
In such cases, parallelization is trivial, and parallel scaling is expected to be
nearly ideal. Also, the risk of processor imbalance is minimized because of the
uniform assignment of seeds to processors.
Disadvantages: Because data blocks are loaded on demand, it is possible for
I/O to dominate this algorithm. For example, in a vector field with circular
flow and a data block cache that does not have large enough blocks, data will
be continuously loaded and purged in the LRU cache. Additionally, workload
imbalance is possible if the computational requirements for particles differ
significantly. For example, if some particles are advected significantly farther
than others, the processors assigned to these particles will be doing all the
work, while the others sit idle.
6.3.3 Parallelization Over Data
The second of the traditional algorithms is one that parallelizes over the
axis of data blocks. In this algorithm, shown in Figure 6.4, the data blocks
are statically assigned to the processors. Thus, given n processors, 1/n of the
data blocks are assigned to each processor. Each particle is integrated until it
terminates or leaves the blocks owned by the processor. As a particle crosses
block boundaries, it is communicated to the processor that owns the data
block. A globally maintained active particle count is maintained so that all
processors may monitor how many particles have yet to terminate. Once the
count goes to zero, all processors terminate.
Advantages: This algorithm performs the minimal amount of I/O, where
each data block is loaded once, and only once into memory. As I/O operations
are orders of magnitude more expensive than computation, this is a significant
advantage. This algorithm is well-suited to situations where the vector field is
known to be uniform and the seed points are known to be sparse.
Disadvantages: Because of the spatial parallelization, this algorithm is very
sensitive to seeding and vector field complexity. If the seeding is dense, only the
processors where the seeds lie spatially will do any work. All other processors
will be idle. Finally, in cases where the vector field contains critical points, or
invariant manifolds, points will be drawn to these spatial regions, increasing
the workload of some processors, while decreasing the work of the rest.
6.3.4 A Hybrid Approach to Parallelization
As outlined previously, parallelization over seeds or data blocks are subject
to workload imbalance. Because of the complex nature of vector field analy-
sis, it is often very difficult or impossible to know a priori which strategy