exploit simple weak scaling as machines grow in size. Instead, codes, including
algorithms for scientific data understanding, will have to greatly increase their
exploitation of on-node parallelism using very scarce memory resources.
The second implication is that, primarily for reasons of energy efficiency,
the locality of data will become much more important. On an exascale-class
architecture, the most expensive operation from a power perspective will be
moving data. The further the data is moved, the more expensive the process
will be. Therefore, approaches that maximize locality and pay close attention
to data movement are likely to be the most successful.
Although this is also the case at the petascale, it will become a much more
dominant factor at the exascale. In addition to locality between nodes, it will
be essential to pay attention to on-node locality, as the memory hierarchy
is likely to be deeper. The importance of locality also implies that global
synchronization will be very expensive, and the level of effort required to
manage varying levels of synchronization will be high.
Finally, the last implication is that the growth in the external secondary
storage system on an exascale machine, both in capacity and bandwidth, will
be dramatically less than the growth in floating point operations per second
(FLOPS) and concurrency. The relative decrease in I/O will have dramatic
impacts on the way data is moved off the HPC system. The relative performance
of an I/O system can often be judged by measuring how long it will
take to “checkpoint” the entire machine, that is, write the entire contents of
the memory to the secondary storage system (spinning disk). Over the past
15 years of HPC, that time has grown steadily, from 5 minutes in 1997 to
over 26 minutes in 2008 [7]. Extrapolations to the exascale vary between 40
and 100 minutes. Clearly, it will no longer be practical to quickly “dump”
the current state of a simulation to disk for later analysis by a visualization
tool. Both storing and analyzing simulation results are thus likely to require
entirely new approaches. One immediate conclusion is that much analysis of
simulation data is likely to be performed in situ with the simulation to
minimize communication and I/O bandwidth to secondary storage. (See 15.4.1.1
for a more in-depth discussion.)
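To make this concrete, the back-of-the-envelope sketch below estimates checkpoint
time as aggregate memory capacity divided by aggregate I/O bandwidth. The machine
parameters are illustrative assumptions rather than measurements of any particular
system; with the values shown, the estimate falls near the lower end of the 40 to
100 minute range quoted above.

/* Illustrative estimate only: checkpoint time = total memory / aggregate I/O
 * bandwidth. The memory capacity and bandwidth below are assumed values for
 * a hypothetical exascale-class machine, not measurements. */
#include <stdio.h>

int main(void)
{
    double memory_pb      = 50.0;   /* assumed aggregate memory, petabytes    */
    double bandwidth_tbps = 20.0;   /* assumed aggregate I/O bandwidth, TB/s  */

    double memory_tb = memory_pb * 1000.0;            /* PB -> TB             */
    double seconds   = memory_tb / bandwidth_tbps;    /* time to write it all */

    printf("Estimated checkpoint time: %.1f minutes\n", seconds / 60.0);
    return 0;
}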
One bright note rings out that possibly mitigates some of these issues.
New non-volatile memory technologies are emerging that may facilitate some
of the dramatic changes needed for I/O optimization. Probably the best known
of these is NAND flash, because of its ubiquity in consumer electronics
devices such as music players, phones, and cameras. The last few years have
seen the first exploration of its use in HPC systems [6]. Because it enables
significant improvements in read and write latency, non-volatile memory holds
the promise of improving relative I/O bandwidth for small I/O operations.
Adding a non-volatile memory device to the nodes of an HPC system provides
the underlying hardware resource necessary to both improve checkpoint
performance and provide a fast “swap” capability for these memory-constrained
nodes.
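As a rough illustration of how such a device might be used, the sketch below
implements a two-level checkpoint on a single node: the simulation blocks only
for a fast write to a node-local device, and a background thread then drains
that checkpoint to the slower parallel file system. The file paths (placed under
/tmp so the example runs anywhere), buffer sizes, and single draining thread are
hypothetical simplifications; a production scheme would coordinate the drain
across all nodes and tolerate failures of the local device.

/* Minimal sketch of two-level checkpointing with a node-local non-volatile
 * device: fast blocking write locally, asynchronous drain to the parallel
 * file system. Compile with: cc sketch.c -lpthread */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stage 1: block the simulation only long enough to write to local storage. */
static void write_local_checkpoint(const char *path, const void *buf, size_t n)
{
    FILE *f = fopen(path, "wb");
    if (f) { fwrite(buf, 1, n, f); fclose(f); }
}

/* Stage 2: copy the local checkpoint to the parallel file system, off the
 * simulation's critical path. */
static void *drain_to_pfs(void *arg)
{
    const char **paths = (const char **)arg;   /* [0] = local, [1] = global */
    FILE *in  = fopen(paths[0], "rb");
    FILE *out = fopen(paths[1], "wb");
    char buf[1 << 16];
    size_t n;
    if (in && out)
        while ((n = fread(buf, 1, sizeof buf, in)) > 0)
            fwrite(buf, 1, n, out);
    if (in)  fclose(in);
    if (out) fclose(out);
    return NULL;
}

int main(void)
{
    size_t n = (size_t)1 << 20;        /* stand-in for per-node state: 1 MB */
    char *state = malloc(n);
    if (!state) return 1;
    memset(state, 0x42, n);

    const char *paths[2] = { "/tmp/ckpt.local", "/tmp/ckpt.global" };

    /* Stage 1: fast, blocking write to the node-local (non-volatile) device. */
    write_local_checkpoint(paths[0], state, n);

    /* Stage 2: drain to the parallel file system in the background. */
    pthread_t drainer;
    pthread_create(&drainer, NULL, drain_to_pfs, (void *)paths);

    /* ... the simulation would resume its next timestep here ... */

    pthread_join(drainer, NULL);
    free(state);
    return 0;
}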
Current visual data analysis and exploration platforms, such as VisIt [20]