200 High Performance Visualization
managing the memory footprint directly addresses the large data challenge,
and it can also improve performance, even on small data, because of the
forced locality of data references in the memory hierarchy [32, 9, 34, 40].
However, there are challenges to applying streaming in general purpose
visualization. Some data structures are not well-suited for partitioning, and
not every algorithm can work within the confines of a restricted portion of the
data. Also, since a streaming formulation of an algorithm may involve many
passes through the data, each of which can be time consuming, optimizations
are essential for interactive data exploration. Fortunately, given that the data
is to be processed in distinct portions, streaming facilitates skipping (culling)
unimportant portions of the data, which can amount to a majority of the
input.
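The interplay of piece-wise processing and culling can be made concrete with a small sketch. The following Python is illustrative only, not code from any system surveyed here; the function name and the isovalue-based culling test are hypothetical, standing in for whatever importance criterion a real algorithm (e.g., isosurfacing) would use.

```python
def stream_reduce(pieces, isovalue, process):
    """Process a large data set one piece at a time.

    Each piece is assumed small enough to fit in core memory. Pieces
    whose value range cannot contain the isovalue are culled, so the
    expensive per-piece work is skipped for unimportant portions.
    """
    results = []
    for piece in pieces:                # one pass, piece by piece
        lo, hi = min(piece), max(piece)
        if lo <= isovalue <= hi:        # cull: isosurface cannot cross here
            results.append(process(piece))
    return results
```

When, as the text notes, the unimportant portions amount to a majority of the input, the culling test (a cheap min/max range check here) eliminates most of the per-piece work.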
This chapter first discusses OOC processing and streaming in general terms.
It begins by considering the design parameters for applying the concept of
streaming to scientific visualization, and then presents some observations
on the potential optimizations and hazards of doing scientific visualization
within a streaming framework. With an appreciation of these fundamentals,
recent works are surveyed in which the streaming concept plays an important
part in the implementation. The discussion concludes with a summary of the
state of the art and a conjecture about the future.
10.1 External Memory Algorithms
The most popular approach to large data visualization is data parallelism.
In data parallelism, the system takes advantage of the aggregate memory
of a set of processing elements by splitting up the data among all of them.
Each processing element only computes the portion for which it is responsible.
Chapters 16, 18, and 21 describe current scientific visualization systems that
make use of data parallelism.
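As a minimal sketch of the data-parallel idea, consider the following illustrative Python (the names `partition` and `data_parallel_sum` are hypothetical, and the partial results are computed sequentially here; in a real system each piece would reside on, and be processed by, a separate processing element):

```python
def partition(data, n):
    """Split data into n contiguous, near-equal pieces,
    one per processing element."""
    k, r = divmod(len(data), n)
    pieces, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        pieces.append(data[start:end])
        start = end
    return pieces

def data_parallel_sum(data, n):
    """Each processing element computes only the portion it owns;
    the partial results are then aggregated."""
    partials = [sum(p) for p in partition(data, n)]  # one per element
    return sum(partials)                             # final reduction
```

The essential property is that no single processing element ever holds more than its own piece, so the aggregate memory of the group, rather than any one machine, bounds the problem size.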
Data parallelism implicitly assumes that the data fits entirely in a large ag-
gregate memory space, somewhere. Getting access to a large enough memory
space can be expensive and otherwise impractical for many users. Streaming
exchanges resources for time—instead of using ten machines to visualize a
problem, one machine might visualize the same problem but take ten times
longer. Streaming thus allows users to process larger data sets than they
otherwise could, regardless of the size of their machines.
External memory algorithms are written with the assumption that the
entire problem cannot be loaded all at once. These algorithms are described
well in surveys by Vitter [35], Silva [31], and Lipsa [22]. Vitter first classi-
fied external memory algorithms into two groups: online problems and batch
problems.
Online problems deal with large data by first pre-processing the data into an
organized whole and then making queries against it, each of which can be sat-
isfied by touching only a small portion. The data structures and algorithms