204 High Performance Visualization
space for inter-stage buffering. As illustrated in Figure 10.2, the synchronous
pipeline, on the left, cannot start processing the second portion until the first
is finished. The first module in the asynchronous pipeline, on the right, will
start working on the second portion as soon as it has finished processing the
first one.
10.3 Streamed Visualization Concepts
This section discusses some general aspects of applying the concept of
streaming to scientific visualization. The discussion will be limited to tradi-
tional qualitative scientific visualization in which data sets are processed and
rendered in an effort to gain qualitative insights into the problem at hand.
10.3.1 Data Structures
Ideally, streaming visualization breaks scientific data apart into coherent regions of space and time. The different classes of input data to be visualized have properties that make them more or less well-suited to such partitioning.
A completely structured data set has the property that its topology and geometry are implicit and its data values represent regularly spaced samples. Structured data sets are efficiently stored as arrays of data values. Streaming
with structured data is relatively easy:
- The mapping between a geometric location and a location within a data array is implicit and easily calculated.
- Portions containing spatially coherent regions (slices along the slowest-changing direction) are straightforward [32, 34] without any pre-processing.
- With pre-processing, portions containing contiguous regions [9] and multiresolution representations [26] are also trivial to acquire.
- Strided sampling can produce lower-resolution data [27, 3].
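As a concrete sketch, the implicit index mapping and slab-based streaming of a structured data set can be written in a few lines of Python (the function names here are illustrative, not from any particular library), assuming values are stored with i varying fastest and k slowest:

```python
def flat_index(i, j, k, nx, ny):
    # Implicit mapping from a grid location (i, j, k) to a position in the
    # flat value array: i varies fastest, k slowest.
    return i + nx * (j + ny * k)

def stream_slabs(values, nx, ny, nz, slab_depth):
    # Yield spatially coherent slabs along k, the slowest-changing
    # direction. Each slab is one contiguous run of the value array, so no
    # pre-processing is needed to extract it.
    for k0 in range(0, nz, slab_depth):
        k1 = min(k0 + slab_depth, nz)
        yield values[flat_index(0, 0, k0, nx, ny):flat_index(0, 0, k1, nx, ny)]
```

Because each slab is contiguous, it can be read directly from disk with a single seek and read, which is what makes structured data so convenient to stream.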
Unstructured data are not as well-suited to streaming because of their unconstrained topology and geometry. For example, in the context of the VTK library, to know the location of any given cell, the program must access the cell's vertices by walking the cell's vertex index list. The vertices are stored in their own array, and the vertices for any cell may be scattered arbitrarily within it. Creating coherent portions is not trivial, since the cell and vertex arrays may not fit into memory, and because geometric information is spread across that memory layout. Once partitioned, the fact that a portion's bounding box is not known until after the portion has been entirely read can also be problematic.
Streaming and Out-of-Core Methods 205
Modern external memory and streaming formats for unstructured data solve the first problem by in-lining the vertex data with the cell data (also called normalized or triangle soup representations [6]), by keeping cells and vertices together in portions called meta-cells [5], or by intermixing the vertex and cell lists so that the vertices and the cells that reference them stay close together in the data stream [19]. The second problem is solved by keeping meta-information along with each portion.
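A minimal sketch of the in-lining idea, in Python with hypothetical names: an indexed mesh is "normalized" into a triangle soup, so that any contiguous run of the output stream is self-contained, at the cost of duplicating shared vertices:

```python
def to_triangle_soup(points, triangles):
    # "Normalize" an indexed mesh by in-lining each triangle's vertex
    # coordinates. Every entry of the result carries its own geometry, so
    # a portion of the soup can be processed without the point array.
    soup = []
    for (a, b, c) in triangles:
        soup.append((points[a], points[b], points[c]))
    return soup
```

The level of indirection from the cell list into the point list is gone, which is exactly what makes a portion of the stream independently processable.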
A notable exception to the rule that unstructured data is hard to stream
is the point set. It is an unstructured data type that consists only of a set of
locations in space without any connections between them. The fact that one
level of indirection (from cell list to point list) is removed greatly simplifies
the handling of large point sets. There are also many algorithms that work on
point sets that do not require spatially coherent regions. For those algorithms
that do, standard out-of-core sorting algorithms [35] can be used in a pre-processing step to produce regions that do not overlap.
Composite data sets, which consist of sets of structured or unstructured
data sets (i.e., blocks), can be straightforward to stream, as long as metadata
describing each block is available. In this situation, it is straightforward to process each block in turn.
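A block-by-block traversal of a composite data set might look like the following Python sketch (the metadata record layout and names are assumptions for illustration):

```python
def stream_composite(block_index, process):
    # Visit each block of a composite data set in turn. Only the small
    # metadata records are held in memory at once; each block is loaded
    # on demand through its metadata and released after processing.
    results = []
    for meta in block_index:
        block = meta["load"]()        # read one block using its metadata
        results.append(process(block))
    return results
```

In practice the metadata record would also carry bounds and data ranges, so blocks could be skipped entirely, as discussed under sparse traversal below.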
10.3.2 Repetition
In streaming visualization, the visualization pipeline has to support repe-
tition to iterate through all of the portions that make up the whole data set.
As Figure 10.3 illustrates, both push and pull pipeline models can support
streaming. In the push model, the system reads a portion at a time. The act
of reading each portion causes the pipeline to push it through toward the
rendering module. In the pull model, it is the rendering module that requests
each portion, in turn (see also 2.2.2).
Comparing the two, a push model is better suited to unbounded length
streams since the availability of new data can simply trigger the next cy-
cle. Push models also have less overhead in asynchronous streaming because
downstream modules can begin as soon as results are ready [36]. On the other
hand, push models are not as effective at sparse traversal because downstream modules cannot be as selective about what the pipeline processes [23].
With either approach, streaming visualization systems must aggregate re-
sults over time as each new result is produced. For example, with a scan-
converted rendering of opaque surfaces, the color and depth buffers are not
cleared in between portions, so that each pixel ends up with the color of the
nearest primitive out of the entire data set.
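The aggregation of opaque, scan-converted results described above can be sketched in Python; the buffers persist across calls, one call per portion (names and fragment layout are hypothetical):

```python
def render_portion(depth_buf, color_buf, fragments):
    # Aggregate scan-converted fragments WITHOUT clearing the buffers
    # between portions: each pixel keeps the color of the nearest opaque
    # primitive seen so far, across the entire data set.
    for pixel, depth, color in fragments:
        if depth < depth_buf[pixel]:
            depth_buf[pixel] = depth
            color_buf[pixel] = color
```

After the last portion has been rendered, the color buffer holds the same image a single-pass renderer would have produced, which is why opaque scan conversion streams so easily.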
10.3.3 Algorithms
Some, but not all, visualization algorithms are inherently well-suited for
streaming [32, 34, 11]. The defining factor is how much support an algorithm requires: local support or global support?

push:
  create_pipeline(source, ..., sink)
  while (!endofstream)
    source.readMore()

pull:
  create_pipeline(source, ..., sink)
  while (piece < #pieces)
    sink.render(piece++)

FIGURE 10.3: In push and pull pipelines, the source or sink directs the pipeline to process the next portion.

Localized algorithms, typified by per-pixel fragment programs, are well-suited because they analyze each point or cell independently and are thus insensitive to data partitioning and traversal order. Many extraction algorithms fall into this category.
Algorithms that require information from a local neighborhood of elements,
for example differential operators, are less well-suited as both the primary el-
ement and its neighbors must be resident together. This is typically solved by
using a ghost cell technique, where portions include a layer of cells belonging
to their neighbors.
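A one-dimensional Python sketch of the ghost cell technique (illustrative, not taken from any particular system): each portion carries one extra sample from each neighbor, so a 3-point central difference stencil can be evaluated locally:

```python
def stream_with_ghosts(data, chunk):
    # Yield portions padded with one ghost sample from each neighboring
    # portion, so a 3-point stencil can be evaluated locally. The offset
    # locates the first owned sample within the padded portion.
    n = len(data)
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        lo, hi = max(start - 1, 0), min(stop + 1, n)
        yield start, stop, data[lo:hi], start - lo

def central_difference(data, chunk):
    # Apply a central difference per portion; boundary points are left 0.
    out = [0.0] * len(data)
    for start, stop, padded, off in stream_with_ghosts(data, chunk):
        for i in range(start, stop):
            if 0 < i < len(data) - 1:
                p = i - start + off
                out[i] = (padded[p + 1] - padded[p - 1]) / 2.0
    return out
```

Each portion writes results only for the samples it owns; the ghost samples exist solely to feed the stencil, so the streamed result matches an in-core computation regardless of the chunk size.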
Algorithms that need global support must be recast into a local problem to
work efficiently under a streaming paradigm [40]. Problems such as streamline
integration, which can suffer from load imbalances in a data parallel solution,
also tend to cause excessive passes through the data in a streaming solution.
In a distributed-memory data parallel context, in which the data fits in the
parallel machine’s aggregate memory, the data needed by one processor to
continue work is always present somewhere. An algorithm can resort to communication between processors to continue making forward progress.
In a streaming context, data dependencies are more critical, since the data
needed to continue work may not have been loaded yet, or it might have been processed and evicted from memory to make space for the current data.
This problem may require additional passes to solve. The problem of interde-
pendence between portions can sometimes be solved by carefully considering
the order in which portions are processed, such as in the parallel streamed
computation of ghost cells [16].
Rendering algorithms may be local or global in nature. Aggregating scan-converted or splat-rendered results, neglecting translucency, is handled simply, as described earlier. Volume rendering and translucent surface rendering are problematic for two reasons: first, as the interior of the data is visible, a larger fraction of it contributes to the image and fewer portions can be culled; and second, one must maintain a back-to-front (or front-to-back) traversal order [32]. Ray tracing and streamline calculations are problematic because of their unstructured memory access characteristics (see 14.3 and Chap. 6).
FIGURE 10.4: Culling unimportant data leaves only portion 2 when slicing
at X = 0. Likewise, lazy evaluation could operate on only the important cells
within the portion.
10.3.4 Sparse Traversal
In streamed visualization, where each pass through the data takes a signif-
icant amount of time and where many passes may be needed, achieving inter-
activity may require reducing the number of portions processed. Any region of
the data that can be quickly determined as being unimportant to the final vi-
sualization should be culled to avoid time-consuming processing [23, 7, 2]. If a
portion’s extreme data ranges do not encompass a contour module’s isovalue,
the entire portion can be ignored by the whole pipeline. Similarly, as shown
in Figure 10.4, a slice module could reject a portion by testing its bounding
box against a slicing plane. These lightweight metadata operations can be performed quickly and save time by skipping heavyweight module operations. At a finer granularity, expensive derived computations on cells within a passing portion may also be avoided.
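The two lightweight metadata tests just described can be sketched as simple interval checks in Python (names are illustrative):

```python
def survives_contour(scalar_range, isovalue):
    # Lightweight metadata test: a portion can hold part of the
    # isosurface only if its extreme data values bracket the isovalue.
    lo, hi = scalar_range
    return lo <= isovalue <= hi

def survives_slice(x_extent, plane_x):
    # Likewise, a slice at X = plane_x can only intersect a portion whose
    # bounding extent straddles the plane.
    lo, hi = x_extent
    return lo <= plane_x <= hi
```

Applied to the portions of Figure 10.4, with X extents (20, 60), (-20, 20), and (-60, -20), slicing at X = 0 leaves only portion 2; neither test ever touches the heavyweight data itself.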
Other lightweight computations, such as rendering optimization, can save
time as well. For example, portions that lie entirely outside the view frustum,
or that are occluded by other portions, should be skipped. In some situations, back-facing portions can be assumed to be occluded and rejected in the same way that back-facing polygons are rejected in rendering [28]. Level-of-detail techniques are also appropriate for reducing data and processing passes. With large data, there may be many more display primitives (polygons, voxels, etc.) than there are pixels, so only a subset of those primitives actually contributes [29, 37, 3]. During interaction, responsiveness
is more important than accuracy, so the level of detail can be further reduced.
In practice, the appropriate level of detail can be determined by projecting
the portion’s bounding boxes into screen-space; low-resolution data can be
obtained by downsampling either on-the-fly or in a pre-processing step.
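One possible heuristic, sketched here in Python as an assumption rather than a method from the text: choose a sampling stride so that a portion contributes roughly one sample per pixel of its projected screen-space extent:

```python
def choose_stride(samples_along_axis, projected_pixels):
    # Pick a strided-sampling rate so a portion delivers roughly one
    # sample per pixel it covers on screen; more pixels means finer data.
    if projected_pixels <= 0:
        return samples_along_axis   # off-screen or degenerate: coarsest
    return max(1, samples_along_axis // projected_pixels)
```

During interaction, the same function can simply be fed a smaller pixel budget to trade accuracy for responsiveness.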
[Figure 10.5 diagram: top pipeline: read -> slice (x=0) -> render; bottom pipeline: read -> translate (x'=x-40) -> slice (x'=0) -> render. Portion X extents: portion 1: (20, 60); portion 2: (-20, 20); portion 3: (-60, -20).]
FIGURE 10.5: To be usable for skipping portions, metadata must remain accurate without processing the data in the portion. The slice module in the top pipeline can trust the metadata from the source and will ignore portions 1 and 3. The translate module in the bottom pipeline alters the real data, so the slice module should instead ignore portions 2 and 3.
Sparse traversal adds the requirement that accurate metadata accompany the partitioned data. In a few cases, simulation codes store metadata along with the data. If metadata is not present, the data must be pre-processed, or the metadata can be derived during the first pass through the data, during which sparse traversal will not be possible.
Once the metadata is obtained, it must be managed through the pipeline [10]. The pipeline avoids processing chunks based on the contents of their metadata; a portion that is off-screen will be ignored. However, pipeline modules manipulate data and can thereby invalidate a portion's metadata. For example, a module that repositions data may move an off-screen portion back into view. Downstream modules cannot cull anything in that case, unless upstream modules update the metadata before processing the data. Figure 10.5 demonstrates another example. In many cases the metadata cannot be determined a priori. The system should fall back to marking a portion's metadata as "known to be clean" or "potentially dirty," and then cull portions based only on the clean information. Later, once the dirty information is known after data processing, it can be updated and marked clean.
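The clean/dirty bookkeeping can be sketched in Python (class and function names are hypothetical): a translate module updates extents cheaply and keeps them clean, and culling trusts only clean metadata:

```python
class PortionMeta:
    def __init__(self, x_extent):
        self.x_extent = x_extent   # (lo, hi) bounding extent along X
        self.clean = True          # metadata known to match the real data?

def translate_meta(meta, dx):
    # A translate module can update the extent without touching the
    # heavyweight data, so the metadata stays clean. A module that cannot
    # update it cheaply would instead set meta.clean = False.
    lo, hi = meta.x_extent
    meta.x_extent = (lo + dx, hi + dx)
    return meta

def cull_portions(metas, plane_x):
    # Cull on clean metadata only; potentially dirty portions must pass
    # through and be re-examined after their data has been processed.
    keep = []
    for i, m in enumerate(metas):
        if m.clean and not (m.x_extent[0] <= plane_x <= m.x_extent[1]):
            continue
        keep.append(i)
    return keep
```

With the extents of Figure 10.5 and a slice at X = 0, the untranslated portions cull down to portion 2; after translating by -40, portion 1 survives instead, and a dirty portion always passes through.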
Lastly, the output of the visualization pipeline tends to be much smaller
than the whole data set. Although the input data may be very large, the
salient features within it tend to be small. Some salient features are: the part
of the data within a particularly important region; the output of slices, isocontours, and other dimensionality-reducing techniques; descriptive statistical metrics;
locations of extremal values; the presence or absence of particular events; etc.
Because the features are small, it is often possible to cache results [23, 10] in
the available memory, in order to avoid redundant processing while the user
interacts with previously generated results. Figure 10.6 illustrates an example