List of Figures
1.1 The visualization pipeline. Abstract data is input to the pipeline, where the pipeline transforms it into readily comprehensible images. ..... 2
1.2 An image of two vortex cores merging. ..... 2
2.1 A simple parallelization strategy for a parallel visualization framework. ..... 14
2.2 A parallel visualization framework embedded into a client–server application. ..... 17
2.3 Determining constraints and optimizations in a data flow network using contracts. ..... 19
2.4 Reading the optimal subset of data for processing. ..... 20
3.1 The three most common visualization pipeline partitionings in remote and distributed visualization. ..... 27
3.2 Three potential partitionings of a remote and distributed visualization pipeline performance experiment. ..... 34
3.3 The three potential partitionings have markedly different performance characteristics. ..... 35
3.4 Visapult's remote and distributed visualization architecture. ..... 37
3.5 Chromium Renderserver architecture diagram for a two-tile DMX display wall configuration. ..... 39
3.6 Chromium Renderserver running a 6-way parallel visualization of a molecular docking application on a 3 × 2 tiled display system. ..... 41
4.1 Rendering taxonomy based on Molnar's description. ..... 51
4.2 Ray casting for volume rendering. ..... 54
4.3 Volume rendering overview. ..... 54
4.4 Ray cast of a 316M triangle isosurface from timestep 273 of an RM instability calculation. ..... 60
4.5 Ray cast of the 259M triangle Boeing data set using HD resolution. ..... 61
4.6 Frame rate when varying cache size and number of nodes in the ray casted RM and Boeing data sets. ..... 63
4.7 Total memory used per node for the RM data set. ..... 64
4.8 Display process scaling. ..... 65
5.1 Sort-last image compositing. ..... 73
5.2 Example of the direct-send algorithm for four processes. ..... 75
5.3 Tree-based compositing. ..... 75
5.4 Example of the binary-swap algorithm for four processes and two rounds. ..... 76
5.5 Example of the 2-3 swap algorithm for seven processes in two rounds. ..... 78
5.6 Example of the radix-k algorithm for twelve processes, factored into two rounds of k = [4, 3]. ..... 79
5.7 Performance comparing binary-swap with radix-k for an image size of eight megapixels. ..... 82
5.8 Target k-values for two different machines. ..... 83
5.9 Performance comparing optimized binary-swap with radix-k shows overall improvement in volume rendering tests of core-collapse supernovae simulation output. ..... 83
5.10 Core-collapse supernova volume rendered in parallel and composited using the radix-k algorithm. ..... 84
6.1 A stream surface visualizing recirculating flow in a vortex breakdown bubble. ..... 92
6.2 Streamlines showing outflows in a magnetic flow field computed by an astrophysics simulation code. ..... 93
6.3 Integral curve test data sets. ..... 96
6.4 Parallelization over seeds versus parallelization over data algorithms. ..... 97
6.5 Astrophysics test case scaling data. ..... 104
6.6 Fusion test case scaling data. ..... 105
6.7 Data structure as a 3D/4D hybrid of time–space decomposition. ..... 107
6.8 The cumulative effect on end-to-end execution time. ..... 109
7.1 A sample bitmap index. ..... 121
7.2 Serial performance for computing conditional histograms and evaluating bin-based histogram queries using FastBit. ..... 123
7.3 Comparison of line-based and histogram-based parallel coordinates and extensions of parallel coordinates to visualize the temporal evolution of a selected feature. ..... 126
7.4 Visualizations illustrating the application of segmentation of query results. ..... 128
7.5 Query-driven analysis of network traffic data to detect distributed network scan attacks. ..... 131
7.6 Query-driven analysis of halo particles in an electron linear particle accelerator simulation. ..... 134
7.7 Query-driven exploration of a large 3D plasma-based particle accelerator simulation using parallel coordinates to define and validate data queries. ..... 137
7.8 Visualization of the relative traces of the particles of a particle beam illustrating the injection and initial acceleration of the particles by the plasma wave. ..... 138
8.1 The Z-order, or Morton, space-filling curve maps multi-dimensional data into one dimension while preserving the locality of neighboring points. ..... 148
8.2 PDA based on multiresolution. ..... 152
8.3 Analysis, reconstruction, and synthesis of a signal, c_j. ..... 157
8.4 A test signal with 1024 samples and a multiresolution approximation at 1/8th the resolution. ..... 160
8.5 The CDF 9/7 biorthogonal wavelet and scaling functions. ..... 163
8.6 A single pass of the 2D DWT and resulting decomposition after two passes of the DWT. ..... 164
8.7 Direct volume rendering of an enstrophy field. ..... 168
9.1 Comparing timings for a turbulent combustion simulation and its corresponding in situ visualization over many concurrency levels, up to 15,360 cores. ..... 176
9.2 Image produced from in situ visualization of the CH2O field from a turbulent combustion simulation. ..... 176
9.3 Diagram of the adaptor that connects fully featured visualization systems with arbitrary simulation codes. ..... 177
9.4 Example of co-processing in situ visualization using adaptors, showing the VisIt system with the GADGET-2 simulation code. ..... 182
9.5 Diagram of a concurrent in situ system performing data processing, monitoring, and interactive visualization. ..... 188
9.6 Illustration of hybrid data staging architecture for in situ co-processing and concurrent processing. ..... 189
9.7 An example of locality-aware, data-centric mapping of two interacting applications onto two multi-core nodes. ..... 190
9.8 Image produced from an in situ visualization of a turbulent combustion simulation, showing the interaction between small turbulent eddies. ..... 192
10.1 A comparison of data parallel visualization and streaming. ..... 201
10.2 Synchronous and asynchronous streaming in a data flow visualization network. ..... 203
10.3 In push and pull pipelines, the source or sink directs the pipeline to process the next portion. ..... 206
10.4 Culling unimportant data when slicing. ..... 207
10.5 Metadata must remain accurate, without processing the data in the portion. ..... 208
10.6 Assuming the cache can fit the results, the cache module will prevent upstream re-executions when the camera moves but, otherwise, the data remains unchanged. ..... 209
10.7 Culling, prioritizing, and multiresolution streaming in a pull pipeline. ..... 213
11.1 Simplified graphics pipeline based on OpenGL. ..... 225
11.2 GPGPU architecture with multiple compute programs called in sequence from the CPU, but each running in parallel on the GPU. ..... 228
11.3 Interactive sort-first volume rendering of the Visible Human. ..... 235
11.4 Visualization of a laser ablation simulation with 48 million atoms rendered interactively. ..... 238
11.5 Schematic overview of the abstraction layers of the CUDASA programming environment. ..... 241
11.6 A rear-projection tiled display without photometric calibration. ..... 245
12.1 4608² image of a combustion simulation result, rendered by a hybrid parallel MPI+pthreads implementation running on 216,000 cores of a Cray XT6 system. ..... 263
12.2 P_H volume rendering system architecture. Image courtesy of Mark Howison, E. Wes Bethel, and Hank Childs (LBNL). ..... 265
12.3 For ghost zone data, the P_H volume renderer requires less memory and performs less interprocessor communication than the P_T implementation. ..... 270
12.4 Charts comparing P_H and P_T ray casting and total render performance. ..... 271
12.5 Comparison of the number of messages and total data sent during the fragment exchange in the compositing phase for P_H and P_T runs. ..... 273
12.6 Total render time in seconds split into ray casting and compositing components and normalized to compare P_T and P_H performance. ..... 274
12.7 A streamline from computational thermal hydraulics. ..... 275
12.8 Comparison of P_T and P_H implementations of the parallelize over seeds algorithm. ..... 276
12.9 Comparison of P_T and P_H implementations of the parallelize over blocks algorithm. ..... 278
12.10 Comparison of P_H and P_T performance for the parallelize over seeds algorithm. ..... 281
12.11 Gantt chart showing a comparison of integration and I/O performance/activity of the parallelize over seeds P_T and P_H versions for one of the benchmark runs. ..... 283
12.12 Performance comparison of the P_H and P_T variants of the parallelize over blocks algorithm. ..... 284
12.13 Gantt chart showing a comparison of integration, I/O, MPI_Send, and MPI_Recv performance/activity of the parallelize over blocks P_T and P_H versions for one of the benchmark runs. ..... 285
13.1 Contouring of two trillion cells, visualized with VisIt on Franklin using 32,000 cores. ..... 294
13.2 Plots of execution time for the I/O, contouring, and rendering phases of the trillion cell visualizations over six supercomputing environments. ..... 296
13.3 Contouring of replicated data (one trillion cells total), visualized with VisIt on Franklin using 16,016 cores. ..... 299
13.4 Rendering of an isosurface from a 321 million cell Denovo simulation, produced by VisIt using 12,270 cores of JaguarPF. ..... 301
13.5 Volume rendering of data from a 321 million cell Denovo simulation, produced by VisIt using 12,270 cores on JaguarPF. ..... 302
13.6 Volume rendering of one trillion cells, visualized by VisIt on JaguarPF using 16,000 cores. ..... 303
14.1 Comparison of Gaussian and bilateral smoothing applied to a synthetic, noisy data set. ..... 312
14.2 Three different 3D memory access patterns have markedly different performance characteristics on a many-core GPU platform. ..... 313
14.3 Using GPU-specific features can produce a 2× performance gain. ..... 315
14.4 Filter performance has a 7.5× variation depending upon the settings of tunable algorithmic parameters. ..... 315
14.5 Chart showing how filter runtime performance on the GPU varies as a function of CUDA thread block size. ..... 316
14.6 Parallel ray casting volume rendering performance measures on the GPU include absolute runtime and L2 cache miss rates. ..... 321
14.7 Examples showing how different transfer functions produce differing visual and performance results in parallel volume rendering. ..... 323
14.8 Performance gains on the GPU using Z-ordered memory increase with increased concurrency. ..... 324