64 High Performance Visualization
[Figure 4.7 plot: GB per node (0 to 22) versus render nodes (0 to 60), one curve each for 1/32, 1/8, 1/4, 1/2, and 1/1 Memory]
FIGURE 4.7: Total memory used per node for the RM data set. Image
source: Ize et al. [15].
ancer sharing the node with the display process. The render nodes thus have
a limited amount of work to perform, and the display process quickly becomes
the bottleneck. Figure 4.8 shows that when one thread receives the pixels and
another copies them from the receive buffer to the image, the maximum frame
rate is 55fps, no matter how many render threads and nodes are used. Using
two copy threads yields up to a 1.6× speedup and three copy threads up to
2.3×, but additional copy threads offer no further benefit. Copying is
therefore the bottleneck until three copy threads are used. Beyond that point
the receive thread becomes the bottleneck, because it cannot receive the
pixels fast enough to keep the copy threads busy. When around 18 threads are being
used, be they on a few nodes or many nodes, the system can obtain a frame
rate of 127fps. This is exactly the expected maximum of 127fps given by the
amount of time it takes to transmit all the pixel data across the InfiniBand
interconnect. More render threads result in lower performance due to the MPI
implementation not being able to keep up with the large volume of communi-
cation. In order to achieve the results required, the MPI implementation must
be tuned to use more RDMA buffers and turn off shared receive queues (SRQ);
otherwise, the system can still achieve the same maximum frame rate of about
127fps with 17 render cores, but after that point, adding more cores causes
performance to drop off quickly; with 384 render cores (48 render nodes), the
system is 2× slower. However, this is a moot point, since faster frame rates
would offer no tangible benefit.
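The display-side structure described above is a producer-consumer pipeline: one thread receives pixels, and several threads copy them from the receive buffer into the image. The sketch below is minimal and makes several assumptions:

- It uses Python's standard `threading` and `queue` modules instead of MPI.
- The tile format `(x0, y0, tile_width, pixel_bytes)`, one byte per pixel, and the name `display_frame` are all hypothetical.
- The receive step is only a stand-in for the network receive loop.

Because of Python's global interpreter lock, it shows the structure only, not the measured speedups.

```python
import queue
import threading


def display_frame(tiles, width, height, num_copy_threads):
    """Assemble one frame with one receive thread and several copy threads.

    Each tile is a hypothetical (x0, y0, tile_width, pixel_bytes) message
    standing in for the pixels a render process sends to the display.
    """
    if num_copy_threads < 1:
        raise ValueError("need at least one copy thread")
    image = bytearray(width * height)   # assumption: 1 byte per pixel
    received = queue.Queue(maxsize=32)  # bounded receive buffer

    def receive_worker():
        # Stand-in for the receive thread; a real display process would
        # receive these messages over the network instead of iterating a list.
        for tile in tiles:
            received.put(tile)
        for _ in range(num_copy_threads):
            received.put(None)  # one end-of-frame sentinel per copy thread

    def copy_worker():
        # Copy thread: move each tile from the receive buffer into the image.
        while True:
            tile = received.get()
            if tile is None:
                return
            x0, y0, tile_width, pixels = tile
            for row in range(len(pixels) // tile_width):
                dst = (y0 + row) * width + x0
                src = row * tile_width
                image[dst:dst + tile_width] = pixels[src:src + tile_width]

    workers = [threading.Thread(target=receive_worker)]
    workers += [threading.Thread(target=copy_worker)
                for _ in range(num_copy_threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return image


if __name__ == "__main__":
    W, H, T = 8, 8, 4
    # Four 4x4 tiles, each filled with its own tile index.
    tiles = [(tx, ty, T, bytes([(ty // T) * (W // T) + tx // T]) * (T * T))
             for ty in range(0, H, T) for tx in range(0, W, T)]
    frame = display_frame(tiles, W, H, num_copy_threads=3)
    print(frame[0], frame[W - 1], frame[(H - 1) * W], frame[W * H - 1])
```

Run as a script, this prints `0 1 2 3`: one corner pixel from each tile. Throughput in such a pipeline is bounded by its slowest stage. This is why, in the measurements above, adding copy threads stops helping once the single receive thread can no longer keep them busy.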
[Figure 4.8 plot: FPS (40 to 130) versus render cores (0 to 400), one curve each for 1C, 2C, 3C, 4C, 5C, 6C, 7C, and 6C+LB]
FIGURE 4.8: Display process scaling: frame rate using varying numbers of
render cores to render a trivial scene when using one copy thread (1C) to seven
copy threads (7C), and when using six copy threads with the load balancer
process on the same node (6C+LB). Note that performance is not enhanced
beyond three copy threads. Image source: Ize et al. [15].
Modern graphics cards can produce four-megapixel images at 60fps. Because this
image size is roughly twice the HD image size, the maximum frame rate of our
system would halve to about 60fps. Resolutions above 4 megapixels are usually
achieved with a display wall, in which a cluster of nodes drives multiple
screens. In this case, the maximum frame rate is determined not by the time to
transmit an entire image but by the time it takes a single display node to
receive its share of the image. Assuming each display node is responsible for
4 megapixels, and the load balancing and rendering continue to scale, the frame
rate will thus stay at about 60fps, regardless of the resolution of the display wall.
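The bandwidth argument above reduces to a proportionality. When the display side is limited by transfer time, the frame rate scales inversely with the number of pixels a single display node must receive. The sketch below rests on these assumptions:

- "HD" means 1920×1080.
- The measured 127fps HD figure is the reference point.
- Bytes per pixel and link bandwidth stay fixed.
- The 2560×1600 panel and the function names are hypothetical illustrations.

```python
# Assumption: "HD" means 1920x1080; 127fps is the bandwidth-bound HD rate.
HD_PIXELS = 1920 * 1080
FOUR_MP_PIXELS = 2560 * 1600  # hypothetical ~4-megapixel panel


def bandwidth_bound_fps(ref_fps, ref_pixels, pixels_per_display_node):
    """Scale a measured bandwidth-bound frame rate to a new per-node pixel count.

    Assumes frame time is proportional to the pixels one display node must
    receive (fixed bytes per pixel, fixed link bandwidth).
    """
    if ref_fps <= 0 or ref_pixels <= 0 or pixels_per_display_node <= 0:
        raise ValueError("frame rate and pixel counts must be positive")
    return ref_fps * ref_pixels / pixels_per_display_node


def display_wall_fps(ref_fps, ref_pixels, wall_pixels, display_nodes):
    """Frame rate of a wall whose pixels are split evenly across display nodes."""
    return bandwidth_bound_fps(ref_fps, ref_pixels, wall_pixels / display_nodes)


if __name__ == "__main__":
    single = bandwidth_bound_fps(127, HD_PIXELS, FOUR_MP_PIXELS)
    print(f"single 4 MP display: {single:.1f} fps")
    for nodes in (4, 16, 64):
        fps = display_wall_fps(127, HD_PIXELS, nodes * FOUR_MP_PIXELS, nodes)
        print(f"{nodes:3d}-node wall: {fps:.1f} fps")
```

For a 2560×1600 panel, the model gives about 64fps, consistent with the text's estimate of about 60fps. Every wall size prints the same value, because each display node's share of the image stays fixed.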
Since three copy threads are able to keep up with the receiving thread, and
the load balancer process is also running on the same node, there are three
unused cores on the tested platform (the receive thread, three copy threads,
and the load balancer occupy five of the node's eight cores). If data is
replicated across the nodes, these three cores can be used for a render
process. This render process will also benefit from being able to use the
higher-speed shared memory for its
MPI communication with the display and load balancer instead of the slower
InfiniBand. However, if DC is required, it will not be possible to run any
render processes on the same node. Those render processes would compete with
the display and load balancer for scarce network bandwidth, saturating the
network port much more quickly and resulting in far lower maximum frame rates.
With modern hardware and software, the described system can ray trace
massive models at real-time frame rates on a cluster and even show interactive
to real-time rates when rendering distributed geometry using a small cache.
The system is one to two orders of magnitude faster than previous cluster ray
tracing implementations, which used both slower hardware and algorithms [11,
39], or had equivalent hardware but did not scale to as many nodes or to high
frame rates [5]. Compared with state-of-the-art compositing approaches [18],
the system can achieve about a 4× improvement in the maximum frame rate for
non-empty images of the same size, and it can also handle advanced shading
effects for improved visualization.
4.6 Conclusion
Parallel rendering methods for generating images from visualizations are
an important area of research. In this chapter, a general framework for par-
allel rendering was presented and applied to both geometry rendering and
volume rendering. In the future, as HPV moves into the exascale regime, parallel
rendering methods will likely become more important, since in situ methods
require parallel rendering and the send-image method of parallel display will
scale better than the send-geometry method. GPUs are also anticipated to become
integrated into compute nodes, which will offer another avenue for parallel
rendering in HPV.
References
[1] C. Bajaj, I. Ihm, G. Joo, and S. Park. Parallel Ray Casting of Visi-
bly Human on Distributed Memory Architectures. In VisSym ’99 Joint
EUROGRAPHICS-IEEE TVCG Symposium on Visualization, pages
269–276, 1999.
[2] James Bigler, James Guilkey, Christiaan Gribble, Charles Hansen, and
Steven Parker. A Case Study: Visualizing Material Point Method Data.
In EuroVis: Eurographics/IEEE VGTC Symposium on Visualization,
pages 299–306. Eurographics, 2006.
[3] James Bigler, Abe Stephens, and Steven G. Parker. Design for Paral-
lel Interactive Ray Tracing Systems. In Proceedings of the 2006 IEEE
Symposium on Interactive Ray Tracing, pages 187–196, 2006.
[4] Carson Brownlee, John Patchett, Li-Ta Lo, David DeMarle, Christopher
Mitchell, James Ahrens, and Charles Hansen. A Study of Ray Trac-
ing Large-Scale Scientific Data in Parallel Visualization Applications. In
Proceedings of the Eurographics Workshop on Parallel Graphics and Vi-
sualization, EGPGV ’12, pages 51–60. Eurographics Association, 2012.
[5] Brian Budge, Tony Bernardin, Jeff A. Stuart, Shubhabrata Sengupta,
Kenneth I. Joy, and John D. Owens. Out-of-Core Data Management
for Path Tracing on Hybrid Resources. Computer Graphics Forum,
28(2):385–396, 2009.
[6] Hank Childs, Mark A. Duchaineau, and Kwan-Liu Ma. A Scalable, Hy-
brid Scheme for Volume Rendering Massive Data Sets. In Eurographics
Symposium on Parallel Graphics and Visualization, pages 153–162, May
2006.
[7] Wagner T. Corrêa, James T. Klosowski, and Cláudio T. Silva. Out-of-
Core Sort-First Parallel Rendering for Cluster-Based Tiled Displays. In
Proceedings of the 4th Eurographics Workshop on Parallel Graphics and
Visualization, EGPGV ’02, pages 89–96. Eurographics Association, 2002.
[8] Brian Corrie and Paul Mackerras. Parallel Volume Rendering and Data
Coherence. In Proceedings of the 1993 Symposium on Parallel Rendering,
PRS ’93, pages 23–26. ACM, 1993.
[9] Thomas W. Crockett and Tobias Orloff. A MIMD Rendering Algorithm
for Distributed Memory Architectures. In Proceedings of the 1993 Sym-
posium on Parallel Rendering, PRS ’93, pages 35–42. ACM, 1993.
[10] David E. DeMarle. Ice Network Library.
http://www.cs.utah.edu/~demarle/software/, 2004.
[11] David E. DeMarle, Christiaan Gribble, Solomon Boulos, and Steven
Parker. Memory Sharing for Interactive Ray Tracing on Clusters. Parallel
Computing, 31:221–242, 2005.
[12] Robert A. Drebin, Loren Carpenter, and Pat Hanrahan. Volume Render-
ing. SIGGRAPH Computer Graphics, 22(4):65–74, 1988.
[13] Christiaan P. Gribble and Steven G. Parker. Enhancing Interactive Par-
ticle Visualization with Advanced Shading Models. In Proceedings of
the 3rd Symposium on Applied Perception in Graphics and Visualization,
pages 111–118, 2006.
[14] Milan Ikits, Joe Kniss, Aaron Lefohn, and Charles Hansen. Chapter 39,
Volume Rendering Techniques, pages 667–692. Addison Wesley, 2004.
[15] Thiago Ize, Carson Brownlee, and Charles D. Hansen. Real-Time Ray
Tracer for Visualizing Massive Models on a Cluster. In Proceedings of the
2011 Eurographics Symposium on Parallel Graphics and Visualization,
pages 61–69, 2011.
[16] W. Jiang, J. Liu, H.W. Jin, D.K. Panda, D. Buntinas, R. Thakur, and
W.D. Gropp. Efficient Implementation of MPI-2 Passive One-Sided Com-
munication on InfiniBand Clusters. Recent Advances in Parallel Virtual
Machine and Message Passing Interface, Lecture Notes in Computer Sci-
ence, 2131:450–457, 2004.
[17] Arie Kaufman and Klaus Mueller. Overview of Volume Rendering. In
Charles D. Hansen and Christopher R. Johnson, editors, The Visualiza-
tion Handbook, pages 127–174. Elsevier, 2005.
[18] W. Kendall, T. Peterka, J. Huang, H.W. Shen, and R. Ross. Accelerat-
ing and Benchmarking Radix-K Image Compositing at Large Scale. In
Proceedings Eurographics Symposium on Parallel Graphics and Visual-
ization, pages 101–110, 2010.
[19] Joe Kniss, Patrick McCormick, Allen McPherson, James Ahrens, Jamie
Painter, Alan Keahey, and Charles Hansen. Interactive Texture-Based
Volume Rendering for Large Data Sets. IEEE Computer Graphics and
Applications, 21(4), July/August 2001.
[20] Michael Krogh, James Painter, and Charles Hansen. Parallel Sphere
Rendering. Parallel Computing, 23(7):961–974, July 1997.
[21] Marc Levoy. Display of Surfaces from Volume Data. IEEE Computer
Graphics and Applications, 8(3):29–37, May 1988.
[22] Kwan-Liu Ma. Parallel Volume Ray-Casting for Unstructured-Grid Data
on Distributed-Memory Architectures. In PRS ’95: Proceedings of the
IEEE Symposium on Parallel Rendering, pages 23–30. ACM, 1995.