4. Rendering (4/5)


64 High Performance Visualization

FIGURE 4.7: Total memory used per node for the RM data set (GB per node versus number of render nodes, for the 1/32, 1/8, 1/4, 1/2, and 1/1 Memory configurations). Image source: Ize et al. [15].

ancer sharing the node with the display process. The render nodes thus have a limited amount of work to perform, and the display process quickly becomes the bottleneck. Figure 4.8 shows that when one thread receives the pixels and another copies them from the receive buffer into the image, the maximum frame rate is 55fps, no matter how many render threads and nodes are used. Using two copy threads yields up to a 1.6× speedup and three copy threads up to 2.3×, but additional copy threads offer no further benefit. Copying is therefore the bottleneck until three copy threads are used; beyond that, the receive thread becomes the bottleneck because it cannot receive the pixels fast enough to keep the copy threads busy. When around 18 threads are used, whether on a few nodes or many, the system reaches a frame rate of 127fps. This is exactly the expected maximum of 127fps given by the time it takes to transmit all the pixel data across the InfiniBand interconnect. More render threads result in lower performance because the MPI implementation cannot keep up with the large volume of communication. To achieve these results, the MPI implementation must be tuned to use more RDMA buffers and to turn off shared receive queues (SRQ). Without this tuning, the system still reaches the same maximum frame rate of about 127fps with 17 render cores, but beyond that point adding more cores causes performance to drop off quickly, and with 384 render cores (48 render nodes) it is 2× slower. However, this is a moot point, since faster frame rates would offer no tangible benefit.
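The division of labor described above, with one thread receiving pixels and several threads copying them from the receive buffer into the image, can be sketched as a two-stage producer/consumer pipeline. This is a minimal sketch under stated assumptions, not the authors' implementation: the `(byte_offset, pixel_bytes)` tile format, the bounded queue size, and the function name `display_frame` are hypothetical, and a plain list stands in for the MPI receives. Because of CPython's global interpreter lock, the copy threads here will not actually copy in parallel; the sketch only shows the structure of the pipeline.

```python
import queue
import threading

_STOP = object()  # sentinel telling a copy thread that the frame is complete


def display_frame(messages, frame_size, num_copy_threads):
    """Assemble one frame using one receive thread and several copy threads.

    `messages` stands in for the pixel messages the display process would receive
    over MPI; each is a hypothetical (byte_offset, pixel_bytes) tile.
    """
    if num_copy_threads < 1:
        raise ValueError("need at least one copy thread")
    image = bytearray(frame_size)
    receive_buffer = queue.Queue(maxsize=64)  # bounded buffer between the two stages

    def receive():
        # Producer: hand each incoming tile to whichever copy thread is free.
        for tile in messages:
            receive_buffer.put(tile)
        for _ in range(num_copy_threads):
            receive_buffer.put(_STOP)  # one sentinel per copy thread

    def copy():
        # Consumer: copy tiles from the receive buffer into their place in the image.
        while True:
            tile = receive_buffer.get()
            if tile is _STOP:
                return
            offset, pixels = tile
            image[offset:offset + len(pixels)] = pixels

    threads = [threading.Thread(target=receive)]
    threads += [threading.Thread(target=copy) for _ in range(num_copy_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bytes(image)


# Usage: four 4-byte tiles assembled by three copy threads.
tiles = [(4 * i, bytes([i]) * 4) for i in range(4)]
frame = display_frame(tiles, 16, num_copy_threads=3)
```

The bounded queue makes the receive thread block when the copy threads fall behind, which mirrors the behavior in the measurements: adding copy threads helps only until the single receiver can no longer keep them supplied.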

FIGURE 4.8: Display process scaling: frame rate (FPS) versus number of render cores, using varying numbers of render cores to render a trivial scene when using one copy thread (1C) to seven copy threads (7C), and when using six copy threads with the load balancer process on the same node (6C+LB). Note that performance is not enhanced beyond three copy threads. Image source: Ize et al. [15].

Modern graphics cards can produce four-megapixel images at 60fps. Since this image size is roughly twice the HD image size, the maximum frame rate in our system would roughly halve, to about 60fps. Resolutions higher than 4 megapixels are usually achieved with a display wall consisting of a cluster of nodes driving multiple screens. In this case, the maximum frame rate is determined not by the time to transmit an entire image but by the time it takes for a single display node to receive its share of the image. Assuming each display node handles 4 megapixels, and that load balancing and rendering continue to scale, the frame rate will thus stay at about 60fps, regardless of the resolution of the display wall.
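The scaling argument in this paragraph is simple arithmetic: when the interconnect is the bottleneck, the frame-rate ceiling is the receiving node's link bandwidth divided by the bytes it must receive per frame, so the ceiling falls in inverse proportion to the number of pixels each receiving node handles. The sketch below calibrates that ceiling from the measured 127fps. It assumes that "HD" means 1920×1080, uses 2560×1600 as an example four-megapixel size, and assumes bandwidth and bytes per pixel stay fixed; the function names are hypothetical.

```python
# Assumptions (not from the source): "HD" means 1920x1080, and the link bandwidth
# and bytes per pixel stay the same as in the measured configuration.
HD_PIXELS = 1920 * 1080
MEASURED_HD_FPS = 127.0  # network-bound maximum reported for the HD case


def network_bound_fps(pixels_per_receiver, ref_fps=MEASURED_HD_FPS, ref_pixels=HD_PIXELS):
    """Frame-rate ceiling when one receiving node's link is the bottleneck.

    fps_max = bandwidth / (pixels * bytes_per_pixel); with bandwidth and bytes per
    pixel fixed, the ceiling scales as ref_fps * ref_pixels / pixels.
    """
    return ref_fps * ref_pixels / pixels_per_receiver


def display_wall_fps(total_pixels, num_display_nodes):
    """Each display node receives only its own share of the image."""
    return network_bound_fps(total_pixels / num_display_nodes)


four_mp = 2560 * 1600  # example four-megapixel resolution (assumption)
single_display = network_bound_fps(four_mp)          # about 64 fps
wall_16_nodes = display_wall_fps(16 * four_mp, 16)   # same ceiling as one 4 MP display
```

Under these assumptions a single 2560×1600 display is capped at about 64fps, consistent with the "about 60fps" estimate, and a 16-node wall with the same per-node share has exactly the same cap, which is why the frame rate does not depend on the wall's total resolution.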

Since three copy threads are able to keep up with the receiving thread, and the load balancer process also runs on the same node, the tested platform has three unused cores. If data is replicated across the nodes, these three cores can be used for a render process. This render process also benefits from using the faster shared memory, rather than the slower InfiniBand, for its MPI communication with the display and load balancer. However, if DC is required, then no render processes can run on the same node, since they would compete with the display and load balancer for scarce network bandwidth; this would saturate the network port much more quickly and result in much lower maximum frame rates.
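The three spare cores follow from a simple core budget on the display node. This sketch assumes the display node has the same core count as a render node (384 render cores on 48 render nodes gives 8 cores per node) and that the load balancer process occupies one core; the helper `spare_render_cores` and its parameters are hypothetical, and its handling of the DC case only encodes the reasoning in the paragraph above.

```python
def spare_render_cores(cores_per_node, receive_threads, copy_threads,
                       load_balancer_cores, data_replicated):
    """Cores on the display node that could host a co-located render process.

    Hypothetical helper. Returns 0 when data is not replicated (the DC case), since a
    co-located renderer would then compete with the display and load balancer for
    the node's network bandwidth.
    """
    used = receive_threads + copy_threads + load_balancer_cores
    if used > cores_per_node:
        raise ValueError("display node is oversubscribed")
    if not data_replicated:
        return 0
    return cores_per_node - used


# 384 render cores on 48 render nodes -> 8 cores per node (assumed same for the display node).
CORES_PER_NODE = 384 // 48

print(spare_render_cores(CORES_PER_NODE, 1, 3, 1, data_replicated=True))   # 3
print(spare_render_cores(CORES_PER_NODE, 1, 3, 1, data_replicated=False))  # 0
```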

With modern hardware and software, the described system can ray trace massive models at real-time frame rates on a cluster, and it can even achieve interactive to real-time rates when rendering distributed geometry using a small cache. The system is one to two orders of magnitude faster than previous cluster ray tracing implementations, which used both slower hardware and slower algorithms [11, 39], or had equivalent hardware but did not scale to as many nodes or to high frame rates [5]. Compared to compositing approaches, the system achieves about a 4× higher maximum frame rate than the state of the art [18] for same-size non-empty images, and it can also handle advanced shading effects for improved visualization.

4.6 Conclusion

Parallel rendering methods for generating images from visualizations are an important area of research. In this chapter, a general framework for parallel rendering was presented and applied to both geometry rendering and volume rendering. In the future, as HPV moves into the exascale regime, parallel rendering methods will likely become more important, because in situ methods require parallel rendering and because the send-image method of parallel display will scale better than the send-geometry method. GPUs are also expected to become integrated into compute nodes, which will offer another avenue for parallel rendering in HPV.


References

[1] C. Bajaj, I. Ihm, G. Joo, and S. Park. Parallel Ray Casting of Visibly Human on Distributed Memory Architectures. In VisSym ’99 Joint EUROGRAPHICS-IEEE TVCG Symposium on Visualization, pages 269–276, 1999.

[2] James Bigler, James Guilkey, Christiaan Gribble, Charles Hansen, and Steven Parker. A Case Study: Visualizing Material Point Method Data. In EuroVis: The Eurographics/IEEE VGTC Symposium on Visualization, pages 299–306. Eurographics, 2006.

[3] James Bigler, Abe Stephens, and Steven G. Parker. Design for Parallel Interactive Ray Tracing Systems. In Proceedings of the 2006 IEEE Symposium on Interactive Ray Tracing, pages 187–196, 2006.

[4] Carson Brownlee, John Patchett, Li-Ta Lo, David DeMarle, Christopher Mitchell, James Ahrens, and Charles Hansen. A Study of Ray Tracing Large-Scale Scientific Data in Parallel Visualization Applications. In Proceedings of the Eurographics Workshop on Parallel Graphics and Visualization, EGPGV ’12, pages 51–60. Eurographics Association, 2012.

[5] Brian Budge, Tony Bernardin, Jeff A. Stuart, Shubhabrata Sengupta, Kenneth I. Joy, and John D. Owens. Out-of-Core Data Management for Path Tracing on Hybrid Resources. Computer Graphics Forum, 28(2):385–396, 2009.

[6] Hank Childs, Mark A. Duchaineau, and Kwan-Liu Ma. A Scalable, Hybrid Scheme for Volume Rendering Massive Data Sets. In Eurographics Symposium on Parallel Graphics and Visualization, pages 153–162, May 2006.

[7] Wagner T. Corrêa, James T. Klosowski, and Cláudio T. Silva. Out-of-Core Sort-First Parallel Rendering for Cluster-Based Tiled Displays. In Proceedings of the 4th Eurographics Workshop on Parallel Graphics and Visualization, EGPGV ’02, pages 89–96. Eurographics Association, 2002.

[8] Brian Corrie and Paul Mackerras. Parallel Volume Rendering and Data Coherence. In Proceedings of the 1993 Symposium on Parallel Rendering, PRS ’93, pages 23–26. ACM, 1993.

[9] Thomas W. Crockett and Tobias Orloff. A MIMD Rendering Algorithm for Distributed Memory Architectures. In Proceedings of the 1993 Symposium on Parallel Rendering, PRS ’93, pages 35–42. ACM, 1993.

[10] David E. DeMarle. Ice Network Library. http://www.cs.utah.edu/ demarle/software/, 2004.


[11] David E. DeMarle, Christiaan Gribble, Solomon Boulos, and Steven Parker. Memory Sharing for Interactive Ray Tracing on Clusters. Parallel Computing, 31:221–242, 2005.

[12] Robert A. Drebin, Loren Carpenter, and Pat Hanrahan. Volume Rendering. SIGGRAPH Computer Graphics, 22(4):65–74, 1988.

[13] Christiaan P. Gribble and Steven G. Parker. Enhancing Interactive Particle Visualization with Advanced Shading Models. In Proceedings of the 3rd Symposium on Applied Perception in Graphics and Visualization, pages 111–118, 2006.

[14] Milan Ikits, Joe Kniss, Aaron Lefohn, and Charles Hansen. Chapter 39, Volume Rendering Techniques, pages 667–692. Addison Wesley, 2004.

[15] Thiago Ize, Carson Brownlee, and Charles D. Hansen. Real-Time Ray Tracer for Visualizing Massive Models on a Cluster. In Proceedings of the 2011 Eurographics Symposium on Parallel Graphics and Visualization, pages 61–69, 2011.

[16] W. Jiang, J. Liu, H.W. Jin, D.K. Panda, D. Buntinas, R. Thakur, and W.D. Gropp. Efficient Implementation of MPI-2 Passive One-Sided Communication on InfiniBand Clusters. Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science, 2131:450–457, 2004.

[17] Arie Kaufman and Klaus Mueller. Overview of Volume Rendering. In Charles D. Hansen and Christopher R. Johnson, editors, The Visualization Handbook, pages 127–174. Elsevier, 2005.

[18] W. Kendall, T. Peterka, J. Huang, H.W. Shen, and R. Ross. Accelerating and Benchmarking Radix-K Image Compositing at Large Scale. In Proceedings Eurographics Symposium on Parallel Graphics and Visualization, pages 101–110, 2010.

[19] Joe Kniss, Patrick McCormick, Allen McPherson, James Ahrens, Jamie Painter, Alan Keahey, and Charles Hansen. Interactive Texture-Based Volume Rendering for Large Data Sets. IEEE Computer Graphics and Applications, 21(4), July/August 2001.

[20] Michael Krogh, James Painter, and Charles Hansen. Parallel Sphere Rendering. Parallel Computing, 23(7):961–974, July 1997.

[21] Marc Levoy. Display of Surfaces from Volume Data. IEEE Computer Graphics and Applications, 8(3):29–37, May 1988.

[22] Kwan-Liu Ma. Parallel Volume Ray-Casting for Unstructured-Grid Data on Distributed-Memory Architectures. In PRS ’95: Proceedings of the IEEE Symposium on Parallel Rendering, pages 23–30. ACM, 1995.
