Performance Optimization and Auto-Tuning 327
References
[1] E. Wes Bethel. High Performance, Three-Dimensional Bilateral Filtering.
Technical Report LBNL-1601E, Lawrence Berkeley National Laboratory,
2009.
[2] E. Wes Bethel. Exploration of Optimization Options for Increasing Per-
formance of a GPU Implementation of a Three-Dimensional Bilateral
Filter. Technical Report (In patent review), Lawrence Berkeley National
Laboratory, Berkeley, CA, USA, 94720, 2012.
[3] E. Wes Bethel and Mark Howison. Multi-core and Many-core Shared-
memory Parallel Raycasting Volume Rendering Optimization and Tun-
ing. International Journal of High Performance Computing Applications,
(In press), 2012.
[4] David R. Butenhof. Programming with POSIX threads. Addison-Wesley
Longman Publishing Co., Inc., Boston, MA, USA, 1997.
[5] David Camp, Christoph Garth, Hank Childs, Dave Pugmire, and Ken-
neth I. Joy. Streamline Integration Using MPI-Hybrid Parallelism on a
Large Multicore Architecture. IEEE Transactions on Visualization and
Computer Graphics, 17:1702–1713, 2011.
[6] Rohit Chandra, Leonardo Dagum, Dave Kohr, Dror Maydan, Jeff
McDonald, and Ramesh Menon. Parallel Programming in OpenMP. Morgan
Kaufmann Publishers Inc., San Francisco, CA, USA, 2001.
[7] Kaushik Datta, Shoaib Kamil, Samuel Williams, Leonid Oliker, John
Shalf, and Katherine Yelick. Optimization and Performance Modeling
of Stencil Computations on Modern Microprocessors. SIAM Review,
51(1):129–159, 2009.
[8] Kaushik Datta, Mark Murphy, Vasily Volkov, Samuel Williams, Jonathan
Carter, Leonid Oliker, David Patterson, John Shalf, and Katherine Yelick.
Stencil Computation Optimization and Auto-tuning on State-of-the-art
Multicore Architectures. In SC '08: Proceedings of the 2008 ACM/IEEE
Conference on Supercomputing, pages 1–12, Piscataway, NJ, USA, 2008.
IEEE Press.
[9] Kaushik Datta, Sam Williams, Vasily Volkov, Jonathan Carter, Leonid
Oliker, John Shalf, and Katherine Yelick. Auto-tuning the 27-point Sten-
cil for Multicore. In 4th International Workshop on Automatic Perfor-
mance Tuning (iWAPT), 2009.
328 High Performance Visualization
[10] T. Fogal and J. Krüger. Tuvok, an Architecture for Large Scale Volume
Rendering. In Proceedings of the 15th International Workshop on Vision,
Modeling, and Visualization, pages 139–146, November 2010.
[11] Enrico Gobbetti, Fabio Marton, and José Antonio Iglesias Guitián. A
Single-pass GPU Ray Casting Framework for Interactive Out-of-core
Rendering of Massive Volumetric Datasets. The Visual Computer,
24(7):797–806, 2008.
[12] Sören Grimm, Stefan Bruckner, Armin Kanitsar, and Eduard Gröller. A
Refined Data Addressing and Processing Scheme to Accelerate Volume
Raycasting. Computers and Graphics, 28(5):719–729, 2004.
[13] J. Hollingsworth and A. Tiwari. End-to-end Auto-tuning with Active
Harmony. In David H. Bailey, Robert F. Lucas, and Samuel W. Williams,
editors, Performance Tuning of Scientific Applications. CRC Press, Boca
Raton, FL, USA, 2010.
[14] Mark Howison, E. Wes Bethel, and Hank Childs. MPI-hybrid Parallelism
for Volume Rendering on Large, Multi-core Systems. In Eurographics
Symposium on Parallel Graphics and Visualization (EGPGV),
Norrköping, Sweden, May 2010.
[15] Mark Howison, E. Wes Bethel, and Hank Childs. Hybrid Parallelism
for Volume Rendering on Large, Multi- and Many-core Systems. IEEE
Transactions on Visualization and Computer Graphics, 99(PrePrints),
2011.
[16] S. Kamil, C. Chan, S. Williams, L. Oliker, J. Shalf, M. Howison, E. W.
Bethel, and Prabhat. A Generalized Framework for Auto-tuning Stencil
Computations. In Proceedings of Cray User Group Conference, Atlanta
GA, USA, May 2009. LBNL-2078E.
[17] Shoaib Kamil, Cy Chan, Leonid Oliker, John Shalf, and Sam Williams.
An Auto-tuning framework for Parallel Multicore Stencil Computations.
In International Parallel & Distributed Processing Symposium (IPDPS),
2010.
[18] Jens Krüger and Rüdiger Westermann. Acceleration Techniques for
GPU-based Volume Rendering. In Proceedings IEEE Visualization 2003, 2003.
[19] Marc Levoy. Display of Surfaces from Volume Data. IEEE Computer
Graphics and Applications, 8(3):29–37, May 1988.
[20] William E. Lorensen and Harvey E. Cline. Marching Cubes: A High
Resolution 3D Surface Construction Algorithm. SIGGRAPH Computer
Graphics, 21(4):163–169, August 1987.
[21] Lukas Marsalek, Armin Hauber, and Philipp Slusallek. High-speed Vol-
ume Ray Casting with CUDA. In IEEE Symposium on Interactive Ray
Tracing, 2008. Poster.
[22] Jason Nieh and Marc Levoy. Volume Rendering on Scalable Shared-
Memory MIMD Architectures. In Proceedings of the 1992 Workshop on
Volume Visualization, pages 17–24. ACM Siggraph, October 1992.
[23] NVIDIA Corporation. NVIDIA CUDA™ Version 2.1 Programming
Guide, 2008.
[24] NVIDIA Corporation. NVIDIA CUDA™ Programming Guide Version 3.2 RC,
2010. http://developer.nvidia.com/object/cuda_3_2_toolkit_rc.html.
[25] Michael E. Palmer, Brian Totty, and Stephen Taylor. Ray Casting on
Shared-Memory Architectures: Memory-Hierarchy Considerations in Vol-
ume Rendering. IEEE Concurrency, 6(1):20–35, 1998.
[26] Valerio Pascucci and Randall J. Frank. Global Static Indexing for Real-
time Exploration of Very Large Regular Grids. In Proceedings of the 2001
ACM/IEEE Conference on Supercomputing (CDROM), Supercomputing
’01, New York, NY, USA, 2001. ACM.
[27] Tom Peterka, David Goodell, Robert Ross, Han-Wei Shen, and Rajeev
Thakur. A Configurable Algorithm for Parallel Image-Compositing Ap-
plications. In Proceedings of Supercomputing 2009, Portland OR, Novem-
ber 2009.
[28] Tom Peterka, Hongfeng Yu, Robert Ross, Kwan-Liu Ma, and Rob
Latham. End-to-End Study of Parallel Volume Rendering on the IBM
Blue Gene/P. In Proceedings of ICPP 09, Vienna, Austria, 2009.
[29] S. Stegmaier, M. Strengert, T. Klein, and T. Ertl. A Simple and Flexible
Volume Rendering Framework for Graphics-Hardware–based Raycasting.
In Proceedings of the International Workshop on Volume Graphics ’05,
pages 187–195, 2005.
[30] C. Tomasi and R. Manduchi. Bilateral Filtering for Gray and Color Im-
ages. In ICCV ’98: Proceedings of the Sixth International Conference on
Computer Vision, page 839, Washington, DC, USA, 1998. IEEE Com-
puter Society.
[31] S. Williams, K. Datta, L. Oliker, J. Carter, J. Shalf, and K. Yelick. Auto-
tuning Memory-Intensive Kernels for Multicore. In David H. Bailey,
Robert F. Lucas, and Samuel W. Williams, editors, Performance Tuning
of Scientific Applications. CRC Press, Boca Raton, FL, USA, 2010.