Preface

This lecture describes the fundamental principles of compiling “regular” (typically numerical) programs with the goal of extracting and exploiting parallelism. This class of programs includes much of dense linear algebra, stencil computations, and similar numerical kernels. The topics covered include a high-level overview of compilers, the analyses a compiler performs to understand the flow of data through a program, and the transformations it applies to extract parallelism and otherwise increase the performance of the program. Although the main focus of this lecture is on compiling for shared memory machines, we discuss in detail the issues raised when targeting distributed memory machines. An understanding of all of these techniques will allow computer architects to better leverage the capabilities of compilers and to understand their limitations. The material in this lecture should be accessible to an upper-level undergraduate in Computer Science or Engineering with experience in programming and a basic understanding of computer systems. We have tried to make this lecture self-contained and do not assume any prior knowledge of compiler internals on the part of the reader.

We begin our lecture in Chapter 1 with an overview of what allows a program to execute in parallel, of parallel execution on shared and distributed memory architectures, and of compiler support for parallel machines.

In Chapter 2, we describe analyses that allow the compiler to understand the flow of data through a program: data flow analysis, abstract interpretation, and dependence analysis, which uses array subscript information to develop more precise information about accesses to individual array elements. One important result of these analyses is the ability to determine, at compile time, when the equivalent of read-after-write, write-after-read, and write-after-write hazards occur in a program (illustrated in the sketch below). The increasingly important topic of how to analyze explicitly parallel programs is also covered. In Chapter 3, we show how the results of these analyses can be used to detect parallelism in programs, including general parallelism, vector parallelism, and instruction level parallelism. The issues raised by while loops and recursion are also discussed. Because programs must often be transformed before parallelism can be uncovered and exploited, Chapter 4 covers transformations that eliminate and enforce dependences (i.e., hazards), allowing parallelism to be realized.
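To make the hazard terminology concrete, consider the following small C sketch, which is our own illustration and not an example from the book. The first loop carries a read-after-write hazard from one iteration to the next; the second has no cross-iteration dependences at all.

    #include <stddef.h>

    /* A loop-carried flow (read-after-write) dependence: iteration i
       reads a[i-1], which was written by iteration i-1, so the
       iterations cannot safely execute in parallel as written. */
    void carried(double *a, const double *b, size_t n) {
        for (size_t i = 1; i < n; i++)
            a[i] = a[i - 1] + b[i];
    }

    /* No cross-iteration dependences: each iteration reads and writes
       only its own elements, so all iterations may run in parallel. */
    void independent(double *c, const double *a,
                     const double *b, size_t n) {
        for (size_t i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }

Dependence analysis compares the array subscripts (i against i - 1) to prove the first loop serial and the second loop fully parallel.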

The analyses that allow parallelism to be detected and exploited also allow the compiler to determine when other performance-enhancing transformations can be applied. These are covered in Chapter 5. In addition to loop interchange and tiling, which are important transformations for improving locality, we discuss transformations to block loops (useful both in tiling and in targeting vector hardware), loop unrolling (which exposes instruction level parallelism and larger branch-free regions of code to target with other transformations), loop fusion and fission (which merge and split loops, respectively, and can benefit locality), and other transformations.
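To give a flavor of these transformations, here is a minimal sketch of loop interchange, our own illustration rather than an example from Chapter 5. In C's row-major layout, moving the i loop outward turns a stride-N inner traversal into a unit-stride one, improving spatial locality.

    #include <stddef.h>
    #define N 1024

    /* Before interchange: the inner loop walks down a column, so
       successive accesses to a[i][j] are N doubles apart -- poor
       spatial locality. */
    void col_sums_slow(double a[N][N], double s[N]) {
        for (size_t j = 0; j < N; j++)
            for (size_t i = 0; i < N; i++)
                s[j] += a[i][j];
    }

    /* After interchange: the inner loop walks across a row with unit
       stride, so consecutive accesses share cache lines. The
       interchange is legal because the new iteration order violates
       no dependence: each s[j] still accumulates in increasing i. */
    void col_sums_fast(double a[N][N], double s[N]) {
        for (size_t i = 0; i < N; i++)
            for (size_t j = 0; j < N; j++)
                s[j] += a[i][j];
    }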

Chapter 6 looks at the problems presented by distributed memory machines, in particular how to represent the distribution of data across the different processes executing a program and how to select the communication that distribution requires.
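As a taste of the first of these problems, the following sketch assumes a simple one-dimensional block distribution; it is our own illustration, not notation from Chapter 6. Under such a distribution, ownership of array elements is a closed-form function of the process number.

    #include <stddef.h>

    /* Block distribution of an n-element array over nprocs processes:
       process p owns the contiguous index range [*lo, *hi). */
    void block_range(size_t n, size_t nprocs, size_t p,
                     size_t *lo, size_t *hi) {
        size_t chunk = (n + nprocs - 1) / nprocs;  /* ceiling division */
        *lo = (p * chunk < n) ? p * chunk : n;     /* clamp trailing procs */
        *hi = (*lo + chunk < n) ? *lo + chunk : n;
    }

Given such an ownership function (element i belongs to process i / chunk), a compiler can decide which process computes each value and which messages are needed to deliver remote operands.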

In Chapter 7, we go into more detail on how to solve Diophantine equations, i.e., equations whose domain, range, and coefficients are integers; a small example follows this paragraph. Solving these equations is an important technique both for dependence analysis and for performing unimodular transformations. Finally, in Chapter 8 we present a guide to further reading for the reader interested in exploring in depth the topics raised earlier in the book.
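As a preview of Chapter 7, here is a standard number-theoretic fact, stated in LaTeX notation; the worked numbers are our own illustration. The linear Diophantine equation

    a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = c, \qquad a_i, c \in \mathbb{Z},

has an integer solution if and only if \gcd(a_1, \ldots, a_n) divides c. For example, testing whether a write of a[2i] can touch the same element as a read of a[4j + 3] amounts to solving 2i - 4j = 3; since \gcd(2, 4) = 2 does not divide 3, no integer solution exists and the two references are independent.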

This lecture would not have been finished without the support of numerous people. I would like to thank my wife, Laura, and my three sons, Danny, Jack and Nathan, for their love and support during this project. Projects often take longer than we think they will when we first embark upon them, and every hour spent on the book was one less hour to spend with them. I would also like to thank my mother and late father whose patience and support got me to a point in life where writing a book like this is possible.

Several people have offered invaluable feedback that has made this lecture much better than it would have been. In particular, Gagan Gupta, a PhD student of Gurindar Sohi at Wisconsin, did an extremely careful reading of an early draft and made excellent suggestions. Keshav Pingali, an anonymous reviewer, and Mark Hill read the “final” draft, and their suggestions led to the inclusion of new material and a much improved and more complete book.

Samuel P. Midkiff
January 2012
