CHAPTER 2

Dependence analysis, dependence graphs and alias analysis

As described in Chapter 1, determining what data (memory) is accessed by references in the program is fundamental to determining if different regions of a program can execute in parallel. Three major techniques are used to determine when two references access the same storage: dependence, use-def and alias analysis. In general, dependence analysis is used to analyze array accesses, and alias and use-def analysis are used to analyze accesses to scalars or data that is treated as a scalar.

All of the analyses we discuss have the property that they are sound, or conservative. Any analysis performed at compile time must approximate the behavior of the program at run time, and this approximation causes the results of the analysis to differ from what actually happens at run time. The question is, what effect do these differences have on the correctness of transformations using the analysis results? If the analysis is sound, or conservative, the differences will cause optimization opportunities to be missed, but will not lead to incorrect executions of the program.

What constitutes a conservative result from an analysis is in part a function of how the results are used. Consider an analysis that determines if two reads are to the same memory location. If the analysis is used by a pass that allows the result of the first read, a1, stored in a register, to be reused by a second access, a2, then the analysis is conservative only if every pair of accesses a1, a2 said to be to the same location really is to the same location. If they are not, then a2 will read the wrong value. In this case, the analysis is conservative if it says that accesses are to the same location only when they always are. Now consider when the analysis is being used to determine if a read of storage by a2 can be moved above a write by a1. In this case, if the analysis says that two accesses that may, or may not, access the same storage do not access the same storage (a conservative result for the first use of the analysis), then illegal reorderings of the two accesses may occur. In this case, the analysis should say that two accesses are to the same location whenever they may be to the same location. It is the responsibility of both the creator of an analysis and the user of the analysis (the client) to ensure that the approximations made by the analysis are appropriately conservative.

We will first describe a basic dataflow analysis, and use the concepts developed to discuss the more complicated alias analysis. We will then move on to dependence analysis and conclude with a discussion of use-def analyses. Dataflow analysis has been an area of intense research for over 40 years ([100, 200]), and a more extensive introduction can be found in [3, 17, 77].

2.1 DATAFLOW ANALYSIS

We will first discuss a simple dataflow analysis so that certain concepts can be explained clearly. The analysis we will consider is constant propagation, which attempts to determine which values in a program are constants.


Figure 2.1: The semi-lattice used in the constant propagation analysis.

2.1.1 CONSTANT PROPAGATION

Constant propagation, like most dataflow analyses, is performed by traversing the control flow and calling graphs of the program. We will first discuss the simpler intra-procedural case (i.e., where the analysis examines only a single procedure), in which only the control flow graph of the function being analyzed is needed, using the program of Figure 2.2(a) and the control flow graph of Figure 2.2(b). Prior to performing the analysis, each node in the graph is annotated with two pieces of information: facts that become true in the analysis when the node is visited, and facts that are no longer true after the node is visited. The facts that become true are contained in the GEN set, and the facts that are no longer true after the node is visited are given by the KILL set. For constant propagation, we can say three things about the value of a variable in our analysis:

1. it is a constant with a value v;

2. we have not yet discovered anything about the variable; and

3. we have discovered something about the variable, and it is not a constant.

The relationship between these values is often represented by a semi-lattice L, which consists of a set of values V and a meet operator ∧.

The values in the semi-lattice and the meet operator vary from analysis to analysis. Figure 2.1(a) shows a diagram of the values of the semi-lattice for constant propagation. Given two values vi and vj, let vr be the first common descendant of vi and vj in the semi-lattice; then vr = vi ∧ vj. Using these definitions, we can see how constant propagation is done using the semi-lattice shown in the figure.
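To make the lattice concrete, the following sketch (in C, with illustrative type and function names not taken from any particular compiler) represents a lattice value as either ⊤ (nothing known yet), ⊥ (known not to be a constant), or a known constant, and implements the meet operator of Figure 2.1:

typedef enum { TOP, CONST, BOT } Kind;   /* TOP = ⊤, BOT = ⊥ */

typedef struct {
    Kind kind;
    int  value;                  /* meaningful only when kind == CONST */
} LatticeVal;

/* The meet vr = vi /\ vj: the first common descendant of the two
 * values in the semi-lattice of Figure 2.1. */
LatticeVal meet(LatticeVal a, LatticeVal b) {
    if (a.kind == TOP) return b;                 /* TOP /\ x = x      */
    if (b.kind == TOP) return a;
    if (a.kind == BOT || b.kind == BOT)
        return (LatticeVal){ BOT, 0 };           /* BOT /\ x = BOT    */
    if (a.value == b.value) return a;            /* c /\ c = c        */
    return (LatticeVal){ BOT, 0 };               /* c1 /\ c2 = BOT    */
}

For example, the meet of the constant 5 with the constant 5 is the constant 5, while the meet of 5 and 7 is ⊥, reflecting that a variable holding 5 along one path and 7 along another is not a constant at the join.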


Figure 2.2: An example of a flow sensitive constant propagation analysis.

Constant propagation is a forward data flow analysis, that is, facts that are true at some point are propagated forward through the program. The program state, before a statement is executed, is in the IN set, and the program state after the statement is executed is in the OUT set. Moreover, at each node in the control flow graph certain facts become true—these are represented by a GEN set associated with each node of the program. Also at each node in the CFG certain facts must become false—these are represented by a KILL set associated with the node.

A transfer function of the form OUT = GEN ∧ (IN − KILL) represents the effect of the node on the program state when the statement is executed. Almost all dataflow analyses use a transfer function of this general form. The sets associated with the nodes must first be initialized. In constant propagation, the KILL set is empty, and is set as such for every node. The IN, OUT and GEN sets consist of doubles of the form [v, val]. For the IN and OUT sets, val is an element of the semi-lattice. For the GEN set, v is the variable on the left-hand side of the assignment, and val is either a single variable, a constant, or an expression of the form vali ⊕ valj that is assigned to v. Figure 2.2(b) shows the nodes of the CFG annotated with the GEN and (empty) KILL sets that capture the effect of the nodes on the program state, and the initial values of the IN and OUT sets.

The transfer function is evaluated as follows. For simplicity, we assume one statement per node, and each node contains one of five kinds of expressions:

vi = cj        vi = vj        vi = op1 ⊕ op2        vi = foo(. . .)        null

where cj is a constant literal, vi, vj are variables, op1 and op2 are either variables or constants, foo is a function, and null is a node where no variables are assigned. Let VALS(t) = val, where t is the double [v, val] in the set S, i.e., IN, GEN or KILL. Let VARS(t) = v, where t is the same double. For every double [vi, val] in the IN set with no corresponding double [vi, . . .] in the GEN set, the double [vi, val] is simply copied to the OUT set. This is true for all variables when the expression is null. For the variable vi such that [vi, val] is in the GEN set, the following action is taken. If val is a variable vj, then the double [vj, valj] is found in the IN set (i.e., the current value of vj in the domain represented by the semi-lattice is found), and the double [vi, valj] is placed into the OUT set. If val is the constant cj, then the double [vi, cj] is added to the OUT set. Otherwise, the expression in the node is val = op1 ⊕ op2. As was just done with a singleton right-hand side, we look up each of op1 and op2 in the IN set. If both are constants or constant valued variables, the expression op1 ⊕ op2 is evaluated using the constant values. The double [vi, r] is then added to the OUT set, where r is the result of evaluating the expression. Finally, if either op1 or op2 is not a constant, the double [vi, VAL(op1) ∧ VAL(op2)] is added to the OUT set. Figure 2.2(c) shows the result of doing this throughout the program in a single pass.
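As an illustration, the following sketch evaluates the transfer function at a node whose expression is vi = op1 ⊕ op2, reusing the LatticeVal type above; taking ⊕ to be integer addition and both operands to be variables, and representing the IN and OUT sets as arrays indexed by variable number, are assumptions made for brevity:

/* OUT = GEN /\ (IN - KILL) for a node "vi = op1 + op2". KILL is empty
 * for constant propagation, so every IN double not redefined by the
 * GEN set is copied to OUT unchanged. */
void transfer_add(const LatticeVal in[], LatticeVal out[], int nvars,
                  int vi, int op1, int op2) {
    for (int v = 0; v < nvars; v++)      /* copy doubles with no GEN entry */
        out[v] = in[v];

    LatticeVal a = in[op1], b = in[op2]; /* look up op1 and op2 in IN */
    if (a.kind == CONST && b.kind == CONST)
        out[vi] = (LatticeVal){ CONST, a.value + b.value };  /* fold */
    else
        out[vi] = meet(a, b);            /* [vi, VAL(op1) /\ VAL(op2)] */
}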

Statements like the C for statement may be broken into more than one statement—this has been done for S4. Statement S4 is broken into two statements, one which assigns 0 to i, and the other which performs the increment of i.

Statement S4 is a join point in the control flow graph. As a result, there are values for each variable vi defined along each control path entering the join. At this particular join point, the values along the path from the top of the program will hold the first time statement S4 is visited, and values from within the loop will hold the second and later times that it is visited. The analysis, however, must represent values that hold at any execution of S4. To model this, the IN set of S4 contains the meet of the two OUT sets: for every pair of doubles tS4′ and tS8, from the OUT sets of S4′ and S8, respectively, such that VAR(tS4′) = VAR(tS8), the double [VAR(tS4′), VAL(tS4′) ∧ VAL(tS8)] is in the IN set of S4.

The propagation of values through the CFG continues until no changes occur, and the algorithm terminates with the graph as it appears in Figure 2.2(d). The contents of the IN set associated with each statement tell whether the corresponding variables are constants or not, and if so, what their values are, immediately before the statement executes. We know the algorithm will terminate because the result of the ∧ operation either moves down the semi-lattice or stays in the same place. Thus, at any statement S there can be at most h updates of a variable’s value, where h is the height of the semi-lattice.

The analysis just described is flow sensitive; that is, there is an IN set for each statement, the IN set describes the state of the execution along the program flow in the CFG for each statement, and, as such, all facts in the OUT set are conservatively correct for each point in the program. A flow insensitive analysis can be performed where there is effectively a single IN set that also serves as the OUT set for every statement. As a statement is visited, the IN set is updated, and then used as the input for the analysis of the next statement. Flow insensitive analyses are generally less precise, and more conservative, than flow sensitive analyses because a fact is true only if it is true throughout the entire control flow graph. Thus, the flow sensitive algorithm’s knowledge that k = 5 at S6 and j = k = 5 at S7, even though these variables have different constant values in other parts of the program, is lost in the flow insensitive analysis. Flow insensitive algorithms tend, however, to be faster than flow sensitive algorithms. Figure 2.3 shows the effect of flow insensitive analysis on the example of Figure 2.2.

What happens when a procedure call is encountered? When the intra-procedural analysis of a function is finished, its effect on the global state of the program can be represented as a summary. In the example of Figure 2.2, the parameter k is not a constant upon leaving the function, so at each call site for foo the GEN set for the calling statement would contain the double [κ, ⊥] to reflect this, where κ is the argument passed to the parameter k of foo. Similarly, the state of the program at the point of the call can be represented as the IN set of the function call statement. The analysis, upon encountering the function call, would send the values associated with arguments to the intra-procedural analysis of the function body. Analogous to flow sensitive and insensitive analyses, inter-procedural analyses can be context sensitive or context insensitive. In a context sensitive analysis, each function call site has separate summary (GEN and KILL) sets, whereas in a context insensitive analysis all call sites share summary information. Similar precision/speed trade-offs exist as with flow sensitive and insensitive analyses.


Figure 2.3: A flow insensitive constant propagation analysis on the program of Figure 2.2(a).

A work queue is typically used to implement a dataflow algorithm. The first node of the function is placed into the queue. Until the queue is empty, the algorithm removes a node from the queue, computes its IN set from the predecessor nodes’ OUT sets, and evaluates the transfer function. If the OUT set of the node being processed changes, its successor nodes are placed into the work queue. This causes nodes to be reevaluated only when their inputs change.
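A sketch of this driver is shown below. The Node type, the queue bound, and the two analysis-specific routines are assumptions made for illustration; in a real framework the IN, OUT, GEN and KILL sets would hang off each node:

#include <stdbool.h>

typedef struct Node {
    int nsucc;
    struct Node *succ[4];        /* successor nodes in the CFG        */
    /* IN, OUT, GEN and KILL sets would be stored here */
} Node;

extern void compute_in(Node *n);      /* IN = meet of predecessors' OUTs */
extern bool apply_transfer(Node *n);  /* OUT = GEN /\ (IN - KILL);       */
                                      /* returns true if OUT changed     */

void solve(Node *entry) {
    enum { QMAX = 4096 };             /* bound on queued work; a sketch, */
    Node *queue[QMAX];                /* no overflow check is done       */
    int head = 0, tail = 0;
    queue[tail++ % QMAX] = entry;     /* seed with the function's entry  */
    while (head != tail) {
        Node *n = queue[head++ % QMAX];
        compute_in(n);
        if (apply_transfer(n))        /* OUT changed: successors must    */
            for (int i = 0; i < n->nsucc; i++)    /* be reevaluated      */
                queue[tail++ % QMAX] = n->succ[i];
    }
}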

Finally, some analyses, such as live variable analysis, are backwards analyses, in which information is propagated backwards through the program. In live variable analysis, the fact that a variable is read later in a program makes the value in the variable at an earlier point in the program live. Backwards analyses are in principle just like forward analyses, except that a reverse control flow graph is used. A reverse control flow graph is formed from the control flow graph by replacing each edge α → β with an edge β → α. In these analyses the OUT sets are computed from the IN sets of the predecessor nodes in the reverse control flow graph, and the transfer function then computes the node’s IN set.

2.1.2 ALIAS ANALYSIS

Alias analysis is a dataflow analysis that tries to determine the different “names” by which an object in memory is referenced. By “object” we do not mean an object in the object oriented use of the term, but rather any collection of memory that can be referenced in whole by the program. An object in our sense may be a word, a structure, the field of a structure, an entire array or some other element (including an object in the OO sense of the word) in the program. Two different pointers can be made to refer to the same object by assigning the same value, or address, to each. Two different function parameters can be made to refer to the same object in languages that pass arguments to functions by reference (such as Fortran) by passing the same object as two or more arguments. In languages such as C and C++ that pass arguments by value, passing two or more pointers that contain the same address as arguments causes aliasing. Address arithmetic can also cause real, or apparent, aliasing.

We will refer to the program of Figure 2.4(a) when describing alias analysis. In Figure 2.4(b) the control flow graph (as described in Section 1.3.3) for the program is shown.

The program first declares three pointers, p, q and r, in statements S1, S2 and S3, and assigns to each pointer the address of a heap allocated object. This raises an interesting problem in alias analysis—how does the analysis refer to a heap allocated object? Note that if the allocation site is executed repeatedly, as in:

for (i = 0; i < n; i++) {
Sj:     ptr = malloc(. . .);
        . . .
}


Figure 2.4: Alias analysis example.

then the variable ptr may reference a number of different objects, the number of which is unknown at compile time. Therefore, the different objects cannot all be uniquely and individually represented at compile time, and that which cannot be represented by a compiler cannot be analyzed by the compiler. One standard way around this representation problem is to name allocation sites, and refer to all objects allocated at that site by the site name. Thus, all objects allocated in statement Sj in the loop above would be referred to as an Sj object, and any action taken on any of those objects would be assumed to take place on all of them.

Returning to the example of Figure 2.4, in statements S1 and S2 the assignments to p and q cause them to point to objects S1 and S2, respectively. This is represented by the points-to graph shown in Figure 2.4(c). In the simple points-to graph of this example, each node is a circle labeled with the name of a pointer, or a square labeled with the name of an allocation site representing all of the objects allocated at the site. Edges in the graph go from a pointer to the object referenced by the pointer. In S3, r receives the value of q, and therefore points-to object S2, as shown in Figure 2.4(d). Because q and r point to the same object, they are aliased at this point in the program.

The next statement is an if statement. At this point we will assume a flow sensitive analysis, which not only follows the control flow of the program but keeps analysis information at each point in the program. Along the true path of the if the statement q = p is executed. This makes q point to what p points to instead of pointing to object S2. As well, the assignment s = q makes s point to what q (and p) point to. This is shown in the points-to graph of Figure 2.4(e). Along the false path r = p is executed, and r now points to object S1 rather than object S2, as shown in Figure 2.4(f). At the end of the if statement, the points-to graph must be conservative and correct regardless of which path the program took. Thus, whatever is true along either path reaching this point must be true at S9. Since p points only to S1 along both paths, it must point to S1 at the join of the two paths. Because q points to S1 at the end of the true branch, and to S2 at the end of the false branch, it can point to either at the end of the if. r points to S1 or S2 at the end of the if, and s points to either S1 or nothing. This is shown in Figure 2.4(g).

In this analysis, the result of the join operation for graphs p → q and p → r is a graph where p points to both q and r, i.e., the union of the graphs. The domain of the semi-lattice over which this analysis operates contains increasingly connected points-to graphs containing all heap objects and pointers visible in the region of the program being analyzed. Like the infinitely wide semi-lattice used in constant propagation, this semi-lattice is never explicitly represented.
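In implementation terms, the join is simply set union applied per pointer. A tiny sketch follows; the bit-set encoding is an assumption made for brevity (production analyses use sparse set representations):

#include <stdint.h>

typedef uint32_t PtsSet;        /* bit s set => may point to site Ss */

/* Meet at a control flow join: the union of the points-to sets that
 * reach the join along each path. */
PtsSet pts_join(PtsSet path1, PtsSet path2) {
    return path1 | path2;
}

In the example above, q points to S1 along the true path and to S2 along the false path, so the join gives q the set {S1, S2}, matching Figure 2.4(g).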

The analysis can be flow insensitive instead of flow sensitive. In the flow insensitive analysis, a single points-to graph is maintained for the entire program region under analysis, rather than one at each program point, and the resulting points-to graph, when the analysis is finished, must be conservatively true at all points in the region under analysis. More concretely, in our example, when S9 is analyzed, q is made to point to both S1 and S2, because it points to each at some point in the program. Similarly, when s = q is analyzed, s must point to both S1 and S2 because q points to both of these. At the merge point at the end of the if no additional action is necessary because the merger of information has already happened on the single points-to graph used by the flow insensitive analysis.

Alias analysis is imprecise for several reasons. Context and flow insensitive analyses merge information from multiple points in the program, and consider the information to hold at all points in the program. These less precise analyses are used because they are generally faster than the more precise context and flow sensitive analyses. Because the analysis must be true for all paths through the program, merging alias information at control flow join points, such as after the if in the example above, also loses information about what is true during a particular execution of the program at this point. Finally, the fact that all objects allocated at a particular site are treated as the same object, and that arrays are generally treated as a single, monolithic object, such that a write (read) to any element of the array is considered a write (read) to the entire array, also causes alias analyses to be generally conservative. In Section 2.3 we discuss data dependence analysis, which aims to gather more precise information about array accesses.

2.2 ABSTRACT INTERPRETATION

Abstract interpretation [58, 59] is an alternative approach to dataflow analysis for the static analysis of computer programs. Abstract interpretation has traditionally been less widely used than dataflow analysis in compilers, but is an area of intense research interest, and arguably is gaining in popularity.

Conceptually, a programming language L has some semantics S and operates over some domain D. Thus, the semantics of the programming language specify how programs map input values (in the domain D) onto the domain D. Given a program p ∈ L, the semantics S define the allowable outcomes in D given some input or initial state. Moreover, this outcome can be computed by executing the sub-steps of the program in a way allowed by the language semantics.

We note that programs typically have an infinite number of possible outcomes, and computing all of these outcomes at compile time is clearly impossible. Abstract interpretation, like dataflow analysis, deals with this by defining a restricted, or abstract, domain for the analysis to operate on. In our constant propagation dataflow example above, the values of variables could be either ⊤, ⊥, or a constant value. An abstract interpretation designed for performing constant propagation might have as its abstract domain D′ the same elements, i.e., D′ = {⊤, ⊥} ∪ {c : c is a constant}. Thus, in the abstract domain the program would perform a mapping onto D′ instead of D.

It is incumbent on the designer of the abstract interpreter to provide a mapping M from D to D′, and to prove certain properties of this mapping. For example, it is typically shown that, given a binary operation ⊕ over D, the corresponding binary operation ⊕′ operating on the abstract domain gives a consistent result, i.e., if x ⊕ y = z and M(x) ⊕′ M(y) = z′, then it must be that M(z) = z′. Composition, associativity, commutativity, and other properties defined by the semantics S on D should also hold for the operations on D′.

To perform the analysis, the statements of the program are interpreted over this abstract domain, and by showing that operations over the program state in D′ lead to monotonically increasing values of the program state, it can be shown that the interpretation will reach a fixed point such that further interpretation does not change the program state. This final program state is then the result of the analysis.
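As a small example, the following is one plausible abstract counterpart of integer addition over the constant propagation domain D′ described above (an illustrative sketch, not the definitive formulation):

typedef enum { A_TOP, A_CONST, A_BOT } AKind;  /* ⊤, constant, ⊥ */
typedef struct { AKind kind; int value; } AbsVal;

/* Abstract "+": exact when both inputs are known constants, ⊥ when
 * either input is known not to be a constant, ⊤ when nothing is yet
 * known about an input. */
AbsVal abs_add(AbsVal x, AbsVal y) {
    if (x.kind == A_BOT || y.kind == A_BOT) return (AbsVal){ A_BOT, 0 };
    if (x.kind == A_TOP || y.kind == A_TOP) return (AbsVal){ A_TOP, 0 };
    return (AbsVal){ A_CONST, x.value + y.value };
}

The consistency requirement above is easy to check here: when x and y are concrete integers, M(x + y) is exactly abs_add(M(x), M(y)) whenever both map to constants.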

A comparison of dataflow analyses and abstract interpretation Dataflow analysis and abstract interpretation have their relative strengths and weaknesses. In practical terms, many commercial compilers (and open source compilers for Java, C and C++) contain built-in dataflow analysis frameworks that require the analysis writer to define only the transfer functions and the computation of IN (OUT) sets from the predecessors (successors) for forward (backward) dataflow analyses. This removes much of the complexity of implementing many dataflow analyses. Some analyses, such as liveness, which determines if a value in a variable may be used later in the program execution, are more difficult to do with abstract interpretation.

At the same time, abstract interpretation makes the linkage between the semantics of the language being analyzed and the analysis semantics explicit. In particular, each value in the abstract semantics corresponds to a well-defined set of values that the actual program can take on. In dataflow analyses these relationships must be independently proven. Nevertheless, it is often felt that dataflow analyses are more intuitive, and consequently they are more likely to be taught in introductory compiler courses. It has been shown [208, 221, 222] that dataflow analysis is equivalent to model checking using the modal mu-calculus. Schmidt [208] makes the point that

classic [dataflow analyses] merely scratch the surface as to the forms of iterative static analysis one can perform – the combination of the abstract interpretation design space and the modal mu-calculus define the design space of iterative static analyses.

2.3 DATA DEPENDENCE ANALYSIS

Consider the loop of Figure 2.5(a). A standard alias analysis would consider the b array to be a monolithic structure, and would conservatively consider the two accesses of b to touch the same memory location. A closer examination of the accesses, however, shows that b[2*i] is only writing the even elements of the array, b[2*i-1] is only reading the odd elements of the array, and the same location is never touched twice. The goal of dependence analysis is to provide a more precise analysis than alias analysis that can detect when this happens. Moreover, when the same array element is accessed two or more times, dependence analysis will try to identify the order in which the accesses may occur, and, if possible, determine how many iterations of the loop nest lie between the two accesses. Because dependence analysis is conservative, it seeks to find those cases when it can be proven that a dependence does not exist between two references, and otherwise assumes that a dependence exists. Depending on the particular dependence analysis technique, it can sometimes prove the harder claim that a dependence really exists, but in general dependence analysis seeks to show when a dependence cannot exist rather than when it must.
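A loop with this access pattern has the following shape (a sketch in C of the kind of loop just described; the array names x and y and the bounds are invented for illustration, and b must have at least 2n + 1 elements):

void even_odd(double b[], double y[], const double x[], int n) {
    for (int i = 1; i <= n; i++) {
        b[2*i] = x[i];          /* writes only even elements of b */
        y[i]   = b[2*i - 1];    /* reads only odd elements of b   */
    }
}

No element of b is both read and written, so the two references to b are independent even though both range over the same array.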

There are four kinds of data dependence, three of which are commonly considered in loop parallelization. A flow or true dependence exists when a value is written to an array element, and then read from that element at a later time. An anti dependence exists when a value is read from an array element, and then written to that element at a later time. An output dependence exists when a value is written to an array element, and then overwritten with a different value at a later time; in practice, compilers will conservatively assume that a dependence exists regardless of whether the value written differs from what is already in the array element. The fourth type of dependence is the input dependence, which occurs between two reads. This is not really a dependence, since reordering the reads will not cause problems. Input dependences are sometimes used when attempting to analyze data reuse [85, 139]. We note that of the three kinds of dependences considered in parallelization, anti and output dependences result from reusing storage, and only the flow or true dependence is essential to the computation, hence its name. In Chapter 4, we discuss transformations that can sometimes eliminate anti and output dependences, or that enable dependences to be ignored. Because the relative order of reads is immaterial when parallelizing an initially sequential program, we will discuss input dependences no further.
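The following loops (illustrative sketches, not taken from the text’s figures) show each of the three important kinds; in each loop the dependence is loop carried with a distance of 1:

void kinds(double a[], double b[], double c[],
           double s1[], double s2[], const double x[], int n) {
    for (int i = 1; i < n; i++) {      /* flow (true) dependence      */
        a[i]  = x[i];                  /* write a[i] in iteration i   */
        s1[i] = a[i - 1];              /* read it in iteration i + 1  */
    }
    for (int i = 1; i < n - 1; i++) {  /* anti dependence             */
        s2[i] = b[i + 1];              /* read b[i+1] in iteration i  */
        b[i]  = x[i];                  /* write it in iteration i + 1 */
    }
    for (int i = 1; i < n; i++) {      /* output dependence           */
        c[i]     = x[i];               /* write c[i] in iteration i   */
        c[i - 1] = x[i] + 1.0;         /* overwrite c[i] in iter i+1  */
    }
}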


Figure 2.5: An example of a loop with multiple dependences.

Dependence analyses implicitly treat both a Dλ-deep loop nest and a DA-dimensional array as Cartesian spaces over the integers, of dimension Dλ and DA, respectively. Figure 2.5(b) shows a pair of references in a doubly nested loop. The iteration space diagram (ISD) in Figure 2.5(c) shows the Cartesian space that describes the iteration space of the loop and illustrates the dependence structure of the program at run time. Imposed on this Cartesian space are edges that relate points in the iteration space that access the same element of an array. In the ISD of Figure 2.5(c) we see the edges corresponding to the accesses to the c array in S1 and S2. For example, when i1 = 4, i2 = 3, element c[8, 3] is written, and when i1 = 5, i2 = 2, element c[8, 3] is read. That the same element is accessed, and one access is a write, creates a dependence on the c array between iterations i1 = 4, i2 = 3 and i1 = 5, i2 = 2. The dependence edge from i1 = 4, i2 = 3 to i1 = 5, i2 = 2 in the ISD represents this graphically, with the darkened nodes being those involved in the dependence just described. Additional edges on the graph correspond to the other points in the iteration space on which dependences exist. Note that for the d array, the accesses that read and write the same element always occur in the same iteration, and so the edges representing them go from an iteration to that same iteration. We note that in the literature ISDs are sometimes shown with a node for each statement or reference in each iteration.

In Figure 2.5(d), we see the dependence graph for the loop nest of Figure 2.5(b). Nodes in the graph represent statements in the loop nest, and edges represent dependences. Edges are often labeled with the kind of dependence, in this case with δf to represent the flow dependence on c from S1 to S2 with a distance of (1, −1), and δa to represent the anti dependence on d with a distance of (0, 0).

As seen in the ISD of our example, dependences always go forward in the execution of a program. A flow dependence represents a write to an array element followed by a read. An anti dependence represents a read of an array element followed by a write, and similarly for output and input dependences. Thus, the dependence is always from an earlier event to a later event. If the dependence is across iterations, then the dependence moves forward in the iteration space, as shown in the ISD. If the dependence is within an iteration, as is the dependence on d in Figure 2.5(b), then the dependence must be from an access that occurs earlier in the iteration to one that occurs later. Thus, for d, the read of d[i1, i2] occurs before the write of d[i1, i2], and therefore the dependence is an anti dependence. This ordering is crucial because it is in part what allows the compiler to distinguish between a true dependence and an anti dependence that it might be able to eliminate via code transformations.

When drawing dependences on an ISD or other diagram, the tail is always at the first action in the dependence, and the head at the second. The tail action is referred to as the source of the dependence, and the head action as the sink of the dependence. Thus, for a flow dependence the source is always a write and the sink is always a read, and for an anti-dependence the source is always a read and the sink is always a write.

The direction of a dependence tells how the iterations at the source and sink are related to one another. The direction is represented as a vector Γ, with Γ[d], 1 ≤ d ≤ D, containing the relationship between the iterations at the source and sink (tail and head) of the dependence edge for the loop at nest level d in the loop nest containing the dependence. Let id be the value of the loop index variable for the d’th loop in the loop nest at the dependence source, and i′d be its value at the dependence sink. The distance of the dependence in that loop is (i′d − id), and the direction is the sign of the distance, i.e., sign(i′d − id). Thus, if the source is at an earlier iteration than the sink, the dependence direction vector element for loop d will be 1. It is possible for the direction to be positive, negative, or some combination of these; thus, the direction for a loop is represented as a set that is a non-empty member of the power set of {1, 0, −1}, and the direction for the entire loop nest is a vector of these sets.

Since dependences must move forward in the iteration space of the loop, the question arises as to how the dependence distance and direction for some loops can be negative. Looking again at the loop of Figure 2.5, we see that the distance of the flow dependence between the write and read of c is positive on the i1 loop, and negative on the i2 loop. Thus, the dependence vector is Γ = [{1}, {−1}]. The reader can enumerate the iterations accessing the various elements of c on the read and write, and compare the i1 and i2 values when the same elements are read and written. The first non-zero element of a distance vector must be positive, and therefore the first non-zero element of the direction vector must be 1; that is, the direction vector must be lexicographically greater than the zero vector.

We note that sometimes a different notation for direction is used. In this notation, “<” is substituted for “1”, “>” is substituted for “−1”, and “=” is substituted for “0”. The direction {<, =, >} is sometimes represented as “*” in the literature. The advantage of this notation is that it lexically shows the relationship of the source and sink iterations (i.e., id < i′d for a positive distance), while the advantage of the 0, 1, −1 notation is that it represents the direction in a mathematically amenable notation.

A dependence for which the distance is non-zero, i.e., the direction contains a −1 or 1, is called a loop carried dependence because data is carried across iterations of the loop.

Logically, dependence testing has two phases. In the first phase, the variable references to be tested are determined, and then, in the second phase, the test for dependence is done. The dependence test driver determines pairs of references to be tested, and the direction vector under which they will be tested. It then calls a dependence test to actually perform the test.

2.3.1 DETERMINING REFERENCES TO TEST FOR DEPENDENCE

The dependence test driver operates over all pairs of accesses within some selected loop i, where i is typically the outermost loop in a loop nest that is being analyzed. The driver then examines all references which may access the same storage. Two references are considered to possibly access the same storage if (i) the references are to the same declared array, or (ii) the references are may- or must-aliased to the same array.

First, the relative base addresses, i.e., the relationship between the lowest elements of the arrays that can be accessed through each reference, are determined. In the easier (and common) case of references that are both to the same declared array, the base addresses must be the same. In the case of references that are possibly aliased to the same array, the situation may be more difficult, as shown in Figure 2.6. In the example, statements S3, S4 and S5 all pass the array z as arguments for the formal parameters a and b in the function leftShiftBtoA. In the case of the first call at S3, the statement at S1 essentially performs the operation

a[i] = a[i + 2]

for the iteration space from 0–999, and there exists an anti-dependence of distance “2” from b[i + 2] to a[i]. In the case of the second call at S4, the statement at S1 essentially performs the operation

a[i] = a[i − 1]

for the iteration space from 0–999, and there exists a flow dependence of distance 1 from a[i] to b[i + 1]. In the case of the third call at S5, the statement at S1 essentially performs the operation

a[i] = a[i + 500]

for the iteration space from 0–499, the regions accessed by the left-hand reference and the right-hand reference do not overlap, and there is no dependence.


Figure 2.6: An example of issues raised in dependence analysis by aliased arrays. Different parameter values for a, b and n in function leftShiftBtoA lead to different types of dependence.

A compiler, when confronted by aliased references with non-equal base addresses, can either conservatively assume that flow and anti dependences exist between every pair of read and write references to the aliased arrays, and that an output dependence exists between every pair of write references, or the compiler can attempt to incorporate the offsets of the bases into the dependence equation. In the former case the dependence is assumed to exist over all possible valid directions, and in the latter case the test can proceed as normal with an extra offset term to augment the offset information provided by the subscript expression.

In the simpler case of common base addresses, or after an offset has been determined in the more complicated case of different base addresses for the array, the dependence tester needs to be called with different direction vectors to determine under what conditions the dependence exists. Given that there are O(3^n) combinations of direction vectors to test for an n-deep loop nest, it is desirable that these tests be performed in a systematic way. In [39] a hierarchy of calls to the dependence testing routine under different direction vectors was proposed that allows the search space to be systematically trimmed. Figure 2.7 illustrates this for a two-deep loop nest. The test is first called on the subscripts with a direction of (*, *) (i.e., any direction). If a dependence is possible under these directions, the test is called for the three directions at the next level ((<, *), (=, *) and (>, *)). Again, for any of the directions for which a dependence might still exist, the test is called for the next level of directions, as shown in Figure 2.7. The observant reader will notice that the tests on directions with a leading “>” direction, or with an “(=, >)” direction, appear illegal because the dependence must move forward in the iteration space. If the test is performed with a source of Si and a sink of Sj, the existence of a dependence with these directions implies a dependence from Sj to Si with “>” directions being changed to “<”, and “<” directions being changed to “>”, yielding a dependence and direction that move forward in the iteration space. This allows a simultaneous testing of anti and flow dependences, and of output dependences with either statement being the source or sink.
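A sketch of such a driver is shown below; the dependence test interface and the reporting routine are assumptions made for illustration:

#include <stdbool.h>

enum Dir { ANY, LT, EQ, GT };   /* *, <, =, > */

extern bool may_depend(const enum Dir dirvec[], int depth);
extern void report_dependence(const enum Dir dirvec[], int depth);

/* Refine the direction vector one loop at a time, as in Figure 2.7,
 * pruning a subtree whenever dependence is disproved. Called initially
 * with dirvec = (ANY, ..., ANY) and level = 0. */
void refine(enum Dir dirvec[], int level, int depth) {
    if (!may_depend(dirvec, depth))
        return;                          /* independence: prune subtree */
    if (level == depth) {
        report_dependence(dirvec, depth);/* fully refined vector        */
        return;
    }
    static const enum Dir choices[3] = { LT, EQ, GT };
    for (int c = 0; c < 3; c++) {        /* split the next * three ways */
        dirvec[level] = choices[c];
        refine(dirvec, level + 1, depth);
    }
    dirvec[level] = ANY;                 /* restore for the caller      */
}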

We note at this time that the selection of references to test is typically done in a flow insensitive manner. That is, information about the flow of control between the references is typically not used to determine if a dependence exists. Instead, all possibly aliased array references within the loop nest being analyzed are tested. There has been work in the field of software engineering to use control information in conjunction with dependence information to better understand the part of a program that is responsible for the current value of a variable (i.e., the slice [241] for the variable) [212], but this topic is beyond the scope of this lecture.


Figure 2.7: The hierarchy and order of testing on different direction vectors.

2.3.2 TESTING FOR DEPENDENCE

The test for dependence takes as input four items, listed below along with an example of each item from Figure 2.8.

1. The subscript expressions for the two references being tested, e.g.,

σ1(I) = a0 + a1 i1 + a2 i2 + · · · + aD iD
σ2(I′) = b0 + b1 i′1 + b2 i′2 + · · · + bD i′D

where the as and bs represent loop invariant values. The subscripts are assumed to be linear functions in several dimensions, i.e., affine functions;

2. the bounds for each loop index id in the common loop nest surrounding the references (if not available the maximum allowed value (e.g., MAXINT) for the bounds can be used), e.g.,

Ld ≤ id ≤ Ud, 1 ≤ d ≤ D

3. a direction for each loop in the loop nest(s) surrounding the references (e.g., Γ = [γ1, γ2, γ3, *, *, *]); and

4. the non-common loops (denoted by i*) surrounding each reference, e.g., the loops with italicized fors in Figure 2.8.

The dependence tester now needs to answer the question:

Given subscript functions σ1(I), σ2(I′) on the index variables I and I′ of the loops surrounding each reference, does an integer solution exist for the equation σ1(I) = σ2(I′) (the references access the same elements of the array) subject to the constraints that:

1. for each d, 1 ≤ d ≤ D, in the D-deep common loop nest, id γd i′d holds (i.e., the direction vector is satisfied by the solution);

2. Ld ≤ id ≤ Ud, 1 ≤ d ≤ D (i.e., the solution is within the loop iteration space); and

3. Ld ≤ id ≤ Ud for each id in i*, and Ld ≤ i′d ≤ Ud for each i′d in i′* (i.e., the solutions for the non-common loop indices are within the iteration spaces of the non-common loops).


Figure 2.8: An example of references with non-common loops (those with italicized fors) in their surrounding loop nests.

The first part of this question is answered using the gcd test. In the gcd test, the Diophantine equation for the subscripts is formed:

a1 i1 + a2 i2 + · · · + aD iD − b1 i′1 − b2 i′2 − · · · − bD i′D = b0 − a0

A well-known result in number theory says that a Diophantine equation has a solution if and only if the gcd of the coefficients evenly divides the constant right-hand side, i.e., that there is a solution only if

gcd(a1, a2, . . . , aD, b1, b2, . . . , bD) evenly divides (b0 − a0)

That this is true can be shown using the simpler equation

a1 i1 + a2 i2 = b0 − a0

and observing that if

g = gcd(a1, a2), so that g divides both a1 and a2,

then

g divides a1 i1 + a2 i2 for any integers i1 and i2,

and therefore, if the gcd of the coefficients does not evenly divide the constant term, there can be no solution, because if the right- and left-hand sides are equal they must have the same divisors.
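A direct implementation of the test is small. The following sketch takes the coefficient arrays of the equation above and returns whether a dependence is still possible:

#include <stdbool.h>
#include <stdlib.h>

static long gcd(long a, long b) {         /* Euclid's algorithm */
    a = labs(a); b = labs(b);
    while (b != 0) { long t = b; b = a % b; a = t; }
    return a;
}

/* gcd test for a1*i1 + ... + aD*iD - b1*i1' - ... - bD*iD' = b0 - a0.
 * Returns true if a dependence is still possible, false if
 * independence is proved. */
bool gcd_test(const long a[], const long b[], int D, long a0, long b0) {
    long g = 0;                  /* gcd(0, x) == |x|                 */
    for (int d = 0; d < D; d++)
        g = gcd(gcd(g, a[d]), b[d]);
    if (g == 0)                  /* all coefficients zero            */
        return b0 - a0 == 0;
    return (b0 - a0) % g == 0;   /* gcd divides constant => possible */
}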

If there are multiple subscript expressions, as in the case of multi-dimensional arrays, then there are three possible approaches. First, the equations for each dimension can be formulated and solved independently; if any equation does not have a solution, dependence is disproved. Second, the subscript expressions can be combined by linearizing them. The subscript expressions are combined in essentially the same way as they are combined by a compiler generating code for a multi-dimensional array access; the new subscript expressions are then set equal and the resulting equation is checked for a solution. Third, the extended gcd test can be used, with the equations for all of the dimensions being solved simultaneously. This is discussed in more detail in Chapter 7.

Unfortunately, the coefficients of index variables are often 1, and thus the gcd is often 1. It is therefore essential to determine if a solution to the Diophantine equation exists within the loop bounds and consistent with the direction vector. A precise answer to whether the solution fits these constraints requires integer linear programming, which has exponential complexity. This has motivated the development of various heuristics. Banerjee’s inequalities were the earliest such heuristic. They attempt to find an upper and a lower bound on the variable (left-hand) side of the dependence equation. Consider a term a·id − b·i′d on that side. If the direction for the loop is 1, then id < i′d, and thus the lower bound of the term (assuming a, b positive) is a·lbd − b·ubd. Similar expressions can be derived for the cases a > 0, b < 0; a < 0, b > 0; and a < 0, b < 0. From these the minimum (L) and maximum (U) values of the dependence equation can be found, and, by the intermediate value theorem, if L ≤ b0 − a0 ≤ U, the equation is assumed to have a solution in the iteration space, and a dependence is assumed to exist. If it is not the case that L ≤ b0 − a0 ≤ U, then dependence has been disproved. We note that in rectangular loop nests (i.e., nests in which the upper and lower bounds of all loops are constants within the iteration space) with known loop bounds, this test is precise [22].
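For a single common loop with bounds lb and ub and dependence equation a·i − b·i′ = b0 − a0, the bounds computation looks like the following sketch. Only the “*” direction is handled here; the “<”, “=” and “>” directions tighten the bounds as described above:

#include <stdbool.h>

/* Min and max of coef*i over lb <= i <= ub. */
static void term_bounds(long coef, long lb, long ub, long *lo, long *hi) {
    long x = coef * lb, y = coef * ub;
    *lo = x < y ? x : y;
    *hi = x < y ? y : x;
}

/* Returns true if a dependence is still possible. */
bool banerjee_any(long a, long b, long a0, long b0, long lb, long ub) {
    long alo, ahi, blo, bhi;
    term_bounds(a, lb, ub, &alo, &ahi);   /* bounds of a*i            */
    term_bounds(b, lb, ub, &blo, &bhi);   /* bounds of b*i'           */
    long L = alo - bhi;                   /* min of a*i - b*i'        */
    long U = ahi - blo;                   /* max of a*i - b*i'        */
    return L <= b0 - a0 && b0 - a0 <= U;  /* constant within [L, U]?  */
}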

A tighter bound can be found by using the Power Test [252]. When solving a set of Diophantine equations using the extended gcd test, the solutions for each index variable can be expressed in terms of parametric equations with parametric variables t. Thus, it may be that i1 = t2 + 1 and i′1 = t2. If i1 < i′1, because of the direction vector under which the test is applied, then it must be that t2 + 1 < t2, i.e., that 1 < 0, which is impossible; thus the solution to the system of equations is inconsistent with the direction vector. Fourier-Motzkin elimination gives a systematic way of finding these inconsistencies. The Power Test can be exponential in the number of index variables (i.e., the depth of the loop nest), but provides faster solutions than other techniques [119, 134] that utilize Fourier-Motzkin elimination to prove independence. When originally developed it was too slow for general use, but on modern machines it can be applied to those references that may be dependent after applying Banerjee’s inequalities. The Power Test is exact for loops whose bounds are affine functions of outer loop indices, and in which all constants in the dependence equations and loop bounds are known.

The Omega test [184] is a powerful technique that uses an extension of Fourier-Motzkin elimination to determine if a solution exists to a dependence equation. The Omega test is exponential in the number of index variables. The Omega test can solve for solutions in some non-linear dependence equations. Of particular importance are those that arise from index expressions of the form c · i + n, where the value of n is unknown at compile time; this is handled by adding n to the system being solved as a new variable. Equations involving integer division and integer remainder operations can be handled similarly. In [181, 182] it is claimed that in all situations where other dependence tests based on Fourier-Motzkin elimination and simpler analyses are accurate, the Omega test is sufficiently fast, especially when carefully implemented.

Using hybrid analysis, described in [202], conditions under which a dependence cannot be disproved at compile time are identified and tested at run time. Consider the pair of references a[i] and a[i + N]. If N = 0, then no loop carried dependence exists, i.e., no dependence that crosses iterations of the loop. If N > 0, the dependence source is a[i] and its sink is a[i + N]. Depending on the lexical relation of the references in the program, it might be possible to place a test at run time that evaluates the value of N: if N = 0 the loop can be run in parallel, and if N > 0 or N < 0 it might be possible to apply loop fission to the loop (see Section 5.3) and then parallelize the resulting loops.
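A sketch of the resulting two-version dispatch is shown below; the two routines stand in for compiler-generated parallel and sequential versions of the loop:

extern void run_parallel(double a[], int n);    /* no cross-iteration   */
extern void run_sequential(double a[], int n);  /* dependence honored   */

/* Run-time dispatch on the value of N for references a[i], a[i+N]. */
void dispatch(double a[], int n, int N) {
    if (N == 0)
        run_parallel(a, n);     /* no loop carried dependence */
    else
        run_sequential(a, n);   /* conservative fallback      */
}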

2.3.3 ARRAYS OF ARRAYS AND DEPENDENCE ANALYSIS

Some languages, most prominently Java, represent all multi-dimensional arrays as “arrays of arrays”. Thus, in Java, an m × n two-dimensional array consists of a one-dimensional array with m elements that are pointers to arrays (the pointer array), and m one-dimensional arrays, each with n elements, representing the rows of the array, as shown in Figure 2.9. There are numerous problems associated with analyzing such arrays [155, 159], but we will only consider those raised by dependence analysis.

As shown in Figure 2.9, it is possible for a row array to be referenced by multiple elements of the pointer array. As a consequence, showing that two references never access an element with the same coordinates is insufficient to show that they are independent. Thus, although the subscript expressions of the reads and writes of a never have the same values, there is still a flow dependence with distance zero on the i loop.
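The aliasing can be mimicked in C with an explicit array of row pointers (a sketch of the situation shown in Figure 2.9; the sizes are illustrative):

double row[4];                   /* one underlying row              */
double *a[2] = { row, row };     /* two pointer entries, same row   */

void update(void) {
    a[0][1] = 1.0;               /* writes row[1]                   */
    double x = a[1][1];          /* reads the same element, even    */
    (void)x;                     /* though [0][1] != [1][1]         */
}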


Figure 2.9: An example of a Java array of arrays.

2.4 CONTROL DEPENDENCE

When a compiler transforms a program such that the order, the kind of operation, or the conditions under which an operation is executed change, it must ensure that the output of the program is as if:

1. the inputs to the operation are unchanged;

2. the outputs of the moved operation do not change the inputs to other dependent operations;

3. the operation is always performed at run time when it would have been performed during the original program; and

4. the operation is never performed at run time when it would not have been performed during the original program.

The first two conditions are determined by the data dependence structure of the program. The final two constraints are determined by the control dependence [62, 75] structure of the program. If a control dependence exists from a source statement Sso to a sink statement Ssi, then whether or not some instances of Ssi execute is dependent on the outcome of Sso.

Consider the program and CFG of Figures 2.10(a) and (b). In Section 1.3.3 we discussed the concepts of dominance and dominance frontier, and how they relate to finding join points in a program. Determining control dependences solves a related problem. The statements in some block bsi are control dependent on the last statement in a block bso if the block bso causes a fork rather than a join in the CFG, and which path is taken at the fork determines if some instance of bsi is executed. It is precisely these forks that constitute the sources of control dependences.

There are two well-known ways of finding control dependences. The first is from [75], where the following condition is given for control dependence to exist between two nodes Nso and Nsi in the forward CFG. Nsi is control dependent on Nso if:

1. there is a path from Nso to Nsi such that any additional node(s) N on the path (i.e., nodes other than Nso and Nsi) are post-dominated by Nsi; and

2. Nso is not post-dominated by Nsi.

Intuitively, Nsi is control dependent on Nso if, once control leaves Nso along the path toward Nsi, every other node N on that path must pass through Nsi to reach the exit of the CFG (the first condition), while Nso itself can reach the exit without passing through Nsi (the second condition).

Another technique, related to dominance frontiers, is given in [62]. If the reverse CFG is formed (shown in Figure 2.10(c)), i.e., a CFG where every edge bi → bj is replaced with an edge bj → bi, then forks in the graph become joins, and the joins become forks. Thus, by finding the dominance frontier of a block bsi in the reverse CFG, the fork points nearest to bsi in the regular CFG are located, and hence the sources bso of all control dependences δc(bso, bsi) involving bsi. Figure 2.10(d) shows the control dependences in the program.

The program dependence graph [75] displays both the data and control dependences found in a program. Because only dependence edges impose an ordering, all statements not ordered by a dependence edge can be executed in parallel. The PDG for the program of Figure 2.10(a) can be found in Figure 2.11.

Control dependences can be essentially converted to data dependences [7], allowing a common framework to be used. Figure 2.12 shows a for loop with an if statement. The technique first converts the if to an assignment into a vector of bits, and then tests the appropriate bit before executing each statement that is under the if. Dependence testing will reveal the appropriate orderings that must be enforced, and can provide information about the iteration of the control dependence source.
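A sketch of the transformation follows; the array names, condition, and loop bound are illustrative, not taken from Figure 2.12:

/* Before: the execution of S2 is control dependent on the if.       */
/* After: the if outcome is stored in a vector of guard bits, and    */
/* testing guard[i] turns the control dependence into a data         */
/* dependence from S1 to S2 that ordinary dependence testing sees.   */
void guarded(double a[], const double b[], _Bool guard[], int n) {
    for (int i = 0; i < n; i++) {
        guard[i] = (b[i] > 0.0);   /* S1: the if becomes an assignment */
        if (guard[i])
            a[i] = b[i];           /* S2: guarded by the stored bit    */
    }
}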

2.5 USE-DEF CHAINS AND DEPENDENCE

The dependence test described above is typically applied to references that have at least one loop in common in their enclosing loop nests. It is often useful to determine the sources of read values on scalars and on array elements when there is no common loop nest, such as is shown in Figure 2.13. Whether the references access the same array elements can be determined by applying the gcd test to the subscripts. Since the reference in S1 will execute first, we know that the dependence source is S1 and the sink is S2. Since there are no common enclosing loops, the direction and distance vectors have no elements.


Figure 2.10: An example of control dependences.


Figure 2.11: The program dependence graph (PDG) for the program of Figure 2.10(a), where solid edges represent data dependences and dashed edges represent control dependences.


Figure 2.12: An example of converting control to data dependences.


Figure 2.13: Array accesses without a common loop nest.

2.6 DEPENDENCE ANALYSIS IN PARALLEL PROGRAMS

An analysis analogous to dependence analysis can be applied to identify orderings in multithreaded programs that must be honored by both hardware and compiler transformations. This analysis is necessary when the program being analyzed is already parallel. Before considering the case of dependence analysis in parallel programs we will revisit what it means for there to be a dependence in a sequential program.


Figure 2.14: An example of viewing a dependence as an edge in a conflict graph.

In Figure 2.14(a) a sequential program with a loop containing a loop carried dependence (a flow dependence from S1 to S2 with a distance of 1) is shown. Let us consider an alternative way of viewing this dependence by building a conflict graph. In the conflict graph, nodes are statements or references, and edges represent either (i) orders implied by the program semantics (program edges) or (ii) a pair of accesses to the same memory location, where at least one is a write (conflict edges). A conflict graph for the loop body of Figure 2.14(a) is shown in Figure 2.14(b); the program edge is shown as a solid line and the conflict edge as a dashed line.

Now consider the execution order of the references at each end of the conflict. If the reference that executes first is the reference that would have executed first if all program edges were honored (as seen in the orientation of the conflict edge in Figure 2.14(c)), then the value read by the reference to a in S2 will be the value written earlier by the write to a in S1, and the execution of the loop will be correct. If, however, the read of S2 executed before the write in S1 for one or more conflicting accesses of a, the orientation of those conflict edges would be as shown in Figure 2.14(d), and the outcome of the loop execution would be incorrect. That this is the case can be seen by observing that there is a cycle in the conflict graph of Figure 2.14(d); the cycle involves both program and conflict edges, and the presence of the cycle means that the orientation of the conflict edges is inconsistent with the order of accesses implied by the program edges. It is this inconsistency that leads to an error in the execution.

Dependence analysis looks for similar inconsistencies in a different way. The gcd test shows that there is a conflict between two references, i.e., that the two references can access the same memory location. Testing under the direction vector determines if the possible source reference can access the memory location before the possible sink reference; if so, a dependence exists, and the cycle corresponding to this dependence can be found by orienting the conflict edge to run from the sink to the source of the dependence.


Figure 2.15: An example of an explicitly parallel scalar program.

Testing for dependence in explicitly parallel programs requires looking for similar inconsistencies between possible conflict edge orientations and orders implied by the parallel program semantics. We now discuss how this is done, using the results of Shasha and Snir [209].

Figure 2.15(a) shows a fragment of an explicitly parallel program, and Figure 2.15(b) shows the conflict graph for that program. The cobegin construct defines different threads, with each thread being the code that is delimited by the cobegin statement and the “//” markers. The conflict graph for the parallel program has solid lines that indicate program orders, i.e., orders that are defined by the program execution order (in this case, for a sequentially consistent program execution order) and dashed lines that indicate conflicts among accesses in the different threads. As mentioned above, a conflict can be conservatively defined as two accesses to the same storage location, at least one of which is a write. The more precise definition adds the proviso that the order that the accesses occur in can be determined by examining the state of the program, which in practice means that writes place a different value into the storage location than what was already there.


Figure 2.16: An example of an explicitly parallel program with arrays.

Some execution orderings of accesses at the endpoints of conflict edges can lead to program outcomes that are inconsistent with those that are possible if the program orders are followed. These orders are exactly those that lead to cycles in the conflict graph. Thus, in Figure 2.15(a), if z = Y executes after Y = 1, the value assigned to z will be 1. Because of the program edge from X = 1 to Y = 1, if z = Y executes after Y = 1 it should also execute after X = 1. And because of the program edge from z = Y to w = X, w = X should execute after z = Y, and transitively should execute after X = 1. Therefore, if z has the value 1 then w should also have the value 1. If w has a value of 0, it implies that somehow w = X executed before X = 1 in the same execution in which z = Y executed after Y = 1. Tracing the corresponding conflict orientations on the conflict graph yields a cycle.

An important result of [209] is that enforcing all program edges involved in these cycles will prevent the invalid orientations from occurring at run time, and will prevent invalid executions. Thus, the program edges involved in cycles can be thought of as dependences in the explicitly parallel program.


Figure 2.17: An example of a conflict graph with a minimal cycle involving the nodes X = 1, Y = 1, z = Y, and w = X; and a non-minimal cycle involving the same nodes as well as . . .= z and . . .= w.

In [209], Shasha and Snir prove that dependences in parallel programs can be identified by finding all of the minimal mixed cycles in a conflict graph for the program, and marking all program edges in these cycles as dependences. A mixed cycle is a cycle that contains both conflict and program edges. This is necessary because a cycle without conflict edges does not capture the state resulting from reads and writes occurring out of order, and a cycle without program edges does not expose an inconsistency with respect to the ordering semantics of the program. A minimal cycle is one that contains no smaller mixed cycle spanning two or more threads. An example of a minimal mixed cycle and a non-minimal mixed cycle is shown in Figure 2.17(b). The edges of the minimal mixed cycle are shown bold and dark, and the additional edges in the non-minimal cycle (involving nodes “. . .= z” and “. . .= w”) are shown in grey.

When a compiler analyzes the threads (and the functions within a thread) independently of one another, alias analysis will show that X and Y access separate storage and that the accesses are independent; therefore an analysis that assumes the program is sequential may allow them to be reordered by either the hardware or the compiler. Reordering either the writes or the reads of X and Y will lead to the disallowed execution.

Figure 2.16(b) shows the conflict graph for the program of Figure 2.16(a). This program fragment contains a parallel parfor loop with cross-iteration conflicts. Again, the bold edges are involved in a cycle, and again the cycle indicates that an execution inconsistent with the program orders can result if the program edges involved in the cycle are not enforced. Conflicts can be detected by applying a dependence test with a direction vector of “*” elements to the potentially conflicting references, as sketched below.
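As an illustration of such a conflict test, the sketch below applies a simple GCD-based dependence test, with unconstrained direction, to two hypothetical affine references a[p*i + q] and a[r*j + s]; the subscript coefficients and loop bounds are invented for the example, and a real compiler would use the stronger dependence tests described earlier in this chapter.

```python
# Conservative conflict test for references a[p*i + q] and a[r*j + s]
# over iterations lo..hi, with no constraint on the relative order of
# the iterations (a "*" direction).
from math import gcd

def may_conflict(p, q, r, s, lo, hi):
    # GCD test: p*i - r*j == s - q has an integer solution only if
    # gcd(p, r) divides s - q. If it does not, the references never overlap.
    if (s - q) % gcd(p, r) != 0:
        return False
    # Coarse bounds check: do the two subscript ranges even intersect?
    lo1, hi1 = sorted((p * lo + q, p * hi + q))
    lo2, hi2 = sorted((r * lo + s, r * hi + s))
    return lo1 <= hi2 and lo2 <= hi1

# a[2*i] written, a[2*i + 1] read, iterations 0..99: disjoint elements.
print(may_conflict(2, 0, 2, 1, 0, 99))   # False -> no conflict edge needed
# a[i] written, a[i + 3] read: a cross-iteration conflict of distance 3.
print(may_conflict(1, 0, 1, 3, 0, 99))   # True -> conflict edge
```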

In [209] it is argued, but not proven, that a polynomial number of such cycles exist. It is shown in [128], however, that finding the cycles is exponential in the number of threads, which makes the algorithm extremely expensive for compilers. Krishnamurthy and Yelick show in [128] that for SPMD programs the arbitrarily many threads can be modeled with just two threads, without loss of precision compared to doing the analysis on a conflict graph as defined in [209], and that this leads to an O(|T|²) solution, where |T| is the number of threads. Figure 2.19 shows an example of how SPMD programs can be handled. In the program of Figure 2.16, each iteration is potentially a thread, and therefore up to n threads might need to be represented in a conflict graph as defined in [209]. Krishnamurthy and Yelick “roll” all of these threads into a left and a right thread. Each conflicting pair of nodes (vp, vq) leads to conflicts between vp in the left (right) thread and vq in the right (left) thread. All program edges of the original conflict graph are placed on the right thread of this graph.

To find a program edge that is in a critical cycle, i.e., a program edge that must be enforced, the algorithm selects a pair of nodes (vp, vq) from the left thread. It then attempts to find a path involving both conflict and program edges from vp to vq. If such a path exists, the edge (vq, vp) must be honored. Thus, the ordering from “a=” to “=b” must be honored because there is a path from “b=” in the left thread through the nodes “b=”, “=a”, and “a=” in the right thread, and then to “a=” in the left thread.
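To make the rolled-graph construction concrete, the sketch below builds the two copies for a hypothetical SPMD thread body (the node names, program edges, and conflicts are invented for the example) and applies the rule just stated: a program edge (vq, vp) must be honored if a path runs from vp to vq through the rolled graph. This is an illustration of the structure only, not a reproduction of the exact algorithm of [128].

```python
# Rolled two-copy conflict graph: program edges live on the right copy;
# each conflict connects the left copy of one endpoint to the right copy
# of the other, in both directions.
from collections import deque

def build_rolled_graph(program_edges, conflicts):
    succ = {}
    def add(u, v):
        succ.setdefault(u, set()).add(v)
    for u, v in program_edges:                 # right copy only
        add(("R", u), ("R", v))
    for u, v in conflicts:                     # conflicts cross the copies
        add(("L", u), ("R", v)); add(("R", v), ("L", u))
        add(("L", v), ("R", u)); add(("R", u), ("L", v))
    return succ

def reachable(succ, src, dst):
    seen, work = {src}, deque([src])
    while work:
        n = work.popleft()
        if n == dst:
            return True
        for m in succ.get(n, ()):
            if m not in seen:
                seen.add(m); work.append(m)
    return False

# Hypothetical SPMD thread body: a = ...; ... = b; b = ...; ... = a
prog = [("a=", "=b"), ("=b", "b="), ("b=", "=a")]
conf = [("a=", "=a"), ("b=", "=b")]
succ = build_rolled_graph(prog, conf)

def must_enforce(u, v):
    # Program edge (u, v) lies on a cycle if a path leads from v's left
    # copy back to u's left copy through the rolled graph.
    return reachable(succ, ("L", v), ("L", u))

print(must_enforce("a=", "=b"))   # True: this ordering must be honored
```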

In Figure 2.19(c), the same program as in Figure 2.19(a) is given, except that the references to the a and b arrays are now subscripted. As can be seen from the graph of Figure 2.19(d), no cycles exist; but because the methods of both Shasha and Snir [209] and Krishnamurthy and Yelick [128] ignore subscripts, both will still find a cycle. The algorithm of Midkiff [152] partially overcomes this limitation by finding cycles in parallel loops with constant-distance dependences. The paper makes the observation that a conflict edge with distance k moves from an iteration i to an iteration i ± k, depending on the direction in the iteration space (towards higher or lower iterations) in which the edge is traversed. Using Kirchhoff’s flow laws, the number of times each edge can be traversed in a walk through the graph can be constrained, and from these constraints and the edge distances a linear programming problem can be formed whose solution says whether it is possible to travel from an iteration back to that same iteration along the conflict and program edges, and therefore whether or not a cycle exists.

Figure 2.18 shows an example of this (note that for clarity transitive program edges are not shown). Each mixed cycle in the graph is enumerated and considered in turn. For example, for the cycle b[i3] → b[j4] → b[j1] → b[i2] → b[j3], the set of constraints shown in Figure 2.18(c) is formed. Each program edge implies a relationship between the index variable values at the head and tail of the edge, and these relationships are added to the system. Each conflict edge is also labeled with a relationship between the index variable values at its tail and head; thus, for the edge (b[j4], b[i4]) the relationship is i4 = j4. For clarity, this labeling is only shown on the bottom-most conflict edges ((b[j4], b[i4]) and (b[i4], b[j4])). Finally, each edge is labeled with a value ei, which represents the number of times the edge can be traversed; the constraints on the ei are determined by Kirchhoff’s flow laws. Thus, in our example, the number of times edges e11 and e5 are traversed cannot together exceed the number of times node a[j1] is entered, so e11 + e5 ≤ e4 and, in particular, e11, e5 ≤ e4. Moreover, every edge is traversed at least zero times, i.e., ep ≥ 0, and the edges in the tested cycle must be traversed at least once, so e ≥ 1 for each of those edges. Combining the constraints on the edges in this example yields the set of equations of Figure 2.18(c). A linear programming solver can then be used to determine whether a solution exists; if one does, the cycle is assumed to exist, and the program edges in the cycle must be enforced.
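The sketch below shows how such a feasibility problem might be posed to an off-the-shelf linear programming solver. The edges, distances, and constraints are hypothetical stand-ins for the actual system of Figure 2.18(c), which is not reproduced here: flow conservation at each node, non-negative traversal counts, candidate-cycle edges traversed at least once, and signed conflict distances summing to zero so the walk returns to its starting iteration.

```python
# Feasibility test for a candidate cycle via linear programming.
from scipy.optimize import linprog

# edges: (tail, head, distance) -- distance is the iteration shift k of a
# conflict edge (0 for program edges); the values here are invented.
edges = [("n1", "n2", 0), ("n2", "n3", +2), ("n3", "n4", 0), ("n4", "n1", -2)]
nodes = ["n1", "n2", "n3", "n4"]

# Kirchhoff-style flow conservation: traversals in == traversals out.
A_eq, b_eq = [], []
for n in nodes:
    row = [(1 if h == n else 0) - (1 if t == n else 0) for (t, h, _) in edges]
    A_eq.append(row); b_eq.append(0)
# Net iteration distance of the walk must be zero to close the cycle.
A_eq.append([d for (_, _, d) in edges]); b_eq.append(0)

# Each candidate-cycle edge traversed at least once: -e_i <= -1.
n_e = len(edges)
A_ub = [[-1 if j == i else 0 for j in range(n_e)] for i in range(n_e)]
b_ub = [-1] * n_e

res = linprog(c=[0] * n_e, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * n_e)       # all e_i >= 0
print("cycle may exist" if res.success
      else "no cycle: these program edges need not be enforced")
```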

Images

Images

Figure 2.18: An example of finding cycles in an explicitly parallel loop with constant-distance conflicts. Dashed lines indicate conflicts, solid lines program flow.

Images

Images

Figure 2.19: An example of using SPMD program graphs to analyze explicitly parallel programs with SPMD parallelism. Dashed lines indicate conflicts, solid lines program flow.

In [225] a heuristic for finding cycles in non-SPMD programs is presented that gives good results. The technique works by checking necessary, but not sufficient, conditions for a minimal mixed cycle to exist. Given a conflict graph, the technique picks a pair of nodes A and B joined by a program edge (A, B). We will demonstrate the heuristic using the program of Figure 2.15(a), testing whether there is a minimal mixed cycle involving the program edge from the node X = 1 (A) to Y = 1 (B). Two other nodes are identified, which we will call C and D2. Let D be the node z = Y and C be the node w = X. Then a minimal mixed cycle may exist if:

1. a conflict edge exists between A (X = 1) and C (w = X);

2. a conflict edge exists between B (Y = 1) and D (z = Y);

3. A (X = 1) and C (w = X) may occur in different threads;

4. B (Y = 1) and D (z = Y) may occur in different threads;

5. A (X = 1) may occur after C (w = X) and D (z = Y);

6. B (Y = 1) may occur before C (w = X) and D (z = Y);

7. D (z = Y) may occur before C (w = X).

The first four conditions establish that there are conflict edges from the endpoints of the program edge being tested to accesses in different threads. The last condition says that a path of program edges might go from D to C. Combined with the existence of the (A, B) program edge and the two conflict edges, this establishes that there may be a path from B to D to C to A and then back to B, which closes the cycle. The fifth and sixth conditions establish that the executions of A and B, and of C and D, can overlap and interfere with each other, and therefore may lead to a dependence. In [225] it is shown that this test eliminates the majority of the dynamic program edges that would otherwise have to be enforced in the benchmarks studied. A sketch of the test follows.
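The seven conditions translate directly into a screening function. The sketch below encodes them over conflict, thread, and ordering relations; the relations themselves are assumptions chosen to model the unsynchronized threads of Figure 2.15(a), not part of [225].

```python
# Necessary-condition screen for a minimal mixed cycle through the
# program edge (A, B), with candidate conflict partners C and D.
conflicts = {frozenset(("X=1", "w=X")), frozenset(("Y=1", "z=Y"))}
thread_of = {"X=1": 1, "Y=1": 1, "z=Y": 2, "w=X": 2}
order = {"X=1": 0, "Y=1": 1, "z=Y": 0, "w=X": 1}   # position within thread

def same_thread(a, b):
    return thread_of[a] == thread_of[b]

def may_precede(a, b):
    # Unsynchronized accesses in different threads may occur in either
    # order; within a thread, program order is fixed.
    return thread_of[a] != thread_of[b] or order[a] < order[b]

def may_be_on_minimal_cycle(A, B, C, D):
    return (frozenset((A, C)) in conflicts        # 1. conflict between A and C
            and frozenset((B, D)) in conflicts    # 2. conflict between B and D
            and not same_thread(A, C)             # 3. A, C in different threads
            and not same_thread(B, D)             # 4. B, D in different threads
            and may_precede(C, A)                 # 5. A may occur after C ...
            and may_precede(D, A)                 #    ... and after D
            and may_precede(B, C)                 # 6. B may occur before C ...
            and may_precede(B, D)                 #    ... and before D
            and may_precede(D, C))                # 7. D may occur before C

print(may_be_on_minimal_cycle("X=1", "Y=1", "w=X", "z=Y"))   # True
```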

In most languages with support for parallel and concurrent programming, different threads (as opposed to different instances of a single static thread) are activated by executing different functions or methods. This means that the analysis to determine dependences in explicitly parallel programs is inherently inter-procedural. The language with the best-defined consistency model3 is currently Java, and its dynamic compilation model precludes, for performance reasons, analyses that require inter-procedural information. As a consequence of this, and because of the lack (or only recent definition) of well-defined memory models in other languages, these techniques have, to our knowledge, only been implemented in research compilers, with one exception. That exception is Unified Parallel C (UPC) [237], which used the analysis of [128] to optionally support a sequentially consistent execution model.

1This statement may not be true for multi-threaded programs, but the current discussion concerns single-threaded programs. The multi-threaded case will be discussed in Section 2.6.

2In the paper they are called X and Y, but to avoid confusion with the variables in the example we have renamed them.

3Although with recent work on the C, C++, and OpenMP consistency models, it may not be unique at the time this lecture is being read.
