Code optimization is an important phase that improves the time and space requirements of the generated target code. Given code in intermediate form, it applies optimizing transformations and returns smaller, faster code, still in intermediate form.
Code optimization is an optional phase of a compiler, applied either before or after the code generation phase. This chapter focuses on the types of optimizers and the techniques available for optimization. It also covers global flow analysis, a technique for gathering information about how data and control flow through a program, which is needed to apply global optimization.
A programmer can sometimes outperform an optimizing compiler by choosing a better algorithm and better data structures, but this is rarely practical for large programs; hence the need for an optimizing compiler. Optimization is the transformation of code into more efficient code, where efficiency refers to the space the code requires and the time it takes to execute, without changing the meaning of the given code. Optimization is considered an important phase for the following practical reasons:
The following constraints are to be considered while applying the techniques for code improvement:
The optimization can be classified depending on
Optimization is the field where most compiler research is done today. High-quality optimization is more of an art than a science. Compilers for mature languages are judged based upon the quality of the object code generated and not based on how well they parse or analyze the code.
Example 1:
Consider the following example, which sets every element of an array to 1.
a) int array_ele[10000];
   void Binky() {
       int i;
       for (i = 0; i < 10000; i++)
           array_ele[i] = 1;
   }

b) int array_ele[10000];
   void Winky() {
       register int *p;
       for (p = array_ele; p < array_ele + 10000; p++)
           *p = 1;
   }
Comparing the two versions, one might expect the second to be faster than the first. That may be true with a non-optimizing compiler, but many modern compilers emit the same object code for both by using techniques such as loop-induction-variable elimination.
Optimization techniques can be applied to intermediate code or to the final target code. Optimization is a complex and time-consuming task that involves multiple subphases, sometimes applied more than once. Most compilers allow optimization to be turned off to speed up compilation; in gcc, for example, specific flags turn individual optimizations on or off. To apply optimization, it is important to perform control flow analysis and data flow analysis followed by transformations, as shown in Figure 10.1.
Control Flow Analysis: It determines the control structure of a program and builds a control flow graph.
Data Flow Analysis: It determines the flow of scalar values and builds data flow graphs. The solution to flow analysis propagates data flow information along a flow graph.
Transformation: Transformations help in improving the code without changing the meaning or functionality.
Figure 10.1 Code Optimization Model
Flow Graph
A graphical representation of three address code is called a flow graph. Each node in the flow graph represents a single basic block, and the edges represent the flow of control. Flow graphs are useful in performing control flow and data flow analysis, in applying local or global optimization, and in code generation.
Basic Block
A basic block is a sequence of consecutive statements that are executed sequentially. Once control enters the block, every statement in it is executed one after the other before control leaves the block.
Example 2: For the statement a = b + c * d/e the corresponding set of three address code is
t1 = c * d
t2 = t1 / e
t3 = b + t2
a = t3
All these statements correspond to a single basic block.
Given three address code, first identify the leader statements, then group each leader with the statements up to (but not including) the next leader. To identify leaders, use the following rules:
1. The first statement is a leader.
2. Any statement that is the target of a conditional or unconditional jump is a leader.
3. Any statement that immediately follows a conditional or unconditional jump is a leader.
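The leader-based partitioning can be sketched in Python; the instruction encoding below (dicts with an "op" and a 0-based "target") is an illustrative assumption, not from the text:

```python
def partition_blocks(code):
    """Partition three-address instructions into basic blocks using the
    leader rules: (1) the first statement is a leader; (2) the target of
    any jump is a leader; (3) the statement just after a jump is a leader.
    """
    leaders = {0}                                   # rule 1
    for i, instr in enumerate(code):
        if instr["op"] in ("goto", "if"):           # any jump
            leaders.add(instr["target"])            # rule 2
            if i + 1 < len(code):
                leaders.add(i + 1)                  # rule 3
    ordered = sorted(leaders)
    # group each leader with the statements up to the next leader
    return [code[a:b] for a, b in zip(ordered, ordered[1:] + [len(code)])]

# a loop with a conditional exit, in miniature
code = [
    {"op": "assign"},               # 0: i = 0
    {"op": "if", "target": 4},      # 1: if i > n - 1 goto 4
    {"op": "assign"},               # 2: a[i] = i * i
    {"op": "goto", "target": 1},    # 3: goto 1
    {"op": "return"},               # 4: return
]
blocks = partition_blocks(code)
print(len(blocks))   # 4 basic blocks
```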
Example 3: Identify the basic blocks for the following code fragment.
main()
{
    int i = 0, n = 10;
    int a[n];
    while (i <= (n - 1))
    {
        a[i] = i * i;
        i = i + 1;
    }
    return;
}
The three address code for the above main function is as follows:
Identifying leader statements in the above three address code
Statement (1) is a leader by rule 1
Statements (3) and (12) are leaders by rule 2
Statements (4) and (12) are leaders by rule 3
Basic block 1 includes statements (1) and (2)
Basic block 2 includes statements (3) and (4)
Basic block 3 includes statements (5)–(11)
Basic block 4 includes statement (12)
Basic blocks are shown in Figure 10.2.
A flow graph shows the relation between each basic block and its predecessor and successor blocks. The block containing the first statement is B1. An edge is placed from block B1 to block B2 if B2 can immediately follow B1 during execution, that is, if either of the following conditions holds:
1. There is a conditional or unconditional jump from the last statement of B1 to the first statement of B2.
2. B2 immediately follows B1 in the order of the program and B1 does not end in an unconditional jump.
Flow graph for Figure 10.2 is shown in Figure 10.3.
Figure 10.2 Basic Blocks
Figure 10.3 Flow Graph for Example 3
A DAG is a useful data structure for implementing transformations within a basic block. It gives a pictorial representation of how values computed by one statement are used in computing the values of other variables, and it is useful in identifying common subexpressions within a basic block. The nodes of a DAG are labeled as follows:
1. Leaf nodes are labeled by identifiers or constants, representing the initial values of names.
2. Interior nodes are labeled by an operator symbol.
3. Each node may also carry a list of identifiers that hold the value computed at that node.
Note: The DAG is not the same as a flow graph. Each node in a flow graph is a basic block, which has a set of statements that can be represented using DAG.
To construct DAG, we process each statement of the block.
If the statement is a copy statement, that is, a statement of the form a = b, then we do not create a new node, but append label a to the node with label b.
If the statement is of the form a = b op c, we first check whether there is already a node computing b op c. If so, we append the label a to that node. Otherwise, we first locate or create nodes for b and c (leaf nodes, or interior nodes if the values were recently computed), then create a new node for op with the node for b as its left child and the node for c as its right child, and label the new node with a. This node now holds the current value of a for subsequent statements; any node previously labeled a is relabeled a0.
DAG creation would be easy if we maintain the information of the nodes created for all the identifiers and facility to create a linked list of attached identifiers for each node.
We also define a function that returns the most recently created node associated with an identifier.
Given a basic block we need to construct a DAG, which has the following information:
In the construction process, we process each statement and categorize it as one of the following cases:
i) x = y op z
ii) x = op y
iii) x = y
Initially we assume there are no nodes and that node(x) is undefined for all arguments, and we perform the following steps for each statement.
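A minimal Python sketch of this construction, assuming a tuple encoding of statements that is not from the text (('=', dst, src) for copies, (op, dst, left, right) otherwise):

```python
def build_dag(block):
    """DAG construction for one basic block. Returns (current, dag):
    current maps each identifier to the node holding its value; dag
    lists the nodes. Illustrative sketch only.
    """
    nodes = {}      # (op, left_node, right_node) -> node index
    current = {}    # identifier -> node index holding its current value
    dag = []        # node index -> ('leaf', name) or (op, left, right)

    def node_for(x):
        if x not in current:                 # make a leaf for an initial value
            dag.append(("leaf", x))
            current[x] = len(dag) - 1
        return current[x]

    for stmt in block:
        if stmt[0] == "=":                   # copy: just attach the new label
            _, dst, src = stmt
            current[dst] = node_for(src)
            continue
        op, dst, left, right = stmt
        key = (op, node_for(left), node_for(right))
        if key not in nodes:                 # reuse node: common subexpression
            dag.append(key)
            nodes[key] = len(dag) - 1
        current[dst] = nodes[key]
    return current, dag

# a = b + c * d / e from Example 2, plus a repeat of c * d
block = [("*", "t1", "c", "d"), ("/", "t2", "t1", "e"),
         ("+", "t3", "b", "t2"), ("=", "a", "t3"),
         ("*", "t4", "c", "d")]              # same expression as t1
current, dag = build_dag(block)
print(current["t4"] == current["t1"])   # True: the node is shared
```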
Example 4: Let us consider Figure 10.4 and construct DAG for each block step by step.
For block B1
For the statement, which satisfies case iii), we first create a leaf labeled 4 and attach the identifier i to it, as shown in Figure 10.5.
For block B2
For the first statement, which satisfies case i), we first create nodes for n and 1, then create a node for the operator (-) and label it t1. Figure 10.6(a) shows the DAG after this single statement in B2.
For the second statement, which is a conditional statement, we create a node for the operator > and label it (12), since when the condition is satisfied control should go to statement 12. Figure 10.6(b) shows the DAG for all statements in block B2.
Block B3
For the first statement, which satisfies case (i), we create a node for i and use it as both the left and the right child, then create a node for the operator (*) and label it t2, as shown in Figure 10.7(a).
Figure 10.4 Example 3 Program in Basic Blocks
Figure 10.5 DAG for Block B1
Figure 10.6(a) DAG for First Statement in Block B2
Figure 10.6(b) DAG for Block B2
Figure 10.7(a) DAG for One Statement Block B3
For the second statement, we first create a node for 4 and use node(i), then create a node for the operator (*) and label it t3, as in Figure 10.7(b).
For the third statement, we first create a node for a and use node(t3), then create a node for the operator ([]) and label it t4, as in Figure 10.7(c).
For the fourth statement, since it is a copy statement, we remove the label t4 from its node, marking that node t40, and attach the label t4 to the node labeled t2, as in Figure 10.7(d).
After constructing the DAG for the entire block, it would be as shown in Figure 10.7(e).
Figure 10.7(b) DAG for Two Statements in Block B3
Figure 10.7(c) DAG for Three Statements in Block B3
Figure 10.7(d) DAG for Four Statements in Block B3
Figure 10.7(e) DAG for Complete Block B3
A transformation of a program is called local if it is applied within a basic block and global if applied across basic blocks. There are different types of transformations to improve the code and these transformations depend on the kind of optimization required.
The following techniques are function-preserving transformations. Common subexpression elimination can be applied locally or globally; the other techniques are applied globally.
An expression E is a common subexpression if E was computed earlier and the variables in E have not been modified since that computation. When such an expression occurs, the recomputation can be avoided by reusing the result of the previous computation. The technique can be applied both locally and globally. We maintain a table of the expressions evaluated so far and use it to identify and eliminate recomputations. Within a basic block, an expression stays in the table until one of its operands is redefined; when an operand is redefined, the expression is removed from the table. The algorithmic approach is given below.
Function subexpr_elimination (Block B)
{
For each statement that evaluates an expression within basic block
{
Maintain in the table the set of expressions evaluated so far
If any operand in an expression is found to be redefined, then remove that expression from the table
Modify the instructions accordingly as you go
Generate a temporary variable to hold the expression, and use that variable the next time the expression is encountered
}
}
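The table-driven approach above can be sketched in Python; the statement encoding (tuples of dst, op, arg1, arg2) is a hypothetical form chosen for illustration:

```python
def local_cse(block):
    """Local common-subexpression elimination, following the table
    scheme in the text: record each evaluated expression, reuse it
    when seen again, and invalidate entries whose operands change.
    """
    table = {}   # (op, arg1, arg2) -> variable already holding the value
    out = []
    for dst, op, a, b in block:
        key = (op, a, b)
        if key in table:
            out.append((dst, "=", table[key], None))   # reuse earlier result
            new_expr = None
        else:
            out.append((dst, op, a, b))
            new_expr = key
        # dst is redefined: remove every table entry that involves dst
        table = {k: v for k, v in table.items()
                 if v != dst and dst not in (k[1], k[2])}
        if new_expr is not None and dst not in (a, b):
            table[new_expr] = dst
    return out

block = [("x", "+", "b", "c"),
         ("y", "+", "b", "c"),        # same as x: becomes y = x
         ("b", "load", "7", None),    # b redefined: b + c leaves the table
         ("z", "+", "b", "c")]        # must be recomputed
optimized = local_cse(block)
print(optimized[1], optimized[3])
```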
Figure 10.8 is an example that shows the optimized code on applying this technique locally and globally.
Global common subexpression elimination uses the extra information collected by flow graph analysis. The evaluated-expression information must be maintained at every block entry and exit so that available expressions can be identified.
Example 5: The following example in Figure 10.9 shows global common subexpression elimination: each evaluated expression is replaced with a new temporary variable (t1 and t2), and these variables are used wherever the same expression occurs. This generates copy statements. If these copy statements are unnecessary, they are eliminated by the technique explained in the next section.
Example 6: The quicksort program is given below with its corresponding three address code represented as a flow graph.
void quicksort(int LI, int HI)
{
    int i, j;
    int pivot, z;
    if (HI <= LI) return;
    i = LI - 1;
    j = HI;
    pivot = a[HI];
    while (1)
    {
        do i = i + 1; while (a[i] < pivot);
        do j = j - 1; while (a[j] > pivot);
        if (i >= j) break;
        z = a[i];
        a[i] = a[j];
        a[j] = z;
    }
    z = a[i];
    a[i] = a[HI];
    a[HI] = z;
    quicksort(LI, j);
    quicksort(j + 1, HI);
}
Figure 10.8 Program Code Before and After Common Sub Expression Elimination
Figure 10.9 Example for Global Common Sub Expression Elimination
Its corresponding three address code is given below and corresponding flow graph is shown in Figure 10.10.
1. i := LI - 1
2. j := HI
3. temp1 := 4 * HI
4. pivot := a[temp1]
5. i := i + 1
6. temp2 := 4 * i
7. temp3 := a[temp2]
8. if temp3 < pivot go to 5
9. j := j - 1
10. temp4 := 4 * j
11. temp5 := a[temp4]
12. if temp5 > pivot go to 9
13. if i >= j go to 23
14. temp6 := 4 * i
15. z := a[temp6]
16. temp7 := 4 * i
17. temp8 := 4 * j
18. temp9 := a[temp8]
19. a[temp7] := temp9
20. temp10 := 4 * j
21. a[temp10] := z
22. go to 5
23. temp11 := 4 * i
24. z := a[temp11]
25. temp12 := 4 * i
26. temp13 := 4 * HI
27. temp14 := a[temp13]
28. a[temp12] := temp14
29. temp15 := 4 * HI
30. a[temp15] := z
On applying local common subexpression elimination, blocks B5 and B6 get improved code, since the expression 4 * i is computed more than once without any change in value. The result of optimizing locally is shown in Figure 10.11.
On applying this technique globally, the code is further improved; the resultant code is shown in Figure 10.12. Clearly the computation of a[i] in block B2 can be reused in B5 and B6; similarly, a[j] computed in B3 can be reused in B5 and B6.
A copy statement is a statement of the form a = b. Copy propagation replaces later uses of variable a with uses of b, provided the value of a does not change after the assignment. The technique can be applied locally and globally. To implement it, we maintain a separate table, called the copy table, which holds all active copy statements. While traversing the basic block, whenever a copy statement is found, it is entered into the copy table. While processing any statement, we consult the copy table to see whether a use of a can be replaced by b. If a statement assigns a new value to a, the corresponding copy statement is removed from the copy table. Copy propagation helps in identifying dead code.
Figure 10.10 Flow Graph for Quick Sort Program
Let us assume there is a sequence of statements s1, s2, ..., sn, where each statement is either a copy statement, a statement that uses the right-hand side of a copy statement, or a statement that computes a value for a variable.
Figure 10.11 After Applying Common Sub Expression Elimination for Figure 10.10
Algorithm for copy propagation
Let INSERT(a, b) be a function that inserts the pair (a, b) into the copy table for a copy statement a = b, indicating that a can be replaced by b.
Let GET(a, table) be a function that looks up variable a in the table; if a pair (a, b) is found, it returns b, otherwise it returns a.
Figure 10.12 After Applying Global Common Sub Expression Elimination for Figure 10.11
GET (a, table)
{
if you find (a, b) in table
return b
else return a
}
{
    for each statement from s1 to sn
        if statement is of the form "a = b op c"
            b = GET(b, table)
            c = GET(c, table)
        else if statement is of the form "a = x"
            x = GET(x, table)
        if statement has a left hand side a
            REMOVE from table all pairs involving a
        if statement is of the form "a = x"
            INSERT(a, GET(x, table))
    endfor
}
Example 7: Let the basic block contain the following set of statements:
Y = X
Z = Y + 1
W = Y
Y = W + Z
Y = W
Statement 1 is a copy statement, so its information is added to the copy table. Statement 2 uses the value of X indirectly through Y; hence GET(Y, table) returns X and the statement is modified by replacing Y with X. The third statement is a copy statement, and since Y is replaced by X, we insert into the copy table that W can be replaced by X. When statement 4 is processed, Y is redefined; hence the pair for Y is removed from the copy table. The fifth statement is a copy statement and is inserted into the copy table.
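This table-driven processing can be sketched in Python and checked against the five statements above; the tuple encoding of statements (dst, op, a, b, with op '=' for copies) is an assumption for illustration:

```python
def copy_propagate(block):
    """Copy propagation over one basic block, following the GET/INSERT
    scheme in the text: replace uses via the copy table, invalidate
    pairs when a variable is redefined, and record new copies.
    """
    table = {}                           # dst -> variable that replaces it

    def get(v):
        return table.get(v, v)           # GET: return replacement if any

    out = []
    for dst, op, a, b in block:
        a, b = get(a), get(b)            # replace uses via the copy table
        # dst gets a new value: remove all pairs involving dst
        table = {k: v for k, v in table.items() if dst not in (k, v)}
        if op == "=":
            table[dst] = a               # INSERT the copy dst = a
        out.append((dst, op, a, b))
    return out

# the five statements of the worked example
block = [("Y", "=", "X", None),
         ("Z", "+", "Y", "1"),
         ("W", "=", "Y", None),
         ("Y", "+", "W", "Z"),
         ("Y", "=", "W", None)]
result = copy_propagate(block)
print(result[1], result[3])
```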
To apply the technique globally, we perform flow analysis. Given a copy statement X = Y and a statement W = X in a later block, we can replace W = X with W = Y only if the following conditions are met:
1. The copy statement X = Y is the only definition of X that reaches the statement W = X.
2. On every path from X = Y to the use of X, there is no intervening assignment to Y.
Example 8: The following example in Figure 10.13 shows the copy propagation technique applied globally across different blocks. After copy propagation, we can see that the copy statements d = c and g = e are dead code, since the values assigned to d and g are not used; hence they can be eliminated.
Example 9: For the quicksort code in Figure 10.12, obtained after common subexpression elimination, copy propagation is applied; the resultant code is shown in Figure 10.14.
Code that is unreachable or that does not affect the program is dead code. Such code wastes CPU time and can be identified and eliminated with this technique. The following program gives an example in a high-level language: the value assigned to i in statement 5 is never used, the value assigned to var1 in statement 6 is a dead store, and statement 9 is unreachable. These statements can be eliminated, as they do not affect the program's execution.
Figure 10.13 Example for Global Copy Propagation Technique
Figure 10.14 Quick Sort Program After Applying Copy Propagation
1. int var1;
2. void sample()
3. {
4. int i;
5. i = 1; /* dead store */
6. var1 = 1; /* dead store */
7. var1 = 2;
8. return;
9. var1 = 3; /* unreachable */
10. }
Below is the code fragment after dead code elimination.
1. int var1;
2. void sample()
3. {
4. var1 = 2;
5. return;
6. }
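One way to sketch dead-store elimination is a backward scan with a set of live variables, assuming straight-line code with no side effects; the (dst, uses) statement encoding is hypothetical:

```python
def dead_store_elimination(block, live_out):
    """Backward scan over straight-line code removing assignments whose
    values are never used. Each statement is (dst, uses); live_out is
    the set of variables live on exit from the block.
    """
    live = set(live_out)
    kept = []
    for dst, uses in reversed(block):
        if dst in live:                  # the value is needed later: keep it
            kept.append((dst, uses))
            live.discard(dst)            # this statement defines dst
            live |= set(uses)            # and makes its operands live
        # otherwise: a dead store, silently dropped
    return kept[::-1]

# mirrors sample(): i = 1 and var1 = 1 are dead stores, var1 = 2 survives
block = [("i", []),       # 5. i = 1
         ("var1", []),    # 6. var1 = 1
         ("var1", [])]    # 7. var1 = 2
print(dead_store_elimination(block, live_out={"var1"}))   # [('var1', [])]
```

Unreachable code (such as statement 9 above) needs a separate check on the control flow graph; this sketch handles only dead stores within one block.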
Example 10: In blocks B5 and B6 of the flow graph for quicksort there is a dead store to variable z, which can be eliminated by dead code elimination. The resultant code is shown in Figure 10.15.
Figure 10.15 Quick Sort Program After Dead Code Elimination
Constant propagation replaces uses of a variable with the constant value assigned to it. For example, given an assignment x = c, where c is a constant, later uses of x are replaced with c, provided there are no intervening assignments to x. The approach is similar to copy propagation and is applied in the first stage of optimization. By propagating constant values into conditional statements, it can also determine whether a branch will ever be executed, that is, it identifies dead code.
Example 11: Let us consider the following example:
In this short example, we can notice some simple constant propagation results, which are as follows:
Note: In common sub expression elimination or constant propagation, often it may require to rearrange the expressions by using the associative properties of the algebraic expression.
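A minimal sketch of constant propagation with folding, under the assumption that statements are tuples (dst, op, a, b) and that literal constants are digit strings; neither encoding is from the text:

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def constant_propagate(block):
    """Forward constant propagation with folding in a basic block:
    remember variables with known constant values and evaluate
    operations whose operands are all known at compile time.
    """
    consts = {}                                # variable -> known constant
    out = []

    def value(x):
        if x is None:
            return None
        if x.lstrip("-").isdigit():
            return int(x)                      # literal constant
        return consts.get(x)                   # None if not a known constant

    for dst, op, a, b in block:
        va, vb = value(a), value(b)
        if op == "=" and va is not None:
            consts[dst] = va                   # propagate the constant
            out.append((dst, "=", str(va), None))
        elif op in OPS and va is not None and vb is not None:
            folded = OPS[op](va, vb)           # fold at compile time
            consts[dst] = folded
            out.append((dst, "=", str(folded), None))
        else:
            consts.pop(dst, None)              # dst is no longer a constant
            out.append((dst, op,
                        str(va) if va is not None else a,
                        str(vb) if vb is not None else b))
    return out

block = [("x", "=", "5", None),
         ("y", "+", "x", "3"),    # folds to y = 8
         ("z", "*", "y", "w")]    # w unknown, but y's value propagates
folded = constant_propagate(block)
print(folded[1:])
```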
In most programs, about 90% of execution time is spent inside loops; hence loop optimization is the most valuable machine-independent optimization, and loops are the best candidates for improvement. Loops are written for the programmer's convenience, and the code inside them has to be transformed using loop transformation techniques. Let us consider the example of a program that has a loop statement.
for (int i=0; i<1000; i++)
x = x + y/z;
print x;
The optimized code of this program is represented in flow graph as shown in Figure 10.16.
Figure 10.16 Flow Graph After Code Optimization
It is clear that for this code, the total number of statements executed is as follows:

Statement Number | Frequency of execution | Total number of statements
---|---|---
1 | 1 | 1
2 – 5 | 1000 | 4000
6 | 1 | 1
On the whole, 4002 statements are executed. Instead, if the code is written as follows, only 1002 statements are executed.
….
1001. x = x + t1
1002. print x
The later form executes roughly a quarter as many statements as the first, although its space requirement is slightly higher. Inefficient code is generated in loops for various reasons like
The important loop optimizations are elimination of loop invariant computations, strength reduction, code motion and elimination of induction variables.
A loop-invariant computation is one that evaluates to the same value every time the statements in the loop are executed. Moving such computational statements outside the loop reduces execution time.
Algorithm for elimination of loop invariant code
Step 1: Identify loop-invariant code.
Step 2: An instruction is loop-invariant if, for each operand, one of the following holds:
the operand is a constant;
all definitions of the operand that reach the instruction lie outside the loop; or
the operand has exactly one reaching definition, and that definition is itself a loop-invariant instruction inside the loop.
Step 3: Move the instruction to just before the loop, provided it is the only statement in the loop that assigns its target x and its block dominates all loop exits (i.e., all uses of x within the loop can be reached only by this instruction).
Note: It is important that while applying loop-invariant code motion in nested loops, start at innermost loop and work outwards.
Example 12:
for (int i=0; i<1000; i++)
x = x + y/z;
print x;
In the example, the values of y and z remain unchanged, as there is no definition of these variables in the loop. Hence the computation of y/z is loop invariant; it can be moved above the loop, and the code can be rewritten as follows:
t1= y/z
for (int i=0; i<1000; i++)
x = x + t1;
print x;
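Detecting the invariant statements of such a loop can be sketched as an iterative marking pass; the (dst, operands) encoding and the simplified single-definition checks below are assumptions for illustration:

```python
def invariant_statements(loop):
    """Iteratively mark loop-invariant statements (steps 1-2 above).
    Each statement is (dst, operands). An operand passes if it is a
    constant, is never defined in the loop, or is defined only by a
    statement already marked invariant.
    """
    defs = {}                                   # var -> defining stmt indices
    for i, (dst, _) in enumerate(loop):
        defs.setdefault(dst, []).append(i)

    invariant = set()
    changed = True
    while changed:                              # repeat until no new marks
        changed = False
        for i, (dst, ops) in enumerate(loop):
            if i in invariant or len(defs[dst]) != 1:
                continue
            ok = all(o.isdigit()                          # constant
                     or o not in defs                     # defined outside only
                     or (len(defs[o]) == 1 and defs[o][0] in invariant)
                     for o in ops)
            if ok:
                invariant.add(i)
                changed = True
    return invariant

# the loop body of Example 12: t = y / z is invariant, the rest is not
loop = [("t", ["y", "z"]),    # y, z are never assigned in the loop
        ("x", ["x", "t"]),    # x = x + t changes every iteration
        ("i", ["i", "1"])]    # loop counter
print(invariant_statements(loop))   # {0}
```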
Induction variables are variables used in a loop whose values change in lock-step; hence it may be possible to eliminate all except one of them. There are two types of induction variables, basic and derived. Basic induction variables are those with assignments of the form
I = I + C or I = I - C
where I is the loop variable and C is some constant.
Derived induction variables are those that are defined only once in the loop and their value is a linear function of the basic induction variable.
For example,
J = A * I + B
Here J is a variable that depends on the basic induction variable I and the constants A and B. This is represented as the triplet (I, A, B).
Induction variable elimination involves three steps: detecting the basic induction variables, finding the derived induction variables in the family of each basic one, and performing strength reduction followed by elimination of the redundant induction variables.
Given a loop L with reaching-definition information and the families of induction variables, this algorithm outputs the set of induction variables. Each induction variable J belonging to the family of I has an associated triplet (I, A, B), where I is the basic induction variable and A and B are constants such that the value of J is computed as A * I + B.
Algorithm for detecting the induction variable
For all basic blocks, examine each assignment to a variable J that uses a basic induction variable I:
J = I + B ⇒ the triplet is (I, 1, B)
J = I - B ⇒ the triplet is (I, 1, -B)
J = A * I ⇒ the triplet is (I, A, 0)
J = I / A ⇒ the triplet is (I, 1/A, 0)
Suppose an induction variable J = C * K, where the triplet for K is (I, A, B); then the triplet for J is (I, A * C, B * C). Once the family of induction variables is found, we modify the instructions computing them to use additions or subtractions rather than multiplications. This is done by the strength reduction process, shown in the next algorithm.
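The triplet rules can be sketched in Python; the division rule is omitted and the statement encoding (dst, op, x, y) is assumed for illustration:

```python
def induction_triplets(basic, stmts):
    """Compute (I, A, B) triplets, meaning J = A * I + B, using the
    detection rules above. `basic` is the set of basic induction
    variables found beforehand.
    """
    trip = {i: (i, 1, 0) for i in basic}       # I itself is (I, 1, 0)
    for dst, op, x, y in stmts:
        if op in ("+", "*") and y in trip:     # put the variable operand first
            x, y = y, x
        if x not in trip or y is None or not y.lstrip("-").isdigit():
            continue
        i, a, b = trip[x]
        c = int(y)
        if op == "+":
            trip[dst] = (i, a, b + c)          # J = K + C
        elif op == "-":
            trip[dst] = (i, a, b - c)          # J = K - C
        elif op == "*":
            trip[dst] = (i, a * c, b * c)      # J = C * K
    return trip

# block B2 of quicksort: i is basic; temp2 = 4 * i gives triplet (i, 4, 0)
triplets = induction_triplets({"i"}, [("temp2", "*", "4", "i")])
print(triplets["temp2"])   # ('i', 4, 0)
```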
Example 13: Let us consider block B2 of the optimized quicksort code, as in Figure 10.17. In this block, variable i is a basic induction variable, as the only assignment to i in the loop increments it by 1. temp2 is in the family of i, with triplet (i, 4, 0). Similarly, j is the only basic induction variable in block B3, with temp4 in its family with triplet (j, 4, 0).
5. i := i + 1
6. temp2 := 4 * i
7. temp3 := a[temp2]
8. if temp3 < pivot go to B2
Figure 10.17 Block B2 from Quick Sort Program
For a loop L with reaching-definition information and families of induction variables, we revise the loop so that high-strength operations are replaced by low-strength operations.
Algorithm
Consider each basic induction variable I and each variable J in the family of I with triplet (I, A, B):
1. Create a new variable S and initialize it with S = A * I + B just before the loop.
2. After each assignment I = I + C inside the loop, insert the statement S = S + A * C.
3. Replace the computation of J by J = S.
Example 14: By the previous algorithm, in block B3 the basic induction variable is j, and temp4 is in the family of j with triplet (j, 4, 0). Applying the above algorithm, the statement temp4 = 4 * j is replaced by temp4 = s4, and the assignment s4 = s4 - 4 is inserted after j = j - 1, where -4 is obtained by multiplying the -1 in the assignment to j by the 4 in the triplet (j, 4, 0) for temp4. The initialization of s4 is placed at the end of block B1, which contains the definition of j. After strength reduction, we find that the only use of some induction variables is in tests; a test of such an induction variable can then be replaced by a test of another, as shown in Figure 10.18.
An induction variable can be eliminated if reaching-definition information, loop-invariant computation information, and live-variable information are available.
Figure 10.18 Quick Sort Program After Applying Strength Reduction
For example, let J be in the family of I with triplet (I, A, B), where A is positive, so that J = A * I + B. Let there be a test of the form
if I relop X goto B
where X is not an induction variable. This statement can be replaced as follows:
C = A * X + B
if J relop C goto B
where C is a new temporary variable. This modification makes the code independent of the induction variable I, since comparing I with X is the same as comparing J with A * X + B.
If the test involves two induction variables I1 and I2, say
if I1 relop I2 goto B,
then we look for variables J1 and J2 in the families of I1 and I2 with triplets (I1, A1, B1) and (I2, A2, B2) respectively. If A1 = A2 and B1 = B2, the statement can be replaced by
if J1 relop J2 goto B.
Once these tests are replaced, the induction variables that are no longer used can be deleted.
On applying induction variable elimination to the quick sort program, the resultant code is shown in Figure 10.19.
To apply optimization globally and obtain well-optimized code, it is necessary to collect information about variables and expressions across the program as a whole and distribute this information to each block. Data flow information is collected by setting up equations and solving them at various points. These equations are framed in terms of the four sets listed below.
IN(S) | the set of definitions that reach the statement S
OUT(S) | the set of definitions that leave the statement S
GEN(S) | the set of definitions generated at the statement S
KILL(S) | the set of definitions killed at the statement S
A typical equation has a form
OUT(S) = GEN(S) ∪ (IN(S) – KILL(S))
This says that the information at the end of a statement either is generated within the statement, or enters at the beginning of the statement and is not killed as control flows through it.
The factors that influence the framing of the equations and their solution depend on
Before seeing how to frame and solve the equations, let us understand reaching definitions, use-definition chains, definition-use chains, and live variable analysis.
Figure 10.19 Code After Induction Variable Elimination
A point lies between two adjacent statements, as well as before the first and after the last statement of a block. If a block has four statements, there are five points in the block, as shown in Figure 10.20.
A path is a sequence of points p1, p2, ..., pn that gives a global view of how control flows; for each i between 1 and n - 1, either
1. pi is the point immediately before a statement and pi+1 is the point immediately after that statement in the same block, or
2. pi is the end of some block and pi+1 is the beginning of a successor block.
Figure 10.20 Basic Block indicating Different Points
A definition of a variable a is a statement that assigns, or may assign, a value to a. The common ways of assigning a value to a are an input statement or an assignment statement; these forms are unambiguous. Ambiguous definitions include:
1. A call to a procedure that could modify a, for example if a is passed by reference or is accessible as a nonlocal.
2. An assignment through a pointer that may refer to a.
A definition d reaches a point p if and only if there is a path from the point immediately following d to p such that d is not killed along that path.
Use-definition chains store reaching-definition information: for each use of a variable, the chain lists all the definitions that reach that use.
Some code optimization techniques depend on information computed in the direction opposite to the flow of control. For example, in dead code elimination we may eliminate statements that assign values to variables that are never used; this requires analysis in the opposite direction, identifying which definitions are used at any statement. In live variable analysis, we wish to know, for a variable x and a point p, whether the value of x at p could be used along some path in the flow graph starting at p. If so, x is live at p; otherwise x is dead at p.
A variable is used at a statement s if the r-value of the variable is required in the computation at s. The du-chain for a point p is the set of uses s of a variable x such that there is a path from p to s that does not redefine x.
A program includes different control flow constructs, such as if statements and iterative statements, which control how the data flow equations are framed. This section gives different ways of framing equations that are structure dependent.
Figure 10.21 gives the list of rules for framing equations where Da indicates the set of all definitions corresponding to variable a.
The sets of definitions for GEN and KILL can be represented using a bit representation, which is compact and easy to compute. We assign a number to each definition and use a bit vector with one bit per definition. Union is then computed with a logical OR, intersection with a logical AND, and the difference A – B as A AND (NOT B).
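These bit-vector operations can be verified directly in Python, reusing the GEN and KILL values of d1 and d2 from Example 15 and the sequencing rule applied in the derivation that follows:

```python
# Definitions d1..d8 occupy bits left to right in an 8-bit vector.
GEN_d1, KILL_d1 = 0b10000000, 0b00010010   # d1 defines I, killing d4 and d7
GEN_d2, KILL_d2 = 0b01000000, 0b00001000   # d2 defines J, killing d5

MASK = 0xFF                                # keep results to 8 bits

def union(a, b):      return a | b               # set union -> logical OR
def intersect(a, b):  return a & b               # intersection -> logical AND
def difference(a, b): return a & (~b & MASK)     # A - B -> A AND (NOT B)

# sequencing rules for S = S1 ; S2, applied to d1 followed by d2
GEN_S  = union(GEN_d2,  difference(GEN_d1,  KILL_d2))
KILL_S = union(KILL_d2, difference(KILL_d1, GEN_d2))
print(f"{GEN_S:08b} {KILL_S:08b}")   # 11000000 00011010
```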
Figure 10.21 Rules for Framing Equations
Example 15: Consider the program that has eight definitions as shown below
d1 I = n - 1
d2 J = m
d3 a = x1
do
d4 I = I + 1
d5 J = J - 1
if E1 then
d6 a = x2
else
d7 I = x2
d8 a = x3
while E2
In the above partial program there are eight definitions, d1 through d8, so eight bits are required in the bit representation. At any given point, KILL, GEN, IN, and OUT are each represented as 8 bits; the bits that are set to 1 indicate the definitions that are killed, generated, reaching, or leaving that point, respectively. The sets KILL and GEN are computed by applying the data flow equations to the statements. The statements can be represented as a syntax tree using the following rules.
S → id := E | S ; S | if E then S else S
  | do S while E | if E then S
E → id + id | id – id | id * id | id / id | id
We can represent the entire program in the form of syntax tree using the rules. For each node in the parse tree, we can compute the values for GEN and KILL as explained below.
GEN(d1) = 1000 0000 // only one definition is generated here
KILL(d1) = 0001 0010 // d1 defines I, so it kills the definitions of I in d4 and d7
Similarly, the calculations are done for all definitions d2, d3, ..., d8. At the point just above d1 and d2, we compute GEN and KILL by applying the rule for statements in sequence:
GEN[S] = GEN[S2] ∪ (GEN[S1] – KILL[S2])
⇒ 0100 0000 ∪ (1000 0000– 0000 1000)
⇒ 0100 0000 ∪ (1000 0000 ∩ 1111 0111)
⇒ 0100 0000 ∪ (1000 0000)
⇒ 1100 0000
KILL[S] = KILL[S2] ∪ (KILL[S1] – GEN[S2])
⇒ 0000 1000 ∪ (0001 0010 – 0100 0000)
⇒ 0000 1000 ∪ (0001 0010 ∩ 1011 1111)
⇒ 0000 1000 ∪ (0001 0010)
⇒ 0001 1010
Similarly, calculations are done for each node and the values are shown in the annotated parse tree of Figure 10.22.
Once KILL and GEN for each block are computed, we can compute IN and OUT for each block using the formulas
IN[B] = ∪ OUT[P], taken over all predecessors P of B
OUT[B] = GEN[B] ∪ (IN[B] – KILL[B])
We use an iterative approach, starting with the estimate IN[B] = Ø for all blocks B and converging to the desired values of IN and OUT. A sketch of the algorithm is given below.
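The iterative solution can be sketched in Python with 8-bit vectors; the block names, GEN/KILL values, and predecessor lists below follow Example 16:

```python
def reaching_definitions(gen, kill, preds):
    """Iterative reaching-definitions solver over 8-bit vectors.
    Starts from IN[B] = 0, OUT[B] = GEN[B] and repeats until no OUT
    changes, mirroring the two passes of Example 16.
    """
    in_b = {b: 0 for b in gen}
    out_b = dict(gen)                            # initial estimate
    changed = True
    while changed:
        changed = False
        for b in gen:
            in_b[b] = 0
            for p in preds.get(b, []):           # IN[B] = union of OUT[preds]
                in_b[b] |= out_b[p]
            new = gen[b] | (in_b[b] & ~kill[b] & 0xFF)  # OUT = GEN u (IN - KILL)
            if new != out_b[b]:
                out_b[b], changed = new, True
    return in_b, out_b

# GEN/KILL from Figure 10.23 and the flow graph's predecessor lists
gen   = {"B1": 0b11100000, "B2": 0b00011000, "B3": 0b00000100, "B4": 0b00000011}
kill  = {"B1": 0b00011111, "B2": 0b11000010, "B3": 0b00100001, "B4": 0b00100001}
preds = {"B2": ["B1", "B3", "B4"], "B3": ["B2"], "B4": ["B2"]}
in_b, out_b = reaching_definitions(gen, kill, preds)
print(f"{out_b['B2']:08b}")   # 00111101, matching pass 2
```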
Figure 10.22 Annotated Parse Tree with GEN and KILL Values
Example 16: When the above algorithm is applied to the flow graph of Example 15, we start from the GEN and KILL values for each block shown in Figure 10.23.
According to the algorithm, we first initialize IN[B] to all zeros and use the GEN information to find the initial OUT values for each block.
PASS 1:
IN[B1] remains the same as there is no predecessor block.
OUT[B1] = GEN[B1] ∪ (IN[B1] – KILL[B1])
= 1110 0000 ∪ (0000 0000 – 0001 1111)
= 1110 0000 ∪ (0000 0000 ∩ 1110 0000)
Figure 10.23 GEN and KILL Values for Each Block
= 1110 0000
IN[B2] = OUT[B1] ∪ OUT[B3] ∪ OUT[B4]
= 1110 0000 ∪ 0000 0100 ∪ 0000 0011
= 1110 0111
OUT[B2] = GEN[B2] ∪ (IN[B2] – KILL[B2])
= 0001 1000 ∪ (1110 0111 – 1100 0010)
= 0001 1000 ∪ (1110 0111 ∩ 0011 1101)
= 0001 1000 ∪ 0010 0101
= 0011 1101
IN[B3] = OUT[B2] = 0011 1101
OUT[B3] = GEN[B3] ∪ (IN[B3] – KILL[B3])
= 0000 0100 ∪ (0011 1101 – 0010 0001)
= 0000 0100 ∪ (0011 1101 ∩ 1101 1110)
= 0000 0100 ∪ 0001 1100
= 0001 1100
IN[B4] = OUT[B2] = 0011 1101
OUT[B4] = GEN[B4] ∪ (IN[B4] – KILL[B4])
= 0000 0011 ∪ (0011 1101 – 0010 0001)
= 0000 0011 ∪ (0011 1101 ∩ 1101 1110)
= 0000 0011 ∪ 0001 1100
= 0001 1111
PASS 2:
IN[B1] remains the same, as there is no predecessor block.
OUT[B1] = GEN[B1] ∪ (IN[B1] – KILL[B1])
= 1110 0000 ∪ (0000 0000 – 0001 1111)
= 1110 0000 ∪ (0000 0000 ∩ 1110 0000)
= 1110 0000 ∪ 0000 0000
= 1110 0000
IN[B2] = OUT[B1] ∪ OUT[B3] ∪ OUT[B4]
= 1110 0000 ∪ 0001 1100 ∪ 0001 1111
= 1111 1111
OUT[B2] = GEN[B2] ∪ (IN[B2] – KILL[B2])
= 0001 1000 ∪ (1111 1111 – 1100 0010)
= 0001 1000 ∪ (1111 1111 ∩ 0011 1101)
= 0001 1000 ∪ 0011 1101
= 0011 1101
IN[B3] = OUT[B2] = 0011 1101
OUT[B3] = GEN[B3] ∪ (IN[B3] – KILL[B3])
= 0000 0100 ∪ (0011 1101 – 0010 0001)
= 0000 0100 ∪ (0011 1101 ∩ 1101 1110)
= 0000 0100 ∪ 0001 1100
= 0001 1100
IN[B4] = OUT[B2] = 0011 1101
OUT[B4] = GEN[B4] ∪ (IN[B4] – KILL[B4])
= 0000 0011 ∪ (0011 1101 – 0010 0001)
= 0000 0011 ∪ (0011 1101 ∩ 1101 1110)
= 0000 0011 ∪ 0001 1100
= 0001 1111
Since the OUT of every block in pass 2 is the same as in pass 1, the computation has converged and can be stopped.
The same technique can also be applied to common subexpressions, and this information can then be used for global optimization.
This optimization is applied to target machine instructions. It includes register allocation, the use of addressing modes, and peephole optimization. Instructions involving register operands are faster and shorter (their instruction length is small); hence, if we make use of more registers during target code generation, more efficient code is generated. Register allocation and the use of addressing modes therefore also contribute to optimization. The most popular optimization applied to target machine code is peephole optimization.
Peephole Optimization
Code generation algorithms generally produce code statement by statement, and the result may contain redundant instructions and suboptimal constructs. The efficiency of such code can be improved by peephole optimization, a simple but effective optimization of the target code. The peephole is a small moving window over the target code; the code in the peephole need not be contiguous. It improves the performance of the target program by examining and transforming a short sequence of target instructions. A characteristic of peephole optimization is that each improvement applied may expose opportunities for additional improvements, so repeated passes over the target code may be needed to obtain the maximum benefit. It can also be applied directly after intermediate code generation.
In this section, we shall define the following examples of program transformations that are characteristic of peephole optimizations.
The code generation algorithm produces target code represented with one, two, or three operands per instruction. Let us assume instructions with two operands. The following is the assembly code for the statement x = y + z:
MOV y, R0
ADD z, R0
MOV R0, x
Instruction 1 moves the value of y to register R0; the second instruction adds the value of z to the register contents, leaving the result of the operation in the register; and the third instruction copies the register contents to the location x. At this point, the value of x is available both in the location x and in the register R0.
If the above algorithm is applied to the code a = b + c, d = a + e, then it generates the code given below:
1. MOV b, R0
2. ADD c, R0
3. MOV R0, a
4. MOV a, R0
5. ADD e, R0
6. MOV R0, d
Here instructions 3 and 4 form a redundant store/load pair: instruction 4 reloads the value that instruction 3 has just stored, so deleting it does not affect any value before or after its execution (and if a is not used later, instruction 3 can be deleted as well). Eliminating the redundant load gives the following code:
MOV b, R0
ADD c, R0
MOV R0, a
ADD e, R0
MOV R0, d
There are a few algebraic identities that occur frequently enough to be worth considering. Look at the following statements:
x := x + 0
x := x * 1
They do not alter the value of x. If we keep them as they are, the code generation algorithm applied later may produce several instructions that are of no use. Hence, such statements, whether in three-address code or in target code, can be removed.
Removal of unreachable code is another opportunity for peephole optimization. A statement immediately after an unconditional jump, or any statement that never gets a chance to be executed, can be identified and eliminated. Such code is called dead code.
For example, consider a fragment of high-level language code:
#define x 0
…….
if (x)
{
    /* print value */
    …
}
If this is translated to target code as
if x = 1 goto L1
goto L2
L1: print value
L2: ……..
Here, value will never be printed, since x is always 0. The code inside the body of "if (x)" is dead code; hence, it can be removed.
Sometimes when we apply code generation algorithms mechanically we may get jump on jumps as follows:
goto L1
L1: goto L2
…….
L2: goto L3
….
L3: if a < b goto L4
L4:
This can be optimized as
goto L3
……….
L3: if a < b goto L4
L4:
This optimization mainly deals with replacing expensive operations by cheaper ones. For example, x² can be computed as x * x instead of calling an exponentiation routine, and x * 2 can be replaced by x + x or by a left shift.
While generating target code, it is better to make use of the rich instruction set supported by the target machine instead of blindly applying the available code-generation algorithms; this may produce more efficient code. Features provided by the machine architecture should be identified and used wherever applicable to reduce the overall execution time significantly.
For example, consider the statement x = x + 1. Applying the code-generation algorithm mentioned under redundant loads and stores produces a sequence of instructions, but many machines provide auto-increment and auto-decrement addressing modes, or a single increment instruction such as INR x, which does the same work in one instruction.
t1 = 4 * i
t2 = a[t1]
t3 = b[t1]
t4 = t2 * t3
pr = pr + t4
i = i + 1
if i <=20 goto (1)
Solution: The DAG for a given code is shown in Figure 10.24
Figure 10.24 DAG for 1
Solution: By using the algorithm for leaders, we identify leaders as 1, 3, 4, 8.
The statement starting from a leader up to the next leader is a basic block.
Figure 10.25 shows the basic blocks for the factorial program.
The flow graph is shown in Figure 10.26.
Figure 10.25 Basic Blocks for Factorial Program
Figure 10.26 Flow Graph for Factorial Program
for (i=1; i<n; i++)
for (j=1; j<n; j++)
c [i,j] = a[i,j] * b[i,j];
print(“done”);
Solution: a. The three-address code for the above program is as follows:
1. i = 1
2. j = 1
3. t1 = i - 1
4. t2 = t1 * n
5. t3 = t2 + j
6. t4 = t3 * 4
7. t5 = a[t4]
8. t6 = i - 1
9. t7 = t6 * n
10. t8 = t7 + j
11. t9 = t8 * 4
12. t10 = b[t9]
13. t11 = t5 * t10
14. t12 = i - 1
15. t13 = t12 * n
16. t14 = t13 + j
17. t15 = t14 * 4
18. t16 = t11
19. c[t15] = t16
20. t17 = j + 1
21. j = t17
22. if j < n go to 2
23. t18 = i + 1
24. i = t18
25. if i < n go to 1
26. print(“done”);
b. To identify the basic blocks, first identify the leader statements. In the above three-address code, the leader statements are statements 1, 2, 23, and 26.
1. i = 1 →leader 1
2. j = 1; →leader 2
3. t1 = i - 1
4. t2 = t1 * n
5. t3 = t2 + j
6. t4 = t3 * 4
7. t5 = a[t4]
8. t6 = i - 1
9. t7 = t6 * n
10. t8 = t7 + j
11. t9 = t8 * 4
12. t10 = b[t9]
13. t11 = t5 * t10
14. t12 = i - 1
15. t13 = t12 * n
16. t14 = t13 + j
17. t15 = t14 * 4
18. t16 = t11
19. c[t15] = t16
20. t17 = j + 1
21. j = t17
22. if j < n go to 2
23. t18 = i + 1 →leader 3
24. i = t18
25. if i <n go to 1
26. print(“done”); →leader 4
Basic block 1 has statement 1
Basic block 2 has statements 2 – 22
Basic block 3 has statements 23 – 25
Basic block 4 has statement 26.
The flow graph is shown in Figure 10.27.
Figure 10.27 Flow Graph for 3
Solution: The three-address code is given in the previous problem. On applying redundant-subexpression elimination, we get:
t1 = a + b;
t2 = a * b;
t3 = t1 * b;
c = t3 + t2;
Solution:
The DAG for the above sequence of statements is in Figure 10.28
Figure 10.28 DAG for 5
Solution: Procedure for constructing DAG is explained in Section 10.6. The DAG for the given expression is in Figure 10.29.
Solution:
Figure 10.29 DAG for 6
Figure 10.30 DAG for 7
1. __________ are the principal sources of optimization.
2. __________ analysis is used to detect loops in intermediate code.
3. Applying optimization guarantees that the generated code is optimal. (Y/N) __________
4. Replicating the body of a loop to reduce the number of tests, when the number of iterations is constant, is called __________.
5. Merging the bodies of two loops that have the same number of iterations and use the same indices is called __________.
6. Moving a computation from a high-frequency region to a low-frequency region is called __________.
7. __________ is code that is never used.
8. __________ are useful data structures for redundancy elimination.
9. Replacing a costlier operation by a cheaper one is called __________.
10. The usefulness of a variable is listed using __________.
11. The possible definitions that are reachable at a particular point are listed using __________.
12. In a path p1, p2, ….. pn, if pi is the last point in a block, then pi+1 is __________.
13. To apply global redundant-subexpression elimination, __________ gives the required information.
1. Write the importance of code optimization. Explain the strength-reduction technique in loop with example.
2. Explain with example the various techniques in loop optimization.
3. What is local and global optimization? Explain with example any three local optimization techniques.
4. Consider the following part of code.
int main()
{
    int n, i, k = 0;
    scanf("%d", &n);
    for (i = 2; i < n; i++)
    {
        if ((n % i) == 0) break;
    }
    k = 1;
    if (i == n)
        printf("number is prime");
    else
        printf("number is not prime");
}
5. Explain the following loop optimization techniques with examples.
6. Explain about (a) loop unrolling (b) loop jamming (c) code motion.
7. Explain about (a) folding (b) strength reduction (c) redundant code elimination (d) dead code elimination.
8. Write the procedure to construct the DAG for a statement. Represent the following statement using DAG.
9. Explain DAG and its use. Write the procedure to construct the DAG for a statement.
10. What is a basic block? With an example explain the procedure to identify the basic blocks in a given program. Draw the domination tree for the following procedure.
11.
t8 = d + e
t6 = a + b
t5 = t6 – c
t4 = t5 * t8
t3 = t4 – e
t2 = t6 + t4
t1 = t2 * t3
12.
read a,b
c=a+b
If (c>=30) go to 10
c=c+0
c=c*2
go to 20
10 c=c*3
20 print a,b,c
14. Define the following terms.
15. Explain the formulation of data flow equations for reaching definition in structured programs. Describe the procedure to compute in and out values.
16. Explain about global data flow analysis. Explain the factors that affect the data flow equations.
17. Write the iterative algorithm for reaching definition. Compute in and out for the following flow graph.
18. Explain about global data flow analysis. List the data flow equations for reaching definitions for structured programs.
19.
20.