Intermediate code generator makes target code generation easier. It takes the hierarchical representation of the source program as parse tree and prepares linear representation that is simple and easy to analyze.
The intermediate code is useful representation when compilers are designed as two pass system, i.e. as front end and back end. The source program is made source language independent by representing it in intermediate form, so that the back end is filtered from source language dependence. The intermediate code can be generated by modifying the syntax-directed translation rules to represent the program in intermediate form. This phase of intermediate code generation comes after semantic analysis and before code optimization as shown in Figure 8.1.
Benefits of intermediate code
Figure 8.1 Role of Intermediate Code Generator
Intermediate code can be represented in the following four ways.
A syntax tree is a graphical representation of the source program. Here the node represents an operator and children of the node represent operands. It is a hierarchical structure that can be constructed by syntax rules. The target code can be generated by traversing the tree in post order form. For instance, consider an assignment statement a = b* – (c – d) + b* – (c – d) when represented using the syntax tree it appears as follows shown in Figure 8.2.
The rules for constructing the syntax tree for assignment statements are produced by the syntax-directed definition as shown in Figure 8.3
The tree for the statement a = b* – (c – d) + b* – (c – d) is constructed by creating the nodes in the following order.
p1 = mkleaf(id, c)
p2 = mkleaf(id, d)
p3 = mknode('–', p1, p2)
p4 = mknode('U', p3, NULL)
p5 = mkleaf(id, b)
Figure 8.2 Syntax tree for a = b* – (c – d) + b* – (c – d)
Semantic Rule |
|
---|---|
S → id := E |
S.nptr := mknode( 'assign', mkleaf(id, id.place), E.nptr) |
E → E1 + E2 |
E.nptr := mknode('+', E1.nptr ,E2.nptr) |
E → E1 * E2 |
E.nptr := mknode('* ', E1.nptr ,E2.nptr) |
E → − E1 |
E.nptr := mkunode('uminus', E1.nptr) |
E → ( E1 ) |
E.nptr := E1.nptr |
E → id |
E.nptr := mkleaf(id, id.place) |
Figure 8.3 SDT for Creating the Syntax tree
p6 = mknode('*', p5, p4)
p7 = mkleaf(id, c)
p8 = mkleaf(id, d)
p9 = mknode('–', p7, p8)
p10 = mknode('U', p9, NULL)
p11 = mkleaf(id, b)
p12 = mknode('*', p11, p10)
p13 = mknode('+', p6, p12)
p14 = mkleaf(id, a)
p15 = mknode('=', p14, p13)
The tree that is constructed using the above procedures is shown in Figure 8.4.
Figure 8.4 Tree Construction for a = b* – (c – d) + b* – (c – d)
The tree that shows the same information with identified common sub-expression is called Directed Acyclic Graph (DAG). On examining the above example, it is observed that there are some nodes that are unnecessarily created. To avoid extra nodes these functions can be modified to check the existence of similar node before creating it. If a node exists then the pointer to it is returned instead of creating a new node. This creates a DAG, which reduces the space and time requirement. We can use the same SDT (Figure 8.3) to create a DAG without any modification but mknode() is redefined as above. The list of nodes created in DAG is as follows:
p1 = mkleaf(id, c)
p2 = mkleaf(id, d)
p3 = mknode('–', p1, p2)
p4 = mknode('U', p3, NULL)
p5 = mkleaf(id, b)
p6 = mknode('*', p5, p4)
p7 = mkleaf(id, c) = p1
p8 = mkleaf(id, d) = p2
p9 = mknode('–', p1, p2) = p3
p10 = mknode('U', p3, NULL) = p4
p11 = mkleaf(id, b) = p5
p12 = mknode('*', p5, p4) = p6
p13 = mknode('+', p6, p6)
p14 = mkleaf(id, a)
p15 = mknode('=', p14, p13)
The Figure 8.5 shows the constructed DAG for the given expression.
In this tree the number of nodes is reduced and also helps in identifying redundant sub expression. All the nodes in the syntax tree can be visited by following the pointers starting from the root and the instructions can be generated by traversing the tree in post order form.
Postfix notation is a linear representation of a syntax tree. This can be written by traversing the tree in the post order form. The edges in a syntax tree do not appear explicitly in postfix notation; only the nodes are listed. The order is followed by listing the parent node immediately after listing its left sub tree and its right sub tree. In postfix notation, the operators are placed after the operands. The postfix notation for the statement is as follows:
Figure 8.5 DAG Construction for a = b* – (c – d) + b* – (c – d)
Three address code is a linear representation of a syntax tree or a DAG in which explicit names correspond to the interior nodes of the graph. Three address code is a sequence of statements of the form A = B OP C where A, B and C are the names of variables, constants or the temporary variables generated by the compiler. OP is any arithmetic operation or logical operation applied on the operands B and C. The name reflects that there are at most three variables where two are operands and one is for the result. In three address statement, only one operator is permitted; if the expression is large, then break it into a sequence of sub expressions using the BODMAS rules of arithmetic and store the intermediate results in newly created temporary variables. For example, consider the expression a + b * c; this expression is expressed as follows:
T1 = b * c
T2 = a + T1
Here T1 and T2 are compiler-generated temporary names. This simple representation of a complex expression in three address code makes the task of optimizer and code generator simple. It is also easy to rearrange the sequence for efficient code generation. Three address code for the statement a = b* – (c – d) + b* – (c – d) is as follows:
T1 = c – d
T2 = –T1
T3 = b * T2
T4 = c – d
T5 = –T4
T6 = b * T5
T7 = T3 + T6
a = T7
The code can also be written for DAG as follows:
T1 = c – d
T2 = –T1
T3 = b * T2
T4 = T3 + T3
a = T4
Example 1: Consider the assignment a: = b* – c + b* – c. Draw the syntax tree and the DAG.
Solution: The tree and DAG for the expression a: = b* – c + b* – c is shown in Figure 8.6.
Figure 8.6 Syntax tree and DAG for a = b* – c + b* – c
For expressing the different programming constructs, the three address statements can be written in different standard formats and these formats are used based on the expression. Some of them are as follows:
param Al
param A2
param A3
.
.
param An
call fun, n
return B
For any intermediate code generator, the choice of allowable operator is an important issue as this will simplify the optimization and code generation task.
Three address codes can be represented in special structures known as quadruple, triple and indirect triple.
A quadruple is a record structure with four fields. The first field is to store the operator, the second and third fields are for the operands used in the operation, and the fourth field is for the result. For example, the three address statement A = B op C is written by placing op in the first field, that is, operator, B, and C are placed in the second and third fields, that is, operand 1 and operand 2 respectively, and A in fourth field, that is, result. If the operator is unary, then the third field is not used. In case of conditional and unconditional jump statements, the target label is placed in the result field. The quadruple for the statement a = b* – (c – d) + b* – (c – d) is shown below.
We use U for unary minus and – as it is for binary minus.
In quadruples there is an overhead for managing the temporary variables created. This problem can be reduced by referring the position of the statement that computes the value of the sub expression. In triples there are only three fields, one for the operation and the remaining two for operands that may have a variable or a constant or the statement position number that computes the value of the operand. Since there are only three fields, it is called "triples."
Parenthesized numbers represent the position number of the triple structure, while symbol-table pointers are represented by the names themselves. Since the operands represent two different entries, the fields can be encoded in a proper manner. For example, in case of copy statement a:= tl, where tl is computed at some position (i) then the triplet entries are "=" is placed in the first field, a in the second field and (i) in the third field. For a ternary operation like a[i]:=b, it requires two entries in the structure. The first entry finds the position of i with reference to a and then the actual copy statement as given below.
The triple for the statement a = b* – (c – d) + b* – (c – d) is
Indirect triples is another implementation of three address code where the listing of pointer to triples is given separately as shown in the following example:
Sequence |
Statement No |
---|---|
(0) |
(20) |
(1) |
(21) |
(2) |
(22) |
(3) |
(23) |
(4) |
(24) |
(5) |
(25) |
(6) |
(26) |
(7) |
(27) |
The main difference between the three forms of structures is that in case of quadruples, temporary variables are created, which need entries in the symbol table. This leads to waste of space. This problem is eliminated in triples and indirect triples. The second difference is regarding the extent of indirection that is present in the representation. For instance, when the target code is produced, each variable or compiler generated temporary variable is assigned some run-time memory location. There would be an entry regarding these addresses in the symbol table. In case of quadruples, the three address statement that uses these temporary variables can be accessed immediately from the symbol table. The advantage of these entries in the symbol table is when the optimizer rearranges the statements for generating efficient code, it requires no changes for the variables. In case of triples moving a single statement that defines a temporary value, many changes have to be made in all the statements that refer to this computation either in argl or in arg2. Hence, it is difficult for the optimizer to optimize the code.
In case of indirect triples, this problem does not exist. A statement can be moved by reordering the statement list. Since pointers to temporary values refer to the statements in the actual list of statements, which are not modified, the reference pointers can be reordered easily. The indirect triples are almost similar to quadruples in terms of space and utility. However, indirect triples can save some space compared with quadruples if the same temporary value is used more than once. This is because two or more entries in the statement array can point to the same line in the actual triple structure.
For example, lines (20), (21), and (22) could be combined with (23), (24), and (25).
Syntax-directed translation rules can be defined to generate the three address code while parsing the input. It may be required to generate temporary names for interior nodes which are assigned to nonterminal E on the left side of the production E→E1 op E2. While processing, the variable names and code generated so far are tracked till they reach the starting nonterminal. For this purpose, we associate two attributes place and code associated with each nonterminal.
To generate intermediate code for assignment statement, first searching is applied to get the information of the identifier from the symbol table. These identifiers are simple or multidimensional array or a constant value that is stored in a literal table. After searching, the three address code is generated for the program statement. Function lookup will search the symbol table for the lexeme and store it in id.place. Function newtemp is defined to return a new temporary variable when invoked and gen function generates the three address statement in one of the above standard forms depending on the arguments passed to it.
Let us consider the example arithmetic statement a = – (b – c). When it is represented as three address code, the statements are
T1 = b – c
T2 = – T1
a = T2
Let us consider the order in which the statements are generated. First the statement relating to c-d is generated by using a new temporary variable. While parsing the possible rules used for this expression are
S → id = E
E → – E1
E → E1 – E2
E → id
Figure 8.7 Syntax tree for a = –(b – c)
Consider the syntax tree for the expression a = –(b – c) shown in Figure 8.7.
The syntax-directed translation for arithmetic statements can thus be written as follows:
S → id = E |
{id.place = lookup(id.name); if id.place ≠ null then S.code = E.code | | gen( id.place ":=" E.place) else S.code = type_error} |
E → – E1 |
{E.place = newtemp(); E.code = E1.code | | gen(E.place ":=" "–" E1.place)} |
E → E1 + E2 |
{E.place = newtemp(); E.code = E1.code | | E2.code | | gen(E.place ":=" E1.place "+" E2.place)} |
E → E1 – E2 |
{E.place = newtemp(); E.code = E1.code | | E2.code | | gen(E.place ":=" E1.place "–" E2.place)} |
E → E1 * E2 |
{E.place = newtemp(); E.code = E1.code | | E2.code | | gen(E.place ":=" E1.place "*" E2.place)} |
E → E1 / E2 |
{E.place = newtemp(); E.code = E1.code | | E2.code | | gen(E.place ":=" E1.place "/" E2.place)} |
E → id |
{E.place = lookup(id.name), E.code = " "} |
Elements of array are stored in consecutive memory location. If the array is of size n and the size of each element is s, then the ith element of the array can be accessed at the base address + (i – low) * s when the array is single dimension.
Here base is the base address of the array or the address of the first element of array and low is the lower bound of the array or the index of the first element of the array.
Example 2: Let A[5] be an array of 5 elements. Let the size of each element be 2, that is, s = 2 and the array is stored from memory location 100, that is, base address = 100.
To refer to the third element of the array, the address is calculated as 100 + (3 – 1) * 2 = 104.
In case of the two-dimensional array, the expression is written as i * s + base – low * s where the first sub expression is (i * s) and second sub expression is (base – low * s). In the second expression, all the components are known before compilation; hence, they can be pre-computed and stored. This reduces the time taken to generate the address of the ith element. In case of multi-dimension arrays like matrix, elements are either stored as row major or column major order.
Example 3: Consider Array A[3,3] with elements stored in row major order is shown below in Figure 8.8.
Figure 8.8 Array A[3,3]
The address of element A[i,j] in row major order is computed with the expression base + (( i – low_i )* n2 + j – low_j ) * s, where base is the starting address of the array, low_i is the lower bound of i, low_j is the lower bound of j, n2 is the number of columns, and s is the size of each element.
This expression can be written as ((i * n2) + j) * s + ( base – (( low_i * n2) + low_j) * s The second part of the expression can be pre-computed by knowing the value of base, low_i, low_j, and s. This helps in faster generation of address of A[i,j]. The generalized rules may be as follows:
S → A = E
E → E + E
E → E * E
E → A
A → Alist ]
A → id
Alist → Alist, E | id [ E
A can be a simple name (has only one base address and no offset) or an indexed name (has base address and object) assignment to location.
S → A = E |
{ If A.offset = null then gen(A.place '=' E.place else gen(A.place '[' A.offset']''='E.place )} |
|
E → E1 + E2 |
{E.place = newtemp(); E.code = E1.code || E2.code || gen(E.place ":=" E1.place"+"E2.place)} |
|
{E.place = newtemp(); E.code = E1.code | | E2.code | | gen(E.place ":=" E1.place "*" E2.place)} | ||
E → A |
{ If A.offset = null then E.place = A.place else begin E.place = newtemp() gen(E.place '=' A.place '[' A.offset']') end} | |
A → Alist |
{ A.place = newtemp() A.offset = newtemp() gen(A.place '=' c(Alist.array) gen(A.offset '=' Alist.place '*' width(Alist.array))} | |
A → id |
{ A.place = id.place A.offset = null} | |
Alist → Alist1, E |
{ t = newtemp() m = Alist1.dim +1 gen(t '=' Alist.place '*' lmt(Alist.array,m)) gen(t '=' t'+' E.place) Alist.array = Alist1.array Alist.place = t Alist.ndim=m} | |
Alist → id [ E |
{ Alist.array = id.place Alist.place = E.place Alist.ndim = 1} |
If A A.offset and A.value are the two attributes associated with A as L-values. If A is an array, A.offset is a temporary variable to store the first part of the expression and A.value stores the second part of the expression. If A is a simple variable, then A.value points to the symbol table and A.offset is set to null. Here we define other functions like lmt() for maximum number of elements present in the jth dimension, width() for the size of the array, m to denote the dimension of the array, and c for the second component of the expression.
An expression that contains operators like +, –, *, / are simple arithmetic expressions, whereas the expression that contains relational operators like <, >, ≥, ≤, or, and, not, etc., are logical expressions. The logical expressions are mainly used for a set of statements that are to be executed based on the condition that is satisfied, that is, to control the flow of path. The use of logical expression always results in either true or false, which is considered 0/1. 0 indicates false and 1 or a positive number indicates true. The rules for writing the logical expressions are as follows:
E → E1 or E2
E → E1 and E2
E → not E1
E → id1 relop id2
E → (E1)
E → false
To convert the expression with logical operator requires two things to be added after the evaluation of the expression to true or false.
This is done by first evaluating the expression and then adding the "if" statement that checks the result and sets the value to 0 or 1. While generating three address statement, we use variable next that keeps track of current statement number that is used to generate the next required statement number for true or false condition. The translation rules to convert to three address code are as follows:
E → E1 or E2 |
{E.value = newtemp(); gen(E.value "=" E1.value "or" E2.value)} | |
E → E1 and E2 |
{E.value = newtemp(); gen(E.value "=" E1.value "and" E2.value)} | |
E → not E1 |
{E.value = newtemp(); gen(E.value "=" "not" E1.value)} | |
E → (E1) |
{E.value = E1.value} | |
E → id1 relop id2 |
{E.value = newtemp(); gen( "if" id1.value relop.op id2.value "goto" nextstat + 3) gen(E.value "=" "0") gen("goto" nextstat + 2) gen(E.value "=" "1")} | |
E → true |
{E.value = newtemp(); gen(E.value "=" "1")} | |
E → false |
{E.value = newtemp(); gen(E.value "=" "0")} |
Example 4: Write three address statement for the x or y and not z
Solution: Three address code for the above expression is
t1 = not z
t2 = y and t1
t3 = x or t2
Example 5: Write three address statement for the if x < y then 1 else 0.
Solution: The address code for the above expression is
10. if x < y go to 13
11. t1 = 0
12. go to 14
13. t1 = 1
14.
Flow of control statements can be shown pictorially as in the Figures 8.9 and 8.10 below.
Figure8.9 Control low for if-then-else statement
Figure 8.10 Control low for while statement
While generating the code we need to generate the label that will execute the segment of code depending on the condition whether it is true or false. The rules for writing different constructs are as follows:
S →. if E then S1
S → if E then S1 else S2
S → while E do S1
While writing the translation rules for the first rule, we need to generate a new label for the true segment and the next statement is the statement that follows S. Hence, the translation rules are as follows:
S → if E then S1 |
{E.true = newlabel(); E.false = S.next; S1.next = S.next; S.code = E.code | | gen(E.true,":") | | S1.code} | |
S → if E then S1 else S2 |
{E.true = newlabel(); E.false = newlabel(); S1next = S.next; S2.next = S.next; S.code = E.code | | gen(E.true,":") | | S1.code | | gen(" GOTO ", S.next) | | gen(E.false, " :") | | S2.code} | |
S → while E do S1 |
{S.begin = newlabel(); E.true = newlabel(); E.false = S.next; S1.next = S.next; S.code = gen(S.begin":") | | E.code | | gen(E.true,":") | | S1..code | | gen("GOTO",S.begin)} |
Example 6: Give three address code for the following:
While (a < 5) do a: = b + 2
Solution:
L1: |
If a < 5 goto L2 goto last |
|
L2: |
t1 = b + 2 a = t1 goto L1 |
|
last: |
|
Similarly, we can translate for the "do...while" and the "for" loop.
Translate "do Swhile (E)."
The flow of control for the "do while" loop is as shown in Figure 8.11.
Figure 8.11 Control flow of repeat... statement do... while
Example 7: Translate do x = y + z while a < 10
Solution: LI:
t1= y + z
x = t1
If a < 10 goto L1
Flow of control for the "for" loop is shown in Figure 8.12 for(El;E2;E3) S;
Figure 8.12 Control flow of for loop
Translate the following switch statement.
switch(E)
{ case v1: s1;
case v2:s2;
:
:
case vn : sn
}
The translation is based on the way the flow of control executes the statement and is given as follows:
|
t1=E |
|
|
goto test |
|
L1: |
s1 |
|
|
goto last |
|
L2: |
s2 |
|
|
goto last |
|
Ln: |
sn |
|
|
goto last |
|
test: |
if (t1==v1) goto L1 |
|
|
if (t2==v2) goto L2 |
|
|
: |
|
|
if (tn==vn) goto Ln |
|
last: |
|
–(a + b) * (c + d) + (a + b + c)
Solution:
Postfix form = ab + – cd + * ab + c + +
Three address code:
t1 = a + b
t2 = –tl
t3 = c + d
t4 = t2 * t3
t5 = a + b
t6 = t5 + c
t7 = t4 + t6
Syntax tree: The syntax tree for the given expression is shown in Figure 8.13.
Figure 8.13 Syntax tree
DAG: The DAG for the given expression is shown in Figure 8.14.
Figure 8.14 DAG:
a * (b + c) / (d + e)
Solution: The postfix form step by step is given by
a* (bc+) / (de+)
(abc+*) / (de+)
abc+*de+/
Three address code is
t1 = b + c
t2 = a * t1
t3 = d + e
t4 = t2/t3
–a + b / c ↑ d ↑ e*f / g
Solution: postfix form step by step is given by
(a–) + b / c ↑ d ↑ e*f/g
(a–) + b / c ↑ (de ↑)*f/g
(a–) + b / (cde↑↑)*f/g
(a–) + (bcde↑↑/)*f/g
(a–) + (bcde↑↑/f*g/)
a–bcde↑↑/f*g/+
Postfix form is: a-bcde↑↑/f*g/+
Three address code:
t1 = –a
t2 = d ↑ e
t3 = c↑t2
t4 = b / t3
t5 = t4 *f
t6 = t5 / g
t7 = tl + t6
Solution: The lower bound of both is equal to 1, that is, lowl = low2 = 1 and higher bound is equal to 10, that is, nl = n2 = 10 and let the size of each element be 4, that is, s = 4. The annotated parse tree generates the address of x = A[y, z] as shown in Figure 8.15.
Figure 8.15 Parse tree for x = A[y, z]
Solution: Three address code for the above expression will be as follows:
10. if x < y go to 13
11. t1= 0
12. go to 15
13. t1 = 1
15. if z < X1 go to 18
16. t2 = 0
17. go to 23
18. if y1 < Z1 go to 21
19. t3 = 0
20. go to 23
21.t3 = l
22. go to true
23. false cond
while a<b
do
if c < d then
x = y + z
else
x = y - z
done
Solution: The three address code will be as follows:
L1. if a<b then GOTO L2
GOTO LNEXT
L2. if c<d then GOTO L3
GOTO L4
L3. tl = y + z
x = tl
GOTO L1
L4. tl= y - z
x = tl
GOTO L1
LNEXT.
main (0
{
int i=l;
int a[10];
while(i<=10)
A[i] = 10;
}
Solution: Three address code is as follows:
i = 1
L: if i<=10 goto L1
goto last
L1:tl = i * w
t2 = addr(A)
t2[tl]= 10
Goto L
Last:
where w is the width of each array location.
switch (i+j)
{ case 1: x = y + z;
case 2: u = v + w;
default: p = q + r
}
Solution:
The translation as follows:
t=i + j; goto test | ||
L1: |
tl = y + z; x = tl; goto last | |
L2: |
t2 = v + w u = t2 ; goto last | |
L3: |
t3 = q + r; p = t3; goto last | |
test: |
if (t==l) goto L1 if (t==2) goto L2 goto L3 | |
last: |
main ()
{
int i ;
int a [10] ;
i = l;
while (i < 10)
{
a[i] = 0.0; i = i + 1;
}
}
A[i , j] : = B[i , j] + C[A[k , 1]] + D[i + j]
3.138.36.72