8.8 Grammar and IR Generation for miniC

In Chapter 12, we plan to present a complete development of a compiler for a stripped down C-like language, miniC. Here we present, as examples of Intermediate Representation for most important program constructs, along with the action terms added in the yacc grammar to generate them. The following are considered:

  • expressions
  • assignments
  • statements: IF-THEN, IF-THEN-ELSE, WHILE-DO
  • IF-THEN and IF-THEN-ELSE initiation
  • WHILE-DO initiation
  • variable declarations
  • function definitions
  • function calls

We use three additional stacks to keep track of various code-related parameters:

tstack: Stack for the temporaries Tn number, tcount used in the 4-tuple IR. Corresponding stack functions are tpush() and tpop(). The 4-tuple IR assumes “infinite” number of temporary locations available.

mstack: Stack for the 4-tuple entry numbers, counted by mcount. Generally, this count goes on incrementing, but we need to stack its value while generating the control constructs like IF-THEN-ELSE or WHILE-DO. The corresponding stack manipulation functions are mpush() and mpop().

lstack: Stack for label numbers created for generating IR for control constructs. The label count is maintained by 1count. The corresponding stack manipulation functions are lpush() and lpop(). The labels are used by the generators for both RPN IR and 4-tuple IR.

 

You will notice that the generation of 4-tuple IR is slightly more complex than the generation of RPN IR. This is so because 4-tuple is nearer to the assembly or machine language of most of the modern processors, and almost one-to-one correspondence can be established between 4-tuples and a short sequence of assembly or machine instructions. On the other hand, mapping the RPN to assembly or machine instructions involves use of a stack to store operands, especially for mapping expressions.

One important difference in the action code from the one given in Chapter 5 is that, instead of putting all the outputs in one output file, we now output the RPN IR in file fout1 and the 4-tuple matrix in file fout2. The file names are created out of the source file name.

8.8.1 Expressions

We had already presented IR generation for a few of the possible expressions in Chapter 5. Here we see some more details. In case of RPN, we expect that the IR generated for the two operand subexpressions leave their values on the top of the stack; we simply generate a code for the operator.

On the other hand, for 4-tuple IR, we generate an entry by popping the top two tcount from the stack and assigning next tcount to the result. Then we should push the result tcount on the tstack.

expr: INT { $$ = $1; fprintf(fout1,“%s
”, name($1));
	    fprintf(fout2,“%d: LD %s T%d −−
”,++mcount,name($1),
					++tcount);tpush(tcount);}
    | NUMBER { $$ = $1; fprintf(fout1,“%s
”, name($1));
	      fprintf(fout2,“%d: LD %s T%d −−
”,++mcount,name($1),
					++tcount);tpush(tcount);}
    | VAR    { $$ = $1; fprintf(fout1,“%s
”, name($1));
	      fprintf(fout2,“%d: LD %s T%d −−
”,++mcount,name($1),
					++tcount);tpush(tcount);}
    | IVAR   { $$ = $1; fprintf(fout1,“%s
”, name($1));
	      fprintf(fout2,“%d: LD %s T%d −−
”,++mcount,name($1),
					++tcount);tpush(tcount);}
    | expr ‘+’ expr { fprintf(fout1,“+
”); $$ = $3;
      fprintf(fout2,“%d: ADD T%d T%d T%d
”,++mcount,tpop(), tpop(),
					++tcount); tpush(tcount);}
    | expr ‘−’ expr { fprintf(fout1,“−
”); $$ = $3;
      fprintf(fout2,“%d: SUB T%d T%d T%d
”,++mcount,tpop(), tpop(),
					++tcount); tpush(tcount);}
    | expr ‘*’ expr { fprintf(fout1,“*
”); $$ = $3;
      fprintf(fout2,“%d: MUL T%d T%d T%d
”,++mcount,tpop(), tpop(),
					++tcount); tpush(tcount);}
    | expr ‘/’ expr { fprintf(fout1,“/
”); $$ = $3;
      fprintf(fout2,“%d: DIV T%d T%d T%d
”,++mcount,tpop(), tpop(),
					++tcount); tpush(tcount);}

The basic assignment operation is rather straightforward, but miniC contains op= type of assignments also. The semantics of these extended assignments raises an issue. The normal subtract operation, written in infix notation as a −b, in RPN appears as a b − and the semantics is “subtract the TOS from one below it”. The semantics of extended assignment a -= e is “subtract the value of e from a. The corresponding RPN IR that we would generate is e a – a =, but it reverses the order of subtraction. We have two options − either exchange the top two values on the stack, or use a special form of − operator, shown as R−(reversed minus), which has semantics “subtract from TOS the value below it”. Similarly, divide operator also being non-commutative, we use R/ for “reverse divide”. The 4-tuple IR has no such issues.

8.8.2 Assignments

asgn: VAR ‘=’ expr {fprintf(fout1,“%s
=
”, name($1));
		    fprintf(fout2,“%d: = T%d %s −−
”, ++mcount,tcount,
					      name($1));tpush(tcount);}
    | IVAR ‘=’ expr {fprintf(fout1,“%s
=
”, name($1));
		    fprintf(fout2,“%d: = T%d %s −−
”, ++mcount,tcount,
					      name($1)); tpush(tcount);}
    | IVAR ADDEQ expr {fprintf(fout1,“%s
+
%s
=
”, name($1),name($1));
           fprintf(fout2,“%d: ADDE T%d %s %s
”, ++mcount, tcount, name($1),
					      name($1));tpush(tcount);}
    | IVAR SUBEQ expr {fprintf(fout1,“%s
R-
%s
=
”, name($1),name($1)); 
           fprintf(fout2,“%d: SUBE T%d %s %s
”, ++mcount, tcount, name($1),
					      name($1));tpush(tcount);}
    | IVAR MULEQ expr {fprintf(fout1,“%s
*
%s
=
”, name($1),name($1));
           fprintf(fout2,“%d: MULT T%d %s %s
”, ++mcount, tcount, name($1),
					      name($1));tpush(tcount);}
    | IVAR DIVEQ expr {fprintf(fout1,“%s
R/
%s
=
”, name($1),name($1));
           fprintf(fout2,“%d: DIVE T%d %s %s
”, ++mcount, tcount, name($1),
					      name($1));tpush(tcount);}

We illustrate by a small example:

int : a,b
a = 5
b = a − 3
a − = 2
a = a − 2
end

The generated RPN IR is:

int a DCL int b DCL 5 a = a 3 − b = 2 a R − a = a 2 − a =

and the 4-tuple IR is:

DCL int a −−
DCL int b −−
1: LD 5 T1 −−
2: = T1 a −−
3: LD a T2 −−
4: LD 3 T3 −−
5: SUB T2 T3 T4 
6: = T4 b −−
7: LD 2 T5 −−
8: − a T5 a
9: LD a T6 −−
10: LD 2 T7 −−
11: SUB T6 T7 T8
12: = T8 a −−

8.8.3 Statements

A statement can be a variable declaration, a function definition, an expression, a function return, a WHILE-DO or an IF-THEN construct. Because certain initialization actions are needed for many of them, the IR generation for a complex control construct is divided into two parts. Explanation of code generation is explained in each type of statement.

stmt:	  dec1 
	| defn 
	| expr 
	| RETURN  {fprintf(fout1,“RET
”);
		   fprintf(fout2,“%d: RET T%d −− −−
”, ++mcount, tcount);
                                                          tpush(tcount);}
	| RETURN expr {fprintf(fout1,“RET
”);
		   fprintf(fout2,“%d: RET T%d −− −−
”, ++mcount, tcount);
                                                          tpush(tcount);} 
	| while ‘(’ cond ‘)’ stmt end {int 1; 1 = 1pop();      /* labe12 */
		   fprintf(fout1, “Label%d
BR
”, 1pop());
		   fprintf(fout1, “Label%d
:
”, 1);
		   fprintf(fout2, “%d: BR %d −− −−
”, ++mcount,($1)–>x.I);
		   fprintf(fout2, “%d: LABI %d −− −−
”, ++mcount,
                                                          ($3)–>y.I);}
	| if ‘(’ cond ‘)’ stmt end {	/* else-less if */
		   fprintf(fout1, “Label%d
:
”, lpop()); lpop();
                                                          /* label1 */ 
		   fprintf(fout2, “%d: LABI %d −− −−
”, ++mcount,
                                                          ($3)–>y.I); 
		   }
	| if ‘(’ cond ‘)’ stmt end ELSE {int 1,11 ;
		   1 = 1pop();	/* labe12 */
		   fprintf(fout1, “Label%d
BR
”, (11=1pop())); 1push(11);
                                                          /* label1 */
		   fprintf(fout1, “Label%d
:
”, 1); /* labe12 */ fprintf(fout2, “%d: BR %d −− −−
”, ++mcount, 0);
                                                          ($3)–>x.I = mcount;
		   fprintf(fout2, “%d: LABI %d −− −−
”, ++mcount,
                                                          ($3)–>y.I);
		   }
		   stmt end {
		   fprintf(fout1, “Label%d
:
”, 1pop());
                                                          /* labe12 */
		     fprintf(fout2, “%d: LABI %d −− −−
”, ++mcount,
                                                          ($3)–>x.I);
		   }
	| ‘{’ stmtlist ‘}’	{ }
	;

Examples of the code generation are given with each type of statement.

8.8.4 IF-THEN and IF-THEN-ELSE

The problem with the control constructs is that we have to generate a reference to a remote point in the generated code, depending upon the condition that prevails at run-time. In case of RPN, this can be a symbolic label, like Label1, which is comparatively easy to manage. Basically, the operation similar to what we did in the case of Recursive-descent Parser (RDP) will work for yacc-based parser also. However, for 4-tuple IR generation, we need to know the 4-tuple entry number for a remote branch, which is not known due to intervening statements.

Note that if we were generating the assembly code for a particular processor directly, then also we could have simply generated symbolic labels for remote jumps, but we want to investigate the machine-independent optimization, which requires that IR form of the program be available.

Starting with IF-THEN construct, where we need to have action like:

 

image

 

In the first case, within the construct, we need to generate a reference to one remote location Label1, in the second case we need to have two references generated Label1 and Label2, but we do not know where they will fall in the final generated IR code. In case of RPN, we have to output these labels in the output stream at appropriate point in the generated code. In the case of 4-tuple, the code generator needs actual remote 4-tuple entry number.

We decide to use a symbol called “dope” as a semantic value assigned to the keyword token if. We use it as a means of communication between various action terms. For each IF construct used in the source program, a fresh “dope” symbol is generated. We also generate two label numbers and push them on the lstack, by the action terms shown below. IF-THEN will use only one of them and discard another one, IF-THEN-ELSE will use both of them.

if: IF { Symbol *i; i = tree_root(“dope”); $$ = i; ++1count; 1push(1count); ++1count; 1push(1count);}

Now comes the interesting part. On detection of a cond in the IF or WHILE construct, we pop the first label, and in RPN we generate a “Branch-if-Zero Label1” code in the output stream and push back that label on lstack. Later, referring to Section 8.8.3, when the end of the true statements is detected, we send out the first label and discard the second.

cond:  expr {int 1; fprintf(fout1, “Label%d
BZ
”, 1=lpop()); 1push(1);
	    ($$)–>y.I = ++mcount;
	    fprintf(fout2, “%d: BZ %d −− −−
”, mcount, 0); } 
       ;
       

In case of IF-THEN-ELSE, again referring to Section 8.8.3, we generate and output a “Branch-to Label2” code at the end of the true statements and generate and output Label1 to mark the position in the RPN where control should go if the cond tests false. When the end of the false statements is detected, we pop the second label and output it.

If we were generating the 4-tuple IR, the situation is slightly more complex. As noted above, we need to have the remote entry numbers as the branch targets. This really requires two passes over the source code – in the first pass we note the remote points to which a conditional or unconditional branch may take place and the second pass we generate the IR code. This is somewhat similar to a two-pass assembler. Although it is possible by an additional stack, which stores generated code for the “statements”, to achieve this in a single pass, we decide to work as follows. We generate the 4-tuple IR for the complete source program, generating branch entries with unknown branch target, and taking care to note the entries within it which correspond to the remote branch targets. At the end of the IR generation, we go through this 4-tuple matrix and insert correct entry numbers in the branch 4-tuples.

Thus when the cond symbol is detected, the next 4-tuple entry number is saved in a field within the $$ symbol, as a note to remember where we have to insert correct 4-tuple number later, and a 4-tuple is generated and output for branch to 0. Let us say the entry number is i. In the IF-THEN, action on detection of the end of statements, we generate a 4-tuple with a special operation LABI, with the entry number saved in $$ symbol as the first argument. Let us say that its entry number is j, which looks like j : LABI i −− −−. The post-processing label-insert function will detect this LABI code, insert j in entry number i and change LABI to LAB.

Similar actions are performed for the IF-THEN-ELSE construct; the only difference is that for the second half of the IF construct, another field in the cond symbol is used to store the entry number of the ELSE part.

For an example source code using partial, IF-THEN construct:

int : a,b,c,d
if(a = 3) d = a * (b + c) end
end

we obtained the following IR (RPN):

3 a = Label2 BZ a b c + * d = Label2 :

and the following 4-tuple matrix:

1: LD 3 T1 −−
2: = T1 a −−
3: BZ 0 −− −−         3: BZ 10 −− −−
4: LD a T2 −−
5: LD b T3 −−
6: LD c T4 −−
7: ADD T3 T4 T5
8: MUL T2 T5 T6
9: = T6 d −−
10: LABI 3 −− −−     10: LAB 3 −− −−

For an example source using full IF-THEN-ELSE:

int : a,b,c,d
if(a = 3) c = 7 else d = a * (b + c) end
end

we got the following IR (RPN):

3 a = Label2 BZ 7 c = Label1 BR Label2 : a b c + * d = Label1 :

and the following 4-tuple IR:

1:  LD 3 T1 −−
2:  = Tl a −−
3:  BZ 0 −− −−       3: BZ 7 −− −−
4:  LD 7 T2 −−
5:  = T2 c −−
6:  BR 0 −− −−       6: BR 14 −− −−
7:  LABI 3 −− −−     7: LAB 3 −− −−
8:  LD a T3 −−
9:  LD b T4 −−
10: LD c T5 −−
11: ADD T4 T5 T6
12: MUL T3 T6 T7
13: = T7 d −−
14: LABI 6 −− −−    14: LAB 6 −− −−

8.8.5 WHILE-DO

In our miniC grammar, the cond symbol is used in both IF and WHILE constructs. WHILE will need to branch to the beginning of the IR code for the condition computation that is why the WHILE-DO construct is initialized differently.

while:   WHILE {Symbol *w; fprintf(fout1, “Label%d
:
”, ++1count);
		1push(1count); ++1count; 1push(1count);
		w = tree_root(“dope”); w–>x.I = ++mcount; $$ = w;
		fprintf(fout2, “%d: LAB %d −− −−
”, mcount, mcount);
		mpush(mcount);}
	;

For the generation of RPN IR, we generate two labels, push them on the lstack and output the first label. Referring to Section 8.8.3, when the “statements” end is detected, we generate a branch to the first label and issue the second label in the output. The BZ instruction in the action term of cond token will branch out of the WHILE loop if condition is not satisfied.

For the generation of 4-tuple IR, we create and assign a symbol w, which records the 4-tuple entry number at the beginning of the cond in its x field, and issue a label to the output.

When the end of the “statements” is detected, a branch to this 4-tuple is issued and then a j : LABI i −− −− 4-tuple is issued, where j corresponds to entry number of this 4-tuple and i refers to the entry number of the i: BZ 0 −− −− 4-tuple, which will be updated by post-processing, as we explained for IF construct.

For an example source code shown below:

int : a,b,c,d
while(a = 3) d = a * (b + c) end
end

we got the following IR (RPN):

Label1 : 3 a = Label2 BZ a b c + * d = Label1 BR Label2 :

and the following 4-tuple IR:

1:  LAB 1 −− −−
2:  LD 3 T1 −−
3:  = T1 a −−
4:  BZ 0 −− −−       4: BZ 12 −− −−
5:  LD a T2 −−
6:  LD b T3 −−
7:  LD c T4 −−
8:  ADD T3 T4 T5
9:  MUL T2 T5 T6
10: = T6 d −−
11: BR 1 −− −−
12: LABI 4 −− −−    12: LAB 4 −− −−

Note that the IR generator code output the BZ command at matrix line 4 with a zero branch “address”. Later at line 12, LABI command sets a label and also indicates that the line number of the matrix line where this label is set should be inserted in line 4 and then LABI command is changed to LAB. These line address operations are carried out after the whole of the IR is obtained. Though it is possible to write an IR generator which does this “on-the-fly” in one pass, it involves a separate additional stack. We chose the simpler method shown here. We also tried nesting an IF construct within a WHILE loop:

int : a,b,c,d
while(a = 3) if(b = 5) d = a * (b + c) else d = 9 end end
end

we got the following IR (RPN):

Label1 : 3 a = Label2 BZ 5 b = Label4 BZ a b c + * d = Label3 BR
Label4 : 9 d = Label3 : Label1 BR Label2 :

and the following 4-tuple IR:

1:  LAB 1 −− −−
2:  LD 3 T1 −−
3:   = T1 a −−
4:  BZ 0 −− −−      4: BZ 20 −− −−
5:  LD 5 T2 −−
6:  = T2 b −−
7:  BZ 0 −− −−      7: BZ 15 −− −−
8:  LD a T3 −−
9:  LD b T4 −−
10: LD c T5 −−
11: ADD T4 T5 T6
12: MUL T3 T6 T7
13: = T7 d −−
14: BR 0 −− −−     14: BR 18 −− −−
15: LABI 7 −− −−   15: LAB 7 −− −−
16: LD 9 T8 −−
17: = T8 d −−
18: LABI 14 −− −−  18: LAB 14 −− −−
19: BR 1 −− −−
20: LABI 4 −− −−   20: LAB 4 −− −−

8.8.6 Variable Declarations

Declaration statements for variables are mainly concerned with transmission of the inherited attributes to the Symbol Table entry for each of the variables being declared. Also, the same information has to be passed on in the RPN and 4-tuple IR being output. See Chapter 6 and Sections 5.4, 5.4.4 and 5.4.7 for details of the synthesized and inherited attributes.


decl:     dtype ‘:’  dlist {}
	;
dtype:    IVAR
	| VAR
	| SVAR 
	;
dlist:    VAR       {type($1) = type($<sym>0); subtype($1) = type($<sym>0); 
		     fprintf(fout1,“%s
%s
DCL
”,tokname(subtype($1)),
                                                                     name($1));
		     fprintf(fout2,“DCL %s %s −−
”, tokname(subtype($1)),
                                                                     name($1));}
	| dlist ‘,’ VAR {type($3) = type($<sym>0); subtype($3) = type($<sym>0);
			  fprintf(fout1,“%s
%s
DCL
”,tokname(subtype($3)),
                                                                   name($3));
			  fprintf(fout2,“DCL %s %s −−
”, tokname(subtype($3)),
                                                                   name($3));}
	;

Examples of IR generated for variable declarations are already included in other examples.

8.8.7 Function Definitions

Function definitions in miniC have syntax:

func <func-name> () statement

In the version of miniC presented here, in the interest of simplicity, the number and types of the formal arguments are not taken into account by the compiler. A function may be called with any reasonable number of arguments and because the Caller-Cleans-Stack type calling conventions are used, the run-time stack remains valid. The symbol procname carries with it initially the type VAR and name from the Symbol Table. First, we have to set the type as FUNCTION.

In the case of RPN IR generation, we simply output name DFN in the output stream. Then the actions associated with the stmt will output the corresponding RPN. Then we output DFE to indicate “definition end”.

In case of 4-tuple IR generation, the definition starts with i: DFN name −− −− and ends with j : RET Tm −− −− where Tm is the temporary last assigned any value within the function.

defn:   FUNC procname { type($2) = FUNCTION; fprintf(fout1,“%s
DFN
”,
								name($2));
		fprintf(fout2,“%d: DFN %s −− −−
”, ++mcount, name($2));
							($2)–>u.I = mcount;} 
	  ‘(‘ ’) ’ stmt {fprintf(fout1,“DFE
”);
		fprintf(fout2,“%d: RET T%d −− −−
”,++mcount, tcount);}
	;
procname: VAR
	| FUNCTION
	;

With the following example source code:

int: a, b, c
b = c + 9
func myfunc() a = a + 7 end
c = 11

we got the following IR (RPN):

int a DCL int b DCL int c DCL c 9 + b = myfunc DFN a 7 + a = DFE

and the following 4-tuple IR:

DCL int a −−
DCL int b −−
DCL int c −−
1: LD c T1 −−
2: LD 9 T2 −−
3: ADD T1 T2 T3
4: = T3 b −−
5: DFN myfunc −− −−
6: LD a T4 −−
7: LD 7 T5 −−
8: ADD T4 T5 T6
9: = T6 a −−
10: RET T6 −− −−

8.8.8 Function Calls

A function call (invocation) in miniC looks like: funcname(arg1, arg2, …, argn and is expected to return a value if used on the RHS of an assignment or within an expression.

For RPN IR generation, each expr in the argument list will leave its value on the stack, so a function call consists of only issuing the name of the function. The function is expected to pick-up correct number of call arguments from the stack, compute and leave the result on the stack.

For 4-tuple IR generation, first we issue a function call by a 4-tuple as: i: CALL funcname Tm −− where Tm is the temporary holding the function return value. After that, we issue arg 4-tuples as: j : ARG Tp −− −− where Tp is the temporary holding the result from the respective argument expression. The following example will clarify these concepts:

expr | FUNCTION begin ‘(’ arglist ‘)’ { fprintf(fout1,“%s
”, name($1)); 
		fprintf(fout2,“%d: CALL %s T%d −−
”, ++mcount, name($1),
							   ++tcount);}
arglist: /* nothing */    {$$ = 0; }
       | expr	          { fprintf(fout2,“%d: ARG T%d −− −−
”,++mcount,
							  tpop()); $$ = 1; } 
       | arglist ‘,’ expr { fprintf(fout2,“%d: ARG T%d −− −−
”,++mcount,
							tpop()); $$ = $1+1;} 
       ; 
expr   | ARG { fprintf(fout2,“%d: CARG %d T%d −−
”,++mcount,4 * ($1 + 1),
						++tcount); tpush(tcount);}

Within the function, the call arguments can be accessed as $1, $2, etc. which are detected by the Scanner and returned as ARG. We generate an entry for each of them as CARG. With the following example source input:

int: a, b, c
b = c + 9
func myfunc() a = a + 7
myfunc(a + 2, b − 3)
end

we got the following IR (RPN):

int a DCL int b DCL int c DCL c 9 + b = myfunc DFN a 7 + a = DFE a 2 + b 3 − myfunc

and the following IR (4-tuple):

DCL int a −−
DCL int b −−
DCL int c −−
1:  LD c T1 −−
2:  LD 9 T2 −−
3:  ADD T1 T2 T3
4:  = T3 b −−
5:  DFN myfunc −− −−
6:  LD a T4 −−
7:  LD 7 T5 −−
8:  ADD T4 T5 T6
9:  = T6 a −−
10: RET T6 −− −−
11: LD a T7 −−
12: LD 2 T8 −−
13: ADD T7 T8 T9
14: ARG T9 −− −−
15: LD b T10 −−
16: LD 3 T11 −−
17: SUB T10 T11 T12
18: ARG T12 −− −−
19: CALL myfunc T13 −−

8.8.9 Utility Functions Used

int tcount = 0;
int 1count = 0;
int mcount = 0;
char *tokens[] = {"float”,“int”,“string”,“UNDEF”, 0};

char *tokname(int tok){
  return strdup(tokens[tok-VAR]);
}
Symbol * tree_root(char *label){
  Symbol *np;
  np = (Node *)malloc(sizeof(Node));
  if(np == NULL) exit(-1);
  np–>link[0] = NULL; // ptr first child (if any)
  np–>link[1] = NULL; // ptr next sibling (if any)
  np–>w.S = (char *)malloc(strlen(label)+1);
  strcpy(np–>w.S, label); // node label
  return np;
}
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.144.37.38