8.9 Real-life: Intermediate Codes of GNU gcc

GCC – GNU compiler collection (previously expanded as GNU C compiler), gcc/ccl – the modern GNU C compiler (see Fig. 8.12).

 

GNU compiler collection framework. Generic, Gimple and RTL are the Intermediate codes in GCC

 

Fig. 8.12 GNU compiler collection framework. Generic, Gimple and RTL are the Intermediate codes in GCC

 

The Abstract Syntax Tree (AST), generic Intermediate representation in GCC:

  • Output by the parser.
  • Explicitly represents all information in the original source code.
  • Front-end code may have its specific AST, but generic is aimed at being the common IR of all HLLs that GCC supports.

GIMPLE is another abstract level of representation in GCC. It is influenced by the SIMPLE Intermediate representation of McCat compiler. In fact, the name GIMPLE is given due to this similarity with SIMPLE. It is a subset of the AST-generic. There are two “flavours” of GIMPLE:

Structured: Cleanups and simplification.

Unstructured: Lowering of control flow, i.e. a program is represented as sequenced statements plus unrestricted jumps. Note that lowered control flow is nearer to register machines (hardware CPU) and leads to easier generation of static single assignment (SSA) form, where every variable is assigned in only one place.

GIMPLE grammar is available (see GCC reference):

http://gcc.gnu.org/onlinedocs/gccint/GIMPLE.html

Internal data structure used to store the GIMPLE nodes is a tree and that is the reason why the intermediate output from a GCC compiler is called trees.

Tree is the central data structure of the gcc front-end. A tree-node is capable of holding integers, reals, complex, strings, etc. Actually a tree is a pointer type, but the object to which it points may vary. If we are just taking the example of an integer node of a tree, it is a structure containing an integer value, some flags and some space for RTL (these flags are common to all the nodes). There are a large

number of functions and macros provided by the GCC, which help us to manipulate the tree. In our program, we need not directly handle the trees. Instead we use the function calls.

Some of the GIMPLE node types are:

Binary operator – BIT AND EXPR, MAX EXPR

Comparison – EQ EXPR, LTGT EXPR

Constants – INTEGER CST, STRING CST

Declaration – FUNCTION DECL, LABEL DECL

Expression – MODIFY EXPR, ADDR EXPR, BIND EXPR

Reference – COMPONENT REF, ARRAY RANGE REF

Statement – EXIT EXPR, GOTO EXPR

Type – BOOLEAN TYPE, INTEGER TYPE

Unary – ABS EXPR, NEGATE EXPR, NOP EXPR

 

RTL stands for register transfer language. It is the intermediate code handled by GCC. Each and every tree structure that we build has to be changed to RTL so that the back-end can work with it perfectly. As with the trees, GCC provides a number of functions to produce the code from the trees. So while developing a compiler we are trying to build the tree and convert it to RTL for each and every line of the program. Most of the RTL routines take the trees as arguments and emit the RTL statements.

8.9.1 Example GCC Intermediate Code

In order to make concrete the above ideas, we give below the intermediate representation from a gcc compilation of a sample C program. These outputs were obtained by the command:

gcc -fdump-tree-all testintermediate.c

First, let us have a look at the original C source:

int main(){
  int i = 2, j = 0;
  if(i > j) j = i + 3;
  while(j < 10){
    i = j * 2;
    j++;
  }
  if(i == j) j = i + 3; else i = j/3;
  return 0;
}

The program is not written to do any useful work, but just to investigate various structures generated in the Intermediate code. Note that we have not tried any optimization while compiling.

GCC processes the parsed version of the source program, i.e. the Parse tree, by a series of passes which generate various forms of trees representing the original program and going nearer and nearer to what is required by the code generator. Before the code generator takes over, these trees are further processed into a form called RTL (Register Transfer Language), which is summarized in Chapter 9.

The first tree is called the original, which is a top-level representation:

;; Function main (main)
;; enabled by -tree-original
{
  int i = 2;
  int j = 0;
  <STATEMENT_LIST>
}

Note that the tree is output in a human-readable form, which looks like C code. If there were any functions defined in the source program, they would have been included here.

Now we have a look at the GIMPLE form:

main(){
  int D.1283;
  int i;
  int j;
  i = 2;
  j = 0;
  if(i > j){
     j = i + 3;
  } else {
  }
  goto <D1281>;
  <D1280>:;
  i = j * 2;
  j = j + 1;
  <D1281>:;
  if(j <= 9){
     goto <D1280>;
  } else {
     goto <D1282>; 
  }
  <D1282>:;
  if(i == j){
     j = i + 3; 
  } else {
     i = j / 3;
  }
  D.1283 = 0;
  return D.1283;
}

There are several points to be noted. <D.1283> represents the constant ‘0’ which is returned by the main() function. The first if statement generates a null else clause. Particularly note how the while loop is represented by a combination of if and goto: Compare this construct with Polish notation in Section 8.2 and the N-tuple notation in Section 8.3. This is an example of “lowering the control construct”.

 

image

 

We then have a look at one more tree, called “cfg” tree:

;; Function main (main)
main(){
  int j;
  int i;
  int D.1283;
<bb 0>:
  i = 2;
  j = 0;
  if(i > j) goto <L0>; else goto <L1>;
<L0>:;
  j = i + 3;
<L1>:;
  goto <bb 4> (<L3>);
<L2>:;
  i = j * 2;
  j = j + 1; 
<L3>:;
  if(j <= 9) goto <L2>; else goto <L4>;
<L4>:;
  if(i == j) goto <L5>; else goto <L6>;
<L5>:;
  j = i + 3;
  goto <bb 8> (<L7>);
<L6>:;
  i = j / 3; 
<L7>:;
  D.1283 = 0;
  return D.1283;
}

This form is the last tree for this simple program, without compiler optimization, before the RTL form is generated. Notice that now code blocks are identified and marked.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.21.246.223