3.2 Symbol Tables and a Scanner

Though as suggested above and in Figs. 3.1 and 3.2, Symbol Tables are created during the Scanner phase of a compiler, their real use and manipulations will be needed during the semantic analysis phase and as such we postpone full discussion of Symbol Table handling techniques till Section 5.5 in Chapter 5. However, the Scanner does initiate building up of the Symbol Table and here we discuss building a skeleton Symbol Table, with whatever information is available with the Scanner.

A general discussion about Symbol Tables is given in Appendix B, Section B.2.3.

In anticipation of later developments, we build the Symbol Table as a binary search tree (BST). Though a BST can have worst-case search time as bad as a linked list, we use it here for the sake of simplicity. In a real-life compiler, a more sophisticated binary tree structure like AVL or Red–Black trees or Hash tables would be used, in order to get a fast access to the Symbol Table. The design we present here is a slightly simplified version of the Symbol Table that will be built for a full example compiler in Chapter 12.

Each node in the Symbol Table BST has the following structure, declared in the file tb_ tree_l.h:

typedef union{
  struct node_struct*N;
  char *S;
  long I;
  float F;
  void *P;
}util;
typedef struct node_struct{
struct node_struct *link[2]; // left = 0, right = 1
util u,v,w,x,y,z;
}Node;

The links to the left and right child are stored in a two-cell array link[2]. There are six elements of type util, which can be used for storing up to six attributes for each node. We shall need to use them in later chapters, where their application will be detailed. Note that though this node structure has been optimized for a binary tree, it can easily be used for a general tree.

As far as a Scanner is concerned it will create a Symbol Table entry for each identifier, constant, string literal, keywords, built-in functions, predefined identifiers and constants, etc., along with their syntactic type and value if known. The utility fields are defined as follows:

#define L(n)            ((n)−>link[0])
#define R(n)            ((n)−>link[1])
#define name(n)         ((n)−>w.S)
#define type(n)         ((n)−>v.I)
#define val(n)          ((n)−>u.F)

Some of the type values are:

#define NUMBER 258
#define STRING 259
#define VAR 261
#define UNDEF 263

The Scanner that we plan to use in Chapter 12 is similar to the one we have discussed above in Section 3.1.1, and hence its code is not detailed here. The insertion and look-up of Symbol Table entries are carried out by the following functions, which in turn invoke standard BST handling functions from our Tree Data Structure library, called tree-base.

typedef Node Symbol;
static Symbol *symtab; /* Symbol Table */

void insert_node(Node **r, Node * n);
Node *search_node(Node *r, Node *n);
/* ----- find s in symbol table ----- */
Symbol* lookup(char* s){
        Symbol *sp;
        sp = malloc(sizeof(Symbol));
        name(sp) = malloc(strlen(s)+1);
        strcpy(name(sp), s);
        sp = search_node(symtab, sp);
        return sp;
}
/* ----- install s in symbol table ----- */
Symbol* install(char* s, int t, double d){
        Symbol *sp;
        sp = malloc(sizeof(Symbol));
        name(sp) = malloc(strlen(s)+1);
        strcpy(name(sp), s);
        type(sp) = t;
        val(sp) = (float)d;
        insert_node(&symtab, sp);
        return sp; 
}
void traverse_rec(Node *r);
void traverse(){
  Node * r = symtab;
  traverse_rec(r);
}

In our tree-base library, the BST manipulation functions are written as abstract data types (ADT) and require that application-specific functions be written to handle the comparison of values and input–output from a data structure.

/* ------ comp, display_value ------ */
int comp(Node *i, Node *j){
  return strcmp(i–>w.S, j–>w.S);
}
void display_value(Node *n){
  if(n == NULL) printf(″No value available
″);
  else printf(″[%s](%d){%x}
″, n–>w.S,n–>v.I,n–>u.P);
}

Note that the display_value() matches our definition of the utility fields. As the value associated with a node could be of several types, we simply print it as an unsigned hex value when we want to dump the Symbol Table.

The BST insert and search functions are:

/* ---------- insert ----------- */
void insert_node(Node **r, Node * n){
  if((*r) == NULL)
    {
      (*r) = n;
      L(*r) = NULL;
      R(*r) = NULL;
      return;                 /* tree was empty */
    }
  if(comp(n, (*r)) < 0)
    insert_node(&L (*r), n);
  else
    insert_node(&R (*r), n);
  return;
}
/* ---------- search ---------- */
Node *search_node(Node *r, Node *n){
  if(r == NULL) return NULL;
  if(comp(n, r) < 0)
    return search_node(L(r), n);
  else if(comp(n, r) > 0)
    return search_node(R(r), n);
  else return r;
}

Finally, we would like to have a look at the Symbol Table created, for which we have a traverse() function.

/* ---------- traverse BST ---------- */
/* need to do in-order traversal      */
void traverse_rec(Node *r){
  if(r == NULL)
    return;
  traverse_rec(r–>link[0]);
  display_value(r);
  traverse_rec(r–>link[1]);
}

After writing a suitable main() function, which reads an input file (possibly STDIN), invokes the Scanner, and dumps the Symbol Table, we gave the following input:

bbb = 1
aaa = 2
ccc = 3
r = aaa + bbb + ccc

The output from the Scanner was:

261 61 258 10
261 61 258 10
261 61 258 10
261 61 261 43 261 43 261 10

Note that 61, 10 and 43 are decimal ASCII codes for ‘=’, ‘nl’ and ‘+’, respectively. We also obtained the following output as Symbol Table dump:

[1.0000000e+00](258){3f800000}
[2.0000000e+00](258){40000000}
[3.0000000e+00](258){40400000}
[aaa](263){0}
[bbb](263){0}
[ccc](263){0}
[r](263){0}

Note that the numbers are also stored in the Symbol Table, with their display value as “name”. This is an option we have selected in our design. Also note that what we see as variables is stored as UNDEF (263) and not VAR (261) because the Parser has not yet identified them as validated variables. Note further the zero values for the identifiers and floating-point hex values for the numbers.

We also had another version of this Scanner, modified to print the BST. It gave the following output for the same input.


                                      bbb(263)
                                         / 
                                        /   
                                       /     
                                      /       
                                     /                                          
                          1.0000000e+00(258)  ccc(263)
                                                
                                  aaa(263)      r(263)
                                     /
                          2.0000000e+00(258)
                                     
                          3.0000000e+00(258)

 

In the subsequent chapters, we shall have occasion to refer to the Symbol Table for actions suitable for a particular phase of the compiler.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.118.93.175