5.2 Synchronized Semantic Stacks

For context-free grammars, being more general than the restricted grammar with which can be handled by an RDP, we have to use a separate stack to store the semantic information generated during the semantic analysis process. RDP used the return stack of the implementation platform to store this information in local variables created automatically on the return stack. Generally, LL(k), LR(1) or LALR(1) parsers do not have this one-to-one relationship between the Activation Record on the return stack and detection of a particular syntactic construct.

The separate stacks, more than one of which may be used by a parser, must be kept in synchronization with the detection of a particular syntactic construct. Such stacks are called synchronized Semantic stacks (Fig. 5.1).

 

Synchronized Semantic stack used for semantic analysis. Note that the Semantic stack does not push or pop every time the return stack does

 

Fig. 5.1 Synchronized Semantic stack used for semantic analysis. Note that the Semantic stack does not push or pop every time the return stack does

 

Remember that ultimate goal of the semantic analysis is to generate the Intermediate representation (Chapter 8). Thus, the actual details of design and implementation of a semantic analyzer will be governed by the nature of the Intermediate representation.

To illustrate the idea, we implemented a Semantic stack with the RDP for the example grammar, as a part of a more extensive grammar given in Chapter 8. The C code is given in file tb_rdp_if_sstk2.c, some essential code segments from which are given here.

In anticipation of later development, we use the following definition of a Node, defined in file tb_tree_l.h as elements of the Semantic stack.

typedef union{
  struct node_struct * N;
  char *S;
  long  I;
  float F;
  void *P; 
}util;
typedef struct node_struct{
struct node_struct *link[2]; // left = 0, right = 1
util u,v,w,x,y,z;
}Node;

Note that we have defined the util union to give us a great flexibility in associating almost any kind of data in the Semantic stack. Though the node structure is optimized for a binary tree, it is flexible enough to construct any general tree. This can be done as follows:

link[0] = first child 
link[1] = next sibling

                        [n]
                       /   
             [p]=n–>link[0] p–>link[1]
                      /  
                     /    

                   /      
            [q]=p–>link[0] [r]=q–>link[1]
                           [p]–>link[0]–>link[1] … etc.

The original RDP for the example grammar is extended to handle the Semantic stack, by adding stack management functions and statements to obtain an RPN output. The Semantic stack is defined below:

#define SSTK 100
typedef struct semstk{
  Node *s[SSTK];
  int ssp;        // next empty cell 
} sstk;

sstk semstack;

The stack push and pop functions are

int sstk_push(sstk *s, Node *p){
  if(s –> ssp > (SSTK – 1)){
    fprintf(stderr, "SemStack Full
");
     return SSTK; 
  } else {
     s –> s[s –> ssp] = p;
     (s –> ssp)++;
     return s –> ssp; 
  }
}
Node *sstk_pop(sstk *s){
  if(s –> ssp <= 0){
     fprintf(stderr, "SemStack Empty
");
     return NULL; 
  } else {
     (s –> ssp)––;
     return s –> s[s –> ssp]; 
  }
}

Generally, we shall put character strings in the Semantic stack and we have to form a node out of them. The function to do that is

Node *tree_root(char *label){
  Node *np;
  np = (Node *) malloc (sizeof(Node));
  if(np == NULL) 
    exit(–1);
  np –> u.N = NULL;
  np –> v.N = NULL;
  np –> w.S = (char *) malloc(strlen (label) + 2);
  strcpy(np –> w.S, label);
  strcat(np –> w.S,″ ″); // to space when we print out
  return np;
}

We also require, for debugging our code, a stack dump function:

void sstk_displ(sstk *s){
  int n, i;
  n = s –> ssp;
  printf("sem.stack size = %d
", n);
  for(i = 0; i < n; i++){
     printf("%s
", s –> s[i] –> w.S);
  } 
}

Some of the Terminal handling functions do not push or pop anything on the Semantic stack. They are:pSEMI(), pIF(), pFI(), pELSE(), pTHEN(), plp(), prp(). They simply check that the symbol read is as required by the syntax. We shall call them delimiter functions, as they possibly indicate some action within the RDP, but do not deal with the Semantic stack.

For example,

pIF(){
  printf("Entry: pIF()
");
  if(symbol() != IF){
    fprintf(stderr, "Expecting I");
    exit(–1);
  }
}

On the other hand, there are Terminals handling functions which push a string on the Semantic stack. They are:pASSGN(), pa(), pplus(), ppast(). For example,

pASSGN(){
  printf("Entry: pASSGN()
");
  if(symbol() != ASSGN){
     fprintf(stderr, "Expecting =");
     exit(–1);
  }
  sstk_push(&semstack,tree_root("="));
}
pa(){
  printf("Entry: pa()
");
  int c;
  char str[4];
  if((c = symbol()) < ‘a’ || c > ‘z’ ){ 
      fprintf(stderr, "Expecting a–z");
      exit(–1);
  }
  sprintf(str,"%c",c);
  sstk_push(&semstack,tree_root(str)); 
}

Note that the function pa() allows us to use any small alphabetic character as a variable or number name.

Here are the rules by which the Semantic stack is operated:

Terminal functions (nondelimiters): Push the semantic value of the Terminal token, which generally is itself, on the Semantic stack. We have already seen examples of this rule above.

NonTerminal functions: Just after calling a non-delimiter Terminal function pop the stack and if required by the output Intermediate language, hold the value in a temporary variable, to be issued to the output later. For example, code for pE() is given below:

pE(){
  Node *np;
  printf("Entry: pE()
");
  pT();
  while(nexttoken() == ‘+’){
    pplus();
    np = sstk_pop(&semstack); // hold
    pT();
    strcat(out, np –> w.S); // to get RPN 
  }
}

The code for pT() is similar.

NonTerminal functions: Some of them will have to manipulate the order of tokens extracted from the Semantic stack, in order to get a proper RPN. For example,

basic_statement(){
  Node *np1, *np2;
  printf("Entry: basic–stsmt
");
  pa();
  np1 = sstk_pop(&semstack);        // LHS of assignment
  pASSGN();
  np2 = sstk_pop(&semstack);        // ‘=’
  pE();
  strcat(out, np1 –> w.S); strcat(out, np2 –> w.S); 
}

NonTerminal functions: It require additional semantic information to be inserted in the Intermediate representation output, for example, IF statement:

if_statement(){
  char lab1[10], lab2[10];
  printf(“Entry: IF–stsmt
”);
  pIF();
  pE();                                      // expr
  strcpy(lab1,genlabel());
  strcat(out,lab1); strcat(out, "BF ");
  pTHEN();
  statement_list();                          // stmt1
  if(nexttoken() == FI){
     pFI();
     strcat(out,lab1); strcat(out,": ");
  } else {
     pELSE();
     strcpy(lab2,genlabel());
     strcat(out,lab2); strcat(out, "BR ");
     strcat(out,lab1); strcat(out,": ");
     statement_list();                       //stmt2
     pFI();
     strcat(out,lab2); strcat(out,": "); 
  }
}

Utility functions: For example, a function to generate unique labels for control constructs like IF, WHILE, GOTO, etc.:

int nextlabel = 1;
char *genlabel(){
  char *s;
  s = malloc(8);
  sprintf(s, "LAB%d ", nextlabel++);
  return s;
}

In our example program tb_rdp_if_sstk2.c, the main() function, besides doing routine setup, reads the input string and prints out the final Intermediate representation in RPN.

We gave a string a = s * (d + f) as input and received RPN s d f + * a = as output. The following trace of the program shows the working of the synchronized Semantic stack.

                            Return–Stack      Semantic–Stack   In calling func. 
Entry: stsmt–list
Entry: stsmt	             push
Entry: basic–stsmt           push
Entry: pa()	             push,pop          push(a),pop      hold1(a)
Entry: pASSGN()	             push,pop          push(=),pop      hold2(=)
Entry: pE()	             push
Entry: pT()	             push
Entry: pF()	             push
Entry: pa()	             push,pop,pop      push(s),pop      out(s)
Entry: ppast()	             push,pop          push(*),pop      hold3(*)
Entry: pF()                  push
Entry: plp()	             push,pop
Entry: pE()	             push
Entry: pT()	             push
Entry: pF()	             push
Entry: pa()	             push,3 pop's      push(d),pop      out(d)
Entry: pplus()	             push,pop          push(+),pop      hold4(+)
Entry: pT()	             push
Entry: pF()	             push
Entry: pa()	             push,3 pop's      push(f),pop      out(f)
Entry: prp()	             push,4 pop’s	                hold4 to out(+)
                                    pop          	        hold3 to out(*)
                                  2 pop’s	                hold1 to out(a)
                                                                hold2 to out(=)
DONE
s d f + * a =

If we wanted different types of Intermediate representation, we would have manipulated the Semantic stack differently (Chapter 8).

5.2.1 Semantic Stack in yacc

We have already introduced a well-known parser generator yacc in Chapter 4, Section 4.4, and introduced there the basic semantic facilities in yacc.

Internally, yacc maintains two stacks in memory – a parse stack and a value (or attribute) stack. The parse stack contains Terminals and non-terminals and represents the current parsing state. The value stack is an array of YYSTYPE elements, and it associates a value with each element in the parse stack. For example, when lex returns an INTEGER token, yacc shifts (pushes) this token to the parse stack. At the same time, the corresponding yylval is shifted to the value stack. The parse and value stacks are always synchronized, so finding a value related to a token on the stack is easily accomplished. Here is a portion of the yacc input specification for a calculator:

expr:
       INTEGER         { $$ = $1; }
       | expr ‘+’ expr { $$ = $1 + $3; }
       | expr ‘–’ expr { $$ = $1 – $3; }
       ;

The grammar for expr is augmented by adding additional terms, called Action terms or symbols (Section 5.3). These action terms are enclosed in {}. Do not confuse these curly brackets with those used in C language, though the action term does generate a corresponding C code.

Here within the action terms, the yacc symbol $$ denotes the semantic value associated with the LHS of a grammar rule. Symbols $1 and $3, respectively, represent the values associated with the 1st and 3rd terms in a grammar rule, i.e. the two expr terms.

In the example, the augmented grammar says that if an INTEGER is detected, the value to be assigned to LHS ($$) is that given by the Scanner as yylval ($1). If expr ‘+’ expr is parsed, the value to be assigned to LHS is to be obtained by adding the value of the first expr and the second expr. Similarly for expr ‘–’ expr, actually what happens is this: when yacc applies the rule,

expr: expr ‘+’ expr { $$ = $1 + $3; }

it replaces the RHS of the production in the parse stack with the LHS of the same production. In this case, it pops expr ‘+’ expr and pushes expr. It has now reduced the stack by popping three terms off the stack and pushing back one term. The yacc symbol $$ designates the top of the value stack after reduction has taken place. The above action adds the value associated with two expressions, pops three terms off the value stack and pushes back a single sum. Thus, the parse and value stacks remain synchronized.

The value (or the attribute in theoretical literature) assigned to the LHS of a grammar rule by the method discussed above is called Synthesized attribute, as it is created out of the attributes of the RHS of the same rule, see Section 5.4.4 for more details.

In fact, yacc has some more facilities for accessing the semantic values, which we shall discuss at an appropriate place.

There is one aspect of the Semantic stack in yacc which we have not mentioned till now.

The semantic value of a grammar symbol can be almost anything:

  • an integer, a float, a memory address or pointer, a string, even a data structure
  • how does yacc handle that?

The value (Semantic) stack is formed as an array of cells of type YYSTYPE, which is also the type of yylval. In yacc, the input file defines YYSTYPE by a union construct similar to, for example,

%union {
  int i;      /* integer value      */
  float f;    /* float value        */
  char s;     /* symbol table index */
  Node *np;   /* node pointer       */
};

This results in the following code being included in the yacc output header file y.tab.h:

typedef union {
     int i;	/* integer value      */
     float f;	/* float value        */
     char s;	/* symbol table index */
     Node *np;	/* node pointer       */
} YYSTYPE;
extern YYSTYPE yylval;

The type of semantic value for each type of term in the grammar is defined at the beginning of the yacc input file as:

%token <i> INTEGER
%type  <f> expr

This binds expr to f, and INTEGER to i in the YYSTYPE union. With this binding, yacc automatically generates a correct code, selecting a proper union member from YYSTYPE.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.191.234.150