6.8 Example: Type Checking in an Interpreter

As an interpreter has to immediately execute a given program, it should be able to do some preliminary type checking. We take an example of a simple interpretive language. Its grammar is:

program :  funcs END;
funcs   :  func
        |  func funcs
        ;  
func    :  typeid ‘(’ typeids ‘)’ ‘:’ expr ;
typeid  :  INT ID
        |  BOOL ID
        ;  
typeids :  typeid
        |  typeid ‘,’ typeids
        ;
expr    :  NUM
        |  ID
        |  expr ‘+’ expr
        |  expr ‘~’ expr
        |  IF expr THEN expr ELSE expr END
        |  ID ‘(’ exprs ‘)’
        |  LET ID ‘=’ expr IN expr END
        ;  
exprs   :  expr
        |  expr ‘,’ exprs
        ;

It is a small functional language, where the program is made up of function definitions. Each function declares its result type and the types and names of its arguments. Functions and variables have separate name spaces. Though the grammar is ambiguous, we will assume that the ambiguities have been resolved during parsing.

The body of a function is an expression. Comparison operation ‘ ‘ is defined both on booleans and integers, but addition ‘+’ only on integers. Numbers, NUM, are integers only. The tokens INT, BOOL, ID, NUM, IF, THEN, ELSE, LET, IN are returned by the Scanner on scanning appropriate lexical representations.

Since a value can be either integer or boolean, the interpreter uses a value type that contains both integer and boolean. The interpreter calls a function error() if any error, including type error, is found.

When an expression is to be evaluated, in addition to the AST of the expression, we need a Symbol Table vtable which binds a variable to its value. Also, in order to handle function calls, we need another Symbol Table ftable which binds a function name to the AST of its declaration. Incidently, this method of handling function definitions is similar to the way it is implemented for hoc6 in “UNIX Programming Environment” by Kernighan and Pike. The result of evaluating an expression is the value of the expression.

We assume that the following functions are available:

getname(ID) extracts the name of an identifier from vtable.

getvalue() returns the value of a number.

eval(expr, vtable, ftable) returns value – either a boolean or an integer.

call(func) handles function calls.

lookup(table, name) lookup the value of a name.

bind(table, name, value) binds ‘value’ to ‘name’ in the ‘table’.

 

We are not interested here in full implementation of the interpreter, but only in how type checking is applied. In view of that, we give various actions taken by the evaluator function eval() in the form of a table:

eval(expr, vtable, ftable) = case expr of
NUM : getvalue(NUM)
ID  : v = lookup(vtable, getname(ID))
      if v = unbound then error()
      else v
expr1 + expr2 : v1 = eval(expr1, vtable, ftable)
                v2 = eval(expr2, vtable, ftable)
                if v1 and v2 are integers then v1 + v2
                else error()
expr1 ~ expr2 : v1 = eval(expr1, vtable, ftable)
                v2 = eval(expr2, vtable, ftable)
                if v1 and v2 are both integers or both booleans 
                    then if v1 = v2 then True else False
                else error()
IF expr1 THEN expr2 ELSE expr3 END:
                v1 = eval(expr1, vtable, ftable)
                if v1 is a boolean then
                   if v1 = True then eval(expr2, vtable, ftable)
                               else eval(expr3, vtable, ftable)
                else  error()
ID(exprs) : def = lookup(ftable, getname(ID))
            if def = unbound then error()
            else args = evals(exprs, vtable, ftable)
                 call(def, args, ftable)
LET ID = expr1  IN expr2 END: v1 = eval(expr1, vtable, ftable)
                        vtable’ = bind(vtable, getname(ID), v1)
                        eval(expr2, vtable’, ftable)
endcase

The function evals builds a list of the values of the expressions in the expression list. A list is written in square brackets with commas between the elements. The operator ‘::’ adds an element to the front of a list.

evals(exprs, vtable, ftable) = case exprs of
expr : ‘[’ eva1(expr, vtable, ftable) ‘]’
expr ‘,’ exprs : eval(expr, vtable, ftable) ‘::’ evals(exprs, vtable, ftable)

Note that all the if predicates on the RHS in the above table are concerned with type checking. For example, interpreting a function call will proceed as follows. A function declaration explicitly declares the types of the arguments. When a function is called, it must be checked that the number of arguments is the same as the declared number, and that the values of the arguments match the declared types.

If this is true then we build a Symbol Table that binds the parameter variables to the values of the arguments and use this in evaluating the body of the function. The value of the body must match the declared result type of the function. So the call() is to be implemented as:

call(func, args, ftable) = case func of
typeid ‘(’ typeids ‘)’ ‘=’ expr : (f,t0) = gettypeid(typeid)
                                  vtable = bindtype(typeids, args)
                                   v1 = eval(expr, vtable, ftable)
                                   if type(v1) = t0 then v1
                                   else error()
endcase

The function gettypeid() just returns a pair of the declared name and type.

gettypeid(typeid) = case typeid of
INT ID : (getname(ID), INT)
BOOL ID : (getname(ID), BOOL)
endcase

The function bindtype() checks the declared type against a value and builds a Symbol Table that binds the name to the value if they match. It also checks if all parameters have different names by checking il the current name is already bound.

bindtype(typeids, args) = case(typeids, args) of
’(’typeid ‘,’’[’ v ‘]’’)’ : (x,t) = gettype(typeid)
                            if type(v) = t then bind(emptytable, x, v)
                            else error()
’(’typeid’,’typeids’,’
’(’v’::’vs’)’’)’          : (x,t) = gettype(typeid)
                            vtable = bindtype(typeids, vs)
                            if lookup(vtable, x) = unbound and type(v) = t
                             then bind(vtable, x, v)
                            else error()
anything else             : error()

A program is executed by calling the main() function, which must be present in any program, with a single argument that is the input to the program. So we build the Symbol Table for the functions, look up main() in this table and call call() with the resulting definition and an argument list containing just the input.

run(program, input) = case program of
funcs : ftable = buildftable(funcs)
        def = lookup(ftable, ‘main’)
        if def = unbound then error()
        else call(def, [input], ftable)

buildftable(funcs) = case funcs of
func  : f = getfname(func)
        bind(emptytable, f, func)
func funcs : f = getfname(func)
             ftable = buildftable(funcs)
             if lookup(ftable, f) = unbound
                then bind(ftable, f, fcun)
             else error()

getfname(func) = case func of
typeid’(’typeids’)’’=’ expr : (f,t0) = gettype(typeid)
                              f

Note that you can implement this interpreter by using yacc to parse the gram mar and yacc actions to specify the activity on the RHS of the above tables, when the corresponding syntactic construct given on the LHS is parsed. You are told to implement a limited portion of this interpreter, with all its type checking in one of the Exercises. Here we give some hints about how type checking can be done.

We shall assume the use of the Node data structure given below:

typedef struct node_struct{
struct node_struct *link[2];
util u,v,w,x,y,z;
}Node;

typedef Node Symbol;
#define L(n)       ((n)–>link[0])
#define R(n)       ((n)–>link[1])
#define name(n)    ((n)–>w.S)
#define type(n)    ((n)–>v.I)
#define ival(n)    ((n)–>u.I)
#define ttree(n)   ((n)–>z.N)
#define subtype(n) ((n)–>y.I)

Note that while type checking identifiers and numerical constants, we have to deal with two levels of types – syntactic types like ID, NUMB, IF, THEN, etc. – and semantic value types like INTEGER, BOOLEAN, FLOAT, etc. Thus an identifier has type ID and subtype INTEGER or BOOLEAN. We have provided for this requirement in the Node structure. Also, with each syntactic token, we have to associate its constructor tree as discussed in Section 6.2. This tree is linked via the link field ttree. Further, note that type Symbol is made equivalent to type Node.

The grammar as given above is ambiguous and we need to take help of operator precedences so that yacc can resolve the ambiguities adequately. The following precedence specification seems reasonable:

%union {
   Symbol *sym; /* symbol table pointer */
}
%token <sym> INT BOOL ID NUM IF THEN ELSE LET IN END UNDEF 
%token   ‘:’ ‘;’
%left    ‘,’
%token   ‘=’
%left    ‘~’
%left    ‘+’

We now show yacc actions for some of the expr constructs. You should compare them against the table given above.

expr  : NUM { $$ = copy_node($1); }
      | ID  { int t; char s[40];
              if((t = type($1) == UNDEF){
                  sprintf(s, ″undefined [%s]
″,name($1)); yyerror(s);}
              else { $$ = copy_node($1);}
            }
      | expr ‘+’ expr { int x,y; char s[40]; x = ival($1); y = ival($3);
                        if(subtype($1) == INT && subtype($3) == INT){
                        $$ = tree_node(gentemp(″plus″));
                        ival($$) = x + y;
                        subtype($$) = subtype($1); type($$) = subtype($1);
                        subtype(install(name($$),subtype($$),ival($$)))
                                       = subtype($1);}
                      else{sprintf(s,″not int [%s %s]
″,name($1),
                          name($3)); yyerror(s);
                      }

In the above, tree_node() is a utility function which generates a new Node with a given name attribute. Unique names with specified prefix are generated by gentemp().

We have given below one more portion of the evaluator – LET construct which substitutes a value for an identifier in an expression.

| LET ID ‘=’ expr { Symbol *sp; sp = lookup(name($2)); type(sp) = ID;
	            subtype(sp) = subtype($4); ival(sp) = ival($4); }
	            IN expr END %prec ‘;’
	          { type($2) = ID; subtype($2) = subtype($4);
                    ival($2) = ival($4); $$ = tree_node(gentemp(″letexpr″));
                    ival($$) = ival($7); subtype($$) = subtype($7);
                    type($$) = type($7);
                    subtype(install(name($$),subtype($$),ival($$))) = subtype($$);
                  }
;

To properly implement the semantics of the LET construct, it is required to update the attributes of the IDentifier in the Symbol Table with values obtained from the expr, before the second expr is computed. This requires two actions – first updates the attributes of the IDentifier and the second computes the second expression and assigns its value to the LHS and inserts in the Symbol Table for diagnostics purpose.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.188.137.58