Chained Hash Table Example: Symbol Tables

An important application of hash tables is the way compilers maintain information about symbols encountered in a program. Formally, a compiler translates a program written in one language, a source language such as C, into another language, which is a set of instructions for the machine on which the program will run. In order to maintain information about the symbols in a program, compilers make use of a data structure called a symbol table. Symbol tables are often implemented as hash tables because a compiler must be able to store and retrieve information about symbols very quickly.

Several parts of a compiler access the symbol table during various phases of the compilation process. One part, the lexical analyzer, inserts symbols. The lexical analyzer is the part of a compiler charged with grouping characters from the source code into meaningful strings, called lexemes. These are translated into syntactic elements, called tokens , that are passed on to the parser . The parser performs syntactical analysis. As the lexical analyzer encounters symbols in its input stream, it stores information about them into the symbol table. Two important attributes stored by the lexical analyzer are a symbol’s lexeme and the type of token the lexeme constitutes (e.g., an identifier or an operator).

The example presented here is a very simple lexical analyzer that analyzes a string of characters and then groups the characters into one of two types of tokens: a token consisting only of digits or a token consisting of something other than digits alone. For simplicity, we assume that tokens are separated in the input stream by a single blank. The lexical analyzer is implemented as a function, lex (see Examples Example 8.4 and Example 8.5), which a parser calls each time it requires another token.

The function works by first calling the next_token function (whose implementation is not shown) to get the next blank-delimited string from the input stream istream. If next_token returns NULL, there are no more tokens in the input stream. In this case, the function returns lexit, which tells the parser that there are no more tokens to be processed. If next_token finds a string, some simple analysis is performed to determine what type of token the string represents. Next, the function inserts the lexeme and token type together as a Symbol structure into the symbol table, symtbl, and returns the token type to the parser. The type Symbol is defined in symbol.h, which is not included in this example.

A chained hash table is a good way to implement a symbol table because, in addition to being an efficient way to store and retrieve information, we can use it to store a virtually unlimited amount of data. This is important for a compiler since it is difficult to know how many symbols a program will contain before lexical analysis.

The runtime complexity of lex is O (1), assuming next_token runs in a constant amount of time. This is because lex simply calls chtbl_insert, which is an O (1) operation.

Example 8.4. Header for a Simple Lexical Analyzer
/*****************************************************************************
*                                                                            *
*  --------------------------------- lex.h --------------------------------  *
*                                                                            *
*****************************************************************************/

#ifndef LEX_H
#define LEX_H

#include "chtbl.h"

/*****************************************************************************
*                                                                            *
*  Define the token types recognized by the lexical analyzer.                *
*                                                                            *
*****************************************************************************/

typedef enum Token_ {lexit, error, digit, other} Token;

/*****************************************************************************
*                                                                            *
*  --------------------------- Public Interface ---------------------------  *
*                                                                            *
*****************************************************************************/

Token lex(const char *istream, CHTbl *symtbl);

#endif
Example 8.5. Implementation of a Simple Lexical Analyzer
/*****************************************************************************
*                                                                            *
*  --------------------------------- lex.c --------------------------------  *
*                                                                            *
*****************************************************************************/

#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#include "chtbl.h"
#include "lex.h"
#include "symbol.h"

/*****************************************************************************
*                                                                            *
*  ---------------------------------- lex ---------------------------------  *
*                                                                            *
*****************************************************************************/

Token lex(const char *istream, CHTbl *symtbl) {

Token              token;

Symbol             *symbol;

int                length,
                   retval,
                   i;

/*****************************************************************************
*                                                                            *
*  Allocate space for a symbol.                                              *
*                                                                            *
*****************************************************************************/

if ((symbol = (Symbol *)malloc(sizeof(Symbol))) == NULL)
   return error;

/*****************************************************************************
*                                                                            *
*  Process the next token.                                                   *
*                                                                            *
*****************************************************************************/

if ((symbol->lexeme = next_token(istream)) == NULL) {

   /**************************************************************************
   *                                                                         *
   *  Return that there is no more input.                                    *
   *                                                                         *
   **************************************************************************/

   free(symbol);
   return lexit;

   }

else {

   /**************************************************************************
   *                                                                         *
   *  Determine the token type.                                              *
   *                                                                         *
   **************************************************************************/

   symbol->token = digit;
   length = strlen(symbol->lexeme);

   for (i = 0; i < length; i++) {

      if (!isdigit(symbol->lexeme[i]))
         symbol->token = other;

   }

   memcpy(&token, &symbol->token, sizeof(Token));

   /**************************************************************************
   *                                                                         *
   *  Insert the symbol into the symbol table.                               *
   *                                                                         *
   **************************************************************************/

   if ((retval = chtbl_insert(symtbl, symbol)) < 0) {

      free(symbol);
      return error;

      }

   else if (retval == 1) {

      /***********************************************************************
      *                                                                      *
      *  The symbol is already in the symbol table.                          *
      *                                                                      *
      ***********************************************************************/
      
      free(symbol);

   }

}

/*****************************************************************************
*                                                                            *
*  Return the token for the parser.                                          *
*                                                                            *
*****************************************************************************/

return token ;

}
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.16.48.181