Chapter 9

Symbol Table

Symbol table organization is important for improving the efficiency of the compiler. It is important to understand the different forms of the symbol table and how each affects performance when retrieving variable information.

CHAPTER OUTLINE 

The symbol table is an important data structure that stores information about all the variables and procedures in a program. The table is created during the first phase and is used by all subsequent phases to insert or retrieve information. The way elements are organized in the symbol table has a direct impact on compiler performance. This chapter focuses on what details are stored in the symbol table, how they are organized, and how the arrangement of these elements affects retrieval time. The focus is mainly on simple array structures, linked lists, trees, and hash tables, for both block structured and non-block structured languages.

9.1  Introduction

So far we have discussed the different phases of the compiler, each performing some specific task. The main objective of any compiler is to generate a target code that corresponds to a given source code in terms of task execution, correctness, and meaningfulness. These objectives are achieved with the support of a special data structure called the symbol table. It is a data structure that stores information about the name, type, and scope of the variables for performing the tasks defined in all the phases of a compiler.

The symbol table is created during the lexical analysis phase and is maintained till the last phase is completed (Table 9.1). It is referred to in every phase of the compiler for some purpose or the other.

 

Table 9.1 Use of the symbol table in each phase

S.No.  Phase              Usage
1.     Lexical            Creates a new entry for each new identifier.
2.     Syntax             Adds attribute information such as type, scope, dimension, line of reference, and line of use.
3.     Semantic           Uses the available information to check semantics.
4.     Intermediate       Helps to add information about temporary variables.
5.     Optimization       Helps in machine-dependent optimization by considering address and aliased-variable information.
6.     Code Generation    Generates the code by using the address information of the identifiers.

It is clear that every time the compiler finds a new identifier in the source code during the lexical analysis phase, it needs to check whether this identifier is already in the table and, if not, store it there. Thus, during the lexical phase, every insertion operation is preceded by a search operation. During the other phases, the compiler searches the symbol table to access the attribute information of existing entries.

The two basic operations that are most often performed on the symbol table are insert—to add new entries—and lookup—to find existing entries. The performance of the table depends on how its elements are arranged. The different mechanisms that govern the performance of the table are the linear list, the hierarchical list, and hash-based lists. These mechanisms are evaluated based on the number of insertions (n) and the number of lookup operations (e). A linear list is the simplest to implement, but its performance is poor when n and e are large. A hierarchical list gives performance proportional to n(log(n)) + e(log(n)), where n is the number of insertions and e is the number of lookup operations. Hashing schemes provide better performance at the cost of greater programming effort and space overhead.

Apart from organizing the elements, the size of the table is also an important factor. The size can be fixed when the compiler is written. The fixed size has a limitation—if chosen small, it cannot store more variables; if chosen large, a lot of space is wasted. It is important for the symbol table to be dynamic to allow the increase in size at compile time.

The entries in the symbol table are made during the lexical phase as soon as the name is seen in the declaration statement of the input program. The information on the attributes is added when available. In some cases, the attribute information is added along with entry, on the first appearance of the variable.

For example, consider the following C declarations:

 

char NAME[20];

int AGE;

char ADDRESS[30];

int PHONENO[10];

For block structured languages, the entries are made when the syntactic role played by the name is discovered. The attributes are entered as the action associated with recognizing the lexeme of an identifier token in a declaration statement. This action is performed at every colon encountered in the sequence of the variable list.

9.2  Symbol Table Entries

Each entry in the symbol table is associated with attributes that support the compiler in different phases. Not all of these attributes are important for every compiler, but each should be considered for a particular implementation.

  • Name
  • Size
  • Dimension
  • Type
  • Line of declaration
  • Line of usage
  • Address

There is a distinction between the token id for an identifier or name, the lexeme consisting of the character string forming the name, and the attributes of the name. The lexeme is used mainly to distinguish one variable from another when both are categorized as the same token type. During lexical analysis, when a lexeme corresponding to an identifier is found, the compiler first looks it up in the symbol table, and if it does not appear there, an entry is made. A common representation of a name is a pointer to its symbol table entry.
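
For concreteness, a single entry can be pictured as a record such as the following C sketch; the field names and sizes are illustrative assumptions, not those of any particular compiler.

#include <stddef.h>

/* A minimal sketch of one symbol table record (field names are illustrative). */
struct SymbolRecord {
    char          name[32];    /* lexeme, assuming an upper limit on its length */
    char          type[16];    /* e.g. "int", "float", "char"                   */
    size_t        size;        /* storage size in bytes                         */
    int           dimension;   /* 0 for scalars, n for n-dimensional arrays     */
    int           decl_line;   /* line of declaration                           */
    int           use_line;    /* most recent line of use                       */
    unsigned long address;     /* relative address assigned in later phases     */
};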

Not all of these attributes are of a fixed size, so their space requirement in the symbol table is not always constant. For instance, the length of a variable's name is language dependent. If there is an upper limit on the length, then the characters of the name can be stored in the symbol table entry itself, as shown in Figure 9.1.

Figure 9.1 Symbol table with attributes stored in fixed-size space within a record

In languages like BASIC and FORTRAN, names are limited to a small length (two and six characters, respectively). In such languages it is better to store the name directly in the symbol table.

If there is no fixed limit on the length, then it is not feasible to fix the size of the name field in the symbol table. The solution in such a case is to use an indirect scheme: instead of storing the variable name itself, we store the address of the location where the name is kept. The indirect scheme permits the size of the name field of the symbol table entry itself to remain constant.

In the indirect scheme, the complete lexeme constituting a name is stored in a separate location, and all uses of the same name can still be associated with its symbol table record. The following table shows the entries in the symbol table for the earlier example when fixed-size space is used for the names.

In languages like C, C++, Java, etc., the variable name can vary from one character to 31 characters. In such cases the indirect scheme provides a better way of storing the information.

Figure 9.2 Array to store all name attributes

With the indirect scheme, the name attribute in the symbol table holds two values: the starting address of the name in the separate name array (Figure 9.2) and its length. This approach saves considerable space, but it has a disadvantage when the compiler looks for attribute information: it reads the name location of the first entry, checks whether it is the required variable, and, if not, moves on to the next symbol table entry, repeating the process until the required variable's information is found. Considerable time can be spent searching for the required information.
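
A minimal sketch of this indirect scheme is shown below; the array and field names are illustrative, not taken from any particular compiler, and bounds checks are omitted for brevity.

#include <string.h>

#define NAME_POOL_SIZE 4096
#define MAX_SYMBOLS    256

static char name_pool[NAME_POOL_SIZE];   /* all lexemes stored back to back */
static int  pool_used = 0;

struct NameRef {
    int start;    /* index of the first character of the name in name_pool */
    int length;   /* number of characters in the name */
    /* ... other attributes such as type and address ... */
};

static struct NameRef table[MAX_SYMBOLS];
static int n_entries = 0;

/* Insert a name into the pool and record its (start, length) pair. */
int insert_name(const char *name)
{
    int len = (int)strlen(name);
    table[n_entries].start  = pool_used;
    table[n_entries].length = len;
    memcpy(&name_pool[pool_used], name, (size_t)len);
    pool_used += len;
    return n_entries++;              /* index of the new entry */
}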

9.3  Operations on the Symbol Table

The operations performed on the symbol table are dependent on whether the language is block structured or non-block structured. In case of non-block structured languages, there is only one instance of the variable declaration and its scope is throughout the program. In block structured languages, the variables may be re-declared and its scope is within the block. To handle the variable information, some more operations are required to keep track of the scope of the variable.

For non-block structured languages the operations are:

  • Insert
  • Lookup

For block-structured languages the operations are:

  • Insert
  • Lookup
  • Set
  • Reset

Insert (s,t) performs the insertion of string s and token t and returns the index of this new entry.

Lookup(s) searches for string s; if found, it returns the index, from which the attributes can be retrieved; otherwise, it returns 0.

The set operation is performed on every block entry to mark the beginning of a new scope for the variables declared in that block. Every subsequent insert and lookup operation depends on this information.

Reset is performed at every block exit, to remove all declarations inside this scope and the details of the block.

Set and reset operations are performed in block structured languages to keep the scope information of the variables. Since scopes behave like stacks, the best way to implement these operations is to use a stack. When a new scope begins, a special marker is pushed onto the stack to indicate the beginning of the block. New declarations are then inserted above this marker. At block exit the scope ends, so all the elements inserted above the marker are popped off the stack, as sketched below.
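
The following is a minimal sketch of this marker-based scheme; the names set_scope, push_decl, and reset_scope are chosen here for illustration.

#define STACK_MAX  256
#define SCOPE_MARK (-1)          /* special marker separating scopes */

static int scope_stack[STACK_MAX];  /* holds symbol table indices or markers */
static int top = 0;

/* Set: mark the beginning of a new scope at block entry. */
void set_scope(void)  { scope_stack[top++] = SCOPE_MARK; }

/* Insert: push the symbol table index of a new declaration. */
void push_decl(int sym_index) { scope_stack[top++] = sym_index; }

/* Reset: at block exit, pop every declaration made in the current scope. */
void reset_scope(void)
{
    while (top > 0 && scope_stack[--top] != SCOPE_MARK)
        ;   /* each popped index would also be removed from the symbol table */
}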

9.4  Symbol Table Organization

We can use any data structure for the symbol table. The performance of a compiler depends on the organization of the symbol table and on the type of declaration supported. Some programming languages support implicit declaration: a variable can either be declared explicitly or used directly without declaration. For example, in FORTRAN variables can be used without declaration; a variable is considered int if its name starts with I, J, K, L, M, or N, and real otherwise. In other languages, all declarations appear at the start of the program and the variables are used later. With explicit declarations, all insert operations are performed first, followed by the lookup operations; with implicit declarations, the insert and lookup operations may be performed at any time. The various ways a symbol table can be organized are listed below, and each has its own advantages and disadvantages.

  • Linear list
    • Array
      • Ordered
      • Unordered
    • Linked list
      • Ordered
      • Unordered
    • Hierarchical
      • Binary search tree
      • Height balanced tree
  • Hash table.

9.5  Non-block Structured Language

9.5.1  Linear List in Non-block Structured Language

It is the simplest form of organizing the symbol table for adding or retrieving attributes. The list can be ordered or unordered. Consider the example program shown in Figure 9.3.

 

void Number(int Mike, int Alice, int John_Smith, float F=1.0)

{

printf( "Enter value of Mike " );

scanf("%d",&Mike);

printf( "Enter value of Alice ");

scanf("%d",&Alice);

John_Smith = 3*Mike + 2*Alice + 2;

printf("%d ",3*Mike + 2*Alice + 11);

printf("%d ",John_Smith);

John_Smith=Mike + Alice+1000000;

printf("A million more than Mike and Alice %d ", John_Smith);

F = F + (float)Mike + 3.1415265;

printf("F as an Integer %d ", (int)F);

}

Figure 9.3 Example program

In the above example, the variables Mike, Alice, and John_Smith are declared as integers, and F is declared as a float with the value 1.0 assigned to it.

9.5.1.1  Ordered List

In an ordered symbol table, the entries are lexically ordered on the variable names. Every insertion operation must be preceded by a lookup operation to find the position of the new entry. Two search techniques can be applied, namely linear search and binary search. Figure 9.4 shows the order in which these variables are inserted.

Figure 9.4 Insert operation in ordered list

Performance:

The insert operation has the overhead of moving elements to make room at the position where the new element is inserted. On average, the lookup time is (n + 1)/2 comparisons if linear search is used. This can be improved by using binary search, for which the time is proportional to log n.
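
A minimal sketch of lookup by binary search over a lexically ordered array is given below; the entry layout is the illustrative one used earlier.

#include <string.h>

struct Entry { char name[32]; /* ... other attributes ... */ };

/* Return the index of name in a lexically ordered table, or -1 if absent. */
int lookup_ordered(const struct Entry *table, int n, const char *name)
{
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = (low + high) / 2;
        int cmp = strcmp(name, table[mid].name);
        if (cmp == 0) return mid;
        if (cmp < 0)  high = mid - 1;
        else          low = mid + 1;
    }
    return -1;   /* the caller would insert at position low to keep the order */
}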

9.5.1.2  Unordered List

As shown in Figure 9.5, it is easy to insert elements into an unordered list, since each new element is inserted at the end. The lookup time increases, however, because the entire list may have to be searched to fetch a variable's information.

Figure 9.5 unordered list – Insert

Performance:

If the language supports explicit declaration, then there is no need to perform a lookup operation for every insertion; however, to avoid duplication, a complete check of the table may be needed after all insertions. The lookup operation requires, on average, a search length of (n + 1)/2, assuming there are n records in the table. This value is derived from the possible positions of the element as the comparisons are made:

      average search length = (1 + 2 + … + n)/n = (n + 1)/2

If the element is not found, it indicates an error condition and the error handler should handle it.

If variables are declared implicitly, attributes are added in the order in which the variables are encountered during compilation. Every insert operation must be preceded by a lookup operation; if the lookup returns 0, the variable is absent and must be inserted. In the worst case, all n elements must be checked before the new element is inserted.

If only the lookup operation is performed, then on average it requires (n + 1)/2 comparisons. Let the ratio of first variable references to total variable references be denoted by x; then the combined lookup and insertion process requires, on average,

      x · n + (1 − x) · (n + 1)/2

comparisons.

This unordered table organization should be used only if the number of variables is small or the table size is small, since the average time for lookup and insertion is directly proportional to the table size.

Such tables are not suitable where insert and lookup operations are applied constantly; they are, however, a good choice for storing the reserved words of a language. The main drawbacks of arrays are the search overhead and the fixed size, both of which are overcome by the linked list.

9.5.2  Linked List or Self-organizing Tables

9.5.2.1  Ordered list

The variable information is inserted in the sequence encountered, as shown in Figure 9.6. Each node requires an extra field to store the pointer to the next node, and the last node in the list holds NULL.

Figure 9.6 Insertion sequence

For each insertion, the list is first searched to check whether the element is already present; if not, a new node is created and placed in order by changing only two link pointers. This lets insertion avoid the overhead of moving elements. The time to insert an element is O(n) for the search plus O(1) for the link adjustment. The lookup operation is always sequential, with worst case O(n).
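
A minimal sketch of ordered insertion into such a list, with illustrative names, is shown below.

#include <stdlib.h>
#include <string.h>

struct Node {
    char         name[32];
    struct Node *next;
    /* ... other attributes ... */
};

/* Insert name into the lexically ordered list if it is not already present. */
struct Node *insert_ordered(struct Node *head, const char *name)
{
    struct Node **link = &head;
    while (*link != NULL && strcmp((*link)->name, name) < 0)
        link = &(*link)->next;                 /* find the insertion point */
    if (*link != NULL && strcmp((*link)->name, name) == 0)
        return head;                           /* already present: ignore */
    struct Node *node = malloc(sizeof *node);
    strcpy(node->name, name);                  /* assumes the name fits */
    node->next = *link;                        /* only two links change */
    *link = node;
    return head;
}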

9.5.2.2  Unordered Linked List

Figure 9.7 Insertion in unordered list

In the case of an unordered list, the time for insertion is O(1), since the insertion is done at the head node as shown in Figure 9.7. Each lookup operation, however, has worst case time O(n), since it may have to search the entire list.

9.5.3  Hierarchical List

A binary search tree is a more efficient approach to symbol table organization. Two links, left and right, are added to each record, and these links point to other records in the search tree. Whenever a new name is found, a decision is made either to add it or to ignore it: the name is first searched for in the tree, and if it already exists it is ignored; otherwise a new record is created for the name and inserted as a leaf node. This organization maintains lexicographic order, that is, all the names accessible from namei that precede namei in alphabetical order are found by following its left link, and all the names that follow namei in alphabetical order are found by following its right link. The expected time needed to enter n names and to make m queries is proportional to (m + n) log2 n, so for a larger number of records this method has advantages over linear list organization. For the example program, the tree structure is built as shown below in Figure 9.8; a code sketch of the insertion follows the figure.

  1. On inserting first variable Mike.
    image
  2. On inserting the second variable Alice.
    image
  3. On inserting the third element John_Smith.
    image
  4. On inserting the last element F the tree is as shown below.

    Figure 9.8 Insertion in hierarchical list
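
A minimal sketch of insertion into such a binary search tree is shown below; the node layout is illustrative.

#include <stdlib.h>
#include <string.h>

struct TreeNode {
    char             name[32];
    struct TreeNode *left, *right;
    /* ... other attributes ... */
};

/* Insert name as a leaf if it is not already in the tree; return the root. */
struct TreeNode *bst_insert(struct TreeNode *root, const char *name)
{
    if (root == NULL) {
        struct TreeNode *node = malloc(sizeof *node);
        strcpy(node->name, name);          /* assumes the name fits */
        node->left = node->right = NULL;
        return node;
    }
    int cmp = strcmp(name, root->name);
    if (cmp < 0)      root->left  = bst_insert(root->left, name);
    else if (cmp > 0) root->right = bst_insert(root->right, name);
    /* cmp == 0: the name already exists, so it is ignored */
    return root;
}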

From the above example, it is clear that the tree structure may not always give better performance. If the tree formed is skewed (each level has only one node), then the time complexity again depends on the number of elements in the program. This can be overcome by using height-balanced trees such as AVL trees or 2–3 trees.

9.5.3.1  AVL Trees

An AVL tree is similar to a binary search tree but performs balancing operations after insertions and deletions that leave the tree unbalanced. These operations are called rotations, and they restore the height balance of the sub-trees.

Lookup

Lookup in an AVL tree is performed exactly as in an unbalanced binary search tree; the only difference lies in the time taken to execute the operation, which is proportional to the height of the tree and is therefore O(log n). The efficiency can be increased further by adding an index number to each node so that elements can also be retrieved by index.

After a node has been located in the tree, its parent or child nodes can be explored in amortized constant time. A traversal needs to keep track of at most about 2 × log(n) links at a time, and exploring all n nodes of the tree follows about 2 × (n − 1) links in total, that is, roughly 2 links per node on average.

Insertion

Every node in an AVL tree is associated with a balance factor bf, which is the difference between the heights of its left and right sub-trees.

 

      bf = Hl – Hr

 

The permissible values are –1, 0, or +1. If the value is 0, the node is balanced. If the value is +1, the node is said to be left heavy, as the left sub-tree is one level taller than the right sub-tree. If it is –1, the node is right heavy, as the right sub-tree is one level taller than the left sub-tree. However, if the balance factor becomes ±2, the sub-tree rooted at this node is unbalanced, and a rotation is applied to balance it.

After inserting a node, the balance factor is adjusted for all the nodes that lie on the path from the point of insertion to the root. If any node has a value of ±2, the appropriate balancing rule is applied to rebalance it, as shown in Figures 9.9 to 9.12.

The following figures explain how rotations can rebalance the tree, proceeding toward the root and updating the balance factor of the nodes that lie in the path. There are four types of rotations where two are symmetric to the other two in opposite directions.

Right–Right (RR)

This case occurs when X is the root of the sub-tree, Y is the right child of X, and Z is the right child of Y. Let the heights of XL, YL, ZL, and ZR be H. Then the bf of Z is 0, of Y is –1, and of X is –2; hence, node X is critical. Since the node is critical to the right and its right child is right heavy, we apply the Right–Right (single) rotation. The resultant tree has node Y as the root of the sub-tree, with X as its left child and Z as its right child. The remaining sub-trees are re-attached as shown in Figure 9.9.

Figure 9.9 Right–Right (RR) rotation

Right–Left (RL)

This case occurs when X is the root of the sub-tree, Z is the right child of X, and Y is the left child of Z. Let the heights of XL, YL, YR, and ZR be H. Then the bf of Y is 0, of Z is +1, and of X is –2; hence, node X is critical. Since the node is critical to the right and its right child is left heavy, we apply the Right–Left (double) rotation. The resultant tree has node Y as the root of the sub-tree, with X as its left child and Z as its right child. The remaining sub-trees are re-attached as shown in Figure 9.10.

Figure 9.10 Right–Left (RL) rotation

Left–Left (LL)

This case occurs when Z is the root of the sub-tree, Y is the left child of Z, and X is the left child of Y. Let the heights of XL, XR, YR, and ZR be H. Then the bf of X is 0, of Y is +1, and of Z is +2; hence, node Z is critical. Since the node is critical to the left and its left child is left heavy, we apply the Left–Left (single) rotation. The resultant tree has node Y as the root of the sub-tree, with X as its left child and Z as its right child. The remaining sub-trees are re-attached as shown in Figure 9.11.

Figure 9.11 Left–Left (LL) rotation

Left–Right (LR)

This case occurs when Z is the root of the sub-tree, X is the left child of Z, and Y is the right child of X. Let the heights of XL, YL, YR, and ZR be H. Then the bf of Y is 0, of X is –1, and of Z is +2; hence, node Z is critical. Since the node is critical to the left and its left child is right heavy, we apply the Left–Right (double) rotation. The resultant tree has node Y as the root of the sub-tree, with X as its left child and Z as its right child. The remaining sub-trees are re-attached as shown in Figure 9.12.

Figure 9.12 Left–Right (LR) rotation

Deletion

Deletion of a node is similar to deletion in a binary search tree. After deletion, the balance factors of the nodes on the path back to the root are adjusted. If the balance factor of a node becomes +2 or –2, a rotation is performed at the root of that sub-tree, chosen according to the balance factor of its left or right child.

While retracing toward the root, examine the balance factor of each node on the path and do one of the following.

  • If the balance factor is either –1 or +1, the height of that sub-tree is unchanged, so stop adjusting.
  • If the balance factor is 0, the height of the sub-tree has decreased by one; continue retracing toward the root.
  • If the balance factor is either –2 or +2, the node is critical and a rotation is applied; after the rotation, continue retracing toward the root.

The time required for the deletion operation is O(log n), since the time for the lookup and for adjusting nodes on the way back to the root is O(log n) + O(log n).

For the previous example, the AVL tree would be constructed as follows.

  1. On inserting first variable Mike.
    image
  2. On inserting the second variable Alice.
    image
  3. On inserting the third element John_Smith, a rotation is applied to balance the tree structure.
    image
  4. On inserting the last element F the tree is as shown below in Figure 9.13.
    Figure 9.13

9.5.4  Hash Tables

The data structures discussed so far have time complexities that are expressed as functions of n; as n becomes large, the time requirement also increases. The hash table is a special structure whose operation time is, ideally, independent of n.

Let there be m locations and n elements whose keys are unique. If m ≥ n, then each element can be stored in a table T of m locations, so that the hash function applied to the key K yields a location Ti. If location Ti is empty, the element is inserted there; otherwise, a second strategy is applied to insert the element. When searching for a key K, location Ti is examined and the element is returned if found; otherwise NULL is returned. A sample table is shown in Figure 9.14.

To use the hash table technique, it requires the keys to be unique. Also the range of keys must be bounded with the addresses of the locations.

Note: If keys are not unique, various mechanisms can be adopted. A simple method is to construct a set of m lists and store the heads of these lists in the direct-address table. If the elements in the collection have at least one distinguishing feature other than the key, then the search for a specific element in that collection depends on the maximum number of duplicates.

Figure 9.14 Sample hash table

Mapping Function

The direct-address approach requires a hash function h(K) that maps the key K to one of the locations in the range (1, m), where 1 and m indicate the range of table addresses. It is said to be a perfect mapping function if it is one-to-one, since this results in a search time of O(1).

It is easy to define such a perfect hash function theoretically, but in practice it is not always possible. For example, consider (1, m) to be (0, 100). To map elements with keys 12, 112, and 512, if the hash function is K % 100, then all three keys are mapped to the 12th location. When more than one key is mapped to the same location, the condition is called a collision. When collisions occur, more than one element would have to be stored in the same location, which is not possible; in such a condition we apply collision-resolving techniques.

Handling the Collisions

Techniques have to be applied for resolving collisions to allow easy insertion and search. The following is the list of collision-resolving techniques.

  1. chaining,
  2. re-hashing,
    • linear probing(using neighboring slots),
    • quadratic probing,
    • random probing.
  3. overflow areas

Chaining

Chaining is the simplest technique: all colliding elements are chained in a list attached to the appropriate slot. This method does not require a priori knowledge of how many elements are contained in the collection. It exhibits poor performance if all the elements map to the same location, since this creates a single linked list whose search time is again proportional to n.
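
A minimal sketch of chaining is shown below; the names are illustrative, and the hash function (sum of character codes modulo the table size) is only one possible choice.

#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 10

struct ChainNode {
    char              name[32];
    struct ChainNode *next;      /* next entry hashing to the same slot */
};

static struct ChainNode *buckets[TABLE_SIZE];

/* Illustrative hash: sum of character codes modulo the table size. */
static unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s) h += (unsigned char)*s++;
    return h % TABLE_SIZE;
}

/* Insert a name at the head of its bucket's chain (duplicate check omitted). */
void chain_insert(const char *name)
{
    struct ChainNode *node = malloc(sizeof *node);
    strcpy(node->name, name);    /* assumes the name fits */
    unsigned i = hash(name);
    node->next = buckets[i];
    buckets[i] = node;
}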

Re-Hashing

This technique uses a second hash function when there is a collision, and this is repeated until an empty slot is found in the table. In general, the second function can be the same function with varying parameters. This technique requires applying the hash functions in the same order for searching as for insertion. It also involves overhead in finding elements and requires more hash functions.

a. Linear probing

A simple rehash function that can be chosen is +1 or –1, that is, looking at consecutive locations until a free slot is found, as shown in Figure 9.15. It is easy to implement.

Figure 9.15 Linear probing

b. Random probing

This technique uses a random function to find the next location at which to insert the element. If that location already holds an element, the random function is applied again until a free slot is found. This ensures proper utilization of the complete table, as the random function generates indices that are uniformly distributed over the table. The drawback is the time taken by the random function.

c. Quadratic probing

In quadratic probing, when a collision occurs, a secondary hash function is used to map the key to an address. The advantage of this technique is that the addresses obtained by the secondary function are distributed quadratically:

Address = h(key) + c·i² on the ith re-hash.

(A more complex function of i may also be used to have better performance.)

This re-hashing scheme uses the space within the allocated table, avoiding the overhead of maintaining linked lists. To apply the approach, the items to be stored must be known in advance. At the same time, it has the drawback of creating collisions for other valid keys, since table slots become pre-occupied by the elements that caused collisions earlier.

Overflow area

In this technique the table is divided into two sections: a primary area, to which the keys are mapped, and an overflow area that takes care of collisions, as shown in Figure 9.16.

Figure 9.16 Overflow area

Whenever a collision occurs, space in the overflow area is used for the new entry, and this location is linked to the primary table space. This appears similar to chaining, but there is a slight difference in the allocation of table space: here the extra space is allocated alongside the actual table, and hence it provides faster access. It is essential to know the number of elements before allocating space for the table.

It is possible to design the system with multiple overflow tables, or with a mechanism for handling overflow out of the overflow area which provides flexibility without losing the advantage of the overflow scheme.

The following table gives the summary of the hash table organization:

 

Organization    Advantages                                        Disadvantages
Chaining        Unlimited number of elements;                     Overhead of multiple linked lists
                unlimited number of collisions
Re-hashing      Fast re-hashing; fast access through              Maximum number of elements must be known;
                use of main table space                           multiple collisions may become probable
Overflow area   Fast access; collisions don't use                 Two parameters which govern performance
                primary table space                               need to be estimated

Example: Let the variable names be

      Mike, Alice: Integer;

      John_Smith: Integer;

      F: Float := 1.0;

Let the hash function be chosen as the sum of the ASCII codes of the characters in the name, and let the table size be 10. We use linear probing to resolve collisions.

Solution:

The result of the hash function on each variable is shown in the table below.

image

The insertions are done as shown in Figure 9.17.

Figure 9.17
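
A minimal sketch of this insertion with linear probing is given below. It assumes the ASCII sum is reduced modulo the table size, which is an assumption about the example's exact mapping; the names are illustrative.

#include <string.h>

#define TABLE_SIZE 10

static char table[TABLE_SIZE][32];   /* empty slot = empty string */

/* Illustrative hash: sum of ASCII codes, reduced modulo the table size. */
static int hash(const char *s)
{
    int h = 0;
    while (*s) h += (unsigned char)*s++;
    return h % TABLE_SIZE;
}

/* Insert with linear probing: step to the next slot until a free one is found.
   Assumes the table is never full. */
int probe_insert(const char *name)
{
    int i = hash(name);
    while (table[i][0] != '\0') {
        if (strcmp(table[i], name) == 0) return i;   /* already present */
        i = (i + 1) % TABLE_SIZE;                    /* linear probe */
    }
    strcpy(table[i], name);
    return i;
}

/* e.g. probe_insert("Mike"); probe_insert("Alice");
        probe_insert("John_Smith"); probe_insert("F"); */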

9.6  Block Structured Language

Block structured languages comprise a class of high-level languages in which a program is made up of blocks, which may include nested blocks as components, such nesting being repeated to any depth. A block is a group of statements preceded by declarations of variables that are visible throughout the block and its nested blocks. These declarations are hidden inside any inner block that declares variables with the same names; once the scope of the inner block ends, the variables of the outer block become effective again. Variables are therefore said to have nested scopes. The concept of block structure was first introduced in the Algol family of languages. Symbol table organization for block structured languages is more complex than for non-block structured languages, since additional information must be stored at every block entry and exit. Let us consider the following example shown in Figure 9.18.

This program contains four blocks. B1 is main, which has two inner blocks B2 and B3; block B3 has another inner block B4. On execution, main is called first and invokes the function fun1; fun1 in turn calls fun2, which includes B4. On completion of B4, the remaining part of fun2() is executed. When fun2() completes, control returns to the statement following the call to fun2() in fun1(); when fun1() completes, control returns to the statement following the call to fun1() in main.

Figure 9.18

During execution, these two blocks behave differently, but during compilation both types of blocks require similar types of processing. During the compilation, at the block entry, a sub-table should be created for new variables using the set operation. For the variables declared with the same name in both inner block and outer block, care should be taken, so that the variables of the outer block are inactive and the variables in the inner block are active. At the block exit, these entries should be deleted using the reset operation. The following is the trace of compilation with set and reset operations at every block.

 

B1 entry:

Set operation is performed and no variable is either active or inactive.

B2 entry:

Set operation is performed and at this moment variables a, b, name and fun1 are active. No variable is inactive.

B2 exit:

Reset operation is performed, a, b, name, fun1, x and a are active. No variable is inactive.

B3 entry:

Set operation is performed, a, b, name, fun1 and fun2 are active. x and a are inactive.

B4 entry:

Set operation is performed, a, b, name, fun1, fun2 and y are active. x and a are inactive.

B4 exit:

Reset operation is performed, a, b, name, fun1, fun2, y, F and test1 are active. x and a are inactive.

B3 exit:

Reset operation is performed, a, b, name, fun1, fun2 and y are active. x, a, F and test1 are inactive.

B1 exit:

Reset operation is performed, a, b, name, fun1 and fun2 are active. x, a, F, test1 and y are inactive.

End of compilation: All variables a, b, name, fun1, fun2, x, a, F, test1 and y are inactive.

Note: In block B2 both instances of a are active, but the lookup operation should return the attributes of the most recent instance of a. Hence, the most suitable data structure is a stack.

9.6.1  Stack Symbol Tables

In this organization, records containing the attributes of the variables are stacked as they are encountered. At block exit these records are deleted, since they are not required outside the block. The organization uses two stacks—one that holds the records and another that holds the block index details. The four operations performed on the table are explained below.

Set: This operation creates a new block index entry at the top of the block index table, which records the current top of the symbol stack. This entry marks the start of the variables declared in the new block.

Reset: This operation removes all the records of the block just completed. It corresponds to setting the top of the symbol stack to the value recorded in the block index and popping the current top of the block index.

Insert: This operation simply adds the new record on top of the symbol stack, after checking that no duplicate exists in the same block.

Lookup: This operation is a linear search of the table from top to bottom, so the first match found is the latest declaration of the variable.
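
A minimal sketch of such a two-stack organization is shown below; the names are illustrative, and the duplicate check within a block is omitted.

#include <string.h>

#define MAX_SYMS   256
#define MAX_BLOCKS 32

struct StackEntry { char name[32]; /* ... other attributes ... */ };

static struct StackEntry symbols[MAX_SYMS];
static int sym_top = 0;                 /* next free slot in the symbol stack */
static int block_index[MAX_BLOCKS];
static int block_top = 0;               /* next free slot in the block index  */

void set(void)   { block_index[block_top++] = sym_top; }   /* block entry */
void reset(void) { sym_top = block_index[--block_top]; }   /* block exit  */

void insert(const char *name)
{
    strcpy(symbols[sym_top++].name, name);   /* assumes the name fits */
}

/* Search from the top down so the most recent declaration is found first. */
int lookup(const char *name)
{
    for (int i = sym_top - 1; i >= 0; i--)
        if (strcmp(symbols[i].name, name) == 0) return i;
    return -1;
}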

For the above example, the following Figure 9.19 shows how the variable information is added or removed after the entry/exit of every block.

Figure 9.19

9.6.2  Stack-Implemented Tree-structured Symbol Tables

Tree-structured symbol table organization can take two different forms for block structured languages. The first approach uses a single tree for all the variables. This requires removing the records of a block when the compilation of that block is completed. Since the records of all the blocks are merged into one tree, a more involved procedure is needed to locate the required records while applying the operations on the tree.

Insertions are always done at a leaf node, and care should be taken while performing a lookup operation to ensure that the reference is to the current block. Every deletion operation involves the following steps.

  1. Locate the position of the record in the tree.
  2. Remove the record from the tree and adjust the links to bypass the node.
  3. Rebalance the tree if the deletion leaves it unbalanced.

Note: A single tree structure may not be well suited to compiling languages with nested blocks.

The second approach is to construct a forest of trees in which each block is allocated its own tree-structured table. When the compilation of a block is complete, the entire tree for that block is deleted.

In this organization, the node for each record carries two special pointers along with the attributes—a left pointer to the left child and a right pointer to the right child. The symbol table itself is maintained as a stack. When a block is entered during compilation, the current value of the top of the stack table is stored at the top of the block index table. As declarations are encountered, records are inserted at the top of the symbol table, and the tree is rebalanced if the insertions make it unbalanced.

A lookup operation must ensure that the latest occurrence is located. The search must begin at the tree structure for the last block entered and proceed down the block index table until it reaches the root of the tree for the first block entered. For example, to look up variable "a" in Figure 9.20B, the search first starts at the root pointed to by the top of the block index. The top points to location 4; the name is compared with it, and the left sub-tree is searched until the variable is found, returning the index 5. When the search is for b, the sub-tree pointed to by the top of the block index returns null; hence a search is made in the sub-tree pointed to by (top − 1), which returns location 1.

The following figure shows the operations on every entry and exit of the block.

Figure 9.20

9.6.3  Stack-Implemented Hash-Structured Symbol Table

Applying the hashing technique is more complex for block structured languages, since additional mechanisms are needed to preserve the information about which variables belong to the same block. This is achieved by using an intermediate hash table. The hash table stores a link to the location in the stack symbol table where the variable is stored. Each stack symbol table entry stores the variable's information together with a link to the location of the previous variable that maps to the same slot in the hash table. The block index table stores the starting location of the variables of the current block. The operations are correspondingly more involved; a code sketch follows the description of the operations below.

 

Set: On every block entry, the current top of the stack symbol table is stored in block index table.

 

Insert: First the hash function is applied on the key

  • If it maps to a location with no collision, the variable information is stored at the current top of the stack symbol table and that index is stored in the hash table.
  • If it maps to a location where a collision occurs, the index currently stored in the hash table is saved in the link field along with the variable information at the current top of the stack symbol table, and the hash table entry is updated with this new index. This makes it possible to store the new variable's information without losing the information of the previously inserted variable.

Lookup: First the hash function is applied on the key,

  • If the hash table has null, then return null.
  • If it points to some location in the stack symbol table,
    • if it is the required variable and its index is greater than the current top of the block index, return the index;
    • otherwise, use the link pointer to go to the next location and repeat the previous check.

Reset: Delete all the variables from the stack symbol table whose index is greater than or equal to the value at the current top of the block index. While deleting a variable, the following adjustments are required.

  • If the link field of the variable is null, then delete the variable and set to null the hash table entry that points to it.
  • If the link field is not null, then store the link value back into the hash table entry and delete the variable's information.
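
The following is a minimal sketch of the scheme described above, using the link field in the stack symbol table to record displaced hash entries. The names are illustrative, and the hash is assumed to be the ASCII sum reduced modulo the table size; the duplicate check within a block is omitted.

#include <string.h>

#define HASH_SIZE  10
#define MAX_SYMS   256
#define MAX_BLOCKS 32
#define EMPTY      (-1)

struct HashedSym {
    char name[32];
    int  link;        /* index of the previous entry hashing to the same slot */
};

static int hash_table[HASH_SIZE] = { EMPTY, EMPTY, EMPTY, EMPTY, EMPTY,
                                     EMPTY, EMPTY, EMPTY, EMPTY, EMPTY };
static struct HashedSym stack_tab[MAX_SYMS];
static int sym_top = 0;
static int block_index[MAX_BLOCKS];
static int block_top = 0;

static int hash(const char *s)
{
    int h = 0;
    while (*s) h += (unsigned char)*s++;
    return h % HASH_SIZE;
}

void set(void) { block_index[block_top++] = sym_top; }   /* block entry */

void insert(const char *name)
{
    int h = hash(name);
    strcpy(stack_tab[sym_top].name, name);    /* assumes the name fits */
    stack_tab[sym_top].link = hash_table[h];  /* remember the displaced entry */
    hash_table[h] = sym_top++;                /* slot now points to the new one */
}

int lookup(const char *name)                  /* most recent visible declaration */
{
    for (int i = hash_table[hash(name)]; i != EMPTY; i = stack_tab[i].link)
        if (strcmp(stack_tab[i].name, name) == 0) return i;
    return EMPTY;
}

void reset(void)                              /* block exit */
{
    int base = block_index[--block_top];
    while (sym_top > base) {
        sym_top--;                            /* the top entry heads its chain */
        hash_table[hash(stack_tab[sym_top].name)] = stack_tab[sym_top].link;
    }
}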

Example: The symbol table organization for block structured languages using the hashing technique is shown below.

Let the hash function be chosen as the sum of the ASCII codes of the characters in the name, and let the table size be 10. We use linear probing to resolve collisions.

Solution:

The result of the hash function on each variable is shown in the table below. Figure 9.21 shows the contents of the hash table after the set and insert operations at each block entry.

image

Block B1: Four variables a, b, name, and fun1 are inserted, where a and name map to the 7th index and b and fun1 map to the 8th index in the hash table.

Figure 9.21

Block B2: Two variables x and a are inserted where a is mapped to the 7th index and x is mapped to the 0th index in the hash table.

image

Block B3: Two variables fun2 and y are inserted where fun2 is mapped to the 9th index and y is mapped to the 1st index in the hash table.

image

Block B4: Two variables F and test1 are inserted, where F is mapped to the 9th index and test1 is mapped to the 1st index in the hash table.

Figure 9.21

Summary

  • A symbol table is created during the lexical phase.
  • The indirect scheme permits the size of the name field of the symbol table entry itself to remain a constant.
  • In non-block structured languages the operations performed on symbol table are insert and lookup.
  • In block structured languages the operations performed on symbol table are set, reset, insert, and lookup.
  • Linear structures like arrays and linked lists are simple to implement, but their performance is poor.
  • A search tree is a more efficient approach to symbol table organization.
  • The expected time needed to enter n names and to make m queries is proportional to (m + n) log2n.
  • Balanced trees are more suitable than binary search trees.
  • Performance of hash-based technique is independent of number of elements in the list.
  • Insert (s,t) performs the insertion of string s and token t and returns the index of this new entry.
  • Lookup(s) searches for string s; if found it returns the index, 0 otherwise.
  • Set operation is performed at every block entry and it stores the top of the variable index table in the stack table.
  • Reset operation is performed at every block exit. It removes the variable list information of the block it exited and the top of the stack table.

Fill in the Blanks

  1. The symbol table is created during __________ phase.
  2. Hierarchical list gives performance proportional to __________.
  3. For __________ the entries are made when the syntactic role played by the name is discovered.
  4.  __________ is used mainly to distinguish the attributes of one variable with the other categorized as same token type.
  5.  __________ performs the insertion of string s and token t and returns the index of this new entry.
  6. __________ finds for string S, if found returns the index, 0 otherwise.
  7. __________ is performed on every block entry to mark the beginning of new scope of variables declared in that block.
  8. __________ is performed at every block exit, to remove all declarations inside this scope and the details of the block.
  9. Set and reset operations are performed in Block structured languages to keep __________ of the variables.
  10. In __________ the variables can be used without declaration.
  11. __________ operation has the overhead of moving the elements to find the place to insert new element.
  12. Balance factor bf of every node in an AVL tree is associated with a value __________.
  13. If the node is critical to the right and the right child is right heavy we apply __________.
  14. If the node is critical to the right and the right child is left heavy we apply __________.
  15. If the node is critical to the left and the left child is left heavy we apply __________.
  16. If the node is critical to the left and the left child is right heavy we apply __________.
  17. If the balance factor becomes __________ then the sub-tree is said to be unbalanced.
  18. The range of the key determines the size of the __________ and may be too large to be practical.
  19. When more than one key is mapped to the same location we say __________ has occurred.
  20. The concept of block structure was first introduced in the __________ family of languages.

Objective Question Bank

  1. The basic two operations that are often performed with the symbol table are __________.
    • set and reset
    • insert and lookup
    • set and insert
    • reset and lookup
  2.  __________ schemes provide better performance for greater programming effort and space overhead.
    • Arrays
    • Linked list
    • Trees
    • Hashing
  3. In case of __________ there is only one instance of the variable declaration and its scope is throughout the program.
    • non-block structured languages
    • block structured languages
    • object oriented languages
    • All of the above
  4. In __________ the variables may be re-declared and its scope is within the block.
    • non-block structured languages.
    • block structured languages.
    • object oriented languages.
    • All of the above
  5. __________ performs the insertion of string s and token t and returns the index of this new entry.
    • Lookup
    • Set
    • Insert
    • Reset
  6. __________ finds for attributes of the variable in symbol table.
    • Lookup
    • Set
    • Insert
    • Reset
  7. __________ operation marks the scope of variables in the block.
    • Insert
    • Lookup
    • Set
    • Reset
  8. __________ operation removes all variable declarations inside the scope of a block on exit.
    • Insert
    • Lookup
    • Set
    • Reset
  9. __________ operations are performed in block structured languages to keep scope information of the variables.
    • Set and reset
    • Insert and lookup
    • Set and insert
    • Reset and lookup
  10. In ordered symbol table, the entries in the table are lexically ordered on the __________.
    • variable names
    • variable size
    • variable type
    • All.
  11. __________ is applied if the node is critical to the right and the right child is right heavy.
    • Right - Right rotation.
    • Right - Left rotation
    • Left - Left rotation
    • Left - Right rotation
  12. __________ is applied if the node is critical to the right and the right child is left heavy.
    • Right - Right rotation
    • Right - Left rotation
    • Left - Left rotation
    • Left - Right rotation
  13. __________ is applied if the node is critical to the left and the left child is left heavy.
    • Right - Right rotation
    • Right - Left rotation
    • Left - Left rotation
    • Left - Right rotation
  14. __________ is applied if the node is critical to the left and the left child is right heavy.
    • Right - Right rotation
    • Right - Left rotation
    • Left - Left rotation
    • Left - Right rotation
  15. The __________ requires that the function h(k) map each key k to one address that is in the range (1, m).
    • direct address approach.
    • indirect address approach.
    • Both.
    • None.
  16. The __________ technique uses a second hashing operation when there is a collision.
    • quadratic hashing.
    • linear hashing.
    • re-hashing.
    • double hashing.
  17. The concept of block structure was first introduced in the __________ family of languages.
    • Fortran.
    • Pascal
    • Lisp.
    • Algol.

Exercises

    • What is the use of the symbol table in the compilation process? List out various attributes stored in the symbol table.
    • Explain the different schemes of storing the name attribute in symbol table.
    • Write about the importance of symbol table.
    • Explain the importance of each attribute stored in symbol table.
  1. Explain the importance and format of storing the following attributes in symbol table.
    • Variable name.
    • Variable type.
    • Size.
    • Address of variable.
  2. List the various data structures that can be used to organize a symbol table. Compare the performance.
    • Explain symbol table organization as arrays and linked list.
    • Construct ordered array list for the variable in the following program.

     

    int main()

    {

    int a1, a2, c1, c2;

    char b1;

    float d1, d2;

          ----

          ----

    }

    • Explain the hash table organization of symbol table.
    • List the advantage of the hash table over other data structures.
    • Explain about block and non-block structured languages with example.
    • List the operations in block structured language.
    • What is the use of the symbol table in compilation process? List out the various attributes stored in the symbol table.
    • Explain dynamic storage allocation.
  3. Explain symbol table organization using hash tables. Construct hash-based structure for symbol table for the variable in the following program.

     

    int main()

    {

    int a1, a2, c1,c2;

    char b1;

    float d1, d2;

          ----

          ----

    }

  4.  Explain symbol table organization in trees. Construct the tree structure for symbol table for the variable in the following program.

     

    int main()

    {

    int a1, a2, c1, c2;

    char b1;

    float d1, d2;

          ----

          ----

    }

  5. Explain symbol table organization using hash tables. With an example show the symbol table organization for block structured language.
  6. What is a symbol table? What is its use? What are the different ways of organizing data in symbol table?
    1. Explain tree-structured symbol tables with example.
    2. What is the main criterion used in comparing different symbol table organizations?
  7. What is the use of the symbol table in compilation process? List out various attributes stored in the symbol table.
    • Explain the importance of each attribute stored in symbol table.
    • Compare the performance of different symbol table organization.
    • Construct ordered array list for the variable in the following program.

       

      int main()

      {

      int a1, a2, c1, c2;

      char b1;

      float d1, d2;

            ----

            ----

      }

    • Explain about block and non-block structured languages with example.
  8. Explain the operations in block structured languages with examples.
    a. Set
    b. Reset
    c. Lookup
    d. Insert

Key for Fill in the Blanks

  1. Lexical analysis.
  2. n(log (n)) + e (log (n)).
  3. Block structured languages.
  4. Lexeme.
  5. Insert.
  6. Lookup.
  7. Set operation.
  8. Reset.
  9. Scope information.
  10. FORTRAN.
  11. Insert.
  12. –1, 0, or +1.
  13. Right – Right rotation
  14. Right – Left rotation.
  15. Left – Left rotation.
  16. Left – Right rotation.
  17. –2 or +2
  18. direct address table.
  19. collision
  20. Algol.

Key for Objective Question Bank

  1. b
  2. d
  3. a
  4. b
  5. c
  6. a
  7. c
  8. d
  9. a
  10. a
  11. a
  12. b
  13. c
  14. d
  15. a
  16. c
  17. d