Chapter 5

Bottom-Up Parsers

Bottom-up parsing is a more general technique than top-down parsing and is the method most widely used in practice. Automatic parser generators such as YACC and Bison produce bottom-up parsers.

CHAPTER OUTLINE

Bottom-up parsing is the most widely used parsing technique and is the one employed by most automatic parser generators. In this chapter, we discuss what shift-reduce (SR) parsing is and the different types of SR parsers. We also discuss in detail the parsing algorithms of the different bottom-up parsers, and how to test whether a given grammar is suitable for a particular type of SR parser. Finally, all the parsers are compared.

5.1   Bottom-Up Parsing

Bottom-up parsing is an attempt to construct a parse tree for the given input string in a bottom-up manner: construction starts at the leaves (the bottom) and works up toward the root (the top). Equivalently, bottom-up parsing can be defined as an attempt to reduce the input string w to the start symbol of the grammar by tracing out the rightmost derivation of w in reverse.

Given a grammar “G” and an input string w, a bottom-up parser starts construction of the parse tree with the input string w. At each step, it identifies a substring that matches the right-hand side of some production of the grammar and replaces that substring with the left-hand-side nonterminal of the production. Each such replacement yields the sentential form that is one step along the reverse of the rightmost derivation. This process of detecting a substring that matches the right-hand side of a production and replacing it with the left-hand-side nonterminal continues until the given string is reduced to the start symbol S. Let us understand the process with the following example.

 

Consider the grammar:

      S → aABe

      A → Abc | b

      B → d

The sentence “abbcde” can be reduced to “S” as shown in Figure 5.1:

      abbcde

      aAbcde (replace b by A using A → b)

      aAde (replace Abc by A using A → Abc)

      aABe (replace d by B using B → d)

      S (replace aABe by S using S → aABe)

 

The process of replacing a substring by a nonterminal in bottom-up parsing is called reduction.
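As a minimal sketch of this process (assuming single-character symbols, with uppercase letters as nonterminals), the following Python routine repeatedly replaces the first substring that matches a production's right-hand side, trying longer right-hand sides first. This greedy choice happens to find the correct handle at every step for this grammar; in general, handle selection is the hard part of bottom-up parsing and needs the machinery developed in the following sections.

```python
PRODUCTIONS = [            # (lhs, rhs) for S -> aABe, A -> Abc | b, B -> d
    ("S", "aABe"),
    ("A", "Abc"),
    ("A", "b"),
    ("B", "d"),
]

def reduce_to_start(w, start="S"):
    """Repeatedly replace the first substring matching a right-hand side
    (longer right-hand sides first) until only the start symbol remains."""
    steps = [w]
    while w != start:
        for lhs, rhs in sorted(PRODUCTIONS, key=lambda p: -len(p[1])):
            if rhs in w:
                w = w.replace(rhs, lhs, 1)
                steps.append(w)
                break
        else:
            raise ValueError("no reduction applies to " + w)
    return steps

print(reduce_to_start("abbcde"))
# ['abbcde', 'aAbcde', 'aAde', 'aABe', 'S']
```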

Bottom-up parsing can be viewed as tracing out the rightmost derivation in reverse. Look at the reductions in the above example:

 

abbcde → aAbcde → aAde → aABe → S

Figure 5.1 Reducing the String “abbcde”

This is nothing but the reverse of the rightmost derivation. The reason a bottom-up parser traces out the rightmost derivation in reverse, rather than the leftmost, is that the parser scans the input string w from left to right, one symbol at a time. When scanning from left to right, it is only the rightmost derivation that can be traced out in reverse.

So the task of a bottom-up parser at any time is to identify a substring that matches the right-hand side of a production such that replacing it with the left-hand-side nonterminal gives one step along the reverse of the rightmost derivation.

In bottom-up parsing, finding a substring that matches the right-hand side of a rule, as well as its position in the current sentential form, is very important. To take both factors into account, let us define the handle.

5.2  Handle

If S ⇒* αAβ ⇒ αγβ by a rightmost derivation, then A → γ, in the position following α, is a handle of αγβ. Hence a handle can be defined as follows: a handle of a right sentential form γ is a production A → β together with a position of β in γ such that β may be replaced by A to produce the previous right sentential form in a rightmost derivation of γ. In short, a handle is a substring that matches the right-hand side of a production and whose reduction to the nonterminal gives one step along the reverse of the rightmost derivation.

Consider the following example:

 

Expr → Expr + Expr | Expr * Expr | id

The rightmost derivation for the string “id + id * id” is

 

Expr ⇒ Expr + Expr

      ⇒ Expr + Expr * Expr

      ⇒ Expr + Expr * id

      ⇒ Expr + id * id

      ⇒ id + id * id

 

The handles of the sentential forms occurring in the above derivation are shown below in Table 5.1.

 

Table 5.1 Handles of the Right Sentential Form

Sentential form         Handle
id + id * id            Expr → id at the first id
Expr + id * id          Expr → id at the position following +
Expr + Expr * id        Expr → id at the position following *
Expr + Expr * Expr      Expr → Expr * Expr at the position following +
Expr + Expr             Expr → Expr + Expr spanning the entire form

Bottom-up parsing can thus be described as an attempt to detect the handle and reduce it. Reducing the handle is also called handle pruning. The difficulty in bottom-up parsing is detecting the handle at each point in the parse.

5.3   Why the Name SR Parser

Bottom-up parsers are also called SR parsers. To understand why, let us look at the way a bottom-up parser works.

Given the grammar Expr → Expr + Expr | Expr * Expr | id and the input string w = “id + id * id,” let us look at bottom-up parsing. The parser reads the string w symbol by symbol from left to right. Whatever it reads must be remembered until a handle is recognized, so each symbol read is pushed onto an auxiliary memory called the “stack.” Reading a symbol and pushing it onto the stack is called a “shift action.” Once a handle is detected, the parser replaces the handle with the nonterminal of the corresponding production; this is called a “reduce action.” So at any time, a bottom-up parser shifts symbols until a handle is detected, and when a handle is detected it performs a reduce action. Since the main actions performed are shift and reduce, it is called an SR parser. Shift and reduce actions are performed until the parser halts, which happens in one of two situations—either on successful completion or on the occurrence of an error.

A simple bottom-up parsing technique using a stack based implementation is as follows:

  1. Initially the stack contains only the sentinel $, and the input buffer contains the input string w$.
  2. While the stack is not equal to $S do
    • While there is no handle on top of the stack, read the next symbol from the input buffer (left to right) and push it onto the stack (shift).
    • If there is a handle on top of the stack, pop the handle and push the left-hand-side nonterminal of its production onto the stack (reduce).
  3. Halt.
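The loop above can be illustrated with a brute-force Python sketch (an illustration under the simplifying assumption of single-character symbols): at each configuration it tries every applicable reduce—a production whose right-hand side lies on top of the stack—and, if those search paths fail, a shift. A real SR parser replaces this backtracking search with a table that selects the correct action deterministically, but the shift and reduce moves on the successful path are exactly those traced in Table 5.2 below.

```python
PRODUCTIONS = [("S", "aABe"), ("A", "Abc"), ("A", "b"), ("B", "d")]

def parse(stack, rest, trace):
    """Return the trace of a successful shift-reduce parse, or None."""
    if stack == "S" and not rest:
        return trace
    # try every applicable reduce action first
    for lhs, rhs in PRODUCTIONS:
        if stack.endswith(rhs):
            t = parse(stack[:-len(rhs)] + lhs, rest,
                      trace + [(stack, rest, "reduce by %s -> %s" % (lhs, rhs))])
            if t is not None:
                return t
    # otherwise (or if the reduce paths all fail), try a shift action
    if rest:
        return parse(stack + rest[0], rest[1:],
                     trace + [(stack, rest, "shift " + rest[0])])
    return None

for stack, rest, action in parse("", "abbcde", []):
    print("$" + stack.ljust(5), (rest + "$").rjust(8), action)
```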

    Consider the grammar:

    S → aABe

    A → Abc | b

    B → d

The actions performed by the parser while parsing the input string “abbcde” are shown in Table 5.2.

 

Table 5.2 Shift Reduce Parsing

Stack     Input      Action
$         abbcde$    shift a
$a        bbcde$     shift b
$ab       bcde$      reduce by A → b
$aA       bcde$      shift b
$aAb      cde$       shift c
$aAbc     de$        reduce by A → Abc
$aA       de$        shift d
$aAd      e$         reduce by B → d
$aAB      e$         shift e
$aABe     $          reduce by S → aABe
$S        $          accept

5.4   Types of Bottom-Up Parsers

Bottom-up parsers are broadly classified into two categories, as shown in Figure 5.2—operator precedence parsers and LR parsers. The operator precedence parser is a simple parser that is preferred for parsing expressions. The most powerful parsers fall under the LR category. There are four types of LR parsers: LR(0), simple LR—SLR(1), look-ahead LR—LALR(1), and canonical LR—CLR(1). Of all the LR parsers, CLR(1)/LR(1) is the most powerful; it can be used to parse almost all grammars.


Figure 5.2 Types of Bottom-Up Parsers

Of all the bottom-up parsers, LR(0) is the least powerful. It can parse only a small class of grammars; hence, it is not used in practice. The preferred parsers are SLR(1), LALR(1), and CLR(1). SLR(1) is simple to construct but is the least powerful of the three. Canonical LR(1) is the most powerful, but its implementation is costly. The power and construction cost of LALR(1) are intermediate between those of SLR(1) and CLR(1). Though LR(0) is not used directly, it is the basic machine on which the other three are built; hence, we also discuss the design of the LR(0) parser. Let us first look at the design of a simple SR parser, the operator precedence parser.

5.5   Operator Precedence Parsing

This is a technique for constructing shift-reduce parsers by hand for a small class of grammars. To construct an LL(1) parser, there is a restriction on the grammar: it should be free of left recursion and should be left factored. Similarly, to construct an operator precedence parser, there is a restriction on the grammar: it must be an operator grammar. Let us look at what an operator grammar is.

Operator grammar: A grammar that contains no null productions and no two adjacent nonterminals on the right-hand side of any production is called an operator grammar.

 

For example:

 

      E → E B E | id

      B → + | – | =

 

This is not an operator grammar: although there are no null productions, the right-hand side E B E contains adjacent nonterminals, which are not allowed. To obtain an operator grammar, expand the nonterminal B. The resulting grammar is an operator grammar.

 

      E → E + E | E * E | E = E | id

 

The resulting grammar is ambiguous, but it is an operator grammar; hence, it can be used for parser construction. The operator precedence parser does not care whether the grammar is ambiguous, only that it is an operator grammar. In fact, the operator precedence parser is the only parser that can be constructed even if the given grammar is ambiguous; all other parsers (LL, LR) are constructed only from unambiguous grammars. Now let us see, given an ambiguous grammar, how to construct an operator precedence parser that parses strings unambiguously.
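The operator-grammar property is mechanical to check. The sketch below (an illustration assuming single-character symbols, uppercase for nonterminals, and “i” standing in for the token id) tests both conditions: no null productions and no adjacent nonterminals.

```python
def is_operator_grammar(grammar):
    """grammar: {nonterminal: [rhs, ...]} with single-character symbols,
    uppercase letters as nonterminals."""
    for rhss in grammar.values():
        for rhs in rhss:
            if rhs == "":                        # null production
                return False
            for x, y in zip(rhs, rhs[1:]):
                if x.isupper() and y.isupper():  # adjacent nonterminals
                    return False
    return True

# E -> E B E | id (with B -> + | - | =) is not an operator grammar ...
print(is_operator_grammar({"E": ["EBE", "i"], "B": ["+", "-", "="]}))  # False
# ... but the expanded form E -> E + E | E * E | E = E | id is.
print(is_operator_grammar({"E": ["E+E", "E*E", "E=E", "i"]}))          # True
```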

Example 1: Convert the following grammar to operator grammar.

 

      S → S A S | a

      A → b S b | b (assume S is the start symbol)

Solution: There are no null productions, but the right-hand side S A S contains adjacent nonterminals, which are not allowed. Hence expand the nonterminal A. Substituting for A, we get the following grammar:

 

      S → S b S b S | S b S | a

      A → b S b | b

 

This is the resulting operator grammar. Here A is now a useless symbol, so its production can be dropped.

The operator precedence parser works mainly by considering the precedence among the operators—hence its name. Even though the given grammar is ambiguous, the parser preserves the actual precedence among the operators in a table called the precedence relation table. Just as the LL(1) parser uses the LL(1) table, the operator precedence parser uses the precedence relation table for parsing the input string. In this table, the precedence between two operators is recorded as one of the relations ⋖, ⋗, or ≐. So let us understand precedence relations.

5.5.1  Precedence Relations

The following precedence relations are defined between the terminals of an operator grammar.

a ⋗ b (a “takes precedence over” b): a is reduced before b. Example: * ⋗ +.

a ⋖ b (a “yields precedence to” b): b is reduced before a. Example: + ⋖ *.

a ≐ b (a “has the same precedence as” b): both are reduced at the same time.

Example: ( ≐ ).

 

For the sentinel symbol $ and any terminal b we define $ ⋖ b and b ⋗ $.

The precedence relations look similar to the logical relations >, < and ==, but they are not the same. With logical relations, whenever a > b holds, b > a never holds. With precedence relations, both may hold at the same time; that is, a ⋗ b and b ⋗ a can both be true. Later we will discuss a situation where this occurs.

5.5.2  Recognizing Handles

The main difficulty in bottom-up parsing is identifying the handle, and the precedence relations help the parser recognize it. Once the precedence information is stored in the precedence table as precedence relations, let us see how to recognize the handle with an example. Consider the grammar E → E + E | E * E | id and the input string

 

“ id + id * id.”

 

The parser first constructs the precedence relation table. Assume the table is available as shown in Table 5.3.

 

Table 5.3 Precedence Relation Table

        id    +    *    $
  id          ⋗    ⋗    ⋗
  +     ⋖    ⋗    ⋖    ⋗
  *     ⋖    ⋗    ⋗    ⋗
  $     ⋖    ⋖    ⋖

(The row symbol is the left terminal and the column symbol is the right terminal; blank entries denote errors.)

To parse a given string, enclose it between the two end markers $. Then read symbol by symbol from left to right and insert the precedence relation between each pair of adjacent terminals by using the table. We then get the following string:

$ ⋖ id ⋗ + ⋖ id ⋗ * ⋖ id ⋗ $

Once precedence relations are inserted, recognition of handle is as follows:

  1. Scan the string from the left end until the first ⋗ is encountered.
  2. Then scan backwards to the left over any ≐ until a ⋖ is encountered.
  3. Everything to the right of this ⋖ and to the left of the first ⋗, including any intervening or surrounding nonterminals, is the handle.

We apply the above procedure on the string $ ⋖ id ⋗ + ⋖ id ⋗ * ⋖ id ⋗ $.

The handle at any time is whatever lies between a ⋖ and the matching ⋗. This is the main criterion used for recognizing the handle. So in the first pass over the string, the handles are the three ids. Each handle, once recognized, is reduced; after reducing the ids we get the string

$ E + E * E $

The resulting string $E + E * E$ is viewed by the parser as $ + * $; that is, the nonterminals are ignored by the parser, as no precedence is defined on them. The parser then repeats the above procedure on the string until it is reduced to the start symbol:

$ ⋖ + ⋖ * ⋗ $      handle E * E is reduced, giving $E + E$, viewed as $ + $
$ ⋖ + ⋗ $          handle E + E is reduced, giving $E$
$ E $              parsing is complete
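The handle-recognition scan just described is easy to mechanize. In the sketch below (an illustration with the relations written as the ASCII characters "<" and ">", and table entries corresponding to Table 5.3), find_handle returns the slice of the terminal string that forms the leftmost handle.

```python
REL = {   # row: left terminal, column: right terminal (as in Table 5.3)
    "id": {"+": ">", "*": ">", "$": ">"},
    "+":  {"id": "<", "+": ">", "*": "<", "$": ">"},
    "*":  {"id": "<", "+": ">", "*": ">", "$": ">"},
    "$":  {"id": "<", "+": "<", "*": "<"},
}

def find_handle(terminals):
    """terminals: e.g. ["$", "id", "+", "id", "*", "id", "$"].
    Returns (i, j) such that terminals[i:j] is the leftmost handle."""
    k = 0                                   # 1. scan right to the first >
    while REL[terminals[k]][terminals[k + 1]] != ">":
        k += 1
    i = k                                   # 2. scan back to the matching <
    while REL[terminals[i - 1]][terminals[i]] != "<":
        i -= 1
    return i, k + 1

print(find_handle(["$", "id", "+", "id", "*", "id", "$"]))  # (1, 2): the first id
print(find_handle(["$", "+", "*", "$"]))                    # (2, 3): * before +
```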

The above procedure can be stated as an algorithm. The operator precedence parser uses a stack. It reads the input string into a buffer and appends $ as the right end marker. It then pushes $ onto the stack to mark the bottom of the stack. The parser works by comparing the terminal on top of the stack with the lookahead symbol; the comparison is in terms of the precedence relations, and based on it the parsing decision is made by the following algorithm.

An SR parser performs two main actions, shift and reduce, until it halts; halting occurs either on successful completion or on an error. So to define the parsing algorithm of any SR parser we need to specify only two things—(1) when it performs a shift action and how the shift is carried out, and (2) when it performs a reduce action and how the reduce is carried out. The parsing algorithm of the operator precedence parser is given below.

5.5.3   Parsing Algorithm for Operator Precedence Parser

  1. Initially, the stack contains only $ and the input buffer contains w$, where w is the input string.
  2. Repeat forever:
    • Let “a” be the terminal on top of the stack and “b” the current symbol pointed to by the input pointer, that is, the lookahead symbol.
    • If a ⋖ b or a ≐ b, push b onto the stack and advance the input pointer. (Shift action)
    • If a ⋗ b, repeatedly pop the stack until the top of the stack is ⋖ to the terminal most recently popped. (Reduce action)
    • If a = b = $, announce successful completion.
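A direct Python transcription of this algorithm (a sketch assuming the id/+/*/$ relation table of Table 5.3, written with the ASCII characters "<", ">", "=") keeps only terminals on the stack, since the parser ignores nonterminals:

```python
REL = {
    "id": {"+": ">", "*": ">", "$": ">"},
    "+":  {"id": "<", "+": ">", "*": "<", "$": ">"},
    "*":  {"id": "<", "+": ">", "*": ">", "$": ">"},
    "$":  {"id": "<", "+": "<", "*": "<"},
}

def op_precedence_parse(tokens):
    """Return the list of shift/reduce actions for a token list
    such as ["id", "+", "id", "*", "id"]."""
    tokens = tokens + ["$"]
    stack, actions, i = ["$"], [], 0
    while True:
        a, b = stack[-1], tokens[i]
        if a == "$" and b == "$":
            actions.append("accept")
            return actions
        rel = REL[a].get(b)
        if rel in ("<", "="):                  # shift action
            stack.append(b)
            actions.append("shift " + b)
            i += 1
        elif rel == ">":                       # reduce action
            last = stack.pop()
            handle = [last]
            while REL[stack[-1]].get(last) != "<":
                last = stack.pop()
                handle.insert(0, last)
            actions.append("reduce " + " ".join(handle))
        else:
            raise SyntaxError("no relation between %r and %r" % (a, b))

for act in op_precedence_parse(["id", "+", "id", "*", "id"]):
    print(act)
```

Note how the handle containing * is reduced before the one containing +, reflecting the higher precedence of * recorded in the table.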

Example 2: Parse the input string a + b * c by using the above algorithm.

Solution: The result of each step is described in the following Table 5.4.

 

Table 5.4 Parsing Actions of the Operator Precedence Parser

Stack     Input         Action taken
$         a + b * c$
$a        + b * c$      shift a because $ ⋖ a
$         + b * c$      pop a because a ⋗ +
$+        b * c$        shift + because $ ⋖ +
$+b       * c$          shift b because + ⋖ b
$+        * c$          pop b because b ⋗ *
$+*       c$            shift * because + ⋖ *
$+*c      $             shift c because * ⋖ c
$+*       $             pop c because c ⋗ $
$+        $             pop * because * ⋗ $
$         $             pop + because + ⋗ $
$         $             accept

5.5.4   Construction of the Precedence Relation Table

The actual precedence among the operators is used in constructing the precedence relation table.

If θ1 and θ2 are two operators, the following heuristics are used in constructing the table.

  1. If θ1 has higher precedence than θ2, then make the entries θ1 ⋗ θ2 and θ2 ⋖ θ1. For example, * has higher precedence than +, so define the relations between them as * ⋗ + as well as + ⋖ *.
  2. If θ1 and θ2 are of equal precedence, then consider associativity.
    • If they are left associative, define the relations as θ1 ⋗ θ2 and θ2 ⋗ θ1.

      For example, take the expression a + b – c + d. Both + and – are left associative, so the leftmost operator is reduced first. Whichever operator is to be reduced first must lie between ⋖ and ⋗. Here, to ensure left associativity, the relations must be + ⋗ – and – ⋗ +; unless both relations are defined as ⋗, left associativity cannot be obtained.

    • If they are right associative, define the relations as θ1 ⋖ θ2 and θ2 ⋖ θ1.

      For example, consider the expression a ↑ b ↑ c ↑ d, and assume ↑ is right associative. Here the rightmost ↑ is reduced first, then the one to its left. To ensure right associativity, the relation must be ↑ ⋖ ↑; unless the relations are defined as ⋖, right associativity cannot be obtained.

  3. In addition to operators, we find other symbols in an expression, like id, (, ), and $. The precedence relations are defined based on the actual precedence among the terminals. Some of the relations are as follows.
    • θ has lower precedence than id ⇒ θ ⋖ id and id ⋗ θ
    • θ has higher precedence than $ ⇒ θ ⋗ $ and $ ⋖ θ
    • id has higher precedence than $ ⇒ $ ⋖ id and id ⋗ $
    • θ and ( are right associative ⇒ θ ⋖ ( and ( ⋖ θ
    • θ and ) are left associative ⇒ θ ⋗ ) and ) ⋗ θ

      Other relations are ( ≐ ), ( ⋖ (, ) ⋗ ), $ ⋖ (, and ) ⋗ $.
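These heuristics can be applied mechanically. The sketch below (parentheses omitted for brevity; the operator names, precedence levels, and associativities passed in are illustrative inputs, not part of the original text) builds the operator, id, and $ entries of a relation table, using ASCII "<" and ">":

```python
def build_table(ops):
    """ops: {operator: (precedence, associativity)}.
    Returns rel[a][b] in {'<', '>'} for the operators, id and $."""
    terms = list(ops) + ["id", "$"]
    rel = {a: {} for a in terms}
    for a, (pa, assoc_a) in ops.items():
        for b, (pb, _) in ops.items():
            if pa > pb:
                rel[a][b] = ">"                       # rule 1
            elif pa < pb:
                rel[a][b] = "<"
            else:                                     # rule 2: associativity
                rel[a][b] = ">" if assoc_a == "left" else "<"
        rel[a]["id"] = "<"; rel["id"][a] = ">"        # rule 3: id binds tightest
        rel[a]["$"] = ">";  rel["$"][a] = "<"         # rule 3: $ binds loosest
    rel["id"]["$"] = ">"; rel["$"]["id"] = "<"
    return rel

table = build_table({"+": (1, "left"), "*": (2, "left")})
print(table["+"]["*"], table["*"]["+"], table["+"]["+"])   # < > >
```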

Example 3: Prepare precedence relation table for parsing the input string id + id * id.

Solution: The rows and columns of the table correspond to the terminals. Here the terminals are id, +, *, and $ (the input end marker), so the table has four rows and four columns. To fill in the relations, we need to know the actual precedences among the operators: here we assume that * has higher precedence than + and that both are left associative. By using the above procedure we prepare Table 5.5 as shown below.

 

Table 5.5 Precedence Relation Table

        id    +    *    $
  id          ⋗    ⋗    ⋗
  +     ⋖    ⋗    ⋖    ⋗
  *     ⋖    ⋗    ⋗    ⋗
  $     ⋖    ⋖    ⋖

Example 4: Consider the following grammar

Para → Sentence Rp | Sentence

Rp → ␣ Sentence Rp | ␣ Sentence

Sentence → word ␣ Sentence | word

word → letter * word | letter

letter → id

Here “␣” denotes a blank space.

  • (a) Convert the grammar into operator form.
  • (b) Define precedence relations among the terminals and show how a stack algorithm parses the string “id * id ␣ id * id.”

Solution: (a) In the given grammar, the productions for Para and Rp are not in operator form because they contain adjacent nonterminals; the remaining productions are in the required form. To convert Para and Rp into operator form, we need to eliminate the adjacent nonterminals “Sentence Rp.” Substituting for Rp in Para, we get the following grammar:

 

Para → Sentence ␣ Sentence Rp | Sentence ␣ Sentence | Sentence

Here, once again, we get adjacent nonterminals, so mere substitution does not work. Notice that in the given grammar “Sentence Rp” is exactly the right-hand side of Para → Sentence Rp. So in the above grammar, replace “Sentence Rp” by “Para.” The resulting grammar is

 

Para → Sentence ␣ Para | Sentence ␣ Sentence | Sentence

Sentence → word ␣ Sentence | word

word → letter * word | letter

letter → id

 

Now this is an operator grammar.

 

(b) To prepare the precedence relation table, we need the actual precedence among the operators. In the above grammar, ␣, * and id are the terminals. id has the highest precedence, and * has higher precedence than ␣, as * is defined at a lower level of the grammar. Both * and ␣ are right associative, as the productions defining them are right recursive.

 

The precedence relation table has 4 rows and 4 columns, for id, *, ␣, and $. It is shown in Table 5.6.

 

Table 5.6 Precedence Relation Table

        id    *    ␣    $
  id          ⋗    ⋗    ⋗
  *     ⋖    ⋖    ⋗    ⋗
  ␣     ⋖    ⋖         ⋗
  $     ⋖    ⋖    ⋖

No two ids appear adjacent in any expression; that would be an error, so no relation is defined for id on id. The same applies for ␣ on ␣.

5.5.5  Mechanical Method of Constructing Operator Precedence Table

There is another, mechanical way of constructing the operator precedence table. This method uses two functions, Leading() and Trailing(). Let us look at these functions.

Leading(A) is the set of terminals “a” such that “a” is the leftmost terminal in some string derived from A. It is computed using the following rule:

“a” is in Leading(A) if there is a production of the form A → αaβ, where α is ε or a single nonterminal.

Leading(A) = {a | A ⇒ αaβ, where α is ε or a single nonterminal}.

Trailing(A) is the set of terminals “a” such that “a” is the rightmost terminal in some string derived from A. It is computed using the following rule:

“a” is in Trailing(A) if there is a production of the form A → αaβ, where β is ε or a single nonterminal.

Trailing(A) = {a | A ⇒ αaβ, where β is ε or a single nonterminal}.

For example, consider the following grammar:

 

 

S → aABe

A → Ab | d

B → d

 

 

Leading(S) = {a}

Trailing(S) = {e}

 

Leading(A) = {b, d}

Trailing(A) = {b,d}

 

Leading(B) = {d}

Trailing(B) = {d}

5.5.6   Calculating the Operator Precedence Relations

Once leading and trailing of each nonterminal are computed, then the operator precedence table is filled with the relations based on the following rules.

  1. If there is a production A → αabβ or A → αaBbβ, where α and β are arbitrary strings, then set the relation a ≐ b.
  2. If there is a production A → αaBβ, where α and β are arbitrary strings, then set the relation a ⋖ x for every x in Leading(B).
  3. If there is a production A → αBaβ, where α and β are arbitrary strings, then set the relation x ⋗ a for every x in Trailing(B).
  4. If S is the start nonterminal, then set the relation $ ⋖ x for every x in Leading(S).
  5. If S is the start nonterminal, then set the relation x ⋗ $ for every x in Trailing(S).

The algorithm for setting the relations production by production is given below. After applying the algorithm, apply rules 4 and 5 to fill in the relations involving $.

For each A → X1 X2 … Xn do

      For i = 1 to n – 1 do

  1. If Xi and Xi+1 are both terminals, then set
    Xi ≐ Xi+1

    Or, if Xi and Xi+2 are terminals and Xi+1 is a nonterminal, then set

    Xi ≐ Xi+2
  2. If Xi is a terminal and Xi+1 is a nonterminal, then set
    Xi ⋖ a for every a in Leading(Xi+1)
  3. If Xi is a nonterminal and Xi+1 is a terminal, then set
    a ⋗ Xi+1 for every a in Trailing(Xi)
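Both the Leading/Trailing computation and the relation-filling algorithm are straightforward to code. The sketch below (assuming single-character symbols, uppercase for nonterminals) computes the two sets by iterating to a fixed point and then applies rules 1–5; it is shown on the grammar of the example that follows.

```python
def leading_trailing(grammar):
    """Fixed-point computation of Leading and Trailing for each nonterminal."""
    leading = {nt: set() for nt in grammar}
    trailing = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, rhss in grammar.items():
            for rhs in rhss:
                lead, trail = set(), set()
                if rhs[0].islower():               # leftmost symbol is a terminal
                    lead.add(rhs[0])
                else:                              # starts with a nonterminal
                    lead |= leading[rhs[0]]
                    if len(rhs) > 1 and rhs[1].islower():
                        lead.add(rhs[1])
                if rhs[-1].islower():              # rightmost symbol is a terminal
                    trail.add(rhs[-1])
                else:                              # ends with a nonterminal
                    trail |= trailing[rhs[-1]]
                    if len(rhs) > 1 and rhs[-2].islower():
                        trail.add(rhs[-2])
                if not lead <= leading[nt]:
                    leading[nt] |= lead; changed = True
                if not trail <= trailing[nt]:
                    trailing[nt] |= trail; changed = True
    return leading, trailing

def precedence_relations(grammar, start):
    """Apply rules 1-5, returning a set of (a, relation, b) triples."""
    leading, trailing = leading_trailing(grammar)
    rel = set()
    for rhss in grammar.values():
        for rhs in rhss:
            for i in range(len(rhs) - 1):
                x, y = rhs[i], rhs[i + 1]
                if x.islower() and y.islower():          # rule 1
                    rel.add((x, "=", y))
                if x.islower() and y.isupper():
                    if i + 2 < len(rhs) and rhs[i + 2].islower():
                        rel.add((x, "=", rhs[i + 2]))    # rule 1, across a nonterminal
                    for t in leading[y]:                 # rule 2
                        rel.add((x, "<", t))
                if x.isupper() and y.islower():
                    for t in trailing[x]:                # rule 3
                        rel.add((t, ">", y))
    for t in leading[start]:                             # rule 4
        rel.add(("$", "<", t))
    for t in trailing[start]:                            # rule 5
        rel.add((t, ">", "$"))
    return rel

GRAMMAR = {"S": ["aAcBe"], "A": ["Ab", "b"], "B": ["d"]}
print(sorted(precedence_relations(GRAMMAR, "S")))
```

For the grammar S → aAcBe, A → Ab | b, B → d used here, this prints the relations a ≐ c, c ≐ e, a ⋖ b, c ⋖ d, b ⋗ c, d ⋗ e, b ⋗ b, $ ⋖ a, and e ⋗ $.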

Example 5: Construct precedence relation table for the following grammar.

S → aAcBe

A → Ab | b

B → d

Solution: First calculate leading and trailing of nonterminals.

 

Leading(S) = {a}

Trailing(S) = {e}

 

Leading(A) = {b}

Trailing(A) = {b}

 

Leading(B) = {d}

Trailing(B) = {d}

 

Now construct the table with the above algorithm.

Consider the first production S → aAcBe.

 Here, using rule 1, a and c are separated by the single nonterminal A, and c and e by B.

      Hence a ≐ c and c ≐ e.

According to rule 2, a ⋖ Leading(A) and c ⋖ Leading(B).

      Hence a ⋖ b and c ⋖ d.

According to rule 3, Trailing(A) ⋗ c and Trailing(B) ⋗ e.

      Hence b ⋗ c and d ⋗ e.

According to rule 4, $ ⋖ Leading(S).

      Hence $ ⋖ a.

According to rule 5, Trailing(S) ⋗ $.

      Hence e ⋗ $.

Consider the second production A → Ab.

According to rule 3, Trailing(A) ⋗ b.

      Hence b ⋗ b.

 The table is prepared as shown in Table 5.7.

 

Table 5.7 Precedence Relation Table

        a    b    c    d    e    $
  a          ⋖    ≐
  b          ⋗    ⋗
  c                    ⋖    ≐
  d                         ⋗
  e                              ⋗
  $     ⋖

Example 6: Construct the precedence relation table for the following grammar.

S → baXaS | ab

X → Xab | aa

Solution:

 

 

leading(S)= {b,a}

trailing(S) ={a,b}

 

leading(X)= {a}

trailing(X) ={b,a}

Considering the production S → baXaS, we get the relations as follows:

using rule 1:    b ≐ a (b and a are adjacent), and a ≐ a (the two as are separated by the single nonterminal X)
using rule 2:    a ⋖ Leading(X), i.e., a ⋖ a; and a ⋖ Leading(S), i.e., a ⋖ b and a ⋖ a
using rule 3:    Trailing(X) ⋗ a, i.e., b ⋗ a and a ⋗ a

Considering the production S → ab, rule 1 gives a ≐ b.

Considering the production X → Xab, we get the relations

using rule 1:    a ≐ b
using rule 3:    Trailing(X) ⋗ a, i.e., b ⋗ a and a ⋗ a

Considering the production X → aa, rule 1 gives a ≐ a.

Applying rules 4 and 5, we get

$ ⋖ Leading(S), i.e., $ ⋖ b and $ ⋖ a; and Trailing(S) ⋗ $, i.e., a ⋗ $ and b ⋗ $.

Operator precedence table for the given grammar is shown in Table 5.8.

 

Table 5.8 Precedence Relation Table

        a            b         $
  a     ⋖, ⋗, ≐     ⋖, ≐      ⋗
  b     ⋗, ≐                   ⋗
  $     ⋖            ⋖

Note: This grammar is ambiguous; the table contains conflicting entries (for example, both a ⋖ a and a ⋗ a), so the parser has the option both to shift and to reduce.

5.5.7   Error Recovery in Operator Precedence Parser

Blank entries in any parsing table correspond to errors. For each blank entry we keep a pointer to a subroutine, so that whenever a blank entry is consulted the corresponding subroutine is called. To write error recovery routines, the compiler designer should have good knowledge of all possible errors. For example, look at Table 5.9: e1, e2, e3, and e4 are error recovery routines; they specify how to recover from each error.

 

Table 5.9 Error Recovery Routines in Precedence Relation Table

Table 5.9

Here e1() indicates that an operator is missing between two operands. Recovery here can be to insert an operator and issue an error message.

e2() indicates that the expression starts with ). Recovery here can be to delete the input symbol and issue an error message.

e3() indicates that the expression is missing. Recovery here can be to issue an error message.

e4() indicates that the expression ends with (. Recovery here can be to pop ( from the stack and issue an error message.

The advantage of the operator precedence parser is that it is simple to construct and can be used for parsing expression grammars. It works by considering precedence among operators; hence an operator that has two different precedences (e.g., –, used as both binary and unary minus) cannot be handled by this parser. The problem is solved at lexical analysis itself, by separating the two uses into two different tokens instead of a single token with two precedences.

Another difficulty with the operator precedence parser is space: the more operators the grammar supports, the larger the parsing table (it is an n × n table, where n is the number of operators/terminals). To save space, the precedence relation table can be converted into another table called the precedence function table.

For example, consider the precedence relation Table 5.10.

 

Table 5.10 Precedence Relation Table

        id    +    *    $
  id          ⋗    ⋗    ⋗
  +     ⋖    ⋗    ⋖    ⋗
  *     ⋖    ⋗    ⋗    ⋗
  $     ⋖    ⋖    ⋖

It can be reduced to a precedence function table as shown below in Table 5.11.

 

Table 5.11 Precedence Function Table

        +    *    id    $
  f     2    4    4     0
  g     1    3    5     0

Here two integer-valued functions, “f” and “g,” encode the precedence relations numerically. The actual relation between two terminals is recovered by a numerical comparison of the two entries.

For example,

If we want the precedence relation between * and +:

“*” is the row element; function “f” maps row values, so take f(*) = 4.

“+” is the column element; function “g” maps column values, so take g(+) = 1.

f(*) > g(+); hence the relation is * ⋗ +.

If we want the precedence relation between + and id:

“+” is the row element; function “f” maps row values, so take f(+) = 2.

“id” is the column element; function “g” maps column values, so take g(id) = 5.

f(+) < g(id); hence the relation is + ⋖ id.

5.5.8   Procedure for Converting Precedence Relation Table to Precedence Function Table

Input: the precedence relation table.

Step 1: For every terminal a, create two symbols fa and ga.

Step 2: Partition the symbols into groups using the ≐ relation.

For example, if a ≐ b, then fa and gb are combined into one group.

      If, in addition, a ≐ c, then fa, gb, and gc are all combined into one group.

Step 3: Create a digraph whose nodes are the groups of Step 2 and whose edges are given by the ⋖ and ⋗ relations.

For example, if a ⋗ b, then add an edge from the group of fa to the group of gb:

fa → gb

If a ⋖ b, then add an edge from the group of gb to the group of fa:

gb → fa

Add an edge for each ⋖ or ⋗ relation in the table.

Step 4: Check the digraph for cycles. If there is a cycle, stop the procedure and conclude that the table cannot be converted to a function table.

Step 5: If there are no cycles, that is, if the digraph is acyclic, then the length of the longest path starting from the group of fa gives f(a), and the length of the longest path starting from the group of ga gives g(a), for each terminal a.
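Under the assumption that no ≐ relations are present (so each symbol is its own group, and the digraph for this table is acyclic), the procedure can be sketched as follows; the relation table is the id/+/*/$ table used earlier, written with ASCII "<" and ">":

```python
from functools import lru_cache

REL = {
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
}

def precedence_functions(rel):
    edges = {}
    for (a, b), r in rel.items():
        if r == ">":                       # f(a) > g(b): edge f_a -> g_b
            edges.setdefault(("f", a), []).append(("g", b))
        elif r == "<":                     # f(a) < g(b): edge g_b -> f_a
            edges.setdefault(("g", b), []).append(("f", a))
        # an = relation would merge f_a and g_b into one node (omitted here)

    @lru_cache(maxsize=None)
    def longest(node):                     # longest path (in edges) out of node;
        return max((1 + longest(t)        # assumes the digraph is acyclic
                    for t in edges.get(node, [])), default=0)

    terms = {a for a, _ in rel} | {b for _, b in rel}
    f = {a: longest(("f", a)) for a in terms}
    g = {a: longest(("g", a)) for a in terms}
    return f, g

f, g = precedence_functions(REL)
print(sorted(f.items()))   # [('$', 0), ('*', 4), ('+', 2), ('id', 4)]
print(sorted(g.items()))   # [('$', 0), ('*', 3), ('+', 1), ('id', 5)]
```

The computed values agree with the function table above: for instance, f(*) = 4 > g(+) = 1, recovering * ⋗ +.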

Example 7: Convert the following relation Table 5.12 to function table.

 

Table 5.12 Precedence Relation Table

        id    +    *    $
  id          ⋗    ⋗    ⋗
  +     ⋖    ⋗    ⋖    ⋗
  *     ⋖    ⋗    ⋗    ⋗
  $     ⋖    ⋖    ⋖

Solution: There are four terminals, id, *, +, and $; hence, create 8 symbols: fid, gid, f+, g+, f*, g*, f$, and g$. As there are no ≐ relations, each symbol forms its own group. Now construct the digraph shown in Figure 5.3, with each symbol as a node, that is, 8 nodes, as follows.

To get the precedence function f(id), start from the node fid and traverse all possible paths from it. A path must start at fid but can end anywhere (it need not end at f$ or g$); its length is the number of edges on it.

For example, from node fid there are four possible paths as follows:

 

      fid – g$, where the path length is 1

      fid – g* – f+ – g$, where the path length is 3

      fid – g* – f+ – g+ – f$, where the path length is 4

      fid – g+ – f$, where the path length is 2


Figure 5.3 Digraph for the Precedence Relation Table

The longest path length is 4; hence f(id) is 4.

Computing the remaining functions in the same way gives the table shown below in Table 5.13.

 

Table 5.13 Precedence Function Table

        +    *    id    $
  f     2    4    4     0
  g     1    3    5     0

5.6   LR Grammar

A grammar for which we can construct an LR parser is called an LR grammar. A grammar that can be parsed by an LR parser examining up to k input symbols on each move is called an LR(k) grammar. There is a difference between LL and LR grammars. For a grammar to be LR(k), we must be able to recognize the occurrence of the right side of a production, having seen all of what is derived from that right side, with k symbols of lookahead. This is a less stringent requirement than that for LL(k), where we must recognize the use of a production seeing only the first k symbols of what its right side derives. Thus, LR grammars can describe more languages than LL grammars.

5.7   LR Parsers

LR parsers are the most efficient bottom-up parsers and can be implemented for almost any programming language. The class of grammars that can be parsed using LR parsers is a proper superset of the class of grammars that can be parsed with predictive parsers. It is difficult to write or trace LR parsers by hand; usually a parser generator like yacc (or bison) is used. With such tools, one writes a context-free grammar and has the generator automatically produce a parser for it. In LR(k) parsing, L stands for a left-to-right scan of the input buffer, R stands for a rightmost derivation in reverse, and k stands for the maximum number of lookahead symbols used in making parsing decisions. If k is omitted, it is assumed to be 1. To construct an LR parser, there is no restriction on the grammar. Unlike LL(1), we do not have to worry about left recursion, left factoring, and so on; even if the grammar is left recursive, construction can start without eliminating the recursion.

The LR parser is a bottom-up parser that makes use of a DFA. The DFA recognizes the set of all viable prefixes, reading the stack from bottom to top, and thereby determines what handle, if any, is available. A viable prefix of a right sentential form is a prefix that contains a handle but no symbol to the right of the handle. Therefore, a DFA that recognizes viable prefixes can be used to guide handle selection in the bottom-up parser. Let us understand how an LR parser works.

The LR parser uses a DFA that recognizes viable prefixes to guide the selection of handles; hence, it must keep track of the states of the DFA. That is why the LR parser stack contains two types of symbols: state symbols, which identify the states of the DFA, and grammar symbols. The state symbol on top of the stack contains all the information the parser needs. The parser starts with the initial state of the DFA, I0, on the stack. It operates by looking at the next input symbol “a” and the state symbol Ii on top of the stack. If the DFA has a transition from state Ii on “a” to state Ij, the parser shifts the symbol “a,” followed by the state symbol j, onto the stack. If there is no transition from state Ii on “a,” and the state Ii on top of the stack recognizes a viable prefix that contains the handle A → α, then the parser carries out a reduce action by popping α off and pushing A onto the stack. This is equivalent to making a backward transition from state Ii on α in the finite automaton and then a forward transition on A. Every shift action of the parser corresponds to a transition on a terminal symbol in the DFA. Therefore, the current state of the DFA and the next input symbol determine whether the parser performs a shift action or a reduce action. Now let us define the LR parsing algorithm. An LR parser, like any SR parser, performs two main actions, shift and reduce, until it halts—either on successful completion or on an error.

5.8   LR Parsing Algorithm

5.8.1  Task of LR Parser: Detect Handle and Reduce Handle

An LR parser, shown in Figure 5.4, uses a stack, an input buffer, and a parsing table. The parsing table is divided into two parts: action and goto. The rows of the parsing table correspond to the states of the DFA used in recognizing the handle. What action the LR parser is supposed to take in each state is defined under the action part. The entries in the action part are shift (Si), reduce (ri), or accept. The goto part contains state numbers under the nonterminals. Try to understand the structure of the LR parsing table; we shall discuss the construction procedure later. Look at the example LR parsing table, Table 5.14, shown below.

Figure 5.4

Figure 5.4 Model of LR Parser

Table 5.14 LR Parsing Table

Table 5.14

The stack stores grammar symbols interleaved with state symbols, as s0X1s1X2s2…Xmsm, where s0, s1, …, sm are state symbols and X1, X2, …, Xm are grammar symbols. To take parsing decisions, the top of the stack must always be a state, because the rows of the parsing table are indexed by states. So, to start with, the LR parser pushes the start state onto the stack.

Given a grammar and an input string to be parsed, the parser first constructs the parsing table. The input string to be parsed is read into the input buffer, and an end marker "$" is appended at the end. Initially "$" is pushed onto the stack; this is to identify the bottom of the stack. The parser then pushes the start state of the DFA, that is, 0, onto the stack. The input pointer "ip" points to the first symbol in the input buffer.

Parsing algorithm:

Let “X” be the state on top of the stack and “a” be the symbol pointed by “ip.”

Action part of parsing table is used to take parsing decisions as follows.

Repeat forever:

  1. If action[X,a] = si, then (shift action)

     push "a" and then the state i onto the stack, and advance the input pointer to the next symbol.

  2. If action[X,a] = ri, then (reduce action)

     take the ith production of the grammar, say A → β; pop 2 * |β| symbols off the stack; if Xi is the state now exposed on top of the stack, push A and then goto[Xi,A] onto the stack; output the production A → β and continue parsing.

  3. If action[X,a] = accept, then (successful completion)

     halt and announce successful parsing.

  4. If action[X,a] is empty, then report a syntax error and halt.

All the LR parsers use the same parsing algorithm. Let us understand the algorithm with an example.

Example 8: Parse the input string “aabb” using the grammar

 

S → AA, A → aA | b

Solution: The SLR(1) parsing table for the given grammar is shown in Table 5.15.

 

Table 5.15 SLR(1) Parsing Table

Table 5.15

The parser compares the state "s" on top of the stack with the look ahead symbol "a." It refers to the action part of the parsing table and performs the action defined under terminal "a" in row "s."

The set of actions performed by the parser is shown in Table 5.16.

All the four LR parsers—LR(0), SLR(1), LALR(1), and CLR(1)—use the algorithm mentioned above for parsing. They differ only in construction of the parsing table. Let us look at the parsing table construction. Most of the procedure for the construction of the parsing table is common for all the four parsers. Hence, we first discuss the common procedure, then we take up specific parser design separately. The general procedure for all four LR parsers parsing table construction is as follows:

 

Table 5.16 LR Parsing Actions

Stack      Input   Action performed
$          aabb$   Push state 0
$0         aabb$   s3: shift "a" then 3, increment ip
$0a3       abb$    s3: shift "a" then 3, increment ip
$0a3a3     bb$     s4: shift "b" then 4, increment ip
$0a3a3b4   b$      reduce by r3, i.e., A → b: pop 2 symbols, push A, push goto[3,A] = 6
$0a3a3A6   b$      reduce by r2, i.e., A → aA: pop 4 symbols, push A, push goto[3,A] = 6
$0a3A6     b$      reduce by r2, i.e., A → aA: pop 4 symbols, push A, push goto[0,A] = 2
$0A2       b$      s4: shift "b" then 4, increment ip
$0A2b4     $       reduce by r3, i.e., A → b: pop 2 symbols, push A, push goto[2,A] = 5
$0A2A5     $       reduce by r1, i.e., S → AA: pop 4 symbols, push S, push goto[0,S] = 1
$0S1       $       accept
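The trace in Table 5.16 can be reproduced with a small driver loop. The sketch below hard-codes ACTION and GOTO tables for the grammar of Example 8; the state numbers and goto entries are transcribed by hand from the trace, so treat this as an illustration of the algorithm rather than a generated parser.

```python
# Grammar: 1: S -> AA, 2: A -> aA, 3: A -> b  (state numbers as in Table 5.15)
PRODS = {1: ("S", "AA"), 2: ("A", "aA"), 3: ("A", "b")}

ACTION = {
    (0, "a"): ("s", 3), (0, "b"): ("s", 4),
    (1, "$"): ("acc", None),
    (2, "a"): ("s", 3), (2, "b"): ("s", 4),
    (3, "a"): ("s", 3), (3, "b"): ("s", 4),
    (4, "a"): ("r", 3), (4, "b"): ("r", 3), (4, "$"): ("r", 3),
    (5, "$"): ("r", 1),
    (6, "a"): ("r", 2), (6, "b"): ("r", 2), (6, "$"): ("r", 2),
}
GOTO = {(0, "S"): 1, (0, "A"): 2, (2, "A"): 5, (3, "A"): 6}

def parse(w):
    stack, ip, output = [0], 0, []   # stack holds alternating states and symbols
    w = w + "$"
    while True:
        state, a = stack[-1], w[ip]
        kind, arg = ACTION.get((state, a), ("err", None))
        if kind == "s":              # shift: push the symbol, then the new state
            stack += [a, arg]
            ip += 1
        elif kind == "r":            # reduce by production number arg
            lhs, rhs = PRODS[arg]
            del stack[len(stack) - 2 * len(rhs):]   # pop 2*|rhs| entries
            stack += [lhs, GOTO[(stack[-1], lhs)]]
            output.append(f"{lhs}->{rhs}")
        elif kind == "acc":
            return output
        else:
            raise SyntaxError(f"no action for state {state} on {a!r}")

print(parse("aabb"))   # the reductions of Table 5.16, ending with S->AA
```

Running the driver on "aabb" performs the same shift and reduce steps as the trace above.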

5.9   Construction of the LR Parsing Table

  1. Given grammar, take augmented grammar.
  2. Create canonical collection of LR items.
  3. Draw DFA and prepare table.

Here, the term "LR items" differs for the different types of LR parsers.

image

If you want to define the construction of the SLR(1)/LR(0) parsing table, in the above procedure, replace the words "LR items" with "LR(0) items":

  1. Given grammar, take augmented grammar.
  2. Create canonical collection of LR(0) items.
  3. Draw DFA with the set of items and prepare the table.

If it is CLR(1)/LALR(1), replace the words "LR items" with "LR(1) items":

  1. Given grammar, take augmented grammar.
  2. Create canonical collection of LR(1) items.
  3. Draw DFA and prepare the table.

Since most of the procedure is common, let us look at the common procedure step by step.

5.9.1  Augmented Grammar

Given a grammar "G" with "S" as the start symbol, if we want to change the start symbol from "S" to "S″," we add a rule S″ → S to the grammar. The grammar with this newly added production is called the augmented grammar. So writing the augmented grammar is easy. But a question may arise: what is the use of the augmented production? This rule helps the parser understand when to stop parsing and announce successful completion of the process. LR parsing is bottom-up parsing: it starts with the input string and performs a series of reductions until the string is reduced to the start symbol, so the ultimate reduction is a reduction to the start symbol. If we take reduction by the start symbol as the final reduction, the parser can sometimes be confused, for two reasons. (1) The start symbol may also appear on the right hand side of some rule, in which case a reduction by S can occur in the middle of parsing, not only at the end. (2) There is no restriction that the start symbol be defined by only one rule; it can have more than one alternative definition, in which case there is a dilemma about which one to consider the final reduction. That is why, whether the start symbol has one rule or more, or occurs on the right hand side of some rule, there is no loss in appending the augmented production S″ → S, which unambiguously determines when to stop parsing and announce successful completion.

Hence, given a grammar "G" with "E" as the start symbol, write the augmented grammar as G together with E″ → E.

Here, we first discuss the construction of LR(0) and SLR(1). The next step in parsing table construction is creating canonical collection of LR(0) items. First let us look at what is LR(0) item or item in simple.

5.9.2  LR(0) Item

A production with a "dot" at some position on the right hand side is called an LR(0) item. So with a "dot" a production is called an LR(0) item, and without the dot it is just a production; the "dot" makes the difference between the two. Let us understand the significance of the "dot."

A bottom-up parser starts with the input string and reads symbol by symbol from left to right until a handle is detected; once a handle is found, it reduces. A handle is nothing but the right hand side of some rule. Until the complete handle is read, reduction cannot be performed. So the "dot" tells us how much of the right hand side the parser has seen at any time.

For example, consider a production X → abc. Here the handle is "abc." The parser will not read "abc" at once. It first reads "a," which is indicated by the item X → a • bc; once it reads "b," this is indicated by the item X → ab • c. The sequence of items generated while recognizing the handle "abc" is as follows:

 

      X → • abc ..................not yet read anything

      X → a • bc .................read "a" yet to read "bc"

      X → ab • c .................read "b" yet to read "c"

      X → abc • .................read "c" ready for reducing.

 

An item with "•" at the end is called complete item/final item. Similarly, a production S → ABC yields items as follows:

 

      S → •ABC

      S → A • BC

      S → AB• C

      S → ABC •

 

The production A → ε generates final item A → •. An item can be represented by a pair of integers (a,b) where a is the number of the production and b is the position of the dot.
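This dot-sliding process can be sketched mechanically. The helper below is illustrative only, assuming that each character of the right hand side stands for a single grammar symbol:

```python
def items(lhs, rhs):
    # all LR(0) items of the production lhs -> rhs,
    # one per dot position (0 .. len(rhs))
    return [f"{lhs} -> {rhs[:i]}.{rhs[i:]}" for i in range(len(rhs) + 1)]

print(items("X", "abc"))
# the last item, with the dot at the right end, is the complete/final item
```

Note that an empty right hand side (an ε-production) yields exactly one item, the final item with the dot alone.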

We have so far studied what an item and a set of items are; now let us see how to construct the LR parser. For constructing the LR(0) parser or the SLR(1) parser, we have to create the canonical collection of LR(0) items. To create it, we use two functions, Closure() and Goto(). Let us understand these functions.

5.9.3  Closure(I)

I is a set of LR(0) items like Α→α•Ββ, C→a•. The closure function takes a set of items as input and produces a set of LR(0) items as output. The function is evaluated using the following two rules.

  1. Initially add every item from input to output.
  2. If Α→α•Ββ is in I, where B is a nonterminal and B → γ is a rule for B, then add B → • γ to Closure(I). Repeat this for every newly added item.

To evaluate Closure(I), we first add all items from the input I to the output. These are the initial items. In each initial item, examine the grammar symbol that follows the "•." If it is a terminal, there is no need to add any new items. But if it is a nonterminal, then take the productions for that nonterminal and add them to Closure(I) with the dot at the beginning.

Example 9: Find Closure (E″ → •E) for the following grammar:

 

      E ″ → E

      E → E + T | T

      T → T * F | F

      F → id | (E)

Solution: First add E″ → • E. In the newly added item, the grammar symbol that follows the dot is a nonterminal, i.e., E; so take the rules for E and add them with the dot at the beginning.

We get

 

      Ε″→ • E

      E → • E + T | • T

 

Now repeat rule 2 for the newly added item. "T" is a nonterminal, hence add rule for T with dot at the beginning.

 

      E″ → • E

      E → • E + T | • T

      T → • T * F | • F

 

Now repeat rule 2 for the newly added item. "F" is a nonterminal, hence add rule for F with dot at the beginning.

 

      Ε″ → • E

      E → • E + T | • T

      T → • T * F | • F

      F → • id | • (E) This is closure(E″ → • E).
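The closure computation of Example 9 can be sketched in Python. Items are represented as (lhs, rhs, dot) triples, and "id" is treated as a single terminal; this encoding is an illustrative assumption, not part of the original text:

```python
# Grammar of Example 9, as lhs -> list of right-hand sides (tuples of symbols)
GRAMMAR = {
    "E'": [("E",)],
    "E":  [("E", "+", "T"), ("T",)],
    "T":  [("T", "*", "F"), ("F",)],
    "F":  [("id",), ("(", "E", ")")],
}

def closure(items):
    """Add B -> .gamma for every nonterminal B that appears right after a dot."""
    result = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(result):
            if dot < len(rhs) and rhs[dot] in GRAMMAR:   # nonterminal after dot
                for alt in GRAMMAR[rhs[dot]]:
                    item = (rhs[dot], alt, 0)
                    if item not in result:
                        result.add(item)
                        changed = True
    return result

I = closure({("E'", ("E",), 0)})   # closure(E' -> .E)
print(len(I))   # 7 items: E' -> .E plus both dotted rules of each of E, T, F
```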

5.9.4  Goto(I,X)

Here, I is a set of items and X is a grammar symbol.

The Goto function basically defines what is the transition/change in the set of items I on seeing the grammar symbol X. This is formally defined as

      goto(I,X) = Closure({A → αX•β | A → α•Xβ is in I}),

that is, the closure of all items of the form A → αX•β such that A → α•Xβ is in I.

If I = {A → α•Xβ}, then to evaluate goto(I,X) we first find the transition in I on X. The grammar symbol X matches the symbol that follows the dot in I, so the result is the new item A → αX•β, with the dot moved past X. Then we find the closure of the resulting items. Suppose instead that I is {A → αX•Aβ}; then goto(I,X) = φ, that is, no new item is generated, because the grammar symbol that follows the dot in the item is A while the argument is X, and the two do not match. The evaluation of the goto function involves two steps:

  1. Find transition
  2. Apply Closure() on resulting items.

Consider the following example:

Example 10: If I = {E″ → • E, E → E • + T} and G is

E → E + T | T, T → T * F | F, F → id | (E)

Find goto (I, +).

Solution: First find the transition on the symbol "+" in each item. In the first item, "E″ → • E," there is no match with "+." In the second item, "E → E • + T," there is a match with "+," so the result is E → E + • T. Now apply closure on this. "T" is a nonterminal, so add the rules for "T" with the dot at the beginning. The result therefore is as follows:

 

I1: E → E + • T

      T → • T * F | • F

      F → • id | • (E)

 

Let us see one more example. Find goto(I1,T).

First find the transition on "T" in each item. In the first item, "E → E + • T," there is a match, so the result is E → E + T •. In the second item, "T → • T * F," there is a match, so the result is T → T • * F. After finding the transitions, we get the following items:

 

      E → E + T •

      T → T • * F

 

Now apply closure on this: "*" is a terminal, so there is no need to add rules. The result therefore is as follows:

 

      E → E + T •

      T → T • * F

 

As we have understood closure and goto functions, let us discuss how to create canonical collection of LR(0) items by using these two functions.

5.9.5  Creating Canonical Collection "C" of LR(0) Items

C = {I0,I1,I2,I3......In}

  1. The initial item in C is I0 = closure(augmented production with the dot at the beginning). For example, in the above grammar, I0 = closure(E″ → • E).
  2. For each Ii in C and each grammar symbol X in G

    Repeat

    if goto(Ii,X) is not empty and not already in C, add goto(Ii,X) to C

    until no more new sets of items can be added to set C.

The initial item in C is obtained using the closure function. The remaining elements in set C are obtained by the goto function.

Example 11: Construct canonical collection of LR(0) items for grammar.

S″ → S, S → AA, A → aA | b

Solution: I0 = closure(S″ → • S) = S″ → • S

          S → • AA

          A → • aA | • b

      I1 = Goto(I0,S) = S″ → S •

      I2 = Goto(I0,A) = S → A • A

        A → • aA | • b

      I3 = Goto(I0,a) = A → a • A

        A → • aA | • b

      I4 = Goto(I0,b) = A → b •

 

That's all the transitions from I0. Now apply the goto function on I1. Since I1 has only one final item, there is no need to apply the goto function. Now consider I2

 

      I5 = Goto(I2,A) = S → AA •

        Goto(I2,a) = I3, Goto(I2,b) = I4

      I6 = Goto(I3,A) = A → aA •

        Goto(I3,a) = I3, Goto(I3,b) = I4

 

As I4, I5 and I6 have only final items, this completes canonical collection.
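The canonical collection of Example 11 can be computed mechanically. The sketch below bundles compact closure and goto helpers for this particular grammar, with items again as (lhs, rhs, dot) triples; it is a minimal illustration, not a general parser generator:

```python
GRAMMAR = {"S'": [("S",)], "S": [("A", "A")], "A": [("a", "A"), ("b",)]}
SYMBOLS = ["S", "A", "a", "b"]          # grammar symbols to try goto on

def closure(items):
    out = set(items)
    while True:
        new = {(rhs[dot], alt, 0)
               for lhs, rhs, dot in out
               if dot < len(rhs) and rhs[dot] in GRAMMAR
               for alt in GRAMMAR[rhs[dot]]} - out
        if not new:
            return out
        out |= new

def goto(items, X):
    # move the dot over X wherever X follows the dot, then take the closure
    moved = {(lhs, rhs, dot + 1) for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == X}
    return closure(moved)

# canonical collection: start from the closure of the augmented item,
# then keep applying goto until no new item set appears
C = [frozenset(closure({("S'", ("S",), 0)}))]
for I in C:                              # C grows while we iterate over it
    for X in SYMBOLS:
        J = frozenset(goto(I, X))
        if J and J not in C:
            C.append(J)

print(len(C))   # 7 item sets, matching I0 .. I6 of Example 11
```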

5.9.6  Construction of DFA with a Set of Items

Now, after creating the canonical collection, we draw the DFA with the item sets as states, I0 as the initial state, and the goto transitions as edges. This deterministic finite automaton recognizes the viable prefixes of G. The state symbol stored on top of the stack is the state the handle-recognizing finite automaton would be in if it had read the grammar symbols of the stack from bottom to top. From the canonical collection in Example 11 above, we can draw the DFA as shown in Figure 5.5.

Figure 5.5

Figure 5.5 DFA for Grammar in Example 11

Taking each state Ii of the DFA as a final state and I0 as the initial state, the DFA recognizes exactly the viable prefixes of the grammar.

The final step in the construction of the LR parsing table is preparing the table from the DFA. To prepare the table, we first need to know the number of rows and columns. For each state Ii in the DFA, we define a row "i" in the parsing table. For example, for the above DFA the table has 7 rows, as there are 7 states, I0 to I6. Columns are given by the terminals in the action part and the nonterminals in the goto part; one additional column is defined for "$" in the action part. For the above example, the LR table will have 7 rows, that is, 0 to 6, and 5 columns, that is, a, b, $, S, A. Coming to the table preparation, we need to know how to enter shift entries, goto states, and reduce entries. The parsers basically differ only in placing reduce entries; the remaining procedure, placing shift entries and goto states, is common to all of them. Now let us see how to prepare the shift entries and goto information using the DFA: a transition on a terminal yields a shift entry, and a transition on a nonterminal yields a goto state.

Consider the following transition on terminal 'a',

image

The corresponding shift entry is

image

Consider the following transition on nonterminal A,

image

The corresponding goto state entry is

image

The procedure we have discussed thus far is common to all LR parsers, except for the item type. What is left to complete the table is the reduce entries, and this is where the four parsers differ. Hence, let us consider each parser separately for discussing the preparation of reduce entries. Let us start with the simplest of all, that is, the LR(0) parser.
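Splitting the DFA edges into shift entries and goto entries can be sketched as follows. The transition list below is transcribed by hand from the DFA of Figure 5.5 (Example 11), so the state numbers are assumptions tied to that figure:

```python
# DFA transitions: (state, symbol) -> state, from the DFA of Example 11
TRANS = {(0, "S"): 1, (0, "A"): 2, (0, "a"): 3, (0, "b"): 4,
         (2, "A"): 5, (2, "a"): 3, (2, "b"): 4,
         (3, "A"): 6, (3, "a"): 3, (3, "b"): 4}
NONTERMS = {"S", "A"}

action, goto_tab = {}, {}
for (i, X), j in TRANS.items():
    if X in NONTERMS:
        goto_tab[(i, X)] = j        # transition on a nonterminal: goto entry
    else:
        action[(i, X)] = f"s{j}"    # transition on a terminal: shift entry

print(action[(0, "a")], goto_tab[(2, "A")])   # s3 5
```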

5.10   LR(0) Parser

The main characteristic of this parser is that it does not use any look ahead in making parsing decisions. If you recollect, in LR parsing, parsing decisions (when to go for shift action/reduce action) are made by looking at the top stack state and the look ahead symbol. In the LR(0) parser, the top stack state unambiguously determines the action to be performed at any time, without looking at the look ahead symbol.

To enter the reduce entries, the parser once again uses the DFA. It checks for states that contain a final item (an item with the dot at the right end). A final item indicates that the entire right hand side has been seen by the parser; hence, it is ready for reduce action. If a state Ii contains a final item A → α•, then the reduce entries are placed in the corresponding row "i." The productions of the given grammar are numbered; if A → α is the jth rule, then in row "i" place the reduce entry rj under all the columns, because the columns in the table are look ahead symbols and LR(0) makes parsing decisions independent of the look ahead. Hence, place the reduce entry under every column in the action part. For example, consider the following state of the DFA:

 

S → AA•

I5

The corresponding reduce entry in table is

image

If the final item is S″ → S•, then place the entry "acc" instead of a reduce entry, because reducing by S″ → S amounts to acceptance. For example, consider the following state of the DFA:

 

S″ → S•

I1

The corresponding reduce entry in table is

image

When the top of the stack is state 1, without looking at the input symbol it announces successful completion.

Example 12: Consider the grammar:

  1. S → (L)
  2. S → a
  3. L → S
  4. L → L, S

Prepare the LR(0) parsing table for the above grammar.

Solution: Step 1: Take the augmented grammar as follows:

 

S″→ S, S → (L), S → a

L → S, L → L, S

 

Step 2: Create canonical collection and draw DFA. DFA is shown in Figure 5.6.

Step 3: Table is prepared and is shown in Table 5.17.

Figure 5.6

Figure 5.6 DFA for Grammar in Example 12

 

Table 5.17 LR Parsing Table for Example 12

image

5.10.1  Advantages of the LR(0) Parser

  • It is very simple to construct the LR(0) Parser compared to other LR parsers.
  • Each row in the table defines unique action, that is, either shift action or reduce action or accept.

5.10.2  Disadvantages of the LR(0) Parser

  • The LR(0) Parser can be used to parse a small class of grammars.
  • Look ahead is not used in making parsing decisions.

5.10.3  LR(0) Grammar

The grammar for which there are no multiple entries in the LR(0) parsing table is called LR(0) Grammar. An LR(0) grammar indicates that this grammar is suitable for constructing the LR(0) parser.

Given a grammar, let us see how to check whether it is LR(0) or not. The straightforward method is to construct the LR(0) parsing table for the grammar and check whether there are any multiple entries in the table. No multiple entry in the table indicates that the grammar is LR(0). For example, if grammar has 100 nonterminals and 100 terminals, then to check whether it is LR(0) or not, we need to construct a table with 100 × 100, which is a tedious task. Hence, let us see a simple way of checking whether the grammar is LR(0) or not. The basic criterion used is checking the multiple entries in the table. Let us see how to check the possibility of multiple entries without constructing the table.

5.10.4  Conflicts in Shift-Reduce Parsing

After constructing the DFA, check if any state contains a combination of final and nonfinal item. For example, assume that state I5 has items as follows.

image

If we prepare the corresponding parsing table entries, we get a table as shown below:

image

On seeing the input symbol "b," there is a conflict. The nonfinal item in state I5 indicates that on seeing "b" the parser may enter some other state, which leads to a shift action. At the same time, the final item in the same state indicates a reduce action. So the same state indicates two different actions, shift and reduce, which confuses the parser. This conflict is called a shift/reduce conflict. Here, the parser is not able to decide whether to shift or to reduce. Hence, by checking for shift/reduce conflicts, we can check for multiple entries in the table.

For example, the following LR(0) items

 

      A → α • Αβ

image do not have an SR conflict because there is no shift action. The symbol next to the dot in the nonfinal item is "A," which is a nonterminal. On seeing "A," the parser enters another state, which corresponds to a goto entry, not a shift action. Only the reduce action remains; hence, there is no SR conflict at all. So to test for SR conflicts we test each state of the DFA as follows:

 

If there is a combination of final and nonfinal items where the grammar symbol next to the dot in the nonfinal item is a terminal, such as the following, then the state has a shift/reduce conflict.

image

Another way of checking the possibility of multiple entries without constructing the table is as follows. After constructing the DFA, check if any state contains a combination of more than one final item. For example assume that state I5 has items as follows:

image

If we prepare the corresponding parsing table entries, we get a table as shown below:

 

Let r1 : S → a     and     r2 : A→b
image

There are two final items in state I5. A final item indicates a reduce action, so the same state indicates two different reduce actions, which confuses the parser. This conflict is called a reduce/reduce conflict. Here, the parser is not able to decide whether to reduce by r1 or by r2. Hence, by checking for reduce/reduce conflicts, we can check for multiple entries in the table. To test for an RR conflict, we check whether any state of the DFA contains a combination of more than one final item.

There are two kinds of conflicts:

Shift/reduce conflict: Here, the parser is not able to decide whether to shift or to reduce. Reduce/reduce conflict: Here, the parser cannot decide which sentential form to use for reduction.

To verify whether the given grammar is LR(0) or not, check each state of the DFA for shift/reduce or reduce/reduce conflicts. The necessary condition for a conflict is that there is at least one final item. In any state of the DFA, if along with a final item there is a nonfinal item in which the grammar symbol next to the dot is a terminal, then there is a shift/reduce conflict. In any state of the DFA, if there is more than one final item, then there is a reduce/reduce conflict.
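The two checks just described can be expressed directly over a state's item set. The helper below is a hypothetical sketch using (lhs, rhs, dot) triples for items:

```python
def conflicts(state_items, nonterminals):
    """Classify LR(0) conflicts in one DFA state (items as (lhs, rhs, dot))."""
    # final items: dot at the right end -> reduce actions
    final = [it for it in state_items if it[2] == len(it[1])]
    # shift items: a terminal immediately after the dot
    shift = [it for it in state_items
             if it[2] < len(it[1]) and it[1][it[2]] not in nonterminals]
    found = []
    if final and shift:
        found.append("shift/reduce")
    if len(final) > 1:
        found.append("reduce/reduce")
    return found

# State I2 of Example 14: S -> a.  and  A -> a.  (two final items)
print(conflicts({("S", ("a",), 1), ("A", ("a",), 1)}, {"S", "A"}))
```

A state whose only nonfinal items have a nonterminal after the dot reports no shift/reduce conflict, matching the discussion above.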

Example 13: Check whether the following grammar is LR(0)

 

      S → AA, A→ aA | b

Solution: Look at the DFA constructed for the above grammar in the previous example. Check each state for conflicts. No state has a final item together with another final item, or with a nonfinal item whose symbol after the dot is a terminal; hence, there are no conflicts. So the grammar is LR(0).

Example 14: Check whether the following grammar is LR(0)

 

      S → a | A, A → a

Solution: Take augmented grammar G″ as follows:

  

      S″ → S, S → a | A, A → a

 

Draw the DFA with LR(0) items by using the procedure described in 5.9.6.

State I2 has reduce-reduce conflict as shown in Figure 5.7. On reading input symbol "a," it may not be able to decide whether to reduce by first rule or second rule. Hence, it is not LR(0).

In fact, the grammar is ambiguous. No ambiguous grammar can be LR(0).

Example 15: Check whether the following grammar is LR(0)

 

      E → E + T | Τ, T → a

Solution: Take augmented grammar G″ as follows:

 

      Ε″→ Ε, Ε → E + T | Τ, T → a

 

Draw the DFA with LR(0) items by using the procedure described in 5.9.6.

Look at Figure 5.8. Does state I2 have a shift-reduce conflict? No, this is not a shift-reduce conflict. A shift-reduce conflict means the same state specifies two different actions. In state I2, the nonfinal item specifies a shift action, but the final item is not for a reduce action: it indicates the accept action. There are only shift-reduce and reduce-reduce conflicts; the accepting state itself should not be a source of confusion. Remember that the augmented production is not present in the original grammar; it is what we have added. So, to avoid such confusion with the augmented production, we can append a special symbol at the right end of the augmented production, as in E″ → E#. Now let us construct the DFA with this. It is shown in Figure 5.9.

Figure 5.7

Figure 5.7 RR Conflicts in DFA

Figure 5.8

Figure 5.8 Confusion in State I2 for the LR Parser

Figure 5.9

Figure 5.9 DFA with LR(0) Items for Grammar in Example 15

Now there are no conflicts. Hence, it is LR(0).

Now let us understand the power of the LR(0) parser. Consider the simple unambiguous grammar given below. The difference between the previous example and this grammar is that the operator "+" is right associative.

Example 16: Check whether the following grammar is LR(0) or not

 

      E → T + E | T, T → a

Solution: Take augmented grammar G″ as follows:

 

      E″ → E, E → T + E | T, T → a

 

Draw the DFA with LR(0) items by using the procedure described in 5.9.6.

Look at Figure 5.10. State I2 has a shift-reduce conflict. In this state the parser may not be able to decide whether to shift or to reduce. Hence, the grammar is not LR(0).

Remember, adding "#" to the augmented production, as in the previous example, will not resolve this conflict. Here the confusion is not because of the augmented production; it is in the original grammar itself. The above example demonstrates the limited power of the LR(0) parser: even such a simple unambiguous grammar cannot be parsed by it. LR(0) can parse only a very small class of grammars.

Figure 5.10

Figure 5.10 SR Conflicts in DFA of Grammar in Example 16

Now let us examine why LR(0) fails here. The main drawback of LR(0) is the "0": it does not use any look ahead (0 look ahead symbols) in making parsing decisions. Hence, while constructing the parsing table, if a state has a final item, we place the reduce action under every column. With an entry already placed in each column, the possibility of multiple entries increases. Hence, the LR(0) parser can parse only a few grammars. To overcome this disadvantage, a better parser, the simple LR parser SLR(1), is constructed with one look ahead symbol.

5.11   SLR(1) Parser

The SLR(1) parser is a simple LR parser, which is easy to construct. This is better than LR(0) as it uses a look ahead symbol. The "1" in SLR(1) indicates the number of lookaheads used by the parser. It uses a look ahead given by the follow set. The procedure for SLR(1) parsing table is the same as LR(0); the only difference is in reduce entries. To place reduce entries once again, SLR(1) uses the DFA. It checks if a state has the final item. The state that contains a final item indicates in which row the reduce entries are to be placed. This procedure is the same as LR(0). For example, if state Ii has a final item, we place reduce entries in row "i." But in row "i," finding out columns is different for SLR(1). If it is LR(0), we place under every column, but for SLR(1) it is under the columns given by the follow set. For example, consider the following state of DFA:

 

S →AA•

I5

Assume follow(S) is {$}

The corresponding reduce entry in table is

image

SLR(1) determines the columns by the follow set of the left hand side nonterminal. To understand how SLR(1) is better than LR(0), consider the above example where LR(0) fails. Let us construct the SLR(1) parsing table for the same grammar: E″ → E, E → T + E | T, T → a (Figure 5.11).

Figure 5.11

Figure 5.11 DFA for Grammar in Example 16

Consider state 2, the entry in table is as follows

image

There is a shift-reduce (SR) conflict in state I2. If this conflict is resolved by the SLR(1) parser, then the grammar is SLR(1). To verify whether the conflict is resolved, let us construct the SLR(1) parsing table.

Follow(E″) = Follow (E) = {$}, Follow(T) = {+,$}

Look at Table 5.18, row 2: though there is an SR conflict in state I2, there are no multiple entries in row 2. That means the conflict is resolved by the SLR(1) parser. Hence, the given grammar cannot be parsed by LR(0) but can be parsed by SLR(1). That is the power of SLR(1).

 

Table 5.18 SLR(1) Parsing Table

Table 5.18
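The SLR(1) placement rule can be sketched as follows, numbering the productions of this grammar 1: E → T + E, 2: E → T, 3: T → a and using the follow sets given in the text. The shift target state number in the example table is an assumption for illustration:

```python
FOLLOW = {"E": {"$"}, "T": {"+", "$"}}   # follow sets from the text

def place_reduce(table, state, lhs, prod_no):
    # SLR(1): enter r<prod_no> only under the terminals in FOLLOW(lhs);
    # an already occupied cell would mean a genuine multiple entry (conflict)
    for a in FOLLOW[lhs]:
        if (state, a) in table:
            raise ValueError(f"conflict in [{state}, {a}]")
        table[(state, a)] = f"r{prod_no}"

table = {(2, "+"): "s3"}        # state I2 already shifts on "+" (target assumed)
place_reduce(table, 2, "E", 2)  # final item E -> T. is production 2
print(table)                    # no conflict: r2 lands only under "$"
```

Under LR(0), r2 would have been placed under every column, colliding with the shift on "+"; restricting it to Follow(E) = {$} is exactly what resolves the conflict.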

Example 17: Check whether the following grammar is SLR(1) or not.

      E → E + T | T, T → TF | F, F → F* | a | b

Solution: Take augmented grammar G″ as follows:

      E″ → E, E → E + T | T, T → TF | F, F → F* | a | b

Draw the DFA with LR(0) items by using the procedure described in 5.9.6. Look at Figure 5.12.

Figure 5.12

Figure 5.12 DFA for Grammar in Example 17

Observe that there are conflicts in states 2, 3, 7, and 8. The states of an LR parser with conflicts are called inadequate states. Because of the conflicts, the grammar is not LR(0). To test whether it is SLR(1) or not, we need to test whether the conflicts are resolved by SLR(1). For this we need not construct the complete SLR(1) table: construct only the rows for the inadequate states. If there are no multiple entries in those rows, then we can say that the grammar is SLR(1). So let us construct the rows for the inadequate states, that is, 2, 3, 7, and 8. This is shown in Table 5.19.

 

Table 5.19 SLR(1) Parsing Table for Inadequate States

Table 5.19

Follow(E) = {$, +}, Follow(T) = {$, +, a, b}, Follow(F) = {$, +, a, b, *},

As there are no multiple entries in any row, the grammar is SLR(1).

Example 18: Check whether the following grammar is SLR(1) or not.

 

      S → A a A b | B b B a

      A → ε

      B → ε

Solution: Take augmented grammar G″ as follows:

 

        S″ → S, S → AaAb | BbBa, A → ε, B → ε

 

      Create canonical collection of LR(0) items

 

      I0 = {S″ → • S, S → • A a A b, S → • B b B a, A → •, B → •},

      I1= goto(I0, S) = {S″ → S •},

      I2 = goto(I0, A) = {S → A • a A b}

      I3 = goto(I0, B) = {S → B • b B a}

      I4 = goto(I2, a) = {S → A a • A b, A → •}

      I5 = goto(I3, b) = {S → B b • B a, B → •}

      I6 = goto(I4, A) = {S → A a A • b}

      I7 = goto(I5, B) = {S → B b B • a}

      I8 = goto(I6, b) = {S → A a A b •}

      I9 = goto(I7, a) = {S → B b B a •}

 

Draw the DFA with LR(0) items by using the procedure described in 5.9.6.

This is a simple unambiguous grammar, which is LL(1) but not LR(0). Look at Figure 5.13. State I0 has an RR conflict. There are no conflicts in states I2 or I6. Let us check for SLR(1) by constructing the table.

Figure 5.13

Figure 5.13 DFA for Grammar in Example 18

Follow(S) = {$}, Follow(A) = {a, b}, Follow(B) = {a, b};

Look at Table 5.20. Since there are reduce–reduce conflicts in [I0, a] and [I0, b], this grammar is not an SLR(1) grammar. Hence, this grammar is LL(1) but not LR(0) and not SLR(1).

Let us examine why SLR(1) is weak. There are two reasons for the weakness of SLR(1).

  • Though SLR(1) uses the look aheads given by the follow sets, follow sets are larger than the actual look aheads. That leads to more entries than required in the parsing table. Hence, SLR(1) parses only a small class of grammars.
  • Look ahead information is not tied to any particular state of the parser; the follow set is used independent of the state.

To overcome the above disadvantages, the better parsers, that is, LR(1) and LALR(1) are constructed using LR(1) items.

 

LR(1) item = LR(0) item + look ahead terminal

Ex: A→ a•, a where the input symbol "a" is called the "lookahead"

(which is of length 1). That's why it is called LR(1) item.

 

When a look ahead is present with a final item, it tells when to perform the reduce action. In the above example, the LR(1) item indicates that "a" should be reduced to A only if the next symbol is "a."

 

Table 5.20 Conflicts in SLR(1) Parsing Table

image

5.12   Canonical LR(1) Parsers CLR(1)/LR(1)

This technique of parsing is the most powerful among all LR parsers. Let us see how to construct the LR(1) parser. The procedure is the same as for SLR(1): take the augmented grammar, create the canonical collection of LR(1) items, draw the DFA, and prepare the table. To create the canonical collection of LR(1) items, we use the closure and goto functions. Earlier we defined them for LR(0) items; now let us redefine the closure and goto functions for LR(1) items.

5.12.1  Closure(I) Where I Is a Set of LR(1) Items

I is a set of LR(1) items like [A → α • Bβ, $] and [C → a •, a]. The closure function takes a set of LR(1) items as input and produces a set of LR(1) items as output. The function is evaluated using the following two rules:

  1. Initially, add every item from the input to the output.
  2. If [A → α • Bβ, a] is in I, where B is a nonterminal and B → γ is a rule for B, then add [B → • γ, b] to Closure(I) for every b in First(βa). Repeat this for every newly added item.
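As a sketch, the two rules above can be coded directly for the grammar used in Example 19 below (S → CC, C → cC | d). The tuple encoding of items and the helper names here are illustrative assumptions, not from the text.

```python
# A sketch of LR(1) closure for S' -> S, S -> CC, C -> cC | d.
# Items are encoded as tuples (lhs, rhs, dot, lookahead).

GRAMMAR = {"S'": [("S",)], "S": [("C", "C")], "C": [("c", "C"), ("d",)]}

def first(seq):
    # FIRST of a symbol string; valid here because no nonterminal in this
    # grammar derives the empty string and there is no left recursion.
    head = seq[0]
    if head not in GRAMMAR:                 # terminal (including '$')
        return {head}
    out = set()
    for rhs in GRAMMAR[head]:
        out |= first(rhs)
    return out

def closure(items):
    result = set(items)                     # rule 1: copy input to output
    work = list(items)
    while work:                             # rule 2, repeated for new items
        lhs, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            beta_a = rhs[dot + 1:] + (la,)
            for prod in GRAMMAR[rhs[dot]]:
                for b in first(beta_a):     # look ahead = First(beta a)
                    item = (rhs[dot], prod, 0, b)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return result
```

Calling `closure({("S'", ("S",), 0, "$")})` reproduces the six items worked out in Example 19: [S′ → • S, $], [S → • CC, $], and the C rules with look aheads c and d.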

Example 19: Consider the grammar

 

S → CC

C → cC

C → d

Find closure(S″ → • S, $)

Solution: First add every item from input to output.

 

S″ → •S, $

 

Now because of "S," add rules for S, where the look ahead is First($) = {$}

 

S → • CC, $

 

Now because of "C," add rules for C where look ahead is First(C$) = {c,d}

 

C→ • cC, c | d

C → • d, c | d

 

In the newly added items there are no nonterminals after the dot. So that completes closure().

Result is

 

S″ → •S,$

S → •CC,$

C → •cC, c | d

C → • d, c | d

5.12.2  Goto(I,X)

Given an item of the form [A → α • Xβ, a] in I, goto(I, X) = closure({[A → αX • β, a]}).

The goto function involves two steps: find the transition and apply closure. Earlier we defined the goto() function for LR(0) items. The same definition still holds for LR(1), but we need to extend it for look aheads. The look ahead changes in the goto function as follows:

  • While finding the transition, the look ahead remains the same.
  • While finding the closure, the look ahead may change.
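The two steps can be sketched as follows, reusing the grammar of Example 19 (S → CC, C → cC | d). The item encoding and names are illustrative assumptions of this sketch.

```python
# goto(I, X) for LR(1) items: advance the dot over X (look ahead unchanged),
# then apply closure (which may introduce new look aheads).

GRAMMAR = {"S'": [("S",)], "S": [("C", "C")], "C": [("c", "C"), ("d",)]}

def first(seq):
    # FIRST of a symbol string; safe here: no epsilon rules, no left recursion
    head = seq[0]
    if head not in GRAMMAR:
        return {head}
    out = set()
    for rhs in GRAMMAR[head]:
        out |= first(rhs)
    return out

def closure(items):
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for prod in GRAMMAR[rhs[dot]]:
                for b in first(rhs[dot + 1:] + (la,)):
                    item = (rhs[dot], prod, 0, b)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return result

def goto(items, X):
    # step 1: transition, the look ahead stays the same
    moved = {(l, r, d + 1, la) for (l, r, d, la) in items
             if d < len(r) and r[d] == X}
    # step 2: closure, which may change the look aheads of added items
    return closure(moved)
```

`goto(I0, "C")` on the I0 of Example 19 gives exactly the three items of Example 20's result: S → C • C with look ahead $, plus the C rules with the new look ahead $.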

Example 20: Find goto(I0,C) if I0 is as follows

      I0: S″ → • S, $

        S → • C C, $

        C → • cC, c | d

        C → • d, c | d

Solution: First find transition in each item on "C." Result is

 

      S → C • C, $

 

Now apply closure on the above item. Here the look ahead is evaluated as First of the grammar symbols following the nonterminal, together with the item's look ahead.

Because of "C" add rules for C where look ahead is First($)={$}

 

      C → • cC, $

      C → • d, $

So result of goto() is

 

      I1: S → C • C, $

        C → • cC, $

        C → • d, $

Example 21: Find goto(I1,c)

Solution: First find the transition in each item on "c." Result is

 

      C → c • C, $

 

Now apply closure on the above item.

Because of "C" add rules for C where look ahead is First($) = {$}

 

      C → • c C, $

      C → • d, $

 

So result of goto() is

 

      I2: C → c • C, $

        C → •cC, $

        C → •d,$

 

Creating the canonical collection of LR(1) items and drawing the DFA is the same as LR(0).

5.12.3  Creating Canonical Collection "C" of LR(1) Items

C = {I0,I1,I2,I3.....In}

  1. The initial item set in C is I0 = closure(augmented production with dot at the beginning, $). For example, in the above grammar, I0 = closure(S″ → • S, $).
  2. For each Ii in C and each grammar symbol X in G

    Repeat

          While goto(I,X) is not empty and not in C

          Add goto(I,X) to C

    Until no more set of items can be added to C.

    Let us understand the procedure with an example.
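The loop above can also be sketched in code for the grammar of Example 22 (S → AA, A → aA | b); the item encoding and helper names remain illustrative assumptions.

```python
# Building the canonical collection of LR(1) items for S' -> S, S -> AA,
# A -> aA | b, using the repeat-until-no-new-states loop described above.

GRAMMAR = {"S'": [("S",)], "S": [("A", "A")], "A": [("a", "A"), ("b",)]}
SYMBOLS = ["S", "A", "a", "b"]              # grammar symbols tried for goto

def first(seq):
    head = seq[0]
    if head not in GRAMMAR:
        return {head}
    out = set()
    for rhs in GRAMMAR[head]:
        out |= first(rhs)
    return out

def closure(items):
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for prod in GRAMMAR[rhs[dot]]:
                for b in first(rhs[dot + 1:] + (la,)):
                    item = (rhs[dot], prod, 0, b)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return result

def goto(items, X):
    moved = {(l, r, d + 1, la) for (l, r, d, la) in items
             if d < len(r) and r[d] == X}
    return closure(moved)

def canonical_collection():
    C = [frozenset(closure({("S'", ("S",), 0, "$")}))]   # I0
    i = 0
    while i < len(C):                       # until no more item sets are added
        for X in SYMBOLS:
            nxt = frozenset(goto(C[i], X))
            if nxt and nxt not in C:
                C.append(nxt)
        i += 1
    return C
```

Running `canonical_collection()` yields the ten states I0 to I9 obtained by hand in Example 22.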

Example 22: Draw DFA with LR(1) items for the grammar

 

      S→ AA

      A → aA | b

Solution: Take augmented grammar

 

      S″ → S, S → AA, A → aA | b

      I0 = closure(S″ → • S, $) =    S″ → • S, $

        S → • A A, $

        A → • aA | • b, a | b

      I1 = Goto(I0,S) = S″ → S •, $

      I2 = Goto(I0,A) = S → A • A, $

        A → • aA | • b, $

      I3 = Goto(I0,a) = A → a • A, a | b

        A → • aA | • b, a | b

      I4 = Goto(I0,b) = A → b •, a | b

 

That is all the transitions from I0. Now apply the goto function on I1. Since I1 has only a final item, there is no need to apply the goto function. Now consider I2.

 

      I5 = Goto(I2,A) = S → AA •, $

      I6 = Goto(I2,a) = A → a • A, $

        A → • aA | • b, $

      I7 = Goto(I2,b) = A → b •, $

      I8 = Goto(I3,A) = A → aA •, a | b

      I9 = Goto(I6,A) = A → aA •, $

      Goto(I3,a) = I3, Goto(I3,b) = I4

      Goto(I6,a) = I6, Goto(I6,b) = I7

 

The DFA for the above set of LR(1) items is as shown in Figure 5.14.

Figure 5.14

Figure 5.14 DFA for Grammar in Example 22

5.12.4  Constructing CLR(1) Parsing Table

Once the DFA is drawn, we can construct the parsing table from it. The procedure is similar to SLR(1); the only difference is in the reduce entries. So let us discuss how CLR(1) prepares reduce entries in the parsing table. SLR(1) uses the follow set, but there are no such complications with LR(1): look aheads are already available in the item itself. So CLR(1) checks the DFA for final items; if a final item such as A → abc •, a | $ is available in state I2, it places reduce entries in row 2 under the columns given by the look aheads, that is, a and $, as shown in Table 5.21.

 

Table 5.21 Reduce Entries in the CLR(1) Parsing Table

image

This is how the invalid reductions of SLR(1) are avoided by CLR(1). SLR(1) places reduce entries in more places, as follow sets are larger than the actual look aheads, but CLR(1) places them only under the actual look aheads. That is why CLR(1) is the most powerful LR parser and accepts the largest class of grammars among them.
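The reduce-entry rule can be sketched as follows; the function name, the tuple encoding of items, and the production-number map are all illustrative assumptions of this sketch.

```python
# Placing CLR(1) reduce entries: for each final item in a state, enter a
# reduce action only under that item's own look aheads, never the full
# Follow set as SLR(1) would.

def reduce_entries(state_no, state, prod_number):
    """state: a set of (lhs, rhs, dot, lookahead) items;
    prod_number: maps (lhs, rhs) to its production number (an assumption)."""
    action = {}
    for (lhs, rhs, dot, la) in state:
        if dot == len(rhs):                 # final item: dot at the end
            action[(state_no, la)] = "r{}".format(prod_number[(lhs, rhs)])
    return action
```

For the example above, a state I2 holding A → abc • with look aheads a and $ gets reduce entries only in row 2, columns a and $ (production number 4 below is a made-up illustration).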

Example 23: Construct the CLR(1) parsing table for the above DFA.

Solution: There are 10 states in DFA I0 to I9. So 10 rows. Columns are a, b, $, in action part and S, A in goto part.

Table 5.22 is the parsing table of the most powerful parser, that is, the CLR(1) Parser.

 

Table 5.22 CLR(1) Parsing Table

Table 5.22

5.12.5  CLR(1) Grammar

If the CLR(1) parser does not contain any multiple entries then grammar is called CLR(1) or LR(1) Grammar. Given a grammar to check whether it is CLR(1) or not, construct DFA with LR(1) items. If there are any conflicts in DFA then it is not CLR(1). So checking for SR or RR conflicts in DFA is very important for verifying whether the grammar is LR(1) or not. Let us review checking SR or RR conflicts with LR(0) and LR(1) items. An example for each case is shown in Table 5.23.

 

Table 5.23 Conflicts with LR Items

 

With LR(0) Items With LR(1) Items

Shift Reduce conflict

A → α • aβ

B → γ •

A → α • aβ, $

B → γ •, a

Reduce Reduce Conflict

A → α •

B → γ •

A → α •,a

B → γ •, a

For example the following LR(1) items

image

There is no SR conflict because the shift action is on "a" and reduce action is on "b." There is no conflict at all.

For example, the following LR(1) items

image

There is no SR conflict because there is no shift action as the symbol next to dot in nonfinal item is "A" nonterminal. On seeing "A" if it enters to another state that corresponds to the goto state but not shift action. There is only reduce action on "b." There is no conflict at all.

For example, the following LR(1) items

image

There is no RR conflict because first reduce action is on "a" and next reduce action is on "b." There is therefore no conflict at all.

So we can test for conflicts in LR(1) items as follows:

SR conflict: If there is final item with nonfinal item where look ahead in final item is same as the terminal next to dot in nonfinal item, then it is an SR conflict.

RR conflict: If there is more than one final item with at least one common look ahead, then it is a RR conflict.
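These two tests can be applied mechanically to any state's item set. The tuple encoding of items and the convention that nonterminals are uppercase letters are assumptions of this sketch, not from the text.

```python
# Detecting SR and RR conflicts in one state of LR(1) items.
# Items are (lhs, rhs, dot, lookahead); nonterminals are assumed to be
# uppercase single letters for this sketch.

def is_final(item):
    _, rhs, dot, _ = item
    return dot == len(rhs)

def has_sr_conflict(state):
    # terminals appearing right after the dot in nonfinal items (shifts)
    shifts = {rhs[dot] for (_, rhs, dot, _) in state
              if dot < len(rhs) and not rhs[dot].isupper()}
    # SR conflict: some final item's look ahead is also a shift terminal
    return any(item[3] in shifts for item in state if is_final(item))

def has_rr_conflict(state):
    finals = [item for item in state if is_final(item)]
    for i in range(len(finals)):
        for j in range(i + 1, len(finals)):
            same_la = finals[i][3] == finals[j][3]
            if same_la and finals[i][:2] != finals[j][:2]:
                return True                 # two reductions share a look ahead
    return False
```

The assertions below mirror Table 5.23: [A → x • ay, $] with [B → g •, a] is an SR conflict, two final items with the same look ahead are an RR conflict, and differing look aheads give no conflict.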

Example 24: Check if the following grammar is LR(1) or not.

 

      S → AA

      A → aA | b

Solution: The DFA for above grammar is shown in Figure 5.14.

As there are no conflicts in any state of DFA, the grammar is LR(1).

The same can be verified with the LR(1) parsing table constructed in Example 23.

No multiple entries in Table 5.22. So it is LR(1).

Compare the SLR(1) and CLR(1) tables for the grammar S → AA, A → aA | b. The SLR(1) table is in Example 8 and the LR(1) table is in Example 23. Note the difference: the SLR parser table has 7 rows, whereas the LR(1) table has 10 rows. So for a given grammar, if we construct both SLR(1) and CLR(1) tables, the LR(1) table is larger, as it has more rows. This is the main disadvantage of LR(1): though it is the most powerful, it requires more table space, which is why it is not the preferred parser. LALR(1) overcomes this disadvantage; that is why LALR(1) is preferred over LR(1). Most automatic parser generators also use the LALR technique rather than the CLR technique. Let us see how to construct the LALR(1) parser.

5.13   LALR(1) Parser

The procedure to construct the LALR(1) parser is the same as that of CLR(1): construct the DFA with LR(1) items. Then check whether any two states differ only in their look aheads; if so, merge them into a single state. For example, consider the DFA with LR(1) items for the grammar

 

      S → AA

      A → aA | b

 

Look at the DFA shown in Figure 5.15.

Figure 5.15

Figure 5.15 DFA for a Given Grammar

Check for the states that differ only in their look aheads. Here the pairs I3, I6; I4, I7; and I8, I9 are such states.

Now in LALR(1) we merge each pair into a single state: I3 and I6 into a new state I36, I4 and I7 into a new state I47, and I8 and I9 into a new state I89. To merge two states, we take the items as they are but make each look ahead the union of the look aheads of the two states. For example,

 

      I36 = A → a•A, a | b | $

        A → • aA | • b, a | b | $

      I47 = A → b•, a | b | $

      I89 = A → aA •, a | b | $

 

Now the table is constructed with new states with same procedure. It is shown in Table 5.24.
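The merging step described above can be sketched as follows; the item encoding and function names are illustrative assumptions.

```python
# LALR(1) state merging: states whose items differ only in look aheads
# (i.e., have the same "core") are combined, and their look aheads are
# unioned. Items are (lhs, rhs, dot, lookahead) tuples.

def core(state):
    # the core drops the look ahead component of every item
    return frozenset((lhs, rhs, dot) for (lhs, rhs, dot, _) in state)

def merge_same_core(states):
    merged = {}
    for state in states:
        merged.setdefault(core(state), set()).update(state)  # union look aheads
    return list(merged.values())
```

Merging I4 = {A → b •, a | b} and I7 = {A → b •, $} this way yields I47 = {A → b •, a | b | $}, exactly as in the text, while a state with a different core stays separate.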

 

Table 5.24 LALR(1) Parsing Table

Table 5.24

This is the LALR(1) parsing table. Compare it with the SLR parsing table, Table 5.15. What is the difference? Are they the same size? Yes: the number of states in SLR is the same as in LALR, because LALR merges states whose items are the same but whose look aheads differ. The number of states is thus reduced compared to CLR(1), which treats such states as separate; that is why a CLR(1) table has more states and more rows, and hence LR(1) requires more space. Since LALR(1) combines such states, it requires less table space than CLR(1). So for a given grammar, the table sizes of SLR and LALR are the same, but the CLR table will be larger.

Now to test the power of CLR(1) and LALR(1), we can take all the previous examples that failed to be SLR(1). Let us start with the grammar in Example 18.

Example 25: The following grammar is LL(1), not LR(0), not SLR(1)

 

      S → A a A b | B b B a

      A → ε

      B → ε

Check if it is LR(1) and LALR(1).

Solution: Take augmented grammar G" as follows:

 

      S" → S, S → AaAb | BbBa, A → ε, B → ε

 

Create canonical collection of LR(1) items

 

      I0 = S" → • S, $, S → • A a A b, $, S → • B b B a, $, A → •, a, B → •, b

      I1 = goto(I0, S) = {S" → S •, $},

      I2 = goto(I0, A) = {S → A • a A b, $}

      I3 = goto(I0, B) = {S → B • b B a, $}

      I4 = goto(I2, a) = {S → A a • A b, $ A → •, b}

      I5 = goto(I3, b) = {S → B b • B a, $, B → •, a}

      I6 = goto(I4, A) = {S → A a A • b, $}

      I7 = goto(I5, B) = {S → B b B • a, $}

      I8 = goto(I6, b) = {S → A a A b •, $}

      I9 = goto(I7, a) = {S → B b B a •, $}

 

Draw the DFA with LR(1) items by using the procedure described in 5.12.3.

Look at Figure 5.16. State I0 has no conflict: though there are two final items, [A → •, a] and [B → •, b], the two look aheads are different. Hence, there is no conflict. The grammar is CLR(1) as there are no conflicts in the DFA.

Figure 5.16

Figure 5.16 DFA for Grammar in Example 25

For checking LALR(1), we need to check if any two states are differing by look aheads.

If such states exist, merge them into single states and then check for conflicts. In this example, there are no such states. Hence, this is also LALR(1). The above grammar cannot be parsed by LR(0) or SLR(1) but can be parsed by CLR(1) or LALR(1). That is the power of LR(1) and LALR(1). Let us understand this power with one more example.

Example 26: The following grammar is not LL(1), not LR(0), and not SLR(1). Check if it is CLR(1) and LALR(1). Also generate the LR(1) parsing table for the following grammar:

 

      S → A a | b A c | d c | b d a

      A → d

Solution: Take augmented grammar

 

      S" → S

  1. S → A a
  2. S → b A c
  3. S → d c
  4. S → b d a
  5. A → d

Create canonical collection of LR(1) items.

 

      I1 = {(S" → • S, $), (S → • A a, $), (S → • b A c, $), (S → • d c, $), (S → • b d a, $), (A → • d, a)},

      I2 = goto(I1, S) = {(S" → S •, $)},

      I3 = goto(I1, A) = {(S → A • a, $)},

      I4 = goto(I1, b) = {(S → b • A c, $), (S → b • d a, $), (A → • d, c)},

      I5 = goto(I1, d) = {(S → d • c, $), (A → d •, a)},

      I6 = goto(I3, a) = {(S → A a •, $)},

      I7 = goto(I4, A) = {(S → b A • c, $)},

      I8 = goto(I4, d) = {(S → b d • a, $), (A → d •, c)},

      I9 = goto(I5, c) = {(S → d c •, $)},

      I10 = goto(I7, c) = {(S → b A c •, $)},

      I11 = goto(I8, a) = {(S → b d a •, $)}.

 

Look at Figure 5.17. States I5 and I8 contain final and nonfinal items, but there is no SR or RR conflict, as the look ahead in the final item is not the same as the terminal next to the dot in the nonfinal item. So there are no conflicts; this grammar is an LR(1) grammar. There are no states differing only in look aheads; hence, it is also LALR(1). The parsing table for CLR(1) and LALR(1) is shown in Table 5.25.

This grammar is not LL(1), not LR(0), not SLR(1) but is CLR(1) and LALR(1) grammar.

Figure 5.17

Figure 5.17 DFA for Grammar in Example 26

 

Table 5.25 CLR(1)/LALR(1) Parsing Table

Table 5.25

Example 27: The following grammar is not LL(1), not LR(0) and not SLR(1). Check if it is CLR(1) and LALR(1). Also generate the LR(1) parsing table for the following grammar

 

      S → A a | b A c | B c | b B a

      A → d

      B → d

Solution: Take augmented grammar

 

      S" → S

  1. S → A a
  2. S → b A c
  3. S → B c
  4. S → b B a
  5. A → d
  6. B → d

Create canonical collection of LR(1) items.

 

      I1 = {(S" → • S, $), (S → • A a, $), (S → • b A c, $), (S → • B c, $), (S → • b B a, $), (A → • d, a), (B → • d, c)}

      I2 = goto(I1, S) = {(S" → S •, $)},

      I3 = goto(I1, A) = {(S → A • a, $)},

      I4 = goto(I1, b) = {(S → b • A c, $), (S → b • B a, $), (A → • d, c), (B → • d, a)},

      I5 = goto(I1, B) = {(S → B • c, $)},

      I6 = goto(I1, d) = {(A → d •, a), (B → d •, c)},

      I7 = goto(I3, a) = {(S → A a •, $)},

      I8 = goto(I4, A) = {(S → b A • c, $)},

      I9 = goto(I4, B) = {(S → b B • a, $)},

      I10 = goto(I4, d) = {(A → d •, c), (B → d •, a)},

      I11 = goto(I5, c) = {(S → B c •, $)},

      I12 = goto(I8, c) = {(S → b A c •, $)},

      I13 = goto(I9, a) = {(S → b B a •, $)}.

 

This grammar is an LR(1) grammar, as there are no conflicts in the DFA shown in Figure 5.18. To check for LALR(1), verify whether any two states differ only in look aheads. Look at states I6 and I10. Though there are no conflicts in either state, when we combine the two for LALR(1), the result has an RR conflict.

 

      I610 = A → d •, a | c

        B → d •, c | a

 

This is a conflict in LALR(1). Hence, this grammar cannot be parsed by LALR(1) either. It can be parsed only by CLR(1), the most powerful parser.

The CLR(1) parsing table is as shown in Table 5.26.

Figure 5.18

Figure 5.18 DFA for Grammar in Example 27

 

Table 5.26 CLR(1) Parsing Table

Table 5.26

Example 28: Check whether the following grammar is CLR(1) or LALR(1).

 

      S → A

      A → AB | ε

      B → aB | b

Solution: First construct the initial item.

 

      I0 = {(S" → • A, $), (A → • A B, $), (A → •, $)}.

 

This is what we get when we apply closure on first production. Now we need to apply closure to newly added item, which results in

 

      (A→ • A B, a | b), (A → •, a | b).

 

So, finally we get

 

      I0 = {(S" → • A, $), (A → • A B, $), (A → •, $), (A → • A B, a | b), (A → •, a | b)}

 

As all these are in the same state, we can combine similar items together. Hence, we get

 

       I0 = {(S" → • A, $), (A→ • A B, a | b | $), (A → •, a | b | $)}

 

DFA with LR(1) items is shown in Figure 5.19.

This grammar is CLR(1) and LALR(1), as there are no conflicts in the DFA and no two states differ only in look aheads.

Figure 5.19

Figure 5.19 DFA for Grammar in Example 28

5.14   Comparison of Parsers: Top-Down Parser vs. Bottom-Up Parser

Of all top-down parsers, the most widely used is the LL(1) parser; of all bottom-up parsers, the most widely used is the LALR(1) parser. Let us compare them using the following criteria.

Design: A top-down parser is simpler to construct than a bottom-up parser.

Table size: The LALR(1) parser table size can be exponential in the size of the grammar. This is not the case with LL(1), where the table size is bounded by the square of the size of the grammar. In practice, the size of a bottom-up parser is roughly double the size of a top-down parser.

Power: LL(k) ⊂ LR(k)

Error Detection: The predictive nature of TDP allows errors to be detected at the earliest possible time. The TDP stack can be used for repair, as it contains information on what is expected to be seen, in contrast with BUP, where the stack contains what has already been encountered in parsing. So far we have discussed many examples of parsers; let us summarize them in Table 5.27.

The LL(1) parser is a simple top-down parser that can parse only a small class of grammars as there is a restriction on the grammar that it should be free of left recursion and should be left factored. LR(0) is the simplest of all LR parsers but not used practically as it is less powerful. It does not use any look ahead in making parsing decisions; hence, it can be used to parse only a small class of grammars. SLR(1) is better than LR(0) as it uses look ahead.

 

Table 5.27 Grammars That Can be Parsed by Different Parsers

Table 5.27

But once again it is comparatively weak, as it cannot always avoid invalid reductions using the follow set. CLR(1) is the most powerful parser among all but requires more table space. LALR(1) is better than SLR(1) as it uses LR(1) items, and it is preferred over CLR(1) as it requires less table space. The class of grammars that can be parsed by different parsers is shown in Figure 5.20.

Figure 5.20

Figure 5.20 Comparison of All Parsers

If a grammar is LR(0), it is also SLR(1), LALR(1), and CLR(1). If a grammar is LALR(1), it is also CLR(1), but it may or may not be SLR(1), and it may or may not be LR(0). Moreover, every LL(1) grammar is LALR(1).

5.15   Error Recovery in LR Parsing

An LR parser detects an error when it consults the parsing action table and finds a blank entry. An LR parser implements panic-mode error recovery as follows: for each blank entry, keep a pointer to an error recovery subroutine, so that whenever the blank entry is referred to, the corresponding subroutine is called. For example, look at Table 5.28: e1, e2, e3, and e4 are error recovery routines. They specify how to recover from an error.

 

Table 5.28 Error Recovery in LR Parsing Table

Table 5.28

Here e1() indicates that an operand is missing at the beginning of the expression. So push id and issue the error message "missing operand."

e2() indicates that the expression starts with ")". Error recovery here can be to delete the input symbol and issue an error message.
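One way to wire routines such as e1 and e2 to the blank entries can be sketched as below; the routine bodies, the table coordinates, and the messages are all illustrative assumptions echoing the description above.

```python
# Panic-mode hooks: every blank ACTION entry points at a recovery routine.
# The coordinates and messages below are made up for illustration.

def e1(stack, pos):
    stack.append("id")                      # push a dummy operand
    return pos, "missing operand"

def e2(stack, pos):
    return pos + 1, "unbalanced ) deleted"  # skip the offending token

ERROR_ROUTINES = {(0, "+"): e1, (0, ")"): e2}   # sample blank entries

def on_blank_entry(state, token, stack, pos):
    routine = ERROR_ROUTINES.get((state, token))
    if routine is None:
        raise SyntaxError("no recovery for state %d on %r" % (state, token))
    return routine(stack, pos)
```

When the parser consults a blank entry, it calls `on_blank_entry`, which dispatches to the routine registered for that (state, token) pair instead of simply halting.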

5.16   Parser Construction with Ambiguous Grammars

The parsing techniques discussed so far are applicable provided the grammar is unambiguous. If the grammar is ambiguous and we follow the above procedures, we get multiple entries in the parsing table, whether it is LL(1) or LR(1). So let us see whether there is any way of constructing a parser with ambiguous grammars. Before looking at parser design, let us first understand the advantages of ambiguous grammars.

  • Ambiguous grammars are shorter and more natural. For example, compare the following ambiguous grammar with its equivalent unambiguous grammar.

    Ambiguous grammar:

    E → E + E | E * E | id

    Unambiguous grammar:

    E → E + T | T

    T → T * F | F

    F → (E) | id

  • If the parser is constructed with ambiguous grammar, we can change precedence associativity of the parser at a later time, as precedence and associativity are not fixed in ambiguous grammar.
  • No wastage of time: an unambiguous grammar wastes time with unit reductions like E → T, T → F, F → id; an ambiguous grammar has no such wastage.

Above all, it is easier to write an ambiguous grammar than an unambiguous one. Hence, it can be better to construct the parser with the ambiguous grammar. For this we do not modify any previous procedure: follow the same procedure and construct the table (whether LL(1) or LR(1)). The table will now have multiple entries. Use disambiguating rules to resolve the multiple entries to single entries. So the only change in the procedure is this resolution step. To understand how to resolve multiple entries, let us take the SLR(1) parsing table for the ambiguous grammar E → E + E | E * E | id. By following the procedure already discussed, we get SLR(1) Table 5.29.

 

Table 5.29 SLR(1) Parsing Table for the Ambiguous Grammar

Table 5.29

There are four SR conflicts. So this table as such cannot be used. Let us see how to resolve this. We use disambiguating rules for resolving multiple entries. The disambiguating rules for this grammar are precedence and associativity of operators. "*" has higher precedence than "+" and both are left associative. With this information we resolve conflicts in the following ways:

Take a simple string "id + id * id $." Parse it.

      id + id * id $

image

State 5 on "*" is S4/r1, an S/R conflict. Let us understand why there is an SR conflict. The string that is already read is id + id; there is no conflict until this is read. But on reading the next operator "*," if you look at the stack, there is a handle E + E, so the parser can go for a reduce action as the handle is available. Otherwise, it can read further for the longest match, which is nothing but the shift action. That is why we have an SR conflict. Now let us decide what the right action is. "*" has higher precedence than "+," so if the parser goes with the reduce action, "+" is given higher precedence, a wrong decision. The conflict should be resolved in favour of shift, that is, S4. This means: read further till id * id is read, reduce "*" first, and then "+."

Similarly, take a string "id + id + id$." In state 5 on "+," the parser will refer to S3/r1—an S/R conflict. The string that is already read is "id+id." So on the next token "+," now parser may not be able to decide whether to go for shift/reduce as handle E + E is already available. Here associativity of "+" should be considered. As it is left associative, resolve conflict in favour of reduce, that is, r1.

Similarly, take a string "id * id + id$." In state 6 on "+," the parser will refer to S3/r2—an S/R conflict. The string that is already read is "id * id." So on the next token "+," now parser may not be able to decide whether to go for shift/reduce as handle E * E is already available. "*" has higher precedence than "+." So the parser goes with reduce action. Resolve conflict in favour of reduce, that is, r2.

Similarly, take the string "id * id * id$." In state 6 on "*," the parser will refer to S4/r2, an S/R conflict. The string that is already read is "id * id." So on the next token "*," the parser may not be able to decide whether to go for shift or reduce, as the handle E * E is already available. Here the associativity of "*" should be considered. As it is left associative, resolve the conflict in favour of reduce, that is, r2. The resulting parsing table is shown in Table 5.30.

As there are no multiple entries, it can be used by the parser to parse any string defined by the grammar.

 

Table 5.30 Parsing Table After Resolving the Conflicts

image
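The four resolution decisions worked out above can be captured in a small resolver; the function and table names are illustrative assumptions, and a real parser would consult such a resolver only for the conflicting entries.

```python
# Resolving the SR conflicts for E -> E + E | E * E | id with precedence
# and associativity, following the four cases discussed above.

PRECEDENCE = {"+": 1, "*": 2}               # '*' binds tighter than '+'
LEFT_ASSOCIATIVE = {"+", "*"}

def resolve(op_on_stack, next_op):
    """A handle E op E is on the stack; next_op is the incoming operator."""
    if PRECEDENCE[next_op] > PRECEDENCE[op_on_stack]:
        return "shift"                      # keep reading: next_op reduces first
    if PRECEDENCE[next_op] < PRECEDENCE[op_on_stack]:
        return "reduce"                     # the handle on the stack wins
    # equal precedence: left associativity means reduce, right means shift
    return "reduce" if op_on_stack in LEFT_ASSOCIATIVE else "shift"
```

The four cases match the text: stack "+" with input "*" shifts (S4), while the remaining three combinations reduce (r1 or r2).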

Solved Problems

  1. Convert the following grammar to an equivalent operator grammar.

     

    S → AB,

    A → c | d,

    B → aAB | b

    Solution: Here replace nonterminal B by its production. Then we get

     

    S → AaAB | Ab,

    A → c | d,

    B → aAB | b

     

    but AB is S. So the equivalent operator grammar is

     

    S → AaS | Ab,

    A → c | d,

    B → aS | b

     

  2. Convert the following precedence relation table to function table
    image

    Solution: There are five terminals, so create 10 symbols. Here there is a ≐ relation between "(" and ")", so take f( and g) into one group; the remaining symbols can be taken as separate nodes. Now add edges for each ⋖ and ⋗ relation. We get the digraph shown in Figure 5.21.

    Figure 5.21

    Figure 5.21 Digraph

    The precedence function table is as follows:

    image
  3. Consider the following grammar:

     

           E → E + T | T

          T → T * F | F

          F → (E) | id.

    Construct SLR(1) parsing table.

    Solution: The DFA with LR(0) items is shown in Figure 5.22.

    Figure 5.22

    Figure 5.22 DFA with LR(0) items

    The SLR(1) parsing table for the above grammar is given in Table 5.31.

     

    Table 5.31 SLR(1) Parsing Table for Example 3

    Table 5.31
  4. Construct DFA with LR(1) items for the following grammar:

     

    S → CC

    C → cC

    C → d

    Solution:

    I0: closure({(S" → • S, $)}) = {(S" → • S, $), (S → • C C, $), (C → • cC, c/d), (C → • d, c/d)}

    I1: goto(I0, S) = (S" → S •, $)

    I2: goto(I0, C) = {(S → C • C, $), (C → • cC, $), (C → • d, $)}

    I3: goto(I0, c) = {(C → c • C, c/d), (C → • cC, c/d), (C → • d, c/d)}

    I4: goto(I0, d) = (C → d •, c/d)

    I5: goto(I2, C) = (S → C C •, $)

    I6: goto(I2, c) = {(C → c • C, $), (C → • cC, $), (C → • d, $)}

    I7: goto(I2, d) = (C → d •, $)

    I8: goto(I3, C) = (C → cC •, c/d)

        : goto(I3, c) = I3

        : goto(I3, d) = I4

    I9: goto(I6, C) = (C → cC •, $)

        : goto(I6, c) = I6

        : goto(I6, d) = I7

    DFA with LR(1) items is shown in Figure 5.23.

  5. Consider the following grammar:

    S → a A d | a c e | b A e

    A → c

    • Construct the SLR(1) parsing table for this grammar. Is this grammar SLR(1)?
    • Construct the LR(1) parsing table for this grammar. Is this grammar LR(1)?
    • Construct the LALR(1) parsing table for this grammar. Is this grammar LALR(1)?

    Solution:

    • 0 S" → S

      1 S → a A d

      2 S → a c e

      3 S → b A e

      4 A → c

       

      Follow(S) = {$}, Follow(A) = {d, e};

       

      I1 = {S" → • S, S → • a A d, S → • a c e, S → • b A e},

      I2 = goto(I1, S) = {S" → S •},

      I3 = goto(I1, a) = {S → a • A d, S → a • c e, A → • c}

      I4 = goto(I1, b) = {S → b • A e, A → • c}

      I5 = goto(I3, A) = {S → a A • d}

      Figure 5.23

      Figure 5.23

      I6 = goto(I3, c) = {S → a c • e, A → c •}

      I7 = goto(I4, A) = {S → b A • e}

      I8 = goto(I4, c) = {A → c •}

      I9 = goto(I5, d) = {S → a A d •}

      I10 = goto(I6, e) = {S → a c e •}

      I11 = goto(I7, e) = {S → b A e •}

      image

      Since there is a shift/reduce conflict at state I6, this grammar is not an SLR(1) grammar.

    • Set of LR(1) items are as follows:

       

      I1 = {[S" → • S, $], [S → • a A d, $], [S → • a c e, $], [S → • b A e, $]},

      I2 = goto(I1, S) = {[S" → S •, $]},

      I3 = goto(I1, a) = {[S → a • A d, $], [A → • c, d], [S → a • c e, $]}

      I4 = goto(I1, b) = {[S → b • A e, $], [A → • c, e]}

      I5 = goto(I3, A) = {[S → a A • d, $]}

      I6 = goto(I3, c) = {[A → c •, d], [S → a c • e, $]}

      I7 = goto(I4, A) = {[S → b A • e, $]}

      I8 = goto(I4, c) = {[A → c •, e]}

      I9 = goto(I5, d) = {[S → a A d •, $]}

      I10 = goto(I6, e) = {[S → a c e •, $]}

      I11 = goto(I7, e) = {[S → b A e •, $]}

      image

      Since there is no conflict in the parsing table, this grammar is an LR(1) grammar.

    • Since there are no two states with the same core in the LR(1) parsing table, the LALR(1) table is the same as the LR(1) table, and the grammar is an LALR(1) grammar.
  6. * Consider the following grammar.

     

    S → SS | a | ε

     

    • Construct collection of sets of LR(0) items for this grammar and draw its goto graph.
    • Indicate SR and RR conflicts in various states of LR(0) parser.

    Solution: Draw DFA with LR(0) items as shown in Figure 5.24.

    Here states I0, I1, and I3 have SR conflicts. State I3 has RR conflict also.

Summary

  • Bottom-up parsing is the most efficient non-backtracking technique.
  • There is no restriction on grammar for constructing LR parsers.
  • Operator precedence parser is a simple bottom-up parser mainly used for parsing expression grammars.
    Figure 5.24

    Figure 5.24 DFA with LR(0) items

  • LR(0) parser is the simplest of all LR parsers but is least powerful. Hence practically it is not used.
  • SLR(1) is simple to construct, better than LR(0), but can parse only a small class of grammars.
  • LALR(1) is the most widely used LR parser. Yacc also uses this technique. It requires less table space compared to LR(1).
  • LR(1) is the most powerful bottom-up parsing technique, but it requires more table space.
  • SR conflict confuses the parser whether to take shift action/reduce action.
  • RR conflict confuses the parser whether to reduce by first rule or second rule.
  • If there are any SR/RR conflicts, we cannot construct LR parser.
  • Every LL(1) grammar is LALR(1).
  • LR(0) grammars are subset of SLR(1), LALR(1) and LR(1).
  • SLR(1) grammars are subset of LALR(1) and LR(1).
  • LALR(1) grammars are subset of LR(1).
  • Operator grammar can be ambiguous or unambiguous.
  • We can construct parsers with ambiguous grammars also but need to resolve multiple entries.

Fill in the Blanks

  1. Operator precedence parser is________type of parser.
  2. The parsing technique that is used in parsers generators is________.
  3. If A → α • Bb, a is in I then closure (I, a) is________.
  4. If [A → B •, a] is an item in the LR(1) items, then the reduce operation is entered under________.
  5. __________items are used for LALR(1) parsers.
  6. The type of item used in [A → • Aa, a | b | c]________.
  7. Relation between grammars LL(1)________LL(k).
  8. Relation between grammars LL(1)________LR(k).
  9. If there are no RR conflicts in a LR(1) parser then it________in LALR(1).
  10. If there are no SR conflicts in a LR(1) parser then it________in LALR(1).
  11. Can we reduce every precedence relation table to function table?________.
  12. The other name for bottom-up parser is________.
  13. YACC is________.
  14. Operator precedence parser________handle operator with different precedence.
  15. "0" or "1" in LR items represent________.

Objective Question Bank

  1. Consider the grammar given below

    A → Ba | Aa | c

    B → B b | A b | d

    Convert equivalent operator grammar

     

    • A → Ba D | c D

      D → a D | ∊

      B → B b | A b | d

    • A → Ba D | c D

      D → a D | ∊

      B → c D b E | d E

      E → b E | a D b E | ∊

    • A → Ba D | c D

      D → a D | ∊

      B → A b E | d E

      E → b E | ∊

    • none

     

  2. For a given grammar if SLR(1) has n1 states, LALR(1) has n2 states, LR(1) has n3 states which of the following is true?
    • n2 = n3
    • n2 <= n3
    • n2 < n3
    • none
  3. For a given grammar if SLR(1) has n1 states, LALR(1) has n2 states, LR(1) has n3 states which of the following is true?
    • n3 > n1
    • n3 >= n1
    • n3 = n1
    • none
  4. Consider the grammar given below

    A → SB | S

    B →; S B | ; S

    S → a

    Convert the equivalent operator grammar

     

    • A → S; A | S; S | S

      S → a

    • A → S; B | S; S | S

      B →; SB |; S

      S → a

    • A → S; A | S; S | S

      B →; S B | ; S

      S→ a

    • does not exist
  5. What is the precedence relation between; & a?
    • less 
    • greater 
    • equal 
    • none
  6. * Consider SLR(1) and LALR(1) tables for CFG. Which of the following is false?
    • Goto of both tables may be different
    • Shift entries are identical in both tables
    • Reduce entries in tables may be different
    •  Error entries in tables may be different
  7. * For a given grammar if SLR(1) has n1 states, LALR(1) has n2 states, which of the following is true?
    • n1 < n2
    • n1 = n2 
    • n1 > n2 
    • none
  8. * Consider the grammar S → CC, C → cC | d is
    • LL(1) 
    • SLR(1) but not LL(1)
    • LALR(1) but not SLR(1) 
    • LR(1) but not LALR(1)
  9. * Which of the following is the most powerful parsing method?
    • LL(1) 
    • CLR(1) 
    • SLR(1) 
    • LALR(1)
  10. Consider the grammar S → SS | d | ε is
    • LL(1) 
    • LR(1) 
    • LR(0) 
    • none
  11. * Consider the grammar E → E + n | E × n | n. For a sentence "n + n × n," the handles in the right sentential form of reduction are
    • n, E + n and E + n × n 
    • n, E + n and E + E × n
    • n, n + n and n + n × n 
    • n, E + n and E × n
  12. * Consider the grammar S → ( S ) | a. If SLR(1) has n1 states, LR(1) has n2 states, LALR(1) has n3 states, which of the following is true?
    • n1<n2<n3
    • n1=n3<n2
    • n1=n2=n3
    • n1>n3>n2
  13. * An LALR(1) parser for "G" can have SR conflicts if and only if
    • The SLR(1) has SR conflicts
    • The LR(1) has SR conflicts
    • The LR(0) has SR conflicts
    • The LALR(1) has RR conflicts
  14. Which of the following is true?
    • every SLR(1) is LR(0) 
    • every LL(1) is LR(1)
    • every LL(1) is SLR(1) 
    • every LL(1) is LR(0)
  15. The LL(1) and LR(0) techniques
    • are both same in power
    • both simulate RMD 
    • are not same in power
    • none
  16. If a grammar is SLR(1) then [ ]
    • it may have S/R conflict
    • It will not have any conflict
    • It may have R/R conflict
    • it will not have S/R but may have R/R conflict
  17. Using LR(0) items we can construct [ ]
    •  SLR parsing table 
    • LALR parsing table 
    • CALR parsing table
    • none.
  18. If I0 : { [S → • aA, $], [A → • ε, a] }, the state I0 has [ ]
    • S/R conflict 
    • R/R conflict 
    • S/S conflict
    • None
  19. The following grammar is [ ]

    S → ABC

    A → 0A1 | ε

    B → 1B | ε

    C → 1C0 | ε

     

    • LL(1)
    • LR(0)
    • not LL(1)
    • not LL(1) and not LR(0)
  20. Consider the grammar [ ]

     

    E → A | B

    A → a | c

    B → b | c

     

    • LR(0)
    • LR(0) & LR(1)
    • LR(1), SLR(1)
    • none

Exercises

  1. Convert the following grammar to operator grammar.

     

    P → S R | S

    R → θ S R | θ S

    S → W θ s | W

    W → L ↑ W | L

    L → a.

    Prepare a precedence relation table and parse the following input string using the above grammar.

     

    a ↑ a θ a ↑ a $

     
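The shift/reduce driver behind exercises like the one above can be sketched as follows. This is an illustrative operator-precedence parser: the function name and the precedence table are our own, and the table encodes the classic grammar E → E + E | E * E | id rather than the exercise's grammar (a table over {a, θ, ↑, $} would be built and used the same way).

```python
# Illustrative operator-precedence parser driver (hypothetical helper;
# the PREC table is for E -> E + E | E * E | id, not the exercise's grammar).
PREC = {
    ('$', 'id'): '<', ('$', '+'): '<', ('$', '*'): '<',
    ('id', '+'): '>', ('id', '*'): '>', ('id', '$'): '>',
    ('+', 'id'): '<', ('+', '+'): '>', ('+', '*'): '<', ('+', '$'): '>',
    ('*', 'id'): '<', ('*', '+'): '>', ('*', '*'): '>', ('*', '$'): '>',
}

def op_precedence_parse(tokens):
    """Return True iff tokens form a sentence under the PREC table."""
    stack = ['$']                      # terminals plus the marker 'E'
    tokens = tokens + ['$']
    i = 0
    while True:
        top = next(s for s in reversed(stack) if s != 'E')  # top terminal
        a = tokens[i]
        if top == '$' and a == '$':
            return True                # whole input reduced
        rel = PREC.get((top, a))
        if rel in ('<', '='):
            stack.append(a)            # shift the input symbol
            i += 1
        elif rel == '>':
            # reduce: pop until the terminal below the last popped
            # terminal bears '<' to it -- that popped span is the handle
            while True:
                sym = stack.pop()
                if sym != 'E':
                    below = next(s for s in reversed(stack) if s != 'E')
                    if PREC.get((below, sym)) == '<':
                        break
            if stack[-1] == 'E':
                stack.pop()            # handle may begin with a nonterminal
            stack.append('E')          # replace the handle by E
        else:
            return False               # no relation defined: syntax error
```

For example, `op_precedence_parse(['id', '+', 'id', '*', 'id'])` reduces the inner `id`s first, then `E * E`, then `E + E`, and accepts.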

  2. Check whether the following grammar is LR(0).

     

    S → S#

    S → dA | aB

    A → bA | c

    B → bB | c

     

  3. Check whether the following grammar is LR(0).

     

    S → (L)

    S → x

    L → S

    L → L, S

     

  4. Check whether the following grammar is LR(0), SLR(1).

     

    S → A | B

    A → aAb | ab

    B → aBbb | abb

     

  5. Check whether the following grammar is LR(0), SLR(1).

     

    E → bEa | aEb | ba

     

  6. Check whether the following grammar is LR(1).

     

    E → E + T | T

    T → T * F | F

    F → (E) | id

     

  7. Check whether the following grammar is SLR(1).

     

    S → X Y a #,

    X → a | Yb,

    Y → ∊ | c

     
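An SLR(1) check like the one in exercise 7 places a reduce entry for a completed item A → α• on every token in FOLLOW(A), so FIRST/FOLLOW computation is the first step. The sketch below computes them for exercise 7's grammar; the helper names are ours, the ε-production is encoded as an empty body, and "$" is assumed as the end-of-input marker (note "#" is an ordinary terminal of this grammar).

```python
# Sketch: FIRST/FOLLOW sets for exercise 7's grammar (helper names ours).
GRAMMAR = {
    'S': [['X', 'Y', 'a', '#']],
    'X': [['a'], ['Y', 'b']],
    'Y': [[], ['c']],              # [] stands for Y -> epsilon
}
NT = set(GRAMMAR)
EPS = 'eps'

def first(sym):
    """FIRST of a single grammar symbol (EPS marks nullability)."""
    if sym not in NT:
        return {sym}
    out = set()
    for prod in GRAMMAR[sym]:
        nullable = True
        for s in prod:
            f = first(s)
            out |= f - {EPS}
            if EPS not in f:
                nullable = False
                break
        if nullable:
            out.add(EPS)
    return out

def follow(start='S'):
    """Iterate to a fixed point over all productions."""
    fol = {A: set() for A in NT}
    fol[start].add('$')            # '$' = end-of-input marker
    changed = True
    while changed:
        changed = False
        for A, prods in GRAMMAR.items():
            for prod in prods:
                for i, s in enumerate(prod):
                    if s not in NT:
                        continue
                    trailer, nullable = set(), True
                    for t in prod[i + 1:]:
                        f = first(t)
                        trailer |= f - {EPS}
                        if EPS not in f:
                            nullable = False
                            break
                    new = trailer | (fol[A] if nullable else set())
                    if not new <= fol[s]:
                        fol[s] |= new
                        changed = True
    return fol
```

With these sets in hand, the grammar fails the SLR(1) test exactly when some state pairs a shift on token t with a completed item A → α• whose FOLLOW(A) also contains t, or two completed items whose FOLLOW sets overlap.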

  8. Design the SLR parser with the following ambiguous grammar.

     

    S → i S | iSeS | a

     

  9. Check if the following grammar is SLR(1), LR(1).

     

    S → L = R | R

    L → * R | id

    R → L

     

  10. Check whether the following grammar, with L(G) = { wwR | w ∈ (a + b)+ }, is LR(1).

     

    S → aSa | bSb | a | b

     

  11. How many conflicts occur in DFA with LR(1) items for the following grammar?

     

    S → SS | a | c

     

  12. Find the closure of [E′ → • E, $] in G.

     

    E → E + T | T;

    T → T * F | F;

    F → id

     
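The closure step asked for above can be sketched as code. The names below are ours; note that exercise 12 asks for an LR(1) closure, whose lookaheads would additionally need FIRST of the suffix after the dot, so this sketch computes only the LR(0) core of that closure for the same grammar.

```python
# Sketch of LR(0) item closure for the exercise's grammar (names ours).
GRAMMAR = [
    ("E'", ('E',)),                              # augmented start production
    ('E', ('E', '+', 'T')), ('E', ('T',)),
    ('T', ('T', '*', 'F')), ('T', ('F',)),
    ('F', ('id',)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}

def closure(items):
    """items: set of (head, body, dot_position). Returns the LR(0) closure."""
    result = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(result):
            # a dot before a nonterminal B pulls in every B-production
            # with the dot at its left end
            if dot < len(body) and body[dot] in NONTERMINALS:
                for h, prod in GRAMMAR:
                    if h == body[dot] and (h, prod, 0) not in result:
                        result.add((h, prod, 0))
                        changed = True
    return result
```

Starting from the kernel item [E′ → • E], the closure pulls in both E-productions, then both T-productions, then F → • id, giving six items in all.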

  13. Find the number of conflicts, if any, in the DFA of LR(1) items for the following grammar.

     

    S → A;

    A → AB | ε;

    B → aB | b

     

  14. Check if G is SLR(1).

     

    S → AB | ε;

    A → aASb | a;

    B → bS

     

  15. Check if G is LR(1).

     

    S → aB | ab;

    A → aAB | a;

    B → ABb | b

Key for Fill in the Blanks

  1. Bottom up 
  2. LALR(1) 
  3. φ 
  4. LR(1) 
  5. LR(1) 
  6. may occur
  7. Will not occur
  8. No, if the digraph has a cycle
  9. SR parser
  10. Yet Another Compiler-Compiler
  11. cannot
  12. presence or absence of lookahead.

Key for Objective Question Bank

  1. b
  2. a
  3. d
  4. c
  5. d