8.2 Polish Notation

Polish notation, also known as prefix notation, is a form of notation for writing expressions in logic, arithmetic and algebra. The Polish logician Jan Lukasiewicz invented this notation around 1920 in order to simplify sentential logic. In this notation, the operator is written as a prefix, i.e. to the left of the operands. If the arity (the expected number of operands) of the operators is fixed, the result is a syntax devoid of any parentheses that can be parsed without ambiguity. If the notation is used to represent mathematical expressions by interpreters of programming languages, they can be readily parsed into abstract syntax trees and define a one-to-one representation for the same. A typical expression in Polish notation is image which is equivalent to our normal infix expression (3 + 4) * 5.

Reverse Polish notation (RPN), also known as Postfix notation, is a mathematical notation where every operator follows all of its operands, in contrast to Polish notation. It is parenthesis-free as long as operator arities are fixed. A typical expression in RPN is image which is equivalent to our normal infix expression (3 + 4) * 5.

The postfix or Reverse Polish notation is used in many stack-based programming languages like FORTH, PostScript, and is the way certain calculators, notably from Hewlett-Packard, operate.

Evaluation of an RPN expression expects an operand stack. The evaluation method is shown in Algorithm 8.2.1.

 

image

 

For the example RPN expression image evaluation proceeds as:

 

image

 

The original RPN was meant to work with logical, arithmetic or algebraic expressions. We extend it to deal with programming control constructs also so that we can use it to represent Intermediate code. We call it ERPN (extended RPN). Suppose we want to represent the following if statement:

 

image

 

We can represent it in ERPN form, stored in an array, as:

 

image

 

Here, BZ (Branch-if-Zero) is a primitive operator that causes a branch to 11, which is the index of the first token of s2, if expr evaluates to zero or FALSE. Similarly, BR (branch) is a primitive operation which provides an unconditional Jump to l2, the index of token next to the end of this IF statement. If we assume that s1 and s2 occupy only one array position (a rather unusual situation), then 11 = 6 and 12 = 7.

Let us see how this construct can be interpreted. Scanning from left to right, first the expr, in RPN, is interpreted and a value, 0 or 1, is left on the stack. Next 11, being an operand, is put on the stack. Then BZ is scanned, which pops 2 operands, 0/1 and l1, and branches to l1, i.e. s2, if the expr is 0. That provides the ELSE branch. If expr were 1, no branch will take place, s1 is scanned and interpreted. Then l2 is stacked and operator BR will pop it to unconditionally branch to token next to the IF-THEN-ELSE statement.

A little thought will tell you that the canonical control construct WHILE-DO can be represented with the above simple arrangement. For example,

 

image

 

can be represented by:

 

image

 

Here, 11 = 6 is the index of token next to this WHILE-DO. If cond is not FALSE, this branch is not taken and statement s1 is executed. Next an unconditional branch is taken to 12 = 0, i.e. the first token of the cond. Remember that the statement s1 should be such that eventually it makes the cond FALSE, otherwise we will have an infinite loop.

The RPN representation is not very convenient from performing optimization, though it is used frequently to build virtual machines.

8.2.1 Generating RPN

We now see how Reverse Polish notation can be generated from a parse tree. Suppose we want the RPN for an expression 3 * (4 + 5). Its parse tree is shown in Fig. 8.5 and syntax tree in Fig. 8.6. The syntax tree is obtained by traversing the parse tree in pre-order and creating a sub-tree when an operator is encountered, with the operator as the root and the operands under it from the parse tree. The RPN can be generated by traversing the syntax tree in post-order and outputing nodes of each operator sub-tree. For example, the post-order traversal shown in Fig. 8.6 will give us image, which is a correct RPN. Note that we can combine the two operations and obtain the RPN directly from the parse tree.

 

A parse tree for the example expression

 

Fig. 8.5 A parse tree for the example expression

A syntax tree for the example expression

 

Fig. 8.6 A syntax tree for the example expression

 

We have a C language program syntax_rpn_tree.c which implements this idea for an RDP for the example grammar. Table 8.1 shows some simple examples.

 

Table 8.1 Examples of RPN generation – simple statements

Input string Output RPN
P (a + b) * c. a b + c * .
P L s = d * (f + g). d f g + * s = .
P (q + w) * (e + r). q w + e r + * .
P a + b + c + d. a b + c + d + .
P L s = d * (f + g);L g = h + j * k + 1. d f g + * s = h j k * + 1 + g = .

The essential changes in our RDP are illustrated by the following code segments. The functions for Terminal symbols will return the symbol itself as a string, with a few exceptions as noted below.

char * pplus(){
    if(symbol() != ‘+’)
       error(“+ expected”);
    return “+";
}

The functions for Non-Terminals will return the RPN built-up till now. They will build the partial RPN by post-order traversal as mentioned above. As these functions for Non-Terminals invoke one another, it is necessary to hold the partial results during each invocation separately in a stack. We could have used a separate stack for that purpose, but then we would have to write code for management of that stack. We have instead used the normal C language Return-stack itself, by reserving space for the required strings.

char * pT(){
    char *t2, *t3, *t4;
    char res[1024], tmp[1024];
    sprintf(res,“%s”,pF());
    while(nextsymbol() == ‘*’) {
        t2 = ppast();
        t4 = pF();
        sprintf(tmp,“%s %s %s”, res,t4,t2);
        sprintf(res,“%S”,tmp);
        tokens++;
    }
    t3 = res;
    return t3;
}

In the above code, the variable tokens is used for a purpose which will become clear subsequently. Each of the Non-Terminal function increments this variable by certain amount, related to the length of the partial RPN generated by it.

We also need to implement generation of RPN for at least the canonical alternate (IF-THEN-ELSE) and loop (WHILE-DO). For an IF construct, the RDP would insert I just after the condition-expression, i.e. to represent

 

IF cond1 THEN stmt-list1 FI

RDP would generate:

 

cond1-RPN I stmt-list1-RPN …

 

We should replace ‘I’ with 11 BZ as discussed previously, where l1 is the index of the first token of the next statement. Also, for the full IF construct

 

IF cond1 THEN stmt-list1 ELSE stmt-list2 FI

 

the RDP would generate:

 

cond1-RPN I stmt-list1-RPN : stmt-list2-RPN …

 

Again we should replace ‘:’ with 12 BR. Some examples of generated IF construct RPN are given in Table 8.2. Both these operations require that we maintain a count of number of tokens generated in the RPN and insert proper values as l1 and l2, and while doing so, still keep a correct count of the RPN tokens generated. The modified RDP function which generates both short and long versions of IF construct is given below.

 

Table 8.2 Examples of RPN generation – IF constructs

Input string Output RPN
P I(a + a) L s = d * (g + h)E. a a + 12 BZ d g h + * s = .
P I(a + a) L s = d * (g + h):L a a + 14 BZ d g h + * s = 19 BR
x = c + vE. c v + x = .
PI(a * a)Ls = a + b;Lg = h * jE. a a * 15 BZ a b + s = h j * g = .
PI(a + a) I(b * c)Ls = d * f: a a + 22 BZ b c * 17 BZ d f * s
Lc = v + bEE. = 22 BR v b + c = .
char * pI(){
  char * t1, *t2, *t3, *t4, *t5;
  char res[1024], tmp[512], tmp2[512];
  int 11, 12;
  t1 = pi(); // returns ‘BZ’
  tokens +=2;
  p1p();
  t2 = pE(); strcpy(tmp2,t2);
  prp();
  t3 = pR(); strcpy(tmp,t3); 11 = tokens + 2;
  if(nextsymbol() == ‘:’){
    t4 = pbar(); // returns ‘BR’
    tokens += 2;
    t5 = pR();
    12 = tokens;
    sprintf(res, “%s %d %s %s %d %s %s”, tmp2,11,t1,tmp,12,t4,t5);
  } else {
    sprintf(res, “%s %d %s %s”, tmp2,11-2,t1,tmp);
   }
    pe();
    t5 = res;
  return t5;
}

Examples of generation of WHILE construct are given in Table 8.3.

 

Table 8.3 Examples of RPN generation – WHILE construct

Input string Output RPN
PW(a + a)Lz = x * (c + v) Z. a a + 14 BZ x c v + * z = 0 BR.
PW(a + a)Lz = x * (c + v); a a + 19 BZ x c v + * z = d f + s
Ls = d + fZ. = 0 BR.
PW(a + a)W(b * c)Lz = x * (c + v)ZZ. a a + 21BZ b c * 19 BZ x c v + * z = 5 BR 0 BR.
PLa = b; w(a)1(c)Ld = f + gEZ. b a = a 16 BZ c 14 BZ f g + d = 3 BR.

Similar considerations hold while handling a WHILE-DO construct and we give the modified RDP function to generate a WHILE-DO construct right away.

char * pW(){
  char * t1, *t2, *t3, *t4, *t5;
  char res[1024], tmp[1024];
  int 11, 12;
  t1 = pw(); // returns ‘BZ’
  12 = tokens; tokens+=2;
  p1p();
  t2 = pE(); strcpy(tmp,t2);
  prp();
  t3 = pR();
  t4 = pz(); // returns ‘BR’
  11 = tokens + 2;
  sprintf(res, “%s %d %s %s %d %s “,tmp, 11, t1, t3, 12, t4);
  t5 = res;
  tokens += 2;
  return t5;
}

You will notice that tokens is advanced at several places within these functions. This is to maintain the correct index values for nested constructs.

We have used absolute RPN-token index values, but many implementations of RPN generator use List-pointer relative index, i.e. the difference between the target index and the index of the Branch token.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.146.176.88