2.1 A simple Language

We define a simple – but high enough level – language as an example. At present it contains just sufficient construct to write small programs, which can be compiled with the compiler we are going to discuss and executed on the virtual machine VM1. VM1 is a working computer; it is implemented only in software as a C program and not in silicon. As indicated in Chapter 1, a similar idea is actually used in programming languages such as Java, Perl and Python.

A typical program in simple looks like:

     program
      int a ;
      let a = 10 ;
loop: print (a) ;
      let a = a – 1 ;
      if a goto loop ;
      end

This program will print numbers 10 to 0, in descending order. Some of the language constructs are redundant at this stage (e.g. declaration of a variable as an integer), but we may extend this language “some day” and add floating point quantities, variable list in the print statement, etc.

Note the way we have formed the loop in the above example. There are only two control statements – if and goto – in this language. There are even no relational operators.

To have a precise specification of the language, we give its grammar below.

2.1.1 Grammar of simple

The grammar could have been given in Backus-Naur Form (BNF), see Appendix A, but in anticipation of our discussions about compiler writing tools like yacc, we give the grammar in a format compatible with yacc input language.

In the yacc format, the Non-Terminals or meta-variables are written as identifiers and the Terminals or meta-literals are written within single quotes. Alternate productions are separated by vertical bar ‘|’, the non-terminal on LHS is followed by a colon ‘:’ and the productions for a particular non-terminal are terminated by semi-colon ‘;’.

program : ‘program’ statements ‘end’
        ;
statements : statement
           | statements statement
           ;

statement : basic-statement ‘;’
          | label basic-statement ‘;’
basic-statement : assignment
                | var
                | if
                | goto
                | read
                | print
                ;
assignment : ‘let’ variable ‘=’ expr
              ;
var : ‘int’ variable
       ;
if : ‘if’ expr statement
      ;
goto : ‘goto’ variable
       ;
read : ‘read’ ‘ (‘ variable ‘)’
       ;
print : ‘print’ ‘ (‘ number ‘)’
      | ‘print’ ‘ (‘ variable ‘)’
      ;
expr : expr ‘+’ term
       | expr ‘-’ term
       | term
       ;
term : term ‘*’ factor
     | term ‘/’ factor
     | factor
     ;
factor : ‘ (’ expr ‘)’
       | variable
       | number
       ;
label : variable ‘:’
      ;

variable : char
           | variable char
           | variable digit
           ;
number : ‘+’ unsigned-number
         | ‘-’ unsigned-number
         | unsigned-number
         ;
unsigned-number : digit
          | unsigned-number digit
          ;
          ;
char : ‘a’|‘b’|‘c’|‘d’|‘e’|‘f’|‘g’|‘h’|‘i’|‘j’|‘k’|‘l’|‘m’
     |‘n’|‘o’|‘p’|‘q’|‘r’|‘s’|‘t’|‘u’|‘v’|‘w’|‘x’|‘y’|‘z’
     ;
digit : ‘0’|‘1’|‘2’|‘3’|‘4’|‘5’|‘6’|‘7’|‘8’|‘9’
      ;
op    : ‘+’|‘-’|‘*’|‘/’|‘ (‘|’)‘|‘:’|‘=’|‘;’|‘<’|‘>’;

A close look at the grammar will tell you that each statement type starts with a keyword. Such a design simplifies the job of writing the parser for this language.

Note that the grammar does not specify certain semantic requirements, for example, a valid program in simple should have a label, say, loop given somewhere, if there is a gotoloop in the program. Similarly, we require that all the variables be declared before use, the print or read statement cannot have a label as their argument, etc. The grammar does not specify these things, but a compiler will have to check for such unacceptable constructs.

2.1.2 Target Language

We shall now define the target language, i.e. the language understood by our target machine VM1. Our VM1 is a 32-bit, fixed instruction size, binary computer. The programmer view of the machine is shown in Fig.2.1

 

A programmer's view of VM1

 

Fig. 2.1 A programmer's view of VM1

 

The following registers are available:

A – 32-bit accumulator, leftmost bit is interpreted as its sign bit. Register no. 0.

M – Auxiliary register, leftmost bit is interpreted as its sign bit. Register no. 4.

IP – Instruction pointer, 32-bits, but only lower 16-bits significant at present. Register no. 7.

SP – Stack Pointer, 32-bits, but only lower 16-bits significant. Register no. 6.

BP – Base Pointer, 32-bits, but only lower 16-bits significant. Register no. 5.

IX1 – Index Register 1, 32-bits; Register no. 1.

IX2 – Index Register 2, 32-bits; Register no. 2.

IX3 – Index Register 3, 32-bits; Register no. 3.

IR – Instruction Register, 32-bits; divided into OP (1-byte), MOD (1-byte) and ADDRS (2-bytes);holds an instruction being executed.

MOD – A part of IR, consisting of m (2-bits), RP (3-bits), R (3-bits); m determines the addressing mode used:

 

00 – Direct address

10 – Indirect w.r.t RP (the numbered registers R)

01 – Indexed w.r.t RP

11 – Indirect (indexed w.r.t RP)

 

Some of the op-codes of VM1 are given in Table 2.1.

 

Table 2.1 Some of the op-codes of VM1

 

Some of the op-codes of VM1

 

Address 0000 is for an input/output device. LD 0000 will read from the input device and ST 0000 will print to the output device.

2.1.3 Example Program in Machine Code

We write the example program given above in the “machine code” of VM1:

 0100 | 15 0 0 0 000A | Load value 10 in A-reg

    1 | 06 0 0 0 0110 | store in a

    2 | 06 0 0 0 0000 | display (store in 0)

    3 | 05 0 0 0 0110 | load a

    4 | 12 0 0 0    1 | subtract 1

    5 | 0A 0 0 0 FFFB | Jump if +ve (relative)

    6 | FF 0 0 0 0    | Halt

 0110 | XX X X X X    | value of a stored here

Note that the jump instructions take relative jump, from the current value in IP.

Now we shall discuss in some details the compiler for the simple for the target language of VM1.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.142.156.235