9.2 The Target Language

We are going to use x86 assembly language for our project in Chapter 12. Here, we use simple subset of that language.

9.2.1 x86 Assembly Language in GAS Syntax

We use the AT&T assembly syntax used in GNU assembler as, also known as, GAS. The assembly code output from the gcc compiler uses this syntax and is compatible with the GCC in-line assembly syntax. However, this is not the only syntax that is used to represent x86 operations. For example, Net-Assembler NASM uses a different syntax to represent assembly mnemonics, operands and addressing modes, as do some High Level assemblers. The AT&T syntax is the standard on Unix-like systems but some assemblers use the Intel syntax used in MASM. GAS itself can accept both the syntaxes. We shall use only the AT&T syntax.

GAS instructions generally have the form

<mnemonic op-code> <source>, <destination>

which is the reverse of other assemblers like MASM and NASM. For example, the following mov instruction

movw $0xFEED, %ax

will move the hex value FEED (16-bit) into the register AX.

In GAS, the op-codes are generally suffixed with the letters ‘b’, ‘s’, ‘w’, ‘l’, ‘q’ or ‘t’ to specify the size of the operand being manipulated.

b  =  byte (8-bit)

s  =  short (16-bit integer) or single (32-bit floating point)

w  =  word (16-bit)

l   =  long (32-bit integer or 64-bit floating point)

q  =  quad (64-bit)

t   =  10-bytes (80-bit floating point)

If the suffix is not specified, and there are no memory operands in the instruction, GAS infers the operand size from the size of the destination register operand (the final operand).

When referencing a register, the register name needs to be prefixed with a ‘%’. Numerical constants need to be prefixed with a ‘$’.

Address operand syntax Up to 4 parameters of an address operand are presented in the memory address operand syntax:

<displacement>(<base register>, <offset register>, <scalar multiplier>)

This is equivalent to:

 

[BaseRegister + displacement + OffsetRegister * ScalarMultiplier]

in Intel syntax. Either or both of the numeric, and either of the register parameters may be omitted. Some examples are:

 

movl −4(%ebp,%edx,4),%eax Full: load *(ebp – 4 + (edx * 4)) into eax
movl −4(%ebp), %eax Typical: load a stack-variable into eax
movl (%ecx), %edx No offset: copy value at memory address in ecx into edx
leal 8(,%eax,4), %eax Arithmetic: multiply eax by 4 and add 8
leal (%eax,%eax,2), %eax Arithmetic: multiply eax by 2 and add eax (i.e. multiply by 3)

To summarize:

To generate a good code for the x86 series of machines we need:

  • To select appropriate instructions.
  • To make good use of registers and avoid unnecessary memory accesses.
  • Must not use registers that are already in use.
  • To keep the number of jumps as less as possible, as too many of them will reduce the CPU cache’ performance.

The design decisions that we have to take are:

  • How will the values of types in the source language be mapped to values of types in the target language?
  • How will the constructs of the source language be mapped to constructs in the target language?
  • How do we represent source language values in the target language?
  • At present we shall consider only a few primitive types:

     

    integers → 32-bit words
    booleans→ 32-bit words, 0 = false, 1 = true

     

  • A statement specifies an action: So compilation of a statement should produce code to execute that action.
  • An expression specifies a value: So compilation of an expression should produce code that can be executed to calculate that value.

To make this work we need a general strategy and a mapping of that strategy onto our target machine.

A possible general strategy is like this. Assume that we have only two registers, eax and ebx. We have to decide where to store the intermediate results. In general, it is not possible to store them in registers, because enough of them are not available. We can store them in memory, but memory exchanges take time and we have to allocate space for them. We can store them on the run-time stack, i.e. push a value on to the stack when we need to save it so that we can release a register, and pop the value off later when we need to use that saved value.

Our general strategy for compiling code will be: calculate a value for the expression and leave the result in eax and push eax on the stack to save its value when necessary. For example, to compile the expression e1 + e2:

generate code to evaluate e1, leaving result in eax
pushl %eax
generate code to evaluate e2, leaving result in eax
popl %ebx
addl %ebx, %eax

In short, the result of any sub-expression computation is kept in eax, and its value is saved on the stack when it must survive the computation of another sub-expression.

A more formal description of this code generation strategy is as follows.

E[e1 + e2]  =  E[e1]
               pushl %eax
               E[e2]
               popl  %ebx
               addl  %ebx, %eax

We can extend the generation scheme E to other forms of expression:

Constant load:

E[n]        =  movl $n, %eax

Generation of conditional logic operators:

E[e1 && e2] =      E[e1]
                   cmp1 $1, %eax
                   jnz      lab1       a new label E[e2]
            lab1:           …

Generation of variable references:

E[var]      =  movl var, %eax

For example, consider computation of an expression (2 + 4) * (3 + 5) using the stack to save the intermediate values:

movl   $2,%eax
pushl  %eax
movl   $4,%eax
popl   %ebx
add    %ebx,%eax
pushl  %eax
movl   $3,%eax
pushl  %eax
movl   $5,%eax
popl   %ebx
add    %ebx,%eax
popl   %ebx
imull  %ebx,%eax

If we use register allocation instead of the stack, because we have more than two registers:

movl   $2, %eax
movl   $4, %ebx
addl   %ebx, %eax
movl   $3, %ebx
movl   $5, %ecx
addl   %ecx, %ebx
imull  %ebx, %eax

We can modify our code generation schemes to make good use of machine registers. Further improvements are also possible by making a better use of the instruction set:

movl   $2, %eax
addl   $4, %eax
movl   $3, %ebx
addl   $5, %ebx
imull  %ebx, %eax

This just requires a specialized version of our revised generation scheme:

E1[e1 + n] = E1[e1]
             addl $n, %eax
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.144.30.236