4.5. TIE Instructions

The byte-swapping function described in the previous chapter provides an excellent demonstration of the ability of new processor instructions to reduce cycle count. Figure 4.3 illustrates the byte-swap function, which converts a 32-bit word between big- and little-endian formats.

Figure 4.3. Conversion from one endian format to the other for a 32-bit word.


The nine instructions from the base Xtensa ISA that are required to perform this operation are:

slli a9, a14, 24        Form intermediate result bits 24–31 in register a9
slli a8, a14, 8         Shift 32-bit word left by 8 bits, save in register a8
srli a10, a14, 8        Shift 32-bit word right by 8 bits, save in register a10
and a10, a10, a11       Form intermediate result bits 8–15 in register a10
and a8, a8, a13         Form intermediate result bits 16–23 in register a8
or a8, a8, a9           Form intermediate result bits 16–31, save in register a8
extui a9, a14, 24, 8    Extract result bits 0–7, save in register a9
or a10, a10, a9         Form result bits 0–15, save in register a10
or a10, a10, a8         Form final result in register a10
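
To make the data flow of this sequence easier to follow, the sketch below mirrors it in C, one statement per instruction, with variables named after the registers. The listing does not show how registers a11 and a13 are loaded, so the mask values used here (0x0000FF00 and 0x00FF0000) are assumptions inferred from the bit ranges in the descriptions. The function name is invented for illustration.

```c
#include <stdint.h>

/* C model of the nine-instruction byte swap; a14 holds the input word. */
static uint32_t byteswap_base_isa(uint32_t a14)
{
    const uint32_t a11 = 0x0000FF00u;   /* assumed mask for bits 8-15  */
    const uint32_t a13 = 0x00FF0000u;   /* assumed mask for bits 16-23 */

    uint32_t a9  = a14 << 24;           /* slli  a9, a14, 24    */
    uint32_t a8  = a14 << 8;            /* slli  a8, a14, 8     */
    uint32_t a10 = a14 >> 8;            /* srli  a10, a14, 8    */
    a10 = a10 & a11;                    /* and   a10, a10, a11  */
    a8  = a8 & a13;                     /* and   a8, a8, a13    */
    a8  = a8 | a9;                      /* or    a8, a8, a9     */
    a9  = (a14 >> 24) & 0xFFu;          /* extui a9, a14, 24, 8 */
    a10 = a10 | a9;                     /* or    a10, a10, a9   */
    a10 = a10 | a8;                     /* or    a10, a10, a8   */
    return a10;
}
```

Every statement after the first three consumes a value produced earlier in the sequence, which is why the instructions cannot simply be reordered or overlapped to save cycles.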

Through TIE, it’s possible to define a BYTESWAP instruction that takes a 32-bit word from one of the processor’s general-purpose register-file entries, converts the word from one endian format to the other, and stores the result in another register-file entry. The TIE description to create this new instruction is remarkably short:

operation BYTESWAP {out AR outR, in AR inpR} { }
{
    wire [31:0] reg_swapped = {inpR[7:0],inpR[15:8],inpR[23:16],inpR[31:24]};
    assign outR = reg_swapped;
}

The operation section of a TIE description provides the name, format, and behavior of a TIE instruction. It is the simplest and most compact way to describe a new instruction. In some instances, it is the only section required to completely specify an instruction.

The interface to the BYTESWAP instruction is defined by the information contained in the first set of curly braces of the operation section:

operation BYTESWAP {out AR outR, in AR inpR} { }

Within the first set of braces, the argument outR specifies the destination entry in the AR register file for the result of the instruction. The argument inpR specifies the AR register-file entry that provides the source operand for the instruction. The second set of braces in the operation statement can be used to specify additional internal states for this operation extension, but this feature isn’t used in this example.

The behavior of the BYTESWAP instruction is defined within the next set of curly braces:

{
    wire [31:0] reg_swapped = {inpR[7:0],inpR[15:8],inpR[23:16],inpR[31:24]};
    assign outR = reg_swapped;
}

The first line in this group defines how the new machine instruction computes the byte-swapped 32-bit value and assigns the result to a temporary wire named reg_swapped. The second line assigns the byte-swapped value to the output argument outR. Note that for a processor incorporating this instruction, the tailored Xtensa debugger can display the values of intermediate wires, states, and registers through its info tie_wires command. This feature greatly facilitates the debugging of TIE instructions in a software environment.

This two-line definition is all that’s required to add an instruction to the Xtensa processor core. From this single instruction description, the TIE Compiler within the Xtensa Processor Generator builds the necessary execution-unit hardware, adds it to the processor’s RTL description, and adds constructs to the software-development tool suite so that the new BYTESWAP instruction can be used as an intrinsic in a C or C++ program.
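
As a rough illustration of what calling such an intrinsic from C might look like, the sketch below wraps it in a macro and falls back to a portable software model when the code is built for a processor without the instruction. The header name xtensa/tie/byteswap.h, the USE_BYTESWAP_TIE macro, and the names SWAP32, byteswap_model, and swap_buffer are all hypothetical; the actual header name depends on the TIE file and tool configuration. The fallback mirrors the concatenation in the TIE description term by term.

```c
#include <stdint.h>

#if defined(__XTENSA__) && defined(USE_BYTESWAP_TIE)
/* Hypothetical generated header that declares the BYTESWAP intrinsic. */
#include <xtensa/tie/byteswap.h>
#define SWAP32(x) ((uint32_t)BYTESWAP(x))
#else
/* Portable stand-in for
   {inpR[7:0], inpR[15:8], inpR[23:16], inpR[31:24]}. */
static uint32_t byteswap_model(uint32_t inpR)
{
    return ((inpR & 0x000000FFu) << 24) |   /* inpR[7:0]   -> bits 31:24 */
           ((inpR & 0x0000FF00u) << 8)  |   /* inpR[15:8]  -> bits 23:16 */
           ((inpR & 0x00FF0000u) >> 8)  |   /* inpR[23:16] -> bits 15:8  */
           ((inpR & 0xFF000000u) >> 24);    /* inpR[31:24] -> bits 7:0   */
}
#define SWAP32(x) byteswap_model(x)
#endif

/* Convert an array of 32-bit words between endian formats in place. */
static void swap_buffer(uint32_t *buf, unsigned n)
{
    for (unsigned i = 0; i < n; ++i)
        buf[i] = SWAP32(buf[i]);
}
```

Keeping the intrinsic behind a macro lets the same source build for a host-side test and for the extended processor.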

The BYTESWAP instruction is an example of a fused instruction. The operations of nine dependent instructions (each instruction in the sequence depends on results from previous instructions) have been fused into one. In this example, the circuitry required to implement the function is extremely simple. The execution unit for this instruction adds little more than some additional wires to scramble byte lanes, yet because one instruction now replaces nine, it speeds endian conversion by a factor of nine. This example demonstrates that a small addition to a processor’s hardware can yield large performance gains.

In addition, the new BYTESWAP instruction doesn’t need the intermediate-result registers that are used in the 9-instruction byte-swap routine. Some additional gates are included in the processor’s instruction decoder to add the new instruction to the processor’s instruction set, but these few gates do not make the processor core noticeably larger than the base processor.

In general, most new instructions described in TIE use more complex operations than BYTESWAP. In addition to the wire statement used in the above example, the TIE language includes a large number of operators and built-in functions that are used to describe new instructions. These operators and function modules appear in Tables 4.2 and 4.3, respectively. Nearly any sort of data manipulation can be performed using the wire statement in conjunction with the other TIE operators and built-in functions.

Table 4.2. TIE operators

Operator type     Operator symbol       Operation
Arithmetic        +                     Add
                  -                     Subtract
                  *                     Multiply
Logical           !                     Logical negation
                  &&                    Logical and
                  ||                    Logical or
Relational        >                     Greater than
                  <                     Less than
                  >=                    Greater than or equal
                  <=                    Less than or equal
                  ==                    Equal
                  !=                    Not equal
Bitwise           ~                     Bitwise negation
                  &                     Bitwise and
                  |                     Bitwise or
                  ^                     Bitwise ex-or
                  ^~ or ~^              Bitwise ex-nor
Reduction         &                     Reduction and
                  ~&                    Reduction nand
                  |                     Reduction or
                  ~|                    Reduction nor
                  ^                     Reduction ex-or
                  ^~ or ~^              Reduction ex-nor
Shift             <<                    Left shift
                  >>                    Right shift
Concatenation     { }                   Concatenation
Replication       { { } }               Replication
Conditional       ?:                    Conditional
Built-in modules  <module-name> (...)   See Table 4.3
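
The reduction, replication, and concatenation operators in Table 4.2 may be less familiar to software developers than the arithmetic and logical ones. The following C sketch models them for 8-bit operands, assuming they follow the Verilog-style semantics that their notation suggests. The function names are invented for illustration and are not part of TIE.

```c
#include <stdint.h>

/* Reduction and (&x): 1 only when every bit of x is 1. */
static unsigned reduce_and8(uint8_t x) { return x == 0xFFu; }

/* Reduction or (|x): 1 when at least one bit of x is 1. */
static unsigned reduce_or8(uint8_t x) { return x != 0u; }

/* Reduction ex-or (^x): parity of x, 1 when an odd number of bits are set. */
static unsigned reduce_xor8(uint8_t x)
{
    x ^= x >> 4;    /* fold the upper nibble onto the lower nibble */
    x ^= x >> 2;    /* fold the remaining four bits onto two       */
    x ^= x >> 1;    /* fold the remaining two bits onto one        */
    return x & 1u;
}

/* Replication {8{b}}: copy a single bit into all eight positions. */
static uint8_t replicate8(unsigned b) { return (b & 1u) ? 0xFFu : 0x00u; }

/* Concatenation {hi, lo}: join two 8-bit values into a 16-bit value. */
static uint16_t concat8(uint8_t hi, uint8_t lo)
{
    return (uint16_t)(((unsigned)hi << 8) | lo);
}
```

The negated forms in the table (nand, nor, ex-nor) are the one-bit complements of these results.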

Table 4.3. Built-in TIE function modules

Format                                      Description                         Result definition
TIEadd(a, b, cin)                           Add with carry-in                   a + b + cin
TIEaddn(a0, a1, ..., an–1)                  N-number addition                   a0 + a1 + ... + an–1
TIEcmp(a, b, sign)                          Signed and unsigned comparison      {a < b, a <= b, a == b, a >= b, a > b}
TIEcsa(a, b, c)                             Carry-save adder                    {a & b | a & c | b & c, a ^ b ^ c}
TIEmac(a, b, c, sign, negate)               Multiply-accumulate                 negate ? c - a * b : c + a * b,
                                                                                where sign specifies how a and b are
                                                                                extended, in the same way as for TIEmul
TIEmul(a, b, sign)                          Signed and unsigned multiplication  {{m{a[n - 1] & sign}}, a} * {{n{b[m - 1] & sign}}, b},
                                                                                where n is the size of a and m is the size of b
TIEmulpp(a, b, sign, negate)                Partial-product multiply            negate ? -a * b : a * b
TIEmux(s, d0, d1, ..., dn–1)                n-way multiplexer                   s == 0 ? d0 : s == 1 ? d1 : ... : s == n - 2 ? dn–2 : dn–1
TIEpsel(s0, d0, s1, d1, ..., sn–1, dn–1)    n-way priority selector             s0 ? d0 : s1 ? d1 : ... : sn–1 ? dn–1 : 0
TIEsel(s0, d0, s1, d1, ..., sn–1, dn–1)     n-way 1-hot selector                (size{s0} & d0) | (size{s1} & d1) | ... | (size{sn–1} & dn–1),
                                                                                where size is the maximum width of d0 ... dn–1
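
The result definitions in Table 4.3 can be read as executable specifications. The C sketch below models four of them (TIEcsa, TIEmux, TIEpsel, and TIEsel) for 32-bit data, following the expressions in the table. The function and type names are invented for illustration, and operand widths are fixed at 32 bits for simplicity, whereas the built-in modules accept operands of other widths.

```c
#include <stdint.h>

/* TIEcsa(a, b, c) returns {carry, sum}; carry has twice the weight of sum,
   so a + b + c == sum + (carry << 1), modulo 2^32 in this model. */
typedef struct { uint32_t carry; uint32_t sum; } csa_result;

static csa_result csa_model(uint32_t a, uint32_t b, uint32_t c)
{
    csa_result r;
    r.carry = (a & b) | (a & c) | (b & c);   /* majority of each bit column */
    r.sum   = a ^ b ^ c;                     /* parity of each bit column   */
    return r;
}

/* TIEmux: s == 0 ? d0 : s == 1 ? d1 : ... : dn-1 (requires n >= 1).
   Any s of n - 1 or greater selects the last input. */
static uint32_t mux_model(unsigned s, const uint32_t *d, unsigned n)
{
    return d[s < n - 1 ? s : n - 1];
}

/* TIEpsel: the first asserted select wins; the result is 0 if none is set. */
static uint32_t psel_model(const unsigned *s, const uint32_t *d, unsigned n)
{
    for (unsigned i = 0; i < n; ++i)
        if (s[i])
            return d[i];
    return 0;
}

/* TIEsel: OR together each data input masked by its replicated select bit.
   With exactly one select asserted, this returns the matching input. */
static uint32_t sel_model(const unsigned *s, const uint32_t *d, unsigned n)
{
    uint32_t r = 0;
    for (unsigned i = 0; i < n; ++i)
        r |= (s[i] ? 0xFFFFFFFFu : 0u) & d[i];
    return r;
}
```

The difference between the two selectors shows up only when more than one select is asserted: the priority selector returns the first matching input, while the 1-hot selector returns the bitwise OR of all matching inputs.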
