2.6 TI C64x

The Texas Instruments TMS320C64x is a high-performance VLIW DSP that provides fixed-point arithmetic. The CPU can execute up to eight instructions per cycle using 64 general-purpose 32-bit registers and eight functional units.

Figure 2.24 shows a simplified block diagram of the C64x. Although instruction execution is controlled by a single execution unit, the instructions are performed by two data paths, each with its own register file. The CPU is a load/store architecture. The data paths are named A and B. Each data path provides four functional units:

The .L units (.L1 and .L2 in the two data paths) perform 32/40-bit arithmetic and comparison, 32-bit logical, and data packing/unpacking.

The .S units (.S1 and .S2) perform 32-bit arithmetic, 32/40-bit shifts and bit-field operators, 32-bit logical operators, branches, and other operations.

The .M units (.M1 and .M2) perform multiplications, bit interleaving, rotation, Galois field multiplication, and other operations.

The .D units (.D1 and .D2) perform address calculations, loads and stores, and other operations.

Figure 2.24 C64x block diagram.

Separate data paths perform data movement:

Load-from-memory units .LD1 and .LD2 load registers from memory.

Store-to-memory units .ST1 and .ST2 store register values to memory.

Address paths .DA1 and .DA2 compute addresses. These units are associated with the .D1 and .D2 units in the data paths.

Register file cross paths .1X and .2X move data between register files A and B. Data must be explicitly moved from one register file to the other before it can be used in the other data path.

On-chip memory is organized as separate data and program memories. The external memory interface (EMIF) manages connections to external memory. The external memory is generally organized as a unified memory space.

The C64x provides a variety of 40-bit operations. A 40-bit value is stored in a register pair, with the least significant bits in an even-numbered register and the remaining bits in an odd-numbered register. A similar scheme is used for 64-bit values.
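The register-pair layout can be illustrated with a small Python sketch. It is only a model of the bit layout described above, not TI code: the function names split40 and join40 are hypothetical, and the value is treated as unsigned, so sign extension is ignored.

```python
MASK32 = 0xFFFFFFFF
MASK40 = (1 << 40) - 1

def split40(value):
    """Split a 40-bit value into (even_register, odd_register) contents."""
    value &= MASK40
    even = value & MASK32        # 32 least significant bits
    odd = (value >> 32) & 0xFF   # remaining 8 bits
    return even, odd

def join40(even, odd):
    """Rebuild the 40-bit value from the two register contents."""
    return ((odd & 0xFF) << 32) | (even & MASK32)

even, odd = split40(0x123456789A)
print(hex(even), hex(odd))
```

Under this model a 64-bit value would be split the same way, with a full 32 bits in the odd-numbered register.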

Instructions are fetched in groups known as fetch packets. A fetch packet includes eight words at a time and is aligned on 256-bit boundaries. Due to the small size of some instructions, a fetch packet may include up to 14 instructions. The instructions in a fetch packet may be executed in varying combinations of sequential and parallel execution. An execute packet is a set of instructions that execute together. Up to eight instructions may execute together in an execute packet, but each must use a different functional unit, either performing different operations on a data path or using corresponding functional units in different data paths. The p-bit in each instruction encodes information about which instructions can be executed in parallel. The instructions in a fetch packet may execute fully serially, fully in parallel, or partially in parallel.
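The grouping of a fetch packet into execute packets can be sketched as follows. The sketch assumes the usual TI convention, which the text above does not spell out: a p-bit of 1 means that the next instruction executes in parallel with the current one, and a p-bit of 0 ends the execute packet. The function name execute_packets is hypothetical, and only a single fetch packet is modeled.

```python
def execute_packets(p_bits):
    """Group instruction indices into execute packets from their p-bits.

    Assumed convention: p == 1 chains the next instruction into the same
    execute packet; p == 0 ends the current execute packet.
    """
    packets, current = [], []
    for index, p in enumerate(p_bits):
        current.append(index)
        if p == 0:
            packets.append(current)
            current = []
    if current:
        # The last p-bit was 1; this sketch simply closes the packet here.
        packets.append(current)
    return packets

print(execute_packets([0, 0, 0, 0, 0, 0, 0, 0]))  # fully serial
print(execute_packets([1, 1, 1, 1, 1, 1, 1, 0]))  # fully parallel
print(execute_packets([1, 1, 0, 0, 1, 0, 1, 0]))  # partially parallel
```

The three calls correspond to the fully serial, fully parallel, and partially parallel cases.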

Many instructions can be conditionally executed, as specified by a creg field that selects the condition register and a z field that specifies a test for zero or nonzero.
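A minimal model of the condition test is shown below. The polarity is an assumption based on the usual TI encoding, in which z = 1 tests the condition register for zero and z = 0 tests it for nonzero; the function name should_execute is hypothetical.

```python
def should_execute(condition_value, z):
    """Decide whether a conditional instruction executes.

    Assumed polarity: z == 1 tests for zero, z == 0 tests for nonzero.
    """
    if z:
        return condition_value == 0
    return condition_value != 0

print(should_execute(5, 0), should_execute(5, 1))
```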

Two instructions in the same execute packet cannot use the same resources or write to the same register on the same cycle. Here are some examples of these constraints:

Instructions must use separate functional units. For example, two instructions cannot simultaneously use the .S1 unit.

Most combinations of writing using the same .M unit are prohibited.

Most cases of reading multiple values in the opposite register file using the .1X and .2X cross path units are prohibited.

A delay cycle is executed when an instruction attempts to read a register that was updated in the previous cycle by a cross path operation.

Two loads or stores in the same execute packet, which use the .DA1 and .DA2 address paths, cannot use a destination or source from the same register file. The address register must be in the same data path as the .D unit being used.

At most four reads of the same register can occur on the same cycle.

Two instructions in an execute packet cannot write to the same register on the same cycle.

A variety of other constraints limit the combinations of instructions that are allowed in an execute packet.
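Three of the constraints above are simple enough to check mechanically. The following sketch tests an execute packet for duplicate functional units, duplicate register writes, and more than four reads of one register. The Instr record and the function packet_violations are hypothetical, and the many other constraints (cross paths, .M unit writes, load/store rules) are deliberately left out.

```python
from collections import Counter, namedtuple

# Hypothetical instruction record: functional unit, source and destination registers.
Instr = namedtuple("Instr", ["unit", "reads", "writes"])

def packet_violations(packet):
    """Return a list of constraint violations for one execute packet."""
    problems = []
    if len(packet) > 8:
        problems.append("more than eight instructions")
    units = Counter(instr.unit for instr in packet)
    for unit, count in sorted(units.items()):
        if count > 1:
            problems.append(f"unit {unit} used {count} times")
    writes = Counter(reg for instr in packet for reg in instr.writes)
    for reg, count in sorted(writes.items()):
        if count > 1:
            problems.append(f"register {reg} written {count} times")
    reads = Counter(reg for instr in packet for reg in instr.reads)
    for reg, count in sorted(reads.items()):
        if count > 4:
            problems.append(f"register {reg} read {count} times")
    return problems

good = [Instr(".L1", ["A1", "A2"], ["A3"]), Instr(".S1", ["A4", "A5"], ["A6"])]
bad = [Instr(".S1", ["A1", "A2"], ["A3"]), Instr(".S1", ["A4", "A5"], ["A3"])]
print(packet_violations(good))
print(packet_violations(bad))
```

Each source operand counts as a separate read, so an instruction that names the same register twice contributes two reads of it.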

The C64x provides delay slots, which were first introduced in RISC processors. Some effects of an instruction may take additional cycles to complete. The delay slot is a set of instructions following the given instruction; the delayed results of the given instruction are not available in the delay slot. An instruction that does not need the result can be scheduled within the delay slot. Any instruction that requires the new value must be placed after the end of the delay slot. For example, a branch instruction has a delay slot of five cycles.
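The timing rule can be written as a short calculation. This sketch assumes one execute packet per cycle and no pipeline stalls; the function names are hypothetical.

```python
def first_cycle_with_result(issue_cycle, delay_slots):
    """First cycle at which the delayed effect of an instruction is visible."""
    return issue_cycle + delay_slots + 1

def can_use_result(producer_cycle, delay_slots, consumer_cycle):
    """True if a consumer issued at consumer_cycle sees the producer's result."""
    return consumer_cycle >= first_cycle_with_result(producer_cycle, delay_slots)

# A branch issued in cycle 0 with five delay slots: cycles 1 through 5 are
# delay slots, and the branch target executes in cycle 6.
print(first_cycle_with_result(0, 5))
```

A scheduler can fill cycles 1 through 5 with instructions that do not depend on the branch.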

The C64x provides three addressing modes: linear, circular using BK0, and circular using BK1. The addressing mode is determined by the AMR addressing mode register. A linear address shifts the offset by 3, 2, 1, or 0 bits depending on the length of the operand, then adds the base register to determine the physical address. The circular addressing modes using BK0 and BK1 use the same shift and base calculation but only modify bits 0 through N of the address.
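The address calculation can be sketched in a few lines. The sketch assumes 32-bit addresses and takes N as a plain parameter n rather than decoding it from the BK0 or BK1 field; the function names are hypothetical.

```python
# Shift amount for each operand size in bytes (byte, halfword, word, doubleword).
SHIFT = {1: 0, 2: 1, 4: 2, 8: 3}

def linear_address(base, offset, size_bytes):
    """Linear mode: scale the offset by the operand size and add the base."""
    return (base + (offset << SHIFT[size_bytes])) & 0xFFFFFFFF

def circular_address(base, offset, size_bytes, n):
    """Circular mode: same sum, but only bits 0 through n of the address change."""
    low_mask = (1 << (n + 1)) - 1
    total = base + (offset << SHIFT[size_bytes])
    return (base & ~low_mask & 0xFFFFFFFF) | (total & low_mask)

print(hex(linear_address(0x1000, 3, 4)))
print(hex(circular_address(0x100C, 1, 4, 3)))
```

With n = 3 the buffer is 16 bytes long, so stepping one word past 0x100C wraps back to 0x1000 instead of advancing to 0x1010.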

The C64x provides atomic operations that can be used to implement semaphores and other mechanisms for concurrent communication. The LL (load linked) instruction reads a location and sets a link valid flag to true. The link valid flag is cleared when another process stores to that address. The SL (store linked) instruction prepares a word to be committed to memory by storing it in a buffer but does not commit the change. The CMTL (commit linked stores) instruction checks the link valid flag and writes the SL-buffered data if the flag is true.
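The interaction of the three instructions can be modeled in software. The class below is a behavioral sketch, not a description of the hardware: the class and method names are hypothetical, only one link is tracked, and the assumption that CMTL clears the link valid flag and the buffer afterward is ours.

```python
class LinkedMemory:
    """Toy model of LL, SL, and CMTL acting on a word-addressed memory."""

    def __init__(self):
        self.mem = {}
        self.link_addr = None
        self.link_valid = False
        self.buffer = None

    def ll(self, addr):
        """Load linked: read the location and set the link valid flag."""
        self.link_addr = addr
        self.link_valid = True
        self.buffer = None
        return self.mem.get(addr, 0)

    def store(self, addr, value):
        """Ordinary store, as issued by another process; breaks a matching link."""
        self.mem[addr] = value
        if addr == self.link_addr:
            self.link_valid = False

    def sl(self, value):
        """Store linked: buffer the word without committing it."""
        self.buffer = value

    def cmtl(self):
        """Commit linked stores: write the buffer only if the link is still valid."""
        ok = self.link_valid and self.buffer is not None
        if ok:
            self.mem[self.link_addr] = self.buffer
        self.link_valid = False   # assumed: the link is consumed by the commit
        self.buffer = None
        return ok

m = LinkedMemory()
m.store(0x10, 5)
value = m.ll(0x10)
m.sl(value + 1)
print(m.cmtl(), m.mem[0x10])
```

A semaphore acquire would repeat the LL, SL, CMTL sequence until the commit succeeds.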

The processing of interrupts is mediated by registers. The interrupt flag register IFR is set when an interrupt occurs; the ith bit of IFR corresponds to the ith level of interrupt. Interrupts are enabled and disabled using the interrupt enable register IER. Manual interrupts are controlled using the interrupt set register ISR and interrupt clear register ICR. The interrupt return pointer register IRP contains the return address for the interrupt. The interrupt service table pointer register ISTP points to a table of interrupt handlers. The processor supports nonmaskable interrupts.
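The roles of IFR, IER, ISR, and ICR can be summarized in a small model. It is a sketch under simplifying assumptions: 16 interrupt bits, no global interrupt enable, and no special handling of reset or nonmaskable interrupts. The class and method names are hypothetical.

```python
class InterruptRegs:
    """Toy model of the interrupt flag and enable registers."""

    def __init__(self):
        self.ifr = 0   # interrupt flag register
        self.ier = 0   # interrupt enable register

    def write_isr(self, mask):
        """Writing ISR manually sets the selected IFR bits."""
        self.ifr |= mask

    def write_icr(self, mask):
        """Writing ICR manually clears the selected IFR bits."""
        self.ifr &= ~mask

    def pending(self):
        """Interrupt levels that are both flagged and enabled."""
        active = self.ifr & self.ier
        return [i for i in range(16) if (active >> i) & 1]

regs = InterruptRegs()
regs.ier = (1 << 4) | (1 << 5)
regs.write_isr((1 << 4) | (1 << 7))
print(regs.pending())
```

Here level 7 is flagged but not enabled, so only level 4 is reported as pending.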

The C64x+ is an enhanced version of the C64x. It supports a number of exceptions. The exception flag register EFR indicates which exceptions have been thrown. The exception clear register ECR can be used to clear bits in the EFR. The internal exception report register IERR indicates the cause of an internal exception. The C64x+ provides two modes of program execution, user and supervisor. Several registers, notably those related to interrupts and exceptions, are not available in user mode. A supervisor mode program can enter user mode using the B NRP instruction. A user mode program may enter supervisor mode by using an SWE or SWENR software interrupt.
