The IA32 Data Register Set Was Small

General

Refer to Figure 35-4 on page 849. The IA register set implemented in pre-P6 processors included very few registers that the programmer could use to hold, test and manipulate data items. With so few data registers, the processor (and the programmer) could keep only a small number of data operands close to the execution units, where they can be accessed quickly. When additional data operands had to be read from memory, the programmer was forced to first write the contents of one or more of the processor's data registers back to memory. Later, when the program needed the original set of data operands, they had to be read from memory again (perhaps after first storing the current contents of the data registers). This juggling of data between the register set and memory took time and exacted a penalty (perhaps severe) on the performance of the program.

Figure 35-4. The IA32 Data Registers


The P6 Had 40 General-Purpose Registers

Rather than the extremely limited data register set pictured in Figure 35-4 on page 849, the P6 had 40 data registers (see the ROB in Figure 35-5 on page 850). As described earlier, IA instructions are translated into fixed-length μops prior to being executed. When executed, the μop may:

  • have to place a value into one of the IA GPRs.

  • have to read a value (from an IA data register) that was placed into it by a μop that was executed earlier.

  • update the contents of the EFlags or the FPU Status register.

Figure 35-5. The P6 ROB


Remember that the processor core permits μops to be executed out-of-order (referred to as speculative execution; see “Speculative Execution” on page 853). Imagine what might happen if the results of each μop's execution were immediately committed to the processor's register set. Values in registers would be changed and condition bits in the EFlags and FPU Status registers would be updated, perhaps erroneously and certainly not in the expected order.

Rather than immediately committing μop execution results to the real register set, the processor stored the result of a μop's execution in the ROB entry that contained the μop. If a μop required as an input the result produced by one or more μops that preceded it in program flow, the result(s) were forwarded directly to it from the ROB entries of those μops (assuming that they had completed execution). If a μop had been dispatched for execution and another μop that required its result was queued for dispatch to an execution unit, the result of the first μop's execution was forwarded directly from the execution unit to the queued μop (and was also stored in the ROB entry associated with the μop that produced the result). This was referred to as Feed Forwarding (and is now referred to as Store-to-Load Forwarding).

Rerouting accesses intended for the IA register set to the larger ROB register set (consisting of 40 entries) was necessary for speculative execution and is referred to as Register Aliasing or Register Renaming.
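The interplay between the ROB, result forwarding and register aliasing can be made concrete with a small model. The following Python sketch is only an illustration under simplifying assumptions: the class and method names (ReorderBuffer, allocate, operands, complete, retire) are hypothetical, each instruction is treated as a single μop, and the three-instruction sequence in run_demo is invented for the demonstration. It is not a description of the actual P6 circuitry.

```python
from collections import deque


class RobEntry:
    """One ROB slot: holds a uop's destination and, once executed, its result."""

    def __init__(self, dst, sources):
        self.dst = dst          # IA register to update at retirement (or None)
        self.sources = sources  # per source: producing RobEntry, or a register name
        self.value = None
        self.done = False


class ReorderBuffer:
    def __init__(self, size=40):
        self.size = size
        self.entries = deque()            # program order, oldest first
        self.alias = {}                   # IA register -> newest in-flight producer
        self.arch = {"eax": 0, "ebx": 0}  # committed (real) register set

    def allocate(self, dst, src_regs=()):
        """Allocate an entry in program order and alias its source registers."""
        if len(self.entries) == self.size:
            raise RuntimeError("ROB full")
        sources = [self.alias.get(reg, reg) for reg in src_regs]
        entry = RobEntry(dst, sources)
        self.entries.append(entry)
        if dst is not None:
            self.alias[dst] = entry
        return entry

    def operands(self, entry):
        """Return the source values, or None if a producer has not executed yet."""
        values = []
        for src in entry.sources:
            if isinstance(src, RobEntry):
                if not src.done:
                    return None
                values.append(src.value)  # forwarded from the producer's ROB entry
            else:
                values.append(self.arch[src])
        return values

    def complete(self, entry, value):
        """Record an execution result in the ROB entry, not in the real register."""
        entry.value, entry.done = value, True

    def retire(self):
        """Commit completed entries to the real registers, oldest first."""
        count = 0
        while self.entries and self.entries[0].done:
            entry = self.entries.popleft()
            if entry.dst is not None:
                self.arch[entry.dst] = entry.value
                if self.alias.get(entry.dst) is entry:
                    del self.alias[entry.dst]
            count += 1
        return count


def run_demo():
    rob = ReorderBuffer()
    u1 = rob.allocate("eax")                  # mov eax,17
    u2 = rob.allocate("ebx")                  # mov ebx,3
    u3 = rob.allocate("eax", ("eax", "ebx"))  # add eax,ebx
    rob.complete(u2, 3)                       # u2 executes before u1
    stalled = rob.operands(u3) is None and rob.retire() == 0
    rob.complete(u1, 17)
    rob.complete(u3, sum(rob.operands(u3)))
    retired = rob.retire()
    return stalled, retired, rob.arch


if __name__ == "__main__":
    print(run_demo())
```

In this model the real register set is untouched while the oldest μop is still outstanding, even though a younger μop has already produced its result. Once all three μops have completed, they retire together in program order.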

The Pentium® 4 Implements a Large Array of Data Registers

Like the P6 processor, the Pentium® 4 executes μops out-of-order and speculatively, and temporarily stores the results in invisible registers within the processor core. When:

  • a μop has completed execution, and

  • it has been established that it should have been executed, and

  • it is its turn to be retired (in strict program order),

the results are made visible in the appropriate IA32 data register. It should be stressed, however, that not all results are placed in data registers. As an example, when a store (i.e., a memory data write) is retired, its write data is written into the L1 Data Cache (if the memory area is cacheable and the line is in the cache), or, if the memory area is uncacheable, is placed in a buffer to be written to external memory.
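The three retirement conditions and the two store cases just described can be sketched as a short Python model. The names used here (Uop, retire_ready, confirmed, write_buffer) are hypothetical. The sketch covers only the cases the text describes: a cacheable store whose line is not in the cache, and the discarding of μops that should not have been executed, are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Uop:
    kind: str                # "reg" (register write) or "store" (memory write)
    target: object           # register name, or memory address for a store
    value: int
    done: bool = False       # has completed execution
    confirmed: bool = False  # established that it should have been executed


def retire_ready(rob, regs, l1_cache, write_buffer, uncacheable):
    """Retire uops from the oldest end of rob while all three conditions hold."""
    retired = 0
    # rob[0] is always the oldest uop, so retirement is in strict program order.
    while rob and rob[0].done and rob[0].confirmed:
        uop = rob.pop(0)
        if uop.kind == "reg":
            regs[uop.target] = uop.value            # result becomes visible
        elif uop.target in uncacheable:
            write_buffer.append((uop.target, uop.value))
        elif uop.target in l1_cache:
            l1_cache[uop.target] = uop.value        # cacheable, line present
        else:
            raise NotImplementedError("cacheable store miss is not modelled")
        retired += 1
    return retired


def run_demo():
    regs, l1_cache, write_buffer = {}, {140: 0}, []
    rob = [
        Uop("reg", "eax", 17, done=True, confirmed=True),
        Uop("store", 140, 99, done=True, confirmed=False),  # still speculative
        Uop("reg", "ebx", 3, done=True, confirmed=True),
    ]
    first = retire_ready(rob, regs, l1_cache, write_buffer, uncacheable=set())
    regs_after_first = dict(regs)
    rob[0].confirmed = True                                 # speculation resolved
    second = retire_ready(rob, regs, l1_cache, write_buffer, uncacheable=set())
    return first, regs_after_first, second, regs, l1_cache


if __name__ == "__main__":
    print(run_demo())
```

In the demonstration, the youngest μop has completed and is confirmed, yet it cannot retire until the unconfirmed store ahead of it has been resolved.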

The Pentium® 4 processor core implements the following alias data registers (see Figure 35-6 on page 851):

  • 128 integer data registers.

  • 128 FP data registers.

Figure 35-6. The Pentium® 4's Alias Registers


This is far more than the 40 alias data registers implemented in the P6 processor.

The Compiler Manages Data Register Usage

Because the compiler knows the target processor that the code is being compiled for, it is the compiler's job to formulate IA32 instructions in a manner that maximizes the processor's usage of the alias registers.

Elimination of False Register Dependencies

Consider the following example:

mov eax,17     ;17 -> eax
add [140],eax  ;memory loc.= eax + content of memory loc.
mov eax,3      ;3 -> eax
add eax,ebx    ;eax = ebx + eax

In earlier x86 processors, these instructions would have to be executed one at a time in sequence in order to yield the correct results. As an example, the processor couldn't execute the third and fourth instructions before the first and second had completed (because the second instruction must use the value placed into EAX before the third instruction can place a new value into EAX).

Prior to dispatching the μops that represent the four instructions for execution, the Pentium® 4 processor would tag the μops as follows:

  • The first μop is tagged so that when executed, it will store its result (i.e., the value 17) in one of the alias registers (the one identified by the tag).

  • The second μop is tagged so that when executed, rather than the EAX register, it will use the contents of the alias register in which the first μop stored the value 17.

  • Like the first μop, the third μop is tagged so that when executed, it will store its result (i.e., the value 3) in one of the alias registers (rather than in the EAX register).

  • The fourth μop is tagged in the following manner:

    - When executed, it obtains one of its source operands from the alias register that the third μop loaded with the value 3.

    - It obtains its other source operand from the alias register that contains the results of the most recent write that was performed to the EBX register.

    - It adds the two source operands together and stores the result in an alias register.

The processor can execute the μops representing instructions 1 and 3 simultaneously, placing the values 17 and 3 into two of the integer register file alias registers.

Likewise, the μops representing instructions 2 and 4 can then be executed in parallel, obtaining the values 17 and 3 from the integer register file alias registers that contain the results of the execution of μops 1 and 3.
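The effect of the tagging on the four-instruction example can be checked with a small Python sketch. It is a simplified model under stated assumptions: every instruction is treated as a single μop, the memory operand of the second instruction is ignored, every μop takes one cycle, and there is no limit on the number of execution units. The names Uop, rename, schedule and the alias names p0, p1, p2 are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Uop:
    name: str
    dst: Optional[str]     # register written, or None
    srcs: Tuple[str, ...]  # registers read


# The four instructions from the example, one uop each (a simplification).
PROGRAM = [
    Uop("mov eax,17", "eax", ()),
    Uop("add [140],eax", None, ("eax",)),
    Uop("mov eax,3", "eax", ()),
    Uop("add eax,ebx", "eax", ("eax", "ebx")),
]


def rename(program):
    """Give every register write a fresh alias; point reads at the newest alias."""
    latest = {}  # IA register -> alias register holding its most recent value
    renamed = []
    for uop in program:
        # A register with no in-flight writer (ebx here) keeps its own name.
        srcs = tuple(latest.get(reg, reg) for reg in uop.srcs)
        dst = None
        if uop.dst is not None:
            dst = f"p{len(latest) and sum(1 for u in renamed if u.dst)}"
            latest[uop.dst] = dst
        renamed.append(Uop(uop.name, dst, srcs))
    return renamed


def schedule(program):
    """Earliest cycle per uop when any register-name conflict forces ordering."""
    cycles = []
    for i, uop in enumerate(program):
        earliest = 0
        for j, prev in enumerate(program[:i]):
            raw = prev.dst is not None and prev.dst in uop.srcs  # true dependency
            waw = prev.dst is not None and prev.dst == uop.dst   # false dependency
            war = uop.dst is not None and uop.dst in prev.srcs   # false dependency
            if raw or waw or war:
                earliest = max(earliest, cycles[j] + 1)
        cycles.append(earliest)
    return cycles


if __name__ == "__main__":
    print("without renaming:", schedule(PROGRAM))
    print("with renaming:   ", schedule(rename(PROGRAM)))
```

Without renaming, every instruction conflicts with its predecessors over the name EAX, so the model assigns cycles 0, 1, 2 and 3. After renaming, only the two true dependencies remain (μop 2 on μop 1, and μop 4 on μop 3), so μops 1 and 3 share cycle 0 and μops 2 and 4 share cycle 1.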
