As noted earlier in this chapter (see “MMX Not Implemented” on page 572), the Pentium® Pro did not implement the MMX instruction set or register set.
The new instructions added to the instruction set are shown below and are described in the sections that follow:
CMOV (Conditional Move).
FCMOV (FP Conditional Move).
FCOMI (FP Compare and Set EFlags).
RDPMC (Read Performance Monitoring Counters).
UD2 (UnDefined).
Starting with the P6 processor family, the IA32 processors have a deep instruction pipeline and execute instructions out-of-order. For these reasons, mispredicted branch instructions can cause a fairly substantial decrease in performance. When a branch is executed and it is determined that its branch path was predicted incorrectly, all of the instructions currently in the prefetch streaming buffer and the earlier instruction pipeline stages must be flushed. In addition, any instructions in the ROB that fall after the mispredicted branch in the program must be deleted from the ROB.
The ideal program would have no branches, or only unconditional branches. Since this isn't realistic, however, a better plan would be to limit, as much as possible, the number of conditional branches found in a program (because any conditional branch is an opportunity for a misprediction and the attendant performance dip). The Conditional Move instruction (CMOV) permits the elimination of a conditional branch by testing a condition and only performing the indicated move if the condition tests true.
The Conditional Move instruction checks the state of one or more EFlags status bits (CF, OF, PF, SF, and ZF) and only executes the move if the condition is met. If the condition is not satisfied, the move is not performed and execution continues with the instruction following the CMOV. This instruction can move a 16- or 32-bit value from memory to a GPR or from one GPR to another. Moves of 8-bit register operands are not supported. A processor's support for this feature may be determined by executing a CPUID request type 1 and checking EDX[CMOV] (1 indicates it is supported; see Figure 24-23 on page 591).
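As a rough illustration of the kind of branch CMOV removes, the C sketch below writes a maximum-of-two-values function as a simple select. When compiling for a P6-class or later target, a compiler may translate the select into a CMP followed by a CMOVcc rather than a conditional jump, but that choice belongs to the compiler and is not guaranteed. The second function decodes the CMOV feature flag from an EDX value returned by CPUID request type 1. Both function names are hypothetical, and the bit position (15) is an assumption taken from Intel's CPUID documentation rather than from this text.

```c
#include <stdint.h>

/* Hypothetical helper: returns the larger of a and b.
 * The comments show one instruction sequence a compiler MAY choose;
 * it is free to emit a conditional jump instead. */
int32_t max_select(int32_t a, int32_t b)
{
    int32_t result = a;      /* MOV   result, a                    */
    if (b > a) {             /* CMP   b, a                         */
        result = b;          /* CMOVG result, b (moves if b > a)   */
    }
    return result;
}

/* Hypothetical helper: edx is the EDX value returned by CPUID
 * request type 1. Bit 15 is assumed to be the CMOV feature flag. */
int cpu_has_cmov(uint32_t edx)
{
    return (int)((edx >> 15) & 1u);
}
```

Software that selects between a CMOV-based routine and a branch-based fallback would call cpu_has_cmov once at startup and remember the answer.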
Because a mispredicted branch causes a deep dip in performance, conditional branches should be eliminated wherever possible: a branch that is not in the program cannot be mispredicted.
The FP Conditional Move (FCMOV) is the FP equivalent of the CMOV instruction. It conditionally moves values between FP stack registers and permits the elimination of a conditional branch. Prior to executing the FCMOV instruction, the programmer would execute an FCOMI, FCOMIP, FUCOMI, or FUCOMIP instruction (see the next section) to set the appropriate condition bits in the integer EFlags register to be tested by the FCMOV instruction. A processor's support for this feature may be determined by executing a CPUID request type 1 and checking EDX[CMOV] (1 indicates it is supported; see Figure 24-23 on page 591).
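Since FCMOV operates on the FP register stack, it also requires an x87 FPU. A conservative detection routine therefore tests EDX[FPU] together with EDX[CMOV]. The following is a minimal sketch, assuming the bit positions given in Intel's CPUID documentation (FPU = bit 0, CMOV = bit 15); the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed CPUID request type 1 feature bits in EDX. */
#define CPUID1_EDX_FPU  (1u << 0)    /* x87 FPU present        */
#define CPUID1_EDX_CMOV (1u << 15)   /* CMOVcc supported       */

/* Hypothetical helper: returns 1 only when both flags are set
 * in the EDX value returned by CPUID request type 1. */
int cpu_has_fcmov(uint32_t edx)
{
    uint32_t needed = CPUID1_EDX_FPU | CPUID1_EDX_CMOV;
    return (edx & needed) == needed;
}
```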
The following instructions have been added to the FP instruction set and are intended for use with the FCMOV instruction:
FCOMI. FP Compare real and set Integer flags (in the EFlags register, rather than the status bits in the FP SWR) instruction.
FCOMIP. FP Compare real and set Integer flags instruction. This instruction also pops the FP register stack.
FUCOMI. FP Unordered Compare real and set Integer flags instruction.
FUCOMIP. FP Unordered Compare real and set Integer flags instruction. This instruction also pops the FP register stack.
A processor's support for this feature may be determined by executing a CPUID request type 1 and checking EDX[CMOV] (1 indicates it is supported; see Figure 24-23 on page 591).
Although the Performance Monitoring Counters were implemented in the pre-MMX Pentium®, they could only be accessed by privilege level 0 code using the RDMSR and WRMSR instructions. This meant that code running at a less privileged level (1, 2, or 3) could not utilize the Performance Monitoring facility.
The RDPMC instruction loads the contents of the 40-bit counter specified in ECX into the EDX:EAX register pair (EDX receives the upper 8 bits and EAX the lower 32 bits). Counter 0 or counter 1 is selected by placing 00000000h or 00000001h, respectively, in ECX.
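Reassembling the 40-bit count from the two registers is a shift and an OR. The sketch below assumes the caller has already obtained the EDX and EAX values (for example, from an inline-assembly wrapper that is not shown here); the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: combines the EDX:EAX pair returned by RDPMC
 * into a single 40-bit count held in a 64-bit integer. */
uint64_t pmc_value(uint32_t edx, uint32_t eax)
{
    /* Only the low 8 bits of EDX carry counter bits 39:32;
     * masking discards anything above them. */
    uint64_t high = (uint64_t)(edx & 0xFFu);
    return (high << 32) | eax;
}
```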
This instruction allows code running at a less privileged level to read the counters (if CR4[PCE] = 1), thereby permitting performance monitoring by less privileged code without incurring the overhead of an OS call.
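The access rule can be written as a single predicate. This is a minimal sketch of the protected-mode check only: it assumes that privilege level 0 code may always execute RDPMC and that PCE is bit 8 of CR4 (the bit position is taken from Intel's documentation, not from this text). The macro and function names are hypothetical.

```c
#include <stdint.h>

#define CR4_PCE (1u << 8)   /* assumed position of CR4[PCE] */

/* Hypothetical helper: returns 1 if code running at privilege
 * level cpl may execute RDPMC given the current CR4 value. */
int rdpmc_permitted(unsigned cpl, uint32_t cr4)
{
    return cpl == 0 || (cr4 & CR4_PCE) != 0;
}
```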
The RDPMC instruction isn't a serializing instruction (for more information, refer to “Serializing Instructions” on page 1079). If an exact event count is required, use a serializing instruction (such as CPUID) before and/or after the RDPMC instruction.
The RDPMC instruction can be executed in 16-bit mode or in VM86 mode. Even in those modes, the full 32-bit contents of ECX are used to identify the counter to access, and the 40-bit result is returned in the EDX:EAX register pair.
The UD2 instruction is provided for software testing to explicitly generate an Invalid Opcode exception. Its opcode is reserved for this purpose. Other than raising the Invalid Opcode exception, it behaves the same as the NOP instruction.
The CPUID instruction was enhanced to support request type 2 as well as types 0 and 1. The type 2 request returns cache and TLB topology information. In addition, the CPUID instruction was enhanced to support the BIOS Update feature (see “MicroCode Update Feature” on page 631 for a detailed description). For a detailed discussion of CPU identification and determination of a processor's capabilities, refer to “CPU Identification” on page 1443.
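The type 2 result is returned as one-byte descriptors packed into EAX, EBX, ECX, and EDX. The sketch below shows one way to unpack them, assuming the layout given in Intel's CPUID documentation rather than anything stated in this text: the low byte of EAX is an iteration count instead of a descriptor, a register with bit 31 set holds no valid descriptors, and a descriptor value of 00h is a null entry. The function name is hypothetical, and translating each descriptor into a cache or TLB description requires Intel's descriptor table, which is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: regs[0..3] hold EAX, EBX, ECX, EDX from one
 * CPUID request type 2. Non-null descriptor bytes are copied to out
 * (room for 15 bytes needed); the number copied is returned. */
size_t cpuid2_descriptors(const uint32_t regs[4], uint8_t out[15])
{
    size_t count = 0;
    for (int r = 0; r < 4; r++) {
        if (regs[r] & 0x80000000u) {
            continue;                 /* bit 31 set: nothing valid here */
        }
        for (int byte = 0; byte < 4; byte++) {
            if (r == 0 && byte == 0) {
                continue;             /* AL is the iteration count */
            }
            uint8_t d = (uint8_t)(regs[r] >> (8 * byte));
            if (d != 0) {
                out[count++] = d;     /* keep non-null descriptors */
            }
        }
    }
    return count;
}
```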