Instruction Set Changes

MMX Not Implemented

As noted earlier in this chapter (see “MMX Not Implemented” on page 572), the Pentium® Pro did not implement the MMX instruction set or register set.

New Instructions

The new instructions added to the instruction set are shown below and are described in the sections that follow:

  • CMOV (Conditional Move).

  • FCMOV (FP Conditional Move).

  • FCOMI (FP Compare and Set EFlags).

  • RDPMC (Read Performance Monitoring Counters).

  • UD2 (UnDefined).

Conditional Move (CMOV) Eliminates Branches
Problem It Addresses

Starting with the P6 processor family, the IA32 processors have a deep instruction pipeline and execute instructions out-of-order. For these reasons, mispredicted branch instructions can cause a fairly substantial decrease in performance. When a branch is executed and it is determined that its branch path was predicted incorrectly, all of the instructions currently in the prefetch streaming buffer and the earlier instruction pipeline stages must be flushed. In addition, any instructions in the ROB that fall after the mispredicted branch in the program must be deleted from the ROB.

The ideal program would have no branches, or only unconditional branches. Since this isn't realistic, the practical goal is to limit, as much as possible, the number of conditional branches in a program (every conditional branch is an opportunity for a misprediction and the attendant performance dip). The Conditional Move instruction (CMOV) permits the elimination of a conditional branch by testing a condition and performing the indicated move only if the condition tests true.

Description

The Conditional Move instruction checks the state of one or more EFlags status bits (CF, OF, PF, SF, and ZF) and only executes the move if the condition is met. If the condition is not satisfied, the move is not performed and execution continues with the instruction following the CMOV. This instruction can move a 16- or 32-bit value from memory to a GPR or from one GPR to another. Moves of 8-bit register operands are not supported. A processor's support for this feature may be determined by executing a CPUID request type 1 and checking EDX[CMOV] (1 indicates it is supported; see Figure 24-23 on page 591).
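The behavior just described can be modeled in C. The following is a minimal sketch, not the processor's implementation: the helper name cmov32 and both maximum functions are hypothetical, introduced only for illustration. Whether a compiler actually emits a CMOV instruction for the select form depends on the compiler, its options, and the target processor.

```c
#include <stdint.h>

/* Hypothetical helper: models what a conditional move does to its
   destination. If the tested condition is true, the source value is
   copied; otherwise the destination keeps its old value. */
static inline int32_t cmov32(int cond, int32_t dest, int32_t src)
{
    return cond ? src : dest;
}

/* Branch form: a straightforward translation contains a conditional
   jump around the assignment, which the processor must predict. */
int32_t max_branch(int32_t a, int32_t b)
{
    if (b > a)
        a = b;
    return a;
}

/* Select form: the compare sets the flags, then a single conditional
   move picks the result. No conditional jump is needed. */
int32_t max_select(int32_t a, int32_t b)
{
    return cmov32(b > a, a, b);
}
```

Both functions return the same value for every input; only the select form gives the compiler a direct opportunity to avoid the conditional branch.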

Conditional FP Move (FCMOV) Eliminates Branches
Problem Addressed

Mispredicted branches cause a deep dip in performance, so, wherever possible, the elimination of conditional branches eliminates the possibility of a branch misprediction.

Description

The FP Conditional Move (FCMOV) is the FP equivalent of the CMOV instruction. It conditionally moves values between FP stack registers and permits the elimination of a conditional branch. Prior to executing the FCMOV instruction, the programmer would execute an FCOMI, FCOMIP, FUCOMI, or FUCOMIP instruction (see the next section) to set the appropriate condition bits in the integer EFlags register to be tested by the FCMOV instruction. A processor's support for this feature may be determined by executing a CPUID request type 1 and checking EDX[CMOV] (1 indicates it is supported; see Figure 24-23 on page 591).

FCOMI, FCOMIP, FUCOMI, and FUCOMIP

The following instructions have been added to the FP instruction set and are intended for use with the FCMOV instruction:

  • FCOMI. FP Compare real and set Integer flags (in the EFlags register, rather than the status bits in the FP SWR) instruction.

  • FCOMIP. FP Compare real and set Integer flags instruction. This instruction also pops the FP register stack.

  • FUCOMI. FP Unordered Compare real and set Integer flags instruction.

  • FUCOMIP. FP Unordered Compare real and set Integer flags instruction. This instruction also pops the FP register stack.
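How a compare of this kind pairs with FCMOV can be sketched in C. This is a model under stated assumptions, not the hardware behavior in full: the flag settings (ZF, PF, and CF, with all three set for an unordered result) and the FCMOVNBE condition (CF = 0 and ZF = 0) are taken from Intel's instruction reference rather than from the text above, exception behavior and the stack pop are ignored, and all type and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of the three EFlags bits written by the compare. */
struct fp_flags {
    bool zf, pf, cf;
};

/* Compare ST(0) with ST(i) and report the resulting flag settings. */
struct fp_flags fcomi_model(double st0, double sti)
{
    struct fp_flags f = { false, false, false };  /* ST(0) > ST(i) */

    if (st0 != st0 || sti != sti) {               /* either operand is a NaN */
        f.zf = f.pf = f.cf = true;                /* unordered */
    } else if (st0 < sti) {
        f.cf = true;                              /* ST(0) < ST(i) */
    } else if (st0 == sti) {
        f.zf = true;                              /* ST(0) = ST(i) */
    }
    return f;
}

/* FCMOVNBE model: copy ST(i) into ST(0) only if CF = 0 and ZF = 0. */
double fcmovnbe_model(struct fp_flags f, double st0, double sti)
{
    return (!f.cf && !f.zf) ? sti : st0;
}

/* Branch-free minimum built from the compare followed by the move. */
double fp_min(double a, double b)
{
    struct fp_flags f = fcomi_model(a, b);        /* ST(0) = a, ST(i) = b */
    return fcmovnbe_model(f, a, b);               /* if a > b, take b */
}
```

In this model an unordered compare sets all three flags, so the move is not performed and fp_min returns its first argument.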

A processor's support for this feature may be determined by executing a CPUID request type 1 and checking EDX[CMOV] (1 indicates it is supported; see Figure 24-23 on page 591).
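The feature check can be sketched as a small C helper that inspects the EDX value returned by a CPUID request type 1. The bit positions used here (CMOV = bit 15, FPU = bit 0) come from Intel's CPUID documentation rather than from the text above, and the macro and function names are hypothetical. Obtaining the EDX value itself (by executing CPUID with EAX = 1) is left to the caller so that the sketch stays portable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bits in EDX after a CPUID request type 1 (assumed positions,
   per Intel's CPUID documentation). */
#define CPUID1_EDX_FPU   (UINT32_C(1) << 0)
#define CPUID1_EDX_CMOV  (UINT32_C(1) << 15)

/* True if the integer CMOV instruction is reported as supported. */
bool has_cmov(uint32_t edx)
{
    return (edx & CPUID1_EDX_CMOV) != 0;
}

/* FCMOV and the FCOMI family additionally require an on-chip FPU,
   so both bits must be set. */
bool has_fcmov_fcomi(uint32_t edx)
{
    uint32_t need = CPUID1_EDX_CMOV | CPUID1_EDX_FPU;
    return (edx & need) == need;
}
```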

RDPMC
Problem Addressed

Although the Performance Monitoring Counters were implemented in the pre-MMX Pentium®, they could only be accessed by privilege level 0 code using the RDMSR and WRMSR instructions. This meant that code running at a lesser privilege level (1, 2, or 3) could not utilize the Performance Monitoring facility.

Description

The RDPMC instruction loads the contents of the specified (in ECX) 40-bit counter into the EDX:EAX register pair (EDX is loaded with the upper 8 bits and EAX with the lower 32 bits). The counters (0 or 1) are specified by placing 0000h or 0001h, respectively, in ECX.
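Reassembling the 40-bit count from the register pair can be sketched as follows. The function name pmc_value is hypothetical, and the two arguments stand for the EDX and EAX values left behind by RDPMC. Masking EDX to its low 8 bits is a defensive choice made for this sketch.

```c
#include <stdint.h>

/* Combine the EDX:EAX pair returned by RDPMC into one 40-bit count.
   EDX supplies the upper 8 bits, EAX the lower 32 bits. */
uint64_t pmc_value(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)(edx & 0xFFu) << 32) | eax;
}
```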

This instruction allows code running at a less privileged level to read the counters (if CR4[PCE] = 1), thereby permitting performance monitoring by less privileged code without incurring the overhead of an OS call.
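The access rule can be modeled in a few lines of C. This is a sketch for protected mode only. The position of the PCE flag (CR4 bit 8) comes from Intel's documentation rather than from the text above, and the macro and function names are hypothetical. When the rule is not satisfied, the processor raises a General Protection exception instead of returning a count.

```c
#include <stdbool.h>
#include <stdint.h>

/* CR4[PCE], the Performance-monitoring Counter Enable flag
   (assumed to be bit 8, per Intel's documentation). */
#define CR4_PCE (UINT32_C(1) << 8)

/* RDPMC is permitted at privilege level 0 unconditionally, and at
   privilege levels 1 through 3 only when CR4[PCE] = 1. */
bool rdpmc_permitted(unsigned cpl, uint32_t cr4)
{
    return cpl == 0 || (cr4 & CR4_PCE) != 0;
}
```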

The RDPMC instruction isn't a serializing instruction (for more information, refer to “Serializing Instructions” on page 1079). If an exact event count is required, use a serializing instruction (such as CPUID) before and/or after the RDPMC instruction.

The RDPMC instruction can be executed in 16-bit mode or in VM86 mode, but the full contents of ECX are still used to identify the counter to access and the 40-bit result is still returned in the EDX:EAX register pair.

UD2

This instruction is provided for software testing to explicitly generate an Invalid Opcode exception. Its opcode is reserved for this purpose. Other than raising the Invalid Opcode exception, it is the same as the NOP instruction.

The CPUID Instruction Enhanced

The CPUID instruction was enhanced to support request type 2 as well as types 0 and 1. The type 2 request returns cache and TLB topology information. In addition, the CPUID instruction was enhanced to support the BIOS Update feature (see “MicroCode Update Feature” on page 631 for a detailed description). For a detailed discussion of CPU identification and determination of a processor's capabilities, refer to “CPU Identification” on page 1443.
