The Core

General

The Pentium® II processor core was a slightly modified Pentium® Pro processor core. It differed in the following ways:

  • The MMX register set and instruction set were added (see “MMX Capability” on page 519 for a description of MMX).

  • Four new instructions were added to the instruction set. See “Instruction Set Changes” on page 707.

  • The L1 Code and Data Caches were each increased from 8KB to 16KB.

  • While the Pentium® Pro included ECC protection on both the L1 and L2 Caches, the early versions of the Pentium® II did not have ECC protection on the L2 Cache. Later versions restored it.

  • The Pentium® Pro did not exhibit optimal performance when executing legacy 16-bit code. The Pentium® II corrected this problem (see “16-bit Code Optimization” on page 672).

L1 Caches

To compensate for the slower BSB (Backside Bus) speed, the L1 Data and Code Caches were each doubled in size (compared to the Pentium® Pro processor), from 8KB to 16KB.

L1 Code Cache Characteristics

The characteristics of the Pentium® II's L1 Code Cache are:

  • It's 16KB in size.

  • It is a 4-way set-associative cache.

  • The cache line size is 32 bytes.

  • It implements an SI subset of the MESI coherency protocol.

For a detailed description, refer to chapter 7 of the MindShare book entitled Pentium® Pro and Pentium® II System Architecture, Second Edition.
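The geometry listed above determines how an address is divided up for a cache lookup. The following is a minimal Python sketch of that arithmetic only, not a description of Intel's circuitry. The function name `split_address` and the sample address are illustrative, and the sketch assumes the conventional layout in which the byte offset occupies the lowest address bits, the set index the next bits, and the tag the remainder.

```python
# Geometry stated in the text: 16KB, 4-way set-associative, 32-byte lines.
CACHE_BYTES = 16 * 1024
WAYS = 4
LINE_BYTES = 32

# Each set holds one line per way, so 16KB / (4 ways * 32 bytes) = 128 sets.
NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)
OFFSET_BITS = LINE_BYTES.bit_length() - 1   # 5 bits select a byte in the line
INDEX_BITS = NUM_SETS.bit_length() - 1      # 7 bits select one of 128 sets


def split_address(addr):
    """Return (tag, set_index, byte_offset) for an address (assumed layout)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


# Hypothetical sample address, chosen only for illustration.
tag, index, offset = split_address(0x12345678)
print(NUM_SETS, hex(tag), index, offset)    # 128 0x12345 51 24
```

The same arithmetic applies to the L1 Data Cache, which has the same size, associativity, and line size.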

L1 Data Cache Characteristics

The characteristics of the Pentium® II's L1 Data Cache are:

  • It's 16KB in size.

  • It is a 4-way set-associative cache.

  • Each cache bank is subdivided into two subbanks.

  • The cache line size is 32 bytes.

  • It implements the full MESI coherency protocol.

  • It is a non-blocking cache.

For a detailed description, refer to chapter 7 of the MindShare book entitled Pentium® Pro and Pentium® II System Architecture, Second Edition.
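To make the difference between the full MESI protocol and the SI subset concrete, the following is a simplified Python sketch of the textbook MESI state transitions for a single cache line. It is an assumption-laden illustration, not the Pentium® II's actual implementation: bus transactions, writebacks, and processor-specific details are omitted, and the function names are hypothetical.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


def on_local_read(state, others_have_copy):
    """The local core reads the line (textbook MESI, simplified)."""
    if state is State.I:
        # A miss fills the line as Shared if another cache holds it,
        # otherwise as Exclusive.
        return State.S if others_have_copy else State.E
    return state  # M, E and S all satisfy a read without changing state


def on_local_write(state):
    """The local core writes the line; it always ends up Modified.
    (From I or S the cache must first gain ownership on the bus.)"""
    return State.M


def on_snoop_read(state):
    """Another agent reads the line; a Modified copy is written back first."""
    return State.S if state in (State.M, State.E) else state


def on_snoop_write(state):
    """Another agent writes the line; the local copy becomes Invalid."""
    return State.I


# A data cache can reach all four states.
line = on_local_read(State.I, others_have_copy=False)   # State.E
line = on_local_write(line)                             # State.M
line = on_snoop_read(line)                              # State.S
line = on_snoop_write(line)                             # State.I
```

A cache that the core never writes, such as a code cache, has no use for the M state, which is consistent with the Code Cache implementing only the S and I states.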

L1 and L2 Cache Error Protection

In the Pentium® Pro processor, the L1 and L2 Caches are ECC-protected. The versions of the Pentium® II processor designed for desktop applications had ECC-protected L1 Caches, but the L2 Cache was not ECC-protected. As of July '97, Intel® also offered versions of the Pentium® II processor that provided ECC protection on the L2 as well as the L1 Caches. This feature was important for servers, but not necessarily for the desktop. The addition of ECC to the L2 Cache added a cycle to each L2 Cache access, thereby degrading performance (in life, there's no such thing as a free ride). The L2 Cache was still 512KB in size.

16-bit Code Optimization

The Pentium® Pro processor design was optimized for 32-bit code execution. One of the characteristics of 32-bit code is that the data segment registers (DS, ES, FS and GS) are almost never loaded with new values by 32-bit applications. 16-bit code, however, writes new values into the data segment registers quite frequently.

The Pentium® Pro Was Not Optimized

Unlike the processor's GPRs, the Pentium® Pro processor's data segment registers were not aliased (for more information, see “The IA32 Data Register Set” on page 174 on the CD). Loading a new value into a data segment register (see Figure 26-8 on page 673) immediately changed the value in the target data segment register. μops that reside downstream from the μop that changed the segment register (see Figure 26-9 on page 674) might perform memory accesses (loads or stores) within the affected memory data segment. If the processor were permitted to speculatively execute instructions beyond the one that changed the data segment register before the segment register load μop had completed execution, those downstream loads and/or stores would be using the old, stale contents of the segment register and would address the wrong memory location.

Figure 26-8. A Data Segment Register Load in the ROB


Figure 26-9. Loads and Stores May Reside Beneath the Data Segment Register Load


For this reason, the Pentium® Pro processor was designed not to execute any μops beyond the segment register load until all μops up to and including the segment register load had completed execution (see Figure 26-10 on page 675). Only then was the processor core permitted to start executing μops downstream of the segment register load. The processor's performance degraded because it was restrained from out-of-order execution. Because 16-bit code typically changed the contents of the data segment registers frequently, the processor performed poorly when executing 16-bit code. Windows 95 contained a considerable amount of 16-bit code, so its performance suffered when it ran on a Pentium® Pro-based system.

Figure 26-10. The Data Segment Register Load Acts as a Fence in the ROB


Pentium® II Shadows the Data Segment Registers

Intel® fixed this problem in the Pentium® II processor by aliasing (or shadowing) the data segment registers to μop entries in the ROB (as was already done for the GPRs). This was accomplished in the RAT stage of the instruction pipeline. Intel® estimated that the Pentium® II processor's shadowing of the segment registers could increase performance when executing 16-bit code by as much as 8 to 10% when compared to the Pentium® Pro.
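The cost of treating a segment register load as a fence, and the benefit of shadowing, can be illustrated with a toy scheduling model. The Python sketch below is only an illustration under stated assumptions: the μop names, latencies, and dependencies are invented, execution resources are assumed to be unlimited, and the `Uop` class and `finish_times` function are hypothetical helpers rather than a model of the real ROB or RAT.

```python
from dataclasses import dataclass, field


@dataclass
class Uop:
    name: str
    latency: int                               # invented cycle counts
    deps: list = field(default_factory=list)   # indices of earlier uops needed
    is_seg_load: bool = False


def finish_times(uops, fence_on_seg_load):
    """Return the completion cycle of each uop in program order.

    With fence_on_seg_load=True, no uop after a segment register load may
    start until every uop up to and including that load has completed.
    With False, only true data dependencies delay a uop.
    """
    finish = []
    barrier = 0
    for uop in uops:
        start = max([barrier] + [finish[d] for d in uop.deps])
        finish.append(start + uop.latency)
        if fence_on_seg_load and uop.is_seg_load:
            barrier = max(finish)
    return finish


# Hypothetical uop stream: a slow load, a segment register load,
# independent arithmetic, and a store through the new segment.
program = [
    Uop("load eax (slow)", 10),
    Uop("load ES", 2, is_seg_load=True),
    Uop("add ecx, edx", 1),
    Uop("store via ES", 3, deps=[1, 2]),
]

print(max(finish_times(program, fence_on_seg_load=True)))    # 14
print(max(finish_times(program, fence_on_seg_load=False)))   # 10
```

In the fenced case the independent add and the store both wait for the slow load, even though neither depends on it. In the shadowed case only the store waits for the segment register load, so the remaining work overlaps with the slow load.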
