The HT Approach

Instruction Level Parallelism (ILP)

Refer to Figure 39-2 on page 969. Instruction Level Parallelism (ILP) refers to a superscalar processor's ability to dispatch and execute multiple instructions simultaneously (using an array of execution units). Optimizing compilers attempt to keep as many of the execution units as possible busy in each clock cycle, but in almost every clock cycle one or more execution units are idle.

Figure 39-2. It's Difficult Keeping All of the Execution Units Busy


The number of execution units that are actually productive in each clock cycle is a function of the instruction mix of the currently running program, and even the finest program will have difficulty keeping every unit productive all of the time.

Such a waste!

But What If...

But what if the execution units in the processor core were fed two separate instruction streams associated with two separate programs? Odds are, this would result in a greater variety of instructions to choose from and therefore probably keep more of the execution units busy during each clock cycle.

That's what HT is all about! In each clock cycle, the processor alternates between fetching instructions that are associated with two separate programs (i.e., threads).
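As a rough illustration of the argument above, the following toy model counts how many execution units are busy per clock when one or two instruction streams offer work. The unit mix (UNITS), the per-cycle instruction lists, and the function name busy_units are all invented for this sketch; real dispatch logic is far more involved.

```python
from collections import Counter

# Hypothetical execution-unit mix: two integer ALUs, one FPU, one load and one store unit.
UNITS = {"alu": 2, "fpu": 1, "load": 1, "store": 1}

def busy_units(*streams):
    """Return the number of busy execution units in each clock cycle.

    Each stream is a list of cycles; each cycle is a list of the unit types
    that the stream's ready instructions need in that cycle.
    """
    busy = []
    for cycle in zip(*streams):
        demand = Counter()
        for ready in cycle:
            demand.update(ready)
        # A unit type can absorb at most as many instructions as it has units.
        busy.append(sum(min(UNITS[unit], n) for unit, n in demand.items()))
    return busy

thread_a = [["alu", "alu", "alu"], ["load"], ["fpu", "fpu"]]
thread_b = [["load", "store"], ["alu", "fpu"], ["alu", "store"]]

one_stream = busy_units(thread_a)             # thread A alone
two_streams = busy_units(thread_a, thread_b)  # both threads feeding the core
```

In this made-up mix, thread A alone keeps 4 of the 15 available unit slots busy over three clocks, while the two threads together keep 10 busy.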

This Requires Two, Almost Complete Register Sets

Fetching the instructions associated with two separate threads would require (at a minimum) two CS:EIP register pairs to point to the instructions comprising the two threads.

In addition, it wouldn't do to have two completely separate programs interleaving their accesses to exactly the same processor register set. Being completely unaware of each other, each of the two programs would assume that whatever it places in a processor register (e.g., EAX, XMM3, MM3, etc.) would still be there when it goes back to use that value again.

This means the HT processor must implement two, almost complete register sets.

HT = Simultaneous Multithreading

HT permits the processor core to fetch and execute two, separate threads (code streams) simultaneously (so to speak). Two CS:EIP register pairs are implemented and the instruction prefetcher alternates (clock cycle-by-clock cycle) fetching code for one program and then the other.

The physical processor implements two almost complete register sets so it can track the execution of two, separate programs. There is only one set of execution units however.
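The alternating fetch and the separate register sets can be pictured with a minimal sketch. The class and function names are hypothetical, the instruction pointer is modeled as a simple list index rather than a CS:EIP address, and leaving a slot idle once a thread runs out of code is an assumption of this sketch, not a statement about the hardware.

```python
from dataclasses import dataclass, field

@dataclass
class LogicalProcessor:
    """Per-thread state: its own instruction pointer and register set."""
    eip: int = 0
    regs: dict = field(default_factory=lambda: {"EAX": 0, "EBX": 0})

def alternate_fetch(programs, cycles):
    """Fetch one instruction per clock, alternating between two programs.

    Returns the fetch order as (logical_processor_id, instruction) pairs.
    """
    lps = [LogicalProcessor(), LogicalProcessor()]
    trace = []
    for cycle in range(cycles):
        lp_id = cycle % 2            # even clocks: thread 0, odd clocks: thread 1
        lp, program = lps[lp_id], programs[lp_id]
        if lp.eip < len(program):    # a finished thread leaves its slot idle here
            trace.append((lp_id, program[lp.eip]))
            lp.eip += 1              # only this thread's instruction pointer advances
    return trace

trace = alternate_fetch([["a0", "a1"], ["b0", "b1"]], cycles=4)
```

Each LogicalProcessor object carries its own eip and regs, so neither thread can disturb the other's state, while the single fetch loop stands in for the one set of shared hardware.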

Terms: Cluster, Physical CPU, Logical CPU

An HT processor package (referred to as a physical processor) effectively contains two or more engines, each of which is capable of fetching and preparing a separate thread to be fed to the core's single set of execution units. Each of these engines is referred to as a logical processor. Current HT implementations (as of April 2004) implement two logical processors (but the architecture will support more).

Each physical processor in a multiprocessor system belongs to the cluster of processors that reside on the same FSB. A specific system implementation may contain one or more processor clusters.

Detecting HT Capability

To determine whether a processor supports HT, the programmer must ensure that both of the following conditions are true:

  • Execute a CPUID request type 1 and ensure that:

    - EDX[HTT] = 1 (see Figure 39-3 on page 972).

    Figure 39-3. EDX Contents After a CPUID Request Type 1

    - EBX[23:16] = the number of logical processors supported (see Figure 39-5 on page 977).

    Figure 39-5. EBX Contents Returned by a CPUID Request Type 1

Intel® documentation states that if EDX[HTT] = 1 and the number of logical processors = 1, then the processor does not have HT capability.
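The two checks can be sketched as a pair of small decoding functions. This sketch assumes the EDX and EBX values have already been obtained by executing a CPUID request type 1 (for example through a compiler intrinsic, which is not shown); the function names are hypothetical. The HTT flag is taken to be bit 28 of EDX, as given in Intel's CPUID documentation.

```python
HTT_BIT = 28  # position of the HTT flag in EDX after a CPUID request type 1

def logical_processor_count(ebx):
    """Extract EBX[23:16], the number of logical processors supported."""
    return (ebx >> 16) & 0xFF

def supports_ht(edx, ebx):
    """Apply both checks: EDX[HTT] = 1 and more than one logical processor."""
    htt = (edx >> HTT_BIT) & 1
    return htt == 1 and logical_processor_count(ebx) > 1
```

For example, supports_ht(1 << 28, 2 << 16) evaluates to True, while supports_ht(1 << 28, 1 << 16) evaluates to False, matching the rule that HTT = 1 with a single logical processor means no HT capability.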

Enabling/Disabling HT

The processor's HT capability can be completely disabled for the entire power-up session by ensuring that A[31]# is sampled asserted (an electrical low) on the trailing-edge of Reset assertion. See “Hyper-Threading Option” on page 869.

Each Logical Processor Has Its Own Local APIC

The OS scheduler must have the ability to deliver interrupt messages to a specific logical processor within a physical processor. In addition, the IO APIC may be programmed to deliver certain interrupt messages to a specific logical processor within a physical processor.

For these reasons, a Local APIC is implemented for each of the logical processors within a physical processor.

HT Processor Resource Types

General

The HT processor does not implement two complete processor cores (which would require many more transistors). Rather, each of the processor core's resources falls into one of the following categories:

  • Replicated resources. Each logical processor has its own, dedicated instance of the resource (e.g., the ITLBs are replicated).

  • Partitioned resources. These are the resources that are partitioned equally for each logical processor when both logical processors are executing code. If either of the logical processors should execute a HLT instruction, the partitioned resource is recombined and the full resource is dedicated to the logical processor that is still executing code.

  • Shared resources. These are the resources that are always shared by all of the logical processors within a physical processor.

  • Resources that may be shared or partitioned (depending on the processor implementation).

Some resources (e.g., the Trace Cache) are referred to as Entry Tagged. The resource is shared by the logical processors, but it is not partitioned. Rather, each resource entry contains the ID of the logical processor with whom that entry is associated.

The following sections define the resources that fall into each of these categories.
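To make the Entry Tagged idea concrete, here is a minimal sketch of a shared structure whose entries each carry the ID of the owning logical processor. The class name, the capacity handling, and the oldest-first eviction policy are assumptions chosen for illustration; they do not describe the actual Trace Cache.

```python
class EntryTaggedCache:
    """A shared, unpartitioned store whose entries are tagged with an owner ID."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # (logical_processor_id, address) -> cached data

    def insert(self, lp_id, address, data):
        key = (lp_id, address)
        if key not in self.entries and len(self.entries) >= self.capacity:
            # Assumed policy: evict the oldest entry, whoever owns it.
            oldest = next(iter(self.entries))
            del self.entries[oldest]
        self.entries[key] = data

    def lookup(self, lp_id, address):
        """A hit requires both the address and the owner tag to match."""
        return self.entries.get((lp_id, address))

cache = EntryTaggedCache(capacity=2)
cache.insert(0, 0x1000, "uops for thread 0")
hit = cache.lookup(0, 0x1000)    # same owner, same address: found
miss = cache.lookup(1, 0x1000)   # same address, other owner: not found
```

Because the tag is part of the lookup key, either logical processor may occupy any entry, yet neither can hit on an entry that belongs to the other.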

Resources that Are Always Replicated

The following resources are always replicated for each of the logical processors:

  • The GPRs: EAX, EBX, ECX, EDX, ESI, EDI, ESP, and EBP.

  • The Segment registers: CS, DS, SS, ES, FS, and GS.

  • The EFlags and EIP registers.

  • The x87 FPU registers: ST[7:0], FSW, FCW, FTW, the data operand pointer, and instruction pointer registers.

  • The MMX registers: MM[7:0].

  • The SSE registers: XMM[7:0] and MXCSR.

  • The following, privileged registers: CR0, CR2, CR3, CR4, GDTR, LDTR, IDTR, and TR.

  • The debug-related registers: DR[3:0], DR[7:6], and the IA32_DEBUGCTL MSR.

  • The following Machine Check registers: IA32_MCG_STATUS and IA32_MCG_CAP MSRs.

  • The Thermal clock modulation and ACPI Power management control MSRs.

  • The Time Stamp Counter.

  • Most of the other MSR registers, including the Page Attribute Table (PAT). The exceptions are defined in the next section.

  • The Local APIC registers.

Resources that Are Always Shared

The following resources are always shared by the logical processors:

  • The IA32_MISC_ENABLE MSR.

  • The Memory Type and Range Registers (MTRRs).

Each of the logical processors within a physical processor can independently read and/or write these resources.

Resources Wherein Sharing or Replication Is Design-Specific

Whether the following resources are replicated or shared is implementation-specific:

  • All of the Machine Check Architecture MSRs (except for the IA32_MCG_STATUS and IA32_MCG_CAP MSRs).

  • The Performance Monitoring control and counter MSRs.

  • The BTBs. One Intel® document states that they may be either shared or partitioned.

  • The processor's caches (either shared by the logical processors in a physical processor or replicated for each logical processor).

  • The processor's TLBs (either shared by the logical processors in a physical processor or replicated for each logical processor).

The HT States

An HT-enabled processor can be in any of the following HT states:

  • The ST0 state. There is a single program thread executing on logical processor 0. Logical processor 1 has executed a HLT instruction and is therefore not executing code.

    - Assuming that the halted logical processor's Local APIC is enabled to recognize interrupts, an interrupt (including receipt of an IPI from another logical processor's Local APIC) will cause it to resume code execution.

  • The ST1 state. There is a single program thread executing on logical processor 1. Logical processor 0 has executed a HLT instruction and is therefore not executing code.

  • The MT state. Multiple Thread mode. Both logical processors are executing threads.

  • The Auto-Halt Power-Down power conservation state (see “The AutoHalt Power Down State” on page 686). All logical processors in a physical processor have executed the HLT instruction.
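The four states follow directly from which logical processors have executed HLT, which the following sketch expresses as a lookup. The function name and the string labels are chosen for illustration, and two logical processors are assumed.

```python
def ht_state(lp0_halted, lp1_halted):
    """Map the halted/running status of two logical processors to an HT state."""
    if lp0_halted and lp1_halted:
        return "Auto-Halt Power-Down"
    if lp1_halted:
        return "ST0"   # only logical processor 0 is executing code
    if lp0_halted:
        return "ST1"   # only logical processor 1 is executing code
    return "MT"        # both logical processors are executing threads
```

For example, ht_state(False, True) returns "ST0" because logical processor 1 is the halted one.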

Switching HT States

During a period of time when both logical processors have been executing code, the partitioned resources were divided. When one of the logical processors executes the HLT instruction, the ST0 or ST1 state is entered. The partitioned buffers are drained of all instructions associated with the halted logical processor and are then recombined into a single buffer dedicated to the logical processor that is still executing code.

Conversely, when the halted logical processor resumes program execution (due to the receipt of an interrupt from its Local APIC), the recombined buffers must be drained and partitioned again.
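The effect of a state switch on a partitioned resource can be sketched as a function that returns each logical processor's share of a buffer. The function name and the 64-entry buffer in the example are hypothetical, the buffer size is assumed to be even, and the draining step itself is not modeled.

```python
def partition_sizes(total_entries, lp0_running, lp1_running):
    """Return (entries for logical processor 0, entries for logical processor 1)."""
    if lp0_running and lp1_running:
        half = total_entries // 2      # MT state: split equally
        return (half, half)
    if lp0_running:
        return (total_entries, 0)      # ST0 state: recombined for processor 0
    if lp1_running:
        return (0, total_entries)      # ST1 state: recombined for processor 1
    return (0, 0)                      # both halted: nobody is using the buffer

before_hlt = partition_sizes(64, True, True)    # both threads running
after_hlt = partition_sizes(64, True, False)    # processor 1 executed HLT
```

Here before_hlt is (32, 32) and after_hlt is (64, 0); a later interrupt that wakes logical processor 1 would bring the split back to (32, 32).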

Processor Enumeration

“Assignment of IDs to the Processor” on page 860 described how the processor is automatically assigned a cluster ID, physical processor ID, and Local APIC ID (which is also the logical processor's ID). Figure 39-4 on page 976 illustrates the Local APIC's APIC ID register. Please note the following:

  • The Logical Processor ID field may be extended to 2 or 3 bits in future processors (allowing up to 4 or 8 logical processors per package).

  • The Physical Package ID field may be extended to more than 2 bits in future processors (allowing more than four physical processors per cluster).

Figure 39-4. The Local APIC's APIC ID Register


Executing a CPUID request type 1 returns the information shown in Figure 39-5 on page 977 (in the EBX register). It's interesting to note that although an HT-enabled processor contains one Local APIC per logical processor, a CPUID request type 1 only returns the ID of one of them. That raises the question of which logical processor's ID is returned. The Intel® documentation doesn't say so, but it is the ID of logical processor 0.

As with IA32 processors that are not HT-capable, software can assign a different Local APIC ID to a logical processor's Local APIC by writing the value into the Local APIC's APIC ID register; however, the CPUID instruction will still report the processor's initial APIC ID (the value assigned during power-up or RESET).
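The ID fields can be pulled apart with a few shifts and masks. This sketch assumes the initial APIC ID is reported in EBX[31:24] after a CPUID request type 1, and that the ID is laid out with the logical processor ID in the lowest bit(s), the physical package ID above it, and the cluster ID in the remaining upper bits. The default widths of 1 and 2 bits are inferred from the notes on Figure 39-4; the function names are hypothetical.

```python
def initial_apic_id(ebx):
    """Extract the initial Local APIC ID from EBX[31:24]."""
    return (ebx >> 24) & 0xFF

def split_apic_id(apic_id, logical_bits=1, package_bits=2):
    """Split an APIC ID into (cluster ID, package ID, logical processor ID).

    The field order and default widths are assumptions of this sketch.
    """
    logical = apic_id & ((1 << logical_bits) - 1)
    package = (apic_id >> logical_bits) & ((1 << package_bits) - 1)
    cluster = apic_id >> (logical_bits + package_bits)
    return cluster, package, logical

apic_id = initial_apic_id(0x0D020000)   # example EBX value, not from real hardware
fields = split_apic_id(apic_id)
```

With this example value, apic_id is 13 (binary 00001101) and fields is (1, 2, 1): cluster 1, physical package 2, logical processor 1. Passing logical_bits=2 or package_bits=3 models the wider fields that future processors may use.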

It is a rule that all processors present in the system must support the same number of logical processors per physical processor.

The MP table built by the BIOS only contains the Local APIC IDs of the primary logical processor (logical processor 0) in each physical package. However, all of the logical processors in the system are included in the ACPI (Advanced Configuration and Power Interface) table, with the primary logical processors at the top of the table followed by the secondary logical processors. The MPS predates the advent of HT technology, so there is no concept of multiple Local APICs within a processor package.
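The difference between the two tables can be sketched as two views of the same list of Local APIC IDs. The function names are hypothetical, and the sketch assumes that the logical processor ID occupies the lowest bit(s) of the APIC ID, so that a value of 0 in that field marks a primary logical processor.

```python
def mp_table_ids(apic_ids, logical_bits=1):
    """Local APIC IDs listed in the MP table: primary logical processors only."""
    mask = (1 << logical_bits) - 1
    return [apic_id for apic_id in apic_ids if apic_id & mask == 0]

def acpi_table_ids(apic_ids, logical_bits=1):
    """Local APIC IDs in ACPI table order: primaries first, then secondaries."""
    mask = (1 << logical_bits) - 1
    primaries = [apic_id for apic_id in apic_ids if apic_id & mask == 0]
    secondaries = [apic_id for apic_id in apic_ids if apic_id & mask != 0]
    return primaries + secondaries

# Two physical packages with two logical processors each (example IDs).
ids = [0, 1, 2, 3]
mp_view = mp_table_ids(ids)
acpi_view = acpi_table_ids(ids)
```

For these example IDs, mp_view is [0, 2] and acpi_view is [0, 2, 1, 3].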

A detailed description of the enumeration process can be found in “Boot Strap Processor (BSP) Selection” on page 885 and “How the APs are Discovered and Configured” on page 888.

The Primary and Secondary Logical Processor

Within a physical processor, logical processor 0 is referred to as the primary logical processor, while logical processor 1 is the secondary logical processor.

OS Support for HT

General

To use HT, the OS must support multi-threaded operation on a single physical processor that contains multiple logical processors.

OSs that Include Native HT Support

The following OSs support HT:

  • Windows XP Professional Edition.

  • Windows XP Home Edition.

  • Linux version 2.4.x (and higher).

OSs that Are Compatible with HT

The following OSs must be re-installed in order to recognize HT and enable HT support:

  • Windows 2000 (all versions).

  • Windows NT 4.0 (limited driver support).

The additional steps required are manual: the end-user must select the additional processors during the OS re-install.

OSs with No HT Support

The following OSs do not have HT support (and HT should be disabled via the BIOS; see “Hyper-Threading Option” on page 869):

  • Windows ME.

  • Windows 98 (has limited driver support). This statement is on the Microsoft web site and the author is not sure how to interpret it.

  • Windows.
