Starting with the Pentium®, all IA32 processors implement the 64-bit Time Stamp Counter (TSC) register (see Figure 21-8 on page 501). The TSC is cleared to zero when reset is asserted to the processor. Once reset is deasserted, the TSC increments once per processor clock cycle.
The TSC can be read in either of two ways:
Privilege level 0 code can use the RDMSR instruction to read the TSC. The TSC's MSR address is 10h. The programmer sets ECX = 10h and executes the RDMSR instruction. The current contents of the TSC are loaded into EDX and EAX: the upper 32 bits into EDX and the lower 32 bits into EAX.
The TSC can also be read using the RDTSC instruction. This instruction can always be executed when the processor is operating in Real Mode. When the processor is operating in Protected Mode, it can always be executed by privilege level 0 code. Whether or not a privilege level 3 program can successfully execute RDTSC is governed by the state of CR4[TSD] (see “Restricting Access to the TSC” on page 499).
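Both read methods return the count split across two 32-bit registers. As a minimal sketch of how software might reassemble the 64-bit value, the following C function combines the two halves; the function name tsc_from_edx_eax is hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Combine the two 32-bit halves of a TSC read into one 64-bit count.
 * edx = upper 32 bits, eax = lower 32 bits, as described in the text.
 * The function name is illustrative, not part of any real API. */
uint64_t tsc_from_edx_eax(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}
```

For example, EDX = 00000001h and EAX = 00000002h correspond to a count of 0000000100000002h.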
Privilege level 0 code can use the WRMSR instruction to write to the TSC. However, only the lower 32 bits of the TSC can be written to. Before executing WRMSR, the programmer sets up the registers as follows:
ECX = 10h (the MSR address of the TSC).
EDX = don't care.
EAX = the value to be written into the lower 32 bits of the TSC.
The WRMSR instruction is then executed. The value from EAX is written into the lower 32 bits of the TSC. The upper 32 bits cannot be written to and are automatically cleared to zero on any write to the TSC.
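The effect of the write rule can be illustrated with a small software model. This is only a sketch of the behavior described above, not real MSR access (WRMSR itself requires privilege level 0); the function name model_tsc_after_wrmsr is hypothetical.

```c
#include <stdint.h>

/* Hypothetical software model of a WRMSR to MSR 10h (the TSC), following
 * the rule in the text. It does not touch real hardware. */
uint64_t model_tsc_after_wrmsr(uint32_t edx, uint32_t eax)
{
    (void)edx;             /* EDX is "don't care": upper half is not writable */
    return (uint64_t)eax;  /* lower 32 bits = EAX, upper 32 bits cleared */
}
```

Under this model, writing with EAX = 12345678h leaves the TSC at 0000000012345678h regardless of the value in EDX.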
When the processor is in Protected Mode or VM86 Mode, CR4[TSD] (see Figure 21-6 on page 497) restricts use of the RDTSC instruction as follows:
When CR4[TSD] = 0, RDTSC can be executed at any privilege level.
When CR4[TSD] = 1, RDTSC can be executed only by code running at privilege level 0.
The RDTSC instruction can always be executed when the processor is in Real Mode.
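The access rules above reduce to a short decision procedure. The following sketch models them in C; the function and parameter names are hypothetical, and the model only restates the rules in the text rather than querying a real processor.

```c
#include <stdbool.h>

/* Hypothetical model of the RDTSC access rules described in the text.
 * real_mode: processor is in Real Mode
 * cpl:       current privilege level (0 to 3)
 * cr4_tsd:   state of the CR4[TSD] bit */
bool rdtsc_permitted(bool real_mode, unsigned cpl, bool cr4_tsd)
{
    if (real_mode)
        return true;   /* Real Mode: always permitted */
    if (!cr4_tsd)
        return true;   /* CR4[TSD] = 0: any privilege level */
    return cpl == 0;   /* CR4[TSD] = 1: privilege level 0 only */
}
```

For instance, a privilege level 3 program in Protected Mode is permitted to execute RDTSC when CR4[TSD] = 0 and is refused when CR4[TSD] = 1.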
When the TSC reaches a count of all ones, the next processor clock cycle causes it to wrap to zero. The architecture guarantees that the TSC frequency and configuration will be such that it will not wrap around within 10 years after being reset to 0. For the Pentium® 4, Pentium® 4 Xeon, Pentium® M, the P6 family, and the Pentium® processors, the period for counter wrap is several thousands of years.
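The wrap period follows directly from the counter width and the increment rate. As a rough worked calculation, the sketch below divides 2^64 counts by an assumed clock frequency; the function name and the example frequencies are assumptions chosen for illustration, not figures from the text.

```c
/* Years until a 64-bit counter that starts at 0 and increments `hz`
 * times per second wraps back to zero. The frequency passed in is an
 * assumption supplied by the caller. */
double tsc_wrap_years(double hz)
{
    const double counts = 18446744073709551616.0;            /* 2^64 */
    const double seconds_per_year = 365.25 * 24.0 * 3600.0;  /* Julian year */
    return counts / hz / seconds_per_year;
}
```

With an assumed 100 MHz clock, this gives roughly 5,800 years. With a hypothetical 3 GHz clock, it gives roughly 195 years. Both comfortably exceed the 10-year guarantee.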
Starting with the Pentium® Pro processor, IA32 processors translate IA32 instructions into primitive, fixed-length instructions (micro-ops, or μops) that are executed by the processor core. μops can be executed out of order by the processor core. Consider the following (where “xxx” = any instruction):
    xxx
    xxx
    rdtsc
    xxx
    xxx
    xxx
    xxx
    xxx
    rdtsc
    xxx
The programmer's intention is to measure the execution time of the code residing between the two reads of the TSC. However, because the processor core can execute instructions out of order, it may have executed all, some, or none of the instructions that lie between the two RDTSC instructions by the time the second read is performed. The resulting measurement would be of little use.
Certain instructions are serializing instructions (see “Serializing Instructions” on page 1079). In plain language, this means that the instruction acts as a fence in the program. The core cannot move beyond the fence until all μops upstream of the fence and the fence itself complete execution. Only then is the core permitted to execute μops that reside downstream of the fence. Consider the following:
    xxx
    xxx
    rdtsc
    CPUID
    xxx
    xxx
    xxx
    xxx
    xxx
    rdtsc
    CPUID
    xxx
The core cannot execute any of the instructions that lie beneath the first CPUID instruction until all of the instructions up to and including the CPUID have been completed and their respective results retired to the register set. Only then can it drop beneath the first fence (i.e., the first CPUID). Likewise, the core cannot move beneath the second fence (CPUID) until all of the μops between the first and second fences have completed execution. The measurement taken will be accurate.
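The fencing idea can be sketched in C using GCC/Clang-style inline assembly (an assumption; other compilers use different syntax). Note that this sketch issues CPUID immediately before each RDTSC, a commonly used arrangement that differs from the listing above, where CPUID follows RDTSC. The helper names serialized_rdtsc and measure_cycles are hypothetical, and on non-x86 targets the read simply returns 0.

```c
#include <stdint.h>

/* Execute CPUID (a serializing instruction) and then RDTSC.
 * Returns 0 on non-x86 targets, where neither instruction exists. */
uint64_t serialized_rdtsc(void)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t eax = 0, ebx, ecx = 0, edx;
    /* CPUID leaf 0 acts as the fence; RDTSC then loads EDX:EAX
     * (EDX = upper 32 bits, EAX = lower 32 bits). */
    __asm__ volatile("cpuid\n\t"
                     "rdtsc"
                     : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                     :
                     : "memory");
    (void)ebx;
    return ((uint64_t)edx << 32) | eax;
#else
    return 0;
#endif
}

/* Clock cycles spent in fn(arg), including the cost of the fences. */
uint64_t measure_cycles(void (*fn)(void *), void *arg)
{
    uint64_t start = serialized_rdtsc();
    fn(arg);
    uint64_t end = serialized_rdtsc();
    return end - start;
}
```

Because CPUID itself takes a noticeable number of clock cycles, the result includes fence overhead. Measuring an empty function first and subtracting that baseline is one way to account for it.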