Chapter 10. Instruction Set Architecture

This chapter discusses the implementation of a CPU’s instruction set. Although the choice of a given instruction set (for example, the 80x86 instruction set) is usually beyond the control of a software engineer, understanding the choices a hardware design engineer has to make when designing a CPU’s instruction set can definitely help you write better code.

CPU instruction sets contain several trade-offs based on assumptions that computer architects make concerning the way software engineers write code. If the machine instructions you choose match the assumptions the CPU’s designers have made, then your code will probably run faster and require fewer machine resources. Conversely, if your code violates the assumptions the hardware engineers have made, chances are pretty good it will not perform as well as it otherwise could.

Although studying the instruction set may seem like a task suited only to assembly language programmers, even high-level language programmers should understand the design of their CPU’s instruction set. After all, every high-level language statement maps to some sequence of machine instructions. Indeed, studying generic instruction set design is probably more important to high-level language programmers than it is to assembly programmers (who should study the specific instruction set they are using), as the general concepts are portable across architectures. Therefore, even if you don’t ever intend to write software using assembly language, it’s important to understand how the underlying machine instructions work and how they were designed in the first place.

10.1 The Importance of the Design of the Instruction Set

The design of the CPU’s instruction set is one of the most interesting and important aspects of CPU design. The instruction set architecture (ISA) is something that a designer must get correct from the start of the design cycle. Features like caches, pipelining, superscalar implementation, and so on, can all be grafted onto a CPU long after the original design is obsolete. However, it is very difficult to change the instruction set once a CPU is in production and people are writing software using those instructions. Therefore, instruction set design requires very careful consideration.

You might be tempted to think that the kitchen sink approach to instruction set design (as in “everything, including the kitchen sink”), in which you include every instruction you can dream up, is best. This approach fails for several reasons, as you’ll soon see. Instruction set design is the epitome of compromise management. Good instruction set design involves selecting what to throw out rather than what to keep. It’s easy enough to say, “Let’s include everything.” The hard part is deciding what to leave out once it becomes clear that you can’t have every instruction on the chip. Why can’t we have it all? Well, in the real world some nasty realities prevent this:

Nasty reality 1: Silicon real estate

The first problem with “putting it all on the chip” is that each feature requires some number of transistors on the CPU’s silicon die (chip). CPU designers work with a “silicon budget” and are given a finite number of transistors to work with. This means that there aren’t enough transistors to support “putting every possible feature” on a CPU. The original 8086 processor, for example, had a transistor budget of fewer than 30 thousand transistors. The Pentium III processor had a budget of over 8 million transistors. These two budgets reflect the differences in semiconductor technology in 1978 versus 1998.

Nasty reality 2: Cost

Although it is possible to use millions of transistors on a CPU today, the more transistors that are used the more expensive the CPU becomes. For example, at the beginning of 2004, Pentium IV processors using millions of transistors cost hundreds of dollars. Contemporary CPUs with 30,000 transistors cost only a few dollars.

Nasty reality 3: Expandability

One problem with the kitchen sink approach is that it’s very difficult to anticipate all the features people will want. For example, Intel’s MMX and SIMD instruction enhancements were added to make multimedia programming more practical on the Pentium processor. Back in 1978, when Intel created the first 8086 processor, very few people could have possibly anticipated the need for these instructions. Therefore, a CPU designer needs to allow for making extensions to the instruction set in future members of the CPU family to handle unanticipated needs.

Nasty reality 4: Legacy support

This is almost the opposite of expandability. Often, an instruction that the CPU designer feels is important turns out to be less useful than anticipated. For example, the loop instruction on the 80x86 CPU sees very little use in modern high-performance programs. The 80x86 enter instruction is another good example. When designing a CPU using the kitchen sink approach, it is common to discover that most programs never use some of the instructions. Unfortunately, removing instructions from later versions of a processor will break existing programs that use those instructions. Generally, once an instruction is added to the instruction set, it will have to be supported in all future members of the processor. Unless very few programs use the instruction, and CPU designers are willing to let them break, removing instructions is a difficult thing to do.

Nasty reality 5: Complexity

The popularity of a new processor is easily measured by how much software people write for that processor. Most CPU designs die a quick death because no one writes software specific to that CPU. Therefore, a CPU designer must consider the assembly programmers and compiler writers who will be using the chip upon introduction. While a kitchen sink approach might seem to appeal to such programmers, the truth is that no one wants to learn an overly complex system. A CPU that does everything under the sun appeals mostly to someone who is already familiar with the CPU. However, pity the poor soul who doesn’t know the chip and has to learn it all at once.

These problems with the kitchen sink approach all have a common solution: design a simple instruction set for the first version of the CPU, and leave room for later expansion. This is one of the main reasons the 80x86 has proven to be so popular and long-lived. Intel started with a relatively simple CPU and figured out how to extend the instruction set over the years to accommodate new features.

10.2 Basic Instruction Design Goals

The efficiency of your programs largely depends upon the instructions that they use. Short instructions use very little memory and often execute rapidly — nice attributes to have when writing great code. On the other hand, larger instructions can often handle more complex tasks, with a single instruction often doing the work of several less-complex instructions. To enable software engineers to write the best possible code, computer architects must strike a careful compromise between overly simplistic instructions (which are short and efficient, but don’t do very much work) and overly complex instructions that consume excessive memory or require too many machine cycles to execute.

In a typical CPU, the computer encodes instructions as numeric values (operation codes or opcodes) and stores them in memory. The encoding of these instructions is one of the major tasks in instruction set design, requiring careful thought. Each instruction must have a unique opcode (clearly, two different instructions cannot share the same opcode or the CPU will not be able to differentiate them). With an n-bit number, there are 2^n different possible opcodes, so encoding m instructions requires at least log2(m) bits (rounded up to a whole number of bits). The main point to keep in mind is that the size of individual CPU instructions is dependent on the total number of instructions that the CPU supports.

Encoding opcodes is a little more involved than assigning a unique numeric value to each instruction. Remember, decoding each instruction and executing the specified task requires actual circuitry. Suppose we have a 7-bit opcode. With an opcode of this size, we could encode 128 different instructions. To decode each of these 128 instructions requires a 7-line to 128-line decoder — an expensive piece of circuitry. However, assuming that the instruction opcodes contain certain (binary) patterns, a single large decoder can often be replaced by several smaller, less expensive decoders.

If an instruction set contains 128 unrelated instructions, there’s little one can do other than decode the entire bit string for each instruction. However, in most architectures the instructions fall into related groups. On the 80x86 CPUs, for example, the mov(eax,ebx); and mov(ecx,edx); instructions have different opcodes because the instructions are different. However, these two instructions are obviously related in that they both move data from one register to another. The only differences between the two are their source and destination operands. This suggests that CPU designers could encode instructions like mov with a sub-opcode, and then they could encode the instruction’s operands using other bit fields within the opcode.

For example, given an instruction set with only eight instructions, each with two operands, and each operand having only one of four possible values, the instructions could be encoded using three packed fields containing three, two, and two bits, respectively (see Figure 10-1). This encoding only needs three simple decoders to determine what the CPU should do. While this is a simple case, it does demonstrate one very important facet of instruction set design — it is important to make opcodes easy to decode and the easiest way to do this is to construct the opcode using several different bit fields. The smaller these bit fields are, the easier it will be for the hardware to decode and execute the instruction.

Figure 10-1. Separating an opcode into several fields to ease decoding
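To make this concrete, here is a minimal C sketch of how a decoder might pull apart such an opcode. The 7-bit layout follows Figure 10-1 (three bits of instruction, two bits per operand); the particular instruction and register names in the lookup tables are hypothetical, chosen only for illustration.

#include <stdio.h>

/* Hypothetical 7-bit opcode: iii (3 bits), aa (2 bits), bb (2 bits). */
static void decode(unsigned char opcode)
{
    unsigned iii = (opcode >> 4) & 0x7;   /* instruction field    */
    unsigned aa  = (opcode >> 2) & 0x3;   /* first operand field  */
    unsigned bb  =  opcode       & 0x3;   /* second operand field */

    static const char *insn[8] = { "mov", "add", "sub", "and",
                                   "or",  "not", "cmp", "jmp" };
    static const char *reg[4]  = { "ax", "bx", "cx", "dx" };

    printf( "%s( %s, %s );\n", insn[iii], reg[aa], reg[bb] );
}

Each field feeds its own small decoder (here, a simple table lookup), which is exactly why packing the opcode into three small fields is cheaper than building one monolithic 7-line to 128-line decoder.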

It would seem that when encoding 2^n different instructions using n bits, there would be very little leeway in choosing the size of the instruction. It’s going to take n bits to encode those 2^n instructions; you can’t do it with any fewer. It is possible, however, to use more than n bits; and believe it or not, that’s the secret to reducing the average instruction size.

10.2.1 Choosing Opcode Length

Before discussing how it is possible to use larger instructions to generate shorter programs, a quick digression is necessary. The first thing to know is that the opcode length isn’t arbitrary. Assuming that a CPU is capable of reading bytes from memory, the opcode will probably have to be some even multiple of eight bits long. If the CPU is not capable of reading bytes from memory (most RISC CPUs read memory only in 32- or 64-bit chunks), then the opcode is going to be the same size as the smallest object the CPU can read from memory at one time. Any attempt to shrink the opcode size below this limit is futile. In this chapter, we’ll work with opcodes that must have a length that is a multiple of eight bits.

Another point to consider here is the size of an instruction’s operands. Some CPU designers include all operands in their opcode. Other CPU designers do not count operands like immediate constants or address displacements as part of the opcode. We will take the latter approach here and not count either of these as part of the actual opcode.

An 8-bit opcode can encode only 256 different instructions. Even if we don’t count instruction operands as part of the opcode, having only 256 different instructions is a stringent limit. Though CPUs with 8-bit opcodes exist, modern processors tend to have far more than 256 different instructions. Because opcodes must have a length that is a multiple of 8 bits, the next smallest possible opcode size is 16 bits. A 2-byte opcode can encode up to 65,536 different instructions, though the opcode size has doubled from 8 to 16 bits (meaning that the instructions will be larger).

When reducing instruction size is an important design goal, CPU designers often employ data compression theory to reduce the average instruction size. The first step is to analyze programs written for a typical CPU and count the number of occurrences of each instruction over a large number of applications. The second step is to create a list of these instructions, sorted by their frequency of use. Next, the most frequently used instructions are assigned 1-byte opcodes; 2-byte opcodes are assigned to the next most frequently used instructions, and opcodes of three or more bytes are assigned to the rarely used instructions. Although this scheme requires opcodes with a maximum size of three or more bytes, most of the actual instructions appearing in a program will use one or two byte opcodes. The average opcode length will be somewhere between one and two bytes (let’s call it 1.5 bytes), and a typical program will be shorter than had all instructions employed a 2-byte opcode (see Figure 10-2).
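As a back-of-the-envelope check on that claim, the arithmetic is easy to sketch in C; the frequencies below are made up purely for illustration, not measured from real programs.

/* Hypothetical mix of opcode lengths in a compiled program. */
double fraction[] = { 0.70, 0.25, 0.05 };   /* share of 1-, 2-, and 3-byte opcodes */
int    length[]   = { 1, 2, 3 };
double average    = 0.0;

for (int i = 0; i < 3; i++)
    average += fraction[i] * length[i];     /* 0.70*1 + 0.25*2 + 0.05*3 = 1.35 bytes */

With such a mix, the average opcode is 1.35 bytes long, noticeably better than a uniform 2-byte encoding.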

Although using variable-length instructions allows one to create smaller programs, it comes at a price. First, decoding variable-length instructions is a bit more complicated than decoding fixed-length instructions. Before decoding a particular instruction field, the CPU must first decode the instruction’s size. This extra step consumes time and may affect the overall performance of the CPU by introducing delays into the decoding step, thereby limiting the maximum clock speed of the CPU (because those delays stretch out a single clock period, reducing the CPU’s clock frequency). Another problem with variable-length instructions is that they make it difficult to decode multiple instructions in a pipeline, because the CPU cannot easily determine the instruction boundaries in the prefetch queue.

These reasons, along with some others, explain why most popular RISC architectures avoid variable-length instructions. However, in this chapter, we’ll study a variable-length approach because saving memory is an admirable goal.

Figure 10-2. Encoding instructions using a variable-length opcode

10.2.2 Planning for the Future

Before actually choosing the instructions to implement in a CPU, designers must plan for the future. The need for new instructions will undoubtedly appear after the initial design, so reserving some opcodes specifically for expansion purposes is a good idea. Given the instruction opcode format appearing in Figure 10-2, it might not be a bad idea to reserve one block of 64 1-byte opcodes, half (4,096) of the 2-byte opcodes, and half (1,048,576) of the 3-byte opcodes for future use. Giving up 64 of the very valuable 1-byte opcodes may seem extravagant, but history suggests that such foresight is rewarded.

10.2.3 Choosing Instructions

The next step is to choose the instructions to implement. Even if nearly half the instructions have been reserved for future expansion, that doesn’t mean that all the remaining opcodes must be used to implement instructions. A designer can leave a good number of these instructions unimplemented, effectively reserving them for the future as well. The right approach is not to use up the opcodes as quickly as possible, but rather to produce a consistent and complete instruction set given the design compromises. The main point to keep in mind here is that it’s much easier to add an instruction later than it is to remove an instruction later. So, for the first go-around, it’s generally better to go with a simpler design rather than a more complex design.

The first step is to choose some generic instruction types. Early in the design process it is important to limit the choice to very common instructions. Instruction sets of other processors are probably the best place to look for suggestions when choosing these instructions. For example, most processors will have instructions like the following:

  • Data movement instructions (such as mov)

  • Arithmetic and logical instructions (such as add, sub, and, or, not)

  • Comparison instructions

  • A set of conditional jump instructions (generally used after the comparison instructions)

  • Input/output instructions

  • Other miscellaneous instructions

The designer of the CPU’s initial instruction set should have the goal of choosing a reasonable set of instructions that will allow programmers to write efficient programs without adding so many instructions that the instruction set design exceeds the silicon budget or violates other design constraints. This requires strategic decisions, which CPU designers should make based on careful research, experimentation, and simulation.

10.2.4 Assigning Opcodes to Instructions

Once the initial instructions have been chosen, the next step is to assign opcodes to them. The first step is to group the instructions into sets according to the characteristics they share. For example, an add instruction is probably going to support the exact same set of operands as the sub instruction. Therefore, it makes sense to put these two instructions into the same group. On the other hand, the not instruction generally requires only a single operand, as does a neg instruction. Therefore, it makes sense to put these two instructions in the same group, but in a different group from the one with add and sub.

Once all the instructions are grouped, the next step is to encode them. A typical encoding scheme will use some bits to select the group the instruction falls into, some bits to select a particular instruction from that group, and some bits to encode the operand types (such as registers, memory locations, and constants). The number of bits needed to encode all this information can have a direct impact on the instruction’s size, regardless of the instruction’s usage frequency. For example, suppose two bits are needed to select an instruction’s group, four bits to select the instruction within that group, and six bits to specify the instruction’s operand types. In this case, the instructions are not going to fit into an 8-bit opcode (they need 12 bits). On the other hand, if all that’s needed is to push one of eight different registers onto the stack, four bits will be enough to specify the push instruction group, and three bits will be enough to specify the register.

Encoding instruction operands with a minimal amount of space is always a problem, because many instructions allow a large number of operands. For example, the generic 80x86 mov instruction allows two operands and requires a 2-byte opcode.[40] However, Intel noticed that the mov(disp,eax); and mov(eax,disp); instructions occur frequently in programs. Therefore, they created a special 1-byte version of these instructions to reduce their size and, consequently, the size of programs that use these instructions. Note that Intel did not remove the 2-byte versions of these instructions. There are two different instructions that will store EAX into memory and two different instructions that will load EAX from memory. A compiler or assembler would always emit the shorter versions of each of these pairs of instructions.

Notice an important trade-off Intel made with the mov instruction: it gave up an extra opcode in order to provide a shorter version of one variant of each instruction. Actually, Intel uses this trick all over the place to create shorter and easier-to-decode instructions. Back in 1978, creating redundant instructions to reduce program size was a good compromise given the cost of memory. Today, however, a CPU designer would probably use those redundant opcodes for different purposes.

10.3 The Y86 Hypothetical Processor

Because of Intel’s design goals in 1978, enhancements made to the 80x86 processor family since then, and advances in computer architecture over the years, the encoding of 80x86 instructions is very complex and somewhat illogical. The 80x86 is not a good example to use when introducing the design of an instruction set. Therefore, to further our discussion, we will discuss instruction set design in two stages: first, we will develop a simple (trivial) instruction set for a hypothetical processor that is a small subset of the 80x86, and then we will expand our discussion to the full 80x86 instruction set. Our hypothetical processor is not a true 80x86 CPU, so we will call it the Y86 processor to avoid any accidental association with the Intel x86 family.

10.3.1 Y86 Limitations

The hypothetical Y86 processor is a very stripped down version of the 80x86 CPUs. Before we begin, let’s lay out the restrictions we’ve placed on our Y86 instruction set design:

  • The Y86 only supports one operand size — 16 bits. This simplification frees us from having to encode the size of the operand as part of the opcode (thereby reducing the total number of opcodes we will need).

  • The Y86 processor only supports four 16-bit registers: AX, BX, CX, and DX. This lets us encode register operands with only two bits (versus the three bits the 80x86 family requires to encode eight registers).

  • The Y86 only supports a 16-bit address bus with a maximum of 65,536 bytes of addressable memory.

These simplifications, plus a very limited instruction set, will allow us to encode all Y86 instructions using a one-byte opcode and a 2-byte displacement/offset when applicable.

10.3.2 Y86 Instructions

Including both forms of the mov instruction, the Y86 CPU still provides only 18 basic instructions. Seven of these instructions have two operands, eight of these instructions have one operand, and three instructions have no operands at all. The instructions are mov (two forms), add, sub, cmp, and, or, not, je, jne, jb, jbe, ja, jae, jmp, get, put, and halt.

10.3.2.1 The mov Instruction

The mov instruction is actually two instructions merged into the same instruction class. These are the two forms of the mov instruction:

mov( reg/memory/constant, reg );
mov( reg, memory );

In these forms, reg is either AX, BX, CX, or DX; constant is a numeric constant using hexadecimal notation, and memory is an operand specifying a memory location. The reg/memory/constant operand tells you that this particular operand may be either a register, a memory location, or a constant.

10.3.2.2 Arithmetic and Logical Instructions

The arithmetic and logical instructions take the following forms:

add( reg/memory/constant, reg );
sub( reg/memory/constant, reg );
cmp( reg/memory/constant, reg );
and( reg/memory/constant, reg );
or( reg/memory/constant, reg );

not( reg/memory );

The add instruction adds the value of the first operand to the value of the second operand, storing the sum in the second operand. The sub instruction subtracts the value of the first operand from the value of the second, storing the difference in the second operand. The cmp instruction compares the value of the first operand against the value of the second and saves the result of the comparison for use by the conditional jump instructions (described in the next section). The and and or instructions compute bitwise logical operations between their two operands and store the result of the operation in the second operand. The not instruction appears separately because it only supports a single operand. not is the bitwise logical operation that inverts the bits of its single memory or register operand.

10.3.2.3 Control Transfer Instructions

The control transfer instructions interrupt the execution of instructions stored in sequential memory locations and transfer control to instructions stored at some other point in memory. They do this either unconditionally, or conditionally, using the result from a cmp instruction. These are the control transfer instructions:

ja   dest;  // Jump if above (i.e., greater than)
jae  dest;  // Jump if above or equal (i.e., greater than or equal to)
jb   dest;  // Jump if below (i.e., less than)
jbe  dest;  // Jump if below or equal (i.e., less than or equal to)
je   dest;  // Jump if equal
jne  dest;  // Jump if not equal

jmp  dest;  // Unconditional jump

The first six instructions (ja, jae, jb, jbe, je, and jne) let you check the result of the previous cmp instruction, that is, the result of the comparison of that instruction’s first and second operands.[41] For example, if you compare the AX and BX registers with a cmp(ax,bx); instruction and then execute the ja instruction, the Y86 CPU will jump to the specified destination location if AX is greater than BX. If AX is not greater than BX, control will fall through to the next instruction in the program. In contrast to the first six instructions, the jmp instruction unconditionally transfers control to the instruction at the destination address.
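One way to picture the contract between cmp and these jumps is as a pair of flags that cmp sets and the jumps test. The following C model is only an illustration of the behavior described above; the flag names are invented, and the comparisons are treated as unsigned.

#include <stdint.h>
#include <stdbool.h>

static bool less, equal;                 /* set by cmp, tested by the conditional jumps */

static void cmp(uint16_t first, uint16_t second)
{
    less  = first <  second;
    equal = first == second;
}

static bool ja_taken(void)  { return !less && !equal; }   /* first >  second */
static bool jae_taken(void) { return !less;            }   /* first >= second */
static bool jb_taken(void)  { return  less;            }   /* first <  second */
static bool jbe_taken(void) { return  less ||  equal;  }   /* first <= second */
static bool je_taken(void)  { return  equal;           }
static bool jne_taken(void) { return !equal;           }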

10.3.2.4 Miscellaneous Instructions

The Y86 supports three instructions that do not have any operands. The get and put instructions let you read and write integer values: get will stop and prompt the user for a hexadecimal value and then store that value into the AX register; put displays the value of the AX register in hexadecimal format. The halt instruction terminates program execution.

10.3.3 Addressing Modes on the Y86

Before assigning opcodes, we have to take a look at the operands these instructions support. As you’ve seen, the 18 Y86 instructions use five different operand types: registers, constants, and three memory-addressing modes (the indirect addressing mode, the indexed addressing mode, and the direct addressing mode). The following paragraphs explain these operand types.

Register operands are the easiest to understand. Consider the following forms of the mov instruction:

mov( ax, ax );
mov( bx, ax );
mov( cx, ax );
mov( dx, ax );

The first instruction accomplishes absolutely nothing. It copies the value from the AX register back into the AX register. The remaining three instructions copy the values of BX, CX, and DX into AX. Note that these instructions leave BX, CX, and DX unchanged. The second operand, the destination operand, is not limited to AX; you can move values to and from any of these registers.

Constants are also easy to understand. The following instructions load their respective registers with the specified constant (all numeric constants in Y86 assembly language are given in hexadecimal, so the “$” prefix is not necessary):

mov( 25, ax );
mov( 195, bx );
mov( 2056, cx );
mov( 1000, dx );

As mentioned, the Y86 instruction set uses three addressing modes to access data in memory. The following instructions demonstrate the use of these three addressing modes:

mov( [1000], ax );
mov( [bx], ax );
mov( [1000+bx], ax );

The first instruction uses the direct addressing mode to load AX with the 16-bit value stored in memory starting at location $1000.

The mov([bx],ax); instruction loads AX with the value at the memory location specified by the contents of the BX register, rather than simply storing BX’s value into AX. This is the indirect addressing mode. Note that mov([1000],ax); is equivalent to the following two instructions:

mov( 1000, bx );
mov( [bx], ax );

The third of the addressing mode examples above, mov( [1000+bx], ax );, provides an example of the indexed addressing mode. This instruction adds the value of the BX register with the value $1000 and uses this sum as the effective memory address. Then it loads the value at this effective memory address into the AX register. This instruction is useful for accessing elements of arrays, records, and other data structures.
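In C terms, these three modes differ only in how the 16-bit effective address is formed before the memory access. The sketch below is an illustrative model, not a description of real Y86 circuitry:

#include <stdint.h>

static uint8_t memory[65536];                 /* the Y86's 64KB address space */

static uint16_t read16(uint16_t addr)         /* 16-bit little-endian load */
{
    return (uint16_t)(memory[addr] | (memory[(uint16_t)(addr + 1)] << 8));
}

uint16_t load_direct  (uint16_t disp)              { return read16(disp); }                  /* mov( [1000], ax );    */
uint16_t load_indirect(uint16_t bx)                { return read16(bx); }                    /* mov( [bx], ax );      */
uint16_t load_indexed (uint16_t disp, uint16_t bx) { return read16((uint16_t)(disp + bx)); } /* mov( [1000+bx], ax ); */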

10.3.4 Encoding Y86 Instructions

Because a real CPU uses logic circuitry to decode the opcodes and act appropriately on them, we have seen that it is not a very good idea to arbitrarily assign opcodes to machine instructions. A typical CPU opcode uses a certain number of bits in the opcode to denote the instruction class (such as mov, add, sub), and a certain number of bits to encode each of the operands.

A typical Y86 instruction takes the form shown in Figure 10-3. The basic instruction is either one or three bytes long, and the instruction opcode consists of a single byte that contains three fields. The first field, consisting of the HO three bits, defines the instruction, and these three bits provide only eight possible combinations. As there are 18 different Y86 instructions, we’ll have to pull some tricks to handle the remaining 11 instructions.

Figure 10-3. Basic Y86 instruction encoding

10.3.4.1 Eight Generic Y86 Instructions

As you can see in Figure 10-3, seven of the eight basic opcodes encode the or, and, cmp, sub, and add instructions, as well as both versions of the mov instruction. The eighth basic opcode is a special expansion opcode. This special instruction class provides a mechanism that allows us to expand the number of available instruction classes, which we will return to shortly.

To determine the full opcode for a particular instruction, you need only select the appropriate bits for the iii, rr, and mmm fields (identified in Figure 10-3). The rr field contains the destination register (except for the version of the mov instruction whose iii field is %111), and the mmm field encodes the source operand. For example, to encode the mov(bx,ax); instruction you would select iii = %110 (mov(reg,reg);), rr = %00 (AX), and mmm = %001 (BX). This produces the 1-byte instruction %11000001 or $C1.

Some Y86 instructions are larger than one byte. To illustrate why this is necessary, take, for example, the instruction mov([1000],ax);, which loads the AX register with the value stored at memory location $1000. The encoding for the opcode is %11000110 or $C6. However, the encoding for the mov([2000],ax); instruction is also $C6. Clearly these two instructions do different things: one loads the AX register from memory location $1000, while the other loads the AX register from memory location $2000.

In order to differentiate between instructions that encode an address using the [xxxx] or [xxxx+bx] addressing modes, or to encode a constant using the immediate addressing mode, you must append the 16-bit address or constant after the instruction’s opcode. Within this 16-bit address or constant, the LO byte must follow the opcode in memory and the HO byte must follow the LO byte. So, the three byte encoding for mov([1000],ax); would be $C6, $00, $10, and the three byte encoding for mov([2000],ax); would be $C6, $00, $20.
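The entire scheme is simple enough to capture in a few lines of C. The sketch below packs the three fields into the opcode byte for the basic two-operand instructions and, when the addressing mode calls for it, appends a 16-bit constant or displacement in little-endian order. The field values follow Figure 10-3; the function and buffer names are my own.

#include <stdint.h>
#include <stddef.h>

/* Returns the number of bytes emitted (1 or 3). */
size_t encode_y86(uint8_t iii, uint8_t rr, uint8_t mmm,
                  uint16_t operand, uint8_t out[3])
{
    out[0] = (uint8_t)((iii << 5) | (rr << 3) | mmm);   /* iii:3, rr:2, mmm:3 */
    if (mmm >= 5) {              /* %101 [disp+bx], %110 [disp], %111 constant */
        out[1] = (uint8_t)(operand & 0xFF);             /* LO byte first...  */
        out[2] = (uint8_t)(operand >> 8);               /* ...then HO byte   */
        return 3;
    }
    return 1;
}

For example, encode_y86(0x6, 0x0, 0x6, 0x1000, buf) produces the $C6, $00, $10 sequence given above for mov([1000],ax);.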

10.3.4.2 Using the Special Expansion Opcode

The special opcode in Figure 10-3 allows the Y86 CPU to expand the set of available instructions that can be encoded using a single byte. This opcode handles several zero- and one-operand instructions, as shown in Figure 10-4 and Figure 10-5.

Figure 10-4. Single-operand instruction encodings (iii = %000)

Figure 10-5. Zero-operand instruction encodings (iii = %000 and rr = %00)

There are four one-operand instruction classes whose encodings are shown in Figure 10-4. The first 2-bit encoding for the rr field, %00, further expands the instruction set by providing a way to encode the zero-operand instructions shown in Figure 10-5. Five of these instructions are illegal instruction opcodes; the three valid opcodes are the halt instruction, which terminates program execution, the get instruction, which reads a hexadecimal value from the user and stores this value in the AX register, and the put instruction, which outputs the value in the AX register.

The second 2-bit encoding for the rr field, %01, is also part of an expansion opcode that provides all the Y86 jump instructions (see Figure 10-6). The third rr field encoding, %10, is for the not instruction. The fourth rr field encoding is currently unassigned. Any attempt to execute an opcode with an iii field encoding of %000 and an rr field encoding of %11 will halt the processor with an illegal instruction error. CPU designers often reserve unassigned opcodes like this one to allow themselves to extend the instruction set at a future date (as Intel did when moving from the 80286 processor to the 80386).

Figure 10-6. Jump instruction encodings

As shown in Figure 10-6, there are seven jump instructions in the Y86 instruction set and they all take the following form: jxx address;. The jmp instruction copies the 16-bit address value that follows the opcode into the instruction pointer register, causing the CPU to fetch the next instruction from the target address of the jmp.

The jmp instruction is an example of an unconditional jump instruction. It always transfers control to the target address. The remaining six instructions — ja, jae, jb, jbe, je, and jne — are conditional jump instructions. They test some condition and only jump to the instruction’s destination address if the condition is true; if the condition is false, these six instructions fall through to the next program instruction in memory. You would normally execute these conditional jump instructions immediately after a cmp instruction, as the cmp instruction sets the less than and equality flags that the conditional jump instructions test. Note that there are eight possible jump opcodes, but the Y86 uses only seven of them. The eighth opcode, %00001111, is another illegal opcode.

10.3.5 Examples of Encoding Y86 Instructions

Keep in mind that the Y86 processor fetches instructions as bit patterns from memory. It then decodes and executes those bit patterns. The processor does not execute instructions as strings of characters that are readable by humans, such as mov(bx,ax);. Instead, it executes the bit pattern $C1 in memory. Instructions like mov(bx,ax); and add(5,cx); are human-readable representations of instructions that must first be converted into binary representation, or machine code. In this section, we will explore this conversion.

10.3.5.1 The add Instruction

The first step in converting instructions to machine code is to choose an instruction to convert. We’ll start with a very simple example, the add(cx,dx); instruction. Once you’ve chosen the instruction, you look up the instruction in one of the opcode figures from the previous section. The add instruction is in the first group (see Figure 10-3) and has an iii field of %101. The source operand is CX, so the mmm field is %010 and the destination operand is DX so the rr field is %11. Merging these bits produces the opcode %10111010 or $BA (see Figure 10-7).

Figure 10-7. Encoding the add( cx, dx ); instruction

Now consider the add(5,ax); instruction. Because this instruction has an immediate source operand (a constant), the mmm field will be %111 (see Figure 10-3). The destination register operand is AX (%00), and the instruction class field is %101, so the full opcode becomes %10100111 or $A7. However, this does not complete the encoding of the instruction. We also have to include the 16-bit constant $0005 as part of the instruction. The binary encoding of the constant must immediately follow the opcode in memory, with the LO byte of the constant following the opcode and the HO byte of the constant following its LO byte, because the bytes are arranged in little-endian order. So the sequence of bytes in memory, from lowest address to highest address, is $A7, $05, $00 (see Figure 10-8).

Figure 10-8. Encoding the add( 5, ax ); instruction

The add([2ff+bx],cx); instruction also contains a 16-bit constant that is the displacement portion of the indexed addressing mode. To encode this instruction, we use the following field values: iii = %101, rr = %10, and mmm = %101. This produces the opcode byte %10110101 or $B5. The complete instruction also requires the constant $2FF so the full instruction is the 3-byte sequence $B5, $FF, $02 (see Figure 10-9).

Figure 10-9. Encoding the add( [$2ff+bx], cx ); instruction

Now consider the add([1000],ax); instruction. This instruction adds the 16-bit contents of memory locations $1000 and $1001 to the value in the AX register. Once again, iii = %101 for the add instruction. The destination register is AX so rr = %00. Finally, the addressing mode is the displacement-only addressing mode, so mmm = %110. This forms the opcode %10100110, or $A6. The complete instruction is three bytes long because it must also encode the displacement (address) of the memory location in the two bytes following the opcode. Therefore, the complete 3-byte sequence is $A6, $00, $10 (see Figure 10-10).

Figure 10-10. Encoding the add ( [1000], ax ); instruction

The last addressing mode to consider is the register indirect addressing mode, [bx]. The add([bx],bx); instruction uses the following encoded values: iii = %101, rr = %01 (BX), and mmm = %100 ([bx]). Because the value in the BX register completely specifies the memory address, there is no need to attach a displacement field to the instruction’s encoding. Hence, this instruction is only one byte long (see Figure 10-11).

Figure 10-11. Encoding the add ( [bx], bx ); instruction

You use a similar approach to encode the sub, cmp, and, and or instructions. The only difference between encoding these instructions and the add instruction is the value you use for the iii field in the opcode.
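Going the other direction, a disassembler needs nothing more than the iii field to tell these instructions apart. A minimal C fragment for the two-operand group might look like the following; the %001 through %100 assignments are assumed from Figure 10-3 (the text above only confirms %000, %101, %110, and %111):

static const char *group1[8] = {
    "special",             /* %000 - expansion opcode                 */
    "or", "and", "cmp",    /* %001..%011 (assumed from Figure 10-3)   */
    "sub", "add",          /* %100, %101                              */
    "mov",                 /* %110 - mov( reg/mem/const, reg );       */
    "mov"                  /* %111 - mov( reg, mem );                 */
};

const char *mnemonic(unsigned char opcode)
{
    return group1[(opcode >> 5) & 0x7];   /* only the iii field matters here */
}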

10.3.5.2 The mov Instruction

The Y86 mov instruction is special, because it comes in two forms. The only difference between the encoding of the add instruction and the encoding of the mov instruction’s first form (iii = %110) is the iii field. This first form of mov copies either a constant or data from the register or memory address specified by the mmm field into the destination register specified by the rr field.

The second form of the mov instruction (iii = %111) copies data from the source register specified by the rr field to a destination memory location that the mmm field specifies. In this second form of the mov instruction, the source and destination meanings of the rr and mmm fields are reversed so that rr is the source field and mmm is the destination field. Another difference between the two forms of mov is that in its second form, the mmm field may only contain the values %100 ([bx]), %101 ([disp+bx]), and %110 ([disp]). The destination values cannot be any of the registers encoded by mmm field values in the range %000..%011 or a constant encoded by an mmm field of %111. These encodings are illegal because the first form of the mov handles cases with a register destination, and because storing data into a constant doesn’t make any sense.

10.3.5.3 The not Instruction

The not instruction is the only instruction with a single memory/register operand that the Y86 processor supports. The not instruction has the following syntax:

not(reg);

or

not(address);

. . . where address represents one of the memory addressing modes ([bx], [disp+bx], or [disp]). You may not specify a constant operand for the not instruction.

Because the not instruction has only a single operand, it needs only the mmm field to encode this operand. An iii field of %000 along with an rr field of %10 identifies the not instruction. In fact, whenever the iii field contains zero, the CPU knows that decoding beyond the iii field is necessary to identify the instruction. In this case, the rr field specifies whether we have encoded the not instruction or one of the other specially encoded instructions.

To encode an instruction like not(ax);, you would simply specify %000 for the iii field and %10 for the rr field. Then you would encode the mmm field the same way you would encode it for the add instruction. Because mmm = %000 for AX, the encoding of not(ax); would be %00010000 or $10 (see Figure 10-12).

Figure 10-12. Encoding the not(AX); instruction

The not instruction does not allow an immediate, or constant, operand, so the opcode %00010111 ($17) is an illegal opcode.

10.3.5.4 The Jump Instructions

The Y86 jump instructions also use the special encoding, meaning that the iii field for jump instructions is always %000. These instructions are always three bytes long. The first byte, the opcode, specifies which jump instruction to execute and the next two bytes specify the address in memory to which the CPU transfers control (if the condition is met, in the case of the conditional jumps). There are seven different Y86 jump instructions, six conditional jumps, and one unconditional jump, jmp. All seven of these instructions set iii = %000 and rr = %01, and therefore only differ according to their mmm fields. The eighth possible opcode, with an mmm field value of %111, is an illegal opcode (see Figure 10-6).

Encoding these instructions is relatively straightforward. Picking the instruction you want to encode completely determines the opcode. The opcode values fall in the range $08..$0E ($0F is the illegal opcode).

The only field that requires some thought is the 16-bit operand that follows the opcode. This field holds the address of the target instruction to which the unconditional jump always transfers, and to which the conditional jumps transfer if the transfer condition is true. To properly encode this 16-bit operand you must know the address of the opcode byte of the target instruction. If you’ve already converted the target instruction to binary form and stored it into memory, you’re all set — just specify the target instruction’s address as the sole operand of the jump instruction. On the other hand, if you haven’t yet written, converted, and placed the target instruction into memory, knowing its address would seem to require a bit of divination. Fortunately, you can figure out the target address by computing the lengths of all the instructions between the current jump instruction you’re encoding and the target instruction. Unfortunately, this is an arduous task.

The best way to go about calculating the distance is to write all your instructions down on paper, compute their lengths (which is easy, because all instructions are either one or three bytes long depending on whether they have a 16-bit operand), and then assign an appropriate address to each instruction. Once you’ve done this, you’ll know the starting address for each instruction, and you can put target address operands into your jump instructions as you encode them.
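This bookkeeping is exactly what an assembler automates: a first pass assigns an address to every instruction based on its length, and a second pass can then fill in the jump operands. Here is a sketch of the first pass in C, assuming each instruction has already been marked as carrying (or not carrying) a 16-bit operand:

#include <stdint.h>
#include <stddef.h>

struct insn {
    int      has_operand;   /* 1 if a 16-bit constant, displacement, or address follows */
    uint16_t address;       /* filled in by the first pass */
};

void assign_addresses(struct insn *prog, size_t count, uint16_t origin)
{
    uint16_t pc = origin;
    for (size_t i = 0; i < count; i++) {
        prog[i].address = pc;
        pc += prog[i].has_operand ? 3 : 1;   /* Y86 instructions are 1 or 3 bytes */
    }
}

The second pass would simply walk the jump instructions and copy each target’s address into the jump’s 16-bit operand.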

10.3.5.5 The Zero-Operand Instructions

The remaining instructions, the zero-operand instructions, are the easiest to encode. Because they have no operands they are always one byte long. These instructions always have iii = %000 and rr = %00, and mmm specifies the particular instruction opcode (see Figure 10-5). Note that the Y86 CPU leaves five of these instructions undefined (so we can use these opcodes for future expansion).

10.3.6 Extending the Y86 Instruction Set

The Y86 CPU is a trivial CPU, suitable only for demonstrating how to encode machine instructions. However, like any good CPU, the Y86 design does provide the capability for expansion. Therefore, if you wanted to improve the CPU by adding new instructions, the Y86’s instruction set will allow you to do it.

You can increase the number of instructions in a CPU’s instruction set by using either undefined or illegal opcodes on the CPU. Because the Y86 CPU has several illegal and undefined opcodes, we can use them to expand the instruction set.

Using undefined opcodes to define new instructions works best when there are undefined bit patterns within an opcode group, and the new instruction you want to add falls into that same group. For example, the opcode %00011mmm falls into the same group as the not instruction, which also has an iii field value of %000. If you decided that you really needed a neg (negate) instruction, using the %00011mmm opcode makes sense because you’d probably expect the neg instruction to use the same syntax as the not instruction. Likewise, if you want to add a zero-operand instruction to the instruction set, there are five undefined zero-operand instructions in the Y86 instruction set for you to choose from (%00000000..%00000100, see Figure 10-5). You’d just appropriate one of these opcodes and assign your instruction to it.

Unfortunately, the Y86 CPU doesn’t have many illegal opcodes available. For example, if you wanted to add the shl (shift left), shr (shift right), rol (rotate left), and ror (rotate right) instructions as single-operand instructions, there is insufficient space within the group of single-operand instruction opcodes to add these instructions (only %00011mmm is currently open). Likewise, there are no two-operand opcodes open, so if you wanted to add an xor (exclusive OR) instruction or some other two-operand instruction, you’d be out of luck.

A common way to handle this dilemma, and one the Intel designers have employed, is to use one of the undefined opcodes as a prefix opcode byte. For example, the opcode $FF is illegal (it corresponds to a mov(dx,const); instruction), but we can use this byte as a special prefix byte to further expand the instruction set (see Figure 10-13).[42]

Figure 10-13. Using a prefix byte to extend the instruction set

Whenever the CPU encounters a prefix byte in memory, it reads and decodes the next byte in memory as the actual opcode. However, it does not treat this second byte as it would a standard opcode that did not follow a prefix byte. Instead, it allows the CPU designer to create a completely new opcode scheme, independent of the original instruction set. A single expansion opcode byte allows the CPU designer to add up to 256 additional instructions to the instruction set. If the CPU designer wishes to add even more instructions, that designer can use additional illegal opcode bytes (in the original instruction set) to add yet more expansion opcodes, each with its own independent instruction set, or the designer can follow the opcode expansion prefix byte with a 2-byte opcode (yielding up to 65,536 new instructions), or any other scheme the CPU designer can dream up.

Of course, one big drawback to this opcode expansion scheme is that it increases the size of the new instructions by one byte, because each instruction now requires the prefix byte as part of the opcode. This also increases the cost of the circuitry (decoding prefix bytes and multiple instruction sets is fairly complex), so you don’t want to use this scheme for the basic instruction set. Nevertheless, it does provide a good mechanism for expanding the instruction set when you’ve run out of opcodes.
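From the decoder’s point of view the scheme is straightforward. A hypothetical C sketch, using the $FF prefix value suggested above and imaginary execute routines:

typedef unsigned char byte;

byte fetch(void);                  /* assumed: returns the next code byte        */
void execute_basic(byte opcode);   /* assumed: handles the original instruction set */
void execute_extended(byte opcode);

void decode_next(void)
{
    byte b = fetch();
    if (b == 0xFF)                    /* expansion prefix byte                  */
        execute_extended(fetch());    /* second byte: an independent opcode set */
    else
        execute_basic(b);             /* ordinary single-byte opcode            */
}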

10.4 Encoding 80x86 Instructions

The Y86 processor is simple to understand; it is easy to encode instructions by hand for this processor, and it is a great vehicle for learning how to assign opcodes. It’s also a purely hypothetical device intended only as a teaching tool. It’s time to take a look at the machine instruction format for a real CPU — the 80x86. After all, the programs you’re going to write will run on a real CPU, like the 80x86, so to fully appreciate what your compilers are doing with your code (so you can choose the best statements and data structures when writing that code), you need to understand how real instructions are encoded. Even if you’re using a different CPU, studying the 80x86 instruction encoding is a good exercise, because it is one of the more complex instruction sets in common use today and will provide a good insight into the operation of other real-world CPUs.

They don’t call the 80x86 a Complex Instruction Set Computer (CISC) chip for nothing. Although more complex instruction encodings do exist, no one is going to challenge the assertion that the 80x86 has a complex instruction encoding. The generic 80x86 instruction takes the form shown in Figure 10-14. Although this diagram seems to imply that instructions can be up to 16 bytes long, instructions cannot actually be greater than 15 bytes in length.

Figure 10-14. 80x86 instruction encoding

The prefix bytes are not the same as the opcode expansion prefix byte that we discussed in the previous section. Instead, the 80x86 prefix bytes modify the behavior of existing instructions. An instruction may have a maximum of four prefix bytes attached to it, but the 80x86 certainly supports more than four different prefix values. Also note that the behaviors of many prefix bytes are mutually exclusive, and the instruction’s results will be undefined if you put a pair of mutually exclusive prefix bytes in front of an instruction. We’ll take a look at a couple of these prefix bytes in a moment.

The 80x86 supports two basic opcode sizes: a standard 1-byte opcode and a 2-byte opcode consisting of a $0F opcode expansion prefix byte and a second byte specifying the actual instruction. One way to think of this opcode expansion prefix byte is as an 8-bit extension of the iii field in the Y86 encoding. This enables the encoding of up to 512 different instruction classes, although the 80x86 does not yet use them all. In reality, various instruction classes use certain bits in the opcode for decidedly non-instruction-class purposes. For example, consider the add instruction opcode. It takes the form shown in Figure 10-15.

Figure 10-15. 80x86 add opcode

Bit 1 (d) specifies the direction of the transfer. If this bit is zero, then the destination operand is a memory location, such as in add(al,[ebx]);. If this bit is one, the destination operand is a register, as in add([ebx],al);.

Note that bit 0 (s) specifies the size of the operands the add instruction operates upon. There is a problem here, however. The 80x86 family supports up to three different operand sizes: 8-bit operands, 16-bit operands, and 32-bit operands. With a single size bit, the instruction can only encode two of these three different sizes. In modern (32-bit) operating systems, the vast majority of operands are either 8 bits or 32 bits, so the 80x86 CPU uses the size bit in the opcode to encode 8-bit or 32-bit sizes. For 16-bit operands, which occur less frequently than 8-bit or 32-bit operands, Intel uses a special opcode prefix byte to specify the size. As long as instructions that have 16-bit operands occur less than one out of every eight instructions (and this is generally true), this is more compact than adding another bit to the instruction’s size. Using a size prefix byte allowed Intel’s designers to extend the number of operand sizes without having to change the instruction encoding inherited from the original 16-bit processors in this CPU family.
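A few concrete encodings show how the d and s bits and the operand-size prefix (the byte $66) work together. Using the same operand order as the rest of this chapter, with the destination in a register:

add( bl, al );    // $02 $C3 : s = 0 (8-bit operands),  d = 1 (register destination)
add( ebx, eax );  // $03 $C3 : s = 1 (32-bit operands), d = 1
add( bx, ax );    // $66 $03 $C3 : same opcode, preceded by the $66 operand-size prefix

The trailing $C3 byte is the mod-reg-r/m byte described in the next section (mod = %11, reg = %000, r/m = %011).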

10.4.1 Encoding Instruction Operands

The mod-reg-r/m byte (see Figure 10-14) encodes the instruction’s operands, specifying a register operand and the addressing mode used to access the other operand. This byte contains the fields shown in Figure 10-16.

Figure 10-16. mod-reg-r/m byte

The reg field almost always specifies an 80x86 register. However, depending on the instruction, the register specified by reg can be either the source or the destination operand. To distinguish between the two possibilities, many instructions have the d (direction) field in their opcode that contains a value of zero when reg is the source and a value of one when reg is the destination operand.

This field uses the 3-bit register encodings found in Table 10-1. As you just learned, the size bit in the instruction’s opcode specifies whether the reg field specifies an 8- or 32-bit register (when operating under a modern, 32-bit operating system). To make the reg field specify a 16-bit register requires setting the size bit in the opcode to one, as well as adding an additional prefix byte.

Table 10-1. reg Field Encodings

reg Value   Register If Data Size Is 8 Bits   Register If Data Size Is 16 Bits   Register If Data Size Is 32 Bits
%000        al                                ax                                 eax
%001        cl                                cx                                 ecx
%010        dl                                dx                                 edx
%011        bl                                bx                                 ebx
%100        ah                                sp                                 esp
%101        ch                                bp                                 ebp
%110        dh                                si                                 esi
%111        bh                                di                                 edi

With the d bit in the opcode of a two-operand instruction determining whether the reg field contains the source operand or the destination operand, the mod and r/m fields combine to specify the other of the two operands. In the case of a single-operand instruction like not or neg, the reg field will contain an opcode extension and mod and r/m will combine to specify the only operand. The mod and r/m fields together specify the operand addressing modes listed in Table 10-2 and Table 10-3.

Table 10-2. mod Field Encodings

mod   Description
%00   Specifies the register indirect addressing mode (with two exceptions: the scaled indexed [sib] addressing modes with no displacement operand when r/m = %100, and the displacement-only addressing mode when r/m = %101).
%01   Specifies that a 1-byte signed displacement follows the addressing mode byte(s).
%10   Specifies that a 4-byte signed displacement follows the addressing mode byte(s).
%11   Specifies direct register access.

Table 10-3. mod-r/m Encodings

mod   r/m    Addressing Mode
%00   %000   [eax]
%01   %000   [eax+disp8]
%10   %000   [eax+disp32]
%11   %000   al, ax, or eax
%00   %001   [ecx]
%01   %001   [ecx+disp8]
%10   %001   [ecx+disp32]
%11   %001   cl, cx, or ecx
%00   %010   [edx]
%01   %010   [edx+disp8]
%10   %010   [edx+disp32]
%11   %010   dl, dx, or edx
%00   %011   [ebx]
%01   %011   [ebx+disp8]
%10   %011   [ebx+disp32]
%11   %011   bl, bx, or ebx
%00   %100   Scaled indexed (sib) mode
%01   %100   sib + disp8 mode
%10   %100   sib + disp32 mode
%11   %100   ah, sp, or esp
%00   %101   Displacement-only mode (32-bit displacement)
%01   %101   [ebp+disp8]
%10   %101   [ebp+disp32]
%11   %101   ch, bp, or ebp
%00   %110   [esi]
%01   %110   [esi+disp8]
%10   %110   [esi+disp32]
%11   %110   dh, si, or esi
%00   %111   [edi]
%01   %111   [edi+disp8]
%10   %111   [edi+disp32]
%11   %111   bh, di, or edi

There are a couple of interesting things to note about Table 10-2 and Table 10-3. First, there are two different forms of the [reg+disp] addressing modes: one form with an 8-bit displacement and one form with a 32-bit displacement. Addressing modes whose displacement falls in the range −128..+127 require only a single byte after the opcode to encode the displacement. Instructions with a displacement that falls within this range will be shorter and sometimes faster than instructions whose displacement values are not within this range and thus require the addition of four bytes after the opcode.
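An assembler normally makes this choice automatically by looking at the displacement’s value. Here is a sketch of the decision in C; the emit routines are invented for illustration, and a real assembler would also use mod = %00 when the displacement is zero and the base register isn’t EBP:

#include <stdint.h>

void emit8(uint8_t b);        /* assumed output routines */
void emit32(uint32_t dword);

/* Emit the mod-reg-r/m byte and displacement for a [reg+disp] operand. */
void emit_mod_and_disp(int32_t disp, uint8_t reg, uint8_t rm)
{
    if (disp >= -128 && disp <= 127) {
        emit8((uint8_t)(0x40 | (reg << 3) | rm));   /* mod = %01: disp8 follows  */
        emit8((uint8_t)disp);
    } else {
        emit8((uint8_t)(0x80 | (reg << 3) | rm));   /* mod = %10: disp32 follows */
        emit32((uint32_t)disp);
    }
}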

The second thing to note is that there is no [ebp] addressing mode. If you look in Table 10-3 where this addressing mode logically belongs (where r/m is %101 and mod is %00), you’ll find that its slot is occupied by the 32-bit displacement-only addressing mode. The basic encoding scheme for addressing modes didn’t allow for a displacement-only addressing mode, so Intel “stole” the encoding for [ebp] and used that for the displacement-only mode. Fortunately, anything you can do with the [ebp] addressing mode you can also do with the [ebp+disp8] addressing mode by setting the 8-bit displacement to zero. True, such an instruction is a little bit longer than it would otherwise need to be if the [ebp] addressing mode existed, but the same capabilities are still there. Intel wisely chose to replace this particular register indirect addressing mode because they anticipated that programmers would use it less often than they would use the other register indirect addressing modes.

Another thing you’ll notice missing from this table are addressing modes of the form [esp], [esp+disp8], and [esp+disp32]. Intel’s designers borrowed the encodings for these three addressing modes to support the scaled indexed addressing modes they added to their 32-bit processors in the 80x86 family.

If r/m = %100 and mod = %00, then this specifies an addressing mode of the form [reg132+reg232*n]. This scaled index addressing mode computes the final address in memory as the sum of reg2 multiplied by n (n = 1, 2, 4, or 8) and reg1. Programs most often use this addressing mode when reg1 is a pointer holding the base address of an array of bytes (n = 1), words (n = 2), double words (n = 4), or quad words (n = 8) and reg2 holds the index into that array.

If r/m = %100 and mod = %01, then this specifies an addressing mode of the form [reg132+reg232*n + disp8]. This scaled index addressing mode computes the final address in memory as the sum of reg2 multiplied by n (n = 1, 2, 4, or 8), reg1, and the 8-bit signed displacement (sign extended to 32 bits). Programs most often use this addressing mode when reg1 is a pointer holding the base address of an array of records, reg2 holding the index into that array, and disp8 providing the offset to a desired field in the record.

If r/m = %100 and mod = %10, then this specifies an addressing mode of the form [reg1+reg2*n+disp32], where reg1 and reg2 are 32-bit registers. This scaled indexed addressing mode computes the final address in memory as the sum of reg2 multiplied by n (n = 1, 2, 4, or 8), reg1, and the 32-bit signed displacement. Programs most often use this addressing mode to index into static arrays of bytes, words, double words, or quad words.
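
As a concrete illustration of the arithmetic (a sketch, with made-up base, size, and offset values rather than anything from the original text), the following fragment mirrors what the CPU computes for the [reg1+reg2*n+disp8] form in the array-of-records case:

    # Sketch of the address arithmetic for the [reg1 + reg2*n + disp8] scaled indexed mode.
    # The array base, record size, and field offset below are made-up example values.
    def effective_address(reg1, reg2, n, disp=0):
        assert n in (1, 2, 4, 8)            # the only legal scale factors
        return reg1 + reg2 * n + disp

    array_base   = 0x00401000    # reg1: base address of an array of 8-byte records
    index        = 3             # reg2: index of the record we want
    record_size  = 8             # n: scale factor (quad-word-sized records)
    field_offset = 4             # disp8: offset of the desired field within a record
    print(hex(effective_address(array_base, index, record_size, field_offset)))   # 0x40101c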

If values corresponding to one of the sib modes appear in the mod and r/m fields, then the addressing mode is a scaled indexed addressing mode with a second byte (the sib) following the mod-reg-r/m byte, though don’t forget that the mod field still specifies a displacement size of zero, one, or four bytes. Figure 10-17 shows the layout of this extra sib, and Table 10-4, Table 10-5, and Table 10-6 explain the values for each of the sib fields.


Figure 10-17. The sib (scaled index byte) layout

Table 10-4. Scale Values

Scale Value    Index*Scale Value
%00            Index*1
%01            Index*2
%10            Index*4
%11            Index*8

Table 10-5. Register Values for sib Encoding

Index Value    Register
%000           EAX
%001           ECX
%010           EDX
%011           EBX
%100           Illegal
%101           EBP
%110           ESI
%111           EDI

Table 10-6. Base Register Values for sib Encoding

Base Value     Register
%000           EAX
%001           ECX
%010           EDX
%011           EBX
%100           ESP
%101           Displacement only if mod = %00, EBP if mod = %01 or %10
%110           ESI
%111           EDI
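
Putting Tables 10-4 through 10-6 together: the sib byte is simply the scale, index, and base fields packed into 2, 3, and 3 bits, respectively. Here is a small sketch (not from the original text) that packs the sib byte for the [ebx+edi*4] operand used later in Figure 10-25:

    # Sketch: packing a sib byte from its scale (2 bits), index (3 bits), and base (3 bits) fields.
    def sib(scale, index, base):
        return (scale << 6) | (index << 3) | base

    # [ebx+edi*4]: scale = %10 (index*4), index = %111 (EDI), base = %011 (EBX)
    print(f"{sib(0b10, 0b111, 0b011):02X}")    # prints: BB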

The mod-reg-r/m and sib bytes are complex and convoluted, no question about that. The reason these addressing mode bytes are so convoluted is that Intel reused its 16-bit addressing circuitry when it switched to the 32-bit format rather than simply abandoning the 16-bit circuitry at that point. There were good hardware reasons for doing this, but the result is a complex scheme for specifying addressing modes.

Note that if the r/m field of the mod-reg-r/m byte contains %100 and mod does not contain %11, then the addressing mode is a sib mode rather than the expected [esp], [esp+disp8], or [esp+disp32] mode. In this case the compiler or assembler will emit an extra sib byte immediately following the mod-reg-r/m byte. Table 10-7 lists the various combinations of legal scaled indexed addressing modes on the 80x86.

Table 10-7. The Scaled Indexed Addressing Modes

mod    Index    Legal Scaled Indexed Addressing Modes[a]
%00    %000     [base32+eax*n]              (Base ≠ %101)
       %001     [base32+ecx*n]
       %010     [base32+edx*n]
       %011     [base32+ebx*n]
       %100     n/a[b]
       %101     [base32+ebp*n]
       %110     [base32+esi*n]
       %111     [base32+edi*n]
%00    %000     [disp32+eax*n]              (Base = %101)[c]
       %001     [disp32+ecx*n]
       %010     [disp32+edx*n]
       %011     [disp32+ebx*n]
       %100     n/a
       %101     [disp32+ebp*n]
       %110     [disp32+esi*n]
       %111     [disp32+edi*n]
%01    %000     [disp8+base32+eax*n]
       %001     [disp8+base32+ecx*n]
       %010     [disp8+base32+edx*n]
       %011     [disp8+base32+ebx*n]
       %100     n/a
       %101     [disp8+base32+ebp*n]
       %110     [disp8+base32+esi*n]
       %111     [disp8+base32+edi*n]
%10    %000     [disp32+base32+eax*n]
       %001     [disp32+base32+ecx*n]
       %010     [disp32+base32+edx*n]
       %011     [disp32+base32+ebx*n]
       %100     n/a
       %101     [disp32+base32+ebp*n]
       %110     [disp32+base32+esi*n]
       %111     [disp32+base32+edi*n]

[a] The base32 register can be any of the 80x86 32-bit general-purpose registers, as specified by the base field.

[b] The 80x86 does not allow a program to use ESP as an index register.

[c] The 80x86 doesn’t support an [ebp+index*scale] addressing mode (base = %101 with mod = %00 selects the displacement-plus-index form instead), but you can achieve the same effective address using [disp8+ebp+index*scale] with an 8-bit displacement value of zero.

In each of the addressing modes appearing in Table 10-7, the mod field of the mod-reg-r/m byte specifies the size of the displacement (zero, one, or four bytes). The base and index fields of the sib specify the base and index registers, respectively. Note that this addressing mode does not allow the use of the ESP register as an index register. Presumably, Intel left this particular mode undefined to provide the ability to extend the addressing modes to three bytes in a future version of the CPU, although doing so seems a bit extreme.

Just as the mod-reg-r/m encoding replaced the [ebp] addressing mode with a displacement-only mode, the sib addressing format replaces the [ebp+index*scale] mode with a displacement-plus-index mode (that is, one with no base register). If it turns out that you really need the [ebp+index*scale] addressing mode, you will have to use the [disp8+ebp+index*scale] mode instead, specifying a one-byte displacement value of zero.

10.4.2 Encoding the add Instruction — Some Examples

Some examples may prove useful in figuring out how to encode an instruction using this complex scheme. Therefore, let’s look at how to encode the 80x86 add instruction using various addressing modes. The add opcode is $00, $01, $02, or $03, depending on the direction and size bits in the opcode (see Figure 10-15). Figure 10-18 through Figure 10-25 show how to encode various forms of the add instruction using different addressing modes.


Figure 10-18. Encoding the add( al, cl ); instruction

There is an interesting side effect of the mod-reg-r/m organization and the way the direction bit works: some instructions have two different legal encodings. For example, we could also encode the add( al, cl ); instruction shown in Figure 10-18 as $02, $C8 by reversing the positions of the AL and CL registers in the reg and r/m fields and then setting the d bit (bit 1) in the opcode to one. This applies to all instructions with two register operands and a direction bit, such as the add( eax, ecx ); instruction in Figure 10-19, which we could also encode as $03, $C8.
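
The following sketch makes the two encodings concrete. The $00, $C1 sequence for the d = 0 form is inferred from the discussion; only the $02, $C8 alternative is stated explicitly above:

    # Sketch: the two legal encodings of add( al, cl );   (that is, cl := cl + al)
    def modregrm(mod, reg, rm):
        return (mod << 6) | (reg << 3) | rm

    AL, CL = 0b000, 0b001
    # d = 0 (opcode $00): reg holds the source (AL), r/m holds the destination (CL)
    enc_d0 = (0x00, modregrm(0b11, AL, CL))    # -> 00 C1
    # d = 1 (opcode $02): reg holds the destination (CL), r/m holds the source (AL)
    enc_d1 = (0x02, modregrm(0b11, CL, AL))    # -> 02 C8
    for enc in (enc_d0, enc_d1):
        print(" ".join(f"{b:02X}" for b in enc))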


Figure 10-19. Encoding the add( eax, ecx ); instruction


Figure 10-20. Encoding the add( disp, edx ); instruction


Figure 10-21. Encoding the add( [ebx], edi ); instruction


Figure 10-22. Encoding the add( [esi+disp8], eax ); instruction


Figure 10-23. Encoding the add( [ebp+disp32], ebx ); instruction


Figure 10-24. Encoding the add( [disp32+eax*1], ebp ); instruction


Figure 10-25. Encoding the add( [ebx+edi*4], ecx ); instruction
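
To make a couple of these figures concrete, here is a sketch (not taken from the figures themselves) of the byte sequences the [ebx] and [ebx+edi*4] examples work out to. Both assume the register-destination (d = 1) form of the opcode, which is the only form that permits a memory source:

    # Sketch: byte sequences for two of the addressing modes illustrated above.
    def modregrm(mod, reg, rm):
        return (mod << 6) | (reg << 3) | rm

    def sib(scale, index, base):
        return (scale << 6) | (index << 3) | base

    # add( [ebx], edi );  opcode $03 (d = 1), mod = %00, reg = edi, r/m = ebx
    print("03", f"{modregrm(0b00, 0b111, 0b011):02X}")                    # 03 3B

    # add( [ebx+edi*4], ecx );  r/m = %100 selects a sib byte
    print("03", f"{modregrm(0b00, 0b001, 0b100):02X}",
                f"{sib(0b10, 0b111, 0b011):02X}")                         # 03 0C BB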

10.4.3 Encoding Immediate Operands

You may have noticed that the mod-reg-r/m and sib bytes don’t contain any bit combinations you can use to specify that an instruction contains an immediate operand. The 80x86 uses a completely different opcode to specify an immediate operand. Figure 10-26 shows the basic encoding for an add immediate instruction.


Figure 10-26. Encoding an add immediate instruction

There are three major differences between the encoding of the add immediate instruction and the standard add instruction. First, and most important, the opcode has a one in the HO bit position. This tells the CPU that the instruction has an immediate constant. This one change alone, however, does not tell the CPU that it must execute an add instruction, as you’ll see shortly.

The second difference is that there is no direction bit in the opcode. This makes sense because you cannot specify a constant as a destination operand. Therefore, the destination operand is always the location specified by the mod and r/m bits in the mod-reg-r/m field.

In place of the direction bit, the opcode has a sign extension (x) bit. For 8-bit operands, the CPU ignores the sign extension bit. For 16-bit and 32-bit operands, the sign-extension bit specifies the size of the constant following the add instruction. If the sign extension bit contains zero, the constant is already the same size as the operand (either 16 or 32 bits). If the sign-extension bit contains one, the constant is a signed 8-bit value, and the CPU sign extends this value to the appropriate size before adding it to the operand. This little trick often makes programs quite a bit shorter because one commonly adds small constants to 16- or 32-bit destination operands.

The third difference between the add immediate and the standard add instruction is the meaning of the reg field in the mod-reg-r/m byte. Because the instruction implies that the source operand is a constant and the mod and r/m fields specify the destination operand, the instruction does not need the reg field to specify an operand. Instead, the 80x86 CPU uses these three bits as an opcode extension. For the add immediate instruction, these three bits must contain zero; a different bit pattern in this field selects a different instruction.

Note that when adding a constant to a memory location, any displacement associated with the memory location immediately precedes the constant data in the instruction sequence.
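
As a sketch of how these rules combine (the opcode values $81 for x = 0 and $83 for x = 1 are the standard 80x86 add-immediate opcodes; they are not spelled out in the text above):

    # Sketch: assembling three add-immediate forms by hand.
    def modregrm(mod, reg, rm):
        return (mod << 6) | (reg << 3) | rm

    ADD_EXT = 0b000    # the reg-field "opcode extension" value for add immediate

    # add( 5, ecx );  x = 1 (opcode $83): the 8-bit constant is sign extended to 32 bits
    enc = [0x83, modregrm(0b11, ADD_EXT, 0b001), 0x05]
    print(" ".join(f"{b:02X}" for b in enc))      # 83 C1 05

    # add( $12345678, ecx );  x = 0 (opcode $81): a full 32-bit constant follows (little endian)
    enc = [0x81, modregrm(0b11, ADD_EXT, 0b001)] + list((0x12345678).to_bytes(4, "little"))
    print(" ".join(f"{b:02X}" for b in enc))      # 81 C1 78 56 34 12

    # add( 5, [ebx+8] );  the displacement byte ($08) precedes the constant ($05)
    enc = [0x83, modregrm(0b01, ADD_EXT, 0b011), 0x08, 0x05]
    print(" ".join(f"{b:02X}" for b in enc))      # 83 43 08 05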

10.4.4 Encoding 8-, 16-, and 32-Bit Operands

When Intel designed the 8086, they used one opcode bit (s) to specify whether the operand sizes were 8 or 16 bits. Later, when they extended the 80x86 architecture to 32 bits with the introduction of the 80386, they had a problem: with this single operand size bit they could only encode two sizes, but they needed to encode three (8, 16, and 32 bits). To solve this problem, they used an operand-size prefix byte.

Intel studied its instruction set and came to the conclusion that in a 32-bit environment, programs were likely to use 8-bit and 32-bit operands far more often than 16-bit operands. Therefore, Intel decided to let the size bit (s) in the opcode select between 8- and 32-bit operands, as the previous sections describe. Although modern 32-bit programs don’t use 16-bit operands very often, they do need them now and then. To allow for 16-bit operands, Intel lets you prefix a 32-bit instruction with the operand size prefix byte, whose value is $66. This prefix byte tells the CPU that the operands contain 16-bit data rather than 32-bit data.

You do not have to explicitly put an operand-size prefix byte in front of your 16-bit instructions; the assembler or compiler will take care of this automatically for you. However, do keep in mind that whenever you use a 16-bit object in a 32-bit program, the instruction is one byte longer because of the prefix. Therefore, you should be careful about using 16-bit instructions if size and, to a lesser extent, speed are important.
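
A quick sketch of the cost, reusing the $83 add-immediate encoding from the previous section:

    # Sketch: the operand-size prefix ($66) adds one byte to a 16-bit operation.
    add_ecx_5 = [0x83, 0xC1, 0x05]           # add( 5, ecx );  3 bytes
    add_cx_5  = [0x66, 0x83, 0xC1, 0x05]     # add( 5, cx );   4 bytes -- same encoding plus the prefix
    for enc in (add_ecx_5, add_cx_5):
        print(len(enc), "bytes:", " ".join(f"{b:02X}" for b in enc))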

10.4.5 Alternate Encodings for Instructions

As noted earlier in this chapter, one of Intel’s primary design goals for the 80x86 was to create an instruction set to allow programmers to write very short programs in order to save memory, which was precious at the time. One way they did this was to create alternative encodings of some very commonly used instructions. These alternative instructions were shorter than their standard counterparts, and Intel hoped that programmers would make extensive use of the shorter versions, thus creating shorter programs.

A good example of these alternative instructions is the set of add( constant, accumulator ); instructions, where the accumulator is AL, AX, or EAX. The 80x86 provides 1-byte opcodes for add( constant, al ); and add( constant, eax );, which are $04 and $05, respectively. With a 1-byte opcode and no mod-reg-r/m byte, these instructions are one byte shorter than their standard add immediate counterparts. The add( constant, ax ); instruction requires an operand-size prefix, so its opcode is effectively two bytes if you count the prefix byte. This, however, is still one byte shorter than the corresponding standard add immediate.
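
For example (a sketch comparing byte counts; the standard encodings shown use the $80 and $81 add-immediate opcodes):

    # Sketch: accumulator-specific encodings vs. the general add-immediate encodings.
    examples = [
        # add( 5, al ):  short form ($04 imm8) vs. standard form ($80, mod-reg-r/m, imm8)
        ("add( 5, al )",       [0x04, 0x05],                    [0x80, 0xC0, 0x05]),
        # add( $30D40, eax ):  short form ($05 imm32) vs. standard form ($81, mod-reg-r/m, imm32)
        ("add( $30D40, eax )", [0x05, 0x40, 0x0D, 0x03, 0x00],  [0x81, 0xC0, 0x40, 0x0D, 0x03, 0x00]),
    ]
    for name, short, std in examples:
        fmt = lambda seq: " ".join(f"{b:02X}" for b in seq)
        print(f"{name}:  {fmt(short)} ({len(short)} bytes)  vs.  {fmt(std)} ({len(std)} bytes)")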

You do not have to specify anything special to use these instructions. Any decent assembler or compiler will automatically choose the shortest possible instruction it can use when translating your source code into machine code. However, note that Intel only provides these alternative encodings for the accumulator registers. Therefore, if you have a choice of several registers to use and the accumulator (AL, AX, or EAX) is among those choices, it is often your best bet. This option, of course, is usually only available to assembly language programmers.

10.5 Implications of Instruction Set Design to the Programmer

Upon initial inspection, it would seem that instruction set design is of little interest to programmers who simply want to write great code, rather than design their own instruction sets. However, only by knowing the computer’s architecture and, in particular, how the CPU encodes machine instructions, can a programmer make the most efficient use of the machine’s instructions. By studying instruction set design, a programmer can gain a clear understanding of the following:

  • Why some instructions are shorter than others

  • Why some instructions are faster than others

  • Which constant values the CPU can handle efficiently

  • Whether constants are more efficient than memory locations

  • Why certain arithmetic and logical operations are more efficient than others

  • Which types of arithmetic expressions are more easily translated into machine code than other types

  • Why code is less efficient if it transfers control over a large distance in the object code

. . . and so on.

By studying instruction set design, a programmer becomes more aware of the implications of the code they write (even in a high-level language) with respect to efficient operation on the CPU. Armed with this knowledge, the programmer can write great code.

10.6 For More Information

Like the previous chapter, this chapter deals with issues of computer architecture. There are many college and professional texts on computer architecture that will provide additional information about instruction set design and the choices and trade-offs one must make when designing an instruction set. Hennessy and Patterson’s Computer Architecture: A Quantitative Approach is one of the better-regarded texts on this subject (though it is a bit heavily biased toward RISC architectures). The data books for your particular CPU will discuss the instruction encoding for that CPU and detail the operation of each machine instruction the CPU supports. A good text on assembly language programming may also prove helpful when learning more about instruction set design (such as my text for Intel 80x86 users, The Art of Assembly Language). For details on the design of the 80x86 instruction set, you’ll want to check out Intel’s Pentium manuals, available online at http://www.intel.com. Of course, similar information is available at just about every CPU manufacturer’s website.



[40] Actually, Intel claims it’s a 1-byte opcode plus a 1-byte mod-reg-r/m byte. For our purposes, we’ll treat the mod-reg-r/m byte as part of the opcode.

[41] The Y86 processor only performs unsigned comparisons.

[42] We could also have used values $F7, $EF, and $E7 as they also correspond to an attempt to store a register into a constant. However, $FF is easier to decode. On the other hand, if you need even more prefix bytes for instruction expansion, you can use these three values as well.
