In Section 6.1.6, we demonstrated how addition can be based on logic. Choosing two's complement representation for signed integers makes subtraction a special case of addition that is relatively easy to implement at the hardware level. That choice avoids complications associated with positive and negative variants of zero in sign and magnitude representation.
In this section, we extend the exploration of arithmetic founded on logic. First we look at an early algorithm that accomplishes multiplication by using addition. Then we look at division problems, where the divisor is a constant. The divisor is converted into a reciprocal to be used in a multiplication operation that is mathematically equivalent to the division.
Repeated addition can accomplish multiplication (N × X = X + X + X for N = 3, etc.), but requires an accumulator that is double the width of the operands. Repeated addition using a counted loop takes a long time for large values of N, and each addition can generate carries.
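As a point of reference, the repeated-addition scheme can be sketched in a few lines of Python (the function name is ours):

```python
def multiply_by_repeated_addition(n: int, x: int) -> int:
    """Compute n * x as n successive additions of x (n >= 0).

    In hardware, acc would need to be double the operand width."""
    acc = 0
    for _ in range(n):
        acc += x
    return acc
```

For large n the loop runs n times, which is exactly the performance problem noted above.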
Performing “long multiplication” on decimal numbers involves a combination of individual multiplication steps and an overall addition, with multiple carries, that is guided by positional weights (Section 1.8.1). Long multiplication for binary numbers works analogously. The multiplication steps are trivial (either 0 × X or 1 × X), but the “troublesome” carries (Section 6.1.6) are still present.
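Binary long multiplication reduces to shift-and-add, which can be sketched as follows (a minimal Python illustration of the idea, not from the text):

```python
def long_multiply(n: int, x: int) -> int:
    """Binary long multiplication for nonnegative integers:
    each 1 bit of the multiplier n contributes a copy of the
    multiplicand x shifted to that bit's position."""
    product = 0
    shift = 0
    while n:
        if n & 1:                  # trivial step: 1 * x
            product += x << shift  # the troublesome carries live here
        n >>= 1                    # move to the next multiplier bit
        shift += 1
    return product
```

The per-bit multiplication steps are trivial, but every `+=` can propagate carries across the full width of the partial product.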
During the course of his development of an early computer architecture, Andrew Donald Booth first explained a method in which the carries cannot build up. More importantly, the algorithm itself can proceed in identical fashion regardless of the signs or order of the two's complement operands (X = multiplicand, N = multiplier).
Our presentation of Booth's algorithm uses a notational approach introduced by John Peatman, which we demonstrate with 8-bit numbers. The multiplier can be written as:
N = –128ns + 64n6 + 32n5 + 16n4 + 8n3 + 4n2 + 2n1 + n0
where ns is the sign bit. This expression can be rewritten in the following “clever” way:
N = –128(ns–n6) – 64(n6–n5) – 32(n5–n4) – 16(n4–n3) – 8(n3–n2) – 4(n2–n1) – 2(n1–n0) – (n0–0)
where all of the terms now have the same form. In an 8-bit number, the sign bit ns is obviously n7.
Since each ni bit is a coefficient taking a value of 0 or 1 in a weighted positional notation, each symbolic difference in parentheses will be +1, –1, or 0. These values constitute the recipe for adding, subtracting, or ignoring each multiple 2^i X needed to compose the product P = N × X.
It is less obvious, but much more important, to observe that adjacent nonignorable operations alternate in sign (addition or subtraction) as a partial product P is being accumulated. Consider the number N = 0b00001011 with any X:
Criterion | Term | Partial product
---|---|---
 | starting condition | P = 0
10 | –1(n0–0) = –1 | P = –1X
11 | –2(n1–n0) = 0 | no change
01 | –4(n2–n1) = +4 | P = +3X
10 | –8(n3–n2) = –8 | P = –5X
01 | –16(n4–n3) = +16 | P = +11X
00 | –2^i(ni–ni–1) = 0 | all the rest yield 0
where the criterion column (examined below) juxtaposes bits ni and ni–1. Note also the special first case, which pairs n0 with 0. The correct sum is produced for the multiplier N = 11 (decimal) and any X.
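Peatman's rewriting can be checked mechanically. The following Python sketch (ours) sums the terms –2^i (ni – ni–1), with n–1 taken as 0, and recovers the two's complement value of any 8-bit multiplier:

```python
def recoded_value(n: int, w: int = 8) -> int:
    """Sum the Booth-recoded terms -2^i * (n_i - n_{i-1}),
    with the special first case n_{-1} = 0."""
    total = 0
    prev = 0                       # n_{-1} = 0
    for i in range(w):
        bit = (n >> i) & 1
        total += -(1 << i) * (bit - prev)
        prev = bit
    return total
```

For n = 0b00001011 this yields 11, and for n = 0b11111111 it yields –1, the two's complement reading.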
Recall that addition of numbers with unlike signs can never exceed the bit-width of the representation. The strict alternation of signs—that is, addition and subtraction—keeps the partial product P from growing any larger than the width of N. Stated differently, a widening P coupled with a narrowing N always fits in 2w bits, where w is the width of the numeric representation.
The schematic arrangement for the Booth algorithm in Figure 6-4 has the following features: one register of width w bits to hold the multiplicand X; a double register of width 2w bits, split into L and R halves, where L is initialized to zero and R contains the multiplier N; and additional storage for a single bit, ni–1, the bit most recently shifted out of the LR double register. That single bit is initialized to zero.
Shifting L (and R) to the right, relative to a stationary X, is equivalent to writing down the partial products of X that are shifted to the left in standard long multiplication.
The criterion for addition or subtraction can be recognized as the appropriate entry in Table 6-4, corresponding to the current multiplier bit ni and the previous multiplier bit ni–1.
Bit ni | Bit ni–1 | Action |
---|---|---|
0 | 0 | do nothing; then shift right |
0 | 1 | add X; then shift right |
1 | 0 | subtract X; then shift right |
1 | 1 | do nothing; then shift right |
With the stated initializations and an understanding of the presentation in Figure 6-4 and Table 6-4, Booth's algorithm simply becomes:
repeat w times:
    isolate the current ni for use in the next iteration
    L := L+X, L := L-X, or leave L alone    // Table 6-4
    shift the pattern in LR to the right one bit position
maintain sign extension for the pattern in LR at all times
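Under the register arrangement of Figure 6-4, the loop can be modeled in Python as follows (a sketch under our own naming; x and n are two's complement values of width w):

```python
def booth_multiply(x: int, n: int, w: int = 8) -> int:
    """Booth's algorithm over a double register L:R of width 2w.

    L starts at zero, R holds the multiplier pattern, and the
    single-bit memory (prev) starts at zero."""
    mask = (1 << w) - 1
    L, R, prev = 0, n & mask, 0
    for _ in range(w):
        cur = R & 1                       # isolate the current n_i
        if (cur, prev) == (0, 1):
            L = (L + (x & mask)) & mask   # 01: add X; drop carry-out
        elif (cur, prev) == (1, 0):
            L = (L - (x & mask)) & mask   # 10: subtract X
        # shift LR right one position, sign-extending L
        R = (R >> 1) | ((L & 1) << (w - 1))
        L = (L >> 1) | ((L >> (w - 1)) << (w - 1))
        prev = cur                        # update n_{i-1}
    result = (L << w) | R                 # 2w-bit pattern of the product
    return result - (1 << 2 * w) if result >> (2 * w - 1) else result
```

booth_multiply(13, 11) returns 143, matching the worked trace that follows.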
Carries from the leftmost bit of L are ignored. For N = 11 as given above, this plays out as follows for X = 13 = 0b00001101:
Iteration | L | R | ni–1 | Action
---|---|---|---|---
 | 00000000 | 00001011 | 0 | initialization
1 | 11110011 | 00001011 | 0 | 10: subtract X from L
 | 11111001 | 10000101 | 1 | shift LR (sign extended); update ni–1
2 | 11111001 | 10000101 | 1 | 11: do nothing
 | 11111100 | 11000010 | 1 | shift LR (sign extended); update ni–1
3 | 00001001 | 11000010 | 1 | 01: add X to L
 | 00000100 | 11100001 | 0 | shift LR (sign extended); update ni–1
4 | 11110111 | 11100001 | 0 | 10: subtract X from L
 | 11111011 | 11110000 | 1 | shift LR (sign extended); update ni–1
5 | 00001000 | 11110000 | 1 | 01: add X to L
 | 00000100 | 01111000 | 0 | shift LR (sign extended); update ni–1
6 | 00000100 | 01111000 | 0 | 00: do nothing
 | 00000010 | 00111100 | 0 | shift LR (sign extended); update ni–1
7 | 00000010 | 00111100 | 0 | 00: do nothing
 | 00000001 | 00011110 | 0 | shift LR (sign extended); update ni–1
8 | 00000001 | 00011110 | 0 | 00: do nothing
 | 00000000 | 10001111 | 0 | shift LR (sign extended); update ni–1
with a final result of 0b10001111 = 143 (decimal) = 13 × 11. The L portion is zero only because this particular product is a small positive number.
Can we now convert this algorithm into Itanium instructions using 64-bit operands? Our outline suggests roles for a double-width register, shifting of a double-width intermediate quantity that eventually becomes the product, and a criterion for predicating addition, subtraction, or neither at each stage in a counted loop. Representing the single-bit memory location with register rn, initialized to zero, we express Itanium instructions for the body of the loop:
     and     rt=0x1,rr;;        // Isolate lowest bit of R
     xor     ri=rn,rt;;         // ri = whether to act
     cmp.ne  pi,p0=0,ri         // pi = whether to act
     mov     rn=rt;;            // Save bit for next time
(pi) cmp.eq.unc pa,ps=0,rt;;    // Add, subtract, nop?
(pa) add     rl=rl,rx           // Add X to L
(ps) sub     rl=rl,rx;;         // Subtract X from L
     shrp    rr=rl,rr,1         // New R of shifted LR
     shr     rl=rl,1            // New L of shifted LR
There are countless ways to implement Booth's algorithm in any language or architecture. Some may offer shorter code sequences based on tight iterative loops, but EPIC principles can significantly improve performance by eliminating the branches inside those loops.
We can easily adapt the implementation of Booth's algorithm for circumstances where we know that the two operands are unsigned, rather than signed, integers. In order to see how to make this correction, look back at the leading term in the algebraic sum for the value of the multiplier N (see the beginning of Section 6.5.1).
When the operands are signed, as with the normal implementation of Booth's algorithm, the leading term has the general weighted value –2^(w–1) nw–1, where w is the bit width of the integer representation. When the operands are unsigned, that leading term should have the weighted value +2^(w–1) nw–1. The result from the normal implementation of Booth's algorithm therefore needs to be corrected by +2 × 2^(w–1) nw–1 = +2^w nw–1.
In pseudocode, X is added to L after the last shift operation of the loop if and only if the leading bit nw–1 of N is 1:
     // Unsigned multiplication only
     tbit.nz pn,p0=rn,63        // If bit 63 of N is 1,
(pn) add     rl=rl,rx           // add X after last shift
     // Unsigned multiplication only
Since our development treated X symbolically throughout, the leading (and no longer sign) bit of X can be ignored, as can any carry-out to the left in L.
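The correction can be checked numerically. The sketch below (our own, in Python) forms the signed product that Booth's algorithm would deliver and then adds 2^w X whenever the leading bit of N is set; for the check, the multiplicand X is kept below 2^(w–1), so its pattern reads the same signed or unsigned:

```python
W = 8                                         # illustration width

def signed_val(pattern: int, w: int = W) -> int:
    """Two's complement value of a w-bit pattern."""
    return pattern - (1 << w) if pattern >> (w - 1) else pattern

def unsigned_product(n: int, x: int, w: int = W) -> int:
    """Unsigned product via the signed Booth result plus the
    correction term +2^w * nw-1 * X described in the text."""
    p = signed_val(n, w) * signed_val(x, w)   # what the signed loop yields
    if (n >> (w - 1)) & 1:                    # leading bit of N set?
        p += (1 << w) * x                     # add X "after the last shift"
    return p & ((1 << 2 * w) - 1)             # keep 2w bits
```

For example, N = 0b11001100 (204 unsigned) times X = 13 gives 2652 after correction, where the uncorrected signed product would be –52 × 13 = –676.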
When we introduced integer arithmetic instructions in Chapter 4, we pointed out that Alpha, Itanium, and some other high-performance architectures lack an integer divide instruction. As Bhandarkar has noted, several methods can be used to implement integer division:
- Use of floating-point hardware to convert from integer into floating-point representation, to divide, and then to convert back. Accuracy might be limited by the precision of the floating-point representation (not a problem for the Itanium ISA);
- Iterative testing and subtraction to implement algorithms analogous to decimal long division; and
- Multiplication by the reciprocal of the divisor, and then a shift instruction to normalize the result.
This last method, using a reciprocal, is especially applicable for division by a constant that is known at the time the program is assembled or compiled.
Consider the very common need to divide a binary integer by 10 as part of an algorithm for formatting decimal numbers for printing. The reciprocal of 10 is the repeating fraction 0.0001100110011... (binary). In order to preserve maximum precision, we can instead use the normalized 64-bit fraction 0.CCCCCCCCCCCCCCCD (hexadecimal representation of 0.8) for the multiplication, with a subsequent division by 8. The final hexadecimal digit of the fraction is D rather than C in order to ensure normal rounding, since shifting as a means of division truncates, rather than rounds, the result. According to standard algebra, this division by 8 can be deferred until an extended-precision product has been formed through multiplication of the source number by the normalized fraction. That is, we can use the simple mathematical identities x/10 = 0.1x = 0.8x/8.
We now explain this method in more detail. Since one-tenth is less than one-eighth but greater than one-sixteenth, the representation of 0.1 (decimal) as a binary fraction begins with three zero bits before the first one bit (that is, no one-half, no one-quarter, no one-eighth, one one-sixteenth, etc.). The full binary representation for one-tenth is then found, by a continuation of this process, to contain a repeating bit pattern, …1100…, which corresponds to the hexadecimal character C. With binary numbers and fractions, each shift to the left by one bit position corresponds to a multiplication of the value by a factor of two, and each shift to the right by one bit position corresponds to a division of the value by a factor of two. If we shift the bit pattern representing one-tenth three bit positions to the left, so that those first three zeros are gone, the value becomes 2^3 = 8 times greater; thus the resulting 0.CCC… hexadecimal pattern represents the value eight tenths.
When we wish to divide a value x by ten, we first multiply that value by 0.8 to form an intermediate product 0.8x of highest precision. Then we divide the intermediate product 0.8x by 8 to obtain a result equivalent to x/10. Division by 8 is easily accomplished on any computer that supports an arithmetic shift to the right by three bit positions.
This algorithm is quite easily expressed using Itanium instructions, with the source number in register rx. We imagine some function, umpy8hi, which implements either Booth's algorithm or a sequence of Itanium floating-point instructions to perform 8-byte unsigned multiplication while retaining the high-order portion of a full 128-bit product. For reasons that will unfold below, we need an instruction that yields the high portion. Bhandarkar's suggestion can be expressed as:
DOT8 = 0xCCCCCCCCCCCCCCCD    // 0.8 (finite approx.) (code fragment)
     movl    rf = DOT8;;     // Get reciprocal factor
     umpy8hi rl = rf,rx;;    // rl = high 64 bits
     shr     rq = rl,3;;     // Divide by 8
We now have the desired quotient in register rq. The original source number is still in register rx. If we are doing a modulo calculation and need to know the remainder, we can go on to compute it from rx – 10*rq.
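The whole recipe, quotient and remainder, can be mimicked in Python, using native big-integer arithmetic in place of umpy8hi (the names are ours):

```python
DOT8 = 0xCCCCCCCCCCCCCCCD          # 64-bit fraction just above 0.8

def div10(x: int) -> tuple[int, int]:
    """Quotient and remainder of x divided by 10, for unsigned
    64-bit x, via multiply-by-reciprocal and a 3-bit shift."""
    high = (x * DOT8) >> 64        # high 64 bits of the 128-bit product
    q = high >> 3                  # divide 0.8x by 8
    return q, x - 10 * q           # remainder as in the text
```

The rounded-up final digit D makes the quotient exact for every x below 2^64.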
We can illustrate this method by asking for the result of integer division of 256 by 10 using this program fragment. Since 256 = 2^8, the multiplication of DOT8 by 256 will result in a 128-bit product based on a shift to the left by 8 bits, or 2 hexadecimal digits, where the
low-order 64 bits are:  CCCCCCCCCCCCCD00 (hexadecimal)
high-order 64 bits are: 00000000000000CC (hexadecimal)
Remember that our version of Booth's algorithm retains the low-order 64 bits in R and the high-order 64 bits in L (register rl in the pseudocode just above). Continuing, we now divide the high-order portion by 8,
CC (hexadecimal) = 1100 1100 (binary), which shifts right three places to give 0001 1001 (binary) = 19 (hexadecimal) = 25 (decimal),
thus obtaining the proper result of 25; integer division discards the remainder.
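These specific bit patterns are easy to confirm with a quick Python check (ours):

```python
DOT8 = 0xCCCCCCCCCCCCCCCD          # the reciprocal factor from the text
product = 256 * DOT8               # the 128-bit intermediate product
low = product & ((1 << 64) - 1)    # low-order half: 0xCCCCCCCCCCCCCD00
high = product >> 64               # high-order half: 0xCC
quotient = high >> 3               # divide by 8, giving 25
```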
In developing this methodology, we chose to treat the reciprocal of the known divisor as a binary fraction and the multiplicand (dividend) as a whole number. Accordingly, the split between the whole number and the fraction was located exactly between the high- and low-order 64-bit portions of the full product. Picking off the integer part of the quotient then simply meant choosing the contents of register L in our illustration of Booth's algorithm. Among computer architectures with machine instructions for multiplication, some provide the low- and high-order segments in separate registers from the execution of a single instruction, while others have two different opcodes that specify which segment is placed into a destination register.