84 5. FIXED-POINT VS. FLOATING-POINT
-0.25, respectively. The result is -0.1875, which falls within the fractional range. Notice that
the hardware generates the same 1’s and 0’s; what is different is the interpretation of the bits.
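[Figure 5.4. Integer interpretation: multiplier 1110 (-2), product 11110100 (-12). Fractional interpretation: multiplier 1.110 (-0.25), product 11.110100 (-0.1875). Figure notes: the two leading bits of the product are the extended sign bit and the sign bit; since the MSB is a sign bit, the corresponding partial product is the 2’s complement of the multiplicand; the stored result is the best approximation in 4-bit memory.]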
Figure 5.4: Binary and fractional multiplication.
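The point that the same bits carry different values under different interpretations can be checked with a minimal sketch. The helper names `signed_value` and `fractional_value` are hypothetical and introduced only for illustration; the bit patterns are the ones shown in Figure 5.4.

```python
def signed_value(bits: int, width: int) -> int:
    """Interpret the low `width` bits of `bits` as a 2's-complement integer."""
    bits &= (1 << width) - 1
    if bits & (1 << (width - 1)):      # sign bit set: value is negative
        return bits - (1 << width)
    return bits


def fractional_value(bits: int, width: int, frac_bits: int) -> float:
    """Interpret the same bits with an imaginary binary point `frac_bits` from the right."""
    return signed_value(bits, width) / (1 << frac_bits)


# 4-bit multiplier from Figure 5.4: 1110 as an integer, 1.110 as a fraction
multiplier_int = signed_value(0b1110, 4)
multiplier_frac = fractional_value(0b1110, 4, 3)

# 8-bit product from Figure 5.4: 11110100 as an integer, 11.110100 as a fraction
product_int = signed_value(0b11110100, 8)
product_frac = fractional_value(0b11110100, 8, 6)
```

Only the divisor applied at the end differs between the two readings; the stored bit pattern is identical.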
When multiplying QN numbers, it should be remembered that the result will consist of
2N fractional bits, one sign bit, and one or more extended sign bits. Based on the data type used,
the result has to be shifted accordingly. If two Q15 numbers are multiplied, the result will be
32 bits wide, with the MSB being the extended sign bit followed by the sign bit. The imaginary
decimal point will be after the 30th bit. After discarding the extended sign bit with a 1-bit left
shift, a right shift of 16 is required to store the result in a 16-bit memory location as a Q15
number. It should be realized that some precision is lost, of course, as a result of discarding the
smaller fractional bits. Since only 16 bits can be stored, the shifting allows one to retain the
higher precision fractional bits. If a 32-bit storage capability is available, a left shift of 1 can be
performed to remove the extended sign bit and store the result as a Q31 number.
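The shifting described above can be sketched as follows. This is an illustrative sketch rather than a reference implementation: the function names are hypothetical, Python integers stand in for the 16-bit and 32-bit registers, and the single corner case -1 x -1 (whose result +1 is not representable in Q15) is deliberately left unhandled.

```python
def to_q15(x: float) -> int:
    """Quantize a value in [-1, 1) to a Q15 integer (assumed rounding: nearest)."""
    q = int(round(x * 32768))
    return max(-32768, min(32767, q))


def from_q15(q: int) -> float:
    """Convert a Q15 integer back to its fractional value."""
    return q / 32768.0


def q15_mul_to_q31(a: int, b: int) -> int:
    """Multiply two Q15 integers and keep the full-precision Q31 result."""
    product = a * b          # extended sign bit, sign bit, 30 fractional bits
    return product << 1      # 1-bit left shift discards the extended sign bit


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 integers and return a truncated Q15 result."""
    q31 = q15_mul_to_q31(a, b)
    return q31 >> 16         # keep the 16 most significant bits


a = to_q15(0.75)
b = to_q15(-0.25)
result_q15 = q15_mul(a, b)
result_q31 = q15_mul_to_q31(a, b)
```

Here the product is exact in both formats; with operands whose product needs more than 15 fractional bits, the Q15 result loses the low-order bits while the Q31 result keeps them.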
To further understand a possible precision loss when manipulating Q-format numbers,
let us consider another example where two Q3.12 numbers corresponding to 7.5 and 7.25 are
multiplied and the available memory space is 16 bits wide. As can be seen from Figure 5.5,
the resulting product might be left shifted by 4 bits to store all the fractional bits corresponding
to Q3.12 format. However, doing so results in a product value of 6.375, which is different from
the correct value of 54.375, because the upper integer bits are shifted out. If the fractional
product is instead stored in a lower precision Q-format, say Q6.9, then the correct product value can be stored.
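The numbers in this example can be reproduced with a short sketch. The helpers `to_q`, `to_signed16`, and `store_high16` are hypothetical names, and the 32-bit product register is modeled by masking a Python integer.

```python
def to_q(x: float, frac_bits: int) -> int:
    """Quantize x to an integer with `frac_bits` fractional bits."""
    return int(round(x * (1 << frac_bits)))


def to_signed16(v: int) -> int:
    """Interpret the low 16 bits of v as a 2's-complement integer."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def store_high16(product: int, left_shift: int) -> int:
    """Left shift a 32-bit product, then keep its upper 16 bits."""
    word32 = (product << left_shift) & 0xFFFFFFFF   # bits shifted past bit 31 are lost
    return to_signed16(word32 >> 16)


a = to_q(7.5, 12)     # 0x7800 in Q3.12
b = to_q(7.25, 12)    # 0x7400 in Q3.12
product = a * b       # 0x36600000: 2 sign bits, 6 integer bits, 24 fractional bits

# Left shift by 4 keeps 12 fractional bits but drops three integer bits.
as_q3_12 = store_high16(product, 4) / (1 << 12)

# Left shift by 1 keeps all 6 integer bits and 9 fractional bits.
as_q6_9 = store_high16(product, 1) / (1 << 9)
```

The Q3.12 store returns 6.375, while the Q6.9 store returns the correct 54.375 at the cost of three fractional bits.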
Although Q-format solves the problem of overflow in multiplication, addition and
subtraction still pose a problem. When adding two Q15 numbers, the sum can exceed the range of
Q15 representation. To solve this problem, the scaling approach, discussed later in the chapter,
needs to be employed.
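A minimal sketch of the addition overflow follows. The function names are hypothetical, and halving both inputs before the addition is shown only as one simple form of scaling, assumed here for illustration; it is not necessarily the scheme developed later in the chapter.

```python
def to_q15(x: float) -> int:
    """Quantize a value in [-1, 1) to a Q15 integer."""
    q = int(round(x * 32768))
    return max(-32768, min(32767, q))


def from_q15(q: int) -> float:
    """Convert a Q15 integer back to its fractional value."""
    return q / 32768.0


def to_signed16(v: int) -> int:
    """Interpret the low 16 bits of v as a 2's-complement integer."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def q15_add_wrapping(a: int, b: int) -> int:
    """Add two Q15 integers in a 16-bit register; overflow wraps around."""
    return to_signed16(a + b)


def q15_add_prescaled(a: int, b: int) -> int:
    """Halve both inputs first, so the 16-bit result represents (a + b) / 2."""
    return to_signed16((a >> 1) + (b >> 1))


x = to_q15(0.75)
y = to_q15(0.5)
wrapped = from_q15(q15_add_wrapping(x, y))    # true sum 1.25 is outside [-1, 1)
scaled = from_q15(q15_add_prescaled(x, y))    # half of the true sum
```

The wrapped sum comes out as -0.75 instead of 1.25, whereas the prescaled sum is 0.625, which is the true sum divided by 2 and must be interpreted with that scale factor in mind.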