CHAPTER 5
Fixed-Point vs. Floating-Point
One important feature that distinguishes different processors is whether their CPUs perform
fixed-point or floating-point arithmetic. In a fixed-point processor, numbers are represented and
manipulated in integer format. In a floating-point processor, in addition to integer arithmetic,
floating-point arithmetic can be handled. This means that numbers are represented by the combination
of a mantissa (or a fractional part) and an exponent part, and the CPU possesses the
necessary hardware for manipulating both of these parts. As a result, in general, floating-point
operations involve more logic elements (larger ALU) and more cycles (more time) to manipulate
floating-point values.
In a fixed-point processor, one needs to be concerned with the dynamic range of numbers,
since a much narrower range of numbers can be represented in integer format as compared to
floating-point format. For most applications, such a concern can be virtually ignored when using
a floating-point processor. Consequently, fixed-point processors usually demand more coding
effort than do floating-point processors.
5.1 Q-FORMAT NUMBER REPRESENTATION
The decimal value of a 2’s-complement number $B = b_{N-1} b_{N-2} \ldots b_1 b_0$, $b_i \in \{0, 1\}$, is given by

$$D(B) = -b_{N-1} 2^{N-1} + b_{N-2} 2^{N-2} + \cdots + b_1 2^{1} + b_0 2^{0}. \tag{5.1}$$
The 2’s-complement representation allows a processor to perform integer addition and subtraction
by using the same hardware. When using unsigned integer representation, the sign bit is
treated as an extra magnitude bit, so only non-negative numbers can be represented this way.
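
As a quick numerical check of Equation (5.1), the following minimal sketch evaluates a 2’s-complement bit string. The function name `twos_complement_value` and the most-significant-bit-first string input are assumptions made for illustration; the sketch relies on the standard convention that the sign bit carries the negative weight $-2^{N-1}$.

```python
def twos_complement_value(bits: str) -> int:
    """Decimal value D(B) of a 2's-complement bit string, MSB first.

    Illustrative helper (not from the text): bits[0] is b_{N-1}.
    """
    n = len(bits)
    # The sign bit b_{N-1} carries the negative weight -2^(N-1).
    value = -int(bits[0]) * 2 ** (n - 1)
    # Each remaining bit b_i carries the positive weight 2^i.
    for i, bit in enumerate(bits[1:]):
        value += int(bit) * 2 ** (n - 2 - i)
    return value


print(twos_complement_value("0111"))  # 7
print(twos_complement_value("1000"))  # -8
print(twos_complement_value("1100"))  # -4
```

The same function reproduces the 16-bit limits when given `"0" + "1" * 15` and `"1" + "0" * 15`.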
There is a limitation to the dynamic range of the foregoing integer representation scheme.
For example, in a 16-bit system it is not possible to represent numbers larger than
$+2^{15} - 1 = 32{,}767$ or smaller than $-2^{15} = -32{,}768$. To cope with this limitation,
numbers are normalized between -1 and 1. In other words, they are represented as fractions. This
normalization is achieved by the programmer moving the implied or imaginary binary point (note
that there is no physical memory allocated to this point), as indicated in Figure 5.1. This way,
the fractional value is given by
$$F(B) = -b_{N-1} 2^{0} + b_{N-2} 2^{-1} + \cdots + b_1 2^{-(N-2)} + b_0 2^{-(N-1)}. \tag{5.2}$$
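
Equation (5.2) can be evaluated in the same way. The sketch below is illustrative only: `fractional_value` is a hypothetical helper name, and the input is again assumed to be a bit string with the sign bit first.

```python
def fractional_value(bits: str) -> float:
    """Fractional value F(B) of a 2's-complement bit string, MSB first.

    Illustrative helper (not from the text): the implied binary point
    sits immediately after the sign bit bits[0].
    """
    # The sign bit carries the weight -2^0 = -1.
    value = -float(int(bits[0]))
    # The k-th bit after the sign bit carries the weight 2^(-k).
    for k, bit in enumerate(bits[1:], start=1):
        value += int(bit) * 2.0 ** (-k)
    return value


print(fractional_value("0110"))  # 0.75
print(fractional_value("1110"))  # -0.25
print(fractional_value("1000"))  # -1.0
```

For the same bit pattern, the result equals the integer value of Equation (5.1) divided by $2^{N-1}$, which is what moving the implied binary point amounts to.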
[Figure 5.1: Integer vs. fractional representation. In the integer representation the implied
binary point lies to the right of $b_0$; in the fractional representation it lies between
$b_{N-1}$ and $b_{N-2}$.]

This representation scheme is referred to as Q-format or fractional representation. The
programmer needs to keep track of the implied binary point when manipulating Q-format numbers.
For instance, let us consider two Q15 format numbers and a 16-bit wide memory. Each
number consists of 1 sign bit plus 15 fractional bits. When these numbers are multiplied, a Q30
format number is obtained (the product of two fractions is still a fraction), with bit 31 being the
sign bit and bit 32 another sign bit (called the extended sign bit). If only 16 of the 32 product
bits can be stored, it makes sense to store the most significant bits.
This translates into storing the upper portion of the 32-bit product register, minus the extended
sign bit, by doing a 1-bit left shift followed by a 16-bit right shift. In this manner, the product
is stored in Q15 format (see Figure 5.2). The notation for Q-format numbers is QM.N,
where M represents the number of bits corresponding to the whole-number part and N the
number of bits corresponding to the fractional-number part.
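
The shift sequence described above can be sketched as follows. All names (`Q15_ONE`, `to_q15`, `from_q15`, `q15_multiply`, `q15_multiply_rounded`) are hypothetical helpers introduced for illustration. Two assumptions apply: Python integers do not wrap at 32 bits, so the corner case of -1 times -1 is not modeled, and the rounded variant assumes the rounding bit is the one immediately below the 16 stored bits.

```python
Q15_ONE = 1 << 15  # scale factor 2^15 between a fraction and its Q15 integer


def to_q15(x: float) -> int:
    """Convert a fraction in [-1, 1) to its Q15 integer (illustrative helper)."""
    return int(round(x * Q15_ONE))


def from_q15(q: int) -> float:
    """Convert a Q15 integer back to a fraction (illustrative helper)."""
    return q / Q15_ONE


def q15_multiply(a: int, b: int) -> int:
    """Multiply two Q15 integers and keep a truncated Q15 result."""
    product = a * b              # Q30 value carrying two sign bits
    return (product << 1) >> 16  # 1-bit left shift, then 16-bit right shift


def q15_multiply_rounded(a: int, b: int) -> int:
    """Same as q15_multiply, but add 1 at the first discarded bit before truncating."""
    product = a * b
    return ((product << 1) + (1 << 15)) >> 16


a, b = to_q15(0.75), to_q15(-0.25)
print(from_q15(q15_multiply(a, b)))  # -0.1875
```

The truncated and rounded variants differ only when the first discarded bit is 1; for example, `q15_multiply(1, 16384)` gives 0 while `q15_multiply_rounded(1, 16384)` gives 1.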
Based on 2’s-complement representation, a dynamic range of $-2^{N-1} \le D(B) \le 2^{N-1} - 1$
can be achieved, where $N$ denotes the number of bits. For an easy illustration, let
us consider a 4-bit system where the most negative number is -8 and the most positive number
is 7. The decimal representations of the numbers are shown in Figure 5.3. Notice how the numbers
change from most positive to most negative with the sign bit. Since only the integers
falling within the limits -8 and 7 can be represented, it is easy to see that any multiplication or
addition resulting in a number larger than 7 or smaller than -8 will cause overflow. For example,
when 6 is multiplied by 2, the number 12 is obtained. Hence, the result is greater than the
representation limits and will be wrapped around the circle to 1100, which is -4.
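
The wraparound in this 4-bit example can be reproduced with a short sketch. The helper name `wrap_to_n_bits` is an assumption for illustration; it keeps only the low `n` bits of a result and reinterprets them as a 2’s-complement number, which mimics what a register of that width would hold.

```python
def wrap_to_n_bits(value: int, n: int = 4) -> int:
    """Keep the low n bits of value and read them as 2's complement.

    Illustrative helper (not from the text).
    """
    mask = (1 << n) - 1
    raw = value & mask        # discard everything above bit n-1
    if raw >= 1 << (n - 1):   # sign bit set: the pattern denotes a negative number
        raw -= 1 << n
    return raw


result = 6 * 2
print(format(result & 0b1111, "04b"))  # 1100
print(wrap_to_n_bits(result))          # -4
```

Values already inside the range, such as 7 or -8, pass through unchanged, whereas 8 wraps to -8.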
Q-format representation addresses this problem by normalizing the dynamic range between
-1 and 1. Any resulting multiplication falls within the limits of this dynamic range (the one
exception being $(-1) \times (-1) = 1$, which lies just above the most positive representable value).
Using Q-format representation, the dynamic range is divided into $2^{N}$ sections, where
$2^{-(N-1)}$ is the size of a section. The most negative number is always -1 and the most
positive number is $1 - 2^{-(N-1)}$.
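
To make the section count and section size concrete, the following sketch lists every value representable with `n` bits in this fractional format. The function name `q_format_values` is a hypothetical helper, and the 4-bit case is chosen only to match the running example.

```python
def q_format_values(n: int) -> list:
    """All fractional values representable with n bits (1 sign bit, n-1 fractional bits).

    Illustrative helper (not from the text).
    """
    step = 2.0 ** -(n - 1)      # size of one section
    half = 2 ** (n - 1)
    # Integer codes run from -2^(n-1) to 2^(n-1) - 1; each is scaled by the step.
    return [k * step for k in range(-half, half)]


values = q_format_values(4)
print(len(values))             # 16
print(values[0], values[-1])   # -1.0 0.875
print(values[1] - values[0])   # 0.125
```

For 4 bits this gives 16 sections of size 0.125, running from -1 up to 0.875.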
[Figure 5.2: Multiplying and storing Q-15 numbers. Two Q15 operands, each with a leading sign
bit S, are multiplied to give a Q30 product with two leading sign bits, which is then stored as a
Q15 number. Rounding note from the figure: add 1 to the ? bit, then truncate; if ? = 0, there is
no effect (i.e., rounded down); if ? = 1, the result is rounded up.]
[Figure 5.3: Four-bit binary representation. A number circle mapping the 4-bit codes to their
2’s-complement values: 0000 to 0111 correspond to 0 to 7, and 1000 to 1111 correspond to -8 to -1.]
The following example helps one to see the difference in the two representation schemes.
As shown in Figure 5.4, the multiplication of 0110 by 1110 in binary is the equivalent of multiplying
6 by -2 in decimal, giving an outcome of -12, a number exceeding the dynamic range
of the 4-bit system. Based on the Q3 representation, these numbers correspond to 0.75 and