The Raspberry Pi is based on a system on a chip. This chip contains the quad-core ARM CPU that we have been studying along with a couple of coprocessors. In this chapter, we’ll be looking at what the floating-point unit (FPU) does. Some ARM documentation refers to this as the Vector Floating-Point (VFP) to promote the fact that it can do some limited vector processing. Any vector processing in the FPU is now replaced by the much better parallel processing provided by the NEON coprocessor, which we study in Chapter 12, “NEON Coprocessor.” Regardless, the FPU provides several useful instructions for performing floating-point mathematics.
We’ll review what floating-point numbers are, how they are represented in memory, and how to insert them into our Assembly programs. We’ll look at how to transfer data between the FPU and the ARM’s regular registers and memory. We’ll look at how to perform basic arithmetic operations, comparisons, and conversions.
About Floating-Point Numbers
1.456354 × 10^16
There is a fractional part and an exponent that moves the decimal point to the right if the exponent is positive and to the left if it is negative. The Raspberry Pi deals with single-precision floating-point numbers that are 32 bits in size and double-precision floating-point numbers that are 64 bits in size.
Table 11-1. Bits of a floating-point number
| Name | Precision | Sign | Fractional | Exponent | Decimal digits |
|---|---|---|---|---|---|
| Single | 32 bits | 1 | 24 | 8 | 7 |
| Double | 64 bits | 1 | 53 | 11 | 16 |
The decimal digits column of Table 11-1 is the approximate number of decimal digits that the format can represent, or the decimal precision.
Normalization and NaNs
In the integers we’ve seen so far, every combination of bits is a valid, unique number; no two different bit patterns produce the same value. This isn’t the case in floating-point. First of all, we have the concept of not a number, or NaN. NaNs are produced by invalid operations like dividing zero by zero or taking the square root of a negative number. They allow the error to quietly propagate through a calculation without crashing the program. In the IEEE 754 specification, a NaN is represented by an exponent field of all 1 bits together with a nonzero fraction (an all-ones exponent with a zero fraction represents infinity).
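We can verify this bit layout from any language. Here is a short Python sketch (standard library only; the helper names f32_bits, exponent_field, and fraction_field are our own) that packs single-precision values and examines the exponent and fraction fields:

```python
import math
import struct

def f32_bits(x):
    """Return the 32-bit IEEE 754 single-precision pattern of x."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def exponent_field(bits):
    return (bits >> 23) & 0xFF   # the 8-bit exponent field

def fraction_field(bits):
    return bits & 0x7FFFFF       # the 23-bit fraction field

nan_bits = f32_bits(math.nan)
inf_bits = f32_bits(math.inf)

# Both NaN and infinity have an exponent field of all 1 bits (0xFF);
# a NaN additionally has a nonzero fraction, infinity a zero fraction.
print(exponent_field(nan_bits) == 0xFF, fraction_field(nan_bits) != 0)
print(exponent_field(inf_bits) == 0xFF, fraction_field(inf_bits) == 0)
```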
1E0 = 0.1E1 = 0.01E2 = 0.001E3
All of these represent 1, but we call the first one, with no leading zeros, the normalized form. The ARM FPU tries to keep floating-point numbers in normalized form. It breaks this rule only for very small numbers: when the exponent is already as negative as it can go, the FPU gives up on normalization so that it can represent numbers a bit smaller than it otherwise could. These are called subnormal (or denormal) numbers, and they help avoid abrupt underflow to zero.
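To make the idea concrete, here is a hedged Python sketch (the helper name bits_to_f32 is ours) showing the smallest normalized single-precision value alongside the smallest subnormal one:

```python
import struct

def bits_to_f32(bits):
    """Interpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# Exponent field 1, fraction 0: the smallest normalized single, 2**-126.
smallest_normal = bits_to_f32(0x00800000)
# Exponent field 0, fraction 1: a subnormal with no implied leading 1, 2**-149.
smallest_subnormal = bits_to_f32(0x00000001)

print(smallest_normal)     # about 1.18e-38
print(smallest_subnormal)  # about 1.4e-45
```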
Rounding Errors
If we take a number and represent it in floating-point, we only keep about 7 significant digits in single precision. This introduces rounding errors. If these are a problem, going to double precision usually solves them, but some calculations are prone to magnifying rounding errors, such as subtracting two numbers that differ by a minute amount.
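The cancellation effect is easy to demonstrate. This Python sketch (the helper name to_f32 is ours) rounds values to single precision the way the FPU would, then subtracts two nearly equal numbers:

```python
import struct

def to_f32(x):
    """Round a Python float (a double) to single precision."""
    return struct.unpack("f", struct.pack("f", x))[0]

a = to_f32(1.0000001)
b = to_f32(1.0000000)
# Each input is good to about 7 decimal digits, but the difference
# comes out as about 1.19e-7 rather than the exact 1e-7: most of
# the significant digits have cancelled away.
print(a - b)
```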
Note
Floating-point numbers are represented in base 2, so the fractions that produce repeating patterns of digits are different from those in base 10. It comes as a surprise to many people that 0.1 is a repeating binary fraction: 0.00011001100110011…; this means that adding dollars and cents in floating-point will introduce rounding error over enough calculations.
For financial calculations, most applications use fixed-point arithmetic that is built on integer arithmetic to avoid rounding errors in addition and subtraction.
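A quick Python sketch shows why: repeatedly adding the binary approximation of ten cents drifts, while integer cents stay exact.

```python
# Add ten cents 1,000 times, once in floating-point and once in
# fixed-point (integer cents).
total_float = 0.0
total_cents = 0
for _ in range(1000):
    total_float += 0.10
    total_cents += 10

print(total_float)        # close to, but not exactly, 100.0
print(total_cents / 100)  # exactly 100.0
```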
Defining Floating-Point Numbers
These directives (.float, .single, and .double in the GNU Assembler) always take base 10 numbers.
FPU Registers
Note
Registers S0 and S1 take the same space as D0. The registers S2 and S3 use the same space as D1 and so on. The FPU just gives an easier syntax to do either single-precision or double-precision operations. It is up to us to keep things straight and not corrupt our registers by accessing the same space incorrectly.
The Raspberry Pi 2, 3, and 4 have 16 additional double-precision registers D16–D31 which have no single-precision counterparts.
These are a subset of the registers available for the NEON processor, which we will cover in the next chapter. For now, just a warning that there could be a conflict with the NEON processor if we are using that as well.
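The overlap is easy to model in Python on a little-endian machine like the Pi (the variable names are ours): writing a double into the space of D0 clobbers whatever S0 and S1 held.

```python
import struct

# Model D0 as 8 raw bytes; on little-endian ARM, S0 is the low word
# and S1 the high word of the same storage.
d0_bytes = struct.pack("<d", 1.0)            # write the double 1.0 to D0

s0 = struct.unpack("<f", d0_bytes[0:4])[0]   # what S0 now reads as
s1 = struct.unpack("<f", d0_bytes[4:8])[0]   # what S1 now reads as

# Neither half is a meaningful single-precision number on its own.
print(s0, s1)  # 0.0 1.875
```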
Function Call Protocol
Callee saved: A function is responsible for saving and restoring registers S16–S31 (D8–D15) if it uses them.
Caller saved: A function is free to overwrite all other registers, so the caller must save any it needs preserved. This includes S0–S15 (D0–D7) and D16–D31 where present, along with any additional NEON coprocessor registers.
Note
The double is also our first 64-bit data type. There is an additional rule about passing these in registers: a 64-bit argument can go in registers R0 and R1 or in R2 and R3. It cannot be split across R1 and R2, and it can’t be half in R3 and half on the stack. We’ll see this later when calling printf with a double as a parameter.
VPUSH {reglist}
VPOP {reglist}
These instructions take a single register list, made up of either S or D registers (not a mixture).
Note
The list can’t be longer than 16 D registers.
About Building
Here we specify that we have an FPU. Using vfpv2 works on all Raspberry Pi models; we could specify vfpv3 or vfpv4 on newer models if we needed a newer feature. All the floating-point examples in this book work on any Pi and can use the generic version of the command-line parameter.
Loading and Saving FPU Registers
VLDR Fd, [Rn{, #offset}]
VSTR Fd, [Rn{, #offset}]
VLDM Rn{!}, Registers
VSTM Rn{!}, Registers
Registers is a range of registers, as in the VPUSH and VPOP instructions. Only one range is allowed, and it can contain at most 16 double registers. These instructions load or store data at the address pointed to by Rn; the number of registers, and whether they are single or double, determines how much data is transferred. The optional ! updates the pointer in Rn after the operation.
Basic Arithmetic
The floating-point processor includes the four basic arithmetic operations, along with a few extensions like our favorite, multiply and accumulate. There are some specialty functions like square root, plus quite a few variations that affect the sign, such as negated versions of the multiply instructions.
VADD.F32 {Sd,} Sn, Sm
VADD.F64 {Dd,} Dn, Dm
VSUB.F32 {Sd,} Sn, Sm
VSUB.F64 {Dd,} Dn, Dm
VMUL.F32 {Sd,} Sn, Sm
VMUL.F64 {Dd,} Dn, Dm
VDIV.F32 {Sd,} Sn, Sm
VDIV.F64 {Dd,} Dn, Dm
VMLA.F32 Sd, Sn, Sm
VMLA.F64 Dd, Dn, Dm
VSQRT.F32 Sd, Sm
VSQRT.F64 Dd, Dm
These functions are all fairly simple, so let’s move on to an example.
Distance Between Points
d = sqrt((y2-y1)^2 + (x2-x1)^2)
Function to calculate the distance between two points
Main program to call the distance function three times
Makefile for the distance program
We constructed the data so that the first set of points forms a 3-4-5 right triangle, which is why we get the exact answer of 5 for the first distance.
The distance function is straightforward. It loads all four numbers in one VLDM instruction and then calls the various floating-point arithmetic functions to perform the calculation. We don’t really need to save any registers, but I included the VPUSH and VPOP instructions as an example.
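As a cross-check on the listing, here is a hedged Python sketch of the same single-precision calculation (to_f32 and distance are our own names). The comments map each step to the FPU instruction it models; note that VMLA.F32 Sd, Sn, Sm accumulates into its destination, computing Sd = Sd + Sn * Sm.

```python
import math
import struct

def to_f32(x):
    """Round to single precision, mimicking .F32 arithmetic."""
    return struct.unpack("f", struct.pack("f", x))[0]

def distance(x1, y1, x2, y2):
    dx = to_f32(x2 - x1)           # VSUB.F32
    dy = to_f32(y2 - y1)           # VSUB.F32
    acc = to_f32(dx * dx)          # VMUL.F32
    acc = to_f32(acc + dy * dy)    # VMLA.F32: Sd = Sd + Sn * Sm
    return to_f32(math.sqrt(acc))  # VSQRT.F32

# The 3-4-5 triangle from the first set of points gives exactly 5.
print(distance(0.0, 0.0, 3.0, 4.0))
```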
The part of the main routine that loops and calls the distance routine is straightforward. The part that calls printf has a couple of new complexities. The problem is that the C printf routine only supports printing doubles. In C this isn’t much of a problem, since you can just cast the argument to force a conversion. In Assembly, we need to convert our single-precision sum to a double-precision number ourselves, so we can print it.
To do the conversion, we VMOV the sum back to the FPU. VMOV is a handy instruction to move values between FPU registers and between FPU and CPU registers. We use the strange looking VCVT.F64.F32 instruction to convert from single to double precision. This function is the topic of the next section. We then VMOV the freshly constructed double back to registers R2 and R3.
When we call printf, the first parameter goes in R0. We then hit the rule about having to place the next 64-bit parameter in R2 and R3.
Note
If you are debugging the program with gdb and you want to see the contents of the FPU registers at any point, use the “info all-registers” command that will exhaustively list all the coprocessor registers. We won’t see some of these until the next chapter when we cover the NEON coprocessor.
Floating-Point Conversions
VCVT.F64.F32 Dd, Sm
VCVT.F32.F64 Sd, Dm
These convert single to double precision and double to single precision.
VCVT.F64.S32 Dd, Sm
VCVT.F32.S32 Sd, Sm
VCVT.F64.U32 Dd, Sm
VCVT.F32.U32 Sd, Sm
where the source can be either a signed or unsigned integer.
VCVT{mode}.S32.F64 Sd, Dm
VCVT{mode}.S32.F32 Sd, Sm
VCVT{mode}.U32.F64 Sd, Dm
VCVT{mode}.U32.F32 Sd, Sm
A: Round to nearest, ties away from zero
N: Round to nearest, ties to even
P: Round toward plus infinity
M: Round toward minus infinity
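These modes are easy to model in Python (the function names are ours; the ties-away version uses a floor(x + 0.5) trick that is fine for a sketch but can misround at the extreme edge of double precision):

```python
import math

def vcvt_n(x):
    """Round to nearest, ties to even (Python's built-in round)."""
    return round(x)

def vcvt_a(x):
    """Round to nearest, ties away from zero."""
    return math.floor(x + 0.5) if x >= 0 else math.ceil(x - 0.5)

def vcvt_p(x):
    """Round toward plus infinity."""
    return math.ceil(x)

def vcvt_m(x):
    """Round toward minus infinity."""
    return math.floor(x)

print(vcvt_n(2.5), vcvt_a(2.5), vcvt_p(2.1), vcvt_m(2.1))  # 2 3 3 2
```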
VCVT.S32.F64 Dd, Dd, #fbits
where #fbits are the number of bits in the fractional part of the fixed-point number.
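In other words, the fixed-point form scales the value by 2^fbits. A small Python model (the function names are ours; the real instruction’s rounding is controlled by the FPSCR, while this sketch simply rounds to nearest):

```python
def to_fixed(x, fbits):
    """Scale by 2**fbits and round to an integer, modeling VCVT to fixed point."""
    return round(x * (1 << fbits))

def from_fixed(n, fbits):
    """Scale back down, modeling VCVT from fixed point."""
    return n / (1 << fbits)

q = to_fixed(3.14159, 16)     # a Q16 fixed-point value
print(q, from_fixed(q, 16))   # the round trip is good to within 2**-16
```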
Note
This form isn’t useful for money computations; for those, you should multiply by 100 (for two decimal places) and convert.
Floating-Point Comparison
VCMP.F32 Sd, Sm
VCMP.F32 Sd, #0
VCMP.F64 Dd, Dm
VCMP.F64 Dd, #0
It can compare two single-precision registers or two double-precision registers. It allows one immediate value, namely, zero, so it can compare either a single- or double-precision register to zero.
VMRS APSR_nzcv, FPSCR
VMRS copies just the N, Z, C, and V flags from the FPSCR to the CPSR. After the copy, we can use any instruction that reads these flags.
abs(S1 - S2) < e
where abs() is a function to calculate the absolute value.
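In Python the technique looks like this (fpcomp here is our own double-precision model of the assembly routine; Python floats are doubles, so the drift is smaller than in single precision, but the exact comparison still fails):

```python
def fpcomp(a, b, epsilon=1e-5):
    """Return True if a and b are equal to within epsilon."""
    return abs(a - b) < epsilon

total = 0.0
for _ in range(100):
    total += 0.01            # add one cent, 100 times

print(total == 1.0)          # False: the sum isn't exactly 1.0
print(fpcomp(total, 1.0))    # True: it's well within tolerance
```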
Example
Let’s create a routine to test if two floating-point numbers are equal using this technique. We’ll first add up 100 cents, then test if the sum exactly equals $1.00 (spoiler alert: it won’t). Then we’ll compare the sum using our fpcomp routine, which tests whether two numbers are equal within a supplied tolerance (usually referred to as epsilon).
Routine to compare two floating-point numbers within a tolerance
Main program to add up 100 cents and compare to $1.00
The makefile for the floating-point comparison example
The program demonstrates how to compare floating-point numbers and how to copy the results to the CPSR, so we can branch based on the result.
We haven’t talked much about the bit format of floating-point numbers. The first bit is zero, indicating a positive number. The next 8 bits are the exponent field, here 0x7F; the exponent doesn’t use two’s complement; instead, its value is the stored field minus 127, so in this case the exponent is 0. The remaining fraction bits of S2 are all zero, but in normalized form there is an implied 1 in front of them, which gives the value 1.0. S1, however, holds 0.99999934, showing the rounding error creeping in even with the small number of additions we performed.
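We can confirm this layout in Python (standard library only):

```python
import struct

# The single-precision bit pattern of 1.0.
bits = struct.unpack(">I", struct.pack(">f", 1.0))[0]
print(hex(bits))  # 0x3f800000

sign = bits >> 31                 # 0: positive
exponent = (bits >> 23) & 0xFF    # stored field 0x7F; actual exponent 0x7F - 127 = 0
fraction = bits & 0x7FFFFF        # all zeros; the leading 1 is implied
print(sign, hex(exponent), fraction)  # 0 0x7f 0
```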
Then we call our fpcomp routine, which determines that the numbers are within the provided tolerance and therefore considers them equal.
It didn’t take that many additions to start introducing rounding errors into our sums. You must be careful when using floating-point for this reason.
Summary
In this chapter, we covered what floating-point numbers are and how they are represented. We covered normalization, NaNs, and rounding error. We showed how to create floating-point numbers in our .data section and discussed the bank of single- and double-precision floating-point registers and how they overlap. We covered how to load them into the floating-point registers, perform mathematical operations, and save them back to memory.
We looked at how to convert between different floating-point types, how to compare floating-point numbers, and how to copy the result back to the ARM CPU. We looked at the effect rounding error has on these comparisons.
In Chapter 12, “NEON Coprocessor,” we’ll look at how to perform multiple floating-point operations in parallel.