© Stephen Smith 2019
S. Smith, Raspberry Pi Assembly Language Programming, https://doi.org/10.1007/978-1-4842-5287-1_11

11. Floating-Point Operations

Stephen Smith1 
(1)
Gibsons, BC, Canada
 

The Raspberry Pi is based on a system on a chip. This chip contains the quad-core ARM CPU that we have been studying along with a couple of coprocessors. In this chapter, we’ll be looking at what the floating-point unit (FPU) does. Some ARM documentation refers to this as the Vector Floating-Point (VFP) to promote the fact that it can do some limited vector processing. Any vector processing in the FPU is now replaced by the much better parallel processing provided by the NEON coprocessor, which we study in Chapter 12, “NEON Coprocessor.” Regardless, the FPU provides several useful instructions for performing floating-point mathematics.

We’ll review what floating-point numbers are, how they are represented in memory, and how to insert them into our Assembly programs. We’ll look at how to transfer data between the FPU and the ARM’s regular registers and memory. We’ll look at how to perform basic arithmetic operations, comparisons, and conversions.

About Floating-Point Numbers

Floating-point numbers are a way to represent numbers in scientific notation on the computer. Scientific notation represents numbers something like this:
  • 1.456354 x 10^16

There is a fractional part and an exponent that moves the decimal point: to the right if the exponent is positive and to the left if it's negative. The Raspberry Pi deals with single-precision floating-point numbers that are 32 bits in size and double-precision floating-point numbers that are 64 bits in size.

The Raspberry Pi uses the IEEE 754 standard for floating-point numbers. Each number contains a sign bit to indicate if it is positive or negative, a field of bits for the exponent, and a string of digits for the fractional part. Table 11-1 lists the number of bits for the parts of each format.
Table 11-1

Bits of a floating-point number

Name     Precision   Sign   Fractional   Exponent   Decimal digits
Single   32 bits     1      24           8          7
Double   64 bits     1      53           11         16

The decimal digits column of Table 11-1 is the approximate number of decimal digits that the format can represent, or the decimal precision.

Normalization and NaNs

In the integers we've seen so far, every combination of bits is a valid, unique number; no two different bit patterns produce the same number. However, this isn't the case in floating-point. First of all, we have the concept of not a number, or NaN. NaNs are produced by invalid operations like dividing zero by zero or taking the square root of a negative number. They allow the error to quietly propagate through a calculation without crashing the program. In the IEEE 754 specification, a NaN is represented by an exponent of all 1 bits together with a non-zero fractional part (an all-1 exponent with a zero fraction represents infinity).

A normalized floating-point number is one where the first digit of the fractional part is non-zero. A problem with floating-point numbers is that the same number can often be represented in multiple ways. For instance, a fractional part of 0 represents zero regardless of the sign bit or exponent. Consider a few representations of 1:
  • 1E0 = 0.1E1 = 0.01E2 = 0.001E3

All of these represent 1, but we call the first one, with no leading zeros, the normalized form. The ARM FPU tries to keep floating-point numbers in normal form, but it breaks this rule for very small numbers: when the exponent is already as negative as it can go, the FPU gives up on normalization so that, to avoid underflow errors, it can represent numbers a bit smaller than it otherwise could.

Rounding Errors

If we take a number with more significant digits than the format can hold, such as π, and represent it in floating-point, then we only keep 7 or so digits for single precision. This introduces rounding errors. If these are a problem, usually going to double precision solves them, but some calculations are prone to magnifying rounding errors, such as subtracting two numbers that differ by a minute amount.

Note

Floating-point numbers are represented in base 2, so the fractions that lead to repeating patterns of digits are different from those in base 10. It comes as a surprise to many people that 0.1 is a repeating binary fraction: 0.00011001100110011…, meaning that adding dollars and cents in floating-point will introduce rounding error over enough calculations.

For financial calculations, most applications use fixed-point arithmetic that is built on integer arithmetic to avoid rounding errors in addition and subtraction.

Defining Floating-Point Numbers

The GNU Assembler has directives for defining storage for both single- and double-precision floating-point numbers. These are .single and .double, for example:
.single   1.343, 4.343e20, -0.4343, -0.4444e-10
.double   -4.24322322332e-10, 3.141592653589793

These directives always take base 10 numbers.

FPU Registers

The ARM FPU has its own set of registers. There are 32 single-precision floating-point registers that are referred to as S0, S1, …, S31. These same registers can also be referred to as 16 double-precision registers D0, …, D15. Figure 11-1 shows this configuration of registers.
../images/486919_1_En_11_Chapter/486919_1_En_11_Fig1_HTML.jpg
Figure 11-1

The ARM’s FPU registers (the single-precision registers on the left overlap the double-precision registers on the right)

Note

Registers S0 and S1 take the same space as D0. The registers S2 and S3 use the same space as D1 and so on. The FPU just gives an easier syntax to do either single-precision or double-precision operations. It is up to us to keep things straight and not corrupt our registers by accessing the same space incorrectly.

The Raspberry Pi 2, 3, and 4 have 16 additional double-precision registers, D16-D31, which have no single-precision counterparts.

These are a subset of the registers available for the NEON processor, which we will cover in the next chapter. For now, just a warning that there could be a conflict with the NEON processor if we are using that as well.

Function Call Protocol

In Chapter 6, “Functions and the Stack,” we gave the protocol for who saves which registers when calling functions. With these floating-point registers, we have to add them to our protocol:
  • Callee saved: A function is responsible for saving registers S16-S31 (D8-D15) if it uses them.

  • Caller saved: All other registers don't need to be saved by a function, so the caller must save them if it needs them preserved. This includes S0-S15 (D0-D7) and D16-D31 where present. This also applies to any additional registers for the NEON coprocessor.

Note

The double is also our first 64-bit data type. There is an additional rule about placing these in registers, namely, that a 64-bit item can go in registers R0 and R1 or in R2 and R3. It cannot be placed in R1 and R2, and it can't be half in R3 and half on the stack. We'll see this later when calling printf with a double as a parameter.

Here are our first coprocessor instructions:
  • VPUSH {reglist}

  • VPOP {reglist}

For example:
      VPUSH  {S16-S31}
      VPOP   {S16-S31}

These instructions take a single register list, which you can write with either S or D registers.

Note

The list can’t be longer than 16 D registers.

About Building

All the examples in this chapter use the C runtime and are built using gcc. This works fine in the same manner as the previous chapters. If we want to use the GNU Assembler directly via the as command, then we need to modify our makefile with
%.o : %.s
      as -mfpu=vfp $(DEBUGFLGS) $(LSTFLGS) $< -o $@

Here we specify that we have an FPU. This gives us vfpv2, which works on all Raspberry Pi models. We could use vfpv3 or vfpv4 on newer Pis if we need a newer feature. All the floating-point examples in this book work on any Pi and can use the generic version of the command-line parameter.

Loading and Saving FPU Registers

In Chapter 5, “Thanks for the Memories,” we covered the LDR and STR instructions to load registers from memory, then store them back to memory. The floating-point coprocessor has similar instructions for its registers:
  • VLDR  Fd, [Rn{, #offset}]

  • VSTR  Fd, [Rn{, #offset}]

We see that both instructions support pre-indexed addressing offsets. The Fd register can be either an S or D register. For example:
      LDR   R1, =fp1
      VLDR  S4, [R1]
      VLDR  S4, [R1, #4]
      VSTR  S4, [R1]
      VSTR  S4, [R1, #4]
      ...
.data
fp1:  .single    3.14159
fp2:  .single    4.3341
fp3:  .single    0.0
There are also load multiple and store multiple instructions:
  • VLDM Rn{!}, Registers

  • VSTM Rn{!}, Registers

Registers is a range of registers, as for the VPUSH and VPOP instructions. Only one range is allowed, and it can have at most 16 double registers. These load or store starting at the address pointed to by Rn; the number of registers and whether they are single or double determines how much data is transferred. The optional !, if present, updates the pointer in Rn after the operation.

Basic Arithmetic

The floating-point processor includes the four basic arithmetic operations, along with a few extensions like our favorite multiply and accumulate. There are some specialty functions like square root, as well as quite a few variations that affect the sign, such as negated versions of the instructions.

Each of these functions comes in two versions: a 32-bit version with .F32 appended and a 64-bit version with .F64 appended. It would be nice if the Assembler just did this for you based on the registers you provide; instead, if you leave off the size suffix, you get a misleading error message. Here is a selection of the instructions:
  • VADD.F32  {Sd,} Sn, Sm

  • VADD.F64  {Dd,} Dn, Dm

  • VSUB.F32  {Sd,} Sn, Sm

  • VSUB.F64  {Dd,} Dn, Dm

  • VMUL.F32  {Sd,} Sn, Sm

  • VMUL.F64  {Dd,} Dn, Dm

  • VDIV.F32  {Sd,} Sn, Sm

  • VDIV.F64  {Dd,} Dn, Dm

  • VMLA.F32  Sd, Sn, Sm

  • VMLA.F64  Dd, Dn, Dm

  • VSQRT.F32  Sd, Sm

  • VSQRT.F64  Dd, Dm

If the destination register is in curly brackets {}, it is optional, so we can leave it out. In that case, the destination is also the first source operand, so to add S1 to S4, we simply write
      VADD.F32   S4, S1

These functions are all fairly simple, so let’s move on to an example.

Distance Between Points

If we have two points (x1, y1) and (x2, y2), then the distance between them is given by the formula
  • d = sqrt( (y2-y1)^2 + (x2-x1)^2 )

Let’s write a function to calculate this for any two single-precision floating-point pair of coordinates. We’ll use the C runtime’s printf function to print out our results. First the distance function from Listing 11-1, in the file distance.s.
@
@ Example function to calculate the distance
@ between two points in single precision
@ floating point.
@
@ Inputs:
@     R0 - pointer to the 4 FP numbers
@            they are x1, y1, x2, y2
@ Outputs:
@     R0 - the length (as single precision FP)
.global distance @ Allow function to be called by others
@
distance:
      @ push all registers to be safe, we don't
      @ really need to push so many.
      push  {R4-R12, LR}
      vpush {S16-S31}
      @ load all 4 numbers at once
      vldm  R0, {S0-S3}
      @ calc s4 = x2 - x1
      vsub.f32   S4, S2, S0
      @ calc s5 = y2 - y1
      vsub.f32   S5, S3, S1
      @ calc s4 = s4 * s4 = (x2-x1)^2
      vmul.f32   S4, S4
      @ calc s5 = s5 * s5 = (y2-y1)^2
      vmul.f32   S5, S5
      @ calc S4 = S4 + S5
      vadd.f32   S4, S5
      @ calc sqrt(S4)
      vsqrt.f32  S4, S4
      @ move result to R0 to be returned
      vmov  R0, S4
      @ restore what we preserved.
      vpop  {S16-S31}
      pop   {R4-R12, PC}
Listing 11-1

Function to calculate the distance between two points

Now we place the code from Listing 11-2 in main.s, which calls distance three times with three different points and prints out the distance for each one.
@
@ Main program to test our distance function
@
@ r7 - loop counter
@ r8 - address to current set of points
.global main @ Provide program entry point
@
      .equ  N, 3  @ Number of points.
main:
      push  {R4-R12, LR}
      ldr  r8, =points @ pointer to current points
      mov  r7, #N     @ number of loop iterations
loop: mov  r0, r8     @ move pointer to parameter 1
      bl   distance   @ call distance function
@ need to take the single precision return value
@ and convert it to a double, because the C printf
@ function can only print doubles.
      vmov  s2, r0         @ move back to fpu for conversion
      vcvt.f64.f32 d0, s2  @ convert single to double
      vmov r2, r3, d0      @ return double to r2, r3
      ldr  r0, =prtstr     @ load print string
      bl   printf          @ print the distance
      add  r8, #(4*4)      @ 4 values each 4 bytes
      subs r7, #1          @ decrement loop counter
      bne  loop            @ loop if more points
      mov  r0, #0          @ return code
      pop  {R4-R12, PC}
.data
points:    .single   0.0, 0.0, 3.0, 4.0
      .single    1.3, 5.4, 3.1, -1.5
      .single 1.323e10, -1.2e-4, 34.55, 5454.234
prtstr:    .asciz "Distance = %f\n"
Listing 11-2

Main program to call the distance function three times

The makefile is in Listing 11-3.
distance: distance.s main.s
      gcc -o distance distance.s main.s
Listing 11-3

Makefile for the distance program

If we build and run the program, we get
pi@raspberrypi:~/asm/Chapter 11 $ make
gcc -g -o distance distance.s main.s
pi@raspberrypi:~/asm/Chapter 11 $ ./distance
Distance = 5.000000
Distance = 7.130919
Distance = 13230000128.000000
pi@raspberrypi:~/asm/Chapter 11 $

We constructed the data so that the first pair of points forms a 3-4-5 right triangle, which is why we get the exact answer of 5 for the first distance.

The distance function is straightforward. It loads all four numbers in one VLDM instruction and then calls the various floating-point arithmetic functions to perform the calculation. We don’t really need to save any registers, but I included the VPUSH and VPOP instructions as an example.

The part of the main routine that loops and calls the distance routine is straightforward. The part that calls printf has a couple of new complexities. The problem is that the C printf routine only has support to print doubles. In C this isn’t much of a problem, since you can just cast the argument to force a conversion. In Assembly, we need to convert our single-precision sum to a double-precision number, so we can print it.

To do the conversion, we VMOV the sum back to the FPU. VMOV is a handy instruction to move values between FPU registers and between FPU and CPU registers. We use the strange looking VCVT.F64.F32 instruction to convert from single to double precision. This function is the topic of the next section. We then VMOV the freshly constructed double back to registers R2 and R3.

When we call printf, the first parameter goes in R0. We then hit the rule about having to place the next 64-bit parameter in R2 and R3.

Note

If you are debugging the program with gdb and you want to see the contents of the FPU registers at any point, use the “info all-registers” command that will exhaustively list all the coprocessor registers. We won’t see some of these until the next chapter when we cover the NEON coprocessor.

Floating-Point Conversions

In the last example, we had our first look at the conversion instruction VCVT. The FPU supports a variety of versions of this instruction: not only conversions between single- and double-precision floating-point numbers, but also conversions to and from integers. It also supports conversion to fixed-point numbers (integers with an implied decimal point), with several rounding methods. The most used versions of this instruction are
  • VCVT.F64.F32  Dd, Sm

  • VCVT.F32.F64  Sd, Dm

These convert single to double precision and double to single precision.

To convert from an integer to a floating-point number, we have
  • VCVT.F64.S32  Dd, Sm

  • VCVT.F32.S32  Sd, Sm

  • VCVT.F64.U32  Dd, Sm

  • VCVT.F32.U32  Sd, Sm

where the source can be either a signed or unsigned integer.

To convert from floating-point to integer, we have
  • VCVTmode.S32.F64  Sd, Dm

  • VCVTmode.S32.F32  Sd, Sm

  • VCVTmode.U32.F64  Sd, Dm

  • VCVTmode.U32.F32  Sd, Sm

In this direction, we have rounding, so we specify the method of rounding we want with mode. Mode must be one of
  • A: Round to nearest, ties away from zero

  • N: Round to nearest, ties to even

  • P: Round toward plus infinity

  • M: Round toward minus infinity

There are similar versions for fixed point such as
  • VCVT.S32.F64  Dd, Dd, #fbits

where #fbits are the number of bits in the fractional part of the fixed-point number.

Note

This form isn’t useful for money computations, for those you should multiply by 100, for two decimal places and convert.

Floating-Point Comparison

The floating-point instructions don't affect the CPSR. Instead, there is a Floating-Point Status and Control Register (FPSCR) for floating-point operations. It contains N, Z, C, and V flags like the CPSR, and their meanings are mostly the same. There are no S versions of the floating-point instructions; only one instruction updates these flags, namely, the VCMP instruction. Here are some of its forms:
  • VCMP.F32  Sd, Sm

  • VCMP.F32  Sd, #0

  • VCMP.F64  Dd, Dm

  • VCMP.F64  Dd, #0

It can compare two single-precision registers or two double-precision registers. It allows one immediate value, namely, zero, so it can compare either a single- or double-precision register to zero.

The VCMP instruction updates the FPSCR, but all our branch-on-condition instructions branch based on flags in the CPSR. This forces an extra step to copy the flags from the FPSCR to the CPSR before using one of our regular instructions to act on the results of the comparison. There is an instruction specifically for this purpose:
  • VMRS  APSR_nzcv, FPSCR

VMRS copies just the N, Z, C, and V flags from the FPSCR to the CPSR. After the copy, we can use any instruction that reads these flags.

Testing floating-point numbers for equality is problematic due to rounding error: numbers are often close but not exactly equal. The solution is to decide on a tolerance and then consider numbers equal if they are within that tolerance of each other. For instance, we might define e = 0.000001 and then consider two registers equal if
  • abs(S1 - S2) < e

where abs() is a function to calculate the absolute value.

Example

Let’s create a routing to test if two floating-point numbers are equal using this technique. We’ll first add 100 cents, then test if they exactly equal $1.00 (spoiler alert, they won’t). Then we’ll compare the sum using our fpcomp routine that tests them within a supplied tolerance (usually referred to as epsilon).

We start with our floating-point comparison routine, placing the contents of Listing 11-4 into fpcomp.s.
@
@ Function to compare two floating point numbers
@ the parameters are a pointer to the two numbers
@ and an error epsilon.
@
@ Inputs:
@     R0 - pointer to the 3 FP numbers
@            they are x1, x2, e
@ Outputs:
@     R0 - 1 if they are equal, else 0
.global fpcomp @ Allow function to be called by others
@
fpcomp:
      @ push all registers to be safe, we don't really
      @ need to push so many.
      push  {R4-R12, LR}
      vpush {S16-S31}
      @ load all 3 numbers at once
      vldm  R0, {S0-S2}
      @ calc s3 = x2 - x1
      vsub.f32    S3, S1, S0
      vabs.f32    S3, S3
      vcmp.f32    S3, S2
      vmrs        APSR_nzcv, FPSCR
      BGE         notequal
      MOV         R0, #1
      B           done
notequal:
      MOV         R0, #0
      @ restore what we preserved.
done: vpop  {S16-S31}
      pop   {R4-R12, PC}
Listing 11-4

Routine to compare two floating-point numbers within a tolerance

Now the main program maincomp.s contains Listing 11-5.
@
@ Main program to test our floating point
@ comparison function
@
@ r7 - loop counter
.global main @ Provide program starting address to linker
      .equ  N, 100    @ Number of additions.
main:
      push  {R4-R12, LR}
@ Add up one hundred cents and test
@ if they equal $1.00
      mov   r7, #N    @ number of loop iterations
@ load cents, running sum and real sum to FPU
      ldr  r0, =cent
      vldm r0, {S0-S2}
loop:
      @ add cent to running sum
      vadd.f32   s1, s0
      subs r7, #1     @ decrement loop counter
      bne  loop       @ loop if more points
      @ compare running sum to real sum
      vcmp.f32 s1, s2
      @ copy FPSCR to CPSR
      vmrs       APSR_nzcv, FPSCR
      @ print if the numbers are equal or not
      beq  equal
      ldr  r0, =notequalstr
      bl   printf
      b    next
equal:  ldr      r0, =equalstr
      bl   printf
next:
@ load pointer to running sum, real sum and epsilon
      ldr   r0, =runsum
      vldm  r0, {S0-S2}
@ call comparison function
      bl    fpcomp        @ call comparison function
@ compare return code to 1 and print if the numbers
@ are equal or not (within epsilon).
      cmp   r0, #1
      beq   equal2
      ldr   r0, =notequalstr
      bl    printf
      b     done
equal2:  ldr      r0, =equalstr
      bl   printf
done: mov   r0, #0           @ return code
      pop   {R4-R12, PC}
.data
cent:    .single 0.01
runsum:  .single 0.0
sum:     .single 1.00
epsilon: .single 0.00001
equalstr:    .asciz "equal\n"
notequalstr: .asciz "not equal\n"
Listing 11-5

Main program to add up 100 cents and compare to $1.00

The makefile, in Listing 11-6, is as we would expect.
fpcomp: fpcomp.s maincomp.s
      gcc -o fpcomp fpcomp.s maincomp.s
Listing 11-6

The makefile for the floating-point comparison example

If we build and run the program, we get
pi@raspberrypi:~/asm/Chapter 11 $ make
gcc -g -o fpcomp fpcomp.s maincomp.s
pi@raspberrypi:~/asm/Chapter 11 $ ./fpcomp
not equal
equal
pi@raspberrypi:~/asm/Chapter 11 $

The program demonstrates how to compare floating-point numbers and how to copy the results to the CPSR, so we can branch based on the result.

If we run the program under gdb, we can examine the sum of 100 cents. We see
S1 = 0x3f7ffff5
S2 = 0x3f800000

We haven't talked about the bit format of floating-point numbers, but the first bit is zero, indicating positive. The next 8 bits are the exponent, which for S2 is 7F; the exponent doesn't use two's complement; instead, its value is what is stored minus 127. In this case, the exponent is 0. The remaining fraction bits of S2 are all zero, but in normalized form there is an implied 1 in front of the fraction, so this gives the value 1.0. S1, on the other hand, has a value of 0.99999934, showing the rounding error creeping in, even in the small number of additions we performed.

Then we call our fpcomp routine, which determines that the numbers are within the provided tolerance and therefore considers them equal.

It didn’t take that many additions to start introducing rounding errors into our sums. You must be careful when using floating-point for this reason.

Summary

In this chapter, we covered what floating-point numbers are and how they are represented. We covered normalization, NaNs, and rounding error. We showed how to create floating-point numbers in our .data section and discussed the bank of single- and double-precision floating-point registers and how they overlap. We covered how to load them into the floating-point registers, perform mathematical operations, and save them back to memory.

We looked at how to convert between different floating-point types, how to compare floating-point numbers, and how to copy the result back to the ARM CPU. We looked at the effect rounding error has on these comparisons.

In Chapter 12, “NEON Coprocessor,” we’ll look at how to perform multiple floating-point operations in parallel.
