© Jo Van Hoey 2019
J. Van Hoey, Beginning x64 Assembly Programming, https://doi.org/10.1007/978-1-4842-5076-1_27

27. Watch Your MXCSR

Jo Van Hoey (Hamme, Belgium)

Before diving into SSE programming, you need to understand the SSE control and status register for floating-point operations, called mxcsr. It is a 32-bit register, of which only the lower 16 bits are used. Here is the layout:

Bit   Mnemonic   Meaning
0     IE         Invalid operation error
1     DE         Denormal error
2     ZE         Divide-by-zero error
3     OE         Overflow error
4     UE         Underflow error
5     PE         Precision error
6     DAZ        Denormals are zeros
7     IM         Invalid operation mask
8     DM         Denormal operation mask
9     ZM         Divide-by-zero mask
10    OM         Overflow mask
11    UM         Underflow mask
12    PM         Precision mask
13    RC         Rounding control
14    RC         Rounding control
15    FZ         Flush to zero

Bits 0 to 5 are status flags: the processor sets them when it detects a floating-point exception, such as a divide-by-zero, or when a floating-point operation loses some precision. Bits 7 to 12 are masks that control what happens when an operation sets one of the flags in bits 0 to 5. If, for example, a divide-by-zero happens while the divide-by-zero mask is 0, the processor raises an exception and the program normally crashes. When the divide-by-zero mask is set to 1, the processor only sets the flag and returns a default result, so the program keeps running and you can test the flag and handle the situation yourself. By default the masks are set to 1, so no SIMD floating-point exceptions are raised. Two bits (bits 13 and 14) control the rounding, as shown here:

Bits   Meaning
00     Round to nearest
01     Round down
10     Round up
11     Truncate
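
The two tables can be turned into a small decoder. The following C sketch is only an illustration of the bit layout; the macro and function names (MXCSR_FLAGS_MASK, mxcsr_rounding, mxcsr_rounding_name, mxcsr_describe) are hypothetical helpers and do not appear in the chapter's listings.

```c
#include <stdio.h>
#include <stdint.h>

/* Bit positions follow the layout table above. */
#define MXCSR_FLAGS_MASK 0x003Fu   /* bits 0-5: exception flags    */
#define MXCSR_DAZ        0x0040u   /* bit 6: denormals are zeros   */
#define MXCSR_MASKS_MASK 0x1F80u   /* bits 7-12: exception masks   */
#define MXCSR_RC_SHIFT   13        /* bits 13-14: rounding control */
#define MXCSR_FZ         0x8000u   /* bit 15: flush to zero        */

/* Hypothetical helper: returns the 2-bit rounding-control field. */
unsigned mxcsr_rounding(uint32_t mxcsr) {
    return (mxcsr >> MXCSR_RC_SHIFT) & 3u;
}

/* Hypothetical helper: name of the rounding mode, per the table above. */
const char *mxcsr_rounding_name(uint32_t mxcsr) {
    static const char *names[4] = {
        "round to nearest", "round down", "round up", "truncate"
    };
    return names[mxcsr_rounding(mxcsr)];
}

/* Hypothetical helper: prints a one-line summary of an mxcsr value. */
void mxcsr_describe(uint32_t mxcsr) {
    printf("flags=0x%02x masks=0x%02x daz=%u fz=%u rounding=%s\n",
           (unsigned)(mxcsr & MXCSR_FLAGS_MASK),
           (unsigned)((mxcsr & MXCSR_MASKS_MASK) >> 7),
           (unsigned)((mxcsr & MXCSR_DAZ) != 0),
           (unsigned)((mxcsr & MXCSR_FZ) != 0),
           mxcsr_rounding_name(mxcsr));
}
```

Calling mxcsr_describe(0x1F80) reports no flags, all six masks set (0x3f), and round to nearest.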

We will not discuss all the status and mask details of the mxcsr register; refer to the Intel manuals for all details.

Manipulating the mxcsr Bits

The bits in the mxcsr register can be manipulated with the ldmxcsr and stmxcsr instructions. The default mxcsr state is 00001F80, or 0001 1111 1000 0000. All the mask bits are set, and rounding is set to nearest.
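
From C, the same two instructions can be reached through the _mm_getcsr and _mm_setcsr intrinsics in xmmintrin.h. The sketch below assumes an x86-64 target with GCC or Clang; the helper set_rounding and the two macros are hypothetical names introduced here for illustration.

```c
#include <stdint.h>
#include <xmmintrin.h>   /* _mm_getcsr, _mm_setcsr (x86 SSE) */

#define MXCSR_RC_MASK  0x6000u   /* bits 13 and 14 */
#define MXCSR_RC_SHIFT 13

/* Hypothetical helper: sets the rounding-control field (0..3) and
   returns the previous mxcsr value so the caller can restore it. */
uint32_t set_rounding(unsigned rc) {
    uint32_t old = _mm_getcsr();                 /* reads mxcsr (stmxcsr)  */
    uint32_t new_value = (old & ~MXCSR_RC_MASK)
                       | ((rc & 3u) << MXCSR_RC_SHIFT);
    _mm_setcsr(new_value);                       /* writes mxcsr (ldmxcsr) */
    return old;
}
```

Only the two rounding bits change; the flags and masks keep whatever values they had. Passing the returned value to _mm_setcsr later restores the original state.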

Listing 27-1 through Listing 27-4 show an example of what can be done with mxcsr.
; mxcsr.asm
extern printf
extern print_mxcsr
extern print_hex
section .data
      eleven      dq    11.0
      two         dq    2.0
      three       dq    3.0
      ten         dq    10.0
      zero        dq    0.0
      hex         db    "0x",0
      fmt1        db    10,"Divide, default mxcsr:",10,0
      fmt2        db    10,"Divide by zero, default mxcsr:",10,0
      fmt4        db    10,"Divide, round up:",10,0
      fmt5        db    10,"Divide, round down:",10,0
      fmt6        db    10,"Divide, truncate:",10,0
      f_div       db    "%.1f divided by %.1f is %.16f, in hex: ",0
      f_before    db    10,"mxcsr before:",9,0
      f_after     db    "mxcsr after:",9,0
;mxcsr values
      default_mxcsr     dd 0001111110000000b
      round_nearest     dd 0001111110000000b
      round_down        dd 0011111110000000b
      round_up          dd 0101111110000000b
      truncate          dd 0111111110000000b
section .bss
        mxcsr_before    resd  1
        mxcsr_after     resd  1
        xmm             resq  1
section .text
      global main
main:
push rbp
mov  rbp,rsp
;division
;default mxcsr
      mov   rdi,fmt1
      mov   rsi,ten
      mov   rdx,two
      mov   ecx, [default_mxcsr]
      call  apply_mxcsr
;----------------------------------------------
;division with precision error
;default mxcsr
      mov   rdi,fmt1
      mov   rsi,ten
      mov   rdx,three
      mov   ecx, [default_mxcsr]
      call  apply_mxcsr
;divide by zero
;default mxcsr
      mov   rdi,fmt2
      mov   rsi,ten
      mov   rdx,zero
      mov   ecx, [default_mxcsr]
      call  apply_mxcsr
;division with precision error
;round up
      mov   rdi,fmt4
      mov   rsi,ten
      mov   rdx,three
      mov   ecx, [round_up]
      call  apply_mxcsr
;division with precision error
;round down
      mov   rdi,fmt5
      mov   rsi,ten
      mov   rdx,three
      mov   ecx, [round_down]
      call  apply_mxcsr
;division with precision error
;truncate
      mov   rdi,fmt6
      mov   rsi,ten
      mov   rdx,three
      mov   ecx, [truncate]
      call  apply_mxcsr
;----------------------------------------------
;division with precision error
;default mxcsr
      mov   rdi,fmt1
      mov   rsi,eleven
      mov   rdx,three
      mov   ecx, [default_mxcsr]
      call  apply_mxcsr
;division with precision error
;round up
      mov   rdi,fmt4
      mov   rsi,eleven
      mov   rdx,three
      mov   ecx, [round_up]
      call  apply_mxcsr
;division with precision error
;round down
      mov   rdi,fmt5
      mov   rsi,eleven
      mov   rdx,three
      mov   ecx, [round_down]
      call  apply_mxcsr
;division with precision error
;truncate
      mov   rdi,fmt6
      mov   rsi,eleven
      mov   rdx,three
      mov   ecx, [truncate]
      call  apply_mxcsr
leave
ret
;function --------------------------------------------
apply_mxcsr:
push  rbp
mov   rbp,rsp
      push rsi
      push  rdx
      push  rcx
      push  rbp            ; one more for stack alignment
      call  printf
      pop   rbp
      pop   rcx
      pop   rdx
      pop   rsi
      mov         [mxcsr_before],ecx
      ldmxcsr     [mxcsr_before]
      movsd       xmm2, [rsi] ; double precision float into xmm2
      divsd       xmm2, [rdx]     ; divide xmm2
      stmxcsr     [mxcsr_after]   ; save mxcsr to memory
      movsd       [xmm],xmm2      ; for use in print_xmm
      mov         rdi,f_div
      movsd       xmm0, [rsi]
      movsd       xmm1, [rdx]
      mov         rax,3           ; three floating-point arguments in xmm0-xmm2
      call        printf
      call        print_xmm
;print mxcsr
      mov         rdi,f_before
      call        printf
      mov         edi, [mxcsr_before]   ; 32-bit load; upper half of rdi is zeroed
      call        print_mxcsr
      mov         rdi,f_after
      call        printf
      mov         edi, [mxcsr_after]    ; 32-bit load; upper half of rdi is zeroed
      call        print_mxcsr
leave
ret
;function --------------------------------------------
print_xmm:
push rbp
mov  rbp,rsp
     mov   rdi, hex    ;print 0x
     call  printf
     mov   rcx,8
.loop:
     xor   rdi,rdi
     mov   dil,[xmm+rcx-1]
     push  rcx
     sub   rsp,8           ; keep the stack 16-byte aligned for the call
     call  print_hex
     add   rsp,8
     pop   rcx
     loop  .loop
leave
ret
Listing 27-1. mxcsr.asm

// print_hex.c
#include <stdio.h>
void print_hex(unsigned char n){
           if (n < 16) printf("0");
           printf("%x",n);
}
Listing 27-2. print_hex.c

// print_mxcsr.c
#include <stdio.h>
void print_mxcsr(long long n){
      long long s,c;
      for (c = 15; c >= 0; c--)
      {
           s = n >> c;
           // space before every group of 4 bits
           if ((c+1) % 4 == 0) printf(" ");
           if (s & 1)
                      printf("1");
           else
                      printf("0");
      }
      printf(" ");
}
Listing 27-3. print_mxcsr.c

mxcsr: mxcsr.o print_mxcsr.o print_hex.o
      gcc -o mxcsr mxcsr.o print_mxcsr.o print_hex.o -no-pie
mxcsr.o: mxcsr.asm
      nasm -f elf64 -g -F dwarf mxcsr.asm -l mxcsr.lst
print_mxcsr.o: print_mxcsr.c
      gcc -c print_mxcsr.c
print_hex.o: print_hex.c
      gcc -c print_hex.c
Listing 27-4. makefile

In this program, we show the different rounding modes and a masked zero division. The default rounding mode is round to nearest. In decimal, for example, a positive number whose discarded part is .5 or higher is rounded to the next higher number, and a negative number whose discarded part is .5 or higher is rounded to the next lower (more negative) number. (On an exact tie, the processor picks the neighbor whose last bit is even.) Here, however, the processor rounds the binary value, which we display in hexadecimal, and that does not always give the same result as rounding the decimal representation!

Figure 27-1 shows the output.
Figure 27-1. mxcsr.asm output

Analyzing the Program

Let’s analyze the program. We have a number of divisions where we apply rounding. The divisions are done in the function apply_mxcsr. Before calling this function, we put the address of the print title in rdi, the address of the dividend in rsi, and the address of the divisor in rdx. Then we copy the desired mxcsr value from memory to ecx; for the first call, it’s the default mxcsr value. Then we call apply_mxcsr. In this function, we print the title, without forgetting to first preserve the necessary registers and align the stack. We then store the value in ecx to mxcsr_before and load mxcsr with the value stored in mxcsr_before with the instruction ldmxcsr. The instruction ldmxcsr takes a 32-bit memory variable (double word) as the operand. The instruction divsd takes an xmm register as its first operand and an xmm register or 64-bit memory variable as its second operand. After the division is done, the content of the mxcsr register is stored in memory in the variable mxcsr_after with the instruction stmxcsr. We copy the quotient in xmm2 to memory in the variable xmm in order to print it.

We first print the quotient in decimal and then want to print its bit pattern in hexadecimal on the same line. The floating-point conversions of printf show the value, not the raw bits stored in memory, so we created the function print_xmm to do that. This function takes the memory variable xmm and loads its bytes into dil one by one in a loop. In the same loop, the custom-built C function print_hex is called for every byte. By using the decreasing loop counter rcx in the address, we also take care of little-endianness: the floating-point value is stored in memory in little-endian format, so the most significant byte sits at the highest address and must be printed first.
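
The byte-by-byte approach can also be sketched in C. The function below is a hypothetical helper (double_to_hex is not part of the chapter's listings) and assumes a little-endian host such as x86-64; it walks the bytes from the highest address down, in the same order as the rcx loop.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "0x" plus the 16 hex digits of the
   double's bit pattern into out (at least 19 bytes).
   Assumes a little-endian host such as x86-64. */
void double_to_hex(double value, char *out) {
    unsigned char bytes[sizeof value];
    memcpy(bytes, &value, sizeof value);   /* raw bytes, lowest address first */
    out += sprintf(out, "0x");
    /* Highest address first, so the most significant byte leads. */
    for (int i = (int)sizeof value - 1; i >= 0; i--)
        out += sprintf(out, "%02x", bytes[i]);
}
```

For the value 5.0 (10 divided by 2), this produces 0x4014000000000000.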

Finally, mxcsr_before and mxcsr_after are displayed so that we can compare them. The function print_mxcsr is used to print the bits in mxcsr and is similar to the bit printing functions we used in previous chapters.

Some readers may find this complex; just step through the program with a debugger and observe the memory and registers.

Let’s analyze the output: you can see that mxcsr does not change when we divide 10 by 2. When we divide 10 by 3, we get 3.333..., which cannot be represented exactly. Here mxcsr signals a precision error in bit 5. The default rounding, round to nearest, increases the last hexadecimal digit from a to b. In decimal, the repeating 3s would be rounded down; in hexadecimal, however, the quotient is a repeating a, and because the discarded digits start with an a, which is higher than 8 (half), the last digit kept is rounded up to b.

We continue with a zero division: mxcsr signals a zero division in bit 2 (ZE), but the program does not crash because the zero-division mask ZM (bit 9) is set. The result is inf or 0x7ff0000000000000.
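
The masked zero division can be reproduced from C with the _mm_getcsr and _mm_setcsr intrinsics. This is a sketch under assumptions: an x86-64 target where double division compiles to divsd, and a hypothetical helper name divide_report_ze. The volatile variables are there only to keep the compiler from folding the division at compile time.

```c
#include <xmmintrin.h>

#define MXCSR_ZE 0x0004u   /* bit 2: divide-by-zero flag */
#define MXCSR_ZM 0x0200u   /* bit 9: divide-by-zero mask */

/* Hypothetical helper: divides a by b with the divide-by-zero exception
   masked, stores the quotient in *q, and returns nonzero when the
   division raised the ZE flag. */
int divide_report_ze(double a, double b, double *q) {
    volatile double x = a, y = b;   /* volatile: keep the division at run time */
    unsigned saved = _mm_getcsr();
    _mm_setcsr((saved | MXCSR_ZM) & ~MXCSR_ZE);   /* mask on, flag cleared */
    volatile double r = x / y;      /* a divsd on x86-64 */
    unsigned after = _mm_getcsr();
    _mm_setcsr(saved);              /* restore the caller's mxcsr */
    *q = r;
    return (after & MXCSR_ZE) != 0;
}
```

Dividing 10.0 by 0.0 this way returns 1 and leaves inf in *q, while dividing 10.0 by 2.0 returns 0.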

The next division, with round-up, has the same result as rounding to nearest. The next two divisions, with round-down and truncate, result in a number with a last hexadecimal digit of a.
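
The effect of the four rounding modes can be checked from C as well. The sketch below assumes an x86-64 target where double division compiles to divsd; divide_bits is a hypothetical helper, and the mxcsr values passed to it are the ones defined in the data section of Listing 27-1 (0x1F80, 0x3F80, 0x5F80, and 0x7F80 in hexadecimal).

```c
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>

/* Hypothetical helper: divides a by b with mxcsr temporarily set to
   mxcsr_value and returns the quotient's 64-bit pattern. */
uint64_t divide_bits(double a, double b, unsigned mxcsr_value) {
    volatile double x = a, y = b;   /* volatile: keep the division at run time */
    unsigned saved = _mm_getcsr();
    _mm_setcsr(mxcsr_value);
    volatile double r = x / y;      /* rounded according to bits 13-14 */
    _mm_setcsr(saved);              /* restore the caller's mxcsr */
    double q = r;
    uint64_t bits;
    memcpy(&bits, &q, sizeof bits); /* reinterpret the double as an integer */
    return bits;
}
```

For 10 divided by 3, the default value 0x1F80 and the round-up value 0x5F80 both give 0x400aaaaaaaaaaaab, while 0x3F80 (round down) and 0x7F80 (truncate) give 0x400aaaaaaaaaaaaa.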

To show the difference in rounding, we do the same exercise with 11 divided by 3. This division results in a quotient with a low final hexadecimal digit. You can compare the rounding behavior.

As an exercise, clear the zero-division mask bit and rerun the program. You will see that the program will crash. The zero-division mask and the other masks allow you to catch errors and jump to some error procedure.

Summary

In this chapter, you learned about the following:
  • The layout and purpose of the mxcsr register

  • How to manipulate the mxcsr register

  • The subtleties of rounding
