J. Van Hoey, Beginning x64 Assembly Programming, https://doi.org/10.1007/978-1-4842-5076-1_27

27. Watch Your MXCSR

Jo Van Hoey, Hamme, Belgium

Before diving into SSE programming, you need to understand the SSE control and status register for floating-point operations, called mxcsr. It is a 32-bit register, of which only the lower 16 bits are used. Here is the layout:

Bit	Mnemonic	Meaning
0	IE	Invalid operation error
1	DE	Denormal error
2	ZE	Divide-by-zero error
3	OE	Overflow error
4	UE	Underflow error
5	PE	Precision error
6	DAZ	Denormals are zeros
7	IM	Invalid operation mask
8	DM	Denormal operation mask
9	ZM	Divide-by-zero mask
10	OM	Overflow mask
11	UM	Underflow mask
12	PM	Precision mask
13	RC	Rounding control
14	RC	Rounding control
15	FZ	Flush to zero

Bits 0 to 5 are status flags. They indicate that a floating-point exception has been detected, such as a divide by zero, or that a floating-point operation caused a value to lose some precision. Bits 7 to 12 are masks that control what happens when a floating-point operation sets one of the flags in bits 0 to 5. If, for example, a divide-by-zero happens while its mask bit is 0, the CPU raises an exception and the program will normally crash. When the divide-by-zero mask is set to 1, the program does not crash: the flag is set, the instruction delivers a default result, and you can test the flag afterward and handle the error yourself. The masks are by default set to 1 so that no SIMD floating-point exceptions will be raised. Two bits (bits 13 and 14) control the rounding, as shown here:

Bits 14-13	Meaning
00	Round to nearest
01	Round down (toward negative infinity)
10	Round up (toward positive infinity)
11	Truncate (round toward zero)

We will not discuss all the status and mask details of the mxcsr register; refer to the Intel manuals for all details.
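To make the layout concrete, here is a minimal C sketch that splits an mxcsr value into its status flags, masks, and rounding control, following the bit positions in the table above. The function names are our own and purely illustrative; they are not part of any library.

```c
#include <assert.h>
#include <stdint.h>

/* Status flags IE, DE, ZE, OE, UE, PE live in bits 0-5. */
static unsigned mxcsr_flags(uint32_t mxcsr)
{
    return mxcsr & 0x3Fu;
}

/* Masks IM, DM, ZM, OM, UM, PM live in bits 7-12. */
static unsigned mxcsr_masks(uint32_t mxcsr)
{
    return (mxcsr >> 7) & 0x3Fu;
}

/* Rounding control lives in bits 13-14. */
static unsigned mxcsr_rounding(uint32_t mxcsr)
{
    return (mxcsr >> 13) & 0x3u;
}

/* Human-readable name of the rounding mode. */
static const char *mxcsr_rounding_name(uint32_t mxcsr)
{
    static const char *const names[4] = {
        "round to nearest", "round down", "round up", "truncate"
    };
    return names[mxcsr_rounding(mxcsr)];
}

/* Returns 1 when the given exception (0 = IE ... 5 = PE) is masked. */
static int mxcsr_is_masked(uint32_t mxcsr, unsigned exception)
{
    assert(exception <= 5);
    return (int)((mxcsr >> (7 + exception)) & 1u);
}
```

For example, mxcsr_masks(0x1F80) returns 0x3F (all six masks set), and mxcsr_flags(0x1FA0) returns 0x20, which is the precision flag in bit 5.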

Manipulating the mxcsr Bits

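The bits in the mxcsr register can be manipulated with the ldmxcsr and stmxcsr instructions: ldmxcsr loads mxcsr from a 32-bit memory operand, and stmxcsr stores mxcsr to a 32-bit memory operand. The default mxcsr state is 0x00001F80, or 0001 1111 1000 0000 in binary. All the mask bits are set, and rounding is set to nearest.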
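The control words that we load with ldmxcsr can be derived from the default value by changing only the two rounding-control bits. The following C sketch shows the arithmetic; the macro and function names are illustrative assumptions of ours.

```c
#include <assert.h>
#include <stdint.h>

#define MXCSR_DEFAULT 0x1F80u   /* all masks set, round to nearest */

/* Returns mxcsr with rounding control (bits 13-14) replaced by rc:
   0 = nearest, 1 = down, 2 = up, 3 = truncate. */
static uint32_t mxcsr_set_rounding(uint32_t mxcsr, unsigned rc)
{
    assert(rc <= 3);
    return (mxcsr & ~(3u << 13)) | ((uint32_t)rc << 13);
}
```

With the default value as the starting point, rc = 1 gives 0x3F80 (0011 1111 1000 0000), rc = 2 gives 0x5F80, and rc = 3 gives 0x7F80.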

Listing 27-1 through Listing 27-4 show an example of what can be done with mxcsr.

; mxcsr.asm
extern printf
extern print_mxcsr
extern print_hex
section .data
    eleven          dq  11.0
    two             dq  2.0
    three           dq  3.0
    ten             dq  10.0
    zero            dq  0.0
    hex             db  "0x",0
    fmt1            db  10,"Divide, default mxcsr:",10,0
    fmt2            db  10,"Divide by zero, default mxcsr:",10,0
    fmt4            db  10,"Divide, round up:",10,0
    fmt5            db  10,"Divide, round down:",10,0
    fmt6            db  10,"Divide, truncate:",10,0
    f_div           db  "%.1f divided by %.1f is %.16f, in hex: ",0
    f_before        db  10,"mxcsr before:",9,0
    f_after         db  "mxcsr after:",9,0
;mxcsr values
    default_mxcsr   dd  0001111110000000b
    round_nearest   dd  0001111110000000b
    round_down      dd  0011111110000000b
    round_up        dd  0101111110000000b
    truncate        dd  0111111110000000b
section .bss
    mxcsr_before    resd 1
    mxcsr_after     resd 1
    xmm             resq 1
section .text
    global main
main:
push rbp
mov rbp,rsp
;division
;default mxcsr
    mov     rdi,fmt1
    mov     rsi,ten
    mov     rdx,two
    mov     ecx,[default_mxcsr]
    call    apply_mxcsr
;----------------------------------------------
;division with precision error
;default mxcsr
    mov     rdi,fmt1
    mov     rsi,ten
    mov     rdx,three
    mov     ecx,[default_mxcsr]
    call    apply_mxcsr
;divide by zero
;default mxcsr
    mov     rdi,fmt2
    mov     rsi,ten
    mov     rdx,zero
    mov     ecx,[default_mxcsr]
    call    apply_mxcsr
;division with precision error
;round up
    mov     rdi,fmt4
    mov     rsi,ten
    mov     rdx,three
    mov     ecx,[round_up]
    call    apply_mxcsr
;division with precision error
;round down
    mov     rdi,fmt5
    mov     rsi,ten
    mov     rdx,three
    mov     ecx,[round_down]
    call    apply_mxcsr
;division with precision error
;truncate
    mov     rdi,fmt6
    mov     rsi,ten
    mov     rdx,three
    mov     ecx,[truncate]
    call    apply_mxcsr
;----------------------------------------------
;division with precision error
;default mxcsr
    mov     rdi,fmt1
    mov     rsi,eleven
    mov     rdx,three
    mov     ecx,[default_mxcsr]
    call    apply_mxcsr
;division with precision error
;round up
    mov     rdi,fmt4
    mov     rsi,eleven
    mov     rdx,three
    mov     ecx,[round_up]
    call    apply_mxcsr
;division with precision error
;round down
    mov     rdi,fmt5
    mov     rsi,eleven
    mov     rdx,three
    mov     ecx,[round_down]
    call    apply_mxcsr
;division with precision error
;truncate
    mov     rdi,fmt6
    mov     rsi,eleven
    mov     rdx,three
    mov     ecx,[truncate]
    call    apply_mxcsr
leave
ret
;function --------------------------------------------
apply_mxcsr:
push rbp
mov rbp,rsp
    push    rsi
    push    rdx
    push    rcx
    push    rbp             ; one more for stack alignment
    mov     rax,0           ; printf: no xmm registers used
    call    printf          ; print the title in rdi
    pop     rbp
    pop     rcx
    pop     rdx
    pop     rsi
    mov     [mxcsr_before],ecx
    ldmxcsr [mxcsr_before]  ; load the requested mxcsr value
    movsd   xmm2,[rsi]      ; double precision float into xmm2
    divsd   xmm2,[rdx]      ; divide xmm2
    stmxcsr [mxcsr_after]   ; save mxcsr to memory
    movsd   [xmm],xmm2      ; for use in print_xmm
    mov     rdi,f_div
    movsd   xmm0,[rsi]
    movsd   xmm1,[rdx]
    mov     rax,2           ; printf: two xmm registers used
    call    printf
    call    print_xmm
;print mxcsr
    mov     rdi,f_before
    mov     rax,0
    call    printf
    mov     edi,[mxcsr_before]  ; 32-bit load, zero-extended into rdi
    call    print_mxcsr
    mov     rdi,f_after
    mov     rax,0
    call    printf
    mov     edi,[mxcsr_after]   ; 32-bit load, zero-extended into rdi
    call    print_mxcsr
leave
ret
;function --------------------------------------------
print_xmm:
push rbp
mov rbp,rsp
    mov     rdi,hex         ; print 0x
    mov     rax,0
    call    printf
    mov     rcx,8           ; 8 bytes in a double
.loop:
    xor     rdi,rdi
    mov     dil,[xmm+rcx-1] ; highest byte first (little-endian storage)
    push    rcx
    push    rcx             ; one more for stack alignment
    call    print_hex
    pop     rcx
    pop     rcx
    loop    .loop
leave
ret

Listing 27-1. mxcsr.asm

// print_hex.c
#include <stdio.h>

void print_hex(unsigned char n){
    if (n < 16) printf("0");
    printf("%x",n);
}

Listing 27-2. print_hex.c

// print_mxcsr.c
#include <stdio.h>

void print_mxcsr(long long n){
    long long s,c;
    for (c = 15; c >= 0; c--)
    {
        s = n >> c;
        // space before every group of 4 bits
        if ((c+1) % 4 == 0) printf(" ");
        if (s & 1)
            printf("1");
        else
            printf("0");
    }
    printf(" ");
}

Listing 27-3. print_mxcsr.c

mxcsr: mxcsr.o print_mxcsr.o print_hex.o
	gcc -o mxcsr mxcsr.o print_mxcsr.o print_hex.o -no-pie
mxcsr.o: mxcsr.asm
	nasm -f elf64 -g -F dwarf mxcsr.asm -l mxcsr.lst
print_mxcsr.o: print_mxcsr.c
	gcc -c print_mxcsr.c
print_hex.o: print_hex.c
	gcc -c print_hex.c

Listing 27-4. makefile

In this program, we show different rounding modes and a masked zero division. The default rounding mode is round to nearest: the result is the representable value closest to the exact result. In decimal terms, a positive number whose discarded part is more than .5 is rounded to the higher number, and a negative number whose discarded part is more than .5 is rounded to the lower (more negative) number. However, the CPU rounds the binary representation of the value, which we display in hexadecimal, and that does not always give the same result as rounding in decimal!

Figure 27-1 shows the output.

Figure 27-1. mxcsr.asm output

Analyzing the Program

Let’s analyze the program. We have a number of divisions where we apply rounding. The divisions are done in the function apply_mxcsr. Before calling this function, we put the address of the print title in rdi, the address of the dividend in rsi, and the address of the divisor in rdx. Then we copy the desired mxcsr value from memory to ecx; for the first call, it’s the default mxcsr value. Then we call apply_mxcsr. In this function, we print the title, without forgetting to first preserve the necessary registers and align the stack. We then store the value in ecx to mxcsr_before and load mxcsr with the value stored in mxcsr_before with the instruction ldmxcsr. The instruction ldmxcsr takes a 32-bit memory variable (double word) as the operand. The instruction divsd takes an xmm register as its first operand and an xmm register or 64-bit memory variable as its second operand. After the division is done, the content of the mxcsr register is stored in memory in the variable mxcsr_after with the instruction stmxcsr. We copy the quotient in xmm2 to memory in the variable xmm in order to print it.
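For comparison, the core of apply_mxcsr can be sketched in C with the compiler intrinsics _mm_setcsr and _mm_getcsr from xmmintrin.h, which correspond to ldmxcsr and stmxcsr. This is only a sketch under assumptions: it requires an x86-64 compiler that performs double arithmetic with SSE, and the struct and function names are our own. The volatile variables keep the compiler from computing the quotient at compile time.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>   /* _mm_getcsr, _mm_setcsr (x86 only) */

typedef struct {
    uint64_t quotient_bits;  /* raw bits of the double quotient */
    uint32_t mxcsr_after;    /* mxcsr as read right after the division */
} DivResult;

/* Divides under the given mxcsr value and reports the result and flags. */
static DivResult divide_with_mxcsr(double dividend, double divisor,
                                   uint32_t mxcsr)
{
    volatile double a = dividend, b = divisor, q;
    DivResult r;
    unsigned int saved;

    assert((mxcsr & 0xFFFF0000u) == 0);  /* upper 16 bits must stay 0 */

    saved = _mm_getcsr();        /* like stmxcsr: remember caller's state */
    _mm_setcsr(mxcsr);           /* like ldmxcsr [mxcsr_before] */
    q = a / b;                   /* the division, as divsd does */
    r.mxcsr_after = _mm_getcsr();/* like stmxcsr [mxcsr_after] */
    _mm_setcsr(saved);           /* restore the caller's mxcsr */

    double copy = q;
    memcpy(&r.quotient_bits, &copy, sizeof copy);
    return r;
}
```

Calling divide_with_mxcsr(10.0, 3.0, 0x1F80) should report the precision flag (bit 5) in mxcsr_after, while dividing 10.0 by 2.0 should leave the value unchanged. Unlike the assembly version, the sketch restores the previous mxcsr before returning, because compiled C code around it relies on the default state.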

We first print the quotient in decimal and then want to print its raw bits in hexadecimal on the same line. The %f conversion that we pass to printf cannot show the raw bits of the double, so we created our own function for that: print_xmm. This function takes the memory variable xmm and loads its bytes into dil one by one in a loop. In the same loop, the custom-built C function print_hex is called for every byte. By using the decreasing loop counter rcx in the address, we also take care of little-endianness: the floating-point value is stored in memory in little-endian format, so the most significant byte sits at the highest address and must be printed first.
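The byte loop in print_xmm can be mirrored in C. The sketch below assumes a little-endian machine, as in the assembly program, and formats the bytes of a double from the highest address down to the lowest; the function name and buffer interface are illustrative choices of ours.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "0x" followed by 16 hex digits into out (at least 19 bytes).
   Assumes little-endian storage: the last byte is the most significant. */
static void double_to_hex(double value, char *out, size_t out_size)
{
    unsigned char bytes[sizeof value];
    size_t i;

    assert(out_size >= 2 + 2 * sizeof value + 1);
    memcpy(bytes, &value, sizeof value);      /* raw bytes, like [xmm] */

    out += snprintf(out, 3, "0x");
    for (i = sizeof value; i > 0; i--)        /* like rcx counting down */
        out += snprintf(out, 3, "%02x", (unsigned)bytes[i - 1]);
}
```

The %02x conversion pads each byte to two digits, which is the job that print_hex does with its extra "0" for values below 16.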

Finally, mxcsr_before and mxcsr_after are displayed so that we can compare them. The function print_mxcsr is used to print the bits in mxcsr and is similar to the bit printing functions we used in previous chapters.

Some readers may find this complex; just step through the program with a debugger and observe the memory and registers.

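Let’s analyze the output: you can see that mxcsr does not change when we divide 10 by 2. When we divide 10 by 3, the result is 3.333... and mxcsr signals a precision error in bit 5. In hexadecimal, the exact quotient continues with the digit a indefinitely, so it has to be rounded to fit in the 52-bit fraction. The default rounding, rounding to nearest, increases the last retained hexadecimal digit from a to b: the first discarded digit is also an a, which is higher than 8 (half of 16), so the value is rounded up. In decimal, 3.333... would be rounded down; in hexadecimal, the same quotient is rounded up.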

We continue with a zero division: mxcsr signals a zero division in bit 2 (the ZE flag), but the program does not crash because the zero-division mask ZM (bit 9) is set. The result is inf, or 0x7ff0000000000000.
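Why 0x7ff0000000000000 means infinity becomes clear when we split the 64 bits into the fields of a double: 1 sign bit, 11 exponent bits, and 52 fraction bits. An exponent of all ones combined with a zero fraction encodes infinity. The following C sketch, with helper names of our own choosing, extracts the fields.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static unsigned double_sign(uint64_t bits)
{
    return (unsigned)(bits >> 63);            /* bit 63 */
}

static unsigned double_exponent(uint64_t bits)
{
    return (unsigned)((bits >> 52) & 0x7FFu); /* bits 52-62 */
}

static uint64_t double_fraction(uint64_t bits)
{
    return bits & 0xFFFFFFFFFFFFFULL;         /* bits 0-51 */
}

/* Exponent all ones and fraction zero: positive or negative infinity. */
static int is_infinity_pattern(uint64_t bits)
{
    return double_exponent(bits) == 0x7FFu && double_fraction(bits) == 0;
}

/* Reinterprets the 64 bits as a double. */
static double bits_to_double(uint64_t bits)
{
    double d;
    assert(sizeof d == sizeof bits);
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

For 0x7ff0000000000000 the sign is 0, the exponent is 0x7ff, and the fraction is 0, so is_infinity_pattern returns 1 and the value is positive infinity.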

The next division and round-up has the same result as rounding to nearest. The next two divisions with round-down and truncate result in a number with a last hexadecimal digit of a.

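To show the difference in rounding, we do the same exercise with 11 divided by 3. This quotient ends in the hexadecimal digit 5, and the discarded digits also continue as 5, which is lower than 8 (half of 16). Rounding to nearest therefore leaves the last digit at 5, as do round-down and truncate; only round-up increases it to 6. You can compare the rounding behavior.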
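The rounding decisions for 10/3 and 11/3 can be checked with plain integer arithmetic. The sketch below is our own illustration, not how the CPU is implemented: the caller scales the dividend so that the integer quotient is the 53-bit significand (the leading 1 plus the 52 fraction bits), and the remainder tells us what was discarded. It handles positive values only, and an exact tie in round to nearest goes to the even significand.

```c
#include <assert.h>
#include <stdint.h>

enum { RC_NEAREST = 0, RC_DOWN = 1, RC_UP = 2, RC_TRUNCATE = 3 };

/* scaled_num / den must lie in [2^52, 2^53); positive values only.
   Returns the 53-bit significand rounded according to rc. */
static uint64_t round_significand(uint64_t scaled_num, uint64_t den, int rc)
{
    uint64_t q, r;

    assert(den != 0);
    q = scaled_num / den;    /* truncated significand */
    r = scaled_num % den;    /* what the truncation discarded */

    if (r == 0)
        return q;            /* exact result, no precision error */
    if (rc == RC_UP)
        return q + 1;
    if (rc == RC_NEAREST &&
        (2 * r > den || (2 * r == den && (q & 1))))
        return q + 1;        /* more than half, or a tie with odd q */
    return q;                /* round down and truncate agree here */
}
```

For 10/3 we pass 10 << 51 as the scaled dividend: the truncated significand is 0x1aaaaaaaaaaaaa with remainder 2, which is more than half of 3, so round to nearest gives 0x1aaaaaaaaaaaab. For 11/3 we pass 11 << 51: the truncated significand is 0x1d555555555555 with remainder 1, which is less than half of 3, so only round-up changes the last digit.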

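As an exercise, clear the zero-division mask bit (bit 9) and rerun the program. You will see that the program crashes at the division by zero: with the mask cleared, the division raises a SIMD floating-point exception, which on Linux is delivered to the program as a SIGFPE signal and terminates it unless a handler is installed. The masks therefore let you choose how to deal with errors: leave them set and check the status flags after the operation, or clear them and let the exception transfer control to some error procedure.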
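The value to use for this exercise can be worked out by clearing one mask bit in the default mxcsr value. Here is a small C sketch of that calculation; the function name is an illustrative assumption of ours.

```c
#include <assert.h>
#include <stdint.h>

/* Clears the mask bit that belongs to an exception flag.
   exception: 0 = IE, 1 = DE, 2 = ZE, 3 = OE, 4 = UE, 5 = PE.
   The matching mask sits 7 bits higher (IM is bit 7, ZM is bit 9). */
static uint32_t mxcsr_unmask(uint32_t mxcsr, unsigned exception)
{
    assert(exception <= 5);
    return mxcsr & ~(1u << (exception + 7));
}
```

Unmasking the zero-division exception in the default value 0x1F80 gives 0x1D80, or 0001 1101 1000 0000 in binary, which can be stored in a dd variable in the same way as the other mxcsr values in the program.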

Summary

In this chapter, you learned about the following:

The layout and purpose of the mxcsr register
How to manipulate the mxcsr register
The subtleties of rounding
