Bit | Mnemonic | Meaning |
---|---|---|
0 | IE | Invalid operation error |
1 | DE | Denormal error |
2 | ZE | Divide-by-zero error |
3 | OE | Overflow error |
4 | UE | Underflow error |
5 | PE | Precision error |
6 | DAZ | Denormals are zeros |
7 | IM | Invalid operation mask |
8 | DM | Denormal operation mask |
9 | ZM | Divide-by-zero mask |
10 | OM | Overflow mask |
11 | UM | Underflow mask |
12 | PM | Precision mask |
13 | RC | Rounding control |
14 | RC | Rounding control |
15 | FZ | Flush to zero |

RC bits (14 and 13) | Rounding mode |
---|---|
00 | Round to nearest |
01 | Round down |
10 | Round up |
11 | Truncate |
We will not discuss every status and mask detail of the mxcsr register; refer to the Intel manuals for the complete description.
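To make the layout concrete, here is a minimal C sketch that decodes an mxcsr value using the bit positions from the two tables above. The names mxcsr_bit, mxcsr_rc, and mxcsr_rc_name are hypothetical helpers invented for this illustration; they are not part of the chapter's program.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions copied from the mxcsr layout table. */
enum {
    MXCSR_IE = 0,  MXCSR_DE = 1,  MXCSR_ZE = 2,
    MXCSR_OE = 3,  MXCSR_UE = 4,  MXCSR_PE = 5,
    MXCSR_DAZ = 6,
    MXCSR_IM = 7,  MXCSR_DM = 8,  MXCSR_ZM = 9,
    MXCSR_OM = 10, MXCSR_UM = 11, MXCSR_PM = 12,
    MXCSR_FZ = 15
};

/* Returns 1 if the given bit (0..15) is set in an mxcsr value, else 0. */
int mxcsr_bit(uint32_t mxcsr, int bit)
{
    assert(bit >= 0 && bit <= 15);
    return (int)((mxcsr >> bit) & 1u);
}

/* Extracts the two rounding-control bits (14 and 13) as a value 0..3. */
unsigned mxcsr_rc(uint32_t mxcsr)
{
    return (unsigned)((mxcsr >> 13) & 3u);
}

/* Maps the rounding-control field to the names used in the second table. */
const char *mxcsr_rc_name(uint32_t mxcsr)
{
    static const char *const names[4] = {
        "Round to nearest", "Round down", "Round up", "Truncate"
    };
    return names[mxcsr_rc(mxcsr)];
}
```

For example, mxcsr_bit(0x1FA0, MXCSR_PE) reports that the precision flag is set, and mxcsr_rc_name(0x1F80) returns "Round to nearest".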
Manipulating the mxcsr Bits
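The bits in the mxcsr register can be manipulated with the ldmxcsr and stmxcsr instructions: ldmxcsr loads the register from memory, and stmxcsr stores it to memory. The default mxcsr state is 0x00001F80; its low 16 bits are 0001 1111 1000 0000 in binary. All six mask bits (bits 7 to 12) are set, and the rounding control bits are 00, round to nearest.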
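As a small illustration of building such values, the following C sketch replaces the rounding-control field of an mxcsr value while leaving the other bits alone. The macro names and the function mxcsr_with_rounding are assumptions made for this example; in assembly, the resulting value would be placed in a 32-bit memory variable and loaded with ldmxcsr.

```c
#include <assert.h>
#include <stdint.h>

#define MXCSR_DEFAULT  0x1F80u                 /* all masks set, round to nearest */
#define MXCSR_RC_SHIFT 13                      /* RC occupies bits 13 and 14 */
#define MXCSR_RC_MASK  (3u << MXCSR_RC_SHIFT)

/* Returns mxcsr with its rounding-control field replaced by rc (0..3). */
uint32_t mxcsr_with_rounding(uint32_t mxcsr, unsigned rc)
{
    assert(rc <= 3u);
    return (mxcsr & ~MXCSR_RC_MASK) | ((uint32_t)rc << MXCSR_RC_SHIFT);
}
```

Starting from the default value, rc values 1, 2, and 3 give 0x3F80 (round down), 0x5F80 (round up), and 0x7F80 (truncate).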
The example program consists of four files: mxcsr.asm, print_hex.c, print_mxcsr.c, and makefile.
This program demonstrates the different rounding modes and a masked division by zero. The default mode is round to nearest. In decimal terms, a positive number whose fractional part is .5 or higher is rounded to the next higher number, and a negative number whose fractional part is .5 or higher is rounded to the next lower (more negative) number. However, here the processor rounds the binary value, which we display in hexadecimal, and that does not always give the same result as rounding in decimal!
Analyzing the Program
Let’s analyze the program. We have a number of divisions where we apply rounding. The divisions are done in the function apply_mxcsr. Before calling this function, we put the address of the print title in rdi, the dividend in rsi, and the divisor in rdx. Then we copy the desired mxcsr value from memory to ecx; for the first call, it’s the default mxcsr value. Then we call apply_mxcsr. In this function, we print the title, without forgetting to first preserve the necessary registers and align the stack. We then store the value in ecx to mxcsr_before and load mxcsr with the value stored in mxcsr_before with the instruction ldmxcsr. The instruction ldmxcsr takes a 32-bit memory variable (double word) as its operand. The instruction divsd takes an xmm register as its first operand and an xmm register or a 64-bit memory variable as its second operand. After the division is done, the content of the mxcsr register is stored in memory in the variable mxcsr_after with the instruction stmxcsr. We copy the quotient in xmm2 to memory in the variable xmm in order to print it.
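The load, divide, store sequence described above can be approximated in C. This is a sketch under several assumptions: an x86-64 target where double division compiles to divsd, a GCC- or Clang-style compiler, and the intrinsics _mm_setcsr and _mm_getcsr from xmmintrin.h standing in for ldmxcsr and stmxcsr. The struct div_result and the function divide_with_mxcsr are hypothetical names; this is not the chapter's apply_mxcsr.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>   /* _mm_getcsr, _mm_setcsr (x86 SSE) */

typedef struct {
    double   quotient;
    uint32_t mxcsr_before;   /* value loaded before the division */
    uint32_t mxcsr_after;    /* value read back after the division */
} div_result;

/* Loads mxcsr, divides, and reads mxcsr back, mirroring the steps above. */
div_result divide_with_mxcsr(double dividend, double divisor, uint32_t mxcsr)
{
    /* volatile keeps the compiler from folding or moving the division */
    volatile double a = dividend, b = divisor, q;
    div_result r;
    unsigned int saved = _mm_getcsr();   /* remember the caller's mxcsr */

    assert((mxcsr >> 16) == 0);          /* upper bits are reserved */
    r.mxcsr_before = mxcsr;
    _mm_setcsr(mxcsr);                   /* corresponds to ldmxcsr */
    q = a / b;                           /* divsd on x86-64 */
    r.mxcsr_after = _mm_getcsr();        /* corresponds to stmxcsr */
    _mm_setcsr(saved);                   /* restore the caller's mxcsr */

    r.quotient = q;
    return r;
}
```

Calling divide_with_mxcsr(10.0, 3.0, 0x1F80) and comparing mxcsr_before with mxcsr_after shows which status flags the division raised. Restoring the saved value at the end is a design choice of this sketch so that the caller's rounding mode is not disturbed.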
We first print the quotient in decimal and then want to print it in hexadecimal on the same line. We cannot print a hexadecimal value with printf from within assembly (at least not in the version in use here); we have to create a function for doing that. So, we created the function print_xmm. This function takes the memory variable xmm and loads its bytes into dil one by one in a loop. In the same loop, the custom-built C function print_hex is called for every byte. By using the decreasing loop counter rcx in the address, we start with the byte at the highest address and so take care of little-endianness: the floating-point value is stored in memory in little-endian format!
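The byte-by-byte idea can be sketched in C as follows. The function double_to_hex is a hypothetical stand-in that builds a string instead of printing; it assumes a little-endian machine and an 8-byte double, and it is not the chapter's print_xmm or print_hex.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "0x" plus 16 hex digits for the bytes of value into out (19 bytes). */
void double_to_hex(double value, char out[19])
{
    unsigned char bytes[sizeof value];
    int i;

    assert(sizeof value == 8);
    memcpy(bytes, &value, sizeof value);   /* raw bytes as stored in memory */
    strcpy(out, "0x");
    /* Little-endian: the most significant byte sits at the highest address,
       so we walk the bytes from index 7 down to 0. */
    for (i = 7; i >= 0; i--)
        sprintf(out + 2 + 2 * (7 - i), "%02x", (unsigned)bytes[i]);
}
```

For instance, the value 5.0 produces the string 0x4014000000000000.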
Finally, mxcsr_before and mxcsr_after are displayed so that we can compare them. The function print_mxcsr is used to print the bits in mxcsr and is similar to the bit printing functions we used in previous chapters.
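A bit-printing helper of this kind might look like the following C sketch. The function mxcsr_to_bits is an assumption for illustration (it fills a string rather than printing) and is not the chapter's print_mxcsr.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Writes the low 16 bits of mxcsr as '0'/'1' characters, bit 15 first. */
void mxcsr_to_bits(uint32_t mxcsr, char out[17])
{
    int bit;

    assert(out != NULL);
    for (bit = 15; bit >= 0; bit--)
        out[15 - bit] = ((mxcsr >> bit) & 1u) ? '1' : '0';
    out[16] = '\0';
}
```

With this helper, the default value 0x1F80 becomes 0001111110000000, and a value with the precision flag set, 0x1FA0, becomes 0001111110100000, so the changed bit is easy to spot.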
Some readers may find this complex; just step through the program with a debugger and observe the memory and registers.
Let’s analyze the output: you can see that mxcsr does not change when we divide 10 by 2. When we divide 10 by 3, the quotient is 3.333... and mxcsr signals a precision error in bit 5, because the result cannot be represented exactly. The default rounding, round to nearest, increases the last hexadecimal digit from a to b. In decimal, a trailing 3 would be rounded down; in hexadecimal, however, the discarded digits begin with an a, which is higher than 8 (half of 16), so the last digit is rounded up to b.
We continue with a zero division: mxcsr signals a zero division in bit 2 (ZE), but the program does not crash because the zero-division mask ZM is set. The result is inf, or 0x7ff0000000000000.
The next division, with round-up, gives the same result as round to nearest. The following two divisions, with round-down and truncate, give a quotient whose last hexadecimal digit is a.
To show the difference in rounding, we do the same exercise with 11 divided by 3. This division results in a quotient with a low final hexadecimal digit. You can compare the rounding behavior.
As an exercise, clear the zero-division mask bit (ZM, bit 9) and rerun the program. You will see that the program crashes. The zero-division mask and the other masks allow you to catch errors and jump to some error procedure.
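To find the value to use in the exercise, the mask bit can be cleared with a simple bit operation. The following C sketch uses a hypothetical helper, mxcsr_unmask, which only computes the new value and does not load it into the register.

```c
#include <assert.h>
#include <stdint.h>

#define MXCSR_ZM_BIT 9   /* divide-by-zero mask, from the mxcsr layout table */

/* Returns mxcsr with the given mask bit (7..12) cleared, that is, unmasked. */
uint32_t mxcsr_unmask(uint32_t mxcsr, int mask_bit)
{
    assert(mask_bit >= 7 && mask_bit <= 12);
    return mxcsr & ~((uint32_t)1 << mask_bit);
}
```

Applied to the default value, mxcsr_unmask(0x1F80, MXCSR_ZM_BIT) gives 0x1D80, or 0001 1101 1000 0000 in binary.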
Summary
The layout and purpose of the mxcsr register
How to manipulate the mxcsr register
The subtleties of rounding