© Jo Van Hoey 2019
J. Van Hoey, Beginning x64 Assembly Programming, https://doi.org/10.1007/978-1-4842-5076-1_33

33. Do the Shuffle!

Jo Van Hoey, Hamme, Belgium

With the unmasked string instructions, we have a few options. We can find a first or last occurrence of a character, but finding all occurrences is more challenging. We can compare strings and find a difference, but finding all differences is more complicated. Luckily, we also have string instructions that use masks, which makes them much more powerful. But before diving into mask instructions, we need to look at shuffling.

A First Look at Shuffling

Shuffling means moving around packed values. The moving can be within the same xmm register or from one xmm register to another xmm register, or it can be from a 128-bit memory location to an xmm register.

Listing 33-1 shows the example code.
; shuffle.asm
extern printf
section .data
      fmt0  db "These are the numbers in memory: ",10,0
      fmt00 db "This is xmm0: ",10,0
      fmt1  db "%d ",0
      fmt2  db "Shuffle-broadcast double word %i:",10,0
      fmt3  db "%d %d %d %d",10,0
      fmt4  db "Shuffle-reverse double words:",10,0
      fmt5  db "Shuffle-reverse packed bytes in xmm0:",10,0
      fmt6  db "Shuffle-rotate left:",10,0
      fmt7  db "Shuffle-rotate right:",10,0
      fmt8  db "%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c",10,0
      fmt9  db "Packed bytes in xmm0:",10,0
      NL    db 10,0
      number1     dd 1
      number2     dd 2
      number3     dd 3
      number4     dd 4
      char  db "abcdefghijklmnop"
      bytereverse db 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
section .bss
section .text
      global main
main:
push  rbp
mov   rbp,rsp
      sub  rsp,32     ;stackspace for the original xmm0
                      ;and for the modified xmm0
; SHUFFLING DOUBLE WORDS
; first print the numbers in reverse
      mov   rdi, fmt0
      xor   rax,rax         ;no vector registers are passed
      call  printf
      mov   rdi, fmt1
      mov   esi, [number4]  ;the numbers are 32 bits wide
      xor   rax,rax
      call  printf
      mov   rdi, fmt1
      mov   esi, [number3]
      xor   rax,rax
      call  printf
      mov   rdi, fmt1
      mov   esi, [number2]
      xor   rax,rax
      call  printf
      mov   rdi, fmt1
      mov   esi, [number1]
      xor   rax,rax
      call  printf
      mov   rdi, NL
      xor   rax,rax
      call  printf
; build xmm0 with the numbers
      pxor      xmm0,xmm0
      pinsrd    xmm0, dword[number1],0
      pinsrd    xmm0, dword[number2],1
      pinsrd    xmm0, dword[number3],2
      pinsrd    xmm0, dword[number4],3
      movdqu    [rbp-16],xmm0   ;save xmm0 for later use
      mov       rdi, fmt00
      call      printf          ;print title
      movdqu    xmm0,[rbp-16]   ;restore xmm0 after printf
      call      print_xmm0d     ;print xmm0
      movdqu    xmm0,[rbp-16]   ;restore xmm0 after printf
; SHUFFLE-BROADCAST
; shuffle: broadcast least significant dword (index 0)
      movdqu    xmm0,[rbp-16]         ;restore xmm0
      pshufd    xmm0,xmm0,00000000b   ;shuffle
      mov       rdi,fmt2
      mov       rsi, 0                ;print title
      movdqu    [rbp-32],xmm0         ;printf destroys xmm0
      call      printf
      movdqu    xmm0,[rbp-32]    ;restore xmm0 after printf
      call      print_xmm0d      ;print the content of xmm0
; shuffle: broadcast dword index 1
      movdqu    xmm0,[rbp-16]         ;restore xmm0
      pshufd    xmm0,xmm0,01010101b   ;shuffle
      mov       rdi,fmt2
      mov       rsi, 1                ;print title
      movdqu    [rbp-32],xmm0         ;printf destroys xmm0
      call      printf
      movdqu    xmm0,[rbp-32]    ;restore xmm0 after printf
      call      print_xmm0d      ;print the content of xmm0
; shuffle: broadcast dword index 2
      movdqu    xmm0,[rbp-16]         ;restore xmm0
      pshufd    xmm0,xmm0,10101010b   ;shuffle
      mov       rdi,fmt2
      mov       rsi, 2                ;print title
      movdqu    [rbp-32],xmm0         ;printf destroys xmm0
      call      printf
      movdqu    xmm0,[rbp-32]    ;restore xmm0 after printf
      call      print_xmm0d      ;print the content of xmm0
; shuffle: broadcast dword index 3
      movdqu    xmm0,[rbp-16]         ;restore xmm0
      pshufd    xmm0,xmm0,11111111b   ;shuffle
      mov       rdi,fmt2
      mov       rsi, 3                ;print title
      movdqu    [rbp-32],xmm0         ;printf destroys xmm0
      call      printf
      movdqu    xmm0,[rbp-32]    ;restore xmm0 after printf
      call      print_xmm0d      ;print the content of xmm0
; SHUFFLE-REVERSE
; reverse double words
      movdqu    xmm0,[rbp-16]         ;restore xmm0
      pshufd    xmm0,xmm0,00011011b   ;shuffle
      mov       rdi,fmt4              ;print title
      movdqu    [rbp-32],xmm0         ;printf destroys xmm0
      call      printf
      movdqu    xmm0,[rbp-32]    ;restore xmm0 after printf
      call      print_xmm0d      ;print the content of xmm0
; SHUFFLE-ROTATE
; rotate left
      movdqu    xmm0,[rbp-16]         ;restore xmm0
      pshufd    xmm0,xmm0,10010011b   ;shuffle
      mov       rdi,fmt6              ;print title
      movdqu    [rbp-32],xmm0         ;printf destroys xmm0
      call      printf
      movdqu    xmm0,[rbp-32]    ;restore xmm0 after printf
      call      print_xmm0d      ;print the content of xmm0
; rotate right
      movdqu    xmm0,[rbp-16]         ;restore xmm0
      pshufd    xmm0,xmm0,00111001b   ;shuffle
      mov       rdi,fmt7              ;print title
      movdqu    [rbp-32],xmm0         ;printf destroys xmm0
      call      printf
      movdqu    xmm0,[rbp-32]    ;restore xmm0 after printf
      call      print_xmm0d      ;print the content of xmm0
;SHUFFLING BYTES
      mov       rdi, fmt9
      call      printf           ;print title
      movdqu    xmm0,[char]      ;load the 16 characters in xmm0
      movdqu    [rbp-32],xmm0    ;printf destroys xmm0
      call      print_xmm0b      ;print the bytes in xmm0
      movdqu    xmm0,[rbp-32]    ;restore xmm0 after printf
      movdqu    xmm1,[bytereverse]    ;load the mask
      pshufb    xmm0,xmm1             ;shuffle bytes
      mov       rdi,fmt5              ;print title
      movdqu    [rbp-32],xmm0         ;printf destroys xmm0
      call      printf
      movdqu    xmm0,[rbp-32]    ;restore xmm0 after printf
      call      print_xmm0b      ;print the content of xmm0
leave
ret
;function to print double words--------------------
print_xmm0d:
push  rbp
mov   rbp,rsp
      mov       rdi, fmt3
      xor       rax,rax
      pextrd    esi, xmm0,3    ;extract the double words
      pextrd    edx, xmm0,2    ;in reverse, little endian
      pextrd    ecx, xmm0,1
      pextrd    r8d, xmm0,0
      call      printf
leave
ret
;function to print bytes---------------------------
print_xmm0b:
push  rbp
mov   rbp,rsp
      sub       rsp,8          ;padding: with the 11 pushes below, rsp
                               ;is 16-byte aligned again at the call
      mov       rdi, fmt8
      pextrb    esi, xmm0,0    ;bytes 0 to 4 go in registers,
      pextrb    edx, xmm0,1    ;lowest byte first
      pextrb    ecx, xmm0,2
      pextrb    r8d, xmm0,3
      pextrb    r9d, xmm0,4
      pextrb    eax, xmm0,15   ;bytes 5 to 15 go on the stack,
      push  rax                ;pushed in reverse order
      pextrb    eax, xmm0,14
      push  rax
      pextrb    eax, xmm0,13
      push  rax
      pextrb    eax, xmm0,12
      push  rax
      pextrb    eax, xmm0,11
      push  rax
      pextrb    eax, xmm0,10
      push  rax
      pextrb    eax, xmm0,9
      push  rax
      pextrb    eax, xmm0,8
      push  rax
      pextrb    eax, xmm0,7
      push  rax
      pextrb    eax, xmm0,6
      push  rax
      pextrb    eax, xmm0,5
      push  rax
      xor       rax,rax        ;no vector registers are passed
      call  printf
leave
ret
Listing 33-1. shuffle.asm

First, we reserve 32 bytes on the stack for two 128-bit (16-byte) local variables: one for the original xmm0 and one for the modified xmm0. We need this space for “pushing” xmm registers on the stack. We cannot use the standard push/pop instructions with xmm registers; we must use memory addressing to copy them to and from the stack. We use rbp, the base pointer, as a point of reference.

We print the numbers we will use as packed values. Then we load the numbers as double words into xmm0 with the instruction pinsrd (which means “packed insert double word”). We save (“push”) xmm0 as a local stack variable with the instruction movdqu [rbp-16],xmm0. (We reserved space for this local variable at the start of the program.) The xmm registers are not preserved across function calls, so any call to printf may modify xmm0; we therefore have to save the original value of xmm0 and restore it whenever we need it again. The instruction movdqu moves packed integer values to or from memory that does not have to be 16-byte aligned. To help visualize the results of the shuffling, we take little-endian ordering into account when printing: print_xmm0d prints the most significant double word first, so the output shows xmm0 the way you see it in a debugger such as SASM.
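
The print order can be checked outside the assembler. The following Python sketch is only a model of the data layout, not part of Listing 33-1; the helper names pack_dwords and as_register_text are made up for this illustration. It packs the four numbers the way the pinsrd instructions do and then formats them with the highest double word first, as print_xmm0d does.

```python
import struct

def pack_dwords(d0, d1, d2, d3):
    """Model the four pinsrd instructions: d0 lands in the lowest 4 bytes."""
    return struct.pack("<4I", d0, d1, d2, d3)

def as_register_text(xmm_bytes):
    """Format like print_xmm0d: highest double word first."""
    dwords = struct.unpack("<4I", xmm_bytes)
    return " ".join(str(d) for d in reversed(dwords))

xmm0 = pack_dwords(1, 2, 3, 4)
print(xmm0.hex())              # memory order, lowest address first
print(as_register_text(xmm0))  # prints: 4 3 2 1
```

The first line of output shows the bytes as they sit in memory; the second shows the same value read as a register, which is why the listing extracts index 3 first.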

To shuffle, we need a destination operand, a source operand, and a shuffle mask. The mask is an 8-bit immediate. We will discuss some useful examples of shuffling and the respective masks in the following sections.
  • Shuffle broadcast

  • Shuffle reverse

  • Shuffle rotate

Shuffle Broadcast

A picture can make everything more understandable. Figure 33-1 shows four examples of shuffle broadcast.
Figure 33-1. Shuffle broadcast

In the figure, the source and target are both xmm0. The mask consists of four 2-bit fields, one for each target position: bits 1-0 control position 0, and bits 7-6 control position 3. Each field holds the index of the source double word to copy. The least significant double word, d0, is specified in the mask as 00b. The second lowest, d1, is specified as 01b. The third, d2, is specified as 10b. The fourth, d3, is specified as 11b. The binary mask 10101010b, or 0aah in hexadecimal, works as follows: put d2 (10b) in the four target packed double-word positions. Similarly, the mask 11111111b would place d3 (11b) in the four target packed double-word positions.

When you study the code, you will see the following simple shuffle instruction:
      pshufd xmm0,xmm0,10101010b

This broadcasts the third-lowest double word, d2, to all four positions of xmm0. Because the function printf may modify xmm0, we need to save the content of xmm0 by storing it to memory before calling printf. In fact, we need to do more work to protect the content of xmm0 than to do the shuffling itself.

Of course, you are not limited to the four masks we presented here; you can create any 8-bit mask and mix and shuffle as you like.
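
To experiment with masks without assembling anything, the effect of pshufd can be modeled in a few lines of Python. This is a sketch of the instruction's selection rule only; the function names pshufd and broadcast_mask are chosen for this illustration and are not part of any library. The list holds d0 to d3, lowest double word first.

```python
def pshufd(src, imm8):
    """Model of pshufd: src is [d0, d1, d2, d3], lowest double word first.

    The 2-bit field at bits 2*i+1..2*i of imm8 selects the source
    double word that is copied to destination position i.
    """
    return [src[(imm8 >> (2 * i)) & 0b11] for i in range(4)]

def broadcast_mask(index):
    """Repeat a 2-bit index in all four fields (0x55 is 01010101b)."""
    return index * 0x55

xmm0 = [1, 2, 3, 4]                      # d0..d3, as in the listing
for index in range(4):
    mask = broadcast_mask(index)
    print(f"{mask:08b}b ->", pshufd(xmm0, mask))
```

The loop reproduces the four broadcast masks used in the listing; any other 8-bit value can be passed to pshufd in the same way to see which double words it selects.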

Shuffle Reverse

Figure 33-2 shows the schematic overview of a shuffle reverse.
Figure 33-2. Shuffle reverse

The mask is 00011011b or 1bh. Reading the 2-bit fields from right (position 0) to left (position 3), that translates to the following:
  • 11 (value in d3) goes into position 0

  • 10 (value in d2) goes into position 1

  • 01 (value in d1) goes into position 2

  • 00 (value in d0) goes into position 3

As you can see in the example code, this is simple to code in assembly language, as shown here:
      pshufd xmm0,xmm0,1bh
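
A mask such as 1bh can also be decoded mechanically. The Python sketch below is an illustration only; decode_mask and encode_mask are hypothetical helper names. It splits an 8-bit mask into its four 2-bit fields and builds a mask from a wanted ordering.

```python
def decode_mask(imm8):
    """Return the source index chosen for destination positions 0..3."""
    return [(imm8 >> (2 * pos)) & 0b11 for pos in range(4)]

def encode_mask(sources):
    """Inverse of decode_mask: sources[pos] is the index copied into pos."""
    mask = 0
    for pos, src in enumerate(sources):
        mask |= (src & 0b11) << (2 * pos)
    return mask

for pos, src in enumerate(decode_mask(0x1B)):
    print(f"position {pos} <- d{src}")
print(hex(encode_mask([3, 2, 1, 0])))    # prints: 0x1b
```

Writing down the wanted source for each position and encoding it this way is a quick check before hard-coding an immediate in the assembly source.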

Shuffle Rotate

There are two versions of shuffle rotate: rotate left and rotate right. It is just a matter of providing the correct mask as the last argument of the shuffle instruction. Figure 33-3 shows the schematic overview.
Figure 33-3. Shuffle rotate

Here it is in assembly language, with rotate left first and rotate right second:
      pshufd xmm0,xmm0,93h      ;10010011b, rotate left
      pshufd xmm0,xmm0,39h      ;00111001b, rotate right
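
The two rotate masks follow from a simple rule: for a rotation by one position, each target position takes the double word of its neighbor. The Python sketch below derives both masks and applies them to the values 1 to 4. It is a model for illustration; rotate_mask, pshufd, and register_order are names invented here.

```python
def rotate_mask(step):
    """imm8 for pshufd that moves each double word `step` positions up.

    Destination position pos receives source (pos - step) mod 4, so
    step=1 rotates left and step=-1 rotates right.
    """
    mask = 0
    for pos in range(4):
        mask |= ((pos - step) % 4) << (2 * pos)
    return mask

def pshufd(src, imm8):
    """Model of pshufd on [d0, d1, d2, d3]."""
    return [src[(imm8 >> (2 * i)) & 0b11] for i in range(4)]

def register_order(dwords):
    """Highest double word first, the way print_xmm0d shows xmm0."""
    return list(reversed(dwords))

xmm0 = [1, 2, 3, 4]
left, right = rotate_mask(1), rotate_mask(-1)
print(f"{left:02x}h", register_order(pshufd(xmm0, left)))    # 93h [3, 2, 1, 4]
print(f"{right:02x}h", register_order(pshufd(xmm0, right)))  # 39h [1, 4, 3, 2]
```

Starting from the register view 4 3 2 1, the left rotation moves every value one place toward the high end and wraps 4 around to the low end; the right rotation does the opposite.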

Shuffle Bytes

You can shuffle double words with pshufd and words with pshufw (which works on the 64-bit MMX registers). In an xmm register, you can also shuffle the high words and the low words with pshufhw and pshuflw, respectively. You can find all the details in the Intel manuals. All these instructions use a source operand, a target operand, and a mask specified with an immediate. Providing an immediate as a mask has its limitations: it is inflexible, and you have to provide the mask at assembly time, not at runtime.

But there is a solution: shuffle bytes.

You can shuffle bytes with pshufb. This instruction takes only two operands: a target xmm register operand and a mask stored in an xmm register or 128-bit memory location. In Listing 33-1, we reversed the string stored at char with pshufb. We provide a mask at memory location bytereverse in section .data; the mask demands that we put byte 15 in position 0, byte 14 in position 1, and so on. We copy the string to be shuffled into xmm0 and the mask into xmm1, so the shuffle instruction is then as follows:
      pshufb xmm0, xmm1

Then the magic happens. Remember, the mask goes in the second operand; the source is the same as the destination and goes in the first operand.

The nice thing here is that we do not have to provide the mask at assembly time as an immediate. The mask can be built in xmm1 as a result of a computation at runtime.
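
What a computed mask looks like can be sketched in Python. The model below is an illustration, not the listing's code: pshufb here is a plain Python function that mimics the instruction's byte selection, and rotate_bytes_mask is a hypothetical helper that computes a mask from a value known only at run time.

```python
def pshufb(dst, mask):
    """Model of pshufb on 16-byte values.

    Each mask byte picks a byte of dst by its low four bits; if bit 7
    of the mask byte is set, the result byte is zero instead.
    """
    return bytes(0 if m & 0x80 else dst[m & 0x0F] for m in mask)

def rotate_bytes_mask(n):
    """Mask computed at run time: result byte i comes from byte (i + n) mod 16."""
    return bytes((i + n) % 16 for i in range(16))

text = b"abcdefghijklmnop"                   # same data as 'char'
bytereverse = bytes(range(15, -1, -1))       # 15,14,...,0 as in section .data

print(pshufb(text, bytereverse).decode())           # ponmlkjihgfedcba
print(pshufb(text, rotate_bytes_mask(3)).decode())  # defghijklmnopabc
```

In assembly, the equivalent of rotate_bytes_mask would fill xmm1 (or a 16-byte memory buffer) with the computed indexes before executing pshufb xmm0,xmm1.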

Finally, Figure 33-4 shows the output of the example code.
Figure 33-4. shuffle.asm output

Summary

In this chapter, you learned about the following:
  • Shuffle instructions

  • Shuffle masks

  • Runtime masks

  • How to use the stack with xmm registers
