Now that we know how to shuffle, we can discuss string masks.
Remember that SSE4.2 provides two string instructions that return a mask: pcmpistrm (implicit length) and pcmpestrm (explicit length). We will use the implicit-length version, pcmpistrm. At first, using masks looks complicated, but once you get the hang of it, you will see how powerful masking can be.
Searching for Characters
string4.asm
print16b.c
makefile
The main part of the program is quite simple, but as with the previous examples, the program is complicated by the fact that we want to print the results on the screen. We could have avoided the printing parts and used a debugger to study the results in the registers and memory. But coping with the challenges of printing is fun, right?
In our example program, we are going to search for two characters in a string. We provide a string, aptly called string1, and we look for the character 'e', which we stored in string2, and the character 'a', stored in string3.
We use a number of functions. Let’s first discuss the function reverse_xmm0. This function takes xmm0 as an argument and reverses the order of its bytes using a shuffle. By doing so, we can print xmm0 starting with its least significant byte, which compensates for the little-endian layout. That is why we presented shuffling in the previous chapter.
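The following C sketch models what a shuffle-based byte reversal does. It assumes that reverse_xmm0 uses pshufb with a control mask of 15, 14, ..., 0 (the listing is not reproduced here) and treats a 16-byte array as a stand-in for xmm0. The helper names shuffle_bytes and reverse16 are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Model of pshufb on one 16-byte register: destination byte i takes source
   byte ctrl[i] & 0x0F, or 0 when bit 7 of ctrl[i] is set. A temporary
   buffer lets dst and src be the same array. */
void shuffle_bytes(uint8_t dst[16], const uint8_t src[16], const uint8_t ctrl[16])
{
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++)
        tmp[i] = (ctrl[i] & 0x80) ? 0 : src[ctrl[i] & 0x0F];
    memcpy(dst, tmp, sizeof tmp);
}

/* Reverses the byte order using the assumed control mask 15, 14, ..., 0:
   destination byte i receives source byte 15 - i. */
void reverse16(uint8_t dst[16], const uint8_t src[16])
{
    uint8_t ctrl[16];
    for (int i = 0; i < 16; i++)
        ctrl[i] = (uint8_t)(15 - i);
    shuffle_bytes(dst, src, ctrl);
}
```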
We also have a function to measure the length of a string: pstrln. We need it because we read the string in 16-byte blocks. The last block will probably contain fewer than 16 bytes of the string, so for the last block we need to determine the position of the terminating 0. This lets us print a mask that has the same length as the string.
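A minimal C sketch of the length calculation for a single block, assuming the block is available as a 16-byte array; block_len is a hypothetical name and not the book's pstrln.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of string bytes in a 16-byte block: the offset of the first 0 byte,
   or 16 if the block holds no terminator (the string continues). */
size_t block_len(const uint8_t block[16])
{
    size_t n = 0;
    while (n < 16 && block[n] != 0)
        n++;
    return n;
}
```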
Our custom function pcharsrch, which takes the three strings as arguments, is where the action takes place. In this function, we first do some housekeeping, such as initializing registers. Register xmm1 holds the characters to search for; we store them in xmm1 with the instruction pinsrb (packed insert byte). Then we start looping, each time copying 16 bytes of string1 into xmm2, in search of our character or the terminating null. We use the masking instruction pcmpistrm (packed compare implicit length strings, return mask). The pcmpistrm instruction takes as its third operand an immediate control byte that specifies what to do, in this case “equal any” and “byte mask in xmm0.” So, we look for “any” character that “equals” one of our search characters. For every matching character in xmm2, the byte at the same position in xmm0 is set to all 1s (0xff). The pcmpistrm instruction does not list xmm0 as an operand; it uses xmm0 implicitly, and the resulting mask always ends up in xmm0.
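To see what pcmpistrm computes in this mode, here is a scalar C model; it is not the book's code. It assumes a control byte of 01000000b (unsigned bytes, “equal any,” positive polarity, byte mask), which is how the Intel SDM encodes “equal any” plus “byte mask in xmm0.” The arrays set and block stand in for xmm1 and xmm2, and the function names are hypothetical.

```c
#include <stdint.h>

/* Implicit-length rule: only bytes before the first 0 (at most 16) are valid. */
static int valid_len(const uint8_t v[16])
{
    int n = 0;
    while (n < 16 && v[n] != 0)
        n++;
    return n;
}

/* Scalar model of pcmpistrm in "equal any" mode with a byte mask.
   mask[i] becomes 0xFF when block[i] is a valid byte equal to any valid
   byte of set[], and 0x00 otherwise (including all bytes after the 0). */
void equal_any_mask(uint8_t mask[16], const uint8_t set[16], const uint8_t block[16])
{
    int nset = valid_len(set);
    int nblk = valid_len(block);

    for (int i = 0; i < 16; i++) {
        mask[i] = 0x00;
        if (i >= nblk)              /* bytes after the terminator never match */
            continue;
        for (int k = 0; k < nset; k++) {
            if (block[i] == set[k]) {
                mask[i] = 0xFF;
                break;
            }
        }
    }
}
```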
The difference from pcmpistri is that pcmpistri returns only a single index in ecx: the position of one match (the first or the last, depending on the control byte). In contrast, pcmpistrm returns all matching positions of the 16-byte block in xmm0. That allows you to drastically cut down on the number of steps needed to find all matches.
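The following hypothetical helpers illustrate the difference on a 16-bit match result. first_index returns what pcmpistri would leave in ecx with control bit 6 clear (the lowest match position, or 16 when nothing matches), while all_indices walks every set bit, which is the information a single pcmpistrm delivers at once.

```c
#include <stdint.h>

/* pcmpistri-style answer: index of the least significant set bit,
   or 16 if no bit is set. */
int first_index(uint16_t bits)
{
    for (int i = 0; i < 16; i++)
        if (bits & (1u << i))
            return i;
    return 16;
}

/* pcmpistrm-style answer: all positions are available at once.
   Stores them in out[] in ascending order and returns how many there are. */
int all_indices(uint16_t bits, int out[16])
{
    int n = 0;
    for (int i = 0; i < 16; i++)
        if (bits & (1u << i))
            out[n++] = i;
    return n;
}
```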
You can use a bit mask or a byte mask for xmm0 (clear or set bit 6 in the control byte, respectively). We used a byte mask so that you can read the xmm0 register more easily with a debugger: each ff byte in xmm0 is a byte with all bits set to 1, marking a match.
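As a small illustration (a hypothetical helper, not from the listings), this converts the bit-mask form into the byte-mask form. With bit 6 clear, pcmpistrm writes one result bit per byte into the low 16 bits of xmm0 and zeroes the rest of the register; with bit 6 set, each byte of xmm0 becomes 0xff or 0x00.

```c
#include <stdint.h>

/* Expands a 16-bit match mask into a 16-byte mask:
   byte i is 0xFF when bit i is set, 0x00 otherwise. */
void bits_to_bytemask(uint8_t bytes[16], uint16_t bits)
{
    for (int i = 0; i < 16; i++)
        bytes[i] = ((bits >> i) & 1u) ? 0xFF : 0x00;
}
```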
After the first 16-byte block is investigated, we check whether we have found a terminating 0 and store the result of the check in cl for later use. We want to print the mask stored in xmm0 with the function print_mask. In the debugger, notice that the byte mask appears reversed in xmm0 because of the little-endian format. So, before printing, we have to reverse it; that is what our function reverse_xmm0 does. Then we call our C function print16b to print the reversed mask. However, we cannot pass xmm0 as an argument to print16b: under the covers print16b uses printf, and printf would interpret xmm0 as a floating-point value, not a byte mask. So, before calling print16b, we transfer the byte mask in xmm0 to r13d as a bit mask, with the instruction pmovmskb (“move byte mask”), which copies the most significant bit of each byte of xmm0 into one bit of r13d. We will use r13d later for counting; for printing, we copy it to edi. We store xmm1 on the stack for later use.
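The behavior of pmovmskb can be modeled in C as shown here; movemask_bytes is a hypothetical name. Each output bit is the most significant bit of the corresponding byte, so a 0xff/0x00 byte mask collapses into a 16-bit mask in which bit 0 stands for the first byte of the block.

```c
#include <stdint.h>

/* Model of pmovmskb r32, xmm: bit i of the result is bit 7 of byte i;
   the upper 16 bits of the 32-bit result stay 0. */
uint32_t movemask_bytes(const uint8_t v[16])
{
    uint32_t r = 0;
    for (int i = 0; i < 16; i++)
        r |= (uint32_t)(v[i] >> 7) << i;
    return r;
}
```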
We call the C function print16b to print the mask. This function takes edi (the mask) and rsi (length, passed from the caller) as arguments.
Upon returning to pcharsrch, we count the number of 1s in r13d with the instruction popcnt and update the counter in r12d. We also determine whether we have to exit the loop because a terminating null was detected in the block of bytes.
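A portable C sketch of the counting step, with count_bits standing in for popcnt and total_matches for the running total that pcharsrch keeps in r12d (both names are hypothetical). With GCC or Clang, __builtin_popcount could replace the loop.

```c
#include <stdint.h>

/* Portable stand-in for popcnt: clears the lowest set bit until none remain. */
int count_bits(uint32_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Adds up the matches of all blocks, like the counter in r12d. */
int total_matches(const uint32_t *masks, int nblocks)
{
    int total = 0;
    for (int b = 0; b < nblocks; b++)
        total += count_bits(masks[b]);
    return total;
}
```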
Before print_mask is called, when a terminating 0 has been found, the relevant length of the last block is determined with the function pstrln. The start address of that block is computed by adding rbx, which holds the number of bytes already screened in previous blocks, to rdi, the address of string1. The string length, returned in rax, determines how many mask bytes of xmm0 remain to be printed; this number is passed in rsi for printing.
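The bookkeeping for the print length can be sketched in C as follows. This is a hypothetical helper: screened plays the role of rbx, the string pointer that of rdi, and the standard strlen stands in for pstrln.

```c
#include <stddef.h>
#include <string.h>

/* Returns how many mask bits to print for the block that starts at
   string + screened: 16 for a full block, fewer for the last block. */
size_t mask_len_for_block(const char *string, size_t screened)
{
    size_t rest = strlen(string + screened);   /* like pstrln(rdi + rbx) */
    return rest < 16 ? rest : 16;
}
```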
Isn’t printing a lot of fun?
Don’t be overwhelmed by the printing stuff. Concentrate first on how masks work, which is the main purpose of this chapter.
What can we do with a mask returned by pcmpistrm? Well, the resulting mask can be used, for example, to count all the occurrences of a search argument or to find all occurrences and replace them with something else, creating your own find-and-replace functionality.
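As one possible starting point for such a find-and-replace routine (a sketch, not something this chapter implements), the following C function overwrites every byte whose mask byte is 0xff. In assembly, the same per-byte selection could be done with the SSE4.1 instruction pblendvb, which also uses xmm0 implicitly as its mask, so the byte mask left by pcmpistrm could feed it directly. The name replace_masked is hypothetical.

```c
#include <stdint.h>

/* Replaces each byte of block[] whose mask byte has its top bit set
   (0xFF in a pcmpistrm byte mask) with repl; other bytes are kept. */
void replace_masked(uint8_t block[16], const uint8_t mask[16], uint8_t repl)
{
    for (int i = 0; i < 16; i++)
        if (mask[i] & 0x80)
            block[i] = repl;
}
```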
Now let’s look at another search.
Searching for a Range of Characters
A range can be any number of characters to search for, e.g., all uppercase characters, all characters between a and k, all characters that represent digits, and so on.
string5.asm
This program is almost identical to the previous one; we just gave string2 and string3 more meaningful names. Most importantly, we changed the control byte that is handed to pcmpistrm to 01000100b, which means “equal range” and “byte mask in xmm0.”
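Below is a scalar C model of “equal range” mode, assuming the control byte 01000100b. The array ranges stands in for xmm1 and holds (low, high) pairs; the function names are hypothetical, and the semantics follow the Intel SDM description of pcmpistrm, simplified to this one control byte.

```c
#include <stdint.h>

/* Implicit-length rule: only bytes before the first 0 (at most 16) are valid. */
static int valid_len(const uint8_t v[16])
{
    int n = 0;
    while (n < 16 && v[n] != 0)
        n++;
    return n;
}

/* mask[i] becomes 0xFF when block[i] is valid and low <= block[i] <= high
   for any pair (ranges[k], ranges[k+1]), compared as unsigned bytes.
   An unpaired last byte in ranges[] never matches. */
void equal_range_mask(uint8_t mask[16], const uint8_t ranges[16], const uint8_t block[16])
{
    int nr = valid_len(ranges);
    int nb = valid_len(block);

    for (int i = 0; i < 16; i++) {
        mask[i] = 0x00;
        if (i >= nb)
            continue;
        for (int k = 0; k + 1 < nr; k += 2) {
            if (block[i] >= ranges[k] && block[i] <= ranges[k + 1]) {
                mask[i] = 0xFF;
                break;
            }
        }
    }
}
```

For example, a ranges string of "AZ09" selects uppercase letters and digits.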
The print handling is the same as in the previous section.
Let’s see one more example.
Searching for a Substring
string6.asm
We used almost the same code as before; we only changed the strings, and the control byte contains “equal ordered” and “byte mask in xmm0.” Pretty easy, isn't it?
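For the substring search, a scalar C model of “equal ordered” mode looks like this, assuming a control byte of 01001100b (“equal ordered” plus byte mask). The names are hypothetical. The model reproduces one subtlety from the Intel SDM: needle bytes that would fall past the end of the 16-byte block are not compared, so a match that starts near the end of a block is reported after checking only its first part, and a search loop must confirm such a candidate against the next block.

```c
#include <stdint.h>

/* Implicit-length rule: only bytes before the first 0 (at most 16) are valid. */
static int valid_len(const uint8_t v[16])
{
    int n = 0;
    while (n < 16 && v[n] != 0)
        n++;
    return n;
}

/* mask[j] becomes 0xFF when needle[] matches block[] starting at j.
   Needle bytes beyond position 15 of the block are skipped (partial match
   at the block boundary), but running into the block's terminating 0
   makes the match fail. */
void equal_ordered_mask(uint8_t mask[16], const uint8_t needle[16], const uint8_t block[16])
{
    int nn = valid_len(needle);
    int nb = valid_len(block);

    for (int j = 0; j < 16; j++) {
        int match = 1;
        for (int k = 0; k < nn && j + k < 16; k++) {
            if (j + k >= nb || block[j + k] != needle[k]) {
                match = 0;
                break;
            }
        }
        mask[j] = match ? 0xFF : 0x00;
    }
}
```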
Summary
Using string masks
Searching for characters, ranges, and substrings
Printing masks from xmm registers