With SSE version 4.2, four compare-string instructions were introduced: two instructions for strings with implicit lengths and two instructions for strings with explicit lengths. Two of these four instructions use masks.
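A string with an implicit length is a string that ends with a terminating 0 byte. For a string with an explicit length, the length has to be specified by some other means: pcmpestri and pcmpestrm take the length of the first operand in eax and the length of the second operand in edx.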
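To make the difference concrete, here is a small C model of how many bytes of one 16-byte block count as valid string data in each case. It is a sketch, not the hardware definition: the function names are hypothetical, and the explicit variant assumes a non-negative length.

```c
#include <stdint.h>

/* Hypothetical helper: valid bytes in a 16-byte block of an implicit-length
   string, i.e. the bytes before the first 0 (all 16 if there is no 0). */
int implicit_block_length(const uint8_t block[16]) {
    int n = 0;
    while (n < 16 && block[n] != 0)
        n++;
    return n;
}

/* Hypothetical helper: for an explicit-length string the length comes from a
   register (eax for operand 1, edx for operand 2). One block holds at most
   16 bytes, so larger lengths saturate at 16. Non-negative lengths only. */
int explicit_block_length(uint32_t length) {
    return length > 16 ? 16 : (int)length;
}
```

With an explicit length, a 0 byte inside the string is ordinary data rather than a terminator.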
String | Instruction | arg1 | arg2 | arg3 | Output |
---|---|---|---|---|---|
implicit | pcmpistri | xmm | xmm/m128 | imm8 | Index in ecx |
implicit | pcmpistrm | xmm | xmm/m128 | imm8 | Mask in xmm0 |
explicit | pcmpestri | xmm | xmm/m128 | imm8 | Index in ecx |
explicit | pcmpestrm | xmm | xmm/m128 | imm8 | Mask in xmm0 |
pcmpistri: Packed compare implicit length strings, return index
pcmpistrm: Packed compare implicit length strings, return mask
pcmpestri: Packed compare explicit length strings, return index
pcmpestrm: Packed compare explicit length strings, return mask
These compare instructions take three arguments. Argument 1 is always an xmm register, argument 2 can be an xmm register or a memory location, and argument 3 is an “immediate,” which is a control byte (imm8 in the Intel manuals) that specifies how the instruction executes. The control byte has an important role, so we will spend some time explaining the details.
The imm8 Control Byte
imm8 Control Byte
Options | Bit Position | Bit Value | Operation | Meaning |
---|---|---|---|---|
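Reserved | 7 | 0 | Reserved | Reserved (keep at 0) |
Output Format (pcmpistrm, pcmpestrm) | 6 | 0 | Bit mask | xmm0 contains IntRes2 as a bit mask |
Output Format (pcmpistrm, pcmpestrm) | 6 | 1 | Byte/word mask | xmm0 contains IntRes2 as a byte (or word) mask |
Output Format (pcmpistri, pcmpestri) | 6 | 0 | Least significant index | Least significant index found in ecx |
Output Format (pcmpistri, pcmpestri) | 6 | 1 | Most significant index | Most significant index found in ecx |
Polarity | 5,4 | 00 | + | IntRes2 = IntRes1 |
Polarity | 5,4 | 01 | - | IntRes2 = ~IntRes1 |
Polarity | 5,4 | 10 | Masked + | IntRes2 = IntRes1 |
Polarity | 5,4 | 11 | Masked - | IntRes2 = ~IntRes1 for the valid elements of arg2 only |
Aggregation and Comparison | 3,2 | 00 | Equal any | Match characters |
Aggregation and Comparison | 3,2 | 01 | Equal range | Match characters in range |
Aggregation and Comparison | 3,2 | 10 | Equal each | String compare |
Aggregation and Comparison | 3,2 | 11 | Equal ordered | Substring search |
Data Format | 1,0 | 00 | Packed unsigned bytes | |
Data Format | 1,0 | 01 | Packed unsigned words | |
Data Format | 1,0 | 10 | Packed signed bytes | |
Data Format | 1,0 | 11 | Packed signed words | |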
The compare instructions take the input data (whose format is specified in bits 0 and 1) and execute the aggregation and comparison action selected by bits 2 and 3. This gives an intermediate result describing the matches between arg1 and arg2, called IntRes1 in the Intel manuals. The polarity (bits 4 and 5) is then applied to IntRes1 to give IntRes2, and IntRes2 is output in the format selected by bit 6. Negative polarity (~IntRes1) means taking the ones' complement of IntRes1 and putting the result in IntRes2: every 1 bit becomes a 0 bit, and every 0 bit becomes a 1 bit. In other words, it's a logical NOT. The result in IntRes2 is stored as a mask in xmm0 by the mask instructions pcmpistrm and pcmpestrm, or as an index in ecx by pcmpistri and pcmpestri. Some examples will be helpful here.
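Because imm8 is just four bit fields packed into one byte, a small C sketch can assemble and take apart control bytes. The constant and function names below are hypothetical stand-ins chosen for this illustration; Intel's intrinsics headers define equivalent `_SIDD_*` constants (for example `_SIDD_CMP_EQUAL_ORDERED`), but the sketch avoids them so it needs no SSE4.2 header.

```c
#include <stdint.h>

/* Field values taken from the imm8 table (hypothetical names). */
enum {
    FMT_UBYTE = 0x00, FMT_UWORD = 0x01, FMT_SBYTE = 0x02, FMT_SWORD = 0x03,
    AGG_EQUAL_ANY = 0x00, AGG_EQUAL_RANGE = 0x04,
    AGG_EQUAL_EACH = 0x08, AGG_EQUAL_ORDERED = 0x0C,
    POL_POSITIVE = 0x00, POL_NEGATIVE = 0x10,
    POL_MASKED_POSITIVE = 0x20, POL_MASKED_NEGATIVE = 0x30,
    OUT_LEAST_SIG = 0x00, OUT_MOST_SIG = 0x40,  /* index instructions */
    OUT_BIT_MASK = 0x00, OUT_UNIT_MASK = 0x40   /* mask instructions */
};

/* Combine the four fields into a control byte; bit 7 (reserved) stays 0. */
uint8_t make_imm8(int format, int aggregation, int polarity, int output) {
    return (uint8_t)((format | aggregation | polarity | output) & 0x7F);
}

/* Extract each field again as a small number (0..3, or 0..1 for bit 6). */
int imm8_format(uint8_t imm)      { return imm & 0x03; }
int imm8_aggregation(uint8_t imm) { return (imm >> 2) & 0x03; }
int imm8_polarity(uint8_t imm)    { return (imm >> 4) & 0x03; }
int imm8_output(uint8_t imm)      { return (imm >> 6) & 0x01; }
```

For example, `make_imm8(FMT_UBYTE, AGG_EQUAL_ORDERED, POL_POSITIVE, OUT_LEAST_SIG)` gives 00001100b, the substring-search setting; in assembly source the same value is simply written as the immediate operand.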
Using the imm8 Control Byte
In this section we show how we can set the bits in the imm8 control byte in order to control the behavior of the packed string instructions. We added examples to illustrate the effect of the different settings.
Bits 0 and 1
Bits 0 and 1 indicate the data source format: the elements are packed bytes (16 per 128-bit operand) or packed words (8 per operand), interpreted as unsigned or signed.
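The following C sketch models what the two format bits change: how many elements an operand holds and how each element's value is read. The function names are hypothetical, and the model reads words little-endian, as x86 stores them in memory.

```c
#include <stdint.h>

/* Bit 0 of imm8: 0 = bytes, 1 = words. A 128-bit operand therefore
   holds 16 bytes or 8 words. */
int element_count(uint8_t imm8) {
    return (imm8 & 0x01) ? 8 : 16;
}

/* Value of element i of a 16-byte block as the comparison sees it.
   Bit 1 of imm8: 0 = unsigned, 1 = signed. */
int element_value(const uint8_t block[16], int i, uint8_t imm8) {
    int is_word = imm8 & 0x01;
    int is_signed = (imm8 >> 1) & 0x01;
    if (is_word) {
        int w = block[2 * i] | (block[2 * i + 1] << 8);   /* little-endian */
        return (is_signed && w >= 0x8000) ? w - 0x10000 : w;
    }
    int b = block[i];
    return (is_signed && b >= 0x80) ? b - 0x100 : b;
}
```

The signedness matters mainly for range comparisons: the byte 0xFF is greater than 'z' when read as unsigned, but as a signed byte it is -1, which is less than 'a'.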
Bits 2 and 3
Bits 2 and 3 indicate the aggregation to be applied. The result is called IntRes1 (intermediate result 1). A block of 16 bytes is taken from the second operand and compared with the content of the first operand. In the examples below, IntRes1 is written in string order, with bit 0 on the left.
equal any (00) or find characters from a set: This means search operand 2 for any of the characters in operand 1. For each character in operand 2 that matches one of them, set the corresponding bit in IntRes1 to 1. Here's an example:
operand 1: "i!"
operand 2: "this is a joke!!"
IntRes1: 0010010000000011
equal range (01) or find characters from a range: This means search operand 2 for characters that lie in a range given in operand 1. Operand 1 holds each range as a pair of characters (lower bound, upper bound). For each character in operand 2 that lies in a range, set the corresponding bit in IntRes1 to 1. Here's an example:
operand 1: "aj"
operand 2: "this is a joke!!"
IntRes1: 0110010010100100
equal each (10) or string compare: This means compare each character in operand 2 with the character at the same position in operand 1. When they match, set the corresponding bit in IntRes1 to 1. Here's an example:
operand 1: "this is no joke!"
operand 2: "this is a joke!!"
IntRes1: 1111111100000001
equal ordered (11) or substring search: This means search operand 2 for the string in operand 1. For each position in operand 2 where that string starts, set the corresponding bit in IntRes1 to 1. Here's an example:
operand 1: "is"
operand 2: "this is a joke!!"
IntRes1: 0010010000000000
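To check the four aggregation modes, here is a C model of IntRes1 for unsigned bytes with implicit lengths (data format 00). It is a sketch under those assumptions, and the names `intres1`, `load_block`, and the `EQUAL_*` constants are hypothetical. It returns IntRes1 as a 16-bit integer whose bit j belongs to character j of operand 2, so the bit strings in the text, which list bit 0 first, appear reversed when these values are written in binary. The handling of bytes past a terminating 0 follows the rules in the Intel manuals (for example, in equal each, two positions that are both past the end count as a match).

```c
#include <stdint.h>

enum { EQUAL_ANY = 0, EQUAL_RANGE = 1, EQUAL_EACH = 2, EQUAL_ORDERED = 3 };

/* Copy a C string into a zero-padded 16-byte block; return its implicit
   length (bytes before the first 0, at most 16). */
static int load_block(const char *s, unsigned char blk[16]) {
    int n = 0;
    for (int i = 0; i < 16; i++) blk[i] = 0;
    while (n < 16 && s[n] != '\0') {
        blk[n] = (unsigned char)s[n];
        n++;
    }
    return n;
}

/* IntRes1 for unsigned bytes; op1 is operand 1, op2 is operand 2. */
uint16_t intres1(int agg, const char *op1, const char *op2) {
    unsigned char a[16], b[16];
    int la = load_block(op1, a), lb = load_block(op2, b);
    uint16_t res = 0;
    for (int j = 0; j < 16; j++) {
        int bit = 0;
        switch (agg) {
        case EQUAL_ANY:      /* b[j] equals any valid byte of a */
            for (int i = 0; i < la && j < lb; i++)
                if (a[i] == b[j]) bit = 1;
            break;
        case EQUAL_RANGE:    /* a holds (lower, upper) pairs */
            for (int i = 0; i + 1 < la && j < lb; i += 2)
                if (a[i] <= b[j] && b[j] <= a[i + 1]) bit = 1;
            break;
        case EQUAL_EACH:     /* same position; both past the end matches */
            if (j < la && j < lb) bit = (a[j] == b[j]);
            else bit = (j >= la && j >= lb);
            break;
        case EQUAL_ORDERED:  /* a occurs in b starting at j */
            bit = 1;
            for (int i = 0; i < la && j + i < 16; i++)
                if (j + i >= lb || a[i] != b[j + i]) { bit = 0; break; }
            break;
        }
        if (bit) res |= (uint16_t)(1u << j);
    }
    return res;
}
```

For example, `intres1(EQUAL_RANGE, "aj", "this is a joke!!")` sets bits 1, 2, 5, 8, 10, and 13: the 'h' at index 1 also falls in the range a to j.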
Bits 4 and 5
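Positive polarity (00) and masked positive polarity (10): IntRes2 will be identical to IntRes1. Here's an example:
IntRes1: 0010010000000011
IntRes2: 0010010000000011
Negative polarity (01): IntRes2 will be the ones' complement, or logical negation, of IntRes1. Masked negative polarity (11) does the same, but only for the bits that belong to valid elements of operand 2 (those before its terminating 0, or within its explicit length); the other bits are copied unchanged. Here's an example for a full 16-byte string, where both settings give the same result:
IntRes1: 0010010000000011
IntRes2: 1101101111111100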
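The polarity step can be modeled in a few lines of C. This sketch takes IntRes1 as a 16-bit integer (bit 0 = first character) plus the number of valid elements in operand 2, which only masked negative polarity needs; the name `apply_polarity` is hypothetical.

```c
#include <stdint.h>

/* Apply imm8 bits 5..4 (polarity, 0..3) to IntRes1. valid_len is the
   number of valid elements of operand 2 (16 for a full byte block). */
uint16_t apply_polarity(uint16_t intres1, int polarity, int valid_len) {
    switch (polarity & 3) {
    case 1:                                   /* negative: flip all bits */
        return (uint16_t)~intres1;
    case 3: {                                 /* masked negative */
        uint16_t valid = valid_len >= 16 ? 0xFFFF
                                         : (uint16_t)((1u << valid_len) - 1);
        return (uint16_t)(intres1 ^ valid);   /* flip valid bits only */
    }
    default:                                  /* positive, masked positive */
        return intres1;
    }
}
```

With the example above, IntRes1 is 0xC024 and both negative settings give 0x3FDB for a full 16-character string; with a 4-character string, masked negative flips only bits 0 to 3.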
Bit 6
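For the index instructions pcmpistri and pcmpestri (no mask):
0: The index returned in ecx is the position of the least significant bit set in IntRes2. Here's an example:
IntRes2: 0010010011000000
ecx = 6
In IntRes2, the first 1 bit is found at index 6 (counting starts at 0, from the right).
1: The index returned in ecx is the position of the most significant bit set in IntRes2. Here's an example:
IntRes2: 0010010010100100
ecx = 13
In IntRes2, the last 1 bit is found at index 13 (counting starts at 0, from the right).
Note that these IntRes2 values are written as ordinary binary numbers, with bit 0 on the right. If no bit is set in IntRes2, ecx is set to 16 for byte data (8 for word data).
For the mask instructions pcmpistrm and pcmpestrm (using a mask):
0: IntRes2 is returned as a bit mask in the least significant bits of xmm0 (zero-extended to 128 bits). Here's an example:
Search for all characters 'a' and 'e' in the string 'qdacdekkfijlmdoz'. Then:
xmm0: 024h, or in binary 0000000000100100
Note that the mask appears reversed in xmm0: bit 0, on the right, corresponds to the first character of the string, so 'a' (index 2) and 'e' (index 5) set bits 2 and 5.
1: IntRes2 is expanded into a byte mask (a word mask for word data) in xmm0: each element whose bit is set becomes all 1s. Here's an example:
Search for all characters 'a' and 'e' in the string 'qdacdekkfijlmdoz'. Then:
xmm0: 00000000000000000000ff0000ff0000
Again the mask appears reversed, because xmm0 is written with its most significant byte on the left; bytes 2 and 5 hold ff.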
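The two meanings of bit 6 can be sketched in C as follows. The function names are hypothetical; `result_index` models the ecx value of the index instructions, and `result_mask` models xmm0 of the mask instructions for byte data, as its 16 bytes in memory order (byte 0 first), which is why the masks print "reversed" when xmm0 is written as one number.

```c
#include <stdint.h>

/* Index instructions: ecx value. most_significant is imm8 bit 6;
   nelems is 16 for bytes, 8 for words. With no bit set, ecx = nelems. */
int result_index(uint16_t intres2, int most_significant, int nelems) {
    uint16_t bits = (uint16_t)(intres2 & ((1u << nelems) - 1));
    if (bits == 0)
        return nelems;
    int i;
    if (most_significant) {
        for (i = nelems - 1; !(bits & (1u << i)); i--)
            ;
    } else {
        for (i = 0; !(bits & (1u << i)); i++)
            ;
    }
    return i;
}

/* Mask instructions, byte data: xmm0 as 16 bytes, byte 0 first.
   unit_mask is imm8 bit 6 (0 = bit mask, 1 = byte mask). */
void result_mask(uint16_t intres2, int unit_mask, uint8_t xmm0[16]) {
    for (int j = 0; j < 16; j++)
        xmm0[j] = 0;
    if (!unit_mask) {
        /* bit mask: IntRes2 zero-extended, stored little-endian */
        xmm0[0] = (uint8_t)(intres2 & 0xFF);
        xmm0[1] = (uint8_t)(intres2 >> 8);
        return;
    }
    /* byte mask: element j becomes 0xFF when bit j is set */
    for (int j = 0; j < 16; j++)
        if (intres2 & (1u << j))
            xmm0[j] = 0xFF;
}
```

For the 'a' and 'e' search, IntRes2 is 0x0024: `result_mask(0x0024, 0, x)` puts 0x24 in byte 0, and `result_mask(0x0024, 1, x)` puts 0xFF in bytes 2 and 5.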
Bit 7 Reserved
Bit 7 is reserved.
The Flags
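In the examples in the following chapter, we will use the CF flag to see whether there was any result and ZF to detect the end of a string. More precisely, for pcmpistri and pcmpistrm, CF is set when IntRes2 is not zero, ZF is set when operand 2 contains a terminating 0, SF is set when operand 1 contains a terminating 0, and OF receives bit 0 of IntRes2. For pcmpestri and pcmpestrm, ZF and SF are instead set when the absolute value of the length in edx (operand 2) or eax (operand 1), respectively, is less than 16 (8 for word data).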
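The flag rules for the implicit-length instructions can be modeled in C as a sketch. It covers only pcmpistri/pcmpistrm with byte data and leaves out AF and PF, which these instructions clear; the `struct flags` type and `string_flags` function are hypothetical.

```c
#include <stdint.h>

struct flags { int cf, zf, sf, of; };

/* Flags after pcmpistri/pcmpistrm on byte data:
   CF = IntRes2 is nonzero, ZF = operand 2 contains a 0 byte,
   SF = operand 1 contains a 0 byte, OF = bit 0 of IntRes2. */
struct flags string_flags(uint16_t intres2, const uint8_t op1[16],
                          const uint8_t op2[16]) {
    struct flags f = {0, 0, 0, 0};
    f.cf = intres2 != 0;
    f.of = intres2 & 1;
    for (int i = 0; i < 16; i++) {
        if (op2[i] == 0) f.zf = 1;
        if (op1[i] == 0) f.sf = 1;
    }
    return f;
}
```

In a search loop, a jc right after the instruction can branch to the code that handles a match, and a jz can leave the loop once the block containing the end of the string has been processed.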
This theory might sound complicated, so it's time for some practice.
Summary
SSE string manipulation instructions
The layout and use of the imm8 control byte