30. SSE String Manipulation

With SSE version 4.2, four compare-string instructions were introduced: two instructions for strings with implicit lengths and two instructions for strings with explicit lengths. Two of these four instructions use masks.

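A string with an implicit length is a string with a terminating 0. For a string with an explicit length, the length has to be specified by some other means: the explicit-length instructions take the length of the first operand in eax and the length of the second operand in edx.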

In this chapter, we will spend some time with SSE strings, because the compare instructions are a bit complicated and unusual, especially when using masks. Here are the instructions:

    Instruction   String length   Result
    pcmpistri     implicit        index in ecx
    pcmpistrm     implicit        mask in xmm0
    pcmpestri     explicit        index in ecx
    pcmpestrm     explicit        mask in xmm0

Here is what the instructions mean:
  • pcmpistri: Packed compare implicit length strings, return index

  • pcmpistrm: Packed compare implicit length strings, return mask

  • pcmpestri: Packed compare explicit length strings, return index

  • pcmpestrm: Packed compare explicit length strings, return mask

These compare instructions take three arguments. Argument 1 is always an xmm register, argument 2 can be an xmm register or a memory location, and argument 3 is an “immediate,” which is a control byte (imm8 in the Intel manuals) that specifies how the instruction executes. The control byte has an important role, so we will spend some time explaining the details.

The imm8 Control Byte

Table 30-1 shows the layout of the control byte.
Table 30-1
imm8 Control Byte

    Bit Position   Bit Value   Meaning
    Bit 6          Output Format
                   0           Bit mask: xmm0 contains IntRes2 as a bit mask (mask instructions)
                   1           Byte mask: xmm0 contains IntRes2 as a byte mask (mask instructions)
                   0           Least significant index: found in ecx (index instructions)
                   1           Most significant index: found in ecx (index instructions)
    Bits 5-4       Polarity
                   00          Positive: IntRes2 = IntRes1
                   01          Negative: IntRes2 = ~IntRes1
                   10          Masked +: IntRes2 = IntRes1
                   11          Masked -: IntRes2 = ~IntRes1 for the valid elements of operand 2, IntRes1 elsewhere
    Bits 3-2       Aggregation
                   00          Equal any: match characters
                   01          Equal range: match characters in range
                   10          Equal each: string compare
                   11          Equal ordered: substring search
    Bits 1-0       Data Format
                   00          Packed unsigned bytes
                   01          Packed unsigned words
                   10          Packed signed bytes
                   11          Packed signed words


The compare instructions take the input data (in the format specified in bits 1 and 0) and apply the aggregation and comparison specified in bits 3 and 2. This gives an intermediate result that records where arg1 and arg2 match; the Intel manuals call it IntRes1. The polarity (bits 5 and 4) is then applied to IntRes1 to give IntRes2, and IntRes2 is output in the format selected by bit 6. Negative polarity (~IntRes1) means taking the ones’ complement of IntRes1 and putting the result in IntRes2: every 1 bit becomes a 0 bit, and every 0 bit becomes a 1 bit. In other words, it is a logical NOT. IntRes2 is stored as a mask in xmm0 by the mask instructions pcmpistrm and pcmpestrm, or as an index in ecx by the index instructions pcmpistri and pcmpestri. Some examples will be helpful here.

Here are some control byte examples (the fields are listed from bit 0 upward):
00001000 or 0x08:
     00 - packed unsigned bytes (bits 1-0),
     10 - equal each (bits 3-2),
     00 - positive polarity (bits 5-4),
     00 - least significant index into ecx (bits 7-6)
01000100 or 0x44:
     00 - packed unsigned bytes (bits 1-0),
     01 - equal range (bits 3-2),
     00 - positive polarity (bits 5-4),
     01 - xmm0 contains a byte mask (bits 7-6)
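To check a control byte by hand, it can help to pull the fields apart in code. The following C sketch is only an illustration: the names `Imm8Fields`, `decode_imm8`, and `encode_imm8` are hypothetical, and the code does not execute any SSE instruction. It only splits and assembles the bit fields listed in Table 30-1.

```c
#include <stdint.h>

/* Hypothetical representation of the imm8 fields from Table 30-1. */
typedef struct {
    unsigned format;      /* bits 1-0: 00 ub, 01 uw, 10 sb, 11 sw */
    unsigned aggregation; /* bits 3-2: 00 any, 01 range, 10 each, 11 ordered */
    unsigned polarity;    /* bits 5-4: 00 +, 01 -, 10 masked +, 11 masked - */
    unsigned output;      /* bit 6: 0 = LSB index / bit mask, 1 = MSB index / byte mask */
} Imm8Fields;

/* Split a control byte into its fields. */
static Imm8Fields decode_imm8(uint8_t imm8)
{
    Imm8Fields f;
    f.format      = imm8 & 0x3u;
    f.aggregation = (imm8 >> 2) & 0x3u;
    f.polarity    = (imm8 >> 4) & 0x3u;
    f.output      = (imm8 >> 6) & 0x1u;
    return f;
}

/* Assemble a control byte from its fields; bit 7 is reserved and left at 0. */
static uint8_t encode_imm8(unsigned format, unsigned aggregation,
                           unsigned polarity, unsigned output)
{
    return (uint8_t)((format & 0x3u) |
                     ((aggregation & 0x3u) << 2) |
                     ((polarity & 0x3u) << 4) |
                     ((output & 0x1u) << 6));
}
```

For example, `encode_imm8(0, 2, 0, 0)` gives 0x08 and `encode_imm8(0, 1, 0, 1)` gives 0x44, the two control bytes decoded above.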

Using the imm8 Control Byte

In this section we show how we can set the bits in the imm8 control byte in order to control the behavior of the packed string instructions. We added examples to illustrate the effect of the different settings. 

Bits 0 and 1

Bits 0 and 1 indicate the data source format: the data can be packed bytes or packed words, unsigned or signed. A 128-bit operand therefore holds 16 byte elements or 8 word elements.

Bits 2 and 3

Bits 2 and 3 indicate the aggregation to be applied. The result is called IntRes1 (intermediate result 1). A block of 16 bytes is taken from the second operand and compared with the content of the first operand; bit i of IntRes1 belongs to element i of the second operand.

The aggregation can be one of the following. In these examples, IntRes1 is written in string order, with bit 0 (the first character) on the left:
  • equal any (00) or find characters from a set: Search operand 2 for any of the characters in operand 1. For every character in operand 2 that is in the set, set the corresponding bit in IntRes1 to 1. Here’s an example:

    operand 1: "i!"
    operand 2: "this is a joke!!"
    IntRes1:    0010010000000011
  • equal range (01) or find characters from a range: Search operand 2 for any characters that fall within the range given in operand 1 (here 'a' to 'j'). For every character in operand 2 that is in the range, set the corresponding bit in IntRes1 to 1. Here’s an example:

    operand 1: "aj"
    operand 2: "this is a joke!!"
    IntRes1:    0110010010100100
  • equal each (10) or string compare: Compare each character in operand 1 to the corresponding character in operand 2. When they match, set the corresponding bit in IntRes1 to 1. Here’s an example:

    operand 1: "this is a joke!!"
    operand 2: "this is no joke!"
    IntRes1:    1111111100000001
  • equal ordered (11) or substring search: Search operand 2 for the substring in operand 1. For every position in operand 2 where the substring starts, set the corresponding bit in IntRes1 to 1. Here’s an example:

    operand 1: "is"
    operand 2: "this is a joke!!"
    IntRes1:    0010010000000000
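The four aggregations can be modeled in plain C to verify IntRes1 values by hand. This is a sketch under assumptions, not Intel’s reference pseudocode: it only handles packed unsigned bytes with implicit lengths, it follows the operand order of the Intel manuals (operand 1 holds the character set, the range pairs, or the substring; operand 2 holds the 16-byte string that is scanned), and it applies the manuals’ rules for elements past the terminating 0. The function names are hypothetical. Bit j of the returned value belongs to character j of operand 2, so IntRes1 strings written in string order must be read with bit 0 on the left.

```c
#include <stdint.h>
#include <string.h>

/* Aggregation values for imm8 bits 3-2 */
enum { AGG_EQUAL_ANY = 0, AGG_EQUAL_RANGE = 1, AGG_EQUAL_EACH = 2, AGG_EQUAL_ORDERED = 3 };

/* Copy a C string into a zero-padded 16-byte block (like an xmm register). */
static void load_block(uint8_t blk[16], const char *s)
{
    size_t n = strlen(s);
    memset(blk, 0, 16);
    memcpy(blk, s, n < 16 ? n : 16);
}

/* Implicit length: the valid elements are the bytes before the first 0. */
static int implicit_len(const uint8_t blk[16])
{
    int n = 0;
    while (n < 16 && blk[n] != 0)
        n++;
    return n;
}

/* IntRes1 for packed unsigned bytes; a = operand 1, b = operand 2. */
static uint16_t intres1(const uint8_t a[16], const uint8_t b[16], int agg)
{
    int la = implicit_len(a), lb = implicit_len(b);
    uint16_t res = 0;

    for (int j = 0; j < 16; j++) {
        int bit = 0;
        switch (agg) {
        case AGG_EQUAL_ANY:      /* is b[j] one of the characters of a? */
            for (int i = 0; i < la && j < lb; i++)
                if (a[i] == b[j])
                    bit = 1;
            break;
        case AGG_EQUAL_RANGE:    /* is b[j] inside a range a[i]..a[i+1]? */
            for (int i = 0; i + 1 < la && j < lb; i += 2)
                if (a[i] <= b[j] && b[j] <= a[i + 1])
                    bit = 1;
            break;
        case AGG_EQUAL_EACH:     /* is b[j] equal to a[j]? */
            if (j < la && j < lb)
                bit = (a[j] == b[j]);
            else
                bit = (j >= la && j >= lb); /* both past the end: counts as equal */
            break;
        case AGG_EQUAL_ORDERED:  /* does substring a start at b[j]? */
            /* A partial match that runs past the 16-byte block still counts. */
            bit = 1;
            for (int k = 0; j + k < 16 && k < la; k++) {
                if (j + k >= lb || a[k] != b[j + k]) {
                    bit = 0; /* mismatch, or operand 2 ended first */
                    break;
                }
            }
            break;
        }
        if (bit)
            res |= (uint16_t)(1u << j);
    }
    return res;
}

/* Convenience wrapper taking C strings. */
static uint16_t intres1_str(const char *op1, const char *op2, int agg)
{
    uint8_t a[16], b[16];
    load_block(a, op1);
    load_block(b, op2);
    return intres1(a, b, agg);
}
```

For example, `intres1_str("is", "this is a joke!!", AGG_EQUAL_ORDERED)` returns 0x0024: bits 2 and 5 are set, the two positions where "is" starts.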

Bits 4 and 5

Bits 4 and 5 apply the polarity and store the result in IntRes2.
  • Positive polarity (00) and (10): IntRes2 will be identical to IntRes1. Here’s an example:

    IntRes1: 0010010000000011
    IntRes2: 0010010000000011
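  • Negative polarity (01): IntRes2 will be the ones’ complement, or the logical negation, of IntRes1. Masked negative polarity (11) does the same, but only for the bits of the valid elements of operand 2 (the elements before the end of the string); the other bits are copied unchanged from IntRes1. Here’s an example of negative polarity: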

    IntRes1: 0010010000000011
    IntRes2: 1101101111111100
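The polarity step can be sketched in C as well. This is an illustration under stated assumptions: `apply_polarity` and its parameters are hypothetical, `nelem` is 16 for byte data and 8 for word data, and `len2` is the number of valid elements in operand 2 (before the terminating 0, or the length in edx for the explicit-length instructions). The sketch shows the difference the Intel manuals describe between negative (01) polarity, which inverts every element bit, and masked negative (11) polarity, which inverts only the bits of valid elements.

```c
#include <stdint.h>

/* Polarity values for imm8 bits 5-4 */
enum { POL_POSITIVE = 0, POL_NEGATIVE = 1, POL_MASKED_POS = 2, POL_MASKED_NEG = 3 };

/* Turn IntRes1 into IntRes2. Assumes 0 <= len2 <= nelem <= 16. */
static uint16_t apply_polarity(uint16_t ir1, int polarity, int nelem, int len2)
{
    uint16_t all   = (uint16_t)((1u << nelem) - 1u); /* one bit per element */
    uint16_t valid = (uint16_t)((1u << len2) - 1u);  /* bits of valid operand-2 elements */

    switch (polarity) {
    case POL_NEGATIVE:   /* ones' complement of every element bit */
        return (uint16_t)(~ir1 & all);
    case POL_MASKED_NEG: /* invert the valid elements only, keep the rest */
        return (uint16_t)(ir1 ^ valid);
    default:             /* positive and masked positive: unchanged */
        return ir1;
    }
}
```

The example above, with IntRes1 = 0010010000000011 in string order, is 0xC024 as a number; `apply_polarity(0xC024, POL_NEGATIVE, 16, 16)` returns 0x3FDB, which is 1101101111111100 read with bit 0 on the left.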

Bit 6

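Bit 6 sets the output format. Its meaning depends on whether the instruction returns an index or a mask.
  • Not using a mask (pcmpistri and pcmpestri):

  • 0: ecx receives the index of the least significant bit set in IntRes2. If no bit is set, ecx is 16 (8 for word data). Here’s an example: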

    IntRes2: 0010010011000000
    ecx = 6
    In IntRes2, the first 1 bit is found at index 6 (counting starts at 0 and from the right).
  • 1: ecx receives the index of the most significant bit set in IntRes2. Here’s an example:

    IntRes2: 0010010010100100
    ecx = 13
    In IntRes2, the last 1 bit is found at index 13 (counting starts at 0 and from the right).
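  • Using a mask (pcmpistrm and pcmpestrm):

  • 0: IntRes2 is returned as a bit mask in the least significant bits of xmm0 (zero-extended to 128 bits). Here’s an example:

    Search for all characters 'a' and 'e' in the string 'qdacdekkfijlmdoz'
    xmm0: 024h
    or in binary 0000000000100100
  • Note that the mask appears reversed compared with the string: the bit for the first character, 'q', is the rightmost (least significant) bit.

  • 1: IntRes2 is expanded into a byte (or word) mask in xmm0: every element whose bit is set in IntRes2 becomes all ones, and every other element becomes 0. Here’s an example:

    Search for all characters 'a' and 'e' in the string 'qdacdekkfijlmdoz'
    xmm0:  00000000000000000000ff0000ff0000
  • Note that this mask is reversed too: xmm0 is written with its most significant byte on the left, so the byte for the first character is the rightmost one.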
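The two kinds of output can be modeled in C too. In this hypothetical sketch, `index_result` mimics what the index instructions put in ecx, and `mask_result` mimics what the mask instructions put in xmm0 for byte data. The `Xmm` type is an assumption that represents xmm0 as 16 bytes, with element 0 the least significant byte. As described above, the index instructions return the number of elements (16 for bytes, 8 for words) when IntRes2 is zero.

```c
#include <stdint.h>
#include <string.h>

/* xmm0 modeled as 16 bytes; b[0] is the least significant byte. */
typedef struct { uint8_t b[16]; } Xmm;

/* ecx for pcmpistri/pcmpestri. msb = imm8 bit 6; nelem = 16 (bytes) or 8 (words).
   Assumes IntRes2 has no bits set at or above nelem. */
static int index_result(uint16_t ir2, int msb, int nelem)
{
    if (ir2 == 0)
        return nelem;              /* no bit set */
    if (msb) {
        int i = nelem - 1;
        while (!(ir2 & (1u << i)))
            i--;
        return i;                  /* most significant set bit */
    }
    int i = 0;
    while (!(ir2 & (1u << i)))
        i++;
    return i;                      /* least significant set bit */
}

/* xmm0 for pcmpistrm/pcmpestrm on byte data. bytemask = imm8 bit 6. */
static Xmm mask_result(uint16_t ir2, int bytemask)
{
    Xmm x;
    memset(x.b, 0, sizeof x.b);
    if (!bytemask) {
        x.b[0] = (uint8_t)(ir2 & 0xFF);  /* bit mask, zero-extended */
        x.b[1] = (uint8_t)(ir2 >> 8);
    } else {
        for (int j = 0; j < 16; j++)
            if (ir2 & (1u << j))
                x.b[j] = 0xFF;           /* byte mask: 0xff per matching element */
    }
    return x;
}
```

With IntRes2 = 0x0024 (the 'a' and 'e' search above), `mask_result(0x0024, 0)` leaves 0x24 in the low byte, and `mask_result(0x0024, 1)` sets bytes 2 and 5 to 0xff, matching the two xmm0 values shown. Likewise, `index_result(0x24C0, 0, 16)` returns 6 and `index_result(0x24A4, 1, 16)` returns 13, as in the ecx examples.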

Bit 7 Reserved

Bit 7 is reserved.

The Flags

For the implicit length instructions, the flags are used in a way that is different from what you have seen in previous chapters (see the Intel manuals).
CF – Reset if IntRes2 is equal to zero, set otherwise
ZF – Set if any byte/word of xmm2/mem128 is null, reset otherwise
SF – Set if any byte/word of xmm1 is null, reset otherwise
OF – IntRes2[0]
AF – Reset
PF – Reset
For the explicit length instructions, the flags are also used in different ways, as follows (see the Intel manuals):
CF – Reset if IntRes2 is equal to zero, set otherwise
ZF – Set if the absolute value of EDX is < 16 (8 for words), reset otherwise
SF – Set if the absolute value of EAX is < 16 (8 for words), reset otherwise
OF – IntRes2[0]
AF – Reset
PF – Reset
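The flag rules are easy to mix up when writing a loop, so here they are as a small C sketch. The `StrFlags` type and the function names are hypothetical. `len1` and `len2` stand for the position of the first null in operand 1 and operand 2 (or the number of elements if there is none), and `nelem` is 16 for byte data and 8 for word data. AF and PF are always reset, so they are left out.

```c
#include <stdint.h>

typedef struct { int cf, zf, sf, of; } StrFlags; /* AF and PF are always 0 */

/* pcmpistri/pcmpistrm: lengths come from the terminating nulls. */
static StrFlags flags_implicit(uint16_t ir2, int len1, int len2, int nelem)
{
    StrFlags f;
    f.cf = (ir2 != 0);        /* any result at all? */
    f.zf = (len2 < nelem);    /* operand 2 (xmm2/mem128) contains a null */
    f.sf = (len1 < nelem);    /* operand 1 (xmm1) contains a null */
    f.of = ir2 & 1;           /* IntRes2[0] */
    return f;
}

/* Absolute value that cannot overflow for INT32_MIN. */
static int64_t abs64(int32_t v)
{
    return v < 0 ? -(int64_t)v : (int64_t)v;
}

/* pcmpestri/pcmpestrm: eax = length of operand 1, edx = length of operand 2. */
static StrFlags flags_explicit(uint16_t ir2, int32_t eax, int32_t edx, int nelem)
{
    StrFlags f;
    f.cf = (ir2 != 0);
    f.zf = (abs64(edx) < nelem);
    f.sf = (abs64(eax) < nelem);
    f.of = ir2 & 1;
    return f;
}
```

In a scanning loop, ZF = 1 therefore means the block in operand 2 contained the end of the string, and CF = 1 means IntRes2 has at least one bit set.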

In the examples in the following chapter, we will use the CF flag to see whether there was any result and ZF to detect the end of a string.

This theory might sound complicated, so it’s time for some practice.


In this chapter, you learned about the following:
  • SSE string manipulation instructions

  • The layout and use of the imm8 control byte

