When we think about strings, we humans normally assume that strings are a series of characters that form words or phrases that we can understand. But in assembly language, any list or array of contiguous memory locations is considered a string, whether it’s human-understandable or not. Assembly provides us with a number of powerful instructions for manipulating these blocks of data in an efficient way. In our examples, we will use readable characters, but keep in mind that in reality assembly does not care if the characters are readable. We will show how to move strings around, how to scan them, and how to compare strings.
As powerful as these instructions may be, we will present even better functionality when we discuss SIMD instructions in later chapters. But let’s start with the basic instructions here.
Moving Strings
move_strings.asm
In this program, we use a macro (for more details on macros, see Chapter 18) to do the printing, but we could as well have used the C printf function, as we have done already so many times.
We start with creating a string with the 95 printable characters in the ASCII table, the first being 32 (the space) and the last being 126 (the tilde, or ~). There’s nothing special here. We first print a title, and then we put the first ASCII code in rax, letting rdi point to the address of my_string in memory. Then we put the length of the string in rcx to use in a loop. In the loop, we copy one ASCII code from al to my_string, take the next code and write it to the next memory address in my_string, and so on. Finally, we print the string. Again, there’s nothing new here.
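The fill loop described above can be sketched as follows. This is a minimal stand-alone version, not the book's listing: the file name, the label names, and the use of the Linux write system call instead of the printing macro are assumptions made to keep the example short.

```nasm
; fill_loop.asm (hypothetical file name)
; Assumed build: nasm -f elf64 fill_loop.asm && gcc -no-pie fill_loop.o -o fill_loop
section .data
    length    equ 95            ; number of printable ASCII codes (32..126)
    NL        db  0xa
section .bss
    my_string resb length
section .text
    global main
main:
    mov   rax, 32               ; first printable ASCII code (space) in al
    mov   rdi, my_string        ; rdi points to the start of the buffer
    mov   rcx, length           ; loop counter
str_loop:
    mov   byte [rdi], al        ; store the current code
    inc   rdi                   ; advance to the next memory address
    inc   al                    ; take the next ASCII code
    loop  str_loop              ; decrease rcx, repeat while rcx is not 0
    ; print the buffer and a newline with the Linux write system call
    mov   rax, 1                ; sys_write
    mov   rdi, 1                ; stdout
    mov   rsi, my_string
    mov   rdx, length
    syscall
    mov   rax, 1
    mov   rdi, 1
    mov   rsi, NL
    mov   rdx, 1
    syscall
    xor   eax, eax              ; return 0 from main
    ret
```

Note that the loop has to advance rdi itself with inc; the string instructions in the next parts take over that bookkeeping.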
In the next part, we modify the content of my_string to all 0s (ASCII 48). To do that, we put the string length again in rcx for building a loop. Then we use the instruction stosb to store the 0s to my_string. The instruction stosb only needs the start address of the string in rdi and the character to write in al (the low byte of rax), and stosb steps to the next memory address in each repeat of the loop. We do not have to care about increasing rdi anymore.
In the next part of the program, we go one step further and get rid of the rcx loop. We use the instruction rep stosb for repeating the stosb a number of times. The number of repetitions is stored in rcx. This is a highly efficient method of initializing memory.
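To compare the two approaches side by side, here is a small sketch that fills the buffer once with stosb inside an rcx loop and once with rep stosb. The file name, the labels, and the print_string helper are assumptions for this example, and printing again uses the Linux write system call rather than the macro.

```nasm
; fill_stosb.asm (hypothetical file name)
; Assumed build: nasm -f elf64 fill_stosb.asm && gcc -no-pie fill_stosb.o -o fill_stosb
section .data
    length    equ 95
    NL        db  0xa
section .bss
    my_string resb length
section .text
    global main
main:
    cld                         ; DF=0, so stosb walks toward higher addresses
    ; version 1: stosb inside an rcx loop, filling with '0'
    mov   rax, 48               ; al = ASCII '0'
    mov   rdi, my_string
    mov   rcx, length
fill_loop:
    stosb                       ; store al at [rdi], then rdi = rdi + 1
    loop  fill_loop
    call  print_string
    ; version 2: rep stosb, filling with '1', no explicit loop
    mov   rax, 49               ; al = ASCII '1'
    mov   rdi, my_string
    mov   rcx, length           ; number of repetitions
    rep stosb
    call  print_string
    xor   eax, eax              ; return 0 from main
    ret

; helper (hypothetical): write my_string and a newline to stdout
print_string:
    mov   rax, 1                ; sys_write
    mov   rdi, 1                ; stdout
    mov   rsi, my_string
    mov   rdx, length
    syscall
    mov   rax, 1
    mov   rdi, 1
    mov   rsi, NL
    mov   rdx, 1
    syscall
    ret
```

The program should print one line of 95 zeros followed by one line of 95 ones.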
Next, we continue moving around memory content. Strictly speaking, we will be copying memory blocks, not moving content. First, we initialize our string again with the readable ASCII codes. We could optimize this code by using a macro or a function for that, instead of just repeating the code. Then we start the copying of the string/memory block: from my_string to other_string. The address of the source string goes into rsi, and the address of the destination string goes into rdi. This is easy to remember, because the s in rsi stands for source and the d in rdi stands for destination. Then we use rep movsb, and we are done! The rep copying stops when rcx becomes 0.
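A minimal sketch of the copy step could look like this. As before, the file name and labels are assumptions, and the result is printed with the Linux write system call.

```nasm
; copy_string.asm (hypothetical file name)
; Assumed build: nasm -f elf64 copy_string.asm && gcc -no-pie copy_string.o -o copy_string
section .data
    length       equ 95
    NL           db  0xa
section .bss
    my_string    resb length
    other_string resb length
section .text
    global main
main:
    ; initialize my_string with ASCII codes 32..126
    mov   al, 32
    mov   rdi, my_string
    mov   rcx, length
init:
    mov   [rdi], al
    inc   rdi
    inc   al
    loop  init
    ; copy my_string to other_string
    cld                         ; DF=0, walk forward
    mov   rsi, my_string        ; s = source
    mov   rdi, other_string     ; d = destination
    mov   rcx, length           ; number of bytes to copy
    rep movsb                   ; copy until rcx becomes 0
    ; print the copy and a newline
    mov   rax, 1                ; sys_write
    mov   rdi, 1                ; stdout
    mov   rsi, other_string
    mov   rdx, length
    syscall
    mov   rax, 1
    mov   rdi, 1
    mov   rsi, NL
    mov   rdx, 1
    syscall
    xor   eax, eax              ; return 0 from main
    ret
```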
In the last part of the program, we copy memory content in reverse. The concept can be a little confusing, so we go into some detail here. When using movsb, the content of DF (the direction flag) is taken into account. When DF=0, rsi and rdi are increased by 1, pointing to the next higher memory address. When DF=1, rsi and rdi are decreased by 1, pointing to the next lower memory address. This means that in our example with DF=1, rsi needs to point to the highest source address to be copied and decrease from there. In addition, rdi needs to point to the highest destination address and decrease from there. The intention is to “walk backward” when copying, that is, decreasing rsi and rdi with every repetition. Be careful: rsi and rdi both are decreased; you cannot use the DF to increase one register and decrease the other, so this technique does not reverse the string. In our example, we do not copy the whole string, but only the lowercase alphabet, and we put it at the higher memory addresses of the destination. The instruction lea rsi,[my_string+length-4] loads into rsi an address near the end of my_string, skipping the trailing characters that are not part of the alphabet. The DF flag can be set to 1 with std and cleared to 0 with cld. Then we invoke the powerful rep movsb, and we are done.
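The backward copy can be sketched as shown here. The offsets and the count are this sketch's own choices, not the listing's: with a 95-byte buffer that starts at ASCII 32, the letter z sits at offset length-5, so rsi starts there, rdi starts at the last byte of the destination, and exactly 26 bytes are copied. File name and labels are likewise assumptions.

```nasm
; reverse_copy.asm (hypothetical file name)
; Assumed build: nasm -f elf64 reverse_copy.asm && gcc -no-pie reverse_copy.o -o reverse_copy
section .data
    length       equ 95
    NL           db  0xa
section .bss
    my_string    resb length
    other_string resb length
section .text
    global main
main:
    cld
    ; my_string = ASCII codes 32..126
    mov   al, 32
    mov   rdi, my_string
    mov   rcx, length
init:
    stosb
    inc   al
    loop  init
    ; other_string = all '0'
    mov   al, 48
    mov   rdi, other_string
    mov   rcx, length
    rep stosb
    ; copy the lowercase alphabet backward, from 'z' down to 'a'
    lea   rsi, [my_string+length-5]     ; address of 'z' (sketch-specific offset)
    lea   rdi, [other_string+length-1]  ; last byte of the destination
    mov   rcx, 26                       ; 26 letters
    std                                 ; DF=1, rsi and rdi both decrease
    rep movsb
    cld                                 ; restore DF=0 before going on
    ; print other_string and a newline
    mov   rax, 1                ; sys_write
    mov   rdi, 1                ; stdout
    mov   rsi, other_string
    mov   rdx, length
    syscall
    mov   rax, 1
    mov   rdi, 1
    mov   rsi, NL
    mov   rdx, 1
    syscall
    xor   eax, eax              ; return 0 from main
    ret
```

The letters arrive in their normal order, a through z, at the end of other_string; only the order in which the bytes are copied is reversed, not the string itself.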
In the pseudocode that describes the operation of rep, the line CountReg ← (CountReg – 1); tells us that the counter is decreased first, before the exit conditions of the loop are tested. Studying the operation of instructions can be useful for understanding their behavior. As a final note, stosb and movsb work with bytes; there are also stosw, movsw, stosd, and movsd to work with words and double words, and rsi and rdi are accordingly incremented or decremented by 1 for bytes, 2 for words, and 4 for double words.
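To see the step size of the wider variants, the following sketch stores eight double words with rep stosd and returns the distance rdi has traveled as the exit status of the program. The buffer name, the fill value, and the file name are assumptions for this example.

```nasm
; fill_dwords.asm (hypothetical file name)
; Assumed build: nasm -f elf64 fill_dwords.asm && gcc -no-pie fill_dwords.o -o fill_dwords
section .bss
    buffer resd 8               ; room for 8 double words (32 bytes)
section .text
    global main
main:
    cld                         ; DF=0, walk forward
    mov   eax, 0x41414141       ; the double word to store ("AAAA")
    mov   rdi, buffer
    mov   rcx, 8                ; 8 repetitions
    rep stosd                   ; each repetition stores eax and adds 4 to rdi
    mov   rax, buffer
    sub   rdi, rax              ; bytes traveled = 8 repetitions * 4 bytes
    mov   rax, rdi              ; return that distance from main
    ret
```

After running the program, echo $? in the shell should report 32, that is, 8 double words of 4 bytes each.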
Comparing and Scanning Strings
strings.asm
For the comparison, we will discuss two versions. As before, we put the address of the first (source) string in rsi, the address of the second string (destination) in rdi, and the string length in rcx. Just to be sure, we clear the direction flag, DF, with cld. So, we walk forward in the strings.
The instruction cmpsb compares two bytes and sets the status flag ZF to 1 if the two compared bytes are equal or to 0 if they are not equal. The following jump instructions test ZF:
- jz: Jump if zero (ZF=1); the equivalent je: Jump if equal (bytes equal)
- jnz: Jump if not zero (ZF=0); the equivalent jne: Jump if not equal (bytes not equal)
The registers rsi and rdi are increased by cmpsb when DF is not set and decreased when DF is set. We create a loop that executes cmpsb until ZF becomes 0. When ZF becomes 0, the execution jumps out of the loop and starts calculating the position of the differing character based on the value in rcx. However, rcx is decreased only at the end of each loop iteration, and the iteration that found the difference never reached that point, so we have to adjust rcx ourselves (decrease it by 1). The resulting position is returned to main in rax.
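The first comparison version can be sketched as follows. The two strings, the file name, and the labels are invented for this example, and to keep it short the comparison runs directly in main instead of in a separate function; the position is returned as the exit status.

```nasm
; compare_loop.asm (hypothetical file name)
; Assumed build: nasm -f elf64 compare_loop.asm && gcc -no-pie compare_loop.o -o compare_loop
section .data
    string1 db "assembly rocks"
    strlen  equ $-string1       ; 14 characters, both strings have this length
    string2 db "assembly rules"
section .text
    global main
main:
    mov   rsi, string1          ; first (source) string
    mov   rdi, string2          ; second (destination) string
    mov   rcx, strlen           ; number of bytes to compare
    cld                         ; DF=0, walk forward
cmpr:
    cmpsb                       ; compare [rsi] with [rdi], advance both
    jne   notequal              ; ZF=0: the bytes differ
    loop  cmpr                  ; rcx is decreased here, at the end of the iteration
    xor   eax, eax              ; all bytes equal: return 0
    ret
notequal:
    mov   rax, strlen
    dec   rcx                   ; this iteration never reached loop, adjust rcx
    sub   rax, rcx              ; 1-based position of the differing byte
    ret
```

With these two strings, the first difference is the 11th character (o versus u), so echo $? should report 11.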
In the second version for comparing, we will use repe, a version of rep, meaning “repeat while equal.” As before, cmpsb sets ZF according to the comparison, and ZF=1 means the bytes are equal. As soon as cmpsb sets ZF equal to 0, the repe loop ends, and rcx can be used to compute the position where the differing character appeared. If the strings are completely the same, then rcx will be 0 and ZF will be 1. After repe, the instruction je tests if ZF equals 1. If ZF is 1, the strings are equal; if 0, the strings are not equal. We use rcx to calculate the differing position, and this time there’s no need to adjust rcx, because repe decreases rcx in every repetition before it tests ZF.
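The same comparison with repe shrinks to a few instructions. This sketch uses two invented strings and returns the position as the exit status; file name and labels are assumptions.

```nasm
; compare_repe.asm (hypothetical file name)
; Assumed build: nasm -f elf64 compare_repe.asm && gcc -no-pie compare_repe.o -o compare_repe
section .data
    string1 db "assembly rocks"
    strlen  equ $-string1       ; 14 characters, both strings have this length
    string2 db "assembly rules"
section .text
    global main
main:
    mov   rsi, string1          ; first (source) string
    mov   rdi, string2          ; second (destination) string
    mov   rcx, strlen           ; maximum number of bytes to compare
    cld                         ; DF=0, walk forward
    repe  cmpsb                 ; repeat while the bytes are equal and rcx is not 0
    je    equal                 ; ZF=1 after the last cmpsb: no difference found
    mov   rax, strlen
    sub   rax, rcx              ; rcx was already decreased, no adjustment needed
    ret                         ; return the 1-based position of the difference
equal:
    xor   eax, eax              ; strings are equal: return 0
    ret
```

As with the loop version, echo $? should report 11 for these two strings.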
The scanning works similarly, but with repne, “repeat while not equal,” instead of repe. We also use lodsb to load the byte at address rsi into al (the low byte of rax). The instruction scasb compares the byte in al with the byte pointed to by rdi and sets (1=equal) or resets (0=not equal) the ZF flag accordingly. The instruction repne looks at the status flag and continues if ZF=0; that is, the 2 bytes are not equal. If the 2 bytes are equal, scasb sets ZF to 1, the repne loop stops, and rcx can be used to compute the position of the byte in the string. The loop also ends when rcx reaches 0; if ZF is still 0 at that point, the byte does not occur in the string.
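A scan for a single character could be sketched like this. The string, the search character, and all names are assumptions for this example; the position is returned as the exit status, with 0 meaning that the character was not found.

```nasm
; scan_char.asm (hypothetical file name)
; Assumed build: nasm -f elf64 scan_char.asm && gcc -no-pie scan_char.o -o scan_char
section .data
    string1 db "assembly rocks"
    strlen  equ $-string1       ; 14 characters
    needle  db "r"              ; the character to search for
section .text
    global main
main:
    cld                         ; DF=0, walk forward
    mov   rsi, needle
    lodsb                       ; al = byte at [rsi], rsi = rsi + 1
    mov   rdi, string1          ; string to scan
    mov   rcx, strlen           ; maximum number of bytes to scan
    repne scasb                 ; repeat while al differs from [rdi] and rcx is not 0
    jne   notfound              ; ZF=0 after the last scasb: no match
    mov   rax, strlen
    sub   rax, rcx              ; 1-based position of the match
    ret
notfound:
    xor   eax, eax              ; character not in the string: return 0
    ret
```

The first r in this string is the 10th character, so echo $? should report 10.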
The scanning works with only one character as a search argument; if you are wondering how to use a string as a search argument, you will have to scan character by character. Or better yet, wait for the chapters on SIMD.
Summary
Moving and copying memory blocks in an extremely efficient way
Using movsb and rep
Comparing and scanning memory blocks
Using cmpsb, scasb, repe, and repne