In this chapter, we discuss the ARM-based computer’s memory. So far, we’ve used memory to hold our Assembly instructions. Now we will look in detail at how to define data in memory, then how to load data from memory into registers for processing, and, finally, how to write the results back to memory.
The ARM processor uses what is called a load-store architecture. This means that the instruction set is divided into two categories: one to load and store values from and to memory and the other to perform arithmetic and logical operations between the registers. We’ve spent most of our time looking at the arithmetic and logical operations. Now we will look at the other category.
Memory addresses are 64 bits while instructions are 32 bits, so we have the same problems that we experienced in Chapter 2, “Loading and Adding,” where we used all sorts of tricks to load 64 bits into a register using a 32-bit instruction. In this chapter, we’ll use these same tricks for loading addresses, along with a few new ones, the goal being to load a 64-bit address in one instruction in as many cases as we can.
The ARM instruction set has some powerful instructions to access memory, including several techniques to access arrays of data structures and to increment pointers in loops while loading or storing data.
Defining Memory Contents
Some sample memory directives
The first line defines 7 bytes all with the same value. We can define our bytes in decimal, octal (base 8), binary, hex, or ASCII. Anywhere we define numbers, we can use expressions that the Assembler will evaluate when it compiles our program.
We start most memory directives with a label, so we can access the data from the code. The main exception is a large array of numbers that extends over several lines, where only the first line needs a label.
A decimal integer starts with a nonzero digit and contains decimal digits 0–9.
An octal integer starts with zero and contains octal digits 0–7.
A binary integer starts with 0b or 0B and contains binary digits 0–1.
A hex integer starts with 0x or 0X and contains hex digits 0–F.
A floating-point number starts with 0f or 0e followed by a floating-point number.
Be careful not to start decimal numbers with zero (0), since this indicates the constant is an octal (base 8) number.
Negative (-) will take the two’s complement of the integer.
Complement (~) will take the one’s complement of the integer.
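The integer rules above can be captured in a few lines of C. This is only a sketch of the rules as listed, not the GNU Assembler’s actual parser. `parse_as_int` is a hypothetical helper, and it ignores floating-point numbers, character constants, and expressions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper modelling the integer rules above:
   0b/0B = binary, 0x/0X = hex, leading 0 = octal, otherwise decimal.
   A leading '-' takes the two's complement, '~' the one's complement. */
long parse_as_int(const char *s)
{
    assert(s != NULL && *s != '\0');

    if (*s == '-')
        return -parse_as_int(s + 1);   /* two's complement */
    if (*s == '~')
        return ~parse_as_int(s + 1);   /* one's complement */

    int base = 10;
    if (s[0] == '0' && (s[1] == 'b' || s[1] == 'B')) {
        base = 2;
        s += 2;
    } else if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
        base = 16;
        s += 2;
    } else if (s[0] == '0' && s[1] != '\0') {
        base = 8;                      /* leading zero: octal, not decimal */
        s += 1;
    }
    return strtol(s, NULL, base);
}
```

For example, "74", "0112", "0b1001010", and "0x4A" all evaluate to 74. This is why a stray leading zero on a decimal number changes its value: "012" is 10, not 12.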
The list of memory definition Assembler directives
Directive | Description |
---|---|
.ascii | A string contained in double quotes, with no terminating 0 byte |
.asciz | An ASCII string followed by a terminating 0 byte |
.byte | 1-byte integers |
.double | Double-precision floating-point values |
.float | Single-precision floating-point values |
.octa | 16-byte integers |
.quad | 8-byte integers |
.short | 2-byte integers |
.word | 4-byte integers |
ASCII escape character sequence codes
Escape Character Sequence | Description |
---|---|
`\b` | Backspace (ASCII code 8) |
`\f` | Form feed (ASCII code 12) |
`\n` | New line (ASCII code 10) |
`\r` | Return (ASCII code 13) |
`\t` | Tab (ASCII code 9) |
`\ddd` | An octal ASCII code (ex `\123`) |
`\xdd` | A hex ASCII code (ex `\x4F`) |
`\\` | The “\” character |
`\"` | The double quote character |
`\anything-else` | Anything-else |
Aligning Data
The first byte is word aligned. However, because it is only 1 byte, the next word of data will not be aligned. If we need that word to be aligned, we can add an “.align 2” directive before it. On ARM, the argument to .align is a power of two, so “.align 2” aligns to a 2^2 = 4-byte boundary, while “.align 4” aligns to a 16-byte boundary. Padding to the next word boundary wastes three bytes, but with gigabytes of memory, this shouldn’t be much of a worry.
ARM Assembly instructions must be word aligned. If we insert data in the middle of some instructions, we need an .align directive before the instructions continue, or our program will crash when we run it. In the next section, we’ll see that data we load with PC relative addressing must also be word aligned. Usually the Assembler will give you an error when alignment is required. Adding an “.align 4” directive is a quick fix: it aligns to 16 bytes, which is also a word boundary.
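The padding arithmetic can be checked with a small C sketch. It assumes the ARM interpretation of .align described above, where an argument of n means a 2^n-byte boundary. The names `align_up` and `padding` are hypothetical helpers, not Assembler features.

```c
#include <assert.h>
#include <stdint.h>

/* Round offset up to the next multiple of 2^n, as ".align n" does on ARM. */
uint64_t align_up(uint64_t offset, unsigned n)
{
    assert(n < 64);
    uint64_t boundary = (uint64_t)1 << n;          /* 2^n bytes */
    return (offset + boundary - 1) & ~(boundary - 1);
}

/* Number of filler bytes the directive inserts at this offset. */
uint64_t padding(uint64_t offset, unsigned n)
{
    return align_up(offset, n) - offset;
}
```

After a single 1-byte value, `padding(1, 2)` is 3, which matches the three wasted bytes for word alignment. `padding(1, 4)` is 15.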
Loading a Register with an Address
In this section, we will look at the LDR instruction and its variations for loading a memory address into a register. Once we have an address in a register, we’ll go on to look at all the ways we can use it to load and store data.
It’s a bit confusing that we use the LDR instruction both to load an address into a register and then to use that address to load actual data into a register. The two operations are distinct. It’s almost worth thinking of LDR as two separate instructions: one that uses PC relative addressing to load an address, and another covering all the forms of LDR that load data.
PC Relative Addressing
In our HelloWorld program, we used the instruction LDR X1, =helloworld to load the address of our helloworld string into X1. The Assembler knows the value of the program counter at this point, so it can provide an offset to the correct memory address. This is why it’s called PC relative addressing. There is a bit more complexity to this, which we’ll get to in a minute.
The offset from the PC is a signed 19-bit field in the instruction. The offset is in words (4 bytes), so it gives a range of +/-1MB: 2^18 words × 4 bytes = 1MB in each direction.
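As a quick check of this arithmetic, the hypothetical C function below tests whether a literal at a given byte offset from the LDR instruction is within range of the signed 19-bit word offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Can an LDR (literal) reach data at byte_offset from the instruction?
   The encoding stores a signed 19-bit count of 4-byte words. */
bool literal_reachable(int64_t byte_offset)
{
    if (byte_offset % 4 != 0)
        return false;                             /* must be whole words */
    int64_t words = byte_offset / 4;
    int64_t min_words = -((int64_t)1 << 18);      /* -2^18 */
    int64_t max_words = ((int64_t)1 << 18) - 1;   /* 2^18 - 1 */
    return words >= min_words && words <= max_words;
}
```

The reachable span runs from -1,048,576 bytes to +1,048,572 bytes. The upper bound is one word short of 1MB, because a two’s complement field has one more negative value than positive ones.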
The GNU Assembler is helping us out by putting the constant we want into memory, then creating a PC relative instruction to load it.
The PC has become more of an abstract register in the modern 64-bit world. The ARM processor can execute multiple instructions at once and even execute them out of order. In the 32-bit world, the PC was a real register that you could load, add to, and manipulate like any general-purpose register. This caused havoc for hardware engineers trying to design efficient instruction pipelines, so in 64-bit mode, instructions can’t manipulate the PC directly. For PC relative addressing, this really becomes addressing relative to the current instruction. In the preceding example, “ldr X1, #8” means 8 words from the current instruction.
In Chapter 2, “Loading and Adding,” we performed this with a MOV/MOVK pair. Here we are doing the same thing in one instruction. Both approaches take the same amount of memory: either two 32-bit instructions, or one 32-bit instruction plus one 32-bit memory location.
When we wrote LDR X1, =helloworld, the Assembler did the same thing. It placed the address of the helloworld string in memory and then loaded the contents of that memory location, which is the address, not the helloworld string itself. We’ll look carefully at this process when we discuss our program to convert strings to upper-case later in this chapter.
The constants the Assembler creates are placed at the end of the .text section, which is where the Assembly instructions go, not in the .data section. This makes them read-only in normal circumstances, so they can’t be modified. Any data that you want to modify should go in a .data section.
- 1.
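An offset of 1MB looks large, but it only addresses a fraction of the memory in a modern computer. Because the Assembler stores the address of each object in a nearby literal, only that literal has to be within 1MB of the instruction. The object itself can be any size and anywhere in memory. This helps keep our program equally efficient as it gets larger.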
- 2.
All the labels we define go into the object file’s symbol table, and this array of addresses effectively becomes a table of those symbols. This makes it easy for the linker/loader and operating system to change memory addresses without you needing to recompile your program.
- 3.
If you need any of these variables to be global, you can just make them global (accessible to other files), without changing your program. If we didn’t have this level of indirection, making a variable global would require adjustments to the instructions that load and save it.
This is another example of the tools helping us, though at first it may not seem so. In our simple one-line examples, it appears to add a layer of complexity, but in a real program, this is the design pattern that works.
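If we instead write ADR X1, helloworld, which calculates the address directly from the PC, then the helloworld string has to be in the .text section. iOS doesn’t like the LDR X1, =helloworld form, because the loader has to fix up the stored addresses to match where the program is loaded in memory. Apple considers avoiding these fix-ups a worthwhile optimization.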
Loading Data from Memory
In our HelloWorld program, we only needed the address to pass on to Linux, which then used it to print our string. Generally, we like to use these addresses to load data into a register.
The data types for the load/store instructions
Type | Meaning |
---|---|
B | Unsigned byte |
SB | Signed byte |
H | Unsigned halfword (16 bits) |
SH | Signed halfword (16 bits) |
SW | Signed word (32 bits) |
The signed version will extend the sign across the rest of the register when we load the data. We don’t need unsigned word, since we just use a W register in this case.
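To see the difference between the unsigned and signed loads, here is a small C model of what ends up in a 64-bit X register. The helper names are hypothetical. `zero_extend` mimics LDRB/LDRH (upper bits cleared), and `sign_extend` mimics LDRSB/LDRSH/LDRSW (the top bit of the loaded value copied upward).

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low 'bits' bits, as an unsigned load does. */
uint64_t zero_extend(uint64_t value, unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    uint64_t mask = (bits == 64) ? ~0ULL : ((1ULL << bits) - 1);
    return value & mask;
}

/* Copy the sign bit of a 'bits'-wide value across the rest of the register. */
uint64_t sign_extend(uint64_t value, unsigned bits)
{
    value = zero_extend(value, bits);
    uint64_t sign_bit = 1ULL << (bits - 1);
    if (value & sign_bit)
        value |= ~((sign_bit << 1) - 1);   /* set every bit above the value */
    return value;
}
```

Loading the byte 0xF0 leaves 0xF0 in the register with LDRB, but 0xFFFFFFFFFFFFFFF0 (that is, -16) with LDRSB.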
Loading an address and then the value
If you step through this in the debugger, you can watch it load 0x123456789ABCDEF0 into X2.
The square bracket syntax represents indirect memory access. Writing [X1] means load the data stored at the address pointed to by X1, not move the contents of X1 into X2.
This works, but you might be dissatisfied that it took two instructions to load X2 with our value from memory: one to load the address and another to load the data. This is life when programming a RISC processor; each instruction executes very quickly but performs a small chunk of work. As we develop algorithms, we’ll see that we usually load an address once and then use it many times, so once we are going, most accesses take one instruction.
Indexing Through Memory
Pseudo-code to loop through an array
The ARM instruction set gives us support for doing these sorts of operations.
Indexing into an array
Notice how we use W2 to specify that we want to load 32 bits, or one word. Addresses are always 64 bits, so the base register must be an X register. However, as in this case, we often only need to load a smaller quantity of data.
Using a register as an offset
Multiplying an offset by 4 using a shift operation
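The register-offset and shift forms amount to computing base + (index << 2). The C sketch below models a load such as LDR W2, [X1, X3, LSL #2] inside the array loop from the pseudo-code. The register choice and the helper names are assumptions for illustration, and memcpy stands in for the 4-byte load.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of LDR W2, [X1, X3, LSL #2]: address = X1 + (X3 << 2). */
uint32_t load_word_indexed(const void *base, uint64_t index)
{
    assert(base != NULL);
    const unsigned char *addr = (const unsigned char *)base + (index << 2);
    uint32_t value;
    memcpy(&value, addr, sizeof value);   /* read the 4 bytes at addr */
    return value;
}

/* Loop through an array of words using the shifted index. */
uint64_t sum_words(const uint32_t *array, uint64_t count)
{
    uint64_t total = 0;
    for (uint64_t i = 0; i < count; i++)
        total += load_word_indexed(array, i);
    return total;
}
```

Shifting the index left by 2 bits multiplies it by 4, the size of a word, so the loop counter can stay a plain element count.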
Write Back
When the address is calculated, the result is thrown away after we’ve loaded the register. When performing a loop, it is handy to keep the calculated address. This saves us doing a separate ADD on our index register.
Adding an exclamation mark (!) after the closing bracket, for example LDR W2, [X1, #4]!, updates X1 with the calculated address. In the examples we’ve studied, this isn’t that useful, but it becomes much more useful in the next section. You can only use this in the simple case shown, with an immediate offset. It can’t be used when a register is used in place of the immediate offset.
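The pre-indexed form with write back can be modelled in C with a pointer that is updated through a pointer-to-pointer. This is a hypothetical sketch of LDR W2, [X1, #4]!. Adding 1 to a `uint32_t` pointer advances it 4 bytes, so `words == 1` corresponds to the #4 offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pre-indexed with write back: update X1 first, then load from it. */
uint32_t ldr_pre_index(const uint32_t **x1, ptrdiff_t words)
{
    assert(x1 != NULL && *x1 != NULL);
    *x1 += words;    /* calculate the address and write it back to X1 */
    return **x1;     /* then load the word stored at the new address */
}
```

Starting with X1 pointing at the first of three words, one call returns the second word and leaves X1 pointing at it.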
Post-Indexed Addressing
The preceding section covers what is called pre-indexed addressing, because the address is calculated first and the data is then retrieved using the calculated address. In post-indexed addressing, the data is retrieved first using the base register, and then the offset is added. In the context of one instruction, this seems strange, but when we write loops, we will see that this is exactly what we want. Post-indexed addressing always writes the calculated address back to the base register, since otherwise there would be no point in using it, so we don’t need the !.
Example of post-indexed addressing
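In C terms, post-indexed addressing is the familiar `*p++`. The sketch below models LDR W2, [X1], #4 and then uses it in a loop. The function names are hypothetical, and summing is just a placeholder for whatever the loop body does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Post-indexed: load with the current X1, then write back X1 + 4. */
static uint32_t ldr_post_index(const uint32_t **x1)
{
    uint32_t w2 = **x1;   /* use the current address first */
    *x1 += 1;             /* then advance one word (4 bytes) */
    return w2;
}

/* In a loop, X1 is left pointing at the next element, so no ADD is needed. */
uint64_t sum_post_indexed(const uint32_t *x1, size_t count)
{
    assert(x1 != NULL || count == 0);
    uint64_t total = 0;
    while (count-- > 0)
        total += ldr_post_index(&x1);
    return total;
}
```

Each iteration consumes the current element and leaves the pointer ready for the next one, which is why this form suits loops.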
Converting to Upper-Case
Pseudo-code to convert a string to upper-case
In this example, we are going to use NULL-terminated strings, which are very common in C programming. Instead of a string being a length and a sequence of characters, the string is the sequence of characters followed by a NULL character (ASCII code 0, written \0). To process the string, we simply loop until we hit the NULL character. This is quite different from the fixed-length string we dealt with when printing hex digits in Chapter 4, “Controlling Program Flow.”
We’ve already covered FOR and WHILE loops. The third common structured programming loop is the DO/UNTIL loop, which puts the condition at the end of the loop. In this construct, the loop is always executed once. In our case, we want this, since if the string is empty, we still want to copy the NULL character, so the output string will then be empty as well.
Another difference is that we aren’t changing the input string. Instead we leave the input string alone and produce a new output string with the upper-case version of the input string.
Pseudo-code for how we will implement the IF statement
We don’t have the structured programming constructs of a high-level language to help us, so we build the IF out of compares and conditional branches. This turns out to be quite efficient in Assembly Language.
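To show the branch structure of the IF statement, here is a C version written with goto. Each test jumps past the conversion when the character is out of range, much as a CMP followed by a conditional branch would. `upper_char` is a hypothetical name, and the comments only indicate the kind of instruction each line corresponds to.

```c
#include <assert.h>

/* IF statement built from two compare-and-branch steps. */
char upper_char(char c)
{
    if (c < 'a')
        goto cont;                /* compare, branch if less than 'a' */
    if (c > 'z')
        goto cont;                /* compare, branch if greater than 'z' */
    c = (char)(c - ('a' - 'A'));  /* subtract to convert to upper-case */
cont:
    return c;
}
```

Characters just outside the range, such as '`' (just below 'a') and '{' (just above 'z'), pass through unchanged.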
Program to convert a string to upper-case
Two instructions: Initialize our pointers for instr and outstr.
Five instructions: Make up the if statement.
Four instructions: For the loop, including loading a character and saving a character (both with post-indexed addressing, which also updates both pointers), checking for a NULL character, and branching if not NULL.
It would be nice if STRB also set the condition flags, but there is no STRBS version. LDR and STR just load and save; they don’t have functionality to examine what they are loading or saving, so they can’t set the condition flags, hence the need for the CMP instruction in the UNTIL part of the loop to test for NULL.
In this example, we use the LDRB and STRB instructions, since we are processing byte by byte. The STRB instruction is the reverse of the LDRB instruction. It saves its first argument to the address built from all its other parameters. By covering LDR in so much detail, we’ve also covered STR which is the mirror image.
The lower-case characters have higher ASCII values than the upper-case characters. We therefore use an expression such as ('a' - 'A'), which the Assembler evaluates to the correct number to subtract, namely 32.
Here we’ve just loaded X1 with the address of outstr. X3 held the address of outstr in our loop, but because we used post-indexed addressing, it was incremented in each iteration of the loop. As a result, it now points one byte past the terminating NULL. We then calculate the length by subtracting the address of the start of the string from this end address, so the length includes the NULL. We could have kept a counter in our loop, but in Assembly we are trying to be efficient, so we want as few instructions as possible in our loops.
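Putting the pieces together, the following C sketch models the whole upper-case routine. It has a DO/UNTIL loop that copies the NULL before testing for it, post-incremented input and output pointers, and a length computed as end address minus start address. The function name is hypothetical, and `x3` is named after the register the text uses for the output pointer. The caller must supply an output buffer of at least strlen(instr) + 1 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Copy instr to outstr in upper-case; return bytes written, NULL included. */
ptrdiff_t to_upper_len(const char *instr, char *outstr)
{
    assert(instr != NULL && outstr != NULL);
    char *x3 = outstr;                     /* running output pointer */
    char c;
    do {
        c = *instr++;                      /* load a character, post-indexed */
        if (c >= 'a' && c <= 'z')
            c = (char)(c - ('a' - 'A'));   /* convert to upper-case */
        *x3++ = c;                         /* store it, post-indexed */
    } while (c != '\0');                   /* UNTIL the NULL has been copied */
    return x3 - outstr;                    /* end address - start address */
}
```

For "abc" this returns 4, because the pointer ends up one byte past the NULL. An empty input still produces an empty, NULL-terminated output and returns 1.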
Disassembly of the upper-case program
Here objdump is trying to be helpful by telling us what will be loaded, namely, the address stored at address 0x400100, which the Assembler added to our .text section to hold the address of our input string. If we look at address 0x400100, we see it contains 0x00410110, which is the address of instr in the .data section. It might appear here that the addresses are 32 bits, but this is objdump misinterpreting the data. Notice the 0 word following the address, which objdump has listed as an illegal instruction. This is really the upper half of our 64-bit address, since the low-order half is stored first.
If we look at the actual encoding of the instruction, it is 0x58000284. The high byte, 0x58, is the opcode for a 64-bit LDR (literal), and the low-order 5 bits are the register number, in this case 4. Bits 5 through 23 hold the offset, which here is 10100 in binary. Remember the offset is in words, so we need to shift left 2 bits to multiply by 4 for the offset in bytes. This gives 0101 0000 in binary, which is 0x50 in hex. If we add 0x50 to the address of the LDR instruction, 0x4000b0, we get the desired address of 0x400100. Aren’t we glad the Assembler does all this for us?
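The same decoding can be done in a few lines of C. This is a hypothetical helper, not part of any tool. It assumes the 64-bit LDR (literal) layout just described: 0x58 in the top byte, a signed 19-bit word offset in bits 5 to 23, and the destination register in bits 0 to 4.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned rt;      /* destination register number */
    int64_t offset;   /* byte offset from the LDR instruction */
} ldr_literal;

ldr_literal decode_ldr_literal(uint32_t insn)
{
    assert((insn >> 24) == 0x58);             /* 64-bit LDR (literal) */
    ldr_literal d;
    d.rt = insn & 0x1F;                       /* low 5 bits */
    int64_t imm19 = (insn >> 5) & 0x7FFFF;    /* 19-bit offset field */
    if (imm19 & 0x40000)                      /* sign bit of the field */
        imm19 -= 0x80000;                     /* sign-extend */
    d.offset = imm19 * 4;                     /* words to bytes */
    return d;
}
```

Decoding 0x58000284 gives register 4 and an offset of 0x50, and 0x4000b0 + 0x50 = 0x400100. This matches the hand calculation.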
This shows how the Assembler added the literal for the address of the string instr at the end of the code section. When we execute the LDR, it accesses this literal and loads it into a register, which gives us the address we need. The other literal added to the code section is the address of outstr.
The print (p) command knows about our labels but doesn’t know about our data types, and we must cast the label to tell it how to format the output. Gdb handles this better with high-level languages because it knows about the data types of the variables. In Assembly, we are closer to the metal.
Storing a Register
The store register STR instruction is a mirror of the LDR instruction. All the addressing modes we’ve talked about for LDR work for STR. This is necessary since in a load-store architecture, we need to store everything we load after it is processed in the CPU. We’ve seen the STR instruction a couple of times already in our examples.
If we are using the same registers to load and store the data in a loop, typically the LDR instruction will use pre-indexed addressing without write back. The STR instruction will then use post-indexed addressing, which always writes back, to advance to the next item for the next iteration of the loop.
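This load, process, store pattern can be sketched in C with a single pointer. The load uses the address as is, and the store is post-incremented, so it also moves the pointer on. `double_in_place` is a hypothetical example, and the doubling simply stands in for “process the value in a register.”

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One pointer serves both the load and the store in each iteration. */
void double_in_place(uint32_t *x1, size_t count)
{
    assert(x1 != NULL || count == 0);
    while (count-- > 0) {
        uint32_t w2 = *x1;   /* load: address used as is, no write back */
        w2 *= 2;             /* process the value in the register */
        *x1++ = w2;          /* store, then advance to the next element */
    }
}
```

Because only the store advances the pointer, the load and store in each iteration refer to the same element.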
Double Registers
There are register-pair versions of the LDR and STR instructions. The LDP instruction takes a pair of registers to load as parameters and, with two X registers, loads 128 bits of memory into them. STP is the mirror image. These instructions support immediate offsets, including pre- and post-indexing, but not register offsets.
Example of loading and storing a doubleword
We will use these instructions extensively when we need to save registers to the stack and later restore them in Chapter 6, “Functions and the Stack.”
Summary
With this chapter, we can now load data from memory, operate on it in the registers, and then save the result back to memory. We examined how the data load and store instructions help us with arrays of data and how they help us index through data in loops.
In the next chapter, we will look at how to make our code reusable; after all, wouldn’t our upper-case program be handy if we could call it whenever we wish?
Exercises
- 1.
Create a small program to try out all the data definition directives the Assembler provides. Assemble your program and use objdump to examine the data. Add some .align directives and examine how they move the data around.
- 2.
Explain how the LDR instruction lets you load any 64-bit address in only one 32-bit instruction.
- 3.
Write a program that converts a string to all lower-case.
- 4.
Write a program that converts any non-alphabetic character in a NULL-terminated string to a space.