© Stephen Smith 2020
S. Smith, Programming with 64-Bit ARM Assembly Language, https://doi.org/10.1007/978-1-4842-5881-1_5

5. Thanks for the Memories

Stephen Smith, Gibsons, BC, Canada

In this chapter, we discuss the ARM-based computer’s memory. So far, we’ve used memory to hold our Assembly instructions; now we will look in detail at how to define data in memory, then how to load memory into registers for processing, and, finally, how to write the results back to memory.

The ARM processor uses what is called a load-store architecture. This means that the instruction set is divided into two categories: one to load and store values from and to memory and the other to perform arithmetic and logical operations between the registers. We’ve spent most of our time looking at the arithmetic and logical operations. Now we will look at the other category.

Memory addresses are 64 bits while instructions are 32 bits, so we have the same problems that we experienced in Chapter 2, “Loading and Adding,” where we used all sorts of tricks to load 64 bits into a register using a 32-bit instruction. In this chapter, we’ll use these same tricks for loading addresses, along with a few new ones, the goal being to load a 64-bit address in one instruction in as many cases as we can.

The ARM instruction set has some powerful instructions to access memory, including several techniques to access arrays of data structures and to increment pointers in loops while loading or storing data.

Defining Memory Contents

Before loading and storing memory, first we need to define some memory to operate on. The GNU Assembler contains several directives to help you define memory to use in your program. These appear in a .data section of your program. We’ll look at some examples and then summarize in Table 5-1. Listing 5-1 starts us off by showing us how to define bytes, words, 64-bit integers, and ASCII strings.
label: .byte 74, 0112, 0b01001010, 0x4A, 0X4a, 'J', 'H' + 2
       .word 0x1234ABCD, -1434
       .quad 0x123456789ABCDEF0
       .ascii      "Hello World\n"
Listing 5-1

Some sample memory directives

The first line defines 7 bytes all with the same value. We can define our bytes in decimal, octal (base 8), binary, hex, or ASCII. Anywhere we define numbers, we can use expressions that the Assembler will evaluate when it compiles our program.

We start most memory directives with a label, so we can access it from the code. The only exception is if we are defining a larger array of numbers that extends over several lines.

The .byte statement defines 1 or more bytes of memory. Listing 5-1 shows the various formats we can use for the contents of each byte, as follows:
  • A decimal integer starts with a nonzero digit and contains decimal digits 0–9.

  • An octal integer starts with zero and contains octal digits 0–7.

  • A binary integer starts with 0b or 0B and contains binary digits 0–1.

  • A hex integer starts with 0x or 0X and contains hex digits 0–F.

  • A floating-point number starts with 0f or 0e followed by a floating-point number.

Note

Be careful not to start decimal numbers with zero (0), since this indicates the constant is an octal (base 8) number.

The example then shows how to define a word, a quad (64-bit integer), and an ASCII string, as we saw in our HelloWorld program in Chapter 1, “Getting Started.” There are two prefix operators we can place in front of an integer:
  • Negative (-) will take the two’s complement of the integer.

  • Complement (~) will take the one’s complement of the integer.

For example:
.byte -0x45, -33, ~0b00111001
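As a rough check of what these prefix operators produce, the sketch below repeats the same three values with the resulting bytes worked out by hand in the comments. The label name negs is an assumption made for illustration.

```asm
.data
// Each value is truncated to 8 bits when it is stored as a byte.
negs:   .byte -0x45        // two's complement of 0x45: stored as 0xBB
        .byte -33          // two's complement of 33:   stored as 0xDF
        .byte ~0b00111001  // one's complement of 0x39: stored as 0xC6
```

Dumping the .data section with objdump is one way to confirm the stored bytes.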
Table 5-1 lists the various data types we can define this way.
Table 5-1

The list of memory definition Assembler directives

Directive    Description
---------    -----------
.ascii       A string contained in double quotes
.asciz       A 0-byte terminated ASCII string
.byte        1-byte integers
.double      Double-precision floating-point values
.float       Floating-point values
.octa        16-byte integers
.quad        8-byte integers
.short       2-byte integers
.word        4-byte integers

If we want to define a larger set of memory, there are a couple of mechanisms to do this without having to list and count them all, such as
.fill    repeat, size, value
This repeats a value of a given size, repeat times, for example:
zeros:    .fill    10, 4, 0
creates a block of memory with 10 4-byte words all with a value of zero. The following code
.rept count
...
.endr
repeats the statements between .rept and .endr, count times. This can surround any code in your Assembly; for instance, you can unroll a loop by repeating its body count times. The repetition happens when the program is assembled, not when it runs. For example:
rpn: .rept 3
     .byte    0, 1, 2
     .endr
is translated to
.byte    0, 1, 2
.byte    0, 1, 2
.byte    0, 1, 2
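Since .rept is expanded at assembly time, it can be combined with an Assembler symbol that changes on each repetition. The following is a minimal sketch, assuming a symbol named i and a label named squares (both chosen here for illustration), that builds a small lookup table.

```asm
.data
        .set  i, 0          // Assembler symbol, not a register or memory
squares:
        .rept 4
        .word i * i         // evaluated at assembly time
        .set  i, i + 1      // redefine the symbol for the next repetition
        .endr
// The block above should expand to the equivalent of
//      .word 0, 1, 4, 9
```

No instructions are generated for the counting; only the four words end up in memory.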
In ASCII strings, we’ve seen the special character “\n” for new line. There are a few more for common unprintable characters as well as to give us an ability to put double quotes in our strings. The “\” is called an escape character, which is a metacharacter to define special cases. Table 5-2 lists the escape character sequences supported by the GNU Assembler.
Table 5-2

ASCII escape character sequence codes

Escape Character Sequence    Description
-------------------------    -----------
\b                           Backspace (ASCII code 8)
\f                           Form feed (ASCII code 12)
\n                           New line (ASCII code 10)
\r                           Return (ASCII code 13)
\t                           Tab (ASCII code 9)
\ddd                         An octal ASCII code (ex \123)
\xdd                         A hex ASCII code (ex \x4F)
\\                           The “\” character
\"                           The double quote character
\anything-else               Anything-else
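To make the escape sequences concrete, here is a small sketch that uses several of them in string definitions. The labels msg and nl are assumptions made for illustration.

```asm
.data
// A tab, a pair of embedded double quotes, a backslash, and a new line.
// .asciz appends the terminating 0 byte for us.
msg:    .asciz "Name:\t\"ARM\\64\"\n"
// The new line character written twice more, as an octal escape
// and then as a hex escape. Both store the byte value 10.
nl:     .ascii "\012\x0A"
```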

Aligning Data

These data directives put the data in memory contiguously byte by byte. However, the ARM processor often requires data to be aligned on word boundaries, or some other measure. We can instruct the Assembler to align the next piece of data with an .align directive. For instance, consider
.data
     .byte    0x3F
     .align   2
     .word    0x12345678

The first byte is word aligned, but because it is only 1 byte, the next word of data will not be aligned. If we need it to be word aligned, then we can add the “.align 2” directive. On ARM targets, the GNU Assembler treats the operand of .align as a power of two, so “.align 2” aligns to a 4-byte boundary, whereas “.align 4” would align to a 16-byte boundary. Word alignment here results in three wasted bytes, but with gigabytes of memory, this shouldn’t be too much of a worry.

ARM Assembly instructions must be word aligned, so if we insert data in the middle of some instructions, then we need an .align directive before the instructions continue, or our program will crash when we run it. In the next section, we’ll see that when we load data with PC relative addressing, these addresses must also be word aligned. Usually the Assembler will give you an error when alignment is required, and throwing in an “.align 2” directive is a quick fix.
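The meaning of the .align operand differs between targets: it is a byte count on some and a power of two on others, including ARM. The GNU Assembler also provides two spellings that are unambiguous everywhere, .balign (byte count) and .p2align (power of two). The sketch below uses both; the label names are assumptions made for illustration.

```asm
.data
first:  .byte    0x3F        // occupies 1 byte
        .balign  4           // pad to the next multiple of 4 bytes
second: .word    0x12345678  // now starts on a word boundary
third:  .byte    0x01
        .p2align 3           // pad to the next multiple of 2^3 = 8 bytes
fourth: .quad    0x123456789ABCDEF0
```

Examining the section with objdump should show the padding bytes the Assembler inserted after first and third.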

Loading a Register with an Address

In this section, we will look at the LDR instruction and its variations to load a memory address into a register. Once we have an address in a register, we’ll go on to look at all the ways we can use it to load and store data.

It’s a bit confusing that we use the LDR instruction to both load an address into a register and then to use that address to load actual data into a register. The two operations are distinct, and it’s almost worth considering LDR as two separate instructions, one where we are using PC relative addressing to load an address and then the other being all the forms of LDR where we are loading data.

PC Relative Addressing

In Chapter 1, “Getting Started,” we introduced the LDR instruction to load the address of our “Hello World!” string. We needed to do this to pass the address of what to print to the Linux write command. This is a simple example of PC relative addressing. It is convenient, since it doesn’t involve any other registers. If you keep your data close to your code, it is painless. We just needed to code
LDR   X1, =helloworld

to load the address of our helloworld string into X1. The Assembler knows the value of the program counter at this point, so it can provide an offset to the correct memory address. Therefore, it’s called PC relative addressing. There is a bit more complexity to this, which we’ll get to in a minute.

The offset from the PC has 19 bits in the instruction, which gives a range of +/-1MB. The offset address is in words.

PC relative addressing has one more trick up its sleeve; it gives us a way to load any 64-bit quantity into a register in only one instruction, for example, consider
LDR   X1, =0x1234ABCD1234ABCD
This assembles into
ldr   X1, #8
.quad 0x1234abcd1234abcd

The GNU Assembler is helping us out by putting the constant we want into memory, then creating a PC relative instruction to load it.
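To see that there is nothing magic in this, we can place the constant in memory ourselves and load it through a label. This is only a sketch of roughly what the Assembler arranges for us; the label mylit is an assumption made for illustration, and in real code the = form is more convenient.

```asm
.global _start
_start: LDR   X1, mylit      // PC relative load of the 8 bytes at mylit
        MOV   X0, #0         // Use 0 return code
        MOV   X8, #93        // Service code 93 terminates
        SVC   0              // Call Linux to terminate the program
        .balign 8            // keep the 64-bit constant naturally aligned
mylit:  .quad 0x1234ABCD1234ABCD
```

Stepping over the first instruction in the debugger should show X1 holding 0x1234abcd1234abcd.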

The PC has become more of an abstract register in the modern 64-bit world. The ARM processor can execute multiple instructions at once and even execute them out of order. In the 32-bit world, the PC was a real register that you could load, add to, and manipulate like any general-purpose register. This caused havoc for hardware engineers trying to design efficient instruction pipelines, so in 64 bits, instructions can’t manipulate the PC directly. For PC relative addressing, it really becomes addressing relative to the current instruction. In the preceding example, “ldr X1, #8” means 8 bytes (two words) past the current instruction; the Assembler divides this by 4 to get the word offset it stores in the instruction.

In Chapter 2, “Loading and Adding,” we performed this with a MOV instruction followed by MOVK instructions. Here we are doing the same thing in one instruction. The memory cost is comparable: up to four 32-bit instructions for the MOV/MOVK sequence, versus one 32-bit instruction plus one 64-bit memory location for the literal.

In fact, this is how the Assembler handles all data labels. When we specified
LDR   X1, =helloworld

the Assembler did the same thing; it created the address of the helloworld string in memory and then loaded the contents of that memory location, not the helloworld string itself. We’ll look carefully at this process when we discuss our program to convert strings to upper-case later in this chapter.

These constants the Assembler creates are placed at the end of the .text section which is where the Assembly instructions go, not in the .data section. This makes them read-only in normal circumstances, so they can’t be modified. Any data that you want to modify should go in a .data section.

Why would the Assembler do this? Why not just point the PC relative index directly at the data? There are several reasons for this, not all of them specific to the ARM instruction set:
  1. An offset of 1MB looks large, but only addresses a fraction of the memory in a modern computer. With the indirection, only the 8-byte addresses need to sit within 1MB of the instruction; the objects they point to can be any size and located anywhere in memory. This helps keep our program equally efficient as it gets larger.

  2. All the labels we define go into the object file’s symbol table, and this array of addresses is essentially our copy of that symbol table. This way it’s easy for the linker/loader and operating system to change memory addresses without you needing to recompile your program.

  3. If you need any of these variables to be global, you can just make them global (accessible to other files), without changing your program. If we didn’t have this level of indirection, making a variable global would require adjustments to the instructions that load and save it.

This is another example of the tools helping us, though at first it may not seem so. In our simple one-line examples, it appears to add a layer of complexity, but in a real program, this is the design pattern that works.

If you do want to avoid this extra indirection, you can use the ADR instruction. We saw this in our iOS example in Chapter 3, “Tooling Up.” ADR is like LDR, only it doesn’t perform the extra indirection. If we do
ADR   X1, helloworld

then the helloworld string has to be within 1MB of the instruction, which in practice means in the .text section. iOS doesn’t like the other form since the loader has to fix up the addresses to where the program is loaded in memory, and Apple considers avoiding this fix-up a worthwhile optimization.
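As an illustration, here is a sketch of a Hello World program that uses ADR instead of LDR. It assumes the same Linux system call numbers used elsewhere in this book; the label msg is chosen here for illustration. The string sits in the .text section right after the code, so no literal holding its address is needed.

```asm
.global _start
_start: MOV   X0, #1         // 1 = StdOut
        ADR   X1, msg        // address computed relative to this instruction
        MOV   X2, #13        // length of our string
        MOV   X8, #64        // Linux write system call
        SVC   0              // Call Linux to output the string
        MOV   X0, #0         // Use 0 return code
        MOV   X8, #93        // Service code 93 terminates
        SVC   0              // Call Linux to terminate the program
msg:    .ascii "Hello World!\n"
```

Because the string now lives with the instructions, it is read-only in normal circumstances.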

Loading Data from Memory

In our HelloWorld program, we only needed the address to pass on to Linux, which then used it to print our string. Generally, we like to use these addresses to load data into a register.

The simple form of LDR to load data given an address is
LDR{type}   Xt, [Xa]
where type is one of the types listed in Table 5-3.
Table 5-3

The data types for the load/store instructions

Type    Meaning
----    -------
B       Unsigned byte
SB      Signed byte
H       Unsigned halfword (16 bits)
SH      Signed halfword (16 bits)
SW      Signed word

The signed version will extend the sign across the rest of the register when we load the data. We don’t need unsigned word, since we just use a W register in this case.
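The following sketch loads the same memory location with each type so the effect of sign extension can be watched in the debugger. The label mydata is an assumption made for illustration, and the values in the comments were worked out by hand assuming the usual little-endian byte order, where the low-order byte 0xF0 is stored first.

```asm
.global _start
_start: LDR    X1, =mydata   // address of our test data
        LDRB   W2, [X1]      // W2 = 0x000000F0
        LDRSB  X3, [X1]      // X3 = 0xFFFFFFFFFFFFFFF0
        LDRH   W4, [X1]      // W4 = 0x0000DEF0
        LDRSH  X5, [X1]      // X5 = 0xFFFFFFFFFFFFDEF0
        LDR    W6, [X1]      // W6 = 0x9ABCDEF0, upper half of X6 is zeroed
        LDRSW  X7, [X1]      // X7 = 0xFFFFFFFF9ABCDEF0
        MOV    X0, #0        // Use 0 return code
        MOV    X8, #93       // Service code 93 terminates
        SVC    0             // Call Linux to terminate the program
.data
mydata: .quad  0x123456789ABCDEF0
```

Each of the byte, halfword, and word values here has its top bit set, which is why the signed loads fill the rest of the register with ones.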

Listing 5-2 shows the typical usage where we load an address into a register and then use that address to load the data we want.
// load the address of mynumber into X1
      LDR   X1, =mynumber
// load the 64-bit value stored at mynumber into X2
      LDR   X2, [X1]
.data
mynumber:   .QUAD 0x123456789ABCDEF0
Listing 5-2

Loading an address and then the value

If you step through this in the debugger, you can watch it load 0x123456789ABCDEF0 into X2.

Note

The square bracket syntax represents indirect memory access. This means load the data stored at the address pointed to by X1, not move the contents of X1 into X2.

This works, but you might be dissatisfied that it took us two instructions to load X2 with our value from memory, one to load the address and then one to load the data. This is life programming a RISC processor; each instruction executes very quickly, but performs a small chunk of work. As we develop algorithms, we’ll see that we usually load an address once and then use it quite a bit, so most accesses take one instruction once we are going.

Indexing Through Memory

All high-level programming languages have an array construct. They can define an array of objects and then access the individual elements by index. The high-level language will define the array with something like
DIM A[10] AS WORD
then access the individual elements with statements like those in Listing 5-3.
// Set the 5th element of the array to the value 6
A[5] = 6
// Set the variable X equal to the 3rd array element
X = A[3]
// Loop through all 10 elements
FOR I = 1 TO 10
      // Set element I to I cubed
      A[I] = I ** 3
NEXT I
Listing 5-3

Pseudo-code to loop through an array

The ARM instruction set gives us support for doing these sorts of operations.

Suppose we have an array of 10 words (4 bytes each) defined by
arr1:   .FILL   10, 4, 0
Let’s load the array’s address into X1:
LDR   X1, =arr1
We can now access the elements using LDR as demonstrated in Listing 5-4 and Figure 5-1.
      // Load the first element
      LDR   W2, [X1]
      // Load element 3
      // The elements count from 0, so 2 is
      // the third one. Each word is 4 bytes,
      // so we need to multiply by 4
      LDR   W2, [X1, #(2 * 4)]
Listing 5-4

Indexing into an array

Figure 5-1

Graphical view of using X1 and an index to load W2

Notice how we use W2 to specify that we want to load 32 bits or one word. Addresses are always 64 bits and we must use an X register. However, as in this case, we often only need to load a smaller quantity of data.

This is fine for accessing hard-coded elements, but what about via a variable? We can use a register as demonstrated in Listing 5-5.
// The 3rd element is still number 2
      MOV   X3, #(2 * 4)
// Add the offset in X3 to X1 to get our element.
      LDR   W2, [X1, X3]
Listing 5-5

Using a register as an offset

We can also use negative offsets. If X1 points to the end of the array, we can do
LDR   W2, [X1, #-(2 * 4)]
MOV   X3, #(-2 * 4)
LDR   W2, [X1, X3]
With the register as the offset, it is the same as a register and shift type Operand2 that we studied in Chapter 2, “Loading and Adding.” For the preceding constants, we could do a * 4 in the immediate instruction, but if it’s in a register, we would need to do an additional shift operation and put the result in yet another register. With the register/shift format, we can handle quite a few cases easily. Computing the address of an array of words is demonstrated in Listing 5-6.
// Our array is of WORDs. 2 is the index
   MOV   X3, #2
// Shift X3 left by 2 positions to multiply
// by 4 to get the correct address.
   LDR   W2, [X1, X3, LSL #2]
Listing 5-6

Multiplying an offset by 4 using a shift operation
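Putting the shifted register offset to work, the sketch below sums an array of 10 words using an index register. The labels nums and sumloop and the register assignments are assumptions made for illustration. Note that for a load or store the shift amount is restricted: it must be 0 or match the size of the data being accessed, so LSL #2 is the one that applies to words.

```asm
.global _start
_start: LDR   X1, =nums      // base address of the array
        MOV   X3, #0         // X3 is the array index
        MOV   W4, #0         // W4 accumulates the sum
sumloop:
        LDR   W2, [X1, X3, LSL #2]  // load nums[X3]
        ADD   W4, W4, W2     // add it to the sum
        ADD   X3, X3, #1     // next index
        CMP   X3, #10        // done all 10 elements?
        B.LT  sumloop
        MOV   X0, X4         // return the sum as the exit code
        MOV   X8, #93        // Service code 93 terminates
        SVC   0              // Call Linux to terminate the program
.data
nums:   .word 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
```

After running the program, typing echo $? in the shell should print 55.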

Write Back

When the address is calculated, the result is thrown away after we’ve loaded the register. When performing a loop, it is handy to keep the calculated address. This saves us doing a separate ADD on our index register.

The syntax for this is to put an exclamation mark (!) after the instruction, and then the Assembler will set the bit in the generated instruction asking the CPU to save the calculated address; thus
LDR W2, [X1, #(2 * 4)]!

updates X1 with the value calculated. In the examples we’ve studied, this isn’t that useful, but it becomes much more useful in the next section. You can only use this in the simple case shown; it can’t be used when a register is used in place of an immediate offset.
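As a small sketch of what write back does to the base register, the following loads consecutive elements using the same offset each time. The label nums is an assumption made for illustration, and the values in the comments were worked out by hand.

```asm
.global _start
_start: LDR   X1, =nums      // X1 = address of nums
        LDR   W2, [X1, #4]!  // X1 = nums + 4, W2 = 20
        LDR   W3, [X1, #4]!  // X1 = nums + 8, W3 = 30
        MOV   X0, #0         // Use 0 return code
        MOV   X8, #93        // Service code 93 terminates
        SVC   0              // Call Linux to terminate the program
.data
nums:   .word 10, 20, 30
```

Without the exclamation marks, both loads would read the same element, since X1 would never change.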

Post-Indexed Addressing

The preceding section covers what is called pre-indexed addressing. This is because the address is calculated and then the data is retrieved using the calculated address. In post-indexed addressing, the data is retrieved first using the base register; then any offset adding is done. In the context of one instruction, this seems strange, but when we write loops, we will see this is what we want. The calculated address is written back to the base address register, since otherwise there is no point in using this feature, so we don’t need the !.

We indicate we want post-index addressing by placing the items to add outside the square brackets. In Listing 5-7, LDR will load X1 with the contents of memory pointed to by X2 and then update X2 by adding the immediate constant to it.
// Load X1 with the memory pointed to by X2
// Then do X2 = X2 + 2
   LDR   X1, [X2], #2
Listing 5-7

Example of post-indexed addressing
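Here is a sketch of how post-indexed addressing fits into a loop, summing a small array by walking a pointer through it. The labels nums and sumloop and the register assignments are assumptions made for illustration.

```asm
.global _start
_start: LDR   X1, =nums      // X1 points to the current element
        ADD   X6, X1, #(4 * 4) // X6 points 1 past the last element
        MOV   W4, #0         // W4 accumulates the sum
sumloop:
        LDR   W2, [X1], #4   // load the element, then X1 = X1 + 4
        ADD   W4, W4, W2     // add it to the sum
        CMP   X1, X6         // reached the end of the array?
        B.LO  sumloop        // loop while X1 is below X6
        MOV   X0, X4         // return the sum as the exit code
        MOV   X8, #93        // Service code 93 terminates
        SVC   0              // Call Linux to terminate the program
.data
nums:   .word 5, 10, 15, 20
```

After running the program, echo $? should print 50. The loop needs no separate index register and no ADD to advance the pointer.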

Converting to Upper-Case
As an example of how post-indexed addressing helps us write loops, let’s consider looping through a string of ASCII bytes. Suppose we want to convert any lower-case characters to upper-case. Listing 5-8 gives pseudo-code to do this.
i = 0
DO
      char = inStr[i]
      IF char >= 'a' AND char <= 'z' THEN
            char = char - ('a' - 'A')
      END IF
      outStr[i] = char
      i = i + 1
UNTIL char == 0
PRINT outStr
Listing 5-8

Pseudo-code to convert a string to upper-case

In this example, we are going to use NULL-terminated strings. These are very common in C programming. Here instead of a string being a length and a sequence of characters, the string is the sequence of characters, followed by a NULL (ASCII code 0 or \0) character. To process the string, we simply loop until we hit the NULL character. This is quite different than the fixed length string we dealt with when printing hex digits in Chapter 4, “Controlling Program Flow.”

We’ve already covered FOR and WHILE loops. The third common structured programming loop is the DO/UNTIL loop, which puts the condition at the end of the loop. In this construct, the loop is always executed once. In our case, we want this, since if the string is empty, we still want to copy the NULL character, so the output string will then be empty as well.

Another difference is that we aren’t changing the input string. Instead we leave the input string alone and produce a new output string with the upper-case version of the input string.

As is common in Assembly Language programming, we reverse the logic, to jump around the code in the IF block. Listing 5-9 shows the updated pseudo-code.
      IF char < 'a' GOTO continue
      IF char > 'z' GOTO continue
      char = char - ('a' - 'A')
continue: // the rest of the program
Listing 5-9

Pseudo-code for how we will implement the IF statement

We don’t have the structured programming constructs of a high-level language to help us, and this turns out to be quite efficient in Assembly Language.

Listing 5-10 is the Assembly code to convert a string to upper-case.
//
// Assembler program to convert a string to
// all upper case.
//
// X0-X2 - parameters to Linux function services
// X3 - address of output string
// X4 - address of input string
// W5 - current character being processed
// X8 - linux function number
//
.global _start // Provide program starting address to linker
_start: LDR   X4, =instr      // start of input string
        LDR   X3, =outstr     // address of output string
// The loop runs until the byte pointed to by X4 is zero
loop:   LDRB  W5, [X4], #1    // load character and incr pointer
// If W5 > 'z' then goto cont
       CMP   W5, #'z'         // is letter > 'z'?
       B.GT  cont
// Else if W5 < 'a' then goto end if
       CMP   W5, #'a'
       B.LT  cont            // goto to end if
// if we got here then the letter is lower case, so convert it.
       SUB   W5, W5, #('a'-'A')
cont:  // end if
       STRB  W5, [X3], #1    // store character to output str
       CMP   W5, #0          // stop on hitting a null character
       B.NE  loop            // loop if character isn't null
// Setup the parameters to print our string
// and then call Linux to do it.
      MOV    X0, #1          // 1 = StdOut
      LDR    X1, =outstr     // string to print
      SUB    X2, X3, X1      // get the len by sub'ing the pointers
      MOV    X8, #64         // Linux write system call
      SVC    0               // Call Linux to output the string
// Setup the parameters to exit the program
// and then call Linux to do it.
      MOV    X0, #0          // Use 0 return code
      MOV    X8, #93         // Service code 93 terminates
      SVC    0               // Call Linux to terminate the program
.data
instr:  .asciz  "This is our Test String that we will convert.\n"
outstr:      .fill  255, 1, 0
Listing 5-10

Program to convert a string to upper-case

If we compile and run the program, we get the desired output:
smist08@kali:~/asm64/Chapter 5$ make
as   upper.s -o upper.o
ld -o upper upper.o
smist08@kali:~/asm64/Chapter 5$ ./upper
THIS IS OUR TEST STRING THAT WE WILL CONVERT.
smist08@kali:~/asm64/Chapter 5$
This program is quite short. Besides all the comments and the code to print the string and exit, there are only 11 Assembly instructions to initialize and execute the loop:
  • Two instructions: Initialize our pointers for instr and outstr.

  • Five instructions: Make up the if statement.

  • Four instructions: For the loop, including loading a character, saving a character, updating both pointers, checking for a null character, and branching if not null.

It would be nice if STRB also set the condition flags, but there is no STRBS version. LDR and STR just load and save; they don’t have functionality to examine what they are loading or saving, so they can’t set the condition flags, hence the need for the CMP instruction in the UNTIL part of the loop to test for NULL.

In this example, we use the LDRB and STRB instructions, since we are processing byte by byte. The STRB instruction is the reverse of the LDRB instruction. It saves its first argument to the address built from all its other parameters. By covering LDR in so much detail, we’ve also covered STR which is the mirror image.

To convert the letter to upper-case, we use
SUB   W5, W5, #('a'-'A')

The lower-case characters have higher values than the upper-case characters, so we just use an expression that the Assembler will evaluate to get the correct number to subtract.

When we come to print the string, we don’t know its length and Linux requires the length. We use the following instruction:
SUB   X2, X3, X1

Here we’ve just loaded X1 with the address of outstr. X3 held the address of outstr in our loop, but because we used post-indexed addressing, it got incremented in each iteration of the loop. As a result, it is now pointing 1 past the end of the string. We then calculate the length by subtracting the address of the start of the string from the address of the end of the string. We could have kept a counter for this in our loop, but in Assembly we are trying to be efficient, so we want as few instructions as possible in our loops.

Let’s look at Listing 5-11, a disassembly of our program.
Disassembly of section .text:
00000000004000b0 <_start>:
  4000b0:    58000284    ldr    x4, 400100 <cont+0x30>
  4000b4:    580002a3    ldr    x3, 400108 <cont+0x38>
00000000004000b8 <loop>:
  4000b8:    38401485    ldrb   w5, [x4], #1
  4000bc:    7101e8bf    cmp    w5, #0x7a
  4000c0:    5400008c    b.gt   4000d0 <cont>
  4000c4:    710184bf    cmp    w5, #0x61
  4000c8:    5400004b    b.lt   4000d0 <cont> // b.tstop
  4000cc:    510080a5    sub    w5, w5, #0x20
00000000004000d0 <cont>:
  4000d0:    38001465    strb   w5, [x3], #1
  4000d4:    710000bf    cmp    w5, #0x0
  4000d8:    54ffff01    b.ne   4000b8 <loop>  // b.any
  4000dc:    d2800020    mov    x0, #0x1       // #1
  4000e0:    58000141    ldr    x1, 400108 <cont+0x38>
  4000e4:    cb010062    sub    x2, x3, x1
  4000e8:    d2800808    mov    x8, #0x40      // #64
  4000ec:    d4000001    svc    #0x0
  4000f0:    d2800000    mov    x0, #0x0       // #0
  4000f4:    d2800ba8    mov    x8, #0x5d      // #93
  4000f8:    d4000001    svc    #0x0
  4000fc:    00000000    .inst  0x00000000 ; undefined
  400100:    00410110    .word  0x00410110
  400104:    00000000    .word  0x00000000
  400108:    0041013f    .word  0x0041013f
  40010c:    00000000    .word  0x00000000
Contents of section .data:
 410110 54686973 20697320 6f757220 54657374  This is our Test
 410120 20537472 696e6720 74686174 20776520   String that we
 410130 77696c6c 20636f6e 76657274 2e0a0000  will convert....
 410140 00000000 00000000 00000000 00000000  ................
Listing 5-11

Disassembly of the upper-case program

The instruction
LDR   X4, =instr
is converted to
ldr   x4, 400100 <cont+0x30>

Here objdump is trying to be helpful by telling us what will be loaded, namely, the address stored at address 0x400100, which the Assembler added to our .text section to hold the address of our input string. If we look at address 0x400100, we see it contains 0x00410110, which is the address of instr in the .data section. It might appear here that the addresses are 32 bits, but this is objdump splitting each 64-bit value into two words. The low-order word is stored first, so the zero word that follows at 0x400104 is really the upper half of our address. The zero word before it at 0x4000fc, which objdump has listed as an undefined instruction, is padding that the Assembler inserted so the 64-bit literals start on an 8-byte boundary.

If we look at the actual encoding of the instruction, it is 0x58000284. The 58 is the opcode and the low-order 5 bits are the register number, in this case 4. This means the offset encoded in the instruction is 10100 in binary. Remember the offset is in words, so we need to shift left 2 bits to multiply by 4 for the offset in bytes, which gives 0101 0000 in binary, which is 0x50 in hex. If we add 0x50 to the address of the LDR instruction, which is 0x4000b0, we get the desired address of 0x400100. Aren’t we glad the Assembler does all this for us?
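The same decoding can be repeated for the second LDR in the disassembly, 0x580002a3 at address 0x4000b4. The following is a hand-worked sketch that assumes the same field layout as above, with the register in the low-order 5 bits and the word offset in the bits above them:

```latex
\begin{aligned}
\text{register} &= \mathtt{0x2A3} \bmod 32 = 3 \quad (\text{X3}) \\
\text{word offset} &= \lfloor \mathtt{0x2A3} / 32 \rfloor = 21 \\
\text{byte offset} &= 21 \times 4 = 84 = \mathtt{0x54} \\
\text{target} &= \mathtt{0x4000B4} + \mathtt{0x54} = \mathtt{0x400108}
\end{aligned}
```

This agrees with the address objdump shows for that instruction, which is the literal holding the address of outstr.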

This shows how the Assembler added the literal for the address of the string instr at the end of the code section. When we do the LDR, it accesses this literal and loads it into a register; this gives us the address we need in a register. The other literal added to the code section is the address of outstr.

To see this program in action, it is worthwhile to single step through it in gdb. You can watch the registers with the “i r” (info registers) command. To view instr and outstr as the processing occurs, there are a couple of ways of doing it. From the disassembly, we know the address of instr is 0x410110, so we can enter
(gdb) x /2s 0x410110
0x410110:       "This is our Test String that we will convert.\n"
0x41013f:       "TH"
(gdb)
This is convenient since the x command knows how to format strings, but it doesn’t know about labels. We can also enter
(gdb) p (char[10]) outstr
$1 = "TH\000\000\000\000\000\000\000"
(gdb)

The print (p) command knows about our labels but doesn’t know about our data types, and we must cast the label to tell it how to format the output. Gdb handles this better with high-level languages because it knows about the data types of the variables. In Assembly, we are closer to the metal.

Storing a Register

The store register STR instruction is a mirror of the LDR instruction. All the addressing modes we’ve talked about for LDR work for STR. This is necessary since in a load-store architecture, we need to store everything we load after it is processed in the CPU. We’ve seen the STR instruction a couple of times already in our examples.

If we are using the same registers to load and store the data in a loop, typically the first LDR call will use pre-indexed addressing without write back and then the STR instruction will use post-indexed addressing with write back to advance to the next item for the next iteration of the loop.
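As a sketch of this pattern, the following variation on the upper-case program converts a string in place, using a single pointer register for both the load and the store. The label str is an assumption made for illustration; the string must be in the .data section, since we are modifying it.

```asm
.global _start
_start: LDR   X4, =str       // X4 walks through the string
loop:   LDRB  W5, [X4]       // load the character, X4 is unchanged
        CMP   W5, #'z'       // is letter > 'z'?
        B.GT  cont
        CMP   W5, #'a'       // is letter < 'a'?
        B.LT  cont
        SUB   W5, W5, #('a'-'A')
cont:   STRB  W5, [X4], #1   // store it back, then advance X4
        CMP   W5, #0         // stop on hitting a null character
        B.NE  loop
        MOV   X0, #1         // 1 = StdOut
        LDR   X1, =str       // string to print
        SUB   X2, X4, X1     // get the len by sub'ing the pointers
        MOV   X8, #64        // Linux write system call
        SVC   0              // Call Linux to output the string
        MOV   X0, #0         // Use 0 return code
        MOV   X8, #93        // Service code 93 terminates
        SVC   0              // Call Linux to terminate the program
.data
str:    .asciz "Hello World!\n"
```

The load leaves X4 alone so the store can write to the same address, and only the store advances the pointer.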

Double Registers

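There are doubleword versions of the LDR and STR instructions we’ve seen. The LDP instruction takes a pair of registers to load as parameters and then loads 128 bits of memory into these. The STP instruction works similarly, storing a pair of registers. Both accept an immediate offset, with or without pre- or post-indexed write back, but not a register offset.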

For example, Listing 5-12 loads the address of a 128-bit quantity (the address is still 64 bits) and then loads the 128 bits into X2 and X3. Then we store X2 and X3 back into myoctaword.
      LDR   X1, =myoctaword
      LDP   X2, X3, [X1]
      STP   X2, X3, [X1]
.data
myoctaword: .OCTA 0x12345678876543211234567887654321
Listing 5-12

Example of loading and storing a doubleword
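LDP and STP combine naturally with post-indexed addressing. The sketch below copies 32 bytes from one buffer to another, 16 bytes per iteration. The labels src, dst, and copy and the register assignments are assumptions made for illustration.

```asm
.global _start
_start: LDR   X0, =src       // X0 points to the source
        LDR   X1, =dst       // X1 points to the destination
        MOV   X4, #2         // two 16-byte blocks to copy
copy:   LDP   X2, X3, [X0], #16 // load 16 bytes, then X0 = X0 + 16
        STP   X2, X3, [X1], #16 // store 16 bytes, then X1 = X1 + 16
        SUBS  X4, X4, #1     // count down and set the flags
        B.NE  copy           // loop until the count reaches zero
        MOV   X0, #0         // Use 0 return code
        MOV   X8, #93        // Service code 93 terminates
        SVC   0              // Call Linux to terminate the program
.data
src:    .quad 1, 2, 3, 4
dst:    .fill 4, 8, 0
```

Examining dst in gdb after the loop should show the four quads copied from src.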

We will use these instructions extensively when we need to save registers to the stack and later restore them in Chapter 6, “Functions and the Stack.”

Summary

With this chapter, we can now load data from memory, operate on it in the registers, and then save the result back to memory. We examined how the data load and store instructions help us with arrays of data and how they help us index through data in loops.

In the next chapter, we will look at how to make our code reusable; after all, wouldn’t our upper-case program be handy if we could call it whenever we wish?

Exercises

  1. Create a small program to try out all the data definition directives the Assembler provides. Assemble your program and use objdump to examine the data. Add some align directives and examine how they move around.

  2. Explain how the LDR instruction lets you load any 64-bit address in only one 32-bit instruction.

  3. Write a program that converts a string to all lower-case.

  4. Write a program that converts any non-alphabetic character in a NULL-terminated string to a space.