Memory is used by the processor as a storage room for data and instructions. We have already discussed registers, which are high-speed storage locations. Accessing memory is a lot slower than accessing registers, but the number of registers is limited. With 64-bit addresses, the memory size has a theoretical limit of 2^64 addresses, which is 18,446,744,073,709,551,616 bytes, or 16 exabytes. You cannot use that much memory because of practical design issues: current x86-64 processors implement fewer than 64 address bits. It is time to investigate memory in more detail.
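For reference, here is the arithmetic behind those numbers. The figure of 16 exabytes uses binary prefixes (1 EiB = 2^60 bytes). In decimal SI units, the same count is about 18.4 exabytes.

```latex
\begin{aligned}
2^{64}\ \text{bytes} &= 18\,446\,744\,073\,709\,551\,616\ \text{bytes} \\
                     &= 2^{4} \cdot 2^{60}\ \text{bytes} = 16\ \text{EiB} \\
                     &\approx 18.4 \times 10^{18}\ \text{bytes}
\end{aligned}
```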
Exploring Memory
Listing 8-1 shows an example we will use during our discussion of memory.
; memory.asm
section .data
bNum db 123
wNum dw 12345
warray times 5 dw 0 ; array of 5 words
; containing 0
dNum dd 12345
qNum1 dq 12345
text1 db "abc",0
qNum2 dq 3.141592654
text2 db "cde",0
section .bss
bvar resb 1
dvar resd 1
wvar resw 10
qvar resq 3
section .text
global main
main:
push rbp
mov rbp, rsp
lea rax, [bNum] ;load address of bNum into rax
mov rax, bNum ;load address of bNum into rax
mov rax, [bNum] ;load 8 bytes starting at bNum into rax
mov [bvar], rax ;store the 8 bytes of rax at address bvar
lea rax, [bvar] ;load address of bvar into rax
lea rax, [wNum] ;load address of wNum into rax
mov rax, [wNum] ;load 8 bytes starting at wNum into rax
lea rax, [text1] ;load address of text1 into rax
mov rax, text1 ;load address of text1 into rax
mov rax, text1+1 ;load address of second character into rax
lea rax, [text1+1] ;load address of second character into rax
mov rax, [text1] ;load 8 bytes starting at text1 into rax
mov rax, [text1+1] ;load 8 bytes starting at text1+1 into rax
mov rsp,rbp
pop rbp
ret
Listing 8-1. memory.asm
Make this program. There is no output for this program; use a debugger to step through each instruction. SASM is helpful here.
We defined some variables of different sizes, including an array of five words filled with zeros. We also defined some items in section .bss. Look in your debugger at rsp, the stack pointer; it holds a very high value. The stack pointer refers to an address in high memory. The stack is an area in memory used for temporarily storing data. The stack grows as more data is stored in it, and it grows downward, from higher addresses to lower addresses: the stack pointer rsp decreases every time you put data on the stack. We will discuss the stack in a separate chapter, but keep in mind for now that the stack lives somewhere in high memory. See Figure 8-1.
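The downward growth of the stack can be modeled in a few lines of C. This is a sketch under simplifying assumptions, not how the hardware stack works. The hypothetical toy_stack type stands in for the stack region, and its sp index plays the role of rsp. As with the push instruction, the top first moves down by 8 bytes and then the value is stored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_STACK_SLOTS 16   /* capacity of the toy stack, in 8-byte slots */

/* A toy stand-in for the stack region: sp indexes the current top slot
   and plays the role of rsp. It starts one past the highest slot, just as
   rsp starts at the high end of the stack region. */
typedef struct {
    uint64_t slots[TOY_STACK_SLOTS];
    size_t sp;
} toy_stack;

void toy_init(toy_stack *s) {
    s->sp = TOY_STACK_SLOTS;         /* empty: top is above the highest slot */
}

/* Like push: move the top down by one 8-byte slot, then store the value. */
void toy_push(toy_stack *s, uint64_t value) {
    assert(s->sp > 0);               /* no room left below: stack overflow */
    s->sp--;
    s->slots[s->sp] = value;
}

/* Like pop: read the value at the top, then move the top back up. */
uint64_t toy_pop(toy_stack *s) {
    assert(s->sp < TOY_STACK_SLOTS); /* nothing to pop */
    return s->slots[s->sp++];
}

/* The address the toy "rsp" currently points to. */
uintptr_t toy_rsp(const toy_stack *s) {
    return (uintptr_t)(s->slots + s->sp);
}
```

After toy_init, each toy_push lowers toy_rsp by 8 bytes, and each toy_pop raises it by 8 again. This mirrors what you see happen to rsp in the debugger.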
We used the lea instruction, which means “load effective address,” to load the memory address of bNum into rax. We can obtain the same result with mov, without the square brackets around bNum. If we use the square brackets, [ ], with the mov instruction, we load the value at bNum into rax, not its address. But we are not loading only bNum into rax. Because rax is a 64-bit (8-byte) register, 8 bytes are loaded: bNum itself plus the 7 bytes that follow it in memory. Because x86-64 is little endian, bNum ends up in the least significant (rightmost) byte of rax, which is the register al, and that is the only part we are interested in here. When you require rax to contain only the value 123, you first have to clear rax, as shown here:
xor rax, rax
Then instead of this:
mov rax, [bNum]
use this:
mov al, [bNum]
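To see why al holds bNum while the rest of rax holds other data, the following C sketch rebuilds the 8-byte little-endian load by hand. It assumes, as in Listing 8-1, that NASM places bNum (123) directly before wNum (12345, which is 0x3039) without padding, followed by the zeros of warray. The data_start array is a hypothetical copy of those first 8 bytes of section .data, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assemble 8 bytes from memory into a 64-bit value the way x86-64 does:
   the byte at the lowest address becomes the least significant byte.
   Built byte by byte, so the result does not depend on the host's byte order. */
uint64_t load_le64(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* The effect of "xor rax, rax" followed by "mov al, [bNum]": only one byte
   is read, and the upper 56 bits stay zero. */
uint64_t load_byte_zero_extended(const uint8_t *p) {
    return (uint64_t)p[0];
}

/* Assumed first 8 bytes of section .data in Listing 8-1:
   bNum (db 123 = 0x7b), wNum (dw 12345 = 0x3039, stored as 39 30),
   then the first 5 bytes of warray (all zero). */
const uint8_t data_start[8] = { 123, 0x39, 0x30, 0, 0, 0, 0, 0 };
```

With these bytes, load_le64 returns 0x30397b (3160443), which is what mov rax, [bNum] would put in rax. load_byte_zero_extended returns 123, which matches the xor/mov al sequence. NASM also accepts movzx rax, byte [bNum], which performs the same zero-extended byte load in a single instruction.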
Be careful about the sizes of data you are moving to and from memory. Look, for instance, at the following:
mov [bvar],rax
With this instruction, you are moving the 8 bytes in rax to the address bvar. If you only intended to write 123 to bvar, you can check with your debugger (choose type d for bvar in the SASM memory window) that you also overwrite the 7 bytes that follow bvar in memory: all of dvar and the first 3 bytes of wvar! This can introduce nasty bugs in your program. To avoid that, replace the instruction with the following:
mov [bvar],al
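The overwrite can be reproduced in C with a small byte array standing in for section .bss. This is a model under stated assumptions. First, bvar, dvar, and wvar are laid out back to back without padding, as NASM does for the resb, resd, and resw lines in Listing 8-1. Second, the array is prefilled with the marker 0xEE so that overwritten bytes stand out (the real .bss starts out zeroed). store_le64, bss_model_reset, and the offset names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store a 64-bit value byte by byte in little-endian order, like "mov [addr], rax". */
void store_le64(uint8_t *p, uint64_t v) {
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v & 0xFF);
        v >>= 8;
    }
}

/* Assumed offsets in the model of section .bss:
   bvar (resb 1) at 0, dvar (resd 1) at 1..4, wvar (resw 10) at 5..24. */
enum { BVAR = 0, DVAR = 1, WVAR = 5, BSS_MODEL_SIZE = 25 };

/* Fill the model with a marker value so that overwritten bytes are easy to spot. */
void bss_model_reset(uint8_t bss[BSS_MODEL_SIZE]) {
    memset(bss, 0xEE, BSS_MODEL_SIZE);
}
```

Storing a 64-bit 123 at offset BVAR changes offsets 0 through 7. bvar gets 123, while all four bytes of dvar and the first three bytes of wvar become 0. Storing a single byte, as mov [bvar],al does, leaves dvar untouched.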
When loading content from memory address text1 into rax, note that the value in rax is in little-endian order: the first character, a, ends up in the least significant byte of rax. Step through the program to investigate the different instructions, and change values and sizes to see what happens.
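The following C sketch shows this effect for the four bytes of text1 (61 62 63 00 in hex). When those bytes are read as a little-endian number, the first character lands in the lowest byte. A debugger that displays the most significant byte first therefore shows the characters in reverse order. With mov rax, [text1], the upper four bytes of rax also come from whatever follows text1 in memory (here, qNum2). The sketch looks only at the low 32 bits. load_le32 and format_hex32 are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Read 4 bytes as a little-endian 32-bit value: p[0] becomes the lowest byte. */
uint32_t load_le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Format the value the way a debugger shows a register: highest byte first.
   "0x" + 8 hex digits + terminating NUL = 11 characters. */
void format_hex32(uint32_t v, char out[11]) {
    snprintf(out, 11, "0x%08x", (unsigned)v);
}
```

For text1, load_le32 returns 0x00636261, and format_hex32 renders it as 0x00636261. Read from left to right, the characters appear as c, b, a.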
There are two ways to load a memory address into a register: mov without square brackets, and lea. Using lea can make your code more readable, as everybody can immediately see that you are handling addresses. You can also use lea to speed up calculations, but we will not use lea for that purpose here.
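If you know C, this distinction corresponds to a familiar one. lea, and mov without brackets, behave like taking an address with &. mov with brackets behaves like dereferencing a pointer. The sketch below is only an analogy. text1 is a C array standing in for the NASM label, and the function names are made up for illustration.

```c
#include <assert.h>

/* C stand-in for: text1 db "abc",0 */
const char text1[] = "abc";

/* Like "lea rax, [text1+1]" or "mov rax, text1+1": produces an address. */
const char *address_of_second_char(void) {
    return &text1[1];
}

/* Like "mov al, [text1+1]": produces the byte stored at that address. */
char second_char(void) {
    return *(text1 + 1);
}
```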
Start GDB with gdb memory, then enter disass main and look at the left column with memory addresses (Figure 8-2). Do not forget to first delete the line added by SASM for correct debugging, as we explained in the previous chapter. In our case, the first instruction is located at address 0x4004a0.
Now we will use readelf at the command line. Remember that we asked NASM to assemble using the ELF format (see the makefile). readelf is a CLI tool used to obtain more information about the executable file. If you feel the irresistible urge to know more about linkers, here is an interesting source of information:
Linkers and Loaders, John R. Levine, 1999, The Morgan Kaufmann Series in Software Engineering and Programming
As you probably guessed, at the CLI you can also type the following:
man elf
For our purposes, at the CLI type the following:
readelf --file-header ./memory
You will get some general information about our executable, memory. Look at Entry point address: 0x4003b0. That is the memory location where execution of our program starts. So, between the program entry point and the start of our code as shown in GDB (0x4004a0), there is some overhead: startup code added at link time, which eventually calls main. The header also provides additional information about the OS and the executable code. See Figure 8-3.
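The entry point that readelf reports is a field (e_entry) in the ELF file header, and you can read it yourself. The sketch below assumes Linux with glibc's <elf.h> and a 64-bit ELF file with the same byte order as the host. read_entry_point is a hypothetical helper, not part of any library. On toolchains that build position-independent executables by default, the value is a small offset rather than an absolute address such as 0x4003b0.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Read the ELF file header of 'path' and store its entry point (e_entry,
   shown by readelf as "Entry point address") in *entry.
   Returns 0 on success, -1 if the file cannot be read or is not 64-bit ELF. */
int read_entry_point(const char *path, uint64_t *entry) {
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    Elf64_Ehdr hdr;
    size_t n = fread(&hdr, sizeof hdr, 1, f);
    fclose(f);
    if (n != 1)
        return -1;
    /* Magic bytes 0x7f 'E' 'L' 'F', then the 64-bit class marker. */
    if (memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0 ||
        hdr.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    *entry = hdr.e_entry;   /* assumes file byte order == host byte order */
    return 0;
}
```

Calling read_entry_point("./memory", &entry) and printing entry with %#llx (after a cast to unsigned long long) should give the same value as the Entry point address line of readelf --file-header.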
readelf is convenient for exploring a binary executable. Figure 8-4 shows some more examples.
With grep we specify that we are looking for all lines with the word main in them. Here you see that the main function starts at 0x4004a0, as we saw in GDB. In the next example, we search the symbol table for every symbol whose name contains start. We see the start addresses of section .data, section .bss, and the start of the program itself. See Figure 8-5.
Let’s see what we have in memory with the following command:
readelf --symbols ./memory | tail -n +10 | sort -k 2 -r
The tail command skips the first nine lines of output, which are not interesting to us right now. We sort on the second column (the memory addresses) in reverse order. As you see, some basic knowledge of Linux commands comes in handy!
The start of the program is at a low address, and the start of main is at 0x004004a0. Look for the start of section .data (0x00601018), followed by the addresses of all its variables, and the start of section .bss (0x00601051), followed by the addresses reserved for its variables.
Let’s summarize our findings: at the beginning of this chapter we found that the stack is in high memory (see rsp). With readelf, we found that the executable code is at the low end of memory. Above the executable code we have section .data, and above that section .bss. The stack in high memory can grow; it grows downward, toward section .bss. The available free memory between the stack and the other sections is called the heap.
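You can observe the same layout from a C program on a typical Linux x86-64 system. The sketch takes one address from each region: a function (code), an initialized global (.data), an uninitialized global (.bss), a malloc block (heap), and a local variable (stack). This ordering is what a typical Linux toolchain produces, not something the C standard guarantees. With address space layout randomization, the exact values also change from run to run. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

int initialized_global = 123;   /* nonzero initializer: normally in .data */
int uninitialized_global;       /* no initializer: normally in .bss */

/* One sample address per memory region, converted to integers so that
   addresses of unrelated objects can be compared. */
typedef struct {
    uintptr_t code, data, bss, heap, stack;
} region_addrs;

region_addrs sample_regions(void) {
    int local = 0;                        /* automatic variable: on the stack */
    int *block = malloc(sizeof *block);   /* small allocation: from the heap */
    region_addrs r = {
        .code  = (uintptr_t)&sample_regions,
        .data  = (uintptr_t)&initialized_global,
        .bss   = (uintptr_t)&uninitialized_global,
        .heap  = (uintptr_t)block,
        .stack = (uintptr_t)&local,
    };
    free(block);                          /* only the address was needed */
    return r;
}
```

On such a system, you should see code < data < bss, with the stack address far above the heap address.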
The memory for section .bss is allocated at runtime, when the program is loaded, rather than stored in the executable file; you can easily check that. Take note of the size of the executable, and then change, for example, the following:
qvar resq 3
to the following:
qvar resq 30000
Rebuild the program and look again at the size of the executable. Although qvar now reserves 240,000 bytes (30,000 quadwords of 8 bytes) instead of 24, the size will be the same, so no additional space is reserved in the file at assembly/link time. See Figure 8-6.
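The file size does not change because .bss is recorded in the ELF section headers with type SHT_NOBITS. The header stores the section's size, but no bytes are stored in the file. The following C sketch sums the sizes of all such sections. It assumes Linux with glibc's <elf.h> and a 64-bit ELF file. The 8 MiB big_bss_buffer is a hypothetical stand-in for qvar resq 30000, and nobits_total and file_size are hypothetical helpers.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 8 MiB of zero-initialized storage; like qvar resq 30000, it lands in .bss. */
unsigned char big_bss_buffer[8u << 20];

/* Sum the sizes of all SHT_NOBITS sections (such as .bss) of a 64-bit ELF
   file. These sections occupy memory at run time but no bytes in the file.
   Returns -1 on error. */
long long nobits_total(const char *path) {
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    long long total = -1;
    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) == 1 &&
        memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
        eh.e_ident[EI_CLASS] == ELFCLASS64) {
        total = 0;
        for (unsigned i = 0; i < eh.e_shnum; i++) {
            Elf64_Shdr sh;
            long off = (long)(eh.e_shoff + (uint64_t)i * eh.e_shentsize);
            if (fseek(f, off, SEEK_SET) != 0 || fread(&sh, sizeof sh, 1, f) != 1) {
                total = -1;                 /* truncated or unreadable table */
                break;
            }
            if (sh.sh_type == SHT_NOBITS)   /* size counted, no file bytes */
                total += (long long)sh.sh_size;
        }
    }
    fclose(f);
    return total;
}

/* Size of the file on disk, in bytes, or -1 on error. */
long file_size(const char *path) {
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0)
        size = ftell(f);
    fclose(f);
    return size;
}
```

For a program containing big_bss_buffer, nobits_total("/proc/self/exe") reports at least 8 MiB, while the file_size of the same executable typically stays far smaller.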
To summarize, Figure 8-7 shows how the memory looks when an executable is loaded.
Why is it important to know about memory structure? When we exploit the stack later in this book, you will need to know that the stack grows downward. Also, if you are into forensics or malware investigation, being able to analyze memory is an essential skill. We only touched on some basics here; if you want to know more, refer to the previously mentioned sources.