In this chapter, we will introduce you to debugging an assembly program. Debugging is an important skill, because with a debugger you can investigate the content of registers and memory in hexadecimal, binary, or decimal representation. You already know from the previous chapter that the CPU makes intensive use of registers and memory. A debugger allows you to execute the instructions step-by-step while watching how the contents of the registers, memory, and flags change. Maybe you have already seen your first assembly program crash upon execution with an unfriendly message such as “Memory Segmentation Fault.” With a debugger you can step through your program and find out exactly where and why things went wrong.
Start Debugging
Once you have assembled and linked your hello, world program without errors, you obtain an executable file. With a debugger you can load an executable program into the computer memory and execute it line by line while examining various registers and memory locations. There are several free and commercial debuggers available. In Linux, the mother of all debuggers is GDB; it is a command-line program with very cryptic commands. So much fun! In future chapters, we will use SASM, a tool with a graphical user interface, which is based on GDB. But a basic knowledge of GDB itself is useful, because not all GDB functionality is available in SASM.
In your further career as an assembly programmer, you will certainly look at various debuggers with nice user interfaces, each one targeted at a specific platform, such as Windows, Mac, or Linux. These GUI debuggers will help you debug long and complex programs much more easily than a CLI debugger. But GDB is a comprehensive, “quick and dirty” way to do Linux debugging. GDB is installed on most Linux development systems, and if it is not, it can easily be installed for troubleshooting without much overhead for the system. We will use GDB for now to give you some essentials and turn to other tools in later chapters. One note: GDB seems to have been developed mainly for debugging higher-level languages, so some of its features will not be of any help when debugging assembly.
Debugging a program with a CLI debugger can be overwhelming the first time. Do not despair when reading this chapter; you will see that things get easier as we progress.
If the output on your screen differs from ours and contains lots of % signs, then your GDB is configured to use the AT&T syntax flavor. We will use the Intel syntax flavor, which is more intuitive (to us). We will show how to change the flavor in a minute.
When you type run at the (gdb) prompt, GDB runs your hello program, which prints hello, world, and then returns to its (gdb) prompt.
To quit GDB, type quit.
Let’s do some interesting stuff with GDB!
But first we will change the disassembly flavor; do this only if you saw the % signs in the previous exercise. Load the executable hello into GDB if it is not already there, and type set disassembly-flavor intel at the (gdb) prompt.
This will put the disassembled code in a format that is already familiar. You can make Intel the default flavor by putting the same set instruction in GDB's startup file: in Ubuntu 18.04, create a .gdbinit file in your home directory containing that set instruction. GDB reads this file every time it starts, so from then on every GDB session will use the Intel flavor.
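The .gdbinit setup can be done from the shell in one step. The following is a minimal sketch that assumes your GDB reads ~/.gdbinit from your home directory. It appends the line only if it is not already present, so running it twice does no harm.

```shell
# Append the Intel-flavor setting to GDB's per-user init file (~/.gdbinit),
# but only if that exact line is not there yet.
if ! grep -qxF 'set disassembly-flavor intel' ~/.gdbinit 2>/dev/null; then
    echo 'set disassembly-flavor intel' >> ~/.gdbinit
fi

# Optional check: GDB reads ~/.gdbinit at startup, so this should
# report that the disassembly flavor is "intel".
gdb -q -batch -ex 'show disassembly-flavor'
```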
Start GDB with hello to begin your analysis. As you learned before, the hello, world program first initializes some data in section .data and section .bss and then proceeds to the main label. That is where the action starts, so let's begin our examination there: type disassemble main.
GDB shows your source code, more or less: it translates (disassembles) the machine code of main back into assembly instructions. The result is not exactly the same as the source you wrote originally. Strange, isn't it? What happened here? Some analysis is needed.
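You can also run this examination without an interactive session by using GDB's -batch and -ex options. The following is a sketch that assumes the executable hello is in the current directory. It prints the disassembly of main in the Intel flavor and then exits.

```shell
# -q: no banner; -batch: run the -ex commands in order, then exit.
# The set command is only needed if ~/.gdbinit does not already select Intel syntax.
gdb -q -batch \
    -ex 'set disassembly-flavor intel' \
    -ex 'disassemble main' \
    ./hello
```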
The long numbers on the left, starting with 0x00..., are memory addresses; they are the places where the machine instructions of our program are stored. As you can see, from the addresses and the <+5> in the second line, the first instruction, mov eax,0x1, needs five bytes of memory. But wait a minute, in our source code we wrote mov rax,1. What is the deal with the eax?
Well, if you look in the register table from Chapter 2, you will see that eax is the low 32-bit part of the rax register. The assembler is smart enough to see that it does not need the full 64-bit register to store the number 1. In 64-bit mode, writing a value to a 32-bit register such as eax automatically sets the upper 32 bits of rax to zero. So mov eax,1 has exactly the same effect as mov rax,1, but its machine code is shorter. The same is true for the use of edi and edx instead of rdi and rdx. The 64-bit instruction set is an extension of the 32-bit one, and you will see that whenever possible the assembler will use the shorter 32-bit instructions.
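You can watch the assembler make this choice on a tiny test file. The following sketch assumes nasm and objdump (from GNU binutils) are installed. The file name demo.asm is an arbitrary, hypothetical name.

```shell
# Write a throwaway file with three 64-bit moves.
cat > demo.asm <<'EOF'
section .text
    mov rax, 1      ; written with the 64-bit register
    mov rdi, 1
    mov rdx, 13
EOF

# Assemble to a 64-bit ELF object, then disassemble it in Intel syntax.
nasm -f elf64 demo.asm -o demo.o
objdump -d -M intel demo.o
# With NASM's default optimization, each line should show up as a
# five-byte mov into eax, edi, or edx (for example b8 01 00 00 00 for mov eax,0x1).
```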
The 0x1 is the hexadecimal representation of the decimal number 1, 0xd is decimal 13, and 0x3c is decimal 60.
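The shell's printf command is a quick way to check such conversions. It accepts hexadecimal arguments written with the 0x prefix.

```shell
# Hexadecimal immediates from the listing, converted to decimal.
printf '%d %d %d\n' 0x1 0xd 0x3c    # prints: 1 13 60

# And the other way round: decimal to hexadecimal.
printf '0x%x\n' 60                  # prints: 0x3c
```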
The nop instruction means “no operation.” It is padding: it fills the gap after our code up to the next properly aligned memory address, where the next piece of code in the executable starts.
What happened to our msg? The instruction mov rsi, msg got replaced by movabs rsi,0x601030. Do not worry about movabs for now. It is the disassembler's name for a mov that loads a full 64-bit immediate value into a register, and here that value is an address. The 0x601030 is the memory address where msg is stored on our computer; it may be a different address on yours.
In the command x/s 0x601030 (use your own address of msg), the x stands for “examine,” and the s stands for “string.” GDB answers that 0x601030 is the start of the string msg and shows the whole string, up to the string-terminating 0. Now you know one of the reasons why we put a terminating 0 after hello, world.
With x/c 0x601030 you ask for a character. Here GDB returns the first character of msg, preceded by the decimal ASCII code of that character. Do a Google search for a table of ASCII codes to verify, and keep that table handy for future use; there's no need to memorize it. Or open an additional terminal window and type man ascii at the CLI.
Let’s look at some other examples.
This is our first instruction, mov eax,0x1, in machine language. We saw the same instruction when we examined the hello.lst file.
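The examine commands from this part of the chapter can be combined into one run. The following sketch assumes hello is in the current directory. It uses the symbols msg and main instead of hard-coded addresses, because the addresses on your computer may differ.

```shell
# x/s   : examine memory as a 0-terminated string
# x/c   : examine one byte as a character (GDB also prints its decimal ASCII code)
# x/5xb : examine 5 bytes, shown in hexadecimal
gdb -q -batch \
    -ex 'x/s &msg' \
    -ex 'x/c &msg' \
    -ex 'x/5xb main' \
    ./hello
# For mov eax,0x1 the five bytes should be 0xb8 0x01 0x00 0x00 0x00.
```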
Step It Up!
Let’s step through the program with the debugger. Load your program again in GDB if it is not there yet.
The debugger stops at the breakpoint and shows the next instruction that will be executed. That is, mov rax,1 has not been executed yet.
The content of the registers is not important now, except for rip, the instruction pointer. Register rip has the value 0x4004e0, which is the memory address of the next instruction to execute. Check your disassembly listing; 0x4004e0 (in our case) points to the first instruction, mov rax,1. GDB stops just before that instruction and waits for your commands. It is important to remember that the instruction pointed to by rip has not been executed yet.
In your case, GDB may show an address other than 0x4004e0. That's okay; it is the address of that particular instruction in memory, which may differ depending on your computer configuration.
Indeed, rax now contains 0x1, and rip contains the address of the next instruction to execute.
Step further through the program and notice how rsi receives the address of msg, how the program prints hello, world on the screen, and how it exits. Notice also that rip always points to the next instruction to execute.
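The stepping session can also be replayed from the shell. The following sketch assumes hello is in the current directory and that break main stops at the first instruction, as described above. It uses stepi (short form si), which executes exactly one machine instruction.

```shell
# break main     : set a breakpoint at the label main
# run            : start the program; it stops before mov rax,1 executes
# info registers : show only the named registers
# stepi          : execute exactly one machine instruction
gdb -q -batch \
    -ex 'break main' \
    -ex 'run' \
    -ex 'info registers rip' \
    -ex 'stepi' \
    -ex 'info registers rax rip' \
    ./hello
# After stepi, rax should hold 0x1 and rip should be 5 bytes further
# along (the size of mov eax,0x1).
```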
Some Additional GDB Commands
continue or c: Continue execution until the next breakpoint.
step or s: Execute the current line, stepping into any function it calls.
next or n: Execute the current line, stepping over any function it calls, and stop at the next line.
help or h: Show help.
tui enable: Enable a simple text user interface; to disable, use tui disable.
print or p: Print the value of a variable, register, and so on.
Print rax: p $rax.
Print rax in binary: p/t $rax.
Print rax in hexadecimal: p/x $rax.
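The same print formats work on plain numbers, so GDB can double as a number-base calculator even without a program loaded. The following is a small sketch.

```shell
# print without a format: decimal; /x: hexadecimal; /t: binary ("t" for "two")
gdb -q -batch \
    -ex 'print 0x3c' \
    -ex 'print/x 60' \
    -ex 'print/t 60'
# Expected values: 60, then 0x3c, then 111100 (shown as $1 = ..., $2 = ..., $3 = ...)
```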
One important remark about GDB: to properly use it, you must insert a function prologue and a function epilogue in your code. We will show in the next chapter how to do that, and in a later chapter we will discuss function prologues and function epilogues when we talk about stack frames. For short programs such as our hello, world program, there is no problem. But with longer programs, GDB will show unexpected behavior if there is no prologue or epilogue.
Play around with GDB, refer to its manual page (type man gdb at the CLI), and get familiar with it, because a GUI debugger may not offer all of GDB's functionality. Or you may not want to install a GUI debugger on your system at all.
A Slightly Improved Version of hello, world
You may have noticed that after printing hello, world, the command prompt appeared on the same line. We want to have hello, world printed on its own line, with the command prompt on a new line.
A Better Version of hello, world
Type this code in your editor and save it as hello2.asm in a new directory. Copy the previous makefile to this new directory; in this makefile, change every instance of hello into hello2 and save the file.
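The following sketch shows one possible way to write and build hello2.asm. This version prints the string and then makes a second write call for a single new-line byte, stored under a label we call NL (our own name). The script assumes nasm and gcc are installed and that you run it inside the new directory. The build flags (-f elf64 -g -F dwarf for nasm, -no-pie for gcc) are our assumption of a typical makefile for this program; if your makefile uses different ones, use those instead.

```shell
cat > hello2.asm <<'EOF'
; hello2.asm (sketch): print the string, then a separate new line
section .data
    msg db "hello, world",0
    NL  db 0xa            ; ASCII code for new line (decimal 10)
section .bss
section .text
    global main
main:
    mov rax, 1            ; 1 = write
    mov rdi, 1            ; 1 = to stdout
    mov rsi, msg          ; string to display
    mov rdx, 12           ; length of the string, without the 0
    syscall               ; display the string
    mov rax, 1            ; 1 = write
    mov rdi, 1            ; 1 = to stdout
    mov rsi, NL           ; address of the new-line byte
    mov rdx, 1            ; length: one byte
    syscall               ; display the new line
    mov rax, 60           ; 60 = exit
    mov rdi, 0            ; 0 = success exit code
    syscall               ; quit
EOF

# Assemble (with debug info and a listing file), link, and run.
nasm -f elf64 -g -F dwarf hello2.asm -l hello2.lst &&
gcc -o hello2 hello2.o -no-pie &&
./hello2
```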
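You can also make the new line part of the string itself by adding a 10 after "hello, world" in the definition of msg. The 10 is the decimal representation of a new line (0xa in hexadecimal). Try it! Do not forget to increase rdx to 13 for the additional new-line character.
Another Version of hello, world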
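The following sketch shows a complete program in which the new line is part of msg and rdx is 13. It makes the same assumptions as a typical nasm/gcc build for this chapter (nasm and gcc installed, -no-pie linking). The file name hello2b.asm is hypothetical and chosen so that it does not overwrite hello2.asm.

```shell
cat > hello2b.asm <<'EOF'
; hello2b.asm (sketch): the new line is part of the string
section .data
    msg db "hello, world",10,0   ; 10 = new line, 0 = string terminator
section .bss
section .text
    global main
main:
    mov rax, 1            ; 1 = write
    mov rdi, 1            ; 1 = to stdout
    mov rsi, msg          ; string to display
    mov rdx, 13           ; 12 characters plus the new line, without the 0
    syscall               ; display the string
    mov rax, 60           ; 60 = exit
    mov rdi, 0            ; 0 = success exit code
    syscall               ; quit
EOF

nasm -f elf64 -g -F dwarf hello2b.asm -l hello2b.lst &&
gcc -o hello2b hello2b.o -no-pie &&
./hello2b
```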
Using this version, however, means that the new line is part of our string, and that is not always desired, because a new line is a formatting instruction that you may only intend to use when displaying a string, not when executing other string-handling functions. On the other hand, it makes your code simpler and shorter. It’s your decision!
Summary
In this chapter, you learned the following:
How to use GDB, a CLI debugger
How to print a new line