© Jo Van Hoey 2019
J. Van Hoey, Beginning x64 Assembly Programming, https://doi.org/10.1007/978-1-4842-5076-1_3

3. Program Analysis with a Debugger: GDB

Jo Van Hoey (Hamme, Belgium)

In this chapter, we will introduce you to debugging an assembly program. Debugging is an important skill, because with a debugger you can investigate the content of registers and memory in hexadecimal, binary, or decimal representation. You already know from the previous chapter that the CPU makes intensive use of registers and memory, and a debugger allows you to execute the instructions step-by-step while looking at how the content of the registers, memory, and flags changes. Maybe you have already experienced your first assembly program crashing upon execution with an unfriendly message such as “Segmentation fault.” With a debugger you can step through your program and find out exactly where and why things went wrong.

Start Debugging

Once you have assembled and linked your hello, world program without errors, you obtain an executable file. With a debugger you can load an executable program into the computer’s memory and execute it line by line while examining various registers and memory locations. There are several free and commercial debuggers available. In Linux, the mother of all debuggers is GDB; it is a command-line program, with very cryptic commands. So much fun! In future chapters, we will use SASM, a tool with a graphical user interface that is based on GDB. But having a basic knowledge of GDB itself can be useful, because not all GDB functionality is available in SASM.

In your further career as an assembly programmer, you will certainly look at various debuggers with nice user interfaces, each one targeted at a specific platform, such as Windows, Mac, or Linux. These GUI debuggers will help you debug long and complex programs with much more ease than a CLI debugger. But GDB is a comprehensive and “quick and dirty” way to do Linux debugging. GDB is installed on most Linux development systems, and if not, it can easily be installed for troubleshooting without much overhead for the system. We will use GDB for now to give you some essentials and turn to other tools in later chapters. One note: GDB seems to have been developed for debugging higher-level languages, so some features will not be of any help when debugging assembly.

Debugging a program with a CLI debugger can be overwhelming the first time. Do not despair when reading this chapter; you will see that things get easier as we progress.

To start debugging the hello program, navigate in the CLI to the directory where you saved the hello program. At the command prompt, type the following:
      gdb hello
GDB will load the executable hello into memory and answer with its own prompt (gdb), waiting for your instructions. If you type the following:
      list
GDB will show a number of lines of your code. Type list again, and GDB will show the next lines, and so on. To list a specific line, for example, the start of your code, type list 1. Figure 3-1 shows an example.
Figure 3-1. GDB list output

If the output on your screen is different from our screen, containing lots of % signs, then your GDB is configured to use the AT&T syntax flavor. We will use the Intel syntax flavor, which is more intuitive (to us). We will show how to change the flavor in a minute.

If you type the following:
      run

GDB will run your hello program, printing hello, world, and return to its prompt (gdb).

Figure 3-2 shows the results on our screen.
Figure 3-2. GDB run output

To quit GDB, type quit.

Let’s do some interesting stuff with GDB!

But first we will change the disassembly flavor; do this only if you had the % signs in the previous exercise. Load the executable hello into GDB if it is not already there.

Type the following:
      set disassembly-flavor intel

This will put the disassembled code in a format that is already familiar. You can make Intel the default flavor for GDB. In Ubuntu 18.04, for example, create a .gdbinit file in your home directory containing the previous set instruction. GDB reads this file when it starts, so from your next GDB session on you should be using the Intel flavor. For other systems, see the documentation of your Linux distribution.

Start GDB with hello to begin your analysis. As you learned before, the hello, world program first initializes some data in section .data and section .bss and then proceeds to the main label. That is where the action starts, so let’s begin our examination there.

At the (gdb) prompt, type the following:
      disassemble main

GDB returns your source code, more or less. The returned source code is not exactly the same as the source you wrote originally. Strange, isn’t it? What happened here? Some analysis is needed.

Figure 3-3 shows what GDB returned on our computer.
Figure 3-3. GDB disassemble output

The long numbers on the left, starting with 0x00..., are memory addresses; they are the places where the machine instructions of our program are stored. As you can see, from the addresses and the <+5> in the second line, the first instruction, mov eax,0x1, needs five bytes of memory. But wait a minute, in our source code we wrote mov rax,1. What is the deal with the eax?

Well, if you look in the register table from Chapter 2, you will see that eax is the low 32-bit part of the rax register. Writing a value to a 32-bit register such as eax automatically clears the upper 32 bits of the corresponding 64-bit register, so mov eax,1 leaves rax with exactly the same content as mov rax,1 while needing fewer bytes of machine code. The assembler is smart enough to figure this out and picks the shorter instruction. The same is true for the use of edi and edx instead of rdi and rdx. The 64-bit instruction set is an extension of the 32-bit instruction set, and you will see that whenever possible the assembler will use 32-bit instructions.
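The substitution is harmless because a write to a 32-bit register such as eax clears the upper 32 bits of the full 64-bit register, whereas a write to a 16-bit register such as ax leaves the upper bits untouched. The following minimal sketch lets you observe this in GDB. The file name zeroext.asm is our own choice for illustration and is not one of the book's listings; we assume it is assembled and linked the same way as hello.asm.

```nasm
; zeroext.asm (hypothetical example, not one of the book's listings)
section .data
section .bss
section .text
      global main
main:
      mov     rax, -1       ; rax = 0xffffffffffffffff (all bits set)
      mov     eax, 1        ; 32-bit write clears bits 63..32: rax = 0x1
      mov     rax, -1       ; set all bits again
      mov     ax, 1         ; 16-bit write keeps the upper bits:
                            ; rax = 0xffffffffffff0001
      mov     rax, 60       ; 60 = exit
      mov     rdi, 0        ; 0 = success exit code
      syscall               ; quit
```

Once you know how to set a breakpoint and step (see later in this chapter), stop at main, step one instruction at a time, and type p/x $rax after each step to compare the register content with the comments.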

The 0x1 is the hexadecimal representation of the decimal number 1, 0xd is decimal 13, and 0x3c is decimal 60.

The nop instruction means “no operation”; it is inserted there as padding, so that whatever follows in memory starts at a suitably aligned address.

What happened to our msg? The instruction mov rsi, msg got replaced by movabs rsi,0x601030. Do not worry about movabs for now; it is there because of 64-bit addressing, and it is used to put a 64-bit immediate value in a register. The 0x601030 is the memory address where msg is stored on our computer. This can be a different address in your case.

At the (gdb) prompt, type the following:
      x/s 0x601030 (or x/s 'your_memory_address')
GDB answers with the output shown in Figure 3-4.
Figure 3-4. GDB output

The x stands for “examine,” and the s stands for “string.” GDB answered that 0x601030 is the start of the string msg and tries to show the whole string up until a string-terminating 0. Now you know one of the reasons why we put a terminating 0 after hello, world.
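To see why the terminating 0 matters to x/s, consider the following sketch, in which the first string deliberately has no terminator. The file name noterm.asm and the labels msg1 and msg2 are hypothetical; the program itself does nothing but exit, because it only exists to be examined in GDB.

```nasm
; noterm.asm (hypothetical example, not one of the book's listings)
section .data
      msg1   db    "hello, "      ; deliberately no terminating 0
      msg2   db    "world",0      ; stored directly after msg1
section .bss
section .text
      global main
main:
      mov     rax, 60       ; 60 = exit
      mov     rdi, 0        ; 0 = success exit code
      syscall               ; quit
```

Because msg2 follows msg1 directly in memory, x/s &msg1 (the & asks GDB for the address of the label) keeps reading past the end of msg1 until it meets the 0 after world. It should therefore display hello, world, while x/s &msg2 should display only world.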

You can also type the following:
      x/c 0x601030
to get the output shown in Figure 3-5.
Figure 3-5. GDB output

With c you ask for a character. Here GDB returns the first character of msg, preceded by the decimal ASCII code of that character. Do a Google search for a table of ASCII codes to verify, and keep that table handy for future use; there’s no need to memorize it. Or open an additional terminal window and type man ascii at the CLI.

Let’s look at some other examples.

Use this to show 13 characters starting at a memory address (see Figure 3-6):
      x/13c 0x601030
Figure 3-6. GDB output

Use the following to show 13 characters starting at a memory address in decimal representation (see Figure 3-7):
      x/13d 0x601030
Figure 3-7. GDB output

Use the following to show 13 characters starting at a memory address in hexadecimal representation (see Figure 3-8):
      x/13x 0x601030
Figure 3-8. GDB output

Use the following to show msg (see Figure 3-9):
      x/s &msg
Figure 3-9. GDB output

Let’s return to the disassemble listing. Type the following:
      x/2x 0x004004e0
This shows, in hexadecimal, the content of two consecutive units of memory starting at 0x004004e0 (see Figure 3-10). Use the address from your own disassemble listing if it differs from ours.
Figure 3-10. GDB output

This is our first instruction, mov eax,0x1, in machine language. We saw that same instruction when we examined the hello.lst file.

Step It Up!

Let’s step through the program with the debugger. Load your program again in GDB if it is not there yet.

First, we will set a breakpoint in the program, pausing the execution and allowing us to examine a number of things. Type the following:
      break main
In our case, GDB answers with the output in Figure 3-11.
Figure 3-11. GDB output

Then type the following:
      run
Figure 3-12 shows the output.
Figure 3-12. GDB output

The debugger stops at the break and shows the next instruction that will be executed. That is, mov rax,1 is not executed yet.

Type the following:
      info registers
GDB returns the output shown in Figure 3-13.
Figure 3-13. GDB registers output

The content of the registers is not important now, except for rip, the instruction pointer. Register rip has the value 0x4004e0, which is the memory address of the next instruction to execute. Check your disassemble listing; 0x4004e0 (in our case) points to the first instruction, mov rax,1. GDB stops just before that instruction and waits for your commands. It is important to remember that the instruction pointed to by rip is not yet executed.

In your case, GDB may show something different than 0x4004e0. That’s okay; it is the address of that particular line in memory, which may be different depending on your computer configuration.

Type the following to advance one step:
      step
Then type the following, which is the abbreviation for info registers:
      i r
Figure 3-14 shows the output.
Figure 3-14. GDB registers output

Indeed, rax now contains 0x1, and rip contains the address of the next instruction to execute.

Step further through the program and notice how rsi receives the address of msg, how the program prints hello, world on the screen, and how it exits. Notice also how rip points every time to the next instruction to execute.

Some Additional GDB Commands

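break or b: Set a breakpoint, as we have done before. To disable, enable, or delete a breakpoint, use the following commands, where number is the breakpoint number that GDB reported when you set the breakpoint: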
      disable breakpoint number
      enable breakpoint number
      delete breakpoint number

continue or c: Continue execution until next breakpoint.

step or s: Step into the current line, possibly jumping into a called function.

next or n: Step over the current line and stop at the next line.

help or h: Show help.

tui enable: Enable a simple text user interface; to disable, use tui disable.

print or p: Print the value of a variable, register, and so on.

Here are some examples:
  • Print rax: p $rax.

  • Print rax in binary: p/t $rax.

  • Print rax in hexadecimal: p/x $rax.

One important remark about GDB: to properly use it, you must insert a function prologue and a function epilogue in your code. We will show in the next chapter how to do that, and in a later chapter we will discuss function prologues and function epilogues when we talk about stack frames. For short programs such as our hello, world program, there is no problem. But with longer programs, GDB will show unexpected behavior if there is no prologue or epilogue.
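Until then, here is a rough sketch of what a function prologue and epilogue commonly look like on x64, applied to the hello, world program. It is only an illustration under stated assumptions: the file name hello_frame.asm is hypothetical, and we assume the program is linked so that main is called like a C function (which the global main label suggests), so that main can end with ret instead of the exit system call.

```nasm
; hello_frame.asm (hypothetical example, not one of the book's listings)
section .data
      msg    db    "hello, world",0
section .bss
section .text
      global main
main:
      push    rbp           ; prologue: save the caller's base pointer
      mov     rbp, rsp      ; prologue: start our own stack frame
      mov     rax, 1        ; 1 = write
      mov     rdi, 1        ; 1 = to stdout
      mov     rsi, msg      ; string to display
      mov     rdx, 12       ; length of string, without 0
      syscall               ; display the string
      mov     rax, 0        ; return value of main, used as exit code
      mov     rsp, rbp      ; epilogue: release the stack frame
      pop     rbp           ; epilogue: restore the caller's base pointer
      ret                   ; return to the code that called main
```

The prologue and epilogue bracket the body of main; everything in between is the code you already know from hello.asm.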

Play around with GDB, refer to the online manual (type man gdb at the CLI), and get familiar with GDB, because even when you use a GUI debugger, some functionality may not be available. Or you may not want to install a GUI debugger on your system at all.

A Slightly Improved Version of hello, world

You noticed that after printing hello, world, the command prompt appeared on the same line. We want to have hello, world printed on its own line, with the command prompt on a new line.

Listing 3-1 shows the code to do that.
;hello2.asm
section .data
      msg    db    "hello, world",0
      NL     db    0xa  ; ascii code for new line
section .bss
section .text
     global main
main:
    mov     rax, 1        ; 1 = write
    mov     rdi, 1        ; 1 = to stdout
    mov     rsi, msg      ; string to display
    mov     rdx, 12       ; length of string, without 0
    syscall               ; display the string
    mov     rax, 1        ; 1 = write
    mov     rdi, 1        ; 1 = to stdout
    mov     rsi, NL       ; display new line
    mov     rdx, 1        ; length of the string
    syscall               ; display the string
    mov     rax, 60       ; 60 = exit
    mov     rdi, 0        ; 0 = success exit code
    syscall               ; quit
Listing 3-1. A Better Version of hello, world

Type this code in your editor and save it as hello2.asm in a new directory. Copy the previous makefile to this new directory; in this makefile, change every instance of hello into hello2 and save the file.

We added a variable, NL, containing hexadecimal 0xa, which is the ASCII code for new line, and we print this NL variable just after we print msg. That’s it! Go ahead: assemble and run it (see Figure 3-15).
Figure 3-15. A better version of hello, world

Another way to accomplish this is by changing our msg, as shown here:
      msg   db      "hello, world",10,0

The 10 is the decimal representation of a new line (0xa in hexadecimal). Try it! Do not forget to increase rdx to 13 to account for the additional newline character.

Listing 3-2 shows the code. Save this as hello3.asm in a separate directory, copy and modify the makefile appropriately, and build and run.
;hello3.asm
section .data
      msg      db      "hello, world",10,0
section .bss
section .text
      global main
main:
      mov     rax, 1            ; 1 = write
      mov     rdi, 1            ; 1 = to stdout
      mov     rsi, msg          ; string to display
      mov     rdx, 13           ; length of string, without 0
      syscall                   ; display the string
      mov     rax, 60           ; 60 = exit
      mov     rdi, 0            ; 0 = success exit code
      syscall                   ; quit
Listing 3-2. Another Version of hello, world

Using this version, however, means that the new line is part of our string, and that is not always desired, because a new line is a formatting instruction that you may only intend to use when displaying a string, not when executing other string-handling functions. On the other hand, it makes your code simpler and shorter. It’s your decision!
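Counting the length by hand, as we did with the 13 in rdx, is easy to get wrong when the string changes. As a sketch of an alternative, NASM can compute the length at assembly time with equ and the $ symbol, which stands for the current position in the section. The file name hello4.asm and the name msgLen are hypothetical and do not appear in the listings above.

```nasm
; hello4.asm (hypothetical example, not one of the book's listings)
section .data
      msg      db      "hello, world",10,0
      msgLen   equ     $-msg-1           ; bytes from msg up to here (14),
                                         ; minus 1 for the terminating 0
section .bss
section .text
      global main
main:
      mov     rax, 1            ; 1 = write
      mov     rdi, 1            ; 1 = to stdout
      mov     rsi, msg          ; string to display
      mov     rdx, msgLen       ; length computed by the assembler (13)
      syscall                   ; display the string
      mov     rax, 60           ; 60 = exit
      mov     rdi, 0            ; 0 = success exit code
      syscall                   ; quit
```

The equ line must directly follow the definition of msg, because $-msg measures the distance from the msg label to the current position.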

Summary

In this chapter, you learned the following:
  • How to use GDB, a CLI debugger

  • How to print a new line
