How it works...

Most of this recipe should be fairly familiar in terms of reading the output of the readelf tool, the ELF man page, and elf.h. In this recipe, we're investigating one program that's been compiled in two different ways and examining the ELF information of the resulting binaries. Don't worry if you don't recognize all of the segments or sections of these binaries just yet; the point of this exercise is to reveal the differences that can occur in binaries based on the options that are passed during compilation. We also want to see the differences between a binary originally written in assembly and a binary originally written in a higher-level language such as C.

The ELF man page and the output of the elf.h header file should be familiar, so we won't repeat what we covered in the previous chapter. In step 5, we run some Terminal commands to change our working directory into ~/bac/Binary_Analysis_Cookbook/Chapter_03/64bit/. Then, we compile the ch03-helloworld64C.c program using the GNU Compiler Collection (GCC), make the binary executable using chmod +x, and create a stripped version of the ch03-helloworld64C binary by removing all of the symbols from the file, saving the result as ch03-helloworld64C-stripped. Following this, we run the readelf command with the -a -W arguments, indicating that we want to print out all of the relevant information using as much width for the displayed output as necessary.

The -a argument is equivalent to passing the -h -l -S -s -r -d -V -A -I arguments on the command line, which display the ELF header, program headers, section headers, symbols, relocations, the dynamic section, version information, architecture-specific information, and a histogram of bucket list lengths. According to the readelf man page, -a also covers notes (-n), unwind information (-u), and section groups (-g). The output in step 6 is lengthy and reveals a great deal of information about the non-stripped binary.

The ELF header should be familiar at this point. Reviewing the magic bytes 7f 45 4c 46 02 01 01 reveals that we're viewing an ELF-formatted, 64-bit binary that uses two's complement, little-endian data encoding and the current ELF version. We can also see that we're dealing with an executable file as opposed to a relocatable object file, and that the address in memory where execution begins is 0x400a0. There are nine program headers, each 56 bytes in size, and thirty-one section headers, each 64 bytes in size.
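
To make the header walkthrough concrete, here is a minimal Python sketch that decodes the same fields by hand. It assumes a little-endian Elf64_Ehdr layout as described in elf.h. The header bytes are synthetic: they are assembled from the values quoted above, while the machine type (62, EM_X86_64) and the section header string table index (30) are assumptions made for illustration. The function names are hypothetical.

```python
import struct

# e_ident and e_type values as defined in elf.h
EI_CLASS = {1: "32-bit", 2: "64-bit"}
EI_DATA = {1: "two's complement, little-endian",
           2: "two's complement, big-endian"}
E_TYPE = {1: "REL", 2: "EXEC", 3: "DYN"}


def describe_ident(ident: bytes) -> dict:
    """Decode the magic, class, data encoding, and version bytes of e_ident."""
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    return {
        "class": EI_CLASS.get(ident[4], "unknown"),
        "data": EI_DATA.get(ident[5], "unknown"),
        "version": "current" if ident[6] == 1 else "invalid",
    }


def describe_header64(header: bytes) -> dict:
    """Decode the fields that follow e_ident in a little-endian Elf64_Ehdr."""
    (e_type, _machine, _version, e_entry, _phoff, _shoff, _flags, _ehsize,
     e_phentsize, e_phnum, e_shentsize, e_shnum, _shstrndx) = struct.unpack_from(
        "<HHIQQQIHHHHHH", header, 16)
    return {
        "type": E_TYPE.get(e_type, "other"),
        "entry": hex(e_entry),
        "program_headers": (e_phnum, e_phentsize),
        "section_headers": (e_shnum, e_shentsize),
    }


# Synthetic 64-byte header: e_ident padded to 16 bytes, then the fixed fields.
ident = bytes.fromhex("7f 45 4c 46 02 01 01").ljust(16, b"\x00")
header = ident + struct.pack("<HHIQQQIHHHHHH",
                             2, 62, 1, 0x400a0, 64, 0, 0, 64, 56, 9, 64, 31, 30)

print(describe_ident(header))
print(describe_header64(header))
```

To try this against a real binary, read the first 64 bytes of the file in binary mode and pass them to both functions; the results should agree with what readelf -h reports.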

The section headers table is also worth reviewing, as it lists quite a few sections that we may not be familiar with (keeping in mind that, so far, we've only reviewed the ELF information of a program originally written in a lower-level language). Let's break down some of these sections and explain their purpose:

.text: We already know this read-only section contains the executable code of the binary and will serve as an important section in our binary analysis efforts. This is also one of the sections contained in the text segment in executable ELF binaries.
.bss: This section holds uninitialized data. It occupies no space in the file itself; the memory is allocated and filled with zeros when the program begins to run. This section can be found within the data segment of an ELF executable binary.
.data: This section contains initialized data that's used when setting up memory for the program. This section is also found within the data segment within an ELF executable binary.
.rodata: This section contains read-only data and is typically placed in a non-writable segment of the process image in executable ELF binaries.
.shstrtab: This section contains the section header string table, which holds the names of all of the sections within the binary.
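.symtab: This section contains the symbol table, an array of symbol entries that the linker and debugging tools use for locating and relocating data and information within the binary. The dynamic linker relies on the .dynsym section instead, which is why a binary can still run after .symtab has been stripped.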
.strtab: This is used in conjunction with the symbol table section. This string table section contains null-terminated strings of the symbolic names that are found in the symbol table.
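.init: This section holds executable instructions that contribute to the process initialization code, which runs before the main program entry point is called.
.fini: This section holds executable instructions that contribute to the process termination code, which runs when the program exits normally.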
.plt: This executable section contains the Procedure Linkage Table, a set of small code stubs that redirect calls to shared library functions to their absolute locations in memory. It works in conjunction with .got (data) and .got.plt (functions) during the dynamic linking process.
.got: This writable section contains the Global Offset Table, which holds the absolute addresses of data references. The dynamic linker fills it in so that position-independent references can be translated into absolute memory addresses. This helps to resolve shared library data during process creation and at runtime, and is used in conjunction with the Procedure Linkage Table.
.got.plt: This section works in conjunction with the Procedure Linkage Table and contains the addresses for functions that are used by the Procedure Linkage Table. This is also used during the dynamic linking process.
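
The relationship between a section header and its name in .shstrtab (and likewise between a symbol and its name in .strtab) is just an offset into a block of null-terminated strings. The following is a small Python sketch of that lookup. The string table contents and the name_at helper are made up for illustration; a real .shstrtab is read from the file at the offset given by its section header.

```python
def name_at(strtab: bytes, offset: int) -> str:
    """Return the null-terminated string that starts at `offset`."""
    end = strtab.index(b"\x00", offset)
    return strtab[offset:end].decode("ascii")


# Hypothetical string table: index 0 is always the empty string.
shstrtab = b"\x00.symtab\x00.strtab\x00.shstrtab\x00.text\x00"

# Each sh_name field in a section header is an offset like these.
for sh_name in (1, 9, 17, 27):
    print(sh_name, name_at(shstrtab, sh_name))
```

Because names are found purely by offset, two entries can share storage: an offset that points into the middle of one string yields its suffix, which is a space-saving trick some linkers use.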

You'll encounter more sections in binaries than what we've analyzed here, but for now, suffice it to say this is a good start at understanding the ELF binary format, especially when a binary is originally written in a high-level language such as C. A C++ program is even more interesting to look at when it comes to some of these differences compared to a C program. Since we have a good grasp on the sections we may encounter, let's look at segments and see which sections we could encounter in these segments for executable ELF object files:

Text segment: This is the read-only segment where executable code and read-only data exist. Some common sections within this segment are .text, .rodata, .hash, .dynsym, .dynstr, .plt, and .rel.got.
Data segment: This is the writable segment that contains the .data, .dynamic, .got, and .bss sections.
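
The two segments above can be told apart by their permission flags, and the sections they contain can be worked out from address ranges, which is roughly what readelf prints as its section-to-segment mapping. The Python sketch below illustrates the idea. The PF_* values come from elf.h, but every address and size is invented for illustration, and the helper names are hypothetical.

```python
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4  # p_flags bits from elf.h


def flags_to_str(p_flags: int) -> str:
    """Render p_flags in readelf's three-column style, e.g. 'R E' or 'RW '."""
    return "".join(ch if p_flags & bit else " "
                   for bit, ch in ((PF_R, "R"), (PF_W, "W"), (PF_X, "E")))


def sections_in_segment(segment: dict, sections: list) -> list:
    """List the sections whose address range lies inside the segment."""
    start = segment["vaddr"]
    end = start + segment["memsz"]
    return [name for name, addr, size in sections
            if start <= addr and addr + size <= end]


# Hypothetical segments and sections (name, address, size).
text_seg = {"vaddr": 0x400000, "memsz": 0x1000, "flags": PF_R | PF_X}
data_seg = {"vaddr": 0x600e10, "memsz": 0x230, "flags": PF_R | PF_W}
sections = [
    (".text", 0x400430, 0x182),
    (".rodata", 0x4005c0, 0x11),
    (".got", 0x600ff8, 0x8),
    (".data", 0x601028, 0x10),
    (".bss", 0x601038, 0x8),
]

for seg in (text_seg, data_seg):
    print(flags_to_str(seg["flags"]), sections_in_segment(seg, sections))
```

The comparison uses the in-memory size rather than the file size so that .bss, which takes up no space in the file, is still counted as part of the data segment.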

So far, we've made a few references to dynamic linking and haven't really explained it, so let's do that now. When we're compiling/running a binary, we have to take several aspects of a program into consideration. Some of the shared libraries, functions, and procedures just can't be linked during compile time because their addresses aren't known.

Static linking happens at build time, when the linker binds the relocatable portions of the program where possible. Shared libraries, along with their functions and data, can't be bound this way because their final addresses aren't known until the program is loaded. To overcome this, dynamic linking enters the picture at runtime, performing the relocations as needed, when needed. Resolving a function's address only on its first call is known as lazy binding; it avoids the cost of resolving symbols that are never used and is typically the default behavior of the dynamic linker on Linux.
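
The idea behind lazy binding can be modelled in a few lines of Python. This is a toy model only, not how the real dynamic linker is implemented: the dictionary named got stands in for the .got.plt slots, the library dictionary stands in for a loaded shared object, and all class and variable names are hypothetical.

```python
class ToyLazyBinder:
    """Toy model of lazy binding: resolve a function on first call only."""

    def __init__(self, library: dict):
        self.library = library  # stands in for a loaded shared object
        self.got = {}           # stands in for the .got.plt slots
        self.lookups = 0        # how many times a symbol had to be resolved

    def call(self, name: str, *args):
        if name not in self.got:
            # First call: look the symbol up and cache its "address".
            self.lookups += 1
            self.got[name] = self.library[name]
        # Later calls go straight through the cached slot.
        return self.got[name](*args)


fake_libc = {"strlen": len}  # hypothetical library exporting one function
binder = ToyLazyBinder(fake_libc)

print(binder.call("strlen", "hello"), binder.lookups)
print(binder.call("strlen", "world!"), binder.lookups)
```

The lookup counter stays at one after the second call, which mirrors the benefit described above: the cost of resolution is paid once per function, and only for functions that are actually called.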

Continuing to review the output of step 6, we can see that our program headers table gives us insight into all of the program segments that will help to set up the process when this program is executed and then which segments contain which sections. The dynamic section shows us the necessary information for the dynamic linker we mentioned in the previous paragraph. Following that, the .rela.dyn and .rela.plt sections show us information about relocations for the dynamic linker.
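
Each entry in .rela.dyn and .rela.plt is a fixed-size Elf64_Rela record, so decoding one by hand is straightforward. The Python sketch below assumes the little-endian Elf64_Rela layout from elf.h and the x86-64 relocation type numbers for GLOB_DAT and JUMP_SLOT. The sample entry is fabricated for illustration (the offset and symbol index are not taken from the recipe's binary), and parse_rela is a hypothetical helper.

```python
import struct

# Two x86-64 relocation types commonly seen in .rela.dyn and .rela.plt
RELOC_TYPES = {6: "R_X86_64_GLOB_DAT", 7: "R_X86_64_JUMP_SLOT"}


def parse_rela(entry: bytes) -> dict:
    """Decode one 24-byte little-endian Elf64_Rela entry."""
    r_offset, r_info, r_addend = struct.unpack("<QQq", entry)
    r_type = r_info & 0xFFFFFFFF          # ELF64_R_TYPE
    return {
        "offset": hex(r_offset),          # where the resolved address is stored
        "symbol_index": r_info >> 32,     # ELF64_R_SYM: index into .dynsym
        "type": RELOC_TYPES.get(r_type, str(r_type)),
        "addend": r_addend,
    }


# Fabricated entry: a jump slot for symbol 1 of .dynsym at a made-up address.
sample = struct.pack("<QQq", 0x601018, (1 << 32) | 7, 0)
print(parse_rela(sample))
```

In readelf's output, the symbol index is already translated into a name such as printf by looking it up in .dynsym and .dynstr.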

One of the entries of interest is the location of the printf() function. Remember, function calls and their absolute memory addresses are handled by the dynamic linker when they're needed during program execution. The specific output I'm referring to can be seen in the following screenshot:

What we can gather from this part of the output is that the printf() function is used somewhere in the program. Larger programs with more code that use additional functions from shared libraries will have many more functions linked dynamically like this. Continuing to review the output, the .symtab section shows us all of the symbol references in the program, including variable and function names, and we can immediately identify the printf() function reference again.
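
The symbol entries that readelf prints from .symtab and .dynsym are fixed-size Elf64_Sym records. As a rough illustration, the Python sketch below decodes one such record, assuming the little-endian layout and the binding and type constants from elf.h. The sample entry is fabricated to resemble an undefined, global function symbol such as printf; its name offset is arbitrary, and parse_sym is a hypothetical helper.

```python
import struct

BINDINGS = {0: "LOCAL", 1: "GLOBAL", 2: "WEAK"}
TYPES = {0: "NOTYPE", 1: "OBJECT", 2: "FUNC", 3: "SECTION", 4: "FILE"}


def parse_sym(entry: bytes) -> dict:
    """Decode one 24-byte little-endian Elf64_Sym entry."""
    st_name, st_info, _st_other, st_shndx, st_value, st_size = struct.unpack(
        "<IBBHQQ", entry)
    return {
        "name_offset": st_name,                    # offset into the string table
        "bind": BINDINGS.get(st_info >> 4, "?"),   # ELF64_ST_BIND
        "type": TYPES.get(st_info & 0xF, "?"),     # ELF64_ST_TYPE
        "section": "UND" if st_shndx == 0 else st_shndx,
        "value": hex(st_value),
        "size": st_size,
    }


# Fabricated entry: global function, undefined in this file (SHN_UNDEF = 0).
sample = struct.pack("<IBBHQQ", 11, (1 << 4) | 2, 0, 0, 0, 0)
print(parse_sym(sample))
```

An undefined section index with a value of zero is the sign that the symbol lives in a shared library and will be resolved by the dynamic linker.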

Let's pause here and move on to steps 7 and 8. In step 7, we run readelf with the -s argument against our stripped binary. Then, in step 8, we run the same command against our non-stripped binary. Can you see the difference? The stripped binary is missing the .symtab section, while the non-stripped binary still has it. This lack of a symbol table is the result of stripping our binary using the strip -s command earlier in this recipe. As we can see, the only symbol section that remains in the stripped binary is .dynsym, which is needed for dynamic linking.
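
The same comparison can be scripted. The following Python sketch walks the section header table of an ELF image and reports which symbol table sections are present. It assumes a little-endian ELF64 file with an ordinary section header string table index, and the function name is hypothetical; it is meant as an illustration of the structure, not a replacement for readelf.

```python
import struct

SHT_SYMTAB, SHT_DYNSYM = 2, 11  # sh_type values from elf.h


def symbol_sections(image: bytes) -> list:
    """Return the names of the symbol table sections in an ELF64 image."""
    if image[:4] != b"\x7fELF" or image[4] != 2 or image[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    # Elf64_Ehdr: e_shoff at 0x28; e_shentsize, e_shnum, e_shstrndx at 0x3A.
    (e_shoff,) = struct.unpack_from("<Q", image, 0x28)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", image, 0x3A)
    # Elf64_Shdr: sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size, ...
    headers = [struct.unpack_from("<IIQQQQIIQQ", image, e_shoff + i * e_shentsize)
               for i in range(e_shnum)]
    str_off, str_size = headers[e_shstrndx][4], headers[e_shstrndx][5]
    strtab = image[str_off:str_off + str_size]
    found = []
    for sh_name, sh_type, *_rest in headers:
        if sh_type in (SHT_SYMTAB, SHT_DYNSYM):
            end = strtab.index(b"\x00", sh_name)
            found.append(strtab[sh_name:end].decode("ascii"))
    return found
```

Reading ch03-helloworld64C and ch03-helloworld64C-stripped in binary mode and passing their contents to this function should, if the assumptions hold, list both .dynsym and .symtab for the first file and only .dynsym for the second.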
