How it works...

Before we dive into explaining this recipe, let's digest the ELF specification as it pertains to analyzing binaries. First off, the acronym ELF stands for Executable and Linkable Format, and it holds the championship trophy for its presence on Linux. ELF is everywhere on Linux and is the primary format for binaries. Every ELF file, regardless of whether it's an executable file, shared object file, or relocatable object file, begins with the ELF header. The ELF header is defined as a C structure in /usr/include/elf.h. This is shown in the following screenshot:

When examining the format of this header, we notice that it begins with a character array containing a magic number and other information (see the first declaration inside the C structure in the preceding screenshot). Just what is this magic number? I'm glad you asked, because we can see it in action if we look at the screenshot in step 10 of this recipe. We can see the 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 bytes next to the Magic: label, but this doesn't give us much information unless we know how to translate it. Let's do that now. The 7f byte is the first byte in all ELF files. If you're ever looking at a hexadecimal dump of a binary and see that it begins with this byte, it's a good indication you're dealing with an ELF binary. The next three bytes, 45 4c 46, all look like possibly printable characters. Let's refer to our man page for ASCII:

If we look along the right-hand side of the preceding truncated chart, we can see that 45 is the hexadecimal representation of the character E, 4c is the hexadecimal representation of the character L, and 46 is the hexadecimal representation of the character F. So, 45 4c 46 spells the word ELF. At this point, we should commit those 4 bytes (7f 45 4c 46) to memory so that whenever we are viewing a hexadecimal dump of a binary, we can easily recognize when we are dealing with an ELF binary. The next byte we can see is 01. This fifth byte indicates whether we're dealing with the ELF32 or the ELF64 format, and therefore a 32-bit or 64-bit architecture. This corresponds to the Class: label in the step 10 output image. A value of 01 represents ELF32, while a value of 02 represents ELF64. The next byte, 01, indicates whether data in the file is encoded as Little Endian, Big Endian, or neither.

A value of 00 represents ELFDATANONE (None), a value of 01 represents ELFDATA2LSB (2's complement, Little Endian), where LSB stands for Least Significant Byte, while a value of 02 represents ELFDATA2MSB (2's complement, Big Endian), where MSB stands for Most Significant Byte. Since our host stores data in memory with the least significant byte first, we see a value of 01, representing the Little Endian format for data. This is also shown next to the Data: label in the output for step 10. The next byte, 01, represents the current ELF version, as indicated by the first Version: label in the output of step 10. The next two bytes, 00 00, represent the OS/ABI and the ABI version, both set to their default values. The remaining bytes are null bytes used as padding.
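The byte-by-byte walk through the magic array can be sketched in a few lines of Python. This is only an illustration, not part of the recipe: the decode_ident helper and its dictionary keys are hypothetical names, and the input is the Magic: byte string quoted in the text above.

```python
# Hypothetical helper (not part of the recipe): decode the 16-byte
# identification array that opens every ELF file.
ELF_MAGIC = b"\x7fELF"
EI_CLASS = {0: "ELFCLASSNONE", 1: "ELF32", 2: "ELF64"}
EI_DATA = {
    0: "ELFDATANONE",
    1: "2's complement, little endian",
    2: "2's complement, big endian",
}

def decode_ident(ident: bytes) -> dict:
    """Split the identification bytes into the fields discussed above."""
    if len(ident) < 16:
        raise ValueError("the identification array is 16 bytes long")
    if ident[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file: bad magic")
    return {
        "class": EI_CLASS.get(ident[4], "unknown"),   # fifth byte
        "data": EI_DATA.get(ident[5], "unknown"),     # sixth byte
        "version": ident[6],
        "osabi": ident[7],
        "abiversion": ident[8],
        "padding": ident[9:16],
    }

# The bytes shown next to the Magic: label in the text.
ident = bytes.fromhex("7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00")
info = decode_ident(ident)
print(ident[1:4].decode("ascii"))   # the three printable magic bytes
print(info["class"], "|", info["data"], "| version", info["version"])
```

Passing the first 16 bytes read from any file to the same helper gives a quick check of whether it is an ELF binary at all.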

We can see other useful information presented in the step 10 output image, such as the machine architecture; what type of ELF file we're dealing with, which is an executable in our case; the entry point address, which is the memory address where execution begins; how far into the binary the program headers begin; where the section headers begin; how big the ELF header is; how large each program header is; how many program headers exist; how large each section header is; how many section headers we have; and the index of the section header string table. You're probably asking what this all means, and how it affects you when you're analyzing binaries. Great question. Let's cover that now.
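To make those fields concrete, here is a minimal Python sketch that unpacks a 32-bit little endian ELF header using the field order of Elf32_Ehdr in elf.h. The parse_ehdr32 helper is a hypothetical name, and the sample header is synthetic: only the executable type and the program header offset of 52 come from this recipe, while the entry point, section header offset, and section counts are made-up placeholder values.

```python
import struct

# Elf32_Ehdr field order; "<" means little endian with no padding.
EHDR32 = struct.Struct("<16sHHIIIIIHHHHHH")
EHDR_FIELDS = (
    "e_ident", "e_type", "e_machine", "e_version", "e_entry",
    "e_phoff", "e_shoff", "e_flags", "e_ehsize", "e_phentsize",
    "e_phnum", "e_shentsize", "e_shnum", "e_shstrndx",
)
E_TYPE = {0: "NONE", 1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}

def parse_ehdr32(data: bytes) -> dict:
    """Hypothetical helper: unpack the first 52 bytes into named fields."""
    return dict(zip(EHDR_FIELDS, EHDR32.unpack_from(data)))

# Synthetic header for illustration. Placeholder values (assumptions):
# e_entry, e_shoff, e_shnum, e_shstrndx.
ident = b"\x7fELF\x01\x01\x01" + bytes(9)
sample = EHDR32.pack(
    ident,
    2,            # e_type: executable
    3,            # e_machine: EM_386
    1,            # e_version
    0x08048080,   # e_entry (placeholder)
    52,           # e_phoff: program headers follow the ELF header
    0x200,        # e_shoff (placeholder)
    0,            # e_flags
    52,           # e_ehsize
    32,           # e_phentsize
    1,            # e_phnum
    40,           # e_shentsize
    5,            # e_shnum (placeholder)
    2,            # e_shstrndx (placeholder)
)

hdr = parse_ehdr32(sample)
print("type:", E_TYPE[hdr["e_type"]])
print("entry point: 0x%08x" % hdr["e_entry"])
print("program headers start at byte", hdr["e_phoff"])
```

To try it on a real 32-bit binary, read the first 52 bytes of the file in binary mode and pass them to parse_ehdr32 instead of the synthetic sample.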

The ELF header that we just learned about contains the lay of the land for the rest of the ELF binary. It's our navigator—our GPS—and helps us to understand all of the pieces of the binary. Before moving on, let's take a look at the parts of an ELF binary and how they relate to the goals we're trying to accomplish through the recipes in this book. Let's get started:

ELF header: This is always located at the beginning of the ELF binary and contains information about the layout of the rest of the binary.
Program Header Table: This is responsible for indexing each segment as program headers. This is optional for relocatable files used in linking but absolutely necessary for execution and laying out the process image during execution.
Segment: This is a collection of sections grouped into usable chunks for execution. Each segment is described by a program header in the Program Header Table and can contain zero or more sections.
Section Header Table: This is responsible for indexing each section header that's present within an ELF binary and is necessary for relocatable object files that are used in linking.
Section Header: This is responsible for describing each section within an ELF binary and is necessary for relocatable object files that are used in linking.
Section: This is a part of the binary that contains either data or code and is described by the Section Header. We've already worked with the .text section within our assembly recipes. Sections are necessary for relocatable object files that are used in linking.

Now that we have a basic understanding of the different portions of an ELF binary and their different purposes, let's get back to the recipe. In step 9 and step 11, we run the same readelf -h command against the ch02-helloworld binary and then against the assembled object file, ch02-helloworld.o. When reviewing the output of each in step 10 and step 12, we can see at least one major difference. The binary is labeled as an executable, while the assembled object file is labeled as a relocatable object file. This is great because that's exactly what they are. Remember how I mentioned sections are used for relocatable object files and segments for executable files? Notice that, in the output for step 10 for the executable, the program headers begin 52 bytes into the file, while for the relocatable object file output in step 12, there are no program headers. We still see section headers in both. In the executable in step 10, segments are just collections of sections, so the sections are still present. In the relocatable object file in step 12, we only have sections because they are what the linker needs.
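The difference between the two outputs boils down to two header fields: the file type and the number of program headers. The following sketch is illustrative only. The describe function is a hypothetical name, and its arguments mirror the values discussed above (program headers at offset 52 for the executable, none for the object file).

```python
# ELF file type values from the ELF specification.
E_TYPE_NAMES = {1: "relocatable", 2: "executable", 3: "shared object"}

def describe(e_type: int, e_phoff: int, e_phnum: int) -> str:
    """Hypothetical helper: summarize what the header says about layout."""
    kind = E_TYPE_NAMES.get(e_type, "other")
    if e_phnum == 0:
        return f"{kind}: no program headers, sections only"
    return f"{kind}: {e_phnum} program header(s) starting at byte {e_phoff}"

print(describe(2, 52, 1))  # the linked binary
print(describe(1, 0, 0))   # the assembled object file
```

This prints one summary line for the linked binary and one for the object file, matching the contrast seen in steps 10 and 12.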

To understand this, it helps to think about the process we used to produce the binary from the assembly code. First, we had to assemble the code using NASM, which produced a relocatable object file in the ELF format, and then we had to link that object file using ld to produce our binary. Hopefully, this is clear, and you can see the differences between an executable and a relocatable object file when reviewing the ELF format of these files. From this point forward, we'll just focus on executable ELF binaries.

Step 13 prints out the program header information of our ch02-helloworld binary. Program headers are formatted as follows:

The possible values for p_type (Segment type) are as follows. In our recipe, the only program header is of the LOAD type:

Continuing with our output in step 14, we can see that our only program header, which starts at a point 52 bytes into the binary, describes a segment that starts at virtual address 0x08048000 and physical address 0x08048000, has a file size of 0x0008e (142) bytes, and takes up the same amount of memory. It's set with the R and E flags, indicating that the segment has read and execute permissions, and it requires a memory alignment of 0x1000 (4096) bytes. We can also see which sections are mapped to this one segment, as indicated below the program header table. This is the executable .text section. This makes sense since, when we developed our assembly program, we wrote our executable code in the .text section.
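As a rough companion to that output, the sketch below unpacks one 32-bit program header using the field order of Elf32_Phdr in elf.h and renders the flags the way readelf does. The helper names are hypothetical. The addresses, sizes, flags, and alignment are the values quoted above. The file offset of 0 is an assumption, since the text does not state it.

```python
import struct

# Elf32_Phdr: eight 32-bit little endian words.
PHDR32 = struct.Struct("<8I")
PHDR_FIELDS = (
    "p_type", "p_offset", "p_vaddr", "p_paddr",
    "p_filesz", "p_memsz", "p_flags", "p_align",
)
P_TYPE = {0: "NULL", 1: "LOAD", 2: "DYNAMIC", 3: "INTERP",
          4: "NOTE", 5: "SHLIB", 6: "PHDR", 7: "TLS"}
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4  # permission bits from the ELF spec

def flags_to_str(p_flags: int) -> str:
    """Render permission bits in readelf's three-column R/W/E style."""
    columns = (("R", PF_R), ("W", PF_W), ("E", PF_X))
    return "".join(ch if p_flags & bit else " " for ch, bit in columns)

def parse_phdr32(data: bytes) -> dict:
    """Hypothetical helper: unpack one program header into named fields."""
    return dict(zip(PHDR_FIELDS, PHDR32.unpack_from(data)))

# Values from the text, except p_offset, which is assumed to be 0.
sample = PHDR32.pack(1, 0, 0x08048000, 0x08048000,
                     0x8E, 0x8E, PF_R | PF_X, 0x1000)
phdr = parse_phdr32(sample)
print(P_TYPE[phdr["p_type"]],
      "vaddr=0x%08x" % phdr["p_vaddr"],
      "filesz=%d" % phdr["p_filesz"],
      "flags=%r" % flags_to_str(phdr["p_flags"]),
      "align=%d" % phdr["p_align"])
```

In a real file, the bytes for the first program header would be read starting at the offset stored in the ELF header's e_phoff field.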

In step 15, which results in the output shown in step 16, we use the -S argument on readelf to look at the Section header table. The Section header table uses the following format:

Let's review the output displayed in step 16 from left to right. We have the section's index in the section header table in brackets, followed by the name of the section; the type of section; the virtual address of the section in memory; the file offset of the section from the beginning of the binary itself; how big the section is in bytes; the entry size of the section if it contains a table itself; the section flags, which, in our case, indicate that the .text section is executable and occupies memory during execution; whether or not the section is linked to another section; whether or not there is additional information about the particular section; and the section's memory alignment. In the output in step 16, we can see the .text section has a type of PROGBITS and is marked as executable (X). The PROGBITS type indicates this section contains program data. Here are the rest of the possible values for the type (sh_type) column:
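For reference, here is a minimal Python sketch of a 32-bit section header using the field order of Elf32_Shdr in elf.h, together with the standard sh_type values from the ELF specification. The helper names are hypothetical, and every number in the sample .text entry other than the PROGBITS type and the alloc and execute flags is a placeholder, not the recipe's real output.

```python
import struct

# Elf32_Shdr: ten 32-bit little endian words.
SHDR32 = struct.Struct("<10I")
SHDR_FIELDS = (
    "sh_name", "sh_type", "sh_flags", "sh_addr", "sh_offset",
    "sh_size", "sh_link", "sh_info", "sh_addralign", "sh_entsize",
)
SH_TYPE = {0: "NULL", 1: "PROGBITS", 2: "SYMTAB", 3: "STRTAB",
           4: "RELA", 5: "HASH", 6: "DYNAMIC", 7: "NOTE",
           8: "NOBITS", 9: "REL", 10: "SHLIB", 11: "DYNSYM"}
SH_FLAGS = (("W", 0x1), ("A", 0x2), ("X", 0x4))  # write, alloc, execute

def shflags_to_str(sh_flags: int) -> str:
    """Render the flag letters shown in readelf's Flg column."""
    return "".join(ch for ch, bit in SH_FLAGS if sh_flags & bit)

def parse_shdr32(data: bytes) -> dict:
    """Hypothetical helper: unpack one section header into named fields."""
    return dict(zip(SHDR_FIELDS, SHDR32.unpack_from(data)))

# A .text-like entry. Name offset, address, file offset, size, and
# alignment are placeholder values chosen for illustration.
sample = SHDR32.pack(11, 1, 0x2 | 0x4, 0x08048080, 0x80, 0x0E, 0, 0, 16, 0)
sec = parse_shdr32(sample)
print(SH_TYPE[sec["sh_type"]], shflags_to_str(sec["sh_flags"]),
      "addr=0x%08x" % sec["sh_addr"], "size=%d" % sec["sh_size"])
```

Note that sh_name is not the name itself but an offset into the section header string table, which is why the ELF header records that table's index.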

We also notice sections of the STRTAB (string table) and SYMTAB (symbol table) types, which we can dissect further if need be. In step 17, we run readelf with the -s argument to review the SYMTAB (symbol table) section. This section, as shown in step 18, contains seven entries. Each entry has the following format:

Parsing the output in step 18 from left to right, we have the symbol's index in the symbol table, the symbol's value (typically its location in memory), how large the symbol is in bytes, the type of symbol it is, the symbol's binding, its visibility, the index of the section it belongs to, and the symbol name. Notice that, in our output, we recognize at least two entries: the _start symbol, marked GLOBAL, and the name of our file, ch02-helloworld.asm. This should give us a clue that the symbol table gives us information about the variable or function names that were used. This will become clear when we dive into a binary that started as a C program in later recipes.
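The type and binding columns are packed into a single byte of each symbol table entry, which is easy to miss when reading the output alone. The sketch below unpacks one 32-bit entry using the field order of Elf32_Sym in elf.h. The parse_sym32 helper is a hypothetical name, and the sample entry is a made-up stand-in for a GLOBAL symbol such as _start: its name offset, address, and section index are placeholders.

```python
import struct

# Elf32_Sym: st_name, st_value, st_size, st_info, st_other, st_shndx.
SYM32 = struct.Struct("<IIIBBH")
BIND = {0: "LOCAL", 1: "GLOBAL", 2: "WEAK"}
TYPE = {0: "NOTYPE", 1: "OBJECT", 2: "FUNC", 3: "SECTION", 4: "FILE"}
VIS = {0: "DEFAULT", 1: "INTERNAL", 2: "HIDDEN", 3: "PROTECTED"}

def parse_sym32(data: bytes) -> dict:
    """Hypothetical helper: unpack one symbol table entry."""
    st_name, st_value, st_size, st_info, st_other, st_shndx = \
        SYM32.unpack_from(data)
    return {
        "name_offset": st_name,              # offset into the string table
        "value": st_value,
        "size": st_size,
        "bind": BIND.get(st_info >> 4, "?"),   # high four bits
        "type": TYPE.get(st_info & 0xF, "?"),  # low four bits
        "vis": VIS[st_other & 0x3],
        "shndx": st_shndx,
    }

# Placeholder entry: a GLOBAL, NOTYPE symbol defined in section 1.
sample = SYM32.pack(1, 0x08048080, 0, (1 << 4) | 0, 0, 1)
sym = parse_sym32(sample)
print(sym["bind"], sym["type"], sym["vis"], "Ndx=%d" % sym["shndx"])
```

The symbol's name is resolved by looking up name_offset in the string table that the symbol table section links to, which is the STRTAB section noted earlier.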

Steps 19 through 22 show some of the power of the readelf tool. The -R argument displays the bytes of the .text section with any relocations applied, and in step 21, we use the -x argument on the .text section to dump the bytes of that section in hexadecimal format. Optionally, we could also use the -p argument on the .text section to dump any string data from that section. If we did, we would see our Hello, World! string displayed as text rather than as hexadecimal bytes.
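To get a feel for the difference between a hexadecimal dump and a string dump, here is a small Python approximation. It does not reproduce readelf's exact output format: hex_dump and printable_strings are hypothetical helpers, the string search works like a simple printable-run scan rather than readelf's own logic, and the sample bytes are made up.

```python
import re

def hex_dump(data: bytes, base: int = 0) -> list:
    """Format bytes as rows of 16: address, hex words, ASCII column."""
    rows = []
    for off in range(0, len(data), 16):
        chunk = data[off:off + 16]
        words = " ".join(chunk[i:i + 4].hex() for i in range(0, len(chunk), 4))
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"  0x{base + off:08x} {words:<35} {text}")
    return rows

def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII at least min_len characters long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]

# Made-up section contents: two opcode bytes followed by a message.
section = b"\xcd\x80" + b"Hello, World!\n"
for row in hex_dump(section, base=0x08048080):
    print(row)
print(printable_strings(section))
```

The hexadecimal view shows every byte, code and data alike, while the string view keeps only the readable text.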
