In this chapter, we cover the following topics:
• C programming language
• Computer memory
• Intel processors
• Assembly language basics
• Debugging with gdb
• Python survival skills
Why study programming? Ethical hackers should study programming and learn as much about the subject as possible in order to find vulnerabilities in programs and get them fixed before unethical hackers and black hats take advantage of them. Many security professionals come at programming from a nontraditional perspective, often having no programming experience prior to beginning their career. Bug hunting is very much a foot race: if a vulnerability exists, who will find it first? The purpose of this chapter is to give you the survival skills necessary to understand upcoming chapters and then later to find the holes in software before the black hats do.
The C programming language was developed in 1972 by Dennis Ritchie at AT&T Bell Labs. The language was heavily used in Unix and is therefore ubiquitous. In fact, many of the staple networking programs and operating systems, as well as large applications such as the Microsoft Office suite, Adobe Reader, and browsers, are written in combinations of C, C++, Objective-C, assembly, and a couple of other lower-level languages.
Although each C program is unique, some common structures can be found in most programs. We’ll discuss these in the next few sections.
All C programs “should” (see the “For Further Reading” section for an exception) contain a main() function (lowercase) that follows the format
where both the return value type and arguments are optional. If no return value type is specified, a return type of int is used; however, some compilers may throw warnings if you fail to specify its return value as int or attempt to use void. If you use command-line arguments for main(), you could use the format
(among others), where the argc integer holds the number of arguments and the argv array holds the input arguments (strings). The name of the program is always stored at offset argv[0]. The parentheses and curly brackets ("braces") are mandatory. The curly brackets are used to denote the beginning and end of a block of code. Although procedure and function calls are optional, the program would do nothing without them. A procedure statement is simply a series of commands that performs operations on data or variables and normally ends with a semicolon.
Functions are self-contained bundles of code that can be called for execution by main() or other functions. They are nonpersistent and can be called as many times as needed, thus preventing us from having to repeat the same code throughout a program. The format is as follows:
The function name and optional argument list comprise the signature. By looking at it, you can tell if the function requires arguments that will be used in processing the procedures of the function. Also notice the optional return value; this tells you if the function returns a value after executing and, if so, what type of data that is.
The call to the function may look like this:
The following is a simple example:
Here, we are including the appropriate header files, which contain the function declarations for exit and printf. The exit function is declared in stdlib.h, and printf is declared in stdio.h. If you do not know what header files are required based on the dynamically linked functions you are using in a program, you can simply look at the manual entry, such as man sscanf, and refer to the synopsis at the top. We then define the main function with a return value of int. We specify void in the arguments location between the parentheses because we do not want to allow arguments passed to the main function. We then create a variable called x with a data type of int. Next, we call the function foo and assign the return value to x. The foo function simply returns the value 8. This value is then printed onto the screen using the printf function, using the format string %d to treat x as a decimal value.
Function calls modify the flow of a program. When a call to a function is made, the execution of the program temporarily jumps to the function. After execution of the called function has completed, control returns to the calling function at the virtual memory address of the instruction immediately following the call instruction. This process will make more sense during our discussion of stack operations in Chapter 10.
Variables are used in programs to store pieces of information that may change and may be used to dynamically influence the program. Table 2-1 shows some common types of variables.
When the program is compiled, most variables are pre-allocated memory of a fixed size according to system-specific definitions of size. Sizes in Table 2-1 are considered typical; there is no guarantee you will get those exact sizes. It is left up to the compiler and the target platform to define the size. However, the sizeof operator (often written like a function call, as in sizeof(int)) is used in C to obtain the size the compiler actually uses, so that the correct amount of memory is allocated.
Variables are typically defined near the top of a block of code. As the compiler chews up the code and builds a symbol table, it must be aware of a variable before that variable is used in the code later. The word “symbol” is simply a name or identifier. This formal declaration of variables is done in the following manner:
Table 2-1 Types of Variables
For example, in the line
int a = 0;
an integer (normally 4 bytes) is declared in memory with a symbol of a and an initial value of 0.
Once a variable is declared, the assignment construct is used to change the value of the variable. For example, the statement
x=x+1;
is an assignment statement that changes the value of the variable x. The new value of x is the current value of x modified by the + operator. It is common to use the format
destination = source <with optional operators>
where destination is the location in which the final outcome is stored.
The C language comes with many useful constructs bundled into the libc library. One of many commonly used constructs is the printf command, generally used to print output to the screen. There are two forms of the printf command:
printf(<string>);
printf(<format string>, <list of variables/values>);
The first format is straightforward and is used to display a simple string to the screen. The second format allows for more flexibility through the use of a format type that can be composed of normal characters and special symbols that act as placeholders for the list of variables following the comma. Commonly used format symbols are listed and described in Table 2-2.
These format types allow the programmer to indicate how they want data displayed to the screen, written to a file, or other possibilities through the use of the printf family of functions. As an example, say you know a variable to be a float and you want to ensure that it is printed out as such, and you also want to limit its width, both before and after the floating point. In this case, you could use the code in the following lab in Kali, where we first change our shell to bash and then get the code from GitHub using git clone.
Table 2-2 printf Format Types
Lab 2-1: Format Strings
In this lab, we download the code for all the labs in this chapter and then focus on format strings, which will allow us to format the output of our program as we wish.
Now, we can look at our code:
In the first printf call, we use a total width of 5, with 2 digits after the decimal point. In the second call to printf, we use a total width of 4, with 1 digit after the decimal point.
Now, let’s compile it with gcc and run it:
NOTE The examples in this chapter use 2020.4 64-bit Kali Linux. If you are using 32-bit Kali Linux, you may need to change your compiler options.
The scanf command complements the printf command and is generally used to get input from the user. The format is
scanf(<format string>, <list of variables/values>);
where the format string can contain format symbols such as those shown for printf in Table 2-2. For example, the following code will read an integer from the user and store it in a variable called number:
scanf("%d", &number);
The & symbol is the address-of operator: it passes scanf the memory address of number, so the function can store the value it reads in that location. This will make more sense when we talk about pointers later in the chapter in the “Pointers” section. For now, realize that you must use the & symbol before the name of an ordinary variable (such as an int or a float) passed to scanf; an array used to hold a string is passed without it. The command does not change types on the fly: if you were to enter a character at the previous prompt, the %d conversion would fail to match, scanf would return 0, and number would be left unchanged, so it is worth checking the return value. Bounds checking is also not done in regard to string size when %s is used without a width, which may lead to problems, as discussed later in Chapter 10.
The strcpy command is one of the most dangerous functions used in C. The format of the command is as follows:
strcpy(<destination>, <source>);
The purpose of the command is to copy each character in the source string (a series of characters ending with a null character,