Why study programming? Ethical hackers should study programming and learn as much about the subject as possible in order to find vulnerabilities in programs and get them fixed before unethical hackers take advantage of them. Many security professionals come at programming from a nontraditional perspective, often having no programming experience prior to beginning their career. Bug hunting is very much a foot race: if a vulnerability exists, who will find it first? The purpose of this chapter is to give you the survival skills necessary to understand upcoming chapters and then later to find the holes in software before the black hats do.
In this chapter, we cover the following topics:
• C programming language
• Computer memory
• Intel processors
• Assembly language basics
• Debugging with gdb
• Python survival skills
The C programming language was developed in 1972 by Dennis Ritchie from AT&T Bell Labs. The language was heavily used in Unix and is therefore ubiquitous. In fact, many of the staple networking programs and operating systems, as well as large applications such as Microsoft Office Suite, Adobe Reader, and browsers, are written in combinations of C, C++, Objective-C, assembly, and a couple of other lower-level languages.
Although each C program is unique, some common structures can be found in most programs. We’ll discuss these in the next few sections.
All C programs contain a main() function (lowercase) made up of a return value type, the name main, an argument list in parentheses, and a body enclosed in braces. Both the return value type and the arguments are optional. Under the older C89 standard, if no return value type is specified, a return type of int is assumed; C99 and later removed this implicit int, and most compilers will throw warnings if you fail to specify the return value as int or attempt to use void. If you use command-line arguments for main(), use the format int main(int argc, char *argv[]), where the argc integer holds the number of arguments and the argv array holds the input arguments (strings). The name of the program is normally stored at offset argv[0].
The parentheses and braces are mandatory, but white space between these elements does not matter. The braces denote the beginning and end of a block of code. Although procedure and function calls are optional, the program would do nothing without them. A procedure statement is simply a series of commands that performs operations on data or variables and normally ends with a semicolon.
Functions are self-contained bundles of code that can be called for execution by main() or other functions. They are nonpersistent and can be called as many times as needed, thus preventing us from having to repeat the same code throughout a program. A function definition consists of an optional return value type, the function name, an optional list of arguments in parentheses, and a body enclosed in braces.
The first line of a function is called the signature. By looking at it, you can tell if the function returns a value after executing or requires arguments that will be used in processing the procedures of the function.
A function is called by writing its name followed by parentheses containing any arguments it requires, and the statement ends with a semicolon.
The following is a simple example:
Here, we include the appropriate header files, which contain the function declarations for exit and printf. The exit function is declared in stdlib.h, and printf is declared in stdio.h. If you do not know which header files are required by the library functions you are using in a program, you can simply look at the manual entry, such as man sscanf, and refer to the synopsis at the top. We then define the main function with a return value of int. We specify void in the arguments location between the parentheses because we do not want to allow arguments passed to the main function. We then create a variable called x with a data type of int. Next, we call the function foo and assign the return value to x. The foo function simply returns the value 8. This value is then printed onto the screen using the printf function, using the format string %d to treat x as a decimal value.
Function calls modify the flow of a program. When a call to a function is made, the execution of the program temporarily jumps to the function. After execution of the called function has completed, control returns to the calling function at the virtual memory address immediately after the call instruction. This process will make more sense during our discussion of stack operations in Chapter 11.
Variables are used in programs to store pieces of information that may change and may be used to dynamically influence the program. Table 2-1 shows some common types of variables.
Table 2-1 Types of Variables
When the program is compiled, most variables are preallocated memory of a fixed size according to system-specific definitions of size. Sizes in Table 2-1 are considered typical; there is no guarantee you will get those exact sizes. It is left up to the compiler and target platform to define the sizes. However, the sizeof operator can be used in C to obtain the size the compiler has chosen for a type or variable, so code does not have to assume one.
Variables are typically defined near the top of a block of code. As the compiler chews up the code and builds a symbol table, it must be aware of a variable before that variable is used in the code later. The word symbol is simply a name or identifier. A variable is formally declared by giving its type, its name, and an optional initialization starting with =. For example, int a = 0; declares an integer (normally 4 bytes) in memory with a name of a and an initial value of 0.
Once a variable is declared, the assignment construct is used to change the value of the variable. For example, the statement x = x + 1; is an assignment statement containing a variable, x, modified by the + operator. The new value is stored in x. It is common to use the form destination = source, with optional operators applied to the source, where destination is the location in which the final outcome is stored.
The C language comes with many useful constructs bundled into the libc library. One of many commonly used constructs is the printf command, generally used to print output to the screen. There are two forms of the printf command: one that takes only a string, and one that takes a format string followed by a comma and a list of variables or values.
The first form is straightforward and is used to display a simple string to the screen. The second form allows for more flexibility through the use of a format string that can be composed of normal characters and special symbols that act as placeholders for the list of variables following the comma. Commonly used format symbols are listed and described in Table 2-2.
Table 2-2 printf Format Types
These format types allow the programmer to indicate how data should be displayed on the screen, written to a file, or sent elsewhere through the printf family of functions. As an example, say you know a variable to be a float and you want to ensure that it is printed out as such, and you also want to limit its width, both in total and after the decimal point. In this case, you could use the following:
In the first printf call, we use a total width of 5, with 2 digits after the decimal point. In the second call to printf, we use a total width of 4, with 1 digit after the decimal point. The width is a minimum; longer values are not truncated.
NOTE The examples in this chapter use 32-bit Kali Linux. If you are using 64-bit Kali Linux, you may need to change your compiler options.
The scanf command complements the printf command and is generally used to get input from the user. It takes a format string followed by a list of the locations in which to store the values read. The format string can contain format symbols such as those shown for printf in Table 2-2. For example, the statement scanf("%d", &number); will read an integer from the user and store it into a variable called number.
Actually, the & symbol takes the address of number, so the value is stored at the memory location where number lives. This will make more sense when we talk about pointers later in the chapter in the “Pointers” section. For now, realize that you must use the & symbol before the name of any ordinary (non-array) variable passed to scanf. The command does not change types on the fly: if you were to enter a character at the previous prompt, the %d conversion would fail, scanf would return 0, and number would be left unchanged. To read a character and obtain its decimal (ASCII) value, use the %c format instead. Bounds checking is not done in regard to string size, however, which may lead to problems, as discussed later in Chapter 11.
The strcpy command is one of the most dangerous functions used in C. It takes two arguments: the destination, followed by the source.
The purpose of the command is to copy each character in the source string (a series of characters ending with a null character,