For the purpose of this chapter, hacking means gaining illicit access to a computer or network by various tricky means. This chapter offers techniques to hack programs by providing them with bad data. Another form of hacking is social engineering, where you trick people into revealing their passwords or other personal data over the phone, social media, or e-mail; however, that’s a topic for a different book.
Every programmer should know about hacking. If you don’t know how hackers exploit security weaknesses in program code, then you will unknowingly provide these for them.
Buffer Overrun Hack
As an example, we’ll look at the classic buffer overrun problem, how it happens, how to exploit it, and then how to protect against it. Anyone with security experience will notice that our upper-case routine is error-prone and will likely lead to a buffer overrun vulnerability in our code. Let’s look at what buffer overrun is and how it gets exploited.
Causes of Buffer Overrun
Our upper-case routine happily converts text to upper-case until it hits a NULL (0) character. If the provided text is bigger than the output buffer the caller provides, then this routine overwrites whatever is in memory after it. Where the buffer is located determines the type of attack that’s possible. We’re going to look at the case where the buffer is located on the stack. The weakness of the stack is that this is where function return addresses get stored when we nest function calls. If we arrange our code exactly, we can overwrite a function return address and cause the function to return to a place of our choosing.
There are other forms of buffer overrun attacks if the data is stored in the C runtime heap, or in the program’s data segment. These attacks are like what we will explore for the stack.
If you enter too much data into such a text field, the program typically crashes, since you’ve overwritten important program data and corrupted pointers. Even though the hacker won’t get any proprietary data this way, this is still a good foundation for a denial of service (DoS) attack. If this is a web server and you cause it to crash, then it needs to be restarted and re-initialized. This typically takes several seconds. This means we can send a message to the web server every few seconds to keep it offline.
Stealing Credit Card Numbers
Imagine a credit card company’s web server running a web application that uses our upper-case program, because it needs to convert names to upper-case super fast, so that its web pages are exceptionally responsive. Suppose there’s a page on the web site where you enter your name, and the web application converts it to upper-case, but the web page doesn’t check the length of the data and passes it to our upper-case routine as is. Furthermore, for convenience this web application provides several administrative utilities, such as a facility to download all the credit card data, so it can be backed up. These utilities are only available to administrative users with special clearance and require a digital certificate to access. As hackers, we want to dupe the customer-facing part of the web site into giving us access to the administrative part without requiring extra authentication.
In Chapter 6, “Functions and the Stack,” we learned that if a function calls another function, it must store the LR register to the stack, so that it won’t be lost. We’ll modify our main program and upper-case routine to add an intermediate routine that stores LR to the stack and allocates the output buffer on the stack.
Main web application for the credit card company
When we run this program, the output of DownloadCreditCardNumbers is repeated over and over until you hit Ctrl+C. This is in spite of the routine DownloadCreditCardNumbers never being called within the program. We’ll see why the program is put in an infinite loop shortly.
We won’t include the code for the user interface; we’ll just provide the data in our .data section. We want to keep things simple and easy to follow.
Let’s look at what happens to the stack through the process as this function runs.
Stepping Through the Stack
Remember that the stack grows downward, so when we push something onto the stack, we decrement SP. The pointer we pass for outstr will be 0x7ffffff1f0, and since the loop in our upper-case routine increments its output pointer, if it overflows its buffer, it overwrites the stored value for LR located at memory address 0x7ffffff200, 16 bytes above the start of the buffer. The strategy is to overwrite LR with an address causing the program to do our bidding.
Excerpts of the objdump output of the program in Listing 16-1
- 1.
In _start we do the BL to the calltoupper routine. This places the address of the next instruction into LR and jumps to calltoupper. This means LR has the value 0x4000ec at this point.
- 2.
On entering calltoupper, SP contains 0x7ffffff210. Execute the
STR LR, [sp, #-16]!
instruction, which decrements SP by 16 and copies LR to this memory location. This makes SP 0x7ffffff200, and the 16 bytes there contain
0x7ffffff200: 0x004000ec 0x00000000 0x00000000 0x00000000
showing that LR was pushed to the stack.
- 3.
Execute
SUB SP, SP, #16
This allocates 16 bytes for our output buffer. This reduces the stack pointer to 0x7ffffff1f0, and the contents of the stack are
0x7ffffff1f0: 0x00000000 0x00000000 0x00000000 0x00000000
0x7ffffff200: 0x004000ec 0x00000000 0x00000000 0x00000000
- 4.
The function toupper converts our string to upper-case. It does this correctly for the first part of the string “This is our Test” (16 bytes). Since there is no NULL (0) terminator at that point, it also processes the next byte, 0xb0, which isn’t a lower-case letter and so is copied as is. The next byte is a NULL (0), so it stops. SP isn’t affected by this series of operations, but on returning from toupper, the stack contains
0x7ffffff1f0: 0x53494854 0x20534920 0x2052554f 0x54534554
0x7ffffff200: 0x004000b0 0x00000000 0x00000000 0x00000000
The first line is our string, converted to upper-case. But notice the return address at 0x7ffffff200 has changed from 0x004000ec to 0x004000b0. This means the return address is now the address of the DownloadCreditCardNumbers routine.
- 5.
The calltoupper routine cleans up the stack and returns:
ADD SP, SP, #16 // Free outstr
LDR LR, [SP], #16
RET
The key point is that the LDR instruction loads the address of DownloadCreditCardNumbers into LR, then the RET instruction branches to that routine, causing a major data breach.
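The steps above can be replayed safely in C. The following is a minimal sketch, not the chapter's Assembly Language listing: a 32-byte array stands in for the stack between 0x7ffffff1f0 and 0x7ffffff20f, and the names unchecked_toupper, simulate_overrun, store_le64, and load_le64 are made up for this illustration. The copy stays inside the array, so nothing is really corrupted; we only watch the simulated LR slot change.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 16   /* bytes reserved for outstr by SUB SP, SP, #16 */
#define FRAME_SIZE 32 /* outstr followed by the 16-byte slot holding LR */

/* Unchecked upper-case loop: copies until it has stored the NULL (0). */
static size_t unchecked_toupper(const unsigned char *in, unsigned char *out)
{
    size_t i = 0;
    unsigned char c;
    do {
        c = in[i];
        if (c >= 'a' && c <= 'z')
            c = (unsigned char)(c - ('a' - 'A'));
        out[i++] = c;
    } while (c != 0);
    return i; /* bytes written, including the terminator */
}

/* Store and load 64-bit values in little-endian order, as ARM64 Linux does. */
static void store_le64(unsigned char *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (unsigned char)(v >> (8 * i));
}

static uint64_t load_le64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Returns the simulated saved LR after the conversion has run.
   Returns 0 if the input would leave the simulation's own array. */
uint64_t simulate_overrun(const char *input)
{
    unsigned char frame[FRAME_SIZE] = {0};

    if (strlen(input) + 1 > FRAME_SIZE)
        return 0;

    store_le64(frame + BUF_SIZE, 0x4000ec); /* saved LR at 0x7ffffff200 */
    unchecked_toupper((const unsigned char *)input, frame);
    return load_le64(frame + BUF_SIZE);
}
```

Calling simulate_overrun("This is our Test\xb0") returns 0x4000b0, since the seventeenth byte and the terminator land on the two low bytes of the saved LR, while a short input such as "short" leaves the value at 0x4000ec.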
Two things worked in our favor in this example:
- 1.
We only need to copy one byte to get the address changed to what we want, since the next byte of the address is NULL (0).
- 2.
The byte we needed to copy wasn’t one for a lower-case letter, so it was left alone by the toupper routine.
A successful hack usually requires some luck and fortuitous circumstances. If this wasn’t the case, we still have some options. For example, we could jump into the middle of the DownloadCreditCardNumbers routine. The start of a function usually contains a function prologue, which can be skipped if we never intend to return successfully. After all, we don’t care if the program continues to work correctly, only that we get our downloaded credit card numbers.
The program goes into an infinite loop because we don’t do a BL to call DownloadCreditCardNumbers; we use a RET instruction. Nothing updates LR to a new value, so LR still holds the address of DownloadCreditCardNumbers, and the RET at the end of DownloadCreditCardNumbers jumps to the same address again.
This was an example of one particular buffer overrun exploit; however, hackers have many ways to exploit buffer overruns, whether the data is on the stack, in the C memory heap, or in our data segment. Let’s look at several ways to avoid buffer overrun problems.
Mitigating Buffer Overrun Vulnerabilities
To combat buffer overrun problems, there are techniques we can use in our code and protections that our tools can provide. In this section, we’ll look at both. First of all, let’s consider the bad design of the function parameters to our upper-case routine. Before we consider a solution, let’s look at the root cause of many buffer overrun problems, the C runtime’s strcpy function, and the various solutions proposed to fix this design.
Don’t Use strcpy
This NULL terminates the string, but it requires the programmer to remember to always do this. Perhaps, under deadline pressure, this may be forgotten.
This function eliminates that problem, as the destination is always NULL (0) terminated, but this function is nonstandard and not part of the GNU C library.
This strecpy passes in a pointer to the end of the destination buffer. This is handy when you nest calls, since end stays constant, unlike a remaining length that shrinks as you build the string. Again, this is a nonstandard function and not part of the C runtime.
In strncpy_s we provide the size of both buffers and the function returns an error code to tell us what happened.
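As a minimal sketch of two of the trade-offs in this section, the code below contrasts strncpy followed by manual termination with a helper that always terminates and reports truncation. The helper name bounded_copy and both wrapper signatures are assumptions made for this illustration; they are not C runtime functions.

```c
#include <stddef.h>
#include <string.h>

/* Pattern 1: strncpy plus the manual termination that is easy to forget. */
void copy_with_strncpy(char *dest, size_t dest_size, const char *src)
{
    if (dest_size == 0)
        return;
    strncpy(dest, src, dest_size - 1);
    dest[dest_size - 1] = '\0'; /* strncpy doesn't terminate if src is too long */
}

/* Pattern 2: hypothetical helper that always terminates the destination.
   Returns strlen(src); a result >= dest_size means the copy was truncated. */
size_t bounded_copy(char *dest, size_t dest_size, const char *src)
{
    size_t src_len = strlen(src);

    if (dest_size > 0) {
        size_t n = (src_len < dest_size - 1) ? src_len : dest_size - 1;
        memcpy(dest, src, n);
        dest[n] = '\0';
    }
    return src_len;
}
```

With an 8-byte destination, both routines turn “This is our Test” into “This is”, but only bounded_copy tells the caller, through its return value of 16, that data was lost.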
I went through this discussion to point out that there are a lot of trade-offs in fixing API designs. When making the upper-case routine more secure, there are quite a few trade-offs to consider. We’ll present a list of recommendations toward the end of this chapter, but first let’s see what the operating system and GNU compiler can do to help us.
PIE Is Good
The exploit we performed previously relied upon us knowing the address of the DownloadCreditCardNumbers routine. The assumption is that we learned this from somewhere else, perhaps obtaining an illicit copy of the application’s source code, or the build map file from the dark web.
With modern virtual memory systems, the operating system can give a process any memory addresses it likes; they don’t need to have any relation to real memory addresses. This gave rise to a feature called position-independent executables (PIE) introduced to Linux around 2005. With this feature, an executable is loaded with a different base address each time it is run. This is a special case of address space layout randomization (ASLR), and you often see it referred to by either name.
If we debug this with gdb, we’ll see it runs as before, but all the addresses are changed. To make decoding what is going on easier, we often turn off PIE when debugging and only enable it for release.
Apple’s iOS operating system turns on PIE by default. If your program can’t handle it, then you need to deliberately turn it off.
This still isn’t ideal; it’s better since the credit card numbers didn’t get stolen, but the program still crashed. This can lead to an easy DoS attack for hackers to make our application unavailable.
We mentioned that the program needs to be relocatable. What stops your program being relocatable? Mostly hard-coding memory addresses in your data section that the linker doesn’t know about. For example, when we use LDR, it creates an address in memory to use, but it also creates a relocation record so the loader can fix up the address.
Apple enforces using ADR instead of LDR to reduce the number of relocation records that need to be processed. In Chapter 2, “Loading and Adding,” we showed how to load a register with a MOV and three MOVK instructions. If you use this technique to load a memory address, then your program won’t be relocatable as the loader has no idea what you’re doing and can’t fix up the address.
It’s a good practice to enable PIE for any C or Assembly Language programs. PIE isn’t perfect, and hackers have found ways around it. But it introduces a second step; hackers usually require a second vulnerability in addition to the buffer overrun to hack your program.
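To see the effect of PIE for yourself, a small C sketch like the following can help. It assumes a Linux system with address randomization enabled; the function names marker, code_address, and print_layout are made up for this illustration, and converting a function pointer to an integer is implementation-defined, although it works with GCC on Linux.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* An empty routine whose only job is to have an address in the code segment. */
static void marker(void)
{
}

/* Returns the run-time address of marker as an integer. */
uintptr_t code_address(void)
{
    return (uintptr_t)&marker;
}

/* Prints one code address and one stack address for the current run. */
void print_layout(void)
{
    int local = 0;

    printf("code:  0x%" PRIxPTR "\n", code_address());
    printf("stack: %p\n", (void *)&local);
}
```

Call print_layout from main, build once with gcc -fPIE -pie and once with gcc -no-pie, and run each build a few times. With PIE, the code address should differ from run to run; without it, the code address should stay fixed. A real application should never print addresses like this, for the reasons given in the discussion of data leakage later in this chapter.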
Poor Stack Canaries Are the First to Go
The GNU C compiler has a feature to detect buffer overruns. The idea is that in any routine containing a string buffer located on the stack, the compiler adds extra code to place a secret random value next to the stored function return address. This value is tested before the function returns; if it has been corrupted, a buffer overrun has occurred, and the program is terminated. These stack canaries are like the proverbial canaries in a coal mine, because when something goes wrong, they’re the first to go and warn us that something bad is happening.
We add four instructions to the function prologue and four instructions to the function epilogue. The prologue, including its standard instructions, works as follows:
- 1.
STP: Standard instruction to store the LR and FP to the stack. It subtracts 32 from the stack pointer, rather than 16, to make room for the stack canary.
- 2.
ADRP: Standard instruction to load a pointer to the page that contains our data segment. Here we’re only interested in the stack canary value, but most routines will use this for other purposes as well.
- 3.
MOV: Move SP to FP, standard instruction to set up the C stack frame.
- 4.
LDR: Form the address of the stack canary. Offset 4040 is where the stack canary is stored. This is a random value generated by the C runtime initialization code.
- 5.
LDR: Load the value of the stack canary into register X1.
- 6.
STR: Store the stack canary to the correct place on the stack to guard the function return pointer (pushed LR).
- 7.
MOV: Overwrite the copy of the stack canary left in the register with zero, so the value isn’t left lying around. This is to try and prevent data leakage.
The epilogue works as follows:
- 1.
LDR: Load the stack canary from the stack into register X1.
- 2.
LDR: Load the original stack canary value from the C runtime’s data segment. In this case, X0 still contains the pointer, so we don’t need to rebuild it.
- 3.
EOR: Compare the two values. Exclusive OR’ing two registers has the same effect as subtracting them, in that the result is zero if they are the same (see Exercise 1 in this chapter).
- 4.
CBNZ: If the result of the EOR is nonzero, the values are not equal, so we have a problem and jump to the BL instruction after the RET instruction. CBNZ tests the register directly; it doesn’t use the condition flags.
- 5.
LDP: Load LR and FP back from the stack. If we got this far, we are reasonably confident that LR hasn’t been overwritten because the stack canary survived.
- 6.
RET: Normal subroutine return.
- 7.
BL: Call to error reporting routine. This routine terminates the program rather than returning.
Stack canaries are quite effective, but if a hacker discovers the value used in a running process, they can construct a buffer overrun exploit that writes the correct canary value back. Plus, having your process terminate like this is never a good thing.
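The prologue and epilogue logic described above can be imitated by hand in C. This is only a sketch of the idea, not what the compiler generates: the function copy_with_canary and its parameters are assumptions for this illustration, and a single array stands in for the stack frame so the deliberate overrun stays inside memory we own. In real builds, GCC inserts the check itself when one of its -fstack-protector options is enabled, and a failed check ends in the runtime’s __stack_chk_fail routine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 16

/* Simulates a frame holding a 16-byte buffer with a canary just above it.
   Returns 1 if the canary survived, 0 if an overrun was detected,
   and -1 if n would leave the simulation's own array. */
int copy_with_canary(const unsigned char *src, size_t n, uint64_t secret)
{
    unsigned char frame[BUF_SIZE + sizeof(uint64_t)];
    uint64_t stored;

    if (n > sizeof frame)
        return -1;

    /* Prologue: place the secret value next to the buffer. */
    memcpy(frame + BUF_SIZE, &secret, sizeof secret);

    /* Body: an unchecked copy that may run past BUF_SIZE. */
    memcpy(frame, src, n);

    /* Epilogue: reload the canary and exclusive OR it with the original.
       A zero result means the two values are identical. */
    memcpy(&stored, frame + BUF_SIZE, sizeof stored);
    return (stored ^ secret) == 0 ? 1 : 0;
}
```

Copying 16 bytes or fewer leaves the canary intact; copying 18 bytes clobbers it and the check fails. The sketch also shows the weakness mentioned above: if the bytes written over the canary happen to equal the secret, the overrun goes unnoticed.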
Preventing Code Running on the Stack
Originally stack overflow exploits would copy a hacker’s Assembly Language program as a regular part of the buffer, then overwrite the function’s return address to cause this code to execute. The ARM CPU’s hardware security lets each page of memory be marked as readable, writable, or executable. To prevent code running from the stack, Linux removed the bit allowing code to execute there and made the stack read and write only. With a simple example like ours, it’s hard to demonstrate running code on the stack without adding a lot of extra compile and link switches to enable stack code execution, since it’s firmly off by default.
This doesn’t make executing code on the stack impossible, but it makes it much more difficult, requiring an extra exploit to disable this feature. The other danger is that a shared library you’re using disables this feature and you’re unaware of it.
Trade-offs of Buffer Overflow Mitigation Techniques
Care needs to be taken when designing our APIs to prevent security vulnerabilities. We should only use routines that provide some protection against buffer overrun, for example, using strncpy over strcpy. Enforce this by adding checks to the code check-in process in your source control system. But as pointed out previously, there are still trade-offs and weaknesses in these approaches. Ultimately the best protection from buffer overruns is to not have them in the first place, but beware that no matter how careful you are, mistakes and bugs happen.
Beware of data leakages. If you include a memory address in an error message, then a hacker can use this to determine what the PIE offset is. This might sound unlikely, but there are cases where programmers have a general error reporting mechanism that includes the contents of all the registers. Some of these likely contain memory addresses. CPU exploits like Spectre and Meltdown show how to access bits of memory contained in the CPU cache. It is unlikely a hacker will find a password this way, but very likely they’ll find a memory address or a stack canary.
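The arithmetic behind such a leak is simple, which is what makes it dangerous. The sketch below uses made-up function names and made-up addresses: if a hacker knows the link-time address of any one symbol and sees its run-time address in an error message, the difference is the PIE offset, and that offset applies to every other symbol in the executable.

```c
#include <stdint.h>

/* The PIE offset is the leaked run-time address minus the link-time
   address of the same symbol. */
uint64_t pie_offset(uint64_t leaked_runtime_addr, uint64_t link_time_addr)
{
    return leaked_runtime_addr - link_time_addr;
}

/* Any other link-time address can then be converted to its run-time address. */
uint64_t runtime_address(uint64_t link_time_addr, uint64_t offset)
{
    return link_time_addr + offset;
}
```

For example, with hypothetical values, a symbol linked at 0x7ec that leaks as 0xaaaaaaaa07ec gives an offset of 0xaaaaaaaa0000, so a routine linked at 0x7b0 must be running at 0xaaaaaaaa07b0.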
If we turn on and incorporate every buffer overflow protection technique and tool available, then chances are that our code will run as much as 50% slower. This might be acceptable in some applications, or parts of applications; however, there are going to be parts of an application that need high performance in order to be competitive or even usable.
If we have a section of code that needs to be heavily optimized, we need a layer or module outside of this code that sanitizes the data passed to the optimized routine. This data checking must be impossible to bypass, and it must guarantee that the data satisfies every assumption the optimized routine makes. Code and security reviews help here by ensuring several sets of eyes have looked for potential problems. The reviewers must have security and hacking expertise, so they know what to look out for.
Placing this checking code in the user interface module is often a mistake. For example, if you’re writing a web application, then the UI is typically written in JavaScript and runs in the browser. Since the JavaScript runs on the user’s machine, hackers can modify it to bypass any error checking. Hackers may dispense with the JavaScript entirely and send bad messages to the web server. The same is true for all client/server applications. The server must validate its data and not rely on the UI layer.
A weakness with the Linux facilities like PIE is that if you link any shared library that disables PIE, then PIE is disabled for the entire application. It’s critical to ensure the completed executable still has PIE enabled; otherwise you need to find the offending libraries and replace them. The same is true for disabling stack execution. There isn’t any good reason to leave PIE off or to allow stack execution, since these protections don’t degrade the performance of your application.
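A server-side gatekeeper of this kind might look like the following C sketch. The names fast_toupper and checked_toupper, the parameter layout, and the return codes are assumptions for this illustration; fast_toupper merely stands in for an optimized routine that trusts its caller.

```c
#include <stddef.h>
#include <string.h>

/* Stands in for the optimized routine: it assumes the caller has already
   verified that the output buffer is large enough. */
static void fast_toupper(const char *in, char *out)
{
    do {
        char c = *in;
        if (c >= 'a' && c <= 'z')
            c -= 'a' - 'A';
        *out++ = c;
    } while (*in++ != '\0');
}

/* Validates untrusted data on the server before the fast routine sees it.
   in_len is the number of bytes actually received.
   Returns 0 on success and -1 if the data is rejected. */
int checked_toupper(const char *in, size_t in_len, char *out, size_t out_size)
{
    const char *end;
    size_t len;

    if (in == NULL || out == NULL)
        return -1;

    /* The terminator must appear within the bytes we received. */
    end = memchr(in, '\0', in_len);
    if (end == NULL)
        return -1;

    /* The text plus its terminator must fit in the output buffer. */
    len = (size_t)(end - in);
    if (len + 1 > out_size)
        return -1;

    fast_toupper(in, out);
    return 0;
}
```

With a 16-byte output buffer, “smith” is converted to “SMITH”, while the 16-character string “This is our Test” is rejected because its terminator would not fit. The design choice is that only checked_toupper is reachable from outside the module, so the check can’t be skipped.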
Similarly, you might have stack canaries enabled in your code, but the shared libraries you’re using may not be compiled with this option. Therefore, your code is all protected, but if hackers find a buffer overflow in a routine in a shared library, then they will likely be able to exploit it. Stack canaries are expensive to use, so often programmers use these sparingly or not at all.
Hackers are clever and look for small chinks in an application’s armor that they can exploit. Hackers are patient, and if they find one chink that isn’t quite enough to use, they keep looking. By combining several bits of information and holes, they can work out how to crack your program’s security.
Summary
This chapter was a small glimpse into the world of hacking. We showed how one of the most famous exploits works, namely, exploiting buffer overrun. We then looked at various solutions to the problem, to make our programs more bulletproof, and also how to fix our own code and use the various tools provided by Linux and GNU C.
Major data breaches at banks, credit agencies, and other online corporate systems happen regularly. Large corporations have the money to hire the best security consultants and use the best tools, yet they’re exploited time and again. Take this as a warning to be diligent and conscious of hacking issues in your own programming.
If you’ve read this far, you should have a good idea of how to write 64-bit Assembly Language programs for Android, iOS, and Linux. You know how to write basic programs, as well as use the FPU and the advanced NEON processor to execute SIMD instructions.
Now it's up to you to go forth and experiment. The only way to learn programming is by doing. Think up your own Assembly Language projects, for example:
- 1.
Control a robot connected to the GPIO pins of an NVidia Jetson Nano.
- 2.
Optimize an AI object recognition algorithm with Assembly Language code, even using the NEON processor.
- 3.
Contribute to the ARM-specific parts of the Linux kernel to improve the operating system’s performance.
- 4.
Enhance GCC to generate more efficient ARM code.
- 5.
Think of something original that might be the next killer application.
Exercises
- 1.
In the discussion of the epilogue code when stack canaries are enabled, we mentioned that the instruction
eor x0, x1, x0
will set X0 to zero if X0 and X1 are equal. Look up the logic rules for the exclusive or instruction and show how this works.
- 2.
Consider the various APIs for strcpy. Choose one for toupper and implement it to prevent a buffer overrun.
- 3.
Turn on stack canaries for the upper.c program from Chapter 15, “Reading and Understanding Code.” Play with it to see it working correctly and a stack overrun being caught.
- 4.
Turn on PIE with some of the existing sample programs to ensure they work okay.
- 5.
Do you think that always turning on maximum protection and living with the performance hit is the safest approach?