© Stephen Smith 2020
S. Smith, Programming with 64-Bit ARM Assembly Language, https://doi.org/10.1007/978-1-4842-5881-1_16

16. Hacking Code

Stephen Smith, Gibsons, BC, Canada
 

For the purpose of this chapter, hacking means gaining illicit access to a computer or network by various tricky means. This chapter offers techniques to hack programs by providing them with bad data. Another form of hacking is social engineering where you trick people into revealing their passwords, or other personal data, over the phone, social media, or e-mail; however, that’s a topic for a different book.

Every programmer should know about hacking. If you don’t know how hackers exploit security weaknesses in program code, then you will unknowingly provide these for them.

Buffer Overrun Hack

As an example, we’ll look at the classic buffer overrun problem, how it happens, how to exploit it, and then how to protect against it. Anyone with security experience will notice that our upper-case routine is error-prone and will likely lead to a buffer overrun vulnerability in our code. Let’s look at what buffer overrun is and how it gets exploited.

Causes of Buffer Overrun

Our upper-case routine happily converts text to upper-case until it hits a NULL (0) character. If the provided text is bigger than the output buffer the caller provides, then this routine overwrites whatever is in memory after it. Where the buffer is located determines the type of attack that’s possible. We’re going to look at this buffer being located on the stack. The weakness of the stack is that this is where function return addresses get stored when we nest function calls. If we arrange our data exactly, we can overwrite a function return address and cause the function to return to a place of our choosing.
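To make the failure mode concrete, here is a minimal C model of an unbounded upper-case loop. It is a sketch rather than the assembly routine from Chapter 14: the name unbounded_upper and its return value are assumptions made for illustration, and ASCII is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C model of an unbounded upper-case routine.
 * It copies until it sees the NUL terminator and never asks how big
 * `out` is, so only the caller can make this safe.
 * Returns the number of bytes written, including the NUL. */
size_t unbounded_upper(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    size_t written = 0;
    for (;;) {
        unsigned char c = (unsigned char)in[written];
        /* Range-shift test: true only for 'a'..'z' (ASCII assumed). */
        if ((unsigned char)(c - 'a') < 26)
            c = (unsigned char)(c - ('a' - 'A'));
        out[written++] = (char)c;
        if (c == '\0')
            return written;
    }
}
```

For the 16-character string "This is our Test", this writes 17 bytes, which is already one more than a 16-byte output buffer can hold.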

There are other forms of buffer overrun attacks if the data is stored in the C runtime heap, or in the program’s data segment. These attacks are like what we will explore for the stack.

If you enter too much data into such a text field, the program typically crashes, since you’ve overwritten important program data and corrupted pointers. Even though the hacker won’t get any proprietary data this way, this is still a good foundation for a denial of service (DoS) attack. If this is a web server and you cause it to crash, then it needs to be restarted and re-initialized. This typically takes several seconds. This means we can send a message to the web server every few seconds to keep it offline.

Stealing Credit Card Numbers

Imagine a credit card company’s web server running a web application that uses our upper-case program, because it needs to convert names to upper-case super fast, so that its web pages are exceptionally responsive. Suppose there’s a page on the web site where you enter your name, and the web application converts it to upper-case; but the web page wasn’t error checking for the length of data and passed it to our upper-case routine as is. Furthermore, for convenience this web application provides several administrative utilities, such as a facility to download all the credit card data, so it can be backed up. These utilities are only available to administrative users with special clearance and require a digital certificate to access. As a hacker, we want to dupe the customer facing part of the web site into giving us access to the administrative part without requiring extra authentication.

In Chapter 6, “Functions and the Stack,” we learned that if a function calls another function, it must store the LR register to the stack, so that it won’t be lost. We’ll modify our main program and upper-case routine to add an intermediate routine that stores LR to the stack and allocates the output buffer on the stack.

Listing 16-1 contains three parts. The first is the skeleton of the credit card company’s web application. It has the usual _start entry point that calls the routine calltoupper. This routine pushes LR to the stack and allocates 16 bytes for the output buffer. The second is the DownloadCreditCardNumbers routine that we shouldn’t be able to access. The third is the specially constructed input data that, if we enter it in a text box, will cause nefarious things to happen.
//
// Assembler program to demonstrate a buffer
// overrun hacking attack.
//
// X0-X2 - parameters to Linux function services
// X1 - address of output string
// X0 - address of input string
// X8 - Linux function number
//
.global _start            // Provide program starting address
DownloadCreditCardNumbers:
// Set up the parameters to print the message
// and then call Linux to do it.
      MOV    X0, #1     // 1 = StdOut
      LDR    X1, =getcreditcards // string to print
      MOV    X2, #30             // length of our string
      MOV    X8, #64             // Linux write system call
      SVC    0          // Call linux to output the string
      RET
calltoupper:
       STR   LR, [SP, #-16]!     // Put LR on the stack
       SUB   SP, SP, #16         // 16 bytes for outstr
       LDR   X0, =instr          // start of input string
       MOV   X1, SP              // address of output string
       BL    toupper
aftertoupper:       // convenient label to use as a breakpoint
       ADD   SP, SP, #16   // Free outstr
       LDR   LR, [SP], #16
       RET
_start:
       BL   calltoupper
// Setup the parameters to exit the program
// and then call Linux to do it.
      MOV     X0, #0      // Use 0 return code
      MOV     X8, #93     // Service command code 93 terminates
      SVC     0           // Call Linux to terminate the program
.data
instr:  .ascii  "This is our Test"     // Correct length string
        .dword 0x00000000004000b0      // overwrite for LR
getcreditcards:       .asciz  "Downloading Credit Card Data! "
        .align 4
Listing 16-1

Main web application for the credit card company

For this example, we use the first optimized example of the upper-case routine, upper.s, from Chapter 14, “Optimizing Code,” that uses the range shift optimization. When this program is compiled and run, you get
Downloading Credit Card Data!

repeated over and over until you hit Ctrl+C. This is in spite of the routine DownloadCreditCardNumbers never being called within the program. We’ll see why the program is put in an infinite loop shortly.

We won’t include the code for the user interface; we’ll just provide the data in our .data section. We want to keep things simple and easy to follow.

Let’s look at what happens to the stack through the process as this function runs.

Stepping Through the Stack

The stack is set up in the calltoupper function. Figure 16-1 shows the values of SP and what is stored in each 16-byte block. Remember that SP must always be 16-byte aligned.
Figure 16-1

The contents of the stack inside the calltoupper function

Remember that the stack grows downward, so when we push something onto the stack, we decrement SP. The pointer we pass for outstr will be 0x7ffffff1f0, and since our loop in the upper-case routine increments, if it overflows its buffer, it overwrites the stored value for LR located at memory address 0x7ffffff200. The strategy is to overwrite LR with an address causing the program to do our bidding.

Listing 16-2 shows the memory addresses of the key instructions we will consider. We want to overwrite the copy of LR saved on the stack with 0x4000b0; that’s the address of the DownloadCreditCardNumbers routine.
00000000004000b0 <DownloadCreditCardNumbers>:
  4000b0:   d2800020   mov   x0, #0x1
...
00000000004000c8 <calltoupper>:
  4000c8:   f81f0ffe   str   x30, [sp, #-16]!
...
00000000004000e8 <_start>:
  4000e8:   97fffff8   bl   4000c8 <calltoupper>
  4000ec:   d2800000   mov  x0, #0x0
Listing 16-2

Excerpts of the objdump output of the program in Listing 16-1

  1. In _start we do the BL to the calltoupper routine. This places the address of the next instruction into LR and jumps to calltoupper. This means LR has the value 0x4000ec at this point.

  2. On entering calltoupper, SP contains 0x7ffffff210. Execute the

    STR   LR, [sp, #-16]!

    instruction, which decrements SP by 16 and copies LR to this memory location. This makes SP 0x7ffffff200, and the 16 bytes there contain

    0x7ffffff200: 0x004000ec 0x00000000 0x00000000 0x00000000

    showing that LR was pushed to the stack.

  3. Execute

    SUB   SP, SP, #16

    This allocates 16 bytes for our output buffer. This reduces the stack pointer to 0x7ffffff1f0, and the contents of the stack are

    0x7ffffff1f0: 0x00000000 0x00000000 0x00000000 0x00000000
    0x7ffffff200: 0x004000ec 0x00000000 0x00000000 0x00000000

  4. The function toupper converts our string to upper-case. It does this correctly for the first part of the string “This is our Test” (16 bytes). Since the string has no NULL (0) terminator, the routine also processes the next byte, 0xb0. This isn’t a lower-case letter, so it is copied as is. The next byte is a NULL (0), so it stops. SP isn’t affected by this series of operations, but on returning from toupper, the stack contains

    0x7ffffff1f0: 0x53494854 0x20534920 0x2052554f 0x54534554
    0x7ffffff200: 0x004000b0 0x00000000 0x00000000 0x00000000

    The first line is our string, converted to upper-case. But notice the return address at 0x7ffffff200 has changed from 0x004000ec to 0x004000b0. This means the return address is now the address of the DownloadCreditCardNumbers routine.

  5. The calltoupper routine cleans up the stack and returns:

    ADD   SP, SP, #16   // Free outstr
    LDR   LR, [SP], #16
    RET

    The key point is that the LDR instruction loads the address of DownloadCreditCardNumbers into LR, then the RET instruction branches to that routine, causing a major data breach.
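
The five steps above can be replayed safely in a small C model. This is a sketch under stated assumptions: the 32-byte array stands in for stack addresses 0x7ffffff1f0 through 0x7ffffff20f, the byte order is little-endian, and the function names are invented for illustration. Nothing here touches the real stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated 32-byte slice of the stack:
 *   bytes  0..15  outstr buffer
 *   bytes 16..23  saved LR (little-endian)
 *   bytes 24..31  padding */
enum { OUTSTR_OFF = 0, LR_OFF = 16, FRAME_SIZE = 32 };

/* Store a 64-bit value byte by byte in little-endian order. */
static void store_le64(unsigned char *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (unsigned char)(v >> (8 * i));
}

/* Load a 64-bit little-endian value. */
static uint64_t load_le64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Replays steps 2-5 on the simulated frame and returns the value the
 * final LDR would place in LR. `input` must contain a zero byte early
 * enough that the copy stays inside the simulated frame. */
uint64_t simulate_calltoupper(const unsigned char *input, uint64_t real_lr)
{
    unsigned char frame[FRAME_SIZE] = {0};
    store_le64(frame + LR_OFF, real_lr);        /* STR LR, [SP, #-16]! */

    size_t i = 0;                                /* unbounded toupper   */
    for (;;) {
        assert(i < (size_t)FRAME_SIZE);          /* keep the model safe */
        unsigned char c = input[i];
        if ((unsigned char)(c - 'a') < 26)
            c = (unsigned char)(c - 32);
        frame[OUTSTR_OFF + i] = c;
        i++;
        if (c == 0)
            break;
    }
    return load_le64(frame + LR_OFF);            /* LDR LR, [SP], #16   */
}
```

Feeding it the 16 characters of “This is our Test” followed by the byte 0xb0 and a zero byte, with a real return address of 0x4000ec, yields 0x4000b0, matching the memory dump in step 4. A short, properly terminated input leaves the saved value untouched.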

     
In performing this hack, we are lucky on a couple of points:
  1. We only need to copy one byte to get the address changed to what we want, since the next byte of the address is NULL (0).

  2. The byte we needed to copy wasn’t one for a lower-case letter, so it was left alone by the toupper routine.

     

A successful hack usually requires some luck and fortuitous circumstances. If this wasn’t the case, we would still have some options. For example, we could jump into the middle of the DownloadCreditCardNumbers routine. The start of a function usually contains a function prologue that can be skipped if we never intend to return from the function successfully. After all, we don’t care if the program continues to work correctly, only that we get our downloaded credit card numbers.
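
The two lucky points can be expressed as a small check on the bytes that would be copied. The helper below is hypothetical and only models the conditions discussed here: the copy stops at the first zero byte, and any byte in the range 'a' to 'z' would be rewritten by the upper-case conversion. It assumes the bytes after the first zero already match the saved address.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the little-endian address bytes would pass through the
 * upper-case copy unchanged, 0 if some byte would be altered.
 * Hypothetical helper for illustration only. */
int bytes_survive_toupper(const unsigned char *bytes, size_t n)
{
    assert(bytes != NULL);
    for (size_t i = 0; i < n; i++) {
        if (bytes[i] == 0)
            return 1;   /* copy stops at the first zero byte */
        if ((unsigned char)(bytes[i] - 'a') < 26)
            return 0;   /* would be converted to upper-case  */
    }
    return 1;
}
```

The bytes of 0x4000b0 (b0 00 40 ...) pass this check, whereas an address ending in 0x61, the code for 'a', would arrive as 0x41 and point somewhere else.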

The program goes into an infinite loop because we don’t do a BL to call DownloadCreditCardNumbers; we arrive there through a RET instruction. Nothing updates LR to a new value, so it still holds 0x4000b0, and the RET at the end of DownloadCreditCardNumbers jumps to the start of that routine again.

This was an example of one particular buffer overrun exploit; however, hackers have many ways to exploit buffer overruns, whether the data is on the stack, in the C memory heap, or in our data segment. Let’s look at several ways to avoid buffer overrun problems.

Mitigating Buffer Overrun Vulnerabilities

To combat buffer overrun problems, there are techniques we can use in our code and that our tools can provide to help us. In this section, we’ll look at both. First of all, let’s consider the bad design of the function parameters to our upper-case routine. Before we consider a solution, let’s look at the root cause of many buffer overrun problems, the C runtime’s strcpy function, and the various solutions proposed to fix this design.

Don’t Use strcpy

The C runtime’s strcpy routine has the following prototype:
char * strcpy ( char * destination,
      const char * source );
It copies characters from source to destination, until a NULL (0) character is encountered. This results in buffer overrun vulnerabilities like we just encountered. The original suggested solution was to replace all occurrences of strcpy with strncpy:
char * strncpy ( char * destination,
      const char * source, size_t num );
Here you place the size of the destination in num, and it stops copying at that point. That stops the buffer overrun at this point, but if the source is num characters or longer, the destination string is not NULL (0) terminated, and this can lead to a buffer overrun later in the code. One suggestion is to always do the following:
strncpy( dest, source, num );
dest[num-1] = '\0';

This NULL terminates the string, but it requires the programmer to remember to always do this. Perhaps, under deadline pressure, this may be forgotten.
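
One way to avoid relying on memory is to wrap the two steps in a single helper. The sketch below assumes a hypothetical name, copy_terminated, and simply packages the strncpy call and the termination shown above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical wrapper: copies at most num-1 characters and always
 * terminates the result, so callers cannot forget the second step. */
char *copy_terminated(char *dest, const char *source, size_t num)
{
    assert(dest != NULL && source != NULL && num > 0);
    strncpy(dest, source, num);
    dest[num - 1] = '\0';
    return dest;
}
```

Copying “This is our Test” into an 8-byte buffer this way leaves a terminated 7-character string. The text is silently truncated, which is one of the trade-offs discussed in this section.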

A new function was then introduced to the BSD C runtime, strlcpy, that always NULL terminates the destination string:
size_t strlcpy(char *destination,
      const char *source, size_t size);

This function eliminates that problem, as the destination is always NULL (0) terminated. However, it is not part of the ISO C standard, and it was missing from the GNU C library until glibc 2.38 added it.
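
For platforms without it, the behavior is small enough to sketch. The version below uses the hypothetical name sketch_strlcpy so it cannot clash with a system-provided strlcpy. It follows the documented BSD convention of returning the length of the source, so a result greater than or equal to size signals truncation.

```c
#include <assert.h>
#include <string.h>

/* Sketch of a strlcpy-style copy (hypothetical name).
 * Always terminates the destination when size > 0.
 * Returns strlen(source); a result >= size means truncation. */
size_t sketch_strlcpy(char *destination, const char *source, size_t size)
{
    assert(source != NULL);
    size_t srclen = strlen(source);
    if (size > 0) {
        assert(destination != NULL);
        size_t n = (srclen < size - 1) ? srclen : size - 1;
        memcpy(destination, source, n);
        destination[n] = '\0';
    }
    return srclen;
}
```

Note that the function still calls strlen on the source, so it relies on the source being properly terminated.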

A criticism of both strncpy and strlcpy type functions is that they eliminate the ability to nest these functions to quickly build larger more complicated strings. This is because you don’t easily know the remaining buffer length if you’re concatenating strings together. Another suggested solution is the following:
char * strecpy ( char * destination,
      const char * source, char * end );

This strecpy passes in a pointer to the end of the destination buffer. This is handy when you nest calls, since end stays constant, unlike a remaining length that shrinks as you build the string. Again, this is a nonstandard function and not part of the C runtime.
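
A minimal sketch shows why the constant end pointer makes chaining convenient. The name sketch_strecpy is hypothetical; the parameter order follows the prototype above, and returning a pointer to the terminating NULL is an assumption made so that calls can be chained.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of an strecpy-style copy (hypothetical name).
 * `end` points one past the last byte of the destination buffer.
 * Returns a pointer to the terminating NUL it wrote. */
char *sketch_strecpy(char *destination, const char *source, char *end)
{
    assert(destination != NULL && source != NULL && destination < end);
    while (destination < end - 1 && *source != '\0')
        *destination++ = *source++;
    *destination = '\0';
    return destination;
}
```

Building a string then looks like p = sketch_strecpy(buf, "Hello, ", end); p = sketch_strecpy(p, "World!", end); with the same end passed to every call. With a 12-byte buffer the result is truncated to “Hello, Worl”, but nothing is written past the buffer.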

These functions all stop overwriting the destination buffer and prevent data corruption. However, they all have a problem that they could allow the leakage of sensitive data. Suppose the source isn’t NULL (0) terminated and the source buffer is smaller than the destination buffer; then the function will copy data until the destination buffer is full. This means we’ve copied some possibly sensitive data from past the end of the source buffer into the destination buffer. If this is displayed later, it might give away some sort of sensitive or helpful information to hackers. This leads to another form:
errno_t strncpy_s(char * destination, size_t destmax,
    const char * source, size_t srcmax);

In strncpy_s we provide the size of both buffers and the function returns an error code to tell us what happened.
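
The idea of passing both sizes and reporting an error can be sketched as follows. This is not the C11 Annex K strncpy_s; the name copy_checked and the error codes are assumptions chosen for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical error codes for this sketch. */
enum { COPY_OK = 0, COPY_BAD_ARG = 1,
       COPY_SRC_UNTERMINATED = 2, COPY_TOO_LONG = 3 };

/* Copies source to destination only if the source is terminated
 * within srcmax bytes and the whole string fits in destmax bytes.
 * On any failure the destination is left as an empty string. */
int copy_checked(char *destination, size_t destmax,
                 const char *source, size_t srcmax)
{
    if (destination == NULL || source == NULL || destmax == 0)
        return COPY_BAD_ARG;
    destination[0] = '\0';                    /* fail closed */

    const char *nul = memchr(source, '\0', srcmax);
    if (nul == NULL)
        return COPY_SRC_UNTERMINATED;         /* never read past srcmax */

    size_t len = (size_t)(nul - source);
    if (len >= destmax)
        return COPY_TOO_LONG;

    assert(len + 1 <= destmax);               /* invariant before copying */
    memcpy(destination, source, len + 1);     /* includes the NUL */
    return COPY_OK;
}
```

Because the source is never read beyond srcmax, an unterminated source produces an error instead of leaking whatever follows it in memory.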

I went through this discussion to point out that there are a lot of trade-offs in fixing API designs. When making the upper-case routine more secure, there are quite a few trade-offs to consider. We’ll present a list of recommendations toward the end of this chapter, but first let’s see what the operating system and GNU compiler can do to help us.

PIE Is Good

The exploit we performed previously relied upon us knowing the address of the DownloadCreditCardNumbers routine. The assumption is that we learned this from somewhere else, perhaps obtaining an illicit copy of the application’s source code, or the build map file from the dark web.

With modern virtual memory systems, the operating system can give a process any memory addresses it likes; they don’t need to have any relation to real memory addresses. This gave rise to a feature called position-independent executables (PIE) introduced to Linux around 2005. With this feature, an executable is loaded with a different base address each time it is run. This is a special case of address space layout randomization (ASLR), and you often see it referred to by either name.

This sounds good, so why did our preceding exploit work? Why didn’t PIE defeat us? The reason is that you need to turn on PIE in the command line for the ld command. This is a conservative approach: by turning it on, you’re acknowledging that you don’t have any code that can’t be relocated and that all of the shared libraries you’re using are relocatable as well. To turn on PIE, we need to add -pie to the list of options for the ld command. If we do this, we get the following:
smist08@kali:~/asm64/Chapter 16$ make
as   main.s -o main.o
as   upper.s -o upper.o
ld -pie -o upperpie main.o upper.o
smist08@kali:~/asm64/Chapter 16$ ./upperpie
Segmentation fault
smist08@kali:~/asm64/Chapter 16$

If we debug this with gdb, we’ll see it runs as before, but all the addresses are changed. Often when debugging, we turn off PIE and only enable it for release to make decoding what is going on easier.
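
A quick way to observe the changing addresses without a debugger is to have a C program report where its own data and stack live. The sketch below is an assumption-laden illustration: the function name describe_layout is invented, and whether the values differ between runs depends on how the program was linked and on the system’s ASLR settings.

```c
#include <assert.h>
#include <stdio.h>

static int data_marker;   /* lives in the program's data/bss segment */

/* Formats the address of a global and of a local variable into `out`.
 * Returns the number of characters snprintf would have written. */
int describe_layout(char *out, size_t outsize)
{
    assert(out != NULL && outsize > 0);
    int stack_marker = 0;
    return snprintf(out, outsize, "data=%p stack=%p",
                    (void *)&data_marker, (void *)&stack_marker);
}
```

Printing this line from main and running the program several times, once linked as a position-independent executable and once not, should show the data address moving only in the PIE build.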

Note

Apple’s iOS operating system turns on PIE by default. If your program can’t handle it, then you need to deliberately turn it off.

This still isn’t ideal; it’s better since the credit card numbers didn’t get stolen, but the program still crashed. This can lead to an easy DoS attack for hackers to make our application unavailable.

We mentioned that the program needs to be relocatable. What stops your program being relocatable? Mostly hard-coding memory addresses in your data section that the linker doesn’t know about. By contrast, when we use LDR with a label, the assembler stores the address in memory for the instruction to load, and it also creates a relocation record so the loader can fix up the address.

Apple enforces using ADR instead of LDR to reduce the number of relocation records that need to be processed. In Chapter 2, “Loading and Adding,” we showed how to load a register with a MOV and three MOVK instructions. If you use this technique to load a memory address, then your program won’t be relocatable as the loader has no idea what you’re doing and can’t fix up the address.

It’s a good practice to enable PIE for any C or Assembly Language programs. PIE isn’t perfect, and hackers have found ways around it. But it introduces a second step; hackers usually require a second vulnerability in addition to the buffer overrun to hack your program.

Poor Stack Canaries Are the First to Go

The GNU C compiler has a feature to detect buffer overruns. The idea is, in any routine that contains a string buffer located on the stack, to add extra code to place a secret random value next to the stored function return address. Then this value is tested before the function returns, and if corrupted, then a buffer overrun has occurred, and the program is terminated. These stack canaries are like the proverbial canaries in a coal mine, because when something goes wrong, they’re the first to go and warn us that something bad is happening.

The source code that accompanies this book has a version of upper.c from Chapter 15, “Reading and Understanding Code,” that introduces a buffer overrun. Like PIE, this is an optional feature and we need to enable it with a gcc command line option. Here we use -fstack-protector-all, which is the most aggressive form of this feature. If we add this, compile, and run, we get the following:
smist08@kali:~/asm64/Chapter 16$ make
gcc -o uppercanary -fstack-protector-all -O3 upper.c
smist08@kali:~/asm64/Chapter 16$ ./uppercanary
Input: This is a test!xxxxxxxxxxxxxxxxxxxxyyyyyyandevenlongerandlongerandlonger
Output: THIS IS A TEST!XXXXXXXXXXXXXXXXXXXYYYYYYANDEVENLONGERANDLONGERANDLONGER
*** stack smashing detected ***: <unknown> terminated
Aborted
smist08@kali:~/asm64/Chapter 16$
This is great, as it caught our buffer overrun, but it is quite expensive since it adds quite a few instructions to every function. Let’s look at the code that’s generated inside our functions. The following is extracted from an objdump of this program:
00000000000008e8 <routine>:
 8e8:   a9be7bfd    stp   x29, x30, [sp, #-32]!
 8ec:   90000080    adrp  x0, 10000 <__FRAME_END__+0xf3c0>
 8f0:   910003fd    mov   x29, sp
 8f4:   f947e400    ldr   x0, [x0, #4040]
 8f8:   f9400001    ldr   x1, [x0]
 8fc:   f9000fe1    str   x1, [sp, #24]
 900:   d2800001    mov   x1, #0x0                  // #0
// body of routine ...
 904:   f9400fe1    ldr   x1, [sp, #24]
 908:   f9400000    ldr   x0, [x0]
 90c:   ca000020    eor   x0, x1, x0
 910:   b5000080    cbnz  x0, 920 <routine+0x38>
 918:   a8c27bfd    ldp   x29, x30, [sp], #32
 91c:   d65f03c0    ret
 920:   97ffff74    bl    6f0 <__stack_chk_fail@plt>

We add four instructions to the function prologue and four instructions to the function epilogue.

Let’s go through the instructions in the function prologue one by one:
  1. STP: Standard instruction to store the LR and FP to the stack. It subtracts 32 from the stack pointer, rather than 16, to make room for the stack canary.

  2. ADRP: Standard instruction to load a pointer to the page that contains our data segment. Here we’re only interested in the stack canary value, but most routines will use this for other purposes as well.

  3. MOV: Move SP to FP, standard instruction to set up the C stack frame.

  4. LDR: Load the address of the stack canary. Offset 4040 is where the pointer to the stack canary is stored. The canary itself is a random value generated by the C runtime initialization code.

  5. LDR: Load the value of the stack canary into register X1.

  6. STR: Store the stack canary to the correct place on the stack to guard the function return pointer (pushed LR).

  7. MOV: Overwrite the copy of the stack canary in X1 with zero, so it isn’t left lying around in a register. This is to try and prevent data leakage.

     
Next, let’s go through the instructions in the function epilogue:
  1. LDR: Load the stack canary from the stack into register X1.

  2. LDR: Load the original stack canary value from the C runtime’s data segment. In this case, X0 still contains the pointer, so we don’t need to rebuild it.

  3. EOR: Compare the two values. Exclusive OR’ing two registers has the same effect as subtracting them, in that the result is zero if they are the same (see Exercise 1 in this chapter).

  4. CBNZ: If X0 is nonzero, the two values differ, so we have a problem and jump to the BL instruction after the RET instruction. CBNZ tests the register directly and doesn’t use the condition flags.

  5. LDP: Load LR and FP back from the stack. If we got this far, we are reasonably confident that LR hasn’t been overwritten because the stack canary survived.

  6. RET: Normal subroutine return.

  7. BL: Call to error reporting routine. This routine terminates the program rather than returning.
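
The same prologue and epilogue logic can be modeled in C. This is a hand-written sketch, not what the compiler emits: the struct sim_frame, the function copy_and_check, and the fixed secret passed in by the caller are all assumptions for illustration, and the overflow is confined to the simulated frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated frame: buffer first, canary next, saved LR last, so an
 * overflow running upward from buf hits the canary before the LR. */
struct sim_frame {
    unsigned char buf[16];
    uint64_t canary;
    uint64_t saved_lr;
};

/* Copies `len` bytes into the frame without checking them against
 * sizeof buf (the bug), then performs the epilogue check.
 * Returns 1 if the canary was corrupted, 0 if it survived. */
int copy_and_check(const unsigned char *data, size_t len, uint64_t secret)
{
    struct sim_frame frame = { {0}, secret, 0x4000ec };
    unsigned char *mem = (unsigned char *)&frame;

    assert(data != NULL && len <= sizeof frame);  /* keep the model safe */
    for (size_t i = 0; i < len; i++)              /* overflowing copy    */
        mem[i] = data[i];

    /* Same idea as EOR + CBNZ: the XOR is zero only if both are equal. */
    return (frame.canary ^ secret) != 0;
}
```

Sixteen bytes fit in the buffer and leave the canary alone; a seventeenth byte lands on the canary and is detected before the saved LR would be used.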

     

Stack canaries are quite effective, but if a hacker discovers the value used in a running process, they can construct a buffer overrun exploit that writes the correct canary value back. Plus, having your process terminate like this is never a good thing.

Preventing Code Running on the Stack

Originally stack overflow exploits would copy a hacker’s Assembly Language program as a regular part of the buffer, then overwrite the function’s return address to cause this code to execute. The ARM CPU’s memory protection hardware lets each page of memory be marked as readable, writable, or executable. To prevent code running from the stack, Linux removed the bit allowing code to execute there and made the stack read and write only. With a simple example like ours, this kind of attack is hard to demonstrate without adding a lot of extra compile and link switches to enable stack code execution, since it’s firmly off by default.

This doesn’t make executing code on the stack impossible, but it makes it much more difficult, requiring an extra exploit to disable this feature. The other danger is that a shared library you’re using disables this feature and you’re unaware of it.
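
On Linux, one way to check for this is to look at the permissions column of /proc/self/maps, where the stack appears as a line ending in [stack]. The helper below is a sketch with a hypothetical name; it only parses one line in that format and leaves reading the file to the caller.

```c
#include <assert.h>
#include <string.h>

/* Parses one line in /proc/<pid>/maps format:
 *   "start-end perms offset dev inode path", perms like "rw-p".
 * Returns 1 if the mapping is executable, 0 if not, -1 if malformed. */
int maps_line_is_executable(const char *line)
{
    assert(line != NULL);
    const char *space = strchr(line, ' ');
    if (space == NULL || strlen(space + 1) < 4)
        return -1;
    const char *perms = space + 1;    /* e.g. "rw-p" */
    return perms[2] == 'x';
}
```

A program could run this over each line of its own map at startup and log a warning if the [stack] line ever reports an executable mapping.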

Trade-offs of Buffer Overflow Mitigation Techniques

Care needs to be taken when designing our APIs to prevent security vulnerabilities. We should only use routines that provide some protection against buffer overrun, for example, using strncpy over strcpy. Enforce this by adding checks to the code check-in process in your source control system. But as pointed out previously, there are still trade-offs and weaknesses in these approaches. Ultimately the best protection from buffer overruns is to not have them in the first place, but beware that no matter how careful you are, mistakes and bugs happen.

Beware of data leakages. If you include a memory address in an error message, then a hacker can use this to determine what the PIE offset is. This might sound unlikely, but there are cases where programmers have a general error reporting mechanism that includes the contents of all the registers. Some of these likely contain memory addresses. CPU exploits like Spectre and Meltdown show how to access bits of memory contained in the CPU cache. It is unlikely a hacker will find a password this way, but very likely they’ll find a memory address or a stack canary.
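
The arithmetic behind the PIE concern is short enough to show. In this sketch the function name and the leaked address are made-up values for illustration; the offsets 0x8e8 and 0x6f0 are the ones that appear in the objdump excerpt earlier in the chapter.

```c
#include <assert.h>
#include <stdint.h>

/* Given one leaked run-time address and the link-time offset of the
 * same symbol, returns the randomized load base of the executable. */
uint64_t load_base_from_leak(uint64_t leaked_address, uint64_t symbol_offset)
{
    assert(leaked_address >= symbol_offset);
    return leaked_address - symbol_offset;
}
```

If an error message revealed that routine was running at the hypothetical address 0x5555d12348e8, subtracting its offset 0x8e8 gives a base of 0x5555d1234000, and every other symbol follows by addition. This is why a single leaked address undoes the randomization for the whole executable.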

If we turn on and incorporate every buffer overflow protection technique and tool available, then chances are that our code will run as much as 50% slower. This might be acceptable in some applications, or parts of applications; however, there are going to be parts of an application that need high performance in order to be competitive or even usable.

If we have a section of code that needs to be heavily optimized, we need a layer or module outside of this code that sanitizes the data passed to the optimized routine. This checking layer must be impossible to bypass, and it must guarantee that the data satisfies every assumption the optimized routine makes. Code and security reviews help here, since several sets of eyes look for potential problems. The reviewers must have security and hacking expertise, so they know what to look out for.
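
As a sketch of such a layer for the upper-case routine, the wrapper below validates lengths before handing the data to a fast routine that trusts its caller. Both names are hypothetical, and fast_upper is a plain C stand-in for the optimized assembly, not the routine from Chapter 14.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the heavily optimized routine: no length checks. */
static void fast_upper(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    do {
        unsigned char c = (unsigned char)*in;
        if ((unsigned char)(c - 'a') < 26)
            c = (unsigned char)(c - 32);
        *out++ = (char)c;
    } while (*in++ != '\0');
}

/* Validation layer: returns 0 on success, -1 if the input is rejected.
 * `inmax` is the size of the input buffer, `outsize` of the output. */
int checked_upper(const char *in, size_t inmax, char *out, size_t outsize)
{
    if (in == NULL || out == NULL)
        return -1;
    const char *nul = memchr(in, '\0', inmax);
    if (nul == NULL)
        return -1;                       /* unterminated input */
    if ((size_t)(nul - in) + 1 > outsize)
        return -1;                       /* would not fit      */
    fast_upper(in, out);
    return 0;
}
```

Keeping fast_upper static means other modules can only reach it through checked_upper, which is one simple way to make the check hard to bypass.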

Note

Placing this code in the user interface module is often a mistake. For example, if you’re writing a web application, then the UI is typically written in JavaScript and runs in the browser. Since that JavaScript runs on the user’s machine, hackers can modify it to bypass any error checking. Hackers may dispense with the JavaScript entirely and send bad messages to the web server. The same is true for all client/server applications. The server must validate its data and not rely on the UI layer.

A weakness with the Linux facilities like PIE is that if you link any shared library that disables PIE, then PIE is disabled for the entire application. It’s critical to ensure the completed executable still has PIE enabled; otherwise you need to find the offending libraries and replace them. The same is true for disabling stack execution. There isn’t any good reason to not use PIE, or prevent stack execution, since these don’t degrade the performance of your application.

Similarly, you might have stack canaries enabled in your code, but the shared libraries you’re using may not be compiled with this option. Therefore, your code is all protected, but if hackers find a buffer overflow in a routine in a shared library, then they will likely be able to exploit it. Stack canaries are expensive to use, so often programmers use these sparingly or not at all.

Hackers are clever and look for small chinks in an application’s armor that they can exploit. Hackers are patient, and if they find one chink that isn’t quite enough to use, they keep looking. By combining several bits of information and holes, they can work out how to crack your program’s security.

Summary

This chapter was a small glimpse into the world of hacking. We showed how one of the most famous exploits works, namely, exploiting buffer overrun. We then looked at various solutions to the problem, to make our programs more bulletproof, and also how to fix our own code and use the various tools provided by Linux and GNU C.

Major data breaches at banks, credit agencies, and other online corporate systems happen regularly. Large corporations have the money to hire the best security consultants and use the best tools, yet they’re exploited time and again. Take this as a warning to be diligent and conscious of hacking issues in your own programming.

If you’ve read this far, you should have a good idea of how to write 64-bit Assembly Language programs for Android, iOS, and Linux. You know how to write basic programs, as well as use the FPU and the advanced NEON processor to execute SIMD instructions.

Now it's up to you to go forth and experiment. The only way to learn programming is by doing. Think up your own Assembly Language projects, for example:

  1. Control a robot connected to the GPIO pins of an NVidia Jetson Nano.

  2. Optimize an AI object recognition algorithm with Assembly Language code, even using the NEON processor.

  3. Contribute to the ARM-specific parts of the Linux kernel to improve the operating system’s performance.

  4. Enhance GCC to generate more efficient ARM code.

  5. Think of something original that might be the next killer application.

     

Exercises

  1. In the discussion of the epilogue code when stack canaries are enabled, we mentioned that the instruction

    eor x0, x1, x0

    will set X0 to zero if X0 and X1 are equal. Look up the logic rules for the exclusive or instruction and show how this works.

  2. Consider the various APIs for strcpy. Choose one for toupper and implement it to prevent a buffer overrun.

  3. Turn on stack canaries for the upper.c program from Chapter 15, “Reading and Understanding Code.” Play with it to see it working correctly and a stack overrun being caught.

  4. Turn on PIE with some of the existing sample programs to ensure they work okay.

  5. Do you think that always turning on maximum protection and living with the performance hit is the safest approach?

     