S. Smith, Raspberry Pi Assembly Language Programming, https://doi.org/10.1007/978-1-4842-5287-1_14

14. Reading and Understanding Code

Stephen Smith

Gibsons, BC, Canada

We’ve now learned quite a bit of ARM 32-bit Assembly language, so one of the things we can do is read another programmer’s code. Reading other programmers’ code is a great way to add to our toolkit of tips and tricks and improve our own coding. We’ll review some places where you can find Assembly source code for the ARM32. Then we’ll look at how the GNU C compiler writes Assembly code and how we can analyze it. We’ll look at the NSA’s Ghidra hacking tool that can convert Assembly code back into C code, at least approximately.

We’ll use our uppercase program to see how the C compiler writes Assembly code and then examine how Ghidra can take that code and reconstitute the C code. We’ll also look at how the C compiler deals with the lack of an integer division instruction in older ARM processors.

Raspbian and GCC

One of the many nice things about working with the Raspberry Pi and GNU Compiler Collection is that they are open source. That means you can browse through the source code and peruse the Assembly parts contained there.

They are available in the following GitHub repositories:

Raspbian Linux kernel: https://github.com/raspberrypi/linux
GCC source code: https://github.com/gcc-mirror/gcc

Clicking the “Clone or download” button and choosing “Download ZIP” is the easiest way to obtain the source. Within all this source code, a couple of good folders for perusing ARM 32-bit Assembly source code are

Raspbian Linux kernel:
- arch/arm/common
- arch/arm/kernel
- arch/arm/crypto
GCC:
- libgcc/config/arm

Note

The arch/arm/crypto folder has several cryptographic routines implemented on the NEON coprocessor.

The Assembly source code for these is in *.S files (note the uppercase S). Raspbian is based on Debian Linux. Both Debian Linux and GCC support dozens of processor architectures, so when looking for Assembly source code, make sure you look for *.S files in an arm folder. If you are interested, you could compare the ARM 32-bit Assembly files to the files for other processors.

The source code for these uses both GNU Assembler directives like .MACRO and C preprocessor directives like #define and #ifdef. If you are going to read this source code, it helps to brush up on the C preprocessor.

The GNU compiler supports older ARM processors than those contained in any Raspberry Pi, as well as configurations of the ARM processor that the Raspberry Pi Foundation never used. For instance, there is a library to implement IEEE 754 floating-point for ARM processors without an FPU. However, all Raspberry Pis do have an FPU, so this isn’t used.

Division Revisited

In Chapter 10, “Multiply, Divide, and Accumulate,” we assumed we had a newer Raspberry Pi and used the newer ARM processor’s SDIV or UDIV instructions. We just left a comment that if you wanted to divide on an older Pi, you should use the FPU or roll your own. We never covered how to roll our own. Another approach is to see what the C compiler does. Consider the simple C program in Listing 14-1.

#include <stdio.h>

int main()
{
        int x = 100;
        int y = 25;
        int z;

        z = x / y;
        printf("%d / %d = %d\n", x, y, z);
        return(0);
}

Listing 14-1

Simple C program that divides two numbers

We can compile this with

gcc -o div div.c

Note

We can’t use any of the -O flag options, because any optimization will remove the expression and the compiler will just plug 4 in for z.

We can look at the generated Assembly code with

objdump -d div

Because we didn’t compile with an -O option, there is a lot of code, but in the middle of the main routine, we see

10454: e51b100c ldr r1, [fp, #-12]
10458: e51b0008 ldr r0, [fp, #-8]
1045c: eb00000b bl 10490 <__divsi3>
10460: e1a03000 mov r3, r0

which sets up and calls a division routine called __divsi3. The Assembly for the __divsi3 routine is also present in the output from objdump. It is very long and contains code like

104e0: e1530f81 cmp r3, r1, lsl #31
104e4: e0a00000 adc r0, r0, r0
104e8: 20433f81 subcs r3, r3, r1, lsl #31

and repeated 32 times. What’s going on here? Since we can download the source code for gcc and all its libraries, we can look at the source code. If we search for the definition of __divsi3, we will find it in libgcc/config/arm/lib1funcs.S. This source code is confusing, because it contains versions of its routines for different generations of ARM, as well as having versions that use Thumb code. We’ll cover Thumb code in Chapter 15, “Thumb Code,” but until then we can ignore those parts.

Listing 14-2 is the main part of the division routine.

        ARM_FUNC_START divsi3
        ARM_FUNC_ALIAS aeabi_idiv divsi3

        cmp     r1, #0
        beq     LSYM(Ldiv0)
LSYM(divsi3_skip_div0_test):
        eor     ip, r0, r1              @ save the sign of the result.
        do_it   mi
        rsbmi   r1, r1, #0              @ loops below use unsigned.
        subs    r2, r1, #1              @ division by 1 or -1 ?
        beq     10f
        movs    r3, r0
        do_it   mi
        rsbmi   r3, r0, #0              @ positive dividend value
        cmp     r3, r1
        bls     11f
        tst     r1, r2                  @ divisor is power of 2 ?
        beq     12f

        ARM_DIV_BODY r3, r1, r0, r2

        cmp     ip, #0
        do_it   mi
        rsbmi   r0, r0, #0
        RET

Listing 14-2

Main part of the libgcc division routine

The routine starts by checking for division by 0, which is an error. It saves the sign of the result (the exclusive OR of the two operands) so the answer can be negated at the end if needed. It then looks for the easy cases: division by 1 or -1, a dividend no larger than the divisor, and a divisor that is a power of 2.

There are a lot of macros used in this code. The one that generates the actual division is ARM_DIV_BODY, shown in Listing 14-3.

.macro ARM_DIV_BODY dividend, divisor, result, curbit
        clz     \curbit, \dividend
        clz     \result, \divisor
        sub     \curbit, \result, \curbit
        rsbs    \curbit, \curbit, #31
        addne   \curbit, \curbit, \curbit, lsl #1
        mov     \result, #0
        addne   pc, pc, \curbit, lsl #2
        nop
        .set    shift, 32
        .rept   32
        .set    shift, shift - 1
        cmp     \dividend, \divisor, lsl #shift
        adc     \result, \result, \result
        subcs   \dividend, \dividend, \divisor, lsl #shift
        .endr
.endm

Listing 14-3

Main body of the division routine

Within this macro is

        .set    shift, 32
        .rept   32
        .set    shift, shift - 1
        cmp     \dividend, \divisor, lsl #shift
        adc     \result, \result, \result
        subcs   \dividend, \dividend, \divisor, lsl #shift
        .endr

which generates the repetitive code we see. This is a form of optimization called loop unrolling, where if a loop executes a fixed number of times, we just duplicate the code that many times. This saves us an expensive branch instruction, as well as the arithmetic calculating the loop index. Division will be used often enough that we want the code as fast as possible, and we can spare the extra code space to achieve this.

The algorithm for this division is basically the same long division algorithm you learned in elementary school. It is just a bit simpler in binary since there can only be two answers at each step, whether to put a 1 in the result or not.

Note

If we included the -march=armv8-a compiler switch, then the compiler would use an SDIV instruction instead of this function call. GCC will use advanced ARM features if it knows they are available.

Sadly, the Assembly source code contained in gcc and Linux isn’t always as well documented as we would like, but it does give us quite a bit of source code to ponder and learn from.

You might want to look at ieee754-sf.S and ieee754-df.S in the same folder as lib1funcs.S, gcc/libgcc/config/arm. These are the implementations of floating-point in single and double precision for ARM processors that don’t have an FPU. It’s interesting to see all the work the FPU does for us.

Code Created by GCC

In the last section, we looked at some code generated by gcc to see how it handles the lack of an SDIV instruction. Let’s look at how gcc would write our code. We’ll code our uppercase routine in C and compare the generated code to what we wrote. For this example, we want gcc to do as good a job as possible, so we will use the -O3 option to get maximal optimization.

We create upper.c from Listing 14-4.

#include <stdio.h>

int mytoupper(char *instr, char *outstr)
{
        char cur;
        char *orig_outstr = outstr;

        do
        {
                cur = *instr;
                if ((cur >= 'a') && (cur <= 'z'))
                {
                        cur = cur - ('a'-'A');
                }
                *outstr++ = cur;
                instr++;
        } while (cur != '\0');
        return( outstr - orig_outstr );
}

#define BUFFERSIZE 250

char *tstStr = "This is a test!";
char outStr[BUFFERSIZE];

int main()
{
        mytoupper(tstStr, outStr);
        printf("Input: %s\nOutput: %s\n", tstStr, outStr);
        return(0);
}

Listing 14-4

C implementation of our mytoupper routine

We can compile this with

gcc -O3 -o upper upper.c

then run objdump to see the generated code

objdump -d upper >od.txt

We get Listing 14-5.

00010318 <main>:
10318: e59f2048 ldr r2, [pc, #72] ; 10368 <main+0x50>
1031c: e59f3048 ldr r3, [pc, #72] ; 1036c <main+0x54>
10320: e92d4010 push {r4, lr}
10324: e5921000 ldr r1, [r2]
10328: e1a02001 mov r2, r1
1032c: e4d24001 ldrb r4, [r2], #1
10330: e2833001 add r3, r3, #1
10334: e2440061 sub r0, r4, #97 ; 0x61
10338: e3500019 cmp r0, #25
1033c: e2440020 sub r0, r4, #32
10340: 95430001 strbls r0, [r3, #-1]
10344: 9afffff8 bls 1032c <main+0x14>
10348: e3540000 cmp r4, #0
1034c: e5434001 strb r4, [r3, #-1]
10350: 1afffff5 bne 1032c <main+0x14>
10354: e59f2010 ldr r2, [pc, #16] ; 1036c <main+0x54>
10358: e59f0010 ldr r0, [pc, #16] ; 10370 <main+0x58>
1035c: ebffffe1 bl 102e8 <printf@plt>
10360: e1a00004 mov r0, r4
10364: e8bd8010 pop {r4, pc}
10368: 00021028 .word 0x00021028
1036c: 00021030 .word 0x00021030
10370: 0001050c .word 0x0001050c

Listing 14-5

Assembly code generated by the C compiler for our uppercase function

A few things to notice about this listing are as follows:

- The compiler automatically inlined the mytoupper function like our macro version.
- The compiler knows about the range optimization and shifted the range, so it only does one comparison.
- The compiler made good use of the registers and didn’t create a stack frame. It only uses five registers (R0 to R4), so the only callee-saved register it needs to push/pop is R4, along with LR for the call to printf.
- The compiler knows how to use conditional instructions.
- The compiler took a slightly different approach to adding the conditional, putting it on a store instruction, so the converted character is only stored if the character is lowercase. It then jumps back to the top of the loop, since it knows that a lowercase character can’t be the terminating NUL. Otherwise, it falls through, stores the unconverted character, checks for NUL, and loops if it isn’t.

Overall, the compiler did a good job of compiling our code, just taking a couple extra instructions over what we wrote in the last chapter. GCC has supported the ARM processor for 20 years now. ARM Holdings has made major contributions to GCC to improve the ARM support. All the work over this time has led to a robust and performant system, and the best part is that it is all open source.

This is why many Assembly language programmers start with C code, then only recode in Assembly if the C code isn’t efficient. This usually happens when the complexity is higher and the need for speed is greater, such as the code in libgcc for floating-point arithmetic and division, where speed is crucial, and pure Assembler is better at bit-level manipulations than C.

In Chapter 8, “Programming GPIO Pins,” we looked at programming the GPIO pins using the GPIO controller’s memory registers. This sort of code will confuse the optimizer. Often the optimizer needs to be turned off, or it optimizes away the code that accesses these locations. This is because we write to memory locations and never read them, and read memory we never set. There are keywords to help the optimizer, such as C’s volatile qualifier, but in the end, Assembler can result in quite a bit better code, because you are working against the C optimizer, which doesn’t know what the GPIO controller is doing with this memory.

Reverse Engineering and Ghidra

In the Raspbian world, most of the programs you encounter are open source, so you can easily download the source code and study it. There is documentation on how it works, and you are actively encouraged to contribute to the program, perhaps fix bugs or add a new feature.

Suppose we encounter a program that we don’t have the source code for, and we want to know how it works. Perhaps we want to study it to see if it contains malware. It might be the case that we are worried about privacy concerns and want to know what information the program sends on the Internet. Maybe it's a game, and we want to know if there is a secret code we can enter to go into God mode. What is the best way to go about this?

We can examine the Assembly code of any Linux executable using objdump or gdb. We know enough about Assembly that we can make sense of the instructions we encounter. However, this doesn’t help us form a big picture of how the program is structured and it’s time-consuming.

There are tools to help with this. Until recently there were only expensive commercial products available; however, the NSA, yes, that NSA, released a version of the tool that their hackers use to analyze code. It is called Ghidra, named after the three-headed monster that Godzilla fights. This tool lets you analyze compiled programs and includes the ability to decompile a program back into C code. It includes tools to show you the graphs of function calls and the ability to make annotations as you learn things.

Sadly, Ghidra doesn’t run properly on the Raspberry Pi anymore, even though it is written in Java. The NSA states that Ghidra won’t be supported running on 32-bit operating systems anymore. However, Ghidra still supports analyzing 32-bit programs. It also has full support for the ARM processor. This means we need to transfer our executable file to a computer running a 64-bit operating system, whether it is Linux, macOS, or Windows.

You can download Ghidra from https://ghidra-sre.org/ . To install it, you unzip it, then run the ghidraRun script if you are on Linux. Ghidra requires the Java runtime; if you don’t have this already installed, you will need to install it for your operating system.

Decompiling an optimized C program is difficult. As we saw in the last section, the GCC optimizer does some major rewriting of our original code as part of converting it to Assembly language. Let’s take the upper program that we compiled from C in the last section, give it to Ghidra to decompile, and see whether the result is like our starting source code.

If we create a project in Ghidra, import our upper program, then run the code browser, we get the window shown in Figure 14-1.

Figure 14-1
Ghidra analyzing our upper program

Listing 14-6 is the C code that Ghidra generated. I added the lines above the definition of the main routine, so the program will compile and run.

#include <stdio.h>

#define BUFFERSIZE 250

char *tstStr = "This is a test!";
char outStr[BUFFERSIZE];

typedef unsigned int uint;
typedef unsigned char byte;
typedef void undefined;
#define true 1

uint main(void)
{
        byte bVar1;
        undefined *puVar2;
        byte *pbVar3;
        byte *pbVar4;

        puVar2 = tstStr;
        pbVar3 = tstStr;
        pbVar4 = outStr;
        do {
                while( true ) {
                        bVar1 = *pbVar3;
                        if (0x19 < (uint)bVar1 - 0x61) break;
                        *pbVar4 = bVar1 - 0x20;
                        pbVar3 = pbVar3 + 1;
                        pbVar4 = pbVar4 + 1;
                }
                *pbVar4 = bVar1;
                pbVar3 = pbVar3 + 1;
                pbVar4 = pbVar4 + 1;
        } while (bVar1 != 0);
        printf("Input: %s\nOutput: %s\n",puVar2,outStr);
        return (uint)bVar1;
}

Listing 14-6

C code created by Ghidra for our upper C program

If we run the program, we get the expected output:

pi@raspberrypi:~/asm/Chapter 14 $ make
gcc -O3 -o upperghidra upperghidra.c
pi@raspberrypi:~/asm/Chapter 14 $ ./upperghidra
Input: This is a test!
Output: THIS IS A TEST!
pi@raspberrypi:~/asm/Chapter 14 $

The code produced isn’t pretty. The variable names are generated. It knows tstStr and outStr, because these are global variables. The logic is in smaller steps, often each C statement being the equivalent of a single Assembly instruction. When trying to figure out a program you don’t have the source code for, having a couple of different viewpoints is a great help.

Note

This technique only works for true compiled languages like C, Fortran, or C++. It does not work for interpreted languages like Python or JavaScript; it also doesn’t work for partially compiled languages that use a virtual machine architecture like Java or C#. There are other tools for these and often these are much more effective, since the compile step doesn’t do as much.

Summary

In this chapter, we reviewed where we can find some sample Assembly source code in the Raspbian Linux kernel and the GCC runtime library. We looked at how GCC compiles the division operator from C and what happens when the ARM processor doesn’t support a division instruction. We wrote a C version of our uppercase program, so we could compare the Assembly code that the C compiler produces to what we have written.

We then looked at the sophisticated Ghidra program for decompiling programs to reverse the process and see what it produces. Although it produces working C code from Assembly code, it isn’t that easy to read.

In Chapter 15, “Thumb Code,” we’ll look at Thumb code, where we reduce the Assembly instruction size from 32 bits to 16 bits.
