In this chapter, we will examine how to organize our code into small independent units called functions . This allows us to build reusable components that we can call easily from anywhere we wish.
Typically, in software development we start with low-level components, then build on these to create higher and higher level applications. So far, we know how to loop, perform conditional logic, and perform some arithmetic. Now, we examine how to compartmentalize our code into building blocks.
We introduce the stack ; this is a computer science data structure for storing data. If we are going to build useful reusable functions, we will need a good way to manage register usage, so that all these functions don’t clobber each other. In Chapter 5, “Thanks for the Memories,” we studied how to store data in a data segment in main memory. The problem with this is that this memory exists for the duration that our program runs. With small functions, like our converting to uppercase program, they run quickly and might need a few memory locations while they run, but when they are done, they don’t need this memory anymore. Stacks provide us a tool to manage register usage across function calls and a tool to provide memory to functions for the duration of their invocation.
We introduce a number of low-level concepts first, then we put them all together to effectively create and use functions.
Stacks on Raspbian
push : Adds an element to the area
pop : Returns and removes the element that was most recently added
This behavior is also called a LIFO (last in first out) queue.
When Raspbian runs a program, it gives it an 8 MB stack. In Chapter 1, “Getting Started,” we mentioned that register R13 had a special purpose as the Stack Pointer (SP) . You might have noticed that R13 is named SP in gdb, and you might have noticed that when you debugged programs, it had a large value, something like 0x7efff380. This is a pointer to the current stack location.
The ARM32 instruction set has two instructions to manipulate stacks, Load Multiple (LDM) and Store Multiple (STM) . These two instructions have quite a few options. These are to support things like whether the stack grows by increasing addresses or by decreasing addresses—, whether SP points to the end of the stack or the next free location on the stack. These options could be useful, if you are creating your own stack, or to match the requirement of a different operating system. But all we want is to work with the stack Raspbian provides us.
Branch with Link
To call a function, we need to set up the ability for the function to return execution to after the point where we called the function. We do this with the other special register we listed in Chapter 1, “Getting Started,” the Link Register (LR) which is R14. To make use of LR, we introduce the Branch with Link (BL) instruction, which is the same as the Branch (B) instruction, except it puts the address of the next instruction into LR before it performs the branch, giving us a mechanism to return from the function.
To return from our function, we use the Branch and Exchange (BX) instruction. This branch instruction takes a register as its argument, allowing us to branch to the address stored in LR to continue processing after the function completes.
Skeleton code to call a function and return
Nesting Function Calls
We successfully called and returned from a function, but we never used the stack. Why did we introduce the stack first and then not use it? First think what happens if in the course of its processing myfunc calls another function. We would expect this to be fairly common, as we write code building on the functionality we’ve previously written. If myfunc executes a BL instruction, then BL will copy the next address into LR overwriting the return address for myfunc and myfunc won’t be able to return. What we need is a way to keep a chain of return addresses as we call function after function. Well, not a chain of return addresses, but a stack of return addresses.
Skeleton code for a function that calls another function
In this example, we see how convenient the stack is to store data that only needs to exist for the duration of a function call.
If a function, such as myfunc, calls other functions, then it must save LR; if it doesn’t call other functions, such as myfunc2, then it doesn’t need to save LR. Programmers often push and pop LR regardless, since if the function is modified later to add a function call and the programmer forgets to add LR to the list of saved registers, then the program will fail to return and either go into an infinite loop or crash. The downside is that there is only so much bandwidth between the CPU and memory, so PUSHing and POPing more registers does take extra execution cycles. The trade-off in speed vs. maintainability is a subjective decision depending on the circumstances.
Function Parameters and Return Values
In high-level languages, functions take parameters and return their results. Assembly language programming is no different. We could invent our own mechanisms to do this, but this is counterproductive. Eventually we will want our code to interoperate with code written in other programming languages. We will want to call our new super-fast functions from C code, and we might want to call functions that were written in C.
To facilitate this, there are a set of design patterns for calling functions. If we follow these, our code will work reliably since others have already worked out all the bugs, plus we achieve the goal of writing interoperable code.
The caller passes the first four parameters in R0, R1, R2, and R3. If there are additional parameters, then they are pushed onto the stack. If we only have two parameters, then we would only use R0 and R1. This means the first four parameters are already loaded into registers and ready to be processed. Additional parameters need to be popped from the stack before being processed.
To return a value to the caller, place it in R0 before returning. If you need to return more data, you would have one of the parameters be an address to a memory location where you can place the additional data to be returned. This is the same as C where you return data through call by reference parameters.
Managing the Registers
R0–R3: These are the function parameters. The function can use these for any other purpose modifying them freely. If the calling routine needs them saved, it must save them itself.
R4–R12: These can be used freely by the called routine, but if it is responsible for saving them. That means the calling routine can assume these registers are intact.
SP: This can be freely used by the called routine. The routine must POP the stack the same number of times that it PUSHes, so it is intact for the calling routine.
LR: The called routine must preserve this as we discussed in the last section.
CPSR: Neither routine can make any assumptions about the CPSR. As far as the called routine is concerned, all the flags are unknown; similarly, they are unknown to the caller when the function returns.
Summary of the Function Call Algorithm
- 1.
If we need any of R0–R4, save them.
- 2.
Move first four parameters into registers R0–R4.
- 3.
Push any additional parameters onto the stack.
- 4.
Use BL to call the function.
- 5.
Evaluate the return code in R0.
- 6.
Restore any of R0–R4 that we saved.
- 1.
PUSH LR and R4–R12 onto the stack.
- 2.
Do our work.
- 3.
Put our return code into R0.
- 4.
POP LR and R4–R12.
- 5.
Use the BX instruction to return execution to the caller.
Note We can save ourselves some steps if we just use R0–R3 for function parameters and return codes and short-term work. Then we never have to save and restore them around function calls.
I specified saving all of LR and R4–R12, which is the safest and most maintainable practice. However, if we know we don’t use some of these registers, we can skip saving them and save some execution time on function entry and exit.
These aren’t all the rules. The coprocessors also have registers that might need saving. We’ll discuss those rules when we discuss the coprocessors.
Uppercase Revisited
Let’s organize our uppercase example from Chapter 5, “Thanks for the Memories,” as a proper function. We’ll move the function into its own file and modify the makefile to make both the calling program and the uppercase function.
Main program for uppercase example
Function to convert strings to all uppercase
Makefile for the uppercase function example
We see copies of registers R4 and R5 on the stack.
From this little exercise, we can see what type of stack Linux uses, namely, it is a descending stack; the addresses get small as the stack grows. Further SP points to the last item saved (and not the next free slot).
Note
The toupper function doesn’t call any other functions, so we don’t save LR along with R4 and R5. If we ever change it to do so, we will need to add LR to the list. This version of toupper is intended to be as fast as possible, so I didn’t add any extra code for future maintainability and safety.
Most C programmers will object that this function is dangerous. If the input string isn’t NULL terminated, then it will overrun the output string buffer, overwriting the memory past the end. The solution is to pass in a third parameter with the buffer lengths and check in the loop that we stop at the end of the buffer if there is no NULL character.
This routine only processes the core ASCII characters. It doesn’t handle the localized characters like é; it won’t be converted to É.
Stack Frames
In our uppercase function, we didn’t need any additional memory, since we could do all our work with the available registers. When we code larger functions, we often require more memory for our variables than fit in the registers. Rather than add clutter to the .data section, we store these variables on the stack.
PUSHing these variables on the stack isn’t practical, since we usually need to access them in a random order, rather than the strict LIFO protocol that PUSH/POP enforce.
to release our variables from the stack. Remember, it is the responsibility of a function to restore SP to its original state before returning.
This is the simplest way to allocate some variables. However, if we are doing a lot of other things with the stack in our function, it can be hard to keep track of these offsets. The way we alleviate this is with a stack frame. Here we allocate a region on the stack and keep a pointer to this region in another register that we will refer to as the Frame Pointer (FP) . You could use any register as the FP, but we will follow the C programming convention and use R11.
When we use FP , we need to include it in the list of registers we PUSH at the beginning of the function and then POP at the end. Since R11, the FP is one we are responsible for saving.
In this book, we’ll tend to not use FP. This saves a couple of cycles on function entry and exit. After all, in Assembly language programming, we want to be efficient.
Stack Frame Example
Simple skeletal function that demonstrates a stack frame
Defining Symbols
In this example, we introduce the .EQU Assembler directive. This directive allows us to define symbols that will be substituted by the Assembler before generating the compiled code. This way, we can make the code more readable. In this example, keeping track of which variable is which on the stack makes the code hard to read and is error-prone. With the .EQU directive, we can define each variable’s offset on the stack once.
Sadly, .EQU only defines numbers, so we can’t define the whole “[SP, #4]” type string.
One More Optimization
You might notice that our SUMFN doesn’t end in “BX LR”. This is a little optimization. The BX instruction basically moves LR into PC, so why not just POP LR directly into PC? Notice this is what the POP instruction at the end of the routine does. If we pushed LR, we can save an instruction this way. This works fine as long as the caller is regular ARM32 Assembly code. There is another type of code called Thumb code which we will look at in Chapter 15, “Thumb Code.” BX lets us return to a caller that is running in Thumb mode, where popping to PC won’t cause the processor to change how it interprets instructions.
Macros
Program to call our toupper macro
Macro version of our toupper function
Include Directive
The file uppermacro.s defines our macro to convert a string to uppercase. The macro doesn’t generate any code; it just defines the macro for the Assembler to insert wherever it is called from. This file doesn’t generate an object (∗.o) file; rather, it is included by whichever file needs to use it.
takes the contents of this file and inserts it at this point, so that our source file becomes larger. This is done before any other processing. This is similar to the C #include preprocessor directive.
Macro Definition
You can see how the parameters are used in the code with instr and oustr. These are text substitutions and need to result in correct Assembly syntax or you will get an error.
Labels
The f after the 2 means the next label 2 in the forward direction. The 1b means the next label 1 in the backward direction.
To prove that this works, we call toupper twice in the mainmacro.s file to show everything works and that we can reuse this macro as many times as we like.
Why Macros?
you will see the two copies of code inserted. With functions, there is no extra code generated each time. This is why functions are quite appealing, even with the extra work of dealing with the stack.
The reason macros get used is performance. Most Raspberry Pi models have a gigabyte or more of memory that is room for a lot of copies of code. Remember that whenever we branch, we have to restart the execution pipeline, making branching an expensive instruction. With macros, we eliminate the BL branch to call the function and the BX branch to return. We also eliminate the PUSH and POP instructions to save and restore any registers we use. If a macro is small and we use it a lot, there could be considerable execution time savings.
Note
Notice in the macro implementation of toupper that I only used registers R0–R3. This is to try and avoid using any registers important to the caller. There is no standard on how to regulate register usage with macros, like there is with functions, so it is up to you, the programmer, to avoid conflicts and strange bugs.
Summary
In this chapter, we covered the ARM stack and how it is used to help implement functions. We covered how to write and call functions as a first step to creating libraries of reusable code. We learned how to manage register usage, so there aren’t any conflicts between our calling programs and our functions. We learned the function calling protocol that will allow us to interoperate with other programming languages. We looked at defining stack-based storage for local variables and how to use this memory.
Finally, we covered the GNU Assembler’s macro ability as an alternative to functions in certain performance critical applications.