In this chapter, we will examine how to organize our code into small independent units called functions. This allows us to build reusable components, which we can call easily from anywhere we wish by setting up parameters and calling them.
Typically, in software development, we start with low-level components. Then we build on these to create higher- and higher-level modules. So far, we know how to loop, perform conditional logic, and perform some arithmetic. Now, we examine how to compartmentalize code into building blocks.
We introduce the stack, a computer science data structure for storing data. If we’re going to build useful reusable functions, we need a good way to manage register usage, so that all these functions don’t clobber each other. In Chapter 5, “Thanks for the Memories,” we studied how to store data in a data segment in main memory. The problem with this is that this memory exists for the entire time the program runs. Small functions, like converting to upper-case, run quickly; they might need a few memory locations while they run, but when they’re done, they don’t need this memory anymore. Stacks give us a tool to manage register usage across function calls and a tool to provide memory to functions for the duration of their invocation.
We introduce several low-level concepts first, and then we put them all together to effectively create and use functions. First up is the abstract data type called a stack that is a convenient mechanism to store data for the duration of a function call.
Stacks on Linux
A stack is an area of memory with two basic operations:
Push: Adds an element to the area
Pop: Returns and removes the element that was most recently added
This behavior is also called a LIFO (last in first out) queue.
When Linux runs a program, it typically gives it an 8-megabyte stack. In Chapter 1, “Getting Started,” we mentioned that register X31 had a special purpose as both the zero register and the stack pointer (SP). You might have noticed that X31 is named SP in gdb and that when you debugged programs, it had a large value, something like 0x7ffffff230. This is a pointer to the current stack location.
The ARM instruction set has a handful of instructions to manipulate the stack; remember that any instruction that doesn’t operate on the stack sees register 31 as the zero register. There are two instructions to place registers on the stack, STR and STP, and then two instructions to retrieve items from the stack into registers, LDR and LDP. We studied all these instructions in Chapter 5, “Thanks for the Memories,” but here we’ll use specific forms to copy data to and from the stack and to adjust the stack pointer appropriately.
The ARM hardware requires that SP is always 16-byte aligned. This means we can only add and subtract from SP with multiples of 16. If we use SP when it isn’t 16-byte aligned, we will get a bus error and our program will terminate.
The convention for the stack is that SP points to the last element on the stack and the stack grows downward. This is why SP contains a large address. To push X0, we use an STR instruction with pre-indexed addressing: it copies X0 to the memory location at SP – 16 and then updates SP to contain this address, since the stored value is now the last value on the stack. We’re wasting 8 bytes here, since X0 is only 8 bytes in size. To keep the proper alignment, we must use 16 bytes.
To pop, an LDR instruction with post-indexed addressing does the reverse operation. It moves the data pointed to by SP from the stack to X0 and then adds 16 to SP.
When we have two registers to save, it is better to push and pop them as a pair with STP and LDP, since we aren’t wasting any space on the stack. But it does take longer to transfer 16 bytes to memory than 8 bytes.
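The four stack operations described above can be sketched as follows. This is a minimal illustration rather than the chapter's original listing; it assumes a native AArch64 Linux system with the GNU tools, and the values in X0 and X1 are arbitrary.

```asm
// stack.s: push and pop one register, then a pair of registers.
// Build (assumed toolchain): as -o stack.o stack.s && ld -o stack stack.o
        .global _start
        .text
_start: mov     x0, #11
        mov     x1, #22

        str     x0, [sp, #-16]!     // push X0: SP = SP - 16, then store X0 at [SP]
        ldr     x0, [sp], #16       // pop X0: load from [SP], then SP = SP + 16

        stp     x0, x1, [sp, #-16]! // push X0 and X1 together, using all 16 bytes
        ldp     x0, x1, [sp], #16   // pop both; SP is back where it started

        mov     x0, #0              // exit status 0
        mov     x8, #93             // Linux exit system call number
        svc     0
```

Every push moves SP by exactly 16 bytes and every pop moves it back, so SP stays 16-byte aligned throughout.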
The LDR, LDP, STR, and STP instructions are powerful general-purpose instructions that support stacks that grow in either direction or can be based on any register. Plus, they have all the functionality we covered in Chapter 5, “Thanks for the Memories.” In our usage, we want to use them exactly as prescribed, so that our code works well in the Linux environment and can interact with code written in other languages by other programmers. Now we’ll get into the details of calling functions and see how the stack fits into this with the branch with link instruction.
Branch with Link
To call a function, we need a way for the function to return execution to the point just after where we called it. We do this with the other special register we listed in Chapter 1, “Getting Started,” the link register (LR) which is X30. To make use of LR, we introduce the branch with link (BL) instruction, which is the same as the branch (B) instruction, except it puts the address of the next instruction into LR before it performs the branch, giving a mechanism to return from the function.
To return from the function, we use the return (RET) instruction. This instruction branches to the address stored in LR to return from the function. It’s important to use this instruction rather than some other branch instruction, because the instruction pipeline knows about RET instructions and knows to continue processing instructions from where LR points. This way we don’t have a performance penalty for returning from functions.
Skeleton code to call a function and return
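A minimal sketch of such a skeleton follows. The label myfunc and the surrounding _start code are placeholders for illustration, and the exit system call is only there so the sketch runs on its own under AArch64 Linux.

```asm
// call.s: call a function with BL and come back with RET.
        .global _start
        .text
_start: // ... other code ...
        bl      myfunc              // LR (X30) = address of the next instruction
        mov     x1, #4              // execution resumes here after myfunc returns
        // ... more code ...
        mov     x0, #0              // exit status 0
        mov     x8, #93             // Linux exit system call number
        svc     0

myfunc: // do some work
        ret                         // branch to the address held in LR
```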
There is only one LR, so you might be wondering what happens if another function is called? How do we preserve the original value of LR when function calls are nested?
Nesting Function Calls
We successfully called and returned from a function, but we never used the stack. Why did we introduce the stack first and then not use it? First of all, think of what happens if in the course of its processing, myfunc calls another function. We would expect this to be fairly common, as we write code building on the functionality we’ve previously written. If myfunc executes a BL instruction, then BL will copy the next address into LR overwriting the return address for myfunc and myfunc won’t be able to return. What we need is a way to keep a chain of return addresses as we call function after function. Well, not a chain of return addresses, but a stack of return addresses.
Skeleton code for a function that calls another function
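The following sketch shows one way this can look, assuming the function names myfunc and myfunc2 used in the discussion. The function myfunc pushes LR before its own BL and pops it before RET; myfunc2 calls nothing, so it leaves LR alone.

```asm
// nested.s: preserve LR across a nested function call.
        .global _start
        .text
_start: bl      myfunc
        mov     x0, #0              // exit status 0
        mov     x8, #93             // Linux exit system call number
        svc     0

myfunc: str     x30, [sp, #-16]!    // push LR (X30); BL below overwrites it
        // do some work ...
        bl      myfunc2
        // do some more work ...
        ldr     x30, [sp], #16      // pop LR so RET goes back to our caller
        ret

myfunc2:
        // a leaf function: LR is never overwritten, so no push is needed
        ret
```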
In this example, we see how convenient the stack is to store data that only needs to exist for the duration of a function call.
If a function, such as myfunc, calls other functions then it must save LR; if it doesn’t call other functions, such as myfunc2, then it doesn’t need to save LR. Programmers often push and pop LR regardless, since if the function is modified later to add a function call, and the programmer forgets to add LR to the list of saved registers, then the program will fail to return and either go into an infinite loop or crash. The downside is that there’s only so much bandwidth between the CPU and memory, so PUSHing and POPing more registers does take extra execution cycles. The trade-off in speed vs. maintainability is a subjective decision depending on the circumstances.
Calling and returning from the function is only half the story. Like in high-level languages, we need to pass parameters (data) into our functions to be processed and then receive the results of the processing back in return values. Now we’ll look at how to do this.
Function Parameters and Return Values
In high-level languages, functions take parameters and return their results. Assembly Language programming is no different. We could invent our own mechanisms to do this, but this is counterproductive. Eventually, we will want the code to interoperate with code written in other programming languages. We will want to call the new super-fast functions from C code, and we might want to call functions that were written in C.
To facilitate this, there are a set of design patterns for calling functions. If we follow these, the code will work reliably since others have already worked out all the bugs, plus we achieve the goal of writing interoperable code.
The caller passes the first eight parameters in X0 to X7. If there are additional parameters, then they are pushed onto the stack. If we only have two parameters, then we would only use X0 and X1. This means the first eight parameters are already loaded into registers and ready to be processed. Additional parameters need to be loaded from the stack before being processed.
To return a value to the caller, place it in X0 before returning. In fact, you can return a 128-bit integer in the X0, X1 register pair. If you need to return more data, you would have one of the parameters be an address to a memory location where you can place the additional data to be returned. This is the same as C where you return data through call by reference parameters.
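As a small illustration of these rules, the sketch below passes nine parameters to a hypothetical function named sum9: eight in X0 to X7 and the ninth on the stack. The function name and values are invented for this example. The result comes back in X0 and is used as the program's exit status.

```asm
// params.s: eight register parameters, one stack parameter, result in X0.
        .global _start
        .text
_start: mov     x0, #1              // parameters 1 to 8 go in X0 to X7
        mov     x1, #2
        mov     x2, #3
        mov     x3, #4
        mov     x4, #5
        mov     x5, #6
        mov     x6, #7
        mov     x7, #8
        mov     x9, #9
        str     x9, [sp, #-16]!     // the ninth parameter goes on the stack
        bl      sum9
        add     sp, sp, #16         // the caller releases the stack parameter
        mov     x8, #93             // exit, using the sum (45) in X0 as the status
        svc     0

sum9:   ldr     x9, [sp]            // load the ninth parameter from the stack
        add     x0, x0, x1
        add     x0, x0, x2
        add     x0, x0, x3
        add     x0, x0, x4
        add     x0, x0, x5
        add     x0, x0, x6
        add     x0, x0, x7
        add     x0, x0, x9
        ret                         // return value is in X0
```

Running the program and then `echo $?` in the shell should print 45, the sum of 1 through 9.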
Since both the caller and callee are using the same set of general-purpose registers, we need a protocol or convention to ensure that one doesn’t overwrite the working data of the other. Next, we’ll look at the register management convention for the ARM processor.
Managing the Registers
X0–X7: These are the function parameters. The function can use these for any other purpose modifying them freely. If the calling routine needs them saved, it must save them itself.
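X0–X18: Corruptible registers that a function is free to use without saving. If a caller needs these, then it is responsible for saving them. Note that some operating systems reserve X18 as a platform register, so portable code avoids using it.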
X19–X30: These are callee saved, so must be pushed to the stack if used in a function.
SP: This can be freely used by the called routine. The routine must POP the stack the same number of times that it PUSHes, so it’s intact for the calling routine.
LR: The called routine must preserve this as we discussed in the last section.
Condition flags: Neither routine can make any assumptions about the condition flags. As far as the called routine is concerned, all the flags are unknown; similarly they are unknown to the caller when the function returns.
Summary of the Function Call Algorithm
The calling routine:
1. If we need any of X0–X18, save them.
2. Move first eight parameters into registers X0–X7.
3. Push any additional parameters onto the stack.
4. Use BL to call the function.
5. Evaluate the return code in X0.
6. Restore any of X0–X18 that we saved.
The called function:
1. PUSH LR and X19–X30 onto the stack if used in the routine.
2. Do our work.
3. Put our return code into X0.
4. POP LR and X19–X30 if pushed in step 1.
5. Use the RET instruction to return execution to the caller.
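Both halves of the algorithm can be seen together in the sketch below. The functions work and helper are hypothetical names for this example: the caller keeps a value in X9 across the call, and work uses the callee-saved register X19 because it calls helper itself.

```asm
// protocol.s: caller and callee each following their half of the convention.
        .global _start
        .text
_start: mov     x9, #5                  // a value the caller needs after the call
        str     x9, [sp, #-16]!         // caller: save the corruptible register
        mov     x0, #3                  // caller: parameters go in X0 and X1
        mov     x1, #4
        bl      work                    // caller: call the function
        ldr     x9, [sp], #16           // caller: restore what we saved
        add     x0, x0, x9              // caller: use the return value, 12 + 5 = 17
        mov     x8, #93                 // exit with 17 as the status
        svc     0

work:   stp     x19, x30, [sp, #-16]!   // callee: push X19 and LR, since both are used
        mov     x19, x0                 // X19 survives the nested call
        mov     x0, x1
        bl      helper                  // may corrupt any of X0-X18
        mul     x0, x19, x0             // callee: return code in X0 (3 * 4 = 12)
        ldp     x19, x30, [sp], #16     // callee: pop what we pushed
        ret                             // callee: return to the caller

helper: ret                             // hands back X0 unchanged
```

Saving X19 and LR as a pair with STP uses the full 16 bytes of the stack slot.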
We can save steps if we just use X0–X18 for function parameters, return codes, and short-term work. Then we never have to save and restore them around function calls.
These aren’t all the rules. The coprocessors also have registers that might need saving. We’ll discuss those rules when we discuss the coprocessors.
Let’s look at a practical example by converting our upper-case program into a function that we can call with parameters to convert any strings we wish.
Upper-Case Revisited
Let’s organize our upper-case example from Chapter 5, “Thanks for the Memories,” as a proper function. We’ll move the function into its own file and modify the makefile to make both the calling program and the upper-case function.
Main program for upper-case example
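A sketch of what such a main program might look like is shown here. It assumes that toupper lives in a separate file, takes the input address in X0 and the output address in X1, and returns the number of bytes written in X0. The file name, the labels instr and outstr, and the test string are illustrative.

```asm
// main.s: call toupper from another file, then print the result.
        .global _start
        .text
_start: ldr     x0, =instr          // first parameter: the input string
        ldr     x1, =outstr         // second parameter: the output buffer
        bl      toupper             // X0 = number of bytes written

        mov     x2, x0              // length for the write call
        mov     x0, #1              // file descriptor 1 is stdout
        ldr     x1, =outstr         // string to print
        mov     x8, #64             // Linux write system call number
        svc     0

        mov     x0, #0              // exit status 0
        mov     x8, #93             // Linux exit system call number
        svc     0

        .data
instr:  .asciz  "This is our Test String that we will convert.\n"
outstr: .fill   255, 1, 0           // 255 zero bytes for the converted copy
```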
Function to convert strings to all upper-case
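The function itself could be sketched as below. The register choices (X4 and W5 as scratch) and the file name upper.s are assumptions; the interface follows the convention described earlier, with the parameters in X0 and X1 and the result in X0.

```asm
// upper.s: convert a null-terminated string to upper-case.
// X0 = address of input string, X1 = address of output buffer.
// Returns X0 = bytes stored, including the terminating null.
        .global toupper             // make the label visible to other files
        .text
toupper:
        mov     x4, x1              // remember where the output starts
loop:   ldrb    w5, [x0], #1        // load a character, advance the input pointer
        cmp     w5, #'z'            // above 'z'? then not lower-case
        b.gt    cont
        cmp     w5, #'a'            // below 'a'? then not lower-case
        b.lt    cont
        sub     w5, w5, #('a'-'A')  // lower-case letter: convert it
cont:   strb    w5, [x1], #1        // store the character, advance the output pointer
        cmp     w5, #0              // stop once the null has been copied
        b.ne    loop
        sub     x0, x1, x4          // length = end pointer - start pointer
        ret
```

Only X0, X1, X4, and X5 are touched, all of which are corruptible, so nothing needs to be pushed to the stack.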
Makefile for the upper-case function example
The toupper function doesn’t call any other functions, so we don’t save LR. If we ever change it to do so, we need to push LR to the stack and pop it before we return. Since X0–X18 are all corruptible, we have plenty of general-purpose registers to use without needing to save any.
Most C programmers will object that this function is dangerous. If the input string isn’t NULL terminated, then it will overrun the output string buffer, overwriting the memory past the end. The solution is to pass in a third parameter with the buffer length and check in the loop that we stop at the end of the buffer if there is no NULL character.
This routine only processes the core ASCII characters. It doesn’t handle localized characters; for example, é won’t be converted to É.
In the upper-case function, we didn’t need any additional memory, since we could do all the work with the available registers. When we code larger functions, we often require more memory for the variables than fit in the registers. Rather than add clutter to the .data section, we store these variables on the stack. The section of the stack that holds our local variables is called a stack frame.
Stack Frames
Stacks work great for saving and restoring registers, but to work well for other data, we need the concept of a stack frame. Here we allocate a block or frame of memory on the stack that we use to store our variables. This is an efficient mechanism to allocate some memory at the start of a function and then release it before we return.
PUSHing variables on the stack isn’t practical, since we need to access them in a random order, rather than the strict LIFO protocol that PUSH/POP enforce.
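Instead, we allocate space by subtracting the number of bytes we need, rounded up to a multiple of 16, from SP. We can then access the variables at offsets from SP in any order, and before the function returns, we add the same amount back to SP to release our variables from the stack. Remember, it is the responsibility of a function to restore SP to its original state before returning.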
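As a minimal sketch of this approach, the hypothetical function addlocals below keeps three 32-bit variables, called a, b, and c here, on the stack. They need 12 bytes, which rounds up to 16 to keep SP aligned.

```asm
// locals.s: three 32-bit local variables addressed as offsets from SP.
        .global _start
        .text
_start: bl      addlocals
        mov     x8, #93             // exit with the result (3) as the status
        svc     0

addlocals:
        sub     sp, sp, #16         // allocate: 12 bytes needed, rounded up to 16
        mov     w0, #1
        str     w0, [sp]            // a = 1
        mov     w0, #2
        str     w0, [sp, #4]        // b = 2
        ldr     w1, [sp]            // read the variables back in any order
        ldr     w2, [sp, #4]
        add     w0, w1, w2
        str     w0, [sp, #8]        // c = a + b
        ldr     w0, [sp, #8]        // return c in X0
        add     sp, sp, #16         // release the variables
        ret
```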
This is the simplest way to allocate some variables. However, if we are doing a lot of other things with the stack in our function, it can be hard to keep track of these offsets. The way to alleviate this is with a stack frame. Here we allocate a region on the stack and keep a pointer to this region in another register that we will refer to as the frame pointer (FP). You could use any register as the FP, but we will follow the C programming convention and use X29.
When using FP, include it in the list of registers we PUSH at the beginning of the function and then POP at the end, since X29, the FP, is one of the registers we are responsible for saving. One good thing about using FP is that it isn’t required to be 16-byte aligned.
In this book, we’ll tend to NOT use FP. This saves a couple of cycles on function entry and exit. After all, in Assembly Language programming, we want to be efficient.
Stack Frame Example
Simple skeletal function that demonstrates a stack frame
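One possible shape for such a function is sketched below. The function name sumfn, the symbols VAR1, VAR2, and SUM, and the frame layout are assumptions made for this illustration; X29 serves as the frame pointer and the variables are addressed relative to it.

```asm
// frame.s: a function with a stack frame addressed through FP (X29).
        .equ    VAR1, 0                 // offsets of the locals within the frame
        .equ    VAR2, 4
        .equ    SUM,  8

        .global _start
        .text
_start: mov     w0, #3
        mov     w1, #4
        bl      sumfn
        mov     x8, #93                 // exit with the sum (7) as the status
        svc     0

sumfn:  stp     x29, x30, [sp, #-16]!   // push FP and LR
        sub     sp, sp, #16             // room for three 32-bit variables
        mov     x29, sp                 // FP points at the base of the frame
        str     w0, [x29, #VAR1]        // store the parameters as locals
        str     w1, [x29, #VAR2]
        ldr     w4, [x29, #VAR1]
        ldr     w5, [x29, #VAR2]
        add     w6, w4, w5
        str     w6, [x29, #SUM]         // SUM = VAR1 + VAR2
        ldr     w0, [x29, #SUM]         // return the sum in X0
        add     sp, sp, #16             // release the frame
        ldp     x29, x30, [sp], #16     // pop FP and LR
        ret
```

Because the offsets are relative to X29, further pushes and pops inside the function would not change how the variables are addressed.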
Defining Symbols
In this example, we introduce the .EQU Assembler directive. This directive allows us to define symbols that will be substituted by the Assembler before generating the compiled code. This way we can make the code more readable. In this example, keeping track of which variable is which on the stack makes the code hard to read and error-prone. With the .EQU directive, we can define each variable’s offset on the stack once.
Sadly, .EQU only defines numbers, so we can’t define the whole “[SP, #4]” type string.
Macros
Program to call our toupper macro
Macro version of our toupper function
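A sketch of a macro version, together with a program that uses it twice, is given below. For brevity both are folded into one file here, whereas the chapter keeps the macro in uppermacro.s and the caller in mainmacro.s. The labels tststr, tststr2, and buffer and the test strings are illustrative.

```asm
// Macro version of toupper: \instr and \outstr are labels, result length in X0.
        .macro  toupper instr, outstr
        ldr     x0, =\instr
        ldr     x1, =\outstr
        mov     x2, x1              // remember where the output starts
1:      ldrb    w3, [x0], #1        // load a character, advance the input pointer
        cmp     w3, #'z'
        b.gt    2f
        cmp     w3, #'a'
        b.lt    2f
        sub     w3, w3, #('a'-'A')  // lower-case letter: convert it
2:      strb    w3, [x1], #1        // store the character, advance the output pointer
        cmp     w3, #0
        b.ne    1b                  // loop until the null has been copied
        sub     x0, x1, x2          // length = end pointer - start pointer
        .endm

        .global _start
        .text
_start: toupper tststr, buffer      // first copy of the code
        mov     x2, x0              // length to print
        mov     x0, #1              // stdout
        ldr     x1, =buffer
        mov     x8, #64             // Linux write system call number
        svc     0

        toupper tststr2, buffer     // second copy of the code
        mov     x2, x0
        mov     x0, #1
        ldr     x1, =buffer
        mov     x8, #64
        svc     0

        mov     x0, #0              // exit status 0
        mov     x8, #93             // Linux exit system call number
        svc     0

        .data
tststr:  .asciz "This is our Test String that we will convert.\n"
tststr2: .asciz "A second string to upper-case!!\n"
buffer:  .fill  255, 1, 0
```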
Include Directive
The file uppermacro.s defines the macro to convert a string to upper-case. The macro doesn’t generate any code; it just defines the macro for the Assembler to insert wherever it is called from. This file doesn’t generate an object (*.o) file; rather it is included by whichever file needs to use it.
The .include directive takes the contents of the named file and inserts it at this point, so that the source file becomes larger. This is done before any other processing. This is like the C #include preprocessor directive.
Macro Definition
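The .MACRO directive names the macro and lists its parameters, and the definition runs to the matching .ENDM. The parameters are used in the code with \instr and \outstr. These are text substitutions and need to result in correct Assembly syntax or you will get an error.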
Labels
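Inside a macro we use numeric labels such as 1: and 2:, which can be defined as many times as we like, so each expansion of the macro doesn’t redefine the same named label. The f after the 2 means the next label 2 in the forward direction. The 1b means the nearest label 1 in the backward direction.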
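To see why numeric labels matter, consider the small sketch below. The macro name countdown is invented for this example. It is expanded twice, so the label 1 is defined twice, yet each 1b finds the nearest definition behind it.

```asm
// labels.s: numeric labels can be defined repeatedly without clashing.
        .macro  countdown n
        mov     x9, #\n
1:      subs    x9, x9, #1          // count down and set the flags
        b.ne    1b                  // back to the nearest label 1 behind us
        .endm

        .global _start
        .text
_start: countdown 3                 // first definition of label 1
        countdown 5                 // second definition of label 1
        mov     x0, #0
        cbz     x9, 2f              // forward to the next label 2
        mov     x0, #1              // skipped, since X9 reached zero
2:      mov     x8, #93             // exit with status 0
        svc     0
```

With a named label such as loop: inside the macro, the second expansion would fail to assemble because the symbol would already be defined.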
To prove that this works, we call toupper twice in the mainmacro.s file, showing that we can reuse this macro as many times as we like.
Why Macros?
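Each time we use a macro, a copy of its code is inserted at that point, so with our two calls to toupper, two copies of code are inserted. With functions, there is no extra code generated each time. This is why functions are quite appealing, even with the extra work of dealing with the stack.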
The reason macros get used is performance. Most ARM devices have a gigabyte or more of memory—a lot of room for multiple copies of code. Remember that whenever we branch, we must restart the execution pipeline, making branching an expensive instruction. With macros, we eliminate the BL branch to call the function and the RET branch to return. We also eliminate any instructions to save and restore the registers we use. If a macro is small and we use it a lot, there could be considerable execution time savings.
Notice in the macro implementation of toupper that only the registers X0–X3 were used. This avoids using any registers important to the caller. There is no standard on how to regulate register usage with macros, as there is with functions, so it is up to you, the programmer, to avoid conflicts and strange bugs.
We can also use macros to make the code more readable and easier to write, as described in the next section.
Macros to Improve Code
Define four macros for pushing and popping the stack
Use our push and pop macros
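The sketch below defines four such macros and then uses them in a function prologue and epilogue. The macro names PUSH1, PUSH2, POP1, and POP2, the function myfunc, and the registers it saves are illustrative choices.

```asm
// pushpop.s: macros that hide the stack addressing modes.
        .macro  PUSH1 reg
        str     \reg, [sp, #-16]!
        .endm

        .macro  POP1 reg
        ldr     \reg, [sp], #16
        .endm

        .macro  PUSH2 reg1, reg2
        stp     \reg1, \reg2, [sp, #-16]!
        .endm

        .macro  POP2 reg1, reg2
        ldp     \reg1, \reg2, [sp], #16
        .endm

        .global _start
        .text
_start: mov     x0, #7
        bl      myfunc
        mov     x8, #93             // exit with the returned value (7) as the status
        svc     0

myfunc: PUSH1   x30                 // prologue: save LR
        PUSH2   x20, x23            // and the callee-saved registers we use
        mov     x20, x0             // function body
        mov     x23, x20
        mov     x0, x23             // return the parameter unchanged
        POP2    x20, x23            // epilogue: pop in the reverse order
        POP1    x30
        ret
```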
This makes writing the function prologues and epilogues easier and clearer.
Summary
In this chapter, we covered the ARM stack and how it’s used to help implement functions. We covered how to write and call functions as a first step to creating libraries of reusable code. We learned how to manage register usage, so there aren’t any conflicts between calling programs and functions. We learned the function calling protocol, which allows us to interoperate with other programming languages. Also, we looked at defining stack-based storage for local variables and how to use this memory.
Finally, we covered the GNU Assembler’s macro ability as an alternative to functions in certain performance critical applications.
Exercises
1. If we are coding for an operating system where the stack grows upward, how would we code the LDR, LDP, STR, and STP instructions?
2. Suppose we have a function that uses registers X4, X5, W20, X23, and W27. Further this function calls other functions. Code the prologue and epilogue of this function to store and restore the correct registers to/from the stack.
3. Write a function to convert text to all lower-case. Have this function in one file and a main program in another file. In the main program, call the function three times with different test strings.
4. Convert the lower-case program in Exercise 3 to a macro. Have it run on the same three test strings to ensure it works properly.
5. Why does the function calling protocol have some registers need to be saved by the caller and some by the callee? Why not make all saved by one or the other?