© Stephen Smith 2019
S. SmithRaspberry Pi Assembly Language Programminghttps://doi.org/10.1007/978-1-4842-5287-1_6

6. Functions and the Stack

Stephen Smith1 
(1)
Gibsons, BC, Canada
 

In this chapter, we will examine how to organize our code into small independent units called functions . This allows us to build reusable components that we can call easily from anywhere we wish.

Typically, in software development we start with low-level components, then build on these to create higher and higher level applications. So far, we know how to loop, perform conditional logic, and perform some arithmetic. Now, we examine how to compartmentalize our code into building blocks.

We introduce the stack ; this is a computer science data structure for storing data. If we are going to build useful reusable functions, we will need a good way to manage register usage, so that all these functions don’t clobber each other. In Chapter 5, “Thanks for the Memories,” we studied how to store data in a data segment in main memory. The problem with this is that this memory exists for the duration that our program runs. With small functions, like our converting to uppercase program, they run quickly and might need a few memory locations while they run, but when they are done, they don’t need this memory anymore. Stacks provide us a tool to manage register usage across function calls and a tool to provide memory to functions for the duration of their invocation.

We introduce a number of low-level concepts first, then we put them all together to effectively create and use functions.

Stacks on Raspbian

In computer science, a stack is an area of memory where there are two operations:
  • push : Adds an element to the area

  • pop : Returns and removes the element that was most recently added

This behavior is also called a LIFO (last in first out) queue.

When Raspbian runs a program, it gives it an 8 MB stack. In Chapter 1, “Getting Started,” we mentioned that register R13 had a special purpose as the Stack Pointer (SP) . You might have noticed that R13 is named SP in gdb, and you might have noticed that when you debugged programs, it had a large value, something like 0x7efff380. This is a pointer to the current stack location.

The ARM32 instruction set has two instructions to manipulate stacks, Load Multiple (LDM) and Store Multiple (STM) . These two instructions have quite a few options. These are to support things like whether the stack grows by increasing addresses or by decreasing addresses—, whether SP points to the end of the stack or the next free location on the stack. These options could be useful, if you are creating your own stack, or to match the requirement of a different operating system. But all we want is to work with the stack Raspbian provides us.

Fortunately, the GNU Assembler offers simpler pseudo-instructions that are mapped back to the correct forms of LDM and STM. These are
      PUSH   {reglist}
      POP    {reglist}
The {reglist} parameter is a list of registers, containing a comma-separated list of registers and register ranges. A register range is something like R2R4, which means R2, R3, and R4, for example:
      PUSH   {r0, r5-r12}
      POP {r0-r4, r6, r9-r12}
The registers are stored on the stack in numerical order, with the lowest register at the lowest address. You shouldn’t include PC or SP in this list. Figure 6-1 shows the process of pushing a register onto the stack and then Figure 6-2 shows the reverse operation of popping that value off the stack.
../images/486919_1_En_6_Chapter/486919_1_En_6_Fig1_HTML.jpg
Figure 6-1

Pushing R5 onto the stack

../images/486919_1_En_6_Chapter/486919_1_En_6_Fig2_HTML.jpg
Figure 6-2

Popping R4 from the stack

Branch with Link

To call a function, we need to set up the ability for the function to return execution to after the point where we called the function. We do this with the other special register we listed in Chapter 1, “Getting Started,” the Link Register (LR) which is R14. To make use of LR, we introduce the Branch with Link (BL) instruction, which is the same as the Branch (B) instruction, except it puts the address of the next instruction into LR before it performs the branch, giving us a mechanism to return from the function.

To return from our function, we use the Branch and Exchange (BX) instruction. This branch instruction takes a register as its argument, allowing us to branch to the address stored in LR to continue processing after the function completes.

In Listing 6-1, the BL instruction stores the address of the following MOV instruction into LR and then branches to myfunc. Myfunc does the useful work the function was written to do, then returns execution to the caller by having BX branch to the location stored in LR, which is the MOV instruction following the BL instruction.
      @ ... other code ...
      BL    myfunc
      MOV   R1, #4
      @ ... more code ...
-----------------------------
myfunc:      @ do some work
             BX LR
Listing 6-1

Skeleton code to call a function and return

Nesting Function Calls

We successfully called and returned from a function, but we never used the stack. Why did we introduce the stack first and then not use it? First think what happens if in the course of its processing myfunc calls another function. We would expect this to be fairly common, as we write code building on the functionality we’ve previously written. If myfunc executes a BL instruction, then BL will copy the next address into LR overwriting the return address for myfunc and myfunc won’t be able to return. What we need is a way to keep a chain of return addresses as we call function after function. Well, not a chain of return addresses, but a stack of return addresses.

If myfunc is going to call other functions, then it needs to push LR onto the stack as the first thing it does and pop it from the stack just before it returns, for example, Listing 6-2 shows this process.
      @ ... other code ...
      BL    myfunc
      MOV   R1, #4
      @ ... more code ...
-----------------------------
myfunc:     PUSH {LR}
            @ do some work ...
            BL    myfunc2
            @ do some more work...
            POP {LR}
            BX LR
myfunc2:    @ do some work ....
            BX LR
Listing 6-2

Skeleton code for a function that calls another function

In this example, we see how convenient the stack is to store data that only needs to exist for the duration of a function call.

If a function, such as myfunc, calls other functions, then it must save LR; if it doesn’t call other functions, such as myfunc2, then it doesn’t need to save LR. Programmers often push and pop LR regardless, since if the function is modified later to add a function call and the programmer forgets to add LR to the list of saved registers, then the program will fail to return and either go into an infinite loop or crash. The downside is that there is only so much bandwidth between the CPU and memory, so PUSHing and POPing more registers does take extra execution cycles. The trade-off in speed vs. maintainability is a subjective decision depending on the circumstances.

Function Parameters and Return Values

In high-level languages, functions take parameters and return their results. Assembly language programming is no different. We could invent our own mechanisms to do this, but this is counterproductive. Eventually we will want our code to interoperate with code written in other programming languages. We will want to call our new super-fast functions from C code, and we might want to call functions that were written in C.

To facilitate this, there are a set of design patterns for calling functions. If we follow these, our code will work reliably since others have already worked out all the bugs, plus we achieve the goal of writing interoperable code.

The caller passes the first four parameters in R0, R1, R2, and R3. If there are additional parameters, then they are pushed onto the stack. If we only have two parameters, then we would only use R0 and R1. This means the first four parameters are already loaded into registers and ready to be processed. Additional parameters need to be popped from the stack before being processed.

To return a value to the caller, place it in R0 before returning. If you need to return more data, you would have one of the parameters be an address to a memory location where you can place the additional data to be returned. This is the same as C where you return data through call by reference parameters.

Managing the Registers

If you call a function, chances are it was written by a different programmer and you don’t know what registers it will use. It would be very inefficient, if you had to reload all your registers every time you call a function. As a result, there are a set of rules to govern which registers a function can use and who is responsible for saving each one:
  • R0R3: These are the function parameters. The function can use these for any other purpose modifying them freely. If the calling routine needs them saved, it must save them itself.

  • R4R12: These can be used freely by the called routine, but if it is responsible for saving them. That means the calling routine can assume these registers are intact.

  • SP: This can be freely used by the called routine. The routine must POP the stack the same number of times that it PUSHes, so it is intact for the calling routine.

  • LR: The called routine must preserve this as we discussed in the last section.

  • CPSR: Neither routine can make any assumptions about the CPSR. As far as the called routine is concerned, all the flags are unknown; similarly, they are unknown to the caller when the function returns.

Summary of the Function Call Algorithm

Calling routine
  1. 1.

    If we need any of R0R4, save them.

     
  2. 2.

    Move first four parameters into registers R0R4.

     
  3. 3.

    Push any additional parameters onto the stack.

     
  4. 4.

    Use BL to call the function.

     
  5. 5.

    Evaluate the return code in R0.

     
  6. 6.

    Restore any of R0R4 that we saved.

     
Called function
  1. 1.

    PUSH LR and R4R12 onto the stack.

     
  2. 2.

    Do our work.

     
  3. 3.

    Put our return code into R0.

     
  4. 4.

    POP LR and R4R12.

     
  5. 5.

    Use the BX instruction to return execution to the caller.

    Note We can save ourselves some steps if we just use R0R3 for function parameters and return codes and short-term work. Then we never have to save and restore them around function calls.

    I specified saving all of LR and R4R12, which is the safest and most maintainable practice. However, if we know we don’t use some of these registers, we can skip saving them and save some execution time on function entry and exit.

    These aren’t all the rules. The coprocessors also have registers that might need saving. We’ll discuss those rules when we discuss the coprocessors.

     

Uppercase Revisited

Let’s organize our uppercase example from Chapter 5, “Thanks for the Memories,” as a proper function. We’ll move the function into its own file and modify the makefile to make both the calling program and the uppercase function.

First create a file called main.s containing Listing 6-3 for the driving application.
@
@ Assembler program to convert a string to
@ all uppercase by calling a function.
@
@ R0-R2 - parameters to linux function services
@ R1 - address of output string
@ R0 - address of input string
@ R5 - current character being processed
@ R7 - linux function number
@
.global _start    @ Provide program starting address
_start: LDR  R0, =instr @ start of input string
      LDR    R1, =outstr @ address of output string
      BL     toupper
@ Set up the parameters to print our hex number
@ and then call Linux to do it.
      MOV   R2,R0  @ return code is the length of the string
      MOV   R0, #1        @ 1 = StdOut
      LDR   R1, =outstr @ string to print
      MOV   R7, #4       @ linux write system call
      SVC   0         @ Call linux to output the string
@ Set up the parameters to exit the program
@ and then call Linux to do it.
      MOV     R0, #0      @ Use 0 return code
      MOV     R7, #1     @ Command code 1 terminates
      SVC     0        @ Call linux to terminate the program
.data
instr:  .asciz  "This is our Test String that we will convert. "
outstr:      .fill 255, 1, 0
Listing 6-3

Main program for uppercase example

Now create a file called upper.s containing Listing 6-4, the uppercase conversion function.
@
@ Assembler program to convert a string to
@ all uppercase.
@
@ R1 - address of output string
@ R0 - address of input string
@ R4 - original output string for length calc.
@ R5 - current character being processed
@
.global toupper     @ Allow other files to call this routine
toupper:    PUSH    {R4-R5} @ Save the registers we use.
      MOV   R4, R1
@ The loop is until byte pointed to by R1 is non-zero
loop: LDRB  R5, [R0], #1      @ load character and increment pointer
@ If R5 > 'z' then goto cont
      CMP   R5, #'z'      @ is letter > 'z'?
      BGT   cont
@ Else if R5 < 'a' then goto end if
      CMP   R5, #'a'
      BLT   cont  @ goto to end if
@ if we got here then the letter is lower case, so convert it.
      SUB   R5, #('a'-'A')
cont: @ end if
      STRB  R5, [R1], #1      @ store character to output str
      CMP   R5, #0            @ stop on hitting a null character
      BNE   loop        @ loop if character isn't null
      SUB   R0, R1, R4  @ get the length by subtracting the pointers
      POP   {R4-R5}     @ Restore the register we use.
      BX    LR          @ Return to caller
Listing 6-4

Function to convert strings to all uppercase

To build these, use the makefile in Listing 6-5.
UPPEROBJS = main.o upper.o
ifdef DEBUG
DEBUGFLGS = -g
else
DEBUGFLGS =
endif
LSTFLGS =
all: upper
%.o : %.s
      as $(DEBUGFLGS) $(LSTFLGS) $< -o $@
upper: $(UPPEROBJS)
      ld -o upper $(UPPEROBJS)
Listing 6-5

Makefile for the uppercase function example

Let’s step through the function call and examine the contents of important registers and the stack. We set a breakpoint at _start and single-step through the first couple of instructions and stop at the BL instruction. I set R4 to 12 and R5 to 13, so we can follow how these are saved to the stack.
r4             0xc                 12
r5             0xd                 13
sp             0x7efff380          0x7efff380
lr             0x0                 0
pc             0x10084             0x10084 <_start+16>
We see the BL instruction is at 0x10084. Now let’s single-step again to execute the BL instruction. Here are the same registers:
r4             0xc                 12
r5             0xd                 13
sp             0x7efff380          0x7efff380
lr             0x10088             65672
pc             0x100b0             0x100b0 <toupper>
The LR has been set to 0x10088 which is the instruction after the BL instruction (0x10084+4). The PC is now 0x100b0, pointing to the first instruction in the toupper routine. The first instruction in toupper is the PUSH instruction to save registers R4 and R5. Let’s single-step through that instruction and examine the registers again.
r4             0xc                 12
r5             0xd                 13
sp             0x7efff378          0x7efff378
lr             0x10088             65672
pc             0x100b4             0x100b4 <toupper+4>
We see that the stack pointer (SP) has been decremented by 8 bytes (two words) to 0x7efff378. None of the other registers have changed. Pushing registers onto the stack does not affect their values; it only saves them. If we look at location 0x7efff378, we see
(gdb) x /4xw 0x7efff378
0x7efff378:   0x0000000c   0x0000000d   0x00000001   0x7efff504

We see copies of registers R4 and R5 on the stack.

From this little exercise, we can see what type of stack Linux uses, namely, it is a descending stack; the addresses get small as the stack grows. Further SP points to the last item saved (and not the next free slot).

Note

The toupper function doesn’t call any other functions, so we don’t save LR along with R4 and R5. If we ever change it to do so, we will need to add LR to the list. This version of toupper is intended to be as fast as possible, so I didn’t add any extra code for future maintainability and safety.

Most C programmers will object that this function is dangerous. If the input string isn’t NULL terminated, then it will overrun the output string buffer, overwriting the memory past the end. The solution is to pass in a third parameter with the buffer lengths and check in the loop that we stop at the end of the buffer if there is no NULL character.

This routine only processes the core ASCII characters. It doesn’t handle the localized characters like é; it won’t be converted to É.

Stack Frames

In our uppercase function, we didn’t need any additional memory, since we could do all our work with the available registers. When we code larger functions, we often require more memory for our variables than fit in the registers. Rather than add clutter to the .data section, we store these variables on the stack.

PUSHing these variables on the stack isn’t practical, since we usually need to access them in a random order, rather than the strict LIFO protocol that PUSH/POP enforce.

To allocate space on the stack, we use a subtract instruction to grow the stack by the amount we need. Suppose we need three variables which are each 32-bit integers, say a, b, and c. Therefore, we need 12 bytes allocated on the stack (3 variables x 4 bytes/word).
      SUB   SP, #12
This moves the stack pointer down by 12 bytes, providing us a region of memory on the stack to place our variables. Suppose a is in R0, b in R1, and c in R2, we can then store these using
      STR   R0, [SP]            @ Store a
      STR   R1, [SP, #4]        @ Store b
      STR   R2, [SP, #8]        @ Store c
Before the end of the function, we need to execute
      ADD   SP, #12

to release our variables from the stack. Remember, it is the responsibility of a function to restore SP to its original state before returning.

This is the simplest way to allocate some variables. However, if we are doing a lot of other things with the stack in our function, it can be hard to keep track of these offsets. The way we alleviate this is with a stack frame. Here we allocate a region on the stack and keep a pointer to this region in another register that we will refer to as the Frame Pointer (FP) . You could use any register as the FP, but we will follow the C programming convention and use R11.

To use a stack frame, we first set our frame pointer to the next free spot on the stack (it grows in descending addresses), then we allocate the space as before:
      SUB   FP, SP, #4
      SUB   SP, #12
Now we address our variables using an offset from FP.
      STR   R0, [FP]               @ Store a
      STR   R1, [FP, #-4]          @ Store b
      STR   R2, [FP, #-8]          @ Store c

When we use FP , we need to include it in the list of registers we PUSH at the beginning of the function and then POP at the end. Since R11, the FP is one we are responsible for saving.

In this book, we’ll tend to not use FP. This saves a couple of cycles on function entry and exit. After all, in Assembly language programming, we want to be efficient.

Stack Frame Example

Listing 6-6 is a simple skeletal example of a function that creates three variables on the stack.
@ Simple function that takes 2 parameters
@ VAR1 and VAR2. The function adds them,
@ storing the result in a variable SUM.
@ The function returns the sum.
@ It is assumed this function does other work,
@ including other functions.
@ Define our variables
            .EQU   VAR1, 0
            .EQU   VAR2, 4
            .EQU   SUM,  8
SUMFN:      PUSH   {R4-R12, LR}
            SUB    SP, #12      @ room for three 32-bit values
            STR    R0, [SP, #VAR1]    @ save passed in param.
            STR    R1, [SP, #VAR2]    @ save second param.
@ Do a bunch of other work, but don't change SP.
            LDR    R4, [SP, #VAR1]
            LDR    R5, [SP, #VAR2]
            ADD    R6, R4, R5
            STR    R6, [SP, #SUM]
@ Do other work
@ Function Epilog
            LDR    R0, [SP, #SUM]     @ load sum to return
            ADD    SP, #12     @ Release local vars
            POP    {R4-R12, PC} @ Restore regs and return
Listing 6-6

Simple skeletal function that demonstrates a stack frame

Defining Symbols

In this example, we introduce the .EQU Assembler directive. This directive allows us to define symbols that will be substituted by the Assembler before generating the compiled code. This way, we can make the code more readable. In this example, keeping track of which variable is which on the stack makes the code hard to read and is error-prone. With the .EQU directive, we can define each variable’s offset on the stack once.

Sadly, .EQU only defines numbers, so we can’t define the whole “[SP, #4]” type string.

One More Optimization

You might notice that our SUMFN doesn’t end in “BX LR”. This is a little optimization. The BX instruction basically moves LR into PC, so why not just POP LR directly into PC? Notice this is what the POP instruction at the end of the routine does. If we pushed LR, we can save an instruction this way. This works fine as long as the caller is regular ARM32 Assembly code. There is another type of code called Thumb code which we will look at in Chapter 15, “Thumb Code.” BX lets us return to a caller that is running in Thumb mode, where popping to PC won’t cause the processor to change how it interprets instructions.

Macros

Another way to make our uppercase loop into a reusable bit of code is to use macros. The GNU Assembler has a powerful macro capability; with macros rather than calling a function, the Assembler creates a copy of the code in each place where it is called, substituting any parameters. Consider this alternate implementation of our uppercase program; the first file is mainmacro.s containing the contents of Listing 6-7.
@
@ Assembler program to convert a string to
@ all uppercase by calling a macro.
@
@ R0-R2 - parameters to linux function services
@ R1 - address of output string
@ R0 - address of input string
@ R7 - linux function number
@
.include "uppermacro.s"
.global _start       @ Provide program starting address
_start:      toupper tststr, buffer
@ Set up the parameters to print our hex number
@ and then call Linux to do it.
      MOV   R2,R0  @ R0 is the length of the string
      MOV   R0, #1        @ 1 = StdOut
      LDR   R1, =buffer @ string to print
      MOV   R7, #4       @ linux write system call
      SVC   0       @ Call linux to output the string
@ Call it a second time with our second string.
      toupper tststr2, buffer
@ Set up the parameters to print our hex number
@ and then call Linux to do it.
      MOV   R2,R0     @ R0 is the length of the string
      MOV   R0, #1            @ 1 = StdOut
      LDR   R1, =buffer @ string to print
      MOV   R7, #4            @ linux write system call
      SVC   0         @ Call linux to output the string
@ Set up the parameters to exit the program
@ and then call Linux to do it.
      MOV     R0, #0     @ Use 0 return code
      MOV     R7, #1    @ Service command code 1 terminates this program
      SVC     0  @ Call linux to terminate
.data
tststr:  .asciz  "This is our Test String that we will convert. "
tststr2: .asciz     "A second string to uppercase!! "
buffer:      .fill 255, 1, 0
Listing 6-7

Program to call our toupper macro

The macro to uppercase the string is in uppermacro.s containing Listing 6-8.
@
@ Assembler program to convert a string to
@ all uppercase.
@
@ R1 - address of output string
@ R0 - address of input string
@ R2 - original output string for length calc.
@ R3 - current character being processed
@
@ label 1 = loop
@ label 2 = cont
.MACRO      toupper      instr, outstr
      LDR   R0, =instr
      LDR   R1, =outstr
      MOV   R2, R1
@ The loop is until byte pointed to by R1 is non-zero
1:    LDRB  R3, [R0], #1      @ load character and increment pointer
@ If R5 > 'z' then goto cont
      CMP   R3, #'z'        @ is letter > 'z'?
      BGT   2f
@ Else if R5 < 'a' then goto end if
      CMP   R3, #'a'
      BLT   2f    @ goto to end if
@ if we got here then the letter is lower-case, so convert it.
      SUB   R3, #('a'-'A')
2:    @ end if
      STRB  R3, [R1], #1  @ store character to output str
      CMP   R3, #0        @ stop on hitting a null character
      BNE   1b            @ loop if character isn't null
      SUB   R0, R1, R2 @ get the length by subtracting the pointers
.ENDM
Listing 6-8

Macro version of our toupper function

Include Directive

The file uppermacro.s defines our macro to convert a string to uppercase. The macro doesn’t generate any code; it just defines the macro for the Assembler to insert wherever it is called from. This file doesn’t generate an object (∗.o) file; rather, it is included by whichever file needs to use it.

The .include directive
.include "uppermacro.s"

takes the contents of this file and inserts it at this point, so that our source file becomes larger. This is done before any other processing. This is similar to the C #include preprocessor directive.

Macro Definition

A macro is defined with the .MACRO directive. This gives the name of the macro and lists its parameters. The macro ends at the following .ENDM directive. The form of the directive is
.MACRO      macroname     parameter1, parameter2, ...
Within the macro, you specify the parameters by preceding their name with a backslash, for instance, parameter1 to place the value of parameter1. Our toupper macro defines two parameters instr and outstr:
.MACRO      toupper     instr, outstr

You can see how the parameters are used in the code with instr and oustr. These are text substitutions and need to result in correct Assembly syntax or you will get an error.

Labels

Our labels “loop” and “cont” are replaced with the labels “1” and “2”. This takes away from the readability of the program. The reason we do this is that if we didn’t, we would get an error that a label was defined more than once, if we use the macro more than once. The trick here is that the Assembler lets you define numeric labels as many times as you want. Then to reference them in our code, we used
      BGT   2f
      BNE   1b            @ loop if character isn't null

The f after the 2 means the next label 2 in the forward direction. The 1b means the next label 1 in the backward direction.

To prove that this works, we call toupper twice in the mainmacro.s file to show everything works and that we can reuse this macro as many times as we like.

Why Macros?

Macros substitute a copy of the code at every point they are used. This will make your executable file larger. If you
objdump -d mainmacro

you will see the two copies of code inserted. With functions, there is no extra code generated each time. This is why functions are quite appealing, even with the extra work of dealing with the stack.

The reason macros get used is performance. Most Raspberry Pi models have a gigabyte or more of memory that is room for a lot of copies of code. Remember that whenever we branch, we have to restart the execution pipeline, making branching an expensive instruction. With macros, we eliminate the BL branch to call the function and the BX branch to return. We also eliminate the PUSH and POP instructions to save and restore any registers we use. If a macro is small and we use it a lot, there could be considerable execution time savings.

Note

Notice in the macro implementation of toupper that I only used registers R0R3. This is to try and avoid using any registers important to the caller. There is no standard on how to regulate register usage with macros, like there is with functions, so it is up to you, the programmer, to avoid conflicts and strange bugs.

Summary

In this chapter, we covered the ARM stack and how it is used to help implement functions. We covered how to write and call functions as a first step to creating libraries of reusable code. We learned how to manage register usage, so there aren’t any conflicts between our calling programs and our functions. We learned the function calling protocol that will allow us to interoperate with other programming languages. We looked at defining stack-based storage for local variables and how to use this memory.

Finally, we covered the GNU Assembler’s macro ability as an alternative to functions in certain performance critical applications.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.15.218.254