Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

S. SmithRaspberry Pi Assembly Language Programminghttps://doi.org/10.1007/978-1-4842-5287-1_6

6. Functions and the Stack

Stephen Smith¹

(1)

Gibsons, BC, Canada

In this chapter, we will examine how to organize our code into small independent units called functions . This allows us to build reusable components that we can call easily from anywhere we wish.

Typically, in software development we start with low-level components, then build on these to create higher and higher level applications. So far, we know how to loop, perform conditional logic, and perform some arithmetic. Now, we examine how to compartmentalize our code into building blocks.

We introduce the stack ; this is a computer science data structure for storing data. If we are going to build useful reusable functions, we will need a good way to manage register usage, so that all these functions don’t clobber each other. In Chapter 5, “Thanks for the Memories,” we studied how to store data in a data segment in main memory. The problem with this is that this memory exists for the duration that our program runs. With small functions, like our converting to uppercase program, they run quickly and might need a few memory locations while they run, but when they are done, they don’t need this memory anymore. Stacks provide us a tool to manage register usage across function calls and a tool to provide memory to functions for the duration of their invocation.

We introduce a number of low-level concepts first, then we put them all together to effectively create and use functions.

Stacks on Raspbian

In computer science, a stack is an area of memory where there are two operations:

push : Adds an element to the area
pop : Returns and removes the element that was most recently added

This behavior is also called a LIFO (last in first out) queue.

When Raspbian runs a program, it gives it an 8 MB stack. In Chapter 1, “Getting Started,” we mentioned that register R13 had a special purpose as the Stack Pointer (SP) . You might have noticed that R13 is named SP in gdb, and you might have noticed that when you debugged programs, it had a large value, something like 0x7efff380. This is a pointer to the current stack location.

The ARM32 instruction set has two instructions to manipulate stacks, Load Multiple (LDM) and Store Multiple (STM) . These two instructions have quite a few options. These are to support things like whether the stack grows by increasing addresses or by decreasing addresses—, whether SP points to the end of the stack or the next free location on the stack. These options could be useful, if you are creating your own stack, or to match the requirement of a different operating system. But all we want is to work with the stack Raspbian provides us.

Fortunately, the GNU Assembler offers simpler pseudo-instructions that are mapped back to the correct forms of LDM and STM. These are

PUSH {reglist}

POP {reglist}

The {reglist} parameter is a list of registers, containing a comma-separated list of registers and register ranges. A register range is something like R2–R4, which means R2, R3, and R4, for example:

PUSH {r0, r5-r12}

POP {r0-r4, r6, r9-r12}

The registers are stored on the stack in numerical order, with the lowest register at the lowest address. You shouldn’t include PC or SP in this list. Figure 6-1 shows the process of pushing a register onto the stack and then Figure 6-2 shows the reverse operation of popping that value off the stack.

../images/486919_1_En_6_Chapter/486919_1_En_6_Fig1_HTML.jpg — Figure 6-1
Pushing R5 onto the stack

../images/486919_1_En_6_Chapter/486919_1_En_6_Fig2_HTML.jpg — Figure 6-2
Popping R4 from the stack

Branch with Link

To call a function, we need to set up the ability for the function to return execution to after the point where we called the function. We do this with the other special register we listed in Chapter 1, “Getting Started,” the Link Register (LR) which is R14. To make use of LR, we introduce the Branch with Link (BL) instruction, which is the same as the Branch (B) instruction, except it puts the address of the next instruction into LR before it performs the branch, giving us a mechanism to return from the function.

To return from our function, we use the Branch and Exchange (BX) instruction. This branch instruction takes a register as its argument, allowing us to branch to the address stored in LR to continue processing after the function completes.

In Listing 6-1, the BL instruction stores the address of the following MOV instruction into LR and then branches to myfunc. Myfunc does the useful work the function was written to do, then returns execution to the caller by having BX branch to the location stored in LR, which is the MOV instruction following the BL instruction.

@ ... other code ...

BL myfunc

MOV R1, #4

@ ... more code ...

-----------------------------

myfunc: @ do some work

BX LR

Listing 6-1

Skeleton code to call a function and return

Nesting Function Calls

We successfully called and returned from a function, but we never used the stack. Why did we introduce the stack first and then not use it? First think what happens if in the course of its processing myfunc calls another function. We would expect this to be fairly common, as we write code building on the functionality we’ve previously written. If myfunc executes a BL instruction, then BL will copy the next address into LR overwriting the return address for myfunc and myfunc won’t be able to return. What we need is a way to keep a chain of return addresses as we call function after function. Well, not a chain of return addresses, but a stack of return addresses.

If myfunc is going to call other functions, then it needs to push LR onto the stack as the first thing it does and pop it from the stack just before it returns, for example, Listing 6-2 shows this process.

@ ... other code ...

BL myfunc

MOV R1, #4

@ ... more code ...

-----------------------------

myfunc: PUSH {LR}

@ do some work ...

BL myfunc2

@ do some more work...

POP {LR}

BX LR

myfunc2: @ do some work ....

BX LR

Listing 6-2

Skeleton code for a function that calls another function

In this example, we see how convenient the stack is to store data that only needs to exist for the duration of a function call.

If a function, such as myfunc, calls other functions, then it must save LR; if it doesn’t call other functions, such as myfunc2, then it doesn’t need to save LR. Programmers often push and pop LR regardless, since if the function is modified later to add a function call and the programmer forgets to add LR to the list of saved registers, then the program will fail to return and either go into an infinite loop or crash. The downside is that there is only so much bandwidth between the CPU and memory, so PUSHing and POPing more registers does take extra execution cycles. The trade-off in speed vs. maintainability is a subjective decision depending on the circumstances.

Function Parameters and Return Values

In high-level languages, functions take parameters and return their results. Assembly language programming is no different. We could invent our own mechanisms to do this, but this is counterproductive. Eventually we will want our code to interoperate with code written in other programming languages. We will want to call our new super-fast functions from C code, and we might want to call functions that were written in C.

To facilitate this, there are a set of design patterns for calling functions. If we follow these, our code will work reliably since others have already worked out all the bugs, plus we achieve the goal of writing interoperable code.

The caller passes the first four parameters in R0, R1, R2, and R3. If there are additional parameters, then they are pushed onto the stack. If we only have two parameters, then we would only use R0 and R1. This means the first four parameters are already loaded into registers and ready to be processed. Additional parameters need to be popped from the stack before being processed.

To return a value to the caller, place it in R0 before returning. If you need to return more data, you would have one of the parameters be an address to a memory location where you can place the additional data to be returned. This is the same as C where you return data through call by reference parameters.

Managing the Registers

If you call a function, chances are it was written by a different programmer and you don’t know what registers it will use. It would be very inefficient, if you had to reload all your registers every time you call a function. As a result, there are a set of rules to govern which registers a function can use and who is responsible for saving each one:

R0–R3: These are the function parameters. The function can use these for any other purpose modifying them freely. If the calling routine needs them saved, it must save them itself.
R4–R12: These can be used freely by the called routine, but if it is responsible for saving them. That means the calling routine can assume these registers are intact.
SP: This can be freely used by the called routine. The routine must POP the stack the same number of times that it PUSHes, so it is intact for the calling routine.
LR: The called routine must preserve this as we discussed in the last section.
CPSR: Neither routine can make any assumptions about the CPSR. As far as the called routine is concerned, all the flags are unknown; similarly, they are unknown to the caller when the function returns.

Summary of the Function Call Algorithm

Calling routine

1.
If we need any of R0–R4, save them.
2.
Move first four parameters into registers R0–R4.
3.
Push any additional parameters onto the stack.
4.
Use BL to call the function.
5.
Evaluate the return code in R0.
6.
Restore any of R0–R4 that we saved.

Called function

1.
PUSH LR and R4–R12 onto the stack.
2.
Do our work.
3.
Put our return code into R0.
4.
POP LR and R4–R12.
5.
Use the BX instruction to return execution to the caller.
Note We can save ourselves some steps if we just use R0–R3 for function parameters and return codes and short-term work. Then we never have to save and restore them around function calls.
I specified saving all of LR and R4–R12, which is the safest and most maintainable practice. However, if we know we don’t use some of these registers, we can skip saving them and save some execution time on function entry and exit.
These aren’t all the rules. The coprocessors also have registers that might need saving. We’ll discuss those rules when we discuss the coprocessors.

Uppercase Revisited

Let’s organize our uppercase example from Chapter 5, “Thanks for the Memories,” as a proper function. We’ll move the function into its own file and modify the makefile to make both the calling program and the uppercase function.

First create a file called main.s containing Listing 6-3 for the driving application.

@ Assembler program to convert a string to

@ all uppercase by calling a function.

@ R0-R2 - parameters to linux function services

@ R1 - address of output string

@ R0 - address of input string

@ R5 - current character being processed

@ R7 - linux function number

.global _start @ Provide program starting address

_start: LDR R0, =instr @ start of input string

LDR R1, =outstr @ address of output string

BL toupper

@ Set up the parameters to print our hex number

@ and then call Linux to do it.

MOV R2,R0 @ return code is the length of the string

MOV R0, #1 @ 1 = StdOut

LDR R1, =outstr @ string to print

MOV R7, #4 @ linux write system call

SVC 0 @ Call linux to output the string

@ Set up the parameters to exit the program

@ and then call Linux to do it.

MOV R0, #0 @ Use 0 return code

MOV R7, #1 @ Command code 1 terminates

SVC 0 @ Call linux to terminate the program

.data

instr: .asciz "This is our Test String that we will convert. "

outstr: .fill 255, 1, 0

Listing 6-3

Main program for uppercase example

Now create a file called upper.s containing Listing 6-4, the uppercase conversion function.

@ Assembler program to convert a string to

@ all uppercase.

@ R1 - address of output string

@ R0 - address of input string

@ R4 - original output string for length calc.

@ R5 - current character being processed

.global toupper @ Allow other files to call this routine

toupper: PUSH {R4-R5} @ Save the registers we use.

MOV R4, R1

@ The loop is until byte pointed to by R1 is non-zero

loop: LDRB R5, [R0], #1 @ load character and increment pointer

@ If R5 > 'z' then goto cont

CMP R5, #'z' @ is letter > 'z'?

BGT cont

@ Else if R5 < 'a' then goto end if

CMP R5, #'a'

BLT cont @ goto to end if

@ if we got here then the letter is lower case, so convert it.

SUB R5, #('a'-'A')

cont: @ end if

STRB R5, [R1], #1 @ store character to output str

CMP R5, #0 @ stop on hitting a null character

BNE loop @ loop if character isn't null

SUB R0, R1, R4 @ get the length by subtracting the pointers

POP {R4-R5} @ Restore the register we use.

BX LR @ Return to caller

Listing 6-4

Function to convert strings to all uppercase

To build these, use the makefile in Listing 6-5.

UPPEROBJS = main.o upper.o

ifdef DEBUG

DEBUGFLGS = -g

else

DEBUGFLGS =

endif

LSTFLGS =

all: upper

%.o : %.s

as $(DEBUGFLGS) $(LSTFLGS) $< -o $@

upper: $(UPPEROBJS)

ld -o upper $(UPPEROBJS)

Listing 6-5

Makefile for the uppercase function example

Let’s step through the function call and examine the contents of important registers and the stack. We set a breakpoint at _start and single-step through the first couple of instructions and stop at the BL instruction. I set R4 to 12 and R5 to 13, so we can follow how these are saved to the stack.

r4 0xc 12

r5 0xd 13

sp 0x7efff380 0x7efff380

lr 0x0 0

pc 0x10084 0x10084 <_start+16>

We see the BL instruction is at 0x10084. Now let’s single-step again to execute the BL instruction. Here are the same registers:

r4 0xc 12

r5 0xd 13

sp 0x7efff380 0x7efff380

lr 0x10088 65672

pc 0x100b0 0x100b0 <toupper>

The LR has been set to 0x10088 which is the instruction after the BL instruction (0x10084+4). The PC is now 0x100b0, pointing to the first instruction in the toupper routine. The first instruction in toupper is the PUSH instruction to save registers R4 and R5. Let’s single-step through that instruction and examine the registers again.

r4 0xc 12

r5 0xd 13

sp 0x7efff378 0x7efff378

lr 0x10088 65672

pc 0x100b4 0x100b4 <toupper+4>

We see that the stack pointer (SP) has been decremented by 8 bytes (two words) to 0x7efff378. None of the other registers have changed. Pushing registers onto the stack does not affect their values; it only saves them. If we look at location 0x7efff378, we see

(gdb) x /4xw 0x7efff378

0x7efff378: 0x0000000c 0x0000000d 0x00000001 0x7efff504

We see copies of registers R4 and R5 on the stack.

From this little exercise, we can see what type of stack Linux uses, namely, it is a descending stack; the addresses get small as the stack grows. Further SP points to the last item saved (and not the next free slot).

Note

The toupper function doesn’t call any other functions, so we don’t save LR along with R4 and R5. If we ever change it to do so, we will need to add LR to the list. This version of toupper is intended to be as fast as possible, so I didn’t add any extra code for future maintainability and safety.

Most C programmers will object that this function is dangerous. If the input string isn’t NULL terminated, then it will overrun the output string buffer, overwriting the memory past the end. The solution is to pass in a third parameter with the buffer lengths and check in the loop that we stop at the end of the buffer if there is no NULL character.

This routine only processes the core ASCII characters. It doesn’t handle the localized characters like é; it won’t be converted to É.

Stack Frames

In our uppercase function, we didn’t need any additional memory, since we could do all our work with the available registers. When we code larger functions, we often require more memory for our variables than fit in the registers. Rather than add clutter to the .data section, we store these variables on the stack.

PUSHing these variables on the stack isn’t practical, since we usually need to access them in a random order, rather than the strict LIFO protocol that PUSH/POP enforce.

To allocate space on the stack, we use a subtract instruction to grow the stack by the amount we need. Suppose we need three variables which are each 32-bit integers, say a, b, and c. Therefore, we need 12 bytes allocated on the stack (3 variables x 4 bytes/word).

SUB SP, #12

This moves the stack pointer down by 12 bytes, providing us a region of memory on the stack to place our variables. Suppose a is in R0, b in R1, and c in R2, we can then store these using

STR R0, [SP] @ Store a

STR R1, [SP, #4] @ Store b

STR R2, [SP, #8] @ Store c

Before the end of the function, we need to execute

ADD SP, #12

to release our variables from the stack. Remember, it is the responsibility of a function to restore SP to its original state before returning.

This is the simplest way to allocate some variables. However, if we are doing a lot of other things with the stack in our function, it can be hard to keep track of these offsets. The way we alleviate this is with a stack frame. Here we allocate a region on the stack and keep a pointer to this region in another register that we will refer to as the Frame Pointer (FP) . You could use any register as the FP, but we will follow the C programming convention and use R11.

To use a stack frame, we first set our frame pointer to the next free spot on the stack (it grows in descending addresses), then we allocate the space as before:

SUB FP, SP, #4

SUB SP, #12

Now we address our variables using an offset from FP.

STR R0, [FP] @ Store a

STR R1, [FP, #-4] @ Store b

STR R2, [FP, #-8] @ Store c

When we use FP , we need to include it in the list of registers we PUSH at the beginning of the function and then POP at the end. Since R11, the FP is one we are responsible for saving.

In this book, we’ll tend to not use FP. This saves a couple of cycles on function entry and exit. After all, in Assembly language programming, we want to be efficient.

Stack Frame Example

Listing 6-6 is a simple skeletal example of a function that creates three variables on the stack.

@ Simple function that takes 2 parameters

@ VAR1 and VAR2. The function adds them,

@ storing the result in a variable SUM.

@ The function returns the sum.

@ It is assumed this function does other work,

@ including other functions.

@ Define our variables

.EQU VAR1, 0

.EQU VAR2, 4

.EQU SUM, 8

SUMFN: PUSH {R4-R12, LR}

SUB SP, #12 @ room for three 32-bit values

STR R0, [SP, #VAR1] @ save passed in param.

STR R1, [SP, #VAR2] @ save second param.

@ Do a bunch of other work, but don't change SP.

LDR R4, [SP, #VAR1]

LDR R5, [SP, #VAR2]

ADD R6, R4, R5

STR R6, [SP, #SUM]

@ Do other work

@ Function Epilog

LDR R0, [SP, #SUM] @ load sum to return

ADD SP, #12 @ Release local vars

POP {R4-R12, PC} @ Restore regs and return

Listing 6-6

Simple skeletal function that demonstrates a stack frame

Defining Symbols

In this example, we introduce the .EQU Assembler directive. This directive allows us to define symbols that will be substituted by the Assembler before generating the compiled code. This way, we can make the code more readable. In this example, keeping track of which variable is which on the stack makes the code hard to read and is error-prone. With the .EQU directive, we can define each variable’s offset on the stack once.

Sadly, .EQU only defines numbers, so we can’t define the whole “[SP, #4]” type string.

One More Optimization

You might notice that our SUMFN doesn’t end in “BX LR”. This is a little optimization. The BX instruction basically moves LR into PC, so why not just POP LR directly into PC? Notice this is what the POP instruction at the end of the routine does. If we pushed LR, we can save an instruction this way. This works fine as long as the caller is regular ARM32 Assembly code. There is another type of code called Thumb code which we will look at in Chapter 15, “Thumb Code.” BX lets us return to a caller that is running in Thumb mode, where popping to PC won’t cause the processor to change how it interprets instructions.

Macros

Another way to make our uppercase loop into a reusable bit of code is to use macros. The GNU Assembler has a powerful macro capability; with macros rather than calling a function, the Assembler creates a copy of the code in each place where it is called, substituting any parameters. Consider this alternate implementation of our uppercase program; the first file is mainmacro.s containing the contents of Listing 6-7.

@ Assembler program to convert a string to

@ all uppercase by calling a macro.

@ R0-R2 - parameters to linux function services

@ R1 - address of output string

@ R0 - address of input string

@ R7 - linux function number

.include "uppermacro.s"

.global _start @ Provide program starting address

_start: toupper tststr, buffer

@ Set up the parameters to print our hex number

@ and then call Linux to do it.

MOV R2,R0 @ R0 is the length of the string

MOV R0, #1 @ 1 = StdOut

LDR R1, =buffer @ string to print

MOV R7, #4 @ linux write system call

SVC 0 @ Call linux to output the string

@ Call it a second time with our second string.

toupper tststr2, buffer

@ Set up the parameters to print our hex number

@ and then call Linux to do it.

MOV R2,R0 @ R0 is the length of the string

MOV R0, #1 @ 1 = StdOut

LDR R1, =buffer @ string to print

MOV R7, #4 @ linux write system call

SVC 0 @ Call linux to output the string

@ Set up the parameters to exit the program

@ and then call Linux to do it.

MOV R0, #0 @ Use 0 return code

MOV R7, #1 @ Service command code 1 terminates this program

SVC 0 @ Call linux to terminate

.data

tststr: .asciz "This is our Test String that we will convert. "

tststr2: .asciz "A second string to uppercase!! "

buffer: .fill 255, 1, 0

Listing 6-7

Program to call our toupper macro

The macro to uppercase the string is in uppermacro.s containing Listing 6-8.

@ Assembler program to convert a string to

@ all uppercase.

@ R1 - address of output string

@ R0 - address of input string

@ R2 - original output string for length calc.

@ R3 - current character being processed

@ label 1 = loop

@ label 2 = cont

.MACRO toupper instr, outstr

LDR R0, =instr

LDR R1, =outstr

MOV R2, R1

@ The loop is until byte pointed to by R1 is non-zero

1: LDRB R3, [R0], #1 @ load character and increment pointer

@ If R5 > 'z' then goto cont

CMP R3, #'z' @ is letter > 'z'?

BGT 2f

@ Else if R5 < 'a' then goto end if

CMP R3, #'a'

BLT 2f @ goto to end if

@ if we got here then the letter is lower-case, so convert it.

SUB R3, #('a'-'A')

2: @ end if

STRB R3, [R1], #1 @ store character to output str

CMP R3, #0 @ stop on hitting a null character

BNE 1b @ loop if character isn't null

SUB R0, R1, R2 @ get the length by subtracting the pointers

.ENDM

Listing 6-8

Macro version of our toupper function

Include Directive

The file uppermacro.s defines our macro to convert a string to uppercase. The macro doesn’t generate any code; it just defines the macro for the Assembler to insert wherever it is called from. This file doesn’t generate an object (∗.o) file; rather, it is included by whichever file needs to use it.

The .include directive

.include "uppermacro.s"

takes the contents of this file and inserts it at this point, so that our source file becomes larger. This is done before any other processing. This is similar to the C #include preprocessor directive.

Macro Definition

A macro is defined with the .MACRO directive. This gives the name of the macro and lists its parameters. The macro ends at the following .ENDM directive. The form of the directive is

.MACRO macroname parameter1, parameter2, ...

Within the macro, you specify the parameters by preceding their name with a backslash, for instance, parameter1 to place the value of parameter1. Our toupper macro defines two parameters instr and outstr:

.MACRO toupper instr, outstr

You can see how the parameters are used in the code with instr and oustr. These are text substitutions and need to result in correct Assembly syntax or you will get an error.

Labels

Our labels “loop” and “cont” are replaced with the labels “1” and “2”. This takes away from the readability of the program. The reason we do this is that if we didn’t, we would get an error that a label was defined more than once, if we use the macro more than once. The trick here is that the Assembler lets you define numeric labels as many times as you want. Then to reference them in our code, we used

BGT 2f

BNE 1b @ loop if character isn't null

The f after the 2 means the next label 2 in the forward direction. The 1b means the next label 1 in the backward direction.

To prove that this works, we call toupper twice in the mainmacro.s file to show everything works and that we can reuse this macro as many times as we like.

Why Macros?

Macros substitute a copy of the code at every point they are used. This will make your executable file larger. If you

objdump -d mainmacro

you will see the two copies of code inserted. With functions, there is no extra code generated each time. This is why functions are quite appealing, even with the extra work of dealing with the stack.

The reason macros get used is performance. Most Raspberry Pi models have a gigabyte or more of memory that is room for a lot of copies of code. Remember that whenever we branch, we have to restart the execution pipeline, making branching an expensive instruction. With macros, we eliminate the BL branch to call the function and the BX branch to return. We also eliminate the PUSH and POP instructions to save and restore any registers we use. If a macro is small and we use it a lot, there could be considerable execution time savings.

Note

Notice in the macro implementation of toupper that I only used registers R0–R3. This is to try and avoid using any registers important to the caller. There is no standard on how to regulate register usage with macros, like there is with functions, so it is up to you, the programmer, to avoid conflicts and strange bugs.

Summary

In this chapter, we covered the ARM stack and how it is used to help implement functions. We covered how to write and call functions as a first step to creating libraries of reusable code. We learned how to manage register usage, so there aren’t any conflicts between our calling programs and our functions. We learned the function calling protocol that will allow us to interoperate with other programming languages. We looked at defining stack-based storage for local variables and how to use this memory.

Finally, we covered the GNU Assembler’s macro ability as an alternative to functions in certain performance critical applications.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for 6. Functions and the Stack

Create new playlist

Sign In

Sign Up

6. Functions and the Stack

Stacks on Raspbian

Branch with Link

Nesting Function Calls

Function Parameters and Return Values

Managing the Registers

Summary of the Function Call Algorithm

Uppercase Revisited

Note

Stack Frames

Stack Frame Example

Defining Symbols

One More Optimization

Macros

Include Directive

Macro Definition

Labels

Why Macros?

Note

Summary

Table of Contents for
6. Functions and the Stack