In the early days of microcomputers, like the Apple II, people wrote complete applications in Assembly language, such as the first spreadsheet program VisiCalc. Many video games were written in Assembly to squeeze every bit of performance they could out of the hardware. These days, modern compilers like the GNU C compiler generate fairly good code and microprocessors are much faster; as a result, most applications are written in a collection of programming languages, where each excels at a specific function. If you are writing a video game today, chances are you would write most of it in C, C++, or even C#, then use Assembly for performance, or to access parts of the video hardware not exposed through the graphics library you are using.
In this chapter, we will look at using components written in other languages from our Assembly language code and look at how other languages can make use of the fast, efficient code we are writing in Assembly.
Calling C Routines
If we want to call C functions, we must restructure our program. The C runtime has a _start label; it expects to be called first and to initialize itself before calling our program, which it does by calling a main function. If we leave our _start label in, we will get an error that _start is defined more than once. Similarly, we won’t call the Linux terminate program service anymore; instead, we’ll return from main and let the C runtime do that along with any other cleanup it performs.
Building with gcc, rather than running as and ld ourselves, will call as on myprogram.s and then do the ld command including the C runtime.
The C runtime gives us a lot of capabilities including wrappers for most of the Linux system services. There is an extensive library for manipulating NULL-terminated strings, routines for memory management, and routines to convert between all the data types.
Printing Debug Information
One handy use of the C runtime is to print out data to trace what our program is doing. We wrote a routine to output the contents of a register in hexadecimal, and we could write more Assembly code to extend this or we could just get the C runtime to do it. After all, if we are printing out trace or debugging information, it doesn’t need to be performant, rather just easy to add to our code.
For this example, we’ll use the C runtime’s printf function to print out the contents of a register in both decimal and hexadecimal format. We’ll package this routine as a macro, and we’ll preserve all the registers with push and pop instructions. This way, we can call the macro without worrying about register conflicts. The exception is the CPSR, which the macros do not preserve, so don’t place them between an instruction that sets the condition flags and an instruction that tests them. We also provide a macro to print a string for either logging or formatting purposes.
Debug macros that use the C runtime’s printf function
Preserving State
First, we push registers R0–R4 and LR; we either use these registers, or printf might change them. They aren’t saved as part of the function calling protocol. At the end, we restore these. This makes calling our macros as minimally disruptive to the calling code as possible.
Calling Printf
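The printf format string uses % to mark where each argument is substituted. The characters that follow the % have these meanings:
c for character.
d for decimal.
x for hex.
0 means 0 pad.
A number specifies the length of the field to print.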
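As a rough illustration of these format codes, the sketch below builds a line similar to what a register-printing macro might produce. The function name format_reg and the layout of the format string are assumptions made for illustration, not the chapter's macro, and snprintf is used in place of printf only so the result can be inspected.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats one register as "R<n> = <decimal>, 0x<hex>".
 * %c prints the register digit, %d prints the value in decimal, and
 * %08x prints it in hex, zero padded to a field width of 8. */
int format_reg(char *buf, size_t size, int reg, unsigned int value)
{
    assert(reg >= 0 && reg <= 9); /* only single-digit register numbers */
    return snprintf(buf, size, "R%c = %d, 0x%08x",
                    '0' + reg, (int)value, value);
}
```

Calling format_reg(buf, sizeof buf, 2, 255) leaves the text R2 = 255, 0x000000ff in buf.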
Note
It is important to move the value of the register to R2 and R3 first since populating the other registers might wipe out the passed-in value if we are printing R0 or R1. If our register is R2 or R3, one of the MOV instructions does nothing. Luckily, we don’t get an error or warning, so we don’t need a special case.
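The ordering hazard described in this note can be modeled in C. This is only a simulation: the Regs structure, the function names, and the stand-in argument values are all hypothetical, and an array of four integers plays the role of R0 to R3.

```c
#include <assert.h>

/* Toy model of registers R0-R3; this is not real register access. */
typedef struct { unsigned int r[4]; } Regs;

/* Safe order: copy the value being printed into R2 and R3 first,
 * then overwrite R0 and R1 with the other printf arguments. */
void load_args_safe(Regs *regs, int src, unsigned int fmt, unsigned int name)
{
    assert(src >= 0 && src <= 3);
    regs->r[2] = regs->r[src];
    regs->r[3] = regs->r[src];
    regs->r[1] = name;
    regs->r[0] = fmt;
}

/* Unsafe order: R0 and R1 are overwritten before the copy, so the
 * original value is lost whenever src is 0 or 1. */
void load_args_unsafe(Regs *regs, int src, unsigned int fmt, unsigned int name)
{
    assert(src >= 0 && src <= 3);
    regs->r[0] = fmt;
    regs->r[1] = name;
    regs->r[2] = regs->r[src];
    regs->r[3] = regs->r[src];
}
```

With src set to 0, the safe version ends with the original value in R2 and R3, while the unsafe version ends with the format argument there instead.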
Passing a String
In the printStr macro, we pass in a string to print. Assembly doesn’t handle strings, so we embed the string in the code with an .asciz directive, then branch around it.
There is an .align directive right after the string, since Assembly instructions must be word aligned. It is good practice to add an .align directive after strings, since other data types will load faster if they are word aligned.
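The arithmetic behind this padding can be sketched in C. The function names are hypothetical, and the sketch assumes that the string starts on a word boundary and that a word is 4 bytes, as on 32-bit ARM.

```c
#include <stddef.h>
#include <string.h>

/* Round offset up to the next multiple of 4 (the 32-bit ARM word size). */
size_t align_word(size_t offset)
{
    return (offset + 3u) & ~(size_t)3u;
}

/* Bytes occupied by a string emitted with .asciz (its characters plus the
 * terminating zero byte) once it has been padded to a word boundary.
 * Assumes the string itself starts word aligned. */
size_t asciz_aligned_size(const char *s)
{
    return align_word(strlen(s) + 1u);
}
```

A three-character string fills exactly one word with its terminator, while a four-character string spills into a second word and needs three bytes of padding.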
Generally, I don’t like adding data to the code section, but for our macro, this is the easiest way. The assumption is that the debug calls will be removed from the final code. If we add too many strings, we could make PC relative offsets too large to be resolved. If this happens, we may need to shorten the strings or remove some.
Adding with Carry Revisited
Updated addexamp2.s to print out the inputs and outputs
Makefile for updated addexamp2.s
Besides adding the debug statements, notice how the program is restructured as a function. The entry point is main, and it follows the function protocol of saving all the registers. Since this is the main routine and only called once, we save all the registers rather than try to track the registers we are really using. This is the safest approach, since then we don’t have to worry about it as we work on our program.
By just adding the C runtime, we bring a powerful tool chest to save us time as we develop our full Assembly application. On the downside, notice our executable has grown to over 8KB.
Calling Assembly Routines from C
A typical scenario is to write most of our application in C, then call Assembly language routines in specific use cases. If we follow the function calling protocol from Chapter 6, “Functions and the Stack,” C won’t be able to tell the difference between our functions and any other functions written in C.
Main program to show calling our toupper function from C
Makefile for C and our toupper function
We had to change the name of our toupper function to mytoupper, since there is already a toupper function in the C runtime, and this led to a multiple definition error. This had to be done in both the C and the Assembly code. Otherwise, the function is the same as in Chapter 6, “Functions and the Stack.”
The C code must declare our function's parameters and return type before it can call the function. This should be familiar to all C programmers, as you must do this for C functions as well. Usually, you would gather up all these definitions and put them in a header (.h) file.
The string is in uppercase as we would expect, but the string length appears one greater than we might expect. That is because the length includes the terminating NULL character, which isn’t the C convention; strlen, for example, does not count it. If we really wanted to use this a lot with C, we should subtract 1, so that our length is consistent with other C runtime routines.
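The declaration and the off-by-one length can be sketched in C. Note the assumptions: the mytoupper below is a plain C stand-in that only mimics the behavior described here (it returns a length that counts the terminating NULL), not the Assembly routine from Chapter 6, its const-qualified parameter is a choice made for this sketch, and mytoupper_len is a hypothetical wrapper name.

```c
#include <ctype.h>

/* Prototype as a C caller would declare it, normally in a header (.h) file. */
int mytoupper(const char *in, char *out);

/* C stand-in for the Assembly routine: copies in to out in uppercase and
 * returns the number of bytes written, including the terminating zero. */
int mytoupper(const char *in, char *out)
{
    int n = 0;
    do {
        out[n] = (char)toupper((unsigned char)in[n]);
    } while (in[n++] != '\0');
    return n;
}

/* Hypothetical wrapper: subtract 1 so the result agrees with strlen(). */
int mytoupper_len(const char *in, char *out)
{
    return mytoupper(in, out) - 1;
}
```

For a 12-character input, mytoupper returns 13 and mytoupper_len returns 12, which matches what strlen would report for the converted string.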
Packaging Our Code
We could leave our Assembly code in individual object (.o) files, but it is more convenient for programmers using our library to package them together in a library. This way, the user of our Assembly routines just needs to add one library to get all of our code, rather than possibly dozens of .o files. In Linux there are two ways to do this; the first way is to package our code together into a static library that is linked into the program. The second method is to package our code as a shared library that lives outside the calling program and can be shared by several applications.
Static Library
Makefile to build upper.s into a statically linked library
The only difference from the last example is that we first use as to assemble upper.s into upper.o and then use ar to build a library containing our routine. If we want to distribute our library, we include libupper.a, a header file with the C function definitions, and some documentation. Even if you aren’t selling or otherwise distributing your code, building libraries internally can help organizationally to share code among programmers and reduce duplicated work.
Shared Library
Shared libraries are much more technical than statically linked libraries. They place the code in a separate file from the executable and are dynamically loaded by the system as needed. There are a number of issues, such as versioning and library placement in the filesystem, that we are only going to touch on. If you decide to package your code as a shared library, this section provides a starting point and demonstrates that it applies to Assembly code as much as C code.
The shared library is created with the gcc command, giving it the -shared command-line parameter to indicate we want to create a shared library and then the -soname parameter to name it. Since -soname is a linker option, gcc hands it on to ld by way of -Wl.
To use a shared library, it must be in a specific place in the filesystem. We can add new places, but we are going to use a place created by the C runtime, namely, /usr/local/lib. After we build our library, we copy it here and create a couple of links to it. These steps are all required as part of the shared library versioning system.
Makefile for building and using a shared library
We must refresh the system's shared library cache before we run the program. This causes Linux to search all the folders that hold shared libraries and update its master list. We have to run this once after we successfully compile our library, or Linux won’t know it exists.
Gcc inserted this indirection into our code, so the loader can fix up the address when it dynamically loads the shared library.
Embedding Assembly Code Inside C Code
The GNU C compiler allows Assembly code to be embedded right in the middle of C code. It contains features to interact with C variables and labels and cooperate with the C compiler and optimizer for register usage.
Embedding our Assembly routine directly in C code
AssemblerTemplate: A C string containing the Assembly code. There are macro substitutions that start with % to let the C compiler insert the inputs and outputs.
OutputOperands: A list of variables or registers returned from the code. A routine is normally expected to produce something, so this list is usually present; GCC does allow it to be empty, in which case the asm statement is treated as volatile. In our case this is “=r” (len) where the =r means an output register and that we want it to go into the C variable len.
InputOperands: A list of input variables or registers used by our routine, in this case “r” (str), “r” (outBuf) meaning we want two registers, one holding str and one holding outBuf. It is fortunate that C string variables hold the address of the string, which is what we want in the register.
Clobbers: A list of registers that we use and will be clobbered when our code runs, in this case “r4” and “r5”.
GotoLabels: A list of C program labels that our code might want to jump to. Usually, this is an error exit. If you do jump to a C label, you have to warn the compiler with a goto asm-qualifier.
You can label the input and output operands. We didn’t, so the compiler assigns them the names %0, %1, … as you can see in the Assembly code.
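Labeled operands can be sketched as follows. The function add_named is hypothetical and unrelated to the chapter's routine. The asm statement is compiled only when targeting 32-bit ARM (ARM or Thumb-2 instruction sets are assumed); any other target falls back to plain C so the file still builds. The sketch spells the keyword as __asm__, which GCC also accepts in strict ISO C modes.

```c
/* Adds two integers. On 32-bit ARM the addition is done by an embedded
 * ADD instruction whose operands are referred to by name, such as %[sum],
 * rather than by position (%0, %1, ...). */
int add_named(int a, int b)
{
    int sum;
#if defined(__arm__)
    __asm__("add %[sum], %[lhs], %[rhs]"
            : [sum] "=r" (sum)              /* output operand */
            : [lhs] "r" (a), [rhs] "r" (b)  /* input operands */);
#else
    sum = a + b; /* portable fallback for non-ARM builds */
#endif
    return sum;
}
```

No clobber list is needed here, because the instruction touches only the registers the compiler assigned to the three operands.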
Running the program produces the same output as the last section.
If you disassemble the program, you will find that the C compiler avoids using registers R4 and R5 entirely, leaving them to us. You will see it load our input registers from the variables on the stack before our code executes and then copy our return value from the assigned register to the variable len on the stack. It doesn’t assign the same registers we originally used, but that isn’t a problem.
This routine is straightforward and doesn’t have any side effects. If your Assembly code is modifying things behind the scenes, you need to add a volatile keyword to the asm statement to make the C compiler more conservative about any assumptions it makes about your code.
Calling Assembly from Python
If we write our functions following the Raspbian function calling protocol from Chapter 6, “Functions and the Stack,” we can follow the documentation on how to call C functions for any given programming language. Python has a good capability to call C functions in its ctypes module. This module requires we package our routines into a shared library. Since Python is an interpreted language, we can’t link static libraries to it, but we can dynamically load and call shared libraries. The techniques we go through here for Python have matching components in many other interpreted languages.
Python code to call mytoupper
The code is fairly simple; we first import the ctypes module so we can use it. We then load our shared library with the CDLL function. This is an unfortunate name since it refers to Windows DLLs rather than something more operating system neutral. Since we installed our shared library in /usr/local/lib and added it to the Linux shared library cache, Python has no trouble finding and loading it.
The next two lines are optional, but good practice. They define the function parameters and return type to Python, so it can do extra error checking.
In Python, strings are immutable, meaning you can’t change them, and they are in Unicode, meaning a character can take up more than 1 byte. We need to provide the strings in regular buffers that we can change, and we need the strings in ASCII rather than Unicode. We can make a string ASCII in Python by putting a “b” in front of the string literal; that makes it a bytes object holding ASCII characters. The create_string_buffer function in the ctypes module creates a string buffer that is compatible with C (and hence Assembly) for us to use.
Summary
In this chapter, we looked at calling C functions from our Assembly code. We made use of the standard C runtime to develop some debug helper functions to make developing our Assembly code a little easier. We then did the reverse and called our Assembly uppercase function from a C main program.
We learned how to package our code for consumption as both static and shared libraries, and we embedded Assembly code directly inside C code. We looked at how to call our uppercase function from Python, which is typical of high-level languages with the ability to call shared libraries.
In the next chapter, Chapter 10, “Multiply, Divide, and Accumulate,” we will return to mathematics. We will cover multiplication, division, and multiply with accumulate.