© Stephen Smith 2020
S. Smith, Programming with 64-Bit ARM Assembly Language, https://doi.org/10.1007/978-1-4842-5881-1_9

9. Interacting with C and Python

Stephen Smith, Gibsons, BC, Canada

In the early days of microcomputers, like the Apple II, people wrote complete applications in Assembly Language, such as the first spreadsheet program VisiCalc. Many video games were also written in Assembly to squeeze every bit of performance they could out of the hardware. These days, modern compilers like the GNU C compiler generate good code and microprocessors are much faster; as a result, most applications are written in a collection of programming languages, where each excels at a specific function. If you are writing a video game today, chances are you would write most of it in C, C++, or even C# and then use Assembly for performance, or to access parts of the video hardware not exposed through the graphics library you are using.

In this chapter, we will look at using components written in other languages from our Assembly Language code and look at how other computer languages can make use of the fast, efficient code we are writing in Assembly.

Calling C Routines

If we want to call C functions, we must restructure our program. The C runtime has a _start label; it expects to be called first and to initialize itself before calling our program, which it does by calling a main function. If we leave our _start label in, we will get an error that _start is defined more than once. Similarly, we won’t call the Linux terminate program service anymore; instead we’ll return from main and let the C runtime do that along with any other cleanup it performs.

To include the C runtime, we could add it to the command line arguments in the ld command in our makefile. However, it’s easier to compile our program with the GNU C compiler (which includes the GNU Assembler); then it will link in the C runtime automatically. To compile our program, we will use
gcc -o myprogram myprogram.s

That will call as on myprogram.s and then do the ld command including the C runtime.

The C runtime gives us a lot of capabilities including wrappers for most of the Linux system services. There is an extensive library for manipulating NULL-terminated strings, routines for memory management, and routines to convert between all the data types.

Printing Debug Information

One handy use of the C runtime is to print out data to trace what our program is doing. We wrote a routine to output the contents of a register in hexadecimal, and we could write more Assembly code to extend this, or we could just get the C runtime to do it. After all, if we are printing out trace or debugging information, it doesn’t need to be fast; it just needs to be easy to add to our code.

For this example, we’ll use the C runtime’s printf function to print out the contents of a register in both decimal and hexadecimal format. We’ll package this routine as a macro, and we’ll preserve all the registers that might be corrupted. This way we can call the macro without worrying about register conflicts. The exception is the condition flags which it can’t preserve, so don’t put these macros between instructions that set the flags and then test the flags. We also provide a macro to print a string for either logging or formatting purposes.

The C printf function is mighty, as it takes a variable number of arguments depending on the contents of a format string. There is extensive online documentation on printf, so for a fuller understanding, please have a look. We will call our collection of macros debug.s; it contains the code from Listing 9-1.
// Various macros to help with debugging
// These macros preserve all registers.
// Beware they will change the condition flags.
.macro  printReg    reg
      stp       X0, X1, [SP, #-16]!
      stp       X2, X3, [SP, #-16]!
      stp       X4, X5, [SP, #-16]!
      stp       X6, X7, [SP, #-16]!
      stp       X8, X9, [SP, #-16]!
      stp       X10, X11, [SP, #-16]!
      stp       X12, X13, [SP, #-16]!
      stp       X14, X15, [SP, #-16]!
      stp       X16, X17, [SP, #-16]!
      stp       X18, LR, [SP, #-16]!
      mov       X2, X\reg    // for the %d
      mov       X3, X\reg    // for the %x
      mov       X1, #\reg
      add       X1, X1, #'0' // for %c
      ldr       X0, =ptfStr  // printf format str
      bl        printf // call printf
      ldp       X18, LR, [SP], #16
      ldp       X16, X17, [SP], #16
      ldp       X14, X15, [SP], #16
      ldp       X12, X13, [SP], #16
      ldp       X10, X11, [SP], #16
      ldp       X8, X9, [SP], #16
      ldp       X6, X7, [SP], #16
      ldp       X4, X5, [SP], #16
      ldp       X2, X3, [SP], #16
      ldp       X0, X1, [SP], #16
.endm
.macro    printStr    str
      stp       X0, X1, [SP, #-16]!
      stp       X2, X3, [SP, #-16]!
      stp       X4, X5, [SP, #-16]!
      stp       X6, X7, [SP, #-16]!
      stp       X8, X9, [SP, #-16]!
      stp       X10, X11, [SP, #-16]!
      stp       X12, X13, [SP, #-16]!
      stp       X14, X15, [SP, #-16]!
      stp       X16, X17, [SP, #-16]!
      stp       X18, LR, [SP, #-16]!
      ldr       X0, =1f     // load print str
      bl        printf // call printf
      ldp       X18, LR, [SP], #16
      ldp       X16, X17, [SP], #16
      ldp       X14, X15, [SP], #16
      ldp       X12, X13, [SP], #16
      ldp       X10, X11, [SP], #16
      ldp       X8, X9, [SP], #16
      ldp       X6, X7, [SP], #16
      ldp       X4, X5, [SP], #16
      ldp       X2, X3, [SP], #16
      ldp       X0, X1, [SP], #16
      b         2f           // branch around str
1:    .asciz         "\str\n"
      .align         4
2:
.endm
.data
ptfStr: .asciz "X%c = %32ld, 0x%016lx\n"
.align 4
.text
Listing 9-1

Debug macros that use the C runtime’s printf function

Preserving State

First, we push registers X0 through X18 and LR; we either use these registers or printf might change them. They aren’t saved as part of the function calling protocol. At the end, we restore these. This makes calling our macros as minimally disruptive to the calling code as possible.

It is unfortunate that each instruction can only save or restore two registers at a time, and since there are 19 corruptible registers along with LR, this means ten instructions to push all these registers and another ten to pop them all off of the stack.

Calling Printf

We call the C function with these arguments:
printf("X%c = %32ld, 0x%016lx\n", reg, Xreg, Xreg);
Since there are four parameters, we set them into X0 through X3. In printf, each sequence that starts with a percent sign (“%”) takes the next parameter and formats it according to the characters that follow:
  • c for character

  • d for decimal

  • x for hex

  • 0 means 0 pad

  • l for long meaning 64 bits

  • A number specifying the length of the field to print
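
As a rough cross-check of these conversion rules, Python's % string operator follows most of the same printf conventions, so the format string can be tried out without assembling anything. This is only a sketch under stated assumptions: format_reg and MASK64 are names made up for this illustration, Python accepts but ignores the l length modifier, and the masking step stands in for the fixed 64-bit register width that C provides automatically.

```python
MASK64 = (1 << 64) - 1  # illustrative constant: all 64 bits set


def format_reg(reg_num, value):
    """Format a 64-bit value roughly the way the printReg format string does."""
    unsigned = value & MASK64
    # Reinterpret the bit pattern as signed for the decimal column,
    # which is what %ld does with a 64-bit register in C.
    signed = unsigned - (1 << 64) if unsigned >> 63 else unsigned
    # ord("0") + reg_num mirrors the macro adding '0' to the register number.
    return "X%c = %32ld, 0x%016lx" % (ord("0") + reg_num, signed, unsigned)


print(format_reg(2, 3))
print(format_reg(3, 0xFFFFFFFFFFFFFFFF))
```

Note that adding '0' to the register number only yields a digit for registers 0 through 9; for 10 the character would be ':', a limitation this sketch shares with the macro.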

Note

It is important to move the value of the register to X2 and X3 first since populating the other registers might wipe out the passed in value if we are printing X0 or X1. If our register is X2 or X3, one of the MOV instructions does nothing. Luckily, we don’t get an error or warning, so we don’t need a special case.

Now we look at the details of how we pass this format string to printf.

Passing a String

In the printStr macro, we pass in a string to print. Assembly doesn’t handle strings, so we embed the string in the code with an .asciz directive, then branch around it.

There is an .align directive right after the string, since Assembly instructions must be word aligned. It is good practice to add an .align directive after strings, since other data types will load faster if they are word aligned.
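
To make the effect of the directive concrete, here is a small Python sketch of the padding arithmetic. It assumes that the operand of .align is a power of two, as it is for the GNU Assembler on ARM targets, so .align 4 pads to the next 16-byte boundary. The function name and the starting offset are invented for illustration.

```python
def padding_for_align(offset, power):
    """Return how many pad bytes bring offset up to a 2**power boundary."""
    boundary = 1 << power
    return (-offset) % boundary


# Hypothetical macro expansion whose string starts at offset 0x54.
# "Inputs:\n" plus the terminating NUL occupies 9 bytes.
string_start = 0x54
string_end = string_start + len(b"Inputs:\n\0")
pad = padding_for_align(string_end, 4)
print(hex(string_end), pad, hex(string_end + pad))
```

Since 16 is a multiple of 4, the instruction that follows the padded string is also word aligned.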

Generally, I don’t like adding data to the code section, but for our macro, this is the easiest way. The assumption is that the debug calls will be removed from the final code. If we add too many strings, we could make PC relative offsets too large to be resolved. If this happens, we may need to shorten the strings, or remove some.

Next, we need a program that needs to print something.

Adding with Carry Revisited

In Chapter 2, “Loading and Adding,” we gave sample code to add two 128-bit numbers using ADDS and ADC instructions. What was lacking from this example was some way to see the output. Now we’ll take addexamp2.s and add some calls to our debug macros, in Listing 9-2, to show it in action.
//
// Example of 128-Bit addition with the ADD/ADC instructions.
//
.include "debug.s"
.global main            // Provide program starting address
// Load the registers with some data
// First 128-bit number is 0x0000000000000003FFFFFFFFFFFFFFFF
main:
      STR    LR,[SP,#-16]!
      MOV    X2, #0x0000000000000003
      MOV    X3, #0xFFFFFFFFFFFFFFFF  // will change to MOVN
// Second 128-bit number is 0x00000000000000050000000000000001
      MOV    X4, #0x0000000000000005
      MOV    X5, #0x0000000000000001
      printStr "Inputs:"
      printReg 2
      printReg 3
      printReg 4
      printReg 5
      ADDS   X1, X3, X5   // Lower order word
      ADC    X0, X2, X4   // Higher order word
      printStr "Outputs:"
      printReg 1
      printReg 0
      MOV    X0, #0       // return code
      LDR    LR, [SP], #16
      RET
Listing 9-2

Updated addexamp2.s to print out the inputs and outputs

The makefile, in Listing 9-3, for this is quite simple.
addexamp2: addexamp2.s debug.s
     gcc -o addexamp2 addexamp2.s
Listing 9-3

Makefile for updated addexamp2.s

If we compile and run the program, we will see
smist08@kali:~/asm64/Chapter 9$ make
gcc -o addexamp2 addexamp2.s
smist08@kali:~/asm64/Chapter 9$ ./addexamp2
Inputs:
X2 =                                3, 0x0000000000000003
X3 =                               -1, 0xffffffffffffffff
X4 =                                5, 0x0000000000000005
X5 =                                1, 0x0000000000000001
Outputs:
X1 =                                0, 0x0000000000000000
X0 =                                9, 0x0000000000000009
smist08@kali:~/asm64/Chapter 9$
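
The output can be checked by hand: the low halves overflow 64 bits, leaving 0 with a carry of 1, and the high halves give 3 + 5 + 1 = 9. The following Python sketch models the same two steps; add128 and MASK64 are names chosen for this illustration, and Python's unbounded integers stand in for the carry flag.

```python
MASK64 = (1 << 64) - 1  # illustrative constant: all 64 bits set


def add128(hi_a, lo_a, hi_b, lo_b):
    """Add two 128-bit numbers held as (high, low) 64-bit halves."""
    lo_sum = lo_a + lo_b
    carry = lo_sum >> 64                 # 1 if the low halves overflowed (ADDS sets carry)
    lo = lo_sum & MASK64                 # keep only the low 64 bits
    hi = (hi_a + hi_b + carry) & MASK64  # ADC adds the carry into the high halves
    return hi, lo


hi, lo = add128(0x3, 0xFFFFFFFFFFFFFFFF, 0x5, 0x1)
print(hex(hi), hex(lo))
```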

Besides adding the debug statements, notice how the program is restructured as a function. The entry point is main, and it follows the function protocol of saving LR.

By just adding the C runtime, we bring a powerful tool-chest to save us time as we develop our full Assembly application. On the downside, notice our executable has grown to over 9KB.

Now that we know how to call C routines from our Assembly Language code, let’s do the reverse and call Assembly Language from C.

Calling Assembly Routines from C

A typical scenario is to write most of our application in C, then call Assembly Language routines in specific use cases. If we follow the function calling protocol from Chapter 6, “Functions and the Stack,” C won’t be able to tell the difference between our functions and any functions written in C.

As an example, let’s call our toupper function from C. Listing 9-4 contains the C code for uppertst.c to call our Assembly function.
//
// C program to call our Assembly
// toupper routine.
//
#include <stdio.h>
extern int mytoupper( char *, char * );
#define MAX_BUFFSIZE 255
int main()
{
      char *str = "This is a test.";
      char outBuf[MAX_BUFFSIZE];
      int len;
      len = mytoupper( str, outBuf );
      printf("Before str: %s\n", str);
      printf("After str: %s\n", outBuf);
      printf("Str len = %d\n", len);
      return(0);
}
Listing 9-4

Main program to show calling our toupper function from C

The makefile is in Listing 9-5.
uppertst: uppertst.c upper.s
      gcc -o uppertst uppertst.c upper.s
Listing 9-5

Makefile for C and our toupper function

We had to change the name of our toupper function to mytoupper, since there is already a toupper function in the C runtime, and this led to a multiple definition error. This had to be done in both the C and the Assembly code. Otherwise, the function is the same as in Chapter 6, “Functions and the Stack.”

We must define the parameters and return code for our function to the C compiler. We do this with
extern int mytoupper( char *, char * );

This should be familiar to all C programmers, as you must do this for C functions as well. Usually, you would gather up all these definitions and put them in a header (.h) file.

As far as the C code is concerned, there is no difference in using this Assembly function than if we wrote it in C. When we compile and run the program, we get
smist08@kali:~/asm64/Chapter 9$ make
gcc -o uppertst uppertst.c upper.s
smist08@kali:~/asm64/Chapter 9$ ./uppertst
Before str: This is a test.
After str: THIS IS A TEST.
Str len = 16
smist08@kali:~/asm64/Chapter 9$

The string is in upper-case as we would expect, but the string length appears one greater than we might expect. That is because the length includes the terminating NULL character, which isn’t the C convention; strlen, for example, doesn’t count it. If we really wanted to use this a lot with C, we should subtract 1, so that our length is consistent with other C runtime routines.
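
The off-by-one can be seen in a pure Python model of the behavior described here. This is not the Assembly routine itself, only a sketch of what it does to the bytes; mytoupper_model and c_style_length are hypothetical names.

```python
def mytoupper_model(src):
    """Model of the routine: uppercase ASCII a-z, return (output, count including NUL)."""
    out = bytearray()
    for ch in src + b"\0":              # C strings end with a NUL byte
        if ord("a") <= ch <= ord("z"):
            ch -= ord("a") - ord("A")   # shift lower-case letters to upper-case
        out.append(ch)                  # the terminator is copied and counted too
        if ch == 0:
            break
    return bytes(out[:-1]), len(out)


def c_style_length(src):
    """Length as strlen would report it: the terminator is not counted."""
    _, count = mytoupper_model(src)
    return count - 1


upper, count = mytoupper_model(b"This is a test.")
print(upper, count, c_style_length(b"This is a test."))
```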

Packaging Our Code

We could leave our Assembly code in individual object (.o) files, but it’s more convenient for programmers using our library to package them together in a library. This way the user of our Assembly routines just needs to add one library to get all of our code, rather than possibly dozens of .o files. In Linux there are two ways to do this. The first way is to package our code together into a static library that is linked into the program. The second method is to package our code as a shared library that lives outside the calling program and can be shared by several applications.

Static Library

To package our code as a static library, we use the Linux ar command. This command will take a number of .o files and combine them into a single file, by convention lib<ourname>.a, that can then be included into a gcc or ld command. To do this, we modify our makefile to build this way as demonstrated in Listing 9-6.
LIBOBJS = upper.o
all: uppertst2
%.o : %.s
      as $(DEBUGFLGS) $< -o $@
libupper.a: $(LIBOBJS)
      ar -cvq libupper.a upper.o
uppertst2: uppertst.c libupper.a
      gcc -o uppertst2 uppertst.c libupper.a
Listing 9-6

Makefile to build upper.s into a statically linked library

If we build and run this program, we get:
smist08@kali:~/asm64/Chapter 9$ make
as   upper.s -o upper.o
ar -cvq libupper.a upper.o
a - upper.o
gcc -o uppertst2 uppertst.c libupper.a
smist08@kali:~/asm64/Chapter 9$ ./uppertst2
Before str: This is a test.
After str: THIS IS A TEST.
Str len = 16
smist08@kali:~/asm64/Chapter 9$

The only difference compared to the last example is that we first use as to compile upper.s into upper.o and then use ar to build a library containing our routine. If we want to distribute our library, we include libupper.a, a header file with the C function definitions and some documentation. Even if you aren’t selling, or otherwise distributing your code, building libraries internally can help organizationally to share code among programmers and reduce duplicated work. In the next section, we explore shared libraries, another Linux facility for sharing code.

Shared Library

Shared libraries are much more technical than statically linked libraries. They place the code in a separate file from the executable and are dynamically loaded by the system as needed. There are several issues, such as versioning and library placement in the file system, that we will only touch on. If you decide to package your code as a shared library, this section provides a starting point and demonstrates that it applies to Assembly Language code as much as C code.

The shared library is created with the gcc command, giving it the -shared command line parameter to indicate we want to create a shared library and then the -soname parameter to name it.

To use a shared library, it must be in a specific place in the filesystem. We can add new places, but we’re going to use a place created by the C runtime, namely, /usr/local/lib. After we build our library, we copy it here and create a couple of links to it. These steps are all required as part of the shared library versioning system.

Then to use our shared library libup.so.1, we include -lup on the gcc command to compile uppertst3. The makefile is presented in Listing 9-7.
LIBOBJS = upper.o
all: uppertst3
%.o : %.s
        as $(DEBUGFLGS) $< -o $@
libup.so.1.0: $(LIBOBJS)
        gcc -shared -Wl,-soname,libup.so.1 -o libup.so.1.0 $(LIBOBJS)
        mv libup.so.1.0 /usr/local/lib
        ln -sf /usr/local/lib/libup.so.1.0 /usr/local/lib/libup.so.1
        ln -sf /usr/local/lib/libup.so.1.0 /usr/local/lib/libup.so
        ldconfig
uppertst3: libup.so.1.0
       gcc -o uppertst3 uppertst.c -lup
Listing 9-7

Makefile for building and using a shared library

If we run this as an ordinary user, several commands will fail. To copy the files to /usr/local/lib, we need root access, so use the sudo command to run make. Notice there is a call to the following command:
ldconfig

after the shared library is put in place. This causes Linux to search all the folders that hold shared libraries and update its master list. We must run this once after we successfully compile our library, or Linux won’t know it exists.
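
One way to confirm that the system can now find the library is to ask Python's ctypes.util.find_library, which on Linux consults tools such as ldconfig. This is a minimal sketch, assuming the library was installed under the name used above ("up" is the part between lib and .so); describe_library is a made-up helper name.

```python
from ctypes.util import find_library


def describe_library(short_name):
    """Report whether the system's library search can locate lib<short_name>."""
    found = find_library(short_name)
    if found is None:
        return "lib%s: not found; was ldconfig run?" % short_name
    return "lib%s: found as %s" % (short_name, found)


print(describe_library("up"))
```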

Note

Placing -lup on the end of the command to build uppertst3, after the file that uses it, is important, or you will get unresolved externals when you build.

The following is the sequence of commands to build and run the program:
smist08@kali:~/asm64/Chapter 9$ sudo make -B
as   upper.s -o upper.o
gcc -shared -Wl,-soname,libup.so.1 -o libup.so.1.0 upper.o
mv libup.so.1.0 /usr/local/lib
ln -sf /usr/local/lib/libup.so.1.0 /usr/local/lib/libup.so.1
ln -sf /usr/local/lib/libup.so.1.0 /usr/local/lib/libup.so
ldconfig
gcc -o uppertst3 uppertst.c -lup
smist08@kali:~/asm64/Chapter 9$ ./uppertst3
Before str: This is a test.
After str: THIS IS A TEST.
Str len = 16
smist08@kali:~/asm64/Chapter 9$
If you use objdump to look inside uppertst3, you won’t find the code for the mytoupper routine; instead, in our main code, you will find
 7dc: 97ffffad bl 690 <mytoupper@plt>
which calls
0000000000000690 <mytoupper@plt>:
 690: b0000090 adrp x16, 11000 <__cxa_finalize@GLIBC_2.17>
 694: f9401211 ldr x17, [x16, #32]
 698: 91008210 add x16, x16, #0x20
 69c: d61f0220 br x17

Gcc inserted this indirection into our code, so the loader can fix up the address when it dynamically loads the shared library.

As a final technique, we will look at mixing Assembly Language and C code in the same source code file.

Embedding Assembly Code Inside C Code

The GNU C compiler allows Assembly code to be embedded right in the middle of C code. It contains features to interact with C variables and labels and cooperate with the C compiler for register usage.

Listing 9-8 is a simple example, where we embed the core algorithm for the toupper function inside the C main program.
//
// C program to embed our Assembly
// toupper routine inline.
//
#include <stdio.h>
extern int mytoupper( char *, char * );
#define MAX_BUFFSIZE 255
int main()
{
      char *str = "This is a test.";
      char outBuf[MAX_BUFFSIZE];
      int len;
      asm
      (
            "MOV   X4, %2\n"
            "loop: LDRB    W5, [%1], #1\n"
            "CMP   W5, #'z'\n"
            "BGT   cont\n"
            "CMP   W5, #'a'\n"
            "BLT   cont\n"
            "SUB   W5, W5, #('a'-'A')\n"
            "cont: STRB W5, [%2], #1\n"
            "CMP   W5, #0\n"
            "B.NE  loop\n"
            "SUB   %0, %2, X4\n"
            : "=r" (len)
            : "r" (str), "r" (outBuf)
            : "r4", "r5"
      );
      printf("Before str: %s\n", str);
      printf("After str: %s\n", outBuf);
      printf("Str len = %d\n", len);
      return(0);
}
Listing 9-8

Embedding our Assembly routine directly in C code

The asm statement lets us embed Assembly code directly into our C code. By doing this, we could write an arbitrary mixture of C and Assembly. I stripped out the comments from the Assembly code, so the structure of the C and Assembly is a bit easier to read. The general form of the asm statement is
asm asm-qualifiers ( AssemblerTemplate
                : OutputOperands
                [ : InputOperands
                [ : Clobbers
                [ : GotoLabels ] ] ])
The parameters are
  • AssemblerTemplate: A C string containing the Assembly code. There are macro substitutions that start with % to let the C compiler insert the inputs and outputs.

  • OutputOperands: A list of variables or registers returned from the code. This is required, since it’s expected that the routine does something. In our case, this is “=r” (len) where the =r means an output register and that we want it to go into the C variable len.

  • InputOperands: List of input variables or registers used by our routine. In this case “r” (str), “r” (outBuf) meaning we want two registers, one holding str and one holding outBuf. It is fortunate that C string variables hold the address of the string, which is what we want in the register.

  • Clobbers: A list of registers that we use and will be clobbered when our code runs. In this case “r4” and “r5”. This statement is the same for all processors, so it just means registers 4 and 5, which in our case are X4 and X5.

  • GotoLabels: A list of C program labels that our code might want to jump to. Usually, this is an error exit. If you do jump to a C label, you must warn the compiler with a goto asm-qualifier.

You can give the input and output operands symbolic names. We didn’t, so the compiler numbers them %0, %1, … in the order they are listed, outputs first, as you can see in the Assembly code.

Since this is a single C file, it is easy to compile with
gcc -o uppertst4 uppertst4.c

Running the program produces the same output as the last section.

If you disassemble the program, you will find that the C compiler avoids using registers X4 and X5 entirely, leaving them to us. You will see it loads up our input registers from the variables on the stack, before our code executes and then copies our return value from the assigned register to the variable len on the stack. It doesn’t give the same registers we originally used, but that isn’t a problem.

This routine is straightforward and doesn’t have any side effects. If your Assembly code is modifying things behind the scenes, you need to add a volatile keyword to the asm statement to make the C compiler more conservative about any assumptions it makes about your code.

In the next section, we’ll look at calling our Assembly Language code from the popular Python programming language.

Calling Assembly from Python

If we write our functions following the Linux function calling protocol from Chapter 6, “Functions and the Stack,” we can follow the documentation on how to call C functions for any given programming language. Python has a good capability to call C functions in its ctypes module. This module requires we package our routines into a shared library.

Since Python is an interpreted language, we can’t link static libraries to it, but we can dynamically load and call shared libraries. The techniques we go through here for Python have matching components in many other interpreted languages.

The hard part is already done: we’ve built the shared library version of our upper-case function, so all we must do is call it from Python. Listing 9-9 is the Python code for uppertst5.py.
from ctypes import *
libupper = CDLL("libup.so")
libupper.mytoupper.argtypes = [c_char_p, c_char_p]
libupper.mytoupper.restype = c_int
inStr = create_string_buffer(b"This is a test!")
outStr = create_string_buffer(250)
len = libupper.mytoupper(inStr, outStr)
print(inStr.value.decode())
print(outStr.value.decode())
print(len)
Listing 9-9

Python code to call mytoupper

The code is fairly simple; we first import the ctypes module so we can use it. We then load our shared library with the CDLL function. This is an unfortunate name since it refers to Windows DLLs, rather than something more operating system neutral. Since we installed our shared library in /usr/local/lib and added it to the Linux shared library cache, Python has no trouble finding and loading it.

The next two lines are optional, but good practice. They define the function parameters and return type to Python, so it can do extra error checking.
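
A slightly more defensive variant of the loading step is sketched below. It assumes the library and function names used in this chapter; load_mytoupper is a hypothetical helper that returns None instead of raising when the library or the symbol is missing.

```python
from ctypes import CDLL, c_char_p, c_int, create_string_buffer


def load_mytoupper(path="libup.so"):
    """Return the configured mytoupper function, or None if it can't be loaded."""
    try:
        lib = CDLL(path)          # raises OSError if the library isn't found
        func = lib.mytoupper      # raises AttributeError if the symbol is missing
    except (OSError, AttributeError):
        return None
    func.argtypes = [c_char_p, c_char_p]   # two C string pointers
    func.restype = c_int                   # length returned in W0
    return func


mytoupper = load_mytoupper()
if mytoupper is None:
    print("libup.so is not installed; build it with the makefile first")
else:
    out_buf = create_string_buffer(250)
    count = mytoupper(b"hello", out_buf)
    print(out_buf.value.decode(), count)
```

With argtypes in place, passing something that is not a byte string or buffer, such as an ordinary Python str, raises ctypes.ArgumentError rather than handing a bad pointer to the routine.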

In Python, strings are immutable, meaning you can’t change them, and they are in Unicode, meaning a character can take up more than one byte. We need to provide the strings in regular buffers that we can change, and we need the strings in ASCII rather than Unicode. We can make a string ASCII in Python by putting a “b” in front of the string literal, which makes it a bytes object of ASCII characters. The create_string_buffer function in the ctypes module creates a string buffer that is compatible with C (and hence Assembly) for us to use.
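
The difference between the two kinds of string can be tried out without the shared library. The sketch below uses only the ctypes module; the variable names are illustrative.

```python
from ctypes import create_string_buffer, sizeof

text = "This is a test!"             # an ordinary (Unicode) Python string
raw = text.encode("ascii")           # same bytes as the literal b"This is a test!"
buf = create_string_buffer(raw)      # mutable copy with a trailing NUL added

size = sizeof(buf)                   # 15 characters plus the terminator
buf[0] = b"t"                        # unlike str or bytes, the buffer can be changed
print(size, buf.value, buf.value.decode("ascii"))
```

Here buf.value returns the bytes up to the first NUL, which is why decode gives back a normal Python string.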

We then call our function and print the inputs and outputs, using the decode method to convert from ASCII back to Unicode. There are quite a few good Python IDEs for Linux. I used the Thonny Python IDE, as shown in Figure 9-1, to test the program.
Figure 9-1

Our Python program running in the Thonny IDE

Summary

In this chapter, we looked at calling C functions from our Assembly code. We made use of the standard C runtime to develop some debug helper functions to make developing our Assembly code a little easier. We then did the reverse and called our Assembly upper-case function from a C main program.

We learned how to package our code for consumption as both static and shared libraries, and we embedded Assembly code directly in a C program with the asm statement. Finally, we looked at how to call our upper-case function from Python, which is typical of high-level languages with the ability to call shared libraries.

In the next chapter, Chapter 10, “Interfacing with Kotlin and Swift,” we will see how to incorporate Assembly Language code into Android and iOS apps.

Exercises

  1. Add a macro to debug.s to print a string given a register as a parameter that contains a pointer to the string to print.

  2. Add a macro to debug.s to print a register, if it contains a single ASCII character.

  3. In the printReg macro, set X0 through X18 to known unusual values before the call to printf. Then step through the call to printf to see how many of these registers are clobbered.

  4. Create a C program to call the lower-case routine from Chapter 6 (“Functions and the Stack”), Exercise 3, and print out some test cases.

  5. Create static and shared library packages for the lower-case routine from Chapter 6, Exercise 3.

  6. Take the lower-case routine from Chapter 6, Exercise 3, and embed it in C code using an asm statement.

  7. Create a Python program to call the shared library from Exercise 5.