Creating C-structs from Rust

Now, how do we call these functions from Rust? Moreover, how do we create instances of mars_t or insn_t? Well, recall from Chapter 03, The Rust Memory Model – Ownership, References, and Manipulation, that Rust allows control over memory layout in structures. Specifically, all Rust structures are implicitly repr(Rust): each field is aligned to its natural boundary, but the compiler is free to reorder fields as it sees fit, among other details. We also noted the existence of repr(C) for structures, under which Rust lays out a structure's memory representation in the same manner as C. That knowledge now comes into play.
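To make the difference concrete, here is a small sketch with two hypothetical structs, RustLayout and CLayout, that share the same three fields. Only the repr(C) size is pinned down: on mainstream targets it comes to 12 bytes because of padding. The repr(Rust) size is whatever the compiler chooses; current compilers typically reorder the fields to fit in 8 bytes, but that is not guaranteed, so the sketch only relies on it being no larger.

```rust
use std::mem::size_of;

/// Default representation: the compiler may reorder fields to reduce padding.
#[allow(dead_code)]
pub struct RustLayout {
    a: u8,
    b: u32,
    c: u8,
}

/// C representation: declaration order is kept and each field is padded to
/// its natural alignment, as a C compiler would on mainstream targets.
#[allow(dead_code)]
#[repr(C)]
pub struct CLayout {
    a: u8,  // offset 0, then 3 bytes of padding so that `b` is 4-aligned
    b: u32, // offset 4
    c: u8,  // offset 8, then 3 bytes of tail padding
}

/// Returns (size of the repr(Rust) struct, size of the repr(C) struct).
pub fn sizes() -> (usize, usize) {
    (size_of::<RustLayout>(), size_of::<CLayout>())
}
```

Only the repr(C) version can safely be handed across an FFI boundary, since only its layout is something the C side can predict.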

What we will do is this. First, we'll compile our C code as a library and rig it to link into feruscore. That's done by placing a build.rs at the root of the project and using the cc (https://crates.io/crates/cc) crate to produce a static archive, like so:

extern crate cc;

fn main() {
    cc::Build::new()
        .file("c_src/sim.c")
        .flag("-std=c11")
        .flag("-O3")
        .flag("-Wall")
        .flag("-Werror")
        .flag("-Wunused")
        .flag("-Wpedantic")
        .flag("-Wunreachable-code")
        .compile("mars");
}

When the project is built, the cc crate writes libmars.a into Cargo's build output directory under target/ and tells Cargo to link it statically. But how do we make an insn_t? We copy the representation. The C side of this project defines insn_t like so:

typedef struct insn_st {
  uint16_t a, b;
  uint16_t in;
} insn_t;

uint16_t a and uint16_t b are the a-field and b-field of the instruction, and uint16_t in is a compressed representation of the OpCode, Modifier, and Modes of the instruction. The Rust side of the project defines an instruction like so:

#[derive(PartialEq, Eq, Copy, Clone, Debug, Default)]
#[repr(C)]
pub struct Instruction {
    a: u16,
    b: u16,
    ins: u16,
}

This is exactly the layout of the C insn_t. The reader will note that this is quite different from the definition of Instruction we saw in the previous chapter. Note also that the field names do not matter, only the bit representation. The C structure calls the last field of the struct in, but in is a reserved keyword in Rust, so it is ins on the Rust side. Now, what is going on with that ins field? Recall that the Mode enumeration had only five variants. All we really need to encode a mode is three bits, once the enumeration is converted into its numeric representation. A similar idea holds for the other components of an instruction. The layout of the ins field is:

bit    15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
field  |flag||---opcode----||--mod--||-b-mode||a-mode|

The Mode for the a-field is encoded in bits 0, 1, and 2. The Mode for the b-field is in bits 3, 4, and 5, and so on for the other instruction components. The last two bits, 14 and 15, encode a flag that is almost always zero. A non-zero flag tells the simulator that the flagged instruction is the START instruction; the 0th instruction of a warrior is not necessarily the one MARS executes first. This compact structure requires a little more work on the part of the programmer to support it. For instance, the Instruction can no longer be created directly by the programmer but has to be constructed through a builder. The InstructionBuilder, defined in src/instruction.rs, is:

pub struct InstructionBuilder {
    core_size: u16,
    ins: u16,
    a: u16,
    b: u16,
}

As always, we have to keep track of the core size. Building the builder is straightforward enough, by this point:

impl InstructionBuilder {
    pub fn new(core_size: u16) -> Self {
        InstructionBuilder {
            core_size,
            ins: 0_u16,
            a: 0,
            b: 0,
        }
    }

Writing a field into the instruction requires some bit manipulation. Here's writing a Modifier:

    pub fn modifier(mut self, modifier: Modifier) -> Self {
        let modifier_no = modifier as u16;
        self.ins &= !MODIFIER_MASK;
        self.ins |= modifier_no << MODIFIER_MASK.trailing_zeros();
        self
    }

The constant MODIFIER_MASK is defined in a block at the top of the source file with the other field masks:

const AMODE_MASK: u16 = 0b0000_0000_0000_0111;
const BMODE_MASK: u16 = 0b0000_0000_0011_1000;
const MODIFIER_MASK: u16 = 0b0000_0001_1100_0000;
const OP_CODE_MASK: u16 = 0b0011_1110_0000_0000;
const FLAG_MASK: u16 = 0b1100_0000_0000_0000;

Observe that the relevant bits in each mask are 1 bits. In InstructionBuilder::modifier we &= the negation of the mask, which bitwise-ands ins with the inverted modifier mask, zeroing whatever Modifier was previously there. That done, the Modifier encoded as u16 is shifted left and bitwise-or'ed into place. The trailing_zeros() function returns the number of contiguous zero bits at the low end of a word, which is exactly the number of places we need to shift by for each mask. Readers who have done bit-manipulation work in other languages may find this very clean. I think so as well. Rust's explicit binary literals make writing, and later understanding, masks a breeze, and common bit-manipulation operations and queries are implemented on every primitive integer type. Very useful.
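The clear-then-or sequence generalizes to every mask. The sketch below, using hypothetical free functions set_field, get_field, and is_start, writes and reads any field of the ins word, with a worked example in its comments. It adds one guard that the chapter's modifier method does not have: the shifted value is masked again, so an out-of-range number is truncated rather than corrupting the neighboring field.

```rust
// The five field masks, as defined at the top of src/instruction.rs.
pub const AMODE_MASK: u16 = 0b0000_0000_0000_0111;
pub const BMODE_MASK: u16 = 0b0000_0000_0011_1000;
pub const MODIFIER_MASK: u16 = 0b0000_0001_1100_0000;
pub const OP_CODE_MASK: u16 = 0b0011_1110_0000_0000;
pub const FLAG_MASK: u16 = 0b1100_0000_0000_0000;

/// Write `value` into the bits selected by `mask`, leaving all other bits of
/// `ins` untouched: the same clear-then-or steps as InstructionBuilder::modifier.
///
/// Worked example, writing modifier 0b010 into ins = 0xFFFF:
///   ins & !MODIFIER_MASK = 0b1111_1110_0011_1111
///   0b010 << 6           = 0b0000_0000_1000_0000
///   bitwise or           = 0b1111_1110_1011_1111
pub fn set_field(ins: u16, mask: u16, value: u16) -> u16 {
    let shift = mask.trailing_zeros();
    // The trailing `& mask` is an extra guard not in the chapter's code: an
    // out-of-range value is truncated instead of spilling into other fields.
    (ins & !mask) | ((value << shift) & mask)
}

/// Read a field back: keep only the mask's bits, then shift them down to bit 0.
pub fn get_field(ins: u16, mask: u16) -> u16 {
    (ins & mask) >> mask.trailing_zeros()
}

/// A non-zero flag marks the START instruction.
pub fn is_start(ins: u16) -> bool {
    (ins & FLAG_MASK) != 0
}
```

For instance, set_field(set_field(0, OP_CODE_MASK, 2), AMODE_MASK, 5) evaluates to 0b0000_0100_0000_0101: op-code 2 in bits 9 through 13 and a-mode 5 in bits 0 through 2.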

The OpCode layout has changed somewhat. We don't repr(C) the enum, as its bit representation does not matter. What does matter, since this enumeration is field-less, is which integer each variant casts to: the first variant in the source maps to 0, the second to 1, and so forth. The C code defines its op-codes like so in c_src/insn.h:

enum ex_op {
    EX_DAT,             /* must be 0 */
    EX_SPL,
    EX_MOV,
    EX_DJN,
    EX_ADD,
    EX_JMZ,
    EX_SUB,
    EX_SEQ,
    EX_SNE,
    EX_SLT,
    EX_JMN,
    EX_JMP,
    EX_NOP,
    EX_MUL,
    EX_MODM,
    EX_DIV,             /* 15 */
};

The Rust version is as follows:

#[derive(PartialEq, Eq, Copy, Clone, Debug, Rand)]
pub enum OpCode {
    Dat,  // 0
    Spl,  // 1
    Mov,  // 2
    Djn,  // 3
    Add,  // 4
    Jmz,  // 5
    Sub,  // 6
    Seq,  // 7
    Sne,  // 8
    Slt,  // 9
    Jmn,  // 10
    Jmp,  // 11
    Nop,  // 12
    Mul,  // 13
    Modm, // 14
    Div,  // 15
}
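A field-less enum converts to an integer with a plain as u16 cast, but Rust offers no cast in the other direction. One way to decode the op-code bits back into an OpCode, sketched here as an assumption rather than the chapter's code, is a lookup table in declaration order plus a TryFrom<u16> impl. The ALL table and the impl are hypothetical, and the Rand derive is left out so the block needs no external crate.

```rust
use std::convert::TryFrom;

/// Field-less enum: each variant casts to its position in the declaration,
/// matching the order of `enum ex_op` in c_src/insn.h.
#[derive(PartialEq, Eq, Copy, Clone, Debug)]
pub enum OpCode {
    Dat, Spl, Mov, Djn, Add, Jmz, Sub, Seq,
    Sne, Slt, Jmn, Jmp, Nop, Mul, Modm, Div,
}

/// Every variant in declaration order, so `ALL[n] as u16 == n`.
pub const ALL: [OpCode; 16] = [
    OpCode::Dat, OpCode::Spl, OpCode::Mov, OpCode::Djn,
    OpCode::Add, OpCode::Jmz, OpCode::Sub, OpCode::Seq,
    OpCode::Sne, OpCode::Slt, OpCode::Jmn, OpCode::Jmp,
    OpCode::Nop, OpCode::Mul, OpCode::Modm, OpCode::Div,
];

impl TryFrom<u16> for OpCode {
    type Error = u16;

    /// Decode a number read out of the op-code bits; values past Div are
    /// rejected and handed back unchanged as the error.
    fn try_from(n: u16) -> Result<Self, Self::Error> {
        ALL.get(usize::from(n)).copied().ok_or(n)
    }
}
```

Writing explicit discriminants (Dat = 0 through Div = 15) would be an alternative that pins each number independently of declaration order, a cheap safeguard when the values must agree with a C header.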

The other instruction components have been shuffled around a little to cope with the changes required by the C code. The good news is that this representation is more compact than the one from the previous chapter and should probably be kept even if all the C code were ported into Rust, a topic we'll get into later. But all this bit fiddling does make the implementation harder to read and to verify at a glance. (I'll spare you the full definition of InstructionBuilder: once you've seen one set-function, you've seen them all.) The instruction module now has QuickCheck tests to verify that all the fields get set correctly, meaning they can be read right back out again no matter how many times fields are set and reset. You are encouraged to examine the QuickCheck tests yourself.

The high-level idea is this: a blank Instruction is made, and a sequence of change orders is run over that Instruction, which is momentarily shifted into an InstructionBuilder to allow for modification. After each change, the changed field is read back and confirmed to hold the value it was set to. The technique is in line with what we've seen elsewhere.
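The sketch below fills in a few more set-functions and mirrors the shape of that test. The chapter's tests use QuickCheck to generate random change orders; to stay free of external crates, this version swaps in exhaustive enumeration of every op-code, modifier, and a-mode value, applied one after another to the same Instruction. The getters and the to_builder and build methods are hypothetical names, and raw u16 values stand in for the OpCode, Modifier, and Mode enums.

```rust
const AMODE_MASK: u16 = 0b0000_0000_0000_0111;
const MODIFIER_MASK: u16 = 0b0000_0001_1100_0000;
const OP_CODE_MASK: u16 = 0b0011_1110_0000_0000;

#[derive(PartialEq, Eq, Copy, Clone, Debug, Default)]
#[repr(C)]
pub struct Instruction {
    a: u16,
    b: u16,
    ins: u16,
}

impl Instruction {
    fn field(&self, mask: u16) -> u16 {
        (self.ins & mask) >> mask.trailing_zeros()
    }
    pub fn opcode(&self) -> u16 { self.field(OP_CODE_MASK) }
    pub fn modifier(&self) -> u16 { self.field(MODIFIER_MASK) }
    pub fn a_mode(&self) -> u16 { self.field(AMODE_MASK) }

    /// Hypothetical: reopen a finished Instruction for modification.
    pub fn to_builder(self, core_size: u16) -> InstructionBuilder {
        InstructionBuilder { core_size, ins: self.ins, a: self.a, b: self.b }
    }
}

pub struct InstructionBuilder {
    core_size: u16, // unused here; the a-field and b-field setters would need it
    ins: u16,
    a: u16,
    b: u16,
}

impl InstructionBuilder {
    pub fn new(core_size: u16) -> Self {
        InstructionBuilder { core_size, ins: 0, a: 0, b: 0 }
    }

    // The shared clear-then-or step behind every set-function.
    fn set(mut self, mask: u16, value: u16) -> Self {
        self.ins &= !mask;
        self.ins |= (value << mask.trailing_zeros()) & mask;
        self
    }

    pub fn opcode(self, op: u16) -> Self { self.set(OP_CODE_MASK, op) }
    pub fn modifier(self, m: u16) -> Self { self.set(MODIFIER_MASK, m) }
    pub fn a_mode(self, m: u16) -> Self { self.set(AMODE_MASK, m) }

    /// Hypothetical finishing method.
    pub fn build(self) -> Instruction {
        Instruction { a: self.a, b: self.b, ins: self.ins }
    }
}

/// Exhaustive stand-in for the QuickCheck property: starting from a blank
/// Instruction, apply every (opcode, modifier, a_mode) change order in turn,
/// each to the result of the previous one, and check that every field reads
/// back as written. Later writes must not disturb earlier ones.
pub fn round_trips_hold(core_size: u16) -> bool {
    let mut insn = Instruction::default();
    for op in 0..16u16 {
        for m in 0..8u16 {
            for am in 0..8u16 {
                insn = insn
                    .to_builder(core_size)
                    .opcode(op)
                    .modifier(m)
                    .a_mode(am)
                    .build();
                if insn.opcode() != op || insn.modifier() != m || insn.a_mode() != am {
                    return false;
                }
            }
        }
    }
    true
}
```

Exhaustive enumeration works here only because the value space is tiny (16 × 8 × 8 combinations); random generation is what lets QuickCheck scale to longer and more varied change sequences.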

Now, what about that mars_t? The C definition, in c_src/sim.h, is:

typedef struct mars_st {
  uint32_t nWarriors;

  uint32_t cycles;
  uint16_t coresize;
  uint32_t processes;

  uint16_t maxWarriorLength;

  w_t* warTab;
  insn_t* coreMem;
  insn_t** queueMem;
} mars_t;

The nWarriors field sets how many warriors will be in the simulation, which for feruscore is always two. cycles controls the number of cycles a round will run before ending if both warriors are still alive, coresize sets the number of instructions in the core, processes sets the maximum number of processes available, and maxWarriorLength gives the maximum number of instructions a warrior may contain. All of these are more or less familiar from the last chapter, just in a new programming language and with different names. The final three fields are pointers to arrays and are effectively private to the simulation function. They are allocated and deallocated by sim_alloc_bufs and sim_free_bufs, respectively. The Rust side of this structure looks like so, from src/mars.rs:

#[repr(C)]
pub struct Mars {
    n_warriors: u32,
    cycles: u32,
    core_size: u16,
    processes: u32,
    max_warrior_length: u16,
    war_tab: *mut WarTable,
    core_mem: *mut Instruction,
    queue_mem: *const *mut Instruction,
}

The only new type here is WarTable. Even though our code will never explicitly manipulate the warrior table, we do still have to be bit-compatible with C. The definition of WarTable is:

#[repr(C)]
struct WarTable {
    tail: *mut *mut Instruction,
    head: *mut *mut Instruction,
    nprocs: u32,
    succ: *mut WarTable,
    pred: *mut WarTable,
    id: u32,
}
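Both structures have to agree with C field by field, padding included. A minimal sketch can pin the layout down with compile-time and run-time checks, assuming a 64-bit target for the total size and a hypothetical Mars::new constructor that leaves the three buffers null for sim_alloc_bufs to fill. That function's signature is not shown in this section, so no extern declaration appears here. The offsets in the comments follow from the C layout rules: each u16 is followed by padding so that the next field is properly aligned.

```rust
use std::mem::{align_of, size_of};
use std::ptr;

#[derive(PartialEq, Eq, Copy, Clone, Debug, Default)]
#[repr(C)]
pub struct Instruction {
    a: u16,
    b: u16,
    ins: u16,
}

// Never manipulated from Rust, but must match the C warrior table bit for bit.
#[allow(dead_code)]
#[repr(C)]
struct WarTable {
    tail: *mut *mut Instruction,
    head: *mut *mut Instruction,
    nprocs: u32,
    succ: *mut WarTable,
    pred: *mut WarTable,
    id: u32,
}

#[allow(dead_code)]
#[repr(C)]
pub struct Mars {
    n_warriors: u32,         // offset 0
    cycles: u32,             // offset 4
    core_size: u16,          // offset 8, followed by 2 bytes of padding
    processes: u32,          // offset 12
    max_warrior_length: u16, // offset 16, padded up to pointer alignment
    war_tab: *mut WarTable,  // offset 24 on a 64-bit target
    core_mem: *mut Instruction,
    queue_mem: *const *mut Instruction,
}

// The build fails if Instruction stops being three packed u16s like insn_t.
const _: () = assert!(size_of::<Instruction>() == 6);
const _: () = assert!(align_of::<Instruction>() == 2);

impl Mars {
    /// Hypothetical constructor: sets the configuration fields and leaves the
    /// three simulation buffers null for the C side to allocate.
    pub fn new(cycles: u32, core_size: u16, processes: u32, max_warrior_length: u16) -> Mars {
        Mars {
            n_warriors: 2, // feruscore always pits exactly two warriors
            cycles,
            core_size,
            processes,
            max_warrior_length,
            war_tab: ptr::null_mut(),
            core_mem: ptr::null_mut(),
            queue_mem: ptr::null(),
        }
    }

    /// True once the C side has allocated the core.
    pub fn buffers_allocated(&self) -> bool {
        !self.core_mem.is_null()
    }
}
```

Note that queue_mem being *const rather than *mut does not matter for compatibility with insn_t**: both raw pointer kinds have the same layout.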

We could probably have gotten away with declaring these private fields as void pointers in mars_st, but that would have reduced type information on the C side of the project and might hamper future porting efforts. With the types explicit on the Rust side of the project, it's much easier to consider rewriting the C functions in Rust.
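The choice is about type information, not layout: a *mut c_void and a *mut WarTable occupy the same space. The hypothetical wrappers TypedWarTab and ErasedWarTab below make that visible. Through the typed pointer, Rust can read a WarTable's id directly; through the void pointer, the same read needs a cast that the compiler cannot check.

```rust
use std::ffi::c_void;
use std::mem::size_of;

#[allow(dead_code)]
#[repr(C)]
pub struct Instruction {
    a: u16,
    b: u16,
    ins: u16,
}

#[allow(dead_code)]
#[repr(C)]
pub struct WarTable {
    tail: *mut *mut Instruction,
    head: *mut *mut Instruction,
    nprocs: u32,
    succ: *mut WarTable,
    pred: *mut WarTable,
    id: u32,
}

/// Hypothetical holder with the typed pointer, as in Mars.
#[repr(C)]
pub struct TypedWarTab {
    pub war_tab: *mut WarTable,
}

/// Hypothetical holder with the type erased, as a void* field would be.
#[repr(C)]
pub struct ErasedWarTab {
    pub war_tab: *mut c_void,
}

// Identical layout either way; only what the compiler knows differs.
const _: () = assert!(size_of::<TypedWarTab>() == size_of::<ErasedWarTab>());

/// # Safety
/// `t.war_tab` must be null or point to a live WarTable.
pub unsafe fn id_of(t: &TypedWarTab) -> Option<u32> {
    // The pointee type is known, so the `.id` access is checked by the compiler.
    unsafe { t.war_tab.as_ref().map(|w| w.id) }
}

/// # Safety
/// `t.war_tab` must be null or point to a live WarTable; the compiler cannot
/// verify the cast below.
pub unsafe fn id_of_erased(t: &ErasedWarTab) -> Option<u32> {
    unsafe { (t.war_tab as *mut WarTable).as_ref().map(|w| w.id) }
}
```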
