Chapter 21. Unsafe Code

Let no one think of me that I am humble or weak or passive;
Let them understand I am of a different kind:
dangerous to my enemies, loyal to my friends.
To such a life glory belongs.

Euripides, Medea

The secret joy of systems programming is that underneath every single safe language and carefully designed abstraction is a swirling maelstrom of wildly unsafe machine language and bit-fiddling. You can write that in Rust, too.

The language we’ve presented up to this point in the book ensures your programs are free of memory errors and data races entirely automatically, through types, lifetimes, bounds checks, and so on. But this sort of automated reasoning has its limits; there are many valuable techniques that Rust cannot recognize as safe.

Unsafe code lets you tell Rust, “In this case, just trust me.” By marking off a block or function as unsafe, you acquire the ability to call unsafe functions in the standard library, dereference unsafe pointers, and call functions written in other languages like C and C++, among other powers. All of Rust’s usual safety checks still apply: type checks, lifetime checks, and bounds checks on indices all occur normally. Unsafe code just enables a small set of additional features.

This ability to step outside the boundaries of safe Rust is what makes it possible to implement many of Rust’s most fundamental features in Rust itself, as is commonly done in C and C++ systems. Unsafe code is what allows the Vec type to manage its buffer efficiently; the std::io module to talk to the operating system; and the std::thread and std::sync modules to provide concurrency primitives.

This chapter covers the essentials of working with unsafe features:

Rust’s unsafe blocks establish the boundary between ordinary, safe Rust code and code that uses unsafe features.
You can mark functions as unsafe, alerting callers to the presence of extra contracts they must follow to avoid undefined behavior.
Raw pointers and their methods allow unconstrained access to memory, and let you build data structures Rust’s type system would otherwise forbid.
Understanding the definition of undefined behavior will help you appreciate why it can have consequences far more serious than just getting incorrect results.
Rust’s foreign function interface lets you use libraries written in other languages.
Unsafe traits, analogous to unsafe functions, impose a contract that each implementation (rather than each caller) must follow.

Unsafe from What?

At the start of this book, we showed a C program that crashes in a surprising way because it fails to follow one of the rules prescribed by the C standard. You can do the same in Rust:

$ cat crash.rs
fn main() {
    let mut a: usize = 0;
    let ptr = &mut a as *mut usize;
    unsafe {
        *ptr.offset(3) = 0x7ffff72f484c;
    }
}
$ cargo build
   Compiling unsafe-samples v0.1.0
    Finished debug [unoptimized + debuginfo] target(s) in 0.44 secs
$ ../../target/debug/crash
crash: Error: .netrc file is readable by others.
crash: Remove password or make file unreadable by others.
Segmentation fault (core dumped)
$

This program borrows a mutable reference to the local variable a, casts it to a raw pointer of type *mut usize, and then uses the offset method to produce a pointer three words further along in memory. This happens to be where main’s return address is stored. The program overwrites the return address with a constant, such that returning from main behaves in a surprising way. What makes this crash possible is the program’s incorrect use of unsafe features—in this case, the ability to dereference raw pointers.

An unsafe feature is one that imposes a contract: rules that Rust cannot enforce automatically, but which you must nonetheless follow to avoid undefined behavior.

A contract goes beyond the usual type checks and lifetime checks, imposing further rules specific to that unsafe feature. Typically, Rust itself doesn’t know about the contract at all; it’s just explained in the feature’s documentation. For example, the raw pointer type has a contract forbidding you to dereference a pointer that has been advanced beyond the end of its original referent. The expression *ptr.offset(3) = ... in this example breaks this contract. But, as the transcript shows, Rust compiles the program without complaint: its safety checks do not detect this violation. When you use unsafe features, you, as the programmer, bear the responsibility for checking that your code adheres to their contracts.
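For contrast, here is a minimal sketch of code that stays within the raw pointer contract described above. The helper name read_at is hypothetical; the point is only that the bounds check happens before the pointer is advanced and dereferenced, so the pointer never leaves its original referent.

```rust
/// Reads `slice[index]` through a raw pointer, or returns `None` when
/// `index` is out of bounds. (`read_at` is a hypothetical helper.)
fn read_at(slice: &[usize], index: usize) -> Option<usize> {
    if index >= slice.len() {
        return None;
    }
    let base: *const usize = slice.as_ptr();
    // SAFETY: `index < slice.len()`, so `base.add(index)` stays inside
    // the slice's allocation and points at an initialized element.
    Some(unsafe { *base.add(index) })
}
```

Unlike the crash example, nothing here relies on what happens to lie next to the data in memory.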

Lots of features have rules you should follow to use them correctly, but such rules are not contracts in the sense we mean here unless the possible consequences include undefined behavior. Undefined behavior is behavior Rust firmly assumes your code could never exhibit. For example, Rust assumes you will not overwrite a function call’s return address with something else. Code that passes Rust’s usual safety checks and complies with the contracts of the unsafe features it uses cannot possibly do such a thing. Since the program violates the raw pointer contract, its behavior is undefined, and it goes off the rails.

If your code exhibits undefined behavior, you have broken your half of your bargain with Rust, and Rust declines to predict the consequences. Dredging up irrelevant error messages from the depths of system libraries and crashing is one possible consequence; handing control of your computer over to an attacker is another. The effects could vary from one release of Rust to the next, without warning. Sometimes, however, undefined behavior has no visible consequences. For example, if the main function never returns (perhaps it calls std::process::exit to terminate the program early), then the corrupted return address probably won’t matter.

You may only use unsafe features within an unsafe block or an unsafe function; we’ll explain both in the sections that follow. This makes it harder to use unsafe features unknowingly: by forcing you to write an unsafe block or function, Rust makes sure you have acknowledged that your code may have additional rules to follow.

Unsafe Blocks

An unsafe block looks just like an ordinary Rust block preceded by the unsafe keyword, with the difference that you can use unsafe features in the block:

unsafe {
    String::from_utf8_unchecked(ascii)
}

Without the unsafe keyword in front of the block, Rust would object to the use of from_utf8_unchecked, which is an unsafe function. With the unsafe block around it, you can use this code anywhere.

Like an ordinary Rust block, the value of an unsafe block is that of its final expression, or () if it doesn’t have one. The call to String::from_utf8_unchecked shown earlier provides the value of the block.

An unsafe block unlocks four additional options for you:

You can call unsafe functions. Each unsafe function must specify its own contract, depending on its purpose.
You can dereference raw pointers. Safe code can pass raw pointers around, compare them, and create them by conversion from references (or even from integers), but only unsafe code can actually use them to access memory. We’ll cover raw pointers in detail and explain how to use them safely in “Raw Pointers”.
You can access mutable static variables. As explained in “Global Variables”, Rust can’t be sure when threads are using mutable static variables, so their contract requires you to ensure all access is properly synchronized.
You can access functions and variables declared through Rust’s foreign function interface. These are considered unsafe even when immutable, since they are visible to code written in other languages that may not respect Rust’s safety rules.
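As a small illustration of the first two items in this list, the following sketch dereferences a raw pointer and calls an unsafe method from the standard library. The function names double_in_place and last_element are made up for this example.

```rust
/// Doubles `*x` by way of a raw pointer.
fn double_in_place(x: &mut i32) {
    // Creating a raw pointer from a reference is allowed in safe code.
    let p: *mut i32 = x;
    // Dereferencing it requires an unsafe block.
    // SAFETY: `p` came from a live `&mut i32`, so it is valid and exclusive.
    unsafe {
        *p *= 2;
    }
}

/// Returns the last element of `values`, or `None` if it is empty.
fn last_element(values: &[i32]) -> Option<i32> {
    if values.is_empty() {
        return None;
    }
    // The unsafe block is an expression; its value is the element read.
    // SAFETY: `values` is non-empty, so `len() - 1` is in bounds, which
    // is what the unsafe method `get_unchecked` requires.
    let last = unsafe { *values.get_unchecked(values.len() - 1) };
    Some(last)
}
```

Both functions are safe to call: each one establishes, inside its own body, the facts its unsafe block depends on.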

Restricting unsafe features to unsafe blocks doesn’t really prevent you from doing whatever you want. It’s perfectly possible to just stick an unsafe block into your code and move on. The benefit of the rule lies mainly in drawing human attention to code whose safety Rust can’t guarantee:

You won’t accidentally use unsafe features, and then discover you were responsible for contracts you didn’t even know existed.
An unsafe block attracts more attention from reviewers. Some projects even have automation to ensure this, flagging code changes that affect unsafe blocks for special attention.
When you’re considering writing an unsafe block, you can take a moment to ask yourself whether your task really requires such measures. If it’s for performance, do you have measurements to show that this is actually a bottleneck? Perhaps there is a good way to accomplish the same thing in safe Rust.

Example: An Efficient ASCII String Type

Here’s the definition of Ascii, a string type that ensures its contents are always valid ASCII. This type uses an unsafe feature to provide zero-cost conversion into String:

mod my_ascii {
    // In current Rust, `u8::is_ascii` is an inherent method, so the
    // deprecated `std::ascii::AsciiExt` import is no longer needed.

    /// An ASCII-encoded string.
    #[derive(Debug, Eq, PartialEq)]
    pub struct Ascii(
        // This must hold only well-formed ASCII text:
        // bytes from `0` to `0x7f`.
        Vec<u8>
    );

    impl Ascii {
        /// Create an `Ascii` from the ASCII text in `bytes`. Return a
        /// `NotAsciiError` error if `bytes` contains any non-ASCII
        /// characters.
        pub fn from_bytes(bytes: Vec<u8>) -> Result<Ascii, NotAsciiError> {
            if bytes.iter().any(|&byte| !byte.is_ascii()) {
                return Err(NotAsciiError(bytes));
            }
            Ok(Ascii(bytes))
        }
    }

    // When conversion fails, we give back the vector we couldn't convert.
    // This should implement `std::error::Error`; omitted for brevity.
    #[derive(Debug, Eq, PartialEq)]
    pub struct NotAsciiError(pub Vec<u8>);

    // Safe, efficient conversion, implemented using unsafe code.
    impl From<Ascii> for String {
        fn from(ascii: Ascii) -> String {
            // If this module has no bugs, this is safe, because
            // well-formed ASCII text is also well-formed UTF-8.
            unsafe { String::from_utf8_unchecked(ascii.0) }
        }
    }
    ...
}

The key to this module is the definition of the Ascii type. The type itself is marked pub, to make it visible outside the my_ascii module. But the type’s Vec<u8> element is not public, so only the my_ascii module can construct an Ascii value or refer to its element. This leaves the module’s code in complete control over what may or may not appear there. As long as the public constructors and methods ensure that freshly created Ascii values are well-formed and remain so throughout their lives, then the rest of the program cannot violate that rule. And indeed, the public constructor Ascii::from_bytes carefully checks the vector it’s given before agreeing to construct an Ascii from it. For brevity’s sake, we don’t show any methods, but you can imagine a set of text-handling methods that ensure Ascii values always contain proper ASCII text, just as a String’s methods ensure that its contents remain well-formed UTF-8.
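To make the idea of invariant-preserving methods concrete, here is one possible sketch. It is a cut-down variant of the module, not the full listing: the error type is simplified to the rejected vector, and the push and as_str methods are hypothetical additions for illustration.

```rust
mod my_ascii {
    /// An ASCII-encoded string. The field is private, so only this
    /// module can put bytes into it.
    #[derive(Debug, Eq, PartialEq)]
    pub struct Ascii(Vec<u8>);

    impl Ascii {
        /// Checked constructor; hands the vector back on failure.
        /// (Simplified error type, for brevity.)
        pub fn from_bytes(bytes: Vec<u8>) -> Result<Ascii, Vec<u8>> {
            if bytes.iter().all(|byte| byte.is_ascii()) {
                Ok(Ascii(bytes))
            } else {
                Err(bytes)
            }
        }

        /// Hypothetical method: appends `byte` only if it is ASCII,
        /// so the invariant still holds afterward.
        pub fn push(&mut self, byte: u8) -> Result<(), u8> {
            if byte.is_ascii() {
                self.0.push(byte);
                Ok(())
            } else {
                Err(byte)
            }
        }

        /// Hypothetical method: borrows the contents as a `&str`.
        pub fn as_str(&self) -> &str {
            // SAFETY: every way of adding bytes checks for ASCII, and
            // ASCII text is well-formed UTF-8.
            unsafe { std::str::from_utf8_unchecked(&self.0) }
        }
    }
}
```

Every path that adds bytes performs the ASCII check, which is what lets as_str skip UTF-8 validation.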

This arrangement lets us implement From<Ascii> for String very efficiently. The unsafe function String::from_utf8_unchecked takes a byte vector and builds a String from it without checking whether its contents are well-formed UTF-8 text; the function’s contract holds its caller responsible for that. Fortunately, the rules enforced by the Ascii type are exactly what we need to satisfy from_utf8_unchecked’s contract. As we explained in “UTF-8”, any block of ASCII text is also well-formed UTF-8, so an Ascii’s underlying Vec<u8> is immediately ready to serve as a String’s buffer.

With these definitions in place, you can write:

use my_ascii::Ascii;

let bytes: Vec<u8> = b"ASCII and ye shall receive".to_vec();

// This call entails no allocation or text copies, just a scan.
let ascii: Ascii = Ascii::from_bytes(bytes)
    .unwrap(); // We know these chosen bytes are ok.

// This call is zero-cost: no allocation, copies, or scans.
let string = String::from(ascii);

assert_eq!(string, "ASCII and ye shall receive");

No unsafe blocks are required to use Ascii. We have implemented a safe interface using unsafe operations, and arranged to meet their contracts depending only on the module’s own code, not on its users’ behavior.

An Ascii is nothing more than a wrapper around a Vec<u8>, hidden inside a module that enforces extra rules about its contents. A type of this sort is called a newtype, a common pattern in Rust. Rust’s own String type is defined in exactly the same way, except that its contents are restricted to be UTF-8, not ASCII. In fact, here’s the definition of String from the standard library:

pub struct String {
    vec: Vec<u8>,
}

At the machine level, with Rust’s types out of the picture, a newtype and its element have identical representations in memory, so constructing a newtype doesn’t require any machine instructions at all. In Ascii::from_bytes, the expression Ascii(bytes) simply deems the Vec<u8>’s representation to now hold an Ascii value. Similarly, String::from_utf8_unchecked probably requires no machine instructions when inlined: the Vec<u8> is now considered to be a String.
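One way to observe this is to compare sizes. The sketch below uses a hypothetical Wrapper newtype, and adds a #[repr(transparent)] attribute that the Ascii type in the text does not have: that attribute turns the identical layout from something that happens in practice into a documented guarantee.

```rust
/// A stand-in newtype (hypothetical). `#[repr(transparent)]` is an
/// addition made for this sketch; it guarantees that `Wrapper` has the
/// same layout as its single field.
#[repr(transparent)]
pub struct Wrapper(Vec<u8>);

fn wrapper_size() -> usize {
    std::mem::size_of::<Wrapper>()
}

fn inner_size() -> usize {
    std::mem::size_of::<Vec<u8>>()
}

/// Wrapping and unwrapping only change the type, not the data.
fn wrap(bytes: Vec<u8>) -> Wrapper {
    Wrapper(bytes)
}

fn into_inner(wrapper: Wrapper) -> Vec<u8> {
    wrapper.0
}
```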

Unsafe Functions

An unsafe function definition looks like an ordinary function definition preceded by the unsafe keyword. The body of an unsafe function is automatically considered an unsafe block.

You may call unsafe functions only within unsafe blocks. This means that marking a function unsafe warns its callers that the function has a contract they must satisfy to avoid undefined behavior.

For example, here’s a new constructor for the Ascii type we introduced before that builds an Ascii from a byte vector without checking if its contents are valid ASCII:

// This must be placed inside the `my_ascii` module.
impl Ascii {
    /// Construct an `Ascii` value from `bytes`, without checking
    /// whether `bytes` actually contains well-formed ASCII.
    ///
    /// This constructor is infallible, and returns an `Ascii` directly,
    /// rather than a `Result<Ascii, NotAsciiError>` as the `from_bytes`
    /// constructor does.
    ///
    /// # Safety
    ///
    /// The caller must ensure that `bytes` contains only ASCII
    /// characters: bytes no greater than 0x7f. Otherwise, the effect is
    /// undefined.
    pub unsafe fn from_bytes_unchecked(bytes: Vec<u8>) -> Ascii {
        Ascii(bytes)
    }
}

Presumably, code calling Ascii::from_bytes_unchecked already knows somehow that the vector in hand contains only ASCII characters, so the check that Ascii::from_bytes insists on carrying out would be a waste of time, and the caller would have to write code to handle Err results that it knows will never occur. Ascii::from_bytes_unchecked lets such a caller sidestep the checks and the error handling.
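As a sketch of such a caller, consider a function that produces the decimal digits of a number. Every byte it generates is an ASCII digit by construction, so checking them again would be redundant. The function name decimal_digits is hypothetical, and the module here is trimmed to just what the example needs.

```rust
mod my_ascii {
    pub struct Ascii(Vec<u8>);

    impl Ascii {
        /// # Safety
        ///
        /// `bytes` must contain only bytes no greater than 0x7f.
        pub unsafe fn from_bytes_unchecked(bytes: Vec<u8>) -> Ascii {
            Ascii(bytes)
        }
    }

    impl From<Ascii> for String {
        fn from(ascii: Ascii) -> String {
            // SAFETY: well-formed ASCII text is also well-formed UTF-8.
            unsafe { String::from_utf8_unchecked(ascii.0) }
        }
    }
}

/// Hypothetical caller: renders `n` in decimal as an `Ascii` value.
fn decimal_digits(mut n: u32) -> my_ascii::Ascii {
    let mut bytes = Vec::new();
    loop {
        bytes.push(b'0' + (n % 10) as u8); // always in b'0'..=b'9'
        n /= 10;
        if n == 0 {
            break;
        }
    }
    bytes.reverse(); // digits were produced least significant first
    // SAFETY: every byte pushed above is an ASCII digit.
    unsafe { my_ascii::Ascii::from_bytes_unchecked(bytes) }
}
```

The safety comment records why the contract holds, so a reviewer can check the claim against the loop above it.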

But earlier we emphasized the importance of Ascii’s public constructors and methods ensuring that Ascii values are well-formed. Doesn’t from_bytes_unchecked fail to meet that responsibility?

Not quite: from_bytes_unchecked meets its obligations by passing them on to its caller via its contract. The presence of this contract is what makes it correct to mark this function unsafe: despite the fact that the function itself carries out no unsafe operations, its callers must follow rules Rust cannot enforce automatically to avoid undefined behavior.

Can you really cause undefined behavior by breaking the contract of Ascii::from_bytes_unchecked? Yes. You can construct a String holding ill-formed UTF-8 as follows:

// Imagine that this vector is the result of some complicated process
// that we expected to produce ASCII. Something went wrong!
let bytes = vec![0xf7, 0xbf, 0xbf, 0xbf];

let ascii = unsafe {
    // This unsafe function's contract is violated
    // when `bytes` holds non-ASCII bytes.
    Ascii::from_bytes_unchecked(bytes)
};

let bogus: String = ascii.into();

// `bogus` now holds ill-formed UTF-8. Parsing its first character
// produces a `char` that is not a valid Unicode code point.
assert_eq!(bogus.chars().next().unwrap() as u32, 0x1fffff);

This illustrates two critical facts about bugs and unsafe code:

Bugs that occur before the unsafe block can break contracts. Whether an unsafe block causes undefined behavior can depend not just on the code in the block itself, but also on the code that supplies the values it operates on. Everything that your unsafe code relies on to satisfy contracts is safety-critical. The conversion from Ascii to String based on String::from_utf8_unchecked is well-defined only if the rest of the module properly maintains Ascii’s invariants.
The consequences of breaking a contract may appear after you leave the unsafe block. The undefined behavior courted by failing to comply with an unsafe feature’s contract often does not occur within the unsafe block itself. Constructing a bogus String as shown before may not cause problems until much later in the program’s execution.

Essentially, Rust’s type checker, borrow checker, and other static checks are inspecting your program and trying to construct a proof that it cannot exhibit undefined behavior. When Rust compiles your program successfully, that means it succeeded in proving your code sound. An unsafe block is a gap in this proof: “This code,” you are saying to Rust, “is fine, trust me.” Whether your claim is true could depend on any part of the program that influences what happens in the unsafe block, and the consequences of being wrong could appear anywhere influenced by the unsafe block. Writing the unsafe keyword amounts to a reminder that you are not getting the full benefit of the language’s safety checks.

Given the choice, you should naturally prefer to create safe interfaces, without contracts. These are much easier to work with, since users can count on Rust’s safety checks to ensure their code is free of undefined behavior. Even if your implementation uses unsafe features, it’s best to use Rust’s types, lifetimes, and module system to meet their contracts while using only what you can guarantee yourself, rather than passing responsibilities on to your callers.

Unfortunately, it’s not unusual to come across unsafe functions in the wild whose documentation does not bother to explain their contracts. You are expected to infer the rules yourself, based on your experience and knowledge of how the code behaves. If you’ve ever uneasily wondered whether what you’re doing with a C or C++ API is OK, then you know what that’s like.

Unsafe Block or Unsafe Function?

You may find yourself wondering whether to use an unsafe block or just mark the whole function unsafe. The approach we recommend is to first make a decision about the function:

If it’s possible to misuse the function in a way that compiles fine but still causes undefined behavior, you must mark it as unsafe. The rules for using the function correctly are its contract; the existence of a contract is what makes the function unsafe.
Otherwise, the function is safe: no well-typed call to it can cause undefined behavior. It should not be marked unsafe.

Whether the function uses unsafe features in its body is irrelevant; what matters is the presence of a contract. Earlier, we showed an unsafe function that uses no unsafe features (Ascii::from_bytes_unchecked), and a safe function that does use unsafe features (the conversion from Ascii to String).

Don’t mark a safe function unsafe just because you use unsafe features in its body. This makes the function harder to use, and confuses readers who will (correctly) expect to find a contract explained somewhere. Instead, use an unsafe block, even if it’s the function’s entire body.
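The distinction can be sketched with a pair of hypothetical functions that do the same work. One leaves a rule to its caller, so it is an unsafe function; the other checks the rule itself, so it is safe even though its body contains an unsafe block.

```rust
/// Views `bytes` as a `&str` without checking anything.
///
/// # Safety
///
/// Every byte in `bytes` must be no greater than 0x7f.
unsafe fn ascii_as_str_unchecked(bytes: &[u8]) -> &str {
    // The explicit block is optional in older editions of Rust, but
    // newer editions warn when it is left out of an unsafe function.
    unsafe { std::str::from_utf8_unchecked(bytes) }
}

/// Views `bytes` as a `&str`, or returns `None` if any byte is not
/// ASCII. No well-typed call can cause undefined behavior, so this
/// function is not marked unsafe.
fn ascii_as_str(bytes: &[u8]) -> Option<&str> {
    if bytes.is_ascii() {
        // SAFETY: we just checked that every byte is ASCII.
        Some(unsafe { ascii_as_str_unchecked(bytes) })
    } else {
        None
    }
}
```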

Undefined Behavior

In the introduction, we said that the term undefined behavior means “behavior that Rust firmly assumes your code could never exhibit.” This is a strange turn of phrase, especially since we know from our experience with other languages that these behaviors do occur by accident with some frequency. Why is this concept helpful in setting out the obligations of unsafe code?

A compiler is a translator from one programming language to another. The Rust compiler takes a Rust program and translates it into an equivalent machine language program. But what does it mean to say that two programs in such completely different languages are equivalent?

Fortunately, this question is easier for programmers than it is for linguists. We usually say that two programs are equivalent if they will always have the same visible behavior when executed: they make the same system calls, interact with foreign libraries in equivalent ways, and so on. It’s a bit like a Turing test for programs: if you can’t tell whether you’re interacting with the original or the translation, then they’re equivalent.

Now consider the following code:

let i = 10;
very_trustworthy(&i);
println!("{}", i * 100);

Even knowing nothing about the definition of very_trustworthy, we can see that it receives only a shared reference to i, so the call cannot change i’s value. Since the value passed to println! will always be 1000, Rust can translate this code into machine language as if we had written:

very_trustworthy(&10);
println!("{}", 1000);

This transformed version has the same visible behavior as the original, and it’s probably a bit faster. But it makes sense to consider the performance of this version only if we agree it has the same meaning as the original. What if very_trustworthy were defined as follows?

fn very_trustworthy(shared: &i32) {
    unsafe {
        // Turn the shared reference into a mutable pointer.
        // This is undefined behavior.
        let mutable = shared as *const i32 as *mut i32;
        *mutable = 20;
    }
}

This code breaks the rules for shared references: it changes the value of i to 20, even though it should be frozen because i is borrowed for sharing. As a result, the transformation we made to the caller now has a very visible effect: if Rust transforms the code, the program prints 1000; if it leaves the code alone and uses the new value of i, it prints 2000. Breaking the rules for shared references in very_trustworthy means that shared references won’t behave as expected in its callers.
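If a callee genuinely needs to modify a value it receives by shared reference, the sanctioned route is interior mutability. The following sketch uses std::cell::Cell; the names adjust and scaled are hypothetical. Because the type of the argument tells Rust that the value may change during the call, Rust cannot assume the multiplication still sees 10.

```rust
/// Changes the value behind a shared reference, within the rules:
/// `Cell` announces in its type that the contents may change.
fn adjust(shared: &std::cell::Cell<i32>) {
    shared.set(20);
}

/// Calls `adjust`, then reads the value back and scales it.
fn scaled(i: &std::cell::Cell<i32>) -> i32 {
    adjust(i);
    i.get() * 100
}
```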

This sort of problem arises with almost every kind of transformation Rust might attempt. Even inlining a function into its call site assumes, among other things, that when the callee finishes, control flow returns to the call site. But we opened the chapter with an example of ill-behaved code that violates even that assumption.

It’s basically impossible for Rust (or any other language) to assess whether a transformation to a program preserves its meaning unless it can trust the fundamental features of the language to behave as designed. And whether they do or not can depend not just on the code at hand, but on other, potentially distant, parts of the program. In order to do anything at all with your code, Rust must assume that the rest of your program is well-behaved.

Here, then, are Rust’s rules for well-behaved programs:

The program must not read uninitialized memory.
The program must not create invalid primitive values:
- References or boxes that are null
- bool values that are neither 0 nor 1
- enum values with invalid discriminant values
- char values that are not valid, nonsurrogate Unicode code points
- str values that are not well-formed UTF-8
The rules for references explained in Chapter 5 must be followed. No reference may outlive its referent; shared access is read-only access; and mutable access is exclusive access.
The program must not dereference null, incorrectly aligned, or dangling pointers.
The program must not use a pointer to access memory outside the allocation with which the pointer is associated. We will explain this rule in detail in “Dereferencing Raw Pointers Safely”.
The program must be free of data races. A data race occurs when two threads access the same memory location without synchronization, and at least one of the accesses is a write.
The program must not unwind across a call made from another language, via the foreign function interface, as explained in “Unwinding”.
The program must comply with the contracts of standard library functions.
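For the rule about invalid primitive values, safe Rust offers checked conversions that refuse to produce such values in the first place. The sketch below wraps three of them; the function names are hypothetical, while char::from_u32 and std::str::from_utf8 are standard library functions.

```rust
/// Converts a byte to a `bool`, rejecting anything other than 0 or 1.
fn to_bool(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        _ => None,
    }
}

/// Converts a number to a `char`, rejecting surrogates and values
/// beyond the Unicode range.
fn to_char(code: u32) -> Option<char> {
    char::from_u32(code)
}

/// Views bytes as a `&str`, rejecting ill-formed UTF-8.
fn to_str(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}
```

The unchecked counterparts of these conversions are unsafe precisely because they skip these tests.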

These rules are all that Rust assumes in the process of optimizing your program and translating it into machine language. Undefined behavior is, simply, any violation of these rules. This is why we say that Rust assumes your program will not exhibit undefined behavior: this assumption is necessary if we hope to conclude that the compiled program is a faithful translation of the source code.

Rust code that does not use unsafe features is guaranteed to follow all of the preceding rules, once it compiles. Only when you use unsafe features do these rules become your responsibility. In C and C++, the fact that your program compiles without errors or warnings means much less; as we mentioned in the introduction to this book, even the best C and C++ programs written by well-respected projects that hold their code to high standards exhibit undefined behavior in practice.

Unsafe Traits

An unsafe trait is a trait with a contract that Rust cannot check or enforce, but that implementers must satisfy to avoid undefined behavior. To implement an unsafe trait, you must mark the implementation as unsafe. It is up to you to understand the trait’s contract, and make sure your type satisfies it.

A function that bounds its type variables with an unsafe trait is typically one that uses unsafe features itself, and satisfies their contracts only by depending on the unsafe trait’s contract. An incorrect implementation of the trait could cause such a function to exhibit undefined behavior.

The classic examples of unsafe traits are std::marker::Send and std::marker::Sync. These traits don’t define any methods, so they’re trivial to implement for any type you like. But they do have contracts: Send requires implementers to be safe to move to another thread, and Sync requires them to be safe to share among threads via shared references. Implementing Send for an inappropriate type, for example, would make std::sync::Mutex no longer safe from data races.
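As a sketch of what implementing Send by hand looks like, consider a hypothetical type that owns a heap allocation through a raw pointer. Raw pointers are not Send, so Rust will not let the type cross threads unless we vouch for it. Whether the claim is true is our responsibility, and here it rests on the type being the sole owner of its allocation.

```rust
/// A hypothetical owner of one heap-allocated `i32`, held by raw pointer.
struct OwnedRaw(*mut i32);

// SAFETY (our claim, not something Rust checks): `OwnedRaw` is the sole
// owner of its allocation, so moving it to another thread cannot create
// unsynchronized shared access.
unsafe impl Send for OwnedRaw {}

impl OwnedRaw {
    fn new(value: i32) -> OwnedRaw {
        OwnedRaw(Box::into_raw(Box::new(value)))
    }

    fn get(&self) -> i32 {
        // SAFETY: the pointer came from `Box::into_raw` in `new` and is
        // freed only in `drop`.
        unsafe { *self.0 }
    }
}

impl Drop for OwnedRaw {
    fn drop(&mut self) {
        // SAFETY: reclaims the allocation made in `new`, exactly once.
        unsafe {
            drop(Box::from_raw(self.0));
        }
    }
}

/// Moves an `OwnedRaw` into a new thread and reads it there.
fn read_on_other_thread(value: i32) -> i32 {
    let owned = OwnedRaw::new(value);
    std::thread::spawn(move || owned.get()).join().unwrap()
}
```

Without the unsafe impl, the call to std::thread::spawn would be rejected at compile time.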

As a simple example, the Rust library includes an unsafe trait, core::nonzero::Zeroable, for types that can be safely initialized by setting all their bytes to zero. Clearly, zeroing a usize is fine, but zeroing a &T gives you a null reference, which will cause a crash if dereferenced. For types that are zeroable, some optimizations are possible: you can initialize an array of them quickly with std::ptr::write_bytes (Rust’s equivalent of memset), or use operating system calls that allocate zeroed pages. (As of Rust 1.17, Zeroable is experimental, so it may be changed or removed in future versions of Rust, but it’s a good, simple, real-world example.)

Zeroable is a typical marker trait, lacking methods or associated types:

pub unsafe trait Zeroable {}

The implementations for appropriate types are similarly straightforward:

unsafe impl Zeroable for u8 {}
unsafe impl Zeroable for i32 {}
unsafe impl Zeroable for usize {}
// and so on for all the integer types

With these definitions, we can write a function that quickly allocates a vector of a given length containing a Zeroable type:

#![feature(nonzero)]  // permits `Zeroable`

extern crate core;
use core::nonzero::Zeroable;

fn zeroed_vector<T>(len: usize) -> Vec<T>
    where T: Zeroable
{
    let mut vec = Vec::with_capacity(len);
    unsafe {
        std::ptr::write_bytes(vec.as_mut_ptr(), 0, len);
        vec.set_len(len);
    }
    vec
}

This function starts by creating an empty Vec with the required capacity, and then calls write_bytes to fill the unoccupied buffer with zeros. (The write_bytes function treats len as a number of T elements, not a number of bytes, so this call does fill the entire buffer.) A vector’s set_len method changes its length without doing anything to the buffer; this is unsafe, because you must ensure that the newly enclosed buffer space actually contains properly initialized values of type T. But this is exactly what the T: Zeroable bound establishes: a block of zero bytes represents a valid T value. Our use of set_len is safe.
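Since the library's Zeroable is experimental and may not be available in your toolchain, here is a sketch of the same idea that declares its own trait. This local Zeroable is a stand-in defined for the example, not the library's trait; the function body follows the one described above.

```rust
/// A local stand-in for the experimental library trait.
///
/// # Safety
///
/// Implementers promise that a block of zero bytes is a valid value of
/// the implementing type.
pub unsafe trait Zeroable {}

unsafe impl Zeroable for u8 {}
unsafe impl Zeroable for i32 {}
unsafe impl Zeroable for usize {}

/// Allocates a vector of `len` zeroed elements.
fn zeroed_vector<T: Zeroable>(len: usize) -> Vec<T> {
    let mut vec: Vec<T> = Vec::with_capacity(len);
    unsafe {
        // SAFETY: the buffer has room for `len` elements of `T`, and
        // `write_bytes` counts in elements, so this zeroes all of it.
        std::ptr::write_bytes(vec.as_mut_ptr(), 0, len);
        // SAFETY: `T: Zeroable` promises zeroed memory is a valid `T`.
        vec.set_len(len);
    }
    vec
}
```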

Here, we put it to use:

let v: Vec<usize> = zeroed_vector(100_000);
assert!(v.iter().all(|&u| u == 0));

Clearly, Zeroable must be an unsafe trait, since an implementation that doesn’t respect its contract can lead to undefined behavior:

struct HoldsRef<'a>(&'a mut i32);

unsafe impl<'a> Zeroable for HoldsRef<'a> { }

let mut v: Vec<HoldsRef> = zeroed_vector(1);
*v[0].0 = 1;   // crashes: dereferences null pointer

Rust compiles this without complaint: it has no idea what Zeroable is meant to signify, so it can’t tell when it’s being implemented for an inappropriate type. As with any other unsafe feature, it’s up to you to understand and adhere to an unsafe trait’s contract.

Note that unsafe code must not depend on ordinary, safe traits being implemented correctly. For example, suppose there were an implementation of the std::hash::Hasher trait that simply returned a random hash value, with no relation to the values being hashed. The trait requires that hashing the same bits twice must produce the same hash value, but this implementation doesn’t meet that requirement; it’s simply incorrect. But because Hasher is not an unsafe trait, unsafe code must not exhibit undefined behavior when it uses this hasher. The std::collections::HashMap type is carefully written to respect the contracts of the unsafe features it uses regardless of how the hasher behaves. Certainly, the table won’t function correctly: lookups will fail, and entries will appear and disappear at random. But the table will not exhibit undefined behavior.
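The same principle applies to your own code. In the sketch below, Picker is a hypothetical safe trait whose implementations are supposed to return an index within bounds. Because the trait is not unsafe, the function that uses it must verify the index itself before relying on it in an unsafe block.

```rust
/// A hypothetical safe trait: implementations are asked to return an
/// index less than `len`, but nothing forces them to.
trait Picker {
    fn pick(&self, len: usize) -> usize;
}

/// A well-behaved implementation.
struct Last;
impl Picker for Last {
    fn pick(&self, len: usize) -> usize {
        len.saturating_sub(1)
    }
}

/// An incorrect implementation: always out of range.
struct Broken;
impl Picker for Broken {
    fn pick(&self, len: usize) -> usize {
        len + 10
    }
}

/// Picks an element of `items`. A bad `Picker` yields `None`, never
/// undefined behavior.
fn choose<'a, P: Picker>(items: &'a [u32], picker: &P) -> Option<&'a u32> {
    let index = picker.pick(items.len());
    if index >= items.len() {
        return None; // do not trust a safe trait's implementation
    }
    // SAFETY: we checked `index < items.len()` ourselves.
    Some(unsafe { items.get_unchecked(index) })
}
```

With Broken, the function gives an unhelpful answer, but it stays within the rules.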

Raw Pointers

A raw pointer in Rust is an unconstrained pointer. You can use raw pointers to form all sorts of structures that Rust’s checked pointer types cannot, like doubly linked lists or arbitrary graphs of objects. But because raw pointers are so flexible, Rust cannot tell whether you are using them safely or not, so you can dereference them only in an unsafe block.

Raw pointers are essentially equivalent to C or C++ pointers, so they’re also useful for interacting with code written in those languages.

There are two kinds of raw pointers:

A *mut T is a raw pointer to a T that permits modifying its referent.
A *const T is a raw pointer to a T that only permits reading its referent.

(There is no plain *T type; you must always specify either const or mut.)

You can create a raw pointer by conversion from a reference, and dereference it with the * operator:

let mut x = 10;
let ptr_x = &mut x as *mut i32;

let y = Box::new(20);
let ptr_y = &*y as *const i32;

unsafe {
    *ptr_x += *ptr_y;
}
assert_eq!(x, 30);

Unlike boxes and references, raw pointers can be null, like NULL in C or nullptr in C++:

fn option_to_raw<T>(opt: Option<&T>) -> *const T {
    match opt {
        None => std::ptr::null(),
        Some(r) => r as *const T
    }
}

assert!(!option_to_raw(Some(&("pea", "pod"))).is_null());
assert_eq!(option_to_raw::<i32>(None), std::ptr::null());

This example has no unsafe blocks: creating raw pointers, passing them around, and comparing them are all safe. Only dereferencing a raw pointer is unsafe.

A raw pointer to an unsized type is a fat pointer, just as the corresponding reference or Box type would be. A *const [u8] pointer includes a length along with the address, and a pointer to a trait object, like *mut dyn std::io::Write, carries a vtable.
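You can see the extra word by comparing sizes. This sketch assumes a typical platform where a fat pointer is exactly two machine words; the function names are hypothetical.

```rust
/// Size of an ordinary (thin) raw pointer.
fn thin_pointer_size() -> usize {
    std::mem::size_of::<*const u8>()
}

/// Size of a raw pointer to a slice: address plus length.
fn slice_pointer_size() -> usize {
    std::mem::size_of::<*const [u8]>()
}

/// Size of a raw pointer to a trait object: address plus vtable.
fn trait_pointer_size() -> usize {
    std::mem::size_of::<*mut dyn std::io::Write>()
}
```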

Although Rust implicitly dereferences safe pointer types in various situations, raw pointer dereferences must be explicit:

The . operator will not implicitly dereference a raw pointer; you must write (*raw).field or (*raw).method(...).
Raw pointers do not implement Deref, so deref coercions do not apply to them.
Operators like == and < compare raw pointers as addresses: two raw pointers are equal if they point to the same location in memory. Similarly, hashing a raw pointer hashes the address it points to, not the value of its referent.
Formatting traits like std::fmt::Display follow references automatically, but don’t handle raw pointers at all. The exceptions are std::fmt::Debug and std::fmt::Pointer, which show raw pointers as hexadecimal addresses, without dereferencing them.
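The rules above can be seen in a small sketch. The Point struct and the functions read_x and describe are hypothetical names invented for this illustration; both functions take references and convert them to raw pointers internally, so the unsafe block is known to be sound:

```rust
/// A hypothetical struct, used only for illustration.
pub struct Point {
    pub x: i32,
    pub y: i32,
}

/// Read a field through a raw pointer. The dereference must be written out.
pub fn read_x(p: &Point) -> i32 {
    let raw = p as *const Point;
    // `raw.x` would not compile: `.` does not auto-dereference raw pointers.
    // The dereference is sound because `raw` came from a live reference.
    unsafe { (*raw).x }
}

/// Compare two raw pointers and format the first one.
/// Neither operation dereferences the pointers, so no `unsafe` is needed.
pub fn describe(a: *const Point, b: *const Point) -> (bool, String) {
    // `==` compares addresses, not the `Point` values they refer to.
    // `{:p}` uses std::fmt::Pointer, which prints the address in hexadecimal.
    (a == b, format!("{:p}", a))
}
```

Calling describe(&p, &q) with references to two distinct Point values that happen to hold equal fields still reports the pointers as unequal, since only the addresses are compared.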

Unlike the + operator in C and C++, Rust’s + does not handle raw pointers, but you can perform pointer arithmetic via their offset and wrapping_offset methods. Recent versions of Rust also provide an unsafe offset_from method for finding the distance between two pointers into the same array, as the - operator does in C and C++, but it is easy to write a safe, address-based version yourself:

fn distance<T>(left: *const T, right: *const T) -> isize {
    (left as isize - right as isize) / std::mem::size_of::<T>() as isize
}

let trucks = vec!["garbage truck", "dump truck", "moonstruck"];
let first = &trucks[0];
let last = &trucks[2];
assert_eq!(distance(last, first), 2);
assert_eq!(distance(first, last), -2);

Even though distance’s parameters are raw pointers, we can pass it references: Rust implicitly coerces references to raw pointers (but not the other way around, of course).

The as operator permits almost every plausible conversion from references to raw pointers or between two raw pointer types. However, you may need to break up a complex conversion into a series of simpler steps. For example:

&vec![42_u8] as *const String  // error: invalid conversion
&vec![42_u8] as *const Vec<u8> as *const String;  // permitted

Note that as will not convert raw pointers to references. Such conversions would be unsafe, and as should remain a safe operation. Instead, you must dereference the raw pointer (in an unsafe block), and then borrow the resulting value.

Be very careful when you do this: a reference produced this way has an unconstrained lifetime. There’s no limit on how long it can live, since the raw pointer gives Rust nothing to base such a decision on. In “A Safe Interface to libgit2” later in this chapter, we show several examples of how to properly constrain lifetimes.
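As a minimal sketch of the dereference-then-borrow step, the hypothetical function below (reborrow is a name made up for this example) converts a reference to a raw pointer and back. The explicit 'a in the signature is what pins down the otherwise unconstrained lifetime:

```rust
/// Reborrow through a raw pointer, tying the result to the input's lifetime.
/// The `'a` on the return type keeps the new reference from outliving
/// the value it was derived from.
pub fn reborrow<'a>(r: &'a i32) -> &'a i32 {
    let raw = r as *const i32;
    // `raw as &i32` is rejected by the compiler; instead, dereference
    // the raw pointer and borrow the result.
    unsafe { &*raw }
}
```

Without the signature to constrain it, the expression &*raw would be given whatever lifetime makes the surrounding code check, which is why wrapping such conversions in a function with an explicit lifetime is a common precaution.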

Many types have as_ptr and as_mut_ptr methods that return a raw pointer to their contents. For example, array slices and strings return pointers to their first elements, and some iterators return a pointer to the next element they will produce. Owning pointer types like Box, Rc, and Arc have into_raw and from_raw functions that convert to and from raw pointers. Some of these methods’ contracts impose surprising requirements, so check their documentation before using them.
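The following sketch shows two of these conversions, assuming nothing beyond the standard library. The function names first_byte and box_round_trip are hypothetical:

```rust
/// Return the first byte of a string by way of `as_ptr`.
pub fn first_byte(s: &str) -> Option<u8> {
    if s.is_empty() {
        // An empty string's pointer must not be read.
        return None;
    }
    let p: *const u8 = s.as_ptr();
    // Sound: `s` is nonempty, so `p` points at its first byte.
    Some(unsafe { *p })
}

/// Pass a `Box` through a raw pointer and back.
pub fn box_round_trip(value: i32) -> i32 {
    // `into_raw` gives up ownership; nothing frees the heap block now.
    let raw: *mut i32 = Box::into_raw(Box::new(value));
    unsafe {
        *raw += 1;
        // `from_raw` must be called exactly once for each `into_raw`,
        // or the block is leaked (never) or freed twice (more than once).
        let b = Box::from_raw(raw);
        *b
    }
}
```

The "exactly once" requirement on Box::from_raw is one example of the surprising contract terms mentioned above.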

You can also construct raw pointers by conversion from integers, although the only integers you can trust for this are generally those you got from a pointer in the first place. “Example: RefWithFlag” uses raw pointers this way.

Unlike references, raw pointers are neither Send nor Sync. As a result, any type that includes raw pointers does not implement these traits by default. There is nothing inherently unsafe about sending or sharing raw pointers between threads; after all, wherever they go, you still need an unsafe block to dereference them. But given the roles raw pointers typically play, the language designers considered this behavior to be the more helpful default. We already discussed how to implement Send and Sync yourself in “Unsafe Traits”.

Dereferencing Raw Pointers Safely

Here are some common-sense guidelines for using raw pointers safely:

Dereferencing null pointers or dangling pointers is undefined behavior, as is referring to uninitialized memory, or values that have gone out of scope.
Dereferencing pointers that are not properly aligned for their referent type is undefined behavior.
You may borrow values out of a dereferenced raw pointer only if doing so obeys the rules for reference safety explained in Chapter 5: No reference may outlive its referent; shared access is read-only access; and mutable access is exclusive access. (This rule is easy to violate by accident, since raw pointers are often used to create data structures with nonstandard sharing or ownership.)
You may use a raw pointer’s referent only if it is a well-formed value of its type. For example, you must ensure that dereferencing a *const char yields a proper, nonsurrogate Unicode code point.
You may use the offset and wrapping_offset methods on raw pointers only to point to bytes within the variable or heap-allocated block of memory that the original pointer referred to, or to the first byte beyond such a region.

If you do pointer arithmetic by converting the pointer to an integer, doing arithmetic on the integer, and then converting it back to a pointer, the result must be a pointer that the rules for the offset method would have allowed you to produce.
If you assign to a raw pointer’s referent, you must not violate the invariants of any type of which the referent is a part. For example, if you have a *mut u8 pointing to a byte of a String, you may only store values in that u8 that leave the String holding well-formed UTF-8.

The borrowing rule aside, these are essentially the same rules you must follow when using pointers in C or C++.

The reason for not violating types’ invariants should be clear. Many of Rust’s standard types use unsafe code in their implementation, but still provide safe interfaces on the assumption that Rust’s safety checks, module system, and visibility rules will be respected. Using raw pointers to circumvent these protective measures can lead to undefined behavior.

The complete, exact contract for raw pointers is not easily stated, and may change as the language evolves. But the principles outlined here should keep you in safe territory.

Example: RefWithFlag

Here’s an example of how to take a classic¹ bit-level hack made possible by raw pointers, and wrap it up as a completely safe Rust type. This module defines a type, RefWithFlag<'a, T>, that holds both a &'a T and a bool, like the tuple (&'a T, bool), and yet still manages to occupy only one machine word instead of two. This sort of technique is used regularly in garbage collectors and virtual machines, where certain types—say, the type representing an object—are so numerous that adding even a single word to each value would drastically increase memory use:

mod ref_with_flag {
    use std::marker::PhantomData;
    use std::mem::align_of;

    /// A `&T` and a `bool`, wrapped up in a single word.
    /// The type `T` must require at least two-byte alignment.
    ///
    /// If you're the kind of programmer who's never met a pointer whose
    /// 2⁰-bit you didn't want to steal, well, now you can do it safely!
    /// ("But it's not nearly as exciting this way...")
    pub struct RefWithFlag<'a, T: 'a> {
        ptr_and_bit: usize,
        behaves_like: PhantomData<&'a T> // occupies no space
    }

    impl<'a, T: 'a> RefWithFlag<'a, T> {
        pub fn new(ptr: &'a T, flag: bool) -> RefWithFlag<T> {
            assert!(align_of::<T>() % 2 == 0);
            RefWithFlag {
                ptr_and_bit: ptr as *const T as usize | flag as usize,
                behaves_like: PhantomData
            }
        }

        pub fn get_ref(&self) -> &'a T {
            unsafe {
                let ptr = (self.ptr_and_bit & !1) as *const T;
                &*ptr
            }
        }

        pub fn get_flag(&self) -> bool {
            self.ptr_and_bit & 1 != 0
        }
    }
}

This code takes advantage of the fact that many types must be placed at even addresses in memory: since an even address’s least significant bit is always zero, we can store something else there, and then reliably reconstruct the original address just by masking off the bottom bit. Not all types qualify; for example, the types u8 and (bool, [i8; 2]) can be placed at any address. But we can check the type’s alignment on construction and refuse types that won’t work.

You can use RefWithFlag like this:

use ref_with_flag::RefWithFlag;

let vec = vec![10, 20, 30];
let flagged = RefWithFlag::new(&vec, true);
assert_eq!(flagged.get_ref()[1], 20);
assert_eq!(flagged.get_flag(), true);

The constructor RefWithFlag::new takes a reference and a bool value, asserts that the reference’s type is suitable, and then converts the reference to a raw pointer, and then a usize. The usize type is defined to be large enough to hold a pointer on whatever processor we’re compiling for, so converting a raw pointer to a usize and back is well-defined. Once we have a usize, we know it must be even, so we can use the | bitwise-or operator to combine it with the bool, which we’ve converted to an integer 0 or 1.

The get_flag method extracts the bool component of a RefWithFlag. It’s simple: just mask off the bottom bit and check if it’s nonzero.

The get_ref method extracts the reference from a RefWithFlag. First, it masks off the usize’s bottom bit and converts it to a raw pointer. The as operator will not convert raw pointers to references, but we can dereference the raw pointer (in an unsafe block, naturally) and borrow that. Borrowing a raw pointer’s referent gives you a reference with an unbounded lifetime: Rust will accord the reference whatever lifetime would make the code around it check, if there is one. Usually, though, there is some specific lifetime which is more accurate, and would thus catch more mistakes. In this case, since get_ref’s return type is &'a T, Rust sees that the reference’s lifetime is the same as RefWithFlag’s lifetime parameter 'a, which is just what we want: that’s the lifetime of the reference we started with.

In memory, a RefWithFlag looks just like a usize: since PhantomData is a zero-sized type, the behaves_like field takes up no space in the structure. But the PhantomData is necessary for Rust to know how to treat lifetimes in code that uses RefWithFlag. Imagine what the type would look like without the behaves_like field:

// This won't compile.
pub struct RefWithFlag<'a, T: 'a> {
    ptr_and_bit: usize
}

In Chapter 5, we pointed out that any structure containing references must not outlive the values they borrow, lest the references become dangling pointers. The structure must abide by the restrictions that apply to its fields. This certainly applies to RefWithFlag: in the example code we just looked at, flagged must not outlive vec, since flagged.get_ref() returns a reference to it. But our reduced RefWithFlag type contains no references at all, and never uses its lifetime parameter 'a. It’s just a usize. How should Rust know that any restrictions apply to ptr_and_bit’s lifetime? Including a PhantomData<&'a T> field tells Rust to treat RefWithFlag<'a, T> as if it contained a &'a T, without actually affecting the struct’s representation.

Although Rust doesn’t really know what’s going on (that’s what makes RefWithFlag unsafe), it will do its best to help you out with this. If you omit the behaves_like field, Rust will complain that the parameters 'a and T are unused, and suggest using a PhantomData.

RefWithFlag uses the same tactics as the Ascii type we presented earlier to avoid undefined behavior in its unsafe block. The type itself is pub, but its fields are not, meaning that only code within the ref_with_flag module can create or look inside a RefWithFlag value. You don’t have to inspect much code to have confidence that the ptr_and_bit field is well constructed.

Nullable Pointers

A null raw pointer in Rust is a zero address, just as in C and C++. For any type T, the std::ptr::null::<T>() function returns a *const T null pointer, and std::ptr::null_mut::<T>() returns a *mut T null pointer.

There are a few ways to check whether a raw pointer is null. The simplest is the is_null method, but the as_ref method may be more convenient: it takes a *const T pointer and returns an Option<&'a T>, turning a null pointer into a None. Similarly, the as_mut method converts *mut T pointers into Option<&'a mut T> values.
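Here is a short sketch of both checks. The helper names to_raw and null_round_trip are hypothetical. Note that as_ref is an unsafe method: it checks only for null, so a non-null pointer must still be valid for the call to be sound.

```rust
use std::ptr;

/// Turn an optional reference into a possibly null raw pointer.
pub fn to_raw(opt: Option<&i32>) -> *const i32 {
    opt.map_or(ptr::null(), |r| r as *const i32)
}

/// Convert to a raw pointer and back again with `as_ref`.
pub fn null_round_trip<'a>(opt: Option<&'a i32>) -> Option<&'a i32> {
    let raw = to_raw(opt);
    // Sound: `raw` is either null or derived from a live `&'a i32`.
    // `as_ref` maps null to `None` and anything else to `Some`.
    unsafe { raw.as_ref() }
}
```

For example, to_raw(None).is_null() is true, and null_round_trip(None) returns None.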

Type Sizes and Alignments

A value of any Sized type occupies a constant number of bytes in memory, and must be placed at an address that is a multiple of some alignment value, determined by the machine architecture. For example, an (i32, i32) tuple occupies eight bytes, and most processors prefer it to be placed at an address that is a multiple of four.

The call std::mem::size_of::<T>() returns the size of a value of type T, in bytes, and std::mem::align_of::<T>() returns its required alignment. For example:

assert_eq!(std::mem::size_of::<i64>(), 8);
assert_eq!(std::mem::align_of::<(i32, i32)>(), 4);

Any type’s alignment is always a power of two.

A type’s size is always rounded up to a multiple of its alignment, even if it technically could fit in less space. For example, even though a tuple like (f32, u8) requires only five bytes, size_of::<(f32, u8)>() is 8, because align_of::<(f32, u8)>() is 4. This ensures that if you have an array, the size of the element type always reflects the spacing between one element and the next.
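The rounding can be written out as arithmetic. This sketch uses two hypothetical helpers, round_up and padded_size_matches, and relies on the fact that alignments are powers of two:

```rust
use std::mem::{align_of, size_of};

/// Round `size` up to the next multiple of `align`.
/// `align` must be a nonzero power of two.
pub fn round_up(size: usize, align: usize) -> usize {
    (size + align - 1) & !(align - 1)
}

/// Check that `T`'s size equals `unpadded` bytes rounded up to `T`'s alignment.
pub fn padded_size_matches<T>(unpadded: usize) -> bool {
    size_of::<T>() == round_up(unpadded, align_of::<T>())
}
```

With the figures given above, round_up(5, 4) is 8, which is the size reported for (f32, u8); an array of three such tuples therefore occupies 3 * 8 bytes.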

For unsized types, the size and alignment depend on the value at hand. Given a reference to an unsized value, the std::mem::size_of_val and std::mem::align_of_val functions return the value’s size and alignment. These functions can operate on references to both Sized and unsized types.

// Fat pointers to slices carry their referent's length.
let slice: &[i32] = &[1, 3, 9, 27, 81];
assert_eq!(std::mem::size_of_val(slice), 20);

let text: &str = "alligator";
assert_eq!(std::mem::size_of_val(text), 9);

use std::fmt::Display;
let unremarkable: &dyn Display = &193_u8;
let remarkable: &dyn Display = &0.0072973525664;

// These return the size/alignment of the value the
// trait object points to, not those of the trait object
// itself. This information comes from the vtable the
// trait object refers to.
assert_eq!(std::mem::size_of_val(unremarkable), 1);
assert_eq!(std::mem::align_of_val(remarkable), 8);

Pointer Arithmetic

Rust lays out the elements of an array, slice, or vector as a single contiguous block of memory, as shown in Figure 21-1. Elements are regularly spaced, so that if each element occupies size bytes, then the i’th element starts with the i * size’th byte.

An array of four `i32` elements, cleverly named `array`. Each element occupies four bytes. The start address is the address of the first byte of the first element. `array[0]` falls at byte offset 0. `array[3]` falls at byte offset 12.

One nice consequence of this is that if you have two raw pointers to elements of an array, comparing the pointers gives the same results as comparing the elements’ indices: if i < j, then a raw pointer to the i’th element is less than a raw pointer to the j’th element. This makes raw pointers useful as bounds on array traversals. In fact, the standard library’s simple iterator over a slice is defined like this:

struct Iter<'a, T: 'a> {
    ptr: *const T,
    end: *const T,
    ...
}

The ptr field points to the next element iteration should produce, and the end field serves as the limit: when ptr == end, the iteration is complete.
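To make the ptr and end idea concrete, here is a minimal sketch of a traversal bounded by a pair of raw pointers. It is not the standard library's implementation, and the function name sum is our own:

```rust
/// Sum a slice by walking a raw pointer from its first element
/// up to the one-past-the-end position.
pub fn sum(slice: &[i32]) -> i32 {
    let mut ptr = slice.as_ptr();
    // One past the last element: fine to compute, but never dereferenced.
    let end = unsafe { ptr.offset(slice.len() as isize) };
    let mut total = 0;
    while ptr != end {
        unsafe {
            // Sound: `ptr` lies strictly before `end`, so it points
            // at a live element of the slice.
            total += *ptr;
            ptr = ptr.offset(1);
        }
    }
    total
}
```

For an empty slice, ptr and end are equal from the start, so the loop body never runs.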

Another nice consequence of array layout: if element_ptr is a *const T or *mut T raw pointer to the i’th element of some array, then element_ptr.offset(o) is a raw pointer to the (i + o)’th element. Its definition is equivalent to this:

fn offset(self: *const T, count: isize) -> *const T
    where T: Sized
{
    let bytes_per_element = std::mem::size_of::<T>() as isize;
    let byte_offset = count * bytes_per_element;
    (self as isize).checked_add(byte_offset).unwrap() as *const T
}

The std::mem::size_of::<T> function returns the size of the type T in bytes. Since isize is, by definition, large enough to hold an address, you can convert the base pointer to an isize, do arithmetic on that value, and then convert the result back to a pointer.

It’s fine to produce a pointer to the first byte after the end of an array. You cannot dereference such a pointer, but it can be useful to represent the limit of a loop, or for bounds checks.

However, it is undefined behavior to use offset to produce a pointer beyond that point, or before the start of the array, even if you never dereference it. For the sake of optimization, Rust would like to assume that ptr.offset(i) > ptr when i is positive, and that ptr.offset(i) < ptr when i is negative. This assumption seems safe, but it may not hold if the arithmetic in offset overflows an isize value. If i is constrained to stay within the same array as ptr, no overflow can occur: after all, the array itself does not overflow the bounds of the address space. (To make pointers to the first byte after the end safe, Rust never places values at the upper end of the address space.)

If you do need to offset pointers beyond the limits of the array they are associated with, you can use the wrapping_offset method. This is equivalent to offset, but Rust makes no assumptions about the relative ordering of ptr.wrapping_offset(i) and ptr itself. Of course, you still can’t dereference such pointers unless they fall within the array.
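The difference between the two methods can be sketched as follows; second_and_far is a hypothetical function written for this illustration:

```rust
/// Return the array's second element, and whether a far out-of-bounds
/// pointer returns to the start when offset back by the same amount.
pub fn second_and_far(array: &[i32; 4]) -> (i32, bool) {
    let base = array.as_ptr();

    // In bounds: `offset` is permitted, and the result may be dereferenced.
    let second = unsafe { *base.offset(1) };

    // Far out of bounds: only `wrapping_offset` may compute this address.
    // It is a safe method, but the result must never be dereferenced.
    let far = base.wrapping_offset(1000);

    // Stepping back by the same distance recovers the original address.
    (second, far.wrapping_offset(-1000) == base)
}
```

Calling base.offset(1000) here instead would be undefined behavior even though the pointer is never dereferenced.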

Moving into and out of Memory

If you are implementing a type that manages its own memory, you will need to track which parts of your memory hold live values and which are uninitialized, just as Rust does with local variables. Consider this code:

let pot = "pasta".to_string();
let plate;

plate = pot;

After this code has run, the situation looks like Figure 21-2.

Two local string variables, `pot` and `plate`. `pot` is uninitialized, but still holds its prior pointer, capacity, and length. `plate` holds the string `"pasta"`.

After the assignment, pot is uninitialized, and plate is the owner of the string.

At the machine level, it’s not specified what a move does to the source, but in practice it usually does nothing at all. The assignment probably leaves pot still holding a pointer, capacity, and length for the string. Naturally, it would be disastrous to treat this as a live value, and Rust ensures that you don’t.

The same considerations apply to data structures that manage their own memory. Suppose you run this code:

let mut noodles = vec!["udon".to_string()];
let soba = "soba".to_string();
let last;

In memory, the state looks like Figure 21-3.

A vector holding one string, with capacity for one more. A variable `soba`, holding the string "soba". A variable `last`, uninitialized.

The vector has the spare capacity to hold one more element, but its contents are junk, probably whatever that memory held previously. Suppose you then run this code:

noodles.push(soba);

Pushing the string onto the vector transforms that uninitialized memory into a new element, as illustrated in Figure 21-4.

A vector holding two strings, with no more capacity. A variable `soba`, uninitialized, but still holding the pointer, capacity, and length it held while live. A variable `last`, uninitialized.

The vector has initialized its empty space to own the string, and incremented its length to mark this as a new, live element. The vector is now the owner of the string; you can refer to its second element, and dropping the vector would free both strings. And soba is now uninitialized.

Finally, consider what happens when we pop a value from the vector:

last = noodles.pop().unwrap();

In memory, things now look like Figure 21-5.

A vector holding one string, with capacity for one more. The vector's spare capacity still holds the pointer, capacity, and length it held while live. A variable `soba`, uninitialized, but still holding the pointer, capacity, and length it held while live. A variable `last`, owning a string.

The variable last has taken ownership of the string. The vector has decremented its length to indicate that the space that used to hold the string is now uninitialized.

Just as with pot and plate earlier, all three of soba, last, and the vector’s free space probably hold identical bit patterns. But only last is considered to own the value. Treating either of the other two locations as live would be a mistake.

The true definition of an initialized value is one that is treated as live. Writing to a value’s bytes is usually a necessary part of initialization, but only because doing so prepares the value to be treated as live.

Rust tracks local variables at compile time. Types like Vec, HashMap, Box, and so on track their buffers dynamically. If you implement a type that manages its own memory, you will need to do the same.

Rust provides two essential operations for implementing such types:

std::ptr::read(src) moves a value out of the location src points to, transferring ownership to the caller. After calling read, you must treat *src as uninitialized memory. The src argument should be a *const T raw pointer, where T is a sized type.

This is the operation behind Vec::pop. Popping a value calls read to move the value out of the buffer, and then decrements the length to mark that space as uninitialized capacity.
std::ptr::write(dest, value) moves value into the location dest points to, which must be uninitialized memory before the call. The referent now owns the value. Here, dest must be a *mut T raw pointer and value a T value, where T is a sized type.

This is the operation behind Vec::push. Pushing a value calls write to move the value into the next available space, and then increments the length to mark that space as a valid element.

Both are free functions, not methods on the raw pointer types.
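As an illustration of how read and write pair with a length adjustment, here is a simplified sketch of push and pop performed by hand on a Vec<String>. This is not the standard library's actual source; the names push_by_hand and pop_by_hand are hypothetical, and the sketch leans on Vec's own reserve and set_len methods for the bookkeeping:

```rust
use std::ptr;

/// Push by hand: write into the first spare slot, then extend the length.
pub fn push_by_hand(v: &mut Vec<String>, s: String) {
    v.reserve(1); // guarantee at least one slot of spare capacity
    unsafe {
        let len = v.len();
        // The slot at `len` is uninitialized, so `write` drops nothing.
        ptr::write(v.as_mut_ptr().add(len), s);
        // Mark the slot as a live element.
        v.set_len(len + 1);
    }
}

/// Pop by hand: shrink the length, then move the last value out.
pub fn pop_by_hand(v: &mut Vec<String>) -> Option<String> {
    let len = v.len();
    if len == 0 {
        return None;
    }
    unsafe {
        // After this, the vector treats the last slot as spare capacity.
        v.set_len(len - 1);
        // Move the value out; the caller becomes its only owner.
        Some(ptr::read(v.as_ptr().add(len - 1)))
    }
}
```

In both functions the length change and the read or write must agree. If they did not, a String would either be dropped twice or never dropped at all.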

Note that you cannot do these things with any of Rust’s safe pointer types. They all require their referents to be initialized at all times, so transforming uninitialized memory into a value, or vice versa, is outside their reach. Raw pointers fit the bill.

The standard library also provides functions for moving arrays of values from one block of memory to another:

std::ptr::copy(src, dst, count) moves the array of count values in memory starting at src to the memory at dst, just as if you had written a loop of read and write calls to move them one at a time. The destination memory must be uninitialized before the call, and afterward the source memory is left uninitialized. The src and dst arguments must be *const T and *mut T raw pointers, and count must be a usize.
std::ptr::copy_nonoverlapping(src, dst, count) is like the corresponding call to copy, except that its contract further requires that the source and destination blocks of memory must not overlap. This may be slightly faster than calling copy.
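A minimal sketch of an overlapping copy follows. The function shift_left is hypothetical, and it uses i32 elements deliberately: since i32 is Copy and has no destructor, the stale value left behind in the last slot is harmless, which would not be true of a type like String.

```rust
use std::ptr;

/// Shift every element of `buf` one place to the left,
/// overwriting the first element.
pub fn shift_left(buf: &mut [i32]) {
    if buf.len() < 2 {
        return;
    }
    unsafe {
        let p = buf.as_mut_ptr();
        // Source (elements 1..len) and destination (elements 0..len-1)
        // overlap, so `copy_nonoverlapping` would violate its contract here.
        ptr::copy(p.add(1), p, buf.len() - 1);
    }
}
```

Applied to [1, 2, 3, 4], this leaves [2, 3, 4, 4]: the last slot keeps its old bit pattern, just as the source of a move does.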

There are two other families of read and write functions, also in the std::ptr module:

The read_unaligned and write_unaligned functions are like read and write, except that the pointer need not be aligned as normally required for the referent type. These functions may be slower than the plain read and write functions.
The read_volatile and write_volatile functions are the equivalent of volatile reads and writes in C or C++.
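For instance, read_unaligned is useful for pulling a multi-byte value out of a byte buffer at an arbitrary position. The function read_u32_at below is a hypothetical helper, and it reads the value in the machine's native byte order:

```rust
/// Read a `u32` starting at byte `offset` of `bytes`, in native byte order,
/// or return `None` if fewer than four bytes are available there.
pub fn read_u32_at(bytes: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    if end > bytes.len() {
        return None;
    }
    unsafe {
        // The address may not be a multiple of four, so a plain `read`
        // (or `*p`) would be undefined behavior; `read_unaligned` is not.
        let p = bytes.as_ptr().add(offset) as *const u32;
        Some(std::ptr::read_unaligned(p))
    }
}
```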

Example: GapBuffer

Here’s an example that puts the raw pointer functions just described to use.

Suppose you’re writing a text editor, and you’re looking for a type to represent the text. You could choose String, and use the insert and remove methods to insert and delete characters as the user types. But if they’re editing text at the beginning of a large file, those methods can be expensive: inserting a new character involves shifting the entire rest of the string to the right in memory, and deletion shifts it all back to the left. You’d like such common operations to be cheaper.

The Emacs text editor uses a simple data structure called a gap buffer which can insert and delete characters in constant time. Whereas a String keeps all its spare capacity at the end of the text, which makes push and pop cheap, a gap buffer keeps its spare capacity in the midst of the text, at the point where editing is taking place. This spare capacity is called the gap. Inserting or deleting elements at the gap is cheap: you simply shrink or enlarge the gap as needed. You can move the gap to any location you like by shifting text from one side of the gap to the other. When the gap is empty, you migrate to a larger buffer.

While insertion and deletion in a gap buffer are fast, changing the position at which they take place entails moving the gap to the new position. Shifting the elements requires time proportional to the distance being moved. Fortunately, typical editing activity involves making a bunch of changes in one neighborhood of the buffer before going off and fiddling with text someplace else.

In this section we’ll implement a gap buffer in Rust. To avoid being distracted by UTF-8, we’ll make our buffer store char values directly, but the principles of operation would be the same if we stored the text in some other form.

First, we’ll show a gap buffer in action. This code creates a GapBuffer, inserts some text in it, and then moves the insertion point to sit just before the last word:

use gap::GapBuffer;

let mut buf = GapBuffer::new();
buf.insert_iter("Lord of the Rings".chars());
buf.set_position(12);

After running this code, the buffer looks as shown in Figure 21-6.

A gap buffer containing the text "Lord of the Rings". The buffer has capacity for 28 characters, but contains only 17. The first twelve characters appear at the left, followed by an eleven-character gap, followed by the remaining five characters.

Insertion is a matter of filling in the gap with new text. This code adds a word and ruins the film:

buf.insert_iter("Onion ".chars());

This results in the state shown in Figure 21-7.

A gap buffer containing the text "Lord of the Onion Rings". The buffer has capacity for 28 characters, but contains only 23. The first 18 characters appear at the left, followed by a five-character gap, followed by the remaining five characters.

Here’s our GapBuffer type:

mod gap {
    use std;
    use std::ops::Range;

    pub struct GapBuffer<T> {
        // Storage for elements. This has the capacity we need, but its length
        // always remains zero. GapBuffer puts its elements and the gap in this
        // `Vec`'s "unused" capacity.
        storage: Vec<T>,

        // Range of uninitialized elements in the middle of `storage`.
        // Elements before and after this range are always initialized.
        gap: Range<usize>
    }

    ...
}

GapBuffer uses its storage field in a strange way.² It never actually stores any elements in the vector—or not quite. It simply calls Vec::with_capacity(n) to get a block of memory large enough to hold n values, obtains raw pointers to that memory via the vector’s as_ptr and as_mut_ptr methods, and then uses the buffer directly for its own purposes. The vector’s length always remains zero. When the Vec gets dropped, the Vec doesn’t try to free its elements, because it doesn’t know it has any, but it does free the block of memory. This is what GapBuffer wants; it has its own Drop implementation that knows where the live elements are and drops them correctly.

GapBuffer’s simplest methods are what you’d expect:

impl<T> GapBuffer<T> {
    pub fn new() -> GapBuffer<T> {
        GapBuffer { storage: Vec::new(), gap: 0..0 }
    }

    /// Return the number of elements this GapBuffer could hold without
    /// reallocation.
    pub fn capacity(&self) -> usize {
        self.storage.capacity()
    }

    /// Return the number of elements this GapBuffer currently holds.
    pub fn len(&self) -> usize {
        self.capacity() - self.gap.len()
    }

    /// Return the current insertion position.
    pub fn position(&self) -> usize {
        self.gap.start
    }

    ...
}

Many of the following functions become cleaner with a utility method that returns a raw pointer to the buffer element at a given index. This being Rust, we end up needing one method for mut pointers and one for const. Unlike the preceding methods, these are not public. Continuing this impl block:

/// Return a pointer to the `index`'th element of the underlying storage,
/// regardless of the gap.
///
/// Safety: `index` must be a valid index into `self.storage`.
unsafe fn space(&self, index: usize) -> *const T {
    self.storage.as_ptr().offset(index as isize)
}

/// Return a mutable pointer to the `index`'th element of the underlying
/// storage, regardless of the gap.
///
/// Safety: `index` must be a valid index into `self.storage`.
unsafe fn space_mut(&mut self, index: usize) -> *mut T {
    self.storage.as_mut_ptr().offset(index as isize)
}

To find the element at a given index, you must consider whether the index falls before or after the gap, and adjust appropriately:

/// Return the offset in the buffer of the `index`'th element, taking
/// the gap into account. This does not check whether index is in range,
/// but it never returns an index in the gap.
fn index_to_raw(&self, index: usize) -> usize {
    if index < self.gap.start {
        index
    } else {
        index + self.gap.len()
    }
}

/// Return a reference to the `index`'th element,
/// or `None` if `index` is out of bounds.
pub fn get(&self, index: usize) -> Option<&T> {
    let raw = self.index_to_raw(index);
    if raw < self.capacity() {
        unsafe {
            // We just checked `raw` against self.capacity(),
            // and index_to_raw skips the gap, so this is safe.
            Some(&*self.space(raw))
        }
    } else {
        None
    }
}
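To see the index adjustment in isolation, here is a free-function model of index_to_raw that takes the gap as a parameter instead of reading it from self. This standalone form is our own simplification, not part of the GapBuffer module:

```rust
use std::ops::Range;

/// Map a logical element index to an offset in the underlying storage,
/// skipping over the gap.
pub fn index_to_raw(gap: &Range<usize>, index: usize) -> usize {
    if index < gap.start {
        index // before the gap: storage offset equals logical index
    } else {
        index + gap.len() // at or after the gap: skip over it
    }
}
```

With the buffer from Figure 21-6, where the gap covers storage offsets 12..23, logical index 11 maps to offset 11, index 12 (the R of "Rings") maps to offset 23, and index 16 maps to offset 27, the last slot of the 28-element buffer.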

When we start making insertions and deletions in a different part of the buffer, we need to move the gap to the new location. Moving the gap to the right entails shifting elements to the left, and vice versa, just as the bubble in a spirit level moves in one direction when the fluid flows in the other:

/// Set the current insertion position to `pos`.
/// If `pos` is out of bounds, panic.
pub fn set_position(&mut self, pos: usize) {
    if pos > self.len() {
        panic!("index {} out of range for GapBuffer", pos);
    }

    unsafe {
        let gap = self.gap.clone();
        if pos > gap.start {
            // `pos` falls after the gap. Move the gap right
            // by shifting elements after the gap to before it.
            let distance = pos - gap.start;
            std::ptr::copy(self.space(gap.end),
                           self.space_mut(gap.start),
                           distance);
        } else if pos < gap.start {
            // `pos` falls before the gap. Move the gap left
            // by shifting elements before the gap to after it.
            let distance = gap.start - pos;
            std::ptr::copy(self.space(pos),
                           self.space_mut(gap.end - distance),
                           distance);
        }

        self.gap = pos .. pos + gap.len();
    }
}

This function uses std::ptr::copy to shift the elements; copy requires that the destination be uninitialized, and leaves the source uninitialized. The source and destination ranges may overlap, but copy handles that case correctly. Since the gap is uninitialized memory before the call, and the function adjusts the gap’s position to cover space vacated by the copy, the copy function’s contract is satisfied.

Element insertion and removal are relatively simple. Insertion takes over one space from the gap for the new element, whereas removal moves one value out, and enlarges the gap to cover the space it used to occupy:

/// Insert `elt` at the current insertion position,
/// and leave the insertion position after it.
pub fn insert(&mut self, elt: T) {
    if self.gap.len() == 0 {
        self.enlarge_gap();
    }

    unsafe {
        let index = self.gap.start;
        std::ptr::write(self.space_mut(index), elt);
    }
    self.gap.start += 1;
}

/// Insert the elements produced by `iter` at the current insertion
/// position, and leave the insertion position after them.
pub fn insert_iter<I>(&mut self, iterable: I)
    where I: IntoIterator<Item=T>
{
    for item in iterable {
        self.insert(item)
    }
}

/// Remove the element just after the insertion position
/// and return it, or return `None` if the insertion position
/// is at the end of the GapBuffer.
pub fn remove(&mut self) -> Option<T> {
    if self.gap.end == self.capacity() {
        return None;
    }

    let element = unsafe {
        std::ptr::read(self.space(self.gap.end))
    };
    self.gap.end += 1;
    Some(element)
}

Similar to the way Vec uses std::ptr::write for push and std::ptr::read for pop, GapBuffer uses write for insert, and read for remove. And just as Vec must adjust its length to maintain the boundary between initialized elements and spare capacity, GapBuffer adjusts its gap.
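The same write-then-adjust and adjust-then-read pattern can be sketched directly on a Vec's spare capacity. The push_raw and pop_raw helpers below are hypothetical names for this illustration; they are not how Vec is actually implemented, and push_raw assumes the caller has already reserved room.

```rust
use std::ptr;

/// Store `value` in the first slot of spare capacity, then extend the
/// length to cover it. (Hypothetical helper; it never grows the buffer.)
fn push_raw(v: &mut Vec<String>, value: String) {
    assert!(v.len() < v.capacity(), "this sketch never grows the buffer");
    unsafe {
        let len = v.len();
        ptr::write(v.as_mut_ptr().add(len), value);
        v.set_len(len + 1);
    }
}

/// Shrink the length by one, then move the value out of the slot
/// that has just become spare capacity.
fn pop_raw(v: &mut Vec<String>) -> Option<String> {
    if v.is_empty() {
        return None;
    }
    unsafe {
        let last = v.len() - 1;
        v.set_len(last);
        Some(ptr::read(v.as_ptr().add(last)))
    }
}

fn main() {
    let mut v = Vec::with_capacity(2);
    push_raw(&mut v, "first".to_string());
    push_raw(&mut v, "second".to_string());
    assert_eq!(pop_raw(&mut v).as_deref(), Some("second"));
    assert_eq!(v, ["first"]);
}
```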

When the gap has been filled in, the insert method must grow the buffer to acquire more free space. The enlarge_gap method (the last in the impl block) handles this:

/// Double the capacity of `self.storage`.
fn enlarge_gap(&mut self) {
    let mut new_capacity = self.capacity() * 2;
    if new_capacity == 0 {
        // The existing vector is empty.
        // Choose a reasonable starting capacity.
        new_capacity = 4;
    }

    // We have no idea what resizing a Vec does with its "unused"
    // capacity. So just create a new vector and move over the elements.
    let mut new = Vec::with_capacity(new_capacity);
    let after_gap = self.capacity() - self.gap.end;
    let new_gap = self.gap.start .. new.capacity() - after_gap;

    unsafe {
        // Move the elements that fall before the gap.
        std::ptr::copy_nonoverlapping(self.space(0),
                                      new.as_mut_ptr(),
                                      self.gap.start);

        // Move the elements that fall after the gap.
        let new_gap_end = new.as_mut_ptr().offset(new_gap.end as isize);
        std::ptr::copy_nonoverlapping(self.space(self.gap.end),
                                      new_gap_end,
                                      after_gap);
    }

    // This frees the old Vec, but drops no elements,
    // because the Vec's length is zero.
    self.storage = new;
    self.gap = new_gap;
}

Whereas set_position must use copy to move elements back and forth in the gap, enlarge_gap can use copy_nonoverlapping, since it is moving elements to an entirely new buffer.

Moving the new vector into self.storage drops the old vector. Since its length is zero, the old vector believes it has no elements to drop, and simply frees its buffer. Neatly, copy_nonoverlapping leaves its source uninitialized, so the old vector is correct in this belief: all the elements are now owned by the new vector.
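The ownership handoff is easier to see in a smaller setting. The following sketch moves every element of one Vec into a freshly allocated one with copy_nonoverlapping. The move_to_new_buffer function is a hypothetical helper, and unlike GapBuffer's storage, the old vector here starts with a nonzero length, so the sketch must reset it to zero explicitly.

```rust
use std::ptr;

/// Move all elements of `old` into a new, larger vector.
/// (Hypothetical helper for illustration.)
fn move_to_new_buffer(old: &mut Vec<String>) -> Vec<String> {
    let count = old.len();
    let mut new: Vec<String> = Vec::with_capacity(count * 2 + 4);
    unsafe {
        // Tell `old` it no longer owns any elements *before* copying,
        // so each String has exactly one owner at every point.
        old.set_len(0);
        // The buffers belong to distinct allocations, so they cannot overlap.
        ptr::copy_nonoverlapping(old.as_ptr(), new.as_mut_ptr(), count);
        new.set_len(count);
    }
    new
}

fn main() {
    let mut old = vec!["a".to_string(), "b".to_string()];
    let new = move_to_new_buffer(&mut old);
    assert!(old.is_empty());
    assert_eq!(new, ["a", "b"]);
    // Dropping `old` frees only its buffer; `new` drops the strings once.
}
```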

Finally, we need to make sure that dropping a GapBuffer drops all its elements:

impl<T> Drop for GapBuffer<T> {
    fn drop(&mut self) {
        unsafe {
            for i in 0 .. self.gap.start {
                std::ptr::drop_in_place(self.space_mut(i));
            }
            for i in self.gap.end .. self.capacity() {
                std::ptr::drop_in_place(self.space_mut(i));
            }
        }
    }
}

The elements lie before and after the gap, so we iterate over each region and use the std::ptr::drop_in_place function to drop each one. The drop_in_place function is a utility that behaves like drop(std::ptr::read(ptr)), but doesn’t bother moving the value to its caller (and hence works on unsized types). And just as in enlarge_gap, by the time the vector self.storage is dropped, its buffer really is uninitialized.
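As a rough check on this reasoning, the sketch below counts destructor calls. It writes two values into a vector's spare capacity, leaves the length at zero as GapBuffer does, drops them with drop_in_place, and then lets the vector free its buffer. The Noisy type and the DROPS counter are assumptions made up for the demonstration.

```rust
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many `Noisy` values have been dropped.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Noisy {
    _id: u32,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Returns the number of drops that occurred during the demonstration.
fn drops_observed() -> usize {
    let before = DROPS.load(Ordering::SeqCst);
    let mut storage: Vec<Noisy> = Vec::with_capacity(4);
    unsafe {
        let base = storage.as_mut_ptr();
        // Initialize two slots by hand; the vector's length stays zero.
        ptr::write(base, Noisy { _id: 1 });
        ptr::write(base.add(1), Noisy { _id: 2 });
        // Drop exactly the slots we initialized, in place.
        for i in 0..2 {
            ptr::drop_in_place(base.add(i));
        }
    }
    // The vector thinks it is empty, so this only frees the buffer.
    drop(storage);
    DROPS.load(Ordering::SeqCst) - before
}

fn main() {
    assert_eq!(drops_observed(), 2);
}
```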

Like the other types we’ve shown in this chapter, GapBuffer maintains invariants strong enough to guarantee that the contract of every unsafe feature it uses is followed, so none of its public methods need be marked unsafe. GapBuffer implements a safe interface for a feature that cannot be written efficiently in safe code.

Panic Safety in Unsafe Code

In Rust, panics can’t usually cause undefined behavior; the panic! macro is not an unsafe feature. But when you decide to work with unsafe code, panic safety becomes part of your job.

Consider the GapBuffer::remove method from the previous section:

pub fn remove(&mut self) -> Option<T> {
    if self.gap.end == self.capacity() {
        return None;
    }

    let element = unsafe {
        std::ptr::read(self.space(self.gap.end))
    };
    self.gap.end += 1;
    Some(element)
}

The call to read moves the element immediately after the gap out of the buffer, leaving behind uninitialized space. Fortunately, the very next statement enlarges the gap to cover that space, so by the time we return, everything is as it should be: all elements outside the gap are initialized, and all elements inside the gap are uninitialized.

But consider what would happen if, after the call to read but before the adjustment to self.gap.end, this code tried to use a feature that might panic—say, indexing a slice. Exiting the method abruptly anywhere between those two actions would leave the GapBuffer with an uninitialized element outside the gap. The next call to remove could try to read it again; and even simply dropping the GapBuffer would try to drop it. Both are undefined behavior, because they access uninitialized memory.

It’s all but unavoidable for a type’s methods to momentarily relax the type’s invariants while they do their job, and then put everything back to rights before they return. A panic mid-method could cut that cleanup process short, leaving the type in an inconsistent state.

If the type uses only safe code, then this inconsistency may make the type misbehave, but it can’t introduce undefined behavior. But code using unsafe features is usually counting on its invariants to meet the contracts of those features. Broken invariants lead to broken contracts, which lead to undefined behavior.

When working with unsafe features, you must take special care to identify these sensitive regions, and ensure that they do nothing that might panic.
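One way to apply this advice is to keep the sensitive region as short as possible and to move anything that might panic, such as a caller-supplied closure, outside it. The sketch below is our own illustration on a Vec of strings, with a hypothetical take_last_with function; it is not code from GapBuffer.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::ptr;

/// Move the last element out of `v` and hand it to `visit`,
/// which is arbitrary caller code and may panic.
fn take_last_with<F: FnOnce(String)>(v: &mut Vec<String>, visit: F) {
    if v.is_empty() {
        return;
    }
    let last = v.len() - 1;
    let element = unsafe {
        // Sensitive region: between the read and the length update,
        // `v` claims a slot that is logically uninitialized.
        // Neither of these two operations can panic.
        let element = ptr::read(v.as_ptr().add(last));
        v.set_len(last);
        element
    };
    // The invariant holds again, so a panic from here on is harmless:
    // unwinding drops `element` once, and `v` no longer claims the slot.
    visit(element);
}

fn main() {
    let mut v = vec!["a".to_string(), "b".to_string()];
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        take_last_with(&mut v, |s| panic!("rejecting {}", s));
    }));
    assert!(result.is_err());
    // The vector is still in a consistent state after the panic.
    assert_eq!(v, ["a"]);
}
```

Had the call to visit been placed between the read and the set_len, the same panic would have left the vector owning a string that was also dropped during unwinding.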

Foreign Functions: Calling C and C++ from Rust

Rust’s foreign function interface lets Rust code call functions written in C or C++.

In this section, we’ll write a program that links with libgit2, a C library for working with the Git version control system. First, we’ll show what it’s like to use C functions directly from Rust. Then, we’ll show how to construct a safe interface to libgit2, taking inspiration from the open source git2-rs crate, which does exactly that.

We’ll assume that you’re familiar with C and the mechanics of compiling and linking C programs. Working with C++ is similar. We’ll also assume that you’re somewhat familiar with the Git version control system.

Finding Common Data Representations

The common denominator of Rust and C is machine language, so in order to anticipate what Rust values look like to C code, or vice versa, you need to consider their machine-level representations. Throughout the book, we’ve made a point of showing how values are actually represented in memory, so you’ve probably noticed that the data worlds of C and Rust have a lot in common: a Rust usize and a C size_t are identical, for example, and structs are fundamentally the same idea in both languages. To establish a correspondence between Rust and C types, we’ll start with primitives and then work our way up to more complicated types.

Given its primary use as a systems programming language, C has always been surprisingly loose about its types’ representations: an int is typically 32 bits long, but could be longer, or as short as 16 bits; a C char may be signed or unsigned; and so on. To cope with this variability, Rust’s std::os::raw module defines a set of Rust types that are guaranteed to have the same representation as certain C types. These cover the primitive integer and character types:

C type	Corresponding std::os::raw type
`short`	`c_short`
`int`	`c_int`
`long`	`c_long`
`long long`	`c_longlong`
`unsigned short`	`c_ushort`
`unsigned`, `unsigned int`	`c_uint`
`unsigned long`	`c_ulong`
`unsigned long long`	`c_ulonglong`
`char`	`c_char`
`signed char`	`c_schar`
`unsigned char`	`c_uchar`
`float`	`c_float`
`double`	`c_double`
`void *`, `const void *`	`*mut c_void`, `*const c_void`

Some notes about the table:

Except for c_void, all the Rust types here are aliases for some primitive Rust type: c_char, for example, is either i8 or u8.
There is no endorsed Rust type corresponding to C’s bool. At the moment, a Rust bool is always either a zero or a one byte, the same representation used by all major C and C++ implementations. However, the Rust language team has not committed to keep this representation in the future, since doing so may close opportunities for optimization.
Rust’s 32-bit char type is not the analogue of wchar_t, whose width and encoding vary from one implementation to another. C’s char32_t type is closer, but its encoding is still not guaranteed to be Unicode.
Rust’s primitive usize and isize types have the same representations as C’s size_t and ptrdiff_t.
C and C++ pointers and C++ references correspond to Rust’s raw pointer types, *mut T and *const T.
Technically, the C standard permits implementations to use representations for which Rust has no corresponding type: 36-bit integers, sign-and-magnitude representations for signed values, and so on. In practice, on every platform Rust has been ported to, every common C integer type has a match in Rust, bool aside.
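If you are curious what these aliases resolve to on your own machine, a short program can report their sizes. This is only a sketch: the c_type_sizes function is a hypothetical helper, and apart from char, the sizes it prints vary from one platform to another.

```rust
use std::mem::size_of;
use std::os::raw::{c_char, c_int, c_long, c_longlong, c_short, c_void};

/// Pair each C type name with the size of its `std::os::raw` alias.
fn c_type_sizes() -> [(&'static str, usize); 5] {
    [
        ("char", size_of::<c_char>()),
        ("short", size_of::<c_short>()),
        ("int", size_of::<c_int>()),
        ("long", size_of::<c_long>()),
        ("long long", size_of::<c_longlong>()),
    ]
}

fn main() {
    for (name, size) in c_type_sizes() {
        println!("{:>9}: {} byte(s)", name, size);
    }
    // `c_char` is an alias for `i8` or `u8`, depending on the platform.
    println!("char is signed: {}", c_char::MIN != 0);
    // `usize` is pointer-sized on the platforms Rust supports today.
    assert_eq!(size_of::<usize>(), size_of::<*const c_void>());
}
```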

For defining Rust struct types compatible with C structs, you can use the #[repr(C)] attribute. Placing #[repr(C)] above a struct definition asks Rust to lay out the struct’s fields in memory the same way a C compiler would lay out the analogous C struct type. For example, libgit2’s git2/errors.h header file defines the following C struct to provide details about a previously reported error:

typedef struct {
    char *message;
    int klass;
} git_error;

You can define a Rust type with an identical representation as follows:

#[repr(C)]
pub struct git_error {
    pub message: *const c_char,
    pub klass: c_int
}

The #[repr(C)] attribute affects only the layout of the struct itself, not the representations of its individual fields, so to match the C struct, each field must use the C-like type as well: *const c_char for char *, and c_int for int, and so on.

In this particular case, the #[repr(C)] attribute probably doesn’t change the layout of git_error. There really aren’t too many interesting ways to lay out a pointer and an integer. But whereas C and C++ guarantee that a structure’s members appear in memory in the order they’re declared, each at a distinct address, Rust reorders fields to minimize the overall size of the struct, and zero-sized types take up no space. The #[repr(C)] attribute tells Rust to follow C’s rules for the given type.
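A struct whose fields have mixed sizes makes the difference visible. In the sketch below, CLayout and RustLayout are hypothetical types invented for the comparison. The numbers in the comments assume a target where u32 has four-byte alignment, which is true of mainstream platforms; the size of RustLayout is whatever the compiler chooses, so the program only prints it.

```rust
use std::mem::size_of;

#[allow(dead_code)]
#[repr(C)]
struct CLayout {
    a: u8,
    b: u32,
    c: u8,
}

#[allow(dead_code)]
struct RustLayout {
    a: u8,
    b: u32,
    c: u8,
}

/// Returns (offset of `b`, offset of `c`, total size) for `CLayout`.
fn c_layout() -> (usize, usize, usize) {
    let v = CLayout { a: 1, b: 2, c: 3 };
    let base = &v as *const CLayout as usize;
    let b = &v.b as *const u32 as usize - base;
    let c = &v.c as *const u8 as usize - base;
    (b, c, size_of::<CLayout>())
}

fn main() {
    // Declaration order is preserved, with padding after `a` and `c`.
    assert_eq!(c_layout(), (4, 8, 12));
    // Rust is free to reorder the fields and shed most of the padding.
    println!("default layout: {} bytes", size_of::<RustLayout>());
}
```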

You can also use #[repr(C)] to control the representation of C-style enums:

#[repr(C)]
enum git_error_code {
    GIT_OK         =  0,
    GIT_ERROR      = -1,
    GIT_ENOTFOUND  = -3,
    GIT_EEXISTS    = -4,
    ...
}

Normally, Rust plays all sorts of games when choosing how to represent enums. For example, we mentioned the trick Rust uses to store Option<&T> in a single word (if T is sized). Without #[repr(C)], Rust would use a single byte to represent the git_error_code enum; with #[repr(C)], Rust uses a value the size of a C int, just as C would.

You can also ask Rust to give an enum the same representation as some integer type. Starting the preceding definition with #[repr(i16)] would give you a 16-bit type with the same representation as the following C++ enum:

#include <stdint.h>

enum git_error_code: int16_t {
    GIT_OK         =  0,
    GIT_ERROR      = -1,
    GIT_ENOTFOUND  = -3,
    GIT_EEXISTS    = -4,
    ...
};
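You can observe the effect of these attributes with std::mem::size_of. The enums below are hypothetical, cut-down versions of git_error_code written for this sketch. The assertion about #[repr(C)] assumes a platform whose C compiler gives enums the size of an int, as mainstream desktop and server targets do.

```rust
use std::mem::size_of;
use std::os::raw::c_int;

#[allow(dead_code)]
#[repr(C)]
enum CodeC {
    Ok = 0,
    Error = -1,
    NotFound = -3,
}

#[allow(dead_code)]
#[repr(i16)]
enum Code16 {
    Ok = 0,
    Error = -1,
    NotFound = -3,
}

#[allow(dead_code)]
enum CodeDefault {
    Ok = 0,
    Error = -1,
    NotFound = -3,
}

/// Returns the sizes of the `repr(C)` and `repr(i16)` enums, in bytes.
fn enum_sizes() -> (usize, usize) {
    (size_of::<CodeC>(), size_of::<Code16>())
}

fn main() {
    assert_eq!(enum_sizes(), (size_of::<c_int>(), 2));
    // Casting a fieldless enum yields its declared discriminant.
    assert_eq!(Code16::NotFound as i16, -3);
    // With no attribute, the compiler picks the size itself.
    println!("default enum: {} byte(s)", size_of::<CodeDefault>());
}
```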

Passing strings between Rust and C is a little harder. C represents a string as a pointer to an array of characters, terminated by a null character. Rust, on the other hand, stores the length of a string explicitly, either as a field of a String, or as the second word of a fat reference &str. Rust strings are not null-terminated; in fact, they may include null characters in their contents, like any other character.

This means that you can’t borrow a Rust string as a C string: if you pass C code a pointer into a Rust string, it could mistake an embedded null character for the end of the string, or run off the end looking for a terminating null that isn’t there. Going the other direction, you may be able to borrow a C string as a Rust &str, as long as its contents are well-formed UTF-8.

This situation effectively forces Rust to treat C strings as types entirely distinct from String and &str. In the std::ffi module, the CString and CStr types represent owned and borrowed null-terminated arrays of bytes. Compared to String and str, the methods on CString and CStr are quite limited, restricted to construction and conversion to other types. We’ll show these types in action in the next section.
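As a small preview, the sketch below exercises both directions of the conversion using only std::ffi. The to_c and from_c helpers are hypothetical names introduced for this illustration.

```rust
use std::ffi::{CStr, CString};

/// Build an owned C string, or return `None` if `text` contains an
/// embedded null character.
fn to_c(text: &str) -> Option<CString> {
    CString::new(text).ok()
}

/// Borrow a null-terminated byte slice as a `&str`, or return `None`
/// if it is not properly terminated or is not well-formed UTF-8.
fn from_c(bytes: &[u8]) -> Option<&str> {
    CStr::from_bytes_with_nul(bytes).ok()?.to_str().ok()
}

fn main() {
    // Rust strings may contain nulls; C strings cannot represent them.
    assert!(to_c("bad\0string").is_none());

    // A successful conversion appends the terminating null byte.
    let hello = to_c("hello").unwrap();
    assert_eq!(hello.as_bytes_with_nul(), b"hello\0");

    // Going the other way works only for well-formed UTF-8.
    assert_eq!(from_c(b"caf\xc3\xa9\0"), Some("café"));
    assert_eq!(from_c(b"caf\xe9\0"), None);
}
```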

Declaring Foreign Functions and Variables

An extern block declares functions or variables defined in some other library that the final Rust executable will be linked with. For example, every Rust program is linked against the standard C library, so we can tell Rust about the C library’s strlen function like this:

use std::os::raw::c_char;

extern {
    fn strlen(s: *const c_char) -> usize;
}

This gives Rust the function’s name and type, while leaving the definition to be linked in later.

Rust assumes that functions declared inside extern blocks use C conventions for passing arguments and accepting return values. They are defined as unsafe functions. These are the right choices for strlen: it is indeed a C function; and its specification in C requires that you pass it a valid pointer to a properly terminated string, which is a contract that Rust cannot enforce. (Almost any function that takes a raw pointer must be unsafe: safe Rust can construct raw pointers from arbitrary integers, and dereferencing such a pointer would be undefined behavior.)

With this extern block, we can call strlen like any other Rust function, although its type gives it away as a tourist:

use std::ffi::CString;

let rust_str = "I'll be back";
let null_terminated = CString::new(rust_str).unwrap();
unsafe {
    assert_eq!(strlen(null_terminated.as_ptr()), 12);
}

The CString::new function builds a null-terminated C string. It first checks its argument for embedded null characters, since those cannot be represented in a C string, and returns an error if it finds any (hence the need to unwrap the result). Otherwise, it adds a null byte to the end, and returns a CString owning the resulting characters.

The cost of CString::new depends on what type you pass it. It accepts anything that implements Into<Vec<u8>>. Passing a &str entails an allocation and a copy, as the conversion to Vec<u8> builds a heap-allocated copy of the string for the vector to own. But passing a String by value simply consumes the string and takes over its buffer, so unless appending the null character forces the buffer to be resized, the conversion requires no copying of text or allocation at all.

CString dereferences to CStr, whose as_ptr method returns a *const c_char pointing at the start of the string. This is the type that strlen expects. In the example, strlen runs down the string, finds the null character that CString::new placed there, and returns the length, as a byte count.

You can also declare global variables in extern blocks. POSIX systems have a global variable named environ that holds the values of the process’s environment variables. In C, it’s declared:

extern char **environ;

In Rust, you would say:

use std::ffi::CStr;
use std::os::raw::c_char;

extern {
    static environ: *mut *mut c_char;
}

To print the environment’s first element, you could write:

unsafe {
    if !environ.is_null() && !(*environ).is_null() {
        let var = CStr::from_ptr(*environ);
        println!("first environment variable: {}",
                 var.to_string_lossy())
    }
}

After making sure environ has a first element, the code calls CStr::from_ptr to build a CStr that borrows it. The to_string_lossy method returns a Cow<str>: if the C string contains well-formed UTF-8, the Cow borrows its content as a &str, not including the terminating null byte. Otherwise, to_string_lossy makes a copy of the text in the heap, replaces the ill-formed UTF-8 sequences with the official Unicode replacement character, '�', and builds an owning Cow from that. Either way, the result implements Display, so you can print it with the {} format parameter.
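The two outcomes of to_string_lossy can be checked without touching environ at all. The sketch below builds CStr values from byte literals; the lossy function is a hypothetical helper, and the sample strings are invented.

```rust
use std::borrow::Cow;
use std::ffi::CStr;

/// Interpret a null-terminated byte slice as text, replacing any
/// ill-formed UTF-8 with U+FFFD. (Hypothetical helper; it panics if
/// the slice is not terminated by exactly one null byte.)
fn lossy(bytes_with_nul: &[u8]) -> Cow<'_, str> {
    CStr::from_bytes_with_nul(bytes_with_nul)
        .expect("input must end with a single null byte")
        .to_string_lossy()
}

fn main() {
    // Well-formed UTF-8: the Cow borrows the original bytes.
    let good = lossy(b"PATH=/usr/bin\0");
    assert!(matches!(good, Cow::Borrowed("PATH=/usr/bin")));

    // A stray 0xE9 byte is not valid UTF-8: the Cow owns a repaired copy.
    let bad = lossy(b"NAME=caf\xe9\0");
    assert!(matches!(bad, Cow::Owned(_)));
    assert_eq!(bad, "NAME=caf\u{FFFD}");
}
```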

Using Functions from Libraries

To use functions provided by a particular library, you can place a #[link] attribute atop the extern block that names the library Rust should link the executable with. For example, here’s a program that calls libgit2’s initialization and shutdown methods, but does nothing else:

use std::os::raw::c_int;

#[link(name = "git2")]
extern {
    pub fn git_libgit2_init() -> c_int;
    pub fn git_libgit2_shutdown() -> c_int;
}

fn main() {
    unsafe {
        git_libgit2_init();
        git_libgit2_shutdown();
    }
}

The extern block declares the extern functions as before. The #[link(name = "git2")] attribute leaves a note in the crate to the effect that, when Rust creates the final executable or shared library, it should link against the git2 library. Rust uses the system linker to build executables; on Unix, this passes the argument -lgit2 on the linker command line; on Windows, it passes git2.LIB.

#[link] attributes work in library crates, too. When you build a program that depends on other crates, Cargo gathers together the link notes from the entire dependency graph, and includes them all in the final link.

In this example, if you would like to follow along on your own machine, you’ll need to build libgit2 for yourself. We used libgit2 version 0.25.1, available from https://libgit2.github.com. To compile libgit2, you will need to install the CMake build tool and the Python language; we used CMake version 3.8.0 and Python version 2.7.13, downloaded from https://cmake.org and https://www.python.org.

The full instructions for building libgit2 are available on its website, but they’re simple enough that we’ll show the essentials here. On Linux, assume you’ve already unzipped the library’s source into the directory /home/jimb/libgit2-0.25.1:

$ cd /home/jimb/libgit2-0.25.1
$ mkdir build
$ cd build
$ cmake ..
$ cmake --build .

On Linux, this produces a shared library /home/jimb/libgit2-0.25.1/build/libgit2.so.0.25.1 with the usual nest of symlinks pointing to it, including one named libgit2.so. On macOS, the results are similar, but the library is named libgit2.dylib.

On Windows, things are also straightforward. Assume you’ve unzipped the source into the directory C:\Users\JimB\libgit2-0.25.1. In a Visual Studio command prompt:

> cd C:\Users\JimB\libgit2-0.25.1
> mkdir build
> cd build
> cmake -A x64 ..
> cmake --build .

These are the same commands as used on Linux, except that you must request a 64-bit build when you run CMake the first time, to match your Rust compiler. (If you have installed the 32-bit Rust toolchain, then you should omit the -A x64 flag to the first cmake command.) This produces an import library git2.LIB and a dynamic-link library git2.DLL, both in the directory C:\Users\JimB\libgit2-0.25.1\build\Debug. (The remaining instructions are shown for Unix, except where Windows is substantially different.)

Create the Rust program in a separate directory:

$ cd /home/jimb
$ cargo new --bin git-toy

Put the code above in src/main.rs. Naturally, if you try to build this, Rust has no idea where to find the libgit2 you built:

$ cd git-toy
$ cargo run
   Compiling git-toy v0.1.0 (file:///home/jimb/git-toy)
error: linking with `cc` failed: exit code: 1
  |
  = note: "cc" ... "-l" "git2" ...
  = note: /usr/bin/ld: cannot find -lgit2
          collect2: error: ld returned 1 exit status


error: aborting due to previous error

error: Could not compile `git-toy`.

To learn more, run the command again with --verbose.
$

You can tell Rust where to search for libraries by writing a build script, Rust code that Cargo compiles and runs at build time. Build scripts can do all sorts of things: generate code dynamically, compile C code to be included in the crate, and so on. In this case, all you need is to add a library search path to the executable’s link command. When Cargo runs the build script, it parses the build script’s output for information of this sort, so the build script simply needs to print the right magic to its standard output.

To create your build script, add a file named build.rs in the same directory as the Cargo.toml file, with the following contents:

fn main() {
    println!(r"cargo:rustc-link-search=native=/home/jimb/libgit2-0.25.1/build");
}

This is the right path for Linux; on Windows, you would change the path following the text native= to C:\Users\JimB\libgit2-0.25.1\build\Debug. (We’re cutting some corners to keep this example simple; in a real application, you should avoid using absolute paths in your build script. We cite documentation that shows how to do it right at the end of this section.)
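One simple way to keep the absolute path out of the source is to read it from an environment variable at build time. The following build.rs is a minimal sketch under that assumption: LIBGIT2_LIB_DIR is a variable name we made up for this example, not something Cargo or libgit2 defines, and link_search_directive is a hypothetical helper.

```rust
use std::env;

/// Format the directive that adds `lib_dir` to the native library
/// search path when Cargo reads it from the build script's output.
fn link_search_directive(lib_dir: &str) -> String {
    format!("cargo:rustc-link-search=native={}", lib_dir)
}

fn main() {
    // Ask Cargo to rerun this script whenever the variable changes.
    // (LIBGIT2_LIB_DIR is a hypothetical name chosen for this sketch.)
    println!("cargo:rerun-if-env-changed=LIBGIT2_LIB_DIR");

    // If the variable is unset, print nothing more and let the linker
    // fall back on its default search path.
    if let Ok(dir) = env::var("LIBGIT2_LIB_DIR") {
        println!("{}", link_search_directive(&dir));
    }
}
```

With this script in place, you would set LIBGIT2_LIB_DIR to the directory containing the library before running cargo build.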

Next, tell Cargo that this is your build script by adding the line build = "build.rs" to the [package] section of your Cargo.toml file. The entire file should now read:

[package]
name = "git-toy"
version = "0.1.0"
authors = ["You <you@example.com>"]
build = "build.rs"

[dependencies]

Now you can almost run the program. On macOS it may work immediately; on a Linux system you will probably see something like the following:

$ cargo run
   Compiling git-toy v0.1.0 (file:///home/jimb/git-toy)
    Finished dev [unoptimized + debuginfo] target(s) in 0.64 secs
     Running `target/debug/git-toy`
target/debug/git-toy: error while loading shared libraries:
libgit2.so.25: cannot open shared object file: No such file or directory
$

This means that, although Cargo succeeded in linking the executable against the library, it doesn’t know where to find the shared library at run time. Windows reports this failure by popping up a dialog box. On Linux, you must set the LD_LIBRARY_PATH environment variable:

$ export LD_LIBRARY_PATH=/home/jimb/libgit2-0.25.1/build:$LD_LIBRARY_PATH
$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
     Running `target/debug/git-toy`
$

On macOS, you may need to set DYLD_LIBRARY_PATH instead.

On Windows, you must set the PATH environment variable:

> set PATH=C:\Users\JimB\libgit2-0.25.1\build\Debug;%PATH%
> cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
     Running `target/debug/git-toy`
>

Naturally, in a deployed application you’d want to avoid having to set environment variables just to find your library’s code. One alternative is to statically link the C library into your crate. This copies the library’s object files into the crate’s .rlib file, alongside the object files and metadata for the crate’s Rust code. The entire collection then participates in the final link.

It is a Cargo convention that a crate that provides access to a C library should be named LIB-sys, where LIB is the name of the C library. A -sys crate should contain nothing but the statically linked library and Rust modules containing extern blocks and type definitions. Higher-level interfaces then belong in crates that depend on the -sys crate. This allows multiple upstream crates to depend on the same -sys crate, assuming there is a single version of the -sys crate that meets everyone’s needs.

For the full details on Cargo’s support for build scripts and linking with system libraries, see the online Cargo documentation. It shows how to avoid absolute paths in build scripts, control compilation flags, use tools like pkg-config, and so on. The git2-rs crate also provides good examples to emulate; its build script handles some complex situations.

A Raw Interface to libgit2

Figuring out how to use libgit2 properly breaks down into two questions:

What does it take to use libgit2 functions in Rust?
How can we build a safe Rust interface around them?

We’ll take these questions one at a time. In this section, we’ll write a program that’s essentially a single giant unsafe block filled with nonidiomatic Rust code, reflecting the clash of type systems and conventions that is inherent in mixing languages. We’ll call this the raw interface. The code will be messy, but it will make plain all the steps that must occur for Rust code to use libgit2.

Then, in the next section, we’ll build a safe interface to libgit2 that puts Rust’s types to use enforcing the rules libgit2 imposes on its users. Fortunately, libgit2 is an exceptionally well-designed C library, so the questions that Rust’s safety requirements force us to ask all have pretty good answers, and we can construct an idiomatic Rust interface with no unsafe functions.

The program we’ll write is very simple: it takes a path as a command-line argument, opens the Git repository there, and prints out the head commit. But this is enough to illustrate the key strategies for building safe and idiomatic Rust interfaces.

For the raw interface, the program will end up needing a somewhat larger collection of functions and types from libgit2 than we used before, so it makes sense to move the extern block into its own module. We’ll create a file named raw.rs in git-toy/src whose contents are as follows:

#![allow(non_camel_case_types)]

use std::os::raw::{c_int, c_char, c_uchar};

#[link(name = "git2")]
extern {
    pub fn git_libgit2_init() -> c_int;
    pub fn git_libgit2_shutdown() -> c_int;
    pub fn giterr_last() -> *const git_error;

    pub fn git_repository_open(out: *mut *mut git_repository,
                               path: *const c_char) -> c_int;
    pub fn git_repository_free(repo: *mut git_repository);

    pub fn git_reference_name_to_id(out: *mut git_oid,
                                    repo: *mut git_repository,
                                    reference: *const c_char) -> c_int;

    pub fn git_commit_lookup(out: *mut *mut git_commit,
                             repo: *mut git_repository,
                             id: *const git_oid) -> c_int;

    pub fn git_commit_author(commit: *const git_commit) -> *const git_signature;
    pub fn git_commit_message(commit: *const git_commit) -> *const c_char;
    pub fn git_commit_free(commit: *mut git_commit);
}

pub enum git_repository {}
pub enum git_commit {}

#[repr(C)]
pub struct git_error {
    pub message: *const c_char,
    pub klass: c_int
}

#[repr(C)]
pub struct git_oid {
    pub id: [c_uchar; 20]
}

pub type git_time_t = i64;

#[repr(C)]
pub struct git_time {
    pub time: git_time_t,
    pub offset: c_int
}

#[repr(C)]
pub struct git_signature {
    pub name: *const c_char,
    pub email: *const c_char,
    pub when: git_time
}

Each item here is modeled on a declaration from libgit2’s own header files. For example, libgit2-0.25.1/include/git2/repository.h includes this declaration:

extern int git_repository_open(git_repository **out, const char *path);

This function tries to open the Git repository at path. If all goes well, it creates a git_repository object and stores a pointer to it in the location pointed to by out. The equivalent Rust declaration is the following:

pub fn git_repository_open(out: *mut *mut git_repository,
                           path: *const c_char) -> c_int;

The libgit2 public header files define the git_repository type as a typedef for an incomplete struct type:

typedef struct git_repository git_repository;

Since the details of this type are private to the library, the public headers never define struct git_repository, ensuring that the library’s users can never build an instance of this type themselves. One possible analogue to an incomplete struct type in Rust is this:

pub enum git_repository {}

This is an enum type with no variants. There is no way in Rust to make a value of such a type. This is an oddity, but it’s perfect as the reflection of a C type that only libgit2 should ever construct, and which is manipulated solely through raw pointers.
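To show how such a type behaves without linking any C code, the sketch below plays both roles in Rust. The Handle enum is the opaque public type, and Counter stands in for the library's private struct. All the names here are hypothetical; a real library would define the private side in C.

```rust
/// The opaque public type: no variants, so no values can be constructed.
pub enum Handle {}

/// The private representation, known only to the "library" side.
struct Counter {
    value: u32,
}

/// Allocate a counter and return it as an opaque handle.
fn handle_new(start: u32) -> *mut Handle {
    Box::into_raw(Box::new(Counter { value: start })) as *mut Handle
}

/// Increment the counter behind `handle` and return the new value.
/// Safety: `handle` must come from `handle_new` and not yet be freed.
unsafe fn handle_bump(handle: *mut Handle) -> u32 {
    let counter = unsafe { &mut *(handle as *mut Counter) };
    counter.value += 1;
    counter.value
}

/// Free the counter behind `handle`.
/// Safety: `handle` must come from `handle_new` and be freed only once.
unsafe fn handle_free(handle: *mut Handle) {
    drop(unsafe { Box::from_raw(handle as *mut Counter) });
}

fn main() {
    let handle = handle_new(41);
    unsafe {
        assert_eq!(handle_bump(handle), 42);
        handle_free(handle);
    }
}
```

Client code can pass a *mut Handle around freely, but it has no way to look inside one or to create one except by calling handle_new.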

Writing large extern blocks by hand can be a chore. If you are creating a Rust interface to a complex C library, you may want to try using the bindgen crate, which has functions you can use from your build script to parse C header files and generate the corresponding Rust declarations automatically. We don’t have space to show bindgen in action here, but bindgen’s page on crates.io includes links to its documentation.

Next we’ll rewrite main.rs completely. First, we need to declare the raw module:

mod raw;

According to libgit2’s conventions, fallible functions return an integer code that is positive or zero on success, and negative on failure. If an error occurs, the giterr_last function will return a pointer to a git_error structure providing more details about what went wrong. libgit2 owns this structure, so we don’t need to free it ourselves, but it could be overwritten by the next library call we make. A proper Rust interface would use Result, but in the raw version, we want to use the libgit2 functions just as they are, so we’ll have to roll our own function for handling errors:

use std::ffi::CStr;
use std::os::raw::c_int;

fn check(activity: &'static str, status: c_int) -> c_int {
    if status < 0 {
        unsafe {
            let error = &*raw::giterr_last();
            println!("error while {}: {} ({})",
                     activity,
                     CStr::from_ptr(error.message).to_string_lossy(),
                     error.klass);
            std::process::exit(1);
        }
    }

    status
}

We’ll use this function to check the results of libgit2 calls like this:

check("initializing library", raw::git_libgit2_init());

This uses the same CStr methods used earlier: from_ptr to construct the CStr from a C string, and to_string_lossy to turn that into something Rust can print.

Next, we need a function to print out a commit:

unsafe fn show_commit(commit: *const raw::git_commit) {
    let author = raw::git_commit_author(commit);

    let name = CStr::from_ptr((*author).name).to_string_lossy();
    let email = CStr::from_ptr((*author).email).to_string_lossy();
    println!("{} <{}>\n", name, email);

    let message = raw::git_commit_message(commit);
    println!("{}", CStr::from_ptr(message).to_string_lossy());
}

Given a pointer to a git_commit, show_commit calls git_commit_author and git_commit_message to retrieve the information it needs. These two functions follow a convention that the libgit2 documentation explains as follows:

If a function returns an object as a return value, that function is a getter and the object’s lifetime is tied to the parent object.

In Rust terms, author and message are borrowed from commit: show_commit doesn’t need to free them itself, but it must not hold on to them after commit is freed. Since this API uses raw pointers, Rust won’t check their lifetimes for us: if we do accidentally create dangling pointers, we probably won’t find out about it until the program crashes.

The preceding code assumes these fields hold UTF-8 text, which is not always correct. Git permits other encodings as well. Interpreting these strings properly would probably entail using the encoding crate. For brevity’s sake, we’ll gloss over those issues here.

Our program’s main function reads as follows:

use std::ffi::CString;
use std::mem;
use std::ptr;
use std::os::raw::c_char;

fn main() {
    let path = std::env::args().skip(1).next()
        .expect("usage: git-toy PATH");
    let path = CString::new(path)
        .expect("path contains null characters");

    unsafe {
        check("initializing library", raw::git_libgit2_init());

        let mut repo = ptr::null_mut();
        check("opening repository",
              raw::git_repository_open(&mut repo, path.as_ptr()));

        let c_name = b"HEAD\0".as_ptr() as *const c_char;
        let mut oid = mem::uninitialized();
        check("looking up HEAD",
              raw::git_reference_name_to_id(&mut oid, repo, c_name));

        let mut commit = ptr::null_mut();
        check("looking up commit",
              raw::git_commit_lookup(&mut commit, repo, &oid));

        show_commit(commit);

        raw::git_commit_free(commit);

        raw::git_repository_free(repo);

        check("shutting down library", raw::git_libgit2_shutdown());
    }
}

This starts with code to handle the path argument and initialize the library, all of which we’ve seen before. The first novel code is this:

let mut repo = ptr::null_mut();
check("opening repository",
      raw::git_repository_open(&mut repo, path.as_ptr()));

The call to git_repository_open tries to open the Git repository at the given path. If it succeeds, it allocates a new git_repository object for it, and sets repo to point to that. Rust implicitly coerces references into raw pointers, so passing &mut repo here provides the *mut *mut git_repository the call expects.
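The same pattern can be tried without linking to libgit2. In this sketch, Thing, thing_open, and thing_free are hypothetical stand-ins written in Rust for an opaque C type and its open and free functions. The point is only to show &mut being coerced to a pointer-to-pointer output parameter:

```rust
use std::os::raw::c_int;
use std::ptr;

/// Stand-in for an opaque C type such as `git_repository` (hypothetical).
pub struct Thing {
    pub id: u32,
}

/// Stand-in for a C function that returns a new object through a
/// pointer-to-pointer output parameter (hypothetical).
unsafe extern "C" fn thing_open(out: *mut *mut Thing, id: u32) -> c_int {
    unsafe { *out = Box::into_raw(Box::new(Thing { id })) };
    0
}

/// Matching stand-in for the C function that frees the object.
unsafe extern "C" fn thing_free(thing: *mut Thing) {
    drop(unsafe { Box::from_raw(thing) });
}

fn open_and_read_id(id: u32) -> u32 {
    // The type annotation is only for clarity; Rust can infer it.
    let mut thing: *mut Thing = ptr::null_mut();
    unsafe {
        // `&mut thing` is a `&mut *mut Thing`, which Rust coerces to the
        // `*mut *mut Thing` that the function expects.
        let code = thing_open(&mut thing, id);
        assert_eq!(code, 0);
        assert!(!thing.is_null());

        let seen = (*thing).id;
        thing_free(thing);
        seen
    }
}
```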

This shows another libgit2 convention in use. Again, from the libgit2 documentation:

Objects which are returned via the first argument as a pointer-to-pointer are owned by the caller and it is responsible for freeing them.

In Rust terms, functions like git_repository_open pass ownership of the new value to the caller.

Next, consider the code that looks up the object hash of the repository’s current head commit:

let mut oid = mem::uninitialized();
check("looking up HEAD",
      raw::git_reference_name_to_id(&mut oid, repo, c_name));

The git_oid type stores an object identifier—a 160-bit hash code that Git uses internally (and throughout its delightful user interface) to identify commits, individual versions of files, and so on. This call to git_reference_name_to_id looks up the object identifier of the current "HEAD" commit.

In C it’s perfectly normal to initialize a variable by passing a pointer to it to some function that fills in its value; this is how git_reference_name_to_id expects to treat its first argument. But Rust won’t let us borrow a reference to an uninitialized variable. We could initialize oid with zeros, but this is a waste: any value stored there will simply be overwritten.

Initializing oid to uninitialized() gets around this problem. The std::mem::uninitialized function returns a value of any type you like, except that the value consists entirely of uninitialized bits, and no machine code is actually spent producing the value. Rust, however, considers oid to have been assigned some value, so it lets us borrow a reference to it. As you can imagine, in the general case, this is very unsafe. Reading an uninitialized value is undefined behavior, and if any part of the value implements Drop, even dropping it is undefined behavior as well. There are only a few safe things you can do:

You can overwrite it with std::ptr::write, which stores the new value without reading or dropping the destination's old contents, and so is appropriate when the destination is uninitialized.
You can pass it to std::mem::forget, which takes ownership of its argument and makes it disappear without dropping it (applying this to an initialized value might be a leak).
You can pass it to a foreign function designed to initialize it, like git_reference_name_to_id.

If the call succeeds, then oid becomes truly initialized, and all is well. If the call fails, the function doesn’t use oid, and its type doesn’t need to be dropped, so the code is safe in that case too.
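In more recent Rust releases, std::mem::uninitialized is deprecated in favor of the std::mem::MaybeUninit type, which expresses the same idea without ever claiming that an uninitialized value is initialized. The following is a minimal sketch of the pattern, not the program's actual code. Oid and fill_oid are hypothetical stand-ins for git_oid and git_reference_name_to_id:

```rust
use std::mem::MaybeUninit;
use std::os::raw::c_int;

/// Stand-in for `git_oid`: a plain 20-byte array (hypothetical).
#[repr(C)]
pub struct Oid {
    pub id: [u8; 20],
}

/// Stand-in for a C function that fills in an output parameter and
/// returns a negative code on failure (hypothetical).
unsafe extern "C" fn fill_oid(out: *mut Oid, byte: u8, fail: bool) -> c_int {
    if fail {
        return -1; // On failure, `*out` is left untouched.
    }
    unsafe { out.write(Oid { id: [byte; 20] }) };
    0
}

fn lookup(byte: u8, fail: bool) -> Result<Oid, c_int> {
    // Reserve space without initializing it.
    let mut oid = MaybeUninit::<Oid>::uninit();
    let code = unsafe { fill_oid(oid.as_mut_ptr(), byte, fail) };
    if code < 0 {
        return Err(code);
    }
    // Only after a successful call do we assert that the value is initialized.
    Ok(unsafe { oid.assume_init() })
}
```

Dropping a MaybeUninit never drops its contents, so the failure path is sound even for types that implement Drop.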

We could use uninitialized for the repo and commit variables as well, but since these are just single words and uninitialized is so dicey to use, we just go ahead and initialize them to null:

let mut commit = ptr::null_mut();
check("looking up commit",
      raw::git_commit_lookup(&mut commit, repo, &oid));

This takes the commit’s object identifier and looks up the actual commit, storing a git_commit pointer in commit on success.

The remainder of the main function should be self-explanatory. It calls the show_commit function defined earlier, frees the commit and repository objects, and shuts down the library.

Now we can try out the program on any Git repository ready at hand:

$ cargo run /home/jimb/rbattle
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
     Running `target/debug/git-toy /home/jimb/rbattle`
Jim Blandy <jimb@red-bean.com>

Animate goop a bit.

$

A Safe Interface to libgit2

The raw interface to libgit2 is a perfect example of an unsafe feature: it certainly can be used correctly (as we do here, so far as we know), but Rust can’t enforce the rules you must follow. Designing a safe API for a library like this is a matter of identifying all these rules, and then finding ways to turn any violation of them into a type or borrow-checking error.

Here, then, are libgit2’s rules for the features the program uses:

You must call git_libgit2_init before using any other library function. You must not use any library function after calling git_libgit2_shutdown.
All values passed to libgit2 functions must be fully initialized, except for output parameters.
When a call fails, output parameters passed to hold the results of the call are left uninitialized, and you must not use their values.
A git_commit object refers to the git_repository object it is derived from, so the former must not outlive the latter. (This isn’t spelled out in the libgit2 documentation; we inferred it from the presence of certain functions in the interface, and then verified it by reading the source code.)
Similarly, a git_signature is always borrowed from a given git_commit, and the former must not outlive the latter. (The documentation does cover this case.)
The message associated with a commit and the name and email address of the author are all borrowed from the commit, and must not be used after the commit is freed.
Once a libgit2 object has been freed, it must never be used again.

As it turns out, you can build a Rust interface to libgit2 that enforces all of these rules, either through Rust’s type system, or by managing details internally.

Before we get started, let’s restructure the project a little bit. We’d like to have a git module that exports the safe interface, of which the raw interface from the previous program is a private submodule.

The whole source tree will look like this:

git-toy/
├── Cargo.toml
├── build.rs
└── src/
    ├── main.rs
    └── git/
        ├── mod.rs
        └── raw.rs

Following the rules we explained in “Modules in Separate Files”, the source for the git module appears in git/mod.rs, and the source for its git::raw submodule goes in git/raw.rs.

Once again, we’re going to rewrite main.rs completely. It should start with a declaration of the git module:

mod git;

Then, we’ll need to create the git subdirectory, and move raw.rs into it:

$ cd /home/jimb/git-toy
$ mkdir src/git
$ mv src/raw.rs src/git/raw.rs

The git module needs to declare its raw submodule. The file src/git/mod.rs must say:

mod raw;

Since it’s not pub, this submodule is not visible to the main program.
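The visibility rule can be seen in miniature with inline modules. This sketch uses placeholder function names (answer) that are not part of git-toy. It only mirrors the shape of a public module wrapping a private submodule:

```rust
mod git {
    // Private submodule: visible inside `git`, but not to its users.
    mod raw {
        pub fn answer() -> i32 {
            42
        }
    }

    // The public face of the module, which may use `raw` freely.
    pub fn answer() -> i32 {
        raw::answer()
    }
}

fn call_through_public_interface() -> i32 {
    // Writing `git::raw::answer()` here would not compile,
    // because module `raw` is private to `git`.
    git::answer()
}
```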

In a bit we’ll need to use some functions from the libc crate, so we must add a dependency in Cargo.toml. The full file now reads:

[package]
name = "git-toy"
version = "0.1.0"
authors = ["Jim Blandy <[email protected]>"]
build = "build.rs"

[dependencies]
libc = "0.2.23"

The corresponding extern crate item must appear in src/main.rs:

extern crate libc;

Now that we’ve restructured our modules, let’s consider error handling. Even libgit2’s initialization function can return an error code, so we’ll need to have this sorted out before we can get started. An idiomatic Rust interface needs its own Error type that captures the libgit2 failure code as well as the error message and class from giterr_last. A proper error type must implement the usual Error, Debug, and Display traits. Then, it needs its own Result type that uses this Error type. Here are the necessary definitions in src/git/mod.rs:

use std::error;
use std::fmt;
use std::result;

#[derive(Debug)]
pub struct Error {
    code: i32,
    message: String,
    class: i32
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter) -> result::Result<(), fmt::Error> {
        // Displaying an `Error` simply displays the message from libgit2.
        self.message.fmt(f)
    }
}

impl error::Error for Error {
    fn description(&self) -> &str { &self.message }
}

pub type Result<T> = result::Result<T, Error>;

To check the result from raw library calls, the module needs a function that turns a libgit2 return code into a Result:

use std::os::raw::c_int;
use std::ffi::CStr;

fn check(code: c_int) -> Result<c_int> {
    if code >= 0 {
        return Ok(code);
    }

    unsafe {
        let error = raw::giterr_last();

        // libgit2 ensures that (*error).message is always non-null and null
        // terminated, so this call is safe.
        let message = CStr::from_ptr((*error).message)
            .to_string_lossy()
            .into_owned();

        Err(Error {
            code: code as i32,
            message,
            class: (*error).klass as i32
        })
    }
}

The main difference between this and the check function from the raw version is that this constructs an Error value instead of printing an error message and exiting immediately.

Now we’re ready to tackle libgit2 initialization. The safe interface will provide a Repository type that represents an open Git repository, with methods for resolving references, looking up commits, and so on. Continuing in git/mod.rs, here’s the definition of Repository:

/// A Git repository.
pub struct Repository {
    // This must always be a pointer to a live `git_repository` structure.
    // No other `Repository` may point to it.
    raw: *mut raw::git_repository
}

A Repository’s raw field is not public. Since only code in this module can access the raw::git_repository pointer, getting this module right should ensure the pointer is always used correctly.

If the only way to create a Repository is to successfully open a fresh Git repository, that will ensure that each Repository points to a distinct git_repository object:

use std::path::Path;

impl Repository {
    pub fn open<P: AsRef<Path>>(path: P) -> Result<Repository> {
        ensure_initialized();

        let path = path_to_cstring(path.as_ref())?;
        let mut repo = null_mut();
        unsafe {
            check(raw::git_repository_open(&mut repo, path.as_ptr()))?;
        }
        Ok(Repository { raw: repo })
    }
}

Since the only way to do anything with the safe interface is to start with a Repository value, and Repository::open starts with a call to ensure_initialized, we can be confident that ensure_initialized will be called before any libgit2 functions. Its definition is as follows:

use std;
use libc;

fn ensure_initialized() {
    static ONCE: std::sync::Once = std::sync::ONCE_INIT;
    ONCE.call_once(|| {
        unsafe {
            check(raw::git_libgit2_init())
                .expect("initializing libgit2 failed");
            assert_eq!(libc::atexit(shutdown), 0);
        }
    });
}

use std::io::Write;

extern fn shutdown() {
    unsafe {
        if let Err(e) = check(raw::git_libgit2_shutdown()) {
            let _ = writeln!(std::io::stderr(),
                             "shutting down libgit2 failed: {}",
                             e);
            std::process::abort();
        }
    }
}

The std::sync::Once type helps run initialization code in a thread-safe way. Only the first thread to call ONCE.call_once runs the given closure. Any subsequent calls, by this thread or any other, block until the first has completed and then return immediately, without running the closure again. Once the closure has finished, calling ONCE.call_once is cheap, requiring nothing more than an atomic load of a flag stored in ONCE.
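A small experiment can confirm this behavior. In the sketch below, the counter INIT_RUNS and the function init_from_many_threads are hypothetical names made up for the demonstration. It uses Once::new(), which newer versions of Rust provide in place of the now-deprecated ONCE_INIT constant:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;
use std::thread;

static INIT: Once = Once::new();
static INIT_RUNS: AtomicUsize = AtomicUsize::new(0);

fn ensure_initialized() {
    INIT.call_once(|| {
        // Stand-in for the real one-time setup work.
        INIT_RUNS.fetch_add(1, Ordering::SeqCst);
    });
}

/// Call `ensure_initialized` from several threads, then report how many
/// times the closure has run in total.
fn init_from_many_threads(threads: usize) -> usize {
    let handles: Vec<_> = (0..threads)
        .map(|_| thread::spawn(ensure_initialized))
        .collect();
    for handle in handles {
        handle.join().expect("thread panicked");
    }
    INIT_RUNS.load(Ordering::SeqCst)
}
```

However many threads race to call ensure_initialized, the counter ends up at one.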

In the preceding code, the initialization closure calls git_libgit2_init and checks the result. It punts a bit and just uses expect to make sure initialization succeeded, instead of trying to propagate errors back to the caller.

To make sure the program calls git_libgit2_shutdown, the initialization closure uses the C library’s atexit function, which takes a pointer to a function to invoke before the process exits. Rust closures cannot serve as C function pointers: a closure is a value of some anonymous type carrying the values of whatever variables it captures, or references to them; a C function pointer is just a pointer. However, Rust fn types work fine, as long as you declare them extern so that Rust knows to use the C calling conventions. The local function shutdown fits the bill, and ensures libgit2 gets shut down properly.
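The distinction between closures and function pointers can be sketched without the C library. Here run_callback is a hypothetical stand-in for an API like atexit that accepts a plain C function pointer. Unlike a real atexit handler, the callback returns a value so that the result is observable:

```rust
/// Stand-in for a C API that accepts a plain function pointer
/// (hypothetical; the real `atexit` comes from the C library).
fn run_callback(callback: extern "C" fn() -> i32) -> i32 {
    callback()
}

/// An ordinary function declared with the C calling convention.
extern "C" fn forty_two() -> i32 {
    42
}

fn call_with_extern_fn() -> i32 {
    // A closure that captures a variable has an anonymous type carrying
    // the captured data, so it cannot be passed here:
    //
    //     let n = 42;
    //     run_callback(|| n); // error: mismatched types
    //
    run_callback(forty_two)
}
```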

In “Unwinding”, we mentioned that it is undefined behavior for a panic to cross language boundaries. The call from atexit to shutdown is such a boundary, so it is essential that shutdown not panic. This is why shutdown can’t simply use .expect to handle errors reported from raw::git_libgit2_shutdown. Instead, it must report the error and terminate the process itself. POSIX forbids calling exit within an atexit handler, so shutdown calls std::process::abort to terminate the program abruptly.

It might be possible to arrange to call git_libgit2_shutdown sooner—say, when the last Repository value is dropped. But no matter how we arrange things, calling git_libgit2_shutdown must be the safe API’s responsibility. The moment it is called, any extant libgit2 objects become unsafe to use, so a safe API must not expose this function directly.

A Repository’s raw pointer must always point to a live git_repository object. This implies that the only way to close a repository is to drop the Repository value that owns it:

impl Drop for Repository {
    fn drop(&mut self) {
        unsafe {
            raw::git_repository_free(self.raw);
        }
    }
}

By calling git_repository_free only when the sole pointer to the raw::git_repository is about to go away, the Repository type also ensures the pointer will never be used after it’s freed.
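The same ownership discipline can be observed with a toy resource. In this sketch, Resource, resource_open, resource_free, and the FREED counter are all hypothetical stand-ins. They mimic a C library's allocate and free pair so that we can count how often the free function runs:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts calls to `resource_free`, so the sketch can observe them.
static FREED: AtomicUsize = AtomicUsize::new(0);

/// Stand-ins for a C library's allocate/free pair (hypothetical).
unsafe fn resource_open() -> *mut u32 {
    Box::into_raw(Box::new(0))
}
unsafe fn resource_free(ptr: *mut u32) {
    drop(unsafe { Box::from_raw(ptr) });
    FREED.fetch_add(1, Ordering::SeqCst);
}

/// Safe owner of the raw pointer. The field is private, and the type is
/// neither `Clone` nor `Copy`, so no second owner can exist.
pub struct Resource {
    raw: *mut u32,
}

impl Resource {
    pub fn open() -> Resource {
        Resource { raw: unsafe { resource_open() } }
    }
}

impl Drop for Resource {
    fn drop(&mut self) {
        unsafe { resource_free(self.raw) };
    }
}

fn open_move_and_drop() -> usize {
    let before = FREED.load(Ordering::SeqCst);
    {
        let first = Resource::open();
        let _second = first; // A move: ownership transfers, nothing is freed.
    } // `_second` goes out of scope here, freeing the resource exactly once.
    FREED.load(Ordering::SeqCst) - before
}
```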

The Repository::open method uses a private function called path_to_cstring, which has two definitions—one for Unix-like systems, and one for Windows:

use std::ffi::CString;

#[cfg(unix)]
fn path_to_cstring(path: &Path) -> Result<CString> {
    // The `as_bytes` method exists only on Unix-like systems.
    use std::os::unix::ffi::OsStrExt;

    Ok(CString::new(path.as_os_str().as_bytes())?)
}

#[cfg(windows)]
fn path_to_cstring(path: &Path) -> Result<CString> {
    // Try to convert to UTF-8. If this fails, libgit2 can't handle the path
    // anyway.
    match path.to_str() {
        Some(s) => Ok(CString::new(s)?),
        None => {
            let message = format!("Couldn't convert path '{}' to UTF-8",
                                  path.display());
            Err(message.into())
        }
    }
}

The libgit2 interface makes this code a little tricky. On all platforms, libgit2 accepts paths as null-terminated C strings. On Windows, libgit2 assumes these C strings hold well-formed UTF-8 and converts them internally to the 16-bit paths Windows actually requires. This usually works, but it’s not ideal. Windows permits filenames that are not well-formed Unicode, and thus cannot be represented in UTF-8. If you have such a file, it’s impossible to pass its name to libgit2.

In Rust, the proper representation of a filesystem path is a std::path::Path, carefully designed to handle any path that can appear on Windows or POSIX. This means that there are Path values on Windows that one cannot pass to libgit2, because they are not well-formed UTF-8. So although path_to_cstring’s behavior is less than ideal, it’s actually the best we can do given libgit2’s interface.

The two path_to_cstring definitions just shown rely on conversions to our Error type: the ? operator attempts such conversions, and the Windows version explicitly calls .into(). These conversions are unremarkable:

impl From<String> for Error {
    fn from(message: String) -> Error {
        Error { code: -1, message, class: 0 }
    }
}

// NulError is what `CString::new` returns if a string
// has embedded zero bytes.
impl From<std::ffi::NulError> for Error {
    fn from(e: std::ffi::NulError) -> Error {
        Error { code: -1, message: e.to_string(), class: 0 }
    }
}
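As a self-contained illustration of how ? picks up such a conversion, consider the following sketch. Its Error type is a cut-down, hypothetical version of the one above, and to_cstring is an invented helper:

```rust
use std::ffi::{CString, NulError};

/// A simplified error type for this sketch (hypothetical).
#[derive(Debug, PartialEq)]
pub struct Error {
    pub code: i32,
    pub message: String,
}

impl From<NulError> for Error {
    fn from(e: NulError) -> Error {
        Error { code: -1, message: e.to_string() }
    }
}

/// If `CString::new` fails, the `?` operator calls `Error::from` on the
/// `NulError` and returns the resulting `Err` to the caller.
fn to_cstring(text: &str) -> Result<CString, Error> {
    Ok(CString::new(text)?)
}
```

A string with an embedded zero byte, such as "HE\0AD", comes back as an Err carrying code -1 and the NulError's message.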

Next, let’s figure out how to resolve a Git reference to an object identifier. Since an object identifier is just a 20-byte hash value, it’s perfectly fine to expose it in the safe API:

/// The identifier of some sort of object stored in the Git object
/// database: a commit, tree, blob, tag, etc. This is a wide hash of the
/// object's contents.
pub struct Oid {
    pub raw: raw::git_oid
}

We’ll add a method to Repository to perform the lookup:

use std::mem::uninitialized;
use std::os::raw::c_char;

impl Repository {
    pub fn reference_name_to_id(&self, name: &str) -> Result<Oid> {
        let name = CString::new(name)?;
        unsafe {
            let mut oid = uninitialized();
            check(raw::git_reference_name_to_id(&mut oid, self.raw,
                                                name.as_ptr() as *const c_char))?;
            Ok(Oid { raw: oid })
        }
    }
}

Although oid is left uninitialized when the lookup fails, this function guarantees that its caller can never see the uninitialized value simply by following Rust’s Result idiom: either the caller gets an Ok carrying a properly initialized Oid value, or it gets an Err.

Next, the module needs a way to retrieve commits from the repository. We’ll define a Commit type as follows:

use std::marker::PhantomData;

pub struct Commit<'repo> {
    // This must always be a pointer to a usable `git_commit` structure.
    raw: *mut raw::git_commit,
    _marker: PhantomData<&'repo Repository>
}

As we mentioned earlier, a git_commit object must never outlive the git_repository object it was retrieved from. Rust’s lifetimes let the code capture this rule precisely.

The RefWithFlag example earlier in this chapter used a PhantomData field to tell Rust to treat a type as if it contained a reference with a given lifetime, even though the type apparently contained no such reference. The Commit type needs to do something similar. In this case, the _marker field’s type is PhantomData<&'repo Repository>, indicating that Rust should treat Commit<'repo> as if it held a reference with lifetime 'repo to some Repository.

The method for looking up a commit is as follows:

use std::ptr::null_mut;

impl Repository {
    pub fn find_commit(&self, oid: &Oid) -> Result<Commit> {
        let mut commit = null_mut();
        unsafe {
            check(raw::git_commit_lookup(&mut commit, self.raw, &oid.raw))?;
        }
        Ok(Commit { raw: commit, _marker: PhantomData })
    }
}

How does this relate the Commit’s lifetime to the Repository’s? The signature of find_commit omits the lifetimes of the references involved according to the rules outlined in “Omitting Lifetime Parameters”. If we were to write the lifetimes out, the full signature would read:

fn find_commit<'repo, 'id>(&'repo self, oid: &'id Oid)
    -> Result<Commit<'repo>>

This is exactly what we want: Rust treats the returned Commit as if it borrows something from self, which is the Repository.
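The effect is easy to reproduce in isolation. In the sketch below, Repo and Handle are hypothetical stand-ins for Repository and Commit. Handle holds no reference at all, yet the borrow checker still ties it to the Repo it came from:

```rust
use std::marker::PhantomData;

pub struct Repo;

/// Holds no reference to a `Repo`, but the marker makes Rust treat it as
/// if it did, for lifetime 'repo.
pub struct Handle<'repo> {
    id: u32,
    _marker: PhantomData<&'repo Repo>,
}

impl Repo {
    /// Elided form of `fn find<'repo>(&'repo self, id: u32) -> Handle<'repo>`.
    pub fn find(&self, id: u32) -> Handle<'_> {
        Handle { id, _marker: PhantomData }
    }
}

fn handle_used_within_repo_lifetime() -> u32 {
    let repo = Repo;
    let handle = repo.find(7);
    // Moving or dropping `repo` at this point would be rejected, because
    // `handle` is used afterward:
    //
    //     drop(repo); // error[E0505]: cannot move out of `repo`
    //                 // because it is borrowed
    //
    handle.id
}
```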

When a Commit is dropped, it must free its raw::git_commit:

impl<'repo> Drop for Commit<'repo> {
    fn drop(&mut self) {
        unsafe {
            raw::git_commit_free(self.raw);
        }
    }
}

From a Commit, you can borrow a Signature (a name and email address) and the text of the commit message:

impl<'repo> Commit<'repo> {
    pub fn author(&self) -> Signature {
        unsafe {
            Signature {
                raw: raw::git_commit_author(self.raw),
                _marker: PhantomData
            }
        }
    }

    pub fn message(&self) -> Option<&str> {
        unsafe {
            let message = raw::git_commit_message(self.raw);
            char_ptr_to_str(self, message)
        }
    }
}

Here’s the Signature type:

pub struct Signature<'text> {
    raw: *const raw::git_signature,
    _marker: PhantomData<&'text str>
}

A git_signature object always borrows its text from elsewhere; in particular, signatures returned by git_commit_author borrow their text from the git_commit. So our safe Signature type includes a PhantomData<&'text str> to tell Rust to behave as if it contained a &str with a lifetime of 'text. Just as before, Commit::author properly connects this 'text lifetime of the Signature it returns to that of the Commit without us needing to write a thing. The Commit::message method does the same with the Option<&str> holding the commit message.

A Signature includes methods for retrieving the author’s name and email address:

impl<'text> Signature<'text> {
    /// Return the author's name as a `&str`,
    /// or `None` if it is not well-formed UTF-8.
    pub fn name(&self) -> Option<&str> {
        unsafe {
            char_ptr_to_str(self, (*self.raw).name)
        }
    }

    /// Return the author's email as a `&str`,
    /// or `None` if it is not well-formed UTF-8.
    pub fn email(&self) -> Option<&str> {
        unsafe {
            char_ptr_to_str(self, (*self.raw).email)
        }
    }
}

The preceding methods depend on a private utility function char_ptr_to_str:

/// Try to borrow a `&str` from `ptr`, given that `ptr` may be null or
/// refer to ill-formed UTF-8. Give the result a lifetime as if it were
/// borrowed from `_owner`.
///
/// Safety: if `ptr` is non-null, it must point to a null-terminated C
/// string that is safe to access.
unsafe fn char_ptr_to_str<T>(_owner: &T, ptr: *const c_char) -> Option<&str> {
    if ptr.is_null() {
        None
    } else {
        CStr::from_ptr(ptr).to_str().ok()
    }
}

The _owner parameter’s value is never used, but its lifetime is. Making the lifetimes in this function’s signature explicit gives us:

fn char_ptr_to_str<'o, T: 'o>(_owner: &'o T, ptr: *const c_char)
    -> Option<&'o str>

The CStr::from_ptr function returns a &CStr whose lifetime is completely unbounded, since it was borrowed from a dereferenced raw pointer. Unbounded lifetimes are almost always inaccurate, so it’s good to constrain them as soon as possible. Including the _owner parameter causes Rust to attribute its lifetime to the return value’s type, so callers can receive a more accurately bounded reference.
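Here is a sketch of the technique on its own, away from libgit2. The Owner type is hypothetical. It stands in for something like a Signature whose text pointer may be null, and its text method hands back a &str that the borrow checker ties to the Owner:

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::ptr;

/// Same shape as the helper above: the result borrows from `_owner`.
///
/// Safety: if `ptr` is non-null, it must point to a null-terminated C
/// string that stays valid for as long as `_owner` is borrowed.
unsafe fn char_ptr_to_str<T>(_owner: &T, ptr: *const c_char) -> Option<&str> {
    if ptr.is_null() {
        None
    } else {
        let c_str = unsafe { CStr::from_ptr(ptr) };
        c_str.to_str().ok()
    }
}

/// Hypothetical owner of some C-style text, which may be absent.
pub struct Owner {
    text: Option<CString>,
}

impl Owner {
    pub fn new(text: Option<&str>) -> Owner {
        Owner { text: text.map(|t| CString::new(t).expect("no interior NUL")) }
    }

    /// The returned `&str` cannot outlive `self`.
    pub fn text(&self) -> Option<&str> {
        let raw = match &self.text {
            Some(c) => c.as_ptr(),
            None => ptr::null(),
        };
        unsafe { char_ptr_to_str(self, raw) }
    }
}
```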

It is not clear from the libgit2 documentation whether a git_signature’s name and email pointers can be null, even though that documentation is quite good. Your authors dug around in the source code for some time without being able to persuade themselves one way or the other, and finally decided that char_ptr_to_str had better be prepared for null pointers just in case. In Rust, this sort of question is answered immediately by the type: if it’s &str, you can count on the string to be there; if it’s Option<&str>, it’s optional.

Finally, we’ve provided safe interfaces for all the functionality we need. The new main function in src/main.rs is slimmed down quite a bit, and looks like real Rust code:

fn main() {
    let path = std::env::args_os().skip(1).next()
        .expect("usage: git-toy PATH");

    let repo = git::Repository::open(&path)
        .expect("opening repository");

    let commit_oid = repo.reference_name_to_id("HEAD")
        .expect("looking up 'HEAD' reference");

    let commit = repo.find_commit(&commit_oid)
        .expect("looking up commit");

    let author = commit.author();
    println!("{} <{}>\n",
             author.name().unwrap_or("(none)"),
             author.email().unwrap_or("none"));

    println!("{}", commit.message().unwrap_or("(none)"));
}

In this section, we’ve built a safe API on an unsafe API by arranging for any violation of the latter’s contract to be a Rust type error. The result is an interface that Rust can ensure you are using correctly. For the most part, the rules we’ve made Rust enforce are simply the sorts of rules that C and C++ programmers end up imposing on themselves anyway. What makes Rust feel so much stricter than C and C++ is not that the rules are so foreign, but that the enforcement is mechanical and comprehensive.

Conclusion

Rust is not a simple language. Its goal is to span two very different worlds. It’s a modern programming language, safe by design, with conveniences like closures and iterators; yet it aims to put you in control of the raw capabilities of the machine it runs on, with minimal runtime overhead.

The contours of the language are determined by these goals. Rust manages to bridge most of the gap with safe code. Its borrow checker and zero-cost abstractions put you as close to the bare metal as possible without risking undefined behavior. When that’s not enough, or when you want to leverage existing C code, unsafe code stands ready. But again, the language doesn’t just offer you these unsafe features and wish you luck. The goal is always to use unsafe features to build safe APIs. That’s what we did with libgit2. It’s also what the Rust team has done with Box, Vec, the other collections, channels, and more: the standard library is full of safe abstractions, implemented with some unsafe code behind the scenes.

A language with Rust’s ambitions was, perhaps, not destined to be the simplest of tools. But Rust is safe, fast, concurrent—and effective. Use it to build large, fast, secure, robust systems that take advantage of the full power of the hardware they run on. Use it to make software better.

¹ Well, it’s a classic where we come from.

² There are better ways to handle this using the RawVec type from the alloc crate, but that crate is still unstable.
