© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
C. Milanesi, Beginning Rust, https://doi.org/10.1007/978-1-4842-7208-4_22

22. Ownership, Moves, and Copies

Carlo Milanesi (Bergamo, Italy)
 
In this chapter, you will learn:
  • Why deterministic and implicit destruction of objects is a big plus of Rust

  • The concept of ownership of objects

  • The three kinds of assignment semantics: share semantics, copy semantics, and move semantics

  • Why implicit share semantics is bad for software correctness

  • Why move semantics may have better performance than copy semantics

  • Why some types need copy semantics and others do not

  • Why some types need to be noncloneable

  • How to specify that a type must use copy semantics

  • How to specify that a type is cloneable

Deterministic Destruction

So far, we have seen several ways to allocate objects, both in the stack and in the heap:
  • Each variable whose type is a primitive type or an array, and each argument of a function or a closure allocates an object in the stack.

  • Each temporary object is allocated in the stack. A temporary object means a space of stack memory used only during the execution of a single statement.

  • Each Box variable allocates a pointer in the stack and the referenced object in the heap.

  • Each dynamic string and each collection (including vectors) allocates a header in the stack and a data object in the heap.

The actual instant when such objects are allocated is hard to predict, because it depends on compiler optimizations. So, let’s consider the conceptual instant of such allocations.

Conceptually, every stack allocation happens when the corresponding expression first appears in code.

Every heap allocation happens when there is a need for such data. So:
  • Box-ed objects are allocated when the Box::new function is called.

  • The characters of a dynamic string are allocated when some characters are added to the string.

  • The contents of a collection are allocated when some data is added to the collection.

All this is not different from most programming languages.

And when does deallocation of a data item happen?

Conceptually, in Rust, it happens automatically when the data item is no longer accessible. So:
  • The objects allocated by variables whose type is a primitive type or an array are deallocated when the block containing the declaration of such variable ends.

  • The objects allocated by arguments of functions or of closures are deallocated when their function/closure block ends.

  • Temporary objects are deallocated when the statement that allocated them ends (that is, at the next semicolon or when the current block ends).

  • Box-ed objects are deallocated when the block containing their declaration ends.

  • Chars contained in dynamic strings are deallocated when they are removed from the string, or anyway when the block containing the string declaration ends.

  • Items contained in collections are deallocated when they are removed from the collection, or anyway when the block containing the collection declaration ends.
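The rules above can be observed with a small sketch. It assumes the standard Drop trait, which this chapter has not introduced, and a hypothetical Tracer type that appends a line to a log at the moment it is destroyed; neither comes from the text above.

```rust
use std::cell::RefCell;

// Hypothetical helper (not from the chapter): it appends a line to a
// log at the moment it is destroyed.
struct Tracer<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Tracer<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

// Returns the sequence of events recorded while two nested blocks run.
fn destruction_order() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        // Heap-allocated object, owned by the `_boxed` variable.
        let _boxed = Box::new(Tracer { name: "boxed", log: &log });
        {
            // Stack-allocated object.
            let _inner = Tracer { name: "inner", log: &log };
            log.borrow_mut().push("inner block ends".to_string());
        } // `_inner` is destroyed here.
        log.borrow_mut().push("outer block ends".to_string());
    } // `_boxed` and its heap object are destroyed here.
    log.into_inner()
}

fn main() {
    for event in destruction_order() {
        println!("{}", event);
    }
}
```

The recorded events show that each object is destroyed exactly where its block ends, with no deallocation statement written by the programmer.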

These deallocation rules differentiate Rust from most programming languages. In any language that has temporary objects or stack-allocated objects, the objects of such kinds are deallocated automatically. But heap-allocated object deallocation differs for different languages.

In some languages, like Pascal, C, and C++, heap objects are usually deallocated only explicitly, by invoking functions like “free” or “delete.” In other languages, like Java and C#, heap objects are not immediately deallocated when they are not reachable anymore, but there is a routine, run periodically, which finds unreachable heap objects and deallocates them. This mechanism is named “garbage collection” because it resembles the urban cleaning system: it periodically cleans the town when some garbage has piled up.

So, in C++ and similar languages, heap deallocation is both deterministic and explicit. It is deterministic, because it happens in well-defined positions of source code; and it is explicit, because it requires the programmer to write a specific deallocation statement. Being deterministic is good, because it gives better performance and it allows the programmer to better control what is going on in the computer. But being explicit is bad, because if deallocations are performed wrongly, nasty bugs result.

In Java and similar languages, heap deallocation is both nondeterministic and implicit. It is nondeterministic because it happens in unknown instants of execution; and it is implicit because it does not require specific deallocation statements. To be nondeterministic is bad, but to be implicit is good.

Unlike both techniques, in Rust heap deallocation is usually both deterministic and implicit, and this is a great advantage of Rust over other languages.

This is possible because of the following mechanism, based on the concept of ownership.

Ownership

Let’s introduce the term to own. In computer programming, for an object A to own an object B means that A is responsible for deallocating B. That means two things:
  • Only A can deallocate B.

  • After A has become unreachable, A must deallocate B.

In Rust, there is no explicit deallocation mechanism, so the first of the two things is pointless, and this definition can be reworded in the following way. For object A to own object B means that the Rust language deallocates B after and only after A has become unreachable. Let’s see an example:
let _num = Box::new(3);

In this tiny program, the Box::new call allocates a heap object that is an integer whose value is 3, and the declaration of the _num variable allocates a stack object that is a pointer whose value is the address of that integer object. We say that this pointer owns that integer, because when _num goes out of its block, and so becomes unreachable, the referenced integer object, containing the value 3, is deallocated. Here, a reference owns an object.

Not every reference owns an object, though, as this code shows:
let a = 3;
{
    let _a_ref = &a;
}
print!("{}", a);

Here, the _a_ref variable declaration allocates a reference, but that reference owns nothing. Indeed, at the end of the nested block, the _a_ref variable gets out of its block, so the reference is deallocated; however, the referenced object, which is the number having value 3, shouldn’t be immediately deallocated, because it must be printed at the last statement.

To ensure that every object is automatically deallocated when it is no longer referenced, Rust has this simple rule: in every instant of execution, every heap object must have exactly one owner: no more, no less. When that owner is deallocated, the object itself is deallocated. If there were several owners, the object could be deallocated several times, and this is a bug usually named double free. If there were no owners, the object could never be deallocated, and this is a bug named memory leak.

Assignment Semantics

It is quite easy to understand the assignment operation when it concerns primitive types, which involve only stack allocation. However, for objects involving heap allocation, assignment is not so simple to understand. For example, what does the following program do?
let v1 = vec![11, 22, 33];
#[allow(unused_variables)]
let v2 = v1;
The operations performed are these:
  • A buffer for the contents of the vector is allocated in the heap.

  • The three integers are copied onto that buffer.

  • The header of v1 is allocated in the stack.

  • The header of v1 is initialized, so that it references the newly allocated heap buffer.

  • The header of v2 is allocated in the stack.

  • The header of v2 is initialized, using v1. But, how is that initialization implemented?

In general, there are at least three ways to implement such operation:
  • Share semantics: The header of v1 is copied onto the header of v2, and nothing else happens. Subsequently, both v1 and v2 can be used, and they both refer to the same heap buffer; therefore, they refer to the same object, not to two distinct objects having the same value. This semantics is implemented by garbage-collecting languages, like Java.

  • Copy semantics: Another heap buffer is allocated. It is as large as the heap buffer used by v1, and the contents of the preexisting buffer are copied onto the new buffer. Then the header of v2 is initialized so that it references the newly allocated buffer. Therefore, the two variables refer to two distinct objects, which initially have the same value. This is implemented, by default, by C++.

  • Move semantics: The header of v1 is copied onto the header of v2, and nothing else happens. Subsequently, v2 can be used, and it refers to the heap buffer that was allocated for v1, but v1 cannot be used anymore. This is implemented, by default, by Rust.

This code shows the move semantics used by Rust:
let v1 = vec![11, 22, 33];
#[allow(unused_variables)]
let v2 = v1;
print!("{}", v1.len());

The compilation of this code will generate, at the last line, the error: borrow of moved value: `v1`. When the value of v1 is assigned to v2, the variable v1 ceases to exist. Trying to use it, even only to get its length, is disallowed by the compiler.

Let’s see why Rust does not implement share semantics. First, if variables are mutable, such semantics would be somewhat confusing. With share semantics, after an item is changed through a variable, that item appears to be changed also when it is accessed through the other variable. This is not intuitive, and it is a possible source of bugs. Therefore, share semantics would be acceptable only for read-only data.

But there is a bigger problem, regarding deallocation. If share semantics were used, both the header of v1 and the header of v2 would own the single data buffer, so when they are deallocated, the same heap buffer would be deallocated twice, causing memory corruption.

To solve this problem, the languages that use share semantics do not deallocate memory at the end of the block in which such memory is allocated. They resort to garbage collection.

Instead, both copy semantics and move semantics are correct. Indeed, the Rust rule regarding deallocation is that any object must have exactly one owner. When copy semantics is used, the original vector buffer keeps its single owner, which is the vector header referenced by v1; and the newly created vector buffer gets its single owner, which is the vector header referenced by v2. On the other hand, when move semantics is used, the single vector buffer changes owner: before the assignment, its owner is the vector header referenced by v1, and after the assignment, its owner is the vector header referenced by v2. Before the assignment, the v2 header does not exist yet, and after the assignment the v1 header does not exist anymore.
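The single-owner rule under move semantics can be checked with a small sketch. It assumes the standard Drop trait, which this chapter has not introduced, and a hypothetical Counted type that counts destructions; both are illustrative additions, not part of the text above.

```rust
use std::cell::Cell;

// Hypothetical helper (not from the chapter): it counts how many times
// objects of this type are destroyed.
struct Counted<'a> {
    drops: &'a Cell<u32>,
}

impl Drop for Counted<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

// Moves a boxed object through three variables and reports how many
// destructions happened in total.
fn drops_after_moves() -> u32 {
    let drops = Cell::new(0);
    {
        let a = Box::new(Counted { drops: &drops });
        let b = a; // The heap object now belongs to `b`; `a` is unusable.
        let _c = b; // The heap object now belongs to `_c`; `b` is unusable.
    } // Only `_c` is an owner here, so the object is destroyed once.
    drops.get()
}

fn main() {
    println!("{}", drops_after_moves());
}
```

The function returns 1: although three variables referred to the heap object in turn, it had one owner at any time, so it was destroyed once.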

And why does Rust not implement copy semantics?

Actually, in some cases copy semantics is more appropriate, but in other cases move semantics is more appropriate. Even C++, since 2011, allows both copy semantics and move semantics. Here is a C++ program using both semantics:
#include <iostream>
#include <utility>
#include <vector>
int main() {
    auto v1 = std::vector<int> { 11, 22, 33 };
    const auto v2 = v1;            // copy
    const auto v3 = std::move(v1); // move
    std::cout << v1.size() << " "
        << v2.size() << " " << v3.size();
}

It will print: 0 3 3.

The first statement of the main function initializes the v1 vector. The second one copies that vector to the v2 vector. The third statement moves the contents of v1 to the v3 vector. The last statement prints the current length of the vectors.

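The move C++ standard function (std::move) does not move anything by itself: it only marks v1 as an object whose contents may be taken. The actual move is performed by the initialization of v3, which copies the value of the v1 header to the v3 header, like Rust assignments do.

In addition, in C++, that initialization leaves the source vector (v1) empty. This happens because, in C++, a statement cannot make a variable undefined, so the moved object (v1) must keep a valid value.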

Therefore, at the end of the program, v2 has a copy of the three items, v3 has just the original three items that were created for v1, and v1 is empty.

In Rust, the way to perform both copy semantics and move semantics is this:
let v1 = vec![11, 22, 33];
let v2 = v1.clone();
let v3 = v1;
// ILLEGAL: print!("{} ", v1.len());
print!("{} {}", v2.len(), v3.len());

This will print 3 3.

This Rust program is similar to the previously shown C++ program.

While in C++ the default semantics is a copy, and the move standard function must be invoked to make a move, in Rust the default semantics is a move, and the clone standard function must be invoked to make a copy. So, the second statement copies the buffer of v1 to v2, and the third statement moves the buffer of v1 to v3.

In addition, while the v1 moved vector in C++ is still accessible, but emptied, in Rust such a variable is not accessible at all anymore. So, the commented-out statement is not allowed. If it were uncommented, its compilation would generate the error: borrow of moved value: `v1`.
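The difference between the two operations can also be observed at run time. The following is a minimal sketch, assuming the standard as_ptr method of vectors (not used in this chapter) to read the address of the heap buffer; the compare_buffers function is a hypothetical helper.

```rust
// Returns two flags: whether a moved vector keeps the original heap
// buffer, and whether a cloned vector gets a different one.
fn compare_buffers() -> (bool, bool) {
    let v1 = vec![11, 22, 33];
    let original_buffer = v1.as_ptr(); // Address of the heap buffer of v1.

    let v2 = v1.clone(); // Copy: a second buffer is allocated.
    let clone_has_new_buffer = v2.as_ptr() != original_buffer;

    let v3 = v1; // Move: only the header is copied.
    let move_keeps_buffer = v3.as_ptr() == original_buffer;

    (move_keeps_buffer, clone_has_new_buffer)
}

fn main() {
    let (move_keeps_buffer, clone_has_new_buffer) = compare_buffers();
    println!("{} {}", move_keeps_buffer, clone_has_new_buffer);
}
```

Both flags are true: the clone owns a new buffer, while the moved vector owns the very buffer that was allocated for v1.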

Copying vs. Moving Performance

The choice of Rust to favor move semantics is about performance. For an object that owns a heap buffer, like a vector, it is faster to move it than to copy it, because a move of the vector is just a copy of the header, while a copy of the vector requires allocating and initializing a potentially large heap buffer, which eventually will be deallocated. In general, the design choices of Rust are to allow any operation, but to use a more compact notation for the safest and most efficient operations.
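To see why the cost of a move does not depend on the length of the vector, here is a minimal sketch. It assumes the standard std::mem::size_of_val function, which is not used in this chapter, and a hypothetical header_sizes helper; the actual number of bytes depends on the platform, so none is stated here.

```rust
use std::mem::size_of_val;

// Returns the sizes, in bytes, of the headers of a small vector and of
// a large vector.
fn header_sizes() -> (usize, usize) {
    let small = vec![0u8; 10];
    let large = vec![0u8; 1_000_000];
    // size_of_val measures only the header, not the heap buffer.
    (size_of_val(&small), size_of_val(&large))
}

fn main() {
    let (small, large) = header_sizes();
    println!("{} {}", small, large);
}
```

The two numbers are equal, so moving the large vector copies as many bytes as moving the small one, while cloning it would copy a million bytes more.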

In addition, in C++, moved objects are not meant to be used anymore, but, to keep the language backward-compatible with the legacy code base, moved objects are still accessible, and there is the chance that a programmer erroneously uses such objects. Moreover, emptying a moved vector has a (small) cost, and when a vector is destroyed, it must be checked whether it is empty, and that also has a (small) cost. Rust has been designed to avoid using moved objects, so there is no chance of erroneously using a moved vector; and the compiler can produce better code because it knows when a vector is moved.

Moving and Destroying Objects

All these concepts apply not only to vectors but also to any object that has a reference to a heap object, like a String or a Box.

This is a Rust program:
let s1 = "abcd".to_string();
let s2 = s1.clone();
let s3 = s1;
// ILLEGAL: print!("{} ", s1.len());
print!("{} {}", s2.len(), s3.len());

It will print: 4 4. Any attempt to access s1 at the end of the program will cause a compilation error.

This is a similar C++ program:
#include <iostream>
#include <string>
#include <utility>
int main() {
    auto s1 = std::string { "abcd" };
    const auto s2 = s1;            // copy
    const auto s3 = std::move(s1); // move
    std::cout << s1.size() << " "
        << s2.size() << " " << s3.size();
}

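With the common implementations of the standard library, it will print: 0 4 4, because the moved string s1 has become empty, but it is still accessible. The C++ standard only guarantees that s1 is left in a valid but unspecified state.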

And this Rust program:
let i1 = Box::new(12345i16);
let i2 = i1.clone();
let i3 = i1;
// ILLEGAL: print!("{} ", i1);
print!("{} {}", i2, i3);

will print: 12345 12345. Any attempt to access i1 at the end of the program will cause a compilation error.

It is similar to this C++ program:
#include <iostream>
#include <memory>
#include <utility>
int main() {
    auto i1 = std::unique_ptr<short> {
        new short(12345)
    };
    const auto i2 = std::unique_ptr<short> {
        new short(*i1)
    };
    const auto i3 = std::move(i1);
    std::cout << (bool)i1 << " " << (bool)i2 << " "
        << (bool)i3 << " " << *i2 << " " << *i3;
}

It will print: 0 1 1 12345 12345. The last statement first checks which unique pointers are null, by casting them to the bool type; only i1 is null, because it was moved to i3. Then, the values referenced by i2 and i3 are printed.

In Rust, objects are moved not only when they are used to initialize a variable, but also when they are assigned to a variable that already has a value, like in this code:
let v1 = vec![false; 3];
let mut _v2 = vec![false; 2];
_v2 = v1;
v1;
and also when passing a value to a function argument, like in this code:
fn f(_v2: Vec<bool>) {}
let v1 = vec![false; 3];
f(v1);
v1;
and also when the assigned object at the moment does not refer to an actual heap buffer, like in this code:
let v1 = vec![false; 0];
let mut _v2 = vec![false; 0];
_v2 = v1;
v1;

When any of the previous three programs is compiled, the last statement causes the compilation error: use of moved value: `v1`.

In particular, in the last program, the compiler complains that v1 is moved to _v2, even if they are both empty, and so no heap memory is used. Why? Because the rule of moves is applied by the compiler, so it must be independent of the actual content of an object at runtime. This follows from a fundamental principle of Rust: it is better to know of possible errors at compile time rather than at runtime. At compile time, in general, the compiler cannot know whether a vector will be empty or not when it is involved in an assignment.
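Since passing a value to a function moves it, a function has three common ways to deal with a vector argument. The following sketch shows them; the function names consume, extend, and inspect are hypothetical, chosen only for this illustration.

```rust
// Takes ownership of the vector; the caller cannot use it afterward.
fn consume(v: Vec<bool>) -> usize {
    v.len()
} // The vector is deallocated here.

// Takes ownership, modifies the vector, and gives ownership back.
fn extend(mut v: Vec<bool>) -> Vec<bool> {
    v.push(true);
    v
}

// Borrows the items through a reference; the caller keeps ownership.
fn inspect(v: &[bool]) -> usize {
    v.len()
}

fn main() {
    let v1 = vec![false; 3];
    let n1 = inspect(&v1); // v1 is still usable.
    let v2 = extend(v1); // v1 is moved; the result is owned by v2.
    let n2 = inspect(&v2);
    let n3 = consume(v2); // v2 is moved and then deallocated.
    // ILLEGAL: print!("{}", v1.len());
    // ILLEGAL: print!("{}", v2.len());
    println!("{} {} {}", n1, n2, n3);
}
```

This program prints 3 4 4. Passing a reference is usually the simplest choice when the function only needs to read the data.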

But also, compiling the following program causes an error at the last line. How come?
struct S {}
let s1 = S {};
let _s2 = s1;
s1;

Here, the compiler can be sure that such objects won’t contain references to the heap, but still it complains about moves. Why does Rust not use copy semantics for this type that will never have references to the heap?

Here is the rationale for this. The user-defined type S currently has no references to the heap, but after future maintenance of the software, one reference to the heap may easily be added, as a field of S or as a field of a field of S, and so on. So, if we now implement copy semantics for S, when the program source is changed so that a String or Box or a collection is added to S, directly or indirectly, a lot of errors would be caused by this semantic change. So, as a rule, it’s better to keep move semantics by default.

Need for Copy Semantics

So, we have seen that for many types of objects, including vectors, dynamic strings, boxes, and structs, move semantics is used by default. Yet, the following program is valid:
let i1 = 123;
let _i2 = i1;
let s1 = "abc";
let _s2 = s1;
let r1 = &i1;
let _r2 = r1;
print!("{} {} {}", i1, s1, r1);

It will print: 123 abc 123. How come there is no moved variable?

Well, the fact is that for primitive numbers, static strings, and references, Rust does not use move semantics. For these data types, Rust uses copy semantics.

Why? We saw previously that if an object can own one or more heap objects, its type should implement move semantics; but if it cannot own any heap memory, it can implement copy semantics just as well. Move semantics is a nuisance for primitive types, and it is improbable that they will ever be changed to own some heap objects. So, for them, copy semantics is safe, efficient, and more convenient.

So, some Rust types implement copy semantics, and others implement move semantics. In particular, copy semantics is used by numbers, Booleans, static strings, shared (immutable) references, and by arrays and tuples whose items all use copy semantics. Instead, move semantics is used, by default, by dynamic strings, boxes, any collection (including vectors), enums, structs, and tuple-structs.
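Arrays and tuples deserve a closer look, because they use copy semantics only when all their items do. Here is a minimal sketch of that distinction; the copied_values function is a hypothetical helper.

```rust
fn copied_values() -> (i32, [i32; 3], (i32, bool), usize) {
    let n1 = 123;
    let a1 = [1, 2, 3];
    let t1 = (7, true);

    // Each assignment copies the value, so the sources stay usable.
    let _n2 = n1;
    let _a2 = a1;
    let _t2 = t1;

    // An array of dynamic strings is moved instead, because its items
    // are not copyable.
    let words = ["a".to_string(), "b".to_string()];
    let moved_words = words;
    // ILLEGAL: print!("{}", words.len());

    (n1, a1, t1, moved_words.len())
}

fn main() {
    println!("{:?}", copied_values());
}
```

This program prints (123, [1, 2, 3], (7, true), 2). The variables n1, a1, and t1 are still usable after being assigned, while words is not.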

Cloning Objects

With regard to the copying of objects, there is another important distinction to apply, regarding which objects it makes sense to copy.

All the types that implement copy semantics can be copied quite easily, with an assignment; but objects that implement move semantics also can be copied, using the clone standard function. We already saw that a clone function can be applied to dynamic strings, boxes, and vectors. However, for some kinds of types, a clone function shouldn’t be applicable, because no kind of copying is appropriate. Think about a file handle, a GUI window handle, or a mutex handle. If you make some copies of one such handle and then you destroy one of the copies, the underlying resource gets released, and the other copies of the handle will be referencing a nonexistent resource.

So, regarding the ability to be copied, there are three kinds of objects:
  • Objects that cannot own anything, that are easy and cheap to copy, and that will always remain so.

  • Objects that may own some heap objects but do not own external resources, so they can be copied, but with a significant runtime cost.

  • Objects that own an external resource, like a file handle or a GUI window handle, so they should never be copied.

The types of the first kind of objects can implement copy semantics, and they should, because it is more convenient. Let’s call them copyable objects.

The types of the second kind of objects can implement copy semantics, but they should implement move semantics instead, to avoid the runtime cost of unneeded duplications. Moreover, they should provide a method to explicitly duplicate them. Let’s call them cloneable but noncopyable objects.

The types of the third kind of objects should implement move semantics, too. But they shouldn’t provide a method to explicitly duplicate them, because they own a resource that cannot be duplicated by Rust code, and such resource should have just one owner. Let’s call them noncloneable objects.

Of course, any object that can be automatically copied can also be explicitly copied, so any copyable object is also a cloneable object.

To summarize, some objects are noncloneable (like file handles), and others are (explicitly) cloneable. Some cloneable objects are also (implicitly) copyable (like numbers), while others are noncopyable (like collections).

To distinguish among these three categories, the Rust standard library contains two specific traits: Copy and Clone. Any type implementing the Copy trait is copyable; any type implementing the Clone trait is cloneable.

So, the three kinds described are characterized in this way:
  • The objects, like primitive numbers, that implement both Copy and Clone, are copyable (and also cloneable). They implement copy semantics, and they can also be cloned explicitly.

  • The objects, like collections, that implement Clone, but don’t implement Copy, are cloneable but noncopyable. They implement move semantics, but they can be cloned explicitly.

  • The objects, like file handles, that implement neither Copy nor Clone, are noncloneable (and also noncopyable). They implement move semantics, and they cannot be cloned.

  • No object can implement Copy but not Clone. This means that no object is copyable but not cloneable. This is because such an object would be copied implicitly but not explicitly, and this is pointless.

Here is an example of all these cases:
let a1 = 123;
let b1 = a1.clone();
let c1 = b1;
print!("{} {} {}", a1, b1, c1);
let a2 = Vec::<bool>::new();
let b2 = a2.clone();
let c2 = b2;
print!(" {:?}", a2);
// ILLEGAL: print!("{:?}", b2);
print!(" {:?}", c2);
let a3 = std::fs::File::open(".").unwrap();
// ILLEGAL: let b3 = a3.clone();
let c3 = a3;
// ILLEGAL: print!("{:?}", a3);
print!(" {:?}", c3);

On a Linux system, this program will print: 123 123 123 [] [] File, followed by some information regarding your current directory, like: { fd: 3, path: "/home/yourname/yourdir", read: true, write: false }. It can be compiled only because the three illegal statements have been commented out.

First, a1 is declared as a primitive number. Such type is copyable, and so it can be both explicitly cloned to b1 and implicitly copied to c1. So, there are three distinct objects having the same value, and we can print them all.

Then, a2 is declared as a collection, and specifically a vector of Booleans. Such type is cloneable but not copyable, so it can be explicitly cloned to b2, but the assignment of b2 to c2 is a move, which leaves b2 as undefined. So, after that assignment, we can print a2 and c2, but trying to compile the statement that prints b2 would generate an error with the message: borrow of moved value: `b2`.

Finally, a3 is declared as a resource handle, specifically a file handle. Such type is not cloneable, so trying to compile the statement that clones a3 would generate an error with the message: no method named `clone` found for struct `File` in the current scope. It is allowed to assign a3 to c3, but it is a move; so we can print some debug information about c3, but trying to compile the statement that prints a3 would generate an error with the message: borrow of moved value: `a3`.
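The Copy and Clone traits can also be used as bounds of generic functions, so that the compiler checks which kind of object is passed. The following is a minimal sketch; the pair_by_copy and pair_by_clone functions are hypothetical helpers written for this illustration.

```rust
// Accepts only copyable types: the argument is duplicated implicitly.
fn pair_by_copy<T: Copy>(value: T) -> (T, T) {
    (value, value)
}

// Accepts any cloneable type: the argument is duplicated explicitly.
fn pair_by_clone<T: Clone>(value: &T) -> (T, T) {
    (value.clone(), value.clone())
}

fn main() {
    println!("{:?}", pair_by_copy(123));
    println!("{:?}", pair_by_clone(&123)); // Copyable types are cloneable too.
    println!("{:?}", pair_by_clone(&vec![true, false]));
    // ILLEGAL: pair_by_copy(vec![true, false]); // Vec does not implement Copy.
}
```

The commented-out call is rejected at compile time, because a vector is cloneable but not copyable.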

Making Types Cloneable or Copyable

As said before, enums, structs, and tuple structs, by default, do not implement either the Copy trait or the Clone trait, so they are noncloneable. However, you may implement the Clone trait alone for each of them, or both the Clone trait and the Copy trait.

Let’s start from this illegal program:
struct S {}
let s = S {};
let _ = s.clone();
It is enough to implement the Clone trait for our user-defined type, to make it valid:
struct S {}
impl Clone for S {
    fn clone(&self) -> Self { Self {} }
}
let s = S {};
let _ = s.clone();

Notice that implementing the Clone trait requires defining the clone method, which must return a value whose type is the type of its argument. The returned value should be equal to the value of its argument, but that is not checked by the compiler.

Implementing Clone does not automatically implement Copy, so the following program is illegal:
struct S {}
impl Clone for S {
    fn clone(&self) -> Self { Self {} }
}
let s = S {};
let _ = s.clone();
let _s2 = s;
let _s3 = s;
The last-but-one statement moves away the value from the s variable, and the last statement tries to access the value of such variable. But it is enough to also implement the Copy trait, to make it valid:
struct S {}
impl Clone for S {
    fn clone(&self) -> Self { Self {} }
}
impl Copy for S {}
let s = S {};
let _ = s.clone();
let _s2 = s;
let _s3 = s;

Notice that an implementation of Copy can be empty; it is enough to declare that Copy is implemented, to activate the copy semantics.

The following program is illegal, though:
struct S {}
impl Copy for S {}

The error message explains why: the trait bound `S: Clone` is not satisfied. The Copy trait can be implemented only if the Clone trait is also implemented.

But the following program also is illegal:
struct S { x: Vec<i32> }
impl Copy for S {}
impl Clone for S {
    fn clone(&self) -> Self { *self }
}

The error message says: the trait `Copy` may not be implemented for this type, indicating the type Vec<i32>.

The program tries to implement the Copy trait for a struct containing a vector. Rust allows you to implement the Copy trait only for types that contain only copyable objects, because copying an object means copying all its members. Here, Vec does not implement the Copy trait, so S cannot implement it.

Instead, the following program is valid:
struct S { x: Vec<i32> }
impl Clone for S {
    fn clone(&self) -> Self {
        S { x: self.x.clone() }
    }
}
let mut s1 = S { x: vec![12] };
let s2 = s1.clone();
s1.x[0] += 1;
print!("{} {}", s1.x[0], s2.x[0]);

It will print: 13 12.

Here, the S struct is not copyable, but it is cloneable because it implements the Clone trait. Therefore, a duplicate of s1 can be assigned to s2. After that assignment, only s1 is modified, and the print statement shows that they are actually different.
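As an alternative to the hand-written implementations above, the standard derive attribute can ask the compiler to generate them. The attribute is a standard Rust feature, but it is not used in the text above, so treat this as a sketch under that assumption; the Point and Samples types and the derived_duplicates function are hypothetical.

```rust
// Both traits are generated: Point uses copy semantics.
#[derive(Clone, Copy)]
struct Point {
    x: i32,
    y: i32,
}

// Only Clone is generated: Samples keeps move semantics, because a
// vector field is not copyable.
#[derive(Clone)]
struct Samples {
    x: Vec<i32>,
}

fn derived_duplicates() -> (i32, i32, i32) {
    let p1 = Point { x: 1, y: 2 };
    let p2 = p1; // Copy: p1 stays usable.

    let mut s1 = Samples { x: vec![12] };
    let s2 = s1.clone(); // Explicit clone: s1 stays usable.
    s1.x[0] += 1; // Only s1 is modified.

    (p1.x + p2.y, s1.x[0], s2.x[0])
}

fn main() {
    println!("{:?}", derived_duplicates());
}
```

This program prints (3, 13, 12). Adding Copy to the derive list of Samples would be rejected by the compiler, for the same reason explained above for the hand-written implementation.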
