© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
C. Milanesi, Beginning Rust, https://doi.org/10.1007/978-1-4842-7208-4_18

18. Data Encapsulation

Carlo Milanesi, Bergamo, Italy

In this chapter, you will learn:
  • Why the dot notation for calling a function is more convenient than the functional notation

  • How to use the impl and self keywords to declare functions that can be invoked using the dot notation

  • How to encapsulate function declarations in modules, selecting which functions declared in a module should be accessible from other modules

  • How to create a hierarchy of modules, and how to access any function in such hierarchy

  • How to define type aliases

The Need for Methods

We already saw that there are two possible notations to invoke a function: f(x, y) and x.f(y). The first one is the functional notation, and the second one is the dot notation. Previously, we invoked some functions of the standard library using the functional notation, like String::new() or std::fs::File::open("data.txt"), and other functions using the dot notation, like "abcd".len() or vec![0u8; 0].push(7u8). The dot notation is similar to the notation used to access the fields of tuples, tuple-structs, and structs.

Not every function can be invoked using the dot notation. For example, the String::new and File::open functions can only be invoked using the functional notation.

However, any function that may be invoked using the dot notation may also be invoked using the functional notation. Here is an example:
print!("{} {}",
    "abcd".to_string(),
    std::string::ToString::to_string("abcd"));

It will print: abcd abcd. First, the to_string function is invoked using the dot notation and then using the functional notation.

Here is another example:
print!("{} {}",
    [1, 2, 3].len(),
    <[i32]>::len(&[1, 2, 3]));

It will print: 3 3.

And yet another example:
let mut v1 = vec![0u8; 0];
let mut v2 = vec![0u8; 0];
v1.push(7);
Vec::push(&mut v2, 7);
print!("{:?} {:?}", v1, v2);

It will print: [7] [7]. First, two empty mutable vectors are created. Then, the byte 7 is pushed onto them using two different notations.

The dot notation is typical of the object-oriented programming paradigm, and this is not by chance. Rust allows this notation because it supports that paradigm to some extent.

The gist of object-oriented programming is having a data type, traditionally named class, which, in addition to containing data, has some specific behavior, implemented as functions associated to such type. Such associated functions are traditionally named methods. Considering a method call like a.f(b, c), we see that this method f is a function that has a privileged argument, a, and two normal arguments b and c. The privileged argument a is an instance of the type to which this method is associated. Such argument is considered the current object of the associated type, therefore it is usually simply called this or self.

When the dot notation is transformed into the functional notation, the current object becomes an additional first argument. In this example, it would become f(a, b, c). However, the current object must be decorated with the reference symbol (&) or the mutation keyword (mut), or both, whenever the function requires them, like in f(&mut a, b, c). With the dot notation, such decorations are implicit.
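
As a minimal sketch of these implicit decorations, the following program calls the same standard library functions, String::len and String::push_str, using both notations. The demo function is a hypothetical helper, introduced here only for illustration.

```rust
// Hypothetical helper: calls the same functions with both notations
// and returns the results so that they can be compared.
fn demo() -> (usize, usize, String, String) {
    let s = String::from("abcd");
    // Dot notation: the reference to `s` is taken implicitly.
    let n1 = s.len();
    // Functional notation: the `&` must be written explicitly.
    let n2 = String::len(&s);

    let mut t1 = String::from("ab");
    let mut t2 = String::from("ab");
    // Dot notation: the mutable reference is taken implicitly.
    t1.push_str("cd");
    // Functional notation: `&mut` must be written explicitly.
    String::push_str(&mut t2, "cd");

    (n1, n2, t1, t2)
}

fn main() {
    let (n1, n2, t1, t2) = demo();
    print!("{} {} {} {}", n1, n2, t1, t2);
}
```

Both notations should give the same results; only the way the current object is passed differs.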

In addition, there is a scoping issue. In an application, there may be several functions having the same name. For example, the standard library contains several functions named to_string, len, and push. Using the dot notation, the proper method is automatically chosen, because it is the only one having that name among those associated with the type of the current object. Instead, using the functional notation, the scope of the function must be written explicitly. In the preceding examples, the to_string function is in the std::string::ToString scope, the len function is in the <[i32]> scope, and the push function is in the Vec scope. If you omit the scope specifier in the first example, writing simply to_string("abcd"), you get the compilation error: cannot find function `to_string` in this scope.
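
The following sketch illustrates the scoping issue with three distinct standard library functions that are all named len. The lengths function is a hypothetical name used only for this illustration.

```rust
// Hypothetical helper: invokes three different functions named `len`,
// each one qualified by the scope in which it is declared.
fn lengths() -> (usize, usize, usize) {
    let text = "abcd";
    let array = [1, 2, 3];
    let vector = vec![10u8, 20];
    (
        str::len(text),       // `len` declared for string slices
        <[i32]>::len(&array), // `len` declared for slices of `i32`
        Vec::len(&vector),    // `len` declared for vectors
    )
}

fn main() {
    let (a, b, c) = lengths();
    print!("{} {} {}", a, b, c);
}
```

With the dot notation, the three calls would be written text.len(), array.len(), and vector.len(), and the compiler would choose the proper function from the type of the current object.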

The dot notation and the underlying object-oriented programming paradigm appear to be so good in simplifying code, that you may wish to use them often, and in particular for the functions you declare. To that purpose, a specific syntax is required to declare methods.

Method Declarations

Let’s see how to implement the basics of object-oriented programming in Rust, by declaring methods for a user-defined datatype.

First, let’s define a type and a function that accesses objects of such type using the known functional notation:
struct Person {
    personal_names: String,
    family_names: String,
}
fn naming(p: Person) -> String {
    format!("{} {}",
        p.personal_names,
        p.family_names)
}
let person = Person {
    personal_names: "John".to_string(),
    family_names: "Doe".to_string(),
};
print!("{}", naming(person));

It will print: John Doe.

First, there is the declaration of the Person struct, and then the declaration of the naming function, which receives an instance of such type by value. Then, an instance of the Person type is created, and it is passed to an invocation of the naming function.

Now, assume that in the last line we would like to use the dot notation. We could try to replace the last line with this line:
print!("{}", person.naming());

Though, we would get the compilation error: no method named `naming` found for struct `Person` in the current scope.

To obtain the desired result, we must use the following method declaration syntax:
struct Person {
    personal_names: String,
    family_names: String,
}
impl Person {
    fn naming(self) -> String {
        format!("{} {}",
            self.personal_names,
            self.family_names)
    }
}
let person = Person {
    personal_names: "John".to_string(),
    family_names: "Doe".to_string(),
};
print!("{}", person.naming());

Here, the declaration of the naming function has been inserted into a block preceded by the clause impl Person. The impl keyword is shorthand for implementation. In Rust, the impl block is a construct designed to encapsulate the methods associated to the type specified just after the impl keyword.

A noticeable aspect of the impl feature is that, differing from most object-oriented languages, in Rust the data and the methods are separated in distinct blocks.

Notice also that the declaration of the naming function has changed. In the argument list, instead of p: Person, there is the self keyword. It is a special argument that represents the current object, on which the method will be applied. This method is declared for the Person type, so here the type of self is implicitly Person. You can specify such type explicitly, like in this line:
fn naming(self: Person) -> String {

In the body of the function, the two occurrences of p have been replaced by self, which represents the method argument, and so also the current object.

After such definition, it is possible to use the dot notation, in the expression person.naming(). It is still possible also to use the functional notation, in this equivalent expression: Person::naming(person).
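
As a small sketch of this equivalence, the following program invokes the naming method with both notations. The john_doe function is a hypothetical helper, needed here because naming receives self by value and therefore consumes the instance on which it is called.

```rust
struct Person {
    personal_names: String,
    family_names: String,
}

impl Person {
    fn naming(self) -> String {
        format!("{} {}", self.personal_names, self.family_names)
    }
}

// Hypothetical helper that builds a fresh sample instance on every call.
fn john_doe() -> Person {
    Person {
        personal_names: "John".to_string(),
        family_names: "Doe".to_string(),
    }
}

fn main() {
    // Dot notation: the current object precedes the method name.
    let a = john_doe().naming();
    // Functional notation: the current object is the first argument,
    // and the scope `Person::` must be written explicitly.
    let b = Person::naming(john_doe());
    print!("{} | {}", a, b);
}
```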

Object-oriented languages are traditionally classified as pure and hybrid:
  • In a pure object-oriented language, like Smalltalk or Ruby, every function is associated to a class, and there is no type that cannot have methods.

  • In a hybrid object-oriented language, like C++ and Python, there are functions not associated to any class, and only the types defined as classes (or structs) can have methods.

According to this classification, Rust is intermediate between these two categories. For Rust, there are functions not associated to any class, like in hybrid languages, but methods can be added to any Rust type having a name, even to primitive types.

Let’s see some examples:
struct Person (String, u32);
#[allow(dead_code)]
enum Visibility { Visible, Hidden, Collapsed }
impl Person {
    fn age(&self) -> u32 {
        self.1
    }
}
impl Visibility {
    fn is_not_visible(&self) -> bool {
        match self {
            Visibility::Visible => false,
            _ => true,
        }
    }
}
print!("{} ", Person ("John".to_string(), 30).age());
print!("{}", Visibility::Collapsed.is_not_visible());

It will print: 30 true.

First, two types have been defined, a tuple-struct and an enum.

Then, the age method has been declared for the tuple-struct, and the is_not_visible method has been declared for the enum.

Then, such types are instantiated, and their methods are invoked on such instances.

There are some types without a name, like tuples and closures, and they cannot have methods.

Primitive types and types imported from the standard library or from third-party libraries can have methods, too. For example, we have already seen the expression "abcd".to_string(). Though, the syntax that we have seen for declaring methods is valid only for types defined in your own code. In a future chapter we will see how to add methods to primitive types or to types declared in external libraries.

The self and Self Keywords

The Rust self keyword is similar to the this keyword of C++, C#, Java, or to the self keyword of other languages. However, Rust has some differences with such languages:
  • The self special argument is not implied in the declaration of the method. If you need it, you must specify it in the signature.

  • self is not implied when accessing the current object. If you are accessing a field or a method of that object, you must specify self before the name of the field or of the method.

  • self is not a pointer or reference. It receives the current object argument by value. If you need an immutable reference, you should write &self in the signature. If you need a mutable reference, you should write &mut self in the signature.
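
The three ways of receiving the current object can be sketched as follows. The Counter type, its methods, and the run function are hypothetical names chosen only for this illustration.

```rust
// Hypothetical type used only to show the three kinds of `self` argument.
struct Counter {
    count: u32,
}

impl Counter {
    // `&self`: reads the current object through an immutable reference.
    fn get(&self) -> u32 {
        self.count
    }

    // `&mut self`: changes the current object through a mutable reference.
    fn increment(&mut self) {
        self.count += 1;
    }

    // `self`: receives the current object by value, consuming it.
    fn into_count(self) -> u32 {
        self.count
    }
}

fn run() -> (u32, u32) {
    let mut c = Counter { count: 0 };
    c.increment();
    c.increment();
    let seen = c.get();
    let last = c.into_count();
    // `c` has been moved by `into_count`, so using it again here
    // would cause a compilation error.
    (seen, last)
}

fn main() {
    let (seen, last) = run();
    print!("{} {}", seen, last);
}
```

Notice that every access to the field is written self.count, and that each method names its self argument explicitly in its signature.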

The type of the self expression is implicitly the type specified just after the impl keyword. Though, to avoid repeating such name, Rust has another keyword: Self. Remember that Rust is case sensitive, so self and Self are two different keywords. Self represents the type of self.

Here is an example that shows some other Rust features regarding methods:
struct Person {
    personal_names: String,
    family_names: String,
}
impl Person {
    fn new() -> Self {
        Self {
            personal_names: String::new(),
            family_names: String::new(),
        }
    }
    fn naming(&self) -> String {
        format!("{} {}",
            self.personal_names,
            self.family_names)
    }
}
impl Person {
    fn set_personal_names(&mut self, new_name: String) {
        self.personal_names = new_name;
    }
}
let mut person = Person::new();
print!("[{}] ", person.naming());
person.personal_names = "John".to_string();
person.family_names = "Doe".to_string();
print!("[{}] ", person.naming());
person.set_personal_names("Jane".to_string());
print!("[{}]", person.naming());

It will print: [ ] [John Doe] [Jane Doe].

First of all, notice that there are two impl blocks, both for the Person type. The first block declares the new and naming functions, and the second block declares the set_personal_names function. So, it is possible to define several functions in a block, but it is also possible to split the set of function declarations into several blocks.

In this case, there was no point in having two impl blocks, but in more complex applications it may be useful to be able to add functions to a type in several parts of the code base.

Then, notice that the new function has no arguments, not even the self argument. A function declared in an impl block, but without a self argument, is actually not a method but an associated function. An associated function cannot access an instance of the type to which it is associated, because there is no such current object for it. Keeping it inside an impl block is just an encapsulation choice; it means that this function is considered to be strictly related to the currently implemented type, so it is available only in such scope.

Rust associated functions correspond to static or class methods in C++, C#, Java, and Python, while Rust methods correspond to instance methods in such languages. However, Rust has no notion of static data members. This concept, present in many other object-oriented languages, is that of a variable present in a single instance in the entire program, but declared inside the scope of a class, and therefore accessible with no further specification only from the class methods or the instance methods of that class. Rust does not allow declaring variables inside impl blocks (except the local variables of methods), and it does not allow marking the members of a type as static.

A typical use of associated functions is to construct new instances of the implemented type. In Rust you can create any object without calling any function, but you can also encapsulate such creation in an associated function, like the new function in the example.

In C++, C#, and Java, there is the new keyword, and there is the rule that constructors must have the same name as the class. In Rust there are no such features, but there is the convention of naming new an associated function that has no arguments and that returns an instance of type Self. Such a function plays the role of a default constructor.

The second method shown in the preceding example is naming. Notice that it receives self by immutable reference, to avoid getting it by value. That method corresponds to a const method in C++, because it receives the current object, but it cannot mutate it.

The third method shown in the example is set_personal_names. It needs to change the current object, so it receives the argument &mut self. That method corresponds to a non-const method in C++, because it receives the current object, and it can mutate it.

The mod and the pub Keywords

Those who already know object-oriented programming will find it strange that so far the words private or public have never been mentioned. This is because the concepts of privacy are handled only by the Rust module system. Rust modules are similar to namespaces in other languages.

When a program is very small, all the code can easily be put into just one source file, and in just one module. But, when you have to manage a large code base, there is the need to split the code into several source files, and into several modules too.

Here is an example of a module declared and used in a single source file:
mod routines {
    fn f() -> u32 { g() }
    fn g() -> u32 { 123 }
}
print!("{}", f());

This program first uses the mod keyword to declare a module named routines, containing two function declarations, and then it tries to invoke one of those functions.

If you compile it, though, you get the error: cannot find function `f` in this scope. This happens because the f function is declared inside a block, and that block defines a distinct scope. Such scope is not automatically accessed from outside it. So, the previous code is actually similar to this:
{
    fn f() -> u32 { g() }
    fn g() -> u32 { 123 }
}
print!("{}", f());

While identifiers defined in an anonymous block can never be accessed from outside such a block, identifiers defined in a module can be accessed from outside that module, as shown by the following program:
mod routines {
    pub fn f() -> u32 { g() }
    fn g() -> u32 { 123 }
}
print!("{}", routines::f());

This program will be compiled, and it will print: 123.

There are two changes, with respect to the previous program:
  • The declaration of the f function is preceded by the pub keyword.

  • The invocation of the f function is preceded by the scope specification routines::.

Every identifier declared in a module is accessible to every part of that module. Before this section, we always used only the anonymous global module. This explains why we had little problem accessing identifiers declared by our program.

However, by default, every identifier declared in a module is not accessible (i.e., it is private) to other modules. To let other modules access an identifier, a module must prefix its declaration with the pub keyword, which is shorthand for public. We needed to allow the last line of the program to access the f function, so we had to make that function public. Instead, the g function is accessed only from inside the module, so it can remain private. There is no way to specify that an identifier is private; it is just the default.

The f function is public, and so it can be accessed by any module. However, when accessed from other modules, its scope path must be specified, as we did in the last statement.
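
The same privacy rules can be combined with the method declarations seen earlier in this chapter. The following is only a sketch, under the assumption that we want a type whose data can be changed only through its methods; the accounts module, the Account type, and all the function names are hypothetical. It relies on the fact that the pub keyword can also prefix type declarations and method declarations, and that struct fields without pub are private to the module that declares them.

```rust
mod accounts {
    // The type is public, but its field is private to this module.
    pub struct Account {
        balance: u32,
    }

    impl Account {
        // Public associated function, callable from other modules.
        pub fn new() -> Self {
            Self { balance: 0 }
        }

        // Public method that changes the private field.
        pub fn deposit(&mut self, amount: u32) {
            self.balance = add_amount(self.balance, amount);
        }

        // Public method that reads the private field.
        pub fn balance(&self) -> u32 {
            self.balance
        }
    }

    // Private helper function, accessible only inside this module.
    fn add_amount(current: u32, amount: u32) -> u32 {
        current.saturating_add(amount)
    }
}

fn total_after_deposits() -> u32 {
    let mut account = accounts::Account::new();
    account.deposit(100);
    account.deposit(23);
    // Writing `account.balance = 0;` here would be a compilation error,
    // because the field is private to the `accounts` module.
    account.balance()
}

fn main() {
    print!("{}", total_after_deposits());
}
```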

A module can be declared inside another module, like in this example program, in which the main function is explicitly written:
fn f() {
    print!("f ");
    g();
    m::f();
    m::m::f();
}
fn g() { print!("g "); }
mod m {
    pub fn f() {
        print!("1.f ");
        g();
        m::f();
        super::g();
    }
    fn g() { print!("1.g "); }
    pub mod m {
        pub fn f() {
            print!("2.f ");
            g();
            super::g();
            super::super::g();
            crate::g();
        }
        fn g() { print!("2.g "); }
    }
}
fn main() {
    f();
}

It will print: f g 1.f 1.g 2.f 2.g 1.g g g g 2.f 2.g 1.g g g .

In the global module of this program, there are the declarations of an f function, a g function, an m module, and the main function.

In the m module there are declarations of an f function, a g function, and an m module. Of course, such functions and such a module are different from those declared in the global module. To name them in an unambiguous way, we can use the specifications: m::f, m::g, and m::m.

In the m::m module there are declarations of an f function and a g function. To name them in an unambiguous way, we can use the specifications: m::m::f, and m::m::g.

The main function calls the f function at the same level, so such function does not need to be marked as public to be accessible.

The f function calls the g function at the same level, so such function does not need to be marked as public to be accessible.

Then, the f function calls the m::f function in a nested module, so the function needs to be marked as public to be accessible. The m module is at the same level of the f function, so it does not need to be marked as public.

Then, the f function calls the m::m::f function in a doubly nested module, so the function and also the m::m module that contains it need to be marked as public.

In the m::m::f function, we want to call three functions named g:
  • The one at the same nesting level, which we name m::m::g, can be called simply with the expression g().

  • The one up one level, which we name m::g, can be called with the expression super::g(); the super Rust keyword means to move up the hierarchy by one level.

  • The one up two levels, which we name simply g, can be called with the expression super::super::g(). Alternatively, we can use an absolute pathname, starting from the top global module, using the expression crate::g(); the crate Rust keyword is a reference to the global module.

Notice that every statement can access any identifier declared in any containing module. The pub keyword is needed only to access identifiers declared in inner modules.
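
The path rules just described can be condensed into a smaller sketch, in which every function returns a string instead of printing it. The names outer, inner, name, and all_names are hypothetical, chosen only for this illustration.

```rust
fn name() -> &'static str { "top" }

mod outer {
    // Private, yet accessible from the nested module below.
    fn name() -> &'static str { "outer" }

    pub mod inner {
        fn name() -> &'static str { "inner" }

        // Collects the results of the functions called `name`
        // that are reachable from this module.
        pub fn all_names() -> [&'static str; 4] {
            [
                name(),               // same module
                super::name(),        // one level up
                super::super::name(), // two levels up
                crate::name(),        // absolute path from the global module
            ]
        }
    }
}

fn main() {
    print!("{:?}", outer::inner::all_names());
}
```

Only the inner module and its all_names function are marked as public, because they are the only items accessed from a containing module.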

We already saw such nested notation in expressions like std::fs::File::open("data.txt"). This expression means that we want to access the std global module, the fs public module declared inside it, the public File type declared inside it, and the public open associated function declared for this type.

The type Keyword

Now, let’s see another feature of Rust that allows you to decouple design choices from code.

Say you want to write a portion of code that now uses the f32 type, but in the future it could use the f64 type or some other type. If you intersperse your code with the f32 keyword, when you want to switch to the f64 type you should search and replace all those occurrences, and that is time-consuming and error-prone.

A possible solution is to encapsulate your code in a generic function having such numeric type as parametric type. Though, if that code is just a portion of a function or, conversely, if it spans several functions, that solution is inconvenient.

This situation is similar to the use of literals. It is well known that instead of writing magic literals inside your code, it is better to define named constants, and use those constants inside your code. In this way the purpose of your code becomes clearer, and when you want to change the value of a constant, you change only one line.

Similarly, instead of writing:
fn f1(x: f32) -> f32 { x }
fn f2(x: f32) -> f32 { x }
let a: f32 = 2.3;
let b: f32 = 3.4;
print!("{} {}", f1(a), f2(b));

it is better to write:
type Number = f32;
fn f1(x: Number) -> Number { x }
fn f2(x: Number) -> Number { x }
let a: Number = 2.3;
let b: Number = 3.4;
print!("{} {}", f1(a), f2(b));

Both source programs generate the same executable program, which will print: 2.3 3.4. But the second one begins with an additional statement. The type keyword introduces a type alias. It simply means that whenever the word Number is used as a type, it means the f32 type. Indeed, in the rest of the program every one of the six occurrences of the word f32 has been replaced by the Number word, which has just the same meaning.

The corresponding construct in the C language uses the typedef keyword.

Such constructs do not introduce a distinct type; they introduce just a new name for the same type. This implies that the following code is valid:
type Number = f32;
let a: Number = 2.3;
let _b: f32 = a;

The _b variable is of f32 type, and it is initialized by the value of the a variable, so a must also be of f32 type. But a is declared to be of the Number type, so Number and f32 must be the same type.

Using the type construct has at least two advantages:
  • The purpose of a type may become clearer if you use a meaningful name instead of a primitive type.

  • If, in the previous program, you later decide to use the f64 type everywhere instead of the f32 type, you need to change only one occurrence instead of six occurrences.
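
As a sketch of the second advantage, here is a variant of the previous idea in which the alias has been switched to f64 by editing a single line. The half and double functions are hypothetical, introduced only for this illustration.

```rust
// Changing this single line switches every use of `Number`
// in the program to another type.
type Number = f64;

fn half(x: Number) -> Number { x / 2.0 }
fn double(x: Number) -> Number { x * 2.0 }

fn main() {
    let a: Number = 3.0;
    // An alias is not a distinct type, so a `Number` value
    // can initialize an `f64` variable.
    let b: f64 = double(a);
    print!("{} {}", half(a), b);
}
```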
