© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
C. Milanesi, Beginning Rust, https://doi.org/10.1007/978-1-4842-7208-4_10

10. Defining Generic Functions and Types

Carlo Milanesi, Bergamo, Italy

In this chapter, you will learn:
  • How to write a single function definition, whose invocations can efficiently handle different data types

  • How to use type inference to avoid the need to specify the types used by generic functions

  • How to write a single struct, tuple-struct, or enum type, whose instances can efficiently contain different data types

  • How to use two important standard generic enums, Option and Result, to represent optional data or the results of fallible functions

  • How some standard functions ease the handling of Option and Result

Need of Generic Functions

Rust performs a strict data type check, so when you define a function that uses an argument of a certain type, say fn square_root(x: f32) -> f32, the code that invokes such a function must pass to it an expression of exactly that type, like in square_root(45.2f32), or it must perform explicit conversions every time that function is used, like in square_root(45.2f64 as f32). You cannot pass a different type, like in square_root(45.2f64).

This is inconvenient for those who write the code that invokes the function, but also for whoever writes the function itself. As Rust has many different numeric types, when you write a function, you must cope with the problem of which type to choose. For example, if you decide to specify that an argument of your function must be of i16 type, but then in a program you must call this function many times, and in most such calls you want to pass an expression of i32 type, you must perform many type conversions. Such type conversions are verbose, and they may incur a performance overhead.
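As a minimal sketch of this inconvenience, the following program defines a square_root function with the signature mentioned above (its body, based on the standard sqrt method, is an assumption made only for illustration) and calls it with arguments of two different types:

```rust
// Hypothetical implementation of the function named in the text.
fn square_root(x: f32) -> f32 {
    x.sqrt()
}

fn main() {
    let exact: f32 = 16.0;
    let wide: f64 = 16.0;
    // Same type as the parameter: accepted as it is.
    println!("{}", square_root(exact));
    // Different type: an explicit conversion is required at every call.
    println!("{}", square_root(wide as f32));
    // square_root(wide); // would not compile: expected `f32`, found `f64`
}
```

Both calls print 4, but the second one needs the as f32 conversion each time it is written.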

In addition, consider this code:
// Library code
fn f(ch: char, num1: i16, num2: i16) -> i16 {
    if ch == 'a' { num1 }
    else { num2 }
}
// Application code
print!("{}", f('a', 37, 41));

It will print: 37.

The first part of the code is a definition of a function, designed to be invoked in several places, so this section is nicknamed library code.

The second part is an invocation of that function, so this section is nicknamed application code.

In the application code, if you replace 37 with 37.2, and 41 with 41., you would get a compilation error; moreover, if you add as i16 after each number, obtaining the statement
print!("{}", f('a', 37.2 as i16, 41. as i16));

the program would still print 37 instead of the desired 37.2.

If you decide to change the f function, that is, the library code, replacing i16 with f32 or with f64, the program will work correctly in all the preceding cases, but will force all the callers to use floating-point numbers.

As far as we have seen, the only way for the library writer to satisfy all the application developers using that library code is to write several functions, one for each supported type. As Rust has many primitive data types, and other types may be defined by libraries and by application code, many similar functions should be written.
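To make the problem concrete, here is a sketch of what such duplicated library code could look like for just two types. The names f_i16 and f_f64 are invented for this illustration:

```rust
// One hand-written copy of the same logic for each supported type.
fn f_i16(ch: char, num1: i16, num2: i16) -> i16 {
    if ch == 'a' { num1 } else { num2 }
}

fn f_f64(ch: char, num1: f64, num2: f64) -> f64 {
    if ch == 'a' { num1 } else { num2 }
}

fn main() {
    // The caller must pick the right copy by name.
    println!("{} {}", f_i16('a', 37, 41), f_f64('b', 37.2, 41.1));
}
```

This prints 37 41.1, but the two bodies are identical, and every further type would need one more copy.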

Defining and Using Generic Functions

The idiomatic way to solve this problem in Rust is to write the following code:
// Library code
fn f<T>(ch: char, num1: T, num2: T) -> T {
    if ch == 'a' { num1 }
    else { num2 }
}
// Application code
let a: i16 = f::<i16>('a', 37, 41);
let b: f64 = f::<f64>('b', 37.2, 41.1);
print!("{} {}", a, b);

This will print 37 41.1.

In the function definition, just after the name of the function, there is the T word enclosed in angle brackets. This symbol is a type parameter of the function declaration.

It means that what is being declared is not a concrete function, but a generic function, which is parameterized by the T type parameter. That function will become a concrete function only when, still at compile time, a concrete type is specified for such T parameter.

The T parameter is defined only in the scope of the function definition. Indeed it is used three times, only in the signature of the function. It also could have been used in the body of the function, but not elsewhere.

While the ch argument is of char type, the num1 and num2 arguments, as well as the function returned value, are of the T generic type. When such a function is used, you must replace such T parameter with a concrete type, so obtaining a concrete function.

The first line of the application code, instead of using the f generic function, uses a function whose name is f::<i16>; this is the name of the concrete function obtained from the generic function by replacing the T parameter with the i16 type. Similarly, the second line of the application code invokes the f::<f64> function; this is the concrete function obtained by replacing the T parameter with the f64 type.

Notice that in the first call, in which the i16 type has been specified in angle brackets, two integer values, which may be constrained to the i16 type, are passed as second and third arguments of the function; and the value returned by the function is assigned to a variable having type i16.

Instead, in the second call, in which the f64 type has been specified, two floating-point values, which may be constrained to the f64 type, are passed as second and third arguments of the f generic function; and the value returned by the function is assigned to a variable having type f64.

If you swapped the arguments in such function calls, passing integers to the function that specifies a floating-point parameter, or passing floating-point numbers to the function that specifies an integer parameter, you would get "mismatched types" compilation errors.

In such a way, by writing library code without useless repetitions, it is possible to write application code that uses two distinct types without having to change the existing library code. Other data types could be used just as well.
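As a quick sketch of that last remark, the same library function can be used, unchanged, with types that are not numeric at all. The particular types chosen here are only examples:

```rust
// Library code, identical to the generic function shown above.
fn f<T>(ch: char, num1: T, num2: T) -> T {
    if ch == 'a' { num1 } else { num2 }
}

fn main() {
    // Application code using a string slice, a Boolean, and a tuple.
    let s: &str = f::<&str>('a', "first", "second");
    let flag: bool = f::<bool>('b', true, false);
    let pair: (u8, char) = f::<(u8, char)>('a', (1, 'x'), (2, 'y'));
    println!("{} {} {:?}", s, flag, pair);
}
```

This should print first false (1, 'x').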

C language does not allow generic functions, but C++ language allows them: they are the function templates.

Inferring the Parametric Types

The preceding application code may be further simplified, though, as shown in this program:
// Library code
fn f<T>(ch: char, num1: T, num2: T) -> T {
    if ch == 'a' { num1 }
    else { num2 }
}
// Application code
let a: i16 = f('a', 37, 41);
let b: f64 = f('b', 37.2, 41.1);
print!("{} {}", a, b);

As you can see, the ::<i16> and ::<f64> clauses have been removed, yet the program is equivalent. Here, when the compiler sees that the f function is invoked with 37 and 41 as arguments, in places where expressions of the T generic type are expected, and that the returned value initializes a variable of i16 type, it infers that T is i16. Without that annotation, the two unconstrained integer literals would default to i32. Likewise, when the function is invoked with 37.2 and 41.1 as arguments, and its result initializes a variable of f64 type, it infers that T is f64.

Indeed, when the compiler processes an invocation of a generic function, it infers the type parameters from the types of the function arguments and from the type expected for the returned value.
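The following sketch shows three places from which the compiler may take the information it needs. It uses the standard size_of_val function only as a way to observe which type was chosen, by printing the number of bytes occupied by each result:

```rust
use std::mem::size_of_val;

fn f<T>(ch: char, num1: T, num2: T) -> T {
    if ch == 'a' { num1 } else { num2 }
}

fn main() {
    // T is fixed by the declared type of the receiving variable.
    let a: i16 = f('a', 37, 41);
    // T is fixed by the suffix on one of the literals.
    let b = f('a', 37u8, 41);
    // Nothing constrains the literals, so the integer default, i32, applies.
    let c = f('a', 37, 41);
    println!("{} {} {}", size_of_val(&a), size_of_val(&b), size_of_val(&c));
}
```

This prints 2 1 4, the sizes in bytes of i16, u8, and i32.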

Of course, the various types used must be consistent:
fn f<T>(a: T, _b: T) -> T { a }
let _a = f(12u8, 13u8);
let _b = f(12i64, 13i64);
let _c = f(12i16, 13u16);
let _d: i32 = f(12i16, 13i16);

This code generates a compilation error at the last-but-one statement, and two other errors at the last statement.

Indeed, the first statement defines a function expecting a first argument of any type, named T, but expecting a second argument of the same type, as it is also named T. Also the returned value must have the same type.

The first and second invocations of that function pass two numbers of the same type, so they are valid. But the third invocation passes two values of different types.

In the last statement, the two arguments have the same type, but the returned value is assigned to a variable of a different type.

If you need to parameterize a function with several values of different types, you can do that by specifying several type parameters:
fn f<Param1, Param2>(_a: Param1, _b: Param2) {}
f('a', true);
f(12.56, "Hello");
f((3, 'a'), [5, 6, 7]);

This program is valid, even if it does nothing.

In the second statement, the generic parameter Param1 is inferred to be the char type, and the generic parameter Param2 is inferred to be the bool type.
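When there are several type parameters, they can also be specified explicitly, all of them or only some. The following sketch uses a hypothetical pair function, introduced here only for illustration, that returns its two arguments packed in a tuple:

```rust
// Hypothetical helper: returns its two arguments packed in a tuple.
fn pair<Param1, Param2>(a: Param1, b: Param2) -> (Param1, Param2) {
    (a, b)
}

fn main() {
    // Both parameters are inferred.
    let p1 = pair('a', true);
    // Both parameters are specified explicitly.
    let p2 = pair::<f64, &str>(12.56, "Hello");
    // Only the second one is specified; `_` asks the compiler to infer the first.
    let p3 = pair::<_, u8>('z', 7);
    println!("{:?} {:?} {:?}", p1, p2, p3);
}
```

This should print ('a', true) (12.56, "Hello") ('z', 7).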

Defining and Using Generic Structs

Parametric types are useful also for declaring generic structs and generic tuple-structs:
#[allow(dead_code)]
struct S<T1, T2> {
    c: char,
    n1: T1,
    n2: T1,
    n3: T2,
}
let _s = S { c: 'a', n1: 34, n2: 782, n3: 0.02 };
struct SE<T1, T2> (char, T1, T1, T2);
let _se = SE ('a', 34, 782, 0.02);

The first statement declares the generic struct S, parameterized by the two types T1 and T2. The first one of such generic types is used by two fields, while the second one is used by only one field.

The second statement creates an object having a concrete version of such generic type. The parameter T1 is implicitly replaced by i32, because the two unconstrained integers 34 and 782 are used to initialize the two fields n1 and n2. The parameter T2 is implicitly replaced by f64, because an unconstrained floating-point number 0.02 is used to initialize the field n3.

The third and fourth statements are similar, but they use a tuple-struct instead of a struct.

Also for structs, the type parameter concretizations can be made explicit:
#[allow(dead_code)]
struct S<T1, T2> {
    c: char,
    n1: T1,
    n2: T1,
    n3: T2,
}
let _s = S::<u16, f32> { c: 'a', n1: 34, n2: 782, n3: 0.02 };
struct SE<T1, T2> (char, T1, T1, T2);
let _se = SE::<u16, f32> ('a', 34, 782, 0.02);

C language does not allow generic structs, but C++ language allows them: they are the class templates and the struct templates.
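As a further sketch, the concrete types can also be fixed by annotating the variable instead of the struct expression, and the fields are then read with the usual dot notation. The sum computed here is just an example of using fields that share the T1 type:

```rust
#[allow(dead_code)]
struct S<T1, T2> {
    c: char,
    n1: T1,
    n2: T1,
    n3: T2,
}

fn main() {
    // The annotation on the variable fixes T1 = u16 and T2 = f32,
    // so no `::<u16, f32>` clause is needed on the right-hand side.
    let s: S<u16, f32> = S { c: 'a', n1: 34, n2: 782, n3: 0.02 };
    // n1 and n2 have the same concrete type, so they can be added.
    let sum: u16 = s.n1 + s.n2;
    println!("{} {}", s.c, sum);
}
```

This prints a 816.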

Genericity Mechanics

To better understand how genericity works, you should take the role of the compiler and follow the process of compilation. Indeed, conceptually, generic code compilation happens in several stages.

Let’s follow the conceptual mechanics of compilation, applied to the following code, which includes the main function:
fn swap<T1, T2>(a: T1, b: T2) -> (T2, T1) { (b, a) }
fn main() {
    let x = swap(3i16, 4u16);
    let y = swap(5f32, true);
    print!("{:?} {:?}", x, y);
}

In the first stage, the source code is scanned, and every time the compiler finds a generic function declaration (in the example, the declaration of the swap function), it loads in its data structures an internal representation of such function, in all its genericity, checking only that there are no syntax errors in the generic code.

In the second stage, the source code is scanned again, and every time the compiler encounters an invocation of a generic function, it loads in its data structures an association between such usage and the corresponding internal representation of the generic declaration, of course after having checked that such correspondence is valid.

Therefore, after the first two stages in our example, the compiler has a generic swap function and a concrete main function, with this last function containing two references to the generic swap function.

In the third stage, all the invocations of generic functions are scanned (in the example, the two invocations of swap). For each of such usages, and for each generic parameter of the corresponding definition, a concrete type is determined. Such a concrete type may be specified explicitly, or (as in the example) it may be inferred from the type of the expression used as the argument of the function. In the example, for the first invocation of swap, the parameter T1 is associated to the i16 type, and the parameter T2 is associated to the u16 type; in the second invocation of swap, the parameter T1 is associated to the f32 type, and the parameter T2 is associated to the bool type.

After having determined the concrete type by which the generic parameters are to be replaced, a concrete version of the generic function is generated. In such a concrete version, every generic parameter is replaced by the concrete type determined for the specific function invocation, and the invocation of the generic function is replaced by an invocation of the just generated concrete function.

For the example, the generated internal representation corresponds to the following Rust code:
fn swap_i16_u16(a: i16, b: u16) -> (u16, i16) { (b, a) }
fn swap_f32_bool(a: f32, b: bool) -> (bool, f32) { (b, a) }
fn main() {
    let x = swap_i16_u16(3i16, 4u16);
    let y = swap_f32_bool(5f32, true);
    print!("{:?} {:?}", x, y);
}

As you can see, there are no more generic definitions or generic function invocations. The generic function definition has been transformed into two concrete function definitions, and the two function invocations now each invoke a different concrete function.

The fourth stage consists of compiling this code.

Notice that it was needed to generate two different concrete functions, as the two invocations of the generic swap function specified different types.

But this code:
fn swap<T1, T2>(a: T1, b: T2) -> (T2, T1) { (b, a) }
let x = swap('A', 4.5);
let y = swap('g', -6.);
print!("{:?} {:?}", x, y);
is internally translated to this code:
fn swap_char_f64(a: char, b: f64) -> (f64, char) { (b, a) }
let x = swap_char_f64('A', 4.5);
let y = swap_char_f64('g', -6.);
print!("{:?} {:?}", x, y);

Even if there are several invocations of the generic function swap, only one concrete version is generated, because all the invocations required the same types for the parameters.

In general, the compiler always applies the optimization of generating only one concrete version of a generic function declaration when several invocations specify exactly the same type parameters.

The fact that the compiler can generate, in a single program, several concrete versions of machine code corresponding to a single function has consequences:
  • This multistage compilation is somewhat slower, with respect to compiling nongeneric code.

  • The generated code is highly optimized for each specific invocation, as it uses exactly the types used by the caller, without needing conversions or decisions. Therefore, the runtime performance of each invocation is optimized.

  • If many invocations with different data types are performed for a generic function, a lot of machine code is generated, and that can impact performance because of bad code locality. To lessen this phenomenon, named code bloat, it may be better to reduce the number of distinct types used to call a generic function.

All that was said in this section about generic functions also holds for generic structs and tuple-structs.
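A way to observe that every concrete version works with its own types is to ask for their sizes. In the following sketch, the sizes function and the Pair struct are hypothetical names introduced only for this illustration, while size_of is a standard library function that returns the number of bytes occupied by a type:

```rust
use std::mem::size_of;

// Each concrete version of this function knows its own types at compile time.
fn sizes<T1, T2>(_a: T1, _b: T2) -> (usize, usize) {
    (size_of::<T1>(), size_of::<T2>())
}

// A generic struct: each concrete version has its own memory layout.
#[allow(dead_code)]
struct Pair<T> {
    first: T,
    second: T,
}

fn main() {
    println!("{:?}", sizes(3i16, 4u16)); // (2, 2)
    println!("{:?}", sizes(5f32, true)); // (4, 1)
    println!("{} {}", size_of::<Pair<u8>>(), size_of::<Pair<u64>>()); // 2 16
}
```

The two invocations of sizes return different results because they call two different concrete functions, and the two concrete versions of Pair occupy a different number of bytes.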

Generic Arrays and Vectors

Regarding arrays and vectors, there is nothing new. We have seen since Chapter 5, where they were introduced, that they are generic types.

Actually, while arrays are part of the Rust language, vectors are structs defined in the Rust standard library.
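As a brief reminder, the item type of an array or of a vector plays the role of a type parameter, and it can be inferred or specified just like the parameters seen so far. The values used here are arbitrary:

```rust
fn main() {
    // Arrays: the item type and the length are both part of the type.
    let a: [u8; 3] = [1, 2, 3];
    let b: [f64; 2] = [0.5, 1.5];
    // Vectors: Vec<T> is a generic struct of the standard library.
    let v1: Vec<char> = vec!['x', 'y'];
    // The type parameter can be specified explicitly, as for any generic type.
    let v2 = Vec::<i64>::new();
    println!("{:?} {:?} {:?} {:?}", a, b, v1, v2);
}
```

This should print [1, 2, 3] [0.5, 1.5] ['x', 'y'] [].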

Generic Enums

In Rust, even enums can be generic.
enum Result1<SuccessCode, FailureCode> {
    Success(SuccessCode),
    Failure(FailureCode, char),
    Uncertainty,
}
let mut _res = Result1::Success::<u32, u16>(12u32);
_res = Result1::Uncertainty;
_res = Result1::Failure(0u16, 'd');
The preceding program is valid, but the following one causes a compilation error at the last line:
enum Result1<SuccessCode, FailureCode> {
    Success(SuccessCode),
    Failure(FailureCode, char),
    Uncertainty,
}
let mut _res = Result1::Success::<u32, u16>(12u32);
_res = Result1::Uncertainty;
_res = Result1::Failure(0u32, 'd');

Here, the first argument of Failure is of type u32, while it should be of type u16, according to the initialization of _res, two lines before.

Generic enums are used a lot in the Rust standard library.
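A generic enum is typically consumed with a match statement, like any other enum. Here is a minimal sketch based on the Result1 type defined above; the describe function is a hypothetical helper, and it works on the concrete type Result1<u32, u16>:

```rust
#[allow(dead_code)]
enum Result1<SuccessCode, FailureCode> {
    Success(SuccessCode),
    Failure(FailureCode, char),
    Uncertainty,
}

// Hypothetical helper: builds a description of a concrete Result1 value.
fn describe(res: Result1<u32, u16>) -> String {
    match res {
        Result1::Success(code) => format!("success {}", code),
        Result1::Failure(code, ch) => format!("failure {} {}", code, ch),
        Result1::Uncertainty => String::from("uncertain"),
    }
}

fn main() {
    println!("{}", describe(Result1::Success(12)));
    println!("{}", describe(Result1::Failure(0, 'd')));
    println!("{}", describe(Result1::Uncertainty));
}
```

Notice that the literals 12 and 0 need no suffix here, because their types are determined by the signature of describe.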

The Option<T> Standard Enum

One of the most used enums defined in the Rust standard library solves the following common problem. If a function can fail, what should it do, when it fails?

For example, the function pop removes the last item from a vector, and returns the removed item, if that vector contains some items. But what should the expression vec![0; 0].pop() do? It is removing an item from an empty vector!

Some languages leave this behavior undefined, possibly leading to unpredictable results. Rust avoids undefined behavior as much as possible.

Some languages raise an exception, to be handled by an enclosing block or by the callers of the current function, or leading to a crash. Rust has a concept similar to exceptions: the panic. Usually, though, a panic is raised only in very exceptional situations, and it is prevented whenever possible.

Some languages return a specific null value. But a vector can contain almost any possible type, and many types have no null value.

Here is the Rust idiomatic solution:
let mut v = vec![11, 22, 33];
for _ in 0..5 {
    let item: Option<i32> = v.pop();
    match item {
        Some(number) => print!("{}, ", number),
        None => print!("#, "),
    }
}

This will print: 33, 22, 11, #, #, .

The v variable is a vector, initially containing three numbers.

The loop performs five iterations. Each of them tries to remove an item from v. If the removal is successful, the removed item is printed; otherwise the # character is printed.

The pop function applied to an object of Vec<T> type returns a value of Option<T> type.

Such a generic type is defined by the Rust standard library as this:
enum Option<T> {
    Some(T),
    None,
}

This enum means: “This is an optional value of T type. It has the option of being a T, and the option of being nothing. It can be something or nothing. If it is something, it is a T.”

Probably such a definition would have been clearer if it had been:
enum Optional<T> {
    Something(T),
    Nothing,
}

It should be thought of as such. However, Rust always tries to abbreviate names, so the previous definition is the valid one.

Getting back to the example, at the first iteration of the loop, the value of the variable item is Some(33); at the second iteration, it is Some(22); at the third iteration it is Some(11). Then the v vector has become empty, so pop can only return None, which is assigned to item at the fourth and fifth iterations.

The match statement discriminates when Some number has been popped, and when there was None. In the former case, that number is printed, and in the latter case, just a # is printed.
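The same technique can be applied to your own functions that may have nothing to return. As a sketch, the hypothetical first_negative function below searches a sequence of numbers, and it returns None when there is no negative number to return:

```rust
// Hypothetical function: returns the first negative number, if there is one.
fn first_negative(numbers: &[i32]) -> Option<i32> {
    for &n in numbers {
        if n < 0 {
            return Some(n);
        }
    }
    None
}

fn main() {
    match first_negative(&[3, -7, -2]) {
        Some(n) => println!("found {}", n),
        None => println!("no negative number"),
    }
    match first_negative(&[1, 2]) {
        Some(n) => println!("found {}", n),
        None => println!("no negative number"),
    }
}
```

This prints found -7 and then no negative number.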

The Result<T, E> Standard Enum

The Rust standard library also defines a generic enum to handle the case in which a function cannot return a value of the expected type, because of an error condition:
fn divide(numerator: f64, denominator: f64) -> Result<f64, String> {
    if denominator == 0. {
        Err(format!("Divide by zero"))
    } else {
        Ok(numerator / denominator)
    }
}
print!("{:?}, {:?}", divide(8., 2.), divide(8., 0.));

This will print: Ok(4.0), Err("Divide by zero").

The divide function should return the result of the division of the first number by the second number, but only if the second number is not zero. In this latter case, it should return an error message.

The Result type is similar to the Option type, but while the Option type represents as None the case of a missing result, the Result type can add a value that describes such an anomalous condition.

The definition of this generic enum in the standard library is:
enum Result<T, E> {
    Ok(T),
    Err(E),
}

In our example, T was f64, because that is the type resulting from the division of two f64 numbers, and E was String because we wanted to print a message.

We used the results of the invocations only to print them as debug information, using the {:?} placeholder. In a production program, though, that is not enough, because such results are normally used in further computations. This is obviously true for the success result (the Ok variant), but the failure result (the Err variant) can also be used, to display a clear explanation of how the error happened or to take a branch of the algorithm that handles such a case. More appropriate code would be the following:
fn divide(numerator: f64, denominator: f64) -> Result<f64, String> {
    if denominator == 0. {
        Err(format!("Divide by zero"))
    } else {
        Ok(numerator / denominator)
    }
}
fn show_divide(num: f64, den: f64) {
    match divide(num, den) {
        Ok(val) => println!("{} / {} = {}", num, den, val),
        Err(msg) => println!("Cannot divide {} by {}: {}",
            num, den, msg),
    }
}
show_divide(8., 2.);
show_divide(8., 0.);
This will print:
8 / 2 = 4
Cannot divide 8 by 0: Divide by zero
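As a sketch of using such results in further computations, the hypothetical sum_of_quotients function below calls divide twice. If either division fails, it passes the error message on to its own caller; otherwise it goes on computing with the two quotients:

```rust
fn divide(numerator: f64, denominator: f64) -> Result<f64, String> {
    if denominator == 0. {
        Err(format!("Divide by zero"))
    } else {
        Ok(numerator / denominator)
    }
}

// Hypothetical function: computes a / b + c / d, or forwards the first error.
fn sum_of_quotients(a: f64, b: f64, c: f64, d: f64) -> Result<f64, String> {
    let q1 = match divide(a, b) {
        Ok(val) => val,
        Err(msg) => return Err(msg),
    };
    let q2 = match divide(c, d) {
        Ok(val) => val,
        Err(msg) => return Err(msg),
    };
    Ok(q1 + q2)
}

fn main() {
    println!("{:?}", sum_of_quotients(8., 2., 9., 3.));
    println!("{:?}", sum_of_quotients(8., 0., 9., 3.));
}
```

This prints Ok(7.0) and then Err("Divide by zero").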

Enum Standard Utility Functions

The Option and Result standard generic types allow us to capture in a flexible and efficient way all the cases that happen in real-world code; however, using a match statement every time to get the result is quite verbose.

Therefore, the standard library contains some utility functions to ease the decoding of an Option or Result value:
fn divide(numerator: f64, denominator: f64) -> Result<f64, String> {
    if denominator == 0. {
        Err(format!("Divide by zero"))
    } else {
        Ok(numerator / denominator)
    }
}
let r1 = divide(8., 2.);
let r2 = divide(8., 0.);
println!("{} {}", r1.is_ok(), r1.is_err());
println!("{} {}", r2.is_ok(), r2.is_err());
println!("{}", r1.unwrap());
println!("{}", r2.unwrap());
This program first prints
true false
false true
4

and then panics with the message: thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: "Divide by zero"'.

The is_ok function returns true if it is applied to an Ok variant. The is_err function returns true if it is applied to an Err variant. As they are the only possible variants, is_err() is equivalent to !is_ok().

There are similar functions for the Option generic type:
let mut a = Some(12);
print!("{} {}; ", a.is_some(), a.is_none());
a = None;
print!("{} {}", a.is_some(), a.is_none());

It will print: true false; false true.

The unwrap function returns the value of the Ok variant, if it is applied to an Ok variant, and it panics otherwise. The meaning of this function is “I expect this value to be wrapped in an Ok variant, so I just want to get the contained value, getting rid of its wrapping; in the strange case it is not an Ok variant, an irrecoverable error has happened, so I want to immediately terminate the program.”

There is also an unwrap function for the Option enum. For example, to print all the values in a Vec, you can write:
let mut v = vec![11, 22, 33];
for _ in 0..v.len() {
    print!("{}, ", v.pop().unwrap())
}

This will print: 33, 22, 11, . The invocation of unwrap gets the number inside the Some variant returned by pop(). We avoided calling pop() on an empty vector; otherwise pop() would have returned None, and unwrap() would have panicked.

The unwrap function is much used in quick-and-dirty Rust programs, where it is OK that a possible error generates a panic. To create a robust application, in which errors are handled in a user-friendly way, any possible None or Err values must be properly handled.
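One simple way to handle such values without a match statement and without risking a panic is the standard unwrap_or function, available for both Option and Result, which returns a fallback value when there is nothing to unwrap. The fallback values used in this sketch are arbitrary choices made for illustration:

```rust
fn divide(numerator: f64, denominator: f64) -> Result<f64, String> {
    if denominator == 0. {
        Err(format!("Divide by zero"))
    } else {
        Ok(numerator / denominator)
    }
}

fn main() {
    // On an Ok variant, unwrap_or returns the contained value.
    println!("{}", divide(8., 2.).unwrap_or(0.));
    // On an Err variant, it returns the fallback instead of panicking.
    println!("{}", divide(8., 0.).unwrap_or(0.));
    // The same holds for Option: popping from an empty vector gives None.
    let mut v: Vec<i32> = Vec::new();
    println!("{}", v.pop().unwrap_or(-1));
}
```

This prints 4, 0, and -1. A fallback value is appropriate only when the program can really go on with it; otherwise, the Err or None case should be handled explicitly, for example with a match statement.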
