© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
C. Milanesi, Beginning Rust, https://doi.org/10.1007/978-1-4842-7208-4_7

7. Enumerations and Matching

Carlo Milanesi
Bergamo, Italy

In this chapter, you will learn:
  • How enums help in defining variables that can take on values only from a finite set of cases

  • How enums can be used to implement discriminated union types

  • How to use the match pattern-matching construct to handle enums

  • How to use the match construct to handle other data types, like integer numbers, strings, and single characters

  • How to define enumerations containing fields of data and how to match them with literals

  • How to define variables in patterns of match statements

  • How to use match expressions

  • How to use Boolean guards to generalize the pattern-matching of the match construct

  • How to use the if-let and the while-let constructs

Enumerations

Instead of writing the following code:
const EUROPE: u8 = 0;
const ASIA: u8 = 1;
const AFRICA: u8 = 2;
const AMERICA: u8 = 3;
const OCEANIA: u8 = 4;
let continent = ASIA;
if continent == EUROPE { print!("E"); }
else if continent == ASIA { print!("As"); }
else if continent == AFRICA { print!("Af"); }
else if continent == AMERICA { print!("Am"); }
else if continent == OCEANIA { print!("O"); }
It is better to write the following equivalent code:
#[allow(dead_code)]
enum Continent {
    Europe,
    Asia,
    Africa,
    America,
    Oceania,
}
let contin = Continent::Asia;
match contin {
    Continent::Europe => print!("E"),
    Continent::Asia => print!("As"),
    Continent::Africa => print!("Af"),
    Continent::America => print!("Am"),
    Continent::Oceania => print!("O"),
}

They both print: As.

The “enum” keyword introduces the new Continent type, whose name is specified just after it. Such a type is called enumerative, because it lists a set of items, internally associating a unique number with each item. In the example, the allowed values for the type Continent are Europe, Asia, Africa, America, and Oceania, which are represented internally by the values 0u8, 1u8, 2u8, 3u8, and 4u8, respectively. Such allowed values are named variants.
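The numbers associated with the variants can be observed through an explicit conversion with the as operator, which Rust allows for enums whose variants contain no data. Here is a minimal sketch; the discriminants function is a hypothetical name used only for illustration.

```rust
// The same data-less enum as in the example above.
#[allow(dead_code)]
enum Continent {
    Europe,
    Asia,
    Africa,
    America,
    Oceania,
}

// Hypothetical helper: collects the number associated with each variant,
// using explicit `as` casts (allowed only for enums without data).
fn discriminants() -> [u8; 5] {
    [
        Continent::Europe as u8,
        Continent::Asia as u8,
        Continent::Africa as u8,
        Continent::America as u8,
        Continent::Oceania as u8,
    ]
}

fn main() {
    // Prints: [0, 1, 2, 3, 4]
    print!("{:?}", discriminants());
}
```

Notice that such a cast must be written explicitly; it never happens implicitly.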

In the simplest cases, like the one just shown, this type is similar to the construct of the C language having the same name.

After having defined an enumerative type, it is possible to create objects of such type, named enumerations, or enums for short. In the example, the contin enum variable, of Continent type, has been defined.

The first line of the program is an attribute. We saw attributes in the section “Rust Attributes” in Chapter 5. Its purpose here is explained next.

The compiler includes a useful check: every variant should be constructed at least once in an application, as otherwise it is useless, and the compiler emits a warning about any variant that is never constructed. In the preceding program, only the Asia variant is constructed, so the compiler would warn about the four unused variants. To avoid such warnings, the attribute #[allow(dead_code)] has been added. The name of the attribute comes from the fact that the creation of an enum variant is a kind of code: for a variant that is never constructed, the compiler reckons that such code will never be run, so it is dead code, and we are warned of its existence.

Notice that the use of a variant must be qualified by the name of its type, like in Continent::Asia.

The following code
enum T {A, B, C, D}
let n: i32 = T::D;
let e: T = 1;

generates a compilation error at the second line, and another one at the third line, both of the kind mismatched types. The first error is described as expected `i32`, found enum `T`, and the second error is described as expected enum `T`, found integer. Therefore, enums cannot implicitly be converted to numbers, and numbers cannot implicitly be converted to enums.
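Explicit conversions are possible, though. Going from a data-less enum to a number can be done with the as operator, while going from a number to an enum requires writing some code. Here is a minimal sketch using a match statement; the t_from_i32 and name functions are hypothetical helpers, and Option is the standard library type whose value is either Some(x) or None.

```rust
// The enum from the example above.
#[allow(dead_code)]
enum T { A, B, C, D }

// Hypothetical helper: converts a number into a T, if the number
// corresponds to a variant; otherwise returns None.
fn t_from_i32(n: i32) -> Option<T> {
    match n {
        0 => Some(T::A),
        1 => Some(T::B),
        2 => Some(T::C),
        3 => Some(T::D),
        _ => None,
    }
}

// Hypothetical helper: returns the name of a variant, to show which one was obtained.
fn name(t: T) -> &'static str {
    match t {
        T::A => "A",
        T::B => "B",
        T::C => "C",
        T::D => "D",
    }
}

fn main() {
    let n: i32 = T::D as i32; // explicit conversion from enum to number
    print!("{} ", n); // prints 3
    match t_from_i32(1) {
        Some(t) => print!("{}", name(t)), // prints B
        None => print!("invalid"),
    }
}
```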

The match Construct

In the example introducing the enum construct, the newly created enum is used by a new kind of statement, beginning with the “match” keyword.

The match statement is the basic Rust tool for handling enumerations, similar to the switch statement of the C language, even though they differ in many respects.

In the first place, notice that the expression following the match keyword doesn’t have to be enclosed in parentheses.

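Then, the various cases, also called arms, are made of a pattern, followed by the symbol “=>” (named fat arrow), followed by an expression. Such arms are separated by commas, although the comma may be omitted after an arm whose right side is a block, as we will see shortly.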

Both in the declaration of an enumerative type and in a match statement, it is optional to put a comma after the last item. That is, you can write a comma after the last variant of an enum declaration, or you can omit it; and you can write a comma after the last arm of a match statement, or you can omit it. Usually, if each item is on a separate line, the comma is written, so that every line containing an item ends with a comma. However, the comma is usually omitted just before a closing brace on the same line, like in enum CardinalPoint { North, South, West, East }.

The behavior of our match statement is as follows.

First, the expression following match is evaluated, thereby obtaining a value, which in our example is Continent::Asia. Then, that value is compared with each one of the (five) patterns, in the order in which they are written. As soon as a pattern matches, the right side of its arm is evaluated, and the execution of the statement ends.

Notice that the right side of every arm must be a single expression. So far, we always used print! as if it were a statement, but actually it is an expression.

In fact, any valid expression becomes a valid statement, if you add a semicolon character after it. For example, consider this:
let a = 7.2;
12;
true;
4 > 7;
5.7 + 5. * a;
This code is valid, although, of course, it does nothing. Actually, for the last two statements, these warnings are generated: unused comparison that must be used, and unused arithmetic operation that must be used. This happens because the compiler thinks that such senseless expressions are probably programming errors. To avoid these warnings, write this code:
let a = 7.2;
12;
true;
let _ = 4 > 7;
let _ = 5.7 + 5. * a;

Given that an invocation of the print! macro is a valid expression, when we added a “;” character after it, we transformed it into a statement.

However, notice that there are statements that aren’t valid expressions. For example, “fn empty() {}” is a statement that isn’t a valid expression. If for an arm we wrote:
    Continent::Africa => fn empty() {},

we would have gotten the error message: expected expression, found keyword `fn`, because at the right of the fat arrow, there isn’t a valid expression.

And what can you do if, in an arm, you wish to execute a statement that isn’t also an expression, or several expressions? In such cases you can use a block:
#[allow(dead_code)]
enum Continent {
    Europe,
    Asia,
    Africa,
    America,
    Oceania,
}
let mut contin = Continent::Asia;
match contin {
    Continent::Europe => {
        contin = Continent::Asia;
        print!("E");
    },
    Continent::Asia => { let a = 7; print!("{}", a); }
    Continent::Africa => print!("Af"),
    Continent::America => print!("Am"),
    Continent::Oceania => print!("O"),
}

This will print: 7.

Here contin has been declared as mutable; then, if its value were Europe, it would be changed to Asia, and the letter E would be printed. Instead, if its value is Asia, as it is in this program, another variable is declared, initialized, printed, and then destroyed.

These two arms have a block as their right side, and, because any block is an expression, this syntax is valid.

Relational Operators and Enums

Enums declared like the ones seen so far are not comparable using the “==” operator. Actually, the following program is illegal:
enum CardinalPoint { North, South, West, East }
let direction = CardinalPoint::South;
if direction == CardinalPoint::North { }

For the last statement, the compiler generates the message “binary operation `==` cannot be applied to type `CardinalPoint`”. Consequently, to check the value of such an enum, you need a match statement, or a construct based on it, like the if-let construct described later in this chapter.
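When the same check is needed in several places, the match statement can be wrapped in a function. Here is a minimal sketch; is_north is a hypothetical helper, not a standard library function.

```rust
#[allow(dead_code)]
enum CardinalPoint { North, South, West, East }

// Hypothetical helper: tells whether a direction is North,
// using a match statement instead of the unavailable "==" operator.
fn is_north(direction: &CardinalPoint) -> bool {
    match direction {
        CardinalPoint::North => true,
        _ => false,
    }
}

fn main() {
    let direction = CardinalPoint::South;
    if is_north(&direction) {
        print!("north");
    } else {
        print!("not north");
    }
}
```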

Enums are important, as they are used in many places in the standard library and also in other Rust libraries. And the match construct is important, because it is needed to handle enums, even if it is often encapsulated in other constructs.

For such enums, not only is the “==” operator forbidden, but so are the other relational operators: “!=”, “<”, “<=”, “>”, and “>=”. Therefore, the following code will also generate a compilation error:
enum CardinalPoint { North, South, West, East }
if CardinalPoint::South < CardinalPoint::North { }
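If an ordering among variants is needed, one option is to map each variant to a number through a match statement, and then compare the numbers, which is allowed. Here is a minimal sketch; the clockwise_rank function and the clockwise order it encodes are hypothetical choices made only for this example.

```rust
#[allow(dead_code)]
enum CardinalPoint { North, South, West, East }

// Hypothetical helper: assigns a number to each direction, in clockwise
// order starting from North. The order is an arbitrary choice for this sketch.
fn clockwise_rank(direction: &CardinalPoint) -> u8 {
    match direction {
        CardinalPoint::North => 0,
        CardinalPoint::East => 1,
        CardinalPoint::South => 2,
        CardinalPoint::West => 3,
    }
}

fn main() {
    // Compares the numbers, which is allowed, instead of the enums, which is not.
    if clockwise_rank(&CardinalPoint::South) < clockwise_rank(&CardinalPoint::North) {
        print!("before");
    } else {
        print!("after"); // printed, because 2 is not less than 0
    }
}
```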

Handling All the Cases

If you try to compile the following program:
#[allow(dead_code)]
enum CardinalPoint { North, South, West, East }
let direction = CardinalPoint::South;
match direction {
    CardinalPoint::North => print!("NORTH"),
    CardinalPoint::South => print!("SOUTH"),
}

you will get the error: non-exhaustive patterns: `West` and `East` not covered. The compiler complains that among the allowed values for the expression direction, only two of them were considered, and the cases in which the expression’s value would be West or East weren’t considered. This happens because Rust requires that the match statement explicitly handles every possible case.

However, the following program is valid, as it considers all the possible values of the argument of match:
#[allow(dead_code)]
enum CardinalPoint { North, South, West, East }
let direction = CardinalPoint::South;
match direction {
    CardinalPoint::North => print!("NORTH"),
    CardinalPoint::South => print!("SOUTH"),
    CardinalPoint::East => {},
    CardinalPoint::West => {},
}
However, here the arms for the last two variants (East and West) do nothing, and it is annoying to list them anyway. To avoid listing all the variants that require no action, it is possible to use an underscore sign in the following way:
#[allow(dead_code)]
enum CardinalPoint { North, South, West, East }
let direction = CardinalPoint::South;
match direction {
    CardinalPoint::North => print!("NORTH"),
    CardinalPoint::South => print!("SOUTH"),
    _ => {},
}
The underscore pattern matches any value, so it avoids the compilation error, because in this way all cases are handled. Of course, such a “catch-all” case must be the last one, to avoid catching cases that should be handled differently:
#[allow(dead_code)]
enum CardinalPoint { North, South, West, East }
let direction = CardinalPoint::South;
match direction {
    CardinalPoint::North => print!("NORTH"),
    _ => {},
    CardinalPoint::South => print!("SOUTH"),
}

This program does not print anything, because the catch-all arm matches the value CardinalPoint::South before the arm specific to CardinalPoint::South is reached. The compiler notices that the last arm can never be reached, and it emits the warning: unreachable pattern.

The “_” pattern corresponds to the “default” case of the C language.

Using match with Numbers

The match construct, in addition to being needed with enums, is also usable and useful with other data types:
match "value" {
    "val" => print!("value "),
    _ => print!("other "),
}
match 3 {
    3 => print!("three "),
    4 => print!("four "),
    5 => print!("five "),
    _ => print!("other "),
}
match '.' {
    ':' => print!("colon "),
    '.' => print!("point "),
    _ => print!("other "),
}

This will print: other three point .

The first match statement has a string as its argument, so it expects strings as left sides of its arms. In particular, no arm matches exactly (the string “val” is different from “value”), so the default case is taken.

The second match statement has an integer number as its argument, so it expects integer numbers as left sides of its arms. In particular, the pattern of the first arm matches, so it is taken.

The third match statement has a character as its argument, so it expects single characters as left sides of its arms. In particular, the pattern of the second arm matches, so it is taken.

Also for match statements whose arguments aren’t enums, all possible cases must be handled. However, except for enums and Booleans, it is usually impractical to list every single value; therefore, in practice, the underscore “catch-all” case is used.
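For example, a match on a Boolean value can list both of its possible values, so it needs no underscore arm, while a match on a u8 value that lists only a couple of numbers needs one. Here is a minimal sketch; the yes_no and small_name functions are hypothetical names.

```rust
// A match on a bool: listing both true and false covers every case,
// so no underscore arm is needed.
fn yes_no(b: bool) -> &'static str {
    match b {
        true => "yes",
        false => "no",
    }
}

// A match on a u8: the catch-all arm is needed, because the
// other 254 possible values are not listed.
fn small_name(n: u8) -> &'static str {
    match n {
        0 => "zero",
        1 => "one",
        _ => "other",
    }
}

fn main() {
    print!("{} {}", yes_no(false), small_name(9)); // prints: no other
}
```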

Enumerations with Data

Rust enumerations aren’t always as simple as the one seen previously. Here is a more complex example:
#[allow(dead_code)]
enum Result {
    Success(u8),
    Failure(u16, char),
    Uncertainty,
}
// let outcome = Result::Success(1);
let outcome = Result::Failure(20, 'X');
match outcome {
    Result::Success(0) => print!("Result: 0"),
    Result::Success(1) => print!("Result: 1"),
    Result::Success(_) => print!("Result: other"),
    Result::Failure(10, 'X') => print!("Error: 10 X"),
    Result::Failure(10, _) => print!("Error: 10"),
    Result::Failure(_, 'X') => print!("Error: X"),
    Result::Failure(_, _) => print!("Error: other"),
    Result::Uncertainty => {},
}

It will print: Error: X.

Instead, if the commented-out line that declares the outcome variable is reactivated, and the next line is commented out, the program will print: Result: 1.

In this code, in the definition of the Result enumerative type, the first variant specifies a data type enclosed in parentheses (u8); the second variant specifies two data types in parentheses (u16 and char); and the third variant does not specify any data type (and does not use parentheses).

The effect of such a declaration is that every object having such a Result type, like the variable outcome in the example, can have one of the following values:
  • Result::Success, in which case it also contains a field of type u8;

  • Result::Failure, in which case it also contains a field of type u16 and a field of type char;

  • Result::Uncertainty, in which case it contains no further data.

There is no other possibility.

Therefore, with respect to the C language, Rust enumerative types combine the enum feature with the union feature.

You can see that to assign a value to the outcome variable, the name of the variant (Result::Success or Result::Failure) is specified, and such name is followed by some comma-separated values, enclosed in parentheses, like in a function call ((1) in the first case and (20, 'X') in the second case).

When you assign a value to an enumeration of such a type, you must specify in parentheses the arguments having the same types specified in the fields of that variant. In the example, in the Success case, an integer is passed; in the Failure case, an integer and a character are passed; and in the Uncertainty case, no parameter is passed. If you pass arguments of other types or in different numbers, you’ll get a compilation error.

The match statement extends the concept of pattern matching to the variant fields. You can see that in the example code there are eight different patterns. The first three of them require a Result::Success variant; the next four require a Result::Failure variant; and the last one requires a Result::Uncertainty variant.

These patterns contain some literals in parentheses. For example, the first arm contains the “0” literal. In case the value to match is a Success variant containing the “0” value, the pattern matches, so this arm is taken and the string “Result: 0” is printed. In case the value is different, this arm is skipped.

The second arm matches only the value Success(1). If the value to match is still different, this arm is skipped too. The third arm contains an underscore, which is a catch-all pattern, meaning that any Success value will match.

In case the value to match is a Failure variant, the fourth arm checks whether the values of the fields to match are 10 and 'X'. In case both fields match, the arm is taken.

Otherwise, the fifth arm is checked. It checks that the value of the first field is 10, and it allows any value for the second field. If the first field is not 10, this arm is skipped.

The sixth arm allows any value for the first field, but requires that the value of the second field is 'X'.

The seventh arm allows any value for both fields of the Failure variant.

The last arm is taken only for an Uncertainty variant.

So, every pattern must contain as many arguments as there are fields in the declaration of its variant. In addition, the types of the literals must be the same as those of the respective fields. Here too, doing otherwise causes compilation errors.

In addition, a compilation error is emitted if a possible case is not considered by any pattern.

The equivalent program in C language is this:
#include <stdio.h>
int main() {
    enum eResult {
        Success,
        Failure,
        Uncertainty
    };
    struct sResult {
        enum eResult r;
        union {
            unsigned char value;
            struct {
                unsigned short error_code;
                char module;
            } s;
        } u;
    } outcome;
    /*
    outcome.r = Success;
    outcome.u.value = 1;
    */
    outcome.r = Failure;
    outcome.u.s.error_code = 20;
    outcome.u.s.module = 'X';
    switch (outcome.r) {
        case Success:
            switch (outcome.u.value) {
                case 0:
                    printf("Result: 0");
                    break;
                case 1:
                    printf("Result: 1");
                    break;
                default:
                    printf("Result: other");
                    break;
            }
            break;
        case Failure:
            switch (outcome.u.s.error_code) {
                case 10:
                    switch (outcome.u.s.module) {
                        case 'X':
                            printf("Error: 10 X");
                            break;
                        default:
                            printf("Error: 10");
                            break;
                    }
                    break;
                default:
                    switch (outcome.u.s.module) {
                        case 'X':
                            printf("Error: X");
                            break;
                        default:
                            printf("Error: other");
                            break;
                    }
                    break;
            }
            break;
        case Uncertainty:
            break;
    }
    return 0;
}

As you can see, the C version is much more verbose.

match Statements with Variables in Patterns

We saw how to use literals in the patterns of match statements. Rust also allows the use of variables in these patterns, like in this example:
#[allow(dead_code)]
enum Result {
    Success(u8),
    Failure(u16, char),
    Uncertainty,
}
// let outcome = Result::Success(13);
let outcome = Result::Failure(20, 'X');
match outcome {
    Result::Success(0) => print!("Result: 0"),
    Result::Success(1) => print!("Result: 1"),
    Result::Success(n) => print!("Result: {}", n),
    Result::Failure(10, 'X') => print!("Error: 10 X"),
    Result::Failure(10, m) => print!("Error: 10 in module {}", m),
    Result::Failure(code, 'X') => print!("Error: n.{} X", code),
    Result::Failure(code, module) =>
        print!("Error: n.{} in module {}", code, module),
    Result::Uncertainty => {},
}

This will print: Error: n.20 X.

If the first declaration of the outcome variable is activated instead of the second one, the program will print: Result: 13.

In the match statement, the patterns of the third, fifth, sixth, and seventh arms contain identifiers that have not been declared before: n, m, code, and module. Such patterns declare those identifiers as new variables.

Their types are inferred from the types of the corresponding fields. So, the type of n is u8; the type of code is u16; and the type of m and of module is char.

Such variables always match the corresponding field, and, if the complete pattern matches, they are initialized with the matching value.

The scope of such variables is only the arm in which they are declared. So, if a variable with the same name exists before, it will be shadowed just for the body of the arm.
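The following minimal sketch shows such shadowing. The enum is named Outcome here, a hypothetical name chosen to avoid confusion with the standard library type named Result, and shadowing_demo is a hypothetical helper that returns both the value seen inside the arm and the value of the outer variable afterward.

```rust
#[allow(dead_code)]
enum Outcome {
    Success(u8),
    Failure(u16, char),
}

fn shadowing_demo() -> (u8, i32) {
    let n: i32 = 100; // outer variable
    let outcome = Outcome::Success(13);
    let inside = match outcome {
        // This n is a new u8 variable, which shadows the outer n only in this arm.
        Outcome::Success(n) => n,
        Outcome::Failure(_, _) => 0,
    };
    (inside, n) // here n is again the outer variable
}

fn main() {
    let (inside, outside) = shadowing_demo();
    print!("{} {}", inside, outside); // prints: 13 100
}
```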

match Expressions

Similarly to the if-expressions, there are also match-expressions:
#[allow(dead_code)]
enum CardinalPoint { North, South, West, East }
let direction = CardinalPoint::South;
print!("{}", match direction {
    CardinalPoint::North => 'N',
    CardinalPoint::South => 'S',
    _ => '*',
});

This will print: S.

We already saw that if the “if” keyword is used to create an if expression, it must also have an else block, and the type of the value of that else block must be the same as that of the block before the else keyword.

The same applies to match expressions: the right sides of all the arms of a match expression must have the same type. In the example, the three arms have the values 'N', 'S', and '*', so they are all of char type.

If the third arm were replaced by “_ => {},”, you would get the compilation error “match arms have incompatible types.” Indeed, as two arms are of char type, and one of empty tuple type, it is impossible to determine the type of the whole match expression.
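Being an expression, a match can also be the body of a function, or the initializer of a variable. Here is a minimal sketch; the initial function is a hypothetical name, and in each match all the arms have the same type (char in the first one, &str in the second one).

```rust
#[allow(dead_code)]
enum CardinalPoint { North, South, West, East }

// The value of the whole match expression becomes the return value of the function.
fn initial(direction: CardinalPoint) -> char {
    match direction {
        CardinalPoint::North => 'N',
        CardinalPoint::South => 'S',
        _ => '*',
    }
}

fn main() {
    // A match expression used to initialize a variable; all arms have type &str.
    let label = match initial(CardinalPoint::West) {
        '*' => "neither north nor south",
        _ => "north or south",
    };
    print!("{} {}", initial(CardinalPoint::South), label); // prints: S neither north nor south
}
```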

Use of Guards in match Constructs

Let’s assume we want to classify the integer numbers in the following categories: all the negative numbers, zero, one, and all the other positive numbers. Here is the code to perform such classification:
for n in -2..5 {
    println!("{} is {}.", n, match n {
        0 => "zero",
        1 => "one",
        _ if n < 0 => "negative",
        _ => "plural",
    });
}
This program will print:
-2 is negative.
-1 is negative.
0 is zero.
1 is one.
2 is plural.
3 is plural.
4 is plural.

The for statement iterates over the integer numbers from -2 (included) to 5 (excluded).

For each processed number, that number is printed, followed by its classification. Such classification is performed by the match construct, which here is used as an expression having a string as its value. Indeed, each of its arms has a literal string as its value.

The third arm differs from the others, though. Its pattern is an underscore, so such an arm should always match, but the pattern is followed by a clause consisting of the “if” keyword followed by a Boolean condition. Such a clause causes this arm to match only if that Boolean condition is true. It is named guard, as it guards the arm with an arbitrary Boolean condition, in addition to the pattern to match.

The last arm is the real catch-all pattern.
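A guard can also use a variable declared in the pattern of its arm. Here is a minimal sketch, where classify is a hypothetical function: the pattern x matches any number and binds it, and the guard then tests x. The compiler does not take guard conditions into account when checking that all cases are handled, so a final arm without a guard is still required.

```rust
// Hypothetical classifier using guards on a variable bound by the pattern.
fn classify(n: i32) -> &'static str {
    match n {
        0 => "zero",
        x if x < 0 => "negative",
        x if x % 2 == 0 => "even",
        // Needed anyway, because guards are ignored by the exhaustiveness check.
        _ => "odd",
    }
}

fn main() {
    for n in -1..4 {
        println!("{} is {}.", n, classify(n));
    }
}
```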

if-let and while-let Constructs

Sometimes you just need to check whether an enum value is a certain variant and, if so, to extract its fields. It can be accomplished by the following code:
enum E {
    Case1(u32),
    Case2(char),
    Case3(i64, bool),
}
let v = E::Case3(1234, true);
match v {
    E::Case3(n, b) => if b { print!("{}", n) }
    _ => {}
}

In this code, we need to check whether the value of v is a Case3 variant, and in such a case we need to use its two fields.

As an alternative to this code, the Rust language supports the following syntax, which can replace the preceding match statement:
if let E::Case3(n, b) = v {
    if b { print!("{}", n) }
}

Here the construct let E::Case3(n, b) = v tries to match the value of the expression v with the pattern E::Case3(n, b). If the match is successful, the n and b variables get the values of the two fields of v, and they can be used inside the following block. Instead, if the match is unsuccessful, the condition of the if-expression is considered false, so the following block is not evaluated; if there is an else clause, it is evaluated instead.
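Here is a minimal sketch of an if-let construct with an else clause. The describe function is a hypothetical helper, and it builds its result with the standard format! macro and String type.

```rust
#[allow(dead_code)]
enum E {
    Case1(u32),
    Case2(char),
    Case3(i64, bool),
}

// Hypothetical helper: describes a value of type E using if-let.
fn describe(v: E) -> String {
    if let E::Case3(n, b) = v {
        // The pattern matched, so n and b are available here.
        if b { format!("{}", n) } else { String::from("Case3 with false") }
    } else {
        // The pattern did not match, so n and b do not exist here.
        String::from("not Case3")
    }
}

fn main() {
    println!("{}", describe(E::Case3(1234, true))); // prints: 1234
    println!("{}", describe(E::Case1(5))); // prints: not Case3
}
```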

A similar construct exists for while statements, like in the following code:
enum E {
    Case1(u32),
    Case2(char),
}
let mut v = E::Case1(0);
while let E::Case1(n) = v {
    print!("{}", n);
    if n == 6 { break; }
    v = E::Case1(n + 1);
}

It will print: 0123456.

At every iteration, the value of v is matched against the pattern E::Case1(n). As long as the match is successful, the loop is repeated. In this program, the loop is exited by the break statement, when n reaches 6.
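A while-let loop also ends by itself as soon as the pattern no longer matches. Here is a minimal sketch in which, instead of using break, the variable receives a Case2 value; collect_until_case2 is a hypothetical helper that records the numbers seen by the loop in a vector.

```rust
#[allow(dead_code)]
enum E {
    Case1(u32),
    Case2(char),
}

fn collect_until_case2() -> Vec<u32> {
    let mut seen = Vec::new();
    let mut v = E::Case1(0);
    while let E::Case1(n) = v {
        seen.push(n);
        // After 3, v becomes a Case2 value, so the pattern no longer
        // matches and the loop ends without any break statement.
        v = if n < 3 { E::Case1(n + 1) } else { E::Case2('x') };
    }
    seen
}

fn main() {
    print!("{:?}", collect_until_case2()); // prints: [0, 1, 2, 3]
}
```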
