How to get the arguments from the command line used to launch the program
How to return a status code to the operating system when exiting the program
How to get and set the process environment variables
Techniques and best practices to handle runtime errors
How to read from the console keyboard and how to write to the console screen
How to convert a primitive type to a string
How to convert a string to a primitive type
How to read or write a binary file
How to read a text file, a line at a time
Command-Line Arguments
If this program is compiled into an executable named main and launched as ./main first second, it will print: [./main][first][second], even if there are several spaces between the input words.
The args standard library function returns an iterator over the command-line arguments. Such an iterator has type Args, and it produces String values. The first value produced is the program name, with the path used to reach it. The others are the program arguments.
The blanks separating the arguments are removed. To keep blanks inside an argument, enclose that argument in quotation marks; the quotation marks themselves are removed. If you launch ./main " first argument" "second argument ", it will print: [./main][ first argument][second argument ].
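A minimal sketch of such a program, assuming it is compiled into an executable named main. The helper bracketed is a hypothetical name introduced here, so that the formatting can be checked without launching a process.

```rust
use std::env;

// Hypothetical helper: wraps every String produced by an iterator
// in square brackets and concatenates the results.
fn bracketed<I: Iterator<Item = String>>(items: I) -> String {
    let mut out = String::new();
    for item in items {
        out.push('[');
        out.push_str(&item);
        out.push(']');
    }
    out
}

fn main() {
    // env::args() yields the program path first, then each argument.
    println!("{}", bracketed(env::args()));
}
```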
Process Return Code
This program will terminate immediately when it invokes the exit function, and it will return to the launching process the number 107.
If this program is launched from a Unix, Linux, or macOS console, and afterward you enter the command echo $?, you will get 107 printed on the console. The corresponding Windows command is echo %errorlevel%.
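A minimal sketch of a program that behaves as described; the constant name EXIT_CODE is only an illustrative choice.

```rust
use std::process;

// The status code handed back to the launching process.
const EXIT_CODE: i32 = 107;

fn main() {
    // exit never returns: the process ends here, and the destructors
    // of the values still alive are not run.
    process::exit(EXIT_CODE);
}
```

On a Unix-like shell, running ./main and then echo $? should print 107.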
Environment Variables
It will print: [Err(NotPresent)] [Ok("This is the value")]. First, the abcd environment variable is probably not yet defined, so the invocation of the var function returns the Err variant of a Result value; the specific kind of error is the NotPresent variant of the VarError enum. Then, that environment variable is set for the current process by invoking the set_var function. Therefore, it is found at the next attempt to get it, and its string value is returned inside an Ok variant.
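The following sketch shows both the raw Result returned by var and a match over its variants. It assumes that abcd is not defined when the program starts; the helper value_or_undefined is hypothetical. Since the Rust 2024 edition, set_var is an unsafe function, so the call is wrapped in an unsafe block.

```rust
use std::env;

// Hypothetical helper: returns the value of the environment variable
// `name`, or "Undefined" if the variable is not set.
fn value_or_undefined(name: &str) -> String {
    match env::var(name) {
        Ok(value) => value,
        Err(env::VarError::NotPresent) => "Undefined".to_string(),
        // The variable is set, but its value is not valid Unicode.
        Err(env::VarError::NotUnicode(_)) => "Not Unicode".to_string(),
    }
}

fn main() {
    // "abcd" is assumed not to be defined yet.
    println!("[{:?}] {}", env::var("abcd"), value_or_undefined("abcd"));
    // Since the Rust 2024 edition, set_var is unsafe, because changing the
    // environment may be unsound while other threads access it;
    // with older editions, this unsafe block may only cause a warning.
    unsafe {
        env::set_var("abcd", "This is the value");
    }
    println!("[{:?}] {}", env::var("abcd"), value_or_undefined("abcd"));
}
```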
It will print: Undefined, This is the value.
Reading from the Console
The stdin function returns a handle to the standard input stream of the current process. The read_line method can be called on that handle. It waits until an end-of-line character or the end of the input is reached, and then it returns the characters read up to that point, including the end-of-line character. The read may fail, for example because the input is not valid UTF-8, or because an I/O error occurs.
If that read is successful, the characters read are appended to the string contained in the line variable, which is passed as an argument by mutable reference. The read_line function returns an Ok result, whose data is the number of bytes read. Notice that this number is 6, because in addition to the five bytes of the string Hello, there is the end-of-line control character. In fact, when the line variable is printed, the closing bracket is printed on a separate line, because the end-of-line character is printed too.
If the read_line function cannot read characters from the standard input stream, it returns an Err result object, and it does not change the value of the line variable.
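A minimal sketch of this behavior. The hypothetical helper read_into accepts any buffered reader, so the same code works with the locked standard input and with an in-memory cursor. On Windows, where typed lines usually end with a carriage return plus a line feed, the reported count is likely to be one byte larger.

```rust
use std::io::{self, BufRead};

// Hypothetical helper: appends one line (end-of-line included) from
// `reader` to `line`, and returns the number of bytes read.
fn read_into<R: BufRead>(reader: &mut R, line: &mut String) -> io::Result<usize> {
    reader.read_line(line)
}

fn main() {
    let mut line = String::new();
    let stdin = io::stdin();
    // StdinLock implements BufRead, so it can be passed to read_into.
    let mut input = stdin.lock();
    match read_into(&mut input, &mut line) {
        Ok(n) => println!("{} bytes read: [{}]", n, line),
        Err(e) => println!("Read error: {}", e),
    }
}
```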
If your keyboard does not allow you to type those characters, try to type any non-ASCII character.
First, notice that the string printed in the last line spans three lines, as it contains two end-of-line characters. In addition, it contains the 7-byte ASCII string “First: ”, and the 8-byte ASCII string “Second: ”. Also Hello is an ASCII string, and it contains 5 bytes. As we saw in another chapter, the eè€ string contains 6 bytes, so we have 7 + 6 + 1 + 8 + 5 + 1 = 28 bytes.
Second, let’s see how the contents of the text variable are built up. Notice that the read_line function appends the typed line to the object specified by its argument, instead of overwriting it. The text variable is initialized to contain “First: ”. In the third line, the first typed line is appended to those contents. Then, in the fourth line, the literal string “Second: ” is appended to it. Finally, in the fifth line, the second typed line is appended.
Third, notice that the read_line function consumes the characters it reads from the input buffer: when the function is invoked for the second time, the characters already read are not read again.
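The appending behavior can be reproduced without a keyboard. This sketch assumes the two typed lines are eè€ and Hello, and reads them from an in-memory cursor instead of the standard input; build_text is a hypothetical name.

```rust
use std::io::{BufRead, Cursor};

// Hypothetical helper: builds the text as described above,
// reading two lines from any buffered reader.
fn build_text<R: BufRead>(input: &mut R) -> std::io::Result<String> {
    let mut text = "First: ".to_string();
    input.read_line(&mut text)?; // appends the first line, "\n" included
    text.push_str("Second: ");
    input.read_line(&mut text)?; // appends the second line
    Ok(text)
}

fn main() {
    let mut input = Cursor::new("eè€\nHello\n");
    let text = build_text(&mut input).unwrap();
    print!("{}", text);
    println!("{}", text.len()); // prints 28
}
```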
However, when this program is compiled, the compiler emits, for both invocations of read_line, the warning unused `Result` that must be used. It means that read_line returns a value of type Result, and that value is ignored. The generic type Result is meant to distinguish between two cases: a success (the Ok variant) and a failure (the Err variant). In this code, such distinction between success and failure is lost, because the return values are not used in any way. In general, it is a bad practice to ignore a return value of type Result, because such a type could represent a runtime error, and the program logic does not take into account such a kind of error. The compiler just warns about this bad practice.
This practice is surely dangerous in production code, but it is not appropriate in debug code either, as it hides the errors that you are looking for.
Therefore, in debug code, it is appropriate to always add at least an unwrap() call. Such a call ensures that the program proceeds only if the previous call was successful. If the previous call failed, the unwrap call panics. This is usually good when debugging, because in this way we find out where a function fails unexpectedly.
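A minimal sketch of this debug-style handling, reading from an in-memory cursor so it runs without user input. It also shows expect, a standard alternative to unwrap that adds a custom message to the panic; both function names are hypothetical.

```rust
use std::io::{BufRead, Cursor};

// unwrap panics if read_line returns Err, so a failure is noticed at once.
fn first_line(input: &str) -> String {
    let mut reader = Cursor::new(input);
    let mut line = String::new();
    reader.read_line(&mut line).unwrap();
    line
}

// expect works like unwrap, but includes the given message in the panic.
fn first_line_bytes(input: &[u8]) -> String {
    let mut reader = Cursor::new(input);
    let mut line = String::new();
    reader.read_line(&mut line).expect("input is not valid UTF-8");
    line
}

fn main() {
    print!("{}", first_line("Hello\nWorld\n"));
    print!("{}", first_line_bytes(b"Bytes\n"));
}
```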
But in production code, matters are not so simple.
Proper Runtime Error Handling
Real-world software often invokes functions that return a value of type Result. Let’s call such functions fallible. A fallible function normally returns an Ok, but in exceptional cases it returns an Err.
In C++, Java, and other object-oriented languages, the standard error handling technique is based on the so-called exceptions, and on the throw, try, and catch keywords. In Rust, there are no such things; all error-handling is based on the Result type, its functions, and the match statement, or on the panic concept that we have already seen in Chapter 5.
Assume, as it is typical, that you are writing a function f, which, to accomplish its task, has to invoke several fallible functions, f1, f2, f3, and f4. If no function fails, the behavior is this: the argument of f is passed to f1, the value returned by f1 is passed to f2, the value returned by f2 is passed to f3, the value returned by f3 is passed to f4, and the value returned by f4 is returned by f.
But now consider that such functions f1, f2, f3, and f4 are fallible. Each of them returns an error message if it fails, or a result if it is successful. If any one of them fails, its error message should be immediately returned by the f function as its error message. If a function is successful, its result should be passed on as described above.
As further requirements, let’s assume that f1 fails if its argument is 1, f2 fails if its argument is 2, f3 fails if its argument is 3, and f4 fails if its argument is 4. Otherwise, all of them return the argument received. So, the calls f(1), f(2), f(3) and f(4) should fail, and, for any other integer n, f(n) should return n.
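A sketch of f written with nested match expressions. The helper fail_on and the error messages are hypothetical choices made to satisfy the requirements above.

```rust
// Hypothetical helper: fails if `x` equals `bad`, otherwise returns `x`.
fn fail_on(x: i32, bad: i32, name: &str) -> Result<i32, String> {
    if x == bad {
        Err(format!("{} failed", name))
    } else {
        Ok(x)
    }
}

fn f1(x: i32) -> Result<i32, String> { fail_on(x, 1, "f1") }
fn f2(x: i32) -> Result<i32, String> { fail_on(x, 2, "f2") }
fn f3(x: i32) -> Result<i32, String> { fail_on(x, 3, "f3") }
fn f4(x: i32) -> Result<i32, String> { fail_on(x, 4, "f4") }

// Each call needs its own match, nested inside the Ok arm of the previous one.
fn f(x: i32) -> Result<i32, String> {
    match f1(x) {
        Ok(r1) => match f2(r1) {
            Ok(r2) => match f3(r2) {
                Ok(r3) => f4(r3),
                Err(e) => Err(e),
            },
            Err(e) => Err(e),
        },
        Err(e) => Err(e),
    }
}

fn main() {
    for n in 0..6 {
        println!("f({}) = {:?}", n, f(n));
    }
}
```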
Handling every returned value with a nested match expression quickly becomes unwieldy as the number of invocations increases, because the indentation level increases by two at every invocation added.
An alternative is to store every intermediate result in a local variable, and then check that variable using the is_err function. In case of failure, that local variable is returned as the failure result of f; in case of success, the unwrap function is used to extract the actual result from the local variable.
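A sketch of the is_err version of f, with hypothetical fallible functions f1 to f4 that satisfy the requirements stated earlier.

```rust
// Hypothetical helper: fails if `x` equals `bad`, otherwise returns `x`.
fn fail_on(x: i32, bad: i32, name: &str) -> Result<i32, String> {
    if x == bad {
        Err(format!("{} failed", name))
    } else {
        Ok(x)
    }
}

fn f1(x: i32) -> Result<i32, String> { fail_on(x, 1, "f1") }
fn f2(x: i32) -> Result<i32, String> { fail_on(x, 2, "f2") }
fn f3(x: i32) -> Result<i32, String> { fail_on(x, 3, "f3") }
fn f4(x: i32) -> Result<i32, String> { fail_on(x, 4, "f4") }

// Flat structure: check each result, return it on failure, unwrap on success.
fn f(x: i32) -> Result<i32, String> {
    let r1 = f1(x);
    if r1.is_err() {
        return r1;
    }
    let r2 = f2(r1.unwrap());
    if r2.is_err() {
        return r2;
    }
    let r3 = f3(r2.unwrap());
    if r3.is_err() {
        return r3;
    }
    f4(r3.unwrap())
}

fn main() {
    for n in 0..6 {
        println!("f({}) = {:?}", n, f(n));
    }
}
```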
Rust provides a more concise alternative: the question mark operator, written as a ? placed after an expression. It means that the expression is evaluated, and in case the result represents a success (Ok), the value inside it becomes the value of the whole expression; and in case the result represents a failure (Err), the containing function is exited, returning that result.
In other words, the operator checks whether its operand is an Ok or a Some variant; in such cases it unwraps the operand, otherwise it returns the Err or None value as the return value of the containing function.
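With the question mark operator, the whole chain of calls fits in one line. This sketch uses hypothetical fallible functions f1 to f4 that satisfy the requirements stated earlier.

```rust
// Hypothetical helper: fails if `x` equals `bad`, otherwise returns `x`.
fn fail_on(x: i32, bad: i32, name: &str) -> Result<i32, String> {
    if x == bad {
        Err(format!("{} failed", name))
    } else {
        Ok(x)
    }
}

fn f1(x: i32) -> Result<i32, String> { fail_on(x, 1, "f1") }
fn f2(x: i32) -> Result<i32, String> { fail_on(x, 2, "f2") }
fn f3(x: i32) -> Result<i32, String> { fail_on(x, 3, "f3") }
fn f4(x: i32) -> Result<i32, String> { fail_on(x, 4, "f4") }

// Each ? unwraps an Ok value, or returns the Err from f immediately.
fn f(x: i32) -> Result<i32, String> {
    f4(f3(f2(f1(x)?)?)?)
}

fn main() {
    for n in 0..6 {
        println!("f({}) = {:?}", n, f(n));
    }
}
```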
It will print: Ok(300) Ok(210) Err("Negative argument").
The function f1 is invoked three times, with arguments 10, 7, and -1. It invokes the function f2, passing twice the argument it receives, so f2 receives the values 20, 14, and -2. This function checks its argument and fails if it is negative; otherwise it returns its argument multiplied by 5, encapsulated in an Ok variant. So, the values returned by the three calls to f2 are Ok(100), Ok(70), and Err("Negative argument".to_string()).
So far, all the numbers are of type i32. The f1 function uses the question mark operator to check whether the value returned by its call to f2 is Ok or Err. In the second case, the error value is forwarded out of the function. For such forwarding to be legal, the f1 function must have a return value of type Result whose Err variant can receive that error value; here, both error types are String. More precisely, the question mark operator passes the error value through the From::from conversion, so it can also forward an error of a different type, provided that a conversion to the error type of the enclosing function is defined.
In case the call to f2 returned an Ok variant, the question mark operator strips the Ok wrapper and reveals the i32 value inside it. That value is then converted to the i64 type. Then the f1 function multiplies that value by 3 and returns the successful result as an Ok(i64) variant. So, the values returned by the three calls to f1 are Ok(300i64), Ok(210i64), and Err("Negative argument".to_string()), which are the printed values.
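A sketch of a program matching this description; it reconstructs the behavior from the explanation, so details such as the exact statements may differ from the original listing.

```rust
// Fails for negative arguments; otherwise returns five times the argument.
fn f2(x: i32) -> Result<i32, String> {
    if x < 0 {
        Err("Negative argument".to_string())
    } else {
        Ok(x * 5)
    }
}

// `?` either unwraps the i32 inside Ok or returns the Err from f1.
fn f1(x: i32) -> Result<i64, String> {
    let result = f2(x * 2)?;
    Ok(i64::from(result) * 3)
}

fn main() {
    print!("{:?} ", f1(10));
    print!("{:?} ", f1(7));
    println!("{:?}", f1(-1)); // Ok(300) Ok(210) Err("Negative argument")
}
```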
In general, the question mark operator can be used only in a function whose return value type has the form Result<T1, E1> or Option<T1>. In the first case, it can be applied only to an expression whose type is Result<T2, E2>, where T2 can be different from T1, and E2 must be either the same as E1 or convertible to E1 through the From trait; if, instead, the return value type of the enclosing function is Option<T1>, the question mark operator can be applied only to an expression whose type is Option<T2>.
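A sketch of both rules. The error type AppError and the functions doubled and first_char_twice are hypothetical; the From implementation is what allows ? to convert a ParseIntError into an AppError.

```rust
use std::num::ParseIntError;

// Hypothetical error type of the enclosing function.
#[derive(Debug, PartialEq)]
enum AppError {
    BadNumber(String),
}

// Because of this impl, `?` can turn a ParseIntError into an AppError.
impl From<ParseIntError> for AppError {
    fn from(e: ParseIntError) -> Self {
        AppError::BadNumber(e.to_string())
    }
}

// `?` is applied to a Result<i32, ParseIntError> inside a function
// returning Result<i64, AppError>: both the Ok and Err types differ.
fn doubled(s: &str) -> Result<i64, AppError> {
    let n: i32 = s.parse()?;
    Ok(i64::from(n) * 2)
}

// In a function returning Option, `?` is applied to Option values.
fn first_char_twice(s: &str) -> Option<String> {
    let c = s.chars().next()?; // an empty string yields None
    Some(format!("{}{}", c, c))
}

fn main() {
    println!("{:?}", doubled("21"));
    println!("{:?}", doubled("x"));
    println!("{:?}", first_char_twice("abc"));
    println!("{:?}", first_char_twice(""));
}
```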
In this program, the main function has a signature different from the usual one. Instead of having no return value type (which implies an empty tuple), it has the return value type Result<(), String>. Here, the success type is the empty tuple, and the failure type can be any type that implements the Debug trait, that is, any type that can be printed for debugging, including strings and all primitive types.
Inside the main function, the fallible function increment is defined. It could just as well have been defined outside the main function.
The increment function fails if its argument is negative; otherwise it returns a number.
Then the main function contains three print statements, all invoking the increment function.
The first one succeeds, so the number 5 is printed.
The second one fails, so the string received from the increment function is returned as an error from the main function. The Rust runtime support that invoked the main function receives this error value, prints it to the standard error stream using debug formatting, preceded by the string “Error: ”, and terminates the process with a nonzero exit status. Of course, this kind of output is suitable only for debugging purposes.
The last two statements are never executed.
Notice that this main function must end with the expression Ok(()), because its signature expects a Result.
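A sketch of a program with this structure; the argument values and the error message are hypothetical choices consistent with the description.

```rust
fn main() -> Result<(), String> {
    // A fallible function defined inside main: it fails for negative arguments.
    fn increment(n: i32) -> Result<i32, String> {
        if n < 0 {
            Err(format!("Negative argument: {}", n))
        } else {
            Ok(n + 1)
        }
    }
    println!("{}", increment(4)?); // prints 5
    println!("{}", increment(-3)?); // returns the Err from main
    println!("{}", increment(7)?); // never executed
    Ok(()) // never reached in this run
}
```

When run, it should print 5, then write Error: "Negative argument: -3" to the standard error stream and exit with a nonzero status.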
Writing to the Console
It will print: Hello world.
The stdout standard library function returns a handle to the standard output stream of the current process. The write method, provided by the std::io::Write trait, can be called on that handle.
However, the write function cannot directly print static or dynamic strings, let alone numbers or general composite objects.
The write function takes an argument of type &[u8], which is a reference to a slice of bytes. Such bytes are sent to the console, which normally displays them as UTF-8 text. So, if you want to print a value that is not a slice of bytes in UTF-8 format, you first have to translate it into such a sequence of bytes.
To convert both a static string and a dynamic string to a reference to a slice of bytes, you can use the as_bytes function.
Finally, notice that the write function returns a value of type Result; that is, it is a fallible function. If you are confident that it will not fail, the simplest option is to invoke the unwrap function on its return value.
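A minimal sketch using write on both a static and a dynamic string. The hypothetical helper greet accepts any destination implementing Write, so it can target the console or an in-memory Vec<u8>; it returns the byte counts reported by the two write calls.

```rust
use std::io::{self, Write};

// Hypothetical helper: writes "Hello " and then `name` plus a newline.
fn greet<W: Write>(out: &mut W, name: &str) -> io::Result<(usize, usize)> {
    // write accepts only &[u8], so strings are converted with as_bytes.
    let n1 = out.write("Hello ".as_bytes())?; // a static string
    let line = format!("{}\n", name); // a dynamic string
    let n2 = out.write(line.as_bytes())?;
    Ok((n1, n2))
}

fn main() {
    // unwrap: we assume that writing to the console will not fail.
    greet(&mut io::stdout(), "world").unwrap();
}
```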
Converting a Value to a String
It will print: 45 4.5 true.
The to_string function allocates a String object, whose header is on the stack and whose contents are on the heap. Because of this heap allocation, it is not very efficient, especially when it is invoked many times.
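A minimal sketch of such conversions; the helper describe is a hypothetical name that collects the three strings so the result can be checked.

```rust
// Converts three primitive values to String and joins them with spaces.
fn describe() -> String {
    let int_str: String = 45.to_string();
    let float_str: String = 4.5_f64.to_string();
    let bool_str: String = true.to_string();
    format!("{} {} {}", int_str, float_str, bool_str)
}

fn main() {
    println!("{}", describe()); // prints: 45 4.5 true
}
```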
Converting a String to a Value
The parse::<T> function of strings returns a value of type Result<T, E>, where T is the type specified in the call, and E is a type that describes the kind of parse error.
The first call to the parse function specifies that the string must be parsed as a Boolean value. That conversion is successful, so the resulting value, encapsulated in an Ok enum variant, is printed.
The second call to the parse function specifies that the string must be parsed as a single-precision floating-point number. That conversion is successful, so the resulting number is printed.
The third call to the parse function cannot perform the conversion, so an Err variant is returned. Inside that variant it is specified that it is an error of parsing floating-point numbers, and that the kind of error is Invalid.
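A sketch of the three kinds of calls; the input strings are illustrative assumptions, and the wrappers to_bool and to_f32 are hypothetical helpers that fix the target type.

```rust
use std::num::ParseFloatError;
use std::str::ParseBoolError;

// Parse as a Boolean: only "true" and "false" are accepted.
fn to_bool(s: &str) -> Result<bool, ParseBoolError> {
    s.parse::<bool>()
}

// Parse as a single-precision floating-point number.
fn to_f32(s: &str) -> Result<f32, ParseFloatError> {
    s.parse::<f32>()
}

fn main() {
    println!("{:?}", to_bool("true")); // Ok(true)
    println!("{:?}", to_f32("1.5")); // Ok(1.5)
    println!("{:?}", to_f32("1.5y")); // an Err describing an invalid float
    // The target type can also be inferred from a type annotation.
    let n: i32 = "42".parse().unwrap();
    println!("{}", n);
}
```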
File Input/Output
The second line invokes the create function to create a file named data.txt in the current folder of the file system, or to truncate it if it already exists. This function is fallible, and, if it succeeds, it returns a handle to the file just created.
The last line invokes the write_all function to write some bytes into the newly created file. The saved bytes are the six bytes representing the string eè€.
Earlier, we saw the write function, which is quite similar to the write_all function. In practice, they usually behave in the same way when they write to a disk file or to the console. The difference appears when they write to a stream that can accept only a limited number of bytes at a time, like a network connection. The write function makes just one attempt to output the buffer, and, if the call is at least partially successful, it returns how many bytes have actually been written, so that further calls can output the remaining bytes later. Instead, the write_all function keeps making attempts to output the buffer until the whole buffer is output, or until an error occurs. In case of success, it returns just an empty tuple, wrapped in an Ok variant.
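A sketch of the file-writing program, with the work moved into a hypothetical helper save so that the path can be chosen by the caller.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

// Creates (or truncates) the file at `path` and writes `text` into it.
fn save(path: &Path, text: &str) -> io::Result<()> {
    let mut file = File::create(path)?;
    // write_all keeps writing until every byte is written or an error occurs.
    file.write_all(text.as_bytes())
}

fn main() {
    // "data.txt" in the current folder, as in the text.
    save(Path::new("data.txt"), "eè€").unwrap();
}
```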
This program will print: eè€.
The second line invokes the open function to open an existing file named data.txt in the current folder. This function fails if the file does not exist, or if it is not accessible for whatever reason. If it succeeds, a file handle to such file is assigned to the file variable.
The fourth line invokes the read_to_string function on the file handle to read all the contents of that file and append them to a string variable, passed by reference to a mutable object.
The last line prints to the console the contents just read from the file.
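A sketch of the file-reading program, with the work moved into a hypothetical helper load; unlike the program described, it reports a failure with a message instead of panicking.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

// Opens an existing file and returns its whole contents as a String.
fn load(path: &Path) -> io::Result<String> {
    let mut file = File::open(path)?; // fails if missing or inaccessible
    let mut contents = String::new();
    // Appends the file contents; fails if they are not valid UTF-8.
    file.read_to_string(&mut contents)?;
    Ok(contents)
}

fn main() {
    match load(Path::new("data.txt")) {
        Ok(text) => println!("{}", text),
        Err(e) => eprintln!("Cannot read data.txt: {}", e),
    }
}
```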
So now, by combining the previous two programs, you should be able to copy a file into another one. But if a file is huge, it may be infeasible to load it all into a string before writing it; in addition, read_to_string works only for files containing valid UTF-8 text. Instead, the file must be read and written a portion at a time. However, reading and writing small portions is inefficient.
This program must be launched passing two command-line arguments. The first one is the path of the source file, and the second one is the path of the destination file.
The lines from the third to the sixth one assign to the source variable the contents of the first argument, and to the destination variable the contents of the second argument.
The next two lines open the two files. First the source file is opened, and the new handle is assigned to the file_in variable. Then the destination file is created (or truncated, if already existing), and the new handle is assigned to the file_out variable.
Then, a 4096-byte buffer is allocated on the stack.
Finally, a loop repeatedly reads a chunk of up to 4096 bytes from the source file and writes it to the output file. The maximum number of bytes to read is implicitly specified by the length of the buffer. But if the remaining portion of the file is not long enough, the bytes read do not fill the buffer.
So, we need the number of bytes actually read. This value is stored in the nbytes variable.
For a file larger than 4096 bytes, at the first iteration the number of bytes read will be 4096, so some other iterations will be required. For a very small file, one iteration will be enough.
In any case, the buffer must be written to the output file up to the number of bytes read. So, a slice of the buffer is taken from the beginning to the number of read bytes.
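Then, if the number of bytes read was less than the length of the buffer, the loop is terminated, as the end of the input file is assumed to have been reached. Otherwise, the loop continues with other iterations. This usually works for ordinary disk files, but notice that, in general, the read function may return fewer bytes than requested even before the end of the stream; the only reliable end-of-file signal is a returned count of zero.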
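A sketch of the copy program with one deliberate change: the loop stops only when read returns zero, which is the end-of-file signal defined by the Read trait. The function name copy_file and the usage message are hypothetical.

```rust
use std::env::args;
use std::fs::File;
use std::io::{self, Read, Write};
use std::path::Path;

// Copies `source` into `destination` (created or truncated) using a
// 4096-byte stack buffer, and returns the number of bytes copied.
fn copy_file(source: &Path, destination: &Path) -> io::Result<u64> {
    let mut file_in = File::open(source)?;
    let mut file_out = File::create(destination)?;
    let mut buffer = [0u8; 4096];
    let mut total: u64 = 0;
    loop {
        let nbytes = file_in.read(&mut buffer)?;
        // Only a zero count means end of file; a short read does not.
        if nbytes == 0 {
            break;
        }
        // Write just the part of the buffer that was filled.
        file_out.write_all(&buffer[..nbytes])?;
        total += nbytes as u64;
    }
    Ok(total)
}

fn main() {
    let mut command_line = args().skip(1); // skip the program path
    match (command_line.next(), command_line.next()) {
        (Some(source), Some(destination)) => {
            match copy_file(Path::new(&source), Path::new(&destination)) {
                Ok(n) => println!("{} bytes copied", n),
                Err(e) => eprintln!("Copy failed: {}", e),
            }
        }
        _ => eprintln!("Usage: copy SOURCE DESTINATION"),
    }
}
```

The standard library also offers std::io::copy, which performs a similar buffered loop between any reader and any writer.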
Notice that there is no need to explicitly close the files. As soon as the file handles go out of scope, the files are automatically closed, and the resources held by the operating system are released. This differs from fstream, the corresponding type of the C++ standard library, which can be opened after being instantiated and explicitly closed before being destroyed, so that at any instant it can be in either the open or the closed state. Instead, Rust File objects are necessarily open when they are created, and they have no close method. So, when a File object exists, it is necessarily open. To close a File object earlier, you can enclose in a pair of braces just the portion of code in which that file must remain open, or pass the object to the drop function.
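A sketch of both ways to close a file early. The function write_then_read and the file name in the temporary folder are hypothetical choices.

```rust
use std::fs::File;
use std::io::{self, Read, Write};
use std::path::Path;

// Writes a short text to `path` inside an inner block, then reads it back.
fn write_then_read(path: &Path) -> io::Result<String> {
    {
        let mut file = File::create(path)?;
        file.write_all(b"scoped")?;
    } // `file` goes out of scope here, so the file is closed.

    let mut file = File::open(path)?;
    let mut text = String::new();
    file.read_to_string(&mut text)?;
    // drop closes the file right now, before the end of the function.
    drop(file);
    Ok(text)
}

fn main() {
    // A file in the system temporary folder (an arbitrary choice).
    let path = std::env::temp_dir().join("scoped_example.txt");
    match write_then_read(&path) {
        Ok(text) => println!("{}", text),
        Err(e) => eprintln!("I/O error: {}", e),
    }
}
```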
Processing Text Files
We saw how to sequentially read or write a file of arbitrary data, usually called binary data.
But when a file contains raw text, like a program source file, it is more convenient to process it a line at a time.
In the first line, the invocation of args gets the command-line iterator and stores it into the command_line variable.
In the second line, the command-line argument at position zero is discarded.
In the third line, the command-line argument at position one is consumed and assigned to the pathname variable. If there is no such argument, the next() function call returns None, so the unwrap() function applied to it panics.
In the fourth line, the count_lines function, defined later, is invoked, passing to it a reference to the path name of the file to read. It is a fallible function. If it is successful, it returns a tuple of two values: the total number of lines counted in the file, and the number of those lines that are empty or contain only blanks. That pair is assigned to the counts variable.
The fifth, sixth, and seventh lines are print statements.
From the ninth line, there is the declaration of the count_lines function. It gets a string slice as an argument, and returns a Result that in case of success is a pair of u32 numbers, and in case of failure is a standard I/O error.
The open function is invoked to get a handle for the file indicated by the path name received as an argument. The question mark following it means that if the open function fails, the count_lines function immediately returns the same error returned by the open function.
The operations performed on a file are not buffered by default. That is optimal if you don’t need buffering, or if you prefer to apply your own buffering. Text lines are usually much shorter than the optimal I/O buffer size, so, when reading a text file, it is more efficient to use a buffered input stream.
After a BufReader object has been created, there is no longer any need to use the original File object directly; so the new object can be assigned to a new variable, also named f, which shadows the preexisting one.
Then, the two counters n_lines and n_empty_lines are declared and initialized.
Then, there is the loop over the file contents. Through the BufRead trait, the BufReader type provides the lines method, which returns an iterator over the lines contained in the file, without their end-of-line characters. Notice that Rust iterators are lazy; that is, there is never a memory structure containing all the lines: every time the iterator is asked for a line, it asks the buffered file reader for a line, and then provides the obtained line. So, at each iteration, the for loop puts the next line into the line variable and executes the loop block.
But any file read can fail, so line is not a simple string; its type is Result<String, std::io::Error>. Therefore, when it is used, line is followed by a question mark, to get its string value or to return the I/O error.
In the loop body, the n_lines counter is incremented by one for every line, while the n_empty_lines counter is incremented by one only when the line has zero length after any leading or trailing blanks have been removed from it by invoking trim.
The last statement returns a successful value: an Ok variant containing the two counters.
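A sketch of the line-counting program. The count_lines function follows the description above; the main function differs slightly, because it reports a missing argument or an I/O error with a message instead of panicking, and its output wording is a hypothetical choice.

```rust
use std::env::args;
use std::fs::File;
use std::io::{BufRead, BufReader};

// Returns (total lines, lines that are empty or contain only blanks).
fn count_lines(pathname: &str) -> Result<(u32, u32), std::io::Error> {
    let f = File::open(pathname)?;
    // Shadow the File with a buffered reader built on top of it.
    let f = BufReader::new(f);
    let mut n_lines = 0;
    let mut n_empty_lines = 0;
    // lines() is lazy: each iteration reads one more line, which may fail.
    for line in f.lines() {
        n_lines += 1;
        if line?.trim().is_empty() {
            n_empty_lines += 1;
        }
    }
    Ok((n_lines, n_empty_lines))
}

fn main() {
    let mut command_line = args();
    command_line.next(); // discard the program path
    let pathname = match command_line.next() {
        Some(p) => p,
        None => {
            eprintln!("Usage: count_lines FILE");
            return;
        }
    };
    match count_lines(&pathname) {
        Ok((n_lines, n_empty_lines)) => {
            println!("File: {}", pathname);
            println!("Number of lines: {}", n_lines);
            println!("Number of empty lines: {}", n_empty_lines);
        }
        Err(e) => eprintln!("Error: {}", e),
    }
}
```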