Creating a Linux Kernel Module

Any decent operating system can be extended by loadable modules. This is required to support hardware that is not specifically supported by the organization that created the operating system, and so these loadable modules are often named device drivers.

However, this extensibility of operating systems can also be exploited for other purposes. For example, support for a specific filesystem or network protocol can be added to the kernel through loadable modules, without changing and recompiling the kernel itself.

In this chapter, we will look at how to build a loadable kernel module, specifically for the Linux operating system and the x86_64 CPU architecture. The concepts and commands described here also apply to other CPU architectures.

The following topics will be covered in this chapter:

  • Preparing the environment
  • Creating a boilerplate module
  • Using global variables
  • Allocating memory
  • Creating a driver for a character device

By the end of this chapter, you will have learned some general concepts regarding operating system-extension modules and, in particular, how to create, manage, and debug Linux kernel modules.

Technical requirements

To understand this chapter, some concepts of the Linux operating system should be known. In particular, you need to know the following:

  • How to use the Linux command interpreter (that is, the shell)
  • How to understand C language source code
  • How to use the GCC compiler or the Clang compiler

If you don't have this knowledge, you can refer to the following web resources:

The code examples in this chapter have only been developed and tested on one specific version of Linux (a Linux Mint distribution with the 4.15.0-72-generic kernel version), and so they are only guaranteed to work with this version. The Mint distribution is derived from the Debian distribution and so it shares most of Debian's commands. The desktop environment is irrelevant.

To run the examples in this chapter, you should have access as a superuser (root) to a system running the preceding distribution based on a CPU with the x86_64 architecture.

To build a kernel module, a lot of boilerplate code needs to be written. This work has already been done for you in an open source project available on GitHub at https://github.com/lizhuohua/linux-kernel-module-rust. Parts of this GitHub project have been copied into a framework for writing Linux kernel modules, which will be used in this chapter. This framework can be found in the linux-fw folder of the repository associated with this chapter.

Also, for simplicity, no cross-compilation will be done; that is, the kernel module will be built on the same operating system on which it will be used. This is a bit unusual, as loadable modules are often developed for operating systems or architectures that are not suitable for software development. In some cases, the target system, such as a microcontroller, is too constrained to run a convenient development environment.

In other cases, the opposite applies—the target system is too costly to be used by a single developer, such as a supercomputer.

The complete source code for this chapter can be found in the Chapter10 folder of the repository at https://github.com/PacktPublishing/Creative-Projects-for-Rust-Programmers.

Project overview

In this chapter, we'll look at four projects that will show you how to build increasingly complex Linux kernel modules:

  • boilerplate: An extremely simple kernel module that shows the minimal requirements to build your own module
  • state: A module that keeps some global static variables, that is, a static state
  • allocating: A module that allocates heap memory—that is, a dynamic state
  • dots: A module that implements a read-only character device that can be associated with a filesystem pathname and then read as a file

Understanding kernel modules

Kernel modules must satisfy certain requirements imposed by the operating system, and so it is quite unreasonable to try to write a kernel module in an application-oriented programming language, such as Java or JavaScript. Usually, kernel modules are only written in assembly language or in C, and sometimes in C++. However, Rust is designed to be a systems programming language, and so it is actually possible to write loadable kernel modules in Rust.

While Rust is usually a portable programming language (the same source code can be recompiled for different CPU architectures and for different operating systems), this is not the case for kernel modules. A specific kernel module must be designed and implemented for a specific operating system. In addition, a specific machine architecture must usually be targeted, although the core logic can be architecture-independent. So, the examples in this chapter will only target the Linux operating system and the x86_64 CPU architecture.

Preparing the environment

Some of the installation work must be performed with superuser privileges. So, you should prefix any command that installs a system-wide package or that changes something in the kernel with the sudo command. Alternatively, you can routinely work as a superuser. Needless to say, this is dangerous, as you can jeopardize the whole system with a wrong command. To work as a superuser, type the following command into a terminal:

          su root
        

Then, type in your superuser password.

The Linux operating system expects its modules to be written only in C. If you want to write a kernel module in Rust, some glue software must be used to interface your Rust code with the C language of Linux.

So, a C compiler must be used to build this glue software. Here, the clang compiler will be used. It is part of the Low-Level Virtual Machine (LLVM) project.

The Rust compiler also uses libraries of the LLVM project to generate machine code.

You can install the clang compiler in your Linux system by typing the following commands:

          sudo apt update
          sudo apt install llvm clang

Notice that the apt command is typical of Debian-derived distributions and is not available on many Linux distributions, nor on other operating systems.

Then, you need to ensure that the C language headers of your current kernel are installed. You can discover the version of your running kernel by typing the uname -r command. This will print something similar to 4.15.0-72-generic. You can install the headers for the specific version of the kernel by using a command similar to the following:

          sudo apt install linux-headers-4.15.0-72-generic
        

You can combine the two commands by typing the following command:

          sudo apt install linux-headers-"$(uname -r)"
        

This will generate the correct command for your system.

At the time of writing, Linux kernel modules can only be created using the nightly version of the Rust compiler. To install the latest version of this compiler, type the following:

          rustup toolchain install nightly
        

Also, the source code of the Rust compiler and the tool to format Rust source code are needed. You can ensure they are installed by typing the following command:

          rustup component add --toolchain=nightly rust-src rustfmt

To ensure that the nightly toolchain of Rust for the x86_64 architecture and Linux will be used by default, run this command:

          rustup default nightly-x86_64-unknown-linux-gnu
        

This can be shortened to rustup default nightly if there are no other target platforms installed on your system.

We know that the cargo utility has several subcommands, such as new, build, and run. For this project, an additional cargo subcommand will be needed: the xbuild subcommand. This name stands for cross-build, which means compiling for another platform; that is, generating machine code for a platform different from the one running the compiler. In this case, it means that while the compiler we are running is a standard executable that is running in user space, the code we are generating will run in kernel space, and so it will need a different standard library. You can install that subcommand by typing this line:

cargo install cargo-xbuild

Then, after you have downloaded the source code associated with this chapter from GitHub, you are ready to run the examples.

Notice that in the downloaded source code, there is a folder for every project, plus a folder named linux-fw. This contains the framework to develop Linux kernel modules, and the examples assume that it is located in this position.

A boilerplate module

The first project is the minimal, loadable kernel module, and so it is called boilerplate. It will just print a message when the module is loaded and another message when it is unloaded.

In the boilerplate folder, there are the following source files:

  • Cargo.toml: The build directives for the Rust project
  • src/lib.rs: The Rust source code
  • Makefile: The build directives to generate and compile the C language glue code and to link the generated object code into a kernel module
  • bd: A shell script to build a debug configuration of the kernel module
  • br: A shell script to build a release configuration of the kernel module

Let's start with building the kernel module.

Building and running the kernel module

To build the kernel module for debugging purposes, open the boilerplate folder and type in this command:

          ./bd
        

Of course, this file must have executable permissions. However, it should already have them when it is installed from the GitHub repository.

The first time you run this script, it will build the framework itself, and so it will take quite a while. After that, it will build the boilerplate project in a couple of minutes.

After the completion of the build command, several files should appear in the current folder. Among them is one named boilerplate.ko (ko is short for kernel object), which is the kernel module we want to install. Its size is huge because it contains a lot of debugging information.

A Linux command that gives information about a Linux module file is modinfo. You can use it by typing the following command:

          modinfo boilerplate.ko
        

This should print some information about the specified file. To load the module into the kernel, type the following command:

          sudo insmod boilerplate.ko
        

The insmod (insert module) command loads a Linux module from the specified file and adds it to the running kernel. Of course, this is a privileged operation that can jeopardize the safety and security of the whole computer system, and so only a superuser can run it. This explains the need to use the sudo command. If the command is successful, nothing is printed to the terminal.

The lsmod (list modules) command prints a list of all the currently loaded modules. To select the one you are interested in, you can filter the output using the grep utility. So, you can type the following command:

          lsmod | grep -w boilerplate
        

If boilerplate is loaded, you will get a line similar to the following:

This line contains the name of the module, the memory used by it in bytes, and the number of current uses of this module.

To unload the loaded module, you can type the following command:

          sudo rmmod boilerplate
        

The rmmod (remove module) command unloads the specified module from the running Linux kernel. If the module is not currently loaded, this command prints an error message and does nothing.

Now, let's look at the behavior of this module. Linux has a memory-only log area called the kernel buffer. Only the kernel and its modules can write to it, but everyone can read it using the dmesg (short for display messages) utility. Kernel modules can append lines of text to this buffer. When the boilerplate module is loaded, it appends the boilerplate: Loaded text to the kernel buffer. When the boilerplate module is unloaded, it appends the boilerplate: Unloaded text.

If you type dmesg into the terminal, the whole content of the kernel buffer will be printed to the terminal. Typically, there are thousands of messages in the kernel buffer, written by several modules since the last reboot of the system, but the last two lines should be those appended by the boilerplate module. To view just the last 10 lines while keeping their colors, type the following:

          dmesg --color=always | tail
        

The last two lines should look something like the following:

The first part of any line, enclosed in brackets, is a timestamp written by the kernel. This is the time in seconds and microseconds since the start of the kernel. The rest of the line is written by the module code.

Now, we can see how the bd script built this kernel module.

The build commands

The bd script has the following content:

#!/bin/sh
cur_dir=$(pwd)
cd ../linux-fw
cargo build
cd $cur_dir
RUST_TARGET_PATH=$(pwd)/../linux-fw cargo xbuild --target x86_64-linux-kernel-module && make

Let's see what happened in the code:

  • The first line declares that this is a shell script, and so the Bourne shell program will be used to run it.
  • The second line saves the path of the current folder in a temporary variable.
  • The third, fourth, and fifth lines enter the framework folder, build the framework for a debug configuration, and return to the original folder.
  • The last line builds the module itself. Notice that it ends with && make. This means that the command in the second part (the make command) is run only after the command in the first part has completed successfully; if the first command fails, the second one is not run. The line begins with the RUST_TARGET_PATH=$(pwd)/../linux-fw clause. It creates an environment variable named RUST_TARGET_PATH, which is only valid for the command that immediately follows it (here, cargo). It contains the absolute pathname of the framework folder. Then, the cargo tool is invoked with the xbuild --target x86_64-linux-kernel-module arguments. The xbuild subcommand compiles for a platform different from the current one, and the rest of the command specifies that the target is x86_64-linux-kernel-module. This target is specific to the framework we are using.

To explain how this target is used, it is necessary to examine the Cargo.toml file, which consists of the following code:

[package]
name = "boilerplate"
version = "0.1.0"
authors = []
edition = "2018"

[lib]
crate-type = ["staticlib"]

[dependencies]
linux-kernel-module = { path = "../linux-fw" }

[profile.release]
panic = "abort"
lto = true

[profile.dev]
panic = "abort"

The package section is the usual one. The crate-type item of the lib section specifies that the target of the compilation is a static-link library.

The linux-kernel-module item of the dependencies section specifies the relative path of the folder containing the framework. If you prefer to install the framework folder in another position relative to this project or with another name, you should change this path, as well as the RUST_TARGET_PATH environment variable.

Thanks to this directive, it is possible to use the target specified in the cargo command line.

The remaining sections specify that in case of panic, an immediate abort should be done (with no output) and that in the release configuration, Link-Time Optimization (LTO) should be activated.

After completing this cargo command, the target/x86_64-linux-kernel-module/debug/libboilerplate.a static-link library should have been created. As with any other Linux static-link library, its name starts with lib and ends with .a.

The last part of the command line runs the make utility, which is a build tool used mainly when developing in C. Just as the cargo tool uses the Cargo.toml file to know what to do, the make tool uses the Makefile file for the same purposes.

Here, we don't examine Makefile, but we just say that it reads the static library generated by cargo and encapsulates it with some C language glue code to generate the boilerplate.ko file, which is the kernel module.

In addition to the bd file, there is a br file, which is similar but runs both cargo and make with a release option, and so it generates an optimized kernel module. You can run it by typing the following:

          ./br
        

The generated module will overwrite the boilerplate.ko file, which was created by bd. You can see that the new file is much smaller on disk and, using the lsmod utility, you can see that it is also much smaller in memory.

The source code of the boilerplate module

Now, let's examine the Rust source code of this project. It is contained in the src/lib.rs file. The first line is as follows:

#![no_std]

This is a directive to avoid loading the Rust standard library in this project. Many routines of the standard library assume that they run as application code (in user space, not inside a kernel), and so they cannot be used in this project. Of course, after this directive, many Rust functions that we are accustomed to using are no longer automatically available.

In particular, no heap memory allocator is included by default and so, by default, vectors and strings that need heap memory allocation are not allowed. If you try to use Vec or the String type, you will get a use of undeclared type or module error message.

The next lines are as follows:

use linux_kernel_module::c_types;
use linux_kernel_module::println;

These lines import some names into the current source file. These names are defined in the framework.

The first line imports the declarations of some data types corresponding to the C language data types. They are needed to interface with the kernel, which expects that modules are written in C. After this declaration, you can use, for example, the c_types::c_int expression, which corresponds to the C language int data type.

The second line imports a macro named println, just like that of the standard library, which is no longer available. Actually, it can be used in the same way, but instead of printing on the terminal, it appends a line to the kernel buffer, prefixed by a timestamp.
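The chapter does not show how the framework implements its println macro, so the following is only a user-space sketch of the general technique, under the assumption that formatting is done through the core::fmt::Write trait, which is available without the standard library. The LineBuffer type and the format_line function are hypothetical names introduced for illustration; a real kernel module would hand the formatted bytes to the kernel log instead of keeping them in a buffer.

```rust
use core::fmt::{self, Write};

/// Hypothetical fixed-size buffer standing in for one line of the kernel log.
pub struct LineBuffer {
    bytes: [u8; 128],
    len: usize,
}

impl LineBuffer {
    pub const fn new() -> Self {
        LineBuffer { bytes: [0; 128], len: 0 }
    }

    /// Returns the text written so far.
    pub fn as_str(&self) -> &str {
        // Only whole `&str` fragments are copied in, so this prefix is valid UTF-8.
        core::str::from_utf8(&self.bytes[..self.len]).unwrap_or("")
    }
}

// Implementing `core::fmt::Write` is all that `write!` and `writeln!` need,
// and it requires neither the standard library nor a heap allocator.
impl Write for LineBuffer {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        let end = self.len + s.len();
        if end > self.bytes.len() {
            return Err(fmt::Error); // the line does not fit in the buffer
        }
        self.bytes[self.len..end].copy_from_slice(s.as_bytes());
        self.len = end;
        Ok(())
    }
}

/// Formats a "<module>: <event>" line the way a println-like macro could.
pub fn format_line(module: &str, event: &str) -> LineBuffer {
    let mut line = LineBuffer::new();
    // A formatting error only means the line was too long; it is ignored here.
    let _ = writeln!(line, "{}: {}", module, event);
    line
}
```

Calling format_line("boilerplate", "Loaded") yields a buffer holding the same text that the module appends to the kernel buffer, followed by a newline.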

Then, there are two entry points of the module—the init_module function, which is invoked by the kernel when the module is loaded, and the cleanup_module function, which is invoked by the kernel when the module is unloaded. They are defined by the following code:

#[no_mangle]
pub extern "C" fn init_module() -> c_types::c_int {
    println!("boilerplate: Loaded");
    0
}

#[no_mangle]
pub extern "C" fn cleanup_module() {
    println!("boilerplate: Unloaded");
}

The no_mangle attribute is a directive to the compiler to keep the exact function name in the generated object code so that the kernel can find the function by its name. The extern "C" clause specifies that the function-calling convention must be the one normally used by C.

These functions take no arguments, but the first one returns a value that indicates the outcome of the initialization. A 0 result represents success, and a nonzero result (by Linux convention, a negative error code) represents failure. Linux specifies that the type of this value is the C language int type, and the c_types::c_int type of the framework represents exactly that binary type.

Both functions print the messages that we saw in the previous section to the kernel buffer. Also, both functions are optional, but if the init_module function is absent, a warning is emitted by the linker.
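The return value convention of the initialization entry point can be tried out in user space. The following is a minimal sketch, not part of the chapter's projects: to_status and init_sketch are hypothetical names, std::os::raw::c_int stands in for the framework's c_types::c_int, and the failure case follows the usual Linux convention of returning a negated error number.

```rust
use std::os::raw::c_int;

/// Linux error number for "out of memory"; used here only as an example.
pub const ENOMEM: c_int = 12;

/// Hypothetical helper: converts a Rust-style result into the status value
/// that an initialization entry point returns to the kernel.
pub fn to_status(result: Result<(), c_int>) -> c_int {
    match result {
        Ok(()) => 0,          // success
        Err(errno) => -errno, // failure: negated error number
    }
}

/// Has the same shape as `init_module` (no arguments, C `int` result),
/// but in this sketch it is an ordinary user-space function.
pub extern "C" fn init_sketch() -> c_int {
    // A real module would acquire its resources here and report any failure.
    to_status(Ok(()))
}
```

On x86_64 Linux, c_int is a 32-bit signed integer, so a failed initialization that runs out of memory would report -12.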

The last two lines of the file are as follows:

#[link_section = ".modinfo"]
pub static MODINFO: [u8; 12] = *b"license=GPL\0";

They define a string resource for the linker to insert into the resulting executable. The name of the section that holds that string resource is .modinfo and its value is license=GPL. That value must be a null-terminated ASCII string because that is the string type normally used in C; this is why the byte string ends with \0 and the array has 12 elements rather than 11. This section is not required, but if it is absent, a warning is emitted by the linker.
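The length of such a byte array is easy to get wrong, so it can help to check it in an ordinary user-space program. The following sketch declares an array of the same shape without the link_section attribute, which only matters when linking a kernel module; c_string_text is a hypothetical helper that reads the bytes the way C code would.

```rust
/// Eleven visible characters plus the terminating NUL byte give 12 elements.
/// If the declared length and the literal disagree, compilation fails.
pub static MODINFO: [u8; 12] = *b"license=GPL\0";

/// Hypothetical helper: returns the text that precedes the first NUL byte,
/// which is how C code interprets a null-terminated string.
pub fn c_string_text(bytes: &[u8]) -> &str {
    let end = bytes.iter().position(|&b| b == 0).unwrap_or(bytes.len());
    core::str::from_utf8(&bytes[..end]).unwrap_or("")
}
```

Because the array length is part of the type, the compiler rejects a literal whose length does not match, which catches a forgotten terminator at build time.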

Using global variables

The boilerplate module of the preceding project just printed some static text. However, it is quite typical for a module to have some variables that must be accessed during the lifetime of the module. Usually, Rust programs avoid mutable global variables because they are not safe; instead, they define variables in the main function and pass them as arguments to the functions called by main. However, kernel modules do not have a main function. They have entry points called by the kernel and so, to keep shared mutable variables, some unsafe code must be used.

The state project shows you how to define and use shared mutable variables. To run it, enter the state folder and type ./bd. Then, type the following four commands:

          sudo insmod state.ko
          lsmod | grep -w state
          sudo rmmod state
          dmesg --color=always | tail

Let's see what we did there:

  • The first command will load the module into the kernel with no output to the console.
  • The second command will show that the module is loaded by fetching all the loaded modules and filtering the one called state.
  • The third command will unload the module from the kernel with no output to the console.
  • The last command will show the two lines added by this module to the kernel buffer. They will look like this:
          [123456.789012] state: Loaded
          [123463.987654] state: Unloaded 1001

Apart from the timestamps, they differ from the boilerplate example due to the name of the module and the addition of the number 1001 to the second line.

Let's see the source code of this project, showing the differences compared with the boilerplate source code. The lib.rs file contains the following additional lines:

struct GlobalData { n: u16 }

static mut GLOBAL: GlobalData = GlobalData { n: 1000 };

The first line defines a data structure type, named GlobalData, containing only a 16-bit unsigned number. The second line defines and initializes a static mutable variable of this type, named GLOBAL.

Then, the init_module function contains the following additional statement:

unsafe { GLOBAL.n += 1; }

This increments the global variable. As it was initialized to 1000, after the module is loaded, the value of this variable is 1001.

Finally, the statement in the cleanup_module function is replaced by the following:

println!("state: Unloaded {}", unsafe { GLOBAL.n });

This formats and prints the value of the global variable. Notice that both reading and writing a global variable are unsafe operations, as they access a mutable static object.

The bd and br files are identical to those in the boilerplate project. The Cargo.toml and Makefile files differ from those in the boilerplate project only in the replacement of the boilerplate string with the state string.
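For a simple counter like this one, there is an alternative that the state project does not use: an atomic integer, which can be kept in an immutable static and updated without any unsafe block. The following user-space sketch only illustrates that alternative; COUNTER, load, and unload_message are hypothetical names, and the atomic types also exist in core::sync::atomic, so they do not depend on the standard library.

```rust
use std::sync::atomic::{AtomicU16, Ordering};

// An immutable static with interior mutability: no `static mut`, no `unsafe`.
static COUNTER: AtomicU16 = AtomicU16::new(1000);

/// Stand-in for `init_module`: increments the counter and reports success.
pub fn load() -> i32 {
    COUNTER.fetch_add(1, Ordering::SeqCst);
    0
}

/// Stand-in for `cleanup_module`: builds the text the module would log.
pub fn unload_message() -> String {
    format!("state: Unloaded {}", COUNTER.load(Ordering::SeqCst))
}
```

This approach only covers values that fit in an atomic type; a structure with several fields still needs a lock or, as in this chapter, unsafe access to a mutable static.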

Allocating memory

The preceding project defined a global variable, but it did not carry out memory allocation. Even in kernel modules, it is possible to allocate memory, as shown in the allocating project.

To run this project, open the allocating folder and type in ./bd. Then, type the following four commands:

          sudo insmod allocating.ko
          lsmod | grep -w allocating
          sudo rmmod allocating
          dmesg --color=always | tail

These commands have a behavior quite similar to the corresponding commands for the preceding project, but the last one will print a line that, after the timestamp, will contain the following text:

allocating: Unloaded 1001 abcd 500000

Let's examine the source code of this project and see its differences compared with the boilerplate source code. The lib.rs file contains the following additional lines:

extern crate alloc;
use crate::alloc::string::String;
use crate::alloc::vec::Vec;

The first line explicitly declares that the alloc crate, and therefore a memory allocator, is needed. Otherwise, as the standard library is not used, no memory allocator will be linked to the executable module.

The second and third lines are required to include the String and Vec types in the source code, respectively. Otherwise, they will not be available to the source code. Then, there are the following global declarations:

struct GlobalData {
    n: u16,
    msg: String,
    values: Vec<i32>,
}

static mut GLOBAL: GlobalData = GlobalData {
    n: 1000,
    msg: String::new(),
    values: Vec::new(),
};

Now, the data structure contains three fields. Two of them, msg and values, use heap memory when they are not empty. The initializer of the GLOBAL variable sets all three fields. No memory allocation is allowed in the initializer of a static variable, and so these dynamic fields must start out empty.

In the init_module function, as in other entry points, allocations are allowed, and so the following code is valid:

unsafe {
    GLOBAL.n += 1;
    GLOBAL.msg += "abcd";
    GLOBAL.values.push(500_000);
}

This changes all the fields of the global variable, allocating memory for both the msg string and the values vector. Finally, the global variable is accessed to print its values by using the following statement in the cleanup_module function:

unsafe {
    println!(
        "allocating: Unloaded {} {} {}",
        GLOBAL.n,
        GLOBAL.msg,
        GLOBAL.values[0]
    );
}

The rest of the code is unchanged.
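The difference between the empty initial values and the later allocations can be observed in user space. The following sketch is an approximation of the allocating module, not its actual code: it uses a constant and ordinary functions instead of a mutable static and kernel entry points, and INITIAL, fill, and summary are hypothetical names.

```rust
pub struct GlobalData {
    pub n: u16,
    pub msg: String,
    pub values: Vec<i32>,
}

// `String::new` and `Vec::new` are `const fn`s that build empty values
// without calling the allocator, so they are allowed in a constant initializer.
pub const INITIAL: GlobalData = GlobalData {
    n: 1000,
    msg: String::new(),
    values: Vec::new(),
};

/// Stand-in for the body of `init_module`: the first allocations happen here.
pub fn fill(data: &mut GlobalData) {
    data.n += 1;
    data.msg += "abcd";
    data.values.push(500_000);
}

/// Stand-in for the message printed by `cleanup_module`.
pub fn summary(data: &GlobalData) -> String {
    format!(
        "allocating: Unloaded {} {} {}",
        data.n, data.msg, data.values[0]
    )
}
```

Starting from a copy of INITIAL, both dynamic fields report a capacity of zero; after fill has run, they own heap memory and summary produces the same text that the module writes to the kernel buffer.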

A character device

Unix-like systems are famous for their feature that maps I/O devices to the filesystem. In addition to the predefined I/O devices, it is possible to define your own devices as kernel modules. A kernel device can be attached to real hardware or it can be virtual. In this project, we will build a virtual device.

In Unix-like systems, there are two kinds of I/O devices: block devices and character devices. The former handle packets of bytes in a single operation (that is, they are buffered), while the latter can handle only one byte at a time, with no buffering.

In general, a device can be read, written, or both. Our device will be a read-only device. So, we are going to build a filesystem-mapped, virtual, read-only character device.

Building the character device

Here, we are going to build a character device driver (or character device for short). A character device is a device driver that can handle only one byte at a time with no buffering. The behavior of our device will be quite simple: for every byte read from it, it will return a dot character, except that every 10th character will be an asterisk instead of a dot.

To build it, open the dots folder and type in ./bd. Several files will be created in the current folder, including the dots.ko file, which is our kernel module.

To install it and check whether it is loaded, type the following:

          sudo insmod dots.ko
          lsmod | grep -w dots

Now, the kernel module is loaded as a character device, but it is not yet mapped to a special file. However, you can find it among the loaded devices by using the following command:

          grep -w dots /proc/devices
        

The /proc/devices virtual file contains a list of all the loaded device modules. Among them, in the Character devices section, there should be a line like this:

236 dots

This means that there is a loaded character device driver named dots whose internal identifier is 236. This internal identifier is also named a major number because it is the first number of a pair of numbers that actually identifies the device. The other number, known as the minor number, is not used by this driver, and so it can be set to 0.

The major number may vary from system to system and from loading to loading because it is assigned by the kernel when the module is loaded. Anyway, it is a small, positive integer number.
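Because the major number changes from one loading to the next, it may be convenient to look it up programmatically rather than by eye. The following is a minimal sketch of such a lookup, working on text laid out like /proc/devices (a "Character devices:" header, one "number name" pair per line, and then a "Block devices:" section); find_char_major is a hypothetical helper and is not part of the chapter's projects.

```rust
/// Returns the major number registered under `name` in the
/// "Character devices" section of text formatted like /proc/devices.
pub fn find_char_major(listing: &str, name: &str) -> Option<u32> {
    let mut in_char_section = false;
    for line in listing.lines() {
        let line = line.trim();
        if line.ends_with("devices:") {
            // Section headers look like "Character devices:" and "Block devices:".
            in_char_section = line.starts_with("Character");
            continue;
        }
        if !in_char_section {
            continue;
        }
        // Each entry is "<major> <driver name>"; blank lines yield no fields.
        let mut fields = line.split_whitespace();
        if let (Some(major), Some(driver)) = (fields.next(), fields.next()) {
            if driver == name {
                return major.parse().ok();
            }
        }
    }
    None
}
```

On a real system, the listing argument could be obtained with std::fs::read_to_string("/proc/devices"), and the result is the number to use in the command that creates the special file.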

Now, we must associate this device driver with a special file, which is an entry in the filesystem that can be used as a file but is actually a handle to a device driver. This operation is performed by the following command, in which you should replace 236 with the major number you found in the /proc/devices file:

          sudo mknod /dev/dots1 c 236 0
        

The mknod Linux command creates a special device file. The preceding command creates a special file named dots1 in the dev folder.

This is a privileged command for two reasons:

  • Only a superuser can create special files.
  • Only a superuser can create a file in the dev folder.

The c character means that the created device will be a character device. The following two numbers, 236 and 0, are the major and minor numbers of the new virtual device.

Notice that the name of the special file (dots1) can be different from the name of the device (dots) because the association between the special file and the device driver is performed through the major number.

After creating the special file, you can read some bytes from it. The head command reads the first lines or bytes of a text file. So, type the following:

          head -c42 /dev/dots1
        

This will print the following text to the console:

          .........*.........*.........*.........*..
        

This command reads the first 42 bytes from the specified file.

When asked for the first byte, the module returns a dot. When asked for the second byte, the module returns another dot, and so on for the first nine bytes. However, when asked for the 10th byte, the module returns an asterisk. Then, this behavior is repeated—after nine dots, an asterisk is returned over and over again. In fact, only 42 characters are returned because the head command requested 42 characters from our device.

In other words, if the character generated by the module has an ordinal number that is a multiple of 10, then it is an asterisk; otherwise, it is a dot.

You can create other special files based on the dots module. For example, type the following:

          sudo mknod /dev/dots2 c 236 0
        

Then, type the following command:

          head -c12 /dev/dots2
        

This will print the following text to the console:

          .......*....
        

Notice that 12 characters are printed, as requested by the head command, but this time, the asterisk is at the 8th character, instead of the 10th. This happens because both the dots1 and dots2 special files are associated with the same kernel module, with an identifier (236, 0) and the name dots. This module remembers it has already generated 42 characters, and so after it has generated seven dots, it has to generate its 50th character, which must be an asterisk as it is a multiple of 10.
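The observable behavior described so far can be modeled in user space with a single counter shared by all readers. The following sketch is such a model and not the module's actual source: DotsGenerator and read_string are hypothetical names, and the only assumption is the rule stated above, namely that every generated character whose ordinal number is a multiple of 10 is an asterisk.

```rust
/// User-space model of the byte counter that the `dots` module keeps.
pub struct DotsGenerator {
    count: u64,
}

impl DotsGenerator {
    pub const fn new() -> Self {
        DotsGenerator { count: 0 }
    }

    /// Generates the next byte: an asterisk for every 10th byte, otherwise a dot.
    pub fn next_byte(&mut self) -> u8 {
        self.count += 1;
        if self.count % 10 == 0 {
            b'*'
        } else {
            b'.'
        }
    }

    /// Fills `buf` completely, like a read on a file that never ends,
    /// and returns the number of bytes written.
    pub fn read_into(&mut self, buf: &mut [u8]) -> usize {
        for slot in buf.iter_mut() {
            *slot = self.next_byte();
        }
        buf.len()
    }

    /// Total number of bytes generated so far.
    pub fn count(&self) -> u64 {
        self.count
    }
}

/// Reads `n` bytes from the shared generator, as `head -c n` does on a special file.
pub fn read_string(device: &mut DotsGenerator, n: usize) -> String {
    let mut buf = vec![0u8; n];
    device.read_into(&mut buf);
    String::from_utf8(buf).expect("only ASCII bytes are generated")
}
```

Reading 42 bytes and then 12 bytes from the same generator reproduces both outputs shown above, because the second read continues from the 43rd character regardless of which special file is used.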

You can try to print the whole file, but this operation will never end spontaneously because the module will continue to generate characters, as if it were an infinite file. Try to type the following command, and then stop it by pressing Ctrl + C:

          cat /dev/dots1
        

A fast stream of characters will be printed until you stop it.

You can remove the special files by typing the following command:

          sudo rm /dev/dots1 /dev/dots2
        

You can unload the module by typing the following:

          sudo rmmod dots
        

If you unload the module without removing the special files, they will be invalid. If you then try to use one of them, such as by typing head -c4 /dev/dots1, you will get the following error message:

          head: cannot open '/dev/dots1' for reading: No such device or address
        

Now, let's see what has been appended to the kernel buffer by typing the following:

          dmesg --color=always | tail
        

You will see that the last two lines that are printed will be similar to the following:

          [123456.789012] dots: Loaded with major device number 236
          [123463.987654] dots: Unloaded 54

The first line, printed at module loading, also shows the major number of the module. The last line, printed at module unloading, also shows the total number of bytes generated by the module (42 + 12 = 54, if you didn't run the cat command). Now, let's see the implementation of this module.

The source code of the dots module

The only relevant differences that you will find from the other projects are in the src/lib.rs file.

First, the src/lib.rs file declares the use of the Box generic type, which is not included by default, similar to String and Vec in the preceding project. Then, it declares some other bindings to the kernel:

use linux_kernel_module::bindings::{
    __register_chrdev, __unregister_chrdev, _copy_to_user, file, file_operations, loff_t,
};

Their meanings are as follows:

  • __register_chrdev: The function to register a character device in the kernel.
  • __unregister_chrdev: The function to unregister a character device from the kernel.
  • _copy_to_user: The function to copy a sequence of bytes from kernel space to user space.
  • file: The data type representing a file. This is not really used in this project.
  • file_operations: The data type containing the operations implemented on files. Only the read operation is implemented by this module. The operation is named from the perspective of the user code: when the user code reads, the kernel module writes.
  • loff_t: The data type representing a long memory offset, as used by the kernel. This is not really used in this project.

The global information

The global information is kept in the following data type:

struct CharDeviceGlobalData {
    major: c_types::c_uint,
    name: &'static str,
    fops: Option<Box<file_operations>>,
    count: u64,
}

Let's understand the preceding code:

  • The first field (major) is the major number of the device.
  • The second field (name) is the name of the module.
  • The third field (fops, short for file operations) is the set of references to the functions that implement the required file operations. This set of references will be allocated on the heap, and so it is encapsulated in a Box object. Any Box object must encapsulate a valid value from the moment it is created, but the set of references to file operations referenced by the fops field can only be created when the kernel initializes the module. So, this field is encapsulated in an Option object, which is initialized as None and will receive a Box object when the kernel initializes the module.
  • The last field (count) is the counter of generated bytes.

As anticipated, the following is the declaration and initialization of the global object:

static mut GLOBAL: CharDeviceGlobalData = CharDeviceGlobalData {
    major: 0,
    name: "dots\0",
    fops: None,
    count: 0,
};
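
To illustrate why the fops field is wrapped in an Option, here is a minimal userspace sketch. The `Ops` and `Global` types and the `init` function are hypothetical stand-ins, and a `Mutex` is used instead of `static mut` only to keep the example safe outside the kernel. The point it shows is that `None` is a constant that can appear in a static initializer, while the Box itself has to be created later, at run time.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the kernel's `file_operations` table.
#[derive(Default)]
struct Ops {
    read_calls: u32,
}

/// Hypothetical stand-in for the module's global structure.
struct Global {
    name: &'static str,
    ops: Option<Box<Ops>>, // `None` is a constant; the Box is created later
    count: u64,
}

/// A `Mutex` replaces `static mut` in this userspace sketch.
static GLOBAL: Mutex<Global> = Mutex::new(Global {
    name: "dots",
    ops: None,
    count: 0,
});

/// Late initialization, analogous to what happens when the module is loaded.
fn init() {
    let mut g = GLOBAL.lock().unwrap();
    g.ops = Some(Box::new(Ops::default()));
}

fn main() {
    println!("before init: ops set = {}", GLOBAL.lock().unwrap().ops.is_some());
    init();
    let g = GLOBAL.lock().unwrap();
    let calls = g.ops.as_ref().map(|o| o.read_calls);
    println!("after init: {} {} {:?}", g.name, g.count, calls);
}
```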

The module contains only three functions: init_module, cleanup_module, and read_dot. The first two functions are the ones invoked by the kernel when the module is loaded and unloaded, respectively. The third function is called by the kernel every time some user code tries to read a byte from this module.

The init_module and cleanup_module functions are linked by name, so they must have exactly these names and must be preceded by the #[no_mangle] directive to prevent Rust from changing their names. The read_dot function, by contrast, is passed to the kernel through its address, not its name. Therefore, it can have any name you like, and the #[no_mangle] directive is not required for it.

The initialization call

Let's see the first part of the body of the init_module function:

let mut fops = Box::new(file_operations::default());
fops.read = Some(read_dot);
let major = unsafe {
    __register_chrdev(
        0,
        0,
        256,
        GLOBAL.name.as_bytes().as_ptr() as *const i8,
        &*fops,
    )
};

In the first statement, a file_operations structure, containing the references to the file operations, is created with default values and put into a Box object.

The default value of any file operation is None, meaning that nothing is performed when this kind of operation is required. We will use just the read file operation and we will need this operation to call the read_dot function. Therefore, in the second statement, this function is assigned to the read field of the newly created structure.
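
The idea of an operations table whose slots default to None can be sketched in userspace as follows. Everything here is an assumption made for illustration: `MiniFileOps`, `any_name_works`, and `call_read` are hypothetical names, and the real file_operations structure has many more fields with different signatures.

```rust
/// Hypothetical miniature of a C-style operations table.
#[derive(Default)]
struct MiniFileOps {
    read: Option<extern "C" fn(buf: *mut u8, len: usize) -> isize>,
    write: Option<extern "C" fn(buf: *const u8, len: usize) -> isize>,
}

/// The name is arbitrary: the table stores the address, not the symbol.
extern "C" fn any_name_works(buf: *mut u8, len: usize) -> isize {
    if buf.is_null() || len == 0 {
        return 0;
    }
    // Safety: the caller guarantees `buf` points to at least `len` writable bytes.
    unsafe { *buf = b'.' };
    1
}

/// Dispatch the way a user of the table would: nothing happens if the slot is empty.
fn call_read(ops: &MiniFileOps, buf: &mut [u8]) -> Option<isize> {
    ops.read.map(|f| f(buf.as_mut_ptr(), buf.len()))
}

fn main() {
    let mut ops = Box::new(MiniFileOps::default());
    println!("empty table: read={} write={}", ops.read.is_some(), ops.write.is_some());
    ops.read = Some(any_name_works);
    let mut buf = [0u8; 4];
    let n = call_read(&ops, &mut buf);
    println!("{:?} {}", n, buf[0] as char);
}
```

Only the slot that was explicitly assigned is ever called; the write slot stays None, just as every file operation other than read does in the module.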

The third statement calls the __register_chrdev kernel function, which registers a character device. This function is officially documented on a web page, available at https://www.kernel.org/doc/html/latest/core-api/kernel-api.html?highlight=__register_chrdev#c.__register_chrdev. The five arguments of this function have the following purposes:

  • The first argument is the required major number of the device. However, if it is 0, as in our case, a major number will be generated by the kernel and returned by the function.
  • The second argument is the value to start from in order to generate the minor number. We will start from 0.
  • The third argument is the number of minor numbers that we request to allocate. We will allocate 256 minor numbers, from 0 to 255.
  • The fourth argument is the name of the range of devices we are registering. The kernel expects a null-terminated ASCII string. Therefore, the name field has been declared with a terminating zero byte, and here, a rather complex expression just changes the data type of this name. The as_bytes() call converts the string slice into a byte slice. The as_ptr() call gets the address of the first byte of this slice as a raw pointer of type *const u8. The as *const i8 clause converts it into a raw pointer to signed bytes, which is the type the binding expects for a C string.
  • The fifth argument is the address of the file operation structure. Only its read field will be used by the kernel when a read operation is performed.
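
The conversion described for the fourth argument can be tried in userspace. The following is a sketch under stated assumptions: the helper names `as_c_ptr` and `read_back` are hypothetical, and `std::os::raw::c_char` is used in place of `i8` so that the example also compiles on platforms where the C char type is unsigned.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Turn a string literal that already ends in a zero byte into a C string pointer.
fn as_c_ptr(name: &'static str) -> *const c_char {
    assert!(name.ends_with('\0'), "the literal must carry its own terminator");
    name.as_bytes().as_ptr() as *const c_char
}

/// Read the string back the way C code would, stopping at the first zero byte.
fn read_back(ptr: *const c_char) -> String {
    // Safety: `ptr` comes from `as_c_ptr`, so it is zero-terminated and 'static.
    unsafe { CStr::from_ptr(ptr) }.to_string_lossy().into_owned()
}

fn main() {
    let ptr = as_c_ptr("dots\0");
    println!("{}", read_back(ptr));
}
```

Without the trailing zero byte, C code reading through this pointer would run past the end of the literal, which is why the sketch checks for the terminator before casting.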

Now, let's see the rest of the body of the init_module function:

if major < 0 {
    return 1;
}
unsafe {
    GLOBAL.major = major as c_types::c_uint;
}
println!("dots: Loaded with major device number {}", major);
unsafe {
    GLOBAL.fops = Some(fops);
}
0

The major number returned by the call to __register_chrdev should be a non-negative number generated by the kernel. It is negative only in the case of an error. As we want module loading to fail if the registration fails, we return 1 in that case, signaling that the module could not be loaded.

In case of success, the major number is stored in the major field of our global structure. Then, a success message is added to the kernel buffer, containing the generated major number.

Finally, the fops file operation structure is stored in the global structure.
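
The handling of the return value can be summarized in a small userspace sketch. The function names `major_from_return` and `init_status` are hypothetical, the return type is assumed to be a 32-bit signed integer, and -16 is just an arbitrary negative value chosen for the example.

```rust
/// Interpret a C-style return value: negative means failure,
/// anything else is the major number chosen by the kernel.
fn major_from_return(ret: i32) -> Result<u32, i32> {
    if ret < 0 {
        Err(ret)
    } else {
        Ok(ret as u32)
    }
}

/// Mirror the status returned by init_module in the text: 0 on success, 1 on failure.
fn init_status(ret: i32) -> i32 {
    match major_from_return(ret) {
        Ok(_) => 0,
        Err(_) => 1,
    }
}

fn main() {
    // 236 is the major number from the sample output; -16 is an arbitrary negative value.
    println!("{:?} -> {}", major_from_return(236), init_status(236));
    println!("{:?} -> {}", major_from_return(-16), init_status(-16));
}
```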

Notice that after the registration call, the kernel keeps the address of the fops structure, and so this address must not change while the device is registered. This holds because the structure is allocated on the heap by the Box::new call, and the assignment of fops moves just the Box object, which is the pointer to the heap object, not the heap object itself. This explains why a Box object has been used.
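
That moving a Box leaves the heap object in place can be verified with a short userspace sketch. The `Table` and `Global` types and the `addresses_around_move` function are hypothetical stand-ins for the operations table and the module's global structure.

```rust
/// Hypothetical stand-in for the heap-allocated operations table.
struct Table {
    marker: u32,
}

/// Hypothetical stand-in for the module's global structure.
struct Global {
    table: Option<Box<Table>>,
}

/// Record the heap address, move the Box into `global`, and return both addresses.
fn addresses_around_move(global: &mut Global) -> (*const Table, *const Table) {
    let table = Box::new(Table { marker: 7 });
    let before: *const Table = &*table; // address that would be handed to the kernel
    global.table = Some(table); // moves the Box (a pointer), not the Table
    let after: *const Table = global.table.as_deref().unwrap();
    (before, after)
}

fn main() {
    let mut global = Global { table: None };
    let (before, after) = addresses_around_move(&mut global);
    println!("same address: {}", before == after);
    println!("marker: {}", global.table.as_ref().unwrap().marker);
}
```

Had the table been stored by value, first in a local variable and then in the global structure, the two addresses would generally differ, and a pointer taken before the move would be left dangling.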

The cleanup call

Now, let's look at the body of the cleanup_module function:

unsafe {
    println!("dots: Unloaded {}", GLOBAL.count);
    __unregister_chrdev(
        GLOBAL.major,
        0,
        256,
        GLOBAL.name.as_bytes().as_ptr() as *const i8,
    )
}

The first statement prints the unloading message to the kernel buffer, including the total count of bytes read from this module since its loading.

The second statement calls the __unregister_chrdev kernel function, which unregisters a previously registered character device. This function is officially documented on a web page, available at https://www.kernel.org/doc/html/latest/core-api/kernel-api.html?highlight=__unregister_chrdev#c.__unregister_chrdev.

Its arguments are quite similar to the first four arguments of the function used to register the device. They must be identical to the corresponding registered values. However, while, in the registering function, we specified 0 as the major number, here we must specify the actual major number.

The reading function

Finally, let's see the definition of the function that will be invoked by the kernel every time some user code tries to read a byte from this module:

extern "C" fn read_dot(
    _arg1: *mut file,
    arg2: *mut c_types::c_char,
    _arg3: usize,
    _arg4: *mut loff_t,
) -> isize {
    unsafe {
        GLOBAL.count += 1;
        _copy_to_user(
            arg2 as *mut c_types::c_void,
            if GLOBAL.count % 10 == 0 { "*" } else { "." }.as_ptr() as *const c_types::c_void,
            1,
        );
        1
    }
}

This function must also be decorated with the extern "C" clause to ensure that its calling convention is the same as the one used by the kernel, which is the one used by the system's C language compiler.

This function has four arguments, but we will only use the second one. This argument is a pointer to a buffer in user space where the generated character must be written. The body of the function contains only three statements.

The first statement increments the total count of bytes read by the user code (which is written by the kernel module).

The second statement is a call to the _copy_to_user kernel function. This is the function to use when you want to copy one or more bytes from a memory area controlled by kernel code to a memory area controlled by the user code, because a simple assignment is not allowed for this operation. This function is officially documented at https://www.kernel.org/doc/htmldocs/kernel-api/API---copy-to-user.html.

Its first argument is the destination address, which is the memory position where we want to write our byte. In our case, this is simply the second argument of the read_dot function, converted into the proper data type.

The second argument is the source address, which is the memory position where we put the byte we want to return to the user. In our case, we want to return an asterisk after every nine dots. So, we check whether the total number of read characters is a multiple of 10. If it is, we use a static string slice containing only an asterisk; otherwise, we use a string slice containing only a dot. The call to as_ptr() gets the address of the first byte of the string slice, and the as *const c_types::c_void clause converts it into the expected data type, which corresponds to the const void * C language data type.

The third argument is the number of bytes to copy. Of course, in our case, this is 1.

That's all that is needed to emit dots and asterisks.
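
The same logic can be exercised outside the kernel with a userspace analogue. This is a sketch, not the module's code: `read_dot_sim` and `read_n` are hypothetical names, an `AtomicU64` replaces the `static mut` counter, and `ptr::copy_nonoverlapping` stands in for `_copy_to_user`, which exists only inside the kernel.

```rust
use std::os::raw::c_char;
use std::ptr;
use std::sync::atomic::{AtomicU64, Ordering};

/// Userspace stand-in for `GLOBAL.count`; an atomic avoids `static mut` here.
static COUNT: AtomicU64 = AtomicU64::new(0);

/// Hypothetical userspace analogue of `read_dot`: writes exactly one byte to `buf`.
extern "C" fn read_dot_sim(buf: *mut c_char, _len: usize) -> isize {
    let n = COUNT.fetch_add(1, Ordering::SeqCst) + 1; // value after the increment
    let src: &'static str = if n % 10 == 0 { "*" } else { "." };
    // `ptr::copy_nonoverlapping` stands in for the kernel-only `_copy_to_user`.
    // Safety: the caller passes a buffer with room for at least one byte.
    unsafe { ptr::copy_nonoverlapping(src.as_ptr(), buf as *mut u8, 1) };
    1 // number of bytes produced
}

/// Call the callback `n` times, one byte per call, and collect the result.
fn read_n(n: usize) -> String {
    let mut out = String::with_capacity(n);
    for _ in 0..n {
        let mut byte: c_char = 0;
        read_dot_sim(&mut byte, 1);
        out.push(byte as u8 as char);
    }
    out
}

fn main() {
    println!("{}", read_n(10));
}
```

Returning 1 from every call tells the caller that a single byte is available, so a request for several bytes is served by repeated calls.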

Summary

In this chapter, we looked at the tools and techniques that can be used to create loadable modules for the kernel of the Linux operating system using Rust, instead of the typical C programming language.

In particular, we saw the sequence of commands that can be used in a Mint distribution on an x86_64 architecture to configure the appropriate environment to build and test loadable kernel modules. We also looked at the modinfo, lsmod, insmod, rmmod, dmesg, and mknod command-line tools.

We saw that to create a kernel module, it is useful to have a framework of code that implements a target framework for the Rust compiler. The Rust source code is compiled to a Linux static library using this target. Then, this library is linked with some C language glue code into a loadable kernel module.

We created four projects of increasing complexity—boilerplate, state, allocating, and dots. In particular, the dots project created a module that can be mapped to a special file using the mknod command; after this mapping, when this special file is read, a stream of dots and asterisks is generated.

In the next and final chapter, we'll consider the advancements of the Rust ecosystem over the next few years—the language, the standard library, the standard tooling, and the freely available libraries and tools. A description of the newly supported asynchronous programming is also included.

Questions

  1. What is a Linux loadable kernel module?
  2. What is the programming language expected to be used by the Linux kernel for its modules?
  3. What is the kernel buffer and what is the first part of every line in it?
  4. What is the purpose of the modinfo, lsmod, insmod, and rmmod Linux commands?
  5. Why, by default, are the String, Vec, and Box data types not available to Rust code for building kernel modules?
  6. What is the purpose of the #[no_mangle] Rust directive?
  7. What is the purpose of the extern "C" Rust clause?
  8. What is the purpose of the init_module and cleanup_module functions?
  9. What is the purpose of the __register_chrdev and __unregister_chrdev functions?
  10. Which function should be used to copy a sequence of bytes from kernel space memory to user-space memory?

Further reading

The framework used for the projects in this chapter is a modification of the open source repository that can be found at https://github.com/lizhuohua/linux-kernel-module-rust. This repository contains further examples and documentation pertaining to this topic.

The documentation for the Linux kernel can be found at https://www.kernel.org/doc/html/latest/.
