In this chapter, you will look at using Go to explore the container world. You will look at different container-related projects to get a better understanding of containers and some of the technologies they use. There are many different aspects to containers, such as security, troubleshooting, and scaling container registries. This chapter will give you an understanding of the following topics:
The Linux namespace
Understanding cgroups and rootfs
How containers use rootfs
You will explore different open source projects to understand how containers work and how tools such as Docker actually work.
Linux Namespace
In this section, you will look at namespaces, which are key components in running containers on your local or cloud environment. Namespaces are features that are only available in the Linux kernel, so everything that you will read here is relevant to the Linux operating system.
A namespace is a feature provided by the Linux kernel for applications to use. What is it, actually? A namespace creates an isolated environment for the processes you want to run, giving them their own view of system resources.
Figure 4-1 shows a representation of isolated namespaces, each running applications with its own network. An application running inside a namespace cannot access anything outside its own namespace. For example, App1 cannot access App2’s resources. If for some reason App1 crashes, it will bring down neither the other applications nor the Linux host. Think of a namespace as an island for running applications; it provides everything the applications need to run without disturbing the surrounding islands.
A diagram of Linux namespaces: three isolated blocks, each containing an app and its own network, numbered 1 to 3.
Figure 4-1
Linux namespace
You can create namespaces using tools that are already available in the Linux system. One of the tools you are going to experiment with is called unshare. It is a tool that allows users to create namespaces and run applications inside that namespace.
Before you run unshare, let’s take a look at my local host machine and compare it to what you see when running an app inside unshare. We will compare the following:
The applications that are running in the host machine compared to when we run them inside a namespace
The available network interface in a host machine compared to when inside the namespace
To list running applications on my local Linux machine, I use the command ps.
ps au
The following is a snippet of the list of applications that are currently running on my local machine:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
...
To list the network interfaces available on the host, use the ip command:
ip link
The following snippet shows the interfaces on my machine, including the docker0 bridge that Docker creates:
3: wlp0s20f3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DORMANT group default qlen 1000 link/ether xx:xx:xa:xx:xx:xx brd xx:xx:xx:xx:xx:xx
...
5: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
Now use unshare to create a new namespace and run a shell inside it. A typical invocation (the exact flags may vary; this one creates new PID and network namespaces and mounts a fresh proc filesystem) is
sudo unshare --fork --pid --net --mount-proc bash
A screenshot of two terminal outputs from running unshare, one as the host user nanik and one as root inside the new namespace.
Figure 4-2
Running unshare
Inside the new namespace, as seen in Figure 4-2, only two processes and one network interface (the loopback interface) are displayed. This shows that the namespace is isolating access to the host machine.
You have looked at using unshare to create namespaces and run bash as an application isolated in its own namespace. Now that you have a basic understanding of namespaces, you will explore another piece of the puzzle called cgroups in the next section.
cgroups
cgroups stands for control groups, a feature provided by the Linux kernel. Namespaces, which we discussed in the previous section, go hand in hand with cgroups. Let’s take a look at what cgroups provide. cgroups give users the ability to limit certain resources, such as CPU, memory, and network, allocated to a particular process or set of processes. A host machine’s resources are finite, and if you want to run multiple processes in separate namespaces, you need to decide how to allocate resources across those namespaces.
cgroups reside inside the /sys/fs/cgroup directory. Let’s create a subdirectory inside the main cgroup directory and take a peek inside it. Run the following command to create a directory as root:
sudo mkdir /sys/fs/cgroup/example
List the directories inside the newly created directory using the following command:
sudo ls /sys/fs/cgroup/example -la
You will see output that looks like the following:
-r--r--r-- 1 root root 0 May 24 23:06 cgroup.controllers
-r--r--r-- 1 root root 0 May 24 23:06 cgroup.events
-rw-r--r-- 1 root root 0 May 24 23:06 cgroup.freeze
...
-rw-r--r-- 1 root root 0 May 24 23:06 cgroup.type
-rw-r--r-- 1 root root 0 May 24 23:06 cpu.idle
-rw-r--r-- 1 root root 0 May 24 23:06 cpu.max
-rw-r--r-- 1 root root 0 May 24 23:06 cpu.max.burst
-rw-r--r-- 1 root root 0 May 24 23:06 cpu.pressure
-rw-r--r-- 1 root root 0 May 24 23:06 cpuset.cpus
-r--r--r-- 1 root root 0 May 24 23:06 cpuset.cpus.effective
-rw-r--r-- 1 root root 0 May 24 23:06 cpuset.cpus.partition
-rw-r--r-- 1 root root 0 May 24 23:06 cpuset.mems
-r--r--r-- 1 root root 0 May 24 23:06 cpuset.mems.effective
...
-rw-r--r-- 1 root root 0 May 24 23:06 io.max
...
-rw-r--r-- 1 root root 0 May 24 23:06 memory.low
-rw-r--r-- 1 root root 0 May 24 23:06 memory.max
-rw-r--r-- 1 root root 0 May 24 23:06 memory.min
-r--r--r-- 1 root root 0 May 24 23:06 memory.numa_stat
-rw-r--r-- 1 root root 0 May 24 23:06 memory.oom.group
-rw-r--r-- 1 root root 0 May 24 23:06 memory.pressure
...
The files that you see are the configuration interface: you write values into them that are relevant to the resources you want to allocate for a particular process. Let’s take a look at an example.
You will run a tool called stress (https://linux.die.net/man/1/stress), which you need to install to your local machine. If you are using Ubuntu, you can use the command
sudo apt install stress
Open a terminal and run the stress tool as follows. The application will run for 60 seconds, using one core and consuming 100% of that CPU core.
stress --cpu 1 --timeout 60
Open another terminal and run the following command to obtain the process id of the stress application:
top
On my local machine, the process id is 2185657, as shown in Figure 4-3.
A screenshot of top output with columns PID, USER, PR, NI, VIRT, RES, SHR, S, %CPU, %MEM, TIME+, and COMMAND.
Figure 4-3
Output of top
Now insert the value of the process ID into the cgroup. On a cgroup v2 system, the commands look like the following (replace the PID with yours):
echo "20000 100000" | sudo tee /sys/fs/cgroup/example/cpu.max
echo 2185657 | sudo tee /sys/fs/cgroup/example/cgroup.procs
The first command allocates 20% of a CPU to all processes inside the example cgroup, and the second marks the stress application’s process ID as part of the example cgroup. If you still have top running in another terminal, you will see that the stress application now consumes only 20% instead of 100%.
This example shows that by applying cgroups to processes, you can restrict the amount of resources they consume based on how you want to allocate them.
You looked at cgroups (control groups) in this section and learned how to allocate resources to processes. In the next section, you will learn about rootfs, which you must understand because it is a crucial component in understanding containers.
rootfs
In this section, you will explore rootfs and how it is applied in containers. First, let’s understand what rootfs actually is. rootfs stands for root filesystem, which simply means it is the filesystem containing all the basic necessary files required to boot the operating system. Without the correct rootfs, the operating system will not boot up and no application can run.
rootfs is required so that the operating system can mount other filesystems, including configuration data, essential startup processes, and filesystems located in other disk partitions. The following shows the minimal directories found in a rootfs:
/bin
/sbin
/etc
/root
/lib
/lib/modules
/dev
/tmp
/boot
/mnt
/proc
/usr
/var
/home
Running an application inside a container requires a rootfs, which allows the application to run as it would on a normal system. Let’s take a look at what a minimal rootfs actually looks like. Head over to www.alpinelinux.org/downloads/ to download the Alpine rootfs. Alpine is a well-known Linux distribution that is widely used for creating containers because of its small image size.
Download the rootfs file from the “Mini Root Filesystem” section as shown in Figure 4-4. If you are using an x86 processor, download the x86_64 file.
A screenshot of the Mini Root Filesystem download section listing the architectures aarch64, armhf, armv7, ppc64le, s390x, x86, and x86_64.
Figure 4-4
Mini root filesystem
Once downloaded, copy the file into a separate directory. In my case, the file is called alpine-minirootfs-3.15.4-x86_64.tar.gz and it is copied into the /home/nanik/play/rootfs directory. Use the following command to extract it:
gunzip ./alpine-minirootfs-3.15.4-x86_64.tar.gz
tar -xvf ./alpine-minirootfs-3.15.4-x86_64.tar
The following is a listing of the extracted contents:
drwxr-xr-x 19 nanik nanik 4096 Apr 5 02:06 ./
drwxrwxr-x 3 nanik nanik 4096 May 28 18:46 ../
drwxr-xr-x 2 nanik nanik 4096 Apr 5 02:06 bin/
drwxr-xr-x 2 nanik nanik 4096 Apr 5 02:06 dev/
drwxr-xr-x 16 nanik nanik 4096 Apr 5 02:06 etc/
drwxr-xr-x 2 nanik nanik 4096 Apr 5 02:06 home/
drwxr-xr-x 7 nanik nanik 4096 Apr 5 02:06 lib/
drwxr-xr-x 5 nanik nanik 4096 Apr 5 02:06 media/
drwxr-xr-x 2 nanik nanik 4096 Apr 5 02:06 mnt/
drwxr-xr-x 2 nanik nanik 4096 Apr 5 02:06 opt/
dr-xr-xr-x 2 nanik nanik 4096 Apr 5 02:06 proc/
drwx------ 2 nanik nanik 4096 Apr 5 02:06 root/
drwxr-xr-x 2 nanik nanik 4096 Apr 5 02:06 run/
drwxr-xr-x 2 nanik nanik 4096 Apr 5 02:06 sbin/
drwxr-xr-x 2 nanik nanik 4096 Apr 5 02:06 srv/
drwxr-xr-x 2 nanik nanik 4096 Apr 5 02:06 sys/
drwxrwxr-x 2 nanik nanik 4096 Apr 5 02:06 tmp/
drwxr-xr-x 7 nanik nanik 4096 Apr 5 02:06 usr/
drwxr-xr-x 12 nanik nanik 4096 Apr 5 02:06 var/
The following output shows what the different directories contain:
.
├── bin
│ ├── arch -> /bin/busybox
...
├── dev
├── etc
...
│ ├── modprobe.d
...
├── home
...
├── sbin
│ ├── acpid -> /bin/busybox
│ ├── adjtimex -> /bin/busybox
...
├── srv
├── sys
├── tmp
├── usr
│ ├── bin
│ │ ├── [ -> /bin/busybox
│ │ ├── [[ -> /bin/busybox
...
│ │ └── yes -> /bin/busybox
│ ├── lib
│ │ ├── engines-1.1
...
│ │ └── modules-load.d
│ ├── local
│ │ ├── bin
...
│ ├── man
│ ├── misc
│ └── udhcpc
│ └── default.script
├── var
│ ├── cache
│ ├── empty
│ ├── lib
Now that you have a good idea of what rootfs is all about and what it contains, in the next section you will put everything together with a rootfs and run an application the way it normally runs as a container.
Gontainer Project
So far you have looked at the different pieces required to run an application in isolation: namespaces, cgroups, and rootfs. In this section, you will look at a sample app that puts everything together and runs an application inside its own namespace. In other words, you are going to run the application as a container. The code can be checked out from https://github.com/nanikjava/gontainer.
Make sure you download and extract the rootfs as explained in the section “rootfs.” Once the rootfs has been extracted to your local machine, change into the gontainer directory and compile the project using the following command:
go build
Once compiled, you will get an executable called gontainer. Run the application using the following command:
sudo ./gontainer --chrt "[rootfs directory]" run sh
The command runs sh, the default shell of the Alpine distribution, inside a container. Replace [rootfs directory] with the directory containing the uncompressed Alpine rootfs. For example, on my machine it is /home/nanik/play/rootfs. The full command for my local machine is
sudo ./gontainer --chrt "/home/nanik/play/rootfs" run sh
You will get the prompt /usr # and you’ll be able to execute any normal Linux commands. Figure 4-5 shows some of the commands executed inside gontainer.
A screenshot of a shell running inside gontainer with the hostname container, executing Linux commands that show the container’s own process list.
Figure 4-5
Gontainer in action
Let’s take a look at the code to understand how the whole thing works. There is only one file, called gontainer.go. As you saw earlier, the way you run the app is by supplying the arguments run sh, which are processed by the main() function shown here:
func main() {
// outline cleanup tasks
wg.Add(1)
...
// actual program
switch args[0] {
case "run":
go run()
...
}
The function run() that takes care of running the application specified with the parameter run is shown here:
func run() {
defer cleanup()
infof("run as [%d] : running %v", os.Getpid(), args[1:])
The function uses the magic symbolic link /proc/self, which, when a process accesses it, resolves to the process’s own /proc/[pid] directory. The proc man page describes the exe entry, /proc/[pid]/exe, as follows:
/proc/[pid]/exe
Under Linux 2.2 and later, this file is a symbolic link containing the actual pathname of the executed command. This symbolic link can be dereferenced normally; attempting to open it will open the executable.
The explanation clearly states that using /proc/self/exe means you are spawning the currently running app, so the run() function is re-running itself as a separate process, passing in the parameters held in lst.
The function uses exec.Command to run /proc/self/exe, passing the variable as lst, which contains the following command:
--chdr /usr --chrt /home/nanik/play/rootfs/ --timeout 0s child sh
Let’s explore what the arguments passed to the application are telling it to do. The init() function declares the following flags that it can receive as arguments:
func init() {
pflag.StringVar(&chroot, "chrt", "", "Where to chroot to. Should contain a linux filesystem. Alpine is recommended. GONTAINER_FS environment is default if not set")
pflag.StringVar(&chdir, "chdr", "/usr", "Initial chdir executed when running container")
pflag.DurationVar(&timeout, "timeout", 0, "Timeout before ending program. If 0 then never ends")
...
infof("flaginputs: %v", flagInputs)
}
Table 4-1 explains the mapping of the arguments passed via lst.
Table 4-1
Mapping Arguments
Format
Explanation
--chdr /usr
Initial chdir executed when running container
--chrt /home/nanik/play/rootfs/
Where to chroot to
--timeout 0s
Timeout before ending program. If 0, then it never ends.
sh
The app to run once the rootfs is up and running
The only parameter not shown in the table is the child parameter, which is not processed as a flag. The child parameter is handled by the main() function, which executes the child() function in a goroutine, as shown in the following code snippet:
func main() {
// outline cleanup tasks
...
// actual program
switch args[0] {
...
case "child":
go child()
...
}
The child() function does all the heavy lifting of running the new process in a container-like environment. The following shows the code of the child() function:
func child() {
defer cleanup()
infof("child as [%d]: chrt: %s, chdir:%s", os.Getpid(), chroot, chdir)
infof("running %v", args[1:])
must(syscall.Sethostname([]byte("container")))
must(syscall.Chroot(chroot), "error in 'chroot ", chroot+"'")
syscall.Mkdir(chdir, 0600)
// initial chdir is necessary so dir pointer is in chroot dir when proc mount is called
must(syscall.Chdir("/"), "error in 'chdir /'")
must(syscall.Mount("proc", "proc", "proc", 0, ""), "error in proc mount")
must(syscall.Chdir(chdir), "error in 'chdir ", chdir+"'")
Table 4-2 explains what each section of code is doing. Ignore the must function call as this is an internal function call that checks the return value of each system call.
Table 4-2
Code Explanations
Code
Description
must(syscall.Sethostname([]byte("container")))
Specifies the hostname of the container
must(syscall.Chroot(chroot), "error in 'chroot ", chroot+"'")
Performs chroot into the specified rootfs (in this example, /home/nanik/play/rootfs)
must(syscall.Mount("proc", "proc", "proc", 0, ""), "error in proc mount")
Mounts the proc filesystem inside the new root
must(syscall.Chdir(chdir), "error in 'chdir ", chdir+"'")
Changes into the initial working directory (in this example, /usr)
The following code snippet wires the executed application’s standard input, output, and error streams to those of the host process:
...
cntcmd.Stdin = os.Stdin
cntcmd.Stdout = os.Stdout
cntcmd.Stderr = os.Stderr
...
Once cntcmd.Run() starts the process and the prompt shows up, you are running inside the container, isolated from the host operating system.
Summary
In this chapter, you explored the different parts required to run an application inside a container: namespaces, cgroups, and rootfs. You experimented with the different available Linux tools to create namespaces and configured resources for particular namespaces.
You also explored rootfs, which is a key component in booting the operating system and thus allowing applications to run. Finally, you looked at a sample project that shows how to put the different components together in Go, using the Alpine rootfs.