© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2023
N. Tolaram, Software Development with Go, https://doi.org/10.1007/978-1-4842-8731-6_4

4. Simple Containers

Nanik Tolaram1  
(1)
Sydney, NSW, Australia
 
In this chapter, you will look at using Go to explore the container world. You will look at different container-related projects to get a better understanding of containers and some of the technologies they use. There are many different aspects to containers, such as security, troubleshooting, and scaling container registries. This chapter will give you an understanding of the following topics:
  • The Linux namespace

  • Understanding cgroups and rootfs

  • How containers use rootfs

You will explore different open source projects to understand how containers work and how tools such as Docker actually work.

Linux Namespace

In this section, you will look at namespaces, which are key components in running containers on your local or cloud environment. Namespaces are features that are only available in the Linux kernel, so everything that you will read here is relevant to the Linux operating system.

A namespace is a feature provided by the Linux kernel for applications to use, so what actually is it? It is used to create an isolated environment for processes, giving the processes you run inside it their own view of system resources.

Figure 4-1 shows a representation of isolated namespaces, each running applications with its own network. An application running inside a namespace cannot access anything outside its own namespace. For example, App1 cannot access App2's resources. If for some reason App1 crashes, it will not bring down the other applications, nor will it bring down the Linux host. Think of a namespace as an island for running applications; it provides everything the applications need to run without disturbing the surrounding islands.

Figure 4-1

Linux namespace
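
Before creating any namespaces, you can see which ones the current process already belongs to: the kernel exposes them as symbolic links under /proc/self/ns. The following is a minimal Go sketch (Linux only, since these links do not exist on other systems):
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	// Each entry in /proc/self/ns is a symlink such as pid:[4026531836];
	// the number identifies the namespace the process belongs to.
	entries, err := os.ReadDir("/proc/self/ns")
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		target, err := os.Readlink(filepath.Join("/proc/self/ns", e.Name()))
		if err != nil {
			continue
		}
		fmt.Printf("%-8s -> %s\n", e.Name(), target)
	}
}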

You can create namespaces using tools that are already available on a Linux system. One of the tools you are going to experiment with is called unshare. It allows users to create namespaces and run applications inside them.

Before you run unshare, let’s take a look at my local host machine and compare it to what we see when running an app with unshare. We will compare the following:
  • The applications that are running in the host machine compared to when we run them inside a namespace

  • The available network interface in a host machine compared to when inside the namespace

To list running applications on my local Linux machine, I use the command ps.
ps au
The following is a snippet of the list of applications that are currently running on my local machine:
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
...
nanik       2551    0.0  0.0  231288    4  tty2     SNl+ May09  0:00 /usr/libexec/gnome-session-binary --systemd --session=pop
nanik       6418    0.0  0.0  21644   712  pts/0    S<s  May09  0:00 bash
nanik       8594    0.0  0.0  22820     8  pts/2    S<s  May09  0:00 bash
nanik       9828    0.0  0.0  22516  4300  pts/3    S<s+ May09  0:03 bash
...
nanik       295802  0.0  0.0  1716900 6408 pts/7    S<l+ May11  2:18 docker run -p 6379:6379 redis
nanik       511876  0.0  0.0  21288    24  pts/6    S<s  May13   0:00 bash
nanik       642244  0.0  0.0  21420     8  pts/8    S<s+ May14   0:00 bash
...
root        1368986 0.0  0.0  25220   108  pts/3    T<   May19   0:00 sudo gedit /etc/hosts
To look at the available network interfaces on the local machine, use the ip command.
ip link
It shows the following interfaces:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp4s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000
    link/ether 88:a4:c2:a4:85:ac brd ff:ff:ff:ff:ff:ff
3: wlp0s20f3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DORMANT group default qlen 1000
    link/ether xx:xx:xa:xx:xx:xx brd xx:xx:xx:xx:xx:xx
...
5: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether xx:xx:xa:xx:xx:xx brd ff:ff:ff:ff:ff:ff
...
447: thebridge: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default
    link/ether xx:xx:xa:xx:xx:xx brd ff:ff:ff:ff:ff:ff

As you can see, there are many processes running on the local host machine and there are many network interfaces.

Run the following command to create a namespace and run bash inside the namespace as the application:
unshare --user --pid --map-root-user --cgroup --mount-proc --net --uts --fork bash
It will look like Figure 4-2.

Figure 4-2

Running unshare

Inside the new namespace, as seen in Figure 4-2, only two processes and one network interface (the loopback interface) are visible. This shows that the namespace is isolating access to the host machine.
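
For comparison, the following is a minimal Go sketch of what unshare just did: it starts bash with roughly the same namespace flags and maps the current user to root inside the new user namespace. This is only an illustration of the clone flags, not the project code; the gontainer project later in this chapter uses the same mechanism.
package main

import (
	"os"
	"os/exec"
	"syscall"
)

func main() {
	cmd := exec.Command("bash")
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		// roughly the namespaces the unshare invocation above creates
		Cloneflags: syscall.CLONE_NEWUSER | syscall.CLONE_NEWPID |
			syscall.CLONE_NEWNS | syscall.CLONE_NEWNET | syscall.CLONE_NEWUTS,
		// map the current user to root inside the namespace (--map-root-user)
		UidMappings: []syscall.SysProcIDMap{{ContainerID: 0, HostID: os.Getuid(), Size: 1}},
		GidMappings: []syscall.SysProcIDMap{{ContainerID: 0, HostID: os.Getgid(), Size: 1}},
	}
	if err := cmd.Run(); err != nil {
		panic(err)
	}
}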

You have looked at using unshare to create namespaces and run bash as an application isolated in its own namespace. Now that you have a basic understanding of namespaces, you will explore another piece of the puzzle called cgroups in the next section.

cgroups

cgroups stands for control groups, a feature provided by the Linux kernel. Namespaces, which we discussed in the previous section, go hand in hand with cgroups. Let’s take a look at what cgroups provides: it gives users the ability to limit the resources, such as CPU, memory, and network, allocated to a particular process or group of processes. A host machine’s resources are finite, and if you want to run multiple processes in separate namespaces, you need a way to divide those resources across the namespaces.
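
You can ask the kernel which controllers are available by reading cgroup.controllers at the root of the hierarchy. The following is a minimal Go sketch; it assumes a cgroup v2 system, which is what the listings below come from:
package main

import (
	"fmt"
	"os"
)

func main() {
	// On a cgroup v2 system this prints the controllers that can be
	// enabled, for example: cpuset cpu io memory hugetlb pids
	data, err := os.ReadFile("/sys/fs/cgroup/cgroup.controllers")
	if err != nil {
		panic(err)
	}
	fmt.Print(string(data))
}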

cgroups resides inside the /sys/fs/cgroup directory. Let’s create a subdirectory inside the main cgroup directory and take a peek inside it. Run the following command to create a directory using root:
sudo mkdir /sys/fs/cgroup/example
List the directories inside the newly created directory using the following command:
sudo ls /sys/fs/cgroup/example -la
You will see output that looks like the following:
-r--r--r--  1 root root 0 May 24 23:06 cgroup.controllers
-r--r--r--  1 root root 0 May 24 23:06 cgroup.events
-rw-r--r--  1 root root 0 May 24 23:06 cgroup.freeze
...
-rw-r--r--  1 root root 0 May 24 23:06 cgroup.type
-rw-r--r--  1 root root 0 May 24 23:06 cpu.idle
-rw-r--r--  1 root root 0 May 24 23:06 cpu.max
-rw-r--r--  1 root root 0 May 24 23:06 cpu.max.burst
-rw-r--r--  1 root root 0 May 24 23:06 cpu.pressure
-rw-r--r--  1 root root 0 May 24 23:06 cpuset.cpus
-r--r--r--  1 root root 0 May 24 23:06 cpuset.cpus.effective
-rw-r--r--  1 root root 0 May 24 23:06 cpuset.cpus.partition
-rw-r--r--  1 root root 0 May 24 23:06 cpuset.mems
-r--r--r--  1 root root 0 May 24 23:06 cpuset.mems.effective
...
-rw-r--r--  1 root root 0 May 24 23:06 io.max
...
-rw-r--r--  1 root root 0 May 24 23:06 memory.low
-rw-r--r--  1 root root 0 May 24 23:06 memory.max
-rw-r--r--  1 root root 0 May 24 23:06 memory.min
-r--r--r--  1 root root 0 May 24 23:06 memory.numa_stat
-rw-r--r--  1 root root 0 May 24 23:06 memory.oom.group
-rw-r--r--  1 root root 0 May 24 23:06 memory.pressure
...

The files that you see are the configuration points where you set values for the resources you want to allocate to a particular process. Let’s take a look at an example.

You will run a tool called stress (https://linux.die.net/man/1/stress), which you need to install on your local machine. If you are using Ubuntu, you can use the command
sudo apt install stress
Open a terminal and run the stress tool as follows. The application will run for 60 seconds using one core and consuming 100% of CPU usage.
stress --cpu 1 --timeout 60
Open another terminal and run the following command to obtain the process id of the stress application:
top
On my local machine, the process id is 2185657, as shown in Figure 4-3.

Figure 4-3

Output of top

Now set the CPU limit and insert the value of the process id into the cgroup. Note that sudo echo "..." > file does not work here because the redirection is performed by your non-root shell, so pipe through sudo tee instead:
echo "200000 1000000" | sudo tee /sys/fs/cgroup/example/cpu.max
echo "2185657" | sudo tee /sys/fs/cgroup/example/cgroup.procs

The first command allocates 20% of one CPU (a quota of 200,000µs of runtime out of every 1,000,000µs period) to all processes inside the example cgroup, and the second marks the stress process id as a member of that cgroup. If you still have top running in the other terminal, you will see that the stress application now consumes only 20% instead of 100%.

This example shows that by applying cgroups to processes, you can restrict the amount of resources they consume based on how you want to allocate them.
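
The same two writes can be issued from Go instead of the shell. Here is a minimal sketch; it must run as root, it assumes the /sys/fs/cgroup/example directory created earlier, and the throttle helper name is mine:
package main

import (
	"fmt"
	"os"
)

// throttle caps every process in the example cgroup at 20% of one CPU
// (200000us of runtime per 1000000us period) and then moves pid into it.
func throttle(pid int) error {
	if err := os.WriteFile("/sys/fs/cgroup/example/cpu.max",
		[]byte("200000 1000000"), 0644); err != nil {
		return err
	}
	return os.WriteFile("/sys/fs/cgroup/example/cgroup.procs",
		[]byte(fmt.Sprintf("%d\n", pid)), 0644)
}

func main() {
	// 2185657 is the stress process id from the example above.
	if err := throttle(2185657); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}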

You looked at cgroups (control groups) in this section and learned how to allocate resources to processes. In the next section, you will learn about rootfs, a crucial building block for understanding containers.

rootfs

In this section, you will explore rootfs and how it is applied in containers. First, let’s understand what rootfs actually is. rootfs stands for root filesystem; it is the filesystem containing all the basic files necessary to boot the operating system. Without a correct rootfs, the operating system will not boot up and no application can run.

rootfs is required so that the operating system can mount other filesystems, which include configuration data, essential startup processes, and filesystems located on other disk partitions. The following shows the minimal directories found in a rootfs (a small Go sanity check of this layout appears after the list):
/bin
/sbin
/etc
/root
/lib
/lib/modules
/dev
/tmp
/boot
/mnt
/proc
/usr
/var
/home
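
Before handing a directory to a container runtime, you can cheaply confirm from Go that the minimal layout above is present. This is a heuristic sketch; the looksLikeRootfs helper is my own name, not part of any library:
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// looksLikeRootfs reports whether dir contains the minimal directories
// listed above. It is a heuristic, not a guarantee the rootfs will work.
func looksLikeRootfs(dir string) bool {
	for _, d := range []string{"bin", "sbin", "etc", "lib", "dev", "tmp", "proc", "usr", "var"} {
		info, err := os.Stat(filepath.Join(dir, d))
		if err != nil || !info.IsDir() {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(looksLikeRootfs("/home/nanik/play/rootfs"))
}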

Running an application inside a container requires a rootfs, which allows the application to run just as it would on a normal system. Let’s take a look at what a minimal rootfs actually looks like. Head over to www.alpinelinux.org/downloads/ to download the Alpine rootfs. Alpine is a well-known Linux distribution that is widely used for creating containers because of its small image size.

Download the rootfs file from the “Mini Root Filesystem” section as shown in Figure 4-4. If you are using an x86 processor, download the x86_64 file.

Figure 4-4

Mini root filesystem

Once downloaded, copy the file into a separate directory. In my case, the file is called alpine-minirootfs-3.15.4-x86_64.tar.gz and it is copied into the /home/nanik/play/rootfs directory. Use the following commands to extract it:
gunzip ./alpine-minirootfs-3.15.4-x86_64.tar.gz
tar -xvf ./alpine-minirootfs-3.15.4-x86_64.tar
The following is the directory listing after extraction:
drwxr-xr-x 19 nanik nanik    4096 Apr  5 02:06 ./
drwxrwxr-x  3 nanik nanik    4096 May 28 18:46 ../
drwxr-xr-x  2 nanik nanik    4096 Apr  5 02:06 bin/
drwxr-xr-x  2 nanik nanik    4096 Apr  5 02:06 dev/
drwxr-xr-x 16 nanik nanik    4096 Apr  5 02:06 etc/
drwxr-xr-x  2 nanik nanik    4096 Apr  5 02:06 home/
drwxr-xr-x  7 nanik nanik    4096 Apr  5 02:06 lib/
drwxr-xr-x  5 nanik nanik    4096 Apr  5 02:06 media/
drwxr-xr-x  2 nanik nanik    4096 Apr  5 02:06 mnt/
drwxr-xr-x  2 nanik nanik    4096 Apr  5 02:06 opt/
dr-xr-xr-x  2 nanik nanik    4096 Apr  5 02:06 proc/
drwx------  2 nanik nanik    4096 Apr  5 02:06 root/
drwxr-xr-x  2 nanik nanik    4096 Apr  5 02:06 run/
drwxr-xr-x  2 nanik nanik    4096 Apr  5 02:06 sbin/
drwxr-xr-x  2 nanik nanik    4096 Apr  5 02:06 srv/
drwxr-xr-x  2 nanik nanik    4096 Apr  5 02:06 sys/
drwxrwxr-x  2 nanik nanik    4096 Apr  5 02:06 tmp/
drwxr-xr-x  7 nanik nanik    4096 Apr  5 02:06 usr/
drwxr-xr-x 12 nanik nanik    4096 Apr  5 02:06 var/
The following output shows what the different directories contain:
.
├── bin
│   ├── arch -> /bin/busybox
...
├── dev
├── etc
...
│   ├── modprobe.d
...
├── home
...
├── sbin
│   ├── acpid -> /bin/busybox
│   ├── adjtimex -> /bin/busybox
...
├── srv
├── sys
├── tmp
├── usr
│   ├── bin
│   │   ├── [ -> /bin/busybox
│   │   ├── [[ -> /bin/busybox
...
│   │   └── yes -> /bin/busybox
│   ├── lib
│   │   ├── engines-1.1
...
│   │   └── modules-load.d
│   ├── local
│   │   ├── bin
...
│       ├── man
│       ├── misc
│       └── udhcpc
│           └── default.script
├── var
│   ├── cache
│   ├── empty
│   ├── lib

Now that you have a good idea of what rootfs is all about and what it contains, in the next section you will put everything together with a rootfs and run an application the way it normally runs as a container.

Gontainer Project

So far you have looked at how to create the different things required to run an application in isolation: namespaces, cgroups, and rootfs. In this section, you will look at a sample app that puts everything together and runs an application inside its own namespace. In other words, you are going to run the application as a container. The code can be checked out from https://github.com/nanikjava/gontainer.

Make sure you download and extract the rootfs as explained in the “rootfs” section. Once the rootfs has been extracted to your local machine, change into the gontainer directory and compile the project using the following command:
go build
Once compiled, you will get an executable called gontainer. Run the application using the following command:
sudo ./gontainer --chrt "[rootfs directory]" run sh
The command runs sh, the default shell of the Alpine distro, inside a container. Replace [rootfs directory] with the directory containing the uncompressed Alpine rootfs. For example, on my machine, it is /home/nanik/play/rootfs. The full command for my local machine is
sudo ./gontainer --chrt "/home/nanik/play/rootfs" run sh
You will get the prompt /usr # and you’ll be able to execute any normal Linux commands. Figure 4-5 shows some of the commands executed inside gontainer.

Figure 4-5

Gontainer in action

Let’s take a look at the code to understand how the whole thing works. There is only one file, called gontainer.go. As you saw earlier, the way you run the app is by supplying the arguments run sh, which are processed by the main() function shown here:
func main() {
  // outline cleanup tasks
  wg.Add(1)
  ...
  // actual program
  switch args[0] {
  case "run":
     go run()
  ...
}
The run() function, which takes care of running the application specified with the run parameter, is shown here:
func run() {
  defer cleanup()
  infof("run as [%d] : running %v", os.Getpid(), args[1:])
  lst := append(append(flagInputs, "child"), args[1:]...)
  infof("running proc/self/exe %v", lst)
  if timeout > 0 {
     ctx, cancel := context.WithTimeout(context.Background(), timeout)
     defer cancel()
     runcmd = exec.CommandContext(ctx, "/proc/self/exe", lst...)
  } else {
     runcmd = exec.Command("/proc/self/exe", lst...)
  }
  runcmd.Stdin = os.Stdin
  runcmd.Stdout = os.Stdout
  runcmd.Stderr = os.Stderr
  runcmd.SysProcAttr = &syscall.SysProcAttr{
     Cloneflags:   syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
     Unshareflags: syscall.CLONE_NEWNS,
  }
  runcmd.Run()
}
You can see that the code is using /proc/self/exe, so what is this? The Linux manual at https://man7.org/linux/man-pages/man5/proc.5.html says
/proc/self
       When a process accesses this magic symbolic link, it resolves to the process's own /proc/[pid] directory.
/proc/[pid]/exe
       Under Linux 2.2 and later, this file is a symbolic link containing the actual pathname of the executed command. This symbolic link can be dereferenced normally; attempting to open it will open the executable.

The explanation clearly states that executing /proc/self/exe re-runs the currently running binary, so the run() function launches a copy of itself as a separate process, passing in the parameters held in lst.
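
You can verify this with a few lines of Go: resolving the link prints the path of the very binary doing the resolving. A standalone sketch, not part of gontainer:
package main

import (
	"fmt"
	"os"
)

func main() {
	// /proc/self/exe resolves to the executable of the process reading it,
	// which is why exec'ing it re-runs the current program.
	target, err := os.Readlink("/proc/self/exe")
	if err != nil {
		panic(err)
	}
	fmt.Println(target)
}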

The function uses exec.Command to run /proc/self/exe, passing the variable as lst, which contains the following command:
--chdr /usr --chrt /home/nanik/play/rootfs/ --timeout 0s child sh
Let’s explore what the arguments passed to the application are telling the application to do. The init() function declares the following flags that it can receive as arguments:
func init() {
  pflag.StringVar(&chroot, "chrt", "", "Where to chroot to. Should contain a linux filesystem. Alpine is recommended. GONTAINER_FS environment is default if not set")
  pflag.StringVar(&chdir, "chdr", "/usr", "Initial chdir executed when running container")
  pflag.DurationVar(&timeout, "timeout", 0, "Timeout before ending program. If 0 then never ends")
  ...
  infof("flaginputs: %v", flagInputs)
}
Table 4-1 explains the mapping of the arguments passed via lst.
Table 4-1

Mapping Arguments

Format                             Explanation
--chdr /usr                        Initial chdir executed when running the container
--chrt /home/nanik/play/rootfs/    Where to chroot to
--timeout 0s                       Timeout before ending the program. If 0, it never ends.
sh                                 The app to run once the rootfs is up and running

The only parameter not shown in the table is child, which run() does not process. The child parameter is handled by the main() function, which executes the child() function in a goroutine, as shown in the following code snippet:
func main() {
  // outline cleanup tasks
  ...
  // actual program
  switch args[0] {
  ...
  case "child":
     go child()
  ...
}
The child() function does all the heavy lifting of running the new process in a container-like environment. The following shows the code of the child() function:
func child() {
  defer cleanup()
  infof("child as [%d]: chrt: %s,  chdir:%s", os.Getpid(), chroot, chdir)
  infof("running %v", args[1:])
  must(syscall.Sethostname([]byte("container")))
  must(syscall.Chroot(chroot), "error in 'chroot ", chroot+"'")
  syscall.Mkdir(chdir, 0600)
  // initial chdir is necessary so dir pointer is in chroot dir when proc mount is called
  must(syscall.Chdir("/"), "error in 'chdir /'")
  must(syscall.Mount("proc", "proc", "proc", 0, ""), "error in proc mount")
  must(syscall.Chdir(chdir), "error in 'chdir ", chdir+"'")
  if timeout > 0 {
     ctx, cancel := context.WithTimeout(context.Background(), timeout+time.Millisecond*50)
     defer cancel()
     cntcmd = exec.CommandContext(ctx, args[1], args[2:]...)
  } else {
     cntcmd = exec.Command(args[1], args[2:]...)
  }
  cntcmd.Stdin = os.Stdin
  ...
  must(cntcmd.Run(), fmt.Sprintf("run %v return error", args[1:]))
  syscall.Unmount("/proc", 0)
}
Table 4-2 explains what each section of code is doing. Ignore the must function call; it is an internal helper that checks the return value of each system call.
Table 4-2

Code Explanations

Code                                              Description
must(syscall.Sethostname([]byte("container")))    Sets the hostname of the container
must(syscall.Chroot(chroot), ...)                 Performs chroot into the specified rootfs (in this example, /home/nanik/play/rootfs)
must(syscall.Chdir(chdir), ...)                   Changes directory to the specified location
must(cntcmd.Run(), ...)                           Runs the specified argument (in this example, sh)

The following code snippet wires the executed application to the operating system’s standard input, output, and error streams:
...
cntcmd.Stdin = os.Stdin
cntcmd.Stdout = os.Stdout
cntcmd.Stderr = os.Stderr
...

Once cntcmd.Run() executes and the prompt shows up, you are running inside the container, isolated from the host operating system.

Summary

In this chapter, you explored the different parts required to run an application inside a container: namespaces, cgroups, and rootfs. You experimented with the different available Linux tools to create namespaces and configured resources for particular namespaces.

You also explored rootfs, a key component required to boot the operating system and thus allow applications to run. Finally, you looked at a sample project that shows how to combine the different components in Go using the Alpine rootfs.
