Virtualization and Containers

Virtualization has been around for a long time as a way to run multiple operating systems on the same machine in order to better utilize physical resources.

One way to achieve virtualization is to employ a virtual machine. Virtual machines work by creating virtual hardware resources, such as CPU, memory, and devices, and using those to install and run multiple operating systems on the same machine. Virtualization can be accomplished by installing a hypervisor application on top of an operating system (called the host). The hypervisor is capable of creating, managing, and monitoring virtual machines and their respective operating systems (called guests).

It's important to note that virtual environments, despite their name, have nothing to do with virtual machines. A virtual environment is Python-specific and works by giving each project its own Python interpreter and package directory, which are selected through shell activation scripts.
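To see that a virtual environment is purely a Python-level construct, you can ask the interpreter itself. The following is a minimal sketch, assuming Python 3.3 or later and an environment created with the standard venv module, where sys.prefix differs from sys.base_prefix while the environment is active. The helper name in_virtual_environment is ours and is used only for illustration.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("inside a virtual environment:", in_virtual_environment())
```

No hardware is virtualized and no kernel feature is involved: only the paths that the interpreter uses change.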

Containers are a way to isolate an application by creating an environment that is separated from the host operating system and contains only the necessary dependencies. Containers rely on operating system features that let multiple isolated instances share the hardware resources provided by the kernel. A container is different from a virtual machine because it does not abstract hardware resources, but merely shares the operating system's kernel.

Containers are very efficient at utilizing hardware resources as those are accessed natively through the kernel. For this reason, they are an excellent solution for high-performance applications. They are also fast to create and destroy and can be used to quickly test an application in isolation. Containers are also used to simplify deployments (especially microservices) and to develop build servers, such as the ones we mentioned in the preceding section.

In Chapter 8, Distributed Processing, we used Docker to easily set up a PySpark installation. Docker is one of the most popular containerization solutions available today. The best way to install Docker is by following the instructions on the official website (https://www.docker.com/). After installation, it is possible to easily create and manage containers using the docker command-line interface.

You can start a new container by using the docker run command. In the following example, we will demonstrate how to use docker run to execute a shell session in an Ubuntu 16.04 container. To do this, we will need to specify the following arguments:

-i specifies that we are trying to start an interactive session by keeping the standard input open. It is also possible to run containers without interactivity (for example, when starting a web server).
-t allocates a pseudo-terminal, so that the shell inside the container behaves like a regular terminal session.
ubuntu:16.04 is the name of the system image to use.
/bin/bash is the command to run inside the container, demonstrated as follows:

$ docker run -i -t ubuntu:16.04 /bin/bash
root@585f53e77ce9:/#

This command will immediately take us into a separate, isolated shell where we can play around with the system and install software without touching the host operating system. Using a container is a very good way to test installations and deployments on different Linux flavors. After we are done with the interactive shell, we can type the exit command to return to the host system.
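Testing on different Linux flavors can also be scripted from Python. The following is a minimal sketch that only builds and prints the docker run command lines, so it runs even where Docker is not installed. The helper one_off_command is a hypothetical name, the debian:stable tag is an assumed example of a second flavor, and the --rm flag (which asks Docker to delete the container once the command exits) is our choice and is not used elsewhere in this section.

```python
import shlex


def one_off_command(image, command):
    """Build the argv for running `command` once in a fresh container of `image`."""
    # --rm asks Docker to delete the container as soon as the command exits.
    return ["docker", "run", "--rm", image] + list(command)


# ubuntu:16.04 comes from the text; debian:stable is an assumed second flavor.
images = ["ubuntu:16.04", "debian:stable"]
check = ["cat", "/etc/os-release"]

commands = [one_off_command(image, check) for image in images]
for argv in commands:
    # Printed rather than executed, so the sketch does not require Docker.
    print(shlex.join(argv))
```

On a machine where Docker is available, each list can be passed to subprocess.run(argv, check=True) to actually execute the check inside the corresponding image.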

In the last chapter, we also made use of the detach and port options, -d and -p, to run the pyspark executable. The -d option simply asks Docker to run the command in the background. The -p <host_port>:<container_port> option, on the other hand, maps a network port of the host operating system to a port of the container; without this option, the Jupyter Notebook would not have been reachable from a browser running in the host system.
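The way these two options combine can be sketched as a small Python helper that assembles the command line. This is an illustration under stated assumptions rather than the exact command used in the last chapter: the function name detached_run_command, the image name example/notebook-image, and the port number 8888 are placeholders chosen for this example.

```python
import shlex


def detached_run_command(image, host_port, container_port, command=()):
    """Build a `docker run` argv that runs in the background with one mapped port."""
    for port in (host_port, container_port):
        if not 0 < port < 65536:
            raise ValueError(f"invalid TCP port: {port}")
    # -d detaches the container; -p maps host_port on the host to
    # container_port inside the container.
    argv = ["docker", "run", "-d", "-p", f"{host_port}:{container_port}", image]
    return argv + list(command)


# "example/notebook-image" is a placeholder, not a real image name.
argv = detached_run_command("example/notebook-image", 8888, 8888)
print(shlex.join(argv))
```

The host port comes first in the mapping, so changing only the first number lets you expose the same container service on a different port of the host.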

We can monitor the status of the containers with docker ps, as shown in the following snippet. The -a option (which stands for all) serves to output information about all the containers, whether they are currently running or not:

$ docker ps -a
CONTAINER ID  IMAGE         COMMAND      CREATED        STATUS                    PORTS  NAMES
585f53e77ce9  ubuntu:16.04  "/bin/bash"  2 minutes ago  Exited (0) 2 minutes ago         pensive_hamilton

The information provided by docker ps includes a hexadecimal identifier, 585f53e77ce9, as well as a human-readable name, pensive_hamilton, both of which can be used to specify the container in other docker commands. It also includes additional information about the image, the command executed, the creation time, and the container's current status.
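When the same information is needed inside a script, the column layout is awkward to parse. One possible approach, sketched here, is to ask docker ps for delimited output through its --format option and split each line in Python. The "|" separator, the field selection, and the helper name parse_ps_output are our choices, and the sample line is written by hand to mirror the container discussed in the text, not captured from a real run.

```python
# Go-template format string understood by `docker ps --format`.
PS_FORMAT = "{{.ID}}|{{.Image}}|{{.Status}}|{{.Names}}"
PS_COMMAND = ["docker", "ps", "-a", "--format", PS_FORMAT]


def parse_ps_output(text):
    """Turn the delimited output of PS_COMMAND into a list of dictionaries."""
    fields = ("id", "image", "status", "name")
    rows = []
    for line in text.splitlines():
        if line.strip():
            rows.append(dict(zip(fields, line.split("|"))))
    return rows


# Hand-written sample line that mirrors the container discussed in the text.
sample = "585f53e77ce9|ubuntu:16.04|Exited (0) 2 minutes ago|pensive_hamilton\n"
containers = parse_ps_output(sample)
exited = [c["name"] for c in containers if c["status"].startswith("Exited")]
print(exited)
```

With Docker installed, the text to parse can be obtained with subprocess.run(PS_COMMAND, capture_output=True, text=True).stdout.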

You can resume the execution of an exited container using the docker start command. To gain shell access to the container, you can use docker attach. Both of these commands can be followed by either the container ID or its human-readable name:

$ docker start pensive_hamilton 
pensive_hamilton
$ docker attach pensive_hamilton 
root@585f53e77ce9:/#

A stopped container can be easily removed using the docker rm command followed by a container identifier:

$ docker rm pensive_hamilton

As you can see, you are free to run, stop, and resume containers as needed, and each of these operations typically completes in about a second. Using Docker containers interactively is a great way to test things out and play with new packages without disturbing the host operating system. Since you can run many containers at the same time, Docker can also be used to simulate a distributed system (for testing and learning purposes) without having to own an expensive computing cluster.

Docker also allows you to create your own system images, which is useful for distribution, testing, deployment, and documentation purposes. This will be the topic of the next subsection.
