Relocatable volumes

All of the options we discussed earlier are fine when working on a single host, but what they lack is real data portability between physical hosts. For example, the current methods of keeping data persistent can realistically scale up to, but not beyond (without some extreme hacking), a single physical server with a single Docker Engine and shared attached storage. This might be fine for a powerful server, but it is of little use in a true clustering configuration, where you might be dealing with an unknown number of servers, a mix of virtual and physical hosts, different geographic regions, and so on.

Also, when a container is restarted, you will most likely not be able to easily predict where it will be launched, so its volume backend needs to be available wherever it starts. For this use case, there are things called relocatable volumes. These go by various names, such as "shared multi-host storage", "orchestrated data volumes", and many others, but the idea is pretty much the same across the board: have a data volume that follows the container wherever it goes.

To illustrate, here we have three hosts with two stateful services, all connected using the same relocatable volume storage driver:

  • Stateful Container 1 with Volume D on Host 1
  • Stateful Container 2 with Volume G on Host 3

For the purpose of this example, assume that Host 3 has died. With a regular volume driver, all the data from Stateful Container 2 would be lost, but because you are using relocatable storage:

  • The orchestration platform will notify your storage driver that the container has died.
  • The orchestration platform will indicate that it wants to restart the killed services on a host with available resources.
  • The volume driver will mount the same volume to the new host that will run the service.
  • The orchestration platform will start the service, passing the volume details into the new container.
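On an orchestration platform such as Docker Swarm, this flow is usually triggered simply by declaring the volume and its driver on the service itself. The following is only a rough sketch with a made-up driver name (some-relocatable-driver) and service name; you would substitute whichever relocatable volume plugin you have actually configured:

$ # Hypothetical sketch: declare the volume driver on the service so that,
$ # wherever the scheduler restarts the task, the same volume gets mounted there
$ docker service create \
             --name stateful-container-2 \
             --mount type=volume,source=volume_g,target=/data,volume-driver=some-relocatable-driver \
             <image>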

In our hypothetical example, the new system state would have Stateful Container 2, still attached to Volume G, running on one of the surviving hosts.

From an external point of view, nothing has changed: the data seamlessly transitioned to the new container and kept its state, which is exactly what we wanted. For this specific purpose, there are a number of Docker volume drivers to choose from, each with its own configuration method for various storage backends, but the only one included out of the box with the Docker pre-built images for Azure and AWS is CloudStor, which works only with Docker Swarm, making it super-specific and completely non-portable.

For various reasons, including the age of the technology and lackluster support from Docker and plugin developers, this type of volume handling is most likely going to be the part of building your infrastructure that you sink the most time into. I do not want to discourage you, but at the time of writing, the state of things is really dire, regardless of what easy tutorials might lead you to believe.

You can find the majority of the drivers at https://docs.docker.com/engine/extend/legacy_plugins/#volume-plugins. After configuring one, use it in the following manner if you are managing mounts manually, without an orchestration layer:

$ # New-style volume switch (--mount)
$ docker run --mount source=<volume_name>,target=/dest/path,volume-driver=<name> \
             <image> ...

$ # Old-style volume switch
$ docker run -v <volume_name>:/dest/path \
             --volume-driver <name> \
             <image> ...

For reference, I believe the most popular plugins for handling relocatable volumes are currently Flocker, REX-Ray (https://github.com/codedellemc/rexray), and GlusterFS, though there are many to choose from, and many of them have similar functionality. As mentioned earlier, the state of this ecosystem is rather abysmal for such an important feature, and it seems that almost every big player running their own clustering either forks and builds their own storage solution, or writes one from scratch and keeps it closed source. Some deployments have even opted to use node labels to avoid this topic completely, forcing specific containers onto specific hosts so that they can use locally mounted volumes.
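If you are curious what that label-based workaround looks like, here is a minimal sketch using Docker Swarm node labels and placement constraints; the label key and value (volumes=local-ssd), the node name, and the service name are made up for illustration:

$ # Pin the service to a single labeled node so a plain local volume suffices
$ docker node update --label-add volumes=local-ssd <node_name>
$ docker service create \
             --name pinned-stateful-service \
             --constraint 'node.labels.volumes == local-ssd' \
             --mount type=volume,source=local_data,target=/data \
             <image>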

Flocker's parent company, ClusterHQ, shut down its operations in December 2016 for financial reasons, and while the lack of support is a reason to hesitate before mentioning it here, it is still, by an order of magnitude, the most popular plugin for this type of volume management at the time of writing this book. All of the code is open source on GitHub at https://github.com/ClusterHQ, so you can build, install, and run it even without official support. If you want to use this plugin in an enterprise environment and would like support for it, some of the original developers are available for hire through a new company called ScatterHQ at https://www.scatterhq.com/, and they have their own source code repositories at https://github.com/ScatterHQ.
The GlusterFS plugin is, like Flocker, unmaintained in its original source repository, but just like Flocker, you can build, install, and run the full code from that repository, located at https://github.com/calavera/docker-volume-glusterfs. If you would like code versions that have received updates, you can find a few in the fork network at https://github.com/calavera/docker-volume-glusterfs/network.

On top of all this ecosystem fragmentation, this particular way of integrating with Docker is starting to be deprecated in favor of the docker plugin system, which manages and installs plugins as Docker images from Docker Hub. However, due to the lack of availability of these new-style plugins, you might have to use a legacy plugin, depending on your specific use case.

Sadly, at the time of writing this book, the docker plugin system is, like many of these features, so new that there are barely any plugins available for it. For example, the only one of the legacy plugins mentioned earlier that has been rebuilt on this new system is REX-Ray, but its most popular storage backend (EBS) does not seem to install cleanly. By the time you read this book, things will probably have changed, but be aware that there is a significant likelihood that in your own implementation you will be using the tried-and-tested legacy plugins.

So, with all of these caveats mentioned, let's actually try to get one of the only plugins that works with the new docker plugin install system (sshfs) up and running:

To duplicate this work, you will need access to a secondary machine (though you can also point it back at the same host over loopback) with SSH enabled and reachable from wherever your Docker Engine is running, since that is the backing storage system the plugin uses. You will also need the target folder ssh_movable_volume created on that machine, and possibly the addition of -o idmap=user to the sshfs volume parameters, depending on your setup.
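As a quick sketch of that preparation, assuming the same user and the 192.168.56.101 address used later in this example, creating the backing folder could look like this:

$ # On the remote machine, create the folder that will back the sshfs volume
$ ssh user@192.168.56.101 'mkdir -p ~/ssh_movable_volume'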
$ # Install the plugin
$ docker plugin install vieux/sshfs

Plugin "vieux/sshfs" is requesting the following privileges:
- network: [host]
- mount: [/var/lib/docker/plugins/]
- mount: []
- device: [/dev/fuse]
- capabilities: [CAP_SYS_ADMIN]
Do you grant the above permissions? [y/N] y
latest: Pulling from vieux/sshfs
2381f72027fc: Download complete
Digest: sha256:72c8cfd1a6eb02e6db4928e27705f9b141a2a0d7f4257f069ce8bd813784b558
Status: Downloaded newer image for vieux/sshfs:latest
Installed plugin vieux/sshfs

$ # Sanity check
$ docker plugin ls
ID                  NAME                 DESCRIPTION               ENABLED
0d160591d86f        vieux/sshfs:latest   sshFS plugin for Docker   true

$ # Add our password to a file
$ echo -n '<password>' > password_file

$ # Create a volume backed by sshfs on a remote server with SSH daemon running
$ docker volume create -d vieux/sshfs \
             -o sshcmd=user@192.168.56.101/ssh_movable_volume \
             -o password=$(cat password_file) \
             ssh_movable_volume
ssh_movable_volume

$ # Sanity check
$ docker volume ls
DRIVER               VOLUME NAME
vieux/sshfs:latest   ssh_movable_volume

$ # Time to test it with a container
$ docker run -it \
             --rm \
             --mount source=ssh_movable_volume,target=/my_volume,volume-driver=vieux/sshfs:latest \
             ubuntu:latest \
             /bin/bash

root@75f4d1d2ab8d:/# # Create a dummy file
root@75f4d1d2ab8d:/# echo 'test_content' > /my_volume/test_file

root@75f4d1d2ab8d:/# exit
exit

$ # See that the file is hosted on the remote server
$ ssh user@192.168.56.101
user@192.168.56.101's password:
<snip>
user@ubuntu:~$ cat ssh_movable_volume/test_file
test_content

$ # Get back to our Docker Engine host
user@ubuntu:~$ exit
logout
Connection to 192.168.56.101 closed.

$ # Clean up the volume
$ docker volume rm ssh_movable_volume
ssh_movable_volume

Due to the way this volume is used, it is mostly portable and could give us the relocatable features we need. However, most other plugins rely on a process that runs outside of Docker, in parallel on each host, to manage the mounting, unmounting, and moving of volumes, so their setup instructions will be vastly different.
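To sketch what that portability looks like in practice, assuming a second Docker Engine host with the same plugin installed and the same remote server and credentials as above, recreating the volume there should expose the data we just wrote:

$ # On a different Docker Engine host, recreate a volume pointing at the same
$ # remote folder with the same driver...
$ docker volume create -d vieux/sshfs \
             -o sshcmd=user@192.168.56.101/ssh_movable_volume \
             -o password=$(cat password_file) \
             ssh_movable_volume
ssh_movable_volume

$ # ...and read back the file created from the first host
$ docker run --rm \
             --mount source=ssh_movable_volume,target=/my_volume,volume-driver=vieux/sshfs:latest \
             ubuntu:latest \
             cat /my_volume/test_file
test_content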
