Swarm on zCX
This chapter describes the use of Docker swarm mode on zCX, including the necessary configuration steps for setting up a clustering container environment.
Upon completion of this chapter, you should be able to deploy a containerized web application on IBM Z and use orchestration tools.
This chapter includes the following topics:
11.1 Swarm introduction
The Docker website describes a swarm as follows:
A swarm consists of multiple Docker hosts, which run in swarm mode and act as managers (to manage membership and delegation) and workers (which run swarm services). A given Docker host can be a manager, a worker, or perform both roles. When you create a service, you define its optimal state (number of replicas, network, and storage resources available to it, ports the service exposes to the outside world, and more). Docker works to maintain that wanted state. For instance, if a worker node becomes unavailable, Docker schedules that node’s tasks on other nodes. A task is a running container that is part of a swarm service and managed by a swarm manager, as opposed to a stand-alone container.1
The Docker command-line interface (CLI) is used to create a swarm, deploy application services to a swarm, and manage swarm behavior. (A web interface is not used for swarm.)
The following key concepts are related to swarm:
Node
A node is an instance of the Docker Engine that is participating in the swarm. Multiple nodes can be run on one physical or virtual server. Typically, nodes are spread across multiple servers in a production environment. Nodes in a swarm can be manager or worker nodes:
 – Manager node
This node conducts the orchestration and management functions for the swarm. Interaction with the swarm happens through a manager node, which in turn dispatches units of work that are called tasks to the worker nodes.
Docker recommends the use of an odd number of manager nodes in a swarm for redundancy and to maintain a quorum, with seven being the recommended maximum. It is possible to run with only one manager node. However, if that node fails, the running services continue to operate, but a new cluster must be built to recover management of the swarm.
 
Note: Increasing the number of manager nodes does not increase performance of the swarm, and doing so might negatively affect the performance of the swarm.
 – Worker node
This node receives tasks from the manager nodes and runs them. Worker nodes inform the manager nodes of their status and the status of the assigned tasks. They exist to run containers; they have no input into the load-balancing decisions of the swarm and they do not participate in the Raft consensus. The availability status of worker nodes is controlled by the manager nodes.
 
Note: Manager nodes are worker nodes by default. Manager nodes can be demoted to worker node status by using the docker node demote command. Worker nodes can be promoted to manager node status by using the docker node promote command.
Tasks and services
A task carries a Docker container and the commands to be run inside the container. Manager nodes assign tasks to worker nodes according to the number of replicas that is set in the service scale. After a task is assigned to a node, it cannot move to another node; it can only run on the assigned node or fail.
A service is the definition of the tasks to run on the manager or worker nodes. It is the central structure of the swarm system and the primary root of user interaction with the swarm. The service definition specifies which container image to use and which commands to run inside the running containers.
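For illustration, the following minimal sketch creates a replicated service from the CLI. The service name, image, and replica count are arbitrary examples and are not used elsewhere in this chapter:

# Service definition: the image to run, the command to run inside it, and three replicas
docker service create --name pinger --replicas 3 alpine:latest ping docker.com

The swarm manager turns this definition into three tasks and schedules one container per task on the available nodes.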
Figure 11-1 shows the architecture of a swarm.
Figure 11-1 Swarm architecture diagram
Load balancing
The swarm manager uses ingress load balancing to expose the services that you want to make available externally to the swarm. The swarm manager can automatically assign the service a PublishedPort in the 30000 - 32767 range, or you can configure a PublishedPort for the service.
External components, such as cloud load balancers, can access the service on the PublishedPort of any node in the cluster, whether or not that node is currently running a task for the service. All nodes in the swarm route ingress connections to a running task instance.
Swarm mode has an internal DNS component that automatically assigns a DNS entry to each service in the swarm. The swarm manager uses internal load balancing to distribute requests among services within the cluster, based on the DNS name of the service.
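The following sketch illustrates both mechanisms; the network name, service name, and port values are examples only:

# Create an overlay network and attach a published service to it
docker network create --driver overlay appnet
docker service create --name web --network appnet --publish published=8080,target=80 nginx:latest

# The ingress routing mesh answers on port 8080 of every swarm node,
# even on nodes that run no web task
curl http://<any-node-ip>:8080

# From a container of another service that is attached to appnet,
# the service is reachable by its DNS name through internal load balancing
curl http://web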
11.2 zCX in swarm mode
In this section, we configure Docker Swarm mode on our zCX instances to demonstrate the application portability that can be achieved by zCX.
As discussed earlier in this chapter, Docker swarm mode is part of Docker Engine. Therefore, we do not have to install any additional components to use this functionality.
In this example, we complete the following steps:
1. Remove the node from the swarm.
2. Initialize swarm mode.
3. Add a manager node to the cluster.
4. Add worker nodes to the manager node.
5. Create a Docker service.
6. Scale up a container.
7. Change node availability to simulate a scheduled maintenance.
8. Promote a worker node.
9. Demote a manager node.
10. Scale down a service.
Our environment
The specific components of our swarm environment are listed in Table 11-1.
Table 11-1 Our zCX swarm environment
Server Name             Server IP      Swarm Node role   z/OS LPAR
sc74cn03.pbm.ihost.com  129.40.23.70   Manager           wtsc74.pbm.ihost.com
sc74cn04.pbm.ihost.com  129.40.23.71   Worker            wtsc75.pbm.ihost.com
sc74cn09.pbm.ihost.com  129.40.23.76   Manager           wtsc74.pbm.ihost.com
sc74cn10.pbm.ihost.com  129.40.23.77   Worker            wtsc75.pbm.ihost.com
Figure 11-2 depicts our zCX cluster environment.
Figure 11-2 Our zCX environment
Note: Swarm uses server names and IP addresses for cluster configuration. Ensure that you correctly set up DNS with the name and IP addresses of the zCX instances.
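A quick way to verify name resolution before you build the cluster is to resolve each peer's host name from the zCX CLI shell. This is a sketch and assumes that the getent utility is available in the zCX CLI container:

getent hosts sc74cn04.pbm.ihost.com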
11.2.1 Removing a node from the swarm
Use the docker swarm leave command if you need to remove a specific node from a swarm cluster. If your zCX instance has never joined a swarm cluster, you can skip this step. Otherwise, issue the command that is shown in Example 11-1.
Example 11-1 Wiping the swarm configuration
admin@570e9473805e:~$ docker swarm leave
Node left the swarm.
 
Note: For a manager node, you must append --force to force the node to leave the cluster.
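For example, on a manager node the clean-up command becomes the following (a sketch of the same step):

docker swarm leave --force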
11.2.2 Initializing swarm mode
The first node in the swarm becomes the manager leader. When you initialize the swarm, this node advertises its IP address to the rest of the cluster. In our case, we use the sc74cn03.pbm.ihost.com (129.40.23.70) instance as the leader. (See Example 11-2.)
Example 11-2 Initializing a swarm cluster
admin@570e9473805e:~$ docker swarm init --advertise-addr 129.40.23.70
Swarm initialized: current node (1xk53pdq2ujayiea2vbfu3bs1) is now a manager.
 
To add a worker to this swarm, run the following command:
 
docker swarm join --token SWMTKN-1-17aalrjaxjmlx43ytth47xb23enxaek0ris89826ane1d6o99z-1zzz4qvmc8bxhb2le3gulg3nx 129.40.23.70:2377
 
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
 
Note: After you issue the preceding docker swarm init command, the manager node is defined and the output displays the command that allows you to add new worker nodes to the cluster. Make a note of it because we use this command in the next steps.
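If you did not record it, you can display the worker join command (including its token) again at any time from a manager node, for example:

docker swarm join-token worker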
The output in Example 11-2 also tells you how to obtain the command that adds another manager node to the cluster. We ran docker swarm join-token manager as instructed, and Example 11-3 shows the output.
Example 11-3 Getting the command to join manager nodes into the cluster
admin@570e9473805e:~$ docker swarm join-token manager
To add a manager to this swarm, run the following command:
 
docker swarm join --token SWMTKN-1-17aalrjaxjmlx43ytth47xb23enxaek0ris89826ane1d6o99z-cafuexvwlvojqhuosk7lqe07q 129.40.23.70:2377
11.2.3 Adding node manager to the cluster
In this step, we add the second manager node to our swarm cluster.
To join sc74cn09.pbm.ihost.com as a manager node, follow these steps:
1. Connect to the sc74cn09.pbm.ihost.com zCX instance by using ssh as the admin user.
2. Issue docker swarm join, including the manager join token that you got in Example 11-3. The following example shows the output:
Example 11-4 Joining a manager node
admin@01f543067055:~$ docker swarm join --token SWMTKN-1-17aalrjaxjmlx43ytth47xb23enxaek0ris89826ane1d6o99z-cafuexvwlvojqhuosk7lqe07q 129.40.23.70:2377
This node joined a swarm as a manager.
11.2.4 Adding worker nodes to the manager node
The worker nodes run the Docker workloads. Therefore, you can add more worker nodes to process more workload (see Example 11-5).
To join our first worker node (sc74cn04.pbm.ihost.com), complete the following steps:
1. Connect to the sc74cn04.pbm.ihost.com zCX instance by using ssh as the admin user.
2. Issue docker swarm join, including the token for the worker node that you got in Example 11-2. The following example shows the output:
Example 11-5 Joining a worker node
admin@a29d70c73c55:~$ docker swarm join --token SWMTKN-1-17aalrjaxjmlx43ytth47xb23enxaek0ris89826ane1d6o99z-1zzz4qvmc8bxhb2le3gulg3nx 129.40.23.70:2377
This node joined a swarm as a worker.
We repeated the same join command for the sc74cn10.pbm.ihost.com server; Example 11-6 confirms that it also joined the cluster.
Now, check the status of the cluster by running the command that is shown in Example 11-6 from one of the manager nodes.
Example 11-6 Checking the status of our swarm cluster
admin@570e9473805e:~$ docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
1xk53pdq2ujayiea2vbfu3bs1 * sc74cn03.pbm.ihost.com Ready Active Reachable 18.09.2
cidllaskmus095rrb221ii9b9 sc74cn04.pbm.ihost.com Ready Active 18.09.2
4y9a3mej93xvq4denfphhz53u sc74cn09.pbm.ihost.com Ready Active Leader 18.09.2
j2b9596uktzhupuxv6td22mm1 sc74cn10.pbm.ihost.com Ready Active 18.09.2
By default, all nodes in the cluster can receive Docker workloads. In our setup, we want only the worker nodes to receive application workloads from swarm, so we prevent the manager nodes from receiving them. Run the following commands from any one of the manager nodes. In our example, we ran them from the sc74cn03.pbm.ihost.com server to drain both sc74cn03.pbm.ihost.com and sc74cn09.pbm.ihost.com, as shown in Example 11-7.
Example 11-7 Node update command
admin@570e9473805e:~$ docker node update --availability drain sc74cn03.pbm.ihost.com
sc74cn03.pbm.ihost.com
 
admin@570e9473805e:~$ docker node update --availability drain sc74cn09.pbm.ihost.com
sc74cn09.pbm.ihost.com
 
admin@570e9473805e:~$ docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
1xk53pdq2ujayiea2vbfu3bs1 * sc74cn03.pbm.ihost.com Ready   Drain Reachable 18.09.2
cidllaskmus095rrb221ii9b9 sc74cn04.pbm.ihost.com Ready   Active 18.09.2
4y9a3mej93xvq4denfphhz53u sc74cn09.pbm.ihost.com Ready   Drain Leader 18.09.2
j2b9596uktzhupuxv6td22mm1 sc74cn10.pbm.ihost.com Ready   Active 18.09.2
As you can see in Example 11-7, the manager nodes are now in Drain status. When you drain a node, the scheduler reassigns any tasks that are running on the node to other available worker nodes in the swarm. It also prevents the scheduler from assigning tasks to the node.
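To confirm that no application tasks remain on a drained node, you can list the tasks that are assigned to it from any manager node, for example:

docker node ps sc74cn03.pbm.ihost.com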
11.2.5 Creating a Docker service
In this step, we define a cluster service that is called webserver. This service provides the nginx application. Example 11-8 shows the service status before the command is run. Then, we deploy the new service and check the status again to confirm that the webserver service was created.
These commands must be run on a manager node that is part of the cluster. In our example, we ran them on the sc74cn09.pbm.ihost.com server.
Example 11-8 Creating a Docker service
admin@01f543067055:~$ docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
 
admin@01f543067055:~$ docker service create --name webserver -p 80:80 nginx:latest
pwu25hj3wpwwovo73d2is31gn
overall progress: 1 out of 1 tasks
1/1: running [==================================================>]
verify: Service converged
 
admin@01f543067055:~$ docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
pwu25hj3wpww webserver replicated 1/1 nginx:latest *:80->80/tcp
You can get more information about the webserver service by running the command that is shown in Example 11-9. The nginx container was started on the sc74cn04.pbm.ihost.com server, which is a worker node; remember that we drained sc74cn03.pbm.ihost.com and sc74cn09.pbm.ihost.com in a previous step, so they do not receive application workloads. By default, only one replica is started.
Example 11-9 Checking web server service
admin@01f543067055:~$ docker service ps webserver
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE         ERROR   PORT
vkqgrr6aypi3 webserver.1 nginx:latest sc74cn04.pbm.ihost.com Running Running 2 minutes ago
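For more detail about the service definition itself, such as its mode, published ports, and update configuration, you can also inspect the service from a manager node (a small sketch):

docker service inspect --pretty webserver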
Now, we stop the container on the sc74cn04.pbm.ihost.com server to see the behavior when someone takes down a container that is managed by swarm (see Example 11-10).
Example 11-10 Stopping the nginx container on the sc74cn04.pbm.ihost.com server
admin@a29d70c73c55:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS             NAME
0f0fac618197 nginx:latest "nginx -g 'daemon of…" 5 minutes ago Up 5 minutes 80/tcp webserver.1.vkqgrr6aypi3e8ue88wuqedxw
a29d70c73c55 ibm_zcx_zos_cli_image "sudo /usr/sbin/sshd…" 2 hours ago Up 2 hours 8022/tcp, 0.0.0.0:8022->22/tcp ibm_zcx_zos_cli
 
admin@a29d70c73c55:~$ docker stop 0f0fac618197
0f0fac618197
 
admin@a29d70c73c55:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS     NAME
e5a065c3177b nginx:latest "nginx -g 'daemon of…" 6 seconds ago Up Less than a second 80/tcp webserver.1.2y909gfnju13ybtxwsknljuyr
a29d70c73c55 ibm_zcx_zos_cli_image "sudo /usr/sbin/sshd…" 2 hours ago Up 2 hours 8022/tcp, 0.0.0.0:8022->22/tcp ibm_zcx_zos_cli
Swarm identified that the nginx container was down on the sc74cn04.pbm.ihost.com worker node and automatically started a replacement container on the same node. Often the replacement is scheduled on another worker node, but the decision depends on the swarm scheduler.
To confirm that the container was restarted, we ran the command that is listed in Example 11-11 on sc74cn03.pbm.ihost.com (a manager node) to show the status of the webserver service.
Example 11-11 Checking status of web server service after container stops
admin@570e9473805e:~$ docker service ps webserver
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
2y909gfnju13 webserver.1 nginx:latest sc74cn04.pbm.ihost.com Running Running 5 minutes ago
vkqgrr6aypi3 \_ webserver.1 nginx:latest sc74cn04.pbm.ihost.com Shutdown Complete 5 minutes ago
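The restart behavior that swarm applied here can be tuned when a service is created. The following sketch shows the relevant docker service create flags with illustrative values; they are not the settings that are used in our environment:

docker service create --name webserver -p 80:80 \
  --restart-condition any \
  --restart-delay 5s \
  --restart-max-attempts 3 \
  nginx:latest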
11.2.6 Scaling up a container
Scaling a service allows you to increase the number of container instances. This activity is common when you need to increase capacity for your application and allow more requests to be processed by the cluster.
We scaled up the webserver service from 1 to 4 instances, as shown in Example 11-12. We ran this command on sc74cn03.pbm.ihost.com (a manager node).
Example 11-12 Scaling up your web server service
admin@570e9473805e:~$ docker service scale webserver=4
webserver scaled to 4
overall progress: 4 out of 4 tasks
1/4: running [==================================================>]
2/4: running [==================================================>]
3/4: running [==================================================>]
4/4: running [==================================================>]
verify: Service converged
 
admin@570e9473805e:~$ docker service ps webserver
ID NAME IMAGE NODE DESIRED STATE  CURRENT STATE      ERROR  PORT
2y909gfnju13 webserver.1 nginx:latest sc74cn04.pbm.ihost.com Running Running 7 minutes ago
vkqgrr6aypi3 \_ webserver.1 nginx:latest sc74cn04.pbm.ihost.com Shutdown Complete 7 minutes ago
omled936b5dt webserver.2 nginx:latest sc74cn10.pbm.ihost.com Running Running 16 seconds ago
4ztuxznjrc8t webserver.3 nginx:latest sc74cn10.pbm.ihost.com Running Running 16 seconds ago
hqb26nea8vmf webserver.4 nginx:latest sc74cn04.pbm.ihost.com Running Running 17 seconds ago
 
admin@570e9473805e:~$ docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
pwu25hj3wpww webserver replicated 4/4 nginx:latest *:80->80/tcp
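The same result can be achieved with docker service update, which is useful when you want to change the replica count together with other service settings (an equivalent sketch):

docker service update --replicas 4 webserver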
For the next example, we scaled the webserver service from 4 to 30 instances, as shown in Example 11-13.
Example 11-13 Scaling web server service from 4 to 30
admin@570e9473805e:~$ docker service scale webserver=30
webserver scaled to 30
overall progress: 30 out of 30 tasks
verify: Service converged
 
admin@570e9473805e:~$ docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
pwu25hj3wpww webserver replicated 30/30 nginx:latest *:80->80/tcp
11.2.7 Changing node availability to simulate a scheduled maintenance
Setting a node's availability from active to drain is often used when you need to perform maintenance on your Docker host. When a node's availability is drain, all service containers on that node are stopped and started on the other active nodes that are members of the cluster.
We changed the availability of the sc74cn04.pbm.ihost.com (worker node) server from active to drain (see Example 11-14).
Example 11-14 Putting server in DRAIN mode
admin@570e9473805e:~$ docker node update --availability drain sc74cn04.pbm.ihost.com
sc74cn04.pbm.ihost.com
 
admin@570e9473805e:~$ docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
1xk53pdq2ujayiea2vbfu3bs1 * sc74cn03.pbm.ihost.com Ready Drain Reachable 18.09.2
cidllaskmus095rrb221ii9b9 sc74cn04.pbm.ihost.com Ready Drain 18.09.2
4y9a3mej93xvq4denfphhz53u sc74cn09.pbm.ihost.com Ready Drain Leader 18.09.2
j2b9596uktzhupuxv6td22mm1 sc74cn10.pbm.ihost.com Ready Active 18.09.2
 
admin@570e9473805e:~$ docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
pwu25hj3wpww webserver replicated 30/30 nginx:latest *:80->80/tcp
The output that is shown in Example 11-15 confirms that all containers for the webserver service were stopped on the drained node.
Example 11-15 Show containers that are available
admin@a29d70c73c55:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a29d70c73c55 ibm_zcx_zos_cli_image "sudo /usr/sbin/sshd…" 2 hours ago Up 2 hours 8022/tcp, 0.0.0.0:8022->22/tcp ibm_zcx_zos_cli
To return the sc74cn04.pbm.ihost.com (worker node) server to active, run the command that is shown in Example 11-16. We also scale the webserver service to 105 to see new instances start on the sc74cn04.pbm.ihost.com server, because swarm does not automatically rebalance existing tasks across the cluster.
Example 11-16 Returning the server back to accept workloads
# Making sc74cn04.pbm.ihost.com server available to the cluster
admin@570e9473805e:~$ docker node update --availability active sc74cn04.pbm.ihost.com
sc74cn04.pbm.ihost.com
 
# Scaling cluster from 30 to 105
admin@570e9473805e:~$ docker service scale webserver=105
webserver scaled to 105
overall progress: 105 out of 105 tasks
verify: Service converged
admin@570e9473805e:~$
 
# Connected to the sc74cn04.pbm.ihost.com worker node to confirm the containers that are running after scaling the service: 53 instances of nginx are running on sc74cn04.
admin@a29d70c73c55:~$ docker ps |grep nginx |wc -l
53
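Because swarm does not rebalance running tasks on its own, you can also force the existing tasks to be rescheduled across all active nodes without changing the replica count. Note that this sketch restarts the service's containers as they are rescheduled:

docker service update --force webserver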
11.2.8 Promoting a worker node
You might promote a worker node to be a manager node in scenarios like these:
You need to take a manager node down for maintenance.
You need to increase the number of manager nodes for your swarm cluster.
Example 11-17 shows the method for promoting a node.
Example 11-17 Promoting the sc74cn04.pbm.ihost.com node
admin@570e9473805e:~$ docker node promote sc74cn04.pbm.ihost.com
Node sc74cn04.pbm.ihost.com promoted to a manager in the swarm.
 
admin@570e9473805e:~$ docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
1xk53pdq2ujayiea2vbfu3bs1 * sc74cn03.pbm.ihost.com Ready    Drain Reachable 18.09.2
cidllaskmus095rrb221ii9b9 sc74cn04.pbm.ihost.com Ready    Active Reachable 18.09.2
4y9a3mej93xvq4denfphhz53u sc74cn09.pbm.ihost.com Ready    Drain Leader 18.09.2
j2b9596uktzhupuxv6td22mm1 sc74cn10.pbm.ihost.com Ready    Active 18.09.2
11.2.9 Demoting a manager node
We demoted the sc74cn04.pbm.ihost.com server, as shown in Example 11-18. Therefore, this server no longer performs manager node operations.
Example 11-18 Demoting a manager node
admin@570e9473805e:~$ docker node demote sc74cn04.pbm.ihost.com
Manager sc74cn04.pbm.ihost.com demoted in the swarm.
 
admin@570e9473805e:~$ docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
1xk53pdq2ujayiea2vbfu3bs1 * sc74cn03.pbm.ihost.com Ready    Drain Reachable 18.09.2
cidllaskmus095rrb221ii9b9 sc74cn04.pbm.ihost.com Ready    Active 18.09.2
4y9a3mej93xvq4denfphhz53u sc74cn09.pbm.ihost.com Ready    Drain Leader 18.09.2
j2b9596uktzhupuxv6td22mm1 sc74cn10.pbm.ihost.com Ready    Active 18.09.2
11.2.10 Scaling down a service
Use the command that is shown in Example 11-19 to scale down your service. In this example, we set the webserver service to have only four instances.
Example 11-19 Example of how to scale down service
admin@570e9473805e:~$ docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
pwu25hj3wpww webserver replicated 105/105 nginx:latest *:80->80/tcp
 
admin@570e9473805e:~$ docker service scale webserver=4
webserver scaled to 4
overall progress: 4 out of 4 tasks
1/4: running [==================================================>]
2/4: running [==================================================>]
3/4: running [==================================================>]
4/4: running [==================================================>]
verify: Service converged
 
admin@570e9473805e:~$ docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
pwu25hj3wpww webserver replicated 4/4 nginx:latest *:80->80/tcp
11.2.11 Considerations regarding the number of manager nodes
It is vital to understand how swarm mode's fault-tolerance feature works to prevent problems with your cluster service. Although it is possible to run a cluster with only one manager node, this approach is not ideal for organizations that have high-availability policies. If the single manager node fails, the services continue to process user requests; however, you must create a new cluster to recover the manager node operations.
Docker uses a consensus algorithm that is named Raft to achieve internal consistency across the entire swarm cluster and all containers that are running on it. Docker recommends that you implement an odd number of manager nodes, based on your high-availability requirements, according to the following rules:
A three-manager swarm tolerates a maximum loss of one manager.
A five-manager swarm tolerates a maximum simultaneous loss of two manager nodes.
An N-manager cluster tolerates the loss of at most (N-1)/2 managers.
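For example, the recommended maximum of seven managers tolerates the simultaneous loss of (7 - 1) / 2 = 3 manager nodes while the remaining four still form a quorum.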
For more information about the cluster algorithm, see this web page.

1 https://docs.docker.com/engine/swarm/key-concepts/#what-is-a-swarm