Control play execution for rolling updates

By default, Ansible runs each task on several hosts in parallel to speed up automation in large inventories. The degree of parallelism is set by the forks parameter in the Ansible configuration file, which defaults to 5, so out of the box Ansible attempts to run each task on up to five hosts at the same time.

In a load-balanced environment, this is not ideal, especially if you want to avoid downtime. Suppose we have five frontend servers in an inventory (or perhaps even fewer). If we allow Ansible to update all of these at the same time, the end users may experience a loss of service. So, it is important to stagger the updates so that some servers always remain in service. Let's reuse our inventory from the previous section with just two servers in it. If these were in a load-balanced environment, it would be vital that we only update one of them at a time; if both were taken out of service simultaneously, end users would lose access to the service until the Ansible play completes successfully.

The answer to this is to use the serial keyword in the play definition to determine how many hosts are operated on at once. Let's demonstrate this through a practical example:

Create the following simple playbook to run two commands on the two hosts in our inventory. The content of the commands is not important at this stage, but if you run the date command using the command module and specify -v to increase the verbosity when you run the play, you will be able to see the time at which each task ran:

---
- name: Simple serial demonstration play
  hosts: frontends
  gather_facts: false

  tasks:
    - name: First task
      command: date
    - name: Second task
      command: date

Now, if you run this play, you will see that it runs each task on both hosts simultaneously, as we have fewer hosts than the default number of forks (5). This behavior is normal for Ansible, but not really what we want, as our users would experience a service outage:

$ ansible-playbook -i hosts serial.yml

PLAY [Simple serial demonstration play] ****************************************

TASK [First task] **************************************************************
changed: [frt02.example.com]
changed: [frt01.example.com]

TASK [Second task] *************************************************************
changed: [frt01.example.com]
changed: [frt02.example.com]


PLAY RECAP *********************************************************************
frt01.example.com : ok=2 changed=2 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
frt02.example.com : ok=2 changed=2 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0

Now, let's modify the play definition, as shown. We'll leave the tasks section exactly as it was in the original playbook:

---
- name: Simple serial demonstration play
  hosts: frontends
  serial: 1
  gather_facts: false

Notice the presence of the serial: 1 line. This tells Ansible to complete the play on 1 host at a time before moving on to the next. If we run the play again, we can see this in action:

$ ansible-playbook -i hosts serial.yml

PLAY [Simple serial demonstration play] ****************************************

TASK [First task] **************************************************************
changed: [frt01.example.com]

TASK [Second task] *************************************************************
changed: [frt01.example.com]

PLAY [Simple serial demonstration play] ****************************************

TASK [First task] **************************************************************
changed: [frt02.example.com]

TASK [Second task] *************************************************************
changed: [frt02.example.com]

PLAY RECAP *********************************************************************
frt01.example.com : ok=2 changed=2 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
frt02.example.com : ok=2 changed=2 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0

Much better! If you imagine that this playbook actually disables these hosts on a load balancer, performs an upgrade, and then re-enables the hosts on the load balancer, this is exactly how you would want the operation to proceed. Doing so without the serial: 1 directive would result in all the hosts being removed from the load balancer at once, causing a loss of service.
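The shape of such a play can be sketched as follows. This is only an outline under stated assumptions: the debug tasks are placeholders standing in for whatever load balancer and upgrade steps your environment actually uses, and none of the messages correspond to real module calls.

```yaml
---
- name: Rolling update sketch (placeholder tasks only)
  hosts: frontends
  serial: 1
  gather_facts: false

  pre_tasks:
    # Placeholder: replace with the real step that removes this host from the load balancer
    - name: Disable host on the load balancer
      debug:
        msg: "Would disable {{ inventory_hostname }} on the load balancer"

  tasks:
    # Placeholder: replace with the real upgrade steps
    - name: Upgrade the host
      debug:
        msg: "Would upgrade {{ inventory_hostname }}"

  post_tasks:
    # Placeholder: replace with the real step that returns this host to the load balancer
    - name: Re-enable host on the load balancer
      debug:
        msg: "Would re-enable {{ inventory_hostname }} on the load balancer"
```

Because serial: 1 applies to the whole play, the pre_tasks, tasks, and post_tasks all complete on the first host before the play starts on the second one.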

It is useful to note that the serial directive can also take a percentage instead of an integer. When you specify a percentage, you are telling Ansible to run the play on that percentage of the hosts targeted by the play at one time. So, if the play targets 4 hosts and you specify serial: 25%, Ansible will only run the play on one host at a time. If it targets 8 hosts, Ansible will run the play on two hosts at a time. I'm sure you get the idea!
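As a minimal sketch of the percentage form, the play header below assumes a hypothetical webservers group containing 8 hosts (this group is not part of the inventory used earlier in this section). The percentage is quoted so that it is unambiguously read as a string:

```yaml
---
- name: Percentage-based serial demonstration play
  hosts: webservers   # hypothetical group, assumed to contain 8 hosts
  serial: "25%"       # 25% of 8 hosts = 2 hosts per batch
  gather_facts: false

  tasks:
    - name: First task
      command: date
```

With these assumed numbers, the play would run in four batches of two hosts each.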

You can even build on this by passing a list to the serial directive. Consider the following code:

  serial:
    - 1
    - 3
    - 5

This tells Ansible to run the play on 1 host, initially, then on the next 3, and then on batches of 5 at a time until the inventory is completed. You can also specify a list of percentages in place of the integer numbers of hosts. In doing this, you will build up a robust playbook that can perform rolling updates without causing a loss of service to end users. With this complete, let's further build on this knowledge by looking at controlling the maximum failure percentage that Ansible can tolerate before it aborts a play, which will again be useful in highly available or load-balanced environments such as this.
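To make the batching concrete, suppose (purely for illustration) that the play targets 12 hosts: the list 1, 3, 5 would then produce batches of 1, 3, and 5 hosts, followed by a final batch of the remaining 3. A list of percentages follows the same pattern, as in this sketch, which reuses the frontends group name only as an example:

```yaml
---
- name: Rolling update with growing batches
  hosts: frontends
  serial:
    - "10%"    # first batch: a small canary group
    - "20%"    # second batch: a larger group
    - "100%"   # final batch: all remaining hosts
  gather_facts: false

  tasks:
    - name: First task
      command: date
```

Ansible's documentation also describes mixing integers and percentages in the same list, so a first batch of 1 followed by percentage-based batches is possible.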
