Ansible is a tool written by people, and it runs playbooks written by people to automate configuration work that would otherwise be performed manually by people. As such, errors can occur: the end result is only as good as the input.
Failures typically either occur quickly (connection problems, for example) and are relatively self-evident, or they occur during long-running jobs, often as a result of load or network timeouts. In either case, the OpenStack-Ansible playbooks provide an efficient mechanism for rerunning a playbook without having to repeat the work it has already completed.
On failure, Ansible produces a file in /root (as we're running these playbooks as root) named after the playbook, with the file extension .retry. This file simply lists the hosts that failed, so it can be referenced when running the playbook again. The rerun then targets only that single host or small group of hosts, which is far more efficient than running again across a large cluster of machines that had already completed successfully.
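To make the role of the retry file concrete, here is a minimal sketch of a helper that prints the hosts recorded in one. It assumes the usual layout of one hostname per line. The function name list_failed_hosts is hypothetical and is not part of Ansible or OpenStack-Ansible.

```shell
# Hypothetical helper: print the hosts recorded in a retry file.
# Assumes one hostname per line; blank lines are skipped.
list_failed_hosts() {
    retry_file="$1"
    if [ ! -f "$retry_file" ]; then
        echo "no retry file: $retry_file" >&2
        return 1
    fi
    grep -v '^[[:space:]]*$' "$retry_file"
}

# Example (path taken from this section):
#   list_failed_hosts /root/setup-openstack.retry
```

If the file is missing, the helper reports that on standard error and returns a non-zero status, which usually means the last run of that playbook did not fail.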
We will step through a problem that caused one of the playbooks to fail.
Note the failed playbook and then invoke it again with the following steps:
1. Change to the playbooks directory as follows:
     cd /opt/openstack-ansible/playbooks
2. Rerun the failed playbook, limiting it to the hosts listed in the retry file:
     openstack-ansible setup-openstack.yml --limit @/root/setup-openstack.retry
3. If the problem persists, run the playbook again without the retry file.

Should there be a failure at this first stage, execute the following:

1. Remove the generated inventory files:
     rm -f /etc/openstack_deploy/openstack_inventory.json
     rm -f /etc/openstack_deploy/openstack_hostnames_ips.yml
2. Rerun the setup-hosts.yml playbook:
     cd /opt/openstack-ansible/playbooks
     openstack-ansible setup-hosts.yml
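The choice between rerunning against the failed hosts and rerunning the whole playbook can be sketched as a small helper. This is only an illustration under stated assumptions: the function name rerun_command is hypothetical, it prints the command rather than executing it, and it relies on the --limit @file form that stock ansible-playbook accepts for reading a host list from a file.

```shell
# Hypothetical helper: print the command that would rerun a playbook.
# If a non-empty retry file exists, limit the run to the hosts it lists;
# otherwise rerun the playbook against everything.
rerun_command() {
    playbook="$1"
    retry_dir="${2:-/root}"                      # retry files live in /root here
    retry_file="$retry_dir/${playbook%.yml}.retry"
    if [ -s "$retry_file" ]; then
        echo "openstack-ansible $playbook --limit @$retry_file"
    else
        echo "openstack-ansible $playbook"
    fi
}

# Example: show the command, then run it yourself from the playbooks directory.
#   cd /opt/openstack-ansible/playbooks
#   rerun_command setup-openstack.yml
```

Printing the command instead of running it keeps the decision visible, so you can confirm which hosts will be targeted before starting a long job.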
In some situations, it might be applicable to destroy the installation and begin again. As each service gets installed in LXC containers, it is very easy to wipe an installation and start from the beginning. To do so, carry out the following steps:
1. Destroy all of the containers:
     cd /opt/openstack-ansible/playbooks
     openstack-ansible lxc-containers-destroy.yml
   You will be asked to confirm this action. Follow the on-screen prompts.
2. Uninstall the appdirs Python package from the hosts:
     ansible hosts -m shell -a "pip uninstall -y appdirs"
3. Remove the inventory files:
     rm -f /etc/openstack_deploy/openstack_inventory.json /etc/openstack_deploy/openstack_hostnames_ips.yml
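The inventory cleanup is easy to get wrong by hand, because the same directory also holds your own configuration. As a hedged sketch, the following helper removes only the two generated files named in this section and leaves everything else alone. The function name remove_generated_inventory is hypothetical.

```shell
# Hypothetical helper: remove only the two generated inventory files.
# The directory defaults to the path used in this section.
remove_generated_inventory() {
    deploy_dir="${1:-/etc/openstack_deploy}"
    for name in openstack_inventory.json openstack_hostnames_ips.yml; do
        if [ -e "$deploy_dir/$name" ]; then
            rm -f "$deploy_dir/$name" && echo "removed $deploy_dir/$name"
        fi
    done
}

# Example:
#   remove_generated_inventory
```

Listing the file names explicitly, rather than using a wildcard, avoids deleting anything else in the directory by accident.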
Ansible is not perfect, and neither are computers. Sometimes failures occur in the environment due to SSH timeouts or some other transient problem. Also, despite Ansible trying its best to retry the execution of a playbook, the result might still be a failure. Failure in Ansible is quite obvious: it is usually indicated by red text on the screen. In most cases, rerunning the offending playbook is enough to get past a transient problem. Each playbook runs a specific set of tasks, and Ansible will state which task has failed. Troubleshooting why that particular task failed will eventually lead to a good outcome. In the worst case, you can reset your installation and start again from the beginning.
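Because transient failures often clear on a second attempt, rerunning can be wrapped in a small loop. The following is a minimal sketch, not something OpenStack-Ansible provides: the function name retry and the RETRY_DELAY variable are hypothetical, and the 30-second default pause is an arbitrary assumption.

```shell
# Hypothetical helper: run a command up to max_attempts times,
# pausing RETRY_DELAY seconds (default 30) between attempts.
retry() {
    max_attempts="$1"
    shift
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-30}"
    done
}

# Example: try a playbook up to three times.
#   retry 3 openstack-ansible setup-openstack.yml
```

A loop like this only helps with genuinely transient problems. If the same task fails on every attempt, stop and troubleshoot that task instead of raising the attempt count.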