CHAPTER 2

Applying Practical Automation

You need to know several key things before you automate a new procedure or task. (Well, first you need to know where your soda and potato chips are. Find them? Okay, moving on.) This chapter presents the prerequisite information in an easy-to-digest format. We'll demonstrate these same key points in later chapters when configuring our example systems. You might want to review this chapter after reading the entire book, especially when embarking on a new automation project.

This chapter assumes familiarity with Bourne Shell scripting. Experienced SAs shy away from scripting specifically for the Bash shell (Bourne-Again SHell) except when absolutely necessary. Even if your site has Bash installed everywhere today, you might have to integrate some new systems into your infrastructure tomorrow due to an acquisition. If the script that does some initial automation framework setup—such as installing cfengine or other required administrative utilities—doesn't work on the new systems, you're in for some serious extra work. If your scripting is as portable as possible from the start, in effect you're buying insurance against future pain.

Seeing Everything As a File

One of the core strengths of UNIX and UNIX-like operating systems is the fact that almost everything on the system is represented to the user as a file. Both real and pseudo devices (such as /dev/null, /dev/zero, and so on) can be read from and (often) written to as normal files. This capability makes many operations easy that would be difficult to achieve under other operating systems. Be thankful for the UNIX heritage of being written for and by programmers.

For example, if you want to create an ISO file on a remote system from a DVD in your laptop, you could run this:

dd if=/dev/cdrom | ssh remotehost 'dd of=/opt/big/vmware/sol10.iso'

Linux represents the CD/DVD drive as a file, in this case /dev/cdrom, so you simply use the dd command to copy it bit for bit to a different file. If you don't have the disk space on your laptop for storing the ISO file, you can pipe the dd output over SSH and use dd again on the remote host to place the output in a single file.
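The same idea can be tried without a DVD drive or a remote host. The following is a minimal sketch that reads the pseudo device /dev/zero as though it were an ordinary file. The scratch directory (created with mktemp, which is widely available but not guaranteed on every UNIX), the file name, and the 4 KB size are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: read a pseudo device exactly as you would read a regular file.
# The scratch directory, file name, and sizes are illustrative assumptions.
SCRATCH=`mktemp -d` || exit 1
IMAGE="$SCRATCH/zeros.img"

# Copy four 1024-byte blocks from /dev/zero into a regular file.
dd if=/dev/zero of="$IMAGE" bs=1024 count=4 2>/dev/null || exit 1

# wc reads the result like any other file; tr strips the padding
# that some wc implementations add around the number.
IMAGE_BYTES=`wc -c < "$IMAGE" | tr -d ' '`
echo "wrote $IMAGE_BYTES bytes to $IMAGE"
```

Swapping /dev/zero for /dev/cdrom, or the output file for a pipe into ssh, changes nothing about how dd is used.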

You can then configure VMware to mount the ISO file as a CD-ROM drive (standard VMware functionality) and quickly boot from the device and install on a host with no physical CD/DVD drive.

You probably won't ever need to automate ISO-file creation (although every site is different), but it's important to remember that the vast majority of automation operations are based on copying and/or modifying files. You need to update a file by copying a new file over it, edit the file in place, or copy out one or more additional files.

Often when files change or new files are distributed, a process on the host needs to restart so the host can recognize the change. Sometimes a host process starts for the first time if the new files comprise a new server/daemon process distributed as a package, tarball, or simply a file.
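One way to express "copy the file, then restart only if something changed" is sketched below. The helper name install_if_changed is hypothetical, and the paths and restart command in the usage comment are placeholders rather than anything this chapter prescribes.

```shell
#!/bin/sh
# Sketch: copy a file only when it differs from the installed copy, and
# tell the caller whether a restart is due.
# install_if_changed is a hypothetical helper, not a standard tool.
install_if_changed() {
        src=$1
        dest=$2
        # cmp -s prints nothing and returns 0 only when the files are identical.
        if cmp -s "$src" "$dest"; then
                return 1        # nothing changed, no restart needed
        fi
        cp "$src" "$dest" || return 2   # copy failed, do not restart
        return 0                # file updated, caller should restart
}

# Usage (paths and restart command are placeholders for your site):
# install_if_changed /opt/admin/etc/ntp.conf /etc/ntp.conf &&
#         /etc/init.d/ntpd restart
```

Because the function returns 0 only after a successful copy, chaining the restart with && keeps the daemon untouched both when the file is already current and when the copy fails.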

The bulk of what we'll be doing in this book is copying files, modifying files, and taking actions based on the success or failure of earlier file operations. Certain operations might prove tricky, but most of what we're doing should be familiar to UNIX SAs.

Understanding the Procedure Before Automating It

We've seen many administrators open a cfengine config file to automate a task and end up sitting there, unsure of what to do. It's an easy mistake to make when you need to modify many hosts and want to start the automation right away. The reason they ended up drawing a blank is that they weren't ready to effect changes on even a single host. They needed first to figure out how to reach the desired state.

This is the first rule of automation: automation is simply a set of already working steps, tied together in an automated manner.

This means that the first step toward automating a procedure usually involves manual changes! A development system (such as an SA's desktop UNIX/Linux system or a dedicated server system) is used to build, install, and configure software. You might need to perform these activities separately for all your site's operating systems and hardware platforms (SPARC vs. x86 vs. x86_64, etc.).

Here's an overview of the automated change development process:

  • Make the change in a test environment.
  • Make it fit your policy; for example, make it run as a nonroot user or install it in a specific directory tree.
  • Automate the deployment steps.
  • Test the deployment to a small number of testing or staging hosts and confirm that you achieve the desired effects.
  • Deploy the change to all hosts using the newly developed automation.
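The last two steps of this process can be sketched as a small loop. The host names and the deploy_change stub are hypothetical. In practice you would also confirm the results on the staging hosts by hand before letting the production rollout proceed.

```shell
#!/bin/sh
# Sketch: stage a change on a few hosts before deploying it everywhere.
# Host names and deploy_change are hypothetical placeholders.
STAGING_HOSTS="stage1 stage2"
PRODUCTION_HOSTS="web1 web2 db1"

deploy_change() {
        # Replace this stub with your real, already-tested deployment steps.
        echo "deploying to $1"
}

rollout() {
        # Stop at the first host that fails so the problem stays contained.
        for host in "$@"; do
                deploy_change "$host" || return 1
        done
}

rollout $STAGING_HOSTS || {
        echo "staging rollout failed; production untouched" >&2
        exit 1
}
rollout $PRODUCTION_HOSTS
```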

So with automation, you simply take the solid work that you already do manually and speed it up. The side effect is that you also reduce the errors involved when deploying the change across all the systems at your site.

Exploring an Example Automation

In this section we'll take a set of manual steps frequently performed at most sites and turn it into an automated procedure. We'll use the example to illustrate the important points about creating automated procedures.

Scripting a Working Procedure

An SA needs to create user accounts regularly. In this case, you'll use several commands to create a directory on a central Network File System (NFS) server and send off a welcome e-mail. You must run the commands on the correct host because the accounts from that host are pushed out to the rest of the hosts.

To begin the automation process, the SA can simply take all the commands and put them into a shell script. The script might look as simple as this:

#!/bin/sh
useradd $1
cp /opt/admin/etc/skel/.* /home/$1/

Then the SA composes an e-mail to the new user with important information (having a template for the user e-mail is helpful). This procedure works, but another SA cannot use it easily. If it generates any errors, you might find it difficult to determine what failed. Plus, you might encounter problems because the script attempts all the steps regardless of any errors resulting from earlier steps. In just a few minutes, you can make some simple additions to turn this procedure into a tool that's usable by all SA staff:

#!/bin/sh
PATH=/sbin:/usr/sbin:/bin:/usr/bin
REQUIRED_HOST=adminhost1

usage() {
        echo "Usage: $0 account_name"
        echo "Make sure this is run on the host: $REQUIRED_HOST"
        exit 1
}
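# capture this host's name with command substitution (backticks)
MYHOSTNAME=`hostname`

# use the portable = operator; == is not understood by every Bourne shell
[ -n "$1" -a "$MYHOSTNAME" = "$REQUIRED_HOST" ] || usage

USERNAME=$1

useradd -m "$USERNAME" || exit 1
cp /opt/admin/etc/skel/.bash* "/home/$USERNAME/" || exit 1

/usr/bin/mailx -s "Welcome to our site" "${USERNAME}@example.net" <<EOF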
The SA team has created an account for you on the UNIX systems.
You have a default password that's unique to your account,
which will need to be changed upon initial login.
The system will force this password change.

Please call the SA help desk at 555-1212 in order to receive your
password, and to ask any questions that you may have.
EOF

Because the revised script ensures that it's running on the right host and that an argument is passed to it, it now helps the SA make sure it's not called incorrectly. This helps the author and any other users of the script. Having usage information should be considered mandatory for all administrative scripts, even if the scripts are meant to be used only by the original author.

Another advantage of scripting this procedure is that the same message is sent to all new users. Consistency is important for such communications, and it'll help ensure that new users are productive as soon as possible in their new environment.

Administrative scripts should not run if the arguments or input are not exactly correct. You could also improve the preceding script to ensure that the username supplied meets certain criteria.
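A username check might look like the following sketch. The policy it enforces (a lowercase letter first, then only lowercase letters, digits, underscores, or hyphens, and no more than eight characters) is an assumption for illustration; substitute your own site's rules. The function name valid_username is likewise hypothetical.

```shell
#!/bin/sh
# Sketch: reject usernames that do not meet a site policy.
# The policy below is an assumption; adjust it to your site's rules.

# Force the C locale so that a-z means exactly the 26 lowercase ASCII letters.
LC_ALL=C
export LC_ALL

valid_username() {
        case "$1" in
                "")             return 1 ;;     # empty
                [!a-z]*)        return 1 ;;     # must start with a lowercase letter
                *[!a-z0-9_-]*)  return 1 ;;     # contains a disallowed character
                ?????????*)     return 1 ;;     # nine or more characters
        esac
        return 0
}
```

In the account-creation script, a line such as `valid_username "$1" || usage` placed before the useradd call would stop the script before any change is made.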

Prototyping Before You Polish

The preceding script is still a prototype. If you were to give it an official version number, it would need to be something like 0.5, meaning that it's not yet intended for general release. Other SA staff members can run this functional prototype to see if it achieves the desired goal of creating a working user account.

Once this goal is achieved, the automation author can move on to the next step of polishing the script. The SA shouldn't spend much time on cosmetic issues such as more verbose usage messages before ensuring the procedure achieves the desired goal. Such things can wait.

Turning the Script into a Robust Automation

Now you want to turn the script into something you would consider version 1.0—something that will not cause errors when used in unintended ways. Every automation's primary focus should be to achieve one of two things:

  • A change to one or more systems that achieves a business goal: The creation of a new user account falls into this category.
  • No change at all: If something unexpected happens at any point in the automation, no changes should be made at all. This means that if an automated procedure makes several changes, a failure in a later stage should normally result in a rollback of the earlier changes (where appropriate or even possible).

Your earlier user-creation script could use some improved error messages, as well as a rollback step. Give that a shot now:

#!/bin/sh
# Written by ncampi 05/26/08 for new UNIX user account creation
# WARNING!!!! If you attempt to run this for an existing username,
# it will probably delete that user and all their files!
# Think about adding logic to prevent this.

# set the path for safety and security
PATH=/usr/sbin:/bin:/usr/bin

# update me if we fail over or rebuild/rename the admin host
REQUIRED_HOST=adminhost1

usage() {
        echo "Usage: $0 account_name"
        echo "Make sure this is run on the host: $REQUIRED_HOST"
        exit 1
}

die() {
        echo ""
        echo "$*"
        echo ""
        echo "Attempting removal of user account and exiting now."
        userdel -rf "$USERNAME"
        exit 1
}
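# capture this host's name with command substitution (backticks)
MYHOSTNAME=`hostname`

# use the portable = operator; == is not understood by every Bourne shell
[ -n "$1" -a "$MYHOSTNAME" = "$REQUIRED_HOST" ] || usage

USERNAME=$1

useradd -m "$USERNAME" || die "useradd command failed."
cp /opt/admin/etc/skel/.bash* "/home/$USERNAME/" ||
        die "Copy of skeleton files failed."

/usr/bin/mailx -s "Welcome to our site" "${USERNAME}@example.net" <<EOF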

The SA team has created an account for you on the UNIX systems.
You have a default password that's unique to your account,
which will need to be changed upon initial login. The system will
force this password change upon your first login.

Please visit the SA help desk in order to receive your password,
and to ask any questions that you may have.
EOF

It seems like a bad idea to trust that someone who calls your help desk claiming to be a new user is really the person in question, even if caller ID tells you the phone resides in your building. You might want to require that the user physically visit your help desk. If this isn't possible, the SA staff should come up with a suitable substitute such as calling the new user's official extension, or perhaps having the new user identify himself or herself with some private information such as a home address or employee number.

Attempting to Repair, Then Failing Noisily

The preceding script attempts a removal of the new user account when things go wrong. If the account was never created, that's okay because the userdel command will fail, and it should fail with a meaningful error message such as "No such account."
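The warning in the script's header comments points at the opposite case: the account already existed before the script ran, useradd fails, and the rollback removes a real user. One possible guard is sketched below. The function names user_exists and require_new_account are hypothetical; the sketch relies only on the id command exiting nonzero for an unknown account name.

```shell
#!/bin/sh
# Sketch: refuse to proceed when the account already exists, so that the
# rollback in die() can only ever remove an account this script created.
# user_exists and require_new_account are hypothetical helper names.

user_exists() {
        # id exits nonzero when the name cannot be resolved to an account.
        id "$1" >/dev/null 2>&1
}

require_new_account() {
        if user_exists "$1"; then
                echo "Account $1 already exists; refusing to continue." >&2
                return 1
        fi
        return 0
}
```

Calling `require_new_account "$USERNAME" || exit 1` before the useradd line would end the script with a plain exit rather than through die(), so no removal is ever attempted for a pre-existing account.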

You'll encounter situations where a rollback is multistep, so you'll need to evaluate each step's exit code and decide, based on those exit codes, whether further rollback steps should proceed. When the script is run interactively, be sure to emit messages about each step being taken and the results of those steps. As the script author you know exactly what a failure means at each step, so be sure to relay that information to the SA running the script.
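A multistep rollback can be sketched as a list of undo steps that is run in reverse order, with each exit code checked before moving on. The names push_rollback and rollback are hypothetical, and each recorded step is assumed to be the name of a shell function or command that takes no arguments.

```shell
#!/bin/sh
# Sketch: undo completed steps in reverse order, checking each exit code.
# push_rollback and rollback are hypothetical helper names; each step is
# assumed to be a single word naming a function or command.
ROLLBACK_STEPS=""

# Record an undo step; the newest goes first so rollback runs in reverse.
push_rollback() {
        ROLLBACK_STEPS="$1 $ROLLBACK_STEPS"
}

rollback() {
        for step in $ROLLBACK_STEPS; do
                echo "Rolling back: $step"
                if $step; then
                        echo "  ok"
                else
                        echo "  FAILED; stopping rollback, manual cleanup needed" >&2
                        return 1
                fi
        done
}
```

A script would call push_rollback immediately after each change succeeds (for example, after useradd returns 0) and call rollback from its error handler. Stopping at the first failed undo step is a design choice: it avoids piling further changes on top of a state the script no longer understands.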

Each and every step in an automation or administrative script needs to ensure success; don't ever move on blindly with the assumption that a command worked. Even something as simple as copying a few config files into a new user's home directory can fail when a disk fills up. Assumptions can and will bite you.

Focusing on Results

When in doubt, opt for simplicity. Don't attempt fancy logic and complicated commands when the goal is simple.

For example, you might have a script that takes a list of Domain Name System (DNS) servers and generates a resolv.conf file that's pushed to all hosts at your site. When a new DNS server is added or a server is replaced with another, you need to run the script to update the file on all your systems.

Instead of running the script to generate the file on each and every host at your site, you can run the command on one host, take the resulting output, and push that out as a file to all hosts. This technique is simple and reliable compared to the requirement of running a command successfully on every host. A complicated procedure becomes a simple file push. This is the KISS (Keep It Simple, Stupid) principle in all its glory. Our system administration experience has taught us that increased simplicity results in increased reliability.
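The generate-once approach might look like the sketch below. The server addresses (taken from the 192.0.2.0/24 documentation range), the search domain, and the output location are illustrative assumptions, and generate_resolv_conf is a hypothetical function name.

```shell
#!/bin/sh
# Sketch: build resolv.conf once from a server list, then distribute the file.
# The addresses, search domain, and output path are illustrative assumptions.
DNS_SERVERS="192.0.2.10 192.0.2.11"
SEARCH_DOMAIN=example.net

generate_resolv_conf() {
        echo "search $SEARCH_DOMAIN"
        for server in $DNS_SERVERS; do
                echo "nameserver $server"
        done
}

# Write the result to a scratch file on this one host.
OUTFILE=${TMPDIR:-/tmp}/resolv.conf.$$
generate_resolv_conf > "$OUTFILE" || exit 1
echo "generated $OUTFILE; push this one file to every host"
```

The generated file can then be distributed with whatever file-copy mechanism your site already uses, so no host other than the one running the script ever has to execute the generation logic.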
