CHAPTER 7

Automating a New System
Infrastructure

Every UNIX-based site requires a similar list of infrastructure services in order to function. All sites need to keep the correct time, route e-mail from system processes (such as cron jobs, and in our case cfexecd) to the correct place, convert hostnames into IP addresses, and control user accounts.

We think it's only fair to warn you that this chapter won't go into great detail on the protocols and server software that we'll configure. If we had to explain DNS, NTP, SMTP, NFS, the automounter, and UNIX authentication files in great detail, the chapter would never end. Additionally, it would draw focus away from our goal of automating a new infrastructure using cfengine. We'll recommend other sources of information for the protocols and server software as we progress through the chapter.

When we refer to files in the cfengine masterfiles repository on our central host (goldmaster), we'll use paths relative to /var/lib/cfengine2/masterfiles. This means that the full path to PROD/inputs/tasks/os/cf.ntp is /var/lib/cfengine2/masterfiles/PROD/inputs/tasks/os/cf.ntp.

Implementing Time Synchronization

Many programs and network protocols fail to function properly when the clocks on two systems differ by more than a small amount.

The lack of time synchronization can cause extensive problems at a site. These are the most common:

  • E-mail messages have the incorrect time.
  • Log entries cannot be correlated across different systems.
  • Monitoring alerts specify the incorrect time for outages.
  • Authentication transactions fail.
  • Automation-system changes based on file-modification times behave incorrectly.
  • Software build tools such as make (which depend on file-modification times) break.

We'll tackle Network Time Protocol (NTP) configuration before any other infrastructure setup tasks. We won't go into the level of detail that you'll want if you're deploying NTP across hundreds or thousands of systems. If that's the case, accept our apologies and proceed over to http://www.ntp.org to browse the online documentation, or head to your nearest bookseller and pick up a copy of Expert Network Time Protocol by Peter Rybaczyk (Apress, 2005).

The fact that we already have six hosts at our example site without synchronized clocks is a potential problem. The cfservd daemon will refuse to serve files to clients if the clocks on the systems differ by more than one hour. You can turn off this behavior with this setting in cfservd.conf:

DenyBadClocks = ( off )

It might make sense to turn it off during the initial bootstrapping phase at your site, before you deploy NTP.

NTP is the Internet standard for time synchronization. Interestingly, it's one of the oldest Internet standards still in widespread use. NTP is a mechanism for transmitting the universal time (UTC, or Coordinated Universal Time) between systems on a network. It is up to the local system to determine the local time zone and Daylight Saving settings, if applicable. NTP has built-in algorithms for dealing with variable network latency, and can achieve rather impressive accuracy even over the public Internet.

External NTP Synchronization

The ntp.org web site has a list of public NTP servers here: http://support.ntp.org/bin/view/Servers/WebHome. These are groups of public NTP servers that use round-robin DNS to enable clients to make a random selection from the group. Both Red Hat and Debian have NTP pools set up this way, and the NTP packages from those distributions utilize these pools by default.

Our intention is to have two of our internal servers synchronize to an external source, and have the rest of our systems synchronize from those two. This is the polite way to utilize a public NTP source: placing as little load as possible on it. We don't want a single system to perform off-site synchronization for our entire network because it becomes a single point of failure. We generally want to set up DNS aliases for system roles such as NTP service, but NTP configuration files use IP addresses. This actually works out well because we have yet to set up internal DNS.

Internal NTP Masters

We'll use our cfengine master host (goldmaster.campin.net) and our Red Hat Kickstart system (rhmaster.campin.net) as the two systems that sync to an external NTP source.


Note  There is no reason to choose Linux over Solaris systems to handle this role. You should find it quite easy to modify this procedure to use one or more Solaris systems to synchronize off site instead, and have all other systems synchronize to the internal Solaris NTP servers.


The Red Hat system already had ntpd installed (the ntp RPM package). If you wish to graphically configure NTP on Red Hat, you'll need to have the system-config-date RPM installed. Basic NTP configuration is straightforward, so we'll stick with text-based methods of configuration.

The Debian system didn't have the required packages installed, so we used apt-get to install the ntp package. We went back to our FAI configuration and added the line ntp to the file /srv/fai/config/package_config/FAIBASE so that all future Debian installs have the package by default. Our Kickstart installation process already installs the ntp RPM, so we don't have to make any Kickstart modifications.
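
For reference, here's a rough sketch of those two steps; the apt-get command runs on the Debian host, and the echo runs on goldmaster (check that the end of the FAIBASE file falls under a package list section before blindly appending to it):

# apt-get update && apt-get install ntp
# echo ntp >> /srv/fai/config/package_config/FAIBASE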

Here is the /etc/ntp.conf file that we'll use on our systems that synchronize to off-site NTP sources:

driftfile /var/lib/ntp/ntp.drift
statsdir /var/log/ntpstats/

statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable

# pool.ntp.org maps to more than 300 low-stratum NTP servers.
# Your server will pick a different set every time it starts up.
server 0.debian.pool.ntp.org iburst
server 1.debian.pool.ntp.org iburst
server 2.debian.pool.ntp.org iburst
server 3.debian.pool.ntp.org iburst
# By default, exchange time with everybody, but don't allow configuration.
# See /usr/share/doc/ntp-doc/html/accopt.html for details.
restrict -4 default kod notrap nomodify nopeer noquery
restrict -6 default kod notrap nomodify nopeer noquery

# Local users may interrogate the ntp server more closely.
restrict 127.0.0.1
restrict ::1

# allow the local subnet to query us
restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap

Both Red Hat and Debian have a dedicated user to run the NTP daemon process. The user account, named "ntp," will need write access to the /var/lib/ntp directory.

When you name a subnet using the restrict keyword and omit the noquery keyword, the server allows NTP client connections from that subnet.

Configuring the NTP Clients

Now that we have working NTP servers on our network, we need configuration files for the Linux (both Red Hat and Debian) and Solaris systems on our network. We refer to the systems running NTP to synchronize only with internal hosts as NTP "clients."

Solaris 10 NTP Client

You'll find it easy to configure a single Solaris 10 system to synchronize its time using NTP. We will automate the configuration across all our Solaris systems later, but will first test our configuration on a single host to validate it. Simply copy /etc/inet/ntp.server to /etc/inet/ntp.conf, and comment out these lines:

server 127.127.XType.0
fudge 127.127.XType.0 stratum 0
keys /etc/inet/ntp.keys
trustedkey 0
requestkey 0
controlkey 0

Add lines for our internal NTP servers:

server 192.168.1.249
server 192.168.1.251

Create the file /var/ntp/ntp.drift as root using the touch command, and enable the ntp service:

# touch /var/ntp/ntp.drift
# /usr/sbin/svcadm enable svc:/network/ntp

It's really that easy. Check the /var/log/messages log file for lines like this, indicating success:

Jul 27 18:05:30 aurora ntpdate[995]: [ID 558275 daemon.notice] adjust time server
192.168.1.249 offset 0.008578 sec
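
Once the daemon has been running for a few minutes, you can also query it directly for peer status; in the output of ntpq -p, an asterisk in the first column marks the peer currently selected for synchronization:

# /usr/sbin/ntpq -p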

Red Hat and Debian NTP Client

We use the same NTP configuration-file contents for all the remaining Debian and Red Hat hosts at our site, shown here:

driftfile /var/lib/ntp/ntp.drift
statsdir /var/log/ntpstats/

statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable

# Synchronize only with our internal NTP servers.
server 192.168.1.249
server 192.168.1.251

# By default, exchange time with everybody, but don't allow configuration.
# See /usr/share/doc/ntp-doc/html/accopt.html for details.
restrict -4 default kod notrap nomodify nopeer noquery
restrict -6 default kod notrap nomodify nopeer noquery

# Local users may interrogate the ntp server more closely.
restrict 127.0.0.1
restrict ::1

restrict 192.168.1.249   nomodify        # goldmaster.campin.net
restrict 192.168.1.251   nomodify        # rhmaster.campin.net

You'll notice that these file contents resemble the contents of the configuration file used on the hosts that sync off site. The differences are that the server lines point at our two internal NTP servers rather than at the public pool, and that we added restrict lines for those servers.

Copying the Configuration Files with cfengine

Now we will distribute the NTP configuration file using cfengine, including automatic ntp daemon restarts when the configuration file is updated. First, put the files into a suitable place in the cfengine master repository (on the host goldmaster):

# cd /var/lib/cfengine2/masterfiles/PROD/repl/root/etc/ntp/
# ls -1
ntp.conf
ntp.conf-master
ntp.server

You might remember that we created the ntp directory back when we first set up the masterfiles repository. The ntp.conf-master file is meant for rhmaster and goldmaster, the hosts that synchronize NTP using off-site sources. The ntp.conf file is for all remaining Linux hosts, and ntp.server is our Solaris 10 NTP configuration file.

We'll create a task file at the location PROD/inputs/tasks/os/cf.ntp on the cfengine master (goldmaster). Once the task is written, we'll import it into the PROD/inputs/hostgroups/cf.any file for inclusion across our entire site. Here is the task file:

classes:  # synonym groups:

        ntp_servers                     = (     rhmaster
                                                goldmaster
                                                )

The classes section defines a simple group of two hosts: the machines that sync off site. Next comes the control section:

control:

        any::
                AddInstallable          = ( restartntpd )
                AllowRedefinitionOf     = ( ntp_conf_source )

                #
                # The default ntp.conf doesn't sync off-site
                #
                ntp_conf_source = ( "ntp.conf" )

       linux::
                ntp_user        = ( "ntp" )
                ntp_conf_dest   = ( "/etc/ntp.conf" )
                drift_file      = ( "/var/lib/ntp/ntp.drift" )
       solaris|solarisx86::
                ntp_user        = ( "root" )
                ntp_conf_dest   = ( "/etc/inet/ntp.conf" )
                drift_file      = ( "/var/ntp/ntp.drift" )
                ntp_conf_source = ( "ntp.server" )

       ntp_servers::
                # the ntp.conf for these hosts causes ntpd to sync
                # off-site, and share the information with the local net
                ntp_conf_source = ( "ntp.conf-master" )

In the control section, we define class-specific variables for use in the files and copy actions:

files:
        # ensure that the drift file exists and is
        # owned and writable by the correct user
        any::
                $(drift_file) mode=0644 action=touch
                  owner=$(ntp_user) group=$(ntp_user)

If we didn't use variables for the location of the NTP drift file and the owner of the ntpd process, we would have to write a separate files stanza for each class of systems. Nearly identical entries that differ only in small details are easy to get wrong when both have to be updated later, so we avoid such duplication.

We also manage to write only a single copy stanza, again through the use of variables:

copy:

        any::
                $(master_etc)/ntp/$(ntp_conf_source)
                                dest=$(ntp_conf_dest)
                                mode=644
                                type=checksum
                                server=$(fileserver)
                                encrypt=true
                                owner=root
                                group=root
                                define=restartntpd

Here we copy out the applicable NTP configuration file to the correct location for each operating system. When the file is successfully copied, the restartntpd class is defined. This triggers actions in the following shellcommands section:

shellcommands:

        # restart ntpd when the restartntpd class is defined
        debian.restartntpd::
                "/etc/init.d/ntp restart" timeout=30 inform=true

        # restart ntpd when the restartntpd class is defined
        redhat.restartntpd::
                "/etc/init.d/ntpd restart" timeout=30 inform=true

        # restart ntpd when the restartntpd class is defined
        (solarisx86|solaris).restartntpd::
                "/usr/sbin/svcadm restart svc:/network/ntp" timeout=30 inform=true

When the ntp.conf file is updated, the class restartntpd is defined, and it causes the ntp daemon process to restart. Based on the classes a system matches, the restartntpd class causes cfengine to take the appropriate restart action.

Note that we have two almost identical restart commands for the debian and redhat classes. We could have reduced that to a single stanza, as we did for the files and copy actions. Combining those into one shellcommands action is left as an exercise for the reader.
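
One possible approach, shown here only as a sketch rather than something we deploy in this chapter, is to define a per-class variable for the init script in the control section and then collapse the two Linux stanzas into one:

control:
        debian::
                ntp_initscript  = ( "/etc/init.d/ntp" )
        redhat::
                ntp_initscript  = ( "/etc/init.d/ntpd" )

shellcommands:
        # restart ntpd when the restartntpd class is defined
        (debian|redhat).restartntpd::
                "$(ntp_initscript) restart" timeout=30 inform=true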

Now let's look at the processes section:

processes:

        # start ntpd when it's not running
        debian::
                "ntpd" restart "/etc/init.d/ntp start"

        # start ntpd when it's not running
        redhat::
                "ntpd" restart "/etc/init.d/ntpd start"

        # this is for when it's not even enabled
        solarisx86|solaris::
                "ntpd" restart "/usr/sbin/svcadm enable svc:/network/ntp"

In this section, we could have used the restartntpd class to trigger the delivery of a HUP signal to the running ntpd process. We don't do that because a HUP signal causes the ntpd process to die. For this reason, we use the init scripts on Linux and SMF on Solaris.

This task represents how we'll write many of our future cfengine tasks. We'll define variables to handle different configuration files for different system types, then use actions that utilize those variables.

The required entry in PROD/inputs/hostgroups/cf.any to get all our hosts to import the task is the file path relative to the inputs directory:

import:
         any::
                 tasks/os/cf.motd
                 tasks/os/cf.cfengine_cron_entries
                 tasks/os/cf.ntp

If you decide that more hosts should synchronize off site, you'd simply configure an additional Linux host to copy the ntp.conf-master file instead of the ntp.conf file. You'd need to write a slightly modified Solaris ntp.server config file if you choose to have a Solaris host fill this role instead. We haven't done so in this book, not because Solaris isn't suited to the task, but because we needed only two hosts in this role. You'd then add a server line for the new host (plus a matching restrict line on Linux clients) to the NTP client configuration files. That's all it takes to make our site utilize an additional local NTP server.

An Alternate Approach to Time Synchronization

We can perform time synchronization at our site using a much simpler procedure than running the NTP infrastructure previously described. We can simply utilize the ntpdate utility to perform one-time clock synchronization against a remote NTP source. To manually use ntpdate once, run this at the command line as root:

# /usr/sbin/ntpdate 0.debian.pool.ntp.org
20 Sep 17:09:15 ntpdate[181]: adjust time server 208.113.193.10 offset -0.00311 sec

Note that ntpdate will fail if a local ntpd process is running, due to contention for the local NTP port (UDP port 123). Temporarily stop any running ntpd processes if you want to test out ntpdate.

We consider this method of time synchronization to be useful only on a temporary basis. The reason is that ntpdate immediately forces the local time to match the remote NTP source's time. This can (and often does) result in a major change to the local system's time: a sudden jump forward or backward in the system's clock.

By contrast, when ntpd sees a gap between the local system's time and the remote time source(s), it gradually reduces the difference until the two match. We prefer the ntpd approach because logs, e-mail, and other records where the time matters never end up with the misleading timestamps that surround a sudden clock jump.

Because we discourage the use of ntpdate, we won't demonstrate how to automate its usage. That said, if you decide to use ntpdate at your site, you could easily run it from cron or a cfengine shellcommands section on a regular basis.
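
If you did go that route, a root crontab entry along these lines would keep clocks roughly in step (the -s flag diverts ntpdate's output to syslog); remember that it will fail on any host where ntpd is already bound to UDP port 123:

# run ntpdate hourly; -s sends its output to syslog
17 * * * * /usr/sbin/ntpdate -s 0.debian.pool.ntp.org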

Incorporating DNS

The Domain Name System (DNS) is a globally distributed database containing domain names and associated information. Calling it a "name-to-IP-address mapping service" is overly simplistic, although it's often described that way. It also contains the list of mail servers for a domain as well as their relative priority, among other things. We don't go into great detail on how the DNS works or the finer details of DNS server administration, but you can get more information from DNS and BIND, Fifth Edition by Cricket Liu and Paul Albitz (O'Reilly Media Inc., 2006), and the Wikipedia entry at http://en.wikipedia.org/wiki/Domain_Name_System.

Choosing a DNS Architecture

Standard practice with DNS is to make only certain hostnames visible to the general public. This means that we wouldn't make records such as those for goldmaster.campin.net available to systems that aren't on our private network. When we need mail to route to us from other sites properly or get our web site up and running, we'll publish MX records (used to map a name to a list of mail exchangers, along with relative preference) and an A record (used to map a name to an IPv4 address) for our web site in the public DNS.

This sort of setup is usually called a "split horizon," or simply "split" DNS. We have the internal hostnames for the hosts we've already set up (goldmaster, etchlamp, rhmaster, rhlamp, hemingway, and aurora) loaded into our campin.net domain with a DNS-hosting company. We'll want to remove those records at some point because they reference private IP addresses. They're of no use to anyone outside our local network and therefore should be visible only on our internal network. We'll enable this record removal by setting up a new private DNS configuration and moving the private records into it.

Right about now you're thinking "Wait! You've been telling your installation clients to use 192.168.1.1 for both DNS and as a default gateway. What gives? Where did that host or device come from?" Good, that was observant of you. When we mentioned that this book doesn't cover the network-device administration in our example environment, we meant our single existing piece of network infrastructure: a Cisco router at 192.168.1.1 that handles routing, Network Address Translation (NAT), and DNS-caching services. After we get DNS up and running on one or more of our UNIX systems, we'll have cfengine configure the rest of our systems to start using our new DNS server(s) instead.

Setting Up Private DNS

We'll configure an internal DNS service that is utilized only from internal hosts. This will be an entirely stand-alone DNS infrastructure not linked in any way to the public DNS for campin.net.

This architecture choice means we need to synchronize any public records (currently hosted with a DNS-hosting company) to the private DNS infrastructure. We currently have only mail (MX) records and the hostnames for our web site (http://www.campin.net and campin.net) hosted in the public DNS. Keeping this short list of records synchronized isn't going to be difficult or time-consuming.

We'll use Berkeley Internet Name Domain (BIND) to handle our internal DNS needs.


Note  Be sure that the BIND software you install is resistant to the DNS protocol flaw made public in July 2008. Also, if your DNS servers are behind NAT, make sure your NAT device doesn't defeat the port randomization that works around the flaw. For more information, see the CERT advisory here: http://www.kb.cert.org/vuls/id/800113.


BIND Configuration

We'll use the etchlamp system that was installed via FAI as our internal DNS server. Once it's working there, we can easily deploy a second system just like it using FAI and cfengine.

First, we need to install the bind9 package, as well as add it to the set of packages that FAI installs on the WEB class.

In order to install the bind9 package without having to reinstall using FAI, run this command as the root user on the system etchlamp:

# apt-get update && apt-get install bind9

The bind9 package depends on several other packages, but apt-get will resolve the dependencies and install everything required. Because FAI uses apt-get, it will work the same way, so we can just add the line "bind9" to the file /srv/fai/config/package_config/WEB on our FAI host goldmaster. This will ensure that the preceding manual step never needs to be performed when the host is reimaged.

We'll continue setting up etchlamp manually to ensure that we know the exact steps to configure an internal DNS server. Once we're done, we'll automate the process using cfengine. Note that the bind9 package creates a user account named "bind." Add the lines from your passwd, shadow, and group files to your standardized Debian account files in cfengine. We'll also have to set up file-permission enforcement using cfengine. The BIND installation process might pick different user ID (UID) or group ID (GID) settings from the ones we'll copy out using cfengine.
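
A quick way to capture those lines on etchlamp so you can paste them into the standardized account files (reading the shadow entry requires root):

# getent passwd bind
# getent group bind
# grep '^bind:' /etc/shadow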

The Debian bind9 package stores its configuration in the /etc/bind directory. The package maintainer set things up in a flexible manner, where the installation already has the standard and required entries in /etc/bind/named.conf, and the configuration files use an include directive to read two additional files meant for site-specific settings:

  • /etc/bind/named.conf.options: You use this file to configure the options section of named.conf. The options section is used to configure settings such as the name server's working directory, recursion settings, authentication-key options, and more. See the relevant section of the BIND 9 Administrator's Reference Manual for more information: http://www.isc.org/sw/bind/arm95/Bv9ARM.ch06.html#options.
  • /etc/bind/named.conf.local: This file is meant to list the local zones that this BIND instance will load and serve to clients. These can be zone files on local disk, zones slaved from another DNS server, forward zones, or stub zones. We're simply going to load local zones, making this server the "master" for the zones in question.

The existence of these files means that we don't need to develop the configuration files for the standard zones needed on a BIND server; we need only to synchronize site-specific zones. Here is the named.conf.options file as distributed by Debian:

options {
        directory "/var/cache/bind";

        // If there is a firewall between you and nameservers you want
        // to talk to, you might need to uncomment the query-source
        // directive below.  Previous versions of BIND always asked
        // questions using port 53, but BIND 8.1 and later use an unprivileged
        // port by default.

        // query-source address * port 53;

        // If your ISP provided one or more IP addresses for stable
        // nameservers, you probably want to use them as forwarders.
        // Uncomment the following block, and insert the addresses replacing
        // the all-0's placeholder.

        // forwarders {
        //      0.0.0.0;
        // };

        auth-nxdomain no;    # conform to RFC1035
        listen-on-v6 { any; };
};

The only modification we'll make to this file is to change the listen-on-v6 line to this:

listen-on-v6 { none; };

Because we don't intend to utilize IPv6, we won't have BIND utilize it either.

The default Debian /etc/bind/named.conf.local file has these contents:

//
// Do any local configuration here
//

// Consider adding the 1918 zones here, if they are not used in your
// organization
//include "/etc/bind/zones.rfc1918";

Note the zones.rfc1918 file. It is a list of "private" IP address ranges specified in RFC1918. The file has these contents:

zone "10.in-addr.arpa"      { type master; file "/etc/bind/db.empty"; };

zone "16.172.in-addr.arpa"  { type master; file "/etc/bind/db.empty"; };
zone "17.172.in-addr.arpa"  { type master; file "/etc/bind/db.empty"; };
zone "18.172.in-addr.arpa"  { type master; file "/etc/bind/db.empty"; };
zone "19.172.in-addr.arpa"  { type master; file "/etc/bind/db.empty"; };
zone "20.172.in-addr.arpa"  { type master; file "/etc/bind/db.empty"; };
zone "21.172.in-addr.arpa"  { type master; file "/etc/bind/db.empty"; };
zone "22.172.in-addr.arpa"  { type master; file "/etc/bind/db.empty"; };
zone "23.172.in-addr.arpa"  { type master; file "/etc/bind/db.empty"; };
zone "24.172.in-addr.arpa"  { type master; file "/etc/bind/db.empty"; };
zone "25.172.in-addr.arpa"  { type master; file "/etc/bind/db.empty"; };
zone "26.172.in-addr.arpa"  { type master; file "/etc/bind/db.empty"; };
zone "27.172.in-addr.arpa"  { type master; file "/etc/bind/db.empty"; };
zone "28.172.in-addr.arpa"  { type master; file "/etc/bind/db.empty"; };
zone "29.172.in-addr.arpa"  { type master; file "/etc/bind/db.empty"; };
zone "30.172.in-addr.arpa"  { type master; file "/etc/bind/db.empty"; };
zone "31.172.in-addr.arpa"  { type master; file "/etc/bind/db.empty"; };

zone "168.192.in-addr.arpa" { type master; file "/etc/bind/db.empty"; };

It is a good idea to include this configuration file, with an important caveat we'll cover later. When you use this file, the db.empty zone file is loaded for all the RFC1918 address ranges. And because those are valid zone files with no entries for individual reverse DNS records (i.e., PTR records), the DNS traffic for those lookups won't go out to the public DNS. A "host not found" response will be returned to applications looking up the PTR records for IPs in those ranges. Those IP ranges are intended only for private use, so the DNS traffic for these networks should stay on private networks. Most sites utilize those ranges, so the public DNS doesn't have a set of delegated servers that serves meaningful information for these zones.

The caveat mentioned earlier is that we will not want to serve the db.empty file for the 192.168.x.x range that we use at our site. This means we'll delete this line from zones.rfc1918:

zone "168.192.in-addr.arpa" { type master; file "/etc/bind/db.empty"; };

Then we'll uncomment this line in /etc/bind/named.conf.local by deleting the two slashes at the start of the line:

//include "/etc/bind/zones.rfc1918";

Next, you'll need to create the campin.net and 168.192.in-addr.arpa zone files. The file /etc/bind/db.campin.net has these contents:

$TTL 600
@               IN      SOA     etchlamp.campin.net. hostmaster.campin.net. (
                                2008072900 ; serial
                                1800       ; refresh (30 minutes)
                                600        ; retry (10 minutes)
                                2419200    ; expire (4 weeks)
                                600        ; minimum (10 minutes)
                                )

                IN      NS      etchlamp.campin.net.

; the A record for campin.net
        600     IN      A       66.219.68.159

etchlamp        IN      A       192.168.1.239
aurora          IN      A       192.168.1.248
goldmaster      IN      A       192.168.1.249
rhmaster        IN      A       192.168.1.251
rhlamp          IN      A       192.168.1.236
hemingway       IN      A       192.168.1.237

; www.campin.net is a CNAME back to the A record for campin.net
www     600     IN      CNAME   @

skitzo  86400   IN      A       64.81.57.165
scampi  86400   IN      A       66.219.68.159

; give the default gateway an easy to remember name
gw              IN      A       192.168.1.1

We created entries for our six hosts, our local gateway address, and some records from our public zone.

Next, you need to create the "reverse" zone, in the file /etc/bind/db.192.168:

$TTL    600
@               IN      SOA     etchlamp.campin.net. hostmaster.campin.net. (
                                2008072900 ; serial
                                1800       ; refresh (30 minutes)
                                600        ; retry (10 minutes)
                                2419200    ; expire (4 weeks)
                                600        ; minimum (10 minutes)
                                )

@       IN      NS      etchlamp.campin.net.

$ORIGIN 1.168.192.in-addr.arpa.

1       IN      PTR     gw.campin.net.

236     IN      PTR     rhlamp.campin.net.
237     IN      PTR     hemingway.campin.net.
239     IN      PTR     etchlamp.campin.net.
248     IN      PTR     aurora.campin.net.
249     IN      PTR     goldmaster.campin.net.
251     IN      PTR     rhmaster.campin.net.

The $ORIGIN keyword sets all the following records relative to the 192.168.1.0/24 subnet's in-addr.arpa reverse DNS range, which makes the records simpler to type in. Be sure to terminate the names on the right-hand side of all your records with a dot (period character) when you specify the fully qualified domain name.
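
Without the $ORIGIN line, each record would need a fully qualified owner name; the rhlamp entry, for example, would have to be written like this:

236.1.168.192.in-addr.arpa.     IN      PTR     rhlamp.campin.net.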

Next, populate the file /etc/bind/named.conf.local with these contents, to utilize our new zone files:

include "/etc/bind/zones.rfc1918";

zone "campin.net" {
        type master;
        file "/etc/bind/db.campin.net";
};

zone "168.192.in-addr.arpa" {
        type master;
        file "/etc/bind/db.192.168";
};
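
Before restarting, it's worth running the syntax checkers that ship with BIND over the configuration and the new zone files (named-checkconf and named-checkzone; on some releases they live in a separate utilities package). They print nothing, or a short summary, when everything parses cleanly:

# named-checkconf /etc/bind/named.conf
# named-checkzone campin.net /etc/bind/db.campin.net
# named-checkzone 168.192.in-addr.arpa /etc/bind/db.192.168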

Restart BIND using the included init script:

# /etc/init.d/bind9 restart

Look for errors from the init script, as well as in the /var/log/daemon.log log file. If the init script successfully loaded the zones, you'll see lines like this in the log file:

Jul 29 17:43:30 etchlamp named[2580]: zone 168.192.in-addr.arpa/IN: loaded serial
2008072900
Jul 29 17:43:30 etchlamp named[2580]: zone campin.net/IN: loaded serial 2008072900
Jul 29 17:43:30 etchlamp named[2580]: running

Test resolution from another host on the local subnet using the dig command:

$ dig @etchlamp gw.campin.net.

; <<>> DiG 9.3.4-P1.1 <<>> @etchlamp gw.campin.net.
; (1 server found)
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45274
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1

;; QUESTION SECTION:
;gw.campin.net.                 IN      A

;; ANSWER SECTION:
gw.campin.net.          600     IN      A       192.168.1.1

;; AUTHORITY SECTION:
campin.net.             600     IN      NS      etchlamp.campin.net.

;; ADDITIONAL SECTION:
etchlamp.campin.net.    600     IN      A       192.168.1.239

;; Query time: 19 msec
;; SERVER: 192.168.1.239#53(192.168.1.239)
;; WHEN: Tue Jul 29 17:45:49 2008
;; MSG SIZE  rcvd: 86

This query returns the correct results. In addition, the flags section of the response has the aa bit set, meaning that the remote server considers itself authoritative for the records it returns. Do the same thing again, but this time query for a reverse record:

$ dig @etchlamp -x 192.168.1.1 ptr

; <<>> DiG 9.3.4-P1.1 <<>> @etchlamp -x 192.168.1.1 ptr
; (1 server found)
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 47489
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1

;; QUESTION SECTION:
;1.1.168.192.in-addr.arpa.      IN      PTR

;; ANSWER SECTION:
1.1.168.192.in-addr.arpa. 600   IN      PTR     gw.campin.net.

;; AUTHORITY SECTION:
168.192.in-addr.arpa.   600     IN      NS      etchlamp.campin.net.

;; ADDITIONAL SECTION:
etchlamp.campin.net.    600     IN      A       192.168.1.239

;; Query time: 2 msec
;; SERVER: 192.168.1.239#53(192.168.1.239)
;; WHEN: Tue Jul 29 17:46:11 2008
;; MSG SIZE  rcvd: 108

Again, we have successful results. We had to modify only three included files (zones.rfc1918, named.conf.local, and named.conf.options), and create two new ones (db.campin.net and db.192.168). Now we know the file locations and file contents that we need in order to host our private DNS on a Debian system running BIND.

Automating the BIND Configuration

We'll create a cfengine task to distribute our BIND configuration, and as usual it will restart the BIND daemon when the configuration files are updated.

Here are the steps to automate this process:

  1. Copy the BIND configuration files and zone files (that we created during the development process on etchlamp) to the cfengine master.
  2. Create a cfengine task that copies the BIND configuration files and zones, and restarts the BIND daemon when the files are copied.
  3. Define a new "DNS server" role in cfengine using a class.
  4. Create a new hostgroup file for this new server role in cfengine.
  5. Import the new task into the new DNS server hostgroup file in cfengine.
  6. Import the new hostgroup file into cfagent.conf, so that the hostgroup and task are used.
  7. Test out the entire automation process for the DNS server role by reimaging the DNS server host.

The first step is to get our files from etchlamp onto the cfengine master, in the correct location. Create the directory on goldmaster:

# mkdir -p /var/lib/cfengine2/masterfiles/PROD/repl/root/etc/bind/debian-ext

Now copy those five files from etchlamp to the new directory on goldmaster:

# pwd
/etc/bind
# scp zones.rfc1918 named.conf.local db.campin.net db.192.168 named.conf.options \
 goldmaster:/var/lib/cfengine2/masterfiles/PROD/repl/root/etc/bind/debian-ext/

Name the task PROD/inputs/tasks/app/bind/cf.debian_external_cache and start the task with these contents:

groups:
        have_etc_rndc_key       = ( FileExists(/etc/bind/rndc.key) )

Later in this task we'll perform permission fixes on the rndc.key file, but we like to make sure it's actually there before we do it.

We'll continue explaining the cf.debian_external_cache task. In the control section we tell cfengine about some classes that we dynamically define, and put in an entry for DefaultPkgMgr:

control:
        any::
                addinstallable          = ( bind_installed bind_not_installed reload_bind )

        debian::
                DefaultPkgMgr           = ( dpkg )

which is required when we use the packages action:

packages:
        debian::
                bind9
                        version=9.3.4
                        cmp=ge
                        define=bind_installed
                        elsedefine=bind_not_installed

We use the packages action simply to detect whether the bind9 package is installed, and we go with the version installed by Debian 4.0 ("Etch") as the minimum installed version. Assumptions will only lead to errors, so we double-check even basic assumptions such as whether BIND has been installed on the system at all.

Here we use the processes action to start up BIND when it is missing from the process list, but only if it's one of our external caches, and only if the bind9 package is installed:

processes:
        debian.bind_installed::
                "named" restart "/etc/init.d/bind9 start" inform=false umask=022

There's no point in even trying to start BIND if it isn't installed.

Here we copy the five files we placed into the debian-ext directory to the host's /etc/bind directory:

copy:
        debian.bind_installed::
                $(master_etc)/bind/debian-ext/
                                dest=/etc/bind/
                                r=inf
                                mode=644
                                type=checksum
                                purge=false
                                server=$(fileserver)
                                encrypt=true
                                owner=root
                                group=root
                                define=reload_bind

We carefully named the source directory debian-ext because we might end up deploying BIND to our Debian hosts later in some other configuration. Having a complete source directory to copy makes the copy stanza simpler. We know that only the files we want to overwrite are in the source directory on the cfengine master—so be careful not to add files into the source that you don't want automatically copied out. You also have to be careful not to purge during your copy, or you'll lose all the default Debian bind9 configuration files you depend on.

This shellcommands section uses the reload_bind class to trigger a reload of the BIND daemon:

shellcommands:
        debian.reload_bind::
                # when the config is updated, reload bind
                "/etc/init.d/bind9 reload" timeout=30

The reload_bind class is defined when files are copied from the master, via the define= line.

These file and directory settings fix the important BIND files and directory permissions in the unlikely event that the bind user's UID and GID change:

files:
        debian.bind_installed.have_etc_rndc_key::
                /etc/bind/rndc.key owner=bind group=bind m=640 action=fixall
                                               inform=true syslog=on

directories:
        debian.bind_installed::
                /var/cache/bind mode=775 owner=root group=bind inform=true syslog=on
                /etc/bind mode=2755 owner=root group=bind inform=true syslog=on

Such an event could happen when we later synchronize all the user accounts across our site, and these stanzas let the system recover properly from a bind-user UID/GID change. Next, set up an alerts section to issue a warning when a host is designated as a caching DNS server but doesn't actually have the bind9 package installed:

alerts:
        debian.!bind_installed::
                "Error: I am an external cache but I don't have bind9 installed."

We use the packages action in this task, so we need to add packages to the actionsequence in the control/cf.control_cfagent_conf file for cfengine to run it:

actionsequence = (
    directories
    disable
    packages
    copy
    editfiles
    links
    files
    processes
    shellcommands
)

Now we need to add the task to a hostgroup file, but it certainly isn't a good fit for the cf.any hostgroup. Create a new hostgroup file for the task and place it at PROD/inputs/hostgroups/cf.external_dns_cache. That name was chosen carefully; we won't assume that all our caching DNS servers will be running Debian, or even BIND for that matter. The role is to serve DNS to our network, and the hostgroup name is clear about that. The contents of this new hostgroup file are:

import:
        any::
                tasks/app/bind/cf.debian_external_cache

Now we need to define a class for the hosts that serve this role. We'll edit PROD/inputs/classes/cf.main_classes and add this line:

caching_dns_servers     = (     etchlamp )

Then we'll edit cfagent.conf and add an import for the new hostgroup file for the caching_dns_servers class:

caching_dns_servers::           hostgroups/cf.external_dns_cache

Wait! If you were to run cfagent -qv on etchlamp at this point, the file PROD/inputs/hostgroups/cf.external_dns_cache would not be imported, even though cfagent's "Defined Classes" output shows that the caching_dns_servers class is set. The catch is that cfagent resolves the import lines in cfagent.conf while it parses that file, before class definitions made in the files it imports (such as cf.main_classes) are available to qualify further imports at that level. Most people learn this important lesson the hard way, and we wanted you to learn it the hard way as well, so it will be more likely to stick.

To reorganize in a way that will work with cfengine's issues around imports but preserve our hostgroup system, delete these two lines from cfagent.conf:

any::            hostgroups/cf.any
caching_dns_servers::    hostgroups/cf.external_dns_cache

Place those lines in a new file, hostgroups/cf.hostgroup_mappings, with these contents:

import:
         any::                               hostgroups/cf.any
         caching_dns_servers::           hostgroups/cf.external_dns_cache

Remember that any lines added below the cf.external_dns_cache import will apply only to the caching_dns_servers class, unless a new class is specified. That is a common error made by inexperienced cfengine-configuration authors, and often even experienced ones.
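
To make the pitfall concrete, here's a hypothetical extension of that file (the web_servers class and cf.web_servers hostgroup don't exist at our site; they're invented for illustration):

import:
         any::                           hostgroups/cf.any
         caching_dns_servers::           hostgroups/cf.external_dns_cache
         # WRONG: with no class selector of its own, the next import would
         # still be scoped to caching_dns_servers
         #                               hostgroups/cf.web_servers
         # RIGHT: restate the class you actually intend
         web_servers::                   hostgroups/cf.web_servers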

We need to add the cf.hostgroup_mappings file to cfagent.conf, by adding this line at the end:

hostgroups/cf.hostgroup_mappings

We don't need to specify the any:: class before this line; when no class is given, cfengine assumes any::, for imports as for every other action.

Now we should validate that our hostgroup is imported properly by running cfagent -qv on etchlamp. Look for this line in the output:

Looking for an input file tasks/app/bind/cf.debian_external_cache

Success! All future hostgroup imports will happen from the cf.hostgroup_mappings file. We'll mention one last thing while on the subject of imports: we don't do any imports in our task files. Any file containing actions other than import should not use the import action at all. You can get away with mixing them if you're careful, but we'll avoid it like the plague.

Remember that every host that ever matches the caching_dns_servers class will import the cf.external_dns_cache hostgroup file, and therefore will also import the cf.debian_external_cache task. If a Solaris host is specified as a member of the caching_dns_servers class, it will not do anything unintended when it reads the cf.debian_external_cache task. This is because we specify the debian class for safety in the class settings for all our actions. You could further protect non-Debian hosts by importing the task only for Debian hosts from the hostgroups/cf.external_dns_cache file:

import:
        debian::
                 tasks/app/bind/cf.debian_external_cache

Importing the task this way is safer, but even if you do, you should make sure that your cfengine configuration files perform actions only on the hosts you intend. Always be defensive with your configurations, and you'll avoid unintended changes. Up until this point, we have purposely made our task files safe to run on any operating system and hardware architecture by limiting the cases when an action will actually trigger, and we will continue to do so.

Now it's time to reimage etchlamp via FAI, and make sure that the DNS service is fully configured and working when we set up etchlamp from scratch. Always ensure that your automation system works from start to finish. The etchlamp host's minimal install and configuration work will take under an hour, so the time and effort are well worth it.

While etchlamp is reimaging, remove the old installation's cfengine public key on the cfengine master because the reimaging process will generate a new key. The host etchlamp has the IP 192.168.1.239, so run this command on goldmaster as the root user:

# rm /var/lib/cfengine2/ppkeys/root-192.168.1.239.pub

When etchlamp reboots after installation, the cfengine daemons don't start up because we have only the bootstrap update.conf and cfagent.conf files in /var/lib/cfengine2/inputs. We need to make sure that cfagent runs once upon every reboot. Modify /srv/fai/config/scripts/FAIBASE/50-cfengine on the FAI server to add a line that will run cfagent upon every boot, mainly to help on the first boot after installation:

#! /usr/sbin/cfagent -f

control:
   any::
   actionsequence = ( editfiles )
   EditFileSize = ( 30000 )
editfiles:
   any::
        { ${target}/etc/aliases
          AutoCreate
          AppendIfNoSuchLine "root: [email protected]"
        }

        { ${target}/etc/default/cfengine2
          ReplaceAll "=0$" With "=1"
        }

        { ${target}/etc/init.d/bootmisc.sh
          AppendIfNoSuchLine "/usr/sbin/cfagent -qv"
        }

This configures the cfagent program to run from the /etc/init.d/bootmisc.sh file at boot time. So, to recap: We started another reimage of etchlamp and removed /var/lib/cfengine2/ppkeys/root-192.168.1.239.pub again on the cfengine master while the host was reimaging.

The host etchlamp returned from reimaging fully configured, with cfengine running. Now every time a Debian host boots at our site after FAI installs it, it will run cfagent during boot. Without logging into the host (i.e., without manual intervention), you can run a DNS query against etchlamp successfully:

$ dig @etchlamp gw.campin.net

; <<>> DiG 9.3.4-P1.1 <<>> @etchlamp gw.campin.net
; (1 server found)
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 59779
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1

;; QUESTION SECTION:
;gw.campin.net.                 IN      A

;; ANSWER SECTION:
gw.campin.net.          600     IN      A       192.168.1.1

;; AUTHORITY SECTION:
campin.net.             600     IN      NS      etchlamp.campin.net.
;; ADDITIONAL SECTION:
etchlamp.campin.net.     600     IN     A       192.168.1.239

;; Query time: 1 msec
;; SERVER: 192.168.1.239#53(192.168.1.239)
;; WHEN: Wed Jul 30 00:39:52 2008
;; MSG SIZE rcvd: 86

What we have accomplished here is worth celebrating. If you suffer total system failure on the host etchlamp, you can simply reimage a new host with the same hostname and bring it back onto the network as a DNS server. This is exactly what we want of all hosts at our site. As you deploy web servers, NFS servers, and other system roles, you should test that the host can be reimaged and properly configured to serve its designated function again without any human intervention. The extent of human involvement should be to identify hardware and do any Kickstart/FAI/JumpStart configuration needed to support imaging that piece of hardware.

We have a private DNS server now, and although it's the only one, we'll configure the /etc/resolv.conf files across all our hosts to utilize the new DNS server before any other DNS servers. We'll still list our existing DNS server, 192.168.1.1, as the second nameserver in /etc/resolv.conf in case etchlamp becomes unreachable.

Cfengine has a resolve action that you can use to configure the /etc/resolv.conf file. We'll create a task called tasks/os/cf.resolv_conf and test whether we have resolv.conf in a directory where postfix is chrooted by default on Debian:

classes:
        have_postfix_resolv  = ( FileExists(/var/spool/postfix/etc/resolv.conf) )

Here's something we've never done before—change the actionsequence in a task file:

control:
        any::
                addinstallable          = ( reloadpostfix )
                actionsequence          = ( resolve )
                EmptyResolvConf         = ( true )

The preceding code adds resolve to the actionsequence. We could instead add it to the global actionsequence defined in the control/cf.control_cfagent_conf file that's imported directly from cfagent.conf, but there's really no need. We'll generally add actionsequence items there, but we wanted to demonstrate that we still have some flexibility in our cfengine configurations.

Cfengine writes the IP addresses and the comment to the /etc/resolv.conf file in the order listed here:

resolve:
        any::
                #
                # If EmptyResolvConf is set to true, we'll completely wipe out
                # resolv.conf EVEN if we have no matches in the below classes!
                #
                # When EmptyResolvConf is set, always be sure that you have an
                # any class to catch all hosts with some basic nameserver entries.
                #
                192.168.1.239
                192.168.1.1
                "# resolv.conf edited by cfengine, don't muck with this"

We added the comment so that if any SAs want to change /etc/resolv.conf directly with a text editor, they'll realize that the file is under cfengine control.

We use the local copy to keep postfix name resolution working properly after cfengine updates the /etc/resolv.conf file and to restart postfix when we do the copy:

copy:

        # this is a local copy to keep the chroot'd postfix resolv.conf up to date
        have_postfix_resolv::
                /etc/resolv.conf
                        dest=/var/spool/postfix/etc/resolv.conf
                        mode=644
                        owner=root
                        group=root
                        type=checksum
                        define=reloadpostfix

shellcommands:
        # reload postfix when we update the chroot resolv.conf
        debian.reloadpostfix::
                "/etc/init.d/postfix restart" timeout=30 inform=true

Next, add the task to PROD/inputs/hostgroups/cf.any. Once the task is enabled, we connect to the host aurora and inspect the new /etc/resolv.conf:

# cat /etc/resolv.conf
domain campin.net
nameserver 192.168.1.239
nameserver 192.168.1.1
# resolv.conf edited by cfengine, don't muck with this

Then test name resolution:

# nslookup gw
Server:         192.168.1.239
Address:        192.168.1.239#53

Name:   gw.campin.net
Address: 192.168.1.1

We're done with the DNS for now. When we get more hardware to deploy another Debian-based DNS server system, we'll add it to the caching_dns_servers class, let cfengine set up BIND, then update cf.resolv_conf to add another nameserver entry to all our site's /etc/resolv.conf files.
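
As a sketch of those future changes (the second host's name and IP address are invented here), the class definition in cf.main_classes would become:

caching_dns_servers     = (     etchlamp etchdns2 )

and the resolve section in cf.resolv_conf would grow one entry:

resolve:
        any::
                192.168.1.239
                192.168.1.240
                192.168.1.1
                "# resolv.conf edited by cfengine, don't muck with this"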

Taking Control of User Account Files

We need to take control of the user accounts at our site. Every site eventually needs a centralized mechanism the SA staff can use to create and delete accounts, lock them out after a designated number of failed logins, and log user access. This will usually be a system such as NIS/NIS+, LDAP, or perhaps LDAP combined with Kerberos.

At this point, we're not talking about setting up a network-based authentication system—we're not ready for that yet. First, we need to take control of our local account files: /etc/passwd, /etc/shadow, and /etc/group. Even if we already had LDAP deployed at our site and all our users had accounts only in the LDAP directory, we would need to be able to change the local root account password across all our systems on a regular basis. In addition, we normally change the default shell on many system accounts that come with the system, for added security. Allowing local account files to go unmanaged is a security risk.

Standardizing the Local Account Files

We have three different sets of local account files at our site: those for Red Hat, Solaris, and Debian. We're going to standardize the files for each system type, and synchronize those files to each system from our central cfengine server on a regular basis. Over time, we'll need to add accounts to the standard files to support new software (e.g., a "mysql" user to run the MySQL database software). We will never add them directly onto the client systems; instead, we will add them to the centralized files.

We have only two installed instances of each OS type, so it's easy to copy all the files to a safe location and consolidate them. Because we're copying the shadow files, the location should be a directory with restrictive permissions:

# mkdir -m 700 /root/authfiles
# cd /root/authfiles
# for host in goldmaster rhmaster rhlamp etchlamp hemingway aurora ;
do for file in passwd shadow group ; do [ -d $file ] || mkdir -m 700 $file ;
scp root@${host}:/etc/$file ${file}/${file}.$host ; done ; done

These commands will iterate over all our hosts and copy the three files we need to a per-file subdirectory, with a file name that includes the hostname of the system that the file is from. We will illustrate standardization of account files for our two Solaris hosts only, to keep this section brief. Assume that we will perform the same process for Debian and Red Hat.

Now you can go into each directory and compare the files from the two Solaris hosts:

# cd /root/authfiles/passwd
# diff passwd.aurora passwd.hemingway
12a13,14
> postgres:x:90:90:PostgreSQL Reserved UID:/:/usr/bin/pfksh
> svctag:x:95:12:Service Tag UID:/:

The hemingway host has two accounts that weren't created on aurora. We won't need the postgres account, used to run the freeware Postgres database package. We will keep the svctag account because the Solaris serial port–monitoring facilities use it.

# mv passwd.hemingway passwd.solaris
# rm passwd.aurora

Edit passwd.solaris and remove the line starting with postgres. Now the passwd.solaris file contains the accounts we need on both systems. We will use this as our master Solaris password file.

Go through the same procedure for the Solaris shadow files:

# cd ../shadow
# diff shadow.hemingway shadow.aurora
13,14d12
< postgres:NP:::::::
< svctag:*LK*:6445::::::
# mv shadow.hemingway shadow.solaris
# rm shadow.aurora

Use a text editor to remove the postgres line from shadow.solaris as well.

Here's the procedure for the group file:

# diff group.hemingway group.aurora
17d16
< postgres::90:
20a20
> sasl::100:

We have a postgres group on hemingway that we'll remove, and a sasl group on aurora that we'll keep. SASL is the Simple Authentication and Security Layer, which you use to insert authentication into network protocols. We might end up needing this if we set up authenticated Simple Mail Transfer Protocol (SMTP) or another authenticated network protocol later on.

# mv group.aurora group.solaris
# rm group.hemingway

Now we'll move our new files into the directories we created for these files (back when we originally created our masterfiles directory in Chapter 5).

# scp group/group.solaris \
goldmaster:/var/lib/cfengine2/masterfiles/PROD/repl/root/etc/group/
# scp passwd/passwd.solaris \
goldmaster:/var/lib/cfengine2/masterfiles/PROD/repl/root/etc/passwd/
# scp shadow/shadow.solaris \
goldmaster:/var/lib/cfengine2/masterfiles/PROD/repl/root/etc/shadow/

Now perform the same decision-making process for the Red Hat and Debian account files. When you're done, move them into the proper place in the masterfiles directories as you did for the Solaris account files. You need to be careful during this stage that you don't change the UID or GID of system processes without setting up some remediation steps in cfengine.

Our two Debian systems ended up with different UID and GID numbers for the postfix user and group, as well as for the postdrop group (also used by postfix). We chose to stick with the UID and GID from the goldmaster host, and to add some permission fixes in a cfengine task that will fix the ownership of the installed postfix files and directories.
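
After the new account files are pushed out, a quick way to spot anything still owned by the old numeric IDs, and to let postfix report permission problems itself, is:

# find /var/spool/postfix /var/lib/postfix -nouser -o -nogroup
# postfix check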

Once we've standardized all our files, we have these files on the cfengine master system:

# pwd
/var/lib/cfengine2/masterfiles/PROD/repl/root/etc
# ls passwd/ shadow/ group
group:
./  ../  group.debian  group.redhat  group.solaris

passwd/:
./  ../  passwd.debian  passwd.redhat  passwd.solaris

shadow/:
./  ../  shadow.debian  shadow.redhat  shadow.solaris

Distributing the Files with cfengine

We'll develop a cfengine task to distribute our new master account files. We will add some safety checks into this task because we need to treat these files with the utmost caution.

We'll place the file in a task called cf.account_sync, with these contents:

classes:  # synonym groups:
        safe_to_sync    = (     debian_4_0
                                redhat_s_5_2
                                sunos_5_10
                        )

We create a group to control which classes of systems get the account-file synchronization. These three classes encompass all the systems we're currently running at our site. We do this because we know our account files will work on the UNIX/Linux versions that we're currently running, but we don't know if they will work on older or newer versions. In fact, if you don't know for sure that something will work, you should assume that it won't.

So if you deploy a new type of system at your site, you run the risk that the new system type won't have local account files synchronized by cfengine. We take measures later in this task to detect that situation and alert the site administrators. First, the control section:

control:
        debian::
                passwd_file    = ( "passwd.debian" )
                shadow_file    = ( "shadow.debian" )
                group_file    = ( "group.debian" )

        redhat::
                passwd_file    = ( "passwd.redhat" )
                shadow_file    = ( "shadow.redhat" )
                group_file    = ( "group.redhat" )

        solaris|solarisx86::
                passwd_file    = ( "passwd.solaris" )
                shadow_file    = ( "shadow.solaris" )
                group_file    = ( "group.solaris" )

Here you'll recognize the standardized files we created earlier.

copy:
        safe_to_sync::
                $(master_etc)/passwd/$(passwd_file)
                        dest=/etc/passwd
                        mode=644
                        server=$(fileserver)
                        trustkey=true
                        type=checksum
                        owner=root
                        group=root
                        encrypt=true
                        verify=true
                        size=>512
                $(master_etc)/shadow/$(shadow_file)
                        dest=/etc/shadow
                        mode=400
                        owner=root
                        group=root
                        server=$(fileserver)
                        trustkey=true
                        type=checksum
                        encrypt=true
                        size=>200

                $(master_etc)/group/$(group_file)
                        dest=/etc/group
                        mode=644
                        owner=root
                        group=root
                        server=$(fileserver)
                        trustkey=true
                        type=checksum
                        encrypt=true
                        size=>200

The size keyword in these copy stanzas adds file-size minimums for the passwd, shadow, and group file copies. We use this keyword so we don't copy out empty or erroneously stripped down files. The minimums should be around half the size of the smallest version that we have of that particular file. You might need to adjust the minimums if the files happen to shrink later on. Usually these files grow in size.
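
One way to pick those minimums is to check the current byte counts of the master copies and halve the smallest number for each file type; for example, on goldmaster:

# cd /var/lib/cfengine2/masterfiles/PROD/repl/root/etc
# wc -c passwd/* shadow/* group/*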

Here we define an alert for hosts that don't have local account files to synchronize:

alerts:
        !safe_to_sync::
                "I am not set up to sync my account files, please check on it."

The alerts action simply prints text used to alert the system administrator. The cfexecd daemon will e-mail this output.

Next, put the task into the cf.any hostgroup:

import:
        any::
                tasks/os/cf.motd
                tasks/os/cf.cfengine_cron_entries
                tasks/os/cf.ntp
                tasks/os/cf.account_sync

When cfagent performs a copy, and the repository variable is defined, the version of the file before the copy is backed up to the repository directory. Define repository like this in PROD/inputs/control/cf.control_cfagent_conf:

repository              = ( $(workdir)/backups )

This means you can see the old local account files in the backup directory on each client after the copy. On Debian the directory is /var/lib/cfengine2/backups, and on the rest of our hosts it's /var/cfengine/backups.

If you encounter any problems, compare the previous and new versions of the files, and see if you left out any needed accounts. Be aware that each performed copy overwrites previous backup files in the repository directory. This means you'll want to validate soon after the initial sync. We also saved the original files in the home directory for the root user. It's a good idea to store them for at least a few days in case you need to inspect them again.
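
For example, on most of our hosts you can compare the repository copy with the newly synchronized file like this (cfengine names the backup after the full path with the slashes translated to underscores; list the directory first to confirm the exact name):

# ls /var/cfengine/backups
# diff /var/cfengine/backups/_etc_passwd /etc/passwd

On Debian, substitute /var/lib/cfengine2/backups for the backup directory.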

Our etchlamp system had the postfix account's UID and GID change with this local account sync. The GID of the postdrop group also changed. We can fix that with cfengine, in a task we call cf.postfix_permissions:

classes:  # synonym groups:
        have_var_spool_postfix          = ( IsDir("/var/spool/postfix") )
        have_var_spool_postfix_public   = ( IsDir("/var/spool/postfix/public") )
        have_var_spool_postfix_maildrop = ( IsDir("/var/spool/postfix/maildrop") )
        have_usr_sbin_postdrop          = ( FileExists("/usr/sbin/postdrop") )
        have_usr_sbin_postqueue         = ( FileExists("/usr/sbin/postqueue") )

Here we have some classes based on whether files or directories are present on the system. We don't want to assume that postfix is installed on the system. We previously added postfix into the list of FAI base packages, but we can't guarantee with absolute certainty that every Debian system we ever manage will be running postfix.

We could use a more sophisticated test, such as verifying that the postfix Debian package is installed, but a simple directory test suffices and happens quickly:

directories:
        debian.have_var_spool_postfix_public::
                /var/spool/postfix/public   mode=2710 owner=postfix group=postdrop inform=true

        debian.have_var_spool_postfix_maildrop::
                /var/spool/postfix/maildrop mode=1730 owner=postfix group=postdrop inform=true

        debian.have_var_spool_postfix::
                /var/spool/postfix/active   mode=700 owner=postfix group=root inform=true
                /var/spool/postfix/bounce   mode=700 owner=postfix group=root inform=true
                /var/spool/postfix/corrupt  mode=700 owner=postfix group=root inform=true
                /var/spool/postfix/defer    mode=700 owner=postfix group=root inform=true
                /var/spool/postfix/deferred mode=700 owner=postfix group=root inform=true
                /var/spool/postfix/flush    mode=700 owner=postfix group=root inform=true
                /var/spool/postfix/hold     mode=700 owner=postfix group=root inform=true
                /var/spool/postfix/incoming mode=700 owner=postfix group=root inform=true
                /var/spool/postfix/private  mode=700 owner=postfix group=root inform=true
                /var/spool/postfix/trace    mode=700 owner=postfix group=root inform=true

Here we make sure that all the postfix spool directories have the correct ownership and permissions. If you blindly create the directories without verifying that /var/spool/postfix is already there, it'll appear as if postfix is installed when it isn't. This might seem like a minor detail, but the life of an SA comprises a large collection of minor details such as this. Creating confusing situations such as unused postfix spool directories is just plain sloppy, and you should avoid doing so.
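
If you later decide you do want the more thorough package test mentioned earlier, a shell check along these lines would work on Debian (a sketch using the stock dpkg tooling; inside a cfengine task you could wrap the same test in a ReturnsZero() class definition):

# treat postfix as installed only if dpkg reports package state "ii"
if dpkg -l postfix 2>/dev/null | grep -q '^ii'
then
        echo "postfix package is installed"
else
        echo "postfix package is not installed"
fi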

Here we ensure that two important postfix binaries have the SetGID bit set, as well as proper ownership:

files:
        debian.have_usr_sbin_postqueue::
                /usr/sbin/postqueue m=2555 owner=root group=postdrop
                        action=fixall inform=true

        debian.have_usr_sbin_postdrop::
                /usr/sbin/postdrop  m=2555 owner=root group=postdrop
                        action=fixall inform=true

At any time you can validate that postfix has the proper permissions by executing this line:

# postfix check

You'll also want to restart any daemons that had their process-owner UID change after you fixed file and directory permissions.
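
On our Debian hosts, that means bouncing postfix once the ownership fixes are in place; a manual restart is enough for now:

# /etc/init.d/postfix restart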

Now we'll put the task into the cf.any hostgroup:

import:
        any::
                tasks/os/cf.motd
                tasks/os/cf.cfengine_cron_entries
                tasks/os/cf.ntp
                tasks/os/cf.account_sync
                tasks/os/cf.postfix_permissions

You're probably wondering why we put the cf.postfix_permissions task into the cf.any hostgroup, when it performs actions only on Debian hosts. We did this because we might end up having to set postfix permissions on other platforms later. The task does nothing on host types for which it's not intended, so you face little risk of damage.

From this point on, when you install new packages at your site that require additional local system accounts, manually install on one host (of each platform) as a test. When you (or the package) find the next available UID and GID for the account, you can add the account settings into your master passwd, shadow, and group files for synchronization to the rest of your hosts. That way, when you deploy the package to all hosts via cfengine, the needed account will be in place with the proper UID and GID settings. This is another example of how the first step in automating a procedure is to make manual changes on test systems.

Adding New User Accounts

Now you can add user accounts at your site. We didn't want to add a single user account before we had a mechanism to standardize UIDs across the site. The last thing we need is to deploy LDAP or a similar directory service later on, only to discover that each user account has a different UID on different systems. We have avoided that mess entirely.

At this point, you can simply add users into the centralized account files stored on the cfengine master. New users won't automatically have a home directory created, but later in the chapter we'll address that issue using a custom adduser script, an NFS-mounted home directory, and the automounter.

Using Scripts to Create User Accounts

You shouldn't create user accounts by hand-editing the centralized passwd, shadow, and group files at your site. Instead, we'll create a simple shell script that chooses the next available UID and GID, prompts for a password, and appends the account information to the account files in a safe manner.

We'll make the script simple because we don't intend to use it for long. Before we even write it, we need to consider where we'll put it. We know that it is the first of what will surely be many administrative scripts at our site. When we first created the masterfiles directory structure, we created the directory PROD/repl/admin-scripts/, which we'll put into use now.

We'll copy the contents of this directory to all hosts at our site, at a standard location. We've created a cfengine task to do this, called cf.sync_admin_scripts:

copy:
        any::
                $(master)/repl/admin-scripts
                        dest=/opt/admin-scripts
                        mode=550
                        owner=root
                        group=root
                        type=checksum
                        server=$(fileserver)
                        encrypt=true
                        r=inf
                        purge=true

directories:
        any::
                /opt/admin-scripts  mode=750 owner=root group=root inform=false

We're copying every file in that directory, making sure each is protected from non-root users and executable only for members of the root group. Because we haven't set up special group memberships yet, SA staff will need to become root to execute these scripts—for now, anyway. Remember that our actionsequence specifies that directories runs before copy, so the directory will be properly created before the copy is attempted.

Add this entry to the end of the cf.any hostgroup:

tasks/misc/cf.sync_admin_scripts

You place the task in the misc directory because it's not application-specific and it doesn't affect part of the core operating system. Now you can utilize a collection of administrative scripts that is accessible across the site. You can create the new user script and place it in there. The script itself will have checks to make sure it is running on the appropriate master host.

We call the script add_local_user, and we don't append a file suffix such as .sh. This way, we can rewrite it later in Perl or Python and not worry about a misleading file suffix. UNIX doesn't care about file extensions, and neither should you.

#!/bin/sh
############################################################################
# This script was written to work on Debian Linux, specifically the Debian host
# serving as the cfengine master at our site. Analysis should be done before
# attempting to run elsewhere.
############################################################################
PATH=/sbin:/usr/sbin:/bin:/usr/bin:/opt/admin-scripts

# this is the deepest shared directory for all the
#  passwd/shadow/group files
BASE_PATH=/var/lib/cfengine2/masterfiles/PROD/repl/root/etc
USERNAME_FILE=/var/lib/cfengine2/masterfiles/PROD/repl/root/etc/USERFILE

case `hostname` in
goldmaster*)
        echo "This is the proper host on which to add users, continuing..."
        ;;
*)
        echo "This is NOT the proper host on which to add users, exiting now..."
        exit 1
        ;;
esac

We have only one cfengine master host that has the centralized files, so make sure we're running on the correct host before moving on. We also define a file, which we'll use later, to store usernames for accounts that we create:

cd $BASE_PATH

LOCKFILE=/root/add_user_lock

rm_lock_file() {
        rm -f $LOCKFILE
}

# don't ever run two of these at once
lockfile $LOCKFILE || exit 1

We define a file to use for locking to ensure that we run only one instance of this script at a time. We use methods that should prevent files from getting corrupted, but if two script instances copy an account file at the same time, update it, then copy it back into place, one of those instances will have its update overwritten.
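
The lockfile utility ships with procmail. If it isn't available on your master host, a portable stand-in (shown here only as a sketch, not what our script uses) relies on mkdir, which is atomic on a local filesystem:

LOCKDIR=/root/add_user_lock.d

rm_lock_file() {
        # release the lock by removing the lock directory
        rmdir $LOCKDIR
}

# mkdir fails if the directory already exists, so only one
# instance of the script can hold the lock at a time
mkdir $LOCKDIR || exit 1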

Now collect some important information about the user account:

# We REALLY need to sanity check what we accept here, before blindly
# trusting the values, that's an exercise for the reader.
echo "Please specify a username for your new account, 8 chars or less: "
read USERNAME

echo "Please give the person's full name for your new account: "
read GECOS

stty -echo
echo "Please specify a password for your new account: "
read PASSWORD
stty echo

Later we should add some logic to test that the password meets certain criteria. The eight-character UNIX username limit hasn't applied for years on any systems that we run, but we observe the old limits just to be safe.

Here we generate an encrypted password hash for our shadow files:

ENC_PASS=`echo $PASSWORD | mkpasswd -s`

You can add -H md5 to generate an MD5 hash, which is more secure. We've chosen to use the lowest common denominator here, in case we inherit some old system. Which type of hash you choose is up to you.
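
If you do opt for MD5, the change is just the extra flag (this assumes the mkpasswd variant Debian ships in the whois package):

ENC_PASS=`echo $PASSWORD | mkpasswd -H md5 -s`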

Now create the file containing the next available UID, if it doesn't already exist:

[ -f "$BASE_PATH/NEXTUID" ] || echo 1001 > $BASE_PATH/NEXTUID

Collect the UID and GID to use for the account. Always use the same number for both:

NEXTUID=`cat $BASE_PATH/NEXTUID`

Test that the value inside the NEXTUID file is numerically valid. We would hate to create an account with an invalid UID:

if [ -n "$NEXTUID" -a $NEXTUID -gt 1000 ]
then
        echo "Our next UID appears valid, continuing..."
else
        echo "The $BASE_PATH/NEXTUID file appears to be corrupt, please image
investigate."
        echo "Exiting now..."
        exit 1
fi

Here we set up the formatting of our account-file entries, to be used in the next section:

SEC_SINCE_EPOCH=`date +%s`
DAYS_SINCE_EPOCH=`expr $SEC_SINCE_EPOCH / 86400`
GROUP_FORMAT="$USERNAME:x:$NEXTUID:"
PASSWD_FORMAT="$USERNAME:x:$NEXTUID:$NEXTUID:$GECOS:/home/$USERNAME:/bin/bash"
SHADOW_FORMAT="$USERNAME:$ENC_PASS:$DAYS_SINCE_EPOCH:7:180:14:7::"

If you use this script, you need to set values for the shadow fields that make sense at your site. The meanings are:

1   login name
2   encrypted password
3   days since Jan 1, 1970 that password was last changed
4   days before password may be changed
5   days after which password must be changed
6   days before password is to expire that user is warned
7   days after password expires that account is disabled
8   days since Jan 1, 1970 that account is disabled
9   a reserved field (unused)
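
As an illustration only, a freshly added entry built from SHADOW_FORMAT would look something like this (the username, hash, and day count are made up):

nate:ab0WxYz1234Qc:14245:7:180:14:7::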

The script continues:

for groupfile in group/group*
do
       cp $groupfile ${groupfile}.tmp &&
       echo $GROUP_FORMAT >> ${groupfile}.tmp &&
       mv ${groupfile}.tmp $groupfile ||
       ( echo "Failed to update $groupfile - exiting now." ; rm_lock_file ; exit 1 )
done

for shadowfile in shadow/shadow*
do
       cp $shadowfile ${shadowfile}.tmp &&
       echo $SHADOW_FORMAT >> ${shadowfile}.tmp &&
       mv ${shadowfile}.tmp $shadowfile ||
       ( echo "Failed to update $shadowfile - exiting now." ; rm_lock_file ; image
exit 1 )

done

for passwdfile in passwd/passwd*
do
      cp $passwdfile ${passwdfile}.tmp &&
      echo $PASSWD_FORMAT >> ${passwdfile}.tmp &&
      mv ${passwdfile}.tmp $passwdfile ||
      ( echo "Failed to update $passwdfile - exiting now." ; rm_lock_file ; exit 1 )
done

Update each of the files in the group, shadow, and password directories. Make a copy of the file (i.e., cp $passwdfile ${passwdfile}.tmp), update it (i.e., echo $PASSWD_FORMAT >> ${passwdfile}.tmp), then use the mv command to put it back into place (i.e., mv ${passwdfile}.tmp $passwdfile).

The mv command makes an atomic update when moving files within the same filesystem. This means you face no risk of file corruption from the system losing power or our process getting killed. The command will either move the file into place, or it won't work at all. SAs must make file updates this way. The script will exit with an error if any part of the file-update process fails:

# update the UID file
NEWUID=`expr $NEXTUID + 1`
echo $NEWUID > $BASE_PATH/NEXTUID ||
{ echo "Update of $BASE_PATH/NEXTUID failed, exiting now" ; rm_lock_file ; exit 1 ; }

Update the file used to track the next available UID:

# update a file used to create home dirs on the NFS server
if [ ! -f $USERNAME_FILE ]
then
        touch $USERNAME_FILE
fi

cp $USERNAME_FILE ${USERNAME_FILE}.tmp &&
echo $USERNAME >> ${USERNAME_FILE}.tmp &&
mv ${USERNAME_FILE}.tmp $USERNAME_FILE ||
( echo "failed to update $USERNAME_FILE with this user's account name."
    rm_lock_file ; exit 1 )

We store all new user accounts in a text file on the cfengine master system. We'll write another script (PROD/repl/admin-scripts/setup_home_dirs from the next section) that uses this file to create central home directories. The script ends with a cleanup step:

# if we get here without errors, clean up
rm_lock_file

Put this script in the previously mentioned admin-scripts directory, and run it from there on the goldmaster host when a new account is needed.

We've left one exercise for the reader: the task of removing accounts from the centralized account files. You'll probably want to use the same procedure of editing a temporary file and using mv to put it into place. If you instead update the files in place and the process or system crashes mid-write, truncated or otherwise corrupted files could be copied out during the next scheduled cfengine run. Our size minimums might catch a truncated file, but a corrupted file can still be large enough to pass the check, resulting in a successful copy and major problems.
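
As a starting point, an account-removal script could reuse the same temp-file-and-mv pattern; the following is only a sketch (no locking, no argument validation), not a finished tool:

#!/bin/sh
# remove_local_user (sketch): strip one account from the centralized files
USERNAME=$1
BASE_PATH=/var/lib/cfengine2/masterfiles/PROD/repl/root/etc
cd $BASE_PATH || exit 1

for file in passwd/passwd* shadow/shadow* group/group*
do
        # copy everything except the user's line, then move it into place
        grep -v "^${USERNAME}:" $file > ${file}.tmp &&
        mv ${file}.tmp $file ||
        { echo "Failed to update $file - exiting now." ; exit 1 ; }
done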

NFS-Automounted Home Directories

We installed the host aurora to function as the NFS server for our future web application. We should also configure the host to export user home directories over NFS.

Configuring NFS-Mounted Home Directories

We'll configure the NFS-share export and the individual user's home directory creation with a combination of cfengine configuration and a script that's used by cfengine.

Put this line into PROD/inputs/classes/cf.main_classes:

homedir_server          = (     aurora )

Create the file PROD/inputs/hostgroups/cf.homedir_server with these contents:

import:
        any::
                tasks/app/nfs/cf.central_home_dirs

Create the file PROD/inputs/tasks/app/nfs/cf.central_home_dirs with these contents:

control:
        any::
                addinstallable = ( create_homedirs enable_nfs )

copy:
        homedir_server.(solaris|solarisx86)::
                $(master_etc)/USERFILE
                        dest=/export/home/USERFILE
                        mode=444
                        owner=root
                        group=root
                        type=checksum
                        server=$(fileserver)
                        encrypt=true
                        define=create_homedirs

                $(master_etc)/skel
                        dest=/export/home/skel
                        mode=555
                        owner=root
                        group=root
                        type=checksum
                        server=$(fileserver)
                        encrypt=true
                        r=inf
directories:
        homedir_server.(solaris|solarisx86)::
                /export/home  mode=755 owner=root group=root inform=false

shellcommands:
        homedir_server.create_homedirs.(solaris|solarisx86)::
                "/opt/admin-scripts/setup_home_dirs"
                        timeout=300 inform=true

        homedir_server.enable_nfs.(solaris|solarisx86)::
                "/usr/sbin/svcadm enable network/nfs/server"
                        timeout=60 inform=true

editfiles:

        homedir_server.(solaris|solarisx86)::
                { /etc/dfs/dfstab
                        AppendIfNoSuchLine  "share -F nfs -o rw,anon=0 /export/home"
                        DefineClasses           "enable_nfs"
                }

This should all be pretty familiar by now. The interesting part is that we sync the USERFILE file, and when it is updated we call a script that creates the needed accounts. This is the first NFS share for the host aurora, so we enable the NFS service when the share is added to /etc/dfs/dfstab.

Create a file at PROD/repl/admin-scripts/setup_home_dirs to create the home directories:

#!/bin/sh
# distributed by cfengine, don't edit locally
PATH=/usr/sbin:/usr/bin:/opt/csw/bin

USERFILE=/export/home/USERFILE

for user in `cat $USERFILE`
do
        USERDIR=/export/home/$user
        if [ ! -d $USERDIR ]
        then
                cp -r /export/home/skel $USERDIR
                chmod 750 $USERDIR
                chown -R ${user}:${user} $USERDIR
        fi
done
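
Once cfagent has run on aurora with the new task, you can confirm the export and the new directories by hand; for example, run these on aurora (and showmount -e aurora from any client):

# share
# ls -l /export/home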

Now that the task is done, enable it in the file PROD/inputs/hostgroups/cf.hostgroup_mappings with this entry:

homedir_server::                hostgroups/cf.homedir_server

Our home-directory server is ready for use by the rest of the hosts on the network.

Configuring the Automounter

Sites often utilize the automounter to mount user home directories. Instead of mounting the home NFS share from all client systems, the automounter mounts individual users' home directories on demand. After a period of no access (normally after the user is logged out for a while), the share is unmounted. Automatic share unmounting results in less maintenance, and it doesn't tax the NFS server as much. Note that most automounter packages can mount remote filesystem types other than NFS.

We're missing the autofs package in our base Debian installation. At this point, we add the autofs package to the /srv/fai/config/package_config/FAIBASE list of packages, so that future Debian installations have the required software. The package already exists on our Red Hat and Solaris installations.

The file names for the automounter configuration files vary slightly between Linux and Solaris. We'll create the needed configuration files and put them into our masterfiles repository. We created an autofs directory at PROD/repl/root/etc/autofs when we first set up our file repository in Chapter 5.

The files we'll utilize and configure on Linux are /etc/auto.master and /etc/auto.home. On Solaris, the files are /etc/auto_master and /etc/auto_home. The auto.master and auto_master files map filesystem paths to files that contain the commands to mount a remote share at that path. The auto.home and auto_home files have the actual mount commands.

Our auto.master file contains only a single line; the Solaris auto_master file is the same except that it points at /etc/auto_home:

/home   /etc/auto.home

Our auto.home and auto_home files are identical, and contain only a single line:

*      -nolock,rsize=32767,wsize=32767,proto=tcp,hard,intr,timeo=8,nosuid,retrans=5
  aurora:/export/home/&

Note  The single line in the auto_home and auto.home files is shown as two lines due to publishing line-length limitations. It is important that you create the entry as a single line in your environment. You can download all the code for this book from the Downloads section of the Apress web site at http://www.apress.com.


We have a number of mount options listed, but the important thing to note is that we use a wildcard pattern on the left to match all paths requested under /home. The wildcard makes the file match /home/nate as well as /home/kirk, and look for the same path (either nate or kirk) in the share on aurora, using the ampersand at the end of the line.
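
After cfagent has distributed the maps and restarted the automounter, you can verify the behavior from any client; simply accessing a path under /home should trigger the mount (nate is just an example username):

# ls /home/nate
# mount | grep /home/nate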

Next, we create a task to distribute the files at PROD/inputs/tasks/os/cf.sync_autofs_maps. This task follows what is becoming a common procedure for us, in which we define some variables to hold different file names appropriate for different hosts or operating systems, then synchronize the files, then restart the daemon(s) as appropriate:

control:
        any::
                addinstallable          = (     restartautofs )

                AllowRedefinitionOf     = (
                                                auto_master
                                                auto_home
                                                )

        linux::
                auto_master     = ( "auto.master" )
                auto_home       = ( "auto.home" )
                auto_net        = ( "auto.net" )
                etc_auto_home   = ( "/etc/auto.home" )
                etc_auto_master = ( "/etc/auto.master" )

        (solaris|solarisx86)::
                auto_master     = ( "auto_master" )
                auto_home       = ( "auto_home" )
                auto_net        = ( "auto_net" )
                etc_auto_home   = ( "/etc/auto_home" )
                etc_auto_master = ( "/etc/auto_master" )

copy:
        any::
                $(master_etc)/autofs/$(auto_master)
                        dest=$(etc_auto_master)
                        mode=444
                        owner=root
                        group=root
                        server=$(fileserver)
                        trustkey=true
                        type=checksum
                        encrypt=true
                        define=restartautofs

                $(master_etc)/autofs/$(auto_home)
                        dest=$(etc_auto_home)
                        mode=444
                        owner=root
                        group=root
                        server=$(fileserver)
                        trustkey=true
                        type=checksum
                        encrypt=true
                        define=restartautofs

shellcommands:
        (debian|redhat).restartautofs::
                # when config is updated, restart autofs
                "/etc/init.d/autofs reload"
                        timeout=60 inform=true

        (solaris|solarisx86).restartautofs::
                # when config is updated, restart autofs
                "/usr/sbin/svcadm restart autofs"
                        timeout=180 inform=false

processes:
        debian|redhat::
                "automount" restart "/etc/init.d/autofs start" inform=true

        solaris|solarisx86::
                "automountd" restart "/usr/sbin/svcadm enable autofs ; /usr/sbin/svcadm restart autofs" inform=true

We start the automounter when the process isn't found in the process list. On Solaris, we enable the autofs service when the daemon isn't running, then restart it. We don't know exactly why it isn't running, so enabling the service seems like a logical remedy for one possible cause.

Import this task into PROD/inputs/hostgroups/cf.any to give all your hosts a working automounter configuration.

We now have a system to add users, and we also have a shared home-directory server. This should suffice until you can implement a network-enabled authentication scheme later.

Routing Mail

Mail is the primary message-passing mechanism at UNIX-based sites. You use e-mail to notify users of cron-job output, cfexecd sends cfagent output via e-mail, and many application developers and SAs utilize e-mail to send information directly from applications and scripts.

Mail relays on internal networks route e-mail on behalf of the rest of the hosts on the network, and queue it when remote destinations become unreachable. Centralizing the disk space and CPU resources needed for mail queuing and processing makes sense, and it's simpler to configure a centralized set of mail relays to handle special mail-routing tables and aliases than it is to configure all the mail-transfer agents on every machine at a site.

We'll use our etchlamp Debian host as our site's mail relay. We've built this host entirely using automation, so it's the sensible place to continue to focus infrastructure services.

We add a CNAME for relayhost.campin.net to PROD/repl/root/etc/bind/debian-ext/db.campin.net, and it'll simply go out to etchlamp on the next cfexecd run:

relayhost        IN      CNAME   etchlamp

Be sure to increment the serial number in the zone file.

We run postfix on all our Debian hosts, and we'll stick with postfix as our mail-relay Mail Transfer Agent (MTA). The default postfix configuration on etchlamp needs some modifications from the original file placed in /etc/postfix/main.cf. Modify the file like this:

smtpd_banner = $myhostname ESMTP $mail_name (Debian/GNU)
biff = no

# appending .domain is the MUA's job.
append_dot_mydomain = no

# TLS parameters
smtpd_tls_cert_file=/etc/ssl/certs/ssl-cert-snakeoil.pem
smtpd_tls_key_file=/etc/ssl/private/ssl-cert-snakeoil.key
smtpd_use_tls=yes
smtpd_tls_session_cache_database = btree:${queue_directory}/smtpd_scache
smtp_tls_session_cache_database = btree:${queue_directory}/smtp_scache

myhostname = campin.net
alias_maps = hash:/etc/aliases
alias_database = hash:/etc/aliases
mydestination = campin.net

myorigin = campin.net
mynetworks = 127.0.0.0/8, 192.168.1.0/24
mailbox_command = procmail -a "$EXTENSION"
mailbox_size_limit = 0
recipient_delimiter = +
inet_interfaces = all
virtual_maps = hash:/etc/postfix/virtual

Next, create a file that we'll copy to /etc/postfix/virtual on the mail relay:

campin.net              OK
@campin.net             [email protected]

We use the virtual-domain functionality of postfix to alias the entire campin.net domain to one e-mail address: [email protected]. This ensures that any mail sent to an address in the domain will arrive in the SA team's mailbox (hosted with an e-mail hosting provider). Later, we can use the same virtual table to forward specific e-mail addresses to other destinations, instead of the single catch-all address we're using now.
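
For instance, a specific address could later be routed elsewhere by adding a line above the existing catch-all entry (the addresses here are made up):

oncall@campin.net       pager-gateway@example.com

postfix consults the virtual table for an exact address match before falling back to the @domain catch-all, so specific entries like this take precedence.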

When the source file /etc/postfix/virtual is updated, we need to run this command as root:

# /usr/sbin/postmap /etc/postfix/virtual

This builds a new /etc/postfix/virtual.db file, which is what postfix actually uses. We'll configure cfengine to perform that step for us automatically.

Place the two files in a replication directory on the cfengine master (goldmaster), and also create a new directory under the tasks hierarchy intended for postfix:

# mkdir /var/lib/cfengine2/masterfiles/PROD/repl/root/etc/postfix/
# cp main.cf virtual /var/lib/cfengine2/masterfiles/PROD/repl/root/etc/postfix/
# mkdir /var/lib/cfengine2/masterfiles/PROD/inputs/tasks/app/postfix

First, create a class called relayhost, and place the host etchlamp in it. Place this line in PROD/inputs/classes/cf.main_classes:

relayhost               = (     etchlamp )

Now create the task PROD/inputs/tasks/app/postfix/cf.sync_postfix_config with these contents:

control:
        debian_4_0.relayhost::
                main_cf      = ( "main.cf_debian-relayhost" )
                virtual         = ( "virtual-relayhost" )

copy:
        debian_4_0.relayhost::
                $(master_etc)/postfix/$(main_cf)
                                dest=/etc/postfix/main.cf
                                mode=444
                                owner=root
                                group=root
                                type=checksum
                                server=$(fileserver)
                                encrypt=true
                                # we already have reloadpostfix from
                                # tasks/os/cf.resolve_conf, we are reusing it
                                define=reloadpostfix

                $(master_etc)/postfix/$(virtual)
                                dest=/etc/postfix/virtual
                                mode=444
                                owner=root
                                group=root
                                type=checksum
                                server=$(fileserver)
                                encrypt=true
                                define=rebuild_virtual_map

We define variables for the virtual and main.cf files, and copy them individually. They're set up individually because different actions are required when the files are updated. We are careful to copy the configuration files that we've prepared only to Debian 4.0, using the debian_4_0 class. When Debian 5.0 ("Lenny") is released, we'll have to test our config files against the postfix version that it uses. We might have to develop a new "relayhost" postfix configuration file specifically for Lenny when we upgrade or reimage the "relayhost" system to use the newer Debian version. Once again, we assume that something won't work until we can prove that it will.

Here we use a shellcommands action, triggered by the rebuild_virtual_map class that the preceding copy defines, to rebuild the virtual map whenever the file is updated:

shellcommands:
        rebuild_virtual_map::
                "/usr/sbin/postmap /etc/postfix/virtual ; /usr/sbin/postfix reload "
                        timeout=60 inform=true

Now we need another hostgroup file for the "relayhost" role. We create PROD/inputs/hostgroups/cf.relayhost with these contents:

import:
        any::
                tasks/app/postfix/cf.sync_postfix_config

Then to finish the job, map the new class to the hostgroup file by adding this line to PROD/inputs/hostgroups/cf.hostgroup_mappings:

relayhost::                     hostgroups/cf.relayhost

Now etchlamp is properly set up as our mail-relay host. When our network is larger, we can simply add another Debian 4.0 host to the relayhost class in PROD/inputs/control/cf.main_classes, thus properly configuring it as another mail relay. Then we just update the DNS to have two A records for relayhost.campin.net, so that the load is shared between the two. An additional benefit of having two hosts serving in the "relayhost" system role is that if one host fails, mail will still make it off our end systems.
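
In that scenario, the relayhost entry in db.campin.net changes from the CNAME shown earlier to a pair of A records, roughly like this (the addresses are placeholders for the two relays' real IPs, and the zone serial number must be incremented as usual):

relayhost        IN      A       192.168.1.13
relayhost        IN      A       192.168.1.14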

You have several options to accomplish the task of configuring systems across the site to utilize the mail relay. For example, you can configure Sendmail, qmail, and postfix in a "nullclient" configuration where they blindly forward all mail off the local system. Or you could use the local aliases file to forward mail as well. The method, and automation of that method, is left up to the reader. You should now have a solid understanding of how to use cfengine to automate these configuration changes once you've worked out the procedure on one or more test systems.
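
As one example of the nullclient approach, a main.cf fragment for the non-relay postfix hosts might look roughly like this; it's a sketch based on the standard postfix nullclient recipe, not a configuration we've tested at our site:

# forward everything to the central relay; accept no mail locally
myorigin = campin.net
relayhost = relayhost.campin.net
inet_interfaces = loopback-only
mydestination =
local_transport = error:local delivery is disabled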

Looking Back

In a rather short amount of time, we've gone from having no systems at all to having a basic UNIX/Linux infrastructure up and running. This by itself might not be very interesting, but what is noteworthy is that everything we've done to set up our infrastructure was accomplished using automation.

If our DNS server (and mail-relay) host suffers a hard-drive crash, we will simply replace the drive and reimage the host using FAI and the original hostname. Cfengine will configure a fully functional replacement system automatically, with no intervention required by the SA staff. The benefits of this are obvious:

  • The risk of errors introduced during configuration of the replacement host is reduced to zero (or near zero). Any errors would be the result of further hardware issues.
  • The addition of new hosts to share the load of existing services is equally trivial: you need only add hosts to the role-based classes in cfengine, and cfengine will configure the new host properly for you. From that point, the only steps are to update DNS records or configure applications to use the additional host(s).
  • The difficulty of training new SA staff is reduced. The applications in use at your site, along with the configurations used, are centralized in cfengine. The new SAs can simply read the cfengine and application-configuration files to get a complete picture of how things run at your site.

We now have sufficient core services in place at our site to support customer-facing applications. In the next chapter, we'll take advantage of that fact, and deploy a web site.
