#88 Mirroring a Website

Large, busy websites like Yahoo! operate a number of mirrors, separate servers that are functionally identical to the main site but are running on different hardware. While it's unlikely that you can duplicate all of their fancy setup, the basic mirroring of a website isn't too difficult with a shell script or two.

The first step is to automatically pack up, compress, and transfer a snapshot of the master website to the mirror server. This is easily done with the remotebackup script shown in Script #87, invoked nightly by cron.

Instead of sending the archive to your own mail address, however, send it to a special address named unpacker, then add a sendmail alias in /etc/aliases (or the equivalent in other mail transport agents) that points to the unpacker script given here, which then unpacks and installs the archive:

unpacker:    "|/home/taylor/bin/archive-unpacker"

You'll want to ensure that the script is executable, and to be aware of which applications are in the default PATH used by sendmail. As you debug the script, the /var/log/messages log should reveal whether there are any problems invoking it.
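One way to see what a restricted PATH would hide is a small helper along the following lines. This is only a sketch: the name missing_tools and the sample PATH in the usage comment are assumptions, and the PATH that sendmail really hands to the script varies from system to system.

```shell
#!/bin/sh
# missing_tools - list the commands that cannot be found in a given PATH.
# Hypothetical helper, not part of the unpacker script itself.
# Usage: missing_tools "/bin:/usr/bin" uudecode gunzip tar awk sed mail

missing_tools() {
  searchpath="$1"
  shift
  for cmd in "$@" ; do
    # The subshell keeps the PATH change from leaking into the caller.
    ( PATH="$searchpath" ; command -v "$cmd" > /dev/null 2>&1 ) || echo "$cmd"
  done
}
```

Any name that is printed would need either a full pathname inside the script or an explicit PATH assignment at the top of it.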

The Code

#!/bin/sh

# unpacker - Given an input stream with a uuencoded archive from
# the remotebackup script, unpacks and installs the archive.

temp="/tmp/$(basename $0).$$"
home="${HOME:-/usr/home/taylor}"
mydir="$home/archive"
webhome="/usr/home/taylor/web"
notify="[email protected]"

( cat - > $temp  # shortcut to save stdin to a file

  target="$(grep "^Subject: " $temp | cut -d' ' -f2-)"

  echo $(basename $0): Saved as $temp, with $(wc -l < $temp) lines
  echo "message subject=\"$target\""

  # Move into the temporary unpacking directory...

  if [ ! -d $mydir ] ; then
    echo "Warning: archive dir $mydir not found. Unpacking into $home"
    cd $home
    mydir=$home         # for later use
  else
    cd $mydir
  fi

  # Extract the resultant filename from the uuencoded file...

  fname="$(awk '/^begin / {print $3}' $temp)"

  uudecode $temp

  if [ ! -z "$(echo $target | grep 'Backup archive for')" ] ; then
    # All done. No further unpacking needed.
    echo "Saved archive as $mydir/$fname"
    exit 0
  fi

  # Otherwise, we have a uudecoded file and a target directory

  if [ "$(echo $target|cut -c1)" = "/" -o "$(echo $target|cut -c1-2)" = ".." ]
  then
    echo "Invalid target directory $target. Can't use '/' or '..'"
    exit 0
  fi

  targetdir="$webhome/$target"

  if [ ! -d $targetdir ] ; then
    echo "Invalid target directory $target. Can't find in $webhome"
    exit 0
  fi

  gunzip $fname
  fname="$(echo $fname | sed 's/.tgz$/.tar/g')"

  # Are the tar archive filenames in a valid format?

  if [ ! -z "$(tar tf $fname | grep '^/')" ] ; then
    echo "Can't unpack archive: filenames are absolute."
    exit 0
  fi

  echo ""
  echo "Unpacking archive $fname into $targetdir"

  cd $targetdir
  tar xvf $mydir/$fname | sed 's/^/  /g'

  echo "done!"

) 2>&1 | mail -s "Unpacker output $(date)" $notify

exit 0

How It Works

The first thing to notice about this script is that it is set up to mail its results to the address specified in the notify variable. While you may opt to disable this feature, it's quite helpful to get a confirmation of the receipt and successful unpacking of the archive from the remote server. To disable the email feature, remove the wrapping parentheses (from the initial cat to the end of the script), the line near the end of the script in which the output is fed into the mail program, and the echo invocations throughout the script that output its status.
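The wrapping-parentheses pattern can be tried in isolation. In the sketch below, collector is a hypothetical stand-in for the mail command, and the messages are made up for illustration:

```shell
#!/bin/sh
# Sketch of the "( commands ) 2>&1 | consumer" pattern.
# collector is a hypothetical stand-in for: mail -s "subject" $notify

collector() {
  sed 's/^/mail> /'
}

report() {
  ( echo "Saved as /tmp/example"
    echo "uudecode: something went wrong" >&2   # stderr is captured too
    exit 0                                      # leaves only the subshell
    echo "never printed"
  ) 2>&1 | collector
}
```

Because the parentheses start a subshell, an exit inside them ends only that subshell. Whatever was printed up to that point still flows down the pipe, which is how an early exit in the script can still produce a status message.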

This script handles two types of input. If the subject of the email message contains the string Backup archive for, the uudecoded, but still compressed (with gzip), archive is simply stored in the mydir directory. If the subject instead names a valid subdirectory of the webhome directory, the archive is unpacked into that destination. Any other subject is rejected with an error message.
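The decision made on the subject line can be sketched as two small functions. The names subject_of and classify_subject, and the words store, reject, and unpack, are illustrative assumptions rather than anything the script defines:

```shell
#!/bin/sh
# subject_of - print the Subject: header of a mail message read from stdin.
# Stops at the first blank line, so text in the message body is ignored.
subject_of() {
  awk '/^$/ { exit }
       /^Subject: / { sub(/^Subject: /, ""); print; exit }'
}

# classify_subject - decide what to do with an archive, given the subject.
# Prints "store", "reject", or "unpack".
classify_subject() {
  case "$1" in
    *"Backup archive for"*) echo "store"  ;;   # keep the .tgz as is
    /*|..*)                 echo "reject" ;;   # absolute path or leading ..
    *)                      echo "unpack" ;;   # candidate subdirectory
  esac
}
```

A result of unpack means only that the subject is a candidate. The script still has to confirm that the named directory exists under webhome before anything is extracted.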

One challenge with this script is that the file it works with keeps changing names as the script unwraps and unpacks the archive data. Initially, the email input stream is saved in $temp. When this input is run through uudecode, the extracted file takes the name the archive had before the uuencode program was run on it in Avoiding Disaster with a Remote Archive, Script #87. This script extracts that filename as fname:

fname="$(awk '/^begin / {print $3}' $temp)"
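The awk line relies on the header that uuencode writes, which has the form begin, then a file mode, then a filename. A minimal sketch of the same extraction as a function reading from stdin (the name uu_filename is an assumption):

```shell
#!/bin/sh
# uu_filename - print the filename recorded on the "begin" line of
# uuencoded data read from stdin. Header format: begin <mode> <filename>
uu_filename() {
  awk '/^begin / { print $3; exit }'
}
```

Because only the third field is printed, a filename containing spaces would be cut short, so archive names without spaces are the safer choice.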

Because the tar archive is compressed, $fname is something.tgz. If a valid subdirectory of the main web directory is specified in the subject line of the email, and thus the archive is to be installed, the value of $fname is modified yet again during the process to have a .tar suffix:

fname="$(echo $fname | sed 's/.tgz$/.tar/g')"
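In a sed regular expression an unescaped dot matches any character, so a pattern of the form .tgz$ would also rewrite a name such as webtgz. An alternative, offered here only as a sketch with the assumed name tar_name, is to strip the suffix with the shell's own parameter expansion:

```shell
#!/bin/sh
# tar_name - map a .tgz filename to the .tar name that gunzip leaves behind.
# Names without a literal .tgz suffix are passed through unchanged.
tar_name() {
  case "$1" in
    *.tgz) echo "${1%.tgz}.tar" ;;
    *)     echo "$1" ;;
  esac
}
```

This also avoids starting a sed process for what is a simple suffix swap.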

As a security precaution, unpacker won't unpack a tar archive that contains filenames with absolute paths. A worst case would be /etc/passwd, which you really don't want overwritten because of an email message received! Care must therefore be taken when building the archive on the local system to ensure that all filenames are relative, not absolute. Note that a target directory such as ../../../../etc in the subject line is caught by a separate test, which rejects any target that begins with a slash or with two dots.
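Filenames inside the archive can themselves contain .. components, and the listing test in the script looks only for a leading slash. A stricter filter, sketched here as an assumption rather than as part of the original script, flags both cases (the name unsafe_names is hypothetical):

```shell
#!/bin/sh
# unsafe_names - read filenames from stdin, one per line, and print those
# that are absolute or that contain a ".." path component.
unsafe_names() {
  grep -E '^/|(^|/)\.\.(/|$)'
}

# Possible use before unpacking (fname as in the script above):
#   if [ -n "$(tar tf "$fname" | unsafe_names)" ] ; then
#     echo "Can't unpack archive: unsafe filenames."
#   fi
```

The pattern matches .. only as a whole path component, so an ordinary name such as notes..txt is not flagged.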

Running the Script

Because this script is intended to be run from within the lowest levels of the email system, it has no parameters and no output: All output is sent via email to the address specified as notify.

The Results

The results of this script aren't visible on the command line, but we can look at the email produced when an archive is sent without a target directory specified:

archive-unpacker: Saved as /tmp/unpacker.38198, with 1081 lines
message subject="Backup archive for Wed Sep 17 22:48:11 GMT 2003"
Saved archive as /home/taylor/archive/backup.030918.tgz

When a target directory is specified but does not exist under the web directory, the following error is sent via email:

archive-unpacker: Saved as /tmp/unpacker.48894, with 1081 lines
message subject="mirror"
Invalid target directory mirror. Can't find in /web

And finally, here is the message sent when everything is configured properly and the archive has been received and unpacked:

archive-unpacker: Saved as /tmp/unpacker.49189, with 1081 lines
message subject="mirror"

Unpacking archive backup.030918.tar into /web/mirror
  ourecopass/
  ourecopass/index.html
  ourecopass/nq-map.gif
  ourecopass/nq-map.jpg
  ourecopass/contact.html
  ourecopass/mailform.cgi
  ourecopass/cgi-lib.pl
  ourecopass/lists.html
  ourecopass/joinlist.cgi
  ourecopass/thanks.html
  ourecopass/thanks-join.html
done!

Sure enough, if we peek in the /web/mirror directory, everything is created as we hoped:

$ ls -Rs /web/mirror
total 1
1 ourecopass/

/web/mirror/ourecopass:
total 62
 4 cgi-lib.pl            2 lists.html            2 thanks-join.html
 2 contact.html          2 mailform.cgi*         1 thanks.html
 2 index.html           20 nq-map.gif
 2 joinlist.cgi*        26 nq-map.jpg
