Hour 16 Managing Your Log Files


One of the most important tasks for a system administrator is to keep an eye on system activity and disk space. Earlier we spent some time exploring df, du, and find, and learned how they can be used to better understand the allocation of disk space on your system.

In this hour, we’ll delve into the specific log files used by the different Unix services, learn how to analyze them to flag potential problems, and see how to keep them from overrunning your disk.

In this hour you learn about

• Exploring the contents of log files

• Tracking hackers through log events

• Cracking open the httpd log file

• Trimming log files with logrotate

• Building your own log rotation tool

Understanding Log Files

Whether it’s the boot process itself, the FTP server, the mail server, or any other service running on your computer, if it does something or someone interacts with it, the event is logged. Some services have their own log files, but many use a shared central log mechanism called syslog. syslog writes all messages to a log file typically called either messages or syslog.

Unix flavors vary on where they put log files, but generally you’ll find them all in /var/log by default.

Task 16.1: Rummaging Around in Log Files

Different systems have a different number of log files, based both on how many services are running and how the logging for each service is configured.

1. First, a peek at the /var/log directory:

[Listing of the /var/log directory not shown.]

Notice that many files have version number suffixes. As you’ll learn later in this hour, this indicates that they’re being rotated: the higher the number, the older the file. The file without a numeric suffix is the current log file.
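To make the numbering concrete, here’s a minimal sketch of a manual rotation, which is essentially what a rotation tool does under the hood. The /tmp/logdemo directory and the file contents are invented purely for illustration:

```shell
# Simplified manual rotation: shift each old copy up one number, then
# move the live file into the .1 slot and recreate an empty current log.
# The /tmp/logdemo directory and its contents are purely illustrative.
mkdir -p /tmp/logdemo
echo "old entry 2" > /tmp/logdemo/messages.2
echo "old entry 1" > /tmp/logdemo/messages.1
echo "live entry"  > /tmp/logdemo/messages

mv /tmp/logdemo/messages.2 /tmp/logdemo/messages.3  # oldest gets the highest number
mv /tmp/logdemo/messages.1 /tmp/logdemo/messages.2
mv /tmp/logdemo/messages   /tmp/logdemo/messages.1
: > /tmp/logdemo/messages                           # fresh, empty current log
```

After the shuffle, messages is empty and ready for new entries, messages.1 holds what was just live, and the oldest data sits in messages.3.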

2. The most basic log file is syslog’s file, and on a Linux box, it’s called /var/log/messages.

    # grep "^Mar 11" messages
    Mar 11 00:13:23 staging su(pam_unix)[12835]: session closed for user root
    Mar 11 00:14:14 staging sshd(pam_unix)[12796]: session closed for user taylor
    Mar 11 04:02:18 staging syslogd 1.4.1: restart.
    Mar 11 11:19:32 staging sshd(pam_unix)[14368]: session opened for user taylor
      by (uid=0)
    Mar 11 14:09:06 staging su(pam_unix)[14539]: session opened for user root by
      taylor(uid=502)
    Mar 11 16:02:05 staging sshd(pam_unix)[14716]: session opened for user taylor
      by (uid=0)
    Mar 11 16:28:10 staging sshd(pam_unix)[14716]: session closed for user taylor
    Mar 11 16:28:19 staging su(pam_unix)[14539]: session closed for user root
    Mar 11 16:28:21 staging sshd(pam_unix)[14368]: session closed for user taylor

This is a list of all logged messages for March 11, 2002. Not too exciting, but you can see that su logs events and that the sshd (ssh daemon) logs when users connect and disconnect.

Looking at the contents of this file, we can ascertain that user taylor was logged in until 00:14, when the sshd session ended, and that there was also an su session alive (as root) when the clock rolled over to March 11. At 4:02 a.m., syslogd restarted, and user taylor logged in twice that day, once at 11:19 a.m. and a second time at 4:02 p.m. (16:02). During that time, taylor switched to root at 2:09 p.m. (14:09) and left a root shell running until 4:28 p.m. (16:28).

A quick look in earlier log files by searching for Mar 10 will identify the initial su, to see if that was also from taylor:

    # grep "^Mar 10" messages.1 | grep 'su'
    Mar 10 23:39:49 staging su(pam_unix)[12835]: session opened for user root by
      taylor(uid=502)

Sure enough, at 11:39 p.m., taylor used the su command to become root.

3. Perhaps more important than unthreading the logins for a given day is to scan for any potential security problems. One obvious one: Did anyone try to use the su facility and fail to log in?

    # grep "su(" message* | grep -vE '(session opened|session closed)'
    messages:Mar 18 12:52:14 staging su(pam_unix)[5330]: authentication failure;
      logname=taylor uid=502 euid=0 tty= ruser= rhost= user=root
    messages.1:Mar 12 20:56:09 staging su(pam_unix)[17297]: authentication failure;
      logname=taylor uid=502 euid=0 tty= ruser= rhost= user=root
    messages.3:Feb 23 21:09:16 staging su(pam_unix)[15552]: authentication failure;
      logname=taylor uid=502 euid=0 tty= ruser= rhost= user=taylorsu

You can see that in the time window that includes all the logged events, there were three failed su attempts, on February 23, March 12, and March 18, all by user taylor. Two were attempts to become root, and one to become taylorsu.

You can also verify that all the sus that succeeded were from known users by building a quick summary and using the helpful uniq utility to see what differences there were:

    # grep 'session opened' message* | awk '{print $12 " became " $10 }' |
      sort | uniq -c

        2 taylor(uid=0) became taylor
       15 taylor(uid=502) became root
        2 taylor(uid=502) became taylorsu
        4 (uid=0) became judi
       35 (uid=0) became taylor

It’s a bit complex, but we extract all the "session opened" messages, pull out the From and To account information, and output a simple three-word summary. Then sort ensures identical lines are grouped together, and uniq -c counts unique occurrences of each string and outputs a single line for each unique string, prefaced with a repeat count. The second line, for example, shows that taylor used the su utility to become root 15 times.
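The sort | uniq -c | sort -rn idiom at the heart of this pipeline is worth internalizing, because it turns any stream of repeated lines into a frequency table. Here’s a tiny self-contained demonstration; the scratch file and account names are invented:

```shell
# Turn a stream of repeated lines into a frequency table: sort groups
# identical lines together, uniq -c collapses each group into a
# "count line" pair, and the final sort -rn puts the most frequent
# entries first. The scratch file below is invented sample data.
printf '%s\n' root taylor taylor judi taylor > /tmp/who.txt

sort /tmp/who.txt | uniq -c | sort -rn
```

Here the first output line reports that taylor appeared three times; the same three-command tail works unchanged on any single-column extraction from a log file.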

4. On a Mac OS X system, by contrast, the syslog file is known as system.log, and it contains all sorts of interesting information. For example, ever wonder what happens when your system goes to sleep and wakes up again?

    Mar 18 11:35:02 dsl-132 mach_kernel: UniNEnet: 0 0 UniNEnet::
       putToSleep - turning off cell clock!!!
    Mar 18 11:35:02 dsl-132 mach_kernel: System Sleep
    Mar 18 11:35:02 dsl-132 mach_kernel: System Wake
    Mar 18 11:35:02 dsl-132 mach_kernel: Wake event 0020

More usefully, both su and sudo are logged here:

    # grep "su:" system.log
    Mar 15 09:42:38 dsl-132 su: taylor to root on /dev/ttyp1
    # grep "sudo:" system.log
    Mar 11 22:02:45 dsl-132 sudo:   taylor : TTY=ttyp1 ; PWD=/Users/taylor/bin ;
      USER=root ; COMMAND=./docron weekly
    Mar 11 22:05:14 dsl-132 sudo:   taylor : TTY=ttyp2 ; PWD=/Users/taylor ;
      USER=root ; COMMAND=/bin/ls /

There’s no evidence of any users other than taylor running su or sudo in this log file.

5. The next log file to examine is boot.log, which shows you all the messages output to the screen at boot time. Because most Unix systems aren’t booted every day (hopefully!), this file will often be empty, as you can see in the ls listing in step 1.

    # head -20 boot.log.2
    Feb 25 22:44:42 staging atd: atd shutdown succeeded
    Feb 25 22:44:44 staging httpd: httpd shutdown succeeded
    Feb 25 22:44:44 staging sshd: sshd -TERM succeeded
    Feb 25 22:44:44 staging sendmail: sendmail shutdown succeeded
    Feb 25 22:44:44 staging xinetd: xinetd shutdown succeeded
    Feb 25 22:44:44 staging crond: crond shutdown succeeded
    Feb 25 22:44:45 staging dd: 1+0 records in
    Feb 25 22:44:45 staging dd: 1+0 records out
    Feb 25 22:44:45 staging random: Saving random seed: succeeded
    Feb 25 22:44:45 staging nfslock: rpc.statd shutdown succeeded
    Feb 25 22:44:45 staging portmap: portmap shutdown succeeded
    Feb 25 22:44:46 staging syslog: klogd shutdown succeeded
    Feb 25 22:47:37 staging syslog: syslogd startup succeeded
    Feb 25 22:47:37 staging syslog: klogd startup succeeded
    Feb 25 22:47:37 staging portmap: portmap startup succeeded
    Feb 25 22:47:37 staging nfslock: rpc.statd startup succeeded
    Feb 25 22:47:11 staging rc.sysinit: Mounting proc filesystem: succeeded
    Feb 25 22:47:11 staging rc.sysinit: Unmounting initrd: succeeded
    Feb 25 22:47:11 staging sysctl: net.ipv4.ip_forward = 0
    Feb 25 22:47:11 staging sysctl: net.ipv4.conf.default.rp_filter = 1

These messages document a system shutdown (or a change in init state that required a set of shutdown events before starting up new services) on February 25. Probably the most important things to look for are errors, warnings, and the like:

    # grep -E ′(warning|error|crit|fail)′ boot.log*
    #

Nothing’s wrong. Terrific!

6. By contrast, checking the syslog file on Solaris 8 (Solaris doesn’t have a separate boot log file; all information about booting goes into syslog) with the same regular expression reveals:

    # egrep '(warning|error|crit|fail)' syslog
    Jan 14 22:36:35 solaris sendmail[347]: [ID 702911 mail.crit] My unqualified
      host name (solaris) unknown; sleeping for retry
    Jan 15 00:06:08 solaris sendmail[223]: [ID 702911 mail.crit] My unqualified
      host name (solaris) unknown; sleeping for retry
    Jan 15 00:34:43 solaris sendmail[224]: [ID 702911 mail.crit] My unqualified
      host name (solaris) unknown; sleeping for retry

Clearly there’s a problem with sendmail resolving its own name at boot-time. This is something well worth exploring further, and ultimately fixing.

7. Let’s quickly look through some of the other log files to understand what’s going on:

    # head -5 cron
    Mar 17 04:05:00 staging CROND[27157]: (root) CMD
      (/usr/bin/mrtg /etc/mrtg/mrtg.cfg)
    Mar 17 04:10:00 staging CROND[27160]: (root) CMD
      (/usr/bin/mrtg /etc/mrtg/mrtg.cfg)
    Mar 17 04:10:00 staging CROND[27161]: (root) CMD
      (/usr/lib/sa/sa1 1 1)
    Mar 17 04:15:00 staging CROND[27164]: (root) CMD
      (/usr/bin/mrtg /etc/mrtg/mrtg.cfg)
    Mar 17 04:20:00 staging CROND[27167]: (root) CMD
      (/usr/bin/mrtg /etc/mrtg/mrtg.cfg)

As expected, the cron log file shows commands and actions taken by the crond daemon. In parentheses it shows the user account, followed by the command itself. If you think someone is using a cron-based program to try and break security, it’ll be logged here.

8. If you’re ready to become paranoid about people trying to get to your computer, have a look at xferlog or ftp.log, depending on your OS. On my Mac OS X server, for example, the log file shows

    # cat ftp.log | grep -v taylor
    Jan 21 12:24:36 dsl-132 ftpd[369]: connection from dsl-155.dsldesigns.com
    Jan 21 12:24:40 dsl-132 ftpd[369]: FTP LOGIN REFUSED
      FROM dsl-155.dsldesigns.com, root
    Jan 21 12:28:06 dsl-132 ftpd[390]: connection from 63.101.93.250
    Jan 21 12:35:40 dsl-132 ftpd[412]: connection from dsl-155.dsldesigns.com
    Jan 21 19:44:24 dsl-132 ftpd[491]: connection from 63.101.93.250
    Jan 21 19:45:28 dsl-132 ftpd[492]: connection from dsl-155.dsldesigns.com
    Jan 21 20:00:39 dsl-132 ftpd[516]: connection from 63.101.93.250
    Jan 22 09:45:04 dsl-132 ftpd[332]: connection from webpac.clemson.edu
    Jan 22 12:27:23 dsl-132 ftpd[460]: connection from 63.101.93.250
    Jan 22 12:34:18 dsl-132 ftpd[461]: connection from 63.101.93.250
    Jan 24 18:00:40 dsl-132 ftpd[369]: connection from dsl-151.dsldesigns.com
    Jan 26 22:44:59 dsl-132 ftpd[927]: connection from dsl-155.dsldesigns.com
    Jan 29 18:27:33 dsl-132 ftpd[359]: connection from 157.161.112.208
    Jan 29 21:48:12 dsl-132 ftpd[378]: connection from
      port-212-202-160-251.reverse.qdsl-home.de
    Jan 29 21:48:13 dsl-132 ftpd[378]: ANONYMOUS FTP LOGIN REFUSED
      FROM port-212-202-160-251.reverse.qdsl-home.de
    Feb 6 12:10:23 dsl-132 ftpd[525]: connection from 199.88.128.27
    Feb 6 15:54:41 dsl-132 ftpd[554]: connection from pd955f64e.dip.t-dialin.net
    Feb 6 21:52:56 dsl-132 ftpd[605]: connection from pd955f64e.dip.t-dialin.net
    Feb 6 21:52:58 dsl-132 ftpd[605]: ANONYMOUS FTP LOGIN REFUSED
      FROM pd955f64e.dip.t-dialin.net
    Feb 7 00:01:09 dsl-132 ftpd[612]: connection from maptech-inc.com
    Feb 7 00:01:12 dsl-132 ftpd[613]: connection from maptech-inc.com
    Feb 7 00:01:21 dsl-132 ftpd[614]: connection from maptech-inc.com

There are no files available to the general public, and the IP address of our system is not advertised anywhere. How these sites are finding our system is a mystery, but what’s not a mystery is that they’re trying to connect and log in.

This is an obvious attempt to break the security on our system, so it’s a good time to turn the ftpd program off until I’m ready to re-enable it. In Mac OS X, the fastest way to do this is to go into the System Preferences, Sharing control panel and uncheck Enable FTP Access. That not only turns it off from future activity, but kills the currently running ftpd as well.

9. Another log file worth keeping an eye on is maillog, which records all electronic mail transactions as they occur. If I send a quick e-mail message to myself, for example, the following two lines are written out to the file:

    # tail -2 maillog
    Mar 18 14:24:35 staging sendmail[5459]: g2IMOZb05459: from=root, size=58,
    class=0, nrcpts=1, msgid=<200203182224.g2IMOZb05459@staging.intuitive.com>,
    relay=root@localhost

    Mar 18 14:24:52 staging sendmail[5462]: g2IMOZb05459: [email protected],
    ctladdr=root (0/0), delay=00:00:17, xdelay=00:00:17, mailer=esmtp, pri=30058,
    relay=mail-fwd.verio-web.com. [161.58.148.40], dsn=2.0.0,
    stat=Sent (036711929 Message accepted for delivery)

Lots of stuff, but most importantly notice that the two lines can be matched with the jobID (g2IMOZb05459), and that the first entry indicates from= and the second indicates to=. Without any fancy footwork, we can sort by field value, then extract the from= and to= values to see what’s going on:

    # sort -k6 maillog* | awk '{print $7 }' | grep -v root | grep '@'
    to=[email protected],

It’s a bit complex, but this pipe extracts all the from= and to= values from the log file, strips out mail sent to or from root, then reports all off-system addresses. As expected on this server, only one message has been sent.
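If you’d rather see the from= and to= halves of each transaction reported side by side instead of merely adjacent after sorting, a short awk script can join the two lines on their shared queue ID (field six). This is a sketch using synthetic sample entries, not the real maillog:

```shell
# Pair sendmail's from= and to= log lines on their shared queue ID,
# which is field 6 of each entry. The two sample lines below are
# synthetic stand-ins for real maillog entries.
cat > /tmp/maillog.sample << 'EOF'
Mar 18 14:24:35 staging sendmail[5459]: g2IMOZb05459: from=root, size=58,
Mar 18 14:24:52 staging sendmail[5462]: g2IMOZb05459: to=user@example.com,
EOF

# Remember each from= by queue ID, then print the pair when to= arrives
awk '$7 ~ /^from=/ { from[$6] = $7 }
     $7 ~ /^to=/   { print $6, from[$6], $7 }' /tmp/maillog.sample
```

Each output line shows the queue ID, the sender, and the recipient together, which makes scanning for unexpected relaying much easier.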

There are a lot of log files to keep track of, no question, and there are important snippets of information in each. It’s well worth your time to explore each file and keep returning to each as your system runs to begin to understand what’s contained within them.

Tracking a Hacker

One thing you’ll commonly find in your log files is weird and surprising entries. Let’s try to track one backward and see if we can ascertain what’s going on.

Task 16.2: Tracking Backward

Although some Unix systems have all their log entries dropped into the syslog file, Linux offers a different, and helpful, log file called secure that contains all security-related log events.

1. On Linux, there’s a very important log file called secure, and it’s well worth looking at its contents:

    # head secure
    Mar 10 10:22:14 staging sshd[12114]: Did not receive identification
      string from 208.37.77.153.
    Mar 10 22:27:39 staging sshd[12659]: Could not reverse map address
      198.76.82.132.
    Mar 10 22:27:42 staging sshd[12659]: Accepted password for taylor
      from 198.76.82.132 port 49154 ssh2
    Mar 10 23:39:43 staging sshd[12796]: Accepted password for taylor
      from 198.76.82.132 port 49156 ssh2
    Mar 11 11:19:29 staging sshd[14368]: Could not reverse map address
      198.76.82.132.
    Mar 11 11:19:32 staging sshd[14368]: Accepted password for taylor
      from 198.76.82.132 port 49152 ssh2
    Mar 11 16:02:00 staging sshd[14716]: Could not reverse map address
      198.76.82.132.
    Mar 11 16:02:05 staging sshd[14716]: Accepted password for taylor
      from 198.76.82.132 port 49153 ssh2
    Mar 11 17:15:38 staging sshd[14891]: Did not receive identification
      string from 129.132.250.236.
    Mar 12 15:01:13 staging sshd[16846]: Could not reverse map address
      198.76.82.132.

This logs all security-related events, including sshd connections. As you can see, taylor logged in from 198.76.82.132 (a known IP address), but there was also a connection from 208.37.77.153.

The host command can do reverse IP mapping, so given an IP address, we can ascertain its domain:

    # host -dv 208.37.77.153
    ;; res_nmkquery(QUERY, 153.77.37.208.IN-ADDR.ARPA., IN, PTR)
    ;; res_send()
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 65086
    ;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
    ;;     153.77.37.208.IN-ADDR.ARPA, type = PTR, class = IN
    ;; Querying server (# 1) address = 192.216.138.10
    ;; new DG socket
    ;; got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 65086
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 0
    ;;     153.77.37.208.IN-ADDR.ARPA, type = PTR, class = IN
    153.77.37.208.IN-ADDR.ARPA. 2H IN PTR w153.z208037077.nyc-ny.dsl.cnc.net.
    77.37.208.IN-ADDR.ARPA. 2H IN NS nameserver3.concentric.net.
    77.37.208.IN-ADDR.ARPA. 2H IN NS nameserver.concentric.net.
    77.37.208.IN-ADDR.ARPA. 2H IN NS nameserver1.concentric.net.
    77.37.208.IN-ADDR.ARPA. 2H IN NS nameserver2.concentric.net.
    rcode = 0 (Success), ancount=1
    The following answer is not authoritative:
    The following answer is not verified as authentic by the server:
    153.77.37.208.IN-ADDR.ARPA 7200 IN PTR w153.z208037077.nyc-ny.dsl.cnc.net
    For authoritative answers, see:
    77.37.208.IN-ADDR.ARPA 7200 IN NS      nameserver3.concentric.net
    77.37.208.IN-ADDR.ARPA 7200 IN NS      nameserver.concentric.net
    77.37.208.IN-ADDR.ARPA 7200 IN NS      nameserver1.concentric.net
    77.37.208.IN-ADDR.ARPA 7200 IN NS      nameserver2.concentric.net

The resolved name is buried in the middle of this information:

    w153.z208037077.nyc-ny.dsl.cnc.net.

2. The next step is to look up the complete domain registration record:

     # whois -r cnc.net
     [whois.crsnic.net]

     Whois Server Version 1.3

     Domain names in the .com, .net, and .org domains can now be registered
     with many different competing registrars. Go to http://www.internic.net
     for detailed information.

        Domain Name: CNC.NET
        Registrar: NETWORK SOLUTIONS, INC.
        Whois Server: whois.networksolutions.com
        Referral URL: http://www.networksolutions.com
        Name Server: NAMESERVER.CONCENTRIC.NET
        Name Server: NAMESERVER3.CONCENTRIC.NET
        Updated Date: 05-nov-2001

    >>> Last update of whois database: Tue, 19 Mar 2002 05:21:50 EST <<<

    [whois.networksolutions.com]

    Registrant:
    Concentric Network Corporation (CNC6-DOM)
       1400 Parkmoor Avenue
       San Jose, CA 95126-3429

       Domain Name: CNC.NET

       Administrative Contact:
          Schairer, David R (DRS9) [email protected]
          Concentric Network Corp.
          1400 Parkmoor Ave
          Cupertino, CA 95014
          (408) 817-2800 (800) 745-2747 ext. 2800 (FAX) (408) 817-2630
       Technical Contact:
          DNS & IP ADMIN (DIA-ORG) [email protected]
          Concentric Network Corporation
          1400 Parkmoor Avenue
          San Jose, CA 95126-3429
          (408) 817-2800
          Fax- (408) 817-2630
       Billing Contact:
          XO Communications, Hostmaster (CNCXCH-ORG) [email protected]
          XO Communications
          1400 Parkmoor Ave
          San Jose, CA 95126
          408-817-2800
          Fax- 408-817-2810

       Record last updated on 02-Mar-2001.
       Record expires on 06-Mar-2004.
       Record created on 05-Mar-1997.
       Database last updated on 18-Mar-2002 22:31:00 EST.

       Domain servers in listed order:

       NAMESERVER3.CONCENTRIC.NET 206.173.119.72
       NAMESERVER2.CONCENTRIC.NET 207.155.184.72

Concentric Network Corporation is a large ISP based in Northern California.

3. We’ll send them an e-mail message, but, alas, I don’t expect any sort of meaningful response:

    # mail -s "One of your customers was trying to hack into my system"
    [email protected]
    Hello. I have tracked backward and found that one of the computers in
    your domain (w153.z208037077.nyc-ny.dsl.cnc.net) was trying to hack into my
    Linux server. In the "/var/log/secure" log file, I find the following message
    from 'sshd':

    Mar 10 10:22:14 staging sshd[12114]: Did not receive identification string
    from 208.37.77.153.

    A reverse lookup of that IP address returns:

    153.77.37.208.in-addr.arpa. domain name pointer w153.z208037077.nyc-ny.dsl.
    Image cnc.net.

    Can you please work backward in your usage logs to ascertain what computer has the specified IP address of 208.37.77.153, and identify the culprit? Needless to say, I don't want people port-scanning my server!

    Thanks for your assistance on this matter.

    Dave Taylor
    [email protected]
    .

4. About 15 minutes later, we receive the following:

    The original message was received at Tue, 19 Mar 2002 13:46:52 -0500 (EST)

       ----- The following addresses had permanent fatal errors -----
    [email protected]
        (expanded from: <[email protected]>)

       ----- Transcript of session follows -----
    550 [email protected]… Host unknown (Name server: cncx.net: host not found)

Oh well, at least we can see how to work backward in our log files, and at least we tried….

It’s not uncommon to find people trying to probe your ports: Indeed, it’s a matter of five minutes of shell programming to write a "port scanner" that will pick an IP address, then see what Internet services, if any, are listening. Once that’s ascertained, there are tools available to try and break in using a specific service "exploit."

If it sounds a bit depressing, it should. The best thing you can do for security is to ensure that your system is properly configured, that you keep an eye on your log files, that you have a properly configured firewall, and that you have reliable backups just in case.

There are also some very good books about Unix security, three of which I’ll highlight here:

Hacking Linux Exposed by Brian Hatch et al. (McGraw-Hill) has a good explanation of what hackers do to try and gain entry into your system, though at the expense of less coverage of how to avoid the problem in the first place.

Practical Unix and Internet Security by Simson Garfinkel and Gene Spafford (O’Reilly) is more academic, but a definitive reference on this topic.

Maximum Linux Security by Anonymous (Sams) is another option. A quick search on Amazon or Barnes & Noble will reveal quite a few other choices if none of these jump out at you.

We’ll return to the topic of security again in the last hour.

The httpd Log File

There’s another log file that is well worth exploring in some detail, as it’s often the main purpose for a Unix server in the first place: the Web server log.

There are different Web servers available, but by far the most popular is Apache, a freely distributed HTTP daemon that offers remarkable flexibility, power, and configurability. It is included on just about every modern Unix distribution, and you can download the Apache distribution at www.apache.org.

Task 16.3: Exploring the httpd Log File

The last log file we’ll examine in this hour is the Apache Web server log file, actually a set of files all stored in /var/log/httpd.

1. Here’s what’s typically in that directory:

[Listing of the /var/log/httpd directory not shown.]

To make things more interesting, however, we’ll analyze the log files from a busier Web server with larger log files:

    # ls -s
    total 4200
    2594 access_log   654 agent_log    33 error_log   919 referer_log

2. The first snippet of information you would like to extract from your log files is a count of hits, which proves remarkably easy to do:

    # wc -l access_log
      12384 access_log

12,384 hits, but in what period of time? Again, just look at the first and last lines of the log file to find out:

    # head -1 access_log ; tail -1 access_log
    centaurus.4web.cz - - [18/Mar/2002:00:09:53 -0500] "GET / HTTP/1.0" 200
      229 "-" "Woko robot 3.0"
    node-c-c10f.a2000.nl - - [18/Mar/2002:18:05:05 -0500] "GET
      /taylor/Graphics/biohazard.gif HTTP/1.1" 200 5330
      "http://forum.fok.nl/showtopic.php/119057/1/50" "Mozilla/4.0 (compatible;
      MSIE 6.0; Windows 98)"

This represents all usage between midnight and 6:05 p.m., a total of 18 hours. This means that this server is seeing a respectable, but not overwhelming, 688 (12384÷18) hits per hour, or 11.4 (688÷60) hits per minute.
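Because the hour of day is the second colon-delimited field of every access_log entry, the same counting idiom can break those hits down into a per-hour histogram rather than a single average. A sketch with invented sample entries:

```shell
# The hour is the second colon-delimited field of an access_log line:
#   host - - [18/Mar/2002:00:09:53 -0500] "GET / HTTP/1.0" 200 229
# The sample file below is invented, standing in for a real access_log.
cat > /tmp/access.sample << 'EOF'
host1 - - [18/Mar/2002:00:09:53 -0500] "GET / HTTP/1.0" 200 229
host2 - - [18/Mar/2002:00:41:12 -0500] "GET / HTTP/1.0" 200 229
host3 - - [18/Mar/2002:13:05:44 -0500] "GET / HTTP/1.0" 200 512
EOF

# Hits per hour of the day, busiest hours first
cut -d: -f2 /tmp/access.sample | sort | uniq -c | sort -rn
```

On a real log this quickly shows whether traffic is spread evenly or spikes at particular hours, which a simple hits-per-hour average hides.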

3. To extract more information from this log file, a quick summary of the log file format is required. I won’t explain everything in this hour (we’ll look at Apache in more detail later in the book), but here’s the essential field layout:

    visitor-domain - - date&time timezone operation URL - return-code
      bytes-sent …

With this information, we can quickly extract interesting information, like what domains account for the greatest number of hits:

    # awk '{print $1}' access_log | sort | uniq -c | sort -rn | head -10
        305 193.190.216.249
        208 pc03.wimbp.zgora.pl
        175 164.156.231.55
        145 204.185.56.252
        117 216.35.169.126
        110 slb-proxy-03.boeing.com
         84 194.131.98.235
         78 216-203-142-177.customer.algx.net
         75 61.11.78.180
         70 209.248.92.29

Again, the host command can do a reverse lookup on the topmost domain:

    # host -dv 193.190.216.249 | grep "IN SOA"
    216.190.193.IN-ADDR.ARPA. 2h59m40s IN SOA www.senate.be. sysadmin.senate.be.

A computer in the Belgian Senate accounts for more traffic than any other system visiting the Web site in this 18-hour window. Surprised?

    # host -dv 164.156.231.55 | grep "IN SOA"
    156.164.IN-ADDR.ARPA. 3H IN SOA jasper.cmic.state.pa.us. security.state.pa.us

IP address number three belongs to the state government of Pennsylvania (state.pa.us)!

4. Another common query to the Apache log file is about what URLs are most commonly requested. This is easy to calculate when you notice that field seven is the requested URL:

    # awk '{print $7}' access_log | sort | uniq -c | sort -rn | head
       1334 /cgi-local/etymologic.cgi
        699 /cgi-local/trivial.cgi
        508 /
        437 /taylor/Graphics/biohazard.gif
        219 /sites/trivial/Graphics/kudos.gif
        217 /sites/trivial/Graphics/intsys.gif
        213 /sites/etymologic/Graphics/bottom-bar.gif
        212 /sites/etymologic/Graphics/top-bar.gif
        204 /sites/etymologic/Graphics/bonus-com.gif
        203 /sites/etymologic/Graphics/linkbw2.gif

5. Many users like to keep track of the total number of bytes sent from their server to calculate the approximate percentage of bandwidth utilized per day. This is field ten, and the analysis is easier than the earlier examples:

    # awk '{ sum += $10 } END { print "total = " sum/1024 " Kb" }' access_log
    total = 41583.9 Kb

This equates to a reasonable transfer rate of 2.25MB/hour ((41583÷18)÷1024).

6. The other log files are also worth a quick peek. The agent_log is just the Web browser identification string from each visitor:

    # head agent_log
    Woko robot 3.0
    Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)
    Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)
    Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)
    Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)
    Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)
    Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)
    Woko robot 3.0
    Mozilla/4.0 (compatible; MSIE 5.0; AOL 7.0; Windows 98; DigExt)
    Mozilla/4.0 (compatible; MSIE 5.0; AOL 7.0; Windows 98; DigExt)

Again, the powerful combination of sort|uniq -c|sort -rn|head will reveal the most commonly used browsers:

    # sort agent_log | uniq -c | sort -rn | head
       1049 Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)
        816 Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)
        732 Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
        619 Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)
        604 Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)
        438 Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
        434 Mozilla/4.0 (compatible; MSIE 5.01; Windows 98)
        333 Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0)
        326 Mozilla/5.0 (Windows; U; WinNT4.0; en-US; rv:0.9.4)
          Gecko/3 Netscape6/6.2
        287 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)

Given that any agent with MSIE in its string is Microsoft Internet Explorer, this shows quite graphically that Internet Explorer certainly does lead the Web browser pack by quite a margin (of the 5,638 browser agents listed in this top ten list, all but 326 (5.7%) are MSIE).

7. The referer_log is somewhat interesting, but it takes a bit of data massaging to see what’s really inside. The file contains a list of the last URL someone was at before visiting a Web site on this server:

    # head -5 referer_log
    http://search.yahoo.com/search/msie?o=1&m=i&a=fd&p=trivia+games&b=121&h=s -> /
    http://www.intuitive.com/sites/trivial/index.html ->
      /sites/trivial/Graphics/intsys.gif
    http://www.intuitive.com/sites/trivial/index.html ->
      /sites/trivial/Graphics/play-the-game.gif
    http://www.intuitive.com/sites/trivial/index.html ->
      /sites/trivial/Graphics/kudos.gif
    http://www.intuitive.com/sites/trivial/index.html ->

The problem is that many of these URLs prove to be quite long, thus preventing us from doing any meaningful analysis because the URL includes a timestamp, session ID, and so on.

Instead, cut lets us chop out just the base domain name and see what we find:

    # cut -d/ -f3 referer_log | sort | uniq -c | sort -rn | head
       9589 www.intuitive.com
        164 www.google.com
        116 pub43.ezboard.com
        115 pub44.ezboard.com
        107 forum.fok.nl
         48 search.msn.com
         47 search.yahoo.com
         39 www.dancing.baby.net
         34 www.yellow.baby.net
         34 images.google.com

Some very interesting results!

8. The final log file is error_log, and here again, we’re going to see security problems as people try to break into our server:

    # head -4 error_log
    [Mon Mar 18 00:14:54 2002] [error] [client 66.77.73.219] File does
      not exist: /w/web/intui2/robots.txt
    [Mon Mar 18 00:30:47 2002] [error] [client 198.107.235.65] File does
      not exist: /u/web/intui2/custer
    [Mon Mar 18 00:34:45 2002] [error] [client 12.233.27.11] File does
      not exist: /u/web/intui2/robots.txt
    [Mon Mar 18 01:11:07 2002] [error] [client 64.128.250.173] File does
      not exist: /u/web/intui2/OLD/Images/social-guide-title.gif

Again, you can see that the format is very uniform, so you can easily search for "does not exist", extract the actual requested file, sort | uniq it, and have a list of the most common incorrect references:

    # grep "does not exist" error_log | awk '{print $13}' | sort |
      uniq -c | sort -rn | head

        34 /u/web/intui2/custer
        26 /u/web/intui2/robots.txt
        17 /u/web/intui2/origins/
        15 /u/web/intui2/favicon.ico
        13 /u/web/intui2/origins/Graphics/nav/newswire-off.gif
        13 /u/web/intui2/apps/Graphics/nav/newswire-off.gif
        11 /u/web/intui2/OLD/Images/social-guide-title.gif
        11 /u/web/intui2/OLD/Images/coolweb-ad.gif
         8 /u/web/intui2/coolweb/apps/relayto.cgi
         8 /u/web/intui2/apps/search-everything.cgi

More importantly, you can exclude those errors and see what other problems might have arisen:

    # grep -v "does not exist" error_log | head
    [Mon Mar 18 04:48:51 2002] [error] [client 213.106.38.231] Premature
    end of script headers: /u/web/intui2/cgi-local/etymologic.cgi
    [Mon Mar 18 05:24:40 2002] [error] [client 210.183.67.209] Client sent
    malformed Host header
    [Mon Mar 18 07:51:48 2002] [error] [client 206.114.36.6] script not found or
    unable to stat: /u/httpd/cgi-bin/PDG_Cart
    [Mon Mar 18 08:16:16 2002] [error] [client 213.77.101.163] script not found or
     unable to stat: /u/web/intui2/cgi-local/apps/querty.cgi
    [Mon Mar 18 08:16:44 2002] [error] [client 213.77.101.163] script not found or
    unable to stat: /u/web/intui2/cgi-local/apps/querty.cgi
    [Mon Mar 18 11:45:20 2002] [error] [client 193.63.5.67] attempt to invoke
    directory as script: /u/web/intui2/cgi-local/apps
    [Mon Mar 18 11:46:31 2002] [error] [client 193.63.5.67] attempt to invoke
     directory as script: /u/web/intui2/cgi-local/apps
    [Mon Mar 18 12:38:40 2002] [error] [client 164.156.231.55] attempt to invoke
    directory as script: /u/web/intui2/cgi-local
    [Mon Mar 18 13:53:23 2002] [error] [client 213.97.216.50] (11)Resource
    temporarily unavailable: couldn’t spawn
    child process: /u/web/intui2/cgi-local/switcher.pl
    [Mon Mar 18 13:53:24 2002] [error] [client 213.98.97.138] (11)Resource
    temporarily unavailable: couldn’t spawn
    child process: /u/web/intui2/cgi-local/switcher.pl

Definitely some things to explore here. Why, for example, is there a call to apps/querty.cgi? And what does "resource temporarily unavailable" mean?

We’ll re-address Apache issues in the last two hours of the book.

Of all the log files that can be analyzed and explored, few are more interesting than the Web server itself. Armed with basic Unix tools, it proves easy to extract meaningful information and produce rudimentary statistics in just a moment or two.

Analyzing an access_log file is a natural task for a cron-launched script that produces a report and e-mails it to the Web administrators. Writing one is left as an exercise for the reader.
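As a starting point for that exercise, the "missing files" report shown earlier can be wrapped into a small cron-ready script. This is only a sketch: the default log path and the mail recipient are assumptions you'd adjust for your own server.

```shell
#!/bin/sh
# Sketch of a cron-ready "missing files" report. The default log
# path and the mail recipient are assumptions -- adjust for your site.

LOG=${1:-/var/log/httpd/error_log}

# Count and rank the "File does not exist" requests, busiest first.
report() {
  grep "does not exist" "$1" | awk '{print $13}' |
    sort | uniq -c | sort -rn | head
}

if [ -f "$LOG" ]; then
  report "$LOG"
  # From cron, mail the report rather than printing it:
  #   report "$LOG" | mail -s "Missing-file report" webmaster@example.com
fi
```

Dropped into /etc/cron.daily (or called from a crontab entry), this would deliver the top ten bad references every morning.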

Trimming Log Files with logrotate

Let’s put the question of detecting security problems aside for the rest of this hour and look at the issue of managing the log files themselves.

On a busy server, it’s not unusual to see log files that are enormous, and adding lines every few seconds. On a busy Web server, for example, a tail -f /var/log/httpd/access_log might well spit out 50–100 lines each minute as the sites are visited.

The challenge is to simultaneously minimize the size of the log files, while still making sure that they’re accessible and available as needed.

Task 16.4: The logrotate Program

Historically, system administrators have written their own scripts to manage and trim log files, but with the latest generation of Linux, there’s a very helpful utility included called logrotate, and it does a very sophisticated job of rotating all the log files in /var/log automatically.

1. By convention, the logrotate command is called from cron on a daily basis. Based on the discussion in the last hour, it should be no surprise that it's located in /etc/cron.daily:

    # cd /etc/cron.daily
    # ls
    00-logwatch  0anacron   makewhatis.cron  slocate.cron  tmpwatch
    00webalizer  logrotate  rpm              sysstat
    # more logrotate
    #!/bin/sh

    /usr/sbin/logrotate /etc/logrotate.conf

Before we look at the command, then, let’s have a peek at the configuration file.

2. If you think about the task that's being accomplished, you'll start to see the elegance of the logrotate solution. We have a log file that we want to have automatically renamed file.1, and we want file.1 to be renamed file.2, and so on. After a certain count, we want the oldest files to be deleted, so we keep a running window of past events whose length is the rotation frequency multiplied by the rotation count.

The first few lines of the configuration file specify those two key settings:

    # cat /etc/logrotate.conf
    # see "man logrotate" for details
    # rotate log files weekly
    weekly

    # keep 4 weeks worth of backlogs
    rotate 4

In this case, the frequency is weekly and the rotation is 4, so we have a running window of the last month of activity on the server.

    # create new (empty) log files after rotating old ones
    create

    # uncomment this if you want your log files compressed
    # compress

Some daemons are very picky about log files and will refuse to write to a log file that doesn't already exist. If that's the case, the create command is very useful, and it's the default setting for logrotate. Also notice how easy it is to have the historic log file archives stored in a compressed (gzip) format: just uncomment the compress line!

    # Keep a longer archive of the "secure" logs, and compress them
    /var/log/secure {
      rotate 8
      compress
    }

In this example, the secure log files are kept for eight weeks rather than the default of four, and they’re compressed to save space.
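Most versions of logrotate can also read per-service stanzas from an include directory instead of having everything in the main file; on Red Hat-style systems the main configuration file typically contains an include /etc/logrotate.d line. As a sketch, a drop-in for a hypothetical application log (the path and settings here are invented) might look like this:

```
# /etc/logrotate.d/myapp -- hypothetical application log
/var/log/myapp.log {
    weekly
    rotate 12
    compress
    missingok        # don't complain if the log is absent this week
    notifempty       # skip rotation when the file is empty
}
```

The missingok and notifempty directives make such a stanza safe to deploy even on machines where the service isn't installed or hasn't logged anything yet.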

    # Rotate the Apache log files when they get over 25K in size
    # Keep an archive of six weeks, compressed, and mail the new
    # rotated log to a special email address too:
    /var/log/httpd/access_log {
      rotate 6
      size 25k
      mail [email protected]
      compress
      postrotate
        /etc/rc.d/rc3.d/S15httpd restart
      endscript
    }

This is a fancy use of logrotate that really demonstrates its capabilities. Rather than having the rotation tied to a specific calendar day, the program will rotate the log file whenever it grows larger than 25K. It’ll keep a six-week archive compressed, and the oldest log file (which will be deleted once the rotation is complete) will be sent via e-mail to the special address [email protected] as part of the processing. Once the rotational sequence is completed, logrotate will call the specified shell script (the httpd control script that init uses) to force a restart, so each log file starts out with a Web server boot event.

This level of sophistication is far more than we’ll create in our own script later in this hour, and it’s a strong testimony to why the logrotate command can be such a powerful utility to the smart system administrator.
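One related capability is worth knowing about: if you rotate both Apache log files with the same stanza, the sharedscripts directive tells logrotate to run the postrotate script once for the whole group rather than once per file, avoiding a double restart. A sketch, reusing the paths from the example above:

```
/var/log/httpd/access_log /var/log/httpd/error_log {
    rotate 6
    size 25k
    compress
    sharedscripts
    postrotate
        /etc/rc.d/rc3.d/S15httpd restart
    endscript
}
```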

One last example:

    # no packages own lastlog or wtmp -- we'll rotate them here
    /var/log/wtmp {
      monthly
      create 0664 root utmp
      rotate 1
    }

The wtmp file is a log of who has logged in (logins are recorded in utmp while the user is still logged in, then moved to wtmp after logout). It's of value, but it usually doesn't grow very quickly. In this case, we rotate it monthly and keep only a single archived copy, for roughly a two-month window in all. Notice that the new empty wtmp file has very specific creation parameters that must be specified: the permission is 0664, the owner is root, and the group is utmp.

3. Using the logrotate command is quite simple: Log out and let cron do the work. Every night, cron will call logrotate, which reads the configuration file and does the actions specified.

To test out your configuration, use the -d (debug) flag:

    # logrotate -d /etc/logrotate.conf
    reading config file /etc/logrotate.conf
    reading config info for /var/log/secure
    reading config info for /var/log/httpd/access_log
    reading config info for /var/log/wtmp
    Handling 3 logs
    rotating pattern: /var/log/secure weekly (8 rotations)
    empty log files are rotated old logs are removed
    rotating file /var/log/secure
    log does not need rotating
    rotating pattern: /var/log/httpd/access_log 25600 bytes (6 rotations)
    empty log files are rotated old logs mailed to [email protected]
    rotating file /var/log/httpd/access_log
    log needs rotating
    renaming /var/log/httpd/access_log.6.gz to /var/log/httpd/access_log.7.gz
    renaming /var/log/httpd/access_log.5.gz to /var/log/httpd/access_log.6.gz
    renaming /var/log/httpd/access_log.4.gz to /var/log/httpd/access_log.5.gz
    renaming /var/log/httpd/access_log.3.gz to /var/log/httpd/access_log.4.gz
    renaming /var/log/httpd/access_log.2.gz to /var/log/httpd/access_log.3.gz
    renaming /var/log/httpd/access_log.1.gz to /var/log/httpd/access_log.2.gz
    renaming /var/log/httpd/access_log to /var/log/httpd/access_log.1
    creating new log mode = 0644 uid = 0 gid = 0
    running postrotate script
    running script with arg /var/log/httpd/access_log: "
        /etc/rc.d/rc3.d/S15httpd restart
    "
    compressing new log with: /bin/gzip -9 '/var/log/httpd/access_log.1'
    executing: "/bin/mail -s '/var/log/httpd/access_log.7.gz'
      [email protected] < /var/log/httpd/access_log.7.gz"
    removing old log /var/log/httpd/access_log.7.gz
    rotating pattern: /var/log/wtmp monthly (1 rotations)
    empty log files are rotated old logs are removed
    rotating file /var/log/wtmp
    log does not need rotating

Rather a lot of output, but if you step through it, you’ll see that logrotate is indicating exactly what it’ll do. Notice the invocation of the necessary commands for restarting the httpd service and sending the oldest compressed archive file via e-mail to a longer-term archive.

Unfortunately, logrotate is only available in certain Linux and Unix distros as of this writing (Caldera, Red Hat, Solaris), though there’s no doubt it’ll eventually be widely available to the general Unix community.

Not to fear, however, because the last section of this hour will explore a shell script that can do simple rotation, albeit in a less sophisticated manner.

Building Your Own Log Rotation Tool

Now that you’ve seen how logrotate works, you should be inspired to write a flexible, general purpose log rotator that can be easily distributed to all the different Unix systems you maintain. Fortunately, that’s not too difficult to accomplish.

Task 16.5: Rotating Your Own Logs

The basic algorithm for rotating log files is to select all the files under the /var/log directory that are "plain files" (that is, not directories, pipes, sockets, or device files) and that don't have a digit as the last letter of their name. With that list, create a set of new filenames that have the appropriate sequential suffixes, and rotate all the files.

1. Selecting just the files desired is perhaps the hardest part of this script, and it can be done with find. The required addition is the -not logical flag, which reverses the logic of the given test:

    # cd /var/log; find . -type f -not -name '*[0-9]' -print
    ./messages
    ./lastlog
    ./secure
    ./maillog
    ./spooler
    ./wtmp
    ./gdm/:0.log
    ./xferlog
    ./pgsql
    ./httpd/error_log
    ./httpd/access_log
    ./httpd/ssl_engine_log
    ./httpd/ssl_request_log
    ./httpd/ssl_scache.sem
    ./dmesg
    ./cron
    ./boot.log
    ./XFree86.0.log
    ./XFree86.9.log
    ./rpmpkgs

As you can see, this correctly listed all the log files that we'd like to rotate (and a few extras: the ssl_scache.sem file and the XFree86 logs).
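If you'd like to experiment with the selection rule before pointing it at /var/log, the same test can be tried safely in a scratch directory. A sketch (all of these filenames are invented):

```shell
#!/bin/sh
# Demonstrate the "plain file, name not ending in a digit" rule
# in a throwaway directory, so /var/log is never touched.
d=$(mktemp -d)
touch "$d/messages" "$d/messages.1" "$d/messages.2" "$d/wtmp"
mkdir "$d/httpd"
touch "$d/httpd/access_log" "$d/httpd/access_log.1"

# Plain files whose names don't end in a digit:
find "$d" -type f -not -name '*[0-9]' | sort

rm -rf "$d"
```

The rotated generations (messages.1, access_log.1, and so on) are excluded, while the live logs are listed.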

2. To refine this search further, we'll add a few more tweaks to the find command, including a test to match only files greater than 1KB in size, and tests to skip the XFree86 and GDM session logs completely:

    # cat rotatelogs.sh
    #!/bin/sh

    for name in `find /var/log -type f -size +1k -not -name '*[0-9]' \
       -not -name 'XFree*' -not -name ':0*' -print`
    do
      echo Log file $name is ready to rotate
    done

This intermediate loop offers what we want:

    # rotatelogs.sh
    Log file /var/log/lastlog is ready to rotate
    Log file /var/log/maillog is ready to rotate
    Log file /var/log/wtmp is ready to rotate
    Log file /var/log/httpd/error_log is ready to rotate
    Log file /var/log/httpd/access_log is ready to rotate
    Log file /var/log/dmesg is ready to rotate
    Log file /var/log/cron is ready to rotate
    Log file /var/log/rpmpkgs is ready to rotate

3. Now let’s just jump into the entire script, so you can see how to accomplish the increments:

    #!/bin/sh

    cd /var/log

    for name in `find . -type f -size +1k -not -name '*[0-9]' \
       -not -name 'XFree*' -not -name ':0*' -print`
    do

    back1="${name}.1"; back2="${name}.2"
    back3="${name}.3"; back4="${name}.4"
      # rotate, starting with the oldest log
      if [ -f $back3 ] ; then
        mv -fv $back3 $back4
      fi
      if [ -f $back2 ] ; then
        mv -fv $back2 $back3
      fi
      if [ -f $back1 ] ; then
        mv -fv $back1 $back2
      fi
      if [ -f $name ] ; then
        mv -fv $name $back1
      fi
      # the echo means this trial run only reports the chmod; remove
      # "echo" once you're satisfied, to actually set the permissions
      touch $name; echo chmod 0600 $name
    done

    exit 0

4. That’ll do what we want. Let’s see what happens when we run this:

    # rotatelogs.sh
    'lastlog' -> 'lastlog.1'
    chmod 0600 lastlog
    'maillog.3' -> 'maillog.4'
    'maillog.2' -> 'maillog.3'
    'maillog.1' -> 'maillog.2'
    'maillog' -> 'maillog.1'
    chmod 0600 maillog
    'wtmp.1' -> 'wtmp.2'
    'wtmp' -> 'wtmp.1'
    chmod 0600 wtmp
    'dmesg' -> 'dmesg.1'
    chmod 0600 dmesg
    'cron.3' -> 'cron.4'
    'cron.2' -> 'cron.3'
    'cron.1' -> 'cron.2'
    'cron' -> 'cron.1'
    chmod 0600 cron
    'rpmpkgs.3' -> 'rpmpkgs.4'
    'rpmpkgs.2' -> 'rpmpkgs.3'
    'rpmpkgs.1' -> 'rpmpkgs.2'
    'rpmpkgs' -> 'rpmpkgs.1'
    chmod 0600 rpmpkgs
    './httpd/error_log.3' -> './httpd/error_log.4'
    './httpd/error_log.2' -> './httpd/error_log.3'
    './httpd/error_log.1' -> './httpd/error_log.2'
    './httpd/error_log' -> './httpd/error_log.1'
    chmod 0600 ./httpd/error_log
    './httpd/access_log.3' -> './httpd/access_log.4'
    './httpd/access_log.2' -> './httpd/access_log.3'
    './httpd/access_log.1' -> './httpd/access_log.2'
    './httpd/access_log' -> './httpd/access_log.1'
    chmod 0600 ./httpd/access_log

Voilà! The final step is to drop this into the appropriate cron file or directory, set it to run weekly, and we're done.

There are some refinements you could make to this script, most notably having the log files compressed with gzip, but in general, this will do a nice job of rotating all the log files on a given Unix system. Note that not all Unixes support the -v flag to mv, however, so you might have to chop that out before your script runs correctly.
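The compression refinement is easy to sketch: after the rotation loop finishes, compress each freshly created .1 file. This assumes gzip is installed, and note that the rotation tests would then need to look for .gz names on the older generations:

```shell
#!/bin/sh
# Sketch: compress the newest rotated generation to save space.
# Assumes gzip; run from the directory that holds the rotated logs.
for old in *.1; do
  [ -f "$old" ] || continue   # skip the unexpanded pattern if no match
  gzip -9 "$old"              # replaces $old with $old.gz
done
```

To keep the main script working, its existence tests would change from `-f $back1` to `-f $back1.gz` (and the mv targets likewise), just as logrotate itself tracks the .gz suffixes.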

Q&A

Q As it’s impossible to document all possible Unix log files in this book, how do I figure out what program creates a given log file on my Unix system?

A Usually, the log file name corresponds to the name of the program, which makes it easy (for example, syslogd.log for syslogd, or ftpd.log for ftpd). If not, man -k is a smart way to look. You can also try stripping a trailing d from the name, if there is one.

Q Tracking hackers seems quite interesting. How often does it produce positive results?

A Well…the reality is that it’s very unlikely that you’ll accomplish anything meaningful. On the other hand, if we don’t try to notify administrators of malicious users, it certainly won’t improve things.

Workshop

Quiz

1. What are the two primary reasons to pay attention to your log files?

2. What’s in the messages log file, and what is it commonly called on other Unix systems?

3. Using the tips suggested in this hour, what’s in ksyms on a Linux system?

4. What’s the basic technique for the reverse lookup of an IP address to ascertain its domain?

5. Earlier in the hour, we analyzed the access_log file to figure out what domains were hitting our Web server. How would you modify that command to identify just the most popular top-level domains?

Answers

1. Always pay attention to your log files so you are aware of attempted break-ins, and so you can keep track of disk space on your server.

2. The messages log file is used by syslogd, and on some systems it’s called /var/log/syslog.

3. On a Linux box:

    # man -k ksyms
    genksyms        (8) - generate symbol version information
    ksyms           (8) - display exported kernel symbols

4. Use the host command and make sure you specify the -dv flags for maximal output (see the man page).

5. There are a couple of ways to accomplish this, but I really like using the rev command to reverse each domain name, cut out the first field (which is now the top-level domain), reverse it again, and do "the usual" counting sequence after that:

    # awk '{print $1}' access_log | rev | cut -d. -f1 | rev | sort |
      uniq -c | sort -rn | head -10

       2256 net
       2198 com
        570 es
        348 us
        339 pl
        317 edu
        311 249
        228 ca
        205 uk
        198 55

In the next hour we’ll move to a new topic—network configuration. We’ll start with an exploration of IP addresses and netmasks, and then we’ll look at the steps necessary to configure a Unix box to work on an Ethernet or PPP network.
