Data manipulation tasks that would be very complex with combinations of grep, cut, and paste are easily done with awk. Because awk is a programming language, it can also perform mathematical operations or check the input, which the traditional shells do poorly. It can even do floating-point math (the traditional shells deal only with integers and strings).
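As a small illustration of the floating-point claim, the following sketch divides 7 by 2 once with POSIX shell arithmetic and once with awk. The BEGIN block, which runs before any input is read, is not otherwise used in this section and appears here only so that awk needs no input file; the variable names are made up for the illustration.

```shell
# Integer arithmetic in a POSIX shell truncates the result.
shell_result=$((7 / 2))

# awk performs the same division in floating point.
# BEGIN runs before any input is read, so no input file is needed.
awk_result=$(awk 'BEGIN { print 7 / 2 }')

echo "shell: $shell_result"   # shell: 3
echo "awk:   $awk_result"     # awk:   3.5
```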
The basic form of an awk program looks like this:
awk '/pattern_to_match/ {prog to run}' input_file_names
Notice that the entire program is enclosed in single quotes. If no input file names are specified, awk reads from standard input (as from a pipe).
The pattern_to_match must appear between the / characters. The pattern is actually a regular expression. Regular expressions were covered earlier in this chapter. Some common regular expression examples will be given shortly.
The program to execute is written in awk code, which looks something like C. The program is executed whenever a line of input matches the pattern_to_match. If /pattern_to_match/ does not precede the program in braces { }, then the program is executed for every line of input.
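The difference between a program with a pattern and one without can be sketched as follows. The three sample lines are invented for the illustration, and the input is supplied through a pipe rather than a file, so awk reads standard input.

```shell
# Hypothetical sample input: three lines, two of which contain "error".
sample='error: disk full
ok: backup done
error: link down'

# With a pattern, the program runs only for lines that match it.
matched=$(printf '%s\n' "$sample" | awk '/error/ {print}')

# Without a pattern, the program runs for every line of input.
all=$(printf '%s\n' "$sample" | awk '{print}')

printf '%s\n' "$matched"   # prints the two "error" lines
printf '%s\n' "$all"       # prints all three lines
```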
awk works with fields of the input lines. Fields are words separated by white space or some other field separator. awk uses white space as a field separator by default. You can use the -F option to specify the field separator, as shown in a later example. The fields in awk patterns and programs are referenced with $, followed by the field number. For example, the second field of an input line is $2. If you are using an awk command in your shell programs, the fields ($1, $2, etc.) are not confused with the shell script's positional parameters, because the awk program is enclosed in single quotes ('), which stops the shell from expanding them.
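A minimal sketch of these points, using invented input strings: the shell's own $1 is set to a marker value, awk's $2 inside single quotes is left alone by the shell, and -F switches the separator to a comma.

```shell
# Set the shell's positional parameters for this illustration: $1 is "shell-arg".
set -- shell-arg

# Inside single quotes, $2 is awk's second field, not a shell parameter.
second=$(echo 'one two three' | awk '{print $2}')

# -F changes the field separator; here the fields are separated by commas.
csv_second=$(echo 'one,two,three' | awk -F, '{print $2}')

# Outside the quotes, $1 is still the shell's positional parameter.
echo "$1 $second $csv_second"   # shell-arg two two
```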
You really need to see some examples of using awk to appreciate its power. The following few examples use a file called newfiles, which contains a list of files on a system less than 15 days old. This file is generated as part of a system administration audit program that checks various aspects of a UNIX system. The following shows the contents of newfiles:
# cat newfiles
PROG>>>>> report of files not older than 14 days by find
the file system is /
-rw-r--r-- 1 root root 567 Dec 7 07:16 ./etc/mnttab
-rw-r--r-- 1 root root 20713 Dec 7 07:18 ./etc/rc.log
-rw-r--r-- 1 root root 0 Dec 7 07:17 ./etc/hpC2400/hparray.map
-rw-r--r-- 1 root root 0 Dec 7 07:17 ./etc/hpC2400/hparray.devs
-rw-r--r-- 1 root root 0 Dec 7 07:17 ./etc/hpC2400/hparray.luns
-rw-r--r-- 1 root root 0 Dec 7 07:17 ./etc/hpC2400/hparray.addr
-r-s------ 1 root root 0 Dec 7 07:17 ./etc/hpC2400/pscan.lock
-r-s------ 1 root root 0 Dec 7 07:17 ./etc/hpC2400/monitor.lock
-rw-r--r-- 1 root root 14299 Dec 7 07:17 ./etc/hpC2400/HPARRAY.INFO
-rw-r--r-- 1 bin bin 8553 Dec 7 07:02 ./etc/shutdownlog
-rw-r--r-- 1 root mail 32768 Dec 7 07:16 ./etc/mail/aliases.db
-rw-r--r-- 1 root mail 33 Dec 7 07:16 ./etc/mail/sendmail.pid
-rw-r--r-- 1 root root 13 Dec 7 07:16 ./etc/opt/dce/boot_time
-rw-r--r-- 1 root root 720 Dec 7 13:34 ./etc/utmp
-rw-r--r-- 1 root root 0 Dec 7 07:16 ./etc/xtab
-rw-r--r-- 1 root root 0 Dec 7 07:18 ./etc/rmtab
-rw-r--r-- 1 root root 40814 Dec 7 07:15 ./etc/rc.log.old
-rw-r--r-- 1 root root 4620 Dec 7 13:34 ./etc/utmpx
-rw-r--r-- 1 root root 9 Dec 7 13:17 ./etc/ntp.drift
-rw-r--r-- 1 root root 616 Dec 7 07:15 ./etc/auto_parms.log
-rw-r--r-- 1 root sys 219 Dec 7 07:00 ./etc/auto_parms.log.old
-rw-rw-rw- 1 root sys 520 Nov 23 12:37 ./.sw/sessions/swlist.last
-r--r--r-- 1 root informix 76 Dec 7 07:17 ./INFORMIXTMP/.inf.shmPSREP
-r--r--r-- 1 root informix 76 Dec 7 07:18 ./INFORMIXTMP/.inf.shmPSDEV
-rw------- 1 autosys autosys 4052 Nov 25 14:08 ./home/autosys/.sh_history
-rw------- 1 tsaxs users 2228 Dec 1 13:15 ./home/tsaxs/.sh_history
-rw------- 1 tsfxo users 2862 Nov 24 10:08 ./home/tsfxo/.sh_history
PROG>>>>> report of files not older than 14 days by find
the file system is /usr
-rw-rw-rw- 1 opop6 users 21 Dec 7 13:46 ./local/adm/etc/lmonitor.hst
-rw-r--r-- 1 tsgjf users 1093 Dec 7 13:17 ./local/flexlm/licenses/license.log
PROG>>>>> report of files not older than 14 days by find
the file system is /opt
-rw-rw-r-- 1 bin bin 200 Dec 7 07:17 ./pred/bin/OPSDBPF
-rw-r--r-- 1 root sys 800028 Dec 7 07:17 ./pred/bin/PSRNLOGD
PROG>>>>> report of files not older than 14 days by find
the file system is /var
-rw-r--r-- 1 root sys 45089 Dec 7 07:16 ./adm/sw/swagentd.log
-rw-rw-rw- 1 root sys 562 Dec 7 07:16 ./adm/sw/sessions/swlist.last
-rw-rw-r-- 1 root root 12236 Dec 7 07:16 ./adm/ps_data
-rw-r--r-- 1 root root 65 Dec 7 07:17 ./adm/cron/log
-rw-r--r-- 1 root root 162 Dec 7 07:00 ./adm/cron/OLDlog
-r--r--r-- 1 root root 734143 Dec 7 07:16 ./adm/syslog/mail.log
-rw-r--r-- 1 root root 65743 Dec 7 13:56 ./adm/syslog/syslog.log
-rw-r--r-- 1 root root 4924974 Dec 7 07:02 ./adm/syslog/OLDsyslog.log
-rw-rw-r-- 1 adm adm 2750700 Dec 7 13:52 ./adm/wtmp
-rw------- 1 root other 145920 Dec 3 14:36 ./adm/btmp
-rw-r--r-- 1 lp lp 33 Dec 7 07:17 ./adm/lp/log
-rw-r--r-- 1 lp lp 67 Dec 7 07:01 ./adm/lp/oldlog
-rw-r--r-- 1 root root 4330 Dec 7 07:18 ./adm/diag/device_table
-rw-r--r-- 1 root root 34 Dec 7 07:18 ./adm/diag/misc_sys_data
-rwxr-xr-x 1 root root 995368 Nov 22 15:16 ./adm/diag/LOG0190
-rwxr-xr-x 1 root root 995368 Nov 23 02:05 ./adm/diag/LOG0191
-rwxr-xr-x 1 root root 453964 Nov 23 07:01 ./adm/diag/LOG0192
-rwxr-xr-x 1 root root 970448 Nov 23 18:35 ./adm/diag/LOG0193
-rwxr-xr-x 1 root root 995368 Nov 24 05:24 ./adm/diag/LOG0194
-rwxr-xr-x 1 root root 995368 Nov 24 16:14 ./adm/diag/LOG0195
-rwxr-xr-x 1 root root 995368 Nov 25 03:03 ./adm/diag/LOG0196
-rwxr-xr-x 1 root root 995368 Nov 25 13:52 ./adm/diag/LOG0197
-rwxr-xr-x 1 root root 995368 Nov 26 00:41 ./adm/diag/LOG0198
-rwxr-xr-x 1 root root 995368 Nov 26 11:31 ./adm/diag/LOG0199
-rwxr-xr-x 1 root root 995368 Nov 26 22:20 ./adm/diag/LOG0200
-rwxr-xr-x 1 root root 995368 Nov 27 09:09 ./adm/diag/LOG0201
-rwxr-xr-x 1 root root 995368 Nov 27 19:58 ./adm/diag/LOG0202
-rwxr-xr-x 1 root root 995368 Nov 28 06:48 ./adm/diag/LOG0203
-rwxr-xr-x 1 root root 995368 Nov 28 17:37 ./adm/diag/LOG0204
-rwxr-xr-x 1 root root 995368 Nov 29 04:26 ./adm/diag/LOG0205
-rwxr-xr-x 1 root root 995368 Nov 29 15:16 ./adm/diag/LOG0206
-rwxr-xr-x 1 root root 995368 Nov 30 02:05 ./adm/diag/LOG0207
-rwxr-xr-x 1 root root 452020 Nov 30 06:59 ./adm/diag/LOG0208
-rwxr-xr-x 1 root root 970448 Nov 30 18:35 ./adm/diag/LOG0209
-rwxr-xr-x 1 root root 995368 Dec 1 05:24 ./adm/diag/LOG0210
-rwxr-xr-x 1 root root 995368 Dec 1 16:13 ./adm/diag/LOG0211
-rwxr-xr-x 1 root root 995368 Dec 2 03:03 ./adm/diag/LOG0212
-rwxr-xr-x 1 root root 995368 Dec 2 13:52 ./adm/diag/LOG0213
-rwxr-xr-x 1 root root 995368 Dec 3 00:41 ./adm/diag/LOG0214
-rwxr-xr-x 1 root root 995368 Dec 3 11:31 ./adm/diag/LOG0215
-rwxr-xr-x 1 root root 995368 Dec 3 22:20 ./adm/diag/LOG0216
-rwxr-xr-x 1 root root 995368 Dec 4 09:09 ./adm/diag/LOG0217
-rwxr-xr-x 1 root root 995368 Dec 4 19:58 ./adm/diag/LOG0218
-rwxr-xr-x 1 root root 995368 Dec 5 06:48 ./adm/diag/LOG0219
-rwxr-xr-x 1 root root 995368 Dec 5 17:37 ./adm/diag/LOG0220
-rwxr-xr-x 1 root root 995368 Dec 6 04:26 ./adm/diag/LOG0221
-rwxr-xr-x 1 root root 995368 Dec 6 15:15 ./adm/diag/LOG0222
-rwxr-xr-x 1 root root 995368 Dec 7 02:05 ./adm/diag/LOG0223
-rwxr-xr-x 1 root root 453964 Dec 7 07:00 ./adm/diag/LOG0224
-rwxr-xr-x 1 root root 543740 Dec 7 13:57 ./adm/diag/LOG0225
-rw-r--r-- 1 root root 19587 Dec 7 07:16 ./adm/ptydaemonlog
-rw-r--r-- 1 root root 52 Dec 7 07:16 ./adm/conslog.opts
-rw-r--r-- 1 root root 0 Dec 7 07:16 ./adm/rpc.statd.log
-rw-r--r-- 1 root root 0 Dec 7 07:16 ./adm/rpc.lockd.log
-rw-r--r-- 1 root root 24250 Dec 7 07:16 ./adm/vtdaemonlog
-rw------- 1 root root 214 Dec 7 12:07 ./adm/sulog
-rw------- 1 root root 381 Dec 3 17:34 ./adm/OLDsulog
-rw-r--r-- 1 root sys 145 Dec 7 07:16 ./adm/rbootd.log
-rw------- 1 sysadm psoft 60 Dec 1 16:59 ./tmp/EAAa09057
-rw-r--r-- 1 tsgjf users 0 Dec 7 13:17 ./tmp/lockHPCUPLANGS
-rw-r--r-- 1 tsgjf users 175 Dec 7 06:40 ./tmp/.flexlm/lmgrd.1507
-rw-r--r-- 1 tsgjf users 175 Dec 7 13:28 ./tmp/.flexlm/lmgrd.1505
-rw-r--r-- 1 lp lp 0 Dec 7 07:17 ./spool/lp/outputq
-rw-rw-rw- 1 lp lp 4 Dec 7 07:17 ./spool/lp/SCHEDLOCK
-rw------- 1 root sys 0 Nov 23 07:00 ./spool/cron/tmp/croutAAAa01030
-rw------- 1 root sys 0 Nov 30 07:00 ./spool/cron/tmp/croutAAAa01039
-rw------- 1 root sys 0 Dec 7 07:00 ./spool/cron/tmp/croutAAAb01039
-rw-r--r-- 1 root root 4 Dec 7 07:16 ./run/syslog.pid
-rw-r--r-- 1 root root 4 Dec 7 07:16 ./run/gated.pid
-rw-r--r-- 1 root sys 145 Dec 7 07:16 ./run/gated.version
-rw-r--r-- 1 root sys 3 Dec 7 07:16 ./statmon/state
-rw-r--r-- 1 root root 29771 Dec 7 07:16 ./opt/dce/config/dce_config.log
-rw-r--r-- 1 root sys 74 Dec 7 07:16 ./opt/dce/rpc/local/00404/srvr_socks
-rw-r--r-- 1 root root 72 Dec 7 07:16 ./opt/dce/rpc/local/00927/srvr_socks
-rw-r--r-- 1 root root 32768 Dec 7 07:16 ./opt/dce/dced/Ep.db
-rw-r--r-- 1 root root 32768 Dec 7 07:20 ./opt/dce/dced/Llb.db
-rw-r--r-- 1 root root 0 Nov 30 07:16 ./opt/perf/status.ttd
-rw-r--r-- 1 root root 33 Dec 7 07:17 ./opt/perf/datafiles/RUN
-rwxrwxrwx 1 root sys 9243180 Dec 7 13:55 ./opt/perf/datafiles/logappl
-rwxrwxrwx 1 root sys 8697612 Dec 7 13:55 ./opt/perf/datafiles/logdev
-rwxrwxrwx 1 root sys 9195152 Dec 7 13:55 ./opt/perf/datafiles/logglob
-rwxrwxrwx 1 root sys 11112 Dec 7 07:17 ./opt/perf/datafiles/logindx
-rwxrwxrwx 1 root sys 17639080 Dec 7 13:57 ./opt/perf/datafiles/logproc
-rwxrwxrwx 1 root sys 3797 Dec 7 07:17 ./opt/perf/datafiles/mikslp.data
-rw-rw-rw- 1 root sys 105 Nov 30 10:45 ./opt/perf/datafiles/agdb
-rw-r--r-- 1 root root 5 Dec 7 07:17 ./opt/perf/datafiles/.perflbd.pid
-rw-rw-rw- 1 root sys 21176 Dec 7 07:20 ./opt/perf/status.scope
-rw-rw-rw- 1 root root 5 Nov 30 07:16 ./opt/perf/ttd.pid
-rw-r--r-- 1 root root 0 Dec 7 07:17 ./opt/perf/status.mi
-rw-rw-rw- 1 root sys 8254 Dec 7 07:17 ./opt/perf/status.perflbd
-rw-rw-rw- 1 root sys 21507 Dec 7 07:20 ./opt/perf/status.rep_server
-rw-rw-rw- 1 root sys 24570 Dec 7 07:20 ./opt/perf/status.alarmgen
-rw-rw-rw- 1 root sys 160956 Dec 6 21:13 ./opt/omni/log/inet.log
-rw-rw-rw- 1 root sys 158796 Dec 7 07:17 ./sam/log/samlog
-rw-r--r-- 1 root root 64730 Dec 7 07:17 ./sam/boot.config
-rw-rw-rw- 1 root sys 11906 Nov 24 14:27 ./sam/poe.iout
-rw-rw-rw- 1 root sys 11906 Nov 23 09:10 ./sam/poe.iout.old
-rw-rw-rw- 1 root sys 29 Nov 24 14:27 ./sam/poe.dion
You can see that this file contains several fields separated by whitespace. The next example evaluates the third field to determine whether it equals "adm", and if so, the line is printed:
# awk '$3 == "adm" {print}' newfiles
-rw-rw-r-- 1 adm adm 2750700 Dec 7 13:52 ./adm/wtmp
There is precisely one line that contains exactly "adm" in the third field.
The next example uses the ~ operator to determine whether the third field matches the regular expression "adm", meaning that the third field has "adm" embedded anywhere in it, and if so, the line is printed:
# awk '$3 ~ "adm" {print}' newfiles
-rw-rw-r-- 1 adm adm 2750700 Dec 7 13:52 ./adm/wtmp
-rw------- 1 sysadm psoft 60 Dec 1 16:59 ./tmp/EAAa09057
The output includes the line from the last example, which has "adm" in the third field, as well as a line whose third field is "sysadm".
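Because the right-hand side of ~ is a regular expression, the standard anchors ^ and $ can tighten the match. The sketch below, which uses the two matching lines from newfiles as inline sample data, suggests that an anchored match behaves like the == comparison shown earlier; the variable names are illustrative only.

```shell
# The two lines of newfiles whose third field contains "adm".
lines='-rw-rw-r-- 1 adm adm 2750700 Dec 7 13:52 ./adm/wtmp
-rw------- 1 sysadm psoft 60 Dec 1 16:59 ./tmp/EAAa09057'

# Unanchored: matches any third field that contains "adm".
loose=$(printf '%s\n' "$lines" | awk '$3 ~ /adm/ {print $3}')

# Anchored with ^ and $: matches only a third field that is exactly "adm".
strict=$(printf '%s\n' "$lines" | awk '$3 ~ /^adm$/ {print $3}')

printf '%s\n' "$loose"    # adm, then sysadm
printf '%s\n' "$strict"   # adm
```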
The next example performs the same search as the previous example; however, this time only fields nine and five are printed:
# awk '$3 ~ "adm" {print $9, $5}' newfiles
./adm/wtmp 2750700
./tmp/EAAa09057 60
This time only the name of the file, field nine, and the size of the file, field five, were printed.
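Since awk can do arithmetic on fields, the same search can also total the sizes instead of printing them. This is a sketch rather than part of the original audit program: the variable total and the END block, which runs after the last input line, are not covered in this section, and the two input lines are copied from newfiles.

```shell
# The two lines of newfiles whose third field contains "adm".
lines='-rw-rw-r-- 1 adm adm 2750700 Dec 7 13:52 ./adm/wtmp
-rw------- 1 sysadm psoft 60 Dec 1 16:59 ./tmp/EAAa09057'

# Add field five (the size) of each matching line to a running total,
# then print the total once all input has been read.
total=$(printf '%s\n' "$lines" | awk '$3 ~ "adm" {total += $5} END {print total}')

echo "$total"   # 2750760
```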
The next example evaluates the third field to determine if it does not equal "root", and if so, prints the entire line:
# awk '$3 != "root" {print}' newfiles
PROG>>>>> report of files not older than 14 days by find
the file system is /
-rw-r--r-- 1 bin bin 8553 Dec 7 07:02 ./etc/shutdownlog
-rw------- 1 autosys autosys 4052 Nov 25 14:08 ./home/autosys/.sh_history
-rw------- 1 tsaxs users 2228 Dec 1 13:15 ./home/tsaxs/.sh_history
-rw------- 1 tsfxo users 2862 Nov 24 10:08 ./home/tsfxo/.sh_history
PROG>>>>> report of files not older than 14 days by find
the file system is /usr
-rw-rw-rw- 1 opop6 users 21 Dec 7 13:46 ./local/adm/etc/lmonitor.hst
-rw-r--r-- 1 tsgjf users 1093 Dec 7 13:17 ./local/flexlm/licenses/license.log
PROG>>>>> report of files not older than 14 days by find
the file system is /opt
-rw-rw-r-- 1 bin bin 200 Dec 7 07:17 ./pred/bin/OPSDBPF
PROG>>>>> report of files not older than 14 days by find
the file system is /var
-rw-rw-r-- 1 adm adm 2750700 Dec 7 13:52 ./adm/wtmp
-rw-r--r-- 1 lp lp 33 Dec 7 07:17 ./adm/lp/log
-rw-r--r-- 1 lp lp 67 Dec 7 07:01 ./adm/lp/oldlog
-rw------- 1 sysadm psoft 60 Dec 1 16:59 ./tmp/EAAa09057
-rw-r--r-- 1 tsgjf users 0 Dec 7 13:17 ./tmp/lockHPCUPLANGS
-rw-r--r-- 1 tsgjf users 175 Dec 7 06:40 ./tmp/.flexlm/lmgrd.1507
-rw-r--r-- 1 tsgjf users 175 Dec 7 13:28 ./tmp/.flexlm/lmgrd.1505
-rw-r--r-- 1 lp lp 0 Dec 7 07:17 ./spool/lp/outputq
-rw-rw-rw- 1 lp lp 4 Dec 7 07:17 ./spool/lp/SCHEDLOCK
This command prints every line that does not have "root" in the third field. That includes the report header lines, because their third fields ("of" and "system") are not "root" either.
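If only the file entries are wanted, one possible refinement is to combine the comparison with a regular expression that requires the line to begin with a hyphen, as every regular-file entry in newfiles does. The && operator is not covered in this section, so treat the following as a sketch; the four input lines are a small excerpt of newfiles.

```shell
# A small excerpt of newfiles: two header lines and two file entries.
lines='PROG>>>>> report of files not older than 14 days by find
the file system is /
-rw-r--r-- 1 root root 567 Dec 7 07:16 ./etc/mnttab
-rw-r--r-- 1 bin bin 8553 Dec 7 07:02 ./etc/shutdownlog'

# Print a line only if it starts with "-" AND its third field is not "root".
non_root=$(printf '%s\n' "$lines" | awk '/^-/ && $3 != "root" {print}')

printf '%s\n' "$non_root"   # only the ./etc/shutdownlog entry
```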
newfiles had whitespace to separate the fields. We don't often have this luxury in the UNIX world. The upcoming examples use passwd.test, which has a colon (:) as a field separator. passwd.test is shown below:
# cat passwd.test
root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
bin:*:1:1:bin:/bin:
daemon:*:2:2:daemon:/sbin:
adm:*:3:4:adm:/var/adm:
lp:*:4:7:lp:/var/spool/lpd:
sync:*:5:0:sync:/sbin:/bin/sync
shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
halt:*:7:0:halt:/sbin:/sbin/halt
mail:*:8:12:mail:/var/spool/mail:
news:*:9:13:news:/var/spool/news:
uucp:*:10:14:uucp:/var/spool/uucp:
operator:*:11:0:operator:/root:
games:*:12:100:games:/usr/games:
gopher:*:13:30:gopher:/usr/lib/gopher-data:
ftp:*:14:50:FTP User:/home/ftp:
man:*:15:15:Manuals Owner:/:
nobody:*:65534:65534:Nobody:/:/bin/false
col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash
You can specify the field separator with the -F option followed by a separator, which is a colon (:) in passwd.test. The following example specifies the field separator and then evaluates the first field to determine whether it equals "root", and if so, prints out the entire line:
# awk -F: '$1 == "root" {print}' passwd.test
root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
The following example specifies the field separator and then evaluates the fourth field to determine whether it equals "0", which means that the user is a member of the same group as root, and if so, prints out the entire line:
# awk -F: '$4 == "0" {print}' passwd.test
root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
sync:*:5:0:sync:/sbin:/bin/sync
halt:*:7:0:halt:/sbin:/sbin/halt
operator:*:11:0:operator:/root:
The next example prints all users who are in a group with a value less than 14:
# awk -F: '$4 < 14 {print}' passwd.test
root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
bin:*:1:1:bin:/bin:
daemon:*:2:2:daemon:/sbin:
adm:*:3:4:adm:/var/adm:
lp:*:4:7:lp:/var/spool/lpd:
sync:*:5:0:sync:/sbin:/bin/sync
shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
halt:*:7:0:halt:/sbin:/sbin/halt
mail:*:8:12:mail:/var/spool/mail:
news:*:9:13:news:/var/spool/news:
operator:*:11:0:operator:/root:
The next example prints all users who are in a group with a value less than or equal to 14:
# awk -F: '$4 <= 14 {print}' passwd.test
root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
bin:*:1:1:bin:/bin:
daemon:*:2:2:daemon:/sbin:
adm:*:3:4:adm:/var/adm:
lp:*:4:7:lp:/var/spool/lpd:
sync:*:5:0:sync:/sbin:/bin/sync
shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
halt:*:7:0:halt:/sbin:/sbin/halt
mail:*:8:12:mail:/var/spool/mail:
news:*:9:13:news:/var/spool/news:
uucp:*:10:14:uucp:/var/spool/uucp:
operator:*:11:0:operator:/root:
Let's now print all users who are in a group that does not have a value of 14:
# awk -F: '$4 != 14 {print}' passwd.test
root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
bin:*:1:1:bin:/bin:
daemon:*:2:2:daemon:/sbin:
adm:*:3:4:adm:/var/adm:
lp:*:4:7:lp:/var/spool/lpd:
sync:*:5:0:sync:/sbin:/bin/sync
shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
halt:*:7:0:halt:/sbin:/sbin/halt
mail:*:8:12:mail:/var/spool/mail:
news:*:9:13:news:/var/spool/news:
operator:*:11:0:operator:/root:
games:*:12:100:games:/usr/games:
gopher:*:13:30:gopher:/usr/lib/gopher-data:
ftp:*:14:50:FTP User:/home/ftp:
man:*:15:15:Manuals Owner:/:
nobody:*:65534:65534:Nobody:/:/bin/false
col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash
Let's now print all users who are in a group with a value greater than or equal to 14:
# awk -F: '$4 >= 14 {print}' passwd.test
uucp:*:10:14:uucp:/var/spool/uucp:
games:*:12:100:games:/usr/games:
gopher:*:13:30:gopher:/usr/lib/gopher-data:
ftp:*:14:50:FTP User:/home/ftp:
man:*:15:15:Manuals Owner:/:
nobody:*:65534:65534:Nobody:/:/bin/false
col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash
The last example shows all users who are in a group with a value greater than 14:
# awk -F: '$4 > 14 {print}' passwd.test
games:*:12:100:games:/usr/games:
gopher:*:13:30:gopher:/usr/lib/gopher-data:
ftp:*:14:50:FTP User:/home/ftp:
man:*:15:15:Manuals Owner:/:
nobody:*:65534:65534:Nobody:/:/bin/false
col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash
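The techniques from the two sets of examples can be combined. As a sketch, the following prints only the login name and group number (fields one and four) for users in a group with a value greater than 14, using three lines of passwd.test as inline sample data.

```shell
# Three lines taken from passwd.test, with group values 14, 100, and 15.
lines='uucp:*:10:14:uucp:/var/spool/uucp:
games:*:12:100:games:/usr/games:
man:*:15:15:Manuals Owner:/:'

# Use a colon as the field separator, select groups above 14,
# and print only the login name and the group number.
high_groups=$(printf '%s\n' "$lines" | awk -F: '$4 > 14 {print $1, $4}')

printf '%s\n' "$high_groups"   # games 100, then man 15
```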
There is much more to awk than what I covered in this section. There are additional awk examples in the shell programming chapter, Chapter 28.
The following table summarizes some of the comparison operators of awk covered in this section:
awk - Search a line for a specified pattern and perform operation(s).
| Comparison operator | Meaning |
|---|---|
| < | Less than. |
| <= | Less than or equal to. |
| == | Equal to. |
| ~ | Matches a regular expression. |
| != | Not equal to. |
| >= | Greater than or equal to. |
| > | Greater than. |