Chapter 7. Processing Text with awk

Awk is a programming language that can be used to make your shell scripts more powerful, as well as to write independent scripts completely in awk itself. Awk is typically used to perform text-processing operations on data, either through a shell pipe or through operations on files. It's a convenient and clear language that allows for easy report creation, analysis of data and log files, and the performance of otherwise mundane text-processing tasks. Awk has a relatively easy-to-learn syntax. It has also been a standard utility on Unix systems for years, so it is almost certain to be available. If you are a C programmer or have some Perl knowledge, you will find that much of what awk has to offer will be familiar to you. This is not a coincidence, as one of the original authors of awk, Brian Kernighan, is also a co-author of The C Programming Language, the classic book on C. Many programmers would say that Perl owes a lot of its text processing to awk. If programming C scares you, you will find awk to be less daunting, and you will find it easy to accomplish some powerful tasks.

Although there are many complicated awk programs, awk typically isn't used for very long programs but for shorter one-off tasks, such as trimming down the amount of data in a web server's access log to only those entries that you want to count or manipulate, swapping the first two columns in a file, or manipulating comma-separated (CSV) files. This chapter introduces you to the basics of awk, providing an introduction to the following subjects:

  • The different versions of awk and how to install gawk (GNU awk)

  • The basics of how awk works

  • The many ways of invoking awk

  • Different ways to print and format your data

  • Using variables and functions

  • Using control blocks to loop over data

What Is awk (Gawk/Mawk/Nawk/Oawk)?

Awk was first designed by Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan at AT&T Bell Laboratories. (If you take the first letter of each of their last names, you see the origin of awk.) They designed awk in 1977, but awk has changed over the years through many different implementations. Because companies competed, rather than cooperated, in their writing of their implementations of the early Unix operating system, different versions of awk were developed for SYSV Unix compared to those for BSD Unix. Eventually, a POSIX standard was developed, and then a GNU Free Software version was created. Because of all these differing implementations of awk, different systems often have different versions installed.

The many different awks have slightly different names; together, they sound like a gaggle of birds squawking. The most influential and widely available version of awk today is GNU awk, known as gawk for short. Some systems have the original implementation of awk installed, and it is simply referred to as awk. Some systems may have more than one version of awk installed: the new version of awk, called nawk, and the old version available as oawk (for either old awk or original awk). Some systems create a symlink from the awk command to gawk or mawk.

However it is done on your system, it may be confusing and difficult to discern which awk you have. If you don't know which version or implementation you have, it's difficult to know what functionality your awk supports. Writing awk scripts is frustrating if you implement something that is supported in GNU awk, but you have only the old awk installed.

Gawk, the GNU awk

Gawk is commonly considered to be the most popular version of awk available today. Gawk comes from the GNU Project, and in true GNU fashion, it has many enhancements that other versions lack.

The enhancements that gawk has over the traditional awks are too numerous to cover here; however, a few of the most notable follow:

  • Gawk tends to provide you with more informative error messages. Most awk implementations try to tell you on what line a syntax error occurs, but gawk does one better by telling you where in that line it occurs.

  • Gawk has no built-in limits that people sometimes run into when using the other awks to do large batch processing.

  • Gawk also has a number of predefined variables, functions, and commands that make your awk programming much simpler.

  • Gawk has a number of useful flags that can be passed on invocation, including the very pragmatic options that give you the version of gawk you have installed and provide you with a command summary (--version and --help, respectively).

  • Gawk allows you to continue long lines easily by ending a line with a backslash (\).

  • Gawk's regular expression capability is greatly enhanced over the other awks.

  • Gawk implements the POSIX awk standard, but its GNU extensions go beyond that standard. If you require explicit compatibility, you can disable the extensions by invoking gawk with the --traditional or --posix flag. For a full discussion of the GNU extensions to the awk language, see the gawk documentation, specifically Appendix A.5 in the latest manual.

If these features are not enough, the gawk project is very active, with a number of people contributing, whereas mawk has not had a release in several years. Gawk has been ported to a dizzying array of architectures, from Atari, Amiga, and BeOS to Vax/VMS. Gawk is the standard awk that is installed on GNU/Linux and BSD machines.

The additional features, the respect that the GNU Project has earned for making quality free (as in freedom) software, the wide deployment on GNU/Linux systems, and the active development in the project are all probable reasons why gawk has become the favorite over time.

What Version Do I Have Installed?

There is no single test to find out what version or implementation of awk you have installed. You can do a few things to deduce it, or you can install it yourself so you know exactly what is installed. Check your documentation, man pages, and info files to see if you can find a mention of which implementation is referenced, looking out for any mention of oawk, nawk, gawk, or mawk. Also, poke around on your system to find where the awk binary is, and see if there are others installed. It is highly unlikely that you have no version installed, but the hard part is figuring out which version you do have.

Gawk takes the standard GNU version flags to determine what version you are running. If you run awk with these flags as shown in the following Try It Out, and it succeeds, you know that you have GNU awk available.
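The check described above can be sketched as follows. This is only a rough test, assuming that gawk identifies itself with the words GNU Awk in its version banner; the variable name awk_flavor is made up for this sketch. Standard input is redirected from /dev/null so that an awk that does not understand the flag cannot sit waiting for input.

```shell
# Ask awk for its version the GNU way and look for gawk's banner.
# Any failure (unknown flag, different banner) lands in the else branch.
if awk --version </dev/null 2>/dev/null | grep -q 'GNU Awk'; then
    awk_flavor="gawk"
else
    awk_flavor="not gawk (or unknown)"
fi
echo "awk on this system: $awk_flavor"
```

If the answer is not gawk, try running gawk --version directly; gawk may be installed under its own name alongside the system awk.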

Installing gawk

By far the most popular awk is the GNU Project's implementation, gawk. If you find that your system does not have gawk installed, and you wish to install it, follow these steps. If you have a system that gawk has not been ported to, you may need to install a different awk. The known alternatives and where they can be found are listed in the awk FAQ at www.faqs.org/faqs/computer-lang/awk/faq/.

Note

Be careful when putting gawk on your system! Some systems depend on the version of awk that they have installed in /usr/bin, and if you overwrite that with gawk, you may find your system unable to work properly, because some system scripts may have been written for the older implementation. For example, fink for Mac OS X requires the old awk in /usr/bin/awk. If you replace that awk with gawk, fink no longer works properly. The instructions in this section show you how to install gawk without overwriting the existing awk on the system, but you should pay careful attention to this fact!

By far the easiest way to install gawk is to install a prepackaged version, if your operating system provides it. Installation this way is much simpler and easier to maintain. For example, to install gawk on the Debian GNU/Linux OS, type this command:

apt-get install gawk

Mac OS X has gawk available through fink. Fink is a command-line program that you can use to fetch and easily install some useful software that has been ported to OS X. If you don't have fink installed on your system, you can get it at http://fink.sourceforge.net/download/index.php.

If your system does not have packages, or if you want to install gawk on your own, follow these steps:

  1. Obtain the gawk software. The home page for GNU gawk is www.gnu.org/software/gawk/. You can find the latest version of the software at http://ftp.gnu.org/gnu/gawk/. Get the latest .tar.gz from there, and then uncompress and untar it as you would any normal tar:

    $ tar -zxf gawk-3.1.4.tar.gz
    $ cd gawk-3.1.4
  2. Review the README file that is included in the source. Additionally, you need to read the OS-specific README file in the directory README_d for any notes on installing gawk on your specific system.

  3. To configure awk, type the following command:

    $ sh ./configure
    checking for a BSD-compatible install... /usr/bin/install -c
    checking whether build environment is sane... yes
    checking for gawk... gawk
    checking whether make sets $(MAKE)... yes
    checking for gcc... gcc

    This continues to run through the GNU autoconf configuration, analyzing your system for various utilities, variables, and parameters that need to be set or exist on your system before you can compile awk. This can take some time before it finishes. If this succeeds, you can continue with compiling awk itself. If it doesn't, you need to resolve the configuration problem(s) that are presented before proceeding. Autoconf indicates if there is a significant problem with your configuration and requires you to resolve it and rerun ./configure before it can continue. It is not uncommon for autoconf to look for a utility and not find it and then proceed. This does not mean it has failed; it exits with an error if it fails.

  4. To compile awk, issue a make command:

    $ make
          make 'CFLAGS=-g -O2' 'LDFLAGS=-export-dynamic' all-recursive
    make[1]: Entering directory `/home/micah/working/gawk-3.1.4'
    Making all in intl
    make[2]: Entering directory `/home/micah/working/gawk-3.1.4/intl'
    make[2]: Nothing to be done for `all'.
    make[2]: Leaving directory `/home/micah/working/gawk-3.1.4/intl'
    Making all in .

    This command continues to compile awk. It may take a few minutes to compile, depending on your system.

  5. If everything goes as expected, you can install the newly compiled awk simply by issuing the make install command as root:

    $ su
    Password:
    # make install

    Awk is placed in the default locations in your file system. By default, make install installs all the files in /usr/local/bin, /usr/local/lib, and so on. You can specify an installation prefix other than /usr/local using --prefix when running configure; for instance, sh ./configure --prefix=$HOME will make awk so that it installs in your home directory. However, please heed the warning about replacing your system's installed awk, if it has one!

How awk Works

Awk has some basic functionality similarities with sed (see Chapter 6). At its most basic level, awk simply looks at lines that are sent to it, searching them for a pattern that you have specified. If it finds a line that matches the pattern that you have specified, awk does something to that line. That "something" is the action that you specify by your commands. Awk then continues processing the remaining lines until it reaches the end. Sed acts in the same way: It searches for lines and then performs editing commands on the lines that match. The input comes from standard in and is sent to standard out. In this way, awk is stream-oriented, just like sed.

In fact, there are a number of things about awk that are similar to sed. Both are invoked using similar syntax, and both use regular expressions for matching patterns.

Although the similarities exist, there are syntactic differences. When you run awk, you specify the pattern, followed by an action contained in curly braces. A very basic awk program looks like this:

awk '/somedata/ { print $0 }' filename

The rest of this brief section provides just an overview of the basic steps awk follows in processing the command. The following sections fill in the details of each step.

In this example, the expression that awk looks for is somedata. This is enclosed in slashes, and the action to be performed, indicated within the curly braces, is print $0. Awk works by stepping through three stages. The first is what happens before any data is processed; the second is what happens during the data processing loop; and the third is what happens after the data is finished processing. Before any lines are read in to awk and then processed, awk does some preinitialization, which is configurable in your script, by specifying a BEGIN clause. At the end of processing, you can perform any final actions by using an END clause.
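The three stages can be seen in one small program. The following is a minimal sketch; the input lines and the counter n are invented for illustration.

```shell
# Feed three lines to awk; two of them contain the pattern.
out=$(printf 'alpha somedata\nbeta\ngamma somedata\n' | awk '
BEGIN      { print "start" }          # stage 1: runs once, before any input
/somedata/ { n++; print $0 }          # stage 2: runs for each matching line
END        { print "matches: " n }    # stage 3: runs once, after the last line
')
echo "$out"
```

The BEGIN and END clauses each run exactly once, no matter how many lines of input there are; only the middle rule is repeated for every line.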

Invoking awk

You can invoke awk in one of several ways, depending on what you are doing with it. When you become more familiar with awk, you will want to do quick things on the command line, and as things become more complex, you will turn them into awk programs.

The simplest method is to invoke awk on the command line. This is useful if what you are doing is relatively simple, and you just need to do it quickly. Awk can also be invoked this way within small shell scripts. You run an awk program on the command line by typing awk and then the program, followed by the files you want to run the program on:

awk 'program' filename1 filename2

The filename1 and filename2 are not required. You can specify only one input file, or two or more, and awk can even be run without any input files. Or you can pipe the data to awk instead of specifying any input files:

cat filename1 | awk 'program'

The program is enclosed in single quotes to keep the shell from interpreting special characters and to make the program a single argument to awk. The contents of program are the pattern to match, followed by the curly braces, which enclose the action to take on the pattern. The following Try It Out gives you some practice running basic awk programs.
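As a small sketch of both styles, the following runs the same program once against a file and once against piped input. The temporary file and its two lines are made up for the example.

```shell
# Create a throwaway input file.
tmp=$(mktemp)
printf 'one somedata\ntwo\n' > "$tmp"

# Same program, two ways of supplying the input.
from_file=$(awk '/somedata/ { print $0 }' "$tmp")
from_pipe=$(cat "$tmp" | awk '/somedata/ { print $0 }')

echo "$from_file"
echo "$from_pipe"
rm -f "$tmp"
```

Both commands print the same matching line, because awk treats standard input exactly like a named file.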

Your awk commands will soon become longer and longer, and it will be cumbersome to type them on the command line. At some point you will find putting all your commands into a file to be a more useful way of invoking awk. You do this by putting all the awk commands into a file and then invoking awk with the -f flag followed by the file that contains your commands. The following Try It Out demonstrates this way of invoking awk.
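A minimal sketch of the -f style follows. The file names match.awk and data.txt are hypothetical, and the data is a two-line stand-in for a real file.

```shell
dir=$(mktemp -d)
printf 'Albania .al\nAngola .ao\n' > "$dir/data.txt"

# The program file holds exactly what would otherwise go inside the quotes.
cat > "$dir/match.awk" <<'EOF'
/Al/ { print $1 }
EOF

# -f names the program file; the input file follows as usual.
out=$(awk -f "$dir/match.awk" "$dir/data.txt")
echo "$out"
rm -rf "$dir"
```

Because the program lives in its own file, you no longer need to protect it from the shell with single quotes.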

You can also write full awk scripts that run like shell scripts by adding a magic first line (the #! line that names the interpreter) at the top of the file, as in the following Try It Out.
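Here is one way this might look. The location of awk differs between systems, so this sketch builds the #! line from whatever command -v awk reports rather than assuming a fixed path; the script name domains.awk is made up.

```shell
dir=$(mktemp -d)
printf 'Albania .al\nAngola .ao\n' > "$dir/data.txt"

# First line: the interpreter path followed by -f, so that awk reads
# the rest of the file as its program.
printf '#!%s -f\n' "$(command -v awk)" > "$dir/domains.awk"
cat >> "$dir/domains.awk" <<'EOF'
{ print $2 }
EOF

# Make the script executable and run it directly on the data file.
chmod +x "$dir/domains.awk"
out=$("$dir/domains.awk" "$dir/data.txt")
echo "$out"
rm -rf "$dir"
```

When you write such a script by hand, the first line is typically something like #!/usr/bin/awk -f, adjusted to wherever awk is installed on your system.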

The print Command

Earlier, in the section How awk Works, you saw the basic method of using awk to search for a string and then print it. In this section, you learn exactly how that print command works and some more advanced useful incarnations of it.

First, you need some sample data to work with. Say you have a file called countries.txt, and each line in the file contains the following information:

Country   Internet domain   Area in sq. km   Population   Land lines   Cell phones

The beginning of the file has the following contents:

Afghanistan    .af    647500         28513677       33100          12000
Albania        .al    28748          3544808        255000         1100000
Algeria        .dz    2381740        32129324       2199600        1447310
Andorra        .ad    468            69865          35000          23500
Angola         .ao    1246700        10978552       96300          130000

The following command searches the file countries.txt for the string Al and then uses the print command to print the results:

$ awk '/Al/ { print $0 }' countries.txt
Albania         .al     28748   3544808         255000          1100000
Algeria         .dz     2381740 32129324        2199600         1447310

As you can see from this example, slashes surround the regular expression to be searched for, in this case Al, and this matches two lines. Each line that is matched then has the command specified within the curly braces performed on it; in this case print $0 is executed, printing the line.

This isn't very interesting, because you can do this with grep or sed. This is where awk starts to become interesting, because you can very easily say that you want to print only the matching countries' landline and cell phone usage:

$ awk '/Al/ { print $5,$6 }' countries.txt
255000 1100000
2199600 1447310

In this example, the same search pattern was supplied, and for each line that is matched, awk performs the specified actions, in this case, printing the fifth and sixth fields. Awk automatically stores each field in its numerical sequential order. By default, awk defines the fields as any string of printing characters separated by whitespace (spaces or tabs). The special field $0 represents the entire line; this is why when you specified the action print $0, the entire line was printed for each match. Field $1 represents the first field (in our example, Country), $2 represents the second field (Internet domain), and so on.

By default, awk's behavior is to print the entire line, so each of the following lines results in the same output:

awk '/Al/' countries.txt
awk '/Al/ { print $0 }' countries.txt
awk '/Al/ { print }' countries.txt

Although explicitly writing print $0 is not necessary, it does make for good programming practice because you are making it very clear that this is your action instead of using a shortcut.

It is perfectly legal to omit a search pattern from your awk statement. When there is no search pattern provided, awk by default matches all the lines of your input file and performs the action on each line.

For example, the following command prints the number of cell phones in each country:

$ awk '{ print $6 }' countries.txt
12000
1100000
1447310
23500
130000

It prints the sixth field of every line because no search pattern was specified.

You can also insert text anywhere in the command action, as demonstrated in the following Try It Out.

If you want to print a newline as part of your print command, just include the standard newline sequence as part of the string, as in the following Try It Out.
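Both ideas, adding your own text and embedding a newline, can be sketched with a single line of the sample data. The wording of the inserted text is invented for the example.

```shell
line='Albania .al 28748 3544808 255000 1100000'

# Quoted strings can sit between fields; commas put a space between items.
out_text=$(echo "$line" | awk '{ print $1, "has", $6, "cell phones" }')

# \n inside a quoted string starts a new output line.
out_newline=$(echo "$line" | awk '{ print "Country: " $1 "\nCell phones: " $6 }')

echo "$out_text"
echo "$out_newline"
```

Note the difference between the two commands: items separated by commas are printed with a space between them, while items written side by side are joined with nothing in between.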

Using Field Separators

The default field separator in awk is whitespace: any run of spaces or tabs. When you insert blank space by pressing the spacebar or Tab key, awk delineates each word in a line as a different field. However, if the data that you are working with includes spaces within the text itself, you may encounter difficulties.

For example, if you add more countries to your countries.txt file to include some that have spaces in them (such as Dominican Republic), you end up with problems. The following command prints the area of each country in the file:

$ awk '{ print $3 }' countries.txt
647500
28748
2381740
.do
468
1246700

Why is the .do included in the output? Because one of the lines of this file contains this text:

Dominican Republic  .do  8833634     48730  901800  2120400

The country Dominican Republic counts as two fields because it has a space within its name. You need to be very careful that your fields are uniform, or you will end up with ambiguous data like this. There are a number of ways to get around this problem; one of the easiest methods is to specify a unique field separator and format your data accordingly. In this case, you need to format your countries.txt file so that any country that has spaces in its name instead has underscores, so Dominican Republic becomes Dominican_Republic.

Unfortunately, it isn't always practical or possible to change your input data file. In this case, you can invoke awk with the -F flag to specify an alternative field separator character instead of the space. A very common field separator is the comma, so to instruct awk to use a comma as the character that separates fields, invoke awk using -F, to indicate the comma should be used instead. Most databases are able to export their data into CSV (Comma Separated Values) files. If your data is formatted using commas to separate each field, you can specify that field separator to awk on the command line, as the following Try It Out section demonstrates.
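A short sketch of the -F flag at work follows. The input is a hypothetical comma-separated version of two lines from the sample data, piped in rather than read from a file.

```shell
# With -F, the comma becomes the separator, so the space inside
# "Dominican Republic" no longer splits the country name in two.
out=$(printf 'Dominican Republic,.do,8833634,48730,901800,2120400\nAndorra,.ad,468,69865,35000,23500\n' |
    awk -F, '{ print $1, $2 }')
echo "$out"
```

The first field now holds the full country name, spaces and all, and the domain is reliably the second field on every line.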

Using the printf Command

The printf (formatted print) command is a more flexible version of print. If you are familiar with C, you will find the printf command very familiar; it was borrowed from that language. Printf can be used to specify the width of each item printed, to change the output base to use for numbers, to determine how many digits to print after the decimal point, and more. Printf differs from print in two ways: its first argument is a format string that controls how the remaining arguments are output, and it does not add a newline at the end. The printf command works in this format:

printf(<format string>, <arguments>)

The parentheses are optional. With no format codes in the string, printf is almost identical to the basic print command that you have been using so far:

printf("Hi Mom!\n")

The string is the same with the exception of the added \n character sequence, which adds a newline to the end of the string. This doesn't seem very useful, because now you have to add a newline when you didn't before. However, printf has more flexibility because you can specify format codes to control the results of the expressions, as shown in the following Try It Out examples.
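As a brief sketch of format codes, the following uses three common ones borrowed from C: %s for a string, %d for an integer, and %.2f for a floating-point number with two digits after the decimal point. The sentence being printed is invented; the data is the Andorra line from the sample file.

```shell
# Each format code in the string is replaced, in order, by one of the
# arguments that follow it. The third argument is computed on the fly.
out=$(printf 'Andorra .ad 468 69865 35000 23500\n' | awk '
{ printf "%s has %d people; %.2f per sq. km\n", $1, $4, $4 / $3 }')
echo "$out"
```

Remember that the \n at the end of the format string is your responsibility; without it, the next output would continue on the same line.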

Using printf Format Modifiers

These printf format characters are useful for representing your strings and numbers in the way that you expect. You can also add a modifier to your printf format characters to specify how much of the value to print or to format the value with a specified number of spaces.

You can provide an integer before the format character to specify a width that the output would use, as in the following example:

$ awk '{ printf "|%16s|\n", $1 }' countries.txt
|     Afghanistan|
|         Albania|
|         Algeria|

Here, the width 16 was passed to the format modifier %s to make the string the same length in each line of the output. You can left-justify this text by placing a minus sign in front of the number, as follows:

$ awk '{ printf "|%-16s|\n", $1 }' countries.txt
|Afghanistan     |
|Albania         |
|Algeria         |

Use a decimal point followed by a number (a precision) to specify the maximum number of characters to print in a string or the number of digits to print to the right of the decimal point for a floating-point number:

$ awk '{ printf "|%-.4s|\n", $1 }' countries.txt
|Afgh|
|Alba|
|Alge|

Using the sprintf Command

The sprintf function operates exactly like printf, with the same syntax. The only difference is that it assigns its output to a variable (variables are discussed in the next section), rather than printing it to standard out. The following example shows how this works:

$ awk '{ variable = sprintf("|%-.4s|", $1); print variable }' countries.txt
|Afgh|
|Alba|
|Alge|

This assigns the output from the sprintf function to the variable variable and then prints that variable, which results in the same output as if you had used printf.

Using Variables in awk

In Chapter 2, variables were introduced as a mechanism to store values that can be manipulated or read later, and in many ways they operate the same in awk, with some differences in syntax and particular built-in variables. The last section introduced the sprintf command, which assigns its output to a variable. The example in that section was a user-defined variable. Awk also has some predefined, or built-in, variables that can be referenced. The following sections provide more detail on using these two types of variables with awk.

User-Defined Variables

User-defined variables have a few rules associated with them. They must not start with a digit and are case sensitive. Besides these rules, your variables can consist of alphanumeric characters and underscores. A user-defined variable must not conflict with awk's reserved built-in variables or commands. For example, you may not create a user-defined variable called print, because this is an awk command. Unlike some programming languages, variables in awk do not need to be initialized or declared. The first time you use a variable, it is set to an empty string ("") and assigned 0 as its numerical value. However, relying on default values is a bad programming practice and should be avoided. If your awk script is long, define the variables you will be using in the BEGIN block, with the values that you want set as defaults.

Variables are assigned values simply by writing the variable, followed by an equal sign and then the value. Because awk is a weakly typed language, you can assign numbers or strings to variables:

myvariable = 3.141592654
myvariable = "some string"

When you perform a numeric operation on a variable, awk gives you a numerical result; if a string operation is performed, a string will be the result.

In the earlier section on printf, the Try It Out example used format string modifiers to specify columnar widths so that the column header lined up with the data. This format string modifier could be set in a variable instead of having to type it each time, as in the following code:

BEGIN { colfmt="%-15s %20s\n"; printf colfmt, "Country", "Cell phones\n" }
      { printf colfmt, $1, $6 }

In this example, a user-defined variable called colfmt is set, containing the format string specifiers that you want to use in the rest of the script. Once it is defined, you can reference it simply by using the variable; in this case it is referenced twice in the two printf statements.

Built-in Variables

Built-in variables are very useful if you know what they are used for. The following subsections introduce you to some of the most commonly used built-in variables.

Remember, you should not create a user-defined variable that conflicts with any of awk's built-in variables.

The FS Variable

FS is awk's built-in variable that contains the character used to denote separate fields. In the section Using Field Separators you modified this variable on the command line by passing the -F argument to awk with a new field separator value (in that case, you replaced the default field separator value with a comma to parse CSV files). It is often more convenient to put the field separator into your script using awk's built-in FS variable rather than setting it on the command line. This is most useful when the awk script is in a file, rather than on the command line, where specifying flags to awk is not difficult.

To change the field separator within a script you use the special built-in awk variable, FS. To change the field separator variable, you need to assign a new value to it at the beginning of the script. It must be done before any input lines are read, or it will not be effective on every line, so you should set the field separator value in an action controlled by the BEGIN rule, as in the following Try It Out.
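A minimal sketch of setting FS in a BEGIN rule follows, using two invented comma-separated lines as input.

```shell
out=$(printf 'Dominican Republic,.do\nAndorra,.ad\n' | awk '
BEGIN { FS = "," }     # set before the first line is read
      { print $2 }')
echo "$out"
```

Because the assignment happens in BEGIN, the comma is already in effect when awk splits the very first line into fields.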

FS Regular Expressions

The FS variable can contain more than a single character, and when it does, it is interpreted as a regular expression. If you use a regular expression for a field separator, you then have the ability to specify several characters to be used as delimiters, instead of just one, as in the following Try It Out.
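For instance, a bracket expression lets either of two characters act as the delimiter. The input line below, which mixes a comma and a colon, is made up for this sketch.

```shell
# FS holds more than one character, so awk treats it as a regular
# expression: here, "a comma or a colon".
out=$(printf 'Andorra,.ad:468\n' | awk 'BEGIN { FS = "[,:]" } { print $1, $3 }')
echo "$out"
```

The line is split into three fields, Andorra, .ad, and 468, even though two different characters separate them.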

The NR Variable

The built-in variable NR is automatically incremented by awk on each new line it processes. It always contains the number of the current record. This is a useful variable because you can use it to count how many lines are in your data, as in the following Try It Out.
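A quick sketch of NR in action follows, numbering three invented input lines and then reporting the total in an END rule.

```shell
# NR is the current record number inside the main rule; in the END
# rule it still holds the number of the last record read.
out=$(printf 'a\nb\nc\n' | awk '{ print NR ": " $0 } END { print "Total lines: " NR }')
echo "$out"
```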

The following table contains the basic built-in awk variables and what they contain. You will find these very useful as you make awk scripts and you need to make decisions about how your script runs depending on what is happening internally.

Built-in Variable    Contents

ARGC, ARGV           Contains a count and an array of the command-line arguments.
CONVFMT              Controls conversions of numbers to strings; default value is set to %.6g.
ENVIRON              Contains an associative array of the current environment. Array indices are set to environment variable names.
FILENAME             The name of the file that awk is currently reading. Set to - if reading from STDIN; is empty in a BEGIN block.
FNR                  Current record number in the current file, incremented for each line read. Set to 0 each time a new file is read.
FS                   Input field separator; default value is " ", a string containing a single space. Set on command line with flag -F.
NF                   Number of fields in the current input line. NF is set every time a new line is read.
NR                   Number of records processed since the beginning of execution. It is incremented with each new record read.
OFS                  Output field separator; default value is a single space. The contents of this variable are output between fields printed by the print statement.
ORS                  Output record separator; the contents of this variable are output at the end of every print statement. Default value is \n, a newline.
PROCINFO             An array containing information about the running program (a gawk extension). Elements such as "gid", "uid", "pid", and "version" are available.
RS                   Input record separator; default value is a string containing a newline, so an input record is a single line of text.
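A few of these variables can be seen working together in a short sketch. The single input line is taken from the sample data, trimmed to three fields.

```shell
# OFS changes what print puts between items; NF is the number of
# fields, so $NF is always the last field on the line.
out=$(printf 'Andorra .ad 468\n' | awk 'BEGIN { OFS = "," } { print NF, $1, $NF }')
echo "$out"
```

The line has three fields, so NF is 3 and $NF is the same as $3.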

Control Statements

Control statements are statements that control the flow of execution of your awk program. Awk control statements are modeled after similar statements in C, and the looping and iteration concepts are the same as were introduced in Chapter 3. This means you have your standard if, while, for, do, and similar statements.

All control statements contain a control statement keyword, such as if, and then what actions to perform on the different results of the control statement.

if Statements

One of the most important awk decision making statements is the if statement. It follows a standard if (condition) then-action [else else-action] format, as in the following Try It Out.
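A minimal sketch of an if statement with an else branch follows, applied to two lines of the sample data. The 1000 sq. km threshold and the wording of the messages are arbitrary choices for illustration.

```shell
# The third field is the area; compare it against a threshold.
out=$(printf 'Andorra .ad 468 69865 35000 23500\nAngola .ao 1246700 10978552 96300 130000\n' | awk '
{
    if ($3 < 1000)
        print $1, "is under 1000 sq. km"
    else
        print $1, "is 1000 sq. km or more"
}')
echo "$out"
```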

Comparison Operators

These examples use the less than operation, but there are many other operators available for making conditional statements powerful. Another example is the equal comparison operator, which checks to see if something is equal to another thing. For example, the following command looks in the first field on each line for the string Andorra and, if it finds it, prints it:

awk '{ if ($1 == "Andorra") print }' countries.txt

As in C, a relational expression in awk evaluates to 1 if the comparison is true and 0 if it is false.

The following table lists the comparison operators available in awk.

Comparison Operator    Description

<                      Less than
<=                     Less than or equal to
>                      Greater than
>=                     Greater than or equal to
!=                     Not equal
==                     Equal

It is also possible to combine as many comparison operators in one statement as you require by using AND (&&) as well as OR (||) operators. This allows you to test for more than one thing before your control statement is evaluated to be true.

For example:

$ awk '{ if ((($1 == "Andorra") && ($3 <= 500)) || ($1 == "Angola")) print }' countries.txt
Andorra .ad 468 69865 35000 23500
Angola .ao 1246700 10978552 96300 130000

This prints any line whose first field contains the string Andorra and whose third field contains a number that is less than or equal to 500, or any line whose first field contains the string Angola.

As this example illustrates, it helps to surround each condition that you are testing with parentheses. Because the first and second conditions belong together (the first field has to match Andorra and the third field must be less than or equal to 500), the two are enclosed together in additional parentheses. The opening and closing parentheses that surround the entire conditional are required by the if statement.

Arithmetic Functions

The comparison operators are useful for making comparisons, but you often will want to make changes to variables. Awk is able to perform all the standard arithmetic functions on numbers (addition, subtraction, multiplication, and division), as well as modulo (remainder) division, and does so in floating point. The following Try It Out demonstrates some arithmetic functions.
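The operators mentioned above can be sketched in two short commands. The first needs no input at all, so it runs entirely in a BEGIN rule; the second keeps a running total of the cell phone column for two lines of the sample data. The variable name total is made up.

```shell
# Addition, subtraction, multiplication, division, and modulo.
out_ops=$(awk 'BEGIN { print 7 + 2, 7 - 2, 7 * 2, 7 / 2, 7 % 2 }')

# Add up the sixth field across all lines, then print the sum at the end.
out_total=$(printf 'Andorra .ad 468 69865 35000 23500\nAngola .ao 1246700 10978552 96300 130000\n' |
    awk '{ total += $6 } END { print total }')

echo "$out_ops"
echo "$out_total"
```

Note that 7 / 2 gives 3.5 rather than 3, because awk does its arithmetic in floating point.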

Output Redirection

Be careful when using comparison operators, because some of them double as output redirection operators in other contexts. For example, the > character can be used after an awk print statement to send its output to the file you specify. If you do the following:

$ awk 'BEGIN { print 4+5 > "result" }'

you create a file called result in your current working directory and print the sum of 4 + 5 into it. If the file result already exists, it is overwritten, unless you use the append operator (>>) instead, as follows:

$ awk 'BEGIN { print 5+5 >> "result" }'

This appends the sum of 5 + 5 to the end of the result file. If that file doesn't exist, it will be created.
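The two commands can be tried together safely with a sketch like the following. It assumes the mktemp command is available and works in a scratch directory, so that no existing file named result is overwritten; the directory handling is an addition for illustration and is not part of the awk syntax being shown.

```shell
# Assumption: mktemp -d is available to create a scratch directory.
workdir=$(mktemp -d)
(
  cd "$workdir" || exit 1
  awk 'BEGIN { print 4+5 > "result" }'     # creates (or overwrites) result
  awk 'BEGIN { print 5+5 >> "result" }'    # appends a second line
)
contents=$(cat "$workdir/result")
printf '%s\n' "$contents"
rm -rf "$workdir"                          # clean up the scratch directory
```

The file ends up holding two lines, 9 and 10, because the second command appends rather than overwrites.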

Output from print can also be piped into other system commands with the | operator, much as it is on the shell command line. In awk, the command that receives the output is written as a quoted string.
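A minimal sketch of piping from inside awk, assuming the standard sort command is on your path. The three input words are arbitrary sample data.

```shell
# Each first field is sent to a single sort process started by awk.
# close("sort") in END makes awk wait for sort to finish and flush.
sorted=$(printf 'pear\napple\nfig\n' |
  awk '{ print $1 | "sort" }
       END { close("sort") }')
printf '%s\n' "$sorted"
```

The words come back in sorted order: apple, fig, pear. The sort command is started once, the first time the pipe is used, and receives every line printed to it.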

While Loops

While statements in awk implement basic looping logic, using the same concepts introduced in Chapter 3. Loops continually execute statements until a condition is met. A while loop executes the statements that you specify while the condition specified evaluates to true.

While statements have a condition and an action. The condition is the same as the conditions used in if statements. The action is performed as long as the condition tests to be true. The condition is tested; if it is true, the action happens, and then awk loops back and tests the condition again. At some point, unless you have an infinite loop, the condition evaluates to be false, and then the action is not performed and the next statement in your awk program is executed.
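The condition-and-action cycle can be sketched with a simple counter. The variable name i and the limit of 3 are arbitrary choices for illustration.

```shell
loop_out=$(awk 'BEGIN {
    i = 1                  # start the counter
    while (i <= 3) {       # the condition is tested before every pass
        print "pass", i    # the action
        i++                # without this, the loop would never end
    }
}')
printf '%s\n' "$loop_out"
```

This prints pass 1, pass 2, and pass 3. When i reaches 4, the condition is false and awk moves on to whatever follows the loop.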

For Loops

For loops are more flexible and provide a syntax that is easier to use, although they may seem more complex. They achieve the same results as a while loop but are often a better way of expressing it. Check out this example.
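As one possible illustration, the sketch below uses a for loop to walk over every field of an input line, relying on the built-in NF variable for the number of fields. The input line is sample data; the for statement keeps the starting value, the condition, and the increment together in one place.

```shell
# for (start; condition; increment) visits fields 1 through NF.
fields=$(printf 'Andorra .ad 468\n' |
  awk '{ for (i = 1; i <= NF; i++) print i, $i }')
printf '%s\n' "$fields"
```

Each field is printed on its own line, preceded by its position: 1 Andorra, 2 .ad, and 3 468.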

Functions

Awk has some built-in functions that make life as an awk programmer easier. These functions are always available; you don't need to define them or bring in any extra libraries to make them work. A function is called with arguments and returns the results. Functions are useful for things such as performing numeric conversions, finding the length of strings, changing the case of a string, running system commands, printing the current time, and the like.

Different functions have different requirements for how many arguments must be passed in order for them to work. Many have optional arguments that do not need to be included or have defaults that can be set if you desire. If you provide too many arguments to a function, gawk gives you a fatal error, while some awk implementations just ignore the extra arguments.

Functions are called in a standard way: the function name, followed by an opening parenthesis, the arguments to the function, and a closing parenthesis. For example, sin($3) calls the sin function and passes it the argument $3. This function returns the sine of its argument, which it treats as an angle in radians.

Function arguments that are expressions, such as x+y, are evaluated before the function is passed those arguments. The result of x+y is what is passed to the function, rather than "x+y" itself.
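The following sketch calls a handful of built-in functions from a BEGIN block. The values of x and y are arbitrary, and toupper assumes a POSIX-compatible awk such as gawk, mawk, or nawk; very old implementations may lack it.

```shell
funcs=$(awk 'BEGIN {
    x = 7; y = 9                # arbitrary illustration values
    print length("Andorra")     # number of characters: 7
    print toupper("angola")     # ANGOLA
    print int(7 / 2)            # truncates 3.5 to 3
    print sqrt(x + y)           # x + y is evaluated first, so this is sqrt(16)
    print sin(0)                # sine of 0 radians
}')
printf '%s\n' "$funcs"
```

The output is 7, ANGOLA, 3, 4, and 0. The sqrt call shows that the expression x + y is reduced to 16 before the function ever sees it.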

Resources

The following are some good resources on the awk language:

  • You can find the sources to awk at ftp://ftp.gnu.org/pub/gnu/awk.

  • The Awk FAQ has many useful answers to some of the most commonly asked questions. It is available at www.faqs.org/faqs/computer-lang/awk/faq/.

  • The GNU Gawk manual is a very clear and easy-to-understand guide through the language: www.gnu.org/software/gawk/manual/gawk.html.

  • The newsgroup for awk is comp.lang.awk.

Summary

Awk can be complex and overwhelming, but the key to any scripting language is to learn some of the basics and start writing some simple scripts. As you practice, you will become more proficient and faster with writing your scripts. Now that you have a basic understanding of awk, you can dive further into the complexities of the language and use what you know to accomplish whatever it is you need to do in your shell scripts.

In this chapter:

  • You learned what awk is and how it works, all the different versions that are available, and how to tell what version you have installed on your system. You also learned how to compile and install gawk, the most frequently used awk implementation.

  • You learned how awk programs flow, from BEGIN to END, and the many different ways that awk can be invoked: from the command line or by creating independent awk scripts.

  • You learned the basic awk print command and the more advanced printf and sprintf.

  • You learned about different fields, the field separator variable, and different ways to change this to what you need according to your data.

  • You learned about string formatting and format modifier characters, and now you can make nice-looking reports easily.

  • You learned how to create your own variables and about the different built-in variables that are available to query throughout your programs.

  • Control blocks were introduced, and you learned how to use if statements and while, for, and do loops.

  • Arithmetic operators and comparison operators were introduced, as well as different ways to increment and decrement variables.

  • You were briefly introduced to some of awk's standard built-in functions.

Exercises

  1. Pipe your /etc/passwd file to awk, and print out the home directory of each user.

  2. Change the following awk line so that it prints exactly the same but doesn't make use of commas:

    awk '{ print "Number of cell phones in use in",$1":",$6 }' countries.txt

  3. Print nicely formatted column headings for each of the fields in the countries.txt file, using a variable to store your format specifier.

  4. Using the data from the countries.txt file, print the total ratio of cell phones to all the landlines in the world.

  5. Provide a total of all the fields in the countries.txt at the bottom of the output.
