Chapter 6. Command-Line Options and Typed Variables

You should have a healthy grasp of shell programming techniques now that you have gone through the previous chapters. What you have learned up to this point enables you to write many non-trivial, useful shell scripts and functions.

Still, you may have noticed some remaining gaps in the knowledge you need to write shell code that behaves like the UNIX commands you are used to. In particular, if you are an experienced UNIX user, it might have occurred to you that none of the example scripts shown so far have the ability to handle options preceded by a dash (-) on the command line. And if you program in a conventional language like C or Pascal, you will have noticed that the only type of data that we have seen in shell variables is character strings; we haven’t seen how to do arithmetic, for example.

These capabilities are certainly crucial to the shell’s ability to function as a useful UNIX programming language. In this chapter, we will show how bash supports these and related features.

Command-Line Options

We have already seen many examples of the positional parameters (variables called 1, 2, 3, etc.) that the shell uses to store the command-line arguments to a shell script or function when it runs. We have also seen related variables like * (for the string of all arguments) and # (for the number of arguments).

Indeed, these variables hold all of the information on the user’s command line. But consider what happens when options are involved. Typical UNIX commands have the form command [-options] args, meaning that there can be 0 or more options. If a shell script processes the command teatime alice hatter, then $1 is “alice” and $2 is “hatter”. But if the command is teatime -o alice hatter, then $1 is -o, $2 is “alice”, and $3 is “hatter”.

You might think you could write code like this to handle it:

if [ $1 = -o ]; then
    code that processes the -o option
    1=$2
    2=$3
fi
     
normal processing of $1 and $2...

But this code has several problems. First, assignments like 1=$2 are illegal because positional parameters are read-only. Even if they were legal, another problem is that this kind of code imposes limitations on how many arguments the script can handle—which is very unwise. Furthermore, if this command had several possible options, the code to handle all of them would get very messy very quickly.

shift

Luckily, the shell provides a way around this problem. The command shift performs the function of:

1=$2
2=$3
...

for every argument, regardless of how many there are. If you supply a numeric argument to shift, it will shift the arguments that many times over; for example, shift 3 has this effect:

1=$4
2=$5
...

This leads immediately to some code that handles a single option (call it -o) and arbitrarily many arguments:

if [ $1 = -o ]; then
    process the -o option  
    shift
fi
normal processing of arguments...

After the if construct, $1, $2, etc., are set to the correct arguments.
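
As a minimal runnable sketch of this pattern, the following wraps the test in a function so that it can be tried at the prompt. The function names show_args and drop_three, and the variable oflag, are hypothetical and exist only for illustration.

```shell
# Hypothetical helper: notes whether -o was given, then reports the rest.
show_args()
{
    oflag=no
    if [ "$1" = -o ]; then
        oflag=yes
        shift            # discard -o; the old $2 becomes $1, and so on
    fi
    echo "oflag=$oflag count=$# first=$1"
}

show_args -o alice hatter     # prints: oflag=yes count=2 first=alice
show_args alice hatter        # prints: oflag=no count=2 first=alice

# Hypothetical helper: shift N drops N arguments at once.
drop_three()
{
    shift 3
    echo "$1 $2"
}

drop_three a b c d e          # prints: d e
```

Quoting "$1" in the test keeps the comparison valid even when no arguments are given at all.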

We can use shift together with the programming features we have seen so far to implement simple option schemes. However, we will need additional help when things get more complex. The getopts built-in command, which we will introduce later, provides this help.

shift by itself gives us enough power to implement the -N option to the highest script we saw in Chapter 4 (Task 4-1). Recall that this script takes an input file that lists artists and the number of albums you have by them. It sorts the list and prints out the N highest numbers, in descending order. The code that does the actual data processing is:

filename=$1
howmany=${2:-10}
sort -nr $filename | head -$howmany

Our original syntax for calling this script was highest filename [-N], where N defaults to 10 if omitted. Let’s change this to a more conventional UNIX syntax, in which options are given before arguments: highest [-N] filename. Here is how we would write the script with this syntax:

if [ -n "$(echo $1 | grep '^-[0-9][0-9]*$')" ]; then
    howmany=$1
    shift
elif [ -n "$(echo $1 | grep '^-')" ]; then
    echo 'usage: highest [-N] filename'
    exit 1
else
    howmany="-10"
fi
     
filename=$1
sort -nr $filename | head $howmany

This uses the grep search utility to test if $1 matches the appropriate pattern. To do this we provide the regular expression ^-[0-9][0-9]*$ to grep, which is interpreted as “an initial dash followed by a digit, followed by zero or more further digits.” If a match is found then grep will print the matching line and the test will be true; otherwise grep will print nothing and processing will pass to the elif test. Notice that we have enclosed the regular expression in single quotes to stop the shell from interpreting the $ and *, so that they pass through to grep unmodified.

If $1 doesn’t match, we test to see if it’s an option at all, i.e., if it matches the pattern - followed by anything else. If it does, then it’s invalid; we print an error message and exit with error status. If we reach the final (else) case, we assume that $1 is a filename and treat it as such in the ensuing code. The rest of the script processes the data as before.
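
The same three-way decision can also be made without running grep, by using shell patterns in a case statement. This is a different technique from the one in the script above, sketched here under the assumption that the logic lives in a function; the name parse_highest is hypothetical.

```shell
# Hypothetical function: sets howmany and filename from highest-style arguments.
parse_highest()
{
    case $1 in
        -*[!0-9]* | - )          # a dash followed by a non-digit, or a lone dash
            echo 'usage: highest [-N] filename' >&2
            return 1 ;;
        -* )                     # what is left: a dash followed only by digits
            howmany=$1
            shift ;;
        * )                      # no option at all; use the default
            howmany="-10" ;;
    esac
    filename=$1
}

parse_highest -5 albums && echo "$howmany $filename"      # prints: -5 albums
parse_highest albums && echo "$howmany $filename"         # prints: -10 albums
parse_highest -x albums 2>/dev/null || echo "rejected"    # prints: rejected
```

The order of the patterns matters: the invalid forms are caught first, so the plain -* pattern only ever sees a dash followed by digits.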

We can extend what we have learned so far to a general technique for handling multiple options. For the sake of concreteness, assume that our script is called alice and we want to handle the options -a, -b, and -c:

while [ -n "$(echo $1 | grep '^-')" ]; do
    case $1 in 
        -a ) process option -a 
               ;;
        -b ) process option -b 
               ;;
        -c ) process option -c 
               ;;
        *  ) echo 'usage: alice [-a] [-b] [-c] args...'
             exit 1
    esac
    shift
done
normal processing of arguments...

This code checks $1 repeatedly as long as it starts with a dash (-). Then the case construct runs the appropriate code depending on which option $1 is. If the option is invalid—i.e., if it starts with a dash but isn’t -a, -b, or -c—then the script prints a usage message and returns with an error exit status.

After each option is processed, the arguments are shifted over. The result is that the positional parameters are set to the actual arguments when the while loop finishes.

Notice that this code is capable of handling options of arbitrary length, not just one letter (e.g., -adventure instead of -a).

Options with Arguments

We need to add one more ingredient to make option processing really useful. Recall that many commands have options that take their own arguments. For example, the cut command, on which we relied heavily in Chapter 4, accepts the option -d with an argument that determines the field delimiter (if it is not the default TAB). To handle this type of option, we just use another shift when we are processing the option.

Assume that, in our alice script, the option -b requires its own argument. Here is the modified code that will process it:

while [ -n "$(echo $1 | grep '^-')" ]; do
    case $1 in 
        -a ) process option -a ;;
        -b ) process option -b 
               $2 is the option's argument
             shift ;;
        -c ) process option -c ;;
        *  ) echo 'usage: alice [-a] [-b barg] [-c] args...'
             exit 1
    esac
    shift
done
     
normal processing of arguments...
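
To make the skeleton concrete, here is a runnable sketch in which each option simply records that it was seen. The function name alice_opts and the variables aflag, cflag, and barg are hypothetical; printf is used in place of echo only so that an argument such as -n is passed to grep untouched.

```shell
# Hypothetical function standing in for the alice script.
alice_opts()
{
    aflag=no cflag=no barg=''
    while [ -n "$(printf '%s\n' "$1" | grep '^-')" ]; do
        case $1 in
            -a ) aflag=yes ;;
            -b ) barg=$2
                 shift ;;            # extra shift skips the option's argument
            -c ) cflag=yes ;;
            *  ) echo 'usage: alice [-a] [-b barg] [-c] args...' >&2
                 return 1 ;;
        esac
        shift
    done
    echo "a=$aflag b=$barg c=$cflag rest=$*"
}

alice_opts -a -b rabbit queen king    # prints: a=yes b=rabbit c=no rest=queen king
```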

getopts

So far, we have a complete, but constrained, way of handling command-line options. The above code does not allow a user to combine options with a single dash, e.g., -abc instead of -a -b -c. It also doesn’t allow one to specify arguments to options without a space in between, e.g., -barg in addition to -b arg.[1]

The shell provides a built-in way to deal with multiple complex options without these constraints. The built-in command getopts [2] can be used as the condition of the while in an option-processing loop. Given a specification of which options are valid and which require their own arguments, it sets up the body of the loop to process each option in turn.

getopts takes two arguments. The first is a string that can contain letters and colons. Each letter is a valid option; if a letter is followed by a colon, the option requires an argument. getopts picks options off the command line and assigns each one (without the leading dash) to a variable whose name is getopts’s second argument. As long as there are options left to process, getopts will return exit status 0; when the options are exhausted, it returns exit status 1, causing the while loop to exit.

getopts does a few other things that make option processing easier; we’ll encounter them as we examine how to use getopts in this example:

while getopts ":ab:c" opt; do
    case $opt in 
        a  ) process option -a 
               ;;
        b  ) process option -b 
               $OPTARG is the option's argument 
               ;;
        c  ) process option -c 
               ;;
        \? ) echo 'usage: alice [-a] [-b barg] [-c] args...'
             exit 1
    esac
done
shift $(($OPTIND - 1))
normal processing of arguments...

The call to getopts in the while condition sets up the loop to accept the options -a, -b, and -c, and specifies that -b takes an argument. (We will explain the : that starts the option string in a moment.) Each time the loop body is executed, it will have the latest option available, without a dash (-), in the variable opt.

If the user types an invalid option, getopts normally prints an unfortunate error message (of the form cmd: getopts: illegal option -- o) and sets opt to ?. However, if you begin the option letter string with a colon, getopts won’t print the message.[3] We recommend that you specify the colon and provide your own error message in a case that handles ?, as above.

We have modified the code in the case construct to reflect what getopts does. But notice that there are no more shift statements inside the while loop: getopts does not rely on shifts to keep track of where it is. It is unnecessary to shift arguments over until getopts is finished, i.e., until the while loop exits.

If an option has an argument, getopts stores it in the variable OPTARG, which can be used in the code that processes the option.

The one shift statement left is after the while loop. getopts stores in the variable OPTIND the number of the next argument to be processed; in this case, that’s the number of the first (non-option) command-line argument. For example, if the command line were alice -ab rabbit, then $OPTIND would be “3”. If it were alice -a -b rabbit, then $OPTIND would be “4”.

The expression $(($OPTIND - 1)) is an arithmetic expression (as we’ll see later in this chapter) equal to $OPTIND minus 1. This value is used as the argument to shift. The result is that the correct number of arguments are shifted out of the way, leaving the “real” arguments as $1, $2, etc.

Before we continue, now is a good time to summarize everything getopts does:

  1. Its first argument is a string containing all valid option letters. If an option requires an argument, a colon follows its letter in the string. An initial colon causes getopts not to print an error message when the user gives an invalid option.

  2. Its second argument is the name of a variable that will hold each option letter (without any leading dash) as it is processed.

  3. If an option takes an argument, the argument is stored in the variable OPTARG.

  4. The variable OPTIND contains a number equal to the next command-line argument to be processed. After getopts is done, it equals the number of the first “real” argument.

The advantages of getopts are that it minimizes extra code necessary to process options and fully supports the standard UNIX option syntax (as specified in intro of the User’s Manual).
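
The points summarized above can be tried out with a small runnable sketch. The function name alice_getopts and the flag variables are hypothetical. Two details go slightly beyond the text: OPTIND is declared local and reset to 1 so that the function can be called more than once, and the : case relies on the fact that, when the option string starts with a colon, getopts sets opt to : and puts the option letter in OPTARG if a required argument is missing.

```shell
# Hypothetical function standing in for the alice script.
alice_getopts()
{
    local OPTIND=1 opt           # reset getopts state on every call
    local aflag=no cflag=no barg=''
    while getopts ":ab:c" opt; do
        case $opt in
            a  ) aflag=yes ;;
            b  ) barg=$OPTARG ;;
            c  ) cflag=yes ;;
            :  ) echo "option -$OPTARG requires an argument" >&2
                 return 1 ;;
            \? ) echo 'usage: alice [-a] [-b barg] [-c] args...' >&2
                 return 1 ;;
        esac
    done
    shift $(($OPTIND - 1))
    echo "a=$aflag b=$barg c=$cflag rest=$*"
}

alice_getopts -a -b rabbit queen     # prints: a=yes b=rabbit c=no rest=queen
alice_getopts -acbrabbit queen       # prints: a=yes b=rabbit c=yes rest=queen
```

The second call shows both conveniences at once: options combined behind a single dash, and an option argument given without an intervening space.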

As a more concrete example, let’s return to our graphics utility (Task 4-2). So far, we have given our script the ability to process various types of graphics files such as PCX files (ending with .pcx), GIF files (.gif), XPM files (.xpm), etc. As a reminder, here is what we have coded in the script so far:

filename=$1
     
if [ -z $filename ]; then
    echo "procfile: No file specified"
    exit 1
fi
     
for filename in "$@"; do
    pnmfile=${filename%.*}.ppm

    case $filename in
        *.jpg ) exit 0 ;;

        *.tga ) tgatoppm $filename > $pnmfile ;;

        *.xpm ) xpmtoppm $filename > $pnmfile ;;

        *.pcx ) pcxtoppm $filename > $pnmfile ;;

        *.tif ) tifftopnm $filename > $pnmfile ;;

        *.gif ) giftopnm $filename > $pnmfile ;;

            * ) echo "procfile: $filename is an unknown graphics file."
                exit 1 ;;
    esac

    outfile=${pnmfile%.ppm}.new.jpg

    pnmtojpeg $pnmfile > $outfile
    rm $pnmfile

done

This script works quite well, in that it will convert the various different graphics files that we have lying around into JPEG files suitable for our web page. However, NetPBM has a whole range of useful utilities besides file converters that we could use on the images. It would be nice to be able to select some of them from our script.

Things we might wish to do to modify the images include changing the size and placing a border around them. We want to make the script as flexible as possible; we will want to change the size of the resulting images and we might not want a border around every one of them, so we need to be able to specify to the script what it should do. This is where the command-line option processing will come in useful.

We can change the size of an image by using the NetPBM utility pnmscale. You’ll recall from the last chapter that the NetPBM package has its own format called PNM, the Portable Anymap. The fancy utilities we’ll be using to change the size and add borders work on PNMs. Fortunately, our script already converts the various formats we give it into PNMs. Besides a PNM file, pnmscale also requires some arguments telling it how to scale the image.[4] There are various different ways to do this, but the one we’ll choose is -xysize which takes a horizontal and a vertical size in pixels for the final image.[5]

The other utility we need is pnmmargin, which places a colored border around an image. Its arguments are the width of the border in pixels and the color of the border.

Our graphics utility will need some options to reflect the ones we have just seen. -s size will specify a size into which the final image will fit (minus any border), -w width will specify the width of the border around the image, and -c color-name will specify the color of the border.

Here is the code for the script procimage that includes the option processing:

# Set up the defaults
size=320
width=1
colour="-color black"
usage="Usage: $0 [-s N] [-w N] [-c S] imagefile..."

while getopts ":s:w:c:" opt; do
    case $opt in
      s  ) size=$OPTARG ;;
      w  ) width=$OPTARG ;;
      c  ) colour="-color $OPTARG" ;;
      \? ) echo $usage
           exit 1 ;;
    esac
done

shift $(($OPTIND - 1))

if [ -z "$*" ]; then
    echo $usage
    exit 1
fi

# Process the input files
for filename in "$@"; do
    ppmfile=${filename%.*}.ppm

    case $filename in
        *.gif ) giftopnm $filename > $ppmfile ;;

        *.tga ) tgatoppm $filename > $ppmfile ;;

        *.xpm ) xpmtoppm $filename > $ppmfile ;;

        *.pcx ) pcxtoppm $filename > $ppmfile ;;

        *.tif ) tifftopnm $filename > $ppmfile ;;

        *.jpg ) jpegtopnm -quiet $filename > $ppmfile ;;

            * ) echo "$0: Unknown filetype '${filename##*.}'"
                exit 1;;
    esac

    outfile=${ppmfile%.ppm}.new.jpg
    pnmscale -quiet -xysize $size $size $ppmfile |
        pnmmargin $colour $width |
        pnmtojpeg > $outfile

    rm $ppmfile

done

The first several lines of this script initialize variables with default settings. The defaults set the image size to 320 pixels and a black border of width 1 pixel.

The while, getopts, and case constructs process the options in the same way as in the previous example. The code for the first three options assigns the respective argument to a variable (replacing the default value). The last case is a catchall for any invalid options.

The rest of the code works in much the same way as in the previous example except we have added the pnmscale and pnmmargin utilities in a processing pipeline at the end.

The script also now generates a different filename; it appends .new.jpg to the basename. This allows us to process a JPEG file as input, applying scaling and borders, and write it out without destroying the original file.

This version doesn’t address every issue, e.g., what if we don’t want any scaling to be performed? We’ll return to this script and develop it further in the next chapter.

Typed Variables

So far we’ve seen how bash variables can be assigned textual values. Variables can also have other attributes, including being read only and being of type integer.

You can set variable attributes with the declare built-in.[6] Table 6-1 summarizes the available options with declare.[7] A - turns the option on, while + turns it off.

Table 6-1. Declare options

Option    Meaning
-a        The variables are treated as arrays
-f        Use function names only
-F        Display function names without definitions
-i        The variables are treated as integers
-r        Makes the variables read-only
-x        Marks the variables for export via the environment

Typing declare on its own displays the values of all variables in the environment. The -f option limits this display to the function names and definitions currently in the environment. -F limits it further by displaying only the function names.

The -a option declares arrays, a variable type that we haven’t seen yet but will discuss shortly.

The -i option is used to create an integer variable, one that holds numeric values and can be used in and modified by arithmetic operations. Consider this example:

$ val1=12 val2=5
$ result1=val1*val2
$ echo $result1
val1*val2
$
$ declare -i val3=12 val4=5
$ declare -i result2
$ result2=val3*val4
$ echo $result2
60

In the first example, the variables are ordinary shell variables and the result is just the string “val1*val2”. In the second example, all of the variables have been declared as type integer. The variable result2 contains the result of the arithmetic computation twelve multiplied by five. Actually, we didn’t need to declare val3 and val4 as type integer. Anything being assigned to result2 is interpreted as an arithmetic statement and evaluation is attempted.
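
The effect of turning the attribute on with - and off with + can be seen in a short sketch. The variable names n, with_attr, and without_attr are arbitrary.

```shell
declare -i n          # turn the integer attribute on
n=3+4
with_attr=$n
echo $with_attr       # prints: 7

declare +i n          # turn the integer attribute off again
n=3+4
without_attr=$n
echo $without_attr    # prints: 3+4
```

Once the attribute is removed, the same assignment stores the text unchanged, just as with result1 above.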

The -x option to declare operates in the same way as the export built-in that we saw in Chapter 3. It allows the listed variables to be exported outside the current shell environment.

The -r option creates a read-only variable, one that cannot have its value changed by subsequent assignment statements and cannot be unset.

A related built-in is readonly name ... which operates in exactly the same way as declare -r. readonly has three options: -f, which makes readonly interpret the name arguments as function names rather than variable names, -p, which makes the built-in print a list of all read-only names, and -a, which interprets the name arguments as arrays.

Lastly, variables declared in a function are local to that function, just like using local to declare them.
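
A short sketch of the last two points follows. The names limit, bump, and counter are hypothetical. The attempt to change the read-only variable is made in a subshell so that the resulting error stays contained there.

```shell
declare -r limit=10

# The assignment fails inside the subshell, so the || branch runs.
( limit=20 ) 2>/dev/null || echo "limit is read-only, still $limit"

# Hypothetical function: counter is declared inside it, so it is local.
bump()
{
    declare -i counter=5
    counter=counter+1
    echo "inside: $counter"
}

counter=outer
bump                       # prints: inside: 6
echo "outside: $counter"   # prints: outside: outer
```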

Integer Variables and Arithmetic

The expression $(($OPTIND - 1)) in the last graphics utility example shows another way that the shell can do integer arithmetic. As you might guess, the shell interprets words surrounded by $(( and )) as arithmetic expressions.[8] Variables in arithmetic expressions do not need to be preceded by dollar signs, though it is not wrong to do so.

Arithmetic expressions are evaluated inside double quotes, like tildes, variables, and command substitutions. We’re finally in a position to state the definitive rule about quoting strings: when in doubt, enclose a string in single quotes, unless it contains tildes or any expression involving a dollar sign, in which case you should use double quotes.

For example, the date command on modern versions of UNIX accepts arguments that tell it how to format its output. The argument +%j tells it to print the day of the year, i.e., the number of days since December 31st of the previous year.

We can use +%j to print a little holiday anticipation message:

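echo "Only $(( (365-10#$(date +%j)) / 7 )) weeks until the New Year"

The 10# prefix forces the day number to be read in base 10 (base notation is described later in this chapter). date pads the day of the year with leading zeros (e.g., 008), which the shell would otherwise try to read as an octal constant. We’ll show where this fits in the overall scheme of command-line processing in Chapter 7.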

The arithmetic expression feature is built into bash’s syntax, and was available in the Bourne shell (most versions) only through the external command expr. Thus it is yet another example of a desirable feature provided by an external command being better integrated into the shell. getopts, as we have already seen, is another example of this design trend.

bash arithmetic expressions are equivalent to their counterparts in the Java and C languages.[9] Precedence and associativity are the same as in C. Table 6-2 shows the arithmetic operators that are supported. Although some of these are (or contain) special characters, there is no need to backslash-escape them, because they are within the $((...)) syntax.

Table 6-2. Arithmetic operators

Operator  Meaning
++        Increment by one (prefix and postfix)
--        Decrement by one (prefix and postfix)
+         Plus
-         Minus
*         Multiplication
/         Division (with truncation)
%         Remainder
**        Exponentiation[10]
<<        Bit-shift left
>>        Bit-shift right
&         Bitwise and
|         Bitwise or
~         Bitwise not
!         Logical not
^         Bitwise exclusive or
,         Sequential evaluation

[10] Note that ** is not in the C language.

The ++ and -- operators are useful when you want to increment or decrement a value by one.[11] They work the same as in Java and C, e.g., value++ increments value by 1. This is called post-increment; there is also a pre-increment: ++value. The difference becomes evident with an example:

$ i=0
$ echo $i
0
$ echo $((i++))
0
$ echo $i
1
$ echo $((++i))
2
$ echo $i
2

In both cases the value has been incremented by one. However, in the first case (post-increment) the value of the variable was passed to echo and then the variable was incremented. In the second case (pre-increment) the increment was performed and then the variable passed to echo.

Parentheses can be used to group subexpressions. The arithmetic expression syntax also (as in C) supports relational operators as “truth values” of 1 for true and 0 for false. Table 6-3 shows the relational operators and the logical operators that can be used to combine relational expressions.

Table 6-3. Relational operators

Operator  Meaning
<         Less than
>         Greater than
<=        Less than or equal to
>=        Greater than or equal to
==        Equal to
!=        Not equal to
&&        Logical and
||        Logical or

For example, $((3 > 2)) has the value 1; $(( (3 > 2) || (4 <= 1) )) also has the value 1, since at least one of the two subexpressions is true.

The shell also supports base N numbers, where N can be from 2 to 36. The notation B#N means “N base B”. Of course, if you omit the B#, the base defaults to 10.
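
A few values written in the B#N notation, as a quick sketch; the numbers are chosen arbitrarily.

```shell
echo $((2#1010))      # binary 1010; prints: 10
echo $((8#17))        # octal 17; prints: 15
echo $((16#ff))       # hexadecimal ff; prints: 255
echo $((36#z))        # z is the highest digit in base 36; prints: 35
echo $((2#11 + 10))   # bases can be mixed in one expression; prints: 13
```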

Arithmetic Conditionals

In Chapter 5, we saw how to compare strings by the use of [...] notation (or with the test built-in). Arithmetic conditions can also be tested in this way. However, the tests have to be carried out with their own operators. These are shown in Table 6-4.

Table 6-4. Test relational operators

Operator  Meaning
-lt       Less than
-gt       Greater than
-le       Less than or equal to
-ge       Greater than or equal to
-eq       Equal to
-ne       Not equal to

And as with string comparisons, the arithmetic test returns a result of true or false; 0 if true, 1 otherwise. So, for example, [ 3 -gt 2 ] produces exit status 0, as does [ \( 3 -gt 2 \) -o \( 4 -le 1 \) ], but [ \( 3 -gt 2 \) -a \( 4 -le 1 \) ] has exit status 1 since the second subexpression isn’t true.

In these examples we have had to escape the parentheses and pass them to test as separate arguments, and to combine the subexpressions with test’s own -o (or) and -a (and) operators. As you can see, the result can look rather unreadable if there are many parentheses.

Another way to make arithmetic tests is to use the $((...)) form to encapsulate the condition. For example: [ $(((3 > 2) && (4 <= 1))) = 1 ]. This evaluates the conditionals and then compares the resulting value to 1 (true).[12]

There is an even neater and more efficient way of performing an arithmetic test: by using the ((...)) construct.[13] This returns an exit status of 0 if the expression is true, and 1 otherwise.

The above expression using this construct becomes (( (3 > 2) && (4 <= 1) )). This example returns with an exit status of 1 because, as we said, the second subexpression is false.
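
The three styles of arithmetic test described in this section can be compared side by side. In this sketch the variables a, b, c, d and r1, r2, r3 are arbitrary names; all three tests evaluate the same condition.

```shell
a=3 b=2 c=4 d=1

# 1. test with escaped parentheses and -a (logical and)
if [ \( $a -gt $b \) -a \( $c -le $d \) ]; then r1=true; else r1=false; fi

# 2. $((...)) producing a truth value that is compared with 1
if [ $(( (a > b) && (c <= d) )) = 1 ]; then r2=true; else r2=false; fi

# 3. ((...)) used directly as the condition
if (( (a > b) && (c <= d) )); then r3=true; else r3=false; fi

echo "$r1 $r2 $r3"     # prints: false false false

# With || instead of &&, one true subexpression is enough.
if (( (a > b) || (c <= d) )); then r4=true; else r4=false; fi
echo "$r4"             # prints: true
```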

Arithmetic Variables and Assignment

As we saw earlier, you can define integer variables by using declare. You can also evaluate arithmetic expressions and assign them to variables with the use of let. The syntax is:

let intvar=expression

It is not necessary (because it’s actually redundant) to surround the expression with $(( and )) in a let statement. let doesn’t create a variable of type integer; it only causes the expression following the assignment to be interpreted as an arithmetic one. As with any variable assignment, there must not be any space on either side of the equal sign (=). It is good practice to surround expressions with quotes, since many characters are treated as special by the shell (e.g., *, #, and parentheses); furthermore, you must quote expressions that include whitespace (spaces or TABs). See Table 6-5 for examples.

Table 6-5. Sample integer expression assignments

Assignment          Value
let x=              $x
1+4                 5
'1 + 4'             5
'(2+3) * 5'         25
'2 + 3 * 5'         17
'17 / 3'            5
'17 % 3'            2
'1<<4'              16
'48>>3'             6
'17 & 3'            1
'17 | 3'            19
'17 ^ 3'            18

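Here is the code for a script that reports the total disk space used by each directory given as an argument, in bytes and, where appropriate, in kilobytes or megabytes: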

for dir in ${*:-.}; do
    if [ -e $dir ]; then
        result=$(du -s $dir | cut -f 1)
        let total=$result*1024
     
        echo -n "Total for $dir = $total bytes"
     
        if [ $total -ge 1048576 ]; then
              echo " ($((total/1048576)) Mb)"
        elif [ $total -ge 1024 ]; then
              echo " ($((total/1024)) Kb)"
        fi
    fi
done

To obtain the disk usage of files and directories, we can use the UNIX utility du. The default output of du is a list of directories with the amount of space each one uses, and looks something like this:

6       ./toc
3       ./figlist
6       ./tablist
1       ./exlist
1       ./index/idx
22      ./index
39      .

If you don’t specify a directory to du, it will use the current directory (.). Each directory and subdirectory is listed along with the amount of space it uses. The grand total is given in the last line.

The amount of space used by each directory and all the files in it is listed in terms of blocks. Depending on the UNIX system you are running on, one block can represent 512 or 1024 bytes. Each file and directory uses at least one block. Even if a file or directory is empty, it is still allocated a block of space in the filesystem.

In our case, we are only interested in the total usage, given on the last line of du’s output. To obtain only this line, we can use the -s option of du. Once we have the line, we want only the number of blocks and can throw away the directory name. For this we use our old friend cut to extract the first field.

Once we have the total, we can multiply it by the number of bytes in a block (1024 in this case) and print the result in terms of bytes. We then test to see if the total is greater than the number of bytes in one megabyte (1048576 bytes, which is 1024 x 1024) and if it is, we can print how many megabytes it is by dividing the total by this large number. If not, we see if it can be expressed in kilobytes, otherwise nothing is printed.

We need to make sure that any specified directories exist, otherwise du will print an error message and the script will fail. We do this by using the test for file or directory existence (-e) that we saw in Chapter 5 before calling du.

To round out this script, it would be nice to imitate du as closely as possible by providing for multiple arguments. To do this, we wrap the code in a for loop. Notice how parameter substitution has been used to specify the current directory if no arguments are given.
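
The arithmetic at the heart of the script can be tried without running du by isolating it in a function. This is only a sketch: the function name report_size is hypothetical, and it assumes, as the script does, that one block is 1024 bytes.

```shell
# Hypothetical helper: takes a count of 1024-byte blocks, prints a summary.
report_size()
{
    let "total=$1*1024"          # quoted so the shell leaves the * alone
    if [ $total -ge 1048576 ]; then
        echo "$total bytes ($((total/1048576)) Mb)"
    elif [ $total -ge 1024 ]; then
        echo "$total bytes ($((total/1024)) Kb)"
    else
        echo "$total bytes"
    fi
}

report_size 39       # prints: 39936 bytes (39 Kb)
report_size 2048     # prints: 2097152 bytes (2 Mb)
```

The first call uses the grand total from the sample du output shown earlier.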

As a bigger example of integer arithmetic, we will complete our emulation of the pushd and popd functions (Task 4-8). Remember that these functions operate on DIR_STACK, a stack of directories represented as a string with the directory names separated by spaces. bash’s pushd and popd take additional types of arguments, which are:

  • pushd +n takes the nth directory in the stack (starting with 0), rotates it to the top, and cds to it.

  • pushd without arguments, instead of complaining, swaps the two top directories on the stack and cds to the new top.

  • popd +n takes the nth directory in the stack and just deletes it.

The most useful of these features is the ability to get at the nth directory in the stack. Here are the latest versions of both functions:

pushd ( )
{
    dirname=$1
    if [ -n "$dirname" ] && [ \( -d "$dirname" \) -a \
           \( -x "$dirname" \) ]; then
        DIR_STACK="$dirname ${DIR_STACK:-$PWD" "}"
        cd $dirname
        echo "$DIR_STACK"
    else
        echo "still in $PWD."
    fi
}
     
popd ( )
{
    if [ -n "$DIR_STACK" ]; then
        DIR_STACK=${DIR_STACK#* }
     
        cd ${DIR_STACK%% *}
        echo "$PWD"
    else
        echo "stack empty, still in $PWD."
    fi
}

To get at the nth directory, we use a while loop that transfers the top directory to a temporary copy of the stack n times. We’ll put the loop into a function called getNdirs that looks like this:

getNdirs ( )
{
    stackfront=''
    let count=0
    while [ $count -le $1 ]; do
        target=${DIR_STACK%${DIR_STACK#* }}
        stackfront="$stackfront$target"
        DIR_STACK=${DIR_STACK#$target}
        let count=count+1
    done
     
    stackfront=${stackfront%$target}
}

The argument passed to getNdirs is the n in question. The variable target contains the directory currently being moved from DIR_STACK to a temporary stack, stackfront. target will contain the nth directory and stackfront will have all of the directories above (and including) target when the loop finishes. stackfront starts as null; count, which counts the number of loop iterations, starts as 0.

The first line of the loop body copies the first directory on the stack to target. The next line appends target to stackfront and the following line removes target from the stack, using ${DIR_STACK#$target}. The last line increments the counter for the next iteration. The entire loop executes n+1 times, for values of count from 0 to n.
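
To watch the loop at work, getNdirs can be run on a sample stack. The function is repeated from the text, except that count is initialized with a plain assignment, which has the same effect as let count=0. The directory names are arbitrary, and the brackets in the output are there only to make the trailing spaces visible.

```shell
getNdirs()
{
    stackfront=''
    count=0
    while [ $count -le $1 ]; do
        target=${DIR_STACK%${DIR_STACK#* }}
        stackfront="$stackfront$target"
        DIR_STACK=${DIR_STACK#$target}
        let count=count+1
    done

    stackfront=${stackfront%$target}
}

# A sample stack; each name is followed by a space, as pushd builds it.
DIR_STACK="/usr /etc /tmp /var "
getNdirs 2

echo "target=[$target]"            # prints: target=[/tmp ]
echo "stackfront=[$stackfront]"    # prints: stackfront=[/usr /etc ]
echo "DIR_STACK=[$DIR_STACK]"      # prints: DIR_STACK=[/var ]
```

With n equal to 2, directory number 2 (/tmp) ends up in target, the two directories above it in stackfront, and the remainder in DIR_STACK.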

When the loop finishes, the directory in $target is the nth directory. The expression ${stackfront%$target} removes this directory from stackfront so that stackfront will contain only the directories above it, numbers 0 through n-1. Furthermore, DIR_STACK now contains the “back” of the stack, i.e., the directories that were below the nth one. With this in mind, we can now write the code for the improved versions of pushd and popd:

pushd ( )
{
    if [ $(echo $1 | grep '^+[0-9][0-9]*$') ]; then
     
        # case of pushd +n: rotate n-th directory to top
        let num=${1#+}
        getNdirs $num
     
     
        DIR_STACK="$target$stackfront$DIR_STACK"
        cd $target
        echo "$DIR_STACK"
 
    elif [ -z "$1" ]; then
        # case of pushd without args; swap top two directories
        firstdir=${DIR_STACK%% *}
        DIR_STACK=${DIR_STACK#* }
        seconddir=${DIR_STACK%% *}
        DIR_STACK=${DIR_STACK#* }
        DIR_STACK="$seconddir $firstdir $DIR_STACK"
        cd $seconddir
     
    else
        # normal case of pushd dirname
        dirname=$1
        if [ \( -d $dirname \) -a \( -x $dirname \) ]; then
            DIR_STACK="$dirname ${DIR_STACK:-$PWD" "}"
            cd $dirname
            echo "$DIR_STACK"
        else
            echo still in "$PWD."
        fi
    fi
}
     
popd ( )
{
    if [ $(echo $1 | grep '^+[0-9][0-9]*$') ]; then
     
        # case of popd +n: delete n-th directory from stack
        let num=${1#+}
        getNdirs $num
        DIR_STACK="$stackfront$DIR_STACK"
        cd ${DIR_STACK%% *}
        echo "$PWD"
     
    else
     
        # normal case of popd without argument
        if [ -n "$DIR_STACK" ]; then
            DIR_STACK=${DIR_STACK#* }
            cd ${DIR_STACK%% *}
            echo "$PWD"
        else
            echo "stack empty, still in $PWD."
        fi
    fi
}

These functions have grown rather large; let’s look at them in turn. The if at the beginning of pushd checks if the first argument is an option of the form +N. If so, the first body of code is run. The first let simply strips the plus sign (+) from the argument and assigns the result, as an integer, to the variable num. This, in turn, is passed to the getNdirs function.
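To see the option test in isolation, here is a minimal sketch. The function name parse_plus_n is our own invention for illustration and is not part of pushd; it applies the same grep pattern and the same ${1#+} expansion, printing the number if its argument has the form +N and failing otherwise.

```shell
# parse_plus_n is a hypothetical helper, used here only for illustration.
parse_plus_n ()
{
    # grep prints its input only if it is a plus sign followed by digits
    if [ -n "$(echo "$1" | grep '^+[0-9][0-9]*$')" ]; then
        echo "${1#+}"        # strip the leading plus sign
    else
        return 1
    fi
}

parse_plus_n +12                                      # prints 12
parse_plus_n alice || echo "alice is not of the form +N"
```

Quoting the command substitution and testing it with -n keeps the test well formed even when grep prints nothing.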

The next assignment statement sets DIR_STACK to the new ordering of the list. Then the function cds to the new directory and prints the current directory stack.

The elif clause tests for no argument, in which case pushd should swap the top two directories on the stack. The first four lines of this clause assign the top two directories to firstdir and seconddir, and delete these from the stack. Then, as above, the code puts the stack back together in the new order and cds to the new top directory.
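The swap can be tried on its own without changing directories. The following sketch runs the same four pattern-matching assignments on a sample stack; the directory names are made up for the example, and the cd is left out.

```shell
# Sample stack; as in pushd, every entry is followed by a space.
DIR_STACK="/usr /tmp /home "

firstdir=${DIR_STACK%% *}      # longest match of " *" removed from the end: /usr
DIR_STACK=${DIR_STACK#* }      # shortest match of "* " removed from the front
seconddir=${DIR_STACK%% *}     # /tmp
DIR_STACK=${DIR_STACK#* }      # only "/home " is left

DIR_STACK="$seconddir $firstdir $DIR_STACK"
echo "[$DIR_STACK]"            # prints [/tmp /usr /home ]
```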

The else clause corresponds to the usual case, where the user supplies a directory name as argument.

popd works similarly. The if clause checks for the +N option, which in this case means “delete the nth directory.” A let extracts the N as an integer; the getNdirs function puts the first n directories into stackfront. Finally, the stack is put back together with the nth directory missing, and a cd is performed in case the deleted directory was the first in the list.

The else clause covers the usual case, where the user doesn’t supply an argument.
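The effect of the +N forms on the stack string can be traced with a short experiment. The getNdirs body below is our reconstruction from the description given earlier, so treat it as an assumption rather than the definitive listing; the directory names are invented, and no cd is performed.

```shell
# Reconstructed sketch of getNdirs (an assumption based on the prose description).
getNdirs ()
{
    stackfront=''
    count=0
    while [ "$count" -le "$1" ]; do
        target=${DIR_STACK%${DIR_STACK#* }}   # first entry, with its trailing space
        stackfront="$stackfront$target"
        DIR_STACK=${DIR_STACK#$target}
        count=$(( count + 1 ))
    done
    stackfront=${stackfront%$target}          # drop the nth entry again
}

DIR_STACK="/a /b /c /d "
getNdirs 2
echo "target=[$target] stackfront=[$stackfront] rest=[$DIR_STACK]"

rotated="$target$stackfront$DIR_STACK"    # what pushd +2 would store
deleted="$stackfront$DIR_STACK"           # what popd +2 would store
echo "pushd +2 gives [$rotated]"
echo "popd +2 gives  [$deleted]"
```

With this sample stack, the rotated list is "/c /a /b /d " and the list with entry 2 deleted is "/a /b /d ".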

Before we leave this subject, here are a few exercises that should test your understanding of this code:

  1. Implement bash’s dirs command and the options +n and -l. dirs by itself displays the list of currently remembered directories (those in the stack). The +n option prints out the nth directory (starting at 0) and the -l option produces a long listing; any tildes (~) are replaced by the full pathname.

  2. Modify the getNdirs function so that it checks for N exceeding the number of directories in the stack and exits with an appropriate error message if true.

  3. Modify pushd, popd, and getNdirs so that they use variables of type integer in the arithmetic expressions.

  4. Change getNdirs so that it uses cut (with command substitution), instead of the while loop, to extract the first N directories. This uses less code but runs more slowly because of the extra processes generated.

  5. bash’s versions of pushd and popd also have a -N option. In both cases -N causes the nth directory from the right-hand side of the list to have the operation performed on it. As with +N, it starts at 0. Add this functionality.

  6. Use getNdirs to reimplement the selectd function from the last chapter.

Arithmetic for Loops

Chapter 5 introduced the for loop and briefly mentioned another type of for loop, more akin to the construct found in many programming languages like Java and C. This type of for loop is called an arithmetic for loop.[14]

The form of an arithmetic for loop is very similar to those found in Java and C:

for (( initialisation ; ending condition ; update ))
do
        statements...
done

There are four sections to the loop, the first three being arithmetic expressions and the last being a set of statements just as in the standard loop that we saw in the last chapter.

The first expression, initialisation, is evaluated once at the start of the loop. Its value is not tested; it is normally used only for its side effect of setting a variable. The loop then evaluates ending condition. If this is true (non-zero), it executes statements, evaluates update, and repeats the cycle by evaluating ending condition again. The loop continues until ending condition becomes false or the loop is exited via one of the statements.
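One way to make the evaluation order concrete is to write the same computation twice, once as an arithmetic for loop and once as a while loop with the three expressions spelled out by hand. This is only an illustrative sketch; the variable names sum and total are our own. Note that in this sketch i=0 has the value zero and the loop body still runs.

```shell
sum=0
for (( i=0; i <= 5; i++ ))
do
    sum=$(( sum + i ))
done
echo "for loop:   $sum"      # prints for loop:   15

# The same computation with the three expressions written out
total=0
i=0                          # initialisation: evaluated once
while (( i <= 5 ))           # ending condition: tested before every pass
do
    total=$(( total + i ))   # statements
    i=$(( i + 1 ))           # update: evaluated after the statements
done
echo "while loop: $total"    # prints while loop: 15
```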

Usually initialisation is used to set an arithmetic variable to some initial value, update updates that variable, and ending condition tests the variable. Any of the values may be left out in which case they automatically evaluate to true. The following simple example:

for ((;;))
do
        read var
        if [ "$var" = "." ]; then
                break
        fi
done

loops forever reading lines until a line consisting of a . is found. We’ll look at using the expressions in an arithmetic for loop in our next task.

The task is to print a multiplication table for the numbers 1 to 12. It is best accomplished using nested for loops:

for (( i=1; i <= 12 ; i++ ))
do
        for (( j=1 ; j <= 12 ; j++ ))
        do
                echo -ne "$(( j * i ))\t"
        done

        echo
done

The script begins with a for loop using a variable i; the initialisation clause sets i to 1, the ending condition clause tests i against the limit (12 in our case), and the update clause adds 1 to i each time around the loop. The body of the loop is another for loop, this time with a variable called j. This is identical to the i for loop except that j is being updated.

The body of the j loop has an echo statement where the two variables are multiplied together and printed along with a trailing tab. We deliberately don’t print a newline (with the -n option to echo) so that the numbers appear on one line. Once the inner loop has finished a newline is printed so that the set of numbers starts on the next line.
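A possible variation, shown here as a sketch rather than as part of the task solution, wraps the nested loops in a function and uses printf with a fixed field width instead of tabs. The function name table and its size argument are our own additions.

```shell
# table N: print an N by N multiplication table in right-aligned columns.
table ()
{
    local n=$1 i j
    for (( i=1; i <= n; i++ )); do
        for (( j=1; j <= n; j++ )); do
            printf '%4d' "$(( i * j ))"   # four characters per column
        done
        printf '\n'                       # end of row
    done
}

table 3
```

Calling table 12 produces a table of the same size as the one in the task, with every column four characters wide.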

Arithmetic for loops are useful when dealing with arrays, which we’ll now look at.

Arrays

The pushd and popd functions use a string variable to hold a list of directories and manipulate the list with the string pattern-matching operators. Although this is quite efficient for adding or retrieving items at the beginning or end of the string, it becomes cumbersome when attempting to access items that are anywhere else, e.g., obtaining item N with the getNdirs function. It would be nice to be able to specify the number, or index, of the item and retrieve it. Arrays allow us to do this.[15]

An array is like a series of slots that hold values. Each slot is known as an element, and each element can be accessed via a numerical index. An array element can contain a string or a number, and you can use it just like any other variable. The indices for arrays start at 0 and continue up to a very large number.[16] So, for example, the fifth element of array names would be names[4]. Indices can be any valid arithmetic expression that evaluates to a number greater than or equal to 0.

There are several ways to assign values to arrays. The most straightforward way is with an assignment, just like any other variable:

names[2]=alice
names[0]=hatter
names[1]=duchess

This assigns hatter to element 0, duchess to element 1, and alice to element 2 of the array names.

Another way to assign values is with a compound assignment:

names=([2]=alice [0]=hatter [1]=duchess)

This is equivalent to the first example and is convenient for initializing an array with a set of values. Notice that we didn’t have to specify the indices in numerical order. In fact, we don’t even have to supply the indices if we reorder our values slightly:

names=(hatter duchess alice)

bash automatically assigns the values to consecutive elements starting at 0. If we provide an index at some point in the compound assignment, the values get assigned consecutively from that point on, so:

names=(hatter [5]=duchess alice)

assigns hatter to element 0, duchess to element 5, and alice to element 6.

An array is created automatically by any assignment of these forms. To explicitly create an empty array, you can use the -a option to declare. Any attributes that you set for the array with declare (e.g., the read-only attribute) apply to the entire array. For example, the statement declare -ar names would create a read-only array called names. Every element of the array would be read-only.

An element in an array may be referenced with the syntax ${array[i]}. So, from our last example above, the statement echo ${names[5]} would print the string “duchess”. If no index is supplied, array element 0 is assumed.
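The assignment forms described above can be compared side by side. In this sketch the array names a, b, c, and d are arbitrary; the first three end up with identical contents, and the fourth shows consecutive assignment continuing after an explicit index.

```shell
# Form 1: one element at a time
a[2]=alice
a[0]=hatter
a[1]=duchess

# Form 2: compound assignment with explicit indices
b=([2]=alice [0]=hatter [1]=duchess)

# Form 3: compound assignment, indices assigned from 0
c=(hatter duchess alice)

echo "${a[0]} ${a[1]} ${a[2]}"    # prints hatter duchess alice
echo "${b[0]} ${b[1]} ${b[2]}"    # prints hatter duchess alice
echo "${c[0]} ${c[1]} ${c[2]}"    # prints hatter duchess alice

# An index part-way through: later values continue from it
d=(hatter [5]=duchess alice)
echo "${d[0]} ${d[5]} ${d[6]}"    # prints hatter duchess alice
echo "$d"                         # no index means element 0: prints hatter
```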

You can also use the special indices @ and *. These return all of the values in the array and work in the same way as for the positional parameters; when the array reference is within double quotes, using * expands the reference to one word consisting of all the values in the array separated by the first character of the IFS variable, while @ expands the values in the array to separate words. When unquoted, both of them expand the values of the array to separate words. Just as with positional parameters, this is useful for iterating through the values with a for loop:

for i in "${names[@]}"; do
    echo $i
done

Any array elements which are unassigned don’t exist; they default to null strings if you explicitly reference them. Therefore, the previous looping example will print out only the assigned elements in the array names. If there were three values at indexes 1, 45, and 1005, only those three values would be printed.
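The difference between the quoted and unquoted forms is easiest to see when some values contain spaces. In the following sketch the sample values and the helper function count_args are made up for the demonstration; the helper simply reports how many words it received.

```shell
names=("mad hatter" duchess "alice liddell")

count_args ()
{
    echo $#
}

count_args "${names[@]}"     # prints 3: one word per element
count_args "${names[*]}"     # prints 1: a single word
count_args ${names[@]}       # prints 5: unquoted values are split at spaces

# With "*", the elements are joined by the first character of IFS
joined=$(IFS=,; echo "${names[*]}")
echo "$joined"               # prints mad hatter,duchess,alice liddell
```

This is why "${names[@]}" in double quotes is usually the safest form for a for loop over array values.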

If you want to know what indices currently have values in an array then you can use ${!array[@]}. In the last example this would return 1 45 1005.[17]

A useful operator that you can use with arrays is #, the length operator that we saw in Chapter 4. To find out the length of any element in the array, you can use ${#array[i]}. Similarly, to find out how many values there are in the array, use * or @ as the index. So, for names=(hatter [5]=duchess alice), ${#names[5]} has the value 7, and ${#names[@]} has the value 3.
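As a quick check of these operators, the sketch below builds the sparse array from the text and prints its indices, its number of values, and the length of one element. It assumes a version of bash that supports ${!array[@]}.

```shell
names=(hatter [5]=duchess alice)

echo "indices: ${!names[@]}"                # prints indices: 0 5 6
echo "number of values: ${#names[@]}"       # prints number of values: 3
echo "length of element 5: ${#names[5]}"    # prints length of element 5: 7

# Visit only the elements that exist, together with their indices
for i in "${!names[@]}"; do
    echo "names[$i]=${names[i]}"
done
```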

Reassigning to an existing array with a compound array statement replaces the old array with the new one. All of the old values are lost, even if they were at different indices to the new elements. For example, if we reassigned names to be ([100]=tweedledee tweedledum), the values hatter, duchess, and alice would disappear.

You can destroy any element or the entire array by using the unset built-in. If you specify an index, that particular element will be unset. unset names[100], for instance, would remove the value at index 100 (tweedledee in the example above). However, unlike assignment, if you don’t specify an index the entire array is unset, not just element 0. You can explicitly specify unsetting the entire array by using * or @ as the index.
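The following sketch walks through reassignment and unset in sequence. The variables after_reassign and after_unset are our own and exist only to record the intermediate states. The argument to unset is quoted so that the shell cannot treat the square brackets as a filename pattern.

```shell
names=(hatter duchess alice)

# Compound reassignment replaces the whole array
names=([100]=tweedledee tweedledum)
after_reassign="${!names[*]}: ${names[*]}"
echo "$after_reassign"       # prints 100 101: tweedledee tweedledum

# Remove a single element
unset 'names[100]'
after_unset="${!names[*]}: ${names[*]}"
echo "$after_unset"          # prints 101: tweedledum

# Remove the whole array
unset names
echo "${#names[@]}"          # prints 0
```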

Let’s now look at a simple example that uses arrays to match user IDs to account names on the system. The code takes a user ID as an argument and prints the name of the account plus the number of accounts currently on the system:

for i in $(cut -f 1,3 -d: /etc/passwd) ; do
   array[${i#*:}]=${i%:*}
done
     
echo "User ID $1 is ${array[$1]}."
echo "There are currently ${#array[@]} user accounts on the system."

We use cut to create a list from fields 1 and 3 in the /etc/passwd file. Field 1 is the account name and field 3 is the user ID for the account. The script loops through this list using the user ID as an index for each array element and assigns each account name to that element. The script then uses the supplied argument as an index into the array, prints out the value at that index, and prints the number of existing array values.
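To experiment with the same idea without depending on the contents of a real /etc/passwd, the sketch below reads three invented entries from a here-document. It uses while read with IFS set to a colon instead of cut, which is a different technique from the script above; the array name account and the variable lookup are our own.

```shell
declare -a account

# Each line is split at colons: name, password field, user ID, everything else
while IFS=: read -r name pw uid rest; do
    account[uid]=$name
done <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice Liddell:/home/alice:/bin/bash
EOF

lookup=1000
echo "User ID $lookup is ${account[lookup]}."
echo "There are ${#account[@]} accounts in the sample data."
```

Because the user IDs are used directly as indices, the array is sparse: it holds three values even though its highest index is 1000.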

We’ll now look at combining our knowledge of arrays with arithmetic for loops in the next task:

Selection sort is a common algorithm for sorting a set of elements. While it isn’t the quickest sorting algorithm available, it is easy to understand and implement.

It works by selecting the smallest element in the set and moving it to the head of the set. It then repeats the process for the remainder of the set until the end of the set is reached.

For example, to sort the set 21543 it would start at 2 and then move down the set. 1 is less than 2 (and the other elements) so 1 is moved to the start: 12543. Then looking at 2 and moving down the list it finds nothing less than 2 so it moves to the next element, 5. Moving down the list 4 is less than 5, but 3 is less than 4, so 3 is moved: 12354. The next element is 5, and 4 is less than this so 4 is moved: 12345. Five is the last element so the sort is finished.

The code for this is as follows:

values=(39 5 36 12 9 3 2 30 4 18 22 1 28 25)
numvalues=${#values[@]}

for (( i=0; i < numvalues; i++ )); do
  lowest=$i

  for (( j=i; j < numvalues; j++ )); do
    if [ ${values[j]} -le ${values[$lowest]} ]; then
      lowest=$j
    fi
  done

  temp=${values[i]}
  values[i]=${values[lowest]}
  values[lowest]=$temp
done

for (( i=0; i < numvalues; i++ )); do
  echo -ne "${values[$i]}\t"
done

echo

At the start of the script we set up an array of randomly ordered values and a variable to hold the number of array elements as a convenience.

The outer i for loop is for looping over the entire array and pointing to the current “head” (where we put any value we need to swap). The variable lowest is set to this index.

The inner j loop is for looping over the remainder of the array. It compares the remaining elements with the value at lowest; if a value is less than or equal to it, lowest is set to the index of that element.

Once the inner loop is finished the values of the “head” (i) element and lowest are swapped by using a temporary variable temp.

On completing the outer loop, the script prints out the sorted array elements.
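The same algorithm can be packaged as a function so that it is easy to try on different inputs. This is a sketch under our own assumptions: the function name selection_sort and the result array sorted are invented for the example, and the comparison uses the ((...)) arithmetic test with a strict less-than instead of [ ... -le ... ].

```shell
# selection_sort N1 N2 ...: leaves the sorted integers in the global array "sorted".
selection_sort ()
{
    sorted=("$@")
    local n=${#sorted[@]} i j lowest temp

    for (( i=0; i < n; i++ )); do
        lowest=$i
        for (( j=i+1; j < n; j++ )); do
            if (( sorted[j] < sorted[lowest] )); then
                lowest=$j
            fi
        done

        # Swap the head element with the lowest remaining element
        temp=${sorted[i]}
        sorted[i]=${sorted[lowest]}
        sorted[lowest]=$temp
    done
}

selection_sort 2 1 5 4 3
echo "${sorted[*]}"          # prints 1 2 3 4 5
```

Inside ((...)), array elements can be referenced by name and index without the ${...} syntax, which keeps the comparison short.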

Note that some of the environment variables in bash are arrays: DIRSTACK functions as a stack for the pushd and popd built-ins, BASH_VERSINFO is an array of version information for the current instance of the shell, and PIPESTATUS is an array of exit status values for the last foreground pipeline that was executed.
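Two of these arrays are easy to inspect. In the sketch below, the array status is our own; PIPESTATUS is copied into it immediately because the shell overwrites PIPESTATUS after every command.

```shell
# Element 0 of BASH_VERSINFO is the major version number of the running shell
echo "bash major version: ${BASH_VERSINFO[0]}"

# A three-command pipeline whose middle command fails
true | false | true
status=("${PIPESTATUS[@]}")      # copy at once, before it is overwritten

echo "exit statuses: ${status[*]}"   # prints exit statuses: 0 1 0
```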

We’ll see a further use of arrays when we build a bash debugger in Chapter 9.

To end this chapter, here are some problems relating to what we’ve just covered:

  1. Improve the account ID script so that it checks whether the argument is a number. Also, add a test to print an appropriate message if the user ID doesn’t exist.

  2. Make the script print out the username (field 5) as well. Hint: this isn’t as easy as it sounds. A username can have spaces in it, causing the for loop to iterate on each part of the name.

  3. As mentioned earlier, the built-in versions of pushd and popd use an array to implement the stack. Change the pushd, popd, and getNdirs code that we developed in this chapter so that it uses arrays.

  4. Change the selection sort in the last task into a bubble sort. A bubble sort works by iterating over the list comparing pairs of elements and swapping them if they are in incorrect order. It then repeats the process from the start of the list and continues until the list is traversed with no swaps.



[1] Although most UNIX commands allow this, it is actually contrary to the Command Syntax Standard Rules in intro of the User’s Manual.

[2] getopts replaces the external command getopt, used in Bourne shell programming; getopts is better integrated into the shell’s syntax and runs more efficiently. C programmers will recognize getopts as very similar to the standard library routine getopt.

[3] You can also turn off the getopts messages by setting the environment variable OPTERR to 0. We will continue to use the colon method in this book.

[4] We’ll also need the -quiet option, which suppresses diagnostic output from some NetPBM utilities.

[5] Actually, -xysize fits the image into a box defined by its arguments without changing the aspect ratio of the image, i.e., without stretching the image horizontally or vertically. For example, if you had an image of size 200 by 100 pixels and you processed it with pnmscale -xysize 100 100, you’d end up with an image of size 100 by 50 pixels.

[6] The typeset built-in is synonymous with declare but is considered obsolete.

[7] The -a and -F options are not available in bash prior to version 2.0.

[8] You can also use the older form $[...], but we don’t recommend this because it will be phased out in future versions of bash.

[9] The assignment forms of these operators are also permitted. For example, $((x += 2)) adds 2 to x and stores the result back in x.

[11] ++ and -- are not available in versions of bash prior to 2.04.

[12] Note that the truth values returned by $((...)) are 1 for true, 0 for false—the reverse of the test and exit statuses.

[13] ((...)) is not available in versions of bash prior to 2.0.

[14] Versions of bash prior to 2.04 do not have this type of loop.

[15] Support for arrays is not available in versions of bash prior to 2.0.

[16] Actually, up to 599147937791. That’s almost six hundred billion, so yes, it’s pretty large.

[17] This is not available in versions of bash prior to 3.0.
