Chapter 10. Shell Scripting Functions

In writing shell scripts, you will often find yourself repeating the same code over and over again. Repeatedly typing the same code can be tiring and can lead to errors. This is where shell scripting functions should be used. Shell functions are used to simplify your shell scripts, making them easier to read and maintain.

Shell functions are like a magic box: You throw some things into it, it begins to shake and glow with a holy aura, and then out pops your data, magically changed. The magic that is performed on your data is a set of common operations that you have encapsulated into the function and given a name. A function is simply a way of taking a group of commands and putting a name on them. The bash man page describes functions as storing "a series of commands for later execution. When the name of a shell function is used as a simple command name, the list of commands associated with that function name is executed."

Some other programming languages call functions subroutines. In essence, functions are self-contained shell scripts with their own exit codes and arguments. The main difference is that they run within your current shell script. This means that a single instance of the shell runs everything, rather than a new instance of the shell being spawned for each function. Instead of defining functions, you could put the same code into separate shell scripts, in separate files, and then run those scripts from within your shell script. However, this means you have to maintain a number of individual files, and that can get messy.

This chapter covers the following topics:

  • Defining and using functions

  • Using arguments and returning data from functions

  • Function variable scope

  • Understanding recursion

Defining Functions

The syntax for defining functions is not complex. Functions just need to be named and have a list of commands defined in the body. Choose function names that are clear descriptions of what the function does and short enough that they are useful. In bash, a function is defined as follows:

name () { commandlist; }

This definition is very dry, but it illustrates the syntax of the most basic function definition. The name of the function is name. In this form of definition, the name is followed by a required set of parentheses that marks it as a function. Then a set of commands follows, enclosed in curly braces, with each command, including the last, terminated by a semicolon. The space immediately following the opening curly brace is mandatory, because { is a reserved word that must be separated from the first command; leave it out and a syntax error will be generated.
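As a minimal sketch of this one-line form, the following script defines a two-command function and then calls it. The function name greet is hypothetical, chosen only for illustration.

```shell
#!/bin/bash
# One-line definition: a space follows "{", and every command,
# including the last one, ends with ";" before the closing "}".
greet () { echo "Hello,"; echo "world"; }

# Calling the function runs both commands in order.
greet
```

Running the script prints Hello, and world on two separate lines.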

The curly braces surround what is known as a block of code, sometimes referred to as the body of the function. A block of code combines several different commands into one unit. Anything that is contained in a block of code is executed as one unit. Blocks of code are valid shell scripting constructs outside of functions.

For example, the following is valid bash syntax defining two distinct blocks of code:

$ { ls -l; df -h; } ; { df -h; ls -l; }

If you were to type this rather useless bit of shell code into the shell and run it, you would find that the first block of code has both its commands executed in order, and then the second block of code has its two commands executed in order.

Blocks of code behave like anonymous functions: they have no name, and because they run in the current shell, any variable you set inside a block of code remains visible after the block finishes. So if you set a value to a variable in a block of code, it can be referenced outside of that block of code:

$ { a=1; }
$ echo $a
1

Blocks of code are not functions because they have no names, so there is no way to call them again later. They are useful for combining sequences of commands, but they cannot be reused without retyping the block of code.

Adding Names to Blocks of Code

A function is simply a block of code with a name. When you give a name to a block of code, you can then call that name in your script, and that block of code will be executed.

You can see how functions work by defining a basic function in the shell.

If you put more than one command in your function's block of code, separate each command with a semicolon, and, when the whole function is written on a single line, end the list of commands with a final semicolon before the closing brace. For example, the following function places three separate commands in the code block:

$ diskusage () { df; df -h ; du -sch ; }
$

When you print out the function in the shell using the declare shell built-in command, you will see how multiple commands look when they have been formatted:

$ declare -f diskusage
diskusage ()
{
    df;
    df -h;
    du -sch
}

You can declare a function on the command line using the shell's multiline input capability.
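At an interactive prompt, after you type the first line of a definition such as diskusage () { and press Enter, bash shows its secondary prompt (> by default) until you type the closing brace. The same multiline layout works in a script, as in this sketch; the commands inside diskusage are illustrative choices, not the only sensible ones.

```shell
#!/bin/bash
# Multiline definition: newlines separate the commands, so no
# semicolons are needed inside the body.
diskusage () {
    df -h
    du -sh .
}

# declare -F prints the function's name and succeeds only if
# a function with that name is currently defined.
declare -F diskusage
```

Using declare -F in this way is a quick test of whether a function is defined without actually running it.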

Function Declaration Errors

It is easy to declare and use functions incorrectly. Because these mistakes are so common, it is good to know the most frequent syntax mistakes and the errors they produce so that you can recognize them and fix them.

If you forget to include the parentheses in your function declaration, the error you receive will not mention them; instead, bash reports a syntax error about the unexpected closing curly brace.
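One way to see this error without breaking your own script is to hand the faulty definition to a child bash, as in the sketch below. The exact wording of the message may vary between bash versions, but it should point at the unexpected } rather than at the missing parentheses.

```shell
#!/bin/bash
# A definition with the parentheses left out.
bad_definition='diskusage { df; }'

# Run it in a child bash so the syntax error does not stop this script;
# 2>&1 captures the error message along with any normal output.
status=0
error_message=$(bash -c "$bad_definition" 2>&1) || status=$?

echo "exit status: $status"
echo "$error_message"
```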

Using Functions

Using a function that you have declared is as simple as running a command in the shell: you use the name of the function as the command.
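Because a function is called exactly like a command, it can also appear anywhere a command can, such as in a pipeline. In this sketch, the function list_fruit is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print three fruit names, one per line.
list_fruit () { echo banana; echo apple; echo cherry; }

# Call it by name, exactly like any other command...
list_fruit

# ...including as part of a pipeline, where its output feeds sort.
list_fruit | sort
```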

Declaring before Use

When you define a function, the commands that are in the block of code are not executed. The shell does parse the list of commands to verify that the syntax is valid, and if so, it stores the name of the function as a valid command.

As demonstrated in the previous section, the shell must have the function name stored before it can be called, or there will be an error. This means that a function must be known by a shell script before it can be used; otherwise, it is an unknown command. You should always make sure that your functions are declared early in your shell scripts so that they are useful throughout the rest of your scripts. The following Try It Out shows what happens when you try to call a function before declaring it.
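The following sketch (with a hypothetical function say_hello) shows the difference in a script: the first call happens before bash has read the definition, so it fails with "command not found" and the exit status 127 that bash uses for unknown commands.

```shell
#!/bin/bash
# Calling a function before its definition has been read fails.
status_before=0
say_hello 2> /dev/null || status_before=$?
echo "status before definition: $status_before"

say_hello () { echo "Hello"; }

# Once the definition has been read, the same call succeeds.
say_hello
status_after=$?
echo "status after definition: $status_after"
```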

It is good practice to declare all of your functions at the beginning of your shell script so that they are all in one central place and can be found easily later. If you realize halfway through a long shell script that you need a function and declare it there, and then use it afterward throughout the script, it will not cause any technical problem, but this practice makes for code that tends toward tangled spaghetti. Such code is hard to understand, hard to maintain, and more likely to contain bugs than the corresponding cleaner code.

It is instructive to note that if you declare a function within the declaration of another function, the inner function will not be defined until the outer function is called. It is better to avoid this headache and keep each function as an entirely separate unit.
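This sketch, using the hypothetical names outer and inner, makes the timing visible: inner does not exist until outer has run once.

```shell
#!/bin/bash
# The definition of "inner" lives inside the body of "outer",
# so it is not read as a definition until "outer" actually runs.
outer () {
    inner () { echo "inner is now defined"; }
}

if declare -F inner > /dev/null; then
    echo "inner is defined"
else
    echo "inner is not defined yet"
fi

outer   # running outer defines inner as a side effect
inner
```

The script prints "inner is not defined yet" followed by "inner is now defined".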

Although you do not want to define functions inside of functions, it is not uncommon to call a function from within another function, as in the following example.
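As a sketch of one function calling others (all three names are hypothetical), note that print_header and print_footer only need to be defined by the time report is called, not when report itself is defined.

```shell
#!/bin/bash
# Two small hypothetical helpers...
print_header () { echo "=== $1 ==="; }
print_footer () { echo "=== end ==="; }

# ...and a function that calls both of them.
report () {
    print_header "Report"
    echo "body text"
    print_footer
}

report
```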

Function Files

If you are writing a shell script that is long, I hope you will find yourself abstracting many aspects of your script into functions so that you may reuse your code rather than rewrite your code. Putting your functions at the beginning of your script is good practice; however, if the number of functions that you have defined becomes so large that your actual script doesn't start for pages and pages, you should consider putting all your functions into a function file.

A function file simply contains all of your functions, rather than putting them in your main script. To create a function file, remove your functions from your main script, and put them in a separate file. You must also add a line into your main script to load these functions; otherwise, they will not be known to the main script. To load these functions from your function file, you would replace the functions in your main script with the following line:

source function_file

The bash built-in command source reads and executes whatever file you specify; in this case, the file you are specifying is function_file. The name of this file is up to you. Because function_file contains only functions, bash simply loads all of them into memory and makes them available to the main script. (If you have commands outside of functions in this file, they are also run.) If you want to decrease the legibility of your shell script by taking a shortcut, you can substitute a period (.) for source; the period does the same thing as source but is much harder to notice. It is better to spell out explicitly what you are doing by using source, which keeps your code readable. (The period is, however, the form defined by POSIX, so it is the one to use in scripts that must also run under shells other than bash.)
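The sketch below creates a throwaway function file in a temporary directory, sources it by its full path, and calls the function it defines. The file contents and the greet function are hypothetical; in practice function_file would be a file you maintain alongside your scripts.

```shell
#!/bin/bash
# Create a throwaway directory holding a hypothetical function file.
work_dir=$(mktemp -d)
cat > "$work_dir/function_file" <<'EOF'
# function_file: contains only function definitions
greet () { echo "Hello from function_file, $1"; }
EOF

# Load the definitions into the current shell; ". file" is equivalent.
source "$work_dir/function_file"

greet "world"

rm -r "$work_dir"   # clean up the temporary directory
```

Once the file has been sourced, its functions stay defined in the current shell even after the file itself is deleted.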

When abstracting your functions into a function file, you should consider a number of things. One important consideration is where in the file system your function file is located. In the preceding example, no path was specified, so bash first searches the directories in your PATH for function_file and then, when it is not running in POSIX mode, looks in the current working directory. The current working directory is wherever the script was started from, which is not necessarily the directory where the main script is located, so the script can fail to find its functions when it is run from somewhere else. If you wish to put your functions in a specific location, specify the path to function_file explicitly. This brings up another consideration: namely, that now you must manage multiple files associated with your one script. If these are worthy tradeoffs, then it makes sense to put your functions into a separate file; otherwise, it may be wise to leave them in the script itself.
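One common way to make the location independent of the caller's working directory is to build the path from the directory that contains the script, which bash exposes through the BASH_SOURCE array. This is a sketch, assuming the function file is kept next to the script under the hypothetical name function_file.

```shell
#!/bin/bash
# Directory that contains this script, independent of the caller's
# current working directory.
script_dir=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)

# Hypothetical function file kept next to the script.
functions_file="$script_dir/function_file"

if [[ -r "$functions_file" ]]; then
    source "$functions_file"
else
    echo "warning: cannot read $functions_file" >&2
fi
```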

Putting your functions into a function file makes these functions available to other scripts. You can write useful functions that you may want to reuse in the future, and instead of copying and pasting the functions from one script to another, you can simply reference the appropriate function files. Functions do not have to be associated with a particular script; they can be written to be completely atomic so that they are useful for as many scripts as possible.

Common Usage Errors

A common problem when invoking functions is including the parentheses when you shouldn't. You include the parentheses only when you are defining the function itself, not when you are using it. In the following Try It Out, you see what happens when you try to invoke a function using parentheses.
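In this sketch (greet is a hypothetical function), the correct call uses the name alone. The incorrect call is run in a child bash: there, greet () looks like the start of a new definition, so bash expects a function body, reaches the end of the input instead, and exits with a syntax error.

```shell
#!/bin/bash
greet () { echo "Hello"; }

# Correct: call the function by name alone.
greet

# Incorrect: parentheses on the call. Run in a child bash so the
# resulting syntax error does not stop this script.
status=0
bash -c 'greet () { echo "Hello"; }; greet ()' 2> /dev/null || status=$?
echo "exit status of the bad call: $status"
```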

Undeclaring Functions

If you have defined a function, but you no longer want to have that function defined, you can undeclare the function using the unset command, as in the following example.
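The -f option tells unset that the name refers to a function. Without options, unset first looks for a variable of that name and removes the function only if no such variable exists, so -f is the safer choice. A minimal sketch with a hypothetical greet function:

```shell
#!/bin/bash
greet () { echo "Hello"; }
greet

# -f tells unset to remove a function rather than a variable.
unset -f greet

if declare -F greet > /dev/null; then
    echo "greet is still defined"
else
    echo "greet has been undeclared"
fi
```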

Using Arguments with Functions

After functions have been declared, you effectively use them as if they were regular commands. Most regular Unix commands can take various arguments to change their behavior or to pass specific data to the command. In the same way that you can pass arguments to commands, you can use arguments when you execute functions. When you pass arguments to a function, the shell treats them in the same way that positional parameter arguments are treated when they are passed to commands or to shell scripts.

The individual arguments that are passed to functions are referenced as the numerical variables $1, $2, and so on. The number of arguments is available as $#, and the full list of arguments as $@. This is no different from how shell scripts themselves handle arguments.
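The following sketch uses a hypothetical function, show_args, that reports on the arguments it receives. Quoting "$@" keeps an argument that contains spaces together as a single word.

```shell
#!/bin/bash
# Hypothetical function that reports on the arguments it receives.
show_args () {
    echo "number of arguments: $#"
    echo "first argument: $1"
    echo "second argument: $2"
    # "$@" expands to every argument, each kept as a separate word.
    for arg in "$@"; do
        echo "argument: $arg"
    done
}

show_args one "two words" three
```

Inside the function, $1, $2, and the rest refer to the function's own arguments; once the function returns, the script's positional parameters are as they were before the call.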

Using Return Codes with Functions

Every command you run in Unix returns an exit code, indicating success or one of the various failures that could occur. This exit code is not printed on the screen after every command you type; instead, it is stored in the shell variable $?. Every time you run a command, this variable is set to the exit code of that command. It is common in shell scripting to test this variable to see whether something you ran succeeded the way you expected. Typically, if a command succeeds, $? is set to 0; if it doesn't succeed, $? is set to a nonzero status. The meaning of each nonzero code depends solely on the program itself and is usually documented in the EXIT STATUS section of the command's man page. You can see the exit code at any point in the shell simply by running echo $?, which prints the exit code of the last command run, as you can see in the following Try It Out.
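The built-in commands true and false make convenient test subjects because their exit codes are fixed: true always returns 0 and false always returns 1. This sketch saves each value of $? immediately, since the very next command would overwrite it.

```shell
#!/bin/bash
true                # always succeeds
true_status=$?      # save $? before another command replaces it

false               # always fails
false_status=$?

echo "true returned $true_status, false returned $false_status"
```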

In the same way that commands in Unix return exit codes, shell scripts often exit with different codes: either the status of the last command executed in the script or a code that you specify explicitly with the exit command.

Within shell scripts themselves, functions can also return an exit code, although because the shell script isn't actually exiting when a function finishes, it is instead called a return code. Return codes let a function tell the main script whether what happened inside the function succeeded or failed. Just as a shell script can call exit with an exit code, a function can call return with a return code. Like exit codes, return codes by convention indicate success when they are zero and failure when they are nonzero. Also like exit codes, if no return code is specified in a function, the exit status of the last command run in the function is returned by default.
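In the sketch below, a hypothetical function is_directory returns 0 or 1, and the caller uses that return code both in an if statement and through $?. Because a return code is an exit status, it is limited to the range 0 to 255; to hand back other data, a function usually prints it and the caller captures it with $(...).

```shell
#!/bin/bash
# Hypothetical check: return 0 if the argument is an existing
# directory, 1 otherwise.
is_directory () {
    if [ -d "$1" ]; then
        return 0
    fi
    return 1
}

# The return code lands in $? just like a command's exit code,
# so the function can drive an if statement directly.
if is_directory /; then
    echo "/ is a directory"
fi

is_directory /no/such/path
echo "return code: $?"
```

Because of the default described above, the body of is_directory could also be reduced to the single command [ -d "$1" ], whose own status would then become the return code.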

Variable Scope: Think Globally, Act Locally

Functions are often written to perform work and produce a result. That result is something that you usually want to use in your shell script, so it needs to be available outside the context of the function where it is set. In many programming languages, variables in functions and subroutines are available only within the functions themselves. These variables are said to have local scope because they are local only to the function. However, in bash shell scripts, variables are by default available everywhere in the script, including variables set inside functions; hence, they are referred to as having global scope and are called global variables.
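The sketch below shows this default: a variable assigned inside a function is still set after the function returns. For contrast, it also uses bash's local built-in, which can be used only inside a function and limits a variable to that function's execution. The function and variable names are hypothetical.

```shell
#!/bin/bash
set_values () {
    global_count=10        # no declaration: visible after the function returns
    local local_count=20   # "local": exists only while the function runs
}

set_values
echo "global_count: ${global_count:-<unset>}"
echo "local_count: ${local_count:-<unset>}"
```

After set_values returns, global_count is still 10, while local_count is unset.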

Programmers who fancy themselves to have style will recognize global variables as the path that leads to sloppy code. Throwing the scope wide open allows for mistakes and carelessness, because there are no formal restrictions keeping you from doing something that obfuscates or redefines a variable without your knowing it. Programs are generally easier to read, understand, and hence maintain when global variables are restricted. If you can read and modify a variable anywhere in your script, it becomes difficult to remember every place that you have used it and hard to reason through all the potential uses and changes it might undergo. It is easy to end up with unexpected results if you are not careful. You may even forget that you used a variable in some function and then use it again, thinking it has never been used.

However, you can still write good, clean code by being careful. Keeping your variable names unique to avoid namespace pollution is a good first step. In the same way that your function names should be named clearly, so should your variables. It is bad practice to use variables such as a or b; instead use something descriptive so you aren't likely to use it again unless you are using it for the exact purpose it was meant for.

Understanding Recursion

Recursion has been humorously defined as follows: "When a function calls itself, either directly or indirectly. If this isn't clear, refer to the definition of recursion." Recursion can be very powerful when used in functions to get work done in a beautifully simple manner. You have seen how it is possible to call a function from within another function. To perform recursion, you simply have a function call itself, rather than calling another function. Each recursive call must change something, typically an argument or a variable, so that the calls eventually reach a stopping condition; otherwise, the function keeps calling itself and your program never finishes its work. The beauty of recursion is to loop just the right number of times and not infinitely: recursion allows you to loop as many times as necessary without having to define the number of times in advance. The following Try It Out shows you how to perform simple recursion.
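A classic illustration is the factorial, sketched below with a hypothetical factorial function. Each call passes n - 1 to the next call, so the argument shrinks until it reaches the stopping case of 1. Each recursive call here runs inside $(...), a command substitution, which is how the result is passed back up.

```shell
#!/bin/bash
# Recursive factorial: n! = n * (n-1)!, with 1 as the stopping case.
factorial () {
    local n=$1
    if (( n <= 1 )); then
        echo 1                              # base case ends the recursion
    else
        local smaller
        smaller=$(factorial $(( n - 1 )))   # recursive call on a smaller n
        echo $(( n * smaller ))
    fi
}

factorial 5
```

This prints 120. Bash integer arithmetic is fixed-width (64-bit on typical systems), so results beyond 20! overflow.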

Summary

Functions are an essential aspect of shell scripting. They allow you to organize your scripts into modular elements that are easier to maintain and to enhance. Although you do not need to use functions, they often help you save time and typing by defining something once and using it over and over again. Because the syntax for defining functions is very simple, you are encouraged to use them whenever you can. Functions can be understood, both conceptually as well as syntactically, as shell scripts within shell scripts. This concept is extended even more powerfully when you use functions recursively.

In this chapter, you learned:

  • What functions are and how they are useful in saving time and typing

  • What makes a function: the function name and the associated code block

  • How to declare functions in a single line, on multiple lines, in shell scripts, and in separate function files

  • How to show what a function is defined as, how to test if a function is defined, and how to undefine a function

  • Some common function declaration missteps and how to avoid them

  • How numerical positional variables can be used as function arguments as well as the standard shell arguments

  • How to define and use exit status and return values in functions

  • Variable scope, global variables, and problematic aspects to global variables

  • And finally, how to use recursion in functions to perform powerful operations

Tracking down difficult bugs in your scripts can sometimes be the most time-consuming process of shell scripting, especially when the error messages you get are not very helpful. The next chapter covers techniques for debugging your shell scripts that will make this process easier.

Exercises

  1. Experiment with defining functions: See what happens when you fail to include a semicolon on the command line between commands or when you forget to close the function with the final curly brace. Become familiar with what happens when functions are defined incorrectly so you will know how to debug them when you use them practically.

  2. What is wrong with creating a function called ls that replaces the existing command with a shortcut to your favorite switches to the ls command?

  3. What is the difference between defining a shell function and setting a shell alias?

  4. Write an alarm clock script that sleeps for a set number of seconds and then beeps repeatedly after that time has elapsed.

  5. Use a recursive function to print each argument passed to the function, regardless of how many arguments are passed. You are allowed to echo only the first positional argument (echo $1).
