Chapter 6. Processing Text with sed

When you need to edit a file, you typically open up your favorite editor, perform the change, and then save the file and exit. Editors are great for modifying files and seem to be suitable for any type of editing needed. However, imagine you have a web site with a couple of thousand HTML files that need the copyright year at the bottom changed from 2004 to 2005. The interactive nature of editors would require you to type every change that you need to make. You would launch your editor and open each file individually and, like an automaton, make the change, save, exit, repeat. After spending hours performing the same change on thousands of files, you realize you've forgotten about a whole section of the web site and actually have several thousand more, and next year you will need to do this again, with more files. There has to be a better way.

Fortunately, there is. This chapter introduces you to sed, an intelligent text-processing tool that will save you not only time but also, more important, your sanity. The sed command gives you the power to perform these changes on the command line, or in a shell script, with very little headache. Even better, once you have written a batch edit you can repeat it on as many files as you like with no extra effort. Sed is a powerful addition to any shell scriptwriter's toolbox, and learning its building blocks will enable you to create tools to solve complex problems automatically and efficiently.

This chapter introduces you to the building blocks of sed by covering the following subjects:

Getting and installing sed
Methods of invoking sed
Selecting lines to operate on
Performing substitutions with sed
Advanced sed invocation
Advanced addressing
Common one-line sed scripts

Introducing sed

In this chapter, I give you a gentle introduction to sed and its powerful editing capabilities. Learning sed can take some time, but the investment pays off tenfold in time saved. It can be frustrating to figure out how to use sed to do what you need automatically, and at times you may decide you could do the rote changes interactively in less time. However, as you sharpen your skills, you'll find yourself using sed more frequently and in better ways. Soon you will revel in the divine realm of automation and reach the programmer's version of nirvana.

The name sed means stream editor. It's designed to perform edits on a stream of data. Imagine a bubbling stream of cool mountain water filled with rainbow trout. You know that this stream empties into a sewage system a few miles down, and although you aren't a big trout fan you want to save the fish. So you do a little work to reroute the stream through a pipe to drain into a healthy lake. (The process of piping is discussed in Chapter 8.) With sed, you can do a little magic while the stream is flowing through the pipe to the lake. With a simple sed statement, the stream and all the trout would flow into the pipe, and out would come the same icy mountain stream, filled with catfish instead of trout. You could also change the cool water into iced tea, but that won't help the fish. Using your traditional text editor is like manually replacing each trout in the stream with catfish by hand; you'd be there forever, fish frustratingly slipping out of your grasp. With sed, it's a relatively simple and efficient task to make global changes to all the data in your stream.

Sed is related to a number of other Unix utilities, and what you learn in this chapter about sed will be useful for performing similar operations using utilities such as vi and grep. Sed is derived originally from the basic line editor ed, an editor you will find on almost every Unix system but one that is rarely used because of its difficult user interface. (Although unpopular as an editor, ed continues to be distributed with Unix systems because the requirements to use this editor are very minimal, and thus it is useful in dire recovery scenarios when all other editors may fail because their required libraries are not available.)

Sed is shell independent; it works with whatever shell you want to use it with. Because the default shell on most systems is Bash, the examples here are based on the Bash shell.

Sed can be ugly and frightening, but it is also quite powerful. Maybe you've seen some unintelligible, scary sed, such as the following line, and are wary of learning it because it looks like gibberish:

sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//' myfile.txt

Even someone who knows sed well would have to puzzle over this line for a while before they understood that this reversed the order of every character in every line of myfile.txt, effectively creating a mirror image of the file. This line makes most people's heads spin. But don't worry. I'll start off with the basics, giving you a strong foundation so that you'll no longer find sed frightening and unintelligible.
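If you would like to see that the line really does what is claimed before moving on, the following minimal sketch feeds it a single short line instead of a whole file. The one-liner is written here in the form it is usually given, with its backslashes intact; the sample word trout and the variable name reversed are only illustrations.

```shell
# Feed one short line through the character-reversing script.
# The sample text "trout" is an arbitrary illustration.
reversed=$(printf 'trout\n' | sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//')
echo "$reversed"
```

You do not need to understand how it works yet; the point is only that a very short script can carry out a surprisingly involved edit.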

sed Versions

Sed is a brick-and-mortar Unix command. It comes standard with nearly every Unix that exists, including Linux and Mac OS X, and it generally does not need to be installed, as it is such an essential shell command. However, it is possible that some systems don't ship with sed or come with a version that doesn't have the same features as others.

In fact, there is a dizzying array of sed implementations. There are free versions, shareware versions, and commercial versions. Different versions may have different options, and some of the examples in this chapter, especially the more advanced ones, may not work as presented with every version.

The most common version is arguably GNU sed, currently at revision 4.1.2. This version is used in the examples throughout this chapter. The GNU sed has a number of extensions that the POSIX sed does not have, making things that used to be difficult much simpler. If you need multibyte support for Japanese characters, there is a BSD implementation that offers those extensions. There is a version of sed called ssed (super-sed) that has more features than GNU sed and is based on the GNU sed code-base. There are versions of sed that are designed for constrained environments so they are small and fast (minised), versions that can be plugged into the Apache web server (mod_sed), and color versions of sed (csed). Most implementations will do the basics of what I cover here, so it is not necessary that you have GNU sed; however, you may find that the extensions that GNU sed offers will make your life easier.

Mac OS X comes with the BSD version of sed; GNU/Linux tends to distribute GNU sed. If your operating system is something else, you will be able to find a version of sed that works for you. The sed Frequently Asked Questions has an entire section devoted to the different versions of sed and where you can find one that works for your operating system (see http://sed.sourceforge.net/sedfaq2.html#s2.2).

Not all sed implementations are without cost. Commercial versions of sed are available, useful because many of them include support or provide sed for an esoteric or outdated operating system. Aside from that reason, they don't offer much more than GNU sed, probably have fewer features, and do not adhere as strictly to POSIX standards.

Sed is generally found at /bin/sed or /usr/bin/sed.

To see what version you have on your system, type the following command:

$ sed --version
GNU sed version 4.1.2
Copyright (C) 2003 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE,
to the extent permitted by law.

If this doesn't work, try just typing sed by itself on the command line to see if you get anything at all. You may have to specify /bin/sed or /usr/bin/sed. If the --version argument is not recognized by your system, you are not running GNU sed. In that case, try the following command to get the current version number:

$ strings /bin/sed | grep -i ver
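The two checks above can be combined into a small sketch that reports where your shell finds sed and whether that sed accepts the --version option. The variable names are hypothetical, and accepting --version is only a strong hint, not proof, that you are running GNU sed.

```shell
# Report where the shell finds sed and whether it answers to --version.
# The variable names are illustrative only.
sed_path=$(command -v sed)
if sed --version >/dev/null 2>&1; then
    sed_flavor="GNU-style (accepts --version)"
else
    sed_flavor="not GNU (no --version option)"
fi
echo "sed found at: $sed_path"
echo "flavor: $sed_flavor"
```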

Installing sed

If you find that you don't have any version of sed installed, I recommend getting a version targeted for your operating system or the one directly from GNU. Mac OS X comes with a BSD version of sed, but you can easily install the GNU version through fink (http://fink.sourceforge.net/). On Debian GNU/Linux you can install sed as root by typing apt-get install sed.

Installing GNU sed by hand is not very difficult. The process is even less difficult if you have a system that already has another version of sed installed, as the GNU sed installation requires some form of sed installed to install itself. This sounds like a chicken-and-egg problem, but the GNU sed provides the necessary bootstrap sed as part of the installation to resolve this. You can get the latest .tar.gz file of GNU sed from ftp://ftp.gnu.org/pub/gnu/sed/. After you have obtained the sed tar, you uncompress it as you would any normal tar file:

$ tar -zxf sed-4.1.2.tar.gz
$ cd sed-4.1.2

Read the README file that is included with the source for specific instructions.

Bootstrap Installation

If you are building sed on a system that has no preexisting version of sed, you need to follow a bootstrap procedure outlined in README.boot. (If you have the BSD version of sed, you won't need to do this and can skip to the section Configuring and Installing sed.) This is because the process of making sed requires sed itself. The standard GNU autoconf configure script uses sed to determine system-dependent variables and to create Makefiles.

To bootstrap the building of sed, you run the shell script bootstrap.sh. This attempts to build a basic version of sed that works for the configure script. This version of sed is not fully functional and should not be used typically for anything other than bootstrapping the build process.

You should see output like the following when you run bootstrap.sh:

$ sh ./bootstrap.sh
Creating basic config.h...
+ rm -f 'lib/*.o' 'sed/*.o' sed/sed
+ cd lib
+ rm -f regex.h
+ cc -DHAVE_CONFIG_H -I.. -I. -c alloca.c

It continues building and may report a number of compiler warnings. Don't worry about these; however, if you get errors and the bootstrap version of sed fails to build, you will need to edit the config.h header file that was created for your system. Read the README.boot file and the comments that are contained in the config.h file to determine how to solve this. On most systems, however, this bootstrap version of sed should build fine.

Once the build has completed, you need to install the bootstrapped version of sed somewhere in your $PATH so you can build the full version. To do this, simply copy the sed binary that was built in the sed directory to somewhere in your $PATH. In the following example you create the bin directory in your home directory, append that path to your existing $PATH environment variable, and then copy the sed binary that was created in the bootstrap procedure into your $HOME/bin directory. This will make this version of sed available for the remainder of the build process.

$ mkdir $HOME/bin
$ export PATH=$PATH:$HOME/bin
$ cp sed/sed $HOME/bin

Configuring and Installing sed

If you already have a version of sed installed on your system, you don't need to bootstrap the installation but can simply use the following command to configure sed:

$ sh ./configure
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking for gcc... gcc

This will continue to run through the GNU autoconf configuration, analyzing your system for various utilities, variables, and parameters that need to be set or exist on your system before you can compile sed. This can take some time before it finishes. If this succeeds, you can continue with compiling sed itself. If it doesn't, you will need to resolve the configuration problem before proceeding.

To compile sed, simply issue a make command:

$ make
make  all-recursive
make[1]: Entering directory `/home/micah/working/sed-4.1.2'
Making all in intl
make[2]: Entering directory `/home/micah/working/sed-4.1.2/intl'

This will continue to compile sed, which shouldn't take too long. On my system the configuration took longer than the compile. If this succeeds, you can install the newly compiled sed simply by issuing the make install command as root:

$ su
Password:
# make install

Sed will be put in the default locations in your file system. By default, make install installs all the files in /usr/local/bin, /usr/local/lib, and so on. You can specify an installation prefix other than /usr/local using --prefix when running configure; for instance, sh ./configure --prefix=$HOME will make sed so that it will install in your home directory.

Note

Warning! Be very careful that you do not overwrite your system-supplied sed, if it exists. Some underlying systems may depend on that version, and replacing it with something other than what the vendor supplied could result in unexpected behavior. Installing into /usr/local or into your personal home directory is perfectly safe.

How sed Works

Because sed is a stream editor, it does its work on a stream of data it receives from stdin, such as through a pipe, writing its results as a stream of data on stdout (often just your screen). You can redirect this output to a file if that is what you want to do (see Chapter 8 for details on redirecting). Sed doesn't typically modify an original input file; instead you send the contents of your file through a pipe to be processed by sed. This means that you don't need to have a file on the disk with the data you want changed; this is particularly useful if you have data coming from another process rather than already written in a file.

Invoking sed

Before you get started with some of the examples that follow, you will need some data to work with. The /etc/passwd file, available on all Unix derivatives, contains some useful data to parse with sed. Everyone will have a slightly different /etc/passwd file, so your results may vary slightly. The output that is shown in the following examples and exercises will be based on the following lines from my /etc/passwd file; you can copy this and save it or download it from the Wrox web site.

If for some reason your system's version of /etc/passwd produces unrecognizably different output from the examples, try using this version instead. If you use this version instead of the file on your system, you will need to change the path in each example from /etc/passwd to the specific location where you put this file.

root:x:0:0:root user:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh

As mentioned previously, sed can be invoked by sending data through a pipe to it. Take a look at how this works by piping your password file through sed using the following command. You should see output similar to that shown here, which lists the command usage description for sed:

$ cat /etc/passwd | sed
Usage: sed [OPTION]... {script-only-if-no-other-script} [input-file]...

  -n, --quiet, --silent
                 suppress automatic printing of pattern space
  -e script, --expression=script
                 add the script to the commands to be executed
  -f script-file, --file=script-file
                 add the contents of script-file to the commands to be executed
  -i[SUFFIX], --in-place[=SUFFIX]
                 edit files in place (makes backup if extension supplied)
  -l N, --line-length=N
                 specify the desired line-wrap length for the `l' command
  --posix
                 disable all GNU extensions.
  -r, --regexp-extended
                 use extended regular expressions in the script.
  -s, --separate
                 consider files as separate rather than as a single continuous
                 long stream.
  -u, --unbuffered
                 load minimal amounts of data from the input files and flush
                 the output buffers more often
      --help     display this help and exit
      --version  output version information and exit

If no -e, --expression, -f, or --file option is given, then the first
non-option argument is taken as the sed script to interpret.  All
remaining arguments are names of input files; if no input files are
specified, then the standard input is read.

E-mail bug reports to: [email protected] .
Be sure to include the word ``sed'' somewhere in the ``Subject:'' field.

Many of the different sed options you see here are covered later in this chapter. The important thing to note right now is that because you did not tell sed what to do with the data you sent to it, sed felt that you needed to be reminded about how to use it.

This command dumps the contents of /etc/passwd to sed through the pipe into sed's pattern space. The pattern space is the internal work buffer that sed uses to do its work, like a workbench where you lay out what you are going to work on.

Simply putting something on a workbench doesn't do anything at all; you need to know what you are going to do with it. Similarly, dumping data into sed's pattern space doesn't do anything at all; you need to tell sed to do something with it. Sed expects to always do something with its pattern space, and if you don't tell it what to do, it considers that an invocation error. Because you incorrectly invoked sed in this example, you found that it spit out to your screen its command usage.

Editing Commands

Sed expects you to provide an editing command. An editing command is what you want sed to do to the data in the pattern space. The following Try It Out example uses the delete-line editing command, known to sed as d. This command will delete each line in the pattern buffer.

Try It Out: Deleting All Lines with sed

Invoke sed again, but this time tell sed to use the editing command delete line, denoted by the single letter d:

$ cat /etc/passwd | sed 'd'
$

How It Works

Because sed was invoked properly this time with an editing command, it didn't give the command usage. In fact, it didn't print anything at all. What did happen? This command sent the entire contents of the /etc/passwd file through the pipe to sed. Sed took the first line of /etc/passwd and read it into its pattern buffer. It then performed the delete line editing command on the contents of its pattern buffer. The d command empties the pattern buffer and immediately moves on to the next line of input, so sed had nothing left to print for that line. Sed then read the next line of the /etc/passwd file and repeated this process until it reached the end of the file. Sed effectively read each line into the pattern buffer and deleted it before anything could be printed. This results in no output at all, not even a newline, so it appears as if nothing happens, but internally there is work happening.

Keep in mind that the original /etc/passwd file was not altered at all. Sed only read the contents of the file as input; you did not tell it to write to the file, only read from it. The results of the editing commands on each line are printed to standard output. In this case, nothing was printed to the screen because you used the d editing command to delete every line in the file.

Another important aspect of sed to learn from this example is that it operates line by line. The editing command was not applied to the entire file all at once. Whatever editing command you tell sed to perform will be done to each line in the pattern buffer, in the order that the lines are placed into the buffer.

The editing command was surrounded with single quotes. This was not absolutely necessary, but it is a good habit to get into. This same example could be written without the single quotes and the same output would result (in this case nothing). The single quotes explicitly delineate your editing command. Without them, the shell may parse and expand parts of the editing command, changing it into something other than what you intended, so it's a good habit to add them routinely.

This example piped the contents of the file /etc/passwd to sed. This is a perfectly valid way of using sed, taking input from standard input and sending the output to standard output. However, the next set of examples shows a few different ways that sed can be invoked, to the same effect.
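The two claims in this discussion, that sed prints nothing and that the input file is left alone, can be checked with a minimal sketch. It assumes a temporary stand-in file created with mktemp rather than your real /etc/passwd, and the variable names are hypothetical.

```shell
# Build a small stand-in for /etc/passwd in a temporary file.
sample=$(mktemp)
printf '%s\n' \
    'root:x:0:0:root user:/root:/bin/sh' \
    'daemon:x:1:1:daemon:/usr/sbin:/bin/sh' \
    'bin:x:2:2:bin:/bin:/bin/sh' > "$sample"

# Delete every line; capture whatever sed writes to standard output.
output=$(cat "$sample" | sed 'd')

# Count the lines still in the input file after sed has run.
lines_left=$(wc -l < "$sample" | tr -d ' ')

echo "sed printed: '$output'"
echo "lines still in the file: $lines_left"
rm -f "$sample"
```

The captured output is empty, while the line count of the file is the same as before sed ran.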

Invoking sed with the -e Flag

Instead of invoking sed by sending a file to it through a pipe, you can instruct sed to read the data from a file, as in the following example.

Try It Out: Reading Data from a File

The following command does exactly the same thing as the previous Try It Out, without the cat command:

$ sed -e 'd' /etc/passwd
$

How It Works

Invoking sed in this manner explicitly defines the editing command as a sed script to be executed on the input file /etc/passwd. The script is simply a one-character editing command, but it could be much larger, as you'll see in later examples. In this case, input is taken from the file /etc/passwd, and standard output is the screen.

Output from sed is to standard output, which is generally the screen, but you can change this by redirecting the output to a file, using standard output redirection in the shell. Although it is useful to see the output of your sed commands on the screen, often you will want to save the output to a file, rather than simply viewing the results. Because the results of your sed commands are sent to standard output, you simply need to use your shell's I/O redirection capabilities to send standard output to a file. To do this, you use the redirection operator > to place the output into a file, as in the next Try It Out.

Chapter 8 covers redirection of standard output.

Try It Out: Redirection

The following example redirects the standard output from the sed command into the file called newpasswd in the /tmp directory:

$ sed -e 'd' /etc/passwd > /tmp/newpasswd
$

How It Works

Because this command deletes all the lines in the file, this results in an empty file.

Note

Be careful not to redirect the standard output from your sed commands to the original file, or you will lose its contents. You may want to actually replace the original file with the results of your sed command, but do not use this command to do so, because the shell's > redirection operator empties the file before sed gets a chance to read it. It is important to make sure your sed commands will produce the results you want before you overwrite your original file!

Typically, what you will do with sed is redirect your output to a new file. If everything is fine, then you can replace the old file with your new one. GNU sed has an option, -i, that allows you to perform your sed operations in place on the file itself; however, this is dangerous and should be used with care.
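One way to follow this advice is sketched below: write the result to a new file, check it, and only then move it over the original. The sketch works on a throwaway copy in a temporary directory, the file and variable names are hypothetical, and it uses the editing command 1d (delete the first line), which is explained in the section Selecting Lines to Operate On.

```shell
# Work on a throwaway copy so the sketch never touches a real system file.
workdir=$(mktemp -d)
target="$workdir/passwd.copy"
printf '%s\n' \
    'root:x:0:0:root user:/root:/bin/sh' \
    'daemon:x:1:1:daemon:/usr/sbin:/bin/sh' \
    'bin:x:2:2:bin:/bin:/bin/sh' > "$target"

# Write the result to a new file first; never redirect onto the input itself.
sed -e '1d' "$target" > "$target.new"

# Replace the original only if the new file is non-empty.
if [ -s "$target.new" ]; then
    mv "$target.new" "$target"
fi

result=$(cat "$target")
echo "$result"
rm -rf "$workdir"
```

The non-empty test is a deliberately crude sanity check; in practice you would inspect the new file yourself before replacing anything important.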

The -n, --quiet, and --silent Flags

As you saw in the preceding examples, sed by default prints out the pattern space at the end of processing its editing commands and then repeats that process.

The -n flag disables this automatic printing so that sed will instead print lines only when it is explicitly told to do so with the p command.

The p command simply means to print the pattern space. If sed prints out the pattern space by default, why would you want to specify it? The p command is generally used only in conjunction with the -n flag; otherwise, you will end up printing the pattern space twice, as demonstrated in the following Try It Out.

Try It Out: Extra Printing

Type the following command to see what sed will output when you specify the p command without the -n flag:

$ cat /etc/passwd | sed 'p' | head -20
root:x:0:0:root user:/root:/bin/sh
root:x:0:0:root user:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh

Type the same command, this time specifying the -n flag:

$ cat /etc/passwd | sed -n 'p' | head -10
root:x:0:0:root user:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh

How It Works

As you can see in the output from the first command, if you specify the p editing command without the -n flag, duplicate lines are printed. In the second example, however, there are no duplicate lines printed because sed was instructed to be silent and print only the lines you specified (in this case, all of them).

The -n flag has a couple of synonyms; if you find it easier to remember --quiet or --silent, these flags do the same thing.

These are the basic methods for invoking sed. Knowing these will allow you to move forward and use sed in a more practical way. When you are more familiar with some of sed's editing capabilities, you'll be ready for the more advanced methods covered in the Advanced sed Invocation section of this chapter.
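As a recap, the following minimal sketch runs the same one-command script through each invocation style covered so far and then counts the output lines with and without -n. The three-line stand-in file and the variable names are assumptions made for the illustration.

```shell
# Three lines of stand-in data in a temporary file created by mktemp.
sample=$(mktemp)
printf '%s\n' 'root' 'daemon' 'bin' > "$sample"

# Same script, three ways of supplying it and its input.
via_pipe=$(cat "$sample" | sed -n 'p')      # input from a pipe
via_file=$(sed -n 'p' "$sample")            # input file as an argument
via_e=$(sed -n -e 'p' "$sample")            # script given explicitly with -e

# Without -n, p prints each line in addition to the automatic print.
doubled=$(sed 'p' "$sample" | wc -l | tr -d ' ')
single=$(sed -n 'p' "$sample" | wc -l | tr -d ' ')

echo "lines with p alone: $doubled"
echo "lines with -n and p: $single"
rm -f "$sample"
```

All three invocations produce identical output, and dropping -n doubles the line count from three to six.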

sed Errors

It is easy to incorrectly specify your sed editing commands, as the syntax requires attention to detail. If you miss one character, you can produce vastly different results than expected or find yourself faced with a rather cryptic error message.

Sed is not friendly with its error messages, and unfortunately, different versions of sed have different cryptic errors for the same problems. GNU sed tends to be more helpful in indicating what was missing, but it is often very difficult for sed to identify the source of the error, and so it may spit out something that doesn't help much in fixing the problem. I explained in the previous section how GNU sed will output its command usage if you incorrectly invoke it, and you may get other strange errors as well.

Selecting Lines to Operate On

Sed also understands something called addresses. Addresses are either particular locations in a file or a range where a particular editing command should be applied. When sed encounters no addresses, it performs its operations on every line in the file.

The following command adds a basic address to the sed command you've been using:

$ cat /etc/passwd | sed '1d' |more
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh

Notice that the number 1 is added before the delete edit command. This tells sed to perform the editing command on the first line of the file. In this example, sed will delete the first line of /etc/passwd and print the rest of the file. Because your /etc/passwd file may have so many lines in it that the top of the file scrolls by, you can send the result to the pager more. Notice that the following line is missing from the output:

root:x:0:0:root user:/root:/bin/sh

Most Unix systems have the root user as the first entry in the password file, but after performing this command you will see the entire password file, with the root user line missing from the top. If you replace the number 1 with a 2, only the second line is removed.
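A short sketch with numbered stand-in lines makes it easy to see which line a single-number address selects. The input text and variable names are illustrative only.

```shell
# Five numbered stand-in lines make it easy to see which one disappears.
first_gone=$(printf '%s\n' line1 line2 line3 line4 line5 | sed '1d')
second_gone=$(printf '%s\n' line1 line2 line3 line4 line5 | sed '2d')

echo "with 1d:"; echo "$first_gone"
echo "with 2d:"; echo "$second_gone"
```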

Address Ranges

So what if you want to remove more than one line from a file? Do you have to tell sed every single line you want to remove? Fortunately not; you can specify a range of lines that you want to be removed by telling sed a starting line and an ending line to perform your editing commands on, as in the following Try It Out.

Try It Out: Address Ranges

Type the following command to see how an address range works with sed:

$ cat /etc/passwd | sed '1,5d'
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh

How It Works

When you specify two numbers, separated by commas, sed performs the editing command specified on the range that starts with the first number and ends with the second. This example will delete the first five lines of the file.

You do not need to start deleting from the first line of the file but can set an address range for any range within the file, as long as the starting line comes before the ending line. 4,10d will delete lines 4 through 10, for example.

What happens if you specify a range in descending order?

Try It Out: Reverse Address Ranges

Try specifying the address range '10,4d' to see what happens:

$ cat /etc/passwd | sed '10,4d'
root:x:0:0:root user:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh

How It Works

In this example, sed reads in each line individually, in order, and applies the editing command to it. Because you've told it to start deleting from line 10, it reads each line into the pattern buffer until it has counted up to 10, and then it deletes that line. The end of the range, line 4, has already gone by, and sed does not back up in its processing to look for earlier lines, so the range ends immediately: line 10 is deleted, but nothing else.

Now that you have a better understanding of how sed applies its commands to its pattern buffer, what do you think sed does if you specify a line number or range that doesn't exist in the file? Sed dutifully looks for the lines that you specify to apply its command, but it never finds them, so you get the entire file printed out with nothing omitted.
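Both cases, a descending range and a range that lies entirely past the end of the input, can be tried safely on generated data. This is a minimal sketch; the twelve numbered lines and the variable names are assumptions for the illustration.

```shell
# Twelve stand-in lines, one number per line.
numbers=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12)

# Descending range: only the starting line (10) is deleted.
descending=$(printf '%s\n' "$numbers" | sed '10,4d')

# Range past the end of the input: nothing matches, nothing is deleted.
beyond=$(printf '%s\n' "$numbers" | sed '20,30d')

echo "$descending" | tr '\n' ' '; echo
echo "$beyond" | tr '\n' ' '; echo
```

The first result is missing only the number 10; the second is identical to the input.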

If you forget to complete your address range, you receive an error from sed. The cryptic nature of sed's errors means that sed will tell you something is wrong, but not in a helpful way. For example, if you forgot the number after the comma, sed won't understand that you were trying to specify an address range and will complain about the comma:

$ cat /etc/passwd | sed '1,d'
sed: -e expression #1, char 3: unexpected `,'

If you forgot the number before the comma in your address range, sed thinks that you are trying to specify the comma as an editing command and tells you that there is no such command:

$ cat /etc/passwd | sed ',10d'
sed: -e expression #1, char 1: unknown command: `,'

You can also instruct sed to match an address line and certain numbers following that first match.

Suppose you want to match line 4 and the five lines following line 4. You do this by placing a plus sign before the second address number (a GNU sed extension), as in the following command:

$ cat /etc/passwd | sed '4,+5d'
root:x:0:0:root user:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh

This will match line 4 in the file, delete that line, continue to delete the next five lines, and then cease its deletion and print the rest.
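The same address can be tried on plain numbered lines, which makes it easy to see exactly which lines disappear. This is only a sketch: the addr,+N form is a GNU sed extension, so it assumes GNU sed, and the variable name is made up for the example.

```shell
# Ten numbered lines stand in for the ten-line /etc/passwd used in the text.
# The addr,+N form is a GNU sed extension and may not exist in other seds.
kept=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10 | sed '4,+5d' | paste -sd ' ' -)

# Lines 4 through 9 are deleted; lines 1-3 and 10 survive.
echo "$kept"
```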

Address Negation

By appending an exclamation mark at the end of any address specification, you negate that address match. To negate an address match means to match only those lines that do not match the address range.

Try It Out: Address Negation

Beginning with the previous example where you specified deleting the first five lines of the /etc/passwd file, you can simply negate that match to say that you want to keep the first five lines and delete the rest. Type the following command to try this:

$ cat /etc/passwd | sed '1,5!d'
root:x:0:0:root user:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync

How It Works

Appending the exclamation mark to the address range 1,5 told sed to match everything except the first five lines and perform the deletion. Address negation also works for single-line addresses.
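As a quick sketch of negation on both a range and a single-line address, the following uses numbered lines in place of /etc/passwd. The variable names are assumptions for illustration.

```shell
numbers=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10)

# Negated range: delete everything except lines 1 through 5.
first_five=$(printf '%s\n' "$numbers" | sed '1,5!d' | paste -sd ' ' -)

# Negated single address: delete everything except line 3.
only_third=$(printf '%s\n' "$numbers" | sed '3!d' | paste -sd ' ' -)

echo "1,5!d keeps: $first_five"
echo "3!d keeps:   $only_third"
```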

Address Steps

GNU sed has a feature called address steps that allows you to do things such as selecting every odd line, every third line, every fifth line, and so on.

Address steps are specified in the same way that you specify a delete range, except instead of using a comma to separate the numbers, you use a tilde (~). The number before the tilde is the number that you want the stepping to begin from. If you want to start stepping from the beginning of the file, you use the number 1. The number that follows the tilde is what is called the step increment. The step increment tells sed how many lines to step. The following Try It Outs provide examples of address steps.

Try It Out: Address Stepping

Suppose that you want to delete every third line in your file, beginning with the first line. Run the following command to use an address step to accomplish this:

$ cat /etc/passwd | sed '1~3d'
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh

How It Works

This deletes the first line, steps over the next two lines, and then deletes the fourth line. Sed continues applying this pattern until the end of the file.

Try It Out: More Address Stepping

If you want to start deleting every other line, starting with the second line, you can do so as follows:

$ cat /etc/passwd | sed '2~2d'
root:x:0:0:root user:/root:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
man:x:6:12:man:/var/cache/man:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh

How It Works

This tells sed to delete the second line, step over the next line, delete the next line, and repeat until the end of the file is reached.
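The two step addresses above can be checked against numbered lines, where the arithmetic is easier to follow. This sketch assumes GNU sed, because first~step is a GNU extension; the variable names are illustrative.

```shell
numbers=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10)

# Start at line 1 and delete every third line: 1, 4, 7, and 10 go away.
drop_every_third=$(printf '%s\n' "$numbers" | sed '1~3d' | paste -sd ' ' -)

# Start at line 2 and delete every second line: the even lines go away.
drop_even=$(printf '%s\n' "$numbers" | sed '2~2d' | paste -sd ' ' -)

# The same odd lines, selected by printing instead of deleting.
print_odd=$(printf '%s\n' "$numbers" | sed -n '1~2p' | paste -sd ' ' -)

echo "1~3d leaves: $drop_every_third"
echo "2~2d leaves: $drop_even"
echo "1~2p prints: $print_odd"
```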

Substitution

This section introduces you to one of the more useful editing commands available in sed, the substitution command. This command is probably the most important command in sed and has a lot of options.

The substitution command, denoted by s, will substitute any string that you specify with any other string that you specify. To substitute one string with another, you need to have some way of telling sed where your first string ends and the substitution string begins. This is traditionally done by bookending the two strings with the forward slash (/) character.

Try It Out: Substitution

Type the following command to perform a basic literal string substitution, replacing the login name root with the reverse:

$ cat /etc/passwd | sed 's/root/toor/'
toor:x:0:0:root user:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh

How It Works

This command substitutes the first occurrence on a line of the string root with the string toor. (Notice that this line uses the s editing command and not d.)

If you forget to specify the trailing slash to the sed command, you will get an error. GNU sed will tell you that there is an unterminated s command; other sed implementations will just say that your command is garbled:

$ cat /etc/passwd | sed 's/root/toor'
sed: command garbled: s/root/toor

It is very important to note that sed substitutes only the first occurrence on a line. If the string root occurs more than once on a line (which it does in the example file), only the first match will be replaced. Usually you want to replace every occurrence of the string in the file instead of just the first occurrence on each line. To do this, you must tell sed to make the substitution globally, replacing every occurrence of the string on every line of the file. To tell sed to do a global substitution, add the letter g to the end of the command:

$ cat /etc/passwd | sed 's/root/toor/g'
toor:x:0:0:toor user:/toor:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh

Appending the g at the end of the substitution is passing a flag to the substitution command. The following section covers other substitution flags.
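To see the difference on a single line, here is a small sketch that feeds the first line of the sample passwd file to both forms of the command. The variable names are assumptions for illustration.

```shell
line='root:x:0:0:root user:/root:/bin/sh'

# Without a flag, only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/root/toor/')

# With the g flag, every match on the line is replaced.
every_match=$(printf '%s\n' "$line" | sed 's/root/toor/g')

echo "$first_only"
echo "$every_match"
```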

Substitution Flags

There are a number of other useful flags that can be passed in addition to the g flag, and you can specify more than one at a time.

The following table lists the flags that can be used with the s substitution command.

Flag	Meaning
g	Replace all matches, not just the first match.
NUMBER	Replace only NUMBERth match.
p	If substitution was made, print pattern space.
w FILENAME	If substitution was made, write result to FILENAME. GNU sed additionally allows writing to /dev/stderr and /dev/stdout.
I or i	Match in a case-insensitive manner.
M or m	In addition to the normal behavior of the special regular expression characters ^ and $, this flag causes ^ to match the empty string after a newline and $ to match the empty string before a newline.
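The p and w flags are easiest to understand by example. The following sketch uses three lines from the sample passwd file; the log file comes from mktemp and, like the variable names, is only an assumption for illustration.

```shell
input='root:x:0:0:root user:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh'

# -n suppresses normal output, so the p flag prints only the changed lines.
changed=$(printf '%s\n' "$input" | sed -n 's/root/toor/p')

# The w flag must come last, because the file name runs to the end of the
# command. Here it is combined with g, and the changed lines go to $log.
log=$(mktemp)
printf '%s\n' "$input" | sed "s/daemon/service/gw $log" > /dev/null
logged=$(cat "$log")

echo "printed by p: $changed"
echo "written by w: $logged"
rm -f "$log"
```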

A useful flag to pass to a substitution command is i or its capital incarnation, I. Both indicate to sed to be case insensitive and match either the uppercase or lowercase of the characters you specify. If the /etc/passwd file had both the strings Root and root, the previous sed operation would match only the lowercase version. To get both throughout the entire file, you specify both the i flag and the g flag, as in the following example. Type the following sed substitution command:

$ cat /etc/passwd | sed 's/Root/toor/ig'
toor:x:0:0:toor user:/toor:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh

If you specify a number as a flag (the NUMBER flag), sed acts only on the match with that number on each line. The /etc/passwd file has three instances of the string root in the first line, so if you want to replace only the third match, add the number 3 after the final substitution delimiter, as in the following:

$ cat /etc/passwd | sed 's/root/toor/3' | head -2
root:x:0:0:root user:/toor:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh

With this command, sed searches each line for the third instance of the string root and substitutes the string toor; in this file only the first line has three instances. (I piped the output through the Unix command head with the flag -2 to limit the output to the first two lines for brevity.)

The POSIX standard doesn't specify what should happen when the NUMBER flag is specified with the g flag, and there is no wide agreement on how this should be interpreted amongst the different sed implementations. The GNU implementation of sed ignores the matches before the NUMBER and then matches and replaces all matches from that NUMBER on.
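Here is a brief sketch of the NUMBER flag on its own and combined with g. The second result shows the GNU sed behavior described above and may differ in other implementations; the variable names are illustrative.

```shell
line='root:x:0:0:root user:/root:/bin/sh'

# NUMBER alone: replace only the third match on the line.
third_only=$(printf '%s\n' "$line" | sed 's/root/toor/3')

# NUMBER plus g (GNU sed): skip the first match, replace the second onward.
second_onward=$(printf '%s\n' "$line" | sed 's/root/toor/2g')

echo "$third_only"
echo "$second_onward"
```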

Using an Alternative String Separator

You may find yourself having to do a substitution on a string that includes the forward slash character. In this case, you can specify a different separator by providing the designated character after the s. Suppose you want to change the home directory of the root user in the passwd file. It is currently set to /root, and you want to change it to /toor. To do this, you specify a different separator to sed. I use a colon (:) in this example:

$ cat /etc/passwd | sed 's:/root:/toor:' | head -2
root:x:0:0:root user:/toor:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh

Notice this is exactly like doing string substitution with the slash character as a separator; the first string to look for is /root; the replacement is /toor.

It is possible to use the string separator character in your string, but sed can get ugly quickly, so you should try to avoid it by using a different string separator if possible. If you find yourself in the situation where you do need to use the string separator, you can do so by escaping the character. To escape a character means to put a special character in front of the string separator to indicate that it should be used as part of the string, rather than the separator itself. In sed you escape the string separator by putting a backslash before it, like so:

$ cat /etc/passwd | sed 's/\/root/\/toor/' | head -2
root:x:0:0:root user:/toor:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh

This performs exactly the same search and replace as the example before, this time using the slash as the string separator and escaping the slashes that appear in the strings /root and /toor so they are interpreted properly. If you do not escape these slashes, you will have an error in your command, because there will be too many slashes presented to sed and it will spit out an error. The error will vary depending on where in the process sed encounters it, but it will be another example of sed's rather cryptic errors:

sed: -e expression #1, char 10: unknown option to `s'

You can use any separator that you want, but by convention people use the slash separator until they need to use something else, as in this case.
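The following sketch runs the same substitution with three different spellings to show that the choice of separator changes only how the command reads, not what it does. The pipe separator and the variable names are assumptions added for illustration.

```shell
line='root:x:0:0:root user:/root:/bin/sh'

# Colon as the separator, as in the text.
with_colon=$(printf '%s\n' "$line" | sed 's:/root:/toor:')

# Any other character works too; the pipe is a common choice.
with_pipe=$(printf '%s\n' "$line" | sed 's|/root|/toor|')

# Slash as the separator, with the slashes inside the strings escaped.
with_escape=$(printf '%s\n' "$line" | sed 's/\/root/\/toor/')

echo "$with_colon"
echo "$with_pipe"
echo "$with_escape"
```

All three commands should print the same line, with the home directory changed to /toor.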

String substitution is not limited to single words. The string you specify is limited only by the string separator that you use, so it is possible to substitute a whole phrase, if you like. The following command replaces the string root user with absolutely power corrupts:

$ cat /etc/passwd | sed 's/:root user/:absolutely power corrupts/g' | head -2
root:x:0:0:absolutely power corrupts:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh

It is often useful to replace strings of text with nothing. This is a funny way of saying deleting words or phrases. For example, to remove a word you simply replace it with an empty string, as in the following Try It Out.

Try It Out: Replacing with Empty Space

Use an empty substitution string to delete the root string from the /etc/passwd file entirely:

$ cat /etc/passwd | sed 's/root//g' | head -2
:x:0:0: user:/:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh

How It Works

The 's/root//g' tells sed to replace all instances of root with the empty replacement string that follows the separator.

The same goes for strings with spaces in them. If you want to remove the root user string, you can replace it with an empty string:

$ cat /etc/passwd | sed 's/root user//g' | head -2
root:x:0:0::/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh

Address Substitution

As with deletion, it is possible to perform substitution only on specific lines or on a specific range of lines if you specify an address or an address range to the command.

If you want to substitute the string sh with the string quiet only on line 10, you can specify it as follows:

$ cat /etc/passwd | sed '10s/sh/quiet/g'
root:x:0:0:root user:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/quiet

This is just like the line-specific delete command, but you are performing a substitution instead of a deletion. As you can see from the output of this command, the substitution replaces the sh string with quiet only on line 10.

Similarly, to do an address range substitution, you could do something like the following:

$ cat /etc/passwd | sed '1,5s/sh/quiet/g'
root:x:0:0:root user:/root:/bin/quiet
daemon:x:1:1:daemon:/usr/sbin:/bin/quiet
bin:x:2:2:bin:/bin:/bin/quiet
sys:x:3:3:sys:/dev:/bin/quiet
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh

As you can see from the output, within the first five lines the string sh was changed to quiet wherever it occurred (line 5 does not contain sh, so it is unchanged), and the rest of the lines were left untouched.
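A compact sketch of both forms follows, using three made-up lines so that the effect of each address is easy to spot. The sample data and variable names are assumptions for illustration.

```shell
input=$(printf '%s\n' 'one:/bin/sh' 'two:/bin/sh' 'three:/bin/sh')

# Single-line address: substitute only on line 2.
line_two=$(printf '%s\n' "$input" | sed '2s/sh/quiet/' | paste -sd ' ' -)

# Address range: substitute on lines 1 through 2 only.
first_two=$(printf '%s\n' "$input" | sed '1,2s/sh/quiet/' | paste -sd ' ' -)

echo "$line_two"
echo "$first_two"
```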

Advanced sed Invocation

You often want to make more than one substitution or deletion or do more complex editing using sed. To use sed most productively requires knowing how to invoke sed with more than one editing command at a time.

You can specify multiple editing commands on the command line in three different ways, explained in the following Try It Out. The editing commands are concatenated and are executed in the order that they appear. You must specify the commands in appropriate order, or your script will produce unexpected results.

Try It Out: Advanced sed Invocation

This Try It Out shows you three ways to use multiple commands to produce the same output. Create a simple text file called stream.txt, using your favorite editor, and put the following line in it:

Imagine a quaint bubbling stream of cool mountain water filled with rainbow trout and elephants drinking iced tea.

The first way to specify multiple editing commands on the command line is to separate each editing command with a semicolon. To replace the trout with catfish and remove the elephants, try the following command, using semicolons to separate editing commands:

$ cat stream.txt | sed 's/trout/catfish/; s/ and elephants//'

The second way to specify multiple editing commands on the command line is to specify multiple -e arguments. Try the following command on the stream.txt file:

$ cat stream.txt | sed -e 's/trout/catfish/' -e 's/ and elephants//'

The third way to specify multiple editing commands on the command line is to use the multiline capability of the Bash shell. Bash knows when you have not terminated a single quote and prompts you for more input until you enter the completing single quote.

To do the example in this way, try typing the following in a Bash shell. After the first single quote, press Enter (Return) as follows:

$ cat stream.txt | sed '
> s/trout/catfish/
> s/ and elephants//'

How It Works

Each of the three examples in this Try It Out results in the following text:

Imagine a quaint bubbling stream of cool mountain water filled with rainbow catfish drinking iced tea.

In the first example, two separate editing commands, separated by semicolons, were specified to one sed command. The first command performed a substitution of the string trout with the replacement catfish, and then a semicolon was placed after the trailing substitution delimiter. The second editing command follows immediately afterward; it replaces the string and elephants with an empty string, effectively deleting that string. The two substitution commands are grouped together, surrounded by the single quotes. Each editing command supplied is performed on each input line in the order they appear; this means that the first substitution is performed, and the resulting text is then provided to the next command for the next substitution. The second command is not run before the first, and the input that the second line receives is always the processed data from the first command. The data progresses in a sequential manner through all the supplied editing commands until it reaches the end.

The second example contains two -e arguments, each individual editing command paired with its own -e flag. Each editing command passed on the command line is performed sequentially on each input line, in the same manner as it is done in the first example.

You can invoke sed using both the semicolon-separated method of the first example and the multiple -e argument of the second example together, as many times as you require. Sed simply concatenates all the commands and script files together in the order they appear. So the first command or commands specified will be executed first, followed by the next, until there are no more commands to execute. The third example, using the multiline capability, results in the exact same output as the previous two examples. It is simply a different way of invoking sed. Korn, Bourne, and zsh all perform in this manner; however, the C shell does not work this way.
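The following sketch runs the three forms side by side on the stream.txt sentence and then shows why command order matters. The one-letter example at the end and the variable names are assumptions added for illustration.

```shell
text='Imagine a quaint bubbling stream of cool mountain water filled with rainbow trout and elephants drinking iced tea.'

# 1. Commands separated by a semicolon.
semi=$(printf '%s\n' "$text" | sed 's/trout/catfish/; s/ and elephants//')

# 2. One -e argument per command.
multi_e=$(printf '%s\n' "$text" | sed -e 's/trout/catfish/' -e 's/ and elephants//')

# 3. One command per line inside a single quoted script.
multiline=$(printf '%s\n' "$text" | sed '
s/trout/catfish/
s/ and elephants//')

echo "$semi"

# Order matters: each command sees the output of the one before it.
forward=$(printf 'a\n' | sed 's/a/b/; s/b/c/')   # a becomes b, then b becomes c
backward=$(printf 'a\n' | sed 's/b/c/; s/a/b/')  # no b yet, then a becomes b
echo "forward: $forward, backward: $backward"
```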

These three methods are ways to invoke sed by specifying editing commands on the command line. The other method of invoking sed is to specify a file that contains all the commands that you want sed to run. This is useful when your commands become cumbersome on the command line or if you want to save the commands for use in the future. If you make a mistake on the command line, it can be confusing to try to fix that mistake, but if your commands are specified in a file you can simply re-edit that file to fix your mistake.

To specify the file containing the editing commands you want sed to perform, you simply pass the -f flag followed immediately by the file name containing the editing commands, as in the following Try It Out.

Try It Out: sed Scripts

Create a text file called water.sed with your favorite editor. Place the following text in the file, and then save and exit your editor:

s/trout/catfish/
s/ and elephants//

As you can see, this file consists of two editing commands and nothing else.

Using the stream.txt file from the previous examples, execute the following command:

$ sed -f water.sed stream.txt
Imagine a quaint bubbling stream of cool mountain water filled with rainbow catfish drinking iced tea.

How It Works

This executes the commands you saved in the file water.sed on the stream.txt file. The water.sed file simply contains the editing commands in the order that they should be executed. This produces the same results as the three preceding examples.

As your sed scripts become more complicated, you will find it more and more useful to put them into files and execute them this way.

The comment Command

As in most programming languages, it's useful to include comments in your script to remember what different parts do or to provide information for others who might be trying to decipher your script. To add a comment in a sed script, you do what you do in other shell scripting environments: precede the line with the # character. The comment then continues until the next newline.

There are two caveats with the comment command: the first is that comments are not portable to non-POSIX versions of sed. If someone is running a version of sed that is not POSIX-conformant, their sed may not like comments anywhere in your sed script except on the very first line.

The second caveat with the comment command is that if the first two characters of your sed script are #n, the -n (no auto-print) option is automatically enabled. If you find yourself in the situation where your comment on the first line should start with the n character, simply use a capital N or place a space between the # and the n:

# Not going to enable -n
#Not going to enable -n
#no doubt, this will enable -n
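As a sketch of how comments look in practice, the following writes a commented version of the water.sed commands to a temporary file and runs it with -f. The temporary file and the shortened input sentence are assumptions for illustration.

```shell
# Write a small commented script; the temp file name is illustrative only.
script=$(mktemp)
cat > "$script" <<'EOF'
# Swap the fish (the space after # keeps this line from being read as #n)
s/trout/catfish/
# Remove the elephants
s/ and elephants//
EOF

text='rainbow trout and elephants drinking iced tea.'
edited=$(printf '%s\n' "$text" | sed -f "$script")
echo "$edited"
rm -f "$script"
```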

The insert, append, and change Commands

The insert and append commands are almost as harmless as the comment command. Both of these commands simply output the text you provide. Insert (i) outputs the text immediately, before the current line is printed, and append (a) outputs the text after the current line has been printed.

A classic example that illustrates these commands is converting a standard text file into an HTML file. HTML files have a few tags at the beginning of the files, followed by the body text and then the closing tags at the end. Using i and a, you can create a simple sed script that will add the opening and closing tags to any text file.

Try It Out: Inserting and Appending

Place the following sed program into a file called txt2html.sed:

#! /bin/sed -f

1 i\
<html>\
<head><title>Converted with sed</title></head>\
<body bgcolor="#ffffff">\
<pre>\

$ a\
</pre>\
</body>\
</html>

Now take a text file (such as stream.txt) and run it through this sed script:

$ cat stream.txt | sed -f txt2html.sed
<html>
<head><title>Converted with sed</title></head>
<body bgcolor="#ffffff">
<pre>

Imagine a quaint bubbling stream of cool mountain water filled with rainbow trout and elephants drinking iced tea.
</pre>
</body>
</html>

How It Works

You will see that sed inserted, starting at line 1, the four opening HTML tags that indicate that the file is HTML, and set the <title> and the background color. Then your text file is printed, and at the end of your file (denoted by the $), sed appended the closing HTML tags.
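A stripped-down sketch of the same idea shows where each piece of text lands. It uses the portable form of the commands, in which i or a is followed by a backslash and the text starts on the next line; the sample input and variable names are assumptions for illustration.

```shell
# Insert before line 1 and append after the last line ($).
wrapped=$(printf '%s\n' 'first' 'last' | sed '
1 i\
HEADER
$ a\
FOOTER
')

# Prints HEADER, first, last, FOOTER, one per line.
echo "$wrapped"
```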

The insert and append commands add information only. On the other hand, the change (c) command replaces the current line in the pattern space with the text that you specify. The main difference between the substitute command (s) and the change command (c) is that substitute replaces only the text that matches, whereas the change command replaces the entire line. It works much like substitute but with broader strokes; it replaces the whole line regardless of what else the line contains.

To illustrate this, change your water.sed script to the following and name it noelephants.sed:

s/trout/catfish/
/ and elephants/ c\
Although you can substitute trout with catfish, there is no substitute for elephants, so we cannot offer this item.

Run this as you did previously in the sed Scripts Try It Out:

$ cat stream.txt | sed -f noelephants.sed
Although you can substitute trout with catfish, there is no substitute for elephants, so we cannot offer this item.

Although the first substitution was run, changing the trout string to catfish, the change command then replaces the entire line, so the result of the substitution is never printed. When substitute was used in the earlier example, you matched the string and elephants and replaced it with an empty string. In this example, you again matched the string and elephants but this time used the change command, and instead of substituting part of the line, sed replaced the entire original line.
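To make the contrast concrete, this sketch feeds two lines through a similar script: one line that matches the change address and one that does not. The input lines, replacement text, and variable name are assumptions for illustration.

```shell
# The first line matches / and elephants/, so c replaces all of it.
# The second line does not match, so only the substitution applies.
result=$(printf '%s\n' 'rainbow trout and elephants' 'just trout' | sed '
s/trout/catfish/
/ and elephants/ c\
No elephants today.
')

echo "$result"
```

The catfish substitution survives only on the second line; on the first, the whole line has been replaced.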

Advanced Addressing

Knowing exactly where in your file you want to perform your sed operations is not always possible. It's not so easy to know the exact line number or range of numbers you want the command(s) to act upon. Fortunately, sed allows you to apply your knowledge of regular expressions (regexps) to make your addressing much more powerful and useful.

In the Selecting Lines to Operate On section, you learned how to specify addresses and address ranges by specifying a line number or range of line numbers. When addresses are specified in this manner, the supplied editing command affects only the lines that you explicitly denoted in the address.

The same behavior is found when you use regular expressions in addresses; only those addresses that match the regular expression will have the editing command applied to them.

Regular Expression Addresses

To specify an address with a regular expression, you enclose the regular expression in slashes. The following example shows the top of my /etc/syslog.conf file as an example (yours may be slightly different):

#  /etc/syslog.conf     Configuration file for syslogd.
#
#                       For more information see syslog.conf(5)
#                       manpage.

# First some standard logfiles.  Log by facility.

auth,authpriv.*                 /var/log/auth.log

As you can see, a number of comments are in the file, followed by a blank line and then the lines that configure syslog. The following Try It Out shows you how to use a simple regular expression to remove all the comments in this file.

Try It Out: Regular Expression Addresses

Take a look at your /etc/syslog.conf file to see the comments and then perform the following command to remove them with a regular expression:

$ cat /etc/syslog.conf | sed '/^#/d'


auth,authpriv.*                 /var/log/auth.log

Notice how the blank lines are also printed.

How It Works

To understand this command, you need to look at each piece. The first part looks a little bit like a cartoon character swearing, /^#/. This is the address that you are specifying, and in this case it is a regular expression address. (You can tell this because the address is surrounded by slashes.) Directly following the trailing slash is the familiar d editing command that says to delete. The editing command you specify will be applied only to lines that match the pattern you specify in the regular expression.

The ^ character means to match the beginning of the line. Given that, you can read this regular expression address as "match the # character if it is at the beginning of the line." Sed applies the editing command (in this case, the delete command) to every line that matches.

The result of this command is that every line beginning with the # character is deleted and the rest are printed out.

A good way to see exactly what a regular expression will actually match is to do things the other way around. Instead of deleting all the matches, print only the matches, and suppress everything else. By using the -n flag to tell sed not to print anything unless you explicitly tell it to, combined with the p command, sed prints only the matches it finds and suppresses all the other lines:

sed -n -e '/regexp/p' /path/to/file

cat /path/to/file | sed -n '/regexp/p'

Note the p command is specified after the regular expression, rather than the d.

Try It Out: Inverted Regular Expression Match

Try the sed regular expression command from the previous Try It Out, this time printing only the matches and deleting everything else:

$ cat /etc/syslog.conf | sed -n '/^#/p'
#  /etc/syslog.conf     Configuration file for syslogd.
#
#                       For more information see syslog.conf(5)
#                       manpage.
#
# First some standard logfiles.  Log by facility.
#

How It Works

This command prints to your screen all the comments in the /etc/syslog.conf file and nothing else.
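The delete form and the print form of the same address split the input into two complementary halves. The following sketch counts both halves on a small made-up configuration; the sample text and variable names are assumptions for illustration.

```shell
conf='# comment one
# comment two

auth,authpriv.*    /var/log/auth.log'

# Lines that match the address (the comments).
comment_count=$(printf '%s\n' "$conf" | sed -n '/^#/p' | wc -l | tr -d ' ')

# Lines that do not match (the blank line and the auth line).
other_count=$(printf '%s\n' "$conf" | sed '/^#/d' | wc -l | tr -d ' ')

echo "comments: $comment_count, everything else: $other_count"
```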

The following table lists four special characters that are very useful in regular expressions.

Character	Description
^	Matches the beginning of lines
$	Matches the end of lines
.	Matches any single character
*	Matches zero or more occurrences of the previous character

In the following Try It Out examples, you use a few of these to get a good feel for how regular expressions work with sed.

Try It Out: Regular Expression Example 1

Using a regular expression as an address, try this command. Remember that your output might differ slightly if your input syslog.conf is different from the example used here:

$ cat /etc/syslog.conf | sed '/^$/d'
#  /etc/syslog.conf     Configuration file for syslogd.
#
#                       For more information see syslog.conf(5)
#                       manpage.
#
# First some standard logfiles.  Log by facility.
#
auth,authpriv.*                 /var/log/auth.log

How It Works

Here, you use two special regular expression characters in your search string. The first special character is the same as that used in the previous example, the ^ character, which matches the beginning of the lines. The second character is the $ character, which matches the end of lines.

This combination means that sed looks through /etc/syslog.conf and matches and then deletes those lines that have nothing between the beginning and end of the line. The result of this sed command is the removal of all blank lines in the file.

It is important to note that an empty line contains nothing but a newline. A line that contains spaces or tabs is not empty, so ^$ does not match it.

If you are concerned that your regular expression might match something that you are not expecting, you can try it with the -n flag and the print command. This tells sed to print all the matches of blank lines and nothing else:

$ cat /etc/syslog.conf | sed -n '/^$/p'

If the only thing printed is blank lines and nothing else (as is shown in the preceding code), then the regular expression is matching what you expect it to.
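The distinction between an empty line and a line of spaces is easy to demonstrate. This sketch uses four made-up lines, one empty and one containing only two spaces, and relies only on the ^, $, and * characters from the table above; the variable names are illustrative.

```shell
# Four lines: text, an empty line, a line of two spaces, and more text.
# ^$ matches only the truly empty line, so three lines remain.
after_empty=$(printf 'text\n\n  \nmore\n' | sed '/^$/d' | wc -l | tr -d ' ')

# "^ *$" also matches lines made up of zero or more spaces, so two remain.
after_blank=$(printf 'text\n\n  \nmore\n' | sed '/^ *$/d' | wc -l | tr -d ' ')

echo "left after /^\$/d:  $after_empty"
echo "left after /^ *\$/d: $after_blank"
```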

Try It Out: Regular Expression Example 2

Here's another example to illustrate the uses of regular expressions with sed. Suppose you want to print only the lines that begin with the letter a, b, or c. Try this command:

$ cat /etc/syslog.conf | sed -n '/^[abc]/p'
auth,authpriv.*                 /var/log/auth.log

How It Works

This combines the regular expression ^ (match at the beginning of the line) with the regular expression [abc] to print only those lines that begin with one of those characters. Notice that it does not look for lines that begin with the string abc but instead looks for any one of those characters.

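The square brackets denote a set of characters, any one of which may match. Inside them you can give a range, such as [g-t] to match any single lowercase character between g and t, or [3-7] to match any single digit between 3 and 7. A bracket expression always matches exactly one character, so it cannot express a multi-digit range such as the numbers 3 to 25.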
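As a short sketch, the following applies a bracket list and a bracket range to made-up input. Note that each bracket expression matches exactly one character; the sample words, numbers, and variable names are assumptions for illustration.

```shell
# Lines beginning with a, b, or c (not with the string "abc").
starts_abc=$(printf '%s\n' apple banana cherry date abc \
  | sed -n '/^[abc]/p' | paste -sd ' ' -)

# Lines consisting of a single digit from 3 to 7; 25 has two characters.
digit_3_to_7=$(printf '%s\n' 2 3 5 7 8 25 \
  | sed -n '/^[3-7]$/p' | paste -sd ' ' -)

echo "$starts_abc"
echo "$digit_3_to_7"
```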

Character Class Keywords

Some special keywords are commonly available to regexps, especially GNU utilities that employ regexps. These are very useful for sed regular expressions as they simplify things and enhance readability.

For example, the characters a through z as well as the characters A through Z constitute one such class of characters that has the keyword [[:alpha:]], meaning all alphabetic characters. Instead of having to specify every character in a regular expression, you can simply use this keyword instead, as in the following example.

Using the alphabet character class keyword, this command prints only those lines in the /etc/syslog.conf file that start with a letter of the alphabet:

$ cat /etc/syslog.conf | sed -n '/^[[:alpha:]]/p'
auth,authpriv.*                 /var/log/auth.log

If you instead delete all the lines that start with alphabetic characters, you can see what doesn't fall within the [[:alpha:]] character class keyword:

$ cat /etc/syslog.conf | sed '/^[[:alpha:]]/d'
#  /etc/syslog.conf     Configuration file for syslogd.
#
#                       For more information see syslog.conf(5)
#                       manpage.

#
# First some standard logfiles.  Log by facility.
#

The following table is a complete list of the available character class keywords in GNU sed.

Character Class Keyword	Description
[[:alnum:]]	Alphanumeric [a-z A-Z 0-9]
[[:alpha:]]	Alphabetic [a-z A-Z]
[[:blank:]]	Blank characters (spaces or tabs)
[[:cntrl:]]	Control characters
[[:digit:]]	Numbers [0-9]
[[:graph:]]	Any visible characters (excludes whitespace)
[[:lower:]]	Lowercase letters [a-z]
[[:print:]]	Printable characters (noncontrol characters)
[[:punct:]]	Punctuation characters
[[:space:]]	Whitespace
[[:upper:]]	Uppercase letters [A-Z]
[[:xdigit:]]	Hex digits [0-9 a-f A-F]

Character classes are very useful and should be used whenever possible. They adapt much better to non-English character sets, such as accented characters.
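The following sketch picks out one line per character class from four made-up lines, which gives a feel for what each keyword covers. The sample lines and variable names are assumptions for illustration.

```shell
lines=$(printf '%s\n' 'auth.log' '# comment' '42 lines' ' indented')

starts_alpha=$(printf '%s\n' "$lines" | sed -n '/^[[:alpha:]]/p')
starts_digit=$(printf '%s\n' "$lines" | sed -n '/^[[:digit:]]/p')
starts_punct=$(printf '%s\n' "$lines" | sed -n '/^[[:punct:]]/p')
starts_space=$(printf '%s\n' "$lines" | sed -n '/^[[:space:]]/p')

echo "alpha: $starts_alpha"
echo "digit: $starts_digit"
echo "punct: $starts_punct"
echo "space: $starts_space"
```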

Regular Expression Address Ranges

I demonstrated in the Address Ranges section that specifying two line numbers, separated by a comma, selects a range of lines over which the editing command will be executed.

The same behavior applies with regular expressions. You can specify two regular expressions, separated by a comma, and sed will match all of the lines from the first line that matches the first regular expression all the way up to, and including, the line that matches the second regular expression. The following Try It Out demonstrates this behavior.

Try It Out: Regular Expression Address Ranges

Create a file called story.txt containing the following data:

The Elephants and the Rainbow Trout
  - a moral story about robot ethics

Once upon a time, in a land far far away,
there was a stream filled with elephants drinking ice tea
while watching rainbow trout swim by.

The end.

No, really, the story is over, you can go now.

Using a simple regular expression address range, you can print specific lines:

$ sed -n -e '/Once upon a time/,/The end./p' story.txt
Once upon a time, in a land far far away,
there was a stream filled with elephants drinking ice tea
while watching rainbow trout swim by.

The end.

How It Works

This regular expression range prints all the lines between the two matching regular expressions, including those lines. It will not print anything before or after those regular expressions.

If Once upon a time is not found, no data is printed. However, if Once upon a time is found, but not The end., then all subsequent lines are printed. If you specify an address range to sed, it goes through the entire file, printing each line, waiting for the second element of the address range. It has no idea whether The end. will appear in the next line it reads or not. The same concept applies when you specify address ranges in terms of line numbers. If you specify a line number that does not exist in the file, sed just prints until it reaches the end of the file, looking for that line number but never finding it. It is perhaps easier to think of the first address as the address where the action will start and the second where it will be stopped. Actions are started as soon as the first match is made, and the action continues on all following lines until the second match stops the action.
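
The two edge cases just described can be sketched as follows. The sample text and variable names are invented for this illustration, and the period in The end\. is escaped here as an extra precaution (an unescaped period matches any character), which is a small departure from the command shown earlier.

```shell
# Sample text invented for this illustration; note that it has no "The end." line.
unfinished=$(printf 'Title\nOnce upon a time,\nthere was a stream.\nTo be continued')

# The first regexp matches on line 2 and the second never matches,
# so sed keeps printing until it runs out of input.
open_range=$(printf '%s\n' "$unfinished" | sed -n '/Once upon a time/,/The end\./p')
printf '%s\n' "$open_range"

# If the first regexp never matches, the range never starts and nothing is printed.
never_started=$(printf '%s\n' "$unfinished" | sed -n '/No such line/,/The end\./p')
printf '[%s]\n' "$never_started"
```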

Combining Line Addresses with regexps

If you want to use a line address in combination with a regular expression, sed won't stop you. In fact, this is an often-used addressing scheme.

Simply specify the line number in the file where you want the action to start working and then use the regular expression to stop the work.

Try It Out: Line Addresses

Try the following command, which combines a line address with a regular expression to form an address range:

$ cat /etc/syslog.conf | sed '1,/^$/d'
#
# First some standard logfiles.  Log by facility.
#

auth,authpriv.*                 /var/log/auth.log

How It Works

This command starts deleting from the first line in the file and continues to delete up to the first line that is blank.
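
Here is a minimal sketch of the same addressing scheme on data you can build on the spot. The mail-like message and the variable names are invented for this illustration. The last command shows one detail worth knowing: sed starts looking for the ending regular expression on the line after the one where the range began.

```shell
# A made-up message: two header lines, a blank line, then the body.
message=$(printf 'From: alice\nSubject: trout\n\nBody line one\n\nBody line two')

# Delete from line 1 through the first blank line, leaving only the body.
body=$(printf '%s\n' "$message" | sed '1,/^$/d')
printf '%s\n' "$body"

# Print from line 2 up to and including the first line that starts with "Body".
middle=$(printf '%s\n' "$message" | sed -n '2,/^Body/p')
printf '%s\n' "$middle"

# Line 1 is itself blank here, but the search for /^$/ begins on line 2,
# so the range runs through the second blank line (line 3).
leading_blank=$(printf '\nfirst\n\nsecond\n' | sed '1,/^$/d')
printf '%s\n' "$leading_blank"
```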

Advanced Substitution

Doing substitutions with regular expressions is a powerful technique.

Using address ranges with regular expressions simply required taking what you already knew about address ranges and using regular expressions in place of simple line numbers. The same one-to-one mapping works with substitution and regular expressions. You already know that to substitute the string trout with the string catfish throughout the stream.txt file, you simply do the following:

$ cat stream.txt | sed 's/trout/catfish/g'
Imagine a quaint bubbling stream of cool mountain water filled with rainbow catfish and elephants drinking iced tea.

To do regular expression substitutions, you simply use a regular expression in place of the literal string, just as you used regular expressions in place of literal line numbers in the previous section. Suppose you have a text file with a number of paragraphs separated by blank lines. You can change those blank lines into HTML <p> markers, using a regular expression substitution command:

sed 's/^$/<p>/g'

The first part of the substitution looks for blank lines and replaces them with the HTML <p> paragraph marker.

Add this sed command to the beginning of your txt2html.sed file. Now your HTML converter will add all the necessary headers, convert any blank lines into <p> markers so that they will be rendered better in your browser, and then append the closing HTML tags.
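
Before adding the command to a script, you can try the substitution on its own. The following is a small stand-alone sketch; the three-paragraph sample and the variable names are made up for this illustration. The g flag is left off because ^$ can match at most once per line.

```shell
# Three made-up paragraphs separated by blank lines.
paragraphs=$(printf 'First paragraph.\n\nSecond paragraph.\n\nThird paragraph.')

# Replace every empty line with an HTML paragraph marker.
marked=$(printf '%s\n' "$paragraphs" | sed 's/^$/<p>/')
printf '%s\n' "$marked"
```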

Referencing Matched regexps with &

Matching by regular expression is useful; however, you sometimes want to reuse what you matched in the replacement. That's not hard if you are matching a literal string that you can identify exactly, but when you use regular expressions you don't always know exactly what you matched. Being able to reuse the matched text is very useful when what your regular expression matches varies from line to line.

The sed metacharacter & represents the contents of the pattern that was matched. For instance, say you have a file called phonenums.txt full of phone numbers, such as the following:

5555551212
5555551213
5555551214
6665551215
6665551216
7775551217

You want to surround the area code (the first three digits) with parentheses for easier reading. To do this, you can use the ampersand replacement character, like so:

$ sed -e 's/^[[:digit:]][[:digit:]][[:digit:]]/(&)/g' phonenums.txt
(555)5551212
(555)5551213
(555)5551214
(666)5551215
(666)5551216
(777)5551217

Let's unpack this; it's a little dense. The easy part is that you are doing this sed operation on the file phonenums.txt, which contains the numbers listed. You are doing a regular expression substitution, so the first part of the substitution is what you are looking for, namely ^[[:digit:]][[:digit:]][[:digit:]]. This says that you are looking for a digit at the beginning of the line and then two more digits. Because an area code in the United States is composed of the first three digits, this construction will match the area code. The second half of the substitution is (&). Here, you are using the replacement ampersand metacharacter and surrounding it by parentheses. This means to put in parentheses whatever was matched in the first half of the command. This will turn all of the phone numbers into what was output previously.
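
The ampersand works the same way on any match, not just phone numbers. The sketch below uses invented sample data and variable names. It also shows that when you want a literal ampersand in the replacement, you escape it as \&.

```shell
# Made-up price list for this illustration.
prices=$(printf 'tea 3\ncake 12')

# & stands for whatever the regexp matched, here a run of one or more digits.
tagged=$(printf '%s\n' "$prices" | sed 's/[[:digit:]][[:digit:]]*/$&.00/')
printf '%s\n' "$tagged"

# An escaped ampersand in the replacement is just an ordinary "&" character.
literal=$(printf 'fish and chips\n' | sed 's/ and / \& /')
printf '%s\n' "$literal"
```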

This looks nicer, but it would be even nicer if you also included a dash after the second set of three numbers, so try that out.

Try It Out: Putting It All Together

Using what you know already, you can make these phone numbers look like regular numbers. Put the previous list of numbers in a file, name the file phonenums.txt, and try this command:

$ sed -e 's/^[[:digit:]]\{3\}/(&)/g' -e 's/)[[:digit:]]\{3\}/&-/g' phonenums.txt > nums.txt
$ cat nums.txt
(555)555-1212
(555)555-1213
(555)555-1214
(666)555-1215
(666)555-1216
(777)555-1217

How It Works

That command is a mouthful! However, it isn't much more than you already did. The first part of the command is the part that puts the parentheses around the first three numbers, exactly as before, with one change. Instead of repeating the character class keyword [[:digit:]] three times, you follow it with \{3\}, which means to match the preceding regular expression three times.

After that, you append a second pattern to be executed by adding another -e flag. In this second regular expression substitution, you look for a right parenthesis and then three digits, in the same way as before.

Because these commands are concatenated one after another, the first regular expression substitution has happened, and the first three numbers already have parentheses around them, so you are looking for the closing parenthesis and then three numbers. Once sed finds that, it replaces the string by using the ampersand metacharacter to place the numbers where they already were and then adds a hyphen afterward.

At the very end of the command, the output is redirected to a new file called nums.txt. When redirecting to a file, no output is printed to the screen, so you run cat nums.txt to print the output.
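
The braces need backslashes because sed uses basic regular expressions by default. As a hedged side-by-side sketch, the second command below does the same job with extended regular expressions. It assumes your sed accepts the -E flag (GNU sed and BSD sed do; some older versions may not), and the variable names are invented for this illustration.

```shell
raw='5555551212'

# Basic regular expressions: the interval is written \{3\}.
bre=$(printf '%s\n' "$raw" | sed -e 's/^[[:digit:]]\{3\}/(&)/' -e 's/)[[:digit:]]\{3\}/&-/')
printf '%s\n' "$bre"

# Extended regular expressions (assumes -E is supported): the interval is {3},
# and a literal right parenthesis in the pattern must be escaped as \).
ere=$(printf '%s\n' "$raw" | sed -E -e 's/^[[:digit:]]{3}/(&)/' -e 's/\)[[:digit:]]{3}/&-/')
printf '%s\n' "$ere"
```

Both commands should print the same formatted number.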

Back References

The ampersand metacharacter is useful, but even more useful is the ability to define specific regions in a regular expression so you can reference them in your replacement strings. By defining specific parts of a regular expression, you can then refer back to those parts with a special reference character.

To do back references, you have to first define a region and then refer back to that region. To define a region, you insert backslashed parentheses, \( and \), around each region of interest. The first region that you surround with backslashed parentheses is then referenced by \1, the second region by \2, and so on.

Try It Out: Back References

In the previous example you formatted some phone numbers. Now continue with that example to illustrate back references. You now have a file called nums.txt that looks like this:

(555)555-1212
(555)555-1213
(555)555-1214
(666)555-1215
(666)555-1216
(777)555-1217

With one sed command, you can pick apart each element of these phone numbers by using back references. First, define the three regions in the left side of the sed command. Select the area code, the second set of numbers up to the dash, and then the rest of the numbers.

To select the area code, define a regular expression that includes the parenthesis:
```
/.*)/
```
This matches any number of characters up to a right-parenthesis character. Now, if you want to reference this match later, you need to enclose this regular expression in escaped parentheses, like this:
```
/\(.*)\)/
```
Now that this region has been defined, it can be referenced with \1.
Next, you want to match the second set of numbers, terminated by the hyphen character. This is very similar to the first match, with the addition of the hyphen:
```
/\(.*-\)/
```
This regular expression is also enclosed in escaped parentheses, and it is the second defined region, so it is referenced by \2.
The third set of numbers is specified by matching any character repeating up to the end of the line:
```
/\(.*$\)/
```
This is the third defined region, so it is referred to as \3.

Now that you have all your regions defined, put them all together in a search and then use the references in the replacement right side, like so:

$ cat nums.txt | sed 's/\(.*)\)\(.*-\)\(.*$\)/Area code: \1 Second: \2 Third: \3/'
Area code: (555) Second: 555- Third: 1212
Area code: (555) Second: 555- Third: 1213
Area code: (555) Second: 555- Third: 1214
Area code: (666) Second: 555- Third: 1215
Area code: (666) Second: 555- Third: 1216
Area code: (777) Second: 555- Third: 1217

How It Works

As you can see, this command splits each number into the three regions that you defined and then prints each region, with its label, in the output.
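
Back references are also handy for reordering pieces of a line. The following sketch swaps "Last, First" names into "First Last" order. The sample names and variable names are invented for this illustration.

```shell
# Made-up names in "Last, First" form.
names=$(printf 'Trout, Rainbow\nElephant, Thirsty')

# Region 1 is everything before the comma; region 2 is everything after ", ".
# The replacement emits the two regions in the opposite order.
swapped=$(printf '%s\n' "$names" | sed 's/^\([^,]*\), \(.*\)$/\2 \1/')
printf '%s\n' "$swapped"
```

Using [^,]* rather than .* for the first region keeps it from running past the comma.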

Hold Space

Like the pattern space, the hold space is another workbench that sed has available. The hold space is a temporary space to put things while you do other things, or look for other lines. Lines in the hold space cannot be operated on; you can only put things in the hold space and take things out from it. Any actual work you want to do on lines has to be done in the pattern space. It's the perfect place to put a line that you found from a search, do some other work, and then pull out that line when you need it. In short, it can be thought of as a spare pattern buffer.

There are a couple of sed commands that allow you to copy the contents of the pattern space into the hold space. (Later, you can use other commands to copy what is in the hold space into the pattern space.) The most common use of the hold space is to make a duplicate of the current line while you change the original in the pattern space.

The following table details the three basic commands that are used for operating with the hold space.

Command	Description of Command's Function
h or H	Overwrite (h) or append (H) the hold space with the contents of the pattern space. In other words, it copies the pattern buffer into the hold buffer.
g or G	Overwrite (g) or append (G) the pattern space with the contents of hold space.
x	Exchange the pattern space and the hold space; note that this command is not useful by itself.

Each of these commands can be used with an address or address range.

The classic way of illustrating the use of the hold space is to take a text file and invert each line in the file so that the last line is first and the first is last, as in the following Try It Out.

Try It Out: Using the Hold Space

Run the following sed command on the story.txt file:

$ cat story.txt | sed -ne '1!G' -e 'h' -e '$p'
No, really, the story is over, you can go now.

The end.

while watching rainbow trout swim by.
there was a stream filled with elephants drinking ice tea
Once upon a time, in a land far far away,

  - a moral story about robot ethics
The Elephants and the Rainbow Trout

How It Works

First, notice that there are actually three separate commands, separated by -e flags. The first command has a negated address (1!) and then the command G. This means to apply the G command to every line except the first line. (If this address had been written 1G, it would mean to apply the G command only to the first line.)

Because the first line read in didn't have the G command applied, sed moved onto the next command, which is h. This tells sed to copy the first line of the file into the hold space.

The third command is then executed. This command says that if this line is the last line, then print it. Because this is not the last line, nothing is printed. Sed is finished processing the first line of the file, and the only thing that has happened is it has been copied into the hold space.

The cycle is repeated by sed reading in the second line of the file. Because the second line is not line 1, the negated address 1! in the first command selects it, and sed actually executes the G command this time. The G takes the contents of the hold space, which contains the first line because you put it there in the first cycle, and appends this to the end of the pattern space. Now the pattern space contains the second line, followed by the first line.

The second sed command is executed. This takes the contents of the pattern space and overwrites the hold space with it. This means that it is now taking the pattern space, which contains the second line of the file and the first line, and then it places it in the hold space.

The third command is executed, and because sed is not at the end of the file, it doesn't print anything.

This cycle continues until sed reaches the last line of the file, and the third command is finally executed, printing the entire pattern space, which now contains all the lines in reverse order.
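
As another hedged sketch of the hold space, the command below prints the line that comes immediately before each matching line. Every line is saved in the hold space with h, and the x command swaps it back in when a match is found. The log text and variable names are invented for this illustration.

```shell
# A made-up log for this illustration.
log=$(printf 'starting up\nreading config\nERROR: bad value\nretrying\nshutting down\nERROR: gave up')

# On a matching line: swap in the previous line (x), print it unless this
# is line 1 (1!p), and swap back (x). Then save the current line (h).
before_errors=$(printf '%s\n' "$log" | sed -n -e '/^ERROR/{x;1!p;x;}' -e 'h')
printf '%s\n' "$before_errors"
```

Swapping back before the h command matters: it ensures that the hold space always ends the cycle containing the current line, ready for the next one.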

More sed Resources

Refer to the following resources to learn even more about sed:

You can find the source code for GNU sed at ftp://ftp.gnu.org/pub/gnu/sed.
The sed one-liners (see the following section) are fascinating sed commands that are done in one line: http://sed.sourceforge.net/sed1line.txt.
The sed FAQ is an invaluable resource: http://sed.sourceforge.net/sedfaq.html.
Sed tutorials and other odd things, including a full-color, ASCII breakout game written only in sed, are available at http://sed.sourceforge.net/grabbag/scripts/.
The sed-users mailing list is available at http://groups.yahoo.com/group/sed-users/.
The man sed and info sed pages have the best information and come with your sed installation.

Common One-Line sed Scripts

The following code contains several common one-line sed commands. These one-liners are widely circulated on the Internet, and there is a more comprehensive list of one-liners available at http://sed.sourceforge.net/sed1line.txt.

The comments indicate the purpose of each script. Most of these scripts take a specific file name immediately following the script itself, although the input may also come through a pipe or redirection:

   # Double space a file
   sed G file

   # Triple space a file
   sed 'G;G' file

   # Under UNIX: convert DOS newlines (CR/LF) to Unix format
   sed 's/.$//' file    # assumes that all lines end with CR/LF
   sed 's/^M$//' file   # in bash/tcsh, press Ctrl-V then Ctrl-M

   # Under DOS: convert Unix newlines (LF) to DOS format
   sed 's/$//' file                     # method 1
   sed -n p file                        # method 2

   # Delete leading whitespace (spaces/tabs) from front of each line
   # (this aligns all text flush left). '^t' represents a true tab
   # character. Under bash or tcsh, press Ctrl-V then Ctrl-I.
   sed 's/^[ ^t]*//' file

   # Delete trailing whitespace (spaces/tabs) from end of each line
   sed 's/[ ^t]*$//' file               # see note on '^t', above

   # Delete BOTH leading and trailing whitespace from each line
   sed 's/^[ ^t]*//;s/[ ^t]*$//' file   # see note on '^t', above

   # Substitute "foo" with "bar" on each line
   sed 's/foo/bar/' file        # replaces only 1st instance in a line
   sed 's/foo/bar/4' file       # replaces only 4th instance in a line
   sed 's/foo/bar/g' file       # replaces ALL instances within a line

   # Substitute "foo" with "bar" ONLY for lines which contain "baz"
   sed '/baz/s/foo/bar/g' file

   # Delete all CONSECUTIVE blank lines from file except the first.
   # This method also deletes all blank lines from top and end of file.
   # (emulates "cat -s")
   sed '/./,/^$/!d' file       # this allows 0 blanks at top, 1 at EOF
   sed '/^$/N;/\n$/D' file     # this allows 1 blank at top, 0 at EOF

   # Delete all leading blank lines at top of file (only).
   sed '/./,$!d' file

   # Delete all trailing blank lines at end of file (only).
   sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba' file

   # If a line ends with a backslash, join the next line to it.
   sed -e :a -e '/\\$/N; s/\\\n//; ta' file

   # If a line begins with an equal sign, append it to the previous
   # line (and replace the "=" with a single space).
   sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D' file
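
Typing a true tab character into a one-liner is awkward, so here is a hedged sketch of two alternatives for the whitespace-trimming scripts above. One uses the [[:blank:]] character class keyword, and the other places a tab in a shell variable with printf. The sample text and the variable names (messy, tab, and so on) are invented for this illustration.

```shell
# Made-up lines with a mix of leading and trailing spaces and tabs.
messy=$(printf '  \tleading\ntrailing \t \n\t both \t')

# Alternative 1: [[:blank:]] matches both spaces and tabs.
trimmed=$(printf '%s\n' "$messy" | sed -e 's/^[[:blank:]]*//' -e 's/[[:blank:]]*$//')
printf '%s\n' "$trimmed"

# Alternative 2: hold a real tab in a variable and expand it inside
# double quotes; the end-of-line $ is escaped so the shell leaves it alone.
tab=$(printf '\t')
trimmed_tab=$(printf '%s\n' "$messy" | sed -e "s/^[ $tab]*//" -e "s/[ $tab]*\$//")
printf '%s\n' "$trimmed_tab"
```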

Common sed Commands

In addition to the substitution command, which is used most frequently, the following table lists the most common sed editing commands.

Editing Command	Description of Command's Function
#	Comment. If first two characters of a sed script are #n, then the -n (no auto-print) option is forced.
{ COMMANDS }	A group of COMMANDS may be enclosed in curly braces to be executed together. This is useful when you have a group of commands that you want executed on an address match.
[address[,address2]]d	Deletes line(s) from pattern space.
n	If auto-print was not disabled (-n), print the pattern space, and then replace the pattern space with the next line of input. If there is no more input, sed exits.
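
To see the curly braces and the n command working together, consider this minimal sketch. It prints the line that follows each comment line in a made-up configuration. The data and variable names are invented for this illustration.

```shell
# Made-up configuration: each comment line is followed by a value.
config=$(printf '# host\nexample.org\n# port\n8080')

# For every line starting with "#": fetch the next line (n), then print it (p).
# Because -n disables auto-print, only the lines printed by p appear.
values=$(printf '%s\n' "$config" | sed -n '/^#/{n;p;}')
printf '%s\n' "$values"
```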

Less Common sed Commands

The remaining list of commands that are available to you in sed are much less frequently used but are still very useful and are outlined in the following table.

Command	Usage
: label	Label a line to reference later for transfer of control via b and t commands.
[address[,address2]]a text	Append text after each line matched by address or address range.
[address[,address2]]b [label]	Branch (transfer control unconditionally) to :label. If label is omitted, branch to the end of the script.
[address[,address2]]c text	Delete the line(s) matching address and then output the lines of text that follow this command in place of the last line.
[address[,address2]]D	Delete the first part of a multiline pattern space (created by the N command), up to the first newline.
g	Replace the contents of the pattern space with the contents of the hold space.
G	Add a newline to the end of the pattern space and then append the contents of the hold space to that of the pattern space.
h	Replace the contents of the hold space with the contents of the pattern space.
H	Add a newline to the end of the hold space and then append the contents of the pattern space to the end of the hold space.
[address[,address2]]i text	Immediately output the lines of text that follow this command; each line of text except the last ends with a "\", which is not printed.
l N	Print the pattern space using N characters as the word-wrap length. Nonprintable characters and the \ character are printed in C-style escaped form. Long lines are split with a trailing "\" to indicate the split; the end of each line is marked with "$".
N	Add a newline to the pattern space and then append the next line of input into the pattern space. If there is no more input, sed exits.
P	Print the pattern space up to the first newline.
[address[,address2]]r FILENAME	Queue the contents of FILENAME to be read and inserted into the output stream at the end of the current cycle. If FILENAME cannot be read, it is treated as an empty file and nothing is appended. In GNU sed, the special file /dev/stdin can be given to read from standard input.
[address[,address2]]w FILENAME	Write the pattern space to FILENAME. The special file names /dev/stderr and /dev/stdout are available to GNU sed. The file is created before the first input line is read. All w commands that refer to the same FILENAME are output without closing and reopening the file.
x	Exchange the contents of the hold and pattern spaces.
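
The label, branch, and N commands from this table are often combined into a loop. As a minimal sketch, the following command joins every line of its input into a single line. The sample text and variable names are invented for this illustration, and the commands are split across -e flags so that the label stands on its own.

```shell
# Three made-up input lines.
wrapped=$(printf 'one\ntwo\nthree')

# :a        define a label named a
# N         append the next input line to the pattern space
# $!ba      if this is not the last line, branch back to a
# s/\n/ /g  finally, turn every embedded newline into a space
joined=$(printf '%s\n' "$wrapped" | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g')
printf '%s\n' "$joined"
```

Because the whole input accumulates in the pattern space, this approach is best kept for small inputs.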

GNU sed-Specific sed Extensions

The following table is a list of the commands specific to GNU sed. They provide enhanced functionality but reduce the portability of your sed scripts. If you are concerned about your scripts working on other platforms, use these commands carefully!

Editing Command	Description of Command's Function
e [COMMAND]	Without parameters, executes command found in pattern space, replacing pattern space with its output. With parameter COMMAND, interprets COMMAND and sends output of command to output stream.
L N	Fills and joins lines in pattern space to produce output lines of N characters (at most). This command will be removed in future releases.
Q [EXIT-CODE]	Same as common q command, except that it does not print the pattern space. It provides the ability to return an EXIT-CODE.
R FILENAME	Reads in a line of FILENAME and inserts it into the output stream at the end of a cycle. If file name cannot be read or end-of-file is reached, no line is appended. Special file /dev/stdin can be provided to read a line from standard input.
T LABEL	Branch to LABEL if there have been no successful substitutions (s) since last input line was read or branch taken. If LABEL is omitted, the next cycle is started.
v VERSION	This command fails if GNU sed extensions are not supported. You can specify the VERSION of GNU sed required; default is 4.0, as this is the version that first supports this command.
W FILENAME	Write to FILENAME the pattern space up to the first newline. See standard w command regarding file handles.

Summary

As you use sed more and more, you will become more familiar with its quirky syntax and you will be able to dazzle people with your esoteric and cryptic-looking commands, performing very powerful text processing with a minimum of effort.

In this chapter, you learned:

The different available versions of sed.
How to compile and install GNU sed, even on a system that doesn't have a working version.
How to use sed with some of the available editing commands.
Different ways to invoke sed: on the command line with the -e flag, separated by semicolons, with the bash multiline method, and by writing sed scripts.
How to specify addresses and address ranges by specifying the specific line number or specific range of line numbers. You learned address negation and stepping, and regular expression addressing.
The bread and butter of sed, substitution, was introduced, and you learned how to do substitution with flags, change the substitution delimiter, do substitution with addresses and address ranges, and do regular expression substitutions.
Some of the other basic sed commands: the comment, insert, append, and change commands.
What character class keywords are and how to use them.
About the & metacharacter and how to do numerical back references.
How to use the hold space to give you a little breathing room in what you are trying to do in the pattern space.

The next chapter covers how to read and manipulate text from files using awk. Awk was designed for text processing and works well when called from shell scripts.

Exercises

1. Use an address range negation to print only the fifth line of your /etc/passwd file. Hint: Use the delete editing command.
2. Use an address step to print every fifth line of your /etc/passwd file, starting with the tenth.
3. Use an address step to delete the tenth line of your /etc/passwd file and no other line.
4. Write a sed command that takes the output from ls -l issued in your home directory and changes the owner of all the files from your username to the reverse. Make sure not to change the group if it is the same as your username.
5. Do the same substitution as Exercise 4, except this time, change only the first ten entries and none of the rest.
6. Add some more sed substitutions to your txt2html.sed script. In HTML you have to escape certain characters in order that they be printed properly. Change any occurrences of the ampersand (&) character into &amp; for proper HTML printing. Hint: You will need to escape your replacement. Once you have this working, add a substitution that converts the less than and greater than characters (< and >) to &lt; and &gt; respectively.
7. Change your txt2html.sed script so that any time it encounters the word trout, it makes it bold by surrounding it with the HTML bold tags (<b> and the closing </b>). Also make the script insert the HTML paragraph marker (<p>) for any blank space it finds.
8. Come up with a way to remove the dash from the second group of digits so instead of printing Area code: (555) Second: 555- Third: 1212, you instead print Area code: (555) Second: 555 Third: 1212.
9. Take the line reversal sed script shown in the Hold Space section and refactor it so it doesn't use the -n flag and is contained in a script file instead of on the command line.
