Chapter 1. An Embarrassment of Riches: The Linux Environment

The reader is introduced to the vast possibilities of the Linux command line, and excuses are made for its eclecticism.

What You Will Learn

Some basic shell commands are described in this chapter, especially those related to some common programming tasks. Taken together, they make a handy toolkit for everyday use.

Linux provides an incredible array of such tools, useful for any development effort, Java or otherwise. They will be important not only for the development of your Java code, but for all the hundreds of related housekeeping tasks associated with programming and with managing your development environment. A few tools are described briefly in this chapter, to hint at what can be done and to whet your appetite for more.

We will also describe a command which will help you learn about other commands. Even so, it may be quite worth your while to have another book about UNIX/Linux handy. If there is something you, as a programmer, need to do on a Linux system, chances are there is already a command (or a sequence of commands) which will do it.

Finally, we will discuss the extent of our remaining ignorance upon finishing the chapter.

Let us take a moment to explain that last comment. As readers of computer books ourselves, we are often frustrated when we discover how lightly a topic has been covered, but particularly so when other parts of the same book are found to fully explore their topics. When only some parts of a book are thorough, you often don’t know that you don’t know it all. We will introduce some basic shell concepts and commands here, and we may expand on some of these in later chapters, but each of our chapters covers topics that could fill a book of their own. Therefore we need to leave out lots of material. We will let you know when we have left things out because they are off-topic, or because we don’t have room. We’ll also try to tell you where to look for the rest of the knowledge. We try to sum this up in a final section of each chapter entitled What You Still Don’t Know. But we do have a lot of information to impart, so let’s get going.

The Command Line: What’s the Big Deal?

One of the revolutionary things that UNIX (and thus Linux) did was to separate operating system commands from the operating system itself. The commands to display files, show the contents of directories, set permissions, and so on were, in the “olden days,” an integral part of an operating system. UNIX removed all that from the operating system proper, leaving only a small “kernel” of necessary functionality in the operating system. The rest became executables that lived outside of the operating system and could be changed, enhanced, or even replaced individually by (advanced) users without modifying the operating system. The most significant of these standalone pieces was the command processor itself, called the shell.

The shell is the program that takes command-line input, decides what program(s) you are asking to have run, and then runs those programs. Before there were Graphical User Interfaces, the shell was the user interface to UNIX. As more developers began working with UNIX, different shells were developed to provide different features for usability. Now there are several shells to choose from, though the most popular is bash. Some BSD/UNIX diehards still swear by csh, a.k.a. the C-shell, though most of its best features have been incorporated into bash.

Tip

There are actually quite a few shells to choose from, and several editors for entering text. Our recommendation: If you learn only one shell, learn bash. If you learn only one editor, learn vi. Some basic shell scripting will go a long way toward eliminating mundane, repetitive tasks. Some basic vi editing will let you work far faster than most GUI editors allow. (More on editing in Chapter 2.)

Since commands could be developed and deployed apart from the operating system, UNIX and Linux have, over the years, had a wide variety of tools and commands developed for them. In fact, much of what is called Linux is really the set of GNU tools which began development as Open Source long before Linux even existed. These tools, while not technically part of the operating system, are written to work atop any UNIX-like operating system, and programmers have come to expect them on any Linux system that they use. Some commands and utilities have changed over the years; others are much the same as they were in the early days of UNIX.

Developers, encouraged by the openness of Open Source (and perhaps having too much free time on their hands) have continued to create new utilities to help them get their job done better/faster/cheaper. That Linux supports such a model has helped it to grow and spread. Thus Linux presents the first-time user with a mind-boggling array of commands to learn. We will describe a few essential tools and help you learn about more.

Basic Linux Concepts and Commands

There are some basic Linux commands and concepts that you should know in order to be able to move around comfortably in a Linux filesystem. Check your knowledge of these commands, and if need be, brush up on them. At the end of the chapter, we list some good resources for learning more about these and other commands. Remember, these are commands that you type, not icons for clicking, though the windowing systems will let you set up icons to represent those commands, once you know what syntax to use.

So let’s get started. Once you’ve logged in to your Linux system, regardless of which windowing system you are using (KDE, Gnome, Window Maker, and so on), start up a terminal window by running xterm (or konsole) and you’ll be ready to type these commands. [1]

Redirecting I/O

The second great accomplishment of UNIX, [2] carried on into its Linux descendants, was the concept of redirecting input and output (I/O). It was based on the concept of a standardized way in which I/O would be done, called standard I/O.

Standard I/O

A familiar concept to Linux developers is the notion of standard I/O. Virtually every Linux process begins its life with three open file descriptors: standard in, standard out, and standard error. Standard in is the source of input for the process; standard out is the destination of the process’s output; and standard error is the destination for error messages. For “old-fashioned” command-line applications, these correspond to keyboard input for standard in and the output window or screen for both standard out and error.

A feature of Linux that makes it so adaptable is its ability to redirect its I/O. Programs can be written generically to read from standard in and write to standard out, but then when the user runs the program, he or she can change (or redirect) the source (in) or destination (out) of the I/O. This allows a program to be used in different ways without changing its code.

Redirecting I/O is accomplished on the Linux shell command line by the “<” and “>” characters. Consider the ls program which lists the contents of a directory. Here is a sample run of ls:

$ ls
afile    more.data    zz.top
$

We can redirect its output to another location, a file, with the “>” character:

$ ls > my.files
$

The output from the ls command no longer appears on the screen (the default location of standard out); it has been redirected to the file my.files.

What makes this so powerful a construct (albeit for a very simple example) is the fact that not only was no change to the program required, but the programmer who wrote the ls program also did nothing special for I/O. He simply built the program to write to standard out. The shell did the work of redirecting the output. This means that any program invoked by the shell can have its output similarly redirected.

Standard error is another location for output, but it was meant as the destination for error messages. For example, if you try to list the contents of a nonexistent directory, you get an error message:

$ ls bogus
ls: bogus: No such file or directory
$

If you redirect standard out, nothing changes:

$ ls bogus > save.out
ls: bogus: No such file or directory
$

That’s because the programmer wrote the program to send the message to standard error, not standard out. In the shell (bash) we can redirect standard error by preceding the redirect symbol with the number 2, as follows: [3]

$ ls bogus 2> save.out
$

Note there is no output visible from ls. The error message, ls: bogus: No such file or directory, has been written to the file save.out.
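Java programs get the same two destinations through System.out and System.err. As a minimal sketch (the class ListArgs and its message text are our own invention, not a standard tool), the program below reports which of its arguments name existing files, writing the names it finds to standard out and its complaints to standard error, much as ls does. Running java ListArgs afile bogus > save.out should still show the complaint about bogus on the screen; adding 2> errs.out should send it to that file instead.

```java
import java.io.File;
import java.io.PrintStream;

/**
 * Hypothetical example: report which arguments name existing files.
 * Names that exist go to standard out; complaints go to standard error,
 * so the shell can redirect the two streams separately (> versus 2>).
 */
public class ListArgs {

    /** Writes one line per name to the appropriate stream; returns the number of errors. */
    static int report(String[] names, PrintStream out, PrintStream err) {
        int errors = 0;
        for (String name : names) {
            if (new File(name).exists()) {
                out.println(name);                                   // normal output
            } else {
                err.println("ListArgs: " + name + ": No such file or directory");
                errors++;                                            // error output
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        report(args, System.out, System.err);
    }
}
```

Passing the streams in as parameters, rather than naming System.out and System.err inside report, is only a convenience that makes the method easy to test.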

In a similar way standard input (stdin) can be redirected from its default source, the keyboard.

As an example, we’ll run the sort program. Unless you tell it otherwise, sort will read from stdin, that is, the keyboard. We type a short list of phrases and then type a ^D (a Control-D), which won’t really echo to the screen as we have shown but will tell Linux that it has reached the end of the input. The lines of text are then printed back out in sorted order; sort compares whole lines, though in this example the first characters alone decide the order. (This is just the tip of the iceberg of what sort can do.)

$ sort
once upon a time
a small creature
came to live in
the forest.
^D
a small creature
came to live in
once upon a time
the forest.

Now let’s assume that we already have our text inside a file called story.txt. We can use that file as input to the sort program by redirecting the input with the “<” character. The sort doesn’t know the difference. Our output is the same:

$ sort < story.txt
a small creature
came to live in
once upon a time
the forest.
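A Java program can be written the same way. This sketch (MiniSort is a hypothetical name) reads lines from System.in until end of input, sorts them, and prints them. It never opens a file itself, so typing lines followed by ^D and running java MiniSort < story.txt should work equally well. One assumption to note: Collections.sort compares strings by character code, so uppercase letters sort before lowercase ones, whereas the real sort may order lines differently depending on your locale settings.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch of a tiny "sort": read every line from standard in,
 * sort the lines, and print them. The program does not care whether
 * standard in is the keyboard or a file the shell redirected with "<".
 */
public class MiniSort {

    /** Reads all lines from the given source and returns them in sorted order. */
    static List<String> sortedLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {   // null means end of input (^D)
                lines.add(line);
            }
        }
        Collections.sort(lines);                       // compares by character code
        return lines;
    }

    public static void main(String[] args) throws IOException {
        for (String line : sortedLines(new InputStreamReader(System.in))) {
            System.out.println(line);
        }
    }
}
```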

Pipes

The output from one command can also be sent directly to the input of another command. Such a connection is called a pipe. Linux command-line users also use “pipe” as a verb, describing a sequence of commands as piping the output of one command into another. Some examples:

$ ls | wc > wc.fields
$ java MyCommand < data.file | grep -i total > out.put

The first example runs ls, then pipes its output to the input of the wc program. The output of the wc command is redirected to the file wc.fields. The second example runs java, telling it to run the class MyCommand (compiled into the file MyCommand.class). Any input that this program would normally read from the keyboard will be read this time from the file data.file. The output from this will be piped into grep, and the output from grep will be put into out.put.

Don’t worry about what these commands really do. The point of the example is to show how they connect. This has wonderful implications for developers. You can write your program to read from the keyboard and write to a window, but then, without any change to the program, it can be instructed to read from files and write to files, or be interconnected with other programs.

This leads to a modularization of functions into small, reusable units. Each command can do a simple task, but it can be interconnected with other commands to do more, with each pipeline tailored by the user to do just what is needed. Take wc for example. Its job is to count words, lines, and characters in a file. Other commands don’t have to provide an option to do this; any time you want to count the lines in your output, just pipe it into wc.
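To see how little a program needs to do to take part in a pipeline, here is a rough sketch of a wc-style filter in Java (MiniWc is our own name). It reads standard in to the end and prints counts of lines, words, and characters, so ls | java MiniWc should print numbers similar to ls | wc. Words are taken to be runs of non-whitespace characters, and characters are counted as Java chars, which agrees with the byte count wc prints only for plain ASCII input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

/**
 * Hypothetical sketch of a wc-style filter: counts lines, words, and
 * characters arriving on standard in, so it can sit at the end of a pipe.
 */
public class MiniWc {

    /** Returns {lines, words, chars}; newlines are counted as characters, as in wc. */
    static long[] count(Reader source) throws IOException {
        long lines = 0, words = 0, chars = 0;
        boolean inWord = false;
        try (BufferedReader in = new BufferedReader(source)) {
            int c;
            while ((c = in.read()) != -1) {
                chars++;
                if (c == '\n') {
                    lines++;
                }
                if (Character.isWhitespace(c)) {
                    inWord = false;              // whitespace ends any current word
                } else if (!inWord) {
                    inWord = true;               // first character of a new word
                    words++;
                }
            }
        }
        return new long[] {lines, words, chars};
    }

    public static void main(String[] args) throws IOException {
        long[] n = count(new InputStreamReader(System.in));
        System.out.printf("%7d %7d %7d%n", n[0], n[1], n[2]);
    }
}
```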

The ls Command

The ls command, which shows the names of the files in a directory, is about as basic as commands get. Be sure that you know how to use these options:

  • ls lists the files in a directory.

  • ls -l is the long form, showing permissions, ownership, and size.

  • ls -ld lists the directory itself rather than its contents, so you can see the directory’s own permissions.

  • ls -lrt shows the most recently modified files last, so you can see what you’ve just changed.

Filenames

Filenames in Linux can be quite long and composed of virtually any character. Practically speaking, however, you’re much better off if you limit the length to something reasonable, and keep to the alphanumeric characters, period, and the underscore (“_”). That’s because almost all the other punctuation characters have a special meaning to the shell, so if you want to type them, you need to escape their special meaning, or suffer the results of unintended actions.

Filenames are case sensitive—upper- and lowercase names are different. The files ReadMe.txt and readme.txt could both be in the same directory; they are distinct files.

Avoid using spaces in filenames, as the shell uses whitespace to separate the arguments on a command line. You can put a blank in a name, but then you always have to quote the name (or escape the blank with a backslash) to refer to it in the shell.

To give a filename more visual clues, use a period or an underscore. You can combine several in one filename, too. The filenames read_me_before_you_begin or test.data.for_my_program may be annoyingly long to type, but they are legal filenames.

Note

The period, or “dot,” in Linux filenames has no special meaning. If you come from the MS-DOS world, you may think of the period as separating the filename from the extension, as in myprogrm.bas where the filename is limited to eight characters and the extension to three characters. Not so in Linux. There is no “extension”; it’s all just part of the filename.

You will still see names like delim.c or Account.java, but the .c or .java are simply the last two characters or the last five characters, respectively, of the filenames. That said, certain programs will insist on those endings for their files. The Java compiler will insist that its source files end in .java and will produce files that end in .class—but there is no special part of the filename to hold this. This will prove to be very handy, both when you name your files and when you use patterns to search for files (see below).

Permissions

Permissions in Linux are divided into three categories: the owner of a file (usually the user who created it), the group (a collection of users), and others, meaning everyone who is not the owner and not in the group. Any file belongs to a single owner and, simultaneously, to a single group. It has separate read/write/execute permissions for its owner, its group, and all others. If you are the owner of a file, but also a member of the group that owns the file, then the owner permissions are what count. If you’re not the owner, but a member of the group, then the group permissions will control your access to the file. All others get the “other” permissions.

If you think of the three permissions, read/write/execute, as three bits of a binary number, then a permission can be expressed as an octal digit, where the most significant bit (4) represents read permission, the middle bit (2) is write permission, and the least significant bit (1) is execute permission. If you think of the three categories, user/group/others, as three digits, then you can express the permissions of a file as three octal digits, for example “750”: 7 = 4+2+1 (read, write, and execute for the user), 5 = 4+1 (read and execute for the group), and 0 (nothing for others). The earliest versions of the chmod command required you to set file permissions this way, by specifying the octal number. Now, although there is a fancier syntax (for example, g+w), you can still use the octal numbers with chmod. See Table 1.1 for examples.

The fancier, or more user-friendly, syntax uses letters to represent the various categories and permissions. The three categories of user, group, and other are represented by their first letters: u, g, and o; the letter a stands for “all” three categories. The permissions are similarly represented by r, w, and x. (OK, we know “x” is not the first letter, but it is a reasonable choice.) Then, to add permissions, use the plus sign (+); to remove permissions, use the minus sign (-). So g+rwx means “add all permissions to the group category,” and a+r means “add read permission to all categories.”
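The arithmetic is easy to check in code. This sketch (the class OctalPerms is hypothetical) takes the nine permission characters that ls -l prints after the file-type character, such as rwxr-x---, and adds 4, 2, and 1 for r, w, and x within each category. It handles only r, w, x, and -, not the special s and t flags you may occasionally see in those positions.

```java
/**
 * Hypothetical helper: convert a permission string such as "rwxr-x---"
 * into the three octal digits chmod accepts, e.g. "750".
 */
public class OctalPerms {

    static String toOctal(String perms) {
        if (perms.length() != 9) {
            throw new IllegalArgumentException("expected 9 characters: " + perms);
        }
        StringBuilder digits = new StringBuilder();
        for (int category = 0; category < 3; category++) {     // user, group, other
            int base = category * 3;
            int value = 0;
            if (perms.charAt(base) == 'r')     value += 4;     // most significant bit
            if (perms.charAt(base + 1) == 'w') value += 2;     // middle bit
            if (perms.charAt(base + 2) == 'x') value += 1;     // least significant bit
            digits.append(value);
        }
        return digits.toString();
    }

    public static void main(String[] args) {
        System.out.println(toOctal("rwxr-x---"));   // 750
        System.out.println(toOctal("rw-r--r--"));   // 644
    }
}
```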

Be sure that you know these commands for manipulating permissions:

  • chmod changes the mode of a file, where mode refers to the read/write/execute permissions.

  • chown changes the owner of a file. [4]

  • chgrp changes the group owner of a file.

Table 1.1 shows some common uses of these commands.

Table 1.1. Changing permissions

Command            Explanation
chmod a+r file     Gives everyone read permission.
chmod go-w file    Takes away write permission from group, others.
chmod u+x file     Sets up a shell script so you can execute it like a command.
chmod 600 file     Sets permission to read and write for the owner but no permissions for anyone else.
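Java programs can make the same changes without running chmod. The sketch below (the class and method names are our own) uses the java.nio.file API: setMode sets all nine bits at once, like chmod 600, and addOwnerExecute adds one bit while keeping the rest, like chmod u+x. It assumes a POSIX filesystem such as the ones Linux uses; on other filesystems these calls throw UnsupportedOperationException.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

/**
 * Sketch: Java counterparts of "chmod 600 file" and "chmod u+x file".
 */
public class ChmodDemo {

    /** Like chmod with all nine bits spelled out: "rw-------" is mode 600. */
    static void setMode(Path file, String rwx) throws IOException {
        Files.setPosixFilePermissions(file, PosixFilePermissions.fromString(rwx));
    }

    /** Like chmod u+x: keep the existing bits and add owner execute. */
    static void addOwnerExecute(Path file) throws IOException {
        Set<PosixFilePermission> perms = Files.getPosixFilePermissions(file); // modifiable copy
        perms.add(PosixFilePermission.OWNER_EXECUTE);
        Files.setPosixFilePermissions(file, perms);
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        setMode(file, "rw-------");
        System.out.println(PosixFilePermissions.toString(Files.getPosixFilePermissions(file)));
    }
}
```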

File Copying

Do you know these commands?

  • mv

  • cp

  • ln

The mv command (short for “move”) lets you move a file from one place in the hierarchy of files to another—that is, from one directory to another. When you move the file, you can give it a new name. If you move it without putting it in a different directory, well, that’s just renaming the file.

  • mv Classy.java Nouveau.java

  • mv Classy.java /tmp/outamy.way

  • mv Classx.java Classz.java ..

  • mv /usr/oldproject/*.java .

The first example moves Classy.java to a new name, Nouveau.java, while leaving the file in the same directory.

The second example moves the file named Classy.java from the current directory over to the /tmp directory and renames it outamy.way—unless the file outamy.way is an already existing directory. In that case, the file Classy.java will end up (still named Classy.java) inside the directory outamy.way.

The next example just moves the two Java source files up one level, to the parent directory. The “..” is a feature of every Linux directory. Whenever you create a directory, it gets created with two links already built in: “..” points to its parent (the directory that contains it), and “.” points to the directory itself.

A common question at this point is, “Why does a directory need a reference to itself?” Whatever other reasons there may be, it certainly is a handy shorthand to refer to the current directory. If you need to move a whole lot of files from one directory to another, you can use the “.” as your destination. That’s the fourth example.
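If you need the same operation from Java, java.nio.file.Files.move is the closest equivalent, but it is more literal-minded than mv: it does not check whether the target is an existing directory, and it refuses to overwrite an existing file unless you pass StandardCopyOption.REPLACE_EXISTING. This sketch (the helper name mv is our own) adds the directory check so that both the rename case and the move-into-a-directory case work.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Sketch of mv's two behaviors using java.nio.file.
 */
public class MoveDemo {

    /** Renames source to target, or moves it into target if target is a directory. */
    static Path mv(Path source, Path target) throws IOException {
        Path destination = target;
        if (Files.isDirectory(target)) {
            // like "mv Classy.java /tmp": keep the name, change the directory
            destination = target.resolve(source.getFileName());
        }
        // like "mv Classy.java Nouveau.java": an ordinary rename
        return Files.move(source, destination);
    }

    public static void main(String[] args) throws IOException {
        mv(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```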

The cp command is much like the mv command, but the original file is left right where it is. In other words, it copies files instead of moving them. So:

cp Classy.java Nouveau.java

will make a copy of Classy.java named Nouveau.java, and:

cp Classy.java /tmp

will make a copy of Classy.java in the /tmp directory, and:

cp *.java /tmp

will put the copies of all the Java sources in the current directory to the /tmp directory.

If you run this command,

ln Classy.java /tmp

you might think that ln copies files, too. You will see Classy.java in your present working directory and you will see what appears to be a copy of the file in the /tmp directory. But if you edit your local copy of Classy.java and then look at the “copy” that you made in the /tmp directory, you will see the changes that you made to your local file now also appear in the file in the /tmp directory.

That’s because ln doesn’t make a copy. It makes a link. A link is just another name for the same contents. We will discuss linking in detail later in the book (see Section 6.2.1).
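Java can make links as well. This sketch (the file names and the class LinkDemo are hypothetical) creates a file in a temporary directory, gives it a second name with Files.createLink, rewrites the file through the first name, and reads it back through the second. Files.isSameFile confirms that both names refer to the same file, and deleting one name leaves the contents reachable through the other.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Sketch: what ln does, from Java. Files.createLink makes a hard link,
 * a second name for the same contents.
 */
public class LinkDemo {

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lndemo");
        Path original = dir.resolve("Classy.java");
        Path link = dir.resolve("Classy.link");                  // hypothetical second name

        Files.write(original, "class Classy { }\n".getBytes());
        Files.createLink(link, original);                        // like: ln Classy.java Classy.link

        // "Edit" the original in place...
        Files.write(original, "class Classy { int x; }\n".getBytes());

        // ...and the change is visible under the other name too.
        System.out.print(new String(Files.readAllBytes(link)));  // class Classy { int x; }
        System.out.println(Files.isSameFile(original, link));    // true

        // Removing one name leaves the contents reachable through the other.
        Files.delete(original);
        System.out.println(Files.exists(link));                  // true

        Files.delete(link);
        Files.delete(dir);
    }
}
```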

Seeing Stars

We need to describe shell pattern matching for those new to it. It’s one of the more powerful things that the shell (the command processor) does for the user—and it makes all the other commands seem that much more powerful.

When you type a command like we did previously:

mv /usr/oldproject/*.java .

the asterisk character (called a “star” for short) is a shorthand to match any sequence of characters, including none, which in combination with the .java will then match any file in the /usr/oldproject directory whose name ends with .java.

There are two significant things to remember about this feature. First, the star and the other shell pattern matching characters (described below) do not mean the same as the regular expressions in vi or other programs or languages. Shell pattern matching is similar in concept, but quite different in specifics.

Second, the pattern matching is done by the shell, the command interpreter, before the arguments are handed off to the specific command. Any text with these special characters is replaced, by the shell, with one or more filenames that match the pattern. This means that all the other Linux commands (mv, cp, ls, and so on) never see the special characters—they don’t do the pattern matching, the shell does. The shell just hands them a list of filenames.

The significance here is that this functionality is available to any and every command, including shell scripts and Java programs that you write, with no extra effort on your part. It also means that the syntax for specifying multiple files doesn’t change between commands—since the commands don’t implement that syntax; it’s all taken care of in the shell before they ever see it. Any command that can handle multiple filenames on a command line can benefit from this shell feature.
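An easy way to see this for yourself is a Java program that just echoes its arguments. In this sketch (ShowArgs is a hypothetical name), running java ShowArgs *.java in a directory whose only Java source file is Shift.java should report a single argument, [Shift.java]; the program never sees the star. With quotes, as in java ShowArgs '*.java', the literal pattern gets through. The same program shows why spaces in filenames are awkward: java ShowArgs my file.txt receives two arguments, while java ShowArgs 'my file.txt' receives one. (If a pattern matches nothing, bash by default passes it along unchanged.)

```java
/**
 * Prints each command-line argument in brackets, so you can see exactly
 * what the shell handed to the program after expanding any patterns.
 */
public class ShowArgs {

    static String describe(String[] args) {
        StringBuilder sb = new StringBuilder();
        sb.append(args.length).append(" argument(s)\n");
        for (String arg : args) {
            sb.append('[').append(arg).append("]\n");   // brackets make spaces visible
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe(args));
    }
}
```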

If you’re familiar with MS-DOS commands, consider the way pattern matching works (or doesn’t work) there. The limited pattern matching you have available for a dir command in MS-DOS doesn’t work with other commands—unless the programmer who wrote that command also implemented the same pattern matching feature.

What are the other special characters for pattern matching with filenames? Two other constructs worth knowing are the question mark and the square brackets. The “?” will match any single character.

The [...] construct is a bit more complicated. In its simplest form, it matches any of the characters inside; for example, [abc] matches any of a or b or c. So Version[123].java would match a file called Version2.java but not those called Version12.java or VersionC.java. The pattern Version*.java would match all of those. The pattern Version?.java would match all except Version12.java, since it has two characters where the ? matches only one.

The brackets can also match a range of characters, as in [a-z] or [0-9]. If the first character inside the brackets is a “^” or a “!”, then (think “not”) the meaning is reversed, and it will match anything but those characters. So Version[^0-9].java will match VersionC.java but not Version1.java. How would you match a “-”, without it being taken to mean a range? Put it first inside the brackets. How would you match a “^” or “!” without it being understood as the “not”? Don’t put it first.

Some sequences are so common that a shorthand syntax is included. Some other sequences are not sequential characters and are not easily expressed as a range, so a shorthand is included for those, too. The syntax for these special sequences is [:name:], where name is one of: alnum, alpha, ascii, blank, cntrl, digit, graph, lower, print, punct, space, upper, xdigit. These classes are used inside the square brackets, so the pattern [[:alpha:]] matches any alphabetic character, and [[:punct:]] matches any punctuation character. We think you get the idea.
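Java has its own glob matcher, available through FileSystems.getDefault().getPathMatcher with a "glob:" prefix, so you can experiment with these patterns from a program. Its dialect is close to the shell’s but not identical: it negates a bracket expression only with “!” (a leading “^” is taken literally), it has no [:name:] classes, and it adds {a,b} alternatives. The sketch below (GlobDemo is our own name) tries the Version patterns from the text.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

/**
 * Tries shell-style patterns against sample names using Java's glob matcher.
 */
public class GlobDemo {

    static boolean matches(String glob, String name) {
        PathMatcher m = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return m.matches(Paths.get(name));
    }

    public static void main(String[] args) {
        String[] names = {"Version2.java", "Version12.java", "VersionC.java"};
        String[] globs = {"Version[123].java", "Version*.java", "Version?.java", "Version[!0-9].java"};
        for (String glob : globs) {
            StringBuilder hits = new StringBuilder(glob + ":");
            for (String name : names) {
                if (matches(glob, name)) {
                    hits.append(' ').append(name);
                }
            }
            System.out.println(hits);
        }
    }
}
```

Run as is, it should report that Version[123].java matches only Version2.java, Version*.java matches all three names, Version?.java matches Version2.java and VersionC.java, and Version[!0-9].java matches only VersionC.java, in line with the text.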

Escape at Last

Of course there are always times when you want the special character to be just that character, without its special meaning to the shell. In that case you need to escape the special meaning, either by preceding it with a backslash or by enclosing the expression in single quotes. The commands rm Account\$1.class or rm 'Account$1.class' would remove the file even though it has a dollar sign in its name (which the shell would otherwise take as the start of a variable reference, here $1). Any character sequence in single quotes is left alone by the shell; no special substitutions are done. Double quotes still allow some substitutions inside them, such as shell variable substitution, so if you want literal values, use single quotes.
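When a Java program builds a command line for the shell (for example, to print a command for the user to paste), the same single-quote rule applies. A single quote cannot appear inside single quotes, so the usual trick is to end the quoted string, add a backslash-escaped quote, and reopen it. This sketch (quote is a hypothetical helper) does exactly that; Account$1.class, the kind of name javac gives to the class file for an anonymous inner class, is a typical case.

```java
/**
 * Hypothetical helper that wraps a string in single quotes for the shell,
 * so characters such as $, *, and spaces lose their special meaning.
 * Each embedded ' becomes '\'' : close quote, escaped quote, reopen quote.
 */
public class ShellQuote {

    static String quote(String s) {
        return "'" + s.replace("'", "'\\''") + "'";
    }

    public static void main(String[] args) {
        System.out.println("rm " + quote("Account$1.class"));   // rm 'Account$1.class'
        System.out.println("rm " + quote("it's.txt"));          // rm 'it'\''s.txt'
    }
}
```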

Tip

As a general rule, if you are typing a filename which contains something other than alphanumeric characters, underscores, or periods, you probably want to enclose it in single quotes, to avoid any special shell meaning.

File Contents

Let’s look at a directory of files. How do you know what’s there? We can start with an ls to list the names:

$ ls
ReadMe.txt   Shift.java  dispColrs  moresrc
Shift.class  anIcon.gif  jam.jar    moresrc.zip
$

That lists them alphabetically, top to bottom, then left to right, arranged so as to make the most use of the space while keeping the list in columns. (There are options for other orderings, single column, and so on.)

An ls without options only tells us the names, and we can make some guesses based on those names (for example, which file is Java source, and which is a compiled class file). The long listing ls -l will tell us more: permissions, links, owner, group, size (in bytes), and the date of last modification.

$ ls -l
total 2414
-rw-r--r--   1 albing  users        132 Jan 22 07:53 ReadMe.txt
-rw-r--r--   1 albing  users        637 Jan 22 07:52 Shift.class
-rw-r--r--   1 albing  users        336 Jan 22 07:55 Shift.java
-rw-r--r--   1 albing  users       1374 Jan 22 07:58 anIcon.gif
-rw-r--r--   1 albing  users       8564 Jan 22 07:59 dispColrs
-rw-r--r--   1 albing  users       1943 Jan 22 08:02 jam.jar
drwxr-xr-x   2 albing  users         48 Jan 22 07:52 moresrc
-rw-r--r--   1 albing  users    2435522 Jan 22 07:56 moresrc.zip
$

While ls is only looking at the “outside” of files, [5] there is a command that looks at the “inside,” the data itself, and based on that, tries to tell you what kind of file it found. The command is called file, and it takes as arguments a list of files, so you can give it the name of a single file or you can give it a whole long list of files.

Note

Remember what was said about pattern matching in the shell: we can let the shell construct that list of files for us. We can give file the list of all the files in our current directory by using the “*” on the command line so that the shell does the work of expanding it to the names of all the files in our directory (since any filename will match the star pattern).

$ file *
ReadMe.txt:  ASCII text
Shift.class: compiled Java class data, version 45.3
Shift.java:  ASCII Java program text
anIcon.gif:  GIF image data, version 89a, 26 x 26,
dispColrs:   PNG image data, 565 x 465, 8-bit/color RGB, non-interlaced
jam.jar:     Zip archive data, at least v2.0 to extract
moresrc:     directory
moresrc.zip: Zip archive data, at least v1.0 to extract
$

The file command reads the beginning of each file and compares what it finds against a database of known formats (many formats start with a distinctive signature, or “magic number”); for text files, it also examines the kinds of characters it finds there.
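To get a feel for the idea, here is a tiny Java imitation of file (the class Sniff is hypothetical). It ignores the filename and looks only at the first few bytes, checking them against four well-known signatures: 0xCAFEBABE for compiled Java classes, the eight-byte PNG header, “GIF8” for GIF images, and “PK” followed by the bytes 3 and 4 for ZIP archives (so a .jar file would be reported as Zip data too). Anything else it calls data; the real file command knows far more formats and also recognizes text.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/**
 * Tiny, hypothetical imitation of file: identify data by its first bytes,
 * not by its name. Knows only four signatures.
 */
public class Sniff {

    static String identify(byte[] head) {
        if (startsWith(head, 0xCA, 0xFE, 0xBA, 0xBE)) return "compiled Java class data";
        if (startsWith(head, 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A)) return "PNG image data";
        if (startsWith(head, 'G', 'I', 'F', '8')) return "GIF image data";
        if (startsWith(head, 'P', 'K', 0x03, 0x04)) return "Zip archive data";
        return "data";
    }

    /** True if data begins with the given byte values (compared as unsigned). */
    private static boolean startsWith(byte[] data, int... sig) {
        if (data.length < sig.length) return false;
        for (int i = 0; i < sig.length; i++) {
            if ((data[i] & 0xFF) != sig[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            Path path = Paths.get(name);
            if (Files.isDirectory(path)) {
                System.out.println(name + ": directory");
                continue;
            }
            byte[] head = new byte[8];                   // longest signature checked is 8 bytes
            int n;
            try (InputStream in = Files.newInputStream(path)) {
                n = in.readNBytes(head, 0, head.length); // Java 9+; returns bytes actually read
            }
            System.out.println(name + ": " + identify(Arrays.copyOf(head, n)));
        }
    }
}
```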

Three things to note with this output from file. First, notice that dispColrs was (correctly) identified as a PNG file, even without the .png suffix that it would normally have. That was done deliberately to show you that the type of file is based not just on the name but on the actual contents of the file.

Second, notice that the .jar file is identified as a ZIP archive. A JAR file really is a ZIP archive; the two use an identical internal format.

Third, file is not foolproof. It’s possible to have perfectly valid, compilable Java files that file thinks are C++ source, or even just English text. Still, it’s a great first guess when you need to figure out what’s in a directory.

Now let’s look at a file. The simplest way to display its contents is to use cat.

$ cat Shift.java
import java.io.*;
import java.net.*;
/**
 * The Shift object
 */
public class
Shift
{
  private int val;

  public Shift() { }

  // ... and so on

} // class Shift

When a file is longer than a few lines you may want to use more or less to look at the file. [6] These programs provide a screen’s worth of data, then pause for your input. You can press the space bar to get the next screen’s worth of output. You can type a slash, then a string, and it will search forward for that string. If you have gone farther forward in the file than you wanted, press “b” to go backwards.

To find out more about the many, many commands available, press h (for help) while either program is running.

Typical uses for these commands are:

  • To view one or more files, for example more *.java, where you can type :n to skip to the next file.

  • To page through long output from a previous pipe of commands, for example, $ grep Account *.java | more, which will search (see more on grep below) for the string Account in all of the files whose names end in .java and print out each line that is found—and that output will be paginated by more.

If you need only to check the top few lines of a file, use head. You can choose how many lines from the front of the file to see with a simple parameter. The command head -7 will write out the first seven lines, then exit.

If your interest is the last few lines of a file, use tail. You can choose how many lines from the end of the file to see; the command tail -7 will write out the last seven lines of the file. But tail has another interesting parameter, -f. Normally tail prints its lines and quits once it reaches the end of the file, but the -f option tells tail to wait after it prints the last few lines and then try again. [7] If some other program is writing to this file, then tail will, on its next read, find more data and print it out. It’s a great way to watch a log file, for example, tail -f /tmp/server.log.

In this mode, tail won’t end when it reaches the end of file, so when you want it to stop you’ll have to manually interrupt it with a ^C (Control-C—i.e., hold down the Control key and press the C key).
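Both commands are easy to approximate as standard-in filters, and the contrast between them is instructive. In this sketch (HeadTail is our own name), head can stop reading as soon as it has N lines, but tail cannot know which lines are last until the input ends, so it keeps only the most recent N lines in a queue and drops the oldest as each new line arrives. The -f behavior is not attempted here. A run might look like java HeadTail tail 7 < /tmp/server.log.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Sketch of head -N and tail -N as filters on standard in.
 */
public class HeadTail {

    /** Returns the first n lines, reading no further than necessary. */
    static List<String> head(BufferedReader in, int n) throws IOException {
        List<String> out = new ArrayList<>();
        String line;
        while (out.size() < n && (line = in.readLine()) != null) {
            out.add(line);
        }
        return out;
    }

    /** Returns the last n lines, keeping at most n lines in memory. */
    static List<String> tail(BufferedReader in, int n) throws IOException {
        Deque<String> last = new ArrayDeque<>();
        String line;
        while ((line = in.readLine()) != null) {
            last.addLast(line);
            if (last.size() > n) {
                last.removeFirst();                 // forget the oldest line
            }
        }
        return new ArrayList<>(last);
    }

    public static void main(String[] args) throws IOException {
        // usage: java HeadTail head|tail N < somefile
        int n = Integer.parseInt(args[1]);
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> lines = args[0].equals("head") ? head(in, n) : tail(in, n);
        lines.forEach(System.out::println);
    }
}
```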

The grep Command

No discussion of Linux commands would be complete without mentioning grep. Its name comes from the old ed editor command g/re/p, for “global regular expression print,” and it is a tool for searching through the contents of files. It can search not just for fixed sequences of characters, but also for regular expressions.

In its simplest form, grep myClass *.java will search for and display all lines from the specified files that contain the string myClass. (Recall that the *.java expansion is done by the shell, listing all the files that end with .java.)

The first nonoption parameter to grep, myClass in the example above, is what you want to search for. Grep treats it as a regular expression, meaning that it can contain special characters for pattern matching to make for more powerful searches (see Section 2.2.3). Some of the most common option parameters for grep are listed in Table 1.2.

Table 1.2. Options for grep

Option    Explanation
-i        Ignore upper/lower case differences in its matching.
-l        Only list the filename, not the actual line that matched.
-n        Show the line number where the match was found.
-v        Reverses the meaning of the search: shows every line that does not match the pattern.
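The options in Table 1.2 map naturally onto Java’s java.util.regex package. This sketch (MiniGrep and its boolean parameters are our own invention) applies a pattern to a list of lines and honors -i, -v, and -n. Java’s regular-expression syntax is similar to grep’s but not identical, so treat this as an illustration rather than a replacement. Its main method repeats the two-step println search shown next, with the same caveat: the line that merely mentions System.out in a comment is dropped too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical mini-grep over a list of lines, modeling -i, -v, and -n.
 */
public class MiniGrep {

    static List<String> grep(String regex, List<String> lines,
                             boolean ignoreCase, boolean invert, boolean number) {
        Pattern p = ignoreCase ? Pattern.compile(regex, Pattern.CASE_INSENSITIVE)   // -i
                               : Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            boolean found = p.matcher(lines.get(i)).find();    // match anywhere in the line
            if (found != invert) {                              // -v flips the test
                hits.add(number ? (i + 1) + ":" + lines.get(i)  // -n prefixes the line number
                                : lines.get(i));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> src = List.of(
                "System.out.println(msg);",
                "file.println(msg);  // I'm not using System.out",
                "log.println(total);");
        // Like: grep println ... | grep -v -n System.out
        List<String> step1 = grep("println", src, false, false, false);
        System.out.println(grep("System.out", step1, false, true, true));  // [3:log.println(total);]
    }
}
```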

Here’s a quick example:

grep println *.java | grep -v System.out

It will look for every occurrence of println but then exclude those that contain System.out. Be aware that while it will exclude lines like

System.out.println(msg);

it will also exclude lines like this:

file.println(msg);  // I'm not using System.out

It is, after all, just doing string searches.

The find Command

If someone compiled a list of the top 10 most useful Linux utilities, find would most likely be near the top of the list. But it would also make the top 10 most confusing. Its syntax is very unlike that of other Linux utilities. It consists of predicates: logical expressions that cause actions and have true/false values that determine whether the rest of the expression is executed. Confused? If you haven’t used find before you probably are. We’ll try to shed a little light by showing a few examples.

find . -name '*frag*' -print

This command looks for a file whose name contains frag. It starts looking in the current directory and descends into all subdirectories in its search.

find /over/there . /tmp/here -name '*frag*.java' -print

This command looks for a file that has frag in its name and ends with .java. It searches for this file starting in three different directories—the current directory (“.”), /over/there, and /tmp/here.

find . -name 'My[A-Z]*.java' -exec ls -l '{}' \;

Starting in the current directory, this command searches for files whose names begin with My followed by an uppercase alphabetic character followed by anything else, ending with .java. When it finds such a file, it will execute a command, in this case the ls command with the -l option. The braces are replaced with the name of the file that is found; the “;” marks the end of the command for find. The backslash in front of it keeps the shell from treating the “;” as the end of the whole command line, so that it is passed along to find.

The -name is called a predicate; it takes as its argument a shell-style wildcard pattern (the same *, ?, and [...] notation the shell uses for filenames, not a regular expression). The quotes around the pattern keep the shell from expanding it before find sees it. For any file whose name matches the pattern, the predicate is true, so control passes on to the next predicate. In the first example that is simply -print, which prints the filename (to standard out) and is always true (but since no other predicate follows it in this example, that doesn’t matter). Since only the names that match the pattern make the -name predicate true, only those names get printed.

There are other predicates besides -name. You can get an entire list by typing man find at a command prompt, but Table 1.3 lists a few gems, to give you a taste of what find can do.

Table 1.3. Some find predicates

Option

Explanation

-type d

Is true if the file is a directory.

-type f

Is true if the file is a plain (regular) file, not a directory, symbolic link, or device.

-mtime -5

Is true if the file was modified within the last five days. A +5 would mean more than five days ago, and a 5 with no sign means exactly five days ago. (find counts in whole 24-hour periods, discarding any fraction.)

-atime -5

Is true if the file was accessed within the last five days. The + and - mean greater and less than the specified time, as in the previous example.

-newer myEx.class

Is true if the file is newer than the file myEx.class.

-size +24k

Is true if the file is greater than 24K. The suffix c would mean bytes or characters (since b stands for 512-byte blocks in this context). The + and - mean greater and less than the specified size, as in the other examples.
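The sketch below tries several of these predicates on scratch files whose names and timestamps are invented for the occasion; touch -t backdates one file so that -mtime and -newer have something to tell apart.

```shell
#!/bin/bash
# Sketch: a few Table 1.3 predicates on made-up scratch files.
cd "$(mktemp -d)" || exit 1
mkdir old
touch -t 202001010000 old/Ancient.java     # pretend it was last modified in 2020
touch Fresh.java                            # modified just now
head -c 30000 /dev/zero > Big.java          # about 29K of zero bytes

find . -type d -print                       # directories only: . and ./old
find . -name '*.java' -mtime +90 -print     # ./old/Ancient.java
find . -name '*.java' -newer old/Ancient.java -print   # Fresh.java and Big.java
find . -size +24k -print                    # ./Big.java
```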

Let’s look at an example to see how they fit together:

$ find . -name '*.java' -mtime +90 -atime +30 -print
./MyExample.java
./old/sample/MyPrev.java
$

This command printed the names of two files ending in .java found beneath the current directory. These files had not been modified in the last 90 days, nor accessed in the last 30 days. The next thing you might want to do is run the command again with an -exec at the end to remove these old files:

$ find . -name '*.java' -mtime +90 -atime +30 -print -exec rm '{}' \;
./MyExample.java
./old/sample/MyPrev.java
$

The Shell Revisited

Most Linux shells—the command interpreters—can be considered programming languages in their own right. That is, they have variables and control structures—if statements, for loops, and so on. While the syntax can be subtly different between shells, the basic constructs are all there.

Entire books can be—and have been—written on shell programming. (It’s one of our favorite subjects to teach.) Programs written in the shell language are often called shell scripts. Such scripts can be powerful yet easy to write (once you are familiar with the syntax) and can make you very productive in dealing with all those little housekeeping tasks that accompany program development. All you need to do (dangerous words, no?) is to put commands in a text file and give the file execute permissions. But that’s a subject for another day.

Some elements of shell scripting, however, are useful even if you never create a single shell script. Of these, perhaps the most important to know (especially for Java programmers) is how to deal with shell variables.

Note

We’ll be describing the syntax for bash, the default shell on most Linux distributions. The syntax will differ for other shells, but the concepts are largely the same.

Any string of letters, digits, and underscores that does not begin with a digit can be used as the name of a variable. By convention, shell variables typically use uppercase names, but that is only a convention (although it will hold true for most if not all of our examples, too). Since commands in Linux are almost always lowercase, the use of uppercase for shell variables helps them stand out.

Set the value of a shell variable with the familiar method—the equal sign:

$ FILE=/tmp/abc.out
$

This has assigned the variable FILE the value /tmp/abc.out. But to make use of the value that is now in FILE, the shell uses syntax that might not be familiar to you: The name must be preceded with a “$”.
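A minimal sketch of the two halves, assignment and use, follows. The braces in the third line are an optional longer form, ${FILE}, that marks exactly where the variable name ends; the .bak suffix is only an illustration.

```shell
#!/bin/bash
FILE=/tmp/abc.out        # assign: no "$", and no spaces around the "="
echo "$FILE"             # use: the "$" fetches the value, printing /tmp/abc.out
echo "${FILE}.bak"       # braces delimit the name, printing /tmp/abc.out.bak
# FILE = /tmp/abc.out    # wrong: with spaces, the shell tries to run a command named FILE
```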

Shell variables can be passed down to child processes (subshells and the programs you run) if they are exported, but they can never be passed back up to the parent. To make a shell variable available both to your current shell and to every subsequent subshell, export the variable:

$ export FILE
$

You can combine the assignment of a value with the exporting into one step. Since repeating the export doesn’t hurt, you will often see shell scripts use the export command every time they do an assignment, as if it were part of the assignment syntax—but you know better.

$ export FILE="/tmp/way.out"
$
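To watch a variable travel down but not back up, the sketch below starts child shells with sh -c. The single quotes keep the parent shell from expanding $FILE, so each child looks it up in its own environment.

```shell
#!/bin/bash
# Sketch: exported variables flow down to child processes, never back up.
unset FILE
FILE=/tmp/way.out
sh -c 'echo "child sees: [$FILE]"'     # not exported yet: prints child sees: []
export FILE
sh -c 'echo "child sees: [$FILE]"'     # exported: prints child sees: [/tmp/way.out]
sh -c 'FILE=changed; export FILE'      # the child changes only its own copy
echo "parent still has: $FILE"         # prints parent still has: /tmp/way.out
```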

Note

The shell uses the dollar sign to distinguish between the variable name and just text of the same letters. Consider the following example:

$ echo first > FILE
$ echo second > TEXT
$ FILE=TEXT
$ cat FILE
first
$

The cat command will dump the contents of the file named FILE to the screen—and you should see first. But how would you tell the shell that you want to see the contents of the file whose name you have put in the shell variable FILE? For that you need the “$”:

$ cat $FILE
second
$

This is a contrived example, but the point is that shell syntax supports arbitrary strings of characters in the command line—some of them are filenames, others are just characters that you want to pass to a program. It needs a way to distinguish those from shell variables. It doesn’t have that problem on the assignment because the “=” provides the needed clue. To say it in computer science terms, the “$” syntax provides the R-value of the variable. (Not the insulation R-value, but what you expect when a variable is used on the Right-hand-side of an assignment operator, as opposed to the L-value used on the Left-hand-side of an assignment operator.)

There are several shell variables that are already exported because they are used by the shell and other programs. You may need or want to set them to customize your environment. Since they are already exported, you won’t need to use the export command and can just assign a value, but it doesn’t hurt.

The most important shell variable to know is PATH. It defines the directories in the filesystem where the shell will look for programs to execute. When you type a command like ls or javac the shell will look in all of the directories specified in the PATH variable, in the order specified, until it finds the executable.

$ echo $PATH
/usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:.
$

The PATH shown in the example has five directories, separated by colons (“:”). (Note the fifth one, the “.”; it says to look in the current directory.) Where do you suppose it will find cat? You can look for it yourself by searching in each directory specified in PATH. Or you can use the which command:

$ which cat
/bin/cat
$

Some commands (like exit) don’t show up, since they are built into the shell. Others may be aliases, but that opens a whole other topic that we aren’t covering here. Just remember that each directory in the PATH variable is examined, in order, for the executable you want to run. If you get a command not found error, the command may well be installed; it just may not be in any directory on your PATH.
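The search that which performs can be sketched by hand in a few lines of bash. The helper name find_in_path is made up for this illustration and is not a standard command; bash’s own type command is used at the end to show how a builtin is reported instead.

```shell
#!/bin/bash
# Sketch: do by hand what "which" does, checking each PATH directory in order.
# find_in_path is a made-up helper name, not a standard command.
find_in_path() {
  local dir
  local IFS=:                     # split PATH at the colons
  for dir in $PATH; do
    [ -z "$dir" ] && dir=.        # an empty entry also means the current directory
    if [ -f "$dir/$1" ] && [ -x "$dir/$1" ]; then
      echo "$dir/$1"              # the first match wins, as when the shell runs it
      return 0
    fi
  done
  return 1                        # nothing found: the "command not found" case
}

find_in_path cat                  # e.g. /bin/cat, depending on your PATH
type exit                         # bash reports that exit is a shell builtin
```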

To look at it the other way around: If you want to install a command so that you can execute it from the command line, you can either always type its full pathname, or (a more user-friendly choice) you can set your PATH variable to include the location of the new command’s executable.

So where and how do you set PATH? Whenever a shell starts up, it reads some initialization files. These are shell scripts that are read and executed as if they were typed by the user, that is, not in a subshell. Among other actions, they often set values for variables like PATH. If you are using bash, look at .bashrc in your home directory. (A login shell reads .bash_profile instead, which on many distributions in turn reads .bashrc.)

Shell scripts are just shell commands stored in a file so that you don’t need to type the same commands and options over and over. There are two ways to run a shell script. The easiest, often used when testing the script, is

$ sh myscript

where myscript is the name of the file in which you have put your commands. (See Chapter 2 for more on how to do that.) Once you’ve got a script running the way you’d like, you might want to make its invocation as seamless as any other command. To do that, change its permissions to include the execution permission and then, if the file is located in a place that your PATH variable knows about, it will run as a command. Here’s an example:

$ chmod a+rx myscript
$ mv myscript ${HOME}/bin
$ myscript
... (script runs)
$

The file was put into the bin directory off of the home directory. That’s a common place to put homebrew commands. Just be sure that $HOME/bin is in your PATH, or edit .bashrc and add it.
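One common way to do that is a few lines like the following in .bashrc. This is a sketch: the case test, which only keeps the directory from being added twice when .bashrc is read again, is our own addition rather than anything bash requires.

```shell
# Sketch of lines to add to ~/.bashrc: put $HOME/bin on the PATH once.
case ":$PATH:" in
  *":$HOME/bin:"*) ;;                     # already on the PATH: leave it alone
  *) export PATH="$HOME/bin:$PATH" ;;     # otherwise add it at the front
esac
```

Putting $HOME/bin at the front means your own commands are found before system commands of the same name; append it instead (PATH="$PATH:$HOME/bin") if you would rather the system versions win.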

If you want to parameterize your script, use the variables $1, $2, and so on, which hold the first, second, and later arguments on the command line that invoked the script. If you type myscript Account.java, then $1 has the value Account.java for that invocation of the script.
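A minimal sketch of how a script sees its arguments follows. Rather than require a second file, it uses set -- to fill in $1, $2, and so on as if the script had been invoked as myscript Account.java Main.java (both filenames are made up); $# holds the count and "$@" expands to all of the arguments.

```shell
#!/bin/bash
# Sketch: how a script sees its command-line arguments.
# "set --" stands in for typing: myscript Account.java Main.java
set -- Account.java Main.java

echo "first argument:  $1"    # Account.java
echo "second argument: $2"    # Main.java
echo "argument count:  $#"    # 2
for f in "$@"; do             # "$@" expands to every argument, each kept intact
  echo "would process $f"
done
```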

We don’t have the space to go into all that we’d like to about shell programming, but let us leave you with a simple example that can show you some of its power. Used in shell scripts, for loops can take a lot of drudgery out of file maintenance. Here’s a simple but real example.

Imagine that your project has a naming convention that all Java files associated with the user interface on your project will begin with the letters “UI”. Now suppose your boss decides to change that convention to “GUI” but you’ve already created 200 or more files using the old naming convention. Shell script to the rescue:

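#!/bin/bash
# Rename every UI*.java file in the current directory to GUI*.java
for i in UI*.java
do
  new="G${i}"          # UI_Button.java becomes GUI_Button.java
  echo "$i ==> $new"
  mv "$i" "$new"
done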

You could just type those commands from the command line—that’s the nature of shell syntax. But putting them into a file lets you test out the script without having to type it over and over, and keeps the correct syntax once you’ve got it debugged. Assuming we put those commands into a file called myscript, here’s a sample run:

$ myscript
UI_Button.java ==> GUI_Button.java
UI_Plovar.java ==> GUI_Plovar.java
UI_Screen.java ==> GUI_Screen.java
UI_Tofal.java ==> GUI_Tofal.java
UI_Unsov.java ==> GUI_Unsov.java
...
$

Imagine having to rename 200 files. Now imagine having to do that with a point-and-click interface. It could take you all morning. With our shell script, it will be done in seconds.
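The loop and the positional parameters combine naturally. The sketch below generalizes the rename so the old and new prefixes are arguments; renameprefix is a made-up name, written as a function so that it can be tried in one file, and saved as a script it would read $1 and $2 in exactly the same way. It also skips the case where no file matches, in which the shell leaves the unexpanded pattern in the loop variable.

```shell
#!/bin/bash
# Sketch: the rename loop generalized to take the prefixes as parameters.
# renameprefix is a made-up name; usage: renameprefix OLD NEW
renameprefix() {
  local old=$1 new=$2 i target
  for i in "$old"*.java; do
    [ -e "$i" ] || continue           # no match leaves the pattern itself in $i; skip it
    target="$new${i#"$old"}"          # drop the old prefix, then prepend the new one
    echo "$i ==> $target"
    mv -- "$i" "$target"
  done
}

# Try it in a scratch directory with a few made-up files.
cd "$(mktemp -d)" || exit 1
touch UI_Button.java UI_Screen.java Main.java
renameprefix UI GUI
ls
```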

We can’t hope to cover all that we’d like to about shell scripting. Perhaps we have been able to whet your appetite. There are lots of books on the subject of shell programming. We’ve listed a few at the end of this chapter.

The tar and zip Commands

The tar and zip commands let you pack data into an archive and extract it again. Both are lossless (unlike some image compression algorithms), so you get back out exactly what you put in, while the archive can take up less space.[8] Therefore tar and zip are often used for data backup, archival, and network transmission.

There are three basic actions that you can take with tar, and you can specify which action you want with a single letter[9] in the arguments on the command line. You can either

  • c: Create an archive.

  • x: Extract from an archive.

  • t: Get a table of contents.

In addition, you’ll want to know these options:

  • f: The next parameter is the filename of the archive.

  • v: Provide more verbose output.

Table 1.4 shows examples of each of the basic actions, using these options.

Table 1.4. Examples of the tar command

Command

Explanation

tar tvf packedup.tar

Gives a table of contents, in long (or verbose) form. Without the v, all you get is the filenames; with the v you get additional information similar in format to the ls -l command.

tar xvf packedup.tar

Extracts all the files from the TAR file, creating them according to their specified pathname, assuming your user ID and file permissions allow it. Remove the v option if you don’t want to see each filename as the file is extracted.

tar cvf packedup.tar mydir

Creates a TAR archive named packedup.tar from the mydir directory and its contents. Remove the v option if you don’t want to see each filename as the file is added to the archive.
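The three actions can be tried together in a scratch directory. In this sketch the directory mydir and its two files are made up; the archive is listed before it is unpacked, extracted into a separate directory, and compared with the original to show that nothing was lost.

```shell
#!/bin/bash
# Sketch: create, list, and extract a TAR archive in a scratch directory.
# mydir and its two files are made up for the example.
cd "$(mktemp -d)" || exit 1
mkdir -p mydir/sub
echo 'class A {}' > mydir/A.java
echo 'class B {}' > mydir/sub/B.java

tar cvf packedup.tar mydir                  # c: create packedup.tar from mydir
tar tvf packedup.tar                        # t: look before you unpack
mkdir restore
(cd restore && tar xvf ../packedup.tar)     # x: extract inside ./restore
diff -r mydir restore/mydir && echo "same contents"   # nothing lost in the round trip
```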

Now let’s do the same thing using the zip command (Table 1.5). There are actually two commands here—one to compress the files into an archive (zip), and the other to reverse the process (unzip).

Table 1.5. Examples of the zip and unzip commands

Command

Explanation

unzip -l packedup.zip

Gives a table of contents of the archive with some extra frill around the edges, like a count of the files in the archive.

unzip packedup.zip

Extracts all the files from the ZIP file, creating them according to their specified pathname, assuming your user ID and file permissions allow it. Add the quiet option with -q if you would like unzip not to list each file as it unzips it.

zip -r packedup mydir

Creates a ZIP archive named packedup.zip from the mydir directory and its contents. The -r tells zip to recursively descend into all the subdirectories, their subdirectories, and so on; without it, zip stores only an entry for the mydir directory itself, not the files inside it.

Tip

Since TAR and ZIP files can contain absolute as well as relative pathnames, it is a good idea to look at their contents (e.g., tar tvf file) before unpacking them, so that you know what is going to be written where.

There are many, many more options for tar and zip that we are not covering here, but these are the most common in our experience, and they will give you a good start.

The tar and zip commands are also worth knowing for a Java developer because of their relationship to JAR files. If you are working with Java, you will soon run across the notion of a Java ARchive file, or JAR file, recognizable by its .jar extension. Certain Java tools are built to understand the internal format of JAR files. For Enterprise Java (J2EE) there are similar archives known as WAR files and EAR files. The syntax of the jar command, which builds these archives, is very similar to the basic syntax of tar, but the internal format of a JAR file is the same as that of a ZIP file. In fact, most places where you can use a JAR file you can use a ZIP file as well. (You will see more about this when we discuss the standard Java tools in Section 5.11.)

Tip

Here’s one more handy example we know you’ll use:

find . -name '*.java' -print | zip allmysource -@

This command starts in the current directory (“.”) and finds every file whose name ends in .java. It passes those names to zip, which (told to do so by the -@ argument) reads them from standard in instead of from its argument list and packs them all into an archive named allmysource.zip. To put it simply, it zips up all your Java source files from the current directory on down.
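If zip isn’t installed, GNU tar can do much the same job: its -T option reads the list of names to archive from a file, and a file name of - means standard input. This is a sketch assuming GNU tar; the archive name allmysource.tar and the scratch files are made up.

```shell
#!/bin/bash
# Sketch, assuming GNU tar: archive every .java file named on standard input.
cd "$(mktemp -d)" || exit 1
mkdir -p src/util
touch src/Main.java src/util/Helper.java src/notes.txt

# find prints one name per line; tar -T - reads that list from standard in
find . -name '*.java' -print | tar cvf allmysource.tar -T -
tar tf allmysource.tar                # lists only the two .java files
```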

The man Command

Primitive but handy, the man command (short for manual) was the early UNIX online manual. While we’ve come to expect (and ignore) online help, the idea of online manuals was rather revolutionary in the early days of UNIX. In contrast to walls of printed documentation, UNIX provided terse but definitive descriptions of its various commands. When they are done well, these descriptions are an invaluable reference. They are not the best way to learn about a command, but they can be a great guide to using the command’s options correctly.

The format is simply man followed by the name of the command about which you want information. So man man will tell you about the man command itself.

The most useful option to man is the -k option. It does a keyword search of the names and one-line descriptions of all the manpages, looking for the keyword that you give. Try typing man -k java to see what commands are available. A (1) after a name means that it’s a user command, something that you can type at the shell prompt, as opposed to (2), which is a system call, or (3), which is a C library call. These numbers refer to the sections of the original UNIX documentation (section one was shell commands, and so on), all of which once fit into a single three-ring binder.

Tip

One other way to find out something about a command, if you know the command name already, is to ask the command itself for help. Many commands accept a -? or --help option. Try --help first. If you need to type -?, either put it in single quotes ('-?') or type a backslash before the question mark (-\?), since ? is a pattern-matching character to the shell.
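The sketch below shows why the quoting matters. It creates a file whose name, -a, happens to match the pattern -? (the file is made up for the demonstration), so an unquoted -? is silently replaced by that filename before the command ever runs.

```shell
#!/bin/bash
# Sketch: why -? needs quoting. The file "-a" is made up for this demo.
cd "$(mktemp -d)" || exit 1
touch -- -a                 # -- tells touch that -a is a filename, not an option

echo -?                     # unquoted: the shell expands -? to -a, so this prints -a
echo '-?' -\?               # quoted or escaped: prints -? -? unchanged
```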

There are other help systems available, such as info and some GUI-based ones. But man provides some of the quickest and most terse help when you need to check the syntax of a command or find out if there is an option that does what you need.

Review

We’ve looked at commands that will show you where files are in your directory structure, show files’ permissions and sizes, change the permissions, show you what is in a file, look for files by searching for strings, and look for files based on names or other properties.

Even so, we’ve given only the briefest coverage to a few of the scores of Linux commands worth knowing. Chief among these is the shell, bash in our case. Whole books have been written on this subject, and you would do well to have one at hand.

What You Still Don’t Know

The shell is a powerful language in its own right. While you may think of it mostly as a command interpreter used for running other commands, it is, in fact, a language, complete with variables, logic, and looping constructs. We are not suggesting that you write your application in shell scripts, but you will find it useful for automating many repetitive tasks. There is so much that can be done with shell scripts that we encourage you to read more about this and to talk with other Linux users.

Linux is replete with commands. Some are powerful languages like awk and perl; others are simple, handy utilities like head, tail, sort, tr, and diff. There are hundreds of other commands that we don’t even have time to mention.

Resources

  • Cameron Newham and Bill Rosenblatt, Learning the Bash Shell, O’Reilly & Associates, ISBN 1565923472.

  • Ellie Quigley, Linux Shells by Example, 4th ed., Prentice Hall PTR, ISBN 013147572X.

  • Rafeeq Rehman and Christopher Paul, The Linux Development Platform, Prentice Hall PTR.

  • Mark G. Sobell, A Practical Guide to Linux, Addison-Wesley, ISBN 0201895498.

  • Mark G. Sobell, A Practical Guide to Red Hat Linux, Addison-Wesley, ISBN 0201703130.



[1] If you’re not using a windowing system, these commands are typed at the shell prompt that you get after you log in. But if you’re not using a windowing system, either you’re not a beginner (and don’t need this introduction) or you can’t get your windowing system to work, in which case you may need more help than we can give you here.

[2] Yes, we are aware that much of UNIX actually comes from the Multics project, but we credit UNIX with popularizing it.

[3] The use of the number 2 comes from an implementation detail: All the I/O descriptors for a UNIX process were kept in an array. The first three elements of the array, numbered 0, 1, and 2, were defined to be the standard in, out, and err, in that order. Thus in the shell you can also redirect standard out by using “1>” as well as the shorter “>”.

[4] On Linux the use of this command is restricted to the superuser, or “root.”

[5] Technically, ls (without arguments) need only read the directory, whereas ls -l looks at the contents of the inode in order to get all the other information (permissions, size, and so on), but it doesn’t look at the data blocks of the file.

[6] Like any open marketplace, the marketplace of ideas and open source software has its “me-too” products. Someone thought they could do even better than more, so they wrote a new, improved, and largely upward-compatible command. They named it less, on the minimalist philosophy (with apologies to Dave Barry: “I am not making this up”) that “less is more.” Nowadays, more is rather passé. The less command has more features and has largely replaced it. In fact, on many Linux distributions, more is a link to less. In the name of full disclosure, there is also a paging program called pg, the precursor to more, but we’ll say no more about that.

[7] The less command has the same feature. If you press “F” while viewing a file, less goes into a mode that works just like tail -f. As is often the case in the wacky world of Linux, there is more than one way to do it.

[8] Well, technically, tar doesn’t compress the data in the file, but it can provide a certain amount of “compression” by cutting off the tail ends of blocks of data. Disk space is allocated in fixed-size “chunks” (not the technical term), so a 37-byte file stored on its own takes up 4K of disk space on a filesystem with 4K blocks. Inside a TAR file, by contrast, each file takes up a 512-byte header plus its data rounded up to the next multiple of 512 bytes. So, for example, ten 400-byte files that occupy 40K on such a filesystem fit in a 20K TAR file made by GNU tar with its default settings (1K per file, plus an end-of-archive marker, with the whole archive padded out to a multiple of 10K). So, while tar won’t compress the data inside the file (and thus is quite assuredly “lossless”), it can result in a smaller file.

[9] Linux option strings always start with a “-”, right? Yes, except for tar. It seems there is always an exception to every rule. Newer versions of tar allow the leading minus sign but, for historical compatibility, also work without it. Early versions of UNIX had only single-letter options. The GNU tools, which means essentially all flavors of Linux, also support longer full-word options prefixed with a double minus, as in --extract instead of x or -x.
