Chapter 11. Index Generation

To find a topic of interest in a large document, book, or reference work, you usually turn to the table of contents or, more often, to the index. An index is therefore a very important part of a document, and most users enter a source of information through a pointer in the index. You should plan an index and develop it along with the main text [38]. For consistency, it is beneficial to use special commands, following the technique discussed below, so that a given keyword is always printed the same way in the text and in the index throughout the whole document.

This chapter first reviews the basic indexing commands provided by standard LaTeX, and explains which tools are available to help you build a well-thought-out index. The LaTeX Manual itself does not contain a lot of information about the syntax of the index entries. However, several articles in TUGboat deal with the question of generating an index with TeX or LaTeX [47,162,163]. The syntax described in Section 11.1 is the one recognized by MakeIndex [37, 103] and xindy [71, 76, 152], the most widely used index preparation programs.

Section 11.2 describes how the MakeIndex processor is used. The interpretation of the input file and the format of the output file are controlled by style parameters. Section 11.2.4 lists these parameters and gives several simple examples to show how changing them influences the typeset result.

Section 11.3 presents xindy, an alternative to MakeIndex. It’s preferable to use this program whenever you have non-English documents or other special demands, such as production of technical indexes. The xindy program provides total flexibility for merging and sorting index entries, and for arbitrary formatting of references.

The final section describes several LaTeX packages to enhance the index and to create multiple indexes, which will be discussed with the help of an example.

The process of generating an index with LaTeX and either MakeIndex or xindy is shown schematically in Figure 11.1.

Image

Figure 11.1. The sequential flow of index processing and the various auxiliary files used by LaTeX and external index processors

Figure 11.2 on the next page shows, with an example, the various steps involved in transforming an input file into a typeset index. It also shows, in somewhat more detail, which files are involved in the index-generating process. Figure 11.2(a) shows some occurrences of index commands (\index) in the document source, with corresponding pages listed on the left. Figure 11.2(b) shows a raw index .idx file generated by LaTeX. File extensions may differ when using multiple indexes or glossaries. After running the .idx file through the index processor, it becomes an alphabetized index .ind file with LaTeX commands specifying a particular output format [Figure 11.2(c)]. The typeset result after formatting with LaTeX is shown in Figure 11.2(d).

Image

Figure 11.2. Stepwise development of index processing

LaTeX and MakeIndex, when employed together, use several markup conventions to help you control the precise format of the output. The xindy program has a MakeIndex compatibility mode that supports the same format. In Section 11.1, which describes the format of the \index command, we always use the default settings.

11.1. Syntax of the index entries

This section describes the default syntax used to generate index entries with LaTeX and either MakeIndex or xindy. Different levels of complexity are introduced progressively, showing, for each case, the input file and the generated typeset output.

Figures 11.3 and 11.4 on page 656 show the input and generated output of a small LaTeX document, where various simple possibilities of the \index command are shown, together with the result of including the showidx package (see Section 11.4.2). To make the index entries consistent in these figures (see Section 11.1.7), the commands \Com and \Prog were defined and used. The index-generating environment theindex has been redefined to get the output on one page (Section 11.4.1 explains how this can be done).

Image

Figure 11.3. Example of index commands and the showidx package. This file is run through LaTeX once, then the index processor is executed and LaTeX is run a second time.

Image

Figure 11.4. This figure shows the index generated by the example input of Figure 11.3. All index entries are shown in the margin, so it is easy to check for errors or duplications.

Generating the raw index

After introducing the necessary index commands in the document, we want to generate the index to be included once again in the LaTeX document on a subsequent run. If the main file of a document is main.tex, for example, then the following changes should be made to that file:

• Include the makeidx package with a \usepackage command.

• Put a \makeindex command in the document preamble.

• Put a \printindex command where the index is to appear, usually at the end, right before the \end{document} command.

You then run LaTeX on the entire document, causing it to generate the file main.idx, which we shall call the .idx file.
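The three changes listed above can be sketched as a minimal document. The file name main.tex comes from the text; the sample sentence and index terms are illustrative assumptions only.

```latex
% main.tex: minimal sketch of a document prepared for index generation
\documentclass{article}
\usepackage{makeidx}   % provides \printindex and \see
\makeindex             % tells LaTeX to write the raw index file main.idx

\begin{document}
Some text about style\index{style} and stylists\index{stylist}.

\printindex            % reads main.ind once the index processor has produced it
\end{document}
```

On the first run no .ind file exists yet, so nothing is printed at the position of \printindex; the index appears only after the index processor has run and LaTeX is run again.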

11.1.1. Simple index entries

Each \index command causes LaTeX to write an entry in the .idx file. The following example shows some simple \index commands, together with the index entries that they produce. The page number refers to the page containing the text where the \index command appears. As shown in the example below, duplicate commands on the same page (such as \index{stylistic} on page 23) produce only one “23” in the index.

Image

Spaces can be harmful

Pay particular attention to the way spaces are handled in this example. Spaces inside \index commands are written literally to the output .idx file and, by default, are treated as ordinary characters by MakeIndex, which places them in front of all letters. In the example above, look at the style entries on pages 14 and 16. The leading spaces are placed at the beginning of the index and on two different lines because the trailing blank on page 16 lengthens the string by one character. We end up with four different entries for the same term, an effect that was probably not desired. It is therefore important to eliminate such spurious spaces from the \index commands when you use MakeIndex. Alternatively, you can specify the -c option when running the index processor. This option suppresses the effect of leading and trailing blanks (see Sections 11.2.2 and 11.3.1). Another frequently encountered error occurs when the same English word is spelled inconsistently with initial lowercase and uppercase letters (as with Stylist on page xi), leading to two different index entries. Of course, this behavior is wanted in languages like German, where “Arm” (arm) and “arm” (poor) are really two completely different words. In English, such spurious double entries should normally be eliminated.

If you use xindy, space compression is done automatically. Furthermore, xindy supports international indexing and thus correctly and automatically handles case sensitivity in a language-specific way. Therefore, with xindy you won’t encounter the problems mentioned above.

11.1.2. Generating subentries

A maximum of three levels of index entries (main, sub, and subsub entries) are available. To produce such entries, the argument of the \index command should contain both the main entries and subentries, separated by a ! character. This character can be redefined in the MakeIndex style file (see Table 11.1 on page 660).

Image

Table 11.1. Input style parameters for MakeIndex

Image
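As an illustration of the ! separator, the three entry levels could be entered as follows. This is a minimal sketch; the index terms are made up for the purpose.

```latex
\documentclass{article}
\usepackage{makeidx}
\makeindex
\begin{document}
Boxes\index{box} have parameters\index{box!parameters},
among them a width\index{box!parameters!width}.  % main!sub!subsub
\printindex
\end{document}
```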

11.1.3. Page ranges and cross-references

You can specify a page range by putting the command \index{...|(} at the beginning of the range and the command \index{...|)} at the end of the range. Page ranges should span a homogeneous numbering scheme (e.g., Roman and Arabic page numbers cannot fall within the same range). Note that MakeIndex and xindy do the right thing when both ends of a page range fall on the same page, or when an entry falls inside an active range.

You can also generate cross-reference index entries without page numbers by using the see encapsulator. Because the “see” entry does not print any page number, the commands \index{...|see{...}} can be placed anywhere in the input file after the \begin{document} command. For practical reasons, it is convenient to group all such cross-referencing commands in one place.

Image
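A short sketch of both constructs follows; the terms “fonts” and “typefaces” are illustrative assumptions. The cross-reference is placed right after \begin{document}, and the range is opened and closed on different pages.

```latex
\documentclass{article}
\usepackage{makeidx}
\makeindex
\begin{document}
\index{fonts|see{typefaces}}   % cross-reference, printed without a page number

\index{typefaces|(}            % the range opens on this page
First page of the discussion.
\newpage
Last page of the discussion.
\index{typefaces|)}            % the range closes on this page

\printindex
\end{document}
```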

11.1.4. Controlling the presentation form

Sometimes you may want to sort an entry according to a key, while using a different visual representation for the typesetting, such as Greek letters, mathematical symbols, or specific typographic forms. This function is available with the syntax key@visual, where key determines the alphabetical position and the string visual produces the typeset text of the entry.

Image
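The key@visual syntax might be used as in the following sketch, where the part before @ is the sort key and the part after it is what gets typeset. The entries are illustrative only.

```latex
\documentclass{article}
\usepackage{makeidx}
\makeindex
\begin{document}
The angle $\alpha$\index{alpha@$\alpha$} is sorted under ``alpha''.
The \texttt{tabular}\index{tabular@\texttt{tabular} environment}
environment is sorted under ``tabular''.
\printindex
\end{document}
```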

For some indexes, certain page numbers should be formatted specially. For example, an italic page number might indicate a primary reference, or an “n” after a page number might denote that the item appears in a footnote on that page. MakeIndex allows you to format an individual page number in any way you want by using the encapsulator syntax specified by the | character. What follows the | sign will “encapsulate” or enclose the page number associated with the index entry. For instance, the command \index{keyword|xxx} will produce a page number of the form \xxx{n}, where n is the page number in question. Similarly, the commands \index{keyword|(xxx} and \index{keyword|)xxx} will generate a page range of the form \xxx{n–m}.

Preexisting commands (like \textit in the example below) or user commands can be used to encapsulate the page numbers. As an example, a document containing the command definition

Image

would yield something like this:

Image
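The following sketch shows one way such encapsulators might be set up. The command name \nn and the sample text are assumptions made for illustration; following the rule stated above, the two page references would reach the index as \textit{1} and \nn{2}.

```latex
\documentclass{article}
\usepackage{makeidx}
\makeindex
% Hypothetical encapsulator: append an "n" to mark a footnote reference.
\newcommand{\nn}[1]{#1n}
\begin{document}
Primary discussion of tabular material.\index{tabular|textit}
\newpage
Another remark.\footnote{See also tabular.\index{tabular|nn}}
\printindex
\end{document}
```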

The see encapsulator is a special case of this facility, where the \see command is predefined by the makeidx package.

11.1.5. Printing special characters

To typeset one of the characters having a special meaning to MakeIndex or xindy (!, ", @, or |)1 in the index, precede it with a " character. More precisely, any character is said to be quoted if it follows an unquoted " that is not part of a \" command. The latter case allows for umlaut characters. Quoted !, @, ", and | characters are treated like ordinary characters, losing their special meaning. The " preceding a quoted character is deleted before the entries are alphabetized.

1 As noted earlier, in MakeIndex other characters can be substituted for the default ones and carry a special meaning. This behavior is explained on page 662.

Image
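The quoting rule can be tried out with a few entries such as the ones below. This is a sketch with made-up entries, using the default special characters; the T1 encoding is loaded only so that the vertical bar and the straight quote print as themselves.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{makeidx}
\makeindex
\begin{document}
Text.%
\index{exclamation mark ("!)}%  quoted !, not a level separator
\index{at sign ("@)}%           quoted @, not a sort-key separator
\index{vertical bar ("|)}%      quoted |, not an encapsulator
\index{double quote ("")}%      quoted ", printed literally
\index{M\"adchen}%              \" is the accent command, so the a is not quoted

\printindex
\end{document}
```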

11.1.6. Creating a glossary

LaTeX also has a \glossary command for making a glossary. The \makeglossary command produces a file with an extension of .glo, which is similar to the .idx file for the \index commands. LaTeX transforms the \glossary commands into \glossaryentry entries, just as it translates any \index commands into \indexentry entries.

MakeIndex can also handle these glossary commands, but you must change the value for some of the style file keywords, as shown in the style file myglossary.ist.

Image

In addition, you have to define a suitable theglossary environment.
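The pieces described in this section might fit together as in the sketch below. Everything in it is an assumption made for illustration: the contents of the style file (written here through a filecontents* environment), the definition of theglossary (modeled on the theindex layout of the article class), and the output file name main.gls.

```latex
% Sketch of a style file in the spirit of myglossary.ist (contents assumed)
\begin{filecontents*}{myglossary.ist}
% read \glossaryentry lines instead of \indexentry lines
keyword "\\glossaryentry"
% wrap the output in a theglossary environment
preamble "\n\\begin{theglossary}\n"
postamble "\n\n\\end{theglossary}\n"
\end{filecontents*}
\documentclass{article}
\makeglossary                 % write the raw glossary file \jobname.glo
\makeatletter
% Hypothetical environment, modeled on the theindex layout of article
\newenvironment{theglossary}
  {\section*{Glossary}\setlength{\parindent}{0pt}\let\item\@idxitem}
  {\par}
\makeatother
\begin{document}
Kerning\glossary{kerning} adjusts the space between letter pairs.

% After running: makeindex -s myglossary.ist -o main.gls main.glo
\IfFileExists{\jobname.gls}{\input{\jobname.gls}}{}
\end{document}
```

The \IfFileExists guard keeps the first LaTeX run from failing while the sorted glossary file does not yet exist.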

11.1.7. Defining your own index commands

As was pointed out in the introduction, it is very important to use the same visual representation for identical names or commands throughout a complete document, including the index. You therefore can define user commands, which always introduce similar constructs in the same way into the text and the index.

For example, you can define the command \Index, whose argument is entered at the same time in the text and in the index.

Image

As explained in more detail below, you must be careful that the argument of such a command does not contain expandable material (typically control sequences) or spurious blanks. In general, for simple terms like single words, there is no problem and this technique can be used. You can even go one step further and give a certain visual representation to the entry—for instance, typesetting it in a typewriter font.

Image

Finally, you can group certain terms by defining commands that have a generic meaning. For instance, LaTeX commands and program names could be treated with special commands, as in the following examples:

Image

The \Com command adds a backslash to the command’s name in both text and index, simplifying the work of the typist. The \bs command definition is necessary, because \textbackslash would be substituted in an OT1 font encoding context, as explained in Section 7.3.5 on page 346. At the same time, commands will be ordered in the index by their names, with the \-character being ignored during sorting. Similarly, the \Prog command does not include the \texttt command in the alphabetization process, because entries like \index{\texttt{key}} and \index{key} would then result in different entries in the index.
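Definitions along the lines described in this section could look like the following sketch. The exact definitions are assumptions reconstructed from the description, not a verbatim copy of the original listings; \bs uses \symbol to select the backslash glyph directly from the typewriter font.

```latex
\documentclass{article}
\usepackage{makeidx}
\makeindex
% Hypothetical definitions following the description in this section
\newcommand{\Index}[1]{#1\index{#1}}                         % same text in body and index
\newcommand{\Indextt}[1]{\texttt{#1}\index{#1@\texttt{#1}}}  % typewriter in both places
\newcommand{\bs}{\symbol{'134}}                              % backslash glyph
\newcommand{\Com}[1]{\texttt{\bs#1}\index{#1@\texttt{\bs#1}}}
\newcommand{\Prog}[1]{\texttt{#1}\index{#1@\texttt{#1} program}}
\begin{document}
An \Index{index} is built by \Prog{makeindex} from \Com{index} commands.
\printindex
\end{document}
```

In each case the sort key before the @ is the plain argument, so the font command and the backslash play no role in the alphabetization.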

11.1.8. Special considerations

When an \index command is used directly in the text, its argument is expanded only when the index is typeset, not when the .idx file is written. However, when the \index command is contained in the argument of another command, characters with a special meaning to TeX, such as \, must be properly protected against expansion. This problem is likely to arise when indexing items in a footnote, or when using commands that put their argument in the text and enter it at the same time in the index (see the discussion in Section 11.1.7). Even in this case, robust commands can be placed in the “@” part of an entry, as in \index{rose@\textit{rose}}, but fragile commands must be protected with the \protect command.

As with every argument of a command, you need to have a matching number of braces. However, because \index allows special characters like % or \ in its argument if the command is used in main text, the brace matching has an anomaly: braces in the commands \{ and \} take part in the matching. Thus, you cannot write \index{\{} or something similar.

11.2. makeindex—A program to format and sort indexes

In the previous section we showed examples where we ran the MakeIndex program using its default settings. In this section we will first take a closer look at the MakeIndex program, and then discuss ways of changing its behavior.

11.2.1. Generating the formatted index

To generate the formatted index, you should run the MakeIndex program by typing the following command (where main is the name of the input file):

Image

This produces the file main.ind, which will be called the .ind file here. If MakeIndex generated no error messages, you can now rerun LaTeX on the document and the index will appear. (You can remove the \makeindex command if you do not want to regenerate the index.) Page 658 describes what happens at this point if there are error messages.

In reading the index, you may discover additional mistakes. These should be corrected by changing the appropriate index commands in the document and regenerating the .ind file (rerunning LaTeX before and after the last step).

An example of running MakeIndex is shown below. The .idx file, main.idx, is generated by a first LaTeX run on the input shown in Figure 11.3 on the next page. You can clearly see that two files are written—namely, the ordered .ind index file for use with LaTeX, called main.ind, and the index .ilg log file, called main.ilg, which (in this case) will contain the same text as the output on the terminal. If errors are encountered, then the latter file will contain the line number and error message for each error in the input stream. Figure 11.4 on the following page shows the result of the subsequent LaTeX run. The example uses the showidx package for controlling the index (see Section 11.4.2).

Image

11.2.2. Detailed options of the MakeIndex program

The syntax of the options of the MakeIndex program is described below:

Image

-c Enable blank compression. By default, every blank counts in the index key.

The -c option ignores leading and trailing blanks and tabs and compresses intermediate ones to a single space.

-i Use standard input (stdin) as the input file. When this option is specified and -o is not, output is written to standard output (stdout, the default output stream).

-g Employ German word ordering in the index, following the rules given in German standard DIN5007. In this case the normal precedence rule of MakeIndex for word ordering (symbols, numbers, uppercase letters, lowercase letters) is replaced by the German word ordering (symbols, lowercase letters, uppercase letters, numbers). Additionally, this option enables MakeIndex to recognize the German TeX commands "a, "o, "u, and "s as ae, oe, ue, and ss, respectively, for sorting purposes. The quote character must be redefined in a style file (see page 662); otherwise, you will get an error message and MakeIndex will abort. Note that not all versions of MakeIndex recognize this option.

-l Use letter ordering. The default is word ordering. In word ordering, a space comes before any letter in the alphabet. In letter ordering, spaces are ignored. For example, the index terms “point in space” and “pointing” will be alphabetized differently in letter and word ordering.

-q Operate in quiet mode. No messages are sent to the error output stream (stderr). By default, progress and error messages are sent to stderr as well as the transcript file. The -q option disables the stderr messages.

-r Disable implicit page range formation. By default, three or more successive pages are automatically abbreviated as a range (e.g., 1–5). The -r option disables this default, making explicit range operators the only way to create page ranges.

-o ind Take ind as the output index file. By default, the file name base of the first input file idx0 concatenated with the extension .ind is used as the output file name.

-p no Set the starting page number of the output index file to no. This option is useful when the index file is to be formatted separately. Other than pure numbers, three special cases are allowed for no: any, odd, and even. In these special cases, the starting page number is determined by retrieving the last page number from the .log file of the last LaTeX run. The .log file name is determined by concatenating the file name base of the first raw index file (idx0) with the extension .log. The last source page is obtained by searching backward in the log file for the first instance of a number included in square brackets. If a page number is missing or if the .log file is not found, no attempt will be made to set the starting page number. The meaning of each of the three special cases follows:

any The starting page is the last source page number plus one.

odd The starting page is the first odd page following the last source page number.

even The starting page is the first even page following the last source page number.

-s sty Take sty as the style file. There is no default for the style file name. The environment variable INDEXSTYLE defines where the style file resides.

-t log Take log as the transcript file. By default, the file name base of the first input file idx0 concatenated with the extension .ilg is used as the transcript file name.

11.2.3. Error messages

MakeIndex displays on the terminal how many lines were read and written and how many errors were found. Messages that identify errors are written in the transcript file, which, by default, has the extension .ilg. MakeIndex can produce error messages when it is reading the .idx file or when it is writing the .ind file. Each error message identifies the nature of the error and the number of the line where the error occurred in the file.

Errors in the reading phase

In the reading phase, the line numbers in the error messages refer to the positions in the .idx file being read.

Extra '!' at position ...

The \index command’s argument has more than two unquoted ! characters. Perhaps some of them should be quoted.

Extra '@' at position ...

The \index command argument has two or more unquoted @ characters with no intervening !. Perhaps one of the @ characters should be quoted.

Extra '|' at position ...

The \index command’s argument has more than one unquoted | character. Perhaps the extras should be quoted.

Illegal null field

The \index command argument does not make sense because some string is null that shouldn’t be. The command \index{!funny} will produce this error, since it specifies a subentry “funny” with no entry. Similarly, the command \index{@funny} is incorrect, because it specifies a null string for sorting.

Argument ... too long (max 1024)

The document contained an \index command with a very long argument. You probably forgot the right brace that should delimit the argument.

Errors in the writing phase

In the writing phase, line numbers in the error messages refer to the positions in the .ind file being written.

Unmatched range opening operator

An \index{...|(} command has no matching \index{...|)} command following it. The “...” in the two commands must be completely identical.

Unmatched range closing operator

An \index{...|)} command has no matching \index{...|(} command preceding it.

Extra range opening operator

Two \index{...|(} commands appear in the document with no intervening command \index{...|)}.

Inconsistent page encapsulator ... within range

MakeIndex has been instructed to include a page range for an entry and a single page number within that range is formatted differently, for example, by having an \index{cat|see{animals}} command between an \index{cat|(} command and an \index{cat|)} command.

Conflicting entries

MakeIndex thinks it has been instructed to print the same page number twice in two different ways. For example, the command sequences \index{lion|see{...}} and \index{lion} appear on the same page.

MakeIndex can produce a variety of other error messages indicating that something is seriously wrong with the .idx file. If you get such an error, it probably means that the .idx file was corrupted in some way. If LaTeX did not generate any errors when it created the .idx file, then it is highly unlikely to have produced a bad .idx file. If, nevertheless, this does happen, you should examine the .idx file to establish what went wrong.

11.2.4. Customizing the index with MakeIndex

MakeIndex does not fix the formats of its input and output files; they can be adapted to the needs of a specific application. To achieve this format independence, the MakeIndex program is driven by a style file, usually characterized with a file extension of .ist (see also Figure 11.1 on page 648). This file consists of a series of keyword/value pairs. These keywords can be divided into input and output style parameters. Table 11.1 on the following page describes the various keywords and their default values for the programming of the input file. This table shows, for instance, how to modify the index level separator (level, with ! as default character value). Table 11.2 on page 661 describes the various keywords and their default values for steering the translation of the input information into LaTeX commands. This table explains how to define the way the various levels are formatted (using the item series of keywords). Examples will show in more detail how these input and output keywords can be used in practice. MakeIndex style files use UN*X string syntax, so you must enter \\ to get a single \ in the output.

Image
Image

Table 11.2. Output style parameters for MakeIndex

In the following sections we show how, by making just a few changes to the values of the default settings of the parameters controlling the index, you can customize the index.

A stand-alone index

The example style mybook.ist (shown below) defines a stand-alone index for a book, where “stand-alone” means that it can be formatted independently of the main source. Such a stand-alone index can be useful if the input text of the book is frozen (the page numbers will no longer change), and you only want to reformat the index.

Image

Assuming that the raw index commands are in the file mybook.idx, then you can call MakeIndex specifying the style file’s name:

Image

A nondefault output file name is used to avoid clobbering the source output (presumably mybook.dvi). If the index is in file mybook.ind, then its typeset output will also be in mybook.dvi, thus overwriting the .dvi file for the main document.

Moreover, if you want the page numbers for the index to come out correctly, then you can specify the page number where the index has to start (e.g., 181 in the example below).

Image

MakeIndex can also read the LaTeX log file mybook.log to find the page number to be used for the index (see the -p option described on page 657).

Changing the “special characters”

The next example shows how you can change the interpretation of special characters in the input file. To do so, you must specify the new special characters in a style file (for instance, myinchar.ist shown below). Using Table 11.1 on page 660, in the following example we change the @ character (see page 651) to =, the sub-level indicator ! (see page 650) to >, and the quotation character " (see page 652) to ! (the default sublevel indicator).

Image
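A sketch of how such a remapping might be exercised follows. The style file is written through a filecontents* environment; its three lines follow the remapping described above, while the index entries are made up for illustration.

```latex
\begin{filecontents*}{myinchar.ist}
% remap the special input characters (sketch, contents assumed)
actual '='
level  '>'
quote  '!'
\end{filecontents*}
\documentclass{article}
\usepackage{makeidx}
\makeindex
\begin{document}
Text.%
\index{alpha=$\alpha$}%   = now separates sort key and visual form
\index{fruit>apple}%      > now separates entry levels
\index{surprise (!!)}%    ! now quotes the following character
\index{user@example}%     @ is an ordinary character again

\printindex
% Process with: makeindex -s myinchar.ist main
\end{document}
```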

In Figure 11.5 on the next page, which should be used in conjunction with the german option of the babel package, the double quote character (") is used as a shortcut for the umlaut construct \". This shows another feature of the ordering of MakeIndex: namely, the constructs " and \" are considered to be different entries (Br"ucke and Br\"ucke, M"adchen and M\"adchen, although in the latter case the key entry was identical, Maedchen). Therefore, it is important to use the same input convention throughout a complete document.

Image

Figure 11.5. Example of the use of special characters with MakeIndex

Changing the output format of the index

You can also personalize the output format of the index. The first thing that we could try is to build an index with a nice, big letter between each letter group. This is achieved with the style myhead.ist, as shown below (see Table 11.2 on the preceding page for more details) and gives the result shown in Figure 11.6.

Image

Figure 11.6. Example of customizing the output format of an index

Image
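A style file producing letter-group headings might look like the sketch below. The keywords heading_prefix, heading_suffix, and headings_flag are MakeIndex output parameters; the particular formatting (bold, centered) and the sample entries are assumptions.

```latex
\begin{filecontents*}{myhead.ist}
% letter-group headings (sketch; formatting choices are assumptions)
heading_prefix "{\\bfseries\\hfil "
heading_suffix "\\hfil}\\nopagebreak\n"
headings_flag 1
\end{filecontents*}
\documentclass{article}
\usepackage{makeidx}
\makeindex
\begin{document}
Apples\index{apple} and bananas\index{banana} and cherries\index{cherry}.
\printindex
% Process with: makeindex -s myhead.ist main
\end{document}
```

A positive headings_flag asks MakeIndex to insert the uppercase group letter between the prefix and the suffix.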

You could go a bit further and right-adjust the page numbers, putting in dots between the entry and the page number to guide the eye, as shown in Figure 11.7. This effect can be achieved by adding the following commands:

Image
Image

Figure 11.7. Adding leaders to an index
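The leaders could be obtained with delimiter settings such as those in the following sketch. The keywords delim_0, delim_1, and delim_2 set the text placed between an entry of the given level and its first page number; the file name myleaders.ist and the sample entries are hypothetical.

```latex
\begin{filecontents*}{myleaders.ist}
% dotted leaders between entry and page number (hypothetical file name)
delim_0 "\\dotfill "
delim_1 "\\dotfill "
delim_2 "\\dotfill "
\end{filecontents*}
\documentclass{article}
\usepackage{makeidx}
\makeindex
\begin{document}
Boxes\index{box} and their width\index{box!width}.
\printindex
% Process with: makeindex -s myleaders.ist main
\end{document}
```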

The LaTeX command \dotfill can be replaced by fancier commands, but the underlying principle remains the same.

Treating funny page numbers

As described earlier, MakeIndex accepts five basic kinds of page numbers: digits, uppercase and lowercase alphabetic, and uppercase and lowercase Roman numerals. You can also build composed page numbers. The separator character for composed page numbers is controlled by the MakeIndex keyword page_compositor; the default is the hyphen character (-), as noted in Table 11.1 on page 660. The precedence of ordering for the various kinds of page numbers is given by the keyword page_precedence; the default is rRnaA, as noted in Table 11.2 on page 661.

Problems with letters as page numbers

Let us start with an example involving simple page numbers. Assume the pages with numbers ii, iv, 1, 2, 5, a, c, A, C, II, and IV contain an \index command with the word style. With the default page_precedence of rRnaA this would be typeset in the index as shown below. The c and C entries are considered to be Roman numerals, rather than alphabetic characters:

style, ii, iv, c, II, IV, C, 1, 2, 5, a, A

This order can be changed by setting the page_precedence keyword to "rnAaR". Running MakeIndex on the same index entries now yields:

style, ii, iv, c, 1, 2, 5, A, a, II, IV, C

As you see, letters like C are still interpreted as Roman numerals. Thus, as long as MakeIndex offers no possibility to modify this behavior, it is ill adapted for pages numbered by letters: either you accept a potentially incorrect order in the page references or you have to correct the index manually in the end.

Composed page numbers

The situation looks somewhat different if composed page numbers are used, e.g., page numbers like “B-3” (where “B” is the appendix number and “3” the page number within this appendix). In this case C will be interpreted as a letter, but I is still considered a Roman numeral. Thus, in this setting you can have up to eight appendices before you run into trouble.

Suppose that the unsorted index entries show the page numbers C--3, 1--1, D--1--1, B--7, F--3--5, 2--2, D--2--3, A--1, B--5, and A--2. If this raw index is processed with MakeIndex, it will result in an empty formatted index and a lot of error messages, since the default page separator is a single hyphen. However, by setting the page_compositor keyword to "--" you can process this raw index successfully, getting the following result:

style, 1–1, 2–2, A–1, A–2, B–5, B–7, C–3, D–1–1, D–2–3, F–3–5

Since MakeIndex supports only a single page separator, more complex page numbering schemes involving several different page separators (such as A–4.1) cannot be processed by this program.
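The page_compositor setting discussed above can be sketched as follows. The style file name mypages.ist and the numbering scheme (appendix letter, two hyphens, page number) are assumptions chosen to match the example page numbers.

```latex
\begin{filecontents*}{mypages.ist}
% hypothetical file name; accept -- between the parts of a page number
page_compositor "--"
\end{filecontents*}
\documentclass{article}
\usepackage{makeidx}
\makeindex
\begin{document}
\appendix
\section{First appendix}
% Assumed numbering scheme for this sketch: <appendix letter>--<page>
\renewcommand{\thepage}{\thesection--\arabic{page}}
Some text about style.\index{style}
\printindex
% Process with: makeindex -s mypages.ist main
\end{document}
```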

11.2.5. MakeIndex pitfalls

The \index command tries to write its argument unmodified to the .idx file whenever possible.1 This behavior has a number of different consequences. If the index text contains commands, as in \index{\Prog}, the entry is likely to be sorted wrongly: in main text this entry is sorted under the sort key \Prog (with the special character \ as the starting sort character) regardless of the definition of the \Prog command. On the other hand, if it is used in some argument of another command, \Prog will expand before it is written to the .idx file; the placement in the index will then depend on the expansion of \Prog. The same thing happens when you use \index inside your own definitions. That is, all commands inside the \index argument are expanded (except when they are robust or preceded by \protect).

1 The way LaTeX deals with the problem of preventing expansion is not always successful. The index package (see Section 11.4.3) uses a different approach that prevents expansion in all cases.

For sorting, MakeIndex assumes that pages numbered with lowercase Roman numerals precede those numbered with Arabic numerals, which in turn precede those numbered with the lowercase alphabet, uppercase Roman numerals and finally the uppercase alphabet. This precedence order can be changed (see the entry page_precedence in Table 11.2 on page 661).

MakeIndex will place symbols (i.e., patterns starting with a non-alphanumeric character) before numbers, and before alphabetic entries in the output. Symbols are sorted according to their ASCII values. For word sorting, uppercase and lowercase are considered the same, but for identical words, the uppercase variant will precede the lowercase one. Numbers are sorted in numeric order.

Spaces are treated as ordinary characters when alphabetizing the entries and for deciding whether two entries are the same (see also the example on page 650). Thus, if “␣” denotes a space character, the commands \index{cat}, \index{␣cat}, and \index{cat␣} will produce three separate entries. All three entries look similar when printed. Likewise, \index{a␣space} and \index{a␣␣space} produce two different entries that look the same on output. For this reason it is important to check for spurious spaces by being careful when splitting the argument of an \index command across lines in the input file. The MakeIndex option -c turns off that behavior and trims leading and trailing white space, compressing all white space within to one blank. We recommend that you use it all the time.

11.3. xindy—An alternative to MakeIndex

The xindy program by Roger Kehr and Joachim Schrod is a flexible indexing system that represents an alternative to MakeIndex. It avoids several limits, especially for generating indexes in non-English languages. Usage of xindy is recommended in the following cases:

• You have an index with non-English words and you want to use a drop-in replacement.

Migration from MakeIndex is easy because xindy can be used without changing the index entries in your document. A compatibility style file will produce results corresponding to MakeIndex’s default set-up. The main difference will be that sorting index entries will work out of the box.

• You want to ensure that the index is more consistent than that created with MakeIndex.

Because MakeIndex takes every indexed term literally, you need to specify index visualization explicitly, as explained in Section 11.1.4 on page 651. In particular, this step is needed if your visualization needs LaTeX commands. If you forget your special visualization in one place, you will get an inconsistent index. The xindy program takes common LaTeX representations and computes the index key from them; therefore, you do not have to specify the difference between the index key and the visualization every time. (For example, you no longer need the different definitions of \Index and \Indextt from Section 11.1.7 on page 653.) Of course, you can still provide specific visualizations in your index entry.

• You want more checks for correctness.

If you have an index cross-reference with see, as explained in Section 11.1.3 on page 651, xindy checks that the referenced index entry really exists. This way you can avoid dangling references in your indexes.

• You want to create a technical index in an efficient way.

Many technical indexes involve heavy LaTeX markup in the index keys. The xindy program allows user-defined construction of the index keys from this markup. This gives you the ability to emit index entries automatically from your LaTeX commands, so as to get every usage of a technical term into the index. However, you will have to invest the time to define your index key construction rules.

• You want to create an index with “unusual” terms.

For certain terms, special sorting rules exist for historical reasons. For example, names of villages and people are sometimes sorted differently from how they are spelled: “St. Martin” is sorted as “Martin” or as “Saint Martin” depending on context, “van Beethoven” is sorted as “Beethoven”, and so on. Symbol indexes are another example where the sort order is more or less arbitrarily defined but should be consistent across a series of works.
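The correctness check for cross-references mentioned in the list above can be pictured as follows. This is a minimal sketch, assuming that the raw index has already been parsed into (key, encapsulation) pairs; the data layout and the function name are our own, and xindy's real check works on its internal index representation.

```python
import re

# Matches encapsulations such as "see{insects}" or "seealso{insects}".
SEE = re.compile(r"see(?:also)?\{(.+)\}")


def dangling_references(entries):
    """Return (key, target) pairs whose cross-reference target is not indexed.

    entries: list of (key, encap) pairs; encap is "" for a plain entry.
    """
    defined = {key for key, encap in entries if not SEE.fullmatch(encap)}
    missing = []
    for key, encap in entries:
        match = SEE.fullmatch(encap)
        if match and match.group(1) not in defined:
            missing.append((key, match.group(1)))
    return missing
```

Given the entries insects (plain), gnat with see{insects}, and midge with see{flies}, only the pair (midge, flies) is reported, because no entry for flies exists.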

The xindy program offers these advantages because it has dropped many of MakeIndex’s hard-wired assumptions that are not valid in international documents with arbitrary location reference structures. Instead, xindy provides a flexible framework for configuring index creation, together with a simple MakeIndex-like script for standard tasks.

The power of xindy is largely derived from five key features:

Internationalization xindy can be easily configured for languages with different letter sets and/or different sorting rules. You can define extra letters or complete alphabets, and you can provide a set of rules to sort and group them. At the moment, about 50 predefined language sets are available.

Modular configuration xindy is configured with declarations that can be combined and reused. For standard indexing tasks, LaTeX users do not have to do much except grab available modules.

Markup normalization A tedious problem related to technical or multilanguage indexes concerns markup and nontext material. The xindy program allows you to ignore different encodings for the same subject, or to easily strip markup items such as math mode.

User-definable location references An index entry points to locations. Fancy indexes may use not only page numbers, but also book names, law paragraphs, and structured article numbers (e.g., “I-20”, “Genesis 1, 31”). The xindy program enables you to sort and group your location references arbitrarily.

Highly configurable markup xindy provides total markup control. This feature is usually not of importance for LaTeX users, but comes in handy for indexing non-TeX material.

Availability

If the xindy program is not part of your TeX distribution, its web site (www.xindy.org) offers distributions for many operating systems and more reference documentation. Note that its Windows support is not as good as its UN*X or Linux support. CTAN holds xindy distribution files as well.

11.3.1. Generating the formatted index with xindy

The xindy program comes with a command texindy that allows it to be used in a simple, MakeIndex-like way for standard tasks. Options equivalent to those of MakeIndex are not described here in detail again; refer to Section 11.2.2 instead. The options -M and -L are described in more detail in the following sections.

Image

The files idx0, idx1, and so on contain raw index entries. If you specify more than one input file, you might want to use -o to name the output file, as the default output file name is always computed from idx0.

When you use option -c, -p, or -s, you will be warned that these MakeIndex options are not supported. In fact, xindy style files are self-written modules and are specified with option -M; Section 11.3.4 explains their creation in more detail.

The texindy command compresses blanks by default, since the authors think that this is the behavior you would expect from an index processor. In fact, the whole TeX program suite works by default under the assumption that sequences of white space are essentially one blank. If you insist on MakeIndex-compatible behavior, you can use the module keep-blanks, as explained in Section 11.3.3.

MakeIndex has the -p option to output a LaTeX command to the .ind file that sets the page counter. It may even try to parse the LaTeX log file for that purpose. The xindy program has no such option, and this omission is by design. The xindy authors believe that having a separate LaTeX document for an index is too prone to error and that the ability to include a LaTeX file with the \printindex command into the main document is a much better approach.

Indexing LaTeX commands

The texindy command ignores unknown TeX commands by default under the assumption that they do not produce text. It also knows about typical text-producing commands like \LaTeX and \BibTeX and handles them correctly. If you have your own command definition that produces text, or if you use one supplied by a package, then the entry is sorted incorrectly. You will either need to specify an explicit sort key in your index entry, as in \index{prog@\Prog}, or write a xindy style file with a merge rule, as explained in Section 11.3.4.
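Why an explicit sort key helps can be seen in a simplified model of the key computation. The sketch below is an assumption-laden illustration: it strips every TeX command (it does not know about text-producing commands, unlike texindy), ignores MakeIndex-style quoting of the @ character, and uses hypothetical function names.

```python
import re

# A backslash followed by letters, plus any white space after it.
TEX_COMMAND = re.compile(r"\\[A-Za-z]+\s*")


def split_entry(raw):
    """Split 'key@visual' into (sort part, print part); no quoting handled."""
    sort_part, separator, print_part = raw.partition("@")
    return (sort_part, print_part) if separator else (raw, raw)


def derived_sort_key(raw):
    """Drop all commands from the sort part, as if they produced no text."""
    sort_part, _ = split_entry(raw)
    return TEX_COMMAND.sub("", sort_part).strip().lower()
```

For the raw entry \Prog the derived key is empty, so the entry lands in the wrong place; for prog@\Prog the key is prog while \Prog is kept as the printed form.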

Be aware that producing index entries in arguments of commands has its own pitfalls, e.g., in \command{Properties of \Prog\index{\Prog}}. Then LaTeX commands might be expanded before they are written to the .idx file and the placement in the index will depend on the expansion of \Prog.

11.3.2. International indexing with xindy

Most non-English languages present additional challenges for index processing. They have accented characters or language-specific characters that obey special rules on how to sort them. It is usually not enough to ignore the accents, and, of course, one must not use the binary encoding of national characters for sorting. In fact, it would be very hard to use binary encoding for sorting even if one wants to—most implementations of LaTeX output many non-ASCII characters as ^^xy, where xy is the hex code of the respective character.

The reality is different: either foreign characters are input with macros, or the inputenc package is used. For example, LaTeX users in Western Europe on a Linux system are likely to add \usepackage[latin1]{inputenc} to all their documents (or on recent Linux distributions the option utf8), while Windows users would use the inputenc option ansinew or utf8. Then, the raw index file suddenly has lots of LaTeX commands in it, since all national and accented characters are output as commands. In MakeIndex, the author needs to separately specify sort and print keys for such index entries. This specification may be managed for some entries, but matters become very error prone if it must be done for all entries that have national characters. In addition, creating index entries automatically by LaTeX commands (as recommended in Section 11.1.7) is no longer possible.

The xindy program deals with this problem. It knows about LaTeX macros for national characters and handles them as needed. It allows you to define new alphabets and their sort order as well as more complex multiphase sort rules to describe the appropriate sorting scheme. You can then address typical real-world requirements, such as the following:

German German recognizes two different sorting schemes to handle umlauts: normally, ä is sorted like ae, but in phone books or dictionaries, it is sorted like a. The first scheme is known as DIN order, the second as Duden order [44].

Spanish In Spanish, the ligature ll is a separate letter group, appearing after l and before m.

French In French, the first phase of sorting ignores the diacritics, so that cote, côte, coté and côté are all sorted alike. In the next phase, within words that differ only in accents, the accented letters are looked at from right to left. Letters with diacritics then follow letters without them. Thus, cote and côte come first (no accent on the e), and then words with o come before words with ô.
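The two-phase French scheme lends itself to a short sketch. The key function below is our own illustration based on Unicode decomposition; it is not how xindy's French module is implemented, it treats every combining mark alike, and it ignores ligatures and other details.

```python
import unicodedata


def french_like_key(word):
    """Phase 1: compare base letters; phase 2: compare accents right to left."""
    base_letters = []
    has_accent = []
    for char in unicodedata.normalize("NFC", word):
        decomposed = unicodedata.normalize("NFD", char)
        base_letters.append(decomposed[0].lower())
        # True sorts after False, so accented letters follow plain ones.
        has_accent.append(len(decomposed) > 1)
    return ("".join(base_letters), tuple(reversed(has_accent)))
```

Sorting côté, coté, côte, and cote with this key gives cote, côte, coté, côté: the accent on the final e is decided first, and only then the accent on the o.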

The xindy program provides language modules for a growing number of languages. Such a language module defines the alphabet with all national characters, their sort rules, and letter group definitions adapted to that language. In addition, accented characters commonly used within that language are handled correctly. The predefined language modules cover Western and Eastern European languages. Currently, there is no support available for Asian languages.

About 50 predefined languages are available; 35 of them are readily usable with texindy. They are listed in Table 11.3 on the facing page; you select one of them with the texindy option -L. The other predefined languages use non-Latin scripts; their usage is described in the xindy documentation.

Image

Table 11.3. Languages supported by texindy

You can also build your own xindy language module. The xindy utility make-rules simplifies this procedure if your language fulfills the following criteria:

• Its script system uses an alphabet with letters.

• It has a sort order based on these letters (and on accents).

• No special context backtracking is required for sorting; accents influence only the sort order of the accented letters.

The xindy web site (www.xindy.org) has more information about language module creation with or without make-rules. If you create a new one, please contribute it to the xindy project.

11.3.3. Modules for common tasks

Like MakeIndex, xindy may be configured by creating a personal style file, as explained in Section 11.3.4. Most users, however, do not need the full power of xindy configuration. They merely want to solve common problems with a predefined set of possible solutions.

To simplify the completion of common tasks, xindy is distributed with a set of modules, listed in Table 11.4 on the next page. They provide standard solutions for sorting, page range building, and layout requirements. If you have no further demands, you can build your international index without a personal style file; you just specify a language option and the modules you want on the texindy command line. If you use the texindy command, you will deal with three categories of modules:

Image
Image

Table 11.4. xindy standard modules

Automatic modules These modules establish a behavior that is conformant to MakeIndex. You cannot turn them off as long as you use the texindy command. If you do not want their behavior, you have to use xindy directly as described in Section 11.3.4.

Default modules Some modules are activated by default and can be turned off with texindy options.

Add-on modules You can select one or more additional modules with the xindy option -M.

The automatic module LaTeX-loc-fmts indicates a difference between xindy and MakeIndex. In MakeIndex, you can use a general encapsulation notation to enclose your page number with an arbitrary command (see Section 11.1.4). In xindy, you have to define a location reference class with a corresponding markup definition for each command (see page 678). The LaTeX-loc-fmts module provides such definitions for the most common encapsulations, \textbf and \textit.

11.3.4. Style files for individual solutions

The xindy program is a highly configurable tool. The chosen functionality is specified in a style file. The texindy command provides convenient access for most purposes, by building a virtual style file from existing modules. If you want to extend the features provided, change functionality, or build your own indexing scheme, you have to use xindy directly and write your own style file, which is just another module. The available xindy modules may be reused.

This section demonstrates how to use xindy with your own style file. It describes the basic concepts underlying the xindy program and gives examples for typical extensions.

The xindy process model

The xindy style files are also the means by which you create indexes for non-LaTeX documents (e.g., XML documents, other Unicode-based markup systems). Features used for that purpose are not described in this section as they are beyond the scope of this book. If you’re interested, you’ll find more material at the xindy web site. To understand xindy style files, we need to present more detail on the basic model that xindy uses. Figure 11.8 on the following page shows the processing steps. A xindy style file contains merge rules, sort rules, location specifications, and markup specifications. Using these declarations, it defines how the raw index from the .idx file is transformed into the tagged index in the .ind file.

Image

Figure 11.8. xindy process model

• Merge rules specify how a sort key is computed from a raw key. A raw index may contain raw keys that represent the same entry, but are typed in differently. This may be caused by LaTeX expanding or not expanding commands depending on the context. Another cause may be authors using different notations, e.g., ä, \"a, or "a. Using merge rules makes manual additions for unification unnecessary. Merge rules are also helpful for indexing LaTeX commands. xindy ignores all commands. If, e.g., \MF is used for METAFONT and added to the index, the entry will not appear within the “M” section. A document-specific merge rule can guarantee correct sorting without forcing you to write \index{METAFONT@\MF} every time.

• Sort rules declare alphabets, and order within alphabets. The alphabet may not only consist of single characters, but sometimes multiple characters may form a unit for sorting (e.g., ll in Spanish). Such new characters must be ordered relative to other characters. A xindy language module consists of alphabet declarations, sort rules, and letter group definitions.

• After sorting, index entries with the same sort key are combined into a consolidated index entry with several locations and a print key. From the raw keys, the first one that appeared in the document is selected as the print key. Ordering, grouping, mixing, and omitting locations to get the final list of locations is a complex task that may be influenced in many ways by location specifications.

• Markup specifications describe which LaTeX commands are added to the consolidated index entries, thus producing a tagged index that can be used as input for LaTeX.
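The four steps of the process model can be pictured as a small pipeline. The following sketch is a strongly simplified model under our own assumptions: the function names and the data layout are hypothetical, the merge step is a plain Python function, and the markup step emits only a bare \item line per entry.

```python
def unify_saint(raw_key):
    """Hypothetical merge step: treat 'St' like 'Saint' and ignore case."""
    return raw_key.replace("St ", "Saint ").lower()


def build_index(raw_entries, merge):
    """raw_entries: (raw key, location) pairs in document order."""
    merged = {}
    for raw_key, location in raw_entries:
        sort_key = merge(raw_key)  # merge rules: raw key -> sort key
        # The first raw key seen for a sort key becomes the print key.
        entry = merged.setdefault(
            sort_key, {"print_key": raw_key, "locations": []}
        )
        entry["locations"].append(location)  # location handling
    lines = []
    for sort_key in sorted(merged):  # sort rules (here: code point order)
        entry = merged[sort_key]
        locations = ", ".join(str(loc) for loc in entry["locations"])
        lines.append(f"\\item {entry['print_key']}, {locations}")  # markup
    return lines
```

Feeding in St Barth (page 3), Aachen (page 1), and Saint Barth (page 7) produces two tagged lines: one for Aachen, and one for St Barth listing pages 3 and 7, since St Barth was the first raw key seen for that sort key.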

Calling xindy directly

The xindy options are very similar to those available with texindy. You specify your style file like any other module.

Image

Building a xindy style file

A xindy style file will usually start with loading predefined modules that provide much of the desired functionality. Recall that you also have to name explicitly those modules listed as automatic (auto) in Table 11.4 on page 672. Afterwards, you can provide definitions of your own that extend or override the already loaded modules.

Image

Style file syntax

The previous example of a xindy style file showed some of the syntax elements that are available. We now give more precise definitions:

• Basically, a style file consists of a list of declarative clauses in parentheses, starting with a declaration name and followed by several parameters.

• A parameter may be either a string or an option. An option has a keyword, written as :opt, and may have an argument, usually a string but also a number or a fixed value like none. As the name indicates, options are optional; which options are valid depends on the function. A parameter may also comprise a list of parameters in parentheses, as shown in some examples below.

• Comments start with a semicolon and go until the end of line. The examples show a typical way to use different numbers of semicolons: one for inline comments (after xindy clauses), two for block comments in front of code, and three for comments with “section headers” for the style file. But this is merely a convention—in all places the first semicolon starts the comment.

• Strings are enclosed in double quotes. Newlines are allowed in strings. Within strings, the tilde is an escape character that makes the following letter do something special. For example, ~n specifies a newline.

Merge and sort rules

Merge rules help to normalize raw index entries before sorting and grouping take place. They can be used to unify different notations and to strip the entry from markup material that is irrelevant to sorting. If you merge different index entries, they will appear as one entry and consequently have the same printed representation; that is, all of them will look like the first one that appears in your document. Note that you can only merge single words, not whole phrases.

A merge rule takes two parameters, and declares that occurrences of the first parameter within a word are substituted by the second parameter. Within the second parameter, the virtual characters ~b and ~e may be used: ~b is ordered in front of all other characters, whereas ~e comes after all characters. These two virtual characters are not output, because merge rules are used to construct the sort key from the raw key, and sort keys are internal entry identifiers.

Unify index entries

For example, in a city index, places with St in their name may also be written with Saint. Those different spellings should be unified to one index entry nevertheless. In other words, indexing St Barth and Saint Barth shall result in only one index entry.

Image

Unify using regular expressions

In a merge rule, you can also specify a pattern (regular expression) and a replacement string. So-called extended regexps are the default and are defined in the POSIX 1003.2 standard. On UN*X systems, you will find their description in the man page of egrep. You can also use basic regular expressions, with the option :bregexp in the merge rule. The replacement string may refer to subexpressions, which leads to powerful specifications that are often hard to create and debug. Note also that usage of regular expressions will slow processing down. To index XML tags without angles, you can write:

Image

This will cause \index{<HTML>} and \index{HTML} to be unified as one entry, which may not be the desired effect. To list them separately, but next to each other, you could modify <HTML> to HTML~e as follows:

Image

Sort rules specify how characters or character sequences are sorted (i.e., at which position in the alphabet they should be placed). A sort rule consists of two strings. The first string is sorted like the second one. The second string may use ~b and ~e to specify the sort order, as explained above.
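The effect of the virtual characters ~b and ~e on a sort key can be sketched as follows. This is an illustration only: it uses Python's regular expressions rather than POSIX ones, it models ~b and ~e by the lowest and highest code points, and the function name and rule format are our own.

```python
import re

BEFORE_ALL = "\x00"       # stands in for ~b: sorts before every character
AFTER_ALL = "\U0010ffff"  # stands in for ~e: sorts after every character


def sort_key(raw_key, rules):
    """Apply (pattern, replacement) rules, then resolve ~b and ~e."""
    key = raw_key
    for pattern, replacement in rules:
        key = re.sub(pattern, replacement, key)
    return key.replace("~b", BEFORE_ALL).replace("~e", AFTER_ALL)
```

With the rule ("<(.*)>", "\1") the raw keys <HTML> and HTML receive the same sort key and are unified. With ("<(.*)>", "\1~e") they stay separate, and sorting HTMX, <HTML>, and HTML gives HTML, <HTML>, HTMX.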

Letter groups

The xindy program checks for each letter group to see whether it matches a prefix of the entries’ sort key. The longest match assigns the index entry to this letter group. If no match is found, the index entry is put into the group default.
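The longest-match rule can be expressed compactly. In the sketch below, the group names, the prefixes, and the function name are hypothetical examples; only the matching logic follows the description above.

```python
def letter_group(sort_key, groups):
    """Return the group whose prefix is the longest match for sort_key.

    groups: mapping from group name to a list of prefixes.
    """
    best_name, best_length = "default", -1
    for name, prefixes in groups.items():
        for prefix in prefixes:
            if sort_key.startswith(prefix) and len(prefix) > best_length:
                best_name, best_length = name, len(prefix)
    return best_name
```

With the groups ABC (prefixes a, b, c), x (prefix x), and xsl (prefix xsl:), the key xsl:template is assigned to xsl because that prefix is longer than x, xml goes to x, banana to ABC, and zebra to default.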

Combine letter groups

The following definitions add all entries with the given prefixes to the same letter group ABC:

Image

Extra letter groups

With indexes that are a bit unbalanced on, say, the letter X, you may want to build an extra letter group named xsl that contains all entries that start with xsl:. These entries will be sorted before all other entries that start with x.

Image

Locations

The list of references behind an index entry may contain several groups that have a nonobvious but required order—perhaps Roman numbers, then Arabic numbers, then letters-Arabic numbers combined. We associate this scheme with a typical book having preface matter, normal content, and appendices. In xindy, each such group is called a location class. Within each location class, references are ordered as well. References may be combined to ranges like 10–15 or 5ff. As you see, xindy allows you to manipulate sorting and range building in various ways.
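Range building within one location class can be sketched as follows. The function is a hypothetical model, not xindy's algorithm: "length" is taken here to mean the number of consecutive pages in a run, which is an assumption about the semantics, and passing None models the suppression of ranges.

```python
def build_ranges(pages, min_range_length=2):
    """Collapse runs of consecutive pages into ranges such as '10--12'."""
    pages = sorted(set(pages))
    result = []
    start = 0
    while start < len(pages):
        end = start
        while end + 1 < len(pages) and pages[end + 1] == pages[end] + 1:
            end += 1
        run = pages[start:end + 1]
        # A range needs at least two pages, and at least the requested length.
        if min_range_length is not None and len(run) >= max(2, min_range_length):
            result.append(f"{run[0]}--{run[-1]}")
        else:
            result.extend(str(page) for page in run)
        start = end + 1
    return result
```

For the pages 10, 11, 12, 15, 20, and 21, a minimum length of 2 gives 10--12, 15, 20--21; a minimum length of 3 keeps 20 and 21 as separate references; and None lists all six pages individually.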

Page range length

As an example, to change the minimal length of page ranges, just modify your location class for pages:

Image

Suppress page ranges

To suppress ranges for Roman numbers, change the :min-range-length option as follows:

Image

Nonstandard locations

If your raw index contains references with non-numeric components and an unusual syntax (e.g., Pasta::II.4), you have to define a special alphabet so that xindy knows how to sort. Use it to define a location class that describes the reference syntax, including separators:

Image

Location formatting

The xindy program has a very flexible mechanism for formatting, sorting, and grouping locations with special meanings. In your document, you mark up index entries for special formatting, such as \index{keyword|definition}. In xindy, you define an attribute with a corresponding markup definition.

You can also configure how your different index entry categories should interact: mix them or list them separately, allow subsuming ranges between them or not, omit entries once part of a range or not.

The following examples illustrate different variations of handling references with special formatting.

Image

Example 1: Mix, subsume, and omit locations.

Image

Example 2: Mix and subsume locations.

Image

Example 3: Do not mix locations, list definitions first.

Image

Note that define-attributes has one parameter in parentheses. It consists of either one list of attribute names enclosed in parentheses or a list of strings, each string enclosed in parentheses. All attributes that are together in one brace are mixed. If you have several attributes, an expression like

Image

would indicate that definitions may be mixed with the group of important references, but not with default references.

11.4. Enhancing the index with LaTeX features

This section describes LaTeX’s support for index creation. It presents possibilities to modify the index layout and to produce multiple indexes.

11.4.1. Modifying the layout

You can redefine the environment theindex, which by default is used to print the index. The layout of the theindex environment and the definition of the \item, \subitem, and \subsubitem commands are defined in the class files article, book, and report. In the book class you can find the following definitions:

Image

Although this is programmed in a fairly low-level internal language, you can probably decipher what it sets up. First it tests for two-column mode and saves the result. Then it sets some spacing parameters, resets the page style to plain, and calls \twocolumn. Finally it changes \item to execute \@idxitem, which produces a paragraph with a hanging indentation of 40 points. A higher-level reimplementation (using ifthen) might perhaps look as follows:

Image

Adjusting this definition allows you to make smaller modifications, such as changing the page style or the column separation.

You can also make an index in three rather than two columns. To do so, you can use the multicol package and the multicols environment:

Image

We require at least 10 lines of free space on the current page; otherwise, we want the index to start on a new page. In addition to generating a title at the top, we enter the heading as a “Chapter” in the table of contents (.toc) and change the page style to plain. Then the \item command is redefined to cope with index entries (see above), and the entries themselves are typeset in three columns using the multicols environment.

11.4.2. showidx, repeatindex, tocbibind, indxcite—Little helpers

Several useful little LaTeX packages exist to support index creation. A selection is listed in this section, but by browsing through the on-line catalogue [169] you will probably find additional ones.

Show index entries in margin

The package showidx (by Leslie Lamport) can help you improve the entries in the index and locate possible problems. It shows all index commands in the margin of the printed page. Figure 11.4 on page 656 shows the result of including the showidx package.

Handle page breaks gracefully

The package repeatindex (by Harald Harders) repeats the main item of an index if a page or column break occurs within a list of subitems. This helps the reader correctly identify to which main item a subitem belongs.

Table of contents support

The package tocbibind (by Peter Wilson) can be used to add the table of contents itself, the bibliography, and the index to the Table of Contents listing. See page 48 for more information on this package.

Automatic author index

The package indxcite (by James Ashton) automatically generates an author index based on citations made using BibTeX. This type of functionality is also available with the bibliography packages natbib and jurabib, both of which are described in detail in Chapter 12.

11.4.3. index—Producing multiple indexes

The index package (written by David Jones and distributed as part of the camel package) augments LaTeX’s indexing mechanism in several areas:

• Multiple indexes are supported.

• A two-stage process is used for creating the raw index files (such as the default .idx file) similar to that used to create the .toc file. First the index entries are written to the .aux file, and then they are copied to the .idx file at the end of the run. With this approach, if you have a large document consisting of several included files (using the \include command), you no longer lose the index if you format only part of the document with \includeonly. Note, however, that this makes the creation of a chapter index more difficult.

• A starred form of the \index command is introduced. In addition to entering its argument in the index, it typesets the argument in the running text.

• To simplify typing, the \shortindexingon command activates a shorthand notation. Now you can type ^{foo} for \index{foo} and _{foo} for \index*{foo}. These shorthand notations are turned off with the \shortindexingoff command. Because the underscore and circumflex characters have special meanings inside math mode, this shorthand notation is unavailable there.

• The package includes the functionality of the showidx package. The command \proofmodetrue enables the printing of index entries in the margins. You can customize the size and style of the font used in the margin with the \indexproofstyle command, which takes a font definition as its argument (e.g., \indexproofstyle{\footnotesize\itshape}).

• The argument of \index is never expanded when the index package is used. In standard LaTeX, using \index{\command} will sometimes write the expansion of \command to the .idx file (see Section 11.2.5 on page 665). With the index package, \command itself is always written to the .idx file. While this is helpful in most cases, macro authors can be bitten by this behavior. In Section 11.1.7, we recommended that you define commands that automatically add index entries. Such commands often expect that \index will expand its parameter and they may not work when you use the index package. Be careful and check the results of the automatic indexing; this is best practice, anyhow.

You can declare new indexes with the \newindex command. The command \renewindex, which has an identical syntax, is used to redefine existing indexes.

Image

The first argument, tag, is a short identifier used to refer to the index. In particular, the commands \index and \printindex are redefined to take an optional argument, namely, the tag of the index to which you are referring. If this optional argument is absent, the index with the tag “default” is used, which corresponds to the usual index. The second argument, raw-ext, is the extension of the raw index file to which LaTeX should write the unprocessed entries for this index (for the default index it is .idx). The third argument, proc-ext, is the extension of the index file in which LaTeX expects to find the processed index (for the default index it is .ind). The fourth argument, indextitle, is the title that LaTeX will print at the beginning of the index.

As an example we show the set-up used to produce this book. The preamble included the following setting:

Image

In the backmatter, printing of the index was done with the following lines:

Image

For each generated raw index file (e.g., tlc2.adx for the list of authors) we ran MakeIndex to produce the corresponding formatted index file for LaTeX:

Image

While all of these tools help to get the correct page numbers in the index, the real difficulty persists: choosing useful index entries for your readers. This problem you still have to solve (if you are lucky, with help).

In fact, the index of this book was created by a professional indexer, Richard Evans of Infodex Indexing Services in Raleigh, North Carolina. Dick worked closely with Frank to produce a comprehensive index that helps you, the reader, find not only the names of things (packages, programs, commands, and so on) but also the tasks, concepts, and ideas described in the book. But let him tell you (from the Infodex FAQ at http://www.mindspring.com/~infodex):

Question: Why do I need an indexer? Can’t the computer create an index?

Answer: To exactly the same degree that a word processor can write the book. Indexes are creative works, requiring human intellect and analysis.

LaTeX can process the indexing markup, but only a human indexer can decide what needs to be marked up. Our sincere thanks to Dick for his excellent work.
