Chapter 13. Bibliography Generation

While a table of contents (see Section 2.3) and an index (discussed in Chapter 11) make it easier to navigate through a book, the presence of bibliographic references should allow you to verify the sources used and to probe further into subjects you consider interesting. To make this possible, the references should be precise and lead to the relevant work with a minimum of effort.

There are many ways to format bibliographies, and different fields of scholarly activity have developed very precise rules in this area. An interesting overview of Anglo-Saxon practices can be found in the chapter on bibliographies in The Chicago Manual of Style [38]. Normally, authors must follow the rules laid out by their publisher. Therefore, one of the more important tasks when submitting a book or an article for publication is to generate the bibliographic reference list according to those rules.

Traditional ways of composing such lists by hand, without the systematic help of computers, are plagued with the following problems:

• Citations, particularly in a document with contributions from many authors, are hard to make consistent. Difficulties arise, such as variations in the use of full forenames versus abbreviations (with or without periods); italicization or quoting of titles; spelling “ed.”, “Ed.”, or “Editor”; and the various forms of journal volume number.

• A bibliography laid out in one style (e.g., alphabetic by author and year) is extremely hard to convert to another (e.g., numeric citation order) if requested by a publisher.

• It is difficult to maintain one large database of bibliographic references that can be reused in different documents.

In Chapter 12 we were mainly concerned with the citation of sources within the text. In the present chapter we concentrate on the formatting of reference lists and bibliographies, and we discuss possibilities for managing collections of citations in databases. The chapter is heavily based on the BibTeX program, written by Oren Patashnik, which integrates well with LaTeX.

We start by introducing the program and variants of it, touching on recent developments geared toward creating a successor. This is followed by a detailed introduction to the BibTeX database format, which explains how to specify bibliographical data in a form suitable for processing by BibTeX. Instead of collecting your own bibliographical data, you can also draw information from various on-line sources that offer such data in BibTeX format. Some of them are introduced in Section 13.3.

Having collected data for BibTeX databases, the next natural step is to look for tools that help in managing such databases. Section 13.4 offers tools of various flavors for this task, ranging from command-line utilities to GUI-based programs for various platforms.

Once everything is under control, we return in Section 13.5 to the task of typesetting and look at how different BibTeX styles can be used to produce different bibliography layouts from the same input. As there may not be a suitable style for a particular set of layout requirements available, Section 13.5.2 discusses how to generate customized styles using the custom-bib package without the need for any BibTeX style programming.

For those readers who really want to (or have to) dig into the mysteries of BibTeX style programming, the final section gives more details about the format of such style files, including a short overview of the commands and intrinsic functions available. The global structure of the generic style documentation file btxbst.doc is explained, and it is shown how to adapt an existing style file to the needs of a particular house style or foreign language.

13.1. The BibTeX program and some variants

The BibTeX program was designed by Oren Patashnik to provide a flexible solution to the problem of automatically generating bibliography lists conforming to different layout styles. It automatically detects the citation requests in a LaTeX document (by scanning its .aux file or files), selects the needed bibliographical information from one or more specified databases, and formats it according to a specified layout style. Its output is a file containing the bibliography listing as LaTeX code that will be automatically loaded and used by LaTeX on the following run. Section 12.1.3 on page 687 discussed the interface between the two programs in some detail.

At the time of this book’s writing BibTeX was available as version 0.99c, but if you look into the first edition of this book (a decade back), you will find that it also talks about version 0.99c. The version 0.99a probably dates back to 1986. In other words, the program has been kept stable for a very long period of time. As a consequence, the BibTeX database format is very well established in the LaTeX world, with many people having numerous citation entries collected over the years. Thus, it comes as no surprise that all development that happened in the last decade is based on that format as a standard.

In this section we briefly survey a number of developments in this arena. Some new projects have surfaced especially in recent years, but there are also some projects that date back a few years.

13.1.1. bibtex8—An 8-bit reimplementation of BibTeX

Due to its age and origins BibTeX is 7-bit, ASCII based. Although it is able to handle foreign characters, its functionality in this respect is rather limited. The BibTeX8 program written by Niel Kempson and Alejandro Aguilar-Sierra is an 8-bit reimplementation of BibTeX with the ability to specify sorting order information. This allows you to store your BibTeX database entries in your favorite 8-bit code page, and to use the inputenc package in your LaTeX document (see Sections 7.5.2 and 7.11.3). Sorting order information related to a specific encoding can be specified on the command line—for example,

Image

on the author’s machine. The sorting order is stored in files with the extension .csf (e.g., in the above example in the file 88591lat.csf). The distribution comes with a number of such files for the most popular encodings. The format is well documented so that it should be possible to provide your own .csf file if necessary. Related command-line options are -7 and -8 to force 7-bit or 8-bit processing, respectively, without a special sorting order.

The BibTeX8 program offers a second set of command-line options that allows you to enlarge its internal tables. In 1995, when the first release of the program was written, standard BibTeX had only small, hard-wired internal tables, making it impossible to typeset, say, a bibliography listing with several hundred citations. These days most installations use higher compile defaults (e.g., 5000 citations) so that the flexibility of BibTeX8 in this respect is seldom needed. But in case a particular job hits one of the limits and emits a message like “Sorry—you've exceeded BibTeX's...” you can use BibTeX8 with a suitable command-line setting to get around the problem. You can find out about the possible options by calling the program without any input or with the option -h or --help.

13.1.2. Recent developments

Besides BibTeX and BibTeX8, both of which have been available for a long time, there have been some more recent developments that target bibliography generation. In this section we briefly introduce three projects that might be of interest to the reader. It is quite possible that some of these projects will merge in the future, so this list should be viewed as a snapshot of the situation in 2003 and as proof that there is a renewed interest in further development.

bibulus—Bibliographies with XML and perl

The program bibulus by Thomas Widmann is a BibTeX replacement written in perl (for installation and use it needs a recent perl implementation, 5.8 or later). It does not use BibTeX’s database file format but rather works with bibliographical entries stored in XML format and provides its own document type definition (bibulus.dtd). This way bibliographical entries can be manipulated and processed with any application that understands XML. To enable the reuse of existing .bib files, the program provides a tool to convert your BibTeX databases to XML format.

The bibulus program uses Unicode internally and thus is truly multilingual; at the same time it is able to read and write output in other encodings. The textual strings generated by the program have been translated into a large number of languages. The current implementation of bibulus provides support for more than a dozen languages.

From the program’s point of view LaTeX is only one of the different possible target output formats. Alternatives range from plain text output, to HTML, to input formats for other programs dealing with citations.

Like the other two programs described below, bibulus is work in progress. It is available from http://www.nongnu.org/bibulus, where you will also find further information on the project.

BibTeX++—A BibTeX successor in Java

The BibTeX++ project is a Java-based implementation of a citation manager written by Emmanuel Donin de Rosière in the course of a master’s thesis [146] supervised by Ronan Keryell. Being intended to serve as a BibTeX successor, it can, of course, be used in the LaTeX world, but it also accepts other bibliography formats and different style languages and can produce output for several typesetting systems. The program is integrated in a web-based environment, so it can retrieve missing information from various Internet sources. BibTeX++ uses a plug-in concept that allows you to dynamically extend its functionality, perhaps to support special formatting conventions or to generate output for other formatters.

Existing BibTeX style files can be converted to a BibTeX++ style using a translation program that was developed as part of the project. The result can be further customized by using the BibTeX++ concepts, thus easing the initial development of a new style.

The project’s home is at http://bibtex.enstb.org, where you will find a CVS repository as well as compiled binaries and further information.

MlBibTeX—A multilingual successor of BibTeX

The program MlBibTeX, developed by Jean-Michel Hufflen, is a reimplementation and extension of BibTeX with particular focus on multilingual features. A first release became available in 2001. However, the author found that the approach taken back then was not really suitable for the typographical conventions used in some languages. At that stage of the project he developed a questionnaire to obtain more insight into the problems and conventions with bibliographic data in different European countries. In response, a new implementation was started; its first results were presented at various conferences in 2003.

The current release (v1.3) implements a style language named nbst, for specifying layout and formatting directives. This language is close, but not identical, to XSLT, the language for manipulating and processing XML documents.

The project’s home is at http://lifc.univ-fcomte.fr/~hufflen/texts/mlbibtex/mlbibtex/mlbibtex.html, where further information can be found.

13.2. The BibTeX database format

A BibTeX database is a plain text (ASCII) file that contains bibliographical entries internally structured as keyword/value pairs. A typical database file was shown in Figure 12.2 on page 690. In this section we study the allowed syntax of its entries in some detail; see also [135].

Each entry in a BibTeX database consists of three main parts: a type specifier, followed by a key, and finally the data for the entry itself. The type describes the general nature of the entry (e.g., whether it is an article, book, or some other publication). The key is used in the interface to LaTeX; it is the string that you have to place in the argument of a \cite command when referencing that particular entry. The data part consists of a series of field entries (depending on the type), which can have one of two forms as seen in the following generic format and example:

Image

The comma is the field separator. Spaces surrounding the equals sign or the comma are ignored. Inside the text part of a field (enclosed in a pair of double quotes or a pair of braces) you can have any string of characters, but braces must be matched. The quotes or braces can be omitted for text consisting entirely of numbers (like the year field in the example above). Note that LaTeX’s comment character % is not a comment character inside .bib database files. Instead, anything outside an entry is considered a comment as long as it does not contain an @ sign (which would be misinterpreted as the start of a new entry).
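
The rules above can be sketched with a small entry. The key, names, and field values below are hypothetical placeholders chosen only for illustration; the entry shows a field in double quotes, fields in braces, a purely numeric field without delimiters, and text outside the entry that is treated as a comment.

```bibtex
This line lies outside any entry and is therefore ignored as a comment.

@book{hypothetical:key,
  author    = "Jane Q. Example",
  title     = {An Illustrative Title},
  publisher = {Example Press},
  year      = 1999
}
```

Such an entry would be referenced from the document with \cite{hypothetical:key}.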

BibTeX ignores the case of the letters for the entry type, key, and field names. You must, however, be careful with the key. LaTeX honors the case of the keys specified as the argument of a \cite command, so the key for a given bibliographic entry must match the one specified in the LaTeX file (see Section 12.2.1).

13.2.1. Entry types and fields

As discussed above, you must describe each bibliographic entry as belonging to a certain class, with the information itself tagged by certain fields.

The first thing you have to decide is what type of entry you are dealing with. Although no fixed classification scheme can be complete, with a little creativity you can make BibTeX cope reasonably well with even the more bizarre types of publications. For nonstandard types, it is probably wise not to attach too much importance to BibTeX’s warning messages (see below).

Most BibTeX styles have at least the 13 standard entry types, which are shown in Table 13.1 on the facing page. These different types of publications demand different kinds of information; a reference to a journal article might include the volume and number of the journal, which is usually not meaningful for a book. Therefore, different database types have different fields. In fact, for each type of entry, the fields are divided into three classes:

Image
Image

Table 13.1. BibTeX’s entry types as defined in most styles

Required. Omission of the field will produce a warning message and, possibly, a badly formatted bibliography entry. If the required information is not meaningful, you are using the wrong entry type. If the required information is meaningful but, say, already included in some other field, simply ignore the warning.

Optional. The field’s information will be used if present, but you can omit it without causing formatting problems. Include the optional field if it can help the reader.

Ignored. The field is ignored. BibTeX ignores any field that is not required or optional, so you can include any fields in a .bib file entry. It is a good idea to put all relevant information about a reference in its .bib file entry, even information that may never appear in the bibliography. For example, the abstract of a paper can be entered into an abstract field in its .bib file entry. The .bib file is probably as good a place as any for the abstract, and there exist bibliography styles for printing selected abstracts (see the abstract bibliography style mentioned in Table 13.4 on page 791).
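
As a sketch of the three classes, consider the following article entry as seen by the standard styles. All names and values are hypothetical placeholders; the grouping into required, optional, and ignored fields reflects the standard styles and may differ in other styles.

```bibtex
% Hypothetical entry. Required for an article: author, title, journal, year.
% Optional here: volume, pages. Ignored by the standard styles: abstract.
@article{hypothetical:classes,
  author   = "Jane Q. Example",
  title    = {A Placeholder Title},
  journal  = {Journal of Hypothetical Examples},
  year     = 2001,
  volume   = 12,
  pages    = "101--110",
  abstract = {A short summary that a standard style never prints.}
}
```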

Table 13.1 on the facing page describes the standard entry types, along with their required and optional fields, as used by the standard bibliography styles.

The fields within each class (required or optional) are listed in the typical order of occurrence in the output. A few entry types, however, may perturb this ordering slightly, depending on which fields are missing. The meaning of the individual fields is explained in Table 13.2 on the next page. Nonstandard bibliography styles may ignore some optional fields or use additional ones like isbn when creating the reference (see also the examples starting on page 793). Remember that, when used in a .bib file, the entry-type name is preceded by an @ character.

Image
Image

Table 13.2. BibTeX’s standard entry fields

Sorting of entries

Most BibTeX style files sort the bibliographical entries. This is done by internally generating a sort key from the author’s/editor’s name, the date of the publication, the title, and other information. Entries with identical sort keys will appear in citation order.

The author information is usually the author field, but some styles use the editor or organization field. In addition to the fields listed in Table 13.1, each entry type has an optional key field, used in some styles for alphabetizing, for cross-referencing, or for forming a \bibitem label. You should therefore include a key field for any entry whose author information is missing. Depending on the style, the key field can also be used to overwrite the automatically generated internal key for sorting (some BibTeX styles, e.g., jurabib, use the sortkey field instead). A situation where a key field is useful is the following:

Image

Without the key field, the alpha style would construct a label from the first three letters of the information in the organization field. Although the style file will strip off the article “The”, you would still get a rather uninformative label like “[Ass86]”. The key field above yields a more acceptable “[ACM86]”.
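
An entry of the kind described here might look as follows. This is only a sketch: the entry key and title are hypothetical, and the organization name is an assumption chosen to be consistent with the labels “[Ass86]” and “[ACM86]” discussed above.

```bibtex
% Hypothetical entry without author information; the key field
% supplies the text from which the alpha style forms its label.
@manual{hypothetical:acm,
  organization = "The Association for Computing Machinery",
  title        = {A Placeholder Manual Title},
  year         = 1986,
  key          = "ACM"
}
```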

We now turn our attention to the fields recognized by the standard bibliography styles. These “standard” fields are shown in Table 13.2 on the facing page. Other fields, like abstract, can be required if you use one of the extended nonstandard styles shown in Table 13.4 on page 791. As nonrecognized fields are ignored by the BibTeX styles, you can use this feature to include “comments” inside an entry: it is enough to put the information to be ignored inside braces following a field name (and = sign) that is not recognized by the BibTeX style.

As with the names of the entry types in Table 13.1 on the preceding page, the names of the fields should be interpreted in their widest sense to make them applicable in a maximum number of situations. And you should never forget that a judicious use of the note field can solve even the more complicated cases.

13.2.2. The text part of a field explained

The text part of a field in a BibTeX entry is enclosed in a pair of double quotes or curly braces. Part of the text itself is said to be enclosed in braces if it lies inside a matching pair of braces other than the ones enclosing the entire entry or the entire field.

The structure of a name

The author and editor fields contain a list of names. The exact format in which these names are typeset is decided by the bibliography style. The entry in the .bib database tells BibTeX what the name is. You should always type names exactly as they appear in the cited work, even when they have slightly different forms in two works. For example:

Image

If you are sure that both authors are the same person, then you could list both in the form that the author prefers (say, Donald E. Knuth), but you should always indicate (e.g., in our second case) that the original publication had a different form.

Image

BibTeX alphabetizes this as if the brackets were not there, so that no ambiguity arises as to the identity of the author.

Most names can be entered in the following two equivalent forms:

Image

The second form, with a comma, should always be used for people who have multiple last names that are capitalized. For example,

Image

If you enter "Miguel Parra Benavides", BibTeX will take "Parra" as the middle name, which is wrong in this case. When the other parts are not capitalized, no such problem occurs (e.g., "Johann von Bergen" or "Pierre de la Porte").
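
The cases just mentioned can be written as shown in this sketch. The entry keys and titles are hypothetical; the names are the ones used in the text above.

```bibtex
% Two capitalized last names: use the form with a comma.
@misc{hypothetical:two-last-names,
  author = "Parra Benavides, Miguel",
  title  = {Placeholder Title}
}

% Lowercase particles: either form works.
@misc{hypothetical:particles,
  author = "Johann von Bergen and de la Porte, Pierre",
  title  = {Placeholder Title}
}
```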

If several words of a name have to be grouped, they should be enclosed in braces. BibTeX treats everything inside braces as a single name, as shown below.

Image

In this case, Inc. and Ltd. are not mistakenly considered as first names.
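
A minimal sketch of such grouping, using hypothetical company names, is shown below. The comma inside each brace group is not interpreted as a name separator because it is not at brace-level 0.

```bibtex
% Hypothetical corporate authors; each brace group is one name.
@misc{hypothetical:companies,
  author = "{Example Software, Inc.} and {Sample Systems, Ltd.}",
  title  = {Placeholder Title}
}
```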

In general, BibTeX names can have four distinct parts, denoted as First, von, Last, and Jr. Each part consists of a list of name tokens, and any list but Last can be empty. Thus, the two entries below are different:

Image

The first has von, Last, and First parts, while the second has only First and Last parts (the Last part being “von der Schmidt”), resulting possibly in a different sorting order.
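
A pair of entries illustrating this difference could look as follows. The first name and the entry keys are assumptions made for the sake of the sketch.

```bibtex
% von part "von der", Last part "Schmidt", First part "Alex".
@misc{hypothetical:von-a,
  author = "von der Schmidt, Alex",
  title  = {Placeholder Title}
}

% The braces make "von der Schmidt" a single Last token; no von part.
@misc{hypothetical:von-b,
  author = "{von der Schmidt}, Alex",
  title  = {Placeholder Title}
}
```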

A “Junior” part can pose a special problem. Most people with “Jr.” in their name precede it with a comma, thus entering it as follows:

Image

Certain people do not use the comma, and these cases are handled by considering the “Jr.” as part of the last name:

Image
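
Both conventions are sketched below with illustrative names; the entry keys and titles are hypothetical.

```bibtex
% "Jr." preceded by a comma: entered as a separate Jr part.
@misc{hypothetical:jr-comma,
  author = "Ford, Jr., Henry",
  title  = {Placeholder Title}
}

% No comma before "Jr.": grouped with the last name by braces.
@misc{hypothetical:jr-no-comma,
  author = "{Steele Jr.}, Guy L.",
  title  = {Placeholder Title}
}
```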

Recall that in the case of “Miguel Parra Benavides”, you should specify

Image

The First part of his name has the single token “Miguel”; the Last part has two tokens, “Parra” and “Benavides”; and the von and Jr parts are empty.

A complex example is

Image

This name has three tokens in the First part, two in the von part, and two in the Last part. BibTeX knows where one part ends and the other begins because the tokens in the von part begin with lowercase letters (van de in this example).

In general, von tokens have the first letter at brace-level 0 in lowercase. Technically speaking, everything in a “special character” is at brace-level 0 (see page 768), so you can decide how BibTeX treats a token by inserting a dummy special character whose first letter past the TeX control sequence is in the desired case, upper or lower. For example, in

Image

BibTeX will take the uppercase “De La” as the von part, since the first character following the control sequence is lowercase. With the abbrv style you will then get the correct abbreviation M. De La Cruz; without this trick you would get the incorrect M. D. L. Cruz.
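
One way to write such a dummy special character is sketched below. The first name and the choice of \MakeUppercase as the control sequence are assumptions; any control sequence that produces the desired output would serve, since BibTeX looks only at the case of the first letter that follows it.

```bibtex
% The brace group starts with a backslash, so it is a special character.
% The first letter after the control sequence is the lowercase "d", so
% BibTeX classifies the token as a von part, while LaTeX prints "De La".
@misc{hypothetical:cruz,
  author = "Maria {\MakeUppercase{d}e La} Cruz",
  title  = {Placeholder Title}
}
```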

BibTeX handles hyphenated names correctly. For example, an entry like

Image

with the abbrv style, results in “M.-V. Delgrande”.
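
For instance, an entry along the following lines would be abbreviated in this way. The first name is a hypothetical one, chosen only so that its initials match the abbreviation quoted above.

```bibtex
% Hypothetical hyphenated first name; abbrv keeps the hyphen
% between the two initials.
@misc{hypothetical:hyphen,
  author = "Maria-Victoria Delgrande",
  title  = {Placeholder Title}
}
```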

When multiple authors are present, their names should be separated with the word “and”, where the “and” must not be enclosed in braces.

Image

There are two authors, Frank Mittelbach and Chris Rowley, but only one editor, since the “and” is enclosed in braces. If the number of authors or editors is too large to be typed in extenso, then the list of names can be ended with the string “and others”, which is converted by the standard styles into the familiar “et al.”
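
The following sketch shows the different uses of “and”. The editor name, the third author, and all other field values are hypothetical placeholders.

```bibtex
% Two authors separated by "and"; a single editor because the
% "and" inside the braces is not a separator.
@incollection{hypothetical:and,
  author    = "Frank Mittelbach and Chris Rowley",
  editor    = "{Example and Sons}",
  title     = {Placeholder Chapter Title},
  booktitle = {Placeholder Collection Title},
  publisher = {Example Press},
  year      = 1999
}

% A truncated author list; the standard styles print "et al."
@misc{hypothetical:others,
  author = "Jane Q. Example and others",
  title  = {Placeholder Title}
}
```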

To summarize, you can specify names in BibTeX using three possible forms (the double quotes and curly braces can be used in all cases):

Image

The first form can almost always be used. It is, however, not suitable when there is a Jr part, or when the Last part has multiple tokens and there is no von part.
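
As a sketch, the three forms (First von Last; von Last, First; and von Last, Jr, First) are applied below to one name. The “Jr.” in the third entry and the entry keys are hypothetical, added only to show where the Jr part goes.

```bibtex
% Form 1: First von Last
@misc{hypothetical:form1,
  author = "Pierre de la Porte",
  title  = {Placeholder Title}
}

% Form 2: von Last, First
@misc{hypothetical:form2,
  author = "de la Porte, Pierre",
  title  = {Placeholder Title}
}

% Form 3: von Last, Jr, First
@misc{hypothetical:form3,
  author = "de la Porte, Jr., Pierre",
  title  = {Placeholder Title}
}
```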

The format of the title

The bibliography style decides whether a title is capitalized. Usually, titles of books are capitalized, but those for articles are not. A title should always be typed as it appears in the original work. For example:

Image

Different languages and styles have their own capitalization rules. If you want to override the decisions of the bibliography style, then you should enclose the parts that should remain unchanged inside braces. Note that this will not be sufficient when the first character after the left brace is a backslash (see below). It is usually best to enclose whole words in braces, because otherwise LaTeX may lose kerning or ligatures when typesetting the word. In the following example, the first version is preferable over the second:

Image
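
The difference between bracing a whole word and bracing single letters can be sketched as follows. The titles and other field values are hypothetical; both entries protect the capital letters from a style that lowercases article titles, but the first keeps the word intact for typesetting.

```bibtex
% Preferable: the whole word is protected.
@article{hypothetical:title-a,
  author  = "Jane Q. Example",
  title   = "An Introduction to {PostScript} Fonts",
  journal = {Journal of Hypothetical Examples},
  year    = 2001
}

% Works, but the extra braces split the word into pieces.
@article{hypothetical:title-b,
  author  = "Jane Q. Example",
  title   = "An Introduction to {P}ost{S}cript Fonts",
  journal = {Journal of Hypothetical Examples},
  year    = 2001
}
```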

Accented and special characters

BibTeX accepts accented characters. If you have an entry with two fields

Image

then the alpha bibliography style will yield the label [Göd31], which is probably what you want. As shown in the example above, the entire accented character must be placed in braces; in this case either {\"o} or {\"{o}} will work. These braces must not themselves be enclosed in braces (other than the ones that might delimit the entire field or the entire entry); also, a backslash must be the very first character inside the braces. Thus, neither {G{\"{o}}del} nor {G\"{o}del} works here.
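
An entry with an accented author name of this kind might be written as in the following sketch. The entry key, first name, title, and journal are assumptions; only the accented surname and the year are taken from the label discussed above.

```bibtex
% The accent sits in its own brace group at the top level of the
% field, with the backslash directly after the opening brace.
@article{hypothetical:goedel,
  author  = "Kurt G{\"o}del",
  title   = {Placeholder Title},
  journal = {Placeholder Journal},
  year    = 1931
}
```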

This feature handles accented characters and foreign symbols used with LaTeX. It also allows user-defined “accents”. For purposes of counting letters in labels, BibTeX considers everything inside the braces to be a single letter. To BibTeX, an accented character is a special case of a “special character”, which consists of everything from a left brace at the topmost level, immediately followed by a backslash, up through the matching right brace. For example, the field

Image

has two special characters: “{\'{E}mile}” and “{\ij}”.

In general, BibTeX does not process TeX or LaTeX control sequences inside a special character, but it will process other characters. Thus, a style that converts all titles to lowercase transforms

Image

The article “The” remains capitalized because it is the first word in the title.

The special character scheme has its uses for handling accented characters (although the introduction of additional braces may upset the generation of ligatures and kerns). It may help to make BibTeX’s alphabetizing do what you want, but again with some caveats; see the discussion of the \SortNoop command on page 771. Also, since BibTeX counts an entire special character as just one letter, you can force extra characters inside labels.

13.2.3. Abbreviations in BibTeX

BibTeX text fields can be abbreviated. An abbreviation is a string of ASCII characters starting with a letter and not containing a space or any of the following 10 characters:

Image

You can define your own abbreviations with the @string command in a .bib file, as shown below.

Image

Abbreviations can be used in the text part of BibTeX fields, but they should not be enclosed in braces or quotation marks. With the above string definitions, the following two ways of specifying the journal field are equivalent:

Image

The case of the name for an abbreviation is not important, so CACM and cacm are considered identical, but BibTeX produces a warning if you mix different cases. Also, the @string command itself can be spelled as all lowercase, all uppercase, or a mixture of the two cases.
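
A definition and its use could look like the following sketch. The expansion given for the abbreviation and all entry data are assumptions made for illustration; the two entries produce the same journal field.

```bibtex
% Hypothetical abbreviation definition.
@string{cacm = "Communications of the ACM"}

% The abbreviation is used without quotes or braces.
@article{hypothetical:abbrev,
  author  = "Jane Q. Example",
  title   = {Placeholder Title},
  journal = cacm,
  year    = 2001
}

% Equivalent entry with the journal name spelled out.
@article{hypothetical:spelled-out,
  author  = "Jane Q. Example",
  title   = {Another Placeholder Title},
  journal = "Communications of the ACM",
  year    = 2001
}
```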

@string commands can appear anywhere in the .bib file, but an abbreviation must be defined before it is used. It is good practice to group all @string commands at the beginning of a .bib file, or to place them in a dedicated .bib file containing only a list of abbreviations. The @string commands defined in the .bib file take precedence over definitions in the style files.

You can concatenate several strings (or @string definitions) using the concatenation operator #. Given the definition

Image

you can easily construct nearly identical journal fields for different entries:

Image
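
The use of the concatenation operator can be sketched as follows. The abbreviation name, the journal, and the series suffixes are hypothetical and serve only to show how a shared prefix is extended differently in each entry.

```bibtex
% Hypothetical shared prefix.
@string{jhe = "Journal of Hypothetical Examples"}

@article{hypothetical:series-a,
  author  = "Jane Q. Example",
  title   = {Placeholder Title},
  journal = jhe # ", Series A",
  year    = 2001
}

@article{hypothetical:series-b,
  author  = "Jane Q. Example",
  title   = {Another Placeholder Title},
  journal = jhe # ", Series B",
  year    = 2002
}
```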

Most bibliography styles contain a series of predefined abbreviations. As a convention, there should always be three-letter abbreviations for the months: jan, feb, mar, and so forth. In your BibTeX database files you should always use these three-letter abbreviations for the months, rather than spelling them explicitly. This assures consistency inside your bibliography. Information about the day of the month is usually best included in the month field. You might, for example, make use of the possibility of concatenation:

Image
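
A month field that includes the day might be entered as in this sketch, where the date and all other values are hypothetical. The tie keeps month and day on one line, and the trailing comma is an assumption about how the style joins the month and year fields.

```bibtex
% Predefined month abbreviation concatenated with the day.
@misc{hypothetical:day,
  author = "Jane Q. Example",
  title  = {Placeholder Title},
  month  = jul # "~4,",
  year   = 2001
}
```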

Names of popular journals in a given application field are also made available as abbreviations in most styles. To identify them you should consult the documentation associated with the bibliographic style in question. The set of journals listed in Table 13.3 on the facing page should be available in all styles. You can easily define your own set of journal abbreviations by putting them in @string commands in their own database file and listing this database file as an argument to LaTeX’s \bibliography command.

Image

Table 13.3. Predefined journal strings in BibTeX styles

13.2.4. The BibTeX preamble

BibTeX offers a @preamble command with a syntax similar to that of the @string command except that there is no name or equals sign, just the string. For example:

Image

You can see how the different command definitions inside the @preamble are concatenated using the # symbol. The standard styles output the argument of the @preamble literally to the .bbl file, so that the command definitions are available when LaTeX reads the file. If you add LaTeX commands in this way, you must ensure that they are added using \providecommand and not \newcommand. There are two reasons for this requirement. First, with \newcommand you would deprive yourself of the ability to change the definition in the document (e.g., the bibliography might add a simple definition for the command \url that you may want to replace by the definition from the url package). Second, sometimes the bibliography is read in several times (e.g., with the chapterbib package), an operation that would fail if \newcommand is used.
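
A preamble of this kind might be written as in the following sketch. The fallback definition chosen for \url is an assumption; the definition of \SortNoop simply discards its argument.

```bibtex
@preamble{ "\providecommand{\url}[1]{\texttt{#1}} "
         # "\providecommand{\SortNoop}[1]{} " }
```

Because both definitions use \providecommand, a definition of \url already made by the document (e.g., by loading the url package) is left untouched.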

The other example command used above (\SortNoop) was suggested by Oren Patashnik to guide BibTeX’s sorting algorithm in difficult cases. This algorithm normally does an acceptable job, but sometimes you might want to override BibTeX’s decision by specifying your own sorting key. This trick can be used with foreign languages, which have sorting rules different from those of English, or when you want to order the various volumes of a book in a way given by their original date of publication and independently of their re-edition dates.

Suppose that the first volume of a book was originally published in 1986, with a second edition appearing in 1991, and the second volume was published in 1990. Then you could write

Image

According to the definition of \SortNoop, LaTeX throws away its argument and ends up printing only the true year for these fields. For BibTeX \SortNoop is an “accent”; thus, it will sort the works according to the numbers 861991 and 901990, placing volume 1 before volume 2, just as you want.
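
The two year fields could be entered as in the following sketch. Author, titles, publisher, and entry keys are hypothetical; the years and sort prefixes follow the scenario described above.

```bibtex
@preamble{ "\providecommand{\SortNoop}[1]{} " }

% Volume 1: first published 1986, second edition 1991.
@book{hypothetical:vol1,
  author    = "Jane Q. Example",
  title     = {Placeholder Work, Volume 1},
  publisher = {Example Press},
  edition   = "Second",
  year      = "{\SortNoop{86}}1991"
}

% Volume 2: published 1990.
@book{hypothetical:vol2,
  author    = "Jane Q. Example",
  title     = {Placeholder Work, Volume 2},
  publisher = {Example Press},
  year      = "{\SortNoop{90}}1990"
}
```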

Be aware that the above trick may not function with newer BibTeX styles (for example, those generated with custom-bib) and that some styles have added a sortkey field that solves such problems in a far cleaner fashion.

13.2.5. Cross-referencing entries

BibTeX entries can be cross-referenced. Suppose you specify \cite{Wood:color} in your document, and you have the following two entries in the database file:

Image

The special crossref field tells BibTeX that the Wood:color entry should inherit missing fields from the entry it cross-references, Roth:postscript. BibTeX automatically puts the Roth:postscript entry into the reference list if it is cross-referenced by a certain number of entries (default 2) on a \cite or \nocite command, even if the Roth:postscript entry itself is never the argument of a \cite or \nocite command. Thus, with the default settings, Roth:postscript will automatically appear on the reference list if one other entry besides Wood:color cross-references it.
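
A pair of entries linked in this way is sketched below. Only the two keys come from the text; the entry types, names, titles, and remaining field values are hypothetical placeholders. The referencing entry omits booktitle, publisher, and year and inherits them from the entry it points to.

```bibtex
% The referencing entry comes first in the database file.
@incollection{Wood:color,
  author   = "A. Author",
  title    = {Placeholder Chapter Title},
  pages    = "201--225",
  crossref = "Roth:postscript"
}

% The cross-referenced entry supplies the missing fields.
@book{Roth:postscript,
  editor    = "E. Editor",
  title     = {Placeholder Book Title},
  booktitle = {Placeholder Book Title},
  publisher = {Example Press},
  year      = 1988
}
```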

The default is compiled into the BibTeX program, but on modern installations it can be changed on the command line by specifying --min-crossrefs together with the desired value (in BibTeX8 this option is named --min_crossrefs or -M):

Image

For instance, the bibliography for Example 12-5-41 from page 738 was produced with the above setting to ensure that the proceedings entry was typeset as a separate reference even though there was only one cross-reference to it. On the other hand, if you want to avoid a separate entry for the whole proceedings regardless of the number of entries referencing it, set the --min-crossrefs option to a suitably large value (e.g., 500).

A cross-referenced entry must occur later in the database files than every entry that cross-references it. Thus, all cross-referenced entries could be put at the end of the database. Cross-referenced entries cannot themselves cross-reference another entry.

You can also use LaTeX’s \cite command inside the fields of your BibTeX entries. This can be useful if you want to reference some other relevant material inside a note field:

Image

However, such usage may mean that you need additional LaTeX and BibTeX runs to compile your document properly. This will happen if the citation put into the .bbl file by BibTeX refers to a key that was not used in a citation in the main document. Thus, LaTeX will be unable to resolve this reference in the following run and will need an additional BibTeX and two additional LaTeX runs thereafter.
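
Such a note field might be written as in the following sketch, in which both entry keys and all field values are hypothetical. If hypothetical:related is not cited anywhere in the main document, the extra runs described above become necessary.

```bibtex
@book{hypothetical:main,
  author    = "Jane Q. Example",
  title     = {Placeholder Title},
  publisher = {Example Press},
  year      = 2001,
  note      = "See also~\cite{hypothetical:related}"
}

@misc{hypothetical:related,
  author = "A. Author",
  title  = {Placeholder Related Material}
}
```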

13.3. On-line bibliographies

If you search the Internet you will find a large number of bibliography entries for both primary and secondary publications in free as well as commercial databases. In this section we mention a few free resources on scientific publications that offer bibliographic data in BibTeX and some other formats.

Nelson Beebe maintains nearly 400 BibTeX databases related to scientific journals and particular scientific topics.1 These range from “Acta Informatica” and “Ada User Journal” to “X Journal” and “X Resource [journal]”. All are available as .bib source files and as .html, .pdf, and .ps listings.

1 The bibliographic databases and support programs for maintaining and manipulating them can be found at http://www.math.utah.edu:8080/pub/tex/bib/index-table.html.

Nelson Beebe’s most interesting .bib databases, as far as TeX is concerned, are the files texbook2.bib and texbook3.bib (books about TeX, LaTeX, and friends), type.bib (a list of articles and books about typography), gut.bib (the contents of the French Cahiers Gutenberg journal), komoedie.bib (the contents of the German Die TeXnische Komödie journal), texgraph.bib (sources explaining how to make TeX and graphics work together), texjourn.bib (a list of journals accepting TeX as input), tugboat.bib (all the articles in TUGboat), and standard.bib (software standards). The web resources provided by Nelson Beebe also include a series of BibTeX styles and many command-line tools for manipulating bibliography data (discussed in Section 13.4.3).

The Collection of Computer Science Bibliographies by Alf-Christian Achilles, containing more than 1.2 million references, can be found at http://liinwww.ira.uka.de/bibliography/index.html and at several mirror sites. The data included comes from external bibliographical collections like those created by Nelson Beebe. One added-value feature is the search functionality, which allows you to research authors, particular subjects, topics, and other categories. Nearly all of the reference data is available in BibTeX format.

Another interesting source is CiteSeer, Scientific Literature Digital Library, developed by Steve Lawrence, which can be found at http://citeseer.nj.nec.com. Helpful features include extensive search possibilities, context information on publications (e.g., related publications), citations to the document from other publications, statistical information about citations to a citation, and much more.

These examples represent merely a small selection of the vast amount of material found on the Internet. They might prove useful if you are interested in research papers on mathematics, computer science, and similar subjects.

13.4. Bibliography database management tools

As BibTeX databases are plain text files, they can be generated and manipulated with any editor that is able to write ASCII files. However, with large collections of BibTeX entries, this method can get quite cumbersome and finding information becomes more and more difficult. For this reason people started to develop tools to help with these tasks. Many of them can be found at http://www.tug.org/tex-archive/biblio/bibtex/utils/.

A selection of such tools is described in this section. They range from command-line tools for specific tasks to programs with a graphical user interface for general database maintenance. New products of both types are emerging, so it is probably worthwhile to check out available Internet resources (e.g., http://bibliographic.openoffice.org/biblio-sw.html).

13.4.1. biblist—Printing BibTeX database files

A sorted listing of all entries in a BibTeX database is often useful for easy reference. Various tools, with more or less the same functionality, are available, and choosing one or the other is mostly a question of taste. In this section we discuss one representative tool, the biblist package written by Joachim Schrod. It can create a typeset listing of (possibly large) BibTeX databases. Later sections show some more possibilities.

To use biblist you must prepare a LaTeX document using the article class. Options and packages like twoside, german, or geometry can be added. Given that entries are never broken across columns, it may not be advisable to typeset them in several columns using multicol, however.

The argument of the \bibliography command must contain the names of all BibTeX databases you want to print. With a \bibliographystyle command you can choose a specific bibliography style. By default, all bibliography entries in the database will be output. However, if you issue explicit \nocite commands (as we did in the example), only the selected entries from the databases will be printed. Internal cross-references via the crossref field or explicit \cite commands are marked using boxes around the key instead of resolving the latter.

13-4-1
Image

You must run LaTeX, BibTeX, and LaTeX. No additional LaTeX run is necessary, since the cross-references are not resolved to conserve space. For this reason you will always see warnings about unresolved citations.

13.4.2. bibtools—A collection of command-line tools

Several sets of interesting BibTeX tools are widely available. The first set was written (mostly) by David Kotz. His tools are collectively available for UN*X systems (or cygwin under Windows). You may have to adjust the library path names at the top of the scripts to make them work in your environment.

aux2bib Given an .aux file, this perl script creates a portable .bib file containing only the entries needed for the particular document. This ability is useful when LaTeX files need to be shipped elsewhere. The script works by using a special BibTeX style file (subset) to extract the necessary entries, which means that only standard fields are supported.

bibkey This C-shell script uses the sed, egrep, and awk utilities to prepare the list of all entries having a given string as (part of) their citation key.

Usage: bibkey string file

Characters in the string parameter above that have a special meaning in regular expressions used by either sed or egrep must be escaped with a \ (e.g., \\ for the backslash). Case is ignored in the search. Any valid egrep expression is allowed, including, for example, a search for multiple keys:

Image
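The idea behind bibkey can be re-created in a few lines of Python. This is only a sketch under stated assumptions: it is not the C-shell script itself, the function name is invented, and Python's regular expression syntax differs in some details from that of egrep.

```python
import re

# Entry type, opening delimiter, then the citation key up to the first comma.
ENTRY_START = re.compile(r"@(\w+)\s*[{(]\s*([^,\s]+)\s*,")

def matching_keys(bib_text, pattern):
    """Return citation keys that contain a match for pattern (case ignored)."""
    wanted = re.compile(pattern, re.IGNORECASE)
    keys = []
    for entry_type, key in ENTRY_START.findall(bib_text):
        # Skip declarations that are not bibliography entries.
        if entry_type.lower() in ("string", "preamble", "comment"):
            continue
        if wanted.search(key):
            keys.append(key)
    return keys
```

An alternation such as knuth|lamport then selects every key containing either name, regardless of capitalization.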

looktex Entries containing a given string in a BibTeX database are listed when this C-shell script is run. It is a generalization of the bibkey script, and all comments about that script also apply in this case.

makebib This C-shell script makes an exportable .bib file from a given set of .bib files and an optional list of citations.

Usage: makebib bibfile(s) [citekey(s)]

The output is written to subset.bib. If citekey(s) is not given, then all references in the bibfile(s) are included.

printbib This C-shell script makes a .dvi file from a .bib file for handy reference. The listing is sorted by cite key and includes keyword and abstract fields.

Usage: printbib bibfile(s)

The file abstract.dvi is generated and can be run through a dvi driver to be printed. Figure 13.1 shows the output when running this shell script on the database jura.bib from page 717.

Image

Figure 13.1. Output of the program printbib

bib2html This perl script produces an HTML version of one or more BibTeX database files.

Usage: bib2html style [-o outputfile] bibfile(s)

There are several styles from which to choose; Figure 13.2 on the facing page was produced using the style alpha on the jura.bib database. If no outputfile is given, the file bib.html is used as a default. Instead of generating a listing of a complete database you can use the option -a and specify an .aux file, in which case a bibliography containing only references from this document is created.

Usage: bib2html style [-o outputfile] -a auxfile

Image

Figure 13.2. Output of the program bib2html

13.4.3. bibclean, etc.—A second set of command-line tools

A second set of tools to handle BibTeX databases was developed by Nelson Beebe. We give a brief description of each of them.

bibclean This C program is a pretty-printer, syntax checker, and lexical analyzer for BibTeX bibliography database files [13]. The program, which runs on UN*X, Vax/VMS, and Windows platforms, has many options, but in general you can just type

Image

For example, when used on the database file tex.bib, the bibclean program reports the following problem:

Image

bibextract This program extracts from a list of BibTeX files those bibliography entries that match a pair of specified regular expressions, sending them to stdout, together with all @preamble and @string declarations. Two regular expressions must be specified: the first selects the keyword values, that is, the fields to examine (if this string is empty, all fields of an entry are examined), and the second is matched against the value part of those fields to decide which bibliography entries are output. Regular expressions should contain only lowercase strings.

For example, the following command will extract all entries containing “PostScript” in any of the fields:

Image

The next command will extract only those entries containing the string Adobe in the author or organization field:

Image

Note that one might have to clean the .bib files using bibclean before bibextract finds correct entries. For example, the two entries with author “Mittelbach” are found with

Image

Using bibextract alone would fail because of the entry containing the line year={1980ff}.

citefind and citetags Sometimes you have to extract the entries effectively referenced in your publication from several large BibTeX databases. The Bourne shell scripts citefind and citetags use the awk and sed tools to accomplish that task. First, citetags extracts the BibTeX citation tags from the LaTeX source or .aux files and sends them to the standard output stdout. There, citefind picks them up and tries to find the given keys in the .bib files specified. It then writes the resulting new bibliography file to stdout. For instance,

Image

Nelson Beebe also developed the showtags package, which adds the citation key to a bibliography listing. In other words, it does a similar job to biblist as shown in Example 13-4-1 on page 775 or the program printbib as shown in Figure 13.1 on page 776.

13-4-2
Image

13.4.4. bibtool—A multipurpose command-line tool

The program bibtool was developed by Gerd Neugebauer for manipulating BibTeX databases. It combines many of the features from the programs and scripts discussed earlier and adds several new features under the hood of a single program. It is distributed as a C source file, though you may find precompiled binaries—for example, in the Debian distribution. It has been successfully compiled on many architectures, provided a suitable C compiler is available.

In this section we show some of the features provided by the program. Many more are described in the user manual [132] accompanying it.

Pretty-printing, merging, and sorting

In its simplest invocation you can call the program with one or more BibTeX databases as its argument(s), in which case the program acts as a pretty-printer and writes the result to stdout.1 If the option -o file is used, then the result is written to the specified file. For example, to use it on the database shown in Figure 12.2 on page 690, we could write

1 If no input files are specified bibtool reads from stdin. Thus, you can also use it as a filter in a UN*X pipe construction, which can be handy sometimes.

Image

This would produce a pretty-printed version of that database in new-tex.bib. All entries will be nicely indented, with every field on a separate line, and all the equals signs will be lined up. For instance, the worst-looking entry in tex.bib

Image

has now been reformatted as follows:

Image

Merging and sorting

If you specify several database files, then all are merged together in the output. If desired, you can sort them according to the reference keys (using the option -s or -S for reverse sort). Alternatively, you can specify your own sort key using the resource2 sort.format:

Image

2 Resources are program directives that you assign values. This is often done in external files (explained later); on the command line they can be specified after the -- option.

Be aware that sorting may produce an invalid bibliography file if the file contains internal cross-references, since the entries referenced via a BibTeX crossref field have to appear later in the database and this may not be the case after sorting. The manual explains how to define sort keys that take this problem into account.

Removing duplicate keys

Merging databases together may also result in duplicate entries or, more precisely, in entries that have the same reference keys for use with LaTeX. A database containing such duplicates will produce errors if processed by BibTeX. If you specify the option -d, then the duplicates are written out as comments rather than as real entries, which keeps BibTeX happy. However, it might mean that different entries are actually collapsed into a single one (if they happened to have identical keys), so you need to use this option with some care.
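Before relying on -d it can help to know whether entries sharing a key are true duplicates or different publications. The following Python sketch shows one way to classify them; it assumes the merged databases have already been parsed into (key, fields) pairs, and neither that layout nor the function name comes from bibtool.

```python
from collections import defaultdict

def classify_duplicates(entries):
    """entries: list of (key, fields) pairs from the merged databases
    (hypothetical layout used only for this sketch)."""
    by_key = defaultdict(list)
    for key, fields in entries:
        by_key[key].append(fields)
    report = {}
    for key, versions in by_key.items():
        if len(versions) > 1:
            # Identical copies are harmless; differing ones need attention.
            same = all(v == versions[0] for v in versions[1:])
            report[key] = "identical" if same else "conflicting"
    return report
```

Keys reported as conflicting are the ones for which commenting out all but one entry would silently discard a different publication.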

Normalization and rewriting of entries

BibTeX supports both double quotes and braces as field delimiters, so the mixture used in the GNUmake entry is perfectly legal though perhaps not advisable. A better approach is to stick to one scheme, always using braces or always using double quotes. The rewriting rule

Image

changes the field delimiters to brace groups, except in cases where strings are concatenated. It produces the following result for the sample entry:

Image

Readers who are familiar with regular expressions will probably be able to understand the rather complex field rewriting rule above without further explanation. If not, the manual discusses these features at great length.

External resource files

Rewriting rules (and, in fact, any other resource definitions) can also be placed in a separate file (default extension .rsc) and loaded using the option -r. For example, to remove double-quote delimiters you can use

Image

which loads the distribution file braces.rsc containing three rewriting rules similar to the one above covering additional cases.

Rewriting rules can be restricted to work only on certain fields by specifying those fields followed by a # sign before the regular expression pattern. For example, the following rule will rewrite the year field if it contains only two digits potentially surrounded by double quotes or braces and the first digit is not zero (since we do not know if 02 refers to 2002 or 1902):

Image
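The logic of such a rule can be expressed in Python as follows. This is a sketch, not bibtool's rewrite syntax; in particular, prefixing the century 19 is our assumption about the intended result, and the function name is invented.

```python
import re

# Optional delimiter, two digits with a nonzero first digit, optional delimiter.
TWO_DIGIT_YEAR = re.compile(r'^(["{]?)([1-9][0-9])(["}]?)$')

def expand_year(value):
    """Prefix a two-digit year with 19 (an assumption); leave anything else alone."""
    return TWO_DIGIT_YEAR.sub(r"\g<1>19\2\3", value.strip())
```

Values such as 02 or 1986 do not match the pattern and are therefore returned unchanged, while any delimiters around a matching value are preserved.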

Semantic checks

Instead of rewriting you can do semantic checks using the check.rule resource. For instance,

Image

will generate a warning that a year field with suspicious contents was found if the field contains only two digits (in the message part @ is replaced by the entry type and $ by the reference key). Applying it to our sample database, we get

Image

More elaborate semantic checks are discussed in the user manual.

Removing @string declarations

BibTeX databases may also contain @string declarations used as abbreviations in the entries. In certain cases you may want those to be replaced by the strings themselves. This can be done as follows:

Image

This has the result that the series field for the entries lgc97 and lwc99 changes from

Image

to the expanded form

Image

The bibtool program expands strings whose definitions are found in the database files themselves—abbreviations that are part of the BibTeX style file are left untouched. If they should also get expanded, you have to additionally load a .bib file that contains them explicitly as @string declarations.
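The distinction between abbreviations defined in the database and those supplied by the style file can be sketched in Python. The function and the sample abbreviation names used with it are hypothetical; the sketch handles only a field whose whole value is a single abbreviation and assumes that abbreviation names are compared without regard to case.

```python
def expand_macro(value, macros):
    """value: raw field value as written in the .bib file.
    macros: abbreviations collected from @string declarations,
            keyed by lowercase name (hypothetical layout)."""
    raw = value.strip()
    if raw.startswith(("{", '"')) or raw.isdigit():
        return raw                      # already a literal string or a number
    replacement = macros.get(raw.lower())
    if replacement is None:
        return raw                      # unknown here, e.g. defined by the style file
    return "{" + replacement + "}"
```

An abbreviation that is known only to the BibTeX style file is not in the table and is therefore passed through untouched, which mirrors the behavior described above.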

Extracting entries

For selecting a subset of entries from a database a number of possibilities exist. The option -x aux-file will check in the specified aux-file for citation requests and generate from them a new .bib file containing only entries required for the particular document. For example:

Image

There is no need to specify any source database(s), since this information is also picked up from the .aux file. Any cross-referenced entries will automatically be included as necessary.
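The reason no database needs to be named is that the .aux file records both the citation requests (\citation lines) and the database names (\bibdata line). A minimal Python sketch of reading this information is shown below; it is not how bibtool is implemented, and the function name is invented.

```python
import re

def aux_requests(aux_text):
    """Collect citation keys and database names from the text of an .aux file."""
    keys, databases = [], []
    for command, argument in re.findall(r"\\(citation|bibdata)\{([^}]*)\}", aux_text):
        target = keys if command == "citation" else databases
        # Both commands may carry a comma-separated list.
        for item in argument.split(","):
            item = item.strip()
            if item and item not in target:
                target.append(item)
    return keys, databases
```

Note that a \nocite{*} request shows up as the key *, which a real tool has to treat as "all entries" rather than as an ordinary key.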

Another possibility is provided with the option -X regexp, which extracts all entries whose reference key matches the regular expression regexp. For example,

Image

will select the two entries with the reference keys MR-PQ and Southall. Details on regular expressions can be found in the manual. Using regular expressions will select only entries that are explicitly matched. Thus, cross-referenced entries such as EP92 in this example will not be included automatically, though this outcome can be forced by setting the resource select.crossrefs to ON.

In addition, several resources can be set to guide selection. For example, to select all entries with Knuth or Lamport as the author or editor, you could say

Image

To find all entries of type book or article, you could say

Image

To find all entries that do not have a year field, you could say

Image

By combining such resource definitions in a resource file and by passing the results of one invocation of bibtool to another, it is possible to provide arbitrarily complicated rewriting and searching methods.

Reference key generation

As we learned in Chapter 12 the reference key, the string used as an argument in the \cite command to refer to a bibliography entry, can be freely chosen (with a few restrictions). Nevertheless, it is often a good idea to stick to a certain scheme since that helps you remember the keys and makes duplicate keys less likely. The bibtool program can help here by changing the keys in a database to conform to such a scheme. Of course, that makes sense only for databases not already in use; otherwise, BibTeX would be unable to find the key specified in your documents.

Two predefined schemes are available through the options -k and -K. They both generate keys consisting of author names and the first relevant word of the title in lowercase (excluding “The” and similar words) and ignoring commands and braces. Thus, when running bibtool on the database from Figure 12.3 on page 717, and then searching for lines containing an @ sign (to limit the listing),

Image

we get the following output:

Image

The slightly strange key ending in :zivilproze is due to the fact that the entry contains Zivilproze\ss ordnung, making the program believe the word ends after \ss, which itself is discarded because it is a command. Similarly, \"u is represented as “u” in the fourth key. You can dramatically improve the situation by additionally loading the resource file tex_def.rsc. This file uses the tex.define resource to provide translation for common LaTeX commands, so that

Image

produces the keys

Image

Other BibTeX database-manipulating programs have similar problems in parsing blank-delimited commands, so it is usually better to use \ss{} or {\ss} in such places. For example, in Figure 13.2 on page 777 you can see that bib2html was also fooled by the notation and added an incorrect extra space in the first entry.

The other key-generating option (-K) is similar. It adds the initials of the author(s) after the name:

Image

Other schemes can be specified using the powerful configuration options documented in the user manual.
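The truncated key discussed earlier in this section is easy to reproduce. The Python sketch below imitates the described behavior rather than bibtool's actual algorithm: the list of skipped words, the use of a period between several author names, and all function names are assumptions made for the illustration.

```python
import re

SKIP = {"a", "an", "the"}  # assumed list of ignored title words

def first_title_word(title, translations=None):
    """Return the first relevant title word in lowercase.

    translations maps command names (without backslash) to replacement
    text, loosely in the spirit of a translation table for LaTeX commands.
    """
    translations = translations or {}

    def replace(match):
        name, space = match.groups()
        if name in translations:
            return translations[name]   # known command: text replaces it and the blank
        return space                    # unknown command: dropped, the blank remains

    text = re.sub(r"\\([A-Za-z]+|.)( ?)", replace, title)
    text = text.replace("{", "").replace("}", "")
    for word in re.findall(r"[A-Za-z]+", text):
        if word.lower() not in SKIP:
            return word.lower()
    return ""

def make_key(surnames, title, translations=None):
    # Joining several names with a period is an assumption of this sketch.
    names = ".".join(name.lower() for name in surnames)
    return names + ":" + first_title_word(title, translations)
```

Without a translation for ss the blank after the command ends the word, giving zivilproze; with the translation supplied, the complete word is used.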

13.4.5. pybliographer—An extensible bibliography manager

The pybliographer scripting environment developed by Frédéric Gobry is a tool for managing bibliographic databases. In the current version it supports the following formats: BibTeX, ISI (web of knowledge), Medline, Ovid, and Refer/EndNote. It can convert from one format to another. It is written in Python, which means that it is readily available on UN*X platforms; usage on Windows systems may prove to be difficult, even though there are Python implementations for this platform as well. The home of pybliographer is http://pybliographer.org.

The graphical front end for pybliographer, which builds on the Gnome libraries, is called pybliographic. Upon invocation you can specify a database to work with, usually a local file, though it can be a remote database specified as a URL. For example,

Image

will bring up a work space similar to the one shown in Figure 13.3 on the facing page. It will be similar, but not identical, because the graphical user interface is highly customizable. For instance, in the version used by the author an “editor” column was added between the “author” and “date” columns in the main window. If you wish to see other fields use the preference dialog (Settings → Preferences → Gnome). On UN*X systems the preferences are stored in the file .pybrc.conf. Although this file is not user editable, you can remove it to restore the default configuration if necessary.

Image

Figure 13.3. The pybliographic work space

Hierarchical searching

Figure 13.3 shows several other interesting features. On the bottom of the main window you see that the loaded database (tugboat.bib) contains 2446 entries, 3 of which are currently displayed. This is due to the fact that we searched it for entries matching the regular expression pattern Mittelbach in the author field (30 entries found), within the results searched for entries containing LaTeX3 or class design in the title field (5 entries found), and within these results restricted the search to publications from the years 1995 to 1999. The search dialog window shows the currently defined hierarchical views available. By clicking on either of them you can jump between the different views; by right-clicking you can delete views no longer of interest. The fields available for searching are customizable. The initial settings offer only a few fields.
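A hierarchical search of this kind amounts to applying one filter to the result of the previous one while keeping every intermediate result available as a view. The following Python sketch illustrates the principle only; it is unrelated to pybliographic's internals, and the record layout and function name are invented.

```python
import re

def refine(entries, field, pattern):
    """Keep the entries whose given field matches the regular expression.

    entries: list of dicts mapping field names to strings (hypothetical layout).
    """
    rx = re.compile(pattern, re.IGNORECASE)
    return [entry for entry in entries if rx.search(entry.get(field, ""))]
```

Calling refine on the author field, then on the title field of that result, and finally on the year field (for instance with the pattern ^199[5-9]$) yields three nested views corresponding to the ones in the search dialog.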

To edit an existing entry you can double-click it in the main window. Alternatively, you can use the Edit menu from the toolbar, or you can right-click an entry, which pops up a context menu. The latter two possibilities can also be used to delete entries or add new ones. The edit dialog window shows the entry in a format for manipulation opened at the “Mandatory” tab holding the fields that are mandatory for the current entry type. In addition, there are the optional fields in the “Optional” tab and possibly other fields in the “Extra” tab. This classification is done according to the current settings and can be easily adjusted according to your own preference. While pybliographic is capable of correctly loading databases with arbitrary field names, they will all appear in the Extra tab, which may not be convenient if you work with extended BibTeX styles such as jurabib that consider additional fields to be either required or optional. In such cases it pays to adjust the default settings (Settings → Entries, Fields).

Signaling dangerous contents

To the right of the fields you can see round buttons that are either green or red. With the red buttons pybliographic signals that the field content contains some data that the program was unable to parse correctly and that editing the text is likely to result in loss of data. For example, in the title field it was unable to interpret the command \LaTeX{} correctly and so displayed LaTeX instead. The journal field is flagged because the database actually contains

Image

This reference to an abbreviation would get lost the moment you modify that particular field. To modify such entries you have to change to “Native Editing”, as shown in Figure 13.4. This can be done by clicking the “Native Editing” button in the editing dialog window. The window then changes to the format shown in the middle window of Figure 13.4, offering a standard BibTeX entry format that you can manipulate at will. It is then your responsibility to ensure that the BibTeX syntax is obeyed. As seen in the right window in that figure, there is the possibility to make the native editing mode serve as the default.

Image

Figure 13.4. Native editing in pybliographic

Default capitalization rules

While loading a database pybliographic does some capitalization normalization on a number of fields (e.g., title). As this is better done by BibTeX when formatting for a particular journal you should consider disabling this feature (Settings → Preferences → Bibtex+ → Capitalize). In fact, with languages other than English you have to disable it to avoid proper nouns being incorrectly changed to lowercase.

The distribution also contains a number of command-line scripts. The documentation describes how to provide additional ones. For example, to convert files between different formats you can use pybconvert. The script

Image

converts the BibTeX database tex.bib to the Refer format, resulting in output such as the following:

Image

Depending on the contents of individual fields you may receive warnings, such as “warning: unable to convert '\textsl'”, since pybliographer has no idea how to convert such commands to a non-TeX format such as Refer. In that case you should manually correct the results as necessary.

The script pybcompact is similar to the aux2bib perl script or the -x option of bibtool discussed earlier. However, unlike the latter option, it does not include cross-referenced entries, so it is safer to use bibtool if available.

An interesting script is pybcheck, which expects a list of BibTeX database files or a directory name as its argument. It then checks all databases for correct syntax, duplicate keys, and other issues. For example, running pybcheck EX results in

Image

This script simply verifies the individual databases, so duplicate entries across different files are not detected.

Emacs users can run the command directly from a compile buffer via M-x compile followed by pybcheck file(s). From the output window you can then jump directly to any error detected using the middle mouse button.

13.4.6. JBibtexManager—A BibTeX database manager in Java

The JBibtexManager program developed by Nizar Batada is a BibTeX database manager written in Java; see Figure 13.5 on the following page. Due to the choice of programming language it works on all platforms for which Java 1.4 or higher is available (e.g., Windows, UN*X flavors, Mac).

Image

Figure 13.5. The JBibtexManager work space (German locale)

This program offers searching on the author, editor, title, and keyword values; sorting on the type, reference key, author, year, title, journal, editor, and keywords; and, of course, standard editing functions, including adding, deleting, copying, and pasting between different bibliographies. It automatically detects duplicate reference keys if bibliographies are merged. In addition, it offers the possibility to search a bibliography for duplicate entries (i.e., entries that differ only in their reference keys, if at all).

Like pybliographic, this program can import data in several bibliography formats: BibTeX, INSPEC, ISI (web of knowledge), Medline (XML), Ovid, and Scifinder. Export formats of HTML and plain text are available. With formats that do not contain any reference key information, the program automatically generates suitable keys provided the author information is structured in a way the program understands.

Although JBibtexManager is intended to work primarily with BibTeX databases, importing such files for the first time can pose some problems as not all syntax variations of the BibTeX format are supported. In particular, there should be at most one field per line. Thus, the GNUmake entry in our sample tex.bib database would not be parsed correctly. In addition, entries are recognized only if the entry type (starting with the @ sign) starts in the first column. If not, the entry is misinterpreted as a comment and dropped.1

1 Most of these restrictions have been lifted in the new version of JBibtexManager.

Of course, these types of problems happen only the first time an externally generated bibliography is loaded; once the data is accepted by the system, it will be saved in a way that enables it to be reloaded again. One way to circumvent the problems during the initial loading is to preprocess the external database with a tool like bibtool or bibclean, since after validation and pretty-printing the entries are in an acceptable format.

Unknown fields in a database entry are neither visible nor modifiable except when using the “raw BibTeX” mode in the newest version of the program. It is, however, possible to customize the recognized fields on a per-type basis so that the program is suitable for use with extended BibTeX styles such as those used by jurabib or natbib.

The program is not available on CTAN. Its current home is http://jabref.sourceforge.net/, where it was merged with a similar project called BibKeeper under the new name JabRef.

13.4.7. BibTexMng—A BibTeX database manager for Windows

The BibTexMng program developed by Petr and Nikolay Vabishchevich implements a BibTeX database manager on Windows; see Figure 13.6 on the next page. It supports all typical management tasks—editing, searching, sorting, moving, or copying entries from one file to another.

Image

Figure 13.6. The BibTexMng work space

In contrast to pybliographic or JBibtexManager, the BibTexMng program deals solely with BibTeX databases; it has no import or export functions to other bibliographical formats. The only “foreign” export formats supported are .bbl files and .htm files (i.e., processing a selection of entries with BibTeX or BibTeX8 from within the program and producing HTML from a selection of entries).

In the current release the program unfortunately knows about only the standard BibTeX entry types (see Table 13.1 on page 763), the standard BibTeX fields (Table 13.2), and the following fields:

Image

Not usable with jurabib et al.

Any other field is silently discarded the first time a BibTeX database is loaded; the same thing happens to entry types if they do not belong to the standard set. This means that the program is not usable if you intend to work with BibTeX styles, such as jurabib, that introduce additional fields or types, as neither can be represented by the program. It does, however, work for most styles available, including those intended for natbib (e.g., styles generated with custom-bib).

Another limitation to keep in mind is that the BibTexMng program does not support @string declarations. If those are used in an externally generated BibTeX database, you have to first remove them before using the database with BibTexMng. Otherwise, the entries will be incorrectly parsed. To help with this task the program offers to clean an external database for you (via File → Cleaning of BibTeX database). This operation replaces all strings by their definitions and removes all unknown fields, if any exist.

13.5. Formatting the bibliography with BibTeX styles

Now that we know how to produce BibTeX database entries and manipulate them using various management tools, it is time to discuss the main purpose of the BibTeX program. This is to generate a bibliography containing a certain set of entries (determined from the document contents) in a format conforming to a set of conventions.

We first discuss the use of existing styles and present example results produced by a number of standard and nonstandard styles. We then show how the custom-bib package makes it possible to produce customized styles for nearly every requirement with ease.

13.5.1. A collection of BibTeX style files

Various organizations and individuals have developed style files for BibTeX that correspond to the house style of particular journals or editing houses. Nelson Beebe has collected a large number of BibTeX styles. For each style he provides an example file, which allows you to see the effect of using the given style.1 Some of the BibTeX styles—for instance, authordate i, jmb, and named—must be used in conjunction with their accompanying LaTeX packages (as indicated in Table 13.4) to obtain the desired effect.

1 See Appendix C to find out how you can obtain these files from one of the TeX archives if they are not already on your system.

Image
Image
Image

Table 13.4. Selected BibTeX style files

You can also customize a bibliography style, by making small changes to one of those in the table (see Section 13.6.3 for a description of how this is done). Alternatively, you can generate your own style by using the custom-bib program (as explained in Section 13.5.2 on page 798).

In theory, it is possible to change the appearance of a bibliography by simply using another BibTeX style. In practice, there are a few restrictions due to the fact that the BibTeX style interface was augmented by some authors so that their styles need additional support from within LaTeX. We saw several such examples in Chapter 12. For instance, all the author-date styles need a special LaTeX package such as natbib or harvard to function, and the BibTeX styles for jurabib will work only if that package is loaded.

On the whole the scheme works quite well, and we demonstrate this in this section by showing the results of applying different BibTeX styles (plus their support packages if necessary) without otherwise altering the sample document. For this we use the by now familiar database from Figure 12.2 on page 690 and cite five publications from it: an article and a book by Donald Knuth, which show how different publications by the same author are handled; the manual from the Free Software Foundation, which is an entry without an author name; the unpublished entry with many authors and the special BibTeX string “and others”; and a publication that is part of a proceedings volume, so that BibTeX has to include additional data from a different entry.

In our first example we use the standard plain BibTeX style, which means we use the following input:

Image

To produce the final document, the example LaTeX file has to be run through LaTeX once to get the citation references written to the .aux file. Next, BibTeX processes the generated .aux file, reading the relevant entries from the BibTeX database tex.bib. The actual bibliography style in which the database entries are to be output to the .bbl file for later treatment by LaTeX is specified with the command \bibliographystyle in the LaTeX source. Finally, LaTeX is run twice more: first to load the .bbl file and again to resolve all references.1 A detailed explanation of this procedure was given in Section 12.1.3 on page 687, where you will also find a graphical representation of the data flow (Figure 12.1).

1 In fact, for this example only one run is necessary; there are no cross-references to resolve because we used \nocite throughout.
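The processing sequence described above can be sketched as follows. This is a minimal illustration rather than the original example input: the file name sample.tex and the citation keys are placeholders, since the real keys depend on the entries in tex.bib.

```latex
% sample.tex (hypothetical file name; the citation keys are placeholders)
\documentclass{article}
\begin{document}
% \nocite adds entries to the bibliography without printing a citation
\nocite{knuth-article,knuth-book,gnu-manual,many-authors,in-proceedings}
\bibliographystyle{plain}  % selects the style file plain.bst
\bibliography{tex}         % reads the entries from tex.bib
\end{document}
% Processing sequence (program names as commonly installed):
%   latex sample    writes the citation data to sample.aux
%   bibtex sample   reads sample.aux and tex.bib, writes sample.bbl
%   latex sample    loads sample.bbl
%   latex sample    resolves any remaining references
```

Anything after \end{document} is ignored by LaTeX, so the run sequence can safely be kept in the file as a reminder.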

The plain style has numeric labels (in brackets), and the entries are sorted alphabetically by author, year, and title. In the case of the GNU manual, the organization was used for sorting. This gives the following output:

13-5-1
Image

By replacing plain with abbrv we get a similar result. Now, however, the entries are more compact, since first names, month, and predefined journal names (Table 13.3 on page 771) are abbreviated. For instance, ibmjrd in the second reference now gives “IBM J. Res. Dev.” instead of “IBM Journal of Research and Development”.

13-5-2
Image

With the standard BibTeX style unsrt we get the same result as with the plain style, except that the entries are printed in order of first citation, rather than being sorted. The standard sets of styles do not contain a combination of unsrt and abbrv, but if necessary it would be easy to integrate the differences between plain and abbrv into unsrt to form a new style.

13-5-3
Image

The standard style alpha is again similar to plain, but the labels of the entries are formed from the authors’ names and the year of publication. The slightly strange label for the GNU manual is due to the fact that the entry contains a key field from which the first three letters are used to form part of the label. Also note the interesting label produced for the reference with more than three authors. The publications are sorted, with the label being used as a sort key, so that now the GNU manual moves to fourth place.

13-5-4
Image

Many BibTeX styles implement smaller or larger variations of the layouts produced with the standard styles. For example, the phaip style for American Institute of Physics journals implements an unsorted layout (i.e., by order of citation), but omits article titles, uses abbreviated author names, and uses a different structure for denoting editors in proceedings. Note that the entry with more than three authors has now been collapsed, showing only the first one.

13-5-5
Image

If we turn to styles implementing an author-date scheme, the layout usually changes more drastically. For instance, labels are normally suppressed (after all, the lookup process is by author). The chicago style, for example, displays the author name or names in abbreviated form (first name reversed), followed by the date in parentheses. In addition, we see yet another way to handle the editors in proceedings and instead of the word “pages” we get “pp.” For this example we loaded the natbib package to enable author-date support.
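A minimal sketch of such a setup is given below. It assumes that chicago.bst is installed and uses placeholder citation keys; only the loaded package and the style name differ from a document that uses one of the numeric styles.

```latex
\documentclass{article}
\usepackage{natbib}           % provides the author-date interface
\begin{document}
% Placeholder keys; substitute keys that exist in tex.bib
\nocite{knuth-article,knuth-book}
\bibliographystyle{chicago}   % author-date layout from chicago.bst
\bibliography{tex}
\end{document}
```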

13-5-6
Image

As a final example we present another type of layout that is implemented with the help of the jurabib package. Since more customizing is necessary we show the input used once more. The trick used to suppress the heading is not suitable for use in real documents as the space around the heading would be retained!

Image

This produces a layout in which the author name is replaced by a rule if it has been listed previously. In the case of multiple authors, the complete list has to be identical (see the first two entries). Also, for the first time ISBN and ISSN numbers are shown when present in the entry. If you look closely, you will see many other smaller and larger differences. For example, this is the first style that does not translate titles of articles and proceedings entries to lowercase but rather keeps them as specified in the database (see page 809 for a discussion of how BibTeX styles can be modified to achieve this effect).

As the original application field for jurabib was law citations, it is one of the BibTeX styles that does not provide default strings for the journals listed in Table 13.3 on page 771; as a result, we get an incomplete second entry. BibTeX will warn you about the missing string in this case. You can then provide a definition for it in the database file or, if you prefer, in a separate database file that is loaded only if necessary.
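One way to supply the missing string in a separate database file is sketched below. The file name jourfull.bib and the citation key are placeholders chosen for illustration; the expansion of ibmjrd is the one used by the standard styles.

```latex
% Write a small auxiliary database that holds the missing journal string.
% The file name jourfull.bib is a placeholder.
\begin{filecontents}{jourfull.bib}
@string{ibmjrd = "IBM Journal of Research and Development"}
\end{filecontents}
\documentclass{article}
\usepackage{jurabib}
\begin{document}
\nocite{knuth-article}        % placeholder key for an entry that uses ibmjrd
\bibliographystyle{jurabib}
% List the string file first so that ibmjrd is defined before tex.bib uses it
\bibliography{jourfull,tex}
\end{document}
```

Because BibTeX reads the database files in the order given, the file with the string definition has to precede any file that refers to the string.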

13-5-7
Image

13.5.2. custom-bib—Generate BibTeX styles with ease

So far, we have discussed how to influence the layout of the bibliography by using different bibliography styles. If a particular BibTeX style is recommended for the journal or publisher you are writing for, then that is all you need. However, a more likely scenario is that you have been equipped with a detailed set of instructions that tell you how references should be formatted, but without pointing you to any specific BibTeX style (BibTeX being a program that may not even be known at the publishing house).

Hunting for an existing style that fits the bill or can be adjusted slightly to do so (see Section 13.6.3) is an option, of course, but given that there are usually several variations in use for each typographical detail, the possibilities are enormous and thus the chances of finding a suitable style are remote. Consider the following nine common requirements for presenting author names:

Image

Table 13.5. Requirements for formatting names

Combining these with a specification for the separation symbol to use (e.g., comma, semicolon, slash), the fonts to use for author names (i.e., Roman, bold, small caps, italic, other), and perhaps a requirement for different fonts for surname and first names, you will get more than 500 different styles for presenting author names in the bibliography. Clearly, this combinatorial explosion cannot be managed by providing predefined styles for every combination.
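As a rough check of this figure, one can multiply the number of choices, assuming that they combine independently and that surname and first names may each take any of the five fonts. This is an assumption made here for illustration; the text does not spell out the calculation.

```latex
\[
  \underbrace{9}_{\mbox{name formats}} \times
  \underbrace{3}_{\mbox{separators}} \times
  \underbrace{5}_{\mbox{surname fonts}} \times
  \underbrace{5}_{\mbox{first-name fonts}}
  = 675 > 500
\]
```

With a single font for the whole name the product is only 9 × 3 × 5 = 135, so under these assumptions it is the independent font choice for the name parts that pushes the count past 500.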

Faced with this problem, Patrick Daly, the author of natbib, started in 1993 to develop a system that is capable of providing customized BibTeX styles by collecting answers to questions like the above (more than 70!) and then building a customized .bst file corresponding to the answers.

The system works in two phases: (1) a collection phase in which questions are asked interactively and (2) a generation phase in which the answers are used to build the BibTeX style. Both phases are carried out entirely with LaTeX and thus can be run on any platform without any additional helper program.

The collection is started by running the program makebst.tex through LaTeX and answering the questions posed to you. Most of the questions are presented in the form of menus that offer several answers. The default answer is marked with a * and can be selected by simply pressing return. Other choices can be selected by typing the letter in parentheses in front of the option. Selecting a letter not present produces the default choice.

Initializing the system

We now walk you through the first questions, which are somewhat special because they are used to initialize the system. Each time we indicate the suggested answer.

Image

Replying with y will produce a description of the procedure (as explained above); otherwise, the question has no effect.

Image

Here the correct answer is return. The default merlin.mbs is currently the only production master file available, though this might change one day.

Image

Specify the name for your new BibTeX style file, without an extension—for example, ttct (Tools and Techniques for Computer Typesetting series). As a result of completing the first phase you will then receive a file called ttct.dbj from which the BibTeX style file ttct.bst is produced in the second phase.

Image

Enter any free-form text you like, but note that a return ends the comment. It is carried over into the resulting files and can help you at a later stage to identify the purpose of this BibTeX style.

Image

If you enter y to this question the context of later questions will be shown in the following form:

Image

Whether this provides any additional help is something you have to decide for yourself. The default is not to provide this extra information.

Image

If you are generating a BibTeX style for a language other than English you can enter the name of the language here. Table 13.6 lists currently supported languages. Otherwise, reply with return.

Image
Image

Table 13.6. Language support in custom-bib (summer 2003)

By answering y you can load predefined journal names for certain disciplines into the BibTeX style. You are then asked to specify the files containing these predefined names (with suitable defaults given).

This concludes the first set of questions for initializing the system. What follows are many questions that offer choices concerning layout and functional details. These can be classified into three categories:

Citation scheme The choice made here influences later questions. If you choose author-date support, for example, you will get different questions than if you choose a numerical scheme.

Extensions These questions are related to extending the set of supported BibTeX fields, such as whether to include a url field.

Typographical details You are asked to make choices about how to format specific parts of the bibliographical entries. Several of the choices depend on the citation scheme used.

While it is possible to change your selections in the second phase of the processing (or to start all over again), it is best to have a clear idea about which citation scheme and which extensions are desired before beginning the interactive session. The typographical details can be adjusted far more easily in the second phase if that becomes necessary. We therefore discuss these main choices in some detail.

Selecting the citation scheme

The citation scheme is selected by answering the following question:

Image

The default choice is “numerical”. If you want to produce a style for the author-date scheme, select a (and disregard the mention of “nonstandard interface”). For alpha-style citations, use either b, o, or f depending on the label style you prefer. Choice c is of interest only if you want to produce a style for displaying BibTeX databases, so do not select it for production styles.

If the default (i.e., a numerical citation scheme) was selected, the follow-up question reads:

Image

Select the default. All other choices generate BibTeX styles that produce some sort of HTML output (which needs further manipulation before it can be viewed in browsers). This feature is considered experimental.

If you have selected an author-date citation scheme (i.e., a), you will be rewarded with a follow-up question for deciding on the support interface from within LaTeX:

Image

The default choice, natbib, is usually the best, offering all the possibilities described in Sections 12.3.2 and 12.4.1. The option o should not be selected. If you have documents using citation commands from, say, the harvard package (see Example 12-3-4 on page 700), the option h would be suitable. For the same reason, the other options might be the right choice in certain circumstances. However, for document portability, natbib should be the preferred choice. Note in particular that some of the other packages mentioned in the options are no longer distributed in the mainstream LaTeX installation.

Determining the extensions supported

Besides supporting the standard BibTeX entry types (Table 13.1 on page 763) and fields (Table 13.2), makebst.tex can be directed to support additional fields as optional fields in the databases, so that they will be used if present. Some of these extensions are turned off by default, even though it makes sense to include them in nearly every BibTeX style file.

Image

Replying with l will greatly help in presenting foreign titles properly. Example 12-5-6 on page 719 shows the problems that can arise and explains how they can be resolved when a language field is present (see Example 12-5-36 on page 734). So a deviation from the default is suggested.

Image

Choosing a will integrate support for an annote field in the .bst file as well as support for including annotations stored in files of the form citekey.tex. However, in contrast to jurabib, which also offers this feature, the inclusion cannot be suppressed or activated using a package option. Since you are quite likely to want this feature turned on and off depending on the document, you might be better served by using two separate BibTeX styles differing only in this respect.

The nonstandard field eid (electronic identifier) is automatically supported by all generated styles. The fields doi, isbn, and issn are included by default but can be deselected. Especially for supporting the REVTeX package from the American Physical Society, a number of other fields can be added.

Finally, support for URLs can be added by answering the following question with something different from the default.

Image

We suggest including support for URLs as references to electronic resources become more and more common. In the bibliography the URL is tagged with \urlprefix\url{field-value}, with default definitions for both commands. By loading the url package, better line breaking can be achieved.
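A document using such a style might be set up as sketched below. The style name ttct is the example name used in this section. The sketch assumes that the generated style supplies its defaults for \url and \urlprefix only when the commands are not already defined (for instance, with \providecommand), so that definitions made in the preamble take precedence; check the beginning of the generated .bbl file to confirm this for your style.

```latex
\documentclass{article}
\usepackage{url}                         % \url with better line breaking
\newcommand{\urlprefix}{Available at: }  % assumed to replace the default prefix
\begin{document}
\nocite{*}                  % include every entry of the database
\bibliographystyle{ttct}    % the customized style generated with makebst
\bibliography{tex}
\end{document}
```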

As one of the last questions you are offered the following choice:

Image

We strongly recommend retaining the default! LaTeX2ε is nearly a decade old, and NFSS should have found its way into every living room. Besides, the plain TeX commands (\rm, \bf, and so on) are no longer officially part of LaTeX. They may be defined by a document class (for compatibility reasons with LaTeX 2.09), but then again they may not. Thus, choosing the obsolete syntax may result in the BibTeX style not functioning properly in all circumstances.1

1 Warning: in older versions the question was “NEW FONT SELECTION SCHEME” and the default was to use the obsolete commands. So be careful.
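The difference between the two kinds of font commands can be illustrated as follows. The sample text is arbitrary; the obsolete forms are shown only as comments because, as noted above, they are not guaranteed to be defined.

```latex
\documentclass{article}
\begin{document}
% NFSS commands (LaTeX2e): the text is passed as an argument
\textit{The \TeX book}, \textbf{27}, \textsc{Knuth}

% Obsolete two-letter declarations, defined only by some document classes:
% {\it The \TeX book\/}, {\bf 27}, {\sc Knuth}
\end{document}
```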

Note that the questions about the extensions are mixed with those about typographical details and do not necessarily appear in the order presented here.

Specifying the typographical details

The remaining questions (of which there are plenty) concern typographical details, such as formatting author names, presenting journal information, and many more topics. As an example we show the question block that deals with the formatting of article titles:

Image

If you make the wrong choice with any of them, do not despair. You can correct your mistake in the second phase of the processing as explained below.

Generating the BibTeX style from the collected answers

The result of running makebst.tex through LaTeX and answering all these questions is a new file with the extension .dbj. It contains all your selections in a special form suitable to be processed by DOCSTRIP, which in turn produces the final BibTeX style (see Section 14.2 for a description of the DOCSTRIP program). Technically speaking, a BibTeX bibliographic style file master (merlin.mbs by default) contains alternative coding that depends on DOCSTRIP options. By choosing entries from the interactive menus discussed above, some of this code is activated, thereby providing the necessary customization.

If you specified ttct in response to the question for the new .bst file, for example, you would now have a file ttct.dbj at your disposal. Hence, all that is necessary to generate the final BibTeX style ttct.bst is to run

Image

The content of the .dbj files generated from the first phase is well documented and presented in a form that makes further adjustments quite simple. Suppose you have answered y in response to the question about the title of articles on the previous page (i.e., use double quotes around the title) but you really should have replied with d (use double quotes around title and punctuation). Then all you have to do is open the .dbj file with a text editor and search for the block that deals with article titles:

Image

Changing the behavior then entails nothing more than uncommenting the line you want and commenting out the line currently selected:

Image

After that, rerun the file through LaTeX to obtain an updated BibTeX style.

13.6. The BibTeX style language

This section presents a condensed introduction to the language used in BibTeX style files. The information should suffice if you want to slightly modify an existing style file. For more details, consult Oren Patashnik’s original article, “Designing BibTeX Styles” [136].

BibTeX styles use a postfix stack language (like PostScript) to tell BibTeX how to format the entries in the reference list. The language has 10 commands, described in Table 13.7 on page 807, to manipulate the language’s objects: constants, variables, functions, the stack, and the entry list.

Image
Image

Table 13.7. BibTeX style file commands

BibTeX knows two types of functions: built-in functions, provided by BibTeX itself (see Table 13.8 on page 808), and user functions, which are defined using either the MACRO or FUNCTION command.

Image
Image

Table 13.8. BibTeX style file built-in functions

You can use all printing characters inside the pair of double quotes delimiting string constants. Although BibTeX, in general, ignores case differences, it honors the case inside a string. Spaces are significant inside string constants, and a string constant cannot be split across lines.

Variable and function names cannot begin with a numeral and may not contain any of the 10 restricted characters shown on page 769. BibTeX ignores case differences in the names of variables, functions, and macros.

Constants and variables can be of type integer or string (Boolean true and false are represented by the integers 1 and 0, respectively).

There are three kinds of variables:

Global variables These are either integer- or string-valued variables, which are declared using an INTEGERS or STRINGS command.

Entry variables These are integer- or string-valued variables, which are declared using the ENTRY command. Each of these variables has a value for each entry in the list read from a BibTeX database.

Fields These are string-valued, read-only variables that store the information from the database file. Their values are set by the READ command. As with entry variables there is a value for each entry.

13.6.1. The BibTeX style file commands and built-in functions

Table 13.7 on page 807 gives a short description of the 10 BibTeX commands. Although the command names appear in uppercase, BibTeX ignores case differences.

It is recommended (but not required) to leave at least one blank line between commands and to leave no blank lines within a command. This convention helps BibTeX recover from syntax errors.

Table 13.8 on page 808 gives a short overview of BibTeX’s 37 built-in functions (for more details, see [136]). Every built-in function with a letter in its name ends with a $ sign.

13.6.2. The documentation style btxbst.doc

Oren Patashnik based the standard BibTeX style files abbrv, alpha, plain, and unsrt on a generic file, btxbst.doc, which is well documented and should be consulted for gaining a detailed insight into the inner workings of BibTeX styles.

In the standard styles, labels have two basic formatting modes: alphabetic, like [Lam84], and numeric, like [34]. References can be ordered in three ways:

Sorted, alphabetic labels Alphabetically ordered, first by citation label, then by author(s) (or its replacement field), then by year and title.

Sorted, numeric labels Alphabetically ordered, first by author(s) (or its replacement field), then by year and title.

Unsorted Printed in the order in which the references are cited in the text.

The basic flow of a style file is controlled by the following command-lines, which are found at the end of the btxbst.doc file:

Image

These commands are explained in Tables 13.7 and 13.8.

The code of a style file starts with the declaration of the available fields with the ENTRY declaration and the string variables to be used for the construction of the citation label.

Next come some functions for formatting chunks of an entry. There are functions for each of the basic fields. The format.names function parses names into their “First von Last, Junior” parts, separates them by commas, and puts an “and” before the last name (but ending with “et al.” if the last of multiple authors is "others"). The format.authors function applies to authors, and format.editors operates on editors (it appends the appropriate title: “, editor” or “, editors”).

The next part of the file contains all the functions defining the different types accepted in a .bib file (i.e., functions like article and book). These functions actually generate the output written to the .bbl file for a given entry. They must precede the READ command. In addition, a style designer should provide a function default.type for unknown types.

Each entry function starts by calling output.bibitem to write \bibitem and its arguments to the .bbl file. Then the various fields are formatted and printed by the function output or output.check, which handles the writing of separators (commas, periods, \newblock’s) as needed. Finally, fin.entry is called to add the final period and finish the entry.
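The result of these calls for a book entry formatted with the plain style looks roughly like the following sketch. The citation key and the entry data are illustrative; the point is the structure: \bibitem with the key, the formatted fields separated by \newblock, and a final period. The .bbl content is wrapped in a small document here so that it can be processed on its own.

```latex
\documentclass{article}
\begin{document}
\begin{thebibliography}{1}   % the argument is the widest label
\bibitem{knuth-book}         % written by output.bibitem
Donald~E. Knuth.
\newblock {\em The {\TeX}book}.
\newblock Addison-Wesley, Reading, MA, 1984.
\end{thebibliography}
\end{document}
```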

In Table 13.8, each built-in function is preceded by the literals it pops from the stack. If it leaves a result on the stack, that result is shown in parentheses. A “literal” L is an element on the stack. It can be an integer I, a string S, a variable V, a function F, or a special value denoting a missing field. If the popped literal has an incorrect type, BibTeX complains and pushes the integer 0 or the null string, depending on the function’s resulting type.

The next section of the btxbst.doc file contains definitions for the names of the months and for certain common journals. Depending on the style, full or abbreviated names may be used. These definitions are followed by the READ command, which inputs the entries in the .bib file.

Then the labels for the bibliographic entries are constructed. Exactly which fields are used for the primary part of the label depends on the entry type.

The labels are next prepared for sorting. When sorting, the sort key is computed by executing the presort function on each entry. For alphabetic labels you might have to append additional letters (a, b, ...) to create a unique sorting order, which requires two more sorting passes. For numeric labels, either the sorted or the original order can be used. In both cases, you need to keep track of the longest label for use with the thebibliography environment.
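The longest label ends up as the argument of the thebibliography environment, where it determines the indentation of the entries. A sketch for an alphabetic style is shown below, with an illustrative key and label; the label is passed to \bibitem as an optional argument.

```latex
\documentclass{article}
\begin{document}
\begin{thebibliography}{Knu84}   % longest label, used to set the indentation
\bibitem[Knu84]{knuth-book}      % explicit label in the optional argument
Donald~E. Knuth.
\newblock {\em The {\TeX}book}.
\newblock Addison-Wesley, Reading, MA, 1984.
\end{thebibliography}
\end{document}
```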

Finally, the .bbl file is written by looping over the entries and executing the call.type$ function for each one.

13.6.3. Introducing small changes in a style file

Often it is necessary to make slight changes to an existing style file to suit the particular needs of a publisher.

As a first example, we show you how to eliminate the (sometimes unpleasant) standard BibTeX style feature that transforms titles to lowercase. In most cases, you will want the titles to remain in the same case as they are typed. A variant of the style unsrt can be created for this purpose. We will call it myunsrt, since it is different from the original style. Similar methods can be used for other styles.

Looking at Table 13.8 on the facing page, you will probably have guessed that the function change.case$ is responsible for case changes. Searching the style file for this string with a text editor, you will find that the function format.title must be changed. Below we show that function before and after the modification:

Image

With the help of Table 13.8 on the preceding page, you can follow the logic of the function and the substitution performed.

Another function that must be changed in a similar way is format.edition. Here we can omit the inner if statement since there would be no difference in the branches.

Image

In format.chapter.pages, format.thesis.type, and format.tr.number, similar changes must be made.

Adding a new field

Sometimes you may want to add a new field. As an example, let’s add an annote field. Two approaches can be taken: the one adopted in the style annotate or the one used in the style annotation. Let us look at the simpler solution first. The style annotation, based on plain, first adds the field annote to the ENTRY definition list; the fin.entry function is then changed to handle the supplementary field. As seen in the example of the function book, the function fin.entry is called at the end of each function defining an entry type.

Image

After outputting the citation string inside a quotation environment, the annotation text is written following the text “Annotation”, which starts a separate line. If the field is absent, nothing is written (the test, annote missing$, takes the skip$ branch of the if$ command).

The other style, annotate, based on alpha, takes a more complicated approach. After adding the element annotate to the ENTRY definition list, the function format.annotate is created to format that supplementary field. The function has a decision flow similar to the code shown above.

Image

The formatting routine for each of the entry types of Table 13.1 on page 763 has a supplementary line format.annotate write$ just following the call to fin.entry.

Foreign language support

If you want to adapt a BibTeX style to languages other than English, you will, at the very least, have to translate the hard-coded English strings in the BibTeX style files, like “edition” in the example on the facing page.

First you should edit a style file and introduce the new terms in the necessary places. As you are working with only one language, it is possible to introduce the proper language-specific typographic conventions at the same time. An example of this approach is the nederlands style developed by Werenfried Spit. This harvard-based style has been adapted to Dutch following the recommendations of Van Dale (1982). We will now look at some examples of functions that were adapted by this style.

In Dutch, one does not distinguish between one or more editors. The generic Dutch word redactie replaces the two possibilities.

Image

The following examples show how, for one particular language, you can go relatively far in the customization (in form and translation) of an entry (in this case, the format of the edition field). In this example, up to the third edition, Dutch-specific strings are used. Starting with the fourth edition, the generic string “ie” is used, where i is the number of the edition (for example, 4e). You can also see the nesting of the if$ statements and the use of the case-changing command change.case$.

Image

Of course, the strings for the names of the months should be changed and some other language-specific strings can be defined.

Image

In addition, the sorting routine for the names, sort.format.names, must know about the language-dependent rules for showing names in the right order.

Also, most languages have articles or other short words that should be ignored for sorting titles.

Image

Here the chop.word function chops the word specified from the string presented on the stack—in this case, the definite (De) and indefinite (Een) articles.
