7.8. The LaTeX world of symbols

Shortly after TeX and METAFONT came into existence, people started to develop new symbol fonts for use with the system. Over time the set of available symbols grew to a considerable number. The Comprehensive LaTeX Symbol List by Scott Pakin [134] lists 2590 symbols (counted in spring 2003) and the corresponding LaTeX commands that produce them. For some symbols the necessary fonts and support packages may have to be obtained (e.g., from a CTAN host; see Appendix C) and installed by the user. They are usually accompanied by installation instructions and general documentation.

The fonts and packages described in this section form only a subset of what is available. If you cannot find a symbol here, the 70 pages of [134] are a valuable resource for locating what you need. We start by looking at a number of dingbat fonts, some of which contain quite unusual symbols. This examination is followed by an introduction to the TIPA system, which provides support for phonetic symbols. The section finishes with a discussion of ways to obtain a single (though in Europe not unimportant) symbol: the euro. Being a relatively new addition to the symbol world, it is missing in many fonts and thus needs alternative ways to produce it. All packages and fonts listed in this section and in [134] are freely available.

7.8.1. dingbat—A selection of hands

The dingbat package written by Scott Pakin provides access to two symbol fonts developed by Arthur Keller (ark10.mf) and Doug Henderson (dingbat.mf). The package makes a set of hands and a few other symbols available; the example shows most of them. Note that the \largepencil glyph is bigger than the space it officially occupies (shown by the frame drawn around it).

[Example 7-8-1]

These fonts exist only as a METAFONT implementation, so they are not really suitable for producing PDF (e.g., with pdfTeX).

7.8.2. wasysym—Waldi’s symbol font

The wasysym package developed by Axel Kielhorn provides access to the wasy fonts designed by Roland Waldi. These fonts first appeared in 1989 and are nowadays available both in METAFONT source and Type 1 outlines. They cover a wide range of symbols from different areas, including astronomical and astrological symbols, APL, musical notes, circles, and polygons and stars (see Table 7.19 on the facing page). The wasysym package defines command names like \phone to access each glyph. Alternatively, if you want only a few glyphs from the font, you can use the pifont interface and access the symbols directly under the name wasy.
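
As a minimal sketch of the two access routes just described, the following document uses the named command \phone and, separately, the pifont interface with the family name wasy. The slot number passed to \Pisymbol is only an illustrative value; look up the slot you actually need in the glyph chart.

```latex
\documentclass{article}
\usepackage{wasysym} % named commands such as \phone
\usepackage{pifont}  % generic \Pisymbol{<family>}{<slot>} interface
\begin{document}
Call us: \phone

% Direct access by slot number; 7 is an illustrative value only.
Direct access: \Pisymbol{wasy}{7}
\end{document}
```

The pifont route avoids loading all of wasysym's command names when only one or two glyphs are needed.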

Table 7.19. Glyphs in the wasy fonts

[Example 7-8-2]

7.8.3. marvosym—Interface to the MarVoSym font

The MarVoSym font designed by Martin Vogel is another Pi font containing symbols from various areas including quite uncommon ones, such as laundry signs (in case you are doing your own laundry lists), astronomy and astrology symbols, and many others.

The LaTeX support package marvosym was written by Thomas Henlich, who also converted the font from TrueType format to PostScript Type 1. This package defines command names for all symbols, some of which are listed in the next example; the full set is given in marvodoc.pdf accompanying the distribution.

[Example 7-8-3]

Assuming a recent distribution, one can also access the symbols directly by using the glyph chart in Table 7.20 on the preceding page and the pifont interface with the Pi font name being mvs. In older distributions the file umvs.fd that makes this method work might be missing, but it can easily be added.
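
The following sketch contrasts the named commands of marvosym with direct slot access through pifont. It assumes that umvs.fd is installed; the slot number is an arbitrary illustrative value, to be replaced by one taken from the glyph chart.

```latex
\documentclass{article}
\usepackage{marvosym} % named commands for the MarVoSym glyphs
\usepackage{pifont}   % \Pisymbol{<family>}{<slot>}
\begin{document}
\Letter\ Write to us \quad \Telefon\ or call

% Direct slot access under the Pi font name mvs (needs umvs.fd).
% 72 is an arbitrary illustrative slot number.
\Pisymbol{mvs}{72}
\end{document}
```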

Table 7.20. Glyphs in the MarVoSym font

[Example 7-8-4]

7.8.4. bbding—A METAFONT alternative to Zapf Dingbats

For those who cannot use PostScript Type 1 fonts, Karel Horak designed a font with METAFONT containing most of the symbols from Hermann Zapf’s dingbat font. The package bbding by Peter Møller Neergaard provides an interface that defines command names for each symbol (using a naming convention modeled after WordPerfect’s names for accessing the Zapf Dingbats font). The complete list can be found in the package documentation; a few examples are given below.

[Example 7-8-5]

Alternatively, referring to the glyph chart in Table 7.21 on the following page, you can address individual symbols via the pifont interface, by accessing the font under the name ding (compare this to Table 7.9 on page 379 showing the original Zapf designs).

Table 7.21. Glyphs in the font bbding

[Example 7-8-6]

7.8.5. ifsym—Clocks, clouds, mountains, and other symbols

The ifsym package written by Ingo Klöckl provides access to a set of symbol fonts designed in METAFONT. At present they are not available in Type 1 format. Depending on the chosen package option(s), different symbol sets are made available. We show only a small selection here. The full documentation (German only) is provided in the PostScript file ifsym.ps, which is part of the distribution. All available symbols are also listed in [134].

The option clock makes seven clock-related symbols available. It also provides the command \showclock to display an analog watch, with the hands showing the correct time. Its two arguments denote the hour (0–11) and minutes (0–59). The minutes displayed are rounded to the nearest 5-minute interval; using a value greater than 11 for the hour makes the symbol disappear without warning. All symbols are available in normal and bold extended series.
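
A minimal sketch of the clock option, using only the \showclock command described above; the time values are arbitrary.

```latex
\documentclass{article}
\usepackage[clock]{ifsym}
\begin{document}
% First argument: hour (0--11); second argument: minutes (0--59).
% The minutes are rounded to the nearest 5-minute interval.
The meeting starts at \showclock{3}{25}.
\end{document}
```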

[Example 7-8-7]

The option weather defines 22 weather symbols, a few of which are shown on the first line of the next example. The \Thermo command displays a different thermometer symbol depending on the number in its argument (0–6).

For alpinists and travelers the option alpine provides 17 symbols for use in route descriptions or maps. The option misc offers a set of unrelated symbols, some of which are also found in other fonts, and the option geometry provides commands for 30 geometric shapes, some of which are shown on the fourth line of the example.

[Example 7-8-8]

The command \textifsymbol allows you to access symbols by their slot positions. Its optional argument defines the symbol font to use (default ifsym). Glyph charts of all ifsym fonts are part of the package documentation. Somewhat more interesting is the command \textifsym, which allows you to produce pulse diagrams. It can also be used to display digital digits (where b denotes an empty space of the right width).

[Example 7-8-9]

7.8.6. tipa—International Phonetic Alphabet symbols

The TIPA bundle [50] developed by Rei Fukui consists of a set of fonts and a corresponding package to enable typesetting of phonetic symbols with LaTeX. TIPA contains all the symbols, including diacritics, defined in the 1979, 1989, 1993, and 1996 versions of the International Phonetic Alphabet (IPA). Besides IPA symbols, TIPA contains symbols that are useful for other areas of phonetics and linguistics including the following:

• Symbols used in American phonetics, for example, Ε and λ;

• Symbols used in the historical study of Indo-European languages, such as þ, Ь, and Ъ, together with a number of accents;

• Symbols used in the phonetic description of languages in East Asia (needs option extra);

• Diacritics used in extIPA Symbols for Disordered Speech and VoQS (Voice Quality Symbols) (needs option extra).

The IPA symbols are encoded in the standard LaTeX encoding T3, for which the package tipa provides additional support macros. The encoding is available for the font families Computer Modern Roman, Sans, and Typewriter (based on the designs for Computer Modern by Donald Knuth), as well as for Times Roman and Helvetica.

Strictly speaking, T3 is not a proper LaTeX text encoding, as it does not contain the visible ASCII characters in their standard positions. However, one can take the position that phonetic symbols form a language of their own and for this language, the TIPA system provides a highly optimized input interface in which digits and uppercase letters serve as convenient shortcuts (see Table 7.22) to input common phonetic symbols within the argument of \textipa or the environment IPA. All phonetic symbols are also available in long form; for example, to produce a schwa (ə) one can use \textschwa. The following example shows the TIPA system in a Times and Helvetica environment.
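
To make the shortcut interface concrete, here is a small sketch that uses \textipa, the IPA environment, and the long form \textschwa. The transcribed word is just an illustration; the meaning of each shortcut character should be checked against the shortcut table.

```latex
\documentclass{article}
\usepackage{tipa}
\begin{document}
% Shortcut input: uppercase letters and @ select phonetic symbols,
% " marks primary stress and "" secondary stress.
\textipa{[""Ekspl@"neIS@n]}

% The same input inside the environment form.
\begin{IPA}
[""Ekspl@"neIS@n]
\end{IPA}

% Long form of a single symbol, usable outside \textipa.
A schwa: \textschwa
\end{document}
```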

Table 7.22. TIPA shortcut characters

[Example 7-8-10]

Redefined math commands

TIPA defines \*, \;, \:, \!, and \| as special macros with which to easily input phonetic symbols that do not have a shortcut input as explained above. In standard LaTeX all five are already defined for use in math mode, so loading tipa hijacks them for use by linguists. If that is not desirable, the option safe prevents these redefinitions. The long forms then have to be used; for example, the command \textroundcap instead of \|c. The following lines show a few more complicated examples with the output in Computer Modern Roman, Sans, and Typewriter.

[Example 7-8-11]

If loaded with the option tone, TIPA provides a \tone command to produce “tone letters”. The command takes one argument consisting of a string of numbers denoting pitch levels, 1 being the lowest and 5 the highest. Within this range, any combination is allowed and there is no limit on the length of the combination, as exemplified in the last line of the next example, which otherwise shows the usage of \tone to display the four tones of Chinese.
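
A sketch of the \tone command, assuming the conventional pitch contours 55, 35, 214, and 51 for the four Mandarin tones; the syllable and the long digit string are illustrative only.

```latex
\documentclass{article}
\usepackage[tone]{tipa}
\begin{document}
% Four tones, each given as a string of pitch levels (1 = lowest, 5 = highest).
\tone{55}ma, \tone{35}ma, \tone{214}ma, \tone{51}ma

% Longer combinations are accepted as well.
\tone{15253545}
\end{document}
```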

[Example 7-8-12]

The above examples merely scrape the surface of the possibilities offered by TIPA. To explore it in detail consult the tipaman manual, which is part of the TIPA distribution.

7.8.7. Typesetting the euro symbol (€)

On January 1, 2002, the euro (€) became the official currency in 12 countries of the European Union (more exactly, bank notes and coins were introduced on that day). A long time before that event, the European Commission had a logo designed, to be used whenever one refers to the new European currency. The Commission now also encourages the use of symbols that are adjusted to the current font of a document. Meanwhile, most foundries have integrated specially designed euro symbols into their fonts, but there are still many fonts without euro in use. For instance, the PostScript standard fonts, which are hard-wired in most existing laser printers, cannot be assumed to have euro symbols.

The official LaTeX command to access a euro symbol is \texteuro, which is part of the textcomp package. However, many fonts simply do not contain a euro glyph. In such a case textcomp attempts to fake the symbol by putting two slashes through an uppercase C (e.g., in Times Roman €).
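
A minimal sketch of \texteuro in two font families; which glyph appears (a real one or a faked one) depends on the fonts in use.

```latex
\documentclass{article}
\usepackage{textcomp}
\begin{document}
The ticket costs \texteuro\,12.

{\sffamily The ticket costs \texteuro\,12.}
\end{document}
```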

With popular fonts designed for use with TeX, the euro symbol is usually available but, unfortunately, the euro sign designed by Jörg Knappen for the European Computer Modern fonts (i.e., LaTeX’s default font families) is somewhat futuristic and considered acceptable by many people only in the sans serif family:

[Example 7-8-13]

The situation is somewhat better with the Computer Modern Bright families. Although produced using the designs of the European Computer Modern fonts, the euro symbol comes out nicely, as nearly all serifs are dropped in these families.

[Example 7-8-14]

But what should be done if the fonts used in the document do not contain the symbol? In that case the solution is to use either separate symbol fonts that provide a generic euro symbol (with a neutral design, that can be combined with many font families) or symbol fonts specially designed to be used with certain text font families. In any event the symbol should be available in several weight (and width) series and sizes so that it can be effectively used in different typesetting situations (e.g., in a heading like the one of the current section).

eurosym—euros for LaTeX

The first set of fonts providing generic euro symbols for use with TeX were probably the EuroSym fonts designed by Henrik Theiling. They are available as METAFONT sources as well as PostScript Type 1 outlines and contain the euro symbol designed according to the official construction method. As a nice feature, the fonts contain a picture of the construction method in slot zero. So for those who always wanted to know how the symbol should be designed, the following example is illuminating:

[Example 7-8-15]

Regular euros

The eurosym package, which is used to access these fonts, defines the command \euro. By default, this command generates the official symbol, which varies with the series and shape attributes of the current document font. See Table 7.23 on the next page for the set of possibilities.

Table 7.23. Classification of the EuroSym font family

[Example 7-8-16]

Poor man’s euros

As an alternative, the package offers commands to construct a euro symbol from the letter “C” in the current font by combining it with horizontal bars (which exist in three widths). The next example shows that the results range from unacceptable to more or less adequate, depending on the shape of the “C” and the chosen bar width. In any case a properly defined euro symbol for a font is preferable and should be used if available.

[Example 7-8-17]

With the package options gen, gennarrow, and genwide, one can change the \euro command so that it points to \geneuro, \geneuronarrow, or \geneurowide, respectively. In all cases you can access the official euro symbol using the command \officialeuro.

Finally, the package offers the convenient command \EUR to typeset an amount of money together with the euro symbol separated by a small space. (Some other packages use this command name to denote the euro symbol itself, an unfortunate inconsistency.) As different countries have different conventions about where to place the currency sign, the package recognizes the options left (default) and right.
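
The eurosym commands and options discussed above can be sketched as follows; the amount is arbitrary.

```latex
\documentclass{article}
\usepackage[right]{eurosym} % place the symbol after the amount
\begin{document}
The book costs \EUR{29.95}.

The symbol on its own: \euro, and in bold: \textbf{\euro}.

% Always the official design, whatever package options are in force.
Official symbol: \officialeuro
\end{document}
```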

[Example 7-8-18]

Another way to format monetary amounts is provided by the euro package, which is documented on page 96.

The Adobe euro fonts

Adobe also offers a set of Type 1 fonts that contain the euro symbol. This font set contains serifed, sans serif (with a design close to the official logo), and typewriter variants. All are available in upright and italic shapes and in normal and bold weights. To exploit these fonts, one needs a PostScript printer or, more generally, a printer that can render such fonts (e.g., with the help of the ghostscript program).

While the fonts can be freely used for printing purposes, Adobe does not allow them to be generally distributed or included in a TeX distribution. For this reason you have to manually download them from the Adobe web site: ftp://ftp.adobe.com/pub/adobe/type/win/all/eurofont.exe. This is a self-extracting archive for Windows. On Unix platforms the fonts can be extracted from it using the program unzip.

After downloading the fonts, one has to rename them to conform to Karl Berry’s font naming conventions [19] and, if necessary, get support files for LaTeX, such as .fd files, a mapping file for dvips, and a package to make them accessible in documents. Depending on the TeX installation (e.g., the TeXlive CD), these files might be already available. Otherwise, they can be downloaded from CTAN:fonts/euro.

eurosans—One way of getting euros from Adobe

Several LaTeX packages are available that provide access to the Adobe euro fonts, each using a different strategy. As its name indicates, the eurosans package developed by Walter Schmidt provides access only to Adobe’s EuroSans fonts (see Table 7.24 on the next page). The reason is that the serifed variants seldom fit the body fonts of documents, while the more neutral sans serif designs blend well with most typefaces, except for typewriter fonts. As the EuroMono typefaces from Adobe are actually condensed versions of EuroSans, they have been integrated as a condensed series (NFSS classifications mc, bc, and sbc) by the package. Weight (medium or boldface), shape (upright or oblique), and width (regular or condensed) vary according to surrounding conditions in the document.

Table 7.24. Classification of the Adobe euro font families (eurosans classification)

An important aspect of this package (and one absent from other packages) is the ability to scale the fonts by a factor, using the option scaled. By default, it scales the fonts down to 95% of their nominal size. If a different scale factor is needed to match the size of the document font, an explicit value can be provided, as seen in the next example.
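
A sketch of the scaled option with an explicit value. It assumes that the Adobe euro fonts and their support files are installed; the factor 0.97 is an illustrative choice that would need tuning against the actual document font.

```latex
\documentclass{article}
\usepackage[scaled=0.97]{eurosans} % 0.97 is an illustrative factor
\begin{document}
Price: \euro\,100 \quad \textbf{Price: \euro\,100}
\end{document}
```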

[Example 7-8-19]

Restricting variance

The number of produced variations can be reduced (for example, varying the font series but always using normal shape) through a redefinition of the \euro command.

[Example 7-8-20]

If there is no requirement for a serifed euro symbol, the eurosans package is usually preferable to other solutions, as it provides the most comprehensive set of font series and supports scaling of the fonts. The package documentation also describes how to install the fonts and the support files if necessary.

europs—Another way of getting euros from Adobe

A different approach was taken in the europs package developed by Jörn Clausen. It provides the command \EUR to access the symbols from the Adobe euro fonts. This command selects a different symbol depending on the font attributes of the surrounding text, as can be seen in the next example.

[Example 7-8-21]

As this switch of shapes may not be desirable (e.g., the serifed euro may not blend well with the serifed document font), the package also offers the commands \EURtm (serifed symbol), \EURhv (sans serif symbol), and \EURcr (monospaced symbol); the names are modeled after the three PostScript fonts Times, Helvetica, and Courier. These commands fix the font family, but react to requests for bold or oblique variants. However, as the last line in the previous example shows, none of the symbols blends particularly well with these fonts. Finally, the package offers \EURofc, which generates the official euro symbol (i.e., one from the sans serif regular font).
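
The europs commands named above can be tried side by side with a sketch like the following, which again assumes that the Adobe euro fonts and support files are installed.

```latex
\documentclass{article}
\usepackage{europs}
\begin{document}
% \EUR follows the surrounding font attributes.
Roman \EUR \quad \textsf{sans \EUR} \quad \texttt{mono \EUR}

% Fixed families that still react to bold and oblique requests.
\EURtm\ \EURhv\ \EURcr \quad \textbf{\EURtm\ \EURhv\ \EURcr}

% The official symbol.
\EURofc
\end{document}
```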

marvosym—Revisited for cash

Another free PostScript font that contains euro symbols as glyphs is the MarVoSym font, described in Section 7.8.3 on page 401. It is available in three shapes to blend with Times, Helvetica, and Courier. As this font is a Pi font, it comes in only one weight series, which somewhat limits its usefulness as a source for the euro symbol. The font contains two glyphs with the official euro design, which differ in their amounts of side-bearings. To better demonstrate this difference, the following example puts a frame around them. It also shows the other currency symbols available in this package.

[Example 7-8-22]

7.9. The low-level interface

While the high-level font commands are intended for use in a document, the low-level commands are mainly for defining new commands in packages or in the preamble of a document; see also Section 7.9.4. To make the best use of such font commands, it is helpful to understand the internal organization of fonts in LaTeX’s font selection scheme (NFSS).

One goal of LaTeX’s font selection scheme is to allow rational font selection, with algorithms guided by the principles of generic markup. For this purpose, it would be desirable to allow independent changes for as many font attributes as possible. On the other hand, font families in real life normally contain only a subset of the myriad imaginable font attribute combinations. Therefore, allowing independent changes in too many attributes results in too many combinations for which no real (external) font is available and a default has to be substituted.

LaTeX internally keeps track of five independent font attributes: the “current encoding”, the “current family”, the “current series”, the “current shape”, and the “current size”. The encoding attribute was introduced in NFSS release 2 after it became clear that real support of multiple languages would be possible only by maintaining the character-encoding scheme independently of the other font attributes.

The values of these attributes determine the font currently in use. LaTeX also maintains a large set of tables used to associate attribute combinations with external fonts (i.e., .tfm files that contain the information necessary for (La)TeX to do its job). Font selection inside LaTeX is then done in two steps:

1. A number of font attributes are changed using the low-level commands \fontencoding, \fontfamily, \fontseries, \fontshape, and \fontsize.

2. The font corresponding to this new attribute setting is selected by calling the \selectfont command.

The second step comprises several actions. LaTeX first checks whether the font corresponding to the desired attribute settings is known to the system (i.e., the .tfm file is already loaded) and, if so, this font is selected. If not, the internal tables are searched to find the external font name associated with this setting. If such a font name can be found, the corresponding .tfm file is read into memory and afterwards the font is selected for typesetting. If this process is not successful, LaTeX tries to find an alternative font, as explained in Section 7.9.3.
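
The two-step procedure can be sketched as follows. The attribute values are chosen for illustration; they name Computer Modern Sans bold extended in the OT1 encoding, a combination for which an external font exists in the standard set-up.

```latex
\documentclass{article}
\begin{document}
{% Step 1: change attributes; no font is selected yet.
 \fontencoding{OT1}\fontfamily{cmss}\fontseries{bx}\fontshape{n}%
 \fontsize{12}{14}%
 % Step 2: look up and activate the font for this combination.
 \selectfont
 Computer Modern Sans, bold extended, 12pt on a 14pt baseline skip.}
\end{document}
```

The surrounding braces keep the change local, so the text after the group returns to the document font.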

7.9.1. Setting individual font attributes

Every font attribute has one command to change its current value. All of these commands will accept more or less any character string as an argument, but only a few values make sense. These values are not hard-wired into LaTeX’s font selection scheme, but rather are conventions set up in the internal tables. The following sections introduce the naming conventions used in the standard set-up of LaTeX, but anyone can change this set-up by adding new font declarations to the internal tables. Obviously, anybody setting up new fonts for use with LaTeX should try to obey these conventions whenever possible, as only a consistent naming convention can guarantee that appropriate fonts are selected in a generically marked-up document.

If you want to select a specific font using this interface—say, Computer Modern Dunhill bold condensed italic 14pt—a knowledge of the interface conventions alone is not enough, as no external font exists for every combination of attribute values. You could try your luck by specifying something like the following set of commands:

Image

This code would be correct according to the naming conventions, as we will see in the following sections. Because this attribute combination does not correspond to a real font, however, LaTeX would have to substitute a different font. The substitution mechanism may choose a font that is quite different from the one desired, so you should consult the font tables (.fd files) to see whether the desired combination is available. Section 7.9.3 provides more details on the substitution process.
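
A request of the kind discussed above might be written as in the following sketch. The family name cmdh is the one used for Computer Modern Dunhill in the standard OT1 font definitions; the baseline skip of 16pt is an assumption made for this illustration. Because no such font exists, LaTeX should report a font substitution in the transcript file.

```latex
\documentclass{article}
\begin{document}
{% Valid according to the naming conventions, but not a real font:
 % Dunhill (cmdh), bold condensed (bc), italic (it), 14pt.
 \fontencoding{OT1}\fontfamily{cmdh}\fontseries{bc}\fontshape{it}%
 \fontsize{14}{16pt}\selectfont
 Some text, set in whatever font LaTeX substitutes.}
\end{document}
```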

Choosing the font family

The font family is selected with the command \fontfamily. Its argument is a character string that refers to a font family declared in the internal tables. The character string was defined when these tables were set up and is usually a short letter sequence, for example, cmr for the Computer Modern Roman family. The family names should not be longer than five letters, because they will be combined with possibly three more letters to form a file name, which on some systems can have at most eight letters.

Choosing the font series

The series attribute is changed with the \fontseries command. The series combines a weight and a width in its argument; in other words, it is not possible to change the width of the current font independently of its weight. This arrangement was chosen because it is hardly ever necessary to change weight or width individually. On the contrary, a change in weight (say, to bold) often is accompanied by a change in width (say, to extended) in the designer’s specification. This is not too surprising, given that weight changes alter the horizontal appearance of the letters and thus call for adjustment in the expansion (i.e., the width) to produce a well-balanced look.

In the naming conventions for the argument for the \fontseries command, the names for both the weight and the width are abbreviated so that each combination is unique. The conventions are shown in Table 7.25. These classifications are combined in the argument to \fontseries; however, any instance of m (standing for medium in weight or width) is dropped, except when both weight and width are medium. The latter case is abbreviated with a single m. For example, bold expanded would be bx, whereas medium expanded would be x and bold medium would be b.

Table 7.25. Weight and width classification of fonts

Choosing the font shape

The \fontshape command is used to change the shape attribute. For the standard shapes, one- and two-letter abbreviations are used; these are shown in Table 7.26 on the facing page together with an example of the resulting shape in the Computer Modern Roman family. (The ol shape was produced using \pscharpath commands from the pst-char package, as Computer Modern does not contain such a shape. These types of graphical manipulations are discussed in [57].)

Table 7.26. Shape classification of fonts

Choosing the font size

The font size is changed with the \fontsize{size}{skip} command. This is the only font attribute command that takes two arguments: the size to switch to and the baseline skip (the distance from baseline to baseline for this size). Font sizes are normally measured in points, so by convention the unit is omitted. The same is true for the second argument. However, if the baseline skip should be a rubber length (that is, if it contains plus or minus), you have to specify a unit. Thus, a valid size change could be requested by

Image

Even if such a request is valid in principle, no corresponding external font may exist in this size. In this case, LaTeX will try to find a nearby size if its internal tables allow for size correction or report an error otherwise.
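
The two forms of a size request described above can be sketched like this; the concrete numbers are illustrative.

```latex
\documentclass{article}
\begin{document}
{% Units omitted: both values are taken to be points.
 \fontsize{14.4}{17}\selectfont
 Text in 14.4pt on a 17pt baseline skip.\par}

{% A rubber baseline skip requires explicit units.
 \fontsize{14.4}{17pt plus 1pt minus 1pt}\selectfont
 Text with a stretchable baseline skip.\par}
\end{document}
```

The \par inside each group matters: the baseline skip in force at the end of the paragraph is the one TeX uses for the whole paragraph.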

If you use fonts existing in arbitrary sizes (for example, PostScript fonts), you can, of course, select any size you want. For example,

Image

will produce a birthday poster line with letters in a one-inch size. However, there is one problem with using arbitrary sizes: if LaTeX has to typeset a formula in this size (which might happen behind the scenes without your knowledge), it needs to set up all fonts used in formulas for the new size. For an arbitrary size, it usually has to calculate the font sizes for use in subscripts and sub-subscripts (at least 12 different fonts). In turn, it probably has to load a lot of new fonts, something you can tell by looking at the transcript file. For this reason you may finally hit some internal limit if you have too many different size requests in your document. If this happens, you should tell LaTeX which sizes to load for formulas using the \DeclareMathSizes declaration, rather than letting it use its own algorithm. See Section 7.10.7 for more information on this issue.
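
As a sketch of the remedy just mentioned, the following preamble declares the formula sizes for one large text size explicitly. It assumes scalable fonts (here the mathptmx package from PSNFSS) and uses 72pt, roughly one inch, so that the declared size matches the requested size exactly; the script sizes 50 and 36 are illustrative choices.

```latex
\documentclass{article}
\usepackage{mathptmx} % scalable text and math fonts (assumed to be installed)
% For text size 72pt use 72pt in formulas, 50pt in subscripts,
% and 36pt in sub-subscripts.
\DeclareMathSizes{72}{72}{50}{36}
\begin{document}
\fontsize{72}{86}\selectfont Happy Birthday
\end{document}
```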

Choosing the encoding

A change of encoding is performed with the command \fontencoding, where the argument is the internal name for the desired encoding. This name must be known to LaTeX, either as one of the predefined encodings (loaded by the kernel) or as declared with the \DeclareFontEncoding command (see Section 7.10.5). A set of standard encoding names is given in Table 7.27.

Table 7.27. Standard font encodings used with LaTeX

LaTeX’s font selection scheme is based on the (idealistic) assumption that most (or, even better, all) fonts for text are available in the same encoding as long as they are used to typeset in the same language. In other words, encoding changes should become necessary only if one is switching from one language to another. In that case it is normally the task of the language support packages (e.g., those from the babel system) to arrange matters behind the scenes.

In the following example we change the encoding manually by defining an environment Cyr for typesetting in Cyrillic. In this environment both the font encoding and the input encoding are locally changed. That might sound strange but if you work with an editor or keyboard that can switch input encodings on the fly this might be exactly the way your text is stored. Of course, for proper language support, additional work would be necessary, such as changing the hyphenation rules. The encodings are declared to LaTeX by loading them with the fontenc package. T2A specifies one of the standard Cyrillic encodings; by loading T1 last, it becomes the default encoding for the document.
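
An environment of the kind described above might be set up as in the following sketch. The input encodings koi8-r and latin1 are assumptions chosen for illustration, and suitable Cyrillic fonts for T2A must be installed.

```latex
\documentclass{article}
\usepackage[T2A,T1]{fontenc}         % T1, loaded last, becomes the default
\usepackage[koi8-r,latin1]{inputenc} % latin1, listed last, is active initially
% Both changes stay local to the environment.
\newenvironment{Cyr}
  {\fontencoding{T2A}\selectfont \inputencoding{koi8-r}}
  {}
\begin{document}
Latin text in the T1 encoding.
\begin{Cyr}
  % KOI8-R encoded Cyrillic text goes here.
\end{Cyr}
\end{document}
```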

[Example 7-9-1]

Potential T1 encoding problems

Unfortunately, T1 is not fully implementable for most PostScript fonts. The following five characters are likely to show up as blobs of ink (indicating a missing glyph in the font). Note that the per thousand and per ten thousand symbols are actually formed by joining a percent sign and one or two additional small zeros; only the latter glyph is missing.

[Example 7-9-2]

As explained in Section 7.5.4 on page 362, the situation for TS1 is even worse, as sometimes half the glyphs from that encoding are not available in a given PostScript font.

7.9.2. Setting several font attributes

When designing page styles (see Section 4.4) or layout-oriented commands, you often want to select a particular font; that is, you need to specify values for all attributes. For this task LaTeX provides the command \usefont, which takes four arguments: the encoding, family, series, and shape. The command updates those attributes and then calls \selectfont. If you also want to specify the size and baseline skip, place a \fontsize command in front of it. For example,

Image

would produce the same result as the hypothetical example on page 413.
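
A sketch of such a combined request, using the same hypothetical Dunhill attribute values as earlier in this section (the size and baseline skip are illustrative):

```latex
\documentclass{article}
\begin{document}
{% \fontsize only records the size; \usefont sets the remaining
 % four attributes and then calls \selectfont itself.
 \fontsize{14}{16pt}\usefont{OT1}{cmdh}{bc}{it}
 Some text in the selected (or substituted) font.}
\end{document}
```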

Besides \usefont, LaTeX provides the \DeclareFixedFont declaration, which can be used to define new commands that switch to a completely fixed font. Such commands are extremely fast because they do not have to look up any internal tables. They are therefore very useful in command definitions that have to switch back and forth between fixed fonts. For example, for the doc package (see Chapter 14), one could produce code-line numbers using the following definitions:

Image

As you can see from the example, \DeclareFixedFont has six arguments: the name of the command to be defined followed by the five font attributes in the NFSS classification. Instead of supplying fixed values (except for the size), the built-in hooks that describe the main document font are used (see also Section 7.3.5). Thus, in the example above \CodelineFont still depends on the overall layout for the document (via the settings of \encodingdefault and other parameters). However, once the definition is carried out, its meaning is frozen, so later changes to the defaults will have no effect.
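
The definitions discussed above might look like the following sketch. The size of 7pt is an assumption, and the counter is declared here only so that the sketch runs without the doc package, which normally provides it.

```latex
\documentclass{article}
% Frozen at this point: later changes to the defaults have no effect.
\DeclareFixedFont\CodelineFont
  {\encodingdefault}{\familydefault}{\seriesdefault}{\shapedefault}{7pt}
% Stand-in for the counter that the doc package would supply.
\newcounter{CodelineNo}
\renewcommand\theCodelineNo{\CodelineFont\arabic{CodelineNo}}
\begin{document}
\stepcounter{CodelineNo}\theCodelineNo\quad first line of code
\end{document}
```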

7.9.3. Automatic substitution of fonts

Whenever a font change request cannot be carried out because the combination is not known to LaTeX, it tries to recover by using a font with similar attributes. Here is what happens: if the combination of encoding scheme, family, series, and shape is not declared (see Section 7.10.3), LaTeX tries to find a known combination by first changing the shape attribute to a default. If the resulting combination is still unknown, it tries changing the series to a default. As a last resort, it changes the family to a default value. Finally, the internal table entry is looked up to find the requested size. For example, if you ask for \ttfamily\bfseries\itshape (a typewriter font in a bold series and italic shape, which usually does not exist), then you will get a typewriter font in medium series and upright shape, because LaTeX first resets the shape before changing the series. If, in such a situation, you prefer a typewriter font in medium series with italic shape, you have to announce your intention to LaTeX using the sub function, which is explained on page 425.
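
An explicit substitution of the kind mentioned above could be declared as in this sketch. It assumes the OT1 encoding with cmtt as the typewriter family and bx as the series selected by \bfseries, which is the traditional default set-up; other set-ups need different attribute values.

```latex
\documentclass{article}
% Map the missing bold extended italic typewriter shape
% to medium italic instead of medium upright.
\DeclareFontShape{OT1}{cmtt}{bx}{it}{<-> sub * cmtt/m/it}{}
\begin{document}
{\ttfamily\bfseries\itshape typewriter text}
\end{document}
```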

The substitution process never changes the encoding scheme, because any alteration could produce wrong characters in the output. Recall that the encoding scheme defines how to interpret the input characters, while the other attributes define how the output should look. It would be catastrophic if, say, a £ sign were changed into a $ sign on an invoice just because the software tried to be clever.

Thus, every encoding scheme must have a default family, series, and shape, and at least the combination consisting of the encoding scheme together with the corresponding defaults must have a definition inside LaTeX, as explained in Section 7.10.5.

7.9.4. Using low-level commands in the document

The low-level font commands described in the preceding sections are intended to be used in the definition of higher-level commands, either in class or package files or in the document preamble.

Whenever possible, you should avoid using the low-level commands directly in a document if you can use high-level font commands like \textsf instead. The reason is that the low-level commands are very precise instructions to switch to a particular font, whereas the high-level commands can be customized using packages or declarations in the preamble. Suppose, for example, that you have selected Computer Modern Sans in your document using \fontfamily{cmss}\selectfont. If you later decide to typeset the whole document with fonts from the PSNFSS bundle (say, Times), applying a package would change only those parts of the document that do not contain explicit \fontfamily commands.
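The difference can be seen in a small test document such as the following sketch. The choice of the helvet package as the later font change is an assumption made for illustration (it redefines the default sans serif family); any package that changes the sans serif default would show the same effect.

```latex
\documentclass{article}
% Uncomment the next line to switch the default sans serif family.
% \usepackage{helvet}
\begin{document}
% High-level command: follows whatever sans serif family is the default.
\textsf{Set with the high-level command.}

% Low-level commands: always Computer Modern Sans, regardless of packages.
{\fontfamily{cmss}\selectfont Set with low-level commands.}
\end{document}
```

With the package line active, only the first paragraph changes its typeface; the second stays in Computer Modern Sans.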

7.10. Setting up new fonts

7.10.1. Overview

Setting up new fonts for use with LaTeX basically means filling the internal font selection tables with information necessary for later associating a font request in a document with the external .tfm file containing character information used by (La)TeX. Thus the tables are responsible for associating with

Image

the external file cmdunh10.tfm. To add new fonts, you need to reverse this process. For every new external font you have to ask yourself five questions:

1. What is the font’s encoding scheme—that is, which characters are in which positions?

2. What is its family name?

3. What is its series (weight and width)?

4. What is its shape?

5. What is its size?

The answers to these questions will provide the information necessary to classify your external font according to the LaTeX conventions, as described in Section 7.9. The next few sections discuss how to enter new fonts into the NFSS tables so that they can be used in the main text. You normally need this information if you want to make use of new fonts—for example, if you want to write a short package file for accessing a new font family. Later sections discuss more complicated concepts that come into play if you want to use, for example, special fonts for math instead of the standard ones.
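To make the five questions concrete, the following sketch shows the kind of low-level font request that the tables would resolve to cmdunh10.tfm. The answers used here (encoding OT1, family cmdh, series m, shape n, size 10pt) are assumptions based on the Computer Modern Dunhill example used in this section.

```latex
% 1. encoding: OT1   2. family: cmdh   3. series: m (medium)
% 4. shape: n (upright)   5. size: 10pt (on a 12pt baseline)
\fontencoding{OT1}\fontfamily{cmdh}\fontseries{m}\fontshape{n}%
\fontsize{10}{12pt}\selectfont
```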

If new fonts from the non-TeX world are to be integrated into LaTeX, it might be necessary to start even one step earlier: you may have to generate .tfm and probably virtual font files first. The tool normally used for this step is the fontinst program, written by Alan Jeffrey and further developed and now maintained by Lars Hellström. It is described in [57] and [64] and in the source documentation [74,75].

7.10.2. Naming those thousands of fonts

A font naming scheme that can be used with TeX was proposed by Karl Berry [18], provoking some discussion [118]. The current version is described in [19] and has become the de facto standard in the TeX world. Berry tries to classify all font file names using eight alphanumeric characters, where case is not significant. This eight-character limit guarantees that the same file names can be used across all computer platforms and, more importantly, conforms to the ISO 9660 norm for CD-ROM. The principle of the scheme is described in Table 7.28, where the parts in brackets are omitted if they correspond to a default. For example, a design size is given only if the font is not linearly scaled. Table 7.8 on page 372 shows the classification of the 35 “basic” PostScript fonts according to LaTeX’s font interface. For each font the full Adobe name and, in parentheses, the corresponding short (Karl Berry) file name is given (without the encoding part). For OT1, T1, or TS1 one would need to append 7t, 8t, or 8c, respectively, to obtain the full file name—for example, putr8t for Utopia Regular in T1 encoding.

Image

Table 7.28. Karl Berry’s font file name classification scheme

The naming convention covers internal TeX names for fonts (i.e., those used in DeclareFontShape declarations as described in the next section), names for virtual fonts and their components (e.g., particular reencodings of physical fonts) [91], and the names of physical fonts. In case of PostScript fonts, the physical font names are often different from those used internally by TeX.

A glimpse of the underworld

In the latter case the mapping between internal font names and the external world has to happen when the result of a LaTeX run is viewed or printed. For example, the PostScript driver dvips uses mapping files (default extension .map) that contain lines such as

Image

telling it that the font putr8r can be obtained from the external font putr8a.pfb by reencoding it via a special encoding vector (8r.enc in this case). However, when you look into t1put.fd (the file that contains the DeclareFontShape declarations for the Utopia family in the T1 encoding), you will find that putr8r is not referenced. Instead, you will find names such as putr8t. The reason is that putr8t is a virtual font (built with the help of fontinst [74, 75]) that references putr8r. The latter link is difficult to find (other than through the naming convention itself) if you do not have access to the sources that were used to build the virtual fonts actually used by TeX. Fortunately, you seldom have to dig into that part of a TeX system; if you do, you will find more information in [57, Chapter 10] or in the references listed above.

7.10.3. Declaring new font families and font shape groups

Each family/encoding combination must be made known to LaTeX through the command DeclareFontFamily. This command has three arguments. The first two arguments are the encoding scheme and the family name. The third is usually empty, but it may contain special options for font loading and is explained on page 426. Thus, if you want to introduce a new family—say, Computer Modern Dunhill with the old TeX encoding scheme—you would write

Image
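A declaration matching this description would look roughly as follows; the family name cmdh is an assumption for Computer Modern Dunhill, and OT1 is the NFSS name of the old TeX text encoding.

```latex
% Arguments: encoding scheme, family name, font-loading options (empty here).
\DeclareFontFamily{OT1}{cmdh}{}
```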

A font family normally consists of many individual fonts. Instead of announcing each family member individually to LaTeX, you have to combine fonts that differ only in size and declare them as a group.

Such a group is entered into the internal tables of LaTeX with the command DeclareFontShape, which takes six arguments. The first four are the encoding scheme, the family name, the series name, and the shape name under which you want to access these fonts later on. The fifth argument is a list of sizes and external font names, given in a special format that we discuss below. The sixth argument is usually empty; its use is explained on page 426.

We will first show a few examples and introduce terminology; then we will discuss all the features in detail.

As an example, an NFSS table entry for Computer Modern Dunhill medium (series) upright (shape) in the encoding scheme “TeX text” could be entered as

Image

assuming that only one external font for the size 10pt is available. If you also have this font available at 12pt (scaled from 10pt), the declaration would be

Image
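The two declarations described above can be sketched as follows. The family name cmdh and the external font name cmdunh10 are taken from the surrounding discussion; the exact spacing inside the fifth argument is a matter of taste. Only one of the two would be used in practice, because the second replaces the first.

```latex
% Only the 10pt size is available:
\DeclareFontShape{OT1}{cmdh}{m}{n}{ <10> cmdunh10 }{}

% 10pt and 12pt are available, the latter scaled from cmdunh10:
\DeclareFontShape{OT1}{cmdh}{m}{n}{ <10> <12> cmdunh10 }{}
```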

If the external font is available in all possible sizes, the declaration becomes very simple. This is the case for Type 1 PostScript (outline) fonts, or when the driver program is able to generate fonts on demand by calling METAFONT.

For example, Times Roman bold (series) upright (shape) in the LaTeX T1 encoding scheme could be entered as

Image

This example declares a size range with two open ends (no sizes specified to the left and the right of the -). As a result, the same external .tfm file (ptmb8t) is used for all sizes and is scaled to the desired size. If you have more than one .tfm file for a font—say, emtt10 for text sizes and emtt12 for display sizes (this is European Modern Typewriter)—the declaration could be

Image

In this case, the .tfm file emtt10 would be used for sizes smaller than 12pt, and emtt12 for all sizes larger than or equal to 12pt.
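Both cases can be sketched as follows. The family names ptm and emtt and the encoding chosen for the European Modern Typewriter example are assumptions; only the external font names ptmb8t, emtt10, and emtt12 come from the text.

```latex
% One .tfm file, scaled to any requested size (open-ended size range):
\DeclareFontShape{T1}{ptm}{b}{n}{ <-> ptmb8t }{}

% Two .tfm files: emtt10 below 12pt, emtt12 from 12pt upward.
% (Encoding and family name are hypothetical.)
\DeclareFontShape{OT1}{emtt}{m}{n}{ <-12> emtt10 <12-> emtt12 }{}
```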

The preceding examples show that the fifth argument of the command \DeclareFontShape consists of size specifications surrounded by angle brackets (i.e., <...>) intermixed with loading information for the individual sizes (e.g., font names). The part inside the angle brackets is called the “size info” and the part following the closing angle bracket is called the “font info”. The font info is further structured into a “size function” (often empty) and its arguments; we discuss this case below. Within the arguments of \DeclareFontShape, blanks are ignored to help make the entries more readable.1 In the unusual event that a real space has to be entered, you can use the command \space.

1 This is true only if the command is used at the top level. If such a declaration is used inside other constructs (e.g., the argument of AtBeginDocument), blanks might survive and in that case entries will not be recognized.

Simple sizes and size ranges

The size infos—the parts between the angle brackets in the fifth argument to DeclareFontShape—can be divided into “simple sizes” and “size ranges”. A simple size is given by a single (decimal) number, like <10> or <14.4>, and in principle can have any positive value. However, because the number represents a font size measured in points, you probably will not find values less than 4 or greater than 120. A size range is given by two simple sizes separated by a hyphen, to indicate a range of font sizes that share the same font info. The lower boundary (i.e., the size to the left of the hyphen) is included in the range, while the upper boundary is excluded. For example, <5-10> denotes sizes greater than or equal to 5pt and less than 10pt. You can omit the number on either side of the hyphen in a size range, with the obvious interpretation: <-> stands for all possible sizes, <-10> stands for all sizes less than 10pt, and <12-> stands for all sizes greater than or equal to 12pt.

Often several simple sizes have the same font info. In that case a convenient shorthand is to omit all but the last font infos:

Image

This example declares the font Pandora medium Roman as being available in several sizes, all of them produced by scaling from the same design size.
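A declaration using this shorthand might look like the following sketch. The family name panr, the encoding, and the particular list of sizes are assumptions; the external font name panr10 is the one used for Pandora Roman elsewhere in this section.

```latex
% All listed sizes share the font info that follows the last size,
% so each size is produced by scaling panr10.
\DeclareFontShape{OT1}{panr}{m}{n}{
   <5> <6> <7> <8> <9> <10> <10.95> <12>
   <14.4> <17.28> <20.74> <24.88> panr10 }{}
```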

Size functions

As noted earlier, the font info (the string after the closing angle bracket) is further structured into a size function and its argument. If an * appears in the font info string, everything to the left of it forms the function name and everything to the right is the argument. If there is no asterisk, as in all of the examples so far, the whole string is regarded as the argument and the function name is “empty”.

Based on the size requested by the user and the information in the DeclareFontShape command, size functions produce the specification necessary for LaTeX to find the external font and load it at the desired size. They are also responsible for informing the user about anything special that happens. For example, some functions differ only in terms of whether they issue a warning. This capability allows the system maintainer to set up LaTeX in the way best suited for the particular site.

The name of a size function consists of zero or more letters. Some of the size functions can take two arguments, one optional and one mandatory. Such an optional argument has to be enclosed in square brackets. For example, the specification

Image

would select, for all possible sizes (we have the range 0 to ∞), the size function s with the optional argument 0.9 and the mandatory argument cmfib8.
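Written out in full, such a specification could appear in a declaration like the sketch below. The encoding and the family name cmfib are assumptions; the size function s, the optional argument 0.9, and the font name cmfib8 are those named in the text.

```latex
% <->      size info: all sizes
% s        size function name (left of the asterisk)
% [0.9]    optional argument
% cmfib8   mandatory argument (external font name)
\DeclareFontShape{OT1}{cmfib}{m}{n}{ <-> s * [0.9] cmfib8 }{}
```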

The size specifications in DeclareFontShape are inspected in the order in which they are given. When a size info matches the requested user size, the corresponding size function is executed. If this process yields a valid font, no further entries are inspected. Otherwise, the search continues with the next entry. The standard size functions are listed below. The document fntguide.tex [109], which is part of the LaTeX distribution, describes how to define additional functions should it ever become necessary.

The “empty” function

Because the empty function is used most often, it has the shortest possible name. (Every table entry takes up a small bit of internal memory, so the syntax chosen tries to find a balance between a perfect user interface and compactness of storage.) The empty function loads the font info exactly at the requested size if it is a simple size. If there is a size range and the size requested by the user falls within that range, it loads the font exactly at the user size.

For example, if the user requested 14.4, then the specification

Image

would load the .tfm file called panr10.tfm at 14.4pt. Because this font was designed for 10pt (it is the Pandora Roman font at 10pt), all the values in the .tfm file are scaled by a factor of 1.44.

Sometimes one wants to load a font at a slightly larger or smaller size than the one requested by the user. This adjustment may be necessary when fonts from one family appear to be too large compared to fonts from other families used in the same document. For this purpose the empty size function allows an optional argument to represent a scale factor that, if present, is multiplied by the requested size to yield the actual size to be loaded. Thus

Image

would always load the .tfm file called phvr8t.tfm (Helvetica in T1 encoding) at 95% of the requested size. If the optional argument is used, the empty size function will issue a warning to alert the user that the font is not being loaded at its intended size.
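The two uses of the empty function described above can be sketched as follows. The family names panr and phv and the encodings are assumptions; the external font names and the factor 0.95 come from the text.

```latex
% Empty function, no optional argument: a request for 14.4pt loads
% panr10 at 14.4pt, i.e., scaled by 14.4/10 = 1.44.
\DeclareFontShape{OT1}{panr}{m}{n}{ <-> panr10 }{}

% Empty function with a scale factor: a request for 10pt loads
% phvr8t at 0.95 * 10pt = 9.5pt (and a warning is issued).
\DeclareFontShape{T1}{phv}{m}{n}{ <-> [0.95] phvr8t }{}
```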

The “s” function

The s function has the same functionality as the empty function, but does not produce warnings (the s means “silence”). Writing

Image

avoids all the messages that would be generated on the terminal if the empty function were used. Messages are still written to the transcript file, so you can find out which fonts were used if something goes wrong. The helvet package is implemented in this way, except that the scaling factor is not hard-wired but rather passed via a package option to the DeclareFontShape declaration.
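A silent counterpart of the scaled Helvetica declaration could be written as in this sketch; the family name phv and the hard-wired factor 0.95 are assumptions used for illustration.

```latex
% Same effect as the empty function with [0.95], but messages go
% only to the transcript file.
\DeclareFontShape{T1}{phv}{m}{n}{ <-> s * [0.95] phvr8t }{}
```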

The “gen” function

Often the external font names are built by appending the font size to a string that represents the typeface. For example, cmtt8, cmtt9, and cmtt10 are the external names for the fonts Computer Modern Typewriter at 8, 9, and 10pt, respectively. With font names organized according to such a scheme, you can make use of the gen function to shorten the entry. This function combines the font info and the requested size to generate (hence gen) the external font names. Thus, you can write

Image

as shorthand for

Image

thereby saving eight characters in the internal tables of NFSS. This function combines both parts literally, so you should not use it with decimal sizes like 14.4. Also, you must ensure that the digits in the external font name really represent the design size (for example, cmr17 is actually Computer Modern Roman at 17.28pt).
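The shorthand and its expansion can be sketched as follows, assuming the family is declared as OT1/cmtt. The long form spells out cmtt8, cmtt9, and cmtt10 (16 characters), whereas the short form needs only gen*cmtt (8 characters), which accounts for the saving of eight characters.

```latex
% Short form: the requested size is appended to the string "cmtt".
\DeclareFontShape{OT1}{cmtt}{m}{n}{ <8> <9> <10> gen * cmtt }{}

% Equivalent long form:
\DeclareFontShape{OT1}{cmtt}{m}{n}{ <8> cmtt8 <9> cmtt9 <10> cmtt10 }{}
```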

In all other respects, the gen function behaves like the empty function. That is, the optional argument, if given, represents a scale factor and, if used, generates an information message.

The “sgen” function

The sgen function is the silent variant of the gen function. It writes any message only to the transcript file.

The “genb” function

This size function is similar to gen, but is intended for fonts in which the size is encoded in the font name in centipoints, such as the EC fonts. As a consequence, a line such as

Image

acts as shorthand for

Image

An optional argument, if present, will have the same effect as it would with the empty function—it provides a scale factor and, if used, generates an information message.
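For the EC fonts, the shorthand and its expansion might look like the sketch below. The family name cmr and the selection of sizes are assumptions; the point is that a size such as 10.95 is turned into the four-digit centipoint suffix 1095.

```latex
% Short form: size in centipoints is appended to "ecrm".
\DeclareFontShape{T1}{cmr}{m}{n}{ <9> <10> <10.95> <12> genb * ecrm }{}

% Equivalent long form:
\DeclareFontShape{T1}{cmr}{m}{n}{
   <9> ecrm0900 <10> ecrm1000 <10.95> ecrm1095 <12> ecrm1200 }{}
```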

The “sgenb” function

The sgenb function is the silent variant of the genb function. It writes any message only to the transcript file.

The “sub” function

The sub function is used to substitute a different font shape group if no external font exists for the current font shape group. In this case the argument is not an external font name but rather a different family, series, and shape combination separated by slashes (the encoding will not change for the reasons explained earlier). For example, the Computer Modern Sans family has no italic shape, only a slanted shape. Thus, it makes sense to declare the slanted shape as a substitute for the italic one:

Image

Without this declaration, LaTeX’s automatic substitution mechanism (see Section 7.9.3) would substitute the default shape, Computer Modern Sans upright.
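The substitution described here can be sketched as follows, assuming the OT1 encoding and the usual shape names it (italic) and sl (slanted).

```latex
% Requests for OT1/cmss/m/it are redirected to OT1/cmss/m/sl.
% The argument is family/series/shape; the encoding stays the same.
\DeclareFontShape{OT1}{cmss}{m}{it}{ <-> sub * cmss/m/sl }{}
```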

Besides the substitution of complete font shape groups, there are other good uses for the sub function. Consider the following code:

Image

This declaration states that for sizes smaller than 8pt LaTeX should look in the font shape declaration for OT1/cmss/m/n. Such substitutions can be chained. People familiar with the standard font distribution know that there is no Computer Modern Sans font smaller than 8pt, so the substituted font shape group will probably contain another substitution entry. This may seem like a strange usage but it has the advantage that when such additional fonts become available you will need to change only one font shape group declaration—all declarations that refer indirectly to these fonts will then benefit automatically.
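A declaration of this kind might look like the following sketch. That the shape being declared is the slanted one, and the particular sizes and external font names listed after the substitution entry, are assumptions made for illustration.

```latex
% Sizes below 8pt: look up OT1/cmss/m/n instead.
% Sizes from 8pt upward: use the slanted sans serif fonts directly.
\DeclareFontShape{OT1}{cmss}{m}{sl}{
   <-8> sub * cmss/m/n
   <8> cmssi8 <9> cmssi9
   <10> <10.95> cmssi10
   <12> <14.4> cmssi12
   <17.28> <20.74> <24.88> cmssi17 }{}
```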

The “ssub” function

The ssub function has the same functionality as the sub function, but does not produce on-screen warnings (the first s means “silence”).

The “subf” function

The subf function is a cross between the empty function and sub, in that it loads fonts in the same way as the empty function but produces a warning that this operation was done as a substitution because the requested font shape is not available. You can use this function to substitute some external fonts without having to declare a separate font shape group for them, as in the case of the sub function. For example,

Image

would warn the user that the requested combination is not available and, therefore, that the font ptmb7t was loaded instead. As this is less informative than using the sub function, the latter should be preferred.
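Such a declaration could be sketched as follows. The encoding, the family name ptm, and the series name bx are assumptions; only the external font name ptmb7t comes from the text.

```latex
% The requested OT1/ptm/bx/n has no font of its own; ptmb7t is loaded
% at the requested size and a substitution warning is issued.
\DeclareFontShape{OT1}{ptm}{bx}{n}{ <-> subf * ptmb7t }{}
```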

The “ssubf” function

The silent variant of subf, this function writes its messages only to the transcript file.

The “fixed” function

This function disregards the requested size and instead loads the external font given as an argument. If present, the optional argument denotes the size (in points) at which the font will be loaded. Thus, this function allows you to specify size ranges for which one font in some fixed size will be loaded.

The “sfixed” function

The silent variant of fixed, this function is used, for example, to load the font containing the large math symbols, which is often available only in one size.
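The following sketch illustrates both variants. Using OMX/cmex with the external font cmex10 for the large math symbols is an assumption; only one of the two declarations would be used at a time.

```latex
% Whatever size is requested, load cmex10 at its design size, silently:
\DeclareFontShape{OMX}{cmex}{m}{n}{ <-> sfixed * cmex10 }{}

% Variant with the optional argument: always load the font at 10pt
% and report the fact on the terminal.
\DeclareFontShape{OMX}{cmex}{m}{n}{ <-> fixed * [10] cmex10 }{}
```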

Font-loading options

As already mentioned, you need to declare each family using the command DeclareFontFamily. The third argument to this command, as well as the sixth argument to DeclareFontShape, can be used to specify special operations that are carried out when a font is loaded. In this way, you can change parameters that are associated with a font as a whole.

For every external font, (La)TeX maintains, besides the information about each character, a set of global dimensions and other values associated with the font. For example, every font has its own “hyphen character”, the character that is inserted automatically when (La)TeX hyphenates a word. Another example is the normal width and the stretchability of a blank space between words (the “interword space”); again a value is maintained for every font and changed whenever (La)TeX switches to a new font. By changing these values when a font is loaded, special effects can be achieved.

Normally, changes apply to a whole family; for example, you may want to prohibit hyphenation for all words typeset in the typewriter family. In this case, the third argument of DeclareFontFamily should be used. If the changes should apply only to a specific font shape group, you must use the sixth argument of DeclareFontShape. In other words, when a font is loaded, NFSS first applies the argument of DeclareFontFamily and then the sixth argument of DeclareFontShape, so that it can override the load options specified for the whole family if necessary.

Below we study the information that can be set in this way (unfortunately, not everything is changeable) and discuss some useful examples. This part of the interface addresses very low-level commands of TeX. Because it is so specialized, no effort was made to make the interface more LaTeX-like. As a consequence, the methods for assigning integers and dimensions to variables are somewhat unusual.

Changing the hyphenation character

The assignment \hyphenchar\font=number specifies the character that (La)TeX inserts as the hyphen when a word is hyphenated. The number represents the position of this character within the encoding scheme. The default is the value of \defaulthyphenchar, which is 45, representing the position of the “-” character in most encoding schemes. If this number is set to -1, hyphenation is suppressed. Thus, by declaring

Image

you can suppress hyphenation for all fonts in the cmtt family with the encoding scheme OT1. Fonts with the T1 encoding have an alternate hyphen character in position 127, so that you can set, for example,

Image

This makes the hyphen character inserted by (La)TeX different from the compound-word dash entered in words like “so-called”. (La)TeX does not hyphenate words that already contain explicit hyphen characters (except just after the hyphen), which can create a real problem in languages in which the average word length is much larger than in English. With the above setting this problem can be solved.
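The two settings discussed above can be sketched as family declarations. The family name cmr in the second declaration is an assumption; such declarations normally live in the .fd file of the family concerned.

```latex
% Suppress hyphenation for every font of the OT1/cmtt family:
\DeclareFontFamily{OT1}{cmtt}{\hyphenchar\font=-1}

% Use the alternate hyphen in position 127 for a T1-encoded family:
\DeclareFontFamily{T1}{cmr}{\hyphenchar\font=127}
```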

Every (La)TeX font has an associated set of dimensions, which are changed by assignments of the form \fontdimen number \font=dimen, where number is the reference number for the dimension and dimen is the value to be assigned. The default values are taken from the .tfm file when the font is loaded. Each font has at least seven such dimensions:

\fontdimen1 Specifies the slant per point of the characters. If the value is zero, the font is upright.

\fontdimen2 Specifies the normal width of a space used between words (interword space).

\fontdimen3 Specifies the additional stretchability of the interword space, that is, the extra amount of white space that (La)TeX is allowed to add to the space between words to produce justified lines in a paragraph. In an emergency (La)TeX may add more space than this allowed value; in that case an “underfull box” will be reported.

\fontdimen4 Specifies the allowed shrinkability of the interword space, that is, the amount of space that (La)TeX is allowed to subtract from the normal interword space (\fontdimen2) to produce justified lines in a paragraph. (La)TeX will never shrink the interword space below the resulting minimum (\fontdimen2 minus \fontdimen4).

\fontdimen5 Specifies the x-height. It defines the font-oriented dimension 1ex.

\fontdimen6 Specifies the quad width. It defines the font-oriented dimension 1em.

\fontdimen7 Specifies the amount intended as extra space to be added after certain end-of-sentence punctuation characters when \nonfrenchspacing is in force. The exact rules for when TeX uses this dimension (all or some of the extra space) are somewhat complex; see The TeXbook [82] for details. It is ignored (or rather replaced by the value of \xspaceskip) whenever that value is nonzero.

When changing the interword spacing associated with a font, you cannot use an absolute value because such a value must be usable for all sizes within one font shape group. You must, therefore, define the value by using some other parameter that depends on the font. You could say, for example,

Image

This declaration reduces the normal interword space to 70% of its original value. In a similar manner, the stretchability and shrinkability could be changed.
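A declaration with this effect could be sketched as follows. The font shape group shown (OT1/cmdh/m/n with cmdunh10) is an assumption borrowed from the earlier Dunhill example; the essential part is the sixth argument, which expresses the new value relative to the font's own \fontdimen2.

```latex
% Sixth argument: set the interword space to 70% of its loaded value.
\DeclareFontShape{OT1}{cmdh}{m}{n}{ <10> <12> cmdunh10 }
   {\fontdimen2\font=0.7\fontdimen2\font}

% Stretchability and shrinkability could be scaled the same way, e.g.
%   \fontdimen3\font=0.7\fontdimen3\font
%   \fontdimen4\font=0.7\fontdimen4\font
```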

Some fonts used in formulas need more than seven font dimensions—namely, the symbol fonts called “symbols” and “largesymbols” (see Section 7.10.7). TeX will not typeset a formula if these symbol fonts have fewer than 22 and 13 fontdimen parameters, respectively. The values of these parameters are used to position the characters in a math formula. An explanation of the meaning of every such fontdimen parameter is beyond the scope of this book; details can be found in Appendix G of The TeXbook [82].

One unfortunate optimization is built into the TeX system: TeX loads every .tfm file only once for a given size. It is, therefore, impossible to define one font shape group (with the DeclareFontShape command) to load some external font—say, cmtt10—and to use another DeclareFontShape command to load the same external font, this time changing some of the fontdimen parameters or some other parameter associated with the font. Trying to do so changes the values for both font shape groups.

Suppose, for example, that you try to define a font shape with tight spacing by making the interword space smaller:

Image

This declaration will not work. The interword spacing for the medium shape will change when the tight shape is loaded to the values specified there, and this result is not what is wanted. The best way to solve this problem is to define a virtual font that contains the same characters as the original font, but differs in the settings of the font dimensions (see [73, 74, 91]). Another possible solution is to load the font at a slightly different size, as in the following declaration:

Image

That strategy makes them different fonts for TeX with separate \fontdimen parameters. Alternatively, in this particular case you can control the interword space by setting \spaceskip, thereby overwriting the font values. See Section 3.1.12 for some discussion of that parameter.
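The problem and the size-based workaround can be sketched as follows. The family ptm, the external font ptmr8t, the series name c for the tight variant, and the factor 0.9999 are all assumptions made for illustration.

```latex
% Normal spacing:
\DeclareFontShape{T1}{ptm}{m}{n}{ <-> ptmr8t }{}

% Does NOT work as intended: same .tfm file at the same size, so the
% changed \fontdimen2 also affects T1/ptm/m/n once this shape is loaded.
% \DeclareFontShape{T1}{ptm}{c}{n}{ <-> ptmr8t }
%    {\fontdimen2\font=0.7\fontdimen2\font}

% Workaround: load the font at a minimally different size, so that TeX
% treats it as a separate font with its own \fontdimen parameters.
\DeclareFontShape{T1}{ptm}{c}{n}{ <-> [0.9999] ptmr8t }
   {\fontdimen2\font=0.7\fontdimen2\font}
```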

7.10.4. Modifying font families and font shape groups

If you need a nonstandard font shape group declaration for a particular document, just place your private declaration in a package or the preamble of your document. It will then overwrite any existing declaration for the font shape combination. Note, however, that the use of DeclareFontFamily prevents a later loading of the corresponding .fd file (see Section 7.10.6). Also, your new declaration has no effect on fonts that are already loaded.

Today’s LaTeX format preloads by default only a small number of fonts. However, by using the configuration file preload.cfg, more or fewer fonts can be loaded when the format is built. None of these preloaded fonts can be manipulated using font family or font shape declarations. Thus, if you want some special settings for the core fonts, you must ensure that none of these fonts is preloaded. For additional information on ways to customize a LaTeX installation, refer to the document cfgguide.tex [110], which is part of the LaTeX distribution.

7.10.5. Declaring new font encoding schemes

Font changes that involve alterations in the encoding scheme require taking certain precautions. For example, in the T1 encoding, most accented letters have their own glyphs, whereas in the traditional TeX text encoding (OT1), accented letters must be generated from accents and letters using the \accent primitive. (It is desirable to use glyphs for accented letters rather than employing the \accent primitive because, among other things, the former approach allows for correct hyphenation.) If the two approaches have to be mixed, perhaps because a font is available only in one of the encodings, the definition of a command such as \" must behave differently depending on the current font encoding.

For this reason, each encoding scheme has to be formally introduced to LaTeX with a DeclareFontEncoding command, which takes three arguments. The first argument is the name of the encoding under which you access it using the fontencoding command. Table 7.27 on page 416 provides a list of standard encoding schemes and their internal NFSS names.

The second argument contains any code (such as definitions) to be executed every time LaTeX switches from one encoding to another using the \fontencoding command. The final argument contains code to be used whenever the font is accessed as a mathematical alphabet. Thus, the second and third arguments can be used to redefine commands that depend on the positions of characters in the encoding. To avoid spurious spaces in the output (coming from extra spaces in the arguments), the space character is ignored within them. In the unlikely event that you need spaces in a definition in one of the arguments, use the \space command.

The LaTeX3 project reserves the use of encodings starting with the following letters: T (standard text encodings with 256 characters), TS (symbols that are designed to extend the corresponding T encoding), X (text encodings that do not conform to the strict requirements for T encodings), M (standard math encodings with 256 characters), S (other symbol encodings), A (other special applications), OT (standard text encodings with 128 characters), and OM (standard math encodings with 128 characters). The letter O was chosen to emphasize that the 128-character encodings are old and obsolete. Ideally, these encodings will be superseded by standards defined by the TeX user groups so that in the future a change of encoding will be necessary only if one is switching from one language to another.

For your own private encodings, you should choose names starting with L for “local” or E for “experimental”. Encodings starting with U are for “Unknown” or “Unclassified” encodings—that is, for fonts that do not fit a common encoding pattern. This naming convention ensures that files using official encodings are portable. New standard encodings will be added to the LaTeX documentation as they emerge. For example, the T2* and T5 encodings have appeared since the first edition of this book was published.

The DeclareFontEncoding command stores the name of the newly declared encoding in the command LastDeclaredEncoding. This feature is sometimes useful when you are declaring other related encoding information and is, for example, used in the encoding declaration files for the Cyrillic languages.

Also, as we saw in Section 7.9.3 on font substitution, the default values for the family, series, and shape may need to be different for different encodings. For this purpose, NFSS provides the command \DeclareFontSubstitution, which again takes the encoding as the first argument. The next three arguments are the default values (associated with this encoding) for family, series, and shape for use in the automatic substitution process, as explained in Section 7.9.3. It is important that these arguments form a valid font shape, in other words, that a \DeclareFontShape declaration exists for them. Otherwise, an error message will be issued when NFSS checks its internal tables at \begin{document}.
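Taken together, introducing an encoding and its substitution defaults might look like the following sketch, modeled on a standard text encoding such as T1. The defaults cmr/m/n are an assumption; whatever values are chosen, a matching font shape declaration must exist.

```latex
% Name of the encoding, code run on every switch to it (none here),
% code run when used as a math alphabet (none here):
\DeclareFontEncoding{T1}{}{}

% Fallback family, series, and shape for automatic substitution
% within this encoding; T1/cmr/m/n must be a declared font shape.
\DeclareFontSubstitution{T1}{cmr}{m}{n}
```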

7.10.6. Internal file organization

Font families can be declared when a format file is generated, declared in the document preamble, or loaded on demand when a font change command in the document requests a combination that has not been used so far. The first option consumes internal memory in every LaTeX run, even if the font is not used. The second and third possibilities take a little more time during document formatting, because the font definitions have to be read during processing time. Nevertheless, it is preferable to use the latter solutions for most font shape groups, because it allows you to typeset a wide variety of documents with a single LaTeX format.

When the format is generated, LaTeX will read a file named fonttext.ltx, which contains the standard set of font family definitions and some other declarations related to text fonts. With some restrictions1 this set can be altered by providing a configuration file fontdef.cfg; see the documentation cfgguide.tex.

1 Any such customization should not be undertaken lightly as it is unfortunately very easy to produce a LaTeX format that shows subtle or even glaring incompatibilities with other installations.

All other font family definitions should be declared in external files loaded on request: either package files or font definition (.fd) files. If you place font family definitions in a package file, you must explicitly load this package after the documentclass command. But there is a third possibility: whenever NFSS gets a request for a font family foo in an encoding scheme BAR, and it has no knowledge about this combination, it will try to load a file called barfoo.fd (all letters lowercase). If this file exists, it is supposed to contain font shape group definitions for the family foo in the encoding scheme BAR—that is, declarations of the form

Image

In this way it becomes possible to declare a huge number of font families for LaTeX without filling valuable internal memory with information that is almost never used.1

1 Unfortunately, this feature is not fully available on (La)TeX installations that use different search paths for the commands \input and \openin. On such systems the .fd feature can be activated at installation time by supplying NFSS with a full path denoting the directories containing all the .fd files. As a result, local .fd files (those stored in the current directory) may not be usable on such systems.

Each .fd file should contain all font definitions for one font family in one encoding scheme. It should consist of one or more \DeclareFontShape declarations and exactly one \DeclareFontFamily declaration. Other definitions should not appear in the file, except perhaps for a \ProvidesFile declaration or some \typeout statement informing the user about the font loading. As an alternative to the \typeout command, you can use the plain TeX command \wlog, which writes its argument only into the transcript file. Detailed information in the transcript file should be generated by all .fd files that are used in production, because looking at this transcript will help to locate errors by providing information about the files and their versions used in a particular job. If \typeout or \wlog commands are used, it is important to know that spaces and empty lines in a .fd file are ignored. Thus, you have to use the command \space in the argument to \typeout or \wlog to obtain a blank space on the screen and the transcript file.
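The rules above can be sketched as a small .fd file. The family name foo and the .tfm names foor8t and foob8t are made up for illustration; only the overall layout is the point here.

```latex
% File t1foo.fd: hypothetical family "foo" in the T1 encoding.
\ProvidesFile{t1foo.fd}[font definitions for T1/foo (illustrative sketch)]
% Exactly one family declaration ...
\DeclareFontFamily{T1}{foo}{}
% ... and one or more shape declarations. foor8t and foob8t are
% made-up .tfm names; <-> makes them available at any size.
\DeclareFontShape{T1}{foo}{m}{n}{<-> foor8t}{}
\DeclareFontShape{T1}{foo}{b}{n}{<-> foob8t}{}
% Spaces are ignored inside .fd files, hence \space between the words.
\wlog{Font\space definitions\space for\space T1/foo\space loaded}
```

Stored under the name t1foo.fd (encoding and family in lowercase), such a file would be found automatically the first time family foo is requested in T1.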

New encoding schemes cannot be introduced via the .fd mechanism. NFSS will reject any request to switch to an encoding scheme that was not explicitly declared in the LaTeX format (i.e., fonttext.ltx), in a package file, or in the preamble of the document.

7.10.7. Declaring new fonts for use in math

Specifying font sizes

For every text size NFSS maintains three sizes that are used to typeset formulas (see also Section 8.7.1): the size in which to typeset most of the symbols (selected by \textstyle or \displaystyle); the size for first-order subscripts and superscripts (\scriptstyle); and the size for higher-order subscripts and superscripts (\scriptscriptstyle). If you switch to a new text size, for which the corresponding math sizes are not yet known, NFSS tries to calculate them as fractions of the text size. Instead of letting NFSS do the calculation, you might want to specify the correct values yourself via \DeclareMathSizes. This declaration takes four arguments: the outer text size and the three math sizes for this text size. For example, the class file for The LaTeX Companion contains settings like the following:

Image

The first declaration defines the math sizes for the 14pt heading size to be 14pt, 10pt, and 7pt, respectively. The second declaration (the size for the chapter headings) informs NFSS that no math sizes are necessary for 36pt text size. This avoids the unnecessary loading of more than 30 additional fonts. For the first edition of The LaTeX Companion such declarations were very important to be able to process the book with all its examples as a single document (the book loaded 228 fonts out of a maximum of 255). Today, TeX installations are usually compiled with larger internal tables (e.g., the laptop implementation used to write this chapter allows 1000 fonts), so conserving space is no longer a major concern. In any event you should be careful about disabling math sizes, because if some formula is typeset in such a size after all, it will be typeset in whatever math sizes are still in effect from an earlier text size.
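Written out from the description above, the two declarations would look roughly as follows. This is a reconstruction from the prose, not a copy of the class file.

```latex
% Heading size 14pt: formulas in 14pt, first-order scripts in 10pt,
% higher-order scripts in 7pt.
\DeclareMathSizes{14}{14}{10}{7}
% Chapter-heading size 36pt: empty arguments tell NFSS that no math
% sizes are needed, so no math fonts are loaded for this size.
\DeclareMathSizes{36}{}{}{}
```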

Adding new symbol fonts

We have already seen how to use math alphabet commands to produce letters with special shapes in a formula. We now discuss how to add fonts containing special symbols, called “symbol fonts”, and how to make such symbols accessible in formulas.

The process of adding new symbol fonts is similar to the declaration of a new math alphabet identifier: DeclareSymbolFont defines the defaults for all math versions, and SetSymbolFont overrides the defaults for a particular version.

The math symbol fonts are accessed via a symbolic name, which consists of a string of letters. If, for example, you want to install the AMS fonts msbm10, shown in Table 7.29 on the following page, you first have to make the typeface known to NFSS using the declarations described in the previous sections. These instructions would look like

Image

Table 7.29. Glyph chart for msbm10 produced by the nfssfont.tex program

Image

and are usually placed in an .fd file. You then have to declare that symbol font for all math versions by issuing the command

Image

It makes the font shape group U/msb/m/n available as a symbol font under the symbolic name AMSb. If there were a bold series in this font family (unfortunately there is not), you could subsequently change the set-up for the bold math version by saying

Image

After taking care of the font declarations, you can make use of this symbol font in math mode. But how do you tell NFSS that $a\lessdot b$ should produce a ⋖ b, for example? To do so, you have to introduce your own symbol names to NFSS, using \DeclareMathSymbol.

Image

The first argument to DeclareMathSymbol is your chosen command name. The second argument is one of the commands shown in Table 7.30 on the next page and describes the nature of the symbol—whether it is a binary operator, a relation, and so forth. (La)TeX uses this information to leave the correct amount of space around the symbol when it is encountered in a formula. Incidentally, except for mathalpha, these commands can be used directly in math formulas as functions with one argument, in which case they space their (possibly complex) argument as if it were of the corresponding type; see Section 8.9 on page 524.

Image

Table 7.30. Math symbol type classification

The third argument identifies the symbol font from which the symbol should be fetched—that is, the symbolic name introduced with the DeclareSymbolFont command. The fourth argument gives the symbol’s position in the font encoding, either as a decimal, octal, or hexadecimal value. Octal (base 8) and hexadecimal (base 16) numbers are preceded by ' and ", respectively. If you look at Table 7.29 on the preceding page, you can easily determine the positions of all glyphs in this font. Such tables can be printed using the LaTeX program nfssfont.tex, which is part of the LaTeX distribution; see Section 7.5.7 on page 369. For example, lessdot would be declared using

Image
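A sketch of such a declaration is shown below. The type \mathbin and the slot "6C are what we recall the amssymb package using for this symbol; verify the position against the glyph chart before relying on it. The symbol font AMSb is assumed to have been declared already.

```latex
% Hexadecimal "6C, octal '154, and decimal 108 all denote the same slot;
% only one of the three forms would be used in practice.
\DeclareMathSymbol{\lessdot}{\mathbin}{AMSb}{"6C}
%\DeclareMathSymbol{\lessdot}{\mathbin}{AMSb}{'154}
%\DeclareMathSymbol{\lessdot}{\mathbin}{AMSb}{108}
```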

Instead of a command name, you can use a single character in the first argument. For example, the eulervm package has several declarations of the form

Image

that specify where to fetch the digits from.

Because DeclareMathSymbol is used to specify a position in some symbol font, it is important that all external fonts associated with this symbol font via the DeclareSymbolFont and SetSymbolFont commands have the same character in that position. The simplest way to ensure this uniformity is to use only fonts with the same encoding (unless it is the U, a.k.a. unknown, encoding, as two fonts with this encoding are not required to implement the same characters).

Besides DeclareMathSymbol, LaTeX knows about DeclareMathAccent, DeclareMathDelimiter, and DeclareMathRadical for setting up math font support. Details about these slightly special declarations can be found in [109], which is part of every LaTeX distribution.

If you look again at the glyph chart for msbm10 (Table 7.29 on the preceding page), you will notice that this font contains “blackboard bold” letters. If you want to use these letters as a math alphabet, you can define them using DeclareMathAlphabet, but given that this symbol font is already loaded to access individual symbols, it is better to use a shortcut:

Image

That is, you give the name of your math alphabet identifier and the symbolic name of the previously declared symbol font.

An important reason for not unnecessarily loading symbol fonts twice is that there is an upper limit of 16 math fonts that can be active at any given time in (La)TeX. In calculating this limit, each symbol font counts; math alphabets count only if they are actually used in the document, and they count locally in each math version. Thus, if eight symbol fonts are declared, you can use a maximum of eight (possibly different) math alphabet identifiers within every version.

To summarize: to introduce new symbol fonts, you need to issue a small number of DeclareSymbolFont and SetSymbolFont declarations and a potentially large number of DeclareMathSymbol declarations; hence, adding such fonts is best done in a package file.
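Putting the pieces together, a package file for the msbm10 example might be organized as in the following sketch. The package name mymsb is hypothetical, the font shape declaration is simplified to a single scalable entry, and the slot numbers are assumptions to be checked against the glyph chart.

```latex
% File mymsb.sty (hypothetical package name).
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{mymsb}[illustrative sketch: symbols from msbm10]
% Font shape group, normally kept in a .fd file; simplified here so
% that msbm10 is scaled to every requested size.
\DeclareFontFamily{U}{msb}{}
\DeclareFontShape{U}{msb}{m}{n}{<-> msbm10}{}
% Make the group available as a symbol font in all math versions.
\DeclareSymbolFont{AMSb}{U}{msb}{m}{n}
% Individual symbols: command name, type, symbol font, slot.
\DeclareMathSymbol{\lessdot}{\mathbin}{AMSb}{"6C}
\DeclareMathSymbol{\gtrdot}{\mathbin}{AMSb}{"6D}
% Reuse the same symbol font as a math alphabet for blackboard bold,
% so that no additional math font is allocated.
\DeclareSymbolFontAlphabet{\mathbb}{AMSb}
```

After \usepackage{mymsb}, a formula such as $a \lessdot b$ or $\mathbb{R}$ would draw its glyphs from this one symbol font.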

Introducing new math versions

We have already mentioned that the standard set-up automatically declares two math versions, normal and bold. To introduce additional versions, you use the declaration DeclareMathVersion, which takes one argument, the name of the new math version. All symbol fonts and all math alphabets previously declared are automatically available in this math version; the default fonts are assigned to them—that is, the fonts you have specified with DeclareMathAlphabet or DeclareSymbolFont.

You can then change the set-up for your new version by issuing appropriate SetMathAlphabet and SetSymbolFont commands, as shown in previous sections (pages 352 and 433) for the bold math version. Again, the introduction of a new math version is normally done in a package file.
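As a small sketch, the following lines introduce a math version and adjust two of its fonts. The version name sans is made up; the Computer Modern Sans font shapes are assumed to be available, as they are in a standard installation.

```latex
% Declare the new version; it starts out with the default fonts of
% all symbol fonts and math alphabets declared so far.
\DeclareMathVersion{sans}
% Override selected defaults for this version only.
\SetSymbolFont{operators}{sans}{OT1}{cmss}{m}{n}
\SetMathAlphabet{\mathbf}{sans}{OT1}{cmss}{bx}{n}
% In the document body, switch versions outside math mode:
%   \mathversion{sans} $\sin x + 1$ \mathversion{normal}
```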

Changing the symbol font set-up

Besides adding new symbol fonts to access more symbols, the commands we have just seen can be used to change an existing set-up. This capability is of interest if you choose to use special fonts in some or all math versions.

The default settings in LaTeX are given here:

Image
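For orientation, the defaults can be sketched as follows. This is how we understand the declarations in the kernel file fontmath.ltx; that file remains the authoritative source.

```latex
\DeclareMathVersion{normal}
\DeclareMathVersion{bold}
% Defaults for all math versions.
\DeclareSymbolFont{operators}   {OT1}{cmr} {m}{n}
\DeclareSymbolFont{letters}     {OML}{cmm} {m}{it}
\DeclareSymbolFont{symbols}     {OMS}{cmsy}{m}{n}
\DeclareSymbolFont{largesymbols}{OMX}{cmex}{m}{n}
% Overrides for the bold math version.
\SetSymbolFont{operators}{bold}{OT1}{cmr} {bx}{n}
\SetSymbolFont{letters}  {bold}{OML}{cmm} {b}{it}
\SetSymbolFont{symbols}  {bold}{OMS}{cmsy}{b}{n}
```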

In the standard set-up, digits and text produced by “log-like operators” such as log and max are taken from the symbol font called operators. To change this situation so that these elements agree with the main text font—say, Computer Modern Sans rather than Computer Modern Roman—you can issue the following commands:

Image
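A plausible form of these commands, assuming the OT1-encoded Computer Modern Sans family cmss, is sketched here:

```latex
% Take digits and log-like operators from Computer Modern Sans ...
\SetSymbolFont{operators}{normal}{OT1}{cmss}{m}{n}
% ... and from its bold extended series in the bold math version.
\SetSymbolFont{operators}{bold}{OT1}{cmss}{bx}{n}
```

Only the operators symbol font is touched; letters, symbols, and largesymbols keep their defaults.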

Symbol fonts with the names symbols and largesymbols play a unique rôle in TeX, and for this reason they need a special number of fontdimen parameters associated with them. Thus, only specially prepared fonts can be used for these two symbol fonts. In principle one can add such parameters to any font at load time by using the third parameter of DeclareFontFamily or the sixth parameter of DeclareFontShape. Information on the special parameters for these symbol fonts can be found in Appendix G of [82].

7.10.8. Example: Defining your own .fd files

If you want to set up new (PostScript) fonts and create the necessary .fd files, you should follow the procedure explained earlier in this section. If fontinst [74] is used to generate the necessary font metric files, then the corresponding .fd files are automatically generated as well. However, an .fd file for a single font family is also easy to write by hand, once you know which font encoding is used. As an example, let’s study the declaration file t1bch.fd for Bitstream Charter in the T1 encoding:

Image

The file starts with an identification line and then declares the font family and encoding (i.e., bch in T1) using DeclareFontFamily—the arguments of this command should correspond to the name of the .fd file, except that by convention the encoding is in lowercase there. Then each combination of series and shape is mapped to the name of a .tfm file. These fonts can and will be scaled to any desired size—hence the <-> declarations on the DeclareFontShape commands. The second part of the file sets up some substitutions for combinations for which no font is available (i.e., replacing the bold extended series with the bold series).
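An abridged sketch of such a file follows. The .tfm names are given as we recall them from the PSNFSS distribution and should be checked against the installed file; the identification text is a placeholder.

```latex
\ProvidesFile{t1bch.fd}[font definitions for T1/bch (abridged sketch)]
\DeclareFontFamily{T1}{bch}{}
% Each series/shape combination is mapped to a .tfm file that is
% scaled to any requested size (<->).
\DeclareFontShape{T1}{bch}{m}{n} {<-> bchr8t}{}
\DeclareFontShape{T1}{bch}{m}{sc}{<-> bchrc8t}{}
\DeclareFontShape{T1}{bch}{m}{sl}{<-> bchro8t}{}
\DeclareFontShape{T1}{bch}{m}{it}{<-> bchri8t}{}
\DeclareFontShape{T1}{bch}{b}{n} {<-> bchb8t}{}
\DeclareFontShape{T1}{bch}{b}{it}{<-> bchbi8t}{}
% Substitutions: bold extended is silently replaced by bold.
\DeclareFontShape{T1}{bch}{bx}{n} {<-> ssub * bch/b/n}{}
\DeclareFontShape{T1}{bch}{bx}{it}{<-> ssub * bch/b/it}{}
```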

Assuming you have bought the additional Charter fonts (Black and BlackItalic), which are not available for free, then you may want to add the related declarations to the .fd file. Of course, one would first need to provide the appropriate virtual fonts (using, for example, fontinst) to emulate the T1 character set; fortunately, for many fonts these can be downloaded from the Internet.1

1 A good resource is Walter Schmidt’s home page: http://home.vr-web.de/~was/fonts.html.

Special license for .fd files

In contrast to most other files in the LaTeX world, the usual license for .fd files allows their modification without renaming the files. However, you are normally not allowed to distribute such a modified file!

Another possible reason for producing your own .fd files might be the need to combine fonts from different font families and present them to LaTeX as a single new font family. For example, in 1954 Hermann Zapf designed the Aldus font family as a companion to his Palatino typeface (which was originally designed as a display typeface). As Aldus has no bold series, Palatino is a natural choice to use as a bold substitute. In the example below we combine Aldus (with old-style numerals) in its medium series with Palatino bold, calling the resulting “font family” zasj. We present only a fragment of a complete .fd file that enables us to typeset Example 7-10-1 on the facing page.

Image

To access this “pseudo-family” we have to select zasj in the T1 encoding. We also have to ensure that \textbf switches to bold and not to bold extended, as our .fd file does not provide any substitutions. All that can be automatically provided by writing a tiny package (named fontmix.sty) like this:

Image

Thus, by loading fontmix, we get Aldus with Palatino Bold for headlines. In many cases such a mixture does not enhance your text, so do not mistake this example as a suggestion to produce arbitrary combinations.
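The two files involved might look like the following sketch. The .tfm name used for Aldus is a made-up placeholder, and the name for Palatino Bold is an assumption; only the family name zasj and the package name fontmix come from the text.

```latex
% --- Fragment of t1zasj.fd ---
\DeclareFontFamily{T1}{zasj}{}
% Medium series: Aldus with old-style numerals (hypothetical .tfm name).
\DeclareFontShape{T1}{zasj}{m}{n}{<-> aldus-osf-t1}{}
% Bold series: borrowed from Palatino (.tfm name assumed).
\DeclareFontShape{T1}{zasj}{b}{n}{<-> pplb8t}{}

% --- File fontmix.sty ---
\ProvidesPackage{fontmix}[illustrative sketch]
\RequirePackage[T1]{fontenc}   % select the T1 encoding
\renewcommand\rmdefault{zasj}  % pseudo-family as the roman default
\renewcommand\bfdefault{b}     % \textbf selects b instead of bx
```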

7-10-1
Image

7.10.9. The order of declaration

NFSS forces you to give all declarations in a specific order so that it can check whether you have specified all necessary information. If you declare objects in the wrong order, it will complain. Here are the dependencies that you have to obey:

• \DeclareFontFamily checks that the encoding scheme was previously declared with \DeclareFontEncoding.

• \DeclareFontShape checks that the font family was declared to be available in the requested encoding (\DeclareFontFamily).

• \DeclareSymbolFont checks that the encoding scheme is valid.

• \SetSymbolFont additionally ensures that the requested math version was declared (\DeclareMathVersion) and that the requested symbol font was declared (\DeclareSymbolFont).

• \DeclareSymbolFontAlphabet checks that the command name for the alphabet identifier can be used and that the symbol font was declared.

• \DeclareMathAlphabet checks that the chosen command name can be used and that the encoding scheme was declared.

• \SetMathAlphabet checks that the alphabet identifier was previously declared with \DeclareMathAlphabet or \DeclareSymbolFontAlphabet and that the math version and the encoding scheme are known.

• \DeclareMathSymbol makes sure that the command name can be used (i.e., is undefined or was previously declared to be a math symbol) and that the symbol font was previously declared.

• When the \begin{document} command is reached, NFSS makes some additional checks, for example, verifying that substitution defaults for every encoding scheme point to known font shape group declarations.

7.11. LaTeX’s encoding models

For most users it will probably be sufficient to know that there exist certain input and output encodings and to have some basic knowledge about how to use them, as described in the previous sections. However, sometimes it is helpful to know the whole story in some detail, so as either to set up a new encoding or to better understand packages or classes that implement special features. So here is everything you always wanted to know about encodings in LaTeX.

We start by describing the general character data flow within the LaTeX system, deriving from that the base requirements for various encodings and the mapping between them. We then have a closer look at the internal representation model for character data within LaTeX, followed by a discussion of the mechanisms used to map incoming data via input encodings into that internal representation.

Finally, we explain how the internal representation is translated, via the output encodings, into the form required for the actual task of typesetting.

7.11.1. Character data within the LaTeX system

Document processing with the LaTeX system starts by interpreting data present in one or more source files. This data, which represents the document content, is stored in these files in the form of octets representing characters. To correctly interpret these octets, LaTeX (or any other program used to process the file, such as an editor) must know the encoding that was used when the file was written. In other words, it must know the mapping between abstract characters and the octets representing them.

With an incorrect mapping, all further processing will be flawed to some extent unless the file contains only characters of a subset common in both encodings.1

1 As most encodings in the Western world share as a common subset a large fraction of the ASCII code (i.e., most of the 7-bit plane), documents consisting mainly of unaccented Latin characters are still understandable if viewed or processed in an encoding different from the one in which they were originally written. However, the more characters outside visible ASCII are used, the less comprehensible the text will become. A text can become completely unintelligible when, for instance, Greek or Russian documents are reprocessed under the assumption that the text is encoded in, say, the encoding for U.S.-Windows.

LaTeX makes one fundamental assumption at this stage: that (nearly) all characters of visible ASCII (decimal 32–126) are represented by the number that they have in the ASCII code table; see Table 7.31 on the next page.

Image

Table 7.31. LICR objects represented with single characters

There is both a practical and a TeXnical reason for this assumption. The practical reason is that most 8-bit encodings in use today share a common 7-bit plane. The TeXnical reason is that, to effectively2 use TeX, the majority of the visible portion of ASCII needs to be processed as characters of category “letter” (since only characters with this category can be used in multiple-character command names in TeX) or category “other” (since TeX will not, for example, recognize the decimal digits as being part of a number if they do not have this category code).

2 At least this was true when this interface was being designed. These days, with computers being much faster than before, it would be possible to radically change the input method of TeX by basically disabling it altogether and parsing the input data manually—that is, character by character.

When a character—or more exactly an 8-bit number—is declared to be of category “letter” or “other” in TeX, then this 8-bit number will be transparently passed through TeX. This means that in the output TeX will typeset whatever symbol is in the font at the position addressed by that number.

A consequence of the assumption mentioned earlier is that fonts intended to be used for general text require that (most of) the visible ASCII characters are present in the font and are encoded according to the ASCII encoding. The exact list is given in Table 7.31.

LaTeX internal character representation (LICR)

All other 8-bit numbers (i.e., those outside visible ASCII) potentially being present in the input file are assigned a category code of “active”, which makes them act like commands inside TeX. This allows LaTeX to transform them via the input encodings to a form that we call the LaTeX internal character representation (LICR).

Unicode’s UTF-8 encoding is handled similarly: the ASCII characters represent themselves, and the starting octets for multiple-byte representations act as active characters that scan the input for the remaining octets. The result will be turned into an object in the LICR, if it is mapped, or it will generate an error, if the given Unicode character is not mapped.

The most important characteristic of objects in the LICR is that the representation is 7-bit ASCII so that it is invariant to any input encoding change, because all input encodings are supposed to be transparent with respect to visible ASCII. This enables LaTeX, for example, to write auxiliary files (e.g., .toc files) using the LICR representation and to read them back in a different context (and possibly different encoding) without any misinterpretations.

The purpose of the output (or font) encoding is then to map the internal character representations to glyph positions in the current font used for typesetting or, in some cases, to initiate more complex actions. For example, it might place an accent (present in one position in the current font) over some glyph (in a different position in the current font) to achieve a printed image of the abstract character represented by the command(s) in the internal character encoding.

Because the LICR encodes all possible characters addressable within LaTeX, it is far larger than the number of characters that can be represented by a single TeX font (which can contain a maximum of 256 glyphs). In some cases a character in the internal encoding can be rendered with a font by combining glyphs, such as accented characters mentioned above. However, when the internal character requires a special shape (e.g., the currency symbol “¤”), there is no way to fake it if that glyph is not present in the font.

Nevertheless, the LaTeX model for character encoding supports automatic mechanisms for fetching glyphs from different fonts so that characters missing in the current font will get typeset—provided a suitable additional font containing them is available, of course.

7.11.2. LaTeX’s internal character representation (LICR)

Technically speaking, text characters are represented internally by LaTeX in one of three ways, each of which will be discussed in the following sections.

Representation as characters

A small number of characters are represented by “themselves”; for example, the Latin A is represented as the character “A”. Characters represented in this way are shown in Table 7.31 on the previous page. They form a subset of visible ASCII, and inside TeX all of them are given the category code of “letter” or “other”. Some characters from the visible ASCII range are not represented in this way, either because they are part of the TeX syntax1 or because they are not present in all fonts. If one uses, for example, “<” in text, the current font encoding determines whether one gets < (T1) or perhaps a ¡ (OT1) in the printout.2

1 The LaTeX syntax knows a few more characters, such as *[]. They play a dual rôle, also being used to represent the characters in straight text. Sometimes problems arise trying to keep the two meanings apart. For example, a ] within an optional argument is only possible when it is hidden by a set of braces; otherwise, LaTeX will think the optional argument has ended.

2 This describes the situation in text. In math “<” has a well-defined meaning: “generate a less than relation symbol”.

Representation with character sequences

TeX’s internal ligature mechanism supports the generation of new characters from a sequence of input characters. While this is actually a property of the font, some such sequences have been explicitly designed to serve as input shortcuts for characters that are otherwise difficult to address with most keyboards. Only a very few characters generated in this way are considered to belong to LaTeX’s internal representation. These include the en dash and em dash, which are generated by the ligatures -- and ---, and the opening and closing double quotes, which are generated by `` and '' (for the latter people sometimes use the single character ", but this is incorrect as it may produce a straight double quote, i.e., "). While most fonts also implement !` and ?` to generate ¡ and ¿, this feature is not universally available in all fonts. For this reason all such characters have an alternative internal representation as a command (e.g., \textendash or \textexclamdown).
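The two representations can be compared side by side in a short test document. The sample text is arbitrary; the T1 encoding is chosen on the assumption that it provides all the glyphs involved.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\begin{document}
% Input shortcuts implemented as font ligatures ...
pages 3--7, wait---no; ``quoted''; !`Hola! ?`Qu\'e?

% ... and the equivalent font-encoding-specific commands.
pages 3\textendash 7, wait\textemdash no;
\textquotedblleft quoted\textquotedblright;
\textexclamdown Hola! \textquestiondown Qu\'e?
\end{document}
```

Both paragraphs should print identically in a font that implements the ligatures; the command forms remain reliable in fonts that do not.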

Representation as “font-encoding–specific” commands

The other way to represent characters internally in LaTeX (and this covers the majority of characters) is with special LaTeX commands (or command sequences) that remain unexpanded when written to a file or when placed into a moving argument. These special commands are sometimes referred to as “font-encoding–specific commands” because their meaning depends on the font encoding current when LaTeX is ready to typeset them. Such commands are declared using special declarations, as discussed below. They usually require individual definitions for each font encoding. If no definition exists for the current encoding, either a default is used (if available) or an error message is presented to the user.

Technically, when the font encoding is changed at some point in the document, the definitions of the encoding-specific commands do not change immediately, as that would mean changing a large number of commands on the spot. Instead, these commands have been implemented in such a way that they notice, once they are used, if their current definition is no longer suitable for the font encoding in force. In such a case they call upon their counterparts in the current font encoding to do the actual work.

The set of “font-encoding–specific commands” is not fixed, but rather implicitly defined to be the union of all commands defined for individual font encodings. Thus, by adding new font encodings to LaTeX, new “font-encoding–specific commands” might emerge.

7.11.3. Input encodings

Once the package inputenc is loaded (with or without options), the two declarations DeclareInputText and DeclareInputMath for mapping 8-bit input characters to LICR objects become available. Their usage should be confined to input encoding files (described below), packages, or, if necessary, to the preamble of documents.

These commands take an 8-bit number as their first argument, which can be given as a decimal number (e.g., 239), octal number (e.g., '357), or hexadecimal notation (e.g., "EF). It is advisable to use decimal notation given that the characters ' or " might get special meanings in a language support package, such as shortcuts for accents, thereby preventing octal or hexadecimal notation from working correctly if packages are loaded in the wrong order.

Image

The DeclareInputText command declares character mappings for use in text. Its second argument contains the encoding-specific command (or command sequence), that is the LICR object, to which the character number should be mapped. For instance,

Image

maps the number 239 to the encoding-specific representation of ï, which is \"\i. Input characters declared in this way cannot be used inside mathematical formulas.

Image

If the number represents a character for use in mathematical formulas, then the declaration DeclareInputMath must be used. For example, in the input encoding cp437de (German MS-DOS keyboard),

Image

associates the number 224 with the command \alpha. Note that this declaration would make the key producing this number usable only in math mode, as \alpha is not allowed elsewhere.
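In a document preamble, the two kinds of declarations could be sketched as follows. The choice of latin1 as the base encoding is an assumption made only so that the example is self-contained.

```latex
% The declarations exist only after inputenc has been loaded.
\usepackage[latin1]{inputenc}
% Text mapping for number 239, given in decimal; '357 (octal) and
% "EF (hexadecimal) would denote the same number.
\DeclareInputText{239}{\"\i}
% Math-only mapping for number 224, in the manner described for
% cp437de; this replaces whatever latin1 assigned to that number.
\DeclareInputMath{224}{\alpha}
```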

Image

The \DeclareUnicodeCharacter declaration is available only if the option utf8 is used. It maps Unicode numbers to LICR objects (i.e., characters usable in text). For example,

Image
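Two such mappings are sketched here; the particular characters are chosen for illustration. The first argument is the Unicode code point in hexadecimal, the second the LICR object.

```latex
\usepackage[utf8]{inputenc}
\DeclareUnicodeCharacter{00A3}{\textsterling} % U+00A3 POUND SIGN
\DeclareUnicodeCharacter{011A}{\v E}          % U+011A E WITH CARON
```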

In theory, there should be only a single unique bidirectional mapping between the two name spaces, so that all such declarations could be already automatically made when the utf8 option is selected. In practice, the situation is a little more complicated. For one, it is not sensible to automatically provide the whole table, because that would require a huge amount of TeX’s memory. Additionally, there are many Unicode characters for which no LICR object exists (so far), and conversely many LICR objects have no equivalents in Unicode.1 The inputenc package solves that problem by loading only those Unicode mappings that correspond to the encodings used in a particular document (as far as they are known) and responds to any other request for a Unicode character with a suitable error message. It then becomes your task to either provide the right mapping information or, if necessary, load an additional font encoding.

1 This is perhaps a surprising statement, but simply consider that, for example, accent commands like \" combined with some other character form a new LICR object, such as \"d (whether sensible or not). Many such combinations are not available in Unicode.

As mentioned previously, the input encoding declarations can also be used in packages or in the preamble of a document. For this approach to work, it is important to load the inputenc package first, thereby selecting a suitable encoding. Subsequent input encoding declarations will act as a replacement for (or addition to) those being defined by the present input encoding.

There are two internal commands that you might see when using the inputenc package. The IeC command is used internally by the DeclareInputText declaration in certain circumstances. It ensures that when the encoding-specific command is written to a file, a space following it is not gobbled up when the file is read back in. This processing is handled automatically, so that a user never has to write this command. We mention it here because it might show up in .toc files or other auxiliary files.

The other command, \@tabacckludge, stands for “tabbing accent kludge”. It is (unfortunately) needed because the current version of LaTeX inherited an overloading of the commands \=, \`, and \', which normally denote certain accents (i.e., are encoding-specific commands), but have special meanings inside the tabbing environment. For this reason, mappings that involve any of these accents need to be encoded in a special way. If, for example, you want to map 232 to the character è which has the internal representation \`e, you should not write

Image

but rather

Image

The latter form works everywhere, including inside a tabbing environment.
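The two forms can be sketched as follows, assuming that inputenc has already been loaded and that the lines appear in a preamble (hence the \makeatletter).

```latex
\makeatletter  % \@tabacckludge has an @ in its name
% Problematic: \` has a different meaning inside tabbing.
%\DeclareInputText{232}{\`e}
% Preferred: name the accent by the character following \@tabacckludge.
\DeclareInputText{232}{\@tabacckludge`e}
\makeatother
```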

Mapping to text and/or math

For technical as well as conceptual reasons, TeX makes a very strong distinction between characters usable in text and those usable in math. Except for the visible ASCII characters, commands that produce characters can normally be used in either text or math mode but not in both modes.

Unfortunately, for some keyboard keys it is not clear whether they should be regarded as generating characters for use in math or text. For example, should the key generating the character ± be mapped to \textpm, which is an encoding-specific command and thus can be used only in text, or should it be mapped to \pm and therefore be available only in math?

The early releases of the inputenc package used the following strategy: all keyboard keys available in standard TeX fonts for text (i.e., those encoded in either OT1 or T1) were mapped to encoding-specific text commands, while the remaining keys got mapped to available math commands. But using a strategy solely driven by the availability of glyphs has the disadvantage that only users with a good knowledge of TeX internals could tell immediately whether using a key labeled, say “¾” or “3” would be allowed only in text or only in math.1

1 In the first releases of the inputenc package, “¾” was a text glyph but “³” was a math glyph. Comprehensible?

What can be done to resolve this situation gracefully? The approach of checking for the current mode, as used in babel’s \textormath command,

Image

fails if such a construction is used in a math alignment structure (it selects the wrong part of the conditional and usually ends in an incomprehensible TeX error message). Fixing this problem by starting the above construction with \relax will prevent kerning and ligatures that may otherwise be present in a word. This is, in fact, a problem that is unsolvable in TeX. However, it can be solved if eTeX is used as the base formatter for LaTeX. As eTeX is nowadays available with nearly every TeX system, there are plans to make this program the basis for future maintenance releases of LaTeX.
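The kind of mode test discussed here can be sketched as follows. The name \mypm is hypothetical and used only for illustration; it is not part of inputenc or babel, and the construction inherits the alignment problem just described.

```latex
% Hypothetical helper: pick the math or the text variant depending on the current mode.
% \textpm may require the textcomp package, depending on the LaTeX release.
\newcommand\mypm{\ifmmode \pm \else \textpm \fi}
```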

At the time of this book’s writing, work on an extension of inputenc (based on eTeX) was under way. This proposed extension will automatically support all accessible keyboard characters in text and formulas. Once it becomes officially available, you will be able to comfortably typeset your formulas by simply adding the option math when loading the inputenc package.

Input encoding files for 8-bit encodings

Input encodings are stored in files with the extension .def, where the base name is the name of the input encoding (e.g., latin1.def). Such files should contain only the commands described in the current section.

The file should start with a \ProvidesFile declaration describing the nature of the file. For example:

Image

If there are mappings to encoding-specific commands that might not be available unless additional packages are loaded, one could declare defaults for them using \ProvideTextCommandDefault. For example:

Image

The command \TextSymbolUnavailable, used above, issues a warning indicating that a certain character is not available with the currently used fonts. This can be useful as a default, that is, when such characters are available only if special fonts are loaded and no suitable way exists to fake the characters with existing characters (as was possible for a default for \textonehalf above).

The remaining part of the file should consist only of input encoding declarations using \DeclareInputText or \DeclareInputMath. As mentioned earlier, the use of the latter command, though allowed, is discouraged. No other commands should be used inside an input encoding file; in particular, no commands that prevent reading the file several times (e.g., \newcommand), as the encoding files are often loaded several times in a single document!
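Putting these pieces together, a complete input encoding file might look like the following minimal sketch. The file name myenc.def and the date/version string are assumptions made for illustration; the slot numbers follow ISO Latin-1.

```latex
% myenc.def: hypothetical input encoding file (name and version text are made up)
\ProvidesFile{myenc.def}[2003/01/01 v0.1 Hypothetical input encoding]

% Defaults for LICR objects that may be unavailable unless extra packages are loaded
\ProvideTextCommandDefault{\textonehalf}{\ensuremath{\frac12}}          % faked with math
\ProvideTextCommandDefault{\textcent}{\TextSymbolUnavailable\textcent}  % warning only

% The mappings proper; no \newcommand or similar, since the file may be read repeatedly
\DeclareInputText{162}{\textcent}
\DeclareInputText{189}{\textonehalf}
\DeclareInputText{232}{\@tabacckludge`e}
```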

Input mapping files for UTF-8

As mentioned earlier, the mapping from Unicode to LICR objects is not done in a single large mapping file, but rather organized in a way that enables LaTeX to load only those mappings that are relevant for the font encodings used in the current document. This is done by attempting to load, for each encoding name, a file ⟨name⟩enc.dfu that, if it exists, contains the mapping information for those Unicode characters provided by that particular encoding. Other than a number of \DeclareUnicodeCharacter declarations, such files should contain only a \ProvidesFile line.

As different font encodings often provide to a certain extent the same characters, it is quite common for declarations for the same Unicode character to be found in different .dfu files. It is, therefore, very important that these declarations in different files be identical (which in theory they should be anyway, but...). Otherwise, the declaration loaded last will survive, which may be a different one from document to document.

So anyone who wants to provide a new .dfu file for some encoding that was previously not covered should carefully check the existing definitions in .dfu files for related encodings. Standard files provided with inputenc are guaranteed to have uniform definitions; they are, in fact, all generated from a single list that is suitably split up. A full list of currently existing mappings can be found in the file utf8enc.dfu.
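A mapping file of this kind could look roughly as follows. The encoding name XYZ, the file name, and the version text are hypothetical; the three declarations are written to agree with those used for the same Unicode characters in existing standard files, which is exactly the consistency requirement described above.

```latex
% xyzenc.dfu: hypothetical UTF-8 mapping file for a font encoding called XYZ
\ProvidesFile{xyzenc.dfu}[2003/01/01 v0.1 Hypothetical UTF-8 mappings for XYZ]
\DeclareUnicodeCharacter{00DF}{\ss}               % U+00DF LATIN SMALL LETTER SHARP S
\DeclareUnicodeCharacter{00E4}{\"a}               % U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
\DeclareUnicodeCharacter{00E8}{\@tabacckludge`e}  % U+00E8 LATIN SMALL LETTER E WITH GRAVE
```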

7.11.4. Output encodings

As we learned earlier, output encodings define the mapping from the LICR to the glyphs (or constructs built from glyphs) available in the fonts used for typesetting. These mappings are referenced inside LaTeX by two- or three-letter names (e.g., OT1 and T3). We say that a certain font is in a certain encoding if the mapping corresponds to the positions of the glyphs in the font in question. So what are the exact components of such a mapping?

Characters internally represented by ASCII characters are simply passed on to the font. In other words, TeX uses the ASCII code to select a glyph from the current font. For example, the character “A” with ASCII code 65 will result in typesetting the glyph in position 65 in the current font. This is why LaTeX requires that fonts for text contain all such ASCII letters in their ASCII code positions, as there is no way to interact with this basic TeX mechanism (other than to disable it and do everything “manually”). Thus, for visible ASCII, a one-to-one mapping is implicitly present in all output encodings.

Characters internally represented as sequences of ASCII characters (e.g., “--”) are handled as follows: when the current font is first loaded, TeX is informed that the font contains a number of so-called ligature programs. These define certain character sequences that are not to be typeset directly but rather to be replaced1 by some other glyphs from the font (the exact position of each replacement glyph is font dependent and not important otherwise). For example, when TeX sees “--” in the input (i.e., ASCII code 45 twice), a ligature program might direct it to use the glyph in position 123 instead (which then would hold the glyph “–”). Again, no interaction with this mechanism is possible. Some such ligatures are present for purely aesthetic reasons and may or may not be available in certain fonts (e.g., ff generating the ligature “ﬀ” rather than the two separate letters “ff”). Others are supposed to be implemented for a certain encoding (e.g., “---” producing an em dash).

1 The actions carried out by a font ligature program can, in fact, be far more complex, but for the purpose of our discussion here this simplified view is appropriate. For an in-depth discussion, see Knuth’s paper on virtual fonts [91].

Nevertheless, the bulk of the internal character representation consists of “font-encoding–specific” commands. They are mapped using the declarations described below. All declarations have the same structure in their first two arguments: the font-encoding–specific command (or the first component of it, if it is a command sequence), followed by the name of the encoding. Any remaining arguments will depend on the type of declaration.

Thus, an encoding XYZ is defined by a bunch of declarations all having the name XYZ as their second argument. Of course, to be of any use, some fonts must be encoded in that encoding. In fact, the development of font encodings is normally done the other way around—namely, someone starts with an existing font and then provides appropriate declarations for using it. This collection of declarations is then given a suitable name, such as OT1. In the next section, we will take the font ecrm1000, shown in Table 7.32 on the facing page, whose font encoding is called T1 in LaTeX, and build appropriate declarations to access the glyphs from a font encoded in this way. The blue characters in this table are those that have to be present in the same positions in every text encoding, as they are transparently passed through TeX.

Image
Image

Table 7.32. Glyph chart for a T1-encoded font (ecrm1000)

Output encoding files

Like input encoding files, output encoding files are identified by the extension .def. However, the base name of the file is slightly more structured: the name of the encoding in lowercase letters, followed by the letters enc (e.g., t1enc.def for the T1 encoding).

Such files should contain only the declarations described in the current section. As output encoding files might be read several times by LaTeX, it is particularly important to adhere to this rule strictly and to refrain from using, for example, \newcommand, which prevents reading such a file multiple times!

For identification purposes an output encoding file should start with a \ProvidesFile declaration describing the nature of the file. For example:

Image

To be able to declare any encoding-specific commands for a particular encoding, we first have to make this encoding known to LaTeX. This is achieved via the \DeclareFontEncoding declaration. At this point it is also useful to declare the default substitution rules for the encoding with the help of the command \DeclareFontSubstitution; both declarations are described in detail in Section 7.10.5 starting on page 430.

Image
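For the T1 encoding, the opening lines of such a file can be sketched as follows. The date and version text in \ProvidesFile are illustrative only, and the substitution defaults shown (Computer Modern Roman, medium, upright) are an assumption about a typical setup.

```latex
% Opening lines of an output encoding file for T1 (identification text is illustrative)
\ProvidesFile{t1enc.def}[2003/01/01 v0.1 Sketch of a T1 encoding file]
\DeclareFontEncoding{T1}{}{}              % make the encoding name known to LaTeX
\DeclareFontSubstitution{T1}{cmr}{m}{n}   % family, series, and shape used for substitution
```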

Having introduced the T1 encoding in this way to LaTeX, we can now proceed with declaring how font-encoding–specific commands should behave in that encoding.

Image

Perhaps the simplest form of declaration is the one for text symbols, where the internal representation can be directly mapped to a single glyph in the target font. This is handled by the \DeclareTextSymbol declaration, whose third argument (the font position) can be given as a decimal, hexadecimal, or octal number. For example,

Image

declare that the font-encoding–specific commands \ss, \AE, and \ae should be mapped to the font (decimal) positions 255, 198, and 230, respectively, in a T1-encoded font. As mentioned earlier, it is safest to use decimal notation in such declarations, even though octal or hexadecimal values are often easier to identify in glyph charts like the one on the previous page. Mixing them as we did in the example above is certainly bad style. All in all, there are 49 such declarations for the T1 encoding.
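Written consistently in decimal notation, the three declarations just described would read as in the following sketch; the comments give the equivalent hexadecimal and octal values in TeX notation for comparison with a glyph chart.

```latex
% Text symbol declarations for T1, all positions in decimal
\DeclareTextSymbol{\ss}{T1}{255}   % = "FF hexadecimal = '377 octal
\DeclareTextSymbol{\AE}{T1}{198}   % = "C6 hexadecimal = '306 octal
\DeclareTextSymbol{\ae}{T1}{230}   % = "E6 hexadecimal = '346 octal
```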

Image

Often fonts contain diacritical marks as individual glyphs to allow the production of accented characters by combining such a diacritical mark with some other glyph. Such accents (as long as they are to be placed on top of other glyphs) are declared using the \DeclareTextAccent command; the third argument slot is the position of the diacritical mark in the font. For example,

Image

defines the “umlaut” accent. From that point onward, an internal representation such as \"a has the following meaning in the T1 output encoding: typeset “ä” by placing the accent in position 4 over the glyph in position 97 (the ASCII code of the character a). In fact, such a declaration implicitly defines a huge range of internal character representations, that is, anything of the type \" base-glyph, where base-glyph is something defined via \DeclareTextSymbol or any ASCII character belonging to the LICR, such as “a”.

Even those combinations that do not make much sense, such as \"\P (i.e., a pilcrow sign with an umlaut), conceptually become members of the set of font-encoding–specific commands in this way. There are a total of 11 such declarations in the T1 encoding.
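Based on the positions given above (accent glyph in position 4 of a T1-encoded font), the umlaut declaration can be sketched as:

```latex
% The umlaut accent glyph sits in position 4 of a T1-encoded font
\DeclareTextAccent{\"}{T1}{4}
```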

Image

The glyph chart on page 449 contains a large number of accented characters as individual glyphs, for example, “ä” in position '344 octal. Thus, in T1 the encoding-specific command \"a should not result in placing an accent over the character “a” but instead should directly access the glyph in that position of the font. This is achieved by the declaration

Image

which states that the encoding-specific command \"a results in typesetting the glyph 228, thereby disabling the accent declaration above. For all other encoding-specific commands starting with \", the accent declaration remains in place. For example, \"b will produce a “b̈” by placing an accent over the base character b.

The third argument, simple-LICR-object, should be a single letter, such as “a”, or a single command, such as \j or \oe. There are 110 such composites declared for the T1 encoding.
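Using the glyph position mentioned above (228 decimal, i.e., '344 octal), the composite declaration for “ä” can be sketched as:

```latex
% \"a selects glyph 228 directly instead of building it from accent and base letter
\DeclareTextComposite{\"}{T1}{a}{228}
```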

Image

Although not used for the T1 encoding, there also exists a more general variant of \DeclareTextComposite that allows arbitrary code in place of a slot position. This is, for example, used in the OT1 encoding to lower the ring accent over the “A” compared to the way it would be typeset with TeX’s accent primitive. The accents over the “i” are also implemented using this form of declaration:

Image
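For the accented “i” in OT1, declarations of this kind can be sketched as below. The variant is named \DeclareTextCompositeCommand; only two of the accents are shown, and the code argument simply applies the accent to the dotless \i.

```latex
% OT1 fonts have no precomposed accented i, so the accent is placed over the dotless \i
\DeclareTextCompositeCommand{\'}{OT1}{i}{\'\i}
\DeclareTextCompositeCommand{\^}{OT1}{i}{\^\i}
```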

What have we not covered for the T1 encoding? A number of diacritical marks are not placed on top of other characters but are placed somewhere below them. There is no special declaration form for such marks, as the actual placement usually involves low-level TeX code. Instead, the generic \DeclareTextCommand declaration can be used for this purpose.

Image

For example, the “underbar” accent \b in the T1 encoding is defined with the following wonderful piece of prose:

Image

Without going into detail about what the code precisely means, we can see that \DeclareTextCommand is similar in structure to \newcommand. That is, it has an optional num argument denoting the number of arguments (one here), a second optional default argument (not present here), and a final mandatory argument containing the code in which it is possible to refer to the argument(s) using #1, #2, and so on. T1 has four such declarations, for \b, \c, \d, and \k.

\DeclareTextCommand can also be used to build font-encoding–specific commands consisting of a single control sequence. In this case it is used without optional argument, thus defining a command with zero arguments. For example, in T1 there is no glyph for a ‰ sign, but there exists a strange little “0” in position '30, which, if placed directly behind a %, will give the appropriate glyph. Thus, we can write

Image
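In source form, such a declaration can be sketched as follows, assuming the command name \textperthousand for the ‰ sign; position '30 octal is 24 in decimal.

```latex
% Per-thousand sign in T1: a percent sign followed by the small zero from position 24 ('30)
\DeclareTextCommand{\textperthousand}{T1}{\%\char 24 }
```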

This discussion has now covered all commands needed to declare the font-encoding–specific commands for a new encoding. As mentioned earlier, only these commands should appear in encoding definition files.

Output encoding defaults

What happens if an encoding-specific command is used for which there is no declaration in the current font encoding? In that case one of two things might happen: either LaTeX has a default definition for the LICR object, in which case this default is used, or the user gets an error message stating that the requested LICR object is unavailable in the current encoding. There are a number of ways to set up defaults for LICR objects.

Image

The \DeclareTextCommandDefault command provides the default definition for an LICR object that is to be used whenever there is no specific setting for the object in the current encoding. Such default definitions can, for example, fake a certain character. For instance, \textregistered has a default definition in which the character is built from two others, like this:

Image
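A faked default of this kind might look like the following sketch, which circles a small-caps “r”. The exact code used by the LaTeX kernel may differ between releases, so this should be read as an illustration of the technique rather than as the kernel’s definition.

```latex
% Default used in every encoding that has no definition of its own:
% build the registered sign from a circle and a small-caps r
\DeclareTextCommandDefault{\textregistered}{\textcircled{\scshape r}}
```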

Technically, the default definitions are stored as an encoding with the name “?”. While you should not rely on this fact, as the implementation might change in the future, it means that you cannot declare an encoding with this name.

Image

In most cases, a default definition does not require coding but simply directs LaTeX to pick up the character from some encoding in which it is known to exist. The textcomp package, for example, consists of a large number of default declarations that all point to the TS1 encoding. Consider the following \DeclareTextSymbolDefault declaration:

Image

The \DeclareTextSymbolDefault command can, in fact, be used to define the default for any LICR object without arguments, not just those that have been declared with the \DeclareTextSymbol command in other encodings.

Image

A similar declaration exists for LICR objects that take one argument, such as accents (which gave this declaration its name). This form is again usable for any LICR object with one argument. The LaTeX kernel, for example, contains quite a number of declarations of the type:

Image

This means that if the \" is not defined in the current encoding, then use the one from an OT1-encoded font. Likewise, if you need a tie accent, pick up one from OML1 if nothing better is available.

1 OML is a math font encoding, but it contains this text accent mark.
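Default declarations of the two kinds just discussed can be sketched as follows. The first line is modeled on the textcomp approach and uses \texteuro merely as an example of a symbol fetched from TS1; the other two correspond to the umlaut and tie accent cases described above, with \t denoting the tie accent.

```latex
% Symbol without arguments: take it from a TS1-encoded font by default
\DeclareTextSymbolDefault{\texteuro}{TS1}

% One-argument LICR objects (accents): fall back to OT1 and OML, respectively
\DeclareTextAccentDefault{\"}{OT1}
\DeclareTextAccentDefault{\t}{OML}
```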

Image

With the \ProvideTextCommandDefault declaration a different kind of default can be “provided”. As the name suggests, it does the same job as the declaration \DeclareTextCommandDefault, except that the default is provided only if no default has been defined before. This is mainly used in input encoding files to provide some sort of trivial defaults for unusual LICR objects. For example:

Image

Packages like textcomp can then replace such definitions with declarations pointing to real glyphs. Using \Provide... instead of \Declare... ensures that a better default is not accidentally overwritten if the input encoding file is read.

Image

In some cases an existing declaration needs to be removed to ensure that a default declaration is used instead. This task can be carried out by the \UndeclareTextCommand command. For example, the textcomp package removes the definitions of \textdollar and \textsterling from the OT1 encoding because not every OT1-encoded font actually has these symbols.1

1 This is one of the deficiencies of the old TeX encodings; besides missing accented glyphs, they are not even identical from one font to another.

Image

Without this removal, the new default declarations to pick up the symbols from TS1 would not be used for fonts encoded with OT1.
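The removal described for textcomp can be sketched as:

```latex
% Drop the OT1-specific definitions so that the defaults (pointing to TS1) take effect
\UndeclareTextCommand{\textsterling}{OT1}
\UndeclareTextCommand{\textdollar}{OT1}
```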

Image

The action hidden behind the declarations \DeclareTextSymbolDefault and \DeclareTextAccentDefault is also available for direct use. Assume, for example, that the current encoding is U. In that case,

Image

has the same effect as entering the code below. Note in particular that the “a” is typeset in encoding U—only the accent is taken from the other encoding.

Image
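Direct use of this mechanism relies on the commands \UseTextSymbol and \UseTextAccent. A minimal sketch, assuming that OT1 is the encoding from which the glyphs are borrowed:

```latex
% With U as the current encoding: fetch \ss and the acute accent from OT1
\UseTextSymbol{OT1}{\ss}
\UseTextAccent{OT1}{\'}{a}   % the base letter "a" is still typeset in encoding U
```
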

A listing of standard LICR objects

Table 7.33 provides a comprehensive overview of the LaTeX internal representations available with the three major encodings for Latin-based languages: OT1 (the original TeX text font encoding), T1 (the LaTeX standard encoding, also known as Cork encoding), and LY1 (an alternate 8-bit encoding proposed by Y&Y). In addition, it shows all LICR objects declared by TS1 (the LaTeX standard text symbol encoding) provided by loading the textcomp package.

Image

Table 7.33. Standard LICR objects

The first column of the table shows the LICR object names alphabetically sorted, indicating which LICR objects act like accents. The second column shows a glyph representation of the object.

The third column describes whether the object has a default declaration. If an encoding is listed, it means that by default the glyph is being fetched from a suitable font in that encoding; constr. means that the default is produced from low-level TeX code; if the column is empty it means that no default is defined for this LICR object. In the last case a “Symbol unavailable” error is returned when you use it in an encoding for which it has no explicit definition. If the object is an alias for some other LICR object, we list the alternative name in this column.

Columns four through seven show whether an object is available in the given encoding. Here Image means that the object is natively available (as a glyph) in fonts with that encoding, Image means that it is available through the default for all encodings, and constr. means that it is generated from several glyphs, accent marks, or other elements. If the default is fetched from TS1, the LICR object is available only if the textcomp package is loaded.

7.12. Compatibility packages for very old documents

The font interface in LaTeX changed from a fixed font structure (LaTeX 2.09 prior to 1990) to a flexible system (LaTeX2ε with NFSS version 2 integrated in 1994). During the years 1990–1993 NFSS version 1 was widely used in Europe. Although the differences between versions 1 and 2 have not been that enormous, they nevertheless make it impossible to run documents from that time successfully through today’s LaTeX. For this reason a number of compatibility packages have been developed to help in processing documents written for LaTeX 2.09 with or without NFSS 1.

7.12.1. oldlfont, rawfonts, newlfont—Processing old documents

Backward compatibility to 1993 and earlier

As we have seen, NFSS (and thus LaTeX2ε) differs from LaTeX 2.09 in several ways in its treatment of font commands. This difference is most noticeable in math formulas, where commands like \bfseries are not supported. Nevertheless, it is a very simple matter to typeset older documents with NFSS.

If you merely want to reprint a document, LaTeX will see the \documentstyle command and automatically switch to compatibility mode, thereby emulating the old font selection mechanism of LaTeX 2.09 as described in the first edition of the LaTeX Manual. Alternatively, you can load the oldlfont package after the \documentclass command. If you do so, all old font-selecting commands will be defined, font-changing commands cancel each other, and all of these commands can be used in mathematical formulas.
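The second route can be sketched as follows; the document body is a made-up fragment that only illustrates the behavior described above.

```latex
\documentclass{article}
\usepackage{oldlfont}   % emulate the LaTeX 2.09 font selection commands
\begin{document}
{\bf bold {\it then italic only}}   % \it cancels \bf, as in LaTeX 2.09
$ {\bf x} + {\it y} $               % the old commands also work in formulas
\end{document}
```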

Some old documents refer to LaTeX 2.09 internal font commands such as \twlrm or \nintt. These commands now generate error messages, because they are no longer defined (not even in compatibility mode). One reason they are not supported is that they were never available on all installations. To process a document containing such explicit font-changing commands, you have to define them in the preamble using the commands described in Section 7.9. For example, for the above commands, it would be sufficient to add the following definitions to the preamble:

Image

A package exists to assist you in this task: if you load the rawfonts package with the options only, twlrm, and nintt, it will make the above declarations for you. If you load it without any option, it will define all LaTeX 2.09 hard-wired font commands for you.
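Loading the package with the options named above can be sketched as follows; the document body is a made-up fragment.

```latex
\documentclass{article}
\usepackage[only,twlrm,nintt]{rawfonts}   % define just \twlrm and \nintt
\begin{document}
{\twlrm Roman at 12pt} and {\nintt typewriter at 9pt}
\end{document}
```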

Reusing parts of documents also is very simple: just paste them into the new document and watch what happens. There is a good chance that LaTeX will happily process the old document fragment and, if not, it will explicitly inform you about the places where you have to change your source. For example, it will tell you where you have to change occurrences of \it, \sf, and similar commands in formulas to the corresponding math alphabet identifier commands \mathit, \mathsf, and so on.

Backward compatibility with the first release of NFSS

In the first release of NFSS, the two-letter font-changing commands were redefined to modify individual attributes only. For example, \sf and \it behaved just like the NFSS2 commands \sffamily and \itshape, respectively. If you re-process an old document that was written for this convention, load the package newlfont in your document preamble to reinstate it.

7.12.2. latexsym—Providing symbols from LaTeX 2.09 lasy fonts

Eleven math symbols provided by LaTeX 2.09 are no longer defined in the base set-up of NFSS:

7-12-1
Image

If you want to use any of these symbols, load the latexsym package in your document. These symbols are also made available if you load the amsfonts or the amssymb package; see Section 8.9.
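A minimal sketch of a document using a few of these symbols (the formula itself is made up for illustration):

```latex
\documentclass{article}
\usepackage{latexsym}   % restores the eleven lasy-based math symbols
\begin{document}
$ A \lhd B \quad \Box \quad x \leadsto y \quad \mho $
\end{document}
```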
