Formatting Objects

At the beginning of this chapter, it was stated that XSL is really a two-step process. We've examined the transformation step closely. The second step is formatting, or page layout, and is provided by formatting objects. In this section, we examine how formatting objects work and how we can use them to lay out an XML document.

XSL Formatting Objects Specification

This chapter is based on the April 1999 release of the XSL Formatting Objects specification. Currently, a partial implementation of the specification exists for rendering to PDF from xml.apache.org/FOP. It's likely that the syntax will have changed slightly by the time this book reaches print.


Formatting Objects Overview

Formatting objects are based on and extend both CSS2 and DSSSL. Perhaps the largest difference between XSL Formatting Objects, often referred to as FO objects or just FO, and other page layout mechanisms is the way FO objects are used. FO objects don't define layout per se but, rather, they specify constraints under which objects are positioned. The exact layout of any object is handled by the formatter itself.

For the most part, each FO object is based on a similar CSS construct. As of the most recent XSL specification, however, CSS constructs and FO objects are similar in name and function only; an exact mapping between the two has not been completed.

Using the Formatting Objects Processor (FOP)

Currently, or at least when this chapter was completed, there was a single partial implementation of the XSL Formatting Objects specification, FOP. FOP is available from xml.apache.org/fop and was donated by J. Tauber of http://www.jtauber.com. You will need to download the latest version of FOP (at last report, fop_bin_0_12_1.jar) and add it to your CLASSPATH. For convenience, v0.12.1 is available on the CD-ROM.

You can run FOP in one of several ways. The most common is via the command line using a FOP or FOB file, which is just an XML file containing formatting objects. The format of the FOP command is

java [-Dorg.xml.sax.parser=parser.class] org.apache.fop.apps.CommandLine fo.input output.pdf

where fo.input is any file containing formatting objects and output.pdf is the resulting PDF. In addition, you can specify the SAX parser to use with FOP by passing the -Dorg.xml.sax.parser switch and setting it to an appropriate parser class. For example, to use FOP to process simple.fob to simple.pdf using James Clark's SAX parser, assuming it's in the CLASSPATH, you would execute the following:

java -Dorg.xml.sax.parser=com.jclark.xml.sax.Driver org.apache.fop.apps.CommandLine simple.fob simple.pdf

The previous method assumes that you ran XT or some other XSL Transform engine on an original XML document and produced a formatting objects–based file using a stylesheet. You can skip this intermediate step by executing the following command:

java org.apache.fop.apps.XTCommandLine file.xml stylesheet.xsl output.pdf
					

where file.xml is an input XML document, stylesheet.xsl is a stylesheet that transforms file.xml into the formatting objects namespace, and output.pdf is the result rendered as a PDF.
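When FOP is launched from a build script or from another Java program, the two invocations shown above can be assembled programmatically. The following is only a minimal sketch: the class FopCommands and its method names are hypothetical and are not part of the FOP distribution. Only the main-class names and the argument order are taken from the commands above. The sketch builds the argument lists and prints them; actually launching them (for example, with ProcessBuilder) additionally assumes that the FOP JAR is on the CLASSPATH.

```java
import java.util.ArrayList;
import java.util.List;

public class FopCommands {

    /**
     * Builds the argument list for rendering an FO file to PDF.
     * saxDriver may be null, in which case no -D switch is added.
     */
    public static List<String> foToPdf(String saxDriver, String foFile, String pdfFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        if (saxDriver != null && !saxDriver.isEmpty()) {
            cmd.add("-Dorg.xml.sax.parser=" + saxDriver);
        }
        cmd.add("org.apache.fop.apps.CommandLine");
        cmd.add(foFile);
        cmd.add(pdfFile);
        return cmd;
    }

    /** Builds the argument list for the combined transform-and-render step. */
    public static List<String> xmlToPdf(String xmlFile, String xslFile, String pdfFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("org.apache.fop.apps.XTCommandLine");
        cmd.add(xmlFile);
        cmd.add(xslFile);
        cmd.add(pdfFile);
        return cmd;
    }

    public static void main(String[] args) {
        // Print the commands instead of running them, so the sketch works without FOP installed.
        System.out.println(String.join(" ",
                foToPdf("com.jclark.xml.sax.Driver", "simple.fob", "simple.pdf")));
        System.out.println(String.join(" ",
                xmlToPdf("file.xml", "stylesheet.xsl", "output.pdf")));
    }
}
```

Passing the resulting list to new ProcessBuilder(list).inheritIO().start() would run the command in a child JVM.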

The Apache FOP distribution comes with one additional tool, which uses the AWT toolkit to render an FO file to the screen. To execute AWTCommandLine, type the following command:

java org.apache.fop.apps.AWTCommandLine fo.file
					

where fo.file contains formatting objects. Figure 5.6 shows a screen shot of running AWTCommandLine on a simple formatting objects file.

Figure 5.6. AWTCommandLine on simple.fob.


With the ability to actually process formatting objects, let's now look at Areas and how they affect the page layout process.

FOP and the XSL Formatting Objects Specification

No complete implementation of FO objects existed at the time of this writing. While FOP implements about 28 of the FO elements and 48 FO properties, there are well over 50 elements and more than 100 properties in the XSL specification. See the xml.apache.org/fop Web page for a complete list of what is supported in FOP.


Areas

XSL formatting objects are based on a page model in which pages are broken into five regions: region-before, region-after, region-start, region-end, and region-body. Figure 5.7 shows a typical layout of these five regions from a Western language perspective. Writing modes define the actual layout of each region. Many of the readers of this text might be English speaking and expect text to be displayed in a specific fashion—left to right and top to bottom. Using the Western writing mode, region-before corresponds to the header, region-after the footer, region-start the left margin, region-end the right margin, with region-body representing the space left over. Other writing modes correspond to other written languages. Table 5.8 shows many of the common writing modes with common languages that use them and a description of their layouts and reading rules. A number of less common modes also exist that correspond to no specific language and are used in various publishing applications.

Figure 5.7. Western page layout.


Table 5.8. Writing Modes and Screen Layouts
Mode     Common Language                 Description
lr-tb    Latin/Western                   Read left to right, top to bottom.
rl-tb    Arabic/Hebrew                   Read right to left, top to bottom.
tb-rl    Chinese/Japanese                Read top to bottom, right to left.
tb-lr    Mongolian/Western Advertising   Read top to bottom, left to right.
bt-lr    None                            Read bottom to top, left to right.
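Each mode name in Table 5.8 is simply two direction codes joined by a hyphen: the first gives the direction in which text runs, the second the direction in which lines stack. The sketch below decodes a mode name back into the description column of the table. The class WritingModes and its describe method are hypothetical names chosen for illustration and are not part of any FO API.

```java
import java.util.Map;

public class WritingModes {

    // The four direction codes that appear in the mode names of Table 5.8.
    private static final Map<String, String> DIRECTIONS = Map.of(
            "lr", "left to right",
            "rl", "right to left",
            "tb", "top to bottom",
            "bt", "bottom to top");

    private static boolean isHorizontal(String code) {
        return code.equals("lr") || code.equals("rl");
    }

    /** Turns a mode name such as "lr-tb" into a reading rule. */
    public static String describe(String mode) {
        String[] parts = mode.split("-");
        if (parts.length != 2
                || !DIRECTIONS.containsKey(parts[0])
                || !DIRECTIONS.containsKey(parts[1])
                || isHorizontal(parts[0]) == isHorizontal(parts[1])) {
            // A valid mode pairs one horizontal code with one vertical code.
            throw new IllegalArgumentException("Unknown writing mode: " + mode);
        }
        return "Read " + DIRECTIONS.get(parts[0]) + ", " + DIRECTIONS.get(parts[1]) + ".";
    }

    public static void main(String[] args) {
        for (String mode : new String[] {"lr-tb", "rl-tb", "tb-rl", "tb-lr", "bt-lr"}) {
            System.out.println(mode + ": " + describe(mode));
        }
    }
}
```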

In addition to the five main regions, FO objects are divided into areas. Areas are the basic building blocks of page layout. The XSL Formatting Objects specification defines four kinds of rectangular areas: area containers, block areas, line areas, and inline areas. Area containers are constrained by their writing mode and have a number of common properties, such as borders and padding. The four main areas are

  • Area containers: Area containers are the highest-level areas in the Formatting Objects specification. They reserve space and hold content, can be precisely positioned, have borders, and can be padded. Hierarchically, an area container can contain other area containers, block areas, and display spaces. In a typical book model, the highest-level container is the page. Pages are further broken down into headers, footers, and left and right margins, with the main body in the area in between. Examples of area containers are region-start, region-end, region-before, region-after, and region-body.

  • Block areas: Block areas are the next level of area in the FO specification. They typically represent paragraphs or lists and can contain nested block areas, line areas, and display spaces. The bullet points in this list would be represented using a list-block, a kind of block area in which each item is preceded by a number, character, or glyph. Within a given area container, block areas are "stacked" one after another in the direction set by the writing mode. Stacking is the process of placing areas sequentially, one after another, within their bounding area. Block areas grow and shrink as required to contain the text and other areas within them. Examples of block areas are block, display-rule, display-graphic, display-link, and list-block.

  • Line areas: Line areas are placed within block areas and contain what we would normally consider lines of text. Line areas contain inline areas and inline spaces. Like block areas, line areas are "stacked" one after another, shrinking or growing as required to contain their content. No formatting objects correspond directly to line areas; the formatter creates them as required to contain inline areas. Unlike the other areas, line areas do not have borders or padding.

  • Inline areas: Inline areas are the lowest-level areas in the Formatting Objects specification and typically represent characters or "glyphs." Inline areas are always stacked and may be separated by inline spaces. An example of an inline area is a glyph-area representing a single character or glyph within a given language. Note that glyph-areas are atomic elements and cannot contain other areas.

Each of the four areas, with the exception of line areas, has two common properties: border and padding. Borders can have a before, after, start, and end color, as well as a style and a width. Padding can be specified to precede or follow a given area, or to sit above or below it.
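The containment rules described in the list above can be summarized in a small model. This is a simplified sketch rather than anything defined by the specification: the class AreaModel, the Kind enum, and both method names are hypothetical, and inline areas are treated as leaves because the only inline example given here, the glyph-area, is atomic.

```java
import java.util.EnumSet;
import java.util.Set;

public class AreaModel {

    /** The four area kinds plus the two kinds of space that can sit between them. */
    public enum Kind {
        AREA_CONTAINER, BLOCK_AREA, LINE_AREA, INLINE_AREA, DISPLAY_SPACE, INLINE_SPACE
    }

    /** Returns what a given kind of area may directly contain, per the descriptions above. */
    public static Set<Kind> allowedChildren(Kind parent) {
        switch (parent) {
            case AREA_CONTAINER:
                return EnumSet.of(Kind.AREA_CONTAINER, Kind.BLOCK_AREA, Kind.DISPLAY_SPACE);
            case BLOCK_AREA:
                return EnumSet.of(Kind.BLOCK_AREA, Kind.LINE_AREA, Kind.DISPLAY_SPACE);
            case LINE_AREA:
                return EnumSet.of(Kind.INLINE_AREA, Kind.INLINE_SPACE);
            default:
                // Simplification: inline areas (glyphs) and spaces are treated as leaves.
                return EnumSet.noneOf(Kind.class);
        }
    }

    /** Every area except the line area carries border and padding properties. */
    public static boolean hasBorderAndPadding(Kind kind) {
        return kind == Kind.AREA_CONTAINER
                || kind == Kind.BLOCK_AREA
                || kind == Kind.INLINE_AREA;
    }

    public static void main(String[] args) {
        for (Kind kind : Kind.values()) {
            System.out.println(kind + " may contain " + allowedChildren(kind));
        }
    }
}
```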

The FO Namespace

Before we go on and look at the anatomy of a FO Object file, we need to examine the FO namespace.

FOP and Java

Formatting is an important aspect of XML. However, because this is a book on Java and XML, we will not go into elaborate detail on XSL formatting objects but rather give a sufficient introduction such that the reader can progress on his or her own.


The FO namespace is defined by xmlns:fo="http://www.w3.org/xsl/format/1.0". However, for the remainder of this chapter, we will be using the namespace xmlns:fo="http://www.w3.org/1999/XSL/Format" because the most current namespace is not yet supported by FOP.

There are more than 50 formatting objects defined by the April 1999 specification, and perhaps more will be added over time. In addition, there are over 100 properties that can be applied to FO objects. Table 5.9 lists the elements currently supported by FOP, in the FO namespace, in roughly the order they are defined in the XSL specification.

Table 5.9. The FO Namespace
Namespace Elements
fo:root                             fo:layout-master-set
fo:simple-page-master               fo:region-body
fo:region-before                    fo:region-after
fo:page-sequence                    fo:sequence-specification
fo:sequence-specifier-single        fo:sequence-specifier-repeating
fo:sequence-specifier-alternating   fo:flow
fo:static-content                   fo:block
fo:list-block                       fo:list-item
fo:list-label                       fo:list-body
fo:page-number                      fo:display-sequence
fo:inline-sequence                  fo:display-rule
fo:display-graphic                  fo:table
fo:table-column                     fo:table-body
fo:table-row                        fo:table-cell

A number of properties apply to the FO namespace elements. Table 5.10 lists the current set of properties supported by FOP.

Table 5.10. FO Namespace Properties
Namespace Properties
end-indent              page-master-name
page-master-first       page-master-repeating
page-master-odd         page-master-even
margin-top              margin-bottom
margin-left             margin-right
extent                  page-width
page-height             flow-name
font-family             font-style
font-weight             font-size
line-height             text-align
text-align-last         space-before.optimum
space-after.optimum     provisional-distance-between-starts
provisional-label-separation   rule-thickness
color                   wrap-option
white-space-treatment   break-before
break-after             text-indent
href                    column-width
background-color        padding-top
padding-left            padding-bottom
padding-right

Anatomy of a FO Document

FO documents are nothing more than normal XML documents, containing a root element and some number of child elements. They need to follow XML syntax and well-formedness requirements, as well as use the FO namespace. Listing 5.28 shows a simple FO file with a common extension, .fob or .fop, although .xml would have been appropriate as well. We can break this file down and see that it is made up of the following parts:

  1. Line 2: fo:root. An fo:root root element encapsulates the entire document. fo:root must carry the xmlns:fo namespace declaration. In production environments, this is going to be http://www.w3.org/xsl/format/1.0. However, for FOP we need to use the older http://www.w3.org/1999/XSL/Format namespace. The root element contains a fo:layout-master-set and a fo:page-sequence element.

  2. Line 3: fo:layout-master-set. Layout master sets contain one or more fo:simple-page-master elements, each named using the page-master-name=<pagename> property. Simple page masters define the margins, extents and, in general, the layout of a given page. You must have at least one fo:simple-page-master, but you can certainly have more. Books often have first, last, odd, and even specifications, each with different margins and layout.

  3. Lines 11–13: fo:region-before, fo:region-body, and fo:region-after. The fo:region-* elements, which in the specification also include fo:region-start and fo:region-end, define what regions are actually contained within our page. In this example, we choose to have three regions (because that's all FOP supports), each of which will be populated later on in the document.

  4. Line 16: fo:page-sequence. Every FO document has a fo:page-sequence element that defines the content of the pages. Each fo:page-sequence element starts with a fo:sequence-specification element that defines what page master to use for this layout. There may be several fo:sequence-specifier-* children within a sequence specification, each representing a different portion of the result.

  5. Lines 20, 32, and 42: fo:static-content and fo:flow. These elements define the actual content of the region-before, region-after, and region-body portions of the document. The fo:flow element, starting on line 42, represents the running content of the result and normally spans multiple pages. The two fo:static-content elements, starting on lines 20 and 32, represent content that will be placed on all pages. In Listing 5.28, we specify a header and footer for the pages.

Code Listing 5.28. Simple.fob—A Simple FO Object File
 1: <?xml version="1.0" encoding="utf-8"?>
 2: <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
 3:   <fo:layout-master-set>
 4:     <fo:simple-page-master page-master-name="cheap"
 5:         height="8.5in"
 6:         width="11in"
 7:         margin-top="0.5in"
 8:         margin-bottom="0.5in"
 9:         margin-left="1in"
10:         margin-right="1in">
11:       <fo:region-before extent="1in"/>
12:       <fo:region-body margin-top="0.5in"/>
13:       <fo:region-after extent=".75in"/>
14:     </fo:simple-page-master>
15:   </fo:layout-master-set>
16:   <fo:page-sequence>
17:     <fo:sequence-specification>
18:       <fo:sequence-specifier-single page-master-name="cheap"/>
19:     </fo:sequence-specification>
20:     <fo:static-content flow-name="xsl-before">
21:       <fo:block font-size="18pt"
22:           font-family="sans-serif"
23:           line-height="24pt"
24:           background-color="black"
25:           color="white"
26:           space-after.optimum="15pt"
27:           text-align="centered"
28:           padding-top="3pt">
29:         Header in a black box
30:       </fo:block>
31:     </fo:static-content>
32:     <fo:static-content flow-name="xsl-after">
33:       <fo:block font-size="18pt"
34:           font-family="sans-serif"
35:           line-height="24pt"
36:           space-after.optimum="15pt"
37:           text-align="centered"
38:           padding-top="3pt">
39:         page:<fo:page-number/>
40:       </fo:block>
41:     </fo:static-content>
42:     <fo:flow flow-name="xsl-body">
43:       <fo:block>
44:         The Gosselin Grp, Inc.
45:         Federal Colonial
46:         149900
47:         106 Central Ave
48:         1865
49:         5
50:         4
51:         80,000
52:         A Prime Main Street Property!
53:       </fo:block>
54:     </fo:flow>
55:   </fo:page-sequence>
56: </fo:root>
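Because an FO file is ordinary XML, any namespace-aware XML parser can confirm that a file is well formed and that its root element is fo:root in the expected namespace before the file is handed to a formatter. The following sketch uses the standard JAXP DOM API for that check. The class FoRootCheck and its method are hypothetical names, the namespace constant is the one used for FOP in this chapter, and the check says nothing about whether the rest of the document is valid FO.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class FoRootCheck {

    /** The FO namespace used with FOP in this chapter. */
    public static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    /**
     * Returns true if the text is well-formed XML whose root element is
     * "root" in the FO namespace; returns false on any parse failure.
     */
    public static boolean hasFoRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required so prefixes resolve to namespace URIs
            DocumentBuilder builder = factory.newDocumentBuilder();
            Element root = builder.parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return FO_NS.equals(root.getNamespaceURI())
                    && "root".equals(root.getLocalName());
        } catch (Exception e) {
            // Not well formed (or the parser could not be configured).
            return false;
        }
    }

    public static void main(String[] args) {
        String fo = "<fo:root xmlns:fo=\"" + FO_NS + "\">"
                + "<fo:layout-master-set/></fo:root>";
        System.out.println("FO root found: " + hasFoRoot(fo));
    }
}
```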

Listing 5.28 was rather simplistic in that all information was hard-coded into the FO document. In real situations, an XSL transform, via a stylesheet, would have been applied to yield the FO document. Listing 5.29 shows a stylesheet that takes as input the REListing.xml document and creates the output shown in Figure 5.8. We will refer back to this listing periodically during the remainder of this section.

Figure 5.8. Result of Applying FO1.xsl to REListing.xml.


Code Listing 5.29. fo1.xsl—Stylesheet to Convert FO Objects
 1: <?xml version="1.0"?>
 2: <xsl:stylesheet
 3:     xmlns:xsl="http://www.w3.org/XSL/Transform/1.0"
 4:     xmlns:fo="http://www.w3.org/1999/XSL/Format"
 5:     indent-result="no" default-space="strip" > <!-- result-ns="fo">-->
 6:
 7:
 8:   <xsl:template match="/">
 9:     <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
10:       <fo:layout-master-set>
11:         <fo:simple-page-master page-master-name="cheap"
12:             height="8.5in"
13:             width="11in"
14:             margin-top="0.5in"
15:             margin-bottom="0.5in"
16:             margin-left="1in"
17:             margin-right="1in">
18:           <fo:region-before extent="1in"/>
19:           <fo:region-body margin-top="1.25in"/>
20:           <fo:region-after extent=".75in"/>
21:         </fo:simple-page-master>
22:       </fo:layout-master-set>
23:
24:       <fo:page-sequence>
25:         <fo:sequence-specification>
26:           <fo:sequence-specifier-single page-master-name="cheap"/>
27:         </fo:sequence-specification>
28:         <fo:static-content flow-name="xsl-before">
29:           <fo:block font-size="18pt"
30:               font-family="sans-serif"
31:               line-height="24pt"
32:               background-color="black"
33:               color="white"
34:               text-align="centered"
35:               padding-top="3pt">
36:             <xsl:apply-templates select="/REListing/Header"/>
37:           </fo:block>
38:         </fo:static-content>
39:         <fo:static-content flow-name="xsl-after">
40:           <fo:block font-size="18pt"
41:               font-family="sans-serif"
42:               line-height="24pt"
43:               text-align="centered"
44:               padding-top="3pt">
45:             page:<fo:page-number/>
46:           </fo:block>
47:         </fo:static-content>
48:         <fo:flow>
49:           <xsl:apply-templates/>
50:         </fo:flow>
51:       </fo:page-sequence>
52:     </fo:root>
53:   </xsl:template>
54:
55:   <xsl:template match="Header">
56:     <xsl:value-of select="."/>
57:   </xsl:template>
58:
59:   <xsl:template match="Listing[ListingPrice &lt; 150000]">
60:     <fo:block font-size="12pt" font-family="sans-serif">
61:       <xsl:value-of select="."/>
62:     </fo:block>
63:     <fo:block font-size="10pt" font-family="sans-serif">
64:       taxrate:<xsl:value-of select="../TaxRate"/>
65:       applied rate:$<xsl:value-of select="round((ListingPrice div 1000) * ../TaxRate)"/> a year
66:     </fo:block>
67:   </xsl:template>
68: </xsl:stylesheet>

Let's look more closely at each of the parts of a FO document.

Master Pages

We saw in Listing 5.28, and again in Listing 5.29, a fo:simple-page-master element. In our examples, we only had a single page. However, in a more complete document, a single page layout is not enough; we typically need several. Listed next are the more common properties of fo:simple-page-master elements. For any property not specified, the formatter provides a reasonable default.

  • page-master-name="string": The name of the page master. Left, right, first, and last are all common. For example, page-master-name="first".

  • height="value": The height of the page in inches (in), centimeters (cm), or pixels (px). For example, height="8.5in" for the 8.5-inch edge of U.S. Letter or "8.27in" for the 210 mm edge of European A4.

  • width="value": The width of the page. Same units as height. For example, width="11.69in" for the 297 mm edge of A4.

  • margin-top, margin-bottom, margin-left, margin-right: We can also specify page margins. Values are in inches, centimeters, or pixels.

The following code snippet provides for a common book layout in which left and right facing pages each have a wider, 1-inch margin on one side: margin-left on the left page and margin-right on the right page. Note that the left page has fo:region-start, fo:region-body, and fo:region-end areas, whereas the right page has only a fo:region-body area.

<fo:layout-master-set>
    <fo:simple-page-master page-master-name="left"
        height="8.5in"
        width="11in"
        margin-top="0.5in"
        margin-bottom="0.5in"
        margin-left="1in"
        margin-right="0.5in">
        <fo:region-start/>
        <fo:region-body/>
        <fo:region-end/>
     </fo:simple-page-master>
    <fo:simple-page-master page-master-name="right"
        height="8.5in"
        width="11in"
        margin-top="0.5in"
        margin-bottom="0.5in"
        margin-left="0.5in"
        margin-right="1in">
        <fo:region-body/>
     </fo:simple-page-master>
</fo:layout-master-set>
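It can help to work out how much room a page master leaves for content once its margins are subtracted. The sketch below does that arithmetic for the "left" master above. It assumes the usable extent is simply the page extent minus the two opposing margins, which is a simplification: a real formatter also takes the extents of the surrounding regions into account. The class PageBox and its methods are hypothetical, and only the in and cm units are handled.

```java
public class PageBox {

    /** Converts a length such as "8.5in" or "21cm" to inches. */
    public static double toInches(String length) {
        String s = length.trim();
        if (s.endsWith("in")) {
            return Double.parseDouble(s.substring(0, s.length() - 2));
        }
        if (s.endsWith("cm")) {
            return Double.parseDouble(s.substring(0, s.length() - 2)) / 2.54;
        }
        throw new IllegalArgumentException("Unsupported unit in: " + length);
    }

    /** Page extent minus the two margins that bound it, in inches. */
    public static double contentExtent(String pageExtent, String marginA, String marginB) {
        return toInches(pageExtent) - toInches(marginA) - toInches(marginB);
    }

    public static void main(String[] args) {
        // Values taken from the "left" page master in the snippet above.
        double usableWidth = contentExtent("11in", "1in", "0.5in");
        double usableHeight = contentExtent("8.5in", "0.5in", "0.5in");
        System.out.println("Usable width (in): " + usableWidth);
        System.out.println("Usable height (in): " + usableHeight);
    }
}
```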

Page Sequences

Each FO document must have one or more fo:page-sequence elements, each containing fo:sequence-specification, fo:static-content, and fo:flow child elements. fo:sequence-specification elements describe how to use the previously defined master pages. As we've stated, it's not uncommon to have many pages defined for a given FO document. Books typically have a start page, a back page, first chapter pages, last chapter pages, body pages, and so on. We can use the fo:sequence-specifier-* elements of a fo:sequence-specification to specify which pages to use and in what order. The page-master-name property of the sequence specifier tells the formatter which master page to use. There are several flavors of sequence specifiers:

  • fo:sequence-specifier-single: Use this page only once. Uses the page-master-name property to specify the name of the master page.

  • fo:sequence-specifier-repeating: Specifies a repeating page. Uses page-master-first and page-master-repeating to specify the names of the first and repeating pages, respectively.

  • fo:sequence-specifier-alternating: Specifies different pages that are used in an alternating fashion. Uses page-master-first for the first page, page-master-odd and page-master-even for the odd and even pages, and page-master-last-odd and page-master-last-even for the last odd or even page. A final page master, page-master-blank-even, is used when a format requires that the first page of each new section fall on an odd page, as most books do.

Note

FOP currently only supports the properties page-master-name, page-master-first, page-master-repeating, page-master-odd, and page-master-even.
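The selection rules for the repeating and alternating specifiers can be sketched as two small functions that map a page's position in the sequence to the name of a page master. This is an illustration of the rules as described above, limited to the properties FOP supports; it is not FOP's implementation. The class PageMasterChooser and its method names are hypothetical, and page positions are assumed to start at 1.

```java
public class PageMasterChooser {

    private static void requirePositive(int page) {
        if (page < 1) {
            throw new IllegalArgumentException("Page positions start at 1: " + page);
        }
    }

    /** fo:sequence-specifier-repeating: one master for the first page, another for the rest. */
    public static String repeating(int page, String first, String repeating) {
        requirePositive(page);
        return page == 1 ? first : repeating;
    }

    /** fo:sequence-specifier-alternating: a first-page master, then odd and even masters. */
    public static String alternating(int page, String first, String odd, String even) {
        requirePositive(page);
        if (page == 1) {
            return first;
        }
        return page % 2 == 1 ? odd : even;
    }

    public static void main(String[] args) {
        // "first", "right", and "left" are example page-master-name values.
        for (int page = 1; page <= 5; page++) {
            System.out.println("page " + page + " uses "
                    + alternating(page, "first", "right", "left"));
        }
    }
}
```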


There are a number of properties that can be used with fo:page-sequence elements to control how page numbering will be applied. Some of the most common are

  • initial-page-value="integer": Where integer is any positive integer.

  • format="format": Where format is one of the formats used in XSLT. Examples include I (generates uppercase roman numerals), i (lowercase roman numerals), and A (generates A B C…AA AB, and so on).

  • digit-group-separator="separator": Where separator is a grouping separator character, such as a comma.

  • n-digits-per-group="integer": Where integer is an integer between 1 and 10.

After you have defined how pages will look, via fo:sequence-specifier-* elements, you need to describe the contents of a page. The next sections describe both static and flowing content in detail.
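To make the effect of the format token and the two grouping properties concrete, the following sketch turns a page number into the strings those settings describe. It covers only the tokens mentioned above (I, i, A, plus a lowercase a and a default of plain decimal, which are assumptions here) and is not taken from any formatter. The class PageNumberFormat and its methods are hypothetical names.

```java
public class PageNumberFormat {

    /** Formats n according to a format token: "I", "i", "A", "a", or anything else for decimal. */
    public static String format(int n, String token) {
        switch (token) {
            case "I": return roman(n);
            case "i": return roman(n).toLowerCase();
            case "A": return alpha(n);
            case "a": return alpha(n).toLowerCase();
            default:  return Integer.toString(n);
        }
    }

    /** Uppercase roman numerals for 1 to 3999. */
    static String roman(int n) {
        if (n < 1 || n > 3999) {
            throw new IllegalArgumentException("Out of range for roman numerals: " + n);
        }
        int[] values = {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
        String[] symbols = {"M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"};
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            while (n >= values[i]) {
                sb.append(symbols[i]);
                n -= values[i];
            }
        }
        return sb.toString();
    }

    /** Alphabetic numbering: A, B, ... Z, AA, AB, and so on. */
    static String alpha(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("Alphabetic numbering starts at 1: " + n);
        }
        StringBuilder sb = new StringBuilder();
        while (n > 0) {
            n--; // shift to 0-based so that 26 maps to Z rather than rolling over
            sb.insert(0, (char) ('A' + n % 26));
            n /= 26;
        }
        return sb.toString();
    }

    /** Inserts the separator between groups of digits, counting from the right. */
    public static String group(int n, char separator, int digitsPerGroup) {
        if (n < 0 || digitsPerGroup < 1) {
            throw new IllegalArgumentException("Need n >= 0 and digitsPerGroup >= 1");
        }
        String digits = Integer.toString(n);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < digits.length(); i++) {
            sb.append(digits.charAt(i));
            int remaining = digits.length() - 1 - i;
            if (remaining > 0 && remaining % digitsPerGroup == 0) {
                sb.append(separator);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(format(14, "i") + " " + format(28, "A") + " " + group(1234567, ',', 3));
    }
}
```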

Static Content

There are two kinds of content that are placed into a FO area. Static content is unchanging from page to page. Flowing content flows from page to page. The headers and footers of a document are typically static content, whereas the actual text, figures, diagrams, and so on are flowing content. We specify static content using the fo:static-content element. Lines 28–38 of Listing 5.29 describe the static content of a page. The following snippet of code shows a fo:static-content element that defines a page header based on the /REListing/Header element of our REListing.xml document.

<fo:static-content id="header" flow-name="xsl-before">
    <fo:block font-size="18pt"
    font-family="sans-serif"
    line-height="24pt"
    background-color="black"
    color="white"
    text-align="centered"
    padding-top="3pt">
    <xsl:apply-templates select="/REListing/Header"/>
    </fo:block>
</fo:static-content>

Examining the code, we see that content elements, both static and flow, are placed into an area based on the flow-name=<area_name> property. Flow names are linked to areas based on the following mapping:

xsl-body maps to region-body

xsl-after maps to region-after

xsl-before maps to region-before

xsl-start maps to region-start

xsl-end maps to region-end
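A program that generates or checks FO files might keep this mapping in a lookup table. The sketch below does so using the xsl- prefixed flow names that appear in Listings 5.28 and 5.29. The class FlowNames and its regionFor method are hypothetical names used only for illustration.

```java
import java.util.Map;

public class FlowNames {

    // Flow name (as written in flow-name="...") to the region it fills.
    private static final Map<String, String> REGION_BY_FLOW = Map.of(
            "xsl-body", "region-body",
            "xsl-after", "region-after",
            "xsl-before", "region-before",
            "xsl-start", "region-start",
            "xsl-end", "region-end");

    /** Returns the region that content with the given flow-name is placed into. */
    public static String regionFor(String flowName) {
        String region = REGION_BY_FLOW.get(flowName);
        if (region == null) {
            throw new IllegalArgumentException("Unknown flow-name: " + flowName);
        }
        return region;
    }

    public static void main(String[] args) {
        System.out.println("xsl-before fills " + regionFor("xsl-before"));
    }
}
```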

It's unclear from the specification why there is a separate set of mappings rather than simply using the region-* names. Perhaps, with the final version of the specification, this minor issue will have been resolved.

We can also use the id=<name> property to uniquely identify this element.

We will discuss the remainder of the information, basically the <fo:block>...</fo:block> portion, when we examine block-level objects. In general, there can be zero or more fo:static-content elements within a fo:page-sequence element. Furthermore, fo:static-content elements must precede any fo:flow elements.

Flow Objects

Flow objects represent some sort of content distribution, such as a paragraph, a chapter, a section, or a similar concept, and immediately follow static content. There are typically one or more fo:flow elements within a page sequence element, each of which contains text, links, tables, lists: simply put, flowing content. Anything that is not fixed on a given page belongs in a fo:flow element. In Listing 5.29, the fo:flow element represented the listings from REListing.xml. Furthermore, fo:flow elements contain block-level objects.

There are five block-level objects. They are

  • fo:block: Used to format paragraphs, titles, headings, and so on. For example

    <fo:block font-size="10pt" font-family="sans-serif">This is 10pt sans serif text</fo:block>

  • fo:display-graphic: Used to insert an image into a flow. For example

    <fo:display-graphic image="someimage.jpg" height="1.0in" width="2.0in"/>

  • fo:display-included-container: Used to generate block-level areas that have a different writing mode than the current mode.

  • fo:display-rule: Used to insert a line into a block. For example

    <fo:display-rule length="1.0in" line-thickness="1pt"/>

  • fo:display-sequence: Used as a container for child block elements; it specifies a number of properties that will be inherited by its children.

More Properties

There are a large number of properties that apply to block-level formatting. See the XSL Formatting Objects specification for a complete list of all the properties and which ones apply to a given FO element.


Other Content Elements

There are a number of other content elements that can appear within a fo:flow element or children of a fo:flow element. Some of the most common are

  • fo:inline-sequence—Used as a container for properties for its children. For example

    <fo:inline-sequence font-style="italic">italic text</fo:inline-sequence> produces italic text.

  • fo:table—Used for inserting tables and has a number of child elements, such as fo:table-caption, fo:table-body, and fo:table-row.

  • fo:list-block—Used for inserting lists and has a number of child elements, such as fo:list-item, fo:list-item-body, and fo:list-item-label.

There are also a number of other less common fo:flow child elements, such as footnotes, links, characters, and so on. All these elements, when combined with their appropriate properties, give an incredible level of control over formatting and page layout.
