Chapter 2. InDesign XML Publishing: College Catalog Case Study

Most people look at InDesign as a layout tool for highly styled graphic designs that are rich with color and typographic controls. Some users also import data into tables or export InDesign as HTML. InDesign CS is fully capable of all these things, but if a person is exploring XML, it is usually because someone has said, “Hey, we need to use XML so that we can make web pages and PDFs and everything out of the same content.” Perhaps the organization is already using XML for the website, and someone has seen that InDesign can work with XML. Or someone has used InDesign and is wondering how to extract its content in a form that a web service or other application can use.

In any event, although InDesign can do some pretty useful XML importing and exporting, Adobe does not see this as a feature intended for typical users. Their demos are business card templates and cookbooks; making XML that will match what another application or process uses is not the focus of their examples. However, Adobe has provided a number of features in InDesign for importing, creating, and exporting XML.

To get the most out of the XML capabilities of InDesign, think about the bigger issues: the processes you have in place, the workflow that will support them, and whether you need to create XML from content you already have in InDesign (that is, to export XML), to create InDesign documents from XML (that is, to import XML), or to do both (that is, bidirectional XML import/export).

As an example, I will use an actual project that needed both import and export: a college course catalog. The course catalog consists of a number of chapters, including topics such as:

  • General information about the college, its history, and its program emphasis, as well as its academic calendar

  • Financial aid, admissions criteria, and the application process

  • Programs of study

  • Course descriptions

  • Student services, the regulations handbook, and policies and procedures

  • Faculty and staff listing, directory, and campus maps

Of these chapters, some financial aid data, the course descriptions, and the programs of study were stored in database tables. The content of the database was published directly to the college website as HTML pages using Microsoft Active Server Pages (ASP). The rest of the content was created by staff members who sent Word documents to the InDesign layout person; these documents did not exist in the database as text entries. The InDesign files were used primarily for the printed output, a bound paper catalog.

The goal was to make the database a “single source,” with the website and the printed catalog being two outputs from the same content. To synchronize the current processes, content in InDesign would be added to the database, and content from the database would be passed into InDesign.

We were dealing with two different types of content in the catalog: some could be assigned neatly to table rows and cells in a database, and some was more narrative or organized in topics. Each of these types of content needed its own analysis and design process to achieve the XML import/export. Key issues and proposed solutions were:

  • Database content was extracted as plain text (separated into paragraphs) and given to the layout person in one large .txt file. The layout person imported the plain text and then had to mark up every paragraph with the correct InDesign paragraph style. Because about two-thirds of the catalog content was in the database, this meant that the layout person was manually marking up more than 130 pages of the catalog. The proposed solution was to provide database content to the layout person such that it would format itself automatically upon import into InDesign.

  • All of the text about admissions, policies, registration, regulations, and personnel was being created in Word documents. These documents were imported as source material for the InDesign catalog. The text then was edited in InDesign and was finally added to the database and website via cut-and-paste operations from RTF files exported from InDesign. There were problems with getting changes on time and mistakes in editing that led to differences in the text outputs. The proposed solution was to provide the output to the database and website developers such that it could be imported as rich text “blobs” but still have some semantic meaning that would assist in locating and reusing it. After the initial import into the database, the database programmer would provide a web-based form for editing so that the database would be the ongoing “single source” for this content.

Both of these processes involved InDesign’s XML capabilities, as you will see.

The database programmer and the InDesign layout person provided input on how they viewed the content, how they worked with it, and what problems they found when interchanging the content between the two applications. The editorial staff for the catalog also contributed input regarding how they reviewed and made corrections to the catalog during the publishing process.

Data-Like Content Example: The Course Description XML

The data table that contained the course descriptions was one of the largest in the database. Hundreds of course descriptions were managed in it, containing data in a regular format, as in Table 2-1.

Table 2-1. Database fields for course descriptions

Field                 Example value
Course major          Accounting
Course number         ACC 101
Course name           Accounting Principles I
Course credits        4
Course description    Basic principles of financial accounting for the business enterprise with emphasis on the valuation of business assets, measurement of net income, and double-entry techniques for recording transactions. Introduction to the cycle of accounting work, preparation of financial statements, and adjusting and closing procedures. Four class hours.
Notes                 Prerequisite: MTH 098 or MTH 130 or equivalent.

In InDesign, we wanted the content to look like Figure 2-1.

Figure 2-1. Example of formatted XML output for course descriptions

There are four InDesign paragraph styles defined for the content:

Course Descriptions—Major

The heading for the major under which the course falls.

Course Descriptions—Name

The bold text for the course number, official name, and credits awarded, in a single line.

Course Descriptions—Text

The normal text for the description of the course, as a paragraph.

Course Descriptions—Footnote

The italic footnote, which includes prerequisites, limitations on registration, required approvals, and the like. There could be more than one paragraph of footnotes for a course.

Giving all of the paragraph style names the same prefix keeps them together in the InDesign paragraph styles palette.

Data Exported as XML

When we exported the course description content from the database, we combined a few of the data fields (the course name and number and credits became a single element, with tabs separating the values) to align better with what the InDesign layout would be. Example 2-1 shows how the elements of a course description were written in our XML.

Example 2-1. Sample XML structure based on database fields
<CourseDescription_Major>Accounting</CourseDescription_Major>
<CourseDescription_Name>ACC 101&#9;Accounting Principles I&#9;4 Credits
</CourseDescription_Name>
<CourseDescription_Text>Basic principles of financial accounting for the business 
enterprise with emphasis on the valuation of business assets, measurement of net 
income, and double-entry techniques for recording transactions.  Introduction to 
the cycle of accounting work, preparation of financial statements, and adjusting
and closing procedures.  Four class hours.</CourseDescription_Text>
<CourseDescription_Footnote type="prereq">
Prerequisite: MTH 098 or MTH 130 or equivalent.</CourseDescription_Footnote>
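The field-combining step described above can be sketched in Python. This is only an illustration, not the export code used in the project: the dictionary keys (major, number, name, credits, description, notes) and the function names are assumptions, and only the element names, the type attribute, and the CourseDescriptions root come from this chapter.

```python
import xml.etree.ElementTree as ET


def course_to_elements(row):
    """Turn one database row (a dict with hypothetical keys) into flat elements."""
    elements = []

    major = ET.Element("CourseDescription_Major")
    major.text = row["major"]
    elements.append(major)

    # Number, name, and credits share one element, separated by tab characters.
    name = ET.Element("CourseDescription_Name")
    name.text = "\t".join([row["number"], row["name"], f"{row['credits']} Credits"])
    elements.append(name)

    text = ET.Element("CourseDescription_Text")
    text.text = row["description"]
    elements.append(text)

    # A course may have zero or more notes; each becomes its own footnote element.
    for note_type, note in row.get("notes", []):
        footnote = ET.Element("CourseDescription_Footnote", {"type": note_type})
        footnote.text = note
        elements.append(footnote)
    return elements


def build_document(rows):
    """Wrap the flat elements for every row in a single root element."""
    root = ET.Element("CourseDescriptions")
    for row in rows:
        root.extend(course_to_elements(row))
    return root


row = {
    "major": "Accounting",
    "number": "ACC 101",
    "name": "Accounting Principles I",
    "credits": 4,
    "description": "Basic principles of financial accounting.",  # shortened
    "notes": [("prereq", "Prerequisite: MTH 098 or MTH 130 or equivalent.")],
}
print(ET.tostring(build_document([row]), encoding="unicode"))
```

ElementTree writes the tab as a literal tab character rather than as the &#9; reference shown in Example 2-1; an XML parser treats the two forms identically inside element content.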

The “Notes” content from the database entry for a course was named <CourseDescription_Footnote> so that it could be recognized as a specific type of note. <CourseDescription_Footnote> was given an attribute named type, which generally indicates that the note is a prerequisite for the course (type="prereq" in Example 2-1), if there is one.

This approach allowed for notes that pertain to prerequisites to be searched for within the XML content.
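As a rough sketch of such a search, the Python below finds the prerequisite footnotes by attribute value. The sample is a shortened version of Example 2-1 wrapped in the CourseDescriptions root element; the second footnote and its type value "approval" are hypothetical, added only to show that other notes are skipped. Because the structure is flat, the helper pairs each footnote with the most recent course name it has seen.

```python
import xml.etree.ElementTree as ET

# Shortened sample based on Example 2-1. The "approval" footnote is hypothetical.
SAMPLE = """<CourseDescriptions>
<CourseDescription_Major>Accounting</CourseDescription_Major>
<CourseDescription_Name>ACC 101&#9;Accounting Principles I&#9;4 Credits</CourseDescription_Name>
<CourseDescription_Text>Basic principles of financial accounting.</CourseDescription_Text>
<CourseDescription_Footnote type="prereq">Prerequisite: MTH 098 or MTH 130 or equivalent.</CourseDescription_Footnote>
<CourseDescription_Footnote type="approval">Requires department approval.</CourseDescription_Footnote>
</CourseDescriptions>"""

root = ET.fromstring(SAMPLE)

# ElementTree's limited XPath support can select on an attribute value directly.
prereq_notes = root.findall("CourseDescription_Footnote[@type='prereq']")


def prerequisites(root):
    """Pair each prerequisite footnote with the course number that precedes it."""
    results = []
    current_course = None
    for element in root:
        if element.tag == "CourseDescription_Name":
            # The course number is the first tab-separated value.
            current_course = element.text.strip().split("\t")[0]
        elif (element.tag == "CourseDescription_Footnote"
              and element.get("type") == "prereq"):
            results.append((current_course, element.text.strip()))
    return results


for course, note in prerequisites(root):
    print(course, "->", note)
```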

Modeling the Structure for the Import XML

A simple DTD for the course descriptions data was generated from the XML that we extracted from the database. All of the course description elements are wrapped together in a root element named CourseDescriptions:

<?xml version="1.0" encoding="UTF-8"?>
<!-- DTD generated from database XML content using XML Spy -->
<!ELEMENT CourseDescriptions (CourseDescription_Major* | 
    CourseDescription_Name* | CourseDescription_Text* | 
    CourseDescription_Footnote*)+>
<!ELEMENT CourseDescription_Major (#PCDATA)>
<!ELEMENT CourseDescription_Name (#PCDATA)>
<!ELEMENT CourseDescription_Text (#PCDATA)>
<!ELEMENT CourseDescription_Footnote (#PCDATA)>
<!ATTLIST CourseDescription_Footnote
    type CDATA #REQUIRED>
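Python's standard library does not validate against a DTD, so the sketch below is a hand-written check that mirrors the rules in this DTD rather than a real validator. The function name and the wording of the messages are assumptions; a validating parser or an XML editor would be the usual tool for the real job.

```python
import xml.etree.ElementTree as ET

# Child elements that the DTD allows inside <CourseDescriptions>.
ALLOWED_CHILDREN = {
    "CourseDescription_Major",
    "CourseDescription_Name",
    "CourseDescription_Text",
    "CourseDescription_Footnote",
}


def check_structure(root):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if root.tag != "CourseDescriptions":
        problems.append(f"root is <{root.tag}>, expected <CourseDescriptions>")
    for index, child in enumerate(root, start=1):
        if child.tag not in ALLOWED_CHILDREN:
            problems.append(f"child {index}: unexpected element <{child.tag}>")
            continue
        # Every child is declared as (#PCDATA), so it may not contain elements.
        if len(child) > 0:
            problems.append(f"child {index}: <{child.tag}> must contain only text")
        # Only the footnote declares an attribute, and that attribute is required.
        is_footnote = child.tag == "CourseDescription_Footnote"
        allowed_attrs = {"type"} if is_footnote else set()
        for extra in sorted(set(child.attrib) - allowed_attrs):
            problems.append(f"child {index}: undeclared attribute '{extra}'")
        if is_footnote and "type" not in child.attrib:
            problems.append(f"child {index}: footnote is missing 'type'")
    return problems


document = ET.fromstring(
    "<CourseDescriptions>"
    "<CourseDescription_Major>Accounting</CourseDescription_Major>"
    "<CourseDescription_Footnote>Prerequisite: MTH 098.</CourseDescription_Footnote>"
    "</CourseDescriptions>"
)
for problem in check_structure(document):
    print(problem)
```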

We could have wrapped the basic structure of each course with all its fields inside an element named <CourseDescription>, but InDesign works best with XML that doesn’t have many levels of content hierarchy. So we deliberately kept this structure flat to make the work easier for the InDesign layout person.

With a simple DTD and an understanding of the basic XML structure and the paragraph styles that we were going to use in InDesign, our prep work for this import was done. We’ll dive into the details of the import and paragraph styles mapping later. (If you want to understand DTDs better, search for “XML DTD basics” online.)
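One way to picture the pairing of elements and paragraph styles is a simple lookup table. The Python sketch below is a hypothetical pre-import check, not an InDesign feature: the dictionary and the function name are assumptions, while the element and style names are the ones listed in this chapter.

```python
import xml.etree.ElementTree as ET

# The shared style-name prefix from this chapter; \u2014 is the dash it ends with.
STYLE_PREFIX = "Course Descriptions\u2014"

# Hypothetical lookup table pairing each element with its paragraph style.
TAG_TO_STYLE = {
    "CourseDescription_Major": STYLE_PREFIX + "Major",
    "CourseDescription_Name": STYLE_PREFIX + "Name",
    "CourseDescription_Text": STYLE_PREFIX + "Text",
    "CourseDescription_Footnote": STYLE_PREFIX + "Footnote",
}


def unmapped_tags(root):
    """Return, in document order, the child tags that have no style assigned."""
    missing = []
    for child in root:
        if child.tag not in TAG_TO_STYLE and child.tag not in missing:
            missing.append(child.tag)
    return missing


sample = ET.fromstring(
    "<CourseDescriptions>"
    "<CourseDescription_Major>Accounting</CourseDescription_Major>"
    "<CourseDescription_Sidebar>Not in the mapping.</CourseDescription_Sidebar>"
    "</CourseDescriptions>"
)
print(unmapped_tags(sample))
```

Running a check like this before import would flag any element that would otherwise arrive in the layout without a matching style. The <CourseDescription_Sidebar> element exists only to trigger the check.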

Topical Content: The Handbook XML

We needed to reverse the process when we wanted to export the XML from InDesign to put into the database. We started by looking at the content in InDesign, thought about how we were going to store it in the database, and designed the XML markup that would achieve our goals.

Evaluating the Handbook Text for Structure

The text in the handbook was organized into topics:

  • Rights and Freedoms of Students

  • Code of Conduct

  • Grievance Procedure

  • Parking Regulations

  • Alcohol and Drug Policies

Some of these topics included many subtopics, some included procedures, and some included reference tables or illustrations. Compared with the database content, this content was much more freeform and harder to predict, so the XML structure had to be more generic.

To make XML that would be useful for the particular workflow of this college, we decided that each main topic's text flow would be exported to its own XML file, which would then be converted into a rich text blob in the database (because that would be the most editable form of the content for future editing cycles).

Modeling the Structure as a Set of Topics

The content was usually edited as a single “story” or text flow in InDesign. Some of these were small and simple enough to be made into a very shallow structure: a <Story> element that contained an optional <IntroBlock> element, at least one <SectionHead>, some <SubSectionHead>s, <Subhead>s, and <para>s and optional <listitem> and <Table> elements. The most complex content might include a number of topics inside a story, with the same basic headings, paragraphs, lists, and tables inside a topic. We decided that content should generally be no more than three levels deep inside a story or a topic.

Our basic structure for these types of content is captured in a tree diagram as shown here:

Story
    @name
↳    IntroBlock
    ↳    para
↳    SectionHead
    ↳    SubSectionHead
        ↳    Subhead
            ↳    keyword
        ↳    para
            ↳    keyword
        ↳    listitem
        ↳    Table
            ↳    Cell
        ↳    keyword
        ↳    topic
            @title
            ↳    para
                ↳    keyword
            ↳    listitem
            ↳    keyword
            ↳    Table
                ↳    Cell
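The depth guideline can be checked mechanically. The following Python sketch counts how many element levels sit beneath a given element; the function name and the sample story text are hypothetical, and the sample uses the very shallow form of a story described above.

```python
import xml.etree.ElementTree as ET


def depth_below(element):
    """Count the element levels nested beneath `element` (0 if it has no children)."""
    return max((1 + depth_below(child) for child in element), default=0)


# Hypothetical sample content in the shallow story form.
story = ET.fromstring(
    '<Story name="Parking Regulations">'
    "<SectionHead>Parking Regulations</SectionHead>"
    "<para>Every vehicle needs a <keyword>permit</keyword>.</para>"
    "<listitem>Display the permit at all times.</listitem>"
    "</Story>"
)

LIMIT = 3  # the "no more than three levels deep" guideline
levels = depth_below(story)
print(f"{levels} level(s) below <Story>; within limit: {levels <= LIMIT}")
```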

We used names of existing paragraph styles for a few elements, and kept their capitalization, such as <SectionHead>, while we lowercased all the more generic elements, such as <para>. This made it easier to remember which element names originated from the InDesign layout.

A few elements and attributes were designed to help us manage or search the content after export. There is an attribute, name, for a <Story> element to give us a handle on the kind of information contained in a Story, such as “Career and Transfer Programs, Certificates and Advisement.” A similar attribute, title, was used on a <topic> element, so that we could identify the information in a topic even if it did not have a heading to display. The <keyword> element could be used inside a <Subhead> or <para> element.
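To illustrate how these handles might be used after export, here is a small Python sketch that builds a keyword index and lists topic titles. The sample stories, their text, and the function names are hypothetical; only the element and attribute names come from the structure described above. The lookups use iter(), so they find <keyword> and <topic> elements at any depth.

```python
import xml.etree.ElementTree as ET

# Two hypothetical exported stories, kept compact for the example.
STORIES = [
    '<Story name="Parking Regulations">'
    "<SectionHead>Parking Regulations</SectionHead>"
    "<para>Every vehicle needs a <keyword>permit</keyword>.</para>"
    '<topic title="Visitor parking"><para>Visitors also need a '
    "<keyword>permit</keyword>.</para></topic>"
    "</Story>",
    '<Story name="Code of Conduct">'
    "<SectionHead>Code of Conduct</SectionHead>"
    "<para>See the <keyword>grievance</keyword> procedure.</para>"
    "</Story>",
]


def keyword_index(story_sources):
    """Map each keyword to the names of the stories in which it appears."""
    index = {}
    for source in story_sources:
        story = ET.fromstring(source)
        for keyword in story.iter("keyword"):
            term = (keyword.text or "").strip()
            names = index.setdefault(term, [])
            if term and story.get("name") not in names:
                names.append(story.get("name"))
    return {term: names for term, names in index.items() if term}


def topic_titles(story_sources):
    """List (story name, topic title) pairs, including topics with no heading."""
    pairs = []
    for source in story_sources:
        story = ET.fromstring(source)
        for topic in story.iter("topic"):
            pairs.append((story.get("name"), topic.get("title")))
    return pairs


print(keyword_index(STORIES))
print(topic_titles(STORIES))
```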

We did not have to be very rigorous in developing our structure. We selected names that were quite generic and flattened out structures for which we didn’t think “wrapper elements” would be necessary. For example, we did not wrap a set of <listitem> elements in a <list> element. Although such an approach is common in HTML, it would be unnecessary in tagging text in InDesign, where we want the closest match we can get between the incoming elements and the number of paragraph styles that we will use. (Adobe has a similar strategy in regard to tables, having decided to dispense with <Row> and just use <Table> and <Cell> elements.)
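If incoming content does arrive with wrapper elements, they can be removed before import. The Python sketch below hoists the children of every wrapper into the wrapper's parent. It is an assumption about how such a cleanup could be done, not a step from the project, and the function name and sample text are hypothetical. Any text or tail attached to the wrapper itself is discarded, which is harmless when the wrapper holds only child elements.

```python
import xml.etree.ElementTree as ET


def unwrap(parent, wrapper_tag):
    """Replace each <wrapper_tag> element under `parent` with its own children."""
    # Work bottom-up so that nested wrappers are already flattened when hoisted.
    for child in list(parent):
        unwrap(child, wrapper_tag)
    flattened = []
    for child in list(parent):
        if child.tag == wrapper_tag:
            flattened.extend(list(child))  # hoist the wrapped elements
        else:
            flattened.append(child)
    parent[:] = flattened  # Element supports list-style slice assignment


topic = ET.fromstring(
    '<topic title="Permits">'
    "<para>Permits are issued as follows:</para>"
    "<list><listitem>Students</listitem><listitem>Staff</listitem></list>"
    "</topic>"
)
unwrap(topic, "list")
print(ET.tostring(topic, encoding="unicode"))
```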

With this basic structure converted into a DTD, we were ready to start marking up InDesign content as XML and validating it.

Iteration and Refinement

We didn’t get the structure that we used on the first try. The first versions of the XML structure were more granular (had more little elements within the <topic> and the <para> level of structure) and had many more “wrapper elements.” We tested by importing XML with various structures and different settings of the Import Options dialog to see what results we got in InDesign. If we didn’t like the results, we changed the structure and tried again. When we were finished with this process, I generated a DTD from our final XML and used that DTD for validating the content.
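Generating a DTD from working XML can be approximated in a few lines. The Python sketch below is a deliberately loose inference, not the algorithm of any particular XML editor: it records which children, text, and attributes each element has in a sample, then declares every content model as an unordered, repeatable choice and every attribute as optional CDATA. The function name and sample text are hypothetical.

```python
import xml.etree.ElementTree as ET


def infer_dtd(root):
    """Build loose DTD declarations from one sample document."""
    order = []       # element names in order of first appearance
    children = {}    # element name -> child names seen
    has_text = {}    # element name -> True if non-whitespace text was seen
    attributes = {}  # element name -> attribute names seen
    for element in root.iter():
        tag = element.tag
        if tag not in children:
            order.append(tag)
            children[tag], has_text[tag], attributes[tag] = [], False, []
        if element.text and element.text.strip():
            has_text[tag] = True
        for child in element:
            if child.tag not in children[tag]:
                children[tag].append(child.tag)
            # Text after a child element belongs to the parent's content.
            if child.tail and child.tail.strip():
                has_text[tag] = True
        for name in element.attrib:
            if name not in attributes[tag]:
                attributes[tag].append(name)

    lines = []
    for tag in order:
        kids = " | ".join(children[tag])
        if kids and has_text[tag]:
            model = f"(#PCDATA | {kids})*"  # mixed content
        elif kids:
            model = f"({kids})*"            # element-only content
        else:
            model = "(#PCDATA)"             # leaf: text, possibly empty
        lines.append(f"<!ELEMENT {tag} {model}>")
        for name in attributes[tag]:
            lines.append(f"<!ATTLIST {tag} {name} CDATA #IMPLIED>")
    return "\n".join(lines)


sample = ET.fromstring(
    '<Story name="Code of Conduct">'
    "<SectionHead>Code of Conduct</SectionHead>"
    "<para>Text with a <keyword>term</keyword>.</para>"
    "</Story>"
)
print(infer_dtd(sample))
```

A DTD inferred this way accepts everything in the sample and a good deal more, so it would normally be tightened by hand once the structure settles.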

Note

In Chapter 8, you will see why I prefer to go with the minimum of structural rules and to develop DTDs after creating working examples of content (if you are “rolling your own” DTD). In the example project, we only had to be sure that one InDesign layout person and one database developer would be able to understand how to create, manage, and interchange a specific set of content elements.

Net Results: Vast Improvements in Understanding and Speed

We had a lot of successes with our project. Among the most significant were a somewhat improved understanding of the database by the publishing group and much greater understanding by the database team of the publishing process. Because the bulk of the work was going to be passing content from the database to the publishing application via XML, the database programmer was intimately involved in understanding how the layout person perceived the content and what tasks he needed to perform with the content.

Besides improved comprehension between the functional groups, there was also a very important improvement in time to delivery for the layout person. He was given a brief tutorial on XML import and adjusted paragraph style names before importing the XML. Thereafter, where once he had spent days (literally) marking up the 130 pages of plain text paragraphs, he now could import all of the content in a few minutes, watch it auto-format itself as it came in, and then page through it, applying column and page breaks as needed. The estimated time saved in manual layout of the 130 pages was about 80 percent.

The text that was exported as XML from InDesign was marked up by an outside vendor in order to minimize the impact on the production cycle for the catalog. The database programmer was again a critical person in the success of the process change; he figured out how to get the database (which did not store XML natively) to import XML and achieve a useful, editable set of new content pieces within the database.
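The chapter does not describe how the database import was implemented. Purely as an illustration of storing exported XML in a database that has no native XML type, the Python sketch below saves each story's markup as text in SQLite, keyed by the story's name attribute. The table name, column names, function name, and sample content are all assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET


def store_story(conn, story_xml):
    """Parse one exported story and save its markup under the story's name."""
    story = ET.fromstring(story_xml)  # raises ParseError if not well-formed
    if story.tag != "Story" or not story.get("name"):
        raise ValueError("expected a <Story> element with a name attribute")
    conn.execute(
        "INSERT OR REPLACE INTO story_blob (name, body) VALUES (?, ?)",
        (story.get("name"), story_xml),
    )
    return story.get("name")


# Hypothetical schema: one row per story, with the markup kept as plain text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE story_blob (name TEXT PRIMARY KEY, body TEXT NOT NULL)")

exported = (
    '<Story name="Parking Regulations">'
    "<SectionHead>Parking Regulations</SectionHead>"
    "<para>Permits are required.</para>"
    "</Story>"
)
store_story(conn, exported)

body = conn.execute(
    "SELECT body FROM story_blob WHERE name = ?", ("Parking Regulations",)
).fetchone()[0]
print(body)
```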

Our project was stretched out over a year’s publishing cycle, and we held regular meetings and used a wiki to help track progress and document the project. I consider it a successful pilot of the processes that I am describing in this book. The process has been in use for seven years (as of 2012), and the college’s developers have been able to adjust the process without difficulty.
