We’ve seen how a docbase’s URL namespace can encode a lot
of information that can enable both programs and people to categorize
the subsets of that namespace that search engines return.
There’s also a complementary namespace that can carry an
additional information load. The <TITLE> tag
enclosed by an HTML document’s <HEAD>
is an invaluable but often underutilized resource. Text placed there
doesn’t appear in the body of a web page. Instead, it becomes the title of the
window in which the browser displays the page. As I mentioned in
Chapter 5, you may be disappointed if you rely on
the window title to display information that is essential to users.
I’ve found that people don’t regard the window title as
part of a web page, so you have to recapitulate it in the body of web
pages in order to get people to notice it. But while this
doctitle namespace may not be very interesting
to people, it’s enormously useful to search-results scripts.
Figure 8.2 shows what a search-results page looked
like on the BYTE site.
The result set draws from three different docbases, but everything fits into a common abstract pattern:
DATE TYPE SUBTYPE TITLE ABSTRACT
When you control both the search engine and the docbase, you can
always achieve this effect. The question is: With how much effort?
Careful design of the URL and doctitle namespaces will yield search
results that integrate easily and comfortably into this kind of
structure. The trick is to ensure that the two namespaces, in
combination, can map as completely as possible to the abstract
markers: DATE, TYPE, SUBTYPE, and so on.
In this case, the search-results structure requires a creation date for each result. You can’t just rely on the file’s modification date that some search tools can report. For our purposes here, the creation date must be fixed—we want the age of the document, not a last-modified date that changes when someone edits the file or when a filter program transforms the entire docbase.
However, as is typical when you try to map multiple docbases into a common results architecture, the notion of a creation date is open to interpretation. In this example, for documents in the magazine archive, it really means issue date—that is, the month in which the article appeared, not the month in which it was written. For conference messages and press releases, the creation date really means what it says—but it also says less than it could. For records of these two types, the creation date is known not merely to the month, but to the day. Because daily grouping of results didn’t map cleanly across all three docbases and because monthly aggregation was simpler yet sufficient, I took the latter approach.
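The monthly aggregation described above can be sketched in a few lines. This is only an illustration, not the script that ran on the BYTE site: it assumes each hit's creation date has already been normalized to a string of the form YYYY-MM or YYYY-MM-DD, and the function names and sample hits are hypothetical.

```python
from collections import defaultdict


def month_key(date_string):
    """Reduce 'YYYY-MM' or 'YYYY-MM-DD' to its 'YYYY-MM' prefix."""
    return date_string[:7]


def group_by_month(hits):
    """Group (date_string, label) pairs by month, newest month first."""
    groups = defaultdict(list)
    for date_string, label in hits:
        groups[month_key(date_string)].append(label)
    # ISO-style month strings sort chronologically as plain text.
    return dict(sorted(groups.items(), reverse=True))


if __name__ == "__main__":
    # Hypothetical hits: one known only to the month, two known to the day.
    sample = [
        ("1997-04", "magazine article"),
        ("1995-08-29", "press release"),
        ("1997-04-15", "conference message"),
    ]
    for month, labels in group_by_month(sample).items():
        print(month, labels)
```

Truncating to the month throws away the day for press releases and conference messages, which is the trade-off described above: the coarser key is the one that all three docbases can supply.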
Where did the creation date come from? That depended on the docbase.
For magazine articles and press releases, it was included in each
record’s HTML document title. The URL /art/9704/sec6/art1.htm,
for example, corresponded to the doctitle BYTE / April 1997 / Cover
Story / Cheaper Computing. Likewise, the URL /vpr/000439.htm
corresponded to the doctitle VPR / Citrix / WinFrame for Networks /
95-08-29. For each docbase, the set of doctitles formed a
kind of virtual database. Knowing the schema of that database, the
search-results script could pick out fields and use them to structure
a results page. Note that the date appears twice for hits from the
magazine archive: as the URL component 9704 and as the doctitle
component April 1997. That’s OK; there’s more
than one way to do it. It doesn’t matter which of these
namespaces carries the marker you need, so long as at least one of
them does.
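To make the idea of doctitles as a virtual database concrete, here is a minimal sketch of how a script might pick fields out of the two sample doctitles. It is not the original search-results script. The field names, the schema table, and the date formats are assumptions inferred from those two examples only.

```python
from datetime import datetime

# Hypothetical per-docbase schemas, keyed by the first doctitle component.
# The field names are illustrative; only the field order comes from the
# two sample doctitles.
DOCTITLE_SCHEMAS = {
    "BYTE": ("docbase", "date", "section", "title"),
    "VPR": ("docbase", "company", "product", "date"),
}

# Date formats assumed from "April 1997" and "95-08-29".
DATE_FORMATS = {"BYTE": "%B %Y", "VPR": "%y-%m-%d"}


def parse_doctitle(doctitle):
    """Split a ' / '-delimited doctitle into named fields plus a month."""
    parts = [part.strip() for part in doctitle.split(" / ")]
    docbase = parts[0]
    names = DOCTITLE_SCHEMAS[docbase]
    if len(parts) != len(names):
        raise ValueError(f"doctitle does not match {docbase} schema: {doctitle!r}")
    record = dict(zip(names, parts))
    created = datetime.strptime(record["date"], DATE_FORMATS[docbase])
    # Normalize to month precision so both docbases share one DATE marker.
    record["month"] = created.strftime("%Y-%m")
    return record


if __name__ == "__main__":
    print(parse_doctitle("BYTE / April 1997 / Cover Story / Cheaper Computing"))
    print(parse_doctitle("VPR / Citrix / WinFrame for Networks / 95-08-29"))
```

The sketch depends on the delimiter never appearing inside a field. A title that itself contained " / " would need an escaping convention, which is one reason to settle the doctitle schema before the docbase grows.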
It’s useful to think of the URL and doctitle namespaces as
complementary. For example, when we made creation date part of every
Docbase URL, we took some of the pressure off the doctitle namespace.
If the doctitles don’t need to display creation date, they can
instead display some other useful dimension, such as author. It’s the
union of the namespaces
that matters to a search-results script, and you can use them in
combination to carry the maximum information load.
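A rough sketch of that union follows. It assumes the /art/YYMM/ URL layout shown earlier and a hypothetical doctitle that carries an author in place of the date. The function names, the field names, and the placeholder author are illustrative and do not come from the original scripts.

```python
import re
from datetime import datetime

# Matches magazine-archive URLs such as /art/9704/sec6/art1.htm.
ART_URL = re.compile(r"^/art/(\d{4})/")


def fields_from_url(url):
    """Return the fields the URL namespace can supply (here, only the date)."""
    match = ART_URL.match(url)
    if not match:
        return {}
    issue = datetime.strptime(match.group(1), "%y%m")
    return {"date": issue.strftime("%Y-%m")}


def fields_from_doctitle(doctitle, names):
    """Map ' / '-delimited doctitle components onto the given field names."""
    parts = [part.strip() for part in doctitle.split(" / ")]
    return dict(zip(names, parts))


def merge_namespaces(url, doctitle, names):
    """Union of both namespaces; the URL fills in fields the doctitle lacks."""
    record = fields_from_doctitle(doctitle, names)
    for key, value in fields_from_url(url).items():
        record.setdefault(key, value)
    return record


if __name__ == "__main__":
    # "A. N. Author" is a placeholder, not the article's real byline.
    print(merge_namespaces(
        "/art/9704/sec6/art1.htm",
        "BYTE / A. N. Author / Cover Story / Cheaper Computing",
        ("docbase", "author", "section", "title"),
    ))
```

In this sketch the doctitle wins when both namespaces supply the same field, and the URL only fills gaps. The opposite precedence would work just as well, as long as the script applies one rule consistently.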