Now that you have the building blocks of Atom, let’s move on to the details. We’ll first look at the standard elements of an Atom entry document.
Atom entry documents not only make up the bulk of an Atom feed but are also used as the transport for the Atom Publishing API and as a format for web site archives. For example, using the Atom entry document format as an archive template for your weblog seems an increasingly good idea.
entry
Within an Atom entry document, the entry element is the root, which must have a version attribute to denote the version of Atom you are deploying. This book is based on draft-05, whose version identifier is draft-ietf-atompub-format-05: do not deploy. Subtlety isn’t its strong point, you have to admit. This element may also contain any number of XML namespace declarations for the use of other XML vocabularies. I cover this in Chapter 11.
If the entry is part of a feed document, this element has no attributes. Either way, the remainder of the elements are all children of entry.
title
The title
element is a Text construct that gives
the title of the entry. The entry must have one, and only one.
link
link, a Link construct, gives details of related URIs. There must be at least one with a rel attribute of alternate, but there can’t be more than one of these with the same type value. This most commonly points to the HTML version of the resource, as with the link element in both flavors of RSS.
You can have as many link elements as you wish with a rel of something other than alternate. We’ll talk about those later on in this chapter.
edit
The edit
element is a Service construct pointing
to the edit endpoint for this particular entry for use with the Atom
Publishing API. You can only have one of these, but it is optional.
author
author is mandatory, unless the document is within a feed that has already declared an author for everything. It’s a Person construct, denoting the primary author of the entry, and you can only have one of them. For multiple authors, you have to decide who the most important one was and demote the others to contributor. If necessary, fight.
host
host
is optional and conveys the domain name,
dotted IPv4 address, or IPv6 colon-delimited address associated with
the origin of the entry document. Confused? Me too, until I saw it
came from the need to give authorship details of posts from wikis. In
many cases, the author of a wiki article isn’t known
by anything other than the IP address she posted from. This element
is for that situation.
contributor
contributor is a Person construct, entirely optional and unlimited in number, that denotes a contributor to the entry. You must have an author before you start talking about contributor elements, however.
id
id is a URI construct that provides a URI for the entry. See Sidebar 7-1 for more on this.
category
category
is a category construct that provides a
category for the entry document.
updated
A Date construct, the updated
element must be
present, once, within an entry. It denotes the last time the content
changed in a way that the producer deems significant. So, you
don’t need to change this if you’re
fixing spelling mistakes, for example.
published
published
is a Date construct denoting
“an instant in time associated with an event early
in the lifecycle of the entry,” according to the
specification document. Basically, this means either when it was
written or when it was made available to the public. These are
different things, granted, but there is no way to tell the difference
within Atom’s standard elements as yet.
You can, curiously, also set the published
element
to a value in the future. This suggests to applications that the
entry shouldn’t be displayed until that time, but
applications don’t have to pay any attention to this
and can go ahead and display it anyway. No manners, some people.
summary
The summary
element, in brief, is a Text construct
that gives a short summary or extract of the entry.
It’s optional if there is a
content
element, and, like the Highlander, there
can be only one.
If there is no content
element,
summary
is mandatory. The
summary
is also mandatory if the
content
has an src
attribute
and is therefore empty, or if the content
is
encoded in Base64. As for that, we’re just getting
to it.
content
The concept of content within an Atom entry document differs slightly from that within an RSS feed. Within Atom, as with RSS, you can include the content directly within the entry document, but you can also just link to the content placed within a different file. (Although, as detailed later, you’re discouraged from doing this with text content.) Furthermore, you can include any form of content (inside the feed, or linked to externally) and not just text or HTML.
The content element is its own construct, consisting of two attributes, type and src, and its own content.
type may be TEXT, HTML, or XHTML (following the same rules as the Text construct); if it is none of these, it must be a valid MIME media type as per RFC 2045. If the type attribute is missing, it is considered to be equal to TEXT, with all of the ramifications detailed for the earlier Text construct.
The src attribute may be a URI, which the application may dereference to retrieve the content. If the src attribute is present, the content element must be empty, and the type must be a MIME type and not TEXT, HTML, or XHTML. The MIME type returned by the server providing the resource is definitive, however. In other words, the feed might say something is x, but if the server says it’s y, you should treat it as y.
Finally, if the value of type begins with “text/” or ends with “+xml”, the content should be part of the feed as far as possible.
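As a rough illustration of the three built-in type values, the fragment below shows the same sentence carried three ways. These are alternatives, not siblings: an entry takes a single content element. The escaping of the HTML variant and the div wrapper around the XHTML variant are assumptions drawn from the Text construct rules described earlier in the chapter, so check them against the draft you are targeting.

```xml
<!-- Alternative 1: plain text, which is also the default when type is omitted -->
<content type="TEXT">Simple Simple Simple</content>

<!-- Alternative 2: HTML, with the markup escaped (assumed Text construct rule) -->
<content type="HTML">Simple &lt;em&gt;Simple&lt;/em&gt; Simple</content>

<!-- Alternative 3: XHTML, carried as namespaced markup inside a div (assumed) -->
<content type="XHTML">
  <div xmlns="http://www.w3.org/1999/xhtml">Simple <em>Simple</em> Simple</div>
</content>
```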
copyright
This is a Text construct that conveys copyright information for the
entry. It’s optional, and only one can be present.
If it’s not there, the copyright
of the feed document takes over. If it is there, it takes precedence
over that of the feed.
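To make these rules concrete, here is a sketch of a standalone entry document that uses several of the elements just described. It assumes the draft-05 syntax used in this chapter; the example.org URIs, the names, the dates, and the image/png media type are all invented for illustration, and the name child of each Person construct follows the pattern in Example 7-3. Treat it as an illustration, not a validated document.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Standalone entry document: the root entry carries the version attribute -->
<entry version="draft-ietf-atompub-format-05: do not deploy"
       xmlns="http://purl.org/atom/ns#draft-ietf-atompub-format-05">
  <title>An Illustrated Entry</title>
  <!-- At least one alternate link; no two alternates may share a type -->
  <link rel="alternate" type="text/html" href="http://example.org/2005/01/entry.html"/>
  <!-- One primary author; everyone else becomes a contributor -->
  <author><name>Primary Author</name></author>
  <contributor><name>Second Author</name></contributor>
  <id>http://example.org/2005/01/entry</id>
  <published>2005-01-10T09:00:00Z</published>
  <updated>2005-01-12T18:30:00Z</updated>
  <!-- content uses src, so it is empty, its type is a MIME type,
       and summary becomes mandatory -->
  <summary>A photograph of the office cat.</summary>
  <content type="image/png" src="http://example.org/2005/01/cat.png"/>
</entry>
```

Had the same entry lived inside a feed, the entry element would carry no attributes, and the author could be left to the feed if the feed had already declared one.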
So, armed with handfuls of entry documents, we can make a feed. Feeds have their own elements too. Here they are:
feed
The feed element is always the root element of a feed document. Like the entry element within a standalone entry document, it takes a single attribute, version, which in the case of this version of the specification equals draft-ietf-atompub-format-05: do not deploy.
Everything is a child of this element. It takes two kinds of children directly: one head, and zero or more entry elements, containing the entry documents.
head
The head
element is a container for the metadata
of a feed. The rest of the elements in this section are children of
this head
element. It may also contain properly
namespace-qualified elements from other XML vocabularies, as
you’ll see in Chapter 11.
title
A Text construct giving the title of the feed. It is mandatory.
link
As with its namesake within the entry document, link is a Link construct, giving details of related URIs. If there is no content element within an entry, there must be at least one link with a rel attribute of alternate. There can’t be more than one with the same type value.
link is most commonly used to point to the HTML version of the resource, as with the link element in both flavors of RSS. We’ll talk about the other types of link later on in this chapter.
If a feed’s link rel="alternate" element resolves to an HTML document, then that document should have an autodiscovery link element that reflects back to the feed. We discuss this in Chapter 9.
introspection
The introspection
element is a Service construct
giving the URI of a site’s introspection file.
It’s optional, and you can only have one.[1]
post
This element is a Service construct that conveys the URI used to add entries to the feed, using the Atom Publishing API. It’s optional, and, yes, only one is allowed.
author
As with the Atom entry document, the author element is a Person construct to denote the primary author of the feed and the entries found within it. As noted in the entry document section, the person denoted by feed/head/author is overruled by anyone denoted by feed/entry/author. However, if the majority of your entries are authored by the same person, use of this element saves time. Either way, unless all your entries have their own author element, it is mandatory. You can, naturally, have only one.
contributor
Basically, this is the same as the author element, except that it’s used only to denote any other authors. The rules of precedence are exactly the same as those for author.
category
category
is a category construct that provides a
category for the entire feed document.
tagline
A Text construct giving a description or tagline for the feed. Optional; only one is allowed; brevity and wit are appreciated.
id
An Identity construct giving a unique, permanent identifier for this feed. The feed’s URI, in other words. It’s optional, but you can have only one.
generator
An optional element denoting the software used to create the feed. This is useful for statistics and for error tracking. You can have only one of these elements, obviously. The specification document puts it succinctly:
The content of this element must be a string that is a human-readable name for the generating agent. The element may have a “uri” attribute whose value must be a URI. When dereferenced, that URI should produce a representation that is relevant to that agent. The generator element may have a “version” attribute that indicates the version of the generating agent. When present, its value is unstructured text.
copyright
A Text construct conveying human-readable copyright information for
the entire feed and all its entries except those that contain their
own copyright
element. It’s
optional, and the feed itself can have only one. It
shouldn’t be used to convey machine-readable
information.
info
This is a Text construct giving a human-readable explanation of the format itself. It’s optional and really just a place for people to leave notes to other developers. It isn’t meant to be used by any application and is only viewable if you look directly at the source.
updated
A Date construct, the updated
element must be
present, once, within a feed. As with the entry document equivalent,
it denotes the last time the content changed enough for the publisher
to want readers to know about it.
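As a sketch of how the feed-level metadata fits together, the fragment below shows a head element carrying the mandatory elements plus a few of the optional ones. It is a fragment, assumed to sit inside a feed element that declares the Atom namespace; the example.org URIs, the names, and the generator details are invented for illustration. The uri and version attributes on generator are the ones named in the specification text quoted above.

```xml
<head>
  <!-- Mandatory: title, an alternate link, and updated -->
  <title>Example Feed</title>
  <link rel="alternate" type="text/html" href="http://example.org/"/>
  <updated>2005-01-12T18:30:00Z</updated>
  <!-- Mandatory unless every entry declares its own author -->
  <author><name>Feed Author</name></author>
  <!-- Optional, at most one of each -->
  <tagline>Short and, with luck, witty.</tagline>
  <id>http://example.org/feed</id>
  <generator uri="http://example.org/feedtool" version="1.0">Example Feed Tool</generator>
  <copyright>Copyright 2005 Feed Author</copyright>
</head>
```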
So there you go: the entire makeup of an Atom feed, as of January 2005. Again, be aware that Atom is a changing specification. I am judging, perhaps wrongly, that the specification won’t change radically from the one detailed here—and if it does, you are now in a fine position to understand the changes—but before you deploy the format in anything resembling a permanent manner, go and check the latest documents.
What, therefore, is the simplest possible Atom feed document? Technically speaking, you don’t need to have any entries at all, but that’s as close to useless as you’re allowed to get. Assuming one entry, Example 7-3 shows the simplest possible Atom feed. If your feed is missing any of these elements, it is incomplete.
<?xml version="1.0" encoding="utf-8"?>
<feed version="draft-ietf-atompub-format-05: do not deploy"
      xmlns="http://purl.org/atom/ns#draft-ietf-atompub-format-05">
  <head>
    <title>The Simplest Feed</title>
    <link rel="alternate" type="text/html" href="http://example.org/index.html"/>
    <author><name>Ben Hammersley</name></author>
    <updated>2004-10-25T15:07:02Z</updated>
  </head>
  <entry>
    <title>The Simplest Entry Document</title>
    <link rel="alternate" type="text/html" href="http://example.org/example_entry"/>
    <author><name>Ben Hammersley</name></author>
    <id>http://example.org/2004/12345679</id>
    <updated>2004-10-25T15:07:02Z</updated>
    <content type="TEXT">Simple Simple Simple</content>
  </entry>
</feed>
[1] The idea of an introspection file is also a matter of debate.
It is used with the Atom API and is a separate file containing the
URIs of the Atom API endpoints for all the sites within that domain,
for each of the API methods. There is no current standard for the
introspection file, and perhaps there never will be. Certainly, the
presence of the post and edit elements takes much of its place. As I
keep stressing, in using the
Atom standards, you are on the bleeding edge of syndication
technology, which is itself built on the bleeding edge of publishing
technology. It’s not inconceivable that things will
drop off every so often.