If you prefer the C language, Genx provides a fast, efficient C library for generating well-formed and canonical XML. On top of that, it’s well documented and a real pleasure to use.
Genx (http://www.tbray.org/ongoing/When/200x/2004/02/20/GenxStatus) is an easy-to-use C library for generating well-formed XML output. In addition to its output being well-formed, Genx writes all output in canonical form. It was created by Tim Bray with help from members of the xml-dev mailing list (http://xml.org/xml/xmldev.shtml) over the first few months of 2004. Some of the benefits of Genx include size, efficiency, speed, and the integrity of its output. Genx is well documented (http://www.tbray.org/ongoing/genx/docs/Guide.html) and it’s fairly easy to figure out what’s going on just by looking at the well-commented source code.
This hack shows you how to download, install, and compile Genx, then walks you through two example programs. The hack assumes that you are familiar with the C programming language, and that you have a C compiler and the make build utility available on your system. The example programs in this hack have been tested under Version beta5 of Genx.
The first thing you have to do is download Genx. It comes as a tarball only. After you download it to the working directory for the book, you need to extract the files. If you are on a machine that runs a Unix operating system, decompress the Genx tarball at a shell prompt in the working directory with:
gzip -d genx.tgz
Then extract the tar file genx.tar with:
tar xvf genx.tar
This creates a genx subdirectory where all the files from the archive will be extracted. (If you are on Windows without Cygwin, you can use a utility like WinZip to extract the GZIP archive.)
Genx comes with a Makefile for building the project. While in the genx subdirectory, just type make, and the process begins. The build compiles the needed files genx.c and charProps.c. genx.c includes the genx.h header file; charProps.c is where character properties are stored, and it is used to test for legal characters in XML. The ar (archive) command is invoked to create an archive from the object files genx.o and charProps.o. The archive is called libgenx.a. The ranlib utility is also invoked to create an index for the archive. You will need to link against libgenx.a when you compile your own Genx programs. One other program, tgx.c, is also compiled and run. This program runs a number of tests on Genx and reports on what it finds so you know everything is working.
Several test programs are provided in the Genx package and are stored under the docs subdirectory. I have written two sample programs that I’ll highlight here. You can find these programs in the genx-examples subdirectory wherever the example file archive for this book was extracted. Change directories to genx-examples and type make again (the Genx examples have their own makefile). After you invoke make in genx-examples, the example programs will be built and ready to go.
Example 7-32 is a simple C program called tick.c that uses functions from the Genx library.
Example 7-32. tick.c
#include <stdio.h>
#include "../genx/genx.h"

int main()
{
  genxWriter w = genxNew(NULL, NULL, NULL);

  genxStartDocFile(w, stdout);
  genxStartElementLiteral(w, NULL, "time");
  genxAddAttributeLiteral(w, NULL, "timezone", "GMT");
  genxStartElementLiteral(w, NULL, "hour");
  genxAddText(w, "23");
  genxEndElement(w);
  genxStartElementLiteral(w, NULL, "minute");
  genxAddText(w, "14");
  genxEndElement(w);
  genxStartElementLiteral(w, NULL, "second");
  genxAddText(w, "52");
  genxEndElement(w);
  genxEndElement(w);
  genxEndDocument(w);
}
Line 2 of the program is an #include directive for the copy of the genx.h header file that is located in the genx directory next to genx-examples (hence the ../genx/ path), provided that Genx was installed as directed.
You can also place a copy of genx.h in the location for system include files (on my Cygwin system, for example, the location is c:/cygwin/usr/include). If a copy of genx.h is in the system include location, you can change the #include directive on line 2 to #include <genx.h>.
The first statement inside main() creates a writer for the output of the program. The variable w is of type genxWriter, and it is initialized by the genxNew() function (see line 6). Looks like a Java constructor, doesn’t it? genxWriter is a pointer to the struct genxWriter_rec, which stores all kinds of information about the document being built. The three arguments to the genxNew() function are for memory allocation and deallocation. When all three arguments are set to NULL, we are instructing Genx to use its default memory handling (that is, with malloc() and free()).
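To make the three genxNew() arguments more concrete, here is a minimal sketch of a pair of allocation callbacks that count how often they are called. It assumes that the callbacks have the shapes void *alloc(void *userData, int bytes) and void dealloc(void *userData, void *data), and that the third genxNew() argument is handed back to them as userData; confirm this against your copy of genx.h. The AllocStats type and the counting_alloc() and counting_dealloc() names are hypothetical and are not part of Genx.

```c
#include <stdlib.h>

/* Hypothetical bookkeeping record, passed around as the userData pointer. */
typedef struct {
    long allocations;  /* successful calls to counting_alloc() */
    long releases;     /* calls to counting_dealloc() with non-NULL data */
} AllocStats;

/* Allocation callback: does the same job as malloc(), plus a counter. */
void *counting_alloc(void *userData, int bytes)
{
    AllocStats *stats = userData;
    void *p;

    if (bytes < 0)
        return NULL;            /* refuse nonsensical sizes */
    p = malloc((size_t) bytes);
    if (p != NULL && stats != NULL)
        stats->allocations++;
    return p;
}

/* Deallocation callback: does the same job as free(), plus a counter. */
void counting_dealloc(void *userData, void *data)
{
    AllocStats *stats = userData;

    if (data != NULL && stats != NULL)
        stats->releases++;
    free(data);
}

/* Assumed usage with Genx (not compiled here):
 *   AllocStats stats = {0, 0};
 *   genxWriter w = genxNew(counting_alloc, counting_dealloc, &stats);
 */
```

Comparing the two counters at the end of a run is a cheap way to spot memory that was never released; passing NULL for all three arguments, as tick.c does, remains the simplest choice.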
Following this initialization of a writer is a series of function calls, each with a small job. Notice that the first or only argument to each of these functions is w, the writer structure. The call to genxStartDocFile() on line 8 starts the writing process. The second argument, stdout, indicates that the document will be written to standard output. (The document could otherwise be written to a file, as you will see in the next example.) At the end of the program (line 21) is a call to genxEndDocument(), which signals the end of the document and flushes it.
The program also contains four calls to genxStartElementLiteral() (lines 9, 11, 14, and 17), each of which is matched by a call to genxEndElement() (lines 13, 16, 19, and 20). genxStartElementLiteral() has three arguments. The first is the writer structure (w) explained previously, the second is a namespace name or URI (NULL if none), and the third is the element name, such as time or hour.
If you give an element a namespace URI in the second argument, Genx writes the namespace URI on the element with an xmlns attribute and automatically creates a prefix, which is used on any child elements that have the same namespace declared.
The text content for a given element, if any, is created with genxAddText() (lines 12, 15, and 18), with the second argument containing the actual text, such as 23 or 14.
You can probably guess that genxAddAttributeLiteral() (line 10) writes an attribute on the element that is created immediately before it. It has four arguments. The first is the writer structure, and the second is a namespace URI, which is NULL if no namespace is used. The third argument is the attribute name and the fourth is the attribute value.
To run the program, just type tick at the prompt (it was compiled with make previously). The output of the program should look like this:
<time timezone="GMT"><hour>23</hour><minute>14</minute><second>52</second></time>
This output is an example of canonical XML. Some obvious marks are no XML declaration and double quotes rather than single quotes around attribute values. Now let’s look at a Genx example that is a little more complex.
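Canonical XML also fixes how special characters inside text content are written: an ampersand becomes &amp;, < becomes &lt;, > becomes &gt;, and a carriage return becomes &#xD;. Genx takes care of this for you. The following is only a sketch of that rule for text nodes, not Genx's own code, and the function names c14n_text_escape() and c14n_escape_text() are hypothetical.

```c
#include <string.h>

/* Return the Canonical XML replacement for c inside a text node,
 * or NULL if c is written unchanged. */
const char *c14n_text_escape(char c)
{
    switch (c) {
    case '&':  return "&amp;";
    case '<':  return "&lt;";
    case '>':  return "&gt;";
    case '\r': return "&#xD;";
    default:   return NULL;
    }
}

/* Copy text into buf, escaping it as text content.
 * bufsize is the capacity of buf, including the terminating NUL.
 * Returns 0 on success, or -1 if buf is too small. */
int c14n_escape_text(const char *text, char *buf, size_t bufsize)
{
    size_t used = 0;

    if (bufsize == 0)
        return -1;
    for (; *text != '\0'; text++) {
        const char *rep = c14n_text_escape(*text);
        size_t n = rep ? strlen(rep) : 1;

        if (used + n + 1 > bufsize)   /* keep room for the NUL */
            return -1;
        if (rep)
            memcpy(buf + used, rep, n);
        else
            buf[used] = *text;
        used += n;
    }
    buf[used] = '\0';
    return 0;
}
```

Attribute values follow a slightly different rule in Canonical XML (the double quote, tab, and newline are escaped as well, while > is not), so this sketch covers text content only.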
In the next example we will explore a different approach for writing an XML document with Genx. The program tock.c declares elements, an attribute, and a namespace before it uses them, then writes elements and an attribute with different functions that are more efficient than their literal counterparts. It also writes its non-canonical output to a file. Example 7-33 shows the code for tock.c.
Example 7-33. tock.c
#include <stdio.h>
#include "../genx/genx.h"

int main()
{
  genxWriter w = genxNew(NULL, NULL, NULL);
  FILE *f = fopen("tock.xml", "w");
  genxElement time, hr, min, sec;
  genxAttribute tz;
  genxNamespace tm;
  genxStatus status;
  tm = genxDeclareNamespace(w, "http://www.wyeast.net/time", "tm", &status);
  time = genxDeclareElement(w, tm, "time", &status);
  tz = genxDeclareAttribute(w, NULL, "timezone", &status);
  hr = genxDeclareElement(w, tm, "hour", &status);
  min = genxDeclareElement(w, tm, "minute", &status);
  sec = genxDeclareElement(w, tm, "second", &status);

  genxAddText(w, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
  genxStartDocFile(w, f);
  genxPI(w, "xml-stylesheet", " href=\"tock.xsl\" type=\"text/xsl\" ");
  genxComment(w, " the current date ");
  genxAddText(w, "\n");
  genxStartElement(time);
  genxAddAttribute(tz, "GMT");
  genxAddText(w, "\n ");
  genxStartElement(hr);
  genxAddText(w, "23");
  genxEndElement(w);
  genxAddText(w, "\n ");
  genxStartElement(min);
  genxAddText(w, "14");
  genxEndElement(w);
  genxAddText(w, "\n ");
  genxStartElement(sec);
  genxAddText(w, "52");
  genxEndElement(w);
  genxAddText(w, "\n");
  genxEndElement(w);
  genxEndDocument(w);
}
Line 7 creates a FILE pointer by calling the fopen() function with a filename (tock.xml) where the output is to be written and the mode string "w", which opens the file for writing (it is unrelated to the writer variable w). Following that, four variables (time, hr, min, and sec) are declared to be of type genxElement (line 8). The attribute tz is declared to be of type genxAttribute (line 9), and the namespace tm is declared with genxNamespace (line 10). status is of type genxStatus (line 11), an enum of status codes such as GENX_SUCCESS, GENX_BAD_NAME, and so forth. The address of status (&status) is passed as the last argument of the functions on lines 12 through 17.
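As printed, tock.c does not check whether fopen() succeeded, and it never closes the file. A minimal sketch of both steps in plain C follows; it uses only the standard library, and the helper names open_for_writing() and close_output() are hypothetical.

```c
#include <stdio.h>

/* Open path for writing. The "w" here is the fopen() mode string:
 * create the file, or truncate it if it already exists.
 * On failure, print the reason to stderr and return NULL. */
FILE *open_for_writing(const char *path)
{
    FILE *f = fopen(path, "w");

    if (f == NULL)
        perror(path);
    return f;
}

/* Close f, which also flushes any buffered output.
 * Returns 0 on success, or -1 if the data could not be written out. */
int close_output(FILE *f)
{
    if (fclose(f) != 0) {
        perror("fclose");
        return -1;
    }
    return 0;
}
```

In tock.c, the check would sit right after line 7, and the close would follow the call to genxEndDocument(), assuming that Genx leaves the FILE open for the caller to close.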
After the initial declarations, all these variables are initialized with an appropriate function: genxDeclareNamespace() (line 12), genxDeclareElement() (lines 13, 15, 16, and 17), and genxDeclareAttribute() (line 14). The namespace variable tm is given a namespace name (http://www.wyeast.net/time) and a prefix (tm) with the genxDeclareNamespace() function.
The genxAddText() function inserts strings (an XML declaration, newline characters, and spaces) into the file output stream (lines 19, 23, 26, 30, 34, and 38). The addition of the XML declaration is what makes the output non-canonical.
The functions genxPI() (line 21) and genxComment() (line 22) write an XML stylesheet processing instruction and a comment, respectively. Then the functions genxStartElement() (lines 24, 27, 31, and 35) and genxAddAttribute() (line 25) begin writing the markup. These functions take a previously declared object rather than a literal string, which gives them better performance than their counterparts genxStartElementLiteral() and genxAddAttributeLiteral(). Other functions, such as genxAddText() (lines 28, 32, and 36) and genxEndElement() (lines 29, 33, 37, and 39), may be used with both variations of the element and attribute creation functions, or just for inserting whitespace between elements, and so on.
To run the program, type tock at a command or shell prompt. Genx will then create the file tock.xml, shown in Example 7-34.
Example 7-34. tock.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="tock.xsl" type="text/xsl" ?>
<!-- the current date -->
<tm:time xmlns:tm="http://www.wyeast.net/time" timezone="GMT">
 <tm:hour>23</tm:hour>
 <tm:minute>14</tm:minute>
 <tm:second>52</tm:second>
</tm:time>
Just for fun, this non-canonical output can be transformed with the XSLT stylesheet tock.xsl and validated with the RELAX NG schema tock.rng. Both files are in the genx-examples subdirectory.
There are a number of other Genx functions that I have not touched on, such as the memory management functions genxGetAlloc(), genxSetAlloc(), and the like. My take is that Tim Bray is on the right track, and that if you use C and you need to generate XML output, you will no doubt find that Genx is an efficient tool.