You’ve had a chance to use a number of XSLT processors in this book, such as Xalan C++ and Instant Saxon. Now you’ll get the opportunity to write your own Java or C# XSLT processor with a simple command-line interface. Actually, you won’t be writing an XSLT processor from scratch, but rather an interface to a processor that is available through Application Programming Interfaces (APIs).
This chapter assumes that you are already an experienced programmer in either or both of these languages. The nice thing about writing your own processor at the API level is that you have control over the interface and how things work. Of course, such a task requires much more effort on your part, but if a high level of control matters enough to you, the effort will be worthwhile.
The first part of the chapter walks through the creation of a Java
XSLT processor using Sun’s
Java API for XML Processing (JAXP).
The second part guides you in creating a processor with C# using
Microsoft’s .NET Framework 1.1 SDK. You
don’t need an integrated development environment
(IDE) to work with these examples—they require only a text
editor and the javac
or csc
compilers. I’ll show you where to get those
compilers if you don’t already have them.
Java Version 1.4 standard or enterprise edition comes standard with JAXP. JAXP includes the APIs you’ll need to create an XSLT processor. You must use Version 1.4 or a later Java Runtime Environment (JRE) to run this example as it is described (more on this later). You can download the latest Java JRE or Software Development Kit (SDK) from http://java.sun.com.
To write a processor with JAXP, you need two extension packages:
javax.xml.transform
and
javax.xml.transform.stream
. There are other
packages available to help you do more things in XSLT, but
we’ll focus on these packages for the sake of
simplicity. You can consult the API documentation for these packages
at http://java.sun.com/j2se/1.4/docs/api/index.html.
In examples/ch17, you will find the source code for the Moxie XSLT processor, Moxie.java. This program has only 68 lines because the heavy lifting is done by classes from Java extension packages.
Example 17-1 lists the source code.
/*
 * Moxie JAXP XSLT processor
 */

import java.io.File;
import java.io.FileOutputStream;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class Moxie {

    public static void main(String[ ] args) throws Exception {

        /* Output file flag */
        boolean file = false;

        /* Default system property for Xalan processor */
        System.setProperty("javax.xml.transform.TransformerFactory",
            "org.apache.xalan.processor.TransformerFactoryImpl");

        /* Usage strings */
        String info = "Moxie JAXP XSLT processor";
        String usage = " Usage: java -jar moxie.jar";
        String parms = " source stylesheet [result]";

        /* Test arguments */
        if (args.length == 0) {
            System.out.println(info + usage + parms);
            System.exit(1);
        } else if (args.length == 3) {
            file = true;
        } else if (args.length > 3) {
            System.out.println("Too many arguments; exit.");
            System.exit(1);
        }

        /* XML source document and stylesheet */
        File source = new File(args[0]);
        File stylesheet = new File(args[1]);

        /* Set up source and result streams */
        StreamSource src = new StreamSource(source);
        StreamSource style = new StreamSource(stylesheet);
        StreamResult out;
        if (file) {
            FileOutputStream outFile = new FileOutputStream(args[2]);
            out = new StreamResult(outFile);
        } else {
            out = new StreamResult(System.out);
        }

        /* Create transformer */
        TransformerFactory factory = TransformerFactory.newInstance( );
        Transformer xf = factory.newTransformer(style);

        /* Set output encoding property */
        xf.setOutputProperty(OutputKeys.ENCODING, "US-ASCII"); // encoding
        xf.setOutputProperty(OutputKeys.INDENT, "yes");        // indent

        /* Perform the transformation */
        xf.transform(src, out);

    }

}
To an experienced Java programmer, this code should readily make sense, but just to make sure the code is comprehensible, I’ve provided the following discussions that dissect each part of the program.
Moxie imports seven classes at the beginning of the program:
import java.io.File;
import java.io.FileOutputStream;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
The first two classes are from the java.io
package. The three classes that follow are from the
javax.xml.transform
extension package, and the two
after that are from javax.xml.transform.stream
.
The File
class handles the input files (the source
XML document and the stylesheet), and
FileOutputStream
helps write an output file from
the result tree of the transformation.
TransformerFactory
assists in creating a new
instance of the Transformer
class, which actually
performs the transformations. OutputKeys
lets you
submit values to the transformer that normally come from attributes
on the output
element, such as the
method
or encoding
attributes.
The remaining classes, StreamResult
and
StreamSource
, are holders for streams representing
the result and source trees, respectively.
Next in the program, the class Moxie
is defined as
well as the main( )
method that makes everything
happen. The first thing that’s done is to create a
boolean
called file
that acts
as a flag to tell the processor whether output will be sent to a
file:
/* Output file flag */
boolean file = false;
This flag is set to true if a third argument appears on the command line (explained shortly).
The next thing that you see in the program is a call to the
setProperty( )
method from the
System
class:
/* Default system property for the Xalan processor */
System.setProperty("javax.xml.transform.TransformerFactory",
    "org.apache.xalan.processor.TransformerFactoryImpl");
This call is not required, but I’ve included it to
illustrate a point. The Xalan processor from Apache is the default
XSLT engine underneath JAXP’s hood in Java 1.4, so setting this system
property to Xalan explicitly only repeats what already happens
automatically. It is there so that if you want to switch to a different
engine, you can easily do so. The property value for Xalan is
org.apache.xalan.processor.TransformerFactoryImpl
.
You can change the property to Saxon Version 7 or above with the
property net.sf.saxon.TransformerFactoryImpl
, or
you can change it to jd.xslt with
jd.xml.xslt.trax.TransformerFactoryImpl
. If you
change the system property to Saxon 7, you have to add
saxon7.jar to the classpath; if you change it to
jd.xslt, you need to add jdxslt.jar.
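If you are unsure which engine JAXP is actually handing back on your system, a minimal sketch like the following may help. The class name FactoryCheck is hypothetical and not part of Moxie; it only prints the class name of whatever factory TransformerFactory.newInstance() returns, which varies with the Java release and the JARs on the classpath.

```java
import javax.xml.transform.TransformerFactory;

// Hypothetical helper, not part of Moxie: reports which
// TransformerFactory implementation JAXP selects at runtime.
class FactoryCheck {

    // Returns the fully qualified class name of the factory in use
    static String factoryName() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // The lookup honors -Djavax.xml.transform.TransformerFactory=...
        // when that system property is given on the command line
        System.out.println(factoryName());
    }
}
```

Because the lookup reads a system property, you can also switch engines without touching the source, for example with java -Djavax.xml.transform.TransformerFactory=net.sf.saxon.TransformerFactoryImpl FactoryCheck, provided the corresponding JAR is on the classpath.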
The arguments to main( )
are evaluated with an
if
statement. The three possible command-line
arguments all represent files:
The first argument (args[0]
) represents the XML
source document that you want to transform.
The second argument (args[1]
) is the XSLT
stylesheet for performing the transformation.
The third argument (args[2]
) is optional and, if
used, represents the name of the file where the result tree will be
stored. If absent, the result tree will appear on
System.out
(standard output or the screen). The
file
variable is of type
boolean
and indicates whether this third argument
is present; if so, file
is set to
true
(false
by default) and a
file will be written for the result tree.
These arguments are interpreted as files with the help of two
File
class constructors. Two
StreamSource
objects and one
StreamResult
object are then constructed:
StreamSource src = new StreamSource(source);
StreamSource style = new StreamSource(stylesheet);
StreamResult out;
if (file) {
    FileOutputStream outFile = new FileOutputStream(args[2]);
    out = new StreamResult(outFile);
} else {
    out = new StreamResult(System.out);
}
This tells the program to interpret the input files as streams for
the benefit of the transformer. (You could also represent these files
as DOM documents by using the DOMSource
class from
javax.xml.transform.dom
, or as SAX events with the
SAXSource
class from
javax.xml.transform.sax
.) An
if-else
statement provides a little logic using
the Boolean file
that either sends the result
stream to the screen or to a file.
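As a rough illustration of the DOMSource alternative mentioned above, here is a minimal sketch that parses a string into a DOM tree and hands that tree to a transformer. The class and method names are hypothetical, and the sketch uses an identity transformer (created with no stylesheet) only to keep the example short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch: feed the transformer a DOM tree instead of a stream
class DomSourceSketch {

    static String copy(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // XSLT processing expects namespace-aware trees
        DocumentBuilder builder = dbf.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        // newTransformer() with no argument yields an identity transformer
        Transformer xf = TransformerFactory.newInstance().newTransformer();
        StringWriter sw = new StringWriter();
        xf.transform(new DOMSource(doc), new StreamResult(sw));
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(copy("<methods><method>transform</method></methods>"));
    }
}
```

A DOM source is mostly worthwhile when the document is already in memory as a tree; for files on disk, the stream classes that Moxie uses are simpler.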
After that, a TransformerFactory instance is obtained and used to create a new transformer:
TransformerFactory factory = TransformerFactory.newInstance( );
Transformer xf = factory.newTransformer(style);
Notice that the new transformer takes the stylesheet as an argument
(style
).
Next, the output encoding for the result tree is set to
US-ASCII
, and indentation is set to
yes
by calling the setOutputProperty(
)
method twice:
xf.setOutputProperty(OutputKeys.ENCODING, "US-ASCII"); // encoding
xf.setOutputProperty(OutputKeys.INDENT, "yes");        // indent
The setOutputProperty( )
method comes from the
Transformer
class. The
OutputKeys
class, discussed earlier, provides
fields, such as ENCODING
and
INDENT
, that correlate with the attributes of the
XSLT output
element (like
encoding
and indent
). These
method calls have the same effect as using the
output
element in a stylesheet like this:
<xsl:output encoding="US-ASCII" indent="yes"/>
Calling setOutputProperty( )
with
ENCODING
and a value of
US-ASCII
, and again with INDENT
and a value of yes
, overrides the values of any
encoding
and indent
attributes
on the stylesheet’s output
element.
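To watch the output properties take effect without any files, you could try a small sketch along these lines. The class OutputPropsSketch and its serialize( ) method are hypothetical; the sketch uses an identity transformer and an in-memory byte stream, and the exact whitespace produced by indentation depends on the processor underneath JAXP.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: serialize a document with a chosen output encoding
class OutputPropsSketch {

    static String serialize(String xml, String encoding) throws Exception {
        // Identity transformer: copies the source to the result unchanged
        Transformer xf = TransformerFactory.newInstance().newTransformer();
        xf.setOutputProperty(OutputKeys.ENCODING, encoding);
        xf.setOutputProperty(OutputKeys.INDENT, "yes");

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        xf.transform(new StreamSource(new StringReader(xml)),
                     new StreamResult(bytes));
        // Decode with the same encoding that the transformer wrote
        return bytes.toString(encoding);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<methods><method>transform</method></methods>";
        System.out.println(serialize(xml, "US-ASCII"));
        System.out.println(serialize(xml, "UTF-8"));
    }
}
```

Comparing the two printed documents shows the XML declaration changing with the ENCODING property while the content stays the same.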
Finally, the program performs the actual transformation using the
transform( )
method of the
Transformer
class:
xf.transform(src, out);
The first argument, src
, is the source stream
derived from the input file, and the second argument, out
, is the result stream that receives the result
tree. The stylesheet was associated with the instance of
Transformer
when it was created earlier in the code.
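The same three steps (wrap the inputs, create the transformer from the stylesheet, call transform( )) also work entirely in memory. The following is a minimal sketch under that assumption; the class InMemoryTransform and its small stylesheet are hypothetical and are not the test.xsl that ships with the examples.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: source, stylesheet, and result all held in memory
class InMemoryTransform {

    // Illustrative stylesheet: writes each method name as a line of text
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='methods/method'>"
      + "<xsl:value-of select='.'/><xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String run(String xml) throws Exception {
        StreamSource src = new StreamSource(new StringReader(xml));
        StreamSource style = new StreamSource(new StringReader(XSL));
        StringWriter sw = new StringWriter();

        // The stylesheet is bound to the transformer when it is created
        Transformer xf = TransformerFactory.newInstance().newTransformer(style);
        xf.transform(src, new StreamResult(sw));
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(run(
            "<methods><method>transform</method>"
          + "<method>setParameter</method></methods>"));
    }
}
```

Keeping everything in strings like this can be handy for quick experiments or unit tests before you point the program at real files.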
To run Moxie, you need to have at least
a JRE installed for Version 1.4 or later. A JRE is a Java Runtime
Environment, a Java Virtual Machine (JVM) with core classes. If you
want to change the code in Moxie.java and then
recompile it, you need a Java 2 1.4 SDK to get the
javac
compiler, but to only run it, you just need
a JRE.
To find out what version your JRE is, type the following line at a command-line prompt:
java -version
When I type this on my system, I get the following response:
java version "1.4.1_01"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1_01-b01)
Java HotSpot(TM) Client VM (build 1.4.1_01-b01, mixed mode)
If you get back something like this, it means you’re in good shape. Now, while in examples/ch17, type this line:
java -jar moxie.jar
or this line:
java Moxie
You will get some usage information in response:
Moxie JAXP XSLT processor Usage: java -jar moxie.jar source stylesheet [result]
If you’ve gotten this far without errors, you are
ready to perform a transformation. The document
test.xml contains a list of methods from the
Transformer
class, and
test.xsl transforms it:
java -jar moxie.jar test.xml test.xsl
The result should look like this:
<?xml version="1.0" encoding="US-ASCII"?>
<methods>
<method>clearParameters( )</method>
<method>getErrorListener( )</method>
<method>getOutputProperties( )</method>
<method>getOutputProperty(String name)</method>
<method>getParameter(String name)</method>
<method>getURIResolver( )</method>
<method>setErrorListener(ErrorListener listener)</method>
<method>setOutputProperties(Properties oformat)</method>
<method>setOutputProperty(String name, String value)</method>
<method>setParameter(String name, Object value)</method>
<method>setURIResolver(URIResolver resolver)</method>
<method>transform(Source xmlSource, Result outputTarget)</method>
</methods>
By default, the transformer uses UTF-8 for output encoding, but
setOutputProperty( )
overrides the default with
US-ASCII, as you can see in the XML declaration of the result tree.
The setOutputProperty( )
method also turns on
indentation—without it, all elements in the result would run
together.
If you’d like, you can also send the result tree to a file rather than to the screen. To accomplish this, you must submit a filename as the third argument on the command line, as you see here:
java -jar moxie.jar test.xml test.xsl moxie.xml
When you enter this line, Moxie writes the result tree to a file in
the current directory using the FileOutputStream
class.
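Moxie leaves it to the end of the program to release the output file. If you adapt the code for a longer-running program, you may want to close the stream yourself. Here is a minimal sketch of one way to do that, assuming a newer Java release than 1.4 (the try-with-resources statement arrived in Java 7). The class FileResultSketch and the temporary file are hypothetical, and an identity transformer stands in for a real stylesheet.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: write the result tree to a file and close the stream
class FileResultSketch {

    static void writeTo(File target, String xml) throws Exception {
        // Identity transformer keeps the example short
        Transformer xf = TransformerFactory.newInstance().newTransformer();
        // try-with-resources closes the stream even if transform() throws
        try (OutputStream out = new FileOutputStream(target)) {
            xf.transform(new StreamSource(new StringReader(xml)),
                         new StreamResult(out));
        }
    }

    public static void main(String[] args) throws Exception {
        File tmp = File.createTempFile("moxie", ".xml");
        tmp.deleteOnExit();
        writeTo(tmp, "<methods><method>transform</method></methods>");
        // Read the file back to confirm that it was written
        System.out.println(new String(
            Files.readAllBytes(tmp.toPath()), StandardCharsets.UTF_8));
    }
}
```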
You will also find a pair of files in examples/ch17 that will help you: moxie.bat is a Windows batch file and moxie.sh is a Unix shell script. You can use either of them to reduce typing. For example, to perform the previous transformation at a Unix shell prompt, just type:
moxie.sh test.xml test.xsl moxie.xml
Or, at a Windows command prompt, type:
moxie test.xml test.xsl moxie.xml
You can alter the source file Moxie.java to your
heart’s content. For more information on JAXP, check
the Javadocs for the following packages:
javax.xml.parsers
,
javax.xml.transform
,
javax.xml.transform.dom
,
javax.xml.transform.sax
, and
javax.xml.transform.stream
.
If you
alter Moxie.java, you will have to recompile it
in order to get the new version to run. With Java Version 1.4 SDK
installed, the Java compiler
javac
should
be available to you if the compiler is in your path variable. Find
out whether javac
is there by typing the following
on a command line:
javac
If the compiler is available, you will see usage information on the screen:
Usage: javac <options> <source files>
where possible options include:
  -g                        Generate all debugging info
  -g:none                   Generate no debugging info
  -g:{lines,vars,source}    Generate only some debugging info
  -O                        Optimize; may hinder debugging or enlarge class file
  -nowarn                   Generate no warnings
  -verbose                  Output messages about what the compiler is doing
  -deprecation              Output source locations where deprecated APIs are used
  -classpath <path>         Specify where to find user class files
  -sourcepath <path>        Specify where to find input source files
  -bootclasspath <path>     Override location of bootstrap class files
  -extdirs <dirs>           Override location of installed extensions
  -d <directory>            Specify where to place generated class files
  -encoding <encoding>      Specify character encoding used by source files
  -source <release>         Provide source compatibility with specified release
  -target <release>         Generate class files for specific VM version
  -help                     Print a synopsis of standard options
To compile Moxie, enter:
javac Moxie.java
If the program compiles without errors, the compilation produces the class file Moxie.class. This class file contains the byte codes that the JRE interprets to run the program on your particular platform. You can then run the program by using this line:
java Moxie test.xml test.xsl
You can also recreate your JAR file with the jar
tool using this command:
jar cfm moxie.jar META-INF/MANIFEST.MF Moxie.class
This line uses the jar
tool to create
(c
) a new file (f
)
moxie.jar
with a manifest file
(m
) called META-INF/MANIFEST.MF
and with the class file Moxie.class. The
manifest file conveys information to the Java interpreter when, for
example, the interpreter is run with the -jar
option. One such bit of information is what class holds the
main( )
method. This information is passed on with
the following field and value pair from the manifest file:
Main-Class: Moxie
You need this field and value in order for this command to work:
java -jar moxie.jar
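If you are curious how that field is read, the java.util.jar.Manifest class can parse manifest text for you. The following minimal sketch is hypothetical: the class name ManifestSketch is mine, and the Manifest-Version line is an assumption about what Moxie’s manifest contains beyond the Main-Class field shown above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Manifest;

// Hypothetical sketch: look up the Main-Class field in manifest text
class ManifestSketch {

    static String mainClass(String manifestText) throws IOException {
        Manifest mf = new Manifest(new ByteArrayInputStream(
            manifestText.getBytes(StandardCharsets.UTF_8)));
        // Main-Class lives in the manifest's main attribute section
        return mf.getMainAttributes().getValue("Main-Class");
    }

    public static void main(String[] args) throws IOException {
        // Assumed manifest content; only Main-Class comes from the text above
        String text = "Manifest-Version: 1.0\nMain-Class: Moxie\n";
        System.out.println(mainClass(text)); // prints "Moxie"
    }
}
```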
Actually, there is an easier way to perform all these steps at once by using the Ant build tool.
Ant is a Java-based build tool sponsored by Apache (see http://ant.apache.org). Ant is easy to use and a time saver. If you are not familiar with Ant but would like to give it a try, go to http://ant.apache.org/resources.html for a list of FAQs, articles, presentations, and books that will help you get up to speed. A build file called build.xml is in examples/ch17 and is available to you for building Moxie with Ant.
The file build.xml also depends on the
ant.properties file (which is also in
examples/ch17) to provide the location of the base
directory where the builds take place. On Windows, the base directory
is assumed to be c:/learningxslt/examples/ch17/, which is set with the line
base.dir=c:/learningxslt/examples/ch17/
; if your examples are installed elsewhere, change
this value to the correct location.
Assuming that you have downloaded Ant (I’m using Version 1.5.3), installed it, and placed it in your path, you should be able to type the following on a command line:
ant -version
You will get this information on your screen:
Apache Ant version 1.5.3 compiled on April 9 2003
If you type the word ant
alone on a command line
on Windows, while the current directory is
examples/ch17, Ant automatically picks up the
build file build.xml and performs the build,
reporting the following to the screen:
Buildfile: build.xml

init:
   [delete] Deleting: C:\learningxslt\examples\ch16\moxie.jar

compile:
    [javac] Compiling 1 source file

jar:
      [jar] Building jar: C:\LearningXSLT\examples\ch16\moxie.jar

java:
     [java] Moxie JAXP XSLT processor
     [java] Usage: java -jar moxie.jar source stylesheet [result]
     [java] Java Result: 1

zip:
      [zip] Building zip: C:\LearningXSLT\examples\ch16\moxie.zip

finish:
     [copy] Copying 1 file to C:\LearningXSLT\examples\ch16\Backup

BUILD SUCCESSFUL
Total time: 7 seconds
In just one step, the build process defined by build.xml performs the following tasks:
Deletes the old moxie.jar file.
Compiles Moxie.java, if it has changed since the last build.
Builds a new JAR file for Moxie (moxie.jar).
Runs the Moxie program without arguments.
Creates a zip file that stores all of Moxie’s resources in one spot (moxie.zip).
Copies moxie.zip to the directory examples/ch17/Backup.
Ant is growing in popularity and is being integrated into IDEs like jEdit, VisualAge, and even WebSphere (for links, see http://ant.apache.org/manual/ide.html). Ant also has tasks that do XSLT processing. Check it out at http://ant.apache.org/manual/CoreTasks/style.html. If you work much with Java, learning Ant will be well worth your time.
Eric Burke’s
Java and XSLT (O’Reilly) is a
good place to turn for help with using XSLT with JAXP. Brett
McLaughlin’s Java & XML, Second
Edition (O’Reilly) provides solid help
with using Java APIs such as JAXP, SAX, DOM, and JDOM with XML. I also
recommend that you get acquainted with XOM (XML Object Model), Elliotte Rusty
Harold’s Java API for XML, available
for download from http://www.xom.nu. XOM is simple, easy to
learn, and has taken many lessons from earlier APIs. XOM also has a
package (nu.xom.xslt
) for connecting to XSLT
processors that support JAXP.
I’ll now turn my attention to writing a simple XSLT processor with C#.
C# is Microsoft’s evolution of C++ and Java. It’s similar to Java, so I’ve found it easy to learn. C# takes some interesting forks from Java, such as its use of properties, delegates, and so forth. However, exploring the virtues and foibles of C# is not my mission here. I’m just going to show you how to create an XSLT processor in C#—really only a simple command-line interface to .NET’s underlying XSLT engine. It’s easy to do after you have the right pieces.
C# comes as part of Microsoft’s .NET Framework 1.1 SDK. You can download the Framework by following the .NET download link on http://www.microsoft.com/net. It’s well over 100 megabytes, so it takes some time to download, especially if you don’t have a fast Internet connection. This example uses Version 1.1 of the .NET Framework SDK. You need Windows 2000 or Windows XP for the Framework to even install, so either one is required for this exercise. .NET applications will run on other Windows operating systems, but that requires extra steps that I won’t go into here.
The Mono Project includes an open source version of C# that was declared code complete about mid-2003. The Mono version of C# runs on Windows, Linux, FreeBSD, and Mac OS X. I have not tested the C# code in this chapter with Mono, but it’s likely to work.
In examples/ch17/Pax.cs, you will also find the C# source code for the Pax XSLT processor, shown in Example 17-2.
/*
 * Pax C# XSLT Processor
 */

using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

public class Pax {

    public static void Main(String[ ] args) {

        // Output file flag
        bool file = false;

        // Usage strings
        string info = "Pax C# XSLT processor";
        string usage = " Usage: Pax source stylesheet [result]";

        // Test arguments
        if (args.Length == 0) {
            Console.WriteLine(info + usage);
            Environment.Exit(1);
        } else if (args.Length == 3) {
            // Third argument = output to file
            file = true;
        } else if (args.Length > 3) {
            Console.WriteLine("Too many arguments; exit.");
            Environment.Exit(1);
        }

        // Create the XslTransform
        XslTransform xslt = new XslTransform( );

        // Load the XML document, create XPathNavigator for transform
        XPathDocument doc = new XPathDocument(args[0]);
        XPathNavigator nav = doc.CreateNavigator( );

        // Load a stylesheet
        xslt.Load(args[1]);

        // Create the XmlTextWriter
        XmlTextWriter writer;
        if (file) {
            // Output to file with ASCII encoding
            writer = new XmlTextWriter(args[2], Encoding.ASCII);
        } else {
            // Output to console
            writer = new XmlTextWriter(Console.Out);
        }

        // Write XML declaration
        writer.WriteStartDocument( );

        // Set indentation to 1
        writer.Formatting = Formatting.Indented;
        writer.Indentation = 1;

        // Transform file
        xslt.Transform(nav, null, writer, null);

        // Close XmlTextWriter
        writer.Close( );

    }

}
Right away, you should notice that the code in Pax.cs is very similar to the code in Moxie.java. A C# programmer should be able to figure out this code in a few glances, but again, if you’re not familiar with C#, you can read the following section, which walks through the program nearly line by line.
C# uses
similar comment characters to Java. Instead of packages, C# uses
namespaces, declaring them at the very beginning of the program with
the reserved word using
:
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;
A using directive in C# names a namespace rather than an individual
class, as an import statement usually does in Java: you give the namespace name, such as
System.Xml.Xsl
, which makes every type in that namespace available to
the program.
Following this, the Pax
class is defined and the
Main( )
method is declared. The command-line
arguments to Main( )
are, as in
Moxie.java, evaluated with an
if
statement. The three possible arguments
represent files:
The first argument (args[0]
) represents an XML
source document that you want to transform.
The next argument (args[1]
) represents the XSLT
stylesheet for the transformation.
The optional third argument (args[2]
) represents
the name of the file where the result tree will be stored, if it is
used. If it is absent, the result tree will appear on
Console.Out
(C#’s name for standard output or the
screen). The file
variable (of type
bool
) indicates whether the third argument is
present. file
is set to false
by default, but if the third argument is on the command line,
file
is set to true
, and the
program will know that a file should be written for the result tree.
The XslTransform
class comes from the
System.Xml.Xsl
namespace. This line instantiates a
transformer named xslt
:
XslTransform xslt = new XslTransform( );
The classes that follow are in the
System.Xml.XPath
namespace:
XPathDocument doc = new XPathDocument(args[0]);
XPathNavigator nav = doc.CreateNavigator( );
An XPathDocument
provides a cache for performing
the transformation, and the CreateNavigator( )
method from XPathDocument
creates an
XPathNavigator
for navigating the cached document.
The Load( )
method from
XslTransform
loads the stylesheet from the second
argument (args[1]
) to the program:
xslt.Load(args[1]);
In C#, the XmlTextWriter
class from the
System.Xml
namespace creates a writer for XML
output:
XmlTextWriter writer;
if (file) {
    // Output to file with ASCII encoding
    writer = new XmlTextWriter(args[2], Encoding.ASCII);
} else {
    // Output to console
    writer = new XmlTextWriter(Console.Out);
}
If a third argument is present on the command line,
file
is set to true
, and the
output from the program will be written to a file encoded as
US-ASCII
. Encoding
is a
class from System.Text
, and ASCII is one of its static properties. Some other possible
properties are UTF8
for UTF-8
output, Unicode
for UTF-16 output, and
BigEndianUnicode
for UTF-16BE. If
file
is false
, the output will
be written to the console using IBM437 output, based on the codepage
for a Windows command prompt.
The following line tells the writer to use an XML declaration in the output:
writer.WriteStartDocument( );
Without this line, no XML declaration is written. These lines of code set the indentation of the output to a single space per child element:
writer.Formatting = Formatting.Indented;
writer.Indentation = 1;
Formatting
and Indentation
are
properties from the XmlTextWriter
class. The next
line performs the actual transformation:
xslt.Transform(nav, null, writer, null);
The XslTransform
instance xslt
loaded the stylesheet earlier with its Load( )
method. The first argument to Transform( )
is an instance of an
XPathNavigator
object, and the third argument is
an instance of an XmlTextWriter
object. The second argument, which is null
here, can
take an XsltArgumentList
to provide a list of
parameters or extension objects to the transform. The fourth argument, also null
here, can supply an XmlResolver
used to resolve the XSLT document( ) function. The final statement
in the program closes the XmlTextWriter
object
writer
, automatically closing any elements or
attributes that might still be open:
writer.Close( );
A compiled version of Pax is in examples/ch17 (Pax.exe). To run Pax, type the following line at a Windows 2000/XP command prompt:
pax
If all is well, the program will return some usage information to you:
Pax C# XSLT processor Usage: Pax source stylesheet [result]
To transform test.xml with test.xsl, type:
pax test.xml test.xsl
With this command, you will get the following results:
<?xml version="1.0" encoding="IBM437"?>
<methods>
 <method>clearParameters( )</method>
 <method>getErrorListener( )</method>
 <method>getOutputProperties( )</method>
 <method>getOutputProperty(String name)</method>
 <method>getParameter(String name)</method>
 <method>getURIResolver( )</method>
 <method>setErrorListener(ErrorListener listener)</method>
 <method>setOutputProperties(Properties oformat)</method>
 <method>setOutputProperty(String name, String value)</method>
 <method>setParameter(String name, Object value)</method>
 <method>setURIResolver(URIResolver resolver)</method>
 <method>transform(Source xmlSource, Result outputTarget)</method>
</methods>
The output encoding is set to IBM437
for screen
output. You can also save the output to a file using:
pax test.xml test.xsl pax.xml
The output encoding in pax.xml is
US-ASCII
as set by the
Encoding.ASCII
property. If you want to alter this
program, you’ll also need to know how to recompile
it.
With
the .NET Framework Version 1.1
downloaded and installed on your system, you should be able to access
the C# compiler csc
. If you type
csc
at a command prompt with no options, you
should see this:
Microsoft (R) Visual C# .NET Compiler version 7.10.2292.4
for Microsoft (R) .NET Framework version 1.1.4322
Copyright (C) Microsoft Corporation 2001-2002. All rights reserved.

fatal error CS2008: No inputs specified
To view the many options available with csc
, enter:
csc /help
To compile Pax, type this command:
csc Pax.cs
Upon success, the compiler will produce a new version of Pax.exe. For more information on C#, study the vast documentation provided with the Version 1.1 Framework download. You can access the documentation by clicking on the Documentation link under Microsoft .NET Framework SDK v1.1 under Programs or All Programs on the Start menu.
If you are a programmer, this chapter has given you a leg up for creating your own interface to an XSLT processor in either Java or C#. I hope that the code and explanations were simple enough that you got the basic concepts down, and perhaps you were inspired to try some coding yourself. It certainly isn’t that difficult to get started if you have a programming background. There is only one more chapter to go, and it’s a short one.