By default, AxKit presumes that the documents that it is serving and processing are stored as plain files on the filesystem. While this is typically adequate for most cases, in some situations you may need to get data from another source. In AxKit, the mechanism for slurping in data for further processing is called a Provider.
Providers come in two flavors: ContentProviders and StyleProviders. As their names suggest, ContentProviders are responsible for fetching the source of the content being delivered, while StyleProviders handle getting the source of the stylesheets to be used to transform that content. The default class for both types, Apache::AxKit::Provider::File, reads data from XML sources on the filesystem. Alternate classes can be configured for both the ContentProvider and the StyleProvider for a given resource using the corresponding AxKit configuration directive:
    # Set each type of Provider explicitly
    AxContentProvider My::Provider
    AxStyleProvider My::Other::Provider

    # Or, configure both to use the same alternate class
    AxProvider My::Generic::Provider
Custom Providers can be used to fetch content from non-XML data, to get XML data from sources other than flat files, or a combination of the two. In some cases, you are looking to take advantage of Perl’s capable XML tools and other data processing facilities to generate an XML instance based on another source of data. In others, you simply want to read in XML from a source other than a plain file on the disk. For example, some Providers may use a SAX generator class to dynamically generate an XML document from a directory listing or Excel spreadsheet, while others may be used to extract existing XML content from a zip archive or relational database.
But wait, as you saw in Chapter 7, AxKit offers several fine options for generating dynamic content, so why would you use a Provider instead of a taglib? There is no hard and fast rule, but, in general, defining the real source of the content decides the matter. For example, a shopping cart application that includes a list of products from a database is probably best implemented through a taglib, while a content management system that returns a complete document from the same database may best be integrated into AxKit as a custom ContentProvider. That is, in the shopping cart page, the product list is only one component that is included into the content, while the data returned from the CMS defines the content. This distinction may seem a bit arbitrary, and from a technical point of view, it is, given that either task could be achieved by either means. Spending a few minutes considering the best approach for the task at hand can save hours of development time in the long run.
Generally, a Provider is expected to offer access to the sources of
the content or stylesheets that are associated with the current
request, as well as certain key pieces of metadata about those
sources. The data for the given resource must be returned from one of
the get_fh
(get filehandle),
get_strref
(get string reference), or
get_dom
(get Document Object instance) methods.
These methods are simply variations that allow the source to be
passed to AxKit in different data formats. Only one must be
implemented for the Provider to work. All custom Providers should be
inherited subclasses of AxKit’s base Provider class
Apache::AxKit::Provider
or one of its subclasses:
    package My::Provider;
    use strict;
    use Apache::AxKit::Provider;
    use vars qw( @ISA );
    @ISA = qw( Apache::AxKit::Provider );

    # Override some class methods and
    # add a few of your own.

    1;
Called before all other methods, the init
method
gives the Provider a chance to perform any initialization logic
needed to prepare for further processing. It is most commonly used
for things such as instantiating any objects that the Provider needs
to handle the request, initializing instance variables, and so on. In
classes that inherit from AxKit’s base Provider
class (and most should), it is passed the same arguments that were
passed to the constructor for the current instance:
    sub init {
        my $self = shift;
        my %args = @_;
        $self->{content_application} = My::App->new( );
        # and so on . . .
    }
The process
method is used to communicate
whether or not the Provider provides content for the given resource.
It is passed no arguments (apart from the instance reference passed
as the first argument to all Perl methods) and is expected to return
1
(or any nonzero value) to indicate that all
conditions are met for the Provider to handle the request and
0
or undef
, otherwise. For
cases in which the Provider cannot continue, it is strongly
recommended that an appropriate exception be thrown, providing an
explanation as to what may have gone wrong:
    sub process {
        my $self = shift;
        my $uri = $self->apache_request->uri( );

        # Get the data based on some URI-to-data mapping method
        # implemented elsewhere in your custom Provider
        my $data = $self->map_uri_to_data( $uri );

        if ( defined( $data ) ) {
            $self->{data} = $data;
            return 1;
        }
        else {
            throw Apache::AxKit::Exception::Error(
                -text => "No data associated with URI '$uri'."
            );
        }
    }
Used with AxKit’s caching mechanism, the
mtime
method is expected to return the last
modification time in seconds, since the epoch, for the current
resource. If the document being provided will always be dynamic
(based on user input, etc.), returning the result of
Perl’s built-in time( )
function ensures that the data is never cached unexpectedly:
    sub mtime {
        return time( ); # content is always fresh.
    }
Implementing mtime
correctly for cases in which
the data being provided is not a plain file on the disk but is an
aggregate of data from more than one source can be tricky. For
example, if the content is being built as the result of an SQL query
that joins several tables that may have been updated at different
times, how does one determine the true last modification time for
that resource? The answer is always very application-specific, and I
will avoid making dubious generalizations here. It is enough to say
that being able to take advantage of AxKit’s caching
facilities wherever possible and appropriate is a
huge performance gain. The time spent
implementing mtime
is usually worth the
investment.
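One common starting point, offered here only as a sketch and not as AxKit's own approach, is to treat the newest timestamp among all contributing sources as the modification time of the aggregate. The helper name aggregate_mtime and the table names and timestamps below are hypothetical; how you obtain a per-source timestamp depends entirely on your application.

```perl
use strict;
use warnings;
use List::Util qw( max );

# Hypothetical helper: given the last-modified times (in epoch seconds) of
# every source that contributes to a document, return the newest one.
# If no timestamp is known, fall back to the current time so that the
# document is treated as always fresh.
sub aggregate_mtime {
    my @times = grep { defined $_ } @_;
    return @times ? max( @times ) : time( );
}

# Illustrative data: three tables joined by one query, each with its own
# (made-up) last-update time.
my %table_mtime = (
    articles => 1_000_000_300,
    authors  => 1_000_000_100,
    comments => 1_000_000_900,
);

print aggregate_mtime( values %table_mtime ), "\n";
```

A Provider's mtime method could return the result of such a helper. The fallback to time( ) mirrors the always-fresh behavior described earlier for fully dynamic documents.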
Called only on ContentProvider classes, the
get_styles
method is responsible for returning
the final list of processors to be applied to the given resource. It
is expected to return a reference to a list of style definitions that
AxKit uses to transform the content. Styles are applied in the order
that they appear—the first style is applied to the source
content, the second to the result of the first, and so on. The style
definitions take the form of an anonymous HASH reference containing
two required keys: href
whose value contains the
DocumentRoot-relative path to the stylesheet to be applied, and
type
, whose value declares the MIME type
associated with the Language processor to be used to apply the
stylesheet:
    my @styles = (
        { type => "text/xsl", href => "/styles/style1.xsl" },
        { type => "text/xsl", href => "/styles/style2.xsl" },
    );
In the default ContentProviders, get_styles( )
is used to map the current preferred style and media to any
xml-stylesheet
processing instructions contained
in the source XML. If no matching styles are found, the
ConfigReader’s
GetMatchingProcessors
method is called, the
document’s root element name and Document Type
Definition are evaluated against all
AxAdd*Processor
configuration directives in the
current scope, and any matching styles are used instead. In all,
get_styles
is a crucial method whose default
implementation provides much of AxKit’s expected
behavior. It should be overridden only with caution and a clear
purpose.
That said, some Providers, most notably those implementing a bridge
between AxKit and a content creation application that needs to define
one or more stylesheet transformations to create the
“view” of a given set of data for a
particular application state, may need explicit control over the list
of styles to apply to the content. In these cases,
get_styles
offers the most direct, least
ambiguous way to define the styles to be applied. Overriding the
default implementation of this method does not mean abandoning the
use of the preferred style and media properties that an upstream
plug-in may have set—these values are passed in as arguments to
get_styles
. The following shows how an
application-based Provider may conditionally override the current
list of styles, while still falling back to any default styles
defined via configuration directive or
xml-stylesheet
processing instruction:
    package My::Provider;
    use vars qw( @ISA );
    @ISA = qw( Apache::AxKit::Provider );

    . . .

    sub get_styles {
        my $self = shift;
        my ( $preferred_media_name, $preferred_style_name ) = @_;
        my $app = $self->{some_content_application};
        my @style_list = $app->get_axkit_styles( $preferred_media_name, $preferred_style_name );

        # If your application returned styles, use those; otherwise, fall back to the
        # default implementation in your parent Provider class.
        if ( scalar( @style_list ) > 0 ) {
            return \@style_list; # you return a reference, not the list itself.
        }
        else {
            return $self->SUPER::get_styles( $preferred_media_name, $preferred_style_name );
        }
    }
Or, here’s how an application-driven ContentProvider may alter the preferred media and style properties while letting the default Provider handle the low-level details:
    sub get_styles {
        my $self = shift;
        my ( $preferred_media_name, $preferred_style_name ) = @_;
        my $app = $self->{some_content_application};

        my $new_preferred_style = $app->get_axkit_stylename( ) || $preferred_style_name;
        my $new_preferred_media = $app->get_axkit_medianame( ) || $preferred_media_name;

        return $self->SUPER::get_styles( $new_preferred_media, $new_preferred_style );
    }
One of three methods available for returning content,
get_strref
(get string reference) offers the
ability to return the XML content for the current resource as a
reference to a scalar containing the entire document as a string. For
example, the following shows how a custom ContentProvider built on
XML::Generator::DBI
(which generates SAX events from the result of a database query) may
implement get_strref
to return a generated XML
instance:
    sub get_strref {
        my $self = shift;
        my $content = '';

        # Have the SAX writer serialize the generated events into $content
        my $writer = XML::SAX::Writer->new( Output => \$content );
        my $generator = XML::Generator::DBI->new(
            Handler => $writer,
            dbh     => $self->{db_handle}
        );

        eval { $generator->execute( $self->{sql_statement} ); };
        if ( my $error = $@ ) {
            throw Apache::AxKit::Exception::Error(
                -text => "Error generating XML: $error"
            );
        }

        if ( length( $content ) ) {
            # you return a reference, not the scalar itself
            return \$content;
        }
        else {
            throw Apache::AxKit::Exception::Error(
                -text => "No data was returned from SQL $self->{sql_statement}."
            );
        }
    }
Similar to get_strref
, the
get_fh
(get filehandle) method offers a way to
return data as an open filehandle. In some circumstances too complex
to detail here, a filehandle requires fewer system resources than a
scalar variable that contains the same document as a plain string;
get_fh
offers a way to take advantage of that
optimization.
    # As above, but return a filehandle instead
    sub get_fh {
        my $self = shift;

        # Use the Apache-friendly way to create a new filehandle
        my $handle = $self->apache_request->gensym( );
        my $writer = XML::SAX::Writer->new( Output => $handle );
        my $generator = XML::Generator::DBI->new(
            Handler => $writer,
            dbh     => $self->{db_handle}
        );

        eval { $generator->execute( $self->{sql_statement} ); };
        if ( my $error = $@ ) {
            throw Apache::AxKit::Exception::Error(
                -text => "Error generating XML: $error"
            );
        }

        return $handle;
    }
The get_dom
method offers a way to return the
XML data for the current resource as an
XML::LibXML::Document
instance. It is used most often as a means to pass the content from
application frameworks such as SAWA and CGI::XMLApplication without
incurring the overhead of serializing that DOM object via its
toString
method and reparsing it once it is
passed into AxKit.
    sub get_dom {
        my $self = shift;
        my $content = $self->{XML_APP}->getDom( );

        unless ( $content ) {
            throw Apache::AxKit::Exception::Error(
                -text => "Error generating XML, no document object returned"
            );
        }

        return $content;
    }
Called throughout AxKit, the Provider’s
key
method should return a string that can be
used as a persistent, unique identifier for the current resource. It
is used extensively by AxKit’s default caching
mechanism (along with mtime
) both to look up
content that may be cached on the disk and to create the ID for a new
cache entry if caching is turned on and none previously existed.
In the default file Provider, key
simply returns
the filename associated with the current request, which is sufficient
in most cases. File-based alternate Providers are encouraged to do
the same, or to inherit from
Apache::AxKit::Provider::File
and avoid implementing the key
method
altogether. In cases in which there is no one-to-one mapping between
the current request URI and a file on the filesystem, a smarter
key
method is almost always required.
Suppose you use a content management system for part of your site
that stores the source XML documents in a relational database. You
are now faced with creating the public interface to that data.
Let’s go a step further and say that your CMS offers
an internal hierarchical mapping that allows content objects to be
selected using a path interface. Setting up the interface is easy.
You only need to create a virtual URL with a
<Location>
directive and set your custom
Provider as the ContentProvider for that URL. Then you can simply use
any additional path information from the incoming request as the path
passed to your application to retrieve the content. You must
explicitly set the cache directory for this resource, since, by
default, AxKit attempts to write the cache to the same directory as
the requested content—in this case, a directory that does not
actually exist.
    # virtual URI for public side of your CMS
    <Location /cmsapp/content>
        AxContentProvider My::CMS::Provider

        # always set an explicit cache for virtual URIs
        AxCacheDir /.mycachedir
    </Location>
Given that all requests for content within this resource always have
the same value from the request object’s
filename
, you cannot just use the same strategy
as the default Providers. You must use the full URI (including the
additional path information) in the string returned to create a
unique cache key for each document in the document store:
    sub key {
        my $self = shift;
        my $r = $self->apache_request( );
        return $r->uri( );
    }
You can achieve the same effect using a unique property from the content object itself:
    sub key {
        my $self = shift;
        return $self->{content_object}->id( );
    }
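Whichever source you choose, the key should be identical for requests that name the same document. The following sketch shows one way to normalize a URI before using it as a key. The helper name cache_key_for, the namespace prefix, and the normalization rules are all assumptions made for illustration; they are not part of AxKit, and your application may need different rules.

```perl
use strict;
use warnings;

# Hypothetical helper: build a cache key from a namespace and a request URI.
# The namespace keeps keys from different document stores apart.
sub cache_key_for {
    my ( $namespace, $uri ) = @_;
    $uri =~ s{/+}{/}g;         # collapse repeated slashes
    $uri =~ s{(?<=.)/\z}{};    # drop a trailing slash, but keep a bare "/"
    return "$namespace:$uri";
}

print cache_key_for( 'cms', '/cmsapp/content//news/today/' ), "\n";
```

A key method could pass the request URI through such a helper before returning it, so that trivially different spellings of the same path share one cache entry.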
The exists method is expected to return 1
(or any
nonzero value) if the resource exists and is readable and
0
or undef
, otherwise.
Typically, a class member added to the current instance during
init
or process
can be
examined and the appropriate value returned.
    sub process {
        my $self = shift;
        my $uri = $self->apache_request->uri( );
        my $data = $self->map_uri_to_data( $uri );

        if ( defined( $data ) ) {
            $self->{data} = $data;
            $self->{exists} = 1;
            return 1;
        }
        else {
            throw Apache::AxKit::Exception::Error(
                -text => "No data associated with URI '$uri'."
            );
        }
    }

    sub exists {
        my $self = shift;
        if ( defined( $self->{exists} ) ) {
            return 1;
        }
        else {
            return 0;
        }
    }
It is worth mentioning again: as subclasses of one of
AxKit’s default Providers, most custom Providers
only ever need to implement a few of these methods. Often,
implementing both the process
method to fetch
and preprocess the data from the given source and one of the
get_*
methods to return that data to AxKit are
all that is required for a working Provider. To bring this all
together, a simple Provider allows you to transparently serve both
content and
stylesheets from
zip archives. (See Example 8-2.)
Example 8-2. Provider::Zip
    package Provider::Zip;
    use strict;
    use vars qw($VERSION @ISA);
    use Apache::AxKit::Provider::File;
    use Archive::Zip qw(:ERROR_CODES);

    # Inherit from the default file Provider class
    @ISA = ('Apache::AxKit::Provider::File');

    # Pass a reference to the handler rather than calling it
    Archive::Zip::setErrorHandler(\&_error_handler);

    sub _error_handler {
        my $error = shift;
        AxKit::Debug(3, $error);
    }

    sub exists {
        my $self = shift;
        return defined( $self->{zip_member} );
    }

    sub mtime {
        my $self = shift;
        # lastModTime( ) converts the member's MS-DOS timestamp to Unix time
        return $self->{zip_member}->lastModTime( );
    }

    sub process {
        my $self = shift;
        my $zip = Archive::Zip->new( );

        if ($zip->read($self->{file}) != AZ_OK) {
            throw Apache::AxKit::Exception::IO (-text => "Couldn't read archive file '$self->{file}'");
        }

        my $r = $self->apache_request;
        my $member;
        my $path_info = $r->path_info;
        $path_info =~ s|^/||;

        if ( $self->{zip_uri} ) {
            $member = $zip->memberNamed($self->{zip_uri});
        }
        else {
            if ($path_info) {
                $member = $zip->memberNamed($path_info);
            }
            else {
                $member = $zip->memberNamed('index.xml') || $zip->memberNamed('index.xsp');
            }
        }

        unless ( $member ) {
            throw Apache::AxKit::Exception::Declined(
                -text => "Document could not be retrieved from $self->{file}"
            );
        }

        $self->{zip_member} = $member;
        return 1;
    }

    sub get_strref {
        my $self = shift;
        my ($data, $status) = $self->{zip_member}->contents( );
        my $r = $self->apache_request( );

        if ($status != AZ_OK) {
            throw Apache::AxKit::Exception::Error(
                -text => "Document could not be retrieved from $self->{file}: $status"
            );
        }

        # Allow images to be served correctly
        if ( $r->path_info =~ /\.(png|gif|jpg)$/ ) {
            my $image_type = $1;
            $r->content_type( 'image/' . $image_type );
            $r->send_http_header( );
            $r->print( $data );
            throw Apache::AxKit::Exception::Declined(
                -text => "Image detected, skipping further processing."
            );
        }

        # Return a reference, not the scalar itself
        return \$data;
    }

    1;
Obviously, a production-worthy implementation would be a bit more
complex, but the basic functionality exists. Once this custom
Provider is installed, you only need to configure AxKit to process
zip archives and then to set up special Alias
and
Location
directives for each zip to make browsing
the archived content seem transparent:
    # Set AxKit to process zip archives
    AddHandler axkit .zip

    # Add an Alias, so the zipped content appears
    # to be a native part of the site.
    Alias /help /www/sites/myaxkithost/zips/helpdocs.zip

    # And set AxKit to use Provider::Zip to fetch both
    # content and stylesheets for the zipped help docs.
    <Location /help>
        AxProvider Provider::Zip
    </Location>
With these directives in place, a request to
http://localhost/help/index.xml
causes AxKit to
extract the file index.xml
from the top level of
the helpdocs.zip
archive. In addition, any
xml-stylesheet
processing instructions found in
that document whose href
attribute points to a
document at or below that same level in the archive also cause that
stylesheet document to be extracted and applied
at request time.
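The way Provider::Zip maps the extra path information onto an archive member can be tried in isolation. The following is only a sketch under stated assumptions: resolve_member is a hypothetical helper, and a plain hash stands in for the archive's table of contents so that the example runs without Archive::Zip or Apache.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the lookup performed in process( ): $members is
# a hash reference whose keys play the role of the archive's member names.
# Returns the member name to serve, or undef if the request should be
# declined.
sub resolve_member {
    my ( $members, $path_info ) = @_;
    $path_info = '' unless defined $path_info;
    $path_info =~ s{^/}{};    # strip the leading slash, as process( ) does

    if ( length $path_info ) {
        return exists $members->{$path_info} ? $path_info : undef;
    }

    # No path given: fall back to the default documents, in order.
    for my $default ( 'index.xml', 'index.xsp' ) {
        return $default if exists $members->{$default};
    }
    return undef;
}

# Made-up archive contents for illustration.
my %members = map { $_ => 1 } qw( index.xml styles/help.xsl topics/intro.xml );

for my $path ( '/index.xml', '', '/topics/intro.xml', '/missing.xml' ) {
    my $member = resolve_member( \%members, $path );
    printf "%-22s => %s\n", "'$path'", defined $member ? $member : '(declined)';
}
```

Keeping the lookup in a small function of its own makes the fallback order easy to test before wiring it into a real Provider.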