Structure is nothing if it is all you got. Skeletons spook people if they try to walk around on their own. I really wonder why XML does not.
—Erik Naggum
XML doesn’t get much respect from the Rails community. It’s enterprisey. In the Ruby world that other markup language YAML (YAML Ain’t Markup Language) and data interchange format JSON (JavaScript Object Notation) get a heck of a lot more attention. However, use of XML is a fact of life for many projects, especially when it comes to interoperability with legacy systems. Luckily, Ruby on Rails gives us some pretty good functionality related to XML.
This chapter examines how to both generate and parse XML in your Rails applications, starting with a thorough examination of the to_xml method that most objects have in Rails.
to_xml Method
Sometimes you just want an XML representation of an object, and Active Record models provide easy, automatic XML generation via the to_xml method. Let’s play with this method in the console and see what it can do.
I’ll fire up the console for my book-authoring sample application and find an Active Record object to manipulate.
There we go, a User instance. Let’s see that instance in its generic XML representation.
Ugh, that’s ugly. Ruby’s print function might help us out here with formatted XML.
Much better! So what do we have here? Looks like a fairly straightforward serialized representation of our User instance in XML.
to_xml Output
The standard processing instruction is at the top, followed by an element name corresponding to the class name of the object. The properties are represented as subelements, with non-string data fields including a type attribute. Mind you, this is the default behavior, and we can customize it with some additional parameters to the to_xml method.
We’ll strip down that XML representation of a user to just an email and login using the only parameter. It’s provided in a familiar options hash, with the value of the :only parameter as an array:
Following the familiar Rails convention, the only parameter is complemented by its inverse, except, which will exclude the specified properties. What if I want my user’s email and login as a snippet of XML that will be included in another document? Then let’s get rid of that pesky instruction, too, using the skip_instruct parameter.
We can change the root element in our XML representation of User and the indenting from two to four spaces by using the root and indent parameters, respectively.
By default Rails converts CamelCase and underscore attribute names to dashes, as in created-at and client-id. You can force underscore attribute names by setting the dasherize parameter to false.
In the preceding output, the attribute type is included. This too can be configured using the skip_types parameter.
to_xml
So far we’ve only worked with a base Active Record and not with any of its associations. What if we wanted an XML representation of not just a book but also its associated chapters? Rails provides the :include parameter for just this purpose. The :include parameter will also take an array of associations to represent in XML.
Rails 3 has a much more useful to_xml method on core classes. Unlike Rails 2, arrays are easily serializable to XML, with element names inferred from the name of the Ruby type:
If you have mixed types in the array, this is also reflected in the XML output:
To construct a more semantic structure, the root option on to_xml triggers more expressive element names:
Ruby hashes are naturally representable in XML, with keys corresponding to element names, and their values corresponding to element contents. Rails automatically calls to_s on the values to get string values for them:
This simplistic serialization may not be appropriate for certain interoperability contexts, especially if the output must pass XML Schema (XSD) validation, where the order of elements is often important. In Ruby 1.8.x, the Hash class does not order keys for enumeration. In Ruby 1.9.x, the Hash class uses insertion order. Neither of these may be adequate for producing output that matches an XSD. The section “The XML Builder” will discuss Builder::XmlMarkup to address this situation.
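To make the ordering concern concrete, here is a minimal plain-Ruby sketch that emits a hash’s values in an explicitly supplied element order rather than relying on the hash’s own enumeration order. It is not how Rails or Builder work internally; the names escape_xml and to_ordered_xml, and the sample book data, are assumptions made up for this illustration.

```ruby
# Minimal sketch: emit hash values as XML elements in an explicit order.
# escape_xml and to_ordered_xml are hypothetical helpers, not Rails APIs.

XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

# Escape the characters that are unsafe in XML element content.
def escape_xml(value)
  value.to_s.gsub(/[&<>]/, XML_ESCAPES)
end

# Build a flat XML fragment whose child elements follow `order`,
# regardless of the order in which keys were inserted into `hash`.
def to_ordered_xml(hash, root, order)
  children = order.map do |key|
    "  <#{key}>#{escape_xml(hash.fetch(key))}</#{key}>"
  end
  ["<#{root}>", *children, "</#{root}>"].join("\n")
end

# Sample data (made up); the keys are deliberately inserted in a
# different order from the one the output needs.
book = { :title => 'Rails & XML', :pages => 300, :author => 'A. Writer' }
puts to_ordered_xml(book, 'book', [:author, :title, :pages])
```

Passing the order separately keeps the schema’s sequence in one place, which is the same idea that writing the elements out by hand with Builder achieves.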
The :include option of to_xml is not used on Array and Hash objects.
to_xml Usage
By default, Active Record’s to_xml method only serializes persistent attributes into XML. However, there are times when transient, derived, or calculated values need to be serialized out into XML form as well. For example, our User model has a method that returns only draft timesheets:
To include the result of this method when we serialize the XML, we use the :methods parameter:
We could also set the :methods parameter to an array of method names to be called.
In cases where we want to include extra elements unrelated to the object being serialized, we can pass to_xml a block, or use the :procs option.
If we are applying the same logic to different to_xml calls, we can construct lambdas ahead of time and use one or more of them in the :procs option. They will be called with to_xml’s option hash, through which we access the underlying XmlBuilder. (XmlBuilder provides the principal means of XML generation in Rails.)
Note that the :procs are applied to each top-level resource in the collection (or the single resource if the top level is not a collection). Use the sample application to compare the output with the output from the following:
To add custom elements only to the root node, to_xml will yield an XmlBuilder instance when given a block:
Unfortunately, both :procs and the optional block are hobbled by a puzzling limitation: The record being serialized is not exposed to the procs being passed in as arguments, so only data external to the object may be added in this fashion.
To gain complete control over the XML serialization of Rails objects, you need to override the to_xml method and implement it yourself.
to_xml
Sometimes you need to do something out of the ordinary when trying to represent data in XML form. In those situations, you can create the XML by hand.
This would give the following result:
Of course, you could just go ahead and apply good object-oriented design, using a class responsible for translating between your model and an external representation.
Builder::XmlMarkup is the class used internally by Rails when it needs to generate XML. When to_xml is not enough and you need to generate custom XML, you will use Builder instances directly. Fortunately, the Builder API is one of the most powerful Ruby libraries available and is very easy to use, once you get the hang of it.
The API documentation says: “All (well, almost all) methods sent to an XmlMarkup object will be translated to the equivalent XML markup. Any method with a block will be treated as an XML markup tag with nested markup in the block.”
That is a very concise way of describing how Builder works, but it is easier to understand with some examples, again taken from Builder’s API documentation. The xm variable is a Builder::XmlMarkup instance:
A common use for Builder::XmlMarkup is to render XML in response to a request. Previously we talked about overriding to_xml on Active Record to generate our custom XML. Another way, though not as recommended, is to use an XML template.
We could alter our UsersController show method to use an XML template by changing it from:
Now Rails will look for a file called show.xml.builder in the RAILS_ROOT/app/views/users directory. That file contains Builder::XmlMarkup code like:
In this view, the variable xml is an instance of Builder::XmlMarkup. Just as in other views, we have access to the instance variables we set in our controller, in this case @user. Using the Builder in a view can provide a convenient way to generate XML.
Ruby has a full-featured XML library named REXML, and covering it in any level of detail is outside the scope of this book. If you have basic parsing needs, such as parsing responses from web services, you can use the simple XML parsing capability built into Rails.
Rails lets you turn arbitrary snippets of XML markup into Ruby hashes, with the from_xml method that it adds to the Hash class.
To demonstrate, we’ll throw together a string of simplistic XML and turn it into a hash:
There are no options for from_xml. You can also pass it an IO object:
Typecasting is done by using a type attribute in the XML elements. For example, here’s the auto-generated XML for a User object.
As part of the to_xml method, Rails sets attributes called type that identify the class of the value being serialized. If we take this XML and feed it to the from_xml method, Rails will typecast the strings to their corresponding Ruby objects:
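The idea behind type-driven casting can be sketched with REXML alone. This is not Rails’ from_xml implementation, only an illustration of the mechanism; the CASTS table, the typecast_children helper, and the sample user document are all assumptions made for this example, and the sketch keeps element names verbatim and handles only a flat document.

```ruby
require 'rexml/document'
require 'time'

# Hypothetical casting table for illustration; Rails supports more types.
CASTS = {
  'integer'  => ->(text) { Integer(text) },
  'boolean'  => ->(text) { text.strip == 'true' },
  'datetime' => ->(text) { Time.iso8601(text) }
}.freeze

# Convert the children of a flat XML document into a hash, casting each
# value according to its type attribute (untyped values stay strings).
def typecast_children(xml)
  root = REXML::Document.new(xml).root
  root.elements.each_with_object({}) do |element, result|
    cast = CASTS.fetch(element.attributes['type'], ->(text) { text })
    result[element.name] = cast.call(element.text.to_s)
  end
end

# Sample document (made up) in the style of to_xml output.
xml = <<~XML
  <user>
    <id type="integer">7</id>
    <login>susan</login>
    <admin type="boolean">false</admin>
    <created-at type="datetime">2010-05-18T19:31:40Z</created-at>
  </user>
XML

p typecast_children(xml)
```

Elements without a type attribute fall through to the identity lambda, which mirrors the way untyped values remain plain strings.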
Web applications often need to serve both users in front of web browsers and other systems via some API. Other languages accomplish this using SOAP or some form of XML-RPC, but Rails takes a simpler approach. In Chapter 3, REST, Resources, and Rails, we talked about building RESTful controllers and using respond_to to return different representations of resources. By doing so we could connect to http://localhost:3000/auctions.xml and get back an XML representation of all auctions in the system. We can now write a client to consume this data using Active Resource.
Active Resource is a standard part of the Rails framework. It has complete understanding of RESTful routing and XML representation, and is designed to look and feel much like Active Record.
The simplest Active Resource model would look something like this:
To get a list of auctions we would call its all method:
>> auctions = Auction.all
This will connect to http://localhost:3000/auctions.xml.
Active Resource can’t automatically filter the resources like you would with Active Record’s where method, but you can use :params to pass options to the server, which can then filter the results.
And then from the consumer application, you might do:
>> auctions = Auction.all(:params => { :reserve => 100 })
This method, however, could easily become unmanageable, since in reality you would want to filter out unsupported params. A much better solution when you want to filter your results is to define a custom collection method on the server, and query against that instead.
It is then trivial to query this collection from Active Resource:
>> Auction.all(:from => :open)
Active Resource also supports nested resource routes like those discussed in Chapter 3, “REST, Resources, and Rails.”
And now from your consumer application, you can pull back all of the items for an auction:
>> Item.all(:params => {:auction_id => 1})
Finding specific resources with Active Resource follows the same pattern as retrieving a collection. To fetch the auction with the id 1986, for instance, we can do:
>> Auction.find(1986)
If instead we just want to get the first auction, we can do:
>> Auction.first
You should note that Auction.first is equivalent to calling Auction.all.first (i.e., it will load http://localhost:3000/auctions.xml and then call first on the returned collection).
If we want to find the newest Auction, we can do something similar to the open example, but with a newest method.
Now we can retrieve the newest auction.
>> Auction.find(:one, :from => :newest)
You need to remember that unlike with Active Record, first is not the same as find(:one).
It’s also important to understand how a request to a nonexistent item is handled. If we tried to access an item with an id of -1 (there isn’t any such item), we would get an HTTP 404 status code back. Active Resource receives exactly this status and raises a ResourceNotFound exception in response. Active Resource makes heavy use of HTTP status codes, as we’ll see throughout this chapter.
Active Resource is not limited to just retrieving data; it can also create it. If we wanted to place a new bid on an item via Active Resource, we would do the following:
This would HTTP POST to the URL:
http://localhost:3000/auctions/3/items/6/bids.xml
with the supplied data. In our controller, the following would exist:
If the bid is successfully created, the newly created bid is returned with an HTTP 201 status code and the Location header is set pointing to the location of the newly created bid. With the Location header set, we can determine what the newly created bid’s id is. For example:
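As a rough illustration of how an id can be recovered from a Location header, the following standard-library sketch takes the last path segment and drops any format extension. The helper name id_from_location and the bid id 12 in the sample URL are assumptions for this example; Active Resource’s own extraction logic may differ in detail.

```ruby
require 'uri'

# Hypothetical helper: take the last path segment of a Location header
# and drop any format extension such as ".xml".
def id_from_location(location)
  File.basename(URI.parse(location).path, '.*')
end

# The trailing "12" is a made-up id for a newly created bid.
puts id_from_location('http://localhost:3000/auctions/3/items/6/bids/12.xml')
```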
If we tried to create the preceding bid again but without a dollar amount, we could interrogate the errors.
In this case a new Bid object is returned from the create method, but it’s not valid. If we try to see what its id is, we get nil. We can see what caused the create to fail by calling the ActiveResource#errors method. This method behaves just like ActiveRecord#errors with one important exception. On ActiveRecord, if we called Errors#on, we would get the error for that attribute. In the preceding example, we got nil instead. The reason is that Active Resource, unlike Active Record, doesn’t know what attributes there are. Active Record does a SHOW FIELDS FROM <table> to get this, but Active Resource has no equivalent. The only way Active Resource knows an attribute exists is if we tell it. For example:
In this case we told Active Resource that there is an amount attribute through the create method. As a result we can now call Errors#on without a problem.
Editing an Active Resource follows the same Active Record pattern.
If we set the amount to nil, ActiveResource.save would return false. In this case we could interrogate ActiveResource::Errors for the reason, just as we would with create. An important difference between Active Resource and Active Record is the absence of the save! and update! methods.
Removing an Active Resource can happen in two ways. The first is without instantiating the Active Resource:
>> Bid.delete(1)
The other way requires instantiating the Active Resource first:
>> bid = Bid.find(1)
>> bid.destroy
Active Resource allows for the setting of HTTP headers on each request too. This can be done in two ways. The first is to set it as a variable:
This will cause every connection to the site to include the HTTP header: HTTP-X-FLAVOR: orange. In our controller, we could use the header value.
The second way to set the headers for an Active Resource is to override the headers method.
Active Resource assumes RESTful URLs, but that doesn’t always happen. Fortunately, you can customize the URL prefix and collection_name. Suppose we have the following Active Resource class:
The following URLs will be used:
We could also change the element name used to generate XML. In the preceding Active Resource, a create of an OldAuctionSystem would look like the following in XML:
The element name can be changed with
which will produce:
One consequence of setting the element_name is that Active Resource will use the plural form to generate URLs. In this case it would be 'auctions' and not 'OldAuctionSystems'. To do this you will need to set the collection_name as well.
It is also possible to set the primary key field Active Resource uses with self.primary_key.
The methods find, create, save, and delete correspond to the HTTP methods of GET, POST, PUT, and DELETE, respectively. Active Resource has a method for each of these HTTP methods, too. They take the same arguments as find, create, save, and delete but return a hash of the XML received.
Active Resource comes with support for both HTTP Basic and HTTP Digest Authentication, as well as SSL authentication using X.509 certificates. Each has various compromises of simplicity, strength, interoperability, and infrastructure/system-administration support needs.
As with most HTTP clients and servers, MD5 is the only hashing algorithm supported in HTTP Digest. This is the only algorithm mentioned by RFC 2617, but Rails supports the extended properties of the RFC that strengthen the protocol despite the hashing algorithm used.
Other authentication mechanisms, like OAuth, CAS, and Kerberos, can be found in HTTP servers, middleware, Ruby gems, and Rails plugins.
When using Basic Authentication, the credentials are sent in plain text and as such can be easily snooped. For this reason, an HTTPS connection should be used when using Basic Authentication.
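To see why snooping is so easy, note that a Basic Authentication header is just the Base64 encoding of the username and password joined by a colon, and Base64 is trivially reversible. The sketch below uses only core Ruby; the helper names and the sample credentials are assumptions for illustration.

```ruby
# Build the Authorization header value used by HTTP Basic Authentication.
# 'm0' is Array#pack's Base64 directive without line breaks.
def basic_auth_header(username, password)
  'Basic ' + ["#{username}:#{password}"].pack('m0')
end

# Recover the credentials from a header value, as an eavesdropper could.
# Splitting on the first colon only lets passwords contain colons.
def decode_basic_auth(header)
  header.sub(/\ABasic /, '').unpack1('m').split(':', 2)
end

# Sample credentials (made up).
header = basic_auth_header('alice', 's3cret')
username, password = decode_basic_auth(header)
puts "#{username} / #{password}"
```

No key or secret is involved in the decoding step, which is why the transport itself has to provide the confidentiality.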
Here is a basic model class that consumes a RESTful service to obtain data, and specifies credentials for an authenticated connection to the service:
You can also use URI-style credentials by putting them in the service’s URL. This is particularly useful if you have a fully-qualified URL in a configuration file that has been supplied by the service provider:
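For readers unfamiliar with the URI-style form, this standard-library sketch shows where the username and password sit in such a URL and how they can be pulled back out. The service URL, the credentials, and the credentials_from helper are hypothetical values chosen for the example; this is not Active Resource code.

```ruby
require 'uri'

# Hypothetical helper: extract the userinfo portion of a service URL.
def credentials_from(url)
  uri = URI.parse(url)
  [uri.user, uri.password]
end

# A made-up service URL, as it might appear in a configuration file.
service_url = 'https://api_user:s3cret@api.example.com/v1/'

user, password = credentials_from(service_url)
puts "user=#{user} password=#{password}"
```

Characters that are special in URLs, such as @ or /, need to be percent-encoded when they appear in a username or password written this way.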
As soon as you supply any credential to the API, Active Resource will automatically attempt to authenticate on each connection. If the username and/or password is invalid, an ActiveResource::ClientError is generated and handled in the consuming application.
Setting the auth_type tells Active Resource to use Digest Authentication.
It’s as simple as that! Rails takes care of the rest (pardon the pun).
Dealing with only a hashed value (HA1 being the hash of colon-separated username, authentication realm, and password) is good, as your password is never transmitted, except perhaps when you (re-)set it. However, if the repository storing the HA1 is compromised, passwords will have to be reset (even if it’s just to the same password using a new secret or realm), as the HA1 could then be used by anyone to access your account, though only on that server. They still won’t know your password or be able to use the HA1 within another authentication realm. As such, despite its many known limitations and interoperability issues, Digest is definitely a step above Basic.
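The HA1 described above can be computed with Ruby’s standard Digest library. This is a minimal sketch; the ha1 helper name and the sample username, realms, and password are assumptions for illustration.

```ruby
require 'digest/md5'

# HA1 is the MD5 hex digest of "username:realm:password".
def ha1(username, realm, password)
  Digest::MD5.hexdigest([username, realm, password].join(':'))
end

# Sample values (made up). The same password in two different realms
# yields unrelated digests, so a leaked HA1 is useless in another realm.
app_hash   = ha1('alice', 'Application', 's3cret')
other_hash = ha1('alice', 'Other Realm', 's3cret')

puts app_hash
puts other_hash
```

Because the realm is part of the hashed string, changing the realm (or the password) is enough to invalidate every previously stored HA1.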
This is a type of public key authentication; you may also hear it referred to as “client-side certificate authentication.” When used in conjunction with username/password credentials, it is a form of two-factor authentication, as it involves something you have (the certificate) and something you know (the credentials).
In this form of SSL-based authentication, the server provides its certificate as usual (creating the SSL connection), and then the client provides its certificate so that the server continues with the SSL session.
Sometimes you may find your Active Resource model may need to access a service on another network that is only accessible through a proxy server on your network (a “forward” proxy). This is often the case in your development environment where you may have to access the Internet through a proxy server, or perhaps an intranet application that needs data from the Internet.
In particularly thrifty enterprise networks (where Internet access is actively discouraged), the proxy server may even require authentication. It is far better to work with the infrastructure teams to remove the need for proxy authentication from selected machines (like your development workstation, and the production server even more so), and preferably no explicit proxy at all.
If the organization hasn’t made it to the 90’s yet with its Internet connectivity, or only trusts its information technologists as far as it can kick them, you may have bigger problems than configuring your Rails app.
To connect through your proxy server, supply it with any additional credentials it requires, either by providing a URI:
or using URI-style:
On the other side of the connection, the RESTful service that our Active Resource model is consuming, we can use the authentication built into Rails:
If the service is supporting HTTP Digest Authentication:
The authenticate_or_request_with_http_digest method will first try to authenticate using an HA1-style digest password (which is what our example above uses). If that fails, it will attempt to hash a plain text password and match it against the hash in the request.
Initial authentication of client certificates is done by whatever part of your HTTP stack negotiates the SSL session (e.g., httpd, nginx), not in your Rails application.
Depending on your infrastructure technology, you may have access to additional environment variables like SSL_CLIENT_CERT, REMOTE_USER, and X-HTTP_AUTHORIZATION. These can be used for deeper authentication (e.g., comparing a certificate’s DN, email, and CN) and for authorization (to verify if an authenticated user is allowed to perform specific actions).
In practice, the to_xml and from_xml methods meet the XML handling needs for most situations that the average Rails developer will ever encounter. Their simplicity masks a great degree of flexibility and power, and in this chapter we attempted to explain them in sufficient detail to inspire your own exploration of XML handling in the Ruby world.
As a pair, the to_xml and from_xml methods also enabled the creation of a framework that makes tying Rails applications together using authenticated RESTful web services drop-dead easy. That framework is named Active Resource, and this chapter gave you a crash-course introduction to it.