Structure is nothing if it is all you got. Skeletons spook people if they try to walk around on their own. I really wonder why XML does not.
—Erik Naggum
XML doesn’t get much respect from the Rails community. It’s enterprisey. In the Ruby world, that other markup language, YAML (YAML ain’t markup language), and data interchange format, JSON (JavaScript object notation), get a heck of a lot more attention. However, use of XML is a fact of life for many projects, especially when it comes to interoperability with legacy systems. Luckily, Ruby on Rails gives us some pretty good functionality related to XML.
This chapter examines how to both generate and parse XML in your Rails applications, starting with a thorough examination of the to_xml method that most objects have in Rails.
Sometimes you just want an XML representation of an object, and Active Record models provide easy, automatic XML generation via the to_xml method. Let’s play with this method in the console and see what it can do.
I’ll fire up the console for my book-authoring sample application and find an Active Record object to manipulate.
>> User.find_by(login: 'obie')
=> #<User id: 8, login: "obie", email: "[email protected]",
crypted_password: "4a6046804fc4dc3183ad9012fbfee91c85723d8c",
salt: "399754af1b01cf3d4b87da5478d82674b0438eb8",
created_at: "2010-05-18 19:31:40", updated_at: "2010-05-18 19:31:40",
remember_token: nil, remember_token_expires_at: nil,
authorized_approver: true, client_id: nil, timesheets_updated_at: nil>
There we go: a User instance. Let’s see that instance as its generic XML representation.
>> User.find_by(login: 'obie').to_xml
=> "<?xml version="1.0" encoding="UTF-8"?>
<user>
<authorized-approver type="boolean">true</authorized-approver>
<salt>399754af1b01cf3d4b87da5478d82674b0438eb8</salt>
<created-at type="datetime">2010-05-18T19:31:40Z</created-at>
<crypted-password>4a6046804fc4dc3183ad9012fbfee91c85723d8c
</crypted-password>
<remember-token-expires-at type="datetime"
nil="true"></remember-token-expires-at>
<updated-at type="datetime">2010-05-18T19:31:40Z</updated-at>
<id type="integer">8</id>
<client-id type="integer"
nil="true"></client-id>
<remember-token nil="true">
</remember-token>
<login>obie</login>
<email>[email protected]</email>
<timesheets-updated-at
type="datetime" nil="true"></timesheets-updated-at>
</user>
"
Ugh, that’s ugly. Ruby’s print method might help us out here.
>> print User.find_by(login: 'obie').to_xml
<?xml version="1.0" encoding="UTF-8"?>
<user>
  <authorized-approver type="boolean">true</authorized-approver>
  <salt>399754af1b01cf3d4b87da5478d82674b0438eb8</salt>
  <created-at type="datetime">2010-05-18T19:31:40Z</created-at>
  <crypted-password>4a6046804fc4dc3183ad9012fbfee91c85723d8c</crypted-password>
  <remember-token-expires-at type="datetime" nil="true"></remember-token-expires-at>
  <updated-at type="datetime">2010-05-18T19:31:40Z</updated-at>
  <id type="integer">8</id>
  <client-id type="integer" nil="true"></client-id>
  <remember-token nil="true"></remember-token>
  <login>obie</login>
  <email>[email protected]</email>
  <timesheets-updated-at type="datetime" nil="true"></timesheets-updated-at>
</user>
Much better! So what do we have here? Looks like a fairly straightforward serialized representation of our User instance in XML.
The standard processing instruction is at the top, followed by an element name corresponding to the class name of the object. The properties are represented as subelements, with nonstring data fields including a type attribute. Mind you, this is the default behavior, and we can customize it with some additional parameters to the to_xml method.
We’ll strip down that XML representation of a user to just an email and login using the only parameter. It’s provided in a familiar options hash with the value of the :only parameter as an array:
>> print User.find_by(login: 'obie').to_xml(only: [:email, :login])
<?xml version="1.0" encoding="UTF-8"?>
<user>
  <login>obie</login>
  <email>[email protected]</email>
</user>
Following the familiar Rails convention, the only parameter is complemented by its inverse, except, which will exclude the specified properties. What if I want my user’s email and login as a snippet of XML that will be included in another document? Then let’s get rid of that pesky instruction, too, using the skip_instruct parameter.
>> print User.find_by(login: 'obie').to_xml(only: [:email, :login], skip_instruct: true)
<user>
  <login>obie</login>
  <email>[email protected]</email>
</user>
We can change the root element in our XML representation of User and the indenting from two to four spaces by using the root and indent parameters, respectively.
>> print User.find_by(login: 'obie').to_xml(root: 'employee', indent: 4)
<?xml version="1.0" encoding="UTF-8"?>
<employee>
    <authorized-approver type="boolean">true</authorized-approver>
    <salt>399754af1b01cf3d4b87da5478d82674b0438eb8</salt>
    <created-at type="datetime">2010-05-18T19:31:40Z</created-at>
    <crypted-password>4a6046804fc4dc3183ad9012fbfee91c85723d8c</crypted-password>
    <remember-token-expires-at type="datetime" nil="true"></remember-token-expires-at>
    <updated-at type="datetime">2010-05-18T19:31:40Z</updated-at>
    <id type="integer">8</id>
    <client-id type="integer" nil="true"></client-id>
    <remember-token nil="true"></remember-token>
    <login>obie</login>
    <email>[email protected]</email>
    <timesheets-updated-at type="datetime" nil="true"></timesheets-updated-at>
</employee>
By default, Rails converts underscores in attribute names to dashes, as in created-at and client-id. You can force underscore attribute names by setting the dasherize parameter to false.
>> print User.find_by(login: 'obie').to_xml(dasherize: false,
only: [:created_at, :client_id])
<?xml version="1.0" encoding="UTF-8"?>
<user>
  <created_at type="datetime">2010-05-18T19:31:40Z</created_at>
  <client_id type="integer" nil="true"></client_id>
</user>
In the preceding output, the attribute type is included. This too can be configured using the skip_types parameter.
>> print User.find_by(login: 'obie').to_xml(skip_types: true,
only: [:created_at, :client_id])
<?xml version="1.0" encoding="UTF-8"?>
<user>
  <created-at>2010-05-18T19:31:40Z</created-at>
  <client-id nil="true"></client-id>
</user>
So far we’ve only worked with a base Active Record object and not with any of its associations. What if we wanted an XML representation of not just a user but also its associated timesheets? Rails provides the :include parameter for just this purpose. The :include parameter will also take an array of associations to represent in XML.
>> print User.find_by(login: 'obie').to_xml(include: :timesheets)
<?xml version="1.0" encoding="UTF-8"?>
<user>
  <authorized-approver type="boolean">true</authorized-approver>
  <salt>399754af1b01cf3d4b87da5478d82674b0438eb8</salt>
  <created-at type="datetime">2010-05-18T19:31:40Z</created-at>
  <crypted-password>4a6046804fc4dc3183ad9012fbfee91c85723d8c</crypted-password>
  <remember-token-expires-at type="datetime" nil="true"></remember-token-expires-at>
  <updated-at type="datetime">2010-05-18T19:31:40Z</updated-at>
  <id type="integer">8</id>
  <client-id type="integer" nil="true"></client-id>
  <remember-token nil="true"></remember-token>
  <login>obie</login>
  <email>[email protected]</email>
  <timesheets-updated-at type="datetime" nil="true"></timesheets-updated-at>
  <timesheets type="array">
    <timesheet>
      <created-at type="datetime">2010-05-04T19:31:40Z</created-at>
      <updated-at type="datetime">2010-05-18T19:31:40Z</updated-at>
      <lock-version type="integer">0</lock-version>
      <id type="integer">8</id>
      <user-id type="integer">8</user-id>
      <submitted type="boolean">true</submitted>
      <approver-id type="integer">7</approver-id>
    </timesheet>
    <timesheet>
      <created-at type="datetime">2010-05-18T19:31:40Z</created-at>
      <updated-at type="datetime">2010-05-18T19:31:40Z</updated-at>
      <lock-version type="integer">0</lock-version>
      <id type="integer">9</id>
      <user-id type="integer">8</user-id>
      <submitted type="boolean">false</submitted>
      <approver-id type="integer" nil="true"></approver-id>
    </timesheet>
    <timesheet>
      <created-at type="datetime">2010-05-11T19:31:40Z</created-at>
      <updated-at type="datetime">2010-05-18T19:31:40Z</updated-at>
      <lock-version type="integer">0</lock-version>
      <id type="integer">10</id>
      <user-id type="integer">8</user-id>
      <submitted type="boolean">false</submitted>
      <approver-id type="integer" nil="true"></approver-id>
    </timesheet>
  </timesheets>
</user>
Rails also adds a useful to_xml method to core classes. For example, arrays are easily serializable to XML, with element names inferred from the name of the Ruby type:
>> print ['cat', 'dog', 'ferret'].to_xml
<?xml version="1.0" encoding="UTF-8"?>
<strings type="array">
  <string>cat</string>
  <string>dog</string>
  <string>ferret</string>
</strings>
If you have mixed types in the array, this is also reflected in the XML output:
>> print [3, 'cat', 'dog', :ferret].to_xml
<?xml version="1.0" encoding="UTF-8"?>
<objects type="array">
  <object type="integer">3</object>
  <object>cat</object>
  <object>dog</object>
  <object type="symbol">ferret</object>
</objects>
To construct a more semantic structure, the root option on to_xml triggers more expressive element names:
>> print ['cat', 'dog', 'ferret'].to_xml(root: 'pets')
<?xml version="1.0" encoding="UTF-8"?>
<pets type="array">
  <pet>cat</pet>
  <pet>dog</pet>
  <pet>ferret</pet>
</pets>
Ruby hashes are naturally representable in XML, with keys corresponding to element names and their values corresponding to element contents. Rails automatically calls to_s on the values to get string values for them:
>> print({owners: ['Chad', 'Trixie'], pets: ['cat', 'dog', 'ferret'],
id: 123}.to_xml(root: 'registry'))
<?xml version="1.0" encoding="UTF-8"?>
<registry>
  <pets type="array">
    <pet>cat</pet>
    <pet>dog</pet>
    <pet>ferret</pet>
  </pets>
  <owners type="array">
    <owner>Chad</owner>
    <owner>Trixie</owner>
  </owners>
  <id type="integer">123</id>
</registry>
Josh G. Says ...
This simplistic serialization may not be appropriate for certain interoperability contexts, especially if the output must pass XML Schema (XSD) validation, where the order of elements is often important. In Ruby 1.9.x and 2.0, the Hash class preserves insertion order, which may not be adequate for producing output that matches an XSD. The section “The XML Builder” in this chapter discusses how Builder::XmlMarkup can address this situation.
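To make the ordering concern concrete, here is a minimal plain-Ruby sketch that does not use Rails at all. The ELEMENT_ORDER list, the escape_xml and ordered_xml helpers, and the example.com address are hypothetical, chosen only for illustration; a real schema would dictate the actual sequence. It shows that a hash remembers whatever order its keys were inserted in, and that emitting elements from an explicit list is one way to get a stable order regardless of how the hash was built.

```ruby
# Hypothetical element sequence that an XSD might require.
ELEMENT_ORDER = %w[id login email].freeze

# Escape the characters that are unsafe in XML text content.
def escape_xml(value)
  value.to_s.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
end

# Emit child elements in ELEMENT_ORDER, ignoring the hash's own key order.
def ordered_xml(root, attributes)
  children = ELEMENT_ORDER.map do |name|
    "  <#{name}>#{escape_xml(attributes.fetch(name))}</#{name}>"
  end
  (["<#{root}>"] + children + ["</#{root}>"]).join("\n")
end

# The same data inserted in two different orders (placeholder values).
first  = { "id" => 8, "login" => "obie", "email" => "obie@example.com" }
second = { "email" => "obie@example.com", "id" => 8, "login" => "obie" }

puts first.keys.inspect   # keys in the order they were inserted
puts second.keys.inspect  # a different insertion order
puts ordered_xml("user", second)
```

Both hashes produce the same document because the element order comes from the list, not from the hash.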
The :include option of to_xml is not used on Array and Hash objects.
By default, Active Record’s to_xml method only serializes persistent attributes into XML. However, there are times when transient, derived, or calculated values need to be serialized out into XML form as well. For example, our User model has a method that returns only draft timesheets:
class User < ActiveRecord::Base
  ...
  def draft_timesheets
    timesheets.draft
  end
  ...
end
To include the result of this method when we serialize the XML, we use the :methods parameter:
>> print User.find_by(login: 'obie').to_xml(methods: :draft_timesheets)
<?xml version="1.0" encoding="UTF-8"?>
<user>
  <id type="integer">8</id>
  ...
  <draft-timesheets type="array">
    <draft-timesheet>
      <created-at type="datetime">2010-05-18T19:31:40Z</created-at>
      <updated-at type="datetime">2010-05-18T19:31:40Z</updated-at>
      <lock-version type="integer">0</lock-version>
      <id type="integer">9</id>
      <user-id type="integer">8</user-id>
      <submitted type="boolean">false</submitted>
      <approver-id type="integer" nil="true"></approver-id>
    </draft-timesheet>
    <draft-timesheet>
      <created-at type="datetime">2010-05-11T19:31:40Z</created-at>
      <updated-at type="datetime">2010-05-18T19:31:40Z</updated-at>
      <lock-version type="integer">0</lock-version>
      <id type="integer">10</id>
      <user-id type="integer">8</user-id>
      <submitted type="boolean">false</submitted>
      <approver-id type="integer" nil="true"></approver-id>
    </draft-timesheet>
  </draft-timesheets>
</user>
We could also set the :methods parameter to an array of method names to be called.
In cases where we want to include extra elements unrelated to the object being serialized, we can pass to_xml a block or use the :procs option.
If we are using the same logic applied to different to_xml calls, we can construct lambdas ahead of time and use one or more of them in the :procs option. They will be called with to_xml’s option hash, through which we access the underlying Builder::XmlMarkup instance. (Builder::XmlMarkup provides the principal means of XML generation in Rails.)
>> current_user = User.find_by(login: 'admin')
>> generated_at = lambda { |opts| opts[:builder].tag!('generated-at',
Time.now.utc.iso8601) }
>> generated_by = lambda { |opts| opts[:builder].tag!('generated-by',
current_user.email) }
>> print(User.find_by(login: 'obie').to_xml(procs: [generated_at,
generated_by]))
<?xml version="1.0" encoding="UTF-8"?>
<user>
  ...
  <id type="integer">8</id>
  <client-id type="integer" nil="true"></client-id>
  <remember-token nil="true"></remember-token>
  <login>obie</login>
  <email>[email protected]</email>
  <timesheets-updated-at type="datetime" nil="true"></timesheets-updated-at>
  <generated-at>2010-05-18T19:33:49Z</generated-at>
  <generated-by>[email protected]</generated-by>
</user>
>> print Timesheet.all.to_xml(procs: [generated_at, generated_by])
<?xml version="1.0" encoding="UTF-8"?>
<timesheets type="array">
  <timesheet>
    ...
    <id type="integer">8</id>
    <user-id type="integer">8</user-id>
    <submitted type="boolean">true</submitted>
    <approver-id type="integer">7</approver-id>
    <generated-at>2010-05-18T20:18:30Z</generated-at>
    <generated-by>[email protected]</generated-by>
  </timesheet>
  <timesheet>
    ...
    <id type="integer">9</id>
    <user-id type="integer">8</user-id>
    <submitted type="boolean">false</submitted>
    <approver-id type="integer" nil="true"></approver-id>
    <generated-at>2010-05-18T20:18:30Z</generated-at>
    <generated-by>[email protected]</generated-by>
  </timesheet>
  <timesheet>
    ...
    <id type="integer">10</id>
    <user-id type="integer">8</user-id>
    <submitted type="boolean">false</submitted>
    <approver-id type="integer" nil="true"></approver-id>
    <generated-at>2010-05-18T20:18:30Z</generated-at>
    <generated-by>[email protected]</generated-by>
  </timesheet>
</timesheets>
Note that the :procs are applied to each top-level resource in the collection (or the single resource if the top level is not a collection). Use the sample application to compare the output with the output from the following:
>> print User.all.to_xml(include: :timesheets, procs: [generated_at,
generated_by])
To add custom elements only to the root node, to_xml will yield a Builder::XmlMarkup instance when given a block:
>> print(User.all.to_xml { |xml| xml.tag! 'generated-by', current_user.email })
<?xml version="1.0" encoding="UTF-8"?>
<users type="array">
  <user>...</user>
  <user>...</user>
  <generated-by>[email protected]</generated-by>
</users>
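Unfortunately, the optional block is hobbled by a puzzling limitation: it receives only the builder, so the record being serialized is not exposed and only data external to the object may be added in this fashion. The same is true of single-argument procs such as the lambdas above. In Rails versions whose to_xml documentation shows procs declared with two parameters (|options, record|), the record is passed as the second argument.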
To gain complete control over the XML serialization of Rails objects, you need to override the to_xml method and implement it yourself.
Sometimes you need to do something out of the ordinary when trying to represent data in XML form. In those situations, you can create the XML by hand.
class User < ActiveRecord::Base
  ...
  def to_xml(options = {}, &block)
    xml = options[:builder] || ::Builder::XmlMarkup.new(options)
    xml.instruct! unless options[:skip_instruct]
    xml.user do
      xml.tag!(:email, email)
    end
  end
  ...
end
This would give the following result:
>> print User.first.to_xml
<?xml version="1.0" encoding="UTF-8"?><user><email>[email protected]</email></user>
Of course, you could just go ahead and use good object-oriented design and use a class responsible for translating between your model and an external representation.
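As a rough illustration of that idea, the sketch below moves the XML representation out of the model and into a small class of its own. Everything in it is an assumption made for the example: the UserXmlPresenter name, the Struct standing in for the Active Record model, and the example.com address. It also assembles the markup by hand so that it runs without Rails; in an application you would more likely hand the work to Builder::XmlMarkup.

```ruby
# Stand-in for the Active Record model; only the fields used here.
User = Struct.new(:login, :email)

# Hypothetical presenter that owns the XML representation of a user.
class UserXmlPresenter
  def initialize(user)
    @user = user
  end

  # Build the document line by line; the declaration is optional.
  def to_xml(skip_instruct: false)
    lines = []
    lines << %(<?xml version="1.0" encoding="UTF-8"?>) unless skip_instruct
    lines << "<user>"
    lines << "  <login>#{escape(@user.login)}</login>"
    lines << "  <email>#{escape(@user.email)}</email>"
    lines << "</user>"
    lines.join("\n")
  end

  private

  # Escape the characters that are unsafe in XML text content.
  def escape(value)
    value.to_s.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
  end
end

user = User.new("obie", "obie@example.com")
puts UserXmlPresenter.new(user).to_xml(skip_instruct: true)
```

The model stays free of formatting concerns, and a second representation (say, for a different partner system) becomes another presenter rather than another option on to_xml.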
Builder::XmlMarkup is the class used internally by Rails when it needs to generate XML. When to_xml is not enough and you need to generate custom XML, you will use Builder instances directly. Fortunately, the Builder API is one of the most powerful Ruby libraries available and is very easy to use, once you get the hang of it.
The API documentation says, “All (well, almost all) methods sent to an XmlMarkup object will be translated to the equivalent XML markup. Any method with a block will be treated as an XML markup tag with nested markup in the block.”
That is a very concise way of describing how Builder works, but it is easier to understand with some examples, again taken from Builder’s API documentation. The xm variable is a Builder::XmlMarkup instance:
xm.em("emphasized")              # => <em>emphasized</em>
xm.em { xm.b("emph & bold") }    # => <em><b>emph &amp; bold</b></em>

xm.a("foo", "href"=>"http://foo.org")
                                 # => <a href="http://foo.org">foo</a>

xm.div { xm.br }                 # => <div><br/></div>

xm.target("name"=>"foo", "option"=>"bar")
                                 # => <target name="foo" option="bar"/>

xm.instruct!                     # <?xml version="1.0" encoding="UTF-8"?>

xm.html {                        # <html>
  xm.head {                      #   <head>
    xm.title("History")          #     <title>History</title>
  }                              #   </head>
  xm.body {                      #   <body>
    xm.comment! "HI"             #     <!-- HI -->
    xm.h1("Header")              #     <h1>Header</h1>
    xm.p("paragraph")            #     <p>paragraph</p>
  }                              #   </body>
}                                # </html>
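If you are curious how "almost all methods become markup" can work, the following toy class imitates the idea with method_missing. It is a heavily simplified sketch, not Builder's actual implementation: TinyMarkup is a hypothetical name, and it supports only text, attributes, and nested blocks, with no indentation, instructions, or comments.

```ruby
# Hypothetical, heavily simplified imitation of Builder's approach.
class TinyMarkup
  def initialize
    @out = +""
  end

  # Return the markup generated so far.
  def target!
    @out
  end

  # Any method the object does not define is treated as an element name.
  def method_missing(name, *args, &block)
    attrs = args.last.is_a?(Hash) ? args.pop : {}
    text = args.first
    attr_text = attrs.map { |key, value| %( #{key}="#{escape(value)}") }.join

    if block
      # A block means nested markup between the open and close tags.
      @out << "<#{name}#{attr_text}>"
      block.call
      @out << "</#{name}>"
    elsif text.nil?
      @out << "<#{name}#{attr_text}/>"
    else
      @out << "<#{name}#{attr_text}>#{escape(text)}</#{name}>"
    end
    self
  end

  private

  # Escape characters that are unsafe in text and attribute values.
  def escape(value)
    value.to_s.gsub("&", "&amp;").gsub("<", "&lt;")
         .gsub(">", "&gt;").gsub('"', "&quot;")
  end
end

xm = TinyMarkup.new
xm.em { xm.b("emph & bold") }
xm.a("foo", "href" => "http://foo.org")
xm.div { xm.br }
puts xm.target!
```

Because it inherits from Object, names that Object already defines publicly (class, send, and so on) would not turn into elements here; handling those cases is part of what the real library takes care of.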
A common use for Builder::XmlMarkup is to render XML in response to a request. Previously, we talked about overriding to_xml on Active Record to generate our custom XML. Another way, though not as recommended, is to use an XML template.
We could alter our UsersController#show method to use an XML template by changing it from
class UsersController < ApplicationController
  ...
  def show
    @user = User.find(params[:id])
    respond_to do |format|
      format.html
      format.xml { render xml: @user.to_xml }
    end
  end
  ...
end
to
class UsersController < ApplicationController
  ...
  def show
    @user = User.find(params[:id])
    respond_to do |format|
      format.html
      format.xml
    end
  end
  ...
end
Now Rails will look for a file called show.xml.builder in the app/views/users directory. That file contains Builder::XmlMarkup code like the following:
xml.user {                                # <user>
  xml.email @user.email                   #   <email>...</email>
  xml.timesheets {                        #   <timesheets>
    @user.timesheets.each { |timesheet|   #
      xml.timesheet {                     #     <timesheet>
        xml.draft timesheet.submitted?    #       <draft>true</draft>
      }                                   #     </timesheet>
    }                                     #
  }                                       #   </timesheets>
}                                         # </user>
In this view, the variable xml is an instance of Builder::XmlMarkup. Just as in other views, we have access to the instance variables we set in our controller, in this case @user. Using Builder in a view can provide a convenient way to generate XML.
Ruby has a full-featured XML library named Nokogiri, and covering it in any level of detail is outside the scope of this book. If you have basic parsing needs, such as parsing responses from web services, you can use the simple XML parsing capability built into Rails.
Rails lets you turn arbitrary snippets of XML markup into Ruby hashes with the from_xml method that it adds to the Hash class.
To demonstrate, we’ll throw together a string of simplistic XML and turn it into a hash:
>> xml = <<-XML
<pets>
<cat>Franzi</cat>
<dog>Susie</dog>
<horse>Red</horse>
</pets>
XML
>> Hash.from_xml(xml)
=> {"pets"=>{"cat"=>"Franzi", "dog"=>"Susie", "horse"=>"Red"}}
There are no options for from_xml. You can also pass it an IO object:
>> Hash.from_xml(File.new('pets.xml'))
=> {"pets"=>{"cat"=>"Franzi", "dog"=>"Susie", "horse"=>"Red"}}
Typecasting is done by using a type attribute in the XML elements. For example, here’s the autogenerated XML for a User object.
>> print User.first.to_xml
<?xml version="1.0" encoding="UTF-8"?>
<user>
  <authorized-approver type="boolean">true</authorized-approver>
  <salt>034fbec79d0ca2cd7d892f205d56ea95174ff557</salt>
  <created-at type="datetime">2010-05-18T19:31:40Z</created-at>
  <crypted-password>98dfc463d9122a1af0a5dc817601de437c69f365</crypted-password>
  <remember-token-expires-at type="datetime" nil="true" />
  <updated-at type="datetime">2010-05-18T19:31:40Z</updated-at>
  <id type="integer">7</id>
  <client-id type="integer" nil="true" />
  <remember-token nil="true" />
  <login>admin</login>
  <email>[email protected]</email>
  <timesheets-updated-at type="datetime" nil="true" />
</user>
As part of the to_xml method, Rails sets attributes called type that identify the class of the value being serialized. If we take this XML and feed it to the from_xml method, Rails will typecast the strings to their corresponding Ruby objects:
>> Hash.from_xml(User.first.to_xml)
=> {"user"=>{"salt"=>"034fbec79d0ca2cd7d892f205d56ea95174ff557",
"authorized_approver"=>true,
"created_at"=>Tue May 18 19:31:40 UTC 2010, "remember_token_expires_at"=>nil,
"crypted_password"=>"98dfc463d9122a1af0a5dc817601de437c69f365",
"updated_at"=>Tue May 18 19:31:40 UTC 2010, "id"=>7, "client_id"=>nil,
"remember_token"=>nil, "login"=>"admin",
"timesheets_updated_at"=>nil,
"email"=>"[email protected]"}}
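The typecasting idea itself can be sketched outside Rails. What follows is a simplified imitation, not the Rails implementation: it relies on REXML as shipped with the Ruby distribution, and the CASTS table and the cast_element and xml_to_hash helpers are hypothetical names invented for this example. It handles only a flat document and three type values, which is enough to show how a type attribute, a nil attribute, and dashed element names could each be interpreted.

```ruby
require "rexml/document"
require "time"

# Hypothetical lookup from a type attribute to a conversion.
CASTS = {
  "integer"  => ->(text) { Integer(text) },
  "boolean"  => ->(text) { text.strip == "true" },
  "datetime" => ->(text) { Time.xmlschema(text).utc }
}.freeze

# Convert one element's text according to its nil and type attributes.
def cast_element(element)
  return nil if element.attributes["nil"] == "true"

  caster = CASTS[element.attributes["type"]]
  text = element.text.to_s
  caster ? caster.call(text) : text
end

# Turn a flat XML document into a nested hash with underscored keys.
def xml_to_hash(xml)
  root = REXML::Document.new(xml).root
  fields = {}
  root.elements.each do |child|
    fields[child.name.tr("-", "_")] = cast_element(child)
  end
  { root.name => fields }
end

sample = <<~XML
  <user>
    <id type="integer">7</id>
    <authorized-approver type="boolean">true</authorized-approver>
    <created-at type="datetime">2010-05-18T19:31:40Z</created-at>
    <client-id type="integer" nil="true" />
    <login>admin</login>
  </user>
XML

result = xml_to_hash(sample)
p result
```

Elements without a type attribute stay strings, which mirrors how login comes back unchanged in the console output above.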
In practice, the to_xml and from_xml methods meet the XML handling needs for most situations that the average Rails developer will ever encounter. Their simplicity masks a great degree of flexibility and power, and in this chapter, we attempted to explain them in sufficient detail to inspire your own exploration of XML handling in the Ruby world.