XML as a First-Class Citizen

Scala treats XML as a first-class citizen. So, instead of embedding XML documents into strings, you can place them inline in your code like you’d place an Int or a Double value. Let’s take a look at an example:

UsingScala/UseXML.scala

val xmlFragment =
  <symbols>
    <symbol ticker="AAPL"><units>200</units></symbol>
    <symbol ticker="IBM"><units>215</units></symbol>
  </symbols>

println(xmlFragment)
println(xmlFragment.getClass)

We created a val named xmlFragment and assigned a sample XML fragment directly to it. Scala parsed the XML content and created an instance of scala.xml.Elem, as you see in the output:

 
<symbols>
<symbol ticker="AAPL"><units>200</units></symbol>
<symbol ticker="IBM"><units>215</units></symbol>
</symbols>
class scala.xml.Elem

The Scala package scala.xml provides a set of convenience classes to read, parse, create, and store XML documents. The ease of parsing XML documents in Scala is quite appealing, making XML quite bearable compared to using it in Java. Let’s explore the facilities to parse XML.
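As a small illustration of the reading and parsing facilities mentioned above, here is a minimal sketch that parses XML held in a string. It assumes the scala-xml module is on the classpath; the object name ParseFromString and the helper parse are hypothetical names used only for this example, while XML.loadString is a method of the scala.xml.XML object.

```scala
import scala.xml.{Elem, XML}

object ParseFromString {
  // Hypothetical helper: parses a string holding one XML document
  // and returns its root element.
  def parse(text: String): Elem = XML.loadString(text)

  def main(args: Array[String]): Unit = {
    val parsed = parse("""<symbol ticker="AAPL"><units>200</units></symbol>""")
    println(parsed.getClass) // the same Elem class an inline literal produces
    println(parsed.label)    // the name of the root element
    println(parsed.text)     // the concatenated text inside the element
  }
}
```

An inline literal is checked by the compiler, whereas a string handed to loadString is parsed only at run time, so malformed input surfaces as an exception rather than a compilation error.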

You probably have played with XPath, which provides a very powerful way to query XML documents. Scala provides an XPath-like query ability with one minor difference. Instead of using the familiar XPath forward slashes (/ and //) to query, Scala uses backslashes (\ and \\) as the methods that parse and extract contents. This difference was necessary because Scala follows the Java tradition of using two forward slashes for comments, and a single forward slash is the division operator. Let’s parse the XML fragment we have on hand.

Here’s a piece of code to get the symbol elements, using the XPath-like query:

UsingScala/UseXML.scala

val symbolNodes = xmlFragment \ "symbol"
symbolNodes foreach println
println(symbolNodes.getClass)

Let’s look at the output generated by the code:

 
<symbol ticker="AAPL"><units>200</units></symbol>
<symbol ticker="IBM"><units>215</units></symbol>
class scala.xml.NodeSeq$$anon$1

We called the \ method on the XML element to look for all symbol elements. It returned an instance of scala.xml.NodeSeq, which represents a collection of XML nodes.
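Because a NodeSeq is a sequence of nodes, the usual collection operations apply to a query result. The following is a minimal sketch, assuming the scala-xml module is on the classpath; the object name NodeSeqAsCollection is hypothetical, and the fragment is repeated so the example is self-contained.

```scala
import scala.xml.{Elem, NodeSeq}

object NodeSeqAsCollection {
  val fragment: Elem =
    <symbols>
      <symbol ticker="AAPL"><units>200</units></symbol>
      <symbol ticker="IBM"><units>215</units></symbol>
    </symbols>

  // The \ method returns a NodeSeq holding the matching child elements.
  val symbols: NodeSeq = fragment \ "symbol"

  def main(args: Array[String]): Unit = {
    println(symbols.length)                      // how many children matched
    println(symbols.map(_.label).mkString(", ")) // element name of each match
    println((fragment \ "missing").isEmpty)      // no match gives an empty NodeSeq
  }
}
```

A query that matches nothing returns an empty NodeSeq rather than null or an exception, so chained queries stay safe to write.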

The \ method looks only for elements that are direct children of the target element, the symbols element in this example. To search through all the elements in the hierarchy starting from the target element, use the \\ method. Also, use the text method to get the text within an element. Let’s make use of those in an example:

UsingScala/UseXML.scala

val unitsNodes = xmlFragment \\ "units"
unitsNodes foreach println
println(unitsNodes.getClass)
println(unitsNodes.head.text)

Let’s see what the code generated:

 
<units>200</units>
<units>215</units>
class scala.xml.NodeSeq$$anon$1
200

Even though the units elements are not direct children of the root element, the \\ method extracted those elements; the \ method won’t do that. The text method helped to further extract the text from one of the units elements. We can also use pattern matching to get the text value and other contents. If we want to navigate the structure of an XML document, the methods \ and \\ are useful. However, if we want to find matching content anywhere in the XML document at arbitrary locations, pattern matching will be more useful.
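To make the difference between the two query methods concrete, here is a minimal sketch that runs both against the same fragment. It assumes the scala-xml module is on the classpath; the object name ChildVersusDescendant and its field names are hypothetical.

```scala
import scala.xml.Elem

object ChildVersusDescendant {
  val fragment: Elem =
    <symbols>
      <symbol ticker="AAPL"><units>200</units></symbol>
      <symbol ticker="IBM"><units>215</units></symbol>
    </symbols>

  // units is not a direct child of symbols, so a single backslash finds nothing.
  val direct = fragment \ "units"

  // The double backslash searches the whole hierarchy below the target element.
  val nested = fragment \\ "units"

  // Chaining single-backslash queries walks the structure one level at a time.
  val chained = fragment \ "symbol" \ "units"

  def main(args: Array[String]): Unit = {
    println(direct.length)
    println(nested.length)
    println(chained.length)
  }
}
```

Chaining the single-backslash form spells out the expected path, which can be preferable when the same element name may appear at several depths.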

We saw the power of pattern matching in Chapter 9, Pattern Matching and Regular Expressions. Scala extends that power to matching XML fragments as well. Let’s see how:

UsingScala/UseXML.scala

unitsNodes.head match {
  case <units>{numberOfUnits}</units> => println(s"Units: $numberOfUnits")
}

The pattern matching extracted the following content for us:

 
Units: 200

We took the first units element and asked Scala to extract the text value. In the case statement we provided the match for the fragment we’re interested in and a pattern matching variable, numberOfUnits, as a placeholder for the text content of that element.
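The variable bound by such a pattern is an XML node, not a number. One way to turn it into an Int, and to avoid a MatchError when the element is not the one expected, is sketched below. This assumes the scala-xml module is on the classpath; the object name UnitsExtractor and the helper unitsOf are hypothetical.

```scala
import scala.util.Try
import scala.xml.Node

object UnitsExtractor {
  // Hypothetical helper: yields the units as an Int when the node is a
  // units element with exactly one child whose text parses as an integer.
  def unitsOf(node: Node): Option[Int] = node match {
    case <units>{value}</units> => Try(value.text.trim.toInt).toOption
    case _                      => None // different element, or not exactly one child
  }

  def main(args: Array[String]): Unit = {
    println(unitsOf(<units>200</units>))
    println(unitsOf(<price>200</price>))
  }
}
```

Returning an Option keeps the caller in charge of what to do when the element is missing or malformed.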

That helped us get the units for one symbol. There are two problems, however. The previous approach works only if the content matches exactly with the expression in the case; that is, the units element contains only one content item or one child element. If it contains a mixture of child elements and text contents, the previous match will fail. Furthermore, we want to get the units for all symbols, not just the first one. We can ask Scala to grab all contents, elements and text, using the _* symbol, like so:

UsingScala/UseXML.scala

println("Ticker Units")
xmlFragment match {
  case <symbols>{symbolNodes @ _* }</symbols> =>
    for(symbolNode @ <symbol>{_*}</symbol> <- symbolNodes) {
      println("%-7s %s".format(
        symbolNode \ "@ticker", (symbolNode \ "units").text))
    }
}

Let’s take a look at the output before examining the code:

 
Ticker Units
AAPL    200
IBM     215

Nice output, but the code to produce that is a bit dense. Let’s take the time to understand it.

By using the wildcard symbol _*, we asked to read everything between the <symbols> and </symbols> into the placeholder variable symbolNodes. We saw an example using the @ symbol to place a variable name in Matching Tuples and Lists. The good news: That call reads everything. The bad news: It reads everything, including the text nodes that represent the whitespace in the XML fragment. You’re quite used to this problem if you’ve used XML DOM parsers. To deal with this, when looping through the symbolNodes we iterate over only the symbol elements by pattern matching once more, this time in the generator of the for expression.
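The whitespace text nodes are easy to observe by counting what _* captures. Here is a minimal sketch, assuming Scala 2 with the scala-xml module on the classpath; the object name WhitespaceNodes and its field names are hypothetical.

```scala
import scala.xml.Elem

object WhitespaceNodes {
  val fragment: Elem =
    <symbols>
      <symbol ticker="AAPL"><units>200</units></symbol>
      <symbol ticker="IBM"><units>215</units></symbol>
    </symbols>

  // Everything between the tags, including the whitespace text nodes
  // that separate the symbol elements.
  val allChildren = fragment match {
    case <symbols>{children @ _*}</symbols> => children
    case _                                  => Seq.empty
  }

  // Keep only the element nodes and drop the text nodes.
  val elementsOnly = allChildren.collect { case e: Elem => e }

  def main(args: Array[String]): Unit = {
    println(allChildren.length)  // counts the text nodes as well as the elements
    println(elementsOnly.length) // counts only the symbol elements
  }
}
```

Filtering with collect is an alternative to the pattern in the for generator; both skip the text nodes, and the choice between them is largely one of taste.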

Remember, the left side of a generator in a for expression is a pattern (see The for Expression). Finally, we perform an XPath-like query to get the ticker attribute and the text value in the units elements; you’ll recall from XPath that you use an @ prefix to indicate the attribute query.
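The attribute and text queries can also be combined without pattern matching, by querying the symbol elements directly. The sketch below collects the results into a Map; it assumes the scala-xml module is on the classpath, and the object name TickerUnits and the helper unitsByTicker are hypothetical.

```scala
import scala.xml.Elem

object TickerUnits {
  val fragment: Elem =
    <symbols>
      <symbol ticker="AAPL"><units>200</units></symbol>
      <symbol ticker="IBM"><units>215</units></symbol>
    </symbols>

  // Hypothetical helper: maps each ticker attribute to its units value.
  // toInt throws a NumberFormatException if a units value is not an integer.
  def unitsByTicker(root: Elem): Map[String, Int] =
    (root \ "symbol").map { symbol =>
      (symbol \ "@ticker").text -> (symbol \ "units").text.toInt
    }.toMap

  def main(args: Array[String]): Unit =
    unitsByTicker(fragment).foreach { case (ticker, units) =>
      println("%-7s %d".format(ticker, units))
    }
}
```

Because the \ query returns only the matching elements, the whitespace text nodes never enter the loop, which removes the need for the filtering pattern.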
