JSON in Scala – an exercise in pattern matching

There are several libraries for manipulating JSON in Scala. We prefer json4s, but if you are a die-hard fan of another JSON library, you should be able to readily adapt the examples in this chapter. Let's create a build.sbt file with a dependency on json4s:

// build.sbt
scalaVersion := "2.11.7"

libraryDependencies += "org.json4s" %% "json4s-native" % "3.2.11"

We can then import json4s into an SBT console session with:

scala> import org.json4s._
import org.json4s._

scala> import org.json4s.native.JsonMethods._
import org.json4s.native.JsonMethods._

Let's use json4s to parse the response to our GitHub API query:

scala> val jsonResponse = parse(response)
jsonResponse: org.json4s.JValue = JObject(List((login,JString(odersky)),(id,JInt(795990)),...

The parse method takes a string (that contains well-formatted JSON) and converts it to a JValue, a supertype for all json4s objects. The runtime type of the response to this particular query is JObject, which is a json4s type representing a JSON object.

JObject is a wrapper around a List[JField], and JField represents an individual key-value pair in the object. We can use extractors to access this list:

scala> val JObject(fields) = jsonResponse
fields: List[JField] = List((login,JString(odersky)),...

What's happened here? By writing val JObject(fields) = ..., we are telling Scala:

  • The right-hand side has the runtime type JObject
  • Go into the JObject instance and bind its list of fields to the value fields

Readers familiar with Python might recognize the similarity with tuple unpacking, though Scala extractors are much more powerful and versatile. Extractors are used extensively to extract Scala types from json4s types.

Tip

Pattern matching using case classes

How exactly does the Scala compiler know what to do with an extractor such as:

val JObject(fields) = ...

JObject is a case class with the following constructor:

case class JObject(obj:List[JField])

Case classes all come with an extractor that reverses the constructor exactly. Thus, writing val JObject(fields) will bind fields to the obj attribute of the JObject. For further details on how extractors work, read Appendix, Pattern Matching and Extractors.
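As a minimal sketch of the mechanism, using only the standard library: the Point case class and the Even extractor below are hypothetical names invented for illustration and are not part of json4s. The first shows the extractor that the compiler generates for a case class, the second a hand-written unapply method.

```scala
// Hypothetical case class, standing in for a json4s type such as JObject.
case class Point(x: Int, y: Int)

// The compiler generates Point.unapply, so this pattern mirrors the constructor.
val Point(px, py) = Point(3, 4)

// A hand-written extractor: any object with an unapply method can appear in a pattern.
object Even {
  // Matches even numbers only, and extracts half of the number.
  def unapply(n: Int): Option[Int] = if (n % 2 == 0) Some(n / 2) else None
}

val description = 10 match {
  case Even(half) => s"even, half is $half"
  case _          => "odd"
}
```

Here px and py are bound to 3 and 4, and description is "even, half is 5".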

We have now extracted fields, a (plain old Scala) list of fields from the JObject. A JField is a key-value pair, with the key being a string and value being a subtype of JValue. Again, we can use extractors to extract the values in the field:

scala> val firstField = fields.head
firstField: JField = (login,JString(odersky))

scala> val JField(key, JString(value)) = firstField
key: String = login
value: String = odersky

We matched the right-hand side against the pattern JField(_, JString(_)), binding the first element to key and the second to value. What happens if the right-hand side does not match the pattern?

scala> val JField(key, JInt(value)) = firstField
scala.MatchError: (login,JString(odersky)) (of class scala.Tuple2)
...

The code throws a MatchError at runtime. These examples demonstrate the power of nested pattern matching: in a single line, we managed to verify the type of firstField, that its value has type JString, and we have bound the key and value to the key and value variables, respectively. As another example, if we know that the first field is the login field, we can both verify this and extract the value:

scala> val JField("login", JString(loginName)) = firstField
loginName: String = odersky

Notice how this style of programming is declarative rather than imperative: we declare that we want a JField("login", JString(_)) variable on the right-hand side. We then let the language figure out how to check the variable types. Pattern matching is a recurring theme in functional languages.
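When a failed match should not throw, one option is to replace the val pattern with a match expression that has a fallback case. The following is a sketch under that assumption; loginOf is a hypothetical helper, not a json4s function, and the two fields are built by hand from values shown earlier.

```scala
import org.json4s._

// Hypothetical helper: returns the login name only if the field has the expected shape.
def loginOf(field: JField): Option[String] = field match {
  case JField("login", JString(name)) => Some(name)
  case _                              => None // any other field: no MatchError
}

val loginField: JField = JField("login", JString("odersky"))
val idField: JField    = JField("id", JInt(795990))

val found   = loginOf(loginField)
val missing = loginOf(idField)
```

With these inputs, found is Some("odersky") and missing is None.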

We can also use pattern matching in a for loop when looping over fields. When used in a for loop, a pattern match defines a partial function: only elements that match the pattern pass through the loop. This lets us filter the collection for elements that match a pattern and also apply a transformation to these elements. For instance, we can extract every string field in our fields list:

scala> for {
  JField(key, JString(value)) <- fields
} yield (key -> value)
List[(String, String)] = List((login,odersky), (avatar_url,https://avatars.githubusercontent.com/...

We can use this to search for specific fields. For instance, to extract the "followers" field:

scala> val followersList = for {
  JField("followers", JInt(followers)) <- fields
} yield followers
followersList: List[BigInt] = List(707)

scala> val followers = followersList.headOption
followers: Option[BigInt] = Some(707)

We first extracted all fields that matched the pattern JField("followers", JInt(_)), returning the integer inside the JInt. As the source collection, fields, is a list, this returns a list of integers (BigInt values, since that is what JInt wraps). We then extract the first value from this list using headOption, which returns the head of the list if the list has at least one element, or None if the list is empty.
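The same filtering can be written with the standard collection methods collect and collectFirst, which take a partial function directly. This is an alternative formulation, not the approach used in the text; the JSON literal is a small stand-in for the GitHub response, with values copied from the examples above.

```scala
import org.json4s._
import org.json4s.native.JsonMethods._

// Small literal standing in for the real API response (an assumption for illustration).
val JObject(fields) = parse("""{"login": "odersky", "id": 795990, "followers": 707}""")

// collect keeps only the fields matching the pattern and transforms them.
val stringFields = fields.collect { case JField(key, JString(value)) => key -> value }

// collectFirst stops at the first match, combining the loop and headOption in one step.
val followers = fields.collectFirst { case JField("followers", JInt(n)) => n }
```

With this input, stringFields contains the single pair (login, odersky) and followers is Some(707).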

We are not limited to extracting a single field at a time. For instance, to extract the "id" and "login" fields together:

scala> {
  for {
    JField("login", JString(loginName)) <- fields
    JField("id", JInt(id)) <- fields
  } yield (id -> loginName)
}.headOption
Option[(BigInt, String)] = Some((795990,odersky))

Scala's pattern matching and extractors provide us with an extremely powerful way of traversing the json4s tree and extracting the fields that we need.

JSON4S types

We have already discovered parts of json4s's type hierarchy: strings are wrapped in JString objects, integers (or big integers) are wrapped in JInt, and so on. In this section, we will take a step back and formalize the type hierarchy and the Scala type that each json4s type extracts to. These are the json4s runtime types:

  • val JString(s) // => extracts to a String
  • val JDouble(d) // => extracts to a Double
  • val JDecimal(d) // => extracts to a BigDecimal
  • val JInt(i) // => extracts to a BigInt
  • val JBool(b) // => extracts to a Boolean
  • val JObject(l) // => extracts to a List[JField]
  • val JArray(l) // => extracts to a List[JValue]
  • JNull // => represents a JSON null

All these types are subclasses of JValue. The compile-time result of parse is JValue, which you normally need to cast to a concrete type using an extractor.

The last type in the hierarchy is JField, which represents a key-value pair. JField is just a type alias for the (String, JValue) tuple. It is thus not a subtype of JValue. We can extract the key and value using the following extractor:

val JField(key, JInt(value)) = ...
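To see the hierarchy in action, here is a sketch of a function that matches on each of the types listed above. The describe helper and the sample JSON document are hypothetical, written only for this illustration.

```scala
import org.json4s._
import org.json4s.native.JsonMethods._

// Hypothetical helper that names the Scala type each json4s node extracts to.
def describe(value: JValue): String = value match {
  case JString(s)  => s"String: $s"
  case JInt(i)     => s"BigInt: $i"
  case JDouble(d)  => s"Double: $d"
  case JDecimal(d) => s"BigDecimal: $d"
  case JBool(b)    => s"Boolean: $b"
  case JObject(fs) => s"object with ${fs.size} fields"
  case JArray(vs)  => s"array with ${vs.size} elements"
  case JNull       => "null"
  case other       => s"other node: $other" // for example, JNothing
}

// Invented sample document covering several of the types.
val JObject(sample) = parse(
  """{"name": "s4ds", "size": 392, "fork": false, "tags": [1, 2], "homepage": null}""")

// Each JField is a (String, JValue) tuple, so it can be matched as a plain pair.
val descriptions = sample.map { case (key, value) => key -> describe(value) }
```

Because JField is a tuple rather than a JValue, describe is applied to the value of each field, not to the field itself.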

Extracting fields using XPath

In the previous sections, you learned how to traverse JSON objects using extractors. In this section, we will look at a different way of traversing JSON objects and extracting specific fields: the XPath DSL (domain-specific language). XPath is a query language for traversing tree-like structures. It was originally designed for addressing specific nodes in an XML document, but it works just as well with JSON. We have already seen an example of XPath syntax when we extracted the stock price from the XML document returned by the "Markit on demand" API in Chapter 4, Parallel Collections and Futures. We extracted the node with tag "LastPrice" using r \ "LastPrice". The \ operator was defined by the scala.xml package.

The json4s package exposes a similar DSL to extract fields from JObject instances. For instance, we can extract the "login" field from the JSON object jsonResponse:

scala> jsonResponse \ "login"
org.json4s.JValue = JString(odersky)

This returns a JValue that we can transform into a Scala string using an extractor:

scala> val JString(loginName) = jsonResponse \ "login"
loginName: String = odersky

Notice the similarity between the XPath DSL and traversing a filesystem: we can think of JObject instances as directories. Field names correspond to file names and the field value to the content of the file. This is more evident for nested structures. The users endpoint of the GitHub API does not have nested documents, so let's try another endpoint. We will query the API for the repository corresponding to this book: "https://api.github.com/repos/pbugnion/s4ds". The response has the following structure:

{
  "id": 42269470,
  "name": "s4ds",
  ...
  "owner": { "login": "pbugnion", "id": 1392879 ... }
  ...
}

Let's fetch this document and use the XPath syntax to extract the repository owner's login name:

scala> val jsonResponse = parse(Source.fromURL(
  "https://api.github.com/repos/pbugnion/s4ds"
).mkString)
jsonResponse: JValue = JObject(List((id,JInt(42269470)), (name,JString(s4ds))...

scala> val JString(ownerLogin) = jsonResponse \ "owner" \ "login"
ownerLogin: String = pbugnion

Again, this is much like traversing a filesystem: jsonResponse \ "owner" returns a JObject corresponding to the "owner" object. This JObject can, in turn, be queried for the "login" field, returning the value JString(pbugnion) associated with this key.
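If a field along the path is absent, json4s returns the special value JNothing rather than throwing, so a val pattern such as val JString(...) would then fail with a MatchError. A sketch of a more defensive lookup follows; ownerLogin is a hypothetical helper, and the JSON literals are trimmed stand-ins for the repository response.

```scala
import org.json4s._
import org.json4s.native.JsonMethods._

// Trimmed stand-in for the repository document shown above.
val repo = parse("""{"name": "s4ds", "owner": {"login": "pbugnion", "id": 1392879}}""")

// Hypothetical helper: follows owner -> login, returning None when the path
// is missing (the query yields JNothing) or the value is not a string.
def ownerLogin(json: JValue): Option[String] = json \ "owner" \ "login" match {
  case JString(login) => Some(login)
  case _              => None
}

val present = ownerLogin(repo)
val absent  = ownerLogin(parse("""{"name": "s4ds"}"""))
```

Here present is Some("pbugnion"), while absent is None because the second document has no "owner" field.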

What if the API response is an array? The filesystem analogy breaks down somewhat. Let's query the API endpoint listing Martin Odersky's repositories: https://api.github.com/users/odersky/repos. The response is an array of JSON objects, each of which represents a repository:

[
  {
    "id": 17335228,
    "name": "dotty",
    "size": 14699,
    ...
  },
  {
    "id": 15053153,
    "name": "frontend",
    "size": 392,
    ...
  },
  {
    "id": 2890092,
    "name": "scala",
    "size": 76133,
    ...
  },
  ...
]

Let's fetch this and parse it as JSON:

scala> val jsonResponse = parse(Source.fromURL(
  "https://api.github.com/users/odersky/repos"
).mkString)
jsonResponse: JValue = JArray(List(JObject(List((id,JInt(17335228)), (name,JString(dotty)), ...

This returns a JArray. The XPath DSL works in the same way on a JArray as on a JObject, but now, instead of returning a single JValue, it returns an array of fields matching the path in every object in the array. Let's get the size of all Martin Odersky's repositories:

scala> jsonResponse \ "size"
JValue = JArray(List(JInt(14699), JInt(392), ...

We now have a JArray of the values corresponding to the "size" field in every repository. We can iterate over this array with a for comprehension and use extractors to convert elements to Scala objects:

scala> for {
  JInt(size) <- (jsonResponse \ "size")
} yield size
List[BigInt] = List(14699, 392, 76133, 32010, 98166, 1358, 144, 273)
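As a self-contained variation that does not need network access, the sketch below parses a literal array holding the three repositories shown earlier, then combines the XPath DSL with ordinary collection methods. The variable names are illustrative, and unpacking the JArray explicitly is a design choice made here rather than something the text prescribes.

```scala
import org.json4s._
import org.json4s.native.JsonMethods._

// Literal stand-in for the API response, using the names and sizes listed above.
val repos = parse("""[
  {"name": "dotty", "size": 14699},
  {"name": "frontend", "size": 392},
  {"name": "scala", "size": 76133}
]""")

// Query every object in the array, then unpack the resulting JArray.
val JArray(sizeNodes) = repos \ "size"
val sizes = sizeNodes.collect { case JInt(n) => n }
val totalSize = sizes.sum

// Pair each repository name with its size by matching on the fields of each object.
val nameToSize = for {
  JObject(repoFields) <- repos.children
  JField("name", JString(name)) <- repoFields
  JField("size", JInt(size)) <- repoFields
} yield name -> size
```

For this input, totalSize is 91224 and nameToSize pairs dotty, frontend, and scala with their respective sizes.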

Thus, combining extractors with the XPath DSL gives us powerful, complementary tools to extract information from JSON objects.

There is much more to the XPath syntax than we have space to cover here, including the ability to extract fields nested at any level of depth below the current root or fields that match a predicate or a certain type. We find that well-designed APIs obviate the need for many of these more powerful functions, but do consult the documentation (json4s.org) to get an overview of what you can do.

In the next section, we will look at extracting JSON directly into case classes.
