As we saw, case classes significantly simplify the handling of the new nested data structures that we want to construct. The case class definition is probably the most convincing reason to move from Java (and SQL) to Scala. Now, what about the methods? How do we quickly add methods to a class without expensive recompilation? Scala allows you to do this transparently with traits!
A fundamental feature of functional programming is that functions are first-class citizens, on par with objects. In the previous section, we defined the two EpochSeconds functions that transform the ISO8601 format to epoch time in seconds. We also suggested the splitSession function that provides a multi-session view for a given IP. How do we associate this or other behavior with a given class?
First, let's define a desired behavior:
scala> trait Epoch {
     |   this: PageView =>
     |   def epoch() : Long = { LocalDateTime.parse(ts, DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")).toEpochSecond(ZoneOffset.UTC) }
     | }
defined trait Epoch
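This creates a PageView-specific function that converts a string representation of a datetime to epoch time in seconds. The self-type annotation this: PageView => restricts the trait to PageView instances, which is what lets epoch() read the ts field directly. Now, suppose we apply the following transformation: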
scala> val rddEpoch = rdd.map(x => new Session(x.id, x.visits.map(x => new PageView(x.ts, x.path) with Epoch)))
rddEpoch: org.apache.spark.rdd.RDD[Session[PageView with Epoch]] = MapPartitionsRDD[20] at map at <console>:31
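The mix-in at instantiation time can also be tried on a single object, without Spark. The following is a minimal sketch, not the definition from the previous section: it assumes a simplified two-field PageView, and the sample timestamp and path are made up for illustration.

```scala
import java.time.{LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter

// Simplified stand-in for the PageView class (assumed shape, not the book's full definition).
case class PageView(ts: String, path: String)

// The self-type restricts Epoch to PageView instances, so `ts` is in scope.
trait Epoch { this: PageView =>
  def epoch(): Long =
    LocalDateTime.parse(ts, DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"))
      .toEpochSecond(ZoneOffset.UTC)
}

// The trait is attached to this one instance; the PageView class itself is unchanged.
val pv = new PageView("1970-01-01 00:01:40", "homepage") with Epoch
val seconds = pv.epoch() // 100 seconds after the Unix epoch
```

A plain PageView("1970-01-01 00:01:40", "homepage") created without the trait has no epoch() method, and mixing Epoch into an unrelated class is rejected by the compiler because of the self-type.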
The rddEpoch RDD now holds page views with the additional behavior. For example, to find out how much time was spent on each individual page in a session, we can run the following pipeline:
scala> rddEpoch.map(x => (x.id, x.visits.zip(x.visits.tail).map(x => (x._2.path, x._2.epoch - x._1.epoch)).mkString("[", ",", "]"))).take(3).foreach(println)
(189.248.74.238,[(mycompanycom>homepage,104),(mycompanycom>running:slp,2),(mycompanycom>running:slp,59),(mycompanycom>running>stories>2013>04>themycompanyfreestore:cdp,2),(mycompanycom>running>stories>2013>04>themycompanyfreestore:cdp,5),(mycompanycom>running>stories>2013>04>themycompanyfreestore:cdp,0),(mycompanycom>running:slp,34),(mycompanycom>homepage,43),(mycompanycom>homepage,35),(mycompanycom:mobile>mycompany photoid>landing,6),(mycompanycom>men>shoes:segmentedgrid,50),(mycompanycom>homepage,14)])
(82.166.130.148,[])
(88.234.248.111,[(mycompanycom>plus>home,10),(mycompanycom>plus>home,8),(mycompanycom>plus>onepluspdp>sport band,2),(mycompanycom>onsite search>results found,22),(mycompanycom>plus>onepluspdp>sport band,27),(mycompanycom>plus>home,2),(mycompanycom>plus>home,18),(mycompanycom>plus>home,4),(mycompanycom>plus>onepluspdp>sport watch,3),(mycompanycom>gear>mycompany+ sportwatch:standardgrid,4),(mycompanycom>homepage,24),(mycompanycom>homepage,21),(mycompanycom>plus>products landing,2),(mycompanycom>homepage,24),(mycompanycom>homepage,23),(mycompanycom>plus>whatismycompanyfuel,2)])
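The core of this pipeline is pairing each page view with its successor through zip and tail. The sketch below isolates that step on a plain Seq of hypothetical (path, epoch seconds) pairs. It uses drop(1) instead of tail, which is an assumption on my part rather than what the pipeline does: tail throws on an empty collection, whereas drop(1) returns an empty result.

```scala
// Hypothetical (path, epochSeconds) pairs standing in for one session's page views.
val visits = Seq(("home", 100L), ("search", 110L), ("product", 132L))

// Pair every view with the next one and emit (path of the later view, seconds between them),
// mirroring the labeling used in the pipeline (x._2.path, x._2.epoch - x._1.epoch).
def deltas(vs: Seq[(String, Long)]): Seq[(String, Long)] =
  vs.zip(vs.drop(1)).map { case ((_, t1), (path2, t2)) => (path2, t2 - t1) }

val result = deltas(visits)          // Seq((search,10), (product,22))
val single = deltas(visits.take(1))  // empty: one view has no successor
val none   = deltas(Seq.empty)       // empty, and no exception is thrown
```

A session with a single page view therefore yields an empty list, which is consistent with the [] shown for 82.166.130.148 in the output.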
Multiple traits can be added at the same time without affecting either the original class definitions or the original data. No recompilation of the original classes is required.
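As an illustration of stacking traits, the sketch below mixes two traits into one instance. PathDepth is a hypothetical trait invented for this example, and PageView is again a simplified two-field stand-in rather than the definition from the previous section.

```scala
import java.time.{LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter

// Simplified stand-in for the PageView class (assumed shape).
case class PageView(ts: String, path: String)

trait Epoch { this: PageView =>
  def epoch(): Long =
    LocalDateTime.parse(ts, DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"))
      .toEpochSecond(ZoneOffset.UTC)
}

// Hypothetical second behavior: number of '>'-separated levels in the page path.
trait PathDepth { this: PageView =>
  def depth: Int = path.split(">").length
}

// Both traits are chained with `with` at instantiation time.
val view = new PageView("1970-01-01 00:00:10", "mycompanycom>running:slp") with Epoch with PathDepth

val t = view.epoch() // 10
val d = view.depth   // 2
```

Each trait contributes its own methods, and neither the PageView definition nor the underlying field values change.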