Chapter 11. Domain-Specific Languages in Scala

A Domain-Specific Language is a programming language that mimics the terms, idioms, and expressions used among experts in the targeted domain. Code written in a DSL reads like structured prose for the domain. Ideally, a domain expert with little experience in programming can read, understand, and validate this code. Sometimes, a domain expert might be able to write DSL code, even if he isn’t a professional programmer.

DSLs are a large topic. We’ll only touch the surface of DSLs and Scala’s impressive support for them. For more information on DSLs in general, see [Fowler2009], [Ford2009], and [Deursen]. The basic build tool we used for the book’s examples, sake, uses a DSL similar to the venerable make and its Ruby cousin rake. (See the README in the code download archive for details.) For other examples of Scala “internal” and “external” DSLs, see [Ghosh2008a] and [Ghosh2008b]. For some advanced work on DSLs using Scala, [Hofer2008] explores polymorphic substitution of alternative implementations for DSL abstractions, which is useful for analysis, optimization, composition, etc.

Well-crafted DSLs offer several benefits:

Encapsulation

A DSL hides implementation details and exposes only those abstractions relevant to the domain.

Efficiency

Because implementation details are encapsulated, a DSL optimizes the effort required to write or modify code for application features.

Communication

A DSL helps developers understand the domain and domain experts to verify that the implementation meets the requirements.

Quality

A DSL minimizes the “impedance mismatch” between feature requirements, as expressed by domain experts, and the implementing source code, thereby minimizing potential bugs.

However, DSLs also have several drawbacks:

Difficulties of creating good DSLs

Good DSLs are harder to design than traditional APIs. The latter tend to follow language idioms for API design, where uniformity is important. Even then, elegant, effective, and easy-to-use APIs are difficult to design. In contrast, each DSL should reflect the unique language idioms of its domain. The DSL designer has much greater latitude, which also means it is much harder to determine the “best” design choices.

Long-term maintenance

DSLs can require more maintenance over the long term to factor in domain changes. Also, new developers will require more time to learn how to use and maintain a DSL.

However, when a DSL is appropriate for an application—e.g., when it would be used frequently to implement and change functionality—a well-designed DSL can be a powerful tool for building flexible and robust applications.

From the implementation point of view, DSLs are often classified as internal and external.

An internal (sometimes called embedded) DSL is an idiomatic way of writing code in a general-purpose programming language, like Scala. No special-purpose parser is necessary for internal DSLs. Instead, they are parsed just like any other code written in the language. In contrast, an external DSL is a custom language with its own custom grammar and parser.

Internal DSLs are easier to create because they don’t require a special-purpose parser. On the other hand, the constraints of the underlying language limit the options for expressing domain concepts. External DSLs remove this constraint. You can design the language any way you want, as long as you can write a reliable parser for it. The downside of external DSLs is the requirement to write and use a custom parser.

DSLs have been around a long time. For example, internal DSLs written in Lisp are as old as Lisp itself. Interest in DSLs has surged recently, driven in part by the Ruby community, because they are very easy to implement in Ruby. As we’ll see, Scala provides excellent support for the creation of internal and external DSLs.

Internal DSLs

Let’s create an internal DSL for a payroll application that computes an employee’s paycheck every pay period, which will be two weeks long. The paycheck will include the employee’s net salary, which is the gross salary minus the deductions for taxes, insurance premiums (at least in some countries), retirement fund contributions, etc.

To better understand the contrasts between code that makes use of DSLs and code that does not, let’s try both techniques on the same problem. Here’s how the paycheck might be calculated for two employees, without the help of a DSL:

// code-examples/DSLs/payroll/api/payroll-api-script.scala

import payroll.api._
import payroll.api.DeductionsCalculator._
import payroll._
import payroll.Type2Money._

val buck = Employee(Name("Buck", "Trends"), Money(80000))
val jane = Employee(Name("Jane", "Doe"), Money(90000))

List(buck, jane).foreach { employee =>
  // Assume annual is based on 52 weeks.
  val biweeklyGross = employee.annualGrossSalary / 26.

  val deductions = federalIncomeTax(employee, biweeklyGross) +
          stateIncomeTax(employee, biweeklyGross) +
          insurancePremiums(employee, biweeklyGross) +
          retirementFundContributions(employee, biweeklyGross)

  val check = Paycheck(biweeklyGross, biweeklyGross - deductions, deductions)

  format("%s %s: %s
", employee.name.first, employee.name.last, check)
}

For each employee, the script calculates the gross pay for the pay period, the deductions, and the resulting net. These values are placed into a Paycheck, which is printed out. Before we describe the types we are using, notice a few things about the foreach loop that does the work.

First, it is noisy. For example, it mentions employee and biweeklyGross incessantly. A DSL will help us minimize that “noise” and focus on what’s really going on.

Second, notice that the code is imperative. It says “divide this, add that,” and so forth. We’ll see that our DSLs look similar, but they are more declarative, hiding the work from the user.

Here is the simple Paycheck class used in the script:

// code-examples/DSLs/payroll/paycheck.scala

package payroll

/** We're ignoring invalid (?) cases like a negative net
 *  when deductions exceed the gross.
 */
case class Paycheck(gross: Money, net: Money, deductions: Money) {

  def plusGross (m: Money)      = Paycheck(gross + m, net + m, deductions)
  def plusDeductions (m: Money) = Paycheck(gross,     net - m, deductions + m)
}

The Employee type uses a Name type:

// code-examples/DSLs/payroll/employee.scala

package payroll

case class Name(first: String, last: String)

case class Employee(name: Name, annualGrossSalary: Money)

The Money type handles arithmetic, rounding to four decimal places, etc. It ignores currency, except for the toString method. Proper financial arithmetic is notoriously difficult to do correctly for real-world transactions. This implementation is not perfectly accurate, but it’s close enough for our purposes. [MoneyInJava] provides useful information on doing real money calculations:

// code-examples/DSLs/payroll/money.scala

package payroll
import java.math.{BigDecimal => JBigDecimal,
    MathContext => JMathContext, RoundingMode => JRoundingMode}

/** Most arithmetic is done using JBigDecimals for tighter control.
 */
class Money(val amount: BigDecimal) {

  def + (m: Money)  =
      Money(amount.bigDecimal.add(m.amount.bigDecimal))
  def - (m: Money)  =
      Money(amount.bigDecimal.subtract(m.amount.bigDecimal))
  def * (m: Money)  =
      Money(amount.bigDecimal.multiply(m.amount.bigDecimal))
  def / (m: Money)  =
      Money(amount.bigDecimal.divide(m.amount.bigDecimal,
          Money.scale, Money.jroundingMode))

  def <  (m: Money)  = amount <  m.amount
  def <= (m: Money)  = amount <= m.amount
  def >  (m: Money)  = amount >  m.amount
  def >= (m: Money)  = amount >= m.amount

  override def equals (o: Any) = o match {
    case m: Money => amount equals m.amount
    case _ => false
  }

  override def hashCode = amount.hashCode * 31

  // Hack: Must explicitly call the correct conversion: double2Double
  override def toString =
      String.format("$%.2f", double2Double(amount.doubleValue))
}

object Money {
  def apply(amount: BigDecimal)  = new Money(amount)
  def apply(amount: JBigDecimal) = new Money(scaled(new BigDecimal(amount)))
  def apply(amount: Double)      = new Money(scaled(BigDecimal(amount)))
  def apply(amount: Long)        = new Money(scaled(BigDecimal(amount)))
  def apply(amount: Int)         = new Money(scaled(BigDecimal(amount)))

  def unapply(m: Money) = Some(m.amount)

  protected def scaled(d: BigDecimal) = d.setScale(scale, roundingMode)

  val scale = 4
  val jroundingMode = JRoundingMode.HALF_UP
  val roundingMode  = BigDecimal.RoundingMode.ROUND_HALF_UP
  val context = new JMathContext(scale, jroundingMode)
}

object Type2Money {
  implicit def bigDecimal2Money(b: BigDecimal)   = Money(b)
  implicit def jBigDecimal2Money(b: JBigDecimal) = Money(b)
  implicit def double2Money(d: Double)           = Money(d)
  implicit def long2Money(l: Long)               = Money(l)
  implicit def int2Money(i: Int)                 = Money(i)
}

Note that we use scala.BigDecimal, which wraps java.math.BigDecimal, as the storage type for financial figures.

Deductions are calculated using four helper methods in payroll.api.DeductionsCalculator:

// code-examples/DSLs/payroll/api/deductions-calc.scala

package payroll.api
import payroll.Type2Money._

object DeductionsCalculator {
  def federalIncomeTax(empl: Employee, gross: Money) = gross * .25

  def stateIncomeTax(empl: Employee, gross: Money) = gross * .05

  def insurancePremiums(empl: Employee, gross: Money) = Money(500)

  def retirementFundContributions(empl: Employee, gross: Money) = gross * .10
}

Each method might use the employee information and the gross salary for the pay period. In this case, we use very simple algorithms based on just the gross salary, except for insurance premiums, which we treat as a fixed value.

Running the script for the payroll API produces the following output:

(665) $ scala -cp ... payroll-api-script.scala
Buck Trends: Paycheck($3076.92,$1346.15,$1730.77)
Jane Doe: Paycheck($3461.54,$1576.92,$1884.62)

A Payroll Internal DSL

The previous code works well enough, but suppose we wanted to show it to the Accounting Department to confirm that we’re calculating paychecks correctly. Most likely, they would get lost in the Scala idioms. Suppose further that we need the ability to customize this algorithm frequently—for example, because it needs to be customized for different employee types (salaried, hourly, etc.), or to modify the deduction calculations. Ideally, we would like to enable the accountants to do these customizations themselves, without our help.

We might achieve these goals if we can express the logic in a DSL that is sufficiently intuitive to an accountant. Can we morph our API example into such a DSL?

Returning to the script for the payroll API, what if we hide most of the explicit references to context information, like the employee, gross salary, and deduction values? Consider the following text:

Rules to calculate an employee's paycheck:
  employee's gross salary for 2 weeks
  minus deductions for
    federalIncomeTax, which     is  25%  of gross
    stateIncomeTax, which       is  5%   of gross
    insurancePremiums, which    are 500. in gross's currency
    retirementFundContributions are 10%  of gross

This reads like normal English, not code. We have included some “bubble” words (see [Ford2009]) that aid readability but don’t necessarily correspond to anything essential, such as to, an, is, for, of, and which. We’ll eliminate some of these unnecessary words and keep others in our Scala DSL.

Compared to the version in the payroll API script, there’s a lot less clutter obscuring the essentials of the algorithm. This is because we have minimized explicit references to the contextual information. We only mention employee twice. We mention gross five times, but hopefully in “intuitive” ways.

There are many possible internal Scala DSLs we could construct that resemble this ad hoc DSL. Here is one of them, again in a script, which produces the same output as before:

// code-examples/DSLs/payroll/dsl/payroll-dsl-script.scala

import payroll._
import payroll.dsl._
import payroll.dsl.rules_

val payrollCalculator = rules { employee =>
  employee salary_for 2.weeks minus_deductions_for { gross =>
    federalIncomeTax            is  (25.  percent_of gross)
    stateIncomeTax              is  (5.   percent_of gross)
    insurancePremiums           are (500. in gross.currency)
    retirementFundContributions are (10.  percent_of gross)
  }
}

val buck = Employee(Name("Buck", "Trends"), Money(80000))
val jane = Employee(Name("Jane", "Doe"), Money(90000))

List(buck, jane).foreach { employee =>
  val check = payrollCalculator(employee)
  format("%s %s: %s
", employee.name.first, employee.name.last, check)
}

We’ll go through the implementation step by step, but first, let’s summarize the features of Scala that allow us to implement this DSL.

Infix Operator Notation

Consider this line in the definition of payrollCalculator:

employee salary_for 2.weeks minus_deductions_for { gross =>

This infix notation is equivalent to the following less-readable form:

employee.salary_for(2.weeks).minus_deductions_for { gross =>

You can see why we wrote 2.weeks earlier, because the result of this expression is passed to salary_for. Without the period, the infix expression would be parsed as employee.salary_for(2).weeks.... There is no weeks method on Int, of course. We’ll revisit this expression in a moment.

Method chaining like this is often implemented where each method returns this so you can continue calling methods on the same instance. Note that returning this allows those method calls to occur in any order. If you need to impose a specific ordering, then return an instance of a different type. For example, if minus_deductions_for must be called after salary_for, then salary_for should return a new instance.

Because chaining is so easy, we could have created separate methods for salary, for, minus, and deductions, allowing us to write the following expression:

employee salary for 2.weeks minus deductions for { gross =>

Note that calls to for are preceded by different calls with very different meanings. So, if the same instance is used throughout, it would have to track the “flow” internally. Chaining different instances would eliminate this problem. However, since no computations are actually needed between these words, we chose the simpler design where words are joined together, separated by _.

Implicit Conversions and User-Defined Types

Returning to 2.weeks, since Int doesn’t have a weeks method, we use an implicit conversion to a Duration instance that wraps an Int specifying an amount:

// code-examples/DSLs/payroll/dsl/duration.scala

package payroll.dsl

case class Duration(val amount: Int) {
  /** @return the number of work days in "amount" weeks. */
  def weeks = amount * 5

  /** @return the number of work days in "amount" years. */
  def years = amount * 260
}

The weeks method multiples that amount by 5 to return the corresponding amount of work days. Hence, we designed the payroll calculator to work with days as the unit of time. This decision is completely hidden behind the DSL. Should we later add support for work hours, it would be easy to refactor the design to use hours instead.

Duration is one of the ad hoc types that we designed to encapsulate the implicit context, to implement helper methods for the DSL, etc. We’ll discuss the implicit conversion method we need in a moment.

Apply Methods

A number of the implementation objects use apply to invoke behavior. The rules object encapsulates the process of building the rules for payroll calculation. Its apply method takes a function literal, Employee => Paycheck.

Payroll Rules DSL Implementation

Now let’s explore the implementation, starting with the rules object and working our way down:

// code-examples/DSLs/payroll/dsl/payroll.scala

package payroll.dsl
import payroll._

object rules {

  def apply(rules: Employee => Paycheck) = new PayrollBuilderRules(rules)

  implicit def int2Duration(i: Int) = Duration(i)

  implicit def employee2GrossPayBuilder(e: Employee) =
      new GrossPayBuilder(e)

  implicit def grossPayBuilder2DeductionsBuilder(b: GrossPayBuilder)
      = new DeductionsBuilder(b)

  implicit def double2DeductionsBuilderDeductionHelper(d: Double) =
      new DeductionsBuilderDeductionHelper(d)
}

import rules._
...

The function literal argument for rules.apply is used to construct a PayrollBuilderRules that will process the specified rules. It is used at the very beginning of the DSL.

val payrollCalculator = rules { employee => ...

The rules object also defines implicit conversions. The first one is used by the 2.weeks expression. It converts 2 into a Duration instance, which we discussed previously. The other conversions are used later in the DSL to enable transparent conversion of Doubles, Employees, etc. into wrapper instances that we will describe shortly.

Note that the rules object is imported so these conversions are visible in the rest of the current file. It will also need to be imported in files that use the DSL.

The PayrollBuilderRules is our first wrapper instance. It evaluates the function literal for the whole rule set, wrapped in a try/catch block:

// code-examples/DSLs/payroll/dsl/payroll.scala
...
class PayrollException(message: String, cause: Throwable)
    extends RuntimeException(message, cause)

protected[dsl] class PayrollBuilderRules(rules: Employee => Paycheck) {
  def apply(employee: Employee) = {
    try {
      rules(employee)
    } catch {
      case th: Throwable => new PayrollException(
        "Failed to process payroll for employee: " + employee, th)
    }
  }
}
...

Note that we protect access to PayrollBuilderRules, because we don’t want clients using it directly. However, we left the exception public for use in catch clauses. (You can decide whether or not you like wrapping a thrown exception in a “domain-specific” exception, as shown.)

Note that we have to pass the employee as a “context” instance in the function literal. We said that it is desirable to make the context as implicit as possible. A common theme in our implementation classes, like PayrollBuilderRules, is to hold context information in wrapper instances and to minimize their visibility in the DSL. An alternative approach would be to store context in singleton objects so other instances can get to them. This approach raises thread safety issues, unfortunately.

To see what we mean concerning the context, consider the part of our script that uses the payroll DSL, where the deductions are specified:

... { gross =>
  federalIncomeTax            is  (25.  percent_of gross)
  stateIncomeTax              is  (5.   percent_of gross)
  insurancePremiums           are (500. in gross.currency)
  retirementFundContributions are (10.  percent_of gross)
}

Consider the insurance premiums, for which a flat Money(500) is deducted. Why didn’t we just write insurancePremiums are 500., instead? It turns out we have to “sneak” the gross instance into the expression somehow. The name gross implies that it is a Money representing the employee’s salary for the pay period. Tricksey DSLses!! It is actually another helper instance, DeductionsBuilder, which holds the whole paycheck, including the gross pay, and the employee instance. The name gross is used merely because it reads well in the places where it is used.

This block is calculating the deductions and deducting them from the gross pay to determine the net pay. The gross instance handles this process. There is no “communication” between the four lines of the function literal. Furthermore, federalIncomeTax, insurancePremiums, etc. are objects with no connection to DeductionsBuilder (as we’ll see shortly). It would be great if they could be members of DeductionsBuilder or perhaps some other wrapper instance enclosing this scope. Then each line would be a method call on one or the other wrapper. Unfortunately, this doesn’t work. Hence, each line must specify the gross instance to maintain continuity. We jump through various hoops to support the syntax, yet allow gross to be available, as needed.

So, we contrived the convention that “raw” numbers, like the insurance deduction, have to be qualified by the particular currency used for the gross pay. We’ll see how the expression 500. in gross.currency works in a moment. It is something of a hack, but it reads well and it solves our design problem.

Here is a possible alternative design that would have avoided the problem:

... { builder =>
  builder federalIncomeTax            (25.  percent_of gross)
  builder stateIncomeTax              (5.   percent_of gross)
  builder insurancePremiums           500.
  builder retirementFundContributions (10.  percent_of gross)
}

Now the fact that a builder is being used is more explicit, and federalIncomeTax, insurancePremiums, etc. are methods on the builder. We opted for a more readable style, with the penalty of a harder implementation. You’ll sometimes hear the phrase fluent interface used to refer to DSLs that emphasize readability.

Here is GrossPayBuilder:

// code-examples/DSLs/payroll/dsl/payroll.scala
...
import payroll.Type2Money._

protected[dsl] class GrossPayBuilder(val employee: Employee) {

  var gross: Money = 0

  def salary_for(days: Int) = {
    gross += dailyGrossSalary(employee.annualGrossSalary) * days
    this
  }

  // Assume 260 working days: 52 weeks (including vacation) * 5 days/week.
  def weeklyGrossSalary(annual: Money) = annual / 52.0
  def dailyGrossSalary(annual: Money)  = annual / 260.0
}
...

Recall that rules defines an implicit conversion from Employee to this type. It is invoked by the expression employee salary_for, so the GrossPayBuilder.salary_for method can be called. GrossPayBuilder initializes the gross and appends new values to it whenever salary_for is called, which assumes we’re adding gross pay in increments of days. Finally, salary_for returns this to support chaining.

Deduction calculation is the most complex part. When minus_deductions_for is used in the DSL, it triggers the implicit conversion defined in rules from the GrossPayBuilder to a DeductionsBuilder:

// code-examples/DSLs/payroll/dsl/payroll.scala
...
protected[dsl] class DeductionsBuilder(gpb: GrossPayBuilder) {

  val employee = gpb.employee
  var paycheck: Paycheck = new Paycheck(gpb.gross, gpb.gross, 0)

  def currency = this

  def minus_deductions_for(deductionRules: DeductionsBuilder => Unit) = {
    deductionRules(this)
    paycheck
  }

  def addDeductions(amount: Money) = paycheck = paycheck plusDeductions amount

  def addDeductionsPercentageOfGross(percentage: Double) = {
    val amount = paycheck.gross * (percentage/100.)
    addDeductions(amount)
  }
}
...

DeductionsBuilder saves the employee from the passed-in GrossPayBuilder, which it doesn’t save as a field. It also initializes the paycheck using the calculated gross pay.

Note that the currency method simply returns this. We don’t need to do anything with the actual currency when this method is invoked. Instead, it is used to support a design idiom that we’ll discuss shortly.

The minus_deductions_for does the important work. It invokes the function literal with the individual rules and then returns the completed Paycheck instance, which is ultimately what rules.apply returns.

Our remaining two methods are used to calculate individual deductions. They are called from DeductionsBuilderDeductionHelper, which we show now:

// code-examples/DSLs/payroll/dsl/payroll.scala
...
class DeductionCalculator {
  def is(builder: DeductionsBuilder) = apply(builder)
  def are(builder: DeductionsBuilder) = apply(builder)

  def apply(builder: DeductionsBuilder) = {}
}

object federalIncomeTax extends DeductionCalculator
object stateIncomeTax extends DeductionCalculator
object insurancePremiums extends DeductionCalculator
object retirementFundContributions extends DeductionCalculator

protected[dsl] class DeductionsBuilderDeductionHelper(val factor: Double) {
  def in (builder: DeductionsBuilder) = {
    builder addDeductions Money(factor)
    builder
  }
  def percent_of (builder: DeductionsBuilder) = {
    builder addDeductionsPercentageOfGross factor
    builder
  }
}

Now we see that federalIncomeTax, etc. are singleton objects. Note the “synonym” methods is and are. We used are for the objects with plural names, like insurancePremiums, and is for the singular objects, like federalIncomeTax. In fact, since both methods delegate to apply, they are effectively bubble words that the user could omit. That is, the following two DSL lines are equivalent:

federalIncomeTax is (25. percent_of gross)
federalIncomeTax    (25. percent_of gross)

The apply method takes DeductionsBuilder and does nothing with it! In fact, by the time apply is called, the deduction has already been calculated and factored into the paycheck. By implication, the presence of expressions like federalIncomeTax is are effectively syntactic sugar (at least as this DSL is currently implemented). They are a fancy form of comments, but at least they have the virtue of type checking the “kinds” of deductions that are allowed. Of course, as the implementation evolves, these instances might do real work.

To see why DeductionCalculator.apply is empty, let’s discuss DeductionsBuilderDeductionHelper. Recall that the rules object has an implicit conversion method to convert a Double to a DeductionsBuilderDeductionHelper. Once we have a helper instance, we can call either the in method or the percent_of method. Every line in the deductions function literal exploits this instance.

For example, (25. percent_of gross) is roughly equivalent to the following steps:

  1. Call to rules.double2DeductionsBuilderDeductionHelper(25.) to create a new DeductionsBuilderDeductionHelper(25.)

  2. Call to the helper’s percent_of(gross) method, where gross is a DeductionsBuilder

  3. gross.addDeductionsPercentageOfGross(factor)

In other words, we used DeductionsBuilderDeductionHelper to convert an expression of the form Double method DeductionsBuilder into an expression of the form DeductionsBuilder method2 Double. DeductionsBuilder accumulates each deduction into the paycheck we’re building.

The expression 500. in gross.currency works almost identically. DeductionsBuilder.currency is effectively another bubble word; it simply returns this, but gives a readable idiom for the DSL. The in method simply converts the Double to a Money and passes it to DeductionsBuilder.addDeductions.

So DeductionCalculator.apply does nothing, because all the work is already done by the time apply is called.

Internal DSLs: Final Thoughts

So which is better, the original API implementation or the DSL implementation? The DSL implementation is complex. Like any language, testing its robustness can be a challenge. Users will try many combinations of expressions. They will probably not understand the compiler error messages that refer to the internals we’ve hidden behind the DSL.

Designing a quality DSL is difficult. With an API, you can follow the Scala library conventions for types, method names, etc. However, with a DSL, you’re trying to imitate the language of a new domain. It’s hard to get it right.

It’s worth the effort, though. A well-designed DSL minimizes the translation effort between requirements and code, thereby improving communications with stakeholders about requirements. DSLs also facilitate rapid feature change and hide distracting implementation details. As always, there is a cost/benefit analysis you should make when deciding whether to use a DSL.

Assuming you’ve made the “go” decision, a common problem in DSL design is the finishing problem (see [Ford2009]). How do you know when you’ve finished building up the state of an instance and it’s ready to use?

We solved this problem in two ways. First, we nested the calculation steps in a function literal. As soon as rules(employee) was invoked, the paycheck was built to completion. Also, all the steps were evaluated “eagerly.” We didn’t need to put in all the rules, then run them at the end. Our only ordering requirement was the need to calculate the gross pay first, since the deductions are based on it. We enforced the correct order of invocation using instances of different types.

There are cases in which you can’t evaluate the build steps eagerly. For example, a DSL that builds up a SQL query string can’t run a query after each step of the build process. In this case, evaluation has to wait until the query string is completely built.

By contrast, if your DSL steps are stateless, chained method invocation works just fine. In this case, it doesn’t matter when you stop calling chained methods. If you chain methods that build up state, you’ll have to add some sort of done method and trust the users to always use it at the end.

External DSLs with Parser Combinators

When you write a parser for an external DSL, you can use a parser generator tool like Antlr (see [Antlr]). However, the Scala library includes a powerful parser combinator library that can be used for parsing most external DSLs that have a context-free grammar. An attractive feature of this library is the way it defines an internal DSL that makes parser definitions look very similar to familiar grammar notations, like EBNF (Extended Backus-Naur Form—see [BNF]).

About Parser Combinators

Parser combinators are building blocks for parsers. Parsers that handle specific kinds of input—floating-point numbers, integers, etc.—can be combined together to form other parser combinators for larger expressions. A combinator framework makes it easy to combine parsers to handle sequential and alternative cases, repetition, optional terms, etc.

We’ll learn more about parsing techniques and terminology as we proceed. A complete exposition of parsing techniques is beyond our scope, but our example should get you started. You can find additional examples of parsers written using Scala’s parser combinator library in [Spiewak2009b], [Ghosh2008a], and [Odersky2008].

A Payroll External DSL

For our parser combinator example, we’ll reuse the example we just discussed for internal DSLs. We’ll modify the grammar slightly, since our external DSL does not have to be valid Scala syntax. Other changes will make parser construction easier. Here’s an example written in the external DSL:

paycheck for employee "Buck Trends" is salary for 2 weeks minus deductions for {
  federal income tax            is  25.  percent of gross,
  state income tax              is  5.   percent of gross,
  insurance premiums            are 500. in gross currency,
  retirement fund contributions are 10.  percent of gross
}

Compare this example to the internal DSL we defined in A Payroll Internal DSL:

... = rules { employee =>
  employee salary_for 2.weeks minus_deductions_for { gross =>
    federalIncomeTax            is  (25.  percent_of gross)
    stateIncomeTax              is  (5.   percent_of gross)
    insurancePremiums           are (500. in gross.currency)
    retirementFundContributions are (10.  percent_of gross)
  }
}
...

In our new DSL, we insert a specific employee in the script. We wouldn’t expect a user to copy and paste this script for every employee. A natural extension that we won’t pursue would allow the user to loop over all salaried employees in a database, for example.

Some of the differences are “gratuitous”; we could have used the same syntax we used previously. These changes include removing underscores between words in some expressions and expanding camel-case words into space-separated words. That is, we turned some single words into multi-word expressions. We made these changes because they will be easy to implement using parser combinators, but using the same multi-word expressions would have added a lot of complexity to the internal DSL’s implementation.

We no longer need “local variables” like employee and gross. Those words still appear in the DSL, but our parser will keep track of the corresponding instances internally.

The remaining changes are punctuation. It is still convenient to surround the list of deductions with curly braces. We now use a comma to separate the individual deductions, as that will make the parser’s job easier. We can also drop the parentheses we used earlier.

To see how closely the internal DSL for Scala’s parser combinator library resembles the context-free grammar, let’s start with the grammar itself, written in a variation of EBNF. We’ll omit commas to separate sequences, for clarity:

paycheck = empl gross deduct;

empl = "paycheck" "for" "employee" employeeName;

gross = "is" "salary" "for" duration;

deduct = "minus" "deductions" "for" "{" deductItems "}";

employeeName = """ name " " name """;

name = ...

duration = decimalNumber weeksDays;

weeksDays = "week" | "weeks" | "day" | "days";

deductItems = Ɛ | deductItem { "," deductItem };

deductItem = deductKind deductAmount;

deductKind = tax | insurance | retirement;

tax = fedState "income" "tax";

fedState = "federal" | "state";

insurance = "insurance" "premiums";

retirement = "retirement" "fund" "contributions";

deductAmount = percentage | amount;

percentage = toBe doubleNumber "percent" "of" "gross";

amount = toBe doubleNumber "in" "gross" "currency";

toBe = "is" | "are";

decimalNumber = ...

doubleNumber = ...

We can see that most of the terminals (the literal strings paycheck, for, employee, the characters { and }, etc.) will be bubble words, as defined in the previous section. We’ll ignore these after parsing. The Ɛ is used to indicate an empty production for deductItems, although there will rarely be no deductions!

We didn’t spell out the details for decimal numbers, double numbers, and allowed letters in the employee names. We simply elided those definitions. We’ll handle the details later.

Each line in the grammar defines a production rule. The end of the definition is marked with a semicolon. A nonterminal appears on the lefthand side of the equals sign. The righthand side consists of terminals (e.g., the literal strings and characters we just mentioned) that require no further parsing, other nonterminals (including possibly a recursive reference to the lefthand side nonterminal), and operators that express relationships between the items. Notice that the grammar forms have a hierarchical decomposition, although not a directed acyclic graph, as generally speaking these grammars can have cycles.

We have a context-free grammar because every production rule has a single nonterminal on the lefthand side of the equals sign, i.e., without any additional context information required to specify the production’s applicability and meaning.

Production rules like toBe = "is" | "are" mean the is production (a terminal in this case) or the are production will match. This is an example of an alternative composition.

When productions are separated by white space on the righthand side of another production, e.g., prod1 prod2, both productions are required to appear sequentially for a match. (Most EBNF formats actually require a comma to separate these items.) Hence, these expressions are more like “and” expressions, but sequential composition is so common that no & operator is used, the analog of | for alternative composition.

The production rule with "{" deductItem { "," deductItem } "}" demonstrates how to specify optional (zero or more) repetitions. This expression matches a literal { character, followed by a deductItem (another production), followed by zero or more expressions consisting of a literal comma , and another deductItem, and finally ending with a literal } character. Sometimes an asterisk is used to indicate repetition zero or more times, e.g., prod *. For repetition at least once, prod + is sometimes used.

Finally, if we had optional items in our grammar, we would enclose them in square brackets, [...]. There are other kinds of composition operators possible (and supported in the Scala library), a few of which we’ll discuss. See the [ScalaAPI2008] entry for Parsers for more details.

A Scala Implementation of the External DSL Grammar

Here is the parser written using Scala’s parser combinators. At this point, we won’t do anything to actually calculate an employee’s paycheck, so we’ll append V1 to the class name:

// code-examples/DSLs/payroll/pcdsl/payroll-parser-comb-v1.scala

package payroll.pcdsl
import scala.util.parsing.combinator._
import org.specs._
import payroll._
import payroll.Type2Money._

class PayrollParserCombinatorsV1 extends JavaTokenParsers {

  def paycheck = empl ~ gross ~ deduct

  def empl = "paycheck" ~> "for" ~> "employee" ~> employeeName

  def gross = "is" ~> "salary" ~> "for" ~> duration

  def deduct = "minus" ~> "deductions" ~> "for" ~> "{" ~> deductItems  <~ "}"

  // stringLiteral provided by JavaTokenParsers
  def employeeName = stringLiteral

  // decimalNumber provided by JavaTokenParsers
  def duration = decimalNumber ~ weeksDays

  def weeksDays = "weeks" | "week" | "days" | "day"

  def deductItems = repsep(deductItem, "," )

  def deductItem = deductKind ~> deductAmount

  def deductKind = tax | insurance | retirement

  def tax = fedState <~ "income" <~ "tax"

  def fedState = "federal" | "state"

  def insurance = "insurance" ~> "premiums"

  def retirement = "retirement" ~> "fund" ~> "contributions"

  def deductAmount = percentage | amount

  def percentage = toBe ~> doubleNumber <~ "percent" <~ "of" <~ "gross"

  def amount = toBe ~> doubleNumber <~ "in" <~ "gross" <~ "currency"

  def toBe = "is" | "are"

  // floatingPointNumber provided by JavaTokenParsers
  def doubleNumber = floatingPointNumber
}

The body of PayrollParserCombinatorsV1 looks very similar to the grammar we defined for the DSL. Each production rule becomes a method. The terminating semicolon is dropped, but since the production is a method, it would be valid Scala to leave it in.

Where we had whitespace between each production on the righthand side, we now use a combinator operator, either , ∼>, or <∼. The combinator for sequential composition is , used when we want to retain for further processing the results produced by both productions on the left and right sides of the . For example, when processing the paycheck production, we want to keep all three results from empl, gross, and deduct. Hence we use two operators:

def paycheck = empl ~ gross ~ deduct

We use another sequential composition combinator ∼> when we no longer need the result of the production to the left. For example, when processing the empl production, we only want to keep the parse result for the last production, employeeName:

def empl = "paycheck" ~> "for" ~> "employee" ~> employeeName

Similarly, we use <∼ when we no longer need the result for the production to the right. For example, when processing the tax production, we only want to keep the result of the first production, fedState:

def tax = fedState <~ "income" <~ "tax"

Our heavy use of the <∼ sequential combinator in the various productions related to deductions indicates that we aren’t keeping track of the source of each deduction, just the amount of the deduction. A real paycheck application would print this information, of course. Our aim is for simplicity. As an exercise, consider how PayrollParserCombinatorsV1 and the subsequent refinements below would change if we tracked this information. Would you necessarily keep the parsed strings or track the information some other way?

The “or” case is expressed with the | method, just as in the grammar:

def weeksDays = "weeks" | "week" | "days" | "day"

The rep method can be used for zero or more repetitions. We actually use a similar method, repsep, which lets us specify a separator, in our case a comma (,):

def deduct = ... ~> "{" ~> repsep(deductItem, "," ) <~ "}"

Note that deduct combines several features we have just described.

Like repetition, there is an opt method for optional terms, which we aren’t using.

PayrollParserCombinatorsV1 inherits from JavaTokenParsers, which inherits from RegexParsers, which inherits from the root parser trait Parsers. It’s well known that parsing non-trivial grammars with just regular expressions tends to break down pretty quickly. However, using regular expressions to parse individual terms inside a parsing framework can be very effective. In our example, we exploit the productions in JavaTokenParsers to parse quoted strings (for the employee’s name), decimal literals, and floating-point literals.

Let’s try it out! Here is a specification that exercises the parser for two cases, without and with deductions:

// code-examples/DSLs/payroll/pcdsl/payroll-parser-comb-v1-spec.scala

package payroll.pcdsl
import scala.util.parsing.combinator._
import org.specs._
import payroll._
import payroll.Type2Money._

object PayrollParserCombinatorsV1Spec
  extends Specification("PayrollParserCombinatorsV1") {

  "PayrollParserCombinatorsV1" should {
    "parse rules when there are no deductions" in {
      val input = """paycheck for employee "Buck Trends"
                     is salary for 2 weeks minus deductions for {}"""
      val p = new PayrollParserCombinatorsV1
      p.parseAll(p.paycheck, input) match {
        case p.Success(r,_) => r.toString mustEqual
                    """(("Buck Trends"~(2~weeks))~List())"""
        case x => fail(x.toString)
      }
    }

    "calculate the gross, net, and deductions for the pay period" in {
      val input =
          """paycheck for employee "Buck Trends"
             is salary for 2 weeks minus deductions for {
               federal income tax            is  25.  percent of gross,
               state income tax              is  5.   percent of gross,
               insurance premiums            are 500. in gross currency,
               retirement fund contributions are 10.  percent of gross
             }"""
      val p = new PayrollParserCombinatorsV1
      p.parseAll(p.paycheck, input) match {
        case p.Success(r,_) => r.toString mustEqual
            """(("Buck Trends"~(2~weeks))~List(25., 5., 500., 10.))"""
        case x => fail(x.toString)
      }
    }
  }
}

This part of the specification shows us how to instantiate and use the parser:

val p = new PayrollParserCombinatorsV1

p.parseAll(p.paycheck, input) match {
  case p.Success(r,_) => r.toString mustEqual "..."
  case x => fail(x.toString)
}

The parseAll method is defined in a parent class. We invoke the top-level production method, paycheck, and pass its return value as the first argument to parseAll and pass the string to parse as the second argument.

If the parsing process is successful, the result of the parse is returned as an instance of type p.Success[+T], a case class declared in the Parsers trait. Why is there a p. prefix? It indicates that p.Success is a path-dependent type, which we will discuss in Path-Dependent Types. For now, just know that even though Success is defined in the Parsers trait, the actual type of the instance is dependent on the PayrollParserCombinatorsV1 instance we created. In other words, if we had another parser, say p2 of type MyOtherParser, then p2.Success[String] would be different from p.Success[String] and one could not be substituted for the other.

The Success instance contains two fields. The first is the result of the parse, an instance of type T (assigned to r in the case clause). The second is the remaining input string to parse, which will be empty after a successful parse (we will have parsed the whole string at this point). This string is assigned to _.

If the parse fails, the returned instance is either a p.Failure or p.Error, which our example handles with a generic case clause. Both are derived from p.NoSuccess, which contains fields for an error message and the unconsumed input at the point of failure. A p.Failure in a parser will trigger backtracking so that a retry with a different parser can be invoked by the parser framework, if possible. An Error result does not trigger backtracking and is used to signal more serious problems.

For completeness, both p.Success and p.NoSuccess derive from p.ParseResult.

We have two big unanswered questions: what do the production methods actually return, and what is the type of the result instance returned in the p.Success?

The production methods themselves return parsers. Most of them in our example return p.Parser[String] (again, a path-dependent type). However, because the deduct method handles repetition (it invokes the repsep method), it actually returns a p.Parser[List[String]]. When this parser is used, it will return a List[String], with one string corresponding to each match in the repetition.

So, our call to p.parseAll(p.paycheck, input) earlier parses the input string using the parser returned by p.paycheck. That brings us to the second question: what is the result of a successful parse?

To see what’s returned, compile the PayrollParserCombinatorsV1 file listed at the beginning of this section and invoke the scala interpreter with the -cp option to include the directory where the class files were written (it will be build if you used the build process for the code example distribution).

Once in the interpreter, enter the following expressions after the scala> prompt. (You can also find this input the payroll-parser-comb-script.scala file in the code example distribution.)

scala> import scala.util.parsing.combinator._
import scala.util.parsing.combinator._

scala> import payroll.pcdsl._
import payroll.pcdsl._

scala> val p = new PayrollParserCombinatorsV1
p: payroll.pcdsl.PayrollParserCombinatorsV1 = 
    payroll.pcdsl.PayrollParserCombinatorsV1@79e84310

scala> p.empl
res0: p.Parser[String] = Parser (~>)

scala> p.weeksDays
res2: p.Parser[String] = Parser (|)

scala> p.doubleNumber
res3: p.Parser[String] = Parser ()

scala> p.deduct
res1: p.Parser[List[String]] = Parser (<~)

scala> p.paycheck
res4: p.Parser[p.~[p.~[String,p.~[String,String]],List[String]]] = Parser (~)

scala> p.parseAll(p.weeksDays, "weeks")
res5: p.ParseResult[String] = [1.6] parsed: weeks

scala> val input = """paycheck for employee "Buck Trends"
     | is salary for 2 weeks minus deductions for {}"""
input: java.lang.String =
paycheck for employee "Buck Trends"
       is salary for 2 weeks minus deductions for {}

scala> p.parseAll(p.paycheck, input)
res6: p.ParseResult[p.~[p.~[String,p.~[String,String]],List[String]]] = 
    [2.53] parsed: (("Buck Trends"~(2~weeks))~List())

scala>

We import the necessary types and create a PayrollParserCombinatorsV1 instance. Then we call several of the production methods to see what kind of Parser each returns. The first three—empl, weeksDays, and doubleNumber—return p.Parser[String].

Note what’s written on the righthand side in the output for the first three parsers: empl, weeksDays, and doubleNumber. We see Parser (∼>), Parser (|), and Parser (), respectively. The parsers returned reflect the definitions of the production rules, where empl ends with a combinator of the form prod1 ∼> prod2, weeksDays returns a combinator of the form prod1 | prod2, and doubleNumber returns a parser for a single production.

Because deduct consists of combinators that handle repetition, the parser returned by deduct is of type p.Parser[List[String]], as we stated previously. The righthand side of the output is Parser (<∼), because the definition of deduct ends with prod1 <∼ prod2.

Things get more interesting when we look at the top-level production, paycheck. What is p.Parser[p.∼[p.∼[String,p.∼[String,String]],List[String]]] = Parser (∼) supposed to mean? Well, the righthand side should be easy to understand now; the definition of paycheck ends in prod1 ∼ prod2. What is the type parameter for p.Parser on the lefthand side of the equals sign?

The Parsers trait also defines a case class named that represents a pair of sequential rules:

case class ~[+a, +b](_1: a, _2: b) {
  override def toString = "("+ _1 +"~"+ _2 +")"
}

The actual path-dependent type in our example is p.∼[+a,+b]. Hence, the type parameter T in p.Parser[T] is p.∼[p.∼[String,p.∼[String,String]],List[String]], which is a hierarchical tree of types.

Let’s break it down, working our way inside out. Note that there are three p.∼. We’ll start with the innermost type, p.∼[String,String], and map the type declaration to the output we saw in the scala session "Buck Trends"∼(2∼weeks∼List()).

The p.∼[String,String] corresponds to the parser that handles expressions like 2 weeks. Hence, the instance created when we parsed our example string was the instance p.∼("2", "weeks"). Calling the p.∼.toString method produces the output (2~weeks).

Working out one level, we have p.∼[String,p.∼[String,String]]. This combination parses paycheck for employee "Buck Trends" is salary for 2 weeks. Recall that we discard paycheck for employee and is salary for, keeping only the Buck Trends and 2 weeks. So we create an instance p.∼("Buck Trends", p.∼("2", "weeks")). Calling toString again results in the string ("Buck Trends"∼(2∼weeks)).

Finally, at the outermost level, you can see we have the following: p.∼[p.∼[String,p.∼[String,String]],List[String]]. We’ve already discussed everything up to the last List[String], which comes from the deduct production:

def deduct = "minus" ~> "deductions" ~> "for" ~>
             "{" ~> repsep(deductItem, "," ) <~ "}"

We discard everything except for the list of zero or more deductItems. There are none in our example, so we get an empty list for which toString returns List(). Therefore, calling p.∼.toString on our outermost type, the one that parameterizes p.Parser, returns the string "Buck Trends"∼(2∼weeks∼List()). We’re done!

Well, not quite. We’re still not calculating an actual paycheck for ol’ Buck. Let’s complete our implementation.

Generating Paychecks with the External DSL

As we parse the DSL, we want to look up the employee by name, fetch his or her gross salary for the specified pay period, and then calculate the deductions as we go. When the parser returned by paycheck finishes, we want to return a Pair with the Employee instance and the completed Paycheck.

We will reuse “domain” classes like Employee, Money, Paycheck, etc. from earlier in the chapter. To do the calculations on demand, we will create a second iteration of PayrollParserCombinatorsV1 that we’ll call PayrollParserCombinators. We’ll modify the parsers returned by some of the production methods to return new kinds of parsers. We’ll also do administrative work like storing running context data, as needed. Our implementation won’t be thread-safe. You’ll want to ensure that only one thread uses a given PayrollParserCombinators. We could make it more robust, but doing so isn’t the goal of this exercise.

Here is our final PayrollParserCombinators:

// code-examples/DSLs/payroll/pcdsl/payroll-parser-comb.scala

package payroll.pcdsl
import scala.util.parsing.combinator._
import org.specs._
import payroll._
import payroll.Type2Money._

class UnknownEmployee(name: Name) extends RuntimeException(name.toString)

class PayrollParserCombinators(val employees: Map[Name, Employee])
  extends JavaTokenParsers {

  var currentEmployee: Employee = null
  var grossAmount: Money = Money(0)

  /** @return Parser[(Employee, Paycheck)] */
  def paycheck = empl ~ gross ~ deduct ^^ {
    case e ~ g ~ d => (e, Paycheck(g, g-d, d))
  }

  /** @return Parser[Employee] */
  def empl = "paycheck" ~> "for" ~> "employee" ~> employeeName ^^ { name =>
    val names = name.substring(1, name.length-1).split(" ") // remove ""
    val n = Name(names(0), names(1));
    if (! employees.contains(n))
      throw new UnknownEmployee(n)
    currentEmployee = employees(n)
    currentEmployee
  }

  /** @return Parser[Money] */
  def gross = "is" ~> "salary" ~> "for" ~> duration ^^ { dur =>
    grossAmount = salaryForDays(dur)
    grossAmount
  }

  def deduct = "minus" ~> "deductions" ~> "for" ~> "{" ~> deductItems  <~ "}"

  /**
   * "stringLiteral" provided by JavaTokenParsers
   * @return Parser[String]
   */
  def employeeName = stringLiteral

  /**
   * "decimalNumber" provided by JavaTokenParsers
   * @return Parser[Int]
   */
  def duration = decimalNumber ~ weeksDays ^^ {
    case n ~ factor => n.toInt * factor
  }

  def weeksDays = weeks | days

  def weeks = "weeks?".r ^^ { _ => 5 }

  def days = "days?".r ^^ { _ => 1 }

  /** @return Parser[Money] */
  def deductItems = repsep(deductItem, ",") ^^ { items =>
    items.foldLeft(Money(0)) {_ + _}
  }

  /** @return Parser[Money] */
  def deductItem = deductKind ~> deductAmount

  def deductKind = tax | insurance | retirement

  def tax = fedState <~ "income" <~ "tax"

  def fedState = "federal" | "state"

  def insurance = "insurance" ~> "premiums"

  def retirement = "retirement" ~> "fund" ~> "contributions"

  def deductAmount = percentage | amount

  /** @return Parser[Money] */
  def percentage = toBe ~> doubleNumber <~ "percent" <~ "of" <~ "gross"  ^^ {
    percentage => grossAmount * (percentage / 100.)
  }

  def amount = toBe ~> doubleNumber <~ "in" <~ "gross" <~ "currency" ^^ {
    Money(_)
  }

  def toBe = "is" | "are"

  def doubleNumber = floatingPointNumber ^^ { _.toDouble }

  // Support method. Assume 260 (52 * 5) paid work days/year
  def salaryForDays(days: Int) =
      (currentEmployee.annualGrossSalary / 260.0) * days
}

For simplicity, we’ll use a map of “known” employees, keyed by Name instances, that we save as a field in PayrollParserCombinators. A real implementation would probably use a data store of some kind.

There are two other fields: currentEmployee, which remembers which employee we are processing, and grossAmount, which remembers the gross amount of pay for the employee for the pay period. Both fields have a slight design smell. They are mutable. They are set only once per parse, but not when they are declared, only when we parse the input that allows us to calculate them. You might have also noticed that if the same PayrollParserCombinators instance is used more than once, we don’t reset these fields to their default values. No doubt it would be possible to write scripts in the DSL that exploit this bug.

These weaknesses are not inherent to parser combinators. They reflect simplifications we used for our purposes. As an exercise, you might try improving the implementation to eliminate these weaknesses.

We have added Javadoc-style @return annotations for most of the productions to make it clear what they are now returning. In some cases, the productions are unchanged, as the original parser instances are fine as is. Most of the changes reflect our desire to calculate the paycheck as we go.

Consider the new paycheck production:

/** @return Parser[(Employee, Paycheck)] */
def paycheck = empl ~ gross ~ deduct ^^ {
  case e ~ g ~ d => (e, Paycheck(g, g-d, d))
}

Now, we return a Pair with the Employee and the computed Paycheck. The empl ∼ gross ∼ deduct combination would still return Parser[String] (we’ll drop the path-dependent prefix for now). We have added a new combinator ^^, e.g., prod1 ^^ func1, where func1 is a function. If prod1 succeeds, then the result of applying func1 to the result of prod1 is returned. That is, we return func1(prod1).

For paycheck, we give it a function literal that does a pattern match to extract the three results from empl, gross, and deduct, respectively. We create a 2-tuple (Pair) with e, the Employee, and a Paycheck calculated from the gross salary for the pay period (in g) and the sum of all the deductions (in d).

It’s important to keep clear that the anonymous function passed as an argument to ^^ returns a tuple (Employee, Paycheck), but the production paycheck method itself returns a Parser[(Employee, Paycheck)]. This pattern has been true from the beginning, actually, where Strings were always involved in our first version. It will remain true for all the production rules in PayrollParserCombinators.

The empl production assumes the employee’s first name and last name are given. (Obviously, this would be inadequate in a real application.)

/** @return Parser[Employee] */
def empl = "paycheck" ~> "for" ~> "employee" ~> employeeName ^^ { name =>
   val names = name.substring(1, name.length-1).split(" ") // remove ""
   val n = Name(names(0), names(1));
   if (! employees.contains(n))
     throw new UnknownEmployee(n)
   currentEmployee = employees(n)
   currentEmployee
}

To construct the name, the embedded double quotes have to be removed, which is why we start by extracting the substring that tosses the first and last characters. The name is used to look up the Employee instance in the map, saving the value in the currentEmployee field. In general, there is not a lot of “graceful” error handling in PayrollParserCombinators. However, the empl method handles the case where no employee is found with the specified name, throwing an UnknownEmployee exception when this occurs.

The rest of the productions work similarly. Sometimes, a parser converts an input string to an Int (e.g., duration) or a Money (e.g., gross). An interesting case is deduct. It folds the list of deductions into a single deduction amount, using addition. The foldLeft method takes two argument lists. The first has a single argument that specifies the initial value, in this case, zero Money. The second argument list has a single function literal argument that takes two arguments: the accumulated value of the folding operation, and an item from the list. In this case, we return the sum of the arguments. So, foldLeft iterates over the items collection, adding them together. See Traversing, Mapping, Filtering, Folding, and Reducing for more information on foldLeft and related operations.

The weeks and days productions remind us that we are using parser combinators based on regular-expressions. (We’re also using stringLiteral, decimalNumber, and floatingPointNumber provided by JavaTokenParsers.) Note that weeks and days ignore the parsed string. They just return a multiplication factor used to determine total days in the pay period in the duration production rule.

There are other combinator methods for applying functions to parser results in different ways. See the Parsers Scaladoc page for details.

The following (somewhat incomplete) specification shows the calculation of paychecks when there are no deductions and when there are several deductions:

// code-examples/DSLs/payroll/pcdsl/payroll-parser-comb-spec.scala

package payroll.pcdsl
import scala.util.parsing.combinator._
import org.specs._
import payroll._
import payroll.Type2Money._

// Doesn't test "sad path" scenarios...
object PayrollParserCombinatorsSpec
    extends Specification("PayrollParserCombinators") {

  val salary = Money(100000.1)  // for a full year
  val gross = salary / 26.      // for two weeks
  val buck = Employee(Name("Buck", "Trends"), salary)
  val employees = Map(buck.name -> buck)

  implicit def money2double(m: Money) = m.amount.doubleValue

  "PayrollParserCombinators" should {
    "calculate the gross == net when there are no deductions" in {
      val input = """paycheck for employee "Buck Trends"
                     is salary for 2 weeks minus deductions for {}"""
      val p = new PayrollParserCombinators(employees)
      p.parseAll(p.paycheck, input) match {
        case p.Success(Pair(employee, paycheck),_) =>
          employee mustEqual buck
          paycheck.gross must beCloseTo(gross, Money(.001))
          paycheck.net must beCloseTo(gross, Money(.001))
          // zero deductions?
          paycheck.deductions must beCloseTo(Money(0.), Money(.001))
        case x => fail(x.toString)
      }
    }

    "calculate the gross, net, and deductions for the pay period" in {
      val input =
        """paycheck for employee "Buck Trends"
           is salary for 2 weeks minus deductions for {
             federal income tax            is  25.  percent of gross,
             state income tax              is  5.   percent of gross,
             insurance premiums            are 500. in gross currency,
             retirement fund contributions are 10.  percent of gross
           }"""

      val p = new PayrollParserCombinators(employees)
      p.parseAll(p.paycheck, input) match {
        case p.Success(Pair(employee, paycheck),_) =>
          employee mustEqual buck
          val deductions = (gross * .4) + Money(500)
          val net = gross - deductions
          paycheck.gross must beCloseTo(gross, Money(.001))
          paycheck.net must beCloseTo(net, Money(.001))
          paycheck.deductions must beCloseTo(deductions, Money(.001))
        case x => fail(x.toString)
      }
    }
  }
}

If you work out what the results should be from the input strings, you’ll see that the implementation correctly calculates the paycheck.

Besides the many small details that differ between this implementation of the external DSL and the previous implementation of the internal DSL, there is one big conceptual difference from the two implementations. Here we are computing the paycheck as we parse code written in the external DSL. In the internal DSL case, we generated a paycheck calculator when we parsed the DSL. Afterward, we used that calculator to compute paychecks for one employee at a time. We could have generated a paycheck calculator like we did before, but we chose a simpler approach to focus on the construction of the parser itself. Also, as we discussed earlier, we weren’t as careful about thread safety and other issues in the implementation.

Internal Versus External DSLs: Final Thoughts

Scala provides rich support for creating your own internal and external DSLs. However, a non-trivial DSL can be a challenge to implement and debug. For the examples in this chapter, the parser combinators implementation was easier to design and write than the implementation for the internal DSL. However, we found that debugging the internal DSL was easier.

You must also consider how robust the parser must be when handling invalid input. Depending on the level of sophistication of the users of the DSL, you may need to provide very good feedback when errors occur, especially when your users are non-programmers. The parser combinator library in Scala version 2.8 will provide improved support for error recovery and reporting, compared to the version 2.7.X library.

The version 2.8 library will also provide support for writing packrat parsers that can implement unambiguous parsing expression grammars (PEGs). The 2.8 implementation of packrat parsers also supports memoization, which helps improve performance, among other benefits. If you need a fast parser, a packrat parser will take you further before you need to consider more specialized tools, like parser generators.

Recap and What’s Next

It’s tempting to create DSLs with abandon. DSLs in Scala can be quite fun to work with, but don’t underestimate the effort required to create robust DSLs that meet your clients usability needs, nor long-term maintenance and support issues.

If you choose to write a DSL, you have rich options in Scala. The syntax is flexible yet powerful enough that an internal DSL may be sufficient. A internal DSL is an excellent starting point, especially if other programmers will be the primary writers of code in the DSL.

If you expect your non-programming stakeholders to read or even write code written in the DSL, it might be worth the extra effort to create an external DSL that eliminates as many of the programming-language idioms as possible. Consider whether the code written in the DSL will need to be processed for other purposes, like generating documentation, spreadsheets, etc. Since you will have to write a parser for the DSL anyway, it might be straightforward to write others to handle these different purposes.

In the next chapter, we’ll explore the richness of Scala’s type system. We’ve learned many of its features already. Now, we’ll explore the type system in full detail.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.221.112.220