Chapter 16. Opening Up Your System with Domain-Specific Languages

In Chapter 15, we looked at the process of creating interpreters to solve certain kinds of problems. The Interpreter pattern is all about using an abstract syntax tree (AST) to obtain the answer or perform the action that you are looking for. As we saw in that chapter, the Interpreter pattern is not really concerned with where the AST comes from; it just assumes that you have one and focuses on how the AST should operate. In this chapter, we will explore the Domain-Specific Language (DSL) pattern, which looks at the world from the other end of the telescope. The DSL pattern suggests that you should focus on the language itself, not on the interpreter. It says that sometimes you can make problems easier to solve by giving the user a convenient syntax for expressing the problem in the first place.

You won’t find the DSL pattern in your copy of Design Patterns. Nevertheless, as we will see in this chapter, Ruby’s flexible syntax makes building a particular style of DSL very easy.

The Domain of Specific Languages

Like most of the patterns that we have looked at in this book, the basic idea behind the DSL pattern is not very complicated. You can understand DSLs by stepping back and asking exactly what we are trying to do when we write programs. The answer is (I hope) that we are trying to make our users happy. A user is interested in getting the computer to do something—balance the accounts or steer a space probe to Mars. In short, the user wants the computer to satisfy some requirement. Given that, we might naively ask, why does the user need us? Why can’t we just hand the user the Ruby interpreter with a hearty "Good luck!" This is a silly idea because, in general, users do not understand programming and computers; they don’t usually know any more about bits and bytes than we know about accounting or celestial mechanics. The user understands his or her one area, his or her domain, but not the domain of programming.

What if we could create a programming language that, instead of expressing computer-related ideas, allowed the user to say things about his or her domain of interest directly? What if we created a language that allows accountants to say accounting things and rocket scientists to say space probe things? Then that silly idea of just handing the user the language and saying, "Have at it," doesn’t seem so bad after all.

Now we could certainly build such a language using all the techniques that we saw in Chapter 15. That is, we could sharpen up our pencil and write a parser for an accounting language or use Racc to create a celestial navigation language. Martin Fowler calls these more or less traditional approaches external DSLs. External DSLs are external in the sense that there is a parser and an interpreter for the DSL, and there are the programs written in the DSL, and the two are completely distinct. For example, if we created a specialized accounting DSL and wrote a parser and interpreter for it in Ruby, we would end up with two entirely separate things: the accounting DSL and an interpreter program for it.

Given the existence of external DSLs, we might wonder whether there are also internal DSLs and how they might differ from the external kind. An internal DSL, again according to Fowler, is one in which we start with some implementation language, perhaps Ruby, and we simply bend that one language into being our DSL. If we are using Ruby to implement our DSL—and if you have looked at the title of this book, you know we are—then anyone who writes a program in our little language is actually, and perhaps unknowingly, writing a Ruby program.

A File Backup DSL

It turns out that it is actually fairly easy to build internal DSLs in Ruby. Imagine that we want to build a backup program, something that will wake up every so often and copy our valuable files off to some (presumably safe) directory. We decide to do this by creating a DSL, a language called PackRat, that will allow users to talk purely in terms of which files they want to back up and when. Something like this would do fine:


   backup '/home/russ/documents'

   backup '/home/russ/music', file_name('*.mp3') | file_name('*.wav')

   backup '/home/russ/images', except(file_name('*.tmp'))

   to '/external_drive/backups'

   interval 60


What this little PackRat program says is that we have three directories full of stuff that we want copied to the /external_drive/backups directory once an hour (i.e., every 60 minutes). While we want everything from the documents directory backed up, as well as everything except the temporary files from the images directory, we want only the audio files from the music directory. Because we are never in the mood to reinvent things that already exist, PackRat makes use of the handy file-finding expressions that we built in Chapter 15.
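
The Chapter 15 finder code is not reproduced in this chapter, so here is a minimal sketch of what the PackRat examples rely on. Every class body below is an assumption made for illustration; only the names All, Not, Or, And, file_name, and except, together with the evaluate method, are taken from the way this chapter uses them.

```ruby
require 'find'

# Hypothetical stand-ins for the Chapter 15 file-finding expressions.
class Expression
  def |(other)
    Or.new(self, other)
  end

  def &(other)
    And.new(self, other)
  end
end

# Matches every regular file under the directory.
class All < Expression
  def evaluate(dir)
    Find.find(dir).select { |path| File.file?(path) }
  end
end

# Matches files whose base name fits a shell-style pattern.
class FileName < Expression
  def initialize(pattern)
    @pattern = pattern
  end

  def evaluate(dir)
    Find.find(dir).select do |path|
      File.file?(path) && File.fnmatch(@pattern, File.basename(path))
    end
  end
end

# Matches everything that the wrapped expression does not.
class Not < Expression
  def initialize(expression)
    @expression = expression
  end

  def evaluate(dir)
    All.new.evaluate(dir) - @expression.evaluate(dir)
  end
end

class Or < Expression
  def initialize(left, right)
    @left, @right = left, right
  end

  def evaluate(dir)
    (@left.evaluate(dir) + @right.evaluate(dir)).uniq.sort
  end
end

class And < Expression
  def initialize(left, right)
    @left, @right = left, right
  end

  def evaluate(dir)
    @left.evaluate(dir) & @right.evaluate(dir)
  end
end

# Convenience methods that let the DSL read naturally.
def file_name(pattern)
  FileName.new(pattern)
end

def except(expression)
  Not.new(expression)
end
```

With these in place, file_name('*.mp3') | file_name('*.wav') builds an Or expression that finds files matching either pattern, while the same two patterns joined with & could never match anything, because no file name ends in both .mp3 and .wav.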

It's a Data File—No, It's a Program!

Now we might decide to pull out our regular expressions or parser generator and write a traditional parser for the PackRat program above: We first read a word that should be "backup," then we look for a quote, and then . . . But there must be an easier way. Looking over the backup instructions, we realize that they could almost be Ruby method calls. Wait! They could be Ruby method calls. If backup, to, and interval are all names of Ruby methods, then what we have is a perfectly valid Ruby program: a series of calls to backup, to, and interval, each with one or two parameters. There are no parentheses around the arguments to those method calls, but of course that is perfectly valid in Ruby.
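
To convince ourselves that the parentheses really are optional, here is a small sketch. The backup method below is a stand-in that only records its arguments, and the CALLS constant and the :everything and :audio_only symbols are made up for the illustration.

```ruby
# Hypothetical stand-in for a DSL method: it just records its arguments.
CALLS = []

def backup(dir, find_expression = :everything)
  CALLS << [dir, find_expression]
end

# The DSL style, without parentheses ...
backup '/home/russ/documents'
backup '/home/russ/music', :audio_only

# ... makes exactly the same calls as the conventional style.
backup('/home/russ/documents')
backup('/home/russ/music', :audio_only)

puts CALLS[0] == CALLS[2]
puts CALLS[1] == CALLS[3]
```

Both comparisons print true: Ruby sees the same method call either way.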

Just to get things started, let’s see if we can’t write a little Ruby program that does nothing except read in our PackRat program, which we will keep in a file called backup.pr. Here is the start of our DSL interpreter, a little program called packrat.rb:


   require 'finder'

   def backup(dir, find_expression=All.new)
     puts "Backup called, source dir=#{dir} find expr=#{find_expression}"
   end

   def to(backup_directory)
     puts "To called, backup dir=#{backup_directory}"
   end

   def interval(minutes)
     puts "Interval called, interval = #{minutes} minutes"
   end

   eval(File.read('backup.pr'))


It doesn’t look like much, but this code captures a lot of the ideas that you need to implement an internal DSL in Ruby. First, we have the three methods backup, to, and interval. The key bit of code for our DSL is the last statement:

   eval(File.read('backup.pr'))

This statement says to read in the contents of the file backup.pr and run those contents as Ruby program text.[1] This means that the interval and to methods and all those backup statements in backup.pr—in other words, the things that looked like Ruby method calls—will actually be sucked into our program and interpreted as Ruby method calls. When we run packrat.rb, we get the output of all those method calls:


   Backup called, source dir=/home/russ/documents find expr=#<All:0xb7d84c14>
   Backup called, source dir=/home/russ/music find expr=#<Or:0xb7d84b74>
   Backup called, source dir=/home/russ/images find expr=#<Not:0xb7d84afc>
   To called, backup dir=/external_drive/backups
   Interval called, interval = 60 minutes


It is this idea of sucking in the DSL and interpreting it as Ruby that puts the "internal" into an internal DSL. With that eval statement, the interpreter and the PackRat program merge. It is very B-movie science fiction.
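
The same merging can be seen without any file at all. The following sketch assumes the DSL text lives in a string (the DSL_TEXT and LOG constants are invented for the example) and uses cut-down methods that only log their arguments.

```ruby
# Record each call so that we can see what the DSL text did.
LOG = []

def backup(dir)
  LOG << "backup #{dir}"
end

def to(backup_directory)
  LOG << "to #{backup_directory}"
end

def interval(minutes)
  LOG << "interval #{minutes}"
end

# In the real program this text would come from File.read('backup.pr').
DSL_TEXT = <<~PACKRAT
  backup '/home/russ/documents'
  to '/external_drive/backups'
  interval 60
PACKRAT

# Run the text as Ruby: each line becomes a call to one of the methods above.
eval(DSL_TEXT)

puts LOG
```

By the time eval returns, LOG holds one entry per line of DSL text, in the order the lines appeared.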

Building PackRat

Now that we have our user unknowingly writing Ruby method calls, what should we really do inside those methods? That is, what should interval, to, and backup actually do? The answer is that they should remember that they were called; in other words, they should set up some data structures. To get started, let’s create a class that represents the whole backup request. Let’s call it Backup:


   require 'singleton'

   class Backup
     include Singleton

     attr_accessor :backup_directory, :interval
     attr_reader :data_sources

     def initialize
       @data_sources = []
       @backup_directory = '/backup'
       @interval = 60
     end

     def backup_files
       this_backup_dir = Time.new.ctime.tr(' :','_')
       this_backup_path = File.join(backup_directory, this_backup_dir)
       @data_sources.each {|source| source.backup(this_backup_path)}
     end

     def run
       while true
         backup_files
         sleep(@interval*60)
       end
     end
   end


The Backup class is really just a container for the information stored in the backup.pr file. It has attributes for the interval and the backup directory plus an array in which to store all of the directories to be backed up. The only slightly complex aspect of Backup is the run method, which actually performs the periodic backups by copying all of the source data to the backup directory (actually a time-stamped subdirectory under the backup directory) and then sleeping until it is time for the next backup. We have made the Backup class a singleton given that our little utility will only ever have one.
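
The time-stamped subdirectory name is easier to picture with a fixed time. This sketch lifts the two relevant lines out of backup_files into a free-standing method; the method name backup_path_for is our own invention.

```ruby
# Build the path of a time-stamped backup subdirectory, as backup_files does.
def backup_path_for(backup_directory, time)
  # ctime gives something like "Thu Jan  1 00:00:00 1970";
  # tr then swaps every space and colon for an underscore.
  this_backup_dir = time.ctime.tr(' :', '_')
  File.join(backup_directory, this_backup_dir)
end

puts backup_path_for('/backup', Time.at(0).utc)
# prints /backup/Thu_Jan__1_00_00_00_1970
```

Because the name changes with every run, each backup cycle lands in its own directory rather than overwriting the previous copy.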

Next we need a class to represent the directories that are to be backed up:


   require 'fileutils'

   class DataSource
     attr_reader :directory, :finder_expression

     def initialize(directory, finder_expression)
       @directory = directory
       @finder_expression = finder_expression
     end

     def backup(backup_directory)
       files=@finder_expression.evaluate(@directory)
       files.each do |file|
         backup_file( file, backup_directory)
       end
     end

     def backup_file( path, backup_directory)
       copy_path = File.join(backup_directory, path)
       FileUtils.mkdir_p( File.dirname(copy_path) )
       FileUtils.cp( path, copy_path)
     end
   end


A DataSource is just a container for a path to a directory and a file-finder expression AST. DataSource also has much of the code to do the actual file copying.
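
One detail of backup_file is worth a closer look: the whole source path is joined onto the backup directory, so the copy keeps its original directory structure. Here is a sketch that exercises the same three lines inside a temporary directory. It assumes Unix-style paths, and the file names are made up.

```ruby
require 'fileutils'
require 'tmpdir'

# Same logic as DataSource#backup_file, pulled out as a free-standing method
# that also returns the path of the copy.
def backup_file(path, backup_directory)
  copy_path = File.join(backup_directory, path)
  FileUtils.mkdir_p(File.dirname(copy_path))
  FileUtils.cp(path, copy_path)
  copy_path
end

Dir.mktmpdir do |dir|
  source = File.join(dir, 'music', 'song.mp3')
  FileUtils.mkdir_p(File.dirname(source))
  File.write(source, 'la la la')

  copy = backup_file(source, File.join(dir, 'backups'))

  # The full source path reappears underneath the backup directory.
  puts copy.end_with?(source)
  puts File.read(copy)
end
```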

Pulling Our DSL Together

Now that we have built all of our supporting code, making the PackRat DSL actually work is easy. Let’s rewrite our original backup, to, and interval methods to use the classes we just wrote:


   def backup(dir, find_expression=All.new)
     Backup.instance.data_sources << DataSource.new(dir, find_expression)
   end

   def to(backup_directory)
     Backup.instance.backup_directory = backup_directory
   end

   def interval(minutes)
     Backup.instance.interval = minutes
   end

   eval(File.read('backup.pr'))
   Backup.instance.run


We’ll look at this code one method at a time. The backup method just grabs the Backup singleton instance and adds a data source to it. Similarly, the interval method collects the backup interval and sets the right field on the Backup singleton. The to method does the same with the backup directory path.

Finally, we have the last two lines of our PackRat interpreter:


   eval(File.read('backup.pr'))
   Backup.instance.run


The eval statement we have seen before: It just pulls in our PackRat file and evaluates it as Ruby code. The very last line of the program finally starts the backup cycle going.

The structure of the PackRat interpreter is pretty typical of this style of internal DSL. Start by defining your data structures (in our case, the Backup class and its friends). Next, set up some top-level methods that will support the DSL itself (in PackRat, the interval, to, and backup methods). Then, suck in the DSL text with an eval(File.read(...)) statement. Typically, the effect of pulling in the DSL text is to fill in your data structures; in our case, we ended up with a fully configured Backup instance. Finally, do whatever it is that the user asked you to do. How do you know what to do? Why, by looking in those freshly populated data structures.
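
Those four steps fit in a single short sketch. To keep it self-contained and quick to run, this version reads the DSL from a string and merely describes the work instead of copying files; the BackupRequest class and its plan method are invented for the illustration and are not part of PackRat.

```ruby
require 'singleton'

# Step 1: the data structure that the DSL will fill in.
class BackupRequest
  include Singleton

  attr_accessor :backup_directory, :interval
  attr_reader :sources

  def initialize
    @sources = []
    @backup_directory = '/backup'
    @interval = 60
  end

  # Describe the work rather than doing it.
  def plan
    @sources.map do |dir|
      "copy #{dir} to #{@backup_directory} every #{@interval} minutes"
    end
  end
end

# Step 2: the top-level methods that make up the language.
def backup(dir)
  BackupRequest.instance.sources << dir
end

def to(backup_directory)
  BackupRequest.instance.backup_directory = backup_directory
end

def interval(minutes)
  BackupRequest.instance.interval = minutes
end

# Step 3: pull in the DSL text (a string here, a file in the real program).
eval(<<~PACKRAT)
  backup '/home/russ/documents'
  backup '/home/russ/music'
  to '/external_drive/backups'
  interval 30
PACKRAT

# Step 4: act on the freshly populated data structure.
puts BackupRequest.instance.plan
```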

Taking Stock of PackRat

The internal DSL technique certainly has some advantages: We managed to create the whole backup DSL in fewer than 70 lines of code, and most of those are devoted to the Backup/DataSource infrastructure that we probably would have needed no matter how we implemented the program. In addition, with a Ruby-based internal DSL, you get the entire language infrastructure for free. If you had a directory name with a single quote in it,[2] you could escape that quote in the usual Ruby way:

   backup '/home/russ/bob\'s_documents'

In fact, since this is Ruby, you could also do it like this:

   backup "/home/russ/bob's_documents"

If we were writing our own parser in the traditional way, we would probably need to write some code to deal with that embedded quote. Not so here, because we just inherit it from Ruby. Likewise, we get comments for free:


   #
   # Back up Bob's directory
   #
   backup "/home/russ/bob's_documents"


Our users can also take advantage of the full programming capabilities of Ruby if they want:


   #
   # A file-finding expression for music files
   #
   music_files = file_name('*.mp3') | file_name('*.wav')

   #
   # Back up my two music directories
   #
   backup '/home/russ/oldies', music_files
   backup '/home/russ/newies', music_files

   to '/tmp/backup'

   interval 60


The preceding code creates a file-finding expression ahead of time and uses it in two backup statements.
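
A quick experiment shows all of these freebies at work in one piece of DSL text. The backup and file_name methods below are simplified stand-ins, and the BACKED_UP constant exists only so that we can inspect the result.

```ruby
BACKED_UP = []

# Simplified stand-in: remember the arguments instead of building a DataSource.
def backup(dir, find_expression = nil)
  BACKED_UP << [dir, find_expression]
end

# Hypothetical stand-in for the Chapter 15 file_name method.
def file_name(pattern)
  "files matching #{pattern}"
end

# The single-quoted heredoc keeps the backslash exactly as the user typed it.
eval(<<~'PACKRAT')
  # Comments are ignored, as in any Ruby program
  backup '/home/russ/bob\'s_documents'
  backup "/home/russ/bob's_documents"

  # A local variable shared by two statements
  music_files = file_name('*.mp3')
  backup '/home/russ/oldies', music_files
  backup '/home/russ/newies', music_files
PACKRAT

puts BACKED_UP[0][0] == BACKED_UP[1][0]
puts BACKED_UP[2][1].equal?(BACKED_UP[3][1])
```

Both lines print true: the two quoting styles name the same directory, and the two music backups share one expression object.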

Improving PackRat

Although our PackRat implementation is functional, it is a bit limited, in that we can specify only one backup configuration at a time. If we want to use two or three backup directories, or if we want to back up some files on a different schedule than other files, we are out of luck with our current implementation. Another problem is that PackRat is a bit messy: It relies on the interval, to, and backup top-level methods.

A way around this is to redo the syntax for our backup.pr file so that the user is actually creating and configuring multiple instances of Backup:


   Backup.new do |b|
     b.backup '/home/russ/oldies', file_name('*.mp3') | file_name('*.wav')
     b.to '/tmp/backup'
     b.interval 60
   end

   Backup.new do |b|
     b.backup '/home/russ/newies', file_name('*.mp3') | file_name('*.wav')
     b.to '/tmp/backup'
     b.interval 60
   end


Let’s see how we can get this to work, starting with the Backup class itself:


   class Backup
     attr_accessor :backup_directory, :interval
     attr_reader :data_sources

     def initialize
       @data_sources = []
       @backup_directory = '/backup'
       @interval = 60
       yield(self) if block_given?
       PackRat.instance.register_backup(self)
     end

     def backup(dir, find_expression=All.new)
       @data_sources << DataSource.new(dir, find_expression)
     end

     def to(backup_directory)
       @backup_directory = backup_directory
     end

     def interval(minutes)
       @interval = minutes
     end

     def run
       while true
         this_backup_dir = Time.new.ctime.tr(" :","_")
         this_backup_path = File.join(backup_directory, this_backup_dir)
         @data_sources.each {|source| source.backup(this_backup_path)}
         sleep @interval*60
       end
     end
   end


Because the user will be creating any number of instances, the Backup class is no longer a singleton. We have also moved the backup, to, and interval methods inside the Backup class. The remaining two changes both appear in the initialize method. The Backup class's initialize method calls yield with itself as the only parameter. This allows the user to configure the Backup instance in a code block passed into new:


   Backup.new do |b|
     # Configure the new Backup instance
   end
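
Stripped of everything else, the idiom looks like this. MiniBackup is a cut-down class invented for the sketch, and its minutes reader is our own name, chosen so that the interval configuration method does not collide with a getter.

```ruby
class MiniBackup
  attr_reader :backup_directory, :minutes

  def initialize
    # Defaults, used when the block does not override them.
    @backup_directory = '/backup'
    @minutes = 60
    # Hand the new object to the caller's block, if one was given.
    yield(self) if block_given?
  end

  def to(backup_directory)
    @backup_directory = backup_directory
  end

  def interval(minutes)
    @minutes = minutes
  end
end

CONFIGURED = MiniBackup.new do |b|
  b.to '/tmp/backup'
  b.interval 15
end

# No block, so the defaults stand.
DEFAULTS = MiniBackup.new

puts CONFIGURED.backup_directory
puts DEFAULTS.minutes
```

The block_given? check is what keeps a plain MiniBackup.new legal; without it, calling new with no block would raise an error at the yield.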


The other change to the Backup initialize method is that the new version registers itself with the new PackRat class:


   class PackRat
     include Singleton

     def initialize
       @backups = []
     end

     def register_backup(backup)
       @backups << backup
     end

     def run
       threads = []
       @backups.each do |backup|
         threads << Thread.new {backup.run}
       end

       threads.each {|t| t.join}
     end
  
   end

   eval(File.read('backup.pr'))
   PackRat.instance.run


The PackRat class maintains a list of Backup instances and starts each one up in its own thread when its run method is called.
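
The register-then-run arrangement can be tried out without waiting an hour for anything to happen. In this sketch the Registry, OneShotJob, and RESULTS names are all invented; each job finishes immediately instead of looping forever, and results go into a thread-safe Queue.

```ruby
require 'singleton'

# Plays the role of PackRat: keeps a list of jobs and runs each in its own thread.
class Registry
  include Singleton

  def initialize
    @jobs = []
  end

  def register(job)
    @jobs << job
  end

  def run
    threads = @jobs.map { |job| Thread.new { job.run } }
    # Wait for every thread, so the program does not exit early.
    threads.each(&:join)
  end
end

# Queue is safe to share between threads.
RESULTS = Queue.new

# Plays the role of Backup: registers itself as soon as it is created.
class OneShotJob
  def initialize(name)
    @name = name
    Registry.instance.register(self)
  end

  def run
    RESULTS << "#{@name} done"
  end
end

OneShotJob.new('oldies')
OneShotJob.new('newies')
Registry.instance.run

puts RESULTS.size
```

In the real PackRat, each Backup#run loops forever, so the join calls never return and the program keeps backing up until it is stopped.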

Using and Abusing Internal DSLs

As we have seen, internal DSLs allow you to apply a unique kind of leverage to certain kinds of problems. But like all tools, they are not without their limitations. As free flowing as Ruby syntax is, you are limited to what you can parse with a Ruby-based internal DSL. For example, you probably could not write an internal DSL in Ruby that could directly parse raw HTML.

Another issue is error messages. Unless you are very careful, errors in the DSL program can produce some pretty strange messages. For example, what if your hapless user accidentally typed x when he or she meant b in the backup.pr file:


   Backup.new do |b|
     b.backup '/home/russ/newies', file_name('*.mp3') | file_name('*.wav')
     b.to '/tmp/backup'
     x.interval 60
   end


The result would be the following error message:

   ./ex6_multi_backup.rb:86: undefined local variable or method 'x' ...

To a user who is just trying to specify some files to back up and who knows nothing about Ruby, this error message is, well, less than friendly. With careful coding and judicious use of exception catching, you can frequently mitigate this problem. Nevertheless, these kinds of non sequitur error messages are a constant problem in internal DSLs.
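
As one possible way to soften the blow, here is a sketch that wraps the eval call, tells Ruby the name of the DSL file, and turns the two most likely exceptions into a plainer message. The run_dsl method and its wording are assumptions made for this example, not part of PackRat.

```ruby
# Simplified stand-in for the real DSL method.
def backup(dir)
end

# Evaluate DSL text; return nil on success or a friendlier message on failure.
def run_dsl(text, source_name = 'backup.pr')
  # Passing the file name and starting line makes Ruby report positions
  # in terms of the user's file rather than our interpreter.
  eval(text, TOPLEVEL_BINDING, source_name, 1)
  nil
rescue NameError, SyntaxError => e
  # Look for a backtrace entry that points into the DSL file.
  entry = (e.backtrace || []).find { |frame| frame.start_with?("#{source_name}:") }
  place = entry ? "line #{entry.split(':')[1]} of #{source_name}" : source_name
  "Sorry, I could not understand #{place}: #{e.message}"
end

puts run_dsl("backup '/home/russ/newies'\nx.interval 60\n")
```

The message still quotes Ruby's own description of the problem, but it now points at the right line of the user's file. SyntaxError has to be listed explicitly because a bare rescue clause does not catch it.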

Finally, if security is an issue, stay away from internal DSLs—far, far away. After all, the whole point of an internal DSL is that you take some arbitrary code that someone else wrote and suck it into your program. That requires a toothbrush-sharing level of trust.
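
To see why, consider this sketch, in which the "configuration" contains one line that has nothing to do with backups. The LEAKED constant and the text itself are invented, and the extra line is deliberately harmless.

```ruby
LEAKED = []

# Simplified stand-in for the real DSL method.
def backup(dir)
end

# DSL text from a source we do not control. Nothing restricts it to DSL statements.
UNTRUSTED_TEXT = <<~PACKRAT
  backup '/home/russ/documents'
  LEAKED << Dir.pwd   # harmless here, but it could just as well delete files
PACKRAT

eval(UNTRUSTED_TEXT)

puts LEAKED.size
```

The second line ran with exactly the same privileges as the interpreter itself, which is the whole problem.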

Internal DSLs in the Wild

The most prominent example of a pure internal DSL in the Ruby world is probably rake, Ruby’s answer to ant or make. The rake DSL syntax is similar to the second version of the PackRat syntax, which allowed for multiple backups.

The rake utility lets you specify the steps that make up your build process as a series of tasks. Tasks can depend on one another. Thus, if task B depends on task A, then rake will run task A before it runs task B. As a simple example, the following rake file backs up my music directories:


   #
   # Directories for my collection of music
   #
   OldiesDir = '/home/russ/oldies'
   NewiesDir = '/home/russ/newies'

   #
   # Backup directory
   #
   BackupDir = '/tmp/backup'

   #
   # Unique directory name for this copy
   #
   timestamp=Time.new.to_s.tr(" :", "_")

   #
   # rake tasks
   #
   task :default => [:backup_oldies, :backup_newies]

   task :backup_oldies do
     backup_dir = File.join(BackupDir, timestamp, OldiesDir)
     mkdir_p File.dirname(backup_dir)
     cp_r OldiesDir, backup_dir
   end

   task :backup_newies do
     backup_dir = File.join(BackupDir, timestamp, NewiesDir)
     mkdir_p File.dirname(backup_dir)
     cp_r NewiesDir, backup_dir
   end


This rake file defines three tasks. The backup_oldies and backup_newies tasks do precisely what their names suggest. The third task, default, depends on the other two. Thus, when rake tries to run the default task, it will first run backup_oldies and backup_newies.
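
The dependency rule is simple enough to imitate in a few lines. The following toy is emphatically not how rake is implemented; mini_task, run_task, TASKS, and RAN are names made up for this sketch, which shows only the run-the-prerequisites-first idea.

```ruby
TASKS = {}
RAN = []

# mini_task :name              defines a task with no prerequisites
# mini_task :name => [:a, :b]  defines a task that depends on :a and :b
def mini_task(spec, &action)
  name, prerequisites = spec.is_a?(Hash) ? spec.first : [spec, []]
  TASKS[name] = { prerequisites: Array(prerequisites), action: action }
end

def run_task(name, done = [])
  # Each task runs at most once per invocation.
  return if done.include?(name)
  done << name

  task = TASKS.fetch(name)
  # Prerequisites go first, in the order they were listed.
  task[:prerequisites].each { |prerequisite| run_task(prerequisite, done) }
  task[:action].call if task[:action]
end

mini_task :default => [:backup_oldies, :backup_newies]

mini_task :backup_oldies do
  RAN << :backup_oldies
end

mini_task :backup_newies do
  RAN << :backup_newies
end

run_task(:default)
p RAN
# prints [:backup_oldies, :backup_newies]
```

Note that the default task here has no action of its own; all of its work is done by the tasks it depends on.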

Aside from rake, there is, of course, Rails. While not a pure, straightforward internal DSL like rake, Rails is full of DSL-like features—bits of code where you can almost forget you are programming in Ruby. Just to take one outstanding example, ActiveRecord allows you to specify class relationships in a very DSL-like way:


   class Manager < ActiveRecord::Base
     belongs_to :department
     has_one :office
     has_many :committees
   end


Wrapping Up

The Domain-Specific Language pattern is the first pattern that we have examined that is not an original GoF pattern. But don’t hold that against it—when paired with Ruby’s very flexible syntax, the internal DSL is one of those special techniques in computer science that brings a lot of power and flexibility without requiring a lot of code. The idea behind the internal DSL is really very straightforward: You define your DSL so that it fits within the rules of Ruby syntax; you define the infrastructure required to get a program written in your DSL to do what the DSL program says it should. The punch line comes when you simply use eval to execute the DSL program as ordinary Ruby code.
