Chapter 15. Assembling Your System with the Interpreter

In the late 1980s, a much earlier edition of Russ Olsen the software engineer—perhaps the beta version of the professional me, certainly not the release candidate—worked on a Geographical Information System (GIS). One of the key goals of that GIS system was easy adaptability. Customers' maps were all different, and each customer wanted to have its maps look just the way that customer wanted them to look. Also, each customer wanted to use its maps in some unique way, and we naturally wanted our system to zig with every customer's zag.

Unfortunately, we were writing this system in the C programming language. While C has a lot of points to recommend it, easy adaptability is not one of them. Writing C is hard—you need to pay close attention to all of that pointer arithmetic or your program is very likely to suffer an early demise. What was worse, C lived on the wrong conceptual level for our system. When you write a C program you are dealing with ints and floats and pointers to structs, but we wanted to spend our time dealing with the valleys and rivers and political borders that make up a map.

The architects of that GIS system (an elite group that emphatically did not include me) solved the easy-adaptability problem with one dramatic decision: They ruled that most of the system was not to be written in C after all. Instead, perhaps 80 percent of the application was coded in a specialized language that knew all about geographic things like latitudes and longitudes. This language sported a sophisticated query syntax that made it easy to move all of the medium-sized trees to the north by 500 meters. The map language was not exactly what you would hear when two cartographers get together for lunch, but it was much closer to that than any C program ever written.

As I say, about 80 percent of the code was this specialized, map-oriented stuff. And what of the other 20 percent? That was written in C, but the interesting part was what it did: That C code implemented an interpreter, an interpreter for the other 80 percent of the code, the stuff written in the specialized, map-oriented language. In short, that old GIS system was a massive implementation of the Interpreter pattern.

In this chapter, we will look at the Interpreter pattern, which suggests that sometimes the best way to solve a problem is to invent a new language for just that purpose. We will explore how a typical interpreter is assembled and consider some ways of dealing with the tedious task of parsing. We will also find out that interpreters are perhaps not the best-performing beasts in the programming zoo, but that for their performance price they do offer a lot of flexibility and extensibility.

The Right Language for the Job

The Interpreter pattern is built around a very simple idea: Some programming problems are best solved by creating a specialized language and expressing the solution in that language. What kind of problems are good candidates for the Interpreter pattern? As a general rule, problems well suited to the Interpreter pattern tend to be selfcontained, with crisp boundaries around them. For example, if you are writing code that searches for specific objects based on some specification, you might consider creating a query language.[1] Conversely, if you are faced with the task of creating complex object configurations, you might think about building a configuration language.

Another clue that your problem might be right for the Interpreter pattern is that you find yourself creating lots of discrete chunks of code, chunks that seem easy enough to write in themselves, but which you find yourself combining in an ever-expanding array of combinations. Perhaps a simple interpreter could do all of the combining work for you.

Building an Interpreter

Interpreters typically work in two phases. First, the parser reads in the program text and produces a data structure, called an abstract syntax tree (AST). The AST represents the same information as the original program, but transformed into a tree of objects that, unlike the original program text, can be executed with reasonable efficiency. Second, the AST is evaluated against some set of external conditions, or context, to produce the desired computation.

For example, we might build an interpreter to evaluate very simple arithmetic expressions like this one:

   5.0*(3+x)

First, we need to parse this expression. Our parser would start out by tackling the first character of the expression—the numeral 5—and then move on to the decimal point and the zero, working out along the way that it is dealing with a floating-point number. Having finished with the 5.0, the parser would move on to process the rest of the expression, eventually producing a data structure that looks something like Figure 15-1.

Figure 15-1. Abstract syntax tree for a simple arithmetic expression

image

The data structure shown in Figure 15-1 is our AST. The leaf nodes of the AST—that is, the 5.0, the 3, and the x—are called terminals, and represent the most basic building blocks of the language. The nonleaf nodes—in this example, the + and the *—are called (logically enough) nonterminals. The nonterminals represent the higher-order concepts in the language.

As you can see from the UML diagram in Figure 15-2, the nonterminals have a reference to one or more subexpressions, which allows us to build arbitrarily complex trees.[2]

Figure 15-2. Class Diagram for the Interpreter pattern

image

Although the GoF called the key method in the Interpreter pattern interpret, names such as evaluate or execute also make sense and show up in code frequently.

With the AST in hand, we are almost ready to evaluate our expression, save for one small detail: What is the value of x? To evaluate our expression, we need to supply a value for x. Is x equal to 1 or 167 or –279? The GoF called such values or conditions supplied at the time the AST is interpreted the context. Returning to our example, if we interpret our AST with a context where x is set to 1, we will get a result of 20.0; if we interpret it a second time with x set to 4, we will get a result of 35.0.

Whatever the values in the context, the AST evaluates itself by recursively descending down the tree. We ask the root node of the tree (in this case, the node representing the multiplication) to evaluate itself. It, in turn, recursively tries to evaluate its two factors. The 5.0 is easy, but the other factor, the addition, must evaluate its terms (the 3 and x). We finally hit bottom here and the results come bubbling back up through the recursion.

We can learn two things from this simple arithmetic expression example. First, the Interpreter pattern has a lot of moving parts. Think of all the different classes that make up the AST and then add in the parser. The sheer number of components is why the Interpreter pattern is in practice limited to relatively simple languages: We are presumably trying to solve some real business problem and not engage in programming language research. The second thing that we can take away from the example is that the Interpreter pattern is unlikely to be fast. Aside from the parsing overhead, traversing the AST will inevitably exact some speed penalty of its own.

We do get a couple of things back in return for the interpreter's complexity and speed penalty. First, we get flexibility. Once you have an interpreter, it is usually very easy to add new operations to it. Certainly we can imagine that once we have built an interpreter for our little arithmetic expressions that it would be easy to add subtraction and multiplication nodes to the AST. Second, we get the AST itself. An AST is a data structure that represents some specific bit of programming logic. While we originally built the AST to evaluate it, we can manipulate it so that it does other things, too. We might, for example, have the AST print out a description of itself:

   Multiply 5.0 by the sum of 3 and x, where x is 1.

A File-Finding Interpreter

Enough theory—let’s create an interpreter in Ruby. The last thing the world needs is another arithmetic expression interpreter, so we will do something a little different here. Imagine that we are writing a program that manages files, lots of files of many different formats and sizes. We will frequently need to search for files with particular characteristics, such as all the MP3 files or all the writable files. Not only that, but we will also need to find files that share a specific combination of characteristics, such as all the big MP3 files or all the JPEG files that are read only.

Finding All the Files

This sounds like a problem that could be solved with a simple query language. We can imagine that each expression in our language will specify the kind of files that we are looking for. Let’s start with the classes that will make up the AST and put off the parser until later.

The most basic kind of file search simply returns all of the files, so we build a class to do that:


   require 'find'

   class Expression
     # Common expression code will go here soon...
   end

   class All < Expression
    def evaluate(dir)
      results= []
      Find.find(dir) do |p|
        next unless File.file?(p)
        results << p
      end
      results
    end
   end


There is not a lot going on here. The key method of our interpreter is called evaluate; the evaluate method in All simply uses the Ruby standard library Find class to gather up all the files that live under a given directory. If you pass Find.find a directory name and a block, it will call the block once for everything that it finds under that directory. And I mean everything: Because Find works recursively, our block will be called not only for all of the files in the directory but also for all of the subdirectories, all of the files in the subdirectories, and so on. Of course, we are only interested in files, so we need to do a bit of filtering. The line

         next unless File.file?(p)

will skip anything that is not a file.

Oh, and don’t fret too much about that empty Expression superclass. We will fill it in with some useful code a little later on.

Finding Files by Name

The next most obvious thing is to create a class that will return all files whose names match a given pattern:


   class FileName < Expression
     def initialize(pattern)
       @pattern = pattern
     end

     def evaluate(dir)
       results= []
       Find.find(dir) do |p|
         next unless File.file?(p)
         name = File.basename(p)
         results << p if File.fnmatch(@pattern, name)
       end
       results
     end
   end


FileName is only slightly more complicated than All. This class employs a couple of very useful methods from the Ruby File class. The File.basename method returns just the filename part of a path—give it "/home/russ/chapter1.doc" and you will get back "chapter1.doc". The File.fnmatch method returns true only if the filename pattern specified in the first parameter (something like "*.doc") matches the filename in the second parameter (something like "chapter1.doc").

Using our file-finding classes is very straightforward. If my test_dir directory contains two MP3 files and an image, I can find all three files with this code:


   expr_all = All.new
   files = expr_all.evaluate('test_dir')


But if I am just interested in the MP3s, I can say this:


   expr_mp3 = FileName.new('*.mp3')
   mp3s = expr_mp3.evaluate('test_dir')


In the preceding examples, the directory name that we pass into the evaluate method plays the part of the context—some set of externally supplied parameters that we evaluate the expression against. We can evaluate the same expression against different contexts and produce different results. I might, for example, want to find all of the MP3 files in my music directory:

   other_mp3s = expr_mp3.evaluate('music_dir')

Big Files and Writable Files

Of course, finding all the files, or finding all the files with a certain style of name, in no way exhausts the possibilities. We might, for example, want to look for files that are bigger than some specified size:


   class Bigger < Expression
    def initialize(size)
      @size = size
    end

    def evaluate(dir)
      results = []
      Find.find(dir) do |p|
        next unless File.file?(p)
        results << p if( File.size(p) > @size)
      end
      results
    end
   end


Or we might go looking for files that are writable:


   class Writable < Expression
    def evaluate(dir)
      results = []
      Find.find(dir) do |p|
        next unless File.file?(p)
        results << p if( File.writable?(p) )
      end
      results
    end
   end


More Complex Searches with Not, And, and Or

Now that we have some basic file-searching classes—these will become the terminals in our AST—let’s move on to something more interesting. What if we want to find all of the files that are not writable? Clearly, we could build yet another class along the lines of the ones we created earlier. But let’s do something different; let’s build our first nonterminal, Not:


   class Not < Expression
     def initialize(expression)
       @expression = expression
     end

     def evaluate(dir)
       All.new.evaluate(dir) - @expression.evaluate(dir)
     end
   end


The constructor of the Not class takes another file-finding expression, which represents the expression we want to negate. When evaluate is called, it starts with all of the paths (as conveniently determined by the All class) and, using the Array subtraction operator, removes all of the paths returned by the expression. What is left is every path that is not returned by the expression. Thus, to find all of the files that are not writable, you would say:


   expr_not_writable = Not.new( Writable.new )
   readonly_files = expr_not_writable.evaluate('test_dir')


The beauty of the Not class is that it is not just applicable to Writable. We might, for instance, use Not to find all of the files smaller than 1KB:


   small_expr = Not.new( Bigger.new(1024) )
   small_files = small_expr.evaluate('test_dir')


Or we might look for all files that are not MP3s:


   not_mp3_expr = Not.new( FileName.new('*.mp3') )
   not_mp3s = not_mp3_expr.evaluate('test_dir')


We can also create a nonterminal that combines the results of two file-searching expressions:


   class Or < Expression
    def initialize(expression1, expression2)
      @expression1 = expression1
      @expression2 = expression2
    end

   def evaluate(dir)
      result1 = @expression1.evaluate(dir)
      result2 = @expression2.evaluate(dir)
      (result1 + result2).sort.uniq
    end
   end


With Or, we can find all of the files that are either MP3s or bigger than 1K in one shot:


   big_or_mp3_expr = Or.new( Bigger.new(1024), FileName.new('*.mp3') )
   big_or_mp3s = big_or_mp3_expr.evaluate('test_dir')


And where there is an Or, can And be far behind?


   class And < Expression
     def initialize(expression1, expression2)
       @expression1 = expression1
       @expression2 = expression2
     end
     def evaluate(dir)
       result1 = @expression1.evaluate(dir)
       result2 = @expression2.evaluate(dir)
       (result1 & result2)
     end
   end


We now have everything we need to specify some truly complex file searches. How about all of the big MP3 files that are not writable?


   complex_expression = And.new(
                           And.new(Bigger.new(1024),
                                   FileName.new('*.mp3')),
                           Not.new(Writable.new ))


This complex expression gives us a glimpse into another nice property of the Interpreter pattern. Once we have created a complex AST like the one above, we can use it over and over with different contexts:


   complex_expression.evaluate('test_dir')
   complex_expression.evaluate('/tmp')


Done correctly, the Interpreter pattern gives you a lot of mileage for your effort. In our example, it took us only seven classes to get to a reasonably flexible file-searching AST.

Creating the AST

Surprisingly, the Interpreter pattern as defined by the GoF is silent on the origins of the AST. The pattern simply assumes that you have somehow gotten your hands on an AST and works from there. Of course, the AST has to come from somewhere. In practice, there are a surprising number of alternative ways to create an AST.

A Simple Parser

Perhaps the most obvious way of coming up with an AST is to build a parser. Building a parser for our little file-searching language is not especially difficult. Suppose we define the syntax to look something like this:

   and (and(bigger 1024)(filename *.mp3)) writable

Then the following code does a passable job of parsing, all in about 50 lines:


   class Parser
     def initialize(text)
       @tokens = text.scan(/(|)|[w.*]+/)
     end

     def next_token
       @tokens.shift
     end

     def expression
       token = next_token

     if token == nil
       return nil

     elsif token == '('
       result = expression
       raise 'Expected )' unless next_token == ')'
       result

     elsif token == 'all'
       return All.new

     elsif token == 'writable'
       return Writable.new

     elsif token == 'bigger'
       return Bigger.new(next_token.to_i)

     elsif token == 'filename'
       return FileName.new(next_token)

     elsif token == 'not'
       return Not.new(expression)

     elsif token == 'and'
       return And.new(expression, expression)

     elsif token == 'or'
       return Or.new(expression, expression)

     else
       raise "Unexpected token: #{token}"
     end
    end
   end


To use the parser, pass the file-finding expression into the constructor and then call the parse method. The parse method returns the corresponding AST, all ready to be evaluated:


   parser = Parser.new "and (and(bigger 1024)(filename *.mp3)) writable"
   ast = parser.expression


The Parser class uses the scan method from the String class to break up the expression into convenient tokens:

      @tokens = text.scan(/(|)|[w.*]+/)

Using a little regular-expression magic,[3] this code breaks the string contained in text into an array of substrings or tokens, where each token is either a left or right parenthesis or a discrete chunk of the expression like "filename" or "*.mp3".[4]

The bulk of the parser is taken up by the expression method, which looks at the tokens one at a time, building up the AST as it goes.

A Parser-less Interpreter?

Although the parser that we built was not really all that difficult to create, it did take some effort and raises a somewhat subversive-sounding question: Do we really need a parser at all? The file-searching classes that we have built so far would, all by themselves, make a fine internal programmer-oriented API. If we just need a good way to specify file searches from code, then perhaps we can just create the file-searching AST in code in exactly the same way we did in the examples in the previous section. In this way, we can still get all of the flexibility and extensibility benefits of the Interpreter pattern without the bother of dealing with parsing.

If you do decide to go with a parser-less interpreter, it is frequently worth the trouble of adding some shortcuts to make life easier for your users. For example, we could extend our file-searching interpreter by defining some operators in the Expression class that will create And and Or expressions with a little less syntax:[5]


   class Expression
     def |(other)
       Or.new(self, other)
     end

     def &(other)
       And.new(self, other)
     end
   end


You knew I was going to come up with a use for the Expression class! With these new operators defined, complex file-searching expressions flow a little more easily off the keyboard. Instead of


   Or.new(
    And.new(Bigger.new(2000), Not.new(Writable.new)),
    FileName.new('*.mp3'))


we can now say

   (Bigger.new(2000) & Not.new(Writable.new)) | FileName.new("*.mp3")

We can even take this syntactic sugaring one step further by defining some convenience methods to create the terminals:


   def all
     All.new
   end

   def bigger(size)
     Bigger.new(size)
   end

   def name(pattern)
     FileName.new(pattern)
   end

   def except(expression)
     Not.new(expression)
   end

   def writable
     Writable.new
   end


The new convenience methods reduce the file-searching expression above to something even more succinct:

   (bigger(2000) & except(writable) ) | file_name('*.mp3')

The only caveat here is that we can’t use the name not for the Not.new convenience method because this name collides with the Ruby not operator.

Let XML or YAML Do the Parsing?

If you do decide that you need a parser, a tempting alternative to building your own is to define your new language in XML or even YAML.[6] If you do this, then you can use the built-in XML or YAML parsing libraries that come with your Ruby installation to handle the parsing. On the surface, this sounds like a great idea—you get to enjoy all of the flexibility and extensibility of a full interpreter and you don’t have to mess with the details of a parser. Who could complain?

Unfortunately, your users might well complain. While XML and YAML are great choices for representing data, neither is really ideal for expressing programs. Keep in mind that the main motivation behind building an interpreter is to give your users a natural way to express the kind of processing that needs to be done. If the ideas embodied in your interpreter can be naturally expressed in XML or YAML, then go ahead and use that format and take full advantage of the built-in parser for that format. But if your language does not fit—and I would venture to guess that most Interpreter pattern languages will not—then don’t try to jam your language into the wrong format just to save a bit of coding.

Racc for More Complex Parsers

If your language is fairly complex and neither XML nor YAML seems appropriate, you might consider using a parser generator such Racc. Racc is modeled (and named) after the venerable UNIX YACC utility. Racc takes as input a description of the grammar for your language and spits out a parser, written in Ruby for that language. While Racc is a fine tool, it is not for the faint of heart—learning how to use a parser generator entails a fairly long walk up a steep learning curve.

Let Ruby Do the Parsing?

There is one other answer to the parser question. Perhaps, just perhaps, you could implement your Interpreter pattern in such a way that users could write their programs in actual Ruby code. Maybe you could design your AST API in such a way that the code flows so naturally that your users might be unaware that they are, in fact, writing Ruby code. This is such an intriguing idea that I will leave it for now and devote the next chapter to it.

Using and Abusing the Interpreter Pattern

In my mind, the Interpreter pattern is unique among the GoF patterns that we will look at in this book in that it tends to be underused. Over the years I have seen a number of systems that could have benefited from the Interpreter pattern, systems that laboriously solved their problems with other, less appropriate designs. For example, back in the stone age of databases, a query was a program that some database expert laboriously coded. This kind of thing went on for a very long time until the advent of (mostly interpreted) query languages like SQL. Similarly, for many years the construction of even very simple GUIs required the services of a software engineer, who would take days or weeks to spew out page after page of code. Now every middle school kid with access to a keyboard can create reasonably elaborate GUIs, all thanks to the introduction of an interpreted language; we call it HTML.

Why is the Interpreter pattern so widely neglected? Most software engineers who spend their days solving business problems may be expert in things like database design and Web application development, but they may not have thought about ASTs and parsers since that second-year CIS 253 course, if at all. That's too bad. As we have seen, when properly applied, the Interpreter pattern adds a unique flexibility to your system.

We have already touched on the main technical drawbacks of interpreters. First, there is the complexity issue. When you are considering the Interpreter pattern, and especially if you are going to build a parser, think about how complex your language will be. The simpler, the better. Also, think about who will be writing programs in your new language. Will these programmers be experienced software engineers, who will be able to get along with minimal error messages, or will your users be less technical civilians, who may need more elaborate diagnostics?

Second, there is the issue of program efficiency. Remember, it is going to be darned hard to make

   Add.new(Constant.new(2), Constant.new(2)).interpret

run as quickly as

   2 + 2

For all of its flexibility and power, the Interpreter pattern is not a good choice for the 2 percent of your code that really is performance sensitive. Of course, that does leave the remaining 98 percent wide open . . .

Interpreters in the Wild

Interpreters are easy to find in the Ruby world. In fact, we need look no farther than Ruby itself. Ruby is, of course, an interpreted language, albeit a slightly more complex one than is envisioned by the Interpreter pattern.

Similarly, regular expressions—those marvelous pattern-matching beasts that were so helpful when we needed to code our little parser—are themselves implemented as an interpreted language. When you write a regular expression like /[rR]uss/, behind the scenes it gets translated into an AST that matches a couple of variations of my first name.

There is also Runt,[7] a library that provides a simple language for expressing things like date and time ranges and schedules. Using Runt, you can write temporal expressions that will, for example, match only certain days of the week:


   require 'rubygems'
   require 'runt'

   mondays = Runt::DIWeek.new(Runt::Monday)
   wednesdays = Runt::DIWeek.new(Runt::Wednesday)
   fridays = Runt::DIWeek.new(Runt::Friday)


With the three objects above, we can discover that Christmas will fall on a Friday in 2015, since

   fridays.include?(Date.new(2015,12,25))

will return true, while


   mondays.include?(Date.new(2015,12,25))
   wednesdays.include?(Date.new(2015,12,25))


will both return false.

As usual with the Interpreter pattern, the real power of Runt shines through when you start combining expressions. Here is a Runt expression that will match the dreaded "all morning long" class schedule that I used to suffer through in my college days:


   nine_to_twelve = Runt::REDay.new(9,0,12,0)
   class_times = (mondays | wednesdays | fridays) & nine_to_twelve


Runt is a good example of a parser-less interpreter: It is intended to be a simple, easy-to-use library of classes for Ruby programmers.

Wrapping Up

In this chapter, we looked at the Interpreter pattern. This pattern suggests that sometimes a simple interpreter is the best way to solve a problem. The Interpreter pattern is good at solving well-bounded problems such as query or configuration languages and is a good option for combining chunks of existing functionality together.

The heart of the Interpreter pattern is the abstract syntax tree. You think of your new little language as a series of expressions, and decompose those expressions into a tree structure. How you perform that decomposition is up to you: You can supply your clients with an API for building up the tree in code, or you can write a parser that takes strings and turns them into the AST. Either way, once you have the AST, you can use it to evaluate itself and come up with the solution.

The Interpreter pattern brings with it the advantages of flexibility and extensibility. By building different ASTs, you can get the same Interpreter classes to do different things. It is usually straightforward to extend your language by adding new kinds of nodes to your AST. These benefits carry a cost, however, in terms of performance and complexity. Interpreters do tend to be slow and making them run fast is difficult, so it is probably best to limit your use of the Interpreter pattern to areas that do not demand high performance. The complexity comes from the simple fact that the Interpreter pattern requires a fair bit of infrastructure: You need all of the classes that go into building the AST, and maybe a parser to boot.

In the next chapter, we will look at domain-specific languages (DSLs), a pattern that is closely related to the Interpreter pattern. In particular, we will spend a lot of time examining internal DSLs, an elegant alternative to the sometimes painful job of coming up with a parser for your interpreter.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.15.151.21