Chapter 6. Choosing between Internal and External DSLs

Now that we’ve gone through the details of implementing internal and external DSLs, we can better understand their strengths and weaknesses. This gives us enough information to decide which of the two techniques to use, and indeed whether a DSL is appropriate at all.

One of the great difficulties is the lack of information to base your choice on. Only a few people do much with DSLs, and those who do tend to use only one or two techniques, and so can’t really compare the different styles. This issue is further complicated by the fact that many of the techniques in this book aren’t widely known. My hope is that this book will help people build DSLs more easily, but until it’s been out in the wild for a while, we can’t tell what effect it has on decisions about using a DSL or choosing one kind of DSL. So, my thoughts on this topic are more speculative than I would like.

6.1 Learning Curve

At first glance, the learning curve costs seem to favor using an internal DSL. After all, an internal DSL is really just a funky kind of API, and you are using facilities of a language you already know. With an external DSL, you have to learn about parsers, grammars, and Parser Generators.

There’s some truth to this, but the picture is rather more nuanced. There are certainly a number of new concepts to learn with Syntax-Directed Translation, and the way you drive parsers with grammars can sometimes seem like magic. It’s not as bad as many people fear it is, but if you’ve not worked with these kinds of tools before, I would recommend that you work with some trial examples first to become familiar with the tools before you make any estimates on doing the real work.

Sadly, the learning curve for Syntax-Directed Translation is made worse by the poor documentation for most Parser Generator tools. Even the documentation that is there tends to be written for people working on general-purpose languages rather than DSLs. For many tools, the only documentation is a Ph.D. thesis. There’s a crying need to do more to make Parser Generator tools accessible to those who want to use them for DSL work but don’t have a background in the language community.

There is the point that you can use Delimiter-Directed Translation instead. The tools here are much more familiar—breaking up strings, regular expressions, no need for grammars. There are limits to where you can go with Delimiter-Directed Translation, and most of the time I think it’s better to face the learning curve of Syntax-Directed Translation, but Delimiter-Directed Translation is an option to keep in mind, particularly for a regular language.
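To give a feel for how little machinery Delimiter-Directed Translation needs, here is a minimal sketch in Python. It assumes a hypothetical line-oriented DSL in which every statement has the form `event <name> <code>`; the syntax, the `parse_events` function, and the sample script are all invented for illustration.

```python
import re

# Hypothetical DSL: one statement per line, e.g. "event doorClosed D1CL".
LINE_PATTERN = re.compile(r"^event\s+(\w+)\s+(\w+)$")

def parse_events(script: str) -> dict[str, str]:
    """Translate the script line by line into a name -> code mapping."""
    events = {}
    for number, raw in enumerate(script.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = LINE_PATTERN.match(line)
        if match is None:
            raise ValueError(f"line {number}: cannot parse {raw!r}")
        name, code = match.groups()
        events[name] = code
    return events

SCRIPT = """
# events for the controller
event doorClosed D1CL
event drawerOpened D2OP
"""

events = parse_events(SCRIPT)
```

This works because each line can be understood on its own. Once statements start to nest, a line-by-line approach like this one tends to need ad hoc state tracking, which is where a grammar begins to pay for itself.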

Using an XML carrier syntax is another way to avoid the cost of learning Syntax-Directed Translation. In this case, I certainly think that learning Syntax-Directed Translation is worth the cost, as the resulting language is so much clearer to read.

On the other hand, internal DSLs aren’t necessarily as easy as you might think. Although you are using a familiar language, you are doing it in a very odd way. Internal DSLs often rely on obscure tricks in the host language to produce something that’s fluent. So, even if you know the language well, you may need to spend some time finding out about the tricks available to you in your particular language. The patterns in this book should help you get started by suggesting what to look for, but you’ll find particular language tricks that aren’t here. Finding these and sorting out how to use them presents a learning curve of its own. The bright side is that you can mount this learning curve slowly, learning new techniques as you develop the DSL. This contrasts with Syntax-Directed Translation where you have to learn much more just to get going.

So, despite the fact that the difference is smaller than you might initially think, I’d still say that internal DSLs are easier to learn.

When considering the learning curve, remember that it applies not just to you but to anyone who wants to touch your code. Using an external DSL is likely to be less approachable for others who don’t want to put much effort into learning how to use it.

6.2 Cost of Building

If you’re using a DSL technique for the first time, the major cost is the cost of ascending the learning curve. Once you’re familiar with the technique, that cost will go away, but there’s still some cost involved in providing a DSL.

When we’re thinking about the cost of building a DSL, it’s important to separate the cost of building the model from the cost of building the DSL that layers over it. In this discussion, I’m going to take the presence of the model as a given. It’s true that in many cases the model will be built in conjunction with the DSL, but the model has its own justification.

With an internal DSL, the extra effort involved is creating a layer of Expression Builders over the model. The Expression Builders are relatively straightforward to write, but most of the effort isn’t in getting them to work but in fiddling with the language so that you have something that works well. This Expression Builder cost won’t appear if you are putting the fluent methods directly in the model, but that may lead to other costs if people find these methods confusing compared to a command-query API.
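As a rough illustration of that layering, the sketch below puts a small Expression Builder over a command-query model in Python. The `Order` model, the `OrderBuilder` class, and the `order_for` function are hypothetical names made up for this example.

```python
from dataclasses import dataclass, field

# Model with a plain command-query API (hypothetical example).
@dataclass
class Order:
    customer: str
    lines: list = field(default_factory=list)
    rush: bool = False

    def add_line(self, product: str, quantity: int) -> None:
        self.lines.append((product, quantity))

# Expression Builder: a fluent layer whose only job is to populate the model.
class OrderBuilder:
    def __init__(self, customer: str):
        self._order = Order(customer)

    def item(self, quantity: int, product: str) -> "OrderBuilder":
        self._order.add_line(product, quantity)
        return self  # returning self enables method chaining

    def priority_rush(self) -> "OrderBuilder":
        self._order.rush = True
        return self

    def build(self) -> Order:
        return self._order

def order_for(customer: str) -> OrderBuilder:
    return OrderBuilder(customer)

order = (
    order_for("Acme")
    .item(5, "widget")
    .item(2, "gadget")
    .priority_rush()
    .build()
)
```

The builder itself is only a few lines. The time goes into choosing method names and argument orders so that the chained expression reads well, while the model keeps its conventional API.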

With an external DSL, the equivalent cost is building the parser. Once you are up to speed with Syntax-Directed Translation, it’s actually quite quick to write a grammar and the translation code. My current sense is that the cost of developing a parser is similar to that of building an Expression Builder layer.

Once you are familiar with Syntax-Directed Translation, I don’t think it is any harder than using an XML carrier syntax, and is easier than using Delimiter-Directed Translation unless the language is quite simple.

So, my sense at the moment is that, once you are familiar with the techniques, there’s no big difference in cost for building an internal or external DSL.

6.3 Programmer Familiarity

Many people argue that with an internal DSL, programmers who use it are using the language that they are familiar with, which makes it easier to work with than a new, external DSL. To some extent this is true, but I don’t think the difference is as marked as most people think. The odd fluent interface style takes at least a little getting used to, although rather less than it takes to learn how to build it. An external DSL is also not hard to learn as it is, by definition, rather simple. Echoing the syntactic conventions of your usual programming language can help make it more approachable.

Other than the syntactic element, the biggest difference is often that of the tools. If your host language is one with a sophisticated IDE, then you get to keep that familiar tooling with an internal DSL. You may need to use a more complicated technique like Class Symbol Table to preserve the tool’s support, but this way you can keep enjoying the IDE’s strengths. With external DSLs, however, you’re unlikely to be offered anything but the most basic level of editing support. You’ll usually have to fall back to a regular text editor. It’s not too difficult to support syntax highlighting, and most text editors are very configurable in that regard, but things like type-aware autocompletion are almost certainly beyond you.

6.4 Communication with Domain Experts

Internal DSLs are always tied to the syntax of the host language. The result will almost always be some constraints on how you can express things, together with some amount of syntactic noise. While this is unlikely to be a big factor for programmer users (who are used to these elements), domain experts are a different matter. The degree of constraints and syntactic noise also depends on the language; some languages are better suited for DSLs than others.

Even the best internal DSLs, however, don’t offer the same syntactic flexibility as an external DSL. The size of the comfort gap will depend on particular domain experts, but such is the value of the communication channel that I’d be inclined to push that bit harder and use an external DSL if it looks like it could make the difference.

If you’re not comfortable with building an external DSL, but not sure how well an internal DSL will fly with domain experts, you can try using an internal DSL first, then switch later if you think it’s worthwhile. Since you can use the same Semantic Model for both, the incremental cost of building two DSLs isn’t really that great.

6.5 Mixing In the Host Language

An internal DSL is really nothing more than a convention to use certain fluent methods to do things. There’s nothing to stop you from arbitrarily mixing DSLish code with regular imperative code. This wafer-thin boundary between the DSL and the host language has properties that may be beneficial or problematic—depending on what you are trying to do.

A benefit of this thin boundary is that it allows you to use the host language freely when you don’t have the constructs of the internal DSL available to you. So, if you need to express arithmetic in your DSL, there’s no point in making DSL constructs for this; just use the features of the host language. If you need to build abstractions on top of the DSL, you can use the abstraction facilities of the host language.

This strength is particularly nice when you need to put chunks of imperative code inside your DSL. A good example of this is using a DSL to describe how to build software. Build languages that use a Dependency Network, such as Make and Ant, have been around for a long time. Both Make and Ant are external DSLs, and both are very good at expressing the Dependency Network that you need for builds. However, the content of many build tasks requires more complex logic, and often the dependencies themselves need abstractions layered on top of them. Ant has thus suffered from sliding into generality, acquiring all manner of imperative constructs that don’t suit its nature or syntax.

Here, the contrast is with an internal DSL, such as the Rake language which is a Ruby internal DSL for building software. Being able to freely mix the Dependency Network with imperative code in Nested Closures makes it much easier to describe complicated build actions. Using Ruby’s objects and methods to build abstractions on top of the Dependency Network helps describe the higher-level structure of the build.

It’s not impossible to mix external DSLs with host code. You can embed host code into DSL scripts as Foreign Code. Similarly, you can embed DSLs into general-purpose code as strings—which is how we typically embed things like regular expressions and SQL today. But the mixing is awkward. Tools usually don’t know what you are doing and thus are clunky in how they work. It’s hard to integrate symbols between the two environments, so things like referring to a host code variable within a DSL fragment become difficult. If you want to intermix host and DSL code, then an internal DSL is almost always the way to go.
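The awkwardness shows up even in everyday code. The Python sketch below embeds SQL and a regular expression as strings, using the standard `sqlite3` and `re` modules; the table, column, and variable names are made up for the example.

```python
import re
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE orders (customer TEXT, total REAL)")
connection.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Acme", 120.0), ("Acme", 30.0), ("Globex", 75.0)],
)

minimum_total = 50.0  # host-language variable

# The SQL is an opaque string to the host language: "orders" and "total"
# are not checked until runtime, and minimum_total must be passed across
# the boundary explicitly through a placeholder.
rows = connection.execute(
    "SELECT customer, total FROM orders WHERE total >= ? ORDER BY total",
    (minimum_total,),
).fetchall()

# The same applies to regular expressions: the group name "code" lives
# inside the pattern string, invisible to host-language tooling.
match = re.match(r"event\s+(?P<code>\w+)", "event D1CL")
code = match.group("code")
```

Renaming the `total` column or the `code` group would not be caught by a typical refactoring tool, since both names exist only inside string literals.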

6.6 Strong Expressiveness Boundary

The ability to freely mix host and DSL code isn’t always a positive. It only really works if the users of the DSL are comfortable with the host language. It thus doesn’t usually apply to the case where you have domain experts reading your DSL. Throwing lumps of a host language into the DSL will usually only raise a communication barrier that the DSL was supposed to avoid.

Intermixing is also unhelpful in cases where you want DSLs to be written by a different group of programmers. Indeed, often the benefit of a DSL is that it produces a restricted range of what can be done. This restriction can make it easier to understand what to do, and serves as a barrier to bugs. If you have a DSL with strong boundaries, that limits the kinds of things you need to test for. Pricing rules in a DSL aren’t going to send arbitrary messages to your integration server or alter your order processing workflow. With a general-purpose language, anything is possible, so you have to watch the boundaries through convention and review. An external DSL’s limitations reduce what you have to watch for. Most of the time, this is good as it protects you from mistakes, but it may also help with security as well.

6.7 Runtime Configuration

One of the main reasons that XML DSLs have become so popular is that they allow you to alter the execution context of the code from compile time to runtime. For situations where you are using a compiled language and want to alter the behavior of the system without recompiling, this is an important factor. External DSLs allow you to do this since you can easily parse them at runtime, translate into a Semantic Model, and then execute that model. (Of course if you are programming in an interpreted language, then everything is at runtime anyway, so this isn’t an issue.)
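A minimal sketch of that parse, translate, and execute cycle, in Python, might look like the following. The rule syntax (`discount 10% when quantity >= 100`), the `DiscountRule` class, and the `load_rules` and `price` functions are assumptions made for this example, and the parsing is done by simple word splitting rather than with a grammar.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DiscountRule:
    """Semantic Model element: discount applied at or above a quantity."""
    minimum_quantity: int
    percent: float

def load_rules(text: str) -> list[DiscountRule]:
    """Parse hypothetical lines such as 'discount 10% when quantity >= 100'."""
    rules = []
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        if (len(words) != 6 or words[0] != "discount"
                or words[2:5] != ["when", "quantity", ">="]
                or not words[1].endswith("%")):
            raise ValueError(f"cannot parse rule: {line!r}")
        rules.append(DiscountRule(int(words[5]), float(words[1].rstrip("%"))))
    return rules

def price(unit_price: float, quantity: int, rules: list[DiscountRule]) -> float:
    """Execute the model: apply the largest applicable discount."""
    percent = max(
        (r.percent for r in rules if quantity >= r.minimum_quantity),
        default=0.0,
    )
    return unit_price * quantity * (1 - percent / 100)

# In practice this text would be read from a file at startup or on reload.
RULES_TEXT = """
discount 5% when quantity >= 10
discount 10% when quantity >= 100
"""

rules = load_rules(RULES_TEXT)
```

Changing a discount means editing the rules text, not the program. The same sketch also shows the strong boundary discussed in the previous section: a rule can express a discount threshold and nothing else.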

One approach is to use interpreted languages in conjunction with a compiled language. You can then write an internal DSL in the interpreted language. In this scenario, many of the common benefits of an internal DSL may be attenuated. Unless most of the team is familiar with the dynamic language, you won’t get the language familiarity benefit of internal DSLs. Tooling for the dynamic language is often poorer. You won’t be able to easily mix the dynamic language and static language constructs, yet a full dynamic language also means you can’t put firm boundaries around the DSL. That’s not to say you shouldn’t use an internal DSL in this way; there are plenty of cases where these potential issues aren’t applicable. But this attenuation does lead to more situations where an external DSL meshes better with a static host language.

6.8 Sliding into Generality

One of the most successful DSLs of modern times is Ant. Ant is a language for specifying builds for Java; it’s an external DSL in XML syntax. In a discussion about DSLs, James Duncan Davidson, Ant’s creator, asked: “How do we prevent disasters like Ant occurring?”

Ant is both a roaring success and a nightmare. It filled a huge gap in Java development at the time, but since then, its success has forced many teams to face its flaws. There are many problems with Ant; its XML syntax (which I also thought was a good idea at the time) is perhaps the most noticeable. But the real issue behind Ant is that over time, it steadily grew in capability so that it no longer has the limited expressiveness that a DSL needs.

This is a common road to heck. People with a Unix background will often use the example of Sendmail. It happens because the demands placed on the DSL get steadily greater, leading to more features and greater complexity—and, drop by drop, all the clarity that a good DSL has leaks out.

This danger always exists with external DSLs and, like most issues in design, has no simple answer. It needs constant attention and determination to not let things get too complex. There are alternatives. One is to let other languages develop for more complicated cases. Instead of extending one language, you can introduce other languages for particular and difficult cases. You can layer another language over the base DSL whose output is that base DSL. This can be a useful technique to allow abstractions to be built in a language that lacks abstraction-building features. Internal DSLs are often a good choice when this kind of complexity grows, because they allow you to mix DSL and general-purpose elements.

Since internal DSLs are melded in with a general-purpose host language, they don’t suffer from this problem. An analogous problem may arise when mixing with the host language gets so intertwined that you lose any sense of DSLness.

6.9 Composing DSLs

I’ve been saying ad nauseam that you want small DSLs that are very limited in their capabilities. So, to get real work done, you have to integrate your DSLs with one or more general-purpose languages. You can also compose DSLs together.

With internal DSLs, composing is as easy as mixing them with the host language. You can also use the host language’s abstraction features to help make the composition work.

With external DSLs, such composition is more difficult. To do this composition with Syntax-Directed Translation, you need to be able to write independent grammars for different languages, and yet be able to compose the grammars together. Most Parser Generators, however, don’t have facilities to handle this case, which is another consequence of their focus on supporting general-purpose programming languages. As a result, you need to use Foreign Code if you want to compose DSLs, which is clunkier than it need be. (There is some work going on to provide tools that support more composition, but they are currently rather immature.)

6.10 Summing Up

My conclusion is that there is no conclusion. I don’t see a clear, general advantage for internal or external DSLs. I’m not even sure I see some general guidelines to pontificate about. I hope I’ve given you enough information thus far to help you judge what would best suit your particular situation.

One thing I do want to stress, however, is that experimenting in both directions need not be as expensive as you think. If you use a Semantic Model, it’s relatively easy to layer on multiple DSLs, both internal and external. This gives you lots of opportunity for experimentation to find an approach that works well for you.
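To make the point concrete, here is a small Python sketch in which one Semantic Model is populated by both an internal and an external DSL. The `StateMachine` model, the `MachineBuilder` class, the `parse_machine` function, and the `<state> <event> => <target>` syntax are all hypothetical, chosen only to keep the example short.

```python
from dataclasses import dataclass, field

@dataclass
class StateMachine:
    """Semantic Model: transitions keyed by (state, event)."""
    transitions: dict = field(default_factory=dict)

    def add_transition(self, source: str, event: str, target: str) -> None:
        self.transitions[(source, event)] = target

    def next_state(self, state: str, event: str) -> str:
        return self.transitions.get((state, event), state)

# Internal DSL: a small fluent builder over the model.
class MachineBuilder:
    def __init__(self):
        self.machine = StateMachine()
        self._state = None

    def state(self, name: str) -> "MachineBuilder":
        self._state = name
        return self

    def on(self, event: str, target: str) -> "MachineBuilder":
        self.machine.add_transition(self._state, event, target)
        return self

# External DSL: hypothetical lines of the form "<state> <event> => <target>".
def parse_machine(text: str) -> StateMachine:
    machine = StateMachine()
    for line in text.splitlines():
        if not line.strip():
            continue
        left, target = line.split("=>")
        source, event = left.split()
        machine.add_transition(source, event, target.strip())
    return machine

internal = (
    MachineBuilder()
    .state("idle").on("doorClosed", "active")
    .state("active").on("lightOn", "unlocked")
    .machine
)

external = parse_machine("""
idle doorClosed => active
active lightOn => unlocked
""")
```

Both front ends produce equal models, so everything downstream of the model is shared. Each front end is a thin layer that can be added, replaced, or dropped without touching the behavior.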

An approach that Glenn Vanderburg finds useful is to use an internal DSL early on, when you’re still trying to understand what you want to do with it. That way you have easy access to facilities from the host language and a more seamless environment to evolve in. Once things settle down, and there’s a need for some of the advantages of an external DSL, you can then build one. Again, a Semantic Model makes this process much easier.

There is another option that I haven’t mentioned yet—using a language workbench. I’ll come to that in “Language Workbenches,” p. 129.
