Chapter 2. Using Domain-Specific Languages

After going through the examples in the last chapter, you should now have a good feel for what a DSL is, even though I haven’t given any general definition yet. (You can find some more examples in “A Zoo of DSLs,” p. 147.) Now I’ll move on to that definition and discuss the benefits and problems of DSLs. I want to do this early on to provide some context before I start talking about implementing them in the next chapter.

2.1 Defining Domain-Specific Languages

“Domain-specific language” is a useful term and concept, but one that has very blurred boundaries. Some things are clearly DSLs, but others can be argued one way or the other. The term has also been around for a while and, like most things in software, has never had a very firm definition. For this book, however, I think a definition is valuable.

Domain-specific language (noun): a computer programming language of limited expressiveness focused on a particular domain.

There are four key elements to this definition:

Computer programming language: A DSL is used by humans to instruct a computer to do something. As with any modern programming language, its structure is designed to make it easy for humans to understand, but it should still be something executable by a computer.

Language nature: A DSL is a programming language, and as such should have a sense of fluency where the expressiveness comes not just from individual expressions but also from the way they can be composed together.

Limited expressiveness: A general-purpose programming language provides lots of capabilities: supporting varied data, control, and abstraction structures. All of this is useful but makes it harder to learn and use. A DSL supports a bare minimum of features needed to support its domain. You can’t build an entire software system in a DSL; rather, you use a DSL for one particular aspect of a system.

Domain focus: A limited language is only useful if it has a clear focus on a small domain. The domain focus is what makes a limited language worthwhile.

Notice that the domain focus comes last in that list, and is merely a consequence of the limited expressiveness. Many people use a literal definition of DSL as a language for a specific domain. But literal definitions are often incorrect: We don’t call coins “compact disks” even though they are disks that are rather more compact than those disks that we do apply the term to.

I divide the DSLs into three main categories: external DSLs, internal DSLs, and language workbenches.

• An external DSL is a language separate from the main language of the application it works with. Usually, an external DSL has a custom syntax, but using another language’s syntax is also common (XML is a frequent choice). A script in an external DSL will usually be parsed by code in the host application using text parsing techniques. The Unix tradition of little languages fits this style. Examples of external DSLs that you probably have come across include regular expressions, SQL, Awk, and XML configuration files for systems like Struts and Hibernate.

• An internal DSL is a particular way of using a general-purpose language. A script in an internal DSL is valid code in its general-purpose language, but only uses a subset of the language’s features in a particular style to handle one small aspect of the overall system. The result should have the feel of a custom language, rather than its host language. The classic example of this style is Lisp; Lisp programmers often talk about Lisp programming as creating and using DSLs. Ruby has also developed a strong DSL culture: Many Ruby libraries come in the style of DSLs. In particular, Ruby’s most famous framework, Rails, is often seen as a collection of DSLs.

• A language workbench is a specialized IDE for defining and building DSLs. In particular, a language workbench is used not just to determine the structure of a DSL but also as a custom editing environment for people to write DSL scripts. The resulting scripts intimately combine the editing environment and the language.

Over the years, these three styles have developed their own communities. You’ll find people who are very experienced in internal DSLs but have no idea how to build an external DSL. I find this problematic because, as a result, people may not choose the best tool for the job. I remember talking to a team who had used very clever internal DSL processing techniques to support a custom syntax that, I’m convinced, would have been much easier as an external DSL. But, since they didn’t know how to build external DSLs, they didn’t have that option open to them. Hence it’s important to me in this book to present both internal and external DSLs clearly so you’ll have that information. (I’m rather more sketchy on language workbenches as they are so new and still evolving.)

Another way of looking at a DSL is as a way of manipulating an abstraction. In software development, we build abstractions and then manipulate them, often on multiple levels. The most common way to build an abstraction is by implementing a library or framework; the most common way to manipulate this framework is through command-query API calls. In this view a DSL is a front-end to a library providing a different style of manipulation to the command-query API. In this context, the library is the Semantic Model of the DSL. A consequence of this is that DSLs tend to follow libraries, and indeed I consider a Semantic Model to be a necessary adjunct to a well-built DSL.

When people talk about DSLs, it’s easy to think that building the DSL is the hard work. In fact, usually the hard work is building the model; the DSL then just layers on top of it. It still takes effort to get a DSL that works well, but that effort is usually much smaller than for building the underlying model.
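The layering described above can be sketched in a few lines. This is only an illustration under stated assumptions: the class StateMachine, its methods add_transition and fire, the parse function, and the one-transition-per-line text syntax are all invented for this sketch, and the state and event names are placeholders. The point is that the model carries the behavior, while the DSL is a thin way of populating it.

```python
class StateMachine:
    """Semantic Model (hypothetical): holds transitions and reacts to events."""

    def __init__(self, start):
        self.start = start
        self.current = start
        self.transitions = {}  # (state, event) -> target state

    # Command-query API: each call makes sense on its own.
    def add_transition(self, source, event, target):
        self.transitions[(source, event)] = target

    def fire(self, event):
        # Unknown events leave the machine in its current state.
        self.current = self.transitions.get((self.current, event), self.current)
        return self.current


def parse(script):
    """Populate a StateMachine from an invented text syntax:

        start <state>
        <source> --<event>--> <target>
    """
    machine = None
    for raw in script.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("start "):
            machine = StateMachine(line.split()[1])
            continue
        if machine is None:
            raise ValueError("script must declare a start state first")
        source, rest = line.split(" --", 1)
        event, target = rest.split("--> ", 1)
        machine.add_transition(source.strip(), event.strip(), target.strip())
    if machine is None:
        raise ValueError("script must declare a start state")
    return machine


SCRIPT = """
start idle
idle --doorClosed--> active
active --lightOn--> waitingForDrawer
waitingForDrawer --drawerOpened--> unlockedPanel
"""

machine = parse(SCRIPT)
for event in ["doorClosed", "lightOn", "drawerOpened"]:
    machine.fire(event)
print(machine.current)
```

Most of the code is the model and its behavior; the parser only translates text into calls on the command-query API.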

2.1.1 Boundaries of DSLs

As I said, DSLs are a concept with blurry boundaries. While I don’t think anyone would disagree that regular expressions are a DSL, there are plenty of cases that are open to reasonable argument. As a result I think it’s worth talking about some of these cases here as they help provide a better idea of how to think about DSLs.

Each style of DSL has different boundary conditions, so I’ll discuss them separately. As we go through these, it’s worth remembering that the distinguishing characteristics of DSLs are their language nature, domain focus, and limited expressiveness. As it turns out, the domain focus isn’t a good boundary condition—the boundaries more commonly revolve around limited expressiveness and the language nature.

I’ll start with internal DSLs. Here, the boundary question is the difference between an internal DSL and the normal command-query API. In many ways, an internal DSL is nothing more than a quirky API (as the old Bell Labs saying goes, “Library design is language design”). In my view, the heart of the difference is the language nature. Mike Roberts suggested to me that a command-query API defines the vocabulary of the abstraction, whereas an internal DSL adds a grammar.

A common way of documenting a class with a command-query API is to list all the methods it has. When you do this, each method should make sense on its own. You have a list of “words,” each with a somewhat self-sufficient meaning. The methods of an internal DSL often only make sense in the context of a larger expression in the DSL. In the Java internal DSL example earlier, I had a method called “to” that specified the target state of a transition. Such a name would be a poor choice for a method in a command-query API, but it fits inside a phrase like .transition(lightOn).to(unlockedPanel).

As a result, an internal DSL should have the feel of putting together whole sentences, rather than a sequence of disconnected commands. This is the basis for calling these kinds of APIs fluent interfaces.
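To make the contrast concrete, here is a minimal sketch in Python rather than Java. The names Machine, add_transition, StateBuilder, and state are hypothetical and exist only for this illustration; only the phrase shape .transition(...).to(...) comes from the text. Both styles populate the same model, but the fluent methods are meaningless outside the phrase they belong to.

```python
class Machine:
    """Hypothetical model with a plain command-query API."""

    def __init__(self):
        self.transitions = []  # list of (source, event, target)

    def add_transition(self, source, event, target):
        self.transitions.append((source, event, target))


class StateBuilder:
    """Fluent layer: its methods only make sense as parts of a phrase."""

    def __init__(self, machine, source):
        self._machine = machine
        self._source = source
        self._event = None

    def transition(self, event):
        # Remember the event until .to() completes the phrase.
        self._event = event
        return self

    def to(self, target):
        if self._event is None:
            raise ValueError(".to() must follow .transition()")
        self._machine.add_transition(self._source, self._event, target)
        self._event = None
        return self  # allow further .transition(...).to(...) phrases


def state(machine, name):
    return StateBuilder(machine, name)


# Command-query style: a sequence of self-contained commands.
plain = Machine()
plain.add_transition("waitingForDrawer", "drawerOpened", "unlockedPanel")
plain.add_transition("waitingForLight", "lightOn", "unlockedPanel")

# Fluent style: the same model, populated by sentence-like expressions.
fluent = Machine()
state(fluent, "waitingForDrawer").transition("drawerOpened").to("unlockedPanel")
state(fluent, "waitingForLight").transition("lightOn").to("unlockedPanel")

print(plain.transitions == fluent.transitions)
```

A method named to would be hard to document in isolation, which is exactly the vocabulary-versus-grammar distinction drawn above.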

Limited expressiveness, for an internal DSL, is obviously not a core property of the language, since the language of an internal DSL is a general-purpose language. In this case, limited expressiveness comes from the way you use it. When forming a DSL expression, you limit yourself to a small subset of the general language features. It’s common to avoid conditions, looping constructs, and variables. Piers Cawley called this a pidgin use of the host language.

With external DSLs, the boundary is with general-purpose programming languages. Languages can have a domain focus but still be general-purpose languages. A good example of this is R, a language and platform for statistics; it is very much targeted at statistics work, but has all the expressiveness of a general-purpose programming language. Thus, despite its domain focus, I would not call it a DSL.

A more obvious DSL is regular expressions. Here, the domain focus (matching text) is coupled with limited features—just enough to make text matching easy. One common indicator of a DSL is that it isn’t Turing-complete. DSLs usually avoid the regular imperative control structures (conditions and loops), don’t have variables, and can’t define subroutines.

It’s at this point where many people will disagree with me, using the literal definition of a DSL to argue that languages like R should be counted as a DSL. The reason I put a strong emphasis on limited expressiveness is that it is what makes the distinction between DSLs and general-purpose languages useful. The limited expressiveness gives DSLs different characteristics, both in using them and in implementing them. This leads to a different way of thinking about DSLs compared to general-purpose languages.

If this boundary isn’t fuzzy enough, let’s consider XSLT. XSLT’s domain focus is that of transforming XML documents, but it has all the features one might expect in a regular programming language. In this case, I think the way it is used matters more than the language itself. If XSLT is being used to transform XML, then I would call it a DSL. However, if it’s being used to solve the eight queens problem, I would call it a general-purpose language. A particular usage of a language can put it on either side of the DSL line.

Another boundary with external DSLs is with serialized data structures. Is a list of property assignments (color = blue) in a configuration file a DSL? I think that here, the boundary condition is the language nature. A series of assignments lacks fluency, so it doesn’t fit the criteria.

A similar argument applies to many configuration files. Many environments these days provide a lot of their programmability through some kind of configuration files, often using XML syntax. In many cases, these XML configurations are effectively DSLs. However, this may not always be the case. Sometimes, the XML files are intended to be created by other tools, so XML is only used for serialization and not intended to be used by humans. In that case, since humans aren’t expected to use it, I wouldn’t classify it as a DSL. Of course it’s still valuable to have a storage format that is human-readable, as it can be useful in debugging. The question isn’t whether it’s human-readable or not, but whether the representation is a human’s main way of interacting with that aspect of the system.

One of the biggest issues with these kinds of configuration files is that, even though they aren’t intended to be human-edited, they end up being the primary editing mechanism in practice. In this case the XML becomes a DSL by accident.

With language workbenches, the boundary is between a language workbench and any application that allows a user to design their own data structure and forms—something like Microsoft Access. After all, it’s possible to take a state model and represent it in a relational database structure (I’ve seen far worse ideas). You can then produce forms to manipulate the model. There are two questions here: Is Access a language workbench, and is the thing you define a DSL?

I’ll start with the second question. Since we are building a particular application for the state machine, we have both domain focus and limited expressiveness. The critical issue is that of the language nature. If we are putting data in forms and saving them in a table, there usually isn’t a real language-like feel to it. A table can be an expression of a language nature—FIT (“FIT,” p. 155) and Excel both use a tabular representation and both have a language feel to them (I would consider FIT to be domain-specific and Excel general-purpose). But most applications do not try to achieve that kind of fluency; they just create forms and windows that don’t stress the interconnections. For example, the textual interface of the Meta-Programming System Language Workbench has a feel very different from most form-based UIs. Similarly, few applications allow you to lay out a diagram to define how things are put together in the manner of MetaEdit.

As to whether Access is a language workbench, I’d go back to the design intent. Access wasn’t designed to be a language workbench, although you can use it that way if you really want. Look at how many people use Excel as a database—even though it wasn’t designed to be one.

In a broader sense, is a purely human jargon a DSL? A common example that’s bandied around is the language used to order a coffee at Starbucks: “Venti, half-caf, nonfat, no-foam, no-whip latte.” The language is nice because it has limited expressiveness, a domain focus, and a sense of grammar as well as vocabulary. It falls outside my definition, however, because I use “domain-specific language” to refer to computer languages only. If we implemented a computer language to understand Starbucks expressions, then that would truly be a DSL, but the words we spout when getting our caffeine fix are a human language. I use domain language to mean a domain-specific human language and reserve “DSL” for computer languages.

So, what has this discussion of the boundaries of DSLs taught us? Hopefully, one thing that is clear is that there are few sharp boundaries. Reasonable people can disagree on what is a DSL. Tests like language nature and limited expressiveness are themselves very blurry, so we should expect the result to exhibit the same blur. And not everyone will use the boundary conditions that I do.

In this discussion, I’ve excluded many things from being a DSL, but this doesn’t mean that I don’t consider them valuable. The purpose of a definition is to help in communication so different people can have the same idea of what we’re talking about. For this book, it helps make clear whether the techniques I describe are relevant. I find that this definition of DSLs helps target the techniques I describe more effectively.

2.1.2 Fragmentary and Stand-alone DSLs

The secret panel state machine example I used in “Using Domain-Specific Languages,” p. 27 is a stand-alone DSL. By this I mean that you can look at a block of DSL script, typically a single file, and it is all DSL. If you are familiar with the DSL but not with the host language of the application, you should be able to understand what the DSL does because the host language either isn’t there (in the external case) or is subdued by the internal DSL.

Another way DSLs appear is in a fragmentary form. In this form, little bits of DSL are used inside the host language code. You can think of them as enhancing the host language with additional features. In this case, you can’t really follow what the DSL is doing without understanding the host language.

For an external DSL, a good example of a fragmentary DSL is regular expressions. You don’t have a whole file of regular expressions in a program, but you have little snippets interspersed with regular host code. Another example of this is SQL, often used in the form of SQL statements within the context of a larger program.
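A short sketch shows what fragmentary use looks like in practice. It uses Python’s standard re and sqlite3 modules as the host environment; the order-number pattern, the orders table, and the function names are made up for the illustration. The regular expression and the SQL statements are small islands of DSL, and neither can be followed without reading the host code around them.

```python
import re
import sqlite3

# Fragment 1: a regular expression embedded in host code.
ORDER_ID = re.compile(r"ORD-(\d{4})")


def order_numbers(text):
    # findall returns the captured digits for every match in the text.
    return [int(digits) for digits in ORDER_ID.findall(text)]


# Fragment 2: SQL statements embedded in host code.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.executemany(
    "INSERT INTO orders (id, total) VALUES (?, ?)",
    [(1001, 25.0), (1002, 310.5), (1003, 99.9)],
)


def large_orders(minimum):
    rows = conn.execute(
        "SELECT id FROM orders WHERE total >= ? ORDER BY id", (minimum,)
    )
    return [row[0] for row in rows]


found = order_numbers("Shipped ORD-1002 and ORD-1003 today")
big = large_orders(50)
print(found, big)
```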

Similar fragmentary approaches are used with internal DSLs. A particularly fruitful area of internal DSL development has been in the unit testing world. In particular, expectation grammars in mock object libraries are short bursts of DSLs within a larger host code context. A popular language feature for internal fragmentary DSLs is annotations, which allow you to add metadata to the host code programming elements. This makes annotations suitable for fragmentary DSLs but useless for stand-alone ones.
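As a rough analogue, Python has no Java-style or C#-style annotations, but a decorator that attaches metadata to a class can play a similar role. The sketch below assumes a hypothetical valid_range decorator, an OrderLine class, and a violations function; none of these come from an existing library. The decorator lines are the DSL fragment, and they mean nothing without the host class they decorate.

```python
def valid_range(field, low, high):
    """Class decorator that records an allowed range for one field."""

    def mark(cls):
        # Copy the list so a subclass never mutates its parent's rules.
        rules = list(getattr(cls, "_range_rules", []))
        rules.append((field, low, high))
        cls._range_rules = rules
        return cls

    return mark


@valid_range("quantity", 1, 100)
@valid_range("discount", 0, 50)
class OrderLine:
    def __init__(self, quantity, discount):
        self.quantity = quantity
        self.discount = discount


def violations(obj):
    """Host code that interprets the metadata left by the decorators."""
    return sorted(
        field
        for field, low, high in getattr(type(obj), "_range_rules", [])
        if not low <= getattr(obj, field) <= high
    )


print(violations(OrderLine(quantity=0, discount=10)))
print(violations(OrderLine(quantity=5, discount=10)))
```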

The same DSL can be used in both stand-alone and fragmentary contexts; SQL is a good example of this. Some DSLs are designed to be used in a fragmentary form, others in a stand-alone form, and still others can swing both ways.

2.2 Why Use a DSL?

Now, I hope, we’re pretty much on board with what a DSL is. The next question is why should we consider using one.

DSLs are a tool with limited focus. They aren’t like object orientation or agile processes which introduce a fundamental shift into the way we think about software development. Instead, DSLs are a very specific tool for very particular conditions. A typical project might use half a dozen or so DSLs in various places—indeed, many already do.

In “Languages and Semantic Model,” p. 16, I kept saying that a DSL is a thin veneer over a model, where the model might be a library or framework. This phrase should remind us that whenever you think about the benefits (or disadvantages) of a DSL, it’s important to separate the benefits provided by the model from the benefits of the DSL. It’s a common mistake to confuse the two.

DSLs have the potential to realize certain benefits. When you are considering using a DSL, you should weigh these benefits and decide which of them are applicable to your circumstances.

2.2.1 Improving Development Productivity

The heart of the appeal of a DSL is that it provides a means to more clearly communicate the intent of a part of a system. If you read Miss Grant’s controller definition in a DSL form, it’s easier for you to understand what it’s doing than through the command-query API of the model.

This clarity isn’t just an aesthetic desire. The easier it is to read a lump of code, the easier it is to find mistakes, and the easier it is to modify the system. So, for the same reason that we encourage meaningful variable names, documentation, and clear coding constructs, we should encourage DSL usage.

People often underestimate the productivity impact of defects. Not only do defects detract from the external quality of software, they also slow developers down by sucking up time in investigations and fixes, sowing confusion about the behavior of the system. The limited expressiveness of DSLs makes it harder to say wrong things and easier to see when you’ve made an error.

The model alone provides a considerable improvement in productivity. It avoids duplication by gathering together common code; above all, it provides an abstraction to think about the problem that makes it easier to specify what’s going on in an understandable way. A DSL enhances this by providing a more expressive form to read and manipulate that abstraction. A DSL can help people learn how to use an API since it shifts focus to how different API methods should be combined together.

An interesting example of this I’ve come across is using a DSL to wrap an awkward third-party library. The DSL’s usual advantages of a more fluent interface are magnified when the command-query interface is poor. In addition, the DSL only has to support the actual client usage, which can significantly reduce the surface area that the client developers need to learn.

2.2.2 Communication with Domain Experts

I believe that the hardest part of software projects, the most common source of project failure, is communication with the customers and users of that software. By providing a clear yet precise language to deal with domains, a DSL can help improve this communication.

This benefit is more nuanced than the simple productivity argument. For a start, many DSLs aren’t suitable for domain communication—the DSLs for regular expressions or build dependencies don’t really fit in here. Only a subset of stand-alone DSLs really apply to this communication channel.

When people talk about DSLs in this context, it’s often along the lines of “Now we can get rid of programmers and have business people specify the rules themselves.” I call this argument the COBOL fallacy—since that was the expectation with COBOL. It’s a common argument, but I don’t think it improves with repetition.

Despite the COBOL fallacy, I do think DSLs can improve communication. It’s not that domain experts will write the DSLs themselves; but they can read them and thus understand what the system thinks it’s doing. By being able to read DSL code, domain experts can spot mistakes. They can also talk more effectively to the programmers who do write the rules, perhaps by writing some rough drafts that can be refined into proper DSL rules.

I’m not saying that domain experts should never write DSLs themselves. I have run into too many cases where a team has succeeded in getting domain experts to write significant bits of behavior using a DSL to rule that out. However, I still think the biggest gain from using a DSL in this way comes when domain experts start reading it. Focusing on reading can be the first step towards writing the DSL, with the advantage that you lose nothing if you don’t take that further step.

My focus on DSLs as something for domain experts to read does introduce an argument against using DSLs. If you want domain experts to understand the content of a Semantic Model, you can do this just by providing a visualization of the model. It’s worth considering whether a visualization alone is a more efficient route than supporting a DSL. And it’s useful to have visualizations in addition to a DSL.
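One inexpensive form of visualization is to emit a diagram description straight from the model. The sketch below assumes the model can be reduced to a list of (source, event, target) triples, which is an assumption made for this illustration, and renders them as Graphviz DOT text that a tool such as the dot command can turn into a picture. The to_dot function name and the sample transitions are hypothetical.

```python
def to_dot(transitions, name="machine"):
    """Render (source, event, target) triples as Graphviz DOT text."""
    lines = [f"digraph {name} {{"]
    for source, event, target in transitions:
        # One labelled edge per transition; node names are quoted for safety.
        lines.append(f'  "{source}" -> "{target}" [label="{event}"];')
    lines.append("}")
    return "\n".join(lines)


transitions = [
    ("idle", "doorClosed", "active"),
    ("active", "lightOn", "waitingForDrawer"),
    ("waitingForDrawer", "drawerOpened", "unlockedPanel"),
]

dot = to_dot(transitions)
print(dot)
```

Because the diagram is generated from the model rather than from DSL text, it works the same way however the model was populated.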

Involving domain experts in a DSL is very similar to involving domain experts in building a model. I’ve often found great benefit by building a model together with domain experts; constructing a Ubiquitous Language [Evans DDD] deepens the communication between software developers and domain experts. A DSL provides another technique to engage that communication. Depending on the circumstances, you might find domain experts participating in the model and the DSL, or the DSL only.

Indeed some people find that trying to describe a domain using a DSL is useful even if the DSL is never implemented. It can be beneficial just as a platform for communication.

So, all in all, involving domain experts in a DSL is difficult to achieve but has a high payoff. And even if you can’t get the domain experts involved, you may still get enough of a gain in developer productivity to make the DSL worth the effort.

2.2.3 Change in Execution Context

When talking about why we might want to express our state machine in XML, one strong reason was that the definition could be evaluated at runtime rather than compile time. This kind of reasoning, where we want code to run in a different environment, is a common driver for using a DSL. For XML configuration files, shifting logic from compile time to runtime is a common reason.

There are other useful shifts in execution context. One project I looked at needed to trawl through databases to find contracts that matched certain conditions and tag them. They wrote a DSL to support specifying these conditions and used it to populate a Semantic Model in Ruby. It would be slow to read all of the contracts into memory to run the query logic in Ruby, but the team could use the Semantic Model representation to generate SQL to do the processing in the database. Writing the rules in SQL directly was too difficult for the developers, let alone the business people. However, the business people could read (and in this case, write) the appropriate expressions in the DSL.

Using a DSL like this can often make up for limitations in a host language, allowing us to express things in a comfortable DSL and then generate code for the actual execution environment to use.
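The general shape of such a shift can be sketched as follows. This is not the project’s actual code, which was written in Ruby and is not shown in the text; it is a Python illustration in which every name (Comparison, AllOf, Field, all_of, the contracts table, and its columns) is an assumption. A small DSL expression populates a model of conditions, and the model generates a parameterized SQL query so that the filtering runs inside the database.

```python
import sqlite3
from dataclasses import dataclass

# Column names are interpolated into SQL text, so only known names are allowed.
ALLOWED_COLUMNS = {"region", "amount", "status"}


@dataclass(frozen=True)
class Comparison:
    """Model element: one column compared against one value."""
    column: str
    operator: str
    value: object

    def to_sql(self):
        # Values travel as bound parameters, never as SQL text.
        return f"{self.column} {self.operator} ?", [self.value]


@dataclass(frozen=True)
class AllOf:
    """Model element: every nested condition must hold."""
    parts: tuple

    def to_sql(self):
        fragments, params = [], []
        for part in self.parts:
            sql, values = part.to_sql()
            fragments.append(f"({sql})")
            params.extend(values)
        return " AND ".join(fragments), params


class Field:
    """DSL entry point: reads like a rule, builds model objects."""

    def __init__(self, column):
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {column}")
        self.column = column

    def equals(self, value):
        return Comparison(self.column, "=", value)

    def greater_than(self, value):
        return Comparison(self.column, ">", value)


def all_of(*parts):
    return AllOf(tuple(parts))


# The rule as a DSL expression.
rule = all_of(Field("region").equals("EMEA"), Field("amount").greater_than(100000))

# The model generates SQL, so the matching happens in the database.
where, params = rule.to_sql()
query = f"SELECT id FROM contracts WHERE {where} ORDER BY id"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contracts (id INTEGER, region TEXT, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO contracts VALUES (?, ?, ?, ?)",
    [(1, "EMEA", 50000, "open"), (2, "EMEA", 250000, "open"), (3, "APAC", 900000, "open")],
)
matching = [row[0] for row in conn.execute(query, params)]
print(query, matching)
```

The SQL generation lives in the model classes; the Field and all_of helpers only make the rule easier to read and write.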

A model can facilitate this kind of shift. Once you have a model, it’s easy to either execute it directly or generate code from it. Models can also be populated from a forms-style interface as well as a DSL. A DSL has a couple of advantages over using forms. DSLs are often better than forms at representing complicated logic. Furthermore, we can use the same code management tools, such as version control systems, to manage these rules. When rules are entered via a form and stored in a database, version control is often neglected.

This relates to a spurious benefit of a DSL. I’ve heard people argue that the good thing about a DSL is that it allows the same behavior to be executed in different language environments. One could write business rules that generate code in C# and Java, or describe validations that can run in C# on the server and JavaScript on the client. This is a spurious benefit because you can gain this just by using a model; you don’t need a DSL at all. A DSL can make it easier to understand these rules, but that’s a separate issue.

2.2.4 Alternative Computational Model

Mainstream programming is pretty much all done using an imperative model of computation. This means that we tell the computer what things to do in what sequence, control flow is handled using conditionals and loops, we have variables—indeed lots of things that we take for granted. Imperative computation has become popular because it’s relatively easy to understand and easy to apply to lots of problems. However, it isn’t always the best choice.

The state machine is a good example of this. We can write imperative code and conditionals to handle this kind of behavior—it can be pretty nicely structured too. But thinking of it as a state machine is often more helpful. Another common example is defining how to build software. You can do it with imperative logic, but after a while most people recognize that it’s easier to do with a Dependency Network (e.g., to run tests, your compilations must be up-to-date). As a result, languages designed for describing builds (such as Make and Ant) use dependencies between tasks as their primary structuring mechanism.
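A Dependency Network can be sketched in a few lines. The example below is an illustration only, not how Make or Ant are implemented: the task names and the plan function are hypothetical, and it relies on graphlib.TopologicalSorter from the Python standard library (available from Python 3.9). Each task declares what it depends on, and the order of execution is derived from those declarations instead of being written out imperatively.

```python
from graphlib import TopologicalSorter

# Each task lists the tasks it depends on (a hypothetical build).
DEPENDENCIES = {
    "compile": [],
    "compile_tests": ["compile"],
    "test": ["compile", "compile_tests"],
    "package": ["test"],
    "docs": [],
}


def plan(target, dependencies):
    """Return the tasks needed for target, prerequisites first."""
    needed = {}

    def collect(task):
        # Record the task before recursing so shared prerequisites are visited once.
        if task not in needed:
            needed[task] = dependencies[task]
            for dependency in dependencies[task]:
                collect(dependency)

    collect(target)
    # TopologicalSorter expects a mapping of node -> predecessors.
    return list(TopologicalSorter(needed).static_order())


order = plan("test", DEPENDENCIES)
print(order)
```

Asking for the test task pulls in both compilation steps and leaves out unrelated tasks such as docs, without any of that sequencing being spelled out by hand.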

You often hear such nonimperative approaches referred to as declarative programming. The notion is that these styles allow you to declare what should happen, rather than work through the imperative statements that describe how the behavior works.

You don’t need a DSL to use an alternative computational model. The core behavior of an alternative computational model comes from a Semantic Model, as the state machine example illustrates. However, a DSL can make a big difference as it makes it much easier for people to manipulate declarative programs that populate the Semantic Model.

2.3 Problems with DSLs

Having talked about when to use a DSL, it makes sense that I talk a bit about when not to use them, or at least about the problems involved in using them.

Fundamentally, the only reason to not use a DSL is if you don’t see any of the benefits of a DSL apply to your situation—or at least, you don’t see the benefits being worth the cost of building the DSL.

Even when DSLs are applicable, they do come with problems. On the whole, I think these problems are currently overstated, usually because people aren’t familiar enough with how to build DSLs and how they fit the broader software development picture. Also, many commonly stated problems with DSLs stem from the same confusion between DSL and model that plagues many stated DSL benefits.

Many problems with DSLs are specific to one of the particular styles of DSL, and to understand these issues you need to have a deeper understanding of how these DSLs are implemented. As a result, I’ll leave the discussion of these problems till later; for now, I’ll just look at the broad problems in line with what we’ve currently discussed.

2.3.1 Language Cacophony

The most common objection I hear to DSLs is what I call the language cacophony problem: the concern that languages are hard to learn, so using many languages will be much more complicated than using a single one. Having to know multiple languages makes it harder to work on the system and to introduce new people to the project.

When people talk about this concern, there are a couple of misconceptions that they commonly have. The first is that they often mistake the effort of learning a DSL for the effort of learning a general-purpose language. DSLs are far simpler than a general-purpose language, and thus far easier to learn.

Many critics understand this, but still object to DSLs because, even if they are relatively easy to learn, having many DSLs makes it harder to understand what’s going on in a project. The misconception here is forgetting that a project will always have complicated areas that are hard to learn. Even if you don’t have DSLs, you will typically have many abstractions in your codebase that you need to understand. Usually, these abstractions are captured by libraries in order to make them tractable. Even if you don’t have to learn several DSLs, you still have to learn several libraries.

So the true learning cost question is how much harder it is to learn a DSL than to learn the underlying model on its own. I’d argue that the incremental cost of learning the DSL is quite small compared to the cost of understanding the model. Indeed, since the whole point of a DSL is to make it easier to understand and manipulate the model, having a DSL should reduce the learning cost.

2.3.2 Cost of Building

A DSL may be a small incremental cost over its underlying library, but it’s still a cost. There’s still code to write, and above all to maintain. Thus, like any code, it has to pull its weight. Not every library benefits from having a DSL wrapper over it. If a command-query API does the job just fine, then there’s no value in adding another API on top of it. Even if a DSL might help, sometimes it would just be too much effort to build and maintain for the marginal benefit.

The maintenance of the DSL is an important factor. Even a simple internal DSL may cause problems if most of the development team finds it difficult to understand. External DSLs in particular add a lot of moving parts to the process, with parsers that are often intimidating for developers.

One of the things that inflates the cost of adding a DSL is the fact that people aren’t used to building them. There are new techniques to learn. Although you shouldn’t ignore these costs, you should remember that learning curve costs can be amortized across multiple times that you might use a DSL in the future.

Also, remember that the cost of a DSL is the cost over and above the cost of building the model. Any complicated area needs some mechanism to manage the complexity, and if it’s complicated enough to consider a DSL, it’s almost certainly complicated enough to benefit from a model. A DSL may help you think about the model and reduce the cost of building it.

This leads to the related issue—that encouraging DSLs will lead to many bad DSLs being built. Indeed I expect many bad DSLs to be built, just as there are plenty of libraries with bad command-query APIs. The question is whether a DSL will make things worse. A good DSL can wrap a bad library and make it easier to deal with (although I’d rather fix the library if I can). A bad DSL is a waste of resources to build and maintain, but that can be said of any bad code.

2.3.3 Ghetto Language

The ghetto language problem is a contrast to the language cacophony problem. Here, we have a company that’s built a lot of its systems on an in-house language which is not used anywhere else. This makes it difficult for them to find new staff and to keep up with technological changes.

In analyzing this argument, I begin by noting that if you’re writing whole systems in a language, that means it isn’t a DSL (at least by my definition) but a general-purpose language. Although you can use many of the DSL techniques for building general-purpose languages, I would very strongly urge you not to do so. Building and maintaining a general-purpose language is a big undertaking that condemns you to a lot of work and a life in a ghetto. Don’t do that.

I think there are a couple of real issues implied by the ghetto language problem. The first of these is that there’s always a danger for a DSL to accidentally evolve into a general-purpose language. You take your DSL and gradually add new features; today you add conditional expressions, another day you add loops, and whoops—you’re Turing-complete.

The only defense against this is to guard firmly against it. Make sure you have a clear sense of what narrow problem the DSL is focused on. Question any new features that seem to fall outside that mission. If you need to do more, consider using more than one language and combining them, instead of letting one DSL grow too big.

The same problem can plague frameworks. A good library has a clear sense of purpose. If your product pricing library includes an implementation of the HTTP protocol, you’re suffering from essentially the same failure to separate concerns.

The second issue is that of building yourself what you should be taking from outside. This applies to libraries as much as DSLs. For example, there’s little reason now to build your own object-relational mapping system. My general rule with software is that if it’s not your business, don’t write it yourself; always look to take it from somewhere else. In particular, with the rise of open source tools it often makes more sense to extend an existing open source effort than to write your own from scratch.

2.3.4 Blinkered Abstraction

The usefulness of a DSL is that it provides an abstraction that you can use to think about a subject area. Such an abstraction is really valuable; it allows you to express the behavior of a domain much more easily than if you think in terms of lower-level constructs.

However, any abstraction, be it a DSL or a model, always carries with it a danger—that of putting blinkers on your thinking. With a blinkered abstraction, you spend more effort on fitting the world into your abstraction than the other way around. You see this when you come across something that doesn’t fit in with the abstraction—and you burn time trying to make it fit, instead of changing the abstraction to easily absorb the new behavior. Blinkering tends to occur once you’ve got comfortable with an abstraction and you feel it’s bedded down—at this point it’s natural to be worried by the prospect of uprooting it.

Blinkered abstractions are a problem with any abstraction, not just a DSL, but there is a concern that a DSL can make it worse. Since a DSL provides a more comfortable way of manipulating an abstraction, it can make you more reluctant to change it. This problem can be exacerbated when using the DSL with domain experts, who often are even more reluctant to change an abstraction once they get used to it.

As with any abstraction, you should always look at a DSL as something that’s evolving, not finished.

2.4 Wider Language Processing

This book is about domain-specific languages, but it’s also about techniques for language processing. The two overlap, because 90% of the use of language processing techniques in an average development team is for DSLs. But these techniques can be used for some other things as well and I would be remiss not to discuss some of these.

I ran into an excellent example of this when visiting a ThoughtWorks project team. They had the task of communicating to a third-party system by sending messages whose payload was defined by COBOL copybooks. COBOL copybooks are a data structure format for records. There were a lot of them, so my colleague Brian Egge decided to build a parser for the subset of COBOL copybook syntax in use and generate Java classes to interface to these records. Once he’d built the parser, he could happily interface to as many copybooks as he needed; none of the rest of the code needed to know about COBOL data structures, and any changes could be handled with a simple regeneration. It would be an appalling stretch to call COBOL copybooks a DSL—but the same basic techniques that we use for external DSLs did the trick.
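To give a feel for how small such a tool can be, here is a minimal sketch in Python. It is not the project's actual parser: the supported subset (a single 01-level record with `PIC X(n)` and `PIC 9(n)` fields), the sample copybook, and the names `parse_copybook`, `generate_java`, and `camel` are all assumptions made for illustration.

```python
import re

# Assumed subset: one 01-level record name, then fields such as
# "05 CUSTOMER-NAME PIC X(30)." Real copybooks allow far more than this.
RECORD = re.compile(r"^\s*01\s+([A-Z0-9-]+)\.\s*$")
FIELD = re.compile(r"^\s*(\d{2})\s+([A-Z0-9-]+)\s+PIC\s+([X9])\((\d+)\)\.\s*$")

# Hypothetical sample input, invented for this sketch.
SAMPLE = """\
01 CUSTOMER-RECORD.
   05 CUSTOMER-ID    PIC 9(6).
   05 CUSTOMER-NAME  PIC X(30).
"""


def camel(name, upper_first):
    """Turn CUSTOMER-NAME into CustomerName or customerName."""
    parts = [part.capitalize() for part in name.split("-")]
    if not upper_first:
        parts[0] = parts[0].lower()
    return "".join(parts)


def parse_copybook(text):
    """Return (record_name, [(field_name, kind, length), ...])."""
    record, fields = None, []
    for line in text.splitlines():
        if not line.strip():
            continue
        match = RECORD.match(line)
        if match:
            record = match.group(1)
            continue
        match = FIELD.match(line)
        if match is None:
            raise ValueError(f"unsupported copybook line: {line!r}")
        _level, name, kind, length = match.groups()
        fields.append((name, kind, int(length)))
    if record is None:
        raise ValueError("no 01-level record found")
    return record, fields


def generate_java(record, fields):
    """Emit Java source with one public field per copybook field."""
    lines = [f"public class {camel(record, True)} {{"]
    for name, kind, length in fields:
        # Simplification: alphanumeric -> String, numeric -> long.
        java_type = "String" if kind == "X" else "long"
        lines.append(f"    public {java_type} {camel(name, False)}; // width {length}")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(generate_java(*parse_copybook(SAMPLE)))
```

For the sample above, this produces a CustomerRecord class with a long customerId field and a String customerName field. The point of the split between parse_copybook and generate_java is the one made in the text: the rest of the code never sees COBOL syntax, and a changed copybook only requires running the generator again.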

So, just because I talk about these techniques in the context of DSLs shouldn’t stop you from applying them to other problems. Once you’ve got the hang of language processing ideas, there are many ways you can use them.

2.5 DSL Lifecycle

In this opening, I introduced a DSL by first describing a framework and its command-query API, and then layering a DSL on top of the API to make it easier to manipulate. I used this approach because I think it’s easier to understand DSLs that way, but it’s not the only way that people use DSLs in practice.

A common alternative is to define the DSL first. In this mode, you begin with some scenarios and write those scenarios down in the way you’d like the DSL to look. If the language is part of the domain functionality, it’s good to do this with a domain expert—this is a good first step to using the DSL as a communication medium.

Some people like to start with statements that they expect to be syntactically correct. This means that for an internal DSL, they’ll stick to the syntax of the host language. For an external DSL they’ll write statements they are confident they can parse. Others are more informal at the beginning and then take a second pass through the DSL to get it close to a reasonable syntax.

So, doing the state machine in this case, you’d sit down with some people who understand the customers’ needs. You’d come up with a set of example controller behaviors, either based on what people wanted in the past, or on something you think they’ll desire. For each of these, you would try to write them in some DSL form. As you work through various cases, you’ll modify the DSL to support new capabilities. By the end of the exercise, you’ll have worked through a reasonable sample of cases and will have a pseudo-DSL description of each of them.

If you’re using a language workbench, you’ll need to do this stage outside the workbench using a plain text editor, or regular drawing software, or pen and paper.

Once you have a representative set of pseudo-DSLs, you can start implementing them. Implementing here involves designing the state machine model in the host language, the command-query API for the model, the concrete syntax of the DSL, and the translation between the DSL and the command-query API. People do this in different ways. Some might like to do little bits at a time across all these elements: building a little bit of the model, adding the DSL to drive it, and hooking that thread all up with tests. Others might prefer to build and test the framework first and then layer the DSL over it. Yet others might like to get the DSL in place, then build the library, and fit them together. As I’m an incrementalist, I prefer thin slices of end-to-end functionality, so I go with the first of the three.

So I might start with the simplest of the cases that I see. I’d program a library that can support that case, using test-driven development. I’d then take the DSL and implement that, tying it to the framework I’d built. I’d be happy to make some changes to the DSL to make it easier to build, although I would run those changes past the domain expert to ensure we still share a common communication medium. Once I have one controller working, I’d pick the next one. I would evolve the framework and tests first, then evolve the DSL.
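As a rough illustration of what one such thin slice might contain, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the `StateMachine` class, its `add_transition` and `next_state` methods, the `load` function, the line-oriented syntax, and the state and event names. It shows the three pieces from the discussion above in miniature: a model with a command-query API, a concrete syntax, and the translation between the two.

```python
class StateMachine:
    """A minimal model exposing a command-query API."""

    def __init__(self, start):
        self.start = start
        self._transitions = {}

    def add_transition(self, source, event, target):
        """Command: register a transition."""
        self._transitions[(source, event)] = target

    def next_state(self, state, event):
        """Query: unknown events leave the state unchanged."""
        return self._transitions.get((state, event), state)


def load(script):
    """Translate DSL text into calls on the command-query API."""
    machine = None
    for number, raw in enumerate(script.splitlines(), start=1):
        line = raw.split("//", 1)[0].strip()  # drop // comments
        if not line:
            continue
        words = line.split()
        if words[0] == "start" and len(words) == 2:
            machine = StateMachine(words[1])
        elif len(words) == 4 and words[2] == "->" and machine is not None:
            source, event, _arrow, target = words
            machine.add_transition(source, event, target)
        else:
            raise SyntaxError(f"line {number}: cannot parse {raw!r}")
    if machine is None:
        raise SyntaxError("script has no 'start' line")
    return machine


# A hypothetical script for the first, simplest controller.
SCRIPT = """
start idle
idle doorClosed -> active     // the only path we support so far
active lightOn -> unlocked
"""

if __name__ == "__main__":
    controller = load(SCRIPT)
    print(controller.next_state("idle", "doorClosed"))
```

The next controller would then drive the next round of changes: first a new test and whatever the model needs, then whatever the DSL needs to express it.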

This doesn’t mean that the model-first route is a bad one; indeed it’s often an excellent choice. Usually it is used when you don’t think about using a DSL at first, or you’re not sure you’ll need one. You thus build the framework, work with it for a while, and then decide that a DSL would be a useful addition. In this case, you might have a state machine model up and running and used by many customers. You then realize that it’s harder than you’d like to add new customers, so you decide to try a DSL.

Here are a couple of approaches you can use to grow a DSL on top of the model. A language-seeded approach slowly builds the DSL on top of the model, treating the model as a mostly black box. We would start by looking at all the controllers we currently have and sketching out pseudo-DSL for each one. Then we’d implement the DSL scenario by scenario, much as in the earlier case. We usually wouldn’t make any deep changes to the model, although I would be happy to add methods to the model to help support the DSL.

With a model-seeded approach, we’d add fluent methods to the model first, to make it easier to configure the model, and then gradually draw them away into a DSL. This approach is more oriented towards internal DSLs; you can think of it as a heavy refactoring of the model to derive the internal DSL. An appealing aspect to the model-seeded approach is that it’s very gradual, so it doesn’t inflict a notable cost to build the DSL.
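The first step of a model-seeded approach might look something like the following Python sketch. The `StateMachine` class and the method names `state` and `on` are hypothetical, chosen only to show fluent methods sitting alongside an existing command-query API.

```python
class StateMachine:
    """An existing model, with fluent methods added next to its original API."""

    def __init__(self, start):
        self.start = start
        self._transitions = {}
        self._source = None  # state the fluent calls currently refer to

    # --- original command-query API --------------------------------------
    def add_transition(self, source, event, target):
        self._transitions[(source, event)] = target

    def next_state(self, state, event):
        return self._transitions.get((state, event), state)

    # --- fluent methods added for configuration --------------------------
    def state(self, name):
        """Select the source state for the following on() calls."""
        self._source = name
        return self

    def on(self, event, target):
        """Add a transition from the selected state; returns self for chaining."""
        if self._source is None:
            raise ValueError("call state() before on()")
        self.add_transition(self._source, event, target)
        return self


# Configuration before: plain commands.
plain = StateMachine("idle")
plain.add_transition("idle", "doorClosed", "active")
plain.add_transition("active", "lightOn", "unlocked")

# Configuration after: the same machine, written fluently.
fluent = (
    StateMachine("idle")
    .state("idle").on("doorClosed", "active")
    .state("active").on("lightOn", "unlocked")
)
```

Both configurations build the same transitions, so existing callers keep working while new configuration code moves to the fluent style. A later refactoring could then draw `state` and `on` out of the model into a separate builder object, leaving the model with only its command-query API.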

There are many cases, of course, where you don’t even know you have a framework. You might build several controllers and only then realize that there is a lot of common functionality. I’d then refactor the system to create separation between the model and the configuration code. This separation is the vital step. While I might have a DSL in mind while doing it, I’d be more inclined to get the separation done first, before putting the DSL on top.

While I’m here, I should stress something that I wish I didn’t need to. Do make sure all your DSL scripts are kept under some form of version control system. A DSL script becomes part of your code and thus should be under version control just like everything else. The great thing about textual DSLs is that they play well with version control systems, allowing you to keep a clear track of the changes to the behavior of your system.

2.6 What Makes a Good DSL Design?

When people reviewed this book, they often asked for tips on creating a good design for the language. After all, language design is tricky and we want to avoid a proliferation of bad languages. I’d love to have good advice to share, but I confess I don’t have a clear idea in my mind.

The overall goal for a DSL, as with any writing, is clarity for the reader. You want your typical reader, which may be a programmer or a domain expert, to be able to understand what the sentences in the DSL mean, as quickly and clearly as possible. While I don’t feel I can say much about how to do that, I do think it’s valuable to keep that goal in mind as you work.

I’m generally a fan of iterative design, and this is no exception. Try out ideas on your target audience. Be prepared to provide multiple alternatives and see how people react. Getting a good language will involve trying and rejecting lots of missteps. Don’t worry about wrong turns; the more of those you make and correct, the more likely you are to find a good path.

Don’t be afraid to use the jargon of the domain in the DSL and in its Semantic Model. If the users of the DSL are familiar with the jargon, then they should see it in the DSL. Jargon is there to enhance communication within a domain even if it sounds gibberish to those outside.

Do take advantage of the common conventions in your regular life. If everyone uses Java or C#, then use “//” for your comments and “{” and “}” for any hierarchic structures.

One area where I do think you need a specific caution is this: Don’t try to make the DSL read like natural language. There have been various attempts to do that with general-purpose languages, with AppleScript as the most obvious example. The trouble is that such attempts lead to a lot of syntactic sugar which complicates understanding of the semantics. Remember that a DSL is a programming language, so using it should feel like programming, with the greater terseness and precision that programming has compared to a natural language. Trying to make a programming language look like natural language puts your head into the wrong context; when you’re manipulating a program, you must always remember you’re in a programming language environment.
