Preface

Domain-specific languages have been a part of the computing landscape since before I got into programming. Ask an old Unix-hand or Lisp-hand and they’ll happily bore you to tears on how DSLs have been a useful part of their bag of tricks. Despite this, they’ve never become a very visible part of the computing landscape. Most people learn about DSLs from someone else, and they often learn only a limited set of available techniques.

I’ve written this book to try to change this situation. My intention is to introduce you to a wide range of DSL techniques, so that you can make an informed choice about whether to use a DSL in your work and what kinds of DSL techniques to employ.

DSLs are popular for several reasons, but I will highlight the two main ones: improving productivity for developers and improving communication with domain experts. A well-chosen DSL can make it easier to understand a complicated block of code, thus improving the productivity of those working with it. It can also make it easier to communicate with domain experts, by providing a common text that acts as both executable software and a description that domain experts can read to understand how their ideas are represented in a system. This communication with domain experts is a benefit more difficult to achieve, but the resulting gain is much broader because it helps unclog one of the worst bottlenecks in software development—the communication between programmers and their customers.

I should also not overstate the value of DSLs. I frequently say that whenever you’re discussing the benefits, or indeed the problems, of DSLs, you should consider substituting “DSL” with “library.” Much of what you gain with a DSL you can also gain by building a framework. Indeed, most DSLs are merely a thin facade over a library or framework. As a result, the costs and benefits of a DSL are less than people think, but these costs and benefits are not understood as well as they should be. Knowing good techniques reduces the cost of building a DSL considerably—and my hope in this book is to enable that. The facade may be thin, but it is often useful and worth building.
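As a hedged illustration of the "thin facade" idea, the sketch below puts a small internal DSL over an ordinary class. Everything in it is hypothetical: the Pricing class stands in for a library, and pricing_rules, discount, and for_orders_over are names made up for this example rather than anything taken from a real framework.

```ruby
# Hypothetical "library": a plain class with an ordinary API.
class Pricing
  Rule = Struct.new(:threshold, :percent)

  def initialize
    @rules = []
  end

  # Registers a percentage discount for order totals above the threshold.
  def add_rule(threshold, percent)
    @rules << Rule.new(threshold, percent)
    self
  end

  # Applies the largest discount whose threshold the order total exceeds.
  def price_for(total)
    applicable = @rules.select { |rule| total > rule.threshold }
    best = applicable.map(&:percent).max || 0
    total * (100 - best) / 100.0
  end
end

# Hypothetical DSL facade: it only translates readable statements
# into calls on the library above and holds no logic of its own.
class PricingBuilder
  def initialize(pricing)
    @pricing = pricing
  end

  def discount(percent, for_orders_over:)
    @pricing.add_rule(for_orders_over, percent)
  end
end

def pricing_rules(&block)
  pricing = Pricing.new
  PricingBuilder.new(pricing).instance_eval(&block)
  pricing
end

# The same configuration, first through the library API ...
via_library = Pricing.new.add_rule(100, 5).add_rule(500, 10)

# ... and then through the DSL facade.
via_dsl = pricing_rules do
  discount 5,  for_orders_over: 100
  discount 10, for_orders_over: 500
end
```

Both objects behave identically; all the facade contributes is a form that may be easier to read, which is the sense in which its costs and benefits are modest.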

Why Now?

DSLs have been around for ages, yet in recent years they’ve generated a significant uptick in interest. At the same time, I decided to spend a couple of years writing this book. Why? While I don’t know if I can provide a definitive explanation for the general uptick, I can share a personal perspective.

At the turn of the millennium, there was a sense of an overwhelming standardization in programming languages—at least in my world of enterprise software. For a couple of years, Java was The One Future Language, and even when Microsoft challenged that statement with C#, it was still very much a similar language. New development was dominated by compiled, static, OO languages with a C-like syntax. (Even Visual Basic got made to look as close to this as it could.)

But it soon became clear that not everything sat well with this Java/C# hegemony. There were bits of important logic that didn’t fit well with those languages—which led to the rise of XML configuration files. Programmers were soon joking that they were writing more lines of XML than of Java/C#. Partly, this was due to a desire to modify behavior at runtime, but it was also a desire to express aspects of behavior in a more custom way. XML, despite its very noisy syntax, allows you to define your own vocabulary and provides a strong hierarchic structure.

But the noise of XML ended up being too much. People complained of angle brackets hurting their eyes. There was a desire to get the benefits of XML config files without the cost of XML.

Now our narrative reaches the mid-noughties and the explosive appearance of Ruby on Rails. Whatever Rails’ place is as a practical platform (and I think it’s a good one), it’s had a huge impact on how people think about library and framework design. A big part of the modus operandi of the Ruby community is a more fluent approach, trying to make interacting with a library feel like programming in a specialized language. This is a strand of thinking that goes back to one of the oldest programming languages, Lisp. This approach also saw flowerings in what you would think of as the stony ground of Java/C#: Both languages have seen fluent interfaces become more popular, probably due to the lasting influence of the original creators of JMock and Hamcrest.

As I looked at all of this, I felt a sense of a knowledge gap. I saw people using XML where a custom syntax would be more readable and not harder to do. I saw people bending Ruby into complicated contortions when a custom syntax would be easier. I saw people playing around with parsers when a fluent interface in their regular language would be a lot less work.

My hypothesis is that these things are happening because of a knowledge gap. Skilled programmers don’t know enough about DSL techniques to make an informed decision about which ones to use. That’s the kind of gap I enjoy trying to fill.

Why Are DSLs Important?

I’ll talk about this in more detail in “Why Use a DSL?,” p. 33, but I see two primary reasons why you should be interested in DSLs (and thus the techniques in this book).

The first reason is to improve programmer productivity. Consider this fragment of code:

input =~ /\d{3}-\d{3}-\d{4}/

You may recognize it as a regular expression match, and probably you know what it’s matching. Regular expressions are often criticized for being cryptic, but think of how you would write this pattern match if all you could use were regular control code. How easy would it be to understand and modify that code, compared to a regular expression?
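To make the comparison concrete, here is a rough sketch of the same check written with plain control code, alongside the regular expression. It assumes the intended pattern is three digits, a hyphen, three digits, a hyphen, and four digits, found anywhere in the input; the method names digit?, phone_like_at?, and contains_phone_like? are invented for this illustration.

```ruby
# Returns true if ch is a single ASCII digit (nil means "past the end").
def digit?(ch)
  !ch.nil? && ch >= "0" && ch <= "9"
end

# Checks whether a ddd-ddd-dddd shape starts at index `start`.
def phone_like_at?(input, start)
  pos = start
  [3, 3, 4].each_with_index do |run_length, group|
    # Every group after the first must be preceded by a hyphen.
    if group > 0
      return false unless input[pos] == "-"
      pos += 1
    end
    run_length.times do
      return false unless digit?(input[pos])
      pos += 1
    end
  end
  true
end

# Tries every starting position, as an unanchored regexp match would.
def contains_phone_like?(input)
  (0...input.length).any? { |i| phone_like_at?(input, i) }
end

# The regular expression version of the same check.
def contains_phone_like_regexp?(input)
  !(input =~ /\d{3}-\d{3}-\d{4}/).nil?
end
```

The hand-written version runs to a couple of dozen lines and hides the shape of the pattern inside loops and index arithmetic, while the regular expression states that shape directly. Changing the pattern, say to allow a space instead of a hyphen, is a one-character edit in the expression but a logic change in the control code.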

DSLs are very good at taking certain narrow parts of programming and making them easier to understand and therefore quicker to write, quicker to modify, and less likely to breed bugs.

The second reason for valuing DSLs goes beyond programmers. Since DSLs are smaller and easier to understand, they allow nonprogrammers to see the code that drives important parts of their business. By exposing the real code to the people who understand the domain, you enable a much richer communication channel between programmers and their customers.

When people talk about this kind of thing, they often say that DSLs will allow you to get rid of programmers. I’m extremely skeptical of that argument; after all, it was said of COBOL. Although there certainly are languages, such as CSS, written by people who don’t call themselves programmers, it’s the reading that matters more than the writing. If a domain expert can read, and mostly understand, the code that drives a key part of her business, then she can communicate in a much more detailed fashion with the programmer who actually types in the code.

This second reason for using DSLs isn’t easy to achieve. But the rewards are worth the effort. Communication between programmers and their customers is the biggest bottleneck in software development, so any technique that can address it is worth its weight in single malts.

Don’t Be Frightened by the Size of This Book

The thickness of this book may be a bit intimidating to you; it certainly makes me gulp to see how much there is here. I’m wary of big books, because I know we all only have so much time to read—so a big book is a big investment of time (which is much more valuable than the cover price). Therefore, I’ve used a format that I prefer in cases like this: a duplex book.

A duplex book is really two books under one cover. The first book is a narrative book, designed to be read cover to cover. My aim with the narrative book is to provide a brief overview of the topic, enough to get a broad understanding but not to do any detailed work. My target for a narrative section is no more than 150 pages, so it is a manageable amount to read.

The second, and larger, book is reference material, which is designed not to be read cover to cover (although some people do) but instead to be dipped into when needed. Some people like to read the narrative first to get a broad overview of the subject and then dive into those bits of the reference section that interest them. Others like to dive into the interesting parts of the reference section as they work through the narrative. The purpose of the split is for me to give you an idea of what’s skippable and what isn’t—then you can choose when you wish to skip and when you want to delve deeper.

I’ve also tried to make the reference bits reasonably self-standing, so if you want someone to use Tree Construction you can tell them to read just that pattern and get a good idea of what to do, even if their memory of the narrative is a little hazy. This way, once you’ve absorbed the narrative overview, it becomes a reference book that’s handy to grab when you need to look up some details.

The main reason the book is so large is that I haven’t figured out how to make it shorter. One of my primary aims in this book is to provide a resource that explores the breadth of different techniques available for DSLs. There are books out there that talk about code generation, or Ruby metaprogramming, or using Parser Generator tools. With this book, I want to sweep across all these techniques so that you can better understand their similarities and differences. They all play a role in a broader landscape, and my aim here is to provide a tour of that landscape while giving you enough detail to get started with the techniques I’m talking about.

What You’ll Learn

I’ve designed this book as a wide-ranging guide on different kinds of DSLs and the approaches to building them. Often, when people start experimenting with DSLs, they pick up only one technique. The point of this book is to show you a broad variety of techniques, so that you can evaluate which one is the best for your circumstances. I’ve provided details and examples on how to implement many of these techniques. Naturally, I cannot show you everything you can do, but there is enough to get you started and help you through the early decisions.

The early chapters should give you a good idea of what a DSL is, when DSLs come in useful, and what their role is compared to a framework or library. The implementation chapters will give you a broad start in how to build external and internal DSLs. The external DSL material will show you the role of a parser, the usefulness of a Parser Generator, and different ways of using a parser to parse an external DSL. The internal DSL section will show you how to think about the various language constructs you can use in a DSL style. While this won’t tell you how to best use your particular language, it will help you understand how techniques in one language correspond to those in others.

The code generation section will outline different strategies for code generation, should you need to use it. The language workbench chapter is a very brief overview of a new generation of tools. For most of this book I concentrate on techniques that have been used for decades; language workbenches are more of a future technique that is promising but unproven.

Who Should Read This Book?

My primary target audience for this book is professional software developers who are considering building a DSL. I imagine such a reader as someone with at least a couple of years of programming experience and thus comfortable with the basic ideas of software design.

If you’re deeply involved in language design, you probably won’t find much new in this book in terms of material. What I hope you will find useful is the approach I’ve used for organizing and communicating this information. Although there is a huge amount of work done in language design, particularly in academia, very little of this makes its way into the professional programming world.

The first couple of chapters of the narrative section should also be useful to anyone wondering what a DSL is and why it may be worth using. Reading the full narrative section will provide an overview of the various implementation techniques to use.

Is This a Java Book or a C# Book?

As with most books I write, the ideas here are pretty much independent of programming language. One of my top priorities is to uncover general principles and patterns that can be used with whatever programming language you happen to be using. As such, the ideas in the book should be valuable to you if you are using any kind of modern OO language.

One potential language gap here is functional languages. While I think much of this book will still be relevant, I don’t have enough experience in functional languages to really know to what extent their programming paradigm would alter the advice here. The book is also somewhat limited for procedural languages (i.e., non-OO languages like C) because several of the techniques I describe rely on object orientation.

Although I am writing about general principles here, in order to describe them properly I believe I need to show examples, which require a particular programming language to be written in. In choosing a language for examples, my primary criterion is how widely read the language is. As a result, almost all examples in this book are in Java or C#. Both are widely used in the industry; both have a familiar C-like syntax, memory management, and libraries that remove many awkward contortions. I am not claiming that these are the best languages to write DSLs in (in particular, because I don’t think they are), but they are the best languages to help communicate the general concepts I’m describing. I’ve tried to use both languages pretty much equally, tipping the balance only when one of them made things a bit easier. I’ve also tried to avoid elements of the language that require too much knowledge of the syntax, although that’s a difficult tradeoff since a good use of internal DSLs often involves exploiting syntactic quirks.

There are a few ideas which absolutely require a dynamic language and thus cannot be illustrated in Java or C#. In those cases I’ve turned to Ruby since it’s the dynamic language I’m most familiar with. It also helps that it’s well-suited to writing DSLs with. Again, despite my personal familiarity and considerable liking of the language, you should not infer that these techniques are not applicable elsewhere. I enjoy Ruby a lot, but the only way you can get my language bigotry to become evident is by dissing Smalltalk.

I should mention that there are many other languages for which DSLs are appropriate, including many that are specially designed to make it easier to write internal DSLs. I don’t mention them here because I haven’t done enough work with them to feel confident about pontificating on them. You should not interpret that as any negative opinion on them.

In particular, one of the difficult things about trying to write a language-independent book on DSLs is that the usefulness of many techniques depends very directly on the features of a particular language. You should always be aware of the fact that your language environment can severely change the tradeoffs compared to the broad generalizations I have to make.

What’s Missing

One of the most frustrating parts of writing a book like this is the moment when I realize that I have to stop. I’ve put a couple of years of work into writing this, and I believe I have a lot of useful material for you to read. But I’m also conscious of the many gaps that remain. They are all gaps I’d like to fill, but doing so would take a significant amount of time. My belief is that it’s better to have an incomplete published book than wait years for a complete book—if a complete book is even possible. So here I mention the main gaps that I could see but didn’t have time to cover.

I’ve already alluded to one of these—the role of functional languages. There is a strong history of DSL construction in modern functional languages based on ML and/or Haskell—and I’ve pretty much ignored this work in my book. It’s an interesting question how much a familiarity with functional languages and their DSL usage would affect the structure of the material in this book.

Perhaps the most frustrating gap for me is the lack of a decent discussion of diagnostics and error handling. I remember being taught at university how the truly hard part of compiler writing is diagnostics—and thus I realize I’m glossing over a considerable topic by not covering it properly here.

My favorite section of this book is the section on alternative computational models. There is so much more I could write about here, but again, time was my enemy. In the end I decided I’d have to do with fewer alternative computational models than I would like; hopefully there’s still enough to inspire you to explore some more.

The Reference Book

While the narrative book is a pretty normal structure, I feel I need to talk a bit more about the structure of the reference section. I’ve divided the reference section into a series of topics grouped into chapters to keep similar topics together. My aim was that each topic should generally be self-standing—once you’ve read the narrative, you should be able to dive into a particular topic for more detail without looking into other topics. Where there are exceptions, I mention that at the start of the corresponding topic.

The majority of the topics are written as patterns. The focus of a pattern is a common solution to a recurring problem. So if a common problem is “How do I structure my parser?”, two possible patterns for the solution are Delimiter-Directed Translation and Syntax-Directed Translation.
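As a rough illustration of the first of these, the sketch below takes the name at face value: it assumes Delimiter-Directed Translation means breaking the input on a delimiter (here, line endings) and translating each chunk independently. The input format, the score keyword, and the translate method are all hypothetical and chosen only to keep the example small.

```ruby
# Hypothetical line-oriented input: each statement is "score <name> <points>".
SCRIPT = <<~TEXT
  # loyalty program rules
  score flight 10
  score hotel 5

  score rental 3
TEXT

# Splits the text on the line delimiter and translates each chunk on its own.
def translate(text)
  scores = {}
  text.each_line.with_index(1) do |line, number|
    line = line.strip
    next if line.empty? || line.start_with?("#") # skip blanks and comments
    case line
    when /\Ascore\s+(\w+)\s+(\d+)\z/
      scores[$1] = $2.to_i
    else
      raise ArgumentError, "line #{number}: cannot parse #{line.inspect}"
    end
  end
  scores
end
```

Calling translate(SCRIPT) builds a hash that maps each name to its points. Because every line is handled in isolation, this style stays simple only while statements do not nest or span lines, which is the kind of tradeoff a "when to use it" discussion would weigh against a grammar-driven parser.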

There’s been a lot written about patterns in software development in the last twenty years or so, and different authors have different views on them. For me, patterns are useful because they provide a good way of structuring a reference section like this. The narrative will tell you that if you want to parse text, these two patterns are likely candidates; the patterns themselves will give you more information on selecting one and enough to get you started on implementing it.

Although I’ve written most of the reference section using a pattern structure, I haven’t used it for every case. Not all of the reference topics felt like solutions to me. With some topics, such as Nested Operator Expression, a solution didn’t really seem to be the focus of the topic, and the topic didn’t fit the structure I’m using for patterns; so in these cases, I didn’t use a pattern-style description. There are other cases that are hard to call patterns, such as Macro or BNF, but using the pattern structure seemed like a good way to describe them. On the whole, I’ve been guided by whether the pattern structure, in particular the separation of “how it works” and “when to use,” seems to work for the concept I’m describing.

Pattern Structure

Most authors use some kind of standard template when writing about patterns. I’m no exception, both in using a standard template and in having one that’s different from everyone else’s. My template, or pattern form, is the one I first used in P of EAA [Fowler PoEAA]. It has the following form.

Perhaps the most important element is the name. One of the biggest reasons I like using patterns as my reference topics is that it helps create a strong vocabulary to discuss the subject. There’s no guarantee that this vocabulary will be widely used, but at least it encourages me to be consistent in my own writing, while giving others a starting point should they wish to use it.

The next two elements are the intent and sketch. They are there to briefly summarize the pattern. They are a reminder of the pattern, so if you already “have the pattern” but don’t know the name, they can jog your memory. The intent is a sentence or two of text, while the sketch is something more visual. Sometimes I use a diagram for the sketch, sometimes a brief code example: whatever I think will quickly convey the essence of the pattern. When I use a diagram, I sometimes use UML, but am quite happy to use something else if I think it will convey the meaning more easily.

Next comes a slightly longer summary, usually around a motivating example. This is a couple of paragraphs, and again is there to help people get an overview before diving into the details.

The two main body sections of the pattern are How it works and When to use it. The ordering of the two is somewhat arbitrary; if you’re trying to decide whether to use a pattern, you may only want to read the “when” section. Often, however, the “when” section doesn’t make much sense without knowing how it works.

The last sections are examples. Although I do my best to explain how a pattern works in the “how” section, often you need an example, with code, to really get the point. Code examples are dangerous, however, because they show only one application of the pattern, and some people may think it’s that application that is the pattern, rather than the general concept. You can use the same pattern a hundred times, making it a little different every time, but I only have limited space and energy for examples. So, always remember that the pattern is much more than the particular example shows.

All of the examples are deliberately very simple, focused only on the pattern in question. I use simple, independent examples because they match my goal of making each reference chapter independent of others. Naturally, there’ll be a host of other issues to deal with when you apply the pattern to your circumstances, but with a simple example I feel you at least have a chance of understanding the core point. Richer examples can be more realistic, but they would force you to deal with a bunch of issues extraneous to the pattern you are studying. So my aim is to show you the pieces, but leave to you the challenge of assembling them together for your particular needs.

This also means that my primary aim in the code is understandability. I’ve not taken into account performance issues, error handling, or other things that distract from the pattern’s essence.

I try to avoid code that I think is hard to follow, even if it’s more idiomatic for the language I’m using. This is a particularly awkward balance for internal DSLs that often rely on obscure language tricks in order to enhance the flow of the language.

Many patterns will miss out a section or two if I feel there isn’t anything compelling to put into that section. Some patterns don’t have examples because the best examples are in other patterns—when that happens, I do try to point them out.

Acknowledgments

As usual when I write a book, there are a lot of other people who have done a great deal to help make the book happen. While my name may be on it, there are many other people who greatly improved its quality.

My first thanks go to my colleague Rebecca Parsons. One of my concerns about writing a book on this topic has been delving into an area with a great deal of academic background that I’m seriously under-aware of. Rebecca has been a huge help here, since she has a strong background in language theory. On top of that, she’s one of our leading technical troubleshooters and strategists, so she combines the academic background with a lot of practical experience. She would have liked, and is certainly qualified, to play a bigger role in this book, but ThoughtWorks find her far too useful. I’m glad for the many hours of talks she’s been able to give me.

When it comes to reviewers, an author always hopes for (and, kind of, dreads) the reviewer who goes through everything and finds tons of problems, both small and large. I’ve been lucky to find Michael Hunger who has played this role remarkably well. From the earliest days this book appeared on my website, he’s been pummeling me with my errors and how to fix them—and believe me, that’s a pummeling I need. Just as importantly, Michael has played a big role in pushing me to describe techniques utilizing static typing, particularly with respect to statically typed Symbol Tables. He has made tons of further suggestions, which would take another two books to do justice to; I hope to see these ideas explored in the future.

Over the last couple of years, I’ve given tutorials on this material in conjunction with my colleagues Rebecca Parsons, Neal Ford, and Ola Bini. Besides giving these tutorials, they’ve done much to shape the ideas in them and in this book, leading me to steal quite a few thoughts.

ThoughtWorks have generously given me a great deal of time to write this book. After spending so much of my life determined to never work for a company, I’m glad to have found a company that makes me want to stay and actively play a role in building it.

I’ve had a strong group of official reviewers who have gone through this book, found errors, and suggested improvements:

David Bock

Gilad Bracha

Aino Corry

Sven Efftinge

Eric Evans

Jay Fields

Steve Freeman

Brian Goetz

Steve Hayes

Clifford Heath

Michael Hunger

David Ing

Jeremy Miller

Ravi Mohan

Terence Parr

Nat Pryce

Chris Sells

Nathaniel Schutta

Craig Taverner

Dave Thomas

Glenn Vanderburg

A small but important thank you is due to David Ing who suggested the title for a Zoo of DSLs.

One of the nice things about being a series editor is that I’ve acquired a really good team of authors who are an outstanding sounding board for questions and ideas. Of these, I particularly want to thank Elliotte Rusty Harold for his wonderfully detailed comments and review.

Many of my colleagues at ThoughtWorks have acted as sources for ideas. I want to thank everyone who has let me poke around in projects over the last few years. I see far more ideas than I can write about, and I really enjoy having such a rich seam to mine from.

Several people made useful comments on the Safari Books Online roughcut, which I managed to make use of before we went to print: Pavel Bernhauser, Mocky, Roman Yakovenko, tdyer.

My thanks to those at Pearson who published this book. Greg Doench was the acquisition editor who looked after the overall process of publishing the book. John Fuller was the managing editor who oversaw the production.

Dmitry Kirsanov turned my sloppy English into something worthy of a book. Alina Kirsanova composed the book into the layout you now see and produced the Index.
