Chapter 45. Textual Polishing

Perform simple textual substitutions before more serious processing.

3 hours ago => 3.hours.ago

Internal DSLs are often easier to develop, particularly if you’re not comfortable with parsing. However, the resulting DSLs contain host language artifacts that can be awkward for nonprogrammers to read.

Textual Polishing uses a series of simple regular expression substitutions to smooth some of these out.

45.1 How It Works

Textual Polishing is a very simple technique. It involves running a series of text substitutions on the DSL script before it gets to the parser. For example, if readers find the use of dots for method calls off-putting, a simple substitution of dots for spaces can turn 3 hours ago into 3.hours.ago. More involved patterns can turn 3% into percentage(3). The output of the Textual Polishing is an expression in an internal DSL.
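As a rough sketch of those two substitutions in Ruby (the helper name polish and both patterns are assumptions made for illustration):

```ruby
# Hypothetical helper: applies two illustrative substitutions in order.
def polish(script)
  script
    .gsub(/\b(\d+)%/, 'percentage(\1)') # 3% -> percentage(3)
    .strip
    .gsub(/\s+/, '.')                   # 3 hours ago -> 3.hours.ago
end

puts polish('3 hours ago') # prints 3.hours.ago
puts polish('3%')          # prints percentage(3)
```

The result is just a string. Whether it is valid internal DSL code depends on the host language providing hours, ago, and percentage.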

Specifying the polishing is a simple matter of writing a sequence of regular expression substitutions—which most language environments support. The tricky thing, of course, is getting the regular expressions correct so you don’t get unwanted substitutions. A space in a quoted string probably should not be turned into a dot, but that makes the regex much harder to write.
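To illustrate the quoted-string problem, here is a minimal sketch. The method names and the label call are hypothetical, and the quote-aware version only understands simple double-quoted strings without escapes.

```ruby
# Naive polishing: every run of whitespace becomes a dot,
# including whitespace inside quoted strings.
def naive_polish(line)
  line.strip.gsub(/\s+/, '.')
end

# Quote-aware polishing: the pattern matches either a whole
# double-quoted string or a run of whitespace. Quoted strings are
# returned unchanged; only bare whitespace is replaced.
def quote_aware_polish(line)
  line.strip.gsub(/"[^"]*"|\s+/) { |m| m.start_with?('"') ? m : '.' }
end

line = 'label("big order") percent(3)'
puts naive_polish(line)       # prints label("big.order").percent(3)
puts quote_aware_polish(line) # prints label("big order").percent(3)
```

Even this small fix roughly doubles the complexity of the pattern, and it still ignores escaped quotes and single-quoted strings.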

I’ve seen Textual Polishing most often in dynamic languages, where you can evaluate text at runtime. Here, the language reads in the DSL expression, polishes it, then evaluates the resulting internal DSL code. You can, however, also do this with a static language. In this case, you’d run the polishing before compiling the DSL script—which does introduce another step into the build process.
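In a dynamic language, the read, polish, evaluate sequence might be sketched as follows. ScriptContext, polish, and evaluate are hypothetical names, and the single substitution is only an example.

```ruby
# Hypothetical evaluation context: its methods form the internal DSL.
class ScriptContext
  def percentage(n)
    Rational(n, 100)
  end
end

# Hypothetical polishing step: 3% -> percentage(3)
def polish(script)
  script.gsub(/\b(\d+)%/, 'percentage(\1)')
end

# Polish the text, then evaluate the result against the context object.
def evaluate(script)
  ScriptContext.new.instance_eval(polish(script))
end

p evaluate('3%') # prints (3/100)
```

Since instance_eval executes whatever Ruby the polished text contains, this approach is only appropriate for scripts from trusted authors.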

While Textual Polishing is mostly an internal DSL technique, there are a few cases where it can be useful with external DSLs. When certain things are hard to spot with the usual lexer and parser chain, a preprocessing step of Textual Polishing before lexing can make things easier. Semantic indentation and possibly semantic newlines are examples.
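As one possible illustration of polishing for semantic indentation, the sketch below rewrites indentation changes as explicit block markers that a conventional lexer can tokenize. The method name and the choice of { and } as markers are assumptions. It expects spaces for indentation and does not detect inconsistent dedents.

```ruby
# Converts indentation changes into explicit open/close marker lines.
def mark_indentation(source, open: '{', close: '}')
  depths = [0] # stack of indentation widths currently open
  out = []
  source.each_line do |raw|
    line = raw.chomp
    next if line.strip.empty?          # blank lines carry no structure
    indent = line[/\A */].size
    if indent > depths.last            # deeper: open a block
      depths.push(indent)
      out << open
    end
    while indent < depths.last         # shallower: close blocks
      depths.pop
      out << close
    end
    out << line.strip
  end
  out.concat([close] * (depths.size - 1)) # close blocks still open at EOF
  out.join("\n")
end

puts mark_indentation("a\n  b\n  c\nd\n")
```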

You can think of Textual Polishing as a simple application of textual Macros, with all the corresponding problems.

45.2 When to Use It

I confess I’m rather wary of Textual Polishing; my feeling is that if you use a little, it doesn’t help much, and if you use it a lot, it gets very complicated, so it may then be better to use an external DSL. Although the basic notion of repeated substitutions is simple, it’s very easy to make mistakes in the regular expressions.

Textual Polishing cannot do anything to change the syntactic structure of the input, so you are still tied to the basic syntactic structure of the host language. Indeed, I think it’s important to keep the prepolished DSL and the resulting internal DSL expressions recognizably similar. The resulting internal DSL should be as clear as possible for programmers to read—the polishing is only a visual convenience for nonprogrammers.

If you find the noise characters in an internal DSL annoying, an alternative approach to Textual Polishing is to use an editor that supports syntax coloring and set it up to color the noise characters with a very gentle color that fades into the background. That way, a reader’s eye is more likely to skip over them. If you set it to the same color as the background, you make these characters disappear completely.

If you find yourself doing a lot of polishing, I strongly suggest that you explore using an external DSL instead. Once you get up the learning curve of writing a parser, you’ll get much more flexibility, and it will be easier to maintain the parser than the sequence of polishing steps.

45.3 Polished Discount Rules (Ruby)

Consider an application that processes discount rules against orders. A simple discount rule might be to discount the price by 3% if the order’s value is greater than $30,000. To capture that phrase in a Ruby internal DSL, I might use an expression like this:

rule = DiscountBuilder.percent(3).when.minimum(30000).content

Not too bad, but still a bit awkward for nonprogrammers. Some of the awkwardness I remove by using object scoping. If I can put the expressions as lines in a separate file, I can use Ruby’s instance_eval (a form of Object Scoping) to evaluate each line.

[Code listing not reproduced in this copy.]

Then my rules file can have lines like this:

percent(3).when.minimum(30000)

With this technique, I also move the call of content (the Method Chaining end method) to the processing code, which gets it out of the user-visible part of the DSL. The check builder_has_rule? is needed because the processing code evaluates every line, and if a line is a comment, there won’t be a rule defined. Similarly, if the rule is malformed, there’ll be errors, but I’ll neglect handling that for this example.
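A minimal sketch of what such processing code might look like is shown here. Only percent, when, minimum, content, builder_has_rule?, and instance_eval come from the description above. The DiscountRule struct, the process_rule_file name, and the builder internals are assumptions.

```ruby
# Hypothetical stand-in for the rule object the builder produces.
DiscountRule = Struct.new(:percent, :minimum)

class DiscountBuilder
  def percent(n)
    @percent = n
    self
  end

  # Purely a connective; "when" is legal as a method name after "def"
  # and after a dot.
  def when
    self
  end

  def minimum(amount)
    @minimum = amount
    self
  end

  # Method Chaining end method, called by the processing code.
  def content
    DiscountRule.new(@percent, @minimum)
  end

  # False for comment or blank lines, which define no rule.
  def builder_has_rule?
    !@percent.nil?
  end

  # Evaluates each line of the rules text against a fresh builder.
  def self.process_rule_file(text)
    rules = []
    text.each_line do |line|
      builder = DiscountBuilder.new
      builder.instance_eval(line)
      rules << builder.content if builder.builder_has_rule?
    end
    rules
  end
end

rules = DiscountBuilder.process_rule_file(<<~RULES)
  # discounts for large orders
  percent(3).when.minimum(30000)
RULES
p rules.size # prints 1
```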

This may be good for programmers, but domain experts may prefer a different formulation—something like this:

3% if value at least $30000

I can get this formulation into the above DSL by using Textual Polishing. The polishing is a series of textual substitutions.

[Code listing not reproduced in this copy.]
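One way to sketch such a series of substitutions is an ordered table of pattern and replacement pairs applied one after another. The constant name, the polish method, and all four patterns are assumptions based on the steps described in this example.

```ruby
# Ordered list of [pattern, replacement] pairs (illustrative only).
POLISHING_STEPS = [
  [/\b(\d+)%\s+/, 'percent(\1) '],                      # 3% -> percent(3)
  [/\bvalue\s+at\s+least\s+\$?(\d+)\b/, 'minimum(\1)'], # value at least $N
  [/\bif\b/, 'when'],                                   # if -> when
  [/\s+/, '.']                                          # spaces -> dots
].freeze

# Applies every step in order, feeding each result to the next step.
def polish(line)
  POLISHING_STEPS.reduce(line.strip) do |text, (pattern, replacement)|
    text.gsub(pattern, replacement)
  end
end

puts polish('3% if value at least $30000')
# prints percent(3).when.minimum(30000)
```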

The first transformation is to turn 3% into percent(3).

[Code listing not reproduced in this copy.]

This is the basic approach: Make a suitable regex, match it, and replace it with the call that you need in the actual internal DSL.

In this example, I’m expecting the various elements to be separated by whitespace, just as I would when tokenizing an external DSL. As a result, it’s valuable to ensure that all of the regexes have boundary expressions at both ends. In most cases, this boundary is \b (word boundary), but occasionally I need something else (such as \s+ here since “%” doesn’t constitute a word boundary).
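A minimal sketch of the percent substitution with these boundaries follows. The constant and method names are assumptions.

```ruby
# \b before the digits, \s+ after the "%" sign.
PERCENT = /\b(\d+)%\s+/

def process_percent(text)
  text.gsub(PERCENT, 'percent(\1) ')
end

puts process_percent('3% if value at least $30000')
# prints percent(3) if value at least $30000

# "%" and the space after it are both non-word characters, so there is
# no word boundary between them and a trailing \b would never match.
p '3% if'.match?(/\b(\d+)%\b/) # prints false
```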

The “at least” is handled the same way, albeit with a more complicated regex.

[Code listing not reproduced in this copy.]
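The "at least" phrase could be handled along these lines. The pattern and the names are assumptions: the sketch accepts an optional dollar sign and any amount of whitespace between the words.

```ruby
AT_LEAST = /\bvalue\s+at\s+least\s+\$?(\d+)\b/

def process_at_least(text)
  text.gsub(AT_LEAST, 'minimum(\1)')
end

puts process_at_least('percent(3) if value at least $30000')
# prints percent(3) if minimum(30000)

# An unwanted substitution: a thousands separator ends the digit run
# early, so the amount is silently truncated.
puts process_at_least('percent(3) if value at least $30,000')
# prints percent(3) if minimum(30),000
```

The second call shows how easily a plausible input slips past a pattern that looked correct for the first one.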

Our domain expert prefers “if” to “when.” In an unpolished internal DSL, this is a problem because it’s a Ruby keyword, but polishing can fix that.

[Code listing not reproduced in this copy.]
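A sketch of this substitution, with word boundaries on both sides so that only the standalone word is replaced (the method name is an assumption):

```ruby
def process_if(text)
  text.gsub(/\bif\b/, 'when')
end

puts process_if('percent(3) if minimum(30000)')
# prints percent(3) when minimum(30000)

# The "if" inside "gift" is left alone because it has no word boundary.
puts process_if('gift if minimum(10)')
# prints gift when minimum(10)
```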

An alternative here is to rename the when method to something like my_if or _if. Doing this makes it easier to see the correspondence between the polished text and the resulting DSL.

My last step is to replace the spaces with method call dots, and the result will now be valid Ruby in my internal DSL.

[Code listing not reproduced in this copy.]
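The space replacement might be sketched as below (names and patterns are assumptions). The sketch also shows why this step has to come last: the earlier patterns rely on whitespace between words, so replacing spaces first stops them from matching.

```ruby
AT_LEAST = /\bvalue\s+at\s+least\s+\$?(\d+)\b/

# Trims the line, then turns each run of whitespace into a dot.
def replace_spaces(text)
  text.strip.gsub(/\s+/, '.')
end

puts replace_spaces('percent(3) when minimum(30000)')
# prints percent(3).when.minimum(30000)

# Wrong order: once the spaces are gone, the "at least" pattern
# no longer finds its phrase and leaves the text unpolished.
dotted_first = replace_spaces('3% if value at least $30000')
puts dotted_first.gsub(AT_LEAST, 'minimum(\1)')
# prints 3%.if.value.at.least.$30000
```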

This doesn’t look too bad, but the code is only enough to process this one particular example. To handle more cases, the code will have to get more complex and much more ugly. So in this case, I’d be keeping a careful eye on it, ready to reach for an external DSL to use instead.
