Chapter 12. DSL implementation challenges

In this chapter

  • Scaling DSL implementations
  • Deploying DSL scripts in production
  • Treating code as data
  • Creating user-extensible languages

The first part of this book dealt with building a DSL from scratch, and the second part with testing, documenting, versioning, and presenting the DSL. Between the two, there are still some gaps—details that are important for creating successful DSLs that we couldn’t look at before you gained some experience building DSLs in Boo.

In this chapter, we’re going to look at some of the more interesting challenges for DSLs, and in the next chapter we’ll make use of many of the solutions outlined in this chapter to build a real-world DSL.


Note

Many of the topics in this chapter involve AST manipulation, which we covered in chapter 6. As a quick reminder, the abstract syntax tree (AST) is how the compiler represents the code text internally. Boo allows you to access and modify the AST, and these modifications will affect the generated code, producing the result that you want.


12.1. Scaling DSL usage

A common goal for DSLs is to create a more transparent environment, one in which it’s clear what the system is doing and why. It usually works beautifully in the demo, and as long as the system remains small. Where it often fails is when the system reaches a critical mass, when it has hundreds or thousands of scripts working together to produce the appropriate results. In one such system I worked on, we had well over 15,000 scripts; in another, we had close to 4,000. As you can imagine, we ran into several problems with these systems that we hadn’t seen when they were small.

A lot of the solutions in this chapter resulted from working in that type of scenario. They’re applicable and valuable in smaller-scale languages, but their value will truly become apparent after your system has become successful and sees a lot of use.

In general, the challenges (and their solutions) can be divided into several general areas, as shown in figure 12.1:

  • Technical— Technical problems include startup time, response time, memory usage, and cold restart time (the time it takes to execute a script or a set of scripts the very first time). This generally covers the system not being ready for the scale of the problems it’s given.
  • Deployment— Unlike most code, DSL scripts are expected to be changed in production quite often. Doing so safely requires some prior planning.
  • Transparency— In a big system, understanding why the system performed a particular operation can be decidedly nontrivial.
  • Clarity— In a big system, it’s often hard to make sure that you’re expressing yourself clearly. As the problems grow more complex, you need to match them with more complex solutions, and clarity is lost if you don’t use that as an opportunity to reach a higher level of expressiveness and understanding.
Figure 12.1. What it takes to scale a DSL

All the challenges we will deal with in this chapter can be assigned to one of these areas.

12.1.1. Technical—managing large numbers of scripts

Let’s assume we have a system with several thousand scripts. What kind of issues are we going to run into?

  • Startup time— Compilation time will be a significant factor, because the scripts will be compiled on the fly. Yes, we’re caching the compiled assembly, but the startup time can be significant in such a system.
  • Memory— There will almost certainly be a significant number of changes to the scripts, which will have to be recompiled, loading more assemblies into memory. Because we can’t unload individual assemblies on the CLR, we have to either accept this issue or use AppDomains to manage it.

Of the two, startup performance is the more worrying concern. An assembly isn’t that big most of the time, and even a hundred changed scripts would add less than half a megabyte to the application. It’s not ideal, but it’s something we can live with. The startup time can be measured in minutes if you have enough scripts.


Note

In this scenario, a large number of scripts is several hundred that need to be accessed immediately after system startup. If you have fewer scripts than that, or if not all the scripts need to be accessed shortly after system startup, you generally won’t have to worry about startup performance issues.


There are two factors that affect the speed of compiling a set of scripts: the number of files and the number of times the compiler is invoked.

In general, it’s best to invoke the compiler as few times as possible, with as many scripts as possible in each invocation. This tends to produce fewer, bigger assemblies, which is preferable to smaller, but more numerous, ones. There is also a cost to invoking the compiler itself, but it’s usually only significant if you invoke the compiler for each individual script rather than compiling in batches.

It’s a fine balancing act. On the one hand, it’s slower to compile more files in one invocation, but it’s even slower to compile each script independently. The solution chosen for Rhino DSL (and described in chapter 7) is to compile all the scripts in the target script directory (though you can override this if you choose). This approach compiles scripts in batches but doesn’t attempt to compile all the scripts in one go, balancing the need to keep the system responsive against overall system performance.

But even that’s not a good enough solution in some cases. There are situations where you must compile a large number of scripts quickly. You have several options in that scenario. You can live with the performance hit, perform precompilation, or perform background compilation.

12.1.2. Performing precompilation

Although I tend to refer to DSL code as “scripts” to express how flexible they usually are, there is no real reason to treat them as such. In a mostly static system, you could easily compile all the scripts as part of the build process, and be done with them.

Listing 12.1 shows how you could compile a set of scripts ahead of time. We will discuss its usage immediately.

Listing 12.1. Precompiling a directory of scripts
public class DslPreCompiler
{
    public static void PreCompile(DslEngine engine,
        string directory, string destinationFile)
    {
        var allFiles = FileHelper.GetAllFilesRecursive(directory, "*.boo");
        // Compile all the files
        var compilerContext = engine.Compile(allFiles);
        // Copy the generated assembly to the destination
        File.Copy(compilerContext.GeneratedAssemblyFileName,
            destinationFile, true);
    }
}

As you can see, the DslPreCompiler takes a DSL engine and a directory of scripts, and it produces the compiled output.

But that only deals with the first part of the problem, compiling all the scripts in one go. How do you tell Rhino DSL that it should look for those scripts in the assembly? You can override the cache behavior to look at the compiled assembly file, as shown in listing 12.2.

Listing 12.2. Overriding the cache behavior
public class PrecompiledCache : IDslEngineCache
{
    private readonly Assembly assembly;

    public PrecompiledCache(Assembly assembly)
    {
        this.assembly = assembly;
    }

    public Type Get(string path)
    {
        var type = assembly.GetType(
            Path.GetFileNameWithoutExtension(path));
        if (type != null)
            return type;
        throw new InvalidOperationException("Could not find " + path +
            " in the precompiled assembly");
    }

    // Other methods omitted for brevity's sake; they all
    // throw NotSupportedException
}

In this example, the precompiled cache always goes to the assembly to find its types, and the PrecompiledCache intentionally throws an error if the type isn’t found. That’s done to keep the implementation simple. A more complex implementation would also handle the case where the scripts are modified from the precompiled version.
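
Such a fuller implementation might prefer the precompiled assembly but fall back to on-demand compilation for any script modified after the assembly was built. Here’s a minimal sketch of that idea; the FallbackPrecompiledCache name and the timestamp comparison are my own, and it assumes the engine treats a null return from Get() as a cache miss and compiles the script itself.

public class FallbackPrecompiledCache : IDslEngineCache
{
    private readonly Assembly assembly;
    private readonly DateTime compiledAt;

    public FallbackPrecompiledCache(string assemblyPath)
    {
        assembly = Assembly.LoadFrom(assemblyPath);
        compiledAt = File.GetLastWriteTimeUtc(assemblyPath);
    }

    public Type Get(string path)
    {
        // A script edited after precompilation wins over the baked-in
        // type; returning null signals a miss (an assumption about the
        // cache contract), letting the engine compile it on demand
        if (File.Exists(path) &&
            File.GetLastWriteTimeUtc(path) > compiledAt)
            return null;
        return assembly.GetType(Path.GetFileNameWithoutExtension(path));
    }

    // Other methods omitted; a fuller implementation would store
    // freshly compiled types in an in-memory dictionary
}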

Now all you have to do is execute the precompilation:

DslPreCompiler.PreCompile(
    new QuoteGenerationDslEngine(), scriptDirectory, "test.dll");

Then you need to let the DSL engine know about the cache, as in listing 12.3.

Listing 12.3. Letting the DSL engine know about the precompiled cache
var factory = new DslFactory();
factory.Register<QuoteGeneratorRule>(new QuoteGenerationDslEngine
{
    Cache = new PrecompiledCache(Assembly.LoadFrom("test.dll"))
});
var rule = factory.Create<QuoteGeneratorRule>(
    Path.Combine(scriptDirectory, "sample.boo"),
    new RequirementsInformation(50));

As you can see, this is pretty easy and noninvasive. All requests for a DSL instance will be satisfied from the precompiled assembly.

12.1.3. Compiling in the background

Rhino DSL caches the results of compiling a script, so all you need to do to perform background compilation is create a thread that will ask for instances of all the scripts. This will cause all the scripts to be compiled before the first request, saving you the compilation time.

Rhino DSL is thread-safe, so you can spin off a thread that would request an instance of all the scripts. A simple example is shown in listing 12.4.

Listing 12.4. Compiling all the scripts in a background thread
var factory = new DslFactory();
factory.Register<QuoteGeneratorRule>(new QuoteGenerationDslEngine());
ThreadPool.QueueUserWorkItem(delegate
{
    var allFiles = FileHelper.GetAllFilesRecursive(
        scriptDirectory, "*.boo");
    foreach (var file in allFiles)
    {
        factory.Create<QuoteGeneratorRule>(file,
            new RequirementsInformation(50));
    }
});

Because the scripts are cached once they’re compiled, all you have to do is request them once and, presto, they’re cached.


Note

By default, Rhino DSL caches compiled scripts in memory only, so the cache doesn’t survive application restarts. Take that into account when you consider system performance.


12.1.4. Managing assembly leaks

I already mentioned that an assembly isn’t a resource that can be freed easily. If you want to unload assemblies, you must unload the entire AppDomain that contains them. Assembly leaks will only be an issue if you expect a large number of changes in production between application restarts. Again, a large number, in this context, is thousands of changes occurring regularly.

This is a common issue for development, but system uptime in development is rarely long enough for this to be a problem. In production, this situation occurs much less commonly than it’s talked about, so I’ll merely identify the solutions for this problem rather than demonstrate them in detail.

The first option, as I’ve mentioned, is not dealing with the issue. This is a valid response if you don’t expect to have many changes in production. It’s what I usually recommend doing.

The second option is to move the DSL to a separate AppDomain and perform all communication with the DSL instance over an AppDomain boundary. This has its own performance issues, and it requires that all the objects you send to the DSL instance either be serializable or inherit from MarshalByRefObject. It’s not a solution that I like much.

The third option is to move much of the application itself to an AppDomain and rely on a small core to manage those AppDomains. This way, at given times, the application core could spin off a new AppDomain and unload the old one, freeing the memory. This approach is similar to the way IIS and Windows Process Activation work, and I suggest taking a look at them as well, because it’s entirely possible to make use of the existing infrastructure for this purpose.
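
To make that third option concrete, here’s a rough sketch of what the small core might do. The ScriptRunner and RecyclingHost names are hypothetical; the essential moves are creating a fresh AppDomain, proxying calls into it through a MarshalByRefObject, and unloading the old domain to reclaim every assembly it loaded.

public class ScriptRunner : MarshalByRefObject
{
    // All DSL compilation and execution happens inside this type,
    // in the child AppDomain
    public void ExecuteRule(string scriptPath) { /* compile and run */ }
}

public class RecyclingHost
{
    private AppDomain current;

    public ScriptRunner Recycle()
    {
        var fresh = AppDomain.CreateDomain("dsl-" + Guid.NewGuid());
        var runner = (ScriptRunner)fresh.CreateInstanceAndUnwrap(
            typeof(ScriptRunner).Assembly.FullName,
            typeof(ScriptRunner).FullName);
        if (current != null)
            AppDomain.Unload(current); // frees every assembly it loaded
        current = fresh;
        return runner;
    }
}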

As you can see, the technical challenges are easily overcome using well-known technical solutions, such as background compilation, caching, and precompilation.

A more complex problem isn’t strictly technical in nature: deployment. We’ve looked at the implications of editing scripts in production from the technical side, but not from the control and management sides.

12.2. Deployment—strategies for editing DSL scripts in production

Once the application is in production, how are users supposed to edit the DSL scripts? It’s likely that some important behaviors are defined in the DSL and that users will want to make modifications to them. This is a question that comes up often. Yes, the DSL is easy to change, but how should you deal with changes that affect production?

Let’s first assume we’re talking about a web application or a backend system, not a Windows application. There are several aspects to this problem. First, there is the practical matter of creating some sort of UI to allow users to make the changes. This is generally not something trivial to produce as part of the admin section of an application.

There are also many other challenges in this scenario that need to be dealt with, such as handling frequent changes, cascading updates, debugging, invasive execution, error handling, and so on. Just making sure that all the scripts are synchronized across all the nodes in a web farm can be a nontrivial task. You also have to deal with issues such as providing auditing information, identifying who did what, why, and when, and you need to be able to safely roll back a change.

In development mode, there is no issue, because you can afford to be unsafe there. The worst thing that can happen is that you’ll need to restart the application or revert to a clean checkout. For production, this isn’t an option. This is not a simple matter.

My approach, usually, is to avoid this requirement as much as possible. I don’t allow such changes to be made in production. It’s still possible, but it’s a manual process that’s there for emergency use only. Like the ability to log in to the production database and run queries, it should be avoided if possible.

But disallowing changes isn’t always possible. If the client needs the ability to edit DSL scripts in production, you need to provide a way for them to do so. What I have found useful is to avoid working directly on production. Instead, we work on scripts stored in a source control server that is considered part of the application itself. You can see how this works in figure 12.2.

Figure 12.2. Using a source control server to hold the scripts for the system offers big benefits, such as authorization, auditing, and distribution.

If you want to access the scripts, you check them out of source control. Then you can edit them with any tool you want (often the same tools that you use during development), and finish by committing them back to the repository. The application monitors the repository and will update itself when a commit is done to the production branch.
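
The monitoring piece doesn’t need to be sophisticated. A background timer that polls for a new revision and updates the working copy is enough. In this sketch, the sourceControl client and its two operations are assumptions, standing in for whatever API your source control server exposes:

int lastDeployedRevision = 0;
var timer = new System.Threading.Timer(_ =>
{
    // Hypothetical client API; substitute your server's equivalent
    int latest = sourceControl.GetLatestRevision("production");
    if (latest == lastDeployedRevision)
        return; // nothing new was committed

    sourceControl.UpdateWorkingCopy(scriptDirectory);
    lastDeployedRevision = latest;
    // Assuming the DSL engine invalidates its cache for changed
    // files, the updated scripts are recompiled on the next request
}, null, TimeSpan.Zero, TimeSpan.FromSeconds(30));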

This approach has several big advantages: you don’t have the problem of partial updates, you get a good audit trail, and you have built-in reversibility. In addition, you avoid the whole problem of having to build a UI for editing the production scripts; you use the same tools that you used during development.

As a side benefit, this approach also takes care of pushing script changes to a farm, so you don’t have to provide a separate solution for that. And yes, this method of editing scripts in production incorporates continuous integration as part of the application.

12.3. Ensuring system transparency

Ensuring that the system is transparent (that it’s possible to understand what goes on in the bowels of the system) can be hard work, particularly if you don’t have a good grasp of the system’s internals. In this section, we’ll discuss the tooling that you have available to make the system transparent.

We’ll also look at a new DSL that will allow us to discuss those patterns in greater depth. Chapter 13 discusses the implementation of the DSL in detail; this is a short overview.

12.3.1. Introducing transparency to the Order-Processing DSL

The Order-Processing DSL is used to define business rules regarding (surprise!) order processing. The syntax of the Order-Processing DSL is as follows:

when someBusinessCondition:
    takeAction1
    takeAction2

Listing 12.5 shows an example of an Order-Processing script.

Listing 12.5. An example of the Order-Processing DSL
when customer.IsPreferred:
    apply_discount 5.percent()

With that knowledge, let’s take on our first challenge. Users want to know which rules took an active part in processing a specific order. If there is a problem, that would make it much easier to track down: users would know which rules executed, in what order, and what the result of each execution was. With that feature in mind, let’s go about implementing it.

When I built the Order-Processing DSL, I chose to use when, rather than if, for the common conditional. That’s because when makes for a clearer distinction between business conditions and code conditions. if is a keyword, and when isn’t, but it’s possible to make a keyword out of when using a meta-method or a macro.

In most scenarios, I prefer to use a meta-method (though there is no real objective reason). Listing 12.6 shows one possible implementation of the when() meta-method.

Listing 12.6. The when() meta-method
[Meta]
public static MethodInvocationExpression when(
    Expression condition, BlockExpression action)
{
    var conditionBlock = new Block(condition.LexicalInfo);
    conditionBlock.Add(new ReturnStatement(condition));
    return new MethodInvocationExpression(
        // The init method
        new ReferenceExpression("Initialize"),
        // A delegate with the condition
        new BlockExpression(conditionBlock),
        // A delegate with the action
        action
    );
}

The when() meta-method decomposes the when keyword into its component parts and sends each part to the Initialize() method.

The when() meta-method shown in listing 12.6 will transform the code in listing 12.5 into the code shown in listing 12.7 (translated to C# to make it easier to understand).

Listing 12.7. The when() meta-method output (translated to C#)
Initialize(
    delegate { return customer.IsPreferred; },
    delegate
    {
        apply_discount(5.percent());
    });

We have decomposed the code using the when keyword into two parts, the conditions and the action. Now we can implement the OrderRule implicit base class, as shown in listing 12.8.


Note

The when keyword implementation is split between the when() meta-method, which captures the appropriate information at compile time, and the Execute() method, which executes the code. The Initialize() method is called from the code generated by the when() meta-method, which is placed in the Prepare() method. We’ll walk through this in chapter 13, so don’t worry if you don’t fully understand it yet.


Listing 12.8. The OrderRule implementation of conditional semantics
public void Initialize(Func<bool> condition, Action action)
{
    Condition = condition;
    Action = action;
}

public void Execute()
{
    // Execute the condition
    var result = Condition();
    // Log the result of this condition
    RuleContext.AddConditionResult(this, result);
    if (result) // Execute the action if the condition evaluates to true
        Action();
}

Using this approach, we can create what is effectively an auditable if statement. The only thing we have left to do is to display the results of the execution to the user. The when() meta-method generates the code to call the Initialize() method with the condition and actions wrapped in delegates. We later execute those delegates when calling the Execute method and then record the result of the condition.
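
The RuleContext we call into isn’t shown in this chapter. A minimal version only needs to collect the condition results so they can be displayed later; here’s a sketch, assuming in-memory collection is enough (a real implementation would likely scope the results to a single processing run rather than keep them in a static list):

public static class RuleContext
{
    private static readonly List<string> results = new List<string>();

    public static void AddConditionResult(OrderRule rule, bool result)
    {
        // Record which rule ran and what its condition evaluated to
        results.Add(rule.GetType().Name +
            ": condition evaluated to " + result);
    }

    public static IEnumerable<string> Results
    {
        get { return results; }
    }
}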

That’s not quite the end of adding transparency to a DSL. It would be nice if we could go directly from a script instance to its source. That would allow us to show the user the source of each rule, so they can understand what it does.

12.3.2. Capturing the script filename

We want to capture a script’s filename so we can display it. The question is, how? We have the filename of the script during compilation, so what we need to do is record this in the script in a manner that will allow us to access it at runtime.

One way of doing this is to pass the filename to the Initialize() method. The new implementation is shown in listing 12.9.

Listing 12.9. Recording the filename
[Meta]
public static MethodInvocationExpression when(
    Expression condition, BlockExpression action)
{
    var conditionBlock = new Block(condition.LexicalInfo);
    conditionBlock.Add(new ReturnStatement(condition));
    return new MethodInvocationExpression(
        // The init method
        new ReferenceExpression("Initialize"),
        // A delegate with the condition
        new BlockExpression(conditionBlock),
        // A delegate with the action
        action,
        new StringLiteralExpression(condition.LexicalInfo.FileName)
    );
}

public void Initialize(Func<bool> condition, Action action,
    string filename)
{
    Filename = filename;
    Condition = condition;
    Action = action;
}

That was pretty easy, wasn’t it? We could also go with more complex semantics, using a compiler step to add a property to all the scripts, as sketched below. But the solution in listing 12.9 is short and simple, and it works, so there’s no reason to go there yet.
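
For comparison, the compiler-step approach would look roughly like this: a step that visits each generated class and adds a method returning the module’s filename. This is a sketch only, and the GetScriptFilename name is my own invention.

public class AddFilenameCompilerStep : AbstractTransformerCompilerStep
{
    public override void Run()
    {
        Visit(CompileUnit);
    }

    public override void OnClassDefinition(ClassDefinition node)
    {
        // Adds the equivalent of:
        //   public virtual string GetScriptFilename():
        //       return "<the script's filename>"
        var method = new Method
        {
            Name = "GetScriptFilename",
            Modifiers = TypeMemberModifiers.Public |
                TypeMemberModifiers.Virtual
        };
        method.Body.Add(new ReturnStatement(
            new StringLiteralExpression(node.LexicalInfo.FileName)));
        node.Members.Add(method);
        base.OnClassDefinition(node);
    }
}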

Because we now have access to the filename of the script at runtime, we can show the user which script it is, and let them immediately see what caused a particular decision. But that might be too cumbersome for some scenarios. We don’t necessarily want the entire script—we only want the condition. How do we get that?

12.3.3. Accessing the code at runtime

Let’s take a deeper look at the problem. Assume the condition in use is the code in listing 12.10.

Listing 12.10. Sample code in the Order-Processing DSL
when Order.Amount > 10:
    print "big sale!"

We’d like to give users the following information: “Because ‘Order.Amount > 10’ evaluated to true, executing the rule action.” The problem is how to get the string that represents the rule.

It turns out it’s simple to do. We ask the compiler nicely, as shown in listing 12.11.

Listing 12.11. Providing the Initialize() method with the condition string
[Meta]
public static MethodInvocationExpression when(
    Expression condition, BlockExpression action)
{
    var conditionBlock = new Block(condition.LexicalInfo);
    conditionBlock.Add(new ReturnStatement(condition));
    return new MethodInvocationExpression(
        // The init method
        new ReferenceExpression("Initialize"),
        // A delegate with the condition
        new BlockExpression(conditionBlock),
        // A delegate with the action
        action,
        new StringLiteralExpression(condition.LexicalInfo.FileName),
        // Translate the code into a string and turn that into a
        // string literal, so we can pass it as a parameter to the
        // Initialize method
        new StringLiteralExpression(condition.ToCodeString())
    );
}

public void Initialize(Func<bool> condition, Action action,
    string filename, string conditionString)
{
    Filename = filename;
    Condition = condition;
    Action = action;
    ConditionString = conditionString;
}

The important part here happens in the when() meta-method. We translate the call to the when keyword into a call to the Initialize() method. We’re passing the arguments that we got, and also a string literal with the code that was extracted from the relevant expression.

Our ability to play around with the AST has allowed us to handle this scenario gracefully. We can now access the condition as a string at runtime through the ConditionString property, and we can display that information to the user, log it, or store it as part of the audit process.
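
For example, given a rule instance and the result of its condition, producing the message we described earlier is now a one-liner (rule and result stand for whatever your execution code has in hand):

Console.WriteLine("Because '{0}' evaluated to {1}, {2} the rule action.",
    rule.ConditionString, result, result ? "executing" : "skipping");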

Strings are good if you need the information for display purposes, but if you want to process a piece of code at runtime, and you want to get access to the code object model, strings aren’t useful. For that, there’s another solution.

12.3.4. Processing the AST at runtime

We don’t have to do anything complex in order to implement the ability to inspect the compiler AST at runtime. As we saw in chapter 6, when we talked about quasi-quotation, Boo already has the facilities to take a compiler’s AST and translate that into the code that would re-create the originating AST.

This makes our task a lot easier, because we can utilize this functionality instead of trying to roll our own (been there, done that, wasn’t fun). Listing 12.12 shows what we need to do.


Revisiting serialized AST nodes

The code in listing 12.12 contains something we haven’t looked at so far: a call to CompilationHelper.RevisitSerializedExpression. What is this, and why is it needed?

Each step in the compiler pipeline adds additional information to the AST, which is required by the following steps. Meta-methods are executed fairly late in the compiler pipeline, after some of the steps required to compile serialized AST have been run.

We solve the problem of wanting to serialize the AST too late in the pipeline by re-running those earlier steps on the serialized node, which allows the compilation process to continue successfully.


Listing 12.12. Serializing the condition expression and passing it to Initialize()
[Meta]
public static MethodInvocationExpression when(
    Expression condition, BlockExpression action)
{
    // Translate the expression to code that will re-create
    // this expression at runtime
    Expression serializedCondition =
        new CodeSerializer().Serialize(condition);
    // Revisit the condition to ensure proper compilation
    CompilationHelper.RevisitSerializedExpression(serializedCondition);

    var conditionBlock = new Block(condition.LexicalInfo);
    conditionBlock.Add(new ReturnStatement(condition));
    return new MethodInvocationExpression(
        new ReferenceExpression("Initialize"),
        new BlockExpression(conditionBlock),
        action,
        new StringLiteralExpression(condition.LexicalInfo.FileName),
        new StringLiteralExpression(condition.ToCodeString()),
        serializedCondition
    );
}

public void Initialize(Func<bool> condition, Action action,
    string filename, string conditionString,
    Expression conditionExpression)
{
    Filename = filename;
    Condition = condition;
    Action = action;
    ConditionString = conditionString;
    ConditionExpression = conditionExpression;
}

Using this code, we can take the script in listing 12.10 and get the results shown in figure 12.3.

Figure 12.3. Executing the code in listing 12.12 on the script in listing 12.10 allows access to the ConditionExpression property as an AST node at runtime.

It displays both the compiled expression and the AST that describes it. This is critically important, because you can now take this piece of AST and do transformations or views on it.

There are a lot of things that you can do with the AST once you have it. In fact, this is how LINQ works (see sidebar on the similarities between LINQ and the AST). A simple enough example is that you could take the condition AST and translate it to some graphical representation. But we covered the UI exhaustively in chapter 10; let’s explore a more interesting use for processing the AST at runtime.
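
As a small taste of such processing, here’s a sketch of a visitor that renders a condition expression as an indented tree, the kind of structure a graphical view could be built on top of. It uses the same Visit(Node) override that we’ll see again in listing 12.16.

public class ConditionTreePrinter : DepthFirstVisitor
{
    private readonly TextWriter output;
    private int depth;

    public ConditionTreePrinter(TextWriter output)
    {
        this.output = output;
    }

    public override bool Visit(Node node)
    {
        if (node == null)
            return false;
        // Print each node indented according to its depth in the tree
        output.WriteLine("{0}{1}: {2}",
            new string(' ', depth * 2), node.GetType().Name, node);
        depth++;
        var result = base.Visit(node);
        depth--;
        return result;
    }
}

You would invoke it with something like new ConditionTreePrinter(Console.Out).Visit(rule.ConditionExpression).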

12.4. Changing runtime behavior based on AST information

One of the more annoying problems when building an internal DSL is that you have to deal with code-related issues, such as the NullReferenceException.

For example, let’s say that we have the following order rule:

when Order.Amount > 10 and Customer.IsPreferred:
    ApplyDiscount 5.percent

We have a problem with this rule because we also support an anonymous-checkout mode in which a customer can create an order without registering on the site. In that mode, the Customer property is null, and trying to access the IsPreferred property will throw a NullReferenceException.


Similarities between LINQ and the AST

You might have noticed that there is a remarkable similarity between LINQ’s expression trees (Expression<T>) and the AST. Indeed, an expression tree is an AST that’s limited to expressions only. You can use the AST to do everything that you do with expression trees, and usually do it the same way.

If you have any prior knowledge of expression trees, it’ll be applicable to AST manipulation, and if you understand the Boo AST, that knowledge is transferable to working with expression trees.


We could rewrite the rule to avoid the exception, like this:

when Order.Amount > 10 and Customer is not null and Customer.IsPreferred:
    ApplyDiscount 5.percent

But I think this is extremely ugly. The business meaning of the code gets lost in the technical details. We could also decide to return a default instance of the customer when using anonymous checkout (using the Null Object pattern), but let’s look at another way to handle this.

We can define the rule as invalid when there is no customer: a rule that references Customer shouldn’t run when Customer is null. The dirty way to hack this is shown in listing 12.13.

Listing 12.13. Hacking around the NullReferenceException
var referencesCustomer = File.ReadAllText(ruleName).Contains("Customer");
if (referencesCustomer && Customer == null)
    return;

If you grimaced when looking at this code, that’s a good sign. Let’s solve this properly, without hacks.

First, we already have help from the compiler because we have access to the condition expression (as shown in listing 12.12). We can utilize this to make decisions at runtime. In this case, we’ll use this to detect when we’re referencing a null property and mark the rule as invalid. You can see this in listing 12.14.

Listing 12.14. Deciding whether to execute the rule based on the Customer property
public void Execute()
{
    var visitor = new ReferenceAggregatorVisitor();
    visitor.Visit(ConditionExpression);
    if (visitor.References.Contains("Customer") && Customer == null)
        return; // Rule invalid

    bool result = Condition(); // Execute the condition
    RuleContext.AddConditionResult(this, result);
    if (result) // Execute the action if the condition evaluates to true
        Action();
}

The ReferenceAggregatorVisitor that’s used in listing 12.14 is shown in listing 12.15.

Listing 12.15. ReferenceAggregatorVisitor finds references in a piece of code
public class ReferenceAggregatorVisitor : DepthFirstVisitor
{
    public IList<string> References = new List<string>();

    public override void OnReferenceExpression(ReferenceExpression node)
    {
        References.Add(node.Name);
        base.OnReferenceExpression(node);
    }
}

This is a simple example of how you can add smarts to the way your code behaves, and this technique is the foundation for a whole host of options. I use a similar approach for adaptive rules and for more complex auditable actions.

12.5. Data mining your scripts

Working with the AST doesn’t just mean dealing with compiler transformations. There is a lot of information in the AST that you can use in ways you may find surprising.

For example, suppose you have a set of DSL scripts, and you want to see what users are using the DSL for. You could try to read them all, but it’s much more feasible (and interesting) to have the compiler do it for you, because this opens up more options.

Take a look at the DumpExpressionsToDatabaseVisitor in listing 12.16. It extracts all the information from a script, breaks it down to the expression and statement level, and puts it in a database.

Listing 12.16. Extracting expressions from a script for use in data mining
public class DumpExpressionsToDatabaseVisitor : DepthFirstVisitor
{
    private readonly string connectionString;

    public DumpExpressionsToDatabaseVisitor(string connectionString)
    {
        this.connectionString = connectionString;
    }

    public override bool Visit(Node node)
    {
        using (var con = new SqlConnection(connectionString))
        using (var command = con.CreateCommand())
        {
            con.Open();
            command.CommandText = @"
                INSERT INTO Expressions (Expression, File)
                VALUES (@Expr, @File)";
            command.Parameters.AddWithValue("@Expr", node.ToString());
            command.Parameters.AddWithValue("@File",
                node.LexicalInfo.FileName);
            command.ExecuteNonQuery();
        }
        Console.WriteLine(node);
        return base.Visit(node);
    }
}

You can make use of the DumpExpressionsToDatabaseVisitor by executing it over all your scripts, as shown in listing 12.17.

Listing 12.17. Extracting expression information from all scripts in a directory
foreach (var file in Directory.GetFiles(scriptDirectory, "*.boo"))
{
    var compileUnit = BooParser.ParseFile(file);
    new DumpExpressionsToDatabaseVisitor(connectionString)
        .Visit(compileUnit);
}

Note that this is disposable code, written for a single purpose and with the intention of being thrown out after it’s used. But why am I suggesting this?

Well, what would happen if you ran this code on your entire DSL code base and started applying metrics to it? You could query your code structure using SQL, like this:

select count(*), Expression from Expressions
group by Expression
order by count(*) desc

This will find all the repeated idioms in your DSL, which will give you a good idea about where you could help your users by giving them better ways of expressing themselves.

For example, let’s say you found that this expression was repeated many times:

user.IsPreferred and order.Total > 500 and 
(order.PaymentMethod is Cash or not user.IsHighRisk)

This is a good indication that a business concept is waiting to be discovered here. You could turn that into a part of your language, with something like this:

IsGoodDealForVendor

Here we aren’t interested in the usual code-quality metrics; we’re interested in business-quality metrics. Getting this information is easy, and it’ll ensure that you can respond and modify the language based on user actions and input.

12.6. Creating DSLs that span multiple files

It’s common to think about each DSL script independently, but this is a mistake. We need to consider the environment in which the DSL lives. In the same way that we rarely consider code files independently, we shouldn’t consider scripts independently.

For example, in the Message-Routing DSL we might have a rule like this:

priority 10
when msg is NewOrder and msg.Amount > 10000:
    dispatch_to "orders@strategic"

While building our DSL, our focus might be mainly on the actual language and syntax we want. But we may need to perform additional actions beyond dispatching the message, such as logging all strategic messages. As you can see, we can easily add this to the DSL:

priority 10
when msg is NewOrder and msg.Amount > 10000:
    log "strategic messages", msg
    dispatch_to "orders@strategic"

But this is a violation of the single responsibility principle, and we care about such things with a DSL just as much as we do with code. So we could leave the original snippet alone and add another script:

when destination.Contains("strategic"):
    log "strategic messages", msg

Now the behavior of the system is split across several files, and it’s the responsibility of the DSL engine to deal with this appropriately. One way to arrange this would be to use the folder structure shown in figure 12.4.

Figure 12.4. A suggested folder structure for the Message-Routing DSL

The DSL engine can tell from the information in the message that it needs to execute only the routing rules in /routes/orders, and it can execute the after actions without getting the routing scripts tied to different concerns.
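
A sketch of how the engine might combine the two dialects at runtime follows. The msg.Kind property and the RoutingRule and AfterAction base classes are assumptions for illustration, and CreateAll stands for resolving every script in a folder through the DslFactory:

public void Handle(Message msg)
{
    // Pick the routing folder from the message, e.g. routes/orders
    var routesFolder = Path.Combine("routes", msg.Kind);
    foreach (var rule in factory.CreateAll<RoutingRule>(routesFolder))
        rule.Execute(msg);

    // The 'after' dialect runs separately, keeping the routing
    // scripts free of logging and similar concerns
    foreach (var action in factory.CreateAll<AfterAction>("actions"))
        action.Execute(msg);
}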

If you want to be a stickler, we’re dealing with two dialects that are bound to the same DSL engine. In this case, they’re similar to one another, but that doesn’t have to be the case.

Multifile DSLs don’t have to combine the execution of several scripts for the different dialects; they could also be used to execute different scripts in the same dialect. Consider the possible folder structure shown in figure 12.5.

Figure 12.5. A convention-based folder structure for the Message-Routing DSL

In this case, the DSL is based on the idea of executing things in reverse depth order. When a message arrives, we try to match it to the deepest scope possible (in this case, handling strategic customers), and we go up until we reach the root.
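
Here’s a sketch of that reverse-depth lookup. The folder names and message properties are again assumptions, and I’ve made the first scope with any matching scripts win; running every scope on the way up to the root would be an equally valid design.

// Candidate folders, from the deepest scope up to the root
var candidates = new[]
{
    Path.Combine("messages", msg.Kind, msg.CustomerCategory),
    Path.Combine("messages", msg.Kind),
    "messages"
};
foreach (var folder in candidates)
{
    var handlers = factory.CreateAll<MessageHandler>(folder);
    if (handlers.Length == 0)
        continue; // nothing at this scope; go up one level
    foreach (var handler in handlers)
        handler.Execute(msg);
    break; // the most specific scope wins
}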

Nevertheless, this is still just another way of bringing several scripts of the same DSL together, albeit in a fairly interesting way. In section 12.8, we’ll deal with a single DSL that’s built of several files, each of them belonging to a different DSL implementation. We still have a bit to cover before going there, though.

For now, remember that when you’re designing and building a DSL, thinking about a single file is the easiest way to go, but you ought to consider the entire environment when you make decisions. A language that doesn’t support separation of concerns (the process of separating a computer program into distinct features that overlap in functionality as little as possible, discussed in detail at http://en.wikipedia.org/wiki/Separation_of_concerns) is bound to get extremely brittle quickly.

12.7. Creating DSLs that span multiple languages

Another common misconception about DSLs is that you can only have a single language in a DSL. We’ve already looked at including several dialects of a language in a DSL for versioning purposes, but that’s not what I’m talking about here. What I have in mind is a single DSL composed of several different languages, each with its own purpose. Each language implementation contributes toward the larger collective DSL.

Conceptually, those different languages compose a single DSL with a single language that has different syntax for different parts of the system. From the implementation perspective, we are talking about different DSLs with a coordinator that understands how they all work and how to combine them.

As a simple example, let’s consider the Message-Routing DSL again. Currently it is used for two separate purposes: message routing, and translating the message from an external representation to its internal one. This works for now, but it’s likely to cause problems when the system grows. How will we handle a single message arriving at multiple endpoints and needing to be dispatched to several endpoints, but having to go through the same transformation?

A good solution would be to split the functionality of translating messages from that of the message routing. We’d have a DSL for translating messages, and another for routing messages, and both would work in concert to achieve their goals.

I know that this is a bit abstract, but the next section will go into all the implementation details you could want.

12.8. Creating user-extensible languages

Developers aren’t the only ones who can extend languages. You can build a language that allows users to extend it without requiring any complex knowledge on their part.

Once you release a language into the hands of your users, it will take a while before you can release a new version. That means that if users don’t have a way to solve a problem right now, they will brute-force a solution. By giving users the ability to extend the language themselves (and remember, we’re talking about business users here, not developers), you reduce the complexity that will show up in your DSL scripts, and with it the maintenance overhead.

12.8.1. The basics of user-extensible languages

Suppose we wanted to build a DSL to handle order management. Here’s a typical scenario:

when is_preferred_user and order_amount > 500:
    apply_discount 5.percent

This rule sounds reasonable, right? Except that it isn’t a good example of a business rule. A business rule usually has a lot more complexity to it.

Listing 12.18 shows a more realistic example.

Listing 12.18. A typical order-management business rule
when user.payment_method is credit_card and
        ((order_amount > 500 and order_amount < 1200)
            or number_of_payments < 4) and
        user.is_member_of("weekend buy club") and
        Now.DayOfWeek in (DayOfWeek.Sunday, DayOfWeek.Saturday) and
        applied_discounts < 10:
    apply_discount 5.percent

At a glance, it’s hard to understand exactly what this rule does. This is a good example of a stagnant DSL. It’s no longer being actively developed (easily seen by the use of framework terms such as DayOfWeek, which in most cases you’ll want to abstract away), and the complexity is growing unchecked.

Usually, this happens when the DSL design has not taken into account new functionality, or when it relies on the DSL developers to extend the language when needed. Because it requires developer involvement, it’s often easier for users to solve problems by creating complex conditionals rather than expressing the business logic in a readable way.

A good way to avoid additional complexity and reduced abstraction is to incorporate known best practices from the development side in your DSLs, such as allowing encapsulation and extensibility.

As a simple example, we could allow the user to define conditions, as shown in listing 12.19. Those conditions allow users to define their own abstraction, which then becomes part of the language.

Listing 12.19. Defining business conditions in a user-editable script
define weekend_club_member:
    user.is_member_of("weekend club member")

define sale_on_weekend:
    Now.DayOfWeek in (DayOfWeek.Sunday, DayOfWeek.Saturday)

define good_payment_option: # Yes, I know, it is a bad name, sorry
    ((order_amount > 500 and order_amount < 1200)
        or number_of_payments < 4)

Now the condition becomes much simpler, as shown in listing 12.20.

Listing 12.20. With user-defined abstractions, the rule becomes much simpler
when user.payment_method is credit_card and good_payment_option
        and sale_on_weekend and weekend_club_member and
        applied_discounts < 10:
    apply_discount 5.percent

This is important, because it avoids language rot and allows the end user to add abstraction levels. But how can we implement this?


Build the facilities for abstractions

If you want to create a language that’s both usable and maintainable, you have to give users the facilities to create and use abstractions. This is a common theme in this chapter.

Just because the code that users write is a DSL script doesn’t mean that good design guidelines should be ignored. The separation of concerns principle and the DRY (Don’t Repeat Yourself) principle are important even here.

We can ignore them to some degree when we’re building our languages, because in languages that aren’t aimed at developers, too much abstraction can make the language inaccessible. But we can’t truly disregard them.


We’ll do it by integrating another DSL into our DSL. You’ve already seen that DSL (the Business-Condition DSL) in listing 12.19—it’s focused on capturing business conditions, and it works alongside the Order-Processing DSL to give users a good way of capturing common conditions and abstracting them.

And now, let’s get into the implementation details.

12.8.2. Creating the Business-Condition DSL

The first thing you need to know about the Business-Condition DSL is that it’s hardly a DSL. In fact, we’ll cheat our way out of implementing it.

The problem is simple. If we wanted to make this a full-blown DSL, we’d have to deal with quite a bit of complexity in making sure that we map the concepts from the Business-Condition DSL to the Order-Processing DSL. Instead, we’re going to simply capture the business condition and transplant it whole into the Order-Processing DSL.

Listing 12.21 shows how we can extract the business conditions from a set of definition files that follow the format shown in listing 12.19.

Listing 12.21. The Business-Condition DSL extracts definitions from the file
public class Define
{
    private Expression expression;

    public Expression Expression
    {
        get { return expression == null ? null : expression.CloneNode(); }
        set { expression = value == null ? null : value.CloneNode(); }
    }

    public string Name { get; set; }
}

public class BusinessConditionDslEngine
{
    private readonly string path;

    public BusinessConditionDslEngine(string path)
    {
        this.path = path;
    }

    public IEnumerable<Define> GetAllDefines()
    {
        foreach (var definition in
            FileHelper.GetAllFilesRecursive(path, "*.define"))
        {
            var compileUnit = BooParser.ParseFile(definition);

            foreach (MacroStatement defineStatement in
                compileUnit.Modules[0].Globals.Statements)
            {
                string name = ((ReferenceExpression)
                    defineStatement.Arguments[0]).Name;
                var statement = ((ExpressionStatement)
                    defineStatement.Block.Statements[0]);
                Expression expression = statement.Expression;

                yield return new Define
                {
                    Name = name,
                    Expression = expression
                };
            }
        }
    }
}

What is going on here? The Define class is trivial, but it has a subtle gotcha that you should be aware of. You must not share an AST node (expression or statement) if you intend to reuse it afterward. AST nodes are mutable, and the compilation process will change them. When you need to share an AST node, always share a copy of the node.

The code in the GetAllDefines() method in listing 12.21 is a lot more interesting, though. Here we use the Boo parser to get the AST from the file, and then we walk through the AST and extract the information that we want.

The AST is structured as a compilation unit, containing a list of modules, each mapping to a single input file. A module is composed of namespace imports, type definitions, and globals; all the free-floating statements end up in the globals section in the AST. Because we only passed a single file, we have only a single module.

Listing 12.21 assumes that the file is composed of only macro statements (there is no error handling in the code, because we also assume correct input). For each of those statements, we get the define name (which is the first argument) and the single expression inside the macro block. We stuff them into the Define class, and we’re done.


Why is define a macro?

We haven’t created a DefineMacro class inheriting from an AbstractAstMacro, so why does the compiler think that this is a macro?

The short answer is that the compiler doesn’t; the compiler isn’t even involved. It’s the Boo parser we’re using here, and the parser doesn’t care about such things. The parser contains a rule for recognizing things that look like macro statements.

When the parser finds something that looks like a macro statement, it creates an instance of the MacroStatement AST node. Later in the compiler pipeline, there is a step (MacroExpander) that will look at the macro statement and decide whether there is a matching macro implementation.

Because we never involve the compiler pipeline in listing 12.21, we can directly access the AST and rely on the parser to do this work for us.


Once we have those definitions, plugging them into the Order-Processing DSL is easy. Listing 12.22 shows the important details.

Listing 12.22. Finding references to definitions and replacing them with values
public class ReplaceDefinitionsWithExpression :
    AbstractTransformerCompilerStep
{
    private readonly Define[] defines;

    public ReplaceDefinitionsWithExpression(Define[] defines)
    {
        this.defines = defines;
    }

    public override void Run()
    {
        Visit(CompileUnit);
    }

    public override void OnReferenceExpression(ReferenceExpression node)
    {
        Define define = defines.FirstOrDefault(x => x.Name == node.Name);
        if (define != null)
        {
            ReplaceCurrentNode(define.Expression);
        }
    }
}

This is a pretty straightforward implementation. This compiler step will search for all reference expressions (the names of things, such as variables, methods, types, and so on). If it finds a reference expression whose name matches the name of a definition, it will replace the reference expression with the definition expression.

For example, given the following definition,

define strategic_order:
    Order.Amount > 1000

and the following order rule,

when strategic_order:
    print "big sale!"

the strategic_order reference expression will be replaced with Order.Amount > 1000.

Now, all we have left to do is to plug ReplaceDefinitionsWithExpression into the DSL engine. Listing 12.23 shows how this is done.

Listing 12.23. OrderRuleDslEngine modified to understand definitions
public class OrderRuleDslEngine : DslEngine
{
    private readonly Define[] defines;

    public OrderRuleDslEngine(Define[] defines)
    {
        this.defines = defines;
    }

    protected override void CustomizeCompiler(BooCompiler compiler,
        CompilerPipeline pipeline, string[] urls)
    {
        compiler.Parameters.References.Add(typeof(BooCompiler).Assembly);
        pipeline.Insert(1,
            new ImplicitBaseClassCompilerStep(
                typeof(OrderRule),
                "Prepare"));
        pipeline.Insert(2, new ReplaceDefinitionsWithExpression(defines));
    }
}

The OrderRuleDslEngine class accepts the definitions in the constructor and registers the ReplaceDefinitionsWithExpression compiler step as the second step in the pipeline. That’s all we need to do. Because the step is one of the first to run, as far as the compiler is concerned, we’re compiling this:

when Order.Amount > 1000:
    print "big sale!"

Listing 12.24 brings it all together.

Listing 12.24. Adding defines to an Order-Processing script
var factory = new DslFactory();
var defines = new BusinessConditionDslEngine("Defines")
    .GetAllDefines().ToArray();
factory.Register<OrderRule>(new OrderRuleDslEngine(defines));
var rule = factory.Create<OrderRule>("Script.boo");

Now, users can introduce new keywords and concepts into the language themselves, without requiring developers to modify it for them.


The difference between Business-Condition DSL defines and C or C++ defines

If you have ever worked with C or C++, you’re probably familiar with the #define statement, which allows you to do pretty much what I have outlined in section 12.8.2.

The main difference between the two approaches is that in the C/C++ approach, defines use text substitution, which exposes a whole host of problems. You only have to remember the guidance about proper use of parentheses in defines to realize that.

The approach outlined here uses AST substitution, which means that it’s far more robust, easy to extend, and safer to use.


In one project where I introduced this approach, I also allowed users to extract a business condition from a rule and automatically refactor the existing scripts. Not only did it create the appropriate definition file, but it also scanned all the other rules and modified any that contained the same extracted condition. (If you want to try this yourself, consider using Node.Matches() to compare the different nodes.)
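
The scanning half of that refactoring can be built on the same transformer infrastructure we used in listing 12.22. Here’s a sketch: for each rule, replace any expression that structurally matches the extracted condition with a reference to the new definition. The surrounding workflow (creating the definition file and saving the rewritten rules) is omitted.

public class ReplaceExtractedCondition : DepthFirstTransformer
{
    private readonly Expression extracted;
    private readonly string defineName;

    public ReplaceExtractedCondition(Expression extracted,
        string defineName)
    {
        this.extracted = extracted;
        this.defineName = defineName;
    }

    public override void OnBinaryExpression(BinaryExpression node)
    {
        // Node.Matches() compares the AST nodes structurally
        if (node.Matches(extracted))
        {
            ReplaceCurrentNode(new ReferenceExpression(defineName));
            return;
        }
        base.OnBinaryExpression(node);
    }
}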

12.9. Summary

In previous chapters, we focused on the baseline knowledge, on building simple DSLs in Boo, and on understanding the AST and how DSLs fit in with the development lifecycle. In this chapter, we finally started to bring it all together.

We looked at scaling a DSL up (in terms of numbers of scripts and usage), managing deployment, and ensuring that we have sufficient control over how we deploy new DSL scripts to production.

We also focused on AST manipulation, and how we can use that to get all sorts of interesting results from our DSL. I am particularly fond of the ability to capture and manipulate the AST at runtime, because that gives us almost limitless power.

Finally, we touched on refactoring and giving users the ability to build their own abstractions. I find this to be an essential step for giving users a rich language that can evolve over time with minimal developer effort. It also helps keep language rot from setting in. (If users have to call a developer to get a change, that change will often either not be made or be made late, necessitating awkward workarounds.) We’ll talk more about abstractions and how to ensure that we build the right ones in chapter 13.

This chapter has been a whirlwind of various implementation details and ways to utilize them. At the moment, it may seem hard to tie all of them together, and, indeed, they’re used in different stages of the project and for different reasons. I hope that the next chapter, in which we build a full-blown DSL implementation from scratch, will clear things up. Let’s begin ...
