Chapter 55. Model-Aware Generation

Generate code with an explicit simulacrum of the semantic model of the DSL, so that the generated code has generic-specific separation.

image

When you generate code, you embed within that code the semantics of the DSL script. By using a Model-Aware Generation, you replicate some form of the Semantic Model in the generated code in order to preserve the separation of generic and specific code within the generated code.

55.1 How It Works

The most important aspect of Model-Aware Generation is that it preserves the principle of generic-specific separation. The actual form that the model takes in the generated code is much less important, which is why I like to say that the generated code contains a simulacrum of the Semantic Model.

It’s a simulacrum model for many reasons. Usually, you are generating code because of limitations in the target environment, and these limitations often make it harder to express a Semantic Model than you would like. As a result, lots of compromises will need to be made, which makes the Semantic Model less effective as a statement of the intent of the system. However, it’s important to realize that this isn’t such a big deal as long as you keep the generic-specific separation.

Since the simulacrum model is a self-standing version of the Semantic Model, you can, and should, build and test the model without using any code generation. Ensure the model has a simple API to populate it. The code generation will then generate configuration code that calls this API. You can then test the simulacrum model using testing scripts that use this same API. This allows you to build, test, and refine the core behavior of the target environment without running the code-generation process. You can do this with a relatively simple test population of the model, which should be easier to understand and debug.

55.2 When to Use It

Using a Model-Aware Generation has many advantages compared to using Model Ignorant Generation. The simulacrum model without generation is easier to build and test, because you don’t have to rerun and comprehend code generation while working on the simulacrum model. Since the generated code is now made up of API calls on the simulacrum model, that code is much easier to generate, which makes the generator simpler to build and maintain.

The main reason not to use Model-Aware Generation is limitations in the target environment. Either it’s too hard to express even a simulacrum model, or there are performance problems with having a simulacrum model at runtime.

In many cases, you are using DSLs as a front end to an existing model. If you are generating code to work with the model, then you are using Model-Aware Generation.

55.3 Secret Panel State Machine (C)

For an example of Model-Aware Generation, I’ll turn again to the secret panel state machine that I started this book with. I’m now imagining a situation where we’ve run out of Java-enabled toasters to run our security system, and our new batch are only programmable in C. As a result, we need to generate the C code from the existing Java semantic model.

In this writeup, I won’t talk about actually generating the code; for that, take a look at the example in Transformer Generation. Here I’ll concentrate on what the final code, both generated and handwritten, might look like with a Model-Aware Generation.

There are many ways you can implement a model like this in C. Essentially, I’m doing it as a data structure plus routines that navigate over this data structure in order to produce the behavior we need. Each physical controller only controls a single device, so we can store the data structure as static data. I shall also avoid heap allocations and allocate all the memory I need from the beginning.

I’ve built the data structure as a set of nested records and arrays. At the top of the structure is a controller.

image

You’ll notice that I represent the current state as an integer. As you’ll see, I use integer references in the simulacrum model to represent all the various links between different parts of the model.

The state machine has arrays for states, events, and commands.

image

The sizes of the various arrays are set through macro defines.

image

Events and commands have their name and code.

image

The state struct holds actions and transitions. Actions are integers corresponding to the commands, while transitions are pairs of integers for the trigger event and target state.

image
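Pulling the descriptions above together, the data structure could be sketched along the following lines. This is an illustration under assumptions, not the original listings: the macro names and sizes, the field names, the separate Event and Command records, the machine pointer inside Controller, and the current_state helper are all hypothetical. Only the overall shape comes from the text: a controller that holds the current state as an integer, a state machine with arrays of states, events, and commands, and states that hold integer actions and event/target pairs.

```c
#include <assert.h>
#include <stddef.h>

/* Capacities are fixed at compile time; these values are illustrative. */
#define NUM_STATES       50
#define NUM_EVENTS       50
#define NUM_COMMANDS     50
#define NUM_ACTIONS      8    /* per state */
#define NUM_TRANSITIONS  8    /* per state */
#define NAME_LENGTH      50
#define CODE_LENGTH      4

typedef struct {
  char name[NAME_LENGTH + 1];
  char code[CODE_LENGTH + 1];
} Event;

typedef struct {
  char name[NAME_LENGTH + 1];
  char code[CODE_LENGTH + 1];
} Command;

typedef struct {
  int event;    /* index into stateMachine.events */
  int target;   /* index into stateMachine.states */
} Transition;

typedef struct {
  char name[NAME_LENGTH + 1];
  int actions[NUM_ACTIONS];                 /* indices into stateMachine.commands */
  Transition transitions[NUM_TRANSITIONS];
} State;

typedef struct {
  State   states[NUM_STATES];
  Event   events[NUM_EVENTS];
  Command commands[NUM_COMMANDS];
} stateMachine;

typedef struct {
  stateMachine *machine;   /* assumed link from controller to machine */
  int currentState;        /* index into machine->states */
} Controller;

/* Navigation uses plain array indexing rather than pointer arithmetic. */
static const State *current_state(const Controller *c) {
  assert(c != NULL && c->machine != NULL);
  assert(c->currentState >= 0 && c->currentState < NUM_STATES);
  return &c->machine->states[c->currentState];
}
```

Because every link is an integer index and every array has a fixed size, the whole structure can live in static storage with no heap allocation.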

Many C programmers would prefer to use pointer arithmetic rather than array indices to navigate through the array structures, but I’d rather avoid inflicting pointer arithmetic on my non-C readers (not to mention myself, as my C was never very good even before it got rusty). There is a broader point here. I believe that generated code should be readable even if it isn’t edited, because it will often be used for debugging. To make it readable, you have to understand your target audience, such as who is doing the debugging. To use this example, even if you as a generator writer are comfortable with pointer arithmetic, you should be wary of using it in the generated code if the people reading that code aren’t comfortable.

To finish off the data structure, I declare the state machine and the controller as static variables, which means there is only one of each.

static stateMachine machine;
static Controller controller;

All of these data definitions are done within a single .c file. This way, I can encapsulate the data structure behind a bunch of externally declared functions. The specific code only knows about these functions and is, rightly, ignorant about the data structure itself. In this case, ignorance is truly bliss.

When I initialize the state machine, I put a zero byte into the first character of each name string, effectively making every record blank.

image

To declare a new event, I look for the first blank event and insert the data there.

image

assert_error is a macro that checks the condition and, if it’s false, calls an error function with the message.

#define assert_error(test, message) \
  do { if (!(test)) sm_error(#message); } while (0)

Note that I’ve wrapped the macro in a do-while block. It looks odd, but prevents awkward interactions if the macro is used inside an if statement.
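To see the kind of interaction the do-while wrapper guards against, consider a macro call that forms the entire body of an if with an else. The sketch below assumes a stand-in sm_error that only counts and prints errors, a hypothetical classify function, and a message written as bare words so that the # operator turns it into a string; none of these come from the original code.

```c
#include <assert.h>
#include <stdio.h>

static int error_count = 0;   /* hypothetical: lets a caller see that an error fired */

/* Stand-in for the error function, which the text does not show. */
static void sm_error(const char *message) {
  assert(message != NULL);
  error_count++;
  fprintf(stderr, "state machine error: %s\n", message);
}

#define assert_error(test, message) \
  do { if (!(test)) sm_error(#message); } while (0)

/* The macro call is the whole body of an if that has an else. */
static int classify(int n) {
  if (n != 0)
    assert_error(n > 0, n must not be negative);
  else
    return 0;                 /* still pairs with the outer if */
  return n > 0 ? 1 : -1;
}
```

If the macro body were a bare if statement, the else would silently attach to the macro's hidden if rather than to `if (n != 0)`. If it were a plain brace block, the semicolon after the call would end the outer if and the else would be a syntax error. The do-while form expands to a single statement that needs the trailing semicolon, so neither problem arises.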

Commands are declared in the same way, so I’ll skip that code.

States are declared through a number of functions. The first one just declares the name of the state.

image

Declaring the actions and transitions is a bit more complicated, as we have to look up the ID of the action based on the name. Here are the actions:

image

The transitions are similar.

image
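As a rough sketch of how such declaration functions might fit together, the following compresses the model into a few static arrays. Apart from declare_action and assert_error, which appear in the text, every name here is an assumption (init_machine, declare_event, declare_command, declare_state, declare_transition, the Named record, the UNUSED marker), and the array sizes are arbitrary. The stand-in sm_error only counts and reports, so each function returns explicitly after a failed check.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NUM_STATES 8          /* illustrative capacities */
#define NUM_EVENTS 8
#define NUM_COMMANDS 8
#define NUM_ACTIONS 4
#define NUM_TRANSITIONS 4
#define NAME_LENGTH 24
#define CODE_LENGTH 4
#define UNUSED (-1)           /* marks an empty slot or a failed lookup */

typedef struct { char name[NAME_LENGTH + 1]; char code[CODE_LENGTH + 1]; } Named;
typedef struct { int event; int target; } Transition;
typedef struct {
  char name[NAME_LENGTH + 1];
  int actions[NUM_ACTIONS];
  Transition transitions[NUM_TRANSITIONS];
} State;

static Named events[NUM_EVENTS], commands[NUM_COMMANDS];
static State states[NUM_STATES];
static int error_count;

static void sm_error(const char *message) {
  error_count++;
  fprintf(stderr, "state machine error: %s\n", message);
}
#define assert_error(test, message) \
  do { if (!(test)) sm_error(#message); } while (0)

void init_machine(void) {
  for (int i = 0; i < NUM_EVENTS; i++) events[i].name[0] = '\0';
  for (int i = 0; i < NUM_COMMANDS; i++) commands[i].name[0] = '\0';
  for (int i = 0; i < NUM_STATES; i++) {
    states[i].name[0] = '\0';
    for (int j = 0; j < NUM_ACTIONS; j++) states[i].actions[j] = UNUSED;
    for (int j = 0; j < NUM_TRANSITIONS; j++) states[i].transitions[j].event = UNUSED;
  }
}

/* Linear search by name; searching for "" finds the first blank slot. */
static int find_named(const Named *items, int count, const char *name) {
  for (int i = 0; i < count; i++)
    if (strcmp(items[i].name, name) == 0) return i;
  return UNUSED;
}
static int find_state(const char *name) {
  for (int i = 0; i < NUM_STATES; i++)
    if (strcmp(states[i].name, name) == 0) return i;
  return UNUSED;
}

static void declare_named(Named *items, int count, const char *name, const char *code) {
  assert(name[0] != '\0');    /* a blank name would look like a free slot */
  int i = find_named(items, count, "");
  assert_error(i != UNUSED, no room for another event or command);
  if (i == UNUSED) return;
  snprintf(items[i].name, sizeof items[i].name, "%s", name);
  snprintf(items[i].code, sizeof items[i].code, "%s", code);
}
void declare_event(const char *name, const char *code) {
  declare_named(events, NUM_EVENTS, name, code);
}
void declare_command(const char *name, const char *code) {
  declare_named(commands, NUM_COMMANDS, name, code);
}
void declare_state(const char *name) {
  assert(name[0] != '\0');
  int i = find_state("");
  assert_error(i != UNUSED, no room for another state);
  if (i != UNUSED) snprintf(states[i].name, sizeof states[i].name, "%s", name);
}

void declare_action(const char *stateName, const char *commandName) {
  int s = find_state(stateName), c = find_named(commands, NUM_COMMANDS, commandName);
  assert_error(s != UNUSED && c != UNUSED, unknown state or command);
  if (s == UNUSED || c == UNUSED) return;
  for (int i = 0; i < NUM_ACTIONS; i++)
    if (states[s].actions[i] == UNUSED) { states[s].actions[i] = c; return; }
  sm_error("no room for another action");
}

void declare_transition(const char *source, const char *eventName, const char *target) {
  int s = find_state(source), t = find_state(target);
  int e = find_named(events, NUM_EVENTS, eventName);
  assert_error(s != UNUSED && t != UNUSED && e != UNUSED, unknown state or event);
  if (s == UNUSED || t == UNUSED || e == UNUSED) return;
  for (int i = 0; i < NUM_TRANSITIONS; i++)
    if (states[s].transitions[i].event == UNUSED) {
      states[s].transitions[i].event = e;
      states[s].transitions[i].target = t;
      return;
    }
  sm_error("no room for another transition");
}
```

With this shape, a transition is declared by name, for example `declare_transition("idle", "doorClosed", "active")`, and the function resolves the names to integer indices before storing them. The lookups assume that events, commands, and states have all been declared before any action or transition refers to them.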

I can now use these declaration functions to define a complete state machine—in this case, the familiar one for Miss Grant.

image

image

This population code is the code that would be generated by a code generator (see “Secret Panel Controller (Java generating C)” in Transformer Generation).

I should now show the code that makes the state machine work. In this case, this is the function that’s called to handle an event with a given event code.

image
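A minimal sketch of such an event handler follows. To keep it short, it populates a two-state fragment of the machine with static initializers instead of the declaration functions. It also assumes that a state's actions are sent on entering that state, and that unknown or inapplicable event codes are ignored. The names send_command, enter_state, find_event_by_code, the sent buffer, and the exact signature of handle_event are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NUM_ACTIONS 2
#define NUM_TRANSITIONS 2
#define UNUSED (-1)

typedef struct { const char *name; const char *code; } Named;
typedef struct { int event; int target; } Transition;
typedef struct {
  const char *name;
  int actions[NUM_ACTIONS];                 /* indices into commands */
  Transition transitions[NUM_TRANSITIONS];  /* event and target indices */
} State;

/* Hand-populated fragment of the model, standing in for the declare calls. */
static const Named events[]   = { {"doorClosed", "D1CL"}, {"doorOpened", "D1OP"} };
static const Named commands[] = { {"unlockDoor", "D1UL"}, {"lockPanel", "PNLK"} };
static const State states[] = {
  { "idle",   {0, 1},           { {0, 1}, {UNUSED, UNUSED} } },
  /* doorOpened is a reset event, stored as an ordinary transition to idle */
  { "active", {UNUSED, UNUSED}, { {1, 0}, {UNUSED, UNUSED} } },
};

static int currentState = 0;
static char sent[64];   /* records the command codes sent so far, for inspection */

static void send_command(const char *code) {
  printf("sending %s\n", code);
  strncat(sent, code, sizeof sent - strlen(sent) - 1);
}

static int find_event_by_code(const char *code) {
  for (size_t i = 0; i < sizeof events / sizeof events[0]; i++)
    if (strcmp(events[i].code, code) == 0) return (int)i;
  return UNUSED;
}

/* Move to the target state and send each of its action commands. */
static void enter_state(int target) {
  currentState = target;
  for (int i = 0; i < NUM_ACTIONS; i++) {
    int c = states[target].actions[i];
    if (c != UNUSED) send_command(commands[c].code);
  }
}

void handle_event(const char *code) {
  assert(code != NULL);
  int e = find_event_by_code(code);
  if (e == UNUSED) return;                  /* unknown codes are ignored */
  const State *s = &states[currentState];
  for (int i = 0; i < NUM_TRANSITIONS; i++) {
    if (s->transitions[i].event == e) {
      enter_state(s->transitions[i].target);
      return;
    }
  }
  /* no transition for this event in the current state: stay put */
}
```

Calling `handle_event("D1CL")` from idle moves the fragment to active, and a later `handle_event("D1OP")` returns it to idle, sending the unlockDoor and lockPanel codes on the way in.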

So that’s the working state machine model. There are a few points to note about it. First, the data structure is somewhat primitive, as it involves walking through an array to look up the various codes and names. In defining the state machine, this is probably no big deal, but in running the machine we might be better off replacing the linear search with a hash function. Since the state machine is well encapsulated, this is easy to do, so I’ll leave it as an exercise for the reader. Changing such implementation details of the model doesn’t affect the interface of the configuration functions that define new state machines. This is an important encapsulation.

The model does not include any notion of reset events. The various reset events that are defined through the DSL scripts and the Java semantic model are just turned into extra transitions in the C state machine. This makes running the state machine simpler, and is an example of a typical tradeoff where I prefer simplicity of operation to clearly stating intent. For the true Semantic Model, I prefer to keep as much intent as I can, but for a model in a generated target environment I value capturing intent a little less.

I could go further in simplifying the executing state machine by removing all the names for events, commands, and states. These names are only used while configuring the machine and aren’t used at all during the execution. So I could use some lookup tables that I discard once the machine is fully defined. Indeed the declaration functions could just use integers, something like declare_action(1,2);. While this isn’t anywhere near as readable, you can argue that it matters less as this code is generated anyway. In these situations I’m inclined to keep the names, as I prefer even generated code to be readable, but more importantly it allows the state machine to produce more useful diagnostics when things go wrong. I’d sacrifice this, however, if space was really tight in the target environment.

55.4 Loading the State Machine Dynamically (C)

Generating C code as in the example above means that to set up a new state machine, we need to recompile. Model-Aware Generation also allows us to build state machines at runtime, by having the code generator produce a separate file that the compiled program reads.

In this case, I can express the behavior of a particular state machine through a text file such as this:

image

I can generate this file from the Java Semantic Model.

image

To run the state machine, I can easily interpret config_machine using Delimiter-Directed Translation with the simple string-processing functions built into the standard C library.

The overall function to build the machine works by just opening the file and interpreting each line as it goes.

image

The standard C function strtok allows me to break a string into tokens separated by delimiter characters that I supply, here whitespace. I can pull the first token and then dispatch to a specific function to interpret that kind of line.

image

Each specific function pulls the necessary tokens and calls the static declare functions that I defined in the previous example. I’ll just show events, as all the others look pretty much the same.

image

(Repeated calls to strtok with a NULL first argument pull further tokens from the same string as the previous call to strtok.)
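The line-by-line interpretation could be sketched as follows. The file format is an assumption: each line is taken to be a keyword followed by whitespace-separated fields, such as `event doorClosed D1CL`. The names interpret_line, interpret_event, and build_machine, the tiny events array that stands in for the model, and the error counter are all hypothetical, and only the event keyword is handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_EVENTS 8
#define FIELD_LENGTH 24
#define DELIMITERS " \t\r\n"

/* Stand-in for the model: just enough to record declared events. */
static struct {
  char name[FIELD_LENGTH + 1];
  char code[FIELD_LENGTH + 1];
} events[MAX_EVENTS];
static int event_count = 0;
static int error_count = 0;

static void declare_event(const char *name, const char *code) {
  assert(event_count < MAX_EVENTS);
  snprintf(events[event_count].name, sizeof events[event_count].name, "%s", name);
  snprintf(events[event_count].code, sizeof events[event_count].code, "%s", code);
  event_count++;
}

/* Called after strtok has already consumed the leading keyword of the line. */
static void interpret_event(void) {
  char *name = strtok(NULL, DELIMITERS);
  char *code = strtok(NULL, DELIMITERS);
  if (name == NULL || code == NULL) { error_count++; return; }
  declare_event(name, code);
}

/* strtok writes '\0' into the buffer, so the line must be modifiable. */
void interpret_line(char *line) {
  char *keyword = strtok(line, DELIMITERS);
  if (keyword == NULL) return;              /* blank line */
  if (strcmp(keyword, "event") == 0) {
    interpret_event();
  } else {
    /* other keywords (command, state, and so on) would dispatch the same way */
    error_count++;
  }
}

/* Interpret every line of an already opened file. */
void build_machine(FILE *input) {
  char buffer[256];
  while (fgets(buffer, sizeof buffer, input) != NULL)
    interpret_line(buffer);
}
```

Since strtok keeps hidden state between calls and modifies its input, this sketch suits a single-threaded loader that reads each line into a scratch buffer, which fits a controller that loads one machine at startup.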

I don’t consider this textual format a DSL, as I designed it to make it easy to interpret, not for readability by humans. It’s useful to have a certain amount of human readability—such as using the names of states, events, and commands—as that helps in debugging. Still, in this case human readability was a distant second to ease of interpretation.

The point of this example is to illustrate that code generation for a static target language does not mean you cannot use runtime interpretation. By using Model-Aware Generation, I can compile just the generic state machine model together with a very simple interpreter. My code generator then just generates the text file to be interpreted. This allows me to use C for my controllers, but without having to recompile to make a change in the state machine. By generating a file that’s designed for ease of interpretation in the environment I have available, I can minimize the cost of the interpreter. I could, of course, go a step further and put the full DSL processor in C, but this would raise the processing demands of the C system and require more involved C programming. Depending on the particular situation, that may be a viable option, though we would then no longer be in the world of Model-Aware Generation.
