Chapter 44. Class Symbol Table

Use a class and its fields to implement a symbol table in order to support type-aware autocompletion in a statically typed language.

Modern IDEs provide lots of powerful and compelling features to make programming easier. A particularly useful one is type-aware autocompletion. In my C# and Java IDEs, I can type the name of a variable, type the period, and get a list of all the methods that are defined on that object. Even people like me who enjoy dynamically typed languages have to admit that this is a benefit of statically typed languages. When working in an internal DSL, you don’t want to give up this capability for typing the name of a symbol in the DSL. However, the most common ways of expressing DSL symbols are to use strings or a built-in symbol type—so there’s no relevant type information.

Class Symbol Table allows you to make symbols statically typed in the host language by defining each symbol as a field in an Expression Builder.

44.1 How It Works

The base of making this work is to write your DSL script inside a single Expression Builder class. This builder will usually be a subclass of a more general Expression Builder where you can place the behavior needed for all your scripts. The script’s Expression Builder will then consist of a method for the script itself and fields for the symbols. So, if you have tasks in your DSL and need to define three of them in your script, you’ll have a field declaration like this:

Tasks drinkCoffee, makeCoffee, wash;

Tasks is, like so many things in DSL processing, an unconventional name for a class. Again, the readability of the DSL is trumping my usual code style rules. By defining fields like this, I can now refer to them in the DSL script as fields; also, the IDE will offer autocompletion for them, and the compiler will check them.

Just defining the fields, however, is not enough. When I refer to a field in the DSL script, it refers to the contents of the field, not the field definition. While I’m writing code, the IDE knows about both; but when I run the program, the link to the definition of the field disappears, leaving me with only the field contents. In normal life, this isn’t a problem, but to make our Class Symbol Table we need a link to the field definition at runtime.

We can provide this by populating each field with a suitable object before the script is executed. A good way to do this is to use the class instance as the active script—put code in the constructor to populate the fields and the script inside an instance method. The contents of the fields are usually small Expression Builders that link to the underlying Semantic Model object and also contain the field name to help with cross-referencing. In terms of a Symbol Table, the field name acts as the key and the builder acts as the value; but occasionally, you will need another kind of key access, which is why it’s handy for the builders in the field to keep the field name.
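A minimal sketch of this arrangement, assuming a hypothetical script class (MorningScript) and a hypothetical needs method on the Tasks builder. The link to a Semantic Model object is left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical field-content builder: it remembers its own field name
// so that other builders can cross-reference it later.
class Tasks {
    final String name;
    final List<Tasks> prerequisites = new ArrayList<>();

    Tasks(String name) { this.name = name; }

    // Fluent method used by the DSL script (an assumed vocabulary).
    Tasks needs(Tasks... others) {
        prerequisites.addAll(Arrays.asList(others));
        return this;
    }
}

// The class instance is the active script: the constructor populates the
// symbol fields, and the instance method holds the DSL script itself.
class MorningScript {
    Tasks drinkCoffee, makeCoffee, wash;

    MorningScript() {
        drinkCoffee = new Tasks("drinkCoffee");
        makeCoffee = new Tasks("makeCoffee");
        wash = new Tasks("wash");
    }

    void script() {
        drinkCoffee.needs(makeCoffee, wash);
        makeCoffee.needs(wash);
    }
}
```

Populating each field by hand like this repeats every symbol name twice, which is the duplication that a reflective initializer can remove.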

The DSL script will usually refer to the field by the field literal itself, which is the whole point. To refer to the wash task, I can just type the wash field name in the DSL script. However, as we’re processing the DSL script, we’ll need the builders in the fields to refer to each other. This will sometimes involve looking up fields by name, or iterating through all fields of a certain type. Doing this will require trickier code, usually using reflection. Usually there’s not too much of it and, provided it’s well encapsulated, it shouldn’t make the language too difficult to process.
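The two kinds of reflective access mentioned here might be encapsulated as follows. This is a sketch only: SymbolFields, BreakfastScript, and both helper methods are hypothetical names, and the reflection calls are the standard java.lang.reflect ones.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

class Tasks {
    final String name;
    Tasks(String name) { this.name = name; }
}

// Hypothetical script object with two symbol fields and one unrelated field.
class BreakfastScript {
    Tasks drinkCoffee = new Tasks("drinkCoffee");
    Tasks makeCoffee = new Tasks("makeCoffee");
    String title = "not a symbol";
}

// Keeps the reflective code in one place.
class SymbolFields {
    // Collects the contents of every declared field whose type is exactly 'type'.
    // The order of the result is not guaranteed by getDeclaredFields.
    static <T> List<T> allOfType(Object script, Class<T> type) {
        List<T> result = new ArrayList<>();
        for (Field f : script.getClass().getDeclaredFields()) {
            if (!f.getType().equals(type)) continue;
            try {
                f.setAccessible(true);
                result.add(type.cast(f.get(script)));
            } catch (IllegalAccessException e) {
                throw new RuntimeException(e);
            }
        }
        return result;
    }

    // Looks up the contents of a single field by its name.
    static Object byName(Object script, String fieldName) {
        try {
            Field f = script.getClass().getDeclaredField(fieldName);
            f.setAccessible(true);
            return f.get(script);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }
}
```

For example, SymbolFields.allOfType(new BreakfastScript(), Tasks.class) returns the two task builders and skips the title field.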

44.2 When to Use It

The primary consequence of using Class Symbol Table is that it provides full static typing of all the DSL language elements. The big benefit this gives us is that it allows IDEs to use all the sophisticated tools based on static typing—such as type-aware autocompletion. It also provides compile-time type checking on the DSL script, which matters a lot to many people (but rather less to me).

With such a focus on IDE capabilities, I see this technique as much less useful if you don’t have an IDE that takes advantage of static types. It also does not bring much benefit in a dynamically typed language.

The downside of this technique is that you have to bend your DSL significantly to fit within the type system. The resulting builder classes look very odd; also, you have to put your DSL scripts in a place where they can take advantage of these facilities, such as all in the same class. These restrictions may make the DSL harder to read and use.

So for me, the fundamental tradeoff is between the restrictions on the DSL script and the benefits of the IDE support. I’ve got rather dependent on good IDE support in languages where it’s available, which would prompt me to use techniques like this to get it.

If you want this kind of static type support, you can often get what you need by using enums as symbols (see Symbol Table for an example of this).
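A small sketch of the enum alternative, assuming a hypothetical Task enum and TaskDependencies class. The symbols are still checked by the compiler and completed by the IDE, without a script class or any reflection.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Each enum constant is one statically typed symbol.
enum Task { DRINK_COFFEE, MAKE_COFFEE, WASH }

class TaskDependencies {
    private final Map<Task, Set<Task>> prerequisites = new EnumMap<>(Task.class);

    // Records that 'task' needs at least one other task.
    TaskDependencies needs(Task task, Task first, Task... rest) {
        prerequisites
            .computeIfAbsent(task, t -> EnumSet.noneOf(Task.class))
            .addAll(EnumSet.of(first, rest));
        return this;
    }

    Set<Task> prerequisitesOf(Task task) {
        return prerequisites.getOrDefault(task, EnumSet.noneOf(Task.class));
    }
}
```

The price is that every symbol must be declared in the enum itself, and the enum constants cannot carry fluent builder methods as conveniently as fields of a builder type can.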

44.3 Statically Typed Class Symbol Table (Java)

I used Class Symbol Table for the Java example in the introduction, so that seems like a good example to show how this works.

The DSL script is in a specific class.

The DSL script is housed in its own class. The script itself is in one method, and the fields of the class represent the symbol table. I’ve set things up so the DSL script class is a subclass of a builder—this way I can have the superclass builder control the way the script is run. (Using a subclass like this also allows me to use Object Scoping, although I don’t need it here.)
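The shape of such a script class might look like the sketch below. The builder types are reduced to hypothetical stand-ins that only record what they are told, and the event, command, and state names and codes are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for the builder types.
class Events {
    String code;
    void code(String code) { this.code = code; }
}

class Commands {
    String code;
    void code(String code) { this.code = code; }
}

class States {
    final List<Commands> actions = new ArrayList<>();
    final List<TransitionBuilder> transitions = new ArrayList<>();

    States actions(Commands... commands) {
        actions.addAll(Arrays.asList(commands));
        return this;
    }

    TransitionBuilder transition(Events trigger) {
        return new TransitionBuilder(this, trigger);
    }
}

class TransitionBuilder {
    final States source;
    final Events trigger;
    States target;

    TransitionBuilder(States source, Events trigger) {
        this.source = source;
        this.trigger = trigger;
    }

    States to(States target) {
        this.target = target;
        source.transitions.add(this);
        return source;
    }
}

// General builder. A fuller version would also populate the fields and
// produce the model; both are omitted in this sketch.
abstract class StateMachineBuilder {
    protected abstract void defineStateMachine();
}

// The script class: the fields are the symbol table, the method is the script.
class BasicStateMachine extends StateMachineBuilder {
    Events doorClosed, lightOn;
    Commands unlockDoor, lockPanel;
    States idle, active;

    @Override
    protected void defineStateMachine() {
        doorClosed.code("D1CL");
        lightOn.code("L1ON");
        unlockDoor.code("D1UL");
        lockPanel.code("PNLK");

        idle
            .actions(unlockDoor, lockPanel)
            .transition(doorClosed).to(active);
    }
}
```

In this sketch the fields start out as null, so calling defineStateMachine before something has populated them would fail with a NullPointerException.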

I define the public method to run the script on the superclass; it executes the code to set up the Class Symbol Table fields before running the script. In this case, running the DSL script performs a basic preparation of the information for the state machine, and a second pass actually produces the Semantic Model objects. So, running a script has three distinct stages: initializing the identifiers (generic), running the DSL script (specific), and finally producing the model state machine (generic).
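The three stages can be sketched as a template method on the superclass. This is an assumed outline rather than the actual listing: the first stage is a placeholder, and the script vocabulary is reduced to a hypothetical state(String) call so that the ordering is the only thing on show.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Semantic Model.
class StateMachine {
    final List<String> stateNames = new ArrayList<>();
}

abstract class StateMachineBuilder {
    private final List<String> declared = new ArrayList<>();
    private boolean identifiersReady = false;

    // Public entry point: the three stages always run in this order.
    public final StateMachine build() {
        initializeIdentifiers();        // generic
        defineStateMachine();           // specific to each script
        return produceStateMachine();   // generic
    }

    // Placeholder: a fuller version would populate the symbol fields here.
    private void initializeIdentifiers() {
        identifiersReady = true;
    }

    // Implemented by the script subclass.
    protected abstract void defineStateMachine();

    // Simplified script vocabulary for this sketch only.
    protected void state(String name) {
        if (!identifiersReady) {
            throw new IllegalStateException("identifiers not initialized");
        }
        declared.add(name);
    }

    // Second pass: turn the intermediate data into model objects.
    private StateMachine produceStateMachine() {
        StateMachine machine = new StateMachine();
        machine.stateNames.addAll(declared);
        return machine;
    }
}

class TwoStateScript extends StateMachineBuilder {
    @Override
    protected void defineStateMachine() {
        state("idle");
        state("active");
    }
}
```

Making build final and the other stages private keeps a script subclass from reordering or skipping the generic steps.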

I need the first step of initializing the identifiers, since any reference to a field in the DSL script refers to the contents of the field rather than the field itself. In this case, the suitable objects are specific identifier objects that hold the name of the identifier and refer to the underlying model object. Doing this ends up being a bit messier than I’d like, as I want to write generic code for setting up the identifiers to avoid duplicating setup code. However, any generic code doesn’t know about the specific type of the identifier being set up, and so has to determine it dynamically.

Hopefully, this will become a little clearer when we look at an example—in this case, the event builder class (Events). The first thing to discuss is the name of the class. Any style book on object-oriented programming will wisely tell you to avoid plural class names, and I agree with that advice. However, here a plural name reads better in the context of the DSL, so this is another case of general coding rules being broken to make a good DSL script. The DSL naming doesn’t alter the fact that it is truly a builder of events, so I’ll refer to it as the event builder class in my text (and similarly for its siblings).

The event builder extends a general identifier class.

There is a simple division of responsibility here, with the identifier class carrying the responsibilities needed for all identifiers, and the subclasses carrying what’s needed for specific types.

Let’s look at the first step of running the script—initializing the identifiers. Since many identifier classes need to be initialized, I have some generic code to do that. This way I can provide a list of classes which are identifiers, and the code will initialize all fields of those classes.

Doing it this way is trickier than I’d like, but it avoids having to write duplicate initializing methods. Essentially, I look through every field on the DSL script object, and if the type of the field is one of those I’ve passed in, I initialize it with a special static utility method that finds and calls the right constructor. As a result, once I’ve called initializeIdentifiers, I have all of these fields populated with objects that will help me construct the state machine.
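The reflective initialization described here might be sketched like this. The method name initializeIdentifiers comes from the text; the create utility, the two-argument constructor convention, and the DoorScript class are assumptions made for the sketch.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.List;

class Identifier {
    private final String name;
    protected final StateMachineBuilder builder;

    Identifier(String name, StateMachineBuilder builder) {
        this.name = name;
        this.builder = builder;
    }

    String getName() { return name; }

    // Static utility: finds and calls the (String, StateMachineBuilder)
    // constructor of whichever identifier subclass is requested.
    static Identifier create(Class<?> type, String name, StateMachineBuilder builder)
            throws ReflectiveOperationException {
        Constructor<?> ctor =
            type.getDeclaredConstructor(String.class, StateMachineBuilder.class);
        ctor.setAccessible(true);
        return (Identifier) ctor.newInstance(name, builder);
    }
}

class Events extends Identifier {
    Events(String name, StateMachineBuilder builder) { super(name, builder); }
}

class States extends Identifier {
    States(String name, StateMachineBuilder builder) { super(name, builder); }
}

abstract class StateMachineBuilder {
    // Populates every field of the script object whose type is listed.
    protected void initializeIdentifiers(Class<?>... identifierClasses) {
        List<Class<?>> wanted = Arrays.asList(identifierClasses);
        for (Field f : getClass().getDeclaredFields()) {
            if (!wanted.contains(f.getType())) continue;
            try {
                f.setAccessible(true);
                f.set(this, Identifier.create(f.getType(), f.getName(), this));
            } catch (ReflectiveOperationException e) {
                throw new RuntimeException(e);
            }
        }
    }
}

// Hypothetical script class; 'note' is not an identifier and stays untouched.
class DoorScript extends StateMachineBuilder {
    Events doorClosed;
    States idle, active;
    String note;
}
```

Because getClass() returns the runtime class, the loop sees the fields declared on the script subclass even though the method lives on the superclass.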

The next step is to execute the DSL script itself. The DSL script executes by building up suitable intermediate objects to capture all the information about the state machine.

The first step is defining the codes for the events and commands.

Since the code gives me all the information I need to create a model event object, I can create it in the call to code and put it inside the identifier (the command builder looks just the same).
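A reduced sketch of this division of responsibility, assuming a hypothetical Event model class that takes a name and a code. The reference back to the machine builder is left out for brevity.

```java
// Hypothetical model class for an event.
class Event {
    final String name;
    final String code;

    Event(String name, String code) {
        this.name = name;
        this.code = code;
    }
}

// Responsibilities shared by every identifier: here, just the name.
class Identifier {
    private final String name;

    Identifier(String name) { this.name = name; }

    String getName() { return name; }
}

// The event builder adds only what is specific to events.
class Events extends Identifier {
    private Event event;

    Events(String name) { super(name); }

    // Called from the DSL script. The field name plus the code is
    // everything needed to create the model object.
    void code(String code) {
        event = new Event(getName(), code);
    }

    Event getEvent() { return event; }
}
```

Until code has been called, getEvent returns null, so anything that reads the model event has to run after the script.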

The event and command builders are degenerately simple Expression Builders. The state builder is a bit more of a builder, as it needs several steps.

Since a state model object is mutable, I can create it in the constructor and fill in the rest later.

The first building behavior I’ll show is creating the actions. The basic behavior here is simple—I go through the supplied command identifiers and store them in the state builder.

If the DSL script always defines the codes before it defines the states (as I’ve done here), I could save myself the need to store command builders in the state builder and instead put the model command objects into the model state object. However, this would lead to errors if I define a state before its action codes. Using the builder as an intermediate object allows me to work it either way.
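The benefit of holding on to the command builders can be seen in a small sketch. The class shapes and the resolveActions method are assumptions; the point is only that the model command is read after the whole script has run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model class for a command.
class Command {
    final String name;
    final String code;

    Command(String name, String code) {
        this.name = name;
        this.code = code;
    }
}

class Commands {
    private final String name;
    private Command command;

    Commands(String name) { this.name = name; }

    void code(String code) { command = new Command(name, code); }

    Command getCommand() { return command; }
}

class States {
    // Stores the builders, not the model commands, which may not exist yet.
    private final List<Commands> commands = new ArrayList<>();

    States actions(Commands... identifiers) {
        commands.addAll(Arrays.asList(identifiers));
        return this;
    }

    // Resolved only when asked, once the script has finished.
    List<Command> resolveActions() {
        List<Command> result = new ArrayList<>();
        for (Commands c : commands) {
            result.add(c.getCommand());
        }
        return result;
    }
}
```

With this arrangement a script can call idle.actions(unlockDoor) first and unlockDoor.code("D1UL") afterwards, and the resolved action still carries the code.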

There is a bit of trickiness here. The DSL makes the assumption that the first mentioned state is the start state. As a result, I have to check, whenever I begin defining a state, if this is the first state I define, and if so make it the start state. Since it’s only the overall state machine builder that can really tell if a state is the first one to be defined, I want the machine builder to make the decision about whether to set a state as first.

The state builder does need to call the machine builder to tell it that it’s being defined, but it shouldn’t know what the machine builder is going to do with that information, as that’s the secret of the machine builder. So I make what is effectively an event notification call from the state builder (since that is all it knows) and let the machine builder decide what to do on that event. This is a good example of naming being important in communicating what I think the responsibilities and relative knowledge of the objects should be.
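A sketch of that notification, with definingState as an assumed name chosen to read like an event rather than a command. The state builder says what is happening; the machine builder alone decides that the first such call marks the start state.

```java
// Hypothetical model class for a state.
class State {
    final String name;
    State(String name) { this.name = name; }
}

// Empty stand-in so that the actions signature stays typed.
class Commands { }

class StateMachineBuilder {
    private State start;

    // Notification from a state builder. Treating the first notified
    // state as the start state is this class's own decision.
    void definingState(States identifier) {
        if (start == null) {
            start = identifier.getState();
        }
    }

    State getStart() { return start; }
}

class States {
    private final State content;
    private final StateMachineBuilder builder;

    States(String name, StateMachineBuilder builder) {
        this.content = new State(name);
        this.builder = builder;
    }

    State getState() { return content; }

    // Any method that begins defining this state notifies the machine builder.
    States actions(Commands... identifiers) {
        builder.definingState(this);
        return this;   // storing the actions is omitted in this sketch
    }
}
```

Had the method been called something like setStartIfFirst, the state builder would be encoding knowledge that belongs to the machine builder.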

The other thing we can do with a state builder is to define a transition. As this requires a couple of steps, it’s a dash more complicated. I begin with the transition method, which creates a separate transition builder object.

Since I don’t need to mention the transition builder’s type in the DSL script, I can give it a more meaningful name. Its only builder method is the to clause, which adds itself to the source state builder’s list of transition builders.
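The two-step transition clause might be sketched as follows. TransitionBuilder and addTransition are assumed names, and the event and state builders are cut down to the minimum needed here.

```java
import java.util.ArrayList;
import java.util.List;

class Events {
    final String name;
    Events(String name) { this.name = name; }
}

class States {
    final String name;
    private final List<TransitionBuilder> transitions = new ArrayList<>();

    States(String name) { this.name = name; }

    // First step: remember the source and the trigger, nothing is stored yet.
    TransitionBuilder transition(Events trigger) {
        return new TransitionBuilder(this, trigger);
    }

    void addTransition(TransitionBuilder transition) {
        transitions.add(transition);
    }

    List<TransitionBuilder> getTransitions() { return transitions; }
}

// Never named in the DSL script, so it can have a conventional name.
class TransitionBuilder {
    private final States source;
    private final Events trigger;
    private States target;

    TransitionBuilder(States source, Events trigger) {
        this.source = source;
        this.trigger = trigger;
    }

    // Second step: complete the transition and register it with the source.
    // Returning the source lets the script chain further clauses.
    States to(States target) {
        this.target = target;
        source.addTransition(this);
        return source;
    }

    Events getTrigger() { return trigger; }
    States getTarget() { return target; }
}
```

A script line then reads idle.transition(doorClosed).to(active), and a transition that is never completed with to is simply never registered.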

These are the elements I need to capture all the specific information in the DSL script. When the script is run, I have a data structure of intermediate data: The builders are captured in the fields of the DSL script object itself. I now need to run through this structure to produce a fully wired up model state machine.

Most of the work here is going through all the state builders, getting them to produce their wired-up model objects. To find all these states, I need to get all the objects out of the fields of the script class, so again I use some reflective trickery to find all fields of the state builder’s type.

To produce its model object, the state builder wires up the commands and produces its transitions.
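A sketch of this production step under simplifying assumptions: the model classes are hypothetical, the command and event builders create their model objects immediately, and a transition is stored as a map entry from event to target state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model classes.
class Command { final String name; Command(String name) { this.name = name; } }
class Event { final String name; Event(String name) { this.name = name; } }
class State {
    final String name;
    final List<Command> actions = new ArrayList<>();
    final Map<Event, State> transitions = new HashMap<>();
    State(String name) { this.name = name; }
}

// Builders, reduced to what the production step needs.
class Commands {
    final Command command;
    Commands(String name) { command = new Command(name); }
}

class Events {
    final Event event;
    Events(String name) { event = new Event(name); }
}

class TransitionBuilder {
    private final States source;
    private final Events trigger;
    private States target;

    TransitionBuilder(States source, Events trigger) {
        this.source = source;
        this.trigger = trigger;
    }

    States to(States target) {
        this.target = target;
        source.addTransition(this);
        return source;
    }

    // Wires the model transition between the two model states.
    void produce() {
        source.getState().transitions.put(trigger.event, target.getState());
    }
}

class States {
    private final State content;
    private final List<Commands> commands = new ArrayList<>();
    private final List<TransitionBuilder> transitions = new ArrayList<>();

    States(String name) { content = new State(name); }

    State getState() { return content; }

    States actions(Commands... identifiers) {
        commands.addAll(Arrays.asList(identifiers));
        return this;
    }

    TransitionBuilder transition(Events trigger) {
        return new TransitionBuilder(this, trigger);
    }

    void addTransition(TransitionBuilder transition) { transitions.add(transition); }

    // Second pass: copy the stored intermediate data into the model state.
    void produce() {
        for (Commands c : commands) content.actions.add(c.command);
        for (TransitionBuilder t : transitions) t.produce();
    }
}
```

Because every state builder creates its model state in the constructor, a transition can be wired to a target whose own produce has not run yet.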

The last step is to produce the reset events.

Using a class and its fields as a symbol table does involve a bit of tricky code in places, but the benefit is full static typing and IDE support. That’s usually a worthwhile tradeoff.
