Essay 49 Generating Code at Its Core

Taking the leap into code generation is an important pilgrimage every developer ought to make. It frees us to think about code as a powerful tool to transform how we work, not merely as the material we use to write programs.

So, how do we actually write a generator? For a truly in-depth source, I highly recommend Jack Herrington’s outstanding book, Code Generation in Action [Her03]. It covers detailed techniques and high-level patterns for generating code of all kinds. But we don’t need that level of detail to get started. Here are the essentials.

Define Your Input Source

First, create an input source. It’s the place that houses all the parameters our code generator needs to do its work. The input source can be as simple as a plain XML or JSON file or as robust as a database itself.

When we first deployed our company’s code generator, X2O, we used an XML file as the input source. The XML file defined the tables, fields, and foreign keys for the database we generated code against. Here’s an example of converting the blog data model from Essay 48, Separate Robot Work from Human Work, into an XML input source:

 
<input_source>
  <table name="Posts">
    <field name="ID" type="int" identity="true" />
    <field name="Title" type="NVarChar" length="100" />
    <field name="CreateDate" type="DateTime" />
    <field name="Body" type="NText" />
    <foreignkey name="AuthorID" to_table="Authors" />
  </table>
  <table name="Authors">
    <field name="ID" type="int" identity="true" />
    <field name="FirstName" type="NVarChar" length="50" />
    <field name="LastName" type="NVarChar" length="50" />
  </table>
  <table name="Comments">
    <field name="ID" type="int" identity="true" />
    <field name="Comment" type="NText" />
    <field name="Email" type="NVarChar" length="100" />
    <field name="CreateDate" type="DateTime" />
    <foreignkey name="PostID" to_table="Posts" />
  </table>
</input_source>

Over time, your input source will grow. As you find more things to generate, you’ll likely need more kinds of inputs. For example, a few months after building the first version of X2O, we wanted to augment our generator by having it create documentation. We added an attribute called friendly_description for each table and field node. We could then reference those attributes to generate API reference documentation for our ActionScript code.
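As a rough illustration of that idea, here is a minimal Python sketch that turns friendly_description attributes into plain-text reference lines. It is not how X2O did it (X2O was written in C#); the sample descriptions, the render_docs function, and the "(no description)" fallback are all assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Sample input source. The friendly_description values are invented
# for illustration; only the attribute name comes from the essay.
SAMPLE_XML = """
<input_source>
  <table name="Posts" friendly_description="A blog post.">
    <field name="Title" type="NVarChar" length="100"
           friendly_description="Headline shown above the post." />
    <field name="Body" type="NText" />
  </table>
</input_source>
"""

MISSING = "(no description)"  # hypothetical fallback for undocumented nodes


def render_docs(xml_text: str) -> str:
    """Build one reference line per table and one indented line per field."""
    root = ET.fromstring(xml_text)
    lines = []
    for table in root.findall("table"):
        description = table.get("friendly_description", MISSING)
        lines.append(f"{table.get('name')}: {description}")
        for node in table.findall("field"):
            description = node.get("friendly_description", MISSING)
            lines.append(f"  {node.get('name')} ({node.get('type')}): {description}")
    return "\n".join(lines)
```

Calling render_docs(SAMPLE_XML) yields a line for the Posts table followed by indented lines for Title and Body, with Body falling back to the placeholder text because it has no description.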

Choose the Right Programming Language

Program in a language that’s suitable for generating code. The language we write a code generator with doesn’t have to be the same as the language the generated code is written in. In X2O, we use C# to write our code generators, but the output contains SQL, C#, HTML, and ActionScript.

The language of choice must have I/O capabilities so you can actually save the generated code output to your machine. Fortunately, pretty much any of today’s popular programming languages (C, C++, C#, VB, Java, PHP, Python, Ruby, Perl) support this. If you’ve never read or written files using your programming language, spend an hour researching it. Your code generator will be doing a lot of this.
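If file I/O is new to you, the amount you need is small. The following is a minimal sketch in Python, one of the languages listed above; the function names and the output folder layout are assumptions for illustration only.

```python
from pathlib import Path


def write_generated_file(output_dir: Path, filename: str, code: str) -> Path:
    """Write one generated file, creating the output folder if it is missing."""
    output_dir.mkdir(parents=True, exist_ok=True)
    target = output_dir / filename
    target.write_text(code, encoding="utf-8")
    return target


def read_text_file(path: Path) -> str:
    """Read a whole file (for example, a template) into a string."""
    return path.read_text(encoding="utf-8")
```

A generator mostly alternates between these two operations: read a template in, write the filled-in result out.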

Herrington’s preferred language is Ruby because of its I/O support, its text-template tools (like ERb and ERuby), and how well it works with XML, the input source format he uses in his examples.

Extract Your Input Source into Something Usable

With input source in hand, write a program to extract its contents into something usable. In our case, we mapped the contents of the XML file into its own object in C#. This lets you have both a system that’s easy to work with when constructing the input source (XML) and a system that’s easy to work with when you’re generating code against the input source (like an object in C#).

In today’s landscape, languages like E4X (ECMAScript for XML) make converting an input source into a programmatic object pretty seamless. Whatever method you use, it’s critical to have an easy way to loop through and introspect your input source. You’ll see why in the next step.
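To make the extraction step concrete, here is a minimal sketch using Python and its standard xml.etree.ElementTree module rather than the C# we used in X2O. The Table, Field, and ForeignKey classes and the load_model function are hypothetical names chosen for this example; the XML is a shortened copy of the input source shown earlier.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional

SAMPLE_XML = """
<input_source>
  <table name="Authors">
    <field name="ID" type="int" identity="true" />
    <field name="FirstName" type="NVarChar" length="50" />
  </table>
  <table name="Posts">
    <field name="ID" type="int" identity="true" />
    <field name="Title" type="NVarChar" length="100" />
    <foreignkey name="AuthorID" to_table="Authors" />
  </table>
</input_source>
"""


@dataclass
class Field:
    name: str
    type: str
    length: Optional[int] = None
    identity: bool = False


@dataclass
class ForeignKey:
    name: str
    to_table: str


@dataclass
class Table:
    name: str
    fields: List[Field] = field(default_factory=list)
    foreign_keys: List[ForeignKey] = field(default_factory=list)


def load_model(xml_text: str) -> List[Table]:
    """Map every table node, with its fields and foreign keys, to objects."""
    root = ET.fromstring(xml_text)
    tables = []
    for table_node in root.findall("table"):
        table = Table(name=table_node.get("name"))
        for node in table_node.findall("field"):
            length = node.get("length")
            table.fields.append(Field(
                name=node.get("name"),
                type=node.get("type"),
                length=int(length) if length is not None else None,
                identity=node.get("identity") == "true",
            ))
        for node in table_node.findall("foreignkey"):
            table.foreign_keys.append(
                ForeignKey(name=node.get("name"), to_table=node.get("to_table")))
        tables.append(table)
    return tables
```

Once the model is loaded, looping and introspecting is ordinary code: for table in load_model(SAMPLE_XML) gives access to table.name, table.fields, and table.foreign_keys without touching XML again.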

Combine Your Input Source Provider with Templates

With a usable programming environment and input source defined, the next step is to write templates. In our blog example, each tedious part of the development process had a formula. For example, to generate all CRUD statements, we do nothing more than loop through every table in our data model and apply the same statements for each. Take the SQL CREATE statements. We can take the following bit of real SQL code...

 
CREATE PROCEDURE CreatePost (
    @Title NVARCHAR(255),
    @CreateDate DATETIME,
    @Body NTEXT,
    @AuthorID INT)
AS
INSERT INTO Post VALUES (
    @Title,
    @CreateDate,
    @Body,
    @AuthorID)

...and replace the custom parts with replaceable variables...

 
CREATE PROCEDURE Create[cur_table] (
    [List_of_attributes_as_input_params])
AS
INSERT INTO [cur_table] VALUES (
    [List_of_attributes_as_SQL_insert_params])

...to create a template for generating CREATE statements.

With this template, we can loop through each table node in our input source provider and fill in the appropriate values. In this case, cur_table is just the name of each table, while List_of_attributes_as_input_params and List_of_attributes_as_SQL_insert_params are found by inspecting the field nodes of the input source provider.
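Filling in the template can be as simple as string replacement. The sketch below is one possible way to do it in Python, not the X2O implementation. The placeholder names come from the template above; the render_create_procedure function and the POSTS_COLUMNS list (a hand-built stand-in for one table node, with identity fields left out and the foreign key assumed to be an INT) are assumptions for this example.

```python
TEMPLATE = """CREATE PROCEDURE Create[cur_table] (
[List_of_attributes_as_input_params])
AS
INSERT INTO [cur_table] VALUES (
[List_of_attributes_as_SQL_insert_params])
"""

# Hypothetical in-memory form of the Posts table node: one
# (name, SQL type) pair per non-identity column.
POSTS_COLUMNS = [
    ("Title", "NVARCHAR(100)"),
    ("CreateDate", "DATETIME"),
    ("Body", "NTEXT"),
    ("AuthorID", "INT"),  # foreign key, assumed to be an INT
]


def render_create_procedure(table_name, columns, template=TEMPLATE):
    """Replace each placeholder in the template with values for one table."""
    input_params = ",\n".join(f"@{name} {sql_type}" for name, sql_type in columns)
    insert_params = ",\n".join(f"@{name}" for name, _ in columns)
    return (template
            .replace("[cur_table]", table_name)
            .replace("[List_of_attributes_as_input_params]", input_params)
            .replace("[List_of_attributes_as_SQL_insert_params]", insert_params))
```

Calling render_create_procedure("Posts", POSTS_COLUMNS) returns a complete CREATE PROCEDURE statement for that table; running the same function over every table in the input source produces the whole set.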

In outline, generating code looks like this:

  1. Build an example file for the code you want to generate.

  2. Create a template by extracting the custom parts and replacing them with variables.

  3. Write code to read in the template file, loop through the input source, and replace the variables from the template file as necessary.

  4. Write the newly created file to disk.

  5. Do something with the files at the end (run them, compile them, and so on).
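Steps 3 and 4 of the list above can be sketched in a few lines of Python. Everything named here is an assumption for illustration: the READ_TEMPLATE text, the [column_list] placeholder, the generate_files function, and the plain dictionary standing in for the input source. Steps 1 and 2 are simulated by writing the template file inside demo.

```python
import tempfile
from pathlib import Path

# Hypothetical template. [cur_table] follows the placeholder style used
# earlier; [column_list] is invented for this sketch.
READ_TEMPLATE = (
    "CREATE PROCEDURE Read[cur_table]\n"
    "AS\n"
    "SELECT [column_list] FROM [cur_table]\n"
)


def generate_files(template_path, tables, output_dir):
    """Read the template, fill it in once per table, and write each result."""
    template = template_path.read_text(encoding="utf-8")  # step 3: read template
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for table_name, columns in tables.items():  # step 3: loop through input
        code = (template
                .replace("[cur_table]", table_name)
                .replace("[column_list]", ", ".join(columns)))
        target = output_dir / f"Read{table_name}.sql"
        target.write_text(code, encoding="utf-8")  # step 4: write to disk
        written.append(target)
    return written  # step 5 is up to the caller (run, compile, and so on)


def demo():
    """Run the whole flow in a temporary folder and return the file contents."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        template_path = root / "read.sql.template"
        template_path.write_text(READ_TEMPLATE, encoding="utf-8")
        tables = {"Authors": ["ID", "FirstName", "LastName"]}
        written = generate_files(template_path, tables, root / "out")
        return {path.name: path.read_text(encoding="utf-8") for path in written}
```

Keeping the template in its own file means it can be edited like any other source file, without touching the generator code.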

Component-Driven Design

A good rule of thumb is to keep all generators as separate libraries. Early on, X2O was a mass of code in one large file. The code that generated the database, SQL scripts, data access layer, web services, Flash objects, and CMS files all lived in the same library. While it worked, it grew to be unmanageable. It was harder to maintain because any minor change to the generator meant recompiling tens of thousands of lines of code.

Once we pulled each part out into about three dozen separate libraries, it was a lot easier to maintain. We could then chain all the generators together by referencing them in one all-encompassing master generator library. It also lets us toggle certain generators if we don’t always need them.
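The chaining and toggling idea can be sketched as follows. This is a toy Python version, not X2O’s design: the two generators, the GENERATORS registry, and the run_master function are hypothetical, and in a real project each generator would live in its own library.

```python
from typing import Callable, Dict, Iterable, List

# Each generator takes the table names and returns {filename: contents}.
Generator = Callable[[List[str]], Dict[str, str]]


def generate_sql(tables: List[str]) -> Dict[str, str]:
    """Stand-in for a SQL script generator."""
    return {f"Create{t}.sql": f"-- CREATE procedure for {t}\n" for t in tables}


def generate_docs(tables: List[str]) -> Dict[str, str]:
    """Stand-in for a documentation generator."""
    return {f"{t}.txt": f"Reference for {t}\n" for t in tables}


# The master generator only knows the registry, not the details of each part.
GENERATORS: Dict[str, Generator] = {
    "sql": generate_sql,
    "docs": generate_docs,
}


def run_master(tables: List[str], enabled: Iterable[str]) -> Dict[str, str]:
    """Run every enabled generator in order and merge their output files."""
    enabled = set(enabled)
    output: Dict[str, str] = {}
    for name, generator in GENERATORS.items():
        if name in enabled:
            output.update(generator(tables))
    return output
```

Calling run_master(["Posts"], ["sql"]) skips the documentation generator entirely, which is the toggle described above.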

Encapsulating and componentizing are good programmer habits anyway, but they’re especially important when we’re building dozens of little generators.

With these five simple tips in mind, we can get out of the starting gate.

Automate with Care

Is there anything bad about code generation? Are there times when we shouldn’t be using it to our advantage? Yes. Here are a couple of common mistakes you might make early on in your automation experience.

Avoid Touching Generated Code with Bare Hands

Make a strict rule that any generated code is not to be modified after it has been generated. Generated code is like fine china: you break it, you pay for it!

Generating code, only to go noodling around in it afterward, might make our process more tedious, not less. Why? Suppose we add a new field to our database and want to regenerate our code against the updated data model. Each time we do that, we have to remember what we hand-modified and make those modifications again.

If we really do need to noodle around in our generated code, there are elegant ways around the problem. In C#, we can mark a class as partial. This lets us define a class in multiple source files. In X2O, every generated C# class is partial so that, if we ever need to, we can add methods or properties in a separate file that declares the same partial class.

If you don’t have the option of partial classes in your language of choice, there are other elegant approaches too. For instance, you can extend classes or write custom helper classes.
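Here is a minimal sketch of the extend-a-class approach in Python, which has no partial classes. The file names in the comments and the PostBase and Post classes are hypothetical; the point is only that the generated class and the hand-written additions live in different places.

```python
from dataclasses import dataclass


# --- post_generated.py: overwritten on every generator run; never edit ---
@dataclass
class PostBase:
    id: int
    title: str
    body: str


# --- post.py: written by hand; the generator never touches this file ---
class Post(PostBase):
    def summary(self, max_chars: int = 40) -> str:
        """Hand-written helper that survives regeneration of PostBase."""
        if len(self.body) <= max_chars:
            text = self.body
        else:
            text = self.body[:max_chars] + "..."
        return f"{self.title}: {text}"
```

If a new field is added to the data model, only PostBase is regenerated; the summary method in Post stays exactly as it was written.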

Keep Generated Code as Tidy as Real Code

When we program by hand, keeping our code tidy is particularly important because we want it to be easy to maintain down the road. That’s exactly why it’s hard to motivate ourselves to keep generated code equally tidy—we never need to actually maintain the code we generate. We maintain only the generator.

That’s why some argue that code generation is a trade-off between rapid output and custom-fit code—it creates a lot of excess that rarely gets used by the end application. Because it’s so easy for programs to spit out code, we may not care as much to have it generate concise, optimized code. But this is something easily resolved.

Perhaps our next project doesn’t need a certain set of data access methods. We can use our input source to define some optional parameters so that we’re not spitting out sheets of excess code for a project that doesn’t need it. As our generator matures, we might want to toggle whether certain code is generated at all. This is where component-driven design really helps.
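One way such an optional parameter could look is an attribute on each table node that lists the operations to generate. The sketch below assumes a hypothetical operations attribute (it is not part of the X2O input source shown earlier) and a hypothetical planned_procedures function; tables without the attribute get the full set.

```python
import xml.etree.ElementTree as ET

ALL_OPERATIONS = ("create", "read", "update", "delete")

# The operations attribute is invented for this sketch.
SAMPLE_XML = """
<input_source>
  <table name="Posts" />
  <table name="Comments" operations="create,read" />
</input_source>
"""


def planned_procedures(xml_text: str) -> list:
    """List the procedure names the generator would emit for each table."""
    root = ET.fromstring(xml_text)
    names = []
    for table in root.findall("table"):
        raw = table.get("operations")
        if raw:
            operations = [op.strip() for op in raw.split(",")]
        else:
            operations = ALL_OPERATIONS  # default: generate everything
        for op in operations:
            names.append(f"{op.capitalize()}{table.get('name')}")
    return names
```

With this input, Posts gets all four procedures while Comments gets only CreateComments and ReadComments, so the unused update and delete code is never written.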

Some argue that code generators produce inelegant code. However, this has nothing to do with code generators and everything to do with how we prescribe what our code generator should produce.

If our generated classes have duplicate functions or common methods, we can refactor the templates that make up our code generator. We can write the duplicate functions into a stand-alone class that lives outside the generator. We can still apply the same programming-by-hand techniques to our generated code.

In code generation, nothing stops us from still following good programming principles.

Know What Not to Generate

While code generation makes you think more critically about the patterns in your everyday work, it’s equally important not to force those patterns. After the first few sweet victories of successful code generation, we might feel an air of invincibility and start trying to wrap everything into a code generator, even things that are tedious but not really automatable. It’s easy to try to cram too much automation into things that are still too custom.

This is where we really have to consider the benefits of code generation. If our output code requires too many custom inputs to generate or requires too many hacks to use, we probably shouldn’t be generating that bit in the first place. Just like bad code smells, there are also bad code generation smells.

Writing code generators gets us thinking about what is truly automatable and tedious vs. what is just tedious.
